**Jasmin Blanchette Laura Kovács Dirk Pattinson (Eds.)**

# **Automated Reasoning**

**11th International Joint Conference, IJCAR 2022 Haifa, Israel, August 8–10, 2022 Proceedings**

# **Lecture Notes in Artificial Intelligence 13385**

# Subseries of Lecture Notes in Computer Science

Series Editors

Randy Goebel *University of Alberta, Edmonton, Canada*

Wolfgang Wahlster *DFKI, Berlin, Germany*

Zhi-Hua Zhou *Nanjing University, Nanjing, China*

## Founding Editor

Jörg Siekmann *DFKI and Saarland University, Saarbrücken, Germany*

More information about this subseries at https://link.springer.com/bookseries/1244


*Editors*

Jasmin Blanchette, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands

Laura Kovács, Vienna University of Technology, Wien, Austria

Dirk Pattinson, Australian National University, Canberra, ACT, Australia

ISSN 0302-9743 ISSN 1611-3349 (electronic)
Lecture Notes in Artificial Intelligence
ISBN 978-3-031-10768-9 ISBN 978-3-031-10769-6 (eBook)
https://doi.org/10.1007/978-3-031-10769-6

LNCS Sublibrary: SL7 – Artificial Intelligence

© The Editor(s) (if applicable) and The Author(s) 2022. This book is an open access publication.

**Open Access** This book is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.

# **Preface**

This volume contains the papers presented at the 11th International Joint Conference on Automated Reasoning (IJCAR 2022) held during August 8–10, 2022, in Haifa, Israel. IJCAR was part of the Federated Logic Conference (FLoC 2022), which took place from July 31 to August 12, 2022, in Haifa.

IJCAR is the premier international joint conference on all aspects of automated reasoning, including foundations, implementations, and applications, comprising several leading conferences and workshops. IJCAR 2022 united the Conference on Automated Deduction (CADE), the International Symposium on Frontiers of Combining Systems (FroCoS), and the International Conference on Automated Reasoning with Analytic Tableaux and Related Methods (TABLEAUX). Previous IJCAR conferences were held in Siena, Italy, in 2001, Cork, Ireland, in 2004, Seattle, USA, in 2006, Sydney, Australia, in 2008, Edinburgh, UK, in 2010, Manchester, UK, in 2012, Vienna, Austria, in 2014, Coimbra, Portugal, in 2016, Oxford, UK, in 2018, and Paris, France, in 2020 (virtual).

There were 85 submissions. Each submission was assigned to at least three Program Committee members and was reviewed in single-blind mode. The committee decided to accept 41 papers: 32 regular papers and 9 system descriptions.

The program also included two invited talks, by Elvira Albert and Gilles Dowek, as well as a plenary FLoC talk by Aarti Gupta.

We acknowledge the FLoC sponsors:


We also acknowledge the generous sponsorship of Springer and the Trakhtenbrot family, as well as the invaluable support provided by the EasyChair developers. Finally, we thank the FLoC 2022 organization team for assisting us with local organization and general conference management.

May 2022 Jasmin Blanchette Laura Kovács Dirk Pattinson

# **Organization**

### **Program Committee**

- Erika Abraham, RWTH Aachen University, Germany
- Carlos Areces, Universidad Nacional de Córdoba, Argentina
- Bernhard Beckert, Karlsruhe Institute of Technology, Germany
- Alexander Bentkamp, Chinese Academy of Sciences, China
- Armin Biere, University of Freiburg, Germany
- Nikolaj Bjørner, Microsoft, USA
- Jasmin Blanchette (Co-chair), Vrije Universiteit Amsterdam, The Netherlands
- Frédéric Blanqui, Inria, France
- Maria Paola Bonacina, Università degli Studi di Verona, Italy
- Kaustuv Chaudhuri, Inria, France
- Agata Ciabattoni, Vienna University of Technology, Austria
- Stéphane Demri, CNRS, LMF, ENS Paris-Saclay, France
- Clare Dixon, University of Manchester, UK
- Huimin Dong, Sun Yat-sen University, China
- Katalin Fazekas, Vienna University of Technology, Austria
- Mathias Fleury, University of Freiburg, Germany
- Pascal Fontaine, Université de Liège, Belgium
- Nathan Fulton, IBM, USA
- Silvio Ghilardi, Università degli Studi di Milano, Italy
- Jürgen Giesl, RWTH Aachen University, Germany
- Rajeev Gore, Australian National University, Australia
- Marijn Heule, Carnegie Mellon University, USA
- Radu Iosif, Verimag, CNRS, Université Grenoble Alpes, France
- Mikolas Janota, Czech Technical University in Prague, Czech Republic
- Moa Johansson, Chalmers University of Technology, Sweden
- Cezary Kaliszyk, University of Innsbruck, Austria
- Laura Kovács (Co-chair), Vienna University of Technology, Austria
- Orna Kupferman, Hebrew University, Israel
- Cláudia Nalon, University of Brasília, Brazil
- Vivek Nigam, Huawei ERC, Germany
- Tobias Nipkow, Technical University of Munich, Germany
- Jens Otten, University of Oslo, Norway
- Dirk Pattinson (Co-chair), Australian National University, Australia
- Nicolas Peltier, CNRS, LIG, France
- Brigitte Pientka, McGill University, Canada
- Elaine Pimentel, University College London, UK
- André Platzer, Carnegie Mellon University, USA
- Giles Reger, Amazon Web Services, USA, and University of Manchester, UK
- Andrew Reynolds, University of Iowa, USA
- Simon Robillard, Université de Montpellier, France
- Albert Rubio, Universidad Complutense de Madrid, Spain
- Philipp Ruemmer, Uppsala University, Sweden
- Renate A. Schmidt, University of Manchester, UK
- Stephan Schulz, DHBW Stuttgart, Germany
- Roberto Sebastiani, University of Trento, Italy
- Martina Seidl, Johannes Kepler University Linz, Austria
- Viorica Sofronie-Stokkermans, University of Koblenz-Landau, Germany
- Lutz Straßburger, Inria, France
- Martin Suda, Czech Technical University in Prague, Czech Republic
- Tanel Tammet, Tallinn University of Technology, Estonia
- Sophie Tourret, Inria, France, and Max Planck Institute for Informatics, Germany
- Uwe Waldmann, Max Planck Institute for Informatics, Germany
- Christoph Weidenbach, Max Planck Institute for Informatics, Germany
- Sarah Winkler, Free University of Bozen-Bolzano, Italy
- Yoni Zohar, Bar-Ilan University, Israel

## **Additional Reviewers**

László Antal, Paolo Baldi, Lionel Blatter, Brandon Bohrer, Marius Bozga, Chad Brown, Lucas Bueri, Guillaume Burel, Marcelo Coniglio, Riccardo De Masellis, Warren Del-Pinto, Zafer Esen, Michael Färber, Sicun Gao, Jacques Garrigue, Thibault Gauthier, Samir Genaim, Alessandro Gianola, Raúl Gutiérrez, Fajar Haifani, Alejandro Hernández-Cerezo, Ullrich Hustadt, Jan Jakubuv, Martin Jonas, Michael Kirsten, Gereon Kremer, Roman Kuznets, Jonathan Laurent, Chencheng Liang, Enrico Lipparini, Florin Manea, Marco Maratea, Sonia Marin, Enrique Martin-Martin, Andrea Mazzullo, Stephan Merz, Antoine Miné, Sibylle Möhle, Cristian Molinaro, Markus Müller-Olm, Jasper Nalbach, Joel Ouaknine, Tobias Paxian, Wolfram Pfeifer, Andrew Pitts, Amaury Pouly, Stanisław Purgał, Michael Rawson, Giselle Reis, Clara Rodríguez-Núñez, Daniel Skurt, Giuseppe Spallitta, Sorin Stratulat, Petar Vukmirović, Alexander Weigl, Richard Zach, Anna Zamansky, Michal Zawidzki

# **Contents**

#### **Invited Talks**


#### **Calculi and Orderings**




#### **Evolution, Termination, and Decision Problems**



# **Invited Talks**

# **Using Automated Reasoning Techniques for Enhancing the Efficiency and Security of (Ethereum) Smart Contracts**

Elvira Albert<sup>1,2</sup>, Pablo Gordillo<sup>1</sup>, Alejandro Hernández-Cerezo<sup>1</sup>, Clara Rodríguez-Núñez<sup>1</sup>, and Albert Rubio<sup>1,2</sup>

> <sup>1</sup> Complutense University of Madrid, Madrid, Spain
> <sup>2</sup> Instituto de Tecnología del Conocimiento, Madrid, Spain
> elvira@fdi.ucm.es

The use of the Ethereum blockchain platform [17] has grown enormously since its very first transaction back in 2015 and, along with it, the verification and optimization of the programs executed on the blockchain (known as Ethereum smart contracts) have attracted considerable interest within the research community. As for any other kind of program, the main properties of smart contracts are their efficiency and security. In the context of the blockchain, however, these properties acquire even more relevance. As regards efficiency, due to the huge volume of transactions, the cost and response time of the Ethereum platform have increased notably: the processing capacity for transactions is limited, yielding low transaction rates per minute together with increased costs per transaction. The Ethereum community is aware of these limitations and is currently working on solutions to improve scalability with the goal of increasing capacity. As regards security, due to the public nature and immutability of smart contracts and the fact that their public functions can be executed by any user at any time, programming errors can be exploited by attackers and have a high economic impact [7,13]. Verification is key to ensuring the security of smart contract execution and providing safety guarantees. This talk presents our work on the use of automated reasoning techniques and tools to enhance the security and efficiency [2–4,6] of Ethereum smart contracts along the two directions described below.

Security. Our main focus on security will be to detect and avoid potential reentrancy attacks, one of the best known and most exploited vulnerabilities, which has caused infamous attacks in the Ethereum ecosystem due to their economic impact [9,11,15]. Reentrancy attacks may occur in programs with callbacks, a mechanism that allows making calls among contracts. Callbacks occur when a method of a contract invokes a method of another contract and the latter, either directly or indirectly, invokes one or more methods of the former before the original method invocation returns. While this mechanism is useful and powerful

© The Author(s) 2022

This work was partially funded by the Ethereum Foundation (Grant FY21-0372), the Spanish MCIU, AEI and FEDER (EU) project RTI2018-094403-B-C31, and the CM project S2018/TCS-4314 co-funded by EIE Funds of the European Union.

J. Blanchette et al. (Eds.): IJCAR 2022, LNAI 13385, pp. 3–7, 2022. https://doi.org/10.1007/978-3-031-10769-6_1

in event-driven programming, it has also been used to exploit vulnerabilities. Our approach to detect potential reentrancy problems is to ensure that the program meets the Effectively Callback Freeness (ECF) property [10]. ECF guarantees the modularity of a contract in the sense that executions with callbacks cannot result in new states that are not reachable by callback-free executions. This implies that the use of callbacks will not lead to unpredicted, potentially dangerous, states. In order to ensure the ECF property, we use commutation and projection of code fragments [6]. Intuitively, given a function fragment *A* followed by *B* (denoted *A.B*), if a callback to some function *f* can be received between these fragments (that is, *A.f.B*), we ensure safety by proving that this execution with callbacks is equivalent to a callback-free execution: either *A.B* (projection), *f.A.B* (left-commutation), or *A.B.f* (right-commutation). The use of automated reasoning techniques enables proving this kind of property. Inspired by the use of SMT solvers to prove redundancy of concurrent executions [1,8,16], we have implemented such checks using state-of-the-art SMT solvers.
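As a toy illustration of these commutation checks (a Python sketch with hypothetical fragments, not the authors' SMT encoding), one can model code fragments as state transformers and test the candidate callback-free orders on sample states; a real analysis would discharge the equivalence for *all* states with an SMT solver:

```python
# Toy illustration of the commutation checks behind ECF: fragments are
# modeled as functions on a state dictionary, and we test, on sample states,
# whether the interleaving A.f.B coincides with some callback-free order
# (A.B, f.A.B, or A.B.f).

def compose(*fragments):
    """Run fragments left to right on a copy of the initial state."""
    def run(state):
        s = dict(state)
        for frag in fragments:
            s = frag(s)
        return s
    return run

def callback_free_equivalent(a, b, f, sample_states):
    """Name of a callback-free order equivalent to A.f.B on the sample
    states, or None if every check fails."""
    interleaved = compose(a, f, b)
    candidates = {
        "projection A.B": compose(a, b),
        "left-commutation f.A.B": compose(f, a, b),
        "right-commutation A.B.f": compose(a, b, f),
    }
    for name, cand in candidates.items():
        if all(interleaved(s) == cand(s) for s in sample_states):
            return name
    return None

# Hypothetical fragments of a withdraw function: A reads the balance,
# B updates it, and the callback f only touches an unrelated counter,
# so f commutes to the left of A.B.
A = lambda s: {**s, "tmp": s["balance"]}
B = lambda s: {**s, "balance": s["tmp"] - 10}
f = lambda s: {**s, "calls": s["calls"] + 1}

states = [{"balance": b, "calls": c, "tmp": 0} for b in (50, 100) for c in (0, 3)]
print(callback_free_equivalent(A, B, f, states))  # prints: left-commutation f.A.B
```

Testing on sample states can only refute equivalence; the SMT-based check in the talk proves it for every reachable state.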

The ECF property can be generalized to allow callbacks to introduce new behaviors as long as they are benign, as [5] does by defining the notion of R-ECF. The main difference between ECF and R-ECF is that while ECF checks that the states reached by executions with callbacks are exactly the same as the ones reached by executions without callbacks, R-ECF checks that they satisfy a relation with respect to the states reached without callbacks. This way, R-ECF is able to recognize and distinguish the benign behaviors introduced by callbacks from the potentially dangerous ones, while ECF cannot. The main application of R-ECF is that, for a particular invariant of the program, it allows reducing the problem of verifying the invariant in the presence of callbacks to the callback-free setting. For example, if we consider the invariant balance ≥ 0 and prove that the contract is R-ECF with respect to the relation balance<sub>cb</sub> ≥ balance<sub>cbfree</sub> (i.e., the balance reached by executions with callbacks is at least the one reached without callbacks), then we only need to consider callback-free executions in order to prove the preservation of the invariant.

We considered as benchmarks the top 150 contracts by volume of usage and studied the modularity of their functions in terms of ECF and R-ECF. A total of 386 of their functions could receive callbacks, of which 62.7% were verified to be ECF. The R-ECF approach increased the accuracy of the analysis, proving the correctness of an additional 2% of the functions [5,6].

Efficiency. The main focus on efficiency will be on optimizing the resource consumption of smart contract executions. On the Ethereum blockchain, resource consumption is measured in terms of gas, a unit introduced in the system to quantify the computational effort and charge a fee accordingly in order to have a transaction executed. To understand how we can optimize gas, we need to discuss it (and do it) at the level of the Ethereum bytecode. Smart contracts in Ethereum are executed using the Ethereum Virtual Machine (EVM). The EVM is a simple stack-based architecture which uses 256-bit words and has its own repertoire of instructions (EVM opcodes). In the EVM, the memory model is split into two different structures: the storage, which is persistent between transactions and expensive to use, and the memory, which does not persist between transactions and is cheaper. Each opcode has a gas cost associated to its execution. In addition, a fee must be paid for each byte when the smart contract is deployed. Thus, the resource to be optimized can be either the total amount of gas in a program or its size. Even though both criteria are usually related, there are some situations in which they do not correlate. For instance, pushing a big number onto the stack consumes little gas but significantly increases the bytecode size, whereas obtaining the same value using arithmetic operations costs more gas but involves fewer bytes.
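The trade-off can be sketched numerically for loading the constant 2^255 onto the stack. The gas costs below are simplified approximations of the EVM fee schedule (PUSH: 3 gas; EXP: 10 gas plus 50 per byte of the exponent) and are assumptions for illustration, not exact figures for any particular fork:

```python
# Sketch of the gas-vs-size trade-off for loading 2**255 onto the EVM stack.
# Gas costs are simplified approximations of the Yellow Paper fee schedule
# (PUSH*: 3 gas; EXP: 10 + 50 per byte of the exponent).

GAS_PUSH = 3

def gas_exp(exponent):
    return 10 + 50 * max(1, (exponent.bit_length() + 7) // 8)

# Option 1: PUSH32 <2**255>  -> one opcode byte + 32 immediate bytes.
push32 = {"gas": GAS_PUSH, "size_bytes": 1 + 32}

# Option 2: PUSH1 255; PUSH1 2; EXP  -> smaller bytecode, more gas.
arith = {
    "gas": GAS_PUSH + GAS_PUSH + gas_exp(255),
    "size_bytes": 2 + 2 + 1,  # two PUSH1s (2 bytes each) + EXP (1 byte)
}

print(push32)  # cheap to execute, large to deploy
print(arith)   # expensive to execute, small to deploy
```

Under these assumed costs, the PUSH32 variant uses 3 gas but 33 bytes, while the arithmetic variant uses 66 gas but only 5 bytes, so the best choice depends on whether gas or size is being minimized.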

Among the possible techniques to optimize code, we have used superoptimization [12]. The main idea of superoptimization is to automatically find an optimal sequence of instructions equivalent to a given loop-free sequence. To achieve this goal, we enumerate all possible candidates and determine the best option among them with respect to the optimization criterion. In the context of the EVM, several superoptimizers exist: EBSO [14], SYRUP [3,4], and GASOL [2]. The techniques presented in this work correspond to the ones implemented in GASOL, which improve and extend those in SYRUP. We apply two kinds of automated reasoning techniques to superoptimize Ethereum smart contracts, symbolic execution and Max-SMT, as described next.


The chosen optimization criterion (gas or size) is encoded by penalizing each instruction using soft constraints. For both criteria, the corresponding set of soft constraints ensures that an optimal model returned by the solver corresponds to an optimal block for that criterion.
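The enumerate-and-check idea can be sketched on a toy stack language (a minimal Python illustration with made-up instruction costs; a real superoptimizer such as GASOL encodes the search and the equivalence check as a Max-SMT problem rather than testing candidates):

```python
# A miniature superoptimizer: enumerate all instruction sequences up to a
# length bound over a toy stack language, keep those that agree with the
# input block on the test stacks, and return the cheapest one.

from itertools import product

# Toy instruction set with made-up costs.
COST = {"PUSH0": 2, "ADD": 1, "DUP": 1, "SWAP": 1, "POP": 1}

def run(block, stack):
    """Execute a block on a copy of the stack; None signals stack underflow."""
    s = list(stack)
    for op in block:
        if op == "PUSH0":
            s.append(0)
        elif op == "ADD":
            if len(s) < 2: return None
            s.append(s.pop() + s.pop())
        elif op == "DUP":
            if not s: return None
            s.append(s[-1])
        elif op == "SWAP":
            if len(s) < 2: return None
            s[-1], s[-2] = s[-2], s[-1]
        elif op == "POP":
            if not s: return None
            s.pop()
    return s

def cost(block):
    return sum(COST[op] for op in block)

def superoptimize(block, max_len, tests):
    """Cheapest candidate agreeing with `block` on all test stacks (testing
    approximates the equivalence check a real tool does symbolically)."""
    target = [run(block, t) for t in tests]
    best = block
    for n in range(max_len + 1):
        for cand in product(COST, repeat=n):
            if cost(cand) < cost(best) and [run(cand, t) for t in tests] == target:
                best = list(cand)
    return best

# PUSH0; ADD adds zero: on non-empty stacks it is a no-op, so the empty
# block is discovered as the optimum.
tests = [[1], [2, 3], [7, 8, 9]]
print(superoptimize(["PUSH0", "ADD"], max_len=2, tests=tests))  # prints: []
```

The exhaustive enumeration here is exponential in the length bound, which is exactly why the tools discussed above delegate the search to a Max-SMT solver.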

Combining both approaches, we obtain significant savings for both criteria. For a subset of 30 smart contracts, selected among the latest published on Etherscan as of June 21, 2021 and already optimized using the compiler solc v0.8.9, GASOL still manages to reduce the amount of gas by 0.72% with the gas criterion enabled, and decreases the overall size by 3.28% with the size criterion enabled.

Future work. Current directions for future work include enhancing the smart contract optimizer in both the accuracy and the scalability of the process while keeping it efficient. For accuracy, we are currently working on adding further reasoning on non-stack operations while staying within a fairly simple logic. This will allow us to consider a wider set of equivalent blocks and hence increase the savings. Scalability can be threatened when we consider large blocks of code. We are investigating different approaches to scale better, including heuristics to partition blocks into smaller sub-blocks and more efficient SMT encodings. Finally, another direction for future work is to formally prove the correctness of the optimizer, i.e., to develop a checker that can formally prove the equivalence of the optimized and the original (Ethereum) bytecode. For this, we plan to use the Coq proof assistant to develop a checker that, given an original bytecode sequence (corresponding to a block of the control-flow graph) and its optimization, formally proves their equivalence for any possible execution, and can optionally generate a soundness proof to be used as a certificate.

# **References**



# **From the Universality of Mathematical Truth to the Interoperability of Proof Systems**

Gilles Dowek

Inria and ENS Paris-Saclay, Paris, France
gilles.dowek@ens-paris-saclay.fr

## **1 Yet Another Crisis of the Universality of Mathematical Truth**

The development of computerized proof systems, such as Coq, Matita, Agda, Lean, HOL 4, HOL Light, Isabelle/HOL, Mizar, etc., is a major step forward in the never-ending quest for mathematical rigor. But it jeopardizes the universality of mathematical truth [5]: we used to have proofs of Fermat's little theorem; we now have Coq proofs of Fermat's little theorem, Isabelle/HOL proofs of Fermat's little theorem, PVS proofs of Fermat's little theorem, etc. Each proof system (Coq, Isabelle/HOL, PVS, etc.) defines its own language for mathematical statements and its own truth conditions for these statements.

This crisis can be compared to previous ones, when mathematicians have disagreed on the truth of some mathematical statements: the discovery of the incommensurability of the diagonal and side of a square, the introduction of infinite series, the non-Euclidean geometries, the discovery of the independence of the axiom of choice, and the emergence of constructivity. All these past crises have been resolved.

### **2 Predicate Logic and Other Logical Frameworks**

One way to resolve a crisis, such as that of non-Euclidean geometries or that of the axiom of choice, is to view geometry, or set theory, as an axiomatic theory. The judgement that the statement *the sum of the angles in a triangle equals the straight angle* is true evolves into the judgement that it is a consequence of the parallel axiom and of the other axioms of geometry. Thus, the truth conditions must be defined not for the statements of geometry but for arbitrary sequents Γ ⊢ A, formed of a theory (a set of axioms) Γ and a statement A.

This induces a separation between the definition of the truth conditions of a sequent (the logical framework) and the definition of the various geometries as theories in this logical framework. This logical framework, Predicate logic, was made precise by Hilbert and Ackermann [13] in 1928, more than a century after the beginning of the crisis of non-Euclidean geometries. The invention of Predicate logic was a huge step forward. But Predicate logic also has some limitations.

To overcome these limitations, it has been modernized in various ways over the last decades. First, λ-Prolog [15] and Isabelle [17] extended Predicate logic with variable-binding function symbols, such as the symbol λ in the term λx x. Then, the λΠ-calculus [12] made it possible to represent proof-trees explicitly, using the so-called Brouwer-Heyting-Kolmogorov algorithmic interpretation of proofs and the Curry-de Bruijn-Howard correspondence. In a second stream of research, Deduction modulo theory [4,6] introduced a distinction between computation and deduction, in such a way that the statement 27 × 37 = 999 computes to 999 = 999, with the algorithm of multiplication, and then to ⊤, with the algorithm for comparing natural numbers. It thus has a trivial proof. A third stream of research extended classical Predicate logic to an Ecumenical predicate logic [3,9–11,14,18,19] with both constructive and classical logical constants.
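The computation-versus-deduction distinction can be seen in any system whose kernel evaluates arithmetic. A Lean 4 analogue of the example (Lean is not one of the systems discussed here, but its kernel also computes):

```lean
-- The statement 27 * 37 = 999 reduces by computation to 999 = 999,
-- so reflexivity closes the goal with no deductive steps:
example : 27 * 37 = 999 := rfl

-- A purely deductive proof would instead have to unfold the
-- multiplication algorithm step by step through explicit lemmas.
```

This is precisely the sense in which, in Deduction modulo theory, such statements have trivial proofs.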

These streams of research have merged to provide a logical framework, the λΠ-calculus modulo theory [2], also called Martin-Löf's logical framework [16]. This framework permits function symbols to bind variables, includes an explicit representation of proof-trees, distinguishes computation from deduction, and permits defining both constructive and classical logical constants. It is the basis of the language Dedukti, in which Simple type theory, Martin-Löf's type theory, the Calculus of constructions, etc. can easily be expressed.

# **3 The Theory** *U*

The expressions in Dedukti of Simple type theory, Simple type theory with polymorphism, Simple type theory with predicate subtyping, the Calculus of constructions, etc. use symbol declarations and computation rules that play the role of axioms in Predicate logic. But, just like the various geometries or the various set theories share many axioms and differ in a few, these theories share many symbols and rules. This remark leads to defining a large theory, the theory U [1], that contains Simple type theory, Simple type theory with polymorphism, Simple type theory with predicate subtyping, the Calculus of constructions, etc. as sub-theories.

Many proofs developed in proof systems can be expressed in the theory U and, depending on the symbols and rules they use, they can be translated to more common formulations of the theories implemented in these systems.

For instance, F. Thiré has expressed a large library of arithmetic, originally developed in Matita, in a sub-theory of the theory U corresponding to Simple type theory with polymorphism, and translated these proofs to the languages of seven proof systems [20]. Y. Géran has expressed the first book of Euclid's Elements, originally developed in Coq, in a sub-theory of the theory U corresponding to Predicate logic, and translated these proofs to the languages of many proof systems, including predicate-logic ones [8]. And T. Felicissimo has shown that a large library of proofs originally developed in Matita, including a proof of Bertrand's postulate, could be expressed in predicative type theory and expressed in Agda [7].

# **References**


20. Thiré, F.: Sharing a library between proof assistants: reaching out to the HOL family. In: Blanqui, F., Reis, G. (eds.) Proceedings of the 13th International Workshop on Logical Frameworks and Meta-Languages, vol. 274, pp. 57–71. EPTCS (2018)


# **Satisfiability, SMT Solving, and Arithmetic**

# **Flexible Proof Production in an Industrial-Strength SMT Solver**

Haniel Barbosa<sup>1</sup>, Andrew Reynolds<sup>2</sup>, Gereon Kremer<sup>3</sup>, Hanna Lachnitt<sup>3</sup>, Aina Niemetz<sup>3</sup>, Andres Nötzli<sup>3</sup>, Alex Ozdemir<sup>3</sup>, Mathias Preiner<sup>3</sup>, Arjun Viswanathan<sup>2</sup>, Scott Viteri<sup>3</sup>, Yoni Zohar<sup>4</sup>, Cesare Tinelli<sup>2</sup>, and Clark Barrett<sup>3</sup>

> <sup>1</sup> Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
> <sup>2</sup> The University of Iowa, Iowa City, USA
> <sup>3</sup> Stanford University, Stanford, USA
> <sup>4</sup> Bar-Ilan University, Ramat Gan, Israel
> yoni206@gmail.com

**Abstract.** Proof production for SMT solvers is paramount to ensure their correctness independently from implementations, which are often prohibitively difficult to verify. Historically, however, SMT proof production has struggled with performance and coverage issues, resulting in the disabling of many crucial solving techniques and in coarse-grained (and thus hard to check) proofs. We present a flexible proof-production architecture designed to handle the complexity of versatile, industrial-strength SMT solvers and show how we leverage it to produce detailed proofs, including for components previously unsupported by any solver. The architecture allows proofs to be produced modularly, lazily, and with numerous safeguards for correctness. This architecture has been implemented in the state-of-the-art SMT solver cvc5. We evaluate its proofs for SMT-LIB benchmarks and show that the new architecture produces better coverage than previous approaches, has acceptable performance overhead, and supports detailed proofs for most solving components.

## **1 Introduction**

SMT solvers [9] are widely used as backbones of formal methods tools in a variety of applications, often safety-critical ones. These tools rely on the solver's correctness to guarantee the validity of their results such as, for instance, that an access policy does not inadvertently give access to sensitive data [4]. However, SMT solvers, particularly industrial-strength ones, are often extremely complex pieces of engineering. This makes it hard to ensure that implementation issues do not affect results. As the industrial use of SMT solvers increases, it is paramount to be able to convince non-experts of the trustworthiness of their results.

A solution is to decouple confidence from the implementation by coupling results with machine-checkable certificates of their correctness. For SMT solvers,

This work was partially supported by the Office of Naval Research (Contract No. 68335-17-C-0558), a gift from Amazon Web Services, and by NSF-BSF grant numbers 2110397 (NSF) and 2020704 (BSF).

this amounts to providing proofs of unsatisfiability. The main challenges are justifying a combination of theory-specific algorithms while keeping the solver performant and providing enough details to allow *scalable* proof checking, i.e., checking that is fundamentally simpler than solving. Moreover, while proof production is well understood for propositional reasoning and common theories, that is not the case for more expressive theories, such as the theory of strings, or for more advanced solver operations such as formula preprocessing.

We present a new, flexible proof-production architecture for versatile, industrial-strength SMT solvers and discuss its integration into the cvc5 solver [5]. The architecture (Sect. 2) aims to facilitate the implementation effort via modular proof production and internal proof checking, so that more critical components can be enabled when generating proofs. We provide some details on the core proof calculus and how proofs are produced (Sect. 3), in particular how we support eager and lazy proof production with built-in proof reconstruction (Sect. 3.2). This feature is particularly important for substitution and rewriting techniques, facilitating the instrumentation of notoriously challenging functionalities, such as simplification under global assumptions [6, Section 6.1] and string solving [40,46, 48], to produce detailed proofs. Finally, we describe (Sect. 5) how the architecture is leveraged to produce detailed proofs for most of the theory reasoning, critical preprocessing, and underlying SAT solving of cvc5. We evaluate proof production in cvc5 (Sect. 6) by measuring the proof overhead and the proof quality over an extensive set of benchmarks from SMT-LIB [8].

In summary, *our contributions* are a flexible proof-producing architecture for state-of-the-art SMT solvers, its implementation in cvc5, the production of detailed proofs for simplification under global assumptions and the full theory of strings, and initial experimental evidence that proof-production overhead is acceptable and detailed proofs can be generated for a majority of the problems.

**Preliminaries.** We assume the usual notions and terminology of many-sorted first-order logic with equality (≈) [29]. We consider signatures Σ, all containing the distinguished Boolean sort Bool. We adopt the usual definitions of well-sorted Σ-terms, with literals and formulas as terms of sort Bool, and Σ-interpretations. A Σ-*theory* is a pair T = (Σ, **I**) where **I**, the *models* of T, is a class of Σ-interpretations closed under variable reassignment. A Σ-formula ϕ is *T-valid* (resp., *T-unsatisfiable*) if it is satisfied by all (resp., no) interpretations in **I**. Two Σ-terms s and t of the same sort are *T-equivalent* if s ≈ t is T-valid. We write a to denote a tuple (a<sub>1</sub>, ..., a<sub>n</sub>) of elements, with n ≥ 0. Depending on context, we will abuse this notation to also denote the set of the tuple's elements or, in the case of formulas, their conjunction. Similarly, for term tuples s, t of the same length and sort, we write s ≈ t to denote the conjunction of equalities between their respective elements.

## **2 Proof-Production Architecture**

Our proof-production architecture is intertwined with the CDCL(T) architecture [43], as shown in Fig. 1. Proofs are produced and stored modularly by each solving component, which also checks that they meet the expected proof structure

**Fig. 1.** Flexible proof-production architecture for CDCL(T)-based SMT solvers. In the above, ψᵢ ∈ {φ, L} for each i, with ψᵢ not necessarily distinct from ψᵢ₊₁.

for that component, as described below. Proofs are combined only when needed, via post-processing. The *pre-processor* receives an input formula ϕ and simplifies it in a variety of ways into formulas φ₁, ..., φₙ. For each φᵢ, the pre-processor stores a proof P : ϕ → φᵢ justifying its derivation from ϕ.

The *propositional engine* receives the preprocessed formulas, and its *clausifier* converts them into a conjunctive normal form C₁ ∧ ··· ∧ Cₗ. A proof P : ψ → Cᵢ is stored for each clause Cᵢ, where ψ is a preprocessed formula. Note that several clauses may derive from each formula. Corresponding propositional clauses C₁ᵖ, ..., Cₗᵖ, where first-order atoms are abstracted as Boolean variables, are sent to the SAT solver, which checks their joint satisfiability. The propositional engine enters a loop with the *theory engine*, which considers a set of literals asserted by the SAT solver (corresponding to a model of the propositional clauses) and verifies its satisfiability modulo a *combination of theories* T. If the set is T-unsatisfiable, a lemma L is sent to the propositional engine together with its proof P : L. Note that since lemmas are T-valid, their proofs have no assumptions. The propositional engine stores these proofs and clausifies the lemmas, keeping the respective clausification proofs in the clausifier. The clausified and abstracted lemmas are sent to the SAT solver to block the current model and cause the assertion of a different set of literals, if possible. If no new set is asserted, then all the clauses C₁, ..., Cₘ generated until then are jointly unsatisfiable, and the SAT solver yields a proof P : C₁ ∧ ··· ∧ Cₘ → ⊥. Note that the proof is in terms of the first-order clauses, as are the derivation rules that conclude ⊥ from them. The propositional abstraction does not need to be represented in the proof.

The post-processor of the propositional engine connects the assumptions of the SAT solver proof with the clausifier proofs, building a proof P : φ₁ ∧ ··· ∧ φₙ → ⊥. Since theory lemmas are T-valid, the resulting proof has only preprocessed formulas as assumptions. The final proof is built by the SMT solver's post-processor, which combines this proof with the preprocessing proofs P : ϕ → φᵢ. The resulting proof P : ϕ → ⊥ justifies the T-unsatisfiability of the input formula.
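The composition step above can be illustrated with a small sketch. This is not cvc5 code: the node shapes, rule names, and helper functions are our own toy encoding, where a proof node is a triple (rule, children, conclusion) and an open leaf is an `assume` node.

```python
# Toy sketch of connecting component proofs: each post-processing phase
# discharges the free assumptions of one proof with the proofs of another.
# A proof node is (rule, children, conclusion); an open leaf is
# ("assume", [], phi). All names here are illustrative, not cvc5's API.

def plug(proof, assumption, subproof):
    """Replace every assumption leaf concluding `assumption` by `subproof`,
    thereby discharging that assumption."""
    rule, children, concl = proof
    if rule == "assume" and concl == assumption:
        return subproof
    return (rule, [plug(c, assumption, subproof) for c in children], concl)

def assumptions(proof):
    """Formulas still assumed (not derived) by the proof."""
    rule, children, concl = proof
    if rule == "assume":
        return {concl}
    return set().union(set(), *(assumptions(c) for c in children))

# Toy pipeline: a SAT proof of bottom from clause C1, a clausification proof
# of C1 from preprocessed phi1, and a preprocessing proof of phi1 from input.
sat_proof = ("resolution", [("assume", [], "C1")], "bot")
claus_proof = ("clausify", [("assume", [], "phi1")], "C1")
pre_proof = ("preprocess", [("assume", [], "input")], "phi1")

final = plug(plug(sat_proof, "C1", claus_proof), "phi1", pre_proof)
# The only remaining assumption is the original input formula.
```

After both `plug` calls, `assumptions(final)` contains only the input formula, mirroring how the final proof P : ϕ → ⊥ has the input as its sole assumption.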

## **3 The Internal Proof Calculus**

In this section, we specify how proofs are represented in the internal calculus of cvc5. We also provide some low-level details on how proofs are constructed and managed in our implementation.

The proof rules of the internal calculus are similar to rules in other calculi for ground first-order formulas, except that they are made a little more operational by optionally having *argument* terms and *side conditions*. Each rule has the form

$$\mathsf{r}\ \frac{\varphi_1 \ \cdots \ \varphi_n}{\psi} \qquad \text{or} \qquad \mathsf{r}\ \frac{\varphi_1 \ \cdots \ \varphi_n \ \mid\ t_1, \ldots, t_m}{\psi}\ \text{ if } C$$

with *identifier* r, *premises* ϕ₁, ..., ϕₙ, *arguments* t₁, ..., tₘ, *conclusion* ψ, and *side condition* C. The argument terms are used to construct the conclusion from the premises and can be used, together with the premises, in the side condition.

### **3.1 Proof Checkers and Proofs**

The semantics of each proof rule r is provided operationally in terms of a *proof-rule checker* for r. This is a procedure that takes as input a list of argument terms t̄ and a list of premises ϕ̄ for r. It returns fail if the input is malformed, i.e., it does not match the rule's arguments and premises or does not satisfy the side condition. Otherwise, it returns a conclusion formula ψ expressing the result of applying the rule. All proof rules of the internal calculus have an associated proof-rule checker. We say that a proof rule *proves* a formula ψ, from given arguments and premises, if its checker returns ψ.
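As a concrete illustration, here is a minimal sketch of a proof-rule checker for the `trans` rule of Fig. 2. The encoding is our own assumption, not cvc5's internals: terms are tuples, and an equality r ≈ t is represented as `("eq", r, t)`.

```python
# Sketch of a proof-rule checker in the style described above, for the
# `trans` rule: from r ≈ s and s ≈ t, conclude r ≈ t (no arguments).
# Toy encoding: an equality r ≈ t is the tuple ("eq", r, t).

FAIL = None  # returned for malformed input, standing in for "fail"

def check_trans(premises, args):
    if args or len(premises) != 2:
        return FAIL
    p1, p2 = premises
    # Both premises must be equalities sharing the middle term s.
    if p1[0] != "eq" or p2[0] != "eq" or p1[2] != p2[1]:
        return FAIL
    return ("eq", p1[1], p2[2])  # the conclusion r ≈ t
```

For example, `check_trans([("eq", "a", "b"), ("eq", "b", "c")], [])` yields `("eq", "a", "c")`, while mismatched or missing premises yield `FAIL`.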

cvc5 has an internal proof checker built modularly out of the individual proof-rule checkers. This checker is meant mostly for internal debugging during development, to help guarantee that the constructed proofs are correct. The expectation is that users will rely instead on third-party tools to check the proof certificates emitted by the solver.

A proof object is constructed internally using a data structure that we describe abstractly here and call a *proof node*. This is a triple (r, N̄, t̄) consisting of a rule identifier r; a sequence N̄ of proof nodes, its *children*; and a sequence t̄ of terms, its *arguments*. The relationship between proof nodes and their children induces a directed graph over proof nodes, with edges from proof nodes to their children. We call a graph with a single root node N a *proof*. A proof P is

$$
\begin{array}{lll}
\textsf{refl}\ \dfrac{\cdot \ \mid\ t}{t \approx t} &
\textsf{trans}\ \dfrac{r \approx s \quad s \approx t}{r \approx t} &
\textsf{cong}\ \dfrac{\bar{s} \approx \bar{t}\ \mid\ f}{f(\bar{s}) \approx f(\bar{t})}\ \text{if } f(\bar{s}) \text{ is well sorted} \\[2ex]
\textsf{symm}\ \dfrac{s \approx t}{t \approx s} &
\multicolumn{2}{l}{\textsf{sr}\ \dfrac{\varphi \quad \bar{\varphi}\ \mid\ \mathcal{S}, \mathcal{R}, \mathcal{D}, \psi}{\psi}\ \text{if } \mathcal{S}(\varphi, \mathcal{D}(\bar{\varphi})){\uparrow}{\downarrow_{\mathcal{R}}} = \mathcal{S}(\psi, \mathcal{D}(\bar{\varphi})){\uparrow}{\downarrow_{\mathcal{R}}}} \\[2ex]
\textsf{eq\ res}\ \dfrac{\varphi \quad \varphi \approx \psi}{\psi} &
\textsf{atom\ rewrite}\ \dfrac{\cdot\ \mid\ \mathcal{R}, s}{s \approx t}\ \text{if } s{\downarrow_{\mathcal{R}}} = t &
\textsf{witness}\ \dfrac{\cdot\ \mid\ k}{k \approx k{\uparrow}} \\[2ex]
\textsf{assume}\ \dfrac{\cdot\ \mid\ \varphi}{\varphi} &
\multicolumn{2}{l}{\textsf{scope}\ \dfrac{\varphi\ \mid\ \varphi_1, \ldots, \varphi_n}{\varphi_1 \wedge \cdots \wedge \varphi_n \Rightarrow \varphi}}
\end{array}
$$

**Fig. 2.** Core proof rules of the internal calculus.

*well-formed* if it is finite, acyclic, and there is a total mapping Ψ from the nodes of P to formulas such that, for each node N = (r, (N₁, ..., Nₘ), t̄), Ψ(N) is the formula returned by the proof checker for rule r when given premises Ψ(N₁), ..., Ψ(Nₘ) and arguments t̄. For a well-formed proof P with root N and mapping Ψ, the *conclusion* of P is the formula Ψ(N); a *subproof* of P is any proof rooted at a descendant of N in P. For convenience, from now on we identify a well-formed proof with its root node.
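The well-formedness check can be sketched as a recursive walk over proof nodes, dispatching on a registry of proof-rule checkers. Everything here is an illustrative toy encoding, not cvc5's implementation: a node is `(rule, children, args)` and a checker returns `None` for malformed input.

```python
# Sketch of checking well-formedness: compute the conclusion Psi(N) of each
# node bottom-up via the rule's checker; fail if any checker rejects.
# The cache lets shared subproofs (the DAG case) be checked only once.

def conclusion_of(node, checkers, cache=None):
    cache = {} if cache is None else cache
    key = id(node)
    if key in cache:
        return cache[key]
    rule, children, args = node
    premises = [conclusion_of(c, checkers, cache) for c in children]
    psi = checkers[rule](premises, args)
    if psi is None:
        raise ValueError("ill-formed application of " + rule)
    cache[key] = psi
    return psi

# Hypothetical example checkers, with an equality s ≈ t encoded as
# ("eq", s, t):
CHECKERS = {
    # assume: no premises; its single argument is the formula it introduces
    "assume": lambda prem, args: args[0] if not prem and len(args) == 1 else None,
    # symm: from s ≈ t conclude t ≈ s; no arguments
    "symm": lambda prem, args: (
        ("eq", prem[0][2], prem[0][1])
        if not args and len(prem) == 1 and prem[0][0] == "eq"
        else None),
}
```

For instance, the proof `("symm", [("assume", [], [("eq", "a", "b")])], [])` is well-formed with conclusion b ≈ a, whereas a `symm` node with no children raises an error.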

### **3.2 Core Proof Rules**

In total, the internal calculus of cvc5 consists of 155 proof rules,<sup>1</sup> which cover all reasoning performed by the SMT solver, including theory-specific rules, rules for Boolean reasoning, and others. In the remainder of this section, we describe the *core* rules of the internal calculus, which are used throughout the system, and are illustrated in Fig. 2.

**Proof Rules for Equality.** Many theory solvers in cvc5 perform theory-specific reasoning on top of basic equational reasoning. The latter is captured by the proof rules eq res, refl, symm, trans, and cong. The first rule is used to prove a formula ψ from a formula ϕ that was proved equivalent to ψ. The rest are the standard rules for computing the congruence closure of a set of term equalities.

**Proof Rules for Rewriting, Substitution and Witness Forms.** A single *coarse-grained* rule, sr, is used for tracking justifications for core utilities in the SMT solver such as *rewriting* and *substitution*. This rule, together with other non-core rules with side conditions (omitted for brevity), allows the generation of coarse-grained proofs that trust the correctness of complex side conditions. Those conditions involve rewriting and substitution operations performed by cvc5 during solving. More fine-grained proofs can be constructed from coarse-grained ones by justifying the various rewriting and substitution steps in terms of simpler proof rules. This is done with the aid of the equality rules mentioned above and the additional core rules atom rewrite and witness. To describe atom rewrite, witness, and sr, we first need to introduce some definitions and notations.

<sup>1</sup> See https://cvc5.github.io/docs/cvc5-1.0.0/proofs/proof_rules.html.

A *rewriter* R is a function over terms that preserves equivalence in the background theory T, i.e., it returns a term t↓<sub>R</sub> that is T-equivalent to its input t. We call t↓<sub>R</sub> the *rewritten form* of t with respect to R. Currently, cvc5 uses a handful of specialized rewriters for various purposes, such as evaluating constant terms, preprocessing input formulas, and normalizing terms during solving. Each individual rewrite step executed by a rewriter R is justified in fine-grained proofs by an application of the rule atom rewrite, which takes as arguments both (an identifier for) R and the term s the rewrite was applied to. Note that the rule's soundness requires that the rewrite step be equivalence preserving.

A *(term) substitution* σ is a finite sequence (t₁ → s₁, ..., tₙ → sₙ) of oriented pairs of terms of the same sort. A *substitution method* S is a function that takes a term r and a substitution σ and returns a new term that is the result of *applying* σ to r, according to some strategy. We write S(r, σ) to denote the resulting term. We distinguish three kinds of substitution methods for σ: *simultaneous*, which returns the term obtained by simultaneously replacing every occurrence of the term tᵢ in r with sᵢ, for i = 1, ..., n; *sequential*, which splits σ into the n substitutions (t₁ → s₁), ..., (tₙ → sₙ) and applies them in sequence to r using the simultaneous strategy above; and *fixed-point*, which, starting with r, repeatedly applies σ with the simultaneous strategy until no further subterm replacements are possible. For example, consider the application S(y, (x → u, y → f(z), z → g(x))). The steps the substitution method takes in computing its result are the following: y ⇝ f(z) if S is simultaneous; y ⇝ f(z) ⇝ f(g(x)) if S is sequential; and y ⇝ f(z) ⇝ f(g(x)) ⇝ f(g(u)) if S is fixed-point.
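The three substitution methods can be sketched on a toy term representation (our own, for illustration): a term is a string (variable or constant) or a tuple `(f, t1, ..., tn)`, and a substitution is a list of `(lhs, rhs)` pairs.

```python
# Sketches of the three substitution methods described above.

def apply_simultaneous(term, sigma):
    """Replace each occurrence of a t_i by s_i in one pass, without
    re-examining the replacement terms themselves."""
    for t, s in sigma:
        if term == t:
            return s
    if isinstance(term, tuple):
        return (term[0],) + tuple(apply_simultaneous(a, sigma) for a in term[1:])
    return term

def apply_sequential(term, sigma):
    """Apply the n singleton substitutions one after the other."""
    for t, s in sigma:
        term = apply_simultaneous(term, [(t, s)])
    return term

def apply_fixed_point(term, sigma):
    """Repeat the simultaneous pass until nothing changes.
    Assumes sigma admits a fixed point (no cyclic replacements)."""
    while True:
        new = apply_simultaneous(term, sigma)
        if new == term:
            return term
        term = new
```

On the example above, with σ = (x → u, y → f(z), z → g(x)), the simultaneous method maps y to f(z), the sequential method to f(g(x)), and the fixed-point method to f(g(u)).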

In cvc5, we use a *substitution derivation method* D to derive a *contextual* substitution (t₁ → s₁, ..., tₙ → sₙ) from a collection ϕ̄ of derived formulas. The substitution essentially orients a selection of term equalities tᵢ ≈ sᵢ entailed by ϕ̄ and, as such, can be applied soundly to formulas derived from ϕ̄.<sup>2</sup> We write D(ϕ̄) to denote the substitution computed by D from ϕ̄.

Finally, cvc5 often introduces fresh variables, or *Skolem* variables, which are implicitly globally existentially quantified. This happens as a consequence of Skolemization of existential variables, lifting of if-then-else terms, and some kinds of flattening. Each Skolem variable k is associated with a term k↑ of the same sort containing no Skolem variables, called its *witness term*. This global map from Skolem variables to their witness terms allows cvc5 to detect when two Skolem variables can be equated, as a consequence of their respective witness terms becoming equivalent in the current context [47]. Witness terms can also be used to eliminate Skolem variables at proof output time. We write t↑ to denote the *witness form* of the term t, which is obtained by replacing every Skolem variable in t by its witness term. For example, if k₁ and k₂ are Skolem variables with associated witness terms ite(x ≈ z, y, z) and y − z, respectively, and ϕ is the formula ite(x ≈ k₂, k₁ ≈ y, k₁ ≈ z), the witness form ϕ↑ of ϕ is the formula ite(x ≈ y − z, ite(x ≈ z, y, z) ≈ y, ite(x ≈ z, y, z) ≈ z). When a Skolem variable k

<sup>2</sup> Observe that substitutions are generated dynamically from the formulas being processed, whereas rewrite rules are hard-coded in cvc5's rewriters.

appears in a proof, the witness proof rule is used to explicitly constrain its value to be the same as that of the term k↑ it abstracts.<sup>3</sup>
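Computing the witness form is a straightforward structural replacement. The sketch below uses the same toy term encoding as before (strings and tuples; the `skolems` map is our own illustrative stand-in for cvc5's global map); since witness terms contain no Skolem variables, a single pass suffices.

```python
# Sketch of computing the witness form t↑: replace every Skolem variable
# in t by its associated witness term.

def witness_form(term, skolems):
    if isinstance(term, str):
        return skolems.get(term, term)  # Skolems are replaced, others kept
    # Keep the function symbol, recurse into the arguments.
    return (term[0],) + tuple(witness_form(a, skolems) for a in term[1:])
```

On the running example, with k₁ ↦ ite(x ≈ z, y, z) and k₂ ↦ y − z, the witness form of ite(x ≈ k₂, k₁ ≈ y, k₁ ≈ z) replaces both Skolems as described in the text.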

We can now explain the sr proof rule, which is parameterized by a substitution method S, a rewriter R, and a substitution derivation method D. The rule is used to transform a proof of a formula ϕ into one of a formula ψ, provided that the two formulas are equal up to rewriting under a substitution derived from the premises ϕ̄. Note that this rule is quite general because its conclusion ψ, which is provided as an argument, can be any formula that satisfies the side condition.

**Proof Rules for Scoped Reasoning.** Two of the core proof rules, assume and scope, enable local reasoning. Together they achieve the effect of the ⇒ introduction rule of Natural Deduction. However, separating the local assumption functionality in assume provides more flexibility. That rule has no premises and introduces a local assumption ϕ provided as an argument. The scope rule is used to *close the scope* of the local assumptions ϕ₁, ..., ϕₙ made to prove a formula ϕ, inferring the formula ϕ₁ ∧ ··· ∧ ϕₙ ⇒ ϕ.

We say that ϕ is a *free assumption* in proof P if P has a node (assume, (), ϕ) that is not a subproof of a scope node with ϕ as one of its arguments. A proof is *closed* if it has no free assumptions, and *open* otherwise.
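The free-assumption definition can be sketched directly: an assume leaf is free unless some enclosing scope node lists its formula among the arguments. The node encoding `(rule, children, args)` is our own toy representation, not cvc5's.

```python
# Sketch of computing free assumptions in the presence of scope nodes.

def free_assumptions(node, bound=frozenset()):
    rule, children, args = node
    if rule == "assume":
        # Free only if no enclosing scope bound this formula.
        return set() if args[0] in bound else {args[0]}
    if rule == "scope":
        bound = bound | frozenset(args)  # arguments are the closed formulas
    out = set()
    for c in children:
        out |= free_assumptions(c, bound)
    return out

def is_closed(node):
    return not free_assumptions(node)
```

For example, a proof with assume leaves for p and q is open, but wrapping it in a scope node with arguments (p, q) closes it; a scope listing only p leaves q free.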

**Soundness.** All proof rules other than assume are *sound* with respect to the background theory T in the following sense: if a rule proves a formula ψ from premises ϕ̄, every model of T that satisfies ϕ̄ and assigns the same value to each Skolem variable and its respective witness term also satisfies ψ. Based on this and a simple structural induction argument, one can show that well-formed closed proofs have T-valid conclusions. In contrast, open proofs have conclusions that are T-valid only under assumptions. More precisely, in general, if ϕ̄ are all the free assumptions of a well-formed proof P with conclusion ψ, and k̄ are all the Skolem variables introduced in P, then k̄ ≈ k̄↑ ∧ ϕ̄ ⇒ ψ is T-valid.

### **3.3 Constructing Proof Nodes**

We have implemented a library of *proof generators* that encapsulates common patterns for constructing proof nodes. We assume a method getProof that takes a proof generator g and a formula ϕ as input and returns a proof node with conclusion ϕ based on the information in g. During solving, cvc5 uses a combination of *eager* and *lazy* proof generation. In general terms, eager proof generation involves constructing proof nodes for inference steps at the time those steps are taken during solving. Eager proof generation may be required if the computation state pertinent to that inference cannot be easily recovered later. In contrast, lazy proof generation occurs for inferred formulas associated with proof generators that can do internal bookkeeping to be able to construct proof nodes for the formula *after* solving is completed. Depending on the formula, different kinds of proof generators are used. For brevity, we only describe in detail (see Sect. 4)

<sup>3</sup> The proof rules that account for the introduction of Skolem variables in the first place are not part of the core set and so are not discussed here.

**Algorithm 1.** Proof generation for term-conversion generators, rewrite-once policy. B is a lazy proof builder, R is a map from terms to their converted forms, and c<sub>pre</sub> and c<sub>post</sub> are sets of pairs of equalities and the proof generators justifying them.

getProof(g, ϕ), where g contains c<sub>pre</sub>, c<sub>post</sub> and ϕ is t₁ ≈ t₂
1: B := ∅, R := ∅
2: getTermConv(t₁, c<sub>pre</sub>, c<sub>post</sub>, B, R)
3: **if** R[t₁] ≠ t₂ **then** fail **else return** getProof(B, t₁ ≈ R[t₁])

getTermConv(s, c<sub>pre</sub>, c<sub>post</sub>, B, R), where s = f(s₁, ..., sₙ)
1: **if** s ∈ dom(R) **then return**
2: **if** (s ≈ s′, g′) ∈ c<sub>pre</sub> for some s′, g′ **then**
3:  R[s] := s′, addLazyStep(B, s ≈ s′, g′)
4:  **return**
5: **for** 1 ≤ i ≤ n **do** getTermConv(sᵢ, c<sub>pre</sub>, c<sub>post</sub>, B, R)
6: R[s] := r, where r = f(R[s₁], ..., R[sₙ])
7: **if** s ≠ r **then** addStep(B, cong, (s₁ ≈ R[s₁], ..., sₙ ≈ R[sₙ]), f)
8: **else** addStep(B, refl, (), s ≈ s)
9: **if** (r ≈ r′, g′) ∈ c<sub>post</sub> for some r′, g′ **then**
10:  R[s] := r′, addLazyStep(B, r ≈ r′, g′), addStep(B, trans, (s ≈ r, r ≈ r′), ())

the proof generator most relevant to the core calculus, the *term-conversion proof generator*, targeted for substitution and rewriting proofs.

## **4 Proof Reconstruction for Substitution and Rewriting**

Once it determines that the input formulas ϕ₁, ..., ϕₙ are jointly unsatisfiable, the SMT solver has a reference to a proof node P that concludes ⊥ from the free assumptions ϕ₁, ..., ϕₙ. After the post-processor is run, the (closed) proof (scope, (P′), (ϕ₁, ..., ϕₙ)) is generated as the final proof for the user, where P′ is the result of optionally expanding coarse-grained steps (in particular, applications of the rule sr) in P into fine-grained ones. To do so, we require the following algorithm for generating *term-conversion* proofs.

In particular, we focus on equalities t ≈ s whose proof can be justified by a set of steps that replace subterms of t until it is syntactically equal to s. We assume these steps are provided to a *term-conversion proof generator*. Formally, a term-conversion proof generator g is a pair of sets c<sub>pre</sub> and c<sub>post</sub>. The set c<sub>pre</sub> (resp., c<sub>post</sub>) contains pairs of the form (t ≈ s, g<sub>t,s</sub>) indicating that t should be replaced by s in a preorder (resp., postorder) traversal of the terms that g processes, where g<sub>t,s</sub> is a proof generator that can prove the equality t ≈ s. We require that neither c<sub>pre</sub> nor c<sub>post</sub> contain multiple entries of the form (t ≈ s₁, g₁) and (t ≈ s₂, g₂) for distinct (s₁, g₁) and (s₂, g₂).

The procedure for generating proofs from a term-conversion proof generator g is given in Algorithm 1. When asked to prove an equality t₁ ≈ t₂, getProof traverses the structure of t₁ and applies steps from the sets c<sub>pre</sub> and c<sub>post</sub> of g. The traversal is performed by the auxiliary procedure getTermConv, which relies on two data structures. The first is a *lazy proof builder* B that stores the intermediate steps in the overall proof of t₁ ≈ t₂. The proof builder is given these steps either via addStep, as a concrete triple with the proof rule, a list of premise formulas, and a list of argument terms, or as a *lazy* step via addLazyStep, with a formula and a reference to another generator that can prove that formula. The second data structure is a mapping R from terms to terms that is updated (using array syntax in the pseudocode) as the converted forms of terms are computed by getTermConv. For any term s, executing getTermConv(s, c<sub>pre</sub>, c<sub>post</sub>, B, R) results in R[s] containing the converted form of s according to the rewrites in c<sub>pre</sub> and c<sub>post</sub>, and in B storing a proof step for s ≈ R[s]. Thus, the procedure getProof succeeds when, after invoking getTermConv(t₁, c<sub>pre</sub>, c<sub>post</sub>, B, R) with B and R initially empty, the mapping R contains t₂ as the converted form of t₁. The proof for the equality t₁ ≈ R[t₁] can then be constructed by calling getProof on the lazy proof builder B, based on the (lazy) steps stored in it.

Each subterm s of t₁ is traversed only once by getTermConv, by checking whether R already contains the converted form of s. When that is not the case, s is first preorder processed. If c<sub>pre</sub> contains an entry indicating that s rewrites to s′, this rewrite step is added to the lazy proof builder and the converted form R[s] of s is set to s′. Otherwise, the immediate subterms of s, if any, are traversed and then s is postorder processed. The converted form of s is set to a term r of the form f(R[s₁], ..., R[sₙ]), reflecting how its immediate subterms were converted. Note that B will contain steps for each sᵢ ≈ R[sᵢ]. Thus, the equality s ≈ r can be proven by congruence for the function f with these premises if s ≠ r, and by reflexivity otherwise. Furthermore, if c<sub>post</sub> indicates that r rewrites to r′, then this step is added to the lazy proof builder; a transitivity step is added to prove s ≈ r′ from s ≈ r and r ≈ r′; and the converted form R[s] is set to r′.
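A runnable sketch of the rewrite-once conversion follows. The encoding is our own toy version of Algorithm 1: terms are strings or tuples `(f, t1, ..., tn)`, c_pre and c_post are dicts from a term to its replacement and justifying generator, and the lazy proof builder B is simplified to a plain list of abstract steps rather than real proof nodes.

```python
# Sketch of Algorithm 1 (rewrite-once policy) on a toy term encoding.

def get_term_conv(s, c_pre, c_post, B, R):
    if s in R:                            # each subterm is traversed once
        return
    if s in c_pre:                        # preorder rewrite: stop here
        s2, g = c_pre[s]
        R[s] = s2
        B.append(("lazy", s, s2, g))
        return
    if isinstance(s, tuple):              # traverse immediate subterms
        for a in s[1:]:
            get_term_conv(a, c_pre, c_post, B, R)
        r = (s[0],) + tuple(R[a] for a in s[1:])
    else:
        r = s
    R[s] = r
    if s != r:
        B.append(("cong", s, r))          # premises s_i ≈ R[s_i] are in B
    else:
        B.append(("refl", s, s))
    if r in c_post:                       # postorder rewrite
        r2, g = c_post[r]
        R[s] = r2
        B.append(("lazy", r, r2, g))
        B.append(("trans", s, r, r2))

def get_proof(c_pre, c_post, t1, t2):
    """Simplified getProof: returns the list of recorded steps B instead of
    invoking a real lazy proof builder."""
    B, R = [], {}
    get_term_conv(t1, c_pre, c_post, B, R)
    if R[t1] != t2:
        raise ValueError("conversion failed")
    return B
```

Running this on data shaped like Example 1 below (the AC swap in c_pre, the arithmetic collapse to ⊥ in c_post) ends with a transitivity step connecting the congruence result to ⊥.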

*Example 1.* Consider the equality t ≈ ⊥, where t = f(b) + f(a) < f(a − 0) + f(b), and suppose the conversion of t is justified by a term-conversion proof generator g containing the sets c<sub>pre</sub> = {(f(b) + f(a) ≈ f(a) + f(b), g<sup>AC</sup>), (a − 0 ≈ a, g<sub>0</sub><sup>Arith</sup>)} and c<sub>post</sub> = {(f(a) + f(b) < f(a) + f(b) ≈ ⊥, g<sub>1</sub><sup>Arith</sup>)}. The generator g<sup>AC</sup> provides a proof based on associative and commutative reasoning, whereas g<sub>0</sub><sup>Arith</sup> and g<sub>1</sub><sup>Arith</sup> provide proofs based on arithmetic reasoning. Invoking getProof(g, t ≈ ⊥) initiates the traversal with getTermConv(t, c<sub>pre</sub>, c<sub>post</sub>, ∅, ∅). Since t is not in the conversion map, it is preorder processed. However, as it does not occur in c<sub>pre</sub>, nothing is done and its subterms are traversed. The subterm f(b) + f(a) is equated to f(a) + f(b) in c<sub>pre</sub>, justified by g<sup>AC</sup>. Therefore R is updated with R[f(b) + f(a)] = f(a) + f(b) and the respective lazy step is added to B. The subterms of f(b) + f(a) are not traversed, so the next term to be traversed is f(a − 0) + f(b). Since it does not occur in c<sub>pre</sub>, its subterm f(a − 0) is traversed, which analogously leads to the traversal of a − 0. As a − 0 does occur in c<sub>pre</sub>, both R and B are updated accordingly, and the processing of its parent f(a − 0) resumes. A congruence step added to B justifies its conversion to f(a) being added to R. No further additions happen, since f(a) does not occur in c<sub>post</sub>. Analogously, R and B are updated with f(b) not changing and f(a − 0) + f(b) being converted into f(a) + f(b). Finally, the processing returns to the initial term t, which has been converted to R[f(b) + f(a)] < R[f(a − 0) + f(b)], i.e., f(a) + f(b) < f(a) + f(b).
Since this term is equated to ⊥ in c<sub>post</sub>, justified by g<sub>1</sub><sup>Arith</sup>, the respective lazy step is added to B, as well as a transitivity step to connect f(b) + f(a) < f(a − 0) + f(b) ≈ f(a) + f(b) < f(a) + f(b) and f(a) + f(b) < f(a) + f(b) ≈ ⊥. At this point, the execution terminates with R[f(b) + f(a) < f(a − 0) + f(b)] = ⊥, as expected. A proof for t ≈ ⊥ with the following structure can then be extracted from B:

$$
\textsf{trans}\;\dfrac{
  \textsf{cong}\;\dfrac{\textsf{lazy}\;\dfrac{g^{\mathrm{AC}}}{f(b)+f(a) \,\approx\, f(a)+f(b)} \quad P_1 \;\Big|\; {<}}{f(b)+f(a) < f(a-0)+f(b) \;\approx\; f(a)+f(b) < f(a)+f(b)}
  \qquad
  \textsf{lazy}\;\dfrac{g_1^{\mathrm{Arith}}}{f(a)+f(b) < f(a)+f(b) \;\approx\; \bot}
}{
  f(b)+f(a) < f(a-0)+f(b) \;\approx\; \bot
}
$$

$$
P_1:\ \textsf{cong}\;\dfrac{\textsf{cong}\;\dfrac{\textsf{lazy}\;\dfrac{g_0^{\mathrm{Arith}}}{a-0 \,\approx\, a} \;\Big|\; f}{f(a-0) \,\approx\, f(a)} \quad P_2 \;\Big|\; {+}}{f(a-0)+f(b) \;\approx\; f(a)+f(b)}
\qquad
P_2:\ \textsf{refl}\;\dfrac{\cdot \;\mid\; f(b)}{f(b) \,\approx\, f(b)}
$$

We use several extensions of the procedures in Algorithm 1. Notice that these procedures follow the policy that terms on the right-hand side of conversion steps (equalities from c<sub>pre</sub> and c<sub>post</sub>) are not traversed further. The procedure getTermConv is used by term-conversion proof generators that have the *rewrite-once* policy. A similar procedure, which additionally traverses those terms, is used by term-conversion proof generators that have a *rewrite-to-fixpoint* policy.

We now show how the term-conversion proof generator can be used for reconstructing fine-grained proofs from coarse-grained ones. In particular, we focus on proofs P<sub>ψ₁</sub> of the form (sr, (Q<sub>ψ₀</sub>, Q̄), (S, R, D, ψ₁)). Recall from Fig. 2 that the proof rule sr concludes a formula ψ₁ that can be shown equivalent to the formula ψ₀ proven by Q<sub>ψ₀</sub>, based on a substitution derived from the conclusions of the nodes Q̄. A proof like P<sub>ψ₁</sub> above can be transformed into one that involves only (atomic) theory rewrites and equality rules. We show this transformation in two phases. In the first phase, the proof is expanded to:

(eq res, (Q<sub>ψ₀</sub>, (trans, (R₀, (symm, (R₁))))))

with Rᵢ = (trans, ((subs, Q̄<sub>ϕ̄</sub>, (S, D, ψᵢ)), (rewrite, (), (R, S(ψᵢ, D(ϕ̄)))))) for i ∈ {0, 1}, where ϕ̄ are the conclusions of Q̄<sub>ϕ̄</sub>, and subs and rewrite are auxiliary proof rules used for further expansion in the second phase. We describe them next.

*Substitution Steps.* Let P<sub>t≈s</sub> be the subproof (subs, Q̄<sub>ϕ̄</sub>, (S, D, t)) of Rᵢ above, proving t ≈ s with s = S(t, D(ϕ̄)) and D(ϕ̄) = (t₁ → s₁, ..., tₙ → sₙ). Substitution steps can be expanded to fine-grained proofs using a term-conversion proof generator. First, for each j = 1, ..., n, we construct a proof of tⱼ ≈ sⱼ, which involves simple transformations on the proofs of ϕ̄. Suppose we store all of these in an eager proof generator g. If S is a simultaneous or fixed-point substitution, we then build a single term-conversion proof generator C, which, recall, is modeled as a pair of mappings (c<sub>pre</sub>, c<sub>post</sub>). We add (tⱼ ≈ sⱼ, g) to c<sub>pre</sub> for all j. We use the rewrite-once policy for C if S is a simultaneous substitution, and the rewrite-to-fixpoint policy otherwise. We then replace the proof P<sub>t≈s</sub> by getProof(C, t ≈ s), which runs the procedure in Algorithm 1. Otherwise, if S is a sequential substitution, we construct a term-conversion generator Cⱼ for *each* j, initializing it so that its c<sub>pre</sub> set contains the single rewrite step (tⱼ ≈ sⱼ, g) and uses a rewrite-once policy. We then replace the proof P<sub>t≈s</sub> by (trans, (P₁, ..., Pₙ)) where, for j = 1, ..., n, Pⱼ is generated by getProof(Cⱼ, sⱼ₋₁ ≈ sⱼ), with s₀ = t, sᵢ the result of applying the first i steps of the substitution D(ϕ̄), and sₙ = s.

*Rewrite Steps.* Let P be the proof node (rewrite, (), (R, t)), which proves the equality t ≈ t↑↓<sub>R</sub>. During reconstruction, we replace P with a proof involving only fine-grained rules, depending on the rewriter R. For example, if R is the core rewriter, we run the rewriter again on t in proof-tracking mode. Normally, the core rewriter performs a term traversal and applies atomic rewrites to completion. In proof-tracking mode, it also returns two lists, for pre- and post-rewrites, of steps (t₁ ≈ s₁, g), ..., (tₙ ≈ sₙ, g), where g is a proof generator that returns (atom rewrite, (), (R, tᵢ)) for each equality tᵢ ≈ sᵢ. Furthermore, for each Skolem variable k that is a subterm of t, we construct the rewrite step (k ≈ k↑, g′), where g′ is a proof generator that returns (witness, (), (k)) for the equality k ≈ k↑. We add these rewrite proof steps to a term-conversion generator C with the rewrite-to-fixpoint policy, and replace P by getProof(C, t ≈ t↑↓<sub>R</sub>).

## **5 SMT Proofs**

Here we briefly describe each component shown in Sect. 2 and how it produces proofs with the infrastructure from Sects. 3 and 4.

### **5.1 Preprocessing Proofs**

The *pre-processor* transforms an input formula ϕ into a list of formulas to be given to the core solver. It applies a sequence of *preprocessing passes*. A pass may *replace* a formula ϕᵢ with another one φᵢ, in which case it is responsible for providing a proof of ϕᵢ ≈ φᵢ. It may also append a new formula φ to the list, in which case it is responsible for providing a proof for it. We use a (lazy) proof generator that tracks these proofs, maintaining the invariant that a proof can be provided for every (preprocessed) formula when requested. We have instrumented proof production for the most common preprocessing passes, relying heavily on the sr rule to model transformations such as the expansion of function definitions and, with witness forms, Skolemization and if-then-else elimination [6].

*Simplification Under Global Assumptions.* cvc5 aggressively learns literals that hold globally by performing Boolean constraint propagation over the input formula. When a learned literal corresponds to a variable elimination (e.g., x ≈ 5 corresponds to x → 5) or a constant propagation (e.g., P(x) corresponds to P(x) → ⊤), we apply the corresponding (term) substitution to the input. This application is justified via sr, while the derivation of the globally learned literals is justified via clausification and resolution proofs, as explained in Sect. 5.3.

The key features of our architecture that make it feasible to produce proofs for this simplification are the automatic reconstruction of sr steps and the ability to customize the strategy for substitution application during reconstruction, as detailed in Sect. 3.2. When a new variable elimination x → t is learned, old ones need to be normalized to eliminate any occurrences of x in their right-hand sides. Computing the appropriate simultaneous substitution for all eliminations requires quadratically many traversals over those terms. We have observed that the substitutions generated by this preprocessing pass can be very large (with thousands of entries), which makes this computation prohibitively expensive. With the fixed-point strategy, however, the reconstruction of sr steps can apply the substitution efficiently: its complexity depends on how many applications are necessary to reach a fixed point, which is often few in practice.
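The fixed-point strategy can be sketched as follows. This is an illustrative simplification, not cvc5 code: terms are strings, substitution is textual replacement, and `apply_fixed_point` is a hypothetical name. The substitution is applied repeatedly until no domain variable remains, so the entries never need to be normalized against one another beforehand.

```python
def apply_fixed_point(subst, term):
    """Apply a substitution repeatedly until no variable in its domain
    occurs in the term. This avoids pre-normalizing the substitution
    (which costs quadratically many traversals); the work done is
    proportional to the number of applications needed to reach the
    fixed point, often small in practice.

    Simplification: plain substring replacement, so variable names are
    assumed not to overlap with other identifiers.
    """
    changed = True
    while changed:
        changed = False
        for var, rhs in subst.items():
            if var in term:
                term = term.replace(var, rhs)
                changed = True
    return term

# x -> (y + 1) was learned before y -> 5; the first right-hand side
# still mentions y, so a single simultaneous pass would not suffice.
subst = {"x": "(y + 1)", "y": "5"}
assert apply_fixed_point(subst, "P(x)") == "P((5 + 1))"
```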

### **5.2 Theory Proofs**

The theory engine produces lemmas, as disjunctions of literals, from an individual theory or a combination of them. In the first case, the lemma's proof is provided directly by the corresponding theory solver. In the second case, a theory solver may produce a lemma ψ containing a literal ℓ derived by some other theory solver from literals ℓ<sub>1</sub>, …, ℓ<sub>n</sub>. A lemma over the combined theory is generated by replacing ℓ in ψ by ℓ<sub>1</sub> ∧ ⋯ ∧ ℓ<sub>n</sub>. This regression process, which is similar to the computation of *explanations* during solving, is repeated until the lemma contains only input literals. The proof of the final lemma then uses rules like sr to combine the proofs of the intermediate literals derived locally in various theories and their replacement by input literals in the final lemma.

*Equality and Uninterpreted Function (EUF) Proofs.* The EUF solver can be easily instrumented to produce proofs [31,42] with equality rules (see Fig. 2). In cvc5, term equivalences are also derived via rewriting in some other theory T: when a function from T has all of its arguments inferred to be congruent to T-values, it may be rewritten into a T-value itself, and this equivalence asserted. Such equivalences are justified via sr steps. Since generating equality proofs incurs minimal overhead [42] and rewriting proofs are reconstructed lazily, EUF proofs are generated during solving and stored in an eager proof generator.

*Extensional Arrays and Datatypes Proofs.* While these two theories differ significantly, they both combine equality reasoning with rules for handling their particular operators. For arrays, these are rules for select, store, and array extensionality (see [36, Sec. 5]). For datatypes, they are rules reflecting the properties of *constructors* and *selectors*, as well as acyclicity. The justifications for lemmas are also generated eagerly and stored in an eager proof generator.

*Bit-Vector Proofs.* The bit-vector solver applies bit-blasting to reduce bit-vector problems to equisatisfiable propositional problems. Thus, its lemmas amount to the rewriting of the bit-vector literals into Boolean formulas, which will be solved and proved by the propositional engine. The bit-vector lemmas are proven lazily, analogous to sr steps, with the difference that the reconstruction uses the bit-blaster in the bit-vector solver instead of the rewriter.
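As an illustration of what bit-blasting produces (not cvc5's actual encoding): an n-bit equality reduces to a conjunction of per-bit Boolean equivalences, and the resulting formula agrees with the bit-vector semantics on every assignment. The function names here are hypothetical.

```python
from itertools import product

def blast_eq(n):
    """Bit-blast an n-bit equality x = y into per-bit equivalences;
    the resulting lemma is their conjunction (shown as strings)."""
    return [f"(x{i} <-> y{i})" for i in range(n)]

def eval_eq(xbits, ybits):
    """Semantics of the blasted formula: every bit equivalence holds."""
    return all(a == b for a, b in zip(xbits, ybits))

# Exhaustive check for width 2: the Boolean reduction is equisatisfiable
# with (here, equivalent to) the bit-vector equality.
for xb, yb in product(product([0, 1], repeat=2), repeat=2):
    assert eval_eq(xb, yb) == (xb == yb)

assert blast_eq(2) == ["(x0 <-> y0)", "(x1 <-> y1)"]
```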

*Arithmetic Proofs.* The *linear* arithmetic solver is based on the simplex algorithm [24], and each of its lemmas is the negation of an unsatisfiable conjunction of inequalities. Farkas' lemma [30,49] guarantees that there exists a linear combination of these inequalities equivalent to ⊥. The coefficients of the combination are computed during solving with minimal overhead [38], and the equivalence is proven with an sr step. To allow the rewriter to prove this equivalence, the bounds of the inequalities are scaled by constants and summed during reconstruction. Integer reasoning is proved through rules for branching and integer bound tightening, recorded eagerly.
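The checking side of such a Farkas combination can be sketched as follows; this is an illustrative reconstruction, not cvc5 code, and `farkas_check` is a hypothetical name. Given inequalities a<sub>i</sub>·x ≤ b<sub>i</sub> and positive coefficients c<sub>i</sub>, the conjunction is unsatisfiable if the scaled sum cancels every variable and leaves a negative constant bound (0 ≤ b′ with b′ < 0).

```python
from fractions import Fraction

def farkas_check(ineqs, coeffs):
    """Check a Farkas certificate for a set of inequalities a·x <= b.

    Each inequality is given as (coefficient_vector, bound). Scaling by
    the positive coefficients and summing must cancel all variables and
    leave a negative right-hand side, i.e. the contradiction 0 <= b' < 0.
    """
    n = len(ineqs[0][0])
    lhs = [Fraction(0)] * n
    rhs = Fraction(0)
    for (a, b), c in zip(ineqs, coeffs):
        assert c > 0  # Farkas coefficients must be positive
        lhs = [l + c * ai for l, ai in zip(lhs, a)]
        rhs += c * b
    return all(l == 0 for l in lhs) and rhs < 0

# x - y <= -1, y - z <= -1, z - x <= 1 is unsatisfiable; coefficients
# (1, 1, 1) sum the inequalities to 0 <= -1.
ineqs = [([1, -1, 0], -1), ([0, 1, -1], -1), ([-1, 0, 1], 1)]
assert farkas_check(ineqs, [1, 1, 1])
```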

*Non-linear* arithmetic lemmas are generated from incremental linearization [16] or cylindrical algebraic coverings [1]. The former can be proven via propositional and basic arithmetic rules, with only a few, such as the tangent plane lemma, needing a dedicated proof rule. The latter requires two complex rules that are not inherently simpler than solving, albeit not as complex as those for regular CAD-based theory solvers [2]. We point out that checking these rules would require a significant portion of CAD-related theory, whose proper formalization is still an open, if actively researched, problem [18,25,34,41,53].

*Quantifier Proofs.* Quantified formulas not Skolemized during pre-processing are handled via instantiation, which produces theory lemmas of the form (∀x ϕ) <sup>⇒</sup> ϕσ, where σ is a grounding substitution. An instantiation rule proves them independently of how the substitution was actually derived, since any well-typed one suffices for soundness.
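Checking an instantiation step is correspondingly simple; the following sketch is illustrative (terms are strings, and the well-typedness check on the substitution is omitted). Whatever heuristic produced σ, the rule only needs to verify that the conclusion is the implication built from the quantified formula and its instance.

```python
def instantiation_lemma(bound_vars, body, sigma):
    """Build the theory lemma (forall x. phi) => phi[sigma] for a
    grounding substitution sigma. Soundness does not depend on how
    sigma was derived, only on it being well-typed."""
    inst = body
    for v in bound_vars:
        inst = inst.replace(v, sigma[v])  # naive textual substitution
    quant = f"(forall {' '.join(bound_vars)}. {body})"
    return f"({quant} => {inst})"

lemma = instantiation_lemma(["x"], "P(x)", {"x": "f(a)"})
assert lemma == "((forall x. P(x)) => P(f(a)))"
```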

*String Proofs.* The strings solver applies a layered approach, distinguishing between core [40] and extended operators [48]. The core operators consist of (dis)equalities between string concatenations and length constraints. Reasoning over them is proved by a combination of equality and linear integer arithmetic proofs, as well as specific string rules. The extended operators are reduced to core ones via formulas with bounded quantifiers. The reductions are proven with rules defining each extended function's semantics, and sr steps justifying the reductions. Finally, regular membership constraints are handled by string rules that unfold occurrences of the Kleene star operator and split up regular expression concatenations into different parts. Overall, the proofs for the strings theory solver encompass not only string-specific reasoning but also equality, linear integer arithmetic, and quantifier reasoning, as well as substitution and rewriting.

*Unsupported.* The theory solvers for the theories of floating-point arithmetic, sequences, sets and relations, and separation logic are currently not proof-producing in cvc5. These are relatively new or non-standard theories in SMT and have not been our focus, but we intend to produce proofs for them in the future.


**Table 1.** Cumulative solving times (s) on benchmarks solved by all configurations, with the slowdown versus cvc+s in parentheses.

#### **5.3 Propositional Proofs**

Propositional proofs justify both the conversion of preprocessed input formulas and theory lemmas into conjunctive normal form (CNF) and the derivation of ⊥ from the resulting clauses. CNF proofs are a combination of Boolean transformations and introductions of Boolean formulas representing the definition of Tseytin variables, used to ensure that the CNF conversion is polynomial. The clausifier uses a lazy proof builder which stores the clausification steps eagerly, with the preprocessed input formulas as assumptions, and the theory lemmas as lazy steps, with associated proof generators.
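The Tseytin definitions introduced during CNF conversion can be sketched concretely; this is an illustrative example, not the clausifier's code, using DIMACS-style integer literals (negation as unary minus). Defining a fresh variable t for a subformula b ∧ c yields three clauses encoding t ↔ (b ∧ c), keeping the conversion linear in the formula size.

```python
def tseytin_and(t, b, c):
    """Clauses defining a Tseytin variable t <-> (b AND c).
    Literals are ints; negation is unary minus (DIMACS style)."""
    return [[-t, b], [-t, c], [-b, -c, t]]

def satisfies(cnf, assign):
    """Check that every clause has at least one true literal."""
    return all(any(assign[abs(l)] == (l > 0) for l in cl) for cl in cnf)

# CNF of  a OR (b AND c)  with a=1, b=2, c=3 and fresh variable t=4:
cnf = tseytin_and(4, 2, 3) + [[1, 4]]

# Equisatisfiable with the original: any model extends to the fresh var.
assign = {1: False, 2: True, 3: True, 4: True}
assert satisfies(cnf, assign)
```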

For Boolean reasoning, cvc5 uses a version of MiniSat [27] instrumented to produce resolution proofs. It uses a lazy proof builder to record resolution steps for learned clauses as they are derived (see [7, Chap 1] for more details) and to lazily build a refutation with only the resolution steps necessary for deriving ⊥. The resolution rule, however, is ground first-order resolution, since the proofs are in terms of the first-order clauses rather than their propositional abstractions.
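The resolution rule recorded for each learned clause is simple to state; a minimal sketch (illustrative names, clauses as lists of integer literals):

```python
def resolve(c1, c2, lit):
    """Ground resolution: from c1 containing lit and c2 containing -lit,
    derive (c1 \\ {lit}) union (c2 \\ {-lit})."""
    assert lit in c1 and -lit in c2
    return sorted((set(c1) - {lit}) | (set(c2) - {-lit}))

# Refuting the clause set {p}, {-p, q}, {-q} (p=1, q=2):
step1 = resolve([1], [-1, 2], 1)
assert step1 == [2]
assert resolve(step1, [-2], 2) == []  # empty clause, i.e. a refutation
```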

#### **6 Evaluation**

In this section, we discuss an initial evaluation of our implementation in cvc5 of the proof-production architecture presented in this paper. In the following, we denote different configurations of cvc5 by cvc plus some suffixes. A configuration using variable and clause elimination in the SAT solver [26], symmetry breaking [23] in the EUF solver, and black-box SAT solving in the bit-vector (BV) solver, is denoted by the suffix o. These techniques are currently incompatible with the proof production architecture. Other cvc5 techniques for which we do not yet support fine-grained proofs, however, are active and have their inferences registered in the proofs as trusted steps. A configuration that includes simplification under global assumptions is denoted by s; one that includes producing proofs by p; and one that additionally reconstructs proofs by r. The default configuration of cvc5 is cvc+os.

We split our evaluation into measuring the proof-production cost as well as the performance impact of making key techniques proof-producing; the proof reconstruction overhead; and the coverage of the proof production. We also comment on how cvc5's proofs compare with CVC4's proofs. Note that the internal proof checking described in Sect. 3, which was invaluable for a correct implementation, is disabled for evaluating performance. Experiments ran on a cluster with

**Fig. 3.** (a) Cactus plot for non-BVs (b) Cactus plot for BVs (c) Scatter plot of overall proof cost (d) Reconstruction cost

Intel Xeon E5-2620 v4 CPUs, with a 300 s time limit and 8 GB of RAM for each solver-benchmark pair. We consider 162,060 unsatisfiable problems from SMT-LIB [8], across all logics except those with floating-point arithmetic, as determined by cvc5 [5, Sec. 4]. We split them into 38,732 problems with the BV theory (the BVs set) and 123,328 problems without (the non-BVs set).

*Proof Production Cost.* The cost of proof production is summarized in Table 1 and Figs. 3a to 3d. The impact of running without o is negligible overall in non-BVs, but steep for BVs, both in terms of solving time and number of problems solved, as evidenced by the table and Fig. 3b respectively. This is expected given the effectiveness of combining bit-blasting with black-box SAT solvers. The overhead of p is similar for both sets, although more pronounced in BVs. While the total time is around double that of cvc+s, Fig. 3c shows a finer distribution, with most problems having a less significant overhead. Moreover, the total number of problems solved is quite similar, as shown in Figs. 3a and 3b, particularly for non-BVs. The difference in overhead due to p between the BVs and non-BVs sets can be attributed to the cost of managing large proofs, which are more common in BVs. This stems from the well-known blow-up in problem size incurred by bit-blasting, which is reflected in the proofs.

The cost of generating fine-grained steps for the sr rule and for the similarly reconstructed theory-specific steps mentioned in Sect. 5 varies again between the two sets, but more starkly. While for non-BVs the overall solving time and number of problems solved are very similar between cvc+sp and cvc+spr, for the BVs set cvc+spr is significantly slower overall. This difference again arises mainly because of the increased proof sizes. Nevertheless, r leads to only a small increase in unsolved problems in BVs, as shown in Fig. 3b.

The importance of being able to produce proofs for simplification under global assumptions is made clear by Fig. 3a: the impact of disabling s is virtually the same as that of adding p; moreover, cvc+spr significantly outperforms cvc+pr. In Fig. 3b the difference is less pronounced but still noticeable.

*Proofs Coverage.* When using techniques that are not yet fully proof-producing, but still active, cvc5 inserts *trusted steps* in the proof. These are usually steps whose checking is not inherently simpler than solving. They effectively represent holes in the proof, but are still useful for users who avail themselves of powerful proof-checking techniques. Trusted steps are commonly used when integrating SMT solvers into proof assistants [11,28,51].

The percentage of cvc+spr proofs *without* trusted steps is 92% for BVs and 80% for non-BVs. That is to say, out of 145,683 proofs, 120,473 of them are fully fine-grained proofs. The vast majority of the trusted steps in the remaining proofs are due to theory-specific preprocessing passes that are not yet fully proof-producing. In non-BVs, the occurrence of trusted steps is heavily dependent on the specific SMT-LIB logic, as expected. Common offenders are logics with datatypes, with trusted steps for acyclicity checks, and quantified logics, with trusted steps for certain α-equivalence eliminations. In non-linear real arithmetic logics, all cylindrical algebraic coverings proofs are built with trusted steps (see Sect. 5.2), but we note this is the state of the art for CAD-based proofs. As for non-linear integer arithmetic logics, our proof support is still in its early stages, so a significant portion of their theory lemmas are trusted steps.

We stress the extent of our coverage for string proofs, which were previously unsupported by any SMT solver. In the string logics without length constraints, 100% of the proofs are fully fine-grained. This rate goes down to 80% in the logics with length. For the remaining 20%, the overwhelming majority of the trusted steps are for theory-specific preprocessing or some particular string or linear arithmetic inference within the proof of a theory lemma.

*Comparison with CVC4 Proofs.* We compare the proof coverage of cvc5 versus CVC4. The cvc5 proof production replaces CVC4's [32,36], which was incomplete and monolithic. CVC4 did not produce proofs at all for strings, substitutions, rewriting, preprocessing, quantifiers, datatypes, or non-linear arithmetic. In particular, simplification under global assumptions had to be disabled when producing proofs. In fragments supported by both systems, CVC4's proofs are at most as detailed as cvc5's. The only superior aspect of CVC4's proof production was its support for proofs from external SAT solvers [45] used in the BV solver, which are very significant for solving performance, as shown above. Integrating this feature into cvc5 is left as future work, but we note that there is no limitation in the proof architecture that would prevent it. We also point out that cvc5 produces resolution proofs for the bit-blasted BV constraints, which can be checked in polynomial time, whereas external SAT solvers produce DRAT proofs [33] (or reconstructions of them via other tools [19,20,37,39]), which can take exponential time to check. So there is a significant trade-off to be considered.

### **7 Related Work**

Two significant proof-producing state-of-the-art SMT solvers are z3 [22] and veriT [14]. Both can have their proofs successfully reconstructed in proof assistants [3,12,13,51]. They can produce detailed proofs for the propositional and theory reasoning in EUF and linear arithmetic, as well as for quantifiers. However, z3's proofs are coarse-grained for preprocessing, rewriting, and bit-vector reasoning, which complicates proof checking. Moreover, to the best of our knowledge, z3 does not produce proofs for its other theories. In contrast, veriT can produce fine-grained proofs for preprocessing and rewriting [6], which has led to a better integration with Isabelle/HOL [51]. However, it does so eagerly, which requires a tight integration between the preprocessing and the proof-production code. In addition, it does not support simplification under global assumptions when producing proofs, which significantly impacts its performance. Other proof-producing SMT solvers are MathSAT5 [17] and SMTInterpol [15]. They produce resolution proofs and theory proofs for EUF, linear arithmetic, and, in SMTInterpol's case, array theories. Their proofs are tailored towards unsatisfiable core and interpolant generation, rather than external certification. Moreover, they do not seem to provide proofs for preprocessing, clausification, or rewriting.

While cvc5 is possibly the only proof-producing solver for the full theory of strings, CertiStr [35] is a certified solver for the fragment with concatenation and regular expressions. It is automatically generated from Isabelle/HOL [44] but is significantly less performant than cvc5, although a proper comparison would need to account for proof-checking time in cvc5's case.

#### **8 Conclusion and Future Work**

We presented and evaluated a flexible proof production architecture, showing it is capable of producing proofs with varying levels of granularity in a scalable manner for a state-of-the-art and industrial-strength SMT solver like cvc5.

Since there is currently no standard proof format for SMT solvers, our architecture is designed to support multiple proof formats via a final post-processing transformation that converts internal proofs accordingly. We are developing backends for the LFSC [52] proof checker and the proof assistants Lean 4 [21], Isabelle/HOL [44], and Coq [10], the latter two via the Alethe proof format [50]. Since using these tools requires mechanizing the respective target proof calculi in their languages, external checking brings a further benefit: confidence in the soundness of those calculi is decoupled from the internal cvc5 proof calculus.

A considerable challenge for SMT proofs is the plethora of rewrite rules used by the solvers, which are specific to each theory and vary in complexity. In particular, string rewrites can be very involved [46] and hard to check. We are also developing an SMT-LIB-based DSL for specifying rewrite rules, to be used during proof reconstruction to decompose rewrite steps in terms of them, thus providing more fine-grained proofs for rewriting.

Finally, we plan to incorporate into the proof-production architecture the unsupported theories and features mentioned in Sects. 5.2 and 6, particularly those relevant for solving performance that currently either leave holes in proofs, such as theory pre-processing or non-linear arithmetic reasoning, or that have to be disabled, such as the use of external SAT solvers in the BV theory.

# **References**



# **CTL***<sup>∗</sup>* **Model Checking for Data-Aware Dynamic Systems with Arithmetic**

Paolo Felli, Marco Montali, and Sarah Winkler

Free University of Bolzano-Bozen, Bolzano, Italy {pfelli,montali,winkler}@inf.unibz.it

**Abstract.** The analysis of complex dynamic systems is a core research topic in formal methods and AI, and combined modelling of systems with data has gained increasing importance in applications such as business process management. In addition, process mining techniques are nowadays used to automatically mine process models from event data, often without correctness guarantees. Thus verification techniques for linear and branching time properties are needed to ensure desired behavior.

Here we consider data-aware dynamic systems with arithmetic (DDSAs), which constitute a concise but expressive formalism of transition systems with linear arithmetic guards. We present a CTL<sup>∗</sup> model checking procedure for DDSAs that addresses a generalization of the classical verification problem, namely to compute conditions on the initial state, called *witness maps*, under which the desired property holds. Linear-time verification was shown to be decidable for specific classes of DDSAs where the constraint language or the control flow is suitably confined. We investigate several of these restrictions for the case of CTL<sup>∗</sup>, with both positive and negative results: witness maps can always be found for monotonicity and integer periodicity constraint systems, but verification of bounded lookback systems is undecidable. To demonstrate the feasibility of our approach, we implemented it in an SMT-based prototype, showing that many practical business process models can be effectively analyzed.

**Keywords:** Verification · CTL<sup>∗</sup> · Counter systems · Constraints · SMT

## **1 Introduction**

The study of complex dynamic systems is a core research topic in AI, with a long tradition in formal methods. It finds application in a variety of domains, notably business process management (BPM), where studying the interplay between control flow and data has gained momentum [9,10,24,46]. Processes are increasingly mined by automatic techniques [1,3] that lack any correctness guarantees, making verification even more important to ensure the desired behavior.

© The Author(s) 2022

This work is partially supported by the UNIBZ projects DaCoMan, QUEST, SMART-APP, VERBA, and WineId.

J. Blanchette et al. (Eds.): IJCAR 2022, LNAI 13385, pp. 36–56, 2022. https://doi.org/10.1007/978-3-031-10769-6\_4

However, the presence of data pushes verification to the verge of undecidability due to an infinite state space. This is aggravated by the use of arithmetic, in spite of its importance for practical applications [24]. Indeed, model checking of transition systems operating on numeric data variables with arithmetic constraints is known to be undecidable, as it is easy to model a two-counter machine.

In this work, we focus on the concise but expressive framework of data-aware dynamic systems with arithmetic (DDSAs) [28,38], also known as counter systems [13,20,34]. Several classes of DDSAs have been isolated where specific verification tasks are decidable, notably reachability [6,13,29,34] and linear-time model checking [14,20,22,28,38]. Fewer results are known about the case of branching time, except for flat counter systems [21], gap-order systems where constraints are restricted to the form x − y ≥ k [8,42], and systems with a *nice symbolic valuation abstraction* [31]. However, many processes in BPM and beyond fall into neither of these classes, as illustrated by the next example.

*Example 1.* The following DDSA B models a management process for road fines by the Italian police [41]. It maintains eight so-called *case data* variables (i.e., variables local to each process instance, called a "case" in the BPM literature): a (amount), t (total amount), d (dismissal code), p (points deducted), e (expenses), and time durations *ds*, *dp*, *dj*. The process starts by creating a case, upon which the offender is notified within 90 days, i.e., 2160 h (send fine). If the offender pays a sufficient amount t, the process terminates via silent actions τ1, τ2, or τ3. For the less happy paths, the credit collection action is triggered if the payment was insufficient, while appeal to judge and appeal to prefecture reflect protests filed by the offender, which again need to respect certain time constraints.

This model was generated from real-life logs by automatic process mining techniques paired with domain knowledge [41], but without any correctness guarantee. For instance, *data-aware soundness* [4,25] requires that the process can always reach a final state from any reachable configuration, expressed by the branching-time property A G E F end. This property is false here, as B can get stuck in state p<sub>7</sub> if d > 1. In addition, process-specific linear-time properties are needed, e.g., that a send fine event is always followed by a sufficient payment (i.e., send fine → F ⟨payment⟩ (t ≥ a), where ⟨α⟩ is the next-step operator via action α).

This example highlights how both linear-time and branching-time verification are needed. In this paper, we present a CTL<sup>∗</sup> model checking algorithm for DDSAs, adopting a finite-trace semantics (CTL<sup>∗</sup><sub>f</sub>) [44] to reflect the nature of processes as in Example 1. More precisely, our approach can synthesize conditions on the initial variable assignment under which a given property χ holds, called *witness maps*. If such a witness map can be found, it is in particular decidable what is more commonly called the *verification problem*, namely whether χ is satisfied in a designated initial configuration. We derive an abstract criterion for the computability of witness maps, which is satisfied by two practical DDSA classes that restrict the constraint language to (a) monotonicity constraints [20,25], i.e., variable-to-variable or variable-to-constant comparisons over Q or R, and (b) integer periodicity constraints [18,22], i.e., variable-to-constant and restricted variable-to-variable comparisons with modulo operators. On the other hand, we show that the verification problem is undecidable for *bounded lookback* systems [28], a control-flow restriction that generalizes *feedback freedom* [14].

In summary, we make the following contributions:


The paper is structured as follows: The rest of this section recapitulates related work. Section 2 compiles preliminaries about DDSAs and CTL<sup>∗</sup><sub>f</sub>. Section 3 is dedicated to LTL with configuration maps, which is used by our model checking procedure in Sect. 4. Based on an abstract termination criterion, (un)decidability results for concrete DDSA classes are given in Sect. 5. We describe our implementation in Sect. 6. Complete proofs and further examples can be found in [27].

*Related work.* Verification of transition systems with arithmetic constraints, also called counter systems, has been studied in many areas including formal methods, database theory, and BPM. Reachability was proven decidable for a variety of classes, e.g., reversal-bounded counter machines [34], finite linear [29], flat [13], and gap-order constraint (GC) systems [6]. Considerable work has also been dedicated to linear-time verification: LTL model checking is decidable for monotonicity constraint (MC) systems [20]. LTL verification is also decidable for integer periodicity constraint (IPC) systems, even with past-time operators [18,22]; and feedback-free systems, for an enriched constraint language referring to a read-only database [14]. DDSAs with MCs are also considered in [25] from the perspective of LTL with a finite-run semantics (LTL<sub>f</sub>), giving a procedure to compute finite, faithful abstractions. LTL<sub>f</sub> is moreover decidable for systems with the abstract *finite summary* property [28], which includes MC, GC, and systems with bounded lookback, where the latter generalizes feedback freedom.

Branching-time verification has been studied less: Decidability of CTL<sup>∗</sup> was proven for flat counter systems with Presburger-definable loop iteration [21], even in NP [19]. Moreover, it was shown that CTL<sup>∗</sup> verification is decidable for pushdown systems, which can model counter systems with a single integer variable [30]. For integer relational automata (IRA), i.e., systems with constraints x ≥ y or x > y and domain Z, CTL model checking is undecidable while the existential and universal fragments of CTL<sup>∗</sup> remain decidable [12]. For GC systems, which extend IRAs to constraints of the form x − y ≥ k, the existential fragment of CTL<sup>∗</sup> is decidable while the universal one is not [8]. A similar dichotomy holds for the EF and EG fragments of CTL [42]. A subclass of IRAs was considered in [7,11], allowing only periodicity and monotonicity constraints. While satisfiability of CTL<sup>∗</sup> was proven decidable, model checking is not (as already shown in [12]), though it is decidable for CEF<sup>+</sup> properties, an extension of the EF fragment [7]. In contrast, rather than restricting temporal operators, we show decidability of model checking under an abstract property of the DDSA and the verified property, which can be guaranteed by suitably restricting the constraint language or the control flow. More closely related is work by Gascon [31], who shows decidability of CTL<sup>∗</sup> model checking for DDSAs that admit a *nice symbolic valuation abstraction*, an abstract property which includes MC and IPC systems. The relationship between our decidability criterion and the property defined by Gascon will need further investigation. Another difference is that we adopt a finite-path semantics for CTL<sup>∗</sup>, as e.g. considered in [47], since for the analysis of real-world processes such as business processes it is sufficient to consider finite traces.
On a high level, our method follows a common approach to CTL∗: the verification property is processed bottom-up, computing solutions for each subproperty. These are then used to solve an equivalent linear-time problem [2, p. 429]. For the latter, we partially rely on earlier work [28].

#### **2 Background**

We start by defining the set of constraints over expressions of sort *int*, *rat*, or *real*, with associated domains dom(*int*) = Z, dom(*rat*) = Q, and dom(*real*) = R.

**Definition 1.** *For a given set of sorted variables* V *, expressions* e<sup>s</sup> *of sort* s *and atoms* a *are defined as follows:*

e<sub>s</sub> *::=* v<sub>s</sub> | k<sub>s</sub> | e<sub>s</sub> + e<sub>s</sub> | e<sub>s</sub> − e<sub>s</sub>
a *::=* e<sub>s</sub> = e<sub>s</sub> | e<sub>s</sub> < e<sub>s</sub> | e<sub>s</sub> ≤ e<sub>s</sub> | e<sub>int</sub> ≡<sub>n</sub> e<sub>int</sub>

*where* k<sub>s</sub> ∈ dom(s)*,* v<sub>s</sub> ∈ V *has sort* s*, and* ≡<sub>n</sub> *denotes equality modulo some* n ∈ N*. A* constraint *is then a quantifier-free Boolean expression over atoms* a*.*

The set of all constraints built from atoms over variables V is denoted by C(V). For instance, x = 1, x < y − z, and x − y = 2 ∧ y = 1 are valid constraints independent of the sort of {x, y, z}, while u ≡<sub>3</sub> v + 1 is a constraint for integer variables u and v. We write Var(ϕ) for the set of variables in a formula ϕ. For an assignment α with domain V that maps variables to values in their domain, and a formula ϕ, we write α |= ϕ if α satisfies ϕ.

We are thus in the realm of SMT with linear arithmetic, which is decidable and admits *quantifier elimination* [45]: if ϕ is a formula in C(X ∪ {y}), thus having free variables X ∪ {y}, there is a quantifier-free ϕ′ with free variables X that is equivalent to ∃y.ϕ, i.e., ϕ′ ≡ ∃y.ϕ, where ≡ denotes logical equivalence.
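A one-variable Fourier–Motzkin step illustrates quantifier elimination for such constraints. This is a sketch restricted to conjunctions of non-strict bounds over Q; the general procedure also handles strict inequalities, disjunctions, and (over Z) the modulo atoms. The function name is illustrative.

```python
from itertools import product

def eliminate(lowers, uppers):
    """Fourier-Motzkin step: eliminate y from the conjunction of bounds
    {l <= y for l in lowers} and {y <= u for u in uppers}. Over Q,
    exists-y holds iff every lower bound is at most every upper bound,
    so the quantifier-free equivalent is the conjunction of all l <= u.
    Bounds are kept symbolic as strings."""
    return [f"{l} <= {u}" for l, u in product(lowers, uppers)]

# exists y. (1 <= y  and  x <= y  and  y <= 3)
# is equivalent to   1 <= 3  and  x <= 3
assert eliminate(["1", "x"], ["3"]) == ["1 <= 3", "x <= 3"]
```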

#### **2.1 Data-Aware Dynamic Systems with Arithmetic**

From now on, V is a fixed, finite set of variables. We consider two disjoint, marked copies of V, denoted V<sup>r</sup> = {v<sup>r</sup> | v ∈ V} and V<sup>w</sup> = {v<sup>w</sup> | v ∈ V}, called the *read* and *write* variables. They will refer to variable values before and after a transition, respectively. We also write V for a vector that orders V in an arbitrary but fixed way, and V<sup>r</sup> and V<sup>w</sup> for vectors ordering V<sup>r</sup> and V<sup>w</sup> in the same way.

**Definition 2.** *A* DDSA B = ⟨B, b<sub>I</sub>, A, T, B<sub>F</sub>, V, α<sub>I</sub>, *guard*⟩ *is a labeled transition system where (i)* B *is a finite set of* control states*, with* b<sub>I</sub> ∈ B *the initial one; (ii)* A *is a set of* actions*; (iii)* T ⊆ B × A × B *is a* transition relation*; (iv)* B<sub>F</sub> ⊆ B *are the* final states*; (v)* V *is the set of* process variables*; (vi)* α<sub>I</sub> *is the* initial variable assignment*; (vii) guard* : A → C(V<sup>r</sup> ∪ V<sup>w</sup>) *specifies* executability constraints *for actions over variables* V<sup>r</sup> ∪ V<sup>w</sup>*.*

*Example 2.* We consider the following DDSAs B, B_bl, and B_ipc, where x, y have domain Q and u, v, s have domain Z. Initial and final states have incoming arrows and double borders, respectively; α_I is not fixed for now.

*(Figure: graphical depictions of the DDSAs B, B_bl, and B_ipc, showing their control states, action-labeled transitions, and guards.)*

The system in Example 1 also represents a DDSA. If state b admits a transition to b′ via action a, namely (b, a, b′) ∈ T, we write b −a→ b′. A *configuration* of B is a pair (b, α) where b ∈ B and α is an assignment with domain V. A *guard assignment* is an assignment β with domain V^r ∪ V^w. For an action a, let write(a) = Var(guard(a)) ∩ V^w. As defined next, an action a transforms a configuration (b, α) into a new configuration (b′, α′) by updating the assignment α according to the action guard, which can at the same time evaluate conditions on the current values of variables and write new values:

**Definition 3.** *A DDSA* B = ⟨B, b_I, A, T, B_F, V, α_I, guard⟩ *admits a* step *from configuration* (b, α) *to* (b′, α′) *via action* a*, denoted* (b, α) −a→ (b′, α′)*, if* b −a→ b′*,* α′(v) = α(v) *for all* v ∈ V \ write(a)*, and the guard assignment* β *given by* β(v^r) = α(v) *and* β(v^w) = α′(v) *for all* v ∈ V *satisfies* β |= guard(a)*.*

For instance, for B in Example 2 and initial assignment α_I(x) = α_I(y) = 0, the initial configuration admits a step (b_1, [x=0, y=0]) −a_1→ (b_2, [x=0, y=3]) with β(x^r) = β(x^w) = β(y^r) = 0 and β(y^w) = 3.
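The step relation of Definition 3 is easy to check mechanically. The sketch below is our own illustration under assumed names; in particular, the guard of a_1 is our guess (y^w = 3), since the text only gives β. Guards are modeled as Python predicates over the combined read/write assignment β:

```python
# A minimal sketch of Definition 3 (not the paper's implementation):
# a step (b, alpha) --a--> (b2, alpha2) is legal iff the control transition
# exists, unwritten variables keep their value (inertia), and the guard
# holds on the guard assignment beta built from read/write copies.

def step_allowed(trans, guards, writes, b, alpha, a, b2, alpha2):
    """Check whether (b, alpha) --a--> (b2, alpha2) per Definition 3."""
    if (b, a, b2) not in trans:
        return False
    # variables not written by `a` must keep their value (inertia)
    if any(alpha2[v] != alpha[v] for v in alpha if v not in writes[a]):
        return False
    # build the guard assignment beta over read/write copies of V
    beta = {f"{v}_r": alpha[v] for v in alpha}
    beta.update({f"{v}_w": alpha2[v] for v in alpha2})
    return guards[a](beta)

# Example 2's step (b1, [x=0,y=0]) --a1--> (b2, [x=0,y=3]),
# assuming (hypothetically) that guard(a1) is y^w = 3.
trans = {("b1", "a1", "b2")}
guards = {"a1": lambda beta: beta["y_w"] == 3}
writes = {"a1": {"y"}}
print(step_allowed(trans, guards, writes,
                   "b1", {"x": 0, "y": 0}, "a1", "b2", {"x": 0, "y": 3}))
```

Note how β here matches the text: β(x^r) = β(x^w) = β(y^r) = 0 and β(y^w) = 3.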

A *run* ρ of a DDSA B of length n from configuration (b, α) is a sequence of steps ρ: (b, α) = (b_0, α_0) −a_1→ (b_1, α_1) −a_2→ ... −a_n→ (b_n, α_n). We also associate with ρ the *symbolic run* σ: b_0 −a_1→ b_1 −a_2→ ... −a_n→ b_n, where state and action sequences are recorded without assignments, and say that σ is the *abstraction* of ρ (or that σ *abstracts* ρ). For m < n, σ|_m denotes the prefix of σ that has m steps.

#### **2.2 History Constraints**

In this section, we fix a DDSA B = ⟨B, b_I, A, T, B_F, V, α_I, guard⟩. We aim to build an abstraction of B that covers the (potentially infinite) set of configurations by finitely many nodes of the form (b, ϕ), where b ∈ B is a control state and ϕ a formula that expresses conditions on the variables V. A node (b, ϕ) thus represents all configurations (b, α) s.t. α |= ϕ. To express how such a formula ϕ is modified by executing an action, let the *transition formula* of action a be Δ_a(V̄^r, V̄^w) = guard(a) ∧ ⋀_{v ∈ V \ write(a)} v^w = v^r. This states conditions on variables before and after executing a: guard(a) must hold, and the values of all variables that are not written are propagated by inertia. We write Δ_a(X̄, Ȳ) for the formula obtained from Δ_a by replacing V̄^r by X̄ and V̄^w by Ȳ. A variable vector Ū is a *fresh copy* of V̄ if it has the same length as V̄ and U ∩ V = ∅. To mimic steps on the abstract level, we define the following *update* function:

**Definition 4.** *For a formula* ϕ *with free variables* V *and an action* a*,* update(ϕ, a) = ∃Ū. ϕ(Ū) ∧ Δ_a(Ū, V̄)*, where* Ū *is a fresh copy of* V̄*.*
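To see what *update* computes, one can make Definition 4 executable over a toy *finite* domain, where the existential quantifier becomes enumeration over the quantified copy; the domain, variables, and the action below are illustrative assumptions, not from the paper:

```python
# An executable reading of Definition 4 (a sketch, not the paper's symbolic
# algorithm): over a small finite domain a formula is represented by its set
# of satisfying assignments, and update(phi, a) = ∃U. phi(U) ∧ Δ_a(U, V) is
# the image of that set under all guard-respecting transitions.

from itertools import product

DOMAIN = range(4)          # toy finite domain standing in for Q or Z
VARS = ("x", "y")

def update(phi, delta_a):
    """phi: set of assignments (dicts frozen as sorted item tuples);
    delta_a(old, new) -> bool plays the role of Δ_a(U, V)."""
    result = set()
    for old in phi:                          # the quantified copy U
        for new_vals in product(DOMAIN, repeat=len(VARS)):
            new = dict(zip(VARS, new_vals))  # the free copy V
            if delta_a(dict(old), new):
                result.add(tuple(sorted(new.items())))
    return result

# Δ_a for a hypothetical action that writes y := y + 1 and keeps x by inertia:
delta = lambda old, new: new["y"] == old["y"] + 1 and new["x"] == old["x"]

phi = {tuple(sorted({"x": 1, "y": 0}.items()))}   # the formula "x = 1 ∧ y = 0"
print(update(phi, delta))
```

In the paper's setting the same image is computed symbolically, with quantifier elimination replacing the enumeration.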

Our approach will generate formulas of a special shape called *history constraints* [28], obtained by iterated *update* operations in combination with a sequence of *verification constraints* ϑ. Intuitively, the latter depends on the verification property; for now it suffices to consider ϑ an arbitrary sequence of constraints with free variables V. Its prefix of length k is denoted by ϑ|_k. We need a fixed set of placeholder variables V_0 disjoint from V, and assume an injective variable renaming ν: V → V_0. Let ϕ_ν be the formula ϕ_ν = ⋀_{v ∈ V} v = ν(v).

**Definition 5.** *For a symbolic run* σ: b_0 −a_1→ b_1 −a_2→ ... −a_n→ b_n *and a verification constraint sequence* ϑ = ϑ_0, ..., ϑ_n*, the* history constraint h(σ, ϑ) *is given by* h(σ, ϑ) = ϕ_ν ∧ ϑ_0 *if* n = 0*, and* h(σ, ϑ) = update(h(σ|_{n−1}, ϑ|_{n−1}), a_n) ∧ ϑ_n *if* n > 0*.*

Thus, history constraints are formulas with free variables V ∪ V_0. Satisfying assignments for history constraints are closely related to assignments in runs:<sup>1</sup>

**Lemma 1.** *For a symbolic run* σ: b_0 −a_1→ b_1 −a_2→ ... −a_n→ b_n *and* ϑ = ϑ_0, ..., ϑ_n*,* h(σ, ϑ) *is satisfied by an assignment* α *with domain* V ∪ V_0 *iff* σ *abstracts a run* ρ: (b_0, α_0) −a_1→ ... −a_n→ (b_n, α_n) *such that (i)* α_0(v) = α(ν(v)) *and (ii)* α_n(v) = α(v) *for all* v ∈ V*, and (iii)* α_i |= ϑ_i *for all* i*,* 0 ≤ i ≤ n*.*

<sup>1</sup> Lemma 1 is a slight variation of [28, Lemma 3.5]: Definition 5 differs from history constraints in [28] in that the initial assignment is not fixed. A proof can be found in [27].

#### **2.3 CTL∗_f**

For a DDSA B as above, we consider the following verification properties:

**Definition 6.** *CTL∗_f* state formulas χ *and* path formulas ψ *are defined by the following grammar, for constraints* c ∈ C(V) *and control states* b ∈ B*:*

$$\chi := \top \mid c \mid b \mid \chi \land \chi \mid \neg \chi \mid \mathsf{E}\psi \qquad\quad \psi := \chi \mid \psi \land \psi \mid \neg \psi \mid \mathsf{X}\psi \mid \mathsf{G}\,\psi \mid \psi \,\mathsf{U}\, \psi$$

We use the usual abbreviations F ψ = ⊤ U ψ, χ_1 ∨ χ_2 = ¬(¬χ_1 ∧ ¬χ_2), and Aψ = ¬E¬ψ. To simplify the presentation, we do not explicitly treat next-state operators ⟨a⟩ for a specific action a, as used in Example 1, though this would be possible (cf. [28]). Such an operator can be encoded by adding a fresh data variable x to V, the conjunct x^w = 1 to guard(a) and x^w = 0 to all other guards, and replacing ⟨a⟩ψ in the verification property by X(ψ ∧ x = 1).

The maximal number of nested path quantifiers in a formula ψ is called the *quantifier depth* of ψ, denoted qd(ψ). We adopt a finite-path semantics for CTL∗ [44]: for a control state b ∈ B and a state assignment α, let FRuns(b, α) be the set of *final runs* ρ: (b, α) = (b_0, α_0) −a_1→ ... −a_n→ (b_n, α_n) such that b_n ∈ B_F is a final state. The i-th configuration (b_i, α_i) in ρ is denoted by ρ_i.

**Definition 7.** *The semantics of CTL∗_f is inductively defined as follows. For a DDSA* B *with configuration* (b, α)*, state formulas* χ, χ′*, and path formulas* ψ, ψ′*:*

$$\begin{array}{ll}
(b,\alpha) \models \top \\
(b,\alpha) \models c & \text{iff } \alpha \models c \\
(b,\alpha) \models b' & \text{iff } b = b' \\
(b,\alpha) \models \chi \land \chi' & \text{iff } (b,\alpha) \models \chi \text{ and } (b,\alpha) \models \chi' \\
(b,\alpha) \models \neg\chi & \text{iff } (b,\alpha) \not\models \chi \\
(b,\alpha) \models \mathsf{E}\,\psi & \text{iff } \exists \rho \in \mathit{FRuns}(b,\alpha) \text{ such that } \rho \models \psi
\end{array}$$

*where* ρ |= ψ *iff* ρ, 0 |= ψ *holds, and for a run* ρ *of length* n *and all* i*,* 0 ≤ i ≤ n*:*

$$\begin{array}{ll}
\rho,i \models \chi & \text{iff } \rho_i \models \chi \\
\rho,i \models \neg\psi & \text{iff } \rho,i \not\models \psi \\
\rho,i \models \psi \land \psi' & \text{iff } \rho,i \models \psi \text{ and } \rho,i \models \psi' \\
\rho,i \models \mathsf{X}\psi & \text{iff } i < n \text{ and } \rho,i{+}1 \models \psi \\
\rho,i \models \mathsf{G}\,\psi & \text{iff for all } j,\ i \le j \le n,\ \text{it holds that } \rho,j \models \psi \\
\rho,i \models \psi \,\mathsf{U}\, \psi' & \text{iff } \exists k \text{ with } i+k \le n \text{ such that } \rho,i{+}k \models \psi' \\
& \quad\ \text{and for all } j,\ 0 \le j < k,\ \text{it holds that } \rho,i{+}j \models \psi.
\end{array}$$
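These finite-run clauses translate directly into a small recursive evaluator. The sketch below is our own illustration (not the paper's tooling): path formulas are nested tuples, a run is a list of states, and atoms are predicates on states:

```python
# A direct transcription of the finite-path semantics of Definition 7
# (path-formula part), as a sketch. sat(psi, run, i) follows the clauses
# above: X is a strict next (fails at the last state), G ranges over the
# remaining positions, and U is bounded by the end of the run.

def sat(psi, run, i):
    n = len(run) - 1
    op = psi[0]
    if op == "atom":                       # rho,i |= chi iff chi holds at rho_i
        return psi[1](run[i])
    if op == "not":
        return not sat(psi[1], run, i)
    if op == "and":
        return sat(psi[1], run, i) and sat(psi[2], run, i)
    if op == "X":                          # strict next: no successor at i = n
        return i < n and sat(psi[1], run, i + 1)
    if op == "G":                          # holds at every remaining position
        return all(sat(psi[1], run, j) for j in range(i, n + 1))
    if op == "U":                          # psi1 U psi2 on the finite suffix
        return any(sat(psi[2], run, k) and
                   all(sat(psi[1], run, j) for j in range(i, k))
                   for k in range(i, n + 1))
    raise ValueError(op)

# A toy run of assignments, checking F(x < 2) = (⊤ U x < 2):
run = [{"x": 5}, {"x": 3}, {"x": 1}]
top = ("atom", lambda s: True)
f_xlt2 = ("U", top, ("atom", lambda s: s["x"] < 2))
print(sat(f_xlt2, run, 0))
```

The evaluator makes the finite-path reading of X explicit: X ψ is false at the final position, which is exactly why the product construction later shifts B by one dummy step.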

Instead of simply checking whether the initial configuration of a DDSA B satisfies a CTL∗_f property χ, we try to determine, for every state b ∈ B, which constraints on variables need to hold in order to satisfy χ. As the number of configurations (b, α) of a DDSA B is usually infinite, configuration sets cannot be enumerated explicitly. Instead, we represent a set of configurations as a *configuration map* K: B → C(V) that associates with every control state b ∈ B a formula K(b) ∈ C(V), representing all configurations (b, α) such that α |= K(b).

We now define when a configuration map captures the maximal set of configurations in which a formula χ holds; we call such maps *witness maps*.

**Definition 8.** *For a DDSA* B *and state formula* χ*, a configuration map* K *is a* witness map *if it holds that* (b, α) |= χ *iff* α |= K(b)*, for all* b∈ B *and all* α*.*

For instance, for B from Example 2 and χ_1 = A G(x ≥ 2), a witness map is given by K = {b_1 ↦ ⊥, b_2 ↦ x ≥ 2 ∧ y ≥ 2, b_3 ↦ x ≥ 2}. For χ_2 = E X(A G(x ≥ 2)), a solution is K = {b_1 ↦ x ≥ 2, b_2 ↦ y ≥ 2, b_3 ↦ ⊥}. As b_1 is the initial state, B satisfies χ_2 with every initial assignment α_I such that α_I(x) ≥ 2.

In this paper we address the problem of finding a witness map for B and χ. Note that a witness map in particular allows one to decide what is commonly called the *verification problem*, namely whether (b_I, α_I) |= χ holds, by testing α_I |= K(b_I). It remains open whether there exist a DDSA B and property χ for which no witness map exists because the configuration set satisfying χ is not finitely representable; and even if a witness map exists, finding it is in general undecidable. However, in this paper we identify DDSA classes where a witness map can always be found.

#### **3 LTL with Configuration Maps**

Following a common approach to CTL∗ verification, our technique processes the property χ bottom-up, computing solutions for each subformula E ψ before solving a linear-time model checking problem in which the solutions to subformulas appear as atoms. Given our representation of sets of configurations, we use LTL formulas whose atoms are configuration maps, and denote this specification language by LTL^B_f. For a given DDSA B, it is formally defined as follows:

$$\psi \;:=\; K \mid \psi \land \psi \mid \neg \psi \mid \mathsf{X}\psi \mid \mathsf{G}\,\psi \mid \psi \; \mathsf{U}\,\psi$$

where K ∈ K_B, the set of configuration maps for B.

**Definition 9.** *A run* ρ *of length* n satisfies *an* LTL^B_f *formula* ψ*, denoted* ρ |=_K ψ*, iff* ρ, 0 |=_K ψ *holds, where for all* i*,* 0 ≤ i ≤ n*:*

$$\begin{array}{ll}
\rho,i \models_K K & \text{iff } \rho_i = (b, \alpha) \text{ and } \alpha \models K(b) \\
\rho,i \models_K \psi \land \psi' & \text{iff } \rho,i \models_K \psi \text{ and } \rho,i \models_K \psi' \\
\rho,i \models_K \neg\psi & \text{iff } \rho,i \not\models_K \psi \\
\rho,i \models_K \mathsf{X}\psi & \text{iff } i < n \text{ and } \rho,i{+}1 \models_K \psi \\
\rho,i \models_K \mathsf{G}\,\psi & \text{iff } \rho,i \models_K \psi \text{ and } (i = n \text{ or } \rho,i{+}1 \models_K \mathsf{G}\,\psi) \\
\rho,i \models_K \psi\,\mathsf{U}\,\psi' & \text{iff } \rho,i \models_K \psi' \text{ or } (i < n \text{ and } \rho,i \models_K \psi \text{ and } \rho,i{+}1 \models_K \psi\,\mathsf{U}\,\psi').
\end{array}$$

Our approach to LTL^B_f verification proceeds along the lines of the LTL_f procedure from [28], with the difference that simple constraint atoms are replaced by configuration maps. In order to express the requirements for a run of a DDSA B to satisfy an LTL^B_f formula ψ, we use a nondeterministic finite automaton (NFA) N_ψ = (Q, Σ, δ, q_0, Q_F), where the states Q are a set of subformulas of ψ, Σ = 2^{K_B} is the alphabet, δ is the transition relation, q_0 ∈ Q is the initial state, and Q_F ⊆ Q is the set of final states. The construction of N_ψ is standard [15,28], treating configuration maps for the time being as propositions; for completeness it is described in [27, Appendix C]. For instance, for a configuration map K, the formulas ψ = F K and ψ = X K correspond to small NFAs with K-labeled transitions into an accepting state. (In such diagrams, edge labels {K} are shown as K, and edge labels ∅ are omitted.)

For w_i ∈ Σ, i.e., w_i a set of configuration maps, w_i(b) denotes the formula ⋀_{K ∈ w_i} K(b). Moreover, for w = w_0, ..., w_n ∈ Σ∗ and a symbolic run σ: b_0 −a_1→ b_1 −a_2→ ... −a_n→ b_n, let w ⊗ σ denote the sequence of formulas w_0(b_0), ..., w_n(b_n), i.e., the component-wise application of w to the control states of σ. A word w_0, ..., w_n ∈ Σ∗ is *consistent* with a run (b_0, α_0) −a_1→ (b_1, α_1) −a_2→ ... −a_n→ (b_n, α_n) if α_i |= w_i(b_i) for all i, 0 ≤ i ≤ n. The key correctness property of N_ψ is the following (cf. [28, Lemma 4.4], and see [27] for the proof adapted to LTL^B_f):

**Lemma 2.** N_ψ *accepts a word that is consistent with a run* ρ *iff* ρ |=_K ψ*.*

*Product Construction.* As a next step in our verification procedure, given a control state b of B, we aim to find (a symbolic representation of) all configurations (b, α) that satisfy an LTL^B_f formula ψ. To that end, we combine N_ψ with B into a cross-product automaton N^ψ_{B,b}. For technical reasons, when performing the product construction, the steps in B need to be shifted by one with respect to the steps in N_ψ. Hence, given b ∈ B, let B_b be the DDSA obtained from B by adding a dummy initial state b̂, so that B_b has state set B′ = B ∪ {b̂} and transition relation T′ = T ∪ {(b̂, a_0, b)} for a fresh action a_0 with guard(a_0) = ⊤.

**Definition 10.** *The* product automaton N^ψ_{B,b} *is defined for an* LTL^B_f *formula* ψ*, a DDSA* B*, and a control state* b ∈ B*. Let* B_b = ⟨B′, b̂, A′, T′, B_F, V, α_I, guard⟩ *and* N_ψ *be as above. Then* N^ψ_{B,b} = (P, R, p_0, P_F) *is as follows:*


*Example 3.* Consider the DDSA B from Example 2, and let K = {b_1 ↦ ⊥, b_2 ↦ x ≥ 2 ∧ y ≥ 2, b_3 ↦ x ≥ 2}. The property ψ = X K is captured by its corresponding NFA N_ψ. The product automata N^ψ_{B,b_1} and N^ψ_{B,b_2} are as follows:

where the shaded nodes are final. The formulas in nodes were obtained by applying quantifier elimination to the formulas built using *update* according to Definition 10. N^ψ_{B,b_3} consists only of the dummy transition and has no final states.

The construction in Definition 10 need not terminate if infinitely many non-equivalent formulas occur. In Sect. 4 we will identify a criterion that guarantees termination. First, we state the key correctness property, which lifts [28, Theorem 4.7] to LTL with configuration maps. Its proof is similar to the respective result in [28] and can be found in [27].

**Theorem 1.** *Let* ψ ∈ LTL^B_f *and* b ∈ B *such that there is a finite product automaton* N^ψ_{B,b}*. Then there is a final run* ρ: (b, α_0) →∗ (b_F, α_F) *of* B *such that* ρ |=_K ψ *iff* N^ψ_{B,b} *has a final state* (b_F, q_F, ϕ) *for some* q_F *and* ϕ *such that* ϕ *is satisfied by an assignment* γ *with* γ(V_0) = α_0(V) *and* γ(V) = α_F(V)*.*

Thus, witnesses for ψ correspond to paths to final states in the product automaton: e.g., in N^ψ_{B,b_1} in Example 3, the formula in the left final node is satisfied by γ(x_0) = γ(x) = γ(y) = 3 and γ(y_0) = 0. For α_0 and α_2 such that α_0(V) = γ(V_0) = {x ↦ 3, y ↦ 0} and α_2(V) = γ(V) = {x ↦ 3, y ↦ 3}, there is a witness run for ψ from (b_1, α_0) to (b_3, α_2), e.g., (b_1, [x=3, y=0]) −a_1→ (b_2, [x=3, y=3]) −a_3→ (b_3, [x=3, y=3]).

#### **4 Model Checking Procedure**

Using the results of Sect. 3, we define a model checking procedure, shown in Fig. 1. First, we explain the tasks achieved by the three mutually recursive functions:

• *checkState*(χ) returns a configuration map representing the set of configurations that satisfy a state formula χ. In the base cases, it returns a function that checks the respective condition; for Boolean operators we recurse on the arguments, and for a formula E ψ we proceed to the *checkPath* procedure.

• *checkPath*(ψ) returns a configuration map K that represents all configurations from which a path satisfying ψ exists. First, *toLTL*_K is used to obtain an equivalent LTL^B_f formula ψ′ (which entails the computation of solutions for all subproperties E η). Then the solution K is constructed as follows: for every control state b, we build the product automaton N^{ψ′}_{B,b} and collect the set Φ_F of formulas in final states. Every ϕ ∈ Φ_F encodes runs from b to a final state of B that satisfy ψ′. The variables V_0 and V in ϕ act as placeholders for the initial and final values of the runs, respectively. We rename variables in ϕ to use V at the start and U at the end, quantify existentially over U (as the final valuation is irrelevant), and take the disjunction over all ϕ ∈ Φ_F. The resulting formula ϕ′ encodes all final runs from b that satisfy ψ′, so we set K(b) := ϕ′.

• *toLTL*_K(ψ) computes an LTL^B_f formula equivalent to a path formula ψ. To this end, it performs two kinds of replacements in ψ: (a) ⊤, states b ∈ B, and constraints c are represented as configuration maps; and (b) subformulas E η are replaced by their solutions K_{Eη}, which are computed by a recursive call to *checkPath*.

To represent the base cases of formulas as configuration maps in Fig. 1, we define K_⊤ := (λ_. ⊤), K_b := (λb′. b = b′ ? ⊤ : ⊥) for all b ∈ B, and K_c := (λ_. c) for constraints c. We also write ¬K for (λb. ¬K(b)) and K ∧ K′ for (λb. K(b) ∧ K′(b)). The next example illustrates the approach.
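These combinators have a direct functional reading; the following sketch (an illustrative encoding, not the paper's implementation) models a configuration map as a function from states to formulas, and formulas as predicates over assignments:

```python
# Base-case configuration maps and the ¬ / ∧ combinators from the text,
# with formulas encoded as Python predicates over assignments.

K_top = lambda b: (lambda alpha: True)                 # K_⊤
def K_state(b0):                                       # K_b for a state b0
    return lambda b: (lambda alpha: b == b0)
def K_constraint(c):                                   # K_c for a constraint c
    return lambda b: c
def K_not(K):                                          # ¬K, pointwise negation
    return lambda b: (lambda alpha: not K(b)(alpha))
def K_and(K1, K2):                                     # K ∧ K', pointwise conj.
    return lambda b: (lambda alpha: K1(b)(alpha) and K2(b)(alpha))

# e.g. a map that holds exactly at state b2 under the constraint x >= 2
# (state names and the constraint are illustrative):
K = K_and(K_state("b2"), K_constraint(lambda alpha: alpha["x"] >= 2))
print(K("b2")({"x": 3}), K("b1")({"x": 3}))
```

In the paper K(b) is a *symbolic* formula in C(V) rather than a predicate, so that ¬ and ∧ stay within the constraint language; the predicate view above only illustrates the algebra of the combinators.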

*Example 4.* Consider χ = E X(A G(x ≥ 2)) and the DDSA B in Example 2. To get a solution K_1 to *checkState*(χ) = *checkPath*(ψ_1) for ψ_1 = X(A G(x ≥ 2)), we first compute an equivalent LTL^B_f formula ψ′_1 = X K_2, where K_2 is a solution to A G(x ≥ 2) ≡ ¬E F(x < 2). To this end, we run *checkPath*(ψ_2) for ψ_2 = F(x < 2), which is represented in LTL^B_f as ψ′_2 = F(K_{x<2}) with corresponding NFA N_{ψ′_2}. Next, *checkPath* builds N^{ψ′_2}_{B,b} for all states b. For instance, for b_2 we get:

where dashed arrows indicate transitions to non-final sink states. For U = (x̂, ŷ) and the formulas ϕ_1, ϕ_2, and ϕ_3 in final nodes, we compute

$$\begin{array}{lll}
\exists \overline{U}.\ \varphi_1(\overline{V},\overline{U}) &=\ \exists \hat{x}\,\hat{y}.\ \hat{x}=x \wedge \hat{y}=y \wedge x<2 &\equiv\ x<2 \\
\exists \overline{U}.\ \varphi_2(\overline{V},\overline{U}) &=\ \exists \hat{x}\,\hat{y}.\ \hat{x}=x \wedge \hat{y}=y \wedge \hat{y}<2 &\equiv\ y<2 \\
\exists \overline{U}.\ \varphi_3(\overline{V},\overline{U}) &=\ \exists \hat{x}\,\hat{y}.\ \hat{x}=x \wedge \hat{y}=y \wedge \hat{x}<2 &\equiv\ x<2
\end{array}$$

so that K_3 := *checkPath*(ψ_2) sets K_3(b_2) = ⋁_{i=1}^{3} ∃U. ϕ_i(V, U) ≡ x < 2 ∨ y < 2. For reasons of space, the constructions for b_1 and b_3 are shown in [27, Appendix B]; we obtain K_3(b_1) = ⊤ and K_3(b_3) = x < 2. By negation, the solution K_2 to A G(x ≥ 2) is K_2 = ¬K_3 = {b_1 ↦ ⊥, b_2 ↦ x ≥ 2 ∧ y ≥ 2, b_3 ↦ x ≥ 2}. Now we can proceed with *checkPath*(ψ_1). The NFA and product automata for ψ′_1 = X K_2 are as shown in Example 3, and in a similar way as above we obtain the solution K_1 for E X(A G(x ≥ 2)) as K_1 = {b_1 ↦ x ≥ 2, b_2 ↦ y ≥ 2, b_3 ↦ ⊥}. Thus, B satisfies the property for any initial assignment α_I with α_I(x) ≥ 2.

Next we prove correctness of *checkState*(χ) under the condition that it is defined, i.e., all required product automata are finite. We first state our main result; before giving its proof, we show helpful properties of *toLTL*_K and *checkPath*.

**Theorem 2.** *For every configuration* (b, α) *of the DDSA* B *and every state property* χ*, if checkState*(χ) *is defined then* (b, α) |= χ *iff* α |= *checkState*(χ)(b)*.*

**Lemma 3.** *Let* ψ *be a path formula with* qd(ψ) = k*. Suppose that for all configurations* (b, α) *and path formulas* ψ′ *with* qd(ψ′) < k*, there is a* ρ ∈ FRuns(b, α) *with* ρ |= ψ′ *iff* α |= checkPath(ψ′)(b)*. Then* ρ |= ψ *iff* ρ |=_K toLTL_K(ψ)*.*

*Proof (sketch).* By induction on ψ. The base cases hold by the definitions of K_⊤, K_b, and K_c. In the induction step, if ψ = E ψ′ then ρ |= ψ iff ∃ρ′ ∈ FRuns(b_0, α_0) with ρ′ |= ψ′, for ρ_0 = (b_0, α_0). As qd(ψ′) < qd(ψ), this holds by assumption iff α_0 |= checkPath(ψ′)(b_0), which is equivalent to ρ |=_K toLTL_K(ψ) = K_{Eψ′}. All other cases follow from the induction hypothesis and Definitions 7 and 9.

**Lemma 4.** *If* ψ′ = toLTL_K(ψ) *is such that for all runs* ρ *we have* ρ |= ψ *iff* ρ |=_K ψ′*, then there is a run* ρ ∈ FRuns(b, α) *with* ρ |= ψ *iff* α |= checkPath(ψ)(b)*.*

*Proof.* (⟹) Suppose there is a run ρ ∈ FRuns(b, α) with ρ |= ψ, so ρ is of the form (b, α) →∗ (b_F, α_F) for some b_F ∈ B_F. By assumption, this implies ρ |=_K ψ′, so that by Theorem 1, N^{ψ′}_{B,b} has a final state (b_F, q_F, ϕ) where ϕ is satisfied by an assignment γ with domain V ∪ V_0 such that γ(V_0) = α(V) and γ(V) = α_F(V). By definition, checkPath(ψ)(b) contains a disjunct ∃U. ϕ(V, U). As γ satisfies ϕ and γ(V_0) = α(V), we get α |= checkPath(ψ)(b). (⟸) If α |= checkPath(ψ)(b), then by definition of checkPath there is a formula ϕ such that α |= ∃U. ϕ(V, U) and ϕ occurs in a final state (b_F, q_F, ϕ) of N^{ψ′}_{B,b}. Hence there is an assignment γ with domain V ∪ V_0 and γ(V_0) = α(V) such that γ |= ϕ. By Theorem 1, there is a run ρ: (b, α) →∗ (b_F, α_F) such that ρ |=_K ψ′. By the assumption, ρ |= ψ.

At this point the main theorem can be proven:

*Proof (of Theorem 2).* We first show (⋆): for any path formula ψ, there is a run ρ ∈ FRuns(b, α) with ρ |= ψ iff α |= checkPath(ψ)(b). The proof is by induction on qd(ψ). If ψ contains no path quantifiers, Lemma 3 implies that ρ |= ψ iff ρ |=_K toLTL_K(ψ) for all runs ρ, so (⋆) follows from Lemma 4. In the induction step, we conclude from Lemma 3, using the induction hypothesis of (⋆) as assumption, that ρ |= ψ iff ρ |=_K toLTL_K(ψ) for all runs ρ. Again, (⋆) follows from Lemma 4.

The theorem is then shown by induction on χ: the base cases ⊤, b ∈ B, and c ∈ C are easy to check, and for properties of the form ¬χ′ and χ_1 ∧ χ_2 the claim follows from the induction hypothesis and the definitions. Finally, for χ = E ψ, (b, α) |= χ iff there is a run ρ ∈ FRuns(b, α) such that ρ |= ψ. By (⋆) this is the case iff α |= checkPath(ψ)(b) = checkState(χ)(b).

*Termination.* We next show that the formulas generated in our procedure all have a particular shape, to obtain an abstract termination result. For a set of formulas Φ ⊆ C(V) and a symbolic run σ, a history constraint h(σ, ϑ) is *over basis* Φ if ϑ = ϑ_0, ..., ϑ_n and for all i, 1 ≤ i ≤ n, there is a subset T_i ⊆ Φ s.t. ϑ_i = ⋀T_i. Moreover, for a set of formulas Φ, let Φ^± = Φ ∪ {¬ϕ | ϕ ∈ Φ}.

**Definition 11.** *For a DDSA* B*, a constraint set* C *with free variables* V*, and* k ≥ 0*, the formula sets* Φ_k *are inductively defined by* Φ_0 = C ∪ {⊤, ⊥} *and*

$$\Phi_{k+1} = \{\, \bigvee_{\varphi \in H} \exists \overline{U}.\ \varphi(\overline{V}, \overline{U}) \mid H \subseteq \mathcal{H}_k \,\}$$

*where* H_k *is the set of all history constraints of* B *over basis* ⋃_{i ≤ k} Φ^±_i*.*

Note that formulas in Φ_k have free variables V, while those in H_k have free variables V_0 ∪ V. We next show that these sets correspond to the formulas generated by our procedure, provided all constraints in the verification property are in C.

**Lemma 5.** *Let* E ψ *have quantifier depth* k*,* ψ′ = toLTL_K(ψ)*, and let* N^{ψ′}_{B,b} *be a constraint graph constructed in* checkPath(ψ) *for some* b ∈ B*. Then,*

*(1) for all nodes* (b′, q, ϕ) *in* N^{ψ′}_{B,b} *there is some* ϕ′ ∈ H_k *such that* ϕ ≡ ϕ′*, and (2)* checkPath(ψ)(b) *is equivalent to a formula in* Φ_{k+1}*.*

The statements are proven by induction on k, using the results on the product construction ([27, Lemma 6]). From part (1) of this lemma and Theorem 2 we thus obtain an abstract criterion for decidability that will be useful in the next section:

**Corollary 1.** *For a DDSA* B *as above and a state formula* χ*, if* H_j(b) *is finite up to equivalence for all* j < qd(χ) *and* b ∈ B*, a witness map can always be computed.*

*Proof.* By the assumption on the sets H_j(b) for j < qd(χ), all product automata constructions in recursive calls checkPath(ψ) of checkState(χ) terminate if logical equivalence of formulas is checked eagerly. Thus checkState(χ) is defined, and by Theorem 2 the result is a witness map.

The property that all sets H_j(b), j < qd(χ), are finite might not itself be decidable. However, in the next section we present means to guarantee this property. Moreover, we remark that finiteness of all H_j(b) implies a *finite history set*, a decidability criterion identified for the linear-time case [28, Definition 3.6]; Example 5 below illustrates that the requirement on the H_j(b)'s is strictly stronger.

#### **5 Decidability of DDSA Classes**

We now present restrictions on DDSAs, either on the control flow or on the constraint language, that render our approach a decision procedure for CTL∗_f.

**Monotonicity constraints** (MCs) restrict constraints (Definition 1) as follows: MCs over variables V and domain D have the form p ⊙ q, where p, q ∈ D ∪ V and ⊙ is one of =, ≠, ≤, <, ≥, or >. The domain D may be R or Q. We call a Boolean formula whose atoms are MCs an *MC formula*, a DDSA where all atoms in guards are MCs an *MC-DDSA*, and a CTL∗_f property whose constraint atoms are MCs an *MC property*. For instance, B in Example 2 is an MC-DDSA.

We exploit a useful quantifier elimination property: if ϕ is an MC formula over a set of constants L and variables V ∪ {x}, there is some ϕ′ ≡ ∃x. ϕ such that ϕ′ is a quantifier-free MC formula over V and L. Such a ϕ′ can be obtained by writing ϕ in disjunctive normal form and applying a Fourier-Motzkin procedure [36, Sect. 5.4] to each disjunct, which guarantees that all constants in ϕ′ also occur in ϕ.
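The Fourier-Motzkin step underlying this property can be sketched for the simplest case, a conjunction of strict '<' atoms (our own illustration with an assumed encoding, not the procedure of [36]):

```python
# A minimal Fourier-Motzkin elimination step for monotonicity-style
# constraints: eliminating x from a conjunction of bounds l < x and x < u
# yields the pairs l < u, and no new constants appear. Atoms are encoded
# as (lhs, "<", rhs) triples over variable names and numbers.

def eliminate(x, atoms):
    """Remove variable x from a conjunction of strict '<' atoms."""
    lowers = [l for (l, _, r) in atoms if r == x]   # atoms l < x
    uppers = [r for (l, _, r) in atoms if l == x]   # atoms x < u
    rest = [a for a in atoms if x not in (a[0], a[2])]
    # every lower bound of x must stay below every upper bound of x
    return rest + [(l, "<", u) for l in lowers for u in uppers]

# ∃y. (x < y ∧ y < z ∧ y < 7)  ≡  x < z ∧ x < 7
print(eliminate("y", [("x", "<", "y"), ("y", "<", "z"), ("y", "<", 7)]))
```

As the code makes visible, every constant in the output already occurs in the input, which is exactly the closure property the proof of Theorem 3 relies on.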

**Theorem 3.** *For any DDSA* B *and property* χ *over monotonicity constraints, a witness map is computable.*

*Proof.* Let χ be an MC property, and L the finite set of constants occurring in constraints in χ, in α_I, and in guards of B. Let moreover MC_L be the set of quantifier-free formulas whose atoms are MCs over V ∪ V_0 and L; MC_L is finite up to equivalence.

We show the following property (⋆): all history constraints h(σ, ϑ) over basis MC_L are equivalent to a formula in MC_L. For a symbolic run σ: b_0 →∗ b_{n−1} −a_n→ b_n and a sequence ϑ = ϑ_0, ..., ϑ_n over MC_L, the proof is by induction on n. In the base case, h(σ, ϑ) = ϕ_ν ∧ ϑ_0 is in MC_L because ϕ_ν is a conjunction of equalities between V and V_0, and ϑ_0 ∈ MC_L by assumption. In the induction step, h(σ, ϑ) = update(h(σ|_{n−1}, ϑ|_{n−1}), a_n) ∧ ϑ_n. By the induction hypothesis, h(σ|_{n−1}, ϑ|_{n−1}) ≡ ϕ for some ϕ in MC_L. Thus h(σ, ϑ) ≡ ∃U. ϕ(U) ∧ Δ_{a_n}(U, V) ∧ ϑ_n. As B is an MC-DDSA, Δ_{a_n}(U, V) is a conjunction of MCs over V ∪ U and constants L, and ϑ_n ∈ MC_L by assumption. By the quantifier elimination property, there exists a quantifier-free MC formula ϕ′ over variables V_0 ∪ V that is equivalent to ∃U. ϕ(U) ∧ Δ_{a_n}(U, V) ∧ ϑ_n and mentions only constants in L, so ϕ′ ∈ MC_L.

For C the set of constraints in χ, we now show that H_j ⊆ MC_L for all j ≥ 0, by induction on j. In the base case (j = 0), the claim follows from (⋆), as all constraints in Φ_0, i.e., in χ, are in MC_L. For j > 0, consider first a formula ϕ ∈ Φ_j. Then ϕ is of the form ϕ = ⋁_{ϕ′ ∈ H} ∃U. ϕ′(V, U) for some H ⊆ H_{j−1}. By the induction hypothesis, H ⊆ MC_L, so by the quantifier elimination property of MC formulas, ϕ is equivalent to an MC formula over V and L in MC_L. As H_j is built over basis Φ_j, the claim follows from (⋆).

Notably, the above quantifier elimination property fails for MCs over integer variables; indeed, CTL model checking is undecidable in this case [42, Theorem 4.1].

**Integer periodicity constraint** systems confine the constraint language to variable-to-constant comparisons and restricted forms of variable-to-variable comparisons, and are for instance used in calendar formalisms [18,22]. More precisely, *integer periodicity constraint* (IPC) atoms have the form x = y, x ⊙ d for ⊙ ∈ {=, ≠, <, >}, x ≡_k y + d, or x ≡_k d, for variables x, y with domain ℤ and k, d ∈ ℕ. A boolean formula whose atoms are IPCs is an *IPC formula*, a DDSA whose guards are conjunctions of IPCs an *IPC-DDSA*, and a CTL*_f formula whose constraint atoms are IPCs an *IPC property*. For instance, B_ipc in Example 2 is an IPC-DDSA.

Using Corollary 1 and a known quantifier elimination property for IPCs [18, Theorem 2], one can show that witness maps are also computable for IPC-DDSAs, in a proof that resembles the one of Theorem 3 (see [27, Theorem 4]).

**Theorem 4.** *For any DDSA* B *and property* χ *over integer periodicity constraints, a witness map is computable.*

The proofs of both Theorems 3 and 4 rely on the fact that all transition guards and constraints in the verification property are in a *finite* set of constraints C that is closed under quantifier elimination, so that for all ϕ ∈ C and actions a, *update*(ϕ, a) is again equivalent to a formula in C. However, this is not the only way to meet the requirements of Corollary 1: for a simple example, these requirements are satisfied by any loop-free DDSA, where the number of runs is finite. Interestingly, while the cases of MC and IPC systems are also captured by the abstract decidability criterion of Gascon [31], this need not apply to loop-free DDSAs. A clarification of the relationship between the criteria in Corollary 1 and [31, Theorem 4.5] requires further investigation.

**Bounded lookback** [28] restricts the control flow of a DDSA rather than the constraint language, and is a generalization of the earlier *feedback-freedom* property [14]. Intuitively, k-bounded lookback demands that the behavior of a DDSA at any point in time depends only on k events from the past. We refer to [28, Definition 5.9] for the formal definition. Systems that enjoy bounded lookback allow for decidable linear-time verification [28, Theorem 5.10]. However, we next show that this result does not extend to branching time.

*Example 5.* We reduce control state reachability of two-counter machines (2CMs) to the verification problem of CTL*_f formulas in bounded lookback systems, inspired by [42, Theorem 4.1]. 2CMs have a finite control structure and two counters x_1 and x_2 that can be incremented, decremented, and tested for 0. It is undecidable whether a 2CM will ever reach a designated control state f [43]. For a 2CM M, we build a feedback-free DDSA B = ⟨B, b_I, A, T, B_F, V, α_I, *guard*⟩ and a CTL*_f property χ such that B satisfies χ iff f is reachable in M. The set B consists of the control states of M, together with an error state *e* and auxiliary states b_t for transitions t of M, and B_F = {f, e}. The set V consists of x_1, x_2 and auxiliary variables p_1, p_2, m_1, m_2. Zero-test transitions of M are directly modeled in B, whereas a step t : q → q′ that increments x_i by one is modeled as:

The step q → b_t writes x_i, storing its previous value in p_i, but if the write was not an increment by exactly 1, a step to state e is enabled. Decrements are modeled similarly. Intuitively, bounded lookback holds because variable dependencies are limited: in a run of M, a variable dependency that is not an equality extends over at most two time points. (More formally, non-equality paths in the computation graph have length at most 1.) As increments are not exact, B overapproximates M. However, χ = EG(¬EX e) asserts the existence of a path that never allows for a step to e (i.e., that properly simulates M) but reaches the final state f. Thus, B satisfies χ iff f is reachable in M.

#### **6 Implementation**

We implemented our approach in the prototype ada (arithmetic DDS analyzer) in Python; source code, benchmarks, and a web interface are available (https://ctlstar.adatool.dev). As input, the tool takes a CTL*_f property χ together with a DDSA in JSON format; alternatively, a given (bounded) Petri net with data (DPN) in PNML format [5] can be transformed into a DDSA. The tool then applies the algorithm in Fig. 1. If successful, it outputs the configuration map returned by *checkState*(χ), and it can visualize the product constructions. For SMT checks and quantifier elimination, ada interfaces with CVC5 [23] and Z3 [17]. Besides numeric variables, ada also supports variables of type boolean and string; for the latter, only equality comparison is supported, so that different constants can be represented by distinct integers. In addition to the operators in Definition 6, ada allows next operators ⟨a⟩ parameterized by an action a, which are useful for verification.

We tested ada on a set of business process models presented as Data Petri nets (DPNs) in the literature. As these nets are bounded, they can be transformed into DDSAs. The results are reported in the table below. We indicate whether the system belongs to a decidable class, the verified property and whether it is satisfied by the initial configuration, the verification time, the number of SMT checks, and the number of nodes in the DDSA B and in the sum of all product constructions, respectively. We used CVC5 as the SMT solver; times exclude visualization, which tends to be time-consuming for large graphs. All tests were run on an Intel Core i7 with 4×2.60 GHz and 19 GB RAM.


We briefly comment on the benchmarks and some properties. For all examples we checked *no deadlock*, which abbreviates AG EF χ_f, where χ_f is a disjunction over all final states. This is one of the two requirements of the crucial *soundness* property (cf. Example 1). Weak soundness [4] relaxes this requirement to demand only that if a transition is reachable, it does not lead to deadlocks; this is called here *no deadlock*(a), expressed by EF⟨a⟩ → AG(⟨a⟩ → F χ_f). One can also check whether a specific state p is deadlock-free, via AG(p → EF χ_f).


Where properties are reported to hold, they hold for all initial configurations; the output of ada reveals that for (h) this need not hold for other initial assignments.


Seven systems are in a decidable class w.r.t. the listed properties: (a), (b), (d), (f), (h), (i), and (k) are MC, while (d), (h), (i), and (k) are IPC. This reflects the fact that automatic mining techniques often produce monotonicity constraints [39].

#### **7 Conclusion**

This paper presents a technique to compute witness maps for a given DDSA and CTL*_f property, where a witness map specifies conditions on the initial variable assignment under which the property holds. The addressed problem is thus a slight generalization of the common verification problem. While our model checking procedure need not terminate in general, we show that it does if an abstract property on history constraints holds. Moreover, witness maps always exist for monotonicity and integer periodicity constraint systems. However, this result does not extend to bounded lookback systems. We implemented our approach in the tool ada and demonstrated its usefulness on a range of business process models.

We see various opportunities to extend this work. A richer verification language could support past-time operators [18] and future variable values [20]. Further decidable fragments could be sought using covers [33], or by aiming for compatibility with locally finite theories [32]. Moreover, a restricted version of the bounded lookback property could guarantee decidability of CTL*_f, similarly to the way feedback freedom was strengthened in [35]. The implementation could be improved to avoid the computation of many similar formulas, thus gaining efficiency. Finally, the complexity class that our approach implies for CTL*_f on the decidable classes remains to be clarified.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **SAT-Based Proof Search in Intermediate Propositional Logics**

Camillo Fiorentini<sup>1</sup> and Mauro Ferrari2(B)

<sup>1</sup> Department of Computer Science, Università degli Studi di Milano, Milan, Italy <sup>2</sup> Department of Theoretical and Applied Sciences, Università degli Studi dell'Insubria, Varese, Italy mauro.ferrari@uninsubria.it

**Abstract.** We present a decision procedure for intermediate logics relying on a modular extension of the SAT-based prover intuitR for IPL (Intuitionistic Propositional Logic). Given an intermediate logic L and a formula α, the procedure outputs either a Kripke countermodel for α or the instances of the characteristic axioms of L that must be added to IPL in order to prove α. The procedure exploits an incremental SAT-solver; during the computation, new clauses are learned and added to the solver.

## **1 Introduction**

Recently, Claessen and Rosén introduced intuit [4], an efficient decision procedure for Intuitionistic Propositional Logic (IPL) based on the Satisfiability Modulo Theories (SMT) approach. The prover language consists of (flat) clauses of the form ⋀A_1 → ⋁A_2 (with A_i a set of atoms), which are fed to the SAT-solver, and implication clauses of the form (a → b) → c (a, b, c atoms); thus, an auxiliary clausification procedure is needed to preprocess the input formula. The search is performed via a variant of the DPLL(T) procedure [16], exploiting an incremental SAT-solver; during the computation, whenever a semantic conflict is thrown, a new clause is learned and added to the SAT-solver. As discussed in [9], there is a close connection between the intuit approach and known proof-theoretic methods. Indeed, the decision procedure mimics the standard root-first proof search strategy for a sequent calculus strongly connected with Dyckhoff's calculus LJT [5] (alias G4ip). To improve performance, we re-designed the prover by adding a restart operation, thus obtaining intuitR [8] (intuit with Restart). Differently from intuit, the intuitR procedure has a simple structure, consisting of two nested loops. Given a formula α, if α is provable in IPL, the call intuitR(α) yields a derivation of α in the sequent calculus introduced in [8], a plain calculus where derivations have a single branch. If α is not provable in IPL, the outcome of intuitR(α) is a (typically small) countermodel for α, namely a Kripke model falsifying α. We stress that intuitR is highly performant: on a standard benchmark suite, it outperforms intuit and other state-of-the-art provers (in particular, fCube [6] and intHistGC [12]).

In this paper we present intuitRIL, an extension of intuitR to intermediate logics, namely propositional logics extending IPL and contained in CPL (Classical Propositional Logic). Specifically, let α be a formula and L an axiomatizable intermediate logic having Kripke semantics; the call intuitRIL(α, L) tries to prove the validity of α in L. To this aim, the prover searches for a set Ψ containing instances of Ax(L), the characteristic axioms of L, such that α can be proved in IPL from Ψ. Note that this differs from other approaches, where the focus is on the synthesis of specific inference rules for the logic at hand (see, e.g., [17]). Basically, intuitRIL(α, L) searches for a countermodel K for α, exploiting the search engine of intuitR: whenever we get K, we check whether K is a model of L. If this is the case, we conclude that α is not valid in L (and K is a witness to this). Otherwise, the prover selects an instance ψ of Ax(L) falsified in K (there exists at least one); ψ is acknowledged as a learned axiom and, after clausification, it is fed to the SAT-solver. We stress that a naive implementation of the procedure, where at each iteration of the main loop the computation restarts from scratch, would be highly inefficient: each time, the SAT-solver would have to be initialized by inserting all the clauses encoding the input problem and all the clauses learned so far. Instead, we exploit an incremental SAT-solver, where clauses can be added but never deleted (hence, all the simplifications and optimisations performed by the solver are preserved); note that this prevents us from exploiting strategies based on standard sequent/tableaux calculi, where backtracking is required.
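The L-model test and the learning step can be illustrated for Gödel-Dummett logic GL, which is characterized by linear frames (cf. Example 1). In the models handled by the prover, worlds are classical interpretations ordered by inclusion (see Sect. 3), so two incomparable worlds are separated by atoms in both directions, which immediately yields a falsified instance of the linearity axiom (a → b) ∨ (b → a). The following sketch is ours, with a hypothetical data representation; it returns such an instance, or None when the model is a model of GL:

```python
def falsified_lin_instance(worlds, le, val):
    """GL learning step (illustrative sketch): worlds is a list, le a set of
    pairs (w1, w2) meaning w1 <= w2, val maps each world to its set of atoms.
    If two worlds are incomparable, atoms p, q separating them give an
    instance (p -> q) v (q -> p) falsified at the root; return (p, q)."""
    for w1 in worlds:
        for w2 in worlds:
            if (w1, w2) in le or (w2, w1) in le:
                continue  # comparable pair: no conflict with linearity
            ps = val[w1] - val[w2]  # atoms true in w1 but not in w2
            qs = val[w2] - val[w1]  # atoms true in w2 but not in w1
            if ps and qs:  # guaranteed when worlds are interpretations ordered by inclusion
                return (min(ps), min(qs))
    return None  # every pair is comparable: the model validates lin
```

In a model with root r below two incomparable worlds forcing p and q respectively, r forces neither p → q nor q → p, so (p → q) ∨ (q → p) is the learned instance.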

If the call intuitRIL(α, L) succeeds, by tracking the computation we get a derivation D of α in the sequent calculus C_L (see Fig. 1); from D we can extract all the axioms learned during the computation. We stress that the procedure is quite modular: to handle a logic L, one only has to implement a specific learning mechanism for L (namely: if K is not a model of L, pick an instance of Ax(L) falsified in K). The main drawback is that there is no general way to bound the learned axioms, thus termination must be investigated on a case-by-case basis. We guarantee termination for some relevant intermediate logics, such as Gödel-Dummett logic GL, the family GL_n (n ≥ 1) of Gödel-Dummett logics with depth bounded by n (GL_1 coincides with Here and There logic, well known for its applications in Answer Set Programming [15]), and Jankov logic (for a presentation of these logics see [2]). As a corollary, for each of the mentioned logics L we get a bounding function [3], namely: given α, we compute a bounded set Ψ_α of instances of Ax(L) such that α is valid in L iff α is provable in IPL from the assumptions Ψ_α; in general we improve the bounds in [1,3]. The intuitRIL Haskell implementation and additional material (e.g., the omitted proofs) can be downloaded at https://github.com/cfiorentini/intuitRIL.

#### **2 Basic Definitions**

Formulas, denoted by lowercase Greek letters, are built from an enumerable set of propositional variables V, the constant ⊥, and the connectives ∧, ∨, →; moreover, ¬α stands for α → ⊥ and α ↔ β stands for (α → β) ∧ (β → α). Elements of the set V ∪ {⊥} are called *atoms* and are denoted by lowercase Roman letters; uppercase Greek letters denote sets of formulas. By V_α we denote the set of propositional variables occurring in α. The notation is extended to sets: V_Γ is the union of the V_α such that α ∈ Γ; V_{Γ,Γ′} and V_{Γ,α} stand for V_{Γ∪Γ′} and V_{Γ∪{α}} respectively. A *substitution* is a map from propositional variables to formulas. By [p_1 ↦ α_1, ..., p_n ↦ α_n] we denote the substitution χ such that χ(p) = α_i if p = p_i and χ(p) = p otherwise; the set {p_1, ..., p_n} is the *domain* of χ, denoted by Dom(χ); ε is the substitution having empty domain. The application of χ to a formula α, denoted by χ(α), is defined as usual; χ(Γ) is the set of the χ(α) such that α ∈ Γ. The *composition* χ_1 · χ_2 is the substitution mapping p to χ_1(χ_2(p)).

A *(classical) interpretation* M is a subset of V, identifying the propositional variables assigned to true. By M ⊨ α we mean that α is true in M; M ⊨ Γ iff M ⊨ α for every α ∈ Γ. Classical Propositional Logic (CPL) is the set of formulas true in every interpretation. We write Γ ⊢_c α iff M ⊨ Γ implies M ⊨ α, for every M. Note that α is CPL-valid (namely, α ∈ CPL) iff ∅ ⊢_c α.
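The classical notions above are directly executable. As an illustrative sketch of ours, formulas can be represented as nested tuples and Γ ⊢_c α decided by enumerating all interpretations over the occurring variables:

```python
from itertools import combinations

def ev(M, f):
    """Truth of formula f in the classical interpretation M (set of variables)."""
    op = f[0]
    if op == "var": return f[1] in M
    if op == "bot": return False
    if op == "and": return ev(M, f[1]) and ev(M, f[2])
    if op == "or":  return ev(M, f[1]) or ev(M, f[2])
    if op == "imp": return (not ev(M, f[1])) or ev(M, f[2])
    raise ValueError(op)

def variables(f):
    if f[0] == "var": return {f[1]}
    if f[0] == "bot": return set()
    return variables(f[1]) | variables(f[2])

def entails_c(Gamma, alpha):
    """Gamma |-_c alpha: no interpretation satisfies Gamma and falsifies alpha."""
    vs = sorted(variables(alpha).union(*(variables(g) for g in Gamma)))
    for k in range(len(vs) + 1):
        for M in combinations(vs, k):
            M = set(M)
            if all(ev(M, g) for g in Gamma) and not ev(M, alpha):
                return False
    return True
```

For example, ∅ ⊢_c a ∨ ¬a holds, while ∅ ⊢_c a does not; this brute-force check stands in for the SAT-solver used by the actual provers.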

A (rooted) Kripke model K is a quadruple ⟨W, ≤, r, ϑ⟩ where W is a finite and non-empty set (the set of *worlds*), ≤ is a reflexive and transitive binary relation over W, the world r (the *root* of K) is the minimum of W w.r.t. ≤, and ϑ : W → 2^V (the *valuation* function) is a map obeying the persistence condition: for every pair of worlds w_1 and w_2 of K, w_1 ≤ w_2 implies ϑ(w_1) ⊆ ϑ(w_2); the triple ⟨W, ≤, r⟩ is called a *(Kripke) frame*. The valuation ϑ is extended to a *forcing* relation ⊩ between worlds and formulas as follows:

$$\begin{array}{l}
w \Vdash p \ \text{iff}\ p \in \vartheta(w), \ \text{for } p \in \mathcal{V} \qquad\qquad w \nVdash \bot \\
w \Vdash \alpha \land \beta \ \text{iff}\ w \Vdash \alpha \text{ and } w \Vdash \beta \qquad\quad w \Vdash \alpha \lor \beta \ \text{iff}\ w \Vdash \alpha \text{ or } w \Vdash \beta \\
w \Vdash \alpha \to \beta \ \text{iff}\ \forall w' \ge w,\ w' \Vdash \alpha \text{ implies } w' \Vdash \beta.
\end{array}$$

By w ⊩ Γ we mean that w ⊩ α for every α ∈ Γ. A formula α is *valid* in the frame ⟨W, ≤, r⟩ iff, for every valuation ϑ, r ⊩ α in the model ⟨W, ≤, r, ϑ⟩. Propositional Intuitionistic Logic (IPL) is the set of formulas valid in all frames. Accordingly, if there is a model K such that r ⊮ α (here and below r designates the root of K), then α is not IPL-valid; we call K a *countermodel* for α. We write Γ ⊢_i δ iff, for every model K, r ⊩ Γ implies r ⊩ δ; thus, α is IPL-valid iff ∅ ⊢_i α.
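The forcing clauses translate directly into code. The sketch below is ours: formulas are nested tuples such as ("imp", ("var", "a"), ("bot",)) for ¬a, and the classic two-world model shows that a ∨ ¬a has a countermodel:

```python
def forces(le, val, w, f):
    """w ||- f in the model given by le (pairs (w1, w2) with w1 <= w2,
    reflexive and transitive) and val (world -> set of forced variables,
    persistent along <=). Formulas are nested tuples."""
    op = f[0]
    if op == "var": return f[1] in val[w]
    if op == "bot": return False
    if op == "and": return forces(le, val, w, f[1]) and forces(le, val, w, f[2])
    if op == "or":  return forces(le, val, w, f[1]) or forces(le, val, w, f[2])
    if op == "imp":  # quantifies over all successors of w, including w itself
        return all(not forces(le, val, w2, f[1]) or forces(le, val, w2, f[2])
                   for (w1, w2) in le if w1 == w)
    raise ValueError(op)

# the two-world model: root r below u, with a forced only at u
le = {("r", "r"), ("u", "u"), ("r", "u")}
val = {"r": set(), "u": {"a"}}
excluded_middle = ("or", ("var", "a"), ("imp", ("var", "a"), ("bot",)))
```

Here forces(le, val, "r", excluded_middle) is False: r does not force a, and r does not force ¬a either, since the successor u forces a. This model is therefore a countermodel for a ∨ ¬a, which is nonetheless CPL-valid.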

Let L be one of the logics IPL and CPL; then L is closed under modus ponens ({α, α → β} ⊆ L implies β ∈ L) and under substitution (for every χ, α ∈ L implies χ(α) ∈ L). An *intermediate logic* is any set of formulas L such that IPL ⊆ L ⊆ CPL and L is closed under modus ponens and under substitution. A model K is an L-model iff r ⊩ L; if moreover r ⊮ α, we say that K is an L-*countermodel* for α. An intermediate logic L can be characterized by a set of CPL-valid formulas, called the L-*axioms* and denoted by Ax(L). An L-axiom ψ of Ax(L) must be understood as a schematic formula, representing all the formulas of the kind χ(ψ); we call χ(ψ) an *instance* of ψ. Formally, IPL + Ax(L) is the intermediate logic collecting the formulas α such that Ψ ⊢_i α, where Ψ is a finite set of instances of L-axioms from Ax(L). A *bounding function* for L is a map that, given α, yields a finite set Ψ_α of instances of L-axioms such that α is valid in L iff Ψ_α ⊢_i α. If L admits a computable bounding function, we can reduce L-validity to IPL-validity (see [3] for an in-depth discussion). Let F be a class of frames and let Log(F) be the set of formulas valid in all frames of F; then Log(F) is an intermediate logic. A logic L has *Kripke semantics* iff there exists a class of frames F such that L = Log(F); we also say that L is characterized by F. Henceforth, when we mention a logic L, we leave understood that L is an axiomatizable intermediate logic having Kripke semantics.

*Example 1 (*GL*).* A well-known intermediate logic is Gödel-Dummett logic GL [2], characterized by the class of linear frames. An axiomatization of GL is obtained by adding the linearity axiom **lin** = (a → b) ∨ (b → a) to IPL. Using the terminology of [3], GL is formula-axiomatizable: a bounding function for GL is obtained by mapping α to the set Ψ_α of instances of **lin** where a and b are replaced with subformulas of α. In [1] it is proved that it is sufficient to consider the subformulas of α of the kind p ∈ V_α, ¬β, and β_1 → β_2. In Lemma 4 we further improve this bound, taking as bounding function the following map:

$$\begin{array}{rl}
\mathrm{Ax}_{\mathsf{GL}}(\alpha) = & \{ (a \to b) \lor (b \to a) \mid a, b \in \mathcal{V}_\alpha \} \ \cup\ \{ (a \to \neg a) \lor (\neg a \to a) \mid a \in \mathcal{V}_\alpha \} \\
& \cup\ \{ (a \to (a \to b)) \lor ((a \to b) \to a) \mid a, b \in \mathcal{V}_\alpha \}
\end{array}$$

Thus, if V_α = {a}, the only instance of **lin** to consider is (a → ¬a) ∨ (¬a → a), independently of the size of α (the other instances are IPL-valid and can be omitted). As pointed out in [3], GL is not variable-axiomatizable, namely: it is not sufficient to consider instances of **lin** obtained by replacing a and b with variables from V_α. As an example, let α = ¬a ∨ ¬¬a; α is GL-valid, yet the only variable-replacement instance of **lin** is ψ_α = (a → a) ∨ (a → a), and ψ_α ⊬_i α. ♦
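For concreteness, the instance generation behind this bounding map can be sketched as follows (our illustration, with formulas rendered as plain strings):

```python
from itertools import product

def ax_gl(v_alpha):
    """The three instance families of Ax_GL(alpha), over the variables V_alpha."""
    inst = set()
    for a, b in product(sorted(v_alpha), repeat=2):
        inst.add(f"({a} -> {b}) | ({b} -> {a})")                    # lin on variables
        inst.add(f"({a} -> ({a} -> {b})) | (({a} -> {b}) -> {a})")  # b replaced by a -> b
    for a in sorted(v_alpha):
        inst.add(f"({a} -> ~{a}) | (~{a} -> {a})")                  # b replaced by ~a
    return inst
```

For V_α = {a} this yields three instances, of which only (a -> ~a) | (~a -> a) is not IPL-valid, matching the discussion above.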

We review the main concepts about the clausification procedure described in [4]. *Clauses* ϕ and *implication clauses* λ are defined as

$$\begin{aligned} \varphi &:= \bigwedge A\_1 \to \bigvee A\_2 \mid \bigvee A\_2 \\ \lambda &:= (a \to b) \to c \end{aligned} \qquad \begin{aligned} \emptyset &\subset A\_k \subseteq \mathcal{V} \cup \{\perp\}, \text{for} & k \in \{1, 2\} \\ a &\in \mathcal{V}, \ \{b, c\} \subseteq \mathcal{V} \cup \{\perp\} \end{aligned}$$

where ⋀A_1 and ⋁A_2 denote the conjunction and the disjunction of the atoms in A_1 and A_2 respectively (⋀{a} = ⋁{a} = a). Henceforth, ⋀∅ → ⋁A_2 must be read as ⋁A_2; R, R_1, ... denote sets of clauses and X, X_1, ... sets of implication clauses. Given a set of implication clauses X, the *closure* of X, denoted by (X)*, is the set of clauses b → c such that (a → b) → c ∈ X.

The following lemma states some properties of clauses and closures.

**Lemma 1.** (i) R ⊢_i g iff R ⊢_c g, for every set of clauses R and every atom g. (ii) X ⊢_i b → c, for every b → c ∈ (X)*. (iii) Γ ⊢_i α iff α ↔ g, Γ ⊢_i g, where g ∉ V_{Γ,α}.

*Clausification.* We assume a procedure Clausify that, given a formula α, computes sets of clauses R and X equivalent to α w.r.t. IPL. Formally, let α be a formula and let V be a set of propositional variables such that V_α ⊆ V. The procedure Clausify(α, V) computes a triple (R, X, χ) satisfying:


$$\frac{R \vdash_c g}{R, X \Rightarrow g}\ \mathrm{cpl}_0
\qquad
\frac{R, A \vdash_c b \qquad R, \varphi, X \Rightarrow g}{R, X \Rightarrow g}\ \mathrm{cpl}_1(\lambda)
\quad
\begin{array}{l} \lambda = (a \to b) \to c \in X \\ A \subseteq \mathsf{V}_{R,X,g},\ \varphi = \bigwedge (A \setminus \{a\}) \to c \end{array}$$

$$\frac{R, (X)^*, X \Rightarrow g}{\Rightarrow \alpha}\ \mathrm{Claus}_0(g, \chi)
\quad (R, X, \chi) = \mathsf{Clausify}(\alpha \leftrightarrow g, \mathsf{V}_{\alpha,g}),\ g \notin \mathcal{V}_\alpha$$

$$\frac{R, R', (X')^*, X, X' \Rightarrow g}{R, X \Rightarrow g}\ \mathrm{Claus}_1(\psi, \chi)
\quad \begin{array}{l} \psi \in \mathrm{Ax}(L, \mathsf{V}_{R,X,g}) \\ (R', X', \chi) = \mathsf{Clausify}(\psi, \mathsf{V}_{R,X,g}) \end{array}$$

$$\pi(\mathrm{Claus}_0(g, \chi)) = \langle \emptyset,\ [g \mapsto \alpha] \cdot \chi \rangle
\qquad
\pi(\mathrm{Claus}_1(\psi, \chi)) = \langle \{\psi\},\ \chi \rangle
\qquad
\pi(\rho) = \langle \emptyset,\ \varepsilon \rangle \text{ otherwise}$$

**Fig. 1.** The sequent calculus C_L.

Basically, clausification introduces new propositional variables to represent subformulas of α; as a result, we obtain a substitution χ which tracks the mapping of the new variables. Condition (C1) states that α can be replaced by R ∪ X in IPL reasoning. By (C2), the domain of χ consists of the new variables introduced in the clausification process. The following properties easily follow from (C1)–(C3):

$$\text{(P1)}\ R, X \vdash\_{\text{i}} \alpha.\text{ }\qquad\text{(P2)}\ R, X \vdash\_{\text{i}} \beta \leftrightarrow \chi(\beta) \text{ for every formula } \beta.$$

We exploit a Clausify procedure essentially similar to the one described in [4], with slight modifications in order to match (C3). As discussed in [4], in IPL one can use a weaker condition (either R, X ⊢_i p → χ(p) or R, X ⊢_i χ(p) → p, according to the case). It is not obvious whether the weaker condition is more efficient; in many cases strong equivalences perform better, perhaps because they trigger more simplifications in the SAT-solver.

*Example 2.* Let α = (a → b) ∨ (b → a) and V = {a, b}. The call Clausify(α, V) introduces the new variables p̃_0 and p̃_1, associated with the subformulas a → b and b → a respectively. Accordingly, the obtained sets R and X must satisfy R, X ⊢_i p̃_0 ↔ (a → b) and R, X ⊢_i p̃_1 ↔ (b → a). We get:

$$\begin{array}{ll} R & = \left\{ \tilde{p}\_0 \lor \tilde{p}\_1, \,\,\tilde{p}\_0 \land a \to b, \,\,\tilde{p}\_1 \land b \to a \right\} & \chi &= \left[ \,\,\tilde{p}\_0 \mapsto a \to b, \,\,\tilde{p}\_1 \mapsto b \to a \right] \\ X & = \left\{ \,(a \to b) \to \tilde{p}\_0, \,(b \to a) \to \tilde{p}\_1 \right\} \end{array}$$

♦
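The shape of this example can be reproduced by a drastically simplified clausifier (ours, not the general procedure of [4]), which handles only a disjunction of implications between atoms:

```python
def clausify_disj_of_imps(disjuncts):
    """disjuncts: list of pairs (a, b), representing (a1 -> b1) v ... v (an -> bn)
    with all ai, bi atoms. Returns (R, X, chi): flat clauses R as pairs
    (A1, A2) for /\\A1 -> \\/A2, implication clauses X as triples (a, b, c)
    for (a -> b) -> c, and the substitution chi on the fresh variables."""
    R, X, chi, fresh = [], [], {}, []
    for i, (a, b) in enumerate(disjuncts):
        p = f"~p{i}"                     # fresh variable for the subformula a -> b
        fresh.append(p)
        chi[p] = ("imp", a, b)
        R.append(({p, a}, {b}))          # p /\ a -> b   (one half of p <-> (a -> b))
        X.append((a, b, p))              # (a -> b) -> p (the other half)
    R.append((set(), set(fresh)))        # the disjunction p0 v ... v p(n-1) itself
    return R, X, chi
```

On Example 2's input [("a", "b"), ("b", "a")] it produces exactly the sets R, X and the substitution χ displayed above, up to the naming of the fresh variables.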

## **3 The Calculus** *C_L*

Let L be an intermediate logic; we introduce the sequent calculus C_L to prove L-validity. We assume that L is axiomatized by a set Ax(L) of L-axioms; by

$$\begin{array}{c}
\dfrac{R_{n-1} \vdash_c g}{R_{n-1}, X_{n-1} \Rightarrow g}\ \rho_n = \mathrm{cpl}_0 \\[4pt]
\vdots \\[4pt]
\dfrac{R_0, X_0 \Rightarrow g}{\Rightarrow \alpha}\ \rho_0 = \mathrm{Claus}_0(g, \chi_0)
\end{array}
\qquad
\begin{array}{l}
\forall i \in \{1, \ldots, n-1\},\ \rho_i = \mathrm{cpl}_1 \text{ or } \rho_i = \mathrm{Claus}_1 \\
\pi(\rho_i) = \langle \Psi_i, \chi_i \rangle \\
\pi(\mathcal{D}) = \langle \Psi_0 \cup \cdots \cup \Psi_n,\ \chi_0 \cdot \ldots \cdot \chi_n \rangle
\end{array}$$

**Fig. 2.** A C_L-derivation of ⇒ α.

Ax(L, V) we denote the set of instances ψ of L-axioms such that V_ψ ⊆ V. The calculus relies on a clausification procedure Clausify satisfying conditions (C1)–(C3) and acts on sequents Γ ⇒ δ such that:

– either Γ = ∅ or Γ = R ∪ X with (X)* ⊆ R, and δ is an atom.

Rules of C_L are displayed in Fig. 1. Rule cpl_0 (initial rule) can only be applied if the condition R ⊢_c g holds; if this is the case, the conclusion R, X ⇒ g is an initial sequent, namely a top sequent of a derivation. The other rules depend on parameters that are made explicit in the rule name. A bottom-up application of cpl_1 requires the choice of an implication clause λ = (a → b) → c from X, which we call the *main formula*, and the selection of a set of atoms A ⊆ V_{R,X,g} such that R, A ⊢_c b, where b is the middle variable of λ. As discussed in [8,9], cpl_1 is a sort of generalization of the rule L→→ of the sequent calculus LJT/G4ip for IPL [5,18]. Rules Claus_0 and Claus_1 exploit the clausification procedure. Rule Claus_0 requires the clausification of the formula α ↔ g, with g a new atom (g ∉ V_α); in rule Claus_1, the clausified formula ψ is selected from Ax(L, V_{R,X,g}). In both cases, the clauses returned by Clausify are stored in the premise of the applied rule and the computed substitution χ is displayed in the rule name; moreover, Claus_0 is annotated with the new atom g and Claus_1 with the chosen L-axiom ψ. To recover the relevant information associated with the application of a rule ρ, in Fig. 1 we define the pair π(ρ) = ⟨Ψ, χ⟩, where Ψ is a set of instances of L-axioms and χ is a substitution. C_L-trees and C_L-derivations are defined as usual (see e.g. [18]); a sequent σ is provable in C_L iff there exists a C_L-derivation having root sequent σ.
Let us consider a C_L-derivation D of ⇒ α (see Fig. 2). Reading the derivation bottom-up, the first applied rule is Claus_0. After such an application, the obtained sequents have the form σ_k = R_k, X_k ⇒ g, where R_k ∪ X_k is non-empty, thus rule Claus_0 cannot be applied any more; the rule applied at the top is cpl_0. Note that D contains a unique branch, consisting of the sequents ⇒ α, σ_0, ..., σ_{n−1}. In Fig. 2 we also define the pair π(D) = ⟨Ψ, χ⟩: Ψ collects the (instances of) L-axioms selected by rule Claus_1, and χ is obtained by composing the substitutions associated with the applied rules. The definition of π(T), with T a C_L-tree, is similar. By T(α; R, X ⇒ g) we denote a C_L-tree having root ⇒ α and leaf R, X ⇒ g. Given a C_L-tree T, V_T is the set of variables occurring in T. We state some properties of C_L-trees:

**Lemma 2.** *Let* T = T(α; R, X ⇒ g) *and let* π(T) = ⟨Ψ, χ⟩*.*

*(i)* V_{χ(p)} ⊆ V_α*, for every* p ∈ V_T*.*
*(ii)* R, X ⊢_i β ↔ χ(β)*, for every formula* β*.*
*(iii) If* R, X, Γ ⊢_i g *and* V_Γ ⊆ V_α*, then* Γ, χ(Ψ) ⊢_i α*.*

**Proposition 1.** *Let* D *be a* C_L*-derivation of* ⇒ α *and let* π(D) = ⟨Ψ, χ⟩*. Then,* V_{χ(Ψ)} ⊆ V_α *and* χ(Ψ) ⊢_i α*.*

*Proof.* Since D is a C_L-derivation, D has the form

$$\mathcal{D} \;=\; \begin{array}{c} \frac{R \vdash\_c g}{R, X \Rightarrow g}\ \text{cpl}\_0 \\ \vdots\ \mathcal{T} \\ \Rightarrow \alpha \end{array}$$

where T = T(α; R, X ⇒ g); note that π(T) = π(D) = ⟨Ψ, χ⟩. Since R ⊢_c g, by Lemma 1(i) we get R ⊢_i g, hence R, X ⊢_i g. We can apply Lemma 2 and conclude that V_{χ(Ψ)} ⊆ V_α and χ(Ψ) ⊢_i α.

Given a C_L-derivation D of ⇒ α, Prop. 1 shows how to extract a set of instances Ψ_α of the L-axioms such that Ψ_α ⊢_i α. If D does not contain applications of rule Claus₁, then Ψ_α is empty, and this certifies that α is IPL-valid; indeed, D can be immediately embedded into the calculus for IPL introduced in [8]. As an immediate consequence of Prop. 1, we get the soundness of C_L: if ⇒ α is provable in C_L, then α is L-valid.

Even though C_L-derivations have a simple structure, the design of a root-first proof search strategy for C_L is far from trivial. After having applied rule Claus₀ to the root sequent ⇒ α, we enter a loop where at each iteration k we search for a derivation of σ_k = R_k, X_k ⇒ g. It is convenient to first check whether R_k ⊢_c g so that, by applying rule cpl₀, we can immediately close the derivation at hand. To check classical provability, we exploit a SAT-solver; each time the solver is invoked, the set R_k has grown, thus it is advantageous to use an incremental SAT-solver. If R_k ⊬_c g, we have to apply either rule cpl₁ or rule Claus₁, but it is not obvious which strategy should be followed. First, we have to select one of the two rules. If rule cpl₁ is chosen, we have to guess proper λ and A; otherwise, we have to apply Claus₁, and this requires the selection of an instance ψ of an L-axiom. In any case, a blind choice would make the procedure highly inefficient. To guide proof search, we follow a different approach based on countermodel construction; to this aim, we introduce a representation of Kripke models where worlds are classical interpretations ordered by inclusion.

*Countermodels.* Let W be a finite set of interpretations with minimum M_0, namely M_0 ⊆ M for every M ∈ W. By K(W) we denote the Kripke model ⟨W, ≤, M_0, ϑ⟩, where ≤ coincides with the subset relation ⊆ and ϑ is the identity map; thus M ⊩ p (in K(W)) iff p ∈ M. We introduce the following *realizability relation* ▷_W between elements of W and implication clauses:

$$M \rhd\_W (a \to b) \to c \text{ iff } (a \in M) \text{ or } (b \in M) \text{ or } (c \in M) \text{ or }$$

$$(\exists M' \in W \text{ s.t. } M \subset M' \text{ and } a \in M' \text{ and } b \notin M') \text{ .}$$

By M ▷_W X we mean that M ▷_W λ for every λ ∈ X. We state the crucial properties of the model K(W):
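The realizability relation can be checked directly from the definition. Below is a small Python sketch (our illustration, not the paper's implementation), where worlds are `frozenset`s of atoms, an implication clause (a → b) → c is a triple `(a, b, c)`, and the subset order plays the role of ≤:

```python
# Realizability M |>_W (a -> b) -> c over a finite set W of worlds,
# with worlds modelled as frozensets of atoms (as in K(W), where a
# world forces exactly the atoms it contains).

def realizes(M, W, clause):
    """M |>_W (a -> b) -> c: one of a, b, c holds at M, or some strict
    superset M' of M in W contains a but not b."""
    a, b, c = clause
    if a in M or b in M or c in M:
        return True
    return any(M < Mp and a in Mp and b not in Mp for Mp in W)

def realizes_all(M, W, clauses):
    """M |>_W X: M realizes every implication clause in X."""
    return all(realizes(M, W, cl) for cl in clauses)
```

For instance, the empty world realizes `('a', 'b', 'c')` in a set W that also contains the world {a}, through the superset disjunct of the definition.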

**Proposition 2.** *Let* K(W) *be the model generated by* W *and let* w ∈ W*. Let* ϕ *be a clause and* λ = (a → b) → c *an implication clause.*

*(i) If* w' ⊨ ϕ *for every* w' ∈ W *such that* w ≤ w'*, then* w ⊩ ϕ*.*
*(ii) If* w' ⊨ b → c *and* w' ▷_W λ*, for every* w' ∈ W *such that* w ≤ w'*, then* w ⊩ λ*.*

Let K(W) be a model with root r, and assume that every interpretation w in W is a model of R; our goal is to get r ⊩ R ∪ X (where (X)∗ ⊆ R), possibly by filling W with new worlds. To this aim, we exploit Prop. 2. By our assumption and point (i), we get r ⊩ R. Suppose that there are w ∈ W and λ = (a → b) → c ∈ X such that w ⋫_W λ; is it possible to amend K(W) in order to match (ii) and conclude r ⊩ X? By definition of ▷_W, none of the atoms a, b, c belongs to w; moreover, K(W) lacks a world w' such that w ⊂ w' and a ∈ w' and b ∉ w'. We can try to fix K(W) by inserting the missing world w'; to preserve (i), we also need w' ⊨ R. Accordingly, such a w' exists if and only if R, w, a ⊬_c b. This can be checked by querying a SAT-solver; moreover, if R, w, a ⊬_c b, the solver also computes the required w'. This completion process must be iterated until K(W) has been saturated with all the missing worlds or we get stuck. It is easy to check that the process eventually terminates. This is one of the key ideas behind the procedure intuitRIL we present in the next section.
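The completion process can be sketched as follows. This is an illustrative Python rendering (names and encodings are ours): flat clauses in R are pairs `(negs, poss)` standing for ⋀negs → ⋁poss, worlds are frozensets of atoms, and a brute-force model enumerator stands in for the SAT-solver answering the query R, w, a ⊢_c b.

```python
from itertools import chain, combinations

def _realizes(w, W, cl):
    """w |>_W (a -> b) -> c, as defined above."""
    a, b, c = cl
    return (a in w or b in w or c in w
            or any(w < v and a in v and b not in v for v in W))

def _models(R, atoms):
    """Enumerate the subsets of `atoms` satisfying every clause in R."""
    for s in chain.from_iterable(combinations(sorted(atoms), r)
                                 for r in range(len(atoms) + 1)):
        M = frozenset(s)
        if all(any(n not in M for n in negs) or any(p in M for p in poss)
               for negs, poss in R):
            yield M

def complete(W, R, X, atoms):
    """Saturate W with the missing worlds; return None if we get stuck."""
    W = set(W)
    while True:
        defect = next(((w, cl) for w in W for cl in X
                       if not _realizes(w, W, cl)), None)
        if defect is None:
            return W                  # K(W) realizes every clause of X
        w, (a, b, c) = defect
        # a missing world w' containing w and a, with b outside,
        # exists iff R, w, a does not classically prove b
        wp = next((M for M in _models(R, atoms)
                   if w <= M and a in M and b not in M), None)
        if wp is None:
            return None               # stuck: no admissible missing world
        W.add(wp)
```

Each successful step strictly enlarges W within the powerset of the (finite) atom set, which is why the saturation terminates.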

## **4 The Procedure** intuitRIL

We present the procedure intuitRIL (intuit with Restart for Intermediate Logics) that, given a formula α and a logic L = IPL + Ax(L), returns either a set of L-axioms Ψ_α or a model K(W) with the following properties:

(Q1) If intuitRIL(α, L) returns Ψ_α, then Ψ_α ⊆ Ax(L, V_α) and Ψ_α ⊢_i α.

(Q2) If intuitRIL(α, L) returns K(W), then K(W) is an L-countermodel for α.

Thus, α is L-valid in the former case and not L-valid in the latter. If intuitRIL(α, L) returns Ψ_α, by tracing the computation we can build a C_L-derivation D of ⇒ α such that Ψ_α = χ(Ψ), where ⟨Ψ, χ⟩ = π(D); this certifies that Ψ_α ⊢_i α.

The procedure is described by the flowchart in Fig. 3 and exploits a single incremental SAT-solver s: clauses can be added to s but not removed; by R(s) we denote the set of clauses stored in s. The SAT-solver is queried through the operation satProve(s, A, g), asking whether g is classically derivable from R(s) ∪ A; the answer has one of the following forms:

- Yes(A'): then A' ⊆ A and R(s), A' ⊢_c g;
- No(M): then A ⊆ M ⊆ V_{R(s)} ∪ A and M ⊨ R(s) and g ∉ M.

In the former case it follows that R(s), A ⊢_c g, in the latter that R(s), A ⊬_c g.
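A toy stand-in for this interface might look as follows (our sketch: brute-force enumeration replaces the incremental SAT-solver, and no unsat-core minimization is attempted, so a Yes-answer simply returns A itself):

```python
from itertools import combinations

class IncrementalSolver:
    """Clauses (negs, poss), meaning AND(negs) -> OR(poss), can only be
    added, mimicking the incremental solver s with clause set R(s)."""

    def __init__(self, atoms):
        self.atoms, self.clauses = set(atoms), []

    def add_clause(self, negs, poss):
        self.clauses.append((frozenset(negs), frozenset(poss)))

    def sat_prove(self, A, g):
        """('No', M) with M a model of R(s) and A in which g is false,
        if one exists; otherwise ('Yes', A), i.e. R(s), A prove g."""
        A = frozenset(A)
        for r in range(len(self.atoms) + 1):
            for s in combinations(sorted(self.atoms), r):
                M = frozenset(s)
                if (A <= M and g not in M and
                        all(not (negs <= M and not (poss & M))
                            for negs, poss in self.clauses)):
                    return ('No', M)
        return ('Yes', A)
```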

**Fig. 3.** Computation of intuitRIL(α, L).

The computation of intuitRIL(α,L) consists of the following steps:



Intuitively, intuitRIL(α, L) searches for an L-countermodel K(W) for α. In the construction of K(W), whenever a conflict arises, a restart operation is triggered. A basic restart happens when it is not possible to fill the set W with a missing world (see the discussion after Prop. 2). A semantic restart is thrown when K(W) is a countermodel for α but it fails to be an L-model. In either case, the construction of K(W) restarts from scratch. However, to prevent the same kind of conflict from showing up again, new clauses are learned and fed to the SAT-solver (this complies with the DPLL(T) with learning computation paradigm [16]). If the outcome is χ(Ψ), by tracing the computation we can build a C_L-derivation D of ⇒ α such that π(D) = ⟨Ψ, χ⟩. The derivation is built bottom-up. The initial Step (S0) corresponds to the application of rule Claus₀ to the root sequent ⇒ α; basic and semantic restarts bottom-up expand the derivation by applying rules cpl₁ and Claus₁, respectively. We stress that the procedure is quite modular: to treat a specific logic L, one only has to provide a concrete implementation of Step (S4). For L = IPL, Step (S4) is trivial, since the set Ax(IPL, V) is empty. Actually, intuitRIL applied to IPL has the same behaviour as the procedure intuitR introduced in [8].
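To make the loop structure concrete, here is a minimal runnable sketch of the restart-driven search, specialized to L = IPL so that Step (S4) never learns axioms and only basic restarts occur (mirroring intuitR). The input is assumed already clausified (R flat clauses, X implication-clause triples, goal atom g); representations and names are ours, and brute-force enumeration replaces the incremental SAT-solver:

```python
from itertools import chain, combinations

def _sat_prove(R, A, g, atoms):
    """('No', M) with M a model of R and A where g is false, else ('Yes', A)."""
    for s in chain.from_iterable(combinations(sorted(atoms), r)
                                 for r in range(len(atoms) + 1)):
        M = frozenset(s)
        if (A <= M and g not in M and
                all(not (negs <= M and not (poss & M)) for negs, poss in R)):
            return ('No', M)
    return ('Yes', A)

def _realizes(w, W, cl):
    a, b, c = cl
    return (a in w or b in w or c in w
            or any(w < v and a in v and b not in v for v in W))

def prove_ipl(R, X, g, atoms):
    """Return ('Valid', learned clauses) or ('Countermodel', W)."""
    R, learned = list(R), []
    while True:                            # one pass per (basic) restart
        ans, root = _sat_prove(R, frozenset(), g, atoms)
        if ans == 'Yes':
            return ('Valid', learned)      # derivation closed by cpl_0
        W = {root}
        while True:                        # inner loop: add missing worlds
            defect = next(((w, cl) for w in W for cl in X
                           if not _realizes(w, W, cl)), None)
            if defect is None:
                return ('Countermodel', W)
            w, (a, b, c) = defect
            ans, res = _sat_prove(R, w | {a}, b, atoms)
            if ans == 'No':
                W.add(res)                 # the missing world
            else:                          # stuck: learn AND(A\{a}) -> c
                clause = (frozenset(res - {a}), frozenset({c}))
                R.append(clause)
                learned.append(clause)
                break                      # basic restart
```

For example, with R encoding a → b, X = {(a → b) → g} and goal g, the inner loop gets stuck, learns the unit clause g, restarts, and the restarted classical check immediately succeeds.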

*Example 3.* Let us consider the *Jankov axiom* **wem** = ¬a ∨ ¬¬a [2,13] (aka *weak excluded middle*), which holds in all frames having a single maximal world (thus, **wem** is GL-valid). The trace of the execution of intuitRIL(**wem**, GL) is shown in Fig. 4. The initial clausification yields (R_0, X_0, g̃), where X_0 consists of the implication clauses λ_0, λ_1 in Fig. 4 and R_0 contains the 7 clauses below:

$$
\bar{g} \to \bar{p}\_2, \quad \bar{p}\_0 \to \bar{p}\_2, \quad a \land \bar{p}\_0 \to \bot, \quad \bar{p}\_1 \to \bar{p}\_2, \quad \bar{p}\_0 \land \bar{p}\_1 \to \bot, \quad \bar{p}\_2 \to \bar{g}, \quad \bar{p}\_2 \to \bar{p}\_0 \lor \bar{p}\_1.
$$

Each row in Fig. 4 displays the validity tests performed by the SAT-solver and the computed answers. If the result is No(M), the last two columns show the worlds w_k in the current set W and, for each w_k, the list of λ such that w_k ⋫_W λ; the pair selected for the next step is underlined. For instance, after call (1) we have W = {w_0}, w_0 ⋫_W λ_0 and w_0 ⋫_W λ_1; the selected pair is ⟨w_0, λ_0⟩. After call (2), the set W is updated by adding the world w_1; we have w_1 ▷_W λ_0, w_1 ⋫_W λ_1, w_0 ▷_W λ_0 and w_0 ⋫_W λ_1. Whenever the SAT-solver outputs Yes(A), we display the learned clause ψ_k. The SAT-solver is invoked 18 times and there are 6 restarts (1 semantic, 5 basic). After (3), we get W = {w_0, w_1, w_2} and no pair ⟨w, λ⟩ can be selected, hence the model K(W) (displayed in the figure) is a countermodel for **wem**. However, K(W) is not a GL-model (indeed, it is not linear), hence we choose an instance of the linearity axiom not forced at w_0, namely ψ_0, and we force a semantic restart. The clausification of ψ_0 produces 6 new clauses and the new implication clauses λ_2, λ_3, λ_4. After each restart, the sets R_j are:

$$\begin{aligned} R\_1 &= R\_0 \cup \{ \bar{p}\_3 \to \bar{p}\_4, \, a \to \bar{p}\_5, \, \bar{p}\_3 \land \bar{p}\_5 \to a, \, a \land \bar{p}\_4 \to \bar{p}\_3, \, a \land \bar{p}\_3 \to \bot, \, \bar{p}\_4 \lor \bar{p}\_5 \}, \\ R\_j &= R\_{j-1} \cup \{ \psi\_{j-1} \} \quad \text{for } 2 \le j \le 6 \text{ (the } \psi\_j\text{'s are defined in Fig. 4)}. \end{aligned}$$

The C_GL-derivation of ⇒ ¬a ∨ ¬¬a extracted from the computation is:

$$\mathcal{D} = \begin{array}{c} \frac{R\_6 \vdash\_c \tilde{g}}{R\_6, X\_6 \Rightarrow \tilde{g}}\ \text{cpl}\_0 \\ \vdots \\ \frac{R\_1, X\_1 \Rightarrow \tilde{g}}{R\_0, X\_0 \Rightarrow \tilde{g}}\ \text{Claus}\_1(\psi\_0, \chi') \\ \frac{R\_0, X\_0 \Rightarrow \tilde{g}}{\Rightarrow \neg a \lor \neg\neg a}\ \text{Claus}\_0(\tilde{g}, \chi') \end{array}$$

Now, we discuss partial correctness and termination of intuitRIL. Let us denote by ∼_c classical equivalence (α ∼_c β iff ⊢_c α ↔ β) and by ∼_i intuitionistic equivalence (α ∼_i β iff ⊢_i α ↔ β). We introduce some notation, referring to iteration k of the main loop (†):

- Φ_k is the set collecting all the learned basic clauses;
- R_k is the set of clauses stored in the SAT-solver s;
- X_k, Ψ_k, V_k, χ_k, r_k are the values of the corresponding global variables.

In Fig. 5 we inductively define the C_L-tree T_k, having the form T(α; R_k, X_k ⇒ g). In the application of rule Claus₀, g and χ' are defined as in Step (S0). In rule cpl₁, λ is the implication clause selected at iteration k − 1 (of the main loop) in the last execution of Step (S3); A is the value computed at Step (S6) of iteration k − 1. In the application of rule Claus₁, ψ and χ' are defined as in the execution of Steps (S4) and (S5) of iteration k − 1. One can easily check that the applications of the rules are sound. If Step (S1) yields Yes(∅), we can turn T_k into a C_L-derivation by applying rule cpl₀.

The next lemma states some relevant properties of the computations of intuitRIL.

**Lemma 3.** *Let us consider the execution of iteration* k *of the main loop (*k ≥ 0*).*

*(i)* (X_k)∗ ∪ Φ_k ⊆ R_k*.*
*(ii)* V_k = V_{T_k} *and* Ψ_k ⊆ Ax(L, V_k) *and* π(T_k) = ⟨Ψ_k, χ_k⟩*.*
*(iii)* V_{χ_k(p)} ⊆ V_α*, for every* p ∈ V_k*, and* R_k, X_k ⊢_i β ↔ χ_k(β)*, for every* β*.*



**Fig. 4.** Computation of intuitRIL(¬a ∨ ¬¬a, GL).

$$\mathcal{T}\_0 = \frac{R\_0, X\_0 \Rightarrow g}{\Rightarrow \alpha}\ \text{Claus}\_0(g, \chi')$$

$$\mathcal{T}\_k = \begin{array}{c} \frac{R\_{k-1}, A \vdash\_c b \quad R\_k, X\_k \Rightarrow g}{R\_{k-1}, X\_{k-1} \Rightarrow g}\ \text{cpl}\_1(\lambda) \\ \vdots\ \mathcal{T}\_{k-1} \\ \Rightarrow \alpha \end{array} \qquad \mathcal{T}\_k = \begin{array}{c} \frac{R\_k, X\_k \Rightarrow g}{R\_{k-1}, X\_{k-1} \Rightarrow g}\ \text{Claus}\_1(\psi, \chi') \\ \vdots\ \mathcal{T}\_{k-1} \\ \Rightarrow \alpha \end{array}$$

**Fig. 5.** Definition of <sup>T</sup><sup>k</sup> (<sup>k</sup> <sup>≥</sup> 0).


*Proof.* We only sketch the proof of the non-trivial points. *(iii)* By Lemma 2 applied to T_k.

*(v)* Every interpretation M generated at Step (S6) is a superset of r_k, thus after Step (S2) r_k is the minimum element of W and the root of K(W). By (iv) and Prop. 2(i), r_k ⊩ R_k. Since g ∉ r_k, we get r_k ⊮ g.

*(vi)* At Step (S4), w ▷_W λ for every w ∈ W and λ ∈ X_k. Since (X_k)∗ ⊆ R_k, by Prop. 2(ii) we get r_k ⊩ X_k. Let ψ ∈ Ψ_k; then ψ has been learned at some iteration k' < k. Let (R', X', χ') be the output of Clausify(ψ, V) at Step (S5) of iteration k'. Since R' ⊆ R_k and X' ⊆ X_k, it holds that r_k ⊩ R' ∪ X'. By (P1), R', X' ⊢_i ψ, hence r_k ⊩ ψ, which proves r_k ⊩ Ψ_k.

*(vii)* Let ϕ = ⋀(A \ {a}) → c be the clause learned at iteration k and let ϕ' ∈ Φ_k; we show that ϕ ≁_c ϕ'. There are w ∈ W and λ = (a → b) → c ∈ X_k such that ⟨w, λ⟩ has been selected at Step (S3) and the outcome of satProve(s, w ∪ {a}, b) at Step (S6) is Yes(A). Note that w ⋫_W λ, hence c ∉ w; since A ⊆ w ∪ {a}, we get w ⊭ ϕ. On the other hand, w ⊨ ϕ', since ϕ' ∈ Φ_k and Φ_k ⊆ R_k. We conclude ϕ ≁_c ϕ'.

*(viii)* Let ψ be the axiom selected at Step (S4) of iteration k, let ψ' ∈ Ψ_k, and let K(W) be the model obtained at Step (S4) of iteration k. By (iii), R_k, X_k ⊢_i ψ ↔ χ_k(ψ) and R_k, X_k ⊢_i ψ' ↔ χ_k(ψ'). Since r_k ⊮ ψ and r_k ⊩ ψ' (indeed, ψ' ∈ Ψ_k and r_k ⊩ Ψ_k) and r_k ⊩ R_k ∪ X_k, we get r_k ⊮ χ_k(ψ) and r_k ⊩ χ_k(ψ'). We conclude χ_k(ψ) ≁_i χ_k(ψ').

The following proposition proves the partial correctness of intuitRIL:

**Proposition 3.** intuitRIL(α, L) *satisfies properties (Q1) and (Q2).*

*Proof.* Let us assume that the computation ends at iteration k with output Ψ_α. Then, the call to the SAT-solver at Step (S0) yields Yes(∅), meaning that R_k ⊢_c g. We can build the following C_L-derivation D of ⇒ α:

$$\mathcal{D} = \begin{array}{c} \frac{R\_k \vdash\_c g}{R\_k, X\_k \Rightarrow g}\ \text{cpl}\_0 \\ \vdots\ \mathcal{T}\_k \\ \Rightarrow \alpha \end{array} \qquad \pi(\mathcal{D}) = \pi(\mathcal{T}\_k) = \langle \Psi\_k, \chi\_k \rangle$$

Note that Ψ_α = χ_k(Ψ_k). Accordingly, by Prop. 1 we get (Q1).

Let us assume that the output is the model K(W), having root r. Then K(W) is an L-model (otherwise, Step (S4) would have forced a semantic restart). By Lemma 3(vi) we get r ⊩ R_0 ∪ X_0 and r ⊮ g. Since at Step (S0) we have clausified the formula α ↔ g, by (P1) we get R_0, X_0 ⊢_i α ↔ g, which implies r ⊩ α ↔ g. We conclude that r ⊮ α, hence (Q2) holds.

It seems challenging to provide a general proof of termination, and each logic must be treated separately. We can only state some general properties about the termination of the inner loop and of consecutive basic restarts.

**Proposition 4.** *(i) The inner loop is terminating.*
*(ii) The number of consecutive basic restarts is finite.*

*Proof.* Let us assume, towards a contradiction, that the inner loop is not terminating. For every j ≥ 0, by W_j we denote the value of W at Step (S3) of iteration j of the inner loop; note that the value of the variable V does not change during the iterations. We show that W_j ⊂ W_{j+1}, for every j ≥ 0. At iteration j, the outcome of Step (S6) is No(M). Thus, there are w ∈ W_j and λ = (a → b) → c ∈ X such that the pair ⟨w, λ⟩ has been selected at Step (S3); accordingly, w ⋫_{W_j} λ and w ∪ {a} ⊆ M and b ∉ M. We have M ∉ W_j, otherwise we would get w ▷_{W_j} λ, a contradiction. Since W_{j+1} = W_j ∪ {M}, this proves that W_j ⊂ W_{j+1}. We have shown that W_0 ⊂ W_1 ⊂ W_2 ⊂ ⋯, which yields a contradiction since, for every j ≥ 0 and every w ∈ W_j, w is a subset of V and V is finite. We conclude that the inner loop is terminating, and this proves (i).

Let us assume, towards a contradiction, that there is an infinite sequence of consecutive basic restarts. Then, there is n ≥ 0 such that, for every k ≥ n, iteration k of the main loop ends with a basic restart. Let ϕ_k be the clause learned at iteration k. Note that an iteration ending with a basic restart does not introduce new atoms, thus V_{ϕ_k} ⊆ V_n for every k ≥ n (where V_n is defined as in (†)). We get a contradiction, since V_n is finite and, by Lemma 3(vii), the clauses ϕ_k are pairwise non-∼_c-equivalent; this proves (ii).

Lemma 3(viii) guarantees that the learned axioms are pairwise distinct, but this is not sufficient to prove termination, since in general we cannot set a bound on the size and number of learned axioms. In the next section we present some relevant logics for which the procedure is terminating.

### **5 Termination**

Let GL = IPL + **lin** be the Gödel-Dummett logic presented in Ex. 1; we show that every call intuitRIL(α, GL) is terminating. To this aim, we exploit the bounding function Ax_GL(α) presented in the mentioned example.

**Lemma 4.** *Let us consider the computation of* intuitRIL(α, GL) *and assume that at iteration* k *of the main loop Step (S4) is executed and that the obtained model* K(W) *is not linear. Then, there exists* ψ ∈ Ax_GL(α) *such that* r_k ⊮ ψ*.*

*Proof.* Let us assume that K(W) has two distinct maximal worlds w_1 and w_2; note that w_1 ⊆ V_k and w_2 ⊆ V_k (with V_k defined as in (†)). We show that:

(a) w_1 ∩ V_α ≠ w_2 ∩ V_α.

Suppose by contradiction that w_1 ∩ V_α = w_2 ∩ V_α; let p ∈ V_k and β = χ_k(p) (with χ_k defined as in (†)). By Lemma 3(iii), R_k, X_k ⊢_i p ↔ β; by Lemma 3(vi) we get w_1 ⊩ p ↔ β and w_2 ⊩ p ↔ β. Since V_β ⊆ V_α (see Lemma 3(iii)) and we are assuming w_1 ∩ V_α = w_2 ∩ V_α, it holds that w_1 ⊩ β iff w_2 ⊩ β, thus w_1 ⊩ p iff w_2 ⊩ p, namely p ∈ w_1 iff p ∈ w_2. Since p is an arbitrary element of V_k, we get w_1 = w_2, a contradiction; this proves (a). By (a) there is a ∈ V_α such that either a ∈ w_1 \ w_2 or a ∈ w_2 \ w_1. We consider the former case (the latter one is symmetric), corresponding to Case 1 in Fig. 6. We have w_1 ⊩ a and w_2 ⊩ ¬a; setting ψ = (a → ¬a) ∨ (¬a → a), we conclude r_k ⊮ ψ.

Assume now that K(W) has only one maximal world; since it is not linear, there are three distinct worlds w_1, w_2, w_3 as in Case 2 in Fig. 6, namely: w_1 is an immediate successor of w_2 and w_3 (i.e., for j ∈ {2, 3}, w_j < w_1 and, if w_j < w, then w_1 ≤ w), w_2 ≰ w_3, and w_3 ≰ w_2. Reasoning as in (a), we get:

$$\text{(b) } w\_2 \cap \mathcal{V}\_\alpha \neq w\_3 \cap \mathcal{V}\_\alpha. \qquad \text{(c) } w\_2 \cap \mathcal{V}\_\alpha \subset w\_1 \cap \mathcal{V}\_\alpha \text{ and } w\_3 \cap \mathcal{V}\_\alpha \subset w\_1 \cap \mathcal{V}\_\alpha.$$

By (b) there is a ∈ V_α such that either a ∈ w_2 \ w_3 or a ∈ w_3 \ w_2. Let us consider the former case (the latter one is symmetric). By (c), there is b ∈ V_α such that b ∈ w_1 \ w_2. If b ∈ w_3 (Case 2.1 in Fig. 6), we get a ∈ w_2, b ∉ w_2, a ∉ w_3, b ∈ w_3. Setting ψ = (a → b) ∨ (b → a), we conclude r_k ⊮ ψ. Finally, let us assume b ∉ w_3 (Case 2.2). We have {a, b} ⊆ w_1, a ∈ w_2, b ∉ w_2, a ∉ w_3 and b ∉ w_3. It is easy to check that w_3 ⊩ a → b (recall that w_3 < w implies w_1 ≤ w), thus w_3 ⊮ (a → b) → a. On the other hand, w_2 ⊮ a → (a → b). Setting ψ = (a → (a → b)) ∨ ((a → b) → a), we get r_k ⊮ ψ.

We exploit Lemma 4 to implement Step (S4). If K(W) is linear, then K(W) is a GL-model and we are done. Otherwise, the proof of Lemma 4 suggests an effective method to select an instance ψ of **lin** from Ax_GL(α).
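For illustration, the selection dictated by Case 1 of Lemma 4 (two distinct maximal worlds) can be coded as follows; this is our partial sketch (Cases 2.1 and 2.2 are left out), with worlds as frozensets of atoms and the chosen instance of **lin** returned as a plain string:

```python
def select_lin_instance(W, V_alpha):
    """None if the worlds in W form a chain (K(W) is linear); otherwise,
    when K(W) has two distinct maximal worlds, the instance
    (a -> ~a) | (~a -> a) of lin given by Case 1 of Lemma 4."""
    if all(u <= v or v <= u for u in W for v in W):
        return None                       # linear: K(W) is a GL-model
    maximal = [w for w in W if not any(w < v for v in W)]
    if len(maximal) < 2:
        raise NotImplementedError("single maximal world: Cases 2.1/2.2")
    w1, w2 = sorted(maximal, key=sorted)[:2]
    # by point (a) of the proof, the maximal worlds differ on V_alpha
    a = min((w1 ^ w2) & V_alpha)
    return f"(({a} -> ~{a}) | (~{a} -> {a}))"
```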

**Proposition 5.** *The computation of* intuitRIL(α, GL) *is terminating.*

*Proof.* Assume that intuitRIL(α, GL) is not terminating. Since the number of iterations of the inner loop and of consecutive basic restarts is finite (see Prop. 4), Step (S4) must be executed infinitely many times. This leads to a contradiction, since the axioms selected at Step (S4) are pairwise distinct (see Lemma 3(viii)) and such axioms are chosen from the finite set Ax_GL(α).

**Fig. 6.** Proof of Lemma 4, case analysis.

As a corollary, we get that AxGL(α) is a bounding function for GL:

**Proposition 6.** *If* α *is* GL*-valid, there is* Ψ_α ⊆ Ax_GL(α) *such that* Ψ_α ⊢_i α*.*

Other proof-search strategies for GL are discussed in [10,14]. This technique can be extended to other notable intermediate logics. Among these, we recall the logics GL_n (Gödel logic of depth n), obtained by adding to GL the axioms **bd**_n (bounded depth), where **bd**_0 = a_0 ∨ ¬a_0 and **bd**_{n+1} = a_{n+1} ∨ (a_{n+1} → **bd**_n). Semantically, GL_n is the logic characterized by linear frames having depth at most n. We are not able to prove termination for the logics IPL + **bd**_n, but we can implement the following terminating strategy for GL_n. Let K(W) be the model obtained at Step (S4) of the computation of intuitRIL(α, GL_n):


Another logic for which the procedure terminates is the Jankov logic (see Ex. 3); also in this case, the learned axiom can be chosen by renaming the **wem** axiom. In general, all the logics BTW_n (bounded top width: at most n maximal worlds, see [2]) are terminating. An intriguing case is the Scott logic ST [2]: even though the class of ST-frames is not first-order definable, we can implement a learning procedure for ST-axioms arguing as in [7] (see Sect. 2.5.2). Some of the mentioned logics have been implemented in intuitRIL¹.
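The recursive definition of the bounded-depth axioms **bd**_n given above is easy to materialize; the following lines (our illustration, with formulas rendered as plain strings rather than the paper's syntax) build **bd**_n for any n:

```python
def bd(n):
    """bd_0 = a0 | ~a0,  bd_{n+1} = a_{n+1} | (a_{n+1} -> bd_n)."""
    if n == 0:
        return "(a0 | ~a0)"
    return f"(a{n} | (a{n} -> {bd(n - 1)}))"
```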

One may wonder whether this method can be applied to other non-classical logics or to fragments of predicate logics (these issues were already raised in the seminal paper [4]). A significant work in this direction is [11], where the procedure has been applied to some modal logics. However, the main difference with the original approach is that it is not possible to use a single SAT-solver: one needs a supply of SAT-solvers. This is primarily due to the fact that the forcing relation of modal Kripke models is not persistent; thus, worlds are loosely related and must be handled by independent solvers.

<sup>1</sup> Available at https://github.com/cfiorentini/intuitRIL.

### **References**



# **Clause Redundancy and Preprocessing in Maximum Satisfiability**

Hannes Ihalainen, Jeremias Berg(B), and Matti Järvisalo

HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland hannes.ihalainen@helsinki.fi, jeremias.berg@helsinki.fi, matti.jarvisalo@helsinki.fi

**Abstract.** The study of clause redundancy in Boolean satisfiability (SAT) has proven significant in various respects, from fundamental insights into preprocessing and inprocessing to the development of practical proof checkers and new types of strong proof systems. We study liftings of the recently-proposed notion of propagation redundancy, based on a semantic implication relationship between formulas, in the context of maximum satisfiability (MaxSAT), where reasoning techniques that preserve optimal cost (in contrast to preserving satisfiability in the realm of SAT) are of interest. We establish that the strongest MaxSAT-lifting of propagation redundancy allows for changing the set of minimal correction sets of MaxSAT instances in a controlled way. This ability is key in succinctly expressing MaxSAT reasoning techniques and allows for obtaining correctness proofs in a uniform way for MaxSAT reasoning techniques very generally. Bridging theory to practice, we also provide a new MaxSAT preprocessor incorporating such extended techniques, and show through experiments its wide applicability in improving the performance of modern MaxSAT solvers.

**Keywords:** Maximum satisfiability · Clause redundancy · Propagation redundancy · Preprocessing

# **1 Introduction**

Building heavily on the success of Boolean satisfiability (SAT) solving [13], maximum satisfiability (MaxSAT) as the optimization extension of SAT constitutes a viable approach to solving real-world NP-hard optimization problems [6,35]. In the context of SAT, the study of fundamental aspects of clause redundancy [20,21,23,28,29,31,32] has proven central for developing novel types of preprocessing and inprocessing-style solving techniques [24,29], as well as for enabling efficient proof checkers [7,15,16,18,19,41,42] via succinct representation of most practical SAT solving techniques. Furthermore, clause redundancy notions have

© The Author(s) 2022

Work financially supported by Academy of Finland under grants 322869, 328718 and 342145. The authors wish to thank the Finnish Computing Competence Infrastructure (FCCI) for supporting this project with computational and data storage resources.

J. Blanchette et al. (Eds.): IJCAR 2022, LNAI 13385, pp. 75–94, 2022. https://doi.org/10.1007/978-3-031-10769-6_6

been shown to give rise to very powerful proof systems, going far beyond resolution [22,23,30]. In contrast to viewing clause redundancy through the lens of logical entailment, the redundancy criteria developed in this line of work are based on a semantic implication relationship between formulas, which makes them desirably efficient to decide while guaranteeing only the preservation of satisfiability rather than logical equivalence.

The focus of this work is the study of clause redundancy in the context of MaxSAT, through lifting recently-proposed variants of the notion of *propagation redundancy* [23], based on a semantic implication relationship between formulas, from the realm of SAT. The study of such liftings is motivated from several perspectives. Firstly, it has been shown earlier that SRAT [10], a natural MaxSAT-lifting of the notion of *resolution asymmetric tautologies* (RAT) [29], allows for establishing the general correctness of MaxSAT-liftings of typical preprocessing techniques in SAT solving [14], alleviating the need for correctness proofs for individual preprocessing techniques [8]. However, the need for preserving the *optimal cost* in MaxSAT (as the natural counterpart of preserving satisfiability in SAT) allows for developing MaxSAT-centric preprocessing and solving techniques which cannot be expressed through SRAT [2,11]. Capturing such cost-aware techniques more generally requires more expressive notions of clause redundancy. Secondly, due to the fundamental connections between solutions and so-called minimal correction sets (MCSes) of MaxSAT instances [8,25], analyzing how clauses that are redundant in terms of expressive redundancy notions affect the MCSes of MaxSAT instances can provide further understanding of the relationship between the different notions and of their fundamental impact on the solutions of MaxSAT instances. Furthermore, in analogy with SAT, more expressive redundancy notions may prove fruitful for developing further practical preprocessing and solving techniques for MaxSAT.

Our main contributions are the following. We propose natural liftings to MaxSAT of the three recently-proposed variants PR, LPR and SPR of propagation redundancy in the context of SAT. We provide a complete characterization of the relative expressiveness of the lifted notions CPR, CLPR and CSPR (C standing for cost) and of their impact on the set of MCSes of MaxSAT instances. In particular, while removing or adding clauses redundant in terms of CSPR and CLPR (the latter shown to be equivalent to SRAT) does not influence the set of MCSes underlying MaxSAT instances, CPR can in fact have an influence on MCSes. In terms of solutions, this result implies that CSPR or CLPR clauses cannot remove minimal (in terms of sum of weights of falsified soft clauses) solutions of MaxSAT instances, while CPR clauses can.

The theoretically greater effect that CPR clauses have on the solutions of MaxSAT instances is key for succinctly expressing further MaxSAT reasoning techniques via CPR and allows for obtaining correctness proofs in a uniform way for MaxSAT reasoning techniques very generally; we give concrete examples of how CPR captures techniques not within the reach of SRAT. Bridging to practical preprocessing in MaxSAT, we also provide a new MaxSAT preprocessor extended with such techniques. Finally, we provide large-scale empirical evidence on the positive impact of the preprocessor on the runtimes of various modern MaxSAT solvers, covering both complete and incomplete approaches, suggesting that integrating extensive preprocessing beyond the scope of SRAT can speed up modern MaxSAT solvers.

An extended version of this paper, including the formal proofs omitted here, is available via the authors' homepages.

### **2 Preliminaries**

**SAT.** For a Boolean variable x there are two literals, the positive x and the negative ¬x, with ¬¬l = l for a literal l. A clause C is a set (disjunction) of literals and a CNF formula F a set (conjunction) of clauses. We assume that all clauses are non-tautological, i.e., do not contain both a literal and its negation. The set var(C) = {x | x ∈ C or ¬x ∈ C} consists of the variables of the literals in C. The sets of variables and literals of a formula are var(F) = ⋃_{C ∈ F} var(C) and lit(F) = ⋃_{C ∈ F} C, respectively. For a set L of literals, the set ¬L = {¬l | l ∈ L} consists of the negations of the literals in L.

A *(truth) assignment* τ is a set of literals for which x ∉ τ or ¬x ∉ τ for any variable x. For a literal l we denote l ∈ τ by τ(l) = 1 and ¬l ∈ τ by τ(l) = 0 or τ(¬l) = 1 as convenient, and say that τ assigns l the value 1 and 0, respectively. The set var(τ) = {x | x ∈ τ or ¬x ∈ τ} is the range of τ, i.e., it consists of the variables to which τ assigns a value. For a set L of literals and an assignment τ, the assignment τL = (τ \ ¬L) ∪ L is obtained from τ by setting τL(l) = 1 for all l ∈ L and τL(l) = τ(l) for all l ∉ L assigned by τ. For a literal l, τl stands for τ{l}. An assignment τ satisfies a clause C (τ(C) = 1) if τ ∩ C ≠ ∅, or equivalently if τ(l) = 1 for some l ∈ C, and satisfies a CNF formula F (τ(F) = 1) if it satisfies each clause C ∈ F. A CNF formula is satisfiable if there is an assignment that satisfies it, and otherwise unsatisfiable. The empty formula is satisfied by any truth assignment and the empty clause ⊥ is unsatisfiable. The Boolean satisfiability problem (SAT) asks to decide whether a given CNF formula F is satisfiable.

Given two CNF formulas F1 and F2, F1 entails F2 (F1 ⊨ F2) if any assignment τ that satisfies F1 and only assigns variables of F1 (i.e., for which var(τ) ⊂ var(F1)) can be extended into an assignment τ′ ⊃ τ that satisfies F2. The formulas are equisatisfiable if F1 is satisfiable iff F2 is. An assignment τ is complete for a CNF formula F if var(F) ⊂ var(τ), and otherwise partial for F. The restriction F|τ of F wrt a partial assignment τ is a CNF formula obtained by (i) removing from F all clauses that are satisfied by τ and (ii) removing from the remaining clauses of F the literals l for which τ(l) = 0. Applying unit propagation on F refers to iteratively restricting F by τ = {l} for a unit clause (a clause with a single literal) (l) ∈ F until the resulting (unique) formula, denoted by UP(F), contains no unit clauses or some clause in F becomes empty. We say that unit propagation on F derives a conflict if UP(F) contains the empty clause. The formula F1 implies F2 under unit propagation (F1 ⊢₁ F2) if, for each C ∈ F2, unit propagation derives a conflict in F1 ∧ {(¬l) | l ∈ C}. Note that F1 ⊢₁ F2 implies F1 ⊨ F2, but not vice versa in general.

**Maximum Satisfiability.** An instance F = (FH, FS, w) of (weighted partial) maximum satisfiability (MaxSAT for short) consists of two CNF formulas, the hard clauses FH and the soft clauses FS, and a weight function w: FS → N that assigns a positive weight to each soft clause.

Without loss of generality, we assume that every soft clause C ∈ FS is unit¹. The set of *blocking* literals B(F) = {l | (¬l) ∈ FS} consists of the literals l the negation of which occurs in FS. The weight function w is extended to blocking literals by w(l) = w((¬l)). Without loss of generality, we also assume that l ∈ lit(FH) for all l ∈ B(F)². Instead of using the definition of MaxSAT in terms of hard and soft clauses, we will from now on view a MaxSAT instance F = (FH, B(F), w) as a set FH of hard clauses, a set B(F) of blocking literals, and a weight function w: B(F) → N.

Any complete assignment τ over var(FH) that satisfies FH is a solution to F. The cost COST(F, τ) = Σ_{l ∈ B(F)} τ(l)·w(l) of a solution τ is the sum of the weights of the blocking literals it assigns to 1³. The cost of a complete assignment τ that does not satisfy FH is defined as ∞. The cost of a partial assignment τ over var(FH) is defined as the cost of the smallest-cost assignments that are extensions of τ. A solution τᵒ is optimal if COST(F, τᵒ) ≤ COST(F, τ) holds for all solutions τ of F. The cost of the optimal solutions of a MaxSAT instance is denoted by COST(F), with COST(F) = ∞ iff FH is unsatisfiable. In MaxSAT the task is to find an optimal solution to a given MaxSAT instance.

*Example 1.* Let F = (FH, B(F), w) be a MaxSAT instance with FH = {(x ∨ b1), (¬x ∨ b2), (y ∨ b3 ∨ b4), (z ∨ ¬y ∨ b4), (¬z)} and B(F) = {b1, b2, b3, b4}, having w(b1) = w(b4) = 1, w(b2) = 2 and w(b3) = 8. The assignment τ = {b1, b4, ¬b2, ¬b3, ¬x, ¬z, y} is an example of an optimal solution of F and has COST(F, τ) = COST(F) = 2.
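Example 1 is small enough to check by exhaustive search. The sketch below uses our own encoding (x, y, z, b1..b4 as the integers 1..7; a complete assignment is a set containing v or -v for each variable v); the helper names are ours.

```python
from itertools import product

HARD = [{1, 4}, {-1, 5}, {2, 6, 7}, {3, -2, 7}, {-3}]   # F_H of Example 1
WEIGHT = {4: 1, 5: 2, 6: 8, 7: 1}                       # w(b1), w(b2), w(b3), w(b4)
VARS = range(1, 8)

def cost(tau):
    """COST(F, τ): ∞ if τ falsifies a hard clause, otherwise the summed
    weight of the blocking literals that τ assigns to 1."""
    if any(not (c & tau) for c in HARD):
        return float("inf")
    return sum(w for b, w in WEIGHT.items() if b in tau)

def optimum():
    """COST(F): the cost of an optimal solution, by brute force."""
    return min(
        cost({v if bit else -v for v, bit in zip(VARS, bits)})
        for bits in product([False, True], repeat=len(VARS))
    )

tau = {-1, 2, -3, 4, -5, -6, 7}     # τ = {¬x, y, ¬z, b1, ¬b2, ¬b3, b4}
print(cost(tau))                    # 2
print(optimum())                    # 2, matching COST(F) = 2
```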

With a slight abuse of notation, we denote by F ∧ C = (FH ∪ {C}, B(F ∧ C), w) the MaxSAT instance obtained by adding a clause C to an instance F = (FH, B(F), w). Adding clauses may introduce new blocking literals but does not change the weights of already existing ones, i.e., B(F) ⊂ B(F ∧ C) and the weight functions of F and F ∧ C agree on B(F).

**Correction Sets.** For a MaxSAT instance F, a subset cs ⊂ B(F) is a minimal correction set (MCS) of F if (i) FH ∧ ⋀_{l ∈ B(F)\cs}(¬l) is satisfiable and (ii) FH ∧ ⋀_{l ∈ B(F)\csₛ}(¬l) is unsatisfiable for every csₛ ⊊ cs. In words, cs is an MCS if it

¹ A soft clause C can be replaced by the hard clause C ∨ x and the soft clause (¬x), where x is a variable not in var(FH ∧ FS), without affecting the costs of solutions.

² Otherwise the instance can be simplified by unit propagating ¬l without changing the costs of solutions. As a consequence, any complete assignment for FH will be complete for FH ∧ FS as well.

³ This is equivalent to the sum of the weights of the soft clauses not satisfied by τ.

is a subset-minimal set of blocking literals that is included in some solution τ of F⁴. We denote the set of MCSes of F by mcs(F).

There is a tight connection between the MCSes and the solutions of MaxSAT instances. Given an optimal solution τᵒ of a MaxSAT instance F, the set τᵒ ∩ B(F) is an MCS of F. In the other direction, for any cs ∈ mcs(F), there is a (not necessarily optimal) solution τ_cs such that cs = B(F) ∩ τ_cs and COST(F, τ_cs) = Σ_{l ∈ cs} w(l).

*Example 2.* Consider the instance F from Example 1. The set {b1, b4} ∈ mcs(F) is an MCS of F that corresponds to the optimal solution τ described in Example 1. The set {b2, b3} ∈ mcs(F) is another example of an MCS, which instead corresponds to the solution τ2 = {b2, b3, ¬b1, ¬b4, x, ¬z, ¬y} for which COST(F, τ2) = 10.
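The MCS definition can likewise be checked by brute force on Example 1 (same integer encoding as before: x, y, z, b1..b4 as 1..7). Everything here is our own illustrative sketch; note that enumeration finds four MCSes in total, including the two named in Example 2.

```python
from itertools import product

HARD = [{1, 4}, {-1, 5}, {2, 6, 7}, {3, -2, 7}, {-3}]
BLOCKING = [4, 5, 6, 7]
VARS = range(1, 8)

def satisfiable(formula):
    return any(
        all(c & {v if bit else -v for v, bit in zip(VARS, bits)} for c in formula)
        for bits in product([False, True], repeat=len(VARS))
    )

def feasible(cs):
    """Condition (i): F_H plus (¬l) for every blocking literal outside cs."""
    return satisfiable(HARD + [{-b} for b in BLOCKING if b not in cs])

def is_mcs(cs):
    # Feasibility is monotone in cs, so checking the maximal proper
    # subsets of cs suffices for condition (ii).
    return feasible(cs) and all(not feasible(cs - {b}) for b in cs)

mcses = [cs for bits in product([False, True], repeat=4)
         if is_mcs(cs := {b for b, bit in zip(BLOCKING, bits) if bit})]
print(sorted(map(sorted, mcses)))
# [[4, 6], [4, 7], [5, 6], [5, 7]]: includes {b1,b4} and {b2,b3} from Example 2
```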

#### **3 Propagation Redundancy in MaxSAT**

We extend recent work [23] on characterizing redundant clauses via semantic implication from the context of SAT to MaxSAT. In particular, we provide natural MaxSAT counterparts for several recently-proposed strong notions of redundancy in SAT and analyze the relationships between them.

In the context of SAT, the most general notion of clause redundancy is seemingly simple: a clause C is redundant for a formula F if it does not affect its satisfiability, i.e., clause C is redundant wrt a CNF formula F if F and F ∧ {C} are equisatisfiable [20,29]. This allows for the set of satisfying assignments to change, and does not require preserving logical equivalence; we are only interested in satisfiability.

A natural counterpart for this general view in MaxSAT is that the *cost* of optimal solutions (rather than the set of optimal solutions) should be preserved.

**Definition 1.** *A clause* <sup>C</sup> *is redundant wrt a MaxSAT instance* <sup>F</sup> *if* COST(F) = COST(F ∧ C)*.*

This coincides with the counterpart in SAT whenever B(F) = ∅, since then the cost of a MaxSAT instance F is either 0 (if F<sup>H</sup> is satisfiable) or ∞ (if F<sup>H</sup> is unsatisfiable). Unless explicitly specified, we will use the term "redundant" to refer to Definition 1.
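Definition 1 is directly checkable by comparing optimal costs. Below is a brute-force sketch on a two-variable toy instance of our own (F_H = {(b1 ∨ b2)} with w(b1) = w(b2) = 1, b1 and b2 encoded as 1 and 2); the helper names are ours.

```python
from itertools import product

WEIGHT = {1: 1, 2: 1}
VARS = [1, 2]

def opt_cost(hard):
    """COST of the instance (hard, WEIGHT) by exhaustive search."""
    best = float("inf")
    for bits in product([False, True], repeat=len(VARS)):
        tau = {v if bit else -v for v, bit in zip(VARS, bits)}
        if all(c & tau for c in hard):
            best = min(best, sum(w for b, w in WEIGHT.items() if b in tau))
    return best

def is_redundant(hard, clause):
    """Definition 1: C is redundant wrt F iff COST(F) = COST(F ∧ C)."""
    return opt_cost(hard) == opt_cost(hard + [clause])

hard = [{1, 2}]
print(is_redundant(hard, {-1}))    # True: both costs are 1
print(is_redundant(hard, set()))   # False: the empty clause makes F_H unsatisfiable
```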

Following [23], we say that a clause C *blocks* the assignment ¬C (and all assignments τ for which ¬C ⊂ τ ). As shown in the context of SAT [23], a clause C is redundant (in the equisatisfiability sense) for a CNF formula F if C does not block all of its satisfying assignments. The counterpart that arises in the context of MaxSAT from Definition 1 is that the cost of at least one of the solutions not blocked by C is no greater than the cost of ¬C.

**Proposition 1.** *A clause* <sup>C</sup> *is redundant wrt a MaxSAT instance* <sup>F</sup> *if and only if there is an assignment* τ *for which* COST(F ∧ C, τ ) = COST(F, τ ) ≤ COST(F,¬C)*.*

⁴ This is equivalent to a subset-minimal set of soft clauses falsified by τ.

The equality COST(F ∧ C, τ ) = COST(F, τ ) of Proposition 1 is necessary, as witnessed by the following example.

*Example 3.* Consider the MaxSAT instance F detailed in Example 1, the clause C = (b5) with b5 ∈ B(F ∧ C) and the assignment τ = {b5}. Then 2 = COST(F, τ) ≤ COST(F, ¬C) = 2, but C is not redundant since COST(F ∧ C) = 2 + w(b5) > 2 = COST(F), where w(b5) > 0 is the weight b5 receives in F ∧ C.

Proposition 1 provides a sufficient condition for a clause C being redundant. Further requirements on the assignment τ can be imposed without loss of generality.

**Theorem 1.** *A non-empty clause C is redundant wrt a MaxSAT instance F = (FH, B(F), w) if and only if there is an assignment τ such that (i)* τ(C) = 1*, (ii)* FH|¬C ⊨ FH|τ *and (iii)* COST(F ∧ C, τ) = COST(F, τ) ≤ COST(F, ¬C)*.*

As we will see later, a reason for including the two additional conditions in Theorem 1 is that they allow defining different restrictions of the redundancy notion, some of which make redundant clauses efficiently identifiable.

*Example 4.* Consider the instance F = (FH, B(F), w) detailed in Example 1, a clause C = (¬x ∨ b5) for a b5 ∈ B(F ∧ C), and an assignment τ = {¬x, b1}. Then: τ(C) = 1, {(b2), (y ∨ b3 ∨ b4), (z ∨ ¬y ∨ b4), (¬z)} = FH|¬C ⊨ FH|τ = {(y ∨ b3 ∨ b4), (z ∨ ¬y ∨ b4), (¬z)}, and 2 = COST(F ∧ C, τ) = COST(F, τ) ≤ COST(F, ¬C) = 3. We conclude that C is redundant.

In the context of SAT, imposing restrictions on the entailment operator and the set of assignments has been shown to give rise to several interesting redundancy notions which hold promise of practical applicability. These include three variants (LPR, SPR, and PR) of so-called (literal/set) propagation redundancy [23]. For completeness we restate the definitions of these three notions. A clause C is LPR wrt a CNF formula F if there is a literal l ∈ C for which F|¬C ⊢₁ F|(¬C)l, SPR if the same holds for τ = (¬C)L for a subset L ⊂ C, and PR if there exists an assignment τ that satisfies C and for which F|¬C ⊢₁ F|τ. With the help of Theorem 1, we obtain counterparts for these notions in the context of MaxSAT.

**Definition 2.** *With respect to an instance* F = (FH, B(F), w)*, a clause* C *is*

- *CPR if there is an assignment* τ *such that (i)* τ(C) = 1*, (ii)* FH|¬C ⊢₁ FH|τ *and (iii)* COST(F ∧ C, τ) = COST(F, τ) ≤ COST(F, ¬C)*;*
- *CSPR if the above holds for the assignment* τ = (¬C)L *for some set* L ⊂ C*;*
- *CLPR if the above holds for the assignment* τ = (¬C)l *for some literal* l ∈ C*.*
*Example 5.* Consider again F = (FH, B(F), w) from Example 1. The clause D = (b1 ∨ b2) is CLPR wrt F since ⊥ ∈ UP(FH|¬D), as {(x), (¬x)} ⊂ FH|¬D. As for the redundant clause C and assignment τ detailed in Example 4, we have that C is CPR, since FH|τ ⊂ FH|¬C, which implies FH|¬C ⊢₁ FH|τ.
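The unit-propagation step of Example 5 can be checked mechanically: restricting F_H by ¬D = {¬b1, ¬b2} leaves both (x) and (¬x), so propagation derives a conflict. Same integer encoding as before (x, y, z, b1..b4 as 1..7); the helpers are our own sketch.

```python
HARD = [{1, 4}, {-1, 5}, {2, 6, 7}, {3, -2, 7}, {-3}]

def restrict(formula, tau):
    """F|τ: drop clauses satisfied by τ and delete literals falsified by τ."""
    return [set(c) - {-l for l in tau} for c in formula if not (set(c) & set(tau))]

def up_conflict(formula):
    """True iff unit propagation derives the empty clause from F."""
    formula = {frozenset(c) for c in formula}
    while frozenset() not in formula:
        unit = next((c for c in formula if len(c) == 1), None)
        if unit is None:
            return False
        (l,) = unit
        formula = {c - {-l} for c in formula if l not in c}
    return True

restricted = restrict(HARD, {-4, -5})          # F_H restricted by ¬D
print({1} in restricted, {-1} in restricted)   # True True: (x) and (¬x) survive
print(up_conflict(restricted))                 # True: ⊥ ∈ UP(F_H|¬D)
```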

We begin the analysis of the relationship between these redundancy notions by showing that CSPR (and by extension CLPR) clauses also satisfy the MaxSAT-centric condition (iii) of Theorem 1. Assume that C is CSPR wrt an instance F = (FH, B(F), w) on the set L.

**Lemma 1.** *Let* <sup>τ</sup> ⊃ ¬<sup>C</sup> *be a solution of* <sup>F</sup>*. Then,* COST(F, τ ) <sup>≥</sup> COST(F, τL)*.*

The following corollary of Lemma 1 establishes that CSPR and CLPR clauses are redundant according to Definition 1.

**Corollary 1.** COST(F ∧ C,(¬C)L) = COST(F,(¬C)L) <sup>≤</sup> COST(F,¬C)*.*

The fact that CPR clauses are redundant follows from the fact that FH|¬C ⊢₁ FH|τ implies FH|¬C ⊨ FH|τ. However, given a solution ω that does not satisfy a CPR clause C, the next example demonstrates that the assignment ωτ need not have a cost lower than that of ω. Stated another way, the example demonstrates that an observation similar to Lemma 1 does not hold for CPR clauses in general.

*Example 6.* Consider a MaxSAT instance F = (FH, B(F), w) having FH = {(x ∨ b1), (¬x ∨ b2)}, B(F) = {b1, b2} and w(b1) = w(b2) = 1. The clause C = (x) is CPR wrt F: the assignment τ = {x, b2} satisfies the three conditions of Definition 2. Now δ = {¬x, b1} is a solution of F that does not satisfy C, for which δτ = {x, b1, b2} and 1 = COST(F, δ) < 2 = COST(F, δτ).
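Example 6 can be checked numerically: the override δτ = (δ \ ¬τ) ∪ τ costs more than δ itself. Here x, b1, b2 are encoded as 1, 2, 3; the encoding and helper names are our own sketch.

```python
HARD = [{1, 2}, {-1, 3}]          # (x ∨ b1), (¬x ∨ b2)
WEIGHT = {2: 1, 3: 1}             # w(b1) = w(b2) = 1

def cost(tau):
    if any(not (c & tau) for c in HARD):
        return float("inf")
    return sum(w for b, w in WEIGHT.items() if b in tau)

def override(delta, tau):
    """δτ = (δ \ ¬τ) ∪ τ: force the literals of τ, keep the rest of δ."""
    return (set(delta) - {-l for l in tau}) | set(tau)

tau = {1, 3}                      # τ = {x, b2}
delta = {-1, 2, -3}               # δ = {¬x, b1, ¬b2}
print(cost(delta))                # 1
print(cost(override(delta, tau))) # 2: δτ = {x, b1, b2}
```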

As in the context of SAT, verifying that a clause is CSPR (and by extension CLPR) can be done efficiently. However, in contrast to SAT, we conjecture that verifying that a clause is CPR cannot in general be done efficiently, *even if the assignment* τ *is given*. While we will not go into detail on the complexity of identifying CPR clauses, the following proposition lends some support to our conjecture.

**Proposition 2.** *Let* <sup>F</sup> *be an instance and* <sup>k</sup> <sup>∈</sup> <sup>N</sup>*. There is another instance* <sup>F</sup><sup>M</sup>*, a clause* <sup>C</sup>*, and an assignment* <sup>τ</sup> *such that* <sup>C</sup> *is CPR wrt* <sup>F</sup><sup>M</sup> *if and only if* COST(F) ≥ k*.*

As deciding if COST(F) ≥ k is NP-complete in the general case, Proposition 2 suggests that it may not be possible to decide in polynomial time if an assignment τ satisfies the three conditions of Definition 2 unless P=NP. This is in contrast to SAT, where verifying propagation redundancy can be done in polynomial time if the assignment τ is given, but is NP-complete if not [24].

The following observations establish a more precise relationship between the redundancy notions. For the following, let RED(F) denote the set of clauses that are redundant wrt a MaxSAT instance F according to Definition 1. Analogously, the sets CPR(F), CSPR(F) and CLPR(F) consist of the clauses that are CPR, CSPR and CLPR wrt F, respectively.

**Observation 1** CLPR(F) <sup>⊂</sup> CSPR(F) <sup>⊂</sup> CPR(F) <sup>⊂</sup> RED(F) *holds for any MaxSAT instance* F*.*

**Observation 2** *There are MaxSAT instances* F1, F2 *and* F3 *for which* CLPR(F1) ⊊ CSPR(F1)*,* CSPR(F2) ⊊ CPR(F2) *and* CPR(F3) ⊊ RED(F3)*.*

The proofs of Observations 1 and 2 follow directly from known results in the context of SAT [23] by noting that any CNF formula can be viewed as an instance of MaxSAT without blocking literals.

For a MaxSAT-centric observation on the relationship between the redundancy notions, we note that the concept of redundancy and CPR coincide for any MaxSAT instance that has solutions.

**Observation 3** CPR(F) = RED(F) *holds for any MaxSAT instance* <sup>F</sup> *with* COST(F) < ∞*.*

We note that a result similar to Observation 3 could be formulated in the context of SAT. The SAT-counterpart would state that the concept of redundancy (in the equisatisfiability sense) coincides with the concept of propagation redundancy for SAT solving (defined e.g. in [23]) for *satisfiable* CNF formulas. However, assuming that a CNF formula is satisfiable is very restrictive in the context of SAT. In contrast, it is natural to assume that a MaxSAT instance admits solutions.

We end this section with a simple observation: adding a redundant clause C to a MaxSAT instance F preserves not only optimal cost, but optimal solutions of F ∧ C are also optimal solutions of F. However, the converse need not hold; an instance F might have optimal solutions that do not satisfy C.

*Example 7.* Consider an instance F = (FH, B(F), w) with FH = {(b1 ∨ b2)}, B(F) = {b1, b2} and w(b1) = w(b2) = 1. The clause C = (¬b1) is CPR wrt F. In order to see this, let τ = {¬b1, b2}. Then τ satisfies C (condition (i) of Definition 2). Furthermore, τ satisfies FH, implying FH|¬C ⊢₁ FH|τ (condition (ii)). Finally, we have that 1 = COST(F, τ) = COST(F ∧ C, τ) ≤ COST(F, ¬C) = 1 (condition (iii)). The assignment δ = {b1, ¬b2} is an example of an optimal solution of F that is not a solution of F ∧ C.

## **4 Propagation Redundancy and MCSes**

In this section, we analyze the effect of adding redundant clauses on the MCSes of MaxSAT instances. As the main result, we show that adding CSPR (and by extension CLPR) clauses to a MaxSAT instance F preserves all MCSes while adding CPR clauses does not in general. Stated in terms of solutions, this means that adding CSPR clauses to F preserves not only all optimal solutions, but all solutions τ for which (τ ∩ B(F)) ∈ mcs(F), while adding CPR clauses only preserves at least one optimal solution.

**Effect of CLPR Clauses on MCSes.** MaxSAT-liftings of four specific SAT preprocessing techniques (including bounded variable elimination and self-subsuming resolution) were proposed earlier in [8]. Notably, the correctness of the liftings was shown separately for each technique, by arguing that applying it does not change the set of MCSes of any MaxSAT instance. Towards a more generic understanding of optimal-cost-preserving MaxSAT preprocessing, the notion of solution resolution asymmetric tautologies (SRAT) was proposed in [10] as a MaxSAT-lifting of the concept of resolution asymmetric tautologies (RAT). In short, a clause C is a SRAT clause for a MaxSAT instance F = (FH, B(F), w) if there is a literal l ∈ C \ B(F ∧ C) such that FH ⊢₁ ((C ∨ D) \ {¬l}) for every D ∈ FH for which ¬l ∈ D.

In analogy with RAT [29], SRAT was shown in [10] to allow for a general proof of correctness for natural MaxSAT-liftings of a wide range of SAT preprocessing techniques, covering among others the four techniques for which individual correctness proofs were provided in [8]. The generality follows essentially from the fact that the addition and removal of SRAT clauses preserves MCSes. The same observations apply to CLPR, as CLPR and SRAT are equivalent.

**Proposition 3.** *A clause* C *is CLPR wrt* F *iff it is SRAT wrt* F*.*

The proof of Proposition 3 follows directly from corresponding results in the context of SAT [23]. Informally speaking, a clause C is SRAT on a literal l iff it is RAT [29] on l and l ∉ B(F). Similarly, a clause C is CLPR on a literal l iff it is LPR as defined in [23] on l and l ∉ B(F). Proposition 3 together with previous results from [10] implies that the MCSes of MaxSAT instances are preserved under removing and adding CLPR clauses.

**Corollary 2.** *If* <sup>C</sup> *is CLPR wrt* <sup>F</sup>*, then* mcs(F) = mcs(F ∧ <sup>C</sup>)*.*

**Effect of CPR Clauses on MCSes.** We turn our attention to the effect of CPR clauses on the MCSes of MaxSAT instances. Our analysis makes use of the previously-proposed MaxSAT-centric preprocessing rule known as *subsumed label elimination* (SLE) [11,33]⁵.

**Definition 3.** *(Subsumed Label Elimination [11,33]) Consider a MaxSAT instance* F = (FH, B(F), w) *and a blocking literal* l ∈ B(F) *for which* ¬l ∉ lit(FH)*. Assume that there is another blocking literal* ls ∈ B(F) *for which (1)* ¬ls ∉ lit(FH)*, (2)* {C ∈ FH | l ∈ C} ⊂ {C ∈ FH | ls ∈ C} *and (3)* w(l) ≥ w(ls)*. The subsumed label elimination (SLE) rule allows adding* (¬l) *to* FH*.*

A specific proof of correctness of SLE was given in [11]. The following proposition provides an alternative proof based on CPR.

**Proposition 4 (Proof of correctness for SLE).** *Let* F *be a MaxSAT instance and assume that the blocking literals* l, ls ∈ B(F) *satisfy the three conditions of Definition 3. Then the clause* C = (¬l) *is CPR wrt* F*.*

⁵ Rephrased here using our notation.

*Proof.* We show that τ = {¬l, ls} satisfies the three conditions of Definition 2. First, τ satisfies C (condition (i)). Conditions (1) and (2) of Definition 3 imply FH|τ ⊂ FH|¬C, which in turn implies FH|¬C ⊢₁ FH|τ (condition (ii)).

As for condition (iii), the requirement COST(F ∧ C, τ) = COST(F, τ) follows from B(F ∧ C) = B(F). Let δ ⊃ ¬C be a complete assignment of FH for which COST(F, δ) = COST(F, ¬C). If COST(F, δ) = ∞, then COST(F, τ) ≤ COST(F, ¬C) follows trivially. Otherwise δ \ ¬C satisfies FH|¬C, so by FH|¬C ⊢₁ FH|τ it satisfies FH|τ as well. Thus δR = ((δ \ ¬C) \ ¬τ) ∪ τ = (δ \ {l, ¬l, ¬ls}) ∪ {¬l, ls} is an extension of τ that satisfies FH and for which COST(F, τ) ≤ COST(F, δR) ≤ COST(F, δ) by condition (3) of Definition 3. Thereby τ satisfies the conditions of Definition 2, so C is CPR wrt F.

*Example 8.* The blocking literals b3, b4 ∈ B(F) of the instance F detailed in Example 1 satisfy the conditions of Definition 3. By Proposition 4, the clause (¬b3) is CPR wrt F.
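The three SLE conditions of Definition 3 can be checked directly on Example 1, where b3 is subsumed by b4. Same integer encoding as before (x, y, z, b1..b4 as 1..7); the helper names are our own sketch.

```python
HARD = [{1, 4}, {-1, 5}, {2, 6, 7}, {3, -2, 7}, {-3}]
WEIGHT = {4: 1, 5: 2, 6: 8, 7: 1}

def sle_applicable(l, ls):
    """May (¬l) be added to F_H, with ls as the subsuming blocking literal?"""
    lits = {lit for c in HARD for lit in c}
    occurrences = lambda b: {frozenset(c) for c in HARD if b in c}
    return (-l not in lits and -ls not in lits        # ¬l, ¬ls not in lit(F_H)
            and occurrences(l) <= occurrences(ls)     # (2) clause containment
            and WEIGHT[l] >= WEIGHT[ls])              # (3) w(l) ≥ w(ls)

print(sle_applicable(6, 7))   # True: SLE allows adding (¬b3)
print(sle_applicable(7, 6))   # False: b4 occurs in a clause without b3
```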

In [11] it was shown that SLE does not preserve MCSes in general. By Corollary 2, this implies that SLE cannot be viewed as the addition of CLPR clauses. Furthermore, by Proposition 4, we obtain the following.

**Corollary 3.** *There is a MaxSAT instance* F *and a clause* C *that is CPR wrt* F *for which* mcs(F) ≠ mcs(F ∧ C)*.*

**Effect of CSPR Clauses on MCSes.** Having established that CLPR clauses preserve MCSes while CPR clauses do not, we complete the analysis by demonstrating that CSPR clauses preserve MCSes.

**Theorem 2.** *Let* <sup>F</sup> *be a MaxSAT instance and* <sup>C</sup> *a CSPR clause of* <sup>F</sup>*. Then* mcs(F) = mcs(F ∧ C)*.*

Theorem 2 follows from the following lemmas and propositions. In the following, let C be a clause that is CSPR wrt a MaxSAT instance F on a set L ⊂ C \ B(F ∧ C).

**Lemma 2.** *Let cs* ⊂ B(F)*. If* FH ∧ ⋀_{l ∈ B(F)\cs}(¬l) *is satisfiable, then* (FH ∧ C) ∧ ⋀_{l ∈ B(F∧C)\cs}(¬l) *is satisfiable.*

Lemma 2 helps in establishing one direction of Theorem 2.

**Proposition 5.** mcs(F) ⊂ mcs(F ∧ C)*.*

*Proof.* Let cs ∈ mcs(F). Then FH ∧ ⋀_{l ∈ B(F)\cs}(¬l) is satisfiable, which by Lemma 2 implies that (FH ∧ C) ∧ ⋀_{l ∈ B(F∧C)\cs}(¬l) is satisfiable.

To show that (FH ∧ C) ∧ ⋀_{l ∈ B(F∧C)\csₛ}(¬l) is unsatisfiable for any csₛ ⊊ cs ⊂ B(F), we note that any assignment satisfying (FH ∧ C) ∧ ⋀_{l ∈ B(F∧C)\csₛ}(¬l) would also satisfy FH ∧ ⋀_{l ∈ B(F)\csₛ}(¬l), contradicting cs ∈ mcs(F).

The following lemma is useful for showing inclusion in the other direction.

**Lemma 3.** *Let cs* ∈ mcs(F ∧ C)*. Then cs* ⊂ B(F)*.*

Lemma 3 allows for completing the proof of Theorem 2.

**Proposition 6.** mcs(F ∧ C) ⊂ mcs(F)*.*

*Proof.* Let cs ∈ mcs(F ∧ C), which by Lemma 3 implies cs ⊆ B(F). Let τ be a solution that satisfies (F_H ∧ C) ∧ ⋀_{l ∈ B(F∧C)\cs} (¬l). Then τ satisfies F_H ∧ ⋀_{l ∈ B(F)\cs} (¬l). For a contradiction, assume that F_H ∧ ⋀_{l ∈ B(F)\cs_s} (¬l) is satisfiable for some cs_s ⊊ cs. Then by Lemma 2, (F_H ∧ C) ∧ ⋀_{l ∈ B(F∧C)\cs_s} (¬l) is satisfiable as well, contradicting cs ∈ mcs(F ∧ C). Thereby cs ∈ mcs(F).

Theorem 2 implies that SLE cannot be viewed as the addition of CSPR clauses. In light of this, an interesting remark is that—in contrast to CPR clauses in general (recall Example 6)—the assignment τ used in the proof of Proposition 4 can be used to convert any assignment that does not satisfy the CPR clause detailed in Definition 3 into one that does, without increasing its cost.

**Observation 4.** *Let* F *be a MaxSAT instance and assume that the blocking literals* l, l_s ∈ B(F) *satisfy the three conditions of Definition 3. Let* τ = {¬l, l_s} *and consider any solution* δ ⊃ ¬C *of* F *that does not satisfy the CPR clause* C = (¬l)*. Then* δ<sup>τ</sup> *is a solution of* F ∧ C *for which* COST(F, δ<sup>τ</sup>) ≤ COST(F, δ)*.*

#### **5 CPR-Based Preprocessing for MaxSAT**

Mapping these theoretical observations to practical preprocessing, in this section we discuss through examples how CPR clauses can serve as a unified theoretical basis for capturing a wide variety of known MaxSAT reasoning rules, and how they could help in the development of novel MaxSAT reasoning techniques.

Our first example is the so-called *hardening rule* [2,8,17,26]. In terms of our notation, given a solution τ to a MaxSAT instance F = (F_H, B(F), w) and a blocking literal l ∈ B(F) for which w(l) > COST(F, τ), the hardening rule allows adding the clause C = (¬l) to F_H.
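Once some solution is known, hardening amounts to a single pass over the blocking-literal weights. A minimal sketch, assuming our own encoding (integer literals, weights as a dict keyed by blocking literal); the names `cost` and `harden` are hypothetical:

```python
def cost(weights, model):
    """COST(F, τ): total weight of the blocking literals set to true by τ."""
    return sum(w for lit, w in weights.items() if lit in model)

def harden(hard, weights, model):
    """Hardening rule: given a solution `model` of F, every blocking
    literal l with w(l) > COST(F, model) may be forced false by adding
    the hard unit clause (¬l)."""
    bound = cost(weights, model)
    new_hard = set(hard)
    for lit, w in weights.items():
        if w > bound:
            new_hard.add(frozenset({-lit}))
    return new_hard
```

For example, with weights w(b1) = 8, w(b2) = 1 and a solution of cost 1, the literal b1 is hardened (the unit (¬b1) is added) while b2 is not.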

The correctness of the hardening rule can be established with CPR clauses. More specifically, as COST(F, τ) < w(l), it follows that τ(C) = 1 (condition (i) of Definition 2). Since τ satisfies F, we have F_H|τ = ∅, so F_H|¬C ⊢₁ F_H|τ (condition (ii)). Finally, as COST(F, δ) ≥ w(l) > COST(F, τ) holds for all δ ⊃ ¬C, it follows that COST(F, ¬C) > COST(F, τ) = COST(F ∧ C, τ). As such, (¬l) is a CPR clause wrt F. In fact, instead of assuming w(l) > COST(F, τ) it suffices to assume w(l) ≥ COST(F, τ) and τ(l) = 0.

The hardening rule cannot be viewed as the addition of CSPR or CLPR clauses because it does not in general preserve MCSes.

*Example 9.* Consider the MaxSAT instance F from Example 1 and the solution τ = {b1, b2, b4, ¬b3, ¬z, x, y}. Since COST(F, τ) = 3 < 8 = w(b3), the clause C = (¬b3) is CPR. However, mcs(F) ≠ mcs(F ∧ C), since the set cs = {b2, b3} ∈ mcs(F) is not an MCS of F ∧ C, as (F_H ∧ C) ∧ ⋀_{l ∈ B(F)\cs} (¬l) = (F_H ∧ (¬b3)) ∧ (¬b1) ∧ (¬b4) is not satisfiable.

Viewing the hardening rule through the lens of CPR clauses demonstrates novel aspects of the MaxSAT-liftings of propagation redundancy. In particular, instantiated in the context of SAT, an argument similar to the one we made for hardening shows that given a CNF formula F, an assignment τ satisfying F, and a literal l for which τ(l) = 0, the clause (¬l) is redundant (wrt equisatisfiability). While formally correct, such a rule is not very useful for SAT solving. In contrast, in the context of MaxSAT the hardening rule is employed in various modern MaxSAT solvers and leads to non-trivial performance improvements [4,5].

As another example of capturing MaxSAT-centric reasoning with CPR, consider the so-called TrimMaxSAT rule [39]. Given a MaxSAT instance F = (F_H, B(F), w) and a literal l ∈ B(F) for which τ(l) = 1 for all solutions τ of F, the TrimMaxSAT rule allows adding the clause C = (l) to F_H. In this case the assumptions imply that all solutions of F also satisfy C, i.e., that F_H|¬C is unsatisfiable. As such, any assignment τ that satisfies C and F_H will also satisfy the three conditions of Definition 2, which demonstrates that C is CPR. It is, however, not CSPR, since the only literal in C is blocking.
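The TrimMaxSAT condition can be sketched by brute force: a blocking literal that is true in every solution of the hard clauses is promoted to a hard unit. Real implementations use SAT calls rather than model enumeration; the encoding and names below are our own, and the hard part is assumed satisfiable.

```python
from itertools import product

def all_models(clauses):
    """Enumerate all total assignments over the variables of `clauses`
    that satisfy every clause (brute force; illustration only)."""
    vars_ = sorted({abs(l) for c in clauses for l in c})
    for signs in product([1, -1], repeat=len(vars_)):
        model = {v * s for v, s in zip(vars_, signs)}
        if all(model & c for c in clauses):
            yield model

def trim(hard, blocking):
    """TrimMaxSAT sketch: for each blocking literal l that is true in
    every solution of the (assumed satisfiable) hard part, add the hard
    unit clause (l)."""
    forced = set(blocking)
    for model in all_models(hard):
        forced &= model          # keep only literals true in this model too
    return set(hard) | {frozenset({l}) for l in forced}
```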

As a third example of capturing (new) reasoning techniques with CPR, consider an extension of the central variable elimination rule that allows (to some extent) for eliminating blocking literals.

**Definition 4.** *Consider a MaxSAT instance* F *and a blocking literal* l ∈ B(F)*. Let* BBVE(F) *be the instance obtained by (i) adding the clause* C ∨ D *to* F *for every pair* (C ∨ l), (D ∨ ¬l) ∈ F_H *and (ii) removing all clauses* (D ∨ ¬l) ∈ F_H*. Then* COST(F) = COST(BBVE(F)) *and* mcs(F) = mcs(BBVE(F))*.*
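Under the same hypothetical clause encoding used in our earlier sketches, steps (i) and (ii) of Definition 4 amount to resolving on the eliminated literal and dropping the ¬l-clauses; note that clauses containing l itself are kept.

```python
def bbve(hard, lit):
    """Sketch of Definition 4: add the resolvent C ∨ D for every pair
    (C ∨ lit), (D ∨ ¬lit), then drop all clauses containing ¬lit.
    Clauses containing lit remain in the instance."""
    pos = [c - {lit} for c in hard if lit in c]       # C from (C ∨ lit)
    neg = [c - {-lit} for c in hard if -lit in c]     # D from (D ∨ ¬lit)
    rest = {c for c in hard if -lit not in c}
    resolvents = {c | d for c in pos for d in neg
                  if not any(-l in d for l in c)}     # skip tautologies
    return rest | resolvents
```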

**On the Limitations of CPR.** Finally, we note that while CPR clauses significantly generalize existing theory on reasoning and preprocessing rules for MaxSAT, there are known reasoning techniques that can not (at least straightforwardly) be viewed through the lens of propagation redundancy. For a concrete example, consider the so-called intrinsic atmost1 technique [26].

**Definition 5.** *Consider a MaxSAT instance* F *and a set* L ⊆ B(F) *of blocking literals. Assume that (i)* |τ ∩ {¬l | l ∈ L}| ≤ 1 *holds for any solution* τ *of* F *and (ii)* w(l) = 1 *for each* l ∈ L*. Now form the instance* AT-MOST-ONE(F, L) *by (i) removing each literal* l ∈ L *from* B(F)*, and (ii) adding the clause* {¬l | l ∈ L} ∪ {l_L} *to* F*, where* l_L *is a fresh blocking literal with* w(l_L) = 1*.*

It has been established that any optimal solution of AT-MOST-ONE(F, L) is an optimal solution of F [26]. However, as the next example demonstrates, the preservation of optimal solutions is in general not due to the added clauses being redundant, as applying the technique can affect the optimal cost.

*Example 10.* Consider the MaxSAT instance F = (F_H, B(F), w) with F_H = {(l_i) | i = 1, ..., n}, B(F) = {l_1, ..., l_n} and w(l) = 1 for all l ∈ B(F). Then |τ ∩ {¬l | l ∈ B(F)}| = 0 ≤ 1 holds for all solutions τ of F, so the intrinsic-at-most-one technique can be used to obtain the instance F² = AT-MOST-ONE(F, B(F)) = (F²_H, B(F²), w²) with F²_H = F_H ∪ {(¬l_1 ∨ ... ∨ ¬l_n ∨ l_L)}, B(F²) = {l_L} and w²(l_L) = 1. Now δ = {l | l ∈ B(F)} ∪ {l_L} is an optimal solution to both F² and F for which 1 = COST(F², δ) < COST(F, δ) = n.
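The arithmetic of Example 10 can be re-checked directly for a small n; the dictionary-based weight encoding is our own illustration.

```python
def cost(weights, model):
    """COST(F, τ): total weight of the blocking literals set to true."""
    return sum(w for lit, w in weights.items() if lit in model)

n = 4
w_original = dict.fromkeys(range(1, n + 1), 1)   # w(l_i) = 1
l_L = n + 1                                      # fresh blocking literal
w_transformed = {l_L: 1}                         # weight in the new instance

# delta sets every l_i and l_L to true, as in Example 10
delta = set(range(1, n + 2))
assert cost(w_original, delta) == n       # cost n in the original instance
assert cost(w_transformed, delta) == 1    # cost 1 after the transformation
```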

Example 10 implies that the intrinsic atmost1 technique cannot be viewed as the addition or removal of redundant clauses. Generalizing CPR to cover weight changes could lead to further insights, especially due to potential connections with core-guided MaxSAT solving [1,36–38].

#### **6 MaxPre 2: More General Preprocessing in Practice**

Connecting to practice, we extended version 1 of the MaxSAT preprocessor MaxPre [33] with support for techniques captured by propagation redundancy. The resulting MaxPre version 2, as outlined in the following, hence includes techniques which have previously only been implemented in specific solvers rather than in general-purpose MaxSAT preprocessors.

First, let us mention that the earlier MaxPre version 1 [33] assumes that blocking literals appear in only a single polarity among the hard clauses. Removing this assumption (supported by the theory developed in Sects. 3–4) decreases the number of auxiliary variables that need to be introduced when a MaxSAT instance is rewritten to only include unit soft clauses. For example, consider a MaxSAT instance F with F_H = {(¬x ∨ y), (¬y ∨ x)} and F_S = {(x), (¬y)}. For preprocessing the instance, MaxPre 1 extends both soft clauses with a new auxiliary variable and runs preprocessing on the instance F = {(¬x ∨ y), (¬y ∨ x), (x ∨ b1), (¬y ∨ b2)} with B(F) = {b1, b2}. In contrast, MaxPre 2 detects that the clauses in F_S are unit and reuses them as blocking literals, invoking preprocessing on F = {(¬x ∨ y), (¬y ∨ x)} with B(F) = {¬x, y}.
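The rewriting just described can be sketched as follows. The encoding (integer literals) and the function name `to_blocking` are our own, and weight bookkeeping is omitted; this is an illustration of the idea, not MaxPre's implementation.

```python
def to_blocking(hard, soft):
    """Rewrite soft clauses into hard clauses with blocking literals.
    A unit soft clause (l) is reused directly: its blocking literal is
    ¬l.  A non-unit soft clause is extended with a fresh auxiliary
    variable, which becomes the blocking literal."""
    hard = set(hard)
    blocking = set()
    fresh = max((abs(l) for c in hard | set(soft) for l in c), default=0)
    for clause in soft:
        if len(clause) == 1:                 # unit: reuse, no new variable
            (l,) = clause
            blocking.add(-l)
        else:                                # general case: fresh variable b
            fresh += 1
            hard.add(frozenset(clause | {fresh}))
            blocking.add(fresh)
    return hard, blocking
```

On the example above (x = 1, y = 2), the soft units (x) and (¬y) yield the blocking literals ¬x and y without touching the hard clauses.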

In addition to the techniques already implemented in MaxPre 1, MaxPre 2 includes the following techniques: hardening [2], a variant of TrimMaxSAT [39] that works on all literals of a MaxSAT instance, the intrinsic atmost1 technique [26], and a MaxSAT-lifting of failed literal elimination [12]. In short, failed literal elimination adds the clause (¬l) to the hard clauses F_H of an instance in case unit propagation derives a conflict in F_H ∧ (l). Additionally, the implementation of failed literal elimination attempts to identify implied equivalences between literals that can lead to further simplification.
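Failed literal elimination rests only on unit propagation, which can be sketched in a few lines; the encoding and function names are our own, and the equivalence detection mentioned above is omitted.

```python
def unit_propagate(clauses, assignment):
    """Exhaustive unit propagation; returns the closed assignment, or
    None if a conflict (falsified clause) is derived."""
    assignment = set(assignment)
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            unassigned = [l for l in clause if -l not in assignment]
            if not unassigned:
                return None                      # clause falsified: conflict
            if any(l in assignment for l in clause):
                continue                         # clause already satisfied
            if len(unassigned) == 1:
                assignment.add(unassigned[0])    # unit clause: propagate
                changed = True
    return assignment

def failed_literals(hard, candidates):
    """A literal l is failed if unit propagation on F_H with l assumed
    true derives a conflict; the clause (¬l) may then be added to F_H."""
    return {l for l in candidates if unit_propagate(hard, {l}) is None}
```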

For computing the solutions required by TrimMaxSAT and detecting the cardinality constraints required by intrinsic-at-most-one constraints, MaxPre 2 uses the Glucose 3.0 SAT solver [3]. For computing the solutions required by hardening, MaxPre 2 additionally uses the SatLike incomplete MaxSAT solver [34] within preprocessing. MaxPre 2 is available in open source at https://bitbucket.org/coreo-group/maxpre2/.

We emphasize that, while the additional techniques implemented by MaxPre 2 have been previously implemented as heuristics in specific solver implementations, MaxPre 2 is—to the best of our understanding—the first stand-alone implementation supporting techniques whose correctness cannot be established with previously-proposed MaxSAT redundancy notions (i.e., SRAT). The goal of our empirical evaluation presented in the next section is to demonstrate the potential of viewing expressive reasoning techniques not only as solver heuristics, but as a separate step in the MaxSAT solving process whose correctness can be established via propagation redundancy.

# **7 Empirical Evaluation**

We report on results from an experimental evaluation of the potential of incorporating more general reasoning in MaxSAT preprocessing. In particular, we evaluated both complete solvers (geared towards finding provably-optimal solutions) and incomplete solvers (geared towards finding relatively good solutions fast) on standard heterogeneous benchmarks from recent MaxSAT Evaluations. All experiments were run on 2.60-GHz Intel Xeon E5-2670 8-core machines with 64 GB memory and CentOS 7. All reported runtimes include the time used in preprocessing (when applicable).

# **7.1 Impact of Preprocessing on Complete Solvers**

We start by considering recent representative complete solvers covering three central MaxSAT solving paradigms: the core-guided solver CGSS [27] (as a recent improvement to the successful RC2 solver [26]), and the MaxSAT Evaluation 2021 versions of the implicit hitting set based solver MaxHS [17] and the solution-improving solver Pacose [40]. For each solver S we consider the following variants.


More precisely, <TECH> specifies which of the techniques HTVGR are applied: H for hardening, T and V for TrimMaxSAT on blocking and non-blocking literals, respectively, G for intrinsic-at-most-one-constraints and R for failed literal elimination. It should be noted that an exhaustive evaluation of all subsets and application orders of these techniques is infeasible in practice. Based on preliminary experiments, we observed that the following choices were promising: HRT for CGSS and MaxHS, and HTVGR for Pacose; we report results using these individual configurations.

As benchmarks, we used the combined set of weighted instances from the complete tracks of MaxSAT Evaluation 2020 and 2021. After removing duplicates, this gave a total of 1117 instances. We enforced a per-instance time limit of 60 minutes and a memory limit of 32 GB. Furthermore, we enforced a per-instance 120-second time limit on preprocessing.

**Fig. 1.** Impact of preprocessing on complete solvers. For each solver, the number of instances solved within the 60-min per-instance time limit is given in parentheses.

An overview of the results is shown in Fig. 1, illustrating for each solver the number of instances solved (x-axis) under different per-instance time limits (y-axis). We observe that for both CGSS and MaxHS, S+maxpre1 and S+maxpre2/none lead to fewer instances solved compared to S. In contrast, S+maxpre2/HRT, i.e., incorporating the stronger reasoning techniques of MaxPre 2, performs best of all preprocessing variants and for MaxHS also improves on the number of instances solved. For Pacose, we observe that both Pacose+maxpre1 and Pacose+maxpre2/none (without the stronger reasoning techniques) already improve the performance of Pacose, leading to more instances solved. Incorporating the stronger reasoning rules further improves performance significantly, with Pacose+maxpre2/HTVGR performing the best among all of the Pacose variants.

# **7.2 Impact of Preprocessing on Incomplete MaxSAT Solving**

As a representative incomplete MaxSAT solver we consider the MaxSAT Evaluation 2021 version of Loandra [9], as the best-performing solver in the incomplete track of the MaxSAT Evaluation under a 300 s per-instance time limit on weighted instances. Loandra combines core-guided and solution-improving search towards finding good solutions fast. We consider the following variants of Loandra.

**Table 1.** Impact of preprocessing on the incomplete solver Loandra. The wins are organized column-wise; the cell on row X, column Y contains the total number of instances on which the solver in column Y wins over the solver in row X.


As benchmarks, we used the combined set of weighted instances from the incomplete tracks of MaxSAT Evaluation 2020 and 2021. After removing duplicates, this gave a total of 451 instances. When reporting results, we consider for each instance and solver the cost of the best solution found by the solver within 300 s (including time spent preprocessing and solution reconstruction).

We compare the relative runtime performance of the solver variants using two metrics: *#wins* and the average *incomplete score*. Assume that τ_x and τ_y are the lowest-cost solutions computed by two solvers X and Y on a MaxSAT instance F, and that best-cost(F) is the lowest cost of a solution of F found either in our evaluation or in the MaxSAT Evaluations. Then X wins over Y if COST(F, τ_x) < COST(F, τ_y). The incomplete score obtained by solver X on F is score(F, X) = (best-cost(F) + 1)/(COST(F, τ_x) + 1). The score of X on F is 0 if X is unable to find any solution within 300 s.
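The incomplete score is a direct formula; a minimal sketch, with `None` standing for "no solution found within the limit":

```python
def incomplete_score(best_cost, found_cost):
    """MaxSAT Evaluation incomplete score:
    (best-cost(F) + 1) / (COST(F, tau_x) + 1); 0 if no solution found."""
    if found_cost is None:
        return 0.0
    return (best_cost + 1) / (found_cost + 1)
```

A solver matching the best known cost scores 1.0; for instance, with best cost 9 and a found cost of 19 the score is (9+1)/(19+1) = 0.5.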

An overview of the results is shown in Table 1. The upper part of the table shows a pairwise comparison of the number of wins over all benchmarks. The wins are organized column-wise, i.e., the cell on row X, column Y contains the total number of instances on which the solver in column Y wins over the solver in row X. The last row contains the average score obtained by each solver over all instances. We observe that any form of preprocessing improves the performance of Loandra, as witnessed by the fact that no-prepro is clearly the worst-performing variant. The variants that make use of MaxPre 2 outperform the baseline under both metrics; both maxpre2/none and maxpre2/VG obtain a higher average score and win on more instances over base. The comparison between maxpre2/none and maxpre2/VG is not as clear. On one hand, the score obtained by maxpre2/VG is higher. On the other hand, maxpre2/none wins on 80 instances over maxpre2/VG and loses on 77. This suggests that the quality of solutions computed by maxpre2/VG is on average higher, and that on the instances on which maxpre2/none wins the difference is smaller.

**Fig. 2.** Impact of preprocessing on instance size.

# **7.3 Impact of Preprocessing on Instance Sizes**

In addition to improved solver runtimes, we note that MaxPre 2 has a positive effect on the size of instances (both in terms of the number of variables and clauses remaining) when compared to preprocessing with MaxPre 1; see Fig. 2 for a comparison, with maxpre2/HRT compared to maxpre1 (left) and to original instance sizes (right).

#### **8 Conclusions**

We studied liftings of variants of propagation redundancy from SAT to the context of maximum satisfiability, where—at a finer granularity than in SAT—the techniques of interest are those that preserve optimal cost. We showed that CPR, the strongest MaxSAT-lifting, allows for changing minimal correction sets in MaxSAT in a controlled way, thereby succinctly expressing MaxSAT reasoning techniques very generally. We also extended a practical MaxSAT preprocessor with techniques captured by CPR and showed empirically that the extended preprocessing has a positive overall impact on a range of MaxSAT solvers. Interesting future work includes the development of new CPR-based preprocessing rules for MaxSAT capable of significantly affecting the MaxSAT solving pipeline both in theory and in practice, as well as developing an understanding of the relationship between redundancy notions and the transformations performed by MaxSAT solving algorithms.

### **References**

1. Ansótegui, C., Bonet, M., Levy, J.: SAT-based MaxSAT algorithms. Artif. Intell. **196**, 77–105 (2013)



# **Cooperating Techniques for Solving Nonlinear Real Arithmetic in the** cvc5 **SMT Solver (System Description)**

Gereon Kremer<sup>1</sup>, Andrew Reynolds<sup>2</sup>(B), Clark Barrett<sup>1</sup>, and Cesare Tinelli<sup>2</sup>

> <sup>1</sup> Stanford University, Stanford, USA
> <sup>2</sup> The University of Iowa, Iowa City, USA
> andrew.j.reynolds@gmail.com

**Abstract.** The cvc5 SMT solver solves quantifier-free nonlinear real arithmetic problems by combining the cylindrical algebraic coverings method with incremental linearization in an abstraction-refinement loop. The result is a complete algebraic decision procedure that leverages efficient heuristics for refining candidate models. Furthermore, it can be used with quantifiers, integer variables, and in combination with other theories. We describe the overall framework, individual solving techniques, and a number of implementation details. We demonstrate its effectiveness with an evaluation on the SMT-LIB benchmarks.

**Keywords:** Satisfiability modulo theories · Nonlinear real arithmetic · Abstraction refinement · Cylindrical algebraic coverings

# **1 Introduction**

SMT solvers are used as back-end engines for a wide variety of academic and industrial applications [2,19,20]. Efficient reasoning in the theory of real arithmetic is crucial for many such applications [5,8]. While modern SMT solvers have been shown to be quite effective at reasoning about *linear* real arithmetic problems [21,43], *nonlinear* problems are typically much more difficult. This is not surprising, given that the worst-case complexity for deciding the satisfiability of nonlinear real arithmetic formulas is doubly-exponential in the number of variables in the formula [15]. Nevertheless, a variety of techniques have been proposed and implemented, each attempting to target a class of formulas for which reasonable performance can be observed in practice.

*Related Work.* All complete decision procedures for nonlinear real arithmetic (or the *theory of the reals*) originate in computer algebra, the most prominent being cylindrical algebraic decomposition (CAD) [11]. While alternatives exist [6,25,41], they have not seen much use [27], and CAD-based methods are the only sound and complete methods in practical use today. CAD-based methods used in modern SMT solvers include incremental CAD implementations [34,36] and cylindrical algebraic coverings [3], both of which are integrated in the traditional CDCL(T) framework for SMT [40].

In contrast, the NLSAT [30] calculus and the generalized MCSAT [28,39] framework provide for a much tighter integration of a conflict-driven CAD-based theory solver into a theory-aware core solver. This has been the dominant approach over the last decade due to its strong performance in practice. However, it has the significant disadvantage of being difficult to integrate with CDCL(T)-based frameworks for theory combination.

A number of *incomplete* techniques are also used by various SMT solvers: incremental linearization [9] gradually refines an abstraction of the nonlinear formula obtained via a naive linearization by refuting spurious models of the abstraction; interval constraint propagation [24,36,45] employs interval arithmetic to narrow down the search space; subtropical satisfiability [22] provides sufficient linear conditions for nonlinear solutions in the exponent space of the polynomials; and virtual substitution [12,31,46] makes use of parametric solution formulas for polynomials of bounded degree. Though all of these techniques have limitations, each of them is useful for certain subclasses of nonlinear real arithmetic or in combination with other techniques.

*Contributions.* We present an integration of cylindrical algebraic coverings and incremental linearization, implemented in the cvc5 SMT solver. Crucial to the success of the integration is an abstraction-refinement loop used to combine the two techniques cooperatively. The solution is effective in practice, as witnessed by the fact that cvc5 won the nonlinear real arithmetic category of SMT-COMP 2021 [44], the first time a non-MCSAT-based technique has won since 2013. Our integrated technique also has the advantage of being very flexible: in particular, it fits into the regular CDCL(T) schema for theory solvers and theory combination, it supports (mixed) integer problems, and it can be easily extended using further subsolvers that support additional arithmetic operators beyond the scope of traditional algebraic routines (e.g., transcendental functions).

## **2 Nonlinear Solving Techniques**

The nonlinear arithmetic solver implemented in cvc5 generally follows the abstraction-refinement framework introduced by Cimatti et al. [9] and depicted in Fig. 1. The input assertions are first checked by the linear arithmetic solver, where they are linearized implicitly by treating every application of multiplication as if it were an arithmetic variable. For example, given input assertions *x*·*y >* 0 ∧ *x >* 1∧*y <* 0, the linear solver treats the expression *x*·*y* as a variable. It may then find the (spurious) model: *x* → 2, *y* → −1, and *x* · *y* → 1. We call the candidate model returned by the linear arithmetic solver, where applications of multiplication are treated as variables, a *linear model*. If a linear model does not exist, i.e., the input is unsatisfiable according to the linear solver, the linear solver generates a conflict that is immediately returned to the CDCL(T) engine.

**Fig. 1.** Structural overview of the nonlinear solver

When a linear model does exist, we check whether it already satisfies the input assertions or try to *repair* it to do so. We only apply a few very simple heuristics for repairs such as updating the value for *z* in the presence of a constraint like *z* = *x* · *y* based on the values of *x* and *y*.

If the model cannot be repaired, we refine the abstraction for the linear solver [9]. This step constructs lemmas, or conflicts, based on the input assertions and the linear model, to advance the solving process by blocking either the current linear model or the current Boolean model, that is, the propositional assignment generated by the SMT solver's SAT engine. The Boolean model is usually eliminated only by the coverings approach, while the incremental linearization technique generates lemmas with new literals that target the linear model, e.g., the lemma *x >* 0 ∧ *y <* 0 ⇒ *x*·*y <* 0 in the example above. We next describe our implementation of cylindrical algebraic coverings and incremental linearization, and how they are combined in cvc5.
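The spurious-model check at the heart of this loop reduces to comparing each abstraction variable with the product it stands for. A minimal sketch with our own naming (cvc5's internal representation differs):

```python
def spurious_products(model, products):
    """Check a linear model against the real semantics of multiplication.
    `products` maps each abstraction variable w to the pair (x, y) it
    stands for; returns the abstraction variables whose value differs
    from model[x] * model[y] (nonempty means the model is spurious)."""
    return {w for w, (x, y) in products.items()
            if model[w] != model[x] * model[y]}
```

On the running example, the linear model x → 2, y → −1, x·y → 1 is flagged as spurious, since 2 · (−1) ≠ 1.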

#### **2.1 Cylindrical Algebraic Coverings**

Cylindrical algebraic coverings is a technique recently proposed by Ábrahám et al. [3] that is heavily inspired by CAD. While the way the computation proceeds is very different from traditional CAD, and instead somewhat similar to NLSAT [30], their mathematical underpinnings are essentially identical. The cylindrical algebraic coverings subsolver in cvc5 closely follows the presentation in [3]. Below, we discuss some differences and extensions. For this discussion, we must refer the reader to [3] for the relevant background material because of space constraints. We note that cvc5 relies on the libpoly library [29] to provide most of the computational infrastructure for algebraic reasoning.

*Square-Free Basis.* As with most CAD projection schemas, the set of projection polynomials needs to be a square-free basis when computing the characterization for an interval in [3, Algorithm 4]. However, the resultants computed in this algorithm combine polynomials from different sets, which are not necessarily coprime. The remedy is to either make these sets of polynomials pairwise square-free or to fully factor all projection polynomials. We adopt the former approach.

*Starting Model.* Although the linear model may not satisfy the nonlinear constraints, we may expect it to be in the vicinity of a proper model. We thus optionally use the linear model as an *initial assignment* for the cylindrical algebraic coverings algorithm in one of two ways: either using it initially in the search and discarding it as soon as it conflicts; or using it whenever possible, even if it leads to a conflict in another branch of the search. Unfortunately, neither technique has any discernible impact in our experiments.

*Interval Pruning.* As already noted in [3], a covering may contain two kinds of redundant intervals: intervals fully contained in another interval, or intervals contained in the union of other intervals. Removing the former kind of redundancies is not only clearly beneficial, but also required for how the characterizations are computed. It is not clear, however, if it is worthwhile to remove redundancies of the second kind because, while it can simplify the characterization locally, it may also make the resulting interval smaller, slowing down the overall solving process. Note that there may not be a unique redundant interval: e.g., if multiple intervals overlap, it may be possible to remove one of two intervals, but not both of them. We have implemented a simple heuristic to detect redundancies of the second kind, always removing the smallest interval with respect to the interval ordering given in [3]. Even though these redundancies occur in about 7.5% of all QF_NRA benchmarks, using this technique has only a very limited impact. It may be that for certain kinds of benchmarks, underrepresented in SMT-LIB, the technique is valuable. Or it may be that some variation of the technique is more broadly helpful. These are interesting directions for future work.
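Redundancy of the second kind can be sketched over plain closed rational intervals (a simplification: real coverings track open/closed bounds and algebraic endpoints, and use the interval ordering of [3] rather than length; the function names are our own):

```python
def covered_by_union(target, others):
    """Is the closed interval `target` contained in the union of the
    closed intervals in `others`?  Sweep from the left endpoint,
    extending the covered frontier with each overlapping interval."""
    lo, hi = target
    frontier = lo
    for a, b in sorted(others):
        if a > frontier:
            return False        # gap before the next interval starts
        frontier = max(frontier, b)
        if frontier >= hi:
            return True
    return frontier >= hi

def prune(intervals):
    """Greedily drop intervals covered by the union of the rest,
    trying smaller intervals first (a simplified stand-in for the
    heuristic described above)."""
    kept = sorted(intervals, key=lambda iv: iv[1] - iv[0])
    i = 0
    while i < len(kept):
        rest = kept[:i] + kept[i + 1:]
        if covered_by_union(kept[i], rest):
            kept.pop(i)         # redundant: contained in union of the rest
        else:
            i += 1
    return kept
```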

*Lifting and Coefficient Selection with Lazard.* The original cylindrical algebraic coverings technique is based on McCallum's projection operator [37], which is particularly well-studied, but also (refutationally) unsound: polynomial nullification may occur when computing the real roots, possibly leading to the loss of real roots and thus solution candidates. One then needs to check for these cases and fall back to a more conservative, albeit more costly, projection schema such as those due to Collins [11] or Hong [26].

Lazard's projection schema [35], which has been proven correct only recently [38], provides very small projection sets and is both sound and complete. This comes at the price of a different mathematical background and a modified lifting procedure, which corresponds to a modified procedure for real root isolation. Although the local projections employed in cylindrical algebraic coverings have not been formally verified for Lazard's projection schema yet, we expect no significant issues there. Adopting it seems to be a logical improvement, as already mentioned in [3]. The modified real root isolation procedure is a significant hurdle in practice, as it requires additional nontrivial algorithms [32, Section 5.3.2]. We implemented it using CoCoALib [1] in cvc5 [33], achieving soundness without any discernible negative performance impact.

Using Lazard's projection schema, for all its benefits, may seem questionable for the following reasons: (*i*) the unsoundness of McCallum's projection operator is virtually never witnessed in practice [32,33, Section 6.5], and (*ii*) the projection sets computed by Lazard's and McCallum's projection operators are identical for more than 99.5% of all of QF_NRA [33]. We argue, though, that working in the domain of formal verification warrants the effort of obtaining a (provably) correct result, especially if it does not incur a performance overhead.

*Proof Generation.* Recently, generating formal proofs to certify the result of SMT solvers has become an area of focus. In particular, there is a large and ongoing effort to produce proofs in cvc5. The incremental linearization approach can be seen as an oracle which produces lemmas that are easy to prove individually, so cvc5 does generate proofs for them; the complex part is finding those lemmas and making sure they actually help the solver make progress.

The situation is very different for cylindrical algebraic coverings: the produced lemma is the infeasible subset, and we usually have no simpler proof than the computations relying on CAD theory. That said, cylindrical algebraic coverings appear to be more amenable to automatic proof generation than traditional CAD-based approaches [4,14]. In fact, although making these proofs detailed enough for automated verification is still an open problem, they are already broken into smaller parts that closely follow the tree-shaped computation of the algorithm. This allows cvc5 to produce at least a proof skeleton in that case.

#### **2.2 Incremental Linearization**

Our theory solver for nonlinear (real) arithmetic optionally uses lemma schemas following the incremental linearization approaches described by Cimatti et al. [9] and Reynolds et al. [42]. These schemas incrementally refine candidate models from the linear arithmetic solver by introducing selected quantifier-free lemmas that express properties of multiplication, such as signedness (e.g., *x* > 0 ∧ *y* > 0 ⇒ *x*·*y* > 0) or monotonicity (e.g., |*x*| > |*y*| ⇒ *x*·*x* > *y*·*y*). They are generated as needed to refute spurious models that violate these properties.
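A sketch of how such a schema can refute a spurious model: given model values for x, y and the abstraction variable w standing for x·y, return a violated signedness lemma. The names and the string rendering are our own; cvc5 builds proper terms instead.

```python
def sign_lemma(model, x, y, w):
    """If the linear model violates a signedness property of
    multiplication (w abstracts x*y), return a refuting lemma as a
    string; otherwise None (illustrative sketch, not exhaustive)."""
    vx, vy, vw = model[x], model[y], model[w]
    if vx > 0 and vy > 0 and vw <= 0:
        return f"({x} > 0 ∧ {y} > 0) ⇒ {x}·{y} > 0"
    if vx > 0 and vy < 0 and vw >= 0:
        return f"({x} > 0 ∧ {y} < 0) ⇒ {x}·{y} < 0"
    if vx < 0 and vy < 0 and vw <= 0:
        return f"({x} < 0 ∧ {y} < 0) ⇒ {x}·{y} > 0"
    return None
```

On the earlier spurious model x → 2, y → −1, x·y → 1, the second case fires and yields the lemma from Sect. 2.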

Most lemma schemas built into cvc5 are crafted so as to avoid introducing new monomial terms or coefficients, since that could lead to non-termination in the CDCL(T) search. As a notable exception, we rely on a lemma schema for *tangent planes* for multiplication [9], which can be used to refute the candidate model for any application of the multiplication operator · whose value in the linear model is inconsistent with the standard interpretation of ·. Note that since these lemmas depend upon the current model value chosen for arithmetic variables, tangent plane lemmas may introduce an unbounded number of new literals into the search. The set of lemma schemas used by the solver is user-configurable, as described in the following section.
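The tangent-plane schema rests on the algebraic identity x·y − (b·x + a·y − a·b) = (x − a)·(y − b), so the plane through the model point (a, b) bounds the true product from above or below depending on the sign of (x − a)(y − b). A small sketch checking this identity (the helper names are our own):

```python
def tangent_plane(a, b):
    """Tangent plane to w = x*y at the point (a, b), returned as the
    coefficients (coef_x, coef_y, constant) of b*x + a*y - a*b."""
    return (b, a, -a * b)

def check_identity(a, b, x, y):
    """Verify x*y - plane(x, y) == (x - a) * (y - b) at a sample point."""
    cx, cy, c0 = tangent_plane(a, b)
    plane = cx * x + cy * y + c0
    return x * y - plane == (x - a) * (y - b)
```

Since the identity holds for all values, a tangent plane lemma instantiated at the current model point is always a valid property of multiplication, while refuting the spurious model value at (a, b) itself.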

#### **2.3 Strategy**

The overall theory solver for nonlinear arithmetic is built from several subsolvers, implementing the techniques described above, using a rather naive strategy, as summarized in Algorithm 1. After a spurious linear model has been constructed that cannot be repaired, we first apply a subset of the lemma schemas that do not introduce an unbounded number of new terms (with procedure IncLinearizationLight); then, we continue with the remaining lemma schemas


**Algorithm 1:** Strategy for nonlinear arithmetic solver

(with procedure IncLinearizationFull); finally, we resort to the coverings solver which is guaranteed to find either a conflict or a model. Internally, each procedure sequentially tries its assigned lemma schemas from [9,42] until it constructs a lemma that can block the spurious model.

The approach is dynamically configured based on input options and the logic of the input formula. For example, by default we disable IncLinearizationFull for QF_NRA, as it tends to diverge in cases where the coverings solver quickly terminates.
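The control flow described above can be sketched as follows. The subsolver names, signatures, and return conventions are our assumptions for illustration; Algorithm 1 itself is not reproduced here.

```python
# A minimal sketch (ours; subsolver names and return conventions are
# assumptions) of the strategy described in the text: cheap lemma schemas
# first, then schemas that may introduce new terms, then the complete
# coverings backend.

def nonlinear_check(model, inc_lin_light, inc_lin_full, coverings,
                    use_full=True):
    """Each incremental-linearization procedure returns lemmas blocking the
    spurious model, or [] if none apply; coverings returns conflict lemmas
    or None when the model is genuine."""
    lemmas = inc_lin_light(model)
    if lemmas:
        return "lemmas", lemmas
    if use_full:  # disabled by default for QF_NRA, as discussed above
        lemmas = inc_lin_full(model)
        if lemmas:
            return "lemmas", lemmas
    conflict = coverings(model)  # complete: either a conflict or a model
    return ("lemmas", conflict) if conflict else ("sat", model)
```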

#### **2.4 Beyond QF_NRA**

The presented solver primarily targets quantifier-free nonlinear real arithmetic, but it is also used in the presence of quantifiers and in combination with other theories.

*Quantified Logics.* Solving quantified logics for nonlinear arithmetic requires solving quantifier-free subproblems, and thus any improvement to quantifier-free solving also benefits solving with quantifiers. In practice, however, the instantiation heuristics are just as important for overall solver performance.

*Multiple Theories.* The theory combination framework as implemented in cvc5 requires evaluating equalities over the combined model. To support this functionality, real algebraic numbers had to be properly integrated into the entire solver; in particular, the ability to compute with these numbers could not be local to the cylindrical algebraic coverings module or even the nonlinear solver.

# **3 Experimental Results**

We evaluate our implementation within cvc5 (commit id 449dd7e) in comparison with other SMT solvers on all 11552 benchmarks in the quantifier-free nonlinear real arithmetic (QF_NRA) logic of SMT-LIB. We consider three configurations of cvc5, each of which runs a subset of the steps from Algorithm 1. All configurations run lines 2–4. In addition, cvc5.cov runs line 7, cvc5.inclin runs lines 5 and 6, and cvc5 runs lines 5 and 7. All experiments were conducted on Intel Xeon E5-2637 v4 CPUs with a time limit of 20 minutes and 8 GB of memory.

We compare cvc5 with recent versions of all other SMT solvers that participated in the QF_NRA logic of SMT-COMP 2021 [44]: MathSAT 5.6.6 [10], SMT-RAT 19.10.560 [13], veriT [7] (veriT+raSAT+Redlog), Yices2 2.6.4 [18] (Yices-QS for quantified logics), and z3 4.8.14 [16]. MathSAT employs an abstraction-refinement mechanism very similar to the one described in Sect. 2.2; veriT [23] forwards nonlinear arithmetic problems to the external tools raSAT [45], which uses interval constraint propagation, and Redlog/Reduce [17], which focuses on virtual substitution and cylindrical algebraic decomposition; SMT-RAT, Yices2, and z3 all implement some variant of MCSAT [30]. Note that SMT-RAT also implements the cylindrical algebraic coverings approach, but it is less effective than SMT-RAT's adaptation of MCSAT [3].


**Fig. 2.** (a) Experiments for QF_NRA. (b) Experiments for NRA and QF_UFNRA.

Figure 2a shows that cvc5 significantly outperforms all other QF_NRA solvers. Both the coverings approach (cvc5.cov) and the incremental linearization approach (cvc5.inclin) contribute substantially to the overall performance of the unified solver in cvc5, with coverings solving many satisfiable instances and incremental linearization helping on unsatisfiable ones. Even though cvc5.inclin closely follows [9], it outperforms MathSAT on unsatisfiable benchmarks, which is where cvc5 relies on incremental linearization the most.

Comparing cvc5 and Yices2 is particularly interesting, as the coverings approach in cvc5 and the NLSAT solver in Yices2 both rely on libpoly [29], thus using the same implementation of algebraic numbers and operations over them. Our integration of incremental linearization and algebraic coverings is compatible with the traditional CDCL(T) framework and outperforms the alternative NLSAT approach, which is specially tailored to nonlinear real arithmetic.

Going beyond QF_NRA, we also evaluate the performance of our solver in the context of theory combination (with all 37 benchmarks from QF_UFNRA) and quantifiers (with all 4058 benchmarks from NRA). There, cvc5 is a close runner-up to Yices2 and z3, thanks to the coverings subsolver, which significantly improves cvc5's performance. We conjecture that the remaining gap is due to components other than the nonlinear arithmetic solver, such as the solver for equality and uninterpreted functions, details of theory combination, or quantifier instantiation heuristics. Interestingly, the sets of unsolved instances in NRA are almost disjoint for cvc5.cov, Yices2, and z3, indicating that each tool could solve the remaining benchmarks with reasonable extra effort.

# **4 Conclusion**

We have presented an approach for solving quantifier-free nonlinear real arithmetic problems that combines previous approaches based on incremental linearization [9] and cylindrical algebraic coverings [3] into one coherent abstraction-refinement loop. The resulting implementation is very effective, outperforming other state-of-the-art solver implementations, and integrates seamlessly in the CDCL(T) framework.

The general approach also applies to integer problems, quantified formulas, and instances with multiple theories, and can additionally be used in combination with transcendental functions [9] and bitwise conjunction for integers [47]. Further evaluations of these combinations are left to future work.

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Preprocessing of Propagation Redundant Clauses**

Joseph E. Reeves, Marijn J. H. Heule, and Randal E. Bryant

Carnegie Mellon University, Pittsburgh, PA, USA {jereeves,mheule,randy.bryant}@cs.cmu.edu

**Abstract.** The *propagation redundant* (PR) proof system generalizes the *resolution* and *resolution asymmetric tautology* proof systems used by *conflict-driven clause learning* (CDCL) solvers. PR allows short proofs of unsatisfiability for some problems that are difficult for CDCL solvers. Previous attempts to automate PR clause learning used hand-crafted heuristics that work well on some highly-structured problems. For example, the solver SADICAL incorporates PR clause learning into the CDCL loop, but it cannot compete with modern CDCL solvers due to its fragile heuristics. We present PRELEARN, a preprocessing technique that learns short PR clauses. Adding these clauses to a formula reduces the search space that the solver must explore. By performing PR clause learning as a preprocessing stage, PR clauses can be found efficiently without sacrificing the robustness of modern CDCL solvers. On a large portion of SAT competition benchmarks we found that preprocessing with PRELEARN improves solver performance. In addition, there were several satisfiable and unsatisfiable formulas that could only be solved after preprocessing with PRELEARN. PRELEARN supports proof logging, giving a high level of confidence in the results.

## **1 Introduction**

*Conflict-driven clause learning* (CDCL) [27] is the standard paradigm for solving the satisfiability problem (SAT) in propositional logic. CDCL solvers learn clauses implied through *resolution* inferences. Additionally, all competitive CDCL solvers use pre- and in-processing techniques captured by the *resolution asymmetric tautology* (RAT) proof system [21]. As examples, the well-studied pigeonhole and mutilated chessboard problems are challenging benchmarks with exponentially-sized resolution proofs [1,12]. It is possible to construct small hand-crafted proofs for the pigeonhole problem using *extended resolution* (ER) [8], a proof system that allows the introduction of new variables [32]. ER can be expressed in RAT but has proved difficult to automate due to the large search space. Even with modern inprocessing techniques, many CDCL solvers struggle on these seemingly simple problems. The *propagation redundant* (PR) proof system allows short proofs for these problems [14,15], and unlike in ER, no new variables are required. This makes PR an attractive candidate for automation.

At a high level, CDCL solvers make decisions that typically yield an unsatisfiable branch of a problem. The clause that prunes the unsatisfiable branch from the search space is learned, and the solver continues by searching another branch. PR extends this

© The Author(s) 2022

The authors are supported by the NSF under grant CCF-2108521.

J. Blanchette et al. (Eds.): IJCAR 2022, LNAI 13385, pp. 106–124, 2022. https://doi.org/10.1007/978-3-031-10769-6\_8

paradigm by allowing more aggressive pruning. In the PR proof system a branch can be pruned as long as there exists another branch that is at least as satisfiable. As an example, consider the mutilated chessboard problem: finding a covering by 2 × 1 dominoes of an n × n chessboard with two opposite corners removed (see Section 5.4). Given two horizontally oriented dominoes covering a 2 × 2 square, two vertically oriented dominoes could cover the same 2 × 2 square. For any solution that uses the dominoes in the horizontal orientation, replacing them with the dominoes in the vertical orientation would also be a solution. The second orientation is as satisfiable as the first, and so the first can be pruned from the search space. Even though the number of possible solutions may be reduced, the pruning is satisfiability preserving. This is a powerful form of reasoning that can efficiently remove many symmetries from the mutilated chessboard, making the problem much easier to solve [15].

The *satisfaction-driven clause learning* (SDCL) solver SADICAL [16] incorporates PR clause learning into the CDCL loop. SADICAL implements hand-crafted decision heuristics that exploit the canonical structure of the pigeonhole and mutilated chessboard problems to find short proofs. However, SADICAL's performance deteriorates under slight variations to the problems including different constraint encodings [7]. The heuristics were developed from a few well-understood problems and do not generalize to other problem classes. Further, the heuristics for PR clause learning are likely ill-suited for CDCL, making the solver less robust.

In this paper, we present PRELEARN, a preprocessing technique for learning PR clauses. PRELEARN alternates between finding and learning PR clauses. We develop multiple heuristics for finding PR clauses and multiple configurations for learning some subset of the found PR clauses. As PR clauses are learned we use failed literal probing [11] to find unit clauses implied by the formula. The preprocessing is made efficient by taking advantage of the inner/outer solver framework in SADICAL. The learned PR clauses are added to the original formula, aggressively pruning the search space in an effort to guide CDCL solvers to short proofs. With this method PR clauses can be learned without altering the complex heuristics that make CDCL solvers robust. PRELEARN focuses on finding short PR clauses and failed literals to effectively reduce the search space. This is done with general heuristics that work across a wide range of problems.

Most SAT solvers support logging proofs of unsatisfiability for independent checking [17,20,33]. This has proved valuable for verifying solutions independent of a (potentially buggy) solver. Modern SAT solvers log proofs in the DRAT proof system (RAT [21] with deletions). DRAT captures all widely used pre- and in-processing techniques, including bounded variable elimination [10], bounded variable addition [26], and extended learning [4,32]. DRAT can express the common symmetry-breaking techniques, but doing so is complicated [13]. PR can compactly express some symmetry-breaking techniques [14,15], yielding short proofs that can be checked by the proof checker DPR-TRIM [16]. PR gives a framework for strong symmetry-breaking inferences and also maintains the highly desirable ability to independently verify proofs.

The contributions of this paper include: (1) giving a high-level algorithm for extracting PR clauses, (2) implementing several heuristics for finding and learning PR clauses, (3) evaluating the effectiveness of different heuristic configurations, and (4) assessing the impact of PRELEARN on solver performance. PRELEARN improves the performance of the CDCL solver KISSAT on a quarter of the satisfiable and unsatisfiable competition benchmarks we considered. The improvement is significant for a number of instances that can only be solved by KISSAT after preprocessing. Most of them come from hard combinatorial problems with small formulas. In addition, PRELEARN directly produces refutation proofs for the mutilated chessboard problem containing only unit and binary PR clauses.

## **2 Preliminaries**

We consider propositional formulas in *conjunctive normal form* (CNF). A CNF formula ψ is a conjunction of *clauses*, where each clause is a disjunction of *literals*. A literal l is either a variable x (a positive literal) or a negated variable ¬x (a negative literal). For a set of literals L, the formula ψ(L) consists of the clauses {C ∈ ψ | C ∩ L ≠ ∅}.

An *assignment* is a mapping from variables to truth values 1 (*true*) and 0 (*false*). An assignment is *total* if it assigns a value to every variable, and *partial* if it assigns values to a subset of the variables. The sets of variables occurring in a formula, assignment, or clause are denoted var(ψ), var(α), and var(C); for a literal l, var(l) denotes its variable.

An assignment α *satisfies* a positive (negative) literal l if α maps var(l) to true (false, respectively), and *falsifies* it if α maps var(l) to false (true, respectively). We write a finite partial assignment as the set of literals it satisfies. An assignment satisfies a clause if the clause contains a literal satisfied by the assignment, and satisfies a formula if it satisfies every clause in the formula. A formula is *satisfiable* if there exists a satisfying assignment, and *unsatisfiable* otherwise. Two formulas are *logically equivalent* if they share the same set of satisfying assignments, and *satisfiability equivalent* if they are either both satisfiable or both unsatisfiable.

If an assignment α satisfies a clause C we define C|α = ⊤; otherwise C|α denotes the clause C with the literals falsified by α removed. The empty clause is denoted by ⊥. The formula ψ reduced by an assignment α is given by ψ|α = {C|α | C ∈ ψ and C|α ≠ ⊤}. Given an assignment α = l1 ... ln, the clause C = (¬l1 ∨ ··· ∨ ¬ln) is the clause that *blocks* α; conversely, the assignment *blocked* by a clause is the negation of the literals in the clause. The literals of a clause touched by an assignment are defined by touchedα(C) = {l | l ∈ C and var(l) ∈ var(α)}; for a formula ψ, touchedα(ψ) is the union of touchedα(C) over the clauses C ∈ ψ. A *unit* is a clause containing a single literal. The *unit clause rule* takes the assignment α of all units in a formula ψ and generates ψ|α. Iteratively applying the unit clause rule until fixpoint is referred to as *unit propagation*; in cases where unit propagation yields ⊥ we say it derived a *conflict*. A formula ψ *implies* a formula ψ′, denoted ψ |= ψ′, if every assignment satisfying ψ also satisfies ψ′. By ψ ⊢1 ψ′ we denote that for every clause C ∈ ψ′, applying unit propagation in ψ to the assignment blocked by C derives a conflict. If unit propagation derives a conflict on the formula ψ ∪ {{l}}, we say l is a *failed literal*, and the unit ¬l is logically implied by the formula. Failed literal probing [11] is the process of successively assigning literals to check whether units are implied by the formula. In its simplest form, probing involves assigning a literal l and learning the unit ¬l if unit propagation derives a conflict; otherwise l is unassigned and the next literal is checked.
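Unit propagation and failed-literal probing can be sketched as follows. This is a minimal illustration in our own notation (DIMACS-style integer literals, where -3 denotes ¬x3), not the data structures of an actual solver.

```python
# A minimal sketch of unit propagation and failed-literal probing as
# defined above. Clauses are tuples of nonzero ints; -3 means the
# negation of variable 3. Helper names are ours.

def unit_propagate(clauses, lits):
    """Extend the set of literals `lits` by the unit clause rule.
    Returns (assignment, conflict_flag)."""
    assignment = set(lits)
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(l in assignment for l in clause):
                continue  # clause already satisfied
            rest = [l for l in clause if -l not in assignment]
            if not rest:
                return assignment, True   # all literals falsified: conflict
            if len(rest) == 1:            # unit clause: force its literal
                assignment.add(rest[0])
                changed = True
    return assignment, False

def failed_literals(clauses, variables):
    """Probe each literal l; if propagating l conflicts, the unit -l is
    implied by the formula."""
    learned = set()
    for v in variables:
        for lit in (v, -v):
            _, conflict = unit_propagate(clauses, {lit})
            if conflict:
                learned.add(-lit)
    return learned
```

For instance, on (x1 ∨ x2) ∧ (¬x2 ∨ x3) ∧ (¬x2 ∨ ¬x3), probing x2 propagates x3 and falsifies the last clause, so ¬x2 is learned; probing ¬x1 propagates x2 and fails likewise, so x1 is learned.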

To evaluate the satisfiability of a formula, a CDCL solver [27] iteratively performs the following operations: First, the solver performs unit propagation, then tests for a conflict. Unit propagation is made efficient with two-literal watch pointers [28]. If there is no conflict and all variables are assigned, the formula is satisfiable. Otherwise, the solver chooses an unassigned variable through a variable decision heuristic [6,25], assigns a truth value to it, and performs unit propagation. If, however, there is a conflict, the solver performs conflict analysis potentially learning a short clause. In case this clause is the empty clause, the formula is unsatisfiable.

## **3 The PR Proof System**

A clause C is *redundant* w.r.t. a formula ψ if ψ and ψ ∪ {C} are *satisfiability equivalent*. The clause sequence ψ, C1, C2, ..., Cn is a clausal proof of Cn if each clause Ci (1 ≤ i ≤ n) is redundant w.r.t. ψ ∪ {C1, C2, ..., Ci−1}. The proof is a refutation of ψ if Cn is ⊥. Clausal proof systems may also allow deletion. In a refutation proof, clauses can be deleted freely because deletion cannot make a formula less satisfiable.

Clausal proof systems are distinguished by the kinds of redundant clauses they allow to be added. The standard SAT solving paradigm CDCL learns clauses implied through *resolution*. These clauses are logically implied by the formula and fall under the *reverse unit propagation* (RUP) proof system. The *resolution asymmetric tautology* (RAT) proof system generalizes RUP, and all commonly used inprocessing techniques emit DRAT proofs. The *propagation redundant* (PR) proof system generalizes RAT by allowing the pruning of branches *without loss of satisfaction*.

Let C be a clause in the formula ψ and α the assignment blocked by C. Then C is PR w.r.t. ψ if and only if there exists an assignment ω such that ψ|α ⊢1 ψ|ω and ω satisfies C. Intuitively, this allows inferences that block a partial assignment α as long as another assignment ω is at least as satisfiable. This means every assignment containing α that satisfies ψ can be transformed into an assignment containing ω that satisfies ψ.
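The PR condition can be checked directly once a witness is given. The following sketch, in our own notation rather than any solver's API, tests ψ|α ⊢1 ψ|ω for a candidate clause C and witness ω:

```python
# A sketch of the PR condition above: C is PR w.r.t. psi iff some witness
# omega satisfies C and psi|alpha ⊢1 psi|omega, where alpha is the
# assignment blocked by C. Literals are DIMACS-style ints; names are ours.

def reduce_formula(clauses, assignment):
    """psi|alpha: drop satisfied clauses, remove falsified literals."""
    out = []
    for clause in clauses:
        if any(l in assignment for l in clause):
            continue
        out.append(tuple(l for l in clause if -l not in assignment))
    return out

def propagates_to_conflict(clauses, lits):
    """Does unit propagation from `lits` derive a conflict in `clauses`?"""
    assignment = set(lits)
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(l in assignment for l in clause):
                continue
            rest = [l for l in clause if -l not in assignment]
            if not rest:
                return True
            if len(rest) == 1:
                assignment.add(rest[0])
                changed = True
    return False

def is_pr(clauses, c, omega):
    alpha = {-l for l in c}                 # assignment blocked by C
    if not any(l in omega for l in c):      # omega must satisfy C
        return False
    reduced_a = reduce_formula(clauses, alpha)
    reduced_w = reduce_formula(clauses, omega)
    # psi|alpha ⊢1 psi|omega: for every D in psi|omega, propagating the
    # assignment blocked by D in psi|alpha must derive a conflict.
    return all(propagates_to_conflict(reduced_a, {-l for l in d})
               for d in reduced_w)
```

For example, w.r.t. (x1 ∨ x2) ∧ (¬x1 ∨ x2) the unit clause (¬x1) is PR with witness ω = ¬x1, whereas w.r.t. the single clause (x1) it is not.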

Clausal proof systems must be checkable in polynomial time to be useful in practice. RUP and RAT are efficiently checkable due to unit propagation. In general, determining whether a clause is PR is an NP-complete problem [18]. However, a PR proof is checkable in polynomial time if the witness assignments ω are included; a clausal proof with witnesses has the form ψ, (C1, ω1), (C2, ω2), ..., (Cn, ωn). The proof checker DPR-TRIM can efficiently check PR proofs that include witnesses. Further, DPR-TRIM can emit proofs in the LPR format, which can be validated by the formally verified checker CAKE-LPR [31], used to validate results in recent SAT competitions.

## **4 Pruning Predicates and SADICAL**

Determining if a clause is PR is NP-complete and can naturally be formulated in SAT. Given a clause C and formula ψ, a *pruning predicate* is a formula such that if it is satisfiable, the clause C is redundant w.r.t. ψ. SADICAL uses two pruning predicates to determine if a clause is PR: *positive reduct* and *filtered positive reduct*. If either predicate is satisfiable, the satisfying assignment serves as the witness showing the clause is PR.

Given a formula ψ and an assignment α, the *positive reduct* is the formula G ∧ C where C is the clause that blocks α and G = {touchedα(D) | D ∈ ψ and D|α = ⊤}. If the positive reduct is satisfiable, the clause C is PR w.r.t. ψ. The positive reduct is satisfiable iff the clause blocking α is a *set-blocked* clause [23].

Given a formula ψ and an assignment α, the *filtered positive reduct* is the formula G ∧ C where C is the clause that blocks α and G = {touchedα(D) | D ∈ ψ and D|α ⊬1 touchedα(D)}. If the filtered positive reduct is satisfiable, the clause C is PR w.r.t. ψ. The filtered positive reduct is a subset of the positive reduct and is satisfiable iff the clause blocking α is a *set-propagation-redundant* clause [14]. Example 1 shows a formula for which the positive and filtered positive reducts differ, and only the filtered positive reduct is satisfiable.

*Example 1.* Given the formula (x1 ∨ x2) ∧ (¬x1 ∨ x2), the positive reduct with α = x1 is (x1) ∧ (¬x1), which is unsatisfiable. The clause (x1) can be filtered out, giving the filtered positive reduct (¬x1), which is satisfiable.
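The positive reduct is simple to construct. The sketch below, in our own notation (DIMACS-style literals; the negation bars in Example 1 were lost in typesetting, and we use a polarity-consistent reading), builds G ∧ C and checks its satisfiability by brute force, which suffices for these toy reducts:

```python
# A sketch (ours) of the positive reduct G ∧ C: the touched parts of the
# clauses satisfied by alpha, plus the clause blocking alpha.
from itertools import product

def touched(clause, alpha):
    assigned = {abs(l) for l in alpha}
    return tuple(l for l in clause if abs(l) in assigned)

def positive_reduct(clauses, alpha):
    g = [touched(d, alpha) for d in clauses
         if any(l in alpha for l in d)]          # D|alpha = ⊤
    return g + [tuple(-l for l in alpha)]        # blocking clause C

def satisfiable(clauses):
    """Tiny brute-force check, fine for small reducts."""
    vs = sorted({abs(l) for c in clauses for l in c})
    for bits in product((False, True), repeat=len(vs)):
        m = dict(zip(vs, bits))
        if all(any(m[abs(l)] == (l > 0) for l in c) for c in clauses):
            return True
    return False
```

On our reading of Example 1, `positive_reduct([(1, 2), (-1, 2)], {1})` yields the clauses (x1) and (¬x1), which are jointly unsatisfiable.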

SADICAL [16] uses satisfaction-driven clause learning (SDCL), which extends CDCL by learning PR clauses [18] based on (filtered) positive reducts. SADICAL uses an inner/outer solver framework. The outer solver attempts to solve the SAT problem with SDCL. SDCL diverges from the basic CDCL loop when unit propagation after a decision does not derive a conflict. In this case a reduct is generated using the current assignment, and the inner solver attempts to solve the reduct using CDCL. If the reduct is satisfiable, the PR clause blocking the current assignment is learned, and the SDCL loop continues. The PR clause can be simplified by removing all non-decision variables from the assignment. SADICAL emits PR proofs by logging the satisfying assignment of the reduct as the witness, and these proofs are verified with DPR-TRIM. The key to SADICAL finding good PR clauses leading to short proofs is the decision heuristic, because variable selection builds the candidate PR clauses. Hand-crafted decision heuristics enable SADICAL to find short proofs on pigeonhole and mutilated chessboard problems. However, these heuristics differ significantly from the score-based heuristics in most CDCL solvers. Our experience with SADICAL suggests that improving the heuristics for SDCL reduces the performance of CDCL and vice versa. This may explain why SADICAL performs worse than standard CDCL solvers on the majority of the SAT competition benchmarks. While SADICAL integrates finding PR clauses of arbitrary size in the main search loop, our tool focuses on learning short PR clauses as a preprocessing step. This allows us to develop good heuristics for PR learning without compromising the main search loop.

## **5 Extracting PR Clauses**

The goal of PRELEARN is to find useful PR clauses that improve the performance of CDCL solvers on both satisfiable and unsatisfiable instances. Figure 1 shows how a SAT problem is solved using PRELEARN. For some preset time limit, PR clauses are found and then added to the original formula. Interleaved in this process is failed literal probing to check if unit clauses can be learned. When the preprocessing stage ends, the new formula that includes learned PR clauses is solved by a CDCL solver. If the

**Fig. 1.** Solving a formula with PRELEARN and a CDCL solver.

formula is satisfiable, the solver will produce a satisfying assignment. If the formula is unsatisfiable, a refutation proof of the original formula can be computed by combining the satisfaction preserving proof from PRELEARN and the refutation proof emitted by the CDCL solver. The complete proof can be verified with DPR-TRIM.

PRELEARN alternates between finding PR clauses and learning PR clauses. Candidate PR clauses are found by iterating over each variable in the formula, and for each variable constructing clauses that include that variable. To determine if a clause is PR, the positive reduct generated by that clause is solved. It can be costly to generate and solve many positive reducts, so heuristics are used to find candidate clauses that are more likely to be PR. It is possible to find multiple PR clauses that conflict with each other. PR clauses are conflicting if adding one of the PR clauses to the formula makes the other no longer PR. Learning PR clauses involves selecting PR clauses that are nonconflicting. The selection may maximize the number of PR clauses learned or optimize for some other metric. Adding PR clauses and units derived from probing may cause new clauses to become PR, so the entire process is iterated multiple times.
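The alternation described above can be summarized as a loop. This is a high-level sketch in our own terms; `find_pr` and `probe_units` are illustrative stand-ins, not PRELEARN's actual API:

```python
# A high-level sketch (ours) of the PRELEARN loop: alternate finding and
# learning PR clauses with failed-literal probing until a fixpoint or the
# time budget is reached.
import time

def prelearn(formula, find_pr, probe_units, budget_s):
    deadline = time.monotonic() + budget_s
    proof = []  # (clause, witness) pairs; witness None marks implied units
    while time.monotonic() < deadline:
        new = find_pr(formula)      # nonconflicting (clause, witness) pairs
        units = probe_units(formula)
        if not new and not units:
            break                   # nothing new became PR: fixpoint
        for clause, witness in new:
            formula.append(clause)
            proof.append((clause, witness))
        for u in units:             # units from failed-literal probing
            formula.append((u,))
            proof.append(((u,), None))
    return formula, proof
```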

#### **5.1 Finding PR Clauses**

PR clauses are found by constructing a set of candidate clauses and solving the positive reduct generated by each clause. In SADICAL the candidates are the clauses blocking the partial assignment of the solver after each decision in the SDCL loop that does not derive a conflict. In effect, candidates are constructed using the solver's variable decision heuristic. We take a more general approach, constructing sets of candidates for each variable based on unit propagation and the partial assignment's neighbors.

For a variable x, neighbors(x) denotes the set of variables occurring in clauses containing the literal x or ¬x, excluding x itself. For a partial assignment α, neighbors(α) denotes ⋃x∈var(α) neighbors(x) \ var(α). Candidate clauses for a literal l are generated in the following way: l is assigned and unit propagation is performed, yielding a partial assignment α; then each variable v ∈ neighbors(α) contributes the binary candidates (¬l ∨ v) and (¬l ∨ ¬v).

Example 2 shows how candidate binary clauses are constructed using both polarities of an initial variable x. In Example 3 the depth is expanded to reach more variables and create larger sets of candidate clauses. The depth parameter is used in Section 5.4.

*Example 2.* Consider the following formula: (x1 ∨ x2) ∧ (¬x1 ∨ x3) ∧ (x1 ∨ x4 ∨ x5) ∧ (x2 ∨ x6 ∨ x7) ∧ (x3 ∨ x7 ∨ x8) ∧ (x8 ∨ x9).

**Case 1:** We start with var(x1) = 1 and perform unit propagation, resulting in α = {x1, x3}. Observe that neighbors(α) = {x2, x4, x5, x7, x8}. The generated candidate clauses are (¬x1 ∨ x2), (¬x1 ∨ ¬x2), (¬x1 ∨ x4), (¬x1 ∨ ¬x4), ..., (¬x1 ∨ x8), (¬x1 ∨ ¬x8).

**Case 2:** We start with var(x1) = 0 and perform unit propagation, resulting in α = {¬x1, x2}. Observe that neighbors(α) = {x3, x4, x5, x6, x7}. The generated candidate clauses are (x1 ∨ x3), (x1 ∨ ¬x3), (x1 ∨ x4), (x1 ∨ ¬x4), ..., (x1 ∨ x7), (x1 ∨ ¬x7).

*Example 3.* Take the formula from Example 2 and the assignment var(x1) = 1 as in Case 1. The set of candidate clauses can be expanded by also considering the unassigned neighbors of the variables in neighbors(α). For example, neighbors(x8) = {x3, x7, x9}, of which x9 is new and unassigned. This adds (¬x1 ∨ x9) and (¬x1 ∨ ¬x9) to the set of candidate clauses. This can be iterated by including neighbors of new unassigned variables from the prior step.
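Candidate generation via neighbors can be sketched as follows. We use DIMACS-style literals and a polarity-consistent reading of Example 2's formula (the negation bars were lost in typesetting); the helper names are ours, not the tool's:

```python
# A sketch (ours) of binary-candidate generation: assign a literal, unit
# propagate, and pair its negation with both polarities of every
# unassigned neighbor variable.

def unit_propagate(clauses, lits):
    assignment = set(lits)
    changed = True
    while changed:
        changed = False
        for c in clauses:
            if any(l in assignment for l in c):
                continue
            rest = [l for l in c if -l not in assignment]
            if len(rest) == 1 and rest[0] not in assignment:
                assignment.add(rest[0])
                changed = True
    return assignment

def neighbors(clauses, var):
    return {abs(l) for c in clauses
            if var in {abs(x) for x in c} for l in c} - {var}

def binary_candidates(clauses, lit):
    alpha = unit_propagate(clauses, {lit})
    nbrs = set()
    for l in alpha:
        nbrs |= neighbors(clauses, abs(l))
    nbrs -= {abs(l) for l in alpha}
    # candidates (¬lit ∨ v) and (¬lit ∨ ¬v) for each neighbor variable v
    return [(-lit, s * v) for v in sorted(nbrs) for s in (1, -1)]
```

On our reading of Example 2, starting from x1 (literal `1`) yields ten candidates over the neighbor variables {x2, x4, x5, x7, x8}, and starting from ¬x1 yields ten candidates over {x3, x4, x5, x6, x7}, matching the two cases above.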

We consider both polarities when constructing candidates for a variable. After all candidates for a variable are constructed, the positive reduct for each candidate is generated and solved in order. Note that propagated literals appearing in the partial assignment do not appear in the PR clause. The satisfying assignment is stored as the witness and the PR clause may be learned immediately depending on the learning configuration.

This process extends naturally to ternary clauses. The binary candidates are generated, and for each candidate (x ∨ y), the literals x and y are assigned to false in the first step. Each variable z ∈ neighbors(α) then yields the clauses (x ∨ y ∨ z) and (x ∨ y ∨ ¬z). This approach can generate many candidate ternary clauses depending on the connectivity of the formula, since each candidate binary clause is expanded; a filtering operation would be useful to avoid the blow-up in the number of candidates. There are likely diminishing returns when searching for larger PR clauses because (1) there are more possible candidates, (2) the positive reducts are likely larger, and (3) each clause blocks less of the search space. We consider only unit and binary candidate clauses in our main evaluation.

Ideally, we should construct candidate clauses that are likely PR, to reduce the number of failed reducts generated. Note that the (filtered) positive reduct can only be satisfiable if, given the partial assignment, there exists a reduced, satisfied clause. By focusing on neighbors, we guarantee that such a clause exists. The *reduced* heuristic in SADICAL finds variables in all reduced but unsatisfied clauses. The idea behind this heuristic is to direct the assignment towards conditional autarkies that imply a satisfiable positive reduct [18]. The neighbors approach generalizes this to variables in all reduced clauses, whether or not they are satisfied. A comparison can be found in our repository.

#### **5.2 Learning PR Clauses**

Given multiple clauses that are PR w.r.t. the same formula, it is possible that some of the clauses conflict with each other and cannot be learned simultaneously. Example 4 shows how learning one PR clause may invalidate the witness of another PR clause. It may be that a different witness exists, but finding it requires regenerating the positive reduct to include the learned PR clause and solving it. The simplest way to avoid conflicting PR clauses is to learn PR clauses as they are found. When a reduct is satisfiable, the PR clause is added to the formula and logged with its witness in the proof. Subsequent reducts will then be generated from the formula including all added PR clauses. Therefore, a satisfiable reduct ensures a PR clause can be learned.

Alternatively, clauses can be found in batches, and then a subset of nonconflicting clauses can be learned. The set of conflicts between PR clauses can be computed in polynomial time: for each pair of PR clauses C and D, if the assignment that generated the pruning predicate for D touches C and C is not satisfied by the witness of D, then C conflicts with D. In some cases reordering the two PR clauses may avoid a conflict; in Example 4, learning the second clause first would not affect the validity of the first clause's witness. Once the conflicts are known, clauses can be learned based on some heuristic ordering. Batch learning configurations are discussed further in the following section.

*Example 4.* Assume the following clause–witness pairs are valid in a formula ψ: {(x1 ∨ x2 ∨ x3), x1 x2 ¬x3} and {(¬x1 ∨ x2 ∨ x4), ¬x1 ¬x2 x4}. The first clause conflicts with the second: if the first clause is added to ψ, the clause (x1 ∨ x2) would be in the positive reduct for the second clause, but it is not satisfied by the witness of the second clause.
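The pairwise conflict test can be sketched directly from the definition above. We use DIMACS-style literals and one polarity-consistent reading of Example 4 (its negation bars were lost in typesetting); the function name is ours:

```python
# A sketch (ours) of the pairwise conflict test: C conflicts with D if the
# assignment blocked by D touches C while D's witness leaves C unsatisfied.

def conflicts(c, d, omega_d):
    alpha_d_vars = {abs(l) for l in d}  # vars of the assignment blocked by D
    touched = any(abs(l) in alpha_d_vars for l in c)
    satisfied = any(l in omega_d for l in c)
    return touched and not satisfied
```

On our reading of Example 4, the first clause conflicts with the second, but not the other way around, so learning the second clause first avoids the conflict.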

#### **5.3 Additional Configurations**

The sections above describe the PRELEARN configuration used in the main evaluation, i.e., finding candidate PR clauses with the neighbors heuristic and learning clauses instantly as the positive reducts are solved. In this section we present several additional configurations. The time-constrained reader may skip ahead to Section 5.4 for the presentation of our main results.

In batch learning, PR clauses are found in batches and then a subset of nonconflicting clauses is learned. Learning as many nonconflicting clauses as possible corresponds to the maximum independent set problem, which is NP-hard. We approximate a solution greedily by adding the clause that causes the fewest conflicts with the unblocked clauses. When a clause is added, the clauses it blocks are removed from the batch and the conflict counts are recalculated. Alternatively, clauses can be added in a random order. Random ordering requires less computation at the cost of potentially fewer learned PR clauses.
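A greedy approximation along these lines might look as follows (a sketch; `conflict(i, j)` is a hypothetical predicate meaning that learning batch clause i invalidates the witness of clause j):

```python
def select_batch(n, conflict):
    """Greedily pick a nonconflicting subset of n batch clauses:
    repeatedly add the clause with the fewest conflicts among the
    remaining ones, then drop the clauses it blocks and recount."""
    remaining = list(range(n))
    chosen = []
    while remaining:
        best = min(remaining,
                   key=lambda i: sum(conflict(i, j) or conflict(j, i)
                                     for j in remaining if j != i))
        chosen.append(best)
        remaining = [j for j in remaining
                     if j != best and not (conflict(best, j) or conflict(j, best))]
    return chosen
```

With three clauses where only clauses 0 and 1 conflict, the conflict-free clause 2 is taken first, then one of the conflicting pair, and the other is dropped.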

The neighbors heuristic for constructing candidate clauses can be modified to include a depth parameter: neighbors(i) performs i iterations of expanding the variables. For example, neighbors(2) expands on the variables in neighbors(1), seen in Example 3. We also implement the reduced heuristic, shown in Example 5. Detailed evaluations and comparisons can be found in our repository. In general, we found that these additional configurations did not improve on our main configuration. More work is needed to determine when and how to apply them.

*Example 5.* Given the set of clauses (x1 ∨ x2 ∨ x3) ∧ (¬x1 ∨ x3 ∨ x4) ∧ (x3 ∨ x5) and the initial assignment α = x1, only the second clause is reduced and not satisfied, giving reduced(α) = {x3, x4} and candidate clauses (¬x1 ∨ x3), (¬x1 ∨ x4), (¬x1 ∨ ¬x3), (¬x1 ∨ ¬x4).

#### **5.4 Implementation**

PRELEARN was implemented using the inner/outer-solver framework in SADICAL. The inner solver acts the same as in SADICAL, solving pruning predicates with CDCL. The outer solver is not used for SDCL, but the SDCL data structures are used to find and learn PR clauses. The outer solver is initialized with the original formula and maintains the list of variables, clauses, and watch pointers. By default, the outer solver has no variables assigned other than learned units. When finding candidates, the variables in the partial clause are assigned in the outer solver. Unit propagation makes it possible to find all reduced clauses in the formula with a single pass, which is necessary for constructing the positive reduct. After a candidate clause has been assigned and the positive reduct solved, the variables are unassigned, returning the outer solver to the top level before the next candidate is examined. When a PR clause is learned, it is added to the formula along with its watch pointers. Additionally, a failed literal is found if assigning a variable at the top level causes a conflict through unit propagation; the negation of a failed literal is a unit clause that can be added to the formula.
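Failed-literal detection via unit propagation can be illustrated with a small counting-based sketch (a simplification: PRELEARN uses watch pointers, while this version rescans all clauses):

```python
def unit_propagate(clauses, assumptions):
    """Propagate units until fixpoint; return (assignment, conflict?)."""
    assignment = set(assumptions)
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(lit in assignment for lit in clause):
                continue                          # clause already satisfied
            unassigned = [lit for lit in clause if -lit not in assignment]
            if not unassigned:
                return assignment, True           # all literals falsified: conflict
            if len(unassigned) == 1:
                assignment.add(unassigned[0])     # unit clause: forced literal
                changed = True
    return assignment, False

def failed_literal(clauses, lit):
    """If assuming lit leads to a conflict, -lit can be added as a unit."""
    _, conflict = unit_propagate(clauses, {lit})
    return conflict
```

For example, under the clauses (¬x1 ∨ x2) ∧ (¬x1 ∨ ¬x2), assuming x1 propagates to a conflict, so x1 is a failed literal and ¬x1 may be added as a unit.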

In a single iteration, each variable in the formula is processed in a breadth-first search (BFS) starting from the first variable in the numbering. When a variable is encountered, we first check whether either assignment of the variable is a failed literal or yields a unit PR clause. If not, binary candidates are generated based on the selected heuristic, and PR clauses are learned based on the learning configuration. Variables are added to the frontier of the BFS as they are encountered during candidate clause generation, but no variable is processed twice. Optionally, after all variables have been encountered, the BFS restarts, now constructing ternary candidates; this repeats up to the desired clause length. Then another iteration begins again with binary clauses. Running PRELEARN for multiple iterations is important because adding PR clauses in one iteration may allow additional clauses to be learned in the next.

# **6 Mutilated Chessboard**

The *mutilated chessboard* is an N × N grid of alternating black and white squares with two opposite corners removed. The problem is whether or not the board can be covered with 2 × 1 dominoes. This can be encoded in CNF by using variables to represent domino placements on the board. At-most-one constraints (using the pairwise encoding) say that at most one domino covers each square, and at-least-one constraints (a disjunction) say that some domino must cover each square.

**Fig. 2.** Occurrences of two horizontal dominoes may be replaced by two vertical dominoes in a solution. Similarly, a horizontal domino atop two vertical dominoes can be replaced by shifting the horizontal domino down.

**Fig. 3.** Unit and binary PR clauses learned per execution (delimited by red dotted lines) for N = 20, until a contradiction was found. Markers on the binary PR lines represent iterations within an execution.
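The encoding can be sketched as follows (a minimal illustrative generator, not the benchmark generator used in the paper; variables are numbered from 1, one per possible domino placement):

```python
from itertools import combinations

def mutilated_board_cnf(n):
    """CNF asking whether an n x n board minus two opposite corners
    can be covered by 2 x 1 dominoes."""
    removed = {(0, 0), (n - 1, n - 1)}
    squares = {(r, c) for r in range(n) for c in range(n)} - removed
    placements = []                        # each placement covers two squares
    for (r, c) in sorted(squares):
        for other in ((r, c + 1), (r + 1, c)):      # horizontal, vertical
            if other in squares:
                placements.append(((r, c), other))
    var = {p: i + 1 for i, p in enumerate(placements)}
    covering = {sq: [var[p] for p in placements if sq in p] for sq in squares}
    clauses = []
    for sq in sorted(squares):
        clauses.append(covering[sq])                       # at-least-one
        clauses.extend([[-a, -b]                           # pairwise at-most-one
                        for a, b in combinations(covering[sq], 2)])
    return clauses, len(placements)
```

For N = 4 this produces 20 placement variables and one at-least-one clause per remaining square; the degenerate N = 2 board admits no placement at all, so the encoding immediately contains an empty clause.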

In recent SAT competitions, no proof-generating SAT solver could deal with instances larger than N = 18. In ongoing work, we found refutation proofs that contain only units and binary PR clauses for some boards of size N ≤ 30. PRELEARN can be modified to find proofs of this type automatically. Running iterations of PRELEARN until *saturation*, meaning no new binary PR clauses or units can be found, yields some set of units and binary PR clauses. Removing the binary PR clauses from the formula and rerunning PRELEARN yields additional units and a new set of binary PR clauses. Repeating this process of removing binary PR clauses while keeping units eventually derives the empty clause for this problem. Figure 3 gives detailed values for N = 20. Within each execution (delimited by red dotted lines) there are at most 10 iterations (red tick markers), and each iteration learns some set of binary PR clauses (red). Some executions saturate binary PR clauses before the tenth iteration and exit early. At the end of each execution the binary PR clauses are deleted, but the units (blue) are kept for the following execution. A complete DPR proof (PR with deletion) can be constructed by concatenating the PRELEARN proofs and adding deletion information for the binary PR clauses removed between executions. The approach works for the mutilated chessboard because in each execution there are many binary PR clauses that can be learned and will lead to units, but they are mutually exclusive and cannot be learned simultaneously. Further, adding units allows new binary PR clauses to be learned in the following executions.
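At its core, the multi-execution strategy is a simple outer loop (a control-flow sketch; `run_to_saturation` is a hypothetical stand-in for one PRELEARN execution, returning the new units, the binary PR clauses, and whether the empty clause was derived):

```python
def refute_by_executions(run_to_saturation, max_executions=50):
    """Keep units across executions, discard binary PR clauses,
    and repeat until a contradiction (empty clause) is found."""
    units = set()
    for k in range(1, max_executions + 1):
        new_units, binaries, contradiction = run_to_saturation(units)
        if contradiction:
            return k                  # refuted on the k-th execution
        if not new_units:
            return None               # saturated without progress: give up
        units |= new_units            # binaries are dropped here
    return None
```

The binary PR clauses learned in each execution serve only to derive new units; deleting them before the next execution is what allows a different, mutually exclusive set of binary PR clauses to be learned.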

Table 1 shows the statistics for PRELEARN. Achieving these results required some modifications to PRELEARN's configuration. First, notice in Figure 2 that the learnable PR clauses block one domino orientation that can be replaced by a symmetric orientation. To optimize for these types of PR clauses, we only constructed candidates whose first literal is negative. The neighbors heuristic had to be increased to a depth of 6, meaning more candidates were generated for each variable. Intuitively, the proof is constructed by adding binary PR clauses in order to find negative units (dominoes that cannot be placed) around the borders of the board. Subsequent iterations build more units inwards, until a point is reached where units cover almost the entire board. This forces an impossible domino placement, leading to a contradiction. Complete proofs using only units and binary PR clauses were found for boards up to size N = 24 within 5,000 seconds. We verified all proofs using DPR-TRIM. The mutilated chessboard has a high degree of symmetry and structure, making it well suited to this approach. For most problems we do not expect that multiple executions with kept units will find new PR clauses.

**Table 1.** Statistics for multiple executions of PRELEARN on the mutilated chessboard problem with the configurations described above. Total units include failed literals and learned PR units. The average units and average binary PR clauses learned during each execution (Exe.) are also shown.

Experiments were done with several configurations (see Section 5.3) to find the best results. We found that increasing the depth of neighbors was necessary for larger boards, including N = 24. Increasing the depth allows more binary PR clauses to be found, at the cost of generating more reducts; this is necessary to find units. The reduced heuristic (a subset of neighbors) did not yield complete proofs. We also tried incrementing the depth after each execution, starting at 1 and resetting at 9. In this approach, the execution times for depths greater than 6 were larger but did not yield more unit clauses on average. We attempted batch learning on every 500 found clauses using either random ordering or the sorted (fewest-conflicts) heuristic. In each batch many of the 500 PR clauses blocked each other, because many conflicting PR clauses can be found on a small set of variables in the mutilated chessboard. The blocked PR clauses would be found again in following iterations, leading to more reducts being generated and solved, and thus to much longer execution times. Adding PR clauses instantly is a good configuration for reducing execution time when there are many conflicting clauses. However, for some less symmetric problems it may be worth the tradeoff to learn the clauses in batches, because learning a few bad PR clauses may disrupt the subsequent iterations.

# **7 SAT Competition Benchmarks**

We evaluated PRELEARN on formulas from previous SAT competitions. Formulas from the main tracks of the 2013, 2015, 2016, 2019, 2020, and 2021 SAT competitions were grouped by size: **0-10k** contains the 323 formulas with fewer than 10,000 clauses, and **10k-50k** contains the 348 formulas with between 10,000 and 50,000 clauses. In general, short PR proofs have been found for hard combinatorial problems, which typically have few clauses (0-10k). These include the pigeonhole and mutilated chessboard problems, some of which appear in the 0-10k benchmarks. The PR clauses that can be derived for these formulas are intuitive and almost always beneficial to solvers. Less is known about the impact of PR clauses on larger formulas, motivating our separation of the test sets by size. The repository containing the preprocessing tool, experiment configurations, and experiment data can be found at https://github.com/jreeves3/PReLearn.

**Table 2.** Fraction of benchmarks where PR clauses were learned, average runtime of PRELEARN, generated positive reducts and satisfiable positive reducts (PR clauses learned), and number of failed literals found.

We ran our experiments on StarExec [30]; the specifications of the compute nodes can be found online.<sup>1</sup> The nodes that ran our experiments had Intel Xeon E5 cores at 2.4 GHz, and all experiments ran with 64 GB of memory and a 5,000-second timeout. We ran PRELEARN for up to 50 iterations within a 100-second limit, exiting early if no new PR clauses were found in an iteration.

PRELEARN was executed as a stand-alone program, producing a derivation proof and a modified CNF. For the experiments, the CDCL solver KISSAT [5] was called once on the original formula and once on the modified CNF. KISSAT was selected because of its high rankings in previous SAT competitions, but we expect the results to generalize to other CDCL SAT solvers.

Derivation proofs from PRELEARN were verified for all solved instances using a forward check in the independent proof checker DPR-TRIM. This can be extended to complete proofs in the following way. In the unsatisfiable case, the proof for the learned PR clauses is concatenated with the proof traced by KISSAT, and the complete proof is verified against the original formula. In the satisfiable case, the partial proof for the learned PR clauses is verified using a forward check in DPR-TRIM, and the satisfying assignment found by KISSAT is verified by the StarExec post-processing tool. Due to resource limitations, we verified only a subset of complete proofs in DPR-TRIM; this is more costly because it involves running KISSAT with proof logging and then running DPR-TRIM on the complete proof.

Table 2 shows the cumulative statistics for running PRELEARN on the benchmark sets. Note that the number of satisfiable reducts equals the number of learned PR clauses, because PR clauses are learned immediately after the reduct is solved; these include both unit and binary PR clauses. Only a very small percentage of the generated reducts is satisfiable and subsequently learned. This matters little for small formulas, where reducts can be computed quickly and there are fewer candidates to consider. However, for the 10k-50k formulas the average runtime more than triples while the number of generated reducts less than doubles. PR clauses are found in about two thirds of the formulas, showing that our approach generalizes beyond the canonical problems for which PR clauses were known to exist. Expanding the exploration and increasing the time limit did not help to find PR clauses in the remaining third.

<sup>1</sup> https://starexec.org/starexec/public/about.jsp

**Table 3.** Number of total solved instances and exclusively solved instances running KISSAT with and without PRELEARN, and number of improved instances running KISSAT with PRELEARN. PRELEARN execution times are included in the total execution times.

Table 3 gives a high-level picture of PRELEARN's impact on KISSAT. PRELEARN significantly improves performance on the 0-10k SAT and UNSAT benchmarks, which contain the hard combinatorial problems, including pigeonhole, that PRELEARN was expected to perform well on. There were 4 additional SAT formulas solved with PRELEARN that KISSAT alone could not solve. This shows that PRELEARN impacts not only hard unsatisfiable problems but satisfiable problems as well. On the other hand, the addition of PR clauses makes some problems more difficult. This is clear in the 10k-50k results, where 5 benchmarks are solved exclusively with PRELEARN and 7 are solved exclusively without. Additionally, PRELEARN improved KISSAT's performance on 102 of the 671 benchmarks, approximately 15%. This is a large portion of benchmarks, both SAT and UNSAT, for which PRELEARN is helpful.

Figure 4 gives a more detailed picture of the impact of PRELEARN per benchmark. In the scatter plot, the left-hand end of each line indicates the KISSAT execution time, while the length of the line indicates the PRELEARN execution time, so the right-hand end gives the total time for PRELEARN plus KISSAT. Lines that cross the diagonal indicate that preprocessing improved KISSAT's performance but ran for longer than the time it saved. PRELEARN improved performance for points above the diagonal. Points on the dotted lines (timeout) are solved by one configuration and not the other.

The top plot gives the results for the 0-10k formulas, with many points on the top timeout line, as expected. These are the hard combinatorial problems that can only be solved with PRELEARN. In general, the unsatisfiable formulas benefit more than the satisfiable ones. PR clauses can reduce the number of solutions of a formula, which may explain the negative impact on many satisfiable formulas. However, some satisfiable formulas are solved only with PRELEARN.

In the bottom plot, formulas that take a long time to solve (above the diagonal in the upper right-hand corner) are helped more by PRELEARN. In the bottom half of the plot, many lines cross the diagonal, meaning that the addition of PR clauses provided a negligible benefit. For this set there are more satisfiable formulas for which PRELEARN is helpful.

**Fig. 4.** Execution times with and without PRELEARN on the 0-10k (top) and 10k-50k (bottom) benchmarks. The left-hand point of each segment shows the time for the SAT solver alone; the right-hand point indicates the combined time for preprocessing and solving.


**Table 4.** Some formulas solved by KISSAT exclusively *with* PRELEARN (top) and some solved exclusively *without* PRELEARN (bottom). (\*) solved without KISSAT. Clauses include learned PR clauses and failed literals.

The results in Figure 4 are encouraging, with many formulas benefiting significantly from PRELEARN. PRELEARN improves performance on both SAT and UNSAT formulas of varying size and difficulty. In addition, lines that cross the diagonal imply that improving the runtime efficiency of PRELEARN alone would yield more improved instances. For future work, it would be beneficial to classify formulas before running PRELEARN: there may be general properties of a formula that signal when PRELEARN will be useful and when it will be harmful to a CDCL solver. For instance, a formula's community structure [2] may help focus the search on parts of the formula where PR clauses are beneficial.

#### **7.1 Benchmark Families**

In this section we analyze the benchmark families on which PRELEARN had the greatest positive (or negative) effect, shown in Table 4. Studying the formulas PRELEARN works well on may reveal better heuristics for finding good PR clauses.

It has been shown that PR works well for hard combinatorial problems based on perfect matchings [14,15]. The perfect matching benchmarks (randomG) [7] are a generalization of the pigeonhole (php) and mutilated chessboard problems with varying at-most-one encodings and edge densities. The binary PR clauses can be intuitively understood as blocking two edges from the perfect matching if there exist two other edges that match the same nodes. These benchmarks are relatively small but extremely hard for CDCL solvers. Symmetry breaking with PR clauses greatly reduces the search space and leads KISSAT to a short proof of unsatisfiability. PRELEARN also benefits other hard combinatorial problems that use pseudo-Boolean constraints. The pseudo-Boolean (Pb-chnl) [24] benchmarks are based on at-most-one constraints (using the pairwise encoding) and at-least-one constraints. These formulas have a graphical structure similar to the perfect matching benchmarks. Binary PR clauses block two edges when another set of edges exists that is incident to the same nodes.

For the other two benchmark families that benefited from PRELEARN, the intuition behind PR learning is less clear. The fixed-shape random formulas (fsf) [29] are parameterized non-clausal random formulas built from hyper-clauses. The SAT encoding makes use of the Plaisted-Greenbaum transformation, introducing circuit-like structure into the problem. The superpermutation problem (sp) [22] asks whether a sequence of digits 1–n of length l can contain every permutation of [1, n] as a subsequence; the optimization variant asks for the smallest such l given n. The sequence of l digits is encoded directly and passed through a multi-layered circuit that checks for the existence of each individual permutation. Digits use the binary (*bin*) or unary (*una*) encoding; encodings are strict (*stri*) if clauses constrain digit bits to valid encodings and nonstrict (*nons*) otherwise; the circuit is *flat* if it is a large AND, or *tree* for prefix-recognizing nested circuits. The given formulas ask for a prefix of a superpermutation for n = 5 of length 26 with 19 permutations. The check for 19 permutations was encoded as cardinality constraints in a pseudo-Boolean instance, then converted back to SAT. Each individual permutation is checked by duplicating circuits at each possible starting position of the permutation in the length-l sequence. PR clauses may be pruning certain starting positions for some permutations or affecting the pseudo-Boolean constraints; this cannot be determined without deeper knowledge of the benchmark generator.

The relativized pigeonhole problem (rphp) [3] involves placing k pigeons in k − 1 holes with n nesting places. This problem has polynomial hardness for resolution, unlike the exponential hardness of the classical pigeonhole problem. The symmetry-breaking preprocessor BREAKID [9] generates symmetry-breaking formulas for rphp that are easy for a CDCL solver. PRELEARN can learn many PR clauses, but the formula does not become easier. Note that PRELEARN can solve php with n = 12 in a second.

One problem is clause and variable permutation (a.k.a. shuffling). The mutilated chessboard problem can still be solved by PRELEARN after permuting variables and clauses. The pigeonhole problem can be solved after permuting clauses but not after permuting variable names. In PRELEARN, PR candidates are sorted by variable name independently of the clause ordering, but when variable names change, the order of learned clauses changes. The mutilated chessboard problem has local structure, so similar PR clauses are learned under variable renaming. The pigeonhole problem has global structure, so a variable renaming can significantly change the binary PR clauses learned and cause earlier saturation with far fewer units.

Another problem is that the addition of PR clauses can change the existing structure of a formula and negatively affect CDCL heuristics. The Pythagorean Triples Problem (Ptn) [19] asks whether monochromatic solutions of the equation a<sup>2</sup> + b<sup>2</sup> = c<sup>2</sup> can be avoided. The formulas encode the numbers {1,..., 7824}, for which a valid 2-coloring is possible. In the benchmark names, the *N* in b*N* denotes the number of backbone literals added to the formula; a backbone literal is a literal assigned true in every solution. Adding more than 20 backbone literals makes the problem easy. For each formula, KISSAT can find a satisfying assignment but times out with the addition of PR clauses; for one instance, adding only 39 PR clauses leads to a timeout. On some hard SAT and UNSAT problems, solvers require some amount of luck, and adding a few clauses or shuffling a formula can cause a CDCL solver's performance to drop sharply. The Pythagorean Triples problem was originally solved with a local search solver, and local search still performs well after adding PR clauses.

One straightforward way to avoid the negative effects of harmful PR clauses is to run two solvers in parallel: one with PRELEARN and one without. This fits the portfolio approach to solving SAT problems.

# **8 Conclusion and Future Work**

In this paper we presented PRELEARN, a tool built from the SADICAL framework that learns PR clauses in a preprocessing stage. We developed several heuristics for finding PR clauses and multiple configurations for clause learning. In the evaluation we found that PRELEARN improves the performance of the CDCL solver KISSAT on many benchmarks from past SAT competitions.

For future work, quantifying the usefulness of each PR clause in guiding the CDCL solver may lead to better learning heuristics. This is a difficult task that likely requires problem-specific information. Separately, failed-clause caching can improve performance by remembering and avoiding candidate clauses that fail with unsatisfiable reducts across multiple iterations. This would be most beneficial for problems, like the mutilated chessboard, that have many conflicting PR clauses. Lastly, incorporating PRELEARN during in-processing may allow more PR clauses to be learned. This could be implemented with the inner/outer-solver framework but would require a significantly narrowed search: CDCL learns many clauses during execution, and it would be infeasible to examine binary PR clauses across the entire formula.

**Acknowledgements.** We thank the community at StarExec for providing computational resources.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Reasoning About Vectors Using an SMT Theory of Sequences**

Ying Sheng<sup>1(B)</sup>, Andres Nötzli<sup>1</sup>, Andrew Reynolds<sup>2</sup>, Yoni Zohar<sup>3</sup>, David Dill<sup>4</sup>, Wolfgang Grieskamp<sup>4</sup>, Junkil Park<sup>4</sup>, Shaz Qadeer<sup>4</sup>, Clark Barrett<sup>1</sup>, and Cesare Tinelli<sup>2</sup>

> <sup>1</sup> Stanford University, Stanford, USA ying1123@stanford.edu
> <sup>2</sup> The University of Iowa, Iowa City, USA
> <sup>3</sup> Bar-Ilan University, Ramat Gan, Israel
> <sup>4</sup> Meta Novi, Menlo Park, USA

**Abstract.** Dynamic arrays, also referred to as vectors, are fundamental data structures used in many programs. Modeling their semantics efficiently is crucial when reasoning about such programs. The theory of arrays is widely supported but is not ideal, because the number of elements is fixed (determined by its index sort) and cannot be adjusted, which is a problem, given that the length of vectors often plays an important role when reasoning about vector programs. In this paper, we propose reasoning about vectors using a theory of sequences. We introduce the theory, propose a basic calculus adapted from one for the theory of strings, and extend it to efficiently handle common vector operations. We prove that our calculus is sound and show how to construct a model when it terminates with a saturated configuration. Finally, we describe an implementation of the calculus in cvc5 and demonstrate its efficacy by evaluating it on verification conditions for smart contracts and benchmarks derived from existing array benchmarks.

# **1 Introduction**

Generic vectors are used in many programming languages. For example, in C++'s standard library, they are provided by std::vector. Automated verification of software systems that manipulate vectors requires an efficient and automated way of reasoning about them. Desirable characteristics of any approach for reasoning about vectors include: (i) expressiveness—operations that are commonly performed on vectors should be supported; (ii) generality—vectors are always "vectors of" some type (e.g., vectors of integers), and so it is desirable that vector reasoning be integrated within a more general framework; solvers for satisfiability modulo theories (SMT) provide such a framework and are widely used in verification tools (see [5] for a recent survey); and (iii) efficiency—fast and efficient reasoning is essential for usability, especially as verification tools are increasingly used by non-experts and in continuous integration.

This work was funded in part by the Stanford Center for Blockchain Research, NSF-BSF grant numbers 2110397 (NSF) and 2020704 (BSF), and Meta Novi. Part of the work was done when the first author was an intern at Meta Novi.

Despite the ubiquity of vectors in software on the one hand and the effectiveness of SMT solvers for software verification on the other hand, there is not currently a clean way to represent vectors using operators from the SMT-LIB standard [3]. While the theory of arrays can be used, it is not a great fit because arrays have a fixed size determined by their index type. Representing a dynamic array thus requires additional modeling work. Moreover, to reach an acceptable level of expressivity, quantifiers are needed, which often makes the reasoning engine less efficient and robust. Indeed, part of the motivation for this work was frustration with array-based modeling in the Move Prover, a verification framework for smart contracts [24] (see Sect. 6 for more information about the Move Prover and its use of vectors). The current paper bridges this gap by studying and implementing a native theory of *sequences* in the SMT framework, which satisfies the desirable properties for vector reasoning listed above.

We present two SMT-based calculi for determining satisfiability in the theory of sequences. Since the decidability of even weaker theories is unknown (see, e.g., [9,15]), we do not aim for a decision procedure. Rather, we prove model and solution soundness (that is, when our procedure terminates, the answer is correct). Our first calculus leverages techniques for the theory of strings. We generalize these techniques, lifting rules specific to string characters to more general rules for arbitrary element types. By itself, this base calculus is already quite effective. However, it misses opportunities to perform high-level vector-based reasoning. For example, both reading from and updating a vector are very common operations in programming, and reasoning efficiently about the corresponding sequence operators is thus crucial. Our second calculus addresses this gap by integrating reasoning methods from array solvers (which handle reads and updates efficiently) into the first procedure. Notice, however, that this integration is not trivial, as it must handle novel combinations of operators (such as the combination of update and read operators with concatenation) as well as out-of-bounds cases that do not occur with ordinary arrays. We have implemented both variants of our calculus in the cvc5 SMT solver [2] and evaluated them on benchmarks originating from the Move Prover, as well as benchmarks translated from SMT-LIB array benchmarks.

As is typical, both of our calculi are agnostic to the sort of the elements in the sequence. Reasoning about sequences of elements from a particular theory can then be done via theory combination methods such as Nelson-Oppen [18] or polite combination [16,20]. The former can be done for stably infinite theories (and the theory of sequences that we present here is stably infinite), while the latter requires investigating the politeness of the theory, which we expect to do in future work.

The rest of the paper is organized as follows. Section 2 includes basic notions from first-order logic. Section 3 introduces the theory of sequences and shows how it can be used to model vectors. Section 4 presents calculi for this theory and discusses their correctness. Section 5 describes the implementation of these calculi in cvc5. Section 6 presents an evaluation comparing several variations of the sequence solver in cvc5 and Z3. We conclude in Sect. 7 with directions for further research.

**Related Work:** Our work crucially builds on a proposal by Bjørner et al. [8], but extends it in several key ways. First, their implementation (for a logic they call QF\_BVRE) restricts the generality of the theory by allowing only bit-vector elements (representing characters) and assuming that sequences are bounded. In contrast, our calculus maintains full generality, allowing unbounded sequences and elements of arbitrary types. Second, while our core calculus focuses only on a subset of the operators in [8], our implementation supports the remaining operators by reducing them to the core operators, and also adds native support for the update operator, which is not included in [8].

The base calculus that we present for sequences builds on similar work for the theory of strings [6,17]. We extend our base calculus to support array-like reasoning based on the weak-equivalence approach [10]. Though there exists some prior work on extending the theory of arrays with more operators and reasoning about length [1,12,14], this work does not include support for most of the sequence operators we consider here.

The SMT solver Z3 [11] also provides a solver for sequences. However, its documentation is limited [7], it does not support update directly, and its internal algorithms are not described in the literature. Furthermore, as we show in Sect. 6, the performance of the Z3 implementation is generally inferior to that of our implementation in cvc5.

#### **2 Preliminaries**

We assume the usual notions and terminology of many-sorted first-order logic with equality (see, e.g., [13] for a complete presentation). We consider many-sorted signatures Σ, each containing a set of sort symbols (including a Boolean sort Bool), a family of logical symbols ≈ for equality, with sort σ × σ → Bool for all sorts σ in Σ and interpreted as the identity relation, and a set of interpreted (and sorted) function symbols. We assume the usual definitions of well-sorted terms, literals, and formulas as terms of sort Bool. A literal is *flat* if it has the form ⊥, p(x1, ..., xn), ¬p(x1, ..., xn), x ≈ y, x ≉ y, or x ≈ f(x1, ..., xn), where p and f are function symbols and x, y, and x1, ..., xn are variables. A Σ-interpretation M is defined as usual: it satisfies M(⊥) = false and assigns a set M(σ) to every sort σ of Σ, a function M(f) : M(σ1) × ... × M(σn) → M(σ) to every function symbol f of Σ with arity σ1 × ... × σn → σ, and an element M(x) ∈ M(σ) to every variable x of sort σ. The satisfaction relation between interpretations and formulas is defined as usual and is denoted by ⊨.

A theory is a pair T = (Σ, **I**), in which Σ is a signature and **I** is a class of Σ-interpretations, closed under variable reassignment. The models of T are the interpretations in **I** without any variable assignments. A Σ-formula ϕ is satisfiable (resp., unsatisfiable) in T if it is satisfied by some (resp., no) interpretation in **I**. Given a (set of) terms S, we write T(S) to denote the set of all subterms of S. For a theory T = (Σ, **I**), a set S of Σ-formulas, and a Σ-formula ϕ, we write S ⊨T ϕ if every interpretation M ∈ **I** that satisfies S also satisfies ϕ. By convention and unless otherwise stated, we use letters w, x, y, z to denote variables and s, t, u, v to denote terms.

The theory TLIA = (ΣLIA, **I**TLIA) of *linear integer arithmetic* is based on the signature ΣLIA that includes a single sort Int, all natural numbers as constant symbols, the unary − symbol, the binary + symbol, and the binary ≤ relation. When k ∈ ℕ, we use the notation k · x, inductively defined by 0 · x = 0 and (m + 1) · x = x + m · x. In turn, **I**TLIA consists of all structures M for ΣLIA in which the domain M(Int) of Int is the set


**Fig. 1.** Signature for the theory of sequences.

of integer numbers, for every constant symbol n ∈ ℕ, M(n) = n, and +, −, and ≤ are interpreted as usual. We use standard notation for integer intervals (e.g., [a, b] for the set of integers i with a ≤ i ≤ b, and [a, b) for the set with a ≤ i < b).

### **3 A Theory of Sequences**

We define the theory TSeq of sequences. Its signature ΣSeq is given in Fig. 1. It includes the sorts Seq, Elem, Int, and Bool, intuitively denoting sequences, elements, integers, and Booleans, respectively. The first four lines include symbols of ΣLIA. We write t1 ⊙ t2, with ⊙ ∈ {>, <, ≥}, as syntactic sugar for the equivalent literal expressed using ≤ (and possibly negation). The sequence symbols are given on the remaining lines. Their arities are also given in Fig. 1. Notice that ++ is a variadic function symbol.

Interpretations M of TSeq interpret: Int as the set of integers; Elem as some set; Seq as the set of finite sequences whose elements are from Elem; ε as the empty sequence; unit as a function that takes an element from M(Elem) and returns the sequence that contains only that element; nth as a function that takes a sequence s from M(Seq) and an integer i and returns the ith element of s, in case i is non-negative and smaller than the length of s (we take the first element of a sequence to have index 0); otherwise, the function has no restrictions; update as a function that takes a sequence s from M(Seq), an integer i, and an element a from M(Elem) and returns the sequence obtained from s by replacing its ith element by a, in case i is non-negative and smaller than the length of s; otherwise, the returned value is s itself; extract as a function that takes a sequence s and integers i and j, and returns the maximal sub-sequence of s that starts at index i and has length at most j, in case both i and j are non-negative and i is smaller than the length of s; otherwise, the returned value is the empty sequence;1 | | as a function that takes a sequence and returns its length; and ++ as a function that takes some number of sequences (at least 2) and returns their concatenation.
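To make these conventions concrete, the operator semantics above can be sketched as a reference model over Python lists (an illustration of the intended interpretations, not part of the formal development; returning None for an out-of-bounds nth is an arbitrary choice standing in for the theory's unconstrained value):

```python
# Reference model of the TSeq operators over Python lists.
# update and extract follow the fixed out-of-bounds conventions of the
# theory; nth is left unconstrained out of bounds, which we mark by
# arbitrarily returning None.

def unit(a):
    return [a]

def nth(s, i):
    if 0 <= i < len(s):
        return s[i]
    return None  # theory leaves this value unconstrained

def update(s, i, a):
    if 0 <= i < len(s):
        return s[:i] + [a] + s[i + 1:]
    return s  # out of bounds: the sequence is unchanged

def extract(s, i, j):
    if i >= 0 and j >= 0 and i < len(s):
        return s[i:i + j]  # maximal sub-sequence of length at most j
    return []  # out of bounds: the empty sequence

def concat(*seqs):
    out = []
    for t in seqs:
        out += t
    return out
```

Indexing starts at 0, matching the convention stated above.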

<sup>1</sup> In [8], the second argument *j* denotes the end index, while here it denotes the length of the sub-sequence, in order to be consistent with the theory of strings in the SMT-LIB standard.

Notice that the interpretations of Elem and nth are not completely fixed by the theory: Elem can be set arbitrarily, and nth is only defined by the theory for some values of its second argument. For the rest, it can be set arbitrarily.

#### **3.1 Vectors as Sequences**

We show the applicability of TSeq by using it for a simple verification task. Consider the C++ function swap at the top of Fig. 2. This function swaps two elements in a vector. The comments above the function include a partial specification for it: if both indices are in bounds and the indexed elements are equal, then the function should not change the vector (this is expressed by s\_out==s). We now consider how to encode the verification condition induced by the code and the specification. The function variables a, b, i, and j can be encoded as variables of sort Int with the same names. We include two copies of s: s for its value at the beginning, and s*out* for its value at the end. But what should be the sorts of s and s*out*? In Fig. 2 we consider two options: one based on arrays and the other on sequences.

*Example 1 (Arrays).* The theory of arrays includes three sorts: index, element (in this case, both are Int), and an array sort Arr, as well as two operators: x[i], interpreted as the ith element of x; and x[i ← a], interpreted as the array obtained from x by setting the element at index i to a. We declare s and s*out* as variables of an uninterpreted sort V and declare two functions ℓ and **c**, which, given v of sort V, return its length (of sort Int) and content (of sort Arr), respectively.2

Next, we introduce functions to model vector operations: «**<sup>A</sup>** for comparing vectors, nth**<sup>A</sup>** for reading from them, and update**<sup>A</sup>** for updating them. These functions need to be axiomatized. We include two axioms (bottom of Fig. 2): Ax<sup>1</sup> states that two vectors are equal iff they have the same length and the same contents. Ax<sup>2</sup> axiomatizes the update operator; the result has the same length, and if the updated index is in bounds, then the corresponding element is updated. These axioms are not meant to be complete, but are rather just strong enough for the example.

The first two lines of the swap function are encoded as equalities using nth**A**, and the last two lines are combined into one nested constraint that involves update**A**. The precondition of the specification is naturally modeled using nth**A**, and the post-condition is negated, so that the unsatisfiability of the formula entails the correctness of the function w.r.t. the specification. Indeed, the conjunction of all formulas in this encoding is unsatisfiable in the combined theories of arrays, integers, and uninterpreted functions.

The above encoding has two main shortcomings: It introduces auxiliary symbols, and it uses quantifiers, thus reducing clarity and efficiency. In the next example, we see how using the theory of sequences allows for a much more natural and succinct encoding.

*Example 2 (Sequences).* In the sequences encoding, s and s*out* have sort Seq. No auxiliary sorts or functions are needed, as the theory symbols can be used directly. Further,

<sup>2</sup> It is possible to obtain a similar encoding using the theory of datatypes; however, here we use uninterpreted functions which are simpler and better supported by SMT solvers.


#### **Fig. 2.** An example using *T*Seq.

these symbols do not need to be axiomatized, as their semantics is fixed by the theory. The resulting formula, much shorter than in Example 1 and with no quantifiers, is unsatisfiable in TSeq.
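Although the paper discharges this verification condition with an SMT solver, the property can also be sanity-checked by executing the swap body under the sequence semantics. The Python sketch below is our own illustration (with update's out-of-bounds convention built in) and tests the specification on a few concrete inputs:

```python
# Concrete sanity check of the swap specification: if i and j are in
# bounds and s[i] == s[j], then swapping leaves the sequence unchanged.

def update(s, i, a):
    # update with the theory's out-of-bounds convention
    if 0 <= i < len(s):
        return s[:i] + [a] + s[i + 1:]
    return s

def swap(s, i, j):
    # mirrors the C++ body: a = s[i]; b = s[j]; s[i] = b; s[j] = a
    a = s[i]
    b = s[j]
    return update(update(s, i, b), j, a)

def spec_holds(s, i, j):
    if 0 <= i < len(s) and 0 <= j < len(s) and s[i] == s[j]:
        return swap(s, i, j) == s
    return True  # precondition fails: nothing to check

assert all(spec_holds(s, i, j)
           for s in ([], [1], [2, 2], [1, 2, 1], [3, 1, 3, 1])
           for i in range(-1, 5)
           for j in range(-1, 5))
```

Such testing only checks finitely many instances, of course; the SMT encoding establishes the property for all inputs.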

## **4 Calculi**

After introducing some definitions and assumptions, we describe a basic calculus for the theory of sequences, which adapts techniques from previous procedures for the theory of strings. In particular, the basic calculus reduces the operators nth and update by introducing concatenation terms. We then show how to extend the basic calculus by introducing additional rules inspired by solvers for the theory of arrays; the modified calculus can often reason about nth and update terms directly, avoiding the introduction of concatenation terms (which are typically expensive to reason about).

Given a vector of sequence terms *t* = (t1, ..., tn), we use ++*t* to denote the term corresponding to the concatenation of t1, ..., tn. If n = 0, ++*t* denotes ε; if n = 1, ++*t* denotes t1; otherwise (when n > 1), ++*t* denotes a concatenation term having n children. In our calculi, we distinguish between sequence and arithmetic constraints.

**Definition 1.** *A* ΣSeq*-formula* ϕ *is a* sequence constraint *if it has the form* s ≈ t *or* s ≉ t*; it is an* arithmetic constraint *if it has the form* s ≈ t*,* s ≥ t*,* s ≉ t*, or* s < t*, where* s, t *are terms of sort* Int*, or if it is a disjunction* c1 ∨ c2 *of two arithmetic constraints.*

Notice that sequence constraints do not have to contain sequence terms (e.g., x « y where x, y are Elem-variables). Also, equalities and disequalities between terms of sort Int are both sequence and arithmetic constraints. In this paper we focus on sequence

**Fig. 3.** Rewrite rules for the reduced form *t*↓ of a term *t*, obtained from *t* by applying these rules to completion.

constraints and arithmetic constraints. This is justified by the following lemma. (Proofs of this lemma and later results can be found in an extended version of this paper [23].)

**Lemma 1.** *For every quantifier-free* ΣSeq*-formula* ϕ*, there are sets* S1, ..., Sn *of sequence constraints and sets* A1, ..., An *of arithmetic constraints such that* ϕ *is* TSeq*-satisfiable iff* Si ∪ Ai *is* TSeq*-satisfiable for some* i ∈ [1, n]*.*

Throughout the presentation of the calculi, we will make a few simplifying assumptions.

**Assumption 1.** *Whenever we refer to a set* S *of sequence constraints, we assume:*


*3. all literals in* S *are flat.*

*Whenever we refer to a set of arithmetic constraints, we assume all its literals are flat.*

These assumptions are without loss of generality, as any set can easily be transformed into an equisatisfiable set satisfying the assumptions by the addition of fresh variables and equalities. Note that some rules below introduce non-flat literals. In such cases, we assume that similar transformations are done immediately after applying the rule to maintain the invariant that all literals in S ∪ A are flat. Rules may also introduce fresh variables k of sort Seq. We further assume that in such cases, a corresponding constraint ℓk ≈ |k| is added to S, where ℓk is a fresh integer variable.

**Definition 2.** *Let* C *be a set of constraints. We write* C ⊨ ϕ *to denote that* C *entails formula* ϕ *in the empty theory, and write* =C *to denote the binary relation over* T(C) *such that* s =C t *iff* C ⊨ s ≈ t*.*

**Lemma 2.** *For every set* S *of sequence constraints,* =S *is an equivalence relation; furthermore, every equivalence class of* =S *contains at least one variable.*

We denote the equivalence class of a term s according to =S by [s]=S and drop the =S subscript when it is clear from the context.
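Entailments of the form S ⊨ s ≈ t over flat constraints can be decided by congruence closure; restricted to variable-to-variable equalities, this degenerates to plain union-find, as in the following minimal sketch (full congruence closure would additionally propagate equalities through function terms):

```python
# Minimal union-find sketch for the equivalence relation =S induced by
# a set of variable equalities. Each equivalence class is identified by
# its root; full congruence closure would also merge classes of terms
# with equal arguments.

class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, x, y):
        self.parent[self.find(x)] = self.find(y)

    def equal(self, x, y):
        return self.find(x) == self.find(y)

uf = UnionFind()
for lhs, rhs in [("x", "y"), ("y", "z"), ("u", "v")]:
    uf.union(lhs, rhs)
```

Here the classes are {x, y, z} and {u, v}; each contains a variable, as Lemma 2 guarantees in general.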

In the presentation of the calculus, it will often be useful to normalize terms to what will be called a *reduced form*.

**Definition 3.** *Let* t *be a* ΣSeq*-term. The* reduced form *of* t*, denoted by* t↓*, is the term obtained by applying the rewrite rules listed in Fig. 3 to completion.*

Observe that t↓ is well defined because the given rewrite rules form a terminating rewrite system. This can be seen by noting that each rule either reduces the number of applications of sequence operators in the left-hand-side term or keeps that number the same but reduces the size of the term. It is not difficult to show that ⊨TSeq t ≈ t↓.
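The rewrite rules of Fig. 3 are not reproduced here, but the apply-to-completion mechanism can be illustrated on an assumed fragment for length terms (|ε| → 0, |unit(x)| → 1, |s ++ t| → |s| + |t|; these particular rules are our illustration of the style, not a transcription of the figure):

```python
# Fixpoint application of a small *assumed* fragment of rewrite rules
# for length terms: |eps| -> 0, |unit(x)| -> 1, |s ++ t| -> |s| + |t|.
# Terms are tuples ("len", s), ("++", s, t), ("unit", x), ("+", a, b),
# the constant "eps", integers, or variable names.

def reduce_term(t):
    changed = True
    while changed:          # apply rules to completion
        t, changed = step(t)
    return t

def step(t):
    if isinstance(t, tuple):
        if t[0] == "len":
            s = t[1]
            if s == "eps":
                return 0, True
            if isinstance(s, tuple) and s[0] == "unit":
                return 1, True
            if isinstance(s, tuple) and s[0] == "++":
                return ("+", ("len", s[1]), ("len", s[2])), True
        # no rule applies at the root: recurse into subterms
        new, changed = [], False
        for sub in t[1:]:
            r, c = step(sub)
            new.append(r)
            changed = changed or c
        return (t[0], *new), changed
    return t, False
```

Each rule either removes a sequence-operator application or pushes |·| inward onto smaller terms, so the loop terminates, mirroring the termination argument above.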

We now introduce some basic definitions related to concatenation terms.

**Definition 4.** *A* concatenation term *is a term of the form* s1 ++ ··· ++ sn *with* n ≥ 2*. If each* si *is a variable, it is a* variable concatenation term*. For a set* S *of sequence constraints, a variable concatenation term* x1 ++ ··· ++ xn *is* singular *in* S *if* S ⊨ xi ≉ ε *for at most one variable* xi *with* i ∈ [1, n]*. A sequence variable* x *is* atomic in S *if* S ⊭ x ≈ ε *and for all variable concatenation terms* s ∈ T(S) *such that* S ⊨ x ≈ s*,* s *is singular in* S*.*

We lift the concept of atomic variables to atomic representatives of equivalence classes.

**Definition 5.** *Let* S *be a set of sequence constraints. Assume a choice function* α : T(S)/=S → T(S) *that chooses a variable from each equivalence class of* =S*. A sequence variable* x *is an* atomic representative in S *if it is atomic in* S *and* x = α([x]=S)*.*

Finally, we introduce a relation that is the foundation for reasoning about concatenations.

**Definition 6.** *Let* S *be a set of sequence constraints. We inductively define a relation* S ⊨++ x ≈ s*, where* x *is a sequence variable in* S *and* s *is a sequence term whose variables are in* T(S)*, as follows:*


*Let* α *be a choice function for* S *as defined in Definition 5. We additionally define the entailment relation* S ⊨∗++ x ≈ ++*y, where *y* is of length* n ≥ 0*, to hold if each element of *y* is an atomic representative in* S *and there exists *z* of length* n *such that* S ⊨++ x ≈ ++*z* and* S ⊨ yi ≈ zi *for* i ∈ [1, n]*.*

In other words, S ⊨∗++ x ≈ t holds when t is a concatenation of atomic representatives that is entailed to be equal to x by S. In practice, t is determined by recursively expanding concatenations using equalities in S until a fixpoint is reached.
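This fixpoint expansion can be sketched as follows, under simplifying assumptions: each sequence variable has at most one defining concatenation of variables, empty components have already been dropped, and rep plays the role of the choice function α. On the data of Example 3 it produces the expected normal form:

```python
# Sketch of computing the concatenation normal form of a variable by
# recursively expanding defining equalities x = y1 ++ ... ++ yn until a
# fixpoint; rep() plays the role of the choice function alpha.

def normal_form(x, defs, rep):
    # defs: variable -> list of variables of its defining concatenation
    # rep: variable -> chosen representative of its equivalence class
    out = []
    stack = [x]
    while stack:
        v = stack.pop()
        if v in defs:                      # expand the concatenation
            stack.extend(reversed(defs[v]))
        else:                              # atomic: emit representative
            out.append(rep.get(v, v))
    return out

# Data of Example 3: x = y ++ z, y = w ++ u, and alpha picks v for {u, v}.
defs = {"x": ["y", "z"], "y": ["w", "u"]}
rep = {"u": "v"}
```

Termination relies on the absence of cycles among the defining equalities; cyclic constraints are exactly the situation the solver's cycle detection (Sect. 5) has to handle.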

*Example 3.* Suppose S = {x ≈ y ++ z, y ≈ w ++ u, u ≈ v} (we omit the additional constraints required by Assumption 1, part 2, for brevity). It is easy to see that u, v, w, and z are atomic in S, but x and y are not. Furthermore, w and z (and one of u or v) must also be atomic representatives. Clearly, S ⊨++ x ≈ x and S ⊨ x ≈ y ++ z. Moreover, y ++ z is a variable concatenation term that is not singular in S. Hence, we have S ⊨++ x ≈ (y ++ z)↓, and so S ⊨++ x ≈ y ++ z (by using either Item 2 or Item 3 of Definition 6, as in fact x ≈ y ++ z ∈ S). Now, since S ⊨++ x ≈ y ++ z, S ⊨ y ≈ w ++ u, and w ++ u is a variable concatenation term not singular in S, we get that S ⊨++ x ≈ ((w ++ u) ++ z)↓, and so S ⊨++ x ≈ w ++ u ++ z. Now, assume that v = α([v]=S) = α({v, u}). Then, S ⊨∗++ x ≈ w ++ v ++ z.

Our calculi can be understood as modeling abstractly a cooperation between an *arithmetic subsolver* and a *sequence subsolver*. Many of the derivation rules lift those in the string calculus of Liang et al. [17] to sequences of elements of an arbitrary type. We describe them similarly as rules that modify *configurations*.

**Definition 7.** *A* configuration *is either the distinguished configuration* unsat *or a pair* (S, A) *of a set* S *of sequence constraints and a set* A *of arithmetic constraints.*

The rules are given in guarded assignment form, where the rule premises describe the conditions on the current configuration under which the rule can be applied, and the conclusion is either unsat or otherwise describes the resulting modifications to the configuration. A rule may have multiple conclusions, separated by ‖. In the rules, some of the premises have the form S ⊨ s ≈ t (see Definition 2). Such entailments can be checked with standard algorithms for congruence closure. Similarly, premises of the form A ⊨LIA s ≈ t can be checked by solvers for linear integer arithmetic.

An application of a rule is redundant if it has a conclusion where each component in the derived configuration is a subset of the corresponding component in the premise configuration. We assume that for rules that introduce fresh variables, the introduced variables are identical whenever the premises triggering the rule are the same (i.e., we cannot generate an infinite sequence of rule applications by continuously using the same premises to introduce fresh variables).3 A configuration other than unsat is saturated with respect to a set R of derivation rules if every possible application of a rule in R to it is redundant. A derivation tree is a tree where each node is a configuration whose children, if any, are obtained by a non-redundant application of a rule of the calculus. A derivation tree is closed if all of its leaves are unsat. As we show later, a closed derivation tree with root node (S, A) is a proof that S ∪ A is unsatisfiable in TSeq. In contrast, a derivation tree with root node (S, A) and a leaf that is saturated with respect to all the rules of the calculus is a witness that S ∪ A is satisfiable in TSeq.

#### **4.1 Basic Calculus**

#### **Definition 8.** *The calculus* BASE *consists of the derivation rules in Figs. 4 and 5.*

Some of the rules are adapted from previous work on string solvers [17,22]. Compared to that work, our presentation of the rules is noticeably simpler, due to our use of the relation ⊨∗++ from Definition 6. In particular, our configurations consist only of pairs of sets of formulas, without any auxiliary data structures.

Note that judgments of the form S ⊨∗++ x ≈ t are used in premises of the calculus. It is possible to compute whether such a premise holds thanks to the following lemma.

**Lemma 3.** *Let* S *be a set of sequence constraints and* A *a set of arithmetic constraints. If* (S, A) *is saturated w.r.t.* S-Prop*,* L-Intro*, and* L-Valid*, the problem of determining whether* S ⊨∗++ x ≈ s *for given* x *and* s *is decidable.*

Lemma 3 assumes saturation with respect to certain rules. Accordingly, our proof strategy, described in Sect. 5, will ensure such saturation before attempting to apply rules relying on ⊨∗++. The relation ⊨∗++ induces a normal form for each equivalence class of =S.

<sup>3</sup> In practice, this is implemented by associating each introduced variable with a *witness term* as described in [21].


**Fig. 4.** Core derivation rules. The rules use *k* and *i* to denote fresh variables of sequence and integer sort, respectively, and *w*<sup>1</sup> and *w*<sup>2</sup> for fresh element variables.

**Lemma 4.** *Let* S *be a set of sequence constraints and* A *a set of arithmetic constraints. Suppose* (S, A) *is saturated w.r.t.* A-Conf*,* S-Prop*,* L-Intro*,* L-Valid*, and* C-Split*. Then, for every equivalence class* e *of* =S *whose terms are of sort* Seq*, there exists a unique (possibly empty) *s* such that whenever* S ⊨∗++ x ≈ ++*s*′ *for* x ∈ e*, then *s*′ = *s*. In this case, we call *s* the* normal form *of* e *(and of* x*).*

We now turn to the description of the rules in Fig. 4, which form the core of the calculus. For greater clarity, some of the conclusions of the rules include terms before they are flattened. First, either subsolver can report that the current set of constraints is unsatisfiable by using the rules A-Conf or S-Conf. For the former, the entailment ⊨LIA (which abbreviates ⊨TLIA) can be checked by a standard procedure for linear integer arithmetic; the latter corresponds to a situation where congruence closure detects a conflict between an equality and a disequality. The rules A-Prop, S-Prop, and S-A correspond to a form of Nelson-Oppen-style theory combination between the two subsolvers. The first two communicate equalities between the subsolvers, while the third guesses arrangements for shared variables of sort Int. L-Intro ensures that the length term |s| for each sequence term s is equal to its reduced form (|s|)↓. L-Valid restricts sequence lengths to be non-negative, splitting on whether each sequence is empty or has a length greater than 0. The unit operator is injective, which is captured by U-Eq. C-Eq concludes that two sequence terms are equal if they have the same normal form. If two sequence variables have different normal forms, then C-Split takes the first differing components y and y′ from the two normal forms and splits on their length relationship. Note that C-Split is the source of non-termination of the calculus (see, e.g., [17,22]).


**Fig. 5.** Reduction rules for extract, nth, and update. The rules use *k*, *k*′, and *k*″ to denote fresh sequence variables. We write *s* ≈ min(*t*, *u*) as an abbreviation for *s* ≈ *t* ∨ *s* ≈ *u*, *s* ≤ *t*, *s* ≤ *u*.

Finally, Deq-Ext handles disequalities between sequences x and y by either asserting that their lengths are different or by choosing an index i at which they differ.

Figure 5 includes a set of reduction rules for handling operators that are not directly handled by the core rules. These reduction rules capture the semantics of these operators by reduction to concatenation. R-Extract splits into two cases: either the extraction uses an out-of-bounds index or a non-positive length, in which case the result is the empty sequence, or the original sequence can be described as a concatenation that includes the extracted sub-sequence. R-Nth creates an equation between y and a concatenation term with unit(x) as one of its components, as long as i is not out of bounds. R-Update considers two cases. If i is out of bounds, then the update term is equal to y. Otherwise, y is equal to a concatenation, with the middle component (k′) representing the part of y that is updated. In the update term, k′ is replaced by unit(z).
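The in-bounds branch of R-Update can be validated against the concrete semantics of update: splitting y into k ++ k′ ++ k″ with |k| = i and |k′| = 1, and replacing the middle component by unit(z), yields exactly the updated sequence. A small sketch over Python lists (our illustration, not cvc5 code):

```python
# Sanity check of the in-bounds branch of R-Update: decompose y as
# k ++ k1 ++ k2 with len(k) = i and len(k1) = 1, then the update result
# is k ++ unit(z) ++ k2.

def update(s, i, a):
    # concrete semantics of update, with the out-of-bounds convention
    if 0 <= i < len(s):
        return s[:i] + [a] + s[i + 1:]
    return s

def r_update_in_bounds(y, i, z):
    k, k1, k2 = y[:i], y[i:i + 1], y[i + 1:]   # len(k) = i, len(k1) = 1
    assert len(k) == i and len(k1) == 1
    return k + [z] + k2                        # k ++ unit(z) ++ k2

y = [10, 20, 30, 40]
assert all(r_update_in_bounds(y, i, 99) == update(y, i, 99)
           for i in range(len(y)))
```

The out-of-bounds branch needs no decomposition: the rule simply equates the update term with y.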

*Example 4.* Consider a configuration (S, A), where S contains the formulas x ≈ y ++ z, z ≈ v ++ x ++ w, and v ≈ unit(u), and A is empty. Hence, S ⊨ |x| ≈ |y ++ z|. By L-Intro, we have S ⊨ |y ++ z| ≈ |y| + |z|. Together with Assumption 1, we have S ⊨ ℓx ≈ ℓy + ℓz, and then with S-Prop, we have ℓx ≈ ℓy + ℓz ∈ A. Similarly, we can derive ℓz ≈ ℓv + ℓx + ℓw ∈ A and ℓv ≈ 1 ∈ A, and so (∗) A ⊨LIA ℓz ≈ 1 + ℓy + ℓz + ℓw. Notice that for any variable k of sort Seq, we can apply L-Valid, L-Intro, and S-Prop to add to A either ℓk > 0 or ℓk = 0. Applying this to y, z, and w, we have that A ⊨LIA ⊥ in each branch thanks to (∗), and so A-Conf applies and we get unsat.
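The arithmetic conflict in Example 4 can be confirmed mechanically: substituting the derived length equations leaves 0 = 1 + ℓy + ℓw, which has no solution over non-negative lengths. A brute-force check over a small range (the bound is arbitrary, since the conflict holds for all values):

```python
# Brute-force confirmation of the conflict in Example 4: the length
# constraints lx = ly + lz, lz = lv + lx + lw, lv = 1 have no solution
# with non-negative lengths, since they imply 0 = 1 + ly + lw.

from itertools import product

def satisfiable(bound=5):
    for lx, ly, lz, lw in product(range(bound), repeat=4):
        lv = 1                       # from v = unit(u)
        if lx == ly + lz and lz == lv + lx + lw:
            return True
    return False

assert not satisfiable()
```

A real LIA solver derives the same conclusion symbolically rather than by enumeration.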

#### **4.2 Extended Calculus**

**Definition 9.** *The calculus* EXT *consists of the derivation rules in Figs. 4 and 6, together with rule R-Extract from Fig. 5.*

Our extended calculus combines array reasoning, based on [10] and expressed by the rules in Fig. 6, with the core rules of Fig. 4 and the R-Extract rule. Unlike in BASE, those rules do not reduce nth and update. Instead, they reason about those operators directly and handle their combination with concatenation. Nth-Concat identifies the ith element of sequence y with the corresponding element selected from its normal form (see Lemma 4). Update-Concat operates similarly, applying update to all the components. Update-Concat-Inv operates similarly on the updated sequence rather than on the original sequence. Nth-Unit captures the semantics of nth when applied to a unit term. Update-Unit is similar and distinguishes an update at an out-of-bounds index (different from 0) from an update within the bound. Nth-Intro is meant to ensure that Nth-Update (explained below) and Nth-Unit (explained above) are applicable whenever an update term exists in the constraints. Nth-Update captures the read-over-write axioms of arrays, adapted to take lengths into account (see, e.g., [10]). It distinguishes three cases: in the first, the update index is out of bounds; in the second, it is not out of bounds, and the corresponding nth term accesses the same index that was updated; in the third case, the index used in the nth term is different from the updated index. Update-Bound considers two cases: either the update changes the sequence, or the sequence remains the same. Finally, Nth-Split introduces a case split on the equality between two sequence variables x and x′ whenever they appear as arguments to nth with equivalent second arguments. This is needed to ensure that we detect all cases where the arguments of two nth terms must be equal.

#### **4.3 Correctness**

In this section we prove the following theorem:

**Theorem 1.** *Let* X ∈ {BASE, EXT}*, let* (S0, A0) *be a configuration, and assume without loss of generality that* A0 *contains only arithmetic constraints that are not sequence constraints. Let* T *be a derivation tree obtained by applying the rules of* X *with* (S0, A0) *as the initial configuration.*


The theorem states that the calculi are correct in the following sense: if a closed derivation tree is obtained for the constraints S0 ∪ A0, then those constraints are unsatisfiable in TSeq; if a tree with a saturated leaf is obtained, then they are satisfiable. It is possible, however, that neither kind of tree can be derived by the calculi, making them neither refutation-complete nor terminating. This is not surprising since, as mentioned in the introduction, the decidability of even weaker theories is still unknown.

Proving the first claim in Theorem 1 reduces to a local soundness argument for each of the rules. For the second claim, we sketch below how to construct a satisfying model M from a saturated configuration for the case of EXT. The case for BASE is similar and simpler.

*Model Construction Steps.* The full model construction and its correctness are described in a longer version of this paper [23] together with a proof of the theorem above. Here is a summary of the steps needed for the model construction.

1. Sorts: M(Elem) is interpreted as some arbitrary countably infinite set. M(Seq) and M(Int) are then determined by the theory.

**Fig. 6.** Extended derivation rules. The rules use *z*1*,...,z<sup>n</sup>* to denote fresh sequence variables and *e*, *e*′ to denote fresh element variables.

	- (a) length: we first use the assignments to the variables ℓx to set the length of M(x), without assigning its actual value.
	- (b) unit variables: for variables x with x =S unit(z), we set M(x) to be [M(z)].

We conclude this section with an example of the construction of M.

*Example 5.* Consider a signature in which Elem is Int, and a configuration (S∗, A∗), saturated w.r.t. EXT, that includes the following formulas: y ≈ y1 ++ y2, x ≈ x1 ++ x2, y2 ≈ x2, y1 ≈ update(x1, i, a), |y1| ≈ |x1|, |y2| ≈ |x2|, nth(y, i) ≈ a, nth(y1, i) ≈ a. Following the above construction, a satisfying interpretation M can be built as follows:


# **5 Implementation**

We implemented our procedure for sequences as an extension of a previous theory solver for strings [17,22]. This solver is integrated in cvc5, and has been generalized to reason about both strings and sequences. In this section, we describe how the rules of the calculus are implemented and the overall strategy for when they are applied.

Like most SMT solvers, cvc5 is based on the CDCL(T) architecture [19], which combines several subsolvers, each specialized in a specific theory, with a solver for propositional satisfiability (SAT). Following that architecture, cvc5 maintains an evolving set of formulas F. When F starts with quantifier-free formulas over the theory TSeq, the case targeted by this work, the SAT solver searches for a satisfying assignment for F, represented as the set M of literals it satisfies. If none exists, the problem is unsatisfiable at the propositional level and hence TSeq-unsatisfiable. Otherwise, M is partitioned into the arithmetic constraints A and the sequence constraints S and checked for TSeq-satisfiability using the rules of the EXT calculus. Many of those rules, including all those with multiple conclusions, are implemented by adding new formulas to F (following the splitting-on-demand approach [4]). This causes the SAT solver to try to extend its assignment to those formulas, which results in the addition of new literals to M (and thereby also to A and S).
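The cooperation just described can be summarized as a schematic CDCL(T) loop (our sketch, not cvc5's actual code; the toy stand-ins below play the roles of the SAT solver and the theory module):

```python
# Schematic CDCL(T) loop: the SAT solver proposes an assignment M for
# the formula set F; the theory module either accepts it or returns
# lemmas (e.g., splitting on demand) that are added to F, after which
# the SAT solver is consulted again.

def cdcl_t(F, sat_solve, theory_check):
    while True:
        M = sat_solve(F)
        if M is None:
            return "unsat"        # propositionally unsatisfiable
        lemmas = theory_check(M)
        if not lemmas:
            return "sat"          # saturated: M is theory-satisfiable
        F = F + lemmas            # refine F and iterate

# Toy stand-ins: sat_solve fails iff "conflict" is in F; the theory
# module emits one lemma and is then satisfied.
def toy_sat(F):
    return None if "conflict" in F else list(F)

def toy_theory(M):
    return [] if "lemma" in M else ["lemma"]
```

In cvc5, the theory module here corresponds to the sequence and arithmetic subsolvers applying the EXT rules.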

In this setting, the rules of the two calculi are implemented as follows. The effect of rule A-Conf is achieved by invoking cvc5's theory solver for linear integer arithmetic. Rule S-Conf is implemented by the congruence closure submodule of the theory solver for sequences. Rules A-Prop and S-Prop are implemented by the standard mechanism for theory combination. Note that each of these four rules may be applied *eagerly*, that is, before constructing a complete satisfying assignment M for F.

The remaining rules are implemented in the theory solver for sequences. Each time M is checked for satisfiability, cvc5 follows a strategy to determine which rule to apply next. If none of the rules apply and the configuration is different from unsat, then it is saturated, and the solver returns sat. The strategy for EXT prioritizes rules as follows; only the first applicable rule is applied (and then control goes back to the SAT solver).


Whenever a rule is applied, the strategy restarts from the beginning in the next iteration. The strategy is designed to apply with higher priority steps that are easy to compute and are likely to lead to conflicts. Some steps are ordered based on dependencies on other steps. For instance, Steps 5 and 7 use normal forms, which are computed in Step 4. The strategy for the BASE calculus is the same, except that Steps 7 and 8 are replaced by one that applies R-Update and R-Nth to all update and nth terms in S.

We point out that the C-Split rule may cause non-termination of the proof strategy described above in the presence of *cyclic* sequence constraints, for instance, constraints where sequence variables appear on both sides of an equality. The solver uses methods for detecting some of these cycles, to restrict when C-Split is applied. In particular, when S ⊨* x ≈ (*u* ++ s ++ *w*)↓, S ⊨* x ≈ (*u* ++ t ++ *v*)↓, and s occurs in *v*, then C-Split is not applied. Instead, other heuristics are used, and in some cases the solver terminates with a response of "unknown" (see, e.g., [17] for details). In addition to the version shown here, we also use another variation of the C-Split rule where the normal forms are matched in reverse (starting from the last terms in the concatenations). The implementation also uses fast entailment tests for length inequalities. These tests may allow us to conclude which branch of C-Split, if any, is feasible, without having to branch on cases explicitly.

Although not shown here, the calculus can also accommodate certain *extended* sequence constraints, that is, constraints using a signature with additional functions. For example, our implementation supports sequence containment, replacement, and reverse. It also supports an extended variant of the update operator, in which the third argument is a sequence that overrides the sequence being updated starting from the index given in the second argument. Constraints involving these functions are handled by reduction rules, similar to those shown in Fig. 5. The implementation is further optimized by using context-dependent simplifications, which may eagerly infer when certain sequence terms can be simplified to constants based on the current set of assertions [22].

# **6 Evaluation**

We evaluate the performance of our approach, as implemented in cvc5. The evaluation investigates: (i) whether the use of sequences is a viable option for reasoning about vectors in programs, (ii) how our approach compares with other sequence solvers, and (iii) what is the performance impact of our array-style extended rules. As a baseline, we use Version 4.8.14 of the Z3 SMT solver, which supports a theory of sequences without updates. For cvc5, we evaluate implementations of both the basic calculus (denoted **cvc5**) and the extended array-based calculus (denoted **cvc5-a**). The benchmarks, solver configurations, and logs from our runs are available for download.<sup>4</sup> We ran all experiments on a cluster equipped with Intel Xeon E5-2620 v4 CPUs. We allocated one physical CPU core and 8 GB of RAM for each solver-benchmark pair and used a time limit of 300 s. We use the following two sets of benchmarks:

**Array Benchmarks (ARRAYS).** The first set of benchmarks is derived from the QF\_AX benchmarks in SMT-LIB [3]. To generate these benchmarks, we (i) replace declarations of arrays with declarations of sequences of uninterpreted sorts, (ii) change the sort of index terms to integers, and (iii) replace store with update and select with nth. The resulting benchmarks are quantifier-free and do not contain concatenations. Note that the original and the derived benchmarks are not equisatisfiable, because sequences take into account out-of-bounds cases that do not occur in arrays. For the Z3 runs, we add to the benchmarks a definition of update in terms of extraction and concatenation.
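A minimal sketch of this rewriting over SMT-LIB s-expressions is shown below. The operator names `seq.update`, `seq.nth`, and `seq.unit` are cvc5's SMT-LIB names; since `seq.update` takes a sequence as its third argument, the stored element is wrapped in `seq.unit` here. Rewriting index sorts to Int is elided, and a real translation would use a proper SMT-LIB parser rather than this toy one.

```python
def tokenize(s):
    return s.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens):
    """Read one s-expression from the token stream."""
    tok = tokens.pop(0)
    if tok != "(":
        return tok
    expr = []
    while tokens[0] != ")":
        expr.append(parse(tokens))
    tokens.pop(0)
    return expr

def transform(e):
    """Rewrite array operations into their sequence counterparts."""
    if isinstance(e, str):
        return e
    if e and e[0] == "Array":            # (Array Index Elem) -> (Seq Elem)
        return ["Seq", transform(e[2])]
    if e and e[0] == "store":            # store -> seq.update (element wrapped)
        a, i, v = (transform(x) for x in e[1:])
        return ["seq.update", a, i, ["seq.unit", v]]
    if e and e[0] == "select":           # select -> seq.nth
        return ["seq.nth", transform(e[1]), transform(e[2])]
    return [transform(x) for x in e]

def unparse(e):
    return e if isinstance(e, str) else "(" + " ".join(map(unparse, e)) + ")"
```

For instance, `(select (store a i v) j)` becomes `(seq.nth (seq.update a i (seq.unit v)) j)`.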

**Smart Contract Verification (DIEM).** The second set of benchmarks consists of verification conditions generated by running the Move Prover [24] on smart contracts written for the Diem framework. By default, the encoding does not use the sequence update

<sup>4</sup> http://dx.doi.org/10.5281/zenodo.6146565.

**Fig. 7.** Figure 7a lists the number of solved benchmarks and total time on commonly solved benchmarks. The scatter plots compare the base solver (**cvc5**) and the extended solver (**cvc5-a**) on the ARRAYS (Fig. 7b) and DIEM (Fig. 7c) benchmarks.

operation, and so Z3 can be used directly. However, we also modified the Move Prover encoding to generate benchmarks that do use the update operator, and ran cvc5 on them. In addition to using the sequence theory, the benchmarks make heavy use of quantifiers and the SMT-LIB theory of datatypes.

Figure 7a summarizes the results in terms of number of solved benchmarks and total time in seconds on commonly solved benchmarks. The configuration that solves the largest number of benchmarks is the implementation of the extended calculus (**cvc5-a**). This approach also successfully solves most of the DIEM benchmarks, which suggests that sequences are a promising option for encoding vectors in programs. The results further show that the sequences solver of cvc5 significantly outperforms Z3 on both the number of solved benchmarks and the solving time on commonly solved benchmarks.

Figures 7b and 7c show scatter plots comparing **cvc5** and **cvc5-a** on the two benchmark sets. We can see a clear trend towards better performance when using the extended solver. In particular, the table shows that in addition to solving the most benchmarks, **cvc5-a** is also fastest on the commonly solved instances from the DIEM benchmark set.

For the ARRAYS set, we can see that some benchmarks are slower with the extended solver. This is also reflected in the table, where **cvc5-a** is slower on the commonly solved instances. This is not too surprising, as the extra machinery of the extended solver can sometimes slow down easy problems. As problems get harder, however, the benefit of the extended solver becomes clear. For example, if we drop Z3 and consider just the instances solved by both **cvc5** and **cvc5-a** (of which there are 242), **cvc5-a** is about 2.47× faster (426 vs. 1053 s). Of course, further improving the performance of **cvc5-a** is something we plan to explore in future work.

#### **7 Conclusion**

We introduced calculi for checking satisfiability in the theory of sequences, which can be used to model the vector data type. We described our implementation in cvc5 and provided an evaluation, showing that the proposed theory is rich enough to naturally express verification conditions without introducing quantifiers, and that our implementation is efficient. We believe that verification tools can benefit by changing their encoding of verification conditions that involve vectors to use the proposed theory and implementation.

We plan to propose the incorporation of this theory in the SMT-LIB standard and contribute our benchmarks to SMT-LIB. As future research, we plan to integrate other approaches for array solving into our basic solver. We also plan to study the politeness [16,20] and decidability of various fragments of the theory of sequences.

# **References**



**Calculi and Orderings**

# **An Efficient Subsumption Test Pipeline for BS(LRA) Clauses**

Martin Bromberger<sup>1</sup> , Lorenz Leutgeb1,2(B) , and Christoph Weidenbach<sup>1</sup>

<sup>1</sup> Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany

{mbromber,lorenz,weidenb}@mpi-inf.mpg.de <sup>2</sup> Graduate School of Computer Science, Saarland Informatics Campus, Saarbrücken, Germany

**Abstract.** The importance of subsumption testing for redundancy elimination in first-order automatic reasoning is well known. Although the problem is already NP-complete for first-order clauses, the test pipelines developed over the years decide subsumption efficiently in almost all practical cases. We consider subsumption between first-order clauses of the Bernays-Schönfinkel fragment over linear real arithmetic constraints: BS(LRA). The bottleneck in this setup is deciding implication between the LRA constraints of two clauses. Our new *sample point heuristic* preempts expensive implication decisions in about 94% of all cases in our benchmarks. Combined with filtering techniques for the first-order BS part of clauses, it again yields an efficient subsumption test pipeline, now for BS(LRA) clauses.

**Keywords:** Bernays-Schönfinkel fragment · Linear arithmetic · Redundancy elimination · Subsumption

# **1 Introduction**

The elimination of redundant clauses is crucial for efficient automatic reasoning in first-order logic. In a resolution [5,50] or superposition setting [4,44], a newly inferred clause might be subsumed by a clause that is already known (*forward subsumption*), or it might subsume a known clause (*backward subsumption*). Although the SCL calculi family [1,11,21] does not require forward subsumption tests, a property also inherent to the propositional CDCL (Conflict Driven Clause Learning) approach [8,34,41,55,63], backward subsumption, and hence subsumption, remains an important test for removing redundant clauses.

In this work we present advances in deciding subsumption for constrained clauses, specifically employing the Bernays-Schönfinkel fragment as foreground logic and linear real arithmetic as background theory: BS(LRA). BS(LRA) is of particular interest because it can be used to model *supervisors*, i.e., components in technical systems that control system functionality. An example of a supervisor is the electronic control unit of a combustion engine. The logics we use to model supervisors and their properties are called *SupERLogs*: (Sup)ervisor (E)ffective (R)easoning (Log)ics. SupERLogs are instances of function-free first-order logic extended with arithmetic [18], which means BS(LRA) is an example of a SupERLog.

Subsumption is an important redundancy criterion in the context of hierarchic clausal reasoning [6,11,20,35,37]. At the heart of this paper is a new technique to speed up the treatment of linear arithmetic constraints as part of deciding subsumption. For every clause, we store a solution of its associated constraints, which is used to quickly falsify implication decisions, acting as a filter, called the *sample point heuristic*. In our experiments with various benchmarks, the technique is very effective: It successfully preempts expensive implication decisions in about 94% of cases. We elaborate on these findings in Sect. 4.

For example, consider three BS clauses, none of which subsumes another:

$$C\_1 := P(a, x) \qquad C\_2 := \neg P(y, z) \lor Q(y, z, b) \qquad C\_3 := \neg R(b) \lor Q(a, x, b)$$

Let C4 be the resolvent of C1 and C2 upon the atom P(a, x), i.e., C4 := Q(a, z, b). Now C4 backward-subsumes C3 with matcher σ := {z → x}, i.e., C4σ ⊂ C3, so C3 is redundant and can be eliminated. Now, consider an extension of the above clauses with some simple LRA constraints following the same reasoning:

$$\begin{aligned} C\_1' &:= x \ge 1 \parallel P(a, x) \\ C\_2' &:= z \ge 0 \parallel \neg P(y, z) \lor Q(y, z, b) \\ C\_3' &:= x \ge 0 \parallel \neg R(b) \lor Q(a, x, b) \end{aligned}$$

where ‖ is interpreted as an implication, i.e., clause C′1 stands for ¬(x ≥ 1) ∨ P(a, x) or simply x < 1 ∨ P(a, x). The respective resolvent of the constrained clauses is C′4 := z ≥ 0, z ≥ 1 ‖ Q(a, z, b), or after constraint simplification C′4 := z ≥ 1 ‖ Q(a, z, b), because z ≥ 1 implies z ≥ 0. For the constrained clauses, C′4 no longer subsumes C′3 with matcher σ := {z → x}, because x ≥ 0, the constraint of C′3, does not LRA-imply x ≥ 1, the constraint of C′4 under σ. Now, if we store the sample point x = 0 as a solution for the constraint of clause C′3, this sample point already reveals the failed implication: it satisfies x ≥ 0 but falsifies x ≥ 1. This constitutes the basic idea behind our sample point heuristic. In general, constraints are not just simple bounds as in the above example, and sample points are solutions to the system of linear inequalities of the LRA constraint of a clause.
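The heuristic itself needs only constraint evaluation, no solver call. A small sketch follows, with an illustrative encoding of linear atoms as coefficient maps; the function names are ours, not the implementation's.

```python
from fractions import Fraction

# An atom (coeffs, op, bound) encodes  sum(coeffs[x] * x)  op  bound.
def holds(atom, point):
    coeffs, op, bound = atom
    lhs = sum(Fraction(c) * point[x] for x, c in coeffs.items())
    return {"<=": lhs <= bound, ">=": lhs >= bound, "=": lhs == bound}[op]

def sample_point_filter(sample, conclusion):
    """`sample` is a stored solution of the premise constraint.  If it
    falsifies some atom of the (matched) conclusion constraint, the LRA
    implication cannot hold and the expensive check is skipped."""
    if any(not holds(a, sample) for a in conclusion):
        return "no-implication"   # filter fired: subsumption is refuted
    return "unknown"              # inconclusive: run the full LRA check

# The example above: x = 0 solves x >= 0 (constraint of C3') but
# falsifies x >= 1 (constraint of C4' under the matcher z -> x).
```

With the sample point x = 0 and the conclusion x ≥ 1, the filter fires and no implication check is needed.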

Note that our test on LRA constraints is based on LRA theory implication and not on a syntactic notion such as subsumption on the first-order part of the clause. In this sense it is "stronger" than its first-order counterpart. This is stressed by the following example, taken from [26, Ex. 2], which shows that first-order implication does not imply subsumption. Let

$$\begin{aligned} C\_1 &:= \neg P(x, y) \lor \neg P(y, z) \lor P(x, z) \\ C\_2 &:= \neg P(a, b) \lor \neg P(b, c) \lor \neg P(c, d) \lor P(a, d) \end{aligned}$$

Then C1 implies C2 but, for all σ, C1σ ⊈ C2: constructing σ from left to right, we obtain σ := {x → a, y → b, z → c}, but P(a, c) ∉ C2; constructing σ from right to left, we obtain σ := {z → d, x → a, y → c}, but ¬P(a, c) ∉ C2.

*Related Work.* Questions regarding the complexity of deciding subsumption of first-order clauses [27] date back more than thirty years. Notions of subsumption, varying in generality, are studied in different subfields of theorem proving; we restrict our attention to first-order theorem proving. Modern implementations typically decide many thousands of instances of this problem per second: in [62, Sect. 2], Voronkov states that initial versions of Vampire "seemed to [. . . ] deadlock" without efficient implementations of (forward) subsumption.

To reduce the number of clause pairs considered for subsumption checking, the best-known practice in first-order theorem proving is to use (imperfect) indexing data structures for pre-filtering; research on appropriate techniques is plentiful, see [24,25,27–30,33,39,40,43,45–49,52–54,56,59,61] for an evaluation of these techniques. Here we concentrate on the efficiency of a subsumption check between two clauses and therefore do not take indexing techniques into account. Furthermore, the implication test between two linear arithmetic constraints is semantic in nature and not related to any syntactic features of the involved constraints; it can therefore hardly be filtered by a syntactic indexing approach.

In addition to pre-filtering via indexing, almost all of the above-mentioned implementations of first-order subsumption tests rely on additional filters at the clause level. The idea is to generate an abstraction of clauses together with an ordering relation such that the ordering relation must hold between two clauses in order for one clause to subsume the other. Furthermore, the abstraction as well as the ordering relation should be efficiently computable. For example, a necessary condition for a first-order clause C1 to subsume a first-order clause C2 is |vars(C1)| ≥ |vars(C2)|, i.e., the number of distinct variables in C1 must be greater than or equal to the number of variables in C2. Additional abstractions used by various implementations rely on the size of clauses, the number of ground literals, the depth of literals and terms, and the occurring predicate and function symbols. For the BS(LRA) clauses considered here, the structure of the first-order BS part, which consists of predicates and flat terms (variables and constants) only, is not particularly rich.
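Filters of this kind are straightforward to implement. The sketch below, with our own illustrative encoding of literals as (polarity, predicate, arguments) triples, checks two cheap necessary conditions before any matching is attempted.

```python
def signature(lit):
    """Abstract a literal to its (polarity, predicate, arity) triple."""
    sign, pred, args = lit
    return sign, pred, len(args)

def may_subsume(c1, c2):
    """Cheap necessary conditions for C1·sigma being a sub-multiset of C2:
    C1 must not be longer than C2, and every polarity/predicate/arity
    combination of C1 must also occur in C2.  Passing the filter does not
    guarantee subsumption; failing it refutes subsumption outright."""
    if len(c1) > len(c2):
        return False
    sigs2 = {signature(l) for l in c2}
    return all(signature(l) in sigs2 for l in c1)

# From the introduction: C4 = Q(a, z, b) passes the filter against
# C3 = ¬R(b) ∨ Q(a, x, b), but C3 cannot possibly subsume C4.
```

Only pairs passing such filters proceed to the actual matching and, for BS(LRA), to the constraint implication test.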

The use of sample points has already been studied in the context of first-order clauses with arithmetic constraints. In [17,36] it was used to improve the performance of iSAT [23] on testing non-linear arithmetic constraints. In general, iSAT tests satisfiability by interval propagation for variables. If intervals get "too small", it typically gives up; sometimes, however, the explicit generation of a sample point for a small interval can still lead to a certificate of satisfiability. This technique was successfully applied in [17], but was not used for deciding subsumption of constrained clauses.

*Motivation.* The main motivation for this work is the realization that computing the implication decisions required to treat constraints of the background theory is the bottleneck of a BS(LRA) subsumption check in practice. Inspired by the success of filtering techniques in first-order logic, we devise an exceptionally effective filter for constraints and adapt well-known first-order filters to the BS fragment. Our sample point heuristic for LRA could easily be generalized to other arithmetic theories as well as to full first-order logic.

*Structure.* The paper is structured as follows. After defining BS(LRA) and common notions and notation in Sect. 2, we define redundancy notions and our sample point heuristic in Sect. 3. Section 4 justifies the success of the sample point heuristic by numerous experiments in various application domains of BS(LRA). The paper ends with a discussion of the obtained results in Sect. 5. Binaries, utility scripts, benchmarking instances used as input, and the output used for evaluation may be obtained online [13].

# **2 Preliminaries**

We briefly recall the basic logical formalisms and notations we build upon [10]. Our starting point is a standard many-sorted first-order language for BS with *constants* (denoted a, b, c), without non-constant function symbols, with *variables* (denoted w, x, y, z), and with *predicates* (denoted P, Q, R) of some fixed *arity*. *Terms* (denoted t, s) are variables or constants. An *atom* (denoted A, B) is an expression P(t1,...,tn) for a predicate P of arity n. A *positive literal* is an atom A and a *negative literal* is a negated atom ¬A. We define comp(A) = ¬A, comp(¬A) = A, |A| = A and |¬A| = A. Literals are usually denoted L, K, H. Formulas are defined in the usual way using the quantifiers ∀, ∃ and the Boolean connectives ¬, ∨, ∧, →, and ≡.

A *clause* (denoted C, D) is a universally closed disjunction of literals A1 ∨···∨ An ∨ ¬B1 ∨···∨ ¬Bm. Clauses are identified with their respective multisets, and all standard multiset operations are extended to clauses. For instance, C ⊆ D means that all literals in C also appear in D, respecting their number of occurrences. A clause is *Horn* if it contains at most one positive literal, i.e., n ≤ 1, and a *unit clause* if it has exactly one literal, i.e., n + m = 1. We write C+ for the set of positive literals, or *conclusions*, of C, i.e., C+ := {A1,...,An}, and respectively C− for the set of negative literals, or *premises*, of C, i.e., C− := {¬B1,...,¬Bm}. If Y is a term, a formula, or a set thereof, vars(Y) denotes the set of all variables in Y, and Y is *ground* if vars(Y) = ∅.

The *Bernays-Schönfinkel Clause Fragment* (BS) in first-order logic consists of first-order clauses where all involved terms are either variables or constants. The *Horn Bernays-Schönfinkel Clause Fragment* (HBS) consists of all sets of BS Horn clauses.

A *substitution* σ is a function from variables to terms with a finite domain dom(σ) = {x | xσ ≠ x} and codomain codom(σ) = {xσ | x ∈ dom(σ)}. We denote substitutions by σ, δ, ρ. The application of substitutions is often written postfix, as in xσ, and is homomorphically extended to terms, atoms, literals, clauses, and quantifier-free formulas. A substitution σ is *ground* if codom(σ) is ground. Let Y denote some term, literal, clause, or clause set. A substitution σ is a *grounding* for Y if Y σ is ground, and Y σ is a *ground instance* of Y in this case. We denote by gnd(Y) the set of all ground instances of Y, and by gndB(Y) the set of all ground instances over a given set of constants B. The *most general unifier* mgu(Z1, Z2) of two terms/atoms/literals Z1 and Z2 is defined as usual, and we assume that it does not introduce fresh variables and is idempotent.
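Because BS terms are flat (variables or constants), matching one atom onto another reduces to a single left-to-right scan. A sketch, with the variable set passed explicitly (an illustrative convention; the paper fixes w, x, y, z as variable names):

```python
def match_atom(pattern, target, variables, subst=None):
    """Extend `subst` to a matcher sigma with pattern*sigma = target.
    Atoms are (predicate, argument-tuple) pairs; BS terms are variables
    or constants, so no occurs check is needed."""
    subst = dict(subst or {})
    (p_pred, p_args), (t_pred, t_args) = pattern, target
    if p_pred != t_pred or len(p_args) != len(t_args):
        return None
    for p, t in zip(p_args, t_args):
        if p in variables:
            if subst.setdefault(p, t) != t:
                return None       # variable already bound to a different term
        elif p != t:
            return None           # constant clash
    return subst

# Matching P(a, x) against P(a, b) yields sigma = {x -> b};
# P(x, x) cannot be matched onto P(a, b).
```

Subsumption testing on the BS part repeatedly applies such atom matching across the literals of the two clauses.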

We assume a standard many-sorted first-order logic model theory, and write A ⊨ φ if an interpretation A satisfies a first-order formula φ. A formula ψ is a logical consequence of φ, written φ ⊨ ψ, if A ⊨ ψ for all A such that A ⊨ φ. Sets of clauses are semantically treated as conjunctions of clauses with all variables quantified universally.

#### **2.1 Bernays-Schönfinkel with Linear Real Arithmetic**

The extension of BS with linear real arithmetic, BS(LRA), is the basis for the formalisms studied in this paper. We consider a standard *many-sorted* first-order logic with one first-order sort F and with the sort R for the real numbers. Given a clause set N, the interpretations A of our sorts are fixed: R<sup>A</sup> = R and F<sup>A</sup> = F. This means that F<sup>A</sup> is a Herbrand interpretation, i.e., F is the set of first-order constants in N, or a single constant out of the signature if no such constant occurs. Note that this is not a deviation from standard semantics in our context: for the arithmetic part the canonical domain is considered, and the first-order sort has the finite model property over the occurring constants (note that equality is not part of BS).

Constant symbols, arithmetic function symbols, variables, and predicates are uniquely declared together with their respective sorts. The unique sort of a constant symbol, variable, predicate, or term is denoted by the function sort(Y), and we assume all terms, atoms, and formulas to be well-sorted. We assume *pure* input clause sets, which means the only constants of sort R are (rational) numbers. That is, the only constants we allow are rational numbers c ∈ Q and the constants defining our finite first-order sort F. Irrational numbers are not allowed by the standard definition of the theory. The current implementation comes with the caveat that only integer constants can be parsed. Satisfiability of pure BS(LRA) clause sets is semi-decidable, e.g., using *hierarchic superposition* [6] or *SCL(T)* [11]. Impure BS(LRA) is no longer compact and satisfiability becomes undecidable, but its restriction to ground clause sets is decidable [22].

All arithmetic predicates and functions are interpreted in the usual way. An interpretation of BS(LRA) coincides with ALRA on arithmetic predicates and functions, and freely interprets free predicates. For pure clause sets this is well defined [6]. Logical satisfaction and entailment are defined as usual, with notation similar to that for BS.

*Example 1.* The clause y < 5 ∨ x′ ≠ x + 1 ∨ ¬S0(x, y) ∨ S1(x′, 0) is part of a timed automaton with two clocks x and y modeled in BS(LRA). It represents a transition from state S0 to state S1 that can be traversed only if clock y is at least 5 and that resets y to 0 and increases x by 1.

Arithmetic terms are constructed from a set X of *variables*, the set of integer constants c ∈ Z, and the binary function symbols + and − (written infix). Additionally, we allow multiplication · if one of the factors is an integer constant. Multiplication only serves as syntactic sugar to abbreviate other arithmetic terms, e.g., x + x + x is abbreviated to 3 · x. Atoms in BS(LRA) are either *first-order atoms* (e.g., P(13, x)) or *(linear) arithmetic atoms* (e.g., x < 42). Arithmetic atoms are denoted by λ and may use the predicates ≤, <, =, ≠, >, ≥, which are written infix and have the expected fixed interpretation; a placeholder symbol stands for an arbitrary one of these predicates. Predicates used in first-order atoms are called *free*. *First-order literals* and related notation are defined as before. *Arithmetic literals* coincide with arithmetic atoms, since the arithmetic predicates are closed under negation, e.g., ¬(x ≥ 42) ≡ x < 42.

BS(LRA) clauses are defined as for BS but using BS(LRA) atoms. We often write clauses in the form Λ ‖ C where C is a clause solely built of free first-order literals and Λ is a multiset of LRA atoms called the *constraint* of the clause. A clause of the form Λ ‖ C is therefore also called a *constrained clause*. The semantics of Λ ‖ C is as follows:

$$\Lambda \parallel C \quad \text{iff} \quad \Bigl(\bigwedge\_{\lambda \in \Lambda} \lambda\Bigr) \to C \quad \text{iff} \quad \Bigl(\bigvee\_{\lambda \in \Lambda} \neg \lambda\Bigr) \lor C$$

For example, the clause x > 1 ∨ y ≠ 5 ∨ ¬Q(x) ∨ R(x, y) is also written x ≤ 1, y = 5 ‖ ¬Q(x) ∨ R(x, y). The negation ¬(Λ ‖ C) of a constrained clause Λ ‖ C where C = A1 ∨···∨ An ∨ ¬B1 ∨···∨ ¬Bm is thus equivalent to (⋀λ∈Λ λ) ∧ ¬A1 ∧ ···∧ ¬An ∧ B1 ∧···∧ Bm. Note that since the neutral element of conjunction is ⊤, an empty constraint is valid, i.e., equivalent to true.

An *assignment* for a constraint Λ is a substitution (denoted β) that maps all variables in vars(Λ) to real numbers <sup>c</sup> <sup>∈</sup> <sup>R</sup>. An assignment is a *solution* for a constraint <sup>Λ</sup> if all atoms <sup>λ</sup> <sup>∈</sup> (Λβ) evaluate to true. A constraint <sup>Λ</sup> is *satisfiable* if there exists a solution for Λ. Otherwise it is *unsatisfiable*. Note that assignments can be extended to C by also mapping variables of the first-order sort accordingly.

A clause or clause set is *abstracted* if its first-order literals contain only variables or first-order constants. Every clause C is equivalent to an abstracted clause obtained by replacing each non-variable arithmetic term t that occurs in a first-order atom by a fresh variable x while adding an arithmetic atom x = t to C. We assume abstracted clauses for theory development, but we prefer non-abstracted clauses in examples for readability; e.g., a unit clause P(3, 5) is considered in the development of the theory as the clause x = 3, y = 5 ‖ P(x, y). In the implementation, we mostly prefer abstracted clauses, except that we allow integer constants c ∈ Z to appear as arguments of first-order literals. In some cases, this makes it easier to recognize whether two clauses can be matched or not. For instance, we see by syntactic comparison that for the two unit clauses P(3, 5) and P(0, 1) there is no substitution σ such that P(3, 5) = P(0, 1)σ. For the abstracted versions x = 3, y = 5 ‖ P(x, y) and u = 0, v = 1 ‖ P(u, v), on the other hand, we can find a matching substitution for the first-order part, σ := {u → x, v → y}, and would have to check the constraints semantically to exclude the match.
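The abstraction step can be sketched in a few lines; literals are encoded as (polarity, predicate, arguments) triples, and the fresh-variable naming scheme is illustrative.

```python
from itertools import count

def abstract(clause):
    """Replace each integer argument of a first-order literal by a fresh
    variable and record the defining arithmetic atom, turning C into an
    abstracted constrained clause  constraint || abstracted."""
    fresh = ("v%d" % i for i in count())
    constraint, abstracted = [], []
    for sign, pred, args in clause:
        new_args = []
        for a in args:
            if isinstance(a, int):            # non-variable arithmetic term
                x = next(fresh)
                constraint.append((x, "=", a))
                new_args.append(x)
            else:
                new_args.append(a)
        abstracted.append((sign, pred, tuple(new_args)))
    return constraint, abstracted

# P(3, 5) becomes  v0 = 3, v1 = 5 || P(v0, v1)
```

After abstraction, the first-order parts can be compared purely syntactically, while the recorded atoms join the clause's constraint.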

*Hierarchic Resolution.* One inference rule, foundational to most algorithms for solving constrained first-order clauses, is *hierarchic resolution* [6]:

$$\frac{A\_1 \parallel L\_1 \lor C\_1 \quad A\_2 \parallel L\_2 \lor C\_2 \quad \sigma = \operatorname{mgu}(L\_1, \operatorname{comp}(L\_2))}{\left(A\_1, A\_2 \parallel C\_1 \lor C\_2\right)\sigma}$$

The conclusion is called the *hierarchic resolvent* (of the two clauses in the premise). A *refutation* is a sequence of resolution steps that produces a clause Λ ‖ ⊥ with ALRA ⊨ Λδ for some grounding δ. Every set N of the BS(LRA) clauses considered here is *sufficiently complete* [6], because all constants of the arithmetic sort are numbers; hence *hierarchic resolution* is sound and refutationally complete for N [6,7]. *Hierarchic unit resolution* is the special case of hierarchic resolution in which one of the two clauses is a unit clause. It is sound and complete for HBS(LRA) [6,7], but not even refutationally complete for BS(LRA).

Most algorithms for Bernays-Schönfinkel, first-order logic, and beyond utilize resolution. The SCL(T) calculus for HBS(LRA) uses hierarchic resolution to learn from the conflicts it encounters during its search. The hierarchic superposition calculus, on the other hand, derives new clauses via hierarchic resolution based on an ordering. The goal is to either derive the empty clause or a saturation of the clause set, i.e., a state from which no new clauses can be derived. Each of these algorithms must derive new clauses in order to make progress, but their subroutines also get progressively slower as more clauses are derived. To increase efficiency, it is necessary to eliminate clauses that have become obsolete. One measure that determines whether a clause is useful or not is *redundancy*.

*Redundancy.* In order to define redundancy for constrained clauses, we need an H-*order*, i.e., a well-founded, total, strict ordering ≺ on ground literals such that literals in the constraints (in our case arithmetic literals) are always smaller than first-order literals. Such an ordering can be lifted to constrained clauses and sets thereof by its respective multiset extension. Hence, we overload any such order ≺ for literals, constrained clauses, and sets of constrained clauses if the meaning is clear from the context. We define ⪯ as the reflexive closure of ≺ and N<sup>⪯Λ‖C</sup> := {D | D ∈ N and D ⪯ Λ ‖ C}. An instance of an LPO [15] with appropriate precedence can serve as an H-order.

**Definition 2 (Clause Redundancy).** *A ground clause* Λ ∥ C *is* redundant *with respect to a set* N *of ground clauses and an* H*-order* ≺ *if* N^{⪯ Λ ∥ C} ⊨ Λ ∥ C*. A clause* Λ ∥ C *is* redundant *with respect to a clause set* N *and an* H*-order* ≺ *if for all* Λ′ ∥ C′ ∈ gnd(Λ ∥ C) *the clause* Λ′ ∥ C′ *is redundant with respect to* gnd(N)*.*

If a clause Λ ∥ C is redundant with respect to a clause set N, then it can be removed from N without changing its semantics. Determining clause redundancy is an undecidable problem [11,63]. However, there are special cases of redundant clauses that can be easily checked, e.g., tautologies and subsumed clauses. Techniques for tautology deletion and subsumption deletion are the most common elimination techniques in modern first-order provers.

A *tautology* is a clause that evaluates to true independent of the predicate interpretation or assignment. It is therefore redundant with respect to all orders and clause sets; even the empty set.

**Corollary 3 (Tautology for Constrained Clauses).** *A clause* Λ ∥ C *is a tautology if the existential closure of* ¬(Λ → C) *is unsatisfiable.*

Since ¬(Λ → C) is essentially ground (by existential closure and Skolemization), it can be solved with an appropriate SMT solver, i.e., an SMT solver that supports unquantified uninterpreted functions coupled with linear real arithmetic. In [2], it is recommended to check only the following conditions for tautology deletion in hierarchic superposition:

**Corollary 4 (Tautology Check).** *A clause* Λ ∥ C *is a tautology if the existential closure of* Λ *is unsatisfiable or if* C *contains two literals* L1 *and* L2 *with* L1 = comp(L2)*.*

The advantage is that the check on the first-order side of the clause is still purely syntactic and corresponds to the tautology check for pure first-order logic. Nonetheless, there are tautologies that are not captured by Corollary 4, e.g., x = y ∥ P(x) ∨ ¬P(y). The SCL(T) calculus, on the other hand, requires no tautology checks because it never learns tautologies as part of its conflict analysis [1,11,21]. This property is also inherent to the propositional CDCL (conflict-driven clause learning) approach [8,34,41,55,63].
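The syntactic half of Corollary 4 is easy to implement. The following minimal sketch (the literal representation and all names are our own, not taken from SPASS-SPL) detects complementary first-order literals, and also illustrates why a tautology like x = y ∥ P(x) ∨ ¬P(y) escapes the purely syntactic check:

```python
# Sketch of the syntactic half of the tautology check (Corollary 4):
# a clause C is a tautology if it contains complementary first-order
# literals L1 = comp(L2). Literals are (sign, predicate, args) tuples,
# with sign True for positive literals (our own convention).

def comp(lit):
    """Complement of a literal: flip its sign."""
    sign, pred, args = lit
    return (not sign, pred, args)

def has_complementary_pair(clause):
    """True if some literal and its complement both occur in the clause."""
    lits = set(clause)
    return any(comp(l) in lits for l in lits)

# P(x) ∨ ¬P(x) is detected ...
c1 = [(True, "P", ("x",)), (False, "P", ("x",))]
# ... but x = y ∥ P(x) ∨ ¬P(y) is not, since the check ignores the
# constraint (mirroring the limitation discussed above).
c2 = [(True, "P", ("x",)), (False, "P", ("y",))]

print(has_complementary_pair(c1))  # True
print(has_complementary_pair(c2))  # False
```

The constraint-side condition of Corollary 4 (unsatisfiability of Λ) would be delegated to an LRA solver and is not part of this sketch.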

# **3 Subsumption for Constrained Clauses**

A *subsumed* constrained clause is a clause that is redundant with respect to a single clause in our clause set. Formally, subsumption is defined as follows.

**Definition 5 (Subsumption for Constrained Clauses [2]).** *A constrained clause* Λ1 ∥ C1 subsumes *another constrained clause* Λ2 ∥ C2 *if there exists a substitution* σ *such that* C1σ ⊆ C2*,* vars(Λ1σ) ⊆ vars(Λ2)*, and the universal closure of* Λ2 → (Λ1σ) *holds in LRA.*

Eliminating redundant clauses is crucial for the efficient operation of an automatic first-order theorem prover. Although subsumption is considered one of the redundancy criteria that are easier to check in practice, it is still a hard problem in general:

**Lemma 6 (Complexity of Subsumption in the BS Fragment).** *Deciding subsumption for a pair of BS clauses is NP-complete.*

*Proof.* Containment in NP follows from the fact that the size of subsumption matchers is limited by the subsumed clause and set inclusion of literals can be decided in polynomial time. For the hardness part, consider the following polynomial-time reduction from 3-SAT. Take a propositional clause set N where all clauses have length three. Now introduce a 6-place predicate R and encode each propositional variable P by a first-order variable xP. Then a propositional clause L1 ∨ L2 ∨ L3 can be encoded by an atom R(xP1, p1, xP2, p2, xP3, p3), where pi is 0 if Li is negative and 1 otherwise, and Pi is the propositional variable of Li. This way the clause set N can be represented by a single BS clause CN. Now construct a clause D that contains all atoms representing the ways a clause of length three can become true, as ground atoms over R and the constants 0, 1. For example, D contains atoms like R(0, 0, ...) and R(1, 1, ...) representing that the first literal of a clause is true. Actually, for each such atom R(0, 0, ...) the clause D contains |CN| copies. Finally, CN subsumes D if and only if N is satisfiable.
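The reduction in this proof can be made concrete. The sketch below (all names are ours; it uses plain set inclusion, so the |CN| copies needed for length-restricted subsumption variants are omitted) encodes a 3-SAT instance as described and verifies on two tiny instances that brute-force subsumption agrees with brute-force satisfiability:

```python
from itertools import product

# Sketch of the NP-hardness reduction: a 3-SAT clause set N is encoded
# into BS clauses C_N and D so that C_N subsumes D iff N is satisfiable.
# A matcher here maps each variable x_P to 0 or 1, i.e., it *is* a truth
# assignment for N.

def encode(n3sat):
    """n3sat: list of clauses, each a list of 3 ints (DIMACS-style, var > 0 means positive)."""
    c_n = [("R", (f"x{abs(l1)}", int(l1 > 0), f"x{abs(l2)}", int(l2 > 0),
                  f"x{abs(l3)}", int(l3 > 0))) for l1, l2, l3 in n3sat]
    # D: all ground R-atoms over {0,1} where some argument pair matches,
    # i.e., all ways a length-3 clause can become true.
    d = {("R", (b1, p1, b2, p2, b3, p3))
         for b1, p1, b2, p2, b3, p3 in product((0, 1), repeat=6)
         if b1 == p1 or b2 == p2 or b3 == p3}
    return c_n, d

def subsumes(c_n, d, variables):
    """Brute-force search for a matcher sigma: variables -> {0,1} with C_N sigma ⊆ D."""
    def apply(atom, sigma):
        pred, args = atom
        return (pred, tuple(sigma.get(a, a) for a in args))
    return any(all(apply(atom, dict(zip(variables, values))) in d for atom in c_n)
               for values in product((0, 1), repeat=len(variables)))

def sat(n3sat, variables):
    """Reference: brute-force satisfiability of the 3-SAT instance."""
    return any(all(any(dict(zip(variables, values))[abs(l)] == (l > 0) for l in cl)
                   for cl in n3sat)
               for values in product((False, True), repeat=len(variables)))

sat_instance = [[1, 2, 3], [-1, -2, 3]]                                  # satisfiable
unsat_instance = [list(c) for c in product((1, -1), (2, -2), (3, -3))]   # all 8 sign patterns: unsat
for inst in (sat_instance, unsat_instance):
    vars_int = sorted({abs(l) for cl in inst for l in cl})
    c_n, d = encode(inst)
    assert subsumes(c_n, d, [f"x{v}" for v in vars_int]) == sat(inst, vars_int)
```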

In order to be efficient, modern theorem provers need to decide multiple thousand subsumption checks per second. In the pure first-order case, this is possible because of indexing and filtering techniques that quickly decide most subsumption checks [24,25,27–30,33,39,40,45–49,52–54,56,59,61,62].

For BS(LRA) (and FOL(LRA)), there also exists research on how to perform the subsumption check in general [2,36], but the literature contains no dedicated indexing or filtering techniques for the constraint part of the subsumption check. In this section and as the main contribution of this paper, we present the first such filtering techniques for BS(LRA). But first, we explain how to solve the subsumption check for constrained clauses in general.

**First-Order Check.** The first step of the subsumption check is exactly the same as in first-order logic without arithmetic. We have to find a substitution σ, also called a *matcher*, such that C1σ ⊆ C2. The only difference is that it is not enough to compute one matcher σ; we have to compute all matchers for C1σ ⊆ C2 until we find one that satisfies the implication Λ2 → (Λ1σ). For instance, there are two matchers for the clauses Λ1 ∥ C1 := x + y ≥ 0 ∥ Q(x, y) and Λ2 ∥ C2 := x < 0, y ≥ 0 ∥ Q(x, x) ∨ Q(y, y). The matcher {x ↦ y} satisfies the implication Λ2 → (Λ1σ) and {y ↦ x} does not. Our own algorithm for finding matchers is in the style of Stillman, except that we continue after we find the first matcher [27,58].
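A matcher enumeration in this style can be sketched as follows (the literal representation and all names are our own; this is a simplified backtracking search, not the actual SPASS-SPL implementation):

```python
# Sketch of a backtracking matcher that enumerates *all* substitutions
# sigma with C1·sigma ⊆ C2, continuing past the first solution. Literals
# are (sign, predicate, args); which argument names are variables of C1
# is passed in explicitly (our own simplification for the BS setting).

def match_literal(l1, l2, sigma, variables):
    """Extend sigma so that l1·sigma = l2, or return None."""
    (s1, p1, a1), (s2, p2, a2) = l1, l2
    if (s1, p1, len(a1)) != (s2, p2, len(a2)):
        return None
    for s, t in zip(a1, a2):
        if s in variables:
            if sigma.get(s, t) != t:
                return None          # conflicting binding
            sigma = {**sigma, s: t}
        elif s != t:
            return None              # constants must agree
    return sigma

def all_matchers(c1, c2, variables, sigma=None):
    """Yield every matcher sigma with c1·sigma ⊆ c2 (as sets of literals)."""
    sigma = sigma or {}
    if not c1:
        yield sigma
        return
    for l2 in c2:
        ext = match_literal(c1[0], l2, sigma, variables)
        if ext is not None:
            yield from all_matchers(c1[1:], c2, variables, ext)

C1 = [(True, "Q", ("x", "y"))]
C2 = [(True, "Q", ("x", "x")), (True, "Q", ("y", "y"))]
matchers = {frozenset(s.items()) for s in all_matchers(C1, C2, {"x", "y"})}
# Two matchers: {x ↦ x, y ↦ x} and {x ↦ y, y ↦ y}, i.e., the {y ↦ x}
# and {x ↦ y} of the example above with identity bindings spelled out.
print(len(matchers))  # 2
```

Deduplicating via the final set is enough for this sketch; a production matcher would avoid generating duplicate branches in the first place.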

**Implication Check.** The universal closure of the implication Λ2 → (Λ1σ) can be solved by any SMT solver for the respective theory after we negate it. Note that the resulting formula

$$\exists x_1, \dots, x_n.\; \Lambda_2 \land \neg(\Lambda_1 \sigma) \qquad \text{where } \{x_1, \dots, x_n\} = \text{vars}(\Lambda_2) \tag{1}$$

is already in clause normal form and that the formula can be treated as ground since the existentially quantified variables can be handled as constants. Intuitively, the universal closure of Λ2 → (Λ1σ) asserts that the set of solutions satisfying Λ2 is a subset of

**Fig. 1.** Solutions of the constraints Λ1σ, Λ2, and Λ3 depicted as polytopes

the set of solutions satisfying Λ1σ. This means a solution to its negation (1) is a solution for Λ2 but not for Λ1σ, and thus a counterexample to the subset relation.

*Example 7.* Let us now look at an example to illustrate the role that formula (1) plays in deciding subsumption. In our example, we have three clauses: Λ1 ∥ C1, Λ2 ∥ C2, and Λ3 ∥ C2, where C1 := ¬P(x, y) ∨ Q(u, z), C2 := ¬P(x, y) ∨ Q(2, x), Λ1 := y ≥ 0, y ≤ u, y ≤ x + z, y ≥ x + z − 2·u, Λ2 := x ≥ 1, y ≤ 1, y ≥ x − 1, and Λ3 := x ≥ 2, y ≤ 1, y ≥ x − 2. Our goal is to test whether Λ1 ∥ C1 subsumes the other two clauses. As our first step, we try to find a substitution σ such that C1σ ⊆ C2. The most general substitution fulfilling this condition is σ := {z ↦ x, u ↦ 2}. Next, we check whether Λ1σ is implied by Λ2 and Λ3. Normally, we would do so by solving formula (1) with an SMT solver, but to help our intuitive understanding, we instead look at the solution sets depicted in Fig. 1. Note that Λ1σ simplifies to Λ1σ := y ≥ 0, y ≤ 2, y ≤ 2·x, y ≥ 2·x − 4. Here we see that the solution set for Λ2 is a subset of the one for Λ1σ. Hence, Λ2 implies Λ1σ, which means that Λ2 ∥ C2 is subsumed by Λ1 ∥ C1.

The solution set for Λ3 is not a subset of the one for Λ1σ. For instance, the assignment β2 := {x ↦ 3, y ↦ 1} is a counterexample and therefore a solution to the respective instance of formula (1). Hence, Λ1 ∥ C1 does not subsume Λ3 ∥ C2.

**Excess Variables.** Note that in general it is not sufficient to find a substitution σ that matches the first-order parts in order to also match the theory constraints: C1σ ⊆ C2 does not generally imply vars(Λ1σ) ⊆ vars(Λ2). In particular, if Λ1 contains variables that do not appear in the first-order part C1, then these must be projected onto Λ2. We arrive at a variant of (1), namely ∃x1,...,xn. ∀y1,...,ym. Λ2 ∧ ¬(Λ1σ), where {x1,...,xn} = vars(Λ2) and {y1,...,ym} = vars(Λ1) \ vars(C1). Our solution to this problem is to normalize all clauses Λ ∥ C by eliminating all *excess variables* Y := vars(Λ) \ vars(C) such that vars(Λ) ⊆ vars(C) is guaranteed. For linear real arithmetic this is possible with quantifier elimination techniques, e.g., Fourier-Motzkin elimination (FME). Although these techniques may cause the size of Λ to increase exponentially, they often behave well in practice. In fact, we get rid of almost all excess variables in our benchmark examples with simplification techniques based on Gaussian elimination whose execution time is linear in the number of LRA atoms. Given the precondition Y = ∅ achieved by such elimination techniques, we can compute σ as a matcher for the first-order parts and then directly use it for testing whether the universal closure of Λ2 → (Λ1σ) holds. An alternative solution to the issue of excess variables has been proposed: in [2], the substitution σ is decomposed as σ = δτ, where δ is the first-order matcher and τ is a *theory matcher*, i.e., dom(τ) ⊆ Y and vars(codom(τ)) ⊆ vars(Λ2). Then, exploiting Farkas' lemma, the computation of τ is reduced to testing the feasibility of a linear program (restricted to matchers that are affine transformations).
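For illustration, a single Fourier-Motzkin elimination step over non-strict inequalities can be sketched as follows (representation and names are ours; strict inequalities and the Gaussian shortcut for equations are omitted):

```python
# One Fourier-Motzkin step: eliminate `var` from a conjunction of
# constraints (coeffs, c) meaning  sum_x coeffs[x]*x <= c.

def fme_eliminate(constraints, var):
    uppers, lowers, rest = [], [], []
    for coeffs, c in constraints:
        a = coeffs.get(var, 0)
        if a > 0:
            uppers.append((coeffs, c, a))   # upper bound on var
        elif a < 0:
            lowers.append((coeffs, c, a))   # lower bound on var
        else:
            rest.append((coeffs, c))
    # Combining an upper bound U (coefficient au > 0) with a lower bound L
    # (coefficient al < 0) as au*L + (-al)*U cancels var and yields
    #   sum_x (au*l_x - al*u_x) * x  <=  au*bl - al*bu.
    for cu, bu, au in uppers:
        for cl, bl, al in lowers:
            coeffs = {x: au * cl.get(x, 0) - al * cu.get(x, 0)
                      for x in (set(cu) | set(cl)) - {var}}
            rest.append(({x: k for x, k in coeffs.items() if k != 0},
                         au * bl - al * bu))
    return rest

# Eliminating y from {y <= 1, x - y <= 0} (i.e., x <= y) yields x <= 1:
print(fme_eliminate([({"y": 1}, 1), ({"x": 1, "y": -1}, 0)], "y"))
# [({'x': 1}, 1)]
```

As noted above, repeating this step can blow up the constraint quadratically per variable; the Gaussian-elimination shortcut for excess variables defined by equations avoids this in most benchmark cases.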

The reduction to solving a linear program offers polynomial worst-case complexity but in practice typically behaves worse than solving the variant with quantifier alternations using an SMT solver such as Z3 [36,42].

**Filtering First-Order Literals.** Even though deciding implication of theory constraints is in practice more expensive than constructing a matcher and deciding inclusion of first-order literals, we still incorporate some lightweight filters for our evaluation. Inspired by Schulz [54], we choose three features, such that every feature f maps clauses to N0 and f(C1) ≤ f(C2) is necessary for C1σ ⊆ C2.

The features are: |C+|, the number of positive first-order literals in C; |C−|, the number of negative first-order literals in C; and the number of occurrences of constants in C.
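A sketch of the feature-based filter (the literal representation and the variable/constant convention are our own assumptions):

```python
# The three filtering features: positive literals, negative literals, and
# occurrences of constants. A literal is (sign, predicate, args); argument
# names that are lowercase strings count as variables, everything else as
# a constant (our own convention). A matcher can only add literals' images
# and turn variables into constants, so each feature may only grow.

def features(clause):
    pos = sum(1 for sign, _, _ in clause if sign)
    neg = len(clause) - pos
    consts = sum(1 for _, _, args in clause for a in args
                 if not (isinstance(a, str) and a[:1].islower()))
    return (pos, neg, consts)

def may_subsume(c1, c2):
    """Necessary condition for C1·sigma ⊆ C2: f(C1) <= f(C2) componentwise."""
    return all(f1 <= f2 for f1, f2 in zip(features(c1), features(c2)))

c1 = [(True, "Q", ("x", "y"))]
c2 = [(False, "P", ("x", "y")), (True, "Q", (2, "x"))]
print(may_subsume(c1, c2))  # True: 1<=1 positives, 0<=1 negatives, 0<=1 constants
print(may_subsume(c2, c1))  # False: 1 negative literal cannot fit into 0
```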

**Sample Point Heuristic.** The majority of subsumption tests fail because we cannot find a fitting substitution for their first-order parts. In our experiments, between 66.5% and 99.9% of subsumption tests failed this way. This means our tool only has to check in less than 33.5% of the cases whether one theory constraint implies the other. Despite this, without any filtering on the constraint implication tests, our tool spends more time on them than on the first-order part of the subsumption tests. The reason is that constraint implication tests are typically much more expensive than the first-order part of a subsumption test. For this reason, we developed the *sample point heuristic*, which is much faster to execute than a full constraint implication test, but still filters out the majority of implications that do not hold (in our experiments between 93.8% and 100%).

The idea behind the sample point heuristic is straightforward. We store for each clause Λ ∥ C a sample solution β for its theory constraint Λ. Before we execute a full constraint implication test, we simply evaluate whether the sample solution β for Λ2 is also a solution for Λ1σ. If this is not the case, then β is a solution for (1) and a counterexample for the implication. If β is a solution for Λ1σ, then the heuristic returns unknown and we have to execute a full constraint implication test, i.e., solve the SMT problem (1).
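The heuristic amounts to a cheap evaluation of Λ1σ at the stored point. A sketch using the constraints of Example 7 (representation and names are ours; exact rationals mirror the FLINT-based exactness discussed in Sect. 4):

```python
from fractions import Fraction
import operator

# Sketch of the sample point heuristic: evaluate the stored sample
# solution beta for Λ2 on the constraint set Λ1·sigma. A constraint is
# (coeffs, op, bound) meaning  sum_x coeffs[x]*x  op  bound.

OPS = {"<=": operator.le, "<": operator.lt,
       ">=": operator.ge, ">": operator.gt, "=": operator.eq}

def satisfies(beta, constraint_set):
    for coeffs, op, bound in constraint_set:
        lhs = sum(Fraction(a) * Fraction(beta[x]) for x, a in coeffs.items())
        if not OPS[op](lhs, Fraction(bound)):
            return False
    return True

def sample_point_heuristic(beta2, lambda1_sigma):
    """'no' is definitive: beta2 is a counterexample to the implication.
    'unknown' means a full SMT check of formula (1) is still required."""
    return "unknown" if satisfies(beta2, lambda1_sigma) else "no"

# Λ1σ from Example 7:  y >= 0, y <= 2, y <= 2x, y >= 2x - 4
lambda1_sigma = [({"y": 1}, ">=", 0), ({"y": 1}, "<=", 2),
                 ({"y": 1, "x": -2}, "<=", 0), ({"y": 1, "x": -2}, ">=", -4)]
print(sample_point_heuristic({"x": 2, "y": 1}, lambda1_sigma))  # unknown (β1)
print(sample_point_heuristic({"x": 3, "y": 1}, lambda1_sigma))  # no (β2)
```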

Often it is possible to get our sample solutions for free. Theorem provers based on hierarchic superposition typically check for every new clause Λ ∥ C whether Λ is satisfiable in order to eliminate tautologies. This means we can use this tautology check to compute and store a sample solution for every new clause without extra cost. We only need to pick a solver for the check that returns a solution as a certificate of satisfiability. Although the SCL(T) calculus never learns any tautologies, it is also possible to get a sample solution for free as part of its conflict analysis [11].

*Example 8.* We revisit Example 7 to illustrate the sample point heuristic. During the tautology checks for Λ2 ∥ C2 and Λ3 ∥ C2, we determined that β1 := {x ↦ 2, y ↦ 1} is a sample solution for Λ2 and β2 := {x ↦ 3, y ↦ 1} is one for Λ3. Since Λ2 implies Λ1σ, all sample solutions for Λ2 automatically satisfy Λ1σ. This is the reason why the sample point heuristic never filters out an implication that actually holds; i.e., it returns unknown when we test whether Λ2 implies Λ1σ. The assignment β2, on the other hand, does not satisfy Λ1σ. Hence, the sample point heuristic correctly claims that Λ3 does not imply Λ1σ. Note that we could also have chosen β1 as the sample point for Λ3. In this case, the sample point heuristic would also return unknown for the implication Λ3 → (Λ1σ) although the implication does not hold.

**Trivial Cases.** Subsumption tests become much easier if the constraint Λ<sup>i</sup> of one of the participating clauses is empty. We use two heuristic filters to exploit this fact. We highlight them here because they already exclude some subsumption tests before we reach the sample point heuristic in our implementation.

The *empty conclusion heuristic* exploits that Λ1 is valid if Λ1 is empty. In this case, all implications Λ2 → (Λ1σ) hold because Λ1σ evaluates to true under any assignment. So by checking whether Λ1 = ∅, we can quickly determine that Λ2 → (Λ1σ) holds for some pairs of clauses. Note that, in contrast to the sample point heuristic, this heuristic is used to find valid implications.

The *empty premise test* exploits that Λ2 is valid if Λ2 is empty. In this case, an implication Λ2 → (Λ1σ) can only hold if Λ1σ simplifies to the empty set as well. This is the case because any inequality in the canonical form a1·x1 + ... + an·xn ⋈ c (with ⋈ ∈ {≤, <}) either simplifies to true (because ai = 0 for all i = 1,...,n and 0 ⋈ c holds) and can be removed from Λ1σ, or the inequality eliminates at least one assignment as a solution for Λ1σ [51]. So if Λ2 = ∅, we check whether Λ1σ simplifies to the empty set instead of solving the SMT problem (1).
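The simplification check for a single canonical inequality, and the resulting empty premise test, can be sketched as follows (representation and names are ours):

```python
# Sketch of the empty premise test: when Λ2 = ∅, the implication
# Λ2 -> Λ1σ can only hold if every inequality of Λ1σ simplifies to true.
# An inequality (coeffs, op, bound) in canonical form
#   a1*x1 + ... + an*xn  op  c,   op in {"<=", "<"},
# is trivially true iff all ai = 0 and 0 op c holds.

def simplifies_to_true(ineq):
    coeffs, op, bound = ineq
    if any(a != 0 for a in coeffs.values()):
        return False                       # constrains at least one point
    return bound >= 0 if op == "<=" else bound > 0

def empty_premise_test(lambda1_sigma):
    """Implication ∅ -> Λ1σ holds iff Λ1σ simplifies to the empty set."""
    return all(simplifies_to_true(ineq) for ineq in lambda1_sigma)

print(empty_premise_test([({"x": 0}, "<=", 3)]))  # True: reduces to 0 <= 3
print(empty_premise_test([({"x": 1}, "<=", 3)]))  # False: x <= 3 excludes points
```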

**Pipeline.** We call our approach a *pipeline* since it combines multiple procedures, which we call *stages*, that vary in complexity and are in principle independent, for the overall aim of efficiently testing subsumption. Pairs of clauses that "make it through" all stages are those for which the subsumption relation holds. The pipeline is designed with two goals in mind: (1) to reject as many pairs of clauses as early as possible, and (2) to move stages further towards the end of the pipeline the more expensive they are.

The pipeline consists of six stages, all of which are mentioned above. We divide the pipeline into two phases, the *first-order phase* (FO-phase) consisting of two stages, and the *constraint phase* (C-phase) consisting of four stages. First-order filtering rejects all pairs of clauses for which f(C1) > f(C2) holds. Then, matching constructs all matchers σ such that C1σ ⊆ C2. Every matcher is individually tested in the constraint phase. Technically, this means that the input of all following stages is not just a pair of clauses, but a triple of two clauses and a matcher. The constraint phase then proceeds with the empty conclusion heuristic and the empty premise test to accept (resp. reject) all trivial cases of the constraint implication test. The next stage is the sample point heuristic. If the sample solution β2 for Λ2 is not a solution for Λ1σ (i.e., β2 ⊭ Λ1σ), then the matcher σ is rejected. Otherwise (i.e., β2 ⊨ Λ1σ), the implication test Λ2 → (Λ1σ) is performed by solving the SMT problem (1) to produce the overall result of the pipeline and finally determine whether subsumption holds.

**Algorithm 1:** Saturation prover used for evaluation

```
Input:  A set N of clauses.
Output: ⊥ or "unknown".
 1  U := {C ∈ N | |C| = 1}
 2  while U ≠ ∅ do
 3      M := ∅
 4      foreach C ∈ U do M := M ∪ resolvents(C, N)
 5      if ⊥ ∈ M then return ⊥
 6      reduce M using N (forward subsumption)
 7      if M = ∅ then return "unknown"
 8      reduce N using M (backward subsumption)
 9      U := {C ∈ M | |C| = 1}
10      N := N ∪ M
11  end
12  return "unknown"
```
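The control flow of the six stages can be sketched as follows. The stage implementations are injected as functions, so the sketch shows only how the stages compose; all names are ours, and the final implication test would be backed by an SMT solver in a real implementation:

```python
# Sketch of the six-stage subsumption pipeline. Stage implementations are
# passed in as callbacks so that only the composition is shown here.

def subsumption_pipeline(clause1, clause2, sample2, *, fo_features,
                         all_matchers, apply_sigma, eval_point, smt_implies):
    """True iff (Λ1 ∥ C1) subsumes (Λ2 ∥ C2); sample2 is a solution of Λ2."""
    (lam1, c1), (lam2, c2) = clause1, clause2
    # Stage 1 (FO-phase): feature-based filtering.
    if any(f1 > f2 for f1, f2 in zip(fo_features(c1), fo_features(c2))):
        return False
    # Stage 2 (FO-phase): every matcher is tested individually below.
    for sigma in all_matchers(c1, c2):
        lam1s = apply_sigma(lam1, sigma)
        # Stage 3 (C-phase): empty conclusion heuristic -- Λ1σ is valid.
        if not lam1s:
            return True
        # Stage 4 (C-phase): empty premise test -- Λ2 valid but Λ1σ did
        # not simplify away (simplification itself omitted in this sketch).
        if not lam2:
            continue
        # Stage 5 (C-phase): sample point heuristic.
        if not eval_point(sample2, lam1s):
            continue
        # Stage 6 (C-phase): full SMT-backed implication test, formula (1).
        if smt_implies(lam2, lam1s):
            return True
    return False

# Toy run decided by the empty conclusion heuristic (stage 3); the SMT
# callback is never reached here.
result = subsumption_pipeline(
    ([], [("Q", ("x",))]),          # Λ1 = ∅ ∥ C1
    ([], [("Q", ("a",))]),          # Λ2 = ∅ ∥ C2
    sample2={},
    fo_features=lambda c: (len(c),),
    all_matchers=lambda c1, c2: [{"x": "a"}],
    apply_sigma=lambda lam, sigma: lam,
    eval_point=lambda beta, lam: True,
    smt_implies=lambda lam2, lam1s: False,
)
print(result)  # True
```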

# **4 Experimentation**

In order to evaluate our new approach, we implemented all presented techniques and their combination in the form of a pipeline in the theorem prover SPASS-SPL, a prototype for BS(LRA) reasoning, and ran them on three families of benchmark instances derived from BS(LRA) applications.

Note that SPASS-SPL contains more than one approach for BS(LRA) reasoning, e.g., the Datalog hammer for HBS(LRA) reasoning [10]. These modes of operation are independent of each other, and the desired mode is chosen via a command-line option. The reasoning approach discussed here is the current default. On the first-order side, SPASS-SPL consists of a simple saturation prover based on hierarchic unit resolution, see Algorithm 1. It resolves unit clauses with other clauses until either the empty clause is derived or no new clauses can be derived. Note that this procedure is only complete for Horn clauses. For arithmetic reasoning, SPASS-SPL relies on SPASS-SATT, our sound and complete CDCL(LA) solver for quantifier-free linear real and linear mixed/integer arithmetic [12]. SPASS-SATT implements a version of the dual simplex algorithm fine-tuned towards SMT solving [16]. In order to ensure soundness, SPASS-SATT represents all numbers with the help of the *arbitrary-precision arithmetic library* FLINT [31]. This means all calculations, including the implication test and the sample point heuristic, are always exact and thus free of numerical errors. The most relevant part of SPASS-SPL with regard to


**Table 1.** Overview of how many clause pairs advance in the pipeline (top to bottom)

**Table 2.** An overview of the accuracy of non-perfect pipeline stages


this paper is that it performs tautology and subsumption deletion to eliminate redundant clauses. As a preprocessing step, SPASS-SPL eliminates all tautologies from the set of input clauses. Similarly, the function resolvents(C, N) (see Line 4 of Algorithm 1) filters out all newly derived clauses that are tautologies. Note that we also use these tautology checks to eliminate all excess variables and to store sample solutions for all remaining clauses. After each iteration of the algorithm, we also check for subsumed clauses. We first eliminate newly generated clauses by forward subsumption (see Line 6 of Algorithm 1), then use the remaining clauses for backward subsumption (see Line 8 of Algorithm 1).
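A propositional toy version of Algorithm 1's main loop (hierarchic resolution and constraints abstracted away; the clause representation and all names are ours) can be sketched as follows:

```python
# Propositional sketch of Algorithm 1: unit resolution with forward and
# backward subsumption. Clauses are frozensets of non-zero ints
# (DIMACS-style literals); subsumption degenerates to set inclusion.

def resolvents(unit, clauses):
    """All unit resolvents of the unit clause against a clause set."""
    (lit,) = unit
    return {c - {-lit} for c in clauses if -lit in c}

def saturate(clauses):
    n = set(clauses)
    u = {c for c in n if len(c) == 1}
    while u:
        m = set()
        for c in u:
            m |= resolvents(c, n)
        if frozenset() in m:
            return "unsat"                     # derived the empty clause
        # Forward subsumption: drop new clauses subsumed by old ones.
        m = {c for c in m if not any(d <= c for d in n)}
        if not m:
            return "unknown"                   # nothing new: saturated
        # Backward subsumption: drop old clauses strictly subsumed by new ones.
        n = {c for c in n if not any(d <= c and d != c for d in m)}
        u = {c for c in m if len(c) == 1}
        n |= m
    return "unknown"

print(saturate([frozenset({1}), frozenset({-1, 2}), frozenset({-2})]))  # unsat
print(saturate([frozenset({1}), frozenset({2})]))                       # unknown
```

As in Algorithm 1, the loop only resolves against unit clauses, so (like the hierarchic original) it is refutationally complete only for Horn-style inputs.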

*Benchmarks.* Our benchmark instances come from three different applications: (1) a supervisor for an automobile lane change assistant, formulated in the Horn fragment of BS(LRA) [9,10] (five instances, referred to as lc in aggregate); (2) the formalization of reachability for non-deterministic timed automata, formulated in the non-Horn fragment of BS(LRA) [20] (one instance, referred to as tad); and (3) formalizations of variants of mutual exclusion protocols, such as the bakery protocol [38], also formulated in the non-Horn fragment of BS(LRA) [19] (one instance, referred to as bakery). The machine used for benchmarking features an Intel Xeon W-1290P CPU (10 cores, 20 threads, up to 5.2 GHz) and 64 GiB of DDR4-2933 ECC main memory. Runtime was limited to ten minutes; memory usage was not limited.


**Table 3.** Evaluation of the sample point heuristic

*Evaluation.* In Table 1 we give an overview of how many pairs of clauses advance how far in the pipeline (in thousands). Rows with grey background refer to a stage of the pipeline and show which portion of pairs of clauses were kept relative to the previous stage. Rows with white background refer to (virtual) sets of clauses, their absolute size, and their size relative to the number of attempted tests, as well as the condition(s) established. The three groups of columns refer to groups of benchmark instances. Results vary greatly between lc and the aggregate of bakery and tad. In lc the relative number of subsumed clauses is significantly smaller (0.0027% compared to 0.0416%). FO Matching eliminates a large number of pairs in lc because the number of predicate symbols and their arities (lc1, ..., lc4: 36 predicates, arities up to 5; lc5: 53 predicates, arities up to 12) are greater than in bakery (11 predicates, all of arity 2) and tad (4 predicates, all of arity 2).

*Binary Classifiers.* To evaluate the performance of each stage of the proposed test pipeline, we view each stage individually as a binary classifier on pairs of constrained clauses. The two classes we consider are "subsumes" (positive outcome) and "does not subsume" (negative outcome). Each stage of the pipeline computes a *prediction* of the *actual* result of the overall pipeline. We are thus interested in minimizing two kinds of errors: (1) when one stage of the pipeline predicts that the subsumption test will succeed (the prediction is positive) but it fails (the actual result is negative), called a *false positive* (FP), and (2) when one stage of the pipeline predicts that the subsumption test will fail (the prediction is negative) but it succeeds (the actual result is positive), called a *false negative* (FN). Dually, a correct prediction is called a *true positive* (TP) or *true negative* (TN). For each stage, at least one kind of error is excluded by design: first-order filtering and the sample point heuristic never produce false negatives. The empty conclusion heuristic never produces false positives. The empty premise test is perfect, i.e., it produces neither false positives nor false negatives, with the caveat of not always being applicable. The last stage (implication test) decides the overall result of the pipeline and thus is also perfect. For the evaluation of binary classifiers, we use four different measures (two symmetric pairs):

$$\text{SPC} = \text{TN} \div (\text{TN} + \text{FP}) \qquad \text{PPV} = \text{TP} \div (\text{TP} + \text{FP}) \qquad (2)$$

The first pair, *specificity* (SPC) and *positive predictive value* (PPV), see (2), is relevant only in the presence of false positives (the measures approach 1 as FP approaches 0).

$$\text{SEN} = \text{TP} \div (\text{TP} + \text{FN}) \qquad \qquad \text{NPV} = \text{TN} \div (\text{TN} + \text{FN}) \qquad \qquad (3)$$

The second pair, *sensitivity* (SEN) and *negative predictive value* (NPV), see (3), is relevant only in the presence of false negatives (the measures approach 1 as FN approaches 0). Specificity (resp. sensitivity) might be considered the "success rate" in our setup. They answer the question: "Given that the *actual* result of the pipeline is 'not subsumed' (resp. 'subsumed'), in how many cases does this stage *predict* correctly?" A specificity (resp. sensitivity) of 0.99 means that the classifier produces a false positive (resp. negative), i.e., a wrong prediction, in one out of one hundred cases. Both measures are independent of the prevalence of particular actual results, i.e., the measures are not biased by instances that feature many (or few) subsumed clauses. Positive and negative predictive value, on the other hand, are biased by prevalence. They answer the following question: "Given that this stage of the pipeline *predicts* 'subsumed' (resp. 'not subsumed'), how likely is it that the *actual* result is indeed 'subsumed' (resp. 'not subsumed')?"
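For concreteness, the four measures from (2) and (3) can be computed from a confusion matrix as follows (a minimal sketch; function and variable names are our own):

```python
# The four binary-classifier measures from (2) and (3), computed from the
# confusion-matrix counts TP, FP, TN, FN.

def measures(tp, fp, tn, fn):
    return {
        "SPC": tn / (tn + fp),   # specificity
        "PPV": tp / (tp + fp),   # positive predictive value
        "SEN": tp / (tp + fn),   # sensitivity
        "NPV": tn / (tn + fn),   # negative predictive value
    }

# A stage that produces one false positive per hundred actual negatives
# has specificity 0.99, independently of how prevalent subsumption is:
m = measures(tp=5, fp=1, tn=99, fn=0)
print(round(m["SPC"], 2))  # 0.99
print(m["SEN"])            # 1.0 -- no false negatives
```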

In Table 2 we present, for all non-perfect stages of the pipeline, specificity (for those that produce false positives) and sensitivity (for those that produce false negatives), as well as the (positive/negative) predictive value. Note that the sample point heuristic has an exceptionally high specificity, still above 93% in the benchmarks where it performed worst. For the benchmarks bakery and tad it even performs perfectly. Combined, this gives a specificity of above 99.99%. Considering FO Filtering, we expect limited performance, since the structure of terms in BS is flat compared to the rich tree structure of terms in full first-order logic. This is evidenced by a comparatively low specificity of 35%. However, this classifier is very easy to compute, so it pays for itself. FO Matching is a much better classifier, at an aggregate sensitivity of 93%. Even though the underlying problem is NP-complete, this is not problematic in practice.

*Runtime.* In Table 3 we focus on the runtime improvement achieved by the sample point heuristic. In the first two lines (Bottleneck), we highlight how much slower testing implication of constraints (the C-phase) is compared to treating the first-order part (the FO-phase). This is the time taken for the C-phase per pair of clauses (that reach at least the first C-phase stage) divided by the time taken for the FO-phase per pair of clauses. We see that without the sample point heuristic, we can expect the constraint implication test to take hundreds to thousands of times longer than the FO-phase. Adding the sample point heuristic decreases this ratio to below one hundred. In the fourth line (avg. pipeline runtime), we do not give a ratio, but the average time it takes to compute the whole pipeline. We achieve millions of subsumption checks per second. In the fifth line (Speedup), we take the time that all C-phases combined take per pair of clauses that reach at least the first C-phase, and compare it to the same time without the sample point heuristic. In the sixth line (Benefit-to-cost), we compare the time taken to compute the sample points with the time they save. The benefit is about two orders of magnitude greater than the cost.

# **5 Conclusion**

Our next step will be the integration of the subsumption test into the backward subsumption procedure of an SCL-based reasoning procedure for BS(LRA) [11], which is currently under development.

There are various ways to improve the sample point heuristic. One improvement would be to store and check multiple sample points per clause. For instance, whenever the sample point heuristic fails and the implication test for Λ2 → (Λ1σ) also fails, we can store the solution to (1) as an additional sample point for Λ2. The new sample point will filter out any future implication tests with Λ1σ or similar constraints. However, testing too many sample points might lead to costs outweighing benefits. A potential solution to this problem would be score-based garbage collection, as done in SAT solvers [57]. Another way to store and check multiple sample points per clause is to store a compact description of a set of points that is easy to check against. For instance, we can store the center point and edge length of the largest orthogonal hypercube contained in the solutions of a constraint, which is equivalent to infinitely many sample points. Computing the largest orthogonal hypercube for an LRA constraint is not much harder than finding a sample solution [14]. Checking whether a cube is contained in an LRA constraint works almost the same as evaluating a sample point [14].
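The cube check reduces to a slightly strengthened point evaluation: a hypercube with center z and radius r (in the maximum norm) lies inside the solutions of Σ ai·xi ≤ c exactly if the worst corner does, i.e., if Σ ai·zi + r·Σ|ai| ≤ c. A sketch (representation and names are ours, following the cube tests of [14]):

```python
from fractions import Fraction

# Cube containment in an LRA constraint  sum_x coeffs[x]*x <= bound:
# the hypercube with center z and max-norm radius r is contained iff
#   sum_x coeffs[x]*z[x] + r * sum_x |coeffs[x]|  <=  bound,
# which costs barely more than evaluating the single point z.

def cube_in_constraint(center, r, coeffs, bound):
    worst = (sum(Fraction(a) * Fraction(center[x]) for x, a in coeffs.items())
             + Fraction(r) * sum(abs(Fraction(a)) for a in coeffs.values()))
    return worst <= bound

def cube_in_constraints(center, r, constraint_set):
    return all(cube_in_constraint(center, r, co, b) for co, b in constraint_set)

center, r = {"x": 1, "y": 1}, Fraction(1, 2)
# The unit cube centered at (1, 1) lies inside x + y <= 3 ...
print(cube_in_constraints(center, r, [({"x": 1, "y": 1}, 3)]))  # True
# ... but not inside x + y <= 2: the corner (3/2, 3/2) violates it.
print(cube_in_constraints(center, r, [({"x": 1, "y": 1}, 2)]))  # False
```

With r = 0 the check degenerates to the plain sample point evaluation used above.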

Although we developed our sample point technique for the BS(LRA) fragment, it is obvious that it will also work for the overall FOL(LRA) clause fragment, because this extension does not affect the LRA constraint part of clauses. From an automated reasoning perspective, satisfiability of both the FOL(LRA) and the BS(LRA) fragment (as clause sets) is undecidable. Actually, satisfiability of a BS(LRA) clause set is already undecidable if the first-order part is restricted to a single monadic predicate [32]. The first-order part of BS(LRA) on its own is decidable, however, and therefore enables effective guidance for an overall reasoning procedure [11]. From an application perspective, the BS(LRA) fragment already encompasses a number of (sub)languages used in practice. For example, timed automata [3] and a number of extensions thereof are contained in the BS(LRA) fragment [60].

We also believe that the sample point heuristic will speed up the constraint implication test for FOL(LIA), first-order clauses over linear integer arithmetic, FOL(NRA), i.e., first-order clauses over non-linear real arithmetic, and other combinations of FOL with arithmetic theories. However, the non-linear case will require a more sophisticated setup due to the nature of test points in this case, e.g., a solution may contain root expressions.

**Acknowledgments.** This work was partly funded by DFG grant 389792660 as part of TRR 248, see https://perspicuous-computing.science. We thank the anonymous reviewers for their thorough reading and detailed constructive comments. Martin Desharnais suggested some textual improvements.

# **References**

1. Alagi, G., Weidenbach, C.: NRCL - a model building approach to the Bernays-Schönfinkel fragment. In: Lutz, C., Ranise, S. (eds.) FroCoS 2015. LNCS (LNAI), vol. 9322, pp. 69–84. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24246-0_5


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Ground Joinability and Connectedness in the Superposition Calculus**

André Duarte and Konstantin Korovin

The University of Manchester, Manchester, UK
{andre.duarte,konstantin.korovin}@manchester.ac.uk

**Abstract.** Problems in many theories axiomatised by unit equalities (UEQ), such as groups, loops, lattices, and other algebraic structures, are notoriously difficult for automated theorem provers to solve. Consequently, there has been considerable effort over decades in developing techniques to handle these theories, notably in the context of Knuth-Bendix completion and its derivatives. The superposition calculus is a generalisation of completion to full first-order logic; however, it does not carry over all the refinements that were developed for completion, and is therefore not a strict generalisation. This means that (i) as of today, even state-of-the-art provers for first-order logic based on the superposition calculus, while more general, are outperformed in UEQ by provers based on completion, and (ii) the sophisticated techniques developed for completion are not available for any problem outside UEQ. In particular, this includes key simplifications such as ground joinability, which have been known for more than 30 years. In fact, all previous completeness proofs for ground joinability rely on proof orderings and proof reductions, which are not easily extensible to general clauses together with redundancy elimination. In this paper we address this limitation: we extend superposition with ground joinability and show that, under an adapted notion of redundancy, simplifications based on ground joinability preserve completeness. Another recently explored simplification in completion is connectedness. We extend this notion to "ground connectedness" and show superposition is complete with both connectedness and ground connectedness. We implemented ground joinability and connectedness in a theorem prover, iProver, the former using a novel algorithm which we also present in this paper, and evaluated over the TPTP library with encouraging results.

**Keywords:** Superposition · Ground joinability · Connectedness · Closure redundancy · First-order theorem proving

## **1 Introduction**

Automated theorem provers based on equational completion [4], such as Waldmeister, MædMax or Twee [13,21,25], routinely outperform superposition-based provers on unit equality problems (UEQ) in competitions such as CASC [22], despite the fact that the superposition calculus was developed as a generalisation of completion to full clausal first-order logic with equality [19]. One of the main ingredients for their good performance is the use of ground joinability criteria for the deletion of redundant equations [1], among other techniques. However, existing proofs of refutational completeness of deduction calculi wrt. these criteria are restricted to unit equalities and rely on proof orderings and proof reductions [1,2,4], which are not easily extensible to general clauses together with redundancy elimination.

Since completion provers perform very poorly (or not at all) on non-UEQ problems (relying at best on incomplete transformations to unit equality [8]), this motivates an attempt to transfer those techniques to the superposition calculus and prove their completeness, so as to combine the generality of the superposition calculus with the powerful simplification rules of completion. To our knowledge, no prover for first-order logic incorporates ground joinability redundancy criteria, except for particular theories such as associativity-commutativity (AC) [20].

For instance, if *f*(*x, y*) ≈ *f*(*y, x*) is an axiom, then the equation *f*(*x, f*(*y, z*)) ≈ *f*(*x, f*(*z, y*)) is redundant, but this cannot be justified by any simplification rule in the superposition calculus. On the other hand, a completion prover which implements ground joinability can easily delete the latter equation wrt. the former. We show that ground joinability can be enabled in the superposition calculus without compromising completeness.

As another example, the simplification rule in completion can use *f*(*x*) ≈ *s* (when *f*(*x*) ≻ *s*) to rewrite *f*(*a*) ≈ *t* regardless of how *s* and *t* compare, while the corresponding demodulation rule in superposition can only rewrite if *s* ≺ *t*. Our "encompassment demodulation" rule matches the former, while also being complete in the superposition calculus.

In [11] we introduced a novel theoretical framework for proving completeness of the superposition calculus, based on an extension of Bachmair-Ganzinger model construction [5], together with a new notion of redundancy called "closure redundancy". We used it to prove that certain AC joinability criteria, long used in the context of completion [1], could also be incorporated in the superposition calculus for full first-order logic while preserving completeness.

In this paper, we extend this framework to show the completeness of the superposition calculus extended with: (i) a general ground joinability simplification rule, (ii) an improved encompassment demodulation simplification rule, (iii) a connectedness simplification rule extending [3,21], and (iv) a new ground connectedness simplification rule. The proof of completeness that enables these extensions is based on a new encompassment closure ordering. In practice, these extensions help superposition be competitive with completion on UEQ problems, and improve the performance on non-UEQ problems, which currently do not benefit from these techniques at all.

We also present a novel incremental algorithm to check ground joinability, which is very efficient in practice; this is important since ground joinability can be an expensive criterion to test. Finally, we discuss some of the experimental results we obtained after implementing these techniques in iProver [10,16].

The paper is structured as follows. In Sect. 2 we define some basic notions to be used throughout the paper. In Sect. 3 we define the closure ordering we use to prove redundancies. In Sect. 4 we present redundancy criteria for demodulation, ground joinability, connectedness, and ground connectedness. We prove their completeness in the superposition calculus, and discuss a concrete algorithm for checking ground joinability, and how it may improve on the algorithms used in e.g. Waldmeister [13] or Twee [21]. In Sect. 5 we discuss experimental results.

## **2 Preliminaries**

We consider a signature consisting of a finite set of function symbols and the equality predicate as the only predicate symbol. We fix a countably infinite set of variables. First-order *terms* are defined in the usual manner. Terms without variables are called *ground* terms. A *literal* is an unordered pair of terms with either positive or negative polarity, written *s* ≈ *t* and *s* ≉ *t* respectively (we write *s* ≈̇ *t* to mean either of the two). A *clause* is a multiset of literals. Collectively, terms, literals, and clauses will be called *expressions*.

A *substitution* is a mapping from variables to terms which is the identity for all but finitely many variables. An injective substitution onto variables is called a *renaming*. If *e* is an expression, we denote application of a substitution *σ* by *eσ*, replacing all variables with their image in *σ*. Let GSubs(*e*) = {*σ* | *eσ* is ground} be the set of *ground substitutions* for *e*. Overloading this notation for sets, we write GSubs(*E*) = {*σ* | ∀*e* ∈ *E*. *eσ* is ground}. Finally, we write e.g. GSubs(*e*<sub>1</sub>, *e*<sub>2</sub>) instead of GSubs({*e*<sub>1</sub>, *e*<sub>2</sub>}). The identity substitution is denoted by ε.
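As an illustration of these definitions, here is a minimal sketch (not from the paper; it assumes an encoding of terms where variables are Python strings and function applications are tuples `('f', t1, ..., tn)`, with constants as 0-ary tuples like `('a',)`):

```python
# Sketch of substitution application and groundness, under an assumed
# tuple encoding of first-order terms (variables = strings).

def apply_subst(term, sigma):
    """e*sigma: replace every variable by its image under sigma."""
    if isinstance(term, str):            # a variable
        return sigma.get(term, term)     # identity outside dom(sigma)
    return (term[0],) + tuple(apply_subst(t, sigma) for t in term[1:])

def variables(term):
    """The set of variables occurring in a term."""
    if isinstance(term, str):
        return {term}
    if len(term) == 1:                   # a constant
        return set()
    return set().union(*(variables(t) for t in term[1:]))

def is_ground(term):
    """A term is ground iff it contains no variables; sigma is in GSubs(e)
    exactly when is_ground(apply_subst(e, sigma)) holds."""
    return not variables(term)
```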

A substitution *θ* is *more general* than *σ* if *θρ* = *σ* for some substitution *ρ* which is not a renaming. If *s* and *t* can be *unified*, that is, if there exists *σ* such that *sσ* = *tσ*, then there also exists a *most general unifier*, written mgu(*s, t*). A term *s* is said to be *more general* than *t* if there exists a substitution *θ* such that *sθ* = *t* but there is no substitution *σ* such that *tσ* = *s*. Two terms *s* and *t* are said to be *equal modulo renaming* if there exist injective *θ, σ* such that *sθ* = *t* and *tσ* = *s*. The relations "less general than", "equal modulo renaming", and their union are represented respectively by the symbols ⊐, ≡, and ⊒ (so *s* ⊐ *t* means *s* is strictly less general than *t*).
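A standard syntactic unification procedure can be sketched as follows (this is not the paper's implementation; it reuses the assumed tuple encoding of terms with variables as strings, and all names are ours). It returns a most general unifier as a dictionary of bindings, or `None` when no unifier exists:

```python
# Sketch: syntactic unification with occurs check, producing an mgu as a
# (triangular) set of bindings, i.e. bindings may refer to other bindings.

def walk(term, sigma):
    """Follow variable bindings in sigma until a non-bound term is reached."""
    while isinstance(term, str) and term in sigma:
        term = sigma[term]
    return term

def occurs(v, term, sigma):
    """Occurs check: does variable v occur in term (under sigma)?"""
    term = walk(term, sigma)
    if isinstance(term, str):
        return term == v
    return any(occurs(v, t, sigma) for t in term[1:])

def mgu(s, t):
    """Most general unifier of s and t, or None if they are not unifiable."""
    sigma = {}
    stack = [(s, t)]
    while stack:
        a, b = stack.pop()
        a, b = walk(a, sigma), walk(b, sigma)
        if a == b:
            continue
        if isinstance(a, str):
            if occurs(a, b, sigma):
                return None              # occurs check fails: no unifier
            sigma[a] = b
        elif isinstance(b, str):
            stack.append((b, a))         # flip so the variable is on the left
        elif a[0] == b[0] and len(a) == len(b):
            stack.extend(zip(a[1:], b[1:]))
        else:
            return None                  # clash of function symbols
    return sigma
```

Note the bindings are triangular: to fully instantiate a term one follows bindings with `walk` rather than applying the dictionary once.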

A more refined notion of instance is that of *closure* [6]. Closures are pairs *e* · *σ* that are said to *represent* the expression *eσ* while retaining information about the original term and its instantiation. Closures where *eσ* is ground are said to be *ground closures*. Let GClos(*e*) = {*e* · *σ* | *eσ* is ground} be the set of ground closures of *e*. Overloading the notation for sets, if *N* is a set of clauses then GClos(*N*) = ⋃<sub>*C*∈*N*</sub> GClos(*C*).

We write *s*[*t*] if *t* is a *subterm* of *s*. If additionally *s* ≠ *t*, then it is a *strict subterm*. We denote these relations by *s* ⊵ *t* and *s* ⊳ *t* respectively. We write *s*[*t* → *t*′] to denote the term obtained from *s* by replacing all occurrences of *t* by *t*′.

A (strict) partial order ≻ is a binary relation which is transitive (*a* ≻ *b* ≻ *c* ⇒ *a* ≻ *c*), irreflexive (*a* ⊁ *a*), and asymmetric (*a* ≻ *b* ⇒ *b* ⊁ *a*). A (non-strict) partial preorder (or quasiorder) ≽ is any transitive, reflexive relation. A (pre)order is total over *X* if ∀*x, y* ∈ *X*. *x* ≽ *y* ∨ *y* ≽ *x*. Whenever a non-strict (pre)order ≽ is given, the induced equivalence relation ∼ is ≽ ∩ ≼, and the induced strict (pre)order ≻ is ≽ \ ∼. The *transitive closure* of a relation *R*, the smallest transitive relation that contains *R*, is denoted by *R*<sup>+</sup>. A *transitive reduction* of a relation *R*, a smallest relation whose transitive closure is *R*, is denoted by *R*<sup>−</sup>.

For an ordering ≻ over a set *X*, its *multiset extension* ≻≻ over multisets of *X* is given by: *A* ≻≻ *B* iff *A* ≠ *B* and ∀*x* ∈ *B*. *B*(*x*) > *A*(*x*) ⇒ ∃*y* ∈ *A*. *y* ≻ *x* ∧ *A*(*y*) > *B*(*y*), where *A*(*x*) is the number of occurrences of element *x* in multiset *A* (we also use ≽≽ for the multiset extension of ≽). It is well known that the multiset extension of a well-founded/total order is also a well-founded/total order, respectively [9]. The *(n-fold) lexicographic extension* of ≻ over *X*, denoted ≻<sup>lex</sup> over ordered *n*-tuples of *X*, is given by ⟨*x*<sub>1</sub>, ..., *x*<sub>n</sub>⟩ ≻<sup>lex</sup> ⟨*y*<sub>1</sub>, ..., *y*<sub>n</sub>⟩ iff ∃*i*. *x*<sub>1</sub> = *y*<sub>1</sub> ∧ ⋯ ∧ *x*<sub>*i*−1</sub> = *y*<sub>*i*−1</sub> ∧ *x*<sub>i</sub> ≻ *y*<sub>i</sub>. The lexicographic extension of a well-founded/total order is also a well-founded/total order, respectively.
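The multiset extension check can be transcribed directly from this definition; a minimal sketch (our names; `gt` is any strict order supplied as a predicate):

```python
# Sketch: the (Dershowitz-Manna style) multiset extension of a strict order.
from collections import Counter

def multiset_gt(gt, A, B):
    """A >> B iff A != B and every element with more occurrences in B than
    in A is dominated by some y with more occurrences in A than in B."""
    a, b = Counter(A), Counter(B)
    if a == b:
        return False
    return all(
        any(gt(y, x) and a[y] > b[y] for y in a)
        for x in b if b[x] > a[x]
    )
```

For instance, over the natural numbers, {5} is greater than {3, 3, 1}: the single 5 dominates every element that B has in excess.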

A binary relation → over the set of terms is a *rewrite relation* if (i) *l* → *r* ⇒ *lσ* → *rσ* and (ii) *l* → *r* ⇒ *s*[*l*] → *s*[*l* → *r*]. The *reflexive-transitive closure* of a relation is the smallest reflexive-transitive relation which contains it; it is denoted by →<sup>∗</sup>. Two terms are *joinable* (*s* ↓ *t*) if *s* →<sup>∗</sup> *u* <sup>∗</sup>← *t* for some term *u*.
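A minimal sketch of exhaustive rewriting and the joinability check (our simplification, not the paper's machinery: rules are restricted to ground left- and right-hand sides, so matching is plain equality; with a terminating and confluent system, joinability reduces to comparing normal forms):

```python
# Sketch: ground rewriting to normal form, and joinability of two terms.
# Terms use the tuple encoding ('f', t1, ..., tn); rules are (lhs, rhs) pairs.

def rewrite_step(term, rules):
    """One step of ->: rewrite the first subterm equal to some ground lhs.
    Returns the rewritten term, or None if no rule applies anywhere."""
    for l, r in rules:
        if term == l:
            return r
    if isinstance(term, tuple):
        for i, t in enumerate(term[1:], start=1):
            new = rewrite_step(t, rules)
            if new is not None:
                return term[:i] + (new,) + term[i + 1:]
    return None

def normal_form(term, rules):
    """Exhaustively apply ->; terminates when rules decrease a reduction order."""
    while True:
        new = rewrite_step(term, rules)
        if new is None:
            return term
        term = new

def joinable(s, t, rules):
    """s and t are joinable if they rewrite to a common term; for a confluent,
    terminating system it suffices to compare normal forms."""
    return normal_form(s, rules) == normal_form(t, rules)
```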

If a rewrite relation is also a strict ordering, then it is a *rewrite ordering*. A *reduction ordering* is a rewrite ordering which is well-founded. In this paper we consider reduction orderings which are total on ground terms; such orderings are also *simplification orderings*, i.e., they satisfy *s* ⊳ *t* ⇒ *s* ≻ *t*.

### **3 Ordering**

In [11] we presented a novel proof of completeness of the superposition calculus based on the notion of closure redundancy, which enables the completeness of stronger redundancy criteria to be shown, including AC normalisation, AC joinability, and encompassment demodulation. In this paper we use a slightly different closure ordering (≻<sub>cc</sub>), in order to extract better completeness conditions for the redundancy criteria that we present (the definitions of closure redundant clause and closure redundant inference are parametrised by this ≻<sub>cc</sub>).

Let ≻<sub>t</sub> be a simplification ordering which is total on ground terms. We extend this first to an ordering on ground term closures, then to an ordering on ground clause closures. Let

$$s \cdot \sigma \succ\_{tc'} t \cdot \rho \qquad\qquad\text{iff}\qquad\begin{array}{c} \text{either } s\sigma \succ\_t t\rho\\ \text{or else } s\sigma = t\rho \text{ and } s \sqsupset t, \end{array}\tag{1}$$

where *sσ* and *tρ* are ground, and let ≻<sub>tc</sub> be an (arbitrary) total well-founded extension of ≻<sub>tc′</sub>. We extend this to an ordering on clause closures. First let

$$M\_{lc}((s \approx t) \cdot \theta) = \{s\theta \cdot \epsilon, t\theta \cdot \epsilon\},\tag{2}$$

$$M\_{lc}((s\not\simeq t)\cdot\theta) = \{s\theta\cdot\epsilon, s\theta\cdot\epsilon, t\theta\cdot\epsilon, t\theta\cdot\epsilon\},\tag{3}$$

and let *M*<sub>cc</sub> be defined as follows, for the empty clause, unit clauses, and non-unit clauses respectively:

$$M\_{cc}(\emptyset \cdot \theta) = \emptyset,\tag{4}$$

$$M\_{cc}((s \approx t) \cdot \theta) = \{\{s \cdot \theta\}, \{t \cdot \theta\}\},\tag{5}$$

$$M\_{cc}((s\not\simeq t)\cdot\theta) = \{\{s\cdot\theta, t\cdot\theta, s\theta\cdot\epsilon, t\theta\cdot\epsilon\}\},\tag{6}$$

$$M\_{cc}((s \dot{\approx} t \vee \cdots) \cdot \theta) = \{M\_{lc}(L \cdot \theta) \mid L \in (s \dot{\approx} t \vee \cdots)\},\tag{7}$$

then ≻<sub>cc</sub> is defined by

$$C \cdot \sigma \succ\_{cc} D \cdot \rho \qquad\qquad\text{iff}\quad M\_{cc}(C \cdot \sigma) \mathrel{\succ\!\!\succ\_{tc}} M\_{cc}(D \cdot \rho), \tag{8}$$

where ≻≻<sub>tc</sub> denotes the two-fold multiset extension of ≻<sub>tc</sub>.

The main purpose of this definition is twofold: (i) when *sθ* ≻<sub>t</sub> *tθ* and *u* is a term occurring in a clause *D*, then *sθ* ≺<sub>t</sub> *u*, or *s* ⊏ *sθ* = *u*, implies (*s* ≈ *t*) · *θρ* ≺<sub>cc</sub> *D* · *ρ*; and (ii) when *C* is a positive unit clause, *D* is not, *s* is the maximal subterm in *Cθ* and *t* is the maximal subterm in *Dσ*, then *s* ≼<sub>t</sub> *t* implies *C* · *θ* ≺<sub>cc</sub> *D* · *σ*. These two properties enable unconditional rewrites via oriented unit equations on positive unit clauses to succeed whenever they would also succeed in unfailing completion [4], and rewrites on negative unit and non-unit clauses to always succeed. This will enable us to prove the correctness of the simplification rules presented in the following section.

### **4 Redundancies**

In this section we present several redundancy criteria for the superposition calculus and prove their completeness. Recall the definitions in [11]: a clause *C* is redundant in a set *S* if all its ground closures *C* · *θ* follow from closures in GClos(*S*) which are smaller wrt. ≻<sub>cc</sub>; an inference *C*<sub>1</sub>, ..., *C*<sub>n</sub> ⊢ *D* is redundant in a set *S* if, for all *θ* ∈ GSubs(*C*<sub>1</sub>, ..., *C*<sub>n</sub>, *D*) such that *C*<sub>1</sub>*θ*, ..., *C*<sub>n</sub>*θ* ⊢ *Dθ* is a valid inference, the closure *D* · *θ* follows from closures in GClos(*S*) such that each is smaller than some *C*<sub>1</sub> · *θ*, ..., *C*<sub>n</sub> · *θ*. These definitions (in terms of ground closures rather than in terms of ground clauses, as in [19]) arise because they enable us to justify stronger redundancy criteria for application in superposition theorem provers, including the AC criteria developed in [11] and the criteria in this section.

**Theorem 1.** The superposition calculus [19] is refutationally complete wrt. closure redundancy, that is, if a set of clauses is saturated up to closure redundancy (meaning any inference with non-redundant premises in the set is redundant) and does not contain the empty clause, then it is satisfiable.

*Proof.* The proof of completeness of the superposition calculus wrt. this closure ordering carries over from [11] with some modifications, which are presented in a full version of this paper [12].

#### **4.1 Encompassment Demodulation**

We introduce the following definition, to be re-used throughout the paper.

**Definition 1.** A rewrite via *l* ≈ *r* in a clause *C*[*lθ*] is *admissible* if one of the following conditions holds: (i) *C* is not a positive unit, or (letting *C* = *s*[*lθ*] ≈ *t* for some *θ*) (ii) *lθ* ≠ *s*, or (iii) *lθ* ⊐ *l*, or (iv) *s* ≺<sub>t</sub> *t*, or (v) *rθ* ≺<sub>t</sub> *t*.<sup>1</sup>

<sup>1</sup> We note that (iv) is superfluous, but we include it since in practice it is easier to check, as it is local to the clause being rewritten and therefore needs to be checked only once, while (v) needs to be checked with each demodulation attempt.

We then have

$$\begin{array}{ll}\text{Encompassment} \\ \text{Demodulation}\end{array} \quad \frac{l \approx r \qquad C[l\theta]}{C[l\theta \mapsto r\theta]} \qquad \begin{array}{l}\text{where } l\theta \succ\_{t} r\theta \text{, and the}\\ \text{rewrite via } l \approx r \text{ in } C \text{ is admissible.}\end{array}\tag{9}$$

In other words, given an equation *l* ≈ *r*, if an instance *lθ* is a subterm of *C*, then the rewrite is admissible (meaning, for example, that an unconditional rewrite is allowed when *lθ* ≻<sub>t</sub> *rθ*) if *C* is not a positive unit, or if *lθ* occurs at a strict subterm position, or if *lθ* is strictly less general than *l*, or if *lθ* occurs outside the maximal side, or if *rθ* is smaller than the other side. This restriction is much weaker than the one given for the usual demodulation rule in superposition [17], and equivalent to the one in equational completion when we restrict ourselves to unit equalities [4].

*Example 1.* If *f*(*x*) ≻<sub>t</sub> *s*, we can use *f*(*x*) ≈ *s* to rewrite *f*(*x*) ≈ *t* when *s* ≺<sub>t</sub> *t*, and to rewrite *f*(*a*) ≈ *t*, *f*(*x*) ≉ *t*, or *f*(*x*) ≈ *t* ∨ *C* regardless of how *s* and *t* compare.
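The admissibility conditions of Definition 1 amount to a simple boolean test; a sketch (the parameter names are ours, and each argument is assumed to be decided beforehand by the prover's term order and generality checks):

```python
# Sketch: the admissibility test of Definition 1 for a rewrite via l ~ r
# at subterm l*theta of a clause, where s is the rewritten side and t the
# other side of a positive unit equation s ~ t.

def admissible_rewrite(clause_is_positive_unit,  # negation of condition (i)
                       at_top,                   # l*theta is the whole side s
                       ltheta_proper_instance,   # (iii): l*theta strictly less general than l
                       s_lt_t,                   # (iv): s <_t t, i.e. s is not the maximal side
                       rtheta_lt_t):             # (v): r*theta <_t t
    """A rewrite is admissible if the clause is not a positive unit (i), or
    the rewrite is strictly below the top (ii), or any of (iii)-(v) holds."""
    if not clause_is_positive_unit:
        return True                               # condition (i)
    return (not at_top                            # condition (ii)
            or ltheta_proper_instance             # condition (iii)
            or s_lt_t                             # condition (iv)
            or rtheta_lt_t)                       # condition (v)
```

For instance, a top rewrite of the maximal side of a positive unit is rejected unless *lθ* is a proper instance of *l* or *rθ* is smaller than the other side, which is exactly the situation of Example 1.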

#### **4.2 General Ground Joinability**

In [11] we developed redundancy criteria for the theory of AC functions in the superposition calculus. In this section we extend these techniques to develop redundancy criteria for ground joinability in arbitrary equational theories.

**Definition 2.** Two terms are *strongly joinable* (*s* ⇓ *t*) in a clause *C* wrt. a set of equations *S* if either *s* = *t*, or *s* → *s*[*l*<sub>1</sub>*σ*<sub>1</sub> → *r*<sub>1</sub>*σ*<sub>1</sub>] →<sup>∗</sup> *t* via rules *l*<sub>i</sub> ≈ *r*<sub>i</sub> ∈ *S*, where the rewrite via *l*<sub>1</sub> ≈ *r*<sub>1</sub> is admissible in *C*, or *s* → *s*[*l*<sub>1</sub>*σ*<sub>1</sub> → *r*<sub>1</sub>*σ*<sub>1</sub>] ↓ *t*[*l*<sub>2</sub>*σ*<sub>2</sub> → *r*<sub>2</sub>*σ*<sub>2</sub>] ← *t* via rules *l*<sub>i</sub> ≈ *r*<sub>i</sub> ∈ *S*, where the rewrites via *l*<sub>1</sub> ≈ *r*<sub>1</sub> and *l*<sub>2</sub> ≈ *r*<sub>2</sub> are admissible in *C*. To make the ordering explicit, we may write *s* ⇓<sub>≻</sub> *t*. Two terms are *strongly ground joinable* (*s* ⇓<sub>g</sub> *t*) in a clause *C* wrt. a set of equations *S* if for all *θ* ∈ GSubs(*s, t*) we have *sθ* ⇓ *tθ* in *C* wrt. *S*.

We then have:

**Theorem 2.** Ground joinability is a sound and admissible redundancy criterion of the superposition calculus wrt. closure redundancy.

*Proof.* We will show the positive case first. If *s* ⇓<sub>g</sub> *t*, then for any instance (*s* ≈ *t* ∨ *C*) · *θ* we either have *sθ* = *tθ*, and therefore ∅ |= (*s* ≈ *t*) · *θ*, or we have wlog. *sθ* ≻<sub>t</sub> *tθ*, with *sθ* ↓ *tθ*. Then *sθ* and *tθ* can be rewritten to the same normal form *u* by *l*<sub>i</sub>*σ*<sub>i</sub> → *r*<sub>i</sub>*σ*<sub>i</sub> where *l*<sub>i</sub> ≈ *r*<sub>i</sub> ∈ *S*. Since *u* ≺<sub>t</sub> *sθ* and *u* ≼<sub>t</sub> *tθ*, (*s* ≈ *t* ∨ *C*) · *θ* follows from the smaller (*u* ≈ *u* ∨ *C*) · *θ*<sup>2</sup> (a tautology, i.e. it follows from ∅) and from the instances of clauses in *S* used to rewrite *sθ* → *u* ← *tθ*. It only remains to show that these latter instances are also smaller than (*s* ≈ *t* ∨ *C*) · *θ*. Since we have assumed *sθ* ≻<sub>t</sub> *tθ*, at least one rewrite step must be done on *sθ*. Let *l*<sub>1</sub>*σ*<sub>1</sub> → *r*<sub>1</sub>*σ*<sub>1</sub> be the instance of the rule used for that step, with (*l*<sub>1</sub> ≈ *r*<sub>1</sub>) · *σ*<sub>1</sub> the closure that generates it. By Definitions 1 and 2, one of the following holds:


As for the remaining steps, they are done on the smaller side *tθ*, or on the other side after this first rewrite, which is then smaller than *sθ*. Therefore all subsequent steps done by any *l*<sub>j</sub>*σ*<sub>j</sub> → *r*<sub>j</sub>*σ*<sub>j</sub> will have *r*<sub>j</sub> · *σ*<sub>j</sub> ≺<sub>tc</sub> *l*<sub>j</sub> · *σ*<sub>j</sub> ≺<sub>tc</sub> *s* · *θ*, which implies (*l*<sub>j</sub> ≈ *r*<sub>j</sub>) · *σ*<sub>j</sub> ≺<sub>cc</sub> (*s* ≈ *t* ∨ *C*) · *θ*. As such, since this holds for all ground closures (*s* ≈ *t* ∨ *C*) · *θ*, the clause *s* ≈ *t* ∨ *C* is redundant wrt. *S*.

For the negative case, the proof is similar. We will conclude that (*s* ≉ *t* ∨ *C*) · *θ* follows from smaller (*l*<sub>i</sub> ≈ *r*<sub>i</sub>) · *σ*<sub>i</sub> ∈ GClos(*S*) and smaller (*u* ≉ *u* ∨ *C*) · *θ*. The latter, of course, follows from smaller *C* · *θ*, therefore *s* ≉ *t* ∨ *C* is redundant wrt. *S* ∪ {*C*}.

*Example 2.* If *S* <sup>=</sup> {*f*(*x, y*) <sup>≈</sup> *f*(*y, x*)}, then *f*(*x, f*(*y, z*)) <sup>≈</sup> *f*(*x, f*(*z,y*)) is redundant wrt. *S*. Note that *f*(*x, y*) <sup>≈</sup> *f*(*y, x*) is not orientable by any simplification ordering, therefore this cannot be justified by demodulation alone.

**Testing for Ground Joinability.** The general criterion presented above begs the question of how to test, in practice, whether *s* ⇓<sub>g</sub> *t* in a clause *s* ≈̇ *t* ∨ *C*. Several such algorithms have been proposed [1,18,21]. All of these are based on the observation that if we consider all total preorders ≽<sub>v</sub> on Vars(*s, t*) and, for all of them, show strong joinability with a modified ordering, which we denote ≻<sub>t[≽v]</sub>, then we have shown strong *ground* joinability in the order ≻<sub>t</sub> [18].

**Definition 3.** A simplification order on terms ≻<sub>t</sub> *extended with* a preorder on variables ≽<sub>v</sub>, denoted ≻<sub>t[≽v]</sub>, is a simplification preorder (i.e. it satisfies all the relevant properties in Sect. 2) such that ≻<sub>t[≽v]</sub> ⊇ ≻<sub>t</sub> ∪ ≽<sub>v</sub>.

*Example 3.* If *x* ≻<sub>v</sub> *y*, then *g*(*x*) ≻<sub>t[≽v]</sub> *g*(*y*), *g*(*x*) ≻<sub>t[≽v]</sub> *y*, *f*(*x, y*) ≻<sub>t[≽v]</sub> *f*(*y, x*), etc.

The simplest algorithm based on this approach would be to enumerate all possible total preorders ≽<sub>v</sub> over Vars(*s, t*), and exhaustively reduce both sides

<sup>2</sup> Wlog. *uθ* <sup>=</sup> *u*, renaming variables in *u* if necessary.

via equations in *S* orientable by ≻<sub>t[≽v]</sub>, checking whether the terms can be reduced to the same normal form for all total preorders. This is very inefficient, since there are O(*n*!*e*<sup>n</sup>) such total preorders [7], where *n* is the cardinality of Vars(*s, t*). Another approach is to consider only a smaller number of partial preorders, based on the obvious fact that if *s* ⇓ *t* under ≻<sub>t[≽v]</sub> then *s* ⇓ *t* under ≻<sub>t[≽v′]</sub> for every ≽<sub>v′</sub> ⊇ ≽<sub>v</sub>, so that joinability under a smaller number of partial preorders can imply joinability under all the total preorders, as necessary to prove ground joinability.
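The blow-up in the number of total preorders is easy to reproduce: total preorders correspond to ordered set partitions (weak orders), and a naive enumeration (a sketch for illustration only, not the provers' implementation) already yields 13 of them for three variables:

```python
# Sketch: enumerating all total preorders (weak orders) on a variable set,
# each represented as an ordered list of equivalence classes, largest first.
from itertools import permutations

def set_partitions(xs):
    """All partitions of the list xs into nonempty blocks (unordered)."""
    if not xs:
        yield []
        return
    first, rest = xs[0], xs[1:]
    for part in set_partitions(rest):
        for i in range(len(part)):               # add first to an existing block
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part                   # or open a new block

def total_preorders(variables):
    """A total preorder = an ordering of the blocks of a set partition."""
    for part in set_partitions(list(variables)):
        for order in permutations(part):
            yield list(order)
```

The counts 1, 3, 13, 75, ... grow faster than n!, which is why enumerating them all is hopeless beyond a handful of variables.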

However, this poses the question of how to choose which partial preorders to check. Intuitively, for performance, we would like that whenever the two terms are *not* ground joinable, some total preorder where they are not joinable is found as early as possible; and whenever the two terms *are* joinable, all total preorders are covered by as few partial preorders as possible.

*Example 4.* Let *S* = {*f*(*x, f*(*y, z*)) ≈ *f*(*y, f*(*x, z*))}. Then *f*(*x, f*(*y, f*(*z, f*(*w, u*)))) ≈ *f*(*x, f*(*y, f*(*w, f*(*z, u*)))) can be shown to be ground joinable wrt. *S* by checking just three cases: ≽<sub>v</sub> ∈ {*z* ≻ *w*, *z* ∼ *w*, *z* ≺ *w*}, even though there are 6942 possible preorders.

Waldmeister first tries all partial preorders relating two variables among Vars(*s, t*), then three, etc., until success, failure (by trying a total order and failing to join), or reaching a predefined limit of attempts [1]. Twee tries an arbitrary total strict order, then tries to weaken it, and repeats until all total preorders are covered [21]. We propose a novel algorithm, *incremental ground joinability*, whose main improvement is *guiding* the process of picking which preorders to check: while searching for rewrites on subterms of the terms we are attempting to join, we find minimal extensions of the term order with a variable preorder which allow the rewrite to be performed in the required direction.

Our algorithm is summarised as follows. We start with a queue of variable preorders, *V*, initially containing only the empty preorder. Then, while *V* is not empty, we pop a preorder ≽<sub>v</sub> from the queue and attempt to perform a rewrite via an equation which is newly orientable by some extension ≽<sub>v′</sub> of ≽<sub>v</sub>. That is, during the process of finding generalisations of a subterm of *s* or *t* among left-hand sides of candidate unoriented unit equations *l* ≈ *r*, when we check that the instance *lθ* ≈ *rθ* used to rewrite is oriented, we try to force this to be true under some minimal extension ≻<sub>t[≽v′]</sub> of ≻<sub>t[≽v]</sub>, if possible. If no such rewrite exists, the two terms are not strongly joinable under ≻<sub>t[≽v]</sub> or any extension, and hence not strongly ground joinable, and we are done. If it exists, we exhaustively rewrite with ≻<sub>t[≽v′]</sub> and check whether we obtain the same normal form. If we do not obtain it yet, we repeat the process of searching for rewrites via equations orientable by further extensions of the preorder. But if we do, then we have proven joinability in the extended preorder; we must then add back to the queue a set of preorders *O* such that all the total preorders which are ⊇ ≽<sub>v</sub> (popped from the queue) but not ⊇ ≽<sub>v′</sub> (the minimal extension under which we have proven joinability) are ⊇ of some ≽<sub>v′′</sub> ∈ *O* (pushed back into the queue to be checked). Obtaining this *O* is implemented by order\_diff(≽<sub>v</sub>, ≽<sub>v′</sub>), defined below. Whenever there are no more preorders in the queue to check, then we have checked that the terms are strongly joinable under all possible total preorders, and we are done.

Together with this, some bookkeeping is necessary to keep track of completeness conditions. We know that for completeness to be guaranteed, the conditions in Definition 1 must hold. They automatically do if *C* is not a positive unit or if the rewrite happens on a strict subterm. We also know that after a side has been rewritten at least once, subsequent rewrites on that side are always complete (since it was rewritten to a smaller term). Therefore we store in the queue, together with the preorder, a flag in P({L, R}) indicating on which sides a rewrite at the top position still needs to be checked for completeness. Initially the flag is {L} if *s* ≻<sub>t</sub> *t*, {R} if *s* ≺<sub>t</sub> *t*, {L, R} if *s* and *t* are incomparable, and {} if the clause is not a positive unit. When a rewrite at the top is attempted (say, *l* ≈ *r* used to rewrite *s* = *lθ*, with *t* being the other side), if the flag for that side is set, then we check whether *lθ* ⊐ *l* or *rθ* ≺<sub>t</sub> *t*. If this fails, the rewrite is rejected. Whenever a side is rewritten (at any position), the flag for that side is cleared.

The definition of order\_diff is as follows. Let the transitive reduction of ≽ be represented by a set of links of the form *x* ≻ *y* or *x* ∼ *y*.

$$\text{order\_diff}(\succeq\_1, \succeq\_2) = \{\, \succeq^{+} \mid \succeq \in \text{order\_diff}'(\succeq\_1, \succeq\_2^{-}) \,\},\tag{11a}$$

$$\text{order\_diff}'(\succeq\_1, \emptyset) = \{\},\tag{11b}$$

$$\text{order\_diff}'(\succeq\_1, \{x \succ y\} \uplus R) = \begin{cases} \text{order\_diff}'(\succeq\_1, R) & \text{if } x \succ\_1 y, \\ \{\succeq\_1 \cup \{y \succ x\},\ \succeq\_1 \cup \{x \sim y\}\} \,\cup\, \text{order\_diff}'(\succeq\_1 \cup \{x \succ y\}, R) & \text{otherwise,} \end{cases}\tag{11c}$$

$$\text{order\_diff}'(\succeq\_1, \{x \sim y\} \uplus R) = \begin{cases} \text{order\_diff}'(\succeq\_1, R) & \text{if } x \sim\_1 y, \\ \{\succeq\_1 \cup \{x \succ y\},\ \succeq\_1 \cup \{y \succ x\}\} \,\cup\, \text{order\_diff}'(\succeq\_1 \cup \{x \sim y\}, R) & \text{otherwise,} \end{cases}\tag{11d}$$

where ≽₁ ⊆ ≽₂. In other words, we take a transitive reduction of ≽₂ and, for each link in that reduction which is not already part of ≽₁, we return the orders obtained by augmenting ≽₁ with each possible reversal of that link, and recurse with ≽₁ := ≽₁ ∪ {link}.
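Under the assumption that preorders are represented as sets of links `('gt', x, y)` for x ≻ y and `('eq', x, y)` for x ∼ y, and that the second argument is already given as a transitive reduction, the recursion can be sketched as follows (simplified: the case where ≽₁ already relates x and y in a conflicting way is not handled, and the final transitive closure of (11a) is omitted):

```python
# Sketch of order_diff' (Eq. 11); hypothetical representation, not iProver's.

def holds(rel, link):
    kind, x, y = link
    if kind == 'eq':
        return ('eq', x, y) in rel or ('eq', y, x) in rel
    return ('gt', x, y) in rel

def order_diff_prime(r1, red2):
    if not red2:                              # Eq. (11b): nothing left to cover
        return []
    link, rest = red2[0], red2[1:]
    kind, x, y = link
    if holds(r1, link):                       # link already in r1: skip it
        return order_diff_prime(r1, rest)
    if kind == 'gt':                          # reversals of x > y
        reversals = [r1 | {('gt', y, x)}, r1 | {('eq', x, y)}]
    else:                                     # reversals of x ~ y
        reversals = [r1 | {('gt', x, y)}, r1 | {('gt', y, x)}]
    return reversals + order_diff_prime(r1 | {link}, rest)

# r1 is the bare preorder; r2's reduction has the single link x > y.
result = order_diff_prime(frozenset(), [('gt', 'x', 'y')])
# Together with r2 itself, the returned preorders cover every total
# extension exactly once: {y > x}, {x ~ y}, and (via r2) {x > y}.
assert len(result) == 2
```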

*Example 5.*


**Theorem 3.** For all total ≽ᵥᵀ ⊇ ≽₁, there exists one and only one ≽ᵢ ∈ {≽₂} ∪ order_diff(≽₁, ≽₂) such that ≽ᵥᵀ ⊇ ≽ᵢ. For all total ≽ᵥᵀ ⊉ ≽₁, there is no ≽ᵢ ∈ {≽₂} ∪ order_diff(≽₁, ≽₂) such that ≽ᵥᵀ ⊇ ≽ᵢ.

*Proof.* See full version of the paper [12].

An algorithm based on searching for rewrites in minimal extensions of a variable preorder (starting with minimal extensions of the bare term ordering, ≻ₜ[∅]) has several advantages. The main benefit of this approach is that, instead of imposing an a priori ordering on variables and then checking joinability under that ordering, we build a minimal ordering *while* searching for candidate unit equations to rewrite subterms of s, t. For instance, if two terms are *not* ground joinable, or not even rewritable in any ≻ₜ[v] where they were not rewritable in ≻ₜ, then an approach such as the one used by Avenhaus, Hillenbrand and Löchner [1] cannot detect this until it has extended the preorder to a total ordering, while our incremental algorithm realises this immediately. We should note that, empirically, this is what happens in most cases: most of the literals we check during a run are *not* ground joinable, so for practical performance it is essential to optimise this case.

**Theorem 4.** Algorithm 1 returns "Success" only if s ⇓ t in *C* wrt. *S*.<sup>3</sup>

*Proof.* We will show that Algorithm 1 returns "Success" if and only if s ↓ₜ[vᵀ] t for all total ≽ᵥᵀ over Vars(s, t), which implies s ⇓ₜ t.

When ⟨≽ᵥ, s, t, c⟩ is popped from *V*, we exhaustively reduce s, t via equations in *S* oriented wrt. ≻ₜ[v], obtaining sᵣ, tᵣ. If sᵣ ∼ₜ[v] tᵣ, then s ↓ₜ[v] t, and so s ↓ₜ[vᵀ] t for all total ≽ᵥᵀ ⊇ ≽ᵥ. If sᵣ ≁ₜ[v] tᵣ, we attempt to rewrite one of sᵣ, tᵣ using *some* extended ≻ₜ[v′] where ≽ᵥ′ ⊃ ≽ᵥ. If this is impossible, then s and t are not joinable under ≻ₜ[v′] for any ≽ᵥ′ ⊇ ≽ᵥ, and therefore there exists at least one total ≽ᵥᵀ such that s and t are not joinable under ≻ₜ[vᵀ], and we return "Fail".

If such a rewrite is possible, we repeat the process: we exhaustively reduce wrt. ≻ₜ[v′], obtaining s′, t′. If s′ ≁ₜ[v′] t′, we start again from the step where we attempt to rewrite via an extension of the preorder: either we find a rewrite with some ≻ₜ[v″] where ≽ᵥ″ ⊃ ≽ᵥ′, and exhaustively normalise wrt. ≻ₜ[v″] obtaining s″, t″, etc., or we fail to do so and return "Fail".

If in any such step (after exhaustively normalising wrt. ≻ₜ[v′]) we find s′ ∼ₜ[v′] t′, then s ↓ₜ[v′] t, and so s ↓ₜ[vᵀ] t for all total ≽ᵥᵀ ⊇ ≽ᵥ′. Now at this point we must add back to the queue a set of preorders ≽ᵥᵢ such that: for all total ≽ᵥᵀ ⊇ ≽ᵥ, either ≽ᵥᵀ ⊇ ≽ᵥ′ (proven joinable) or ≽ᵥᵀ ⊇ some ≽ᵥᵢ (added to *V* to be checked). For efficiency, we would also like there to be no overlap: no total ≽ᵥᵀ ⊇ ≽ᵥ should be an extension of more than one of {≽ᵥ′, ≽ᵥ₁, …}.

This is exactly what Theorem 3 guarantees. So we add {⟨≽ᵥᵢ, sᵣ, tᵣ, cᵣ⟩ | ≽ᵥᵢ ∈ order_diff(≽ᵥ, ≽ᵥ′)} to *V*, where cᵣ = c \ ({L} if sᵣ ≠ s, else {}) \ ({R} if tᵣ ≠ t, else {}). Note also that s ↓ₜ[v] sᵣ and t ↓ₜ[v] tᵣ, therefore also s ↓ₜ[vᵢ] sᵣ and t ↓ₜ[vᵢ] tᵣ whenever ≽ᵥᵢ ⊃ ≽ᵥ.

<sup>3</sup> Note that the other direction need not hold: there are strongly ground joinable terms which are not detected by this method of analysing all preorders between variables, e.g. f(x, g(y)) ⇓ f(g(y), x) wrt. S = {f(x, y) ≈ f(y, x)}.


**Algorithm 1.**

```
Input:  literal s ≈̇ t ∈ C; set of unorientable equations S
Output: whether s ⇓ t in C wrt. S
begin
    c ← ∅ if C is not a positive unit; {L} if s ≻ t; {R} if s ≺ t; {L, R} otherwise
    V ← {⟨∅, s, t, c⟩}
    while V is not empty do
        ⟨v, s, t, c⟩ ← pop from V
        s, t ← normalise s, t wrt. ≻_t[v], with completeness flag c
        c ← c \ ({L} if s was changed) \ ({R} if t was changed)
        if s ∼_t[v] t then continue
        else
            s′, t′, c′ ← s, t, c
            while there exists l ≈ r ∈ S that can rewrite s′ or t′ wrt. some
                  ≻_t[v″] with v″ ⊃ v, with completeness flag c′ do
                s′, t′ ← normalise s′, t′ wrt. ≻_t[v″], with completeness flag c′
                c′ ← c′ \ ({L} if s′ was changed) \ ({R} if t′ was changed)
                if s′ ∼_t[v″] t′ then
                    for v‴ in order_diff(v, v″) do push ⟨v‴, s, t, c⟩ to V
                    break
                end
                v ← v″
            else return Fail
            end
        end
    end
    return Success
end
where rewriting u in s, t wrt. ≻ with completeness flag c succeeds if
    (i)   u is a strict subterm of s or t, or
    (ii)  u = s with L ∉ c, or
    (iii) u = t with R ∉ c, or
    (iv)  the instance lσ ≈ rσ used to rewrite has lσ ⊐ l, or
    (v)   u = s with rσ ≺ t, or
    (vi)  u = t with rσ ≺ s.
```

During this whole process, any rewrite must pass the completeness test mentioned previously, so that the conditions in the definition of redundancy hold. Let s₀, t₀ be the original terms, s, t the ones being rewritten, and c the completeness flag. If the rewrite is at a strict subterm position, it succeeds by Definition 2. If the rewrite is at the top, then we check c. If L is unset (L ∉ c), then either s = s₀ ≺ t₀, or s ≺ s₀, or the clause is not a positive unit, so we allow a rewrite at the top of s, again by Definition 2. If L is set (L ∈ c), then an explicit check must be done: we allow a rewrite at the top of s (= s₀) iff it is done by lσ → rσ with lσ ⊐ l or rσ ≺ t₀. Analogously for R, with the roles of s and t swapped.

In short, we have shown that if ⟨≽ᵥ, s′, t′, c⟩ is popped from *V*, then *V* only ever becomes empty, and hence the algorithm only terminates with "Success", if s ↓ₜ[vᵀ] t for all total ≽ᵥᵀ ⊇ ≽ᵥ. Since *V* is initialised with ⟨∅, s, t, c⟩, the algorithm only returns "Success" if s ↓ₜ[vᵀ] t for all total ≽ᵥᵀ.
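The queue-driven search just summarised can be sketched as follows; all helper functions are hypothetical stand-ins (this is not iProver's implementation), and the inner refinement loop of Algorithm 1 is collapsed into a single `find_extension` call for brevity:

```python
# Skeleton of Algorithm 1's outer loop. `normalise` reduces a term under the
# ordering extended with preorder v, `equal` tests s ~ t under that ordering,
# `find_extension` returns some v' ⊃ v enabling a further rewrite (or None),
# and `order_diff` is as in Eq. (11). All four are caller-supplied stubs.
from collections import deque

def check_ground_joinable(s, t, c0, normalise, equal, find_extension, order_diff):
    V = deque([(frozenset(), s, t, c0)])       # start from the empty preorder
    while V:
        v, s1, t1, c = V.popleft()
        s1, t1 = normalise(s1, v, c), normalise(t1, v, c)
        if equal(s1, t1, v):
            continue                           # branch v is joinable; drop it
        v2 = find_extension(s1, t1, v)
        if v2 is None:                         # no extension orients a rewrite
            return "Fail"
        for v3 in order_diff(v, v2):           # re-queue branches not covered by v2
            V.append((v3, s1, t1, c))
    return "Success"

# Degenerate instantiation: both sides already identical under every preorder.
res = check_ground_joinable("a", "a", set(),
                            lambda u, v, c: u,
                            lambda x, y, v: x == y,
                            lambda x, y, v: None,
                            lambda v, v2: [])
assert res == "Success"
```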

**Orienting via Extension of Variable Ordering.** In order to apply the ground joinability algorithm we need a way to check, for a given ≻ₜ and ≽ᵥ and some s, t, whether there exists a ≽ᵥ′ ⊃ ≽ᵥ such that s ≻ₜ[v′] t. Here we show how to do this when ≻ₜ is a Knuth-Bendix ordering (KBO) [15].

Recall the definition of KBO. Let ≻ₛ be a partial order (precedence) on symbols, and let w be an ℕ-valued weight function on symbols and variables such that: ∃m ∀x ∈ V. w(x) = m; w(c) ≥ m for all constants c; and there may be at most one unary symbol f with w(f) = 0, in which case f ≻ₛ g must hold for all other symbols g. The weight of a term is w(f(s₁, …, sₙ)) = w(f) + w(s₁) + ··· + w(sₙ). Let also |s|ₓ be the number of occurrences of x in s. Then

$$f(s_1, \ldots) \succ_{\text{KBO}} g(t_1, \ldots) \quad \text{iff} \quad
\begin{cases}
\text{either } w(f(s_1, \ldots)) > w(g(t_1, \ldots)),\\
\text{or } w(f(s_1, \ldots)) = w(g(t_1, \ldots)) \text{ and } f \succ_s g,\\
\text{or } w(f(s_1, \ldots)) = w(g(t_1, \ldots)) \text{ and } f = g\\
\qquad \text{and } s_1, \ldots \succ_{\text{KBO},\text{lex}} t_1, \ldots;\\
\text{and } \forall x \in \mathsf{V}.\ |f(s_1, \ldots)|_x \ge |g(t_1, \ldots)|_x,
\end{cases}\tag{12a}$$

$$f(s_1, \ldots) \succ_{\text{KBO}} x \quad \text{iff} \quad |f(s_1, \ldots)|_x \ge 1.\tag{12b}$$

The conditions on variable occurrences ensure that s ≻_KBO t ⇒ ∀θ. sθ ≻_KBO tθ.
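For concreteness, a minimal KBO comparison can be sketched as follows, with terms as nested tuples and variables as strings; the weight map `w` and precedence ranks `prec` are assumptions of this illustration, and the special weight-0 unary symbol case is omitted:

```python
# Minimal KBO check (Eq. 12). Terms are nested tuples ('f', arg1, ...);
# strings are variables, all with the minimal weight m = 1.

def weight(t, w):
    if isinstance(t, str):
        return 1                              # every variable weighs m = 1
    return w[t[0]] + sum(weight(a, w) for a in t[1:])

def var_counts(t, acc=None):
    acc = {} if acc is None else acc
    if isinstance(t, str):
        acc[t] = acc.get(t, 0) + 1
    else:
        for a in t[1:]:
            var_counts(a, acc)
    return acc

def kbo_gt(s, t, w, prec):
    if isinstance(s, str):
        return False                          # a variable is never KBO-greater
    if isinstance(t, str):                    # case (12b): s > x iff x occurs in s
        return var_counts(s).get(t, 0) >= 1
    cs, ct = var_counts(s), var_counts(t)
    if any(cs.get(x, 0) < n for x, n in ct.items()):
        return False                          # variable condition of (12a)
    ws, wt = weight(s, w), weight(t, w)
    if ws != wt:
        return ws > wt
    if s[0] != t[0]:
        return prec[s[0]] > prec[t[0]]
    for a, b in zip(s[1:], t[1:]):            # same symbol: lexicographic step
        if a != b:
            return kbo_gt(a, b, w, prec)
    return False

w = {'f': 1, 'g': 1, 'a': 1}                  # assumed symbol weights
prec = {'f': 2, 'g': 1, 'a': 0}               # assumed precedence: f > g > a
assert kbo_gt(('f', ('g', 'x')), ('g', 'x'), w, prec)   # heavier term wins
assert not kbo_gt(('f', 'x'), ('f', 'y'), w, prec)      # variable condition fails
```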

When we extend the order ≻_KBO with a variable preorder ≽ᵥ, the starting point is that x ≻ᵥ y ⇒ x ≻_KBO[v] y and x ∼ᵥ y ⇒ x ∼_KBO[v] y. Then, to ensure that all the properties of a simplification order (including the one mentioned above) hold, we arrive at the following definition (similar to [1]).

$$f(s_1, \ldots) \succ_{\text{KBO}[v]} g(t_1, \ldots) \quad \text{iff} \quad
\begin{cases}
\text{either } w(f(s_1, \ldots)) > w(g(t_1, \ldots)),\\
\text{or } w(f(s_1, \ldots)) = w(g(t_1, \ldots)) \text{ and } f \succ_s g,\\
\text{or } w(f(s_1, \ldots)) = w(g(t_1, \ldots)) \text{ and } f = g\\
\qquad \text{and } s_1, \ldots \succ_{\text{KBO}[v],\text{lex}} t_1, \ldots;\\
\text{and } \forall x \in \mathsf{V}.\ \sum_{y \succeq_v x} |f(s_1, \ldots)|_y \ge \sum_{y \succeq_v x} |g(t_1, \ldots)|_y,
\end{cases}\tag{13a}$$

$$f(s_1, \ldots) \succ_{\text{KBO}[v]} x \quad \text{iff} \quad \exists y \succeq_v x.\ |f(s_1, \ldots)|_y \ge 1,\tag{13b}$$

$$x \succ_{\text{KBO}[v]} y \quad \text{iff} \quad x \succ_v y.\tag{13c}$$
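The modified variable condition, which sums occurrence counts over all variables ≽ᵥ-above each x, can be checked directly; a small sketch with hypothetical names:

```python
# Checks  ∀x. Σ_{y ≽_v x} |s|_y  ≥  Σ_{y ≽_v x} |t|_y,  given occurrence-count
# dicts for the two terms and the preorder as a predicate geq_v(y, x).

def var_condition(count_s, count_t, geq_v):
    variables = set(count_s) | set(count_t)
    for x in variables:
        above = {y for y in variables if geq_v(y, x)}
        if (sum(count_s.get(y, 0) for y in above)
                < sum(count_t.get(y, 0) for y in above)):
            return False
    return True

# With x ≻_v y, the term f(x, x) passes against g(x, y): the two occurrences
# of x dominate one x plus one y. Without the preorder, the check fails.
geq = lambda a, b: a == b or (a, b) == ('x', 'y')
assert var_condition({'x': 2}, {'x': 1, 'y': 1}, geq)
assert not var_condition({'x': 2}, {'x': 1, 'y': 1}, lambda a, b: a == b)
```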

To check whether there exists a ≽ᵥ′ ⊃ ≽ᵥ such that s ≻_KBO[v′] t, we need to check whether there are some x ≻ y or x ∼ y relations that we can add to ≽ᵥ such that all the conditions above hold (and such that the result remains a valid preorder). Let us denote "there exists a ≽ᵥ′ ⊃ ≽ᵥ such that s ≻_KBO[v′] t" by s ≻_KBO[v,v′] t. Then the definition is

$$f(s_1, \ldots) \succ_{\text{KBO}[v,v']} g(t_1, \ldots) \quad \text{iff} \quad
\begin{cases}
\text{either } w(f(s_1, \ldots)) > w(g(t_1, \ldots)),\\
\text{or } w(f(s_1, \ldots)) = w(g(t_1, \ldots)) \text{ and } f \succ_s g,\\
\text{or } w(f(s_1, \ldots)) = w(g(t_1, \ldots)) \text{ and } f = g\\
\qquad \text{and } s_1, \ldots \succ_{\text{KBO}[v,v'],\text{lex}} t_1, \ldots;\\
\text{and there exists a valid preorder } \succeq_{v'} \supseteq\ \succeq_v\\
\qquad \text{such that } \forall x \in \mathsf{V}.\ \sum_{y \succeq_{v'} x} |f(s_1, \ldots)|_y \ge \sum_{y \succeq_{v'} x} |g(t_1, \ldots)|_y,
\end{cases}\tag{14a}$$

$$f(s_1, \ldots) \succ_{\text{KBO}[v,v']} x \quad \text{iff} \quad \exists y.\ |f(s_1, \ldots)|_y \ge 1,\ \text{with } \succeq_{v'}\ =\ \succeq_v \cup \{y \succ x\} \text{ or } \succeq_{v'}\ =\ \succeq_v \cup \{y \sim x\}.\tag{14b}$$

This check can be used in Algorithm 1 for finding extensions of variable orderings that orient rewrite rules allowing required normalisations.

#### **4.3 Connectedness**

Testing for joinability (i.e. demodulating to s′ ≈ s′ or s′ ≉ s′) and ground joinability (presented in the previous section) requires that each step in proving them is done via an oriented instance of an equation in the set. However, we can weaken this restriction if we also change the notion of redundancy being used.

As criteria for redundancy of a *clause*, joinability or ground joinability of a literal means that the clause can be deleted or the literal removed from the clause (for a positive or negative literal, respectively) in *any* context; that is, we can for example add the clause to a set of deleted clauses and, whenever a new clause appears in that set, immediately remove it, since we have already established that it is redundant. The criterion of connectedness [3,21], however, is a criterion for redundancy of *inferences*. This means that a conclusion simplified by this criterion can be deleted (or rather, not added), but in that context only; if it ever comes up again as the conclusion of a different inference, it is not necessarily redundant there as well. Connectedness was introduced in the context of equational completion; here we extend it to general clauses and show that it is a redundancy criterion in the superposition calculus.

**Definition 4.** Terms *s* and *t* are *connected* under clauses *U* and unifier *ρ* wrt. a set of equations *S* if there exist terms *v*<sup>1</sup>*,...,v<sup>n</sup>*, equations *<sup>l</sup>*<sup>1</sup> <sup>≈</sup> *<sup>r</sup>*<sup>1</sup>*,...,l<sup>n</sup>*−<sup>1</sup> <sup>≈</sup> *<sup>r</sup><sup>n</sup>*−<sup>1</sup>, and substitutions *<sup>σ</sup>*<sup>1</sup>*,...,σ<sup>n</sup>*−<sup>1</sup> such that:


**Theorem 5.** Superposition inferences of the form

$$\frac{l \approx r \lor C \qquad s[u] \approx t \lor D}{(s[u \mapsto r] \approx t \lor C \lor D)\rho},
\quad
\begin{array}{l}
\text{where } \rho = \text{mgu}(l, u),\\
l\rho \not\preceq r\rho,\ s\rho \not\preceq t\rho,\\
\text{and } u \text{ not a variable},
\end{array}\tag{15}$$

where *s*[*u* <sup>→</sup> *r*]*ρ* and *tρ* are connected under {*l* <sup>≈</sup> *r* <sup>∨</sup> *C, s* <sup>≈</sup> *t* <sup>∨</sup> *D*} and unifier *ρ* wrt. some set of clauses *S*, are redundant inferences wrt. *S*.

*Proof.* Let us denote s′ = s[u ↦ r]. Let also U = {l ≈ r ∨ C, s ≈ t ∨ D} and M = ⋃_{C′∈U} ⋃_{p ≈̇ q ∈ C′} {p, q}. We will show that if s′ρ and tρ are connected under U and ρ, by equations in S, then every instance of that inference obeys the condition for closure redundancy of an inference (see Sect. 4) wrt. S.

Consider any (s′ ≈ t ∨ C ∨ D)ρ · θ where θ ∈ GSubs(Uρ). Either s′ρθ = tρθ, and we are done (it follows from ∅); or s′ρθ ≻ tρθ; or s′ρθ ≺ tρθ.

Consider the case s′ρθ ≻ tρθ. For all i ∈ 1, …, n−1, there exists a clause C′ ∈ U and a w ∈ M<sup>4</sup> such that either (iii.a) lᵢσᵢθ ≺ wρθ, or (iii.b) lᵢσᵢθ = wρθ and lᵢ ⊏ w, or (iii.c) lᵢσᵢθ = wρθ and C′ is not a positive unit. Likewise for rᵢ. Therefore, for all i ∈ 1, …, n−1, there exists a C′ ∈ U such that (lᵢ ≈ rᵢ) · σᵢθ ≺ C′ · ρθ. Since (t ≈ t ∨ ···)ρ · θ is also smaller than (s′ ≈ t ∨ ···)ρ · θ and a tautology, the instance (s′ ≈ t ∨ ···)ρ · θ of the conclusion follows from closures in GClos(S) such that each is smaller than one of (l ≈ r ∨ C) · ρθ, (s ≈ t ∨ D) · ρθ.

In the case that s′ρθ ≺ tρθ, the same idea applies, but now it is (s′ ≈ s′ ∨ ···)ρ · θ which is smaller than (s′ ≈ t ∨ ···)ρ · θ and a tautology.

Therefore, we have shown that for all θ ∈ GSubs((l ≈ r ∨ C)ρ, (s ≈ t ∨ D)ρ), the instance (s′ ≈ t ∨ ···)ρ · θ of the conclusion follows from closures in GClos(S) which are all smaller than one of (l ≈ r ∨ C) · ρθ, (s ≈ t ∨ D) · ρθ. Since any valid superposition inference between ground clauses must have l = u, any θ′ ∈ GSubs(l ≈ r ∨ C, s ≈ t ∨ D, (s′ ≈ t ∨ C ∨ D)ρ) for which the inference (l ≈ r ∨ C)θ′, (s ≈ t ∨ D)θ′ ⊢ (s′ ≈ t ∨ C ∨ D)ρθ′ is valid must have θ′ = ρθ for some θ, since ρ is the most general unifier. Therefore, for all such θ′, the instance (s′ ≈ t ∨ ···)ρ · θ of the conclusion follows from closures in GClos(S) which are all smaller than one of (l ≈ r ∨ C) · θ′, (s ≈ t ∨ D) · θ′, so the inference is redundant.

<sup>4</sup> That is, in the set of top-level terms of literals of clauses in *U*.

**Theorem 6.** Superposition inferences of the form

$$\frac{l \approx r \lor C \qquad s[u] \not\approx t \lor D}{(s[u \mapsto r] \not\approx t \lor C \lor D)\rho},
\quad
\begin{array}{l}
\text{where } \rho = \text{mgu}(l, u),\\
l\rho \not\preceq r\rho,\ s\rho \not\preceq t\rho,\\
\text{and } u \text{ not a variable},
\end{array}\tag{16}$$

where s[u ↦ r]ρ and tρ are connected under {l ≈ r ∨ C, s ≉ t ∨ D} and unifier ρ wrt. some set of clauses S, are redundant inferences wrt. S ∪ {(C ∨ D)ρ}.

*Proof.* Analogously to the previous proof, we find that for all instances of the inference, the closure (s′ ≉ t ∨ ···)ρ · θ follows from the smaller closure (t ≈ t ∨ ···)ρ · θ or (s′ ≈ s′ ∨ ···)ρ · θ together with closures (lᵢ ≈ rᵢ) · σᵢθ smaller than max{(l ≈ r ∨ C) · θ′, (s ≉ t ∨ D) · θ′, (s′ ≉ t ∨ C ∨ D)ρ · θ}. But (t ≈ t ∨ C ∨ D)ρ · θ and (s′ ≈ s′ ∨ C ∨ D)ρ · θ both follow from the smaller (C ∨ D)ρ · θ, therefore the inference is redundant wrt. S ∪ {(C ∨ D)ρ}.

#### **4.4 Ground Connectedness**

Just as joinability can be generalised to ground joinability, so can connectedness be generalised to ground connectedness. Two terms s, t are *ground connected* under U and ρ wrt. S if, for all θ ∈ GSubs(s, t), sθ and tθ are connected under U and ρ wrt. S. Analogously to strong ground joinability, we have that if s and t are connected using ≻ₜ[v] for all total ≽ᵥ over Vars(s, t), then s and t are ground connected.

**Theorem 7.** Superposition inferences of the form

$$\frac{l \approx r \lor C \qquad s[u] \approx t \lor D}{(s[u \mapsto r] \approx t \lor C \lor D)\rho},
\quad
\begin{array}{l}
\text{where } \rho = \text{mgu}(l, u),\\
l\rho \not\preceq r\rho,\ s\rho \not\preceq t\rho,\\
\text{and } u \text{ not a variable},
\end{array}\tag{17}$$

where *s*[*u* <sup>→</sup> *r*]*ρ* and *tρ* are ground connected under {*l* <sup>≈</sup> *r* <sup>∨</sup> *C, s* <sup>≈</sup> *t* <sup>∨</sup> *D*} and unifier *ρ* wrt. some set of clauses *S*, are redundant inferences wrt. *S*.

**Theorem 8.** Superposition inferences of the form

$$\frac{l \approx r \lor C \qquad s[u] \not\approx t \lor D}{(s[u \mapsto r] \not\approx t \lor C \lor D)\rho},
\quad
\begin{array}{l}
\text{where } \rho = \text{mgu}(l, u),\\
l\rho \not\preceq r\rho,\ s\rho \not\preceq t\rho,\\
\text{and } u \text{ not a variable},
\end{array}\tag{18}$$

where s[u ↦ r]ρ and tρ are ground connected under {l ≈ r ∨ C, s ≉ t ∨ D} and unifier ρ wrt. some set of clauses S, are redundant inferences wrt. S ∪ {(C ∨ D)ρ}.

*Proof.* The proofs of Theorems 7 and 8 are analogous to those of Theorems 5 and 6. The weakening of connectedness to ground connectedness only means that the witnesses of connectedness (e.g. the ≽ᵥᵢ, lᵢ ≈ rᵢ, σᵢ) may differ between ground instances. For all the steps in the proofs to hold we only need that every instance θ ∈ GSubs(l ≈ r ∨ C, s ≈̇ t ∨ D, (s[u ↦ r] ≈̇ t ∨ C ∨ D)ρ) of the inference satisfies θ = σθ′ with σ ∈ GSubs(s[u ↦ r]ρ, tρ), which is true.

A discussion of implementation strategies for connectedness and ground connectedness is outside the scope of this paper.

#### **5 Evaluation**

We implemented ground joinability in a theorem prover for first-order logic, iProver [10,16].<sup>5</sup> iProver combines the superposition, Inst-Gen, and resolution calculi. For superposition, iProver implements a range of simplifications including encompassment demodulation, AC normalisation [10], light normalisation [16], subsumption, and subsumption resolution. We ran our experiments over the FOF problems of the TPTP v7.5 library [23] (17 348 problems) on a cluster of Linux servers with 3 GHz 11-core CPUs and 128 GB memory, with each problem running on a single core with a time limit of 300 s. We used a default strategy (not yet fine-tuned after the introduction of ground joinability), with superposition enabled and the remaining components disabled. With ground joinability enabled, iProver solved 133 additional problems that it did not solve without it. Note that this excludes the contribution of AC ground joinability or encompassment demodulation [11] (always enabled).

Some of the problems are not interesting for this analysis because ground joinability is never tried: either they are solved before superposition saturation begins, or they are ground. If we exclude these, we are left with 10 005 problems. Ground joinability is successfully used to eliminate clauses in 3057 of them (30.6%, Fig. 1a). This indicates that ground joinability is useful in many classes of problems, including non-unit problems, where it had never been used before.

**Fig. 1.** (a) Clauses simplified by ground joinability. (b) Percentage of runtime spent in ground joinability.

In terms of the performance impact of enabling ground joinability, we measure that among problems whose runtime exceeds 1 s, only in 72 out of 8574 problems does the time spent inside the ground joinability algorithm exceed 20% of runtime, indicating that our incremental algorithm is efficient and suitable for broad application (Fig. 1b).

<sup>5</sup> iProver is available at http://www.cs.man.ac.uk/~korovink/iprover.

TPTP classifies problems by a rating in [0, 1]. Problems with rating ≥ 0.9 are considered very challenging, and problems with rating 1.0 have never been solved by any automated theorem prover. iProver with ground joinability solves 3 previously unsolved rating-1.0 problems, and 7 further problems with rating in [0.9, 1.0) (Table 1). We note that some of the latter (e.g. LAT140-1, ROB018-10, REL045-1) were previously solved only by UEQ or SMT provers, but not by any full first-order prover.

**Table 1.** Hard or unsolved problems in TPTP, solved by iProver with ground joinability.


#### **6 Conclusion and Further Work**

In this work we extended the superposition calculus with ground joinability and connectedness, and proved that these rules preserve completeness using a modified notion of redundancy, thus making these techniques available, for the first time, for full first-order logic problems. We have also presented an algorithm for checking ground joinability which attempts to check as few variable preorders as possible.

Preliminary results show three things: (1) ground joinability is applicable in a sizeable number of problems across different domains, including non-unit problems (where it was never applied before); (2) our proposed algorithm for checking ground joinability is efficient, with over 3/4 of problems spending less than 1% of runtime there; and (3) applying ground joinability in the superposition calculus of iProver improves overall performance, including discovering solutions to hitherto unsolved problems.

These results are promising, and further optimisations can be done. Immediate next steps include fine-tuning the implementation, namely adjusting the strategies and strategy combinations to make full use of ground joinability and connectedness. iProver uses a sophisticated heuristic system which has not yet been tuned for ground joinability and connectedness [14].

In terms of practical implementation of connectedness and ground connectedness, further research is needed on the interplay between those (criteria for redundancy of inferences) and joinability and ground joinability (criteria for redundancy of clauses).

On the theoretical level, recent work [24] provides a general framework for saturation theorem proving, and we will investigate how techniques developed in this paper can be incorporated into this framework.

# **References**


2008. LNCS (LNAI), vol. 5195, pp. 292–298. Springer, Heidelberg (2008). https:// doi.org/10.1007/978-3-540-71070-7 24


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Connection-Minimal Abduction in** *EL* **via Translation to FOL**

Fajar Haifani<sup>1,2</sup>, Patrick Koopmann<sup>3</sup>, Sophie Tourret<sup>1,4</sup>, and Christoph Weidenbach<sup>1</sup>

<sup>1</sup> Max-Planck-Institut für Informatik, Saarland Informatics Campus, Saarbrücken, Germany *{*f.haifani,c.weidenbach*}*@mpi-inf.mpg.de <sup>2</sup> Graduate School of Computer Science, Saarbrücken, Germany <sup>3</sup> TU Dresden, Dresden, Germany patrick.koopmann@tu-dresden.de

<sup>4</sup> Université de Lorraine, CNRS, Inria, LORIA, Nancy, France sophie.tourret@inria.fr

**Abstract.** Abduction in description logics finds extensions of a knowledge base to make it entail an observation. As such, it can be used to explain why the observation does not follow, to repair incomplete knowledge bases, and to provide possible explanations for unexpected observations. We consider TBox abduction in the lightweight description logic *EL*, where the observation is a concept inclusion and the background knowledge is a TBox, i.e., a set of concept inclusions. To avoid useless answers, such problems usually come with further restrictions on the solution space and/or minimality criteria that help sort the chaff from the grain. We argue that existing minimality notions are insufficient, and introduce connection minimality. This criterion follows Occam's razor by rejecting hypotheses that use concept inclusions unrelated to the problem at hand. We show how to compute a special class of connection-minimal hypotheses in a sound and complete way. Our technique is based on a translation to first-order logic, and constructs hypotheses based on prime implicates. We evaluate a prototype implementation of our approach on ontologies from the medical domain.

### **1 Introduction**

Ontologies are used in areas like biomedicine or the semantic web to represent and reason about terminological knowledge. They consist normally of a set of axioms formulated in a description logic (DL), giving definitions of concepts, or stating relations between them. In the lightweight description logic EL [2], particularly used in the biomedical domain, we find ontologies that contain around a hundred thousand axioms. For instance, SNOMED CT<sup>1</sup> contains over 350,000 axioms, and the Gene Ontology GO<sup>2</sup> defines over 50,000 concepts. A central

c The Author(s) 2022

J. Blanchette et al. (Eds.): IJCAR 2022, LNAI 13385, pp. 188–207, 2022. https://doi.org/10.1007/978-3-031-10769-6\_12

<sup>1</sup> https://www.snomed.org/.

<sup>2</sup> http://geneontology.org/.

reasoning task for ontologies is to determine whether one concept is subsumed by another, a question that can be answered in polynomial time [1], and rather efficiently in practice using highly optimized description logic reasoners [29]. If the answer to this question is unexpected or hints at an error, a natural interest is in an explanation for that answer—especially if the ontology is complex. But whereas explaining entailments—i.e., explaining why a concept subsumption holds—is well-researched in the DL literature and integrated into standard ontology editors [21,22], the problem of explaining non-entailments has received less attention, and there is no standard tool support. Classical approaches involve counter-examples [5], or *abduction*.

In abduction, a non-entailment T ⊭ α, for a TBox T and an observation α, is explained by providing a "missing piece", the *hypothesis*, that, when added to the ontology, would entail α. Thus it provides possible fixes in case the entailment should hold. In the DL context, depending on the shape of the observation, one distinguishes between concept abduction [6], ABox abduction [7–10,12,19,24,25,30,31], TBox abduction [11,33] and knowledge base abduction [14,26]. We focus here on TBox abduction, where the ontology and hypothesis are TBoxes and the observation is a concept inclusion (CI), i.e., a single TBox axiom.

To illustrate this problem, consider the following TBox, about academia,

$$\mathcal{T}_{\mathrm{a}} = \left\{ \begin{aligned} &\exists \mathsf{employment}.\mathsf{ResearchPosition} \sqcap \exists \mathsf{qualification}.\mathsf{Diploma} \sqsubseteq \mathsf{Researcher}, \\ &\exists \mathsf{writes}.\mathsf{ResearchPaper} \sqsubseteq \mathsf{Researcher}, \quad \mathsf{Doctor} \sqsubseteq \exists \mathsf{qualification}.\mathsf{PhD}, \\ &\mathsf{Professor} \equiv \mathsf{Doctor} \sqcap \exists \mathsf{employment}.\mathsf{Chair}, \\ &\mathsf{FundsProvider} \sqsubseteq \exists \mathsf{writes}.\mathsf{GrantApplication} \end{aligned} \right\}$$

that states, in natural language: anyone employed in a research position who has a diploma as qualification is a researcher; anyone who writes a research paper is a researcher; every doctor has a PhD as qualification; a professor is exactly a doctor employed on a chair; and a funds provider writes a grant application.


The observation α_a = Professor ⊑ Researcher, "Being a professor implies being a researcher", does not follow from T_a although it should. We can use TBox abduction to find different ways of recovering this entailment.

Commonly, to avoid trivial answers, the user provides syntactic restrictions on hypotheses, such as a set of abducible axioms to pick from [8,30], a set of abducible predicates [25,26], or patterns on the shape of the solution [11]. But even with those restrictions in place, there may be many possible solutions and, to find the ones with the best explanatory potential, syntactic criteria are usually combined with minimality criteria such as subset minimality, size minimality, or semantic minimality [7]. Even combined, these minimality criteria still retain a major flaw: they allow for explanations that go against the principle of parsimony, also known as Occam's razor, in that they may contain concepts that are completely unrelated to the problem at hand. As an illustration, let us return to our academia example. The TBoxes

$$\begin{aligned} \mathcal{H}_{\mathrm{a1}} &= \{\, \mathsf{Chair} \sqsubseteq \mathsf{ResearchPosition},\ \mathsf{PhD} \sqsubseteq \mathsf{Diploma} \,\} \text{ and} \\ \mathcal{H}_{\mathrm{a2}} &= \{\, \mathsf{Professor} \sqsubseteq \mathsf{FundsProvider},\ \mathsf{GrantApplication} \sqsubseteq \mathsf{ResearchPaper} \,\} \end{aligned}$$

are two hypotheses solving the TBox abduction problem involving T_a and α_a. Both of them are subset-minimal, have the same size, and are incomparable w.r.t. the entailment relation, so that traditional minimality criteria cannot distinguish them. However, intuitively, the second hypothesis feels more arbitrary than the first. Looking at H_a1, Chair and ResearchPosition occur in T_a in concept inclusions where the concepts in α_a also occur, and both PhD and Diploma are similarly related to α_a, but via the role qualification. In contrast, H_a2 involves the concepts FundsProvider and GrantApplication, which are not related to α_a in any way in T_a. In fact, any random concept inclusion A ⊑ ∃writes.B in T_a would lead to a hypothesis similar to H_a2 where A replaces FundsProvider and B replaces GrantApplication. Such explanations are not parsimonious.

We introduce a new minimality criterion called *connection minimality* that enforces parsimony (Sect. 3), defined for the lightweight description logic EL. This criterion characterizes hypotheses for T and α that connect the left- and right-hand sides of the observation α without introducing spurious connections. To achieve this, every left-hand side of a CI in the hypothesis must follow from the left-hand side of α in T, and, taken together, all the right-hand sides of the CIs in the hypothesis must imply the right-hand side of α in T, as is the case for H_a1. To compute connection-minimal hypotheses in practice, we present a technique based on first-order reasoning that proceeds in three steps (Sect. 4). First, we translate the abduction problem into a first-order formula Φ. We then compute the prime implicates of Φ, that is, a set of minimal logical consequences of Φ that subsume all other consequences of Φ. In the final step, we construct, based on those prime implicates, solutions to the original problem. We prove that all hypotheses generated in this way satisfy the connection minimality criterion, and that the method is complete for a relevant subclass of connection-minimal hypotheses. We use the SPASS theorem prover [34] as a restricted SOS-resolution [18,35] engine for the computation of prime implicates in a prototype implementation (Sect. 5), and we present an experimental analysis of its performance on a set of biomedical ontologies (Sect. 6). Our results indicate that our method can in many cases be applied in practice to compute connection-minimal hypotheses. A technical report companion of this paper includes all proofs as well as a detailed example of our method as appendices [16].

There are not many techniques that can handle TBox abduction in EL or more expressive DLs [11,26,33]. In [11], instead of a set of abducibles, a set of *justification patterns* is given, into which the solutions have to fit. An arbitrary oracle function is used to decide whether a solution is admissible or not (which may use abducibles, justification patterns, or something else), and it is shown that deciding the existence of hypotheses is tractable. However, different to our approach, they only consider atomic CIs in hypotheses, while we also allow for hypotheses involving conjunction. The setting from [33] also considers EL, and abduction under various minimality notions such as subset minimality and size minimality. It presents practical algorithms, and an evaluation of an implementation for an always-true informativeness oracle (i.e., limited to subset minimality). Different to our approach, it uses an external DL reasoner to decide entailment relationships. In contrast, we present an approach that directly exploits first-order reasoning, and thus has the potential to be generalisable to more expressive DLs.

While dedicated resolution calculi have been used before to solve abduction in DLs [9,26], to the best of our knowledge, the only work that relies on first-order reasoning for DL abduction is [24]. Similar to our approach, it uses SOS-resolution, but to perform ABox abduction for the more expressive DL ALC. Apart from the different problem solved, in contrast to [24] we also provide a semantic characterization of the hypotheses generated by our method. We believe this characterization to be a major contribution of our paper. It provides an intuition of what parsimony is for this problem, independently of one's ease with first-order logic calculi, which should facilitate the adoption of this minimality criterion by the DL community. Thanks to this characterization, our technique is calculus agnostic. Any method to compute prime implicates in first-order logic can be a basis for our abduction technique, without additional theoretical work, which is not the case for [24]. Thus, abduction in EL can benefit from the latest advances in prime implicate generation in first-order logic.

#### **2 Preliminaries**

We first recall the description logic EL and its translation to first-order logic [2], as well as TBox abduction in this logic.

Let N_C and N_R be pairwise disjoint, countably infinite sets of unary predicates called *atomic concepts* and of binary predicates called *roles*, respectively. Generally, we use letters A, B, E, F, ... for atomic concepts, and r for roles, possibly annotated. Letters C, D, possibly annotated, denote EL *concepts*, built according to the syntax rule

$$C ::= \top \mid A \mid C \sqcap C \mid \exists r. C \; .$$

We implicitly represent EL conjunctions as sets, that is, without order, nested conjunctions, or multiple occurrences of a conjunct. We use ⨅{C_1, ..., C_m} to abbreviate C_1 ⊓ ... ⊓ C_m, and identify the empty conjunction (m = 0) with ⊤. An EL *TBox* T is a finite set of *concept inclusions* (CIs) of the form C ⊑ D.

EL is a syntactic variant of a fragment of first-order logic that uses N_C and N_R as predicates. Specifically, TBoxes T and CIs α correspond to closed first-order formulas π(T) and π(α), respectively, while concepts C correspond to open formulas π(C, x) with a free variable x. In particular, we have

$$\begin{aligned} \pi(\top, x) &:= \mathtt{true}, & \pi(\exists r. C, x) &:= \exists y. (r(x, y) \land \pi(C, y)), \\ \pi(A, x) &:= A(x), & \pi(C \sqsubseteq D) &:= \forall x. (\pi(C, x) \to \pi(D, x)), \\ \pi(C \sqcap D, x) &:= \pi(C, x) \land \pi(D, x), & \pi(\mathcal{T}) &:= \bigwedge \{\pi(\alpha) \mid \alpha \in \mathcal{T}\}. \end{aligned}$$
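The translation π above can be sketched directly as a recursive function. The nested-tuple encoding of concepts and the textual formula syntax below are our own choices, not the paper's:

```python
# Sketch of the translation pi from EL to first-order logic, following the
# equations above. A concept is "top", ("atom", A), ("and", C, D), or
# ("exists", r, C); formulas are rendered as plain strings.

def pi_concept(c, x):
    """Translate an EL concept into an open first-order formula over variable x."""
    if c == "top":
        return "true"
    if c[0] == "atom":
        return f"{c[1]}({x})"
    if c[0] == "and":
        return f"({pi_concept(c[1], x)} & {pi_concept(c[2], x)})"
    if c[0] == "exists":
        r, d = c[1], c[2]
        y = x + "'"  # a fresh variable for each quantifier nesting
        return f"exists {y}. ({r}({x},{y}) & {pi_concept(d, y)})"
    raise ValueError(c)

def pi_ci(c, d):
    """Translate a concept inclusion C <= D into a closed formula."""
    return f"forall x. ({pi_concept(c, 'x')} -> {pi_concept(d, 'x')})"
```

For instance, `pi_concept(("exists", "r", ("atom", "A")), "x")` yields `exists x'. (r(x,x') & A(x'))`, mirroring the clause π(∃r.C, x) above.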

As is common, we often omit the ∧ in conjunctions Φ, that is, we identify sets of formulas with the conjunction over them. The notions of a *term* t; an *atom* P(t̄), where t̄ is a sequence of terms; a *positive literal* P(t̄); a *negative literal* ¬P(t̄); and a clause (Horn, definite, positive or negative) are defined as usual for first-order logic, and so are entailment and satisfaction of first-order formulas.

We identify CIs and TBoxes with their translation into first-order logic, and can thus speak of entailment between formulas, CIs and TBoxes. When T ⊨ C ⊑ D for some T, we call C a *subsumee* of D and D a *subsumer* of C. We adhere here to the definition of the word "subsume", "to include or contain something else", although the terminology is reversed in first-order logic. We say two TBoxes T_1, T_2 are *equivalent*, denoted T_1 ≡ T_2, iff T_1 ⊨ T_2 and T_2 ⊨ T_1. For example, {D ⊑ C_1, ..., D ⊑ C_n} ≡ {D ⊑ C_1 ⊓ ... ⊓ C_n}. It is well known that, due to the absence of concept negation, every EL TBox is consistent.

The abduction problem we are concerned with in this paper is the following:

**Definition 1.** *An* EL TBox abduction problem *(shortened to* abduction problem*) is a tuple* ⟨T, Σ, C_1 ⊑ C_2⟩*, where* T *is a TBox called the* background knowledge*,* Σ *is a set of atomic concepts called the* abducible signature*, and* C_1 ⊑ C_2 *is a CI called the* observation*, s.t.* T ⊭ C_1 ⊑ C_2*. A solution to this problem is a TBox*

$$\mathcal{H} \subseteq \{ A\_1 \sqcap \dots \sqcap A\_n \sqsubseteq B\_1 \sqcap \dots \sqcap B\_m \mid \{ A\_1, \dots, A\_n, B\_1, \dots, B\_m \} \subseteq \Sigma \} $$

*where* m > 0*,* n ≥ 0*, and such that* T ∪ H ⊨ C_1 ⊑ C_2 *and, for all CIs* α ∈ H*,* T ⊭ α*. A solution to an abduction problem is called a* hypothesis*.*

For example, H_a1 and H_a2 are solutions for ⟨T_a, Σ, α_a⟩, as long as Σ contains all the atomic concepts that occur in them. Note that in our setting, as in [6,33], concept inclusions in a hypothesis are *flat*, i.e., they contain no existential role restrictions. While this restricts the solution space for a given problem, it is possible to bypass this limitation in a targeted way, by introducing fresh atomic concepts equivalent to a concept of interest. We exclude the consistency requirement T ∪ H ⊭ ⊥ that is given in other definitions of DL abduction problems [25], since EL TBoxes are always consistent. We also allow m > 1 instead of the usual m = 1. This produces the same hypotheses modulo equivalence.
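The conditions of Definition 1 can be checked mechanically. The following runnable sketch (our own illustration, not the paper's implementation) decides entailment between atomic concepts with the standard EL completion algorithm of Baader et al. [2], and uses it to verify that H_a1 is a solution for the academia problem. The axiom encoding and the fresh names X1, X2, X3, obtained by hand-normalizing T_a, are our own:

```python
# EL subsumption via completion: saturate subsumer sets S[A] and role pairs
# R[r] under the completion rules for normalized axioms:
#   ("sub", A, B)       A <= B            ("conj", A1, A2, B)  A1 and A2 <= B
#   ("ex_l", r, A, B)   exists r.A <= B   ("ex_r", A, r, B)    A <= exists r.B

def saturate(tbox):
    atoms = set()
    for ax in tbox:
        if ax[0] in ("sub", "conj"):
            atoms.update(ax[1:])
        elif ax[0] == "ex_l":
            atoms.update((ax[2], ax[3]))
        else:  # "ex_r"
            atoms.update((ax[1], ax[3]))
    S = {a: {a} for a in atoms}   # S[A] = known subsumers of A
    R = {}                        # R[r] = known pairs (A, B) with A <= exists r.B
    changed = True
    while changed:
        changed = False
        for ax in tbox:
            if ax[0] == "sub":
                _, a, b = ax
                for s in S.values():
                    if a in s and b not in s:
                        s.add(b); changed = True
            elif ax[0] == "conj":
                _, a1, a2, b = ax
                for s in S.values():
                    if a1 in s and a2 in s and b not in s:
                        s.add(b); changed = True
            elif ax[0] == "ex_r":
                _, a, r, b = ax
                for x, s in S.items():
                    if a in s and (x, b) not in R.get(r, set()):
                        R.setdefault(r, set()).add((x, b)); changed = True
            else:  # "ex_l"
                _, r, a, b = ax
                for x, y in list(R.get(r, set())):
                    if a in S[y] and b not in S[x]:
                        S[x].add(b); changed = True
    return S

def entails(tbox, c1, c2):
    """Decide T |= C1 <= C2 for atomic concepts occurring in tbox."""
    return c2 in saturate(tbox).get(c1, {c1})

def is_solution(tbox, hypothesis, c1, c2):
    """Definition 1 for hypotheses made of atomic CIs ("sub", A, B)."""
    return (entails(tbox + hypothesis, c1, c2)
            and all(not entails(tbox, a, b) for (_, a, b) in hypothesis))

Ta = [("ex_l", "employment", "ResearchPosition", "X1"),
      ("ex_l", "qualification", "Diploma", "X2"),
      ("conj", "X1", "X2", "Researcher"),
      ("ex_l", "writes", "ResearchPaper", "Researcher"),
      ("ex_r", "Doctor", "qualification", "PhD"),
      ("sub", "Professor", "Doctor"),
      ("ex_r", "Professor", "employment", "Chair"),
      ("ex_l", "employment", "Chair", "X3"),
      ("conj", "Doctor", "X3", "Professor"),
      ("ex_r", "FundsProvider", "writes", "GrantApplication")]
Ha1 = [("sub", "Chair", "ResearchPosition"), ("sub", "PhD", "Diploma")]
```

With this encoding, `entails(Ta, "Professor", "Researcher")` is false, while `is_solution(Ta, Ha1, "Professor", "Researcher")` holds, matching the academia example.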

For simplicity, we assume in the following that the concepts C_1 and C_2 in the abduction problem are atomic. We can always introduce fresh atomic concepts A_1 and A_2 with A_1 ≡ C_1 and C_2 ≡ A_2 to solve the problem for complex concepts.

Common minimality criteria include *subset* minimality, *size* minimality and *semantic* minimality, which respectively favor H over H′ if: H ⊊ H′; the number of atomic concepts in H is smaller than in H′; and H′ ⊨ H but H ⊭ H′.
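As a minimal sketch of how the first two criteria compare candidates, assuming hypotheses encoded as sets of CI strings (our own encoding):

```python
# Subset minimality: keep the hypotheses with no strict subset among the
# candidates. Size minimality: keep those of smallest cardinality.

def subset_minimal(hypotheses):
    hs = [frozenset(h) for h in hypotheses]
    return [h for h in hs if not any(g < h for g in hs)]

def size_minimal(hypotheses):
    m = min(len(h) for h in hypotheses)
    return [set(h) for h in hypotheses if len(h) == m]
```

Semantic minimality would additionally need an entailment check between hypotheses, so it is not a purely syntactic comparison.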

### **3 Connection-Minimal Abduction**

To address the lack of parsimony of common minimality criteria, illustrated in the academia example, we introduce *connection* minimality. Intuitively, connection minimality only accepts those hypotheses that ensure that every CI in the hypothesis is connected to both C_1 and C_2 in T, as is the case for H_a1 in the academia example. The definition of connection minimality is based on the following ideas: 1) Hypotheses for the abduction problem should create a *connection* between C_1 and C_2, which can be seen as a concept D that satisfies T ∪ H ⊨ C_1 ⊑ D and T ∪ H ⊨ D ⊑ C_2. 2) To ensure parsimony, we want this connection to be based on concepts D_1 and D_2 for which we already have T ⊨ C_1 ⊑ D_1 and T ⊨ D_2 ⊑ C_2. This prevents the introduction of unrelated concepts in the hypothesis. Note however that D_1 and D_2 can be complex, thus the connection from C_1 to D_1 (resp. D_2 to C_2) can be established by arbitrarily long chains of concept inclusions. 3) We additionally want to make sure that the connecting concepts are not more complex than necessary, and that H only contains CIs that directly connect parts of D_2 to parts of D_1 by closely following their structure.

To address point 1), we simply introduce connecting concepts formally.

**Definition 2.** *Let* C_1 *and* C_2 *be concepts. A concept* D connects C_1 *to* C_2 *in* T *if and only if* T ⊨ C_1 ⊑ D *and* T ⊨ D ⊑ C_2*.*

Note that if T ⊨ C_1 ⊑ C_2, then both C_1 and C_2 are connecting concepts from C_1 to C_2, and if T ⊭ C_1 ⊑ C_2, the case of interest, neither of them is.

To address point 2), we must capture *how* a hypothesis creates the connection between the concepts C_1 and C_2. As argued above, this is established via concepts D_1 and D_2 that satisfy T ⊨ C_1 ⊑ D_1 and T ⊨ D_2 ⊑ C_2. Note that having only two concepts D_1 and D_2 is exactly what makes the approach parsimonious. If there was only one concept, C_1 and C_2 would already be connected, and as soon as there are more than two concepts, hypotheses start becoming more arbitrary: for a very simple example with unrelated concepts, assume given a TBox that entails Lion ⊑ Felidae, Mammal ⊑ Animal and House ⊑ Building. A possible hypothesis to explain Lion ⊑ Animal is {Felidae ⊑ House, Building ⊑ Mammal}, but this explanation is more arbitrary than {Felidae ⊑ Mammal}, as is the case when comparing H_a2 with H_a1 in the academia example, because of the lack of connection of House ⊑ Building with both Lion and Animal. Clearly this CI could be replaced by any other CI entailed by T, which is what we want to avoid.

We can represent the structure of D<sup>1</sup> and D<sup>2</sup> in graphs by using EL *description trees*, originally from Baader et al. [3].

**Definition 3.** *An* EL description tree *is a finite labeled tree* T = (V, E, v_0, l) *where* V *is a set of nodes with root* v_0 ∈ V*, the nodes* v ∈ V *are labeled with* l(v) ⊆ N_C*, and the (directed) edges* vrw ∈ E *are such that* v, w ∈ V *and are labeled with* r ∈ N_R*.*

Given a tree T = (V, E, v_0, l) and v ∈ V, we denote by T(v) the subtree of T that is rooted in v. If l(v_0) = {A_1, ..., A_k} and v_1, ..., v_n are all the children of v_0, we

**Fig. 1.** Description trees of D_1 (left) and D_2 (right).

can define the concept represented by T recursively using C_T = A_1 ⊓ ... ⊓ A_k ⊓ ∃r_1.C_{T(v_1)} ⊓ ... ⊓ ∃r_n.C_{T(v_n)}, where for j ∈ {1, ..., n}, v_0 r_j v_j ∈ E. Conversely, we can define T_C for a concept C = A_1 ⊓ ... ⊓ A_k ⊓ ∃r_1.C_1 ⊓ ... ⊓ ∃r_n.C_n inductively based on the pairwise disjoint description trees T_{C_i} = (V_i, E_i, v_i, l_i), i ∈ {1, ..., n}. Specifically, T_C = (V_C, E_C, v_0, l_C), where

$$\begin{array}{ll} V\_C = \{v\_0\} \cup \bigcup\_{i=1}^n V\_i, & l\_C(v) = l\_i(v) \text{ for } v \in V\_i, \\ E\_C = \{v\_0 r\_i v\_i \mid 1 \le i \le n\} \cup \bigcup\_{i=1}^n E\_i, & l\_C(v\_0) = \{A\_1, \dots, A\_k\}. \end{array}$$
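The two translations C_T and T_C can be sketched as follows, assuming our own encodings: a concept is a nested term as in the syntax rule of Sect. 2, and a tree node is a pair of a label set and a list of role-labeled subtrees:

```python
# T_C: build a description tree (label_set, [(role, subtree), ...]) from a
# nested concept "top" | ("atom", A) | ("and", C, D) | ("exists", r, C).
def tree_of(c):
    if c == "top":
        return (set(), [])
    if c[0] == "atom":
        return ({c[1]}, [])
    if c[0] == "and":
        l1, e1 = tree_of(c[1])
        l2, e2 = tree_of(c[2])
        return (l1 | l2, e1 + e2)   # merge labels and edges at the root
    if c[0] == "exists":
        return (set(), [(c[1], tree_of(c[2]))])
    raise ValueError(c)

# C_T: read back the concept represented by a tree, as a nested term.
def concept_of(t):
    label, edges = t
    parts = [("atom", a) for a in sorted(label)]
    parts += [("exists", r, concept_of(sub)) for r, sub in edges]
    if not parts:
        return "top"
    c = parts[0]
    for p in parts[1:]:
        c = ("and", c, p)
    return c
```

On the concept D_1 = ∃employment.Chair ⊓ ∃qualification.PhD of the academia example, `tree_of` yields an unlabeled root with two labeled children, and `concept_of` inverts it.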

If T = ∅, then subsumption between EL concepts is characterized by the existence of a homomorphism between the corresponding description trees [3]. We generalise this notion to also take the TBox into account.

**Definition 4.** *Let* T_1 = (V_1, E_1, v_0, l_1) *and* T_2 = (V_2, E_2, w_0, l_2) *be two description trees and* T *a TBox. A mapping* φ : V_2 → V_1 *is a* T-homomorphism *from* T_2 *to* T_1 *if and only if the following conditions are satisfied:*

*1.* φ(w_0) = v_0*,*
*2.* φ(v)rφ(w) ∈ E_1 *for all* vrw ∈ E_2*, and*
*3. for every* v ∈ V_1 *and* w ∈ V_2 *with* v = φ(w)*,* T ⊨ ⨅l_1(v) ⊑ ⨅l_2(w)*.*

*If only 1 and 2 are satisfied, then* φ *is called a* weak *homomorphism.*

T-homomorphisms for a given TBox T capture subsumption w.r.t. T. If there exists a T-homomorphism φ from T_2 to T_1, then T ⊨ C_{T_1} ⊑ C_{T_2}. This can be shown easily by structural induction using the definitions [16]. The weak homomorphism is the structure on which a T-homomorphism can be built by adding some hypothesis H to T. It is used to reveal missing links between a subsumee D_2 of C_2 and a subsumer D_1 of C_1 that can be added using H.

*Example 5.* Consider the concepts

$$\begin{aligned} D_1 &= \exists \mathsf{employment}.\mathsf{Chair} \sqcap \exists \mathsf{qualification}.\mathsf{PhD} \\ D_2 &= \exists \mathsf{employment}.\mathsf{ResearchPosition} \sqcap \exists \mathsf{qualification}.\mathsf{Diploma} \end{aligned}$$

from the academia example. Figure 1 illustrates description trees for D_1 (left) and D_2 (right). The curved arrows show a weak homomorphism from T_{D_2} to T_{D_1} that can be strengthened into a T-homomorphism for some TBox T that corresponds to the set of CIs in H_a1 ∪ {⊤ ⊑ ⊤}. The figure can also be used to illustrate what we mean by connection minimality: in order to create a connection between D_1 and D_2, we should *only* add the CIs from H_a1 ∪ {⊤ ⊑ ⊤} *unless* they are already entailed by T_a. In practice, this means the weak homomorphism from D_2 to D_1 becomes a (T_a ∪ H_a1)-homomorphism.
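The homomorphism conditions of Definition 4 can be sketched as a recursive search, with condition 3 delegated to an oracle for T ⊨ ⨅l_1(v) ⊑ ⨅l_2(w); the trivial oracle yields a weak homomorphism. The tree encoding and the hard-coded oracle for the academia example are our own illustration:

```python
# Search for phi from t2 to t1 satisfying Definition 4. Trees are
# (label_set, [(role, subtree), ...]); condition 1 holds because we start at
# the roots, condition 2 because each edge of t2 is matched to a same-role
# edge of t1, and condition 3 is checked by the oracle at every node pair.
def homomorphism(t2, t1, oracle=lambda l1, l2: True):
    label2, edges2 = t2
    label1, edges1 = t1
    if not oracle(label1, label2):
        return False
    for r, sub2 in edges2:
        if not any(r == s and homomorphism(sub2, sub1, oracle)
                   for s, sub1 in edges1):
            return False
    return True

# Trees of D_2 and D_1 from Fig. 1.
t_D1 = (set(), [("employment", ({"Chair"}, [])),
                ("qualification", ({"PhD"}, []))])
t_D2 = (set(), [("employment", ({"ResearchPosition"}, [])),
                ("qualification", ({"Diploma"}, []))])

def ha1_oracle(l1, l2):
    """Hard-coded check of T |= conj(l1) <= conj(l2) under Ta u Ha1."""
    sub = {"Chair": {"Chair", "ResearchPosition"}, "PhD": {"PhD", "Diploma"}}
    derivable = set()
    for a in l1:
        derivable |= sub.get(a, {a})
    return set(l2) <= derivable
```

With the trivial oracle, `homomorphism(t_D2, t_D1)` finds the weak homomorphism of Fig. 1; with `ha1_oracle` it becomes a (T_a ∪ H_a1)-homomorphism, while the purely syntactic oracle `set(l2) <= set(l1)` fails.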

To address point 3), we define a partial order ⪯ on concepts, s.t. C ⪯ D if we can turn D into C by removing conjuncts in subexpressions, e.g., ∃r′.B ⪯ ∃r.A ⊓ ∃r′.(B ⊓ B′). Formally, this is achieved by the following definition.

**Definition 6.** *Let* C *and* D *be arbitrary concepts. Then* C ⪯ D *if either:*


We can finally capture our ideas on connection minimality formally.

**Definition 7 (Connection-Minimal Abduction).** *Given an abduction problem* ⟨T, Σ, C_1 ⊑ C_2⟩*, a hypothesis* H *is* connection-minimal *if there exist concepts* D_1 *and* D_2 *built over* Σ ∪ N_R *and a mapping* φ *satisfying each of the following conditions:*


H *is additionally called* packed *if the left-hand sides of the CIs in* H *cannot hold more conjuncts than they do, which is formally stated as: for* H*, there is no* H′ *defined from the same* D_2 *and a* D′_1 *and* φ′ *s.t. there is a node* w ∈ V_2 *for which* l_1(φ(w)) ⊊ l′_1(φ′(w)) *and* l_1(φ(w′)) = l′_1(φ′(w′)) *for* w′ ≠ w*.*

Straightforward consequences of Definition 7 include that φ is a (T ∪ H)-homomorphism from T_{D_2} to T_{D_1} and that D_1 and D_2 are connecting concepts from C_1 to C_2 in T ∪ H, so that T ∪ H ⊨ C_1 ⊑ C_2 as wanted [16]. With the help of Fig. 1 and Example 5, one easily establishes that hypothesis H_a1 is connection-minimal, and even packed. Connection minimality rejects H_a2, as a single T-homomorphism for some T between two concepts D_1 and D_2 would be insufficient: we would need two weak homomorphisms, one linking Professor to FundsProvider and another linking ∃writes.GrantApplication to ∃writes.ResearchPaper.

## **4 Computing Connection-Minimal Hypotheses Using Prime Implicates**

To compute connection-minimal hypotheses in practice, we propose a method based on first-order prime implicates, which can be derived by resolution. We

**Fig. 2.** *EL* abduction using prime implicate generation in FOL.

assume the reader is familiar with the basics of first-order resolution, and do not reintroduce notions of clauses, Skolemization and resolution inferences here (for details, see [4]). In our context, every term is built on variables, denoted x, y, a single constant sk_0, and unary Skolem functions usually denoted sk, possibly annotated. Prime implicates are defined as follows.

**Definition 8 (Prime Implicate).** *Let* Φ *be a set of clauses. A clause* ϕ *is an* implicate *of* Φ *if* Φ ⊨ ϕ*. Moreover,* ϕ *is* prime *if for any other implicate* ϕ′ *of* Φ *s.t.* ϕ′ ⊨ ϕ*, it also holds that* ϕ ⊨ ϕ′*.*

Let Σ ⊆ N_C be a set of unary predicates. Then PI^{g+}_Σ(Φ) denotes the set of all positive ground prime implicates of Φ that only use predicate symbols from Σ ∪ N_R, while PI^{g−}_Σ(Φ) denotes the set of all negative ground prime implicates of Φ that only use predicate symbols from Σ ∪ N_R.

*Example 9.* Given a set of clauses Φ = {A_1(sk_0), ¬B_1(sk_0), ¬A_1(x) ∨ r(x, sk(x)), ¬A_1(x) ∨ A_2(sk(x)), ¬B_2(x) ∨ ¬r(x, y) ∨ ¬B_3(y) ∨ B_1(x)}, the ground prime implicates of Φ for Σ = N_C are, on the positive side, PI^{g+}_Σ(Φ) = {A_1(sk_0), A_2(sk(sk_0)), r(sk_0, sk(sk_0))} and, on the negative side, PI^{g−}_Σ(Φ) = {¬B_1(sk_0), ¬B_2(sk_0) ∨ ¬B_3(sk(sk_0))}. They are implicates because all of them are entailed by Φ. For a ground implicate ϕ, another ground implicate ϕ′ such that ϕ′ ⊨ ϕ and ϕ ⊭ ϕ′ can only be obtained from ϕ by dropping literals. Such an operation does not produce another implicate for any of the clauses presented above as belonging to PI^{g+}_Σ(Φ) and PI^{g−}_Σ(Φ), thus they really are all prime.
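As Example 9 illustrates, for ground, non-tautological clauses ϕ′ ⊨ ϕ holds exactly when the literals of ϕ′ are a subset of those of ϕ, so primality can be checked by subset comparison. A sketch over clauses encoded as sets of signed-literal strings (our own encoding):

```python
# Keep only the implicates not strictly subsumed by another implicate in the
# candidate set: for ground clauses this is exactly primality (Definition 8).
def prime_implicates(implicates):
    cs = [frozenset(c) for c in implicates]
    return [c for c in cs if not any(d < c for d in cs)]
```

For instance, adding the redundant implicate ¬B_1(sk_0) ∨ ¬B_2(sk_0) to the clauses of Example 9 would be filtered out, since ¬B_1(sk_0) is a strict subset of it.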

To generate hypotheses, we translate the abduction problem into a set of first-order clauses, from which we infer prime implicates that we then combine to obtain the result, as illustrated in Fig. 2. In more detail: we first translate the problem into a set Φ of Horn clauses. Prime implicates can be computed using an off-the-shelf tool [13,28] or, in our case, a slight extension of the resolution-based version of the SPASS theorem prover [34] using the set-of-support strategy and some added features described in Sect. 5. Since Φ is Horn, PI^{g+}_Σ(Φ) contains only unit clauses. A final recombination step looks at the clauses in PI^{g−}_Σ(Φ) one after the other. These correspond to candidates for the connecting concepts D_2 of Definition 7. Recombination attempts to match each literal in one such clause with unit clauses from PI^{g+}_Σ(Φ). If such a match is possible, it produces a suitable D_1 to match D_2 and allows the creation of a solution to the abduction problem. The set S contains all the hypotheses thus obtained.

In what follows, we present our translation of abduction problems into first-order logic and formalize the construction of hypotheses from the prime implicates of this translation. We then show how to obtain termination for the prime implicate generation process with soundness and completeness guarantees on the solutions computed.

*Abduction Method.* We assume the EL TBox in the input is in normal form as defined, e.g., by Baader et al. [2]. Thus every CI is of one of the following forms:

$$A \sqsubseteq B \qquad A\_1 \sqcap A\_2 \sqsubseteq B \qquad \exists r. A \sqsubseteq B \qquad A \sqsubseteq \exists r. B$$

where A, A_1, A_2, B ∈ N_C ∪ {⊤}.

The use of normalization is justified by the following lemma.

**Lemma 10.** *For every* EL *TBox* T*, we can compute in polynomial time an* EL *TBox* T′ *in normal form such that for every other TBox* H *and every CI* C ⊑ D *that use only names occurring in* T*, we have* T ∪ H ⊨ C ⊑ D *iff* T′ ∪ H ⊨ C ⊑ D*.*

After the normalisation, we eliminate occurrences of ⊤, replacing this concept everywhere by the fresh atomic concept A_⊤. We furthermore add ∃r.A_⊤ ⊑ A_⊤ and B ⊑ A_⊤ to T for every role r and atomic concept B occurring in T. This simulates the semantics of ⊤ for A_⊤, namely the implicit property that C ⊑ ⊤ holds for any C no matter what the TBox is. In particular, this ensures that whenever there is a positive prime implicate B(t) or r(t, t′), A_⊤(t) also becomes a prime implicate. Note that normalisation and ⊤ elimination extend the signature, and thus potentially the solution space of the abduction problem. This is remedied by intersecting the set of abducible predicates Σ with the signature of the original input ontology. We assume that T is in normal form and without ⊤ in the rest of the paper.

We denote by T^− the result of renaming all atomic concepts A in T using fresh *duplicate* symbols A^−. This renaming is done only on concepts but not on roles, and on C_2 but not on C_1 in the observation. This ensures that the literals in a clause of PI^{g−}_Σ(Φ) all relate to the conjuncts of a ⪯-minimal subsumee of C_2. Without it, some of these conjuncts would not appear in the negative implicates due to the presence of their positive counterparts as atoms in PI^{g+}_Σ(Φ). The translation of the abduction problem ⟨T, Σ, C_1 ⊑ C_2⟩ is defined as the Skolemization of

$$
\pi(\mathcal{T}\uplus\mathcal{T}^-) \land \neg \pi(C\_1 \sqsubseteq C\_2^-),
$$

where sk_0 is used as the unique fresh Skolem constant such that the Skolemization of ¬π(C_1 ⊑ C_2^−) results in {C_1(sk_0), ¬C_2^−(sk_0)}. This translation is usually denoted Φ and always considered in clausal normal form.
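The clausal translation of a normalized TBox together with the Skolemized negated observation can be sketched as follows. The string syntax for literals is our own, and the duplicated copy T^− (with renamed concept predicates) is built the same way and omitted for brevity:

```python
# Translate normalized axioms into Horn clauses, with one fresh unary Skolem
# function per A <= exists r.B axiom, yielding clauses shaped like Example 9.
#   ("sub", A, B)       A <= B            ("conj", A1, A2, B)  A1 and A2 <= B
#   ("ex_l", r, A, B)   exists r.A <= B   ("ex_r", A, r, B)    A <= exists r.B
def clausify(tbox, c1, c2):
    clauses, fresh = [], [0]
    def sk():
        fresh[0] += 1
        return f"sk{fresh[0]}"
    for ax in tbox:
        if ax[0] == "sub":
            _, a, b = ax
            clauses.append([f"-{a}(x)", f"{b}(x)"])
        elif ax[0] == "conj":
            _, a1, a2, b = ax
            clauses.append([f"-{a1}(x)", f"-{a2}(x)", f"{b}(x)"])
        elif ax[0] == "ex_l":
            _, r, a, b = ax
            clauses.append([f"-{r}(x,y)", f"-{a}(y)", f"{b}(x)"])
        else:  # "ex_r": two clauses sharing the same Skolem function
            _, a, r, b = ax
            f = sk()
            clauses.append([f"-{a}(x)", f"{r}(x,{f}(x))"])
            clauses.append([f"-{a}(x)", f"{b}({f}(x))"])
    # Skolemized negated observation over the single constant sk0; the
    # renamed right-hand side carries the duplicate marker "-".
    clauses.append([f"{c1}(sk0)"])
    clauses.append([f"-{c2}-(sk0)"])
    return clauses
```

For example, `clausify([("ex_r", "A", "r", "C")], "A", "B")` produces the clauses ¬A(x) ∨ r(x, sk1(x)) and ¬A(x) ∨ C(sk1(x)), plus the observation units A(sk_0) and ¬B^−(sk_0).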

**Theorem 11.** *Let* ⟨T, Σ, C_1 ⊑ C_2⟩ *be an abduction problem and* Φ *be its first-order translation. Then, a TBox* H *is a packed connection-minimal solution to the problem if and only if an equivalent hypothesis* H′ *can be constructed from non-empty sets* A *and* B *of atoms verifying:*


We call the hypotheses that are constructed as in Theorem 11 *constructible*. This theorem states that every packed connection-minimal hypothesis is equivalent to a constructible hypothesis and vice versa. A constructible hypothesis is built from the concepts in *one* negative prime implicate in PI^{g−}_Σ(Φ) and *all* matching concepts from prime implicates in PI^{g+}_Σ(Φ). The matching itself is determined by the Skolem terms that occur in all these clauses. The subterm relation between the terms of the clauses in PI^{g+}_Σ(Φ) and PI^{g−}_Σ(Φ) is the same as the ancestor relation in the description trees of subsumers of C_1 and subsumees of C_2, respectively. The terms matching in positive and negative prime implicates allow us to identify where the missing entailments between a subsumer D_1 of C_1 and a subsumee D_2 of C_2 are. These missing entailments become the constructible H′. The condition C_{B,t} ⋠ C_{A,t} is a way to write that C_{A,t} ⊑ C_{B,t} is not a tautology, which can be tested by subset inclusion.

The formal proof of this result is detailed in the technical report [16]. We sketch it briefly here. To start, we link the subsumers of C_1 with PI^{g+}_Σ(Φ). This is done at the semantic level: we show that all Herbrand models of Φ, i.e., models built on the symbols in Φ, are also models of PI^{g+}_Σ(Φ), which is itself such a model. Then we show that C_1(sk_0) as well as the formulas corresponding to the subsumers of C_1 in our translation are satisfied by all Herbrand models. This follows from the fact that Φ is in fact a set of Horn clauses. Next, we show, using a similar technique, how duplicate negative ground implicates, not necessarily prime, relate to subsumees of C_2, with the restriction that there must exist a weak homomorphism from a description tree of a subsumer of C_1 to a description tree of the considered subsumee of C_2. Thus, H provides the missing CIs that turn the weak homomorphism into a (T ∪ H)-homomorphism. Then, we establish an equivalence between the ⪯-minimality of the subsumee of C_2 and the primality of the corresponding negative implicate. Packability is the last aspect we deal with, whose use is purely limited to the reconstruction. It holds because A contains all A(t) ∈ PI^{g+}_Σ(Φ) for all terms t occurring in B.

*Example 12.* Consider the abduction problem ⟨T_a, Σ, α_a⟩ where Σ contains all concepts from T_a. For the translation Φ of this problem, we have

$$\begin{aligned} \mathcal{PI}^{g+}_{\Sigma}(\Phi) &= \{ \mathsf{Professor}(\mathsf{sk}_0), \mathsf{Doctor}(\mathsf{sk}_0), \mathsf{Chair}(\mathsf{sk}_1(\mathsf{sk}_0)), \mathsf{PhD}(\mathsf{sk}_2(\mathsf{sk}_0)) \}, \\ \mathcal{PI}^{g-}_{\Sigma}(\Phi) &= \{ \neg\mathsf{Researcher}^-(\mathsf{sk}_0), \\ &\qquad \neg\mathsf{ResearchPosition}^-(\mathsf{sk}_1(\mathsf{sk}_0)) \lor \neg\mathsf{Diploma}^-(\mathsf{sk}_2(\mathsf{sk}_0)) \} \end{aligned}$$

where sk_1 is the Skolem function introduced for Professor ⊑ ∃employment.Chair and sk_2 is the one introduced for Doctor ⊑ ∃qualification.PhD. This leads to two constructible solutions: {Professor ⊓ Doctor ⊑ Researcher} and H_a1, which are both packed connection-minimal hypotheses if Σ = N_C. Another example is presented in full detail in the technical report [16].
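The recombination step can be sketched on the data of Example 12: for each Skolem term t occurring in a chosen negative prime implicate, the matching positive unit implicates over t form the left-hand side and the negated atoms over t the right-hand side of one CI. Atoms are encoded as (concept, term) pairs with the duplicate markers already stripped; this encoding is our own:

```python
# Build one hypothesis from one negative prime implicate and all matching
# positive units, grouping literals by their Skolem term; a CI is a pair
# (lhs_conjuncts, rhs_conjuncts) of frozensets of atomic concepts.
def recombine(negative_clause, positive_units):
    hypothesis = set()
    for t in {t for _, t in negative_clause}:
        lhs = frozenset(a for a, s in positive_units if s == t)
        rhs = frozenset(b for b, s in negative_clause if s == t)
        if not rhs <= lhs:   # skip tautological CIs
            hypothesis.add((lhs, rhs))
    return hypothesis

pos = {("Professor", "sk0"), ("Doctor", "sk0"),
       ("Chair", "sk1(sk0)"), ("PhD", "sk2(sk0)")}
neg1 = [("Researcher", "sk0")]
neg2 = [("ResearchPosition", "sk1(sk0)"), ("Diploma", "sk2(sk0)")]
```

On this data, `recombine(neg1, pos)` yields {Professor ⊓ Doctor ⊑ Researcher} and `recombine(neg2, pos)` yields H_a1, the two constructible solutions of Example 12.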

*Termination.* If T contains cycles, there can be infinitely many prime implicates. For example, for T = {C_1 ⊑ A, A ⊑ ∃r.A, ∃r.B ⊑ B, B ⊑ C_2}, both the positive and negative ground prime implicates of Φ are unbounded, even though the set of constructible hypotheses is finite (as it is for any abduction problem):

$$\begin{split} \mathcal{PI}^{g+}_{\Sigma}(\Phi) &= \{C_1(\mathsf{sk}_0), A(\mathsf{sk}_0), A(\mathsf{sk}(\mathsf{sk}_0)), A(\mathsf{sk}(\mathsf{sk}(\mathsf{sk}_0))), \dots \}, \\ \mathcal{PI}^{g-}_{\Sigma}(\Phi) &= \{\neg C_2^-(\mathsf{sk}_0), \neg B^-(\mathsf{sk}_0), \neg B^-(\mathsf{sk}(\mathsf{sk}_0)), \dots \}. \end{split}$$

To find all constructible hypotheses of an abduction problem, an approach that simply computes all prime implicates of Φ, e.g., using the standard resolution calculus, will never terminate on cyclic problems. However, if we look only for subset-minimal constructible hypotheses, termination can be achieved for cyclic and non-cyclic problems alike, because it is possible to construct all such hypotheses from prime implicates that have a polynomially bounded term depth, as shown below. To obtain this bound, we consider resolution derivations of the ground prime implicates and we show that they can be done under some restrictions that imply this bound.

Before performing resolution, we compute the *presaturation* Φp *of the set of clauses* Φ, defined as

$$\Phi\_p = \Phi \cup \{ \neg A(x) \lor B(x) \mid \Phi \models \neg A(x) \lor B(x) \},$$

where A and B are either both original or both duplicate atomic concepts. The presaturation can be computed efficiently before the translation, using a modern EL reasoner such as Elk [23], which is highly optimized towards computing all entailments of the form A ⊑ B. While the presaturation derives nothing that a resolution procedure could not, it is what allows us to bound the maximal depth of terms in inferences by the maximal depth of terms in prime implicates: if Φp is presaturated, we do not need to perform inferences that produce Skolem terms of a higher nesting depth than what is needed for the prime implicates.
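For the restricted case where the TBox contains only atomic concept inclusions, the entailments of the form A ⊑ B needed for the presaturation amount to a transitive closure of the inclusion relation. A minimal sketch under that assumption (a real implementation would delegate this to Elk, as described above; the concept names below are hypothetical):

```python
# Sketch: presaturation restricted to atomic inclusions A ⊑ B,
# represented as pairs (A, B). Entailments of this shape are then
# exactly the transitive closure of the inclusion relation.

def presaturate(inclusions):
    """Return the transitive closure of a set of pairs (A, B),
    each meaning the concept inclusion A ⊑ B."""
    closure = set(inclusions)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

tbox = {("Professor", "Employee"), ("Employee", "Person")}
print(sorted(presaturate(tbox)))
# the closure additionally contains ("Professor", "Person")
```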

Starting from the presaturated set Φp, we can show that all the relevant prime implicates can be computed if we restrict all inferences to those where


The first restriction turns the derivation of PI^g+_Σ(Φ) and PI^g−_Σ(Φ) into an SOS resolution derivation [18] with set of support {C1(sk0), C2^−(sk0)}, i.e., the only two clauses with ground terms in Φ. This restriction is a straightforward consequence of our interest in computing only ground implicates, and of the fact that the non-ground clauses in Φ cannot entail the empty clause, since every EL TBox is consistent. The other restrictions are consequences of the following theorems, whose proofs are available in the technical report [16].

**Theorem 13.** *Given an abduction problem and its translation* Φ*, every constructible hypothesis can be built from prime implicates that are inferred under restriction 4.*

In fact, for PI^g+_Σ(Φ) it is even possible to restrict inferences to generating only ground resolvents, as can be seen in the proof of Theorem 13, which directly looks at the kinds of clauses that are derivable by resolution from Φ.

**Theorem 14.** *Given an abduction problem and its translation* Φ*, every subset-minimal constructible hypothesis can be built from prime implicates that have a nesting depth of at most* n × m*, where* n *is the number of atomic concepts in* Φ*, and* m *is the number of occurrences of existential role restrictions in* T*.*

The proof of Theorem 14 is based on a structure called a *solution tree*, which resembles a description tree, but with multiple labeling functions. It assigns to each node a Skolem term, a set of atomic concepts called the *positive label*, and a single atomic concept called the *negative label*. The nodes correspond to matching partners in a constructible hypothesis: the Skolem term is the term on which we match literals, and the positive label collects the atomic concepts in the positive prime implicates containing that term. The maximal anti-chains of the tree, i.e., the maximal subsets of nodes s.t. no node is the ancestor of another, are such that their negative labels correspond to the literals in a derivable negative implicate. For every solution tree, the Skolem labels and negative labels of the leaves determine a negative prime implicate, and by combining the positive and negative labels of these leaves, we obtain a constructible hypothesis, called the *solution* of the tree. We show that from every solution tree with solution H we can obtain a solution tree with solution H′ ⊆ H s.t. on no path there are two nodes that agree both on the head of their Skolem labeling and on the negative label. Furthermore, the number of head functions of Skolem labels is bounded by the number m of Skolem functions, while the number of distinct negative labels is bounded by the number n of atomic concepts, bounding the depth of the solution tree for H′ at n × m. This justifies the bound in Theorem 14. This bound is rather loose: for the academia example, it is equal to 22 × 6 = 132.

# **5 Implementation**

We implemented our method to compute all subset-minimal constructible hypotheses in the tool CAPI.<sup>3</sup> To compute the prime implicates, we used SPASS [34], a first-order theorem prover that includes resolution among other calculi. We implemented everything before and after the prime implicate computation in Java, including the parsing of ontologies, preprocessing (detailed below), clausification of the abduction problems, translation to SPASS input, as well as the parsing and processing of the output of SPASS to build the constructible hypotheses and filter out the non-subset-minimal ones. On the Java side, we used the OWL API for all DL-related functionalities [20], and the EL reasoner Elk for computing the presaturations [23].

<sup>3</sup> Available at https://lat.inf.tu-dresden.de/∼koopmann/CAPI.

*Preprocessing.* Since realistic TBoxes can be too large to be processed by SPASS, we replace the background knowledge in the abduction problem by a subset of axioms relevant to the abduction problem. Specifically, we replace the abduction problem (T, Σ, C1 ⊑ C2) by the abduction problem ($\mathcal{M}^{\bot}_{C_1} \cup \mathcal{M}^{\top}_{C_2}$, Σ, C1 ⊑ C2), where $\mathcal{M}^{\bot}_{C_1}$ is the ⊥-*module* of T for the signature of C1, and $\mathcal{M}^{\top}_{C_2}$ is the ⊤-*module* of T for the signature of C2 [15]. Those notions are explained in the technical report [16]. Their relevant properties are that $\mathcal{M}^{\bot}_{C_1}$ is a subset of T s.t. $\mathcal{M}^{\bot}_{C_1}$ |= C1 ⊑ D iff T |= C1 ⊑ D for all concepts D, while $\mathcal{M}^{\top}_{C_2}$ is a subset of T that ensures $\mathcal{M}^{\top}_{C_2}$ |= D ⊑ C2 iff T |= D ⊑ C2 for all concepts D. It immediately follows that every connection-minimal hypothesis for the original problem (T, Σ, C1 ⊑ C2) is also a connection-minimal hypothesis for ($\mathcal{M}^{\bot}_{C_1} \cup \mathcal{M}^{\top}_{C_2}$, Σ, C1 ⊑ C2). For the presaturation, we compute with Elk all CIs of the form A ⊑ B s.t. $\mathcal{M}^{\bot}_{C_1} \cup \mathcal{M}^{\top}_{C_2}$ |= A ⊑ B.
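As a rough illustration only, a signature-reachability over-approximation of module extraction can be sketched as follows; the actual ⊥- and ⊤-modules of [15] are locality-based, and the axiom encoding and concept names here are hypothetical:

```python
# Sketch: an axiom is a pair (lhs_sig, rhs_sig) of frozensets of
# concept/role names; an axiom is pulled into the module whenever its
# left-hand-side signature intersects the signature collected so far.
# This is NOT the locality-based module construction of [15], only a
# rough stand-in for illustration.

def reachable_module(axioms, seed_signature):
    module, sig = [], set(seed_signature)
    changed = True
    while changed:
        changed = False
        for ax in axioms:
            lhs, rhs = ax
            if ax not in module and lhs & sig:
                module.append(ax)
                sig |= lhs | rhs
                changed = True
    return module

axioms = [(frozenset({"Professor"}), frozenset({"Employee"})),
          (frozenset({"Employee"}), frozenset({"Person"})),
          (frozenset({"Car"}), frozenset({"Vehicle"}))]
print(reachable_module(axioms, {"Professor"}))  # first two axioms only
```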

*Prime implicates generation.* We rely on a slightly modified version of SPASS v3.9 to compute all ground prime implicates. In particular, we added the possibility to limit the number of variables allowed in the resolvents to enforce **R2**. For each of the restrictions **R1**–**R3** there is a corresponding flag (or set of flags) that is passed to SPASS as an argument.

*Recombination.* The construction of hypotheses from the prime implicates found in the previous stage starts with a straightforward process of matching negative prime implicates with a set of positive ones based on their Skolem terms. It is followed by subset minimality tests to discard non-subset-minimal hypotheses, since, with the bound we enforce, there is no guarantee that these are valid constructible hypotheses because the negative ground implicates they are built upon may not be prime. If SPASS terminates due to a timeout instead of reaching the bound, then it is possible that some subset-minimal constructible hypotheses are not found, and thus, some non-constructible hypotheses may be kept. Note that these are in any case solutions to the abduction problem.
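The matching step above can be sketched as follows, under a simplified, hypothetical data format: a negative prime implicate is a list of (concept, Skolem term) pairs, positive implicates are indexed by Skolem term, and a final filter discards hypotheses that are proper supersets of others:

```python
# Sketch of the recombination stage. Each matched pair yields a CI
# (conjunction-of-positive-concepts ⊑ negated-concept); a hypothesis
# is a frozenset of such CIs, and non-subset-minimal hypotheses are
# filtered out at the end. Data format and names are hypothetical.

def recombine(negative_implicates, positive_by_term):
    hypotheses = []
    for neg in negative_implicates:
        # match only if every term of the negative implicate also
        # occurs in some positive implicate
        if all(term in positive_by_term for (_, term) in neg):
            hypothesis = frozenset(
                (frozenset(positive_by_term[term]), concept)
                for (concept, term) in neg)
            hypotheses.append(hypothesis)
    # keep only subset-minimal hypotheses
    return [h for h in hypotheses if not any(g < h for g in hypotheses)]

positives = {"sk0": {"Professor", "Doctor"}}
negatives = [[("Researcher", "sk0")],
             [("Researcher", "sk0"), ("Person", "sk0")]]
print(recombine(negatives, positives))  # only the one-CI hypothesis survives
```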

# **6 Experiments**

There is no benchmark suite dedicated to TBox abduction in EL, so we created our own, using realistic ontologies from the bio-medical domain. For this, we used ontologies from the 2017 snapshot of Bioportal [27]. We restricted each ontology to its EL fragment by filtering out unsupported axioms, where we replaced domain axioms and n-ary equivalence axioms in the usual way [2]. Note that, even if the ontology contains more expressive axioms, an EL hypothesis is still useful if found. From the resulting set of TBoxes, we selected those containing at least 1 and at most 50,000 axioms, resulting in a set of 387 EL TBoxes. Precisely, they contained between 2 and 46,429 axioms, for an average of 3,039 and a median of 569. Towards obtaining realistic benchmarks, we created three different categories of abduction problems for each ontology T , where in each case, we used the signature of the entire ontology for Σ.


All experiments were run on Debian Linux (Intel Core i5-4590, 3.30 GHz, 23 GB Java heap size). The code and scripts used in the experiments are available online [17]. The three phases of the method (see Fig. 2) were each assigned a hard time limit of 90 s.

For each ontology, we attempted to create and translate 5 abduction problems of each category. This failed on some ontologies because either there was no corresponding entailment (25/28/25 failures out of the 387 ontologies for ORIGIN/JUSTIF/REPAIR), there was a timeout during the translation (5/5/5 failures for ORIGIN/JUSTIF/REPAIR), or because the computation of justifications caused an exception (-/2/0 failures for ORIGIN/JUSTIF/REPAIR). The final number of abduction problems for each category is in the first column of Table 1.

We then attempted to compute prime implicates for these benchmarks using SPASS. In addition to the hard time limit, we gave a soft time limit of 30 s to SPASS, after which it stops exploring the search space and returns the implicates already found. In Table 1 we show, for each category, the percentage of problems on which SPASS succeeded in computing a non-empty set of clauses (Success) and the percentage of problems on which SPASS terminated within the time limit, meaning that all solutions were computed (Compl.). The high number of CIs in the background knowledge explains most of the cases where SPASS reached the soft time limit. In many of these cases, the bound on the term depth goes into the billions, rendering it useless in practice. However, the "Compl." column shows that the bound is reached before the soft time limit in most cases.

The reconstruction never reached the hard time limit. We measured the median, average, and maximal number of solutions found (#H), size of solutions in number of CIs (|H|), size of CIs from solutions in number of atomic concepts (|α|), and SPASS runtime (time, in seconds), all reported in Table 1. Except for the simple JUSTIF problems, the number of solutions may become very large. At the same time, solutions always contain very few axioms (never more than 3), though the axioms become large too. We also noticed that highly nested Skolem terms rarely lead to more hypotheses being found: 8/1/15 for ORIGIN/JUSTIF/REPAIR, and the largest nesting depth used was 3/1/2 for ORIGIN/JUSTIF/REPAIR. This hints at the fact that longer time limits would not have produced more solutions, and motivates future research into redundancy criteria to stop derivations (much) earlier.

**Table 1.** Evaluation results.

## **7 Conclusion**

We have introduced connection-minimal TBox abduction for EL, which finds parsimonious hypotheses, ruling out the ones that entail the observation in an arbitrary fashion. We have established a formal link between the generation of connection-minimal hypotheses in EL and the generation of prime implicates of a translation Φ of the problem to first-order logic. In addition to obtaining these theoretical results, we developed a prototype for the computation of subset-minimal constructible hypotheses, a subclass of connection-minimal hypotheses that is easy to construct from the prime implicates of Φ. Our prototype uses the SPASS theorem prover as an SOS-resolution engine to generate the needed implicates. We tested this tool on a set of realistic medical ontologies, and the results indicate that the cost of computing connection-minimal hypotheses is high but not prohibitive.

We see several ways to improve our technique. The bound we computed to ensure termination could be advantageously replaced by a redundancy criterion discarding irrelevant implicates long before the bound is reached, thus greatly speeding up the computation in SPASS. We believe it should also be possible to further constrain inferences, e.g., to have them produce ground clauses only, or to generate the prime implicates with terms of increasing depth in a controlled, incremental way instead of enforcing the soft time limit, but these two ideas remain to be proved feasible. As an alternative to using prime implicates, one may investigate direct methods for computing connection-minimal hypotheses in EL.

The theoretical worst-case complexity of connection-minimal abduction is another open question. Our method only gives a very high upper bound: by bounding only the nesting depth of Skolem terms polynomially, as we did with Theorem 14, we may still permit clauses with exponentially many literals, and thus doubly exponentially many clauses in the worst case, which gives us a 2ExpTime upper bound for the problem of computing all subset-minimal constructible hypotheses. Using structure sharing and guessing, it is likely possible to obtain a lower upper bound. We have not yet looked at lower bounds for the complexity either.

While this work focuses on abduction problems where the observation is a CI, we believe that our technique can be generalised to knowledge that also contains ground facts (ABoxes), and to observations that are of the form of conjunctive queries on the ABoxes in such knowledge bases. The motivation for such an extension is to understand why a particular query does not return any results, and to compute a set of TBox axioms that fix this problem. Since our translation already transforms the observation into ground facts, it should be possible to extend it to this setting. We would also like to generalize TBox abduction by finding a reasonable way to allow role restrictions in the hypotheses, and to extend connection-minimality to more expressive DLs such as ALC.

**Acknowledgments.** This work was supported by the Deutsche Forschungsgemeinschaft (DFG), Grant 389792660 within TRR 248.

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Semantic Relevance**

Fajar Haifani1,2 and Christoph Weidenbach1(B)

<sup>1</sup> Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbr¨ucken, Germany

{f.haifani,weidenbach}@mpi-inf.mpg.de <sup>2</sup> Graduate School of Computer Science, Saarbr¨ucken, Germany

**Abstract.** A clause *C* is syntactically relevant in a clause set *N* if it occurs in every refutation of *N*. A clause *C* is syntactically semi-relevant if it occurs in some refutation of *N*. While syntactic relevance coincides with satisfiability (if *C* is syntactically relevant then *N* \ {*C*} is satisfiable), the semantic counterpart for syntactic semi-relevance was not known so far. Using the new notion of a *conflict literal*, we show that for independent clause sets *N* a clause *C* is syntactically semi-relevant in *N* if and only if it adds to the number of conflict literals in *N*. A clause set is independent if no clause out of the clause set is the consequence of different clauses from the clause set.

Furthermore, we relate the notion of relevance to that of a minimally unsatisfiable subset (MUS) of some independent clause set *N*. In propositional logic, a clause *C* is relevant if it occurs in all MUSes of some clause set *N* and semi-relevant if it occurs in some MUS. For first-order logic the characterization needs to be refined with respect to ground instances of *N* and *C*.

### **1 Introduction**

In our previous work [11], we introduced a notion of syntactic relevance based on refutations, while at the same time generalizing the completeness result for resolution with the set-of-support strategy (SOS) [28,33], which serves as a test for it. Our notion of syntactic relevance is useful for explaining why a set of clauses is unsatisfiable. In this paper, we introduce a semantic counterpart of syntactic relevance that sheds further light on the relationship between a clause out of a clause set and the potential refutations of this clause set. In the following Sect. 1.1, we first recall syntactic relevance along with an example, and then proceed to explain it in terms of our new semantic relevance in Sect. 1.2.

#### **1.1 Syntactic Relevance**

Given an unsatisfiable set of clauses N, C ∈ N is *syntactically relevant* if it occurs in all refutations, it is *syntactically semi-relevant* if it occurs in some refutation, and otherwise it is called *syntactically irrelevant*. The clause-based notion of relevance is useful for relating the contribution of a clause to a refutation (goal conjecture). This has in particular been shown in the context of product scenarios built out of construction kits as they are used in the car industry [8,32].

For an illustration of our previous notions and results, we now consider the following unsatisfiable first-order clause set N, where Fig. 1 presents a refutation of N.

$$\begin{aligned} N &= \{(1)A(f(a)) \lor D(x\_3), \\ &(2)\neg D(x\_7), \\ &(3)\neg B(c, a) \lor B(b, f(x\_6)), \\ &(4)B(x\_1, x\_2) \lor C(x\_1), \\ &(5)\neg C(x\_5), \\ &(6)\neg A(x\_4) \lor \neg B(b, x\_4) \} \end{aligned}$$

**Fig. 1.** A refutation of *N* in tree representation

In essence, inferences in an SOS refutation always involve at least one clause in the SOS and put the resulting clause back into it. So, this refutation is not an SOS refutation from the syntactically semi-relevant clause (3) ¬B(c,a) ∨ B(b,f(x6)), because only the shaded part represents an SOS refutation starting with this clause. More specifically, there are two inferences ending in (8) ¬B(b,f(a)), which violates the condition for an SOS refutation. Nevertheless, it can be transformed into an SOS refutation where the clause (3) ¬B(c,a) ∨ B(b,f(x6)) is in the SOS [11], see Fig. 2. Please note that N \ {(3) ¬B(c, a) ∨ B(b, f(x6))} is still unsatisfiable and classical SOS completeness [33] is not sufficient to guarantee the existence of a refutation with SOS {(3) ¬B(c,a) ∨ B(b,f(x6))} [11].

In addition, N \ {(3) ¬B(c, a) ∨ B(b, f(x6))} is also a *minimally unsatisfiable subset* (MUS), where Fig. 3 presents a respective refutation. A MUS is an unsatisfiable clause set such that removing any clause from this set would render it satisfiable. Consequently, a MUS-based notion of semi-relevance defined on the level of the original first-order clauses is not sufficient here. The clause

**Fig. 2.** Semi-relevant clause (3)¬*B*(*c, a*) ∨ *B*(*b, f*(*x*6)) in SOS

(3) ¬B(c, a) ∨ B(b, f(x6)) should not be disregarded, because it leads to a different grounding of the clauses. For example, in the refutation of Fig. 2, clause (5) ¬C(x5) is necessarily instantiated with {x5 → c}, whereas in the refutation of Fig. 3 it is necessarily instantiated with {x5 → b}. Therefore, the two refutations are different and clause (3) ¬B(c, a) ∨ B(b, f(x6)) should be considered semi-relevant. Nevertheless, in propositional logic it is sufficient to consider MUSes to explain unsatisfiability on the original clause level, Lemma 18.

**Fig. 3.** A refutation of *N* without (3)¬*B*(*c, a*) ∨ *B*(*b, f*(*x*6))

#### **1.2 Semantic Relevance**

We now illustrate how our new notion of relevance works on the previous example. First, differently from other works, we propose a way of characterizing semantic relevance using our novel concept of a *conflict literal*. A ground literal L is a conflict literal in a clause set N if there are satisfiable sets of instances N1 and N2 from N s.t. N1 |= L and N2 |= comp(L). On the one hand, explaining an unsatisfiable clause set as the absence of a model (as it is usually defined) is not that helpful, since an absence means there is nothing to discuss in the first place. On the other hand, the contribution of a clause to the unsatisfiability of a clause set can only partially be explained using the concept of a MUS, as discussed before. A conflict literal provides a middle ground between the absence of a model and MUSes for explaining the contribution of a clause to unsatisfiability. It also better reflects our intuition that there is a contradiction (in the form of two implied simple facts that cannot both be true at the same time) in an unsatisfiable set of clauses.
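A brute-force propositional reading of this definition can be sketched as follows (illustrative only: the paper's definition ranges over ground instances of first-order clauses, while here clauses are simply frozensets of integer literals in DIMACS style):

```python
# Sketch: a literal lit is a conflict literal in clause set N if some
# satisfiable subsets N1, N2 of N entail lit and its complement,
# respectively. Clauses are frozensets of nonzero integers
# (negative integer = negated variable).
from itertools import combinations, product

def models(clauses, variables):
    for bits in product([False, True], repeat=len(variables)):
        assign = dict(zip(variables, bits))
        if all(any(assign[abs(l)] == (l > 0) for l in c) for c in clauses):
            yield assign

def entails(clauses, lit, variables):
    ms = list(models(clauses, variables))
    # satisfiable, and every model makes lit true
    return bool(ms) and all(m[abs(lit)] == (lit > 0) for m in ms)

def is_conflict_literal(lit, clauses):
    variables = sorted({abs(l) for c in clauses for l in c})
    subsets = [list(s) for r in range(1, len(clauses) + 1)
               for s in combinations(clauses, r)]
    return (any(entails(s, lit, variables) for s in subsets) and
            any(entails(s, -lit, variables) for s in subsets))

# In N = {p, ¬p ∨ q, ¬q}, q is a conflict literal:
# {p, ¬p ∨ q} |= q and {¬q} |= ¬q, both subsets satisfiable.
N = [frozenset({1}), frozenset({-1, 2}), frozenset({-2})]
print(is_conflict_literal(2, N))  # True
```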

From Fig. 1, we can already see that C(c) and its complement ¬C(c) are conflict literals because

$$\begin{aligned} N \setminus \{\neg C(x)\} &\models C(c), \\ \{\neg C(x)\} &\models \neg C(c). \end{aligned}$$

Also, in addition to {¬C(x)} being trivially satisfiable, N \ {¬C(x)} is also satisfiable. Based on the refutation in Fig. 3, ¬C(x) is syntactically relevant due to N \ {(3) ¬B(c, a) ∨ B(b, f(x6))} being a MUS. We will also show that for a ground MUS any ground literal occurring in it is a conflict literal, Lemma 20. For our ongoing example, it is still possible to identify the conflict literals by means of ground MUSes by looking into the refutations from Fig. 1 and Fig. 3. This leads to the following conflict literals for N, see Definition 10:

$$\begin{aligned} \text{conflict}(N) &= \{ (\neg)A(f(a)),\\ &\qquad (\neg)B(b, f(a)), (\neg)B(c, a),\\ &\qquad (\neg)C(b), (\neg)C(c) \} \quad \cup \\ &\qquad \{ (\neg)D(t) \mid t \text{ is a ground term} \} \end{aligned}$$

These conflict literals can be identified by pushing the substitutions in the refutations from Fig. 1 and Fig. 3 towards the input clauses. They correspond to two first-order MUSes M<sup>1</sup> and M2. All the ground literals are conflict literals and all other ground conflict literals can be obtained by grounding the remaining variables.

$$\begin{aligned} M\_1 &= \{ (5) \neg C(c), (2) \neg D(x\_7), \\ &(1) A(f(a)) \lor D(x\_3), \\ &(3) \neg B(c, a) \lor B(b, f(a)), \\ &(4) B(c, a) \lor C(c), \\ &(6) \neg A(f(a)) \lor \neg B(b, f(a)) \}, \\ M\_2 &= \{ (5) \neg C(b), \\ &(4) B(b, f(a)), (2) \neg D(x\_7), \\ &(1) A(f(a)) \lor D(x\_3), \\ &(6) \neg A(f(a)) \lor \neg B(b, f(a)) \} \end{aligned}$$

One can see that, although (3) ¬B(c, a) ∨ B(b, f(x6)) lies outside of the only MUS on the first-order level, an instance of it does occur in some ground MUS: take M1 with an arbitrary grounding of x3 and x7 to the identical term t, and note that the conflict literal (¬)B(c, a) depends on clause (3). Nevertheless, determining conflict literals is not so obvious in the general case, since we do not necessarily know beforehand which ground terms should substitute the variables in the clauses. Moreover, there can be an infinite number of such ground MUSes of possibly unbounded size.

Based on conflict literals, we introduce here a notion of relevance that is semantic in nature, Definition 16. This will also serve as an alternative characterization of our previous refutation-based syntactic relevance. As redundant clauses, e.g., tautologies, can also be syntactically semi-relevant, we require independent clause sets for the definition of semantic relevance. A clause set is *independent* if it does not contain clauses with instances implied by satisfiable sets of instances of different clauses out of the set. Given an unsatisfiable independent set of clauses N, a clause C is *relevant* in N if N without C has no conflict literals, it is *semi-relevant* if C is necessary for some conflict literals, and it is *irrelevant* otherwise.

Similar to our previous work, relevant clauses are the obvious ones because removing them would make our set satisfiable. On the other hand, irrelevant clauses can be freely identified once we know the semi-relevant ones. For our running example, in fact (3)¬B(c, a) ∨ B(b, f(x6)) is semi-relevant because it is necessary for the conflict literals (¬)C(c) and (¬)B(c, a). More specifically, the set of conflicts for N \ {¬B(c, a) ∨ B(b, f(x6))} does not include (¬)C(c) and (¬)B(c, a):

$$\begin{aligned} \text{conflict}(N \setminus \{\neg B(c, a) \lor B(b, f(x\_6))\}) = {} & \{ (\neg)A(f(a)), (\neg)B(b, f(a)), (\neg)C(b) \} \; \cup \\ & \{ (\neg)D(t) \mid t \text{ is a ground term} \} \end{aligned}$$

These are conflict literals identifiable from M2: assume that the variables x3 and x7 in M2 are both grounded by an identical term t. Take some ground literal, for example, A(f(a)) ∈ conflict(N \ {¬B(c, a) ∨ B(b, f(x6))}), and define

$$\begin{aligned} N\_{\emptyset} &= \{ C \in M\_2 | A(f(a)) \notin C \text{ and } \neg A(f(a)) \notin C \} \\ &= \{ (5) \neg C(b), (4)B(b, f(a)), (2)\neg D(t) \} \\ N\_{A(f(a))} &= \{ C \in M\_2 | A(f(a)) \in C \} \\ &= \{ (1)A(f(a)) \lor D(t) \} \\ N\_{\neg A(f(a))} &= \{ C \in M\_2 | \neg A(f(a)) \in C \} \\ &= \{ (6) \neg A(f(a)) \lor \neg B(b, f(a)) \} \end{aligned}$$

N∅ ∪ NA(f(a)) and N∅ ∪ N¬A(f(a)) are satisfiable because of the Herbrand models {B(b, f(a)), A(f(a))} and {B(b, f(a))}, respectively. In addition,

$$\begin{aligned} N\_{\emptyset} \cup N\_{A(f(a))} &\models A(f(a)), \\ N\_{\emptyset} \cup N\_{\neg A(f(a))} &\models \neg A(f(a)), \end{aligned}$$

because A(f(a)) can be acquired using resolution between (1) and (2) for N<sup>∅</sup> ∪ NA(f(a)) and ¬A(f(a)) can be acquired using resolution between (4) and (6) for N<sup>∅</sup> ∪ N¬A(f(a)). In a similar manner, we can show that the other ground literals are also conflict literals.

*Related Work:* Other works which aim to explain unsatisfiability mostly rely on the notion of MUSes, mainly in propositional logic [14–16,21,26]. The complexity of determining whether a clause set is a MUS is Dp-complete for a propositional clause set with at most three literals per clause and at most three occurrences of each propositional variable [25]. In [14], the set of syntactically semi-relevant clauses for propositional logic is called a *plain clause set*. Using the terminology in [16], a clause C ∈ N is *necessary* if it occurs in all MUSes, it is *potentially necessary* if it occurs in some MUS, and otherwise it is *never necessary*. In addition, a clause is defined to be *usable* if it occurs in some refutation. This is thus similar to our syntactic notion of semi-relevance [11]: given a clause C ∈ N, C is usable if and only if C is syntactically semi-relevant. It is also argued that a usable clause that is not potentially necessary is semantically superfluous. A different but related notion has also been applied to propositional abduction [7]. The notion of a MUS has also been used for explaining unsatisfiability in first-order logic [20]. There, it has been defined in a more general setting: if a set of clauses N is divided into N = N′ ⊎ N″ with a *non-relaxable* clause set N′ and a *relaxable* clause set N″ (which must be satisfiable), a MUS is a subset M of N″ s.t. N′ ∪ M is unsatisfiable but removing a clause from M would render it satisfiable. There are also some works in satisfiability modulo theories (SMT) [5,6,9,35]. A deletion-based approach well-known in propositional logic has also been used for MUS extraction in SMT [9]. In [5,6], a MUS is extracted by combining an SMT solver with an arbitrary external propositional core extractor. Another approach is to construct some graph representing the subformulas of the problem instance, recursively remove clauses in a depth-first-search manner, and additionally use some heuristics to further improve the runtime [35]. For the function-free and equality-free first-order fragment, there is a "decompose-merge" approach to compute all MUSes [19,34]. In description logic, a notion related to a MUS is called a *minimal axiom set* (MinA), usually identified via the problem of axiom pinpointing [1,4,13,30]. Its computation is usually divided into two categories: black-box and white-box. A black-box approach picks some inputs, executes them using some sound and complete reasoner, and then interprets the output [13]. A white-box approach, on the other hand, takes some reasoner and modifies it internally; in this case, tableau calculi are mostly used [1,30]. In addition, the concept of a lean kernel has also been used to approximate the union of such MinAs [27]. The way relevance is defined there is similar in spirit, but it is usually used for an entailment problem instead of unsatisfiability. The notion of syntactic semi-relevance has also been applied to description logics via a translation scheme to first-order logic [10].
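The deletion-based MUS extraction loop mentioned above can be sketched as follows for the propositional case, with a brute-force satisfiability check standing in for a real SAT or SMT solver:

```python
# Sketch: classic deletion-based MUS extraction. Clauses are
# frozensets of nonzero integers (negative = negated variable).
from itertools import product

def is_sat(clauses):
    variables = sorted({abs(l) for c in clauses for l in c})
    for bits in product([False, True], repeat=len(variables)):
        assign = dict(zip(variables, bits))
        if all(any(assign[abs(l)] == (l > 0) for l in c) for c in clauses):
            return True
    return False

def deletion_mus(clauses):
    """Shrink an unsatisfiable clause list to a minimal unsatisfiable
    subset by attempting to drop one clause at a time."""
    mus = list(clauses)
    i = 0
    while i < len(mus):
        candidate = mus[:i] + mus[i + 1:]
        if not is_sat(candidate):
            mus = candidate   # clause i is not needed for unsatisfiability
        else:
            i += 1            # clause i is necessary; keep it
    return mus

# {p}, {¬p}, {p ∨ q} is unsatisfiable; its MUS is {p}, {¬p}.
print(deletion_mus([frozenset({1}), frozenset({-1}), frozenset({1, 2})]))
```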

The paper is organized as follows. Section 2 fixes the notations, definitions and existing results in particular from [11]. Section 3 is reserved for our new notion of semantic relevance. Finally, we conclude our work in Sect. 4 with a discussion of our results.

### **2 Preliminaries**

We assume a standard first-order language without equality over a signature Σ = (Ω, Π), where Ω is a non-empty set of function symbols and Π a non-empty set of predicate symbols, both coming with their respective fixed arities denoted by the function arity. The set of terms over an infinite set of variables X is denoted by T(Σ, X). Atoms, literals, clauses, and clause sets are defined as usual, e.g., see [24]. We identify a clause with its multiset of literals. Variables in clauses are universally quantified. Then N denotes a clause set; C, D denote clauses; L, K denote literals; A, B denote atoms; P, Q, R, T denote predicates; t, s terms; f, g, h functions; a, b, c, d constants; and x, y, z variables, all possibly indexed. The complement of a literal is denoted by the function comp. Atoms, literals, clauses, and clause sets are *ground* if they do not contain any variable.

An interpretation I with a nonempty *domain* (or *universe*) U assigns (i) a total function f^I : U^n → U to each f ∈ Ω with arity(f) = n and (ii) a relation P^I ⊆ U^m to every predicate symbol P ∈ Π with arity(P) = m. A valuation β is a function X → U, where the assignment of some variable x can be modified to e ∈ U by β[x → e]. It is extended to terms as I(β) : T(Σ, X) → U. Semantic entailment |= considers variables in clauses to be universally quantified. The extension to atoms, literals, disjunctions, clauses, and sets of clauses is as follows: I(β)(P(t1,...,tn)) = 1 if (I(β)(t1),..., I(β)(tn)) ∈ P^I and 0 otherwise; I(β)(¬φ) = 1 − I(β)(φ); for a disjunction L1 ∨ ... ∨ Lk, I(β)(L1 ∨ ... ∨ Lk) = max(I(β)(L1),..., I(β)(Lk)); for a clause C, I(β)(C) = 1 if for all valuations β′ = {x1 → e1,...,xn → en}, where the xi are the free variables in C, there is a literal L ∈ C such that I(β′)(L) = 1; for a set of clauses N = {C1,...,Ck}, I(β)({C1,...,Ck}) = min(I(β)(C1),..., I(β)(Ck)). A set of clauses N is *satisfiable* if there is an I such that I(β)(N) = 1, β arbitrary (in this case I is called a *model* of N: I |= N); otherwise N is called *unsatisfiable*.
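For ground clauses under a Herbrand interpretation, the evaluation rules above can be written down directly; the following sketch represents an interpretation as the set of true ground atoms and a literal as a (sign, atom) pair, a hypothetical encoding chosen for illustration:

```python
# Sketch: evaluation of ground clauses. An interpretation is the set
# of ground atoms (plain strings) that are true; a literal is a pair
# (positive?, atom).

def eval_literal(literal, true_atoms):
    positive, atom = literal
    return (atom in true_atoms) == positive

def eval_clause(clause, true_atoms):
    # a clause evaluates to the max over its literals (disjunction)
    return any(eval_literal(l, true_atoms) for l in clause)

def eval_clause_set(clauses, true_atoms):
    # a clause set evaluates to the min over its clauses (conjunction)
    return all(eval_clause(c, true_atoms) for c in clauses)

# {B(b,f(a)), A(f(a))} satisfies the ground clauses below.
interp = {"B(b,f(a))", "A(f(a))"}
clauses = [[(False, "C(b)")], [(True, "B(b,f(a))")],
           [(False, "D(t)")], [(True, "A(f(a))"), (True, "D(t)")]]
print(eval_clause_set(clauses, interp))  # True
```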

Substitutions σ, τ are total mappings from variables to terms, where dom(σ) := {x | xσ ≠ x} is finite and codom(σ) := {xσ | x ∈ dom(σ)}. A *renaming* σ is a bijective substitution. The application of substitutions is extended to literals, clauses, and sets/sequences of such objects in the usual way. If C′ = Cσ for some substitution σ, then C′ is an *instance* of C. A *unifier* σ for a set of terms t1,...,tk satisfies tiσ = tjσ for all 1 ≤ i, j ≤ k; it is called a *most general unifier* if for any unifier σ′ of t1,...,tk there is a substitution τ such that σ′ = στ. The function mgu denotes the *most general unifier* of two terms, atoms, or literals if it exists. We assume that any mgu of two terms or literals is idempotent and does not introduce any fresh variables.
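The mgu assumptions above can be illustrated by a minimal sketch of syntactic unification. This is not from the paper; it is a Python sketch under an encoding chosen only for illustration: variables are strings and a function application f(t1,...,tn) is the tuple ('f', t1, ..., tn), with constants as zero-argument tuples like ('a',).

```python
def walk(t, s):
    """Chase variable bindings of term t in substitution s."""
    while isinstance(t, str) and t in s:
        t = s[t]
    return t

def occurs(v, t, s):
    """Occurs check: does variable v occur in term t under s?"""
    t = walk(t, s)
    return t == v or (isinstance(t, tuple) and any(occurs(v, a, s) for a in t[1:]))

def unify(t1, t2, s=None):
    """Return an mgu of t1 and t2 extending s (a dict of bindings), or None."""
    s = {} if s is None else s
    t1, t2 = walk(t1, s), walk(t2, s)
    if t1 == t2:
        return s
    if isinstance(t1, str):                    # t1 is a variable
        return None if occurs(t1, t2, s) else {**s, t1: t2}
    if isinstance(t2, str):                    # t2 is a variable
        return unify(t2, t1, s)
    if t1[0] != t2[0] or len(t1) != len(t2):   # clash of function symbols
        return None
    for a, b in zip(t1[1:], t2[1:]):
        s = unify(a, b, s)
        if s is None:
            return None
    return s

def subst(t, s):
    """Apply substitution s to term t, resolving chained bindings,
    so the result behaves like an application of an idempotent mgu."""
    t = walk(t, s)
    return t if isinstance(t, str) else (t[0],) + tuple(subst(a, s) for a in t[1:])
```

For instance, unifying f(x, g(y)) with f(g(a), z) binds x to g(a) and z to g(y), and applying the result to either term yields the same instance, while unifying x with f(x) fails due to the occurs check.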

The resolution calculus consists of two inference rules: Resolution and Factoring [28,29]. The rules operate on a state (N, S), where the initial state for a classical resolution refutation from a clause set N is (∅, N), and for an SOS (Set of Support) refutation with clause set N and initial SOS clause set S the initial state is (N, S). We describe the rules in the form of abstract rewrite rules operating on states (N, S). As usual, we assume for the Resolution rule that the involved clauses are variable disjoint; this can always be achieved by renaming into fresh variables.

**Resolution** (N, S ⊎ {C ∨ K}) ⇒RES (N, S ∪ {C ∨ K, (D ∨ C)σ}) provided (D ∨ L) ∈ (N ∪ S) and σ = mgu(L, comp(K))

**Factoring** (N, S ⊎ {C ∨ L ∨ K}) ⇒RES (N, S ∪ {C ∨ L ∨ K, (C ∨ L)σ}) provided σ = mgu(L, K)

The clause (D ∨ C)σ is the result of a *Resolution inference* between its parents and is called a *resolvent*. The clause (C ∨ L)σ is the result of a *Factoring inference* on its parent and is called a *factor*. A sequence of rule applications (N, S) ⇒∗RES (N, S′) is called a *resolution derivation*. It is called an *SOS resolution derivation* if N ≠ ∅. In case ⊥ ∈ S′ it is called a *(SOS) resolution refutation*. If for two clauses C, D there exists a substitution σ such that Cσ ⊆ D, then we say that C *subsumes* D. In this case C |= D.
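The two rules can be sketched for the ground case. This is a minimal Python sketch under an encoding assumed only for illustration: a clause is a frozenset of (sign, atom) literals. Representing clauses as literal sets makes Factoring implicit (duplicate literals collapse), and the SOS restriction amounts to requiring one parent of every Resolution step to come from S.

```python
def resolvents(c, d):
    """All ground resolvents of clauses c and d (Resolution);
    Factoring is implicit because clauses are literal sets."""
    return [(c - {(s, a)}) | (d - {(not s, a)})
            for (s, a) in c if (not s, a) in d]

def sos_refute(n, s):
    """Saturate the state (N, S) under the SOS restriction
    (at least one parent from S); True iff the empty clause is derived.
    Terminates because the ground clause universe is finite."""
    n, s = set(n), set(s)
    if frozenset() in n | s:
        return True
    while True:
        new = {r for c in s for d in n | s for r in resolvents(c, d)} - n - s
        if frozenset() in new:
            return True
        if not new:
            return False
        s |= new
```

For example, {P, ¬P ∨ Q} with SOS {¬Q} admits an SOS refutation (¬Q resolves to ¬P, then to ⊥), while a satisfiable state like ({P}, {Q}) does not.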

**Theorem 1 (Soundness and Refutational Completeness of (SOS) Resolution** [11,28,33]**).** *Resolution is sound and refutationally complete [28]. If for some clause set* N *and initial SOS* S*,* N *is satisfiable and* N ∪ S *is unsatisfiable, then there is an (SOS) resolution derivation of* ⊥ *from* (N, S) *[33]. If for some clause set* N *and clause* C ∈ N *there exists a resolution refutation from* N *using* C*, then there is an SOS derivation of* ⊥ *from* (N \ {C}, {C}) *[11].*

Please note that the recent SOS completeness result of [11] generalizes the classical SOS completeness result by [33].

**Theorem 2 (Deductive Completeness of Resolution** [17,22]**).** *Given a set of clauses* N *and a clause* D*, if* N |= D*, then there is a resolution derivation of some clause* C *from* (∅, N) *such that* C *subsumes* D*.*

For deductions we require every clause to be used exactly once, so deductions always have a tree form.

**Definition 3 (Deduction** [11]**).** *A* deduction πN = [C1,...,Cn] *of a clause* Cn *from some clause set* N *is a finite sequence of clauses such that for each* Ci *the following holds:*

*1.1* Ci *is a renamed, variable-fresh version of a clause in* N*, or*

*1.2* Ci *is a factor of some* Cj *with* j < i*, or*

*1.3* Ci *is a resolvent of some* Cj *and* Ck *with* j < k < i*,*


*and for each* Ci ∈ πN*,* i < n*:*

*2.1* Ci *is a parent of at least one factor or resolvent* Cj *with* i < j*, and*

*2.2* Ci *is a parent of at most one such* Cj*, i.e., every clause is used exactly once,*


*We omit the subscript* N *in* πN *if the context is clear.*

A deduction π′ of some clause C ∈ π, where π and π′ are deductions from N, is a *subdeduction* of π if π′ ⊆ π, where the subset relation is overloaded for sequences. A deduction πN = [C1,...,Cn−1, ⊥] is called a *refutation*. While conditions 3.1.1, 3.1.2, and 3.1.3 are sufficient to represent a resolution derivation, conditions 3.2.1 and 3.2.2 force deductions to be minimal with respect to Cn.

Note that variable renamings are only applied to clauses from N such that all clauses from N that are introduced in the deduction are variable disjoint. Also recall that our notion of a deduction implies a tree structure. Both assumptions together admit the existence of overall grounding substitutions for a deduction.

**Definition 4 (Overall Substitution of a Deduction** [11]**).** *Given a deduction* π *of a clause* Cn*, the* overall substitution τπ,i *of* Ci ∈ π *is recursively defined by*

*1. if* Ci *is a factor of* Cj *with* j < i *and mgu* σ*, then* τπ,i = τπ,j ◦ σ*,*

*2. if* Ci *is a resolvent of* Cj *and* Ck *with* j < k < i *and mgu* σ*, then* τπ,i = (τπ,j ◦ τπ,k) ◦ σ*,*

*3. if* Ci *is an initial clause, then* τπ,i = ∅*,*

*and the overall substitution of the deduction is* τπ = τπ,n*. We omit the subscript* π *if the context is clear.*

Overall substitutions are well-defined because clauses introduced from N into the deduction are variable disjoint and each clause is used exactly once in the deduction. A grounding of an overall substitution τ of some deduction π is a substitution τδ such that codom(τδ) contains only ground terms and dom(δ) is exactly the set of variables in codom(τ).

**Definition 5 (SOS Deduction** [11]**).** *A deduction* πN∪S = [C1,...,Cn] *is called an* SOS deduction *with SOS* S *if the derivation* (N, S0) ⇒∗RES (N, Sm) *is an SOS derivation, where* C′1,...,C′m *is the subsequence of* [C1,...,Cn] *with input clauses removed,* S0 = S*, and* Si+1 = Si ∪ {C′i+1}*.*

Oftentimes, it is of particular interest to identify a set of clauses that is minimally unsatisfiable, i.e., removing any clause would make it satisfiable. The earliest mention of such a notion is in [26], where it is introduced via a decision problem. Minimal unsatisfiable sets (MUSes) have also gained a lot of attention in practice.

**Definition 6 (Minimal Unsatisfiable Subset (MUS)** [20]**).** *Given an unsatisfiable set of clauses* N*, a subset* N′ ⊆ N *is a* minimal unsatisfiable subset (MUS) *of* N *if* N′ *is unsatisfiable and any strict subset of* N′ *is satisfiable.*

In our previous work, we defined a notion of relevance based on how clauses may contribute to unsatisfiability by means of refutations.

**Definition 7 (Syntactic Relevance** [11]**).** *Given an unsatisfiable set of clauses* N*, a clause* C ∈ N *is* syntactically relevant *if for all refutations* π *of* N *it holds that* C ∈ π*. A clause* C ∈ N *is* syntactically semi-relevant *if there exists a refutation* π *of* N *with* C ∈ π*. A clause* C ∈ N *is* syntactically irrelevant *if there is no refutation* π *of* N *with* C ∈ π*.*

Syntactic relevance can be identified using the resolution calculus: a clause C ∈ N is syntactically semi-relevant if and only if there exists an SOS refutation from (N \ {C}, {C}), i.e., with SOS {C}.

**Theorem 8 (Syntactic Relevance** [11]**).** *Given an unsatisfiable set of clauses* N*, the clause* C ∈ N *is*
*1. syntactically relevant if and only if there is no resolution refutation from* (∅, N \ {C})*,*

*2. syntactically semi-relevant if and only if there is an SOS resolution refutation from* (N \ {C}, {C})*, and*

*3. syntactically irrelevant if and only if there is no SOS resolution refutation from* (N \ {C}, {C})*.*


An open problem from [11] is the question of a semantic counterpart to syntactic semi-relevance. Without further assumptions on the clause set N, the notion of semi-relevance can lead to unintuitive results. For example, a tautology can be semi-relevant: given a refutation showing semi-relevance of some clause C in which some unary predicate P occurs, the refutation can be immediately extended using the tautology P(x) ∨ ¬P(x). A further problem arises when the clause set contains a subsumed clause. For example, if both Q(a) and Q(x) occur in a clause set, both may be semi-relevant, although intuitively one may only want to consider Q(x) to be semi-relevant, or even relevant. On the other hand, in some cases redundant clauses are welcome as semi-relevant clauses.

*Example 9 (Redundant Clauses).* Given a set of clauses

$$N = \{Q(x), \quad Q(a), \quad \neg Q(a) \lor P(b), \quad \neg P(b), \quad P(x) \lor \neg P(x)\},$$

all clauses are syntactically semi-relevant, while ¬Q(a) ∨ P(b) and ¬P(b) are syntactically relevant. However, if we disregard the redundant clauses Q(a) and P(x) ∨ ¬P(x), then the clause Q(x) becomes relevant. Therefore, for our semantic notion of relevance we only consider clause sets in which no clause is implied by other, different clauses from the set.

#### **3 Semantic Relevance**

Except for the trivially false clause ⊥, the simplest form of a contradiction consists of two unit clauses K and L such that K and comp(L) are unifiable. Such literals will be called *conflict literals* below. The idea behind our semantic definition of semi-relevance is then to consider clauses that contribute to the conflict literals of a clause set. Furthermore, we will show that in any MUS every literal is a conflict literal.

While conflict literals could be defined straightforwardly in propositional logic with the above idea in mind, in first-order logic we always have to relate properties of literals and clauses to their respective ground instances. This is simply due to the fact that unsatisfiability of a first-order clause set is witnessed by the unsatisfiability of a finite set of ground instances of clauses from this set. Eventually, we will show that for independent clause sets a clause is semi-relevant if it contributes to the set of conflict literals.

**Definition 10 (Conflict Literal).** *Given a set of clauses* N *over some signature* Σ*, a ground literal* L *is a* conflict literal *in* N *if there are two satisfiable clause sets* N1, N2 *such that*

*1.* N1 *and* N2 *consist of instances of clauses from* N*, and*

*2.* N1 |= L *and* N2 |= comp(L)*.*

conflict(N) *denotes the set of conflict literals in* N*.*

Our notion of a conflict literal generalizes the respective notion in [12] defined for propositional logic.

*Example 11 (Conflict Literal).* Given an unsatisfiable set of clauses over the signature Σ = ({a, b, c, d, f}, {P}):

$$N = \{ \neg P(f(a, x)) \lor \neg P(f(c, y)), P(f(x, d)) \lor P(f(y, b)) \}$$

Consider the following satisfiable sets of instances from N

$$\begin{aligned} N\_1 &= \{ \neg P(f(a,d)) \lor \neg P(f(c,y)), P(f(x,d)) \lor P(f(a,b)) \}, \\ N\_2 &= \{ \neg P(f(a,b)) \lor \neg P(f(c,y)), P(f(x,d)) \lor P(f(c,b)) \} \end{aligned}$$

P(f(a, b)) is a conflict literal because N1 |= P(f(a, b)) and N2 |= ¬P(f(a, b)).

We can show that N1 |= P(f(a, b)) using the soundness of the resolution calculus. Resolving both literals of ¬P(f(a, d)) ∨ ¬P(f(c, y)) with the first literal of (variable-disjoint copies of) the clause P(f(x, d)) ∨ P(f(a, b)) results in the clause P(f(a, b)) ∨ P(f(a, b)), which can be factored to P(f(a, b)). Moreover, N1 is satisfiable: an interpretation I with I(P(f(a, b))) = 1 and I(P(t)) = 0 for all terms t ≠ f(a, b) satisfies N1 and P(f(a, b)). N2 |= ¬P(f(a, b)) can be shown in the same manner.

*Example 12 (Conflict Literal).* Given

$$\begin{aligned} N &= \{ \neg R(z), R(c) \lor P(a, y), \\ Q(a), \neg Q(x) &\lor P(x, b), \\ \neg P(a, b) \} \end{aligned}$$

its conflict literals are

$$\begin{aligned} \text{conflict}(N) &= \{ P(a, b), \neg P(a, b), \\ R(c), \neg R(c), \\ Q(a), \neg Q(a) \} \end{aligned}$$

In addition to the existence of a refutation, the existence of a conflict literal is another way to characterize the unsatisfiability of a clause set. Obviously, conflict literals always come in pairs.

**Lemma 13 (Minimal Unsatisfiable Ground Clause Sets and Conflict Literals).** *If* N *is a minimally unsatisfiable set of ground clauses (MUS), then any literal occurring in* N *is a conflict literal.*

*Proof.* Take any ground atom A occurring in N. N can be split into three disjoint clause sets:

$$\begin{aligned} N\_{\emptyset} &= \{ C \in N | A \notin C \text{ and } \neg A \notin C \} \\ N\_A &= \{ C \in N | A \in C \} \\ N\_{\neg A} &= \{ C \in N | \neg A \in C \} \end{aligned}$$

Since N is minimal, NA and N¬A are nonempty, because otherwise A (or ¬A) would be a pure literal and its clauses could be removed from N while preserving unsatisfiability. Obviously, N∅ ∪ NA must be satisfiable, for otherwise the initial choice of N was not minimal. However, N∅ ∪ N′A, where N′A results from NA by deleting all occurrences of A from its clauses, must be unsatisfiable, for otherwise we could construct a satisfying interpretation for N. Thus, every model of N∅ ∪ NA must also be a model of A: N∅ ∪ NA |= A. By the same argument, N∅ ∪ N¬A is satisfiable and N∅ ∪ N¬A |= ¬A. Therefore, A is a conflict literal.
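Lemma 13 can be checked by brute force on a concrete ground MUS, treating ground atoms as propositional symbols. The sketch below uses its own literal encoding ((sign, atom) pairs, clauses as frozensets; an assumption for this sketch) and verifies the claim on the second ground MUS of Example 19 later in this paper.

```python
from itertools import combinations, product

def satisfiable(clauses):
    """Truth-table satisfiability test over the atoms occurring in clauses."""
    atoms = sorted({a for c in clauses for (_, a) in c})
    for bits in product([False, True], repeat=len(atoms)):
        val = dict(zip(atoms, bits))
        if all(any(val[a] == s for (s, a) in c) for c in clauses):
            return True
    return False

def is_conflict(n, lit):
    """Definition 10 in the ground case: two satisfiable subsets of n
    entail lit and comp(lit), respectively."""
    subs = [c for r in range(len(n) + 1) for c in combinations(n, r)
            if satisfiable(c)]
    def entailed(l):
        s, a = l
        # c |= l iff c plus the complement of l is unsatisfiable
        return any(not satisfiable(list(c) + [frozenset([(not s, a)])])
                   for c in subs)
    return entailed(lit) and entailed((not lit[0], lit[1]))
```

Running it on the ground MUS {P(a,d), ¬P(a,d) ∨ Q(b,d), ¬P(d,c), ¬Q(b,d) ∨ P(d,c)} confirms both minimal unsatisfiability and that every occurring literal is a conflict literal.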

**Lemma 14 (Conflict Literals and Unsatisfiability).** *Given a set of clauses* N*,* conflict(N) ≠ ∅ *if and only if* N *is unsatisfiable.*

*Proof.* "⇒": Let L ∈ conflict(N). By definition, there are two satisfiable sets of instances N1, N2 from N such that N1 |= L and N2 |= comp(L). Towards a contradiction, suppose N is satisfiable. Then there exists an interpretation I with I |= N, and therefore I |= N1 and I |= N2. Hence, by the definition of a conflict literal, I |= L and I |= comp(L), a contradiction.

"⇐": Given an unsatisfiable clause set N, we show that there is a conflict literal in N. Since N is unsatisfiable, by compactness of first-order logic there is a minimal unsatisfiable set N′ of ground instances from N. The rest follows from Lemma 13.

Intuitively, a clause that is implied by other clauses is redundant and can be removed from the clause set. However, when applying a calculus that generates new clauses, this intuitive notion of redundancy may destroy completeness [2,23]. Still, the detection and elimination of redundant clauses, whether or not compatible with completeness, is an important concept for the efficiency of automated reasoning, e.g., in propositional logic [3,18]. It is also important when defining a semantic notion of relevance: for example, a syntactically relevant clause is demoted to syntactically semi-relevant as soon as it is duplicated. So, in order to obtain a semantically robust notion of relevance in first-order logic, we need a strong notion of (in)dependency.

**Definition 15 (Dependency).** *A clause* C *is* dependent *in* N *if there exists a satisfiable set* N′ *of instances from* N \ {C} *such that* N′ |= Cσ *for some substitution* σ*. If* C *is not dependent in* N*, it is* independent *in* N*. A clause set* N *is* independent *if it contains no dependent clauses.*

A subsumed clause is obviously a dependent clause. However, there could also be non-subsumed clauses that are dependent. For example, in the set of clauses

$$N = \{P(a, y), P(x, b), \neg P(a, b)\}$$

P(x, b) is dependent because P(a, b) is an instance of P(x, b) and it is entailed by P(a, y). Now, we are ready to define the semantic notion of relevance based on conflict literals and dependency.
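For this small example, Definition 15 can be tested by brute force: ground all clauses over a finite constant set and search for a satisfiable set of instances entailing some instance of C. The sketch below assumes the Herbrand universe {a, b} and its own literal encoding ((sign, predicate, argument-tuple) triples); both are assumptions made only for this illustration.

```python
from itertools import combinations, product

CONSTS = ('a', 'b')

def ground_instances(clause, variables=('x', 'y')):
    """All ground instances of a clause (a frozenset of (sign, pred, args)
    literals) over CONSTS, substituting the listed variable names."""
    vs = sorted({t for (_, _, args) in clause for t in args if t in variables})
    return [frozenset((s, p, tuple(sub.get(t, t) for t in args))
                      for (s, p, args) in clause)
            for vals in product(CONSTS, repeat=len(vs))
            for sub in [dict(zip(vs, vals))]]

def satisfiable(clauses):
    """Truth-table satisfiability over the ground atoms occurring in clauses."""
    atoms = sorted({(p, args) for c in clauses for (_, p, args) in c})
    for bits in product([False, True], repeat=len(atoms)):
        val = dict(zip(atoms, bits))
        if all(any(val[(p, args)] == s for (s, p, args) in c) for c in clauses):
            return True
    return False

def dependent(c, n):
    """Definition 15: C is dependent in N if some satisfiable set of ground
    instances of N \\ {C} entails some ground instance Cσ."""
    rest = [g for d in n if d is not c for g in ground_instances(d)]
    for r in range(1, len(rest) + 1):
        for s in combinations(rest, r):
            if not satisfiable(s):
                continue
            for inst in ground_instances(c):
                # s |= inst iff s plus the negated literals of inst is unsat
                negated = [frozenset([(not sg, p, a)]) for (sg, p, a) in inst]
                if not satisfiable(list(s) + negated):
                    return True
    return False
```

On N = {P(a, y), P(x, b), ¬P(a, b)}, the test classifies P(x, b) (and, symmetrically, P(a, y)) as dependent, but not ¬P(a, b).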

In some ways, our notion of independence of clause sets is a strong assumption, because there may be non-redundant clauses that are considered dependent. While in some scenarios (e.g., the mentioned car scenario) this holds by design, in others it is violated by design. In addition, the question arises how to obtain an independent clause set from a dependent one, for example, in a scenario where some theory is developed out of independent axioms. Then, of course, proven lemmas and theorems are dependent with respect to the axioms. In this case one could extract the dependency relations between the intermediate lemmas, theorems, and axioms from the proofs, and in this way compute independent (sub)clause sets with respect to some proven conjecture, to which our results then again apply.

**Definition 16 (Semantic Relevance).** *Given an unsatisfiable set of independent clauses* N*, a clause* C ∈ N *is*
*1.* relevant *if* conflict(N \ {C}) = ∅*,*

*2.* semi-relevant *if* conflict(N) \ conflict(N \ {C}) ≠ ∅*, and*

*3.* irrelevant *if* conflict(N) = conflict(N \ {C})*.*


*Example 17 (Dependent Clauses in Propositional Logic).*

$$\begin{aligned} N &= \{ P, \neg P, \\ &\neg P \lor Q, \neg R \lor P, \\ &\neg Q \lor R \} \end{aligned}$$

The existence of the dependent clauses ¬P ∨ Q and ¬R ∨ P causes the independent clause ¬Q ∨ R to be a semi-relevant clause, although ¬Q ∨ R is not contained in the only MUS {P, ¬P}.
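In the propositional case, where the instance sets N1 and N2 of Definition 10 are simply subsets of N, conflict(N) can be computed by brute force. The following sketch (with an assumed literal encoding: (sign, atom) pairs, clauses as frozensets) reproduces the analysis of Example 17, including the conflict literals contributed by ¬Q ∨ R.

```python
from itertools import combinations, product

def satisfiable(clauses):
    """Truth-table satisfiability test for propositional clauses."""
    atoms = sorted({a for c in clauses for (_, a) in c})
    for bits in product([False, True], repeat=len(atoms)):
        val = dict(zip(atoms, bits))
        if all(any(val[a] == s for (s, a) in c) for c in clauses):
            return True
    return False

def entails(clauses, lit):
    """clauses |= lit iff clauses plus the complement of lit is unsatisfiable."""
    s, a = lit
    return not satisfiable(list(clauses) + [frozenset([(not s, a)])])

def conflict(n):
    """conflict(N) per Definition 10 (propositional case): both a literal and
    its complement must be entailed by some satisfiable subset of N."""
    subs = [c for r in range(len(n) + 1) for c in combinations(n, r)
            if satisfiable(c)]
    out = set()
    for a in {a for c in n for (_, a) in c}:
        if any(entails(s, (True, a)) for s in subs) and \
           any(entails(s, (False, a)) for s in subs):
            out |= {(True, a), (False, a)}
    return out
```

For Example 17, every literal of N is a conflict literal, while after removing ¬Q ∨ R only the pair P, ¬P remains, so ¬Q ∨ R is semi-relevant.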

Very often, concepts from propositional logic generalize to first-order logic. In the context of relevance, however, this is not the case: our notion of (semi-)relevance can be characterized by MUSes in propositional logic, but not in first-order logic, unless instances of clauses are considered.

**Lemma 18 (Propositional Clause Sets and Relevance).** *Given an independent unsatisfiable set of propositional clauses* N*, the relevant clauses coincide with the intersection of all MUSes and the semi-relevant clauses coincide with the union of all MUSes.*

*Proof.* For the case of relevance: given C ∈ N, C is relevant if and only if conflict(N \ {C}) = ∅, if and only if N \ {C} is satisfiable by Lemma 14, if and only if C is contained in all MUSes N′ of N.

For the case of semi-relevance: given C ∈ N, we show that C is semi-relevant if and only if C is in some MUS N′ ⊆ N.

"⇒": Towards a contradiction, suppose there is a semi-relevant clause C that is not in any MUS. By the definition of semi-relevance, there are satisfiable sets N1 and N2 and a propositional variable P such that N1 |= P and N2 |= ¬P, but the MUS M out of N1 ∪ N2 does not contain C. By Theorem 2 there exist deductions π1 and π2 of P and ¬P from N1 and N2, respectively. Since a deduction is connected, some clauses in M and (N1 ∪ N2) \ M must contain complementary propositional literals Q and ¬Q, respectively, to be eventually resolved upon in either π1 or π2. At least one of these deductions must contain this resolution step between a clause from M and one from (N1 ∪ N2) \ M. Now, by Lemma 13, the literals Q and ¬Q are conflict literals in M. Thus, there are satisfiable subsets of M which entail Q and ¬Q, respectively. Therefore, the clause containing Q or ¬Q in (N1 ∪ N2) \ M is dependent, contradicting the assumption that N contains no dependent clauses.

"⇐": If C is in some MUS N′ ⊆ N, then N′ \ {C} is satisfiable. So, invoking Lemma 13, any literal L ∈ C is a conflict literal in N′. In addition, L is not a conflict literal in N \ {C}, for otherwise C would be dependent: suppose L is a conflict literal in N \ {C}; then, by definition, there is a satisfiable subset of N \ {C} which entails L. However, since L |= C, this means C is dependent. Hence, L ∈ conflict(N) \ conflict(N \ {C}) and C is semi-relevant.
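The MUS side of Lemma 18 can likewise be sketched by brute-force enumeration following Definition 6 (again under an assumed propositional clause encoding). The small test set {P, ¬P, Q, ¬Q} is independent and has exactly two MUSes, so the intersection (the relevant clauses) is empty and the union (the semi-relevant clauses) is the whole set.

```python
from itertools import combinations, product

def satisfiable(clauses):
    """Truth-table satisfiability test for propositional clauses."""
    atoms = sorted({a for c in clauses for (_, a) in c})
    for bits in product([False, True], repeat=len(atoms)):
        val = dict(zip(atoms, bits))
        if all(any(val[a] == s for (s, a) in c) for c in clauses):
            return True
    return False

def muses(n):
    """All MUSes of n (Definition 6): unsatisfiable subsets that become
    satisfiable as soon as any single clause is removed."""
    return [set(s) for r in range(1, len(n) + 1) for s in combinations(n, r)
            if not satisfiable(s)
            and all(satisfiable(s[:i] + s[i + 1:]) for i in range(len(s)))]

def relevant_and_semi(n):
    """Lemma 18 (propositional, independent n): relevant clauses are the
    intersection of all MUSes, semi-relevant clauses their union."""
    ms = muses(n)
    rel = set.intersection(*ms) if ms else set()
    semi = set.union(*ms) if ms else set()
    return rel, semi
```

This exhaustive enumeration is exponential and meant only to mirror the definitions; practical MUS extractors are far more sophisticated.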

The next example demonstrates that the notion of a MUS cannot be carried over straightforwardly to the level of clauses with variables in order to characterize semi-relevant clauses in first-order logic.

*Example 19 (First-Order Relevant Clauses).* Given a set of clauses

$$\begin{aligned} N &= \{ P(a, y), \neg P(a, d) \lor Q(b, d), \\ &\neg P(x, c), \neg Q(b, d) \lor P(d, c), Q(z, e) \} \end{aligned}$$

over Σ = ({a, b, c, d, e}, {P, Q}). The conflict literals are

$$\{ (\neg)P(a,c), (\neg)Q(b,d), (\neg)P(d,c), (\neg)P(a,d) \}.$$

The clause P(a, y) is relevant. The literals entailed by satisfiable sets N′ of instances from N with P(a, y) ∉ N′ are {¬Q(b, d)} ∪ {¬P(t, c), Q(t, e) | t ∈ {a, b, c, d, e}}, and no two of them are complementary. Thus, conflict(N \ {P(a, y)}) = ∅. The clause ¬P(a, d) ∨ Q(b, d) is semi-relevant: Q(b, d) ∈ conflict(N) \ conflict(N \ {¬P(a, d) ∨ Q(b, d)}). The clause Q(z, e) is irrelevant.

With respect to MUSes over the original (non-ground) clauses, the clause ¬P(a, d) ∨ Q(b, d) from Example 19 would be irrelevant: the only MUS of N is {P(a, y), ¬P(x, c)}, with grounding substitution {x → a, y → c}. However, in first-order logic we should not ignore the clauses ¬P(a, d) ∨ Q(b, d) and ¬Q(b, d) ∨ P(d, c), because together with the clauses P(a, y) and ¬P(x, c) they give rise to a different grounding {x → d, y → d}. So we argue that MUS-based (semi-)relevance on the original clause set is not sufficient to characterize the way clauses are used to derive a contradiction in full first-order logic. However, it is sufficient if ground instances are considered.

**Lemma 20 (Relevance and MUSes on First-Order Clauses).** *Given an unsatisfiable set of independent first-order clauses* N*, a clause* C *is relevant in* N *if all MUSes of unsatisfiable sets of ground instances from* N *contain a ground instance of* C*. The clause* C *is semi-relevant in* N *if there exists a MUS of an unsatisfiable set of ground instances from* N *that contains a ground instance of* C*.*

*Proof.* (Relevance) Since all ground MUSes from N contain a ground instance of C, if the instances of N \ {C} contained a ground MUS from N, some ground instance of C would be entailed by N \ {C}, violating our assumption that N contains no dependent clauses. Thus, N \ {C} yields no ground MUSes, which further means that N \ {C} is satisfiable by the compactness theorem of first-order logic. By Lemma 14, it therefore has no conflict literals, and C is relevant.

(Semi-Relevance) Take some ground MUS M containing a ground instance C′ of C. By Lemma 13, any literal P ∈ C′ is a conflict literal in M and consequently also in N. In addition, P is not a conflict literal in N \ {C}, for otherwise C would be dependent: suppose P is a conflict literal in N \ {C}; then, by definition, there is a satisfiable set of instances from N \ {C} which entails P. However, since P |= C′, this means C is dependent. In conclusion, P ∈ conflict(N) \ conflict(N \ {C}) and thus C is semi-relevant.

In Example 19, we could identify two ground MUSes:

$$\{P(a,c), \neg P(a,c)\}$$

and

$$\{P(a,d), \neg P(a,d) \lor Q(b,d), \neg P(d,c), \neg Q(b,d) \lor P(d,c)\}$$

Our notion of relevance can thus alternatively be explained using Lemma 20: P(a, y) is relevant because every MUS contains an instance of it (P(a, c) and P(a, d), respectively). The clause ¬P(a, d) ∨ Q(b, d) is semi-relevant as it is directly contained in the second MUS. The clause Q(z, e) is irrelevant since no MUS contains any instance of it. On the other hand, we may still encounter cases where a dependent clause is categorized as syntactically semi-relevant. By using the dependency notion, while at the same time not restricting a refutation to use only a MUS as the input set, we can show that semantic (semi-)relevance coincides with syntactic (semi-)relevance. The semi-decidability result then also follows.

**Theorem 21 (Semantic versus Syntactic Relevance).** *Given an independent, unsatisfiable set of clauses* N *in first-order logic, the (semi-)relevant clauses coincide with the syntactically (semi-)relevant clauses.*

*Proof.* We show the following: if N contains no dependent clause, then C is (semi-)relevant if and only if C is syntactically (semi-)relevant. The case of relevant clauses is a consequence of Lemma 14. Now, we show it for semi-relevant clauses. "⇒": Let L be a ground literal with L ∈ conflict(N) \ conflict(N \ {C}). We construct a refutation using C. There are two satisfiable sets of instances N1, N2 from N such that N1 |= L and N2 |= comp(L), where N1 ∪ N2 contains at least one instance of C, for otherwise L would also be a conflict literal in N \ {C}. By deductive completeness, Theorem 2, and the fact that L and comp(L) are ground literals, there are two variable-disjoint deductions π1 and π2 of literals K1 and K2 such that K1σ = L and K2σ = comp(L) for some grounding σ. Obviously, the two variable-disjoint deductions can be combined into a refutation π1.π2.⊥ containing C. Thus, C is syntactically semi-relevant in N.

"⇐": Given an SOS refutation π using C, i.e., an SOS refutation π from N \ {C} with SOS {C} and overall grounding substitution σ, we show that C is semantically semi-relevant. Let N′ be the variable-renamed versions of clauses from N \ {C} used in the refutation and S′ the renamed copies of C used in the refutation. First, we show that N′σ is satisfiable. Towards a contradiction, suppose N′σ is unsatisfiable and let Mσ ⊆ N′σ be a MUS. Since π is connected, some clauses in Mσ and S′σ ∪ (N′σ \ Mσ) contain literals L and comp(L), respectively. By Lemma 13, L and comp(L) are conflict literals in Mσ. So, by Definition 15, the clause containing comp(L) in S′σ ∪ (N′σ \ Mσ) is dependent, violating our initial assumption.

Now, since N′σ is satisfiable, there is a ground MUS from (N′ ∪ S′)σ containing some C′σ ∈ S′σ. By Lemma 13, any L ∈ C′σ is a conflict literal in this MUS and consequently also in N. In addition, L is not a conflict literal in N \ {C}, for otherwise C would be dependent: suppose L is a conflict literal in N \ {C}; then, by definition, there is a satisfiable set of instances from N \ {C} which entails L. However, since L |= C′σ, this means C is dependent. In conclusion, L ∈ conflict(N) \ conflict(N \ {C}) and thus C is semi-relevant.

Given a ground MUS, the identification of conflict literals is obvious because all of its literals are conflict literals. In general, however, testing whether a literal L is a conflict literal is not trivial. One can try enumerating all MUSes and check whether L is contained in some of them. This works for propositional logic, despite being computationally expensive. In first-order logic, it is problematic because there can be infinitely many MUSes, and determining a MUS is not even semi-decidable in general. The following lemma provides a semi-decidable test using the SOS strategy.

**Lemma 22.** *Given a ground literal* L *and an unsatisfiable set of clauses* N *with no dependent clauses,* L *is a conflict literal if and only if there is an SOS refutation from* (N, {L ∨ comp(L)})*.*

*Proof.* "⇒": By deductive completeness, Theorem 2, and the fact that L and comp(L) are ground literals, there are two variable-disjoint deductions π1 and π2 of literals K1 and K2 such that K1σ = L and K2σ = comp(L) for some grounding σ. Obviously, the two variable-disjoint deductions can be combined into a refutation π1.π2.⊥. We can then construct a refutation π1.π2.(L ∨ comp(L)).(comp(L)).⊥, where K2 is resolved with L ∨ comp(L) to obtain comp(L), which is then resolved with K1 from π1 to obtain ⊥. By Theorem 1, this means there is an SOS refutation from (N, {L ∨ comp(L)}).

"⇐": Given an SOS refutation π using L ∨ comp(L), i.e., an SOS refutation π from N with SOS {L ∨ comp(L)}, let N′ be the variable-renamed versions of the clauses (including the copies of L ∨ comp(L)) used in the refutation and σ its overall grounding substitution. N′σ is a MUS, for otherwise there is a dependent clause: suppose N′σ \ M is a MUS, where M is non-empty. Since π is connected, some clause D ∈ M must be resolved with some D′ ∈ N′σ \ M upon some literal K. Thus, by Lemma 13, K and comp(K) are conflict literals in N′σ \ M. So, by Definition 15, the clause subsuming D in N is dependent, violating our initial assumption. Finally, because L occurs in N′σ and N′σ is a MUS, by Lemma 13, L is a conflict literal.

## **4 Conclusion**

The main results of this paper are: (i) a semantic notion of relevance based on the existence of conflict literals, Definitions 10 and 16; (ii) its relationship to syntactic relevance, namely, that both notions coincide for independent clause sets, Theorem 21; and (iii) the relationship of semantic relevance to minimal unsatisfiable sets (MUSes), both for propositional logic, Lemma 18, and first-order logic, Lemma 20.

The semantic relevance notion sheds further light on the way clauses may contribute to a refutation, beyond what the notion of MUSes can offer. While the syntactic notion of semi-relevance also considers redundant clauses such as tautologies to be semi-relevant, the semantic notion rules out redundant clauses; the two notions coincide only for independent clause sets. Still, the syntactic notion is "easier" to test, and there are applications where clause sets contain no implied clauses by construction, so syntactic relevance coincides with semantic relevance. For example, first-order toolbox formalizations have this property because every tool is formalized by its own distinct predicate, yet a goal (refutation) can be reached by the use of different tools. The classic example is the toolbox for car/truck/tractor building [8,31].

**Acknowledgments.** This work was partly funded by DFG grant 389792660 as part of TRR 248. We thank Christopher Lynch and David Plaisted for a number of discussions on semantic relevance. We thank the anonymous reviewers for their constructive and detailed comments.

# **References**


tiers in Artificial Intelligence and Applications, vol. 185, pp. 339–401. IOS Press, Amsterdam (2009)


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **SCL(EQ): SCL for First-Order Logic with Equality**

Hendrik Leidinger1,2(B) and Christoph Weidenbach<sup>1</sup>

<sup>1</sup> Max-Planck Institute for Informatics, Saarbrücken, Germany *{*hleiding,weidenbach*}*@mpi-inf.mpg.de <sup>2</sup> Graduate School of Computer Science, Saarbrücken, Germany

**Abstract.** We propose a new calculus SCL(EQ) for first-order logic with equality that only learns non-redundant clauses. Following the idea of CDCL (Conflict Driven Clause Learning) and SCL (Clause Learning from Simple Models) a ground literal model assumption is used to guide inferences that are then guaranteed to be non-redundant. Redundancy is defined with respect to a dynamically changing ordering derived from the ground literal model assumption. We prove SCL(EQ) sound and complete and provide examples where our calculus improves on superposition.

**Keywords:** First-order logic with equality · Term rewriting · Model-based reasoning

## **1 Introduction**

There has been extensive research on sound and complete calculi for first-order logic with equality. The current prime calculus is superposition [2], where ordering restrictions guide paramodulation inferences and an abstract redundancy notion enables a number of clause simplification and deletion mechanisms, such as rewriting or subsumption. Still, this "syntactic" form of superposition infers many redundant clauses. The completeness proof of superposition provides a "semantic" way of generating only non-redundant clauses; however, the underlying ground model assumption cannot be effectively computed in general [31], as it requires an ordered enumeration of infinitely many ground instances of the given clause set. Our calculus overcomes this issue by providing an effective way of generating ground model assumptions that then guarantee non-redundant inferences on the original clauses with variables.

The underlying ordering is based on the order of ground literals in the model assumption and hence changes during a run of the calculus. It incorporates a standard rewrite ordering. For practical redundancy criteria this means that both rewriting and redundancy notions that are based on literal subset relations are permitted to dynamically simplify or eliminate clauses. Newly generated clauses are non-redundant, so redundancy tests are only needed backwards. Furthermore, the ordering is automatically generated from the structure of the clause set. Instead of using a fixed ordering as in the superposition case, the calculus finds and changes an ordering according to the currently easiest way to make progress, analogous to CDCL (Conflict Driven Clause Learning) [11,21,25,29,34].

As is typical for CDCL and SCL (Clause Learning from Simple Models) [1,14,18] approaches to reasoning, the development of a model assumption is done by decisions and propagations. A decision guesses a ground literal to be true, whereas a propagation concludes the truth of a ground literal through an otherwise false clause. While propagations in CDCL and propositional logic are restricted to the finite number of propositional variables, in first-order logic there can already be infinite propagation sequences [18]. In order to overcome this issue, model assumptions in SCL(EQ) are at any point in time restricted to a finite number of ground literals, hence to a finite number of ground instances of the clause set at hand. Therefore, without increasing the number of considered ground literals, the calculus either finds a refutation or runs into a *stuck state* where the current model assumption satisfies the finite number of ground instances. In this case one can check whether the model assumption can be generalized to a model assumption of the overall clause set, or the information of the stuck state can be used to appropriately increase the number of considered ground literals and continue the search for a refutation. SCL(EQ) does not require exhaustive propagation in general; it just forbids the decision of the complement of a literal that could otherwise be propagated.

For an example of SCL(EQ) inferring clauses, consider the three first-order clauses

$$\begin{array}{c} C\_1 := h(x) \approx g(x) \lor c \approx d \qquad C\_2 := f(x) \approx g(x) \lor a \approx b \\ C\_3 := f(x) \not\approx h(x) \lor f(x) \not\approx g(x) \end{array}$$

with a Knuth-Bendix Ordering (KBO), unique weight 1, and precedence d ≺ c ≺ b ≺ a ≺ g ≺ h ≺ f. A Superposition Left [2] inference between C<sub>2</sub> and C<sub>3</sub> results in

$$C'\_4 := h(x) \not\approx g(x) \lor f(x) \not\approx g(x) \lor a \approx b.$$

For SCL(EQ) we start by building a partial model assumption, called a *trail*, with two decisions

$$\Gamma := \left[ h(a) \approx g(a)^{1; \langle h(x) \approx g(x) \lor h(x) \not\approx g(x) \rangle \cdot \sigma}, f(a) \approx g(a)^{2; \langle f(x) \approx g(x) \lor f(x) \not\approx g(x) \rangle \cdot \sigma} \right]$$

where σ := {x → a}. Decisions and propagations are always ground instances of literals from the first-order clauses, and are annotated with a level and a justification clause, which in case of a decision is a tautology. Now with respect to Γ the clause C<sub>3</sub> is false with grounding σ, and rule Conflict is applicable; see Sect. 3.1 for details on the inference rules. In general, clauses and justifications are considered variable disjoint, but for simplicity of the presentation of this example, we repeat variable names here as long as the same ground substitution is shared. The maximal literal in C<sub>3</sub>σ is (f(x) ≉ h(x))σ, and a rewrite refutation using the ground equations from the trail results in the justification clause

(g(x) ≉ g(x) ∨ f(x) ≉ g(x) ∨ f(x) ≉ g(x) ∨ h(x) ≉ g(x)) · σ

where for the refutation justification clauses and all otherwise inferred clauses we use the grounding σ for guidance, but operate on the clauses with variables. The respective ground clause is smaller than (f(x) ≉ h(x))σ, false with respect to Γ, and becomes our new conflict clause by an application of our inference rule Explore-Refutation. It is simplified by our inference rules Equality-Resolution and Factorize, resulting in the finally learned clause

$$C\_4 := h(x) \not\approx g(x) \lor f(x) \not\approx g(x)$$

which is then used to apply rule Backtrack to the trail. Observe that C<sub>4</sub> is strictly stronger than C'<sub>4</sub>, the clause inferred by superposition, and that C<sub>4</sub> cannot be inferred by superposition. Thus SCL(EQ) can infer stronger clauses than superposition for this example.

*Related Work:* SCL(EQ) is based on ideas of SCL [1,14,18] but for the first time includes a native treatment of first-order equality reasoning. As in [14], propagations need not be exhaustively applied and the trail is built out of decisions and propagations of ground literals annotated by first-order clauses; SCL(EQ) only learns non-redundant clauses, but, due to the nature of the equality relation, for the first time conflicts resulting from a decision have to be considered.

Several approaches have been suggested to lift the idea of an inference-guiding model assumption from propositional to full first-order logic [6,12,13,18]. They do not, however, provide a native treatment of equality, e.g., via paramodulation or rewriting.

Baumgartner et al. describe multiple calculi that handle equality by using unit superposition style inference rules and are based on either hyper tableaux [5] or DPLL [15,16]. Hyper tableaux fix a major problem of the well-known free variable tableaux, namely the fact that free variables within the tableau are rigid, i.e., substitutions have to be applied to all occurrences of a free variable within the entire tableau. Hyper tableaux with equality [7] in turn integrates unit superposition style inference rules into the hyper tableau calculus.

Another approach related to ours is the model evolution calculus with equality (ME<sup>E</sup>) by Baumgartner et al. [8,9], which lifts the DPLL calculus to first-order logic with equality. Similar to our approach, ME<sup>E</sup> creates a candidate model until a clause instance contradicts this model or all instances are satisfied by the model. The candidate model results from a so-called context, which consists of a finite set of non-ground rewrite literals. Roughly speaking, a context literal specifies the truth value of all its ground instances unless a more specific literal specifies the complement. Initially the model satisfies the identity relation over the set of all ground terms. Literals within a context may be universal or parametric, where a universal literal guarantees all of its ground instances to be true. If a clause contradicts the current model, it is repaired by a non-deterministic split which adds a parametric literal to the current model. If the added literal does not share any variables with the contradictory clause, it is added as a universal literal.

Another approach, by Baumgartner and Waldmann [10], combines the superposition calculus with the model evolution calculus with equality. In this calculus the atoms of the clauses are labeled as "split atoms" or "superposition atoms". The superposition part of the calculus then generates a model for the superposition atoms while the model evolution part generates a model for the split atoms. Consequently, if all atoms are labeled as "split atom", the calculus behaves similarly to the model evolution calculus; if all atoms are labeled as "superposition atom", it behaves like the superposition calculus.

Both the hyper tableaux calculus with equality and the model evolution calculus with equality allow only unit superposition applications, while SCL(EQ) inferences are guided paramodulation inferences on clauses of arbitrary length. The model evolution calculus with equality was revised and implemented in 2011 [8], and its performance was compared with that of hyper tableaux. Model evolution performed significantly better, solving more problems in all relevant TPTP [30] categories than the implementation of the hyper tableaux calculus.

Plaisted et al. [27] present the Ordered Semantic Hyper-Linking (OSHL) calculus. OSHL is an instantiation-based approach that repeatedly chooses ground instances of a non-ground input clause set such that the current model does not satisfy the current ground clause set. A further step repairs the current model such that it satisfies the ground clause set again. The algorithm terminates if the set of ground clauses contains the empty clause. OSHL supports rewriting and narrowing, but only with unit clauses. In order to handle non-unit clauses it makes use of other mechanisms such as Brand's transformation [3].

Inst-Gen [22] is an instantiation-based calculus that creates ground instances of the input first-order formulas, which are forwarded to a SAT solver. If the ground instances are unsatisfiable, then the first-order clause set is as well; if not, the calculus creates more instances. The Inst-Gen-EQ calculus [23] creates instances by extracting instantiations of unit superposition refutations of selected literals of the first-order clause set. The ground abstraction is then extended by the extracted clauses, and an SMT solver checks the satisfiability of the resulting set of equational and non-equational ground literals.

In favor of examples and explanations we omit all proofs; they are available in an extended version published as a research report [24]. The rest of the paper is organized as follows. Section 2 provides the basic formalisms underlying SCL(EQ). The rules of the calculus are presented in Sect. 3. Soundness and completeness results are provided in Sect. 4. We end with a discussion of the obtained results and future work in Sect. 5. The main contributions of this paper are the SCL(EQ) calculus itself, which only learns non-redundant clauses and permits subset-based redundancy elimination and rewriting, and its soundness and completeness.

#### **2 Preliminaries**

We assume a standard first-order language with equality and signature Σ = (Ω, ∅) where the only predicate symbol is equality ≈. N denotes a set of clauses, C, D denote clauses, L, K, H denote equational literals, A, B denote equational atoms, t, s terms from T(Ω, X) for an infinite set of variables X, f, g, h function symbols from Ω, a, b, c constants from Ω, and x, y, z variables from X. The function comp denotes the complement of a literal. We write s ≉ t as a shortcut for ¬(s ≈ t). The literal s # t may denote both s ≈ t and s ≉ t. The semantics of first-order logic and semantic entailment |= is defined as usual.

By σ, τ, δ we denote substitutions, which are total mappings from variables to terms. Let σ be a substitution; then its finite domain is defined as dom(σ) := {x | xσ ≠ x} and its codomain is defined as codom(σ) := {xσ | x ∈ dom(σ)}. We extend the application of substitutions to literals, clauses, and sets of such objects in the usual way. A term, literal, clause, or set of these objects is ground if it does not contain any variable. A substitution σ is ground if codom(σ) is ground. A substitution σ is grounding for a term t, literal L, clause C if tσ, Lσ, Cσ is ground, respectively. By C·σ, L·σ we denote a closure consisting of a clause C, literal L and a grounding substitution σ, respectively. The function gnd computes the set of all ground instances of a literal, clause, or clause set. The function mgu denotes the most general unifier of terms, atoms, literals, respectively. We assume that mgus do not introduce fresh variables and that they are idempotent.
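The paper gives no code; as an illustration only, the substitution notions above can be sketched in Python under our own term encoding (function applications as tuples headed by the symbol, variables as strings — all names here are our choices, not the paper's):

```python
# Illustrative sketch: terms as nested tuples, variables as plain strings.

def is_var(t):
    return isinstance(t, str)

def apply_subst(t, sigma):
    """Apply substitution sigma (dict variable -> term) to term t."""
    if is_var(t):
        return sigma.get(t, t)          # xσ = x for x outside dom(σ)
    f, *args = t
    return (f, *[apply_subst(a, sigma) for a in args])

def dom(sigma):
    """dom(σ) := {x | xσ ≠ x}"""
    return {x for x, t in sigma.items() if t != x}

def codom(sigma):
    """codom(σ) := {xσ | x ∈ dom(σ)}"""
    return {sigma[x] for x in dom(sigma)}

def is_ground(t):
    return not is_var(t) and all(is_ground(a) for a in t[1:])

sigma = {"x": ("a",), "y": "y"}          # y maps to itself, so y ∉ dom(σ)
t = ("f", "x", ("g", "y"))
print(apply_subst(t, sigma))             # ('f', ('a',), ('g', 'y'))
print(dom(sigma))                        # {'x'}
print(is_ground(apply_subst(t, sigma)))  # False: y is still free
```

Note that σ here is grounding for t only if every variable of t is sent to a ground term, matching the definition above.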

The set of positions pos(L) of a literal (pos(t) of a term) is inductively defined as usual. The notation L|<sub>p</sub> denotes the subterm of a literal L (t|<sub>p</sub> for a term t) at position p ∈ pos(L) (p ∈ pos(t)). The replacement of a subterm of a literal L (term t) at position p ∈ pos(L) (p ∈ pos(t)) by a term s is denoted by L[s]<sub>p</sub> (t[s]<sub>p</sub>). For example, the term f(a, g(x)) has the positions {ε, 1, 2, 21}, f(a, g(x))|<sub>21</sub> = x, and f(a, g(x))[b]<sub>2</sub> denotes the term f(a, b).
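As a rough sketch (ours, not the paper's), positions, subterm access, and replacement can be made concrete for the f(a, g(x)) example above, again with terms as nested tuples:

```python
# Sketch of pos(t), t|_p and t[s]_p; positions are tuples of argument indices,
# the empty tuple () playing the role of ε.

def is_var(t):
    return isinstance(t, str)

def positions(t):
    """pos(t): ε plus i.p for each argument i and position p of that argument."""
    if is_var(t):
        return [()]
    result = [()]
    for i, arg in enumerate(t[1:], start=1):
        result += [(i,) + p for p in positions(arg)]
    return result

def subterm(t, p):
    """t|_p"""
    for i in p:
        t = t[i]          # argument i sits at tuple index i (index 0 is the symbol)
    return t

def replace(t, p, s):
    """t[s]_p"""
    if p == ():
        return s
    i, rest = p[0], p[1:]
    return t[:i] + (replace(t[i], rest, s),) + t[i + 1:]

t = ("f", ("a",), ("g", "x"))
print(positions(t))              # [(), (1,), (2,), (2, 1)] -- i.e. {ε, 1, 2, 21}
print(subterm(t, (2, 1)))        # 'x'
print(replace(t, (2,), ("b",)))  # ('f', ('a',), ('b',)) -- i.e. f(a, b)
```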

Let R be a set of rewrite rules l → r, called a *term rewrite system* (TRS). The rewrite relation →<sub>R</sub> ⊆ T(Ω, X) × T(Ω, X) is defined as usual by s →<sub>R</sub> t if there exists (l → r) ∈ R, p ∈ pos(s), and a matcher σ, such that s|<sub>p</sub> = lσ and t = s[rσ]<sub>p</sub>. We write s = t↓<sub>R</sub> if s is the normal form of t in the rewrite relation →<sub>R</sub>. We write s # t = (s′ # t′)↓<sub>R</sub> if s is the normal form of s′ and t is the normal form of t′. A rewrite relation is terminating if there is no infinite descending chain t<sub>0</sub> → t<sub>1</sub> → ... and confluent if t<sub>1</sub> ←<sup>∗</sup> s →<sup>∗</sup> t<sub>2</sub> implies that t<sub>1</sub> and t<sub>2</sub> are joinable, i.e., t<sub>1</sub> →<sup>∗</sup> u ←<sup>∗</sup> t<sub>2</sub> for some u. A rewrite relation is convergent if it is terminating and confluent. A rewrite order is an irreflexive and transitive rewrite relation. A TRS R is terminating, confluent, convergent, if the rewrite relation →<sub>R</sub> is terminating, confluent, convergent, respectively. A term t is called irreducible by a TRS R if no rule from R rewrites t; otherwise it is called reducible. A literal or clause is irreducible if all of its terms are irreducible, and reducible otherwise. A substitution σ is called irreducible if every t ∈ codom(σ) is irreducible, and reducible otherwise.
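The rewrite relation and normal forms can be sketched with a naive matcher and leftmost-outermost reduction. This is our own toy illustration, not the paper's machinery; termination is assumed of the rule set, as in the definitions above:

```python
# Naive sketch of →_R and t↓_R. Terms are nested tuples, variables strings,
# rules are pairs (l, r). The caller must supply a terminating R.

def is_var(t):
    return isinstance(t, str)

def match(pattern, t, sigma=None):
    """Return a matcher σ with pattern·σ = t, or None if none exists."""
    sigma = dict(sigma or {})
    if is_var(pattern):
        if pattern in sigma:
            return sigma if sigma[pattern] == t else None
        sigma[pattern] = t
        return sigma
    if is_var(t) or pattern[0] != t[0] or len(pattern) != len(t):
        return None
    for p_arg, t_arg in zip(pattern[1:], t[1:]):
        sigma = match(p_arg, t_arg, sigma)
        if sigma is None:
            return None
    return sigma

def apply_subst(t, sigma):
    if is_var(t):
        return sigma.get(t, t)
    return (t[0], *[apply_subst(a, sigma) for a in t[1:]])

def rewrite_once(t, rules):
    """One →_R step at the outermost-leftmost redex, or None if t is irreducible."""
    for l, r in rules:
        sigma = match(l, t)
        if sigma is not None:
            return apply_subst(r, sigma)
    if not is_var(t):
        for i, arg in enumerate(t[1:], start=1):
            reduced = rewrite_once(arg, rules)
            if reduced is not None:
                return t[:i] + (reduced,) + t[i + 1:]
    return None

def normal_form(t, rules):
    """t↓_R for a terminating R."""
    while (step := rewrite_once(t, rules)) is not None:
        t = step
    return t

R = [(("f", "x"), "x")]                        # the single rule f(x) -> x
print(normal_form(("f", ("f", ("a",))), R))    # ('a',)
```

Irreducibility of a term is then simply `rewrite_once(t, R) is None`, mirroring the definition.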

Let ≺<sub>T</sub> denote a well-founded rewrite ordering on terms which is total on ground terms and such that for every ground term t there exist only finitely many ground terms s ≺<sub>T</sub> t. We call ≺<sub>T</sub> a *desired* term ordering. We extend ≺<sub>T</sub> to equations by assigning the multiset {s, t} to positive equations s ≈ t and the multiset {s, s, t, t} to inequations s ≉ t. Furthermore, we identify ≺<sub>T</sub> with its multiset extension comparing multisets of literals. For a (multi)set of terms {t<sub>1</sub>,...,t<sub>n</sub>} and a term t, we define {t<sub>1</sub>,...,t<sub>n</sub>} ≺<sub>T</sub> t if {t<sub>1</sub>,...,t<sub>n</sub>} ≺<sub>T</sub> {t}. For a (multi)set of literals {L<sub>1</sub>,...,L<sub>n</sub>} and a term t, we define {L<sub>1</sub>,...,L<sub>n</sub>} ≺<sub>T</sub> t if {L<sub>1</sub>,...,L<sub>n</sub>} ≺<sub>T</sub> {{t}}. Given a ground term β, gnd<sup>≺<sub>T</sub>β</sup> computes the set of all ground instances of a literal, clause, or clause set where the groundings are smaller than β according to the ordering ≺<sub>T</sub>. Given a set (sequence) of ground literals Γ, let conv(Γ) be a convergent rewrite system built out of the positive equations in Γ using ≺<sub>T</sub>.
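The literal encoding above can be illustrated with a small sketch of the standard Dershowitz–Manna multiset extension (our own simplification; the base order `term_gt` below is a toy size measure, not a desired term ordering in the paper's sense):

```python
from collections import Counter

# Sketch: a positive equation s ≈ t becomes the multiset {s, t}, an
# inequation s ≉ t the multiset {s, s, t, t}; multisets are compared by the
# Dershowitz-Manna extension of a base order on terms.

def multiset_greater(m, n, gt):
    """M ≻ N iff M ≠ N and every element of N \\ M (multiset difference)
    is dominated by some element of M \\ N."""
    m, n = Counter(m), Counter(n)
    diff_m, diff_n = m - n, n - m
    if not diff_m and not diff_n:
        return False                    # equal multisets are incomparable
    return all(any(gt(x, y) for x in diff_m) for y in diff_n)

def literal_multiset(s, t, positive):
    return [s, t] if positive else [s, s, t, t]

def size(t):                            # toy base measure: number of symbols
    return 1 if isinstance(t, str) else 1 + sum(size(a) for a in t[1:])

def term_gt(s, t):
    return size(s) > size(t)

a, fa = ("a",), ("f", ("a",))
# The inequation f(a) ≉ a dominates the equation f(a) ≈ a, as the
# {s, s, t, t} encoding intends:
print(multiset_greater(literal_multiset(fa, a, False),
                       literal_multiset(fa, a, True), term_gt))   # True
```

The doubled multiset for inequations makes s ≉ t strictly larger than s ≈ t over the same terms, which is exactly the effect used in the trail induced ordering later.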

Let ≺ be a well-founded, total, strict ordering on ground literals, which is lifted to clauses and clause sets by its respective multiset extension. We overload ≺ for literals, clauses, and clause sets if the meaning is clear from the context. The ordering is lifted to the non-ground case via instantiation: we define C ≺ D if for all grounding substitutions σ it holds that Cσ ≺ Dσ. Then we define ⪯ as the reflexive closure of ≺, let N<sup>⪯C</sup> := {D | D ∈ N and D ⪯ C}, and use the standard superposition style notion of redundancy [2].

**Definition 1 (Clause Redundancy).** *A ground clause* C *is* redundant *with respect to a set* N *of ground clauses and an ordering* ≺ *if* N<sup>≺C</sup> |= C*. A clause* C *is* redundant *with respect to a clause set* N *and an ordering* ≺ *if for all* C′ ∈ gnd(C)*,* C′ *is redundant with respect to* gnd(N)*.*

### **3 The SCL(EQ) Calculus**

We start the introduction of the calculus by defining the ingredients of an SCL(EQ) state.

**Definition 2 (Trail).** *A* trail Γ := [L<sub>1</sub><sup>i<sub>1</sub>:C<sub>1</sub>·σ<sub>1</sub></sup>, ..., L<sub>n</sub><sup>i<sub>n</sub>:C<sub>n</sub>·σ<sub>n</sub></sup>] *is a consistent sequence of ground equations and inequations where each* L<sub>j</sub> *is annotated by a level* i<sub>j</sub> *with* i<sub>j−1</sub> ≤ i<sub>j</sub>*, and by a closure* C<sub>j</sub>·σ<sub>j</sub>*. We omit the annotations if they are not needed in a certain context. A ground literal* L *is true in* Γ *if* Γ |= L*. A ground literal* L *is false in* Γ *if* Γ |= comp(L)*. A ground literal* L *is undefined in* Γ *if* Γ ⊭ L *and* Γ ⊭ comp(L)*. Otherwise it is defined. For each literal* L<sub>j</sub> *in* Γ *it holds that* L<sub>j</sub> *is undefined in* [L<sub>1</sub>, ..., L<sub>j−1</sub>] *and irreducible by* conv({L<sub>1</sub>, ..., L<sub>j−1</sub>})*.*

The above definition of truth and undefinedness is extended to clauses in the obvious way. The notions of true, false, and undefined can be parameterized by a ground term β by saying that L is β-undefined in a trail Γ if β ≺<sub>T</sub> L or L is undefined. The notions of a β-true, β-false literal are the restrictions of the above notions to literals smaller than β, respectively. All SCL(EQ) reasoning is layered with respect to a ground term β.
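As a rough, heavily simplified illustration of how trail truth can be decided via conv(Γ) (our own sketch under strong assumptions: ground flat terms, a toy size order with arbitrary tie-breaking assumed terminating, and only positive equations contributing):

```python
# Sketch: decide a ground equation's status in a trail by normalizing both
# sides with conv(Γ), the positive trail equations oriented larger -> smaller.

def is_var(t):
    return isinstance(t, str)

def size(t):
    return 1 if is_var(t) else 1 + sum(size(a) for a in t[1:])

def conv(trail):
    """Orient each positive equation (s, t) into a rule; ties broken arbitrarily
    (assumed terminating for this toy example)."""
    return [((s, t) if size(s) > size(t) else (t, s))
            for s, t, positive in trail if positive]

def normal_form(t, rules):
    changed = True
    while changed:
        changed = False
        for l, r in rules:
            if t == l:                   # ground rewriting at the root only,
                t, changed = r, True     # enough for the flat example below
        if not is_var(t):
            args = [normal_form(a, rules) for a in t[1:]]
            if tuple(args) != t[1:]:
                t, changed = (t[0], *args), True
    return t

def truth_value(s, t, positive, trail):
    rules = conv(trail)
    equal = normal_form(s, rules) == normal_form(t, rules)
    if equal:
        return "true" if positive else "false"
    return "undefined"                   # trail inequations are ignored here

a, b, c = ("a",), ("b",), ("c",)
trail = [(("f", a), ("f", b), True), (a, b, True), (b, c, True)]
print(truth_value(("g", a), ("g", c), True, trail))   # 'true'
```

A real implementation would also consult the trail's inequations and restrict attention to literals below β; both are omitted here for brevity.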

**Definition 3.** *Let* Γ *be a trail and* L *a ground literal such that* L *is defined in* Γ*. By* core(Γ;L) *we denote a minimal subsequence* Γ′ ⊆ Γ *such that* L *is defined in* Γ′*. By* cores(Γ;L) *we denote the set of all cores.*

Note that core(Γ;L) is not necessarily unique. There can be multiple cores for a given trail Γ and ground literal L.

**Definition 4 (Trail Ordering).** *Let* <sup>Γ</sup> := [L1, ..., Ln] *be a trail. The (partial) trail ordering* <sup>≺</sup><sup>Γ</sup> *is the sequence ordering given by* <sup>Γ</sup>*, i.e.,* <sup>L</sup><sup>i</sup> <sup>≺</sup><sup>Γ</sup> <sup>L</sup><sup>j</sup> *if* i<j *for all* <sup>1</sup> <sup>≤</sup> i, j <sup>≤</sup> <sup>n</sup>*.*

**Definition 5 (Defining Core and Defining Literal).** *For a trail* Γ *and a sequence of literals* Δ ⊆ Γ *we write* max<sub>≺Γ</sub>(Δ) *for the largest literal in* Δ *according to the trail ordering* ≺<sub>Γ</sub>*. Let* Γ *be a trail and* L *a ground literal such that* L *is defined in* Γ*. Let* Δ ∈ cores(Γ;L) *be a sequence of literals where* max<sub>≺Γ</sub>(Δ) ⪯<sub>Γ</sub> max<sub>≺Γ</sub>(Λ) *for all* Λ ∈ cores(Γ;L)*; then* max<sub>Γ</sub>(L) := max<sub>≺Γ</sub>(Δ) *is called the* defining literal *and* Δ *is called a* defining core *for* L *in* Γ*. If* cores(Γ;L) *contains only the empty core, then* L *has no* defining literal *and no* defining core*.*

Note that there can be multiple defining cores but only one defining literal for any defined literal L. For example, consider a trail Γ := [f(a) ≈ f(b)<sup>1:C<sub>1</sub>·σ<sub>1</sub></sup>, a ≈ b<sup>2:C<sub>2</sub>·σ<sub>2</sub></sup>, b ≈ c<sup>3:C<sub>3</sub>·σ<sub>3</sub></sup>] with an ordering ≺<sub>T</sub> that orders the terms of the equations from left to right, and a literal g(f(a)) ≈ g(f(c)). Then the defining cores are Δ<sub>1</sub> := [a ≈ b, b ≈ c] and Δ<sub>2</sub> := [f(a) ≈ f(b), b ≈ c]. The defining literal, however, is in both cases b ≈ c. Defined literals that have no defining core and therefore no defining literal are literals that are trivially false or true. Consider, for example, g(f(a)) ≈ g(f(a)). This literal is trivially true in Γ. Thus an empty subset of Γ is sufficient to show that g(f(a)) ≈ g(f(a)) is defined in Γ.

**Definition 6 (Literal Level).** *Let* Γ *be a trail. A ground literal* L ∈ Γ *is of* level i *if* L *is annotated with* i *in* Γ*. A defined ground literal* L ∉ Γ *is of level* i *if the defining literal of* L *is of level* i*. If* L *has no defining literal, then* L *is of level* 0*. A ground clause* D *is of level* i *if* i *is the maximum level of a literal in* D*.*

The restriction to minimal subsequences for the defining literal and definition of a level eventually guarantee that learned clauses are smaller in the trail ordering. This enables completeness in combination with learning non-redundant clauses as shown later.

**Lemma 7.** *Let* <sup>Γ</sup><sup>1</sup> *be a trail and* <sup>K</sup> *a defined literal that is of level* <sup>i</sup> *in* <sup>Γ</sup><sup>1</sup>*. Then* <sup>K</sup> *is of level* <sup>i</sup> *in a trail* <sup>Γ</sup> := <sup>Γ</sup>1, Γ<sup>2</sup>*.*

**Definition 8.** *Let* Γ *be a trail and* L ∈ Γ *a literal.* L *is called a* decision literal *if* Γ = Γ<sub>0</sub>, K<sup>i:C·τ</sup>, L<sup>i+1:C′·τ′</sup>, Γ<sub>1</sub>*. Otherwise* L *is called a* propagated literal*.*

In our above example g(f(a)) ≈ g(f(c)) is of level 3 since the defining literal b ≈ c is annotated with 3. a ≈ b on the other hand is of level 2.

We define a well-founded total strict ordering which is induced by the trail and with which non-redundancy is proven in Sect. 4. Unlike SCL [14,18], we use this ordering for the inference rules as well. In previous SCL calculi, conflict resolution automatically chooses the greatest literal and resolves with this literal. In SCL(EQ) this is generalized. Coming back to our running example above, suppose we have a conflict clause f(b) ≉ f(c) ∨ b ≉ c. The defining literal for both inequations is b ≈ c, so we could do paramodulation inferences with both literals. The following ordering makes this non-deterministic choice deterministic.

**Definition 9 (Trail Induced Ordering).** *Let* Γ := [L<sub>1</sub><sup>i<sub>1</sub>:C<sub>1</sub>·σ<sub>1</sub></sup>, ..., L<sub>n</sub><sup>i<sub>n</sub>:C<sub>n</sub>·σ<sub>n</sub></sup>] *be a trail,* β *a ground term such that* {L<sub>1</sub>, ..., L<sub>n</sub>} ≺<sub>T</sub> β*, and* M<sub>i,j</sub> *all* β*-defined ground literals not contained in* Γ ∪ comp(Γ)*, where for a defining literal* max<sub>Γ</sub>(M<sub>i,j</sub>) = L<sub>i</sub> *and for two literals* M<sub>i,j</sub>*,* M<sub>i,k</sub> *we have* j < k *if* M<sub>i,j</sub> ≺<sub>T</sub> M<sub>i,k</sub>*. The trail induces a total well-founded strict order* ≺<sub>Γ*</sub> *on* β*-defined ground literals* M<sub>k,l</sub>, M<sub>m,n</sub>*,* L<sub>i</sub>*,* L<sub>j</sub> *of level greater than zero, where*

1. M<sub>i,j</sub> ≺<sub>Γ*</sub> M<sub>k,l</sub> *if* i < k *or (*i = k *and* j < l*)*
2. L<sub>i</sub> ≺<sub>Γ*</sub> L<sub>j</sub> *if* L<sub>i</sub> ≺<sub>Γ</sub> L<sub>j</sub>
3. comp(L<sub>i</sub>) ≺<sub>Γ*</sub> L<sub>j</sub> *if* L<sub>i</sub> ≺<sub>Γ</sub> L<sub>j</sub>
4. L<sub>i</sub> ≺<sub>Γ*</sub> comp(L<sub>j</sub>) *if* L<sub>i</sub> ≺<sub>Γ</sub> L<sub>j</sub> *or* i = j
5. comp(L<sub>i</sub>) ≺<sub>Γ*</sub> comp(L<sub>j</sub>) *if* L<sub>i</sub> ≺<sub>Γ</sub> L<sub>j</sub>
6. L<sub>i</sub> ≺<sub>Γ*</sub> M<sub>k,l</sub>*,* comp(L<sub>i</sub>) ≺<sub>Γ*</sub> M<sub>k,l</sub> *if* i ≤ k
7. M<sub>k,l</sub> ≺<sub>Γ*</sub> L<sub>i</sub>*,* M<sub>k,l</sub> ≺<sub>Γ*</sub> comp(L<sub>i</sub>) *if* k < i

*and for all* β*-defined literals* L *of level zero:*

8. ≺<sub>Γ*</sub> := ≺<sub>T</sub>
9. L ≺<sub>Γ*</sub> K *if* K *is of level greater than zero and* K *is* β*-defined*

*and can eventually be extended to* β*-undefined ground literals* K, H *by*

10. K ≺<sub>Γ*</sub> H *if* K ≺<sub>T</sub> H
11. L ≺<sub>Γ*</sub> H *if* L *is* β*-defined*

*The literal ordering* <sup>≺</sup><sup>Γ</sup> <sup>∗</sup> *is extended to ground clauses by multiset extension and identified with* <sup>≺</sup><sup>Γ</sup> <sup>∗</sup> *as well.*

**Lemma 10 (Properties of ≺<sub>Γ*</sub>).**

1. ≺<sub>Γ*</sub> *is well-defined.*
2. ≺<sub>Γ*</sub> *is a total strict order, i.e.,* ≺<sub>Γ*</sub> *is irreflexive, transitive, and total.*
3. ≺<sub>Γ*</sub> *is a well-founded ordering.*

*Example 11.* Assume a trail Γ := [a ≈ b<sup>1:C<sub>0</sub>·σ<sub>0</sub></sup>, c ≈ d<sup>1:C<sub>1</sub>·σ<sub>1</sub></sup>, f(a′) ≉ f(b′)<sup>1:C<sub>2</sub>·σ<sub>2</sub></sup>], select KBO as the term ordering ≺<sub>T</sub> where all symbols have weight one and a ≺ a′ ≺ b ≺ b′ ≺ c ≺ d ≺ f, and a ground term β := f(f(a)). According to the trail induced ordering we have that a ≈ b ≺<sub>Γ*</sub> c ≈ d ≺<sub>Γ*</sub> f(a′) ≉ f(b′) by 9.2. Furthermore we have that

$$a \approx b \prec\_{\Gamma^{\*}} a \not\approx b \prec\_{\Gamma^{\*}} c \approx d \prec\_{\Gamma^{\*}} c \not\approx d \prec\_{\Gamma^{\*}} f(a') \not\approx f(b') \prec\_{\Gamma^{\*}} f(a') \approx f(b')$$

by 9.3 and 9.4. Now for any literal L that is β-*defined* in Γ whose defining literal is a ≈ b it holds that a ≈ b ≺<sub>Γ*</sub> L ≺<sub>Γ*</sub> c ≈ d by 9.6 and 9.7. This holds analogously for all literals that are β-*defined* in Γ whose defining literal is c ≈ d or f(a′) ≉ f(b′). Thus we get:

$$\begin{array}{c} L\_1 \prec\_{\Gamma^\*} \dots \prec\_{\Gamma^\*} a \approx b \prec\_{\Gamma^\*} a \not\approx b \prec\_{\Gamma^\*} f(a) \approx f(b) \prec\_{\Gamma^\*} f(a) \not\approx f(b) \prec\_{\Gamma^\*} \\ c \approx d \prec\_{\Gamma^\*} c \not\approx d \prec\_{\Gamma^\*} f(c) \approx f(d) \prec\_{\Gamma^\*} f(c) \not\approx f(d) \prec\_{\Gamma^\*} \\ f(a') \not\approx f(b') \prec\_{\Gamma^\*} f(a') \approx f(b') \prec\_{\Gamma^\*} a' \approx b' \prec\_{\Gamma^\*} a' \not\approx b' \prec\_{\Gamma^\*} K\_1 \prec\_{\Gamma^\*} \dots \end{array}$$

where <sup>K</sup><sup>i</sup> are the <sup>β</sup>-*undefined* literals and <sup>L</sup><sup>j</sup> are the trivially defined literals.

**Definition 12 (Rewrite Step).** *A* rewrite step *is a five-tuple* (s#t·σ, s#t ∨ C·σ, R, S, p)*, inductively defined as follows. The tuple* (s#t·σ, s#t ∨ C·σ, ε, ε, ε) *is a rewrite step. Given rewrite steps* R, S *and a position* p*, then* (s#t·σ, s#t ∨ C·σ, R, S, p) *is a* rewrite step*. The literal* s#t *is called the* rewrite literal*. In case* R, S *are not* ε*, the rewrite literal of* R *is an equation.*

Rewriting is one of the core features of our calculus. The following definition describes a rewrite inference between two clauses. Note that unlike the superposition calculus we allow rewriting below variable level.

**Definition 13 (Rewrite Inference).** *Let* <sup>I</sup><sup>1</sup> := (l<sup>1</sup> <sup>≈</sup> <sup>r</sup>1· <sup>σ</sup>1, l<sup>1</sup> <sup>≈</sup> <sup>r</sup><sup>1</sup> <sup>∨</sup> <sup>C</sup>1· <sup>σ</sup>1, R1, L1, p1) *and* <sup>I</sup><sup>2</sup> := (l2#r2· <sup>σ</sup>2, l2#r2∨C2· <sup>σ</sup>2, R2, L2, p2) *be two variable disjoint rewrite steps where* <sup>r</sup>1σ<sup>1</sup> <sup>≺</sup><sup>T</sup> <sup>l</sup>1σ1*,* (l2#r2)σ2|<sup>p</sup> <sup>=</sup> <sup>l</sup>1σ<sup>1</sup> *for some position* <sup>p</sup>*. We distinguish two cases:*


**Lemma 14.** *Let* <sup>I</sup><sup>1</sup> := (l<sup>1</sup> <sup>≈</sup> <sup>r</sup>1· <sup>σ</sup>1, l<sup>1</sup> <sup>≈</sup> <sup>r</sup><sup>1</sup> <sup>∨</sup> <sup>C</sup>1· <sup>σ</sup>1, R1, L1, p1) *and* <sup>I</sup><sup>2</sup> := (l2#r2· <sup>σ</sup>2, l2#r<sup>2</sup> <sup>∨</sup> <sup>C</sup>2· <sup>σ</sup>2, R2, L2, p2) *be two variable disjoint rewrite steps where* <sup>r</sup>1σ<sup>1</sup> <sup>≺</sup><sup>T</sup> <sup>l</sup>1σ1*,* (l2#r2)σ2|<sup>p</sup> <sup>=</sup> <sup>l</sup>1σ<sup>1</sup> *for some position* <sup>p</sup>*. Let* <sup>I</sup><sup>3</sup> := (l3#r3· <sup>σ</sup>3, l3#r<sup>3</sup> <sup>∨</sup> <sup>C</sup>3· <sup>σ</sup>3, I1, I2, p) *be the result of a rewrite inference. Then:*


Now that we have defined rewrite inferences we can use them to define a *reduction chain application* and a *refutation*, which are sequences of rewrite steps. Intuitively speaking, a *reduction chain application* reduces a literal in a clause with literals in conv(Γ) until it is irreducible. A *refutation* for a literal <sup>L</sup> that is <sup>β</sup>-*false* in <sup>Γ</sup> for a given <sup>β</sup>, is a sequence of rewrite steps with literals in Γ, L such that ⊥ is inferred. Refutations for the literals of the conflict clause will be examined during conflict resolution by the rule Explore-Refutation.

**Definition 15 (Reduction Chain).** *Let* <sup>Γ</sup> *be a trail. A* reduction chain <sup>P</sup> *from* <sup>Γ</sup> *is a sequence of rewrite steps* [I1, ..., Im] *such that for each* <sup>I</sup><sup>i</sup> <sup>=</sup> (si#ti· <sup>σ</sup>i, si#t<sup>i</sup> <sup>∨</sup> <sup>C</sup>i· <sup>σ</sup>i, I<sup>j</sup> , Ik, pi) *either*


*Let* (l # r)δ<sup>o:(l # r ∨ C)·δ</sup> *be an annotated ground literal. A* reduction chain application *from* Γ *to* l # r *is a reduction chain* [I<sub>1</sub>, ..., I<sub>m</sub>] *from* Γ, (l # r)δ<sup>o:(l # r ∨ C)·δ</sup> *such that* lδ↓<sub>conv(Γ)</sub> = s<sub>m</sub>σ<sub>m</sub> *and* rδ↓<sub>conv(Γ)</sub> = t<sub>m</sub>σ<sub>m</sub>*. We assume reduction chain applications to be minimal, i.e., if any rewrite step is removed from the sequence it is no longer a reduction chain application.*

**Definition 16 (Refutation).** *Let* $\Gamma$ *be a trail and* $(l \,\#\, r)\delta^{\,o:\,l \,\#\, r \vee C \cdot \delta}$ *an annotated ground literal that is* β-*false in* $\Gamma$ *for a given* $\beta$*. A* refutation $P$ *from* $\Gamma$ *and* $l \,\#\, r$ *is a reduction chain* $[I_1, \ldots, I_m]$ *from* $\Gamma, (l \,\#\, r)\delta^{\,o:\,l \,\#\, r \vee C \cdot \delta}$ *such that* $(s_m \,\#\, t_m)\sigma_m = s \not\approx s$ *for some* $s$*. We assume refutations to be minimal, i.e., if any rewrite step* $I_k$*,* $k < m$*, is removed from the refutation, it is no longer a refutation.*

#### **3.1 The SCL(EQ) Inference Rules**

We can now define the rules of our calculus based on the previous definitions. A *state* is a six-tuple $(\Gamma; N; U; \beta; k; D)$ similar to the SCL calculus, where $\Gamma$ is a sequence of annotated ground literals, $N$ and $U$ are the sets of initial and learned clauses, $\beta$ is a ground term such that $L \prec_T \beta$ for all $L \in \Gamma$, $k$ is the decision level, and $D$ is a status that is $\top$, $\bot$, or a closure $C \cdot \sigma$. Before we propagate or decide any literal, we make sure that it is irreducible in the current trail. Together with the design of $\prec_{\Gamma^*}$ this eventually enables rewriting as a simplification rule.
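For illustration only, the six-tuple state can be modeled as a small record. All names and representations below are our own choices, not part of the calculus definition; the status field stands in for ⊤, ⊥, or a closure $C \cdot \sigma$.

```python
from dataclasses import dataclass
from typing import Union

# Illustrative encoding (ours): a closure pairs a clause with a grounding.
@dataclass
class Closure:
    clause: tuple        # clause with variables, e.g. a tuple of literals
    grounding: dict      # grounding substitution sigma

@dataclass
class TrailLiteral:
    literal: tuple       # ground equation or inequation
    level: int           # decision level at which it was added
    justification: Closure

TOP, BOTTOM = "top", "bottom"   # stand-ins for the statuses ⊤ and ⊥

@dataclass
class State:
    trail: list          # Gamma: sequence of TrailLiteral, all below beta
    N: set               # initial clauses
    U: set               # learned clauses
    beta: tuple          # ground term bounding the considered instances
    k: int               # current decision level
    status: Union[str, Closure] = TOP   # ⊤, ⊥, or a conflict closure

# the initial state (ε; N; ∅; β; 0; ⊤)
def initial_state(N, beta):
    return State(trail=[], N=set(N), U=set(), beta=beta, k=0)
```

A rule application such as Backtrack would then map one `State` to the next, e.g. truncating `trail` and adding the learned clause to `U`.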

#### **Propagate**

$(\Gamma; N; U; \beta; k; \top) \Rightarrow_{\mathrm{SCL(EQ)}} (\Gamma, (s_m \,\#\, t_m)\sigma_m^{\,k:\,(s_m \#\, t_m \vee C_m)\cdot \sigma_m}; N; U; \beta; k; \top)$ provided there is a $C \in (N \cup U)$, $\sigma$ grounding for $C$, $C = C_0 \vee C_1 \vee L$, $\Gamma \models \neg C_0\sigma$, $C_1\sigma = L\sigma \vee \ldots \vee L\sigma$, $C_1 = L_1 \vee \ldots \vee L_n$, $\mu = \mathrm{mgu}(L_1, \ldots, L_n, L)$, $L\sigma$ is β-*undefined* in $\Gamma$, $(C_0 \vee L)\mu\sigma \prec_T \beta$, $\sigma$ is irreducible by $\mathrm{conv}(\Gamma)$, and $[I_1,\ldots,I_m]$ is a reduction chain application from $\Gamma$ to $L\sigma^{\,k:\,(L \vee C_0)\mu \cdot \sigma}$ where $I_m = (s_m \,\#\, t_m \cdot \sigma_m,\ s_m \,\#\, t_m \vee C_m \cdot \sigma_m,\ I_j, I_k, p_m)$.

Note that the definition of Propagate also includes the case where $L\sigma$ is irreducible by $\Gamma$. In this case $L = s_m \,\#\, t_m$ and $m = 1$. The rule Decide below is similar to Propagate, except for the subclause $C_0$, which must be β-*undefined* or β-*true* in $\Gamma$, i.e., Propagate cannot be applied, and the decision literal is annotated by a tautology.

#### **Decide**

$(\Gamma; N; U; \beta; k; \top) \Rightarrow_{\mathrm{SCL(EQ)}} (\Gamma, (s_m \,\#\, t_m)\sigma_m^{\,k+1:\,(s_m \#\, t_m \vee \mathrm{comp}(s_m \#\, t_m))\cdot \sigma_m}; N; U; \beta; k+1; \top)$

provided there is a $C \in (N \cup U)$, $\sigma$ grounding for $C$, $C = C_0 \vee L$, $C_0\sigma$ is β-*undefined* or β-*true* in $\Gamma$, $L\sigma$ is β-*undefined* in $\Gamma$, $(C_0 \vee L)\sigma \prec_T \beta$, $\sigma$ is irreducible by $\mathrm{conv}(\Gamma)$, and $[I_1,\ldots,I_m]$ is a reduction chain application from $\Gamma$ to $L\sigma^{\,k+1:\,L \vee C_0 \cdot \sigma}$ where $I_m = (s_m \,\#\, t_m \cdot \sigma_m,\ s_m \,\#\, t_m \vee C_m \cdot \sigma_m,\ I_j, I_k, p_m)$.

#### **Conflict**

$(\Gamma; N; U; \beta; k; \top) \Rightarrow_{\mathrm{SCL(EQ)}} (\Gamma; N; U; \beta; k; D)$ provided there is a $D' \in (N \cup U)$, $\sigma$ grounding for $D'$, $D'\sigma$ is β-*false* in $\Gamma$, $\sigma$ is irreducible by $\mathrm{conv}(\Gamma)$, $D = \bot$ if $D'\sigma$ is of level $0$, and $D = D' \cdot \sigma$ otherwise.

For the non-equational case, when a conflict clause is found by an SCL calculus [14,18], the complements of its first-order ground literals are contained in the trail. For equational literals this is not the case, in general. The proof showing $D$ to be β-*false* with respect to $\Gamma$ is a rewrite proof with respect to $\mathrm{conv}(\Gamma)$. This proof needs to be analyzed to eventually perform paramodulation steps on $D$ or to replace $D$ by a $\prec_{\Gamma^*}$-smaller β-*false* clause showing up in the proof.

#### **Skip**

$(\Gamma, K^{\,l:C\cdot\tau}, L^{\,k:C'\cdot\tau'}; N; U; \beta; k; D \cdot \sigma) \Rightarrow_{\mathrm{SCL(EQ)}} (\Gamma, K^{\,l:C\cdot\tau}; N; U; \beta; l; D \cdot \sigma)$ if $D\sigma$ is β-*false* in $\Gamma, K^{\,l:C\cdot\tau}$.

The Explore-Refutation rule is the FOL-with-Equality counterpart to the Resolve rule in CDCL or SCL. While in CDCL or SCL complementary literals of the conflict clause are present on the trail and can directly be used for resolution steps, this needs to be generalized for FOL with Equality. Here, in general, we need to look at (rewriting) refutations of the conflict clause and pick an appropriate clause from the refutation as the next conflict clause.

#### **Explore-Refutation**

$(\Gamma, L; N; U; \beta; k; (D \vee s \,\#\, t) \cdot \sigma) \Rightarrow_{\mathrm{SCL(EQ)}} (\Gamma, L; N; U; \beta; k; (s_j \,\#\, t_j \vee C_j) \cdot \sigma_j)$ if $(s \,\#\, t)\sigma$ is strictly $\prec_{\Gamma^*}$-maximal in $(D \vee s \,\#\, t)\sigma$, $L$ is the defining literal of $(s \,\#\, t)\sigma$, $[I_1, \ldots, I_m]$ is a refutation from $\Gamma$ and $(s \,\#\, t)\sigma$, $I_j = (s_j \,\#\, t_j \cdot \sigma_j,\ (s_j \,\#\, t_j \vee C_j) \cdot \sigma_j,\ I_l, I_k, p_j)$, $1 \leq j \leq m$, $(s_j \,\#\, t_j \vee C_j)\sigma_j \prec_{\Gamma^*} (D \vee s \,\#\, t)\sigma$, and $(s_j \,\#\, t_j \vee C_j)\sigma_j$ is β-*false* in $\Gamma$.

#### **Factorize**

$(\Gamma; N; U; \beta; k; (D \vee L \vee L') \cdot \sigma) \Rightarrow_{\mathrm{SCL(EQ)}} (\Gamma; N; U; \beta; k; (D \vee L)\mu \cdot \sigma)$ provided $L\sigma = L'\sigma$ and $\mu = \mathrm{mgu}(L, L')$.
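Both Propagate and Factorize rely on computing a most general unifier $\mu$. A textbook-style syntactic unification sketch, not specific to SCL(EQ) (the term encoding, with variables as strings and compound terms as tuples, is our own):

```python
# Terms: variables are strings; compound terms are tuples (f, arg1, ..., argn).
def walk(t, sub):
    # Follow variable bindings recorded in the substitution.
    while isinstance(t, str) and t in sub:
        t = sub[t]
    return t

def occurs(x, t, sub):
    # Occurs check: does variable x appear in t under sub?
    t = walk(t, sub)
    if t == x:
        return True
    return isinstance(t, tuple) and any(occurs(x, a, sub) for a in t[1:])

def unify(s, t, sub=None):
    """Return an mgu extending `sub`, or None if s and t do not unify."""
    sub = dict(sub or {})
    stack = [(s, t)]
    while stack:
        a, b = stack.pop()
        a, b = walk(a, sub), walk(b, sub)
        if a == b:
            continue
        if isinstance(a, str):                   # a is an unbound variable
            if occurs(a, b, sub):
                return None
            sub[a] = b
        elif isinstance(b, str):                 # swap so the variable is first
            stack.append((b, a))
        elif a[0] == b[0] and len(a) == len(b):  # same symbol, same arity
            stack.extend(zip(a[1:], b[1:]))
        else:
            return None                          # symbol clash
    return sub
```

For example, `unify(("f", "x"), ("f", ("a",)))` binds `x` to the constant `a`, while `unify("x", ("f", "x"))` fails on the occurs check.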

#### **Equality-Resolution**

$(\Gamma; N; U; \beta; k; (D \vee s \not\approx s') \cdot \sigma) \Rightarrow_{\mathrm{SCL(EQ)}} (\Gamma; N; U; \beta; k; D\mu \cdot \sigma)$ provided $s\sigma = s'\sigma$ and $\mu = \mathrm{mgu}(s, s')$.

#### **Backtrack**

$(\Gamma, K, \Gamma'; N; U; \beta; k; (D \vee L) \cdot \sigma) \Rightarrow_{\mathrm{SCL(EQ)}} (\Gamma; N; U \cup \{D \vee L\}; \beta; j - i; \top)$ provided $D\sigma$ is of level $i'$ where $i' < k$, $K$ is of level $j$ and $\Gamma, K$ is the minimal trail subsequence such that there is a grounding substitution $\tau$ with $(D \vee L)\tau$ β-*false* in $\Gamma, K$ but not in $\Gamma$; $i = 1$ if $K$ is a decision literal and $i = 0$ otherwise.

#### **Grow**

$(\Gamma; N; U; \beta; k; \top) \Rightarrow_{\mathrm{SCL(EQ)}} (\epsilon; N; U; \beta'; 0; \top)$ provided $\beta \prec_T \beta'$.
In addition to soundness and completeness of the SCL(EQ) rules, their tractability in practice is an important property for a successful implementation; in particular, finding propagating literals or detecting a false clause under some grounding. It turns out that these operations are NP-complete, similar to first-order subsumption, which has nevertheless been shown to be tractable in practice.

**Lemma 17.** *Assume that all ground terms* $t$ *with* $t \prec_T \beta$ *for any* $\beta$ *are polynomial in the size of* $\beta$*. Then testing Propagate (Conflict) is NP-complete, i.e., the problem of checking for a given clause* $C$ *whether there exists a grounding substitution* $\sigma$ *such that* $C\sigma$ *propagates (is false) is NP-complete.*
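The search space behind Lemma 17 can be made explicit: a candidate grounding can be guessed and checked in polynomial time, but naive enumeration over $n$ variables and $m$ candidate ground terms below $\beta$ visits $m^n$ substitutions. A minimal sketch of that baseline (names and encoding ours):

```python
from itertools import product

def groundings(variables, ground_terms):
    """All grounding substitutions over a finite candidate set (size m**n)."""
    for combo in product(ground_terms, repeat=len(variables)):
        yield dict(zip(variables, combo))

def exists_grounding(variables, ground_terms, pred):
    # Naive existence check: does some grounding satisfy `pred`
    # (e.g. "this instance of C is false under the trail")?
    return any(pred(sub) for sub in groundings(variables, ground_terms))
```

An implementation would replace this enumeration by constraint-style matching, which is why the NP-completeness is considered tractable in practice.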

*Example 18 (SCL(EQ) vs. Superposition: Saturation).* Consider the following clauses:

$$N := \{\, C_1 := c \approx d \lor D,\; C_2 := a \approx b \lor c \not\approx d,\; C_3 := f(a) \not\approx f(b) \lor g(c) \not\approx g(d) \,\}$$

where again we assume a KBO with all symbols having weight one, precedence $d \prec c \prec b \prec a \prec g \prec f$, and $\beta := f(f(g(a)))$. Suppose that we first decide $c \approx d$ and then propagate $a \approx b$: $\Gamma = [\,c \approx d^{\,1:c \approx d \lor c \not\approx d},\ a \approx b^{\,1:C_2}\,]$. Now we have a conflict with $C_3$. Explore-Refutation applied to the conflict clause $C_3$ results in a paramodulation inference between $C_3$ and $C_2$. Another application of Equality-Resolution gives us the new conflict clause $C_4 := c \not\approx d \lor g(c) \not\approx g(d)$. Now we can Skip the last literal on the trail, which gives us $\Gamma = [\,c \approx d^{\,1:c \approx d \lor c \not\approx d}\,]$. Another application of the Explore-Refutation rule to $C_4$ using the decision justification clause followed by Equality-Resolution and Factorize gives us $C_5 := c \not\approx d$. Thus with SCL(EQ) the following clauses remain:

$$\begin{aligned} C_1' &= D & C_5 &= c \not\approx d \\ C_3 &= f(a) \not\approx f(b) \lor g(c) \not\approx g(d) \end{aligned}$$

where we derived $C_1'$ out of $C_1$ by subsumption resolution [33] using $C_5$. Actually, subsumption resolution is compatible with the general redundancy notion of SCL(EQ), see Lemma 25. Now we consider the same example with superposition and the very same ordering ($N_i$ is the clause set of the previous step and $N_0$ the initial clause set $N$).

$$\begin{array}{l} N_0 \Rightarrow_{\mathrm{Sup}(C_2, C_3)} N_1 \cup \{ C_4 := c \not\approx d \lor g(c) \not\approx g(d) \} \\ \quad \Rightarrow_{\mathrm{Sup}(C_1, C_4)} N_2 \cup \{ C_5 := c \not\approx d \lor D \} \Rightarrow_{\mathrm{Sup}(C_1, C_5)} N_3 \cup \{ C_6 := D \} \end{array}$$

Thus superposition ends up with the following clauses:

$$\begin{array}{ll} C_2 = a \approx b \lor c \not\approx d & C_3 = f(a) \not\approx f(b) \lor g(c) \not\approx g(d) \\ C_4 = c \not\approx d \lor g(c) \not\approx g(d) & C_6 = D \end{array}$$

The superposition calculus generates more and larger clauses.

*Example 19 (SCL(EQ) vs. Superposition: Refutation).* Consider the following set of clauses: $N := \{ C_1 := f(x) \not\approx a \lor f(x) \approx b,\; C_2 := f(f(y)) \approx y,\; C_3 := a \not\approx b \}$, where again we assume a KBO with all symbols having weight one, precedence $b \prec a \prec f$, and $\beta := f(f(f(a)))$. A long refutation by the superposition calculus results in the following ($N_i$ is the clause set of the previous step and $N_0$ the initial clause set $N$):

$$\begin{array}{l} N_0 \Rightarrow_{\mathrm{Sup}(C_1, C_2)} N_1 \cup \{ C_4 := y \not\approx a \lor f(f(y)) \approx b \} \\ \quad \Rightarrow_{\mathrm{Sup}(C_1, C_4)} N_2 \cup \{ C_5 := a \not\approx b \lor f(f(y)) \approx b \lor y \not\approx a \} \\ \quad \Rightarrow_{\mathrm{Sup}(C_2, C_5)} N_3 \cup \{ C_6 := a \not\approx b \lor b \approx y \lor y \not\approx a \} \\ \quad \Rightarrow_{\mathrm{Sup}(C_2, C_4)} N_4 \cup \{ C_7 := y \approx b \lor y \not\approx a \} \\ \quad \Rightarrow_{\mathrm{EqRes}(C_7)} N_5 \cup \{ C_8 := a \approx b \} \Rightarrow_{\mathrm{Sup}(C_3, C_8)} N_6 \cup \{ \bot \} \end{array}$$

The shortest refutation by the superposition calculus is as follows:

$$\begin{array}{l} N_0 \Rightarrow_{\mathrm{Sup}(C_1, C_2)} N_1 \cup \{ C_4 := y \not\approx a \lor f(f(y)) \approx b \} \\ \quad \Rightarrow_{\mathrm{Sup}(C_2, C_4)} N_2 \cup \{ C_5 := y \approx b \lor y \not\approx a \} \\ \quad \Rightarrow_{\mathrm{EqRes}(C_5)} N_3 \cup \{ C_6 := a \approx b \} \Rightarrow_{\mathrm{Sup}(C_3, C_6)} N_4 \cup \{ \bot \} \end{array}$$

In SCL(EQ), on the other hand, we would always first propagate $a \not\approx b$, $f(f(a)) \approx a$ and $f(f(b)) \approx b$. As soon as $a \not\approx b$ and $f(f(a)) \approx a$ are propagated we have a conflict with $C_1\{x \mapsto f(a)\}$. So suppose in the worst case we propagate:

$$\Gamma := \left[\, a \not\approx b^{\,0:\,a \not\approx b},\; f(f(b)) \approx b^{\,0:\,(f(f(y)) \approx y)\{y \mapsto b\}},\; f(f(a)) \approx a^{\,0:\,(f(f(y)) \approx y)\{y \mapsto a\}} \,\right]$$

Now we have a conflict with $C_1\{x \mapsto f(a)\}$. Since there is no decision literal on the trail, the *Conflict* rule immediately returns $\bot$ and we are done.

### **4 Soundness and Completeness**

In this section we show soundness and refutational completeness of SCL(EQ) under the assumption of a regular run. We provide the definition of a regular run and show that for a regular run all learned clauses are non-redundant according to our trail induced ordering. We start with the definition of a sound state.

**Definition 20.** *A state* $(\Gamma; N; U; \beta; k; D)$ *is sound if the following conditions hold:*


**Lemma 21.** *The initial state* $(\epsilon; N; \emptyset; \beta; 0; \top)$ *is sound.*

**Definition 22.** *A run is a sequence of applications of SCL(EQ) rules starting from the initial state.*

**Theorem 23.** *Assume a state* $(\Gamma; N; U; \beta; k; D)$ *resulting from a run. Then* $(\Gamma; N; U; \beta; k; D)$ *is sound.*

Next, we give the definition of a regular run. Intuitively speaking, in a regular run we are always allowed to make decisions except if


To ensure non-redundant learning we enforce at least one application of Skip during conflict resolution except for the special case of a conflict after a decision.

**Definition 24 (Regular Run).** *A run is called* regular *if*


Now we show that any learned clause in a regular run is non-redundant according to our trail induced ordering.

**Lemma 25 (Non-Redundant Clause Learning).** *Let* $N$ *be a clause set. The clauses learned during a regular run in SCL(EQ) are not redundant with respect to* $\prec_{\Gamma^*}$ *and* $N \cup U$*. For the trail, only non-redundant clauses need to be considered.*

The proof of Lemma 25 is based on the fact that conflict resolution eventually produces a clause smaller than the original conflict clause with respect to $\prec_{\Gamma^*}$. All simplifications, e.g., contextual rewriting, as defined in [2,20,33,35–37], are therefore compatible with Lemma 25 and may be applied to the newly learned clause as long as they respect the induced trail ordering. In detail, let $\Gamma$ be the trail before the application of rule Backtrack. The newly learned clause can be simplified according to the induced trail ordering $\prec_{\Gamma^*}$ as long as the simplified clause is smaller with respect to $\prec_{\Gamma^*}$.

Another important consequence of Lemma 25 is that newly learned clauses need not be considered for redundancy. Furthermore, the SCL(EQ) calculus always terminates, Lemma 33, because there are only finitely many non-redundant clauses with respect to a fixed $\beta$.

For dynamic redundancy, we have to take into account that the induced trail ordering changes. At this level, only redundancy criteria and simplifications that are compatible with *all* induced trail orderings may be applied. Due to its construction, the induced trail ordering is compatible with $\prec_T$ for unit clauses.

**Lemma 26 (Unit Rewriting).** *Assume a state* $(\Gamma; N; U; \beta; k; D)$ *resulting from a regular run where the current level* $k > 0$*, and a unit clause* $l \approx r \in N$*. Now assume a clause* $C \vee L[l']_p \in N$ *such that* $l' = l\mu$ *for some matcher* $\mu$*. Now assume some arbitrary grounding substitutions* $\sigma'$ *for* $C \vee L[l']_p$ *and* $\sigma$ *for* $l \approx r$ *such that* $l\sigma = l'\sigma'$ *and* $r\sigma \prec_T l\sigma$*. Then* $(C \vee L[r\mu\sigma\sigma']_p)\sigma' \prec_{\Gamma^*} (C \vee L[l']_p)\sigma'$*.*

In addition, any notion that is based on a literal subset relationship is also compatible with ordering changes. The standard example is subsumption.

**Lemma 27.** *Let* $C, D$ *be two clauses. If there exists a substitution* $\sigma$ *such that* $C\sigma \subset D$*, then* $D$ *is redundant with respect to* $C$ *and any* $\prec_{\Gamma^*}$*.*

The notion of redundancy, Definition 1, only supports a strict subset relation for Lemma 27, similar to the superposition calculus. However, the newly generated clauses of SCL(EQ) are the result of paramodulation inferences [28]. In a recent contribution on dynamic, abstract redundancy [32] it is shown that the non-strict subset relation in Lemma 27, i.e., $C\sigma \subseteq D$, also preserves completeness.

If all stuck states (see Definition 28 below) with respect to a fixed $\beta$ are visited before increasing $\beta$, then this provides a simple dynamic fairness strategy.

When unit reduction or any other form of supported rewriting is applied to clauses smaller than the current $\beta$, it can be applied independently of the current trail. If, however, unit reduction is applied to clauses larger than the current $\beta$, then the calculus must restart from its initial state; in particular, the trail must be emptied, as otherwise rewriting may generate a conflict that did not exist with respect to the current trail before the rewriting. This is analogous to a restart in CDCL once a propositional unit clause is derived and used for simplification. More formally, we add the following new Restart rule to the calculus to reset the trail to its initial state after a unit reduction.

#### **Restart**

$(\Gamma; N; U; \beta; k; \top) \Rightarrow_{\mathrm{SCL(EQ)}} (\epsilon; N; U; \beta; 0; \top)$

Next we show refutational completeness of SCL(EQ). To achieve this we first give a definition of a stuck state. Then we show that stuck states occur only if all ground literals $L \prec_T \beta$ are β-*defined* in $\Gamma$, and not during conflict resolution. Finally we show that conflict resolution will always result in an application of Backtrack. This allows us to show termination (without application of Grow) and refutational completeness.

**Definition 28 (Stuck State).** *A state* $(\Gamma; N; U; \beta; k; D)$ *is called* stuck *if* $D \neq \bot$ *and none of the rules of the calculus, except for Grow, is applicable.*

**Lemma 29 (Form of Stuck States).** *If a regular run (without rule Grow) ends in a stuck state* $(\Gamma; N; U; \beta; k; D)$*, then* $D = \top$ *and all ground literals* $L\sigma \prec_T \beta$*, where* $L \vee C \in (N \cup U)$*, are* β-*defined in* $\Gamma$*.*

**Lemma 30.** *Suppose a sound state* $(\Gamma; N; U; \beta; k; D)$ *resulting from a regular run where* $D \notin \{\top, \bot\}$*. If Backtrack is not applicable, then any sequence of applications of Explore-Refutation, Skip, Factorize, and Equality-Resolution will finally result in a sound state* $(\Gamma'; N; U; \beta; k; D')$*, where* $D' \prec_{\Gamma^*} D$*. Then Backtrack will finally be applicable.*

**Corollary 31 (Satisfiable Clause Sets).** *Let* $N$ *be a satisfiable clause set. Then any regular run without rule Grow will end in a stuck state, for any* $\beta$*.*

Thus a stuck state can be seen as an indication of a satisfiable clause set. Of course, it remains to be investigated whether the clause set is actually satisfiable. Superposition is one of the strongest approaches to detect satisfiability and constitutes a decision procedure for many decidable first-order fragments [4,19]. Now given a stuck state and some specific ordering such as KBO, LPO, or some polynomial ordering [17], it is decidable whether the ordering can be instantiated from the stuck state such that $\Gamma$ coincides with the superposition model operator on the ground terms smaller than $\beta$. In this case it can be effectively checked whether the clauses derived so far are actually saturated by the superposition calculus with respect to this specific ordering. In this sense, SCL(EQ) has the same power to decide satisfiability of first-order clause sets as superposition.

**Definition 32.** *A regular run terminates in a state* $(\Gamma; N; U; \beta; k; D)$ *if* $D = \top$ *and no rule is applicable, or* $D = \bot$*.*

**Lemma 33.** *Let* $N$ *be a set of clauses and* $\beta$ *a ground term. Then any regular run that never uses Grow terminates.*

**Lemma 34.** *If a regular run reaches the state* $(\Gamma; N; U; \beta; k; \bot)$ *then* $N$ *is unsatisfiable.*

**Theorem 35 (Refutational Completeness).** *Let* $N$ *be an unsatisfiable clause set, and* $\prec_T$ *a desired term ordering. For any ground term* $\beta$ *where* $\mathrm{gnd}^{\prec_T \beta}(N)$ *is unsatisfiable, any regular SCL(EQ) run without rule Grow will terminate by deriving* $\bot$*.*

### **5 Discussion**

We presented SCL(EQ), a new sound and complete calculus for reasoning in first-order logic with equality. We will now discuss some of its aspects and present ideas for future work beyond the scope of this paper.

The trail induced ordering, Definition 9, is the result of letting the calculus follow the logical structure of the clause set on the literal level and at the same time supporting rewriting at the term level. It can already be seen by examples on ground clauses over (in)equations over constants that this combination requires a layered approach as suggested by Definition 9, see [24].

In case the calculus runs into a stuck state, i.e., the current trail is a model for the set of considered ground instances, the trail information can be effectively used for a guided continuation. For example, in order to use the trail to certify a model, the trail literals can be used to guide the design of a lifted ordering for the clauses with variables such that propagated trail literals are maximal in their respective clauses. Then it could be checked by superposition whether the current clause set is saturated by such an ordering. If this is not the case, then there must be a superposition inference larger than the current $\beta$, thus giving a hint on how to extend $\beta$. Another possibility is to try to extend the finite set of ground terms considered in a stuck state to the infinite set of all ground terms by building extended equivalence classes following patterns that ensure decidability of clause testing, similar to the ideas in [14]. If this fails, then again this information can be used to find an appropriate extension term for rule Grow.

In contrast to superposition, SCL(EQ) also performs inferences below the variable level. Inferences in SCL(EQ) are guided by a false clause with respect to a partial model assumption represented by the trail. Due to this guidance and the different style of reasoning, this does not result in an explosion in the number of possibly inferred clauses, but rather in the derivation of more general clauses, see [24].

Currently, the reasoning with solely positive equations is done on and with respect to the trail. It is well known that inferences from this type of reasoning can also be used to speed up the overall reasoning process. The SCL(EQ) calculus already provides all information for such a type of reasoning, because it computes the justification clauses for trail reasoning via rewriting inferences. By an assessment of the quality of these clauses, e.g., their reduction potential with respect to trail literals, they could also be added independently of resolving a conflict.

The trail reasoning is currently defined with respect to rewriting. It could also be performed by congruence closure [26].
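For illustration, ground congruence closure in the classic union-find style can be sketched as follows; this toy version is our own and not the data structures of [26], and it merges equivalence classes until congruence is saturated:

```python
def congruence_closure(equations, extra_terms=()):
    """Ground congruence closure over terms encoded as tuples (f, args...).

    Returns a `find` function mapping each known term to its class
    representative."""
    terms = set()
    def add(t):                      # collect all subterms
        if t not in terms:
            terms.add(t)
            for a in t[1:]:
                add(a)
    for l, r in equations:
        add(l); add(r)
    for t in extra_terms:
        add(t)

    parent = {t: t for t in terms}
    def find(t):                     # union-find with path halving
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t
    def union(s, t):
        rs, rt = find(s), find(t)
        if rs != rt:
            parent[rs] = rt

    for l, r in equations:
        union(l, r)
    # propagate congruence: f(s1..sn) ~ f(t1..tn) once all si ~ ti
    changed = True
    while changed:
        changed = False
        ts = list(terms)
        for i, s in enumerate(ts):
            for t in ts[i + 1:]:
                if find(s) != find(t) and s[0] == t[0] and len(s) == len(t) \
                   and all(find(a) == find(b) for a, b in zip(s[1:], t[1:])):
                    union(s, t)
                    changed = True
    return find
```

For instance, from $a \approx b$ the closure also identifies $f(a)$ with $f(b)$ and $g(f(a))$ with $g(f(b))$; efficient implementations replace the quadratic propagation loop by signature tables and pending queues.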

Towards an implementation, the aspect of how to find interesting ground decision or propagation literals for the trail can be treated similar to CDCL [11, 21,25,29]. A simple heuristic may be used from the start, like counting the number of instance relationships of some ground literal with respect to the clause set, but later on a bonus system can focus the search towards the structure of the clause sets. Ground literals involved in a conflict or the process of learning a new clause get a bonus or preference. The regular strategy requires the propagation of all ground unit clauses smaller than β. For an implementation a propagation of the (explicit and implicit) unit clauses with variables to the trail will be a better choice. This complicates the implementation of refutation proofs and rewriting (congruence closure), but because every reasoning is layered by a ground term β this can still be efficiently done.

**Acknowledgments.** This work was partly funded by DFG grant 389792660 as part of TRR 248, see https://perspicuous-computing.science. We thank the anonymous reviewers and Martin Desharnais for their thorough reading, detailed comments, and corrections.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Term Orderings for Non-reachability of (Conditional) Rewriting**

Akihisa Yamada(B)

National Institute of Advanced Industrial Science and Technology, Tokyo, Japan akihisa.yamada@aist.go.jp

**Abstract.** We propose generalizations of reduction pairs, well-established techniques for proving termination of term rewriting, in order to prove unsatisfiability of reachability (infeasibility) in plain and conditional term rewriting. We adapt the weighted path order, a merger of the Knuth–Bendix order and the lexicographic path order, into the proposed framework. The proposed approach is implemented in the termination prover NaTT, and the strength of our approach is demonstrated through examples and experiments.

#### **1 Introduction**

In the research area of term rewriting, among the most well-studied topics are termination, confluence, and reachability analyses.

In termination analysis, a crucial task used to be to design *reduction orders*, well-founded orderings over terms that are closed under contexts and substitutions. Well-known examples of such orderings include the *Knuth–Bendix ordering* [14], *polynomial interpretations* [18], *multiset/lexicographic path ordering* [4,13], and *matrix interpretations* [5]. The *dependency pair framework* generalized reduction orders into *reduction pairs* [2,9,12], and there are a number of implementations that automatically find reduction pairs, e.g., AProVE [7], TTT2 [16], MU-TERM [11], NaTT [35], competing in the International Termination Competition [8].
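To make the first of these orderings concrete, here is a sketch of the Knuth–Bendix ordering in the special case where every symbol and variable has weight one (the term encoding and this simplification are ours; the full KBO additionally handles a weight-zero unary symbol):

```python
from collections import Counter

# Terms: variables are strings; compound terms are tuples (f, args...).
def var_count(t):
    if isinstance(t, str):
        return Counter([t])
    c = Counter()
    for a in t[1:]:
        c += var_count(a)
    return c

def weight(t):
    # every symbol and every variable weighs one
    if isinstance(t, str):
        return 1
    return 1 + sum(weight(a) for a in t[1:])

def kbo_greater(s, t, prec):
    """s >_KBO t, with `prec` mapping symbols to numbers (bigger = greater)."""
    if isinstance(s, str):
        return False                          # a variable is never greater
    vs, vt = var_count(s), var_count(t)
    if any(vs[x] < n for x, n in vt.items()):
        return False                          # variable condition violated
    ws, wt = weight(s), weight(t)
    if ws > wt:
        return True
    if ws < wt or isinstance(t, str):
        return False
    if prec[s[0]] != prec[t[0]]:              # equal weight: precedence
        return prec[s[0]] > prec[t[0]]
    for a, b in zip(s[1:], t[1:]):            # same symbol: lexicographic
        if a != b:
            return kbo_greater(a, b, prec)
    return False
```

With precedence $f > g > a > b$, for example, $f(a) >_{\mathrm{KBO}} a$ by weight and $a >_{\mathrm{KBO}} b$ by precedence, while $f(x) \not>_{\mathrm{KBO}} f(y)$ because the variable condition fails.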

Traditional reachability analysis (cf. [6]) has been concerned with the possibility of rewriting a given source term s to a target t, where variables in the terms are treated as constants. There is an increasing need for solving a more general question: is it possible to instantiate variables so that the instance of s rewrites to the instance of t? Let us illustrate the problem with an elementary example.

*Example 1.* Consider the following TRS encoding addition of natural numbers:

$$\mathcal{R}_{\mathrm{add}} := \{\, \mathrm{add}(0, y) \to y,\ \ \mathrm{add}(\mathrm{s}(x), y) \to \mathrm{s}(\mathrm{add}(x, y)) \,\}$$

The reachability constraint $\mathrm{add}(\mathrm{s}(x), y) \twoheadrightarrow y$ represents the possibility of rewriting from $\mathrm{add}(\mathrm{s}(x), y)$ to $y$, where the variables $x$ and $y$ can be instantiated by arbitrary terms.
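To make the rewriting side of this example concrete, a minimal leftmost-outermost rewriting engine for $\mathcal{R}_{\mathrm{add}}$ can be sketched as follows (the term encoding and all function names are our own):

```python
# Terms: variables are strings; compound terms are tuples (f, args...).
ZERO = ("0",)
def S(t): return ("s", t)
def ADD(x, y): return ("add", x, y)

R_ADD = [
    (ADD(ZERO, "y"), "y"),
    (ADD(S("x"), "y"), S(ADD("x", "y"))),
]

def match(pattern, term, sub):
    """Extend `sub` so that pattern instantiated by sub equals term."""
    if isinstance(pattern, str):
        if pattern in sub:
            return sub if sub[pattern] == term else None
        return {**sub, pattern: term}
    if not isinstance(term, tuple) or pattern[0] != term[0] \
       or len(pattern) != len(term):
        return None
    for p, t in zip(pattern[1:], term[1:]):
        sub = match(p, t, sub)
        if sub is None:
            return None
    return sub

def apply(term, sub):
    if isinstance(term, str):
        return sub.get(term, term)
    return (term[0],) + tuple(apply(a, sub) for a in term[1:])

def rewrite_once(term, rules):
    """One leftmost-outermost rewrite step, or None on a normal form."""
    for l, r in rules:
        sub = match(l, term, {})
        if sub is not None:
            return apply(r, sub)
    if isinstance(term, tuple):
        for i, a in enumerate(term[1:], start=1):
            reduced = rewrite_once(a, rules)
            if reduced is not None:
                return term[:i] + (reduced,) + term[i + 1:]
    return None

def normalize(term, rules):
    while True:
        nxt = rewrite_once(term, rules)
        if nxt is None:
            return term
        term = nxt
```

Here `normalize(ADD(S(ZERO), S(ZERO)), R_ADD)` computes $1 + 1 = 2$, i.e. $\mathrm{s}(\mathrm{s}(0))$, while `ADD(S("x"), "y")` with variables treated as constants normalizes only to $\mathrm{s}(\mathrm{add}(x, y))$, never to $y$.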

This (un)satisfiability problem of reachability, also called (in)feasibility, plays an important role in termination [24] and confluence analyses of (conditional) rewriting [21]. A tool competition dedicated to this problem has been founded as the infeasibility (INF) category of the International Confluence Competition (CoCo) since 2019 [25].

In this paper, we propose a new method for proving unsatisfiability of reachability, using the term ordering techniques developed for termination analysis. Specifically, in Sect. 3, we first generalize reduction pairs to *rewrite pairs*, and show that they can be used for proving unsatisfiability of reachability. We further generalize the notion to *co-rewrite pairs*, yielding a sound and complete method. The power of the proposed method is demonstrated by importing (relaxed) *semantic* term orderings from termination analysis.

In order to import also *syntactic* term orderings, in Sect. 4 we identify a condition when the *weighted path order (WPO)* [36] forms a rewrite pair. Since KBO and LPO are instances of WPO, we see that these orderings can also be used in our method. In Sect. 5 we also present how to derive co-rewrite pairs from WPO.

In Sect. 6, we adapt the approach into conditional rewriting. Section 7 reports on the implementation and experiments conducted on examples in the paper and the benchmark set of CoCo 2021.

*Related Work.* Our rewrite pairs are essentially Aoto's *discrimination pairs* [1] which are closed under substitutions. In the course of disproving confluence, Aoto introduced discrimination pairs and used them for proving non-joinability. The *joinability* of terms $s$ and $t$ is expressed as $\exists u.\ s \to_{\mathcal{R}}^{*} u \leftarrow_{\mathcal{R}}^{*} t$, while the current paper is concerned with $\exists \theta.\ s\theta \to_{\mathcal{R}}^{*} t\theta$. As substitutions are not considered, discrimination pairs do not need closure under substitutions, and Aoto's insights are mainly for dealing with the reverse rewriting $\leftarrow_{\mathcal{R}}^{*}$.

Lucas and Gutiérrez [19] proposed reducing infeasibility to model finding in first-order logic. Our formulations, especially in Sect. 6, are similar to theirs. A crucial difference is that, while they encode the closure properties and order properties into logical formulas and delegate these tasks to the background theory solvers, we ensure these properties by means of reduction pairs, for which well-established techniques exist in the literature.

Sternagel and Yamada [30] proposed a framework for analyzing reachability by combining basic logical manipulations, and Gutiérrez and Lucas [10] proposed another framework, similar to the dependency pair framework. The present work focuses on atomic analysis techniques, and is orthogonal to these efforts of combining techniques.

#### **2 Preliminaries**

We assume familiarity with term rewriting, cf. [3] or [32]. For a binary relation denoted by a symbol like $\succsim$, we denote its dual relation by $\precsim$ and the negated relation by $\not\succsim$. Relation composition is denoted by $\circ$.

Throughout the paper we fix a set $\mathcal{V}$ of *variable symbols*. A *signature* is a set $\mathcal{F}$ of function symbols, where each $f \in \mathcal{F}$ is associated with its *arity*, the number of arguments. The set of *terms* built from $\mathcal{F}$ and $\mathcal{V}$ is denoted by $\mathcal{T}(\mathcal{F}, \mathcal{V})$, where a term is either in $\mathcal{V}$ or of the form $f(s_1, \ldots, s_n)$ where $f \in \mathcal{F}$ is $n$-ary and $s_1, \ldots, s_n \in \mathcal{T}(\mathcal{F}, \mathcal{V})$. Given a term $s \in \mathcal{T}(\mathcal{F}, \mathcal{V})$ and a *substitution* $\theta : \mathcal{V} \to \mathcal{T}(\mathcal{F}, \mathcal{V})$, $s\theta$ denotes the term obtained from $s$ by replacing every variable $x$ by $\theta(x)$. A *context* is a term $C \in \mathcal{T}(\mathcal{F}, \mathcal{V} \cup \{\Box\})$ in which the special variable $\Box$ occurs exactly once. Given $s \in \mathcal{T}(\mathcal{F}, \mathcal{V})$, we denote by $C[s]$ the term obtained by replacing $\Box$ by $s$ in $C$.

A relation $\sqsupset$ over terms is *closed under substitutions (resp. contexts)* iff $s \sqsupset t$ implies $s\theta \sqsupset t\theta$ for any substitution $\theta$ (resp. $C[s] \sqsupset C[t]$ for any context $C$). Relations over terms that are closed under contexts and substitutions are called *rewrite relations*. Rewrite relations which are also preorders are called *rewrite preorders*, and those which are strict orders are *rewrite orders*. Well-founded rewrite orders are called *reduction orders*.

A *term rewrite system (TRS)* R is a (usually finite) relation over terms, where each ⟨l, r⟩ ∈ R is called a *rewrite rule* and written l → r. We do not impose the usual requirements that l ∉ V and that every variable occurring in r also occurs in l. The *rewrite step* →_R induced by a TRS R is the least rewrite relation containing R. Its reflexive transitive closure is denoted by →*_R, which is the least rewrite preorder containing R.

A *reachability atom* is a pair of terms s and t, written s ↠ t. We say that s ↠ t is R-*satisfiable* iff sθ →*_R tθ for some θ, and R-*unsatisfiable* otherwise.

# **3 Term Orderings for Non-reachability**

*Reduction pairs* constitute the core ingredient in proving termination with dependency pairs. Just as rewrite orders generalize reduction orders, we first introduce the notion of "rewrite pairs" by removing the well-foundedness assumption of reduction pairs.

**Definition 1 (rewrite pair).** *We call a pair* ⟨⊒, ⊐⟩ *of relations an* order pair *if* ⊒ *is a preorder,* ⊐ *is irreflexive,* ⊐ ⊆ ⊒*, and* ⊒ ◦ ⊐ ◦ ⊒ ⊆ ⊐*. A* rewrite pair *is an order pair* ⟨⊒, ⊐⟩ *over terms such that both* ⊒ *and* ⊐ *are closed under substitutions and* ⊒ *is closed under contexts. It is called a* reduction pair *if moreover* ⊐ *is well-founded.*

Standard definitions of reduction pairs impose fewer order-like assumptions than the above definition, but the above (more natural) assumptions do not lose the generality of the previous definitions [34]. Due to these assumptions, our rewrite pairs satisfy the assumptions of discrimination pairs [1].

The following statement is our first observation: a rewrite pair can prove non-reachability.

**Theorem 1.** *If* ⟨⊒, ⊐⟩ *is a rewrite pair,* R ⊆ ⊒*, and* s ⊏ t*, then* s ↠ t *is* R*-unsatisfiable.*

A similar observation has been made [20, Theorem 11], where well-foundedness is assumed instead of irreflexivity. Note that irreflexivity is essential: if s ⊏ s for some s, then we have s ⊏ s but s ↠ s is R-satisfiable.

The proof of Theorem 1 is postponed until the more general Theorem 2 is obtained. Instead, we start by utilizing Theorem 1 through a generalization of a classical way of defining reduction pairs: the semantic approach [23].

**Definition 2 (model).** *An* F*-algebra* A = ⟨A, [·]⟩ *specifies a set* A*, called the* carrier*, and an* interpretation [f] : A^n → A *for each* n*-ary* f ∈ F*. The evaluation of a term* s *under an assignment* α : V → A *is defined as usual and denoted by* [s]α*.*

*A* related/preordered F-algebra ⟨A, ⊒⟩ = ⟨A, [·], ⊒⟩ *consists of an* F*-algebra and a relation/preorder* ⊒ *on* A*. Given* α : V → A*, we write* [s ⊒ t]α *to mean* [s]α ⊒ [t]α*. We write* A ⊨ s ⊒ t *if* [s ⊒ t]α *holds for every* α : V → A*. We say* ⟨A, ⊒⟩ *is a* (relational) model *of a TRS* R *if* A ⊨ l ⊒ r *for every* l → r ∈ R*. We say* ⟨A, ⊒⟩ *is* monotone *if* ai ⊒ ai′ *implies* [f](a1,...,ai,...,an) ⊒ [f](a1,...,ai′,...,an) *for arbitrary* a1,...,an, ai′ ∈ A *and* n*-ary* f ∈ F*.*

The notion of relational models is due to van Oostrom [28]; in this paper, we simply call them models. Models in the sense of equational theories are models ⟨A, =⟩ in the above definition, where monotonicity is inherent. *Quasi-models* in the sense of Zantema [37] are preordered (or partially ordered) monotone models. Theorem 1 can be reformulated in the semantic manner as follows:

**Corollary 1.** *If* ⟨≥, >⟩ *is an order pair,* ⟨A, ≥⟩ *is a monotone model of* R*, and* A ⊨ s < t*, then* s ↠ t *is* R*-unsatisfiable.*

Note that Corollary 1 does not demand well-foundedness of >. In particular, one can employ models over negative numbers (or, equivalently, positive numbers with the order pair ⟨≤, <⟩).

*Example 2.* Consider again the TRS R_add of Example 1. The monotone ordered F-algebra ⟨Z≤0, [·], ≥⟩ defined by

$$[\mathsf{add}](x,y) = x+y \qquad\qquad\qquad[\mathsf{s}](x) = x-1 \qquad\qquad[\mathsf{0}] = \mathsf{0}$$

is a model of R_add: whenever x, y ∈ Z≤0, we have

$$[\mathsf{add}]([0],y) = y \qquad \qquad [\mathsf{add}]([\mathsf{s}](x),y) = x+y-1 = [\mathsf{s}]([\mathsf{add}](x,y))$$

Now we can conclude that the reachability atom add(s(*x*), *y*) ↠ *y* is R_add-unsatisfiable via ⟨Z≤0, [·]⟩ ⊨ add(s(*x*), *y*) < *y*: whenever x, y ∈ Z≤0, we have

$$[\mathsf{add}]([\mathsf{s}](x),y) = x+y-1 < y$$
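The calculations of Example 2 can be replayed numerically. The following is only a sketch: it samples finitely many values of the carrier Z≤0, whereas the paper's argument is symbolic; the function names `add`, `s`, and `ZERO` are illustrative encodings of the interpretations.

```python
# A numeric sanity check of Example 2 (a finite sample, not a proof).

def add(x, y):   # [add](x, y) = x + y
    return x + y

def s(x):        # [s](x) = x - 1
    return x - 1

ZERO = 0         # [0] = 0

samples = [(x, y) for x in range(-5, 1) for y in range(-5, 1)]

# model conditions: every left-hand side evaluates >= the right-hand side
assert all(add(ZERO, y) >= y for _, y in samples)
assert all(add(s(x), y) >= s(add(x, y)) for x, y in samples)

# the non-reachability witness: [add(s(x), y)] < [y] on the whole carrier
assert all(add(s(x), y) < y for x, y in samples)
```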

Observe that in Theorem 1, ⊐ occurs only in its dual form ⊏. Hence we now directly analyze the conditions that ⊒ and ⊏ should satisfy in order to prove non-reachability, and this gives a sound and complete method.

**Definition 3 (co-rewrite pair).** *We call a pair* ⟨⊒, ⊏⟩ *of relations over terms a* co-rewrite pair *if* ⊒ *is a rewrite preorder,* ⊏ *is closed under substitutions, and* ⊒ ∩ ⊏ = ∅*.*

**Theorem 2.** s ↠ t *is* R*-unsatisfiable if and only if there exists a co-rewrite pair* ⟨⊒, ⊏⟩ *such that* R ⊆ ⊒ *and* s ⊏ t*.*

*Proof.* For the "if" direction, suppose to the contrary that sθ →*_R tθ for some θ. Since ⊒ is a rewrite preorder containing R and →*_R is the least such, we must have sθ ⊒ tθ. On the other hand, since s ⊏ t and ⊏ is closed under substitutions, we have sθ ⊏ tθ. This is not possible since ⊒ ∩ ⊏ = ∅.

For the "only if" direction, take →*_R as ⊒ and define ⊏ by s ⊏ t iff s ↠ t is R-unsatisfiable. Then clearly ⊏ is closed under substitutions, →*_R ∩ ⊏ = ∅, and R ⊆ →*_R. □

Theorem 2 can be reformulated more concisely in the model-oriented manner, as the greatest choice of ⊏ can be specified: s ⊏ t iff A ⊨ s ≱ t.

**Corollary 2.** s ↠ t *is* R*-unsatisfiable if and only if there exists a monotone preordered model* ⟨A, ≥⟩ *of* R *such that* A ⊨ s ≱ t*.*

Corollary 2 is useful when models over non-totally ordered carriers are considered. There are important methods (for termination) that crucially rely on such carriers: the *matrix interpretations* [5], or more generally the *tuple interpretations* [15,34].

*Example 3.* Consider the following TRS, where the first rule is from [5]:

$$\mathcal{R}\_{mat} = \{ \text{ f}(\mathbf{f}(x)) \to \mathbf{f}(\mathbf{g}(\mathbf{f}(x))), \ \mathbf{f}(x) \to x \}$$

The preordered {f, g}-algebra ⟨N², [·], ≥⟩ defined by

$$[\mathbf{f}]\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x+y+1 \\ y+1 \end{pmatrix} \qquad\qquad\qquad [\mathbf{g}]\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x+1 \\ 0 \end{pmatrix}$$

is a model of <sup>R</sup>mat, where <sup>≥</sup> is extended pointwise over <sup>N</sup><sup>2</sup>. Indeed, the first rule is oriented as the following calculation demonstrates:

$$[\mathbf{f}]\left([\mathbf{f}]\begin{pmatrix} x \\ y \end{pmatrix}\right) = \begin{pmatrix} x+2y+3 \\ y+2 \end{pmatrix} \ge \begin{pmatrix} x+y+3 \\ 1 \end{pmatrix} = [\mathbf{f}]\left([\mathbf{g}]\left([\mathbf{f}]\begin{pmatrix} x \\ y \end{pmatrix}\right)\right)$$

and the second rule is easily checked. Now we prove that *x* ↠ g(*x*) is R_mat-unsatisfiable by Corollary 2. Indeed, ⟨N², [·]⟩ ⊨ *x* ≱ g(*x*) is shown as follows:

$$
\begin{pmatrix} x \\ y \end{pmatrix} \not\geq \begin{pmatrix} x+1 \\ 0 \end{pmatrix} = [\mathbf{g}] \begin{pmatrix} x \\ y \end{pmatrix}
$$

for any x, y ∈ N. Note also that Theorem 1 is not applicable, since ⟨N², [·]⟩ ⊭ *x* < g(*x*) due to the second coordinate.
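As with Example 2, the tuple interpretations of Example 3 can be sanity-checked numerically; the following sketch compares sampled vectors pointwise, with `f`, `g`, and `geq` as illustrative encodings.

```python
# A numeric sanity check of Example 3: interpretations over N^2,
# compared pointwise (again a finite sample, not a proof).

def f(v):
    x, y = v
    return (x + y + 1, y + 1)

def g(v):
    x, y = v
    return (x + 1, 0)

def geq(u, v):   # pointwise >= on N^2
    return u[0] >= v[0] and u[1] >= v[1]

samples = [(x, y) for x in range(6) for y in range(6)]

# model conditions for R_mat
assert all(geq(f(f(v)), f(g(f(v)))) for v in samples)   # f(f(x)) -> f(g(f(x)))
assert all(geq(f(v), v) for v in samples)               # f(x) -> x

# x ->> g(x) is refuted: v >= g(v) fails, as the first coordinate grows
assert all(not geq(v, g(v)) for v in samples)
```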

We conclude the section by proving Theorem 1 via Theorem 2.

*Proof (of Theorem 1).* We show that ⟨⊒, ⊏⟩ forms a co-rewrite pair whenever ⟨⊒, ⊐⟩ is a rewrite pair. It suffices to show that ⊒ ∩ ⊏ = ∅. To this end, suppose to the contrary that s ⊒ t ⊐ s. By compatibility, we have s ⊐ s, which contradicts the irreflexivity of ⊐. □

### **4 Weighted Path Order for Non-reachability**

The previous section was concerned with the semantic approach towards obtaining (co-)rewrite pairs. In this section we focus on the syntactic approach. We choose the weighted path order (WPO), which subsumes both the lexicographic path order (LPO) and the Knuth–Bendix order (KBO), so the results of this section also apply to these better-known methods. The *multiset path order* [4] can also be subsumed [29], but we omit this extension to keep the presentation simple. WPO is induced by three ingredients: an F-algebra, a *precedence* ⟨≿, ≻⟩ over function symbols, and a *(partial) status* π, which controls the recursive behavior of the ordering.

**Definition 4 (partial status).** *A* partial status π *assigns to each* n*-ary* f ∈ F *a list* π(f) ∈ {1,...,n}*, *also viewed as a set, of its argument positions. We say* π *is* total *if* 1,...,n ∈ π(f) *whenever* f *is* n*-ary. When* π(f) = [i1,...,im]*, we denote the list* [s_{i1},...,s_{im}] *by* π_f(s1,...,sn)*.*

For instance, the *empty* status π(f) = [ ] allows WPO to subsume weakly monotone interpretations [36, Section 4.1]. We allow positions to be duplicated, following [33].

**Definition 5 (WPO** [36]**).** *Let* π *be a partial status,* A *an* F*-algebra, and* ⟨≥, >⟩ *and* ⟨≿, ≻⟩ *pairs of relations on* A *and* F*, respectively. The* weighted path order WPO(π, A, ≥, >, ≿, ≻)*, or* WPO(A) *or even* WPO *for short, is the pair* ⟨≿_WPO, ≻_WPO⟩ *of relations over terms defined as follows:* s ≻_WPO t *iff*

*1.* A ⊨ s > t*, or*
*2.* A ⊨ s ≥ t *and*
  *(a)* s = f(s1,...,sn) *and* si ≿_WPO t *for some* i ∈ π(f)*; or*
  *(b)* s = f(s1,...,sn)*,* t = g(t1,...,tm)*,* s ≻_WPO tj *for every* j ∈ π(g)*, and*
    *i.* f ≻ g*, or*
    *ii.* f ≿ g *and* π_f(s1,...,sn) ≻^lex_WPO π_g(t1,...,tm)*.*

*The relation* ≿_WPO *is defined similarly, but with* ≿^lex_WPO *instead of* ≻^lex_WPO *in (2b-ii) and with the following subcase added in case 2:*

*(c)* s <sup>=</sup> t ∈ V*.*

*Here* ⟨≿^lex_P, ≻^lex_P⟩ *denotes the* lexicographic extension *of a pair* P = ⟨≿_P, ≻_P⟩ *of relations, defined by:* [s1,...,sn] ≻^lex_P (*resp.* ≿^lex_P) [t1,...,tm] *iff*

*–* m = 0 *and* n > 0 (*resp.* n ≥ 0)*, or*
*–* m, n > 0 *and either* s1 ≻_P t1*, or both* s1 ≿_P t1 *and* [s2,...,sn] ≻^lex_P (*resp.* ≿^lex_P) [t2,...,tm]*.*
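The lexicographic extension can be sketched as a small higher-order predicate. The names `lex`, `weak`, and `strict` below are illustrative assumptions; the flag `want_strict` selects the strict or the weak extension.

```python
# Sketch of the lexicographic extension of a pair of relations.
# `weak`/`strict` are Python predicates standing for the weak and
# strict components of the pair P.

def lex(weak, strict, ss, ts, want_strict=True):
    if not ts:
        # [s1..sn] vs []: strictly above iff n > 0; weakly above always
        return len(ss) > 0 if want_strict else True
    if not ss:
        return False
    if strict(ss[0], ts[0]):
        return True
    if weak(ss[0], ts[0]):
        return lex(weak, strict, ss[1:], ts[1:], want_strict)
    return False

ge, gt = (lambda a, b: a >= b), (lambda a, b: a > b)
assert lex(ge, gt, [1, 2], [1, 1])                      # decided at position 2
assert not lex(ge, gt, [1, 2], [1, 2])                  # not strictly above itself
assert lex(ge, gt, [1, 2], [1, 2], want_strict=False)   # but weakly above
assert lex(ge, gt, [1], [])                             # longer list wins
```

Note that, as in Definition 5, a nonempty list is strictly above the empty list, which reflects the fact that statuses may have different lengths.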

LPO is the WPO induced by a total status π and a trivial F-algebra as A, and is written ⟨≿_LPO, ≻_LPO⟩. Allowing partial statuses corresponds to applying *argument filters* [2,17] (except for collapsing ones). KBO is the special case of WPO where π is total and A is induced by an admissible weight function.

For termination analysis, preconditions for WPO to form a reduction pair are crucial. In this work, we only need it to be a rewrite pair; that is, well-foundedness is not necessary. Thus, for instance, it is possible to have x ≻_WPO f(x) by taking [f](x) = x − 1. This explains why s ∈ V is permitted in case 1, which might look useless to readers who are already familiar with termination analysis.

We formulate the main claim of this section as follows.

**Definition 6 (**π**-simplicity).** *We say a related* F*-algebra* ⟨A, [·], ≥⟩ *is* π-simple<sup>1</sup> *for a partial status* π *iff* [f](a1,...,an) ≥ ai *for arbitrary* n*-ary* f ∈ F*,* a1,...,an ∈ A*, and* i ∈ π(f)*.*

**Proposition 1.** *If* ⟨≥, >⟩ *and* ⟨≿, ≻⟩ *are order pairs on* A *and* F*, respectively, and* ⟨A, ≥⟩ *is monotone and* π*-simple, then* ⟨≿_WPO, ≻_WPO⟩ *is a rewrite pair.*

Under these conditions, it is known that ≿_WPO is closed under contexts and that ≿_WPO is compatible with ≻_WPO [36, Lemmas 7, 10, 13]. Later in this section we prove the other properties necessary for Proposition 1, for which the claims in [36] must be generalized for the purposes of this paper.

The benefit of having syntax-aware methods can be easily observed by recalling why we have them in termination analysis.

*Example 4* ([13]). Consider the TRS R<sup>A</sup> consisting of the following rules:

$$\mathsf{A}(\mathsf{0},y) \to \mathsf{s}(y) \quad \mathsf{A}(\mathsf{s}(x),\mathsf{0}) \to \mathsf{A}(x,\mathsf{s}(\mathsf{0})) \quad \mathsf{A}(\mathsf{s}(x),\mathsf{s}(y)) \to \mathsf{A}(x,\mathsf{A}(\mathsf{s}(x),y))$$

and suppose that a monotone {A, s, 0}-algebra ⟨N, [·], ≥⟩ is a model of R_A. Then, denoting the Ackermann function by A, we have

$$[\mathbb{A}]([\mathbf{s}]^m(\mathbf{0}), [\mathbf{s}]^n(\mathbf{0})) \ge [\mathbf{s}]^{A(m,n)}(\mathbf{0})\tag{1}$$

Now consider proving the obvious fact that *x* ↠ s(*x*) is R_A-unsatisfiable. This requires ⟨N, [·]⟩ ⊨ *x* < s(*x*), and then [s]^n(0) ≥ n by an inductive argument. This is not possible if [A] is primitive recursive (e.g., a polynomial), since (1) together with [s]^{A(m,n)}(0) ≥ A(m, n) contradicts the well-known fact that the Ackermann function has no primitive-recursive bound.

On the other hand, LPO with A ≻ s satisfies R_A ⊆ ≻_LPO (⊆ ≿_LPO) and *x* ≺_LPO s(*x*). Thus Theorem 1 with ⟨⊒, ⊐⟩ = ⟨≿_LPO, ≻_LPO⟩ proves that *x* ↠ s(*x*) is R_A-unsatisfiable, thanks to Proposition 1.
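The LPO argument of Example 4 can be mechanized. The following is a minimal sketch, not the paper's implementation: the term encoding with tuples and strings and the names `PREC`, `lpo_gt`, and `lpo_ge` are illustrative assumptions, with precedence A ≻ s ≻ 0.

```python
# A minimal LPO check for Example 4 (a sketch under the assumptions above).

PREC = {"A": 2, "s": 1, "0": 0}

def is_var(t):
    # variables are strings that are not declared function symbols
    return isinstance(t, str) and t not in PREC

def lpo_ge(s, t):  # s is equal to or LPO-greater than t
    return s == t or lpo_gt(s, t)

def lpo_gt(s, t):  # s is strictly LPO-greater than t
    if is_var(s):
        return False                                  # variables are minimal
    f, *ss = s
    if is_var(t):
        return any(lpo_ge(si, t) for si in ss)        # t occurs below s
    g, *ts = t
    if any(lpo_ge(si, t) for si in ss):               # subterm case (2a)
        return True
    if all(lpo_gt(s, tj) for tj in ts):               # case (2b)
        if PREC[f] > PREC[g]:                         # (2b-i)
            return True
        if f == g:                                    # (2b-ii): lexicographic
            for si, ti in zip(ss, ts):
                if si != ti:
                    return lpo_gt(si, ti)
    return False

x, y = "x", "y"
R_A = [
    (("A", "0", y), ("s", y)),
    (("A", ("s", x), "0"), ("A", x, ("s", "0"))),
    (("A", ("s", x), ("s", y)), ("A", x, ("A", ("s", x), y))),
]
assert all(lpo_gt(l, r) for l, r in R_A)   # R_A is oriented by LPO
assert lpo_gt(("s", x), x)                 # and s(x) is LPO-greater than x
```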

<sup>1</sup> Such a property would be called *inflationary* in the mathematics literature. In term rewriting, the word *simple* has been used (see, e.g., [32]) in accordance with *simplification orders*.

*Example 5.* Consider the TRS consisting of the following rules:

$$\mathcal{R}\_{kbo} := \{ \mathbf{f}(\mathbf{g}(x)) \to \mathbf{g}(\mathbf{f}(\mathbf{f}(x))), \ \mathbf{g}(x) \to x \}$$

The WPO (or KBO) induced by A = ⟨N, [·]⟩ and a precedence ⟨≿, ≻⟩ such that

$$[\mathbf{f}](x) = x \qquad\qquad\qquad [\mathbf{g}](x) = x+1 \qquad\qquad\qquad \mathbf{f} \succ \mathbf{g}$$

satisfies R_kbo ⊆ ≻_WPO. Thus, for instance, g(*x*) ↠ g(f(*x*)) is R_kbo-unsatisfiable by Theorem 1. On the other hand, let ⟨A, [·], ≥⟩ with A ⊆ Z be a model of R_kbo. Using the idea of [38, Proposition 11], one can show [f](x) ≤ x. Hence Corollary 2 with models over a subset of the integers cannot handle the problem. LPO orients the first rule from right to left and hence cannot handle the problem either.

The power of WPO can also be easily verified, by considering

$$\mathcal{R}\_{wpo} := \mathcal{R}\_{kbo} \cup \{ \text{ f}(\mathsf{h}(x)) \to \mathsf{h}(\mathsf{h}(\mathsf{f}(x))), \text{ f}(x) \to x \}$$

By extending the above WPO with [h](*x*) = *x* and f ≻ h, which no longer falls into the class of KBO,<sup>2</sup> we can prove, e.g., that f(*x*) ↠ f(h(*x*)) is R_wpo-unsatisfiable. None of the above-mentioned methods can handle this problem.

The rest of this section is dedicated to proving Proposition 1. Similar results appear in [36], but they make implicit assumptions, such as ≥ and ≿ being preorders. In this paper we need to identify more essential assumptions, as we will consider non-transitive relations in the next section.

First we reprove the reflexivity of ≿_WPO. The proof also serves as a basis for the more complicated irreflexivity proof.

**Lemma 1.** *If both* ≥ *and* ≿ *are reflexive and* ⟨A, ≥⟩ *is* π*-simple, then*

*1.* i ∈ π(f) *implies* f(s1,...,sn) ≻_WPO si*, and*
*2.* s ≿_WPO s*, i.e.,* ≿_WPO *is reflexive.*

*Proof.* As s ≿_WPO s is trivial when s ∈ V, we assume s = f(s1,...,sn) and prove the two claims by induction on the structure of s. For the first claim, by π-simplicity, for any α we have [s]α = [f]([s1]α,...,[sn]α) ≥ [si]α, and hence A ⊨ s ≥ si. By the second claim of the induction hypothesis we have si ≿_WPO si, and thus s ≻_WPO si follows by (2a) of Definition 5. Next we show that s ≿_WPO s holds by (2b-ii). Indeed, A ⊨ s ≥ s follows from the reflexivity of ≥; s ≻_WPO si for every i ∈ π(f), as shown above; f ≿ f, as ≿ is reflexive; and finally, π_f(s1,...,sn) ≿^lex_WPO π_f(s1,...,sn) is due to the induction hypothesis and the fact that lexicographic extension preserves reflexivity. □

Using reflexivity, we can show that both ≿_WPO and ≻_WPO are closed under substitutions. This result will be reused in Sect. 5, where it will be essential that neither ≥ nor ≿ need be transitive.

<sup>2</sup> When [h] is the identity. KBO requires h ≿ *f* for any *f*.

**Lemma 2.** *If both* ≥ *and* ≿ *are reflexive and* ⟨A, ≥⟩ *is* π*-simple, then both* ≿_WPO *and* ≻_WPO *are closed under substitutions.*

*Proof.* We prove by induction on s and t that s ≿_WPO t implies sθ ≿_WPO tθ and that s ≻_WPO t implies sθ ≻_WPO tθ. We prove the first claim by case analysis on how s ≿_WPO t is derived. The other claim is analogous, without case (2c) below.

	- (a) s = f(s1,...,sn) and si ≿_WPO t for some i ∈ π(f): In this case, we know siθ ≿_WPO tθ by the induction hypothesis on s. Thus (2a) concludes sθ ≿_WPO tθ.
	- (b) s = f(s1,...,sn), t = g(t1,...,tm), and s ≻_WPO tj for every j ∈ π(g): By the induction hypothesis on t, we have sθ ≻_WPO tjθ. So the precondition of (2b) for sθ ≿_WPO tθ is satisfied. There are the following subcases:
		- i. f ≻ g: Then (2b-i) concludes.
		- ii. f ≿ g and π_f(s1,...,sn) ≿^lex_WPO π_g(t1,...,tm): Then by the induction hypothesis we have π_f(s1θ,...,snθ) ≿^lex_WPO π_g(t1θ,...,tmθ), and thus (2b-ii) concludes.
	- (c) s = t ∈ V: Then we have sθ ≿_WPO tθ by Lemma 1. □

The irreflexivity of ≻_WPO is less obvious to obtain. In fact, [36] uses well-foundedness to establish it. Here we identify more essential conditions.

**Lemma 3.** *If* ⟨≥, >⟩ *is an order pair on* A*,* ≻ *is irreflexive on* F*, and* ⟨A, ≥⟩ *is* π*-simple, then* ≻_WPO *is irreflexive.*

*Proof.* We show that s ≻_WPO s does not hold for any s, by induction on the structure of s. This is clear if s ∈ V, so consider s = f(s1,...,sn). Since > is irreflexive, we have A ⊭ s > s, and thus s ≻_WPO s cannot be due to case 1 of Definition 5. As ≻ is irreflexive on F, f ≻ f does not hold, and thus (2b-i) is not possible either. Thanks to the induction hypothesis and the fact that lexicographic extension preserves irreflexivity, π_f(s1,...,sn) ≻^lex_WPO π_f(s1,...,sn) does not hold, and thus (2b-ii) is not possible either.

The remaining case (2a) is more involved. To show that si ≿_WPO f(s1,...,sn) does not hold for any i ∈ π(f), we prove the following more general claim: s′ ◁⁺_π s implies that s′ ≿_WPO s does not hold, where ◁_π denotes the least relation such that si ◁_π f(s1,...,sn) if i ∈ π(f). This claim is proved by induction on s′. Due to the simplicity assumption, we have A ⊨ s ≥ s′ for every s′ ◁_π s, and this generalizes to every s′ ◁⁺_π s by an easy induction and the transitivity of ≥. Thus we cannot have A ⊨ s′ > s, since A ⊨ s ≥ s′ > s contradicts the assumption that ⟨≥, >⟩ is an order pair. This tells us that s′ ≿_WPO s cannot be due to case 1. Case (2a) is not applicable thanks to the (inner) induction hypothesis on s′. Case (2b) is not possible either, since s′ ≻_WPO s′ does not hold thanks to the (outer) induction hypothesis on s. This concludes that s′ ≿_WPO s does not hold for any s′ ◁⁺_π s, and in particular si ≿_WPO s does not hold for any i ∈ π(f), refuting the last possibility for s ≻_WPO s to hold. □

## **5 Co-WPO**

The preceding section demonstrated how to use WPO as a rewrite pair in Theorem 1. In this section we show how to use WPO in combination with Theorem 2, that is, when ⊒ = ≿_WPO, what ⊏ should be. We show that ⊏_WPO, where ⟨⊑_WPO, ⊏_WPO⟩ := WPO(π, A, ≮, <, ⊀, ≺), serves the purpose.

**Proposition 2.** *If* ⟨≥, >⟩ *and* ⟨≿, ≻⟩ *are order pairs on* A *and* F*, and* ⟨A, ≥⟩ *is* π*-simple and monotone, then* ⟨≿_WPO, ⊏_WPO⟩ *is a co-rewrite pair.*

When ⟨A, ≥⟩ is not total, Example 3 also demonstrates (by taking π(f) = [ ] for every f) that using Proposition 2 with Theorem 2 is more powerful than using Proposition 1 in combination with Theorem 1. At the time of writing, however, it is unclear to the author whether the difference persists when ⟨A, ≥⟩ is totally ordered but ⟨F, ≿⟩ is not. Nevertheless, we will clearly see the merit of Proposition 2 in the setting of conditional rewriting in the next section.

The remainder of this section proves Proposition 2. Unfortunately, ⊏_WPO does not satisfy many important properties of WPO, mostly due to the fact that ⟨≮, <⟩ is not even an order pair. Nevertheless, Lemma 2 is applicable to ⊏_WPO and gives the following fact:

**Lemma 4.** *If* ⟨≥, >⟩ *is an order pair on* A*,* ⟨A, ≥⟩ *is* π*-simple, and* ≺ *is irreflexive, then* ⊏_WPO *is closed under substitutions.*

*Proof.* We apply Lemma 2 to ⊏_WPO. To this end, we need to prove the following: ≮ and ⊀ are reflexive, which follows from the irreflexivity of > and ≺, respectively; and ⟨A, ≮⟩ is π-simple, which follows from the π-simplicity of ⟨A, ≥⟩ together with the compatibility of the order pair ⟨≥, >⟩. □
The remaining task is to show that ≿_WPO ∩ ⊏_WPO = ∅. Due to the mutually inductive definition of WPO, we need to simultaneously prove the property for the other combination: ≻_WPO ∩ ⊑_WPO = ∅.

**Definition 7.** *We say that two pairs* P = ⟨≿_P, ≻_P⟩ *and* Q = ⟨≿_Q, ≻_Q⟩ *of relations are* co-compatible *iff* ≿_P ∩ ≺_Q = ≻_P ∩ ≾_Q = ∅*.*

The next claim is a justification for the word "compatible" in Definition 7. Here the compatibility assumption of order pairs is crucial.

**Proposition 3.** *An order pair* ⟨≿, ≻⟩ *is co-compatible with itself.*

*Proof.* Suppose to the contrary that a ≿ b and b ≻ a. Then we have a ≻ a by compatibility, contradicting the irreflexivity of ≻. □

**Lemma 5.** *If* P = ⟨≿_P, ≻_P⟩ *and* Q = ⟨≿_Q, ≻_Q⟩ *are co-compatible pairs of relations, then* ⟨≿^lex_P, ≻^lex_P⟩ *and* ⟨≿^lex_Q, ≻^lex_Q⟩ *are co-compatible.*

*Proof.* Let us assume that both

$$[s_1, \ldots, s_n] \succsim_P^{\mathrm{lex}} [t_1, \ldots, t_m] \tag{2}$$

$$[s_1, \ldots, s_n] \prec_Q^{\mathrm{lex}} [t_1, \ldots, t_m] \tag{3}$$

hold and derive a contradiction; the other part, ≻^lex_P ∩ ≾^lex_Q = ∅, is analogous. We proceed by induction on the length of [s1,...,sn]. If n = 0, then (2) demands m = 0 but (3) demands m > 0. Hence we have n > 0, and then (3) demands m > 0. If s1 ≻_P t1, then by the assumption we have s1 ⋠_Q t1, but (3) demands s1 ≺_Q t1 (or s1 ≾_Q t1). Hence (2) is due to s1 ≿_P t1 and [s2,...,sn] ≿^lex_P [t2,...,tm]. By the assumption we have s1 ⊀_Q t1, so (3) is due to s1 ≾_Q t1 and [s2,...,sn] ≺^lex_Q [t2,...,tm]. We derive a contradiction by the induction hypothesis. □

We arrive at the main lemma for the co-WPO.

**Lemma 6.** *If* ⟨≥, >⟩ *and* ⟨≿, ≻⟩ *are order pairs on* A *and* F*, and* ⟨A, ≥⟩ *is* π*-simple, then* ⟨≿_WPO, ≻_WPO⟩ *and* ⟨⊑_WPO, ⊏_WPO⟩ *are co-compatible.*

*Proof.* We show that neither s ≻_WPO t ∧ s ⊑_WPO t nor s ≿_WPO t ∧ s ⊏_WPO t holds for any s and t, by induction on the structure of s and then that of t. Let us assume s ≿_WPO t and prove that s ⊏_WPO t cannot hold; the other claim is analogous. We proceed by case analysis on the derivation of s ≿_WPO t.

	- (a) s = f(s1,...,sn) and si ≿_WPO t for some i ∈ π(f): By the induction hypothesis on s, si ⊏_WPO t cannot hold, and thus s ⊏_WPO t can only be due to (2a). So t = g(t1,...,tm) and s ⊑_WPO tj for some j ∈ π(g). Then s ≻_WPO tj cannot hold by the induction hypothesis on t. On the contrary, we must have s ≻_WPO tj: by Lemma 1-1 we have s ≻_WPO si ≿_WPO t ≻_WPO tj, and hence s ≻_WPO tj as ⟨≿_WPO, ≻_WPO⟩ is an order pair.
	- (b) s = f(s1,...,sn), t = g(t1,...,tm), and s ≻_WPO tj for every j ∈ π(g): By the induction hypothesis on t, s ⊑_WPO tj cannot hold for any j ∈ π(g). Thus s ⊏_WPO t must be due to (2b). We proceed by considering the following two possibilities:
		- i. f ≻ g: As neither f ≺ g nor f ≾ g holds, s ⊏_WPO t is not possible.
		- ii. f ≿ g and π_f(s1,...,sn) ≿^lex_WPO π_g(t1,...,tm): As f ≺ g does not hold, (2b-i) is not applicable for s ⊏_WPO t. By Lemma 5 and the induction hypothesis, the lists π_f(s1,...,sn) and π_g(t1,...,tm) are not related by the strict co-relation, and thus (2b-ii) is not applicable either.
	- (c) s = t ∈ V: Then clearly s ⊏_WPO t cannot hold. □

### **6 Conditional Rewriting**

Conditional term rewriting (cf. [27]) extends term rewriting so that rewrite rules can be guarded by conditions. We are interested in the "oriented" variant, as it naturally corresponds to functional-programming constructs such as the where clauses of Haskell or the when clauses of OCaml.

A *conditional rewrite rule* l → r ⇐ φ consists of terms l and r and a list φ of pairs of terms. We may omit "⇐ [ ]", and we write s1 ↠ t1,...,sn ↠ tn for [⟨s1, t1⟩,...,⟨sn, tn⟩]. A *conditional TRS (CTRS)* R is a set of conditional rewrite rules. A CTRS R yields the rewrite preorder →*_R via the following derivation rules [22]:

$$\frac{}{s \to^*_{\mathcal R} s}\,\textsc{Refl} \qquad \frac{s \to_{\mathcal R} t \quad t \to^*_{\mathcal R} u}{s \to^*_{\mathcal R} u}\,\textsc{Trans} \qquad \frac{s_i \to_{\mathcal R} s_i'}{f(s_1,\ldots,s_i,\ldots,s_n) \to_{\mathcal R} f(s_1,\ldots,s_i',\ldots,s_n)}\,\textsc{Fun}$$

$$\frac{s_1\theta \to^*_{\mathcal R} t_1\theta \quad \cdots \quad s_n\theta \to^*_{\mathcal R} t_n\theta}{l\theta \to_{\mathcal R} r\theta}\,\textsc{Rule} \quad \text{if } (l \to r \Leftarrow s_1 \twoheadrightarrow t_1, \ldots, s_n \twoheadrightarrow t_n) \in \mathcal R$$

To approximate reachability with respect to CTRSs by means of (co-)rewrite pairs, one needs to be careful when dealing with conditions.

*Example 6.* Consider the following CTRS:

$$\mathcal{R}\_{\mathsf{fg}} := \{ \,\,\mathsf{f}(x) \to x, \,\,\mathsf{g}(x) \to y \Leftarrow \mathtt{f}(x) \twoheadrightarrow y \}$$

and a reachability atom g(*x*) ↠ f(*x*). One might expect that a rewrite preorder ⊒ such that

$$\mathsf{f}(x) \sqsupseteq x \qquad\qquad \mathsf{g}(x) \sqsupseteq y \quad \text{if} \quad \mathsf{f}(x) \sqsupseteq y$$

can over-approximate →*_{R_fg}, but this is unfortunately false. For instance, any LPO satisfies the above constraints: f(*x*) ≻_LPO *x* as LPO is a simplification order, and the second constraint also holds vacuously, as its condition f(*x*) ≻_LPO *y* is false. However, it is unsound to conclude that g(*x*) ↠ f(*x*) is R_fg-unsatisfiable even if g(*x*) ≻_LPO f(*x*): by setting g ≻ f one can have g(*x*) ≻_LPO f(*x*) and g(*x*) ≿_LPO f(*x*), but g(*x*) →_{R_fg} f(*x*).
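The offending rewrite step of Example 6 can be replayed concretely. The following sketch assumes a hypothetical ground constant c (not in the paper) and implements naive bounded conditional rewriting for R_fg, with a depth bound on the nesting of condition evaluation.

```python
# A naive bounded rewriting search for the CTRS R_fg of Example 6, over
# ground terms with a hypothetical constant c. It confirms that g(c)
# rewrites to f(c), so concluding that g(x) ->> f(x) is R_fg-unsatisfiable
# would be unsound.

def one_steps(t, depth):
    """All terms reachable from t in one rewrite step of R_fg."""
    res = set()
    if isinstance(t, tuple):
        h, a = t                                       # f and g are unary
        res |= {(h, b) for b in one_steps(a, depth)}   # rewrite below the root
        if h == "f":                                   # rule f(x) -> x
            res.add(a)
        if h == "g" and depth > 0:                     # rule g(x) -> y <= f(x) ->> y
            res |= reachable(("f", a), depth - 1)
    return res

def reachable(t, depth):
    """All terms reachable from t, with conditions nested at most `depth` times."""
    seen, todo = {t}, [t]
    while todo:
        u = todo.pop()
        for v in one_steps(u, depth):
            if v not in seen:
                seen.add(v)
                todo.append(v)
    return seen

assert ("f", "c") in reachable(("g", "c"), depth=2)    # g(c) ->* f(c)
```

The conditional rule fires with y := f(c), since f(c) trivially reaches itself.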

A solution is to use co-rewrite pairs already for dealing with conditions.

**Proposition 4.** *If* ⟨⊒, ⊏⟩ *is a co-rewrite pair, every* (l → r ⇐ φ) ∈ R *satisfies* l ⊒ r *or* u ⊏ v *for some* u ↠ v ∈ φ*, and* s ⊏ t*, then* s ↠ t *is* R*-unsatisfiable.*

*Proof.* We show that s →*_R t implies s ⊒ t. This is sufficient since, then, sθ →*_R tθ implies sθ ⊒ tθ, while s ⊏ t demands sθ ⊏ tθ, which is not possible since ⊒ ∩ ⊏ = ∅. The claim is proved by induction on the derivation of s →*_R t.

– Refl: Since ⊒ is reflexive, we have s ⊒ s.
– Trans: By the induction hypotheses we have s ⊒ t and t ⊒ u, and hence s ⊒ u by transitivity.
– Fun: By the induction hypothesis and the closure of ⊒ under contexts.
– Rule: If l ⊒ r, then lθ ⊒ rθ since ⊒ is closed under substitutions. Otherwise, u ⊏ v for some u ↠ v ∈ φ, and thus uθ ⊏ vθ; but the premise gives uθ →*_R vθ, and hence uθ ⊒ vθ by the induction hypothesis, which is impossible since ⊒ ∩ ⊏ = ∅. □


*Example 7.* Consider the following singleton CTRS:

$$\mathcal{R}_{\mathsf{ab}} := \{\ \mathsf{a} \to \mathsf{b} \Leftarrow \mathsf{b} \twoheadrightarrow \mathsf{a}\ \}$$

Proposition 4 combined with the LPO or WPO induced by a partial precedence in which a and b are incomparable proves that a ↠ b is R_ab-unsatisfiable: clearly b ⊏_LPO a and a ⊏_LPO b by case (2b-i) of Definition 5. On the other hand, Proposition 4 with the term ordering induced by a totally ordered algebra ⟨A, ≥⟩ cannot solve the problem, since A ⊨ a ≱ b implies A ⊨ b ≥ a by totality, which then demands A ⊨ a ≥ b to satisfy the assumption of Proposition 4. For the same reason, WPO induced by a totally ordered algebra and a total precedence cannot handle the problem either.

Note that the condition of the rule in R<sub>ab</sub> is unsatisfiable, and this is one of the two cases where Proposition 4 is effective. The other case is when a condition can be ignored. Proposition 4 is incomplete when conditions are essential, as in Example 6. For dealing with essential conditional rules, the variable bindings in a rule should be taken into account. At this point, a model-oriented formulation (à la [19]) seems more suitable.

**Definition 8 (model of CTRS).** *We extend the notation* [s ↠ t]<sub>α</sub> *of Definition 2 to* [φ]<sub>α</sub> *for an arbitrary Boolean formula* φ *with the single binary predicate* ↠ *in the obvious manner. We say* A = ⟨A, [·]⟩ validates φ*, written* A ⊨ φ*, iff* [φ]<sub>α</sub> *holds for every* α : V → A*. We say a related* F*-algebra* ⟨A, ≥⟩ *is a* model of a CTRS R *iff*<sup>3</sup> A ⊨ l ↠ r ∨ s<sub>1</sub> ↠̸ t<sub>1</sub> ∨ ··· ∨ s<sub>n</sub> ↠̸ t<sub>n</sub> *for every* (l → r ⇐ s<sub>1</sub> ↠ t<sub>1</sub>,...,s<sub>n</sub> ↠ t<sub>n</sub>) ∈ R*.*

Besides minor simplifications (e.g., we do not need two predicates as we are only concerned with reachability in many steps in this paper), the major difference with [19] is that here we do not encode the monotonicity or order axioms into logical formulas (using R of [19]). Instead, we impose these properties as meta-level assumptions over models.

**Theorem 3.** *For a CTRS* R*,* s ↠ t *is* R*-unsatisfiable if and only if there exists a monotone preordered model* ⟨A, ≥⟩ *of* R *such that* A ⊨ s ↠̸ t*.*

<sup>3</sup> Here the formula *s* ↠̸ *t* is a shorthand for ¬(*s* ↠ *t*).

*Proof.* We start with the "if" direction. Let ⟨A, ≥⟩ be a monotone preordered model of R. As in Proposition 4, it suffices to show that s →<sup>∗</sup><sub>R</sub> t implies A ⊨ s ≥ t. The claim is proved by induction on the derivation of s →<sup>∗</sup><sub>R</sub> t.


Next consider the "only if" direction. We show that ⟨T(F, V), →<sup>∗</sup><sub>R</sub>⟩ is a model of R, that is, for every (l → r ⇐ s<sub>1</sub> ↠ t<sub>1</sub>,...,s<sub>n</sub> ↠ t<sub>n</sub>) ∈ R, we show T(F, V) ⊨ l →<sup>∗</sup><sub>R</sub> r ∨ s<sub>1</sub> ̸→<sup>∗</sup><sub>R</sub> t<sub>1</sub> ∨ ··· ∨ s<sub>n</sub> ̸→<sup>∗</sup><sub>R</sub> t<sub>n</sub>. This means lθ →<sup>∗</sup><sub>R</sub> rθ for every θ : V → T(F, V) such that s<sub>1</sub>θ →<sup>∗</sup><sub>R</sub> t<sub>1</sub>θ, ..., s<sub>n</sub>θ →<sup>∗</sup><sub>R</sub> t<sub>n</sub>θ, which is immediate by Rule. The fact that →<sup>∗</sup><sub>R</sub> is a preorder and closed under contexts is also immediate. Finally, s ↠ t being R-unsatisfiable means that sθ ̸→<sup>∗</sup><sub>R</sub> tθ for any θ : V → T(F, V), that is, T(F, V) ⊨ s ↠̸ t.

Putting implementation issues aside, it is trivial to use semantic (termination) methods in Theorem 3.

*Example 8.* Consider again the CTRS R<sub>fg</sub> of Example 6. The monotone ordered {f, g}-algebra ⟨ℕ, [·], ≥⟩ defined by

$$[\mathbf{f}](x) = x \qquad\qquad\qquad [\mathbf{g}](x) = x+1$$

is a model of <sup>R</sup>fg, since for arbitrary x, y <sup>∈</sup> <sup>N</sup>, we have

$$[\mathbf{f}](x) \ge x \qquad\qquad [\mathbf{g}](x) = x + 1 \ge y \lor [\mathbf{f}](x) = x \not\ge y$$

Then, with Theorem 3 we can show that f(x) ↠ g(x) is R<sub>fg</sub>-unsatisfiable, as [f](x) = x ≱ x + 1 = [g](x) for every x ∈ ℕ.
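The model check of Example 8 can be replayed numerically. The following sketch (our own helper names, not part of NaTT) samples a finite part of the infinite carrier ℕ; a real check must of course reason over all of ℕ.

```python
# Example 8, checked on a finite sample of the carrier N.
# [f](x) = x and [g](x) = x + 1; R_fg has the rules
#   f(x) -> x        and        g(x) -> y <= f(x) ->> y.
def f(x): return x
def g(x): return x + 1

for x in range(50):
    # unconditional rule: [f](x) >= x
    assert f(x) >= x
    for y in range(50):
        # conditional rule: conclusion oriented, or condition falsified
        assert g(x) >= y or not f(x) >= y

# the goal f(x) ->> g(x) is refuted: [f](x) >= [g](x) holds for no x
assert all(not f(x) >= g(x) for x in range(50))
```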

To use WPO(A) in combination with Theorem 3, we need to validate formulas with the predicate ⊒<sub>WPO(A)</sub> in the term algebra T(F, V). We encode these formulas into formulas with the predicates ≥ and >, which are then interpreted in A.

**Definition 9 (formal WPO).** *Let* ⟨≥, >⟩ *and* ⟨≿, ≻⟩ *be pairs of relations over some set and over* F*, respectively, and let* π *be a partial status. We define* wpo(π, ≥, >, ≿, ≻)*, or* wpo *for short, to be the pair* ⟨⊒<sub>wpo</sub>, ⊐<sub>wpo</sub>⟩*, where for terms* s, t ∈ T(F, V)*,* s ⊒<sub>wpo</sub> t *and* s ⊐<sub>wpo</sub> t *are Boolean formulas defined as follows:*

$$s \sqsupset_{\mathsf{wpo}} t := s > t \vee (s \ge t \wedge \phi)$$

*where* φ *is* False *if* s ∈ V *and is* ⋁<sub>i∈π(f)</sub> s<sub>i</sub> ⊒<sub>wpo</sub> t ∨ ψ *if* s = f(s<sub>1</sub>,...,s<sub>n</sub>)*, and* ψ *is* False *if* t ∈ V *and is*

$$\bigwedge_{j \in \pi(g)} s \sqsupset_{\mathsf{wpo}} t_j \wedge \Big( f \succ g \vee \big( f \succsim g \wedge \pi_f(s_1, \dots, s_n) \sqsupset_{\mathsf{wpo}}^{\mathsf{lex}} \pi_g(t_1, \dots, t_m) \big)\Big)$$

*if* t = g(t<sub>1</sub>,...,t<sub>m</sub>)*. Formula* s ⊒<sub>wpo</sub> t *is defined analogously, except that* φ *is* True *if* s = t ∈ V*, and* ⊐<sup>lex</sup><sub>wpo</sub> *in formula* ψ *is replaced by* ⊒<sup>lex</sup><sub>wpo</sub>*.*
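Definition 9 can be read operationally when ≥ and > come from evaluating ground terms in a concrete algebra. The sketch below is our own simplification (ground terms only, a total status comparing all arguments lexicographically, and a total precedence given by integer ranks); it computes s ⊐<sub>wpo</sub> t and s ⊒<sub>wpo</sub> t directly rather than emitting formulas.

```python
# Simplified, ground-term-only reading of Definition 9. Assumptions (ours):
# pi is the full lexicographic status, >=/> compare algebra values, and the
# precedence is total, given by integer ranks. Terms are nested tuples.
interp = {'a': lambda: 1, 'f': lambda x: x + 1, 'g': lambda x: 2 * x}
prec = {'a': 0, 'f': 1, 'g': 2}

def val(t):
    """Evaluate a ground term (symbol, arg1, ...) in the algebra."""
    return interp[t[0]](*map(val, t[1:]))

def lex(ss, ts, strict):
    """Lexicographic extension of the (weak, strict) wpo comparisons."""
    for a, b in zip(ss, ts):
        if wpo(a, b, True):
            return True
        if not (wpo(a, b, False) and wpo(b, a, False)):  # not equivalent
            return False
    return len(ss) > len(ts) if strict else len(ss) >= len(ts)

def wpo(s, t, strict=True):
    """s ⊐wpo t (strict=True) or s ⊒wpo t (strict=False)."""
    vs, vt = val(s), val(t)
    if vs > vt:
        return True
    if not vs >= vt:
        return False
    if any(wpo(si, t, False) for si in s[1:]):     # phi: a subterm suffices
        return True
    if not all(wpo(s, tj, True) for tj in t[1:]):  # psi: dominate t's subterms
        return False
    if prec[s[0]] != prec[t[0]]:
        return prec[s[0]] > prec[t[0]]
    return lex(s[1:], t[1:], strict)

a, fa, ga = ('a',), ('f', ('a',)), ('g', ('a',))
assert wpo(ga, fa) and not wpo(fa, ga)   # [g](a) = [f](a) = 2, so g ≻ f decides
assert wpo(fa, fa, strict=False) and not wpo(fa, fa)
```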

We omit an easy proof that verifies that wpo encodes WPO:

**Lemma 7.** s ⊒<sub>WPO(A)</sub> (⊐<sub>WPO(A)</sub>) t *iff* A ⊨ s ⊒<sub>wpo</sub> (⊐<sub>wpo</sub>) t*.*

Note carefully that s ⋣<sub>WPO(A)</sub> t is A ⊭ s ⊒<sub>wpo</sub> t but not A ⊨ ¬(s ⊒<sub>wpo</sub> t). Hence we ensure s ⋣<sub>WPO(A)</sub> t by A ⊨ s ⊐<sub>w̄po</sub> t, where w̄po denotes wpo(π, ≮, ≰, ⊀, ⋠).

**Theorem 4.** *If* R *is a CTRS,* ⟨≥, >⟩ *and* ⟨≿, ≻⟩ *are order pairs on* A *and* F*,* ⟨A, ≥⟩ *is* π*-simple and monotone,* A ⊨ l ⊒<sub>wpo</sub> r ∨ u<sub>1</sub> ⊐<sub>w̄po</sub> v<sub>1</sub> ∨ ··· ∨ u<sub>n</sub> ⊐<sub>w̄po</sub> v<sub>n</sub> *for every* (l → r ⇐ u<sub>1</sub> ↠ v<sub>1</sub>,...,u<sub>n</sub> ↠ v<sub>n</sub>) ∈ R*, and* A ⊨ s ⊐<sub>w̄po</sub> t*, then* s ↠ t *is* R*-unsatisfiable.*

*Proof.* We apply Theorem 3. To this end, we first show that ⟨T(F, V), ⊒<sub>WPO(A)</sub>⟩ is a monotone preordered model of R. Monotonicity and preorderedness are due to Proposition 1. For being a model, let (l → r ⇐ u<sub>1</sub> ↠ v<sub>1</sub>,...,u<sub>n</sub> ↠ v<sub>n</sub>) ∈ R. Due to the assumption and Lemma 7, we have l ⊒<sub>WPO(A)</sub> r ∨ u<sub>1</sub> ⋣<sub>WPO(A)</sub> v<sub>1</sub> ∨ ··· ∨ u<sub>n</sub> ⋣<sub>WPO(A)</sub> v<sub>n</sub>. Due to Lemmas 2 and 4, we get lθ ⊒<sub>WPO(A)</sub> rθ ∨ u<sub>1</sub>θ ⋣<sub>WPO(A)</sub> v<sub>1</sub>θ ∨ ··· ∨ u<sub>n</sub>θ ⋣<sub>WPO(A)</sub> v<sub>n</sub>θ for every θ : V → T(F, V). With Proposition 2 we conclude T(F, V) ⊨ l ⊒<sub>WPO(A)</sub> r ∨ u<sub>1</sub> ⋣<sub>WPO(A)</sub> v<sub>1</sub> ∨ ··· ∨ u<sub>n</sub> ⋣<sub>WPO(A)</sub> v<sub>n</sub>. Finally, we need T(F, V) ⊨ s ↠̸ t, i.e., sθ ⋣<sub>WPO(A)</sub> tθ for any θ : V → T(F, V). As we assume A ⊨ s ⊐<sub>w̄po</sub> t, by Lemma 4 we have A ⊨ sθ ⊐<sub>w̄po</sub> tθ. By Proposition 2 we conclude sθ ⋣<sub>WPO(A)</sub> tθ.

## **7 Experiments**

The proposed methods are implemented in the termination prover NaTT [35], available at https://www.trs.cm.is.nagoya-u.ac.jp/NaTT/.

Internally, NaTT reduces the problem of finding an algebra A that makes ⟨A, ≥⟩ a model of a TRS R (or that makes WPO(A) orient R) to a satisfiability modulo theories (SMT) problem, which is then solved by the backend SMT solver z3 [26]. The implementation of Theorem 1 and Corollary 1 is a trivial adaptation of the termination methods. Corollary 2 is also trivial for totally ordered carriers, since A ⊨ s ≱ t is equivalent to A ⊨ s < t. Matrix/tuple interpretations are also easy, since A ⊨ (a<sub>1</sub>,...,a<sub>n</sub>) ≱ (b<sub>1</sub>,...,b<sub>n</sub>) is equivalent to A ⊨ a<sub>1</sub> < b<sub>1</sub> ∨ ··· ∨ a<sub>n</sub> < b<sub>n</sub>. Theorem 2 with WPO is obtained by parametrizing WPO.
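To illustrate what the backend search has to accomplish, the following is our own brute-force toy (not the z3 encoding used by NaTT): it enumerates Sum-style coefficients for R<sub>fg</sub> of Example 6 and checks the model conditions on a finite sample.

```python
from itertools import product

# Brute-force stand-in for the SMT encoding: search for c0, c1, d0, d1 such
# that [f](x) = c0 + c1*x, [g](x) = d0 + d1*x makes <N, >=> a model of R_fg
# and refutes the goal f(x) ->> g(x). "For all x, y" is approximated by
# sampling; a real encoding would quantify or eliminate quantifiers instead.
SAMPLE = range(20)

def is_solution(c0, c1, d0, d1):
    f = lambda x: c0 + c1 * x
    g = lambda x: d0 + d1 * x
    model = all(f(x) >= x and (g(x) >= y or not f(x) >= y)
                for x in SAMPLE for y in SAMPLE)
    goal = all(not f(x) >= g(x) for x in SAMPLE)
    return model and goal

solutions = [c for c in product(range(3), (0, 1), range(3), (0, 1))
             if is_solution(*c)]
assert (0, 1, 1, 1) in solutions  # [f](x) = x, [g](x) = x + 1, as in Example 8
```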


**Table 1.** Experimental results.

Theorem 3 needs some tricks. In the unconditional case, finding a desired algebra A can be encoded into SMT over quantifier-free linear arithmetic for a large class of A [36]. For the conditional case, we need to find (∃) parameters that validate (∀) a disjunctive clause. Farkas' lemma would reduce such a problem to quantifier-free SMT, but the resulting problem is nonlinear. Experimentally, we observe that our backend z3 performs better on quantified linear arithmetic than on quantifier-free nonlinear arithmetic, and hence we choose to keep the ∀ quantifiers.

We conducted experiments using the examples presented in the paper and the examples in the INF category of the standard benchmark set COPS. The execution environment is StarExec [31] with the same settings as CoCo 2019.

Many COPS examples contain conjunctive reachability constraints of the form s<sub>1</sub> ↠ t<sub>1</sub> ∧ ··· ∧ s<sub>n</sub> ↠ t<sub>n</sub>. In this experiment we naively collapsed such a constraint into tp(s<sub>1</sub>,...,s<sub>n</sub>) ↠ tp(t<sub>1</sub>,...,t<sub>n</sub>) by introducing a fresh function symbol tp. Two benchmarks exceed the scope of oriented CTRSs, on which NaTT immediately gives up.
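The collapsing step can be sketched as follows (terms as nested tuples; the representation is ours):

```python
# s1 ->> t1 /\ ... /\ sn ->> tn  becomes  tp(s1,...,sn) ->> tp(t1,...,tn)
# for a fresh function symbol tp.
def collapse(conjuncts):
    """conjuncts: list of (s, t) pairs, each a reachability constraint."""
    ss, ts = zip(*conjuncts)
    return ('tp', *ss), ('tp', *ts)

lhs, rhs = collapse([(('f', ('x',)), ('x',)),
                     (('g', ('x',)), ('y',))])
assert lhs == ('tp', ('f', ('x',)), ('g', ('x',)))
assert rhs == ('tp', ('x',), ('y',))
```

Intuitively this is faithful because tp occurs in no rule, so a rewrite sequence between the two tuple terms decomposes into componentwise sequences.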

As co-rewrite pairs we tested the algebras S*um*, S*um*<sup>+</sup>, S*um*<sup>−</sup>, M*at*, LPO, and WPO. The basic algebra S*um* = ⟨ℤ, [·]⟩ is given by [f](x<sub>1</sub>,...,x<sub>n</sub>) = c<sub>0</sub> + Σ<sup>n</sup><sub>i=1</sub> c<sub>i</sub>·x<sub>i</sub>, where c<sub>0</sub> ∈ ℤ and c<sub>1</sub>,...,c<sub>n</sub> ∈ {0, 1}. The algebras S*um*<sup>+</sup> and S*um*<sup>−</sup> are defined similarly, where the ranges of c<sub>0</sub>, which also determine the carrier, are ℕ and ℤ<sub>≤0</sub>, respectively. The algebra M*at* represents 2D matrix interpretations.

Table 1 presents the results. For TRSs, we can observe that our proposed methods advance the state of the art, in the sense that they prove new examples that no tool that previously participated in CoCo could handle. As there are only 15 TRS examples in the INF category of COPS 2021, we could not derive interesting observations there. Taking CTRS examples into account, we see that S*um* is not as good as S*um*<sup>+</sup> or S*um*<sup>−</sup>, although its carrier is bigger (ℤ versus ℕ or ℤ<sub>≤0</sub>). This phenomenon is explained as follows: for the latter two, one knows that variables are bounded by 0 (from below or above), and hence one can have S*um*<sup>+</sup> ⊨ *x* ≥ a or S*um*<sup>−</sup> ⊨ a ≥ *x* by setting [a] = 0. Neither is possible when the carrier is unbounded. This observation also suggests another choice of carriers that are bounded both from below and from above, which is left for future work.

Judging from the figures for the CTRS examples, S*um*<sup>−</sup> performs best among our methods. However, M*at* and WPO(S*um*<sup>+</sup>) solve more examples if TRS examples are counted. It does not yet seem appropriate to judge practical significance from these experiments.

Finally, we implemented as the default strategy of NaTT 2.2 the sequential application of S*um*<sup>−</sup>, LPO, WPO(S*um*<sup>+</sup>), and M*at* after the tests NaTT had already implemented. The improvement over the previous NaTT 2.1 should be clear, although the number of timeouts (indicated by "TO:") is significant.

# **8 Conclusion**

We proposed generalizations of termination techniques that can prove unsatisfiability of reachability, both for term rewriting and for conditional term rewriting. We implemented the approach in the termination prover NaTT, and experimentally evaluated the significance of the proposed approach.

The implementation focused on evaluating the proposed methods separately. The only implemented way of combining their power is a naive one: apply the tests one by one as long as they fail. For future work, it will be interesting to incorporate the proposed methods into the existing frameworks [10,30].

**Acknowledgments.** The author would like to thank Kiraku Shintani for technical help with the COPS database system; Nao Hirokawa, Salvador Lucas, Naoki Nishida, and Sarah Winkler for discussions; and the anonymous reviewers for their detailed comments, which improved the presentation of the paper.

# **References**



# **Knowledge Representation and Justification**

# **Evonne: Interactive Proof Visualization for Description Logics (System Description)**

Christian Alrabbaa<sup>1</sup>, Franz Baader<sup>1</sup>, Stefan Borgwardt<sup>1</sup>, Raimund Dachselt<sup>2</sup>, Patrick Koopmann<sup>1</sup>, and Julián Méndez<sup>2</sup>

<sup>1</sup> Institute of Theoretical Computer Science, TU Dresden, Dresden, Germany {christian.alrabbaa,franz.baader,stefan.borgwardt,patrick.koopmann}@tu-dresden.de <sup>2</sup> Interactive Media Lab Dresden, TU Dresden, Dresden, Germany

{raimund.dachselt,julian.mendez2}@tu-dresden.de

**Abstract.** Explanations for description logic (DL) entailments provide important support for the maintenance of large ontologies. The "justifications" usually employed for this purpose in ontology editors pinpoint the parts of the ontology responsible for a given entailment. Proofs for entailments make the intermediate reasoning steps explicit, and thus explain how a consequence can actually be derived. We present an interactive system for exploring description logic proofs, called Evonne, which visualizes proofs of consequences for ontologies written in expressive DLs. We describe the methods used for computing those proofs, together with a feature called *signature-based proof condensation*. Moreover, we evaluate the quality of generated proofs using real ontologies.

## **1 Introduction**

Proofs generated by Automated Reasoning (AR) systems are sometimes presented to humans in textual form to convince them of the correctness of a theorem [9,11], but are more often employed as certificates that can be checked automatically [20]. In contrast to the AR setting, where very long proofs may be needed to derive a deep mathematical theorem from very few axioms, DL-based ontologies are often very large, but proofs of a single consequence are usually of a more manageable size. For this reason, the standard method of explanation in description logic [8] has long been to compute so-called *justifications*, which point out a minimal set of source statements responsible for an entailment of interest. For example, the ontology editor Protégé<sup>1</sup> has supported the computation of justifications since 2008 [12], which is very useful when working with large DL ontologies. Nevertheless, it is often not obvious why a given consequence actually follows from such a justification [13]. Recently, this explanation capability has been extended towards showing full *proofs* with intermediate reasoning steps, but this is restricted to ontologies written in the lightweight DLs supported by the Elk reasoner [15,16], and the graphical presentation of proofs is very basic.

<sup>1</sup> https://protege.stanford.edu/.

© The Author(s) 2022

J. Blanchette et al. (Eds.): IJCAR 2022, LNAI 13385, pp. 271–280, 2022. https://doi.org/10.1007/978-3-031-10769-6\_16

In this paper, we present Evonne, an interactive system for exploring proofs of description logic entailments, using the methods for computing small proofs presented in [3,5]. Initial prototypes of Evonne were presented in [6,10], but many improvements have been implemented since then. While Evonne does more than just visualize proofs, this paper focuses on its proof component: we give a brief overview of the interface for exploring proofs, describe the proof generation methods implemented in the back-end, and present an experimental evaluation of these proof generation methods in terms of proof size and run time. The improved back-end uses Java libraries that extract proofs using various methods, such as from the Elk calculus, or *forgetting-based proofs* [3] using the forgetting tools Lethe [17] and Fame [21] in a black-box fashion. The new front-end is visually more appealing than the prototypes presented in [6,10], and allows users to inspect and explore proofs using various interaction techniques, such as zooming and panning, collapsing and expanding, text manipulation, and compactness adjustments. Additional features include the minimization of generated proofs according to various measures and the possibility to select a *known signature*, which is used to automatically hide parts of a proof that are assumed to be obvious for users with certain prior knowledge. Our evaluation shows that proof sizes can be significantly reduced in this way, making the proofs more user-friendly. Evonne can be tried and downloaded at https://imld.de/evonne. The version of Evonne described here, as well as the data and scripts used in our experiments, can be found at [2].

# **2 Preliminaries**

We recall some relevant notions for DLs; for a detailed introduction, see [8]. DLs are decidable fragments of first-order logic (FOL) with a special, variable-free syntax, which use only unary and binary predicates, called *concept names* and *role names*, respectively. These can be used to build complex *concepts*, which correspond to first-order formulas with one free variable, and *axioms*, corresponding to first-order sentences. Which kinds of concepts and axioms can be built depends on the expressivity of the DL used. Here we mainly consider the lightweight DL ELH and the more expressive ALCH. We have the usual notion of FOL *entailment* O ⊨ α of an axiom α from a finite set of axioms O, called an *ontology*. Of special interest are entailments of *atomic CIs* (concept inclusions) of the form A ⊑ B, where A and B are concept names. Following [3], we define *proofs* of O ⊨ α as finite, acyclic, directed hypergraphs, where vertices v are labeled with axioms ℓ(v) and hyperedges are of the form (S, d), with S a set of vertices and d a vertex such that {ℓ(v) | v ∈ S} ⊨ ℓ(d); the leaves of a proof must be labeled by elements of O and the root by α. In this paper, all proofs are *trees*, i.e., no vertex appears in the first component of multiple hyperedges (see Fig. 1).
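The proof representation above can be transcribed directly. The following minimal sketch uses our own Python representation (Evonne's back-end is Java) with plain strings standing in for axioms:

```python
from dataclasses import dataclass

# Vertices labeled with axioms, hyperedges (S, d) from a set of premise
# vertices to a conclusion vertex; leaves carry ontology axioms, the root
# carries the entailed axiom.
@dataclass
class Proof:
    label: dict    # vertex -> axiom (here: plain strings)
    edges: list    # list of (frozenset of premise vertices, conclusion vertex)
    root: object

    def leaves(self):
        concluded = {d for _, d in self.edges}
        return [v for v in self.label if v not in concluded]

    def size(self):
        return len(self.label)

# Toy proof of A ⊑ C from O = {A ⊑ B, B ⊑ C}:
p = Proof(label={1: "A ⊑ B", 2: "B ⊑ C", 3: "A ⊑ C"},
          edges=[(frozenset({1, 2}), 3)], root=3)
assert sorted(p.leaves()) == [1, 2] and p.size() == 3
```

The tree condition of the text corresponds to requiring that no vertex appears in more than one premise set.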

# **3 The Graphical User Interface**

The user interface of Evonne is implemented as a web application. To support users in understanding large proofs, they are offered various layout options and

**Fig. 1.** Overview of Evonne - a condensed proof in the bidirectional layout

interaction components. The proof visualization is linked to a second view showing the context of the proof in a relevant subset of the ontology. In this ontology view, interactions between axioms are visualized, so that users can understand the context of axioms occurring in the proof. The user can also examine possible ways to eliminate unwanted entailments in the ontology view. The focus of this system description, however, is on the proof component: we describe how the proofs are generated and how users can interact with the proof visualization. For details on the ontology view, we refer the reader to the workshop paper [6], where we also describe how Evonne supports ontology repair.

*Initialization.* After starting Evonne for the first time, users create a new project, for which they specify an ontology file. They can then select an entailed atomic CI to be explained. The user can choose between different proof methods, and optionally select a signature of *known terms* (cf. Sect. 4), which can be generated using the term selection tool Protégé-TS [14].

*Layout.* Proofs are shown as graphs with two kinds of vertices: colored vertices for axioms, gray ones for inference steps. By default, proofs are shown using a *tree layout*. To take advantage of the width of the display when dealing with long axioms, it is possible to show proofs in a *vertical layout*, placing axioms linearly below each other, with inferences represented through edges on the side (without the inference vertices). It is possible to automatically re-order vertices to minimize the distance between conclusion and premises in each step. The third layout option is the *bidirectional layout* (see Fig. 1), a tree layout where, initially, the entire proof is collapsed into a *magic vertex* that links the conclusion directly to its justification, and from which individual inference steps can be pulled out and pushed back from both directions.

*Exploration.* In all views, each vertex is equipped with multiple functionalities for exploring a proof. For proofs generated with Elk, clicking on an inference vertex shows the inference rule used and the particular inference, with relevant sub-elements highlighted in different colors. Axiom vertices show different buttons when hovered over. In the standard tree layout, users can hide the sub-proof under an axiom; they can also reveal the previous inference step or the entire sub-proof. In the vertical layout, a button highlights and explains the inference of the current axiom. In the bidirectional layout, arrow buttons are used for pulling inference steps out of the magic vertex, as well as pushing them back in.

*Presentation.* A *minimap* allows users to keep track of the overall structure of the proof, complementing the zooming and panning functionality. Users can adjust the width and height of proofs through the options side-bar. Long axiom labels can be *shortened* in two ways: either by setting a fixed size for all vertices, or by abbreviating names based on capital letters. Afterwards, it is possible to restore the original labels individually.

# **4 Proof Generation**

To obtain the proofs that are shown to the user, we implemented different proof generation techniques, some of which were initially described in [3]. For ELH ontologies, proofs can be generated natively by the DL reasoner Elk [16]. These proofs use rules from the calculus described in [16]. We apply the Dijkstra-like algorithm introduced in [4,5] to compute a *minimized proof* from the Elk output. This minimization can be done w.r.t. different measures, such as the size, depth, or weighted sum (where each axiom is weighted by its size), as long as they are *monotone* and *recursive* [5]. For ontologies outside of the ELH fragment, we use the forgetting-based approach originally described in [3], for which we now implemented two alternative algorithms for computing more compact proofs (Sect. 4.1). Finally, independently of the proof generation method, one can specify a signature of known terms. This signature contains terminology that the user is familiar with, so that entailments using only those terms do not need to be explained. The condensation of proofs w.r.t. signatures is described in Sect. 4.2.
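The Dijkstra-like minimization mentioned above can be sketched as follows; this is our own simplified fixpoint over candidate inferences (the actual algorithm of [4,5] is more refined), shown here for the measure "tree size":

```python
import math

# Given candidate inferences (premises -> conclusion), compute for each axiom
# the minimal tree-proof size: a leaf from the ontology costs 1, and a step
# costs 1 plus the costs of its premises. Size is one example of the monotone,
# recursive measures mentioned in the text.
def minimal_sizes(ontology, inferences):
    cost = {ax: 1 for ax in ontology}   # ontology axioms have leaf proofs
    changed = True
    while changed:                      # naive fixpoint; Dijkstra is faster
        changed = False
        for premises, conclusion in inferences:
            if all(p in cost for p in premises):
                c = 1 + sum(cost[p] for p in premises)
                if c < cost.get(conclusion, math.inf):
                    cost[conclusion] = c
                    changed = True
    return cost

O = {"A ⊑ B", "B ⊑ C", "C ⊑ D"}
infs = [({"A ⊑ B", "B ⊑ C"}, "A ⊑ C"),
        ({"A ⊑ C", "C ⊑ D"}, "A ⊑ D"),
        ({"B ⊑ C", "C ⊑ D"}, "B ⊑ D"),
        ({"A ⊑ B", "B ⊑ D"}, "A ⊑ D")]
assert minimal_sizes(O, infs)["A ⊑ D"] == 5  # 3 leaves + 2 inner vertices
```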

#### **4.1 Forgetting-Based Proofs**

In a forgetting-based proof, proof steps represent inferences on concept or role names using a *forgetting* operation. Given an ontology O and a predicate name x, the result O<sup>−x</sup> of forgetting x in O does not contain any occurrences of x, while still capturing all entailments of O that do not use x [18]. In a forgetting-based proof, an inference takes as premises a set P of axioms and has as conclusion some axiom α ∈ P<sup>−x</sup> (where a particular forgetting operation is used to compute P<sup>−x</sup>). Intuitively, α is obtained from P by performing inferences on x. To compute a forgetting-based proof, we have to forget the names occurring in the ontology one after the other, until only the names occurring in the statement to be proved are left. For the forgetting operation, the user can select between two implementations: Lethe [17] (using the method supporting ALCH) and Fame [21] (using the method supporting ALCOI). Since the space of possible inference steps is exponentially large, it is not feasible to minimize proofs after their computation, as we do for EL entailments, which is why we rely on heuristics and search algorithms to generate small proofs. Specifically, we implemented three methods for computing forgetting-based proofs: HEUR tries to find proofs fast, SYMB tries to minimize the number of predicates forgotten in a proof, with the aim of obtaining proofs of small depth, and SIZE tries to optimize the size of the proof. The heuristic method HEUR is described in [3], and its implementation has not been changed since then. The search methods SYMB and SIZE are new (details can be found in the extended version [1]).
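The forgetting loop can be illustrated in miniature for the atomic-CI fragment; this is our own toy, far weaker than Lethe or Fame: forgetting a concept name X resolves every A ⊑ X with every X ⊑ B into A ⊑ B and then drops all axioms mentioning X. Each round corresponds to one block of inferences on X in a forgetting-based proof.

```python
def forget(cis, x):
    """cis: set of (A, B) pairs meaning A ⊑ B; x: concept name to forget."""
    derived = {(a, b) for (a, y) in cis if y == x
                      for (z, b) in cis if z == x}
    return {(a, b) for (a, b) in cis | derived if x not in (a, b)}

O = {("A", "X"), ("X", "Y"), ("Y", "B")}
step1 = forget(O, "X")      # {("A", "Y"), ("Y", "B")}
step2 = forget(step1, "Y")  # {("A", "B")}
assert step2 == {("A", "B")}
```

Forgetting names one after the other, as here, mirrors the iteration described above: only the names of the statement to be proved remain at the end.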

#### **4.2 Signature-Based Proof Condensation**

When inspecting a proof over a real-world ontology, different parts of the proof will be more or less familiar to the user, depending on their knowledge of the involved concepts or their experience with similar inference steps in the past. For CIs between concepts for which a user has application knowledge, they may not need to see a proof, and consequently, sub-proofs for such axioms can be automatically hidden. We assume that the user's knowledge is given in the form of a *known signature* Σ and that axioms containing only symbols from Σ do not need to be explained. The effect can be seen in Fig. 1 through the "known" inference on the left, where Σ contains SebaceousGland and Gland. The known signature is taken into consideration when minimizing proofs, so that proofs are selected for which more of the known information can be used where convenient. This can be easily integrated into the Dijkstra approach described in [3] by initially assigning to each axiom covered by Σ a proof with a single vertex.
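The condensation effect can be sketched on an already-built proof (our own toy representation; the paper instead integrates Σ into the minimization itself, as described above): sub-proofs of any axiom whose symbols all lie in Σ collapse to a single "known" vertex.

```python
# A proof is a nested pair (axiom, [sub-proofs]); `symbols` extracts the
# names used in an axiom string.
def condense(proof, sigma, symbols):
    axiom, subs = proof
    if symbols(axiom) <= sigma:
        return (axiom, [])  # "known": nothing below needs to be shown
    return (axiom, [condense(s, sigma, symbols) for s in subs])

def size(proof):
    return 1 + sum(size(s) for s in proof[1])

symbols = lambda ax: set(ax.replace("⊑", " ").split())

# Toy proof echoing Fig. 1, with Σ = {SebaceousGland, Gland}:
p = ("SebaceousGland ⊑ Structure",
     [("SebaceousGland ⊑ Gland", [("SebaceousGland ⊑ HolocrineGland", []),
                                  ("HolocrineGland ⊑ Gland", [])]),
      ("Gland ⊑ Structure", [])])
q = condense(p, {"SebaceousGland", "Gland"}, symbols)
assert size(p) == 5 and size(q) == 3
```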

#### **5 Evaluation**

For Evonne to be usable in practice, it is vital that proofs are computed efficiently and that they are not too large. An experimental evaluation of minimized proofs for EL and of forgetting-based proofs obtained with Fame and Lethe is provided in [3]. Here we present an evaluation of additional aspects: 1) a comparison of the three methods for computing forgetting-based proofs, and 2) an evaluation of the impact of signature-based proof condensation. All experiments were performed on Debian Linux (Intel Core i5-4590, 3.30 GHz, 23 GB Java heap size).

#### **5.1 Minimal Forgetting-Based Proofs**

To evaluate forgetting-based proofs, we extracted ALCH "proof tasks" from the ontologies in the 2017 snapshot of BioPortal [19]. We restricted all ontologies

**Fig. 2.** Run times and proof sizes for different forgetting-based proof methods. Marker size indicates how often each pattern occurred in the BioPortal snapshot. Instances that timed out were assigned size 0.

to ALCH and collected all entailed atomic CIs α, for each of which we computed the union U of all their justifications. We identified pairs (α, U) that were isomorphic modulo renaming of predicates, and kept only those patterns (α, U) that contained at least one axiom not expressible in ELH. This was successful in 373 of the ontologies<sup>2</sup> and resulted in 138 distinct *justification patterns* (α, U), representing 327 different entailments in the BioPortal snapshot. We then computed forgetting-based proofs for U ⊨ α with our three methods using Lethe, with a 5-minute timeout. This was successful for 325/327 entailments for the heuristic method (HEUR), 317 for the symbol-minimizing method (SYMB), and 279 for the size-minimizing method (SIZE). In Fig. 2 we compare the resulting *proof sizes* (left) and *run times* (right), using HEUR as the baseline (x-axis). HEUR is indeed faster in most cases, but SIZE reduces proof size by 5% on average compared to HEUR, which is not the case for SYMB. Regarding *proof depth* (not shown in the figure), SYMB did not outperform HEUR on average, while SIZE surprisingly yielded an average reduction of 4% compared to HEUR. Despite this good performance of SIZE for proof size and depth, for entailments that depend on many or complex axioms, computation times for both SYMB and SIZE become unacceptable, while proof generation with HEUR mostly stays in the range of seconds.
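The grouping of tasks "isomorphic modulo renaming of predicates" can be sketched as follows (our own representation: an axiom is a tuple of predicate names). Renaming predicates in order of first occurrence gives equal keys to tasks that differ only in names; a faithful implementation would additionally have to search over axiom orderings.

```python
# Canonicalize a task (alpha, U) by a first-occurrence renaming of predicates.
def canonical(alpha, justification):
    renaming = {}
    def r(name):
        return renaming.setdefault(name, f"P{len(renaming)}")
    ren = lambda axiom: tuple(r(n) for n in axiom)
    return ren(alpha), tuple(ren(ax) for ax in justification)

t1 = canonical(("A", "B"), [("A", "X"), ("X", "B")])
t2 = canonical(("C", "D"), [("C", "Y"), ("Y", "D")])
assert t1 == t2 == (("P0", "P1"), (("P0", "P2"), ("P2", "P1")))
```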

#### **5.2 Signature-Based Proof Condensation**

To evaluate how much hiding proof steps in a known signature decreases proof size in practice, we ran experiments on the large medical ontology SNOMED CT (International Edition, July 2020) that is mostly formulated in ELH. <sup>3</sup> As signatures we used SNOMED CT *Reference Sets*, <sup>4</sup> which are restricted vocabularies

<sup>2</sup> The other ontologies could not be processed in this way within the memory limit.

<sup>3</sup> https://www.snomed.org/.

<sup>4</sup> https://confluence.ihtsdotools.org/display/DOCRFSPG/2.3.+Reference+Set.

**Fig. 3.** Size of original and condensed proofs (left). Ratio of proof size depending on the signature coverage (right).

for specific use cases. We extracted justifications similarly to the previous experiment, but did not rename predicates and considered only proof tasks that use at least 5 symbols from the signature, since otherwise no improvement can be expected from using the signatures. For each signature, we randomly selected 500 out of 6,689,452 *proof tasks* (if at least 500 existed). This left the 4 reference sets *General Practitioner/Family Practitioner* (GPFP), *Global Patient Set* (GPS), *International Patient Summary* (IPS), and the one included in the SNOMED CT distribution (DEF). For each of the resulting 2,000 proof tasks, we used Elk [16] and our proof minimization approach to obtain (a) a proof of minimal size and (b) a proof of minimal size after hiding the selected signature. The distribution of proof sizes can be seen in Fig. 3. In 770/2,000 cases, a smaller proof was generated when using the signature. In 91 of these cases, the size was even reduced to 1, i.e., the target axiom used only the given signature and therefore nothing else needed to be shown. In the other 679 cases with reduced size, the average *ratio* of reduced size to original size was 0.68–0.93 (depending on the signature). One can see that this ratio is correlated with the *signature coverage* of the original proof (i.e., the ratio of signature symbols to total symbols in the proof), with a weak or strong correlation depending on the signature (r between −0.26 and −0.74). However, a substantial number of proofs with relatively high signature coverage could still not be reduced in size at all (see the top right of the right diagram). In summary, signature-based condensation can be useful, but this depends on the proof task and the signature. We also conducted experiments on the Galen ontology,<sup>5</sup> with comparable results (see the extended version of this paper [1]).

<sup>5</sup> https://bioportal.bioontology.org/ontologies/GALEN.

# **6 Conclusion**

We have presented and compared the proof generation and presentation methods used in Evonne, a visual tool for explaining entailments of DL ontologies. While these methods produce smaller or less deep proofs, which are thus easier to present, there is still room for improvement. Specifically, as the forgetting-based proofs do not provide the same degree of detail as the Elk proofs, it would be desirable to also support methods for more expressive DLs that generate proofs with smaller inference steps. Moreover, our current evaluation focuses on proof size and depth; to understand how well Evonne helps users to understand DL entailments, we would also need a qualitative evaluation of the tool with potential end users. We are also working on explanations for non-entailments using countermodels [7] and a plugin for the ontology editor Protégé that is compatible with the PULi library and the Proof Explanation plugin presented in [15], which will support all proof generation methods discussed here and more.<sup>6</sup>

**Acknowledgements.** This work was supported by the German Research Foundation (DFG) in Germany's Excellence Strategy: EXC-2068, 390729961 - Cluster of Excellence "Physics of Life" and EXC 2050/1, 390696704 - Cluster of Excellence "Centre for Tactile Internet" (CeTI) of TU Dresden, by DFG grant 389792660 as part of TRR 248 - CPEC, by the AI competence center ScaDS.AI Dresden/Leipzig, and the DFG Research Training Group QuantLA, GRK 1763.

# **References**


<sup>6</sup> https://github.com/de-tu-dresden-inf-lat/evee.


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Actions over Core-Closed Knowledge Bases**

Claudia Cauli<sup>1,2</sup>(B), Magdalena Ortiz<sup>3</sup>, and Nir Piterman<sup>1</sup>

> <sup>1</sup> University of Gothenburg, Gothenburg, Sweden claudiacauli@gmail.com
> <sup>2</sup> Amazon Web Services, Seattle, USA
> <sup>3</sup> TU Wien, Vienna, Austria

**Abstract.** We present new results on the application of semantic- and knowledge-based reasoning techniques to the analysis of cloud deployments, in particular to the security of *Infrastructure as Code* configuration files, encoded as description logic knowledge bases. We introduce an action language to model *mutating actions*; that is, actions that change the structural configuration of a given deployment by adding, modifying, or deleting resources. We mainly focus on two problems: the problem of determining whether the execution of an action, no matter the parameters passed to it, will not cause the violation of some security requirement (*static verification*), and the problem of finding sequences of actions that would lead the deployment to a state where (un)desirable properties are (not) satisfied (*plan existence* and *plan synthesis*). For all these problems, we provide definitions, complexity results, and decision procedures.

## **1 Introduction**

The use of automated reasoning techniques to analyze the properties of cloud infrastructure is gaining increasing attention [4–7,18]. Nevertheless, more effort needs to be put into the modeling and verification of generic security requirements over cloud infrastructure before deployment. The availability of formal techniques providing strong security guarantees would assist complex system-level analyses, such as threat modeling and data flow analysis, which currently require considerable time, manual intervention, and expert domain knowledge.

We continue our research on the application of semantic- and knowledge-based reasoning techniques to cloud deployment *Infrastructure as Code* configuration files. In [14], we reported on our experience using expressive description logics to model and reason about Amazon Web Services' proprietary Infrastructure as Code framework (AWS CloudFormation). We used the rich constructs of these logics to encode domain knowledge, simulate closed-world reasoning, and express mitigations and exposures to security threats. Due to the high complexity of basic tasks [3,26], we found that reasoning in such a framework is not efficient at cloud scale. In [15], we introduced *core-closed knowledge*

C. Cauli—This work was done prior to joining Amazon.

c The Author(s) 2022

J. Blanchette et al. (Eds.): IJCAR 2022, LNAI 13385, pp. 281–299, 2022. https://doi.org/10.1007/978-3-031-10769-6_17

*bases*, a lightweight description logic combining closed- and open-world reasoning that is tailored to model cloud infrastructure and efficiently query its security properties. Core-closed knowledge bases enable partially-closed predicates, whose interpretation is closed over a *core* part of the knowledge base but open elsewhere. To encode potential exposure to security threats, we studied the query satisfiability problem and (together with the usual query entailment problem) applied it to a new class of conjunctive queries that we call Must/May queries. We were able to answer such queries over core-closed knowledge bases in LogSpace data complexity and NP combined complexity, improving on the NExpTime complexity required for satisfiability over ALCOIQ (used in [14]).

Here, we enhance the quality of the analyses done over pre-deployment artifacts, giving users and practitioners additional precise insights on the impact of potential changes, fixes, and general improvements to their cloud projects. We enrich core-closed knowledge bases with the notion of *core-completeness*, which is needed to ensure that updates are consistent. We define the syntax and semantics of an action language that is expressive enough to encode *mutating* API calls, i.e., operations that change a cloud deployment configuration by creating, modifying, or deleting existing resources. As part of our effort to improve the quality of automated analysis, we also provide relevant reasoning tools to identify and predict the consequences of these changes. To this end, we consider procedures that determine whether the execution of a mutating action always preserves given properties (*static verification*); determine whether there exists a sequence of operations that would lead a deployment to a configuration meeting certain requirements (*plan existence*); and find such sequences of operations (*plan synthesis*).

The paper is organized as follows. In Sect. 2, we provide background on core-closed knowledge bases, conjunctive queries, and Must/May queries. In Sect. 3, we motivate and introduce the notion of *core-completeness*. In Sect. 4, we define the action language. In Sect. 5, we describe the static verification problem and characterize its complexity. In Sect. 6, we address the planning problem and concentrate on the synthesis of minimal plans satisfying a given requirement expressed using Must/May queries. We discuss related work in Sect. 7 and conclude in Sect. 8. Results and proofs omitted from this paper can be found in the full version [16].

# **2 Background**

Description logics (DLs) are a family of logics for encoding knowledge in terms of concepts, roles, and individuals, analogous to the unary predicates, binary predicates, and constants of first-order logic, respectively. Standard DL knowledge bases (KBs) have a set of axioms, called the *TBox*, and a set of assertions, called the *ABox*. The TBox contains axioms that relate concepts and roles. The ABox contains assertions that relate individuals to concepts and pairs of individuals to roles. KBs are usually interpreted under the open-world assumption, meaning that the asserted facts are not assumed to be complete.

*Core-Closed Knowledge Bases.* In [15], we introduced core-closed knowledge bases (ccKBs) as a description logic formalism suitable for encoding cloud deployments. The main characteristic of ccKBs is to allow for a combination of open- and closed-world reasoning that ensures tractability. A DL-Lite<sup>F</sup> ccKB is a tuple K = ⟨T, A, S, M⟩ built from the standard knowledge base ⟨T, A⟩ and the *core* system ⟨S, M⟩. The former encodes incomplete terminological and assertional knowledge. The latter is, in turn, composed of two parts: S (also called the *SBox*), containing axioms that encode the core structural specifications, and M (also called the *MBox*), containing positive concept and role assertions that encode the core configuration. Syntactically, M is similar to an ABox but, semantically, it is assumed to be complete with respect to the specifications in S.

The ccKB K is defined over the alphabets **C** (of concepts), **R** (of roles), and **I** (of individuals), all partitioned into an open subset and a partially-closed subset. That is, the set of concepts is partitioned into the open concepts **C**<sub>K</sub> and the closed (specification) concepts **C**<sub>S</sub>; the set of roles is partitioned into open roles **R**<sub>K</sub> and closed (specification) roles **R**<sub>S</sub>; and the set of individuals is partitioned into open individuals **I**<sub>K</sub> and closed (model) individuals **I**<sub>M</sub>. We call **C**<sub>S</sub> and **R**<sub>S</sub> core-closed predicates, or partially-closed predicates, as their extension is closed over the core domain **I**<sub>M</sub> and open otherwise. In contrast, we call **C**<sub>K</sub> and **R**<sub>K</sub> open predicates. The syntax of concept and role expressions in DL-Lite<sup>F</sup> [2,8] is as follows:

$$\mathsf{B} ::= \bot \mid \mathsf{A} \mid \exists \mathsf{p}$$

where A denotes a concept name and p is either a role name r or its inverse r<sup>−</sup>. The syntax provides for the following three types of axioms:

$$\mathsf{B}^1 \sqsubseteq \mathsf{B}^2, \qquad \mathsf{B}^1 \sqsubseteq \neg \mathsf{B}^2, \qquad (\mathsf{funct}\ \mathsf{p}),$$

respectively called *positive inclusion* axioms, *negative inclusion* axioms, and *functionality* axioms. These axioms are contained in the sets S and T. To denote precisely the subsets of S and T having only axioms of a given type, we use the notation PI<sub>X</sub>, NI<sub>X</sub>, and F<sub>X</sub>, for X ∈ {S, T}, which respectively contain only positive inclusion axioms, negative inclusion axioms, and functionality axioms. From now on, we denote symbols from the alphabet **X**<sub>X</sub> with the subscript X, and symbols from the generic alphabet **X** with no subscript. In core-closed knowledge bases, axioms and assertions fall into the scope of a different set depending on the predicates and individuals that they refer to, according to the set definitions below.

$$\begin{split} \mathcal{M} & \subseteq \{ \mathsf{A}_{\mathcal{S}}(a_{\mathcal{M}}),\ \mathsf{r}_{\mathcal{S}}(a_{\mathcal{M}},a),\ \mathsf{r}_{\mathcal{S}}(a,a_{\mathcal{M}}) \} \\ \mathcal{A} & \subseteq \{ \mathsf{A}_{\mathcal{K}}(a_{\mathcal{K}}),\ \mathsf{r}_{\mathcal{K}}(a_{\mathcal{K}},b_{\mathcal{K}}),\ \mathsf{A}_{\mathcal{S}}(a_{\mathcal{K}}),\ \mathsf{r}_{\mathcal{S}}(a_{\mathcal{K}},b_{\mathcal{K}}) \} \\ \mathcal{S} & \subseteq \{ \mathsf{B}^{1}_{\mathcal{S}} \sqsubseteq \mathsf{B}^{2}_{\mathcal{S}},\ \mathsf{B}^{1}_{\mathcal{S}} \sqsubseteq \neg\mathsf{B}^{2}_{\mathcal{S}},\ (\mathsf{funct}\ \mathsf{p}_{\mathcal{S}}) \} \\ \mathcal{T} & \subseteq \{ \mathsf{B}^{1} \sqsubseteq \mathsf{B}^{2}_{\mathcal{K}},\ \mathsf{B}^{1} \sqsubseteq \neg\mathsf{B}^{2}_{\mathcal{K}},\ (\mathsf{funct}\ \mathsf{p}_{\mathcal{K}}) \} \end{split}$$

In the above definition of the set M, role assertions link at least one individual from the core domain **I**<sub>M</sub> (denoted a<sub>M</sub>) to one individual from the general set **I** (denoted a). The node a can be an individual from either the open partition **I**<sub>K</sub> or the closed partition **I**<sub>M</sub>. When a is an element of the set **I**<sub>K</sub>, we refer to it as a *boundary node*, as it sits at the boundary between the core and the open parts of the knowledge base. As mentioned earlier, M-assertions are assumed to be complete and consistent with respect to the terminological knowledge given in S, whereas the usual open-world assumption is made for A-assertions. The semantics of a DL-Lite<sup>F</sup> core-closed KB is given in terms of interpretations I, consisting of a non-empty domain Δ<sup>I</sup> and an interpretation function ·<sup>I</sup>. The latter assigns to each concept A a subset A<sup>I</sup> of Δ<sup>I</sup>, to each role r a subset r<sup>I</sup> of Δ<sup>I</sup> × Δ<sup>I</sup>, and to each individual a a node a<sup>I</sup> in Δ<sup>I</sup>, and it is extended to concept expressions in the usual way. An interpretation I is a model of an inclusion axiom B<sub>1</sub> ⊑ B<sub>2</sub> if B<sub>1</sub><sup>I</sup> ⊆ B<sub>2</sub><sup>I</sup>. An interpretation I is a model of a membership assertion A(a) (resp. r(a, b)) if a<sup>I</sup> ∈ A<sup>I</sup> (resp. (a<sup>I</sup>, b<sup>I</sup>) ∈ r<sup>I</sup>). We say that I models T, S, and A if it models all axioms or assertions contained therein. We say that I models M, denoted I ⊨<sub>CWA</sub> M, when it models an M-assertion f *if and only if* f ∈ M. Finally, I models K if it models T, S, A, and M. When K has at least one model, we say that K is satisfiable.

In the remainder of this paper, we will sometimes refer to the *lts* interpretation of M. The *lts* interpretation of M, denoted lts(M), is the interpretation (Δ<sup>lts(M)</sup>, ·<sup>lts(M)</sup>) defined only over the concept and role names from the sets **C**<sub>S</sub> and **R**<sub>S</sub>, respectively, and over the individual names from **I**<sub>K</sub> that appear in the scope of M-assertions. The interpretation lts(M) is the *unique* model of M such that lts(M) ⊨<sub>CWA</sub> M.
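To make the definition concrete, lts(M) can be computed by a single pass over the M-assertions. The following is a minimal Python sketch; the tuple-based encoding of assertions and all concrete names are our own illustration, not part of the paper's formalism.

```python
# Sketch: build the lts interpretation of an MBox M, represented as a set
# of assertions. Concept assertions are pairs ("A", "a"); role assertions
# are triples ("r", "a", "b"). Names are illustrative only.

def lts(mbox):
    """Return (domain, concept_ext, role_ext): the unique closed-world
    model of M over the names occurring in M-assertions."""
    domain, concepts, roles = set(), {}, {}
    for assertion in mbox:
        if len(assertion) == 2:            # concept assertion A(a)
            name, a = assertion
            domain.add(a)
            concepts.setdefault(name, set()).add(a)
        else:                              # role assertion r(a, b)
            name, a, b = assertion
            domain.update((a, b))
            roles.setdefault(name, set()).add((a, b))
    return domain, concepts, roles

mbox = {("S3::Bucket", "b"), ("loggingConfiguration", "b", "c")}
dom, con, rol = lts(mbox)
```

Since lts(M) contains exactly the asserted facts and nothing more, it is by construction the unique model of M under the closed-world reading.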

In the application presented in [14], description logic KBs are used to encode machine-readable deployment files containing multiple resource declarations. Every resource declaration has an underlying tree structure, whose leaves can potentially link to the roots of other resource declarations. Let **I**<sub>r</sub> ⊆ **I**<sub>M</sub> be the set of all resource nodes; we encode their resource declarations in M, and formalize the resulting forest structure by partitioning M into multiple subsets {M<sub>i</sub>}<sub>i∈**I**<sub>r</sub></sub>, each representing a tree of assertions rooted at a resource node i (we generally refer to constants in M as nodes). For the purpose of this work, we will refer to core-closed knowledge bases where M is partitioned as described; that is, ccKBs such that K = ⟨T, A, S, {M<sub>i</sub>}<sub>i∈**I**<sub>r</sub></sub>⟩.

*Conjunctive Queries.* A *conjunctive query* (CQ) is an existentially quantified formula q[x] of the form ∃y.*conj*(x, y), where *conj* is a conjunction of positive atoms and, potentially, inequalities. A *union of conjunctive queries* (UCQ) is a disjunction of CQs. The variables in x are called *answer variables*; those in y are the existentially quantified *query variables*. A tuple c of constants appearing in the knowledge base K is an answer to q if I |= q[c] for every model I of K. We call these tuples the *certain answers* of q over K, denoted ans(K, q), and the problem of testing whether a tuple is a certain answer *query entailment*. A tuple c of constants appearing in K satisfies q if I |= q[c] for some model I of K. We call these tuples the *sat answers* of q over K, denoted sat-ans(K, q), and the problem of testing whether a given tuple is a sat answer *query satisfiability*.

Must/May *Queries.* A Must/May query ψ [15] is a Boolean combination of nested UCQs in the scope of a Must or a May operator as follows:

$$\psi ::= \neg \psi \mid \psi_1 \land \psi_2 \mid \psi_1 \lor \psi_2 \mid \mathsf{Must}\ \varphi \mid \mathsf{May}\ \varphi_{\approx}$$

where ϕ and ϕ≈ are unions of conjunctive queries potentially containing inequalities. The reasoning needed for answering the nested queries can be decoupled from the reasoning needed to answer the higher-level formula: nested queries Must ϕ are reduced to conjunctive query entailment, and nested queries May <sup>ϕ</sup>≈ are reduced to conjunctive query satisfiability. We denote by ANS(ψ, <sup>K</sup>) the answers of a Must/May query <sup>ψ</sup> over the core-closed knowledge base <sup>K</sup>.
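This decoupling means that, once the nested UCQs have been answered (Must via query entailment, May via query satisfiability), evaluating ψ reduces to ordinary Boolean evaluation. A hedged Python sketch, where the tuple encoding of formulas and the precomputed table of nested answers are our own assumptions, not the paper's API:

```python
# Sketch: evaluate a Must/May formula given precomputed answers for the
# nested queries. Formulas are nested tuples; leaves ("must", q) and
# ("may", q) are looked up in the table `nested`. Names are illustrative.

def eval_mustmay(formula, nested):
    op = formula[0]
    if op in ("must", "may"):          # leaf: precomputed entailment /
        return nested[formula]         # satisfiability result
    if op == "not":
        return not eval_mustmay(formula[1], nested)
    if op == "and":
        return all(eval_mustmay(f, nested) for f in formula[1:])
    if op == "or":
        return any(eval_mustmay(f, nested) for f in formula[1:])
    raise ValueError(f"unknown operator: {op}")

# psi = Must q1 AND NOT May q2, with hypothetical nested answers:
psi = ("and", ("must", "q1"), ("not", ("may", "q2")))
nested = {("must", "q1"): True, ("may", "q2"): False}
```

The cost of the Boolean layer is linear in |ψ|; all the reasoning effort sits in computing the leaves.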

## **3 Core-Complete Knowledge Bases**

The algorithm Consistent presented in [15] checks satisfiability of DL-Lite<sup>F</sup> core-closed knowledge bases under the assumption that M is complete and consistent with respect to S. This assumption effectively means that the information contained in M is *explicitly* present and *cannot be completed by inference*. The algorithm relies on the existence of a theoretical object, the canonical interpretation, in which missing assertions can always be introduced when they are logically implied by the positive inclusion axioms. In fact, positive inclusion axioms are not even included in the inconsistency formula built for the satisfiability check, as it is proven that the canonical interpretation always satisfies them ([15], Lemma 3). When the assumption that M is consistent with respect to S is dropped, the algorithm Consistent becomes insufficient to check satisfiability. We illustrate this with an example.

*Example 1 (Required Configuration).* Let us consider the axioms constraining the AWS resource type S3::Bucket; in particular, the S-axiom S3::Bucket ⊑ ∃loggingConfiguration, prescribing that all buckets must have a *required* logging configuration. For a set M = {S3::Bucket(b)}, according to the partially-closed semantics of core-closed knowledge bases, the absence of an assertion loggingConfiguration(b, x), for some x, is interpreted as the assertion being false in M, which is therefore not consistent with respect to S. However, the algorithm Consistent will check the *lts* interpretation of M against an empty formula (as there are no negative inclusion or functionality axioms) and return *true*.

In essence, the algorithm Consistent does not check satisfiability of the whole core-closed knowledge base, but only of its open part. Satisfiability of M with respect to the positive inclusion axioms in S needs to be checked separately. We introduce a new notion, distinct from consistency, to denote when a set M is complete with respect to S. Let K = ⟨T, A, S, M⟩ be a DL-Lite<sup>F</sup> core-closed knowledge base; we say that K is *core-complete* when M models *all* positive inclusion axioms in S under a closed-world assumption; we say that K is *open-consistent* when M and A model all negative inclusion and functionality axioms in K's negative inclusion closure. Finally, we say that K is *fully satisfiable* when it is both *core-complete* and *open-consistent*.

**Lemma 1.** *In order to check* full satisfiability *of a DL-Lite*<sup>F</sup> *core-closed KB, one simply needs to check if* K *is* core-complete *(that is, if* M *models all* positive axioms *in* S *under a closed-world assumption) and if* K *is* open-consistent *(that is, to run the algorithm* Consistent*).*

*Proof.* Dropping the assumption that M is consistent w.r.t. S causes Lemma 3 from [15] to fail. In particular, the canonical interpretation of K, can(K), would still be a model of PI<sub>T</sub>, A, and M, but may *not* be a model of PI<sub>S</sub>. This is due to the construction of the canonical model, which is based on the notion of applicable axioms. In rules **c5–c8** of [15], Definition 1, axioms in PI<sub>S</sub> are defined as applicable to assertions involving open nodes a<sub>K</sub> but *not* to model nodes a<sub>M</sub> in **I**<sub>M</sub>. As a result, if the implications of such axioms on model nodes are not included in M itself, then they will not be included in can(K) either, and can(K) will not be a model of PI<sub>S</sub>. On the other hand, one can easily verify that Lemmas 1, 2, 4, 5, 6, and 7 and Corollary 1 still hold, as they do not rely on the assumption. However, since it is no longer guaranteed that M satisfies all positive inclusion axioms from S, the *if* direction of [15], Theorem 1, no longer holds: there can be an unsatisfiable ccKB K = ⟨T, A, S, M⟩ such that db(A) ∪ lts(M) |= cln(T ∪ S), for instance the knowledge base from Example 1. We also note that the negative inclusion and functionality axioms from S will be checked anyway by the consistency formula, both on db(A) and on lts(M).

**Lemma 2.** *Checking whether a DL-Lite*<sup>F</sup> *core-closed knowledge base is* core-complete *can be done in polynomial time in* M*. As a consequence, checking full satisfiability can also be done in polynomial time in* M*.*

*Proof.* One can write an algorithm that checks *core-completeness* by searching for the existence of a positive inclusion axiom B<sup>1</sup><sub>S</sub> ⊑ B<sup>2</sup><sub>S</sub> ∈ PI<sub>S</sub> and a node a<sub>M</sub> such that M |= B<sup>1</sup><sub>S</sub>(a<sub>M</sub>) and M ⊭ B<sup>2</sup><sub>S</sub>(a<sub>M</sub>), where the relation |= is defined over DL-Lite<sup>F</sup> concept expressions as follows:

$$\begin{array}{rcl} \mathcal{M} \models \bot(a_{\mathcal{M}}) & \leftrightarrow & \mathit{false} \\ \mathcal{M} \models \mathsf{A}_{\mathcal{S}}(a_{\mathcal{M}}) & \leftrightarrow & \mathsf{A}_{\mathcal{S}}(a_{\mathcal{M}}) \in \mathcal{M} \\ \mathcal{M} \models \exists \mathsf{r}_{\mathcal{S}}(a_{\mathcal{M}}) & \leftrightarrow & \exists b.\ \mathsf{r}_{\mathcal{S}}(a_{\mathcal{M}}, b) \in \mathcal{M} \\ \mathcal{M} \models \exists \mathsf{r}_{\mathcal{S}}^{-}(a_{\mathcal{M}}) & \leftrightarrow & \exists b.\ \mathsf{r}_{\mathcal{S}}(b, a_{\mathcal{M}}) \in \mathcal{M}. \end{array}$$

The knowledge base is *core-complete* if no such axiom and node can be found.
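The polynomial check behind Lemma 2 can be sketched directly from the four cases of the relation above. In this illustrative Python fragment, the tuple encoding of M, of concept expressions, and of PI<sub>S</sub> is our own assumption:

```python
# Sketch of the core-completeness check of Lemma 2. An MBox M is a set of
# tuples: ("A", a) for concept assertions, ("r", a, b) for role assertions.
# Concept expressions are ("bot",), ("A", name), ("exists", role) for
# "exists r", or ("exists-", role) for "exists r^-". Axioms in PI_S are
# pairs (B1, B2). Names are illustrative, not the paper's.

def holds(mbox, expr, a):
    """M |= B(a), following the four cases of the relation above."""
    kind = expr[0]
    if kind == "bot":
        return False
    if kind == "A":
        return (expr[1], a) in mbox
    if kind == "exists":      # some r(a, b) in M
        return any(len(t) == 3 and t[0] == expr[1] and t[1] == a for t in mbox)
    if kind == "exists-":     # some r(b, a) in M
        return any(len(t) == 3 and t[0] == expr[1] and t[2] == a for t in mbox)
    raise ValueError(f"unknown expression: {expr}")

def core_complete(mbox, pi_s, model_nodes):
    """No axiom B1 <= B2 and node a may have M |= B1(a) but not M |= B2(a)."""
    return all(holds(mbox, b2, a)
               for b1, b2 in pi_s
               for a in model_nodes
               if holds(mbox, b1, a))

# Example 1 revisited: S3::Bucket requires a loggingConfiguration.
pi_s = [(("A", "S3::Bucket"), ("exists", "loggingConfiguration"))]
```

Each axiom is tested against each model node with a linear scan of M, so the whole check stays polynomial in |M|, as the lemma states.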

## **4 Actions**

We now introduce a formal language to encode mutating actions. Recall that, in our application of interest, the execution of a mutating action modifies the configuration of a deployment by adding new resource instances, deleting existing ones, or modifying their settings. Here, we introduce a framework for DL-Lite<sup>F</sup> core-closed knowledge base updates, triggered by the execution of an action that enables all of the above-mentioned effects. The only component of the core-closed knowledge base that is modified by the action execution is M, while T, S, and A remain unchanged. As a consequence of updating M, actions can introduce new individuals and delete old ones, thus updating the set **I**<sub>M</sub> as well. Note that this may force changes outside **I**<sub>M</sub> due to the axioms in T and S. The effects of applying an action over M depend on a set of input parameters that will be instantiated at execution time, resulting in different assertions being added to or removed from M. As a consequence of assertions being added, fresh individuals might be introduced in the active domain of M, including both model nodes from **I**<sub>M</sub> and boundary nodes from **I**<sub>B</sub>. Conversely, as a consequence of assertions being removed, individuals might be removed from the active domain of M, including model nodes from **I**<sub>M</sub> but *not* boundary nodes from **I**<sub>B</sub>. In fact, boundary nodes are owned by the open portion of the knowledge base and are known to exist regardless of whether they are used in M. We invite the reader to review the set definitions for A- and M-assertions (Sect. 2) to note that it is indeed possible for a generic boundary individual a involved in an M-assertion to also be involved in an A-assertion.

#### **4.1 Syntax**

An action is defined by a signature and a body. The signature consists of an action name and a list of formal parameters, which will be replaced with actual parameters at execution time. The body, or action effect, can include conditional statements and concatenations of atomic operations over M-assertions. For example, let α be the action act(x) = γ; that is, the action denoted by signature act(x) and body γ, with signature name act, signature parameters x, and body effect γ. Since it contains unbound parameters, or free variables, action α is ungrounded and needs to be instantiated with actual values in order to be executed over a set M. In the following, we assume the existence of a set Var of variable names, and consider a generic input parameter substitution θ : Var → **I**, which replaces each variable name by an individual node. For simplicity, we will denote an ungrounded action by its effect γ, and a grounded action by the composition of its effect with an input parameter substitution, γθ. Action effects can be either *complex* or *basic*. The syntax of complex action effects γ and basic effects β is given by the following grammar.

$$\begin{array}{l} \gamma ::= \epsilon \ \mid\ \beta \cdot \gamma \ \mid\ [\,\varphi \leadsto \beta\,] \cdot \gamma \\ \beta ::= \oplus_x S \ \mid\ \ominus_x S \ \mid\ \oplus_{x_{new}} S \ \mid\ \ominus_x \end{array}$$

The complex action effects γ include: the empty effect (ε), the execution of a basic effect followed by a complex one (β · γ), and the conditional execution of a basic effect upon evaluation of a formula ϕ over the set M ([ϕ ⇝ β] · γ). The basic action effects β include: the addition of a set S of M-assertions to the subset M<sub>x</sub> (⊕<sub>x</sub>S), the removal of a set S of M-assertions from the subset M<sub>x</sub> (⊖<sub>x</sub>S), the addition of a fresh subset M<sub>x<sub>new</sub></sub> containing all the M-assertions in the set S (⊕<sub>x<sub>new</sub></sub>S), and the removal of an existing M<sub>x</sub> subset in its entirety (⊖<sub>x</sub>). The set S, the formula ϕ, and the operators ⊕/⊖ might contain *free variables*. These variables are of two types: *(1)* variables that are replaced by the grounding of the action input parameters, and *(2)* variables that are the answer variables of the formula ϕ and appear in the nested effect β.
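The grammar can be mirrored one-to-one by a small set of datatypes: a complex effect is then simply a sequence of (optionally guarded) basic effects. A Python sketch under our own encoding assumptions; the class names and the sample effect are illustrative, not the paper's notation:

```python
from dataclasses import dataclass

# Sketch: one datatype per production of the basic-effect grammar. A complex
# effect is a Python list of steps; a step is either a basic effect or a
# pair (phi, basic_effect) encoding the conditional construct.

@dataclass(frozen=True)
class Add:        # add the assertions S to the subset M_x
    x: str
    assertions: frozenset

@dataclass(frozen=True)
class Remove:     # remove the assertions S from M_x
    x: str
    assertions: frozenset

@dataclass(frozen=True)
class AddFresh:   # create a fresh subset M_x holding the assertions S
    x: str
    assertions: frozenset

@dataclass(frozen=True)
class Delete:     # delete the subset M_x entirely
    x: str

# A hypothetical one-step effect creating a fresh subset for a new bucket:
effect = [AddFresh("x", frozenset({("S3::Bucket", "x"),
                                   ("accessControl", "x", "y")}))]
```

Frozen dataclasses keep effects immutable, so a grounded action can be obtained by mapping a substitution over the stored assertion templates without mutating the original definition.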

*Example 2.* The following is the definition of the action createBucket from the API reference of the AWS resource type S3::Bucket. There are two input parameters: the new bucket name "name" and the canned access control list "acl" (one of *Private*, *PublicRead*, *PublicReadWrite*, *AuthenticatedRead*, etc.). The effect of the action is to add a fresh subset M<sub>x</sub> for the newly introduced individual x containing the two assertions S3::Bucket(x) and accessControl(x, y).

createBucket(x : name, y : acl) = ⊕<sub>x<sub>new</sub></sub>{S3::Bucket(x), accessControl(x, y)} · ε

The action needs to be instantiated by a specific parameter assignment, for example the substitution θ = [x ← DataBucket, y ← Private], which binds the variable x to the node DataBucket and the variable y to the node Private, both taken from a pool of inactive nodes in **I**.

*Action Query* ϕ. The syntax introduced in the previous paragraph allows for complex actions that conditionally execute a basic effect β depending on the evaluation of a formula ϕ over M. This is done via the construct [ϕ ⇝ β] · γ. The formula ϕ might have a set y of answer variables that appear free in its body and are bound to concrete tuples of nodes during evaluation. The answer tuples are in turn used to instantiate the free variables in the nested effect β. We call ϕ the *action query*, since we use it to select all the nodes that will be involved in the action effect. According to the grammar below, ϕ is a Boolean combination of M-assertions potentially containing free variables.

$$\varphi ::= \mathsf{A}_{\mathcal{S}}(t) \mid \mathsf{r}_{\mathcal{S}}(t_1, t_2) \mid \varphi_1 \land \varphi_2 \mid \varphi_1 \lor \varphi_2 \mid \neg \varphi$$

In particular, A<sub>S</sub> is a symbol from the set **C**<sub>S</sub> of partially-closed concepts; r<sub>S</sub> is a symbol from the set **R**<sub>S</sub> of partially-closed roles; and t, t<sub>1</sub>, t<sub>2</sub> are either individual or variable names from the set **I** ∪ Var, chosen in such a way that the resulting assertion is an M-assertion. Since the formula ϕ can only refer to M-assertions, which are interpreted under a closed semantics, its evaluation requires looking only at the content of the set M. A formula ϕ with no free variables is a Boolean formula and evaluates to either true or false. A formula ϕ with answer variables y and arity ar(ϕ) evaluates to all the tuples t, of size equal to the arity of ϕ, that make the formula true in M. The free variables of ϕ can only appear in the effect β of the construct [ϕ ⇝ β]. We denote by ANS(ϕ, M) the set of answers to the action query ϕ over M. It is easy to see that the maximum number of tuples that could be returned by the evaluation (that is, the size of the set ANS(ϕ, M)) is bounded by |**I**<sub>M</sub> ∪ **I**<sub>B</sub>|<sup>ar(ϕ)</sup>, in turn bounded by (2|M|)<sup>2|ϕ|</sup>.
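The bound above suggests a naive but correct evaluation strategy: enumerate all bindings of the answer variables over the active domain of M. A Python sketch for the positive-conjunction case; the atom encoding, the "?"-variable convention, and the example assertions are our own assumptions:

```python
from itertools import product

# Sketch: naive evaluation of a positive action query over M. Atoms are
# tuples ("pred", t1, ...); terms beginning with "?" are variables. Since
# M is closed, an atom holds iff it literally occurs in M.

def action_query_answers(mbox, atoms, answer_vars):
    domain = {t for a in mbox for t in a[1:]}       # active domain of M
    answers = set()
    for values in product(domain, repeat=len(answer_vars)):
        binding = dict(zip(answer_vars, values))
        ground = lambda t: binding.get(t, t)
        if all((atom[0], *map(ground, atom[1:])) in mbox for atom in atoms):
            answers.add(values)
    return answers

# Loosely following Example 3: one encryption rule attached to bucket b.
mbox = {("S3::Bucket", "b"), ("encrRule", "b", "r1"),
        ("SSEKey", "r1", "k1"), ("SSEAlgo", "r1", "AES256")}
phi = [("S3::Bucket", "b"), ("encrRule", "b", "?y"),
       ("SSEKey", "?y", "?k"), ("SSEAlgo", "?y", "?z")]
answers = action_query_answers(mbox, phi, ["?y", "?k", "?z"])
```

The loop visits exactly |adom(M)|<sup>ar(ϕ)</sup> candidate tuples, matching the bound stated in the text; a practical implementation would of course use join-based evaluation instead.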

*Example 3.* The following example shows the encoding of the S3 API operation called deleteBucketEncryption, which requires as its only input parameter the name of the bucket whose encryption configuration is to be deleted. Since a bucket can have multiple encryption configuration rules (each prescribing different encryption keys and algorithms to be used), we use an action query ϕ to select *all* the nodes that match the assertion structure to be removed.

$$\varphi[y,k,z](x) = \mathsf{S3{::}Bucket}(x) \land \mathsf{encrRule}(x,y) \land \mathsf{SSEKey}(y,k) \land \mathsf{SSEAlgo}(y,z)$$

The query ϕ is instantiated by the specific bucket instance (which will replace the variable x) and returns all the triples (y, k, z) of encryption rule, key, and algorithm, respectively, which identify the assertions corresponding to the different encryption configurations that the bucket has. The answer variables are then used in the action effect to instantiate the assertions to remove from Mx:

```
deleteBucketEncryption(x : name)
   = [ϕ[y, k, z](x) ⇝
        ⊖x{encrRule(x, y), SSEKey(y, k), SSEAlgo(y, z)}] · ε
```
#### **4.2 Semantics**

So far, we have described the syntax of our action language and provided two examples that showcase the encoding of real-world API calls. Now, we define the semantics of action effects with respect to the changes that they induce over a knowledge base. Recall that, given a substitution θ for the input parameters of an action γ, we denote by γθ the grounded action where all input variables are replaced as prescribed by θ. Recall also that the effects of an action apply only to assertions in M and individuals from **I**<sub>M</sub>, and cannot affect nodes and assertions from the open portion of the knowledge base.

The execution of a grounded action γθ over a DL-Lite<sup>F</sup> core-closed knowledge base K = (T, A, S, M), defined over the set **I**<sub>M</sub> of partially-closed individuals, generates a new knowledge base K<sub>γθ</sub> = (T, A, S, M<sub>γθ</sub>), defined over an updated set of partially-closed individuals **I**<sub>M<sub>γθ</sub></sub>. Let S be a set of M-assertions, γ a complex action, θ an input parameter substitution, and ρ a generic substitution that potentially replaces all free variables in the action γ. Let ρ<sub>1</sub> and ρ<sub>2</sub> be two substitutions with signature Var → **I** such that dom(ρ<sub>1</sub>) ∩ dom(ρ<sub>2</sub>) = ∅; we denote their composition by ρ<sub>1</sub>ρ<sub>2</sub> and define it as the new substitution such that ρ<sub>1</sub>ρ<sub>2</sub>(x) = a if ρ<sub>1</sub>(x) = a ∨ ρ<sub>2</sub>(x) = a, and ρ<sub>1</sub>ρ<sub>2</sub>(x) = ⊥ if ρ<sub>1</sub>(x) = ⊥ ∧ ρ<sub>2</sub>(x) = ⊥. We formalize the application of the grounded action γθ as the transformation T<sub>γθ</sub> that maps the pair ⟨M, **I**<sub>M</sub>⟩ into the new pair ⟨M′, **I**′<sub>M</sub>⟩. We sometimes use the notation T<sub>γθ</sub>(M) or T<sub>γθ</sub>(**I**<sub>M</sub>) to refer to the updated MBox or to the updated set of model nodes, respectively. The rules for applying the transformation depend on the structure of the action γ and are reported in Fig. 1. The transformation starts with an initial generic substitution ρ = θ. As the transformation progresses, the generic substitution ρ can be updated only as a result of the evaluation of an action query ϕ over M. Precisely, all the tuples t<sub>1</sub>, ..., t<sub>n</sub> making ϕ true in M will be considered and composed with the current substitution ρ, generating n fresh substitutions ρt<sub>1</sub>, ..., ρt<sub>n</sub>, which are used in the subsequent application of the nested effect β. Since the core M of the knowledge base K changes at every action execution, its domain of model nodes **I**<sub>M</sub> changes as well.
The execution of an action γθ over the knowledge base K = (T , A, S, M) with set of model nodes **I**M could generate a new Kγθ = (T , A, S, Mγθ), with a new set of model nodes **I**Mγθ, that is not *core-complete* or not *open-consistent* (see Sect. 3 for the corresponding definitions). We illustrate two examples next.

*Example 4 (Violation of core-completeness).* Consider the case where the general specifications of the system require all objects of type bucket to have a logging configuration, and an action that removes the logging configuration from a bucket. Consider the core-closed knowledge base K where S = {S3::Bucket ⊑ ∃loggingConfiguration} and M = {S3::Bucket(b), loggingConfiguration(b, c)} (consistent w.r.t. S) and the action γ defined as

$$\begin{aligned} \mathsf{deleteLoggingConfiguration}(x : name) = [\, &(\varphi[y](x) = \mathsf{S3{::}Bucket}(x) \land \mathsf{loggingConfiguration}(x, y)) \\ &\leadsto \ominus\{\mathsf{loggingConfiguration}(x, y)\}\,] \cdot \epsilon \end{aligned}$$

For the input parameter substitution θ = [x ← b], it is easy to see that the transformation Tγθ applied to M results in the update Mγθ = {S3::Bucket(b)}, which is *not* core-complete.

*Example 5 (Violation of open-consistency).* Consider the case where an action application indirectly affects boundary nodes and their properties, leading to inconsistencies in the open portion of the knowledge base. For example, the knowledge base may prescribe that buckets used to store logs cannot be public; a change in the configuration of a bucket instance then causes a second bucket (initially known to be public) to also become a log store. In particular, this happens when the knowledge base K contains the T -axiom ∃loggingDestination<sup>−</sup> ⊑ ¬PublicBucket and the A-assertion PublicBucket(b), and we apply an action that introduces a new bucket storing its logs to b, defined as follows:

> createBucketWithLogging(x : name, y : log) = ⊕{S3::Bucket(x), loggingDestination(x, y)}

For the input parameter substitution θ = [x ← newBucket, y ← b], the result of applying the transformation Tγθ is the set Mγθ = {S3::Bucket(newBucket), loggingDestination(newBucket, b)} which, combined with the pre-existing and unchanged sets T and A, causes the updated Kγθ to be *not* open-consistent.

From a practical point of view, these examples highlight the need to re-evaluate core-completeness and open-consistency of a core-closed knowledge base after each action execution. Detecting a violation of core-completeness signals that we have modeled an action that is inconsistent with respect to the system's specifications, which most likely means that the action is missing something and needs to be revised. Detecting a violation of open-consistency signals that our action, even when consistent with respect to the specifications, introduces a change that conflicts with other assumptions that we made about the system, and generally indicates that we should either revise the assumptions or forbid the application of the action. Both cases are important to consider in the development life cycle of the core-closed KB and the action definitions.
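
These checks can be prototyped over a simple set-based representation of M. The following Python sketch re-evaluates core-completeness after an action, in the spirit of Example 4; the data layout and the helper name `core_complete` are our own illustrative assumptions, not the paper's formalism.

```python
# Illustrative sketch (assumed representation): M is split into concept and
# role assertions; S is a list of closure axioms of the form "C ⊑ ∃r".

def core_complete(mbox, closure_axioms):
    """Check that every axiom 'C ⊑ ∃r' in S is satisfied by M: each
    individual asserted to be in C must have an outgoing r-edge in M."""
    for concept, role in closure_axioms:
        members = {a for (c, a) in mbox["concepts"] if c == concept}
        with_role = {a for (r, a, b) in mbox["roles"] if r == role}
        if not members <= with_role:
            return False
    return True

# M from Example 4, before and after deleteLoggingConfiguration(b).
S = [("S3::Bucket", "loggingConfiguration")]
M_before = {"concepts": {("S3::Bucket", "b")},
            "roles": {("loggingConfiguration", "b", "c")}}
M_after = {"concepts": {("S3::Bucket", "b")}, "roles": set()}

print(core_complete(M_before, S))  # True
print(core_complete(M_after, S))   # False: violation of core-completeness
```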

## **5 Static Verification**

In this section, we investigate the problem of deciding whether the execution of an action, no matter the specific instantiation, always preserves given properties of core-closed knowledge bases. We focus on properties expressed as Must/May queries and define the static verification problem as follows.

**Definition 1 (Static Verification).** *Let* K *be a DL-Lite*<sup>F</sup> *core-closed knowledge base,* q *a* Must*/*May *query, and* γ *an action with free variables from the language presented above. Let* θ *be an assignment for the input variables of* γ *that transforms* γ *into the* grounded *action* γθ*, and let* Kγθ *be the DL-Lite*<sup>F</sup> *core-closed knowledge base resulting from the application of the grounded action* γθ *onto* K*. We say that the action* γ preserves q over K *iff for every grounded instance* γθ *we have that* ANS(q, K) = ANS(q, Kγθ)*. The static verification problem is that of determining whether an action* γ *is* q*-preserving over* K*.*

An action γ is *not* q-preserving over K iff there exists a grounding θ for the input variables of γ such that ANS(q, K) ≠ ANS(q, Kγθ); that is, for a fixed grounding θ, there exists a tuple t for q's answer variables such that t ∈ ANS(q, K) \ ANS(q, Kγθ) or t ∈ ANS(q, Kγθ) \ ANS(q, K).
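
This characterization suggests a direct test via the symmetric difference of answer sets. The sketch below abstracts ANS as a callable and enumerates a given set of groundings; all names (`preserves`, `apply_action`) are illustrative assumptions, not the paper's procedure.

```python
# Illustrative sketch: γ is not q-preserving over K iff some grounding θ
# yields ANS(q, K) ≠ ANS(q, K_γθ).

def preserves(ans, K, apply_action, groundings):
    """Return True iff ANS(q, K) = ANS(q, K_γθ) for every grounding θ."""
    before = set(ans(K))
    for theta in groundings:
        after = set(ans(apply_action(K, theta)))
        # the symmetric difference collects witness tuples t that are
        # in one answer set but not the other
        if before ^ after:
            return False
    return True

# Toy instance: K is a set of facts, the query returns buckets, and the
# action deletes all facts about the individual named by θ.
K0 = frozenset({("S3::Bucket", "b")})
ans = lambda K: {a for (c, a) in K if c == "S3::Bucket"}
delete = lambda K, theta: frozenset(f for f in K if f[1] != theta)

print(preserves(ans, K0, delete, groundings=["c"]))  # True: "c" is unknown
print(preserves(ans, K0, delete, groundings=["b"]))  # False: deletes bucket b
```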

**Theorem 1 (Complexity of the Static Verification Problem).** *The static verification problem, i.e., deciding whether an action* γ *is* q*-preserving over* K*, can be decided in* PTime *in data complexity and in* ExpTime *in the arities of* γ *and* q*.*

*Proof.* The proof relies on the fact that one could: enumerate all possible assignments θ; compute the updated knowledge bases Kγθ; check whether these are fully satisfiable; enumerate all tuples t for the query q; and, finally, check whether there exists at least one such tuple that satisfies q over K but not Kγθ, or vice versa. The number of assignments θ is bounded by (|**I**M ∪ **I**K| + ar(γ))<sup>ar(γ)</sup>, as it is sufficient to replace each variable appearing in the action γ either by a known object from **I**M ∪ **I**K or by a fresh one. The computation of the updated Kγθ is done in polynomial time in the size of M (and is exponential in the size of the action γ), as it may require the evaluation of an internal action query ϕ and the consecutive re-application of the transformation for a number of tuples that is bounded by a polynomial over the size of M. As explained in Sect. 3, checking full satisfiability of the resulting core-closed knowledge base is also polynomial in the size of M. The number of tuples t is bounded by (|**I**M ∪ **I**K| + ar(γ))<sup>ar(q)</sup>, as it is enough to consider all those tuples involving known objects plus the fresh individuals introduced by the assignment θ. Checking whether a tuple t satisfies the query q over a core-closed knowledge base is decided in LogSpace in the size of M [15] and is, thus, also polynomial in M.

## **6 Planning**

As discussed throughout the paper, the execution of a mutating action modifies the configuration of a deployment and potentially changes its posture with respect to a given set of requirements. In the previous two sections, we introduced a language to encode mutating actions and investigated the problem of checking whether the application of an action preserves the properties of a core-closed knowledge base. In this section, we investigate the plan existence and synthesis problems; that is, the problem of deciding whether there exists a sequence of grounded actions that leads the knowledge base to a state where a certain requirement is met, and the problem of finding a set of such plans, respectively. We start by defining a notion of transition system generated by applying actions to a core-closed knowledge base, and then use this notion to study the mentioned planning problems. As in classical planning, the plan existence problem for plans computed over unbounded domains is undecidable [17,19]. The undecidability proof is via reduction from the word problem: deciding whether a deterministic Turing machine M accepts a word w ∈ {0, 1}<sup>∗</sup> is reduced to the plan existence problem. Since undecidability holds even for basic action effects, we can show undecidability over an unbounded domain by using the same encoding as [1].

*Transition Systems.* In the style of [10,21], the combination of a DL-Lite<sup>F</sup> core-closed knowledge base and a set of actions can be viewed as the transition system it generates. Intuitively, the states of the transition system correspond to MBoxes, and the transitions between states are labeled by grounded actions. A DL-Lite<sup>F</sup> core-closed knowledge base K = (T , A, S, M0), defined over the possibly infinite set of individuals **I** (and model nodes **I**M0 ⊆ **I**) and the set Act of ungrounded actions, generates the transition system (TS) ΥK = (**I**, T , A, S, Σ, M0, →), where Σ is a set of *fully satisfiable* (i.e., *core-complete* and *open-consistent*) MBoxes; M0 is the initial MBox; and → ⊆ Σ × LAct × Σ is a labeled transition relation, with LAct the set of all possible *grounded actions*. The sets Σ and → are defined by mutual induction as the smallest sets such that: if Mi ∈ Σ, then for every grounded action γθ ∈ LAct such that the fresh MBox Mi+1 resulting from the transformation Tγθ is core-complete and open-consistent, we have that Mi+1 ∈ Σ and (Mi, γθ, Mi+1) ∈ →.

Since we assume that actions have input parameters that are replaced during execution by values from **I**, which contains both known objects from **I**M ∪ **I**K and possibly infinitely many fresh objects, the generated transition system ΥK is generally infinite. To keep the planning problem decidable, we concentrate on a known finite subset D ⊂ **I** containing all the fresh nodes and value assignments to action variables that are of interest for our application. In the remainder of this paper, we discuss the plan existence and synthesis problems for finite transition systems ΥK = (D, T , A, S, Σ, M0, →), whose states in Σ have a domain that is also bounded by D.

*The Plan Existence Problem.* A plan is a sequence of grounded actions whose execution leads to a state satisfying a given property. Let K = (T , A, S, M0) be a DL-Lite<sup>F</sup> core-closed knowledge base, Act a set of ungrounded actions, and ΥK = (D, T , A, S, Σ, M0, →) its generated finite TS. Let π be a finite sequence γ1θ1 ··· γnθn of grounded actions taken from the set LAct. We call the sequence π *consistent* iff there exists a run $\rho = M_0 \xrightarrow{\gamma_1\theta_1} M_1 \xrightarrow{\gamma_2\theta_2} \cdots \xrightarrow{\gamma_n\theta_n} M_n$ in ΥK. Let q be a Must/May query mentioning objects from adom(K) and t a tuple from the set adom(K)<sup>ar(q)</sup>. A consistent sequence π of grounded actions is a *plan* from K to (t, q) iff t ∈ ANS(q, Kn = (T , A, S, Mn)), with Mn the final state of the run induced by π.

**Definition 2 (Plan Existence).** *Given a DL-Lite*<sup>F</sup> *core-closed knowledge base* <sup>K</sup>*, a tuple* t*, and a* Must*/*May *query* <sup>q</sup>*, the* plan existence *problem is that of deciding whether there exists a plan from* <sup>K</sup> *to* (t, q)*.*

*Example 6.* Let us consider the transition system ΥK generated by the core-closed knowledge base K = (T , A, S, M0) having the set of partially-closed assertions M0 defined as

{S3::Bucket(b), KMS::Key(k), bucketEncryptionRule(b, r), bucketKey(r, k), bucketKeyEnabled(r, true), enableKeyRotation(k, false)}

and the set of action labels Act containing the actions deleteBucket, createBucket, deleteKey, createKey, enableKeyRotation, putBucketEncryption, and deleteBucketEncryption. Let us assume that we are interested in verifying the existence of a sequence of grounded actions that, when applied to the knowledge base, would configure the bucket node b to be encrypted with a rotating key. Formally, this is equivalent to checking the existence of a consistent plan π that, when executed on the transition system ΥK, leads to a state Mn such that the tuple t = b is in the set ANS(q, Kn = (T , A, S, Mn)), for q the query

$$q[x] = \mathsf{S3{::}Bucket}(x) \land \mathsf{Must}\left(\exists y, z.\begin{array}{l} \mathsf{bucketEncryptionRule}(x, y) \land {} \\ \mathsf{bucketKey}(y, z) \land \mathsf{enableKeyRotation}(z, \mathit{true}) \end{array}\right)$$

It is easy to see that the following three sequences of grounded actions are valid plans from K to (b, q):

$$\begin{aligned} \pi\_1 &= \mathsf{enableKeyRotation}(k) \\ \pi\_2 &= \mathsf{createKey}(k\_1) \cdot \mathsf{enableKeyRotation}(k\_1) \cdot \mathsf{putBucketEncryption}(b, k\_1) \\ \pi\_3 &= \mathsf{deleteBucketEncryption}(b, k) \cdot \mathsf{createKey}(k\_1) \cdot \mathsf{enableKeyRotation}(k\_1) \cdot \mathsf{putBucketEncryption}(b, k\_1) \end{aligned}$$

If, for example, a bucket were only allowed to have one encryption rule (by means of a functionality axiom in S), then π2 would not be a valid plan, as it would generate an inconsistent run leading to a state Mi that is not open-consistent w.r.t. S.

**Lemma 3.** *The plan existence problem for a finite transition system* Υ<sup>K</sup> *generated by a DL-Lite*<sup>F</sup> *core-closed knowledge base* K *and a set of actions* Act*, over a finite domain of objects* D*, reduces to graph reachability over a graph whose number of states is at most exponential in the size of* D*.*
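The reduction in Lemma 3 can be sketched as a plain breadth-first reachability search over the (at most exponentially large) state graph. In the sketch below, `successors` and `goal` are abstract callables standing in for the transition relation and the query membership test; states are assumed hashable.

```python
# Illustrative sketch of plan existence as graph reachability (Lemma 3).
from collections import deque

def plan_exists(initial, successors, goal):
    """BFS over the state graph; returns True iff a goal state is reachable."""
    seen = {initial}
    frontier = deque([initial])
    while frontier:
        state = frontier.popleft()
        if goal(state):
            return True
        for _action, nxt in successors(state):
            if nxt not in seen:          # visit each MBox state once
                seen.add(nxt)
                frontier.append(nxt)
    return False

# Toy graph: states are frozensets of facts; one action toggles a flag.
succ = lambda s: [("toggle", s ^ frozenset({"rotating"}))]
print(plan_exists(frozenset(), succ, lambda s: "rotating" in s))  # True
```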

*The Plan Synthesis Problem.* We now focus on the problem of finding plans that satisfy a given condition. As discussed in the previous paragraph, we are mostly driven by query answering; in particular, by conditions corresponding to a tuple (of objects from our starting deployment configuration) satisfying a given requirement expressed as a Must/May query. This problem is meaningful in our application of interest because it corresponds to finding a set of potential sequences of changes that would allow one to reach a configuration satisfying (resp., not satisfying) one, or more, security mitigations (resp., vulnerabilities). We concentrate on DL-Lite<sup>F</sup> core-closed knowledge bases and their generated finite transition systems, where potential fresh objects are drawn from a fixed set D. We are interested in sequences of grounded actions that are minimal, and we ignore sequences that extend them. We sometimes call such minimal sequences *simple plans*. A plan π from an initial core-closed knowledge base K to a goal condition b is minimal (or simple) *iff* there does not exist a plan π′ (from the same initial K to the same goal condition b) such that π = π′ · σ, for σ a non-empty suffix of grounded actions.

In Algorithm 1, we present a depth-first search algorithm that, starting from K, searches for all simple plans that achieve a given target query membership condition. The transition system ΥK is computed, and stored, on the fly in the Successors sub-procedure, and the graph is explored in a depth-first fashion.

**Algorithm 1:** FindPlans(K, D, Act, ⟨t, q⟩)

**Inputs:** A ccKB K = (T , A, S, M0), a domain D, a set of actions Act, and a pair ⟨t, q⟩ of an answer tuple and a Must/May query. **Output:** A possibly empty set Π of consistent simple plans.

```
 1 def FindPlans(K, D, Act, ⟨t, q⟩):
 2     Π := ∅;
 3     S := ⊥;
 4     AllPlanSearch(M0, ε, ∅, K, D, Act, ⟨t, q⟩);
 5     return Π;
 6 def AllPlanSearch(M, π, V, K, D, Act, ⟨t, q⟩):
 7     if M ∈ V then
 8         return;
 9     if t ∈ ANS(q, ⟨T , A, S, M⟩) then
10         Π := Π ∪ {π};
11         return;
12     Q := ∅;
13     foreach ⟨γθ, M′⟩ ∈ Successors(M, Act, D) do
14         Q.push(⟨γθ, M′⟩);
15     V := V ∪ {M};
16     while Q ≠ ∅ do
17         ⟨γθ, M′⟩ := Q.pop();
18         AllPlanSearch(M′, π · γθ, V, K, D, Act, ⟨t, q⟩);
19     V := V \ {M};
20     return;
21 def Successors(M, Act, D):
22     if S[M] is defined then
23         return S[M];
24     N := ∅;
25     foreach γ ∈ Act, θ ∈ D^ar(γ) do
26         M′ := Tγθ(M);
27         if M′ is fully satisfiable then
28             N := N ∪ {⟨γθ, M′⟩};
29     S[M] := N;
30     return N;
```
We note that the condition t ∈ ANS(q, ⟨T , A, S, M⟩) (line 9) could be replaced by any other query satisfiability condition, and that one could easily rewrite the algorithm to be parameterized by a more general boolean goal: for example, the condition that a given tuple t is *not* an answer to a query q over the analyzed state, with the query q representing an undesired configuration, or a boolean formula over multiple query membership assertions. We also note that Algorithm 1 could be simplified to return only one simple plan, if a plan exists, or NULL, if a plan does not exist, thus solving the so-called *plan generation problem*. We refer the reader to the full version of this paper [16], containing the plan generation algorithm (Appendix A.1) and the proofs of Theorems 2 and 3 below (Appendices A.2 and A.3, respectively).
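
The simplification to plan generation mentioned above can be sketched as a depth-first search that stops at the first plan found. The callables `successors` and `goal` abstract the transition relation and the goal condition; all names are illustrative assumptions, not the paper's Appendix A.1 algorithm.

```python
# Illustrative sketch of plan generation: return one plan, or None.

def find_plan(state, successors, goal, visited=frozenset()):
    """Depth-first search returning the first sequence of grounded action
    labels that reaches a goal state, or None if no plan exists."""
    if goal(state):
        return []                      # empty plan: the goal already holds
    if state in visited:
        return None                    # avoid revisiting states on this branch
    for action, nxt in successors(state):
        rest = find_plan(nxt, successors, goal, visited | {state})
        if rest is not None:
            return [action] + rest     # prepend the grounded action label
    return None

succ = lambda s: [("enableKeyRotation(k)", s | {"rotating"})]
print(find_plan(frozenset(), succ, lambda s: "rotating" in s))
# → ['enableKeyRotation(k)']
```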

**Theorem 2 (Minimal Plan Synthesis Correctness).** *Let* K *be a DL-Lite*<sup>F</sup> *core-closed knowledge base,* D *a fixed finite domain,* Act *a set of ungrounded action labels, and* ⟨t, q⟩ *a goal. Then a plan* π *is returned by the algorithm* FindPlans(K, D, Act, ⟨t, q⟩) *if and only if* π *is a minimal plan from* K *to* ⟨t, q⟩*.*

**Theorem 3 (Minimal Plan Synthesis Complexity).** *The* FindPlans *algorithm runs in polynomial time in the size of* M *and in exponential time in the size of* D*.*

# **7 Related Work**

The syntax of the action language that we presented in this paper is similar to that of [1,12,13]. Differently from their work, we disallow complex action effects nested inside conditional statements, and we define basic action effects that consist purely of the addition and deletion of concept and role M-assertions. Thus, our actions are much less general than those used in their framework. The semantics of their action language is defined in terms of changes applied to instances, and the action effects are captured and encoded through a variant of ALCHOIQ called ALCHOIQbr. In our work, instead, the execution of an action updates a portion of the core-closed knowledge base K, namely the core M, which is interpreted under a closed-world assumption and can be seen as a partial assignment for the interpretations that are models of K. Since we directly manipulate M, the semantics of our actions is closer to that of [21] and, in general, to ABox updates [22,23]. Like the frameworks introduced in [9–11,20], our actions are parameterized and, when combined with a core-closed knowledge base, generate a transition system. In [11], the authors focus on a variant of *Knowledge and Action Bases* [21] called *Explicit-Input KABs* (eKABs); in particular, on finite and on state-bounded eKABs, for which plan existence is decidable. Our generated transition systems are an adaptation of the work done on *Description Logic based Dynamic Systems*, *KABs*, and *eKABs* to our setting of core-closed knowledge bases. In [24], the authors address decidability of the plan existence problem for logics that are subsets of ALCOI. Their action language is similar to the one presented in this paper, including pre-conditions (in the form of a set of ABox assertions), post-conditions (in the form of basic additions or removals of assertions), concatenation, and input parameters. In [11], the plan synthesis problem is also discussed for lightweight description logics.
Relying on the FOL-reducibility of DL-Lite<sup>A</sup>, it is shown that plan synthesis over DL-Lite<sup>A</sup> can be compiled into an ADL planning problem [25]. This does not seem possible in our case, as not all the necessary tests over core-closed knowledge bases are known to be FOL-reducible. In [10] and [9], the authors concentrate on verifying and synthesizing temporal properties expressed in a variant of μ-calculus over description logic based dynamic systems; both problems are relevant in our application scenario, and we will consider them in future work.

## **8 Conclusion**

We focused on the problem of analyzing cloud infrastructure encoded as description logic knowledge bases combining complete and incomplete information. From a practical standpoint, we concentrated on formalizing and foreseeing the impact of potential changes pre-deployment. We introduced an action language to encode mutating actions, whose semantics is given in terms of changes induced on the complete portion of the knowledge base. We defined the static verification problem as the problem of deciding whether the execution of an action, no matter the specific parameters passed, always preserves a set of properties of the knowledge base. We characterized the complexity of the problem and provided procedural steps to solve it. We then focused on three formulations of the classical AI planning problem: namely, plan existence, generation, and synthesis. In our setting, the planning problem is formulated with respect to the transition system arising from the combination of a core-closed knowledge base and a set of actions; goals are given in terms of one, or more, Must/May conjunctive query membership assertions; and plans of interest are simple sequences of parameterized actions.

**Acknowledgments.** This work is supported by the ERC Consolidator grant D-SynMA (No. 772459).

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **GK: Implementing Full First Order Default Logic for Commonsense Reasoning (System Description)**

Tanel Tammet<sup>1(B)</sup>, Dirk Draheim<sup>2</sup>, and Priit Järv<sup>1</sup>

<sup>1</sup> Applied Artificial Intelligence Group, Tallinn University of Technology, Tallinn, Estonia *{*tanel.tammet,priit.jarv1*}*@taltech.ee <sup>2</sup> Information Systems Group, Tallinn University of Technology, Tallinn, Estonia dirk.draheim@taltech.ee

**Abstract.** Our goal is to develop a logic-based component for hybrid – machine learning plus logic – commonsense question answering systems. The paper presents an implementation GK of default logic for handling rules with exceptions in unrestricted first order knowledge bases. GK is built on top of our existing automated reasoning system with confidence calculation capabilities. To overcome the problem of undecidability of checking potential exceptions, GK performs delayed recursive checks with diminishing time limits. These are combined with the taxonomy-based priorities for defaults and numerical confidences.

## **1 Introduction**

The problem of handling uncertainty is one of the critical issues when considering the use of logic for automating commonsense reasoning. Most of the facts and rules people use in their daily lives are uncertain. There are many types of uncertainty, like fuzziness (is a person somewhat tall or very tall), confidence (how certain does some fact seem), and exceptions (birds can typically fly, but penguins, ostriches, etc. cannot). Some of these uncertainties, like fuzziness and confidence, can be represented numerically, while others, like rules with exceptions, are discrete. In [18] we present the design and implementation of the CONFER framework for extending existing automated reasoning systems with confidence calculation capabilities. In the current paper we present GK, an implementation of default logic [13] built by further extending the CONFER implementation. Importantly, we design a novel practical framework for implementing default logic for full, undecidable first order logic on the basis of a conventional resolution prover.

#### **1.1 Default Logic**

*Default logic* was introduced in 1980 by R. Reiter [13] to model one aspect of common-sense reasoning: rules with exceptions. It has remained one of the most well-known logic-based mechanisms devoted to this goal, with *circumscription* by J. McCarthy and *autoepistemic logic* being the early alternatives. Several similar systems have been proposed since, like defeasible logic [11].

Default logic [13] extends classical logic with default rules of the form

$$\frac{\alpha(x) : \beta\_1(x), \dots, \beta\_n(x)}{\gamma(x)}$$

where a *precondition* α(x), *justifications* β1(x), ..., β*n*(x), and a *consequent* γ(x) are first order predicate calculus formulas whose free variables are among x = x1, ..., x*m*. For every tuple of individuals t = t1, ..., t*m*, if the precondition α(t) is derivable and none of the *negated* justifications ¬β*i*(t) are derivable from a given knowledge base KB, then the consequent γ(t) can be derived from KB. Differently from classical and most other logics, default logic is *non-monotonic*: adding new assumptions can make some previously derivable formulas non-derivable.
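
The firing condition of a grounded default rule can be sketched directly from this definition. In the Python sketch below, `provable` abstracts the (in general undecidable) derivability check, and the data representation is our own illustrative assumption.

```python
# Illustrative sketch: apply a grounded default α(t) : β1(t),...,βn(t) / γ(t).
# Derive γ(t) when α(t) is provable and no negated justification ¬βi(t) is.

def apply_default(provable, kb, pre, justs, cons):
    """Return kb extended with `cons` if the default fires, else kb unchanged."""
    if not provable(kb, pre):
        return kb                      # precondition not derivable
    if any(provable(kb, ("not", j)) for j in justs):
        return kb                      # some justification is contradicted
    return kb | {cons}

# Toy derivability: a formula is provable iff it is asserted in the kb.
provable = lambda kb, f: f in kb

kb = frozenset({"bird(b)"})
kb = apply_default(provable, kb, "bird(b)", ["fly(b)"], "fly(b)")
print("fly(b)" in kb)   # True: the default fires for the bird b

kb2 = frozenset({"bird(p)", ("not", "fly(p)")})
kb2 = apply_default(provable, kb2, "bird(p)", ["fly(p)"], "fly(p)")
print("fly(p)" in kb2)  # False: ¬fly(p) blocks the default
```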

As investigated in [7], the interpretation of quantifiers in default rules can lead to several versions of default logic. We follow the original interpretation of Reiter in [13], which requires the use of Skolemization in a specific manner over default rules. For example, a default rule : ∃xP(x) / ∃xP(x) should be interpreted as : P(c) / P(c), where c is a Skolem constant.

Consider a typical example for default logic: birds can normally fly, but penguins cannot fly. The classical logic part

> penguin(p) & bird(b) & (∀x.penguin(x) ⇒ bird(x)) & (∀x.penguin(x) ⇒ ¬fly(x))

is extended with the default rule bird(x) : fly(x) / fly(x). From here we can derive that an arbitrary bird b can fly, but a penguin p cannot. The default rule cannot be applied to p, since a contradiction is derivable from fly(p). This argument cannot be easily modelled using numerical confidences: the probability of an arbitrary living bird being able to fly is relatively high, while penguins form a specific subset of birds for which this probability is zero.

Another well-known example, Nixon's triangle, introduces the problem of multiple extensions and *sceptical* vs *credulous* entailment: the classical facts republican(nixon) & quaker(nixon) are extended with two mutually exclusive default rules, republican(x) : ¬*pacifist*(x) / ¬*pacifist*(x) and quaker(x) : *pacifist*(x) / *pacifist*(x). Credulous entailment allows giving different priorities to the default rules and accepts different sets (*extensions*) of consequences, if there is a way to assign priorities so that all the consequences in an extension can be derived. Sceptical entailment requires that a consequence is present in all extensions. GK follows the latter interpretation, but allows explicit priorities to be assigned to the default rules.
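
The two entailment modes can be illustrated on this very example. The toy sketch below enumerates rule-application orders to obtain the two extensions and then checks sceptical entailment; it is adequate only for this conflict-only propositional case, and all names are our own illustrative assumptions.

```python
# Illustrative sketch: extensions of the Nixon example, sceptical entailment.
from itertools import permutations

negate = lambda a: a[4:] if a.startswith("not ") else "not " + a

def extensions(facts, defaults):
    """Try every rule-application order; each resulting consistent closure
    is one candidate extension (sufficient for this toy example)."""
    exts = set()
    for order in permutations(defaults):
        ext = set(facts)
        for pre, just, cons in order:
            if pre in ext and negate(just) not in ext:
                ext.add(cons)          # the default fires
        exts.add(frozenset(ext))
    return exts

facts = {"republican(nixon)", "quaker(nixon)"}
defaults = [("republican(nixon)", "not pacifist(nixon)", "not pacifist(nixon)"),
            ("quaker(nixon)", "pacifist(nixon)", "pacifist(nixon)")]

exts = extensions(facts, defaults)
sceptical = lambda f: all(f in e for e in exts)
print(len(exts))                      # 2: one pacifist, one non-pacifist
print(sceptical("pacifist(nixon)"))   # False: holds in only one extension
```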

The concept of *priorities* for default rules has been well investigated, with several mechanisms proposed. G. Brewka argues in [4] that "for realistic applications involving default reasoning it is necessary to reason about the priorities of defaults" and introduces an ordering of defaults based on specificity: default rules for a more specific class of objects should take priority over rules for more general classes. For example, since birds (who typically do fly) are physical objects and physical objects typically do not fly, we have contradictory default rules describing the flying capability of arbitrary birds. Since birds are a subset of physical objects, the flying rule of birds should have a higher priority than the non-flying rule of physical objects.

#### **1.2 Undecidability, Grounding and Implementations**

Perhaps the most significant problem standing in the way of automating default logic is the undecidability of rule applicability. Indeed, in order to apply a default rule, we must prove that the justifications do not lead to a contradiction with the rest of the knowledge base KB. For full first order logic this is undecidable. Hence, the standard approach to handling default logic has been to create a large ground instance *KB<sup>g</sup>* of the KB and then perform decidable propositional reasoning on *KB<sup>g</sup>*.
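
The grounding step is conceptually simple but blows up combinatorially: every rule is instantiated over all tuples from the finite domain, giving |domain|^arity instances per rule. A minimal Python sketch, with an assumed string-template representation of rules:

```python
# Illustrative sketch of the grounding step used by propositional approaches.
from itertools import product

def ground(rules, domain):
    """rules: list of (template, arity) pairs; returns all ground instances,
    i.e. |domain| ** arity instances per rule."""
    return [template.format(*args)
            for template, arity in rules
            for args in product(domain, repeat=arity)]

rules = [("bird({0}) : fly({0}) / fly({0})", 1)]
print(ground(rules, ["tweety", "p"]))
# → ['bird(tweety) : fly(tweety) / fly(tweety)', 'bird(p) : fly(p) / fly(p)']
```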

Almost all the existing implementations of default logic like DeReS [5], DLV2 [1] or CLINGO [8], with the noteworthy exception of s(CASP) [2], follow the same principle. More generally, the field of *Answer Set Programming* (ASP), see [10], is devoted to this approach. As an exception, the s(CASP) system [2] solves queries without the grounding step and is thus better suited for large domains. It is noteworthy that the s(CASP) system has been used in [9] for automating common sense reasoning for autonomous driving with the help of default rules. However, s(CASP) is a logic programming system, not a universal automated reasoner. For example, when we add a rule bird(father(X)) :- bird(X) to the formulation of the above birds example in s(CASP), the search does not terminate, apparently due to the infinitely growing nesting of terms.

While ASP systems are very well suited for specific kinds of problems over a small finite domain, grounding becomes infeasible for large first order knowledge bases (*KB* in the following), in particular when the domain is infinite and nested terms can be derived from the KB. The approach described in this paper accepts the lack of logical omniscience and performs delayed recursive checking of exceptions with diminishing time limits directly on non-grounded clauses, combined with the taxonomy-based priorities for defaults and numerical confidences.

# **2 Algorithms**

Our approach to implementing default rules in GK for first order logic is to delay justification checking until a first-order proof is found and then perform recursively deepening checks with diminishing time limits. Thus, our system first produces a potentially large number of different candidate proofs and then enters a recursive checking phase. The idea of delaying justification checking is already present in the original paper of R. Reiter [13], where he uses linear resolution and delayed checks as the main machinery of his proofs. The results produced by GK thus depend on the time limits and are not stable. Showing specific fixpoint properties of the algorithm is out of the scope of this paper.
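
The termination argument behind diminishing time limits can be sketched in a few lines: each nested check receives half the remaining budget, so the total work is bounded by twice the initial budget and the recursion necessarily bottoms out. Here `search` abstracts one resolution-based check; the names and the halving policy are illustrative assumptions, not GK's actual API.

```python
# Illustrative sketch of recursive checking with diminishing time limits.

def diminishing_check(search, task, budget, floor=0.05):
    """Run `search(task, limit, recheck)`; any sub-check it requests runs
    with half the limit, so the recursion terminates at the floor."""
    if budget < floor:
        return None                    # budget exhausted: give up on this check
    recheck = lambda sub: diminishing_check(search, sub, budget / 2, floor)
    return search(task, budget, recheck)

# Instrumented toy search that always spawns one nested exception check.
calls = []
def search(task, limit, recheck):
    calls.append(limit)
    return recheck(task)

diminishing_check(search, "justification", budget=1.0)
print(calls)  # [1.0, 0.5, 0.25, 0.125, 0.0625]: stops below the 0.05 floor
```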

A practical question for implementation is the actual representation of default rules and making the rules fit the first-order proof search machinery. To this end we introduce *blocker atoms* which are similar to the justification indexes of Reiter.

In the following we will assume that the underlying first order reasoner uses the resolution method, see [3] for details. The rest of the paper assumes familiarity with the basic concepts, terminology and algorithms of the resolution method.

#### **2.1 Background: Queries and Answers**

We assume our system is presented with a question in one of two forms: *(1)* Is the statement Q true? *(2)* Find values V for existentially bound variables in Q so that Q is true. For simplicity's sake we will assume that the statement Q is in prefix form, i.e., no quantifiers occur in the scope of other logical connectives.

In the second case, it could be that several different value vectors can be assigned to the variables, essentially giving different answers. We also note that an answer could be a disjunction, giving possible options instead of a single definite answer.

A widely used machinery in resolution-based theorem provers for extracting values of existentially bound variables in Q is a special *answer predicate*: a question statement Q is converted to a formula ∃X(Q(X) & ¬answer(X)) for the tuple X of existentially quantified variables in Q [6]. Whenever a clause is derived which consists of only answer predicates, it is treated as a contradiction (essentially, an answer) and the arguments of the answer predicate are returned as the values looked for. A common convention is to call such clauses *answer clauses*. We require that the proof search does not stop whenever an answer clause is found, but continues to look for new answer clauses until a predetermined time limit is reached. See [16] for a framework for extracting multiple answers.

We also assume that queries take the general form (*KB* & A) ⇒ Q, where *KB* is a commonsense knowledge base, A is an optional set of precondition statements for this particular question, and Q is a question statement. The whole general query form is negated and converted to clauses, i.e., disjunctions of literals (positive or negative atoms). We will call the clauses stemming from the question statement *question clauses*.

#### **2.2 Blocker Atoms and Justification Checking**

Without loss of generality we assume that the precondition and consequent formulas α and γ in default rules are clauses and the justifications β₁, ..., βₙ are literals, i.e., positive or negative atoms: α : β₁, ..., βₙ / γ. Complex formulas can be encoded with a new predicate over the free variables of the formula and an equivalence of the new atom with the formula. Recall that Reiter assumes that the default rules are Skolemized.

We encode a default rule as a clause by concatenating into one clause the precondition and consequent clauses α(x) and γ(x) and blocker atoms block(¬β₁), ..., block(¬βₙ), where each justification βᵢ is either a positive or a negative atom. The negation ¬ is used since we prefer to speak about *blockers* rather than *justifications*. For example, the "birds can fly" default rule is represented as a clause

$$\neg \mathtt{bird}(\mathtt{X}) \lor \mathtt{fly}(\mathtt{X}) \lor \mathtt{block}(\mathtt{0}, \mathtt{neg}(\mathtt{fly}(\mathtt{X})))$$

where X is a variable and neg(fly(X)) encodes the negated justification. The first argument of the blocker (0 above) encodes priority information covered in the next section.
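The encoding can be sketched in a few lines of Python; the tuple-based literal representation and the `encode_default` helper are purely illustrative, not GK's actual data structures.

```python
# Sketch of the blocker-atom encoding of a default rule (illustrative
# data structures; GK's internal representation differs).
# A literal is (sign, predicate, args); a clause is a list of literals.

def negate(literal):
    sign, pred, args = literal
    return (not sign, pred, args)

def encode_default(precondition, justifications, consequent, priority=0):
    """Turn a default rule alpha : beta_1, ..., beta_n / gamma into one
    clause: the literals of alpha and gamma plus a blocker atom
    block(priority, neg(beta_i)) for each justification beta_i."""
    clause = list(precondition) + list(consequent)
    for beta in justifications:
        # the blocker carries the *negated* justification as a term
        clause.append((True, "block", (priority, negate(beta))))
    return clause

# "Birds can fly": bird(X) : fly(X) / fly(X)
x = ("var", "X")
rule = encode_default(
    precondition=[(False, "bird", (x,))],   # ¬bird(X)
    justifications=[(True, "fly", (x,))],   # justification fly(X)
    consequent=[(True, "fly", (x,))])       # fly(X)
```

The resulting `rule` is the three-literal clause shown in the displayed formula above, with priority 0 in the first blocker argument.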

A proof of a question clause is a clause containing only answer atoms and blocker atoms. In the justification checking phase the system attempts to prove each decoded second blocker argument ¬βᵢ in turn: the proof is considered invalid if some ¬βᵢ can be proved and this checking proof is itself valid. If we pose the question fly(X) ⇒ answer(X) to the system (see the earlier example), we get two different answers: answer(p) ∨ block(0, neg(fly(p))) and answer(b) ∨ block(0, neg(fly(b))). Checking the first of these means trying to prove ¬fly(p), which succeeds; hence the first answer is invalid. Checking the second answer we try to prove ¬fly(b), which fails; hence the answer is valid.
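The recursive checking loop can be sketched as follows, assuming a hypothetical `prove()` interface that returns one blocker list per candidate proof of a goal (an empty blocker list means an unconditional proof). GK bounds the recursion by time limits; a depth bound plays that role in this sketch.

```python
# Sketch of recursive justification checking; prove() stands in for a
# resolution search and is an assumed interface, not GK's API.

def valid_answer(blockers, prove, depth=0, max_depth=10):
    """An answer is valid iff none of its blockers has a valid proof."""
    if depth >= max_depth:
        return True  # out of budget: treat the blocker as unprovable
    for blocker in blockers:
        for sub_blockers in prove(blocker):
            if valid_answer(sub_blockers, prove, depth + 1, max_depth):
                return False  # the blocker provably holds
    return True
```

For the birds example, `prove(¬fly(p))` succeeds unconditionally, so the answer for p is rejected, while the answer for b survives.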

Notice that the contents ¬βᵢ of blockers, just like answer clauses, have the role of collecting substitutions during the proof search: this enables us to disregard the order in which the clauses are used, i.e., top-down, bottom-up, and mixed proof search strategies can all be used.

Importantly, blockers are used during the subsumption checks similarly to ordinary literals. A clause C₁ with fewer or more general literals than C₂ is hence always preferred to C₂, given that (a) the literals of C₁ subsume C₂, disregarding the priority arguments of blockers, and (b) the priority arguments of corresponding blocker literals in C₁ are equal to or stronger than those of C₂. When combined with the uncertainty and inconsistency handling mechanisms of CONFER, the subsumption restrictions of the latter also apply. There are also other differences from ordinary literals. First, we prohibit the application of equality (demodulation or paramodulation) to the contents of blocker atoms during proof search. Second, we discard clauses containing mutually contradictory blockers (assuming the decoding of the second argument), just as we would discard ordinary tautologies.

#### **2.3 Priorities, Recursion and Infinite Branches**

Default rule priorities are critical for the practical encoding of commonsense knowledge. The usage of priorities in proof search is simple: when checking a blocker with a given priority, it is not allowed to use default rules with a lower priority. We encode priority information as a first argument of the blocker literal, offering several ways to determine priority: either as an integer, a taxonomy class number, a string in a taxonomy or a combination of these with an integer.

For automatically using specificity we employ taxonomy classes: a class has a higher priority than those above it on the taxonomy branch. We have built a topologically sorted acyclic graph of English words using the WordNet taxonomy, along with an efficient algorithm for quick priority checks during proof search. Taxonomy classes are indicated with a special term like \$(61598). Alternatively one can use an actual English word like \$("bird"), which is automatically recognized to be more specific than, say, \$("object"). To enable more fine-grained priorities, an integer can be added to the term, like \$("bird", 2), generating a lexicographic order.
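The specificity check can be sketched with a topologically sorted toy taxonomy. The three-class taxonomy and the `more_specific` helper are our own illustration; GK uses the full WordNet graph, and ranks are only meaningful for classes on the same taxonomy branch.

```python
# Sketch: specificity from a topologically sorted taxonomy DAG
# (illustrative toy taxonomy; GK uses the full WordNet graph).
from graphlib import TopologicalSorter

superclasses = {"bird": {"animal"}, "penguin": {"bird"},
                "animal": {"object"}}
order = list(TopologicalSorter(superclasses).static_order())
rank = {cls: i for i, cls in enumerate(order)}  # ancestors come first

def more_specific(a, b):
    """Priorities as (class, integer) pairs: a class below b on a
    taxonomy branch wins; the added integer breaks ties, giving a
    lexicographic order."""
    (ca, na), (cb, nb) = a, b
    if ca == cb:
        return na > nb
    return rank[ca] > rank[cb]
```

For example, `more_specific(("penguin", 0), ("bird", 0))` holds, so a penguin default overrides a bird default.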

The recursive check for the non-provability of blockers could go arbitrarily deep, were it not for the time limits. Our algorithm allocates N seconds for the whole proof search and spends half of N looking for different proofs and answers for the query, with the other half split evenly between the answers. Again, the time allocated for checking an answer is split evenly between the blockers in the answer. Each such time snippet is again split between a search for a proof of the blocker and, if one is found, recursively checking the validity of this proof. Once the allocated time falls below a given threshold (currently one millisecond), the proof is assumed to be not found.
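The budget-splitting scheme can be sketched as follows; the numbers and the `budgets` helper are illustrative, and the real scheduler works against a clock rather than a precomputed plan.

```python
# Sketch of the time-budgeting scheme described above (illustrative).

MIN_BUDGET = 0.001  # one millisecond: below this, assume "not found"

def budgets(total, n_answers, blockers_per_answer):
    """Half of the total goes to the initial search for candidate
    answers; the rest is split evenly per answer, then evenly per
    blocker inside each answer."""
    search = total / 2
    per_answer = (total / 2) / n_answers
    per_blocker = [per_answer / b for b in blockers_per_answer]
    return search, per_answer, per_blocker

# 8 s total, two candidate answers with 2 and 4 blockers respectively
search, per_answer, per_blocker = budgets(8.0, 2, [2, 4])
```

Each per-blocker snippet is in turn split between proving the blocker and recursively validating that proof, until a snippet falls below `MIN_BUDGET`.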

Answers given by the system depend on the amount of time given, the search strategy chosen, etc. For example, consider the Nixon triangle presented earlier, with two contradictory default rules. In case the priorities of these rules are equal and we allow defaults with the same priority to be used for checking an answer containing a blocker, the recursive check terminates only because of a time limit, which is unpredictable. Hence, we may sometimes get one answer and sometimes another. In order to increase both stability and efficiency, GK checks the blockers in the search nodes above, and terminates with failure in case nonterminating loops are detected. Therefore GK always gives a sceptical result for the Nixon triangle: neither *pacifist*(nixon) nor ¬*pacifist*(nixon) is proven.

## **3 Confidences and Inconsistencies**

GK integrates the exception-handling algorithms described in the previous chapter with algorithms designed for handling inconsistent KBs and numeric confidences assigned to clauses, previously presented as the CONFER framework in [18]. The framework is built on the resolution method. It calculates estimates for the confidences of derived clauses, using both (a) the decreasing confidence of a conjunction of clauses, as performed by the resolution and paramodulation rules, and (b) the increasing confidence of a disjunction of clauses for cumulating evidence. CONFER handles inconsistent KBs by requiring the proofs of answers to contain the clauses stemming from the question posed. It performs searches both for the question and for its negation and returns the resulting confidence calculated as the difference of the confidences found by these two searches.
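The two combination directions can be illustrated as follows. The exact formulas are given in [18]; here we assume the common product / noisy-or pair purely as an example of decreasing conjunctive and increasing disjunctive confidence.

```python
# Illustrative confidence combination in the spirit of CONFER; the
# product / noisy-or pair is our assumption, not the formulas of [18].

def conjoin(confidences):
    """Combining premises (resolution/paramodulation): the confidence
    of the conclusion can only decrease."""
    c = 1.0
    for x in confidences:
        c *= x
    return c

def cumulate(confidences):
    """Cumulating independent evidence for the same conclusion
    (noisy-or): the confidence can only increase."""
    c = 0.0
    for x in confidences:
        c = c + x - c * x
    return c
```

Combining premises with confidences 0.9 and 0.8 yields 0.72, while two independent pieces of evidence at 0.6 and 0.5 cumulate to 0.8.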

The integrated algorithm is more complex than the one we previously described. Whenever the algorithms of the previous chapter speak about "proving", the system actually performs two independent searches – one for the positive and one for the negated goal – with the confidences calculated for both of these. A blocker is considered to be proved in case the resulting confidence is over a pre-determined configurable threshold, by default 0.5. Blocker proofs must also contain the clause built from the blocker. Thus, the whole search tree for a query consists of two types of interleaved layers: positive/negative confidence searches and blocker checking searches, the latter type potentially making the tree arbitrarily deep up to the minimal time limit threshold.

# **4 Implementation and Experiments**

The described algorithms are implemented by the first author as a software system GK, available at https://logictools.org/gk/. GK is written in C on top of our implementation of the CONFER framework [18], which is in turn built on top of the high-performance resolution prover GKC [17] (see https://github.com/tammet/gkc) for conventional first-order logic. Thus GK inherits most of the capabilities and algorithms of GKC.

A tutorial and a set of default logic example problems along with proofs from GK are also available at http://logictools.org/gk. GK is able to quickly solve nontrivial problems built by extending classic default logic examples. It is also able to solve classification problems combining exceptions and cumulative evidence, as well as problems with dynamic situations using fluents, including planning problems. We have built a very large integrated knowledge base from the Quasimodo [14] and ConceptNet [15] knowledge bases, converting these to default logic plus confidences. GK is able to solve simple problems using this large knowledge base along with the WordNet taxonomy for specificity: see the referenced web page for examples.

The following small example illustrates the fundamental difference between GK and the existing ASP systems for default logic. The standard penguins-and-birds example presented above, in ASP syntax, is

```
bird(b1).
penguin(p1).
bird(X) :- penguin(X).
flies(X) :- bird(X), not -flies(X).
```
Both GK and the ASP systems clingo 5.4.0, dlv 2.1.1, and s(CASP) 0.21.10.09 give the expected answers to the queries flies(b1) and flies(p1). However, when we add the rules

```
bird(father(X)) :- bird(X).
penguin(father(X)) :- penguin(X).
```

none of these ASP systems terminates on these queries, while GK solves them as expected. Notably, as pointed out by the author of s(CASP), this system does terminate on a reformulation of the same problem with the two replacement rules

```
flies(X) :- bird(X), not abs(X).
abs(X) :- penguin(X).
```

while clingo and dlv do not terminate. When we instead add the facts and rules

```
father(b1,b2).
father(p1,p2).
...
father(bN-1,bN).
father(pN-1,pN).
ancestor(X,Y) :- father(X,Y).
ancestor(X,Y) :- ancestor(X,Z), ancestor(Z,Y).
```
for a large N, s(CASP) does not terminate, and clingo and dlv become slow for flies(b1): ca. 8 s for N = 500 and ca. 1 min for N = 1000 on a laptop with a 10th-generation i7 processor. GK solves the same question with N = 1000 in under half a second and with N = 100000 in under three seconds: the latter problem size is clearly out of scope for the capabilities of existing ASP systems.
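The scaled benchmark is easy to regenerate. The following sketch (our own helper, not the paper's tooling) produces the facts and rules above for a given N:

```python
# Sketch: generating the scaled ancestor benchmark for a given N
# (file layout and helper name are our own choice).

def ancestor_program(n):
    lines = []
    for prefix in ("b", "p"):
        for i in range(1, n):
            lines.append(f"father({prefix}{i},{prefix}{i+1}).")
    lines.append("ancestor(X,Y) :- father(X,Y).")
    lines.append("ancestor(X,Y) :- ancestor(X,Z), ancestor(Z,Y).")
    return "\n".join(lines)
```

Writing `ancestor_program(1000)` to a file and appending the bird and penguin rules from the example above reproduces the timed runs.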

We have previously shown that the confidence handling mechanisms in CONFER may slow down proof search for certain types of problems, but do not have a strong negative effect on very large commonsense CYC [12] problems in the TPTP problem collection. Differently from CONFER, the algorithms for default logic described above do not substantially modify the resolution method implementation of pure first-order logic search; thus the performance of these parts of GK is mostly the same as that of GKC. The ability to give a correct answer to a query within a given time limit depends on the performance of these components, and not on the overall recursively branching algorithm.

#### **5 Summary and Future Work**

We have presented algorithms and an implementation of an automated reasoning system for default logic on the basis of unrestricted first-order logic and the resolution method. While there are several systems able to solve default logic or similar nonmonotonic logic problems, these are built on the basis of answer set programming and are normally based on grounding. We are not aware of other full first-order logic reasoning systems for default logic, nor of systems integrating confidences and inconsistency handling with rules with exceptions.

Future work is planned in three directions: adding features to the solver, proving several useful properties of the algorithms, and incorporating the solver into a commonsense reasoning system able to handle nontrivial tasks posed in natural language. The work on incorporating similarity-based reasoning into GK and building a suitable semantic parser for natural language is currently ongoing. We are particularly interested in exploring practical ways to integrate GK with machine learning techniques for natural language.

#### **References**

1. Alviano, M., et al.: The ASP system DLV2. In: Balduccini, M., Janhunen, T. (eds.) LPNMR 2017. LNCS (LNAI), vol. 10377, pp. 215–221. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-61660-5_19



# **Hypergraph-Based Inference Rules for Computing EL<sup>+</sup>-Ontology Justifications**

Hui Yang, Yue Ma, and Nicole Bidoit

LISN, CNRS, Université Paris-Saclay, Gif-sur-Yvette, France {yang,ma,nicole.bidoit}@lisn.fr

**Abstract.** To give concise explanations for a conclusion obtained by reasoning over ontologies, *justifications* have been proposed as minimal subsets of an ontology that entail the given conclusion. Even though computing one justification can be done in polynomial time for tractable Description Logics such as EL<sup>+</sup>, computing all justifications is complicated and often challenging for real-world ontologies. In this paper, based on a graph representation of EL<sup>+</sup>-ontologies, we propose a new set of *inference rules* (called H-rules) and take advantage of them for providing a new method of computing all justifications for a given conclusion. The advantage of our setting is that most of the time, it reduces the number of *inferences* (generated by H-rules) required to derive a given conclusion. This accelerates the enumeration of justifications relying on these inferences. We validate our approach by experiments on real-world ontologies. Our graph-based approach outperforms PULi [14], the state-of-the-art algorithm, in most cases.

### **1 Introduction**

Ontologies provide structured representations of domain knowledge that are suitable for AI reasoning. They are used in various domains, including medicine, biology, and finance. In the domain of ontologies, one of the interesting topics is to provide explanations of reasoning conclusions. To this end, *justifications* have been proposed to offer users a brief explanation for a given conclusion. Computing justifications has been widely explored for different tasks, for instance for debugging ontologies [1,9,11] and computing ontology modules [6]. Extracting just one justification can be easy for tractable ontologies, such as EL<sup>+</sup> [17]. For instance, we can find one justification by deleting unnecessary axioms one by one. However, there may exist more than one justification for a given conclusion. Computing all such justifications is computationally complex and reveals itself to be a challenging problem [18].

There are mainly two different approaches [17] to compute all justifications for a given conclusion, the *black-box* approach and the *glass-box* approach. The *black-box* approach [11] relies only on a reasoner and, as such, can be

© The Author(s) 2022

This work is funded by the BPI-France (PSPC AIDA: 2019-PSPC-09).

J. Blanchette et al. (Eds.): IJCAR 2022, LNAI 13385, pp. 310–328, 2022. https://doi.org/10.1007/978-3-031-10769-6\_19

used for ontologies in any existing Description Logic. For example, a simple (naive) *black-box* approach would check all the subsets of the ontology using an existing reasoner and then keep the subset-minimal ones (i.e., the justifications). Many advanced and optimized black-box algorithms have been proposed since 2007 [10]. Meanwhile, glass-box approaches have achieved better performance on certain specific ontology languages (such as EL<sup>+</sup>) by going deep into the reasoning process. Among them, the class of SAT-based methods [1–3,14,16] performs best. The main idea developed by SAT-based methods is to trace, in a first step, a *complete set of inferences* (*complete set* for short) that contribute to the derivation of a given conclusion, and then, in a second step, to use SAT tools or resolution to extract all justifications from these inferences. A detailed example is provided in Sect. 4.1.

In the real world, ontologies are often huge. For instance, the SnomedCT ontology contains more than 300,000 axioms. Thus, the traced *complete set* can be large, which could make it challenging to extract the justifications over it. Several techniques can be applied to reduce the size of the traced *complete set*, like the *locality-based modules* [8] and the *goal-directed tracing algorithm* [12]. One of their shared ideas is to identify, for a given conclusion, a particular part of the ontology relevant for the extraction of justifications. For example, the state-of-the-art algorithm, PULi [14], uses a *goal-directed tracing algorithm*. However, even for PULi, a simple ontology O = {Aᵢ ⊑ Aᵢ₊₁ | 0 ≤ i ≤ n − 1} with the conclusion A₀ ⊑ Aₙ leads to a *complete set* containing n − 1 inferences. This set cannot be reduced further even with the previously mentioned optimizations. From this observation, we decided to explore a new SAT-based glass-box method to handle such situations better.

Now, let us look carefully at the ontology O above, and let us regard each Aᵢ as a graph node N*Aᵢ*. Then we are able to construct, for O, a directed graph whose edges are of the form N*Aᵢ* → N*Aᵢ₊₁*. It turns out that all the justifications for the conclusion A₀ ⊑ Aₙ are extracted from the paths from N*A₀* to N*Aₙ*, and here we have only one such path. We can easily extend this idea to EL<sup>+</sup>-ontologies because most EL<sup>+</sup>-axioms can be interpreted as direct edges except one case (i.e., A ≡ B₁ ⊓ ··· ⊓ Bₙ), for which we need a hyperedge (for more details see Definition 3). However, for more expressive ontologies, this translation becomes more complicated. For example, it is hard to map ALC-axioms to edges as those axioms may contain negation or disjunction of concepts.

This example inspired us to explore a hypergraph representation of the ontology and reformulate inferences and justifications. Roughly, our inferences are built from elementary paths of the hypergraph and lead to particular paths called H-paths. Then, computing all the justifications for a given conclusion is made using such H-paths. For the previous ontology O and the conclusion A₀ ⊑ Aₙ, our *complete set* is reduced to only two inferences (no matter the value of n) corresponding to the unique path from N*A₀* to N*Aₙ*. The source of improvement provided by our method is twofold. On the one hand, it comes from the fact that elementary paths are pre-computed while extracting the inferences and that existing algorithms like depth-first search can efficiently compute such paths. On the other hand, yet as a consequence, decreasing the size of the *complete sets* of inferences leads to smaller inputs for the SAT-based algorithm extracting justifications from the *complete set* (recall here that our method is a SAT-based glass-box method).

The paper is organized as follows. Section 2 introduces preliminary definitions and notions. In Sect. 3, we associate a hypergraph representation to EL<sup>+</sup>-ontologies and introduce a new set of inference rules, called H-rules, that generate our inferences. In Sect. 4, we develop the algorithm minH, which computes justifications based on our inferences. Section 5 shows experimental results and Sect. 6 summarizes our work.

# **2 Preliminaries**

# **2.1** *EL***<sup>+</sup>-Ontology**

Given sets of atomic concepts N*C* = {A, B, ···} and atomic roles N*R* = {r, s, t, ···}, the set of EL<sup>+</sup>-concepts C and axioms α is built by the following grammar rules:

$$C ::= \top \mid A \mid C \sqcap C \mid \exists r.C, \qquad \alpha ::= C \sqsubseteq C \mid C \equiv C \mid r \sqsubseteq s \mid r_1 \circ \cdots \circ r_n \sqsubseteq s.$$

An EL<sup>+</sup>-ontology O is a finite set of EL<sup>+</sup>-axioms. An **interpretation** I = (Δ<sup>I</sup>, ·<sup>I</sup>) of O consists of a non-empty set Δ<sup>I</sup> and a mapping from atomic concepts A ∈ N*C* to a subset A<sup>I</sup> ⊆ Δ<sup>I</sup> and from roles r ∈ N*R* to a subset r<sup>I</sup> ⊆ Δ<sup>I</sup> × Δ<sup>I</sup>. For a concept C built from the grammar rules, we define C<sup>I</sup> inductively by: (⊤)<sup>I</sup> = Δ<sup>I</sup>, (C ⊓ D)<sup>I</sup> = C<sup>I</sup> ∩ D<sup>I</sup>, (∃r.C)<sup>I</sup> = {a ∈ Δ<sup>I</sup> | ∃b, (a, b) ∈ r<sup>I</sup>, b ∈ C<sup>I</sup>}, (r ◦ s)<sup>I</sup> = {(a, b) ∈ Δ<sup>I</sup> × Δ<sup>I</sup> | ∃c, (a, c) ∈ r<sup>I</sup>, (c, b) ∈ s<sup>I</sup>}. An interpretation is a **model** of O if it is compatible with all axioms in O, i.e., for all C ⊑ D, C ≡ D, r ⊑ s, r₁ ◦ ··· ◦ rₙ ⊑ s ∈ O, we have C<sup>I</sup> ⊆ D<sup>I</sup>, C<sup>I</sup> = D<sup>I</sup>, r<sup>I</sup> ⊆ s<sup>I</sup>, (r₁ ◦ ··· ◦ rₙ)<sup>I</sup> ⊆ s<sup>I</sup>, respectively. We say O ⊨ α, where α is an axiom, iff each model of O is compatible with α. A concept A is **subsumed** by B w.r.t. O if O ⊨ A ⊑ B.

Next, we use A, B, ··· , G (possibly with subscripts) to denote atomic concepts, and we use X, Y, Z (possibly with subscripts) to denote atomic concepts A, ··· , G or complex concepts ∃r.A, ··· , ∃r.G.

We assume that ontologies are normalized. An EL<sup>+</sup>-ontology O is **normalized** if all its axioms are of the form A ≡ B₁ ⊓ ··· ⊓ Bₘ, A ⊑ B₁ ⊓ ··· ⊓ Bₘ, A ≡ ∃r.B, A ⊑ ∃r.B, r ⊑ s, or r ◦ s ⊑ t, where A, B, Bᵢ ∈ N*C* and r, s, t ∈ N*R*. Every EL<sup>+</sup>-ontology can be normalized in polynomial time by introducing new atomic concepts and atomic roles.

**Example 1.** *The following set of axioms is an* EL<sup>+</sup>*-ontology:* O = { a₁: A ⊑ D, a₂: D ⊑ ∃r.E, a₃: E ⊑ F, a₄: B ≡ ∃t.F, a₅: r ⊑ t, a₆: G ≡ C ⊓ B, a₇: C ⊑ A }.

*It is clear that* O ⊨ A ⊑ ∃r.E *as for all models* I*, we have* A<sup>I</sup> ⊆ D<sup>I</sup> *by the axiom* a₁ *and* D<sup>I</sup> ⊆ (∃r.E)<sup>I</sup> *by* a₂*.*

**Table 1.** Inference rules over EL<sup>+</sup>-ontology.

$$\begin{aligned}
\mathcal{R}_1 &: \frac{A \sqsubseteq A_1, \;\dots,\; A \sqsubseteq A_n \qquad A_1 \sqcap A_2 \sqcap \dots \sqcap A_n \sqsubseteq B}{A \sqsubseteq B} \\
\mathcal{R}_2 &: \frac{A \sqsubseteq A_1 \qquad A_1 \sqsubseteq \exists r.B}{A \sqsubseteq \exists r.B} \qquad
\mathcal{R}_3 : \frac{A \sqsubseteq \exists r.B_1 \qquad B_1 \sqsubseteq B_2 \qquad \exists r.B_2 \sqsubseteq B}{A \sqsubseteq B} \\
\mathcal{R}_4 &: \frac{A_0 \sqsubseteq \exists r_1.A_1, \;\dots,\; A_{n-1} \sqsubseteq \exists r_n.A_n \qquad r_1 \circ \cdots \circ r_n \sqsubseteq r}{A_0 \sqsubseteq \exists r.A_n}
\end{aligned}$$

#### **2.2 Inference, Support and Justification**

Given an EL<sup>+</sup>-ontology O, a major reasoning task over O is *classification*, which aims at finding all subsumptions O ⊨ A ⊑ B for atomic concepts A, B occurring in O. Generally, it can be solved by applying *inferences* recursively over O [5].

An **inference** ρ is a pair ⟨ρ*pre*, ρ*con*⟩ whose *premise* set ρ*pre* consists of EL<sup>+</sup>-axioms and whose *conclusion* ρ*con* is a single EL<sup>+</sup>-axiom. As usual, a sequence of inferences ρ¹, ··· , ρⁿ is a **derivation** of an axiom α from O if ρⁿ*con* = α and, for any β ∈ ρⁱ*pre*, 1 ≤ i ≤ n, we have β ∈ O or β = ρʲ*con* for some j < i.

As usual, **inference rules** are used to generate inferences. For instance, Table 1 [1,5] shows a set of inference rules for EL<sup>+</sup>-ontologies. Next, we use O ⊢ A ⊑ B to denote that A ⊑ B is derivable from O using inferences generated by the rules in Table 1. The set of inference rules in Table 1 is *sound* and *complete* for classification [5], i.e., O ⊨ A ⊑ B iff O ⊢ A ⊑ B for any A, B ∈ N*C*.
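The derivation condition can be stated operationally as a small check. Plain strings stand in for axioms, and the sample inferences chain the axioms of Example 1; the `is_derivation` helper is our own illustration.

```python
# Sketch of the derivation condition: each premise must be an axiom
# of O or the conclusion of an earlier inference in the sequence.

def is_derivation(inferences, ontology, goal):
    """inferences: list of (premises, conclusion) pairs, in order."""
    derived = set()
    for premises, conclusion in inferences:
        if not all(p in ontology or p in derived for p in premises):
            return False
        derived.add(conclusion)
    return bool(inferences) and inferences[-1][1] == goal

# Axioms of Example 1 (with B ≡ ∃t.F split into two inclusions):
O = {"A⊑D", "D⊑∃r.E", "E⊑F", "B⊑∃t.F", "∃t.F⊑B", "r⊑t"}
rho1 = ({"A⊑D", "D⊑∃r.E"}, "A⊑∃r.E")
rho2 = ({"A⊑∃r.E", "r⊑t"}, "A⊑∃t.E")
rho3 = ({"A⊑∃t.E", "E⊑F", "∃t.F⊑B"}, "A⊑B")
```

The sequence `[rho1, rho2, rho3]` is a derivation of A ⊑ B, while `[rho2]` alone is not, since its premise A ⊑ ∃r.E is neither an axiom nor previously derived.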

A **support** of A ⊑ B over O is a sub-ontology O′ ⊆ O such that O′ ⊨ A ⊑ B. The **justifications** for A ⊑ B are the subset-minimal supports of A ⊑ B. We denote the collection of all justifications for A ⊑ B w.r.t. O by J*O*(A ⊑ B).

We say S is a **complete set** (of inferences) for A ⊑ B if, for any justification O′ of A ⊑ B, we can derive A ⊑ B from O′ using only the inferences in S.

**Example 2 (Example 1 cont'd).** *Before applying inference rules, axioms in* O *are preprocessed in order to be compatible with Table 1. For example,* a₄ *is replaced by* B ⊑ ∃t.F *and* ∃t.F ⊑ B*. Then, according to the inference rules of Table 1, we may produce the following inferences:* ρ = ⟨{A ⊑ D, D ⊑ ∃r.E}, A ⊑ ∃r.E⟩*,* ρ′ = ⟨{A ⊑ ∃r.E, r ⊑ t}, A ⊑ ∃t.E⟩*, and* ρ′′ = ⟨{A ⊑ ∃t.E, E ⊑ F, ∃t.F ⊑ B}, A ⊑ B⟩*, generated by rules* R₂*,* R₄*, and* R₃ *respectively. Then* O ⊢ A ⊑ B *since* A ⊑ B *is derivable from* O *by the sequence* ρ, ρ′, ρ′′*.*

*Notice that* O′ = {a₁, a₂, a₃, a₄, a₅} *is a support for* A ⊑ B*, and thus any superset* O′′ *of* O′ *is a support of* A ⊑ B*.* O′ *is also one of the justifications for* A ⊑ B *as for any* O′′ ⊂ O′*, we have* O′′ ⊭ A ⊑ B*. Moreover, here the three inferences* ρ, ρ′, ρ′′ *provide a complete set for* A ⊑ B*.*

#### **3 Hypergraph-Based Inference Rules**

#### **3.1 H-Inferences**

In general, a (directed) hypergraph G = (V, E) is defined by a set of nodes V and a set of hyperedges E [4,7]. A hyperedge is of the form e = (S₁, S₂), where S₁, S₂ ⊆ V. In this paper, a hypergraph is associated to an ontology as follows:

**Definition 3.** *For a given* EL<sup>+</sup>*-ontology* O*, the associated hypergraph is* G*O* = (V*O*, E*O*)*, where (i) the set of nodes* V*O* = {N*A*, N*r*, N∃*r.A* | A ∈ N*C*, r ∈ N*R*} *and (ii) the set of edges* E*O* *is defined by* f(O)*, where* f *is the multi-valued mapping shown in Fig. 1. Given a hyperedge* e *of* E*O**, the inverse image of* e*,* f<sup>−1</sup>(e)*, is defined in the obvious manner. For a set* E *of hyperedges,* f<sup>−1</sup>(E) = ∪*e*∈*E* f<sup>−1</sup>(e)*.*


**Fig. 1.** Definition of f (left) and graphical illustrations of f(α) (right)

Notice that the hyperedges associated with A ≡ B₁ ⊓ ··· ⊓ Bₘ are (i) the hyperedge ({N*B₁*, ··· , N*Bₘ*}, {N*A*}) and (ii), of course, the edges corresponding to A ⊑ B₁ ⊓ ··· ⊓ Bₘ.

**Example 4 (Example 1 cont'd).** *The hypergraph* G*O* *for* O *is shown in Fig. 2, where* e₀ = ({N*C*}, {N*A*})*,* e₁ = ({N*A*}, {N*D*})*,* e₂ = ({N*D*}, {N∃*r.E*})*, etc. Also,* f<sup>−1</sup>(e₀) = C ⊑ A*,* f<sup>−1</sup>(e₁) = A ⊑ D*, and* f<sup>−1</sup>(e₂) = D ⊑ ∃r.E*, etc.*


**Fig. 2.** The hypergraph associated with the ontology O.

As for graphs, a path (henceforth called a **regular path**) from a node $N_1$ to a node $N_2$ in a hypergraph is a sequence of edges:

$$e\_0 = (S\_1^0, S\_2^0), e\_1 = (S\_1^1, S\_2^1), \dots, e\_n = (S\_1^n, S\_2^n) \tag{1}$$

where $N_1 \in S_1^0$, $N_2 \in S_2^n$, and $S_2^{i-1} = S_1^i$ for $1 \le i \le n$. Next, the **existence** of a regular path from $N_X$ to $N_Y$ in a hypergraph $G_\mathcal{O}$ is denoted $N_X \rightsquigarrow N_Y$. Now, we introduce hypergraph-based inferences, which build on the existence of regular paths, as follows:

**Table 2.** H-rules over $G_\mathcal{O} = (V_\mathcal{O}, E_\mathcal{O})$.

$$\begin{array}{l}
\mathcal{H}_0: \ \dfrac{N_X \rightsquigarrow N_Y}{N_X \stackrel{h}{\rightsquigarrow} N_Y}
\qquad
\mathcal{H}_2: \ \dfrac{N_X \stackrel{h}{\rightsquigarrow} N_{\exists r.B_1} \quad N_{B_1} \stackrel{h}{\rightsquigarrow} N_{B_2} \quad N_{\exists r.B_2} \rightsquigarrow N_Y}{N_X \stackrel{h}{\rightsquigarrow} N_Y}
\\[3ex]
\mathcal{H}_1: \ \dfrac{N_X \stackrel{h}{\rightsquigarrow} N_{B_1} \ \cdots \ N_X \stackrel{h}{\rightsquigarrow} N_{B_m} \quad N_A \rightsquigarrow N_Y \quad e = (\{N_{B_1}, \dots, N_{B_m}\}, \{N_A\}) \in E_\mathcal{O}}{N_X \stackrel{h}{\rightsquigarrow} N_Y}
\\[3ex]
\mathcal{H}_3: \ \dfrac{N_X \stackrel{h}{\rightsquigarrow} N_{\exists r.A_1} \quad N_{A_1} \stackrel{h}{\rightsquigarrow} N_{\exists s.A_2} \quad N_{\exists t.A_2} \rightsquigarrow N_Y \quad e = (\{N_r, N_s\}, \{N_t\}) \in E_\mathcal{O}}{N_X \stackrel{h}{\rightsquigarrow} N_Y}
\end{array}$$


**Definition 5.** *Given a hypergraph* $G_\mathcal{O}$*, Table 2 gives a set of inference rules called H-rules. Inferences based on H-rules are called H-inferences. Next, we denote by* $\mathcal{O} \vdash_h N_X \stackrel{h}{\rightsquigarrow} N_Y$ *(or simply* $N_X \stackrel{h}{\rightsquigarrow} N_Y$*) the fact that* $N_X \stackrel{h}{\rightsquigarrow} N_Y$ *can be derived from* $G_\mathcal{O}$ *using the H-inferences.*

**Example 6 (Example 4 cont'd).** *As shown in Fig. 2, we have* $N_A \rightsquigarrow N_{\exists r.E}$*,* $N_E \rightsquigarrow N_F$ *and* $N_{\exists r.F} \rightsquigarrow N_B$ *from the existence of regular paths. Then we can derive* $N_A \stackrel{h}{\rightsquigarrow} N_B$ *from* $G_\mathcal{O}$ *by the H-rules* $\mathcal{H}_0$*,* $\mathcal{H}_0$ *and* $\mathcal{H}_2$*, which generate the H-inferences* $\rho_1, \rho_2, \rho_3$*, where* $\rho_1 = \langle\{N_A \rightsquigarrow N_{\exists r.E}\}, N_A \stackrel{h}{\rightsquigarrow} N_{\exists r.E}\rangle$*,* $\rho_2 = \langle\{N_E \rightsquigarrow N_F\}, N_E \stackrel{h}{\rightsquigarrow} N_F\rangle$ *and* $\rho_3 = \langle\{N_A \stackrel{h}{\rightsquigarrow} N_{\exists r.E}, N_E \stackrel{h}{\rightsquigarrow} N_F, N_{\exists r.F} \rightsquigarrow N_B\}, N_A \stackrel{h}{\rightsquigarrow} N_B\rangle$*, respectively.*
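Since regular paths are chains of hyperedges whose target set equals the next edge's source set, checking $N_X \rightsquigarrow N_Y$ amounts to ordinary reachability and can be done with a breadth-first search. The following Python sketch illustrates such a check; the node names and encoding are ours, not the paper's.

```python
# Illustrative sketch: existence of a regular path N1 ⇝ N2, i.e. a sequence
# of hyperedges e_0..e_n with N1 ∈ S1(e_0), N2 ∈ S2(e_n), and
# S2(e_{i-1}) = S1(e_i) for consecutive edges.
from collections import deque

def has_regular_path(edges, n1, n2):
    start = [e for e in edges if n1 in e[0]]
    seen, queue = set(start), deque(start)
    while queue:
        s1, s2 = queue.popleft()
        if n2 in s2:
            return True
        for e in edges:                  # chain edges whose source equals s2
            if e not in seen and e[0] == s2:
                seen.add(e)
                queue.append(e)
    return False

E = [(frozenset({"NA"}), frozenset({"ND"})),
     (frozenset({"ND"}), frozenset({"NrE"}))]
print(has_regular_path(E, "NA", "NrE"))  # True
```

A linear scan over `edges` per step is enough for illustration; an index from source sets to edges would make this linear overall.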

Note that the first rule $\mathcal{H}_0$, the initialization rule, makes regular paths the elementary components of H-rules. Moreover, Proposition 7 formally states that our H-inference system does not need an additional transitivity rule:

$$\frac{N\_X \stackrel{h}{\leadsto} N\_Z, N\_Z \stackrel{h}{\leadsto} N\_Y}{N\_X \stackrel{h}{\leadsto} N\_Y}.$$

**Proposition 7.** *If* $\mathcal{O} \vdash_h N_X \stackrel{h}{\rightsquigarrow} N_Z$ *and* $\mathcal{O} \vdash_h N_Z \stackrel{h}{\rightsquigarrow} N_Y$*, then* $\mathcal{O} \vdash_h N_X \stackrel{h}{\rightsquigarrow} N_Y$*.*

#### **3.2 Completeness and Soundness of H-Inferences**

The following theorem is the main result of this section. It states the equivalence between the derivation of $N_X \stackrel{h}{\rightsquigarrow} N_Y$ (by Table 2) and the ontology entailment of $X \sqsubseteq Y$, and thus establishes that our H-rules are sound and complete for $\mathcal{EL}^+$-ontologies.

**Theorem 8.** *If* $\mathcal{O}$ *is an* $\mathcal{EL}^+$*-ontology, then* $\mathcal{O} \models X \sqsubseteq Y$ *iff* $\mathcal{O} \vdash_h N_X \stackrel{h}{\rightsquigarrow} N_Y$*, where* $X, Y$ *are concepts of either form* $A$ *or* $\exists r.B$*.*

*Proof.* "⇐" is obvious by induction over Table <sup>2</sup> and the fact that <sup>N</sup>*<sup>X</sup>* - N*<sup>Y</sup>* implies O |<sup>=</sup> <sup>X</sup> -<sup>Y</sup> , so we only need to prove the direction "⇒".

Assume that $\mathcal{O} \models X \sqsubseteq Y$. We consider two cases. *Case 1.* We assume $\mathcal{O} \vdash X \sqsubseteq Y$.<sup>1</sup> Let $d(X, Y)$ be the length of a shortest derivation of $X \sqsubseteq Y$ from $\mathcal{O}$ using Table 1, let $\rho_{last}$ be the last inference of such a derivation, and suppose $d(X, Y) = k$. We prove "⇒" by induction on $d(X, Y)$.

1. Assume $\rho_{last}$ is generated by $R_1$ ($n > 1$), $R_3$ or $R_4$ ($n = 2$). For example, assume $\rho_{last} = \langle\{X \sqsubseteq \exists r.B_1,\ B_1 \sqsubseteq B_2,\ \exists r.B_2 \sqsubseteq Y\}, X \sqsubseteq Y\rangle$ comes from $R_3$. We have $d(X, \exists r.B_1), d(B_1, B_2), d(\exists r.B_2, Y) < k$ because the corresponding subsumptions can be derived without $\rho_{last}$. By the induction hypothesis, $\mathcal{O} \vdash_h N_X \stackrel{h}{\rightsquigarrow} N_{\exists r.B_1}$, $N_{B_1} \stackrel{h}{\rightsquigarrow} N_{B_2}$ and $N_{\exists r.B_2} \stackrel{h}{\rightsquigarrow} N_Y$. Then we have $\mathcal{O} \vdash_h N_X \stackrel{h}{\rightsquigarrow} N_{\exists r.B_2}$ by first deriving $N_X \stackrel{h}{\rightsquigarrow} N_{\exists r.B_1}$ and $N_{B_1} \stackrel{h}{\rightsquigarrow} N_{B_2}$, and then applying the H-inference:

$$\rho^{new} = \langle \{ N\_X \stackrel{h}{\leadsto} N\_{\exists r.B\_1}, N\_{B\_1} \stackrel{h}{\leadsto} N\_{B\_2}, N\_{\exists r.B\_2} \leadsto N\_{\exists r.B\_2} \}, N\_X \stackrel{h}{\leadsto} N\_{\exists r.B\_2} \rangle.$$

Then $\mathcal{O} \vdash_h N_X \stackrel{h}{\rightsquigarrow} N_Y$ by Proposition 7, since $\mathcal{O} \vdash_h N_X \stackrel{h}{\rightsquigarrow} N_{\exists r.B_2}$ and $N_{\exists r.B_2} \stackrel{h}{\rightsquigarrow} N_Y$. The argument also holds for $R_1$ ($n > 1$) (or $R_4$ ($n = 2$)) by applying $\mathcal{H}_1$ (or $\mathcal{H}_3$) instead of $\mathcal{H}_2$.

2. Assume $\rho_{last}$ is generated by $R_1$ ($n = 1$), $R_2$ or $R_4$ ($n = 1$). Then, in each case, $\rho_{last}$ has the form $\langle\{X \sqsubseteq Z,\ Z \sqsubseteq Y\}, X \sqsubseteq Y\rangle$. As in case 1, we have $d(X, Z), d(Z, Y) < k$. By the induction hypothesis, $\mathcal{O} \vdash_h N_X \stackrel{h}{\rightsquigarrow} N_Z$ and $N_Z \stackrel{h}{\rightsquigarrow} N_Y$; then $\mathcal{O} \vdash_h N_X \stackrel{h}{\rightsquigarrow} N_Y$ by Proposition 7.

*Case 2.* If $\mathcal{O} \vdash X \sqsubseteq Y$ does not hold, then $X$ or $Y$ is not atomic. In this case, we introduce new axioms $A \equiv X$ and $B \equiv Y$ with fresh atomic concepts $A, B$ and denote the extended ontology by $\mathcal{O}'$. Clearly, $\mathcal{O}' \models A \sqsubseteq B$ and thus $\mathcal{O}' \vdash A \sqsubseteq B$, since Table 1 is sound and complete. Therefore, we have $\mathcal{O}' \vdash_h N_A \stackrel{h}{\rightsquigarrow} N_B$ by the same arguments as above. Now, notice that $G_{\mathcal{O}'}$ is obtained from $G_\mathcal{O}$ by adding 4 edges: $(\{N_A\}, \{N_X\})$, $(\{N_X\}, \{N_A\})$, $(\{N_B\}, \{N_Y\})$ and $(\{N_Y\}, \{N_B\})$; thus we have $\mathcal{O}' \vdash_h N_A \stackrel{h}{\rightsquigarrow} N_B$ iff $\mathcal{O} \vdash_h N_X \stackrel{h}{\rightsquigarrow} N_Y$. □

# **3.3 Extracting Justifications from** *G<sup>O</sup>*

Now, we formally define H-paths as a hypergraph representation of derivations based on H-rules. The reader should note that H-paths are not classical hyperpaths [7]. Next, for the sake of homogeneity, we consider a regular path from $N_X$ to $N_Y$ as the set of its edges and denote it by $P_{X,Y}$.

<sup>1</sup> The reader should recall that the equivalence ($\mathcal{O} \models X \sqsubseteq Y$ iff $\mathcal{O} \vdash X \sqsubseteq Y$) only holds when $X$ and $Y$ are atomic concepts w.r.t. the inference system presented in Table 1.

**Definition 9 (H-paths).** *In the hypergraph* $G_\mathcal{O}$*, an H-path* $H_{X,Y}$ *from* $N_X$ *to* $N_Y$ *is a set of edges recursively generated by the following composition rules:*


**Fig. 3.** Structure of H-paths from N*<sup>X</sup>* to N*<sup>Y</sup>*

Figure 3 gives an illustration of H-paths: the blue arrows correspond to regular paths, and the red ones to H-paths. It is straightforward to compare the composition rules building H-paths with the H-rules building derivations in Table 2. One may also consider H-paths as derivation trees whose leaves correspond to the edges of $G_\mathcal{O}$. However, our approach provides a more direct characterization of justifications, as shown in Theorem 10.

We say that an H-path $H_{X,Y}$ is **minimal** if there is no H-path $H'_{X,Y}$ such that $H'_{X,Y} \subset H_{X,Y}$.

Now, we are ready to explain how H-paths and justifications are related. We can compute justifications from minimal H-paths as stated below:

**Theorem 10.** *Let* $X, Y$ *be of either form* $A$ *or* $\exists r.B$*, and let*

$$S = \{f^{-1}(H_{X,Y}) \mid H_{X,Y} \text{ is a minimal H-path from } N_X \text{ to } N_Y\}.$$

*Then* $J_\mathcal{O}(X \sqsubseteq Y) = \{s \in S \mid s' \not\subset s \text{ for all } s' \in S\}$*. That is, the justifications for* $X \sqsubseteq Y$ *are exactly the minimal subsets in* $S$*.*

*Proof.* For any justification $\mathcal{O}'$ of $X \sqsubseteq Y$, there exists a minimal H-path $H_{X,Y}$ such that $\mathcal{O}' = f^{-1}(H_{X,Y})$. Indeed, since $\mathcal{O}' \models X \sqsubseteq Y$, there exists an H-path $H_{X,Y}$ from $N_X$ to $N_Y$ in $G_{\mathcal{O}'}$ by Theorem 8. Without loss of generality, we can assume that $H_{X,Y}$ is minimal in $G_{\mathcal{O}'}$; it is then also minimal in $G_\mathcal{O}$, since $G_{\mathcal{O}'}$ is a subgraph of $G_\mathcal{O}$. We have $\mathcal{O}' = f^{-1}(H_{X,Y})$ because otherwise there would exist $\mathcal{O}'' \subsetneq \mathcal{O}'$ such that $\mathcal{O}'' = f^{-1}(H_{X,Y})$, and thus $\mathcal{O}'' \models X \sqsubseteq Y$ by Theorem 8 again; then $\mathcal{O}'$ would not be a justification. Contradiction.

Now, we know that $S$ contains all justifications for $X \sqsubseteq Y$. Moreover, $f^{-1}(H_{X,Y}) \models X \sqsubseteq Y$ for any H-path $H_{X,Y}$. Therefore, we have $J_\mathcal{O}(X \sqsubseteq Y) = \{s \in S \mid s' \not\subset s \text{ for all } s' \in S\}$ by the definition of justifications. □
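As a small illustration of the minimality condition in Theorem 10, the following Python sketch filters candidate axiom sets (the $f^{-1}$ images of minimal H-paths) down to their $\subseteq$-minimal elements. The candidate sets below are hypothetical.

```python
def justifications(candidate_sets):
    """Keep exactly the subset-minimal sets among the candidates:
    s survives iff no other candidate is a proper subset of s."""
    sets = [frozenset(s) for s in candidate_sets]
    return [s for s in sets if not any(t < s for t in sets)]

S = [{"a1", "a6", "a7"},           # minimal: kept
     {"a1", "a6", "a7", "a9"},     # superset of the first: dropped
     {"a2", "a3"}]                 # incomparable: kept
print(justifications(S))
```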

**Example 11 (Example 4 cont'd).** *The regular paths from* $N_A$ *to* $N_{\exists r.E}$ *and from* $N_E$ *to* $N_F$ *produce two H-paths* $H_{A,\exists r.E} = \{e_1, e_2, e_3\}$ *and* $H_{E,F} = \{e_4\}$*. Then, applying the third composition rule with* $H_{A,\exists r.E}$*,* $H_{E,F}$ *and* $P_{\exists r.F,B} = \{e_6\}$*, we get* $H_{A,B} = \{e_1, e_2, e_3, e_4, e_6\}$*, which is the unique H-path from* $N_A$ *to* $N_B$*. Thus, by Theorem 10, we have* $\{\alpha_1, \alpha_2, \alpha_3, \alpha_4, \alpha_5\}$*, the only justification for* $A \sqsubseteq B$*.*

# **4 Implementation: Computing Justifications**

#### **4.1 SAT-Based Method**

In this section, we briefly describe how PULi [14], the state-of-the-art *glass-box* algorithm, proceeds. Given an ontology $\mathcal{O}$, computing $J_\mathcal{O}(X \sqsubseteq Y)$ is done in two steps: (1) tracing *a complete set* of inferences for $X \sqsubseteq Y$; (2) using resolution to extract the justifications from this complete set. The following example illustrates both steps:

**Example 12 (Example 1 cont'd).** *Let us compute* $J_\mathcal{O}(G \sqsubseteq D)$ *using PULi's method.*

- *(a) The first step translates the inferences into clauses. Let us denote* $\bar{p}_1 : G \sqsubseteq C$*,* $\bar{p}_2 : C \sqsubseteq A$*,* $\bar{p}_3 : A \sqsubseteq D$*,* $p_4 : G \sqsubseteq A$*,* $p_5 : G \sqsubseteq D$*. Here the literals* $\bar{p}_1, \bar{p}_2, \bar{p}_3$ *(with a bar) are called answer literals, as they correspond to the axioms* $a_6, a_7, a_1$ *in* $\mathcal{O}$*. Thus, we obtain* $\mathcal{C} = \{\neg \bar{p}_1 \vee \neg \bar{p}_2 \vee p_4,\ \neg p_4 \vee \neg \bar{p}_3 \vee p_5\}$ *by rewriting the inferences* $\rho_1$*,* $\rho_2$ *as clauses.*
- *(b) Secondly, a new clause* $\neg p_5$ *is added to* $\mathcal{C}$*, where* $p_5$ *corresponds to the conclusion* $G \sqsubseteq D$*, and resolution is applied over* $\mathcal{C}$*. The set of all justifications* $J_\mathcal{O}(G \sqsubseteq D)$ *is obtained by considering (i) the clauses formed of*

<sup>2</sup> For the sake of simplicity, we use the inference rules in Table 1, although PULi uses a slightly different set of inference rules [13].

```
Algorithm 1: minH
  input : X ⊑ Y
  output: J = J_O(X ⊑ Y)
1 J ← ∅;
2 U ← CompleteH(N_X ⇝ʰ N_Y);
3 min_hpaths ← resolution(clauses(U));
4 for h ∈ min_hpaths do
5   if f⁻¹(h') ⊄ f⁻¹(h) for all h' ∈ min_hpaths then
6     J.add(f⁻¹(h))
7   end
8 end
```
*answer literals only and (ii) among them keeping the minimal ones*<sup>3</sup>*. In this example, after the resolution phase, the only clause consisting merely of answer literals is* $\neg \bar{p}_1 \vee \neg \bar{p}_2 \vee \neg \bar{p}_3$*. Thus, the set of all justifications is* $J_\mathcal{O}(G \sqsubseteq D) = \{\{a_1, a_6, a_7\}\}$*.*
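The resolution phase of Example 12 can be replayed with a naive saturation loop; the following Python sketch (a simplified procedure, not PULi's optimized implementation, with our own encoding of literals) recovers the single answer-literal clause over $p_1, p_2, p_3$.

```python
# Naive binary propositional resolution: literals are (name, polarity),
# clauses are frozensets of literals; saturate, then keep clauses made of
# negative answer literals only, then filter to the minimal ones.
from itertools import combinations

def resolve_all(clauses):
    clauses = set(clauses)
    changed = True
    while changed:
        changed = False
        for c1, c2 in combinations(list(clauses), 2):
            for (v, pol) in c1:
                if (v, not pol) in c2:   # complementary pair: resolve on v
                    r = (c1 - {(v, pol)}) | (c2 - {(v, not pol)})
                    if r not in clauses:
                        clauses.add(r)
                        changed = True
    return clauses

p = lambda v: (v, True)
n = lambda v: (v, False)
# Example 12's clauses plus the refuted goal ¬p5:
C = [frozenset({n("p1"), n("p2"), p("p4")}),
     frozenset({n("p4"), n("p3"), p("p5")}),
     frozenset({n("p5")})]
answer = {"p1", "p2", "p3"}
sat = resolve_all(C)
only_answer = [c for c in sat
               if c and all(v in answer and not pol for (v, pol) in c)]
minimal = [c for c in only_answer if not any(d < c for d in only_answer)]
print(minimal)  # the clause ¬p1 ∨ ¬p2 ∨ ¬p3
```

The surviving clause maps back to the axioms $\{a_1, a_6, a_7\}$, matching the justification computed in Example 12.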

Our method for computing justifications follows the same steps as PULi; the major difference is that the first step computes a complete set of H-inferences instead of a complete set of inferences w.r.t. Table 1.

#### **4.2 Computing Justification by Minimal H-Paths**

In this section, given an ontology $\mathcal{O}$ and its associated hypergraph $G_\mathcal{O}$, we present minH (Algorithm 1), which computes all justifications for $X_0 \sqsubseteq Y_0$ using the minimal H-paths from $N_{X_0}$ to $N_{Y_0}$ over $G_\mathcal{O}$. The algorithm minH proceeds in the two steps described below.

*Step 1.* First, at line 2, minH computes a *complete set* of inferences $U$ for $N_{X_0} \stackrel{h}{\rightsquigarrow} N_{Y_0}$ using CompleteH (see Algorithm 2). Here, $U$ is complete in the sense that, for any H-path $H_{X,Y}$, we can derive $N_X \stackrel{h}{\rightsquigarrow} N_Y$ from the edge set $H_{X,Y}$ using inferences in $U$. CompleteH computes $U$ as follows:


*Step 2.* Then minH computes all justifications for $X_0 \sqsubseteq Y_0$ as follows:

<sup>3</sup> Here a clause c is smaller than c<sup>1</sup> if all the literals of c are in c1.

**Algorithm 2:** CompleteH

```
   input : N_X ⇝ʰ N_Y
   output: U: a complete set of inferences for N_X ⇝ʰ N_Y
 1 U, history, Q ← ∅;                       // Q is a queue
 2 Q.add(N_X ⇝ʰ N_Y);
 3 while Q ≠ ∅ do
 4   N_X1 ⇝ʰ N_Y1 ← Q.takeNext();
 5   history.add(N_X1 ⇝ʰ N_Y1);
 6   U ← U ∪ trace_one_turn(N_X1 ⇝ʰ N_Y1);
 7   for N_X2 ⇝ʰ N_Y2 appearing in trace_one_turn(N_X1 ⇝ʰ N_Y1) do
 8     if N_X2 ⇝ʰ N_Y2 ∉ history and N_X2 ⇝ʰ N_Y2 ∉ Q then
 9       Q.add(N_X2 ⇝ʰ N_Y2)
10     end
11   end
12 end
13 for N_X2 ⇝ N_Y2 appearing in U do
14   for p = {e_1, e_2, …, e_n} ∈ path(N_X2, N_Y2) do
15     U.add(⟨{e_1, e_2, …, e_n}, N_X2 ⇝ N_Y2⟩);
16   end
17 end
```


**Example 13 (Example 4 cont'd).** *Assume* $X_0 = G$ *and* $Y_0 = D$ *are the input of minH. Then at line 2 of minH, we have* $U = \{\rho_1, \rho_2\}$*, where* $\rho_1 = \langle\{N_G \rightsquigarrow N_D\}, N_G \stackrel{h}{\rightsquigarrow} N_D\rangle$ *is the H-inference obtained by CompleteH (lines 3–12) and* $\rho_2 = \langle\{e_0, e_1, e_8\}, N_G \rightsquigarrow N_D\rangle$ *is produced from regular paths obtained by CompleteH (lines 13–17). Let us denote* $p_0 : e_0$*,* $p_1 : e_1$*,* $p_2 : e_8$ *as answer literals and* $p_3 : N_G \rightsquigarrow N_D$*,* $p_4 : N_G \stackrel{h}{\rightsquigarrow} N_D$*. Then* clauses($U$) $= \{\neg p_3 \vee p_4,\ \neg p_0 \vee \neg p_1 \vee \neg p_2 \vee p_3\}$*.*

*By resolution over* clauses($U$)*, we obtain* min_hpaths $= \{\{e_0, e_1, e_8\}\}$ *at line 3 of minH. Then the output of minH is* $J = \{\{a_1, a_6, a_7\}\}$*, which is the set of all justifications for* $G \sqsubseteq D$*.*

<sup>4</sup> Available at https://github.com/liveontologies/pinpointing-experiments.

#### **Algorithm 3:** trace_one_turn

```
   input : N_X ⇝ʰ N_Y
   output: the set result of all H-inferences whose conclusion is N_X ⇝ʰ N_Y
 1 result ← ∅;
 2 P1(X, Y) ← {({N_B1, …, N_Bm}, {N_A}) ∈ E_O | O ⊨ X ⊑ A ⊑ Y};
 3 for ({N_B1, …, N_Bm}, {N_A}) ∈ P1(X, Y) do
 4   if path(N_A, N_Y) ≠ ∅ or Y = A then
 5     result.add(⟨{N_X ⇝ʰ N_B1, …, N_X ⇝ʰ N_Bm, N_A ⇝ N_Y}, N_X ⇝ʰ N_Y⟩);
 6   end
 7 end
 8 P2(X, Y) ← {(r, B1, B2) | O ⊨ X ⊑ ∃r.B1, B1 ⊑ B2, ∃r.B2 ⊑ Y};
 9 for (r, B1, B2) ∈ P2(X, Y) do
10   if path(N_∃r.B2, N_Y) ≠ ∅ or Y = ∃r.B2 then
11     result.add(⟨{N_X ⇝ʰ N_∃r.B1, N_B1 ⇝ʰ N_B2, N_∃r.B2 ⇝ N_Y}, N_X ⇝ʰ N_Y⟩);
12   end
13 end
14 P3(X, Y) ← {(r, s, t, A1, A2) | r∘s ⊑ t ∈ O, O ⊨ X ⊑ ∃r.A1, A1 ⊑ ∃s.A2, ∃t.A2 ⊑ Y};
15 for (r, s, t, A1, A2) ∈ P3(X, Y) do
16   if path(N_∃t.A2, N_Y) ≠ ∅ or Y = ∃t.A2 then
17     result.add(⟨{N_X ⇝ʰ N_∃r.A1, N_A1 ⇝ʰ N_∃s.A2, N_∃t.A2 ⇝ N_Y, ({N_r, N_s}, {N_t})}, N_X ⇝ʰ N_Y⟩);
18   end
19 end
```

#### **4.3 Optimization**

Below we present two optimizations that have been implemented in order to accelerate the computation of all justifications.


$$H\_{A, \exists r. B\_1} = H\_{A, \exists r. C} \cup H\_{C, B\_1}. \tag{2}$$

then $H_{C,B_2} = H_{C,B_1} \cup H_{B_1,B_2}$ is also an H-path and $H_{A,B} = H_{A,\exists r.C} \cup H_{C,B_2} \cup P_{\exists r.B_2,B}$. The two ways of decomposing $H_{A,B}$ above are already considered at line 8 when executing Algorithm 3 with the input $N_A \stackrel{h}{\rightsquigarrow} N_B$. This means that the decomposition (2) is redundant. We can avoid such redundancy by requiring $\exists r.B_2 \neq Y$ at line 11.

**Fig. 4.** Illustration of Optimization 1

#### **5 Experiments**

To evaluate and validate our approach, we compare minH<sup>5</sup> with PULi [14], the current state-of-the-art algorithm for computing justifications. Both methods compute all justifications based on resolution, but with different inference rules generated in different ways. PULi uses a complete set (denoted *elk* below) generated by the ELK reasoner [13], which uses inference rules slightly different from those in Table 1. Our method uses the complete set $U$ generated by Step 1 of minH, described in Sect. 4.2. To analyze the performance of our setting, we make the following two comparisons: (1) we compare the size of *elk* with that of $U$; (2) we compare the time cost of PULi with that of minH. All experiments were conducted on a machine with an Intel Xeon 2.6 GHz CPU and 128 GiB of RAM.

The experiments were run on four different ontologies<sup>6</sup>: go-plus, galen7, and SnomedCT (versions Jan. 2015 and Jan. 2021). All non-$\mathcal{EL}^+$ axioms were deleted. Here, go-plus and galen7 are the same ontologies as used in [14]. We denote the four ontologies by go-plus, galen7, snt2015 and snt2021. The numbers of axioms, concepts, relations, and queries for each ontology are shown in Table 3.

Next, a **query** refers to a *direct subsumption*<sup>7</sup> $A \sqsubseteq B$. In our experiments, for the four ontologies, the set of all justifications $J_\mathcal{O}(A \sqsubseteq B)$ is computed for each query $A \sqsubseteq B$. A query $A \sqsubseteq B$ is called **trivial** iff all minimal H-paths from $N_A$ to $N_B$ are regular paths; otherwise, the query is **non-trivial**.

**Comparing Complete Sets:** $U$ **vs.** *elk.* We summarize our results in Table 4 and Fig. 5. Table 4 shows that on all four ontologies, $U$ is much smaller than *elk* on average. On galen7 in particular, *elk* is up to 50 times larger than $U$. The gap is even more pronounced for the median values, since a large proportion of the queries is trivial. However, the gap is much smaller for the maximal values: on snt2021, the largest $U$ is three times larger than the largest *elk*.

<sup>5</sup> A prototype is available at https://gitlab.lisn.upsaclay.fr/yang/minH.

<sup>6</sup> Available at https://osf.io/9sj8n/, https://www.snomed.org/.

<sup>7</sup> I.e., $\mathcal{O} \models A \sqsubseteq B$ and there is no other atomic concept $A'$ such that $\mathcal{O} \models A \sqsubseteq A'$ and $\mathcal{O} \models A' \sqsubseteq B$. Direct subsumptions can be computed by a reasoner supporting ontology classification.


**Table 3.** Summary of sizes of the input ontologies.


**Table 4.** Summary of size of *elk*, U.

In Fig. 5, for a given query, if the complete set *elk* contains fewer inference rules than $U$, the corresponding blue point is below the red line. The percentages of such cases are 0.34% for go-plus, 0.066% for galen7, 0.79% for snt2015, and 1.01% for snt2021. This means that for most queries, the corresponding $U$ is smaller than *elk*.

As shown in Table 4 and Fig. 5, minH sometimes generates a bigger complete set $U$ than PULi. This may happen because, for example, exponentially many different regular paths can occur during the computation of minH, producing a huge complete set. Also, $U$ can be bigger than *elk* when all the regular paths involved are simple. For example, if all regular paths contain only one edge, then the complete set $U$ includes many clauses of the form $\neg p_e \vee p_{N_A \rightsquigarrow N_B}$, which happens because H-rules use regular paths. Such a clause is redundant, since we can omit it by replacing $p_{N_A \rightsquigarrow N_B}$ with $p_e$. For *elk*, this does not happen.

**Comparing Time Cost:** minH **vs. PULi.** In the following, we only compare the time cost on non-trivial queries. For trivial queries, all H-paths are regular paths; thus all the justifications have already been enumerated by path in minH, and it is also easy for PULi to compute all the justifications for trivial queries.

We set a limit of 60 s per query. Each timed-out query contributes 60 s to the total time cost. To compare minH with PULi, we test the three different strategies, *threshold*, *top down* and *bottom up*, of the resolution algorithm proposed in [14]. We summarize in Table 5 the total time cost (top) and the number of timed-out queries (bottom). Figure 6 gives the comparisons over queries that succeed for both minH and PULi.

**Fig. 5.** Each blue point has coordinate (log(#|U|), log(#|*elk*|)), where U, *elk* are generated from a non-trivial query, the red line is x = y. (Color figure online)

As shown in Table 5, when using the threshold strategy, minH is more time-consuming in total (+5%) on snt2021, and minH has more timed-out queries than PULi on snt2015 and snt2021. This is partly because $U$ is larger than *elk* for relatively many queries on snt2015 and snt2021, as shown in Fig. 5. For the remaining 11 cases, minH performs better than PULi in terms of total time cost and number of timed-out queries. On galen7 in particular, the total time cost of PULi is up to ten times that of minH. We can also see from Table 5 that the threshold strategy performs best for PULi on all four ontologies. It is also the best strategy for minH, except on galen7, where the *bottom up* strategy is best.

For each strategy detailed in Fig. 6, the black curve (the ordered time costs of minH on successful queries) is always below the red curve (the ordered time costs of PULi on successful queries) for all the ontologies. This suggests that minH spends less time on successful queries. Also, most of the green points are below the red lines, which suggests that minH performs better than PULi on most individual queries. In some cases, PULi is more efficient than minH. One possible reason is the following: when computing justifications by resolution, we have to compare pairs of clauses and delete the redundant (i.e., non-minimal) ones. When regular paths are long, minH can be time-consuming because of these comparisons.

**Fig. 6.** For each row, the left, middle and right charts correspond to the *threshold, top down, bottom up* strategies, respectively. The y-axis is the log value of time (s). The red (resp. black) curve presents the ascending ordered (log value of) time cost of PULi (resp. minH). For a green point $(x, y)$, $e^y$ is the time cost of minH for the query corresponding to the red curve point $(x, y')$. (Color figure online)


**Table 5.** Total time cost and number of timed-out queries.

## **6 Conclusion**

In this paper, we introduce and investigate a new set of sound and complete inference rules based on a hypergraph representation of ontologies. We design the algorithm minH, which leverages these inference rules to compute all justifications for a given conclusion. The key to the performance of our method is that regular paths are used as the elementary components of H-paths, which reduces the size of complete sets because (1) our rules are more compact than the standard ones, and (2) redundant inferences are captured and eliminated by regular paths (see Sect. 4.3). The efficiency of minH has been validated by our experiments, which show that it outperforms PULi in most cases.

There are still many possible extensions and applications of the hypergraph approach. For instance, to obtain even more compact inference rules, we could generalize the notion of regular path so that it encapsulates the inference rule $\mathcal{H}_2$ in the same way as regular paths are encapsulated in H-rules. Moreover, we plan to apply our approach to other tasks such as classification and computing logical differences [15].

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Choices, Invariance, Substitutions, and Formalizations**

# **Sequent Calculi for Choice Logics**

Michael Bernreiter1(B), Anela Lolic<sup>1</sup>, Jan Maly<sup>2</sup>, and Stefan Woltran<sup>1</sup>

<sup>1</sup> Institute of Logic and Computation, TU Wien, Vienna, Austria

{mbernrei,alolic,woltran}@dbai.tuwien.ac.at

<sup>2</sup> Institute for Logic, Language and Computation, University of Amsterdam, Amsterdam, The Netherlands

j.f.maly@uva.nl

**Abstract.** Choice logics constitute a family of propositional logics and are used for the representation of preferences, with especially *qualitative choice logic* (QCL) being an established formalism with numerous applications in artificial intelligence. While computational properties and applications of choice logics have been studied in the literature, only a few results are known about the proof-theoretic aspects of their use. We propose a sound and complete sequent calculus for preferred model entailment in QCL, where a formula F is entailed by a QCL-theory T if F is true in all preferred models of T. The calculus is based on labeled sequent and refutation calculi, and can be easily adapted for different purposes. For instance, using the calculus as a cornerstone, calculi for other choice logics such as *conjunctive choice logic* (CCL) can be obtained in a straightforward way.

### **1 Introduction**

Choice logics are propositional logics for the representation of alternative options for problem solutions [4]. These logics add new connectives to classical propositional logic that allow for the formalization of ranked options. A prominent example is *qualitative choice logic* (QCL for short) [7], which adds the connective *ordered disjunction* $\vec{\times}$ to classical propositional logic. Intuitively, $A \mathbin{\vec{\times}} B$ means: if possible $A$, but if $A$ is not possible, then at least $B$. The semantics of a choice logic induces a preference ordering over the models of a formula.
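As an illustration of how ordered disjunction ranks models, the following Python sketch computes satisfaction degrees for negation-free formulas, following the QCL degree and optionality semantics of [7] as we recall them; the encoding, helper names, and the restriction to negation-free formulas are ours.

```python
# Sketch of QCL satisfaction degrees (negation-free fragment, as we recall
# the semantics of [7]): deg(F ×⃗ G) = deg(F) if F is satisfied, and
# opt(F) + deg(G) if only G is, where opt counts the nested options.
INF = float("inf")  # marks "not satisfied"

def opt(f):
    op = f[0]
    if op == "atom":
        return 1
    if op in ("and", "or"):
        return max(opt(f[1]), opt(f[2]))
    if op == "ox":                        # ordered disjunction ×⃗
        return opt(f[1]) + opt(f[2])
    raise ValueError(op)

def deg(f, interp):
    op = f[0]
    if op == "atom":
        return 1 if f[1] in interp else INF
    l, r = deg(f[1], interp), deg(f[2], interp)
    if op == "and":
        return max(l, r)                  # INF if either conjunct fails
    if op == "or":
        return min(l, r)
    if op == "ox":
        return l if l < INF else (opt(f[1]) + r if r < INF else INF)
    raise ValueError(op)

a, b = ("atom", "a"), ("atom", "b")
f = ("ox", a, b)                          # a ×⃗ b: prefer a, else b
print(deg(f, {"a"}), deg(f, {"b"}), deg(f, set()))  # 1 2 inf
```

Preferred models are then the models of minimal degree, which is what makes entailment over them non-monotonic.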

As choice logics are well suited for preference handling, they have a multitude of applications in AI such as logic programming [8], alert correlation [3], or database querying [13]. But while computational properties and applications of choice logics have been studied in the literature, only a few results are known about the proof-theoretic aspects of their use. In particular, there is no proof system capable of deriving valid sentences containing choice operators. In this paper we propose a sound and complete calculus for preferred model entailment in QCL that can easily be generalized to other choice logics.

Entailment in choice logics is non-monotonic: conclusions that have been drawn might not be derivable in light of new information. It is therefore not surprising that choice logics are related to other non-monotonic formalisms. For instance, it is known [7] that QCL can capture propositional circumscription and that, if additional symbols in the language are admitted, circumscription can be used to generate models corresponding to the inclusion-preferred QCL models up to the additional atoms. We do not intend to use this translation of our choice logic formulas (or sequents) in order to employ an existing calculus for circumscription, for instance [5].

Instead, we define calculi in sequent format directly for choice logics, which are different from existing non-monotonic logics in the way non-monotonicity is introduced. Specifically, the non-standard part of our logics is a new logical connective which is fully embedded in the logical language. For this reason, calculi for choice logics also differ from most other calculi for non-monotonic logics: our calculi do not use non-standard inference rules as in default logic, modal operators expressing consistency or belief as in autoepistemic logic, or predicates whose extensions are minimized as in circumscription. However, one method that can also be applied to choice logics is the use of a refutation calculus (also known as rejection or antisequent calculus) axiomatising invalid formulas, i.e., non-theorems. Refutation calculi for non-monotonic logics were used in [5]. Specifically, by combining a refutation calculus with an appropriate sequent calculus, elegant proof systems for the central non-monotonic formalisms of default logic [16], autoepistemic logic [15], and circumscription [14] were obtained. However, to apply this idea to choice logics, we have to take another facet of their semantics into account.

With choice logics, we are working in a setting similar to many-valued logics. Interpretations ascribe a natural number called satisfaction degree to choice logic formulas. Preferred models of a formula are then those models with the least degree. There are several kinds of sequent calculus systems for many-valued logics, where the representation as a hypersequent calculus [1,10] plays a prominent role. However, there are crucial differences between choice logics and many-valued logics in the usual sense. Firstly, choice logic interpretations are classical, i.e., they set propositional variables to either true or false. Secondly, non-classical satisfaction degrees only arise when choice connectives, e.g. ordered disjunction in QCL, occur in a formula. Thirdly, when applying a choice connective ◦ to two formulas A and B, the degree of A ◦ B does not only depend on the degrees of A and B, but also on the maximum degrees that A and B can possibly assume. Therefore, techniques used in proof systems for conventional many-valued logics cannot be applied directly to choice logics.

In [11], a sequent-calculus-based system for reasoning with contrary-to-duty obligations was introduced, in which a non-classical connective was defined to capture the notion of reparational obligation, which is in force only when a violation of a norm occurs. This is related to the ordered disjunction in QCL; however, based on the intended use in [11], the system was defined only for occurrences of the new connective on the right side of the sequent sign. We aim for a proof system for reasoning with choice logic operators, and to deduce formulas from choice logic formulas. Thus, we need a calculus with left and right inference rules.

To obtain such a calculus we combine the idea of a refutation calculus with methods developed for many-valued logics in a novel way. First, we develop a (monotonic) sequent calculus for reasoning about satisfaction degrees using a labeled calculus, a method developed for (finite) many-valued logics [2,9,12]. Secondly, we define a labeled refutation calculus for reasoning about invalidity in terms of satisfaction degrees. Finally, we join both calculi to obtain a sequent calculus for the non-monotonic entailment of QCL. To this end, we introduce a new, non-monotonic inference rule that has sequents of the two labeled calculi as premises and formalizes degree minimization.

The rest of this paper is organized as follows. In the next section we present the basic notions of choice logics and introduce the most prominent choice logics QCL and CCL (*conjunctive choice logic*). In Sect. 3 we develop a labeled sequent calculus for propositional logic extended by the QCL connective $\vec{\times}$. This calculus is shown to be sound and complete and can already be used to derive interesting sentences containing choice operators. In Sect. 4 we extend the previously defined sequent calculus with an appropriate refutation calculus and non-monotonic reasoning, to capture entailment in QCL. The developed methodology for QCL can be extended to other choice logics as well. In particular we show in Sect. 5 how the calculi can be adapted for CCL.

## **2 Choice Logics**

First, we formally define the notion of choice logics in accordance with the choice logic framework of [4] before giving concrete examples in the form of QCL and CCL. Finally, we define preferred model entailment.

**Definition 1.** *Let* $\mathcal{U}$ *denote the alphabet of propositional variables. The set of choice connectives* $\mathcal{C}_L$ *of a choice logic* L *is a finite set of symbols such that* $\mathcal{C}_L \cap \{\neg, \wedge, \vee\} = \emptyset$*. The set* $\mathcal{F}_L$ *of formulas of* L *is defined inductively as follows: (i)* $a \in \mathcal{F}_L$ *for all* $a \in \mathcal{U}$*; (ii) if* $F \in \mathcal{F}_L$*, then* $(\neg F) \in \mathcal{F}_L$*; (iii) if* $F, G \in \mathcal{F}_L$*, then* $(F \circ G) \in \mathcal{F}_L$ *for* $\circ \in (\{\wedge, \vee\} \cup \mathcal{C}_L)$*.*

For example, $\mathcal{C}_{\mathrm{QCL}} = \{\vec{\times}\}$ and $((a \vec{\times} c) \wedge (b \vec{\times} c)) \in \mathcal{F}_{\mathrm{QCL}}$. Formulas that do not contain a choice connective are referred to as classical formulas.

The semantics of a choice logic is given by two functions, satisfaction degree and optionality. The satisfaction degree of a formula given an interpretation is either a natural number or ∞. The lower this degree, the more preferable the interpretation. The optionality of a formula describes the maximum finite satisfaction degree that this formula can be ascribed, and is used to penalize non-satisfaction.

**Definition 2.** *The optionality of a choice connective* $\circ \in \mathcal{C}_L$ *in a choice logic* L *is given by a function* $opt^{\circ}_L : \mathbb{N}^2 \to \mathbb{N}$ *such that* $opt^{\circ}_L(k, \ell) \le (k+1) \cdot (\ell+1)$ *for all* $k, \ell \in \mathbb{N}$*. The optionality of an* L*-formula is given via* $opt_L : \mathcal{F}_L \to \mathbb{N}$ *with (i)* $opt_L(a) = 1$ *for every* $a \in \mathcal{U}$*; (ii)* $opt_L(\neg F) = 1$*; (iii)* $opt_L(F \wedge G) = opt_L(F \vee G) = \max(opt_L(F), opt_L(G))$*; (iv)* $opt_L(F \circ G) = opt^{\circ}_L(opt_L(F), opt_L(G))$ *for every choice connective* $\circ \in \mathcal{C}_L$*.*

The optionality of a classical formula is always 1. Note that, for any choice connective $\circ$, the optionality of $F \circ G$ is bounded such that $opt_L(F \circ G) \le (opt_L(F) + 1) \cdot (opt_L(G) + 1)$. In the following, we write $\overline{\mathbb{N}}$ for $\mathbb{N} \cup \{\infty\}$.

**Definition 3.** *The satisfaction degree of a choice connective* $\circ \in \mathcal{C}_L$ *in a choice logic* L *is given by a function* $deg^{\circ}_L : \mathbb{N}^2 \times \overline{\mathbb{N}}^2 \to \overline{\mathbb{N}}$ *such that* $deg^{\circ}_L(k, \ell, m, n) \le opt^{\circ}_L(k, \ell)$ *or* $deg^{\circ}_L(k, \ell, m, n) = \infty$ *for all* $k, \ell \in \mathbb{N}$ *and all* $m, n \in \overline{\mathbb{N}}$*. The satisfaction degree of an* L*-formula under an interpretation* $\mathcal{I} \subseteq \mathcal{U}$ *is given via the function* $deg_L : 2^{\mathcal{U}} \times \mathcal{F}_L \to \overline{\mathbb{N}}$ *with*

$$
\begin{aligned}
deg_L(\mathcal{I}, a) &= \begin{cases} 1 & \text{if } a \in \mathcal{I} \\ \infty & \text{otherwise} \end{cases}
&
deg_L(\mathcal{I}, \neg F) &= \begin{cases} 1 & \text{if } deg_L(\mathcal{I}, F) = \infty \\ \infty & \text{otherwise} \end{cases}
\\
deg_L(\mathcal{I}, F \wedge G) &= \max(deg_L(\mathcal{I}, F), deg_L(\mathcal{I}, G))
&
deg_L(\mathcal{I}, F \vee G) &= \min(deg_L(\mathcal{I}, F), deg_L(\mathcal{I}, G))
\end{aligned}
$$

$$
deg_L(\mathcal{I}, F \circ G) = deg^{\circ}_L(opt_L(F), opt_L(G), deg_L(\mathcal{I}, F), deg_L(\mathcal{I}, G)) \text{ for every } \circ \in \mathcal{C}_L.
$$
We also write $\mathcal{I} \models^m_L F$ for $deg_L(\mathcal{I}, F) = m$. If $m < \infty$, we say that $\mathcal{I}$ satisfies F (to a finite degree), and if $m = \infty$, then $\mathcal{I}$ does not satisfy F. If F is a classical formula, then $\mathcal{I} \models^1_L F \iff \mathcal{I} \models F$ and $\mathcal{I} \models^{\infty}_L F \iff \mathcal{I} \not\models F$. The symbols $\top$ and $\bot$ are shorthand for the formulas $(a \vee \neg a)$ and $(a \wedge \neg a)$, where a can be any variable. We have $opt_L(\top) = opt_L(\bot) = 1$, $deg_L(\mathcal{I}, \top) = 1$, and $deg_L(\mathcal{I}, \bot) = \infty$ for any interpretation $\mathcal{I}$ in every choice logic.

Models and preferred models of formulas are defined in the following way:

**Definition 4.** *Let* L *be a choice logic,* $\mathcal{I}$ *an interpretation, and* F *an* L*-formula.* $\mathcal{I}$ *is a model of* F*, written as* $\mathcal{I} \in Mod_L(F)$*, if* $deg_L(\mathcal{I}, F) < \infty$*.* $\mathcal{I}$ *is a preferred model of* F*, written as* $\mathcal{I} \in Prf_L(F)$*, if* $\mathcal{I} \in Mod_L(F)$ *and* $deg_L(\mathcal{I}, F) \le deg_L(\mathcal{J}, F)$ *for all other interpretations* $\mathcal{J}$*.*

Moreover, we define the notion of classical counterparts for choice connectives.

**Definition 5.** *Let* L *be a choice logic. The classical counterpart of a choice connective* $\circ \in \mathcal{C}_L$ *is the classical binary connective* $\circ'$ *such that, for all atoms* a *and* b*,* $deg_L(\mathcal{I}, a \circ b) < \infty \iff \mathcal{I} \models a \circ' b$*. The classical counterpart of an* L*-formula* F *is denoted as* $cp(F)$ *and is obtained by replacing all occurrences of choice connectives in* F *by their classical counterparts.*

A natural property of known choice logics is that choice connectives can be replaced by their classical counterpart without affecting satisfiability, meaning that $deg_L(\mathcal{I}, F) < \infty \iff \mathcal{I} \models cp(F)$ holds for all L-formulas F.

So far we introduced choice logics in a quite abstract way. We now introduce two particular instantiations, namely QCL, the first and most prominent choice logic in the literature, and CCL, which introduces a connective $\vec{\odot}$ called *ordered conjunction* in place of QCL's ordered disjunction.

**Definition 6.** QCL *is the choice logic such that* $\mathcal{C}_{\mathrm{QCL}} = \{\vec{\times}\}$*, and, if* $k = opt_{\mathrm{QCL}}(F)$*,* $\ell = opt_{\mathrm{QCL}}(G)$*,* $m = deg_{\mathrm{QCL}}(\mathcal{I}, F)$*, and* $n = deg_{\mathrm{QCL}}(\mathcal{I}, G)$*, then*

$$
\begin{aligned}
opt_{\mathrm{QCL}}(F \vec{\times} G) &= opt^{\vec{\times}}_{\mathrm{QCL}}(k, \ell) = k + \ell, \text{ and} \\
deg_{\mathrm{QCL}}(\mathcal{I}, F \vec{\times} G) &= deg^{\vec{\times}}_{\mathrm{QCL}}(k, \ell, m, n) = \begin{cases} m & \text{if } m < \infty; \\ n + k & \text{if } m = \infty, n < \infty; \\ \infty & \text{otherwise.} \end{cases}
\end{aligned}
$$

In the above definition, we can see how optionality is used to penalize non-satisfaction: given a QCL-formula $F \vec{\times} G$ and an interpretation $\mathcal{I}$, if $\mathcal{I}$ satisfies F (to some finite degree), then $deg_{\mathrm{QCL}}(\mathcal{I}, F \vec{\times} G) = deg_{\mathrm{QCL}}(\mathcal{I}, F)$; if $\mathcal{I}$ does not satisfy F, then $deg_{\mathrm{QCL}}(\mathcal{I}, F \vec{\times} G) = opt_{\mathrm{QCL}}(F) + deg_{\mathrm{QCL}}(\mathcal{I}, G)$. Since $deg_{\mathrm{QCL}}(\mathcal{I}, F) \le opt_{\mathrm{QCL}}(F)$, interpretations that satisfy F result in a lower degree, i.e., are more preferable, compared to interpretations that do not satisfy F. Let us take a look at a concrete example:

*Example 1.* Consider the QCL-formula $F = (a \vec{\times} c) \wedge (b \vec{\times} c)$. Note that the classical counterpart of $\vec{\times}$ is $\vee$, i.e., $cp(F) = (a \vee c) \wedge (b \vee c)$. Thus, $\{c\}, \{a, b\}, \{a, c\}, \{b, c\}, \{a, b, c\} \in Mod_{\mathrm{QCL}}(F)$. Of these models, $\{a, b\}$ and $\{a, b, c\}$ satisfy F to a degree of 1 while $\{c\}$, $\{a, c\}$, and $\{b, c\}$ satisfy F to a degree of 2. Therefore, $\{a, b\}, \{a, b, c\} \in Prf_{\mathrm{QCL}}(F)$.
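Definitions 2, 3, 4, and 6 can be read as a small interpreter. The following Python sketch brute-forces Example 1; the tuple encoding and all function names (`opt`, `deg`, `preferred_models`) are our own and not part of the paper:

```python
from itertools import chain, combinations

INF = float("inf")

# Formulas as nested tuples: ("var", x), ("neg", F), ("and", F, G),
# ("or", F, G), ("ord", F, G) -- "ord" encodes QCL's ordered disjunction.

def opt(f):
    """Optionality (Definition 2, instantiated for QCL)."""
    if f[0] in ("var", "neg"):
        return 1
    if f[0] in ("and", "or"):
        return max(opt(f[1]), opt(f[2]))
    return opt(f[1]) + opt(f[2])  # ordered disjunction: k + l

def deg(i, f):
    """Satisfaction degree of f under interpretation i, a set of variables."""
    if f[0] == "var":
        return 1 if f[1] in i else INF
    if f[0] == "neg":
        return 1 if deg(i, f[1]) == INF else INF
    if f[0] == "and":
        return max(deg(i, f[1]), deg(i, f[2]))
    if f[0] == "or":
        return min(deg(i, f[1]), deg(i, f[2]))
    m, n = deg(i, f[1]), deg(i, f[2])  # "ord", Definition 6
    return m if m < INF else (opt(f[1]) + n if n < INF else INF)

def interpretations(vs):
    return [set(s) for s in
            chain.from_iterable(combinations(vs, r) for r in range(len(vs) + 1))]

def preferred_models(f, vs):
    """Definition 4: models with the least satisfaction degree."""
    models = [i for i in interpretations(vs) if deg(i, f) < INF]
    best = min(deg(i, f) for i in models)
    return [i for i in models if deg(i, f) == best]

a, b, c = ("var", "a"), ("var", "b"), ("var", "c")
F = ("and", ("ord", a, c), ("ord", b, c))  # (a x-> c) and (b x-> c)
print(sorted(map(sorted, preferred_models(F, "abc"))))  # [['a', 'b'], ['a', 'b', 'c']]
```

This recovers exactly the preferred models stated in Example 1.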

Next, we define CCL. Note that we follow the revised definition of CCL [4], which differs from the initial specification<sup>1</sup>. Intuitively, given a CCL-formula $F \vec{\odot} G$ it is best to satisfy both F and G, but also acceptable to satisfy only F.

**Definition 7.** CCL *is the choice logic such that* $\mathcal{C}_{\mathrm{CCL}} = \{\vec{\odot}\}$*, and, if* $k = opt_{\mathrm{CCL}}(F)$*,* $\ell = opt_{\mathrm{CCL}}(G)$*,* $m = deg_{\mathrm{CCL}}(\mathcal{I}, F)$*, and* $n = deg_{\mathrm{CCL}}(\mathcal{I}, G)$*, then*

$$
\begin{aligned}
opt_{\mathrm{CCL}}(F \vec{\odot} G) &= k + \ell, \text{ and} \\
deg_{\mathrm{CCL}}(\mathcal{I}, F \vec{\odot} G) &= \begin{cases} n & \text{if } m = 1, n < \infty; \\ m + \ell & \text{if } m < \infty \text{ and } (m > 1 \text{ or } n = \infty); \\ \infty & \text{otherwise.} \end{cases}
\end{aligned}
$$

*Example 2.* Consider the CCL-formula $G = (a \vec{\odot} c) \wedge (b \vec{\odot} c)$. Note that the classical counterpart of $\vec{\odot}$ is the first projection, i.e., $cp(G) = a \wedge b$. Thus, $\{a, b\}, \{a, b, c\} \in Mod_{\mathrm{CCL}}(G)$. Of these models, $\{a, b, c\}$ satisfies G to a degree of 1 while $\{a, b\}$ satisfies G to a degree of 2. Therefore, $\{a, b, c\} \in Prf_{\mathrm{CCL}}(G)$.
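Under the same tuple encoding as before, Definition 7 can be checked against Example 2; the encoding and function names are again our own sketch:

```python
from itertools import chain, combinations

INF = float("inf")

# Formulas: ("var", x), ("and", F, G), ("cord", F, G) -- "cord" encodes
# CCL's ordered conjunction; only the connectives needed below are included.

def opt(f):
    if f[0] == "var":
        return 1
    if f[0] == "and":
        return max(opt(f[1]), opt(f[2]))
    return opt(f[1]) + opt(f[2])  # ordered conjunction: k + l

def deg(i, f):
    if f[0] == "var":
        return 1 if f[1] in i else INF
    if f[0] == "and":
        return max(deg(i, f[1]), deg(i, f[2]))
    m, n, l = deg(i, f[1]), deg(i, f[2]), opt(f[2])  # Definition 7
    if m == 1 and n < INF:
        return n
    if m < INF and (m > 1 or n == INF):
        return m + l
    return INF

a, b, c = ("var", "a"), ("var", "b"), ("var", "c")
G = ("and", ("cord", a, c), ("cord", b, c))
print(deg(set("abc"), G))  # 1: both ordered conjunctions fully satisfied
print(deg(set("ab"), G))   # 2: a and b hold but c does not
print(deg(set("a"), G))    # inf: the second conjunct is not satisfied at all
```

The first two outputs match the degrees reported in Example 2.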

If L is a choice logic, then a set of L-formulas is called an L-theory. An L-theory T entails a classical formula F, written as T |∼ F, if F is true in all preferred models of T. However, we first need to define what the preferred models of a choice logic theory are. There are several approaches for this. In the original QCL paper [7], a lexicographic and an inclusion-based approach were introduced.

<sup>1</sup> It seems that, under the initial definition of CCL, $a \vec{\odot} b$ is always ascribed a degree of 1 or $\infty$, i.e., non-classical degrees cannot be obtained (cf. Definition 8 in [6]).

**Definition 8.** *Let* L *be a choice logic,* $\mathcal{I}$ *an interpretation, and* T *an* L*-theory.* $\mathcal{I} \in Mod_L(T)$ *if* $deg_L(\mathcal{I}, F) < \infty$ *for all* $F \in T$*.* $\mathcal{I}^k_L(T)$ *denotes the set of formulas in* T *satisfied to a degree of* k *by* $\mathcal{I}$*, i.e.,* $\mathcal{I}^k_L(T) = \{F \in T \mid deg_L(\mathcal{I}, F) = k\}$*.*


In our calculus for preferred model entailment we focus on the lexicographic approach, but it will become clear how it can be adapted to other preferred model semantics (see Sect. 4). We now formally define preferred model entailment:

**Definition 9.** *Let* L *be a choice logic,* T *an* L*-theory,* S *a classical theory, and* $\sigma \in \{lex, inc\}$*.* $T \mid\sim^{\sigma}_L S$ *if for all* $\mathcal{I} \in Prf^{\sigma}_L(T)$ *there is* $F \in S$ *such that* $\mathcal{I} \models F$*.*

*Example 3.* Consider the QCL-theory $T = \{\neg(a \wedge b), a \vec{\times} c, b \vec{\times} c\}$. Then $\{c\}, \{a, c\}, \{b, c\} \in Mod_{\mathrm{QCL}}(T)$. Note that, because of $\neg(a \wedge b)$, a model of T cannot satisfy both $a \vec{\times} c$ and $b \vec{\times} c$ to a degree of 1. Specifically,

$$
\begin{aligned}
\{a,c\}^1_{\mathrm{QCL}}(T) &= \{\neg(a \wedge b), a \vec{\times} c\} \text{ and } \{a,c\}^2_{\mathrm{QCL}}(T) = \{b \vec{\times} c\}, \\
\{b,c\}^1_{\mathrm{QCL}}(T) &= \{\neg(a \wedge b), b \vec{\times} c\} \text{ and } \{b,c\}^2_{\mathrm{QCL}}(T) = \{a \vec{\times} c\}, \\
\{c\}^1_{\mathrm{QCL}}(T) &= \{\neg(a \wedge b)\} \text{ and } \{c\}^2_{\mathrm{QCL}}(T) = \{a \vec{\times} c, b \vec{\times} c\}.
\end{aligned}
$$

Thus, $\{a, c\}, \{b, c\} \in Prf^{lex}_{\mathrm{QCL}}(T)$ but $\{c\} \notin Prf^{lex}_{\mathrm{QCL}}(T)$. It can be concluded that $T \mid\sim^{lex}_{\mathrm{QCL}} c \wedge (a \vee b)$. However, $T \not\mid\sim^{lex}_{\mathrm{QCL}} a$ and $T \not\mid\sim^{lex}_{\mathrm{QCL}} b$.

It is easy to see that preferred model entailment is non-monotonic. For example, $\{a \vec{\times} b\} \mid\sim^{lex}_{\mathrm{QCL}} a$ but $\{a \vec{\times} b, \neg a\} \not\mid\sim^{lex}_{\mathrm{QCL}} a$.
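The formal definition of $Prf^{lex}$ from [7] is not reproduced in this excerpt; the sketch below encodes the usual lexicographic reading, under which interpretations are compared by how many formulas of T they satisfy to degree 1, then to degree 2, and so on. This assumption is ours, as are all function names; it suffices to replicate Example 3 and the non-monotonicity example:

```python
from itertools import chain, combinations

INF = float("inf")

# Formula encoding: ("var", x), ("neg", F), ("and", F, G), ("or", F, G),
# ("ord", F, G) -- "ord" is QCL's ordered disjunction.

def opt(f):
    if f[0] in ("var", "neg"):
        return 1
    if f[0] in ("and", "or"):
        return max(opt(f[1]), opt(f[2]))
    return opt(f[1]) + opt(f[2])

def deg(i, f):
    if f[0] == "var":
        return 1 if f[1] in i else INF
    if f[0] == "neg":
        return 1 if deg(i, f[1]) == INF else INF
    if f[0] == "and":
        return max(deg(i, f[1]), deg(i, f[2]))
    if f[0] == "or":
        return min(deg(i, f[1]), deg(i, f[2]))
    m, n = deg(i, f[1]), deg(i, f[2])
    return m if m < INF else (opt(f[1]) + n if n < INF else INF)

def interpretations(vs):
    return [set(s) for s in
            chain.from_iterable(combinations(vs, r) for r in range(len(vs) + 1))]

def prf_lex(theory, vs):
    """Lexicographically preferred models: maximize the counts
    (|I^1(T)|, |I^2(T)|, ...) in lexicographic order."""
    models = [i for i in interpretations(vs)
              if all(deg(i, f) < INF for f in theory)]
    top = max(opt(f) for f in theory)
    vec = lambda i: tuple(sum(1 for f in theory if deg(i, f) == k)
                          for k in range(1, top + 1))
    best = max(vec(i) for i in models)
    return [i for i in models if vec(i) == best]

def entails_lex(theory, g, vs):
    return all(deg(i, g) < INF for i in prf_lex(theory, vs))

a, b, c = ("var", "a"), ("var", "b"), ("var", "c")
T = [("neg", ("and", a, b)), ("ord", a, c), ("ord", b, c)]
print(entails_lex(T, ("and", c, ("or", a, b)), "abc"))   # True  (Example 3)
print(entails_lex(T, a, "abc"))                          # False (Example 3)
print(entails_lex([("ord", a, b)], a, "ab"))             # True
print(entails_lex([("ord", a, b), ("neg", a)], a, "ab")) # False (non-monotonic)
```

The last two lines exhibit the non-monotonicity: adding $\neg a$ retracts the conclusion a.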

## **3 The Sequent Calculus L[QCL]**

As a first step towards a calculus for preferred model entailment, we propose a labeled calculus [2,12] for reasoning about the satisfaction degrees of QCL formulas in sequent format and prove its soundness and completeness. One advantage of the sequent calculus format is having symmetrical left and right rules for all connectives, in particular for the choice connectives. This is in contrast to the representation of ordered disjunction in the calculus for deontic logic [11], in which only right-hand side rules are considered.

As the calculus will be concerned with satisfaction degrees rather than preferred models, we need to define entailment in terms of satisfaction degrees. To this end, the formulas occurring in the sequents of our calculus are labeled with natural numbers, i.e., they are of the form $(A)_k$, where A is a choice logic formula and $k \in \mathbb{N}$. $(A)_k$ is satisfied by those interpretations that satisfy A to a degree of k. Instead of labeling formulas with degree $\infty$ we use the negated formula, i.e., instead of $(A)_{\infty}$ we use $(\neg A)_1$. We observe that $(A)_k$ for $k > opt_L(A)$ can never have a model. We will deal with such formulas by replacing them with $(\bot)_1$. For classical formulas, we may write A for $(A)_1$.

**Definition 10.** *Let* $(A_1)_{k_1}, \ldots, (A_m)_{k_m}$ *and* $(B_1)_{l_1}, \ldots, (B_n)_{l_n}$ *be labeled* QCL*-formulas. Then* $(A_1)_{k_1}, \ldots, (A_m)_{k_m} \vdash (B_1)_{l_1}, \ldots, (B_n)_{l_n}$ *is a labeled* QCL*-sequent.* $\Gamma \vdash \Delta$ *is valid iff every interpretation that satisfies all labeled formulas in* $\Gamma$ *to the degree specified by the label also satisfies at least one labeled formula in* $\Delta$ *to the degree specified by the label.*

Note that entailment in terms of satisfaction degrees, as defined above, is monotonic. Frequently we will write $(A)_{<k}$ as shorthand for $(A)_1, \ldots, (A)_{k-1}$ and $(A)_{>k}$ for $(A)_{k+1}, \ldots, (A)_{opt_{\mathrm{QCL}}(A)}, (\neg A)_1$. Moreover, $\langle \Gamma, (A)_i \vdash \Delta \rangle_{i<k}$ denotes the sequence of sequents

$$
\Gamma, (A)_1 \vdash \Delta \quad \ldots \quad \Gamma, (A)_{k-1} \vdash \Delta.
$$

Analogously, $\langle \Gamma, (A)_i \vdash \Delta \rangle_{i>k}$ stands for the sequence of sequents $\Gamma, (A)_{k+1} \vdash \Delta \quad \ldots \quad \Gamma, (A)_{opt_{\mathrm{QCL}}(A)} \vdash \Delta \quad \Gamma, (\neg A)_1 \vdash \Delta$.

We define the sequent calculus **L**[QCL] over labeled sequents below. In addition to introducing inference rules for $\vec{\times}$ we have to modify the inference rules for conjunction and disjunction of propositional **LK**. The idea behind the ∨-left rule is that a model M of $(A)_k$ is only a model of $(A \vee B)_k$ if there is no $l < k$ s.t. M is a model of $(B)_l$. Therefore, every model of $(A \vee B)_k$ is a model of $\Delta$ iff

$$
\Gamma, (A)_k \vdash (B)_{<k}, \Delta \quad \text{and} \quad \Gamma, (B)_k \vdash (A)_{<k}, \Delta
$$

are valid.
Essentially the same idea works for ∧-left but with $l > k$. For the ∨-right rule, in order for every model of $\Gamma$ to be a model of $(A \vee B)_k$, every model of $\Gamma$ must either be a model of $(A)_k$ or of $(B)_k$, and no model of $\Gamma$ can be a model of $(A)_l$ for $l < k$, i.e., $\Gamma, (A)_l \vdash \bot$. Similarly for ∧-right.

**Definition 11 (L**[QCL]**).** *The axioms of* **L**[QCL] *are of the form* $(p)_1 \vdash (p)_1$ *for propositional variables* p*. The inference rules are given below. For the structural and logical rules, whenever a labeled formula* $(F)_k$ *appears in the conclusion of an inference rule it holds that* $k \le opt_L(F)$*.*

*The structural rules are:*

$$
\frac{\Gamma \vdash \Delta}{\Gamma, (A)_k \vdash \Delta}\ wl \qquad
\frac{\Gamma \vdash \Delta}{\Gamma \vdash (A)_k, \Delta}\ wr \qquad
\frac{\Gamma, (A)_k, (A)_k \vdash \Delta}{\Gamma, (A)_k \vdash \Delta}\ cl \qquad
\frac{\Gamma \vdash (A)_k, (A)_k, \Delta}{\Gamma \vdash (A)_k, \Delta}\ cr
$$

*The logical rules are:*

$$
\frac{\Gamma \vdash (cp(A))_1, \Delta}{\Gamma, (\neg A)_1 \vdash \Delta}\ \neg l \qquad
\frac{\Gamma, (cp(A))_1 \vdash \Delta}{\Gamma \vdash (\neg A)_1, \Delta}\ \neg r
$$

$$
\frac{\Gamma, (A)_k \vdash (B)_{<k}, \Delta \qquad \Gamma, (B)_k \vdash (A)_{<k}, \Delta}{\Gamma, (A \vee B)_k \vdash \Delta}\ \vee l \qquad
\frac{\langle \Gamma, (A)_i \vdash \Delta \rangle_{i<k} \quad \langle \Gamma, (B)_i \vdash \Delta \rangle_{i<k} \quad \Gamma \vdash (A)_k, (B)_k, \Delta}{\Gamma \vdash (A \vee B)_k, \Delta}\ \vee r
$$

$$
\frac{\Gamma, (A)_k \vdash (B)_{>k}, \Delta \qquad \Gamma, (B)_k \vdash (A)_{>k}, \Delta}{\Gamma, (A \wedge B)_k \vdash \Delta}\ \wedge l \qquad
\frac{\langle \Gamma, (A)_i \vdash \Delta \rangle_{i>k} \quad \langle \Gamma, (B)_i \vdash \Delta \rangle_{i>k} \quad \Gamma \vdash (A)_k, (B)_k, \Delta}{\Gamma \vdash (A \wedge B)_k, \Delta}\ \wedge r
$$

*The rules for ordered disjunction, with* k ≤ *opt*L(A) *and* l ≤ *opt*L(B)*, are:*

$$
\frac{\Gamma, (A)_k \vdash \Delta}{\Gamma, (A \vec{\times} B)_k \vdash \Delta}\ \vec{\times}l_1 \qquad
\frac{\Gamma, (B)_l, (\neg A)_1 \vdash \Delta}{\Gamma, (A \vec{\times} B)_{opt_{\mathrm{QCL}}(A)+l} \vdash \Delta}\ \vec{\times}l_2
$$

$$
\frac{\Gamma \vdash (A)_k, \Delta}{\Gamma \vdash (A \vec{\times} B)_k, \Delta}\ \vec{\times}r_1 \qquad
\frac{\Gamma \vdash (\neg A)_1, \Delta \qquad \Gamma \vdash (B)_l, \Delta}{\Gamma \vdash (A \vec{\times} B)_{opt_{\mathrm{QCL}}(A)+l}, \Delta}\ \vec{\times}r_2
$$

*The degree overflow rules*<sup>2</sup>*, with* <sup>k</sup> <sup>∈</sup> <sup>N</sup>*, are:*

$$
\frac{\Gamma, \bot \vdash \Delta}{\Gamma, (A)_{opt_{\mathrm{QCL}}(A)+k} \vdash \Delta}\ dol \qquad
\frac{\Gamma \vdash \Delta}{\Gamma \vdash (A)_{opt_{\mathrm{QCL}}(A)+k}, \Delta}\ dor
$$

Observe that the modified ∧ and ∨ inference rules correspond to the ∧ and ∨ inference rules of propositional **LK** in case we are dealing only with classical formulas. For classical theories, our ∧-left rule splits the proof tree unnecessarily, and the ∧-right rule adds an unnecessary third condition $\Gamma \vdash A, B, \Delta$. These additional conditions only become necessary when dealing with non-classical formulas.

The intuition behind the degree overflow rules is that we sometimes need to fix invalid sequents, i.e., sequents in which a formula F is assigned a label k with $opt_{\mathrm{QCL}}(F) < k < \infty$.

*Example 4.* The following is an **L**[QCL]-proof of a valid sequent.<sup>3</sup>

$$
\dfrac{
\dfrac{
\dfrac{\dfrac{\vdots}{b \vee c, \neg a, b \vdash a \wedge b, a \wedge c, b}}{b \vee c, (a \vec{\times} b)_2 \vdash a \wedge b, a \wedge c, b}\ \vec{\times}l_2
}{(a \vec{\times} b)_2 \vdash \neg(b \vec{\times} c), a \wedge b, a \wedge c, b}\ \neg r
\qquad
\dfrac{
\dfrac{\dfrac{\vdots}{a \vee b, \neg b, c \vdash a \wedge b, a \wedge c, b}}{a \vee b, (b \vec{\times} c)_2 \vdash a \wedge b, a \wedge c, b}\ \vec{\times}l_2
}{(b \vec{\times} c)_2 \vdash \neg(a \vec{\times} b), a \wedge b, a \wedge c, b}\ \neg r
}{
\dfrac{((a \vec{\times} b) \wedge (b \vec{\times} c))_2 \vdash a \wedge b, a \wedge c, b}{\neg(a \wedge b), ((a \vec{\times} b) \wedge (b \vec{\times} c))_2 \vdash a \wedge c, b}\ \neg l
}\ \wedge l
$$
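Validity of labeled sequents in the sense of Definition 10 can also be checked semantically by brute force, which is a convenient sanity check for such proofs. The sketch below (encoding and names are ours, not part of the calculus) confirms the end sequent of Example 4:

```python
from itertools import chain, combinations

INF = float("inf")

# QCL encoding: ("var", x), ("neg", F), ("and", F, G), ("or", F, G),
# ("ord", F, G) -- "ord" is ordered disjunction.

def opt(f):
    if f[0] in ("var", "neg"):
        return 1
    if f[0] in ("and", "or"):
        return max(opt(f[1]), opt(f[2]))
    return opt(f[1]) + opt(f[2])

def deg(i, f):
    if f[0] == "var":
        return 1 if f[1] in i else INF
    if f[0] == "neg":
        return 1 if deg(i, f[1]) == INF else INF
    if f[0] == "and":
        return max(deg(i, f[1]), deg(i, f[2]))
    if f[0] == "or":
        return min(deg(i, f[1]), deg(i, f[2]))
    m, n = deg(i, f[1]), deg(i, f[2])
    return m if m < INF else (opt(f[1]) + n if n < INF else INF)

def interpretations(vs):
    return [set(s) for s in
            chain.from_iterable(combinations(vs, r) for r in range(len(vs) + 1))]

def sequent_valid(gamma, delta, vs):
    """Definition 10: every interpretation satisfying each (A, k) in gamma
    to exactly degree k satisfies some (B, l) in delta to exactly degree l."""
    return all(any(deg(i, g) == l for g, l in delta)
               for i in interpretations(vs)
               if all(deg(i, f) == k for f, k in gamma))

a, b, c = ("var", "a"), ("var", "b"), ("var", "c")
axb, bxc = ("ord", a, b), ("ord", b, c)
gamma = [(("neg", ("and", a, b)), 1), (("and", axb, bxc), 2)]
delta = [(("and", a, c), 1), (b, 1)]
print(sequent_valid(gamma, delta, "abc"))  # True
```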

*Example 5.* The following proof shows how the ∧r-rule can introduce more than three premises. Note that we make use of the $dol$-rule in the leftmost branch.

$$
\dfrac{
\dfrac{\dfrac{\vdots}{a, b, \bot \vdash}}{a, b, (a)_2 \vdash}\ dol
\quad
\dfrac{\vdots}{a, b, \neg a \vdash}
\quad
\dfrac{\dfrac{\vdots}{a, b, c, \neg b \vdash}}{a, b, (b \vec{\times} c)_2 \vdash}\ \vec{\times}l_2
\quad
\dfrac{\dfrac{\vdots}{a, b \vdash b \vee c}}{a, b, \neg(b \vec{\times} c) \vdash}\ \neg l
\quad
\dfrac{\vdots}{a, b \vdash (a)_1, (b \vec{\times} c)_1}
}{a, b \vdash (a \wedge (b \vec{\times} c))_1}\ \wedge r
$$

We now show soundness and completeness of **L**[QCL].

<sup>2</sup> *do*l/*do*r stands for degree overflow left/right.

<sup>3</sup> Note that, once we reach sequents containing only classical formulas, we do not continue the proof. However, it can be verified that the classical sequents on the left and right branch are provable in this case. Moreover, given a formula $(A)_1$ with a label of 1, the label is often omitted for readability.

#### **Proposition 1. L**[QCL] *is sound.*

*Proof.* We show that each rule is sound.


#### **Proposition 2. L**[QCL] *is complete.*

*Proof.* We show this by induction over the (aggregated) formula complexity of the non-classical formulas.


nor of $\Delta$. However, then M is a model of A and therefore satisfies $A \vec{\times} B$ to a degree of at most $opt_{\mathrm{QCL}}(A)$, and hence smaller than $opt_{\mathrm{QCL}}(A) + l$. This contradicts our assumption that $\Gamma \vdash (A \vec{\times} B)_{opt_{\mathrm{QCL}}(A)+l}, \Delta$ is valid. Assume now that the second sequent is not valid, i.e., that there is a model M of $\Gamma$ that is neither a model of $(B)_l$ nor of $\Delta$. Then, M cannot be a model of $(A \vec{\times} B)_{opt_{\mathrm{QCL}}(A)+l}$ and we again have a contradiction to our assumption. As before, it follows by the induction hypothesis that $\Gamma \vdash (A \vec{\times} B)_{opt_{\mathrm{QCL}}(A)+l}, \Delta$ is provable.

So far we have not introduced a cut rule, and as we have shown our calculus is complete without such a rule. However, it is easy to see that we have cut admissibility, i.e., **L**[QCL] can be extended by:

$$\frac{\Gamma \vdash (A)\_k, \Delta \qquad \Gamma', (A)\_k \vdash \Delta'}{\Gamma, \Gamma' \vdash \Delta, \Delta'} \; cut$$

Another aspect of our calculus that should be mentioned is that, although **L**[QCL] is cut-free, we do not have the subformula property. This is especially obvious when looking at the rules for negation, where we use the classical counterpart $cp(A)$ of QCL-formulas. For example, $\neg(a \vec{\times} b)$ in the conclusion of the ¬-left rule becomes $cp(a \vec{\times} b) = a \vee b$ in the premise.

While we believe that **L**[QCL] is interesting in its own right, the question of how we can use it to obtain a calculus for preferred model entailment arises. Essentially, we have to add a rule that allows us to go from standard to preferred model inferences. As a first approach we consider theories Γ ∪ {A} with Γ consisting only of classical formulas and A being a QCL-formula. In this simple case, preferred models of Γ ∪ {A} are those models of Γ ∪ {A} that satisfy A to the smallest possible degree. One might add the following rule to **L**[QCL]:

$$
\frac{\langle \Gamma, (A)_i \vdash \bot \rangle_{i<k} \qquad \Gamma, (A)_k \vdash \Delta}{\Gamma, A \mid\sim^{lex}_{\mathrm{QCL}} \Delta}\ \mid\sim_{naive}
$$

Intuitively, the above rule states that, if there are no interpretations that satisfy $\Gamma$ while also satisfying A to a degree lower than k, and if $\Delta$ follows from all models of $\Gamma, (A)_k$, then $\Delta$ is entailed by the preferred models of $\Gamma \cup \{A\}$. However, the obtained calculus **L**[QCL] + $\mid\sim_{naive}$ derives invalid sequents.

*Example 6.* The invalid entailment $\neg a, a \vec{\times} b \mid\sim^{lex}_{\mathrm{QCL}} a$ can be derived via $\mid\sim_{naive}$.

$$
\dfrac{\dfrac{\dfrac{a \vdash a}{\neg a, a \vdash a}\ wl}{\neg a, (a \vec{\times} b)_1 \vdash a}\ \vec{\times}l_1}{\neg a, a \vec{\times} b \mid\sim^{lex}_{\mathrm{QCL}} a}\ \mid\sim_{naive}
$$

What is missing is an assertion that $\Gamma, (A)_k$ is satisfiable. Unfortunately, this cannot be formulated in **L**[QCL]. A way of addressing this problem is to define a refutation calculus, as has been done for other non-monotonic logics [5].
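The missing side condition can be stated semantically: $\Gamma, (A)_k$ must have a model. A brute-force check (our own sketch, reusing the tuple encoding of the earlier listings) shows exactly what goes wrong in Example 6: $\neg a$ together with $(a \vec{\times} b)_1$ has no model, while $\neg a$ with $(a \vec{\times} b)_2$ does.

```python
from itertools import chain, combinations

INF = float("inf")

# Encoding: ("var", x), ("neg", F), ("ord", F, G) -- only the connectives
# needed for this example.

def opt(f):
    if f[0] in ("var", "neg"):
        return 1
    return opt(f[1]) + opt(f[2])  # "ord"

def deg(i, f):
    if f[0] == "var":
        return 1 if f[1] in i else INF
    if f[0] == "neg":
        return 1 if deg(i, f[1]) == INF else INF
    m, n = deg(i, f[1]), deg(i, f[2])
    return m if m < INF else (opt(f[1]) + n if n < INF else INF)

def interpretations(vs):
    return [set(s) for s in
            chain.from_iterable(combinations(vs, r) for r in range(len(vs) + 1))]

def satisfiable(labeled, vs):
    """Does some interpretation satisfy every (A, k) to exactly degree k?"""
    return any(all(deg(i, f) == k for f, k in labeled)
               for i in interpretations(vs))

na = ("neg", ("var", "a"))
axb = ("ord", ("var", "a"), ("var", "b"))
print(satisfiable([(na, 1), (axb, 1)], "ab"))  # False: degree 1 forces a true
print(satisfiable([(na, 1), (axb, 2)], "ab"))  # True: witnessed by {b}
```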

# **4 Calculus for Preferred Model Entailment**

We now introduce a calculus for preferred model entailment. However, as argued above, we first need to introduce the refutation calculus **L**[QCL]$^-$. In the literature, a rejection method for first-order logic with equality was first introduced in [17] and proved complete w.r.t. finite model theory. Our refutation calculus is based on a simpler rejection method for propositional logic defined in [5]. Using the refutation calculus, we prove that $(A)_k$ is satisfiable by deriving the antisequent $(A)_k \nvdash \bot$.

**Definition 12.** *A labeled* QCL*-antisequent is denoted by* $\Gamma \nvdash \Delta$ *and it is valid if and only if the corresponding labeled* QCL*-sequent* $\Gamma \vdash \Delta$ *is not valid, i.e., if at least one model that satisfies all formulas in* $\Gamma$ *to the degree specified by the label satisfies no formula in* $\Delta$ *to the degree specified by the label.*

Below we give a definition of the refutation calculus **L**[QCL]$^-$. Note that most rules coincide with their counterparts in **L**[QCL]. Binary rules are translated into two rules, one inference rule per premise. (∨r) and (∧r) in **L**[QCL] have an unbounded number of premises, but due to their structure they can be translated into three inference rules. For (∧r) we additionally need two extra rules for the case that either A or B is not satisfied.

**Definition 13 (L**[QCL]$^-$**).** *The axioms of* **L**[QCL]$^-$ *are of the form* $\Gamma \nvdash \Delta$*, where* $\Gamma$ *and* $\Delta$ *are disjoint sets of atoms and* $\bot \notin \Gamma$*. The inference rules of* **L**[QCL]$^-$ *are given below. Whenever a labeled formula* $(F)_k$ *appears in the conclusion of an inference rule it holds that* $k \le opt_L(F)$*.*

*The logical rules are:*

$$
\frac{\Gamma, (cp(A))_1 \nvdash \Delta}{\Gamma \nvdash (\neg A)_1, \Delta}\ \nvdash\!\neg r \qquad
\frac{\Gamma \nvdash (cp(A))_1, \Delta}{\Gamma, (\neg A)_1 \nvdash \Delta}\ \nvdash\!\neg l
$$

$$
\frac{\Gamma, (A)_k \nvdash (B)_{<k}, \Delta}{\Gamma, (A \vee B)_k \nvdash \Delta}\ \nvdash\!\vee l_1 \qquad
\frac{\Gamma, (B)_k \nvdash (A)_{<k}, \Delta}{\Gamma, (A \vee B)_k \nvdash \Delta}\ \nvdash\!\vee l_2
$$

$$
\frac{\Gamma, (A)_i \nvdash \Delta}{\Gamma \nvdash (A \vee B)_k, \Delta}\ \nvdash\!\vee r_1 \qquad
\frac{\Gamma, (B)_i \nvdash \Delta}{\Gamma \nvdash (A \vee B)_k, \Delta}\ \nvdash\!\vee r_2 \qquad
\frac{\Gamma \nvdash (A)_k, (B)_k, \Delta}{\Gamma \nvdash (A \vee B)_k, \Delta}\ \nvdash\!\vee r_3
$$

*where* $i < k$*.*

$$
\frac{\Gamma, (A)_k \nvdash (B)_{>k}, \Delta}{\Gamma, (A \wedge B)_k \nvdash \Delta}\ \nvdash\!\wedge l_1 \qquad
\frac{\Gamma, (B)_k \nvdash (A)_{>k}, \Delta}{\Gamma, (A \wedge B)_k \nvdash \Delta}\ \nvdash\!\wedge l_2
$$

$$
\frac{\Gamma, (A)_i \nvdash \Delta}{\Gamma \nvdash (A \wedge B)_k, \Delta}\ \nvdash\!\wedge r_1 \qquad
\frac{\Gamma, (\neg A)_1 \nvdash \Delta}{\Gamma \nvdash (A \wedge B)_k, \Delta}\ \nvdash\!\wedge r_2 \qquad
\frac{\Gamma, (B)_i \nvdash \Delta}{\Gamma \nvdash (A \wedge B)_k, \Delta}\ \nvdash\!\wedge r_3 \qquad
\frac{\Gamma, (\neg B)_1 \nvdash \Delta}{\Gamma \nvdash (A \wedge B)_k, \Delta}\ \nvdash\!\wedge r_4 \qquad
\frac{\Gamma \nvdash (A)_k, (B)_k, \Delta}{\Gamma \nvdash (A \wedge B)_k, \Delta}\ \nvdash\!\wedge r_5
$$

*where* $i > k$*.*

*The rules for ordered disjunction, with* k ≤ *opt*L(A) *and* l ≤ *opt*L(B)*, are:*

$$
\frac{\Gamma, (A)_k \nvdash \Delta}{\Gamma, (A \vec{\times} B)_k \nvdash \Delta}\ \nvdash\!\vec{\times}l_1 \qquad
\frac{\Gamma, (B)_l, (\neg A)_1 \nvdash \Delta}{\Gamma, (A \vec{\times} B)_{opt_{\mathrm{QCL}}(A)+l} \nvdash \Delta}\ \nvdash\!\vec{\times}l_2
$$

$$
\frac{\Gamma \nvdash (A)_k, \Delta}{\Gamma \nvdash (A \vec{\times} B)_k, \Delta}\ \nvdash\!\vec{\times}r_1 \qquad
\frac{\Gamma \nvdash (\neg A)_1, \Delta}{\Gamma \nvdash (A \vec{\times} B)_{opt_{\mathrm{QCL}}(A)+l}, \Delta}\ \nvdash\!\vec{\times}r_2 \qquad
\frac{\Gamma \nvdash (B)_l, \Delta}{\Gamma \nvdash (A \vec{\times} B)_{opt_{\mathrm{QCL}}(A)+l}, \Delta}\ \nvdash\!\vec{\times}r_3
$$

*The degree overflow rules, with* <sup>k</sup> <sup>∈</sup> <sup>N</sup>*, are:*

$$\frac{\Gamma, \bot \not\vdash \Delta}{\Gamma, (A)_{opt_{\mathrm{QCL}}(A)+k} \not\vdash \Delta}\;{\not\vdash}dol \qquad \frac{\Gamma \not\vdash \Delta}{\Gamma \not\vdash (A)_{opt_{\mathrm{QCL}}(A)+k}, \Delta}\;{\not\vdash}dor$$

*Example 7.* The following is related to Example 4 and shows that the sequent $\neg(a \wedge b), ((a \mathbin{\vec{\times}} b) \wedge (b \mathbin{\vec{\times}} c))_2 \not\vdash \bot$ is derivable, i.e., that the labeled formulas on its left-hand side are jointly satisfiable.

$$\begin{array}{cl}
\vdots & \\
(a \lor b),\, c,\, \neg b \not\vdash a \land b,\, \bot & \\
\hline
(a \lor b),\, (b \mathbin{\vec{\times}} c)_2 \not\vdash a \land b,\, \bot & {\not\vdash}{\vec{\times}}l_2 \\
\hline
(b \mathbin{\vec{\times}} c)_2 \not\vdash (\neg(a \mathbin{\vec{\times}} b))_1,\, a \land b,\, \bot & {\not\vdash}{\neg}r \\
\hline
((a \mathbin{\vec{\times}} b) \land (b \mathbin{\vec{\times}} c))_2 \not\vdash a \land b,\, \bot & {\not\vdash}{\wedge}l_2 \\
\hline
\neg(a \land b),\, ((a \mathbin{\vec{\times}} b) \land (b \mathbin{\vec{\times}} c))_2 \not\vdash \bot & {\not\vdash}{\neg}l
\end{array}$$

Note that the interpretation {a, c} witnesses (a ∨ b), c, ¬b ⊬ a ∧ b, ⊥.
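The degree computation behind such witnesses can be made concrete. The following Python sketch (ours, not code from the paper) evaluates QCL satisfaction degrees under the standard semantics — disjunction takes the minimum, conjunction the maximum of the operand degrees, and an ordered disjunction is satisfied to deg(A) when A holds and to opt(A) + deg(B) otherwise; the tuple-based formula encoding is an assumption made for illustration.

```python
INF = float("inf")  # degree "infinity": the formula is not satisfied

def opt(f):
    """Optionality: the maximal finite satisfaction degree of f."""
    op = f[0]
    if op in ("at", "not"):
        return 1
    if op in ("and", "or"):
        return max(opt(f[1]), opt(f[2]))
    if op == "ox":  # ordered disjunction
        return opt(f[1]) + opt(f[2])
    raise ValueError(op)

def deg(f, m):
    """Satisfaction degree of formula f in interpretation m (a set of atoms)."""
    op = f[0]
    if op == "at":
        return 1 if f[1] in m else INF
    if op == "not":  # classical negation
        return 1 if deg(f[1], m) == INF else INF
    if op == "or":
        return min(deg(f[1], m), deg(f[2], m))
    if op == "and":
        return max(deg(f[1], m), deg(f[2], m))
    if op == "ox":
        da = deg(f[1], m)
        return da if da != INF else opt(f[1]) + deg(f[2], m)
    raise ValueError(op)

# Example 7: {a, c} satisfies (a x> b) ∧ (b x> c) to degree 2
# and falsifies a ∧ b.
a, b, c = ("at", "a"), ("at", "b"), ("at", "c")
theory = ("and", ("ox", a, b), ("ox", b, c))
print(deg(theory, {"a", "c"}))          # → 2
print(deg(("and", a, b), {"a", "c"}))   # → inf
```

Running the sketch on the interpretation {a, c} reproduces the degrees used in the example.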

**Proposition 3. L**[QCL]<sup>−</sup> *is sound.*

*Proof.* The soundness of the negation rules is straightforward. The soundness of the rules $({\not\vdash}{\vec{\times}}l_1)$, $({\not\vdash}{\vec{\times}}l_2)$, and $({\not\vdash}{\vec{\times}}r_1)$ follows by the same argument as for **L**[QCL]. For the remaining rules, it is easy to check that the same model witnessing the validity of the premise also witnesses the validity of the conclusion.

**Proposition 4. L**[QCL]<sup>−</sup> *is complete.*

*Proof.* We show completeness by an induction over the (aggregated) formula complexity. Assume Γ ⊬ Δ is valid, i.e., Γ ⊢ Δ is not valid. Now, there must be a rule in **L**[QCL] for which Γ ⊢ Δ is the conclusion. By the soundness of **L**[QCL], this implies that at least one of the premises Γ∗ ⊢ Δ∗ is not valid. However, then Γ∗ ⊬ Δ∗ is valid and, by induction, also provable. Now, by the construction of **L**[QCL]−, there is a rule that allows us to derive Γ ⊬ Δ from Γ∗ ⊬ Δ∗.

So far, no cut rule has been introduced for **L**[QCL]−, and indeed, a counterpart of the cut rule would not be sound. One possibility is to introduce a contrapositive of cut, as described by Bonatti and Olivetti [5]. Again, it is easy to see that this rule is admissible in our calculus:

$$\frac{\Gamma \not\vdash \Delta \qquad \Gamma, (A)_k \vdash \Delta}{\Gamma \not\vdash (A)_k, \Delta}\; cut2$$

We are now ready to combine **L**[QCL] and **L**[QCL]<sup>−</sup> by defining an inference rule that allows us to go from labeled sequents to non-monotonic inferences. Again, we first consider the case where Γ is classical and A is a choice logic formula. The preferred model inference rule is:

$$\frac{\langle \Gamma, (A)_i \vdash \bot \rangle_{i < k} \qquad \Gamma, (A)_k \not\vdash \bot \qquad \Gamma, (A)_k \vdash \Delta}{\Gamma, A \mathrel{\mid\sim^{lex}_{\mathrm{QCL}}} \Delta}\;{\mid\sim}_{simple}$$

Intuitively, the premises ⟨Γ, (A)ᵢ ⊢ ⊥⟩ for i < k along with Γ, (A)ₖ ⊬ ⊥ ensure that models satisfying A to a degree of k are preferred, while the premise Γ, (A)ₖ ⊢ Δ ensures that Δ is entailed by those preferred models.

*Example 8.* The valid entailment $\neg(a \wedge b), (a \mathbin{\vec{\times}} b) \wedge (b \mathbin{\vec{\times}} c) \mathrel{\mid\sim^{lex}_{\mathrm{QCL}}} a \wedge c, b$ is provable by choosing k = 2:

$$\frac{\overset{\varphi_1}{\Gamma, ((a \mathbin{\vec{\times}} b) \wedge (b \mathbin{\vec{\times}} c))_1 \vdash \bot} \qquad \overset{\varphi_2}{\Gamma, ((a \mathbin{\vec{\times}} b) \wedge (b \mathbin{\vec{\times}} c))_2 \not\vdash \bot} \qquad \overset{\varphi_3}{\Gamma, ((a \mathbin{\vec{\times}} b) \wedge (b \mathbin{\vec{\times}} c))_2 \vdash \Delta}}{\Gamma, (a \mathbin{\vec{\times}} b) \wedge (b \mathbin{\vec{\times}} c) \mathrel{\mid\sim^{lex}_{\mathrm{QCL}}} \Delta}\;{\mid\sim}_{simple}$$

with Γ = ¬(a ∧ b) and Δ = a ∧ c, b. Here, ϕ₃ is the **L**[QCL]-proof from Example 4 and ϕ₂ is the **L**[QCL]−-proof from Example 7. ϕ₁ is not shown explicitly, but it can be verified that the corresponding sequent is provable.

We extend |∼*simple* to the more general case, where more than one non-classical formula may be present, to obtain a calculus for preferred model entailment. An additional rule |∼*unsat* is needed in case a theory is classically unsatisfiable.

**Definition 14 (L**[QCL]*lex* |∼ **).** *Let* ≤l *be the order on vectors in* $\mathbb{N}^k$ *defined by:* $v \leq_l w$ *iff, for the smallest degree* d *with* $|\{j \mid v_j = d\}| \neq |\{j \mid w_j = d\}|$ *(if such a* d *exists),* $|\{j \mid v_j = d\}| > |\{j \mid w_j = d\}|$*. We write* $v <_l w$ *if* $v \leq_l w$ *but not* $w \leq_l v$*, and* $v =_l w$ *if both hold.*


**L**[QCL]*lex* |∼ *consists of the axioms and rules of* **<sup>L</sup>**[QCL] *and* **<sup>L</sup>**[QCL]<sup>−</sup> *plus the following rules, where <sup>v</sup>*, *<sup>w</sup>* <sup>∈</sup> <sup>N</sup><sup>k</sup>*,* <sup>Γ</sup> *consists of only classical formulas, and every* A<sup>i</sup> *with* 1 ≤ i ≤ k *is a* QCL*-formula:*

$$\frac{\langle \Gamma, (A_1)_{w_1}, \dots, (A_k)_{w_k} \vdash \bot \rangle_{w <_l v} \qquad \Gamma, (A_1)_{v_1}, \dots, (A_k)_{v_k} \not\vdash \bot \qquad \langle \Gamma, (A_1)_{w_1}, \dots, (A_k)_{w_k} \vdash \Delta \rangle_{w =_l v}}{\Gamma, A_1, \dots, A_k \mathrel{\mid\sim^{lex}_{\mathrm{QCL}}} \Delta}\;{\mid\sim}_{lex}$$

$$\frac{\Gamma, cp(A_1), \dots, cp(A_k) \vdash \bot}{\Gamma, A_1, \dots, A_k \mathrel{\mid\sim^{lex}_{\mathrm{QCL}}} \Delta}\;{\mid\sim}_{unsat}$$

We first provide a small example and then show soundness and completeness.

*Example 9.* Consider the valid entailment $\neg(a \wedge b), (a \mathbin{\vec{\times}} b), (b \mathbin{\vec{\times}} c) \mathrel{\mid\sim^{lex}_{\mathrm{QCL}}} a \wedge c, b$, similar to Example 8, but with the information that we require $(a \mathbin{\vec{\times}} b)$ and $(b \mathbin{\vec{\times}} c)$ encoded as separate formulas. It is not possible to satisfy all formulas on the left to a degree of 1. Rather, it is optimal to satisfy either $(\neg(a \wedge b))_1, (a \mathbin{\vec{\times}} b)_1, (b \mathbin{\vec{\times}} c)_2$ or, alternatively, $(\neg(a \wedge b))_1, (a \mathbin{\vec{\times}} b)_2, (b \mathbin{\vec{\times}} c)_1$. We choose *v* = (1, 1, 2), with *w* = (1, 1, 1) being the only vector *w* such that $w <_l v$. Thus, we get

$$\frac{\begin{array}{c}\vdots\\ \Gamma, (a \mathbin{\vec{\times}} b)_1, (b \mathbin{\vec{\times}} c)_1 \vdash \bot\end{array} \quad \begin{array}{c}\vdots\\ \Gamma, (a \mathbin{\vec{\times}} b)_1, (b \mathbin{\vec{\times}} c)_2 \not\vdash \bot\end{array} \quad \begin{array}{c}\vdots\\ \Gamma, (a \mathbin{\vec{\times}} b)_1, (b \mathbin{\vec{\times}} c)_2 \vdash \Delta\end{array} \quad \begin{array}{c}\vdots\\ \Gamma, (a \mathbin{\vec{\times}} b)_2, (b \mathbin{\vec{\times}} c)_1 \vdash \Delta\end{array}}{\Gamma, (a \mathbin{\vec{\times}} b), (b \mathbin{\vec{\times}} c) \mathrel{\mid\sim^{lex}_{\mathrm{QCL}}} \Delta}\;{\mid\sim}_{lex}$$

with Γ = ¬(a ∧ b) and Δ = a ∧ c, b. It can be verified that indeed all branches are provable, but we do not show this explicitly here.
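The vector comparison used in this example can be spelled out. Below is an illustrative Python sketch (ours, not code from the paper), assuming the count-based reading of the lexicographic order: of two degree vectors, the one assigning the degree at the first point of difference to more formulas is preferred, with smaller degrees compared first.

```python
from itertools import product

def counts(v, up_to):
    """Number of entries of v equal to each degree 1..up_to."""
    return [sum(1 for x in v if x == d) for d in range(1, up_to + 1)]

def lex_preferred(v, w):
    """True if v is strictly lexicographically preferred to w."""
    m = max(max(v), max(w))
    # Python compares lists lexicographically; more formulas at the
    # smallest differing degree wins.
    return counts(v, m) > counts(w, m)

v = (1, 1, 2)
space = list(product((1, 2, 3), repeat=3))  # candidate degree vectors
better = [w for w in space if lex_preferred(w, v)]
equal = [w for w in space if counts(w, 3) == counts(v, 3)]
print(better)   # → [(1, 1, 1)]
print(sorted(equal))
```

For v = (1, 1, 2) this confirms that (1, 1, 1) is the only strictly preferred vector, while the permutations of (1, 1, 2) are equally preferred, matching the two optimal ways of satisfying the theory in the example.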

**Proposition 5. L**[QCL]*lex* |∼ *is sound.*

*Proof.* Consider first the |∼*lex*-rule and assume that all premises are derivable. By the soundness of **L**[QCL] and **L**[QCL]− they are also valid. From the first set of premises $\langle \Gamma, (A_1)_{w_1}, \dots, (A_k)_{w_k} \vdash \bot \rangle_{w <_l v}$ we can conclude that if there is some model M of Γ that satisfies Aᵢ to a degree of vᵢ for all 1 ≤ i ≤ k, then $M \in \mathit{Prf}^{lex}_{\mathrm{QCL}}(\Gamma \cup \{A_1, \dots, A_k\})$. The premise $\Gamma, (A_1)_{v_1}, \dots, (A_k)_{v_k} \not\vdash \bot$ ensures that there is such a model M. By the last set of premises $\langle \Gamma, (A_1)_{w_1}, \dots, (A_k)_{w_k} \vdash \Delta \rangle_{w =_l v}$, we can conclude that all models of Γ ∪ {A₁, ..., Aₖ} that are equally as preferred as M, i.e., all $M' \in \mathit{Prf}^{lex}_{\mathrm{QCL}}(\Gamma \cup \{A_1, \dots, A_k\})$, satisfy at least one formula in Δ. Therefore, $\Gamma, A_1, \dots, A_k \mathrel{\mid\sim^{lex}_{\mathrm{QCL}}} \Delta$ is valid.

Now consider the |∼*unsat*-rule and assume that Γ, *cp*(A₁), ..., *cp*(Aₖ) ⊢ ⊥ is derivable and therefore valid. Thus, Γ ∪ {A₁, ..., Aₖ} has no models and therefore also no preferred models. Then $\Gamma, A_1, \dots, A_k \mathrel{\mid\sim^{lex}_{\mathrm{QCL}}} \Delta$ is valid.

**Proposition 6. L**[QCL]*lex* |∼ *is complete.*

*Proof.* Assume that $\Gamma, A_1, \dots, A_k \mathrel{\mid\sim^{lex}_{\mathrm{QCL}}} \Delta$ is valid. If Γ ∪ {A₁, ..., Aₖ} is unsatisfiable, then Γ, *cp*(A₁), ..., *cp*(Aₖ) ⊢ ⊥ is valid, i.e., we can apply the |∼*unsat*-rule. Now consider the case that Γ ∪ {A₁, ..., Aₖ} is satisfiable and assume that some preferred model M of Γ ∪ {A₁, ..., Aₖ} satisfies Aᵢ to a degree of vᵢ for all 1 ≤ i ≤ k. Then, we claim that all premises of the rule are valid and, by the completeness of **L**[QCL] and **L**[QCL]−, also derivable.

Assume by contradiction that one of the premises is not valid. First, consider the case that $\Gamma, (A_1)_{w_1}, \dots, (A_k)_{w_k} \vdash \bot$ is not valid for some $w <_l v$. Then there is a model M′ of Γ that satisfies Aᵢ to a degree of wᵢ for all 1 ≤ i ≤ k. Since $w <_l v$, M′ is strictly preferred to M, which contradicts the assumption that M is a preferred model of Γ ∪ {A₁, ..., Aₖ}.

Next, assume that $\Gamma, (A_1)_{v_1}, \dots, (A_k)_{v_k} \not\vdash \bot$ is not valid. However, M satisfies $\Gamma, (A_1)_{v_1}, \dots, (A_k)_{v_k}$ and does not satisfy ⊥. Contradiction.

Finally, we assume that $\Gamma, (A_1)_{w_1}, \dots, (A_k)_{w_k} \vdash \Delta$ is not valid for some $w =_l v$. Then there is a model M′ of Γ that satisfies Aᵢ to a degree of wᵢ for all 1 ≤ i ≤ k but does not satisfy any formula in Δ. But M′ is a preferred model of Γ ∪ {A₁, ..., Aₖ}, which contradicts $\Gamma, A_1, \dots, A_k \mathrel{\mid\sim^{lex}_{\mathrm{QCL}}} \Delta$ being valid.

In this paper, we focused on the lexicographic semantics for preferred models of choice logic theories. However, rules for other semantics, e.g. a rule |∼*inc* for the inclusion-based approach (cf. Definition 8), can be obtained by simply adapting the way in which vectors over $\mathbb{N}^k$ are compared (cf. Definition 14).

## **5 Beyond QCL**

QCL was the first choice logic to be described [7], and applications concerned with QCL and ordered disjunction have been discussed in the literature [3,8,13]. For this reason, the main focus in this paper lies with QCL. However, as we have seen in Sect. 2, CCL and its ordered conjunction show that interesting logics similar to QCL exist. We will now demonstrate that **L**[QCL] can easily be adapted to other choice logics. In particular, we introduce **L**[CCL], in which the rules of **L**[QCL] for the classical connectives can be retained. All that is needed is to replace the $\vec{\times}$-rules by appropriate rules for the choice connective $\vec{\odot}$ of CCL.

**Definition 15 (L**[CCL]**).** **L**[CCL] *is* **L**[QCL]*, except that the* $\vec{\times}$*-rules are replaced by the following* $\vec{\odot}$*-rules:*

$$\frac{\Gamma, (A)_1, (B)_k \vdash \Delta}{\Gamma, (A \mathbin{\vec{\odot}} B)_k \vdash \Delta}\;{\vec{\odot}}l_1 \qquad \frac{\Gamma, (A)_l, (\neg B)_1 \vdash \Delta}{\Gamma, (A \mathbin{\vec{\odot}} B)_{opt_{\mathrm{CCL}}(B)+l} \vdash \Delta}\;{\vec{\odot}}l_2 \qquad \frac{\Gamma, (A)_m \vdash \Delta}{\Gamma, (A \mathbin{\vec{\odot}} B)_{opt_{\mathrm{CCL}}(B)+m} \vdash \Delta}\;{\vec{\odot}}l_3$$

$$\frac{\Gamma \vdash (A)_1, \Delta \qquad \Gamma \vdash (B)_k, \Delta}{\Gamma \vdash (A \mathbin{\vec{\odot}} B)_k, \Delta}\;{\vec{\odot}}r_1 \qquad \frac{\Gamma \vdash (A)_l, \Delta \qquad \Gamma \vdash (\neg B)_1, \Delta}{\Gamma \vdash (A \mathbin{\vec{\odot}} B)_{opt_{\mathrm{CCL}}(B)+l}, \Delta}\;{\vec{\odot}}r_2 \qquad \frac{\Gamma \vdash (A)_m, \Delta}{\Gamma \vdash (A \mathbin{\vec{\odot}} B)_{opt_{\mathrm{CCL}}(B)+m}, \Delta}\;{\vec{\odot}}r_3$$

*where* k ≤ *opt*CCL(B)*,* l ≤ *opt*CCL(A)*, and* 1 < m ≤ *opt*CCL(A)*.*

Note that, given $\Gamma, (A \mathbin{\vec{\odot}} B)_{opt_{\mathrm{CCL}}(B)+m} \vdash \Delta$ with 1 < m ≤ *opt*CCL(A), we need to guess whether ${\vec{\odot}}l_2$ or ${\vec{\odot}}l_3$ has to be applied. We do not define **L**[CCL]− here, but the necessary rules for $\vec{\odot}$ can be inferred from the $\vec{\odot}$-rules of **L**[CCL] in a similar way to how **L**[QCL]− was derived from **L**[QCL].

**Proposition 7. L**[CCL] *is sound.*

*Proof.* We consider the newly introduced rules.


**Proposition 8. L**[CCL] *is complete.*

*Proof.* We adapt the induction of the proof of Proposition 2:

– Assume that a sequent of the form $\Gamma, (A \mathbin{\vec{\odot}} B)_k \vdash \Delta$ is valid, with k ≤ *opt*CCL(B). All models that satisfy $(A \mathbin{\vec{\odot}} B)_k$ must satisfy A to a degree of 1 and B to a degree of k. Thus, Γ, (A)₁, (B)ₖ ⊢ Δ is valid and, by the induction hypothesis, provable; applying $({\vec{\odot}}l_1)$ then shows that $\Gamma, (A \mathbin{\vec{\odot}} B)_k \vdash \Delta$ is provable. Similarly for the cases $\Gamma, (A \mathbin{\vec{\odot}} B)_{opt_{\mathrm{CCL}}(B)+l} \vdash \Delta$ with l ≤ *opt*CCL(A), and $\Gamma, (A \mathbin{\vec{\odot}} B)_{opt_{\mathrm{CCL}}(B)+m} \vdash \Delta$ with 1 < m ≤ *opt*CCL(A).

– Assume that a sequent of the form $\Gamma \vdash (A \mathbin{\vec{\odot}} B)_k, \Delta$ is valid, with k ≤ *opt*CCL(B). We claim that then Γ ⊢ (A)₁, Δ and Γ ⊢ (B)ₖ, Δ are valid. Assume, for the sake of a contradiction, that the first sequent is not valid. This means that there is a model M of Γ that is neither a model of (A)₁ nor of Δ. However, then M satisfies $A \mathbin{\vec{\odot}} B$ to a degree higher than *opt*CCL(B). This contradicts the assumption that $\Gamma \vdash (A \mathbin{\vec{\odot}} B)_k, \Delta$ is valid. Assume now that the second sequent is not valid, i.e., that there is a model M of Γ that is neither a model of (B)ₖ nor of Δ. Then M cannot be a model of $(A \mathbin{\vec{\odot}} B)_k$, contradicting the assumption. It follows by the induction hypothesis that both premises are provable, and thus, by $({\vec{\odot}}r_1)$, so is $\Gamma \vdash (A \mathbin{\vec{\odot}} B)_k, \Delta$. Similarly for the cases $\Gamma \vdash (A \mathbin{\vec{\odot}} B)_{opt_{\mathrm{CCL}}(B)+l}, \Delta$ with l ≤ *opt*CCL(A), and $\Gamma \vdash (A \mathbin{\vec{\odot}} B)_{opt_{\mathrm{CCL}}(B)+m}, \Delta$ with 1 < m ≤ *opt*CCL(A).

We are confident that our methods can be adapted not only for QCL and CCL, but for numerous other instantiations of the choice logic framework defined in Sect. 2. We mention here *lexicographic choice logic* (LCL) [4], in which the choice connective applied to A and B expresses that it is best to satisfy A and B, second best to satisfy only A, third best to satisfy only B, and unacceptable to satisfy neither.
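To illustrate this pluggability semantically, the following Python sketch (ours, not from the paper) gives degree functions for the three choice connectives, given the degrees and optionality of the operands; the CCL clause is one reading of the **L**[CCL] rules above, and the LCL clause is restricted to atomic operands.

```python
INF = float("inf")  # "not satisfied"

def deg_ordered_disj(deg_a, opt_a, deg_b):
    """QCL ordered disjunction: deg(A) if A is satisfied,
    otherwise opt(A) + deg(B)."""
    return deg_a if deg_a != INF else opt_a + deg_b

def deg_ordered_conj(deg_a, deg_b, opt_b):
    """CCL ordered conjunction (one reading of the L[CCL] rules):
    deg(B) if A holds to degree 1 and B is satisfied; opt(B) + deg(A)
    if A is satisfied otherwise; unsatisfied if A is unsatisfied."""
    if deg_a == INF:
        return INF
    if deg_a == 1 and deg_b != INF:
        return deg_b
    return opt_b + deg_a

def deg_lcl(sat_a, sat_b):
    """LCL connective on atomic operands: both is best, only A second,
    only B third, neither unacceptable."""
    if sat_a and sat_b:
        return 1
    if sat_a:
        return 2
    if sat_b:
        return 3
    return INF
```

Only these degree functions change from logic to logic; the label bookkeeping of the calculi stays the same.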

Moreover, note that the inference rules |∼*lex* and |∼*unsat* (cf. Definition 14) do not depend on any specific choice logic. Thus, once labeled calculi are developed for a choice logic, a calculus for preferred model entailment follows immediately.

## **6 Conclusion**

In this paper we introduce a sound and complete sequent calculus for preferred model entailment in QCL. This non-monotonic calculus is built on two calculi: a monotonic labeled sequent calculus and a corresponding refutation calculus.

Our systems are modular and can easily be adapted: on the one hand, calculi for choice logics other than QCL can be obtained by introducing suitable rules for the choice connectives of the new logic, as exemplified with our calculus for CCL; on the other hand, a non-monotonic calculus for preferred model semantics other than the lexicographic semantics can be obtained by adapting the inference rule |∼*lex* which transitions from preferred model entailment to the labeled calculi.

Our work contributes to the line of research on non-monotonic sequent calculi that make use of refutation systems [5]. Our system is the first proof calculus for choice logics, which have been studied mainly from the viewpoint of their computational properties [4] and their potential applications [3,8,13] so far.

Regarding future work, we aim to investigate the proof complexity of our calculi, and how this complexity might depend on which choice logic or preferred model semantics is considered. Also, calculi for other choice logics such as LCL could be explicitly defined, as was done with CCL in Sect. 5.

**Acknowledgments.** We thank the anonymous reviewers for their valuable feedback. This work was funded by the Austrian Science Fund (FWF) under the grants Y698 and J4581.

# **References**



# **Lash 1.0 (System Description)**

Chad E. Brown¹ and Cezary Kaliszyk²

¹ Czech Technical University in Prague, Prague, Czech Republic
² University of Innsbruck, Innsbruck, Austria
cezary.kaliszyk@uibk.ac.at

**Abstract.** Lash is a higher-order automated theorem prover created as a fork of the theorem prover Satallax. The basic underlying calculus of Satallax is a ground tableau calculus whose rules only use shallow information about the terms and formulas taking part in the rule. Lash uses new, efficient C representations of vital structures and operations. Most importantly, Lash uses a C representation of (normal) terms with perfect sharing along with a C implementation of normalizing substitutions. We describe the ways in which Lash differs from Satallax and the performance improvement of Lash over Satallax when used with analogous flag settings. With a 10 s timeout, Lash outperforms Satallax on a collection of TH0 problems from the TPTP. We conclude with ideas for continuing the development of Lash.

**Keywords:** Higher-order logic · Automated reasoning · TPTP

#### **1 Introduction**

Satallax [4,7] is an automated theorem prover for higher-order logic that was a top competitor in the THF division of CASC [10] for most of the 2010s. The basic calculus of Satallax is a complete ground tableau calculus [2,5,6]. In recent years the top systems of the THF division of CASC have primarily been based on resolution and superposition [3,8,11]. At the moment it is an open question whether there is a research and development path via which a tableau-based prover could again become competitive. As a first step towards answering this question we have created a fork of Satallax, called Lash, focused on giving efficient C implementations of data structures and operations needed for search in the basic calculus.

Satallax was partly competitive due to (optional) additions that went beyond the basic calculus. Three of the most successful additions were the use of higher-order pattern clauses during search, the use of higher-order unification as a heuristic to suggest instantiations at function types, and the use of the first-order theorem prover E as a backend to try to prove that the first-order part of the current state is already unsatisfiable. Satallax includes flags that can be used to activate or deactivate such additions so that search only uses the basic calculus. They are deactivated by default. Satallax has three representations of terms in OCaml. The basic calculus rules use the primary representation. Higher-order unification and pattern clauses make use of a representation that includes a case for metavariables to be instantiated. Communication with E uses a third representation restricted to first-order terms and formulas. When only the basic calculus is used, only the primary representation is needed.

When only the basic calculus is used, only limited information about (normal) terms is needed during the search. Typically we only need to know the outer structure of the principal formulas of each rule, and so the full term does not need to be traversed. In some cases Satallax either implicitly or explicitly traverses the term. The implicit cases are when a rule needs to know if two terms are equal. In Satallax, OCaml's equality is used to test for equality of terms, implicitly relying on a recursion over the term. The explicit cases are quantifier rules that instantiate with either a term or a fresh constant. In the former case we may also need to normalize the result after instantiating with a term.

In order to give an optimized implementation of the basic calculus we have created a new theorem prover, Lash<sup>1</sup>, by forking a recent version of Satallax (Satallax 3.4), the last version that won the THF division of CASC (in 2019). Generally speaking, we have removed all the additional code that goes beyond the basic calculus. In particular we do not need terms with metavariables since we support neither pattern clauses nor higher-order unification in Lash. Likewise we do not need a special representation for first-order terms and formulas since Lash does not communicate with E. We have added efficient C implementations of (normal) terms with perfect sharing. Additionally we have added new efficient C implementations of priority queues and the association of formulas with integers (to communicate with MiniSat). To measure the speedup given by the new parts of the implementation we have run Satallax 3.4 using flag settings that only use the basic calculus and Lash 1.0 using the same flag settings. We have also compared Lash to Satallax 3.4 using Satallax's default strategy with a timeout of 10 s, and have found that Lash 1.0 outperforms Satallax with this short timeout even when Satallax is using the optional additions (including calling E). We describe the changes and present a number of examples for which the changes lead to a significant speedup.

#### **2 Preliminaries**

We will presume a familiarity with simple type theory and only give a quick description to make our use of notation clear, largely following [6]. We assume a set of base types, one of which is the type o of propositions (also called booleans), and the rest we refer to as sorts. We use α, β to range over sorts and σ, τ to range over types. The only types other than base types are function types στ , which can be thought of as the type of functions from σ to τ .

All terms have a unique type and are inductively defined as (typed) variables, (typed) constants, well-typed applications (t s) and λ-abstractions (λx.t). We

<sup>1</sup> Lash 1.0 along with accompanying material is available at http://grid01.ciirc.cvut. cz/∼chad/ijcar2022lash/.

also include the logical constant ⊥ as a term of type o, terms (of type o) of the form (s ⇒ t) (implications) and (∀x.t) (universal quantifiers) where s, t have type o, and terms (of type o) of the form (s =σ t) where s, t have a common type σ. We also include choice constants εσ of type (σo)σ at each type σ. We write ¬t for t ⇒ ⊥ and (s ≠σ t) for (s =σ t ⇒ ⊥). We omit type parentheses and type annotations except where they are needed for clarity. Terms of type o are also called propositions. We also use ⊤, ∨, ∧, ∃ with the understanding that these are notations for equivalent propositions in the set of terms above.

We assume terms are equal if they are the same up to α-conversion of bound variables (using de Bruijn indices in the implementation). We write [s] for the βη-normal form of s.

The tableau calculi of [6] (without choice) and [2] (with choice) define when a branch is refutable. A branch is a finite set of normal propositions. We let A range over branches and write A, s for the branch <sup>A</sup>∪ {s}. We will not give a full calculus, but will instead discuss a few of the rules with surprising properties. Before doing so we emphasize rules that are *not* in the calculus. There is no cut rule stating that if A, s and A,¬<sup>s</sup> are refutable, then <sup>A</sup> is refutable. (During search such a rule would require synthesizing the cut formula s.) There is also no rule stating that if the branch A,(s = t), [ps], [pt] is refutable, then A,(s = t), [ps] is refutable (where s, t have type σ and p is a term of type σo). That is, there is no rule for rewriting into arbitrarily deep positions using equations.

All the tableau rules only need to examine the outer structure to test if they apply (when searching backwards for a refutation). When applying the rule, new formulas are constructed and added to the branch (or potentially multiple branches, each a subgoal to be refuted). An example is the confrontation rule, the only rule involving positive equations. The confrontation rule states that if s =α t and u ≠α v are on a branch A (where α is a sort), then we can refute A by refuting A, s ≠ u, t ≠ u and A, s ≠ v, t ≠ v. A similar rule is the mating rule, which states that if p s₁ ... sₙ and ¬p t₁ ... tₙ are on a branch A (where p is a constant of type σ₁ ··· σₙo), then we can refute A by refuting each of the branches A, sᵢ ≠ tᵢ for i ∈ {1,...,n}. The mating rule demonstrates how disequations can appear on a branch even if the original branch to refute contained no reference to equality at all. One way a branch can be closed is if s ≠ s is on the branch. In an implementation, this means an equality check is done for s and t whenever a disequation s ≠ t is added to the branch. In Satallax this requires OCaml to traverse the terms. In Lash this only requires comparing the unique integer ids the implementation assigns to the terms.
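The subgoal generation of the confrontation and mating rules can be rendered schematically. This is an illustrative Python sketch (ours, not Satallax or Lash code), with terms as opaque values and disequations tagged "neq".

```python
def mate(branch, pos, neg):
    """Mating rule: from p s1 ... sn and a negated p t1 ... tn on a branch,
    generate one subgoal branch per argument position, each extended with
    the disequation si != ti."""
    (p, args_s), (q, args_t) = pos, neg
    assert p == q and len(args_s) == len(args_t)
    return [branch | {("neq", s, t)} for s, t in zip(args_s, args_t)]

def confront(branch, eq, diseq):
    """Confrontation rule: from an equation s = t and a disequation u != v
    at a sort, generate the two subgoals extended with {s != u, t != u}
    and {s != v, t != v}, respectively."""
    s, t = eq
    u, v = diseq
    return [branch | {("neq", s, u), ("neq", t, u)},
            branch | {("neq", s, v), ("neq", t, v)}]

# Mating p(a, b) against ¬p(c, d) yields the subgoals a != c and b != d.
subgoals = mate(frozenset(), ("p", ("a", "b")), ("p", ("c", "d")))
```

All of the work is shallow pattern matching on the heads of the two principal formulas; the argument terms themselves are never traversed.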

The disequations generated on a branch play an important role. Terms (of sort α) occurring on one side of a disequation on a branch are called *discriminating terms*. The rule for instantiating a quantified formula ∀x.t (where x has sort α) is restricted to instantiating with discriminating terms (or a default term if no terms of sort α are discriminating). During search in Satallax this means there is a finite set of permitted instantiations (at sort α), and this set grows as disequations are produced. Note that, unlike in most automated theorem provers, the instantiations do not arise from unification. In Satallax (and Lash), when ∀x.t is being processed it is instantiated with all previously processed instantiations. When a new instantiation is produced, previously processed universally quantified propositions are instantiated with it. When ∀x.t is instantiated with s, then [(λx.t)s] is added to the branch. Such an instantiation is the important case where the new formula involves term traversals: both for substitution and normalization. In Satallax the substitution and normalization require multiple term traversals. In Lash we have used normalizing substitutions and memoized previous computations, minimizing the number of term traversals. The need to instantiate arises when processing either a universally quantified proposition (giving a new quantifier to instantiate) or a disequation at a sort (giving new discriminating terms).

We discuss a small example that both Satallax and Lash can easily prove. We briefly describe what both do in order to give the flavor of the procedure and (hopefully) prevent readers familiar with other calculi (e.g., resolution) from assuming the provers behave similarly.

Example SEV241^5 from TPTP v7.5.0 [9] (X5201A from Tps [1]) contains a few features going beyond first-order logic. The statement to prove is

$$\forall x. U \; x \land W \; x \Rightarrow \forall S. (S = U \lor S = W) \Rightarrow Sx.$$

Here U and W are constants of type αo, x is a variable of type α and S is a variable of type αo. The higher-order aspects of this problem are the quantifier for S (though this could be circumvented by making S a constant like U and W) and the equations between predicates (though these could be circumvented by replacing <sup>S</sup> <sup>=</sup> <sup>U</sup> by <sup>∀</sup>y.Sy <sup>⇔</sup> Uy and replacing <sup>S</sup> <sup>=</sup> <sup>W</sup> similarly). The tableau rules effectively do both during search.

Satallax never clausifies. The formula above is negated and assumed. We will informally describe tableau rules as splitting the problem into subgoals, though this is technically mediated through MiniSat (where the set of MiniSat clauses is unsatisfiable when all branches are closed). Tableau rules are applied until the problem involves a constant c (for x), a constant S for S, and assumptions U c, W c, S = U ∨ S = W and ¬S c on the branch. The disjunction is internally (S = U ⇒ ⊥) ⇒ S = W, and the implication rule splits the problem into two branches, one with S = U and one with S = W. Both branches are solved in analogous ways and we only describe the S = U branch. Since S = U is an equation at function type, the relevant rule adds ∀y. S y = U y to the branch. Since there are no disequations on the branch, there is no instantiation available for ∀y. S y = U y. In such a case, a default instantiation is created and used. That is, a default constant d (of sort α) is generated and we instantiate with this d, giving S d =o U d. The rule for equations at type o splits into two subgoals: one branch with S d and U d and another with ¬S d and ¬U d. On the first branch we mate S d with ¬S c, adding the disequation d ≠ c to the branch. This makes c available as an instantiation for ∀y. S y = U y. After instantiating with c the rest of the subcase is straightforward. In the other subgoal we mate U c with ¬U d, giving the disequation c ≠ d. Again, c becomes available as an instantiation and the rest of the subcase is straightforward.

### **3 Terms with Perfect Sharing**

Lash represents normal terms as C structures, with a unique integer id assigned to each term. The structure contains a tag indicating which kind of term is represented, and a number that is used to indicate either the de Bruijn index (for a variable), the name (for a constant), or the type (for a λ-abstraction, a universal quantifier, a choice operator, or an equation). Two pointers (optionally) point to the relevant subterms in each case. In addition, the structure maintains the information of which de Bruijn indices are free in the term (with de Bruijn indices limited to a maximum of 255). Knowing the free de Bruijn indices of a term makes recognizing potential η-redexes possible without traversing the λ-abstraction. Likewise it is possible to determine when shifting and substitution of de Bruijn indices would not affect a term, avoiding the need to traverse the term.

In OCaml only the unique integer id is directly revealed, and this is sufficient to test for equality of terms. Hash tables are used to uniquely assign integers to types and to strings (for names), and these integers are used to interface with the C code. Various functions are used in the OCaml-C interface to request the construction of (normal) terms. For example, given the two OCaml integer ids i and j corresponding to terms s and t, the function mk_norm_ap applied to i and j will return an integer k corresponding to the normal term [s t]. The C implementation recognizes if s is a λ-abstraction and performs all βη-reductions to obtain a normal term. Additionally, the C implementation treats terms as graphs with perfect sharing and caches previous operations (including substitutions and de Bruijn shifting) to prevent recomputation.
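The perfect-sharing scheme can be sketched in a few lines. This is an illustrative Python model (ours; the actual Lash implementation is in C), showing how hash-consing makes term equality an integer comparison and how operation results can be cached.

```python
class Terms:
    """Toy model of perfect sharing: each distinct term is constructed once
    and identified by a unique integer id, so term equality is id equality."""

    def __init__(self):
        self.ids = {}       # structural key -> id
        self.nodes = []     # id -> structural key
        self.ap_cache = {}  # memoized application results

    def _mk(self, key):
        if key not in self.ids:
            self.ids[key] = len(self.nodes)
            self.nodes.append(key)
        return self.ids[key]

    def var(self, i):           # de Bruijn index
        return self._mk(("var", i))

    def const(self, name):
        return self._mk(("const", name))

    def ap(self, f, a):
        """Application by id; a real implementation would also
        beta-eta-normalize here (Lash does this in C)."""
        if (f, a) not in self.ap_cache:
            self.ap_cache[(f, a)] = self._mk(("ap", f, a))
        return self.ap_cache[(f, a)]

t = Terms()
x = t.ap(t.const("f"), t.var(0))
y = t.ap(t.const("f"), t.var(0))
print(x == y)   # → True: no term traversal needed for the equality test
```

Building the same application twice returns the same id from the cache, so the equality test never recurses into the terms.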

In addition to the low-level C term reimplementation, we have also provided a number of other low-level functionalities replacing the slower parts of the Ocaml code. This includes low-level priority queues, as well as C code used to associate the integers representing normal propositions with integers that are used to communicate with MiniSat. The MiniSat integers are nonzero and satisfy the property that minus on integers corresponds to negation of propositions.
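A minimal model of the proposition-to-literal association (ours, not Lash's actual C code) might look as follows; the only invariants it maintains are the ones stated above: literals are nonzero, and integer minus corresponds to negation.

```python
class PropMap:
    """Associate normal propositions with nonzero integers such that
    integer minus corresponds to negation, as required by a MiniSat-style
    DIMACS literal encoding."""

    def __init__(self):
        self.lit = {}

    def literal(self, prop, positive=True):
        if prop not in self.lit:
            self.lit[prop] = len(self.lit) + 1  # ids 1, 2, 3, ... (never 0)
        n = self.lit[prop]
        return n if positive else -n

pm = PropMap()
print(pm.literal("p"))         # → 1
print(pm.literal("p", False))  # → -1
print(pm.literal("q"))         # → 2
```

Zero is excluded because it terminates clauses in the DIMACS format, which is why the mapping starts at 1.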

#### **4 Results and Examples**

The first mode in the default schedule for Satallax 3.1 is mode213. This mode activates one feature that goes beyond the basic calculus: pattern clauses. Additionally, the mode sets a flag that tries to split the initial goal into several independent subgoals before beginning the search proper. Through experimentation we have found that setting a flag (common to both Satallax and Lash) that essentially prevents MiniSat from searching (i.e., only using MiniSat to recognize contradictions that are evident without search) often improves performance. We have created a modified mode mode213d that deactivates these additions (and delays the use of MiniSat) so that Satallax and Lash will have a similar (and often the same) search space. (Sometimes the search spaces differ due to differences in the way Satallax and Lash enumerate instantiations for function types, an issue we will not focus on here.) We have also run Lash with many variants of Satallax modes with similar modifications. From such test runs we have created a 10 s schedule consisting of 5 modes.

**Table 1.** Lash vs. Satallax on 2053 TH0 Problems.

To give a general comparison of Satallax and Lash we have run both on 2053 TH0 problems from a recent release of the TPTP [9] (7.5.0). We initially selected all problems with TPTP status of Theorem or Unsatisfiable (so they should be provable in principle) without polymorphism (or similar extensions of TH0). We additionally removed a few problems that could not be parsed by Satallax 3.4 and removed a few hundred problems big enough to activate SINE in Satallax 3.4.

We ran Lash for 10 s with its default schedule over this problem set. For comparison, we ran Satallax 3.4 for 10 s in three different ways: using the Lash schedule (since the flag settings make sense for both systems) and using Satallax 3.4's default schedule both with and without access to E [12]. The results are reported in Table 1. It is already promising that Lash can slightly outperform Satallax even when Satallax is allowed to call E.

To get a clearer view of the improvement we discuss a few specific examples.

TPTP problem NUM638^1 (part of Theorem 3 from the Automath formalization of Landau's book) is about the natural numbers (starting from 1). The problem assumes a successor function s is injective and that every number other than 1 has a predecessor. An abstract notion of existence is used by having a constant some of type (ιo)o about which no extra assumptions are made, so the assumption is formally ∀x. x ≠ 1 ⇒ some (λu. x = s u). For a fixed n, n ≠ 1 is assumed and the conjecture to prove is the negation of the implication (∀x y. n = s x ⇒ n = s y ⇒ x = y) ⇒ ¬some (λu. n = s u). The implication is assumed and the search must rule out the negation of the antecedent (i.e., that n has two distinct predecessors) and the succedent (that n has no predecessor). Satallax and Lash both take 3911 steps to prove this example. With mode213d, Lash completes the search in 0.4 s while Satallax requires almost 29 s.

TPTP problem SEV108^5 (SIX\_THEOREM from Tps [1]) corresponds to proving that the Ramsey number R(3, 3) is at most 6. The problem assumes there is a symmetric binary relation R (the edge relation of a graph with the sort as vertices) and that there are (at least) 6 distinct elements. The conclusion is that there are either 3 distinct elements all of which are R-related or 3 distinct elements none of which are R-related. Satallax and Lash can solve the problem in 14129 steps with mode mode213d. Satallax proves the theorem in 0.153 s, while Lash takes the same number of steps but only 0.046 s.

The difference is more impressive if we consider the modified problem of proving R(3, 4) is at most 9. That is, we assume there are (at least) 9 distinct elements and modify the second disjunct of the conclusion to be that there are 4 distinct elements none of which are R-related. Satallax and Lash both use 186127 steps to find the proof. For Satallax this takes 44 s while for Lash this takes 5.5 s.

The TPTP problem SYO506^1 is about an if-then-else operator. The problem has a constant c of type oιιι. Instead of giving axioms indicating c behaves as an if-then-else operator, the conjecture is given as a disjunction:

$$
(\forall x y.\, c \; (x = y) \; x \; y \; = \; y) \lor \neg(\forall x y.\, c \; \top \; x \; y \; = \; x) \lor \neg(\forall x y.\, c \; \bot \; x \; y \; = \; y).
$$

After negating the conjecture and applying the first few tableau rules, the branch will contain the propositions ∀x y. c ⊤ x y = x and ∀x y. c ⊥ x y = y along with the disequation c (d = e) d e ≠ e for fresh d and e of type ι. In principle the rules for if-then-else given in [2] could be used to solve the problem without using the universally quantified formulas (other than to justify that c *is* an if-then-else operator). However, these are not implemented in Satallax or Lash. Instead search proceeds as usual via the basic underlying procedure. Both Satallax and Lash can prove the example using mode mode0c1 in 32704 steps. Satallax performs the search in 9.8 s while Lash completes the search in 0.2 s.

In addition to the examples considered above, we have constructed a family of examples intended to demonstrate the power of the shared term representation and caching of operations. Let cons have type ιιι and nil have type ι. For each natural number n, consider the proposition C<sup>n</sup> given by

```
n (λx.cons x x) (cons nil nil) = cons (n (λx.cons x x) nil) (n (λx.cons x x) nil)
```
where n is the appropriately typed Church numeral. Proving the proposition C<sup>n</sup> does not require any search and merely requires the prover to normalize the conjecture and note that the two sides have the same normal form. However, this normal form on both sides is a complete binary tree of depth n + 1. We have run Lash and Satallax on C<sup>n</sup> with n ∈ {20, 21, 22, 23, 24} using mode mode213d. Lash solves all five problems in essentially the same amount of time, less than 0.02 s each. Satallax takes 4 s, 8 s, 16 s, 32 s and 64 s, respectively. As expected, since Satallax does not use a shared representation, its computation time increases exponentially with n.

## **5 Conclusion and Future Work**

We have used Lash as a vehicle to demonstrate that a more efficient implementation of the underlying tableau calculus of Satallax can lead to significant performance improvements. An obvious extension of Lash would be to implement pattern clauses, higher-order unification and the ability to call E. While we may do this, our current plans are to focus on directions that further diverge from the development path followed by Satallax.

Interesting theoretical work would be to modify the underlying calculus (while maintaining completeness). For example, the rules of the calculus could be further restricted based on orderings of ground terms. On the other hand, new rules might be added to support a variety of constants with special properties. This was already done for constants that satisfy axioms indicating the constant is a choice, description or if-then-else operator [2]. Suppose a constant r of type ιιo is known to be reflexive because a formula ∀x. r x x is on the branch. One could avoid ever instantiating this universally quantified formula by simply including a tableau rule that extends a branch with s ≠ t whenever ¬(r s t) is on the branch. Similar rules could operationalize other special cases of universally quantified formulas, e.g., formulas giving symmetry or transitivity of a relation. A modification of the usual completeness proof would be required to prove completeness of the calculus with these additional rules (and with the restriction disallowing instantiation of the corresponding universally quantified formulas).

Finally, the C representation of terms could be extended to include precomputed special features. Just as the current implementation knows which de Bruijn indices are free in a term (without traversing the term), a future implementation could know other features of the term without requiring traversal. Such features could be used to guide the search.

**Acknowledgements.** The results were supported by the Ministry of Education, Youth and Sports within the dedicated program ERC CZ under the project POSTMAN no. LL1902 and the ERC starting grant no. 714034 SMART.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Goéland: A Concurrent Tableau-Based Theorem Prover (System Description)**

Julie Cailler, Johann Rosain, David Delahaye, Simon Robillard(B), and Hinde Lilia Bouziane

LIRMM, Univ Montpellier, CNRS, Montpellier, France {julie.cailler,johann.rosain,david.delahaye,simon.robillard, hinde.bouziane}@lirmm.fr

**Abstract.** We describe Goéland, an automated theorem prover for first-order logic that relies on a concurrent search procedure to find tableau proofs, with concurrent processes corresponding to individual branches of the tableau. Since branch closure may require instantiating free variables shared across branches, processes communicate via channels to exchange information about substitutions used for closure. We present the proof search procedure and its implementation, as well as experimental results obtained on problems from the TPTP library.

**Keywords:** Automated Theorem Proving · Tableaux · Concurrency

## **1 Introduction**

Although clausal proof techniques have enjoyed success in automated theorem proving, some applications benefit from reasoning on unaltered formulas (rather than Skolemized clauses), while others require the production of proofs in a sequent calculus. These roles are fulfilled by provers based on the tableau method [17], as initially designed by Beth and Hintikka [2,13]. For first-order logic, efficient handling of universal formulas is typically achieved with free variables that are instantiated only when needed to close a branch. This step is said to be *destructive* because it may affect open branches sharing variables. This causes fairness (and consequently, completeness) issues, as illustrated in Fig. 1. In this example, exploring the left branch produces a substitution that prevents direct closure of the right branch. Reintroducing the original quantified formula with a different free variable is not sufficient to close the right branch, because an applicable δ-rule creates a new Skolem symbol that will result in a different but equally problematic substitution every time a left branch is explored. Thus, systematically exploring the left branch before the right leads to non-termination of the search. Conversely, exploring the right branch first produces a substitution (which instantiates the free variable X with a rather than b) that closes both branches.

**Fig. 1.** Incompleteness caused by unfair selection of branches

Concurrent computing offers a way to implement a proof search procedure that explores branches simultaneously. Such a procedure can compare closing substitutions to detect (dis)agreements between branches, and consequently either close branches early or restart proof attempts with limited backtracking. The simultaneous exploration of branches is handled by the concurrency system, either by interleaving computations through scheduling or by executing tasks in parallel if the hardware resources allow it. A concurrent procedure naturally lends itself to parallel execution, allowing us to take advantage of multi-core architectures for efficient first-order theorem proving. Thus, concurrency provides an elegant and efficient solution to proof search with free-variable tableaux.

In this paper, we describe a concurrent destructive proof search procedure for first-order analytic tableaux (Sect. 2) and its implementation in a tool called Goéland, as well as its evaluation on problems from the TPTP library [19] and a comparison to other state-of-the-art provers (Sect. 3).

*Related Work.* A lot of research has been carried out on the parallelization of proof search procedures [4], often focusing primarily on parallel execution and performance. In contrast, we use concurrency not only as a way to take advantage of multi-core architectures, but also as an algorithmic device that is useful even for sequential execution (with interleaved threads). Some concurrent and parallel approaches focus more distinctly on the exploration of the search space, either by dividing the search space between processes (*distributed search*) or by using processes with different search plans on the same space (*multi search*) [3]. These approaches can be performed either by *heterogeneous systems* that rely on cooperation between systems with different inference systems [1,8,12], or *homogeneous systems* where all deductive processes use the same inference system. According to this classification, the technique presented here is a homogeneous system that performs a distributed search. Concurrent tableaux provers include the model-elimination provers CPTheo [12] and Partheo [18], and the higher-order prover Hot [15], which notably uses concurrency to deal with fairness issues arising from the non-terminating nature of higher-order unification. Lastly, concurrency has been used as the basis of a generic framework to present various proof strategies [10] or allow distributed calculations over a network [21].

### **2 Concurrent Proof Search**

*Free Variable Tableaux.* Goéland attempts to build a refutation proof for a first-order formula, i.e., a closed tableau for its negation, using a standard free-variable tableau calculus [11]. The calculus is composed of α-, γ- and δ-rules that extend a branch with one formula, β-rules that divide a branch by extending it with two formulas, and a closure rule that closes a branch. γ-rules deal with universally quantified formulas by introducing a formula with a free variable. A free variable is not universally quantified, but is instead a placeholder for some term instantiation, typically determined upon branch closure. δ-rules deal with existentially quantified formulas by introducing a formula with a Skolem function symbol that takes as arguments the free variables in the branch. This ensures freshness of the Skolem symbol independently of variable instantiation.

The branch closure rule applies to a branch carrying atomic formulas P and Q such that, for some substitution σ, σ(P) = σ(¬Q). In that case, σ is applied to all branches. That rule is consequently *destructive*: applying a substitution to close one branch may modify another, removing the possibility of closing it immediately. A tableau is closed when all its branches are closed. Closing a tableau can thus be seen as providing a global unifier that closes all branches.
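Finding such a σ amounts to (syntactic) unification of the two atoms. The following is a minimal textbook unifier in Go, assuming simple first-order terms with an occurs check; it illustrates the closure condition and is not Goéland's actual implementation.

```go
package main

import "fmt"

// Terms: a symbol plus arguments; a symbol starting with an uppercase
// letter and carrying no arguments is treated as a (free) variable.
type Term struct {
	fun  string
	args []Term
}

type Subst map[string]Term

func isVar(t Term) bool { return len(t.args) == 0 && t.fun[0] >= 'A' && t.fun[0] <= 'Z' }

// walk follows variable bindings in the substitution.
func walk(t Term, s Subst) Term {
	for isVar(t) {
		b, ok := s[t.fun]
		if !ok {
			return t
		}
		t = b
	}
	return t
}

func occurs(v string, t Term, s Subst) bool {
	t = walk(t, s)
	if isVar(t) {
		return t.fun == v
	}
	for _, a := range t.args {
		if occurs(v, a, s) {
			return true
		}
	}
	return false
}

// unify extends s to a unifier of a and b, returning false on failure.
func unify(a, b Term, s Subst) bool {
	a, b = walk(a, s), walk(b, s)
	switch {
	case isVar(a):
		if isVar(b) && a.fun == b.fun {
			return true
		}
		if occurs(a.fun, b, s) {
			return false
		}
		s[a.fun] = b
		return true
	case isVar(b):
		return unify(b, a, s)
	case a.fun != b.fun || len(a.args) != len(b.args):
		return false
	}
	for i := range a.args {
		if !unify(a.args[i], b.args[i], s) {
			return false
		}
	}
	return true
}

func main() {
	// Close a branch containing P(X, f(a)) and ¬P(b, f(Y)): unify the atoms.
	X, Y := Term{fun: "X"}, Term{fun: "Y"}
	ca, cb := Term{fun: "a"}, Term{fun: "b"}
	p1 := Term{"P", []Term{X, {"f", []Term{ca}}}}
	p2 := Term{"P", []Term{cb, {"f", []Term{Y}}}}
	s := Subst{}
	fmt.Println(unify(p1, p2, s))           // true, with X ↦ b and Y ↦ a
	fmt.Println(walk(X, s).fun, walk(Y, s).fun) // b a
}
```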

*Semantics for Concurrency.* Goéland relies on a concurrent search procedure. In order to present this procedure, we use a simple While language augmented with instructions for concurrency, in the style of CSP [14]. Each process has its own variable store, as well as a collection of process identifiers used for communication: πparent denotes the identifier of a process's parent, while Πchildren denotes the collection of identifiers of active children of that process. Given a process identifier π and an expression e, the command π **!** e is used to send an asynchronous message with the value e to the process identified by π. Conversely, the command π **?** x blocks the execution until the process identified by π sends a message, which is stored in the variable x. Lastly, the instruction **start** creates a new process that executes a function with some given arguments, while the instruction **kill** interrupts the execution of a process according to its identifier.
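These primitives map directly onto Go's concurrency constructs, which is the setting of the implementation (Sect. 3). The sketch below is our own illustration with hypothetical `process` and `child` names: a channel send models π **!** e, a channel receive models π **?** x, the `go` statement models **start**, and closing a quit channel models **kill**.

```go
package main

import (
	"fmt"
	"sync"
)

type process struct {
	inbox chan string   // substitutions, here just strings
	quit  chan struct{} // closed by the parent to interrupt the process
}

func newProcess() *process {
	return &process{inbox: make(chan string, 8), quit: make(chan struct{})}
}

// child reports a closing substitution to its parent (πparent ! e), then
// waits either for a candidate from the parent (π ? x) or for termination.
func child(self, parent *process, report string) string {
	parent.inbox <- report
	select {
	case msg := <-self.inbox:
		return "retry with " + msg
	case <-self.quit: // killed by the parent
		return "killed"
	}
}

func main() {
	parent := newProcess()
	left, right := newProcess(), newProcess()
	var wg sync.WaitGroup
	results := make([]string, 2)
	for i, p := range []*process{left, right} {
		i, p := i, p
		wg.Add(1)
		go func() { // start child(...)
			defer wg.Done()
			results[i] = child(p, parent, fmt.Sprintf("subst %d", i))
		}()
	}
	fmt.Println(<-parent.inbox, <-parent.inbox) // both reports, in either order
	left.inbox <- "X -> a"                      // broadcast a candidate
	right.inbox <- "X -> a"
	wg.Wait()
	fmt.Println(results[0], results[1])
}
```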

*Proof Search Procedure.* The proof search is carried out concurrently by processes corresponding to branches of the tableau. Processes are started upon application of a β-rule, one for each new branch. Communications between processes take two forms: a process may send a set of closing substitutions for its branch to its parent, or a parent may send a substitution (that closes one of its children's branches) to the other children. The proof search is performed by the proofSearch, waitForParent, and waitForChildren procedures (described in Procedures 1, 2, and 3, respectively).

The proofSearch procedure initiates the proof search for a branch. It first attempts to apply the closure rule. A closing substitution is called *local* to a process if its domain includes only free variables introduced by this process or one of its descendants (i.e., if the variables do not occur higher in the proof tree). If one of the closing substitutions is local to the process, it is reported and the process terminates. If only non-local closing substitutions are found, they are reported and the process executes waitForParent. Otherwise, the procedure applies tableau expansion rules according to the priority α ≺ δ ≺ β ≺ γ. If a β-rule is applied, new processes are started, each executing proofSearch on a newly created branch, while the current process executes waitForChildren.

#### **Procedure 1:** proofSearch

The waitForParent procedure is executed by a process after it has found non-local closing substitutions. Such substitutions may prevent closure in other branches. In that case, the parent will eventually send another candidate substitution. waitForParent waits until such a substitution is received, and then triggers a new step of proof search. The process may also be terminated by its parent (via the **kill** instruction) during the execution of this procedure, if one of the substitutions previously sent by the process leads to closing the parent's branch.

The waitForChildren procedure is executed by a process after the application of a β-rule and the creation of child processes. The set of substitutions sent by each child is stored in a map *subst* (Line 2), initially undefined everywhere (f⊥). This procedure closes the branch (Line 13) if there exists a substitution θ that agrees with one closing substitution of each child process, i.e., for each child process, the child has reported a substitution σ such that σ(X) = θ(X) for any variable X in the domain of σ. If no such substitution can be found after all the children have closed their branches, then one closing substitution σ ∈ *subst* is picked arbitrarily (Line 18) and sent to all the children (which are at that point executing waitForParent) to restart their proof attempts. With the additional constraint of the substitution σ, the new proof attempts may fail, hence the necessity of backtracking among candidate substitutions Θbacktrack (Lines 5 and 6). If all the substitutions have been tried and failed, the process sends a failure message (symbolized by ∅) to its parent.
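The agreement condition can be sketched as follows, assuming substitutions over named variables; `agrees` and `agreement` are hypothetical helpers illustrating the check, not Goéland's code.

```go
package main

import "fmt"

// Subst maps variables to terms (both as strings for brevity).
type Subst map[string]string

// agrees holds when theta coincides with sigma on dom(sigma), i.e.,
// theta(X) is defined and equal to sigma(X) for every X in dom(sigma).
func agrees(theta, sigma Subst) bool {
	for x, t := range sigma {
		u, ok := theta[x]
		if !ok || u != t {
			return false
		}
	}
	return true
}

// agreement holds when each child has reported at least one closing
// substitution that theta agrees with, so theta closes the whole subtree.
func agreement(theta Subst, reported [][]Subst) bool {
	for _, childSubs := range reported {
		ok := false
		for _, sigma := range childSubs {
			if agrees(theta, sigma) {
				ok = true
				break
			}
		}
		if !ok {
			return false
		}
	}
	return true
}

func main() {
	left := []Subst{{"X": "a"}, {"X": "b"}}  // left branch closes either way
	right := []Subst{{"X": "a"}}             // right branch needs X -> a
	fmt.Println(agreement(Subst{"X": "a"}, [][]Subst{left, right})) // true
	fmt.Println(agreement(Subst{"X": "b"}, [][]Subst{left, right})) // false
}
```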

Thus, concurrency and backtracking are used to prevent incompleteness resulting from unfair instantiation of free variables. Another potential source of unfairness is the γ-rule, when applied more than once to a universal formula (reintroduction). This may be needed to find a refutation, but unbounded reintroductions would lead to unfairness. Iterative deepening [16] is used to guard against this: a bound limits the number of reintroductions on any single branch, and if no proof is found, the bound is increased and the proof search restarted.
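The bounding scheme is plain iterative deepening and can be sketched as follows; `search` here is a trivial stand-in that succeeds once the bound reaches a (hypothetical) number of reintroductions the proof needs.

```go
package main

import "fmt"

// search stands in for a full proof search limited to `bound` γ-rule
// reintroductions per branch.
func search(bound, needed int) bool { return bound >= needed }

// prove runs the search with increasing bounds. Every finite number of
// reintroductions is eventually allowed, which restores fairness without
// permitting unbounded γ-applications in a single attempt.
func prove(needed int) int {
	for bound := 1; ; bound++ {
		if search(bound, needed) {
			return bound // proof found within this reintroduction limit
		}
		// No proof within this bound: restart with a larger one.
	}
}

func main() {
	fmt.Println(prove(3)) // a proof needing 3 reintroductions is found at bound 3
}
```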

Figure 2 illustrates the interactions between processes for the problem in Fig. 1, and shows how concurrency helps ensure fairness. It depicts the parent process, in the top box, and below it, the two child processes created upon application of the β-rule. Dotted lines separate successive states of a process (i.e., Procedures 1, 2 and 3 seen above), while arrows and boxes represent substitution exchanges. The number above each arrow indicates the chronology of the interactions. After both children have returned a substitution (1), the parent arbitrarily chooses one of them, starting with X → b, and sends it to the children (2). Since this substitution prevents closure in the right branch (3), the parent later backtracks and sends the other substitution X → a (4), allowing both children (5) and then the parent to close successfully.

### **3 Implementation and Experimental Results**

*Implementation.* The procedures presented in Sect. 2 are implemented in the Goéland prover<sup>1</sup> using the Go language. Go supports concurrency and parallelism, based on lightweight execution threads called *goroutines* [20]. Goroutines are executed according to a so-called *hybrid threading* (or M : N) model: M goroutines are executed over N effective threads, and scheduling is managed by both the Go runtime and the operating system. This threading model allows the execution of a large number of goroutines with a reasonable consumption of system resources. Goroutines use channels to exchange messages, so that the implementation is close to the presentation of Sect. 2.

<sup>1</sup> Available at: https://github.com/GoelandProver/Goeland/releases/tag/v1.0.0-beta.

```
Procedure 3: waitForChildren
   Data: a tableau T, a set Θsent of substitutions sent by this process to its
         parent, a set Θbacktrack of substitutions used for backtracking
 1 begin
 2   var subst ← f⊥
 3   while ∃π ∈ Πchildren. subst[π] = ⊥ do
 4     π ? subst[π]
 5     if subst[π] = ∅ then
 6       if ∃θ ∈ Θbacktrack then
 7         for π ∈ Πchildren do π ! θ;
 8         waitForChildren(T, Θsent, Θbacktrack \ {θ})
 9       else
10         for π ∈ Πchildren do kill π;
11         πparent ! ∅
12         return
13   if ∃θ. agreement(θ, subst) then
14     πparent ! {θ}
15     for π ∈ Πchildren do kill π;
16     waitForParent(T, Θsent ∪ {θ})
17   else
18     σ ← choice(subst)
19     for π ∈ Πchildren do π ! σ;
20     waitForChildren(T, Θsent, Θbacktrack ∪ ((⋃π subst[π]) \ {σ}))
```

Goéland has, for the time being, no dedicated mechanism for equality reasoning. However, we have implemented an extension for deduction modulo theory [9], i.e., for transforming axioms into rewrite rules over propositions and terms. Deduction modulo theory has proved very useful for improving proof search when integrated into usual automated proof techniques [5], and also produces excellent results with manually defined rewrite rules [6,7]. In Goéland, deduction modulo theory selects some axioms on the basis of a simple syntactic criterion and replaces them by rewrite rules.

**Fig. 2.** Proof search and resulting proof for P(a) ∧ ¬P(b) ∧ ∀x.(P(x) ⇔ ∀y.P(y))

*Experimental Results.* We evaluated Goéland on two problem categories with FOF theorems in the TPTP library (v7.4.0): syntactic problems without equality (SYN) and problems of set theory (SET). The former was chosen for its elementary nature, whereas the latter was picked primarily to evaluate the performance of deduction modulo theory, as the axioms of set theory are good targets for rewriting. We compared the results with those of five other provers: the tableau-based provers Zenon (v0.8.5), Princess (v2021-05-10) and Leo-III (v1.6), as well as the saturation-based provers E (v2.6) and Vampire (v4.6.1). Experiments were executed on a computer equipped with an Intel Xeon E5-2680 v4 2.4 GHz 2×14-core processor and 128 GB of memory. Each proof attempt was limited to 300 s. Table 1 and Fig. 3 report the results. Table 1 shows the number of problems solved by each prover, the cumulative time, and the number of problems solved by a given prover but not by Goéland (+) and conversely (−). Figure 3 presents the cumulative time required to solve a given number of problems.

As can be observed, the results of Goéland are comparable to, or slightly better than, those of the other tableau-based provers on problems from SYN, while the saturation-based provers achieve the best results. On this category, the axioms do not trigger deduction modulo theory rewrite rules, hence the similar results of Goéland and Goéland+DMT. On SET, Goéland+DMT obtains significantly better results than the other tableau-based provers. This confirms previous results on the performance of deduction modulo theory for set theory [6,7].


**Table 1.** Experimental results over the TPTP library

**Fig. 3.** Cumulative time per problem solved for Goéland, Goéland+DMT (GDMT), Zenon, Princess, Leo-III, E, and Vampire

## **4 Conclusion**

We have presented a concurrent proof search procedure for tableaux in first-order logic with the aim of ensuring a fair exploration of the search space. This procedure has been implemented in the prover Goéland. This tool is still at an early stage and (with the exception of deduction modulo theory) implements only the most basic functionalities, yet the empirical results are encouraging. We plan to add functionalities such as equality reasoning, arithmetic reasoning, and support for polymorphism to Goéland, which should increase its usability and performance. The integration of these functionalities in the context of a concurrent prover seems to be a promising line of research. Further investigation is also needed to prove the fairness, and therefore completeness, of our procedure.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Binary Codes that Do Not Preserve Primitivity**

Štěpán Holub1(B), Martin Raška1, and Štěpán Starosta2(B)

<sup>1</sup> Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic holub@karlin.mff.cuni.cz

<sup>2</sup> Faculty of Information Technology, Czech Technical University in Prague, Prague, Czech Republic

stepan.starosta@fit.cvut.cz

**Abstract.** A code *X* is not primitivity preserving if there is a primitive list **w** ∈ lists *X* whose concatenation is imprimitive. We formalize a full characterization of such codes in the binary case in the proof assistant Isabelle/HOL. Part of the formalization, interesting on its own, is a description of {*x, y*}-interpretations of the square *xx* if |*y*| ≤ |*x*|. We also provide a formalized parametric solution of the related equation *x*<sup>j</sup> *y*<sup>k</sup> = *z*<sup>ℓ</sup>.

# **1 Introduction**

Consider two words abba and b. It is possible to concatenate (several copies of) them as b·abba·b, and obtain a power of a third word, namely a square bab·bab of bab. In this paper, we completely describe all ways how this can happen for two words, and formalize it in Isabelle/HOL.

The corresponding theory has a long history. The question can be formulated as solving equations in three variables of the special form W(x, y) = z<sup>ℓ</sup>, where the left hand side is a sequence of x's and y's, and ℓ ≥ 2. The seminal result in this direction is the paper by R. C. Lyndon and M.-P. Schützenberger [10] from 1962, which solves, in the more general setting of free groups, the equation x<sup>j</sup>y<sup>k</sup> = z<sup>ℓ</sup> with 2 ≤ j, k, ℓ. It was followed, in 1967, by a partial answer to our question by A. Lentin and M.-P. Schützenberger [9]. A complete characterization of monoids generated by three words was provided by L. G. Budkina and Al. A. Markov in 1973 [4]. The characterization was later, in 1976, reproved in a different way by Lentin's student J.-P. Spehner in his Ph.D. thesis [14], which even explicitly mentions the answer to the present question. See also a comparison of the two classifications by T. Harju and D. Nowotka [7]. In 1985, the result was again reproved by E. Barbin-Le Rest and M. Le Rest [1], this time specifically focusing on our question. Their paper contains a characterization of binary interpretations of a square as a crucial tool. The latter combinatorial result is interesting on its own, but is very little known. In addition to the fact that, as far as we know, the proof is not available in English, it has to be reconstructed from Théorème 2.1 and Lemme 3.1 in [1]; it is long, technical and little structured, with many intuitive steps that have to be clarified. It is symptomatic, for example, that Maňuch [11] cites the claim as essentially equivalent to his desired result but nevertheless provides a different, shorter but similarly technical proof.

The fact that several authors opted to provide their own proof of the already known result, and that even a weaker result was republished as new, shows that the existing proof was not considered sufficiently convincing and approachable. This makes the topic a perfect candidate for formalization. The proof we present here naturally contains some ideas of the proof from [1] but is significantly different. Our main objective was to follow the basic methodological requirement of a good formalization, namely to identify claims that are needed in the proof and to formulate them as separate lemmas, as generally as possible, so that they can be reused not only within the proof but also later. Moreover, the formalization naturally forced us to consider carefully the overall strategy of the proof (which is rather lost behind technical details in published works on this topic). Under Isabelle's pressure we eventually arrived at a hopefully clear proof structure which includes a simple but probably innovative use of the idea of "gluing" words. The analysis of the proof is therefore another, and we believe the most important, contribution of our formalization, in addition to the mere certainty that there are no gaps in the proof.

In addition, we provide a complete parametric solution of the equation x<sup>k</sup>y<sup>j</sup> = z<sup>ℓ</sup> for arbitrary j, k and ℓ, a classification which is not very difficult, but maybe too complicated to be useful in a mere unverified paper form.

The formalization presented here is an organic part of a larger project of formalization of combinatorics on words (see an introductory description in [8]). We are not aware of a similar formalization project in any proof assistant. The existence of the underlying library, which in turn extends the theories "List" and "HOL-Library.Sublist" from the standard Isabelle distribution, contributes critically to a smooth formalization which comes fairly close to the way a human paper proof would look, outsourcing technicalities to the (reusable) background. We accompany claims in this text with the names of their formalized counterparts.

### **2 Basic Facts and Notation**

Let Σ be an arbitrary set. Lists (i.e., finite sequences) [x<sub>1</sub>, x<sub>2</sub>,...,x<sub>n</sub>] of elements x<sub>i</sub> ∈ Σ are called *words* over Σ. The set of all words over Σ is usually denoted as Σ<sup>∗</sup>, using the Kleene star. A notorious ambivalence of this notation arises when we consider a set of words X ⊂ Σ<sup>∗</sup> and are interested in lists over X. They should be denoted as elements of X<sup>∗</sup>. However, X<sup>∗</sup> usually means something else (in the theory of rational languages), namely the set of all words in Σ<sup>∗</sup> generated by the set X. To avoid the confusion, we will therefore follow the notation used in the formalization in Isabelle, and write lists X instead, to make clear that the entries of an element of lists X are themselves words. In order to further help to distinguish words over the basic alphabet from lists over a set of words, we shall use boldface variables for the latter. In particular, it is important to keep in mind the difference between a letter a and the word [a] of length one, a distinction which is usually glossed over lightly in the literature on combinatorics on words. The set of words over Σ generated by X is then denoted as ⟨X⟩. The (associative) binary operation of concatenation of two words u and v is denoted by u · v. We prefer this algebraic notation to Isabelle's original @. Moreover, we shall often omit the dot, as usual. If **u** = [x<sub>1</sub>, x<sub>2</sub>,...,x<sub>n</sub>] ∈ lists X is a list of words, then we write concat **u** for x<sub>1</sub> · x<sub>2</sub> ··· x<sub>n</sub>. We write ε for the empty list, and u<sup>k</sup> for the concatenation of k copies of u (we use u@<sup>k</sup> in the formalization).
We write u ≤<sub>p</sub> v, u <<sub>p</sub> v, u ≤<sub>s</sub> v, u <<sub>s</sub> v, and u ≤<sub>f</sub> v to denote that u is a *prefix*, a *strict prefix*, a *suffix*, a *strict suffix* and a *factor* (that is, a contiguous sublist) of v, respectively. A word is *primitive* if it is nonempty and not a power of a shorter word. Otherwise, we call it *imprimitive*. Each nonempty word w is a power of a unique primitive word ρ w, its *primitive root*. A nonempty word r is a *periodic root* of a word w if w ≤<sub>p</sub> r · w. This is equivalent to w being a prefix of the right infinite power of r, denoted r<sup>ω</sup>. Note that we deal with finite words only, and we use the notation r<sup>ω</sup> only as a convenient shortcut for "a sufficiently long power of r". Two words u and v are *conjugate*, written u ∼ v, if u = rq and v = qr for some words r and q. Note that conjugation is an equivalence whose classes are also called *cyclic words*. A word u is a *cyclic factor* of w if it is a factor of some conjugate of w. A set of words X is a *code* if its elements do not satisfy any nontrivial relation, that is, if they are a basis of a free semigroup. For a two-element set {x, y}, this is equivalent to x and y being non-commuting, i.e., xy ≠ yx, and also to ρ x ≠ ρ y. An important characterization of a semigroup S of words being free is the *stability condition*, which is the implication u, v, uz, zv ∈ S =⇒ z ∈ S. The longest common prefix of u and v is denoted by u ∧<sub>p</sub> v. If {x, y} is a (binary) code, then (x · w) ∧<sub>p</sub> (y · w′) = xy ∧<sub>p</sub> yx for any sufficiently long w, w′ ∈ ⟨{x, y}⟩. We explain some elementary facts from combinatorics on words used in this article in more detail in Sect. 8.
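
Since all objects here are finite words, the basic notions translate directly into executable code. The following Python sketch (strings playing the role of words; an illustration only, independent of the Isabelle formalization) implements the primitive root, conjugacy, periodic roots, and longest common prefixes:

```python
# Minimal executable versions of the basic notions; strings stand for words.

def primitive_root(w):
    """Shortest r with w = r^k, i.e. the primitive root ρ w (w nonempty)."""
    n = len(w)
    for d in range(1, n + 1):
        if n % d == 0 and w[:d] * (n // d) == w:
            return w[:d]

def is_primitive(w):
    """A word is primitive iff it is nonempty and equals its primitive root."""
    return len(w) > 0 and primitive_root(w) == w

def are_conjugate(u, v):
    """u ~ v iff u = rq and v = qr; equivalently |u| = |v| and v occurs in uu."""
    return len(u) == len(v) and v in u + u

def has_periodic_root(w, r):
    """r is a periodic root of w iff w is a prefix of r·w (i.e. of r^ω)."""
    return len(r) > 0 and (r + w).startswith(w)

def lcp(u, v):
    """Longest common prefix u ∧_p v."""
    i = 0
    while i < min(len(u), len(v)) and u[i] == v[i]:
        i += 1
    return u[:i]
```

For example, `primitive_root("abab")` is `"ab"`, and `"aab"` and `"aba"` are conjugate.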

## **3 Main Theorem**

Let us introduce the central definition of the paper.

**Definition 1.** *We say that a set* X *of words is* primitivity preserving *if there is no word* **w** ∈ lists X *such that* |**w**| ≥ 2*,* **w** *is primitive, and* concat **w** *is imprimitive.*

Note that our definition does not take into account singletons **w** = [x]. In particular, X can be primitivity preserving even if some x ∈ X is imprimitive. Nevertheless, in the binary case, we will also provide some information about the cases when one or both elements of the code have to be primitive.

In [12], V. Mitrana formulates the primitivity of a set in terms of morphisms, and shows that X is primitivity preserving if and only if it is the minimal set of generators of a "pure monoid", cf. [3, p. 276]. This brings about a wider concept of morphisms preserving a given property, most classically square-freeness, see for example a characterization of square-free morphisms over three letters by M. Crochemore [5].

The target claim of our formalization is the following characterization of words witnessing that a binary code is not primitivity preserving:

**Theorem 1 (**bin imprim code**).** *Let* B = {x, y} *be a code that is not primitivity preserving. Then there are integers* j ≥ 1 *and* k ≥ 1*, with* k = 1 *or* j = 1*, such that the following conditions are equivalent for any* **w** ∈ lists B *with* |**w**| ≥ 2*:*

*–* **w** *is primitive and* concat **w** *is imprimitive;*
*–* **w** *is conjugate with* [x]<sup>j</sup> · [y]<sup>k</sup>*.*

*Moreover, assuming* |y| ≤ |x|*:*

*– if* j ≥ 2*, then* j = 2 *and* k = 1*, and both* x *and* y *are primitive;*
*– if* k ≥ 2*, then* j = 1 *and* x *is primitive.*

*Proof.* Let **w** be a word witnessing that B is not primitivity preserving. That is, |**w**| ≥ 2, **w** is primitive, and concat **w** is imprimitive. Since [x]<sup>j</sup> · [y]<sup>k</sup> and [y]<sup>k</sup> · [x]<sup>j</sup> are conjugate, we can suppose, without loss of generality, that |y| ≤ |x|. First, we want to show that **w** is conjugate with [x]<sup>j</sup> · [y]<sup>k</sup> for some j, k ≥ 1 such that k = 1 or j = 1. Since **w** is primitive and of length at least two, it contains both x and y. If it contains one of these letters exactly once, then **w** is clearly conjugate with [x]<sup>j</sup> · [y]<sup>k</sup> for j = 1 or k = 1. Therefore, the difficult part is to show that no primitive **w** with concat **w** imprimitive can contain both letters at least twice. This is the main task of the rest of the paper, which is finally accomplished by Theorem 4, claiming that words that contain at least two occurrences of x are conjugate with [x, x, y]. To complete the proof of the first part of the theorem, it remains to show that j and k do not depend on **w**. This follows from Lemma 1.

Note that the imprimitivity of concat **w** induces the equality x<sup>j</sup>y<sup>k</sup> = z<sup>ℓ</sup> for some z and ℓ ≥ 2. The already mentioned seminal result of Lyndon and Schützenberger shows that j and k cannot be simultaneously at least two, since otherwise x and y commute. For the same reason, considering its primitive root, the word y is primitive if j ≥ 2. Similarly, x is primitive if k ≥ 2. The primitivity of x when j = 2 is a part of Theorem 4.
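
The j = k = 2 instance of the Lyndon–Schützenberger theorem invoked here can be checked by brute force on short binary words; the following script is an empirical illustration only, not a proof:

```python
from itertools import product

def is_primitive(w):
    """A word is primitive iff it is nonempty and not a proper power."""
    n = len(w)
    return n > 0 and all(w[:d] * (n // d) != w
                         for d in range(1, n) if n % d == 0)

# Exhaustive check on all binary words of length at most 4: if x^2 y^2 is
# imprimitive (a proper power z^l with l >= 2), then x and y commute.
words = ["".join(t) for n in range(1, 5) for t in product("ab", repeat=n)]
for x in words:
    for y in words:
        if not is_primitive(x * 2 + y * 2):
            assert x + y == y + x
```

The search finds no counterexample, as the theorem guarantees.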

We start by giving a complete parametric solution of the equation x<sup>j</sup>y<sup>k</sup> = z<sup>ℓ</sup> in the following theorem. This will eventually yield, after the proof of Theorem 1 is completed, a full description of not primitivity preserving binary codes. Since the equation is mirror symmetric, we omit symmetric cases by assuming |y| ≤ |x|.

**Theorem 2 (**LS parametric solution**).** *Let* ℓ ≥ 2*,* j, k ≥ 1*, and* |y| ≤ |x|*. The equality* x<sup>j</sup>y<sup>k</sup> = z<sup>ℓ</sup> *holds if and only if one of the following cases takes place:*

*A. There exists a word* r*, and integers* m, n, t ≥ 0 *such that*

$$\begin{aligned} mj + nk &= t\ell, \quad \text{and} \\ x = r^m, \quad y = r^n, \quad z = r^t; \end{aligned}$$

*B.* j = k = 1 *and there exist non-commuting words* r *and* q*, and integers* m, n ≥ 0 *such that*

$$\begin{aligned} m+n+1&=\ell, \quad \text{and} \\ x=(rq)^m r, \quad y&=q(rq)^n, \quad z=rq; \end{aligned}$$

*C.* j = ℓ = 2*,* k = 1 *and there exist non-commuting words* r *and* q *and an integer* m ≥ 2 *such that*

$$x = (rq)^m r, \quad y = qrrq, \quad z = (rq)^m rrq;$$

*D.* j = 1 *and* k ≥ 2 *and there exist non-commuting words* r *and* q *such that*

$$x = (qr^k)^{\ell - 1}q, \quad y = r, \quad z = qr^k;$$

*E.* j = 1 *and* k ≥ 2 *and there are non-commuting words* r *and* q*, an integer* m ≥ 1 *such that*

$$x = \left(qr(r(qr)^m)^{k-1}\right)^{\ell-2}qr(r(qr)^m)^{k-2}rq, \quad y = r(qr)^m, \quad z = qr(r(qr)^m)^{k-1}.$$
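
Before turning to the proof, the parametric families can be sanity-checked on concrete instances. The script below uses the arbitrary choice r = a, q = b and small exponents (our choice, purely illustrative) and verifies each identity by string concatenation:

```python
r, q = "a", "b"                  # any pair of non-commuting words would do
rq = r + q

# A (m = n = t = 2, j = k = 1, l = 2; then m*j + n*k = t*l):
x = y = z = r * 2
assert x + y == z * 2

# B (j = k = 1, m = n = 1, l = 3): x = (rq)^m r, y = q(rq)^n, z = rq
x, y, z = rq + r, q + rq, rq
assert x + y == z * 3

# C (j = l = 2, k = 1, m = 2): x = (rq)^m r, y = qrrq, z = (rq)^m rrq
x, y, z = rq * 2 + r, q + r + r + q, rq * 2 + r + r + q
assert x * 2 + y == z * 2

# D (j = 1, k = 2, l = 3): x = (q r^k)^(l-1) q, y = r, z = q r^k
z = q + r * 2
x, y = z * 2 + q, r
assert x + y * 2 == z * 3

# E (j = 1, k = 2, m = 1, l = 2): y = r(qr)^m, z = qr(r(qr)^m)^(k-1)
y = r + q + r
z = q + r + y
x = q + r + r + q            # z^(l-2) · qr · y^(k-2) · rq collapses to qrrq here
assert x + y * 2 == z * 2
```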

*Proof.* If x and y commute, then all three words commute, hence they are a power of a common word. A length argument yields the solution A.

Assume now that {x, y} is a code. Then no pair of the words x, y and z commutes. We have shown in the overview of the proof of Theorem 1 that j = 1 or k = 1 by the Lyndon–Schützenberger theorem. The solution is then split into several cases.

*Case 1* : j = k = 1.

Let m and r be such that z<sup>m</sup>r = x with r a strict prefix of z. By setting z = rq, we obtain the solution B with n = ℓ − m − 1.

*Case 2*: j ≥ 2, k = 1. Since |y| ≤ |x| and ℓ ≥ 2, we have

$$2|z| \le |z^{\ell}| = |x^{j}| + |y| < 2|x^{j}|,$$

so z is a strict prefix of x<sup>j</sup>.

As x<sup>j</sup> has periodic roots both z and x, and z does not commute with x, the Periodicity lemma implies |x<sup>j</sup>| < |z| + |x|. That is, z = x<sup>j−1</sup>u, x<sup>j</sup> = zv and x = uv for some nonempty words u and v. As v is a prefix of z, it is also a prefix of x. Therefore, we have

$$x = uv = vu'$$

for some word u′. This is a well-known conjugation equality which implies u = rq, u′ = qr and v = (rq)<sup>n</sup>r for some words r, q and an integer n ≥ 0.

We have

$$j|x| + |y| = |x^j y| = |z^\ell| = \ell(j-1)|x| + \ell|u|,$$

and thus |y| = (ℓj − ℓ − j)|x| + ℓ|u|. Since |y| ≤ |x|, |u| > 0, j ≥ 2, and ℓ ≥ 2, it follows that ℓj − ℓ − j = 0, which implies j = ℓ = 2. We therefore have x<sup>2</sup>y = z<sup>2</sup> and x<sup>2</sup> = zv, hence vy = z.

Combining u = rq, u′ = qr, and v = (rq)<sup>n</sup>r with x = vu′, z = x<sup>j−1</sup>u = xu = vu′u, and vy = z, we obtain the solution C with m = n + 1. The assumption |y| ≤ |x| implies m ≥ 2.

*Case 3*: j = 1, k ≥ 2, y<sup>k</sup> ≤<sub>s</sub> z.

We have z = qy<sup>k</sup> for some word q. Noticing that x = z<sup>ℓ−1</sup>q yields the solution D.

*Case 4*: j = 1, k ≥ 2, z <<sub>s</sub> y<sup>k</sup>.

This case is analogous to the second part of Case 2. Using the Periodicity lemma, we obtain uy<sup>k−1</sup> = z, y<sup>k</sup> = vz, and y = vu with nonempty u and v. As v is a suffix of z, it is also a suffix of y, and we have y = vu = u′v for some u′. Plugging the solution of the last conjugation equality, namely u = rq, u′ = qr, v = (rq)<sup>n</sup>r, into y = u′v, z = uy<sup>k−1</sup> and z<sup>ℓ−1</sup> = xv gives the solution E with m = n + 1.

Finally, the words r and q do not commute since x and y, which are generated by r and q, do not commute.

The proof is completed by a direct verification of the converse. 

We now show that, for a given not primitivity preserving binary code, there is a unique pair of exponents (j, k) such that x<sup>j</sup>y<sup>k</sup> is imprimitive.

**Lemma 1 (**LS unique**).** *Let* B = {x, y} *be a code. Assume* j, k, j′, k′ ≥ 1*. If both* x<sup>j</sup>y<sup>k</sup> *and* x<sup>j′</sup>y<sup>k′</sup> *are imprimitive, then* j = j′ *and* k = k′*.*

*Proof.* Let z<sub>1</sub>, z<sub>2</sub> be primitive words and ℓ, ℓ′ ≥ 2 be such that

$$x^j y^k = z\_1^\ell \quad \text{and} \quad x^{j'} y^{k'} = z\_2^{\ell'}.\tag{1}$$

Since B is a code, the words x and y do not commute. We proceed by contradiction.

*Case 1*: First, assume that j = j′ and k ≠ k′.

Let, without loss of generality, k < k′. From (1) we obtain z<sub>1</sub><sup>ℓ</sup>y<sup>k′−k</sup> = z<sub>2</sub><sup>ℓ′</sup>. The case k′ − k ≥ 2 is impossible due to the Lyndon–Schützenberger theorem. Hence k′ − k = 1. This is another place where the formalization triggered a simple and nice general lemma (easily provable by the Periodicity lemma) which will turn out to be useful also in the proof of Theorem 4. Namely, the lemma imprim\_ext\_suf\_comm claims that if both uv and uvv are imprimitive, then u and v commute. We apply this lemma to u = x<sup>j</sup>y<sup>k−1</sup> and v = y, obtaining a contradiction with the assumption that x and y do not commute.
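
The claim of imprim\_ext\_suf\_comm is easy to test empirically; a brute-force search over short binary words (an illustration only, not a substitute for the formal proof) finds no counterexample:

```python
from itertools import product

def is_primitive(w):
    """A word is primitive iff it is nonempty and not a proper power."""
    n = len(w)
    return n > 0 and all(w[:d] * (n // d) != w
                         for d in range(1, n) if n % d == 0)

# If both uv and uvv are imprimitive, then u and v must commute.
words = ["".join(t) for n in range(1, 5) for t in product("ab", repeat=n)]
for u, v in product(words, repeat=2):
    if not is_primitive(u + v) and not is_primitive(u + v + v):
        assert u + v == v + u
```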

*Case 2.* The case k = k′ and j ≠ j′ is symmetric to Case 1.

*Case 3.* Let finally j ≠ j′ and k ≠ k′. The Lyndon–Schützenberger theorem implies that either j or k is one, and similarly either j′ or k′ is one. We can therefore assume that k = j′ = 1 and k′, j ≥ 2. Moreover, we can assume that |y| ≤ |x|. Indeed, in the opposite case, we can consider the words y<sup>k</sup>x<sup>j</sup> and y<sup>k′</sup>x<sup>j′</sup> instead, which are also both imprimitive.

Theorem 2 now allows only the case C for the equality x<sup>j</sup>y = z<sub>1</sub><sup>ℓ</sup>. We therefore have j = ℓ = 2 and x = (rq)<sup>m</sup>r, y = qrrq for an integer m ≥ 2 and some non-commuting words r and q. Since y = qrrq is a suffix of z<sub>2</sub><sup>ℓ′</sup>, this implies that z<sub>2</sub> and rq do not commute. Consider the word x · qr = (rq)<sup>m</sup>rqr, which is a prefix of xy, and therefore also of z<sub>2</sub><sup>ℓ′</sup>. This means that x · qr has two periodic roots, namely rq and z<sub>2</sub>, and the Periodicity lemma implies that |x · qr| < |rq| + |z<sub>2</sub>|. Hence x is shorter than z<sub>2</sub>. The equality xy<sup>k′</sup> = z<sub>2</sub><sup>ℓ′</sup>, with ℓ′ ≥ 2, now implies on one hand that rqrq is a prefix of z<sub>2</sub>, and on the other hand that z<sub>2</sub> is a suffix of y<sup>k′</sup>. It follows that rqrq is a factor of (qrrq)<sup>k′</sup>. Hence rqrq and qrrq are conjugate, thus they both have a period of length |rq|, which implies qr = rq. This is a contradiction.

The rest of the paper, and therefore also of the proof of Theorem 1, is organized as follows. In Sect. 4, we introduce a general theory of interpretations, which is behind the main idea of the proof, and apply it to the (relatively simple) case of a binary code with words of the same length. In Sect. 5 we characterize the unique disjoint extendable {x, y}-interpretation of the square of the longer word x. This is a result of independent interest, and also the cornerstone of the proof of Theorem 1 which is completed in Sect. 6 by showing that a word containing at least two x's witnessing that {x, y} is not primitivity preserving is conjugate with [x, x, y].

#### **4 Interpretations and the Main Idea**

Let X be a code and let u be a factor of concat **w** for some **w** ∈ lists X. The natural question is how u can be produced as a factor of words from X or, in other words, how it can be interpreted in terms of X. This motivates the following definition.

**Definition 2.** *Let* X *be a set of words over* Σ*. We say that the triple* (p, s, **w**) ∈ Σ<sup>∗</sup> × Σ<sup>∗</sup> × lists X *is an* X-interpretation *of a word* u ∈ Σ<sup>∗</sup> *if*

*–* **w** *is nonempty;*
*–* p · u · s = concat **w***;*
*–* p <<sub>p</sub> hd **w***; and*
*–* s <<sub>s</sub> last **w***.*

The definition is illustrated by the following figure, where **w** = [w1, w2, w3, w4]:


The second condition of the definition motivates the notation pus ∼<sup>I</sup> **w** for the situation when (p, s, **w**) is an X-interpretation of u.

*Remark 1.* For the sake of historical reference, we remark that our definition of X-interpretation differs from the one used in [1]. Their formulation of the situation depicted by the above figure would be that u is interpreted by the triple (s′, w<sub>2</sub> · w<sub>3</sub>, p′), where p · s′ = w<sub>1</sub> and p′ · s = w<sub>4</sub>. This is less convenient for two reasons. First, the decomposition of w<sub>2</sub> · w<sub>3</sub> into [w<sub>2</sub>, w<sub>3</sub>] is only implicit here (and even ambiguous if X is not a code). Second, while it is required that the words p′ and s′ are a prefix and a suffix, respectively, of an element from X, the identity of that element is left open, and has to be specified separately.

If u is a nonempty element of ⟨X⟩ and u = concat **u** for **u** ∈ lists X, then the X-interpretation εuε ∼<sup>I</sup> **u** is called *trivial*. Note that the trivial X-interpretation is unique if X is a code.

As nontrivial X-interpretations of elements from X are of particular interest, the following two concepts are useful.

**Definition 3.** *An* X*-interpretation* pus ∼<sup>I</sup> **w** *of* u = concat **u** *is called*


Note that a disjoint X-interpretation is not trivial, and that being disjoint is relative to a chosen factorization **u** of u (which is nevertheless unique if X is a code).

The definitions above are naturally motivated by **the main idea** of the characterization of sets X that do not preserve primitivity, which dates back to Lentin and Schützenberger [9]. If **w** is primitive, while concat **w** is imprimitive, say concat **w** = z<sup>k</sup>, k ≥ 2, then the shift by z provides a nontrivial and extendable X-interpretation of concat **w** (in fact, k − 1 such nontrivial interpretations). Moreover, the following lemma, formulated in a more general setting of two words **w**<sub>1</sub> and **w**<sub>2</sub>, implies that the X-interpretation is disjoint if X is a code.

**Lemma 2 (**shift interpret*,* shift disjoint**).** *Let* X *be a code. Let* **w**<sub>1</sub>, **w**<sub>2</sub> ∈ lists X *be such that* z · concat **w**<sub>1</sub> = concat **w**<sub>2</sub> · z *where* z ∉ ⟨X⟩*. Then* z · concat **v**<sub>1</sub> ≠ concat **v**<sub>2</sub>*, whenever* **v**<sub>1</sub> ≤<sub>p</sub> **w**<sub>1</sub><sup>n</sup> *and* **v**<sub>2</sub> ≤<sub>p</sub> **w**<sub>2</sub><sup>n</sup>*,* n ∈ N*.*

*In particular,* concat **u** *has a disjoint extendable* X*-interpretation for any prefix* **u** *of* **w**1*.*

The excluded possibility is illustrated by the following figure.


*Proof.* First, note that z · concat **w**<sub>1</sub><sup>n</sup> = concat **w**<sub>2</sub><sup>n</sup> · z for any n. Let **w**<sub>1</sub><sup>n</sup> = **v**<sub>1</sub> · **v**′<sub>1</sub> and **w**<sub>2</sub><sup>n</sup> = **v**<sub>2</sub> · **v**′<sub>2</sub>. If z · concat **v**<sub>1</sub> = concat **v**<sub>2</sub>, then also concat **v**′<sub>2</sub> · z = concat **v**′<sub>1</sub>. This contradicts z ∉ ⟨X⟩ by the stability condition.

An extendable X-interpretation of **u** is induced by the fact that concat **u** is covered by concat(**w**<sub>2</sub> · **w**<sub>2</sub>). The interpretation is disjoint by the first part of the proof.

In order to apply the above lemma to the imprimitive concat **w** = z<sup>k</sup> of a primitive **w**, set **w**<sub>1</sub> = **w**<sub>2</sub> = **w**. The assumption z ∉ ⟨X⟩ follows from the primitivity of **w**: indeed, if z = concat **z** with **z** ∈ lists X, then **w** = **z**<sup>k</sup> since X is a code, contradicting the primitivity of **w**.

We first apply the main idea to a relatively simple case: nontrivial {x, y}-interpretations of the word x · y where x and y are of the same length.

**Lemma 3 (**uniform square interp**).** *Let* B = {x, y} *be a code with* |x| = |y|*. Let* p (x · y) s ∼<sup>I</sup> **v** *be a nontrivial* B*-interpretation. Then* **v** = [x, y, x] *or* **v** = [y, x, y] *and* x · y *is imprimitive.*

*Proof.* From p · x · y · s = concat **v**, it follows by a length argument that |**v**| is three. A straightforward way to prove the claim is to consider all eight possible candidates for **v**. In each case, a routine few-line proof, which we omit, shows that x = y unless **v** = [x, y, x] or **v** = [y, x, y]. In the latter cases, x · y is a nontrivial factor of its square (x · y) · (x · y), which yields the imprimitivity of x · y.
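
Lemma 3 can also be illustrated computationally. The naive enumerator below (our illustrative sketch; the bound of three factors is justified by the length argument in the proof) lists all B-interpretations of x · y for the sample code x = aba, y = bab, for which x · y = (ab)<sup>3</sup> is imprimitive:

```python
from itertools import product

def interpretations(X, u, max_words=3):
    """Naive enumeration of X-interpretations (p, s, w) of u:
    p·u·s = concat w, p a strict prefix of hd w, s a strict suffix of last w."""
    found = []
    for n in range(1, max_words + 1):
        for w in product(sorted(X), repeat=n):
            c = "".join(w)
            for i in range(len(w[0])):          # p runs over strict prefixes
                p = w[0][:i]
                if not c.startswith(p + u):
                    continue
                s = c[len(p) + len(u):]         # forced by p·u·s = concat w
                if len(s) < len(w[-1]) and w[-1].endswith(s):
                    found.append((p, s, list(w)))
    return found

x, y = "aba", "bab"                  # an equal-length code; x·y = (ab)^3
res = interpretations({x, y}, x + y)
nontrivial = [t for t in res if (t[0], t[1]) != ("", "")]
# Every nontrivial interpretation has the shape predicted by Lemma 3.
assert all(w in ([x, y, x], [y, x, y]) for _, _, w in nontrivial)
```

Besides the trivial interpretation, the search finds exactly the two interpretations with **v** = [x, y, x] and **v** = [y, x, y].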

The previous (sketch of the) proof nicely illustrates on a small scale the advantages of formalization. It is not necessary to choose between a tedious elementary proof for sake of completeness on one hand, and the suspicion that something was missed on the other hand (leaving aside that the same suspicion typically remains even after the tedious proof). A bit ironically, the most difficult part of the formalization is to show that **v** is indeed of length three, which needs no further justification in a human proof.

We have the following corollary which is a variant of Theorem 4, and also illustrates the main idea of its proof.

**Lemma 4 (**bin imprim not conjug**).** *Let* B = {x, y} *be a binary code with* |x| = |y|*. If* **w** ∈ lists B *is such that* |**w**| ≥ 2*,* **w** *is primitive, and* concat **w** *is imprimitive, then* x *and* y *are not conjugate.*

*Proof.* Since **w** is primitive and of length at least two, it contains both letters x and y. Therefore, it has either [x, y] or [y, x] as a factor. The imprimitivity of concat **w** yields a nontrivial B-interpretation of x · y, which implies that x · y is not primitive by Lemma 3.

Let x and y be conjugate, and let x = r · q and y = q · r. Since x · y = r · q · q · r is imprimitive, also r · r · q · q is imprimitive. Then r and q commute by the theorem of Lyndon and Schützenberger, a contradiction with x ≠ y.

## **5 Binary Interpretation of a Square**

Let B = {x, y} be a code such that |y| ≤ |x|. In accordance with the main idea, the core technical component of the proof is the description of the disjoint extendable B-interpretations of the square x<sup>2</sup>. This is a very nice result, relatively simple to state but difficult to prove, and valuable on its own. As we mentioned already, it can be obtained from Théorème 2.1 and Lemme 3.1 in [1].

**Theorem 3 (**square interp ext.sq ext interp**).** *Let* B = {x, y} *be a code such that* |y|≤|x|*, both* x *and* y *are primitive, and* x *and* y *are not conjugate. Let* p (x · x) s ∼<sup>I</sup> **w** *be a disjoint extendable* B*-interpretation. Then*

$$\mathbf{w} = [x, y, x], \qquad \qquad s \cdot p = y, \qquad \qquad p \cdot x = x \cdot s. \tag{1}$$

In order to appreciate the theorem, note that the definition of interpretation implies

$$p \cdot x \cdot x \cdot s = x \cdot y \cdot x,$$

hence x · y · x = (p · x)<sup>2</sup>. This will turn out to be the only way in which primitivity can fail to be preserved when x occurs at least twice in **w**. Here is an example with x = 01010 and y = 1001:
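
The example can be checked directly; the boundary words p = 01 and s = 10 below are derived by hand from the theorem's conclusions:

```python
x, y = "01010", "1001"
p, s = "01", "10"      # hand-derived boundary words, consistent with Theorem 3

assert p + x + x + s == x + y + x     # p·(x·x)·s = concat [x, y, x]
assert s + p == y                     # s·p = y
assert p + x == x + s                 # p·x = x·s
assert (p + x) * 2 == x + y + x       # hence x·y·x = (p·x)^2
```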


*Proof.* By the definition of a disjoint interpretation, we have p · x · x · s = concat **w**, where p ≠ ε and s ≠ ε. A length argument implies that **w** has length at least three. Since a primitive word is not a nontrivial factor of its square, we have **w** = [hd **w**] · [y]<sup>k</sup> · [last **w**], with k ≥ 1. Since the interpretation is disjoint, we can split the equality into p · x = hd **w** · y<sup>m</sup> · u and x · s = v · y<sup>ℓ</sup> · last **w**, where y = u · v, both u and v are nonempty, and k = ℓ + m + 1. We want to show hd **w** = last **w** = x and m = ℓ = 0. The situation is mirror symmetric, so we can solve cases two at a time.

If hd **w** = last **w** = y, then powers of x and y share a factor of length at least |x| + |y|. Since they are primitive, this implies that they are conjugate, a contradiction. The same argument applies when ℓ ≥ 1 and hd **w** = y (when m ≥ 1 and last **w** = y, respectively). Therefore, in order to prove hd **w** = last **w** = x, it remains to exclude the case hd **w** = y, ℓ = 0 and last **w** = x (last **w** = y, m = 0 and hd **w** = x, respectively). This is covered by one of the technical lemmas that we single out:

**Lemma 5 (**pref suf pers short**).** *Let* x ≤<sub>p</sub> v · x*,* x ≤<sub>s</sub> p · u · v · u *and* |x| > |v · u| *with* p ∈ ⟨{u, v}⟩*. Then* u · v = v · u*.*

This lemma indeed excludes the case we wanted to exclude, since the conclusion implies that y is not primitive. We skip the proof of the lemma here and make instead an informal comment. Note that v is a periodic root of x. In other words, x is a factor of v<sup>ω</sup>. Therefore, with the stronger assumption that v · u · v is a factor of x, the conclusion follows easily by the familiar principle that v being a factor of v<sup>ω</sup> "synchronizes" primitive roots of v. Lemma 5 then exemplifies one of the virtues of formalization, which makes it easy to generalize auxiliary lemmas, often just by following the most natural proof and checking its minimal necessary assumptions.

Now we have hd **w** = last **w** = x, hence p · x = x · y<sup>m</sup> · u and x · s = v · y<sup>ℓ</sup> · x. The natural way to describe this scenario is to observe that x has both the (prefix) periodic root v · y<sup>ℓ</sup> and the suffix periodic root y<sup>m</sup> · u. Using again Lemma 5, we exclude the situations when ℓ = 0 and m ≥ 1 (m = 0 and ℓ ≥ 1, resp.). It therefore remains to deal with the case when both m and ℓ are positive. We divide this into four lemmas according to the size of the overlap the prefix v · y<sup>ℓ</sup> and the suffix y<sup>m</sup> · u have in x. More exactly, the cases are:

$$\begin{array}{l} - \left| v \cdot y^{\ell} \right| + \left| y^{m} \cdot u \right| \le |x| \\ - \left| x \right| < \left| v \cdot y^{\ell} \right| + \left| y^{m} \cdot u \right| \le |x| + |u| \\ - \left| x \right| + \left| u \right| < \left| v \cdot y^{\ell} \right| + \left| y^{m} \cdot u \right| < |x| + \left| u \cdot v \right| \\ - \left| x \right| + \left| u \cdot v \right| \le \left| v \cdot y^{\ell} \right| + \left| y^{m} \cdot u \right| \end{array}$$

and they are solved by an auxiliary lemma each. The first three cases yield that u and v commute, the first one being a straightforward application of the Periodicity lemma. The last one is also a straightforward application of the "synchronization" idea. It implies that x · x is a factor of y<sup>ω</sup>, a contradiction with the assumption that x and y are primitive and not conjugate. Consequently, the technical, tedious part of the whole proof is concentrated in the lemmas dealing with the second and the third case (see lemmas short\_overlap and medium\_overlap in the theory Binary\_Square\_Interpretation.thy). The corresponding proofs are further analyzed and decomposed into more elementary claims in the formalization, where further details can be found.

This completes the proof of **w** = [x, y, x]. A byproduct of the proof is the description of the words x, y, p and s. Namely, there are non-commuting words r and t, and integers m, k and ℓ such that

$$x = (rt)^{m+1} \cdot r, \quad y = (tr)^{k+1} \cdot (rt)^{\ell+1}, \quad p = (rt)^{k+1}, \quad s = (tr)^{\ell+1}.$$

The second claim of the present theorem, that is, y = s · p, is then equivalent to k = ℓ, and it is an easy consequence of the assumption that the interpretation is extendable.

#### **6 The Witness with Two** *x***'s**

In this section, we characterize words witnessing that {x, y} is not primitivity preserving and containing at least two x's.

**Theorem 4 (**bin imprim longer twice**).** *Let* B = {x, y} *be a code such that* |y| ≤ |x|*. Let* **w** ∈ lists B *be a primitive word which contains* x *at least twice and such that* concat **w** *is imprimitive.*

*Then* **w** ∼ [x, x, y] *and both* x *and* y *are primitive.*

We divide the proof into three steps.

**The Core Case.** We first prove the claim with two additional assumptions which will be subsequently removed. Namely, the following lemma shows how the knowledge about the B-interpretation of x · x from the previous section is used. The additional assumptions are displayed as items.

**Lemma 6 (**bin imprim primitive**).** *Let* B = {x, y} *be a code with* |y|≤|x| *where*

*– both* x *and* y *are primitive,*

*and let* **w** ∈ lists B *be primitive such that* concat **w** *is imprimitive, and*

*–* [x, x] *is a cyclic factor of* **w***.*

*Then* **w** ∼ [x, x, y]*.*

*Proof.* Choosing a suitable conjugate of **w**, we can suppose, without loss of generality, that [x, x] is a prefix of **w**. Now, we want to show **w** = [x, x, y]. Proceed by contradiction and assume **w** ≠ [x, x, y]. Since **w** is primitive, this implies **w** · [x, x, y] ≠ [x, x, y] · **w**.

By Lemma 4, we know that x and y are not conjugate. Let concat **w** = z<sup>k</sup>, 2 ≤ k and z primitive. Lemma 2 yields a disjoint extendable B-interpretation of (concat **w**)<sup>2</sup>. In particular, the induced disjoint extendable B-interpretation of the prefix x · x is of the form p (x · x) s ∼<sup>I</sup> [x, y, x] by Theorem 3:

Let **p** be the prefix of **w** such that concat **p** · p = z. Then

$$\texttt{concat}(\mathbf{p} \cdot [x, y]) = z \cdot (x \cdot p), \quad \texttt{concat}\,[x, x, y] = (x \cdot p)^2, \quad \texttt{concat}\,\mathbf{w} = z^k,$$

and we want to show z = x · p, which will imply concat([x, x, y] · **w**) = concat(**w** · [x, x, y]), hence **w** = [x, x, y] since {x, y} is a code and both **w** and [x, x, y] are primitive, a contradiction.

Again, proceed by contradiction, and assume z ≠ x · p. Then, since both z and x · p are primitive, they do not commute. We now have two binary codes, namely {**w**, [x, x, y]} and {z, x · p}. The following two equalities, (2) and (3), exploit the fundamental property of longest common prefixes of elements of binary codes mentioned in Sect. 2. In particular, we need the following lemma:

**Lemma 7 (**bin code lcp concat**).** *Let* X = {u0, u1} *be a binary code, and let* **z**0, **z**<sup>1</sup> ∈ lists X *be such that* concat **z**<sup>0</sup> *and* concat **z**<sup>1</sup> *are not prefixcomparable. Then*

$$(\mathbf{concat} \,\mathbf{z}\_0) \wedge\_p (\mathbf{concat} \,\mathbf{z}\_1) = \mathbf{concat} (\mathbf{z}\_0 \wedge\_p \mathbf{z}\_1) \cdot (u\_0 u\_1 \wedge\_p u\_1 u\_0).$$
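
As a small illustration of the identity (not part of the formal development), take the code {a, ab}, an arbitrary choice of ours, and two lists whose concatenations are not prefix-comparable:

```python
def lcp(u, v):
    """Longest common prefix of two words."""
    i = 0
    while i < min(len(u), len(v)) and u[i] == v[i]:
        i += 1
    return u[:i]

def lcp_list(z0, z1):
    """Longest common prefix of two lists of words."""
    i = 0
    while i < min(len(z0), len(z1)) and z0[i] == z1[i]:
        i += 1
    return z0[:i]

u0, u1 = "a", "ab"                            # a binary code
alpha = lcp(u0 + u1, u1 + u0)                 # u0·u1 ∧_p u1·u0 = "a"

z0, z1 = ["a", "a", "ab"], ["a", "ab", "a"]   # concatenations diverge
lhs = lcp("".join(z0), "".join(z1))
rhs = "".join(lcp_list(z0, z1)) + alpha
assert lhs == rhs == "aa"
```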

See Sect. 8 for more comments on this property. Denote α<sub>z,xp</sub> = z · xp ∧<sub>p</sub> xp · z. Then also α<sub>z,xp</sub> = z<sup>k</sup> · (xp)<sup>2</sup> ∧<sub>p</sub> (xp)<sup>2</sup> · z<sup>k</sup>. Similarly, let α<sub>x,y</sub> = x · y ∧<sub>p</sub> y · x. Then Lemma 7 yields

$$\begin{aligned} \alpha\_{z,xp} &= \texttt{concat}(\mathbf{w} \cdot [x,x,y]) \wedge\_p \texttt{concat}([x,x,y] \cdot \mathbf{w}) \\ &= \texttt{concat}(\mathbf{w} \cdot [x,x,y] \wedge\_p [x,x,y] \cdot \mathbf{w}) \cdot \alpha\_{x,y} \end{aligned} \tag{2}$$

and also

$$\begin{split} z \cdot \alpha\_{z,xp} &= \texttt{concat}(\mathbf{w} \cdot \mathbf{p} \cdot [x,y]) \wedge\_p \texttt{concat}(\mathbf{p} \cdot [x,y] \cdot \mathbf{w}) \\ &= \texttt{concat}(\mathbf{w} \cdot \mathbf{p} \cdot [x,y] \wedge\_p \mathbf{p} \cdot [x,y] \cdot \mathbf{w}) \cdot \alpha\_{x,y} .\end{split} \tag{3}$$

Denote

$$\mathbf{v}\_1 = \mathbf{w} \cdot [x, x, y] \wedge\_p [x, x, y] \cdot \mathbf{w}, \qquad \mathbf{v}\_2 = \mathbf{w} \cdot \mathbf{p} \cdot [x, y] \wedge\_p \mathbf{p} \cdot [x, y] \cdot \mathbf{w}.$$

From (2) and (3) we now have z · concat **v**<sub>1</sub> = concat **v**<sub>2</sub>. Since **v**<sub>1</sub> and **v**<sub>2</sub> are prefixes of some **w**<sup>n</sup>, we have a contradiction with Lemma 2.

**Dropping the Primitivity Assumption.** We first deal with the situation when x and y are not primitive. A natural idea is to consider the primitive roots of x and y instead of x and y. This means that we replace the word **w** with R**w**, where R is the morphism mapping [x] to [ρ x]<sup>e<sub>x</sub></sup> and [y] to [ρ y]<sup>e<sub>y</sub></sup>, where x = (ρ x)<sup>e<sub>x</sub></sup> and y = (ρ y)<sup>e<sub>y</sub></sup>. For example, if x = abab and y = aa, and **w** = [x, y, x] = [abab, aa, abab], then R**w** = [ab, ab, a, a, ab, ab].
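
The morphism R is straightforward to prototype; the following sketch (an illustration independent of the formalization) reproduces the example from the text:

```python
def primitive_root(w):
    """Shortest r with w = r^k (w nonempty)."""
    n = len(w)
    for d in range(1, n + 1):
        if n % d == 0 and w[:d] * (n // d) == w:
            return w[:d]

def R(w):
    """Replace each letter c of w (itself a word) by e_c copies of ρ c,
    where c = (ρ c)^(e_c)."""
    out = []
    for c in w:
        rho = primitive_root(c)
        out.extend([rho] * (len(c) // len(rho)))
    return out

w = ["abab", "aa", "abab"]
assert R(w) == ["ab", "ab", "a", "a", "ab", "ab"]
assert "".join(w) == "".join(R(w))       # concat w = concat(Rw)
```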

Let us check which hypotheses of Lemma 6 are satisfied in the new setting, that is, for the code {ρ x, ρ y} and the word R**w**. The following facts are not difficult to see.

– concat **w** = concat(R**w**);

– if [c, c], c ∈ {x, y}, is a cyclic factor of **w**, then [ρ c, ρ c] is a cyclic factor of R**w**.

The next required property:

– if **w** is primitive, then R**w** is primitive;

deserves more attention. It triggered another little theory in our formalization, which can be found in the locale sings\_code. Note that it fits well into our context, since the claim is that R is a primitivity preserving morphism, which implies that its image on the singletons [x] and [y] forms a primitivity preserving set of words; see theorem code.roots\_prim\_morph.

Consequently, the only missing hypothesis preventing the use of Lemma 6 is $|y| \leq |x|$, since it may happen that $|\rho\, x| < |\rho\, y|$. In order to overcome this difficulty, we shall ignore for a while the length difference between x and y, and obtain the following intermediate lemma.

**Lemma 8 (**bin imprim both squares*,* bin imprim both squares prim**).** *Let* B = {x, y} *be a code, and let* **w** ∈ lists B *be a primitive word such that* concat **w** *is imprimitive. Then* **w** *cannot contain both* [x, x] *and* [y, y] *as cyclic factors.*

*Proof.* Assume that **w** contains both [x, x] and [y, y] as cyclic factors.

Consider the word R**w** and the code {ρ x, ρ y}. Since R**w** contains both [ρ x, ρ x] and [ρ y, ρ y] as cyclic factors, Lemma 6 implies that R**w** is conjugate with either the word [ρ x, ρ x, ρ y] or the word [ρ y, ρ y, ρ x], which contradicts the assumed presence of both squares. □

**Concluding the Proof by Gluing.** It remains to deal with the existence of squares. We use an idea that is our main innovation with respect to the proof from [1]; it contributes significantly to the reduction of the length of the proof, and hopefully also to its increased clarity. Let **w** be a list over a set of words X. The idea is to choose one of the words, say u ∈ X, and to concatenate (or "glue") blocks of u's to the words following them. For example, if **w** = [u, v, u, u, z, u, z], then the resulting list is [uv, uuz, uz]. In the general case, this procedure is well defined on lists whose last "letter" is not the chosen one, and it leads to a new alphabet $\{u^i \cdot v \mid v \neq u\}$, which is a code if and only if X is. This idea is used in an elegant proof of the Graph lemma (see [8] and [2]). In the binary case, which is of interest here, if **w** in addition does not contain a square of a letter, say [x, x], then the new code {x · y, y} is again binary. Moreover, the resulting glued list **w**′ has the same concatenation, and it is primitive if (and only if) **w** is. Note that gluing is in this case closely related to the Nielsen transformation $y \mapsto x^{-1}y$ known from the theory of automorphisms of free groups.
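The gluing operation is easy to prototype outside Isabelle; the following Python sketch (our own illustration, with "letters" modeled as strings) reproduces the example above:

```python
def glue(u, w):
    """Glue maximal blocks of u's onto the word that follows them.

    Assumes the last letter of w is not u, so that every block of u's
    is followed by some other word.
    """
    assert w and w[-1] != u
    out, block = [], ""
    for letter in w:
        if letter == u:
            block += u           # accumulate the run of u's
        else:
            out.append(block + letter)
            block = ""
    return out

# The example from the text: [u, v, u, u, z, u, z] glues to [uv, uuz, uz].
u, v, z = "u", "v", "z"
glued = glue(u, [u, v, u, u, z, u, z])
assert glued == ["uv", "uuz", "uz"]
# Gluing preserves the concatenation of the list.
assert "".join(glued) == "".join([u, v, u, u, z, u, z])
```

Note how the result is strictly shorter than the input whenever the chosen letter actually occurs, which is what drives the induction in the proof below.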

Induction on |**w**| now easily leads to the proof of Theorem 4.

*Proof (of Theorem 4).* If **w** contains y at most once, then we are left with the equation $x^j \cdot y = z^\ell$, $\ell \geq 2$. The equality j = 2 follows from the Periodicity lemma, see Case 2 in the proof of Theorem 2.

Assume for contradiction that y occurs at least twice in **w**. Lemma 8 implies that at least one square, [x, x] or [y, y], is missing as a cyclic factor. Let {x′, y′} = {x, y} be such that [x′, x′] is not a cyclic factor of **w**. We can therefore perform the gluing operation and obtain a new, strictly shorter word **w**′ ∈ lists {x′ · y′, y′}. The longer element x′ · y′ occurs at least twice in **w**′, since the number of its occurrences in **w**′ is the same as the number of occurrences of x′ in **w**, the latter word containing both letters at least twice by assumption. Moreover, **w**′ is primitive, and concat **w**′ = concat **w** is imprimitive. Therefore, by induction on |**w**|, we have **w**′ ∼ [x′ · y′, x′ · y′, y′]. In order to show that this is not possible, we can reuse the lemma imprim\_ext\_suf\_comm mentioned in the proof of Lemma 1, this time for u = x′y′x′ and v = y′. The words u and v do not commute because x′ and y′ do not commute. Since uv is imprimitive, the word uvv ∼ concat **w**′ is primitive, a contradiction. □

This also completes the proof of our main target, Theorem 1.

### **7 Additional Notes on the Formalization**

The formalization is part of an evolving combinatorics on words formalization project. It relies on its backbone session, called CoW, a version of which is also available in the Archive of Formal Proofs [15]. This session covers basic concepts of combinatorics on words, including the Periodicity lemma. An overview is available in [8].

The evolution of the parent session CoW continued along with the presented results, and its latest stable version is available in our repository [16]. The main results are part of another Isabelle session, CoW Equations, which, as the name suggests, deals with word equations. We have greatly expanded its elementary theory Equations Basic.thy, which provides auxiliary lemmas and definitions related to word equations. Notably, it contains the definition of factor interpretation (Definition 2) and related facts.

Two dedicated theories were created: Binary Square Interpretation.thy and Binary Code Imprimitive.thy. The first contains lemmas and locales dealing with the {x, y}-interpretation of the square xx (for $|y| \leq |x|$), culminating in Theorem 3. The second contains Theorems 1 and 4.

Another outcome was an expansion of the formalized results related to the Lyndon-Schützenberger theorem. This result, along with many useful corollaries, was already part of the backbone session CoW, and it was newly supplemented with the parametric solution of the equation $x^j y^k = z^\ell$, specifically Theorem 2 and Lemma 1. This formalization is now part of CoW Equations, in the theory Lyndon Schutzenberger.thy.

Similarly, the formalization of the main results triggered a substantial expansion of existing support for the idea of gluing as mentioned in Sect. 6. Its reworked version is now in a separate theory called Glued Codes.thy (which is part of the session CoW Graph Lemma).

Let us give a few concrete highlights of the formalization. A very useful tool, which is part of the CoW session, is the reversed attribute. The attribute produces a symmetrical fact, where the symmetry is induced by the mapping **rev**, i.e., the mapping which reverses the order of elements in a list. For instance, the fact stating that if p is a prefix of v, then p is a prefix of v · w, is transformed by the reversed attribute into the fact saying that if s is a suffix of v, then s is a suffix of w · v. The attribute relies on ad hoc defined rules which induce the symmetry. In the example, the main reversal rule is

$$(\mathrm{rev}\ u \leq\_p \mathrm{rev}\ v) = (u \leq\_s v).$$

The attribute is used frequently in the present formalization. For instance, Fig. 1 shows the formalization of the proof of Cases 1 and 2 of Theorem 1. Namely, the proof of Case 2 is smoothly deduced from the lemma that deals with Case 1, avoiding writing down the same proof again up to symmetry. See [13] for more details on the symmetry and the attribute reversed.

To be able to use this attribute fully in the formalization of the main results, it needed to be extended to deal with elements of type ′a list list, as the constant factor\_interpretation is of a function type over this exact type.

**Fig. 1.** Highlights from the formalization in Isabelle/HOL.

The new theories of the session CoW Equations contain almost 50 uses of this attribute.

The second highlight of the formalization is the use of simple but useful proof methods. The first method, called primitivity\_inspection, is able to show primitivity or imprimitivity of a given word.

Another method named list\_inspection is used to deal with claims that consist of straightforward verification of some property for a set of words given by their length and alphabet. For instance, this method painlessly concludes the proof of lemma bin\_imprim\_both\_squares\_prim. The method divides the goal into eight easy subgoals corresponding to eight possible words. All goals are then discharged by simp\_all.
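The brute-force character of list\_inspection can be mimicked in Python by enumerating all candidate words of a fixed length over a fixed alphabet and checking the property on each (a toy analogue of ours, not the actual Isabelle method):

```python
from itertools import product

# Toy analogue: check a property for all eight binary lists of length 3.
alphabet = ["x", "y"]
words = [list(w) for w in product(alphabet, repeat=3)]
assert len(words) == 8  # the "eight possible words" become eight subgoals

def has_factor(w, f):
    """Does the list w contain f as a contiguous (non-cyclic) factor?"""
    return any(w[i:i + len(f)] == f for i in range(len(w) - len(f) + 1))

# Example property, checked on every candidate: a list of length 3 over a
# binary alphabet cannot contain both [x, x] and [y, y] as factors.
assert all(not (has_factor(w, ["x", "x"]) and has_factor(w, ["y", "y"]))
           for w in words)
```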

The last method we want to mention is mismatch. It is designed to prove that two words commute using the property of a binary code mentioned in Sect. 2 and explained in Sect. 8. Namely, if a product of words from {x, y} starting with x shares a prefix of length at least |xy| with another product of words from {x, y}, this time starting with y, then x and y commute. Examples of usage of the attribute reversed and all three methods are given in Fig. 1.

### **8 Appendix: Background Results in Combinatorics on Words**

A periodic root r of w need not be primitive, but it is always possible to consider the corresponding primitive root ρ r, which is also a periodic root of w. Note that any word has infinitely many periodic roots since we allow r to be longer than w. Nevertheless, a word can have more than one period even if we consider only periods shorter than |w|. Such a possibility is controlled by the Periodicity lemma, often called the Theorem of Fine and Wilf (see [6]):

**Lemma 9 (**per lemma comm**).** *If* w *has periods* u *and* v*, i.e.,* $w \leq\_p uw$ *and* $w \leq\_p vw$*, with* $|u| + |v| - \gcd(|u|, |v|) \leq |w|$*, then* uv = vu*.*

Usually, the weaker test $|u| + |v| \leq |w|$ is sufficient to show that u and v commute.
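The lemma is easy to experiment with; the following Python sketch (our own illustration, with words as strings) checks a commuting instance that meets the bound and a non-commuting instance below it:

```python
from math import gcd

def has_period(w, u):
    # w has periodic root u iff w is a prefix of u·w
    return (u + w).startswith(w)

# Below the Fine and Wilf bound, two periods may coexist without commuting:
w, u, v = "aba", "ab", "aba"
assert has_period(w, u) and has_period(w, v)
assert len(u) + len(v) - gcd(len(u), len(v)) > len(w)   # bound not met
assert u + v != v + u                                   # no commutation

# Once the bound is met, the two periods commute:
w2, u2, v2 = "abababab", "ab", "abab"
assert has_period(w2, u2) and has_period(w2, v2)
assert len(u2) + len(v2) - gcd(len(u2), len(v2)) <= len(w2)
assert u2 + v2 == v2 + u2
```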

Conjugation u ∼ v is characterized as follows:

**Lemma 10 (**conjugation**).** *If* uz = zv *for nonempty* u*, then there exist words* r *and* q *and an integer* k *such that* u = rq*,* v = qr *and* $z = (rq)^k r$*.*
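The witness triple (r, q, k) can be read off directly from u and z; here is a small Python sanity check (an illustration of ours, with a helper name of our own choosing):

```python
def conjugation_witness(u, z):
    """Given nonempty u and z with u·z = z·v for some v with |v| = |u|,
    return (r, q, k) with u = r·q, v = q·r and z = (r·q)^k · r."""
    k, rlen = divmod(len(z), len(u))
    r = z[:rlen]          # z is a prefix of a power of u, so r = u[:rlen]
    q = u[rlen:]
    return r, q, k

u, z = "ab", "aba"
v = (u + z)[len(z):]      # v is determined by u·z = z·v
assert u + z == z + v
r, q, k = conjugation_witness(u, z)
assert u == r + q and v == q + r and z == (r + q) * k + r
```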

We have said that w has a periodic root r if it is a prefix of $r^\omega$. If w is a factor, not necessarily a prefix, of $r^\omega$, then it has a periodic root which is a conjugate of r. In particular, if |u| = |v|, then u ∼ v is equivalent to each of u and v being a factor of a power of the other.

Commutation of two words is characterized as follows:

**Lemma 11 (**comm**).** xy = yx *if and only if* $x = t^k$ *and* $y = t^m$ *for some word* t *and some integers* k, m ≥ 0*.*

Since every nonempty word has a (unique) primitive root, the word t can be chosen primitive (k or m can be chosen 0 if x or y is empty).

We often use the following theorem, called "the theorem of Lyndon and Schützenberger":

**Theorem 5 (**Lyndon Schutzenberger**).** *If* $x^j y^k = z^\ell$ *with* j ≥ 2*,* k ≥ 2 *and* $\ell \geq 2$*, then the words* x*,* y *and* z *commute.*

A crucial property of a primitive word t is that it cannot be a nontrivial factor of its own square. For a general word u, the equality u · u = p · u · s with nonempty p and s implies that all three words p, s and u commute, that is, have a common primitive root t. This can be seen by writing $u = t^k$ and noticing that an occurrence of the factor u inside uu can arise only from a shift by several copies of t. This idea is often described as "synchronization".
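The resulting primitivity test, checking that t does not occur nontrivially inside tt, takes one line of Python (our own illustration of the property, not the Isabelle method primitivity\_inspection):

```python
def is_primitive(w):
    # w is primitive iff it is not a nontrivial factor of its own square,
    # i.e., iff w does not occur in (w + w) with the first and last
    # letters chopped off.
    return len(w) > 0 and w not in (w + w)[1:-1]

def primitive_root(w):
    # the shortest t with w = t^k; it exists and is unique for nonempty w
    for d in range(1, len(w) + 1):
        if len(w) % d == 0 and w[:d] * (len(w) // d) == w:
            return w[:d]

assert is_primitive("abaab")
assert not is_primitive("abab")
assert primitive_root("ababab") == "ab"
assert is_primitive(primitive_root("ababab"))
```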

Let x and y be two words that do not commute. The longest common prefix of xy and yx is denoted α. Let $c\_x$ and $c\_y$ be the letters following α in xy and yx, respectively. A crucial property of α is that it is a prefix of any sufficiently long word in {x, y}. Moreover, if $\mathbf{w} = [u\_1, u\_2, \dots, u\_n] \in \mathrm{lists}\,\{x, y\}$ is such that concat **w** is longer than α, then $\alpha \cdot [c\_x]$ is a prefix of concat **w** if $u\_1 = x$, and $\alpha \cdot [c\_y]$ is a prefix of concat **w** if $u\_1 = y$. That is why the length of α is sometimes called "the decoding delay" of the binary code {x, y}. Note that the property in particular implies that {x, y} is a code, that is, it does not satisfy any nontrivial relation. It is also behind our method mismatch. Finally, using this property, the proof of Lemma 7 is straightforward.
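These notions can be illustrated in a few lines of Python (words as strings; the concrete x and y are arbitrary choices of ours):

```python
def lcp(u, v):
    # longest common prefix of two words
    i = 0
    while i < min(len(u), len(v)) and u[i] == v[i]:
        i += 1
    return u[:i]

x, y = "ab", "aba"
assert x + y != y + x                 # x and y do not commute
alpha = lcp(x + y, y + x)
cx = (x + y)[len(alpha)]              # the letter after alpha in xy
cy = (y + x)[len(alpha)]              # the letter after alpha in yx

# Any sufficiently long product starting with x continues with alpha·cx,
# and one starting with y continues with alpha·cy:
prod_x = "".join([x, y, y, x])        # starts with x
prod_y = "".join([y, x, x, y])        # starts with y
assert prod_x.startswith(alpha + cx)
assert prod_y.startswith(alpha + cy)
assert cx != cy                       # the mismatch behind the method
```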

**Acknowledgments.** The authors acknowledge support by the Czech Science Foundation grant GAČR 20-20621S.

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Formula Simplification via Invariance Detection by Algebraically Indexed Types**

Takuya Matsuzaki(B) and Tomohiro Fujita

Tokyo University of Science, 1-3 Kagurazaka, Shinjuku-ku, Tokyo 162-8601, Japan matuzaki@rs.tus.ac.jp, 1418097@ed.tus.ac.jp

**Abstract.** We describe a system that detects an invariance in a logical formula expressing a math problem and simplifies it by eliminating variables utilizing the invariance. Pre-defined function and predicate symbols in the problem representation language are associated with algebraically indexed types, which signify their invariance property. A Hindley-Milner style type reconstruction algorithm is derived for detecting the invariance of a problem. In the experiment, the invariance-based formula simplification significantly enhanced the performance of a problem solver based on quantifier-elimination for real-closed fields, especially on the problems taken from the International Mathematical Olympiads.

#### **1 Introduction**

It is very common to find an argument marked by the phrase "without loss of generality" (w.l.o.g.) in human-written mathematical proofs. An argument of this kind is most often based on a symmetry or an invariance in the problem [9].

Suppose that we are going to prove, by an algebraic method, that the three median lines of a triangle meet at a point (Fig. 1). Six real variables are needed to represent three points on a plane. Since the concepts of 'median lines' and 'meeting at a point' are translation-invariant, we may fix one of the corners at the origin. Furthermore, because these concepts are also invariant under any invertible linear map, we may fix the other two points to, e.g., (1, 0) and (0, 1). Thus, all six variables are eliminated and the proof task becomes much easier.
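The reduced statement can then be verified directly: with the corners fixed at (0, 0), (1, 0) and (0, 1), the three medians indeed meet at the centroid. A small Python check of ours using exact rational arithmetic:

```python
from fractions import Fraction as F

# Corners fixed w.l.o.g. at the origin, (1, 0) and (0, 1).
A, B, C = (F(0), F(0)), (F(1), F(0)), (F(0), F(1))

def midpoint(P, Q):
    return ((P[0] + Q[0]) / 2, (P[1] + Q[1]) / 2)

def collinear(P, Q, R):
    # the cross product of Q-P and R-P vanishes iff P, Q, R are collinear
    return (Q[0]-P[0]) * (R[1]-P[1]) == (Q[1]-P[1]) * (R[0]-P[0])

G = (F(1, 3), F(1, 3))  # candidate common point: the centroid
# G lies on each median (corner -- midpoint of the opposite side):
assert collinear(A, midpoint(B, C), G)
assert collinear(B, midpoint(A, C), G)
assert collinear(C, midpoint(A, B), G)
```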

W.l.o.g. arguments may thus have a strong impact on the efficiency of inference. They have drawn attention in several research areas, including the relative strength of proof systems (e.g., [2,3,12,20]), propositional SAT (e.g., [1,6,8,17,19]), proof assistants [9], and algebraic methods for geometry problem solving [7,10].

Among others, Iwane and Anai [10] share exactly the same objective as us; both works aim at solving geometry problems stated in natural language, using an algebraic method as the backend. Logical formulas resulting from mechanical translation of problem text tend to be huge and very redundant, while the computational cost of algebraic methods is generally quite sensitive to the size of the input, measured by, e.g., the number of variables. Simplification of the input formula is hence a mandatory part of such a problem-solving system.

**Fig. 1.** Variable Elimination w.l.o.g. by Invariance

Iwane and Anai's method operates on first-order formulas of real-closed fields (RCFs), i.e., quantified boolean combinations of equalities and inequalities between polynomials. They proposed to detect the invariance of a problem by testing the invariance of the polynomials under translation, scaling, and rotation. While conceptually simple, this amounts to discovering the geometric properties of the problem solely from its algebraic representation. The detection of rotational invariance is especially problematic because, to test it on a system of polynomials, one needs to identify all the pairs (or triples) of variables that originate from the x and y (and z) coordinates of the same points. Thus, their algorithm for 2D rotational invariance already incurs a search among a large number of possibilities, and they left the detection of 3D rotational invariance untouched. Davenport [7] also suggests essentially the same method.

In this paper, we propose to detect the invariance in a higher-level language than that of RCF. We use the algebraically indexed types (AITs) proposed by Atkey et al. [4] as the representation language. In AIT, each symbol in a formula has a type with indices. An indexed type of a function indicates that its output undergoes the same transformation as the input, or a related one. The invariances of the functions are combined via type reconstruction, and an invariance of the whole problem is detected.

The contribution of the current paper is summarized as follows:


In the rest of the paper, we first introduce a math problem solver, on which the proposed method was implemented, and summarize the formalism of AIT. We then detail the type reconstruction procedure and the variable elimination rules. We finally present the experimental results and conclude the paper.

**Fig. 2.** Overview of Todai Robot Math Problem Solver

**Fig. 3.** Example of Manually Formalized Problem (IMO 2012, Problem 5)

### **2 Todai Robot Math Solver and Problem Library**

This work is a part of the development of the Todai Robot Math Problem Solver (henceforth ToroboMath) [13–16]. Figure 2 presents an overview of the system. ToroboMath is targeted at solving pre-university math problems. Our long-term goal is to develop a system that solves problems stated in natural language.

The natural language processing (NLP) module of the system accepts a problem text and derives its logical representation through syntactic analysis. Currently, it produces a correct logical form for around 50% of sentences [13], which is not high enough to cover a wide variety of problems. Although the motivation behind the current work is to cope with the huge formulas produced by the NLP module, we instead used a library of *manually* formalized problems for the evaluation of the formula simplification procedure.

The problem library has been developed along with the ToroboMath system. It contains approximately one thousand math problems collected from several sources including the International Mathematical Olympiads (IMOs). Figure 3 presents a problem that was taken from IMO 2012.

The problems in the library are manually encoded in a polymorphic higher-order language, which is the same language as the output of the NLP module. Table 1 lists some of its primitive types. The language includes a large set of predicate and function symbols that are tailored for formalizing pre-university math problems. Currently, 1387 symbols are defined using 2808 axioms. Figure 4 provides an example of the axioms that define the predicate maximum.


**Table 1.** Example of Primitive Types

The problem solving module of ToroboMath accepts a formalized problem and iteratively rewrites it using: (1) basic transformations such as ∀x.(x = α → φ(x)) ⇔ φ(α) and beta-reduction, (2) simplification of expressions, such as polynomial division and integration, by computer algebra systems (CASs), and (3) the axioms that define the predicate and function symbols.
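Transformation (1) can be sanity-checked by brute force over a small finite domain (an illustrative snippet of ours; the actual rewriter operates symbolically on formula trees):

```python
# Basic transformation (1): ∀x.(x = α → φ(x)) ⇔ φ(α), checked by
# exhaustive evaluation over a small finite domain containing α.
domain = range(-5, 6)
alpha = 3
phi = lambda x: x * x > 4          # an arbitrary predicate

lhs = all((x != alpha) or phi(x) for x in domain)   # ∀x.(x = α → φ(x))
rhs = phi(alpha)
assert lhs == rhs
```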

Once the rewritten formula is in the language of real-closed fields (RCFs) or Peano arithmetic, it is handed to a solver for the theory. For RCF formulas, we use an implementation of the quantifier-elimination (QE) procedure for RCF based on cylindrical algebraic decomposition. Finally, we solve the resulting quantifier-free formula with CASs and obtain the answer. The time complexity of RCF-QE is quite high; it is doubly exponential in the number of variables [5]. Hence, the simplification of the formula *before* RCF-QE is a crucial step.

#### **3 Algebraically Indexed Types**

This section summarizes the framework of AIT. We refrain from presenting it in full generality; instead, we describe its application to geometry ([4, §2]), with the restrictions we imposed on it when incorporating it into the type system of ToroboMath.

In AIT, some of the primitive types have associated *indices*. An index represents a transformation on the objects of that type. For instance, in $\mathsf{Vec}\langle B, t\rangle$, the index B stands for an invertible linear transformation and t stands for a translation. Index variables bound by universal quantifiers signify that a function of that type is invariant under any transformations indicated by the indices, e.g.,

$$\mathsf{midpoint} : \forall B \colon \mathsf{GL}\_2. \forall t \colon \mathsf{T}\_2. \mathsf{Vec}\langle B, t \rangle \to \mathsf{Vec}\langle B, t \rangle \to \mathsf{Vec}\langle B, t \rangle.$$

The type of midpoint certifies that, when two points P and Q undergo an arbitrary affine transformation, the midpoint of P and Q moves accordingly.
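This invariance can be checked on concrete data; in the following Python sketch of ours, the linear map, the translation, and the points are arbitrary choices:

```python
def midpoint(p, q):
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)

def affine(M, t, p):
    # x ↦ M·x + t for a 2x2 matrix M and a translation vector t
    return (M[0][0]*p[0] + M[0][1]*p[1] + t[0],
            M[1][0]*p[0] + M[1][1]*p[1] + t[1])

M = ((2.0, 1.0), (0.0, 3.0))   # some invertible linear map
t = (5.0, -1.0)                # some translation
P, Q = (1.0, 2.0), (3.0, 4.0)

# midpoint(T(P), T(Q)) = T(midpoint(P, Q)): the midpoint "moves accordingly"
assert midpoint(affine(M, t, P), affine(M, t, Q)) == affine(M, t, midpoint(P, Q))
```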

#### **3.1 Sort and Index Expression**

The *sort* of an index signifies the kind of transformations represented by the index. We assume the set Sort of index sorts includes $\mathsf{GL}\_k$ (k = 1, 2, 3) (general linear transformations), $\mathsf{O}\_k$ (k = 2, 3) (orthogonal transformations), and $\mathsf{T}\_k$ (k = 2, 3) (translations). In the type of midpoint, B is of sort $\mathsf{GL}\_2$ and t is of sort $\mathsf{T}\_2$.

An *index expression* is composed of index variables and index operators. In the current paper, we use the following operators: +, −, 0 are addition, negation, and unit of $\mathsf{T}\_k$ (k = 2, 3); ·, $^{-1}$, 1 are multiplication, inverse, and unit of $\mathsf{GL}\_k$ and $\mathsf{O}\_k$; det is the determinant; |·| is the absolute value. An *index context* Δ is a list of index variables paired with their sorts: $\Delta = i\_1{:}S\_1, i\_2{:}S\_2, \dots, i\_n{:}S\_n$. The well-sortedness of an index expression e of sort S, written $\Delta \vdash e : S$, is defined analogously to well-typedness in simple type theory.

#### **3.2 Type, Term, and Typing Judgement**

The set of primitive types, $\mathrm{PrimType} = \{\mathsf{Bool}, \mathsf{R}, \mathsf{2d.Vec}, \mathsf{3d.Vec}, \mathsf{2d.Shape}, \dots\}$, is the same as that in the language of ToroboMath. A function $\mathrm{tyArity} : \mathrm{PrimType} \to \mathrm{Sort}^\*$ specifies the number and sorts of indices appropriate for the primitive types: e.g., $\mathrm{tyArity}(\mathsf{2d.Vec}) = (\mathsf{GL}\_2, \mathsf{T}\_2)$.

A judgement $\Delta \vdash A\ \mathrm{type}$ means that type A is well-formed and well-indexed with respect to an index context Δ. Here are the derivation rules:

$$\frac{\mathsf{X} \in \mathrm{PrimType} \quad \mathrm{tyArity}(\mathsf{X}) = (S\_1, \dots, S\_m) \quad \{\Delta \vdash e\_j : S\_j\}\_{1 \le j \le m}}{\Delta \vdash \mathsf{X}\langle e\_1, \dots, e\_m \rangle \ \mathrm{type}} \ \textsc{TyPrim}$$

$$\frac{\Delta \vdash A \ \mathrm{type} \quad \Delta \vdash B \ \mathrm{type}}{\Delta \vdash A \to B \ \mathrm{type}} \ \textsc{TyArrow} \qquad \frac{\Delta, i{:}S \vdash A \ \mathrm{type}}{\Delta \vdash \forall i{:}S.\, A \ \mathrm{type}} \ \textsc{TyForall}$$

While Atkey et al.'s system is formulated in the style of System F, we allow the quantifiers only at the outermost (prenex) position. The restriction permits an efficient type reconstruction algorithm analogous to Hindley-Milner's, while being expressive enough to capture the invariance of the pre-defined functions in ToroboMath and the invariance in the majority of math problems.

The well-typedness of a term M, written $\Delta; \Gamma \vdash M : A$, is judged with respect to an index context Δ and a typing context $\Gamma = x\_1 : A\_1, \dots, x\_n : A\_n$. A typing context is a list of variables with their types. A special context $\Gamma\_{\mathrm{ops}}$ consists of the pre-defined symbols and their types, e.g., $+ : \forall s{:}\mathsf{GL}\_1.\ \mathsf{R}\langle s\rangle \to \mathsf{R}\langle s\rangle \to \mathsf{R}\langle s\rangle \in \Gamma\_{\mathrm{ops}}$. We assume $\Gamma\_{\mathrm{ops}}$ is always available in the typing derivation and suppress it in judgements. The typing rules are analogous to those for lambda calculus with rank-1 polymorphism, except for TyEq:

$$\frac{x : A \in \Gamma}{\Delta; \Gamma \vdash x : A} \ \textsc{Var} \qquad \frac{\Delta; \Gamma \vdash M : \forall i{:}S.\, A \quad \Delta \vdash e : S}{\Delta; \Gamma \vdash M : A\{i \mapsto e\}} \ \textsc{UnivInst} \qquad \frac{\Delta; \Gamma, x : A \vdash M : B}{\Delta; \Gamma \vdash \lambda x. M : A \to B} \ \textsc{Abs}$$

$$\frac{\Delta; \Gamma \vdash M : A \to B \quad \Delta; \Gamma \vdash N : A}{\Delta; \Gamma \vdash M \, N : B} \ \textsc{App} \qquad \frac{\Delta; \Gamma \vdash M : A \quad \Delta \vdash A \equiv B}{\Delta; \Gamma \vdash M : B} \ \textsc{TyEq}$$

In the Abs and App rules, the meta-variables A and B designate only types without quantifiers. In the UnivInst rule, $A\{i \mapsto e\}$ is the result of substituting e for i in A. The 'polymorphism' of the types with quantifiers hence takes place only when a pre-defined symbol (e.g., midpoint) enters a derivation via the Var rule and the bound index variable is then instantiated via the UnivInst rule.

The type equivalence judgement $\Delta \vdash A \equiv B$ in the TyEq rule equates two types involving *semantically* equivalent index expressions; thus, e.g., $s{:}\mathsf{GL}\_1 \vdash \mathsf{R}\langle s \cdot s^{-1}\rangle \equiv \mathsf{R}\langle 1\rangle$ and $O{:}\mathsf{O}\_2 \vdash \mathsf{R}\langle |\det O|\rangle \equiv \mathsf{R}\langle 1\rangle$.

#### **3.3 Index Erasure Semantics and Transformational Interpretation**

The abstraction theorem for AIT [4] enables us to know the invariance of a term by its type. The theorem relates two kinds of interpretations of types and terms: index erasure semantics and relational interpretations. We will restate the theorem with what we here call *transformational* interpretations (t-interpretations hereafter), instead of the relational interpretations. It suffices for the purpose of justifying our algorithm and makes it easier to grasp the idea of the theorem.

The index-erasure semantics of a primitive type $\mathsf{X}\langle e\_1, \dots, e\_n\rangle$ is determined only by X. We thus write $\lfloor \mathsf{X}\langle e\_1, \dots, e\_n\rangle \rfloor = \lfloor \mathsf{X} \rfloor$. The interpretation $\lfloor \mathsf{X} \rfloor$ is the set of mathematical objects intended for the type: e.g., $\lfloor \mathsf{2d.Vec}\langle B, t\rangle \rfloor = \lfloor \mathsf{2d.Vec} \rfloor = \mathbb{R}^2$ and $\lfloor \mathsf{R}\langle s\rangle \rfloor = \lfloor \mathsf{R} \rfloor = \mathbb{R}$. The index-erasure semantics of a non-primitive type is determined by the type structure: $\lfloor A \to B \rfloor = \lfloor A \rfloor \to \lfloor B \rfloor$ and $\lfloor \forall i{:}S.\, T \rfloor = \lfloor T \rfloor$.

The index-erasure semantics of a typing context $\Gamma = x\_1{:}T\_1, \dots, x\_n{:}T\_n$ is the direct product of the domains of the variables: $\lfloor \Gamma \rfloor = \lfloor T\_1 \rfloor \times \dots \times \lfloor T\_n \rfloor$. The erasure semantics of a term $\Delta; \Gamma \vdash M : A$ is a function of the values assigned to its free variables, $\lfloor M \rfloor : \lfloor \Gamma \rfloor \to \lfloor A \rfloor$, defined as usual (see, e.g., [18,21]).

The t-interpretation of a type T, denoted by $\llbracket T \rrbracket$, is a function from the assignments to the index variables to a transformation on $\lfloor T \rfloor$. To be precise, we first define the semantics of an index context $\Delta = i\_1{:}S\_1, \dots, i\_n{:}S\_n$ as the direct product of the interpretations of the sorts: $\llbracket \Delta \rrbracket = \llbracket S\_1 \rrbracket \times \dots \times \llbracket S\_n \rrbracket$, where $\llbracket S\_1 \rrbracket, \dots, \llbracket S\_n \rrbracket$ are the intended sets of transformations: e.g., $\llbracket \mathsf{GL}\_2 \rrbracket = \mathrm{GL}\_2$ and $\llbracket \mathsf{T}\_2 \rrbracket = \mathrm{T}\_2$. The interpretation of an index expression e of sort S is a function $\llbracket e \rrbracket : \llbracket \Delta \rrbracket \to \llbracket S \rrbracket$ determined by the structure of the expression; for $\rho \in \llbracket \Delta \rrbracket$,

$$\llbracket \mathbf{f}(e\_1, \dots, e\_n) \rrbracket(\rho) = \llbracket \mathbf{f} \rrbracket(\llbracket e\_1 \rrbracket(\rho), \dots, \llbracket e\_n \rrbracket(\rho)), \qquad \llbracket i\_k \rrbracket(\rho) = \rho(i\_k),$$

where, in the last equation, we regard $\rho \in \llbracket \Delta \rrbracket$ as a function from index variables to their values. The index operations det and |·| are interpreted as intended.

The t-interpretation of a primitive type $\mathsf{X}\langle e\_1, \dots, e\_n\rangle$ is then determined by X and the structures of the index expressions $e\_1, \dots, e\_n$. The t-interpretation of Vec and Shape is the affine transformation of vectors and geometric objects parametrized by $\rho \in \llbracket \Delta \rrbracket$; for index expressions $\beta{:}\mathsf{GL}\_2$ and $\tau{:}\mathsf{T}\_2$,

$$\begin{aligned} \llbracket \mathsf{Vec}\langle\beta,\tau\rangle \rrbracket(\rho) &: \mathbb{R}^2 \ni x \mapsto M\_{\llbracket\beta\rrbracket(\rho)}\, x + v\_{\llbracket\tau\rrbracket(\rho)} \in \mathbb{R}^2 \\ \llbracket \mathsf{Shape}\langle\beta,\tau\rangle \rrbracket(\rho) &: \mathcal{P}(\mathbb{R}^2) \ni S \mapsto \{M\_{\llbracket\beta\rrbracket(\rho)}\, x + v\_{\llbracket\tau\rrbracket(\rho)} \mid x \in S\} \in \mathcal{P}(\mathbb{R}^2), \end{aligned}$$

where $M\_{\llbracket\beta\rrbracket(\rho)}$ and $v\_{\llbracket\tau\rrbracket(\rho)}$ are the representation matrix and vector of $\llbracket\beta\rrbracket(\rho)$ and $\llbracket\tau\rrbracket(\rho)$, and $\mathcal{P}(\mathbb{R}^2)$ denotes the power set of $\mathbb{R}^2$. Similarly, for the real numbers,

$$\llbracket \mathsf{R}\langle \sigma \rangle \rrbracket(\rho) : \mathbb{R} \ni x \mapsto \llbracket \sigma \rrbracket(\rho)\, x \in \mathbb{R}.$$

That is, $\llbracket \mathsf{R}\langle\sigma\rangle \rrbracket(\rho)$ is a change of scale, with the scaling factor determined by the expression $\sigma{:}\mathsf{GL}\_1$ and the assignment ρ. For a primitive type X with no indices, its t-interpretation is the identity map on $\lfloor \mathsf{X} \rfloor$: i.e., $\llbracket \mathsf{X} \rrbracket(\rho) = \mathrm{id}\_{\lfloor \mathsf{X} \rfloor}$.

The t-interpretation of a function type A → B is a higher-order function that maps a (mathematical) function $f : \lfloor A \rfloor \to \lfloor B \rfloor$ to another function on the same domain and codomain such that $\llbracket A \to B \rrbracket(\rho)(f) = \llbracket B \rrbracket(\rho) \circ f \circ (\llbracket A \rrbracket(\rho))^{-1}$. It is easy to check that this interpretation is compatible with currying. Equivalently, we may say that if $g = \llbracket A \to B \rrbracket(\rho)(f)$, then f and g are in the commutative relation $g \circ \llbracket A \rrbracket(\rho) = \llbracket B \rrbracket(\rho) \circ f$. The typing derivation in AIT is a way to 'pull out' the effect of a transformation $\llbracket A \rrbracket(\rho)$ on a free variable deep inside a term by combining such commutative relations.
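The conjugation formula is easy to verify on a toy example; in the following Python sketch of ours, two arbitrary invertible scalings on the reals stand in for the interpretations of the input and output types:

```python
def compose(*fs):
    # right-to-left function composition: compose(f, g)(x) = f(g(x))
    def h(x):
        for f in reversed(fs):
            x = f(x)
        return x
    return h

TA = lambda x: 2.0 * x      # transformation on the input type
TA_inv = lambda x: x / 2.0  # its inverse
TB = lambda x: 4.0 * x      # transformation on the output type

f = lambda x: x * x         # some function between the erased types
g = compose(TB, f, TA_inv)  # g = TB ∘ f ∘ TA⁻¹, the conjugated function

# g satisfies the commutative relation g ∘ TA = TB ∘ f by construction:
for x in [0.0, 1.0, 3.0, -2.5]:
    assert g(TA(x)) == TB(f(x))
```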

The t-interpretation of a fully quantified type is the identity map on its erasure semantics: $\llbracket \forall i\_1{:}S\_1. \dots \forall i\_n{:}S\_n.\, T \rrbracket = \mathrm{id}\_{\lfloor T \rfloor}$. We do not define that of partially quantified types because we do not need it to state the abstraction theorem.

#### **3.4 Abstraction Theorem**

The abstraction theorem for AIT enables us to detect the invariance of (the erasure semantics of) a term under a certain set of transformations on its free variables. We first define the t-interpretation of the typing context $\Gamma = x\_1 : T\_1, \dots, x\_n : T\_n$ as a simultaneous transformation of $\eta = (v\_1, \dots, v\_n) \in \lfloor \Gamma \rfloor$:

$$\llbracket \Gamma \rrbracket(\rho) : \lfloor \Gamma \rfloor \ni \eta \mapsto \llbracket \Gamma \rrbracket(\rho) \circ \eta = (\llbracket T\_1 \rrbracket(\rho) \circ v\_1, \dots, \llbracket T\_n \rrbracket(\rho) \circ v\_n) \in \lfloor \Gamma \rfloor.$$

We now present a version of the abstraction theorem, restricted to the case of a term of quantifier-free type and restated with the t-interpretation:

**Theorem 1** *(Abstraction* [4]*, restated using transformational interpretation)***.** *If* A *is a quantifier-free type and* $\Delta; \Gamma \vdash M : A$*, then for all* $\rho \in \llbracket \Delta \rrbracket$ *and all* $\eta \in \lfloor \Gamma \rfloor$*, we have* $\llbracket A \rrbracket(\rho) \circ \lfloor M \rfloor(\eta) = \lfloor M \rfloor(\llbracket \Gamma \rrbracket(\rho) \circ \eta)$*.*

Here we provide two easy corollaries of the theorem. The first one is utilized to eliminate variables from a formula while preserving the equivalence.

**Corollary 1.** *If* $\Delta;\, x\_1 : T\_1, \dots, x\_n : T\_n \vdash \varphi(x\_1, \dots, x\_n) : \mathsf{Bool}$*, then for all* $\rho \in \llbracket \Delta \rrbracket$*, we have* $\varphi(x\_1, \dots, x\_n) \Leftrightarrow \varphi(\llbracket T\_1 \rrbracket(\rho) \circ x\_1, \dots, \llbracket T\_n \rrbracket(\rho) \circ x\_n)$*.*

This follows from the abstraction theorem and the fact that $\llbracket \mathsf{Bool} \rrbracket(\rho) = \mathrm{id}\_{\lfloor \mathsf{Bool} \rfloor}$ for any ρ. It indicates that, without loss of generality, we may 'fix' some of the variables to, e.g., zeros by appropriately choosing ρ.

The second corollary is for providing more intuition about the theorem.

**Corollary 2.** *If* $\vdash \lambda x\_1. \dots \lambda x\_n.\, f(x\_1, \dots, x\_n) : \forall\Delta.\ T\_1 \to \dots \to T\_n \to T\_0$*, then for all* $\rho \in \llbracket \Delta \rrbracket$ *and all* $v\_i \in \lfloor T\_i \rfloor$ (i = 1, \dots, n)*,*

$$\llbracket \mathsf{T}\_0 \rrbracket(\rho) \circ \lfloor f \rfloor(v\_1, \dots, v\_n) = \lfloor f \rfloor(\llbracket \mathsf{T}\_1 \rrbracket(\rho) \circ v\_1, \dots, \llbracket \mathsf{T}\_n \rrbracket(\rho) \circ v\_n).$$

In the statement, $\forall\Delta$ signifies universal quantification over all the index variables in $\Delta$. By this corollary, for instance, we can tell from the type of midpoint that, for all $x\_1, x\_2 \in \mathbb{R}^2$, all $g \in \mathsf{GL}\_2$, and all $t \in \mathsf{T}\_2$,

$$\lfloor \mathtt{midpoint} \rfloor \left( M\_g x\_1 + v\_t,\, M\_g x\_2 + v\_t \right) = M\_g \lfloor \mathtt{midpoint} \rfloor \left( x\_1, x\_2 \right) + v\_t.$$
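This instance can be checked numerically. The following sketch is ours, not part of the ToroboMath system; the matrix, translation, and points are arbitrary test values.

```python
# Check: midpoint(g*x1 + t, g*x2 + t) == g*midpoint(x1, x2) + t
# for a sample invertible matrix g and translation t.

def mat_vec(m, v):
    # apply a 2x2 matrix (row-major pair of pairs) to a 2-vector
    return (m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1])

def add(u, v):
    return (u[0] + v[0], u[1] + v[1])

def midpoint(p1, p2):
    return ((p1[0] + p2[0]) / 2, (p1[1] + p2[1]) / 2)

g = ((2.0, 1.0), (1.0, 3.0))   # an invertible matrix (in GL_2)
t = (5.0, -2.0)                # a translation (in T_2)
x1, x2 = (1.0, 4.0), (-3.0, 2.0)

lhs = midpoint(add(mat_vec(g, x1), t), add(mat_vec(g, x2), t))
rhs = add(mat_vec(g, midpoint(x1, x2)), t)
assert lhs == rhs
```

The same check passes for any invertible `g` and any `t`, which is exactly what the type of midpoint guarantees.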

#### **3.5 Restriction on the Index Expressions of Sort $\mathsf{GL}\_k/\mathsf{O}\_k$ ($k \ge 2$)**

We found that type reconstruction in AIT is far more straightforward when we assume that an index expression of sort $\mathsf{GL}\_k$ or $\mathsf{O}\_k$ ($k \ge 2$) includes at most one index variable of sort $\mathsf{GL}\_k$ or $\mathsf{O}\_k$ that is not inside the determinant operator. Under this assumption, any expression $e$ of sort $\mathsf{GL}\_k$ or $\mathsf{O}\_k$ can be written in the form

$$e = \prod\_{i \in I} s\_i^{w\_i} \cdot \prod\_{i \in I} |s\_i|^{x\_i} \cdot \prod\_{j \in J} \det(B\_j)^{y\_j} \cdot \prod\_{j \in J} |\det(B\_j)|^{z\_j} \cdot B\_0^\delta,$$

where $\{s\_i\}\_{i \in I}$ are of sort $\mathsf{GL}\_1$, $\{B\_0\} \cup \{B\_j\}\_{j \in J}$ are of sort $\mathsf{GL}\_k$ or $\mathsf{O}\_k$, $w\_i, x\_i, y\_j, z\_j \in \mathbb{Z}$, and $\delta \in \{0, 1\}$. We henceforth say that an expression $e$ in the above form satisfies *the head variable property* and call $B\_0$ the *head variable* of $e$.

Empirically, this restriction is not too limiting; as far as we are aware, the invariance of all the pre-defined functions and predicates in ToroboMath is expressible with an indexed type satisfying it.

#### **4 Invariance Detection Through Type Reconstruction**

We need type reconstruction in AIT for two purposes: to infer the invariance of the pre-defined symbols in ToroboMath, and to infer the invariance in a math problem. To this end, we only have to derive a judgement $\Delta; \Gamma \vdash \phi : \mathsf{Bool}$, where $\phi$ is either a defining axiom of a symbol or the formula of a problem. For a pre-defined symbol $s$, from a judgement $\Delta;\ s : \mathsf{T}, \dots \vdash \phi : \mathsf{Bool}$, we know that $s$ is of type $\mathsf{T}$ and has the invariance signified by $\mathsf{T}$. For a problem $\phi$, from the judgement $\Delta;\ x\_1 : \mathsf{T}\_1, \dots, x\_n : \mathsf{T}\_n \vdash \phi : \mathsf{Bool}$, we know the invariance of $\phi$ under the transformations of the free variables $x\_1, \dots, x\_n$ according to $\llbracket\mathsf{T}\_1\rrbracket, \dots, \llbracket\mathsf{T}\_n\rrbracket$.

Since all types are in prenex form, we can find the typing derivation by a procedure analogous to the Hindley–Milner (H–M) algorithm. It consists of two steps: deriving equations among index expressions, and solving them. The procedure for solving the equations in $\mathsf{T}\_2/\mathsf{T}\_3$ is essentially the same as in the type inference for Kennedy's unit-of-measure types [11], which is a precursor of AIT. Further development is required to solve the equations in $\mathsf{GL}\_2/\mathsf{GL}\_3$, even under the restriction on the form of index expressions mentioned in Sect. 3.5, due to the presence of the index operations $|\cdot|$ and $\det$.

#### **4.1 Equation Derivation**

We first assign a type variable $\alpha\_i$ to each subterm $t\_i$ of $\phi$. Then, for a subterm $t\_i$ of the form $t\_j\, t\_k$ (i.e., the application of $t\_j$ to $t\_k$), we have the equation $\alpha\_j = \alpha\_k \to \alpha\_i$. The case of a subterm $t\_i$ of the form $\lambda x.\, t\_j$ is also analogous to H–M, and we omit it here. For a leaf term (i.e., a variable) $t\_i$, if it is one of the pre-defined symbols and $t\_i : \forall i\_1{:}S\_1. \dots \forall i\_n{:}S\_n.\ \mathsf{T} \in \Gamma\_{\mathrm{ops}}$, we set $\alpha\_i = \mathsf{T}\{i\_1 \mapsto \beta\_1, \dots, i\_n \mapsto \beta\_n\}$, where $\{i\_1 \mapsto \beta\_1, \dots, i\_n \mapsto \beta\_n\}$ stands for the substitution of fresh variables $\beta\_1, \dots, \beta\_n$ for $i\_1, \dots, i\_n$. By solving the equations for the type and index variables $\{\alpha\_i\}$ and $\{\beta\_j\}$, we reconstruct the most general indexed types of all the subterms. For example, consider the following axiom defining perpendicular:

$$\forall v\_1.\ \forall v\_2.\ (\mathtt{perpendicular}(v\_1, v\_2) \longleftrightarrow \mathtt{inner\text{-}prod}(v\_1, v\_2) = 0),$$

and suppose that inner-prod is in $\Gamma\_{\mathrm{ops}}$. We will reconstruct the type of perpendicular. The type of inner-prod is

$$\mathtt{inner\text{-}prod} : \forall s\_1, s\_2 \colon \mathsf{GL}\_1.\ \forall O \colon \mathsf{O}\_2.\ \mathsf{Vec}\langle s\_1 O, 0 \rangle \to \mathsf{Vec}\langle s\_2 O, 0 \rangle \to \mathsf{R}\langle s\_1 \cdot s\_2 \rangle,$$

and it is instantiated as $\mathtt{inner\text{-}prod} : \mathsf{Vec}\langle s\_1 O, 0\rangle \to \mathsf{Vec}\langle s\_2 O, 0\rangle \to \mathsf{R}\langle s\_1 \cdot s\_2\rangle$, where $s\_1$, $s\_2$, and $O$ are fresh variables. Since the type of perpendicular in the non-AIT version of our language is $\mathsf{Vec} \to \mathsf{Vec} \to \mathsf{Bool}$, we assign fresh variables to all the indices in the primitive types and have:

$$\mathtt{perpendicular} : \mathsf{Vec}\langle \beta\_1, \tau\_1 \rangle \to \mathsf{Vec}\langle \beta\_2, \tau\_2 \rangle \to \mathsf{Bool}.$$

Since perpendicular is applied to $v\_1$ and $v\_2$, the types of $v\_1$ and $v\_2$ are equated with $\mathsf{Vec}\langle\beta\_1, \tau\_1\rangle$ and $\mathsf{Vec}\langle\beta\_2, \tau\_2\rangle$, respectively. Additionally, since inner-prod is also applied to $v\_1$ and $v\_2$, we have the following equations:

$$\mathsf{Vec}\langle s\_1O, 0\rangle = \mathsf{Vec}\langle \beta\_1, \tau\_1\rangle, \quad \mathsf{Vec}\langle s\_2O, 0\rangle = \mathsf{Vec}\langle \beta\_2, \tau\_2\rangle \tag{4.1}$$

If we have an equation between instances of the same primitive type, unifying the two sides yields one or more equations between index expressions: from $\mathsf{X}\langle e\_1, \dots, e\_m\rangle = \mathsf{X}\langle e'\_1, \dots, e'\_m\rangle$ we obtain $e\_1 = e'\_1, \dots, e\_m = e'\_m$. For Eq. (4.1), we hence have $s\_1 O = \beta\_1$, $s\_2 O = \beta\_2$, $0 = \tau\_1$, and $0 = \tau\_2$. Thus, by recursively unifying all the equated types, we are left with a system of equations between index expressions.
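The derivation and decomposition steps can be sketched as follows. The term and type representations below are our own simplifications (a primitive type is a pair of a name and a list of index-expression strings), not the actual implementation:

```python
# Sketch of the equation-derivation pass: each subterm i gets a type
# variable 'a<i>'; application nodes emit alpha_j = alpha_k -> alpha_i.

from itertools import count

fresh = count()

def derive(term, env, eqs):
    """Return the type variable of `term`, appending equations to `eqs`."""
    alpha = f"a{next(fresh)}"
    if isinstance(term, str):                  # leaf: instantiated type from env
        eqs.append((alpha, env[term]))
    else:                                      # application (f, arg)
        af = derive(term[0], env, eqs)
        aa = derive(term[1], env, eqs)
        eqs.append((af, ("->", aa, alpha)))
    return alpha

def decompose(ty1, ty2):
    """X<e1..em> = X<e1'..em'>  ~>  e1 = e1', ..., em = em'."""
    assert ty1[0] == ty2[0] and len(ty1[1]) == len(ty2[1]), "clash"
    return list(zip(ty1[1], ty2[1]))

env = {"v1": ("Vec", ["b1", "t1"]), "v2": ("Vec", ["b2", "t2"]),
       "inner-prod": ("->", ("Vec", ["s1*O", "0"]),
                      ("->", ("Vec", ["s2*O", "0"]), ("R", ["s1*s2"])))}
eqs = []
derive((("inner-prod", "v1"), "v2"), env, eqs)
# Unifying e.g. Vec<s1*O,0> with Vec<b1,t1> yields the index equations:
print(decompose(("Vec", ["s1*O", "0"]), ("Vec", ["b1", "t1"])))
```

Running the last line prints the pairs `s1*O = b1` and `0 = t1`, matching the equations derived from Eq. (4.1).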

#### **4.2 Equation Solving**

To solve the derived equations between index expressions, we must depart from the analogy with the H–M algorithm: instead of applying syntactic unification, we need semantic unification, i.e., we solve the equations as simultaneous equations in the transformation groups.

We first order the equations by the sort of the equated expressions. We then process them in the order $\mathsf{T}\_2/\mathsf{T}\_3 \to \mathsf{GL}\_2/\mathsf{GL}\_3 \to \mathsf{GL}\_1$, as follows.<sup>1</sup>

First, since equations of sort $\mathsf{T}\_2/\mathsf{T}\_3$ are always of the form $\sum\_i a\_i t\_i = 0$ ($a\_i \in \mathbb{Z}$), where $\{t\_i\}$ are variables of sort $\mathsf{T}\_k$ ($k \in \{2, 3\}$), we can solve them as a linear homogeneous system. Although the solution may involve rational coefficients, as in $t\_i = \sum\_j \frac{n\_{ij}}{m\_{ij}} t\_j$ ($n\_{ij}, m\_{ij} \in \mathbb{Z}$), we can clear the denominators by introducing new variables $t'\_j$ such that $t\_j = \mathrm{lcm}\{m\_{ij}\}\_i \cdot t'\_j$.
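The denominator-clearing step can be illustrated on a toy equation; the concrete coefficients below are our own example, not taken from the system:

```python
# Solve a T_k equation 2*t1 + 3*t2 - 4*t3 = 0 for t1, then clear the
# denominators of the resulting rational coefficients.

from fractions import Fraction
from math import lcm

# 2*t1 + 3*t2 - 4*t3 = 0  =>  t1 = (-3/2)*t2 + 2*t3
coeffs = {"t2": Fraction(-3, 2), "t3": Fraction(2)}

# Substitute t_j = L * t_j' with L = lcm of the denominators.
L = lcm(*(c.denominator for c in coeffs.values()))
integral = {var: c * L for var, c in coeffs.items()}
assert all(c.denominator == 1 for c in integral.values())
print(f"t1 = {integral['t2']}*t2' + {integral['t3']}*t3'  with t_j = {L}*t_j'")
```

Here `L = 2`, so the solution becomes `t1 = -3*t2' + 4*t3'` with integer coefficients, as in the text.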

Next, by the head variable property, equations of sort $\mathsf{GL}\_2/\mathsf{GL}\_3$ (henceforth $\mathsf{GL}\_{\ge 2}$) are always of the form $\sigma\_1 B\_1 = \sigma\_2 B\_2$, where $\sigma\_1$ and $\sigma\_2$ are index expressions of sort $\mathsf{GL}\_1$, and $B\_1$ and $B\_2$ are head variables of sort $\mathsf{GL}\_{\ge 2}$. We decompose these equations according to Table 2, which summarizes the following argument. Let $E$ denote the identity transformation. Since $\sigma\_1 B\_1 = \sigma\_2 B\_2 \iff \sigma\_1^{-1}\sigma\_2 E = B\_1 B\_2^{-1}$, there must be some $s \in \mathsf{GL}\_1$ such that $B\_1 B\_2^{-1} = sE$ and $\sigma\_1^{-1}\sigma\_2 = s$. Furthermore, by the superset–subset relation between the sorts of $B\_1$ and $B\_2$ (e.g., $\mathsf{O}\_2 \subset \mathsf{GL}\_2$ for $B\_1 : \mathsf{O}\_2$ and $B\_2 : \mathsf{GL}\_2$), we can express the variable of the broader sort in terms of the other, with a parameter.

<sup>1</sup> In this subsection, $\mathsf{GL}\_2$, $\mathsf{GL}\_3$, $\mathsf{O}\_2$, and $\mathsf{O}\_3$ are collectively denoted as $\mathsf{GL}\_2/\mathsf{GL}\_3$ or $\mathsf{GL}\_{\ge 2}$.

The algorithm for $\mathsf{GL}\_{\ge 2}$ equations works as follows. First, we initialize the solution set with the empty substitution: $S \leftarrow \{\}$. For each $\mathsf{GL}\_{\ge 2}$ equation $\sigma\_1 B\_1 = \sigma\_2 B\_2$, we look up Table 2 to find the $\mathsf{GL}\_{\ge 2}$ solution $B\_i \mapsto s B\_j$ and one or more new $\mathsf{GL}\_1$ equations. We add the new equations to the current set of $\mathsf{GL}\_1$ equations and apply the solution $B\_i \mapsto s B\_j$ to all the remaining $\mathsf{GL}\_1$ and $\mathsf{GL}\_{\ge 2}$ equations. We also compose the solution with the current solution set: $S \leftarrow S \circ \{B\_i \mapsto s B\_j\}$.

By processing all $\mathsf{GL}\_{\ge 2}$ equations as above, we are left with a partial solution $S$ and a system of $\mathsf{GL}\_1$ equations, each of which is of the following form:

$$\prod\_{i \in I} s\_i^{w\_i} \cdot \prod\_{i \in I} |s\_i|^{x\_i} \cdot \prod\_{j \in J} \det(B\_j)^{y\_j} \cdot \prod\_{j \in J} |\det(B\_j)|^{z\_j} = 1 \quad (w\_i, x\_i, y\_j, z\_j \in \mathbb{Z}),$$

where we assume that $\{s\_i\}\_{i \in I}$ are all the $\mathsf{GL}\_1$ variables, $\{B\_j\}\_{j \in J}$ are all the remaining $\mathsf{GL}\_{\ge 2}$ variables, and $I \cap J = \emptyset$. Letting $u\_i = s\_i \cdot |s\_i|^{-1}$, $v\_i = |s\_i|$, $u\_j = \det(B\_j) \cdot |\det(B\_j)|^{-1}$, and $v\_j = |\det(B\_j)|$, we have $s\_i = u\_i v\_i$ and $\det(B\_j) = u\_j v\_j$ for all $i \in I$ and $j \in J$. Substituting these, we have

$$\prod\_i u\_i^{w\_i} \cdot \prod\_i v\_i^{w\_i + x\_i} \cdot \prod\_j u\_j^{y\_j} \cdot \prod\_j v\_j^{y\_j + z\_j} = 1.$$

Since $u\_i, u\_j \in \{+1, -1\}$ and $v\_i, v\_j > 0$ for all $i$ and $j$, the above equation is equivalent to the following two equations:

$$\prod\_i u\_i^{w\_i} \cdot \prod\_j u\_j^{y\_j} = 1, \quad \prod\_i v\_i^{w\_i + x\_i} \cdot \prod\_j v\_j^{y\_j + z\_j} = 1.$$

We thus have two systems of equations, one in $\{+1, -1\}$ and the other in $\mathbb{R}\_{>0}$. Now we temporarily rewrite the solution in terms of $u\_i$ and $v\_i$: $S \leftarrow S \circ \{s\_i \mapsto u\_i v\_i\}\_{i \in I}$.
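The soundness of this splitting is easy to check on samples: a product of a sign part and a positive magnitude part equals 1 exactly when both parts do. A small sketch with toy exponents of our own choosing:

```python
# For u in {+1,-1} and v > 0, prod(u^w) * prod(v^e) = 1 iff both
# the sign product and the magnitude product are 1.

from itertools import product

def split_holds(us, vs, wexp, vexp):
    sign = 1
    for u, w in zip(us, wexp):
        sign *= u ** w
    mag = 1.0
    for v, e in zip(vs, vexp):
        mag *= v ** e
    return (sign * mag == 1) == (sign == 1 and mag == 1)

vs_samples = [(0.5, 2.0), (1.0, 1.0), (2.0, 2.0)]
for us in product([1, -1], repeat=2):
    for vs in vs_samples:
        assert split_holds(us, vs, wexp=(1, 2), vexp=(1, -1))
print("sign/magnitude decomposition verified on samples")
```

This is why solving the two systems separately loses no solutions.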

First consider the system in $\mathbb{R}\_{>0}$. As long as there remains an equation involving a variable $v\_i$, which originates from a $\mathsf{GL}\_1$ variable, we solve it for $v\_i$ and compose the solution $v\_i \mapsto \prod\_{i' \ne i} v\_{i'}^{p\_{i'}} \cdot \prod\_j v\_j^{q\_j}$ with $S$ while applying it to the remaining equations. The denominators of fractional exponents (i.e., $p\_{i'}, q\_j \in \mathbb{Q} \setminus \mathbb{Z}$) can be cleared similarly to the case of $\mathsf{T}\_k$ equations. If all the equations in $\mathbb{R}\_{>0}$ are solved this way, then $S$ is the most general solution. Otherwise, there remain one or more equations of the form $\prod\_{j \in J'} |\det B\_j|^{d\_j} = 1$ for some $J' \subset J$ and $\{d\_j\}\_{j \in J'}$. This is the only case where we may miss some invariance of a formula; in general, we cannot express the most general solution to such an equation using only the index variables of sort $\mathsf{GL}\_k$ and $\mathsf{O}\_k$. We make a compromise here and settle for the less general solution $S \circ \{B\_j \mapsto E\}\_{j \in J'}$. Fortunately, this


**Table 2.** Decomposition of the $\mathsf{GL}\_2/\mathsf{GL}\_3$ equation $\sigma\_i B\_i = \sigma\_j B\_j$ ($s$: a fresh variable)

does not frequently happen in practice. We made this compromise on only three out of the 533 problems used in the experiment. We expect that having more sorts, e.g., $\mathsf{SL}^{\pm}\_k = \{M \in \mathsf{GL}\_k \mid |\det M| = 1\}$, in the language of index expressions might help here, but we leave this as future work.

The system in $\{+1, -1\}$ is processed analogously to that in $\mathbb{R}\_{>0}$. Finally, by restoring $\{u\_i, v\_i\}\_{i \in I}$ and $\{u\_j, v\_j\}\_{j \in J}$ in the solution $S$ to their original forms (e.g., $u\_i \mapsto s\_i \cdot |s\_i|^{-1}$), we obtain a solution to the initial set of equations in terms of the variables of sort $\mathsf{GL}\_k$ and $\mathsf{O}\_k$.

#### **4.3 Type Reconstruction for Pre-defined Symbols with Axioms**

We incrementally determined the indexed types of the pre-defined symbols according to the hierarchy of their definitions. We first constructed a directed acyclic graph wherein the nodes are the pre-defined symbols and the edges represent the dependencies between their definitions. We manually assigned an indexed type to the symbols without defining axioms (e.g., $+ : \mathsf{R} \to \mathsf{R} \to \mathsf{R}$) and initialized $\Gamma\_{\mathrm{ops}}$ with them. We then reconstructed the indexed types of the other symbols in a topological order of the graph. After reconstructing the type of each symbol, we added the symbol with its inferred type to $\Gamma\_{\mathrm{ops}}$.
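This incremental processing can be sketched with the standard-library topological sorter; the dependency graph below is hypothetical, chosen only for illustration:

```python
# Process pre-defined symbols in a topological order of their
# definition-dependency DAG, growing the typing context as we go.

from graphlib import TopologicalSorter

# symbol -> symbols its defining axiom depends on (hypothetical)
deps = {
    "+": [],
    "*": [],
    "midpoint": ["+", "*"],
    "segment": ["+", "*"],
    "centroid": ["midpoint"],
}

gamma_ops = {}                      # plays the role of Gamma_ops
for sym in TopologicalSorter(deps).static_order():
    if not deps[sym]:
        gamma_ops[sym] = "manually assigned type"
    else:
        # every dependency is already typed by the topological order
        assert all(d in gamma_ops for d in deps[sym])
        gamma_ops[sym] = f"reconstructed using {sorted(deps[sym])}"
print(list(gamma_ops))
```

`TopologicalSorter` guarantees that each symbol is visited only after all the symbols its axiom mentions, which is exactly the invariant the reconstruction needs.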

For some symbols, type reconstruction does not go as well as we would hope. For example, the following axiom defines the symbol midpoint:

$$\forall p\_1, p\_2. (\mathbf{midpoint}(p\_1, p\_2) = \frac{1}{2} \cdot (p\_1 + p\_2)).$$

At the beginning of the type reconstruction of midpoint, the types of the symbols in the axiom are instantiated as follows:

$$\begin{aligned} \mathtt{midpoint} &: \mathsf{Vec}\langle \beta\_1, \tau\_1 \rangle \to \mathsf{Vec}\langle \beta\_2, \tau\_2 \rangle \to \mathsf{Vec}\langle \beta\_3, \tau\_3 \rangle \\ \cdot &: \mathsf{R}\langle s\_1 \rangle \to \mathsf{Vec}\langle B\_1, 0 \rangle \to \mathsf{Vec}\langle s\_1 B\_1, 0 \rangle \\ + &: \mathsf{Vec}\langle B\_2, t\_1 \rangle \to \mathsf{Vec}\langle B\_2, t\_2 \rangle \to \mathsf{Vec}\langle B\_2, t\_1 + t\_2 \rangle. \end{aligned}$$

The derived equations between the index expressions are as follows:

$$\{B\_2 = \beta\_1, B\_2 = \beta\_2, B\_1 = B\_2, \beta\_3 = s\_1 B\_1, s\_1 = 1, t\_1 = \tau\_1, t\_2 = \tau\_2, 0 = t\_1 + t\_2, \tau\_3 = 0\}.$$

By solving these equations, we obtain the indexed-type of midpoint as follows:

$$\mathtt{midpoint} : \forall B\_1 \colon \mathsf{GL}\_2.\ \forall t\_1 \colon \mathsf{T}\_2.\ \mathsf{Vec}\langle B\_1, t\_1 \rangle \to \mathsf{Vec}\langle B\_1, -t\_1 \rangle \to \mathsf{Vec}\langle B\_1, 0 \rangle.$$

This type indicates that the midpoint of any two points $P$ and $Q$ remains the same when we move $P$ and $Q$ to $P + t\_1$ and $Q - t\_1$, respectively, for any $t\_1 \in \mathbb{R}^2$. While it is *not wrong*, the following type is more useful for our purpose:

$$\mathsf{midpoint} : \forall B \colon \mathsf{GL}\_2. \,\forall t \colon \mathsf{T}\_2. \,\mathsf{Vec}\langle B, t\rangle \to \mathsf{Vec}\langle B, t\rangle \to \mathsf{Vec}\langle B, t\rangle. \tag{1}$$

To such symbols, we manually assigned a more appropriate type.<sup>2</sup>

In the current system, 945 symbols have a type that includes indices. We manually assigned types to the 255 symbols that have no defining axioms. For 203 symbols, we manually overrode the inferred type, as in the case of midpoint. The types of the remaining 487 symbols were derived through type reconstruction.

#### **5 Variable Elimination Based on Invariance**

In this section, we first give an example of the variable elimination procedure based on invariance. We then describe the top-level algorithm of the variable elimination, which takes a formula as input and eliminates some of its quantified variables by exploiting the invariance indicated by an index variable. We finally list the elimination rule for each sort of index variable.

#### **5.1 Example of Variable Elimination Based on Invariance**

Let us consider again the proof of the existence of the centroid of a triangle. For triangle ABC, the configuration of the midpoints P, Q, R of the three sides and the centroid G is described by the following formula:

$$\psi(A,B,C,P,Q,R,G) := \begin{pmatrix} P \texttt{ = } \texttt{midpoint}(B,C) \land \texttt{on}(G, \texttt{segment}(A,P)) \land \\ Q = \texttt{midpoint}(C,A) \land \texttt{on}(G, \texttt{segment}(B,Q)) \land \\ R = \texttt{midpoint}(A,B) \land \texttt{on}(G, \texttt{segment}(C,R)) \end{pmatrix}$$

where on(X, Y ) stands for the inclusion of point X in a geometric object Y , and segment(X, Y ) stands for the line segment between points X and Y . Let φ denote the existence of the centroid (and the three midpoints):

$$
\phi(A, B, C) := \exists G. \; \exists P. \; \exists Q. \; \exists R. \; \psi(A, B, C, P, Q, R, G).
$$

Our goal is to prove ∀A. ∀B. ∀C. φ(A, B, C).

<sup>2</sup> The awkwardness of the type inferred for midpoint is the price we pay for the efficiency of type reconstruction; it is due to the fact that we ignore the linear space structure of $\mathsf{T}\_2$ (and also that we do not posit $\mathsf{T}\_1$ as the second index of type $\mathsf{R}$). Otherwise, the type reconstruction would come closer to a search for an invariance on the algebraic representation of the problems and the defining axioms. Hence $1/2 \cdot (t + t) = t$ is not deduced for $t : \mathsf{T}\_2$, which would be necessary to infer the type in Eq. (1).

The functions midpoint, on, and segment are invariant under translations and general linear transformations. The reconstruction algorithm hence derives

$$\beta : \mathsf{GL}\_2,\ \tau : \mathsf{T}\_2\ ;\ A : \mathsf{Vec}\langle \beta, \tau \rangle,\ B : \mathsf{Vec}\langle \beta, \tau \rangle,\ C : \mathsf{Vec}\langle \beta, \tau \rangle \vdash \phi(A, B, C) : \mathsf{Bool}.$$

By the abstraction theorem, this judgement implies the invariance of the proposition φ(A, B, C) under arbitrary affine transformations:

$$
\forall g \in \text{GL}\_2. \,\forall t \in \text{T}\_2. \,\forall A, B, C. \,\phi(A, B, C) \Leftrightarrow \phi(t \circ g \circ A, t \circ g \circ B, t \circ g \circ C).
$$

First, by considering the case of g being identity, we have

$$
\forall t \in \mathcal{T}\_2. \,\,\forall A, B, C. \,\,\phi(A, B, C) \Leftrightarrow \phi(t \circ A, t \circ B, t \circ C). \tag{2}
$$

By using this, we are going to verify ∀B, C. φ(**0**,B,C) ⇔ ∀A, B, C. φ(A, B, C), by which we know that we only have to prove ∀B, C. φ(**0**,B,C).

Suppose that $\forall B, C.\ \phi(\mathbf{0}, B, C)$ holds. Since $\mathsf{T}\_2$ acts transitively on $\mathbb{R}^2$, for any $A \in \mathbb{R}^2$ there exists $t \in \mathsf{T}\_2$ such that $t \circ \mathbf{0} = A$. Furthermore, for any $B, C \in \mathbb{R}^2$, by instantiating $\forall B, C.\ \phi(\mathbf{0}, B, C)$ with $B \mapsto t^{-1} \circ B$ and $C \mapsto t^{-1} \circ C$, we have $\phi(\mathbf{0}, t^{-1} \circ B, t^{-1} \circ C)$. By Eq. (2), we obtain $\phi(t \circ \mathbf{0},\ t \circ t^{-1} \circ B,\ t \circ t^{-1} \circ C)$, which is equivalent to $\phi(A, B, C)$. Since $A$, $B$, and $C$ were arbitrary, we have proved

$$
\forall B, C. \ \phi(\mathbf{0}, B, C) \Rightarrow \forall A, B, C. \ \phi(A, B, C).
$$

The converse is trivial. We thus proved ∀B, C. φ(**0**,B,C) ⇔ ∀A, B, C. φ(A, B, C).

The simplified formula, $\forall B, C.\ \phi(\mathbf{0}, B, C)$, is still invariant under the simultaneous action of $\mathsf{GL}\_2$ on $B$ and $C$. Hence, by applying the type reconstruction again, we have $\beta : \mathsf{GL}\_2\ ;\ B : \mathsf{Vec}\langle \beta, 0 \rangle,\ C : \mathsf{Vec}\langle \beta, 0 \rangle \vdash \phi(\mathbf{0}, B, C) : \mathsf{Bool}$. It implies the following invariance: $\forall g \in \mathsf{GL}\_2.\ \forall B, C.\ \phi(\mathbf{0}, B, C) \Leftrightarrow \phi(\mathbf{0}, g \circ B, g \circ C)$.

We now use it to eliminate the remaining variables $B$ and $C$. Although it is tempting to 'fix' $B$ and $C$ at, e.g., $\mathbf{e}\_1 := (1, 0)$ and $\mathbf{e}\_2 := (0, 1)$, respectively, this incurs some *loss* of generality. For instance, when $B$ is at the origin, there is no way to move $B$ to $\mathbf{e}\_1$ by any $g \in \mathsf{GL}\_2$. We consider four cases:

1. $B$ and $C$ are linearly independent;
2. $B \ne \mathbf{0}$ and $C = r B$ for some $r \in \mathbb{R}$;
3. $C \ne \mathbf{0}$ and $B = r' C$ for some $r' \in \mathbb{R}$;
4. $B = C = \mathbf{0}$.
For each of these cases, we can find a suitable transformation in GL<sup>2</sup> as follows:

In the first case, the matrix $g$ with columns $B$ and $C$ satisfies $g \circ \mathbf{e}\_1 = B$ and $g \circ \mathbf{e}\_2 = C$, so $g^{-1}$ moves $(B, C)$ to $(\mathbf{e}\_1, \mathbf{e}\_2)$. In the second (resp. third) case, any $g \in \mathsf{GL}\_2$ with $g \circ B = \mathbf{e}\_1$ (resp. $g \circ C = \mathbf{e}\_1$) moves $(B, C)$ to $(\mathbf{e}\_1, r\mathbf{e}\_1)$ (resp. $(r'\mathbf{e}\_1, \mathbf{e}\_1)$). In the last case, $(B, C)$ is already at $(\mathbf{0}, \mathbf{0})$.
By a similar argument to the one for the translation-invariance, we have

$$
\forall B, C. \ \phi(\mathbf{0}, B, C) \Leftrightarrow \phi(\mathbf{0}, \mathbf{e}\_1, \mathbf{e}\_2) \land \forall r. \ \phi(\mathbf{0}, \mathbf{e}\_1, r\mathbf{e}\_1) \land \forall r'. \ \phi(\mathbf{0}, r'\mathbf{e}\_1, \mathbf{e}\_1) \land \phi(\mathbf{0}, \mathbf{0}, \mathbf{0}).
$$

Thus, we eliminated all four coordinate values (i.e., the $x$ and $y$ coordinates of $B$ and $C$) in the first and last cases, and three of them in the other two cases.
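For the linearly independent case, the normalizing transformation admits an explicit witness: the matrix with columns $B$ and $C$ maps $\mathbf{e}\_1$ to $B$ and $\mathbf{e}\_2$ to $C$, so its inverse moves $(B, C)$ to $(\mathbf{e}\_1, \mathbf{e}\_2)$. A sketch with sample vectors of our own choosing:

```python
# For linearly independent B and C, the matrix g with columns B and C
# satisfies g*e1 = B and g*e2 = C; hence g^{-1} moves (B, C) to (e1, e2).

def mat_vec(m, v):
    return (m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1])

def inverse_2x2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    assert det != 0, "B and C must be linearly independent"
    return ((m[1][1] / det, -m[0][1] / det),
            (-m[1][0] / det, m[0][0] / det))

B, C = (2, 1), (1, 1)                  # sample independent vectors (det = 1)
g = ((B[0], C[0]), (B[1], C[1]))       # columns B and C
g_inv = inverse_2x2(g)
assert mat_vec(g_inv, B) == (1.0, 0.0)
assert mat_vec(g_inv, C) == (0.0, 1.0)
```

The determinant check in `inverse_2x2` is exactly the linear-independence condition that separates the first case from the other three.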

#### **5.2 Variable Elimination Algorithm**

The variable elimination algorithm works as follows. We traverse the formula of a problem in a top-down order and, for each subformula in the form of

$$Q x\_1.\ Q x\_2. \cdots Q x\_n.\ \phi(x\_1, x\_2, \dots, x\_n, \mathbf{y}) \quad (Q \in \{\forall, \exists\}),$$

where $\mathbf{y} = y\_1, \dots, y\_m$ are the free variables, we apply the type reconstruction procedure to $\phi(x\_1, \dots, x\_n, \mathbf{y})$ and derive a judgement $\Delta;\ \Gamma,\ x\_1{:}\mathsf{T}\_1, \dots, x\_n{:}\mathsf{T}\_n \vdash \phi(x\_1, \dots, x\_n, \mathbf{y}) : \mathsf{Bool}$. We then choose an index variable $i$ that appears at least once in $\mathsf{T}\_1, \dots, \mathsf{T}\_n$ but in none of the types of $\mathbf{y}$. This means that the transformation signified by $i$ acts on some of $\{x\_1, \dots, x\_n\}$ but on none of $\mathbf{y}$. We select from $\{x\_1, \dots, x\_n\}$ one or more variables whose types include $i$ and are of the form $\mathsf{R}\langle \sigma \rangle$ or $\mathsf{Vec}\langle \beta, \tau \rangle$. Suppose that we select $x\_1, \dots, x\_l$. Then the judgement $\Delta;\ \Gamma,\ x\_1{:}\mathsf{T}\_1, \dots, x\_l{:}\mathsf{T}\_l \vdash Q x\_{l+1}. \cdots Q x\_n.\ \phi(x\_1, \dots, x\_n, \mathbf{y}) : \mathsf{Bool}$ also holds. We then eliminate (or add a restriction on) the bound variables $x\_1, \dots, x\_l$ by one of the lemmas in Sect. 5.3, according to the sort of $i$. After the elimination, the procedure is applied recursively to the resulting formula and its subformulas.
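The choice of the index variable $i$ amounts to a simple set computation; in the sketch below (ours), a type is represented only by the set of index variables it mentions:

```python
# Choose an index variable that occurs in the types of some bound
# variables but in none of the free variables' types.

def choose_index_var(bound_idx_vars, free_idx_vars):
    free = set().union(*free_idx_vars) if free_idx_vars else set()
    candidates = set().union(*bound_idx_vars) - free
    return min(candidates) if candidates else None

# e.g. G, P, Q, R bound, A, B, C free, all indexed by {beta, tau}:
# no index variable acts only on bound variables, so nothing qualifies.
assert choose_index_var([{"beta", "tau"}] * 4, [{"beta", "tau"}] * 3) is None

# Once A, B, C are quantified too, both 'beta' and 'tau' qualify,
# and one is picked deterministically.
assert choose_index_var([{"beta", "tau"}] * 7, []) == "beta"
```

The two assertions mirror the centroid example of Sect. 5.1, where the elimination only becomes applicable at the outermost quantifier block.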

#### **5.3 Variable Elimination Rules**

We now present how to eliminate variables based on a judgement of the form

$$\Delta;\ \Gamma,\ x\_1 : \mathsf{T}\_1, \dots, x\_n : \mathsf{T}\_n \vdash \psi(x\_1, \dots, x\_n, \mathbf{y}) : \mathsf{Bool},$$

where $\mathsf{T}\_1, \dots, \mathsf{T}\_n$ include no index variable other than $i$; $\Gamma = y\_1{:}\mathsf{U}\_1, \dots, y\_m{:}\mathsf{U}\_m$ is a typing context for $\mathbf{y} = y\_1, \dots, y\_m$; and $\mathsf{U}\_1, \dots, \mathsf{U}\_m$ do not include $i$. Note that we can obtain a judgement of this form by the procedure in Sect. 5.2 and by substituting the unit of the appropriate sort for every index variable other than $i$ in $\mathsf{T}\_1, \dots, \mathsf{T}\_n$.

We provide the variable elimination rules as lemmas, one for each sort of $i$. They state the rules for variables bound by $\forall$; the rules for $\exists$ are analogous. In stating the lemmas, we suppress $\Delta$ and $\Gamma$ in the judgement and $\mathbf{y}$ in $\psi$ for brevity, but we still assume the above-mentioned conditions hold.

Some complication arises from the fact that, for $k \ne l$, $\mathsf{T}\_k$ and $\mathsf{T}\_l$ may be indexed with *different* expressions of $i$. We thus need to consider potentially different transformations $\llbracket\mathsf{T}\_1\rrbracket(i), \dots, \llbracket\mathsf{T}\_n\rrbracket(i)$ applied simultaneously to $x\_1, \dots, x\_n$. Please refer to the supplementary material on the first author's web page (https://researchmap.jp/mtzk/?lang=en) for a general argument behind the rules and the proofs of the lemmas.

$\mathsf{T}\_k$: The following lemma states that, as we saw in Sect. 5.1, we only have to consider the truth of a formula $\psi(x)$ at $x = \mathbf{0}$ if $\psi(x)$ is translation-invariant.

**Lemma 1.** *If* $x : \mathsf{Vec}\langle 1, \tau(t) \rangle \vdash \psi(x) : \mathsf{Bool}$ *holds for* $t : \mathsf{T}\_k$ ($k \in \{2, 3\}$)*, then* $\forall x.\ \psi(x) \Leftrightarrow \psi(\mathbf{0})$*.*

$\mathsf{O}\_2$: The following lemma means that we may assume $x$ is on the $x$-axis if $\psi(x)$ is invariant under rotations and reflections.

**Lemma 2.** *If* $x : \mathsf{Vec}\langle \beta(O), 0 \rangle \vdash \psi(x) : \mathsf{Bool}$ *holds for* $O : \mathsf{O}\_2$*, then* $\forall x.\ \psi(x) \Leftrightarrow \forall r.\ \psi(r\mathbf{e}\_1)$*.*

$\mathsf{O}\_3$: A judgement of the following form implies different kinds of invariance according to $\beta\_1$ and $\beta\_2$:

$$x\_1: \mathsf{Vec}\langle\beta\_1(O), 0\rangle, x\_2: \mathsf{Vec}\langle\beta\_2(O), 0\rangle \vdash \psi(x\_1, x\_2): \mathsf{Bool}.\tag{3}$$

In any case, we may assume x<sup>1</sup> is on the x-axis and x<sup>2</sup> is on the xy-plane for proving ∀x1, x2. ψ(x1, x2), as stated in the following lemma.

**Lemma 3.** *If judgement (3) holds for* O : O3*, then*

$$
\forall x\_1. \ \forall x\_2. \ \psi(x\_1, x\_2) \Leftrightarrow \forall p, q, r \in \mathbb{R}. \ \psi(p\mathbf{e}\_1, q\mathbf{e}\_1 + r\mathbf{e}\_2).
$$

$\mathsf{GL}\_1$: For $s : \mathsf{GL}\_1$, a judgement $x : \mathsf{R}\langle \sigma(s) \rangle \vdash \psi(x) : \mathsf{Bool}$ implies, either


The form of $\sigma$ determines the kind of invariance. The following lemma summarizes how we can eliminate or restrict a variable in these cases.

**Lemma 4.** *Let* $\sigma(s) = s^e \cdot |s|^f$ ($e = 0$ *or* $f = 0$) *and suppose a judgement* $x : \mathsf{R}\langle \sigma(s) \rangle \vdash \psi(x) : \mathsf{Bool}$ *holds for* $s : \mathsf{GL}\_1$*. We have three cases:*


$\mathsf{GL}\_2$: For $B : \mathsf{GL}\_2$, a judgement of the following form implies different kinds of invariance of $\psi(x\_1, x\_2)$ depending on the forms of $\beta\_1$ and $\beta\_2$:

$$x\_1: \mathsf{Vec}\langle\beta\_1(B), 0\rangle, x\_2: \mathsf{Vec}\langle\beta\_2(B), 0\rangle \vdash \psi(x\_1, x\_2). \tag{4}$$

The following lemma summarizes how we eliminate the variables in each case.

**Lemma 5.** *Let* $\beta\_j(B) = \det(B)^{e\_j} \cdot |\det(B)|^{f\_j} \cdot B$ *and* $g\_j = e\_j + f\_j$ ($j \in \{1, 2\}$)*. If judgement (4) holds, then, letting* $\psi\_0 := \psi(\mathbf{0}, \mathbf{0}) \land \forall r.\ \psi(r\mathbf{e}\_1, \mathbf{e}\_1) \land \forall r.\ \psi(\mathbf{e}\_1, r\mathbf{e}\_1)$ *and* $\Psi := \forall x\_1.\ \forall x\_2.\ \psi(x\_1, x\_2)$*, the following equivalences hold:*

*1. If* $g\_1 + g\_2 + 1 \ne 0$*, then*

*– if* $e\_1 + e\_2$ *is an even number,* $\Psi \Leftrightarrow \psi\_0 \land \psi(\mathbf{e}\_1, \mathbf{e}\_2)$*;*

*– if* $e\_1 + e\_2$ *is an odd number,* $\Psi \Leftrightarrow \psi\_0 \land \psi(\mathbf{e}\_1, \mathbf{e}\_2) \land \psi(\mathbf{e}\_1, -\mathbf{e}\_2)$*.*

*2. If* $g\_1 + g\_2 + 1 = 0$*, then* $\Psi \Leftrightarrow \psi\_0 \land \forall r.\ \psi(r\mathbf{e}\_1, \mathbf{e}\_2)$*.*

A similar lemma holds for the invariances indicated by an index variable of sort GL3. We refrain from presenting it for space reasons.


**Table 3.** Results on All RCF Problems in ToroboMath Benchmark

**Table 4.** Results on RCF Problems with Invariance Detected and Variable Eliminated

| Division | #Prblms | AlgIdx Solved | AlgIdx Time | Baseline Solved | Baseline Time | Speed-up |
| --- | --- | --- | --- | --- | --- | --- |
| IMO | 77 | 19% | 91.3 s | 1% | 3.6 s | 23% |
| Univ | 49 | 57% | 31.0 s | 33% | 62.7 s | 495% |
| Chart | 77 | 49% | 14.3 s | 36% | 26.0 s | 529% |
| All | 203 | 40% | 34.3 s | 22% | 38.5 s | 505% |

**Fig. 5.** Comparison of Elapsed Time with and without the Invariance Detection based on AITs (Left: All Problems; Right: Problems Solved within 60 s)

#### **6 Experiment**

We evaluated the effectiveness of the proposed method on the pre-university math problems in the ToroboMath benchmark. We used the subset of the problems that are naturally expressible (by humans) in the language of RCF. Most of them are in either geometry or algebra. Note that the formalization was done in the language introduced in Sect. 2, not directly in the language of RCF. The problems are divided according to their source: **IMO** problems were taken from past International Mathematical Olympiads, **Univ** problems from entrance exams of Japanese universities, and **Chart** problems from a popular math practice book series. Please refer to another paper [16] on the ToroboMath benchmark for the details of the problems.

The type reconstruction and formula simplification procedures presented in Sect. 4 and Sect. 5 were implemented as a pre-processor of the formalized problems. The time spent for the preprocessing was almost negligible (0.76 s per problem on average) compared to that for solving the problems.

We compared the ToroboMath system with and without the pre-processor (called AlgIdx and Baseline below, respectively). The Baseline system *is* equipped with Iwane and Anai's invariance detection and simplification algorithm [10], which operates on the language of RCF, while AlgIdx *is not*. Thus, our evaluation reveals the advantage of detecting and exploiting the invariance of a problem expressed in a language that directly encodes its geometric meaning.


**Table 5.** Percentage of Problems from which One or More Variables are Eliminated by the Rule for Each Sort

**Table 6.** Most Frequent Invariance Types Detected and Eliminated


Table 3 presents the results on all problems. The solver was run on each problem with a time limit of 600 s. The table lists the number of problems, the percentage of problems solved within the time limit, and the average wall-clock time spent on the solved problems. The number of solved problems increases significantly in the **IMO** division; a modest improvement is observed in the other two divisions. Table 4 presents the results only on the problems in which at least one variable was eliminated by AlgIdx. The effect of the proposed method is clearly observed across all problem divisions, especially on **IMO**. On **IMO**, the average elapsed time on the problems solved by AlgIdx is longer than that by Baseline; this is because AlgIdx solved more difficult problems within the time limit. In fact, the average speed-up by AlgIdx (last column in Table 4) is around 500% on **Univ** and **Chart**; i.e., on the problems solved by both systems, AlgIdx produced the answer five times faster than Baseline.

A curious fact is that both AlgIdx and Baseline tended to need more time on the problems for which an invariance was detected and eliminated by AlgIdx (i.e., Time in Table 4) than on the average over all solved problems (Time in Table 3). This suggests that a problem with an invariance, or equivalently a symmetry, is harder for automated solvers than one without.

Figure 5 shows a comparison of the elapsed time for each problem. Each point represents a problem, and the x and y coordinates respectively indicate the elapsed time to solve (or to time out) by Baseline and AlgIdx. Many problems that Baseline could not solve within 600 s were solved by AlgIdx within 300 s. The speed-up is also observed on easier problems (those solved within 60 s), as seen in the right panel of Fig. 5.

Table 5 lists the fraction of problems on which one or more variables are eliminated based on the invariance indicated by an index variable of each sort. Table 6 provides the distribution of the combination of the sorts of invariances detected and eliminated by AlgIdx.

#### **7 Conclusion**

A method for automating w.l.o.g. arguments on geometry problems has been presented. It detects an invariance in a problem through type reconstruction in AIT and simplifies the problem by exploiting that invariance. It was especially effective on harder problems, including past IMO problems. Our future work includes exploring a more elaborate language of index expressions that captures various kinds of invariance while keeping the type inference tractable.

## **References**



# **Synthetic Tableaux: Minimal Tableau Search Heuristics**

Michał Sochański, Dorota Leszczyńska-Jasion, Szymon Chlebowski, Agata Tomczyk, and Marcin Jukiewicz

Adam Mickiewicz University, ul. Wieniawskiego 1, 61-712 Poznań, Poland {Michal.Sochanski,Dorota.Leszczynska,Szymon.Chlebowski,Agata.Tomczyk,Marcin.Jukiewicz}@amu.edu.pl

**Abstract.** We discuss the results of our work on heuristics for generating minimal synthetic tableaux. We present this proof method for classical propositional logic and its implementation in Haskell. Based on mathematical insights and exploratory data analysis, we define a heuristics that allows building a tableau of optimal or nearly optimal size. The proposed heuristics has been first tested on a data set with over 200,000 short formulas (length 12), then on 900 formulas of length 23. We describe the results of the data analysis and examine some tendencies. We also confront our approach with the pigeonhole principle.

**Keywords:** Synthetic tableau · Minimal tableau · Data analysis · Proof-search heuristics · Haskell · Pigeonhole principle

#### **1 Introduction**

The method of *synthetic tableaux* (ST, for short) is a proof method based entirely on direct reasoning, yet designed in a tableau format. The basic idea is that all the laws of logic, and only laws of logic, can be derived directly by cases from parts of some partition of the whole logical space. Hence an ST-proof of a formula typically starts with a division between 'p-cases' and '¬p-cases' and continues with further divisions, if necessary. The further process of derivation consists in applying the so-called *synthesizing* rules that build complex formulas from their parts: subformulas and/or their negations. For example, if p holds, then every implication with p in the succedent holds, 'q → p' in particular; then also 'p → (q → p)' holds by the same argument. If ¬p is the case, then every implication with p in the antecedent holds, thus 'p → (q → p)' is settled. This kind of reasoning *proves* that 'p → (q → p)' holds in every possible case (unless we reject *tertium non datur* in the partition of the logical space). There are no indirect assumptions, no *reductio ad absurdum*, no assumptions that need to be discharged. The ST method needs no labels, and no derivation of a normal form (clausal form) is required.
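The two-case argument above can be checked mechanically. The following Python sketch (ours, not the authors' Haskell code) confirms that branching on p alone settles 'p → (q → p)', and that the exhaustive truth table agrees:

```python
# A hypothetical sketch (not the authors' implementation): verifying the
# two-case ST-style argument for 'p -> (q -> p)'.

def implies(a: bool, b: bool) -> bool:
    return (not a) or b

def formula(p: bool, q: bool) -> bool:
    return implies(p, implies(q, p))

# ST-style reasoning branches only on p: if p holds, any implication with p
# in the succedent holds; if p fails, any implication with p in the
# antecedent holds.  Either way the value of q is irrelevant.
for p in (True, False):
    assert all(formula(p, q) for q in (True, False))

# The exhaustive check over all four valuations agrees: the formula is valid.
assert all(formula(p, q) for p in (True, False) for q in (True, False))
```

Two branches suffice where the truth table inspects four valuations, which is exactly the saving discussed below.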

This work was supported financially by National Science Centre, Poland, grant no 2017/26/E/HS1/00127.

© The Author(s) 2022

J. Blanchette et al. (Eds.): IJCAR 2022, LNAI 13385, pp. 407–425, 2022. https://doi.org/10.1007/978-3-031-10769-6_25

In the case of Classical Propositional Logic (CPL, for short) the method may be viewed as a formalization of the truth-table method. The assumption that p holds amounts to considering all Boolean valuations that make p true; considering ¬p exhausts the logical space. The number of cases to be considered corresponds to the number of branches of an ST, and it clearly depends on the number of distinct propositional variables in a formula; thus the upper bound for the complexity of an ST-search is the complexity of the truth-table method. In the worst case this is exponential with respect to the number of variables, but for some classes of formulas truth tables behave better than standard analytic tableaux (see [4–7] for this diagnosis). However, the method of ST can perform better than truth tables, as shown by the example of 'p → (q → p)', where we do not need to partition the space of valuations against the q/¬q cases.<sup>1</sup> The question, obviously, is *how much* better? The considerations presented in this paper aim at developing a *quasi*-experimental framework for answering it.

The ST method was introduced in [19], then extended to some non-classical logics in [20,22]. An adjustment to the first-order level was presented in [14]. There were also interesting applications of the method in the domain of abduction: [12,13]. On the propositional level, the ST method is both a proof- and model-checking method, which means that one can examine the satisfiability of a formula A (equivalently, the invalidity of ¬A) and its falsifiability (equivalently, the satisfiability of ¬A) at the same time. Normally, one needs to derive a clausal form of both A and ¬A to check the two dual semantic cases (satisfiability and validity) with one of the quick methods, while the ST-system is designed to examine both of them. Wisely used, this property can contribute to limiting the increase in complexity in the verification of semantic properties.

For the purpose of optimization of the ST method we created a heuristics that leads to the construction of a variable ordering, a task similar to the one performed in research on Ordered Binary Decision Diagrams (OBDDs) and, more generally, on the Boolean satisfiability problem (SAT) [8,15]. In Sect. 3 we sketch a comparison of STs to OBDDs. Let us stress at this point, however, that the aim of our analysis remains proof-theoretical: the ST method is a 'full-blooded' proof method working on formulas of arbitrary representation. It was already adjusted to first-order and to some non-classical logics, and has a large scope of applications beyond satisfiability checking of clausal forms.

The optimization methods that we present are based on exploratory data analysis performed on millions of tableaux. Some aspects of the analysis are also discussed in the paper. The data are available at https://ddsuam.wordpress.com/software-and-data/.

Here is a plan of what follows. The next section introduces the ST method, Sect. 3 compares STs with analytic tableaux and with BDDs, and Sect. 4 presents the implementation in Haskell. In Sect. 5 we introduce the mathematical concepts needed to analyse heuristics of small tableaux generation. In Sect. 6 we describe the analysed data, and in Sect. 7 the obtained results. Section 8 confronts our approach with the pigeonhole principle, and Sect. 9 indicates plans for further research.

<sup>1</sup> On a side note, it is easy to show that the ST system is polynomially equivalent to the system **KE** introduced in [4], as both systems contain cut. What is more, there is a strict analogy between the ST method and the inverse method (see [4,16]). The relation between ST and **KI** was examined by us in detail in Sect. 2 of [14].

#### **2 The Method of Synthetic Tableaux**

**Language.** Let LCPL stand for the language of CPL with negation, ¬, and implication, →. Var = {p, q, r, ..., p<sub>i</sub>, ...} is the set of propositional variables and 'Form' stands for the set of all formulas of the language, where the notion of formula is understood in a standard way. A, B, C, ... will be used for formulas of LCPL. Propositional variables and their negations are called *literals*. The *length* of a formula A is understood as the number of occurrences of characters in A, parentheses excluded.

Let A ∈ Form. We define the notion of a *component* of A as follows. (i) A is a component of A. (ii) If A is of the form '¬¬B', then B is a component of A. (iii) If A is of the form 'B → C', then '¬B' and C are components of A. (iv) If A is of the form '¬(B → C)', then B and '¬C' are components of A. (v) If C is a component of B and B is a component of A, then C is a component of A. (vi) Nothing else is a component of A. By 'Comp(A)' we mean the set of all components of A. For example, Comp(p → (q → p)) = {p → (q → p), ¬p, q → p, ¬q, p}. As we can see, a *component of a formula* is not the same as a *subformula of a formula*: ¬q is not a subformula of the law of antecedent, whereas q is a subformula but not a component. Components refer to *uniform notation* as defined by Smullyan (see [18]), which is very convenient to use with a larger alphabet. Let us also observe that the association of Comp(A) with a Hintikka set is quite natural, although Comp(A) need not be consistent. In the sequel we shall also use 'Comp±(A)' as a shorthand for 'Comp(A) ∪ Comp(¬A)'.
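Clauses (i)–(vi) translate directly into a recursive function. The following Python sketch is an illustration only (the authors' implementation is in Haskell), and the tuple encoding of formulas is our assumption:

```python
# A sketch of the component computation from clauses (i)-(vi).  Formulas are
# encoded as: variables are strings, ('not', A) is negation, ('imp', A, B)
# is implication.  This is not the authors' code.

def components(a):
    """Return Comp(a) as a set of encoded formulas."""
    comp = {a}                                   # clause (i)
    if a[0] == 'not' and a[1][0] == 'not':       # clause (ii): ~~B
        comp |= components(a[1][1])
    elif a[0] == 'imp':                          # clause (iii): B -> C
        comp |= components(('not', a[1])) | components(a[2])
    elif a[0] == 'not' and a[1][0] == 'imp':     # clause (iv): ~(B -> C)
        comp |= components(a[1][1]) | components(('not', a[1][2]))
    return comp                                  # clauses (v)-(vi) via recursion

p, q = 'p', 'q'
B = ('imp', p, ('imp', q, p))                    # B = p -> (q -> p)
assert components(B) == {B, ('not', p), ('imp', q, p), ('not', q), p}
# ~q is a component of B but not a subformula; q is a subformula of B but
# not one of its components (it is, however, a component of ~B).
assert q not in components(B)
assert q in components(('not', B))
```

The example reproduces Comp(p → (q → p)) as given in the text.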

**Rules.** The system of ST consists of the set of rules (see Table 1) and the notion of proof (see Definition 2). The rules can be applied in the construction of an ST for a formula A on the proviso that (a) the premises already occur on a given branch, and (b) the conclusion (conclusions, in the case of (cut)) of a particular application of the rule belongs (both belong) to Comp±(A). The only branching rule, called (cut) by analogy to its famous sequent-calculus formulation, is at the same time the only rule that needs no premises, hence every ST starts with an application of this rule. If its application creates branches with p<sub>i</sub> and ¬p<sub>i</sub>, then we say that the rule was *applied with respect to* p<sub>i</sub>.

One of the nice properties of this method is that it is easy to keep every branch *consistent*: it is sufficient to restrict the applications of (cut) so that on every branch (cut) is applied with respect to a given variable p<sub>i</sub> at most once. This warrants that p<sub>i</sub> and ¬p<sub>i</sub> never occur together on the same branch.

The notion of a proof is formalized by that of a tree. If T is a labelled tree, then by X<sub>T</sub> we mean the set of its nodes, and by r<sub>T</sub> we mean its root. Moreover, η<sub>T</sub> is used for a function assigning labels to the nodes in X<sub>T</sub>.

**Table 1.** Rules of the ST system for LCPL


**Definition 1 (synthetic tableau).** *A* synthetic tableau for a formula A *is a finite labelled tree* T *generated by the above rules, such that* η<sub>T</sub> : X<sub>T</sub>\{r<sub>T</sub>} → Comp±(A) *and each leaf is labelled with* A *or with* ¬A*.*

T *is called* consistent *if the applications of* (cut) *are subject to the restriction defined above: there are no two applications of* (cut) *on the same branch with respect to the same variable.*

T *is called* regular *provided that literals are introduced in the same order on each branch, otherwise* T *is called* irregular*.*

*Finally,* T *is called* canonical*, if, first, it is consistent and regular, and second, it starts with an introduction of all possible literals by* (cut) *and only after that the other rules are applied on the created branches.*

In the above definition we have used the notion of *literals introduced in the same order on each branch*. It seems sufficiently intuitive at the moment, so we postpone the clarification of this notion until the end of this section.

**Definition 2 (proof in ST system).** *A synthetic tableau* T *for a formula* A *is a* proof of A in the ST system *iff each leaf of* T *is labelled with* A*.*

**Theorem 1. (soundness and completeness, see** [21]**).** *A formula* A *is valid in* CPL *iff* A *has a proof in the ST-system.*

*Example 1.* Below we present two different STs for one formula: B = p → (q → p). Each of them is consistent and regular. Also, each of them is a proof of the formula in the ST system.

In T<sub>1</sub>: 2 comes from 1 by **r**<sub>2</sub>→, and similarly 3 comes from 2 by **r**<sub>2</sub>→; 5 comes from 4 by **r**<sub>1</sub>→. In T<sub>2</sub>: nothing can be derived from 1, hence the application of (cut) with respect to p is the only possible move. The numbering of the nodes is not part of the ST.

There are at least two important size measures used with respect to trees: the number of nodes and the number of branches. As witnessed by our data, there is a very high overall correlation between the two measures; we have thus used only one of them, the number of branches, in further analysis. Among various STs for the same formula there can be those of smaller, and those of bigger size. An ST of minimal size is called *optimal*. In the above example, T<sub>1</sub> is an optimal ST for B. Let us also observe that there can be many STs of the same size for a formula; in particular, there can be many optimal STs.

*Example 2.* Two possible canonical synthetic tableaux for B = p → (q → p). Each of them is regular and consistent, but clearly not optimal (*cf.* T<sub>1</sub>).

In the case of formulas with at most two distinct variables regularity is a trivial property. Here comes an example with three variables.

*Example 3.* T<sub>5</sub> is an irregular ST for the formula C = (p → ¬q) → ¬(r → p), i.e. variables are introduced in various orders on different branches. T<sub>6</sub> is an example of an inconsistent ST for C, i.e. there are two applications of (cut) on one branch with respect to p, which results in a branch carrying both p and ¬p (the blue one). The whole right subtree of T<sub>5</sub>, starting with ¬p, is repeated twice in T<sub>6</sub>, where it is symbolized with the letter T<sup>∗</sup>. Let us observe that ¬¬(r → p) is a component of ¬C due to clause (iv) defining the concept of component.

On the level of CPL we can use only consistent STs while still having a complete calculus (for details see [19,21]). An analogue of closing a branch of an analytic tableau for formula A is, in the case of an ST, ending a branch with A synthesized. And the fact that an ST for A has a consistent branch ending with ¬A witnesses the satisfiability of ¬A. The situation concerning consistency of branches is slightly different, however, in the formalization of first-order logic presented in [14], as a restriction of the calculus to consistent STs produces an incomplete formalization.

Finally, let us introduce some auxiliary terminology to be used in the sequel. Suppose T is an ST for a formula A and B is a branch of T. Literals occur on B in an order set by the applications of (cut); suppose that it is ±p<sub>1</sub>,...,±p<sub>n</sub>, where '±' is a negation sign or no sign. In this situation we call the sequence o = p<sub>1</sub>,...,p<sub>n</sub> the *order on* B. It can happen that o contains *all* variables that occur in A, or that some of them are missing. Suppose that q<sub>1</sub>,...,q<sub>m</sub> are all of (and only) the distinct variables occurring in A. Each permutation of q<sub>1</sub>,...,q<sub>m</sub> will be called *an instruction for a branch of an ST for* A. Further, we will say that the order o on B *complies with an instruction* I iff either o = I, or o constitutes a proper initial segment of I. Finally, I is an *instruction for the construction of* T, if I is a set of instructions for branches of an ST for A such that for each branch of T, the order on the branch complies with some element of I.

Let us observe that in the case of a regular ST the set containing one instruction for a branch makes the whole instruction for the ST, as this instruction describes all the branches. Let us turn to examples. T<sub>5</sub> from Example 3 has four branches with the following orders (from the left): ⟨p, q⟩, ⟨p, q⟩, ⟨p, r⟩, ⟨p, r⟩. On the other hand, there are six permutations of p, q, r, and hence six possible instructions for branches of an arbitrary ST for the discussed formula. The order ⟨p, q⟩ complies with the instruction ⟨p, q, r⟩, and the order ⟨p, r⟩ complies with the instruction ⟨p, r, q⟩. The set {⟨p, q, r⟩, ⟨p, r, q⟩} is an instruction for the construction of an ST for C; more specifically, it is an instruction for the construction of T<sub>5</sub>.
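The compliance relation can be sketched in a few lines. The following Python illustration (ours; the representation of orders and instructions as tuples is an assumption) checks the T<sub>5</sub> example:

```python
# A sketch of the compliance check between orders and instructions,
# both represented as tuples of variable names (not the authors' code).

def complies(order, instruction):
    """An order complies with an instruction iff it equals the instruction
    or constitutes a proper initial segment of it."""
    return order == instruction or (
        len(order) < len(instruction)
        and instruction[:len(order)] == order)

def is_instruction_for(instr_set, branch_orders):
    """A set of branch instructions is an instruction for the construction
    of an ST iff every branch order complies with some element of it."""
    return all(any(complies(o, i) for i in instr_set) for o in branch_orders)

# The four branch orders of T5 (from the left): <p,q>, <p,q>, <p,r>, <p,r>.
t5_orders = [('p', 'q'), ('p', 'q'), ('p', 'r'), ('p', 'r')]
assert complies(('p', 'q'), ('p', 'q', 'r'))
assert complies(('p', 'r'), ('p', 'r', 'q'))
assert not complies(('q', 'p'), ('p', 'q', 'r'))
assert is_instruction_for({('p', 'q', 'r'), ('p', 'r', 'q')}, t5_orders)
```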

#### **3 ST, Analytic Tableaux, BDDs, and SAT Solvers**

The analogy between STs and analytic tableaux sketched in the last paragraph of the previous section breaks at two points. First, let us repeat: the ST method is *both a satisfiability checker and a validity checker at once*, just like a truth table is. Second, the analogy breaks on complexity issues. In the case of analytic tableaux the order of decomposing compound formulas is the key to a minimal tableau. In the case of STs, the key to an optimized use of the method is a clever choice of the variables introduced on each branch.

The main similarity between STs and Binary Decision Diagrams (BDDs, see e.g. [8,15]) is that both methods involve branching on variables. The main differences concern the representation they work on and their aims: firstly, STs constitute a proof method, whereas BDDs are compact representations of Boolean formulas, used mainly for practical aims such as design of electronic circuits (VLSI design); secondly, ST applies to logical formulas, whereas construction of BDDs may start with different representations of Boolean functions, usually circuits or Boolean formulas.

The structure of the constructed tree is also slightly different in the two approaches: in BDDs the inner nodes correspond to variables with outgoing edges labelled with 1 or 0; in STs, on the other hand, inner nodes are labelled with literals or more complex formulas. The terminal nodes of a BDD (also called sinks, labelled with 1 or 0) indicate the value of a Boolean function calculated for the arguments introduced along the path from the root, whereas the leaves of an ST carry a synthesized formula (the initial one or its negation). In addition to that, the methods differ in terms of the construction process: in the case of BDDs, tree structures are first generated and then reduced to a more compact form using the elimination and merging rules; the STs, in turn, are built 'already reduced'. However, the interpretation of the outcome of both constructions is analogous. For a formula A with n distinct variables p<sub>1</sub>,...,p<sub>n</sub> and the associated Boolean function f<sub>A</sub> = f<sub>A</sub>(x<sub>1</sub>,...,x<sub>n</sub>), the following fact holds: if a branch of an ST containing literals from a set L ends with A or ¬A synthesized (which means that assuming that the literals from L are true is sufficient to calculate the value of A), then the two mentioned reduction rules can be used in a BDD for f<sub>A</sub>, so that the route that contains the variables occurring in L followed by edges labelled according to the signs in L can be directed to a terminal node (sink). For example, if A can be synthesized on a branch with literals ¬p<sub>1</sub>, p<sub>2</sub> and ¬p<sub>3</sub>, then f<sub>A</sub>(0, 1, 0, x<sub>4</sub>,...,x<sub>n</sub>) = 1 for all values of the variables x<sub>4</sub>,...,x<sub>n</sub>, and so the route in the associated BDD containing the variables x<sub>1</sub>, x<sub>2</sub> and x<sub>3</sub> followed by the edges labelled with 0, 1 and 0, respectively, leads directly to the sink labelled with 1.

However, the possibility of applying the reduction procedures for a BDD does not always correspond to a possibility of reducing an ST. For example, the reduced BDD for the formula p ∨ (q ∧ ¬q) consists of a single node labelled with p with two edges directed straight to the sinks 1 and 0; on the other hand, the construction of an ST for the formula requires introducing q following the literal ¬p. This observation suggests that STs may, in general, have a greater size than the reduced BDDs.

Strong similarity of the two methods is also illustrated by the fact that they both allow the construction of a disjunctive normal form (DNF) of the logical or Boolean formula to which they were applied. In the case of ST, DNF is the disjunction of conjunctions of literals that appear on branches finished with the formula synthesized. The smaller the ST, the smaller the DNF. Things are analogous with BDDs.

Due to complexity issues, research on BDDs centers on ordered binary decision diagrams (OBDDs), in which different variables appear in the same order on all paths from the root. A number of heuristics have been proposed in order to construct a variable ordering that will lead to the smallest OBDDs, using characteristics of the different types of representation of Boolean functions (for example, for circuits, topological characteristics have been used for that purpose). OBDDs are clearly analogous to regular STs, the construction of which also requires finding a good variable ordering, leading to a smaller ST. We suppose that our methodology can also be used to find orderings for OBDDs by expressing Boolean functions as logical formulas. It is not clear to us whether the OBDDs methodology can be used in our framework.

Let us move on to other comparisons, this time with a lesser degree of detail. It is very instructive to compare the ST method to SAT solvers, as their effectiveness is undeniably impressive nowadays<sup>2</sup>. The ST method does not aim at challenging this effectiveness. Let us explain, however, in what respect the ST method can still be viewed as a computationally attractive alternative to a SAT solver. The latter produces an answer to the question about satisfiability, sometimes also producing examples of satisfying valuations and/or counting the satisfying valuations. In order to obtain an answer to another question, that about validity, one needs to ask about the satisfiability of the initial problem negated. As we stressed above, the ST method answers the two questions at once, providing at the same time a description of the classes of valuations satisfying and not satisfying the initial formula. Hence one ST is worth two SAT checks together with a rough model counting.

Another interesting point concerns clausal forms. The method of ST does not require the derivation of a clausal form, but the applications of the rules of the system, defined via the α-, β-notation, reflect the breaking of a formula into its components and thus, in a way, lead to a definition of a normal form (a DNF, as we mentioned above). But this is not to say that an ST needs a full conversion to DNF. In this respect the ST method is rather similar to non-clausal theorem provers (*e.g.* non-clausal resolution, see [9,17]).

Let us finish this section with a summary of the ST method. Formally, it is a proof method with many applications beyond the realm of CPL. In the area of CPL, semantically speaking, it is both a satisfiability and a validity checker, displaying the semantic properties of a formula like a truth table does, but able to work more efficiently (in terms of the number of branches) than the latter method. The key to this efficiency is the order of variables introduced in an ST. In what follows we present a method of constructing such variable orders and examine our approach in an experimental setting.

#### **4 Implementation**

The main functionality of the implementation described in this section is the construction of an ST for a formula according to an instruction provided by the user. If required, it can also produce *all possible instructions* for a given formula and build all STs according to them. In our research we have mainly used the second possibility.

The implemented algorithm generates non-canonical, possibly irregular STs. Let us start with some basics. There are three main datatypes employed: a standard, recursively defined formula type, For, used to represent propositional formulas; the type MF (Maybe Formula), consisting of Just Formula and Nothing, used to express the fact that the synthesis of a given formula on a given branch was successful (Just) or not (Nothing); and, to represent an ST, the type of trees imported from Data.Tree. Thus every ST can be represented as Tree [MF]

<sup>2</sup> See [23, p. 2021]: *contemporary SAT solvers can often handle practical instances with millions of variables and constraints*.

that is, a tree labelled by lists of MF. We employed such a general structure having in mind possible extensions to non-classical logics (for CPL a binary tree is sufficient). The algorithm generating all possible STs for a given formula consists of the following steps:

	- (1a) a list of all components of A and all components of ¬A, and a separate list of the variables occurring in A (atoms A), is generated;
	- (1b) the first list is sorted in such a way that all components of a given formula in that list precede it (sort A);
	- (2a) after each introduction of a literal (by (cut)) we try to synthesize (by the other rules) as many formulas from sort A as possible;
	- (2b) if no synthesizing rule is applicable, we look into the instruction to introduce an appropriate literal and we go back to (2a). Let us note that T<sub>1</sub>, T<sub>2</sub>, T<sub>5</sub> are constructed according to this strategy.

Please observe that the length of a single branch is linear in the size of a formula; this follows from the fact that sort A contains only the components of A. On the other hand, an 'outburst' of computational complexity enters on the level of the number of STs. In general, if k is the number of distinct variables in a formula A, then for k = 3 there are 12 different canonical STs; for k = 4 and k = 5 this number is, respectively, 576 and 1,658,880. In the case of k = 6 the number of canonical STs *per* formula exceeds 10<sup>12</sup> and this approach is no longer feasible<sup>3</sup>.
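Assuming the closed-form product from footnote 3, these counts can be reproduced in a few lines of Python (our sketch; note that the formula yields exactly 1,658,880 for k = 5):

```python
# A quick check of the closed-form count of canonical STs from footnote 3:
# the product of (k - i + 1)^(2^(i-1)) over i = 1..k.

def canonical_st_count(k: int) -> int:
    n = 1
    for i in range(1, k + 1):
        n *= (k - i + 1) ** (2 ** (i - 1))
    return n

assert canonical_st_count(3) == 12
assert canonical_st_count(4) == 576
assert canonical_st_count(5) == 1_658_880
assert canonical_st_count(6) > 10**12   # enumeration is no longer feasible
```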

The Haskell implementation, together with the necessary documentation, is available at https://ddsuam.wordpress.com/software-and-data/.

#### **5 dp-Measure and the Rest of Our Toolbox**

As we have already observed, in order to construct an optimal ST for a given formula one needs to make a clever choice of the literals to start with. The following function was defined to facilitate the smart choices. It assigns a rational value from the interval ⟨0; 1⟩ to each occurrence of a literal in a syntactic tree for formula A (in fact, it assigns the values to all elements of Comp(A)). Intuitively, the value reflects the *derivative power* of the literal in synthesizing A.

<sup>3</sup> It can be shown (*e.g.* by mathematical induction) that for formulas with *k* different variables, the total number of canonical STs is given by the following explicit formula:

$$\prod\_{i=1}^{k} \left(k - i + 1\right)^{2^{i-1}}.$$

The first case of the equation in Definition 3 is to make the function total on Form × Form; it also corresponds with the intended meaning of the defined measure: if B ∉ Comp(A), then B is of no use in deriving A. The second case expresses the starting point: to calculate the values of dp(A, B) for atomic B, one needs to assign dp(A, A) = 1; then the value is propagated down along the branches of the formula's syntactic tree. Dividing the value a by 2 in the fourth line reflects the fact that both components of an α-formula are needed to synthesize the formula. In order to use the measure, we need to calculate it for both A and ¬A; this follows from the fact that we do not know whether A or ¬A will be synthesized on a given branch.

$$\textbf{Definition 3.}\ dp \colon \mathsf{Form} \times \mathsf{Form} \longrightarrow \langle 0; 1 \rangle$$

$$dp(A, B) = \begin{cases} 0 & \text{if } B \notin \mathsf{Comp}(A), \\ 1 & \text{if } B = A, \\ a & \text{if } dp(A, \neg \neg B) = a, \\ \frac{a}{2} & \text{if } B \in \{C, \neg D\} \text{ and } dp(A, \neg (C \to D)) = a, \\ a & \text{if } B \in \{\neg C, D\} \text{ and } dp(A, C \to D) = a. \end{cases}$$

*Example 4.* A visualization of the calculation of dp for the formulas B and C from Examples 2 and 3, and for D = (p → ¬p) → p.

[Figure: the syntactic trees of B, ¬B, C, ¬C, D and ¬D with dp-values attached to the nodes. Every component of B carries the value 1; in the tree of ¬B, p and ¬(q → p) carry 1/2, while q and ¬p carry 1/4; every literal occurrence in the trees of C and ¬C carries 1/2; in the tree of D, p occurs with the value 1/2 twice and 1 once; in the tree of ¬D, each of the three occurrences of ¬p carries 1/2.]

As one can see from Example 4, the effect of applying the dp measure to a formula and its negation is a number of values that need to be aggregated in order to obtain a clear instruction for an ST construction. However, some conclusions can be drawn already from the above example. It seems clear that the value dp(p → (q → p), p) = 1 corresponds to the fact that p is sufficient to synthesize the whole formula (as witnessed by T<sub>1</sub>, see Example 1). So is the case with ¬p. On the other hand, even if ¬q is sufficient to synthesize the formula, q is not (see T<sub>2</sub>, Example 1), hence the choice between p and q is plain. But it seems to be the only obvious choice at the moment. In the case of the second formula, every literal gets the same value: 0.5. What is more, in the case of longer formulas a situation depicted by the rightmost syntactic trees is very likely to happen: we obtain dp(D, p) = 0.5 *twice* (since dp works on *occurrences* of literals), and dp(¬D, ¬p) = 0.5 *three times*.
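The propagation in Definition 3 is easy to sketch. The following Python illustration (ours, not the authors' Haskell; the tuple encoding of formulas is an assumption) reproduces some annotations of Example 4:

```python
# A sketch of the dp measure from Definition 3: values are propagated from
# the root of the syntactic tree (variables are strings, ('not', A) is
# negation, ('imp', A, B) is implication).  Not the authors' code.

from fractions import Fraction

def dp_values(a):
    """Map each component of a to the list of dp-values of its occurrences."""
    values = {}

    def go(b, v):
        values.setdefault(b, []).append(v)
        if b[0] == 'not' and b[1][0] == 'not':       # ~~C: value unchanged
            go(b[1][1], v)
        elif b[0] == 'imp':                          # beta: either side suffices
            go(('not', b[1]), v)
            go(b[2], v)
        elif b[0] == 'not' and b[1][0] == 'imp':     # alpha: both sides needed
            go(b[1][1], v / 2)
            go(('not', b[1][2]), v / 2)

    go(a, Fraction(1))
    return values

p, q = 'p', 'q'
B = ('imp', p, ('imp', q, p))                        # B = p -> (q -> p)
assert dp_values(B)[p] == [Fraction(1)]              # dp(B, p) = 1
assert dp_values(('not', B))[q] == [Fraction(1, 4)]  # dp(~B, q) = 1/4

D = ('imp', ('imp', p, ('not', p)), p)               # D = (p -> ~p) -> p
# dp(~D, ~p) = 1/2 for each of the three occurrences of ~p.
assert dp_values(('not', D))[('not', p)] == [Fraction(1, 2)] * 3
```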

In the aggregation of the dp-values we use the parametrised Hamacher s-norm, defined for a, b ∈ ⟨0; 1⟩ as follows:

$$a \operatorname{s}\_{\lambda} b = \frac{a + b - ab - (1 - \lambda)ab}{1 - (1 - \lambda)ab}$$

for which we have taken λ = 0.1, as this value turned out to give the best results. The Hamacher s-norm can be seen as a fuzzy disjunction; it is commutative and associative, hence it is straightforward to extend its application to an arbitrary finite number of arguments. For a = b = c = 0.5 we obtain:

$$a \text{ s}\_{\lambda} \, b \approx 0.677, \quad \text{and } (a \text{ s}\_{\lambda} \, b) \text{ s}\_{\lambda} \, c \approx 0.768$$
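These values are easy to reproduce. The following Python sketch (an illustration under our own representation) folds the s-norm over a list of arguments, which is legitimate by associativity:

```python
# A numeric check of the parametrised Hamacher s-norm with lambda = 0.1
# (a sketch; the authors' implementation is in Haskell).

from functools import reduce

def hamacher(a: float, b: float, lam: float = 0.1) -> float:
    return (a + b - a * b - (1 - lam) * a * b) / (1 - (1 - lam) * a * b)

def aggregate(values, lam=0.1):
    """Fold the s-norm over a finite list of values (associativity)."""
    return reduce(lambda x, y: hamacher(x, y, lam), values)

assert abs(hamacher(0.5, 0.5) - 0.677) < 1e-3           # a s b ~ 0.677
assert abs(aggregate([0.5, 0.5, 0.5]) - 0.768) < 1e-3   # (a s b) s c ~ 0.768
# For a, b < 1 the result exceeds max(a, b), so the aggregate is sensitive
# to the number of aggregated occurrences.
assert hamacher(0.5, 0.5) > 0.5
```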

The value of this norm is calculated for a formula A and a literal l by aggregating the dp-values dp(A, l) for each occurrence of l in the syntactic tree of A. This value will be denoted as 'h(A, l)'; in case there is only one value dp(A, l), we take h(A, l) = dp(A, l). Hence, referring to the above Example 4, we have *e.g.* h(B, p) = 1, h(¬B, ¬p) = 0.25, h(¬D, ¬p) ≈ 0.768.

Finally, the function H is defined for *variables*, not their occurrences, in a formula A as follows:

$$H(A, p\_i) = \frac{\max(h(A, p\_i), h(\neg A, p\_i)) + \max(h(A, \neg p\_i), h(\neg A, \neg p\_i))}{2}$$

The important property of this apparatus is that for a, b < 1 we have a s<sub>0.1</sub> b > max{a, b}, and thus h(A, l) and H(A, p<sub>i</sub>) are sensitive to the number of aggregated elements. Another desirable feature of the introduced functions is that h(A, p<sub>i</sub>) = 1 indicates that one can synthesize A on a branch starting with p<sub>i</sub> without further applications of (cut); furthermore, H(A, p<sub>i</sub>) = 1 indicates that both p<sub>i</sub> and ¬p<sub>i</sub> have this property.
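The functions h and H can be sketched in a few lines of Python. This is a sketch under an extra assumption of ours: an h-value for a literal with no occurrences is passed as 0 (the text does not fix this case); all names are illustrative.

```python
from functools import reduce

def s(a, b, lam=0.1):
    """Parametrised Hamacher s-norm with lambda = 0.1."""
    return (a + b - a * b - (1 - lam) * a * b) / (1 - (1 - lam) * a * b)

def h(dp_values):
    """Aggregate the dp-values of all occurrences of a literal;
    a single value is returned unchanged."""
    return reduce(s, dp_values)

def H(h_A_p, h_negA_p, h_A_negp, h_negA_negp):
    """H(A, p) = (max(h(A,p), h(-A,p)) + max(h(A,-p), h(-A,-p))) / 2.
    A missing h-value (no occurrence of the literal) is passed as 0 -- our assumption."""
    return (max(h_A_p, h_negA_p) + max(h_A_negp, h_negA_negp)) / 2
```

For instance, `h([0.5, 0.5, 0.5])` reproduces the value ≈ 0.768 aggregated above.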

Let us stress that the values of dp, h and H are very easy to calculate. Given a formula A, we need to assign a dp-value to each of its components, and the number of components is linear in the length of A. On the other hand, the information gained by these calculations is sometimes not sufficient. The assignment dp(A, p<sub>i</sub>) = 2<sup>−m</sup> says only that A can be built from p<sub>i</sub> and m other components of A, but it gives us no clue as to which components are needed. In Example 4, H works perfectly, as we have H(B, p) = 1 and H(B, q) = 0.625, hence H indicates the following instruction for the construction of an ST: {p, q}. Unfortunately, in the case of formula C we have H(C, p) = H(C, q) = H(C, r) = 0.5, hence a more sophisticated solution is needed.

#### **6 Data**

At the very beginning of the process of data generation we faced the following general problem: how to make any *conclusive* inferences about an infinite population (all of Form) on the basis of finite data? Considering the methodological problems connected with applying classical statistical inference methods in this context, we limited our analysis to descriptive statistics, exploratory analysis and testing. To make the analysis as informative as possible, we took a 'big data' approach: for every formula we generated all possible STs, differing in the order of applications of (cut) on particular branches. In addition, where it was feasible, we generated all possible formulas falling under certain syntactical specifications. The approach is aimed at testing different optimisation methods as well as exploring the data in search of patterns and new hypotheses. The knowledge gained in this way is further used on samples of longer formulas to examine tendencies.

From now on we use l for the length of a formula, k for the number of distinct variables occurring in a formula, and n for the number of all occurrences of variables (leaves, if we think of formulas as trees). In the first stage we examined a dataset containing all possible STs for formulas with l = 12 and k ≤ 4. There are over 33 million different STs already for these modest values; for larger k the data to analyse was simply too big. We generated 242,265 formulas, from which we later removed those with k ≤ 2 and/or k = n, as the results for them were not interesting. In the case of the further datasets we also generated all possible STs, but the formulas were longer and they were randomly generated<sup>4</sup>. And so we considered (i) 400 formulas with l = 23, k = 3, (ii) 400 formulas with l = 23, k = 4, (iii) 100 formulas with l = 23, k = 5. In all cases 9 ≤ n ≤ 12; this value is to be combined with the occurrences of negations in a formula—the smaller n, the more occurrences of negation.

Having generated all possible STs for a formula, we could simply check the optimal ST size for this formula. The idea was to look for possible relations between, on the one hand, instructions producing the small STs and, on the other hand, properties of formulas that are easy to calculate, like dp or the numbers of occurrences of variables. The first dataset included only relatively small formulas; however, with all possible formulas of a given type available, it was possible *e.g.* to track various types of 'unusual' behaviour of formulas and all possible problematic issues regarding the optimisation methods, which could remain unnoticed if only random samples of formulas were generated. In the case of randomly generated formulas the 'special' or 'difficult' types of formulas may not be tracked (as the probability of drawing them may be small), but instead we get an idea of an 'average' formula, or of the average behaviour of the optimisation methods. By generating all the STs, in turn, we gained access to full information not only about the regular but also the irregular STs, which is the basis for determining the set of optimal STs and for the evaluation of the optimisation methods.

#### **7 Data Analysis and a Discussion of Results**

In this section we present some results of the analyses performed on our data. The main purpose of the analyses is to test the effectiveness of the function H in terms of indicating a small ST. Moreover, we performed different types of exploratory analysis on the data, aiming at understanding the variation of size among all STs for different formulas, and how it relates to the effectiveness of H.

<sup>4</sup> The algorithm for generating random formulas is described in [11]. The author also prepared a Haskell implementation of the algorithm. See https://github.com/kiryk/random-for.

**Fig. 1.** Distribution of the difference between the size of a maximal and that of a minimal ST for formulas with *k* = 4, 5.

Most results will be presented for the five combinations of the values of l and k in our data, that is, l = 12, k ∈ {3, 4} and l = 23, k ∈ {3, 4, 5}; however, some results will be presented with the values k = 3 and k = 4 grouped together (where the difference between them is insignificant), and the charts are presented only for k ≥ 4.

We will examine the variation of size among STs using a range statistic: by the *range of the size of ST for a formula* A (ST range, for short) we mean the difference between the sizes of a maximal and a minimal ST; this value indicates the possible room for optimization. The maximal-size ST is bounded by the size of a canonical ST for a given formula, which depends only on k. For k = 4 a canonical ST has 16 branches; for k = 5 it has 32 branches.

The histograms in Fig. 1 present the distributions of the ST range for formulas with k = 4 and k = 5. The rightmost bar in the histogram for l = 23, k = 5 says that for 5 (among 100) formulas there are STs with only two branches, whereas the maximal STs for these formulas have 32 branches. We can also read from the histograms that for formulas with k = 4 the ST range of some formulas is equal to 0 (7.9% of formulas with l = 12 and 3.5% with l = 23), which means that all their STs have the same size. We have decided to exclude these formulas from the results of the tests of the efficiency of H, as they leave no room for optimization. However, as can be seen in the histogram, there were no formulas of this kind among those with k = 5. This indicates that with the increase of k the internal differentiation of the set of STs for a formula increases as well, leading to a smaller share of formulas with small ST range.

Two more measures relating to the distribution of the size of ST may be of interest. Firstly, the share of formulas for which no regular ST is of optimal size; it indicates how wrong we can be in pointing only to the regular STs. Secondly, the percentage share of optimal STs among all STs for a given formula; the latter gives an idea of the chance of picking an optimal ST at random.

**Table 2.** Row A: the share of formulas that do not have a regular ST of optimal size. Row B: the share of optimal STs among all STs for a formula; this was first calculated for each formula, then averaged over all formulas in a given set.

Table 2 presents both values for formulas depending on k and l (let us recall that formulas with ST range equal to 0 are excluded from the analysis). In both cases we can clearly see a tendency with growing k. As was to be expected, the table shows that the average share of optimal STs depends on the value of k rather than on the size of the formula. This is understandable—as the number of branches depends on k only, the length of a formula translates to the length of branches, and the latter is linear in the former. In a way, this explains why the results are almost identical when the size of STs is calculated in terms of nodes rather than branches (as we mentioned above, the overall correlation between the two measures makes the choice between them irrelevant).

We can categorise the output of the function H into three main classes. In the first case, the values assigned to the variables by H order them strictly, which results in one specific instruction for the construction of a regular ST. The share of such unique indications was very high: 70.9% for formulas with l = 12, 92.0% for l = 23, k = 3, 4, and 72.0% for k = 5. The second possibility is that H assigns the same value to each variable; in this case we gain no information at all (let us recall that we have excluded the only cases that could justify such assignments, that is, the formulas for which each ST is of the same size). The share of such formulas in our datasets was small: 0.6% for l = 12, 0.1% for l = 23, k = 3, 4 and 0% for k = 5, suggesting that it tends to fall as k rises. The third possibility is that the ordering is not strict, yet some information is gained. In this case the value of H is the same for some, but not all, variables.

The methodology used to assess the effectiveness of H is quite simple. We assume that every indication must be a single regular instruction, hence we use additional criteria in the case of formulas of the second and third kinds described above, in order to obtain a strict ordering. If H outputs the same value for some variables, we first order the variables by the number of occurrences in the formula; if the ordering is still not strict, we give priority to variables for which the sum of the depths of all occurrences of literals in the syntactic tree is smaller; finally, where the above criteria do not provide a strict ordering, the order is chosen at random.
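The tie-breaking cascade above can be sketched as a single stable sort; this is a schematic rendering under our own encoding (the dictionaries `H_val`, `occurrences` and `depth_sum` are illustrative names):

```python
import random

def indication(variables, H_val, occurrences, depth_sum, seed=0):
    """Produce a strict ordering of variables for the (cut) instruction:
    higher H first, then more occurrences, then smaller sum of occurrence
    depths; any remaining ties are broken at random."""
    shuffled = variables[:]
    random.Random(seed).shuffle(shuffled)  # random tie-break for fully tied variables
    # sorted() is stable, so the shuffle above decides only the residual ties
    return sorted(shuffled, key=lambda p: (-H_val[p], -occurrences[p], depth_sum[p]))
```

For example, with H-values {p: 1.0, q: 0.625, r: 0.625}, equal occurrence counts for q and r, and a smaller depth sum for q, the instruction is p, q, r.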

We used three evaluating functions to assess the quality of indications. Each function takes as arguments a formula and the ST for this formula indicated by our heuristics.

**Table 3.** The third column gives the number of formulas satisfying the characteristic presented in the first and the second column. The further three columns display values averaged over the sets. *F*<sup>1</sup> indicates how often we indicate an optimal ST. *F*<sup>2</sup> reports the error of our indication, calculated as the difference in size between the indicated ST and an optimal one. Finally, POT indicates proximity to an optimal ST in a standardized way.

The first function (F<sup>1</sup> in Table 3) outputs 1 if the indicated ST is of optimal size, and 0 otherwise. The second function (F<sup>2</sup> in Table 3) outputs the difference between the size of the indicated ST and the optimal size. The third function is called *proximity to optimal tableau*, POT<sub>A</sub> in symbols:

$$\mathsf{POT}\_A(\mathcal{T}) = 1 - \frac{|\mathcal{T}| - \min\_A}{\max\_A - \min\_A}$$

where T is the ST for formula A indicated by H, |T| is the size of T, max<sub>A</sub> is the size of an ST for A of maximal size, and min<sub>A</sub> is the size of an optimal ST for A. Later on we skip the relativization to A. Let us observe that the value (|T| − min)/(max − min) represents the error of an indication relative to the ST range of a formula, and in this sense POT<sub>A</sub> can be considered a standardized measure of the quality of an indication. Finally, the values of each of the three evaluating functions were calculated for sets of formulas by taking average values over all formulas in the set.
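The POT measure is straightforward to compute; a minimal sketch (function and parameter names are illustrative):

```python
def pot(size, min_size, max_size):
    """POT_A(T) = 1 - (|T| - min_A) / (max_A - min_A).
    Assumes max_size > min_size: formulas with ST range 0 are excluded."""
    return 1 - (size - min_size) / (max_size - min_size)

print(pot(2, 2, 32))   # an indicated ST of optimal size scores 1.0
print(pot(32, 2, 32))  # an indicated ST of maximal size scores 0.0
```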

The results of the three functions presented in Table 3 show that optimal STs are indicated less often for formulas with greater k; however, the POT values seem to remain stable across all data, indicating that, on average, proximity of the indicated ST to the optimal ones does not depend on k or l.

Further analysis showed that the factor that most influenced the efficiency of our methodology was whether there is at least one value 1 among the dp-values of literals for a formula A. We shall write 'Max(dp) = 1' if this is the case, and 'Max(dp) < 1' otherwise (we skip the relativisation to A for simplicity). For formulas with Max(dp) = 1, the results of the evaluating functions were much better; for example, the value of the POT function for formulas with l = 12 was 0.979 if Max(dp) = 1, and 0.814 for those with Max(dp) < 1; in the case of formulas with l = 23, k = 3, 4 those values were 0.968 and 0.869, respectively, and for formulas with l = 23, k = 5 they were 0.974 and 0.901, respectively. This shows that our methodology works significantly worse if Max(dp) < 1; on the other hand, if Max(dp) = 1, the dp measure works very well. It should also be pointed out that the difference between the POT values for the two groups is smaller for formulas with greater l and k.

**Fig. 2.** Distribution of the difference between the indicated and an optimal ST in relation to the ST range. Every point corresponds to a formula; the points are slightly jittered in order to improve readability. Each chart corresponds to a different dataset; formulas with *k* = 3 are excluded; additionally, the colour indicates whether Max(dp) = 1 for a formula.

Figure 2 presents a scatter plot that gives an idea of the whole distribution of the values of the POT function in relation to the ST range. Each formula on the plot is represented by a point, the colours additionally indicating whether Max(dp) < 1. The chart suggests, similarly to Table 3, that the method continues to work well as the values of l and k rise, indicating STs that are on average equally close to the optimal ones.

One can point to two possible explanations of the fact that our methodology works worse for formulas with Max(dp) < 1. Firstly, if, *e.g.*, dp(A, p) = 2<sup>−m</sup>, we only obtain the information that, apart from p, m more occurrences of components of A are required in order to synthesize the whole formula. Secondly, the function H neglects the complex dependencies between the various aggregated occurrences of a given variable, taking into account only the number of occurrences of literals in an aggregated group. However, considering the very low computational complexity of the method based on the dp values and the function H, the outlined framework seems to provide good heuristics for indicating small STs. Methods that would reflect more aspects of the complex structure of logical formulas would likely require much more computational resources.

On a final note, we would like to add that exploration of the data allowed us to study properties of formulas that went beyond the scope of the optimisation of STs. The data was used in a similar way as in so-called Experimental Mathematics, where numerous instances are analysed and visualized in order, *e.g.*, to gain insight, search for new patterns and relationships, test conjectures and introduce new concepts (see *e.g.* [1]).


**Table 4.** The pigeonhole principle

#### **8 The Pigeonhole Principle**

Finally, we consider the propositional version of the pigeonhole principle introduced by Cook and Reckhow in [3, p. 43]. In the field of proof complexity the principle was used to prove that resolution is *intractable*, that is, any resolution proof of the propositional pigeonhole principle must be of exponential size (with respect to the size of the formula). This was proved by Haken in [10]; see also [2].

Here is PHP<sup>m</sup> in the propositional version:

$$\bigwedge\_{0 \le i \le m} \bigvee\_{0 \le j < m} p\_{i,j} \to \bigvee\_{0 \le i < n \le m} \bigvee\_{0 \le j < m} (p\_{i,j} \wedge p\_{n,j})$$

where ⋀ and ⋁ stand for generalized conjunction and disjunction, respectively, with the ranges indicated beneath.
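A symbolic rendering of PHP<sup>m</sup> makes its size explicit; the following sketch uses our own tuple encoding of the variables p<sub>i,j</sub> (pigeon i, hole j), which is not the paper's implementation:

```python
from itertools import combinations

def php(m):
    """Build PHP_m symbolically as (antecedent conjuncts, consequent disjuncts).
    Variable ('p', i, j): pigeon i (0..m) sits in hole j (0..m-1)."""
    # antecedent: every one of the m+1 pigeons sits in some of the m holes
    antecedent = [[('p', i, j) for j in range(m)] for i in range(m + 1)]
    # consequent: some two distinct pigeons i < n share a hole j
    consequent = [(('p', i, j), ('p', n, j))
                  for i, n in combinations(range(m + 1), 2)
                  for j in range(m)]
    return antecedent, consequent

ante, cons = php(3)
print(len(ante), len(cons))  # 4 conjuncts and C(4,2) * 3 = 18 collision disjuncts
```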

The pigeonhole principle is constructed with perfect symmetry in the roles played by the consecutive variables. Each variable has the same number of occurrences in the formula, each of them gets the same value under H, and they also have occurrences at the same depths of the syntactic tree. All this means that on our account we can only suggest a random, regular ST. However, it is worth noticing that, first, H behaves consistently with the structure of the formula, and second, the result is still attractive. In Table 4 the fourth column presents the size of the ST indicated by our heuristics, that is, in fact, generated by a random ordering of the variables. It is to be contrasted with the number 2<sup>k</sup> in the last column, describing the size of a canonical ST for the formula, which is at the same time the number of rows in a truth table for the formula. The minimal STs for the formulas were found with pen and paper, and they are irregular.

#### **9 Summary and Further Work**

We presented a proof method of Synthetic Tableaux for CPL and explained how the efficiency of tableau construction depends on the choices of variables to apply (cut) to. We defined possible algorithms to choose the variables and experimentally tested their efficiency.

Our plan for further research is well defined: to implement heuristics capable of producing instructions for irregular STs. We already have an algorithm, yet it is untested.

As far as proof-theoretical aims are concerned, the next task is to extend and adjust the framework to the first-order level, based on the already described ST system for first-order logic [14]. We also wish to examine the efficiency of our indications in propositional non-classical logics for which the ST method exists (see [20,22]). In the area of data analysis, another possible step would be to perform more complex statistical analyses using, e.g., machine learning methods.

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Modal Logics**

# **Paraconsistent Gödel Modal Logic**

Marta Bílková<sup>1</sup>, Sabine Frittella<sup>2</sup>, and Daniil Kozhemiachenko<sup>2(B)</sup>

<sup>1</sup> The Czech Academy of Sciences, Institute of Computer Science, Prague, Czech Republic

bilkova@cs.cas.cz

<sup>2</sup> INSA Centre Val de Loire, Univ. Orléans, LIFO EA 4022, Bourges, France {sabine.frittella,daniil.kozhemiachenko}@insa-cvl.fr

**Abstract.** We introduce a paraconsistent modal logic **K**G<sup>2</sup>, based on Gödel logic with coimplication (bi-Gödel logic) expanded with a De Morgan negation ¬. We use the logic to formalise reasoning with graded, incomplete and inconsistent information. The semantics of **K**G<sup>2</sup> is two-dimensional: we interpret **K**G<sup>2</sup> on crisp frames with two valuations v<sub>1</sub> and v<sub>2</sub>, connected via ¬, that assign to each formula two values from the real-valued interval [0, 1]. The first (resp., second) valuation encodes the positive (resp., negative) information the state gives to a statement. We obtain that **K**G<sup>2</sup> is strictly more expressive than the classical modal logic **K** by proving that finitely branching frames are definable and by establishing a faithful embedding of **K** into **K**G<sup>2</sup>. We also construct a constraint tableau calculus for **K**G<sup>2</sup> over finitely branching frames, establish its decidability and provide a complexity evaluation.

**Keywords:** Constraint tableaux · Gödel logic · Two-dimensional logics · Modal logics

# **1 Introduction**

People believe many things. Sometimes, they even have contradictory beliefs. Sometimes, they believe in one statement more than in another. However, a person who has contradictory beliefs is not thereby bound to believe everything. Likewise, believing in φ *strictly more than* in χ does not make one believe in φ *completely*. These properties of beliefs are natural, and yet they are hardly expressible in classical modal logic. In this paper, we present a two-dimensional modal logic based on Gödel logic that can formalise beliefs taking these traits into account.

*Two-Dimensional Treatment of Uncertainty.* Belnap–Dunn four-valued logic (BD, or First Degree Entailment—FDE) [4,16,34] can be used to formalise reasoning with both incomplete and inconsistent information. In BD, formulas are evaluated on the De Morgan algebra **4** (Fig. 1, left) whose four values {t, f, b, n} encode the information available about the formula: true, false, both true and false, neither true nor false. b and n thus represent inconsistent and incomplete information, respectively. It is important to note that the values represent the available information about the statement, not its intrinsic truth or falsity. Furthermore, this approach essentially treats *evidence for* a statement (its positive support) as being independent of *evidence against* it (its negative support), which allows one to differentiate between 'absence of evidence' and 'evidence of absence'. The BD negation ¬ then swaps the positive and negative supports.

The research of Marta Bílková was supported by the grant 22-01137S of the Czech Science Foundation. The research of Sabine Frittella and Daniil Kozhemiachenko was funded by the grant ANR JCJC 2019, project PRELAP (ANR-19-CE48-0006). This research is part of the MOSAIC project financed by the European Union's Marie Skłodowska-Curie grant No. 101007627.

**Fig. 1. 4** (left) and its continuous extension [0, 1]<sup>⋈</sup> (right). (x, y) ≤ (x′, y′) in [0, 1]<sup>⋈</sup> iff x ≤ x′ and y ≥ y′.

The information regarding a statement, however, might itself not be crisp: after all, our sources are not always completely reliable. Thus, to capture this uncertainty, we extend **4** to the lattice [0, 1]<sup>⋈</sup> (Fig. 1, right). [0, 1]<sup>⋈</sup> is a twist product (cf. [37] for definitions) of [0, 1] with itself: the order on the second coordinate is reversed w.r.t. the order on the first coordinate. This captures the intuition behind the usual 'truth' (upwards) order: an agent is more certain of χ than of φ when the evidence for χ is stronger than the evidence for φ while the evidence against χ is weaker than the evidence against φ.
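The truth order on pairs of supports is easy to state operationally; a minimal sketch (the function name is ours):

```python
def leq_truth(pair1, pair2):
    """Truth order on the twist product of [0, 1] with itself:
    (x, y) <= (x', y') iff x <= x' and y >= y'
    (more evidence for, less evidence against)."""
    (x, y), (x2, y2) = pair1, pair2
    return x <= x2 and y >= y2

print(leq_truth((0.3, 0.6), (0.5, 0.2)))  # True: stronger support, weaker rejection
print(leq_truth((0.5, 0.3), (0.0, 0.0)))  # False: the two pairs are incomparable
```

Note that (0.5, 0.3) and (0, 0) are incomparable in either direction, matching the example of incomparable statements discussed below.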

Note that [0, 1]<sup>⋈</sup> is a bilattice whose left-to-right order can be interpreted as the information order. This links the logics we consider to the bilattice logics applied to reasoning in AI in [19] and then studied further in [24,35].

*Comparing Beliefs.* Uncertainty is manifested not only in the non-crisp character of the information. An agent might often lack the capacity to establish the concrete numerical value that represents their certainty in a given statement. Indeed, 'I am 43% certain that the wallet is Paula's' does not sound natural. On the other hand, it is reasonable to assume that the agents' beliefs can be compared in most contexts: neither 'I am more confident that the wallet is Paula's than that the wallet is Quentin's' nor 'Alice is more certain than Britney that Claire loves pistachio ice cream' requires us to give a concrete numerical representation of the (un)certainty.

These considerations lead us to choose the two-dimensional relative of Gödel logic dubbed G<sup>2</sup> as the propositional fragment of our logic. G<sup>2</sup> was introduced in [5] and is, in fact, an extension of Moisil's logic<sup>1</sup> from [31] with the prelinearity axiom (p → q) ∨ (q → p). As in the original Gödel logic G, the validity of a formula in G<sup>2</sup> depends not on the values of its constituent variables but on the relative order between them. In this sense, G is a logic of comparative truth. Thus, as we treat the positive and negative support of a given statement independently, G<sup>2</sup> is a logic of comparative truth and falsity. Note that while the values of two statements may not be comparable (say, p is evaluated as (0.5, 0.3) and q as (0, 0)), the coordinates of the values always are. We will see in Sect. 2 how we can formalise statements comparing agents' beliefs.

The sources available to the agents, as well as the references between these sources, can be represented as states in a Kripke model and its accessibility relation, respectively. It is important to mention that we account for the possibility that a source gives us contradictory information regarding some statement. Still, we want our reasoning with such information to be non-trivial. This is reflected by the fact that (p ∧ ¬p) → q is not valid in G<sup>2</sup>. Thus, the logic (treated as a set of valid formulas) lacks the explosion principle. In this sense, we call G<sup>2</sup> and its modal expansions 'paraconsistent'. This links our approach to other paraconsistent fuzzy logics such as the ones discussed in [17].

To reason with the information provided by the sources, we introduce two interdefinable modalities—□ and ♦—interpreted as infima and suprema w.r.t. the upwards order on [0, 1]<sup>⋈</sup>. We mostly assume (unless stated otherwise) that the accessibility relations in models are crisp. Intuitively, this means that the sources are either accessible or not (and, likewise, either refer to the other ones, or not).

*Broader Context.* This paper is part of the project introduced in [6] and carried on in [5], aiming to develop a modular logical framework for reasoning based on uncertain, incomplete and inconsistent information. We model agents who build their epistemic attitudes (like beliefs) based on information aggregated from multiple sources. □ and ♦ can then be viewed as two simple aggregation strategies: a pessimistic one (the infimum of the positive support and the supremum of the negative support), and an optimistic one (the dual strategy), respectively. They can be defined via one another using ¬ in the expected manner: □φ stands for ¬♦¬φ and ♦φ for ¬□¬φ. In this paper, in contrast to [15] and [6], we do allow modalities to nest.

The other part of our motivation comes from the work on modal Gödel logic (GK—in the notation of [36]) equipped with relational semantics [12,13,36]. There, the authors develop the proof and model theory of modal expansions of G interpreted over frames with both crisp and fuzzy accessibility relations. In particular, it was shown that the □-fragment<sup>2</sup> of GK lacks the finite model property (FMP) w.r.t. fuzzy frames while the ♦-fragment has the FMP<sup>3</sup> only w.r.t. fuzzy (but not crisp) frames. Furthermore, both the □ and ♦ fragments of GK are PSPACE-complete [28,29].

<sup>1</sup> This logic was introduced several times: by Wansing [38] as I4C<sub>4</sub> and then by Leitgeb [27] as HYPE. Cf. [33] for a recent and more detailed discussion.

<sup>2</sup> Note that □ and ♦ are not interdefinable in GK—cf. [36, Lemma 6.1] for details.

<sup>3</sup> There is, however, a semantics in [11] w.r.t. which bi-modal GK has the FMP.

Description Gödel logics, a notational variant of modal logics, have found their use in the field of knowledge representation [8–10], in particular in the representation of vague or uncertain data, which is not possible in classical ontologies. In this respect, our paper provides a further extension of the representable data types, as we model not only vague reasoning but also non-trivial reasoning with inconsistent information.

In the present paper, we expand the language with the Gödel coimplication to allow for the formalisation of statements expressing that an agent is *strictly more confident* in one statement than in another one (cf. Sect. 2 for the details). Furthermore, the presence of ¬ will allow us to simplify frame definability. Still, we will show that our logic is a conservative extension of GK<sup>c</sup>—the modal Gödel logic of crisp frames from [36] in the language with both □ and ♦.

*Logics.* We discuss several logics obtained from the propositional Gödel logic G. Our main interest is in the logic we denote **K**G<sup>2</sup>. It can be produced from G in several ways: (1) adding the De Morgan negation ¬ to obtain G<sup>2</sup> (in which case φ ≺ φ′ can be defined as ¬(¬φ′ → ¬φ)) and then further expanding the language with □ or ♦; (2) adding ≺ or Δ (Baaz' delta) to G, then both □ and ♦, thus acquiring **K**biG<sup>4</sup> (modal bi-Gödel logic), which is further enriched with ¬. These and other relations are given in Fig. 2.

**Fig. 2.** Logics in the article. ff stands for 'permitting fuzzy frames'. Subscripts on arrows denote language expansions. / stands for 'or' and comma for 'and'.

*Plan of the Paper.* The remainder of the paper is structured as follows. In Sect. 2, we define bi-Gödel algebras and use them to present **K**biG (on both fuzzy and crisp frames) and then **K**G<sup>2</sup> (on crisp frames), show how to formalise statements where the beliefs of agents are compared, and prove some semantical properties. In Sect. 3, we show that the ♦-fragment of **K**biG<sup>f</sup> (**K**biG on fuzzy frames) lacks the finite model property. We then present a finitely branching fragment of **K**G<sup>2</sup> (**K**G<sup>2</sup><sub>fb</sub>) and argue for its use in the representation of agents' beliefs. In Sect. 4, we design a constraint tableaux calculus for **K**G<sup>2</sup><sub>fb</sub> which we use to obtain the complexity results. Finally, in Sect. 5 we discuss further lines of research.

<sup>4</sup> To the best of our knowledge, the only work on bi-Gödel (symmetric Gödel) modal logic is [20]. There, the authors propose an expansion of biG with □ and ♦ equipped with a proof-theoretic interpretation and provide its algebraic semantics.

### **2 Language and Semantics**

In this section, we present the semantics for **K**biG (modal bi-Gödel logic) over both fuzzy and crisp frames and the one for **K**G<sup>2</sup> over crisp frames. Let Var be a countable set of propositional variables. The language biL<sup>¬</sup><sub>□,♦</sub> is defined via the following grammar.

$$\phi := p \in \mathsf{Var} \; | \; \neg \phi \; | \; (\phi \land \phi) \; | \; (\phi \lor \phi) \; | \; (\phi \to \phi) \; | \; (\phi \prec \phi) \; | \; \Box \phi \; | \; \Diamond \phi$$

Two constants, **0** and **1**, can be introduced in the traditional fashion: **0** := p ≺ p, **1** := p → p. Likewise, the Gödel negation can be defined as expected: ∼φ := φ → **0**. The ¬-less fragment of biL<sup>¬</sup><sub>□,♦</sub> is denoted by biL<sub>□,♦</sub>.

To facilitate the presentation, we introduce bi-G¨odel algebras.

**Definition 1.** *The bi-Gödel algebra* [0, 1]<sub>G</sub> = ([0, 1], 0, 1, ∧<sub>G</sub>, ∨<sub>G</sub>, →<sub>G</sub>, ≺<sub>G</sub>) *is defined as follows: for all* a, b ∈ [0, 1]*, the standard operations are given by* a ∧<sub>G</sub> b := min(a, b)*,* a ∨<sub>G</sub> b := max(a, b)*,*

$$a \to\_G b = \begin{cases} 1, & \text{if } a \le b \\ b & \text{else}, \end{cases} \qquad\qquad b \prec\_G a = \begin{cases} 0, & \text{if } b \le a \\ b & \text{else}. \end{cases}$$
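The two non-lattice operations of Definition 1 can be sketched directly; a minimal rendering (function names are ours):

```python
def impl_g(a, b):
    """Goedel implication: 1 if a <= b, else b."""
    return 1 if a <= b else b

def coimpl_g(b, a):
    """Goedel coimplication: 0 if b <= a, else b."""
    return 0 if b <= a else b

print(impl_g(0.3, 0.7), impl_g(0.7, 0.3))      # 1 0.3
print(coimpl_g(0.3, 0.7), coimpl_g(0.7, 0.3))  # 0 0.7
```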

#### **Definition 2.**


**Definition 3 (K**biG **models).** *A* **K**biG model *is a tuple* M = ⟨W, R, v⟩ *with* ⟨W, R⟩ *being a (crisp or fuzzy) frame, and* v : Var × W → [0, 1]*. The valuation* v *is extended to complex* biL<sub>□,♦</sub> *formulas as follows:*

$$v(\phi \circ \phi', w) = v(\phi, w) \circ\_{\mathbb{G}} v(\phi', w). \tag{\circ \in \{\land, \lor, \to, \prec\}})$$

*The interpretation of modal formulas on* fuzzy *frames is as follows:*

$$v(\Box \phi, w) = \inf\_{w' \in W} \{ wRw' \to\_G v(\phi, w') \}, \quad v(\Diamond \phi, w) = \sup\_{w' \in W} \{ wRw' \wedge\_G v(\phi, w') \}.$$

*On* crisp *frames, the interpretation is simpler (here,* inf(∅)=1 *and* sup(∅)=0*):*

$$v(\Box \phi, w) = \inf \{ v(\phi, w') : wRw' \}, \qquad v(\Diamond \phi, w) = \sup \{ v(\phi, w') : wRw' \}.$$

*We say that* φ ∈ biL<sub>□,♦</sub> *is* **K**biG valid on a frame F *(denoted* F |=<sub>**K**biG</sub> φ*) iff for any model* M *on* F *and any* w ∈ F*, it holds that* v(φ, w) = 1*.*

Note that the definitions of validity in GK<sup>c</sup> and GK coincide with those in **K**biG and **K**biG<sup>f</sup> if we consider the ≺-free fragment of biL<sub>□,♦</sub>.

As we have already mentioned, on *crisp* frames, the accessibility relation can be understood as the availability of (trusted or reliable) sources. In *fuzzy* frames, it can be thought of as the degree of trust one has in a source. Then, ♦φ represents the search for evidence from trusted sources that supports φ: v(♦φ, t) > 0 iff there is t′ s.t. tRt′ > 0 and v(φ, t′) > 0, i.e., there must be a source t′ in which t has a positive degree of trust and which has at least some certainty in φ. On the other hand, if no source is trusted by t (i.e., tRu = 0 for all u), then v(♦φ, t) = 0. Likewise, □χ can be construed as the search for evidence against χ given by trusted sources: v(□χ, t) < 1 iff there is a source t′ that gives χ less certainty than the trust t gives to t′. In other words, if t trusts no sources, or if all sources have at least as high confidence in χ as t has in them, then t fails to find a trustworthy enough counterexample.
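This reading of fuzzy frames can be replayed as a small computation following Definition 3. The frame, trust degrees, and valuation below are our own illustrative assumptions:

```python
def impl(a, b):                  # Goedel implication ->_G
    return 1.0 if a <= b else b

def box(R, W, val_phi, w):
    # v(Box phi, w) = inf over w' of R(w,w') ->_G v(phi, w'); inf(empty) = 1
    return min((impl(R.get((w, u), 0.0), val_phi[u]) for u in W), default=1.0)

def diamond(R, W, val_phi, w):
    # v(Diamond phi, w) = sup over w' of min(R(w,w'), v(phi, w')); sup(empty) = 0
    return max((min(R.get((w, u), 0.0), val_phi[u]) for u in W), default=0.0)

W = ['t', 's1', 's2']
R = {('t', 's1'): 0.9, ('t', 's2'): 0.4}     # degrees of trust of t in its sources
v_phi = {'t': 0.0, 's1': 0.3, 's2': 0.8}     # certainty of each state in phi

print(diamond(R, W, v_phi, 't'))   # max(min(0.9, 0.3), min(0.4, 0.8)) = 0.4
print(box(R, W, v_phi, 't'))       # min(0.9 ->_G 0.3, 0.4 ->_G 0.8) = 0.3
print(diamond(R, W, v_phi, 's1'))  # s1 trusts no source, so Diamond phi = 0.0
```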

**Definition 4 (K**G<sup>2</sup> **models).** *A* **K**G<sup>2</sup> model *is a tuple* M = ⟨W, R, v<sub>1</sub>, v<sub>2</sub>⟩ *with* ⟨W, R⟩ *being a* crisp *frame and* v<sub>1</sub>, v<sub>2</sub> : Var × W → [0, 1]*. The valuations, which we interpret as support of truth and support of falsity, respectively, are extended to complex formulas as follows.*

$$\begin{aligned}
v\_1(\neg \phi, w) &= v\_2(\phi, w) & v\_2(\neg \phi, w) &= v\_1(\phi, w)\\
v\_1(\phi \land \phi', w) &= v\_1(\phi, w) \land\_G v\_1(\phi', w) & v\_2(\phi \land \phi', w) &= v\_2(\phi, w) \lor\_G v\_2(\phi', w)\\
v\_1(\phi \lor \phi', w) &= v\_1(\phi, w) \lor\_G v\_1(\phi', w) & v\_2(\phi \lor \phi', w) &= v\_2(\phi, w) \land\_G v\_2(\phi', w)\\
v\_1(\phi \to \phi', w) &= v\_1(\phi, w) \to\_G v\_1(\phi', w) & v\_2(\phi \to \phi', w) &= v\_2(\phi', w) \prec\_G v\_2(\phi, w)\\
v\_1(\phi \prec \phi', w) &= v\_1(\phi, w) \prec\_G v\_1(\phi', w) & v\_2(\phi \prec \phi', w) &= v\_2(\phi', w) \to\_G v\_2(\phi, w)\\
v\_1(\Box \phi, w) &= \inf\{v\_1(\phi, w') : wRw'\} & v\_2(\Box \phi, w) &= \sup\{v\_2(\phi, w') : wRw'\}\\
v\_1(\Diamond \phi, w) &= \sup\{v\_1(\phi, w') : wRw'\} & v\_2(\Diamond \phi, w) &= \inf\{v\_2(\phi, w') : wRw'\}
\end{aligned}$$

*We say that* φ ∈ biL<sup>¬</sup><sub>□,♦</sub> *is* **K**G<sup>2</sup> valid on a frame F *(*F |=<sub>**K**G<sup>2</sup></sub> φ*) iff for any model* M *on* F *and any* w ∈ F*, it holds that* v<sub>1</sub>(φ, w) = 1 *and* v<sub>2</sub>(φ, w) = 0*.*
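A minimal Python sketch of the **K**G<sup>2</sup> clauses may help: values are pairs (support of truth, support of falsity), and the successor values used below are assumed for illustration only.

```python
def neg(p):                      # strong negation swaps the two supports
    return (p[1], p[0])

def conj(p, q):                  # v1: min of truths, v2: max of falsities
    return (min(p[0], q[0]), max(p[1], q[1]))

def box(succ_vals):              # values of phi at the R-successors of w
    v1 = min((p[0] for p in succ_vals), default=1.0)   # inf(empty) = 1
    v2 = max((p[1] for p in succ_vals), default=0.0)   # sup(empty) = 0
    return (v1, v2)

def diamond(succ_vals):
    v1 = max((p[0] for p in succ_vals), default=0.0)
    v2 = min((p[1] for p in succ_vals), default=1.0)
    return (v1, v2)

# Box does not trivialise contradictions: p AND ~p can have a non-trivial
# value at every successor, and so can Box(p AND ~p).
succ = [(0.6, 0.5), (0.7, 0.4)]
vals = [conj(p, neg(p)) for p in succ]   # [(0.5, 0.6), (0.4, 0.7)]
print(box(vals))                          # (0.4, 0.7)
```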

**Convention 1.** *In what follows, we will denote a pair of valuations* v1, v<sup>2</sup> *just with* v *if there is no risk of confusion. Furthermore, for each frame* F *and each* w ∈ F*, we denote*

> R(w) = {w′ : wRw′ = 1} (for fuzzy frames); R(w) = {w′ : wRw′} (for crisp frames).

**Convention 2.** *We will further denote with* **K**biG *the set of all formulas* **K**biG*-valid on all* crisp *frames; with* **K**biG<sup>f</sup> *the set of all formulas* **K**biG*-valid on all* fuzzy *frames; and with* **K**G<sup>2</sup> *the set of all formulas* **K**G<sup>2</sup>*-valid on all* crisp *frames.*

Before proceeding to establish some semantical properties, let us make two remarks. First, neither □ nor ♦ is trivialised by contradictions: in contrast to **K**, □(p ∧ ¬p) → □q is not **K**G<sup>2</sup> valid, and neither is ♦(p ∧ ¬p) → ♦q. Intuitively, this means that one can have contradictory but non-trivial beliefs. Second, we can formalise statements of comparative belief such as the ones we have already given before:

wallet: *I am more confident that the wallet is Paula's than that the wallet is Quentin's.*

ice cream: *Alice is more certain than Britney that Claire loves pistachio ice cream.*

For this, consider the following defined operators.

$$
\Delta \tau := \sim(\mathbf{1} \prec \tau) \tag{1}
$$

$$
\Delta^{\frown} \phi := \sim(\mathbf{1} \prec \phi) \land \neg \sim \sim(\mathbf{1} \prec \phi) \tag{2}
$$

It is clear that for any τ ∈ biL<sub>□,♦</sub> and φ ∈ biL<sup>¬</sup><sub>□,♦</sub> interpreted on **K**biG and **K**G<sup>2</sup> models, respectively, it holds that

$$v(\Delta \tau, w) = \begin{cases} 1 & \text{if } v(\tau, w) = 1 \\ 0 & \text{otherwise}, \end{cases} \qquad v(\Delta^\frown \phi, w) = \begin{cases} (1, 0) & \text{if } v(\phi, w) = (1, 0) \\ (0, 1) & \text{otherwise}. \end{cases} \tag{3}$$

Now we can define formulas that express order relations between values of two formulas both for **K**biG and **K**G<sup>2</sup>.

For **K**biG they look as follows:

$$\begin{aligned} v(\tau, w) &\le v(\tau', w) \text{ iff } v(\Delta(\tau \to \tau'), w) = 1, \\ v(\tau, w) &> v(\tau', w) \text{ iff } v\left(\sim \Delta(\tau \to \tau'), w\right) = 1. \end{aligned}$$

In **K**G<sup>2</sup>, the orders are defined in a more complicated way:

$$\begin{aligned} v(\phi, w) &\le v(\phi', w) \text{ iff } v(\Delta^\frown(\phi \to \phi'), w) = (1, 0), \\ v(\phi, w) &> v(\phi', w) \text{ iff } v(\Delta^\frown(\phi' \to \phi) \land \sim \Delta^\frown(\phi \to \phi'), w) = (1, 0). \end{aligned}$$

Observe, first, that both in **K**biG and **K**G<sup>2</sup> the relation 'the value of τ (φ) is less than or equal to the value of τ′ (φ′)' is defined as 'τ → τ′ (φ → φ′) has the designated value'. In **K**biG, the strict order is just the negation of the non-strict order since all values are comparable. In contrast, the strict order in **K**G<sup>2</sup> is not a simple negation of the non-strict order since **K**G<sup>2</sup> is essentially two-dimensional. We provide further details in Remark 2.
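The **K**biG order formulas can be checked numerically. The sketch below (our illustration) implements Δτ := ∼(**1** ≺ τ) and uses the fact that Δ(τ → τ′) is designated exactly when v(τ) ≤ v(τ′):

```python
def impl(a, b):                   # Goedel implication ->_G
    return 1.0 if a <= b else b

def coimpl(b, a):                 # co-implication PREC_G
    return 0.0 if b <= a else b

def sneg(a):                      # Goedel negation ~a = a -> 0
    return impl(a, 0.0)

def delta(t):                     # Baaz' delta: ~(1 PREC t)
    return sneg(coimpl(1.0, t))

def leq(t1, t2):                  # Delta(t1 -> t2) = 1  iff  v(t1) <= v(t2)
    return delta(impl(t1, t2))

def gt(t1, t2):                   # ~Delta(t1 -> t2) = 1  iff  v(t1) > v(t2)
    return sneg(delta(impl(t1, t2)))

assert delta(1.0) == 1.0 and delta(0.99) == 0.0
# In KbiG all values are comparable: Delta(p -> q) OR Delta(q -> p) is always 1.
for p, q in [(0.2, 0.8), (0.8, 0.2), (0.5, 0.5)]:
    assert max(leq(p, q), leq(q, p)) == 1.0
```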

Finally, we can formalise wallet as follows. We interpret 'I am confident' as □ and substitute 'the wallet is Paula's' with p and 'the wallet is Quentin's' with q. Now, we just use the definition of > in biL<sup>¬</sup><sub>□,♦</sub> to get

$$
\Delta^{\frown}(\Box q \to \Box p) \land \sim \Delta^{\frown}(\Box p \to \Box q).\tag{4}
$$

For ice cream, we need two different modalities: □<sub>a</sub> and □<sub>b</sub> for Alice and Britney, respectively. Replacing 'Claire loves pistachio ice cream' with p, we get

$$
\Delta^\frown(\Box\_bp \to \Box\_ap) \land \sim \Delta^\frown(\Box\_ap \to \Box\_bp).\tag{5}
$$

*Remark 1.* Δ is called Baaz' delta (cf., e.g., [3] for more details). Intuitively, Δτ can be interpreted as 'τ has the designated value' and acts much like a necessity modality: if τ is **K**biG valid, then so is Δτ; moreover, Δ(p → q) → (Δp → Δq) is valid. Furthermore, Δ and ≺ can be defined via one another in **K**biG; thus the addition of ≺ to G makes it more expressive and allows us to define both strict and non-strict orders.

*Remark 2.* Recall that we mentioned in Sect. 1 that an agent should usually be able to compare their beliefs in different statements: this is reflected by the fact that Δ(p → q) ∨ Δ(q → p) is **K**biG valid. This can be counter-intuitive, however, if the contents of the beliefs have nothing in common.

This drawback is avoided if we treat support of truth and support of falsity independently. Here is where a difference between **K**biG and **K**G<sup>2</sup> lies. In **K**G<sup>2</sup>, we can *only compare the values of formulas coordinate-wise*, whence Δ<sup>⌢</sup>(p → q) ∨ Δ<sup>⌢</sup>(q → p) is not **K**G<sup>2</sup> valid. E.g., if we set v(p, w) = (0.7, 0.6) and v(q, w) = (0.4, 0.2), then v(p, w) and v(q, w) are not comparable w.r.t. the truth (upward) order on [0, 1]<sup>2</sup>.

We end this section by establishing some useful semantical properties.

**Proposition 1.** F |=**K**G<sup>2</sup> φ *iff for any model* M *on* F *and any* w∈F*,* v1(φ, w)=1*.*

*Proof.* The 'if' direction is evident from the definition of validity. We show the 'only if' part. It suffices to show that the following statement holds for any φ and w ∈ F:

> *for any* v *with* v(p, w) = (x, y)*, let* v\*(p, w) = (1−y, 1−x)*. Then* v(φ, w) = (x, y) *iff* v\*(φ, w) = (1−y, 1−x)*.*

We proceed by induction on φ. The proof of the propositional cases is identical to the one in [5, Proposition 5]. We consider only the case of φ = □ψ since □ and ♦ are interdefinable.

Let v(□ψ, w) = (x, y). Then inf{v<sub>1</sub>(ψ, w′) : wRw′} = x and sup{v<sub>2</sub>(ψ, w′) : wRw′} = y. Now, we apply the induction hypothesis to ψ: if v(ψ, s) = (x′, y′), then v\*(ψ, s) = (1−y′, 1−x′) for any s ∈ R(w). But then inf{v\*<sub>1</sub>(ψ, w′) : wRw′} = 1 − y and sup{v\*<sub>2</sub>(ψ, w′) : wRw′} = 1 − x, as required.

Now, assume that v<sub>1</sub>(φ, w) = 1 for any v<sub>1</sub> and w. We show that v<sub>2</sub>(φ, w) = 0 for any w and v<sub>2</sub>. Assume for contradiction that v<sub>2</sub>(φ, w) = y > 0 but v<sub>1</sub>(φ, w) = 1. Then v\*(φ, w) = (1−y, 1−1) = (1−y, 0). But since y > 0, v\*<sub>1</sub>(φ, w) = 1−y < 1, which contradicts the assumption that v<sub>1</sub>(φ, w) = 1 under every valuation.

**Proposition 2.**

*1.* **K**biG *(resp.* **K**biG<sup>f</sup>*) is a conservative extension of* GK<sup>c</sup> *(resp.* GK*): for any* ≺*-free* φ ∈ biL<sub>□,♦</sub> *and any frame* F*,* F |=<sub>GK<sup>c</sup></sub> φ *iff* F |=<sub>**K**biG</sub> φ*.*

*2.* **K**G<sup>2</sup> *is conservative over* **K**biG*: for any* φ ∈ biL<sub>□,♦</sub> *and any crisp frame* F*,* F |=<sub>**K**biG</sub> φ *iff* F |=<sub>**K**G<sup>2</sup></sub> φ*.*


*Proof.* 1. follows directly from the semantic conditions of Definition 3. We consider 2. The 'only if' direction is straightforward since the semantic conditions of v<sup>1</sup> in **K**G<sup>2</sup> models and v in **K**biG models coincide. The 'if' direction follows from Proposition 1: if φ is valid on F, then v(φ, w) = 1 for any w ∈ F and any v on F. But then, v1(φ, w) = 1 for any w ∈ F. Hence, F |=**K**G<sup>2</sup> φ.

### **3 Model-Theoretic Properties of KG<sup>2</sup>**

In the previous section, we have seen how the addition of ≺ allows us to formalise statements concerning the comparison of beliefs. Here, we will show that both the □ and ♦ fragments of **K**biG, and hence **K**G<sup>2</sup>, are strictly more expressive than the classical modal logic **K**, i.e., that they can define all classically definable classes of crisp frames as well as some undefinable ones.

**Definition 5 (Frame definability).** *Let* Σ *be a set of formulas.* Σ defines *a class of frames* <sup>K</sup> *in a logic* **<sup>L</sup>** *iff it holds that* <sup>F</sup> <sup>∈</sup> <sup>K</sup> *iff* <sup>F</sup> <sup>|</sup>=**<sup>L</sup>** <sup>Σ</sup>*.*

The next statement follows from Proposition 2 since **K** can be faithfully embedded in GK<sup>c</sup> by substituting each variable <sup>p</sup> with ∼∼<sup>p</sup> (cf. [28,29] for details).

**Theorem 1.** *Let* K *be a class of frames definable in* **K***. Then,* K *is definable in* **K**biG *and* **K**G<sup>2</sup>*.*

**Theorem 2.** *1. Let* F *be* crisp*. Then* F *is finitely branching (i.e.,* R(w) *is finite for every* w ∈ F*) iff* F |=<sub>**K**biG</sub> **1** ≺ ♦((p ≺ q) ∧ q)*.*

*2. Let* F *be* fuzzy*. Then* F *is finitely branching and* sup{wRw′ : wRw′ < 1} < 1 *for all* w ∈ F *iff* F |=<sub>**K**biG</sub> **1** ≺ ♦((p ≺ q) ∧ q)*.*

*Proof.* We show the case of fuzzy frames since the crisp ones can be tackled in the same manner. Assume that F is finitely branching and that sup{wRw′ : wRw′ < 1} < 1 for all w ∈ F. It suffices to show that v(♦((p ≺ q) ∧ q), w) < 1 for all w ∈ F. First of all, observe that there is no w′ ∈ F s.t. v((p ≺ q) ∧ q, w′) = 1. It is then clear that sup<sub>wRw′<1</sub>{v((p ≺ q) ∧ q, w′) ∧<sub>G</sub> wRw′} < 1 and that

$$\sup\{v((p\prec q)\land q, w') : wRw'=1\} = \max\{v((p\prec q)\land q, w') : wRw'=1\} < 1$$

since R(w) is finite. But then v(♦((p ≺ q) ∧ q), w) < 1 as required.

For the converse, either (1) R(w) is infinite for some w, or (2) sup{wRw′ : wRw′ < 1} = 1 for some w. For (1), set v(p, w′) = 1 for every w′ ∈ R(w). Now pick a countably infinite W′ ⊆ R(w) with W′ = {w<sub>i</sub> : i ∈ {1, 2, ...}} and set v(q, w<sub>i</sub>) = <sup>i</sup>⁄<sub>i+1</sub>. It is easy to see that sup{v(q, w<sub>i</sub>) : w<sub>i</sub> ∈ W′} = 1 and that v((p ≺ q) ∧ q, w<sub>i</sub>) = v(q, w<sub>i</sub>). Therefore, v(**1** ≺ ♦((p ≺ q) ∧ q), w) = 0.

For (2), we let v(p, w′) = 1 and, further, v(q, w′) = wRw′ for all w′ ∈ F. Now since sup{wRw′ : wRw′ < 1} = 1 and v((p ≺ q) ∧ q, w′) = v(q, w′) for all w′ with wRw′ < 1, it follows that v(♦((p ≺ q) ∧ q), w) = 1, whence v(**1** ≺ ♦((p ≺ q) ∧ q), w) = 0.
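The counterexample for case (1) can be replayed numerically: no single successor makes (p ≺ q) ∧ q fully true, yet the supremum over infinitely many successors still reaches 1. Below is a finite truncation of the construction, as an illustration:

```python
def coimpl(b, a):                # b PREC_G a: 0 if b <= a, else b
    return 0.0 if b <= a else b

# On each successor w_i of the infinitely branching state:
# v(p, w_i) = 1 and v(q, w_i) = i/(i+1).
def value_at(i):
    p, q = 1.0, i / (i + 1)
    return min(coimpl(p, q), q)  # v((p PREC q) AND q, w_i) = v(q, w_i)

# No successor reaches 1, but the supremum does, so on the infinitely
# branching frame v(Diamond((p PREC q) AND q), w) = 1.
vals = [value_at(i) for i in range(1, 1000)]
print(max(vals) < 1.0, max(vals))   # True 0.999
```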

*Remark 3.* An obvious corollary of Theorem 2 is the lack of the FMP for the ♦ fragment of **K**biG<sup>f</sup>,<sup>5</sup> since ♦((p ≺ q) ∧ q) is never true in a finite model. This differentiates **K**biG<sup>f</sup> from GK since the ♦-fragment of GK *has* the FMP [12, Theorem 7.1]. Moreover, one can define finitely branching frames in the □ fragments of GK and GK<sup>c</sup>: indeed, ∼∼□(p ∨ ∼p) serves as such a definition.

**Corollary 1. K**G<sup>2</sup> *and both the* □ *and* ♦ *fragments of* **K**biG *are strictly more expressive than* **K***.*

*Proof.* From Theorems 1 and 2 since **K** is complete both w.r.t. all frames and all finitely branching frames. The result for **K**G<sup>2</sup> follows since it is conservative over **K**biG (Proposition 2).

<sup>5</sup> Bi-modal **K**biG<sup>f</sup> lacks the FMP since it is a conservative extension of GK.

These results show us that the addition of ≺ greatly enhances the expressive power of our logic. Here it is instructive to remind ourselves that classical epistemic logics are usually complete w.r.t. finitely branching frames (cf. [18] for details). This is reasonable since, for practical reasoning, agents cannot consider infinitely many alternatives. In our case, however, if we wish to use **K**biG and **K**G<sup>2</sup> for knowledge representation, we need to *impose* finite branching explicitly.

Furthermore, allowing for infinitely branching frames in **K**biG or **K**G<sup>2</sup> leads to counter-intuitive consequences. In particular, it is possible that v(□φ, w) = (0, 1) even though there are no w′, w′′ ∈ R(w) s.t. v<sub>1</sub>(φ, w′) = 0 or v<sub>2</sub>(φ, w′′) = 1. In other words, there is no source that decisively falsifies φ; furthermore, all sources have some evidence *for* φ, and yet we somehow believe that φ is completely false and untrue. Dually, it is possible that v(♦φ, w) = (1, 0) although there are no w′, w′′ ∈ R(w) s.t. v<sub>1</sub>(φ, w′) = 1 or v<sub>2</sub>(φ, w′′) = 0. Even though ♦ is an 'optimistic' aggregation, it should not ignore the fact that *all* sources have some evidence *against* φ but *none* supports it completely.

Of course, this situation is impossible if we consider only finitely branching frames, for there infima and suprema become minima and maxima. There, all values of modal formulas are *witnessed* by some accessible states in the following sense: for ♥ ∈ {□, ♦} and i ∈ {1, 2}, if v<sub>i</sub>(♥φ, w) = x, then there is w′ ∈ R(w) s.t. v<sub>i</sub>(φ, w′) = x. Intuitively speaking, finitely branching frames represent the situation where our degree of certainty in some statement is based solely on the data given by the sources.

**Convention 3.** *We will further use* **K**biG<sub>fb</sub> *and* **K**G<sup>2</sup><sub>fb</sub> *to denote the sets of all* biL<sub>□,♦</sub> *and* biL<sup>¬</sup><sub>□,♦</sub> *formulas valid on finitely branching crisp frames.*

Observe, moreover, that □ and ♦ are still undefinable via one another in biL<sub>□,♦</sub>. The proof is the same as that of [36, Lemma 6.1].

**Proposition 3.** □ *and* ♦ *are not interdefinable in* **K**biG<sub>fb</sub>*.*

#### **Corollary 2.**

*1.* □ *and* ♦ *are not interdefinable in* **K**biG*,* **K**biG<sup>f</sup><sub>fb</sub>*, and* **K**biG<sup>f</sup>*.*

*2. Both the* □ *and* ♦ *fragments of* **K**biG *are more expressive than* **K***.*

In the remainder of the paper, we are going to provide a complete proof system for **K**G<sup>2</sup><sub>fb</sub> (and hence **K**biG<sub>fb</sub>) and establish its decidability and complexity as well as the finite model property. Note, however, that the latter cannot be taken entirely for granted. In fact, several expected ways of defining filtration (cf. [7,14] for more details thereon) fail.

Let Σ ⊆ biL<sub>□,♦</sub> be closed under subformulas. If we want to have filtration for **K**biG<sub>fb</sub>, there are three intuitive ways to define a relation ∼<sub>Σ</sub> on the carrier of a model that is supposed to relate states satisfying the same formulas.

1. w ∼<sup>1</sup><sub>Σ</sub> w′ iff v(φ, w) = v(φ, w′) for all φ ∈ Σ.
2. w ∼<sup>2</sup><sub>Σ</sub> w′ iff v(φ, w) = 1 ⇔ v(φ, w′) = 1 for all φ ∈ Σ.
3. w ∼<sup>3</sup><sub>Σ</sub> w′ iff v(φ, w) ≤ v(φ′, w) ⇔ v(φ, w′) ≤ v(φ′, w′) for all φ, φ′ ∈ Σ ∪ {**0**, **1**}.

Consider the model in Fig. 3 and the following two formulas:

$$\phi^{\leq} := {\sim}{\sim}(p \to \Diamond p) \qquad\qquad \phi^{>} := {\sim}{\sim}(p \prec \Diamond p)$$

Now let Σ be the set of all subformulas of φ<sup>≤</sup> ∧ φ<sup>></sup>.

First of all, it is clear that v(φ<sup>≤</sup> ∧ φ<sup>></sup>, w) = 1 for any w ∈ M. Observe now that all states in M are *distinct* w.r.t. ∼<sup>1</sup><sub>Σ</sub>. Thus, the first way of constructing the carrier of the new model does not give the FMP.

$$
\mathfrak{M}: w\_1 \longrightarrow w\_2 \longrightarrow \dots \longrightarrow w\_n \longrightarrow \dots
$$

**Fig. 3.** $v(p, w\_n) = \frac{1}{n+1}$

As regards ∼<sup>2</sup><sub>Σ</sub> and ∼<sup>3</sup><sub>Σ</sub>, one can check that for any w, w′ ∈ M, it holds that w ∼<sup>2</sup><sub>Σ</sub> w′ and w ∼<sup>3</sup><sub>Σ</sub> w′. So, if we construct a filtration of M using equivalence classes of either of these two relations, the carrier of the resulting model is going to be finite. Even more, it is going to be a singleton.

However, we can show that there is *no finite model* N = ⟨U, S, e⟩ s.t.

$$\forall s \in \mathfrak{N}: e(\phi^{\leq} \wedge \phi^{>}, s) = 1.$$

Indeed, e(φ<sup>≤</sup>, t) = 1 iff e(p, t′) > 0 for some t′ ∈ S(t), while e(φ<sup>></sup>, t) = 1 iff e(p, t) > e(p, t′) for any t′ ∈ S(t). Now, if U is finite, we have two options: either (1) there is u ∈ U s.t. S(u) = ∅, or (2) U contains a finite S-cycle.

For (1), note that e(♦p, u) = 0, and we have two options: if e(p, u) = 0, then e(φ<sup>></sup>, u) = 0; if, on the other hand, e(p, u) > 0, then e(φ<sup>≤</sup>, u) = 0. For (2), assume w.l.o.g. that the S-cycle looks as follows: u<sub>0</sub>Su<sub>1</sub>Su<sub>2</sub> ... Su<sub>n</sub>Su<sub>0</sub>.

If e(p, u<sub>0</sub>) = 0, then e(φ<sup>></sup>, u<sub>0</sub>) = 0; so e(p, u<sub>0</sub>) > 0. Furthermore, e(p, u<sub>i</sub>) > e(p, u<sub>i+1</sub>) for each i, for otherwise e(φ<sup>></sup>, u<sub>i</sub>) = 0. But then e(p, u<sub>0</sub>) > e(p, u<sub>1</sub>) > ... > e(p, u<sub>n</sub>) > e(p, u<sub>0</sub>), which is impossible.

But this means that ∼<sup>2</sup><sub>Σ</sub> and ∼<sup>3</sup><sub>Σ</sub> do not preserve the truth of formulas from w to [w]<sub>Σ</sub>, i.e., neither of these two relations can be used to define filtration. Thus, in order to explicitly prove the finite model property and establish complexity bounds for **K**biG<sub>fb</sub> and **K**G<sup>2</sup><sub>fb</sub>, we will provide a tableaux calculus. It will also serve as a decision procedure for the satisfiability and validity of formulas.
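The failure on finite models can also be replayed computationally. The sketch below (our illustration) evaluates φ<sup>≤</sup> and φ<sup>></sup>, read as ∼∼(p → ♦p) and ∼∼(p ≺ ♦p) to match the semantic conditions above, on a finite prefix of the chain from Fig. 3; the last state, having no successors, always falsifies the conjunction:

```python
def impl(a, b):   return 1.0 if a <= b else b    # ->_G
def coimpl(b, a): return 0.0 if b <= a else b    # PREC_G
def sneg(a):      return impl(a, 0.0)            # ~a = a -> 0

def check_chain(n):
    # Finite prefix of Fig. 3: v(p, w_k) = 1/(k+2), each w_k sees w_{k+1}.
    p = [1.0 / (k + 2) for k in range(n)]
    ok = []
    for k in range(n):
        succ = p[k + 1:k + 2]                    # at most one successor
        dia = max(succ, default=0.0)             # Diamond p; sup(empty) = 0
        phi_le = sneg(sneg(impl(p[k], dia)))     # ~~(p -> Diamond p)
        phi_gt = sneg(sneg(coimpl(p[k], dia)))   # ~~(p PREC Diamond p)
        ok.append(min(phi_le, phi_gt) == 1.0)
    return ok

# Both formulas hold at every state except the last: there Diamond p = 0
# while p > 0, so phi<= fails.
print(check_chain(5))   # [True, True, True, True, False]
```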

### **4 Tableaux for KG<sup>2</sup><sub>fb</sub>**

Usually, proof theory for modal and many-valued logics is presented in one of several forms. The first is a Hilbert-style axiomatisation as given, e.g., in [23] for the propositional Gödel logic and in [12,13,36] for its modal expansions. Hilbert calculi are useful for establishing frame correspondence results as well as for showing that one logic extends another in the same language. On the other hand, their completeness proofs might be quite complicated, and the proof search is not at all straightforward. Second, there are non-labelled sequent and hypersequent calculi (cf. [30] for the propositional proof systems and [28,29] for the modal hypersequent calculi). With regards to modal logics, completeness proofs of (hyper)sequent calculi often provide the answer to the decidability problem. Furthermore, the proof search can be quite straightforwardly automatised provided that the calculus is *cut-free*.

Finally, there are proof systems that directly incorporate semantics: in particular, tableaux (e.g., the ones for Gödel logics [2] and tableaux for Łukasiewicz description logic [25]) and labelled sequent calculi (cf., e.g., [32] for labelled sequent calculi for classical modal logics). Because of the calculi's nature, their completeness proofs are usually simple. Besides, such calculi serve as a decision procedure that either establishes that the given formula is valid or provides an explicit countermodel.

Our tableaux system T(**K**G<sup>2</sup><sub>fb</sub>) is a straightforward modal expansion of the constraint tableaux for G<sup>2</sup> presented in [5]. It is inspired by constraint tableaux for Łukasiewicz logics from [21,22] (but cf. [26] for an approach similar to ours), which we modify with two-sorted labels corresponding to the support of truth and support of falsity in the model. This idea comes from the tableaux for the Belnap–Dunn logic by D'Agostino [1]. Moreover, since **K**G<sup>2</sup><sub>fb</sub> is a conservative extension of **K**biG<sub>fb</sub>, our calculus can be used for that logic as well if we apply only the rules that govern the support of truth of biL<sub>□,♦</sub> formulas.

**Definition 6 (**T(**K**G<sup>2</sup><sub>fb</sub>)**).** *We fix a set of state-labels* W *and let* ◁ ∈ {<, ≤} *and* ▷ ∈ {>, ≥}*. Let further* w ∈ W*,* **x** ∈ {1, 2}*,* φ ∈ biL<sup>¬</sup><sub>□,♦</sub>*, and* c ∈ {0, 1}*. A* structure *is either* w:**x**:φ *or* c*. We denote the set of structures with* Str*.*

*We define a* constraint tableau *as a downward branching tree whose branches are sets containing the following types of entries:*

*–* relational constraints *of the form* wRw′ *with* w, w′ ∈ W*;*

*–* structural constraints *of the form* X ≤ X′*,* X < X′*,* X ≥ X′*, or* X > X′ *with* X, X′ ∈ Str*.*

*Each branch can be extended by an application of a rule*<sup>6</sup> *from Fig. 4 or Fig. 5. A tableau's branch* B *is* closed *iff one of the following conditions applies:*


*A tableau is* closed *iff all its branches are closed. We say that there is a* tableau proof *of* φ *iff there is a closed tableau starting from the constraint* w:1:φ < 1*. An open branch* B *is* complete *iff the following condition is met.*

*\** If all premises of a rule occur on B*, then one of its conclusions*<sup>7</sup> *occurs on* B*.*

*Remark 4.* Note that due to Proposition 1, we need to check only one valuation of φ to verify its validity.

**Convention 4 (Interpretation of constraints).** *The following table gives the interpretations of structural constraints, using* ≤ *as an example.*

<sup>6</sup> If X < 1 and X < X′ (or 0 < X and X′ < X) occur on B, then the rules are applied only to X < X′.

<sup>7</sup> Note that branching rules have *two* conclusions.

**Fig. 4.** Propositional rules of T(**K**G<sup>2</sup><sub>fb</sub>). Bars denote branching.


As one can see from Fig. 4 and Fig. 5, the rules follow the semantical conditions from Definition 4. Let us discuss →<sub>1</sub> and □<sub>1</sub> in more detail.

The premise of →<sub>1</sub> is interpreted as v<sub>1</sub>(φ → φ′, w) ≤ X. To decompose the implication, we check two options: either X ≥ 1 (in which case the constraint puts no restriction on the value of φ → φ′) or X < 1. In the second case, we use the semantics to obtain that v<sub>1</sub>(φ′, w) ≤ X and v<sub>1</sub>(φ, w) > v<sub>1</sub>(φ′, w).

**Fig. 5.** Modal rules of T(**K**G<sup>2</sup><sub>fb</sub>). w′′ is fresh on the branch.

In order to apply □<sub>1</sub> to w:1:□φ ≤ X, we introduce a new state w′′ that is seen by w. Since we work in a finitely branching model, w′′ can witness the value of □φ. Thus, we add w′′:1:φ ≤ X.

We also provide an example of how our tableaux work. In Fig. 6, one can see a successful proof on the left and a failed proof on the right.

**Fig. 6.** A closed tableau for **1** ≺ ♦((p ≺ q) ∧ q) (left) and a failed proof of □p → □□p (right). × indicates closed branches.

**Definition 7 (Branch realisation).** *We say that a model* M = ⟨W, R, v<sub>1</sub>, v<sub>2</sub>⟩ *with* W = {w : w *occurs on* B} *and* R = {⟨w, w′⟩ : wRw′ ∈ B} realises a branch B *of a tree iff the following conditions are met.*

*–* v<sub>**x**</sub>(φ, w) ≤ v<sub>**x**′</sub>(φ′, w′) *for any* w:**x**:φ ≤ w′:**x**′:φ′ ∈ B *with* **x**, **x**′ ∈ {1, 2}*;*
*–* v<sub>**x**</sub>(φ, w) ≤ c *for any* w:**x**:φ ≤ c ∈ B *with* c ∈ {0, 1}*.*

**Theorem 3 (Completeness).** φ *is* **K**G<sup>2</sup><sub>fb</sub> *valid iff it has a* T(**K**G<sup>2</sup><sub>fb</sub>) *proof.*

*Proof.* We consider only the **K**G<sup>2</sup><sub>fb</sub> case since **K**biG<sub>fb</sub> can be handled the same way. For soundness, we check that if the premise of a rule is realised, then so is at least one of its conclusions. We consider the cases of →<sub>1</sub> and □<sub>1</sub>. Assume that w:1:φ → φ′ ≤ X is realised and assume w.l.o.g. that X = u:2:ψ. It is clear that either v<sub>2</sub>(ψ, u) = 1 or v<sub>2</sub>(ψ, u) < 1. In the first case, X ≥ 1 is realised. In the second case, we have that v<sub>1</sub>(φ, w) > v<sub>1</sub>(φ′, w) and v<sub>1</sub>(φ′, w) ≤ v<sub>2</sub>(ψ, u). Thus, X < 1, w:1:φ > w:1:φ′, and w:1:φ′ ≤ u:2:ψ are realised as well, as required.

For □<sub>1</sub>, assume that w:1:□φ ≤ X is realised and assume w.l.o.g. that X = u:2:ψ. Thus, v<sub>1</sub>(□φ, w) ≤ v<sub>2</sub>(ψ, u). Then, since the model is finitely branching, there is an accessible state w′′ s.t. v<sub>1</sub>(φ, w′′) ≤ v<sub>2</sub>(ψ, u). Thus, w′′:1:φ ≤ X is realised too.

As no closed branch is realisable, the result follows.

For completeness, we show that every complete open branch B is realisable. We construct the model as follows. We let W = {w : w occurs in B} and set R = {⟨w, w′⟩ : wRw′ ∈ B}. Now, it remains to construct suitable valuations.

For i ∈ {1, 2}, if w:i:p ≥ 1 ∈ B, we set v<sub>i</sub>(p, w) = 1, and if w:i:p ≤ 0 ∈ B, we set v<sub>i</sub>(p, w) = 0. To set the values of the remaining variables q<sub>1</sub>, ..., q<sub>n</sub>, we proceed as follows. Denote by B<sup>+</sup> the transitive closure of B under ≤ and let

$$[w: \mathbf{x}: q\_i] = \left\{ w': \mathbf{x'}: q\_j \; \middle| \begin{array}{l} w: \mathbf{x}: q\_i \leqslant w': \mathbf{x'}: q\_j \in \mathcal{B}^+ \text{ and } w: \mathbf{x}: q\_i < w': \mathbf{x'}: q\_j \notin \mathcal{B}^+ \\ \text{or} \\ w: \mathbf{x}: q\_i \geqslant w': \mathbf{x'}: q\_j \in \mathcal{B}^+ \text{ and } w: \mathbf{x}: q\_i > w': \mathbf{x'}: q\_j \notin \mathcal{B}^+ \end{array} \right\}$$

It is clear that there are at most 2 · n · |W| classes [w:**x**:q<sub>i</sub>] since the only possible loop in B<sup>+</sup> is of the form w<sub>i₁</sub>:**x**:r ≤ ... ≤ w<sub>i₁</sub>:**x**:r, and in such a loop all elements belong to [w<sub>i₁</sub>:**x**:r]. We put [w:**x**:q<sub>i</sub>] ≺ [w′:**x**′:q<sub>j</sub>] iff there are w<sub>k</sub>:**x**:r ∈ [w:**x**:q<sub>i</sub>] and w′<sub>k</sub>:**x**′:r′ ∈ [w′:**x**′:q<sub>j</sub>] s.t. w<sub>k</sub>:**x**:r < w′<sub>k</sub>:**x**′:r′ ∈ B<sup>+</sup>.

We now set the valuation of these variables as follows

$$v\_{\mathbf{x}}(q\_i, w) = \frac{|\{ [w' : \mathbf{x}' : q'] \mid [w' : \mathbf{x}' : q'] \prec [w : \mathbf{x} : q\_i] \} |}{2 \cdot n \cdot |W|}$$

Note that if some φ contains a variable s but B<sup>+</sup> contains no inequality with it, the above definition ensures that s is evaluated to 0. Thus, all constraints containing only variables are satisfied.

It remains to show that all other constraints are satisfied. For that, we prove that if at least one conclusion of a rule is satisfied, then so is its premise. The propositional cases are straightforward and can be tackled in the same manner as in [5, Theorem 2]. We consider only the case of ♦<sub>2</sub>. Assume w.l.o.g. that the constraint is w:2:♦φ ≤ u:1:ψ. Since B is complete, if w:2:♦φ ≤ u:1:ψ ∈ B, then for any w′ s.t. wRw′ ∈ B, we have w′:2:φ ≤ u:1:ψ ∈ B, and all of these are realised by M. But then w:2:♦φ ≤ u:1:ψ is realised too, as required.

**Theorem 4.**

*1.* **K**G<sup>2</sup><sub>fb</sub> *has the finite model property: a falsifiable formula* φ *can be falsified in a model whose depth is bounded by the number of nested modalities in* φ *and whose size is at most exponential in* |φ|*, and this bound is sharp.*

*2. Validity and satisfiability*<sup>8</sup> *of* **K**G<sup>2</sup><sub>fb</sub> *formulas are PSPACE-complete.*


*Proof.* We begin with 1. By Theorem 3, if φ is *not* **K**G<sup>2</sup><sub>fb</sub> valid, we can build a falsifying model using the tableaux. It is also clear from the rules in Fig. 5 that the depth of the constructed model is bounded from above by the maximal number of nested modalities in φ. The width of the model is bounded by the maximal number of modalities on the same level of nesting. The sharpness of the bound is obtained using the embedding of **K** into **K**G<sup>2</sup><sub>fb</sub> since **K** is complete w.r.t. finitely branching models and it is possible to force shallow trees of exponential size in **K** (cf., e.g., [7, §6.7]). The embedding also entails PSPACE-hardness. It remains to tackle membership.

First, observe from the proof of Theorem 3 that φ(p<sub>1</sub>, ..., p<sub>n</sub>) is satisfiable (falsifiable) on M = ⟨W, R, v<sub>1</sub>, v<sub>2</sub>⟩ iff there are v<sub>1</sub> and v<sub>2</sub> that give the variables values from V = {0, <sup>1</sup>⁄<sub>2·n·|W|</sub>, ..., <sup>2·n·|W|−1</sup>⁄<sub>2·n·|W|</sub>, 1} under which φ is satisfied (falsified).

As we mentioned, |W| is bounded from above by k<sup>k+1</sup> with k being the number of modalities in φ. Therefore, we replace structural constraints with labelled formulas of the form w:i:φ = v (v ∈ V), avoiding comparisons of values of formulas in different states. As expected, we close a branch if it contains w:i:ψ = v and w:i:ψ = v′ for v ≠ v′.

Now we replace the rules with new ones that work with labelled formulas instead of structural constraints. Below, we give as examples the new rules for → and ♦<sup>9</sup> (with |V| = m + 1):

$$\frac{w:1:\phi \to \phi' = 1}{w:1:\phi = 0 \;\Big|\; w:1:\phi = \frac{1}{m+1};\, w:1:\phi' = \frac{1}{m+1} \;\Big|\; \cdots \;\Big|\; w:1:\phi = \frac{m-1}{m+1};\, w:1:\phi' = \frac{m}{m+1} \;\Big|\; w:1:\phi' = 1}$$

$$\frac{w:1:\Diamond\phi = \frac{r}{m+1}}{wRw'';\; w'':1:\phi = \frac{r}{m+1}}$$

<sup>8</sup> Satisfiability and falsifiability (non-validity) are reducible to each other using -: φ is satisfiable iff ∼∼(<sup>φ</sup> - **<sup>0</sup>**) is falsifiable; <sup>φ</sup> is falsifiable iff ∼∼(**<sup>1</sup>** φ) is satisfiable.

<sup>9</sup> Intuitively, for a value 1 > v > 0 of ♦φ at w, we add a new state that witnesses v, and for a state on the branch, we guess a value smaller than v. Other modal rules can be rewritten similarly.

We now show how to build a satisfying model for φ using polynomial space. We begin with w<sub>0</sub>:1:φ = 1 and start applying propositional rules (first, those that do not require branching). If we apply a branching rule, we pick one branch and work only with it: until the branch is closed, in which case we pick another one; until no more rules are applicable (then the model is constructed); or until we need to apply a modal rule to proceed. At this stage, we need to store only the subformulas of φ, with labels denoting their values at w<sub>0</sub>.

Now we guess a modal formula (say, w<sub>0</sub>:2:χ = $\frac{1}{m+1}$) whose decomposition requires the introduction of a new state (w<sub>1</sub>) and apply this rule. Then we apply all modal rules that use w<sub>0</sub>Rw<sub>1</sub> as a premise (again, if those require branching, we guess only one branch) and start from the beginning with the propositional rules. If we reach a contradiction, the branch is closed. Again, the only new entries to store are subformulas of φ (now with fewer modalities), their values at w<sub>1</sub>, and a relational term w<sub>0</sub>Rw<sub>1</sub>. Since the depth of the model is O(|φ|) and since we work with modal formulas one by one, we need to store subformulas of φ with their values O(|φ|) times, so we need only O(|φ|<sup>2</sup>) space.

Finally, if no rule is applicable and there is no contradiction, we mark w<sub>0</sub>:2:χ = $\frac{1}{m+1}$ as 'safe'. Now we *delete all entries of the tableau below it* and pick another unmarked modal formula that requires the introduction of a new state. Dealing with these one by one allows us to construct the model branch by branch. But since the length of each branch of the model is bounded by O(|φ|) and since we delete *branches of the model* once they are shown to contain no contradictions, we need only polynomial space.
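The branch-by-branch exploration described above is essentially a depth-first search that retains only the current root-to-leaf path. A minimal sketch (ours, abstracting away the actual tableau rules; a nested list stands in for the model, with boolean leaves marking open/closed branches):

```python
def check(tree):
    """Depth-first exploration keeping only the current branch in
    memory.  `tree` is a nested list; a boolean leaf says whether that
    branch is open (True) or closed (False)."""
    if isinstance(tree, bool):
        return tree
    for child in tree:              # discharge modal demands one by one
        if not check(child):
            return False            # contradiction: the branch closes
        # `child`'s subtree is now forgotten ("marked safe"); only the
        # path back to the root remains stored
    return True
```

Memory use is proportional to the depth of the current branch, mirroring the O(|φ|²) bound: the recursion stack holds one frame per level, and completed subtrees are discarded.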

We end the section with two simple observations. First, Theorems 3 and 4 are applicable both to **K**biG<sub>fb</sub> and **K**G<sup>2</sup><sub>fb</sub> because the latter is conservative over the former. Second, since **K**G<sup>2</sup> and **K**biG are conservative over GK<sup>c</sup> and since **K** can be embedded in GK<sup>c</sup>, the lower complexity bounds for the classical modal logic of a given class of frames K and for the G<sup>2</sup> modal logic of K coincide.

## **5 Concluding Remarks**

In this paper, we developed a crisp modal expansion of the two-dimensional Gödel logic G<sup>2</sup>, as well as an expansion of bi-Gödel logic with □ and ♦, both for crisp and fuzzy frames. We also established their connections with modal Gödel logics and gave a complexity analysis of their finitely branching fragments.

The next steps are to study the proof theory of **K**G<sup>2</sup> and **K**G<sup>2</sup><sub>fb</sub>, both in the form of Hilbert-style and sequent calculi, and to establish the decidability (or lack thereof) of **K**G<sup>2</sup>. Moreover, the two-dimensional treatment of information invites different modalities, e.g. those formalising the aggregation strategies given in [6]: in particular, the cautious one (where the agent takes minima/infima of *both* positive and negative supports of a given statement) and the confident one (whereby the maxima/suprema are taken). Last but not least, while in this paper we assumed that our *access* to sources is crisp, one can argue that the *degree of our bias* towards a given source can be formalised via *fuzzy* frames. Thus, it would be instructive to construct a fuzzy version of **K**G<sup>2</sup>.

In a broader perspective, we plan to provide a general treatment of two-dimensional modal logics of uncertainty. Indeed, within our project [5,6], we are formalising reasoning with heterogeneous and possibly incomplete or inconsistent information (such as crisp or fuzzy data, personal beliefs, etc.) in a modular fashion. This modularity is required because different contexts should be treated with different logics: not only can the information itself be of various natures, but the reasoning strategies of different agents, even applied to the same data, are not necessarily the same. Thus, since we wish to account for this diversity, we should be able to combine different logics in our approach.

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Non-associative, Non-commutative Multi-modal Linear Logic**

Eben Blaisdell1, Max Kanovich2, Stepan L. Kuznetsov3,4, Elaine Pimentel2(B) , and Andre Scedrov<sup>1</sup>

<sup>1</sup> Department of Mathematics, University of Pennsylvania, Philadelphia, USA <sup>2</sup> Department of Computer Science, University College London, London, UK elaine.pimentel@gmail.com


**Abstract.** Adding multi-modalities (called *subexponentials*) to linear logic enhances its power as a logical framework, which has been extensively used in the specification of, *e.g.*, proof systems, programming languages, and bigraphs. Initially, subexponentials allowed for classical, linear, affine, or relevant behaviors. Recently, this framework was enhanced to allow for commutativity as well. In this work, we close the cycle by considering associativity. We show that the resulting system (acLLΣ) admits the (multi)cut rule, and we prove two undecidability results for fragments/variations of acLLΣ.

# **1 Introduction**

Resource-aware logics have been an object of passionate study for quite some time now. The motivations for this passion vary: resource consciousness is adequate for modeling steps of computation; the logics have interesting algebraic semantics; the calculi have nice proof-theoretic properties; multi-modalities allow for the specification of several behaviors; there are many interesting applications in linguistics; etc.

With this variety of subjects, applications, and views, it is not surprising that different groups developed different systems based on different principles. For example, the Lambek calculus (L) [29] was introduced for the mathematical modeling of natural language syntax, and it extends a basic categorial grammar [3,4] by a concatenation operator. Linear logic (LL) [16], originally discovered by Girard through a semantical analysis of the models of the polymorphic λ-calculus, turned out to be a refinement of classical and intuitionistic logic, having the dualities of the former and the constructive properties of the

The work of Max Kanovich was partially supported by EPSRC Programme Grant EP/R006865/1: "Interface Reasoning for Interacting Systems (IRIS)."

The work of Stepan L. Kuznetsov was supported by the Theoretical Physics and Mathematics Advancement Foundation "BASIS" and partially performed within the framework of HSE University Basic Research Program and within the project MK-1184.2021.1.1 "Algorithmic and Semantic Questions of Modal Extensions of Linear Logic" funded by the Ministry of Science and Higher Education of Russia.

Elaine Pimentel acknowledges partial support by the MOSAIC project (EU H2020-MSCA-RISE-2020 Project 101007627).

latter. The key point is the presence of the *modalities* !, ?, called *exponentials* in LL. In the intuitionistic version of LL, denoted by ILL, only the ! exponential is present.

L and LL were compared in [2], where Abrusci showed that the Lambek calculus coincides with a variant of the non-commutative, multiplicative version of ILL [41]. This correspondence can be lifted to include the additive connectives: the full (multiplicative-additive) Lambek calculus FL relates to the non-commutative multiplicative-additive version of ILL, here denoted by cLL.

In this paper we propose the sequent-based system acLLΣ, a conservative extension of cLL, where associativity is allowed only for formulas marked with a special kind of modality, determined by a *subexponential signature* Σ. The notation adopted is modular, uniform, and scalable, in the sense that many well-known systems appear as fragments or special cases of acLLΣ, obtained by only modifying the signature Σ. The core fragment of acLLΣ (*i.e.*, without the subexponentials) corresponds to the non-associative version of the full Lambek calculus, FNL [8].<sup>1</sup>

The language of acLLΣ consists of a denumerably infinite set of propositional variables {p, q, r, . . .}, the units {1, ⊤}, the binary connectives for additive conjunction and disjunction {&, ⊕}, the non-commutative multiplicative conjunction ⊗, the non-commutative linear implications {→, ←}, and the unary subexponentials !<sup>i</sup>, with i belonging to a pre-ordered set of labels (I, ⪯).

Roughly speaking, subexponentials [13] are substructural multi-modalities. In LL, !A indicates that the linear formula A behaves *classically*, that is, it can be contracted *and* weakened. Labeling ! with indices allows moving one step further: the set I can be partitioned so that, in !<sup>i</sup>A, A can be contracted *and/or* weakened. This allows for two other types of behavior (other than classical or linear): affine (only weakening) or relevant (only contraction). Pre-ordering the labels (together with an upward-closure requirement) guarantees cut-elimination [42]. But then, why consider only weakening and contraction? Why not also take into account other structural properties, like commutativity or associativity? In [20,21] commutativity was added to the picture, so that in !<sup>i</sup>A, A can be contracted, weakened, classical, or linear, but it may also commute with neighboring formulas. In this work we consider the last missing part: associativity.
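The four behaviors can be summarised by which of C and W a label admits. The sketch below is ours (names included) and merely restates the classification:

```python
def behaviour(axioms):
    """Structural behaviour of !^i as a function of f(i) ∩ {C, W}."""
    has_c, has_w = "C" in axioms, "W" in axioms
    if has_c and has_w:
        return "classical"   # contraction and weakening
    if has_c:
        return "relevant"    # contraction only
    if has_w:
        return "affine"      # weakening only
    return "linear"          # neither
```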

Smoothly extending cLL to the non-associative case is non-trivial. It requires a structural recasting of sequents: we pass from sets/multisets to lists in the non-commutative case, and on to trees in the case of non-associativity [28]. As a consequence, the inference rules should act deeply over formulas in tree-structured sequents, which can be tricky in the presence of modalities [17].

On the other side, the multi-modal Lambek calculi introduced in [35,45] and extended/compiled/implemented in [18,36–38]<sup>2</sup> use different *families of connectives and contexts*, distinguished by means of indices, or *modes*. Contexts are indexed binary trees, with formulas built from the indexed adjoint connectives {→<sub>i</sub>, ←<sub>i</sub>} and ⊗<sub>i</sub> (*e.g.*,

<sup>1</sup> The multiplicative fragment of acLL<sup>Σ</sup> is the non-associative version of Lambek's calculus, NL, introduced by Lambek himself in [30]. Both the associative calculus L and the non-associative calculus NL have their advantages and disadvantages for the analysis of natural language syntax, as we discuss in more detail in Sect. 2.2.

<sup>2</sup> The Grail family of theorem provers [37] works with a variety of modern type-logical frameworks, including multimodal type-logical grammars.

(A →<sub>i</sub> B, (C ⊗<sub>j</sub> D, H)<sub>k</sub>)<sub>i</sub>). Each mode has its own set of logical rules (following the same rule scheme), and different structural features can be combined via the mode information on the formulas. This gives the resulting system a multi-modal flavor, but it also results in a language of *binary connectives*, determined by the modes. This forces an unfortunate second-level synchronization between implications and tensor, and modalities act over whole *sequents*, not on single *formulas*.

In order to attribute particular resource-management properties to individual resources, explicit (classical) multi-modalities ◊<sub>i</sub>, □<sub>i</sub> were proposed in [27,33]. While such unary modalities were inspired by the LL exponentials, the resemblance stops there. First of all, the logical connectives come together with structural constructors for contexts, which turns ◊<sub>i</sub>, □<sub>i</sub> into truncated forms of product and implication.

Second, ◊<sub>i</sub>, □<sub>i</sub> have a *temporal behavior*, in the sense that ◊□F ⇒ F and F ⇒ □◊F, which are not provable in LL under the "natural interpretation" ◊ = ?, □ = !.

In this paper, multi-modality is *totally local*, given by the subexponentials. The signature Σ contains the pre-ordered set of labels, together with a function stating which axioms, among weakening, contraction, exchange, and associativity, are assumed for each label. Sequents have a *nested structure*, corresponding to trees of formulas, and rules are applied deeply in such structures. This not only gives the LL-based system a more modern presentation (based on nested systems, as *e.g.* in [10,15]), but it also brings the notation closer to the one adopted by the Lambek community, as in [25]. Finally, it also uniformly extends several LL-based systems present in the literature, as Example 8 in the next section shows.

Designing a good system serves more than pure proof-theoretic interests: well-behaved, neat proof systems can be used to approach several important problems, such as interpolation, complexity, and decidability. And the decidability of extensions/variants/fragments of L and LL is a fascinating subject of study, since the presence or absence of substructural properties/connectives may completely change the outcome. Indeed, it is well known that LL is undecidable [32], but adding weakening (affine LL) makes the system decidable [24], while removing the additives (MELL – multiplicative, exponential LL) reaches the border of knowledge: its decidability is a long-standing open problem [50]. Non-associativity also alters decidability and complexity: L is NP-complete [47], while NL is decidable in polynomial time [1,6]. Finally, the number of subexponentials also plays a role in decision problems: MELL with two subexponentials is undecidable [9].

In this work, we present two undecidability results, both orbiting (but not encompassing) MELL/FNL. First, we show that acLLΣ containing the multiplicatives ⊗, →, the additive ⊕, and one classical subexponential (allowing contraction and weakening) is undecidable. This refines an unpublished result by Tanaka [51], which states that FNL plus one fully-powered subexponential is undecidable.

In the second undecidability result, we keep two subexponentials, but in a minimalist configuration: the implicational fragment of the logic, with a "main" subexponential allowing contraction, exchange, and associativity (weakening is optional) and an "auxiliary" one allowing only associativity. This is a variation of Chaudhuri's result (in the non-associative, non-commutative case), making use of fewer connectives (tensor is not needed) and less powerful subexponentials.


**Table 1.** Acronyms/decidability of systems mentioned in the paper.

The rest of the paper is organized as follows: Sect. 2 presents the system acLLΣ, showing that it has the cut-elimination property and presenting an example in linguistics; Sect. 3 shows the undecidability results; and Sect. 4 concludes the paper.

Table 1 collects the acronyms and decidability status of all considered systems. Decidability for the cases marked with "−" depends on the signature Σ.

### **2 A Nested System for Non-associativity**

As with modal connectives, the exponential ! in ILL is not *canonical* [13], in the sense that if i ≠ j then !<sup>i</sup>F ≢ !<sup>j</sup>F. Intuitively, this means that we can mark the exponential with *labels* taken from a set I organized in a pre-order (*i.e.*, a reflexive and transitive relation), obtaining (possibly infinitely many) exponentials (!<sup>i</sup> for i ∈ I). Also as in multi-modal systems, the pre-order determines the provability relation: for a general formula F, !<sup>b</sup>F *implies* !<sup>a</sup>F iff a ⪯ b.

The algebraic structure of subexponentials, combined with their intrinsic structural properties, allows for the proposal of rich linear-logic-based frameworks. This opened an avenue for proposing different multi-modal substructural logical systems, which have found a number of different applications. Originally [42], subexponentials could assume only the weakening and contraction axioms:

$$\mathbf{C}: \quad !^i F \to !^i F \otimes !^i F \qquad \mathsf{W}: \quad !^i F \to 1$$

This allows the specification of systems with multiple contexts, which may be represented by sets or multisets of formulas [44], as well as the specification and verification of concurrent systems [43], and biological systems [46]. In [20,21], non-commutative systems allowing commutative subexponentials were presented:

$$\mathsf{E}: \quad (!^i F) \otimes G \equiv G \otimes (!^i F)$$

and this has many applications, *e.g.*, in linguistics [21].

In this work, we present a non-commutative, non-associative linear-logic-based system and add the possibility of assuming associativity<sup>3</sup>

$$\mathsf{A1}: \ !^{i}F \otimes (G \otimes H) \equiv (!^{i}F \otimes G) \otimes H \qquad \mathsf{A2}: \ (G \otimes H) \otimes\, !^{i}F \equiv G \otimes (H \otimes\, !^{i}F)$$

as well as commutativity and other structural properties.

We start by presenting an adaptation of the simply dependent multimodal linear logics (SDML) of [31] to the non-associative/non-commutative case.

The language of non-commutative SDML is that of (propositional intuitionistic) linear logic with subexponentials [21] supplied with the *left residual*; equivalently, that of FL with subexponentials. Non-associative contexts are organized via binary trees, here called *structures*.

**Definition 1 (Structured sequents).** Structures *are formulas or pairs containing structures:*

$$\Gamma, \Delta := F \mid (\Gamma, \Gamma)$$

*where the constructors may be empty but never a singleton.*

*An* n-ary context Γ{ }<sub>1</sub> ... { }<sub>n</sub> *is a context that contains* n *pairwise distinct numbered holes* { }<sub>k</sub> *wherever a formula may otherwise occur. Given* n *contexts* Γ<sub>1</sub>,...,Γ<sub>n</sub>*, we write* Γ{Γ<sub>1</sub>}···{Γ<sub>n</sub>} *for the context where the* k*-th hole in* Γ{ }<sub>1</sub> ... { }<sub>n</sub> *has been replaced by* Γ<sub>k</sub> *(for* 1 ≤ k ≤ n*). If* Γ<sub>k</sub> = ∅*, the hole is removed.*

*A* structured sequent *(or simply* sequent*) has the form* Γ ⇒ F *where* Γ *is a structure and* F *is a formula.*

*Example 2.* Structures are binary trees, with formulas as leaves and commas as nodes. The structure (!<sup>i</sup>A, (B, C)) represents the tree below left, while ((!<sup>i</sup>A, B), C) represents the tree below right.
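Structures in the sense of Definition 1 can be encoded directly as nested pairs. A small sketch (the encoding and the helper are ours, not the paper's):

```python
def leaves(s):
    """All formulas (leaves) of a structure, left to right.  A structure
    is a string (formula) or a pair of structures (a comma node)."""
    if isinstance(s, tuple):
        left, right = s
        return leaves(left) + leaves(right)
    return [s]

# Example 2: same leaves, different tree shapes.
t_left = ("!i A", ("B", "C"))
t_right = (("!i A", "B"), "C")
```

The two trees carry the same formulas in the same order but are distinct structures, which is precisely what non-associativity records.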

**Definition 3 (SDML).** *Let* A *be a set of axioms. A (non-associative/non-commutative)* simply dependent multimodal logical system *(*SDML*) is given by a triple* Σ = (I, ⪯, f)*, where* I *is a set of indices,* (I, ⪯) *is a pre-order, and* f *is a mapping from* I *to* 2<sup>A</sup>*.*

*If* Σ *is an* SDML*, then the* logic described by Σ *has the modality* !<sup>i</sup> *for every* i ∈ I*, with the rules of* FNL *depicted in Fig. 1, together with rules for the axioms* f(i) *and the interaction axioms* !<sup>j</sup>A → !<sup>i</sup>A *for every* i, j ∈ I *with* i ⪯ j*. Finally, every* SDML *is assumed to be upwardly closed w.r.t.* ⪯*, that is, if* i ⪯ j *then* f(i) ⊆ f(j) *for all* i, j ∈ I*.*
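The upward-closure condition of Definition 3 is directly checkable on a finite signature. A sketch (the function is ours, with the pre-order and axiom assignment passed as callables):

```python
def upward_closed(I, preceq, f):
    """Check the closure condition of Definition 3:
    i ⪯ j implies f(i) ⊆ f(j), for all i, j in I."""
    return all(f(i) <= f(j) for i in I for j in I if preceq(i, j))
```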

<sup>3</sup> Note that the implemented rules in Fig. 2 reflect the left to right direction of such axioms only.

Figure 2 presents the structured system acLLΣ for the logic described by the SDML determined by Σ, with A = {C, W, A1, A2, E}, where, in the subexponential rule for S ∈ A, the respective s ∈ I is such that S ∈ f(s) (*e.g.*, the subexponential symbol e indicates that E ∈ f(e)). We denote by !<sub>Ax</sub>Δ the fact that the structure Δ contains only banged formulas as leaves, each of them assuming the axiom Ax.

As an economical notation, we write ↑i for the *upset* of the index i, *i.e.*, the set {j ∈ I : i ⪯ j}. We extend this notation to structures in the following way. Let Γ be a structure containing only banged formulas as leaves. If these formulas admit the multiset partition

$$\{!^j F \in \Gamma : i \preceq j \} \ \cup \ \{!^k F \in \Gamma : i \not\preceq k \text{ and } \mathsf{W} \in f(k) \}$$

then Γ<sup>↑i</sup> is the structure obtained from Γ by erasing the formulas in the second component of the partition (equivalently, the substructure of Γ formed by all and only the formulas of the first component). Otherwise, Γ<sup>↑i</sup> is undefined.

*Example 4.* Let Γ = (!<sup>i</sup>A, (!<sup>j</sup>B, !<sup>k</sup>C)) be represented below left, with i ⪯ j but i ⋠ k, and W ∈ f(k). Then Γ<sup>↑i</sup> = (!<sup>i</sup>A, !<sup>j</sup>B), depicted below right.

Observe that, if W ∉ f(k), then Γ<sup>↑i</sup> cannot be built. In this case, any derivation of Γ ⇒ !<sup>i</sup>(A ⊗ B) cannot start with an application of the promotion rule !<sup>i</sup>R (similarly to how promotion in ILL cannot be applied in the presence of non-classical contexts). In this case, if A, B are atomic, the sequent is not provable.
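The construction of Γ↑i can be replayed on Example 4. The sketch below (ours) flattens the structure to a list of (label, formula) leaves, which loses the tree shape but shows the partition:

```python
def upset_restrict(gamma, i, preceq, f):
    """Leaves of Γ↑i: keep each !^j F with i ⪯ j; erase each !^k F
    with i ⋠ k provided W ∈ f(k); otherwise Γ↑i is undefined (None)."""
    kept = []
    for (j, formula) in gamma:
        if preceq(i, j):
            kept.append((j, formula))
        elif "W" not in f(j):
            return None          # the partition fails: Γ↑i undefined
        # else: the formula is weakenable, so it is erased
    return kept
```

On Example 4's Γ with W ∈ f(k), this keeps !<sup>i</sup>A and !<sup>j</sup>B; removing W from f(k) makes the result undefined, blocking promotion as noted above.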

*Example 5.* The use of subexponentials to deal with associativity can be illustrated by the prefixing sequent A → B ⇒ (C → A) → (C → B): it is not provable for an arbitrary formula C, but if C = !<sup>a</sup>C′, then

$$\dfrac{\dfrac{\dfrac{\dfrac{\overline{\,!^aC'\Rightarrow\,!^aC'\,}\;\mathsf{init}\qquad\dfrac{\overline{\,A\Rightarrow A\,}\;\mathsf{init}\qquad\overline{\,B\Rightarrow B\,}\;\mathsf{init}}{(A,\,A\to B)\Rightarrow B}\;{\to}L}{((!^aC',\,!^aC'\to A),\,A\to B)\Rightarrow B}\;{\to}L}{(!^aC',\,(!^aC'\to A,\,A\to B))\Rightarrow B}\;\mathsf{A1}}{(!^aC'\to A,\,A\to B)\Rightarrow\,!^aC'\to B}\;{\to}R}{A\to B\Rightarrow(!^aC'\to A)\to(!^aC'\to B)}\;{\to}R$$

#### **2.1 Cut-Elimination**

When it comes to the proof of cut-elimination for acLLΣ, the cut reductions for the propositional connectives follow the standard steps for similar systems, such as, *e.g.*, Moot and Retoré's system NL in [38, Chapter 5.2.2]. The case of the structural rules, on the other hand, should be treated with care.

$$\begin{array}{ccc}
\dfrac{\Gamma\{(F,G)\}\Rightarrow H}{\Gamma\{F\otimes G\}\Rightarrow H}\;\otimes L &
\dfrac{\Gamma_1\Rightarrow F\quad\Gamma_2\Rightarrow G}{(\Gamma_1,\Gamma_2)\Rightarrow F\otimes G}\;\otimes R &
\dfrac{\Gamma\{F\}\Rightarrow H\quad\Gamma\{G\}\Rightarrow H}{\Gamma\{F\oplus G\}\Rightarrow H}\;\oplus L \\[14pt]
\dfrac{\Gamma\Rightarrow F_i}{\Gamma\Rightarrow F_1\oplus F_2}\;\oplus R_i &
\dfrac{\Gamma\{F_i\}\Rightarrow G}{\Gamma\{F_1\,\&\,F_2\}\Rightarrow G}\;\& L_i &
\dfrac{\Gamma\Rightarrow F\quad\Gamma\Rightarrow G}{\Gamma\Rightarrow F\,\&\,G}\;\& R \\[14pt]
\dfrac{\Delta\Rightarrow F\quad\Gamma\{G\}\Rightarrow H}{\Gamma\{(\Delta,\,F\to G)\}\Rightarrow H}\;{\to}L &
\dfrac{(F,\Gamma)\Rightarrow G}{\Gamma\Rightarrow F\to G}\;{\to}R &
\dfrac{\Delta\Rightarrow F\quad\Gamma\{G\}\Rightarrow H}{\Gamma\{(G\leftarrow F,\,\Delta)\}\Rightarrow H}\;{\leftarrow}L \\[14pt]
\dfrac{(\Gamma,F)\Rightarrow G}{\Gamma\Rightarrow G\leftarrow F}\;{\leftarrow}R &
\dfrac{\Gamma\{\,\}\Rightarrow F}{\Gamma\{1\}\Rightarrow F}\;1L &
\dfrac{}{\Rightarrow 1}\;1R \qquad
\dfrac{}{\Gamma\Rightarrow\top}\;\top R
\end{array}$$

**Fig. 1.** Structured system FNL for non-associative, full Lambek calculus.

$$\dfrac{\Gamma^{\uparrow i}\Rightarrow F}{\Gamma\Rightarrow\,!^{i}F}\;!^{i}R\qquad\dfrac{\Gamma\{F\}\Rightarrow G}{\Gamma\{!^{i}F\}\Rightarrow G}\;!^{i}L$$

**Fig. 2.** Structured system acLL<sup>Σ</sup> for the logic described by Σ.

**Theorem 6.** *If the sequent* Γ ⇒ F *is provable in* acLLΣ*, then it has a proof with no instances of the rule* mcut*.*

*Proof.* The most representative cases of cut reductions involving subexponentials are detailed next. In order to simplify the notation, when possible, the mcut rule is presented in its simple form, with a 1-ary context.

Case !<sup>a</sup>: Suppose that

$$\dfrac{\dfrac{\dfrac{\pi_1}{\Delta_1^{\uparrow a}\Rightarrow F}}{\Delta_1\Rightarrow\,!^{a}F}\;!^{a}R\qquad\dfrac{\Gamma\{((!^{a}F,\Delta_2),\Delta_3)\}\Rightarrow G}{\Gamma\{(!^{a}F,(\Delta_2,\Delta_3))\}\Rightarrow G}\;\mathsf{A1}}{\Gamma\{(\Delta_1,(\Delta_2,\Delta_3))\}\Rightarrow G}\;\mathsf{mcut}$$

Since the axioms are upwardly closed w.r.t. ⪯, it must be the case that Δ<sub>1</sub><sup>↑a</sup> contains only formulas marked with subexponentials allowing associativity. All the other formulas in Δ<sub>1</sub> can be weakened; this is guaranteed by the application of the rule !<sup>a</sup>R in π<sub>1</sub>. Hence the derivation above reduces to

$$\dfrac{\dfrac{\dfrac{\pi_1}{\Delta_1^{\uparrow a}\Rightarrow F}}{\Delta_1^{\uparrow a}\Rightarrow\,!^{a}F}\;!^{a}R\qquad\Gamma\{((!^{a}F,\Delta_2),\Delta_3)\}\Rightarrow G}{\dfrac{\Gamma\{((\Delta_1^{\uparrow a},\Delta_2),\Delta_3)\}\Rightarrow G}{\dfrac{\Gamma\{(\Delta_1^{\uparrow a},(\Delta_2,\Delta_3))\}\Rightarrow G}{\Gamma\{(\Delta_1,(\Delta_2,\Delta_3))\}\Rightarrow G}\;\mathsf{W}}\;\mathsf{A1}}\;\mathsf{mcut}$$

Case !<sup>c</sup>: Suppose that

$$\dfrac{\dfrac{\Delta^{\uparrow c}\Rightarrow F}{\Delta\Rightarrow\,!^{c}F}\;!^{c}R\qquad\Gamma\{!^{c}F\}\ldots\{!^{c}F\}\ldots\{!^{c}F\}\Rightarrow G}{\Gamma\{\,\}\ldots\{\Delta\}\ldots\{\,\}\Rightarrow G}\;\mathsf{mcut}$$

Since Δ↑<sup>c</sup> contains only formulas marked with subexponentials allowing contraction, the derivation above reduces to

$$\dfrac{\dfrac{\Delta^{\uparrow c}\Rightarrow F}{\Delta^{\uparrow c}\Rightarrow\,!^{c}F}\;!^{c}R\qquad\Gamma\{!^{c}F\}\ldots\{!^{c}F\}\ldots\{!^{c}F\}\Rightarrow G}{\dfrac{\Gamma\{\Delta^{\uparrow c}\}\ldots\{\Delta^{\uparrow c}\}\ldots\{\Delta^{\uparrow c}\}\Rightarrow G}{\dfrac{\Gamma\{\,\}\ldots\{\Delta^{\uparrow c}\}\ldots\{\,\}\Rightarrow G}{\Gamma\{\,\}\ldots\{\Delta\}\ldots\{\,\}\Rightarrow G}\;\mathsf{W}}\;\mathsf{C}}\;\mathsf{mcut}$$

Observe that here, as usual, the multicut rule is needed in order to reduce the cut complexity.

Case !<sup>i</sup>R: Suppose that

$$\dfrac{\dfrac{\dfrac{\pi_1}{\Delta^{\uparrow i}\Rightarrow F}}{\Delta\Rightarrow\,!^{i}F}\;!^{i}R\qquad\dfrac{\dfrac{\pi_2}{(\Gamma\{!^{i}F\})^{\uparrow j}\Rightarrow G}}{\Gamma\{!^{i}F\}\Rightarrow\,!^{j}G}\;!^{j}R}{\Gamma\{\Delta\}\Rightarrow\,!^{j}G}\;\mathsf{mcut}$$

If j ⋠ i, then it must be the case that W ∈ f(i) and (Γ{!<sup>i</sup>F})<sup>↑j</sup> = Γ{ }<sup>↑j</sup>, since !<sup>i</sup>F will be weakened in the application of rule !<sup>j</sup>R. Hence, all formulas in Δ can be weakened as well, and the reduction is

$$\dfrac{\dfrac{\Gamma\{\,\}^{\uparrow j}\Rightarrow G}{\Gamma\{\,\}\Rightarrow\,!^{j}G}\;!^{j}R}{\Gamma\{\Delta\}\Rightarrow\,!^{j}G}\;\mathsf{W}$$

On the other hand, if j ⪯ i, then by transitivity all the formulas in Δ<sup>↑i</sup> also survive ↑j (implying that Δ<sup>↑i</sup> is a substructure of Δ<sup>↑j</sup>), and the rest of the formulas of Δ can be weakened. Hence the derivation above reduces to

$$\dfrac{\dfrac{\dfrac{\dfrac{\pi_1}{\Delta^{\uparrow i}\Rightarrow F}}{\Delta^{\uparrow j}\Rightarrow\,!^{i}F}\;!^{i}R\qquad\dfrac{\pi_2}{(\Gamma\{!^{i}F\})^{\uparrow j}\Rightarrow G}}{(\Gamma\{\Delta\})^{\uparrow j}\Rightarrow G}\;\mathsf{mcut}}{\Gamma\{\Delta\}\Rightarrow\,!^{j}G}\;!^{j}R$$

The other cases for subexponentials are similar or simpler.

The next examples illustrate what we mean by acLLΣ being a "conservative extension" of subsystems and variants. Indeed, although we removed structural properties of core LL, subexponentials allow them to be added back, either locally or globally.

*Example 7 (Structural variants of* iMALL*).* Adding combinations of contraction C and/or weakening W for *arbitrary formulas* to additive-multiplicative intuitionistic linear logic (iMALL) yields, respectively, propositional intuitionistic logic ILP = iMALL + {C, W}, the intuitionistic version of affine linear logic aLL = iMALL + W, and relevant logic R = iMALL + C. For the sake of presentation, we overload the notation and use the connectives of linear logic also for these logics. In order to embed the logics above into acLLΣ, let α ∈ {ILP, aLL, R} and consider modalities !<sup>α</sup> with f(α) = {E, A1, A2} ∪ A, where A ⊆ {C, W} is the set of axioms whose corresponding rules are in α. The translation τ<sub>α</sub> prefixes *every subformula* with the modality !<sup>α</sup>. For L ∈ {ILP, aLL, R} it is then straightforward to show that a structured sequent S is cut-free derivable in L iff its translation τ<sub>α</sub>(S) is cut-free derivable in the logic described by ({α}, ⪯, f), with the obvious pre-order and f as given above.

*Example 8 (Structural variants of* FNL*).* Following the same script as above and starting from FNL:

- If A = {A1, A2}, then we obtain the system FL;
- If A = {E, A1, A2}, then the resulting system corresponds to iMALL;
- Adding C, W as options to A results in the affine/relevant versions of the systems above.

#### **2.2 An Example in Linguistics**

Since its inception, the Lambek calculus [29] has been applied to the modeling of natural language syntax by means of categorial grammars. In a categorial grammar, each word is assigned one or several Lambek formulas, which serve as syntactic categories. For a simple example, John and Mary are assigned np ("noun phrase") and loves gets (np → s) ← np. Here s stands for "sentence", and loves is a transitive verb, which lacks noun phrases on both sides to become a sentence. Grammatical validity of "John loves Mary" is supported by derivability of the sequent np, (np → s) ← np, np ⇒ s. Notice that this derivability remains valid in the non-associative setting, if the correct nested structure is provided: (np, ((np → s) ← np, np)) ⇒ s.
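The derivation of "John loves Mary" boils down to two cancellation steps. A toy sketch (ours; forward cancellation only, not the full sequent calculus), with categories encoded as nested tuples:

```python
def apply_right(fun, arg):
    """A category b <- a consumes an `a` to its right, yielding b."""
    kind, b, a = fun
    return b if kind == "<-" and a == arg else None

def apply_left(arg, fun):
    """A category a -> b consumes an `a` to its left, yielding b."""
    kind, a, b = fun
    return b if kind == "->" and a == arg else None

np, s = "np", "s"
loves = ("<-", ("->", np, s), np)   # (np -> s) <- np

vp = apply_right(loves, np)         # "loves Mary" : np -> s
sentence = apply_left(np, vp)       # "John loves Mary" : s
```

Consuming the right argument first mirrors the nested structure (np, ((np → s) ← np, np)) required in the non-associative setting.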

The original Lambek calculus L is associative. In some cases, however, associativity leads to over-generation, *i.e.*, validation of grammatically incorrect sentences. Lambek himself realized this and proposed the non-associative calculus NL in [30]. We will illustrate this issue with the example given in [38, Sect. 4.2.2]. The syntactic category assignment is as follows (where n stands for "noun"):

$$\begin{array}{c|c}
\text{Words} & \text{Types}\\\hline
\text{the} & np \leftarrow n\\
\text{Hulk} & n\\
\text{is} & (np \to s) \leftarrow (n \leftarrow n)\\
\text{green, incredible} & n \leftarrow n
\end{array}$$

With this assignment, sentences "The Hulk is green" and "The Hulk is incredible" are correctly marked as valid, by deriving the sequent

$$(np \gets n, n), ((np \to s) \gets (n \gets n), n \gets n) \Rightarrow s$$

However, in the associative setting the sequent for the phrase "The Hulk is green incredible," which is grammatically incorrect, also becomes derivable:

$$np \leftarrow n,\ n,\ (np \rightarrow s) \leftarrow (n \leftarrow n),\ n \leftarrow n,\ n \leftarrow n \Rightarrow s,$$

essentially due to derivability of n ← n, n ← n ⇒ n ← n.
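This key sequent has, for instance, the following derivation in the associative calculus L (a sketch using the standard ←L and ←R rules):

$$\dfrac{\dfrac{n \Rightarrow n \qquad \dfrac{n \Rightarrow n \qquad n \Rightarrow n}{n \leftarrow n,\ n \Rightarrow n}\ {\leftarrow}L}{n \leftarrow n,\ n \leftarrow n,\ n \Rightarrow n}\ {\leftarrow}L}{n \leftarrow n,\ n \leftarrow n \Rightarrow n \leftarrow n}\ {\leftarrow}R$$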

In other situations, however, associativity is useful. Standard examples include handling of dependent clauses, *e.g.*, "the girl whom John loves," which is validated as a noun phrase by the following derivable sequent:

$$np \leftarrow n, n, (n \rightarrow n) \leftarrow (s \leftarrow np), np, (np \rightarrow s) \leftarrow np \Rightarrow np$$

Here (n → n) ← (s ← np) is the syntactic category for whom.

Our subexponential extension of NL, however, handles this case using local associativity instead of global associativity. Namely, the category for whom now becomes (n → n) ← (s ← !<sup>a</sup>np), where !<sup>a</sup> is a subexponential which allows the A2 rule, and the following sequent is happily derivable:

$$np \leftarrow n,\ (n,\ ((n \to n) \leftarrow (s \leftarrow {!}^{a}np),\ (np,\ (np \to s) \leftarrow np))) \Rightarrow np$$

The necessity of this more fine-grained control of associativity, instead of a global associativity rule, is seen via a combination of these examples. Namely, consider sentences like "The superhero whom Hawkeye killed was incredible" and "... was green". With !<sup>a</sup>, each of them is handled in the same way as the previous examples:

$$\begin{aligned} ((np \leftarrow n, (n, ((n \rightarrow n) \leftarrow (s \leftarrow {!}^{a}np), (np, (np \rightarrow s) \leftarrow np)))),\\ ((np \rightarrow s) \leftarrow (n \leftarrow n), n \leftarrow n)) \Rightarrow s. \end{aligned}$$

On the one hand, without !<sup>a</sup> this sequent cannot be derived in the non-associative system. On the other hand, if we made the system globally associative, it would validate incorrect sentences like "The superhero whom Hawkeye killed was green incredible."

### **3 Some Undecidability Results**

Non-associativity makes a significant difference in decidability and complexity matters. For example, while L is NP-complete [47], NL is decidable in polynomial time [1,14].

For our system acLLΣ, decidability or undecidability depends on the signature Σ. In fact, we have a family of different systems acLLΣ, with Σ as a parameter. Recall that the subexponential signature Σ specifies not just the number of subexponentials and the preorder among them. More importantly, it dictates, for each subexponential, which structural rules that subexponential licenses. If C ∉ f(s) for every s ∈ I, that is, no subexponential allows contraction, then acLL<sup>Σ</sup> is clearly decidable, since the cut-free proof search space is finite. Therefore, for undecidability it is necessary to have at least one subexponential which allows contraction.

For a non-associative system with only one fully-powered exponential modality s (that is, f(s) = {E, C, W, A1, A2}), undecidability was proven in a preprint by Tanaka [51], based on Chvalovský's [11] result on undecidability of the finitary consequence relation in FNL.

In this section, we prove two undecidability results. The first one is a refinement of Tanaka's result: we establish undecidability with at least one subexponential which allows contraction and weakening (commutativity/associativity are optional), in a subsystem containing only the additive connective ⊕ and the multiplicatives ⊗ and →.

The second undecidability result is for the minimalistic, purely multiplicative fragment, which includes only → (not even ⊗). As a trade-off, however, it requires two subexponentials: the "main" one, which allows contraction, exchange, and associativity (weakening is optional), and an "auxiliary" one, which allows only associativity.

It should be noted that this undecidability result is orthogonal to Tanaka's [51], and the proof technique is essentially different. Indeed, Chvalovský's undecidability theorem does not hold for the non-associative Lambek calculus without additives, where the consequence relation is decidable [7].

Finally, we observe that *if* the intersection of these systems is decidable (which is still an open question), then our two undecidability results are *incomparable:* we have two undecidable fragments of acLLΣ, but their common part, which includes only divisions and one exponential, would be decidable.

#### **3.1 Undecidability with Additives and One Subexponential**

We are going to derive the next theorem from undecidability of the finitary consequence relation in FNL [11]. Recall that FNL is, in fact, the fragment of acLL<sup>Σ</sup> without subexponentials (that is, with an empty I).

**Theorem 9.** *If there exists such* s ∈ I *that* f(s) ⊇ {C, W}*, then the derivability problem in* acLL<sup>Σ</sup> *is undecidable. Moreover, this holds for the fragment with only* ⊗*,* →*,* ⊕, ! s *.*

In fact, using C and W, one can also derive A1, A2, E1, and E2. Therefore, if f(s) ⊇ {C, W}, then !<sup>s</sup> is actually a full-power exponential modality. (In the proof of Theorem 9 below, we use only the W and C rules, in order to avoid confusion.) However, Theorem 9 does not directly follow from undecidability of propositional linear logic [32], because here the basic system is non-associative and non-commutative, while linear logic is both associative and commutative. Thus, we need a different encoding for undecidability.

Let **Φ** be a finite set of FNL sequents. By FNL(**Φ**) let us denote FNL extended by adding sequents from **Φ** as additional (non-logical) axioms. In general, FNL(**Φ**) does not enjoy cut-elimination, so mcut is kept as a rule of inference in FNL(**Φ**). A sequent Γ ⇒ F is called *a consequence of* **Φ** if this sequent is derivable in FNL(**Φ**).

**Theorem 10 (Chvalovský** [11]**).** *The consequence relation in* FNL *is undecidable; that is, there exists no algorithm which, given* **Φ** *and* Γ ⇒ F*, determines whether* Γ ⇒ F *is a consequence of* **Φ***. Moreover, undecidability still holds when* **Φ** *and* Γ ⇒ F *are built from variables using only* ⊗ *and* ⊕*.*

Now, in order to prove Theorem 9, we internalize **Φ** into the sequent using !<sup>s</sup>, assuming f(s) ⊇ {C, W}.

First we notice that we may suppose, without loss of generality, that all sequents in **Φ** are of the form ⇒ A, that is, have empty antecedents. Namely, each sequent of the form Π ⇒ B can be replaced by ⇒ (⊗Π) → B, where ⊗Π is obtained from Π by replacing each comma with ⊗. Indeed, these sequents are derivable from one another: from Π ⇒ B to ⇒ (⊗Π) → B we apply a sequence of ⊗L followed by → R, and for the other direction we apply a series of cuts, first with (⊗Π, (⊗Π) → B) ⇒ B, and then with (F, G) ⇒ F ⊗ G several times, for the corresponding subformulas of ⊗Π. The following embedding lemma ("modalized deduction theorem") holds.
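For instance, for a two-formula antecedent Π = (p, q), the passage from (p, q) ⇒ B to the empty-antecedent form can be sketched as:

$$\dfrac{\dfrac{(p,\ q) \Rightarrow B}{p \otimes q \Rightarrow B}\ {\otimes}L}{\Rightarrow (p \otimes q) \to B}\ {\to}R$$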

**Lemma 11.** *The sequent* Γ ⇒ F *is a consequence of* **Φ** = {⇒ A1, ..., ⇒ An} *if and only if the sequent* ((...((!<sup>s</sup>A1, !<sup>s</sup>A2), !<sup>s</sup>A3), ..., !<sup>s</sup>An), Γ) ⇒ F *is derivable in* acLLΣ*.*

*Proof.* Let us denote (...((!<sup>s</sup>A1, !<sup>s</sup>A2), !<sup>s</sup>A3), ..., !<sup>s</sup>An) by !Φ. Notice that C and W can be applied to !Φ as a whole; this is easily proven by induction on n.

For the "only if" direction let us take the derivation of Γ ⇒ F in FNL(**Φ**) (with cuts) and replace each sequent of the form Δ ⇒ G in it with (!Φ, Δ) ⇒ G, and each sequent of the form ⇒ G with !Φ ⇒ G. The translations of non-logical axioms from **Φ** are derived as follows:

$$\dfrac{\dfrac{\dfrac{}{A_i \Rightarrow A_i}\ \mathsf{init}}{{!}^{s}A_i \Rightarrow A_i}\ \mathsf{der}}{!\Phi \Rightarrow A_i}\ \mathsf{W},\ n-1\ \text{times}$$

Translations of the axioms init and 1R are derived from the corresponding original axioms by W, applied n times; ⊤R remains valid.

Rules ⊗L, ⊕L, ⊕Ri, &Li, &R, and 1L remain valid. For → L, ← L, and mcut we contract !Φ as a whole:

$$\dfrac{(!\Phi,\Delta) \Rightarrow F \qquad (!\Phi,\Gamma\{G\}) \Rightarrow H}{\dfrac{(!\Phi,\Gamma\{((!\Phi,\Delta),\ F \to G)\}) \Rightarrow H}{(!\Phi,\Gamma\{(\Delta,\ F \to G)\}) \Rightarrow H}\ \mathsf{C}}\ {\to}L \qquad \dfrac{(!\Phi,\Delta) \Rightarrow F \qquad (!\Phi,\Gamma\{F\}\ldots\{F\}) \Rightarrow C}{\dfrac{(!\Phi,\Gamma\{(!\Phi,\Delta)\}\ldots\{(!\Phi,\Delta)\}) \Rightarrow C}{(!\Phi,\Gamma\{\Delta\}\ldots\{\Delta\}) \Rightarrow C}\ \mathsf{C}}\ \mathsf{mcut}$$

For ⊗R, → R, and ← R, we combine contraction and weakening:

$$\dfrac{\dfrac{(!\Phi,\Gamma_1) \Rightarrow F \qquad (!\Phi,\Gamma_2) \Rightarrow G}{((!\Phi,\Gamma_1),(!\Phi,\Gamma_2)) \Rightarrow F \otimes G}\ {\otimes}R}{\dfrac{(!\Phi,((!\Phi,\Gamma_1),(!\Phi,\Gamma_2))) \Rightarrow F \otimes G}{(!\Phi,(\Gamma_1,\Gamma_2)) \Rightarrow F \otimes G}\ \mathsf{C}}\ \mathsf{W} \qquad \dfrac{\dfrac{(!\Phi,(F,\Gamma)) \Rightarrow G}{(F,(!\Phi,\Gamma)) \Rightarrow G}\ \mathsf{W},\mathsf{C}}{(!\Phi,\Gamma) \Rightarrow F \to G}\ {\to}R$$

Notice that our original derivation was in FNL(**Φ**), so it does not include rules operating on subexponentials.

For the "if" direction we take a cut-free proof of (!Φ, Γ) ⇒ F in acLL<sup>Σ</sup> and erase all formulas which include the subexponential. In the resulting derivation tree all rules and axioms, except those which operate on !<sup>s</sup>, remain valid. Structural rules for !<sup>s</sup> trivialize (since the !-formula was erased). The !<sup>s</sup>R rule could not have been used, since we do not have positive occurrences of !<sup>s</sup>F, and our proof is cut-free.

Finally, der translates into

$$\frac{\Gamma\{A\_i\} \Rightarrow G}{\Gamma\{\} \Rightarrow G}$$

This is modeled by cut with one of the sequents from **Φ**:

$$\frac{\Rightarrow A_i \quad \Gamma\{A_i\} \Rightarrow G}{\Gamma\{\,\} \Rightarrow G}\ \mathsf{cut}$$

Thus, we get a correct derivation in FNL(**Φ**).

Theorem 10 and Lemma 11 immediately yield Theorem 9.

#### **3.2 Undecidability Without Additives and with Two Subexponentials**

**Theorem 12.** *If there are* a, c ∈ I *such that* f(a) = {A1, A2} *and* f(c) ⊇ {C, E, A1, A2}*, then the derivability problem in* acLL<sup>Σ</sup> *is undecidable. Moreover, this holds for the fragment with only* →, !<sup>a</sup>, *and* !<sup>c</sup>*.*

Recall from Example 8 that SMALC<sup>Σ</sup> [21] denotes the extension of FL with subexponentials. The undecidability theorem above is proved by encoding the one-division fragment of SMALC<sup>Σ</sup> containing one subexponential c such that f(c) ⊇ {C, E}. It turns out that such a system is undecidable.

**Theorem 13 (Kanovich et al.** [22,23]**).** *If there exists* c ∈ I *such that* f(c) ⊇ {C, E}*, then the derivability problem in* SMALC<sup>Σ</sup> *is undecidable. Moreover, this holds for the fragment with only* → *and* !<sup>c</sup>*.*

Observe that SMALC<sup>Σ</sup> can be obtained from acLL<sup>Σ</sup> by adding "global" associativity rules:

$$\frac{\Gamma\{ ( (\Delta\_1, \Delta\_2), \Delta\_3 ) \} \Rightarrow G}{\Gamma\{ (\Delta\_1, (\Delta\_2, \Delta\_3)) \} \Rightarrow G} \qquad \frac{\Gamma\{ (\Delta\_1, (\Delta\_2, \Delta\_3)) \} \Rightarrow G}{\Gamma\{ ( (\Delta\_1, \Delta\_2), \Delta\_3) \} \Rightarrow G}$$

The usual formulation of SMALCΣ, of course, uses sequences of formulas instead of nested structures as antecedents. This alternative formulation, however, is more convenient for us here. It will also be convenient to regard all subexponentials in SMALC<sup>Σ</sup> as associative, that is, f(s) ⊇ {A1, A2} for each s ∈ I.

In order to embed SMALC<sup>Σ</sup> into acLLΣ, we define two translations, A<sup>!−</sup> and A<sup>!+</sup>, by mutual recursion:

$$\begin{aligned} &z^{!-} = {!}^{a}z & &z^{!+} = z & &\text{where } z \text{ is a variable, } 1\text{, or } \top\\ &(A \to B)^{!-} = {!}^{a}(A^{!+} \to B^{!-}) & &(A \to B)^{!+} = A^{!-} \to B^{!+}\\ &(B \gets A)^{!-} = {!}^{a}(B^{!-} \gets A^{!+}) & &(B \gets A)^{!+} = B^{!+} \gets A^{!-}\\ &(A \ast B)^{!-} = {!}^{a}(A^{!-} \ast B^{!-}) & &(A \ast B)^{!+} = A^{!+} \ast B^{!+} & &\text{where } \ast \in \{\otimes, \oplus, \mathbin{\&}\}\\ &({!}^{s}A)^{!-} = {!}^{s}(A^{!-}) & &({!}^{s}A)^{!+} = {!}^{s}(A^{!+}) \end{aligned}$$

Informally, our translation adds a !<sup>a</sup> over any formula (not only over atoms) of negative polarity, unless this formula was already marked with a !<sup>s</sup>. Thus, all formulae in antecedents begin with either the new subexponential !<sup>a</sup> or one of the old subexponentials !<sup>s</sup>, and all these subexponentials allow the associativity rules A1 and A2.
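For instance, unfolding the definitions on the formula p → q (with p, q variables) gives

$$(p \to q)^{!-} = {!}^{a}(p^{!+} \to q^{!-}) = {!}^{a}(p \to {!}^{a}q),$$

so both the implication itself and its negatively occurring subformula q are guarded by !<sup>a</sup>, while p remains unmarked.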

**Lemma 14.** *A sequent* A1, ..., An ⇒ B *is derivable in* SMALC<sup>Σ</sup> *if and only if its translation* (...(A1<sup>!−</sup>, A2<sup>!−</sup>), ..., An<sup>!−</sup>) ⇒ B<sup>!+</sup> *is derivable in* acLLΣ*.*

*Proof.* For the "only if" part, let us first note that each formula Ai<sup>!−</sup> is of the form !<sup>s</sup>F with A1, A2 ∈ f(s). Indeed, either s is an "old" subexponential label (for which we added A1, A2) or s = a. Thus brackets can be freely rearranged in the antecedent.

Now we take a cut-free proof of A1, ..., An ⇒ B in SMALC<sup>Σ</sup> and replace each sequent in it with its translation. Right rules for connectives other than subexponentials, i.e., ⊗R, ⊕Ri, &R, → R, and ← R, remain valid as they are, up to rearranging brackets in antecedents. For !<sup>i</sup>R, we notice that the translation of a formula of the form !<sup>j</sup>F, where j ⪰ i, is also a formula of the form !<sup>j</sup>F′. Thus, this rule also remains valid. The same holds for the dereliction rule der, because (!<sup>i</sup>F)<sup>!−</sup> is exactly !<sup>i</sup>(F<sup>!−</sup>). Finally, the "old" structural rules (exchange, contraction, weakening) also remain valid (up to rearranging of brackets), since !<sup>i</sup>F gets translated into !<sup>i</sup>(F<sup>!−</sup>), which enjoys the same structural rules.

For the other left rules, we need to derelict ! <sup>a</sup> first, and then perform the corresponding rule application. Rearrangement of brackets, if needed, is performed below dereliction or above the application of the rule in question.

The "if" part is easier. Given a derivation of (...(A1<sup>!−</sup>, A2<sup>!−</sup>), ..., An<sup>!−</sup>) ⇒ B<sup>!+</sup> in acLLΣ, we erase !<sup>a</sup> everywhere and consider the result as a derivation in SMALCΣ. Associativity rules for the erased !<sup>a</sup> (which are the only structural rules for this subexponential) remain valid, because now associativity is global. Dereliction and right introduction for !<sup>a</sup> trivialize. All other rules, which do not operate on !<sup>a</sup>, remain as they are. Thus, we get a derivation of A1, ..., An ⇒ B in SMALCΣ, since erasing !<sup>a</sup> makes our translation the identity.

#### **4 Related Work and Conclusion**

In this paper, we have presented acLLΣ, a sequent-based system for non-associative, non-commutative linear logic with subexponentials. Starting from FNL, we modularly and uniformly added rules for exchange, associativity, weakening, and contraction, which can be applied to subexponentials having the respective features. This allows for the application of structural rules locally, and it conservatively extends well-known systems in the literature, continuing the path of controlling structural properties started by Girard himself [16].

Another approach to combining associative and non-associative behavior in Lambek-style grammars is the framework of the *Lambek calculus with brackets* by Morrill [39,40] and Moortgat [34]. The bracket approach is dual to ours: there the base system is associative, and brackets, which are controlled by bracket modalities, introduce local non-associativity. Both the associative Lambek calculus and the non-associative Lambek calculus can be embedded into the Lambek calculus with brackets: the former holds by design of the system, and the latter was shown by Kurtonina [26] by constructing a translation.

From the point of view of generative power, however, the (associative) Lambek calculus with brackets is weaker than the non-associative system with subexponentials, which is presented in this paper. Namely, as shown by Kanazawa [19], grammars based on the Lambek calculus with brackets can generate only context-free languages. In contrast, grammars based on our system with subexponentials go beyond context-free languages, even when no subexponential allows contraction (subexponentials allowing contraction may lead to undecidability, as shown in the last section).

As a quick example, let us consider a subexponential !<sup>ae</sup> which allows both associativity (A1 and A2) and exchange (E). If we put this subexponential over every (sub)formula, the system becomes associative and commutative. Using this system, one can describe the non-context-free language MIX3, which contains all non-empty words over {a, b, c} in which the numbers of occurrences of a, b, and c are equal. Indeed, MIX<sup>3</sup> is the permutation closure of the language {(abc)<sup>n</sup> | n ≥ 1}. The latter is regular, therefore context-free, and therefore definable by a Lambek grammar. The ability of our system to go beyond context-free languages is important from the point of view of applications, since there are known linguistic phenomena which are essentially non-context-free [49].
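The language-theoretic fact used here, that MIX3 is exactly the permutation closure of {(abc)<sup>n</sup> | n ≥ 1}, is easy to sanity-check mechanically. The following Python sketch (ours, not part of the paper's grammar formalism) compares the two characterizations on all short words:

```python
import itertools
from collections import Counter

def in_mix3(word: str) -> bool:
    """A word over {a, b, c} lies in MIX3 iff it is non-empty and
    contains equally many a's, b's, and c's."""
    counts = Counter(word)
    return (len(word) > 0
            and set(word) <= {"a", "b", "c"}
            and counts["a"] == counts["b"] == counts["c"])

def is_permutation_of_abc_power(word: str) -> bool:
    """MIX3 as the permutation closure of {(abc)^n | n >= 1}:
    some rearrangement of the word has the form abcabc...abc."""
    n, r = divmod(len(word), 3)
    return n > 0 and r == 0 and sorted(word) == sorted("abc" * n)

# The two characterizations agree on all words over {a, b, c} of length <= 6.
for length in range(7):
    for w in map("".join, itertools.product("abc", repeat=length)):
        assert in_mix3(w) == is_permutation_of_abc_power(w)
print("characterizations agree")
```

Of course, this only checks the combinatorial claim about MIX3; the grammatical description via !<sup>ae</sup> is a separate, proof-theoretic matter.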

Regarding decidability, let us compare our results with the more well-known associative non-commutative and associative commutative cases.

In the associative and commutative case the situation is as follows. In the presence of additives, the system is known to be undecidable with one exponential modality [32]. Without additives, we get MELL, the (un)decidability of which is a well-known open problem [50]. However, with two subexponentials MELL again becomes undecidable [9]. Thus, we have the same trade-off as in our non-associative non-commutative case: for undecidability one needs either additives, or two subexponentials.

Our results help to shed some light on the (un)decidability problem for the spectrum of logical systems surrounding MELL/FNL, allowing for a fine-grained analysis of the problem, especially the trade-offs between connectives and subexponentials for guaranteeing (un)decidability.

Much remains to be done. First of all, we would like to better analyze the minimalistic fragment of acLL<sup>Σ</sup> containing only implication and one fully-powered subexponential, as it seems to be crucial for understanding the lower bound of undecidability (or the upper bound of decidability). Second, the use of acLL<sup>Σ</sup> in modeling natural language syntax should definitely be explored further. The examples in Sect. 2.2 show how to locally combine sentences with different grammatical characteristics, and the MIX<sup>3</sup> example above illustrates how that can be of importance. That is, it would be interesting to have a formal study of acLL<sup>Σ</sup> and categorial grammars. Third, we plan to investigate the connections between our work and Adjoint logic [48] as well as with Display calculus [5,12]. Finally, we intend to study proof-theoretic properties of acLLΣ, such as normalization of proofs (*e.g.*, via focusing) and interpolation.

**Acknowledgements.** We are grateful for the useful suggestions from the anonymous referees. We would like to thank L. Beklemishev, M. Moortgat, and C. Retoré for their inspiring and helpful comments regarding an approach based on non-associativity.

# **References**



# **Effective Semantics for the Modal Logics K and KT via Non-deterministic Matrices**

Ori Lahav<sup>1</sup> and Yoni Zohar<sup>2(B)</sup>

<sup>1</sup> Tel Aviv University, Tel Aviv-Yafo, Israel
<sup>2</sup> Bar Ilan University, Ramat Gan, Israel
yoni.zohar@biu.ac.il

**Abstract.** A four-valued semantics for the modal logic K is introduced. Possible worlds are replaced by a hierarchy of four-valued valuations, where the valuations of the first level correspond to valuations that are legal w.r.t. a basic non-deterministic matrix, and each level further restricts its set of valuations. The semantics is proven to be effective, and to precisely capture derivations in a sequent calculus for K of a certain form. Similar results are then obtained for the modal logic KT, by simply deleting one of the truth values.

## **1 Introduction**

Propositional modal logics extend classical logic with *modalities*, intuitively interpreted as necessity, knowledge, or temporal operators. Such extensions have several applications in computer science and artificial intelligence (see, e.g., [7,9,13]).

The most common and successful semantic framework for modal logics is the so-called *possible worlds semantics*, in which each world is equipped with a two-valued valuation, and the semantic constraints regarding the modal operators consider the valuations in *accessible* worlds. While this has been the gold standard for modal logic semantics for many years, alternative semantic frameworks have been proposed. One of these approaches, initiated by Kearns [10], is based on an infinite sequence of sets of valuations in a non-deterministic many-valued semantics. Since then, several non-deterministic many-valued semantics, without possible worlds, have been developed for modal logics (see, e.g., [4,8,12,14]). The current paper is a part of that body of work. Having an alternative semantic framework for modal logics, different from the common possible worlds semantics, has the potential of exposing new intuitions about and understandings of modal logics, and also of forming the basis for new decision procedures.

Our main contribution is a four-valued semantics for the modal logic K. The key characteristic of the semantics that we present is *effectiveness*: when checking

We thank the anonymous reviewers for their useful feedback. This research was supported by NSF-BSF (grant number 2020704), ISF (grant numbers 619/21 and 1566/18), and the Alon Young Faculty Fellowship.

© The Author(s) 2022

J. Blanchette et al. (Eds.): IJCAR 2022, LNAI 13385, pp. 468–485, 2022. https://doi.org/10.1007/978-3-031-10769-6\_28

for the entailment of a formula ϕ from a set Γ of formulas in K, it suffices to consider only *partial* models, defined over the subformulas of Γ and ϕ. To the best of our knowledge, this is the first effective Nmatrices-based semantics for K. Such a semantics has the potential of being subject to reductions to classical satisfiability [3], as it is based on finite-valued truth tables, and thus of improving the performance of solvers for modal logic by utilizing off-the-shelf SAT solvers. Another advantage of this semantics is that it precisely captures derivations in a sequent calculus for K that admit a certain property. Following Kearns, models of this semantics are based on the concept of *levels*: valuations of level 0 are the ordinary valuations of Nmatrices, while each level m > 0 introduces more constraints. We show that valuations of level m correspond to derivations in the calculus whose largest number of applications of the rule that corresponds to the axiom (K) in any branch of the derivation is at most m. Our restrictions between the levels are more complex than the original restrictions in Kearns' work, in order to obtain effectiveness. Another precise correspondence between the semantics and the proof system that we prove is between the domains of valuations and the formulas allowed to be used in derivations.

Finally, we observe that by deleting one of the truth values, a three-valued semantics for the modal logic KT is obtained, which is similar to the one presented in [8]. As in the case of K, the resulting semantics is effective and tightly corresponds to derivations in a sequent calculus for KT.

*Outline.* The paper is organized as follows: Sect. 2 reviews standard notions of non-deterministic matrices. In Sect. 3, we present our semantics for the modal logic K, as well as the sequent calculus our investigation is based on, which is coupled with the notion of (K)-depth of derivations. In Sect. 4, we prove soundness and completeness theorems relating the sequent calculus and the semantics. In Sect. 5, we prove that the semantics that we provide is effective, not only for deciding entailment, but also for producing countermodels when an entailment does not hold. In Sect. 6 we establish similar results for the modal logic KT. We conclude with Sect. 7, where directions for future research are outlined.

*Related Work.* In [10], Kearns initiated the study of modal semantics without possible worlds. This work was recently revisited by Skurt and Omori [14], who generalized Kearns' work and reframed it within the framework of logical non-deterministic matrices. As indicated in [14], it was not clear how to make this semantics effective, as it requires checking truth values of infinitely many formulas when considering the validity of a given formula (see, e.g., Remark 42 of [14]). In [4], Coniglio et al. develop a similar framework for modal logics, and some bound on the formulas that need to be considered was achieved. However, in [5], the authors clarified that it is unclear how to effectively use the resulting semantics. A semantics based on Nmatrices for the modal logics KT and S4 was presented in [8] by Grätz; it includes a method to extend a partial model in that semantics into a total one, which results in an effective semantics. We chose here to focus on K, which is a weaker logic, forming a common basis for all other normal modal logics. By deleting one out of four truth values, we obtain corresponding results for KT as well. The semantics that we present here is similar in nature to the one presented in [8]; however: (i) the truth tables are different, as we intentionally enforced the many-valued tables of the classical connectives to be obtained by a straightforward duplication of truth values from the original two-valued truth tables; and (ii) the semantic condition for levels of valuations that we define here is inductive, where each level relies on lower levels (thus refraining from a definition of a more cyclic nature like the one in [8], which is better understood operationally). A variant of the semantics from [14] was also introduced and studied in [12], but without considering the ability to perform effective automated reasoning, focusing instead on infinite valuations rather than on partial ones.
A complete proof-theoretic characterization, in terms of sequent calculi, of the various levels of valuations was not given in any of the above works. Nor did any of them give an effective semantics for K, which is the most basic modal logic.

Non-deterministic matrices were introduced in [2], and have since become a useful tool for investigating non-classical logics and proof systems (see [1] for a survey). They generalize (deterministic) matrices [15] by allowing a non-deterministic choice of truth values in the truth tables. Like matrices, Nmatrices enjoy the semantic *analyticity* property, which allows one to extend a partial valuation into a full one. Our semantic framework can be viewed as a further refinement of non-deterministic matrices, namely *restricted* non-deterministic matrices, introduced in [6].

# **2 Preliminaries**

In this section we provide the necessary definitions about Nmatrices following [1]. We assume a propositional language L with countably infinitely many atomic variables p1, p2,.... When there is no room for confusion, we identify L with its set of well-formed formulas (e.g., when writing ϕ ∈ L). We write *sub*(ϕ) for the set of subformulas of a formula ϕ. This notation is extended to sets of formulas in the natural way.

*Valuations.* In the context of a set V of "truth values", a *valuation* is a function v from some domain Dom(v) ⊆ L to V. For a set F ⊆ L, an F-*valuation* is a valuation with domain F. (In particular, an L-valuation is defined on all formulas.) For X ⊆ V, we write v<sup>−1</sup>[X] for the set {ϕ | v(ϕ) ∈ X}. For x ∈ V, we also write v<sup>−1</sup>[x] for the set {ϕ | v(ϕ) = x}.

**Definition 1.** Let D⊆V be a set of "designated truth values". A valuation v D-*satisfies* a formula ϕ, denoted by v |=<sup>D</sup> ϕ, if v(ϕ) ∈ D. For a set Σ of formulas, we write v |=<sup>D</sup> Σ if v |=<sup>D</sup> ϕ for every ϕ ∈ Σ.

**Notation 2.** Let D ⊆ V be a set of designated truth values and V be a set of valuations. For sets L, R of formulas, we write L ⊢<sup>V</sup><sub>D</sub> R if for every v ∈ V, v |=<sub>D</sub> L implies that v |=<sub>D</sub> ϕ for some ϕ ∈ R. We omit L or R in this notation when they are empty (e.g., when writing ⊢<sup>V</sup><sub>D</sub> R), and omit braces for singletons (e.g., when writing L ⊢<sup>V</sup><sub>D</sub> ϕ).

*Nmatrices.* An Nmatrix M for L is a triple of the form ⟨V, D, O⟩, where V is a set of *truth values*, D ⊆ V is a set of *designated truth values*, and O is a function assigning a *truth table* V<sup>n</sup> → P(V) \ {∅} to every n-ary connective of L (which assigns a non-empty set of possible values to each tuple of values). In the context of an Nmatrix M = ⟨V, D, O⟩, we often denote O(⋄) by ⋄̃ for each connective ⋄.

An F-valuation v is M-*legal* if v(ϕ) ∈ pos-val(ϕ, M, v) for every formula ϕ ∈ F whose immediate subformulas are contained in F, where pos-val(ϕ, M, v) is defined by:

$$\text{pos-val}(\varphi, M, v) = \begin{cases} V & \text{if } \varphi \text{ is atomic} \\ \tilde{\diamond}(v(\psi_1), \ldots, v(\psi_n)) & \text{if } \varphi = \diamond(\psi_1, \ldots, \psi_n) \end{cases}$$
In other words, there is no restriction regarding the values assigned to atomic formulas, whereas the values of compound formulas should respect the truth tables.
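To make the legality condition concrete, here is a small Python sketch of a toy Nmatrix and its legality check. The two-valued truth tables below (a deterministic "and" and a non-deterministic unary "o") are illustrative choices of ours, not tables from the paper:

```python
# A toy Nmatrix: V = {0, 1}, D = {1}, with a classical table for "and"
# and a non-deterministic table for a unary connective "o" (on input 1,
# "o" may take either value).  These tables are illustrative only.
V = [0, 1]
D = {1}
TABLES = {
    "and": {(x, y): {min(x, y)} for x in V for y in V},
    "o": {(0,): {0}, (1,): {0, 1}},
}

# Formulas: strings are atoms; tuples ("conn", sub1, ...) are compound.
def immediate_subformulas(phi):
    return [] if isinstance(phi, str) else list(phi[1:])

def is_legal(valuation):
    """An F-valuation (dict formula -> value) is legal if every compound
    formula whose immediate subformulas are all in the domain takes a
    value allowed by its table; atoms are unconstrained."""
    for phi, value in valuation.items():
        if isinstance(phi, str):
            continue  # no restriction on atomic formulas
        subs = immediate_subformulas(phi)
        if all(s in valuation for s in subs):
            allowed = TABLES[phi[0]][tuple(valuation[s] for s in subs)]
            if value not in allowed:
                return False
    return True

conj = ("and", "p", "q")
circ = ("o", "p")
assert is_legal({"p": 1, "q": 0, conj: 0})      # 1 "and" 0 must be 0
assert not is_legal({"p": 1, "q": 0, conj: 1})  # violates the table
assert is_legal({"p": 1, circ: 0}) and is_legal({"p": 1, circ: 1})  # non-determinism
print("legality checks passed")
```

Note how the last two assertions exhibit the defining feature of Nmatrices: two legal valuations may disagree on the same compound formula.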

**Lemma 1 (**[1]**).** *Let* F⊆L *be a set closed under subformulas and* M *an Nmatrix for* L*. Then every* M*-legal* F*-valuation* v *can be extended to an* M*-legal* L*-valuation.*

# **3 The Modal Logic** K

In this section we introduce a novel effective semantics for the modal logic K. We first present a known proof system for this logic (Sect. 3.1), and then our semantics (Sect. 3.2). From here on, we assume that the language L consists of the connectives ⊃, ∧, ∨, ¬, and □ with their usual arities. The standard ♦ operator can be defined as a macro: ♦ϕ def= ¬□¬ϕ. Obviously, using De Morgan rules, fewer connectives could be used. However, we chose this set of connectives in order to have a primitive language rich enough for the examples that we include throughout the paper.

#### **3.1 Proof System**

Figure 1 presents a Gentzen-style calculus, denoted by $G_{\mathsf{K}}$, for the modal logic K that was proven to be equivalent to the original formulation of the logic as a Hilbert system (see, e.g., [16]). We take *sequents* to be pairs $\langle \Gamma, \Delta \rangle$ of finite *sets* of formulas. For readability, we write $\Gamma \Rightarrow \Delta$ instead of $\langle \Gamma, \Delta \rangle$ and use standard notations such as $\Gamma, \varphi \Rightarrow \psi$ instead of $(\Gamma \cup \{\varphi\}) \Rightarrow \{\psi\}$.

The (cut) rule is included in $G_{\mathsf{K}}$ for convenience, but applications of (cut) can be eliminated from derivations (see, e.g., [11]). Since the focus of this paper is semantics rather than cut-elimination, we allow ourselves to use cut freely and do not distinguish derivations that use it from derivations that do not. We write $\vdash_{G_{\mathsf{K}}} \Gamma \Rightarrow \Delta$ if there is a derivation of a sequent $\Gamma \Rightarrow \Delta$ in the calculus $G_{\mathsf{K}}$.

In the sequel, we provide a semantic characterization of $\vdash_{G_{\mathsf{K}}}$. It is based on a more refined notion of derivability that takes into account: (i) the set $\mathcal{F}$ of formulas used in the derivation; and (ii) the (K)-*depth* of the derivation, as defined next.

$$\begin{array}{ccc}
(\mathit{id})\ \dfrac{}{\Gamma, \varphi \Rightarrow \varphi, \Delta} &
(\mathit{cut})\ \dfrac{\Gamma \Rightarrow \varphi, \Delta \qquad \Gamma, \varphi \Rightarrow \Delta}{\Gamma \Rightarrow \Delta} &
(\mathit{weak})\ \dfrac{\Gamma \Rightarrow \Delta}{\Gamma, \Gamma' \Rightarrow \Delta, \Delta'} \\[3ex]
(\neg\Rightarrow)\ \dfrac{\Gamma \Rightarrow \varphi, \Delta}{\Gamma, \neg\varphi \Rightarrow \Delta} &
(\Rightarrow\neg)\ \dfrac{\Gamma, \varphi \Rightarrow \Delta}{\Gamma \Rightarrow \neg\varphi, \Delta} &
(K)\ \dfrac{\Gamma \Rightarrow \varphi}{\Box\Gamma \Rightarrow \Box\varphi} \\[3ex]
(\wedge\Rightarrow)\ \dfrac{\Gamma, \varphi, \psi \Rightarrow \Delta}{\Gamma, \varphi \wedge \psi \Rightarrow \Delta} &
(\Rightarrow\wedge)\ \dfrac{\Gamma \Rightarrow \varphi, \Delta \qquad \Gamma \Rightarrow \psi, \Delta}{\Gamma \Rightarrow \varphi \wedge \psi, \Delta} &
(\Rightarrow\supset)\ \dfrac{\Gamma, \varphi \Rightarrow \psi, \Delta}{\Gamma \Rightarrow \varphi \supset \psi, \Delta} \\[3ex]
(\vee\Rightarrow)\ \dfrac{\Gamma, \varphi \Rightarrow \Delta \qquad \Gamma, \psi \Rightarrow \Delta}{\Gamma, \varphi \vee \psi \Rightarrow \Delta} &
(\Rightarrow\vee)\ \dfrac{\Gamma \Rightarrow \varphi, \psi, \Delta}{\Gamma \Rightarrow \varphi \vee \psi, \Delta} &
(\supset\Rightarrow)\ \dfrac{\Gamma \Rightarrow \varphi, \Delta \qquad \Gamma, \psi \Rightarrow \Delta}{\Gamma, \varphi \supset \psi \Rightarrow \Delta}
\end{array}$$

**Fig. 1.** The sequent calculus G<sup>K</sup>

**Definition 3.** A *derivation* of a sequent $\Gamma \Rightarrow \Delta$ in $G_{\mathsf{K}}$ is a tree in which the nodes are labeled with sequents, the root is labeled with $\Gamma \Rightarrow \Delta$, and every node is the result of an application of some rule of $G_{\mathsf{K}}$ whose premises are the labels of its children in the tree. A derivation is called an $\mathcal{F}$-*derivation* if it employs only sequents composed of formulas from $\mathcal{F}$. The (K)-*depth* of a derivation is the maximal number of applications of rule (K) in any of the branches of the derivation.

**Notation 4.** We write $\vdash_{G_{\mathsf{K}}}^{\mathcal{F},m} \Gamma \Rightarrow \Delta$ if there is a derivation of $\Gamma \Rightarrow \Delta$ in $G_{\mathsf{K}}$ in which only $\mathcal{F}$-sequents occur and that has (K)-depth at most $m$. We drop $\mathcal{F}$ from this notation when $\mathcal{F} = \mathcal{L}$, and drop $m$ to dismiss the restriction regarding the (K)-depth.

*Example 1.* Let $\varphi \stackrel{\text{def}}{=} \Box(p_1 \wedge p_2) \supset (\Box p_1 \wedge \Box p_2)$ and $\mathcal{F} = \mathit{sub}(\varphi)$. The following is a derivation of $\Rightarrow \varphi$ in $G_{\mathsf{K}}$ that only uses $\mathcal{F}$-formulas and has (K)-depth of 1 (though the number of applications of (K) in the derivation is 2):

$$\dfrac{\dfrac{\dfrac{p_1, p_2 \Rightarrow p_1}{p_1 \wedge p_2 \Rightarrow p_1}(\wedge\Rightarrow)}{\Box(p_1 \wedge p_2) \Rightarrow \Box p_1}(K) \qquad \dfrac{\dfrac{p_1, p_2 \Rightarrow p_2}{p_1 \wedge p_2 \Rightarrow p_2}(\wedge\Rightarrow)}{\Box(p_1 \wedge p_2) \Rightarrow \Box p_2}(K)}{\dfrac{\Box(p_1 \wedge p_2) \Rightarrow \Box p_1 \wedge \Box p_2}{\Rightarrow \Box(p_1 \wedge p_2) \supset (\Box p_1 \wedge \Box p_2)}(\Rightarrow\supset)}(\Rightarrow\wedge)$$

#### **3.2 Semantics**

The semantics is based on a four-valued Nmatrix stratified with "levels", where for every $m$, legal valuations of level $m+1$ are a subset of legal valuations of level $m$. The underlying Nmatrix, denoted by $\mathsf{M_K}$, is obtained by duplicating the classical truth values. Thus, the sets of truth values and of designated truth values are given by:

$$\mathcal{V}\_4 \stackrel{\text{def}}{=} \{ \mathsf{T}, \mathsf{t}, \mathsf{f}, \mathsf{F} \} \qquad \qquad \mathcal{D} \stackrel{\text{def}}{=} \{ \mathsf{T}, \mathsf{t} \}.$$

The truth tables are as follows (where $\overline{\mathcal{D}} \stackrel{\text{def}}{=} \mathcal{V}_4 \setminus \mathcal{D} = \{\mathsf{f}, \mathsf{F}\}$):

$$\tilde\neg(x) \stackrel{\text{def}}{=} \begin{cases} \overline{\mathcal{D}} & x \in \mathcal{D} \\ \mathcal{D} & x \notin \mathcal{D} \end{cases} \qquad \tilde\wedge(x, y) \stackrel{\text{def}}{=} \begin{cases} \mathcal{D} & x \in \mathcal{D} \text{ and } y \in \mathcal{D} \\ \overline{\mathcal{D}} & \text{otherwise} \end{cases} \qquad \tilde\vee(x, y) \stackrel{\text{def}}{=} \begin{cases} \mathcal{D} & x \in \mathcal{D} \text{ or } y \in \mathcal{D} \\ \overline{\mathcal{D}} & \text{otherwise} \end{cases}$$

$$\tilde\supset(x, y) \stackrel{\text{def}}{=} \begin{cases} \mathcal{D} & x \notin \mathcal{D} \text{ or } y \in \mathcal{D} \\ \overline{\mathcal{D}} & \text{otherwise} \end{cases} \qquad \tilde\Box(x) \stackrel{\text{def}}{=} \begin{cases} \mathcal{D} & x \in \{\mathsf{T}, \mathsf{F}\} \\ \overline{\mathcal{D}} & \text{otherwise} \end{cases}$$
We employ the following notations for subsets of truth values:

$$\mathsf{TF} \stackrel{\text{def}}{=} \{\mathsf{T}, \mathsf{F}\} \qquad \qquad \mathsf{tf} \stackrel{\text{def}}{=} \{\mathsf{t}, \mathsf{f}\}$$

For the classical connectives, the truth tables of $\mathsf{M_K}$ treat $\mathsf{t}$ just like $\mathsf{T}$, and $\mathsf{f}$ just like $\mathsf{F}$, and are essentially two-valued: the result is either $\mathcal{D}$ or $\overline{\mathcal{D}}$, and it depends solely on whether the inputs are elements of $\mathcal{D}$ or of $\overline{\mathcal{D}}$. Thus, for the language without $\Box$, this Nmatrix provides a (non-economic) four-valued semantics for classical logic.

While the output for $\Box$ is also always $\mathcal{D}$ or $\overline{\mathcal{D}}$, its table differentiates between $\mathsf{T}$ (which results in $\mathcal{D}$) and $\mathsf{t}$ (which results in $\overline{\mathcal{D}}$), and similarly between $\mathsf{F}$ and $\mathsf{f}$. In fact, this table is captured by the condition: $\tilde\Box(x) = \mathcal{D}$ iff $x \in \mathsf{TF}$.
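As a sanity check, the five tables can be encoded compactly; the Python below is our own rendering of the conditions just described, with each table returning the set of possible values (always $\mathcal{D}$ or its complement):

```python
# Our encoding of the M_K truth tables (names ours, not the paper's).
D = {'T', 't'}        # designated values
NOT_D = {'f', 'F'}    # complement of D
TF = {'T', 'F'}

def t_neg(x):
    return NOT_D if x in D else D

def t_and(x, y):
    return D if (x in D and y in D) else NOT_D

def t_or(x, y):
    return D if (x in D or y in D) else NOT_D

def t_impl(x, y):
    return D if (x not in D or y in D) else NOT_D

def t_box(x):
    # the only table that separates T from t and F from f
    return D if x in TF else NOT_D
```

Note that the classical tables inspect only the designation of their inputs, while `t_box` inspects membership in `TF`; e.g., `t_box('f')` is $\{\mathsf{f}, \mathsf{F}\}$, whereas `t_box('F')` is $\{\mathsf{T}, \mathsf{t}\}$.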

*Example 2.* Let $\mathcal{F} = \mathit{sub}(\varphi)$ where $\varphi$ is the formula from Example 1. The following valuation $v$ is an $\mathcal{F}$-valuation that is $\mathsf{M_K}$-legal:

$$v(p\_1) = v(p\_2) = \mathsf{f} \quad v(p\_1 \wedge p\_2) = \mathsf{F} \quad v(\Box p\_1) = v(\Box p\_2) = v(\Box p\_1 \wedge \Box p\_2) = \mathsf{F}$$

$$v(\Box (p\_1 \wedge p\_2)) = \mathsf{T} \quad v(\Box (p\_1 \wedge p\_2) \supset (\Box p\_1 \wedge \Box p\_2)) = \mathsf{F}$$

To show that it is $\mathsf{M_K}$-legal, one needs to verify that $v(\psi) \in \text{pos-val}(\psi, \mathsf{M_K}, v)$ for each $\psi \in \mathcal{F}$. For example, $v(p_1) = \mathsf{f} \in \mathcal{V}_4 = \text{pos-val}(p_1, \mathsf{M_K}, v)$. As another example, since $v(p_1) = \mathsf{f}$, we have that $\text{pos-val}(\Box p_1, \mathsf{M_K}, v) = \tilde\Box(\mathsf{f}) = \{\mathsf{F}, \mathsf{f}\}$, and hence $v(\Box p_1) = \mathsf{F} \in \text{pos-val}(\Box p_1, \mathsf{M_K}, v)$. Notice that $v$ does not satisfy $\varphi$.

The truth table for $\Box$ can be understood via "possible worlds" intuition. Our four truth values are intuitively captured as follows, assuming a given formula $\psi$ and a world $w$:

– $\mathsf{T}$: $\psi$ holds in $w$ and in every world accessible from $w$;
– $\mathsf{t}$: $\psi$ holds in $w$, but not in every world accessible from $w$;
– $\mathsf{F}$: $\psi$ does not hold in $w$, but holds in every world accessible from $w$;
– $\mathsf{f}$: $\psi$ does not hold in $w$, nor in every world accessible from $w$.
In the possible worlds semantics, $\Box\psi$ holds in a world $w$ iff $\psi$ holds in every world that is accessible from $w$, which intuitively explains the table for $\Box$. Note that non-determinism is inherent here. For example, if $\psi$ holds in $w$ and in every world accessible from $w$ (i.e., $\psi$ has value $\mathsf{T}$), we know that $\Box\psi$ holds in $w$, but we do not know whether $\Box\psi$ holds in every world accessible from $w$ (thus $\Box\psi$ has value $\mathsf{T}$ or $\mathsf{t}$).
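This reading can be made precise against a concrete Kripke model. The following sketch is our own illustration (not part of the paper's formal development): it computes the four-valued value of a formula at a world from its classical behaviour there and at the accessible worlds:

```python
# R maps each world to the list of worlds accessible from it;
# `holds` is the classical truth of psi at a world. The four-valued
# value at w records (holds at w?, holds at every accessible world?).
def four_value(R, w, holds):
    here = holds(w)
    everywhere = all(holds(u) for u in R[w])
    if here:
        return 'T' if everywhere else 't'
    return 'F' if everywhere else 'f'
```

For instance, in a two-world model with `R = {1: [2], 2: []}` where $\psi$ holds only at world 2, $\psi$ gets value $\mathsf{F}$ at world 1 (it fails there but holds at every accessible world) and $\mathsf{T}$ at world 2 (it holds there, and vacuously at all successors); in both cases $\Box\psi$ is true, as the table for $\Box$ predicts.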

Now, the Nmatrix $\mathsf{M_K}$ by itself is not adequate for the modal logic K (as Examples 1 and 2 demonstrate). What is missing is the relation between the choices we make to resolve non-determinism for different formulas. Continuing with the possible worlds intuition, we observe that if a formula $\varphi$ follows from a set of formulas $\Sigma$ that hold in all accessible worlds (i.e., $\varphi$ follows from formulas whose truth value is $\mathsf{T}$ or $\mathsf{F}$), then $\varphi$ itself should hold in all accessible worlds (i.e., $\varphi$'s truth value should be $\mathsf{T}$ or $\mathsf{F}$). Directly encoding this condition requires us to consider a set $\mathbb{V}$ of $\mathsf{M_K}$-legal $\mathcal{F}$-valuations for which the following holds (recall Notation 2 from Sect. 2):

$$\forall v \in \mathbb{V}.\ \forall \varphi \in \mathcal{F}.\ \left( v^{-1}[\mathsf{TF}] \vdash_{\mathcal{D}}^{\mathbb{V}} \varphi \implies v(\varphi) \in \mathsf{TF} \right) \qquad (\textit{necessitation})$$

In turn, to obtain completeness we take a maximal set V that satisfies the *necessitation* condition. While it is possible to define this set of valuations as the greatest fixpoint of *necessitation*, following previous work, we find it convenient to reach this set using "levels":

**Definition 5.** The set $\mathbb{V}_{\mathsf{K}}^{\mathcal{F},m}$ is inductively defined as follows:

– $\mathbb{V}_{\mathsf{K}}^{\mathcal{F},0}$ is the set of $\mathsf{M_K}$-legal $\mathcal{F}$-valuations.
– $\mathbb{V}_{\mathsf{K}}^{\mathcal{F},m+1} \stackrel{\text{def}}{=} \left\{ v \in \mathbb{V}_{\mathsf{K}}^{\mathcal{F},m} \mid \forall \varphi \in \mathcal{F}.\ v^{-1}[\mathsf{TF}] \vdash_{\mathcal{D}}^{\mathbb{V}_{\mathsf{K}}^{\mathcal{F},m}} \varphi \implies v(\varphi) \in \mathsf{TF} \right\}$

We also define:

$$\mathbb{V}_{\mathsf{K}}^{\mathcal{F}} \stackrel{\text{def}}{=} \bigcap_{m \geq 0} \mathbb{V}_{\mathsf{K}}^{\mathcal{F},m} \qquad \qquad \mathbb{V}_{\mathsf{K}}^{m} \stackrel{\text{def}}{=} \mathbb{V}_{\mathsf{K}}^{\mathcal{L},m} \qquad \qquad \mathbb{V}_{\mathsf{K}} \stackrel{\text{def}}{=} \bigcap_{m \geq 0} \mathbb{V}_{\mathsf{K}}^{\mathcal{L},m}$$

Similarly to the idea originated by Kearns in [10], valuations are partitioned into *levels*, which are inductively defined. The first level, $\mathbb{V}_{\mathsf{K}}^{\mathcal{F},0}$, consists solely of the $\mathsf{M_K}$-legal valuations with domain $\mathcal{F}$. For each $m > 0$, the $m$'th level is defined as a subset of the $(m-1)$'th level, with an additional constraint: a valuation $v$ from level $m-1$ remains in level $m$ only if every formula $\varphi \in \mathcal{F}$ entailed (at the $(m-1)$'th level) from the set of formulas that were assigned a value from $\mathsf{TF}$ by $v$ is itself assigned a value from $\mathsf{TF}$ by $v$. As we show below, at the "end" of this process, by taking $\bigcap_{m \geq 0} \mathbb{V}_{\mathsf{K}}^{\mathcal{F},m}$, one obtains the greatest set $\mathbb{V}$ satisfying the *necessitation* condition.
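For a finite $\mathcal{F}$, one refinement step of Definition 5 can be sketched directly. In the Python below (encoding ours), valuations are dictionaries from formulas to values, and entailment over a finite set of valuations is checked by enumeration:

```python
D = {'T', 't'}    # designated values
TF = {'T', 'F'}

def entails(V, premises, phi):
    """premises |-_D^V phi: every valuation in V that designates all
    the premises also designates phi."""
    return all(v[phi] in D for v in V
               if all(v[p] in D for p in premises))

def next_level(V, formulas):
    """Keep v only if every formula entailed (over the current level V)
    from v^{-1}[TF] is itself assigned a value in TF by v."""
    kept = []
    for v in V:
        pre = [p for p in formulas if v[p] in TF]
        if all(v[phi] in TF for phi in formulas
               if entails(V, pre, phi)):
            kept.append(v)
    return kept
```

For instance, if $V$ happens to contain only the valuations assigning $\mathsf{T}$ and $\mathsf{t}$ to a formula $p$, then $p$ is entailed over $V$ from the empty set, so the refinement removes the valuation assigning $\mathsf{t} \notin \mathsf{TF}$.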

*Remark 1.* The *necessitation* condition is similar to the one provided in [8] for the modal logics KT and S4. In contrast, the condition from [4,10,14] is simpler and does not involve $v^{-1}[\mathsf{TF}]$ at all, but also does not give rise to decision procedures.

*Example 3.* Following Example 2, while the formula $\varphi$ is not satisfied by all valuations in $\mathbb{V}_{\mathsf{K}}^{\mathcal{F},0}$, it is satisfied by all valuations in $\mathbb{V}_{\mathsf{K}}^{\mathcal{F},m}$ for every $m > 0$. In particular, the valuation $v$ from Example 2 is not in $\mathbb{V}_{\mathsf{K}}^{\mathcal{F},1}$: we have $p_1 \wedge p_2 \vdash_{\mathcal{D}}^{\mathbb{V}_{\mathsf{K}}^{\mathcal{F},0}} p_1$ and $v(p_1 \wedge p_2) = \mathsf{F}$ (so $p_1 \wedge p_2 \in v^{-1}[\mathsf{TF}]$), but $v(p_1) = \mathsf{f} \notin \mathsf{TF}$.

For each set $\mathcal{F} \subseteq \mathcal{L}$ and $m \geq 0$, we obtain a consequence relation $\vdash_{\mathcal{D}}^{\mathbb{V}_{\mathsf{K}}^{\mathcal{F},m}}$ between sets of $\mathcal{F}$-formulas. Disregarding $m$, we also obtain the relation $\vdash_{\mathcal{D}}^{\mathbb{V}_{\mathsf{K}}^{\mathcal{F}}}$ (for every $\mathcal{F}$), which we will show to be sound and complete for K. We note that all these relations are compact. The proof of the following theorem relies on the completeness theorems that we prove in Sect. 4.

#### **Theorem 1 (Compactness).**

*1. For every $m \geq 0$, if $L \vdash_{\mathcal{D}}^{\mathbb{V}_{\mathsf{K}}^{\mathcal{F},m}} R$, then $\Gamma \vdash_{\mathcal{D}}^{\mathbb{V}_{\mathsf{K}}^{\mathcal{F},m}} \Delta$ for some finite $\Gamma \subseteq L$ and $\Delta \subseteq R$.*
*2. If $L \vdash_{\mathcal{D}}^{\mathbb{V}_{\mathsf{K}}^{\mathcal{F}}} R$, then $\Gamma \vdash_{\mathcal{D}}^{\mathbb{V}_{\mathsf{K}}^{\mathcal{F}}} \Delta$ for some finite $\Gamma \subseteq L$ and $\Delta \subseteq R$.*

Now, to show that $\mathbb{V}_{\mathsf{K}}^{\mathcal{F}}$ is indeed the largest set $\mathbb{V}$ of $\mathsf{M_K}$-legal $\mathcal{F}$-valuations that satisfies *necessitation*, we use the following two lemmas. The first is a general construction that relies only on the use of *finite-valued* valuation functions.

**Lemma 2.** *Let $v_0, v_1, v_2, \ldots$ be an infinite sequence of valuations over a common domain $\mathcal{F}$. Then, there exists some $v$ such that for every finite set $\mathcal{F}' \subseteq \mathcal{F}$ of formulas and $m \geq 0$, we have $v|_{\mathcal{F}'} = v_k|_{\mathcal{F}'}$ for some $k \geq m$.*

*Proof (Outline).* First, if $\mathcal{F}$ is finite, then there is only a finite number of $\mathcal{F}$-valuations, and there must exist some $\mathcal{F}$-valuation that occurs infinitely often in the sequence $v_0, v_1, \ldots$. We take $v$ to be such a valuation, and the required property trivially holds. Now, assume that $\mathcal{F}$ is infinite, and let $\varphi_0, \varphi_1, \ldots$ be an enumeration of the formulas in $\mathcal{F}$. For every $i \geq 0$, let $\mathcal{F}_i = \{\varphi_0, \ldots, \varphi_i\}$. We construct a sequence of infinite sets $A_0, A_1, \ldots \subseteq \mathbb{N}$ such that:

– For every $i \geq 0$, $A_{i+1} \subseteq A_i$.
– For every $0 \leq j \leq i$, $a \in A_j$, and $b \in A_i$, we have $v_a(\varphi_j) = v_b(\varphi_j)$.

To do so, take some infinite set $A_0 \subseteq \mathbb{N}$ such that $v_a(\varphi_0) = v_b(\varphi_0)$ for every $a, b \in A_0$ (such a set must exist since we have a finite number of truth values). Then, given $A_i$, we let $A_{i+1}$ be some infinite subset of $A_i$ such that $v_a(\varphi_{i+1}) = v_b(\varphi_{i+1})$ for every $a, b \in A_{i+1}$. The valuation $v$ is defined by $v(\varphi_i) = v_a(\varphi_i)$ for some $a \in A_i$. The properties of the $A_i$'s ensure that $v$ is well defined, and it can be shown that it also satisfies the required property.

Using Lemma 2 and the compactness property, we can show the following:

**Lemma 3.** *Let $v_0, v_1, \ldots$ be a sequence of valuations over a common domain $\mathcal{F}$ such that $v_m \in \mathbb{V}_{\mathsf{K}}^{\mathcal{F},m}$ for every $m \geq 0$. Then, there exists some $v \in \mathbb{V}_{\mathsf{K}}^{\mathcal{F}}$ such that for every $\varphi \in \mathcal{F}$, $v(\varphi) = v_m(\varphi)$ for some $m \geq 0$.*

*Proof (Outline).* By Lemma 2, there exists some $v$ such that for every finite set $\mathcal{F}'$ of formulas, $v|_{\mathcal{F}'} = v_m|_{\mathcal{F}'}$ for some $m \geq 0$. It is easy to verify that $v$ satisfies the required properties. In particular, one shows that $v \in \mathbb{V}_{\mathsf{K}}^{\mathcal{F},m}$ for every $m \geq 0$ by induction on $m$. In that proof we use Theorem 1 to obtain a finite $\Gamma \subseteq v^{-1}[\mathsf{TF}]$ such that $\Gamma \vdash_{\mathcal{D}}^{\mathbb{V}_{\mathsf{K}}^{\mathcal{F},m-1}} \varphi$ from the assumption that $v^{-1}[\mathsf{TF}] \vdash_{\mathcal{D}}^{\mathbb{V}_{\mathsf{K}}^{\mathcal{F},m-1}} \varphi$. Then, the above property of $v$ is applied with $\mathcal{F}' = \Gamma \cup \{\varphi\}$.

Now, our characterization theorem easily follows:

**Theorem 2.** *The set $\mathbb{V}_{\mathsf{K}}^{\mathcal{F}}$ is the largest set $\mathbb{V}$ of $\mathsf{M_K}$-legal $\mathcal{F}$-valuations that satisfies necessitation.*

*Proof (Outline).* To prove that $\mathbb{V}_{\mathsf{K}}^{\mathcal{F}}$ satisfies *necessitation*, one needs to prove that if $v^{-1}[\mathsf{TF}] \vdash_{\mathcal{D}}^{\mathbb{V}_{\mathsf{K}}^{\mathcal{F}}} \varphi$, then also $v^{-1}[\mathsf{TF}] \vdash_{\mathcal{D}}^{\mathbb{V}_{\mathsf{K}}^{\mathcal{F},m}} \varphi$ for some $m \geq 0$. This is done using Lemma 3. For maximality, given a set $\mathbb{V}$ that satisfies *necessitation*, we assume by contradiction that there is some $m$ such that $\mathbb{V} \not\subseteq \mathbb{V}_{\mathsf{K}}^{\mathcal{F},m}$, take a minimal such $m$, and show that it cannot be 0. Then, from $\mathbb{V} \subseteq \mathbb{V}_{\mathsf{K}}^{\mathcal{F},m-1}$, it follows that actually $\mathbb{V} \subseteq \mathbb{V}_{\mathsf{K}}^{\mathcal{F},m}$, and thus we obtain a contradiction.

**Finite Domain.** By definition we have $\mathbb{V}_{\mathsf{K}}^{\mathcal{F},0} \supseteq \mathbb{V}_{\mathsf{K}}^{\mathcal{F},1} \supseteq \mathbb{V}_{\mathsf{K}}^{\mathcal{F},2} \supseteq \cdots$ (and so, $\vdash_{\mathcal{D}}^{\mathbb{V}_{\mathsf{K}}^{\mathcal{F},0}} \subseteq\ \vdash_{\mathcal{D}}^{\mathbb{V}_{\mathsf{K}}^{\mathcal{F},1}} \subseteq\ \vdash_{\mathcal{D}}^{\mathbb{V}_{\mathsf{K}}^{\mathcal{F},2}} \subseteq \cdots$). Next, we show that when $\mathcal{F}$ is finite, this sequence must converge.

**Lemma 4.** *Suppose that $\mathbb{V}_{\mathsf{K}}^{\mathcal{F},m} = \mathbb{V}_{\mathsf{K}}^{\mathcal{F},m+1}$ for some $m \geq 0$. Then, $\mathbb{V}_{\mathsf{K}}^{\mathcal{F}} = \mathbb{V}_{\mathsf{K}}^{\mathcal{F},m}$.*

**Lemma 5.** *For a* finite *set $\mathcal{F}$ of formulas, $\mathbb{V}_{\mathsf{K}}^{\mathcal{F}} = \mathbb{V}_{\mathsf{K}}^{\mathcal{F},4^{|\mathcal{F}|}}$.*

*Proof.* The left-to-right inclusion follows from our definitions. For the right-to-left inclusion, note that by Lemma 4, $\mathbb{V}_{\mathsf{K}}^{\mathcal{F},m} = \mathbb{V}_{\mathsf{K}}^{\mathcal{F},m+1}$ implies that $\mathbb{V}_{\mathsf{K}}^{\mathcal{F},m} = \mathbb{V}_{\mathsf{K}}^{\mathcal{F},k}$ for every $k \geq m$. Thus, it suffices to show that $\mathbb{V}_{\mathsf{K}}^{\mathcal{F},m} = \mathbb{V}_{\mathsf{K}}^{\mathcal{F},m+1}$ for some $0 \leq m \leq 4^{|\mathcal{F}|}$. Indeed, otherwise we have $\mathbb{V}_{\mathsf{K}}^{\mathcal{F},0} \supset \mathbb{V}_{\mathsf{K}}^{\mathcal{F},1} \supset \mathbb{V}_{\mathsf{K}}^{\mathcal{F},2} \supset \cdots \supset \mathbb{V}_{\mathsf{K}}^{\mathcal{F},4^{|\mathcal{F}|}+1}$, but this is impossible since there are only $4^{|\mathcal{F}|}$ functions from $\mathcal{F}$ to $\mathcal{V}_4$.

**Optimized Tables.** Starting from level 1, the condition on valuations allows us to refine the truth tables of $\mathsf{M_K}$ and reduce the search space for countermodels. For instance, since $\psi \vdash_{\mathcal{D}}^{\mathbb{V}_{\mathsf{K}}^{\mathcal{F},0}} \varphi \supset \psi$ (for every $\mathcal{F}$ with $\{\psi, \varphi, \varphi \supset \psi\} \subseteq \mathcal{F}$), at level 1 we have that if $\psi \in v^{-1}[\mathsf{TF}]$, then $v(\varphi \supset \psi) \in \mathsf{TF}$. This allows us to remove $\mathsf{t}$ and $\mathsf{f}$ from the entries of the table for $\tilde\supset$ in which $y \in \mathsf{TF}$. The following entailments (at level 0), all with a single occurrence of some connective, lead to similar refinements, resulting in optimized tables for $\supset$, $\wedge$, and $\vee$:

$$\varphi, \varphi \supset \psi \vdash_{\mathcal{D}}^{\mathbb{V}_{\mathsf{K}}^{\mathcal{F},0}} \psi \qquad \varphi, \psi \vdash_{\mathcal{D}}^{\mathbb{V}_{\mathsf{K}}^{\mathcal{F},0}} \varphi \wedge \psi \qquad \varphi \wedge \psi \vdash_{\mathcal{D}}^{\mathbb{V}_{\mathsf{K}}^{\mathcal{F},0}} \varphi \qquad \varphi \wedge \psi \vdash_{\mathcal{D}}^{\mathbb{V}_{\mathsf{K}}^{\mathcal{F},0}} \psi$$

$$\varphi \vdash_{\mathcal{D}}^{\mathbb{V}_{\mathsf{K}}^{\mathcal{F},0}} \varphi \vee \psi \qquad \psi \vdash_{\mathcal{D}}^{\mathbb{V}_{\mathsf{K}}^{\mathcal{F},0}} \varphi \vee \psi$$


We note that level 1 valuations are not fully captured by these tables. For example, they must assign $\mathsf{T}$ to every formula of the form $\varphi \supset \varphi$, while the table above also allows $\mathsf{t}$ when $v(\varphi) \in \mathsf{tf}$. A decision procedure for K can benefit from relying on these optimized tables instead of the original ones, starting from level 1.

## **4 Soundness and Completeness**

In this section we establish the soundness and completeness of the proposed semantics. For that matter, we first extend the notion of satisfaction to sequents:

**Definition 6.** An $\mathcal{F}$-valuation $v$ $\mathcal{D}$-*satisfies* an $\mathcal{F}$-sequent $\Gamma \Rightarrow \Delta$, denoted by $v \models_{\mathcal{D}} \Gamma \Rightarrow \Delta$, if $v \not\models_{\mathcal{D}} \varphi$ for some $\varphi \in \Gamma$ or $v \models_{\mathcal{D}} \varphi$ for some $\varphi \in \Delta$.

To prove soundness, we first note that except for (K), the soundness of each derivation rule easily follows from the Nmatrix semantics:

**Lemma 6 (Local Soundness).** *Consider an application of a rule of $G_{\mathsf{K}}$ other than (K) deriving a sequent $\Gamma \Rightarrow \Delta$ from sequents $\Gamma_1 \Rightarrow \Delta_1, \ldots, \Gamma_n \Rightarrow \Delta_n$, such that $\Gamma \cup \Gamma_1 \cup \cdots \cup \Gamma_n \cup \Delta \cup \Delta_1 \cup \cdots \cup \Delta_n \subseteq \mathcal{F}$. Let $v \in \mathbb{V}_{\mathsf{K}}^{\mathcal{F},m}$ for some $m \geq 0$. If $v \models_{\mathcal{D}} \Gamma_i \Rightarrow \Delta_i$ for every $1 \leq i \leq n$, then $v \models_{\mathcal{D}} \Gamma \Rightarrow \Delta$.*

For (K), we make use of the level requirement, and prove the following lemma.

**Lemma 7 (Soundness of (K)).** *Suppose that $\Gamma \cup \Box\Gamma \cup \{\varphi, \Box\varphi\} \subseteq \mathcal{F}$ and $\Gamma \vdash_{\mathcal{D}}^{\mathbb{V}_{\mathsf{K}}^{\mathcal{F},m-1}} \varphi$. Then, $\Box\Gamma \vdash_{\mathcal{D}}^{\mathbb{V}_{\mathsf{K}}^{\mathcal{F},m}} \Box\varphi$.*

*Proof.* Let $v \in \mathbb{V}_{\mathsf{K}}^{\mathcal{F},m}$ such that $v \models_{\mathcal{D}} \Box\Gamma$. We prove that $v \models_{\mathcal{D}} \Box\varphi$. By the truth table of $\Box$, we have that $v(\psi) \in \mathsf{TF}$ for every $\psi \in \Gamma$, and we need to show that $v(\varphi) \in \mathsf{TF}$. Since $v(\psi) \in \mathsf{TF}$ for every $\psi \in \Gamma$, we have $\Gamma \subseteq v^{-1}[\mathsf{TF}]$. Since $\Gamma \vdash_{\mathcal{D}}^{\mathbb{V}_{\mathsf{K}}^{\mathcal{F},m-1}} \varphi$, we have $v^{-1}[\mathsf{TF}] \vdash_{\mathcal{D}}^{\mathbb{V}_{\mathsf{K}}^{\mathcal{F},m-1}} \varphi$. Since $v \in \mathbb{V}_{\mathsf{K}}^{\mathcal{F},m}$, it follows that $v(\varphi) \in \mathsf{TF}$.

The above two lemmas together establish soundness, and from soundness for each level, we easily derive soundness for arbitrary (K)-depth.

**Theorem 3 (Soundness for $m$).** *If $\vdash_{G_{\mathsf{K}}}^{\mathcal{F},m} \Gamma \Rightarrow \Delta$, then $\Gamma \vdash_{\mathcal{D}}^{\mathbb{V}_{\mathsf{K}}^{\mathcal{F},m}} \Delta$.*

**Theorem 4 (Soundness without $m$).** *If $\vdash_{G_{\mathsf{K}}}^{\mathcal{F}} \Gamma \Rightarrow \Delta$, then $\Gamma \vdash_{\mathcal{D}}^{\mathbb{V}_{\mathsf{K}}^{\mathcal{F}}} \Delta$.*

By taking $\mathcal{F} = \mathcal{L}$ in Theorem 4 we get that if $\vdash_{G_{\mathsf{K}}} \Gamma \Rightarrow \Delta$, then $\Gamma \vdash_{\mathcal{D}}^{\mathbb{V}_{\mathsf{K}}} \Delta$. Next, we prove the following two completeness theorems:

**Theorem 5 (Completeness for $m$).** *Let $\mathcal{F} \subseteq \mathcal{L}$ be closed under subformulas and $\Gamma \Rightarrow \Delta$ an $\mathcal{F}$-sequent. If $\Gamma \vdash_{\mathcal{D}}^{\mathbb{V}_{\mathsf{K}}^{\mathcal{F},m}} \Delta$, then $\vdash_{G_{\mathsf{K}}}^{\mathcal{F},m} \Gamma \Rightarrow \Delta$.*

**Theorem 6 (Completeness without $m$).** *Let $\mathcal{F} \subseteq \mathcal{L}$ be closed under subformulas and $\Gamma \Rightarrow \Delta$ an $\mathcal{F}$-sequent. If $\Gamma \vdash_{\mathcal{D}}^{\mathbb{V}_{\mathsf{K}}^{\mathcal{F}}} \Delta$, then $\vdash_{G_{\mathsf{K}}}^{\mathcal{F}} \Gamma \Rightarrow \Delta$.*

In fact, since $\mathcal{F}$ may be infinite, we need to prove theorems stronger than Theorems 5 and 6, which incorporate infinite sequents.

**Definition 7.** An $\omega$-sequent is a pair $\langle L, R \rangle$, denoted by $L \Rightarrow R$, such that $L$ and $R$ are (possibly infinite) sets of formulas. We write $\vdash_{G_{\mathsf{K}}}^{\mathcal{F},m} L \Rightarrow R$ if $\vdash_{G_{\mathsf{K}}}^{\mathcal{F},m} \Gamma \Rightarrow \Delta$ for some finite $\Gamma \subseteq L$ and $\Delta \subseteq R$.

Other notions for sequents (e.g., being an $\mathcal{F}$-sequent) are extended to $\omega$-sequents in the obvious way. In particular, $v \models_{\mathcal{D}} L \Rightarrow R$ if $v(\psi) \notin \mathcal{D}$ for some $\psi \in L$ or $v(\psi) \in \mathcal{D}$ for some $\psi \in R$.

**Theorem 7 ($\omega$-Completeness for $m$).** *Let $\mathcal{F} \subseteq \mathcal{L}$ be closed under subformulas and $L \Rightarrow R$ an $\omega$-$\mathcal{F}$-sequent. If $L \vdash_{\mathcal{D}}^{\mathbb{V}_{\mathsf{K}}^{\mathcal{F},m}} R$, then $\vdash_{G_{\mathsf{K}}}^{\mathcal{F},m} L \Rightarrow R$.*

**Theorem 8 ($\omega$-Completeness without $m$).** *Let $\mathcal{F} \subseteq \mathcal{L}$ be closed under subformulas and $L \Rightarrow R$ an $\omega$-$\mathcal{F}$-sequent. If $L \vdash_{\mathcal{D}}^{\mathbb{V}_{\mathsf{K}}^{\mathcal{F}}} R$, then $\vdash_{G_{\mathsf{K}}}^{\mathcal{F}} L \Rightarrow R$.*

Theorem 5 is a consequence of Theorem 7. Indeed, by Theorem 7, $\Gamma \vdash_{\mathcal{D}}^{\mathbb{V}_{\mathsf{K}}^{\mathcal{F},m}} \Delta$ implies that $\vdash_{G_{\mathsf{K}}}^{\mathcal{F},m} \Gamma' \Rightarrow \Delta'$ for some (finite) $\Gamma' \subseteq \Gamma$ and $\Delta' \subseteq \Delta$. Using (weak), we obtain that $\vdash_{G_{\mathsf{K}}}^{\mathcal{F},m} \Gamma \Rightarrow \Delta$. Similarly, Theorem 6 is a consequence of Theorem 8. Also, using Lemma 3, we obtain Theorem 8 from Theorem 7. Hence, in the remainder of this section we focus on the proof of Theorem 7.

**Proof of Theorem 7.** We start by defining maximal and consistent $\omega$-sequents, and proving their existence.

**Definition 8 (Maximal and consistent $\omega$-sequent).** Let $\mathcal{F} \subseteq \mathcal{L}$ and $m \geq 0$. An $\mathcal{F}$-$\omega$-sequent $L \Rightarrow R$ is called:

– $G_{\mathsf{K}}, \mathcal{F}, m$-*consistent* if $\nvdash_{G_{\mathsf{K}}}^{\mathcal{F},m} L \Rightarrow R$;
– $G_{\mathsf{K}}, \mathcal{F}, m$-*maximal* if $L \cup R = \mathcal{F}$;
– $G_{\mathsf{K}}, \mathcal{F}, m$-*max-con* if it is both $G_{\mathsf{K}}, \mathcal{F}, m$-consistent and $G_{\mathsf{K}}, \mathcal{F}, m$-maximal.
**Lemma 8.** *Let $\mathcal{F} \subseteq \mathcal{L}$ and $L \Rightarrow R$ an $\mathcal{F}$-$\omega$-sequent. Suppose that $\nvdash_{G_{\mathsf{K}}}^{\mathcal{F},m} L \Rightarrow R$. Then, there exist sets $L^{MC(G_{\mathsf{K}},\mathcal{F},m,L \Rightarrow R)}$ and $R^{MC(G_{\mathsf{K}},\mathcal{F},m,L \Rightarrow R)}$ such that the following hold:*

*1. $L \subseteq L^{MC(G_{\mathsf{K}},\mathcal{F},m,L \Rightarrow R)}$ and $R \subseteq R^{MC(G_{\mathsf{K}},\mathcal{F},m,L \Rightarrow R)}$;*
*2. the $\omega$-sequent $L^{MC(G_{\mathsf{K}},\mathcal{F},m,L \Rightarrow R)} \Rightarrow R^{MC(G_{\mathsf{K}},\mathcal{F},m,L \Rightarrow R)}$ is $G_{\mathsf{K}}, \mathcal{F}, m$-max-con.*
Thus, given an underivable $\omega$-sequent, we can extend it to a $G_{\mathsf{K}}, \mathcal{F}, m$-max-con $\omega$-sequent. This $\omega$-sequent induces the canonical countermodel, as defined next.

#### **Algorithm 1.** Deciding $\Gamma \vdash_{\mathcal{D}}^{\mathbb{V}_{\mathsf{K}}} \varphi$.

1: $\mathcal{F} \leftarrow \mathit{sub}(\Gamma \cup \{\varphi\})$
2: $m \leftarrow 4^{|\mathcal{F}|}$
3: **for** $v \in \mathbb{V}_{\mathsf{K}}^{\mathcal{F},m}$ **do**
4: **if** $v \models_{\mathcal{D}} \Gamma$ and $v \not\models_{\mathcal{D}} \varphi$ **then**
5: **return** ("NO", $v$)
6: **return** "YES"
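Algorithm 1 can be prototyped end-to-end for a small fragment. The Python below is our own illustration, under stated assumptions: it covers only the $\{\wedge, \Box\}$ fragment, uses our tuple encoding of formulas, and iterates the refinement step to its fixpoint rather than exactly $4^{|\mathcal{F}|}$ times (which yields the same set of valuations by Lemmas 4 and 5):

```python
from itertools import product

# Formulas: ('atom', s), ('and', a, b), ('box', a).  (Encoding ours.)
V4 = ['T', 't', 'f', 'F']     # truth values of M_K
D = {'T', 't'}                # designated values
TF = {'T', 'F'}               # "holds in all accessible worlds"

def sub(phi):
    """The set of subformulas of phi."""
    s = {phi}
    if phi[0] != 'atom':
        for a in phi[1:]:
            s |= sub(a)
    return s

def pos_val(phi, v):
    """Possible values of a compound formula under M_K."""
    if phi[0] == 'and':
        both = v[phi[1]] in D and v[phi[2]] in D
        return D if both else {'f', 'F'}
    return D if v[phi[1]] in TF else {'f', 'F'}   # 'box'

def legal_valuations(F):
    """Level 0: all M_K-legal F-valuations."""
    fs = sorted(F, key=str)
    for vals in product(V4, repeat=len(fs)):
        v = dict(zip(fs, vals))
        if all(f[0] == 'atom' or v[f] in pos_val(f, v) for f in fs):
            yield v

def entails(V, premises, phi):
    """premises |-_D^V phi, checked by enumerating V."""
    return all(v[phi] in D for v in V
               if all(v[p] in D for p in premises))

def refine(V, F):
    """One level step of Definition 5 (the necessitation condition)."""
    return [v for v in V
            if all(v[phi] in TF for phi in F
                   if entails(V, [p for p in F if v[p] in TF], phi))]

def decide(Gamma, phi):
    """Algorithm 1, with the refinement iterated to its fixpoint."""
    F = sub(phi)
    for g in Gamma:
        F |= sub(g)
    V = list(legal_valuations(F))
    while True:
        W = refine(V, F)
        if len(W) == len(V):   # refine only removes, so sizes suffice
            break
        V = W
    for v in V:
        if all(v[g] in D for g in Gamma) and v[phi] not in D:
            return ('NO', v)
    return 'YES'
```

For instance, $\Box(p \wedge q)$ entails $\Box p$ in K, so `decide` answers "YES" there, while for $\Box p$ versus $p$ it returns "NO" together with a countermodel valuation.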

**Notation 9.** We denote the set $\{\psi \in \mathcal{F} \mid \Box\psi \in X\}$ by $\mathcal{B}^{X}_{\mathcal{F}}$.

**Definition 10.** Suppose that $L \cup R = \mathcal{F}$. The *canonical model w.r.t. $L \Rightarrow R$, $\mathcal{F}$, and $m$*, denoted by $v(\mathcal{F}, L \Rightarrow R, m)$, is the $\mathcal{F}$-valuation defined as follows (in $\lambda$ notation):

$$\begin{aligned}
& \text{For } m = 0: & & \text{For } m > 0: \\
& \lambda\varphi \in \mathcal{F}.\ \begin{cases} \mathsf{T} & \varphi \in L \text{ and } \Box\varphi \in L \\ \mathsf{t} & \varphi \in L \text{ and } \Box\varphi \notin L \\ \mathsf{F} & \varphi \in R \text{ and } \Box\varphi \in L \\ \mathsf{f} & \varphi \in R \text{ and } \Box\varphi \notin L \end{cases} & & \lambda\varphi \in \mathcal{F}.\ \begin{cases} \mathsf{T} & \varphi \in L \text{ and } \vdash_{G_{\mathsf{K}}}^{\mathcal{F},m-1} \mathcal{B}^{L}_{\mathcal{F}} \Rightarrow \varphi \\ \mathsf{t} & \varphi \in L \text{ and } \nvdash_{G_{\mathsf{K}}}^{\mathcal{F},m-1} \mathcal{B}^{L}_{\mathcal{F}} \Rightarrow \varphi \\ \mathsf{F} & \varphi \in R \text{ and } \vdash_{G_{\mathsf{K}}}^{\mathcal{F},m-1} \mathcal{B}^{L}_{\mathcal{F}} \Rightarrow \varphi \\ \mathsf{f} & \varphi \in R \text{ and } \nvdash_{G_{\mathsf{K}}}^{\mathcal{F},m-1} \mathcal{B}^{L}_{\mathcal{F}} \Rightarrow \varphi \end{cases}
\end{aligned}$$

Clearly, $v(\mathcal{F}, L \Rightarrow R, m) \not\models_{\mathcal{D}} L \Rightarrow R$. The proof of Theorem 7 is done by induction on $m$, and then carries on by showing that if $L \Rightarrow R$ is $G_{\mathsf{K}}, \mathcal{F}, m$-max-con, then $v(\mathcal{F}, L \Rightarrow R, m)$ belongs to $\mathbb{V}_{\mathsf{K}}^{\mathcal{F},m}$.

Concretely, let $v \stackrel{\text{def}}{=} v(\mathcal{F}, L \Rightarrow R, m)$. We show that $v \in \mathbb{V}_{\mathsf{K}}^{\mathcal{F},k}$ for every $k \leq m$ by induction on $k$. The base case $k = 0$ is straightforward. For $k > 0$, we have $v \in \mathbb{V}_{\mathsf{K}}^{\mathcal{F},k-1}$ by the induction hypothesis. Let $\varphi \in \mathcal{F}$, and suppose that $v^{-1}[\mathsf{TF}] \vdash_{\mathcal{D}}^{\mathbb{V}_{\mathsf{K}}^{\mathcal{F},k-1}} \varphi$. To show that $v(\varphi) \in \mathsf{TF}$, we prove that $\vdash_{G_{\mathsf{K}}}^{\mathcal{F},m-1} \mathcal{B}^{L}_{\mathcal{F}} \Rightarrow \varphi$. By the outer induction hypothesis (regarding the completeness theorem itself), $v^{-1}[\mathsf{TF}] \vdash_{\mathcal{D}}^{\mathbb{V}_{\mathsf{K}}^{\mathcal{F},k-1}} \varphi$ implies that $\vdash_{G_{\mathsf{K}}}^{\mathcal{F},k-1} v^{-1}[\mathsf{TF}] \Rightarrow \varphi$, which implies that $\vdash_{G_{\mathsf{K}}}^{\mathcal{F},m-1} v^{-1}[\mathsf{TF}] \Rightarrow \varphi$. Hence, there is a finite set $\{\varphi_1, \ldots, \varphi_n\} \subseteq v^{-1}[\mathsf{TF}]$ such that $\vdash_{G_{\mathsf{K}}}^{\mathcal{F},m-1} \{\varphi_1, \ldots, \varphi_n\} \Rightarrow \varphi$. For every $1 \leq i \leq n$, since $\varphi_i \in v^{-1}[\mathsf{TF}]$, we have that $\vdash_{G_{\mathsf{K}}}^{\mathcal{F},m-1} \mathcal{B}^{L}_{\mathcal{F}} \Rightarrow \varphi_i$, and hence $\vdash_{G_{\mathsf{K}}}^{\mathcal{F},m-1} \Gamma_i \Rightarrow \varphi_i$ for some finite $\Gamma_i \subseteq \mathcal{B}^{L}_{\mathcal{F}}$. Using $n$ applications of (cut) on these sequents and $\vdash_{G_{\mathsf{K}}}^{\mathcal{F},m-1} \{\varphi_1, \ldots, \varphi_n\} \Rightarrow \varphi$, we obtain that $\vdash_{G_{\mathsf{K}}}^{\mathcal{F},m-1} \Gamma_1, \ldots, \Gamma_n \Rightarrow \varphi$, and so $\vdash_{G_{\mathsf{K}}}^{\mathcal{F},m-1} \mathcal{B}^{L}_{\mathcal{F}} \Rightarrow \varphi$.

#### **5 Effectiveness of the Semantics**

In this section we study the effectiveness of the semantics introduced in Definition 5 for deciding $\vdash_{\mathcal{D}}^{\mathbb{V}_{\mathsf{K}}}$. Roughly speaking, a semantic framework is said to be *effective* if it induces a decision procedure that decides its underlying logic.

Consider Algorithm 1. Given a finite set $\Gamma$ of formulas and a formula $\varphi$, it checks whether any valuation in $\mathbb{V}_{\mathsf{K}}^{\mathcal{F},m}$ is a countermodel. The correctness of this algorithm relies on the analyticity of $G_{\mathsf{K}}$, namely:

**Lemma 9 (**[11]**).** *If $\vdash_{G_{\mathsf{K}}} \Gamma \Rightarrow \Delta$, then $\vdash_{G_{\mathsf{K}}}^{\mathit{sub}(\Gamma \cup \Delta)} \Gamma \Rightarrow \Delta$.*

Using Lemma 9, we show that the algorithm is correct.

#### **Lemma 10.** *Algorithm 1 always terminates, and returns "YES" iff $\Gamma \vdash_{\mathcal{D}}^{\mathbb{V}_{\mathsf{K}}} \varphi$.*

*Proof.* Termination follows from the fact that $\mathbb{V}_{\mathsf{K}}^{\mathcal{F},m}$ is finite. Suppose that the result is "YES" and assume for contradiction that $\Gamma \nvdash_{\mathcal{D}}^{\mathbb{V}_{\mathsf{K}}} \varphi$. Then, there exists some $u \in \mathbb{V}_{\mathsf{K}}$ such that $u \models_{\mathcal{D}} \Gamma$ and $u \not\models_{\mathcal{D}} \varphi$. Consider $v \stackrel{\text{def}}{=} u|_{\mathcal{F}}$. Then, $v \in \mathbb{V}_{\mathsf{K}}^{\mathcal{F}} \subseteq \mathbb{V}_{\mathsf{K}}^{\mathcal{F},m}$, which contradicts the fact that the algorithm returns "YES". Now, suppose that the result is "NO". Then, there exists some $v \in \mathbb{V}_{\mathsf{K}}^{\mathcal{F},m}$ such that $v \models_{\mathcal{D}} \Gamma$ and $v \not\models_{\mathcal{D}} \varphi$. By Lemma 5, $v \in \mathbb{V}_{\mathsf{K}}^{\mathcal{F}}$. Hence, $\Gamma \nvdash_{\mathcal{D}}^{\mathbb{V}_{\mathsf{K}}^{\mathcal{F}}} \varphi$. By Theorem 4, we have $\nvdash_{G_{\mathsf{K}}}^{\mathcal{F}} \Gamma \Rightarrow \varphi$. By Lemma 9, we have $\nvdash_{G_{\mathsf{K}}} \Gamma \Rightarrow \varphi$. By Theorem 6, we have $\Gamma \nvdash_{\mathcal{D}}^{\mathbb{V}_{\mathsf{K}}} \varphi$.

Lemma 10 shows that Algorithm 1 is a decision procedure for $\vdash_{\mathcal{D}}^{\mathbb{V}_{\mathsf{K}}}$, when ignoring the additional output provided in Line 5. However, it is typical in applications that a "YES" or "NO" answer is not enough, and often it is expected that a "NO" result is accompanied by a countermodel. Algorithm 1 returns a valuation $v$ in case the answer is "NO", but Lemma 10 does not ensure that $v$ is indeed a countermodel for $\Gamma \vdash_{\mathcal{D}}^{\mathbb{V}_{\mathsf{K}}} \varphi$. The issue is that the valuation $v$ from the proof of Lemma 10 witnesses the fact that $\Gamma \nvdash_{\mathcal{D}}^{\mathbb{V}_{\mathsf{K}}} \varphi$ only in a non-constructive way. Indeed, using the soundness and completeness theorems, we are able to deduce that $v' \models_{\mathcal{D}} \Gamma$ and $v' \not\models_{\mathcal{D}} \varphi$ for some $v' \in \mathbb{V}_{\mathsf{K}}$, but the relation between $v$ and $v'$ is unclear. Most importantly, it is not clear whether $v$ and $v'$ agree on $\mathcal{F}$-formulas. In the remainder of this section we prove that such a $v'$ extends $v$, and so the returned countermodel of Line 5 can be trusted.

We say that a valuation $v'$ *extends* a valuation $v$ if $\text{Dom}(v) \subseteq \text{Dom}(v')$ and $v'(\varphi) = v(\varphi)$ for every $\varphi \in \text{Dom}(v)$ (identifying functions with sets of pairs, this means $v \subseteq v'$). Clearly, for a $\text{Dom}(v)$-formula $\psi$ we have that $v \models_{\mathcal{D}} \psi$ iff $v' \models_{\mathcal{D}} \psi$. We first show how to extend a given valuation $v \in \mathbb{V}_{\mathsf{K}}^{\mathcal{F},m}$ by a single formula $\psi$ such that $\mathit{sub}(\psi) \setminus \{\psi\} \subseteq \mathcal{F}$, obtaining a valuation $v' \in \mathbb{V}_{\mathsf{K}}^{\mathcal{F} \cup \{\psi\},m}$ that agrees with $v$ on all formulas in $\mathcal{F}$.

**Lemma 11.** *Let $m \geq 0$, $\mathcal{F} \subseteq \mathcal{L}$, and $v \in \mathbb{V}_{\mathsf{K}}^{\mathcal{F},m}$. Let $\psi \in \mathcal{L} \setminus \mathcal{F}$ such that $\mathit{sub}(\psi) \setminus \{\psi\} \subseteq \mathcal{F}$. Then, $v$ can be extended to some $v' \in \mathbb{V}_{\mathsf{K}}^{\mathcal{F} \cup \{\psi\},m}$.*

We sketch the proof of Lemma 11. When $m = 0$, the existence of $v'$ follows from Lemma 1. For $m > 0$, we define $v'$ as follows:<sup>1</sup>

$$v' \stackrel{\text{def}}{=} \lambda\varphi \in \mathcal{F} \cup \{\psi\}.\ \begin{cases} v(\varphi) & \varphi \in \mathcal{F} \\ \min(\text{pos-val}(\psi, \mathsf{M_K}, v) \cap \mathsf{TF}) & \varphi = \psi \text{ and } v^{-1}[\mathsf{TF}] \vdash_{\mathcal{D}}^{\mathbb{V}_{\mathsf{K}}^{\mathcal{F} \cup \{\psi\},m-1}} \psi \\ \min(\text{pos-val}(\psi, \mathsf{M_K}, v) \cap \mathsf{tf}) & \text{otherwise} \end{cases}$$

<sup>1</sup> The use of min here assumes an arbitrary order on truth values. It is used here only to choose *some* element from a non-empty set of truth values.

The proof of Lemma 11 then carries on by showing that v′ ∈ V_K^{F∪{ψ},m}. Next, Lemma 11 is used in order to extend partial valuations into total ones.
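As an illustration of this extension step, the following is a hypothetical Python sketch of the case distinction above. The names `pos_val` (the set of values the Nmatrix permits for ψ given v on its proper subformulas) and `derivable` (an oracle deciding the entailment v⁻¹[TF] ⊢ ψ) are assumed stand-ins for notions defined earlier in the paper, as are the default value sets.

```python
# Sketch of the single-formula extension behind Lemma 11; `pos_val` and
# `derivable` are assumed oracles, and the TF/tf defaults are assumptions.
def extend_by_formula(v, psi, pos_val, derivable,
                      TF=frozenset({'T', 'F'}), tf=frozenset({'t', 'f'})):
    """Extend the partial valuation v (a dict formula -> value) by psi."""
    candidates = pos_val(psi, v)           # values the Nmatrix allows for psi
    gamma = [p for p, x in v.items() if x in TF]
    pool = TF if derivable(gamma, psi) else tf
    allowed = candidates & pool
    assert allowed, "the lemma guarantees this set is non-empty"
    v2 = dict(v)
    v2[psi] = min(allowed)   # min: just pick *some* element (cf. footnote 1)
    return v2
```

The `min` mirrors the footnote: any fixed choice from the non-empty set would do.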

**Lemma 12.** *Let v ∈ V_K^{F,m} for some* F *closed under subformulas. Then v can be extended to some v′ ∈ V_K^m.*

Finally, Lemmas 3 and 12 can be used in order to extend any partial valuation in V_K^F into a total one.

**Lemma 13.** *Let v ∈ V_K^F for some set* F *closed under subformulas. Then v can be extended to some v′ ∈ V_K.*

We conclude by showing that when Algorithm 1 returns ("NO", v), then v is a finite representation of a true countermodel for Γ ⊢_{M_K} ϕ.

**Corollary 1.** *If Γ ⊬_D^{V_K} ϕ, then Algorithm 1 returns ("NO", v) for some v for which there exists v′ ∈ V_K such that v = v′|_{sub(Γ∪{ϕ})}, v′ ⊨_D Γ, and v′ ⊭_D ϕ.*

*Proof.* Suppose that Γ ⊬_D^{V_K} ϕ. Then by Lemma 10, Algorithm 1 does not return "YES". Therefore, it returns ("NO", v) for some v ∈ V_K^{F,m} such that v ⊨_D Γ and v ⊭_D ϕ, where F = *sub*(Γ ∪ {ϕ}) and m = 4^{|F|}. By Lemma 5, v ∈ V_K^F. By Lemma 13, v can be extended to some v′ ∈ V_K. Therefore, v = v′|_{sub(Γ∪{ϕ})}, v′ ⊨_D Γ, and v′ ⊭_D ϕ.

*Remark 2.* Notice that in scenarios where model generation is not important, m can be set to a much smaller number in Line 2 of Algorithm 1, namely, the "modal depth" of the input.<sup>2</sup> The reason is that for such m, it can be shown that ⊢_{G_K}^{F,m} Γ ⇒ ϕ iff ⊢_{G_K}^F Γ ⇒ ϕ, by reasoning about the applications of rule (K). Using the soundness and completeness theorems, we get Γ ⊢_D^{V_K^{F,m}} ϕ iff Γ ⊢_D^{V_K^F} ϕ, and so limiting m in this way is enough. Notice, however, that we do not necessarily get V_K^{F,m} = V_K^F for such m, and so the valuation returned in Line 5 might not be an element of V_K^F.

#### **6 The Modal Logic** KT

In this section we obtain similar results for the modal logic KT. First, the calculus G_KT is obtained from G_K by adding the following rule (see, e.g., [16]):

$$(T)\;\frac{\Gamma,\varphi\Rightarrow\Delta}{\Gamma,\square\varphi\Rightarrow\Delta}$$

Derivations are defined as before. (In particular, the (K)-depth of a derivation still depends on applications of rule (K), not of rule (T).) We write ⊢_{G_KT}^{F,m} Γ ⇒ Δ

<sup>2</sup> The *modal depth* of an atomic formula p is 0. The modal depth of ✷ϕ is the modal depth of ϕ plus 1. The modal depth of ⋄(ϕ1,...,ϕ*n*) for a non-modal connective ⋄ is the maximum among the modal depths of ϕ1,...,ϕ*n*.

if there is a derivation of Γ ⇒ Δ in GKT in which only F-sequents occur and that has (K)-depth at most m.
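The modal depth from footnote 2 can be computed by a short recursive function. The tuple-based formula encoding below (atoms as strings, `('box', phi)` for ✷ϕ, connectives as tagged tuples) is an assumption made for illustration:

```python
# Modal depth as in footnote 2, over a hypothetical tuple-based AST.
def modal_depth(phi):
    """Return the modal depth of a formula given as a nested tuple."""
    if isinstance(phi, str):                 # atomic formula p: depth 0
        return 0
    op, *args = phi
    if op == 'box':                          # box(phi): depth of phi plus 1
        return modal_depth(args[0]) + 1
    # any non-modal connective: maximum over the arguments
    return max(modal_depth(a) for a in args)
```

For instance, the formula ✷✷(p1 ∧ p2) ⊃ ✷p1 of Example 4 has modal depth 2.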

Next, we consider the semantics. For a valuation v ∈ V_K to respect rule (T), we must have that if v ⊨_D Γ, ϕ ⇒ Δ, then v ⊨_D Γ, ✷ϕ ⇒ Δ. In particular, when v ⊭_D Γ ⇒ Δ, we get that if v(ϕ) ∉ D, then v(✷ϕ) ∉ D. Now, if v(ϕ) = F, then v(✷ϕ) ∈ D according to the truth table of ✷ in M_K. But we must have v(✷ϕ) ∉ D. This leads us to remove F from M_K.

We thus obtain the following Nmatrix M_KT: the sets of truth values and of designated truth values are given by<sup>3</sup>

$$\mathcal{V}\_3 \stackrel{\text{def}}{=} \{\mathsf{T}, \mathsf{t}, \mathsf{f}\} \qquad \qquad \mathcal{D} \stackrel{\text{def}}{=} \{\mathsf{T}, \mathsf{t}\}$$

and the truth tables are as follows:


Again, one may gain intuition from the possible worlds semantics. There, the logic KT is characterized by frames with a *reflexive* accessibility relation. Thus, for instance, if ψ holds in w but not in some world accessible from w (i.e., ψ has value t), we know that ✷ψ does not hold in w, and the reflexivity of the accessibility relation implies that ✷ψ does not hold in some world accessible from w (thus ✷ψ has value f).
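This reading of the three values can be made concrete. Under the stated assumptions (a finite reflexive model given as a hypothetical successor map `R` and an atomic truth predicate `holds`, both names invented for this sketch), the value of a formula at a world is determined as follows:

```python
# Toy illustration of the three M_KT truth values via a reflexive Kripke
# model; the encoding is an assumption. The fourth value F (psi false but
# box-psi true) cannot arise: by reflexivity, box-psi fails wherever psi does.
def kt_value(holds, R, w, psi):
    at_w = holds(w, psi)
    at_all_succ = all(holds(u, psi) for u in R[w])
    if at_w and at_all_succ:
        return 'T'   # psi holds at w and at every successor: box-psi holds too
    if at_w:
        return 't'   # psi holds at w but box-psi does not
    return 'f'       # psi fails at w, hence (reflexivity) box-psi fails as well

# A toy reflexive model: world 0 sees itself and world 1; world 1 sees itself.
R = {0: {0, 1}, 1: {1}}
truths = {0: {'q', 'r'}, 1: {'q'}}
holds = lambda w, p: p in truths[w]
```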

*Example 4.* Let ϕ ≝ ✷✷(p1 ∧ p2) ⊃ ✷p1 and F ≝ *sub*(ϕ). The sequent ⇒ ϕ has a derivation in G_KT using only F-formulas, of (K)-depth 1. However, it is not satisfied by all M_KT-legal F-valuations. For example, the following valuation is an M_KT-legal valuation that does not satisfy ϕ:

$$v(p\_1) = v(p\_2) = \mathbf{t} \quad v(\Box p\_1) = \mathbf{f}$$

$$v(p\_1 \wedge p\_2) = v(\Box (p\_1 \wedge p\_2)) = v(\Box \Box (p\_1 \wedge p\_2)) = \mathbf{\sf T} \quad v(\varphi) = \mathbf{f}$$

Next, we define the levels of valuations for M_KT. These are obtained from Definition 5 by removing the value F:

**Definition 11.** The set V_KT^{F,m} is recursively defined as follows:

– V_KT^{F,0} is the set of M_KT-legal F-valuations.
– V_KT^{F,m+1} ≝ {v ∈ V_KT^{F,m} | ∀ϕ ∈ F. v⁻¹[T] ⊢_D^{V_KT^{F,m}} ϕ ⟹ v(ϕ) = T}

We also define:

$$\mathbb{V}\_{\mathsf{KT}}^{\mathcal{F}} \stackrel{\scriptstyle \mathsf{def}}{=} \bigcap\_{m \geq 0} \mathbb{V}\_{\mathsf{KT}}^{\mathcal{F}, m} \qquad\qquad\qquad \mathbb{V}\_{\mathsf{KT}}^{m} \stackrel{\scriptstyle \mathsf{def}}{=} \mathbb{V}\_{\mathsf{KT}}^{\mathcal{L}, m} \qquad\qquad \mathbb{V}\_{\mathsf{KT}} \stackrel{\scriptstyle \mathsf{def}}{=} \bigcap\_{m \geq 0} \mathbb{V}\_{\mathsf{KT}}^{\mathcal{L}, m}$$

<sup>3</sup> In this section we denote the set {T} by TF.
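For a *finite* set of candidate valuations, the level construction of Definition 11 is directly computable. The sketch below is an assumption-laden illustration: valuations are dicts over F, the level-0 set is supplied by the caller (it depends on the M_KT truth tables, which are not reproduced here), and the entailment ⊢_D^V is decided by checking every valuation in V.

```python
# Sketch of Definition 11's level iteration over a finite candidate set.
def entails(gamma, phi, V, designated=frozenset({'T', 't'})):
    """Gamma |-_D^V phi: every v in V designating all of gamma designates phi."""
    return all(v[phi] in designated
               for v in V
               if all(v[g] in designated for g in gamma))

def levels(V0):
    """Iterate the level filter on V0 until the fixpoint is reached."""
    V = list(V0)
    while True:
        nxt = [v for v in V
               if all(v[phi] == 'T'                       # forced to T ...
                      for phi in v
                      if entails([p for p in v if v[p] == 'T'], phi, V))]
        if nxt == V:
            return V
        V = nxt
```

On a toy input, a valuation assigning only t to a formula entailed by its T-part is filtered out at level 1, in the spirit of Example 5.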

*Example 5.* Following Example 4, we note that for every v ∈ V_KT^{F,m} with m > 0, we have v ⊨_D ϕ. In particular, the valuation v from Example 4 does not belong to V_KT^{F,m}: we have ✷(p1 ∧ p2) ∈ v⁻¹[T] and ✷(p1 ∧ p2) ⊢_D^{V_KT^{F,0}} p1, but v(p1) = t.

Similarly to Theorem 2, the levels of valuations converge to a maximal set that satisfies the following condition:

$$\forall v \in \mathbb{V}.\; \forall \varphi \in \mathcal{F}.\; v^{-1}[\mathsf{T}] \vdash\_{\mathcal{D}}^{\mathbb{V}} \varphi \implies v(\varphi) = \mathsf{T} \qquad \qquad (\mathit{necessitation}\_{\mathsf{KT}})$$

**Theorem 9.** *The set V_KT^F is the largest set* V *of* M*KT-legal* F*-valuations that satisfies necessitationKT.*

The proof of Theorem 9 is analogous to that of Theorem 2.

*Remark 3.* The *necessitation*KT condition is equivalent to the one given in [8], except that the underlying truth table is different. Theorem 9 proves that our gradual way of defining V_KT^F via levels coincides with the semantic condition from [8].

As we demonstrated for K, starting from level 1, the condition on valuations allows us to refine the truth tables of MKT, and reduce the search space. Simple entailments (at level 0) lead to the optimized tables below for ⊃, ∧ and ∨:


Soundness and completeness for G_KT are obtained analogously to G_K, keeping in mind that M_KT is obtained from M_K by deleting the value F. For soundness, this deletion is captured by rule (T). For completeness, the same construction of a countermodel is performed, while rule (T) ensures that it is three-valued.

**Theorem 10 (Soundness and Completeness).** *Let* F ⊆ L *be closed under subformulas and* Γ ⇒ Δ *an* F*-sequent.*

*1. For every m ≥ 0, Γ ⊢_D^{V_KT^{F,m}} Δ iff ⊢_{G_KT}^{F,m} Γ ⇒ Δ.*
*2. Γ ⊢_D^{V_KT^F} Δ iff ⊢_{G_KT}^F Γ ⇒ Δ.*

Effectiveness is also shown similarly to K. For that matter, we use the following main lemma, whose proof is similar to Lemma 13. The only component that is added to that proof is making sure that the constructed model is three-valued.

**Lemma 14.** *Let v ∈ V_KT^F for some set* F *closed under subformulas. Then v can be extended to some v′ ∈ V_KT.*

Let Algorithm 2 be obtained from Algorithm 1 by setting m to 3^{|F|} in Line 2, and taking v ∈ V_KT^{F,m} in Line 3. Similarly to Lemma 10 and Corollary 1, we get that Algorithm 2 is a model-producing decision procedure for ⊢_{M_KT}.

**Lemma 15.** *Algorithm 2 always terminates, and returns "YES" iff Γ ⊢_D^{V_KT} ϕ. Further, if Γ ⊬_D^{V_KT} ϕ, then it returns ("NO", v) for some v for which there exists v′ ∈ V_KT such that v = v′|_{sub(Γ∪{ϕ})}, v′ ⊨_D Γ, and v′ ⊭_D ϕ.*

## **7 Future Work**

We have introduced a new semantics for the modal logic K, based on levels of valuations in many-valued non-deterministic matrices. Our semantics is effective, and was shown to tightly correspond to derivations in a sequent calculus for K. We also adapted these results for the modal logic KT.

There are two main directions for future work. The first is to establish similar semantics for other normal modal logics, such as KD, K4, S4 and S5, and to investigate ♦ as an independent modality. The second is to analyze the complexity of, implement, and experiment with decision procedures for K and KT based on the proposed semantics. In particular, we plan to consider SAT-based decision procedures that would encode this semantics in SAT, directly or iteratively.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Local Reductions for the Modal Cube**

Cláudia Nalon<sup>1</sup>, Ullrich Hustadt<sup>2(B)</sup>, Fabio Papacchini<sup>3</sup>, and Clare Dixon<sup>4</sup>

<sup>1</sup> Department of Computer Science, University of Brasília, Brasília, Brazil nalon@unb.br

<sup>2</sup> Department of Computer Science, University of Liverpool, Liverpool, UK U.Hustadt@liverpool.ac.uk

<sup>3</sup> School of Computing and Communications, Lancaster University in Leipzig, Leipzig, Germany

f.papacchini@lancaster.ac.uk

<sup>4</sup> Department of Computer Science, University of Manchester, Manchester, UK clare.dixon@manchester.ac.uk

**Abstract.** The modal logic K is commonly used to represent and reason about necessity and possibility, and its extensions with combinations of additional axioms are used to represent knowledge, belief, desires and intentions. Here we present local reductions of all propositional modal logics in the so-called modal cube, that is, extensions of K with arbitrary combinations of the axioms B, D, T, 4 and 5, to a normal form comprising a formula and the set of modal levels it occurs at. Using these reductions we can carry out reasoning for all these logics with the theorem prover KSP. We define benchmarks for these logics and experiment with the reduction approach as compared to an existing resolution calculus with specialised inference rules for the various logics.

### **1 Introduction**

Modal logics have been used to represent and reason about mental attitudes such as knowledge, belief, desire and intention, see for example [17,20,31]. These can be represented using extensions of the basic modal logic K with one or more of the axioms B (symmetry), D (seriality), T (reflexivity), 4 (transitivity) and 5 (Euclideaness). The logic K and these extensions form the so-called *modal cube*, see Fig. 1. In the diagram, a line from a logic L<sup>1</sup> to a logic L<sup>2</sup> to its right and/or above means that all theorems of L<sup>1</sup> are also theorems of L2, but not vice versa. As indicated in Fig. 1, some of the logics have the same theorems, e.g., KB5 and KB4. Also, all logics not explicitly listed have the same theorems as KT5 aka S5. In total there are 15 distinct logics.

While these modal logics are well-studied and a multitude of calculi and translations to other logics exist, see, e.g., [1,3–6,9,13,14,16,18,22,41], fully

C. Dixon was partially supported by the EPSRC funded RAI Hubs FAIR-SPACE (EP/R026092/1) and RAIN (EP/R026084/1), and the EPSRC funded programme Grant S4 (EP/N007565/1).

**Fig. 1.** Modal Cube: Relationships between modal logics

automatic support by provers is still lacking. Early implementations covering the full modal cube, such as Catach's TABLEAUX system [7], are no longer available. LoTREC 2.0 [10] supports a wide range of logics but is not intended as an automatic theorem prover. MOIN [11] supports all the logics, but the focus is on producing human-readable proofs and countermodels for small formulae. Other provers that go beyond just K, like MleanCoP [28] and CEGARBox [15], only support a small subset of the 15 logics. There is also a range of translations from modal logics to first-order and higher-order logics [13,18,19,27,33]. Regarding implementations of those, SPASS [33,43] is limited to a subset of the 15 logics, while LEO-III [13,36] supports all the logics in the modal cube, but can only solve very few of the available benchmark formulae.

KSP [23] is a modal logic theorem prover that implements both the modal-layered resolution (MLR) calculus [25] for the modal logic K and the global modal resolution (GMR) calculus [24] for all the 15 logics considered here. It also supports several refinements of resolution and a range of simplification rules. In this paper, we give reductions of all logics of the modal cube into a normal form for the basic modal logic K. We then compare the performance of the combination of these reductions with the modal-layered resolution calculus to that of the global modal resolution calculus on a new benchmark collection for the modal cube.

In [29] we have presented new reductions<sup>1</sup> of the propositional modal logics KB, KD, KT, K4, and K5 to Separated Normal Form with Sets of Modal Levels SNFsml. SNFsml is a generalisation of the Separated Normal Form with Modal Level, SNFml. In the latter, labelled modal clauses are used where a natural number label refers to a particular level within a tree Kripke structure at which a modal clause holds. In the former, a finite or infinite set of natural numbers labels each modal clause with the intended meaning that such a modal clause is true at every level of a tree Kripke structure contained in that set. As our prover KSP and the modal-layered resolution calculus it implements currently only support sets of modal clauses in SNFml, we then use a further reduction from SNFsml

<sup>1</sup> A *reduction* here is a satisfiability preserving mapping between logics.

to SNFml to obtain an automatic theorem prover for these modal logics. Where all modal clauses are labelled with finite sets, this reduction is straightforward. This is the case for KB, KD and KT. For K4 and K5, characterised by the axioms ✷<sup>ϕ</sup> <sup>→</sup> ✷✷<sup>ϕ</sup> and ✸<sup>ϕ</sup> <sup>→</sup> ✷✸ϕ, modal clauses are in general labelled with infinite sets. However, using a result by Massacci [21] for K4 and an analogous result for K5 by ourselves, we are able to bound the maximal level occurring in those labelling sets which in turn makes a reduction to SNFml possible.

Also in [29], we have shown experimentally that these reductions allow us to reason effectively in these logics, compared to the global modal resolution calculus [24] and to the relational and semi-functional translation built into the first-order theorem prover SPASS 3.9 [33,38,42]. The reason that the comparison only included a rather limited selection of provers is that these are the only ones with built-in support for all six logics our reductions covered.

Unfortunately, we cannot simply combine our reductions for single axioms to obtain satisfiability preserving reductions for their combinations. There are two main reasons for this. First, our calculus does not use an explicit representation of the accessibility relation within a Kripke structure, which would make it possible to reflect modal axioms via corresponding properties of that accessibility relation. Instead, we add labelled modal clauses based on instances of the modal axioms for ✷-formulae occurring in the modal formula we want to check for satisfiability. However, if we deal with multiple modal axioms, then these axioms might interact, making it necessary to add instances that are not needed for any individual axiom alone. For instance, consider the converse of axiom B, ✸✷ϕ → ϕ, and axiom 4, ✷ϕ → ✷✷ϕ. Together they imply ✸✷ϕ → ✷ϕ. Instances of this derived axiom are necessary for completeness of a reduction from KB4 to K, but are unsound for KB and K4 separately.

Second, our reductions attempt to keep the labelling sets minimal in size in order to decrease the number of inferences that can be performed. Again taking axioms B and 4 as examples: in KB, a ✷-formula ✷ψ true at level ml in a tree-like Kripke structure M forces ψ to be true at level ml − 1, while in K4, ✷ψ true at level ml in M forces ψ to be true at all levels ml′ with ml′ > ml. This is reflected in the labelling sets we use for these two logics. However, for KB4, ✷ψ true at level ml forces ψ to be true at every level in a tree-like Kripke structure M (unless M consists only of a single world).

Since we intend to maintain these two properties of our reductions, we have to consider each modal logic individually. As we will see, for some logics a reduction can be obtained as the union of the existing reductions while for others we need a logic-specific reduction to accommodate the interaction of axioms.

The structure of the paper is as follows. In Sect. 2 we recall common concepts of propositional modal logic and the definition of our normal form SNFml. Section 3 introduces our reduction for extensions of the basic modal logic K with combinations of the axioms B, D, T, 4, and 5. Section 4 presents a transformation from SNFsml to SNFml which allows us to use the modal resolution prover KSP to reason in all the modal logics. In Sect. 5 we compare the performance of a combination of our reductions and the modal-layered resolution calculus implemented in the prover KSP with resolution calculi specifically designed for the logics under consideration as well as the prover LEO-III.

#### **2 Preliminaries**

The language of modal logic is an extension of the language of propositional logic with a unary modal operator ✷ and its dual ✸. More precisely, given a denumerable set of *propositional symbols*, <sup>P</sup> <sup>=</sup> {p, p0, q, q0, t, t0,...} as well as propositional *constants* **true** and **false**, *modal formulae* are inductively defined as follows: constants and propositional symbols are modal formulae. If ϕ and ψ are modal formulae, then so are <sup>¬</sup>ϕ, (<sup>ϕ</sup> <sup>∧</sup> <sup>ψ</sup>), (<sup>ϕ</sup> <sup>∨</sup> <sup>ψ</sup>), (<sup>ϕ</sup> <sup>→</sup> <sup>ψ</sup>), ✷ϕ, and ✸ϕ. We also assume that ∧ and ∨ are associative and commutative operators and consider, e.g., (p∨(q∨r)) and (r∨(q∨p)) to be identical formulae. We often omit parentheses if this does not cause confusion. By var(ϕ) we denote the set of all propositional symbols occurring in ϕ. This function straightforwardly extends to finite sets of modal formulae. A *modal axiom (schema)* is a modal formula ψ representing the set of all instances of ψ.

A *literal* is either a propositional symbol or its negation; the set of literals is denoted by <sup>L</sup><sup>P</sup> . By <sup>¬</sup><sup>l</sup> we denote the *complement* of the literal <sup>l</sup> <sup>∈</sup> <sup>L</sup><sup>P</sup> , that is, if <sup>l</sup> is the propositional symbol <sup>p</sup> then <sup>¬</sup><sup>l</sup> denotes <sup>¬</sup>p, and if <sup>l</sup> is the literal <sup>¬</sup><sup>p</sup> then <sup>¬</sup><sup>l</sup> denotes <sup>p</sup>. By <sup>|</sup>l<sup>|</sup> for <sup>l</sup> <sup>∈</sup> <sup>L</sup><sup>P</sup> we denote <sup>p</sup> if <sup>l</sup> <sup>=</sup> <sup>p</sup> or <sup>l</sup> <sup>=</sup> <sup>¬</sup>p. A *modal literal* is either ✷<sup>l</sup> or ✸l, where <sup>l</sup> <sup>∈</sup> <sup>L</sup><sup>P</sup> .

A *(normal) modal logic* is a set of modal formulae which includes all propositional tautologies and the axiom schema ✷(ϕ → ψ) → (✷ϕ → ✷ψ), called the *axiom* K, and which is closed under modus ponens (if ϕ and ϕ → ψ then ψ) and the rule of necessitation (if ϕ then ✷ϕ).

K is the weakest modal logic, that is, the logic given by the smallest set of modal formulae constituting a normal modal logic. By KΣ we denote an *extension* of K by a set Σ of axioms.

The standard semantics of modal logics is the *Kripke semantics* or *possible world semantics*. A *Kripke frame* F is an ordered pair ⟨W, R⟩ where W is a nonempty set of *worlds* and R is a binary (accessibility) relation over W. A *Kripke structure* M over P is an ordered pair ⟨F, V⟩ where F is a Kripke frame and the *valuation* V is a function mapping each propositional symbol p in P to a subset V(p) of W. A *rooted Kripke structure* is an ordered pair ⟨M, w0⟩ with w0 ∈ W. To simplify notation, in the following we write ⟨W, R, V⟩ and ⟨W, R, V, w0⟩ instead of ⟨⟨W, R⟩, V⟩ and ⟨⟨⟨W, R⟩, V⟩, w0⟩, respectively.

Satisfaction (or truth) of a formula at a world w of a Kripke structure M = ⟨W, R, V⟩ is inductively defined by:



**Table 1.** Modal axioms and relational frame properties

**Table 2.** Rewriting Rules for Simplification

```
ϕ ∧ ϕ ⇒ ϕ         ϕ ∧ ¬ϕ ⇒ false      ✷true ⇒ true      ¬true ⇒ false
ϕ ∨ ϕ ⇒ ϕ         ϕ ∨ ¬ϕ ⇒ true       ✸false ⇒ false    ¬false ⇒ true
ϕ ∧ true ⇒ ϕ      ϕ ∧ false ⇒ false   ϕ ∨ false ⇒ ϕ     ϕ ∨ true ⇒ true
                                      ¬¬ϕ ⇒ ϕ
```
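The rules of Table 2 can be applied exhaustively by a single bottom-up pass, since the result of every rule is itself already simplified. A hypothetical Python sketch over a tuple-based AST (atoms as strings, `'true'`/`'false'` as constants); the encoding is an assumption for illustration:

```python
# Sketch of exhaustive simplification with the Table 2 rewrite rules.
def simplify(phi):
    if isinstance(phi, str):                    # atom or constant
        return phi
    op, *args = phi
    args = [simplify(a) for a in args]          # simplify subformulas first
    if op == 'not':
        (x,) = args
        if x == 'true':
            return 'false'                      # ¬true => false
        if x == 'false':
            return 'true'                       # ¬false => true
        if isinstance(x, tuple) and x[0] == 'not':
            return x[1]                         # ¬¬phi => phi
        return ('not', x)
    if op == 'and':
        x, y = args
        if x == y:
            return x                            # phi ∧ phi => phi
        if 'false' in args:
            return 'false'                      # phi ∧ false => false
        if 'true' in args:
            return y if x == 'true' else x      # phi ∧ true => phi
        if ('not', x) == y or ('not', y) == x:
            return 'false'                      # phi ∧ ¬phi => false
        return ('and', x, y)
    if op == 'or':
        x, y = args
        if x == y:
            return x                            # phi ∨ phi => phi
        if 'true' in args:
            return 'true'                       # phi ∨ true => true
        if 'false' in args:
            return y if x == 'false' else x     # phi ∨ false => phi
        if ('not', x) == y or ('not', y) == x:
            return 'true'                       # phi ∨ ¬phi => true
        return ('or', x, y)
    if op == 'box' and args[0] == 'true':
        return 'true'                           # box true => true
    if op == 'dia' and args[0] == 'false':
        return 'false'                          # dia false => false
    return (op, args[0])                        # any other box/dia formula
```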
If ⟨M, w⟩ ⊨ ϕ holds, then M is a *model* of ϕ, ϕ is *true at* w *in* M, and M *satisfies* ϕ. A modal formula ϕ is *satisfiable* iff there exists a Kripke structure M and a world w in M such that ⟨M, w⟩ ⊨ ϕ.

We are interested in extensions of K with the modal axioms shown in Table 1 and their combinations. Each of these axioms defines a class of Kripke frames where the accessibility relation R satisfies the first-order property stated in the table. Combinations of axioms then define a class of Kripke frames where the accessibility relation satisfies the combination of their corresponding properties.

Given a normal modal logic L with corresponding class of frames F, we say a modal formula ϕ is L*-satisfiable* iff there exists a frame F ∈ F, a valuation V and a world w of F such that ⟨F, V, w⟩ ⊨ ϕ. It is L*-valid* or *valid in* L iff for every frame F ∈ F, every valuation V and every world w of F, ⟨F, V, w⟩ ⊨ ϕ. A normal modal logic L2 is *an extension* of a normal modal logic L1 iff all L1-valid formulae are also L2-valid.

A rooted Kripke structure M = ⟨W, R, V, w0⟩ is a *rooted tree Kripke structure* iff R is a tree, that is, a directed acyclic connected graph where each node has at most one predecessor, with *root* w0. It is a *rooted tree Kripke model* of a modal formula ϕ iff ⟨W, R, V, w0⟩ ⊨ ϕ. In a rooted tree Kripke structure with root w0, for every world wk ∈ W there is exactly one path connecting w0 and wk; the length of that path is the *modal level of* wk *(in* M*)*, denoted by mlM(wk).

It is well-known [17] that a modal formula ϕ is K-satisfiable iff there is a finite rooted tree Kripke structure M = ⟨F, V, w0⟩ such that ⟨M, w0⟩ ⊨ ϕ.
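For a finite Kripke structure the inductive satisfaction clauses can be checked directly. A minimal sketch; the dictionary-based model encoding (successor map `R`, symbol-to-worlds map `V`) is an assumption made for illustration:

```python
# Sketch of the satisfaction relation M, w |= phi for K on a finite structure.
def sat(R, V, w, phi):
    if isinstance(phi, str):                    # propositional symbol
        return w in V.get(phi, set())
    op, *args = phi
    if op == 'not':
        return not sat(R, V, w, args[0])
    if op == 'and':
        return sat(R, V, w, args[0]) and sat(R, V, w, args[1])
    if op == 'or':
        return sat(R, V, w, args[0]) or sat(R, V, w, args[1])
    if op == 'imp':
        return (not sat(R, V, w, args[0])) or sat(R, V, w, args[1])
    if op == 'box':                             # phi at every R-successor
        return all(sat(R, V, u, args[0]) for u in R[w])
    if op == 'dia':                             # phi at some R-successor
        return any(sat(R, V, u, args[0]) for u in R[w])
    raise ValueError(op)
```

Note that ✷ψ holds vacuously at a world without successors, while ✸ψ fails there.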

For the reductions presented in the next section we assume that any modal formula ϕ has been simplified by exhaustively applying the rewrite rules in Table 2, and that it is in Negation Normal Form (NNF), that is, that only propositional symbols occur in the scope of negations. We say that such a formula is in *simplified NNF*.

The reductions produce formulae in a clausal normal form, called *Separated Normal Form with Sets of Modal Levels* SNFsml, introduced in [29]. The language of SNFsml extends that of the basic modal logic K with sets of modal levels as labels. Clauses in SNFsml have one of the following forms:

$$S : \bigvee_{i=1}^{n} l_i \quad \text{(literal clause)} \qquad S : l \to \Box l' \quad \text{(positive modal clause)} \qquad S : l \to \Diamond l' \quad \text{(negative modal clause)}$$

where S ⊆ ℕ and l, l′, li are propositional literals with 1 ≤ i ≤ n, n ∈ ℕ. We write : ϕ instead of ℕ : ϕ, and such clauses are called *global clauses*. Positive and negative modal clauses are together known as *modal clauses*.

Given a rooted tree Kripke structure M and a set S of natural numbers, by M[S] we denote the set of worlds that are at a modal level in S, that is, <sup>M</sup>[S] = {<sup>w</sup> <sup>∈</sup> <sup>W</sup> <sup>|</sup> mlM(w) <sup>∈</sup> <sup>S</sup>}. Then

$$M \models S : \varphi \text{ iff } \langle M, w \rangle \models \varphi \text{ for every world } w \in M[S].$$

The motivation for using a set S to label clauses is that in our reductions the formula ϕ may hold at several levels, possibly an infinite number of levels. It therefore makes sense to label such formulae not with just a single level, but a set of levels. The Separated Normal Form with Modal Level, SNFml, can be seen as the special case of SNFsml where all labelling sets are singletons.

Note that if <sup>S</sup> <sup>=</sup> <sup>∅</sup>, then <sup>M</sup> <sup>|</sup><sup>=</sup> <sup>S</sup> : <sup>ϕ</sup> trivially holds. Also, a Kripke structure <sup>M</sup> can satisfy <sup>S</sup> : **false** if there is no world <sup>w</sup> with mlM(w) <sup>∈</sup> <sup>S</sup>. On the other hand, <sup>S</sup> : **false** with 0 <sup>∈</sup> <sup>S</sup> is unsatisfiable as a rooted tree Kripke structure always has a world with modal level 0.

If M ⊨ S : ϕ, then we say that S : ϕ *holds in* M or *is true in* M. For a set Φ of labelled formulae, M ⊨ Φ iff M ⊨ S : ϕ for every S : ϕ in Φ; if such an M exists, we say Φ is K*-satisfiable*.

We introduce some notation that will be used in the following. Let S⁺ = {l + 1 ∈ ℕ | l ∈ S}, S⁻ = {l − 1 ∈ ℕ | l ∈ S}, and S≥ = {n ∈ ℕ | n ≥ min(S)}, where min(S) is the least element in S. Note that the restriction of the elements being in ℕ implies that S⁻ cannot contain negative numbers.

# **3 Extensions of K**

In this section we define reductions from all the logics in the modal cube to SNFsml. We assume that the set P of propositional symbols is partitioned into two infinite sets Q and T such that Q contains the propositional symbols of the modal formula ϕ under consideration, and T surrogate symbols t<sup>ψ</sup> for every subformula ψ of ϕ and supplementary propositional symbols. In particular, for every modal formula <sup>ψ</sup> we have var(ψ) <sup>⊂</sup> <sup>Q</sup> and there exists a propositional symbol <sup>t</sup><sup>ψ</sup> <sup>∈</sup> <sup>T</sup> uniquely associated with <sup>ψ</sup>. These surrogate symbols serve the same purpose as Tseitin variables [40] and Skolem predicates [30,39] in the transformation of propositional and first-order formulae, respectively, to clausal form via structural transformation.

It turns out that given a reduction ρKΣ for KΣ with {D, T} ∩ Σ = ∅, there is a uniform and straightforward way to obtain reductions for KDΣ and KTΣ from ρKΣ. Also, the valid formulae of KDTΣ are the same as those of


**Table 3.** Categorisation of modal logics in the modal cube

KTΣ, so we do not need to consider the case of adding both axioms to KΣ. Similarly, the logics KT45, KDB4, KTB4 and KT5 all have the same set of valid formulae. Therefore, as shown in Table 3, we can divide the 15 modal logics into three categories: six 'base logics', five modal logics obtained by extending a 'base logic' with D, and a further four modal logics obtained by extending a 'base logic' with T. For four of the six 'base logics' (namely, K, KB, K4, and K5) we have already devised reductions in [29], so only two (i.e., KB4 and K45) remain.

Given a modal formula ϕ in simplified NNF and L = KΣ with Σ ⊆ {B, D, T, 4, 5}, we can obtain a set ΦL of clauses in SNFsml such that ϕ is L-satisfiable iff ΦL is K-satisfiable, where ΦL = ρL^{sml}(ϕ) = {{0} : tϕ} ∪ ρL({0} : tϕ → ϕ) and ρL is defined as follows:

$$\begin{aligned} \rho\_L(S:t \to \mathbf{true}) &= \emptyset \\ \rho\_L(S:t \to \mathbf{false}) &= \{S:\neg t\} \\ \rho\_L(S:t \to (\psi\_1 \land \psi\_2)) &= \{S:\neg t \lor \eta(\psi\_1), S:\neg t \lor \eta(\psi\_2)\} \cup \delta\_L(S, \psi\_1) \cup \delta\_L(S, \psi\_2) \\ \rho\_L(S:t \to \psi) &= \{S:\neg t \lor \psi\} \\ &\quad \text{if } \psi \text{ is a disjunction of literals} \\ \rho\_L(S:t \to (\psi\_1 \lor \psi\_2)) &= \{S:\neg t \lor \eta(\psi\_1) \lor \eta(\psi\_2)\} \cup \delta\_L(S, \psi\_1) \cup \delta\_L(S, \psi\_2) \\ &\quad \text{if } \psi\_1 \lor \psi\_2 \text{ is not a disjunction of literals} \\ \rho\_L(S:t \to \lozenge \psi) &= \{S:t \to \lozenge \eta(\psi)\} \cup \delta\_L(S^+, \psi) \end{aligned}$$

$$\rho\_L(S:t \to \Box \psi) = P\_L(S:t \to \Box \psi) \cup \Delta\_L(S:t \to \Box \psi)$$

The functions η and δ_L are defined as follows:

$$\eta(\psi) = \begin{cases} \psi, & \text{if } \psi \text{ is a literal} \\ t\_{\psi}, & \text{otherwise} \end{cases} \quad \delta\_L(S, \psi) = \begin{cases} \emptyset, & \text{if } \psi \text{ is a literal} \\ \rho\_L(S: t\_{\psi} \to \psi), & \text{otherwise} \end{cases}$$

and the functions P_L and Δ_L are defined as shown in Table 4.
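To make the structural recursion of ρ_L concrete, the following Python sketch implements it for the base case L = K only, where the ✷-case collapses to the single definitional clause for the surrogate (the logic-specific functions P_L and Δ_L from Table 4 are not modelled). The tuple-based formula encoding, the surrogate naming scheme, and all function names are our own illustrative assumptions, not the authors' implementation:

```python
# Formula AST (an assumption for illustration): a literal is a string such as
# 'p' or '~p'; compound formulae are ('and', f, g), ('or', f, g),
# ('dia', f) for <>f, and ('box', f) for []f. A clause is a pair
# (label set S, clause body); labels are frozensets of modal levels.

def is_literal(f):
    return isinstance(f, str)

def surrogate(f):
    """Surrogate symbol t_psi, keyed by the subformula itself."""
    return ('t', f)

def eta(f):
    """eta: a literal stays itself; any other formula becomes its surrogate."""
    return f if is_literal(f) else surrogate(f)

def succ(S):
    """S^+ : shift every modal level in the label set by one."""
    return frozenset(ml + 1 for ml in S)

def delta(S, f):
    """delta_L: nothing for literals, otherwise recurse on the surrogate."""
    return set() if is_literal(f) else rho(S, surrogate(f), f)

def rho(S, t, f):
    """Clauses for S : t -> f, for the basic modal logic K only."""
    if is_literal(f):
        return {(S, ('or', ('not', t), f))}
    op = f[0]
    if op == 'and':
        return ({(S, ('or', ('not', t), eta(f[1]))),
                 (S, ('or', ('not', t), eta(f[2])))}
                | delta(S, f[1]) | delta(S, f[2]))
    if op == 'or':
        return ({(S, ('or', ('not', t), eta(f[1]), eta(f[2])))}
                | delta(S, f[1]) | delta(S, f[2]))
    if op == 'dia':
        return {(S, ('dia', t, eta(f[1])))} | delta(succ(S), f[1])
    if op == 'box':  # P_L and Delta_L collapse to one clause for plain K
        return {(S, ('box', t, eta(f[1])))} | delta(succ(S), f[1])
    raise ValueError(f)

# For phi = <>[]p, the reduction introduces one surrogate per modal subformula.
phi = ('dia', ('box', 'p'))
clauses = rho(frozenset({0}), ('t', phi), phi)
```

For ✸✷p this yields exactly two clauses: the ✸-clause at level {0} and the definitional ✷-clause for the surrogate at level {1}, mirroring the ρ_L cases above.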

We can see in Table 4 that the reduction for KB4 has an additional SNFsml clause t_{✷ψ} ∨ t_{✷¬t_{✷ψ}} that occurs neither in the reduction for KB nor in that for K4. It can be seen as an encoding of the derived axiom ✸✷ψ → ✷ψ, which follows from the contrapositive ✸✷ψ → ψ of B and from axiom 4, ✷ψ → ✷✷ψ.

For K45 we see that all the SNFsml clauses in the reduction for K5 carry over. These clauses are already sufficient to ensure that, semantically, if t_{✷ψ} is true at any world at a level other than 0, then t_{✷ψ} is true at every world. Consequently, to accommodate axiom 4, it suffices to add the SNFsml clause {0} : t_{✷ψ} → ✷t_{✷ψ} to ensure that this also holds for the root world at level 0.


where lb^P_{KΣ} and lb^δ_{KΣ} are defined as follows:

**Table 4.** Reduction of ✷-formulae, <sup>Σ</sup> ⊆ {B, <sup>4</sup>, <sup>5</sup>}.


For the reductions of KDΣ and KTΣ we have favoured the reuse of the reductions for KΣ, KD, and KT over optimisation for specific logics. For example, take KDB. Given that in a symmetric model every world w except the root world w₀ has an R-successor, the axiom D only 'enforces' that w₀ also has an R-successor. So, instead of adding a clause S : t_{✷ψ} → ✸ψ for every clause S : t_{✷ψ} → ✷η(ψ), we could just add {0} : t_{✷ψ} → ✸ψ iff 0 ∈ S. Similarly, in KT5, because of 5, for all worlds w except w₀ we already have wRw. So, we could again add {0} : ¬t_{✷ψ} ∨ η(ψ) for every clause S : t_{✷ψ} → ✷η(ψ) iff 0 ∈ S.

For the KB4-unsatisfiable formula ψ₁ = (¬p ∧ ✸✸✷p), if we were to independently apply the reductions for KB and K4, that is, compute {{0} : t_{ψ₁}} ∪ ρ_KB({0} : t_{ψ₁} → ψ₁) ∪ ρ_K4({0} : t_{ψ₁} → ψ₁), then the result is the following set of clauses Φ₁:

$$\begin{array}{lll} (1)\ \{0\}: t\_{\psi\_1} & (4)\ \{0\}: t\_{\Diamond\Diamond\Box p} \to \Diamond t\_{\Diamond\Box p} & (7)\ \{2\}^{\ge}: t\_{\Box p} \to \Box t\_{\Box p} \\ (2)\ \{0\}: \neg t\_{\psi\_1} \lor \neg p & (5)\ \{1\}: t\_{\Diamond\Box p} \to \Diamond t\_{\Box p} & (8)\ \{1\}: p \lor t\_{\Box\neg t\_{\Box p}} \\ (3)\ \{0\}: \neg t\_{\psi\_1} \lor t\_{\Diamond\Diamond\Box p} & (6)\ \{2\}^{\ge}: t\_{\Box p} \to \Box p & (9)\ \{1\}: t\_{\Box\neg t\_{\Box p}} \to \Box\neg t\_{\Box p} \end{array}$$

Clauses (1) to (5) stem from the transformation of ψ₁ to SNFsml for K, Clauses (6) and (7) stem from the reduction for 4, and Clauses (8) and (9) stem from the reduction for B. This set of SNFsml clauses is K-satisfiable. The clauses imply {1} : p, but neither {1} : ✷p nor {0} : p, which we would need to obtain a contradiction. Part of the reason is that we would need to apply the reductions for 4 and B recursively to the newly introduced surrogates for ✷-formulae, which in turn leads to the introduction of further surrogates and to problems with the termination of the reduction.

In contrast, the clause set Φ₂ obtained by our reduction for KB4 is:

$$\begin{array}{lll} (10)\ \{0\}: t\_{\psi\_{1}} & (15)\ \star: t\_{\Box p} \to \Box p & (17)\ \star: p \lor t\_{\Box \neg t\_{\Box p}} \\ (11)\ \{0\}: \neg t\_{\psi\_{1}} \lor \neg p & (16)\ \star: t\_{\Box p} \to \Box t\_{\Box p} & (18)\ \star: t\_{\Box \neg t\_{\Box p}} \to \Box \neg t\_{\Box p} \\ (12)\ \{0\}: \neg t\_{\psi\_{1}} \lor t\_{\Diamond \Diamond \Box p} & & (19)\ \star: t\_{\Box p} \lor t\_{\Box \neg t\_{\Box \neg t\_{\Box p}}} \\ (13)\ \{0\}: t\_{\Diamond \Diamond \Box p} \to \Diamond t\_{\Diamond \Box p} & & (20)\ \star: t\_{\Box \neg t\_{\Box \neg t\_{\Box p}}} \to \Box t\_{\Box \neg t\_{\Box p}} \\ (14)\ \{1\}: t\_{\Diamond \Box p} \to \Diamond t\_{\Box p} & & \end{array}$$

Note Clauses (19) and (20) in Φ₂, for which there are no corresponding clauses in Φ₁. Also, the sets of labels of Clauses (15) to (18) are strict supersets of those of the corresponding Clauses (6) to (9). Φ₂ implies both {1} : ✷p and {0} : p. The latter, together with Clauses (10) and (11), means Φ₂ is K-unsatisfiable.

**Theorem 1.** *Let ϕ be a modal formula in simplified NNF, Σ ⊆ {B, D, T, 4, 5}, and Φ_KΣ = ρ^sml_KΣ(ϕ). Then ϕ is KΣ-satisfiable iff Φ_KΣ is K-satisfiable.*

*Proof (Sketch).* For <sup>|</sup>Σ| ≤ 1 this follows from Theorem 5 in [29].

For K45, KB4, KDΣ, and KTΣ with Σ ⊆ {B, 4, 5}, we proceed in analogy to the proofs of Theorems 3 and 4 in [29]. Let L be one of these logics.

To show that if ϕ is L-satisfiable then Φ_L is K-satisfiable, we show that, given a rooted L-model M of ϕ, a small variation of the unravelling of M is a rooted tree K-model M^L of Φ_L. The main step is to define the valuation of the additional propositional symbols t_ψ so that we can prove that all clauses in Φ_L hold in M^L. To show that if Φ_L is K-satisfiable then ϕ is L-satisfiable, we take a rooted tree K-model M = ⟨W, R, V, w₀⟩ of Φ_L and construct a Kripke structure M_L = ⟨W, R_L, V, w₀⟩. The relation R_L is the closure of R under the relational properties associated with the axioms of L. The proof that M_L is a model of ϕ relies on the fact that the clauses in Φ_L ensure that, for subformulae ✷ψ of ϕ, ψ will be true at all worlds reachable via R_L from a world where ✷ψ is true.

# **4 From SNF***sml* **to SNF***ml*

As KSP does not support SNFsml, in our evaluation of the effectiveness of the reductions defined in Sect. 3 we have used a transformation from SNFsml to SNFml. An alternative approach would be to reflect the use of SNFsml in the calculus and re-implement the prover. Whilst we believe that redesigning the calculus presents few problems, re-implementing KSP needs more thought, in particular regarding how to represent infinite sets of modal levels. The route we adopt here allows us to experiment with the approach in general without having to change the prover. For extensions of K with one or more of the axioms B, D, T, such a transformation is straightforward, as the sets of modal levels occurring in the normal form of modal formulae are all finite. Thus, instead of a single SNFsml clause S : ¬t_ψ ∨ η^f(ψ), we can use the finite set of SNFml clauses {ml : ¬t_ψ ∨ η^f(ψ) | ml ∈ S}.

**Table 5.** Bounds on the length of prefixes in SST tableaux

For extensions of K with at least one of the axioms 4 and 5, potentially together with other axioms, the sets of modal levels labelling clauses are in general infinite. For each logic L it is, however, possible to define a computable function that maps the modal formula ϕ under consideration to a bound db^ϕ_L such that restricting the modal levels in the normal form of ϕ by db^ϕ_L preserves satisfiability equivalence.

To establish the bound and prove satisfiability equivalence, we need to introduce the basic notions of Single Step Tableaux (SST) calculi for a modal logic L [14,21], which use sequences of natural numbers to prefix modal formulae in a tableau. The SST calculus consists of a set of rules, with the (π) rule being the only rule increasing the length of prefixes (i.e., σ : ✸ϕ / σ.n : ϕ with σ.n new on the branch). For a logic L, an L-*tableau* T in the SST calculus for a modal formula ϕ is a (binary) tree where the root of T is labelled with 1 : ϕ and every other node is labelled with a prefixed formula σ : ψ obtained by an application of a rule of the calculus. A *branch* B is a path from the root to a leaf. A branch B is *closed* if it contains either **false** or a propositional contradiction at the same prefix. A tableau T is *closed* if all its branches are closed. A prefixed formula σ : ψ is *reduced for rule* (r) *in* B if the branch B already contains the conclusion of such a rule application. By a *systematic tableau construction* we mean an application of the procedure in [14, p. 374] adapted to SST rules.

For each logic L, we establish its bound by considering an L-SST calculus, where a modal level in an SNFsml clause corresponds to the length of a prefix in an SST tableau. The bound then either follows from an already known bound on the length of prefixes in an SST tableau that preserves correctness of the SST calculus, or we establish such a bound ourselves. To prove satisfiability equivalence, we show that, for a closed SST tableau with such a bound on the length of prefixes in place, we can construct a resolution refutation of a set of SNFsml or SNFml clauses with a corresponding bound on the modal levels in those clauses.

For a modal formula ϕ in simplified NNF, let d^ϕ_m be the modal depth of ϕ, d^ϕ_✸ be the maximal nesting of ✸-operators not under the scope of any ✷-operator in ϕ, n^ϕ_✷ be the number of ✷-subformulae in ϕ, and n^ϕ_✸ be the number of ✸-subformulae below ✷-operators in ϕ. Our results for the bounds on the length of prefixes in SST tableaux can then be summarised by the following theorem.
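The four parameters are computed by straightforward recursions over the formula. The sketch below only makes the definitions concrete, using a hypothetical tuple encoding of our own choosing (literals as strings, ('and', f, g), ('or', f, g), ('dia', f) for ✸f, ('box', f) for ✷f):

```python
def d_m(f):
    """Modal depth of f."""
    if isinstance(f, str):
        return 0
    if f[0] in ('dia', 'box'):
        return 1 + d_m(f[1])
    return max(d_m(f[1]), d_m(f[2]))

def d_dia(f):
    """Maximal nesting of <> not under the scope of any []."""
    if isinstance(f, str):
        return 0
    if f[0] == 'box':
        return 0                      # everything below a [] is ignored
    if f[0] == 'dia':
        return 1 + d_dia(f[1])
    return max(d_dia(f[1]), d_dia(f[2]))

def n_box(f):
    """Number of []-subformulae in f."""
    if isinstance(f, str):
        return 0
    if f[0] in ('dia', 'box'):
        return (f[0] == 'box') + n_box(f[1])
    return n_box(f[1]) + n_box(f[2])

def n_dia_below_box(f, below=False):
    """Number of <>-subformulae occurring below a []-operator."""
    if isinstance(f, str):
        return 0
    if f[0] == 'dia':
        return (1 if below else 0) + n_dia_below_box(f[1], below)
    if f[0] == 'box':
        return n_dia_below_box(f[1], True)
    return n_dia_below_box(f[1], below) + n_dia_below_box(f[2], below)

# Example: phi = <><>[]p  /\  [](<>q)
phi = ('and', ('dia', ('dia', ('box', 'p'))), ('box', ('dia', 'q')))
```

For this example, d^ϕ_m = 3, d^ϕ_✸ = 2 (the two outer ✸), n^ϕ_✷ = 2, and n^ϕ_✸ = 1 (only ✸q sits below a ✷).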

**Theorem 2.** *Let L = KΣ with Σ ⊆ {B, D, T, 4, 5}. A systematic tableau construction of an L-tableau for a modal formula ϕ in simplified NNF under the following Constraints (TC1) and (TC2)*


*terminates in one of the following states:*


The proof is analogous to Massacci's [21, Section B.2]. Note that for the logics KD4 and KD5 we use max(1, n^ϕ_✸) in the calculation of the bound. That is, if n^ϕ_✸ ≥ 1 then max(1, n^ϕ_✸) = n^ϕ_✸ and the bound is the same as for K4 and K5, respectively. Otherwise, max(1, n^ϕ_✸) = 1, that is, the bound is the same as for a formula with a single ✸-subformula below ✷-operators in ϕ.

For K, KD, KT, KB, and KDB these bounds were already stated in [21, Tables III and IV]. The bound for KTB follows straightforwardly from those for KB and KDB. For KD4, Massacci [21, Tables III and IV] states the bound to be the same as for K4. However, this is not correct for the case where the formula ϕ contains no ✸-formulae, as its bound would then simply be 2, independent of ϕ. For example, the formula ✷✷✷**false**, which is KD4-unsatisfiable, does not have a closed KD4-tableau with this bound. For the other logics the bounds are new. As argued in [21], the bounds allow tableau decision procedures for extensions of K with axioms 4 and 5 that do not require a loop check and are therefore of wider interest.

Note that in KT4, ✷✷ψ and ✷ψ are equivalent, and so are ✷(ψ ∧ ✷ϑ) and ✷(ψ ∧ ϑ). So it makes sense to further simplify KT4 formulae using such equivalences before computing the normal form and the bound, with the benefit that this may reduce not only the bound but also the size of the normal form. Similar equivalences that can be used to reduce the number of modal operators in a formula also exist for other logics, see, e.g., [8, Chapter 4].

To establish a relationship between closed tableaux and resolution refutations of a set of SNFml clauses, we formally define the modal layered resolution calculus. Table 6 shows the inference rules of the calculus restricted to labels occurring in our normal form. For GEN1 and GEN3, if the modal clauses in the premises occur at the modal level ml, then the literal clause in the premises occurs at modal level ml + 1.

Let Φ be a set of SNFml clauses. A *(resolution) derivation from* Φ is a sequence of sets Φ₀, Φ₁, ... where Φ₀ = Φ and, for each i ≥ 0, Φ_{i+1} = Φ_i ∪ {D}, where D is the resolvent obtained by an application of one of the inference rules to premises in Φ_i. A *(resolution) refutation of* Φ is a derivation Φ₀, ..., Φ_k, k ∈ ℕ, where 0 : **false** ∈ Φ_k.
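A derivation of this shape can be driven by a naive saturation loop. The sketch below is illustrative only: it encodes a literal clause as a pair of a modal level and a frozenset of string literals, implements only LRES-style resolution between literal clauses, and, for simplicity, adds all resolvents of a round at once rather than one resolvent per step as in the definition:

```python
def neg(l):
    """Complement of a string literal, e.g. 'p' <-> '~p'."""
    return l[1:] if l.startswith('~') else '~' + l

def lres_resolvents(phi):
    """All LRES conclusions derivable in one step from literal clauses
    (ml, D) in phi; premises must share the same modal level ml."""
    out = set()
    for (m1, d1) in phi:
        for (m2, d2) in phi:
            if m1 != m2:
                continue
            for l in d1:
                if neg(l) in d2:
                    out.add((m1, (d1 - {l}) | (d2 - {neg(l)})))
    return out

def refutable(clauses):
    """Saturate under LRES; report whether 0 : false (the empty clause
    at modal level 0) is derived."""
    phi = set(clauses)
    empty = (0, frozenset())
    while empty not in phi:
        new = lres_resolvents(phi) - phi
        if not new:
            return False               # saturated without a refutation
        phi |= new
    return True
```

For instance, {0 : p, 0 : ¬p} is refutable, while the same pair of clauses placed at different modal levels is not, reflecting that LRES only resolves clauses at the same level.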

To map a set of SNFsml clauses to a set of SNFml clauses using a bound n ∈ ℕ on the modal levels, we define a function db_n on clauses and sets of clauses in SNFsml as follows:

$$\begin{aligned} \mathrm{db}\_n(S:\varphi) &= \{ ml: \varphi \mid ml \in S \text{ and } ml \le n \} \\ \mathrm{db}\_n(\Phi) &= \bigcup\_{S: \varphi \in \Phi} \mathrm{db}\_n(S:\varphi) \end{aligned}$$
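The function db_n amounts to a simple comprehension over the label sets. A minimal sketch, under the assumptions that a clause is a pair of a label set and an opaque clause body, and that an infinite label set such as {2}^≥ is approximated by a finite range purely for illustration (a real implementation would need a symbolic representation of such sets):

```python
def db_n(n, clauses):
    """Restrict a set of SNF_sml clauses (label set S, clause body) to
    modal levels <= n, producing SNF_ml clauses (single level, body)."""
    return {(ml, body) for (S, body) in clauses for ml in S if ml <= n}

# 'c1' is labelled {0}; 'c2' is labelled {2}^>= (finitely approximated here).
snf_sml = {(frozenset({0}), 'c1'), (frozenset(range(2, 100)), 'c2')}
snf_ml = db_n(3, snf_sml)
```

With the bound n = 3, the clause labelled {2}^≥ is copied to levels 2 and 3 only, illustrating how the transformation multiplies clauses across levels within the bound.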

Note that prefixes in SST-tableaux have a minimal length of 1 while the minimal modal level in SNFml clauses is 0. So, a prefix of length n in a prefixed formula corresponds to a modal level <sup>n</sup> <sup>−</sup> 1 in an SNFml clause.

The proof of the following theorem then takes advantage of the fact that we have surrogates and associated clauses for each subformula of ϕ and proceeds by induction over applications of rule (π).

**Theorem 3.** *Let L = KΣ with Σ ⊆ {B, D, T, 4, 5}, let ϕ be a KΣ-unsatisfiable formula in simplified NNF, let db^ϕ_L be as defined in Table 5, and let Φ_L = ρ^ml_L(ϕ) = db_{db^ϕ_L − 1}(ρ^sml_L(ϕ)). Then there is a resolution refutation of Φ_L.*

Regarding the size of the encoding, we note that, ignoring the labelling sets, the reduction ρ^sml_L into SNFsml is linear with respect to the size of the original formula. The size including the labelling sets depends on the exact representation of those sets, in particular of infinite sets. As those sets are not arbitrary, there is still an overall polynomial bound on the size of the sets of SNFsml clauses produced by ρ^sml_L. When transforming clauses from SNFsml into SNFml, we may need to add every clause at all levels within the bounds provided by Theorem 3. The parameters for calculating those bounds, d^ϕ_m, d^ϕ_✸, n^ϕ_✸, and n^ϕ_✷, are all themselves linearly bounded by the size of the formula. Thus, in the worst case, which is S4, the size of the clause set produced by ρ^ml_L is bounded by a polynomial of degree 3 with respect to the size of the original formula.

It is worth pointing out that both the reduction ρ^sml_L of a modal formula to SNFsml and the reduction ρ^ml_L to SNFml are reversible, that is, we can reconstruct the original formula from the SNFsml or SNFml clause set obtained by ρ^sml_L or ρ^ml_L, respectively. This reconstruction can also be performed in polynomial time. Thus the reduction itself does not affect the complexity of the satisfiability problem. For instance, the satisfiability problem for S5 is NP-complete, and so is the satisfiability problem of the subclass CS5 of SNFml clause sets that can be obtained as the result of an application of ρ^ml_S5 to a modal formula. However, a generic decision procedure for K will not be a complexity-optimal decision procedure for CS5.

#### **Table 6.** Inference rules of the MLR calculus

$$\mathrm{LRES}: \frac{ml: D \lor l \qquad ml: D' \lor \neg l}{ml: D \lor D'} \qquad\qquad \mathrm{MRES}: \frac{ml: l\_1 \to \Box l \qquad ml: l\_2 \to \Diamond \neg l}{ml: \neg l\_1 \lor \neg l\_2}$$

$$\mathrm{GEN1}: \frac{ml: l'\_1 \to \Box \neg l\_1 \quad \cdots \quad ml: l'\_m \to \Box \neg l\_m \qquad ml: l' \to \Diamond \neg l \qquad ml+1: l\_1 \lor \ldots \lor l\_m \lor l}{ml: \neg l'\_1 \lor \ldots \lor \neg l'\_m \lor \neg l'}$$

$$\mathrm{GEN2}: \frac{ml: l'\_1 \to \Box l\_1 \qquad ml: l'\_2 \to \Box \neg l\_1 \qquad ml: l'\_3 \to \Diamond l\_2}{ml: \neg l'\_1 \lor \neg l'\_2 \lor \neg l'\_3}$$

$$\mathrm{GEN3}: \frac{ml: l'\_1 \to \Box \neg l\_1 \quad \cdots \quad ml: l'\_m \to \Box \neg l\_m \qquad ml: l' \to \Diamond l \qquad ml+1: l\_1 \lor \ldots \lor l\_m}{ml: \neg l'\_1 \lor \ldots \lor \neg l'\_m \lor \neg l'}$$
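To make the shape of the modal rules concrete, here is a sketch of MRES and GEN2 on a hypothetical encoding of our own: a modal clause ml : l → ✷l′ or ml : l → ✸l′ is a triple (ml, l, l′) of a level and two string literals, and a rule application returns the conclusion or None when the premises do not match. This is illustrative only, not the prover's data structures:

```python
def neg(l):
    """Complement of a string literal, e.g. 'p' <-> '~p'."""
    return l[1:] if l.startswith('~') else '~' + l

def mres(pos, dia):
    """MRES: from ml: l1 -> []l and ml: l2 -> <>~l infer ml: ~l1 v ~l2."""
    ml1, l1, bl = pos          # positive modal clause (level, antecedent, boxed literal)
    ml2, l2, dl = dia          # negative modal clause (level, antecedent, diamond literal)
    if ml1 != ml2 or dl != neg(bl):
        return None            # rule not applicable
    return (ml1, frozenset({neg(l1), neg(l2)}))

def gen2(box_pos, box_neg, dia):
    """GEN2: from ml: l1' -> []l1, ml: l2' -> []~l1, ml: l3' -> <>l2
    infer ml: ~l1' v ~l2' v ~l3' (the <>-clause supplies a successor
    at which l1 and ~l1 would clash)."""
    ml1, a1, b1 = box_pos
    ml2, a2, b2 = box_neg
    ml3, a3, _ = dia
    if not (ml1 == ml2 == ml3 and b2 == neg(b1)):
        return None
    return (ml1, frozenset({neg(a1), neg(a2), neg(a3)}))
```

For example, mres((0, 'a', 'p'), (0, 'b', '~p')) yields the literal clause 0 : ¬a ∨ ¬b, while premises at different modal levels are rejected.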

#### **5 Evaluation**

An empirical evaluation of the practical usefulness of the reductions we presented in Sects. 3 and 4 faces the challenge that there is no substantive collection of benchmark formulae for the 15 logics of the modal cube except for basic modal logic. Catach [7] evaluates his prover on 31 modal formulae with a maximal length of 22 and maximal modal depth of 4. They are not sufficiently challenging. The QMLTP Problem Library for First-Order Modal Logics [32] focuses on quantified formulae and contains only a few formulae taken from the research literature that are purely propositional and were not written for the basic modal logic K. The Logics Workbench (LWB) benchmark collection [2] contains formulae for K, KT and S4 but not for any of the other logics we consider. For each of these three logics, the collection consists of 18 parameterised classes with 21 formulae each, plus scripts with which further formulae could be generated if needed. All formulae in 9 classes are satisfiable and all formulae in the other 9 classes are unsatisfiable in the respective logic.

In [29] we have used the 18 classes of the LWB benchmark collection for K to evaluate our approach for the six logics consisting of K and its extensions with a single axiom. One drawback of using these 18 classes for other modal logics is that formulae that are K-satisfiable are not necessarily KΣ-satisfiable for non-empty sets Σ of additional axioms. For example, for K5, only 60 out of 180 K-satisfiable formulae were K5-satisfiable. Another drawback is that while K-unsatisfiable formulae are also KΣ-unsatisfiable, a resolution refutation would not necessarily involve any of the additional clauses introduced by our reduction for KΣ. It may be that the additional clauses allow us to find a shorter refutation, but it may just be a case of finding the same refutation in a larger search space. It is also worth recalling that simplification alone is sufficient to determine that all formulae in the class k_lin_p are K-unsatisfiable, while pure literal elimination can be used to reduce all formulae in k_grz_p to the same simple formula [26].


**Table 7.** Logic-specific modification of unsatisfiable benchmark formulae

Thus, some of the classes evaluate the preprocessing capabilities of a prover but not the actual calculus and its implementation.

We therefore propose a different approach here. The principles underlying our approach are that (i) there should be the same number of formulae for each logic, though not necessarily the same formulae across all logics; (ii) there should be an equal number of satisfiable and unsatisfiable formulae for each logic; (iii) a formula that is L-unsatisfiable should only be L′-unsatisfiable for extensions L′ of L; (iv) a formula that is L′-satisfiable should be L-satisfiable for every extension L′ of L; (v) the formulae should belong to parameterised classes of formulae of increasing difficulty. Note that Principles (iii) and (iv) are intentionally not symmetric. For L-unsatisfiable formulae it should be necessary for a prover to use the rules or clauses specific to L instead of being able to find a refutation without them. For L-satisfiable formulae we want to maximise the search space for a model.

For unsatisfiable formulae, we take the five LWB classes k_branch_p, k_path_p, k_ph_p, k_poly_p, k_t4p_p and, for each logic L in the modal cube, transform each formula in a class so that it is L-unsatisfiable, but L′-satisfiable for any logic L′ that is not an extension of L. The transformation proceeds by first converting a formula ϕ to simplified NNF. Then, for each propositional literal l, it replaces all its occurrences by (l ∨ ψ^p_L), where |l| = p and ψ^p_L is a modal formula uniquely associated with p and L, resulting in a formula ϕ′. Finally, for the logics KD4 and KDB we need to add a disjunct (✷q ∧ ✷¬q) to ϕ′, while for the logics S4 and KTB we need to add a disjunct (q ∧ ✷¬q), where q is a propositional symbol not occurring in ϕ′. These disjuncts are unsatisfiable in the respective logics but satisfiable in logics where D, or T, do not hold. Table 7 shows the formulae ψ^p_L that we use in our evaluation. In the table, q_p and q′_p are propositional variables uniquely associated with p that do not occur in ϕ. The overall effect of this transformation is that the resulting classes of formulae satisfy Principles (iii) and (v).
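The literal-replacement step is a simple recursion over the NNF formula. In the sketch below, the formula ψ^p_L is a hypothetical stand-in (a fresh ✷q_p per propositional symbol p), not one of the actual formulae from Table 7, and the tuple encoding of formulae is likewise our own illustrative assumption:

```python
def modify(f, psi):
    """Replace every literal occurrence l (with |l| = p) by (l v psi(p)).
    Formulae: literals are strings 'p'/'~p'; compounds are
    ('and', f, g), ('or', f, g), ('dia', f), ('box', f)."""
    if isinstance(f, str):
        p = f.lstrip('~')              # |l| = p, the propositional symbol of l
        return ('or', f, psi(p))
    if f[0] in ('dia', 'box'):
        return (f[0], modify(f[1], psi))
    return (f[0], modify(f[1], psi), modify(f[2], psi))

# Hypothetical psi_p_L for illustration only: []q_p with a fresh q_p per p.
psi = lambda p: ('box', 'q_' + p)

phi = ('and', 'p', ('dia', '~p'))      # p /\ <>~p in simplified NNF
phi_prime = modify(phi, psi)
```

Both occurrences of the literal over p (positive and negative) receive the same ψ^p, so the structure of ϕ is preserved while its satisfiability can be steered by the choice of ψ^p_L.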

For satisfiable formulae, we use the five classes k_poly_n, s4_md_n, s4_ph_n, s4_path_n, s4_s5_n without modification. Although the first of these classes was designed to be K-satisfiable and the other four to be S4-satisfiable, the formulae in those classes are satisfiable in all the logics we consider. The class s4_ipc_n also consists only of S5-satisfiable formulae, but these appear to be insufficiently challenging and have not been included in our benchmark set. All other LWB benchmark classes for K and S4 are satisfiable in some of the logics, but not in all. The five classes satisfy Principles (iv) and (v). The benchmark collection consisting of all ten classes together then also satisfies Principles (i) and (ii).

**Table 8.** Benchmarking results

Another challenge for an empirical evaluation is the lack of available fully automatic theorem provers for all 15 logics, which we have already discussed in Sect. 1. This leaves us with just three different approaches we can compare: (i) the higher-order logic prover LEO-III [12,37], with **E** 2.6 as external reasoner, *LEO-III+E* for short, which supports a wide range of logics via semantic embedding into higher-order logic; (ii) the combination of our reductions with the modal-layered resolution (MLR) calculus for SNFml clauses [25], *R+MLR calculus* for short, implemented in the modal theorem prover KSP; and (iii) the global modal resolution (GMR) calculus, implemented in KSP, which has resolution rules for all 15 logics [24]. For the R+MLR and GMR calculi, resolution inferences between literal clauses can either be unrestricted (cplain option), restricted by negative resolution (cneg option), or restricted by an ordering (cord option). It is worth pointing out that negative and ordered resolution require slightly different transformations to the normal form that introduce additional clauses (snf+ and snf++ options, respectively). Also, the ordering cannot be arbitrary [25]. For the experiments, we have used the following options: (i) input processing: prenexing, together with simplification and pure literal elimination (bnfsimp, prenex, early_ple); (ii) preprocessing of clauses: renaming reuses symbols (limited_reuse_renaming), forward and backward subsumption (fsub, bsub) are enabled, the usable is populated with clauses whose maximal literal is positive (populate_usable, max_lit_positive), pure literal elimination is set for GMR (ple), and modal-level pure literal elimination is set for MLR (mlple); (iii) processing: inference rules not required for completeness are also used (unit, lhs_unit, mres), the options for preprocessing of clauses are kept, and clause selection takes the shortest clause by level (shortest).

For LEO-III we provide the prover with a modal formula in the syntax it expects plus a logic specification that tells the prover in which modal logic the formula is meant to be solved, for example, \$modal system S4. LEO-III can collaborate with external reasoners during proof search and we have used **E** 2.6 [34,35] as external reasoner and restricted LEO-III to one instance of **E** running in parallel. LEO-III is implemented in Java and we have set the maximum heap size to 1 GB and the thread stack size to 64 MB for the JVM.

Table 8 shows our benchmarking results. The first three columns of the table show the logic in which we determine the satisfiability status of each formula, the satisfiability status of the formulae, and their number. The next six columns then show how many of those formulae were solved by KSP with a particular calculus and refinement. The last column shows the result for LEO-III. The highest number or numbers are highlighted in bold. A time limit of 100 CPU seconds was set for each formula. Benchmarking was performed on a PC with an AMD Ryzen 5 5600X CPU @ 4.60 GHz max and 64 GB main memory using Fedora release 34 as operating system.

While the R+MLR calculus is competitive with GMR on extensions of K with axioms D, T and, possibly, B, the GMR calculus has better performance on extensions with axioms 4 and 5.

On satisfiable formulae, where for all logics we use exactly the same formulae and both resolution calculi have to saturate the set of clauses up to redundancy, the number of formulae solved is directly linked to the number of inferences necessary to do so. The fact that we reduce SNFsml clauses to SNFml clauses via the introduction of multiple copies of the same clausal formulae with different labels clearly leads to a corresponding multiplication of the inferences that need to be performed. LEO-III+E does not solve any of the satisfiable formulae. This illustrates how important additional techniques are that can turn resolution into a decision procedure on embeddings of modal logics into first-order logic [18,33].

On unsatisfiable formulae, where we use different formulae for each logic, the number of formulae solved is linked to the number of inferences it takes to find a refutation. For instance, on K the GMR calculus requires on average 6.2 times as many inferences to find a refutation as the R+MLR calculus. However, for all other logics the opposite is true: on the remaining 14 logics, the R+MLR calculus requires on average 6.5 times as many inferences to find a refutation as the GMR calculus. Given that the R+MLR calculus currently uses a reduction from a modal logic to SNFsml followed by a transformation from SNFsml to SNFml, it is difficult to discern which of the two is the major problem. It is clear that multiple copies of the same clausal formulae are also detrimental to proof search. LEO-III+E does reasonably well on unsatisfiable formulae, and the results clearly show the impact that additional axioms have on its performance. It performs best for KT and K, but for logics involving axioms 4 and 5 very few formulae can be solved. The external prover **E** finds the proof for 121 of the 122 modal formulae LEO-III+E can solve.

## **6 Conclusions**

We have presented novel reductions of extensions of the modal logic K with arbitrary combinations of the axioms B, D, T, 4, 5 to the clausal normal forms SNFsml and SNFml for K. The implementation of these reductions, combined with KSP [26], allows us to reason in all 15 logics of the modal cube in a fully automatic way. Such support was so far extremely limited.

The transformation of sets of SNFsml to sets of SNFml relies on new results that show that non-clausal closed tableaux in the Single Step Tableaux calculus [14,21] can be simulated by refutations in the modal-layered resolution (MLR) calculus for SNFml clauses [25].

We have also developed a new collection of benchmark formulae that covers all 15 logics of the modal cube. The collection consists of classes of parameterised and therefore scalable formulae. It contains an equal number of satisfiable and unsatisfiable formulae for each logic, and the satisfiability status of each formula is known in advance. So far, extensive collections of benchmark formulae were only available for K, with smaller collections for KT and S4. A key feature of our approach is that it uses the systematic modification of K-unsatisfiable formulae to obtain unsatisfiable formulae in other logics. Thus, we could obtain a more extensive collection by applying this approach to further collections of benchmark formulae for K.

The evaluation we presented shows that on most of the 15 modal logics the combination of our reduction to SNFml with the MLR calculus does not perform as well as the global modal resolution (GMR) calculus, also implemented in KSP. This contrasts with the evaluation in [29], where we only considered six logics and used a different collection of benchmarks. We believe that the new benchmark collection more clearly indicates weaknesses in the current approach, in particular, the reduction from SNFsml to SNFml. It is possible that the implementation of a calculus that operates directly on sets of SNFsml clauses would perform considerably better as it avoids the repetition of clauses with different labels. However, it does so by using potentially infinite sets of labels which makes an implementation challenging. We intend to explore this possibility in future work.

### **References**


297–396. Springer, Heidelberg (1999). https://doi.org/10.1007/978-94-017-1754-0_6


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

**Proof Systems and Proof Search**

# **Cyclic Proofs, Hypersequents, and Transitive Closure Logic**

Anupam Das(B) and Marianna Girlando

University of Birmingham, Birmingham, UK {a.das,m.girlando}@bham.ac.uk

**Abstract.** We propose a cut-free cyclic system for Transitive Closure Logic (TCL) based on a form of *hypersequents*, suitable for automated reasoning via proof search. We show that previously proposed sequent systems are cut-free incomplete for basic validities from Kleene Algebra (KA) and Propositional Dynamic Logic (PDL), over standard translations. On the other hand, our system faithfully simulates known cyclic systems for KA and PDL, thereby inheriting their completeness results. A peculiarity of our system is its richer correctness criterion, exhibiting 'alternating traces' and necessitating a more intricate soundness argument than for traditional cyclic proofs.

**Keywords:** Cyclic proofs · Transitive Closure Logic · Hypersequents · Propositional Dynamic Logic

## **1 Introduction**

*Transitive Closure Logic* (TCL) is the extension of first-order logic by an operator computing the transitive closure of definable binary relations. It has been studied by numerous authors, e.g. [15–17], and in particular has been proposed as a foundation for the mechanisation and automation of mathematics [1].

Recently, Cohen and Rowe have proposed *non-wellfounded* and *cyclic* systems for TCL [9,11]. These systems differ from usual ones by allowing proofs to be infinite (finitely branching) trees, rather than finite ones, under some appropriate global correctness condition (the 'progressing criterion'). One particular feature of the cyclic approach to proof theory is the facilitation of automation, since complexity of inductive invariants is effectively traded off for a richer proof structure. In fact this trade-off has recently been made formal, cf. [3,12], and has led to successful applications to automated reasoning, e.g. [6,7,24,26,27].

In this work we investigate the capacity of cyclic systems to automate reasoning in TCL. Our starting point is the demonstration of a key shortfall of Cohen and Rowe's system: its cut-free fragment, here called TC<sup>G</sup>, is unable to cyclically prove even standard theorems of relational algebra, e.g. $(a \cup b)^* = a^*(ba^*)^*$ and

© The Author(s) 2022

This work was supported by a UKRI Future Leaders Fellowship, 'Structure vs Invariants in Proofs', project reference MR/S035540/1.

J. Blanchette et al. (Eds.): IJCAR 2022, LNAI 13385, pp. 509–528, 2022. https://doi.org/10.1007/978-3-031-10769-6\_30

$(aa \cup aba)^+ \leq a^+((ba^+)^+ \cup a)$ (Theorem 12). An immediate consequence of this is that cyclic proofs of TC<sup>G</sup> do not enjoy cut-admissibility (Corollary 13). On the other hand, these (in)equations are theorems of Kleene Algebra (KA) [18,19], a decidable theory which admits automation-via-proof-search thanks to the recent cyclic system of Das and Pous [14].

What is more, TCL is well-known to interpret Propositional Dynamic Logic (PDL), a modal logic whose modalities are just terms of KA, by a natural extension of the 'standard translation' from (multi)modal logic to first-order logic (see, e.g., [4,5]). Incompleteness of cyclic-TC<sup>G</sup> for PDL over this translation is inherited from its incompleteness for KA. This is in stark contrast to the situation for modal logics without fixed points: the standard translation from K (and, indeed, all logics in the 'modal cube') to first-order logic actually *lifts* to cut-free proofs for a wide range of modal logic systems, cf. [21,22].

A closer inspection of the systems for KA and PDL reveals the stumbling block to any simulation: these systems implicitly conduct a form of 'deep inference', by essentially reasoning underneath ∃ and ∧. Inspired by this observation, we propose a form of *hypersequents* for predicate logic, with extra structure admitting the deep reasoning required. We present the cut-free system HTC and a novel notion of cyclic proof for these hypersequents. In particular, the incorporation of some deep inference at the level of the rules necessitates an 'alternating' trace condition corresponding to *alternation* in automata theory.

Our first main result is the Soundness Theorem (Theorem 23): nonwellfounded proofs of HTC are sound for *standard semantics*. The proof is rather more involved than usual soundness arguments in cyclic proof theory, due to the richer structure of hypersequents and the corresponding progress criterion. Our second main result is the Simulation Theorem (Theorem 28): HTC is complete for PDL over the standard translation, by simulating a cut-free cyclic system for the latter. This result can be seen as a formal interpretation of cyclic modal proof theory within cyclic predicate proof theory, in the spirit of [21,22].

To simplify the exposition, we shall mostly focus on equality-free TCL and 'identity-free' PDL in this paper, though all our results hold also for the 'reflexive' extensions of both logics. We discuss these extensions in Sect. 7, and present further insights and conclusions in Sect. 8. Full proofs and further examples not included here (due to space constraints) can be found in [13].

## **2 Preliminaries**

We shall work with a fixed first-order vocabulary consisting of a countable set Pr of unary *predicate* symbols, written p, q, etc., and of a countable set Rel of binary *relation* symbols, written a, b, etc. We shall generally reserve the word 'predicate' for unary and 'relation' for binary. We could include further relational symbols too, of higher arity, but choose not to in order to calibrate the semantics of both our modal and predicate settings.

We build formulas from this language differently in the modal and predicate settings, but all our formulas may be formally evaluated within *structures*:

**Definition 1 (Structures).** *A* structure M *consists of a set* D*, called the* domain *of* M*, which we sometimes denote by* |M|*; a subset* p<sup>M</sup> ⊆ D *for each* p ∈ Pr*; and a subset* a<sup>M</sup> ⊆ D × D *for each* a ∈ Rel*.*

#### **2.1 Transitive Closure Logic**

In addition to the language introduced at the beginning of this section, in the predicate setting we further make use of a countable set of *function* symbols, written f<sup>i</sup>, g<sup>j</sup>, etc., where the superscripts i, j ∈ N indicate the *arity* of the function symbol and may be omitted when no ambiguity arises. Nullary function symbols (aka *constant* symbols) are written c, d, etc. We shall also make use of *variables*, written x, y, etc., typically bound by quantifiers. *Terms*, written s, t, etc., are generated as usual from variables and function symbols by function application. A term is *closed* if it has no variables.

We consider the usual syntax for first-order logic formulas over our language, with an additional operator for transitive closure (and its dual). Formally, TCL formulas, written A, B, etc., are generated as follows:

$$A, B ::= p(t) \mid \bar{p}(t) \mid a(s, t) \mid \bar{a}(s, t) \mid (A \wedge B) \mid (A \vee B) \mid \forall x A \mid \exists x A \mid TC(\lambda x, y. A)(s, t) \mid \overline{TC}(\lambda x, y. A)(s, t)$$

When the variables x, y are clear from context, we may write *TC*(A(x, y))(s, t) or *TC*(A)(s, t) instead of *TC*(λx, y.A)(s, t), as an abuse of notation, and similarly for $\overline{TC}$. We write A[t/x] for the formula obtained from A by replacing every free occurrence of the variable x by the term t. We have included both *TC* and $\overline{TC}$ as primitive operators so that negation can be reduced to atomic formulas, as shown below. This will eventually allow a one-sided formulation of proofs.

**Definition 2 (Duality).** *For a formula* A *we define its* complement*,* A¯*, by:*

$$\begin{array}{llll}
\overline{p(t)} := \bar{p}(t) & \overline{a(s,t)} := \bar{a}(s,t) & \overline{A \wedge B} := \bar{A} \vee \bar{B} & \overline{TC(A)(s,t)} := \overline{TC}(\bar{A})(s,t) \\
\overline{\bar{p}(t)} := p(t) & \overline{\bar{a}(s,t)} := a(s,t) & \overline{A \vee B} := \bar{A} \wedge \bar{B} & \overline{\overline{TC}(A)(s,t)} := TC(\bar{A})(s,t) \\
\overline{\exists x A} := \forall x \bar{A} & \overline{\forall x A} := \exists x \bar{A} & &
\end{array}$$

We shall employ standard logical abbreviations, e.g. A ⊃ B for $\bar{A} \vee B$.
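To make Definition 2 concrete, here is a minimal sketch of the complement function on a toy abstract syntax for TCL formulas. The tuple encoding and tag names (`"p"`, `"coTC"`, etc.) are our own illustration, not notation from the paper.

```python
# Hedged sketch of Definition 2: De Morgan complement on a toy formula AST.
# Atoms flip their bar, connectives and quantifiers dualise, TC <-> coTC.

def complement(F):
    tag = F[0]
    duals = {"p": "np", "np": "p", "a": "na", "na": "a",
             "and": "or", "or": "and",
             "forall": "exists", "exists": "forall",
             "TC": "coTC", "coTC": "TC"}
    if tag in ("p", "np", "a", "na"):        # atoms: flip the bar
        return (duals[tag],) + F[1:]
    if tag in ("and", "or"):                 # De Morgan on the two subformulas
        return (duals[tag], complement(F[1]), complement(F[2]))
    if tag in ("forall", "exists"):          # F = (tag, var, body)
        return (duals[tag], F[1], complement(F[2]))
    if tag in ("TC", "coTC"):                # F = (tag, binders, args, body)
        return (duals[tag], F[1], F[2], complement(F[3]))
    raise ValueError(tag)
```

As the definition requires, `complement` is an involution: applying it twice returns the original formula.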

We may evaluate formulas with respect to a structure, but we need additional data for interpreting function symbols:

**Definition 3 (Interpreting function symbols).** *Let* M *be a structure with domain* D*. An* interpretation *is a map* ρ *that assigns to each function symbol* f<sup>n</sup> *a function* D<sup>n</sup> → D*. We may extend any interpretation* ρ *to an action on (closed) terms by setting recursively* ρ(f(t<sub>1</sub>,...,t<sub>n</sub>)) := ρ(f)(ρ(t<sub>1</sub>),...,ρ(t<sub>n</sub>))*.*
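The recursive extension of ρ to closed terms in Definition 3 can be sketched as follows; the nested-tuple term encoding and the sample symbols `c` and `succ` are assumptions of ours.

```python
# Hedged sketch of Definition 3: extending an interpretation rho of function
# symbols to closed terms, encoded as nested tuples ("f", t1, ..., tn).

def eval_term(rho, t):
    f, *args = t
    # rho(f(t1,...,tn)) := rho(f)(rho(t1),...,rho(tn))
    return rho[f](*[eval_term(rho, s) for s in args])

# constants are nullary function symbols, e.g. c below
rho = {"c": lambda: 0, "succ": lambda n: n + 1}
```

For instance, the closed term succ(succ(c)) evaluates under this ρ to the domain element 2.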

We only consider *standard* semantics in this work: *TC* (and $\overline{TC}$) is always interpreted as the *real* transitive closure (and its dual) in a structure, rather than being axiomatised by some induction (and coinduction) principle.

**Definition 4 (Semantics).** *Given a structure* M *with domain* D *and an interpretation* ρ*, the judgement* M, ρ |= A *is defined as usual for first-order logic with the following additional clauses for TC and $\overline{TC}$:*<sup>1</sup>

- M, ρ |= *TC*(A)(s, t) if there are v<sub>0</sub>,...,v<sub>n+1</sub> ∈ D such that ρ(s) = v<sub>0</sub>, ρ(t) = v<sub>n+1</sub> and, for every i ≤ n, M, ρ |= A(v<sub>i</sub>, v<sub>i+1</sub>);
- M, ρ |= $\overline{TC}$(A)(s, t) if for all v<sub>0</sub>,...,v<sub>n+1</sub> ∈ D with ρ(s) = v<sub>0</sub> and ρ(t) = v<sub>n+1</sub>, there is some i ≤ n with M, ρ |= A(v<sub>i</sub>, v<sub>i+1</sub>).
*Remark 5 (TC and $\overline{TC}$ as least and greatest fixed points).* As expected, we have M, ρ |= $\overline{TC}$(A)(s, t) just if M, ρ ⊭ *TC*($\bar{A}$)(s, t), and so the two operators are semantically dual. Thus, *TC* and $\overline{TC}$ duly correspond to least and greatest fixed points, respectively, satisfying in any model:

$$TC(A)(s,t) \iff A(s,t) \lor \exists x (A(s,x) \land TC(A)(x,t))\tag{1}$$

$$
\overline{TC}(A)(s,t) \iff A(s,t) \land \forall x (A(s,x) \lor \overline{TC}(A)(x,t))\tag{2}
$$
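On a finite structure, the standard interpretation of *TC* and the least-fixed-point unfolding (1) can be checked mechanically. The following sketch computes the genuine transitive closure and verifies (1) pointwise; the domain and relation are arbitrary toy data of ours.

```python
# Hedged check of equivalence (1) on a small finite structure:
# TC(A)(s,t) holds iff A(s,t) or there is x with A(s,x) and TC(A)(x,t).

def transitive_closure(rel):
    """Genuine transitive closure of a finite binary relation (set of pairs)."""
    closure, changed = set(rel), True
    while changed:
        changed = False
        for (u, v) in list(closure):
            for (w, x) in list(closure):
                if v == w and (u, x) not in closure:
                    closure.add((u, x))
                    changed = True
    return closure

domain = {0, 1, 2, 3}
A = {(0, 1), (1, 2), (2, 3), (3, 1)}   # toy relation with a cycle 1->2->3->1
tc = transitive_closure(A)

# verify the least-fixed-point unfolding (1) at every pair (s, t)
for s in domain:
    for t in domain:
        unfolded = (s, t) in A or any((s, x) in A and (x, t) in tc for x in domain)
        assert ((s, t) in tc) == unfolded
```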

Let us point out that our $\overline{TC}$ operator is not the same as Cohen and Rowe's transitive 'co-closure' operator *TC<sup>op</sup>* in [10], but rather the De Morgan dual of *TC*. In the presence of negation, *TC* and $\overline{TC}$ are indeed interdefinable, cf. Definition 2.

#### **2.2 Cohen-Rowe Cyclic System for TCL**

Cohen and Rowe proposed in [9,11] a non-wellfounded system for TCL that extends a usual sequent calculus LK<sup>=</sup> for first-order logic with equality and substitution by rules for *TC* inspired by its characterisation as a least fixed point, cf. (1).<sup>2</sup> Note that the presence of the substitution rule is critical for the notion of 'regularity' in predicate cyclic proof theory. The resulting notions of non-wellfounded and cyclic proofs are formulated similarly to those for first-order logic with (ordinary) inductive definitions [8]:

**Definition 6 (Sequent system).** TC<sup>G</sup> *is the extension of* LK<sup>=</sup> *by the rules:*

$$\frac{\Gamma, A(s, t)}{\Gamma,\, TC(A)(s, t)}\, TC_0 \qquad \frac{\Gamma, A(s, r) \quad \Gamma,\, TC(A)(r, t)}{\Gamma,\, TC(A)(s, t)}\, TC \qquad \frac{\Gamma, A(s, t) \quad \Gamma, A(s, c),\, \overline{TC}(A)(c, t)}{\Gamma,\, \overline{TC}(A)(s, t)}\, \overline{TC} \;\; (c \text{ fresh}) \tag{3}$$

TC<sup>G</sup>-*preproofs are possibly infinite trees of sequents generated by the rules of* TC<sup>G</sup>*. A preproof is* regular *if it has only finitely many distinct sub-preproofs.*

<sup>1</sup> Note that we are including 'parameters from the model' in formulas here. Formally, this means each v ∈ D is construed as a constant symbol for which ρ(v) = v.

<sup>2</sup> Cohen and Rowe's system is originally called RTC<sub>G</sub>, using instead a 'reflexive' version *RTC* of the *TC* operator. However this (and its rules) can be encoded (and simulated) by defining *RTC*(λx, y.A)(s, t) := *TC*(λx, y.(x = y ∨ A))(s, t).

The notion of 'correct' non-wellfounded proof is obtained by a standard *progressing criterion* in cyclic proof theory. We shall not go into details here, as they are beyond the scope of this work, but refer the reader to the original works (as well as [13] for our current variant). Let us write ⊢<sub>cyc</sub> for their notion of cyclic provability using the above rules, cf. [9,11]. A standard infinite descent countermodel argument yields:

**Proposition 7 (Soundness, [9,11]).** *If* TC<sup>G</sup> ⊢<sub>cyc</sub> A *then* |= A*.*

In fact, this result is subsumed by our main soundness result for HTC (Theorem 23) and its simulation of TC<sup>G</sup> (Theorem 19). In the presence of cut, a form of converse of Proposition 7 holds: cyclic TC<sup>G</sup> proofs are 'Henkin complete', i.e. complete for all models of a particular axiomatisation of TCL based on (co)induction principles for *TC* (and *TC* ) [9,11]. However, the counterexample we present in the next section implies that cut is not eliminable (Corollary 13).

## **3 Interlude: Motivation from PDL and Kleene Algebra**

Given the TCL sequent system proposed by Cohen and Rowe, why do we propose a hypersequential system? Our main argument is that proof search in TC<sup>G</sup> is rather weak, to the extent that cut-free cyclic proofs are unable to simulate a basic (cut-free) system for modal logic PDL (regardless of proof search strategy). At least one motivation here is to 'lift' the *standard translation* from cut-free cyclic proofs for PDL to cut-free cyclic proofs in an adequate system for TCL.

#### **3.1 Identity-Free PDL**

*Identity-free propositional dynamic logic* (PDL<sup>+</sup>) is a version of the modal logic PDL without tests or identity, thereby admitting an 'equality-free' standard translation into predicate logic. Formally, PDL<sup>+</sup> *formulas*, written A, B, etc., and *programs*, written α, β, etc., are generated by the following grammars:

$$\begin{aligned} A, B &::= p \mid \overline{p} \mid (A \wedge B) \mid (A \vee B) \mid [\alpha]A \mid \langle \alpha \rangle A \\ \alpha, \beta &::= a \mid (\alpha; \beta) \mid (\alpha \cup \beta) \mid \alpha^{+} \end{aligned}$$

We sometimes simply write αβ instead of α; β, and (α)A for a formula that is either ⟨α⟩A or [α]A.

**Definition 8 (Duality).** *For a formula* A *we define its* complement*,* A¯*, by:*

$$\begin{array}{lll}
\overline{p} := \bar{p} & \overline{A \wedge B} := \bar{A} \vee \bar{B} & \overline{[\alpha]A} := \langle \alpha \rangle \bar{A} \\
\overline{\bar{p}} := p & \overline{A \vee B} := \bar{A} \wedge \bar{B} & \overline{\langle \alpha \rangle A} := [\alpha] \bar{A}
\end{array}$$

We *evaluate* PDL<sup>+</sup> formulas using the traditional relational semantics of modal logic, by associating each program with a binary relation in a structure. Again, we only consider standard semantics, in the sense that the + operator is interpreted as the real transitive closure within a structure.

**Definition 9 (Semantics).** *For structures* M *with domain* D*, elements* v ∈ D*, programs* α *and formulas* A*, we define* α<sup>M</sup> ⊆ D × D *and the judgement* M, v |= A *as follows:*

- (a<sup>M</sup> is already given in the specification of M, cf. Definition 1).
- (α; β)<sup>M</sup> := {(u, v) : there is w ∈ D s.t. (u, w) ∈ α<sup>M</sup> and (w, v) ∈ β<sup>M</sup>}.
- (α ∪ β)<sup>M</sup> := {(u, v) : (u, v) ∈ α<sup>M</sup> or (u, v) ∈ β<sup>M</sup>}.
- (α<sup>+</sup>)<sup>M</sup> := {(u, v) : there are w<sub>0</sub>, ..., w<sub>n+1</sub> ∈ D s.t. u = w<sub>0</sub>, v = w<sub>n+1</sub> and, for every i ≤ n, (w<sub>i</sub>, w<sub>i+1</sub>) ∈ α<sup>M</sup>}.

- M, v |= p if v ∈ p<sup>M</sup>.
- M, v |= p̄ if v ∉ p<sup>M</sup>.
- M, v |= A ∧ B if M, v |= A and M, v |= B.
- M, v |= A ∨ B if M, v |= A or M, v |= B.
- M, v |= [α]A if for all (v, w) ∈ α<sup>M</sup> we have M, w |= A.
- M, v |= ⟨α⟩A if there is (v, w) ∈ α<sup>M</sup> with M, w |= A.

*If* M, v |= A *for all* M *and* v ∈ |M|*, then we write* |= A*.*
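For finite structures, Definition 9 can be read off directly as an evaluator: programs denote binary relations and formulas are checked at a world. The AST encoding below (nested tuples, tag names like `"dia"` and `"plus"`) is our own sketch, not notation from the paper.

```python
# Hedged sketch of Definition 9 on finite structures. A structure M maps
# relation symbols to sets of pairs and predicate symbols to sets of worlds.

def prog(M, a):
    """Binary relation denoted by a program in structure M."""
    tag = a[0]
    if tag == "atom":
        return M["rel"][a[1]]
    if tag == "seq":                           # (alpha; beta): relational composition
        R, S = prog(M, a[1]), prog(M, a[2])
        return {(u, v) for (u, w) in R for (w2, v) in S if w == w2}
    if tag == "union":                         # (alpha U beta)
        return prog(M, a[1]) | prog(M, a[2])
    if tag == "plus":                          # alpha+: genuine transitive closure
        R, changed = set(prog(M, a[1])), True
        while changed:
            changed = False
            for (u, w) in list(R):
                for (w2, v) in list(R):
                    if w == w2 and (u, v) not in R:
                        R.add((u, v))
                        changed = True
        return R
    raise ValueError(tag)

def sat(M, v, A):
    """M, v |= A for the PDL+ connectives."""
    tag = A[0]
    if tag == "p":   return v in M["pred"][A[1]]
    if tag == "np":  return v not in M["pred"][A[1]]
    if tag == "and": return sat(M, v, A[1]) and sat(M, v, A[2])
    if tag == "or":  return sat(M, v, A[1]) or sat(M, v, A[2])
    if tag == "box": return all(sat(M, w, A[2]) for (u, w) in prog(M, A[1]) if u == v)
    if tag == "dia": return any(sat(M, w, A[2]) for (u, w) in prog(M, A[1]) if u == v)
    raise ValueError(tag)

# a toy structure: a-steps 0->1->2, a b-step 2->0, and p holding at world 2
M = {"rel": {"a": {(0, 1), (1, 2)}, "b": {(2, 0)}}, "pred": {"p": {2}}}
```

For instance, ⟨a⁺⟩p holds at world 0 in this toy structure, while ⟨a⟩p does not.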

Note that we are overloading the satisfaction symbol <sup>|</sup>= here, for both PDL<sup>+</sup> and TCL. This should never cause confusion, in particular since the two notions of satisfaction are 'compatible' as we shall now see.

#### **3.2 The Standard Translation**

The so-called 'standard translation' of modal logic into predicate logic is induced by reading the semantics of modal logic as first-order formulas. We now give a natural extension of this that interprets PDL<sup>+</sup> into TCL. At the logical level our translation coincides with the usual one for basic modal logic; our translation of programs, as expected, requires the *TC* operator to interpret the + of PDL<sup>+</sup>.

**Definition 10.** *For* PDL<sup>+</sup> *formulas* A *and programs* α*, we define the* standard translations ST(A)(x) *and* ST(α)(x, y) *as* TCL*-formulas with free variables* x *and* x, y*, resp., inductively as follows:*

$$\begin{array}{ll}
ST(p)(x) := p(x) & ST(a)(x, y) := a(x, y) \\
ST(\bar{p})(x) := \bar{p}(x) & ST(\alpha \cup \beta)(x, y) := ST(\alpha)(x, y) \vee ST(\beta)(x, y) \\
ST(A \vee B)(x) := ST(A)(x) \vee ST(B)(x) & ST(\alpha; \beta)(x, y) := \exists z (ST(\alpha)(x, z) \wedge ST(\beta)(z, y)) \\
ST(A \wedge B)(x) := ST(A)(x) \wedge ST(B)(x) & ST(\alpha^{+})(x, y) := TC(ST(\alpha))(x, y) \\
ST(\langle \alpha \rangle A)(x) := \exists y (ST(\alpha)(x, y) \wedge ST(A)(y)) & \\
ST([\alpha]A)(x) := \forall y (ST(\alpha)(x, y) \supset ST(A)(y)) &
\end{array}$$

*where TC* (ST(α)) *is shorthand for TC* (λx, y.ST(α)(x, y))*.*
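The standard translation of Definition 10 is directly implementable as a recursive function. In the sketch below (our own encoding: formulas are rendered as ASCII strings, with `E`/`A` for the quantifiers and a counter for fresh variables), the clause for [α]A is rendered as an implication for readability.

```python
import itertools

# Hedged sketch of Definition 10: the standard translation ST, rendered
# as strings. The tuple AST and the output syntax are our own choices.

_fresh = itertools.count()

def fresh():
    return f"z{next(_fresh)}"

def ST_pr(a, x, y):
    """ST(alpha)(x, y): translate a program to a first-order formula."""
    tag = a[0]
    if tag == "atom":
        return f"{a[1]}({x},{y})"
    if tag == "union":
        return f"({ST_pr(a[1], x, y)} | {ST_pr(a[2], x, y)})"
    if tag == "seq":
        z = fresh()
        return f"E{z}({ST_pr(a[1], x, z)} & {ST_pr(a[2], z, y)})"
    if tag == "plus":                      # alpha+ needs the TC operator
        return f"TC(\\{x},{y}.{ST_pr(a[1], x, y)})({x},{y})"
    raise ValueError(tag)

def ST_fm(A, x):
    """ST(A)(x): translate a PDL+ formula to a first-order formula."""
    tag = A[0]
    if tag == "p":
        return f"{A[1]}({x})"
    if tag == "np":
        return f"~{A[1]}({x})"
    if tag in ("and", "or"):
        op = "&" if tag == "and" else "|"
        return f"({ST_fm(A[1], x)} {op} {ST_fm(A[2], x)})"
    if tag == "dia":
        y = fresh()
        return f"E{y}({ST_pr(A[1], x, y)} & {ST_fm(A[2], y)})"
    if tag == "box":
        y = fresh()
        return f"A{y}({ST_pr(A[1], x, y)} -> {ST_fm(A[2], y)})"
    raise ValueError(tag)
```

Note how only the translation of ⁺ leaves first-order syntax, producing a *TC* formula, exactly as in the table above.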

It is routine to show that $\overline{ST(A)(x)} = ST(\bar{A})(x)$, by structural induction on A, justifying our overloading of the notation $\bar{A}$ in both TCL and PDL<sup>+</sup>. Yet another advantage of using the same underlying language for both the modal and predicate settings is that we can state the following (expected) result without the need for encodings; it follows by a routine structural induction (see, e.g., [5]):

**Theorem 11.** *For* PDL<sup>+</sup> *formulas* A*, we have* M, v |= A *iff* M |= ST(A)(v)*.*

#### **3.3 Cohen-Rowe System is not Complete for PDL<sup>+</sup>**

PDL<sup>+</sup> admits a standard cut-free cyclic proof system LPD<sup>+</sup> (see Sect. 6.1) which is both sound and complete (cf. Theorem 30). However, a shortfall of TC<sup>G</sup> is that it is unable to cut-free simulate LPD<sup>+</sup>. In fact, we can say something stronger:

**Theorem 12 (Incompleteness).** *There exists a* PDL<sup>+</sup> *formula* A *such that* |= A *but* TC<sup>G</sup> ⊬<sub>cyc</sub> ST(A)(x) *(in the absence of cut).*

This means not only that TC<sup>G</sup> is unable to locally cut-free simulate the rules of LPD<sup>+</sup>, but also that there are some validities for which there are no cut-free cyclic proofs at all in TC<sup>G</sup>. One example of such a formula is:

$$\langle (aa \cup aba)^{+} \rangle p \supset \langle a^{+} ((ba^{+})^{+} \cup a) \rangle p \tag{4}$$

A detailed proof of this can be found in [13], but let us briefly discuss it here. First, the formula above is not artificial: it is derived from the well-known PDL validity ⟨(a ∪ b)<sup>∗</sup>⟩p ⊃ ⟨a<sup>∗</sup>(ba<sup>∗</sup>)<sup>∗</sup>⟩p by identity-elimination. This in turn is essentially a theorem of relational algebra, namely (a ∪ b)<sup>∗</sup> ≤ a<sup>∗</sup>(ba<sup>∗</sup>)<sup>∗</sup>, which is often used to eliminate ∪ in (sums of) regular expressions. The same equation was one of those used by Das and Pous in [14] to show that the sequent system LKA for Kleene Algebra is cut-free cyclic incomplete.

The argument that TC<sup>G</sup> ⊬<sub>cyc</sub> ST(4)(x) is much more involved than the one from [14], due to the fact that we are working in predicate logic, but the underlying idea is similar. At a very high level, the RHS of (4) (viewed as a relational inequality) is translated to an existential formula ∃z(ST(a<sup>+</sup>)(x, z) ∧ ST((ba<sup>+</sup>)<sup>+</sup> ∪ a)(z, y)) that, along some branch (namely the one that always chooses aa when decomposing the LHS of (4)), can never be instantiated while remaining valid. This branch witnesses the non-regularity of any proof. However, ST(4)(x) is cyclically provable in TC<sup>G</sup> with cut, so an immediate consequence of Theorem 12 is:

**Corollary 13.** *The class of cyclic proofs of* TC<sup>G</sup> *does not enjoy cut-admissibility.*

## **4 Hypersequent Calculus for TCL**

Let us take a moment to examine why any 'local' simulation of LPD<sup>+</sup> by TC<sup>G</sup> fails, in order to motivate the main system that we shall present. The program rules, in particular the ⟨·⟩-rules, require a form of *deep inference* to be correctly simulated, over the standard translation. For instance, let us consider the action of the standard translation on two rules we shall see later in LPD<sup>+</sup> (cf. Sect. 6.1):

$$\langle\cup\rangle\,\frac{\Gamma, \langle a_0 \rangle p}{\Gamma, \langle a_0 \cup a_1 \rangle p} \;\leadsto\; \frac{ST(\Gamma)(c),\ \exists x (a_0(c, x) \wedge p(x))}{ST(\Gamma)(c),\ \exists x ((a_0(c, x) \vee a_1(c, x)) \wedge p(x))}$$

$$\langle;\rangle\,\frac{\Gamma, \langle a \rangle \langle b \rangle p}{\Gamma, \langle a; b \rangle p} \;\leadsto\; \frac{ST(\Gamma)(c),\ \exists y (a(c, y) \wedge \exists x (b(y, x) \wedge p(x)))}{ST(\Gamma)(c),\ \exists x (\exists y (a(c, y) \wedge b(y, x)) \wedge p(x))}$$

**Fig. 1.** Hypersequent calculus HTC. σ is a 'substitution' map from constants to terms and a renaming of other function symbols and variables.

The first case above suggests that any system to which the standard translation lifts must be able to reason *underneath* ∃ *and* ∧, so that the inference indicated in blue is 'accessible' to the prover. The second case above suggests that the existential-conjunctive meta-structure necessitated by the first case should admit basic equivalences, in particular certain *prenexing*. This section is devoted to the incorporation of these ideas (and necessities) into a bona fide proof system.

#### **4.1 A System for Predicate Logic via Annotated Hypersequents**

An *annotated cedent*, or simply *cedent*, written S, S′, etc., is an expression {Γ}<sup>**x**</sup>, where Γ is a set of formulas and the *annotation* **x** is a set of variables. We sometimes construe annotations as lists rather than sets when convenient, e.g. when taking them as inputs to a function.

Each cedent may be intuitively read as a TCL formula, under the following interpretation: *fm*({Γ}<sup>x<sub>1</sub>,...,x<sub>n</sub></sup>) := ∃x<sub>1</sub> ... ∃x<sub>n</sub> ⋀Γ. When **x** = ∅ there are no existential quantifiers above, and when Γ = ∅ we simply identify ⋀Γ with ⊤. We also sometimes write simply A for the annotated cedent {A}<sup>∅</sup>.

A *hypersequent*, written **S**, **S**′, etc., is a set of annotated cedents. Each hypersequent may be intuitively read as the disjunction of its cedents. Namely we set: *fm*({Γ<sub>1</sub>}<sup>**x**<sub>1</sub></sup>, ..., {Γ<sub>n</sub>}<sup>**x**<sub>n</sub></sup>) := *fm*({Γ<sub>1</sub>}<sup>**x**<sub>1</sub></sup>) ∨ ... ∨ *fm*({Γ<sub>n</sub>}<sup>**x**<sub>n</sub></sup>).

**Definition 14 (System).** *The rules of* HTC *are given in Fig. 1. An* HTC preproof *is a (possibly infinite) derivation tree generated by the rules of* HTC*. A preproof is* regular *if it has only finitely many distinct sub-preproofs.*

Our hypersequential system is somewhat more refined than usual sequent systems for predicate logic. E.g., the usual ∃ rule is decomposed into ∃ and inst, whereas the usual ∧ rule is decomposed into ∧ and ∪. The rules for *TC* and *TC* are induced directly from their characterisations as fixed points in (1).

Note that the rules *TC* and ∀ introduce, bottom-up, the fresh function symbol f, which plays the role of the *Herbrand function* of the corresponding ∀ quantifier: just as ∀**x**∃xA(x) is equisatisfiable with ∀**x**A(f(**x**)), when f is fresh, by Skolemisation, by duality ∃**x**∀xA(x) is equivalid with ∃**x**A(f(**x**)), when f is fresh, by Herbrandisation. The usual ∀ rule of the sequent calculus corresponds to the case when **x** = ∅.

#### **4.2 Non-wellfounded Hypersequent Proofs**

Our notion of ancestry, as compared to traditional sequent systems, must account for the richer structure of hypersequents:

**Definition 15 (Ancestry).** *Fix an inference step* r*, as typeset in Fig. 1. A formula* C′ *in the premiss is an* immediate ancestor *of a formula* C *in the conclusion if they have the same colour; if* C, C′ ∈ Γ *then we further require* C′ = C*, and if* C, C′ *occur in* **S** *then they must occur in the same cedent. A cedent* S′ *in the premiss is an* immediate ancestor *of a cedent* S *in the conclusion if some formula in* S′ *is an immediate ancestor of some formula in* S*.*

Immediate ancestry on both formulas and cedents is a binary relation, inducing a directed graph whose paths form the basis of our correctness condition:

**Definition 16 ((Hyper)traces).** *A* hypertrace *is a maximal path in the graph of immediate ancestry on cedents. A* trace *is a maximal path in the graph of immediate ancestry on formulas.*

**Definition 17 (Progress and proofs).** *Fix a preproof* D*. An (infinite) trace* (F<sub>i</sub>)<sub>i∈ω</sub> *is* progressing *if there is* k *such that, for all* i > k*,* F<sub>i</sub> *has the form* $\overline{TC}$(A)(s<sub>i</sub>, t<sub>i</sub>) *and is infinitely often principal.*<sup>3</sup> *An (infinite) hypertrace* H *is* progressing *if every infinite trace within it is progressing. An (infinite) branch is* progressing *if it has a progressing hypertrace.* D *is a* proof *if every infinite branch is progressing. If, furthermore,* D *is regular, we call it a* cyclic proof*.*

*We write* HTC ⊢<sub>nwf</sub> **S** *(or* HTC ⊢<sub>cyc</sub> **S***) if there is a proof (or cyclic proof, respectively) in* HTC *of the hypersequent* **S***.*

In usual cyclic systems, checking that a regular preproof is progressing is decidable by straightforward reduction to the universality of nondeterministic ω-automata, with runs 'guessing' a progressing trace along an infinite branch. Our notion of progress exhibits an extra quantifier alternation: we must *guess* an infinite hypertrace in which *every* trace is progressing. Nonetheless, by appealing to determinisation or alternation, we can still decide our progressing condition:

**Proposition 18.** *Checking whether a* HTC *preproof is a proof is decidable by reduction to universality of* ω*-regular languages.*

<sup>3</sup> In fact, by a simple well-foundedness argument, it is equivalent to say that (F<sub>i</sub>)<sub>i<ω</sub> is progressing if it is infinitely often principal for a $\overline{TC}$-formula.

As we mentioned earlier, cyclic proofs of HTC are indeed at least as expressive as those of Cohen and Rowe's system, by a routine local simulation of rules:

**Theorem 19 (Simulating Cohen-Rowe).** *If* TC<sup>G</sup> ⊢<sub>cyc</sub> A *then* HTC ⊢<sub>cyc</sub> A*.*

#### **4.3 Some Examples**

*Example 20 (Fixed point identity).* The sequent {*TC*(a)(c, d)}<sup>∅</sup>, {$\overline{TC}$(ā)(c, d)}<sup>∅</sup> is finitely derivable using rule id on *TC*(a)(c, d) and the init rule. However, we can also cyclically reduce it to a simpler instance of id. Due to the granularity of the inference rules of HTC, we actually have some liberty in how we implement such a derivation. E.g., the HTC-proof below applies $\overline{TC}$ rules below *TC* ones, and delays branching until the 'end' of proof search, which is impossible in TC<sup>G</sup>. The only infinite branch, looping on •, is progressing by the blue hypertrace.


This is an example of the more general 'rule permutations' available in HTC, hinting at a more flexible proof theory (we discuss this further in Sect. 8).

*Example 21 (Transitivity).* *TC* can be proved transitive by way of a cyclic proof in TC<sup>G</sup> of the sequent $\overline{TC}$(ā)(c, d), $\overline{TC}$(ā)(d, e), *TC*(a)(c, e). As in the previous example we may mimic that proof line by line, but we give a slightly different one that cannot directly be interpreted as a TC<sup>G</sup> proof:


The only infinite branch (except for that from Example 20), looping on ◦, is progressing by the red hypertrace.

Finally, it is pertinent to revisit the 'counterexample' (4) that witnessed incompleteness of TC<sup>G</sup> for PDL<sup>+</sup>. The following result is, in fact, already implied by our later completeness result, Theorem 28, but we shall present it nonetheless:

**Proposition 22.** HTC ⊢<sub>cyc</sub> ST((aa ∪ aba)<sup>+</sup>)(c, d) ⊃ ST(a<sup>+</sup>((ba<sup>+</sup>)<sup>+</sup> ∪ a))(c, d)*.*

*Proof.* We give the required cyclic proof in Fig. 2, using the abbreviations α(c, d) := ST(aa ∪ aba)(c, d) and β(c, d) := ST((ba<sup>+</sup>)<sup>+</sup> ∪ a)(c, d). The only infinite branch (looping on •) has a progressing hypertrace, marked in blue. The hypersequents **R** = {α(c, d)}<sup>∅</sup>, {α(c, d), *TC*(α)(e, d)}<sup>∅</sup>, {*TC*(a)(c, y), β(y, d)}<sup>y</sup> and **R**′ = {α(c, d)}<sup>∅</sup>, {α(c, d)}<sup>∅</sup>, {*TC*(a)(c, y), β(y, d)}<sup>y</sup> have finitary proofs, while **P** = {aba(c, e)}<sup>∅</sup>, {*TC*(α)(e, d)}<sup>∅</sup>, {*TC*(a)(c, y), β(y, d)}<sup>y</sup> has a cyclic proof.

**Fig. 2.** Cyclic proof for sequent not cyclically provable by TCG.

## **5 Soundness of HTC**

This section is devoted to the proof of the first of our main results:

**Theorem 23 (Soundness).** *If* HTC ⊢<sub>nwf</sub> **S** *then* |= **S***.*

The argument is quite technical due to the alternating nature of our progress condition. In particular, the treatment of traces within hypertraces requires a more fine-grained argument than usual, bespoke to our hypersequential structure.

Throughout this section, we shall fix an HTC preproof D of a hypersequent **S**. For practical reasons we shall assume that D is substitution-free (at the cost of regularity) and that each quantifier in **S** binds a distinct variable.<sup>4</sup> We further assume some structure M<sup>×</sup> and an interpretation ρ<sub>0</sub> such that ρ<sub>0</sub> ⊭ **S** (within M<sup>×</sup>). Since each rule is locally sound, by contraposition we can continually choose 'false premisses' to construct an infinite 'false branch':

**Lemma 24 (Countermodel branch).** *There is a branch* B<sup>×</sup> = (**S**<sub>i</sub>)<sub>i<ω</sub> *of* D *and an interpretation* ρ<sup>×</sup> *such that, with respect to* M<sup>×</sup>*:*

<sup>4</sup> Note that this convention means we can simply take <sup>y</sup> <sup>=</sup> <sup>x</sup> in the <sup>∃</sup> rule in Fig. 1.


Unpacking this a little, our interpretation ρ<sup>×</sup> is actually defined as the limit of a chain of 'partial' interpretations (ρ<sub>i</sub>)<sub>i<ω</sub>, with each ρ<sub>i</sub> ⊭ **S**<sub>i</sub> (within M<sup>×</sup>). Note in particular that, by 2, whenever some $\overline{TC}$-formula is principal, we choose ρ<sub>i+1</sub> to always assign to it a falsifying path of minimal length (if one exists at all), with respect to the assignment to variables in its annotation.<sup>5</sup> It is crucial at this point that our definition of ρ<sup>×</sup> is parametrised by such assignments.

Let us now fix B<sup>×</sup> and ρ<sup>×</sup> as provided by the lemma above. Moreover, let us henceforth assume that D is a proof, i.e. it is progressing, and fix a progressing hypertrace H = ({Γ<sub>i</sub>}<sup>**x**<sub>i</sub></sup>)<sub>i<ω</sub> along B<sup>×</sup>. In order to carry out an infinite descent argument, we will need to define a particular trace along this hypertrace that 'preserves' falsity, bottom-up. This is delicate since the truth values of formulas in a trace depend on the assignment of elements to variables in the annotations. A particular issue here is the instantiation rule inst, which requires us to 'revise' whatever assignment of y we may have defined until that point. Thankfully, our earlier convention on substitution-freeness and uniqueness of bound variables in D facilitates the convergence of this process to a canonical such assignment:

**Definition 25 (Assignment).** *We define* δ<sub>H</sub> : ⋃<sub>i<ω</sub> **x**<sub>i</sub> → |M<sup>×</sup>| *by* δ<sub>H</sub>(x) := ρ(t) *if* x *is instantiated by* t *in* H*; otherwise* δ<sub>H</sub>(x) *is some arbitrary* d ∈ |M<sup>×</sup>|*.*

Note that δ<sub>H</sub> is indeed well defined, thanks to the convention that each quantifier in **S** binds a distinct variable: in particular, each variable x is instantiated at most once along a hypertrace. Henceforth we shall simply write ρ, δ<sub>H</sub> |= A(**x**) instead of ρ |= A(δ<sub>H</sub>(**x**)). Working with such an assignment ensures that false formulas along H always have a false immediate ancestor:

**Lemma 26 (Falsity through** H**).** *If* ρ<sup>×</sup>, δ<sub>H</sub> ⊭ F *for some* F ∈ Γ<sub>i</sub>*, then* F *has an immediate ancestor* F′ ∈ Γ<sub>i+1</sub> *with* ρ<sup>×</sup>, δ<sub>H</sub> ⊭ F′*.*

In particular, regarding the inst rule of Fig. 1, note that if F ∈ Γ(y) then we can choose F′ = F[t/y], which, by definition of δ<sub>H</sub>, has the same truth value. By repeatedly applying this Lemma we obtain:

**Proposition 27 (False trace).** *There exists an infinite trace* τ<sup>×</sup> = (F<sub>i</sub>)<sub>i<ω</sub> *through* H *such that, for all* i*, it holds that* M<sup>×</sup>, ρ<sup>×</sup>, δ<sub>H</sub> ⊭ F<sub>i</sub>*.*

We are now ready to prove our main soundness result.

*Proof (of Theorem 23, sketch)*. Fix the infinite trace τ<sup>×</sup> = (F<sub>i</sub>)<sub>i<ω</sub> through H obtained by Proposition 27. Since τ<sup>×</sup> is infinite, by definition of HTC proofs, it

<sup>5</sup> To be clear, we here choose an arbitrary such minimal 'Ā-path'.

needs to be progressing, i.e., it is infinitely often *TC*-principal and there is some k ∈ ℕ s.t. for i > k we have that F<sub>i</sub> = *TC*(A)(s<sub>i</sub>, t<sub>i</sub>) for some terms s<sub>i</sub>, t<sub>i</sub>.

To each F<sub>i</sub>, for i > k, we associate the natural number n<sub>i</sub> measuring the 'Ā-distance between s<sub>i</sub> and t<sub>i</sub>'. Formally, n<sub>i</sub> ∈ ℕ is least such that there are d<sub>0</sub>,...,d<sub>n<sub>i</sub></sub> ∈ |M<sup>×</sup>| with ρ<sup>×</sup>(s<sub>i</sub>) = d<sub>0</sub>, ρ<sup>×</sup>(t<sub>i</sub>) = d<sub>n<sub>i</sub></sub> and, for all j < n<sub>i</sub>, ρ<sup>×</sup>, δ<sub>H</sub> |= Ā(d<sub>j</sub>, d<sub>j+1</sub>). Our aim is to show that (n<sub>i</sub>)<sub>i>k</sub> has no minimal element, contradicting the well-foundedness of ℕ. For this, we establish the following two local properties:

**Fig. 3.** Rules of LPD<sup>+</sup>.


So (n<sub>i</sub>)<sub>i>k</sub> is monotone decreasing, by 1, but cannot converge, by 2 and the definition of progressing trace. Thus (n<sub>i</sub>)<sub>i>k</sub> has no minimal element, yielding the required contradiction.

# **6 HTC is Complete for PDL<sup>+</sup>, Over Standard Translation**

In this section we give our next main result:

**Theorem 28 (Completeness for** PDL<sup>+</sup>**).** *For a* PDL<sup>+</sup> *formula* A*, if* |= A *then* HTC ⊢<sub>*cyc*</sub> ST(A)(c)*.*

The proof is by a direct simulation of a cut-free cyclic system for PDL<sup>+</sup> that is complete. We shall briefly sketch this system below.

#### **6.1 Circular System for PDL<sup>+</sup>**

The system LPD<sup>+</sup>, given in Fig. 3, is the natural extension of the usual sequent calculus for basic multimodal logic K by rules for programs. In Fig. 3, ⟨a⟩Γ is shorthand for {⟨a⟩B : B ∈ Γ}. (Regular) preproofs for this system are defined just like for HTC or TC<sup>G</sup>. The notion of 'immediate ancestor' is induced by the indicated colouring: a formula C′ in a premiss is an immediate ancestor of a formula C in the conclusion if they have the same colour; if C, C′ ∈ Γ then we furthermore require C = C′.

**Definition 29 (Non-wellfounded proofs).** *Fix a preproof* D *of a sequent* Γ*. A* thread *is a maximal path in its graph of immediate ancestry. We say a thread is* progressing *if it has a smallest infinitely often principal formula of the form* [α<sup>+</sup>]A*.* D *is a* proof *if every infinite branch has a progressing thread. If* D *is regular, we call it a* cyclic proof *and we may write* LPD<sup>+</sup> ⊢<sub>*cyc*</sub> Γ*.*

Soundness of cyclic-LPD<sup>+</sup> is established by a standard infinite descent argument, but it is also implied by the soundness of cyclic-HTC (Theorem 23) together with the simulation we are about to give (Theorem 28), though this is somewhat overkill. Completeness may be established by the game-theoretic approach of Niwiński and Walukiewicz [23], as done by Lange [20] for PDL (with identity), or by the purely proof-theoretic techniques of Studer [25]. Either way, both results follow from a standard embedding of PDL<sup>+</sup> into the μ-calculus and its known completeness results [23,25], by way of a standard 'proof reflection' argument: μ-calculus proofs of the embedding are 'just' step-wise embeddings of LPD<sup>+</sup> proofs:

**Theorem 30 (Soundness and completeness,** [20]**).** *Let* A *be a* PDL<sup>+</sup> *formula. Then* |= A *iff* LPD<sup>+</sup> ⊢<sub>*cyc*</sub> A*.*

#### **6.2 A 'Local' Simulation of LPD<sup>+</sup> by HTC**

In this subsection we show that LPD<sup>+</sup>-preproofs can be stepwise transformed into HTC-proofs, with respect to the standard translation. In order to produce this local simulation, we need a more refined version of the standard translation that incorporates the structural elements of hypersequents.

Fix a PDL<sup>+</sup> formula A = [α<sub>1</sub>] ... [α<sub>n</sub>]⟨β<sub>1</sub>⟩...⟨β<sub>m</sub>⟩B, for n, m ≥ 0. The *hypersequent translation* of A, written HT(A)(c), is defined as:

$$\begin{aligned} &\{\overline{\mathfrak{ST}(\alpha_1)(c,d_1)}\}^{\emptyset}, \{\overline{\mathfrak{ST}(\alpha_2)(d_1,d_2)}\}^{\emptyset}, \dots, \{\overline{\mathfrak{ST}(\alpha_n)(d_{n-1},d_n)}\}^{\emptyset},\\ &\{\mathfrak{ST}(\beta_1)(d_n,y_1), \mathfrak{ST}(\beta_2)(y_1,y_2), \dots, \mathfrak{ST}(\beta_m)(y_{m-1},y_m), \mathfrak{ST}(B)(y_m)\}^{y_1,\dots,y_m} \end{aligned}$$

For Γ = A<sub>1</sub>,...,A<sub>k</sub>, we write HT(Γ)(c) := HT(A<sub>1</sub>)(c),..., HT(A<sub>k</sub>)(c).
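For concreteness, in the simplest compound case n = m = 1, i.e. A = [α]⟨β⟩B, the definition above yields the following worked instance:

$$\mathsf{HT}([\alpha]\langle\beta\rangle B)(c) \;=\; \{\overline{\mathfrak{ST}(\alpha)(c,d_1)}\}^{\emptyset},\ \{\mathfrak{ST}(\beta)(d_1,y_1),\ \mathfrak{ST}(B)(y_1)\}^{y_1}$$

with one ∅-annotated cedent for the box and one y<sub>1</sub>-annotated cedent for the diamond together with the body B.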

**Definition 31 (**HT**-translation).** *Let* D *be a* PDL<sup>+</sup> *preproof. We shall define an* HTC *preproof* HT(D)(c) *of the hypersequent* HT(A)(c) *by a local translation of inference steps. We give only a few of the important cases here, but a full definition can be found in [13].*

*– A modal step* k<sub>a</sub>*, as typeset in Fig. 3, concluding* ⟨a⟩B<sub>1</sub>,...,⟨a⟩B<sub>k</sub>, [a]A *from the premiss* B<sub>1</sub>,...,B<sub>k</sub>, A*, is translated to a derivation of* HT(⟨a⟩B<sub>1</sub>)(c),..., HT(⟨a⟩B<sub>k</sub>)(c), HT([a]A)(c) *from* HT(B<sub>1</sub>)(d),..., HT(B<sub>k</sub>)(d), HT(A)(d) *using* wk *and logical steps (see [13] for the full derivation),*

*where (omitted) left-premisses of* ∪ *steps are simply proved by* wk, id, init*. In this and the following cases, we use the notation* CT(A)(c) *and* **x**<sup>A</sup> *for the appropriate sets of formulas and variables forced by the definition of* HT *(again, see [13] for further details).*

*– A* ∪<sup>i</sup> *step (for* i = 0, 1*), as typeset in Fig. 3, is translated to:*


*– A* ; *step, as typeset in Fig. 3, is translated to:*


*– A* [+] *step, as typeset in Fig. 3, is translated to:*

$$\begin{array}{c}
\dfrac{\mathcal{E}'}{\mathsf{HT}(\Gamma)(c),\ \mathsf{HT}([\alpha][\alpha^{+}]A)(c)}\\
\hline
\mathsf{HT}(\Gamma)(c),\ \{\overline{\mathsf{ST}(\alpha)(c,f)}\}^{\emptyset},\ \{\overline{TC(\mathsf{ST}(\alpha))(f,d)}\}^{\emptyset},\ \mathsf{HT}(A)(d)\\
\hline
\mathsf{HT}(\Gamma)(c),\ \{\overline{\mathsf{ST}(\alpha)(c,d)},\ \overline{\mathsf{ST}(\alpha)(c,f)}\}^{\emptyset},\ \{\overline{\mathsf{ST}(\alpha)(c,d)},\ \overline{TC(\mathsf{ST}(\alpha))(f,d)}\}^{\emptyset},\ \mathsf{HT}(A)(d)\\
\hline
\mathsf{HT}(\Gamma)(c),\ \{\overline{TC(\mathsf{ST}(\alpha))(c,d)}\}^{\emptyset},\ \mathsf{HT}(A)(d)\\[2pt]
=\ \mathsf{HT}(\Gamma)(c),\ \mathsf{HT}([\alpha^{+}]A)(c)
\end{array}$$

*where* E *and* E′ *derive* HT(Γ)(c) *and* HT([α]A)(c)*, resp., using* wk*-steps.*

Note that, formally speaking, the well-definedness of HT(D)(c) in the definition above is guaranteed by coinduction: each rule of D is translated into a (nonempty) derivation.

*Remark 32 (Deeper inference).* Observe that HTC can also simulate 'deeper' program rules than are available in LPD<sup>+</sup>. E.g. a rule concluding Γ, ⟨α⟩⟨β<sub>0</sub> ∪ β<sub>1</sub>⟩A from the premiss Γ, ⟨α⟩⟨β<sub>i</sub>⟩A may be simulated too (similarly for [·]). E.g. ⟨a<sup>+</sup>⟩⟨b⟩p ⊃ ⟨a<sup>+</sup>⟩⟨b ∪ c⟩p admits a *finite* proof in HTC (under ST), rather than a necessarily infinite (but cyclic) one in LPD<sup>+</sup>.

#### **6.3 Justifying Regularity and Progress**

**Proposition 33.** *If* D *is regular, then so is* HT(D)(c)*.*

*Proof.* Notice that each rule in D is translated to a finite derivation in HT(D)(c). Thus, if D has only finitely many distinct subproofs, then also HT(D)(c) has only finitely many distinct subproofs.

**Proposition 34.** *If* D *is progressing, then so is* HT(D)(c)*.*

*Proof (sketch).* We need to show that every infinite branch of HT(D)(c) has a progressing hypertrace. Since the HT translation is defined stepwise on the individual steps of D, we can associate to each infinite branch B of HT(D)(c) a unique infinite branch B′ of D. Since D is progressing, let τ = (F<sub>i</sub>)<sub>i<ω</sub> be a progressing thread along B′. By inspecting the rules of LPD<sup>+</sup> (and by definition of progressing thread), for some k ∈ ℕ, each F<sub>i</sub> for i > k has the form [α<sub>i,1</sub>] ··· [α<sub>i,n<sub>i</sub></sub>][α<sup>+</sup>]A, for some n<sub>i</sub> ≥ 0. So, for i > k, HT(F<sub>i</sub>)(d<sub>i</sub>) has the form:

$$\{\overline{\mathsf{ST}(\alpha_{i,1})(c,d_{i,1})}\}^{\emptyset}, \dots, \{\overline{\mathsf{ST}(\alpha_{i,n_i})(d_{i,n_i-1},d_{i,n_i})}\}^{\emptyset}, \{\overline{TC(\mathsf{ST}(\alpha))(d_{i,n_i},d_i)}\}^{\emptyset}, \mathsf{HT}(A)(d_i)$$

By inspection of the HT-translation (Definition 31), whenever F<sub>i+1</sub> is an immediate ancestor of F<sub>i</sub> in B′, there is a path from the cedent {*TC*(ST(α))(d<sub>i+1,n<sub>i+1</sub></sub>, d<sub>i+1</sub>)}<sup>∅</sup> to the cedent {*TC*(ST(α))(d<sub>i,n<sub>i</sub></sub>, d<sub>i</sub>)}<sup>∅</sup> in the graph of immediate ancestry along B. Thus, since τ = (F<sub>i</sub>)<sub>i<ω</sub> is a trace along B′, we have an (infinite) hypertrace of the form H<sub>τ</sub> := ({Δ<sub>i</sub>, *TC*(ST(α))(d<sub>i,n<sub>i</sub></sub>, d<sub>i</sub>)}<sup>∅</sup>)<sub>i>k</sub> along B. By construction Δ<sub>i</sub> = ∅ for infinitely many i > k, and so H<sub>τ</sub> has just one infinite trace. Moreover, by inspection of the [+] step in Definition 31, this trace progresses in B every time τ does in B′, and so progresses infinitely often. Thus, H<sub>τ</sub> is a progressing hypertrace. Since the choice of the branch B of HT(D)(c) was arbitrary, we are done.

#### **6.4 Putting it all Together**

We can now finally conclude our main simulation theorem:

*Proof (of Theorem 28, sketch).* Let A be a PDL<sup>+</sup> formula s.t. |= A. By the completeness result for LPD<sup>+</sup>, Theorem 30, we have that LPD<sup>+</sup> ⊢<sub>*cyc*</sub> A, say by a cyclic proof D. From here we construct the HTC preproof HT(D)(c) which, by Propositions 33 and 34, is in fact a cyclic proof of HT(A)(c). Finally, we apply some basic ∨, ∧, ∃, ∀ steps to obtain a cyclic HTC proof of ST(A)(c).

## **7 Extension by Equality and Simulating Full PDL**

We now briefly explain how our main results extend to the 'reflexive' version of TCL. The language of HTC<sup>=</sup> allows further atomic formulas of the form s = t and s ≠ t. The calculus HTC<sup>=</sup> extends HTC by the rules:

$$= \frac{\mathbf{S}, \{\Gamma\}^\mathbf{x}}{\mathbf{S}, \{t = t, \Gamma\}^\mathbf{x}} \qquad \neq \frac{\mathbf{S}, \{\Gamma(s), \Delta(s)\}^\mathbf{x}}{\mathbf{S}, \{\Gamma(s), s \neq t\}^\mathbf{x}, \{\Delta(t)\}^\mathbf{x}}$$

The notion of immediate ancestry is colour-coded as in Definition 15, and the resulting notions of (pre)proof, (hyper)trace and progress are as in Definition 17. The simulation of Cohen and Rowe's system TC<sup>G</sup> extends to their reflexive system RTC<sup>G</sup> by defining *RTC*(λx, y.A)(s, t) := *TC*(λx, y.(x = y ∨ A))(s, t). Note that, while it is semantically correct to set *RTC*(A)(s, t) to be s = t ∨ *TC*(A)(s, t), this encoding does not lift to the Cohen-Rowe rules for *RTC*. Understanding that structures interpret = as true equality, a modular adaptation of the soundness argument for HTC, cf. Sect. 5, yields:

**Theorem 35 (Soundness of** HTC<sup>=</sup>**).** *If* HTC<sup>=</sup> ⊢<sub>*nwf*</sub> **S** *then* |= **S***.*

Turning to the modal setting, PDL may be defined as the extension of PDL<sup>+</sup> by including a program A? for each formula A. Semantically, we have (A?)<sup>M</sup> = {(v, v) : <sup>M</sup>, v <sup>|</sup><sup>=</sup> <sup>A</sup>}. From here we may define <sup>ε</sup> := ? and <sup>α</sup><sup>∗</sup> := (ε∪α)<sup>+</sup>; again, while it is semantically correct to set <sup>α</sup><sup>∗</sup> <sup>=</sup> <sup>ε</sup> <sup>∪</sup> <sup>α</sup><sup>+</sup>, this encoding does not lift to the standard sequent rules for <sup>∗</sup>. The system LPD is obtained from LPD<sup>+</sup> by including the rules:

$$\langle?\rangle\, \frac{\Gamma, A \qquad \Gamma, B}{\Gamma, \langle A? \rangle B} \qquad\qquad [?]\, \frac{\Gamma, \bar{A}, B}{\Gamma, [A?]B}$$

Again, the notion of immediate ancestry is colour-coded as for LPD<sup>+</sup>; the resulting notions of (pre)proof, thread and progress are as in Definition 29. Just like for LPD<sup>+</sup>, a standard encoding of LPD into the μ-calculus yields its soundness and completeness, thanks to known sequent systems for the latter, cf. [23,25]; these results have also been established independently [20]. Again, a modular adaptation of the simulation of LPD<sup>+</sup> by HTC, cf. Sect. 6, yields:

**Theorem 36 (Completeness for** PDL**).** *Let* A *be a* PDL *formula. If* |= A *then* HTC<sup>=</sup> ⊢<sub>*cyc*</sub> ST(A)(c)*.*

## **8 Conclusions**

In this work we proposed a novel cyclic system HTC for Transitive Closure Logic (TCL) based on a form of hypersequents. We showed a soundness theorem for standard semantics, requiring an argument bespoke to our hypersequents. Our system is cut-free, rendering it suitable for automated reasoning via proof search. We showcased its expressivity by demonstrating completeness for PDL, over the standard translation. In particular, we demonstrated formally that such expressivity is not available in the previously proposed system TC<sup>G</sup> of Cohen and Rowe (Theorem 12). Our system HTC locally simulates TC<sup>G</sup> too (Theorem 19).

As far as we know, HTC is the first cyclic system employing a form of *deep inference* resembling *alternation* in automata theory, e.g. w.r.t. proof checking, cf. Proposition 18. It would be interesting to investigate the structural proof theory that emerges from our notion of hypersequent. As hinted at in Examples 20 and 21, our hypersequential system exhibits more liberal rule permutations than usual sequents, so we expect its *focussing* and *cut-elimination* behaviours to be similarly richer, cf. [21,22]. Note however that such investigations are rather pertinent for pure predicate logic (without *TC*): focussing and cut-elimination arguments do not typically preserve regularity of non-wellfounded proofs, cf. [2].

Finally, our work bridges the cyclic proof theories of (identity-free) PDL and (reflexive) TCL. With increasing interest in both modal and predicate cyclic proof theory, it would be interesting to further develop such correspondences.

**Acknowledgements.** The authors would like to thank Sonia Marin, Jan Rooduijn and Reuben Rowe for helpful discussions on matters surrounding this work.

# **References**



# **Equational Unification and Matching, and Symbolic Reachability Analysis in Maude 3.2 (System Description)**

Francisco Durán<sup>1</sup>, Steven Eker<sup>2</sup>, Santiago Escobar<sup>3(B)</sup>, Narciso Martí-Oliet<sup>4</sup>, José Meseguer<sup>5</sup>, Rubén Rubio<sup>4</sup>, and Carolyn Talcott<sup>2</sup>

<sup>1</sup> Universidad de Málaga, Málaga, Spain — duran@lcc.uma.es
<sup>2</sup> SRI International, Menlo Park, CA, USA — eker@csl.sri.com, clt@cs.stanford.edu
<sup>3</sup> VRAIN, Universitat Politècnica de València, Valencia, Spain — sescobar@upv.es
<sup>4</sup> Universidad Complutense de Madrid, Madrid, Spain — {narciso,rubenrub}@ucm.es
<sup>5</sup> University of Illinois at Urbana-Champaign, Urbana, IL, USA — meseguer@illinois.edu

**Abstract.** Equational unification and matching are fundamental mechanisms in many automated deduction applications. Supporting them efficiently for as wide a class of equational theories as possible, and in a typed manner supporting type hierarchies, benefits many applications; but this is both challenging and nontrivial. We present Maude 3.2's efficient support of these features, as well as of symbolic reachability analysis of infinite-state concurrent systems based on them.

# **1 Introduction**

Unification is a key mechanism in resolution [41] and paramodulation-based [36] theorem proving. Since Plotkin's work [40] on *equational unification*, i.e.,

Durán was supported by the grant UMA18-FEDERJA-180 funded by J. Andalucía/FEDER and the grant PGC2018-094905-B-I00 funded by MCIN/AEI/10.13039/501100011033 and ERDF A way of making Europe. Escobar was supported by the EC H2020-EU grant 952215, by the grant RTI2018-094403-B-C32 funded by MCIN/AEI/10.13039/501100011033 and ERDF A way of making Europe, by the grant PROMETEO/2019/098 funded by Generalitat Valenciana, and by the grant PCI2020-120708-2 funded by MICIN/AEI/10.13039/501100011033 and by the European Union NextGenerationEU/PRTR. Martí-Oliet and Rubio were supported by the grant PID2019-108528RB-C22 funded by MCIN/AEI/10.13039/501100011033 and ERDF A way of making Europe. Talcott was partially supported by the U.S. Office of Naval Research under award numbers N00014-15-1-2202 and N00014-20-1-2644, and NRL grant N0017317-1-G002.

E-*unification* modulo an equational theory E, it is widely used for increased effectiveness. Since Walther's work [47] it has been well understood that *typed* E-unification, exploiting types and subtype hierarchies, can drastically reduce a prover's search space. Many other automated deduction applications use typed E-unification as a key mechanism, including, inter alia: (i) constraint logic programming, e.g., [12,23]; (ii) narrowing-based infinite-state reachability analysis and model checking, e.g., [6,35]; (iii) cryptographic protocol analysis modulo algebraic properties, e.g., [8,19,28]; (iv) partial evaluation, e.g., [4,5]; and (v) SMT solving, e.g., [32,48]. The special case of typed E-*matching* is also a key component in all the above areas as well as in: (vi) E-generalization (also called anti-unification), e.g., [1,2]; and (vii) E-homeomorphic embedding, e.g., [3].

Maximizing the scope and effectiveness of typed E-unification and E-matching means efficiently supporting as wide a class of theories E as possible. Such efficiency crucially depends both on efficient algorithms (and their combinations) and —since the number of E-unifiers may be large— on computing complete *minimal* sets of solutions to reduce the search space. The recent Maude 3.2 release<sup>1</sup> provides this kind of efficient support for typed E-unification and E-matching in three increasingly more general classes of theories E:

(1) theories (Σ, B), where the axioms B are any combination of associativity (A), commutativity (C), and identity (U) axioms for binary symbols; (2) FVP theories E ∪ B, with axioms B as in (1); and (3) theories E ∪ B with the oriented equations E confluent, terminating and strictly coherent modulo B, with axioms B as in (1).
For classes (1) and (2) the set of B- (resp. E ∪ B-) unifiers is always *complete, minimal and finite*, except for the *AwoC* case, when B contains an A but not C axiom for some binary symbol f.<sup>2</sup> The typing is order-sorted [22,29] and thus contains many-sorted and unsorted B- (resp. E ∪ B-) unification as special cases. For class (3), Maude enumerates a possibly infinite complete set of E ∪ B-unifiers, with the same AwoC exception on B. We discuss new features for classes (1)–(2), and a new narrowing modulo E ∪ B-based *symbolic reachability analysis* feature for infinite-state systems specified in Maude as rewrite theories (Σ, E ∪ B, R) with equations E ∪ B in class (2) and concurrent transition rules R. In Sect. 5 we discuss various applications that can benefit from these new features.

In comparison with previous Maude tool papers reporting on new features —the last one was [16]— the new features reported here include: (i) computing minimal complete sets of most general B- (resp. E ∪ B-) unifiers for classes (1) and (2) except for the AwoC case; (ii) a new E ∪ B-matching algorithm for class (2); and (iii) a new symbolic reachability analysis for concurrent systems

<sup>1</sup> Publicly available at http://maude.cs.illinois.edu.

<sup>2</sup> In the AwoC case, Maude's algorithms are optimized to favor many commonly occurring cases where typed *A*-unification is finitary, and provides a finite set of solutions and an incompleteness warning outside such cases (see [18]).

based on narrowing with transition rules modulo equations E ∪ B in class (2) enjoying powerful state-space reduction capabilities based on the minimality and completeness feature (i) and on "folding" less general symbolic states into more general ones through subsumption. Section 3.1 shows the importance of the new E ∪ B-matching algorithm for efficient computation of minimal E ∪ B-unifiers.

**Notation, Strict** B**-Coherence, and FVP.** For notation involving either term positions, p ∈ *pos*(t), t|<sub>p</sub>, t[t′]<sub>p</sub>, or substitutions, tθ, θμ, see [14]. Equations (u = v) ∈ E oriented as rules (u → v) ∈ E⃗ are *strictly coherent* modulo axioms B iff (t =<sub>B</sub> t′ ∧ t →<sub>E⃗,B</sub> w) implies ∃w′ (t′ →<sub>E⃗,B</sub> w′ ∧ w =<sub>B</sub> w′), where t →<sub>E⃗,B</sub> w iff ∃(u → v) ∈ E⃗, ∃θ, ∃p ∈ *pos*(t) (uθ =<sub>B</sub> t|<sub>p</sub> ∧ w = t[vθ]<sub>p</sub>). For (Σ, E ∪ B) an equational theory with E⃗ confluent, terminating and strictly coherent modulo B: (1) an E⃗,B-t-*variant* is a pair (v, θ) s.t. v = (tθ)!<sub>E⃗,B</sub> and θ = θ!<sub>E⃗,B</sub>, where u!<sub>E⃗,B</sub> (resp. θ!<sub>E⃗,B</sub>) denotes the E⃗,B-normal form of u (resp. θ); (2) for E⃗,B-t-variants (v, θ), (u, μ), the *more general* relation (v, θ) ⊑<sub>B</sub> (u, μ) holds iff ∃γ (u =<sub>B</sub> vγ ∧ θγ =<sub>B</sub> μ); (3) (Σ, E ∪ B) is *FVP* [13,21] iff any Σ-term t has a *finite* set of most general E⃗,B-t-variants. Footnote 5 explains how FVP can be checked.

### **2 Complete and Minimal Order-Sorted** *B***-Unifiers**

Throughout the paper we use the following equational theory E ∪ B of the Booleans as a running example (with self-explanatory, user-definable syntax<sup>3</sup>):

```
fmod BOOL-FVP is protecting TRUTH-VALUE .
   op _and_ : Bool Bool -> Bool [assoc comm] .
   op _xor_ : Bool Bool -> Bool [assoc comm] .
   op not_ : Bool -> Bool .
   op _or_ : Bool Bool -> Bool .
   op _<=>_ : Bool Bool -> Bool .
   vars X Y Z W : Bool .
   eq X and true = X [variant] .
   eq X and false = false [variant] .
   eq X and X = X [variant] .
   eq X and X and Y = X and Y [variant] . *** AC extension
   eq X xor false = X [variant] .
   eq X xor X = false [variant] .
   eq X xor X xor Y = Y [variant] . *** AC extension
   eq not X = X xor true [variant] .
   eq X or Y = (X and Y) xor X xor Y [variant] .
   eq X <=> Y = true xor X xor Y [variant] .
endfm
```
<sup>3</sup> This module imports Maude's TRUTH-VALUE module and the command "set include BOOL off ." must be typed before the module to avoid default importation of BOOL.

The axioms B are the associativity-commutativity (AC) axioms for xor and and (specified with the assoc comm attributes). The equations E are terminating and confluent modulo B [42]. To achieve strict B-*coherence* [30], the needed AC-extensions [39] are added —for example, the AC-extension of X xor X = false is X xor X xor Y = Y. The equations E for xor and and define the theory of *Boolean rings*, *except for the missing*<sup>4</sup> *distributivity equation* X and (Y xor Z) = (X and Y) xor (X and Z). The remaining equations in E define or, not and <=> as definitional extensions. The variant attribute declares that the equation will be used for folding variant narrowing [21]. The theory is FVP,<sup>5</sup> in class (2). In this section we consider B-unification (for B = AC) using this example; E ∪ B-unification for the same example is discussed in Sect. 3.
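The claim in footnote 4, that the definitional extensions of or, not and <=> recover the standard truth tables (distributivity included), can be checked by brute force over the ground Booleans. The Python sketch below is an independent illustration, not Maude syntax; the function names bnot, bor and biff are ours.

```python
from itertools import product

# Ground check that BOOL-FVP's definitional extensions agree with the
# standard truth tables.  The definitions mirror the equations
#   not X = X xor true,  X or Y = (X and Y) xor X xor Y,
#   X <=> Y = true xor X xor Y,
# with Python's `and`/`^` playing the roles of `and`/`xor` on Booleans.
def bnot(x):
    return x ^ True

def bor(x, y):
    return (x and y) ^ x ^ y

def biff(x, y):
    return True ^ x ^ y

for x, y, z in product([False, True], repeat=3):
    assert bnot(x) == (not x)
    assert bor(x, y) == (x or y)
    assert biff(x, y) == (x == y)
    # the missing distributivity equation still holds in the initial algebra:
    assert (x and (y ^ z)) == ((x and y) ^ (x and z))
print("all connectives match the standard truth tables")
```

This is exactly the "check each connective's truth table" suggestion of footnote 7, carried out outside Maude.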

For B any combination of associativity and/or commutativity and/or identity axioms, Maude's unify command computes a complete finite set of most general B-unifiers, except for the AwoC case. The new irredundant unify command always returns<sup>6</sup> a *finite, complete and minimal* set of B-unifiers, except for the AwoC case. The output of unify for the equation below can be found in [10, §13].

```
Maude> irredundant unify X and not Y and not Z =? W and Y and not X .
Decision time: 0ms cpu (0ms real)
```


# **3** *E ∪ B***-Unification and Matching for FVP Theories**

It is a general result from [21] that if E ∪ B is FVP and B-unification is finitary, then E ∪ B-unification is *finitary* and a complete finite set of E ∪ B-unifiers can be computed by *folding variant narrowing* [21]. Furthermore, assuming that T<sub>Σ/E,s</sub> is non-empty for each sort s, a finitary E ∪ B-unification algorithm automatically provides a decision procedure for *satisfiability* of any *positive* (the ∧,∨-fragment) quantifier-free formula ϕ in the initial algebra T<sub>Σ/E</sub>, since ϕ can be put in DNF, and a conjunction of equalities Γ is satisfiable in T<sub>Σ/E</sub> iff Γ is E ∪ B-unifiable.

Since for our running example BOOL-FVP the equations E ∪ B are FVP and B-unification (in this case B = AC) is finitary, all this has useful consequences for

<sup>4</sup> By missing distributivity, this theory is *weaker* than the theory of Boolean rings. Nevertheless, its *initial algebra* T<sub>Σ/E∪B</sub> is exactly the Booleans on {true,false} with the standard truth tables for all connectives. Thus, all equations provable in Boolean algebra hold in T<sub>Σ/E∪B</sub>, including the missing distributivity equation.

<sup>5</sup> This can be easily checked in Maude by checking the finiteness of the variants of f(X), resp. f(X, Y), for each unary, resp. binary, symbol f in BOOL-FVP using the get variants command; see [9] for a theoretical justification of this check.

<sup>6</sup> Fresh variables follow the form #1:Bool.

BOOL-FVP. Indeed, T<sub>Σ/E∪B</sub> is exactly the Booleans<sup>7</sup> on {true,false} with the well-known truth tables for and, xor, not, or and <=>. This means that E ∪ B-unification provides a Boolean *satisfiability decision procedure* for a Boolean expression u on such symbols, namely, u is Boolean satisfiable iff the equation u = true is E ∪ B-unifiable. Furthermore, a ground assignment ρ to the variables of u is a satisfying assignment for u iff there exists an E ∪ B-unifier α of u = true and a ground substitution δ such that ρ = αδ. For the same reasons, u is a Boolean *tautology* iff the equation u = false has no E ∪ B-unifiers.

A complete, finite set of E ∪ B-unifiers can be computed with Maude's variant unify command whenever E ∪ B is FVP, except for the AwoC case. Instead, the new<sup>8</sup> filtered variant unify command computes a *finite, complete and minimal* set of E ∪ B-unifiers, which can be considerably smaller than that computed by variant unify. For our BOOL-FVP example, filtered variant unify gives us a Boolean satisfiability decision procedure plus a symbolic specification of satisfying assignments. Such a procedure is not practical: it cannot compete with standard SAT solvers; but that was never our purpose. Rather, our purpose here is to illustrate with simple examples how E ∪ B-unification works for the *infinite* class of *user-definable* FVP theories E ∪ B, of which BOOL-FVP is just one simple example; dozens of other examples can be found in [32].

The difference between the variant unify and the new filtered variant unify commands is illustrated with the following example; its unfiltered output can be found in [10, §14]. Note that the single E ∪ B-unifier gives us a compact symbolic description of this Boolean expression's satisfying assignments.

```
Maude> filtered variant unify (X or Y) <=> Z =? true .
rewrites: 3224 in 12765ms cpu (14776ms real) (252 rewrites/second)
Unifier 1
X --> #1:Bool xor #2:Bool
Y --> #1:Bool
Z --> #2:Bool xor (#1:Bool and (#1:Bool xor #2:Bool))
No more unifiers.
Advisory: Filtering was complete.
```
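Since the two fresh variables range over just true and false, the unifier above can be sanity-checked by brute force. The following Python sketch (our own independent check, not Maude output) instantiates #1:Bool and #2:Bool in all four ways and confirms that each resulting assignment for X, Y, Z satisfies (X or Y) <=> Z = true:

```python
from itertools import product

def b_or(x, y):   # X or Y = (X and Y) xor X xor Y
    return (x and y) ^ x ^ y

def b_iff(x, y):  # X <=> Y = true xor X xor Y
    return True ^ x ^ y

for b1, b2 in product([False, True], repeat=2):
    # instantiate the unifier: #1:Bool -> b1, #2:Bool -> b2
    x = b1 ^ b2
    y = b1
    z = b2 ^ (b1 and (b1 ^ b2))
    assert b_iff(b_or(x, y), z)
print("every ground instance of the unifier satisfies the equation")
```

This checks soundness of the unifier's instances; completeness (that *all* satisfying assignments arise this way) is the other half of the guarantee stated above.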
The computation of a minimal set of E ∪ B-unifiers relies on filtering by E ∪ B-matching between two E ∪ B-unifiers, as explained in the following section.

# **3.1 FVP** *E ∪ B***-Matching and Minimality of** *E ∪ B***-Unifiers**

By definition, a term u E ∪ B-*matches* another term v iff there is a substitution γ such that u =<sub>E∪B</sub> vγ. Besides the existing match command modulo axioms

<sup>7</sup> Each connective's truth table can be checked with Maude's reduce command. Actually, need only check and and xor (other connectives are definitional extensions).

<sup>8</sup> In Maude, different command names are used to emphasize different algorithms. The word 'filtered' is used instead of 'irredundant' because irredundancy is not guaranteed in the AwoC case.

B, Maude's new variant match command computes a complete, minimal set of E ∪ B-*matching substitutions* for any FVP theory E ∪ B in class (2), except for the AwoC case. Such an algorithm could always be derived from an E ∪ B-unification algorithm by replacing u by ū, where all variables in u are replaced by fresh constants, and computing the E ∪ B-unifiers of ū = v. But a more efficient special-purpose algorithm has been designed and implemented for this purpose. E ∪ B-matching algorithms are automatically provided by Maude for any *user-definable* theory in class (2) with the variant match command.

```
Maude> variant match in BOOL-FVP : Z and W <=? X .
rewrites: 12 in 21ms cpu (27ms real) (545 rewrites/second)
```


This is a good moment to ask and answer a relevant question: why is computing a complete *minimal* set of E ∪ B-unifiers for a unification problem Γ, where E ∪ B is an FVP theory in class (2) except for the AwoC case, *nontrivial*? We first need to explain how minimality is achieved. Suppose that α and β are two E ∪ B-unifiers of a system of equations Γ with, say, typed variables x1,...,xn. We then say that α *is more general than* β modulo E ∪ B, denoted α ⊒_{E∪B} β, iff there is a substitution γ such that for each xi, 1 ≤ i ≤ n, γ(α(xi)) =_{E∪B} β(xi). But this exactly means that the vector [β(x1),...,β(xn)] E ∪ B-matches the vector [α(x1),...,α(xn)] with E ∪ B-matching substitution γ. A complete set of E ∪ B-unifiers of Γ is by definition *minimal* iff for any two different unifiers α and β in it we have α ⋣_{E∪B} β *and* β ⋣_{E∪B} α, i.e., the two associated E ∪ B-matching problems fail.

What is *nontrivial* is computing a minimal complete set of E ∪ B-unifiers *efficiently*. One could do so inefficiently by simulating E ∪ B-matching with E ∪ B-unification, and more efficiently by using an E ∪ B-matching algorithm. Maude achieves still greater efficiency by directly computing the α ⊒_{E∪B} β relation. The key difference between the variant unify command and the new filtered variant unify command is that the second computes a ⊒_{E∪B}-minimal set of E ∪ B-unifiers of Γ using the α ⊒_{E∪B} β relation, whereas the first only computes a set of ⊒_B-minimal E ∪ B-unifiers of Γ using the cheaper α ⊒_B β relation. Three ideas make this fast in practice: (i) variant matching is faster than variant unification because one side is variable-free; (ii) checking the existence of a variant matcher between two variant unifiers is far cheaper than enumerating all such matchers, and existence is all that filtering needs; and (iii) variant unifiers are discarded on the fly, avoiding further narrowing steps and computation.
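To see the filtering idea concretely, the following Python sketch restricts to the purely syntactic case (E and B both empty), where "more general" reduces to plain first-order matching over the tuples of bindings. The term encoding, function names, and the `minimize` filter are our own illustration, not Maude's implementation:

```python
# Illustrative sketch (syntactic case only, E = B = empty): variables are
# uppercase strings, function terms are tuples ('f', arg1, ...).

def match(pattern, term, subst=None):
    """Syntactic matching: find subst with pattern*subst == term, else None."""
    if subst is None:
        subst = {}
    if isinstance(pattern, str) and pattern.isupper():      # a variable
        if pattern in subst:
            return subst if subst[pattern] == term else None
        subst[pattern] = term
        return subst
    if isinstance(pattern, tuple) and isinstance(term, tuple) \
            and pattern[0] == term[0] and len(pattern) == len(term):
        for p, t in zip(pattern[1:], term[1:]):
            subst = match(p, t, subst)
            if subst is None:
                return None
        return subst
    return subst if pattern == term else None

def more_general(alpha, beta, variables):
    """alpha is more general than beta iff one matcher gamma instantiates
    every alpha-binding to the corresponding beta-binding simultaneously."""
    pat = ('tuple',) + tuple(alpha.get(x, x) for x in variables)
    trm = ('tuple',) + tuple(beta.get(x, x) for x in variables)
    return match(pat, trm) is not None

def minimize(unifiers, variables):
    """Discard every unifier that is a strict instance of another one."""
    return [b for b in unifiers
            if not any(more_general(a, b, variables)
                       and not more_general(b, a, variables)
                       for a in unifiers)]
```

For instance, given the two unifiers `{'X': 'Y'}` and `{'X': ('a',), 'Y': ('a',)}` of a problem over X, Y, the second is a strict instance of the first and is filtered out.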

## **4 Narrowing-Based Symbolic Reachability Analysis**

In Maude, concurrent systems are specified in so-called *system modules* as *rewrite theories* of the form R = (Σ, G, R), where G is an equational theory either of the form B in class (1), or E ∪ B in classes (2) or (3), and R are the *system transition rules*, specified as rewrite rules. When the theory R is *topmost*, meaning that the rules R rewrite the entire state, narrowing with rules R modulo the equations G is a *complete* symbolic reachability analysis method for *infinite-state systems* [35]. That is, given a term u with variables $\vec{x}$, representing a typically infinite set of initial states, and another term v with variables $\vec{y}$, representing a possibly infinite set of target states, narrowing can answer the question: *can an instance of* u *reach an instance of* v*?* That is, does the formula $\exists \vec{x}, \vec{y} \;.\; u \to^{*} v$ hold in R? Note that, if the *complement* of a system invariant I can be symbolically described as the set of ground instances of terms in a set {v1,...,vn} of pattern terms, then narrowing provides a semi-decision procedure for verifying whether the system specified by R fails to satisfy I starting from an initial set of states specified by u. Namely, I holds iff no instance of any vi can be reached from some instance of u.

Assuming G is in class (1) or (2), Maude's vu-narrow command implements narrowing with R modulo G by performing G-unification at each narrowing step. However, the number of symbolic states that need to be explored can be *infinite*. This means that if no solution exists for the narrowing search, Maude will search forever, so that only *depth-bounded searches* will terminate. The great advantage of the new {fold} vu-narrow {filter,delay} command is that it performs a powerful *symbolic state space reduction* by: (i) removing a newly explored symbolic state v′ if it E ∪ B-matches a previously explored state v, and replacing transitions with target v′ by transitions with target v; and (ii) using minimal sets of E ∪ B-unifiers for each narrowing step and for checking common instances between a newly explored state and the target term (ensured by the keywords filter and delay). This can make the entire search space finite and allow full verification of invariants for some infinite-state systems. Consider the following Maude specification of Lamport's bakery protocol.

```
mod BAKERY is
 sorts Nat LNat Nat? State WProcs Procs .
 subsorts Nat LNat < Nat? . subsort WProcs < Procs .
 op 0 : -> Nat .
 op s : Nat -> Nat .
 op [_] : Nat -> LNat . *** number-locking operator
 op < wait,_> : Nat -> WProcs .
 op < crit,_> : Nat -> Procs .
 op mt : -> WProcs . *** empty multiset
 op __ : Procs Procs -> Procs [assoc comm id: mt] . *** union
 op __ : WProcs WProcs -> WProcs [assoc comm id: mt] . *** union
 op _|_|_ : Nat Nat? Procs -> State .
 vars n m i j k : Nat . var x? : Nat? . var PS : Procs . var WPS : WProcs .
 rl [new]: m | n | PS => s(m) | n | < wait,m > PS [narrowing] .
 rl [enter]: m | n | < wait,n > PS => m | [n] | < crit,n > PS [narrowing] .
 rl [leave]: m | [n] | < crit,n > PS => m | s(n) | PS [narrowing] .
endm
```
The states of BAKERY have the form "m | x? | PS" with m the ticket-dispensing counter, x? the (possibly locked) counter to access the critical section, and PS a multiset of processes either waiting or in the critical section. BAKERY is infinite-state: [new] creates new processes, and the counters can grow unboundedly. When a waiting process enters the critical section with [enter], the second counter n is locked as [n]; it is unlocked and incremented when the process leaves with [leave]. The key invariant is *mutual exclusion*. Note that the term "i | x? | < crit, j > < crit, k > PS" describes all states in the *complement* of mutual exclusion states. Without the fold option, narrowing does not terminate, but with the following command we can verify that BAKERY satisfies mutual exclusion, not just for the initial state "0 | 0 | mt", but for the much more general infinite set of initial states "m | n | WPS" containing only waiting processes.

```
Maude> {fold} vu-narrow {filter,delay}
        m | n | WPS =>* i | x? | < crit, j > < crit, k > PS .
No solution.
rewrites: 4 in 1ms cpu (1ms real) (2677 rewrites/second)
```
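As a sanity check, the three rewrite rules of BAKERY can also be mimicked on concrete states and explored up to a bounded depth. The following Python sketch is an illustrative stand-in for such an explicit-state exploration (the encoding is ours, not Maude's semantics): a state is (m, n, locked, procs), with procs a set of ('wait', k) and ('crit', k) pairs, which suffices because ticket numbers are never reused.

```python
# Concrete-state sketch of the BAKERY rules: bounded breadth-first
# exploration checking mutual exclusion on the explored fragment.
from collections import deque

def successors(state):
    m, n, locked, procs = state
    # [new]: dispense ticket m to a new waiting process
    yield (m + 1, n, locked, procs | {('wait', m)})
    if not locked and ('wait', n) in procs:
        # [enter]: the process holding ticket n enters; the counter is locked
        yield (m, n, True, procs - {('wait', n)} | {('crit', n)})
    if locked and ('crit', n) in procs:
        # [leave]: unlock and advance the counter
        yield (m, n + 1, False, procs - {('crit', n)})

def reachable(init, depth):
    seen, frontier = {init}, deque([(init, 0)])
    while frontier:
        state, d = frontier.popleft()
        if d == depth:
            continue
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, d + 1))
    return seen

init = (0, 0, False, frozenset())          # the initial state 0 | 0 | mt
states = reachable(init, 8)
# mutual exclusion: never two processes in the critical section
assert all(sum(1 for p in s[3] if p[0] == 'crit') <= 1 for s in states)
```

Of course, unlike the narrowing-based command above, such a bounded concrete search only inspects finitely many states from one initial state; it cannot by itself verify the invariant for the infinite set of initial states "m | n | WPS".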
The new vu-narrow {filter,delay} command can achieve dramatic state space reductions over the previous vu-narrow command by filtering E ∪ B-unifiers. This is illustrated by a simple cryptographic protocol example in [10, §15] exploiting the unitary nature of unification in the exclusive-or theory [24].
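The {fold} reduction described in this section can be pictured abstractly as a worklist search that discards any new symbolic state subsumed by an already-kept one. The following Python sketch uses illustrative names and a toy subsumption relation (not Maude's internals) to show why folding can cut an infinite search down to a finite one:

```python
# Abstract sketch of folding: breadth-first search over symbolic states,
# dropping a new state when a kept state subsumes (is more general than) it.
from collections import deque

def folded_search(init, successors, subsumes):
    kept = [init]
    frontier = deque([init])
    while frontier:
        state = frontier.popleft()
        for nxt in successors(state):
            # fold: skip nxt if some already-kept state is more general
            if any(subsumes(old, nxt) for old in kept):
                continue
            kept.append(nxt)
            frontier.append(nxt)
    return kept

# Toy domain: integer k denotes the symbolic state "all n >= k", so k
# subsumes k' whenever k <= k'.  Successors produce k+1 forever, so the
# unfolded search would never terminate.
reached = folded_search(0, lambda k: [k + 1], lambda a, b: a <= b)
assert reached == [0]   # the successor 1 is folded into the more general 0
```

In Maude the subsumption check is E ∪ B-matching between symbolic states, which is why the efficiency of variant matching discussed in Sect. 3.1 matters here too.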

## **5 Applications and Conclusion**

Maude can be used as a meta-tool to develop new formal tools because: (i) its underlying equational and rewriting logics are logical (and reflective metalogical) frameworks [7,27,46]; (ii) it efficiently supports logical reflection through its META-LEVEL module; (iii) it offers rewriting, search, model checking, and strategy language features [11,15]; and (iv) it provides the symbolic reasoning features [15,33] whose latest advances are reported here. We refer to [11,15,31,33] for references on various Maude-based tools. Many of them can benefit from these new features.

By way of example we mention some areas ready to reap such benefits: (1) *Formal Analysis of Cryptographic Protocols*. The new features can yield substantial improvements to tools such as Maude-NPA [19], Tamarin [28] and AKISS [8]. (2) *Model Checking of Infinite-State Systems*. The narrowing-based LTL symbolic model checker reported in [6,20], and the addition of new symbolic capabilities to Real-Time Maude [37,38], can both benefit from the new features. (3) *SMT Solving*. In Sect. 3 we noted that FVP E ∪ B-unification makes satisfiability of positive QF formulas in T_{Σ/E∪B} decidable. Under mild conditions, this has been extended in [32,44] to a procedure for satisfiability in T_{Σ/E∪B} of all QF formulas, which will also benefit from the new features. (4) *Theorem Proving*. The new Maude Inductive Theorem Prover under construction [34], as well as Maude's Invariant Analyzer [43] and Reachability Logic Theorem Prover [45], all use equational unification and narrowing modulo equations; so all will benefit from the new features. (5) *Theory Transformations* based on equational unification, e.g., partial evaluation [4], ground confluence methods [17] or program termination methods [25,26], could likewise become more efficient.

In conclusion, we have presented, and illustrated with examples, the new equational unification, equational matching, and symbolic reachability analysis features of Maude 3.2. Thanks to the above-mentioned properties (i)–(iv) of Maude as a meta-tool, we hope that this work will encourage other researchers to use Maude and its symbolic features to develop new tools in many different logics.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Leśniewski's Ontology – Proof-Theoretic Characterization**

Andrzej Indrzejczak(B)

Department of Logic, University of Lodz, Łódź, Poland andrzej.indrzejczak@filhist.uni.lodz.pl

**Abstract.** The ontology of Leśniewski is commonly regarded as the most comprehensive calculus of names and the theoretical basis of mereology. However, ontology has not been examined by means of proof-theoretic methods so far. In this paper we provide a characterization of elementary ontology as a sequent calculus satisfying the desiderata usually formulated for rules in well-behaved systems of modern structural proof theory. In particular, the cut elimination theorem is proved, and a version of the subformula property holds for the cut-free version.

**Keywords:** Leśniewski · Ontology · Calculus of Names · Sequent Calculus · Cut Elimination

# **1 Introduction**

The ontology of Leśniewski is a kind of calculus of names, proposed as a formalization of logic alternative to the Fregean paradigm. Basically, it is a theory of the binary predicate ε, understood as a formalization of the Greek 'esti'. Informally, a formula aεb is to be read as "(the) a is (a/the) b", so in order for it to be true, a must be an individual name, whereas b can be an individual or a general name. In the original formulation, Leśniewski's ontology is the middle part of a hierarchical structure also involving protothetics and mereology (see the presentation in Urbaniak [20]). Protothetics, a very general form of propositional logic, is the basis of the overall construction. Its generality follows from the fact that, in addition to sentence variables, arbitrary sentence-functors (connectives) are allowed as variables, and quantifiers binding all these kinds of variables are involved. Similarly, in Leśniewski's ontology we have quantification not only over name variables but also over arbitrary name-functors creating complex names. In consequence we obtain a very expressive logic, which is then extended to mereology. The latter, the most well-known ingredient of Leśniewski's construction, is a theory of the parthood relation, which provides an alternative formalization of the theory of classes and foundations of mathematics.

Despite the dependence of Leśniewski's ontology on his protothetics, we can examine this theory, in particular its part called elementary ontology, in isolation, as a kind of first-order theory of ε based on classical first-order logic (FOL). Elementary ontology, in this sense, was investigated, among others, by Słupecki [17] and Iwanuś [7], and we follow this line here. The expressive power of such an approach is strongly reduced; in particular, quantifiers apply only to name variables. One should note, however, that despite appearances it is not just another elementary theory in the standard sense, since the range of variables is not limited to individual names but admits general and even empty names. Thus, name variables may represent not only 'Napoleon Bonaparte' but also 'an emperor' and 'Pegasus'. This leads to several problems concerning the interpretation of quantifiers in ontology, encountered in the semantical treatment (see e.g. Küng and Canty [8] or Rickey [16]). However, the problems of proper interpretation are not important for us here, since we develop a purely syntactical formulation, which is shown to be equivalent to Leśniewski's axiomatic formulation.

Taking into account the importance and originality of Leśniewski's ontology, it is interesting, if not surprising, that so far no proof-theoretic study has been offered, in particular in terms of sequent calculus (SC). In fact, a form of natural deduction proof system was applied by many authors following the original way of presenting proofs by Leśniewski (see, e.g., his [9–11]). However, this can hardly be treated as a proof-theoretic study of Leśniewski's ontology, but only as a convenient way of simplifying the presentation of axiomatic proofs. Ishimoto and Kobayashi [6] also introduced a tableau system for a part of (quantifier-free) ontology – we will say more about this system later.

In this paper we present a sequent calculus for elementary ontology and focus on its most important properties. More specifically, in Sect. 2 we briefly characterise elementary ontology, the object of our study. In Sect. 3 we present an adequate sequent calculus for the basic part of elementary ontology and prove that it is equivalent to the axiomatic formulation. Then we prove the cut elimination theorem for this calculus in Sect. 4. In the next section we focus on the problem of extensionality and discuss some alternative formulations of ontology and of some of its parts, as well as the intuitionistic version of it. Section 6 shows how the basic system can be extended with rules for new predicate constants which preserve cut elimination. The problem of extension with rules for term constants is discussed briefly in Sect. 7. A summary of the obtained results and open problems closes the paper.

#### **2 Elementary Ontology**

Roughly, in this article, by Leśniewski's elementary ontology we mean standard FOL (in some chosen adequate formalization) with Leśniewski's axiom LA added. For a more detailed general presentation of Leśniewski's systems one may consult Urbaniak [20], and for a detailed study of Leśniewski's ontology see Iwanuś [7] or Słupecki [17]. In the next section we will select a particular sequent system as representing FOL and investigate several possible ways of representing LA in this framework.

We will consider two languages for ontology. In both we assume a denumerable set of name variables. Following Gentzen's well-known custom, we apply a graphical distinction between the bound variables, which will be denoted by x, y, z, ... (possibly with subscripts), and the free variables, usually called parameters, which will be denoted by a, b, c, .... These are the only terms we admit, and both kinds will be called simply name variables. The basic language L_o consists of the following vocabulary:


As we can see, in addition to the standard logical vocabulary of FOL, the only specific constant is a binary predicate ε with the formation rule: tεt′ is an atomic formula, for any terms t, t′. In what follows we will use a convention: instead of tεt′ we will write tt′. The complexity of formulae of L_o is defined as the number of occurrences of logical constants, i.e. connectives and quantifiers. Hence the complexity of atomic formulae is 0.
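Under a hypothetical encoding of formulae as nested tuples (ours, for illustration only), the complexity measure can be sketched as follows:

```python
# Complexity of an L_o formula: count the occurrences of connectives and
# quantifiers; atoms ('eps', t, t') have complexity 0.

def complexity(phi):
    head = phi[0]
    if head == 'eps':                        # atomic t eps t'
        return 0
    if head == 'not':                        # unary connective
        return 1 + complexity(phi[1])
    if head in ('and', 'or', 'imp', 'iff'):  # binary connectives
        return 1 + complexity(phi[1]) + complexity(phi[2])
    if head in ('forall', 'exists'):         # phi = (quantifier, var, body)
        return 1 + complexity(phi[2])
    raise ValueError(head)

# forall x (x eps y -> exists z (z eps x)) has complexity 3
phi = ('forall', 'x', ('imp', ('eps', 'x', 'y'),
                       ('exists', 'z', ('eps', 'z', 'x'))))
assert complexity(phi) == 3
```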

The language L_p, considered in Sect. 6, adds to this vocabulary a number of unary and binary predicates: D, V, S, G, U, =, ≡, ≈, ε̄, ⊂, -, A, E, I, O.

In L_o and L_p we have name variables, which range over all names (individual, general, and empty), as the only terms. However, Leśniewski also considered complex terms built with the help of specific term-forming functors. We will briefly discuss such extensions in the setting of sequent calculus in Sect. 7 and note the important problems they generate for a decent proof-theoretic treatment.

The only specific axiom of elementary ontology is Le´sniewski's axiom LA:

$$\forall xy (xy \leftrightarrow \exists z (zx) \land \forall z (zx \rightarrow zy) \land \forall z v (zx \land vx \rightarrow zv))$$

LA→ and LA← will be used to refer to the respective implications forming LA, with the outer universal quantifier dropped. Note that:

**Lemma 1.** *The following formulae are equivalent to LA:*


We start with the system in the language L_o, i.e. with ε (conventionally omitted) as the only specific predicate constant added to the standard language of FOL.

#### **3 Sequent Calculus**

Elementary ontology will be formalised as a sequent calculus with sequents Γ ⇒ Δ, which are ordered pairs of finite multisets of formulae called the antecedent and the succedent, respectively. We will use the calculus G (after Gentzen), which is essentially the calculus G1 of Troelstra and Schwichtenberg [19]. All necessary

$$
\begin{array}{ll}
(AX)\ \varphi \Rightarrow \varphi &
(Cut)\ \dfrac{\Gamma \Rightarrow \Delta, \varphi \quad \varphi, \Pi \Rightarrow \Sigma}{\Gamma, \Pi \Rightarrow \Delta, \Sigma} \\[2ex]
(W{\Rightarrow})\ \dfrac{\Gamma \Rightarrow \Delta}{\varphi, \Gamma \Rightarrow \Delta} &
({\Rightarrow}W)\ \dfrac{\Gamma \Rightarrow \Delta}{\Gamma \Rightarrow \Delta, \varphi} \\[2ex]
(C{\Rightarrow})\ \dfrac{\varphi, \varphi, \Gamma \Rightarrow \Delta}{\varphi, \Gamma \Rightarrow \Delta} &
({\Rightarrow}C)\ \dfrac{\Gamma \Rightarrow \Delta, \varphi, \varphi}{\Gamma \Rightarrow \Delta, \varphi} \\[2ex]
(\neg{\Rightarrow})\ \dfrac{\Gamma \Rightarrow \Delta, \varphi}{\neg\varphi, \Gamma \Rightarrow \Delta} &
({\Rightarrow}\neg)\ \dfrac{\varphi, \Gamma \Rightarrow \Delta}{\Gamma \Rightarrow \Delta, \neg\varphi} \\[2ex]
(\wedge{\Rightarrow})\ \dfrac{\varphi, \psi, \Gamma \Rightarrow \Delta}{\varphi \wedge \psi, \Gamma \Rightarrow \Delta} &
({\Rightarrow}\wedge)\ \dfrac{\Gamma \Rightarrow \Delta, \varphi \quad \Gamma \Rightarrow \Delta, \psi}{\Gamma \Rightarrow \Delta, \varphi \wedge \psi} \\[2ex]
(\vee{\Rightarrow})\ \dfrac{\varphi, \Gamma \Rightarrow \Delta \quad \psi, \Gamma \Rightarrow \Delta}{\varphi \vee \psi, \Gamma \Rightarrow \Delta} &
({\Rightarrow}\vee)\ \dfrac{\Gamma \Rightarrow \Delta, \varphi, \psi}{\Gamma \Rightarrow \Delta, \varphi \vee \psi} \\[2ex]
({\to}{\Rightarrow})\ \dfrac{\Gamma \Rightarrow \Delta, \varphi \quad \psi, \Gamma \Rightarrow \Delta}{\varphi \to \psi, \Gamma \Rightarrow \Delta} &
({\Rightarrow}{\to})\ \dfrac{\varphi, \Gamma \Rightarrow \Delta, \psi}{\Gamma \Rightarrow \Delta, \varphi \to \psi} \\[2ex]
({\leftrightarrow}{\Rightarrow})\ \dfrac{\Gamma \Rightarrow \Delta, \varphi, \psi \quad \varphi, \psi, \Gamma \Rightarrow \Delta}{\varphi \leftrightarrow \psi, \Gamma \Rightarrow \Delta} &
({\Rightarrow}{\leftrightarrow})\ \dfrac{\varphi, \Gamma \Rightarrow \Delta, \psi \quad \psi, \Gamma \Rightarrow \Delta, \varphi}{\Gamma \Rightarrow \Delta, \varphi \leftrightarrow \psi} \\[2ex]
(\forall{\Rightarrow})\ \dfrac{\varphi[x/b], \Gamma \Rightarrow \Delta}{\forall x\varphi, \Gamma \Rightarrow \Delta} &
({\Rightarrow}\forall)\ \dfrac{\Gamma \Rightarrow \Delta, \varphi[x/a]}{\Gamma \Rightarrow \Delta, \forall x\varphi} \\[2ex]
(\exists{\Rightarrow})\ \dfrac{\varphi[x/a], \Gamma \Rightarrow \Delta}{\exists x\varphi, \Gamma \Rightarrow \Delta} &
({\Rightarrow}\exists)\ \dfrac{\Gamma \Rightarrow \Delta, \varphi[x/b]}{\Gamma \Rightarrow \Delta, \exists x\varphi}
\end{array}
$$

where a is a fresh parameter (eigenvariable), not present in Γ,Δ and ϕ, whereas b is an arbitrary parameter.

#### **Fig. 1.** Calculus G

structural rules, including cut, weakening and contraction, are primitive. The calculus G consists of the rules from Fig. 1.

Let us recall that formulae displayed in the schemata are active, whereas the remaining ones are parametric, or form a context. In particular, all active formulae in the premisses are called side formulae, and the one in the conclusion is the principal formula of the respective rule application. Proofs are defined in a standard way as finite trees with nodes labelled by sequents. The height of a proof D of Γ ⇒ Δ is defined as the number of nodes of the longest branch in D. ⊢_k Γ ⇒ Δ means that Γ ⇒ Δ has a proof of height at most k.

G provides an adequate formalization of classical pure FOL (i.e. with no terms other than variables). However, we should remember that here the terms in quantifier rules are restricted to variables ranging over arbitrary names (including empty and general ones). This means, in particular, that quantifiers do not have the existential import they carry in standard FOL.

Let us call G+LA the extension of G with LA as an additional axiomatic sequent. The following holds:

**Lemma 2.** *The following sequents are provable in G+LA:*

$$ab \Rightarrow \exists x(xa) \qquad ab \Rightarrow \forall x(xa \to xb) \qquad ab \Rightarrow \forall xy(xa \wedge ya \to xy)$$

$$\exists x(xa),\ \forall x(xa \to xb),\ \forall xy(xa \wedge ya \to xy) \Rightarrow ab$$

The proof is obvious. In fact, these sequents together allow us to derive LA, so we could alternatively use them in a characterization of elementary ontology on the basis of G.

G+LA is certainly an adequate formalization of elementary ontology in the sense of Słupecki and Iwanuś. However, from the standpoint of proof-theoretic analysis it is not an interesting form of sequent calculus, and it will be used only for showing the adequacy of our main system, called GO.

To obtain the basic GO we add the following four rules to G:

$$\begin{array}{ll}(R) \quad \frac{aa,\Gamma \Rightarrow \Delta}{ab,\Gamma \Rightarrow \Delta} & (T) \quad \frac{ac,\Gamma \Rightarrow \Delta}{ab,bc,\Gamma \Rightarrow \Delta} & (S) \quad \frac{ba,\Gamma \Rightarrow \Delta}{ab,bb,\Gamma \Rightarrow \Delta} \\\\ (E) \quad \frac{da,\Gamma \Rightarrow \Delta,dc \quad dc,\Gamma \Rightarrow \Delta,da \quad ab,\Gamma \Rightarrow \Delta}{cb,\Gamma \Rightarrow \Delta} & \end{array}$$

where d in (E) is a new parameter (eigenvariable), and a, b, c are arbitrary.

The names of the rules come from reflexivity, transitivity, symmetry and extensionality. In the case of (R) and (S) it is a kind of prefixed reflexivity and symmetry (ab → aa, bb → (ab → ba)). Why (E) comes from extensionality will be explained later.
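For instance, the prefixed laws ab → aa and bb → (ab → ba) mentioned above are obtained directly from (R) and (S) (a minimal sketch; each starts from an axiomatic sequent):

$$
({\Rightarrow}{\to})\ \frac{(R)\ \dfrac{aa \Rightarrow aa}{ab \Rightarrow aa}}{\Rightarrow ab \to aa}
\qquad\qquad
({\Rightarrow}{\to})\ \frac{(S)\ \dfrac{ba \Rightarrow ba}{ab,\ bb \Rightarrow ba}}{bb \Rightarrow ab \to ba}
$$

A further application of (⇒→) to the right-hand derivation yields ⇒ bb → (ab → ba).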

We can show that GO is an adequate characterization of elementary ontology.

**Theorem 1.** *If G+LA ⊢ Γ ⇒ Δ, then GO ⊢ Γ ⇒ Δ.*

*Proof.* It is sufficient to prove that the axiomatic sequent LA is provable in GO.

$$
({\Rightarrow}\exists)\ \frac{(R)\ \dfrac{aa \Rightarrow aa}{ab \Rightarrow aa}}{ab \Rightarrow \exists x(xa)}
\qquad\qquad
({\Rightarrow}\forall)\ \frac{({\Rightarrow}{\to})\ \dfrac{(T)\ \dfrac{cb \Rightarrow cb}{ca,\ ab \Rightarrow cb}}{ab \Rightarrow ca \to cb}}{ab \Rightarrow \forall x(xa \to xb)}
$$

(⇒ ∧) with:

$$\begin{array}{c} \frac{cd \Rightarrow cd}{ca, ad \Rightarrow cd} \, (T) \\ \hline ca, da, aa \Rightarrow cd \, (S) \\ \hline ca, da, ab \Rightarrow cd \, (R) \\ \hline ab, ca \land da \Rightarrow cd \, (\land \Rightarrow) \\ \hline ab \Rightarrow ca \land da \rightarrow cd \, (\Rightarrow \rightarrow) \\ ab \Rightarrow \forall xy (xa \land ya \rightarrow xy) \, \, (\Rightarrow \forall) \end{array}$$

yields LA→ after (⇒→). A proof of the converse is more complicated (for readability and space-saving we omitted all applications of the weakening rules necessary for the application of two- and three-premiss rules; this convention will be applied hereafter without comment):

$$
\begin{array}{ll}
1. & da,\ ca,\ \forall xy(xa \wedge ya \to xy) \Rightarrow dc \quad \text{from } da \Rightarrow da,\ ca \Rightarrow ca,\ dc \Rightarrow dc \text{ by } ({\Rightarrow}\wedge), ({\to}{\Rightarrow}), (\forall{\Rightarrow}) \\
2. & dc,\ ca \Rightarrow da \quad \text{from } da \Rightarrow da \text{ by } (T) \\
3. & cb,\ ca,\ \forall xy(xa \wedge ya \to xy) \Rightarrow ab \quad \text{from 1, 2 and } ab \Rightarrow ab \text{ by } (E) \\
4. & ca,\ ca \to cb,\ \forall xy(xa \wedge ya \to xy) \Rightarrow ab \quad \text{from 3 and } ca \Rightarrow ca \text{ by } ({\to}{\Rightarrow}) \\
5. & ca,\ \forall x(xa \to xb),\ \forall xy(xa \wedge ya \to xy) \Rightarrow ab \quad \text{by } (\forall{\Rightarrow}) \\
6. & \exists x(xa),\ \forall x(xa \to xb),\ \forall xy(xa \wedge ya \to xy) \Rightarrow ab \quad \text{by } (\exists{\Rightarrow})
\end{array}
$$

It is routine to prove LA.

Note that to prove LA→ the rules (R), (T), (S) were sufficient, whereas in order to derive the converse, (E) alone is not sufficient: we need (T) again.

**Theorem 2.** *If GO ⊢ Γ ⇒ Δ, then G+LA ⊢ Γ ⇒ Δ.*

*Proof.* It is sufficient to prove that the four rules of GO are derivable in G+LA. For (T):

$$
(Cut)\ \frac{bc \Rightarrow \forall x(xb \to xc) \qquad (\forall{\Rightarrow})\ \dfrac{({\to}{\Rightarrow})\ \dfrac{ab \Rightarrow ab \quad ac \Rightarrow ac}{ab \to ac,\ ab \Rightarrow ac}}{\forall x(xb \to xc),\ ab \Rightarrow ac}}{ab,\ bc \Rightarrow ac}
$$

where the leftmost leaf is provable in G+LA (Lemma 2). By cut with the premiss of (T) we obtain its conclusion.

For (S):

$$
(C{\Rightarrow})\ \frac{(Cut)\ \dfrac{bb \Rightarrow \forall xy(xb \wedge yb \to xy) \qquad (\forall{\Rightarrow})\ \dfrac{({\to}{\Rightarrow})\ \dfrac{({\Rightarrow}\wedge)\ \dfrac{bb \Rightarrow bb \quad ab \Rightarrow ab}{bb,\ ab \Rightarrow bb \wedge ab} \qquad ba \Rightarrow ba}{bb \wedge ab \to ba,\ bb,\ ab \Rightarrow ba}}{\forall xy(xb \wedge yb \to xy),\ bb,\ ab \Rightarrow ba}}{bb,\ bb,\ ab \Rightarrow ba}}{bb,\ ab \Rightarrow ba}
$$

where the leftmost leaf is provable in G+LA (Lemma 2). By cut with the premiss of (S) we obtain its conclusion.

For (R):

$$
(C{\Rightarrow})\ \frac{(Cut)\ \dfrac{ab \Rightarrow \forall xy(xa \wedge ya \to xy) \qquad (Cut)\ \dfrac{ab \Rightarrow \exists x(xa) \qquad S}{\forall xy(xa \wedge ya \to xy),\ \forall x(xa \to xa),\ ab \Rightarrow aa}}{\forall x(xa \to xa),\ ab,\ ab \Rightarrow aa}}{\forall x(xa \to xa),\ ab \Rightarrow aa}
$$

where S := ∃x(xa), ∀xy(xa ∧ ya → xy), ∀x(xa → xa) ⇒ aa and all leaves are provable in G+LA (Lemma 2); in particular, S is the fourth sequent of Lemma 2 with b replaced by a. By cut with ⇒ ∀x(xa → xa) and the premiss of (R) we obtain its conclusion.

Since (R), (T), (S) are all derivable in G+LA, we use them in the proof of the derivability of (E) to simplify matters. Note first the following three proofs, with weakenings omitted:

$$
\begin{array}{ll}
1. & cb \Rightarrow cc \quad \text{from } cc \Rightarrow cc \text{ by } (R) \\
2. & ca \leftrightarrow cc,\ cb \Rightarrow ca \quad \text{from } ca \Rightarrow ca \text{ and 1 by } ({\leftrightarrow}{\Rightarrow}) \\
3. & ca \leftrightarrow cc,\ cb \Rightarrow \exists x(xa) \quad \text{by } ({\Rightarrow}\exists) \\
4. & \forall x(xa \leftrightarrow xc),\ cb \Rightarrow \exists x(xa) \quad \text{by } (\forall{\Rightarrow})
\end{array}
$$

$$
\begin{array}{ll}
1. & dc,\ cb \Rightarrow db \quad \text{from } db \Rightarrow db \text{ by } (T) \\
2. & da \leftrightarrow dc,\ cb,\ da \Rightarrow db \quad \text{from } da \Rightarrow da \text{ and 1 by } ({\leftrightarrow}{\Rightarrow}) \\
3. & \forall x(xa \leftrightarrow xc),\ cb,\ da \Rightarrow db \quad \text{by } (\forall{\Rightarrow}) \\
4. & \forall x(xa \leftrightarrow xc),\ cb \Rightarrow da \to db \quad \text{by } ({\Rightarrow}{\to}) \\
5. & \forall x(xa \leftrightarrow xc),\ cb \Rightarrow \forall x(xa \to xb) \quad \text{by } ({\Rightarrow}\forall)
\end{array}
$$

and

$$
\begin{array}{ll}
1. & ce,\ dc \Rightarrow de \quad \text{from } de \Rightarrow de \text{ by } (T) \\
2. & ec,\ cc,\ dc \Rightarrow de \quad \text{by } (S) \\
3. & ec,\ cb,\ dc \Rightarrow de \quad \text{by } (R) \\
4. & ea \leftrightarrow ec,\ cb,\ dc,\ ea \Rightarrow de \quad \text{from } ea \Rightarrow ea \text{ and 3 by } ({\leftrightarrow}{\Rightarrow}) \\
5. & \forall x(xa \leftrightarrow xc),\ cb,\ dc,\ ea \Rightarrow de \quad \text{by } (\forall{\Rightarrow}) \\
6. & da \leftrightarrow dc,\ \forall x(xa \leftrightarrow xc),\ cb,\ da,\ ea \Rightarrow de \quad \text{from } da \Rightarrow da \text{ and 5 by } ({\leftrightarrow}{\Rightarrow}) \\
7. & \forall x(xa \leftrightarrow xc),\ cb,\ da,\ ea \Rightarrow de \quad \text{by } (\forall{\Rightarrow}) \text{ and } (C{\Rightarrow}) \\
8. & \forall x(xa \leftrightarrow xc),\ cb,\ da \wedge ea \Rightarrow de \quad \text{by } (\wedge{\Rightarrow}) \\
9. & \forall x(xa \leftrightarrow xc),\ cb \Rightarrow da \wedge ea \to de \quad \text{by } ({\Rightarrow}{\to}) \\
10. & \forall x(xa \leftrightarrow xc),\ cb \Rightarrow \forall xy(xa \wedge ya \to xy) \quad \text{by } ({\Rightarrow}\forall) \text{ twice}
\end{array}
$$

By three cuts with <sup>∃</sup>x(xa), <sup>∀</sup>x(xa <sup>→</sup> xb), <sup>∀</sup>xy(xa <sup>∧</sup> ya <sup>→</sup> xy) <sup>⇒</sup> ab and contractions we obtain a proof of <sup>S</sup> := <sup>∀</sup>x(xa <sup>↔</sup> xc), cb <sup>⇒</sup> ab. Then we finish in the following way:

$$
(Cut)\ \frac{(Cut)\ \dfrac{({\Rightarrow}\forall)\ \dfrac{({\Rightarrow}{\leftrightarrow})\ \dfrac{da, \Gamma \Rightarrow \Delta, dc \qquad dc, \Gamma \Rightarrow \Delta, da}{\Gamma \Rightarrow \Delta,\ da \leftrightarrow dc}}{\Gamma \Rightarrow \Delta,\ \forall x(xa \leftrightarrow xc)} \qquad S}{cb,\ \Gamma \Rightarrow \Delta,\ ab} \qquad ab,\ \Gamma \Rightarrow \Delta}{cb,\ \Gamma \Rightarrow \Delta}
$$

Note that to prove derivability of (E) we need in fact the whole LA. We elaborate on the strength of this rule in Sect. 5.

# **4 Cut Elimination**

The possibility of representing LA by means of these four rules makes GO a calculus with desirable proof-theoretic properties. First of all, note that the cut elimination theorem holds for G. Since the only primitive rules for ε are all one-sided, in the sense that principal formulae occur in the antecedents only, we can easily extend this result to GO. We follow the general strategy of cut elimination proofs applied originally to hypersequent calculi by Metcalfe, Olivetti and Gabbay [13], which works well also in the context of standard sequent calculi (see Indrzejczak [5]). Such a proof has a particularly simple structure and allows us to avoid many complexities inherent in other methods of proving cut elimination. In particular, we avoid the well-known problems with contraction, since two auxiliary lemmata deal with this problem in advance. Note first that the following result holds for GO:

**Lemma 3 (Substitution).** *If ⊢_k Γ ⇒ Δ, then ⊢_k Γ[a/b] ⇒ Δ[a/b].*

*Proof.* By induction on the height of a proof. Note that (E) may require relettering similar to that for (∃⇒) and (⇒∀). Note also that the proof establishes the height-preserving admissibility of substitution.

Let us assume that all proofs are regular, in the sense that every parameter a which is required to be fresh by the side condition of the respective rule is fresh in the entire proof, not only on the branch where that rule is applied. There is no loss of generality, since every proof may be systematically transformed into a regular one by the substitution lemma. The following notions are crucial for the proof:


Remember that the complexity of atomic formulae, and consequently of cut- and proof-degree in the case of atomic cuts, is 0. The proof of the cut elimination theorem is based on two lemmata which successively make a reduction: first on the height of the right, and then on the height of the left premiss of cut. $\varphi^k, \Gamma^k$ denote $k > 0$ occurrences of $\varphi, \Gamma$, respectively.

**Lemma 4 (Right reduction).** *Let* $\mathcal{D}_1 \vdash \Gamma \Rightarrow \Delta, \varphi$ *and* $\mathcal{D}_2 \vdash \varphi^k, \Pi \Rightarrow \Sigma$ *with* $d\mathcal{D}_1, d\mathcal{D}_2 < d\varphi$*, and* $\varphi$ *principal in* $\Gamma \Rightarrow \Delta, \varphi$*; then we can construct a proof* $\mathcal{D}$ *such that* $\mathcal{D} \vdash \Gamma^k, \Pi \Rightarrow \Delta^k, \Sigma$ *and* $d\mathcal{D} < d\varphi$*.*

*Proof.* By induction on the height of $\mathcal{D}_2$. The basis is trivial, since $\Gamma \Rightarrow \Delta, \varphi$ is identical with $\Gamma^k, \Pi \Rightarrow \Delta^k, \Sigma$. The induction step requires examination of all possible derivations of $\varphi^k, \Pi \Rightarrow \Sigma$, and of the role of the cut formula in the transition. In cases where all occurrences of $\varphi$ are parametric, we simply apply the induction hypothesis to the premisses of $\varphi^k, \Pi \Rightarrow \Sigma$ and then apply the respective rule; this works essentially because of the context independence of almost all rules and the regularity of proofs, which together prevent violation of the side conditions on eigenvariables. If one of the occurrences of $\varphi$ in the premiss(es) is a side formula of the last rule, we must additionally apply weakening to restore the missing formula before applying the relevant rule.

In cases where one occurrence of $\varphi$ in $\varphi^k, \Pi \Rightarrow \Sigma$ is principal, we make use of the fact that $\varphi$ in the left premiss is also principal; the cases of contraction and weakening are trivial. Note that due to the condition that $\varphi$ is principal in the left premiss, it must be compound, since all rules introducing atomic formulae as principal work only in the antecedents. Hence all cases where one occurrence of an atomic $\varphi$ in the right premiss would be introduced by means of (R), (S), (T), (E) are not considered in the proof of this lemma. The only exceptions are axiomatic sequents $\Gamma \Rightarrow \Delta, \varphi$ with principal atomic $\varphi$, but they do no harm.

**Lemma 5 (Left reduction).** *Let* $\mathcal{D}_1 \vdash \Gamma \Rightarrow \Delta, \varphi^k$ *and* $\mathcal{D}_2 \vdash \varphi, \Pi \Rightarrow \Sigma$ *with* $d\mathcal{D}_1, d\mathcal{D}_2 < d\varphi$*; then we can construct a proof* $\mathcal{D}$ *such that* $\mathcal{D} \vdash \Gamma, \Pi^k \Rightarrow \Delta, \Sigma^k$ *and* $d\mathcal{D} < d\varphi$*.*

*Proof.* By induction on the height of $\mathcal{D}_1$, but with some important differences from the previous proof. First note that we do not require $\varphi$ to be principal in $\varphi, \Pi \Rightarrow \Sigma$, so the case of an atomic $\varphi$ is included. In all these cases we just apply the induction hypothesis. This guarantees that even if an atomic cut formula was introduced in the right premiss by one of the rules (R), (S), (T), (E), the reduction of height is carried out only on the left premiss, and we always obtain the expected result. Now, in cases where one occurrence of $\varphi$ in $\Gamma \Rightarrow \Delta, \varphi^k$ is principal, we first apply the induction hypothesis to eliminate all other $k - 1$ occurrences of $\varphi$ in the premisses and then apply the respective rule. Since the only new occurrence of $\varphi$ is principal, we can apply the right reduction lemma and obtain the result, possibly after some applications of structural rules.

Now we are ready to prove the cut elimination theorem:

# **Theorem 3.** *Every proof in GO can be transformed into a cut-free proof.*

*Proof.* By double induction: primary on <sup>d</sup><sup>D</sup> and subsidiary on the number of maximal cuts (in the basis and in the inductive step of the primary induction). We always take the topmost maximal cut and apply Lemma 5 to it. By successive repetition of this procedure we diminish either the degree of a proof or the number of cuts in it until we obtain a cut-free proof.

As a consequence of the cut elimination theorem for GO we obtain:

**Corollary 1.** *If* $\vdash \Gamma \Rightarrow \Delta$*, then it has a proof built only from subformulae of* $\Gamma \cup \Delta$ *and atomic formulae.*

So cut-free GO satisfies the form of the subformula property which holds for several elementary theories as formalised by Negri and von Plato [14].

# **5 Modifications**

The construction of rules deductively equivalent to axioms may be automated to some extent (see e.g. Negri and von Plato [14], Braüner [1], or Marin, Miller, Pimentel and Volpe [12]). Still, even the choice among equivalent versions of an axiom to be used for the transformation may affect the quality of the resulting rules. Moreover, some additional tuning is very often necessary to obtain rules which are well behaved from the proof-theoretic point of view. In this section we focus briefly on this problem and sketch some alternatives.

In our adequacy proofs we referred to the original formulation of LA, since the rules (R), (T), (S) correspond directly, in a modular way, to the three conjuncts of LA→. Our rule (E), however, is modelled not on LA← but rather on the suitable implication of variant 3 of LA from Lemma 1. As a first approximation we can obtain the rule:

$$\frac{\varGamma \Rightarrow \Delta, \exists z (\forall v (va \leftrightarrow vz) \land zb)}{\varGamma \Rightarrow \Delta, ab}$$

which after further decomposition and quantifier elimination yields:

$$\frac{da, \varGamma \Rightarrow \Delta, dc \quad dc, \varGamma \Rightarrow \Delta, da \quad \varGamma \Rightarrow \Delta, cb}{\varGamma \Rightarrow \Delta, ab}.$$

where d is a new parameter. This rule is very similar to (E) but has some active atoms in the succedents, which is troublesome for proving cut elimination when ab is both a cut formula and the principal formula of (R), (S) or (T) in the right premiss of cut. Fortunately, (E) is interderivable with this rule (this follows from the rule generation theorem in Indrzejczak [5]) and has its principal formula in the antecedent.

It is clear that if we focus on other variants, we obtain different rules by their decomposition. In particular, instead of (E) we may equivalently use the following rules, based directly on LA, or on variants 2 and 1 respectively:

$$\begin{array}{ll} (E\_{LA}) & \frac{da, \Gamma \Rightarrow \Delta, db \quad da, ea, \Gamma \Rightarrow \Delta, de \quad ab, \Gamma \Rightarrow \Delta}{ca, \Gamma \Rightarrow \Delta} \\ (E\_2) & \frac{da, \Gamma \Rightarrow \Delta, dc \quad da, \Gamma \Rightarrow \Delta, cd \quad ab, \Gamma \Rightarrow \Delta}{ca, cb, \Gamma \Rightarrow \Delta} \\ (E\_1) & \frac{da, ea, \Gamma \Rightarrow \Delta, de \quad ab, \Gamma \Rightarrow \Delta}{ca, cb, \Gamma \Rightarrow \Delta} \end{array}$$

where d, e are new parameters (eigenvariables).

Note that each of these rules, used instead of (E), yields a variant of GO for which we can also prove cut elimination. However, as we will show at the end of this section, (E) seems optimal. The last rule is perhaps the most economical in terms of branching factor; however, since its left premiss directly corresponds to the condition ∀xy(xa ∧ ya → xy), it introduces two different new parameters into the premisses, which makes it more troublesome in some respects. In fact, if we want to reduce the branching factor, it is possible to replace all these rules by the following variants:

$$\begin{array}{ll} (E') & \frac{da, \Gamma \Rightarrow \Delta, dc \quad dc, \Gamma \Rightarrow \Delta, da}{cb, \Gamma \Rightarrow \Delta, ab} \\ (E'\_{LA}) & \frac{da, \Gamma \Rightarrow \Delta, db \quad da, ea, \Gamma \Rightarrow \Delta, de}{ca, \Gamma \Rightarrow \Delta, ab} \\ (E'\_2) & \frac{da, \Gamma \Rightarrow \Delta, dc \quad da, \Gamma \Rightarrow \Delta, cd}{ca, cb, \Gamma \Rightarrow \Delta, ab} \\ (E'\_1) & \frac{da, ea, \Gamma \Rightarrow \Delta, de}{ca, cb, \Gamma \Rightarrow \Delta, ab} \end{array}$$

with the same proviso on the eigenvariables d, e. Their interderivability with the rules stated first is again easily obtained by means of the rule generation theorem. These rules seem more convenient for proof search. However, for these primed rules cut elimination cannot be proved constructively, for the reasons mentioned above, and it is an open problem whether cut-free systems with these rules as primitive are complete.

We finish this section by stating the last reason for choosing (E). Let us explain why (E), the most complicated specific rule of GO, was claimed to be connected with extensionality. Consider the following two principles:

$$\begin{array}{l} WE \; \forall x (xa \leftrightarrow xb) \rightarrow \forall x (ax \leftrightarrow bx) \\ WExt \; \forall x (xa \leftrightarrow xb) \rightarrow \forall x (\varphi(x,a) \leftrightarrow \varphi(x,b)) \end{array}$$

where $\varphi(x, a)$ denotes an arbitrary formula with at least one occurrence of $x$ (not bound by any quantifier within $\varphi$) and of $a$.

**Lemma 6.** *WE is equivalent to WExt.*

*Proof.* That WE follows from WExt is obvious, since the former is a specific instance of the latter. The other direction is by induction on the complexity of $\varphi$. In the basis there are just two cases: $\varphi(x, a)$ is either $xa$ or $ax$; the former is trivial and the latter is just WE. The induction step goes like an ordinary proof of the extensionality principle in FOL.

**Lemma 7.** *In G,* (E) *is equivalent to* WE*.*

*Proof.* Note first that in G the following sequents are provable:


We will use them in the proofs to follow.

For derivability of (E):

$$(Cut)\;\frac{(Cut)\;\dfrac{(\Rightarrow\forall)\;\dfrac{(\Rightarrow\leftrightarrow)\;\dfrac{da, \Gamma \Rightarrow \Delta, dc \qquad dc, \Gamma \Rightarrow \Delta, da}{\Gamma \Rightarrow \Delta, da \leftrightarrow dc}}{\Gamma \Rightarrow \Delta, \forall x (xa \leftrightarrow xc)} \qquad \begin{array}{c}\mathcal{D}\\ \forall x (xa \leftrightarrow xc) \Rightarrow \forall x (ax \leftrightarrow cx)\end{array}}{\Gamma \Rightarrow \Delta, \forall x (ax \leftrightarrow cx)} \qquad \forall x (ax \leftrightarrow cx), cb \Rightarrow ab}{cb, \Gamma \Rightarrow \Delta, ab}$$

where $\mathcal{D}$ is a proof of $\forall x(xa \leftrightarrow xc) \Rightarrow \forall x(ax \leftrightarrow cx)$ from WE, and the rightmost sequent is provable. The endsequent, by cut with $ab, \Gamma \Rightarrow \Delta$, yields the conclusion of (E).

Provability of WE in G with (E):

$$(E)\ \frac{\forall x(xa \leftrightarrow xc), da \Rightarrow dc \qquad \forall x(xa \leftrightarrow xc), dc \Rightarrow da \qquad ab \Rightarrow ab}{\forall x(xa \leftrightarrow xc), cb \Rightarrow ab}$$

In the same way we prove $\forall x(xa \leftrightarrow xc), ab \Rightarrow cb$, which by $(\Rightarrow\leftrightarrow)$, $(\Rightarrow\forall)$ and $(\Rightarrow\rightarrow)$ yields WE.

This shows that we can obtain an axiomatization of elementary ontology by means of LA→ and WE (or WExt). Also, instead of LA→ we can use three axioms corresponding to our three rules (R), (S), (T). Note that if we drop (E) (or WE) we obtain a weaker version of ontology investigated by Takano [18]. If we drop the quantifier rules we obtain a quantifier-free version of this system investigated by Ishimoto and Kobayashi [6].

On the basis of the specific features of sequent calculus we can also obtain, for free, the intuitionistic version of ontology. As is well known, it is sufficient to restrict the rules of G to sequents having at most one formula in the succedent (which requires small modifications, like the replacement of $(\leftrightarrow\Rightarrow)$ and $(\Rightarrow\lor)$ with two variants always having one side formula in the succedent) to obtain a version adequate for intuitionistic FOL. Since all specific rules for ε can be restricted in a similar way, we obtain the calculus GIO for the intuitionistic version of elementary ontology. One can easily check that all proofs showing the adequacy of GO and the cut elimination theorem are either intuitionistically correct or can easily be turned into such proofs. The latter remark concerns those proofs in which the classical version of $(\leftrightarrow\Rightarrow)$ required the introduction of a second side formula into the succedent by $(\Rightarrow W)$; the two intuitionistic versions of $(\leftrightarrow\Rightarrow)$ do not require this step.

#### **6 Extensions**

Leśniewski and his followers often worked on ontology enriched with definitions of special predicates and name-creating functors. In this section we focus

on a number of unary and binary predicates which are popular ontological constants. Instead of adding these definitions to GO, we introduce the predicates by means of sequent rules satisfying the conditions formulated for well-behaved SC rules. Let us call $L_p$ the language $L_o$ enriched with all these predicates, and GOP the calculus with the additional rules for the predicates. The definitions of the most important unary predicates are:

$$\begin{array}{l}Da := \exists x(xa) \quad Va := \neg \exists x(xa) \\ Sa := \exists x(ax) \quad Ga := \exists xy(xa \land ya \land \neg xy) \end{array}$$

D, V, S, G are unary predicates stating that a is denoting, empty (or void), singular, or general, respectively. D and S are Leśniewski's ex and ob. He also preferred to apply sol(a), which we symbolize with U (for unique):

Ua := ∀xy(xa ∧ ya → xy) [or simply ¬Ga]
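As an illustration only, these definitions can be evaluated under the standard set-theoretic reading of elementary ontology (an assumption not used by the calculus itself: names denote sets of individuals, and aεb holds exactly when a denotes one object that falls under b):

```python
# Set-theoretic sketch of epsilon and the unary predicates D, V, S, G, U.
# Assumption (illustrative): a name denotes a set of individuals, and
# "a eps b" holds iff ext(a) is a singleton {x} with x a member of ext(b).
from itertools import chain, combinations

def eps(a: frozenset, b: frozenset) -> bool:
    """a eps b: a denotes exactly one object, and that object is in b."""
    return len(a) == 1 and a <= b

def names(universe):
    """All names (sets of individuals) over a finite universe."""
    xs = list(universe)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(xs, r) for r in range(len(xs) + 1))]

# Da := exists x (x eps a)  -- a is denoting (non-empty)
def D(a, N): return any(eps(x, a) for x in N)
# Va := not exists x (x eps a)  -- a is empty (void)
def V(a, N): return not D(a, N)
# Sa := exists x (a eps x)  -- a is singular
def S(a, N): return any(eps(a, x) for x in N)
# Ga := exists x, y (x eps a and y eps a and not x eps y)  -- a is general
def G(a, N): return any(eps(x, a) and eps(y, a) and not eps(x, y)
                        for x in N for y in N)
# Ua := not Ga  -- a names at most one object
def U(a, N): return not G(a, N)

N = names({1, 2, 3})
general, singular, empty = frozenset({1, 2}), frozenset({3}), frozenset()
assert D(general, N) and G(general, N) and not S(general, N)
assert S(singular, N) and U(singular, N) and D(singular, N)
assert V(empty, N) and U(empty, N)
```

Note that for singleton names x, y, the clause `not eps(x, y)` coincides with x ≠ y, matching ¬xy in the definition of G.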

The additional rules for these predicates are of the form:

$$\begin{array}{llll} (D \Rightarrow) & \frac{ba, \Gamma \Rightarrow \Delta}{Da, \Gamma \Rightarrow \Delta} & (\Rightarrow D) & \frac{\Gamma \Rightarrow \Delta, ca}{\Gamma \Rightarrow \Delta, Da} & (S \Rightarrow) & \frac{ab, \Gamma \Rightarrow \Delta}{Sa, \Gamma \Rightarrow \Delta} \\ (\Rightarrow S) & \frac{\Gamma \Rightarrow \Delta, ac}{\Gamma \Rightarrow \Delta, Sa} & (V \Rightarrow) & \frac{\Gamma \Rightarrow \Delta, ca}{Va, \Gamma \Rightarrow \Delta} & (\Rightarrow V) & \frac{ba, \Gamma \Rightarrow \Delta}{\Gamma \Rightarrow \Delta, Va} \end{array}$$

where b is new and c arbitrary in all schemata.

$$\begin{array}{ll} (G \Rightarrow) & \dfrac{ba, ca, \Gamma \Rightarrow \Delta, bc}{Ga, \Gamma \Rightarrow \Delta} \qquad (\Rightarrow G) \ \dfrac{\Gamma \Rightarrow \Delta, da \quad \Gamma \Rightarrow \Delta, ea \quad de, \Gamma \Rightarrow \Delta}{\Gamma \Rightarrow \Delta, Ga} \\[2mm] (\Rightarrow U) & \dfrac{ba, ca, \Gamma \Rightarrow \Delta, bc}{\Gamma \Rightarrow \Delta, Ua} \qquad (U \Rightarrow) \ \dfrac{\Gamma \Rightarrow \Delta, da \quad \Gamma \Rightarrow \Delta, ea \quad de, \Gamma \Rightarrow \Delta}{Ua, \Gamma \Rightarrow \Delta} \end{array}$$

where b, c are new, and d, e are arbitrary parameters.

The binary predicates of identity, (weak and strong) coextensiveness, non-being b, subsumption and antisubsumption are defined in the following way:

$$\begin{cases} a = b := ab \land ba & a\overline{\varepsilon}b := aa \land \neg ab\\ a \equiv b := \forall x(xa \leftrightarrow xb) & a \subset b := \forall x(xa \to xb) \\ a \approx b := a \equiv b \land Da & a \not\subseteq b := \forall x(xa \to \neg xb) \end{cases}$$

Finally, note that the Aristotelian categorical sentences can also be defined in Leśniewski's ontology:

$$\begin{array}{ll} aAb := a \subset b \land Da & aEb := a \not\subseteq b \land Da\\ aIb := \exists x(xa \land xb) & aOb := \exists x(xa \land \neg xb) \end{array}$$
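Under the set-theoretic reading of ε (an illustrative assumption: names denote sets, and aεb holds iff ext(a) is a singleton included in ext(b)), the four categorical forms behave as expected; a small sketch:

```python
# Aristotelian A/E/I/O sentences under a set-theoretic reading of epsilon
# (an illustrative model, not Lesniewski's own formulation):
# "a eps b" iff ext(a) = {x} and x in ext(b).
from itertools import chain, combinations

def eps(a, b):
    return len(a) == 1 and a <= b

def names(universe):
    xs = list(universe)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(xs, r) for r in range(len(xs) + 1))]

N = names({1, 2, 3, 4})

# aAb := (a subset b) and Da, where a subset b := forall x (xa -> xb)
def A(a, b): return all(eps(x, b) for x in N if eps(x, a)) and any(eps(x, a) for x in N)
def E(a, b): return all(not eps(x, b) for x in N if eps(x, a)) and any(eps(x, a) for x in N)
def I(a, b): return any(eps(x, a) and eps(x, b) for x in N)
def O(a, b): return any(eps(x, a) and not eps(x, b) for x in N)

man, mortal = frozenset({1, 2}), frozenset({1, 2, 3})
assert A(man, mortal) and I(man, mortal)
assert not E(man, mortal) and not O(man, mortal)
# A and O are contradictory whenever a is denoting (one corner of the square).
assert all(A(a, b) != O(a, b) or not any(eps(x, a) for x in N)
           for a in N for b in N)
```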

The rules for binary predicates:

$$\begin{array}{ll} (= \Rightarrow) & \dfrac{ab, ba, \Gamma \Rightarrow \Delta}{a = b, \Gamma \Rightarrow \Delta} \qquad (\Rightarrow =) \ \dfrac{\Gamma \Rightarrow \Delta, ab \quad \Gamma \Rightarrow \Delta, ba}{\Gamma \Rightarrow \Delta, a = b} \\[2mm] (\equiv \Rightarrow) & \dfrac{\Gamma \Rightarrow \Delta, ca, cb \quad ca, cb, \Gamma \Rightarrow \Delta}{a \equiv b, \Gamma \Rightarrow \Delta} \qquad (\Rightarrow \equiv) \ \dfrac{da, \Gamma \Rightarrow \Delta, db \quad db, \Gamma \Rightarrow \Delta, da}{\Gamma \Rightarrow \Delta, a \equiv b} \\[2mm] (\approx \Rightarrow) & \dfrac{da, \Gamma \Rightarrow \Delta, ca, cb \quad ca, cb, da, \Gamma \Rightarrow \Delta}{a \approx b, \Gamma \Rightarrow \Delta} \end{array}$$

$$\begin{array}{ll} (\Rightarrow \approx) & \dfrac{da, \Gamma \Rightarrow \Delta, db \quad db, \Gamma \Rightarrow \Delta, da \quad \Gamma \Rightarrow \Delta, ca}{\Gamma \Rightarrow \Delta, a \approx b} \\[2mm] (\bar{\varepsilon} \Rightarrow) & \dfrac{aa, \Gamma \Rightarrow \Delta, ab}{a\bar{\varepsilon}b, \Gamma \Rightarrow \Delta} \qquad (\Rightarrow \bar{\varepsilon}) \ \dfrac{\Gamma \Rightarrow \Delta, aa \quad ab, \Gamma \Rightarrow \Delta}{\Gamma \Rightarrow \Delta, a\bar{\varepsilon}b} \\[2mm] (\subset \Rightarrow) & \dfrac{\Gamma \Rightarrow \Delta, ca \quad cb, \Gamma \Rightarrow \Delta}{a \subset b, \Gamma \Rightarrow \Delta} \qquad (\Rightarrow \subset) \ \dfrac{da, \Gamma \Rightarrow \Delta, db}{\Gamma \Rightarrow \Delta, a \subset b} \\[2mm] (\not\subseteq \Rightarrow) & \dfrac{\Gamma \Rightarrow \Delta, ca \quad \Gamma \Rightarrow \Delta, cb}{a \not\subseteq b, \Gamma \Rightarrow \Delta} \qquad (\Rightarrow \not\subseteq) \ \dfrac{da, db, \Gamma \Rightarrow \Delta}{\Gamma \Rightarrow \Delta, a \not\subseteq b} \\[2mm] (A \Rightarrow) & \dfrac{da, \Gamma \Rightarrow \Delta, ca \quad cb, da, \Gamma \Rightarrow \Delta}{aAb, \Gamma \Rightarrow \Delta} \qquad (\Rightarrow A) \ \dfrac{da, \Gamma \Rightarrow \Delta, db \quad \Gamma \Rightarrow \Delta, ca}{\Gamma \Rightarrow \Delta, aAb} \\[2mm] (E \Rightarrow) & \dfrac{da, \Gamma \Rightarrow \Delta, ca \quad da, \Gamma \Rightarrow \Delta, cb}{aEb, \Gamma \Rightarrow \Delta} \qquad (\Rightarrow E) \ \dfrac{da, db, \Gamma \Rightarrow \Delta \quad \Gamma \Rightarrow \Delta, ca}{\Gamma \Rightarrow \Delta, aEb} \\[2mm] (I \Rightarrow) & \dfrac{da, db, \Gamma \Rightarrow \Delta}{aIb, \Gamma \Rightarrow \Delta} \qquad (\Rightarrow I) \ \dfrac{\Gamma \Rightarrow \Delta, ca \quad \Gamma \Rightarrow \Delta, cb}{\Gamma \Rightarrow \Delta, aIb} \\[2mm] (O \Rightarrow) & \dfrac{da, \Gamma \Rightarrow \Delta, db}{aOb, \Gamma \Rightarrow \Delta} \qquad (\Rightarrow O) \ \dfrac{\Gamma \Rightarrow \Delta, ca \quad cb, \Gamma \Rightarrow \Delta}{\Gamma \Rightarrow \Delta, aOb} \end{array}$$

where $d$ is new and $c$ arbitrary (but $c$ can be identical to $d$ in the rules for $\approx$, $A$, $E$).

Proofs of interderivability with the equivalences corresponding to the suitable definitions are in most cases trivial. We provide only one, for the sake of illustration: the hardest case, ≈.

$$(\Rightarrow\forall)\;\frac{(\Rightarrow\leftrightarrow)\;\dfrac{(\approx\Rightarrow)\;\dfrac{da, ca \Rightarrow ca, cb \quad da, ca, cb \Rightarrow cb}{a \approx b, ca \Rightarrow cb} \qquad (\approx\Rightarrow)\;\dfrac{da, cb \Rightarrow ca, cb \quad da, ca, cb \Rightarrow ca}{a \approx b, cb \Rightarrow ca}}{a \approx b \Rightarrow ca \leftrightarrow cb}}{a \approx b \Rightarrow \forall x (xa \leftrightarrow xb)}$$

and

$$(\approx\Rightarrow)\;\frac{(\Rightarrow\exists)\;\dfrac{da \Rightarrow da, ca, cb}{da \Rightarrow \exists x (xa), ca, cb} \qquad (\Rightarrow\exists)\;\dfrac{ca, cb, da \Rightarrow da}{ca, cb, da \Rightarrow \exists x (xa)}}{a \approx b \Rightarrow \exists x (xa)}$$

These, by $(\Rightarrow\land)$, yield one part. For the second:

$$(\Rightarrow \approx) \frac{\forall x (xa \leftrightarrow xb), da \Rightarrow db \qquad \forall x (xa \leftrightarrow xb), db \Rightarrow da \qquad ca \Rightarrow ca}{(\exists \Rightarrow) \frac{\forall x (xa \leftrightarrow xb), ca \Rightarrow a \approx b}{\forall x (xa \leftrightarrow xb), \exists x (xa) \Rightarrow a \approx b}}$$

where the left and the middle premiss are obviously provable by means of $(\forall\Rightarrow)$, $(\leftrightarrow\Rightarrow)$. We omit the proofs of derivability of both rules in GO enriched with the axiom $\Rightarrow \forall x(xa \leftrightarrow xb) \land \exists x(xa) \leftrightarrow a \approx b$.

We treat all these predicates as new constants, hence their complexity is fixed at 1, in contrast to atomic formulae, which are of complexity 0. Of course, we can consider ontology with an arbitrary selection of these predicates, according to our needs. Accordingly, we can enrich GO with an arbitrary selection of the suitable rules for predicates. All the results holding for GOP hold for any such subsystem. Let us list some important features of these rules and of the enriched GO:


We do not prove the substitution lemma, since the proof is standard, but we comment on the last point, since cut elimination holds due to points 3 and 4. The notion of reductivity for sequent rules was introduced by Ciabattoni [2]; it may be roughly defined as follows: a pair of introduction rules $(\Rightarrow *)$, $(* \Rightarrow)$ for a constant $*$ is reductive if an application of cut on cut formulae introduced by these rules may be replaced by a series of cuts on less complex formulae, in particular on their subformulae. Basically, it enables the reduction of cut-degree in the proof of cut elimination. Again we illustrate the point with the most complicated case. Consider an application of cut with the cut formula $a \approx b$; the left premiss of this cut was obtained by:

$$(\Rightarrow \approx)\ \frac{ca, \Gamma \Rightarrow \Delta, cb \qquad cb, \Gamma \Rightarrow \Delta, ca \qquad \Gamma \Rightarrow \Delta, da}{\Gamma \Rightarrow \Delta, a \approx b}$$

where c is new and d arbitrary. The right premiss was obtained by:

$$(\approx \Rightarrow)\ \frac{ea, \Pi \Rightarrow \Sigma, fa, fb \qquad ea, fa, fb, \Pi \Rightarrow \Sigma}{a \approx b, \Pi \Rightarrow \Sigma}$$

where e is new and f is arbitrary.

By the substitution lemma on the premisses of (⇒≈),(≈⇒) we obtain:

1. $fa, \Gamma \Rightarrow \Delta, fb$
2. $fb, \Gamma \Rightarrow \Delta, fa$
3. $da, \Pi \Rightarrow \Sigma, fa, fb$
4. $da, fa, fb, \Pi \Rightarrow \Sigma$

and we can derive:

$$(C)\;\frac{(Cut)\;\dfrac{(C)\;\dfrac{(Cut)\;\dfrac{(Cut)\;\dfrac{\Gamma \Rightarrow \Delta, da \quad da, \Pi \Rightarrow \Sigma, fa, fb}{\Gamma, \Pi \Rightarrow \Delta, \Sigma, fa, fb} \quad fb, \Gamma \Rightarrow \Delta, fa}{\Gamma, \Gamma, \Pi \Rightarrow \Delta, \Delta, \Sigma, fa, fa}}{\Gamma, \Pi \Rightarrow \Delta, \Sigma, fa} \qquad \begin{array}{c}\mathcal{D}\\ fa, \Gamma, \Pi \Rightarrow \Delta, \Sigma\end{array}}{\Gamma, \Gamma, \Pi, \Pi \Rightarrow \Delta, \Delta, \Sigma, \Sigma}}{\Gamma, \Pi \Rightarrow \Delta, \Sigma}$$

where $\mathcal{D}$ is a similar proof of $fa, \Gamma, \Pi \Rightarrow \Delta, \Sigma$ from $\Gamma \Rightarrow \Delta, da$, 4 and 1, obtained by cuts and contractions. All cuts are of lower degree than the original cut. It is a routine exercise to check that all rules for predicates are reductive, and this is sufficient for proving Lemmas 4 and 5 for GOP. As a consequence we obtain:

**Theorem 4.** *Every proof in GOP can be transformed into a cut-free proof.*

Since the rules are modular this holds for every subsystem based on a selection of the above rules.

## **7 Conclusion**

Both the basic system GO and its extension GOP are cut-free and satisfy a form of the subformula property. This shows that Leśniewski's ontology admits a standard proof-theoretic study and allows us to obtain reasonable results. In particular, we can prove the interpolation theorem for GO using the Maehara strategy (see e.g. [19]), which in turn implies other expected results for GO, such as Beth's definability theorem. Space restrictions prevent us from presenting this here. On the other hand, we restricted our study to the system with simple names only, whereas a fuller study should also cover complex names built with the help of several name-forming functors. The typical ones are the counterparts of the well-known class operations, definable in Leśniewski's ontology in the following way:

$$a\bar{b} := aa \land \neg ab \quad a(b \cap c) := ab \land ac \quad a(b \cup c) := ab \lor ac$$
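Under the standard set-theoretic reading of ε (an illustrative assumption: names denote sets, and aεb holds iff ext(a) is a singleton included in ext(b)), these defining equivalences amount to complement, intersection and union of extensions, and can be checked pointwise:

```python
# The class operations in a set-theoretic model (illustrative assumption):
# complement, intersection and union act on name extensions, and the
# defining equivalences from the text hold for every choice of names.
from itertools import chain, combinations

def eps(a, b):
    # "a eps b": a denotes exactly one object, and that object is in b.
    return len(a) == 1 and a <= b

U = frozenset({1, 2, 3})

def names(u):
    xs = list(u)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(xs, r) for r in range(len(xs) + 1))]

N = names(U)
comp = lambda b: U - b        # extension of the complement name b-bar
cap = lambda b, c: b & c      # extension of b "intersection" c
cup = lambda b, c: b | c      # extension of b "union" c

for a in N:
    for b in N:
        # a b-bar := aa and not ab
        assert eps(a, comp(b)) == (eps(a, a) and not eps(a, b))
        for c in N:
            assert eps(a, cap(b, c)) == (eps(a, b) and eps(a, c))
            assert eps(a, cup(b, c)) == (eps(a, b) or eps(a, c))
```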

It is not a problem to provide suitable rules corresponding to these definitions:

$$\begin{array}{llll} (- \Rightarrow) & \dfrac{aa, \Gamma \Rightarrow \Delta, ab}{a\bar{b}, \Gamma \Rightarrow \Delta} & (\Rightarrow -) & \dfrac{\Gamma \Rightarrow \Delta, aa \quad ab, \Gamma \Rightarrow \Delta}{\Gamma \Rightarrow \Delta, a\bar{b}} \\[2mm] (\cap \Rightarrow) & \dfrac{ab, ac, \Gamma \Rightarrow \Delta}{a(b \cap c), \Gamma \Rightarrow \Delta} & (\Rightarrow \cap) & \dfrac{\Gamma \Rightarrow \Delta, ab \quad \Gamma \Rightarrow \Delta, ac}{\Gamma \Rightarrow \Delta, a(b \cap c)} \\[2mm] (\cup \Rightarrow) & \dfrac{ab, \Gamma \Rightarrow \Delta \quad ac, \Gamma \Rightarrow \Delta}{a(b \cup c), \Gamma \Rightarrow \Delta} & (\Rightarrow \cup) & \dfrac{\Gamma \Rightarrow \Delta, ab, ac}{\Gamma \Rightarrow \Delta, a(b \cup c)} \end{array}$$

Although their structure is similar to that of the rules provided for predicates in the last section, their addition raises important problems. One is of a more general nature and well known: definitions of term-forming operations in ontology are creative. Although this was intended in the original architecture of Leśniewski's systems, in the modern approach it is not welcome. Iwanuś [7] has shown that the problem can be overcome by enriching elementary ontology with two axioms corresponding to special versions of the comprehension axiom, but this opens the problem of the derivability of these axioms in GO enriched with the special rules.

There is also a specific problem with cut elimination for GO with complex terms and the suitable rules added. Even if the rules are reductive (and the rules stated above are, as the reader can check), we run into a problem with the quantifier rules. If unrestricted instantiation of terms is admitted in $(\Rightarrow\exists)$, $(\forall\Rightarrow)$, the subformula property is lost. One can find solutions to this problem, for example by using two separate measures of complexity for formula-makers and term-makers (see e.g. [3]), or by restricting in some way the instantiation of terms in the respective quantifier rules (see e.g. [4]). The examination of these possibilities is left for further study.

The last open problem deserving careful study is the possibility of applying the provided sequent calculus to automated proof search and to obtaining semi-decision procedures (or decision procedures for quantifier-free subsystems). In particular, due to the modularity of the provided rules, one could in this way obtain decision procedures for several quantifier-free subsystems investigated by Pietruszczak [15], or by Ishimoto and Kobayashi [6].

**Acknowledgements.** The research is supported by the National Science Centre, Poland (grant number: DEC-2017/25/B/HS1/01268). I am also greatly indebted to Nils Kürbis for his valuable comments.

# **References**


# **Bayesian Ranking for Strategy Scheduling in Automated Theorem Provers**

Chaitanya Mangla, Sean B. Holden, and Lawrence C. Paulson

Computer Laboratory, University of Cambridge, Cambridge, England {cm772,sbh11,lp15}@cl.cam.ac.uk

**Abstract.** A *strategy schedule* allocates time to proof strategies that are used in sequence in a theorem prover. We employ Bayesian statistics to propose alternative sequences for the strategy schedule in each proof attempt. Tested on the TPTP problem library, our method yields a time saving of more than 50%. By extending this method to optimize the fixed time allocations to each strategy, we obtain a notable increase in the number of theorems proved.

**Keywords:** Bayesian machine learning · Strategy scheduling · Automated theorem proving

### **1 Introduction**

Theorem provers have wide-ranging applications, including formal verification of large mathematical proofs [9] and reasoning in knowledge-bases [37]. Thus, improvements in provers that lead to more successful proofs, and savings in the time taken to discover proofs, are desirable.

Automated theorem provers generate proofs by utilizing inference procedures in combination with heuristic search. A specific configuration of a prover, which may be specialized for a certain class of problems, is termed a *strategy*. Provers such as E [27] can select from a portfolio of strategies to solve the goal theorem. Furthermore, certain provers hedge their allocated proof time across a number of proof strategies by use of a *strategy schedule*, which specifies a time allocation for each strategy and the sequence in which they are used until one proves the goal theorem. This method was pioneered in the Gandalf prover [33].
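The scheduling mechanism described above can be sketched as follows; the strategy and timeout interface here is hypothetical, purely for illustration, and does not reflect the actual APIs of E or Gandalf:

```python
# A minimal sketch of sequential strategy scheduling: each strategy
# receives its fixed time allocation and is tried in sequence until
# one proves the goal. Strategies and goals here are toy placeholders.

def run_schedule(strategies, allocations, goal):
    """Try (strategy, seconds) pairs in order; return the first proof found."""
    for strategy, seconds in zip(strategies, allocations):
        proof = strategy(goal, timeout=seconds)
        if proof is not None:
            return proof
    return None  # schedule exhausted without a proof

def make_strategy(specialty):
    """Toy strategy that 'proves' only goals mentioning its specialty."""
    def strategy(goal, timeout):
        return f"proof:{goal}" if specialty in goal else None
    return strategy

schedule = [make_strategy("horn"), make_strategy("equality")]
allocations = [2.0, 8.0]  # seconds per strategy, fixed in advance
assert run_schedule(schedule, allocations, "equality-goal") == "proof:equality-goal"
assert run_schedule(schedule, allocations, "modal-goal") is None
```

The design question the paper addresses is which order (and, later, which allocations) to pass to such a runner for a given proof goal.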

Prediction of the effectiveness of a strategy prior to a proof attempt is usually intractable or undecidable [12]. A practical implementation must infer such a prediction by tractable approximations. Therefore, machine learning methods for strategy invention, selection and scheduling are actively researched. Machine learning methods for strategy selection conditioned on the proof goal have shown promising results [3]. Good results have also been reported for strategy synthesis using machine learning [1]. Work on machine learning for algorithm portfolios—which allocate resources to multiple solvers simultaneously—is also relevant to strategy scheduling because of its similar goals. For this purpose, Silverthorn and Miikkulainen propose latent class models [31].

In this work, we present a method for generating strategy schedules using Bayesian learning, with two primary goals: to reduce proving time or to prove more theorems. We have evaluated this method for both purposes using iLeanCoP, an intuitionistic first-order logic prover with a compact implementation and good performance [18]. Intuitionistic logic is a non-standard form of first-order logic, of which relatively little is known with regard to automation. It is of interest in theoretical computer science and the philosophy of mathematics [7]. Among intuitionistic provers, iLeanCoP is seen as impressive and is able to prove a sufficient number of theorems in our benchmarks for significance testing. Its core is implemented in around thirty lines of Prolog; such simplicity adds clarity to the interpretation of our results. Our method was benchmarked on the Thousands of Problems for Theorem Provers (TPTP) problem library [32], on which we save more than 50% of proof time when aiming for the former goal. Towards the latter goal, we are able to prove notably more theorems.

Our two primary, complementary contributions are: first, a Bayesian machine learning model for strategy scheduling; and second, engineered features for use in that model. The text below is organized as follows. In Sect. 2, we introduce preliminary material used subsequently to construct a machine learning model for strategy scheduling, described in Sects. 3–7. The data used to train and evaluate this model are described in Sect. 8, followed by experiments, results and conclusions in Sects. 9–12.

#### **2 Distribution of Permutations**

We model a strategy schedule using a vector of strategies, and thus all schedules are *permutations* of the same.

**Definition 1 (Permutation).** *Let* $M \in \mathbb{N}$*. A permutation* $\pi \in \mathbb{N}^M$ *is a vector of indices, with* $\pi_i \in \{1,\dots,M\}$ *and* $\forall i \neq j : \pi_i \neq \pi_j$*, representing a reordering of the components of an* $M$*-dimensional vector* $s$ *to* $[s_{\pi_1}, s_{\pi_2}, \dots, s_{\pi_M}]^\top$*.*

In this text, vector-valued variables, such as $\boldsymbol{\pi}$ above, are set in boldface; their indexed components are not, as in $\pi\_1$. For probabilistic modelling of schedules represented as permutations, we use the Plackett-Luce model [14,21] to define a parametric probability distribution over permutations.

**Definition 2 (Plackett-Luce distribution).** *The Plackett-Luce distribution* $\mathrm{Perm}(\boldsymbol{\lambda})$ *with parameter* $\boldsymbol{\lambda} \in \mathbb{R}^M\_{>0}$ *has support over permutations of the indices* $\{1,\dots,M\}$*. For a permutation* $\varPi$ *distributed as* $\mathrm{Perm}(\boldsymbol{\lambda})$*,*

$$\Pr(\varPi = \pi; \lambda) = \prod\_{j=1}^{M} \frac{\lambda\_{\pi\_j}}{\sum\_{u=j}^{M} \lambda\_{\pi\_u}}.$$
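This probability mass function translates directly into code. A minimal Python sketch (the function and variable names are ours, independent of any prover implementation):

```python
from itertools import permutations

def plackett_luce_pmf(pi, lam):
    """Pr(Pi = pi; lam) for a 1-indexed permutation pi and positive weights lam."""
    p = 1.0
    for j in range(len(pi)):
        # numerator: weight of the item placed at position j;
        # denominator: weights of all items not yet placed
        p *= lam[pi[j] - 1] / sum(lam[pi[u] - 1] for u in range(j, len(pi)))
    return p

# the probabilities of all M! permutations sum to one
total = sum(plackett_luce_pmf(list(p), [1.0, 9.0, 2.0])
            for p in permutations([1, 2, 3]))
```

The inner sum runs over the suffix of the permutation, mirroring the index $u = j, \dots, M$ in the formula above.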

In later sections, we use the parameter $\boldsymbol{\lambda}$ to assign an abstract 'score' to strategies when modelling distributions over schedules. This score is particularly useful due to the following theorem.

**Theorem 1.** *Let* $\boldsymbol{\pi}^\*$ *be a mode of the distribution* $\mathrm{Perm}(\boldsymbol{\lambda})$*, that is,*

$$
\pi^\* = \underset{\pi}{\text{argmax}} \, \text{Pr}(\pi; \lambda).
$$

*Then,* $\lambda\_{\pi\_1^\*} \geq \lambda\_{\pi\_2^\*} \geq \lambda\_{\pi\_3^\*} \geq \dots \geq \lambda\_{\pi\_M^\*}$*.*

Thus, if $\boldsymbol{\lambda}$ is a vector containing the score of each strategy, the highest-probability permutation indexes the strategies in decreasing order of score; it can therefore be obtained efficiently by sorting the indices of $\boldsymbol{\lambda}$ with respect to their corresponding values in decreasing order. Cao et al. [4] have presented a proof of Theorem 1, and Cheng et al. [5] have discussed some further details.
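Theorem 1 can be checked computationally on a small instance: the permutation found by sorting the weights in decreasing order matches the one found by exhaustive search over the factorial space (a sketch; all helper names are ours):

```python
from itertools import permutations

def pmf(pi, lam):
    """Plackett-Luce probability of a 1-indexed permutation pi."""
    p = 1.0
    for j in range(len(pi)):
        p *= lam[pi[j] - 1] / sum(lam[pi[u] - 1] for u in range(j, len(pi)))
    return p

def mode_by_sorting(lam):
    """Theorem 1: the mode sorts indices by decreasing weight (1-indexed)."""
    return tuple(sorted(range(1, len(lam) + 1), key=lambda i: -lam[i - 1]))

def mode_by_brute_force(lam):
    """Factorial-time search over all permutations, for comparison."""
    return max(permutations(range(1, len(lam) + 1)), key=lambda p: pmf(p, lam))

lam = [0.5, 3.0, 1.0, 2.0]
```

For distinct weights the mode is unique, so both functions agree.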

*Example 1.* Let $\boldsymbol{\lambda} = [1, 9]^\mathsf{T}$, $\boldsymbol{\pi}^{(1)} = [1, 2]^\mathsf{T}$ and $\boldsymbol{\pi}^{(2)} = [2, 1]^\mathsf{T}$. Then,

$$\Pr(\varPi = \pi^{(1)}; \lambda) = \frac{\lambda\_{\pi\_1^{(1)}}}{\lambda\_{\pi\_1^{(1)}} + \lambda\_{\pi\_2^{(1)}}} \cdot \frac{\lambda\_{\pi\_2^{(1)}}}{\lambda\_{\pi\_2^{(1)}}} = \frac{1}{1 + 9} \cdot \frac{9}{9} = \frac{1}{10}.$$

Similarly, $\Pr(\varPi = \boldsymbol{\pi}^{(2)}; \boldsymbol{\lambda}) = 9/10$.

**Theorem 2.** $\mathrm{Perm}(c\boldsymbol{\lambda}) = \mathrm{Perm}(\boldsymbol{\lambda})$*, for any scalar constant* $c > 0$*.*

In other words, the Plackett-Luce distribution is invariant to the scale of its parameter vector.

**Lemma 1.** $\mathrm{Perm}(\exp(\boldsymbol{\lambda} + c)) = \mathrm{Perm}(\exp(\boldsymbol{\lambda}))$*, for any scalar constant* $c \in \mathbb{R}$*.*

Lemma 1 follows from Theorem 2 and shows that the same distribution is translation invariant if the parameter is exponentiated. Cao et al. [4] give proofs of both.

#### **3 A Maximum Likelihood Model**

We model a strategy schedule as a ranking of known strategies, where each strategy comprises a parameter setting and a time allocation. A ranking here is a permutation of strategies, with each strategy retaining its time allocation irrespective of the ordering. In this section, we construct a model for inference of such permutations that is linear in the parameters.

Suppose we have a repository of $N$ theorems, which we test against each of our $M$ known strategies to build a dataset $\mathcal{D} = \{(\boldsymbol{\pi}^{(i)}, \boldsymbol{x}^{(i)})\}\_{i=1}^{N}$, where $\boldsymbol{\pi}^{(i)}$ is a desirable ordering of strategies for theorem $i$ and $\boldsymbol{x}^{(i)}$ is a feature-vector representation of the theorem. In Sect. 9, we detail how we instantiated $\mathcal{D}$ for our experiments, which may serve as an example for other implementations. We assume that $\boldsymbol{\pi}^{(i)}$ has a Plackett-Luce distribution conditioned on $\boldsymbol{x}^{(i)}$ such that

$$\Pr(\pi; x, \omega) = \text{Perm}(\varLambda(x, \omega)), \tag{1}$$

where $\boldsymbol{\omega}$ is a parameter the model must learn and $\boldsymbol{\varLambda}(\cdot)$ is a vector-valued function with range $\mathbb{R}^M\_{>0}$. We use the notation $\varLambda(\cdot)\_j$ to index into the value of $\boldsymbol{\varLambda}(\cdot)$. We represent our prover strategies with feature vectors $\{\boldsymbol{d}^{(j)}\}\_{j=1}^{M}$. To calculate the score of strategy $j$ using $\varLambda(\cdot)\_j$, we specify

$$\varLambda(\boldsymbol{x}^{(i)}, \boldsymbol{\omega})\_j = \exp\left(\phi(\boldsymbol{x}^{(i)}, \boldsymbol{d}^{(j)})^\mathsf{T}\boldsymbol{\omega}\right) \tag{2}$$

to ensure that the scores are positive-valued, where $\phi$ is a suitable basis expansion function. Assuming the data are i.i.d., the likelihood of the parameter vector is given by

$$\mathcal{L}(\omega) = p(\mathcal{D}; \omega) = \prod\_{i=1}^{N} \Pr(\pi^{(i)}; \varLambda(x^{(i)}, \omega)).\tag{3}$$

An $\hat{\boldsymbol{\omega}}$ that maximizes this likelihood can then be used to forecast the distribution over permutations for a new theorem $\boldsymbol{x}^\*$ by evaluating $\mathrm{Perm}(\boldsymbol{\varLambda}(\boldsymbol{x}^\*, \hat{\boldsymbol{\omega}}))$ for all permutations. This would incur factorial complexity; however, we are often interested only in the most likely permutation, which can be retrieved in polynomial time. Specifically, for strategy scheduling, the permutation with the highest predicted probability should reflect the orderings in the data. For this purpose, we use Theorem 1 to find the highest-probability permutation $\boldsymbol{\pi}^\*$ by sorting the values of $\{\varLambda(\boldsymbol{x}^\*, \hat{\boldsymbol{\omega}})\_j\}\_{j=1}^{M}$ in descending order.

*Remark 1.* A method named ListNet, designed to rank documents for search queries using the Plackett-Luce distribution, is evaluated by Cao et al. [4]. Their evaluation uses a linear basis expansion. We can derive a similar construction in our model by setting

$$\phi(\boldsymbol{x}^{(i)}, \boldsymbol{d}^{(j)}) = [\boldsymbol{x}^{(i)\mathsf{T}}, \boldsymbol{d}^{(j)\mathsf{T}}]^\mathsf{T}.\tag{4}$$

*Remark 2.* The likelihood in Equation (3) can be maximized by minimizing the negative log-likelihood $\ell(\boldsymbol{\omega}) = -\log \mathcal{L}(\boldsymbol{\omega})$, which (as shown by Schäfer and Hüllermeier [26]) is convex and can therefore be minimized using gradient-based methods. The minima may, however, be unidentifiable due to translation invariance, as demonstrated by Lemma 1. This problem is eliminated in our Bayesian model by the use of a Gaussian prior, as explained in Sect. 4.

*Example 2.* Let there be $N = 2$ theorems and $M = 2$ strategies. Let the theorems and strategies be characterized by univariate values such that $x^{(1)} = 1$, $x^{(2)} = 2$, $d^{(1)} = 1$ and $d^{(2)} = 2$.

Suppose strategy $d^{(1)}$ is ideal for theorem $x^{(1)}$ and strategy $d^{(2)}$ for $x^{(2)}$, as shown below, where a + marks the preferred strategy.

|           | $d^{(1)}$ | $d^{(2)}$ |
|-----------|:---------:|:---------:|
| $x^{(1)}$ | + | − |
| $x^{(2)}$ | − | + |

This is evidently a parity problem [34], and hence cannot be modelled by a simple linear expansion using the basis function mentioned in Remark 1. A solution in this instance is to use

$$
\phi(x^{(i)}, d^{(j)}) = x^{(i)} \cdot d^{(j)}.
$$

The parameter $\omega$ is then one-dimensional, and the required training data take the form $\mathcal{D} = \{([1, 2]^\mathsf{T}, 1), ([2, 1]^\mathsf{T}, 2)\}$. We find that $\mathcal{L}(\omega)$ has a unique maximum, at $\hat{\omega} \approx 0.42$, as shown in Fig. 1.

**Fig. 1.** The likelihood function in Example 2.
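The maximizer can be checked numerically. A minimal sketch of the likelihood of Example 2, assuming the setup above ($\phi(x, d) = x \cdot d$, so $\lambda\_j = \exp(x\,d\_j\,\omega)$); the grid search is purely illustrative:

```python
import math

def likelihood(omega):
    """L(omega) for Example 2: strategies d = [1, 2], basis phi(x, d) = x * d."""
    strategies = [1.0, 2.0]
    data = [([1, 2], 1.0), ([2, 1], 2.0)]  # (preferred ordering, theorem feature)
    L = 1.0
    for pi, x in data:
        lam = [math.exp(x * d * omega) for d in strategies]
        for j in range(len(pi)):  # Plackett-Luce probability of the ordering
            L *= lam[pi[j] - 1] / sum(lam[pi[u] - 1] for u in range(j, len(pi)))
    return L

# coarse grid search for the maximum-likelihood estimate
omega_hat = max((i / 1000 for i in range(-2000, 2001)), key=likelihood)
```

Setting the derivative of $\log \mathcal{L}$ to zero gives $e^{3\omega} - e^{\omega} - 2 = 0$ in disguise, whose root is near $\omega \approx 0.42$, matching the figure.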

#### **4 Bayesian Inference**

We place a Gaussian prior distribution on the parameter *ω* of the model described in Sect. 3. This has two advantages: first, the posterior mode is identifiable, as noted by Johnson et al. [11] and demonstrated in Example 3 on page 7; second, the parameter is regularized. With this prior specified as the normal distribution

$$
\omega \sim \mathcal{N}(m\_0, \mathbf{S\_0}), \tag{5}
$$

and assuming $\boldsymbol{\pi}$ is independent of $\mathcal{D}$ given $(\boldsymbol{x}, \boldsymbol{\omega})$, the posterior predictive distribution is

$$p(\boldsymbol{\pi}|\boldsymbol{x}^\*, \mathcal{D}) = \int p(\boldsymbol{\pi}|\boldsymbol{x}^\*, \boldsymbol{\omega}) p(\boldsymbol{\omega}|\mathcal{D}) d\boldsymbol{\omega},$$

which may be approximated by sampling from the posterior,

$$
\omega^s \sim p(\omega|\mathcal{D}),
\tag{6}
$$

to obtain

$$p(\boldsymbol{\pi}|\boldsymbol{x}^\*, \mathcal{D}) \approx \frac{1}{S} \sum\_{s=1}^S p(\boldsymbol{\pi}|\boldsymbol{x}^\*, \boldsymbol{\omega}^s). \tag{7}$$

Given a new theorem $\boldsymbol{x}^\*$, finding the permutation of strategies with the highest probability of success using the approximation above would require evaluating it for every permutation $\boldsymbol{\pi}$, incurring factorial complexity. We instead make a Bayes point approximation [16] using the mean of the samples, such that

$$\begin{aligned}p(\boldsymbol{\pi}\,|\,\boldsymbol{x}^\*,\mathcal{D}) &\approx p(\boldsymbol{\pi}\,|\,\boldsymbol{x}^\*,\langle\boldsymbol{\omega}^s\rangle) && \text{using Eq. (7)}\\ &= \Pr(\boldsymbol{\pi}\,|\,\boldsymbol{\varLambda}(\boldsymbol{x}^\*,\langle\boldsymbol{\omega}^s\rangle)) && \text{using Eq. (1),}\end{aligned}$$

where $\langle\cdot\rangle$ denotes the sample mean. The mean of the Plackett-Luce parameter has been used for Bayesian inference in prior work [8], with good results. With this approximation, the highest-probability permutation can be obtained via Theorem 1, incurring only the cost of sorting the items. The saving is substantial when generating a strategy schedule because it reduces prediction time, which is important for the following reason.

#### **Algorithm 1.** Metropolis-Hastings Algorithm

Suppose we have generated samples $\{\boldsymbol{\omega}^{(1)}, \dots, \boldsymbol{\omega}^{(i)}\}$ from the *target distribution* $p$. Generate $\boldsymbol{\omega}^{(i+1)}$ as follows.

1: Generate a candidate value $\dot{\boldsymbol{\omega}} \sim q(\,\cdot \mid \boldsymbol{\omega}^{(i)})$, where $q$ is the *proposal distribution*.

2: Evaluate $r \equiv r(\boldsymbol{\omega}^{(i)}, \dot{\boldsymbol{\omega}})$, where

$$r(x,y) = \min\left\{\frac{p(y)}{p(x)}\frac{q(x|y)}{q(y|x)}, 1\right\}.$$

3: Set

$$\boldsymbol{\omega}^{(i+1)} = \begin{cases} \dot{\boldsymbol{\omega}} & \text{with probability } r\\ \boldsymbol{\omega}^{(i)} & \text{with probability } 1 - r. \end{cases}$$
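A minimal random-walk sketch of Algorithm 1 in Python, for a scalar parameter; with a symmetric Gaussian proposal the $q(x|y)/q(y|x)$ factor in $r$ cancels. The names and the toy target are ours:

```python
import math
import random

def metropolis_hastings(log_p, omega0, n_samples, step):
    """Random-walk Metropolis-Hastings. log_p is the log of the unnormalized
    target density; the proposal is omega' ~ N(omega, step^2), so the
    proposal ratio in r cancels by symmetry."""
    omega = omega0
    samples = []
    for _ in range(n_samples):
        candidate = omega + random.gauss(0.0, step)
        log_r = min(log_p(candidate) - log_p(omega), 0.0)
        if random.random() < math.exp(log_r):  # accept with probability r
            omega = candidate
        samples.append(omega)
    return samples

# illustration: target a standard normal, log density -w^2/2 up to a constant
random.seed(0)
draws = metropolis_hastings(lambda w: -0.5 * w * w, 0.0, 20000, 1.0)
mean = sum(draws) / len(draws)
var = sum((w - mean) ** 2 for w in draws) / len(draws)
```

Working in log space avoids underflow when the likelihood is a long product, as in Eq. (3).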

*Remark 3.* While benchmarking, and in typical use, a prover is allocated a fixed amount of time for a proof attempt, and any time taken to predict a strategy schedule must be accounted for within this allocation. Time spent on this prediction is time taken away from the proof search itself. It is therefore essential to minimize schedule prediction time, and particularly wise to favour a saving in prediction time even at the cost of model optimization and training time.

*Remark 4.* In our implementation we set $\boldsymbol{m}\_0 = \boldsymbol{0}$, which has the effect of prioritizing smaller weights $\boldsymbol{\omega}$ in the posterior. Furthermore, we set $\boldsymbol{S}\_0 = \eta\boldsymbol{I}$, $\eta \in \mathbb{R}$, where $\boldsymbol{I}$ is the identity matrix. Consequently, the hyperparameter $\eta$ controls the strength of the prior, since the entropy of the Gaussian prior scales linearly with $\log|\boldsymbol{S}\_0|$.

*Remark 5.* A specialization of the Plackett-Luce distribution using the Thurstonian interpretation admits a Gamma-distribution conjugate prior [8]. That prior, however, is unavailable to our model when parametrized as shown in Eq. (1).

#### **5 Sampling**

We use the Markov chain Monte Carlo (MCMC) Metropolis-Hastings algorithm [38] to generate samples from the posterior distribution. In MCMC sampling, one constructs a Markov chain whose stationary distribution matches the target distribution $p$. For the Metropolis-Hastings algorithm, stated in Algorithm 1, this chain is constructed using a proposal distribution $y|x \sim q$, where $q$ is a distribution that can be conveniently sampled from.

Note that when calculating $r$ in Algorithm 1, the normalization constant of the target density $p$ cancels out. This is to our advantage: to generate samples $\boldsymbol{\omega}^s$ from the posterior, which is, by Eq. (3) and Eq. (5),

$$\begin{split}p(\omega|\mathcal{D}) &\propto p(\mathcal{D}|\omega)p(\omega) \\ &= \mathcal{L}(\omega)\mathcal{N}(m\_{\mathbf{0}}, \mathbf{S\_{0}}),\end{split} \tag{8}$$

the posterior only needs to be computed in this unnormalized form.

In this work, we choose a random walk proposal of the form

$$q(\omega'|\omega) = \mathcal{N}(\omega'|\omega, \Sigma\_q),\tag{9}$$

and tune $\Sigma\_q$ for efficient sampling. We start the simulation at a local mode $\hat{\boldsymbol{\omega}}$ and set $\mathcal{N}(\hat{\boldsymbol{\omega}}, \Sigma\_q)$ to approximate the local curvature of the posterior at that point, using methods by Rossi [25]. Specifically, our procedure for computing $\Sigma\_q$ is as follows.

1. First, writing the posterior from Eq. (8) as

$$p(\omega|\mathcal{D}) = \frac{1}{Z}e^{-E(\omega)},$$

where Z is the normalization constant, we have

$$E(\omega) = -\log \mathcal{L}(\omega) - \log \mathcal{N}(m\_\mathbf{0}, \mathbf{S\_0}).\tag{10}$$

We find a local mode *ω*ˆ by optimizing E(*ω*) using a gradient-based method.

2. Then, using a Laplace approximation [2], we approximate the posterior in the locality of this mode by

$$\mathcal{N}(\hat{\boldsymbol{\omega}}, H^{-1}), \quad \text{where } H = \nabla \nabla E(\boldsymbol{\omega})|\_{\hat{\boldsymbol{\omega}}}$$

is the Hessian matrix of $E(\boldsymbol{\omega})$ evaluated at that local mode.

3. Finally, we set

$$
\Sigma\_q = s^2 \, H^{-1}
$$

in Eq. (9), where s is used to tune all the length scales. We set this value to s<sup>2</sup> = 2.38 based on the results by Roberts and Rosenthal [24].
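In the one-dimensional case, steps 1–3 reduce to estimating the curvature of $E$ at the mode. A sketch with a finite-difference Hessian (the helper name and the finite-difference scheme are ours; the scaling $s^2 = 2.38$ follows the text):

```python
def proposal_variance(E, omega_hat, s2=2.38, h=1e-4):
    """Sigma_q = s^2 * H^{-1} for a scalar parameter: estimate the curvature
    H of the negative log posterior E at the mode via central differences."""
    H = (E(omega_hat + h) - 2.0 * E(omega_hat) + E(omega_hat - h)) / (h * h)
    return s2 / H

# sanity check on a quadratic E with unit curvature (H = 1)
sigma2_q = proposal_variance(lambda w: 0.5 * w * w, 0.0)
```

In higher dimensions the same idea applies with a Hessian matrix and a matrix inverse in place of the scalar division.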

*Remark 6.* When calculating $r$ in Algorithm 1 during sampling, we evaluate the unnormalized posterior at any point $\boldsymbol{\omega}^s$ from Equation (10) as $\exp(-E(\boldsymbol{\omega}^s))$; this is therefore the only form in which the posterior needs to be coded in the implementation.

*Example 3 (Gaussian Prior).* To demonstrate the effect of using a Gaussian prior, we build upon Example 2, with the data taking the form

$$\mathcal{D} = \{([1, 2]^\mathsf{T}, 1), ([2, 1]^\mathsf{T}, 2)\}.$$

We perform basis expansion as explained in Sect. 6 with prior parameter η = 1.0, kernel σ = 0.1 and ς = 2 centres. Thus, the model parameter is

$$
\omega = [\omega\_1, \omega\_2]^\mathsf{T}, \quad \omega \in \mathbb{R}^2.
$$

The unnormalized negative log posterior $E(\omega\_1, \omega\_2)$, as defined in Eq. (10), is shown in Fig. 2b, and the negative log-likelihood $\ell(\omega\_1, \omega\_2) = -\log \mathcal{L}(\omega\_1, \omega\_2)$, as mentioned in Remark 2, is shown in Fig. 2a. Note the contrast in the shape of the two surfaces. The minimum in Fig. 2a lies along the flat top-right region, leading to an unidentifiable point estimate, whereas in Fig. 2b the minimum lies in a narrow region near the centre. The Gaussian prior has, informally speaking, lifted the surface up, with an effect that increases in proportion to the distance from the origin.

**Fig. 2.** Comparison of the shape of the likelihood and the posterior functions.

#### **6 Basis Expansion**

Example 2 shows how the linear expansion in Remark 1 is ineffective even on very simple problem instances. The maximum likelihood bilinear model presented by Schäfer and Hüllermeier [26] is related to our model defined in Sect. 3, with the basis performing the Kronecker (tensor) product $\phi(\boldsymbol{x}, \boldsymbol{d}) = \boldsymbol{x} \otimes \boldsymbol{d}$. Their results show that such an expansion produces a competitive model, but one that falls behind their non-linear model.

To model non-linear interactions between theorems and strategies, we use a *Gaussian kernel* for the basis expansion.

**Definition 3 (Gaussian Kernel).** *A Gaussian kernel* <sup>κ</sup> *is defined by*

$$\kappa(\mathbf{y}, \mathbf{z}) = \exp\left(-\frac{||\mathbf{y} - \mathbf{z}||^2}{2\sigma^2}\right), \quad \text{for } \sigma > 0.$$

The Gaussian kernel $\kappa(\boldsymbol{y}, \boldsymbol{z})$ effectively represents the inner product of $\boldsymbol{y}$ and $\boldsymbol{z}$ in a Hilbert space whose bandwidth is controlled by $\sigma$. Smaller values of $\sigma$ correspond to a higher-bandwidth, more flexible inner-product space; larger values of $\sigma$ reduce the kernel to a constant function, as detailed in [30]. For our ranking model, we must tune $\sigma$ to balance over-fitting against under-performance.

We use the Gaussian kernel for basis expansion by setting

$$\phi(x,d) = \left[\kappa\left([x^{\mathsf{T}},d^{\mathsf{T}}]^{\mathsf{T}},c^{(1)}\right),\ldots,\kappa\left([x^{\mathsf{T}},d^{\mathsf{T}}]^{\mathsf{T}},c^{(C)}\right)\right]^{\mathsf{T}},$$

where $\{\boldsymbol{c}^{(i)}\}\_{i=1}^{C}$ is a collection of *centres*. By choosing centres that are themselves composed of theorems $\boldsymbol{x}^{(\cdot)}$ and strategies $\boldsymbol{d}^{(\cdot)}$, such that $\boldsymbol{c}^{(\cdot)} = [\boldsymbol{x}^{(\cdot)\mathsf{T}}, \boldsymbol{d}^{(\cdot)\mathsf{T}}]^\mathsf{T}$, the basis expansion above represents each data item by a non-linear inner product against other known items.
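This basis expansion can be sketched directly (the helper names are ours):

```python
import math

def gaussian_kernel(y, z, sigma):
    """kappa(y, z) = exp(-||y - z||^2 / (2 sigma^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(y, z))
    return math.exp(-sq / (2.0 * sigma ** 2))

def basis_expansion(x, d, centres, sigma):
    """phi(x, d): concatenate theorem and strategy features, then take the
    kernel against each centre c^(1), ..., c^(C)."""
    v = list(x) + list(d)
    return [gaussian_kernel(v, c, sigma) for c in centres]

# toy centres built from (theorem, strategy) feature pairs
centres = [[1.0, 0.0, 1.0], [0.0, 1.0, 2.0]]
features = basis_expansion([1.0, 0.0], [1.0], centres, sigma=1.0)
```

A pair that coincides with a centre maps to a feature value of exactly 1 there; more distant pairs decay towards 0.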

To find the relevant subset of D from which centres should be formed, we follow the method described in the steps below.


$$
\Gamma\_{i,j} = \phi(\mathbf{c}^{(i)})\_j = \kappa(\mathbf{c}^{(i)}, \mathbf{c}^{(j)}) .
$$


This method is inspired by the procedure used in Relevance Vector Machines [35] for a similar purpose.

*Remark 7 (score).* For a strategy that succeeds in proving a theorem, the score for the pair is the fraction of the time allocation left unconsumed by the prover. For an unsuccessful strategy-theorem combination, we set the score to a value close to zero.
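Remark 7 can be made concrete as follows; the function name and the epsilon placeholder for "a value close to zero" are our assumptions, not taken from the paper:

```python
def score(proved, time_used, allocation, eps=1e-3):
    """Fraction of the time allocation left unconsumed on success;
    a value close to zero (eps, assumed here) on failure."""
    if proved:
        return (allocation - time_used) / allocation
    return eps

s_ok = score(True, 150.0, 600.0)    # proved in 150 of 600 s: 75% left unused
s_bad = score(False, 600.0, 600.0)  # failed strategy-theorem pair
```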

*Remark 8 (ς).* The parameter $\varsigma$ is another tunable parameter which, in similar fashion to the parameter $\sigma$ earlier in this section, controls the model complexity introduced by the basis expansion. The two must be tuned together.

#### **7 Model Selection and Time Allocations**

From Remark 8, $\varsigma$ and $\sigma$ are hyperparameters that control the complexity introduced into our model through the Gaussian basis expansion, and Remark 4 introduces $\eta$, the hyperparameter that controls the strength of the prior. The final model is selected by tuning them. Tuning must aim to avoid overfitting to the training data and to maximize, during testing, either the savings in proof-search time or the number of theorems proved. However, we have no closed-form expression relating these hyperparameters to this aim, so any combination of them can be judged only by testing it.

In this work we have used *Bayesian optimization* [29] to optimize these hyperparameters. Bayesian optimization is a black-box parameter optimization method that attempts to search for a global optimum within the scope of a set resource budget. It models the optimization target as a user-specified *objective function*, which maps from the parameter space to a loss metric. This model of the objective function is constructed using *Gaussian Process* (GP) regression [22], using data generated by repeatedly testing the objective function.

Our specified objective function maps from the hyperparameters (ς, σ, η) to a loss metric ξ. We use cross-validation within the training data while calculating ξ to penalize hyperparameters that over-fit. Hyperparameters are tuned at training time only, after which they are fixed for subsequent testing. The final test set is never used for any hyperparameter optimization.

In the method presented thus far we are only permuting strategies with fixed time allocations to build a sequence for a strategy schedule. In this setting, the number of theorems proved cannot change, but the time taken to prove theorems can be reduced. Therefore, with this aim, a useful metric for ξ is the total time taken by the theorem prover to prove the theorems in the cross-validation test set.

However, we can take further advantage of the hyperparameter tuning phase to additionally tune the time allocated to each strategy, by treating these times as hyperparameters. For each strategy $\boldsymbol{d}^{(i)}$ we create a hyperparameter $\nu^{(i)} \in (0, 1)$ which sets the proportion of the proof time allocated to that strategy. We can then optimize our model to maximize the number of theorems proved; a count of the theorems left unproved is then a viable metric for $\xi$. Note that once the $\nu^{(\cdot)}$ are set, the time allocation for $\boldsymbol{d}^{(i)}$ is fixed to $\nu^{(i)}$, irrespective of its position in the strategy schedule.

*Remark 9.* Our results include two types of experiment:


# **8 Training Data and Feature Extraction**

Our chosen theorem prover, iLeanCoP, ships with a fixed strategy schedule consisting of five strategies. It splits the allocated proof time across the first four strategies as 2%, 60%, 20% and 10%. However, only the first strategy is complete and is therefore usually expected to use its entire time allocation. The remaining strategies are incomplete and may exit early on failure. The fifth and final strategy, which we refer to as the fallback strategy, is therefore allocated all the remaining time.

**Emulating iLeanCoP.** We have constructed a dataset by attempting to prove every theorem in our problem library using each of these strategies individually. With this information, the result of any proof attempt can be calculated by emulating the behaviour of iLeanCoP. This is how we evaluate the predicted schedules: we emulate a proof attempt by iLeanCoP using that schedule for each theorem in the test set. For a faithful emulation of the fallback strategy, it is always attempted last, and therefore any new schedule is only a permutation of the first four strategies. Our experiments allocate a time of 600 s per theorem. The dataset is built to ensure that, within this proof time, any such strategy permutation can be emulated. We kept a timeout of 1200 s per strategy per theorem when building the dataset, which is more than sufficient for the current experiments and gives us headroom for future experiments with longer proof times.
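The emulation logic can be sketched as follows. This is our own reconstruction, not the authors' code; `prove_time` and `fail_time` are hypothetical per-strategy measurements of the kind stored in such a dataset:

```python
def emulate_attempt(schedule, alloc, prove_time, fail_time, total=600.0):
    """Emulate one proof attempt: run strategies in schedule order, each with
    its share of the total proof time. prove_time[j] is the time strategy j
    needs to find a proof (None if it never succeeds); fail_time[j] is when an
    incomplete strategy exits on failure. Returns (proved, elapsed_seconds)."""
    elapsed = 0.0
    for j in schedule:
        budget = min(alloc[j] * total, total - elapsed)
        if prove_time[j] is not None and prove_time[j] <= budget:
            return True, elapsed + prove_time[j]
        elapsed += min(fail_time[j], budget)  # incomplete strategies may exit early
    return False, elapsed

# two-strategy illustration: the first fails after 12 s, the second proves in 10 s
proved, t = emulate_attempt([0, 1], [0.02, 0.60], [None, 10.0], [12.0, 360.0])
```

Because the per-strategy outcomes were recorded with a generous 1200 s timeout, any permutation of the strategies can be replayed this way without re-running the prover.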

**Strategy Features.** Each strategy in iLeanCoP consists of a time allocation and parameter settings; the parameters are described by Otten [19]. We use a one-hot feature encoding for strategies based on the parameter settings, as shown in Table 1. A feature noting the completeness of each strategy is also shown, and a further feature (not shown in the table) records the time allocated to each strategy. Note that the fallback strategy is used in prover emulation but not in schedule prediction.


**Table 1.** Features of the four main strategies.

**Theorem Features.** The TPTP problem library contains a large, comprehensive collection of theorems and is designed for testing automated theorem provers. The problems are taken from a range of domains such as Logic Calculi, Algebra, Software Verification, Biology and Philosophy, and presented in multiple logical forms. For iLeanCoP, we select the subset in first-order form, denoted there as FOF. In version 7.1.0, there are 8157 such problems covering 43 domains. Each problem consists of a set of formulae and a goal theorem. The problems are of varying sizes. For example, the problem named HWV134+1 from the Hardware Verification domain contains 128975 formulae, whilst SET703+4 from the Set Theory domain contains only 12.

We have constructed a dataset containing features extracted from the first-order logic problems in TPTP (see Appendix A). Here, we describe how those features were developed.

In deployment, a prover using our method to generate strategy schedules would have to extract features from the goal theorem at the beginning of a proof attempt. To minimize the computational overhead of feature extraction, in keeping with our goal noted in Remark 3, we use features that can be collected when the theorem is parsed by the prover. The collection of features developed in this work is based on the authors' prior experience, and later we will briefly examine the quality of each feature to discard the uninformative ones. We extract the following features, which are all considered candidates for the subsequent feature selection process.


**Quantifier Alternations:** A count of the number of times the quantifiers flip between existential and universal. When calculated by examining only the sequence of lexical symbols, this count may be inaccurate; an accurate count is obtained by tracking negations during parsing while collecting quantifiers. We extract both as candidates.

**Feature Selection and Pre-processing.** We examine the degree of association between the individual theorem features described above and the speed with which the strategies solve each theorem; for this we use the Maximal Information Coefficient (MIC) measure [23]. For every theorem we calculate the score, as defined in Remark 7, averaged over all strategies. This score is paired with each feature to calculate its MIC. Most lexical symbols achieve an MIC close to zero. We selected the features with relatively high MIC for the presented work, and these are shown in Fig. 3.

The two features based on quantifier alternations are clearly correlated, but both meet the above criterion for selection. Correlations can also be expected between the other features. Furthermore, our features range over different scales. For example, the maximal function arity in TPTP averages 2, whereas the number of predicate symbols averages 2097. It is desirable to remove these correlations to alleviate any burden on the subsequent modelling phase, and to standardize the features to zero mean and unit variance to create a feature space with similar length-scales in all dimensions. The former is achieved by *decorrelation*, the latter by *standardization*, and both together by a *sphering transformation*.

**Fig. 3.** MIC between selected features and scores.

We transform our extracted features accordingly using Zero-phase Component Analysis (ZCA), which ensures that the transformed data stay as close as possible to the original [6].
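A sphering transformation via ZCA can be sketched for two-dimensional features as follows; this is a self-contained illustration using a closed-form 2×2 eigendecomposition, not the authors' implementation, and real feature vectors would need a general-purpose linear algebra routine:

```python
import math
import random

def zca_whiten(points):
    """Sphere 2-D points with ZCA: apply W = C^{-1/2} for the sample
    covariance C, decorrelating and standardizing the centred data while
    keeping it as close as possible to the original."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    xs = [p[0] - mx for p in points]
    ys = [p[1] - my for p in points]
    a = sum(v * v for v in xs) / (n - 1)              # cov[0][0]
    c = sum(v * v for v in ys) / (n - 1)              # cov[1][1]
    b = sum(u * v for u, v in zip(xs, ys)) / (n - 1)  # cov[0][1]
    if abs(b) < 1e-12:                                # already decorrelated
        return [(x / math.sqrt(a), y / math.sqrt(c)) for x, y in zip(xs, ys)]
    half_tr = 0.5 * (a + c)
    gap = math.sqrt(half_tr * half_tr - (a * c - b * b))
    W = [[0.0, 0.0], [0.0, 0.0]]
    for lam in (half_tr + gap, half_tr - gap):        # eigenvalues of C
        v = (b, lam - a)                              # eigenvector for lam
        norm = math.hypot(v[0], v[1])
        v = (v[0] / norm, v[1] / norm)
        for i in range(2):                            # W += v v^T / sqrt(lam)
            for j in range(2):
                W[i][j] += v[i] * v[j] / math.sqrt(lam)
    return [(W[0][0] * x + W[0][1] * y, W[1][0] * x + W[1][1] * y)
            for x, y in zip(xs, ys)]

# correlated synthetic features, then the whitened sample covariance
random.seed(1)
raw = []
for _ in range(500):
    u = random.gauss(0.0, 2.0)
    raw.append((u, 0.8 * u + random.gauss(0.0, 0.5)))
white = zca_whiten(raw)
n1 = len(white) - 1
cov_xx = sum(x * x for x, _ in white) / n1
cov_yy = sum(y * y for _, y in white) / n1
cov_xy = sum(x * y for x, y in white) / n1
```

Because $W$ is the symmetric inverse square root of the sample covariance, the whitened data have identity sample covariance by construction.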

**Coverage.** As mentioned above, we run iLeanCoP on every first-order theorem in TPTP with each strategy allocated 1200 s. Although every theorem of intuitionistic logic also holds in classical logic, the converse does not hold. For that reason, and because of the limitations of iLeanCoP, many theorems remain unproved by any strategy. We exclude these theorems from our experiments, leaving us with a dataset of 2240 theorems.

#### **9 Experiments**

We present two experiments in this work, as noted in Remark 9. In this section, we describe our experimental apparatus in detail.

As noted in Sect. 8, our data contains:


This data needs to be presented to our model for training in the form $\mathcal{D} = \{(\boldsymbol{\pi}^{(i)}, \boldsymbol{x}^{(i)})\}\_{i=1}^{N}$, as described in Sect. 3. Since the two experiments have slightly different goals, we specialize $\mathcal{D}$ for each.

When aiming to predict schedules that minimize the time taken to prove theorems, a natural value for $\boldsymbol{\pi}^{(i)}$ is the index order that sorts strategies by increasing time taken to prove theorem $i$. However, some strategies may fail to prove theorem $i$ within their time allocation. In that case, we consider the failed strategies equally bad and place them last in the ordering $\boldsymbol{\pi}^{(i)}$. Furthermore, we create additional items $(\boldsymbol{\pi}^{(i)}, \boldsymbol{x}^{(i)})$ in $\mathcal{D}$ by permuting the positions of the failed strategies, creating multiple $\boldsymbol{\pi}^{(i)}$ per theorem.

When the goal is only to prove more theorems, the strategies that succeed are all considered equally ranked, above the failed strategies. In this mode, the successful strategies are similarly permuted in the data, in addition to the failed ones.
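Both constructions can be sketched with a single hypothetical helper; `times[j]` stands for the measured proof time of strategy `j` (`None` on failure), and `tie_successes=True` corresponds to the second goal:

```python
from itertools import permutations

def training_orderings(times, tie_successes=False):
    """Desirable 1-indexed strategy orderings for one theorem. Successful
    strategies are sorted by increasing proof time; failed ones are tied
    last. Ties are expanded into every permutation; with tie_successes=True
    the successful strategies are treated as equally ranked as well."""
    M = len(times)
    ok = sorted((j for j in range(M) if times[j] is not None),
                key=lambda j: times[j])
    failed = [j for j in range(M) if times[j] is None]
    heads = permutations(ok) if tie_successes else [tuple(ok)]
    return [[j + 1 for j in head + tail]
            for head in heads for tail in permutations(failed)]

# strategies 1 and 3 succeed (in 5 s and 2 s); strategies 2 and 4 fail
orders = training_orderings([5.0, None, 2.0, None])
```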

In each experiment, a random one-third of the $N$ theorems is separated into a *holdout* test set $\dot{N}$, leaving a training set $\ddot{N}$. This training set is first used for hyperparameter tuning with Bayesian optimization. As explained in Sect. 7, each hyperparameter combination is tested with five-fold cross-validation within $\ddot{N}$, to penalize instances that overfit to $\ddot{N}$. This yields estimated optimal values for the hyperparameters, which are used to configure the model; the model is then trained on $\ddot{N}$ and finally evaluated on $\dot{N}$. The whole process is repeated ten times with new random splits $\dot{N}$ and $\ddot{N}$ to create one set of ten results for that experiment.

### **10 Results**

Each experiment, repeated ten times, is conducted in two phases: first, hyperparameter optimization; and second, model training and evaluation. The bounds on the search space in the first phase were always the same (see Appendix A). The holdout test set contained 747 theorems. A proof time of 600 s was emulated.

#### **10.1 Experiment 1: Optimizing Proof Attempt Time**

The results are shown in Fig. 4. The total prediction time for all 747 theorems, averaged across the trials, is 0.14 s.

The times across proof attempts are not normally distributed, for either the unmodified iLeanCoP schedule or the predicted ones, as confirmed by a Jarque-Bera test. We therefore used the right-tailed Wilcoxon signed-rank test for a pairwise comparison of the time taken for each theorem by the original iLeanCoP schedule versus the predicted schedules, resulting in a p-value of less than $10^{-6}$ in each trial. This confirms the alternative hypothesis that the reduction in the time taken to prove each theorem comes from a distribution with median greater than zero; the time savings are therefore statistically significant. Furthermore, we note from Fig. 4 a saving of more than 50% in total proof time in each trial.

**Fig. 4.** Results of Experiment 1. Proof times are compared with precision $10^{-6}$ s.

#### **10.2 Experiment 2: Proving More Theorems**

We set our hyperparameter search to additionally find time allocations for the strategies. The resulting predicted schedules show both gains and losses when compared to the original schedule, as shown in the four facets of Fig. 5. However, there is a consistent net gain in the number of theorems proved, five theorems on average, as evident from the mean values in (†) and (‡).

**Fig. 5.** Comparison of the proof attempts by the original (orig.) and predicted (pred.) schedules in Experiment 2. Theorems proved by pred. but not by orig. are counted in †, and vice versa in ‡.

#### **11 Related Work**

Prior work on machine learning for *algorithm selection*, such as that introduced by Leyton-Brown et al. [13], is a precursor to our work. In that setting, the machine learning methods must select a good algorithm from a portfolio to solve the given problem instance. Typically, as in the work by Leyton-Brown et al. [13], the learning methods predict the runtime of every algorithm and then pick the fastest predicted one. This line of enquiry has been extended to select algorithms for SMT solvers; a recent example is MachSMT by Scott et al. [28]. The machine learning models in MachSMT are trained by considering all the portfolio members in pairs for each problem in the training set. This method is called *pairwise ranking*, which contrasts with our method, *list-wise ranking*, in which we consider the full list of portfolio members together.

In terms of the machine learning task, the work on scheduling solvers bears greater similarity to our presented work. In MedleySolver, for example, Pimpalkhare et al. [20] frame this task as a multi-armed bandit problem. They predict a sequence of solvers as well as the time allocation for each to generate schedules for the goal problems. MedleySolver is able to solve more problems than any individual solver would on its own.

In an approach that contrasts with ours, Hůla et al. [10] have used Graph Neural Networks (GNNs) for solver scheduling. They build a regression model that predicts, for the given problem, the runtime of every solver; these predictions are then used as the key to sort the solvers in increasing order of predicted runtime, building a schedule. This is an example of *point-wise ranking*. The authors use GNNs to automatically discover features, combining this feature extraction with the training of the regression model. They achieve an increase in the number of problems solved as well as a reduction in total proof time. Meanwhile, our use of manual feature engineering, combined with statistical methods for selection and normalization, has certain advantages. For one, we can analyse our features and derive a subjective interpretation of their efficacy. Additionally, our features impart our domain knowledge onto the model; such knowledge may not be available in the data itself. Manual feature engineering such as ours can be combined with automatic feature extraction to reap the benefits of both.

# **12 Conclusions**

We have presented a method to specialize, for a given goal theorem, the sequence of strategies in the schedule used in each proof attempt. The method trains a Bayesian machine learning model on data generated by benchmarking the prover of interest. When evaluated with the iLeanCoP prover on the TPTP library as a benchmark, our results show a significant reduction in the time taken to prove theorems. For theorems that are successfully proved, the average time saving is above 50%. The prediction time is on average low enough to have a negligible impact on the resources subtracted from the proof search itself.

We also extend this method to optimize time allocations to each strategy. In this setting, our results show a notable increase in the number of theorems proved.

This work shows, by example, that Bayesian machine learning models designed specifically to augment heuristics in theorem provers, with detailed consideration of the computational compromises required in this setting, can deliver substantial improvements.

**Acknowledgments.** Initial investigations for this work were co-supervised by Prof. Mateja Jamnik, Computer Laboratory, University of Cambridge, UK.

This work was supported by: the UK Engineering and Physical Sciences Research Council (EPSRC) through a Doctoral Training studentship, award reference 1788755; and the ERC Advanced Grant ALEXANDRIA (Project GA 742178). For the purpose of open access, the author has applied a Creative Commons Attribution (CC-BY-4.0) licence to any Author Accepted Manuscript version arising.

Computations for this work were performed using resources provided by the Cambridge Service for Data Driven Discovery (CSD3) operated by the University of Cambridge Research Computing Service, provided by Dell EMC and Intel using Tier-2 funding from the Engineering and Physical Sciences Research Council (capital grant EP/P020259/1), and DiRAC funding from the Science and Technology Facilities Council.

#### **A Implementation, Code and Data**

This work is implemented primarily in Matlab [36]. All experiments can be reproduced using the code, data and instructions available at [15]. The hyperparameter search space in all experiments was restricted to ς ∈ [10, 300], σ ∈ [0.01, 100.0], and η ∈ [1, 100].

#### **References**



# A Framework for Approximate Generalization in Quantitative Theories

Temur Kutsia(B) and Cleo Pau

RISC, Johannes Kepler University Linz, Linz, Austria {kutsia,ipau}@risc.jku.at

Abstract. Anti-unification aims at computing generalizations for given terms, retaining their common structure and abstracting differences by variables. We study quantitative anti-unification where the notion of the common structure is relaxed into "proximal" up to the given degree with respect to the given fuzzy proximity relation. Proximal symbols may have different names and arities. We develop a generic set of rules for computing minimal complete sets of approximate generalizations and study their properties. Depending on the characterizations of proximities between symbols and the desired forms of solutions, these rules give rise to different versions of concrete algorithms.

Keywords: Generalization · Anti-unification · Quantitative theories · Fuzzy proximity relations

## 1 Introduction

Generalization problems play an important role in various areas of mathematics, computer science, and artificial intelligence. Anti-unification [12,14] is a logic-based method for computing generalizations. Originally used for inductive and analogical reasoning, it has recently found applications in recursion scheme detection in functional programs [4], programming by example in domain-specific languages [13], learning bug fixes from software code repositories [3,15], automatic program repair [7], preventing bugs and misconfiguration in services [11], and linguistic structure learning for chatbots [6], to name just a few.

In most of the existing theories where anti-unification is studied, the background knowledge is assumed to be precise. Those techniques are therefore not suitable for reasoning with incomplete, imprecise information (which is very common in real-world communication), where exact equality is replaced by its quantitative approximation. Fuzzy proximity and similarity relations are notable examples of such extensions. These kinds of quantitative theories have many useful applications, the most recent ones relating to artificial intelligence, program verification, probabilistic programming, and natural language processing. Many tasks arising in these areas require reasoning methods and computational tools that deal with quantitative information. For instance, approximate inductive reasoning, reasoning and programming by analogy, and similarity detection in programming language statements or in natural language texts could all benefit from solving approximate generalization constraints, which is a theoretically interesting and challenging task. Investigations in this direction have started only recently. In [1], the authors proposed an anti-unification algorithm for fuzzy similarity (reflexive, symmetric, min-transitive) relations, where mismatches are allowed not only in symbol names but also in their arities (fully fuzzy signatures). The algorithm from [9] is designed for fuzzy proximity (i.e., reflexive and symmetric) relations with mismatches only in symbol names.

In this paper, we study approximate anti-unification from a more general perspective. The considered relations are fuzzy proximity relations. Proximal symbols may have different names and arities. We consider four different variants of relating arguments between different proximal symbols: unrestricted relations/functions, and correspondence (i.e. left- and right-total) relations/functions. A generic set of rules for computing minimal complete sets of generalizations is introduced and its termination, soundness and completeness properties are proved. From these rules, we obtain concrete algorithms that deal with different kinds of argument relations. We also show how the existing approximate anti-unification algorithms and their generalizations fit into this framework.

*Organization:* In Sect. 2 we introduce the notation and definitions. Section 3 is devoted to a technical notion of term set consistency and to an algorithm for computing elements of consistent sets of terms. It is used later in the main set of anti-unification rules, which are introduced and characterized in Sect. 4. The concrete algorithms obtained from those rules are also described in this section. In Sect. 5, we discuss complexity. Section 6 offers a high-level picture of the studied problems and concludes.

An extended version of this work can be found in the technical report [8].

## 2 Preliminaries

Proximity Relations. Given a set S, a mapping R from S × S to the real interval [0, 1] is called a binary *fuzzy relation* on S. By fixing a number λ, 0 ≤ λ ≤ 1, we can define the crisp (i.e., two-valued) counterpart of R, named the λ*-cut* of R, as R_λ := {(s1, s2) | R(s1, s2) ≥ λ}. A fuzzy relation R on a set S is called a *proximity relation* if it is reflexive (R(s, s) = 1 for all s ∈ S) and symmetric (R(s1, s2) = R(s2, s1) for all s1, s2 ∈ S). A T-norm ∧ is an associative, commutative, non-decreasing binary operation on [0, 1] with 1 as the unit element. We take minimum in the role of the T-norm.
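
These definitions can be made concrete with a small illustrative sketch (not from the paper): a fuzzy relation stored as a dictionary, its λ-cut, and minimum as the T-norm. The symbols and degrees below are invented for illustration.

```python
# Illustrative fuzzy proximity data (invented): R(a, b) = 0.6, R(b, c) = 0.7;
# all other distinct pairs have degree 0.
S = ['a', 'b', 'c']
R = {('a', 'b'): 0.6, ('b', 'c'): 0.7}

def degree(r, s1, s2):
    """Proximity degree, enforcing reflexivity and symmetry."""
    if s1 == s2:
        return 1.0                       # reflexivity: R(s, s) = 1
    return max(r.get((s1, s2), 0.0), r.get((s2, s1), 0.0))

def lambda_cut(r, elems, lam):
    """The crisp λ-cut R_λ = {(s1, s2) | R(s1, s2) >= λ}."""
    return {(x, y) for x in elems for y in elems if degree(r, x, y) >= lam}

t_norm = min                             # the paper takes minimum as the T-norm

cut = lambda_cut(R, S, 0.5)
# (a, b) and (b, c) are in the 0.5-cut, but (a, c) is not:
# proximity relations need not be transitive.
```

Note that the 0.5-cut relates a to b and b to c but not a to c, which is exactly the non-transitivity that distinguishes proximity relations from similarity relations.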

Terms and Substitutions. We consider a first-order alphabet consisting of a set F of fixed-arity function symbols and a set V of variables, which includes a special symbol \_ (the anonymous variable). The set of *named* (i.e., non-anonymous) variables V \ {\_} is denoted by V_N. When the set of variables is not explicitly specified, we mean V. The set of terms T(F, V) over F and V is defined in the standard way: t ∈ T(F, V) iff t is defined by the grammar t := x | f(t1, ..., tn), where x ∈ V and f ∈ F is an n-ary symbol with n ≥ 0. The set T(F, V_N) is defined similarly, except that all variables are taken from V_N.

We denote arbitrary function symbols by f, g, h, constants by a, b, c, variables by x, y, z, v, and terms by s, t, r. The *head* of a term is defined as head(x) := x and head(f(t1, ..., tn)) := f. For a term t, we denote by V(t) (resp. by V_N(t)) the set of all variables (resp. all named variables) appearing in t. A term is called *linear* if no named variable occurs in it more than once.

The deanonymization operation deanon replaces each occurrence of the anonymous variable in a term by a fresh variable. For instance, deanon(f(\_, x, g(\_))) = f(y, x, g(y′)), where y and y′ are fresh. Hence, deanon(t) ∈ T(F, V_N) is unique up to variable renaming for all t ∈ T(F, V), and deanon(t) is linear iff t is linear.
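
The operation can be sketched on a simple term representation (terms as ('f', (args…)) tuples, variables as strings, '_' the anonymous variable); the fresh names v0, v1, … are an assumption of this sketch, not the paper's notation.

```python
import itertools

def deanon(t, counter=None):
    """Replace each occurrence of the anonymous variable '_' in t by a
    fresh named variable (named v0, v1, ... in this sketch)."""
    if counter is None:
        counter = itertools.count()
    if t == '_':
        return 'v%d' % next(counter)
    if isinstance(t, str):               # a named variable stays as it is
        return t
    f, args = t
    return (f, tuple(deanon(a, counter) for a in args))

# Mirrors the example in the text: deanon(f(_, x, g(_))) = f(v0, x, g(v1)).
result = deanon(('f', ('_', 'x', ('g', ('_',)))))
```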

The notions of *term depth*, *term size*, and *position in a term* are defined in the standard way; see, e.g., [2]. By t|_p we denote the subterm of t at position p, and by t[s]_p the term obtained from t by replacing the subterm at position p by the term s.

A *substitution* is a mapping from V_N to T(F, V_N) (i.e., without anonymous variables) which is the identity almost everywhere. We use the Greek letters σ, ϑ, ϕ to denote substitutions, except for the identity substitution, which is written as *Id*. We represent substitutions with the usual set notation. *Application* of a substitution σ to a term t, denoted by tσ, is defined as \_σ := \_, xσ := σ(x), f(t1, ..., tn)σ := f(t1σ, ..., tnσ). Substitution *composition* is defined as composition of mappings. We write σϑ for the composition of σ with ϑ.
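
Application and composition can be sketched as follows (same illustrative representation as before: terms as ('f', (args…)) tuples, variables as strings, '_' anonymous; substitutions as dicts on named variables — all assumptions of this sketch).

```python
def apply_subst(t, sigma):
    """tσ: _σ = _, xσ = σ(x) (identity when x is not in sigma's domain),
    and f(t1,...,tn)σ = f(t1σ,...,tnσ)."""
    if t == '_':
        return '_'
    if isinstance(t, str):
        return sigma.get(t, t)
    f, args = t
    return (f, tuple(apply_subst(a, sigma) for a in args))

def compose(sigma, theta):
    """The composition σθ as a mapping: x(σθ) = (xσ)θ."""
    result = {x: apply_subst(s, theta) for x, s in sigma.items()}
    for y, s in theta.items():
        result.setdefault(y, s)
    return result
```

The defining property of composition, t(σθ) = (tσ)θ, can be checked directly on examples.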

Argument Relations and Mappings. Given two sets N = {1, ..., n} and M = {1, ..., m}, a binary *argument relation* over N × M is a (possibly empty) subset of N × M. We denote argument relations by ρ. An argument relation ρ ⊆ N × M is (i) *left-total* if for all i ∈ N there exists j ∈ M such that (i, j) ∈ ρ; (ii) *right-total* if for all j ∈ M there exists i ∈ N such that (i, j) ∈ ρ. *Correspondence relations* are those that are both left- and right-total.

An *argument mapping* is an argument relation that is a partial injective function. In other words, an argument mapping π from N = {1, ..., n} to M = {1, ..., m} is a function π : I_n → I_m, where I_n ⊆ N, I_m ⊆ M, and |I_n| = |I_m|. Note that it can also be the empty mapping π : ∅ → ∅. The inverse of an argument mapping is again an argument mapping.
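
These properties are directly checkable; a small sketch (argument relations represented as sets of (i, j) pairs — an assumption of this sketch):

```python
def is_left_total(rho, n):
    """Every i in {1,...,n} is related to some j."""
    return all(any(i == p[0] for p in rho) for i in range(1, n + 1))

def is_right_total(rho, m):
    """Every j in {1,...,m} is related to some i."""
    return all(any(j == p[1] for p in rho) for j in range(1, m + 1))

def is_correspondence(rho, n, m):
    return is_left_total(rho, n) and is_right_total(rho, m)

def is_argument_mapping(rho):
    """Partial injective function: no position repeats on either side."""
    lefts = [i for i, _ in rho]
    rights = [j for _, j in rho]
    return len(set(lefts)) == len(lefts) and len(set(rights)) == len(rights)

# For instance, the relation {(1,1),(3,2)} between a ternary and a binary
# symbol is an argument mapping and right-total, but not left-total
# (position 2 of the ternary symbol is unmapped).
rho = {(1, 1), (3, 2)}
```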

Given a proximity relation R over F, we assume that for each pair of function symbols f and g with R(f, g) = α > 0, where f is n-ary and g is m-ary, an argument relation ρ over {1, ..., n} × {1, ..., m} is also given. We use the notation f ∼_{R,α}^{ρ} g. These argument relations should satisfy the following conditions: ρ is the empty relation if f or g is a constant; ρ is the identity if f = g; and f ∼_{R,α}^{ρ} g iff g ∼_{R,α}^{ρ⁻¹} f, where ρ⁻¹ is the inverse of ρ.

*Example 1.* Assume that we have four different versions of the notion of author (e.g., originating from four different knowledge bases): *author*1(*first-name*, *middle-initial*, *last-name*), *author*2(*first-name*, *last-name*), *author*3(*last-name*, *first-name*, *middle-initial*), and *author*4(*full-name*). One could define the argument relations/mappings between these function symbols, e.g., as follows:

$$\begin{aligned} &\textit{author}_1 \sim_{\mathcal{R},0.7}^{\{(1,1),(3,2)\}} \textit{author}_2, \quad &&\textit{author}_1 \sim_{\mathcal{R},0.9}^{\{(3,1),(1,2),(2,3)\}} \textit{author}_3, \\ &\textit{author}_1 \sim_{\mathcal{R},0.5}^{\{(1,1),(3,1)\}} \textit{author}_4, \quad &&\textit{author}_2 \sim_{\mathcal{R},0.7}^{\{(1,2),(2,1)\}} \textit{author}_3, \\ &\textit{author}_2 \sim_{\mathcal{R},0.5}^{\{(1,1),(2,1)\}} \textit{author}_4, \quad &&\textit{author}_3 \sim_{\mathcal{R},0.5}^{\{(1,1),(2,1)\}} \textit{author}_4. \end{aligned}$$

Proximity Relations over Terms. Each proximity relation R in this paper is defined on F ∪ V such that R(f, x) = 0 for all f ∈ F and x ∈ V, and R(x, y) = 0 for all x ≠ y, x, y ∈ V. We assume that R is *strict*: for all w1, w2 ∈ F ∪ V, if R(w1, w2) = 1, then w1 = w2. Yet another assumption is that for each f ∈ F, its (R, λ)-proximity class {g | R(f, g) ≥ λ} is *finite* for any R and λ.

We extend such an R to terms from T pF, Vq as follows:


If R(t, s) ≥ λ, we write t ≈_{R,λ} s. When λ = 1, the relation ≈_{R,λ} does not depend on R due to strictness of the latter and is just the syntactic equality =. The (R, λ)-*proximity class* of a term t is **pc**_{R,λ}(t) := {s | s ≈_{R,λ} t}.

Generalizations. Given R and λ, a term r is an (R, λ)-*generalization* of (alternatively, (R, λ)-*more general than*) a term t, written r ≾_{R,λ} t, if there exists a substitution σ such that deanon(r)σ ≈_{R,λ} deanon(t). The strict part of ≾_{R,λ} is denoted by ≺_{R,λ}, i.e., r ≺_{R,λ} t if r ≾_{R,λ} t and not t ≾_{R,λ} r.

*Example 2.* Given a proximity relation R, a cut value λ, constants with a ∼_{R,α1}^{∅} b and b ∼_{R,α2}^{∅} c, binary function symbols f and h, and a unary function symbol g such that h ∼_{R,α3}^{{(1,1),(1,2)}} f and h ∼_{R,α4}^{{(1,1)}} g with α_i ≥ λ, 1 ≤ i ≤ 4, we have


The notion of *syntactic generalization* of a term is a special case of (R, λ)-generalization for λ = 1. We write r ≾ t to indicate that r is a syntactic generalization of t. Its strict part is denoted by ≺.

Since R is strict, r ≾ t is equivalent to deanon(r)σ = deanon(t) for some σ (note the syntactic equality here).

Theorem 1. *If* r ≾ t *and* t ≾_{R,λ} s*, then* r ≾_{R,λ} s*.*

*Proof.* r ≾ t implies deanon(r)σ = deanon(t) for some σ, while from t ≾_{R,λ} s we have deanon(t)ϑ ≈_{R,λ} deanon(s) for some ϑ. Then deanon(r)σϑ ≈_{R,λ} deanon(s), which implies r ≾_{R,λ} s. □

Note that r ≾_{R,λ} t and t ≾_{R,λ} s, in general, do not imply r ≾_{R,λ} s, due to the non-transitivity of ≈_{R,λ}.

Definition 1 (Minimal complete set of (R, λ)-generalizations). *Given* R*,* λ*,* t1*, and* t2*, a set of terms* T *is a* complete set of (R, λ)-generalizations *of* t1 *and* t2 *if*


*In addition,* T *is minimal, if it satisfies the following property:*

*(c) if* r, r′ ∈ T *and* r ≠ r′*, then neither* r ≺_{R,λ} r′ *nor* r′ ≺_{R,λ} r*.*

*A* minimal complete set of (R, λ)-generalizations *((*R, λ*)-mcsg) of two terms is unique modulo variable renaming. The elements of the* (R, λ)*-mcsg of* t1 *and* t2 *are called least general* (R, λ)*-generalizations ((*R, λ*)-lggs) of* t1 *and* t2*.*

*This definition directly extends to generalizations of finitely many terms.*

The problem of computing an (R, λ)-generalization of terms t and s is called the (R, λ)-*anti-unification problem* of t and s. In anti-unification, the goal is to compute their least general (R, λ)-generalization.

The precise formulation of the anti-unification problem would be the following: given R, λ, t1, t2, find an (R, λ)-lgg r of t1 and t2, substitutions σ1, σ2, and approximation degrees α1, α2 such that R(rσ1, t1) = α1 and R(rσ2, t2) = α2. A minimal complete algorithm for this problem would compute exactly the elements of the (R, λ)-mcsg of t1 and t2 together with their approximation degrees. However, as we see below, it is problematic to solve the problem in this form. Therefore, we will consider a slightly modified variant, taking into account anonymous variables in generalizations and relaxing the bounds on their degrees.

We assume that the terms to be generalized are ground. This is not a restriction, because we can treat variables as constants that are close only to themselves.

Recall that the proximity class of any alphabet symbol is finite. Also, the symbols are related to each other by finitely many argument relations. One might think that this leads to finite proximity classes of terms, but it is not the case. Consider, e.g., R and λ with h ≈_{R,λ}^{{(1,1)}} f, where h is binary and f is unary. Then the (R, λ)-proximity class of f(a) is infinite: {f(a)} ∪ {h(a, t) | t ∈ T(F, V)}. Also, the (R, λ)-mcsg of f(a) and f(b) is infinite: {f(x)} ∪ {h(x, t) | t ∈ T(F, ∅)}.

Definition 2. *Given terms* t1, ..., tn*,* n ≥ 1*, a position* p *in a term* r *is called* irrelevant for (R, λ)-generalizing *(resp.* for (R, λ)-proximity to*)* t1, ..., tn *if* r[s]_p ≾_{R,λ} t_i *(resp.* r[s]_p ≈_{R,λ} t_i*) for all* 1 ≤ i ≤ n *and for all terms* s*.*

*We say that* r *is a* relevant (R, λ)-generalization *(resp. a* relevant (R, λ)-proximal term*) of* t1, ..., tn *if* r ≾_{R,λ} t_i *(resp.* r ≈_{R,λ} t_i*) for all* 1 ≤ i ≤ n *and* r|_p = \_ *for all positions* p *in* r *that are irrelevant for generalizing (resp. for proximity to)* t1, ..., tn*. The* (R, λ)*-*relevant proximity class *of* t *is*

**rpc**_{R,λ}(t) := {s | s *is a relevant* (R, λ)*-proximal term of* t}.

In the example above, position 2 in h(x, t) is irrelevant for generalizing f(a) and f(b), and h(x, \_) is one of their relevant generalizations. Note that f(x) is also a relevant generalization of f(a) and f(b), since it contains no irrelevant positions. More general generalizations, e.g., x, are relevant as well. Similarly, position 2 in h(a, t) is irrelevant for proximity to f(a), and **rpc**_{R,λ}(f(a)) = {f(a), h(a, \_)}. Generally, **rpc**_{R,λ}(t) is finite for any t, due to the finiteness of the proximity classes of symbols and of the argument relations mentioned above.

Definition 3 (Minimal complete set of relevant (R, λ)-generalizations). *Given* R*,* λ*,* t1*, and* t2*, a set of terms* T *is a* complete set of relevant (R, λ)-generalizations *of* t1 *and* t2 *if*


*The minimality property is defined as in Definition 1.*

This definition directly extends to relevant generalizations of finitely many terms. We use (R, λ)-mcsrg as an abbreviation for a minimal complete set of relevant (R, λ)-generalizations. Like relevant proximity classes, mcsrg's are also finite.

Lemma 1. *For given* R *and* λ*, if all argument relations are correspondence relations, then the* (R, λ)*-mcsg's and* (R, λ)*-proximity classes of all terms are finite.*

*Proof.* Under correspondence relations, no term contains an irrelevant position for generalization or for proximity. □

Hence, for correspondence relations the notions of mcsg and mcsrg coincide, as well as the notions of proximity class and relevant proximity class.

For a term r, we define its *linearized version* lin(r) as the term obtained from r by replacing each occurrence of a named variable in r by a fresh one. For instance, lin(f(x, \_, g(y, x, a), b)) = f(x′, \_, g(y′, x″, a), b), where x′, x″, y′ are fresh variables. Linearized versions of terms are unique modulo variable renaming.

Definition 4 (Generalization degree upper bound). *Given two terms* r *and* t*, a proximity relation* R*, and a* λ*-cut, the* (R, λ)-generalization degree upper bound of r and t, *denoted by* gdub_{R,λ}(r, t)*, is defined as follows:*

*Let* α := max{R(lin(r)σ, t) | σ *is a substitution*}*. Then* gdub_{R,λ}(r, t) *is* α *if* α ≥ λ*, and* 0 *otherwise.*

Intuitively, gdub_{R,λ}(r, t) = α means that no instance of r can get closer than α to t in R. From the definition it follows that if r ≾_{R,λ} t, then 0 < λ ≤ gdub_{R,λ}(r, t) ≤ 1, and if r is not an (R, λ)-generalization of t, then gdub_{R,λ}(r, t) = 0.

The upper bound computed by gdub is more relaxed than it would be if the linearization function were not used, but this is what we will be able to compute in our algorithms later.

*Example 3.* Let R(a, b) = 0.6, R(b, c) = 0.7, and λ = 0.5. Then gdub_{R,λ}(f(x, b), f(a, c)) = 0.7 and gdub_{R,λ}(f(x, x), f(a, c)) = gdub_{R,λ}(f(x, y), f(a, c)) = 1.

It is not difficult to see that if rσ ≈_{R,λ} t, then R(rσ, t) ≤ gdub_{R,λ}(r, t). In Example 3, for σ = {x ↦ b} we have R(f(x, x)σ, f(a, c)) = R(f(b, b), f(a, c)) = 0.6 < gdub_{R,λ}(f(x, x), f(a, c)) = 1.

We compute gdub_{R,λ}(r, t) as follows: if r is a variable, then gdub_{R,λ}(r, t) = 1. Otherwise, if head(r) ∼_{R,β}^{ρ} head(t), then gdub_{R,λ}(r, t) = β ∧ ⋀_{(i,j)∈ρ} gdub_{R,λ}(r|_i, t|_j). Otherwise, gdub_{R,λ}(r, t) = 0.
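
This recursion can be sketched in code. The proximity table below mirrors Example 3 (with an identity argument relation added for the binary symbol f, an assumption of this sketch); terms are again ('f', (args…)) tuples with string variables.

```python
LAM = 0.5                                # the cut value λ (assumed)
PROX = {}                                # (f, g) -> (degree, argument relation)

def relate(f, g, deg, rho):
    PROX[(f, g)] = (deg, set(rho))
    PROX[(g, f)] = (deg, {(j, i) for i, j in rho})

for c in ('a', 'b', 'c'):
    relate(c, c, 1.0, set())             # reflexivity for the constants
relate('f', 'f', 1.0, {(1, 1), (2, 2)})  # identity relation on binary f
relate('a', 'b', 0.6, set())
relate('b', 'c', 0.7, set())

def gdub(r, t):
    """gdub_{R,λ}(r, t): 1 on variables, otherwise β ∧ ⋀ gdub(r|i, t|j)
    over the argument relation, thresholded at λ."""
    if isinstance(r, str):               # a variable generalizes anything
        return 1.0
    if isinstance(t, str) or (r[0], t[0]) not in PROX:
        return 0.0
    beta, rho = PROX[(r[0], t[0])]
    deg = min([beta] + [gdub(r[1][i - 1], t[1][j - 1]) for i, j in rho])
    return deg if deg >= LAM else 0.0

# Reproduces Example 3: gdub(f(x, b), f(a, c)) = 0.7, and both
# gdub(f(x, x), f(a, c)) and gdub(f(x, y), f(a, c)) equal 1.
```

Note that each variable occurrence is scored independently, which is exactly the effect of the linearization function lin in Definition 4.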

## 3 Term Set Consistency

The notion of term set consistency plays an important role in the computation of proximal generalizations. Intuitively, a set of terms is pR, λq-consistent if all the terms in the set have a common pR, λq-proximal term. In this section, we discuss this notion and the corresponding algorithms.

Definition 5 (Consistent set of terms). *A finite set of terms* T *is* (R, λ)-consistent *if there exists a term* s *such that* s ≈_{R,λ} t *for all* t ∈ T*.*

(R, λ)-consistency of a finite term set T is equivalent to ⋂_{t∈T} **pc**_{R,λ}(t) ≠ ∅, but we cannot use this property to decide consistency, since proximity classes of terms can be infinite (when the argument relations are not restricted). For this reason, we introduce the operation ⊓ on terms as follows: (i) t ⊓ \_ = \_ ⊓ t = t, (ii) f(t1, ..., tn) ⊓ f(s1, ..., sn) = f(t1 ⊓ s1, ..., tn ⊓ sn), n ≥ 0. Obviously, ⊓ is associative (A), commutative (C), idempotent (I), and has \_ as its unit element (U). It can be extended to sets of terms: T1 ⊓ T2 := {t1 ⊓ t2 | t1 ∈ T1, t2 ∈ T2}. It is easy to see that ⊓ on sets also satisfies the ACIU properties, with the set {\_} playing the role of the unit element.
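
A sketch of ⊓ on the illustrative term representation (terms as ('f', (args…)) tuples, '_' anonymous). The operation is partial; this sketch returns None on the pairs for which it is undefined, and the set-level version keeps only the defined combinations.

```python
def meet(t, s):
    """t ⊓ s, or None when the operation is undefined (distinct heads)."""
    if t == '_':
        return s
    if s == '_':
        return t
    if isinstance(t, tuple) and isinstance(s, tuple) and t[0] == s[0] \
            and len(t[1]) == len(s[1]):
        args = [meet(a, b) for a, b in zip(t[1], s[1])]
        return None if None in args else (t[0], tuple(args))
    return t if t == s else None

def meet_sets(ts, ss):
    """T1 ⊓ T2, keeping only the defined combinations."""
    res = {meet(t, s) for t in ts for s in ss}
    res.discard(None)
    return res

# _ is the unit element, and meets merge complementary information:
# f(a, _) ⊓ f(_, b) = f(a, b).
```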

Lemma 2. *A finite set of terms* T *is* (R, λ)*-consistent iff* ⊓_{t∈T} **rpc**_{R,λ}(t) ≠ ∅*.*

*Proof.* (⇒) If s ≈_{R,λ} t for all t ∈ T, then s_t ∈ **rpc**_{R,λ}(t), where s_t is obtained from s by replacing all subterms that are irrelevant for its (R, λ)-proximity to t by \_. Assume T = {t1, ..., tn}. Then s_{t1} ⊓ ··· ⊓ s_{tn} ∈ ⊓_{t∈T} **rpc**_{R,λ}(t).

(⇐) Obvious, since s ≈_{R,λ} t for s ∈ ⊓_{t∈T} **rpc**_{R,λ}(t) and for all t ∈ T. □

Now we design an algorithm C that computes ⊓_{t∈T} **rpc**_{R,λ}(t) without actually computing **rpc**_{R,λ}(t) for each t ∈ T. A special version of the algorithm can be used to decide the (R, λ)-consistency of T.

The algorithm is rule-based. The rules work on states, which are pairs **I**; s, where s is a term and **I** is a finite set of expressions of the form x in T, where T is a finite set of terms. R and λ are given. There are two rules (⊎ stands for disjoint union):

#### Rem: Removing the empty set

{x in ∅} ⊎ **I**; s ⟹ **I**; s{x ↦ \_}.

#### Red: Reduce a set to new sets

{x in {t1, ..., tm}} ⊎ **I**; s ⟹ {y1 in T1, ..., yn in Tn} ∪ **I**; s{x ↦ h(y1, ..., yn)}, where m ≥ 1; h is an n-ary function symbol such that h ∼_{R,γk}^{ρk} head(tk) with γk ≥ λ for all 1 ≤ k ≤ m; and T_i := {tk|_j | (i, j) ∈ ρk, 1 ≤ k ≤ m}, 1 ≤ i ≤ n, is the set of all those arguments of the terms t1, ..., tm that are supposed to be (R, λ)-proximal to the i-th argument of h.

To compute ⊓_{t∈T} **rpc**_{R,λ}(t), C starts with {x in T}; x and applies the rules as long as possible. Red causes branching. A state of the form ∅; s is called a success state. A failure state has the form **I**; s, where no rule applies and **I** ≠ ∅. In the full derivation tree, each leaf is either a success or a failure state.

*Example 4.* Assume a, b, c are constants and g, f, h are function symbols with arities 1, 2, and 3, respectively. Let λ be given and R be defined so that R(a, b) ≥ λ, R(b, c) ≥ λ, h ∼_{R,β}^{{(1,1),(1,2)}} f, and h ∼_{R,γ}^{{(2,1)}} g with β ≥ λ and γ ≥ λ. Then

$$\begin{aligned} \mathbf{rpc}_{\mathcal{R},\lambda}(f(a,c)) &= \{f(a,c),\ f(b,c),\ f(a,b),\ f(b,b),\ h(b,\_,\_)\}, \\ \mathbf{rpc}_{\mathcal{R},\lambda}(g(a)) &= \{g(a),\ g(b),\ h(\_,a,\_),\ h(\_,b,\_)\}, \end{aligned}$$

and **rpc**_{R,λ}(f(a, c)) ⊓ **rpc**_{R,λ}(g(a)) = {h(b, a, \_), h(b, b, \_)}. We show how to compute this set with C: {x in {f(a, c), g(a)}}; x ⟹_Red {y1 in {a, c}, y2 in {a}, y3 in ∅}; h(y1, y2, y3) ⟹_Rem {y1 in {a, c}, y2 in {a}}; h(y1, y2, \_) ⟹_Red {y2 in {a}}; h(b, y2, \_). Here we have two ways to apply Red to the last state, leading to the two elements of **rpc**_{R,λ}(f(a, c)) ⊓ **rpc**_{R,λ}(g(a)): h(b, a, \_) and h(b, b, \_).
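
The derivation in Example 4 can be reproduced by a small recursive sketch of algorithm C. Instead of maintaining states explicitly, the function below applies the Red/Rem reasoning directly and returns the set ⊓_{t∈T} rpc_{R,λ}(t). The proximity table mirrors Example 4; the concrete degrees 0.7 and 0.8 standing in for β and γ are invented, and the term representation (('f', (args…)) tuples, '_' anonymous) is an assumption of this sketch.

```python
from itertools import product

LAM = 0.5                                # the cut value λ (assumed)
ARITY = {'a': 0, 'b': 0, 'c': 0, 'g': 1, 'f': 2, 'h': 3}
PROX = {}                                # (f, g) -> (degree, argument relation)

def relate(f, g, deg, rho):
    PROX[(f, g)] = (deg, set(rho))
    PROX[(g, f)] = (deg, {(j, i) for i, j in rho})

for s, n in ARITY.items():               # reflexivity, with identity relations
    relate(s, s, 1.0, {(i, i) for i in range(1, n + 1)})
relate('a', 'b', 0.6, set())
relate('b', 'c', 0.7, set())
relate('h', 'f', 0.7, {(1, 1), (1, 2)})  # plays the role of β
relate('h', 'g', 0.8, {(2, 1)})          # plays the role of γ

def common_rpc(terms):
    """⊓ of the relevant proximity classes of the ground terms in `terms`."""
    if not terms:                        # empty argument set: Rem inserts _
        return {'_'}
    found = set()
    for h, n in ARITY.items():
        # Red: h must be λ-proximal to the head of every term in the set
        if not all((h, t[0]) in PROX and PROX[(h, t[0])][0] >= LAM
                   for t in terms):
            continue
        # T_k collects the arguments related to position k of h
        arg_sets = [common_rpc({t[1][j - 1] for t in terms
                                for i, j in PROX[(h, t[0])][1] if i == k})
                    for k in range(1, n + 1)]
        if all(arg_sets):                # every argument position is solvable
            found.update((h, args) for args in product(*arg_sets))
    return found

# common_rpc({f(a, c), g(a)}) yields {h(b, a, _), h(b, b, _)}, as in Example 4.
```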

Theorem 2. *Given a finite set of terms* T*, the algorithm* C *always terminates starting from the state* {x in T}; x *(where* x *is a fresh variable). If* S *is the set of success states produced at the end, we have* {s | ∅; s ∈ S} = ⊓_{t∈T} **rpc**_{R,λ}(t)*.*

*Proof.* Termination: associate to each state {x1 in T1, ..., xn in Tn}; s the multiset {d1, ..., dn}, where d_i is the maximum depth of the terms occurring in T_i, with d_i = 0 if T_i = ∅. Compare these multisets by the Dershowitz–Manna ordering [5]. Each rule strictly reduces them, which implies termination.

By the definitions of **rpc**_{R,λ} and ⊓, h(s1, ..., sn) ∈ ⊓_{t∈{t1,...,tm}} **rpc**_{R,λ}(t) iff h ∼_{R,γk}^{ρk} head(tk) with γk ≥ λ for all 1 ≤ k ≤ m and s_i ∈ ⊓_{t∈T_i} **rpc**_{R,λ}(t), where T_i = {tk|_j | (i, j) ∈ ρk, 1 ≤ k ≤ m}, 1 ≤ i ≤ n. Therefore, in the Red rule, the instance of x (which is h(y1, ..., yn)) is in ⊓_{t∈{t1,...,tm}} **rpc**_{R,λ}(t) iff for each 1 ≤ i ≤ n we can find an instance of y_i in ⊓_{t∈T_i} **rpc**_{R,λ}(t). If T_i is empty, the i-th argument of h is irrelevant for the terms in {t1, ..., tm} and can be replaced by \_ (Rem does this in a subsequent step). Hence, in each success branch of the derivation tree, the algorithm C computes one element of ⊓_{t∈T} **rpc**_{R,λ}(t). Branching at Red helps produce all elements of ⊓_{t∈T} **rpc**_{R,λ}(t). □

It is easy to see how to use C to decide the (R, λ)-consistency of T: it is enough to find one successful branch in the C-derivation tree for {x in T}; x. If there is no such branch, then T is not (R, λ)-consistent. In fact, during the derivation we can even ignore the second component of the states.

## 4 Solving Generalization Problems

Now we can reformulate the anti-unification problem that will be solved in the remainder of the paper. R is a proximity relation and λ is a cut value.

Given: R, λ, and ground terms t1, ..., tn, n ≥ 2.

Find: a set S of tuples (r, σ1, ..., σn, α1, ..., αn) such that


(When n " 1, this is a problem of computing a relevant proximity class of a term.) Below we give a set of rules, from which one can obtain algorithms to solve the anti-unification problem for four versions of argument relations:


Each of them also has a corresponding linear variant, computing minimal complete sets of (relevant) linear (R, λ)-generalizations. These are denoted by adding the superscript lin to the corresponding algorithm name: A1^lin and A2^lin.

For simplicity, we formulate the algorithms for the case n = 2. They can be extended to arbitrary n straightforwardly.

The main data structure in these algorithms is the anti-unification triple (AUT) x : T1 ≜ T2, where T1 and T2 are finite *consistent* sets of ground terms. The idea is that x is a common generalization of all terms in T1 ∪ T2. A configuration is a tuple A; S; r; α1; α2, where A is a set of AUTs to be solved, S is a set of solved AUTs (the store), r is the generalization computed so far, and the α's are the current approximations of the generalization degree upper bounds of r for the input terms.

Before formulating the rules, we discuss one peculiarity of approximate generalizations:

*Example 5.* For given R and λ, assume R(a, b) ≥ λ, R(b, c) ≥ λ, h ∼_{R,α}^{{(1,1),(1,2)}} f, and h ∼_{R,β}^{{(1,1)}} g, where f is binary, g and h are unary, and α ≥ λ, β ≥ λ. Then


This example shows that generalization algorithms should take into account not only the heads of the terms to be generalized, but should also look deeper, to make sure that the arguments grouped together by the given argument relation have a common neighbor. This justifies the consistency requirement on sets of arguments, the notion introduced in the previous section and used in the decomposition rule below.

### 4.1 Anti-unification for Unrestricted Argument Relations

Algorithms A1^lin and A1 use the rules below to transform configurations into configurations. Given R, λ, and ground terms t1 and t2, we create the initial configuration {x : {t1} ≜ {t2}}; ∅; x; 1; 1 and apply the rules as long as possible. Note that the rules preserve the consistency of AUTs. The process generates a finite complete tree of derivations whose terminal nodes have configurations with an empty first component. We will show how to collect from these terminal configurations the result required in the anti-unification problem statement.

#### Tri: Trivial

$$\{x : \emptyset \triangleq \emptyset\} \uplus A;\, S;\, r;\, \alpha_1;\, \alpha_2 \Longrightarrow A;\, S;\, r\{x \mapsto \_\};\, \alpha_1;\, \alpha_2.$$

#### Dec: Decomposition

$$\{x : T_1 \triangleq T_2\} \uplus A;\, S;\, r;\, \alpha_1;\, \alpha_2 \Longrightarrow \{y_i : Q_{i1} \triangleq Q_{i2} \mid 1 \le i \le n\} \cup A;\, S;\, r\{x \mapsto h(y_1,\ldots,y_n)\};\, \alpha_1 \wedge \beta_1;\, \alpha_2 \wedge \beta_2,$$

where $T_1 \cup T_2 \neq \emptyset$; $h$ is $n$-ary with $n \ge 0$; $y_1,\ldots,y_n$ are fresh; and for $j = 1, 2$, if $T_j = \{t^j_1,\ldots,t^j_{m_j}\}$, then


#### Sol: Solving

$$\{x : T_1 \triangleq T_2\} \uplus A;\, S;\, r;\, \alpha_1;\, \alpha_2 \Longrightarrow A;\, \{x : T_1 \triangleq T_2\} \cup S;\, r;\, \alpha_1;\, \alpha_2,$$

if the Tri and Dec rules are not applicable. (This means that at least one $T_i \neq \emptyset$ and either there is no $h$ as required in the Dec rule, or at least one $Q_{ij}$ from Dec is not $(\mathcal{R}, \lambda)$-consistent.)

Let expand be an *expansion operation* defined for sets of AUTs as

$$\mathrm{expand}(S) := \{\, x : \sqcap_{t \in T_1}\, \mathbf{rpc}_{\mathcal{R},\lambda}(t) \,\triangleq\, \sqcap_{t \in T_2}\, \mathbf{rpc}_{\mathcal{R},\lambda}(t) \mid x : T_1 \triangleq T_2 \in S \,\}.$$

Exhaustive application of the three rules above leads to configurations of the form $\emptyset;\, S;\, r;\, \alpha_1;\, \alpha_2$, where $r$ is a linear term. These configurations are further postprocessed, replacing $S$ by $\mathrm{expand}(S)$. We will use the letter $E$ for expanded stores. Hence, terminal configurations obtained after exhaustive rule application and expansion have the form $\emptyset;\, E;\, r;\, \alpha_1;\, \alpha_2$, where $r$ is a linear term.<sup>1</sup> This is what Algorithm $\mathcal{A}_1^{\mathrm{lin}}$ stops with.
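Under the simplifying assumption that a proximity class is just a finite set of terms and that the meet ⊓ can be modeled by plain set intersection, the expansion step can be sketched as follows (the dict-based store layout and the name `rpc` are our own; in the paper ⊓ also combines terms position-wise):

```python
def expand(store, rpc):
    """Expansion of a store.  `store` maps each AUT variable x to the
    pair (T1, T2) of its term sets; rpc(t) returns the (R, lambda)-
    proximity class of the term t as a set.  Each side is replaced by
    the meet of the rpc's of its elements, here set intersection."""
    def meet(terms):
        classes = [rpc(t) for t in terms]
        out = classes[0]
        for c in classes[1:]:
            out = out & c
        return out
    return {x: (meet(t1), meet(t2)) for x, (t1, t2) in store.items()}
```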

To an expanded store $E = \{y_1 : Q_{11} \triangleq Q_{12},\ldots,y_n : Q_{n1} \triangleq Q_{n2}\}$ we associate two sets of substitutions $\Sigma_L(E)$ and $\Sigma_R(E)$, defined as follows: $\sigma \in \Sigma_L(E)$ (resp. $\sigma \in \Sigma_R(E)$) iff $\mathrm{dom}(\sigma) = \{y_1,\ldots,y_n\}$ and $y_i\sigma \in Q_{i1}$ (resp. $y_i\sigma \in Q_{i2}$) for each $1 \le i \le n$. We call them the sets of *witness substitutions*.
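Since a witness substitution independently picks one term per store variable from the chosen side, the sets $\Sigma_L(E)$ and $\Sigma_R(E)$ are Cartesian products. A hypothetical sketch (store layout and names are our own):

```python
from itertools import product

def witness_substitutions(expanded_store, side):
    """Enumerate the witness substitutions Sigma_L (side=0) or Sigma_R
    (side=1) of an expanded store, given as a list of pairs
    (variable, (Q_i1, Q_i2)) with each Q_ij a nonempty set of terms.
    Each substitution maps every store variable to one term taken from
    the corresponding side of its AUT."""
    vars_ = [v for v, _ in expanded_store]
    choices = [sorted(q[side]) for _, q in expanded_store]
    for combo in product(*choices):
        yield dict(zip(vars_, combo))
```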

Configurations containing expanded stores are called *expanded configurations*. From each expanded configuration $C = \emptyset;\, E;\, r;\, \alpha_1;\, \alpha_2$, we construct the set $S(C) := \{(r, \sigma_1, \sigma_2, \alpha_1, \alpha_2) \mid \sigma_1 \in \Sigma_L(E),\ \sigma_2 \in \Sigma_R(E)\}$.

Given an anti-unification problem $\mathcal{R}$, $\lambda$, $t_1$, and $t_2$, the *answer computed by Algorithm* $\mathcal{A}_1^{\mathrm{lin}}$ is the set $S := \bigcup_{i=1}^{m} S(C_i)$, where $C_1,\ldots,C_m$ are all of the final expanded configurations reached by $\mathcal{A}_1^{\mathrm{lin}}$ for $\mathcal{R}$, $\lambda$, $t_1$, and $t_2$.<sup>2</sup>

*Example 6.* Assume $a, b, c$, and $d$ are constants with $b \approx^{\emptyset}_{\mathcal{R},0.5} c$ and $c \approx^{\emptyset}_{\mathcal{R},0.6} d$, and $f$, $g$, and $h$ are, respectively, binary, ternary, and quaternary function symbols with $h \approx^{\{(1,1),(3,2),(4,2)\}}_{\mathcal{R},0.7} f$ and $h \approx^{\{(1,1),(3,3)\}}_{\mathcal{R},0.8} g$. For the proximity relation $\mathcal{R}$ given in this way and $\lambda = 0.5$, Algorithm $\mathcal{A}_1^{\mathrm{lin}}$ performs the following steps to anti-unify $f(a, b)$ and $g(a, c, d)$:

$$\begin{aligned}
&\{x : \{f(a,b)\} \triangleq \{g(a,c,d)\}\};\ \emptyset;\ x;\ 1;\ 1 \Longrightarrow_{\mathsf{Dec}} \\
&\{x_1 : \{a\} \triangleq \{a\},\ x_2 : \emptyset \triangleq \emptyset,\ x_3 : \{b\} \triangleq \{d\}, \\
&\qquad x_4 : \{b\} \triangleq \emptyset\};\ \emptyset;\ h(x_1, x_2, x_3, x_4);\ 0.7;\ 0.8 \Longrightarrow_{\mathsf{Dec}} \\
&\{x_2 : \emptyset \triangleq \emptyset,\ x_3 : \{b\} \triangleq \{d\},\ x_4 : \{b\} \triangleq \emptyset\};\ \emptyset;\ h(a, x_2, x_3, x_4);\ 0.7;\ 0.8 \Longrightarrow_{\mathsf{Tri}} \\
&\{x_3 : \{b\} \triangleq \{d\},\ x_4 : \{b\} \triangleq \emptyset\};\ \emptyset;\ h(a, \_, x_3, x_4);\ 0.7;\ 0.8 \Longrightarrow_{\mathsf{Dec}} \\
&\{x_4 : \{b\} \triangleq \emptyset\};\ \emptyset;\ h(a, \_, c, x_4);\ 0.5;\ 0.6.
\end{aligned}$$

Here Dec applies in two different ways, with the substitutions $\{x_4 \mapsto b\}$ and $\{x_4 \mapsto c\}$, leading to two final configurations: $\emptyset;\ \emptyset;\ h(a, \_, c, b);\ 0.5;\ 0.6$ and $\emptyset;\ \emptyset;\ h(a, \_, c, c);\ 0.5;\ 0.6$. The witness substitutions are the identity substitutions. We have $\mathcal{R}(h(a, \_, c, b), f(a, b)) = 0.5$, $\mathcal{R}(h(a, \_, c, b), g(a, c, d)) = 0.6$, $\mathcal{R}(h(a, \_, c, c), f(a, b)) = 0.5$, and $\mathcal{R}(h(a, \_, c, c), g(a, c, d)) = 0.6$.

If we had $h \approx^{\{(1,1),(1,2),(4,2)\}}_{\mathcal{R},0.7} f$, then the algorithm would perform only the Sol step, because in the attempt to apply Dec to the initial configuration, the set $Q_{11} = \{a, b\}$ is inconsistent: $\mathbf{rpc}_{\mathcal{R},\lambda}(a) = \{a\}$, $\mathbf{rpc}_{\mathcal{R},\lambda}(b) = \{b, c\}$, and, hence, $\mathbf{rpc}_{\mathcal{R},\lambda}(a) \sqcap \mathbf{rpc}_{\mathcal{R},\lambda}(b) = \emptyset$.

<sup>1</sup> Note that no side of the AUTs in $E$ in those configurations is empty, due to the condition in the Decomposition rule requiring the $Q_{ij}$'s to be $(\mathcal{R}, \lambda)$-consistent.

<sup>2</sup> If we are interested only in linear generalizations *without witness substitutions*, there is no need to compute expanded configurations in $\mathcal{A}_1^{\mathrm{lin}}$.

Algorithm $\mathcal{A}_1$ is obtained by further transforming the expanded configurations produced by $\mathcal{A}_1^{\mathrm{lin}}$. This transformation is performed by applying the Merge rule below as long as possible. Intuitively, its purpose is to make the linear generalization obtained by $\mathcal{A}_1^{\mathrm{lin}}$ less general by merging some variables.

#### Mer: Merge

$$\emptyset;\ \{x_1 : R_{11} \triangleq R_{12},\ x_2 : R_{21} \triangleq R_{22}\} \uplus E;\ r;\ \alpha_1;\ \alpha_2 \Longrightarrow \emptyset;\ \{y : Q_1 \triangleq Q_2\} \cup E;\ r\sigma;\ \alpha_1;\ \alpha_2,$$

where $Q_i = (R_{1i} \sqcap R_{2i}) \neq \emptyset$ for $i = 1, 2$, $y$ is fresh, and $\sigma = \{x_1 \mapsto y,\ x_2 \mapsto y\}$.
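One Mer step can be sketched as follows, under the assumption that the store is a dict from variables to pairs of term sets and that ⊓ is passed in as a function (for ground terms without anonymous variables, plain set intersection can serve as an illustration of ⊓; all names are ours):

```python
def merge_step(expanded_store, meet):
    """One application of the Mer rule on an expanded store, given as a
    dict  var -> (Q1, Q2).  `meet` is the operation ⊓ on term sets.
    Returns (new_store, substitution) for the first mergeable pair of
    variables, or None if Mer is not applicable."""
    items = list(expanded_store.items())
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            (x1, (r11, r12)), (x2, (r21, r22)) = items[i], items[j]
            q1, q2 = meet(r11, r21), meet(r12, r22)
            if q1 and q2:                 # both meets are nonempty
                y = f"y_{x1}_{x2}"        # fresh variable (naming is ours)
                store = {v: q for v, q in items if v not in (x1, x2)}
                store[y] = (q1, q2)
                return store, {x1: y, x2: y}
    return None
```

Applying it repeatedly until it returns `None` realizes the "as long as possible" strategy of $\mathcal{A}_1$ on one grouping; branching over alternative groupings is not shown.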

The answer computed by $\mathcal{A}_1$ is defined analogously to the answer computed by $\mathcal{A}_1^{\mathrm{lin}}$.

*Example 7.* Assume $a, b$ are constants; $f_1, f_2, g_1$, and $g_2$ are unary function symbols; $p$ is a binary function symbol; and $h_1$ and $h_2$ are ternary function symbols. Let $\lambda$ be a cut value and $\mathcal{R}$ be defined by $f_i \approx^{\{(1,1)\}}_{\mathcal{R},\alpha_i} h_i$ and $g_i \approx^{\{(1,2)\}}_{\mathcal{R},\beta_i} h_i$ with $\alpha_i \ge \lambda$, $\beta_i \ge \lambda$, $i = 1, 2$. To generalize $p(f_1(a), g_1(b))$ and $p(f_2(a), g_2(b))$, we use $\mathcal{A}_1$. The derivation starts as

$$\begin{aligned}
&\{x : \{p(f_1(a), g_1(b))\} \triangleq \{p(f_2(a), g_2(b))\}\};\ \emptyset;\ x;\ 1;\ 1 \Longrightarrow_{\mathsf{Dec}} \\
&\{y_1 : \{f_1(a)\} \triangleq \{f_2(a)\},\ y_2 : \{g_1(b)\} \triangleq \{g_2(b)\}\};\ \emptyset;\ p(y_1, y_2);\ 1;\ 1 \Longrightarrow_{\mathsf{Sol}}^{2} \\
&\emptyset;\ \{y_1 : \{f_1(a)\} \triangleq \{f_2(a)\},\ y_2 : \{g_1(b)\} \triangleq \{g_2(b)\}\};\ p(y_1, y_2);\ 1;\ 1.
\end{aligned}$$

At this stage, we expand the store, obtaining

$$\begin{aligned}
&\emptyset;\ \{y_1 : \{f_1(a), h_1(a, \_, \_)\} \triangleq \{f_2(a), h_2(a, \_, \_)\}, \\
&\qquad y_2 : \{g_1(b), h_1(\_, b, \_)\} \triangleq \{g_2(b), h_2(\_, b, \_)\}\};\ p(y_1, y_2);\ 1;\ 1.
\end{aligned}$$

If we had the standard intersection $\cap$ in the Mer rule, we would not be able to merge $y_1$ and $y_2$, because the obtained sets in the corresponding AUTs are disjoint. However, Mer uses $\sqcap$: we have $\{f_i(a), h_i(a, \_, \_)\} \sqcap \{g_i(b), h_i(\_, b, \_)\} = \{h_i(a, b, \_)\}$, $i = 1, 2$, and, therefore, can make the step

$$\begin{aligned}
&\emptyset;\ \{y_1 : \{f_1(a), h_1(a, \_, \_)\} \triangleq \{f_2(a), h_2(a, \_, \_)\}, \\
&\qquad y_2 : \{g_1(b), h_1(\_, b, \_)\} \triangleq \{g_2(b), h_2(\_, b, \_)\}\};\ p(y_1, y_2);\ 1;\ 1 \Longrightarrow_{\mathsf{Mer}} \\
&\emptyset;\ \{z : \{h_1(a, b, \_)\} \triangleq \{h_2(a, b, \_)\}\};\ p(z, z);\ 1;\ 1.
\end{aligned}$$

Indeed, if we take the witness substitutions $\sigma_i = \{z \mapsto h_i(a, b, \_)\}$, $i = 1, 2$, and apply them to the obtained generalization, we get

$$\begin{aligned} p(z,z)\sigma\_1 &= p(h\_1(a,b,\\_),h\_1(a,b,\\_)) \simeq\_{\mathcal{R},\lambda} p(f\_1(a),g\_1(b)),\\ p(z,z)\sigma\_2 &= p(h\_2(a,b,\\_),h\_2(a,b,\\_)) \simeq\_{\mathcal{R},\lambda} p(f\_2(a),g\_2(b)).\end{aligned}$$

Theorem 3. *Given* $\mathcal{R}$*,* $\lambda$*, and the ground terms* $t_1$ *and* $t_2$*, Algorithm* $\mathcal{A}_1$ *terminates for* $\{x : \{t_1\} \triangleq \{t_2\}\};\, \emptyset;\, x;\, 1;\, 1$ *and computes an answer set* $S$ *such that*


*Proof. Termination:* Define the depth of an AUT $x : \{t_1,\ldots,t_m\} \triangleq \{s_1,\ldots,s_n\}$ as the depth of the term $f(g(t_1,\ldots,t_m), h(s_1,\ldots,s_n))$. The rules Tri, Dec, and Sol strictly reduce the multiset of depths of the AUTs in the first component of the configurations. Mer strictly reduces the number of distinct variables in generalizations. Hence, these rules cannot be applied infinitely often, and $\mathcal{A}_1$ terminates.

In order to prove (1), we need to verify three properties:


*Soundness:* We show that each rule transforms an $(\mathcal{R}, \lambda)$-generalization into an $(\mathcal{R}, \lambda)$-generalization. Since we start from a most general $(\mathcal{R}, \lambda)$-generalization of $t_1$ and $t_2$ (a fresh variable $x$), at the end of the algorithm we obtain an $(\mathcal{R}, \lambda)$-generalization of $t_1$ and $t_2$. We also show that in this process all irrelevant positions are abstracted by anonymous variables, which guarantees that each computed generalization is relevant.

Dec: The computed $h$ is $(\mathcal{R}, \lambda)$-close to the head of each term in $T_1 \cup T_2$. The $Q_{ij}$'s correspond to argument relations between $h$ and those heads, and each $Q_{ij}$ is $(\mathcal{R}, \lambda)$-consistent, i.e., there exists a term that is $(\mathcal{R}, \lambda)$-close to each term in $Q_{ij}$. This implies that $x\sigma = h(y_1,\ldots,y_n)$ $(\mathcal{R}, \lambda)$-generalizes all the terms from $T_1 \cup T_2$. Note that at this stage, $h(y_1,\ldots,y_n)$ might not yet be a relevant $(\mathcal{R}, \lambda)$-generalization of $T_1$ and $T_2$: if there exists a position $1 \le i \le n$ that is irrelevant for the $(\mathcal{R}, \lambda)$-generalization of $T_1$ and $T_2$, then the new configuration will contain an AUT $y_i : \emptyset \triangleq \emptyset$.

Tri: When Dec generates $y : \emptyset \triangleq \emptyset$, the Tri rule replaces $y$ by $\_$ in the computed generalization, making it relevant.

Sol does not change generalizations.

Mer merges AUTs whose terms have a *nonempty* intersection of $\mathbf{rpc}$'s. Hence, we can reuse the same variable in the corresponding positions in generalizations, i.e., Mer transforms a generalization computed so far into a less general one.

*Completeness:* We prove a slightly more general statement. Given two finite consistent sets of ground terms $T_1$ and $T_2$, if $r'$ is a relevant $(\mathcal{R}, \lambda)$-generalization of all $t_1 \in T_1$ and $t_2 \in T_2$, then starting from $\{x : T_1 \triangleq T_2\};\, \emptyset;\, x;\, 1;\, 1$, Algorithm $\mathcal{A}_1$ computes a tuple $(r, \sigma_1, \sigma_2, \alpha_1, \alpha_2)$ such that $r' \preceq r$.

We may assume w.l.o.g. that $r'$ is a relevant $(\mathcal{R}, \lambda)$-lgg. Due to the transitivity of $\preceq$, completeness for such an $r'$ will imply it for all terms more general than $r'$.

We proceed by structural induction on $r'$. If $r'$ is a (named or anonymous) variable, the statement holds. Assume $r' = h(r'_1,\ldots,r'_n)$, $T_1 = \{u_1,\ldots,u_m\}$, and $T_2 = \{w_1,\ldots,w_l\}$. Then $h$ is such that $h \approx^{\rho_i}_{\mathcal{R},\beta_i} \mathrm{head}(u_i)$ for all $1 \le i \le m$ and $h \approx^{\mu_j}_{\mathcal{R},\gamma_j} \mathrm{head}(w_j)$ for all $1 \le j \le l$. Moreover, each $r'_k$ is a relevant $(\mathcal{R}, \lambda)$-generalization of $Q_{k1} = \bigcup_{i=1}^{m}\{u_i|_q \mid (k, q) \in \rho_i\}$ and $Q_{k2} = \bigcup_{j=1}^{l}\{w_j|_q \mid (k, q) \in \mu_j\}$ and, hence, $Q_{k1}$ and $Q_{k2}$ are $(\mathcal{R}, \lambda)$-consistent. Therefore, we can perform a step by Dec, choosing $h(y_1,\ldots,y_n)$ as the generalization term and $y_i : Q_{i1} \triangleq Q_{i2}$ as the new AUTs. By the induction hypothesis, for each $1 \le i \le n$ we can compute a relevant $(\mathcal{R}, \lambda)$-generalization $r_i$ of $Q_{i1}$ and $Q_{i2}$ such that $r'_i \preceq r_i$.

If $r'$ is linear, then the combination of the current Dec step with the derivations that lead to those $r_i$'s computes a tuple $(r, \ldots) \in S$, where $r = h(r_1,\ldots,r_n)$ and, hence, $r' \preceq r$.

If $r'$ is non-linear, assume without loss of generality that all occurrences of a shared variable $z$ appear as direct arguments of $h$: $z = r'_{k_1} = \cdots = r'_{k_p}$ for $1 \le k_1 < \cdots < k_p \le n$. Since $r'$ is an lgg, $Q_{k_i 1}$ and $Q_{k_i 2}$ cannot be generalized by a non-variable term; thus, Tri and Dec are not applicable. Therefore, the AUTs $y_i : Q_{k_i 1} \triangleq Q_{k_i 2}$ would be transformed by Sol. Since all pairs $Q_{k_i 1}$ and $Q_{k_i 2}$, $1 \le i \le p$, are generalized by the same variable, we have $\sqcap_{t \in Q_j}\, \mathbf{rpc}_{\mathcal{R},\lambda}(t) \neq \emptyset$, where $Q_j = \bigcup_{i=1}^{p} Q_{k_i j}$, $j = 1, 2$. Additionally, $r'_{k_1},\ldots,r'_{k_p}$ are all the occurrences of $z$ in $r'$. Hence, the condition of Mer is satisfied and we can extend our derivation with a $(p-1)$-fold application of this rule, obtaining $r = h(r_1,\ldots,r_n)$ with $z = r_{k_1} = \cdots = r_{k_p}$, implying $r' \preceq r$.

*Minimality:* Alternative generalizations are obtained by branching in Dec or Mer. If the current generalization $r$ is transformed by Dec into two generalizations $r_1$ and $r_2$ on two branches, then $r_1 = h_1(y_1,\ldots,y_m)$ and $r_2 = h_2(z_1,\ldots,z_n)$ for some $h$'s and fresh $y$'s and $z$'s. It may happen that $r_1 \preceq_{\mathcal{R},\lambda} r_2$ or vice versa (if $h_1$ and $h_2$ are $(\mathcal{R}, \lambda)$-close to each other), but neither $r_1 \prec_{\mathcal{R},\lambda} r_2$ nor $r_2 \prec_{\mathcal{R},\lambda} r_1$ holds. Hence, the set of generalizations computed before applying Mer is minimal. Mer groups AUTs together maximally, and different groupings are not comparable. Therefore, variables in generalizations are merged so that distinct generalizations are not $\prec_{\mathcal{R},\lambda}$-comparable. Hence, (1) is proven.

As for (2), for $i = 1, 2$, the construction in Dec yields $\mathcal{R}(r\sigma_i, t_i) \le \alpha_i$. Mer does not change $\alpha_i$; thus, $\alpha_i = \mathrm{gdub}_{\mathcal{R},\lambda}(r, t_i)$ also holds, since the way $\alpha_i$ is computed corresponds exactly to the computation of $\mathrm{gdub}_{\mathcal{R},\lambda}(r, t_i)$: $r \preceq_{\mathcal{R},\lambda} t_i$ and only the decomposition changes the degree during the computation. $\square$

The corollary below is proved similarly to Theorem 3:

Corollary 1. *Given* $\mathcal{R}$*,* $\lambda$*, and the ground terms* $t_1$ *and* $t_2$*, Algorithm* $\mathcal{A}_1^{\mathrm{lin}}$ *terminates for* $\{x : \{t_1\} \triangleq \{t_2\}\};\, \emptyset;\, x;\, 1;\, 1$ *and computes an answer set* $S$ *such that*


#### 4.2 Anti-unification with Correspondence Argument Relations

Correspondence relations ensure that, for a pair of proximal symbols, no argument is irrelevant for proximity. Left- and right-totality of these relations guarantee that each argument of a term is close to at least one argument of its proximal term and that the inverse relation remains a correspondence relation. Consequently, in the Dec rule of $\mathcal{A}_1$, the sets $Q_{ij}$ never become empty. Therefore, the Tri rule becomes obsolete and no anonymous variable appears in generalizations. As a result, the $(\mathcal{R}, \lambda)$-mcsrg and the $(\mathcal{R}, \lambda)$-mcsg coincide, and the algorithm computes a solution from which we obtain an $(\mathcal{R}, \lambda)$-mcsg for the given anti-unification problem. The linear version $\mathcal{A}_1^{\mathrm{lin}}$ works analogously.

#### 4.3 Anti-unification with Argument Mappings

When the argument relations are mappings, we can design a more constructive method for computing generalizations and their degree bounds. (Recall that our mappings are partial injective functions, which guarantees that their inverses are also mappings.) We denote this algorithm by $\mathcal{A}_2$. The configurations stay the same as before, but the AUTs in $A$ contain only empty or singleton sets of terms. In the store, we may still get (after the expansion) AUTs with term sets containing more than one element. Only the Dec rule differs from its previous counterpart, having a simpler condition:

#### Dec: Decomposition

$$\{x : T_1 \triangleq T_2\} \uplus A;\, S;\, r;\, \alpha_1;\, \alpha_2 \Longrightarrow \{y_i : Q_{i1} \triangleq Q_{i2} \mid 1 \le i \le n\} \cup A;\, S;\, r\{x \mapsto h(y_1,\ldots,y_n)\};\, \alpha_1 \wedge \beta_1;\, \alpha_2 \wedge \beta_2,$$

where $T_1 \cup T_2 \neq \emptyset$; $h$ is $n$-ary with $n \ge 0$; $y_1,\ldots,y_n$ are fresh; and for $j = 1, 2$ and all $1 \le i \le n$, if $T_j = \{t_j\}$ then $h \approx^{\pi_j}_{\mathcal{R},\beta_j} \mathrm{head}(t_j)$ and $Q_{ij} = \{t_j|_{\pi_j(i)}\}$, and if $T_j = \emptyset$ then $\beta_j = 1$ and $Q_{ij} = \emptyset$.

This Dec rule is equivalent to the special case of Dec for argument relations where $m_j \le 1$. The new $Q_{ij}$'s contain at most one element (since the argument relations are mappings) and are thus always $(\mathcal{R}, \lambda)$-consistent. Various choices of $h$ in Dec and alternative groupings of AUTs in Mer cause branching in the same way as in $\mathcal{A}_1$. It is easy to see that the counterparts of Theorem 3 hold for $\mathcal{A}_2$ and $\mathcal{A}_2^{\mathrm{lin}}$ as well.

A special case of this fragment of anti-unification is anti-unification for similarity relations in fully fuzzy signatures from [1]. Similarity relations are min-transitive proximity relations. The position mappings in [1] can be modeled by our argument mappings, requiring them to be total for symbols of the smaller arity and to satisfy the similarity-specific consistency restrictions from [1].

#### 4.4 Anti-unification with Correspondence Argument Mappings

Correspondence argument mappings are bijections between the arguments of function symbols of the same arity. For such mappings, if $h \approx^{\pi}_{\mathcal{R},\lambda} f$ and $h$ is $n$-ary, then $f$ is also $n$-ary and $\pi$ is a permutation of $(1,\ldots,n)$. Hence, in this case $\mathcal{A}_2$ combines the properties of $\mathcal{A}_1$ for correspondence relations (Sect. 4.2) and of $\mathcal{A}_2$ for argument mappings (Sect. 4.3): all generalizations are relevant, the computed answer gives an mcsg of the input terms, and the algorithm works with term sets of cardinality at most one.

### 5 Remarks About the Complexity

The proximity relation R can be naturally represented as an undirected graph, where the vertices are function symbols and an edge between them indicates that they are proximal. Graphs induced by proximity relations are usually sparse. Therefore we can represent them by (sorted) adjacency lists. In the adjacency lists, we can also accommodate the argument relations and proximity degrees.
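A possible adjacency-list layout of R, with proximity degrees and argument relations stored on the edges, could look as follows (class and method names are our own sketch):

```python
from bisect import insort

class ProximityGraph:
    """Sparse representation of a proximity relation R: for each
    function symbol, a sorted adjacency list of its proximal symbols,
    each entry carrying the proximity degree and the argument relation
    (a set of argument-index pairs)."""
    def __init__(self):
        self.adj = {}   # symbol -> sorted list of (neighbor, degree, arg_rel)

    def add(self, f, g, degree, arg_rel):
        # Proximity is symmetric: store the edge in both lists, with the
        # inverse argument relation on the other side.
        inv = frozenset((j, i) for i, j in arg_rel)
        self.adj.setdefault(f, [])
        self.adj.setdefault(g, [])
        insort(self.adj[f], (g, degree, frozenset(arg_rel)))
        insort(self.adj[g], (f, degree, inv))

    def proximity_class(self, f, lam):
        """Symbols proximal to f with degree >= lam (the lambda-cut),
        including f itself, since R is reflexive."""
        return {f} | {g for g, d, _ in self.adj.get(f, []) if d >= lam}
```

With the constants of Example 6 (b close to c with degree 0.5, c close to d with degree 0.6), `proximity_class("b", 0.5)` yields {b, c}, while raising the cut to 0.6 shrinks it to {b}.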

In the rest of this section we use the following notation:


We assume that the given anti-unification problem is represented as a completely shared directed acyclic graph (dag). Each node of the dag has a pointer to the adjacency list (with respect to R) of the symbol in the node.

Theorem 4. *Time complexities of* C *and the linear versions of the generalization algorithms are as follows:*


*Proof (Sketch).* In C, in the case of argument relations, an application of the Red rule to a state $\mathbf{I}; s$ replaces one element of $\mathbf{I}$ of size $m$ by at most $a$ new elements, each of size $m - 1$. Hence, one branch in the search tree for C, starting from a singleton set $\mathbf{I}$ of size $n$, has length at most $l = \sum_{i=0}^{n-1} a^i$. At each node on it there are at most $\Delta$ choices of applying Red with different $h$'s, which bounds the total size of the search tree by $\sum_{i=0}^{l-1} \Delta^i$, i.e., the number of steps performed by C in the worst case is $O(\Delta^{a^n})$. Those different $h$'s are obtained by intersecting the proximity classes of the heads of the terms $\{t_1,\ldots,t_m\}$ in the Red rule. In our graph representation of the proximity relation, the proximity classes of symbols are exactly the adjacency lists of those symbols, which we assume are sorted. Their maximal length is $\Delta$. Hence, the work to be done at each node of the search tree of C is to find the intersection of at most $n$ sorted lists, each containing at most $\Delta$ elements. This needs $O(n \cdot \Delta)$ time, giving the time complexity $O(n \cdot \Delta \cdot \Delta^{a^n})$ of C for the relation case.

In the mapping case, an application of the Red rule to a state $\mathbf{I}; s$ replaces one element of $\mathbf{I}$ of size $m$ by at most $a$ new elements of *total* size $m - 1$. Therefore, the maximal length of a branch is $n$, the branching factor is $\Delta$, and the amount of work at each node, as above, is $O(n \cdot \Delta)$. Hence, the number of steps in the worst case is $O(\Delta^{n})$ and the time complexity of C is $O(n \cdot \Delta \cdot \Delta^{n})$.

The fact that the consistency check is incorporated into the Dec rule of $\mathcal{A}_1^{\mathrm{lin}}$ can be used to guide the application of this rule, using the values memoized by previous applications of Red. The very first time, the appropriate $h$ in Dec is chosen arbitrarily. In any subsequent application of this rule, $h$ is chosen according to the result of the Red rule that has already been applied to the arguments of the current AUT for their consistency check, as required by the condition of Dec. In this way, the applications of Dec and Sol correspond to the applications of Red. There is a natural correspondence between the applications of the Rem and Tri rules. Therefore, $\mathcal{A}_1^{\mathrm{lin}}$ has a search tree analogous to that of C. Hence the complexity of $\mathcal{A}_1^{\mathrm{lin}}$ is $O(n \cdot \Delta \cdot \Delta^{a^n})$. $\mathcal{A}_2^{\mathrm{lin}}$ does not call the consistency check but does the same work as C and, hence, has the same complexity $O(n \cdot \Delta \cdot \Delta^{n})$. $\square$

### 6 Discussion and Conclusion

The diagram below illustrates the connections between different anti-unification problems based on argument relations:

The arrows indicate the direction from more general problems to more specific ones. For the unrestricted cases (left column) we compute mcsrg's. For correspondence relations and correspondence mappings (right column), mcsg's are computed. (In fact, for them the notions of mcsrg and mcsg coincide.) The algorithms for relations (upper row) are more involved than those for mappings (lower row): those for relations deal with AUTs containing arbitrary sets of terms, while for mappings those sets have cardinality at most one, which simplifies the conditions in the rules. Moreover, the two cases in the lower row generalize the existing anti-unification problems:


All our algorithms can easily be turned into anti-unification algorithms for crisp tolerance relations<sup>3</sup> by taking λ-cuts and ignoring the computation of the approximation degrees. Besides, they are modular and can be used to compute only linear generalizations by simply skipping the merging rule. We provided complexity estimates for the algorithms that compute linear generalizations (which are often of practical interest).

<sup>3</sup> A tolerance is a reflexive, symmetric, not necessarily transitive relation; according to Poincaré, a fundamental notion for mathematics applied to the physical world.

In this paper, we did not consider cases in which the same pair of symbols is related by more than one argument relation. Our results can be extended to such cases, which would open a way towards approximate anti-unification modulo background theories specified by shallow collapse-free axioms. Another interesting direction for future work would be extending our results to quantitative algebras [10], which also deal with quantitative extensions of equality.

Acknowledgments. Supported by the Austrian Science Fund, project P 35530.

## References


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Guiding an Automated Theorem Prover with Neural Rewriting

Jelle Piepenbrock1,2(B) , Tom Heskes<sup>2</sup> , Mikoláš Janota<sup>1</sup> , and Josef Urban<sup>1</sup>

<sup>1</sup> Czech Technical University in Prague, Prague, Czech Republic Jelle.Piepenbrock@cvut.cz <sup>2</sup> Radboud University, Nijmegen, The Netherlands

Abstract. Automated theorem provers (ATPs) are today used to attack open problems in several areas of mathematics. An ongoing project by Kinyon and Veroff uses Prover9 to search for the proof of the Abelian Inner Mapping (AIM) Conjecture, one of the top open conjectures in quasigroup theory. In this work, we improve Prover9 on a benchmark of AIM problems by neural synthesis of useful alternative formulations of the goal. In particular, we design the 3SIL (stratified shortest solution imitation learning) method. 3SIL trains a neural predictor through a reinforcement learning (RL) loop to propose correct rewrites of the conjecture that guide the search.

3SIL is first developed on a simpler, Robinson arithmetic rewriting task for which the reward structure is similar to theorem proving. There we show that 3SIL outperforms other RL methods. Next we train 3SIL on the AIM benchmark and show that the final trained network, deciding what actions to take within the equational rewriting environment, proves 70.2% of problems, outperforming Waldmeister (65.5%). When we combine the rewrites suggested by the network with Prover9, we prove 8.3% more theorems than Prover9 in the same time, bringing the performance of the combined system to 90%.

Keywords: Automated theorem proving · Machine learning

## 1 Introduction

Machine learning (ML) has recently proven its worth in a number of fields, ranging from computer vision [17], to speech recognition [15], to playing games [28,40] with *reinforcement learning* (RL) [45]. It is also increasingly applied in automated and interactive theorem proving. Learned predictors have been used for premise selection [1] in hammers [6], to improve clause selection in saturation-based theorem provers [9], to synthesize functions in higher-order logic [12], and to guide connection-tableau provers [21] and interactive theorem provers [2,5,14].

Future growth of the knowledge base of mathematics and the complexity of mathematical proofs will increase the need for proof checking and its better computer support and automation. Simultaneously, the growing complexity of software will increase the need for formal verification to prevent failure modes [10]. Automated theorem proving and mathematics will benefit from more advanced ML integration. One of the mathematical subfields that makes substantial use of automated theorem provers is the field of quasigroup and loop theory [32].

#### 1.1 Contributions

In this paper, we propose to use a neural network to suggest lemmas to the Prover9 [25] ATP system by rewriting parts of the conjecture (Sect. 2). We test our method on a dataset of theorems collected in the work on the Abelian Inner Mapping (AIM) Conjecture [24] in loop theory. For this, we use the AIMLEAP proof system [7] as a reinforcement learning environment. This setup is described in Sect. 3. For development we used a simpler Robinson arithmetic rewriting task (Sect. 4). With the insights derived from this and a comparison with other methods, we describe our own 3SIL method in Sect. 5. We use a neural network to process the state of the proving attempt, for which the architecture is described in Sect. 6. The results on the Robinson arithmetic task are described in Sect. 7.1. We show our results on the AIMLEAP proving task, both using our predictor as a stand-alone prover and by suggesting lemmas to Prover9 in Sect. 7.2. Our contributions are:


# 2 ATP and Suggestion of Lemmas by Neural Rewriting

Saturation-based ATPs make use of the *given clause* [30] algorithm, which we briefly explain as background. A problem is expressed as a conjunction of many initial clauses (i.e., the clausified axioms and the negated goal, which in the AIM dataset is always an equation). The algorithm starts with all the initial clauses in the *unprocessed set*. We then pick a clause from this set to be the given clause, move it to the *processed set*, and perform all inferences between it and the clauses in the processed set. The newly inferred clauses are added to the unprocessed set. This concludes one iteration of the algorithm, after which we pick a new given clause and repeat [23]. Typically, this approach is designed to be *refutationally complete*, i.e., the algorithm is guaranteed to eventually find a contradiction if the original goal follows from the axioms.

Fig. 1. Schematic representation of the proposed guidance method. In the first phase, we run a reinforcement learning loop to propose actions that rewrite a conjecture. This predictor is trained using the AIMLEAP proof environment. We collect the rewrites of the LHS and RHS of the conjecture. In the second phase, we add the rewrites to the ATP search input to act as guidance. In this specific example, we only rewrote the conjecture for one step, but the added guidance lemmas are in reality the product of many steps in the RL loop.
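The given-clause loop described above can be sketched generically; the clause representation, the selection heuristic, and the inference rules are left as parameters here, and all names are our own:

```python
def given_clause_loop(initial_clauses, select, infer, is_contradiction,
                      max_iterations=10_000):
    """Minimal schematic given-clause loop.  `select` picks the next
    given clause from the unprocessed set, `infer` produces all
    conclusions of a clause with the processed set, and
    `is_contradiction` detects the empty clause."""
    unprocessed = list(initial_clauses)
    processed = []
    for _ in range(max_iterations):
        if not unprocessed:
            return None                  # saturated without a contradiction
        given = select(unprocessed)
        unprocessed.remove(given)
        processed.append(given)
        for clause in infer(given, processed):
            if is_contradiction(clause):
                return clause            # proof found
            if clause not in processed and clause not in unprocessed:
                unprocessed.append(clause)
    return None
```

Real provers additionally interleave simplification, subsumption, and age/weight selection heuristics; those are exactly the places where the lemmas proposed in this work enter, as extra members of `initial_clauses`.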

This process can produce a lot of new clauses and the search space can become quite large. In this work, we modify the standard loop by adding useful lemmas to the initial clause set. These lemmas are proposed by a neural network that was trained *from zero knowledge* to rewrite the left- and right-hand sides of the initial goal to make them equal by using the axioms as the available rewrite actions. Even though the neural rewriting might not fully succeed, the rewrites produced by this process are likely to be useful as additional lemmas when added to the problem. This idea is schematically represented in Fig. 1.

## 3 AIM Conjecture and the AIMLEAP RL Environment

Automated theorem proving has been applied in the theory surrounding the Abelian Inner Mapping Conjecture, known as the AIM Conjecture. This is one of the top open conjectures in quasigroup theory. Work on the conjecture has been going on for more than a decade. Automated theorem provers use hundreds of thousands of inference steps when run on problems from this theory.

As a testbed for our machine learning and prover guidance methods we use a previously published dataset of problems generated by the AIM conjecture [7]. The dataset comes with a simple prover called AIMLEAP that can take machine learning advice.<sup>1</sup> We use this system as an RL environment. AIMLEAP keeps the state and carries out the cursor movements (the cursor determines the location of the rewrite) and rewrites that a neural predictor chooses.

<sup>1</sup> https://github.com/ai4reason/aimleap.

The AIM conjecture concerns specific structures in *loop theory* [24]. A loop is a quasigroup with an identity element. A quasigroup is a generalization of a group that need not be associative; the lack of associativity manifests in the presence of two different 'division' operators, a left division (\) and a right division (/). We briefly explain the conjecture to show the nature of the data.

For loops, three *inner mapping functions* (left-translation L, right-translation R, and the mapping T) are:

$$\begin{aligned} L(u,x,y) &:= (y\*x) \backslash (y\*(x\*u)) & \quad &T(u,x) := x \backslash (u\*x) \\ R(u,x,y) &:= ((u\*x)\*y) / (x\*y) \end{aligned}$$

These mappings can be seen as measures of the deviation from commutativity and associativity. The conjecture concerns the consequences of these three inner mapping functions forming an Abelian (commutative) group. There are two more notions, that of the *associator* function *a* and the *commutator* function *K*:

$$a(x,y,z) := (x\*(y\*z))\backslash((x\*y)\*z) \qquad\qquad K(x,y) := (y\*x)/(x\*y)$$

From these definitions, the conjecture can be stated. There are two parts to the conjecture. For both parts, the following equalities need to hold for all *u, v, x, y,* and *z* :

$$a(a(x,y,z),u,v) = 1 \qquad \qquad a(x,a(y,z,u),v) = 1 \qquad \qquad a(x,y,a(z,u,v)) = 1$$

where 1 is the identity element. These are necessary, but not sufficient for the two main parts of the conjecture. The first part of the conjecture asks whether a loop modulo its center is a group. In this context, the *center* is the set of all elements that commute with all other elements. This is the case if

$$K(a(x,y,z),u) = 1.$$

The second part of the conjecture asks whether a loop modulo its nucleus is an Abelian group. The *nucleus* is the set of elements that associate with all other elements. This is the case if

$$a(K(x,y),z,u) = 1 \qquad \qquad a(x,K(y,z),u) = 1 \qquad \qquad a(x,y,K(z,u)) = 1$$

### 3.1 The AIMLEAP RL Environment

Currently, work in this area is done using automated theorem provers such as *Prover9* [24,25]. This has led to some promising results, but the search space is enormous. The main strategy for proving the AIM conjecture thus far has been to prove weaker versions of the conjecture (using additional assumptions) and then import crucial proof steps into the proof of the stronger version. The *Prover9* theorem prover is especially suited to this approach because of its well-established *hints* mechanism [48]. The AIMLEAP dataset is derived from this *Prover9* approach and contains 3468 theorems that can be proven with the supplied definitions and lemmas [7].

There are 177 possible actions in the AIMLEAP environment [7]. We handle the proof state as a tree, with the root node being an equality node. Three actions are cursor movements, where the cursor can be moved to an argument of the current position. The other actions all rewrite the current term at the cursor position with various axioms, definitions and lemmas that hold in the AIM context. As an example, this is one of the theorems in the dataset (\ and = are part of the language):

$$T(T(T(x,T(x,y)\backslash 1),T(x,y)\backslash 1),y) = T((T(x,y)\backslash 1)\backslash 1,T(x,y)\backslash 1)$$

The task of the machine learning predictor is to process the proof state and recognize which actions are most likely to lead to a proof, meaning that the two sides of the starting equation are equal according to the AIMLEAP system. The only feedback that the environment gives is whether a proof has been found or not: there is no intermediate reward (i.e. rewards are *sparse*). The ramifications of this are further discussed in Sect. 5.1.

## 4 Rewriting in Robinson Arithmetic as an RL Task

To develop a machine learning method that can help solve equational theorem proving problems, we considered a simpler arithmetic task, which also has a tree-structured input and *a sparse reward structure*: the normalization of Robinson arithmetic expressions. The task is to normalize a mathematical expression to one specific form. This task has been implemented as a Python RL environment, which we make available.<sup>2</sup> The learning environment incorporates an existing dataset, constructed by Gauthier for RL experiments in the interactive theorem prover HOL4 [11]. Our RL setup for the task is also modeled after [11].

In more detail, the formalism that we use as an RL environment is Robinson arithmetic (RA). RA is a simple arithmetic theory. Its language contains the successor function *S*, addition *+*, multiplication *\** and one constant, 0. The theory considers only non-negative numbers, and we use only four axioms of RA. Numbers are represented by the constant 0 with the appropriate number of successor functions applied to it. The task for the agent is to rewrite an expression until it contains only nodes of the successor or 0 types. Effectively, we are asking the agent to calculate the value of the expression. As an example, S(S(0)) + S(0), representing 2 + 1, needs to be rewritten to S(S(S(0))).

The expressions are represented as a tree data structure. Within the environment, there are seven different rewrite actions available to the agent. The four axioms (equations) defining these actions are x + 0 = x, x + S(y) = S(x + y), x ∗ 0 = 0 and x ∗ S(y) = (x ∗ y) + x, where the agent can apply the equations in either direction. There is one exception: the multiplication by 0 cannot be applied from right to left, as this would require the agent to introduce a fresh

<sup>2</sup> https://github.com/learningeqtp/rewriteRL.

term which is out of scope for the current work. The place where the rewrite is applied is denoted by the location of the *cursor* in the expression tree.

In addition to the seven rewrite actions, the agent can move the cursor to one of the children of the current cursor node. This gives a total of nine actions. Moving to a child of a node with only one child counts as moving to the left child. After a rewrite action, the cursor is reset to the root of the expression. More details on the actions can be found in the RewriteRL repository.
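To make the task concrete, the following is a minimal Python sketch of the term representation and the four defining equations, written for illustration rather than taken from RewriteRL; for simplicity it normalizes leftmost-innermost instead of using a movable cursor, and the tuple encoding and function names are our own.

```python
# Expressions are nested tuples: ('0',), ('S', t), ('+', a, b), ('*', a, b).
ZERO = ('0',)

def S(t):            # successor node
    return ('S', t)

def num(n):          # build the numeral S^n(0)
    t = ZERO
    for _ in range(n):
        t = S(t)
    return t

def rewrite(t):
    """Apply one RA axiom left-to-right at the root; None if no rule matches."""
    if t[0] == '+':
        a, b = t[1], t[2]
        if b == ZERO:            # x + 0 = x
            return a
        if b[0] == 'S':          # x + S(y) = S(x + y)
            return S(('+', a, b[1]))
    if t[0] == '*':
        a, b = t[1], t[2]
        if b == ZERO:            # x * 0 = 0
            return ZERO
        if b[0] == 'S':          # x * S(y) = (x * y) + x
            return ('+', ('*', a, b[1]), a)
    return None

def normalize(t):
    """Rewrite until only successor and 0 nodes remain (leftmost-innermost)."""
    if t[0] in ('+', '*'):
        t = (t[0], normalize(t[1]), normalize(t[2]))
    if t[0] == 'S':
        t = S(normalize(t[1]))
    r = rewrite(t)
    return normalize(r) if r is not None else t

# 2 + 1 normalizes to 3, i.e. S(S(S(0)))
assert normalize(('+', num(2), num(1))) == num(3)
```

The learned agent faces the same rewriting problem, but must choose where and which rule to apply instead of following this fixed strategy.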

## 5 Reinforcement Learning Methods

This section describes the reinforcement learning methods, while Sect. 6 explains the particular neural architectures that are trained in the RL loops. We first briefly explain the approaches that we used as reinforcement learning (RL) baselines, and then describe the proposed 3SIL method in detail.

### 5.1 Reinforcement Learning Baselines

General RL Setup. For comparison, we used implementations of four established reinforcement learning baseline methods. In reinforcement learning, we consider an *agent* acting within an *environment*. The agent can take actions a from the action space A to change the state s ∈ S of the environment. The agent can be rewarded for taking certain actions in certain states, with the reward given by the *reward function* R : (S × A) → R. The behavior of the environment is given by the *state transition function* P : (S × A) → S. The agent's actions and the environment's states and rewards at each timestep t are collected in tuples $(s\_t, a\_t, r\_t)$. For a given history of an agent within an environment, we call the list of tuples $(s\_t, a\_t, r\_t)$ describing this history an *episode*. The *policy function* π : S → A allows the agent to decide which action to take. The agent's goal is to maximize the return R: the sum of discounted rewards $\sum\_{t \geq 0} \gamma^t r\_t$, where γ is a *discount factor* that controls how heavily rewards further in the future are weighted. We write $R\_t$ for the return computed only from the rewards from timestep t onward. In the end, we are thus looking for a policy function π that maximizes the sum R of (discounted) expected rewards [45].

In our setting, every proof attempt (in the AIM setting) or normalization attempt (in the Robinson arithmetic setting) corresponds to an episode. The reward structure of theorem proving is such that there is only a reward of 1 at the end of a successful episode (i.e. a proof was found in AIM). Unsuccessful episodes get a reward of 0 at every timestep t.
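The discounted return can be computed with a simple backward recursion over one episode; the following is a minimal illustration in plain Python (not code from our implementation):

```python
def discounted_returns(rewards, gamma=0.99):
    """R_t = sum_{k >= t} gamma^(k-t) * r_k, computed backwards in one pass."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# Sparse theorem-proving reward: 1 only at the end of a successful episode.
# Earlier timesteps receive geometrically discounted credit for the final proof.
rs = discounted_returns([0.0, 0.0, 0.0, 1.0], gamma=0.9)
```

For an unsuccessful episode, all rewards and hence all returns are 0, which is precisely the situation that makes value-based learning difficult here.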

A2C. The first method, *Advantage Actor-Critic* (*A2C*) [27], contains ideas on which the other three RL baseline methods build, so we go into more detail for this method while keeping the explanations of the other methods brief. For details we refer to the corresponding papers.

A2C attempts to find suitable parameters for an agent by minimizing a *loss function* consisting of two parts:

$$\mathcal{L} = \mathcal{L}\_{\text{policy}}^{\text{A2C}} + \mathcal{L}\_{\text{value}}^{\text{A2C}}$$

In addition to the policy function π, the agent has access to a *value function* V : S → R that predicts the sum of future rewards obtained from a given state. In practice, both the policy and the value function are computed by a neural network *predictor*. The parameters of the predictor are set by *stochastic gradient descent* to minimize L. The set of parameters of the predictor that defines the policy function π is named θ, while the set of parameters that defines the value function is named μ. The first part of the loss is the *policy loss*, which for one timestep has the form

$$\mathcal{L}\_{\text{policy}}^{\text{A2C}} = -\log \pi\_{\theta}(a\_t|s\_t) A(s\_t, a\_t) \,,$$

where A(s, a) is the *advantage function*. The advantage function can be formulated in multiple ways, but the simplest is $R\_t - V\_{\mu}(s\_t)$. That is to say: the advantage of an action in a certain state is the difference between the discounted rewards $R\_t$ obtained after taking that action and the value estimate of the current state.

Minimizing $\mathcal{L}\_{\text{policy}}^{\text{A2C}}$ amounts to maximizing the log probability of predicting actions that are judged by the advantage function to lead to high reward.

The value estimates $V\_{\mu}(s)$ used in the advantage function are supplied by the *value predictor* $V\_{\mu}$ with parameters μ, which is trained using the loss:

$$\mathcal{L}\_{\text{value}}^{\text{A2C}} = \frac{1}{2} \left( R\_t - V\_{\mu}(s\_t) \right)^2,$$

which minimizes the squared advantage. The logic of this is that the value estimate at timestep t, $V\_{\mu}(s\_t)$, will learn to incorporate the later rewards $R\_t$, ensuring that when the same state is seen later, the possible future reward will be taken into account. Note that the sets of parameters θ and μ are not necessarily disjoint (see Sect. 6).
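For concreteness, the two loss terms for a single timestep can be sketched numerically (a schematic illustration in plain Python; a real implementation batches this, backpropagates through the neural predictor, and treats the advantage as a constant in the policy term):

```python
import math

def a2c_loss(log_prob_action, value_estimate, return_t):
    """Single-timestep A2C loss: policy term plus value term."""
    advantage = return_t - value_estimate               # A(s_t, a_t) = R_t - V(s_t)
    policy_loss = -log_prob_action * advantage          # L_policy
    value_loss = 0.5 * (return_t - value_estimate) ** 2 # L_value
    return policy_loss + value_loss

# With sparse rewards, an accurate value estimate of 0 makes the advantage 0,
# so the policy term vanishes regardless of the action probabilities:
assert a2c_loss(math.log(0.5), 0.0, 0.0) == 0.0
```

This zero-advantage behavior is exactly the failure mode discussed for sparse-reward theorem proving.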

Note how the above equations are affected if no non-zero reward $r\_t$ is obtained at any timestep. In that case, the value function $V\_{\mu}(s\_t)$ will (correctly) estimate that every state yields 0 reward, which means that the advantage function A(s, a) will also be 0 everywhere. As a result, $\mathcal{L}\_{\text{policy}}^{\text{A2C}}$ will be 0 in most cases, leading to little or no change in the parameters of the predictor: learning will be very slow. This is the difficult aspect of the reward structure of theorem proving: there is only a reward at the end of a successful proof, and nowhere else. A possible strategy is therefore to imitate successful episodes without a value function: we would then only need to train a *policy function*, and no approximate *value function*. This is an aspect we explore in the design of our own method, 3SIL, which we explain shortly.

Compared to two-player games, such as chess and Go, for which many approaches have been tailored and successfully used [41], theorem proving has the property that it is hard to collect useful examples to learn from, as only successful proofs are likely to contain useful knowledge. In chess or Go, by contrast, one player almost always wins and the other loses, which means that we can at least learn from the difference between the two strategies used by those players. As an example, we executed 2 million random proof attempts in the AIMLEAP environment, which led to only 300 proofs to learn from, whereas in a two-player setting like chess, 2 million games would each likely produce a winner.

ACER. The second RL baseline method we tested in our experiments is ACER, *Actor-Critic with Experience Replay* [49]. This approach can make use of data from older episodes to train the current predictor. ACER applies corrections to the value estimates so that data from old episodes may be used to train the current policy. It also uses trust region policy optimization [35] to limit the size of the policy updates. This method is included as a baseline to check if using a larger replay buffer to update the parameters would be advantageous.

PPO. Our third RL baseline is the widely used *proximal policy optimization* (PPO) algorithm [36]. It restricts the size of the parameter update to avoid causing a large difference between the original predictor's behavior and the updated version's behavior. The method is related to the trust region policy optimization method mentioned above. In this way, PPO addresses the training instability of many reinforcement learning approaches. It has been used in various settings, for example complex video games [4], and its versatility makes it a natural baseline for our setting. We use the PPO algorithm with the clipped objective, as in [36].

SIL-PAAC. Our final RL baseline uses only the transitions with positive advantage to train on for a portion of the training procedure, to learn more from good episodes. This was proposed as *self-imitation learning* (SIL) [29]. To avoid confusion with the method that we are proposing, we extend the acronym to SIL-PAAC, for positive advantage actor-critic. This algorithm outperformed A2C on the sparse-reward task Montezuma's Revenge (a puzzle game). As theorem proving has a sparse reward structure, we included SIL-PAAC as a baseline. More information about the implementations for the baselines can be found in the Implementation Details section at the end of this work.

### 5.2 Stratified Shortest Solution Imitation Learning

We introduce stratified shortest solution imitation learning (3SIL) to tackle the equational theorem proving domain. It learns to explicitly imitate the actions taken during the shortest solutions found for each problem in the dataset. We do this by minimizing the cross-entropy $-\log p(a\_{\text{solution}} \mid s\_t)$ between the predictor output and the actions taken in the shortest solution. This is in contrast to the baseline methods, where value functions are used to judge the utility of decisions.

In our procedure this is not the case. Instead, our data selection builds on the assumption that shorter proofs are better in the context of theorem proving


#### Algorithm 1. CollectEpisode

Input: problem p, policy πθ, problem history H
Generate an episode by following a noisy version of πθ on p
If it is a solution, add the list of tuples (s, a) to H[p]
Keep the k shortest solutions in H[p]

#### Algorithm 2. 3SIL

Input: set of problems P, randomly initialized policy πθ, batch size B, number of batches NB, problem history H, number of warmup episodes m, number of episodes f, max epochs ME
Output: trained policy πθ, problem history H
for e = 0 to ME − 1 do
  if e = 0 then num = m else num = f
  for i = 0 to num − 1 do
    CollectEpisode(sample(P), πθ, H) (Algorithm 1)
  end for
  for i = 0 to NB − 1 do
    Sample B tuples (s, a) with uniform probability for each problem from H
    Update θ to lower $-\sum\_{b=0}^{B} \log \pi\_{\theta}(a\_b \mid s\_b)$ by gradient descent
  end for
end for

and expression normalization. In a sense, we value decisions from shorter proofs more and explicitly imitate those transitions. We keep a history H for each problem, where we store the current shortest solution (states seen and actions taken) found for that problem in the training dataset. We can also store multiple shortest solutions for each problem if there are multiple strategies for a proof (the number of solutions kept is governed by the parameter k).

During training, in the case k = 1, we sample state-action pairs from each problem's current shortest solution with equal probability (if a solution was found). To be precise, we first randomly pick a theorem for which we have a solution, and then randomly sample one transition from the shortest encountered solution. This directly counters one of the phenomena that we observed: the training examples for the baseline methods tend to be dominated by very long episodes (as they contribute more states and actions). This *stratified* sampling method ensures that problems with short proofs are represented equally in the training process.

The 3SIL algorithm is described in more detail in Algorithm 2. Sampling from a noisy version of policy πθ means that actions are sampled from the predictor-defined distribution, and in 5% of cases a random valid action is selected. This is also known as an ε-greedy policy (with ε = 0.05).
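The bookkeeping behind Algorithms 1 and 2 can be sketched as follows; this is our own minimal Python reconstruction, and `store_solution` and `sample_batch` are hypothetical names, not functions from our codebase.

```python
import random

def store_solution(H, problem, episode, k=1):
    """Keep only the k shortest solutions (lists of (state, action) pairs)."""
    H.setdefault(problem, []).append(episode)
    H[problem].sort(key=len)       # shorter proofs first
    del H[problem][k:]             # drop everything beyond the k shortest

def sample_batch(H, batch_size):
    """Stratified sampling: uniform over solved problems, not over transitions,
    so problems with short proofs are represented equally."""
    solved = [p for p in H if H[p]]
    batch = []
    for _ in range(batch_size):
        p = random.choice(solved)            # pick a solved problem uniformly
        solution = random.choice(H[p])       # one of its stored shortest proofs
        batch.append(random.choice(solution))  # one (state, action) pair
    return batch

H = {}
store_solution(H, 'thm1', [('s0', 'a0'), ('s1', 'a1'), ('s2', 'a2')])
store_solution(H, 'thm1', [('s0', 'a3')])   # a shorter proof replaces the longer one
assert H['thm1'] == [[('s0', 'a3')]]
```

The sampled batch is then used to minimize the cross-entropy between the stored actions and the predictor's output, as in Algorithm 2.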

Related Methods. Our approach is similar to the imitation learning algorithm DAGGER (Dataset Aggregation), which was used for several games [34] and modified for branch-and-bound algorithms in [16]. The behavioral cloning (BC) technique used in robotics [47] also shares some elements. 3SIL significantly differs from DAGGER and BC because it does not use an outside expert to obtain useful data, because of the stratified sampling procedure, and because of the selection of the shortest solutions for each problem in the training dataset. We include as an additional baseline an implementation of behavioral cloning (BC), where we regard proofs already encountered as coming from an expert. We minimize cross-entropy between the actions in proofs we have found and the predictions to train the predictor. For BC, there is no stratified sampling or shortest solution selection, only the minimization of cross-entropy between actions taken from recent successful solutions and the predictor's output.

Extensions. For the AIM tasks, we introduce two other techniques, *biased sampling* and *episode pruning*. In biased sampling, problems without a solution in the history are sampled 5 times more during episode collection than solved problems to accelerate progress. This was determined by testing 1, 2, 5 and 10 as sampling proportions. For episode pruning, when the agent encountered the same state twice, we prune the episode to exclude the looping before storing the episode. This helps the predictor learn to avoid these loops.
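Episode pruning can be sketched as a single pass over the episode; the following is our own minimal reconstruction of the idea, and the actual implementation may differ.

```python
def prune_episode(states, actions):
    """If a state reappears, cut out the loop between its two occurrences,
    so the stored episode contains no revisited states."""
    out_s, out_a = [], []
    for s, a in zip(states, actions):
        if s in out_s:                     # state revisited: drop the loop
            cut = out_s.index(s)
            out_s, out_a = out_s[:cut], out_a[:cut]
        out_s.append(s)
        out_a.append(a)
    return out_s, out_a

# A -> B -> A loops back to A; the detour through B is removed.
s, a = prune_episode(['A', 'B', 'A', 'C'], ['x', 'y', 'z', 'w'])
assert s == ['A', 'C'] and a == ['z', 'w']
```

Training on the pruned episode teaches the predictor the action that eventually left the repeated state, rather than the actions that caused the loop.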

## 6 Neural Architectures

The tree-structured states representing expressions occurring during the tasks are processed by a neural network. The neural network takes the tree-structured state and predicts an action that will bring the expression closer to being normalized or the theorem closer to being proven.

Fig. 2. Schematic representation of the creation of a representation of an expression (*an embedding*) using different neural network layers to represent different operations. The figure depicts the creation of a numerical representation for the Robinson arithmetic expression (S(0) + 0). Note that the successor layer and the addition layer consist of trainable parameters, for which the values are set through gradient descent.

There are two main components to the neural network we use: an *embedding* tree neural network that outputs a numerical vector representing the tree-structured proof state, and a second *processor* network that takes this vector representation of the state and outputs a distribution over the actions possible in the environment.<sup>3</sup>

Tree neural networks have been used in various settings, such as natural language processing [20] and also in Robinson arithmetic expression embedding [13]. These networks consist of smaller neural networks, each representing one of the possible functions that occur in the expressions. For example, there will be separate networks representing addition and multiplication. The cursor is a special unary operation node with its own network that we insert into the tree at the current location. For each unique constant, such as the constant 0 in RA or the identity element 1 for the AIM task, we generate a random vector (from a standard normal distribution) that will represent this leaf. In the case of the AIM task, these vectors are parameters that can be optimized during training.

At prediction time, the numerical representation of a tree is constructed by starting at the leaves of the tree, for which we can look up the generated vectors. These vectors act as input to the neural networks that represent the parent node's operation, yielding a new vector, which now represents the subtree of the parent node. The process repeats until there is a single vector for the entire tree after the root node is processed (see also Fig. 2).

The neural networks representing each operation consist of a linear transformation, a non-linearity in the form of a rectified linear unit (ReLU) and another linear transformation. In the case of binary operations, the first linear transformation will have an input dimension of *2n* and an output dimension of *n*, where *n* is the dimension of the vectors representing leaves of the tree (the *internal representation size*). The weights representing these transformations are randomly initialized at the beginning of training.
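A minimal numerical sketch of such a bottom-up embedding, using NumPy with untrained random weights in place of the trained PyTorch layers (all names here are our own, and the dimension is reduced for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8                                        # internal representation size (16/32 in the paper)
leaf = {'0': rng.normal(size=n)}             # random vector representing the constant 0

def mlp(in_dim, out_dim):
    """Linear -> ReLU -> Linear, one small network per operation."""
    W1 = rng.normal(size=(in_dim, out_dim)) * 0.1
    W2 = rng.normal(size=(out_dim, out_dim)) * 0.1
    return lambda x: np.maximum(x @ W1, 0) @ W2

# Unary operations map n -> n; binary operations map 2n -> n.
op = {'S': mlp(n, n), '+': mlp(2 * n, n), '*': mlp(2 * n, n)}

def embed(t):
    """Bottom-up embedding: leaves look up their vectors, internal nodes
    apply the network for their operation to the child embeddings."""
    if t[0] == '0':
        return leaf['0']
    if t[0] == 'S':
        return op['S'](embed(t[1]))
    return op[t[0]](np.concatenate([embed(t[1]), embed(t[2])]))

# (S(0) + 0) as a nested tuple, as in Fig. 2
v = embed(('+', ('S', ('0',)), ('0',)))
assert v.shape == (n,)
```

In the real system the weight matrices are trainable parameters set by gradient descent, and the resulting vector is fed to the processor network.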

When we have obtained a single vector embedding representing the entire tree data structure, this vector serves as the input to the *predictor* neural network, which consists of three linear layers with non-linearities (Sigmoid/ReLU) between these layers. The last layer has an output dimension equal to the number of possible actions in the environment. We obtain a probability distribution over the actions by applying the softmax function to the output of this last layer. In the cases where we also need a value prediction, there is a parallel last layer that predicts the state's value (usually referred to as a *two-headed* network [41]). The internal representation size n is set to 16 for the Robinson arithmetic experiments and to 32 for the AIM task. The number of neurons in each layer (except the last) of the predictor networks is 64.

In the AIM dataset task, an arbitrary number of variables can be introduced during the proof. These are represented by untrainable random vectors. We add a special neural network (with the same architecture as the networks representing unary operations, so from size *n* to *n*) that processes these vectors before they are

<sup>3</sup> In the reinforcement learning baselines that we use, this second *processor* network has the additional task of predicting the value of a state.

processed by the rest of the tree neural network embedding. The idea is that this neural network learns to project these new variable vectors into a subspace and that an arbitrary number of variables can be handled. The vectors are resampled at the start of each episode, so the agent cannot learn to recognize specific variables. This approach was partly inspired by the *prime* mechanism in [13], but we use separate vectors for all variables instead of building vectors sequentially. All our neural networks are implemented using the PyTorch library [31].

## 7 Experiments

We first describe our experiments on the Robinson arithmetic task, on which we designed our 3SIL approach through comparisons with other algorithms. We then train a predictor using 3SIL on the AIMLEAP loop theory dataset, which we evaluate both as a standalone prover within the RL environment and as a neural guidance mechanism for the ATP Prover9.

### 7.1 Robinson Arithmetic Dataset

Dataset Details. The Robinson arithmetic dataset [11] is split into three distinct sets, based on the number of steps it takes a fixed rewriting strategy to normalize the expression. This fixed strategy, LOPL, which stands for *left outermost proof length*, always rewrites the leftmost possible element. If this strategy takes fewer than 90 steps to solve a problem, the problem is in the *low* difficulty category. Problems with a difficulty between 90 and 130 are in the *medium* category, and a difficulty greater than 130 leads to the *high* category. The *high* dataset also contains problems the LOPL strategy could not solve within the time limit. The *low* dataset is split into a training and a testing set. We train on the *low* difficulty problems, but after training we also test on problems with higher difficulty. Because we have a difficulty measure for this dataset, we use a curriculum setup: we start by learning to normalize the expressions that the fixed strategy can normalize in a small number of steps. This setup is similar to [11].

Training Setup. The 400 problems with the lowest difficulty are the starting point. Every time an agent reaches a 95 percent success rate when evaluated on a sample of size 400 from these problems, we add 400 more difficult problems to the set of training problems P. One iteration of the *collection* and *training* phases is called an *epoch*. Agents are evaluated after every epoch. The blocks of size 400 are called *levels*. The numbers of episodes m and f are both set to 1000. For 3SIL and BC, the batch size B is 32 and the number of batches NB is 250. The baselines are configured so that the number of episodes and training transitions is at least as large as for the 3SIL/BC approaches. Episodes that take over 100 steps are stopped. ADAM [22] is used as the optimizer.
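The curriculum schedule can be sketched as follows; `next_level` is a hypothetical helper written for this illustration, not code from our implementation.

```python
def next_level(problems_by_difficulty, active, success_rate,
               block=400, threshold=0.95):
    """Unlock the next block of 400 problems once the evaluated success
    rate on the current problems reaches the threshold."""
    if success_rate >= threshold and active < len(problems_by_difficulty):
        active += block
    return problems_by_difficulty[:active], active

probs = list(range(5000))            # stand-in problems, sorted by difficulty
P, active = next_level(probs, 400, 0.97)   # 95% threshold reached: add a level
assert active == 800
```

Each unlocked block corresponds to one *level* in Fig. 3.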

Fig. 3. The level in the curriculum reached by each method. Each method was run three times. The bold line shows the mean performance and the shaded region shows the minimum and maximum performance. K is the number of proofs stored per problem.

Results on RA Curriculum. In Fig. 3, we show the progression through the training curriculum for behavioral cloning (BC), the RL methods (PPO, ACER) and two configurations of 3SIL. Behavioral cloning simply imitates actions from successful episodes. Of the RL baselines, PPO reaches the second level in one run, while ACER steadily solves the first level and in the best run solves around 80% of the second level. Neither method learns enough solutions on the second level to advance to the third. A2C and SIL-PAAC do not reach the second level, so they are left out of the plot; they do, however, learn to solve about 70–80% of the first 400 problems. From these results we conclude that the RL baselines do not perform well on this task in our experiment. We attribute this to the difficulty of learning a good value function under sparse rewards (Sect. 5.1). Our hypothesis is that because this value estimate influences the policy updates, the RL methods do not learn well on this task. Note that the two methods with a trust-region update mechanism, ACER and PPO, perform better than the methods without it. From these results, it is clear that 3SIL with one shortest proof stored, k = 1, is the best-performing configuration. It reaches the end of the training curriculum of about 5000 problems in 40 epochs. We also experimented with k = 3 and k = 4, but these were both worse than k = 2.

Generalization. While our approach works well on the training set, we must check whether the predictors generalize to unseen examples. Only the methods that reached the end of the curriculum are tested. In Table 1, we show the performance of our predictors on the three different test sets: the unseen examples from the *low* dataset and the unseen examples from the *medium* and *high* datasets. Because we expect longer solutions, the episode limits are expanded from 100 steps to 200 and 250 for the *medium* and *high* datasets, respectively. For the *low* and *medium* datasets, the second of which contains problems with more difficult solutions than the training data, the predictors solve almost all test problems. For the *high* difficulty dataset, the performance drops by at least 20 percentage points. Our method outperforms the Monte Carlo Tree Search approach used in [11] on the same datasets, which reached 0.954 on the *low* dataset with 1600 iterations and 0.786 on the *medium* dataset (no results on the *high* dataset were reported). These results indicate that this training method might be strong enough to perform well on the AIM rewriting RL task.

Table 1. Generalization with greedy evaluation on the test set for the Robinson arithmetic normalization tasks, shown as average success rate and standard deviation from 3 training runs. Generalization is high on the low and medium difficulty (training data is similar to the low difficulty dataset). With high difficulty data, performance drops.


### 7.2 AIM Conjecture Dataset

Training Setup. Finally, we train and evaluate 3SIL on the AIM Conjecture dataset. We apply 3SIL (k = 1) to train predictors in the AIMLEAP environment. Ten percent of the AIM dataset is used as a hold-out test set, not seen during training. As there is no estimate for the difficulty of the problems in terms of the actions available to the predictor, we do not use a curriculum ordering for these experiments. The number m of episodes collected before training is set to 2,000,000. These random proof attempts result in about 300 proofs. The predictor learns from these proofs and afterwards the search for new proofs is also guided by its predictions. For the AIM experiments, episodes are stopped after 30 steps in the AIMLEAP environment. The predictors are trained for 100 epochs. The number of collected episodes per epoch f is 10,000. The successful proofs are stored, and the shortest proof for each theorem is kept. NB is 500 and BS is set to 32. The number of problems with a solution in the history after each epoch of the training run is shown in Fig. 4.

Results as a Standalone Prover. After 100 epochs, about 2500 of the 3114 problems in the training dataset have a solution in their history. To test the generalization capability of the predictors, we inspect their performance on the hold-out test set problems. In Table 2 we compare the success rate of the trained predictors on the hold-out test set with three different automated theorem provers: E [37,38], Waldmeister [19] and Prover9. E is currently one of the best overall automated theorem provers [44], Waldmeister is a prover specialized in memory-efficient equational theorem proving [18] and Prover9 is the theorem prover that

Fig. 4. The number of training problems for which a solution was encountered and stored (cumulative). At the start of the training, the models rapidly collect more solutions, but after 100 epochs, the process slows down and settles at about 2500 problems with known solutions. The minimum, maximum and mean of three runs are shown.

is used for AIM conjecture research and the prover with which the dataset was generated. Waldmeister and E are the best-performing solvers in competitions for the relevant unit equality (UEQ) category [44].

Table 2. Theorem proving performance on the hold-out test set in fraction of problems solved. Means and standard deviations are the results of evaluations of 3 different predictors from 3 different training runs on the 354 unseen test set problems.


The results show that a single greedy evaluation of the predictor trying to solve the problem in the AIMLEAP environment is not as strong as the theorem proving software. However, the theorem provers got 60 s of execution time, and the execution of the predictor, including interaction with AIMLEAP, takes on average less than 1 s. We allowed the predictor setup to use 60 s, by running attempts in AIMLEAP until the time was up, sampling actions from the predictor's distribution with 5% noise, instead of using greedy execution. With this approach, the predictor setup outperforms Waldmeister.<sup>4</sup> Figure 5 shows the overlap between the problems solved by each prover. The diagram shows that each theorem prover found a few solutions that no other prover could find within

<sup>4</sup> After the initial experiments, we also evaluated Twee [42], which won the most recent UEQ track: it can prove most of the test problems in 60 s, only failing for 1 problem.

the time limit. Almost half of the test set problems that are solved at all are solved by all four systems.

Fig. 5. Venn diagram of the test set problems solved by each solver with 60 s time limit.

Results of Neural Rewriting Combined with Prover9. We also combine the predictor with *Prover9*. In this setup, the predictor modifies the starting form of the goal for a maximum of 1 s in the AIMLEAP environment. This produces new expressions on one or both sides of the equality. We then add, as lemmas, equalities between the left-hand side of the goal before the predictor's rewriting and after each rewriting step (see Fig. 1). The same is done for the right-hand side. For each problem, this procedure yields new lemmas that are added to the problem specification file that is given to *Prover9*.
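A sketch of this lemma-generation step, under the assumption that terms are plain strings and that a lemma equates the original form of a side with each rewritten form. The function names and the Prover9-style trailing period are illustrative, not the paper's actual tooling.

```python
def lemmas_from_trace(original, rewritten_forms):
    """For one side of the goal, emit one equational lemma per rewriting
    step: the side's original form equals each form the predictor produced."""
    return [f"{original} = {form}." for form in rewritten_forms if form != original]

def lemmas_for_goal(lhs, lhs_trace, rhs, rhs_trace):
    """Collect the lemmas for both sides of the equational goal; these
    would be appended to the problem specification file."""
    return lemmas_from_trace(lhs, lhs_trace) + lemmas_from_trace(rhs, rhs_trace)
```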

Table 3. Prover9 theorem proving performance on the hold-out test set when injecting lemmas suggested by the learned predictor. *Prover9* 's performance increases when using the suggested lemmas.


Table 3 shows that adding lemmas suggested by the rewriting actions of the trained predictor improves the performance of *Prover9*. Running *Prover9* for 2 s results in better performance than running it for 1 s, as expected. The combined (1 s + 1 s) system improved on *Prover9*'s 2-s performance by 12.7% (= 0.841/0.746), indicating that the predictor suggests useful lemmas. Additionally, 1 s of neural rewriting combined with 59 s of Prover9 search proves almost 8.3% (= 0.902/0.833) more theorems than Prover9 with a 60 s time limit (Table 2).

#### 7.3 Implementation Details

All experiments for the Robinson task were run on a 16-core Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60 GHz. The AIM experiments were run on a 72-core Intel(R) Xeon(R) Gold 6140 CPU @ 2.30 GHz. All calculations were done on CPU.

The PPO implementation was adapted from an existing implementation [3]. The model was updated every 2000 timesteps, and the PPO clip coefficient was set to 0.2. The learning rate was 0.002 and the discount factor γ was set to 0.99.

The ACER implementation was adapted from an available implementation [8]. The replay buffer size was 20,000. The truncation parameter was 10 and the model was updated every 100 steps. The replay ratio was set to 4. Trust region decay was set to 0.99 and the constraint was set to 1. The discount factor was set to 0.99 and the learning rate to 0.001. The off-policy minibatch size was set to 1.

The A2C and SIL implementations were based on the PyTorch actor-critic example code available in the PyTorch repository [33]. For the A2C algorithm, we experimented with two formulations of the advantage function: the 1-step lookahead estimate (r<sub>t</sub> + γV<sub>μ</sub>(s<sub>t+1</sub>)) − V<sub>μ</sub>(s<sub>t</sub>) and the R<sub>t</sub> − V<sub>μ</sub>(s<sub>t</sub>) formulation. However, we did not observe different performance, so in the end we opted for the 1-step estimate favored in the original A2C publication.

For SIL-PAAC, we implemented the SIL loss on top of the A2C implementation, together with a prioritized replay buffer with an exponent of 0.6, as in the original paper. Each epoch, 8000 transitions (250 batches of size 32) were taken from the prioritized replay buffer in the SIL step of the algorithm. The size of the prioritized replay buffer was 40,000. The critic loss weight was set to 0.01, as in the original paper.

For the 3SIL and behavioral cloning implementations, we sample 8000 transitions (250 batches of size 32) from the replay buffer or history. For behavioral cloning, we used a buffer of size 40,000.
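For concreteness, the two advantage formulations can be written down directly. This is a minimal scalar sketch; the actual implementations operate on PyTorch tensors, and the function names are mine.

```python
def one_step_advantage(r_t, v_t, v_next, gamma=0.99, terminal=False):
    """1-step lookahead estimate: (r_t + gamma * V(s_{t+1})) - V(s_t).
    At a terminal state the bootstrap term V(s_{t+1}) drops out."""
    target = r_t if terminal else r_t + gamma * v_next
    return target - v_t

def return_advantage(return_t, v_t):
    """The alternative R_t - V(s_t) formulation, where return_t is the
    discounted return actually observed from step t onward."""
    return return_t - v_t
```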
An example implementation of 3SIL can be found in the RewriteRL repository. On the Robinson arithmetic task, for 3SIL and BC, the evaluation is done greedily (always taking the highest-probability action). For the other methods, we performed experiments with both greedy and non-greedy (sampling from the predictor distribution with 5% added noise) evaluation and show the results of the best-performing setting (which in most cases was the non-greedy evaluation, except for PPO). On the AIM task, we evaluate greedily with 3SIL.

AIMLEAP expects a distance estimate for each applicable action, representing the estimated distance to a proof. This interface was converted to a reinforcement learning setup by always setting the model's chosen action to the minimum distance and all other actions to a distance larger than the maximum proof length. Only the chosen action is then carried out.
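A minimal sketch of this encoding; the function name and the choice of 0 as the minimum distance are my own.

```python
def encode_choice_as_distances(chosen, n_actions, max_proof_length):
    """Translate an RL policy's chosen action into the per-action distance
    estimates AIMLEAP expects: the minimum distance for the chosen action,
    and a distance beyond the maximum proof length for all others, so that
    only the chosen action is carried out."""
    far = max_proof_length + 1
    return [0 if i == chosen else far for i in range(n_actions)]
```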

Versions of the automated theorem provers used: version 2.5 of E [39], the Nov 2017 version of Prover9 [26], the Feb 2018 version of Waldmeister [46] and version 2.4.1 of Twee [43].

## 8 Conclusion and Future Work

Our experiments show that a neural rewriter, trained with the 3SIL method that we designed, can learn to suggest useful lemmas that assist an ATP and improve its proving performance. With the same limit of 1 min, Prover9 equipped with the suggested lemmas proved close to 8.3% more theorems. Furthermore, our 3SIL training method is powerful enough to train an equational prover from zero knowledge that can compete with hand-engineered provers such as Waldmeister. Our system on its own proves 70.2% of the unseen test problems in 60 s, while Waldmeister proves 65.5%.

In future work, we will apply our method to other equational reasoning tasks. An especially interesting research direction concerns selecting which proofs to learn from: some sub-proofs might be more general than other sub-proofs. The incorporation of graph neural networks instead of tree neural networks may improve the performance of the predictor, since in graph neural networks information not only propagates from the leaves to the root, but also through all other connections.

Acknowledgements. We would like to thank Chad Brown for his work with the AIMLEAP software. In addition, we thank Thibault Gauthier and Bartosz Piotrowski for their help with the Robinson arithmetic rewriting task and the AIM rewriting task respectively. We also thank the referees of the IJCAR conference for their useful comments.

This work was partially supported by the European Regional Development Fund under the Czech project AI&Reasoning no. CZ.02.1.01/0.0/0.0/15\_003/ 0000466 (JP, JU), Amazon Research Awards (JP, JU) and by the Czech MEYS under the ERC CZ project *POSTMAN* no. LL1902 (JP, MJ).

This article is part of the RICAIP project that has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No. 857306.

## References



# **Rensets and Renaming-Based Recursion for Syntax with Bindings**

Andrei Popescu(B)

Department of Computer Science, University of Sheffield, Sheffield, UK a.popescu@sheffield.ac.uk

**Abstract.** I introduce *renaming-enriched sets* (*rensets* for short), which are algebraic structures axiomatizing fundamental properties of renaming (also known as variable-for-variable substitution) on syntax with bindings. Rensets compare favorably in some respects with the well-known foundation based on nominal sets. In particular, renaming is a more fundamental operator than the nominal swapping operator and enjoys a simpler, equationally expressed relationship with the variable-freshness predicate. Together with some natural axioms matching properties of the syntactic constructors, rensets yield a truly minimalistic characterization of λ-calculus terms as an abstract datatype – one involving an infinite set of *unconditional equations*, referring only to the most fundamental term operators: the constructors and renaming. This characterization yields a recursion principle, which (similarly to the case of nominal sets) can be improved by incorporating Barendregt's variable convention. When interpreting syntax in semantic domains, my renaming-based recursor is easier to deploy than the nominal recursor. My results have been validated with the proof assistant Isabelle/HOL.

### **1 Introduction**

Formal reasoning about syntax with bindings is necessary for the meta-theory of logics, calculi and programming languages, and is notoriously error-prone. A great deal of research has been put into formal frameworks that make the specification of, and the reasoning about bindings more manageable.

Researchers wishing to formalize work involving syntax with bindings must choose a paradigm for representing and manipulating syntax—typically a variant of one of the "big three": nameful (sometimes called "nominal", reflecting its best-known incarnation, nominal logic [23,39]), nameless (De Bruijn) [4,13,49,51] and higher-order abstract syntax (HOAS) [19,20,28,34,35]. Each paradigm has distinct advantages and drawbacks compared with each of the others, some discussed at length, e.g., in [1,9] and [25, §8.5]. And there are also hybrid approaches, which combine some of the advantages [14,18,42,47].

A significant advantage of the nameful paradigm is that it stays close to the way one informally defines and manipulates syntax when describing systems in textbooks and research papers—where the binding variables are explicitly indicated. This can in principle ensure transparency of the formalization and allows the formalizer to focus on the high-level ideas. However, it only works if the technical challenge faced by the nameful paradigm is properly addressed: enabling the seamless definition and manipulation of concepts "up to alpha-equivalence", i.e., in such a way that the names of the bound variables are (present but nevertheless) inconsequential. This is particularly stringent in the case of recursion due to the binding constructors of terms not being free, hence not being *a priori* traversable recursively—in that simply writing some recursive clauses that traverse the constructors is not *a priori* guaranteed to produce a correct definition, but needs certain favorable conditions. The problem has been addressed by researchers in the form of tailored *nameful recursors* [23,33,39,43,56,57], which are theorems that identify such favorable conditions and, based on them, guarantee the existence of functions that recurse over the non-free constructors.

In this paper, I make a contribution to the nameful paradigm in general, and to nameful recursion in particular. I introduce *rensets*, which are algebraic structures axiomatizing the properties of renaming, also known as variable-for-variable substitution, on terms with bindings (Sect. 3). Rensets differ from nominal sets (Sect. 2.2), which form the foundation of nominal logic, by their focus on (not necessarily injective) renaming rather than swapping (or permutation). Similarly to nominal sets, rensets are pervasive: Not only do the variables and terms form rensets, but so do any container-type combinations of rensets.

While lacking the pleasant symmetry of swapping, my axiomatization of renaming has its advantages. First, renaming is more fundamental than swapping because, at an abstract axiomatic level, renaming can define swapping but not vice versa (Sect. 4). The second advantage is about the ability to define another central operator: the variable freshness predicate. While the definability of freshness from swapping is a signature trait of nominal logic, my renaming-based alternative fares even better: In rensets freshness has a simple, first-order definition (Sect. 3). This contrasts with the nominal logic definition, which involves a second-order statement about (co)finiteness of a set of variables. The third advantage is largely a consequence of the second: Rensets enriched with constructor-like operators facilitate an equational characterization of terms with bindings (using an infinite set of unconditional equations), which does not seem possible for swapping (Sect. 5.1). This produces a recursion principle (Sect. 5.2) which, like the nominal recursor, caters for Barendregt's variable convention, and in some cases is easier to apply than the nominal recursor—for example when interpreting syntax in semantic domains (Sect. 5.3).

In summary, I argue that my renaming-based axiomatization offers some benefits that strengthen the arsenal of the nameful paradigm: a simpler representation of freshness, a minimalistic equational characterization of terms, and a convenient recursion principle. My results are established with high confidence thanks to having been mechanized in Isabelle/HOL [32]. The mechanization is available [44] from Isabelle's Archive of Formal Proofs.

Here is the structure of the rest of this paper: Sect. 2 provides background on terms with bindings and on nominal logic. Section 3 introduces rensets and describes their basic properties. Section 4 establishes a formal connection to nominal sets. Section 5 discusses renset-based recursion. Section 6 discusses related work. A technical report [45] associated with this paper includes an appendix with more examples and results and more background on nominal sets.

## **2 Background**

This section recalls the terms of λ-calculus and their basic operators (Sect. 2.1), and aspects of nominal logic including nominal sets and nominal recursion (Sect. 2.2).

#### **2.1 Terms with Bindings**

I work with the paradigmatic syntax of (untyped) λ-calculus. However, my results generalize routinely to syntaxes specified by arbitrary binding signatures such as the ones in [22, §2], [39,59] or [12].

Let Var be a countably infinite set of variables, ranged over by x, y, z etc. The set Trm of λ*-terms* (or *terms* for short), ranged over by t, t<sub>1</sub>, t<sub>2</sub> etc., is defined by the grammar t ::= Vr x | Ap t<sub>1</sub> t<sub>2</sub> | Lm x t

with the proviso that terms are equated (identified) modulo alpha-equivalence (also known as naming equivalence). Thus, for example, if x ≠ z and y ≠ z then Lm x (Ap (Vr x) (Vr z)) and Lm y (Ap (Vr y) (Vr z)) are considered to be the same term. I will often omit Vr when writing terms, as in, e.g., Lm x x.

What the above specification means is (something equivalent to) the following: One first defines the set PTrm of *pre-terms* as freely generated by the grammar p ::= PVr x | PAp p<sub>1</sub> p<sub>2</sub> | PLm x p. Then one defines the alpha-equivalence relation ≡ : PTrm → PTrm → Bool inductively, proves that it is an equivalence, and defines Trm by quotienting PTrm by alpha-equivalence, i.e., Trm = PTrm/≡. Finally, one proves that the pre-term constructors are compatible with ≡, and defines the term counterparts of these constructors: Vr : Var → Trm, Ap : Trm → Trm → Trm and Lm : Var → Trm → Trm.

The above constructions are technical, but well-understood, and can be fully automated for an arbitrary syntax with bindings (not just that of λ-calculus); and tools such as the Isabelle/Nominal package [59,60] provide this automation, hiding pre-terms completely from the end user. In formal and informal presentations alike, one usually prefers to forget about pre-terms, and work with terms only. This has several advantages, including (1) being able to formalize concepts at the right abstraction level (since in most applications the naming of bound variables should be inconsequential) and (2) the renaming operator being well-behaved. However, there are some difficulties that need to be overcome when working with terms, and in this paper I focus on one of the major ones: providing recursion principles, i.e., mechanisms for defining functions by recursing over terms. This difficulty arises essentially because, unlike in the case of pre-term constructors, the binding constructor for terms is not free.
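As a concrete illustration of working modulo alpha-equivalence, the following sketch decides alpha-equivalence of pre-terms. The tuple encoding and the helper are my own; binders are compared by their nesting depth, de Bruijn style.

```python
def alpha_eq(p, q, env1=None, env2=None, depth=0):
    """Alpha-equivalence of pre-terms encoded as nested tuples:
    ('PVr', x), ('PAp', p1, p2), ('PLm', x, p)."""
    env1 = {} if env1 is None else env1
    env2 = {} if env2 is None else env2
    if p[0] != q[0]:
        return False
    if p[0] == 'PVr':
        # Bound variables must point at binders of the same depth;
        # free variables must agree by name.
        return env1.get(p[1], ('free', p[1])) == env2.get(q[1], ('free', q[1]))
    if p[0] == 'PAp':
        return (alpha_eq(p[1], q[1], env1, env2, depth)
                and alpha_eq(p[2], q[2], env1, env2, depth))
    # PLm: record both binders at the current depth (shadowing via dict update).
    return alpha_eq(p[2], q[2], {**env1, p[1]: depth},
                    {**env2, q[1]: depth}, depth + 1)
```

On the example above, Lm x (Ap (Vr x) (Vr z)) and Lm y (Ap (Vr y) (Vr z)) are identified, while terms differing in a free variable are not.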

The main characters of my paper will be (generalizations of) some common operations and relations on Trm, namely:


The free-variable and freshness operators are of course related: A variable x is fresh for a term t (i.e., x # t) if and only if it is not free in t (i.e., x ∉ FV(t)). The renaming operator [ / ] : Trm → Var → Var → Trm substitutes (in terms) *variables* for variables, not terms for variables. (But an algebraization of term-for-variable substitution is discussed in [45, Appendix D].)

#### **2.2 Background on Nominal Logic**

I will employ a formulation of nominal logic [38,39,57] that does not require any special logical foundation, e.g., axiomatic nominal set theory. For simplicity, I prefer the swapping-based formulation [38] to the equivalent permutation-based formulation—[45, Appendix C] gives details on these two alternatives.

A *pre-nominal set* is a pair A = (A, [ ∧ ]) where A is a set and [ ∧ ] : A → Var → Var → A is a function called *the swapping operator of* A, satisfying the following properties for all a ∈ A and x, x<sub>1</sub>, x<sub>2</sub>, y<sub>1</sub>, y<sub>2</sub> ∈ Var:


Given a pre-nominal set A = (A, [ ∧ ]), an element a ∈ A and a set X ⊆ Var, one says that a *is supported by* X if a[x ∧ y] = a holds for all x, y ∈ Var such that x, y ∉ X. An element a ∈ A is called *finitely supported* if there exists a finite set X ⊆ Var such that a is supported by X. A *nominal set* is a pre-nominal set A = (A, [ ∧ ]) such that every element of A is finitely supported. If A = (A, [ ∧ ]) is a nominal set and a ∈ A, then the smallest set X ⊆ Var such that a is supported by X exists; it is denoted by supp<sub>A</sub> a and called the *support of* a. One calls a variable x *fresh for* a, written x # a, if x ∉ supp<sub>A</sub> a.

An alternative, more direct definition of freshness (which is preferred, e.g., by Isabelle/Nominal [59,60]) is provided by the following proposition:

**Proposition 1.** For any nominal set A = (A, [ ∧ ]) and any x ∈ Var and a ∈ A, it holds that x # a if and only if the set {y | a[y ∧ x] ≠ a} is finite.

Given two pre-nominal sets A = (A, [ ∧ ]) and B = (B, [ ∧ ]), the set F = (A → B) of functions from A to B becomes a pre-nominal set F = (F, [ ∧ ]) by defining f[x ∧ y] to send each a ∈ A to (f(a[x ∧ y]))[x ∧ y]. F is not a nominal set because not all functions are finitely supported (though of course one obtains a nominal set by restricting to finitely supported functions).

The set of terms together with their swapping operator, (Trm, [ ∧ ]), forms a nominal set, where the support of a term is precisely its set of free variables. However, the power of nominal logic resides in the fact that not only the set of terms, but also many other sets can be organized as nominal sets—including the target domains of many functions one may wish to define on terms. This gives rise to a convenient mechanism for defining functions recursively on terms:

**Theorem 2** [39]**.** Let A = (A, [ ∧ ]) be a nominal set and let Vr<sup>A</sup> : Var → A, Ap<sup>A</sup> : A → A → A and Lm<sup>A</sup> : Var → A → A be some functions, all supported by a finite set X of variables and with Lm<sup>A</sup> satisfying the following freshness condition for binders (FCB): There exists x ∈ Var such that x ∉ X and x # Lm<sup>A</sup> x a for all a ∈ A.

Then there exists a unique function f : Trm → A that is supported by X and such that the following hold for all x ∈ Var and t<sub>1</sub>, t<sub>2</sub>, t ∈ Trm:

$$\begin{array}{ll} \text{(i)}\ f(\mathsf{Vr}\ x) = \mathsf{Vr}^{\mathcal{A}}\ x & \text{(ii)}\ f(\mathsf{Ap}\ t_{1}\ t_{2}) = \mathsf{Ap}^{\mathcal{A}}\ (f\ t_{1})\ (f\ t_{2}) \\ \text{(iii)}\ f(\mathsf{Lm}\ x\ t) = \mathsf{Lm}^{\mathcal{A}}\ x\ (f\ t)\ \text{ if } x \notin X \end{array}$$

A useful feature of nominal recursion is the support for Barendregt's famous *variable convention* [8, p. 26]: "If [the terms] t<sub>1</sub>,...,t<sub>n</sub> occur in a certain mathematical context (e.g. definition, proof), then in these terms all bound variables are chosen to be different from the free variables." The above recursion principle adheres to this convention by fixing a finite set X of variables meant to be free in the definition context and guaranteeing that the bound variables in the definitional clauses are distinct from them. Formally, the target domain operators Vr<sup>A</sup>, Ap<sup>A</sup> and Lm<sup>A</sup> are supported by X, and the clause for λ-abstraction is conditioned by the binding variable x being outside of X. (The Barendregt convention is also present in nominal logic via induction principles [39,58–60].)

## **3 Rensets**

This section introduces rensets, an alternative to nominal sets that axiomatizes renaming rather than swapping or permutation.

A *renaming-enriched set* (*renset* for short) is a pair A = (A, [ / ]) where A is a set and [ / ] : A → Var → Var → A is an operator such that the following hold for all x, x<sub>1</sub>, x<sub>2</sub>, x<sub>3</sub>, y, y<sub>1</sub>, y<sub>2</sub> ∈ Var and a ∈ A:


Let us call A the *carrier* of A and [ / ] the *renaming operator* of A. Similarly to the case of terms, we think of the elements a ∈ A as some kind of variable-bearing entities and of a[y/x] as the result of substituting x with y in a. With this intuition, the above properties are natural: Identity says that substituting a variable with itself has no effect. Idempotence acknowledges the fact that, after its renaming, a variable y is no longer there, so substituting it again has no effect. Chaining says that a chain of renamings x<sub>3</sub>/x<sub>2</sub>/x<sub>1</sub> has the same effect as the end-to-end renaming x<sub>3</sub>/x<sub>1</sub> provided there is no interference from x<sub>2</sub>, which is ensured by initially substituting x<sub>2</sub> with some other variable y. Finally, Commutativity allows the reordering of any two independent renamings.
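These properties can be checked exhaustively on the renset of variables over a finite sample of Var. The side conditions below are my reading of the informal descriptions above, so this is an illustration rather than the paper's exact axiom list.

```python
from itertools import product

def ren(a, y, x):
    """Renaming on the renset of variables: a[y/x] substitutes x with y."""
    return y if a == x else a

V = ['x1', 'x2', 'x3', 'y', 'z']
for a, x in product(V, repeat=2):
    assert ren(a, x, x) == a                  # Identity: a[x/x] = a
for a, x, y, z in product(V, repeat=4):
    if y != x:                                # Idempotence: once x is renamed
        b = ren(a, y, x)                      # away, renaming x again is a no-op
        assert ren(b, z, x) == b
for a, x1, y1, x2, y2 in product(V, repeat=5):
    if not ({x1, y1} & {x2, y2}):             # Commutativity: independent
        assert (ren(ren(a, y1, x1), y2, x2)   # renamings can be reordered
                == ren(ren(a, y2, x2), y1, x1))
for a, x1, x2, x3, y in product(V, repeat=5):
    if y not in {x1, x2, x3}:                 # Chaining, via the auxiliary y
        b = ren(a, y, x2)
        assert ren(ren(b, x2, x1), x3, x2) == ren(b, x3, x1)
```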

**Examples.** (Var, [ / ]) and (Trm, [ / ]), the sets of variables and terms with the standard renaming operator on them, form rensets. Moreover, given any functor F on the category of sets and a renset A = (A, [ / ]), let us define the renset F A = (F A, [ / ]) as follows: for any k ∈ F A and x, y ∈ Var, k[x/y] = F ([x/y]) k, where the last occurrence of F refers to the action of the functor on morphisms. This means that one can freely build new rensets from existing ones using container types (which are particular kinds of functors)—e.g., lists, sets, trees etc. Another way to put it: Rensets are closed under datatype and codatatype constructions [55].

In what follows, let us fix a renset A = (A, [ / ]). One can define the notion of freshness of a variable for an element a ∈ A in the style of nominal logic. But the next proposition shows that simpler formulations are available.

**Proposition 3.** The following are equivalent:

(1) The set {y ∈ Var | a[y/x] ≠ a} is finite.

(2) a[y/x] = a for all y ∈ Var.
(3) a[y/x] = a for some y ∈ Var − {x}.

Let us define the predicate # : Var → A → Bool as follows: x # a, read x *is fresh for* a, if any of Proposition 3's equivalent properties holds.

Thus, points (1)–(3) above are three alternative formulations of x # a, all referring to the lack of effect of substituting y for x, expressed as a[y/x] = a: namely that this phenomenon affects (1) all but a finite number of variables y, (2) all variables y, or (3) some variable y ≠ x. The first formulation is the most complex of the three—it is the nominal definition, but using renaming instead of swapping. The other two formulations do not have counterparts in nominal logic, essentially because swapping is not as "efficient" as renaming at exposing freshness. In particular, (3) does not have a nominal counterpart because there is no single-swapping litmus test for freshness. The closest we can get to property (3) in a nominal set is the following: x is fresh for a if and only if a[y ∧ x] = a holds for some fresh y—but this needs freshness to explain freshness!
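Characterization (3) gives a particularly cheap freshness test. Instantiated to the renset of variables, where x # a amounts to x ≠ a, it can be sketched as follows; the finite `sample` standing in for the infinite Var is an assumption of this sketch.

```python
def ren(a, y, x):
    """Renaming on the renset of variables: a[y/x] substitutes x with y."""
    return y if a == x else a

def fresh(x, a, sample):
    """Freshness via Proposition 3(3): x # a iff a[y/x] = a for some y != x.
    `sample` must contain at least one variable other than x."""
    return any(ren(a, y, x) == a for y in sample if y != x)
```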

**Examples (continued).** For the rensets of variables and terms, freshness defined as above coincides with the expected operators: distinctness in the case of variables and standard freshness in the case of terms. And applying the definition of freshness to rensets obtained using finitary container types has similarly intuitive outcomes; for example, the freshness of a variable x for a list of items [a<sub>1</sub>,...,a<sub>n</sub>] means that x is fresh for each item a<sub>i</sub> in the list.

Freshness satisfies some intuitive properties, which can be easily proved from its definition and the renset axioms. In particular, point (2) of the next proposition is the freshness-based version of the Chaining axiom.

**Proposition 4.** The following hold:


## **4 Connection to Nominal Sets**

So far I have focused on consequences of the purely equational theory of rensets, without making any assumptions about cardinality. But after additionally postulating a nominal-style finite support property, one can show that rensets give rise to nominal sets—which is what I will do in this section.

Let us say that a renset A = (A, [ / ]) has the *Finite Support* property if, for all a ∈ A, the set {x ∈ Var | ¬ x # a} is finite.

Let A = (A, [ / ]) be a renset satisfying Finite Support. Let us define the swapping operator [ ∧ ] : A → Var → Var → A as follows: a[x<sub>1</sub> ∧ x<sub>2</sub>] = a[y/x<sub>1</sub>][x<sub>1</sub>/x<sub>2</sub>][x<sub>2</sub>/y], where y is a variable that is fresh for all the involved items, namely y ∉ {x<sub>1</sub>, x<sub>2</sub>} and y # a. Indeed, this is how one would define swapping from renaming on terms: using a fresh auxiliary variable y, and exploiting that such a fresh y exists and that its choice is immaterial for the end result. The next lemma shows that this style of definition also works abstractly, i.e., all it needs are the renset axioms plus Finite Support.
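Instantiated to the renset of variables (where y # a is simply y ≠ a), the three-step construction can be sketched as follows; the `candidates` pool for picking the fresh y is my own device.

```python
def ren(a, y, x):
    """Renaming on the renset of variables: a[y/x] substitutes x with y."""
    return y if a == x else a

def swap(a, x1, x2, candidates):
    """a[x1 ∧ x2] = a[y/x1][x1/x2][x2/y], with y outside {x1, x2} and fresh
    for a; `candidates` is a finite pool of variables to pick y from."""
    y = next(v for v in candidates if v not in {x1, x2} and v != a)
    b = ren(a, y, x1)     # park x1 at the fresh y
    b = ren(b, x1, x2)    # send x2 to x1
    return ren(b, x2, y)  # send the parked value (the old x1) to x2
```

As expected, the choice of y is immaterial: any candidate outside {x1, x2} that is fresh for a yields the same result.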

**Lemma 5.** The following hold for all x<sub>1</sub>, x<sub>2</sub> ∈ Var and a ∈ A:


And one indeed obtains an operator satisfying the nominal axioms:

**Proposition 6.** If (A, [ / ]) is a renset satisfying Finite Support, then (A, [ ∧ ]) is a nominal set. Moreover, (A, [ / ]) and (A, [ ∧ ]) have the same notion of freshness, in that the freshness operator defined from renaming coincides with that defined from swapping.

The above construction is functorial, as I detail next. Given two nominal sets A = (A, [ ∧ ]) and B = (B, [ ∧ ]), a *nominal morphism* f : A → B is a function f : A → B with the property that it commutes with swapping, in that (f a)[x ∧ y] = f(a[x ∧ y]) for all a ∈ A and x, y ∈ Var. Nominal sets and nominal morphisms form a category that I will denote by *Nom*. Similarly, let us define a morphism f : A → B between two rensets A = (A, [ / ]) and B = (B, [ / ]) to be a function f : A → B that commutes with renaming, yielding the category *Sbs* of rensets. Let us write *FSbs* for the full subcategory of *Sbs* given by rensets that satisfy Finite Support. Let us define F : *FSbs* → *Nom* to be an operator on objects and morphisms that sends each finite-support renset to the nominal set constructed from it as described above, and sends each renset morphism to itself.

**Theorem 7.** F is a functor between *FSbs* and *Nom* which is injective on objects and full and faithful (i.e., bijective on morphisms).

One may ask whether it is also possible to make the trip back: from nominal sets to rensets. The answer is negative, at least if one wants to retain the same notion of freshness, i.e., have the freshness predicate defined in the nominal set be identical to the one defined in the resulting renset. This is because swapping preserves the cardinality of the support, whereas renaming must be allowed to change it since it might perform a non-injective renaming. The following example captures this idea:

**Counterexample.** Let A = (A, [ ∧ ]) be a nominal set such that all elements of A have their support consisting of exactly two variables, x and y (with x ≠ y). (For example, A can be the set of all terms with these free variables—this is indeed a nominal subset of the term nominal set because it is closed under swapping.) Assume for a contradiction that [ / ] is an operation on A that makes (A, [ / ]) a renset with its induced freshness operator equal to that of A. Then, by the definition of A, a[y/x] needs to have exactly two non-fresh variables. But this is impossible, since by Proposition 4(3), all the variables different from y (including x) must be fresh for a[y/x]. In particular, A is not in the image of the functor F : *FSbs* → *Nom*, which is therefore not surjective on objects.

Thus, at an abstract algebraic level renaming can define swapping, but not the other way around. This is not too surprising, since swapping is fundamentally bijective whereas renaming is not; but it further validates our axioms for renaming, highlighting their ability to define a well-behaved swapping.

### **5 Recursion Based on Rensets**

Proposition 3 shows that, in rensets, renaming can define freshness using only equality and universal or existential quantification over variables—without needing any cardinality condition, unlike in the case of swapping. As I am about to discuss, this forms the basis of a characterization of terms as the initial algebra of an equational theory (Sect. 5.1) and of an expressive recursion principle (Sect. 5.2) that fares better than the nominal one for interpretations in semantic domains (Sect. 5.3).

#### **5.1 Equational Characterization of the Term Datatype**

Rensets contain elements that are "term-like" inasmuch as there is a renaming operator on them satisfying familiar properties of renaming on terms. This similarity with terms can be strengthened by enriching rensets with operators whose arities match those of the term constructors.

A *constructor-enriched renset* (*CE renset* for short) is a tuple A = (A, [ / ], Vr^A, Ap^A, Lm^A) where:

– (A, [ / ]) is a renset
– Vr^A : Var → A, Ap^A : A → A → A and Lm^A : Var → A → A are functions

such that the following hold for all a, a1, a2 ∈ A and x, y, z ∈ Var:

$$\begin{array}{l} \text{(S1) } (\mathsf{Vr}^{\mathcal{A}} \, x)[y/z] = \mathsf{Vr}^{\mathcal{A}}(x[y/z])\\ \text{(S2) } (\mathsf{Ap}^{\mathcal{A}} \, a\_1 \, a\_2)[y/z] = \mathsf{Ap}^{\mathcal{A}}(a\_1[y/z]) \, (a\_2[y/z])\\ \text{(S3) if } x \notin \{y, z\} \text{ then } (\mathsf{Lm}^{\mathcal{A}} \, x \, a)[y/z] = \mathsf{Lm}^{\mathcal{A}} \, x \, (a[y/z])\\ \text{(S4) } (\mathsf{Lm}^{\mathcal{A}} \, x \, a)[y/x] = \mathsf{Lm}^{\mathcal{A}} \, x \, a\\ \text{(S5) if } z \neq y \text{ then } \mathsf{Lm}^{\mathcal{A}} \, x \, (a[z/y]) = \mathsf{Lm}^{\mathcal{A}} \, y \, (a[z/y][y/x]) \end{array}$$

Let us call Vr^A, Ap^A, Lm^A the *constructors* of A. (S1)–(S3) express the constructors' commutation with renaming (with a capture-avoidance proviso in the case of (S3)), (S4) the lack of effect of substituting for a bound variable, and (S5) the possibility of renaming a bound variable without changing the abstracted item (where the inner renaming of z ≠ y for y ensures the freshness of the "new name" y, hence its lack of interference with the other names in the "term-like" entity where the renaming takes place). All these are well known to hold for terms:

**Example.** Terms with renaming and the constructors, namely (Trm, [ / ], Vr, Ap, Lm), form a CE renset, which will be denoted by 𝒯rm.
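To make this concrete, here is a small executable sketch in Python of raw λ-terms with capture-avoiding renaming, on which instances of the CE renset axioms can be checked, reading equality as alpha-equivalence. The function names (`rn`, `alpha_eq`, `fresh`) are my own illustration, not part of the paper's Isabelle formalization:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Vr:           # variable injection, Vr : Var -> Trm
    x: str

@dataclass(frozen=True)
class Ap:           # application, Ap : Trm -> Trm -> Trm
    t1: object
    t2: object

@dataclass(frozen=True)
class Lm:           # lambda-abstraction, Lm : Var -> Trm -> Trm
    x: str
    t: object

def variables(t):
    """All variable names occurring in t (free or bound)."""
    if isinstance(t, Vr):
        return {t.x}
    if isinstance(t, Ap):
        return variables(t.t1) | variables(t.t2)
    return {t.x} | variables(t.t)

def fresh(avoid):
    """Some variable name not in the finite set `avoid`."""
    i = 0
    while f"v{i}" in avoid:
        i += 1
    return f"v{i}"

def rn(t, y, z):
    """Capture-avoiding renaming t[y/z]: replace free occurrences of z by y."""
    if isinstance(t, Vr):
        return Vr(y) if t.x == z else t
    if isinstance(t, Ap):
        return Ap(rn(t.t1, y, z), rn(t.t2, y, z))
    if t.x == z:      # z is bound here, so there are no free occurrences below
        return t
    if t.x == y:      # the binder would capture y: rename it to a fresh name first
        w = fresh(variables(t) | {y, z})
        return Lm(w, rn(rn(t.t, w, t.x), y, z))
    return Lm(t.x, rn(t.t, y, z))

def alpha_eq(t, s):
    """Alpha-equivalence: rename both bound variables to a common fresh one."""
    if isinstance(t, Vr) and isinstance(s, Vr):
        return t.x == s.x
    if isinstance(t, Ap) and isinstance(s, Ap):
        return alpha_eq(t.t1, s.t1) and alpha_eq(t.t2, s.t2)
    if isinstance(t, Lm) and isinstance(s, Lm):
        w = fresh(variables(t) | variables(s))
        return alpha_eq(rn(t.t, w, t.x), rn(s.t, w, s.x))
    return False

# (S4): substituting for a bound variable has no effect
a = Lm("x", Ap(Vr("x"), Vr("u")))
assert alpha_eq(rn(a, "y", "x"), a)

# (S5) with z != y: renaming the bound variable after a [z/y] renaming
assert alpha_eq(Lm("x", rn(Vr("x"), "z", "y")),
                Lm("y", rn(rn(Vr("x"), "z", "y"), "y", "x")))
```

The asserts check single instances only; the axioms themselves quantify over all terms and variables.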

As it turns out, the CE renset axioms capture exactly the term structure 𝒯rm, via initiality. The notion of *CE substitutive morphism* f : A → B between two CE rensets A = (A, [ / ], Vr^A, Ap^A, Lm^A) and B = (B, [ / ], Vr^B, Ap^B, Lm^B) is the expected one: a function f : A → B that is a substitutive morphism and also commutes with the constructors. Let us write *Sbs*CE for the category of CE rensets and CE substitutive morphisms.

**Theorem 8.** 𝒯rm is the initial CE renset, i.e., the initial object in *Sbs*CE.

*Proof Idea.* Let A = (A, [ / ], Vr^A, Ap^A, Lm^A) be a CE renset. Instead of directly going after a function f : Trm → A, one first inductively defines a relation R : Trm → A → Bool, with inductive clauses reflecting the desired commutation with the constructors, e.g., a clause stating that R t a implies R (Lm x t) (Lm^A x a). It suffices to prove that R is total, functional and preserves renaming, since this allows one to define a constructor- and renaming-preserving function (a morphism) f by taking f t to be the unique a with R t a.

Proving that R is total is easy by standard induction on terms. Proving the other two properties, namely functionality and preservation of renaming, is more elaborate and requires proving them simultaneously with a third property: that R preserves freshness. The simultaneous three-property proof follows by a form of "substitutive induction" on terms: given a predicate φ : Trm → Bool, to show ∀t ∈ Trm. φ t it suffices to show the following: (1) ∀x ∈ Var. φ (Vr x), (2) ∀t1, t2 ∈ Trm. φ t1 & φ t2 → φ (Ap t1 t2), and (3) ∀x ∈ Var, t ∈ Trm. (∀s ∈ Trm. Con[ / ] t s → φ s) → φ (Lm x t), where Con[ / ] t s means that t is connected to s by a chain of renamings.

Roughly speaking, R turns out to be functional because the λ-abstraction operator on the "term-like" inhabitants of A is, thanks to the CE renset axioms, at least as non-injective as (i.e., identifies at least as many items as) the λ-abstraction operator on terms.

Theorem 8 is the central result of this paper, from both a practical and a theoretical perspective. Practically, it enables a useful form of recursion on terms (as I will discuss in the following sections). Theoretically, it characterizes terms as the initial algebra of an equational theory involving only the most fundamental term operations, namely the constructors and renaming. The equational theory consists of the axioms of CE rensets (i.e., those of rensets plus (S1)–(S5)), which form an infinite set of unconditional equations—for example, axiom (S5) gives one equation for each pair of distinct variables y, z.

It is instructive to compare this characterization with the one offered by nominal logic, namely by Theorem 2. To do this, one first needs a lemma:

**Lemma 9.** Let f : A → B be a function between two nominal sets A = (A, [ ∧ ]) and B = (B, [ ∧ ]), and let X be a set of variables. Then f is supported by X if and only if f(a[x ∧ y]) = (f a)[x ∧ y] for all a ∈ A and x, y ∈ Var \ X.

Now Theorem <sup>2</sup> (with the variable avoidance set <sup>X</sup> taken to be <sup>∅</sup>) can be rephrased as an initiality statement, as I describe below.

Let us define a *constructor-enriched nominal set* (*CE nominal set*) to be any tuple A = (A, [ ∧ ], Vr^A, Ap^A, Lm^A) where (A, [ ∧ ]) is a nominal set and Vr^A : Var → A, Ap^A : A → A → A, Lm^A : Var → A → A are operators on A such that the following properties hold for all a, a1, a2 ∈ A and x, y, z ∈ Var:

(N1) (Vr^A x)[y ∧ z] = Vr^A(x[y ∧ z])
(N2) (Ap^A a1 a2)[y ∧ z] = Ap^A(a1[y ∧ z]) (a2[y ∧ z])
(N3) (Lm^A x a)[y ∧ z] = Lm^A (x[y ∧ z]) (a[y ∧ z])
(N4) x # Lm^A x a, i.e., {y ∈ Var | (Lm^A x a)[y ∧ x] ≠ Lm^A x a} is finite.
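For comparison with the renaming axioms, a swapping operator on raw terms and the behavior behind (N3) and (N4) can be sketched in Python as follows. The names (`swap`, `alpha_eq`) are my own illustration; equality of abstractions is read modulo alpha:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Vr:
    x: str

@dataclass(frozen=True)
class Ap:
    t1: object
    t2: object

@dataclass(frozen=True)
class Lm:
    x: str
    t: object

def variables(t):
    """All variable names occurring in t (free or bound)."""
    if isinstance(t, Vr):
        return {t.x}
    if isinstance(t, Ap):
        return variables(t.t1) | variables(t.t2)
    return {t.x} | variables(t.t)

def fresh(avoid):
    """Some variable name not in the finite set `avoid`."""
    i = 0
    while f"v{i}" in avoid:
        i += 1
    return f"v{i}"

def swap(t, x, y):
    """Swapping t[x ^ y]: exchange x and y everywhere, bound occurrences included."""
    sw = lambda v: y if v == x else (x if v == y else v)
    if isinstance(t, Vr):
        return Vr(sw(t.x))
    if isinstance(t, Ap):
        return Ap(swap(t.t1, x, y), swap(t.t2, x, y))
    return Lm(sw(t.x), swap(t.t, x, y))

def alpha_eq(t, s):
    """Alpha-equivalence, via swapping with a common fresh name."""
    if isinstance(t, Vr) and isinstance(s, Vr):
        return t.x == s.x
    if isinstance(t, Ap) and isinstance(s, Ap):
        return alpha_eq(t.t1, s.t1) and alpha_eq(t.t2, s.t2)
    if isinstance(t, Lm) and isinstance(s, Lm):
        w = fresh(variables(t) | variables(s))
        return alpha_eq(swap(t.t, w, t.x), swap(s.t, w, s.x))
    return False

# (N3): swapping commutes with the binder, acting on the bound name too
t = Lm("x", Ap(Vr("x"), Vr("z")))
assert swap(t, "y", "z") == Lm("x", Ap(Vr("x"), Vr("y")))

# (N4) on terms: (Lm x a)[y ^ x] is alpha-equal to Lm x a for all y outside
# a finite set (here, for any y not free in the abstraction)
assert alpha_eq(swap(Lm("x", Vr("x")), "y", "x"), Lm("x", Vr("x")))
```

Note how, unlike renaming, `swap` is a bijection on terms, which is exactly why it preserves the cardinality of the support.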

The notion of *CE nominal morphism* is the expected extension of that of nominal morphism: a function that commutes with swapping and the constructors. Let *Nom*CE be the category of CE nominal sets and CE nominal morphisms.

**Theorem 10** ([39], rephrased)**.** (Trm, [ ∧ ], Vr, Ap, Lm) is the initial CE nominal set, i.e., the initial object in *Nom*CE.

The above theorem indeed corresponds exactly to Theorem 2 with X = ∅.


Unlike the renaming-based characterization of terms (Theorem 8), the nominal logic characterization (Theorem 10) is not purely equational. This is due to a combination of two factors: (1) two of the conditions ((N4) and Finite Support) refer to freshness, and (2) freshness cannot be expressed equationally from swapping. The problem seems fundamental, in that a nominal-style characterization does not seem to be expressible purely equationally. By contrast, while the freshness idea is implicit in the CE renset axioms, the freshness predicate itself is absent from Theorem 8.

#### **5.2 Barendregt-Enhanced Recursion Principle**

While Theorem 8 already gives a recursion principle, it is possible to improve it by incorporating Barendregt's variable convention (in the style of Theorem 2):

**Theorem 11.** Let X be a finite set of variables, (A, [ / ]) a renset, and Vr^A : Var → A, Ap^A : A → A → A and Lm^A : Var → A → A functions that satisfy clauses (S1)–(S5) from the definition of CE rensets, but only under the assumption that x, y, z ∉ X. Then there exists a unique function f : Trm → A such that the following hold:

$$\begin{array}{ll} \text{(i) } f\,(\mathsf{Vr}\,x) = \mathsf{Vr}^{\mathcal{A}}\,x & \text{(ii) } f\,(\mathsf{Ap}\,t_1\,t_2) = \mathsf{Ap}^{\mathcal{A}}\,(f\,t_1)\,(f\,t_2)\\ \text{(iii) } f\,(\mathsf{Lm}\,x\,t) = \mathsf{Lm}^{\mathcal{A}}\,x\,(f\,t)\ \text{if}\ x\notin X & \text{(iv) } f\,(t[y/z]) = (f\,t)[y/z]\ \text{if}\ y,z\notin X \end{array}$$

*Proof Idea.* The constructions in the proof of Theorem 8 can be adapted to avoid clashing with the finite set of variables X. For example, the clause for λ-abstraction in the inductive definition of the relation R becomes: if x ∉ X and R t a then R (Lm x t) (Lm^A x a); and preservation of renaming and freshness are also formulated to avoid X. Totality is still ensured thanks to the possibility of renaming bound variables—in terms and inhabitants of A alike (via the modified axiom (S5)).

The above theorem says that if the structure A is "almost" a CE renset, save for additional restrictions involving the avoidance of X, then there exists a unique "almost"-morphism, satisfying the CE substitutive morphism conditions restricted so that the bound and renaming-participating variables avoid X. It is the renaming-based counterpart of the nominal Theorem 2.

Regarding the relative expressiveness of these two recursion principles (Theorems 11 and 2), it seems difficult to find an example that is definable by one but not by the other. In particular, my principle can seamlessly define standard nominal examples [39,40] such as the length of a term, the counting of λ-abstractions or of free-variable occurrences, and term-for-variable substitution—[45, Appendix A] gives details. However, as I am about to discuss, I found an important class of examples where my renaming-based principle is significantly easier to deploy: that of interpreting syntax in semantic domains.
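As one concrete illustration of such definitions (a sketch of my own in Python, not taken from [45]): the size of a term arises from Theorem 8 by taking the carrier A to be the natural numbers with the trivial renaming a[y/z] = a, under which axioms (S1)–(S5) hold because the constructor operations below ignore variable names; the unique morphism is then the size function, automatically well defined on alpha-classes:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Vr:
    x: str

@dataclass(frozen=True)
class Ap:
    t1: object
    t2: object

@dataclass(frozen=True)
class Lm:
    x: str
    t: object

# Target CE renset: carrier = natural numbers, renaming acts trivially.
rnA = lambda a, y, z: a          # a[y/z] = a, so (S1)-(S5) hold trivially
VrA = lambda x: 1                # ignores the variable name
ApA = lambda a1, a2: 1 + a1 + a2
LmA = lambda x, a: 1 + a         # ignores the bound name

def size(t):
    """The unique CE substitutive morphism Trm -> A given by Theorem 8."""
    if isinstance(t, Vr):
        return VrA(t.x)
    if isinstance(t, Ap):
        return ApA(size(t.t1), size(t.t2))
    return LmA(t.x, size(t.t))

assert size(Lm("x", Ap(Vr("x"), Vr("x")))) == 4
# well-definedness on alpha-classes: bound names do not matter
assert size(Lm("x", Vr("x"))) == size(Lm("y", Vr("y")))
```

Because the renaming on A is trivial, checking the CE renset axioms amounts to observing that `VrA`, `ApA` and `LmA` never inspect their variable arguments.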

#### **5.3 Extended Example: Semantic Interpretation**

Semantic interpretations, also known as denotations (or denotational semantics), are pervasive in the meta-theory of logics and λ-calculi, for example when interpreting first-order logic (FOL) formulas in FOL models, or untyped or simply-typed λ-calculus or higher-order logic terms in specific models (such as full-frame or Henkin models). In what follows, I will focus on λ-terms and Henkin models, but the ideas discussed apply broadly to any kind of statically scoped interpretation of terms or formulas involving binders.

Let D be a set, and ap : D → D → D and lm : (D → D) → D operators modeling semantic notions of application and abstraction. An environment will be a function ξ : Var → D. Given x, y ∈ Var and d, e ∈ D, let us write ξ⟨x := d⟩ for ξ updated with value d for x (i.e., acting like ξ on all variables except for x, where it returns d); and let us write ξ⟨x := d, y := e⟩ instead of ξ⟨x := d⟩⟨y := e⟩.

Say one wants to interpret terms in the semantic domain D in the context of environments, i.e., define the function sem : Trm → (Var → D) → D that maps syntactic to semantic constructs; e.g., one would like to have:

sem (Lm x (Ap (Vr x) (Vr x))) ξ = lm (d → ap d d)

where I use <sup>d</sup> → ... to describe functions in <sup>D</sup> <sup>→</sup> <sup>D</sup>, e.g., <sup>d</sup> → ap d d is the function sending every <sup>d</sup> <sup>∈</sup> <sup>D</sup> to ap d d.

The definition should therefore naturally go recursively by the clauses:

(1) sem (Vr x) ξ = ξ x
(2) sem (Ap t1 t2) ξ = ap (sem t1 ξ) (sem t2 ξ)
(3) sem (Lm x t) ξ = lm (d → sem t (ξ⟨x := d⟩))

Of course, since Trm is not a free datatype, these clauses do not work out of the box, i.e., do not form a definition (yet)—this is where binding-aware recursion principles such as Theorems 11 and 2 could step in. I will next try them both.

The three clauses above already determine constructor operations Vr^I, Ap^I and Lm^I on the set of interpretations, I = (Var → D) → D, namely:

Vr^I x = (ξ → ξ x)
Ap^I i1 i2 = (ξ → ap (i1 ξ) (i2 ξ))
Lm^I x i = (ξ → lm (d → i (ξ⟨x := d⟩)))

To apply the renaming-based recursion principle from Theorem 11, one must further define a renaming operator on I. Since the only chance to successfully apply this principle is if sem commutes with renaming, the definition should be inspired by the question: how can sem (t[y/x]) be determined from sem t, y and x? The answer is (4) sem (t[y/x]) ξ = (sem t) (ξ⟨x := ξ y⟩), yielding an operator [ / ]^I : I → Var → Var → I defined by i[y/x]^I ξ = i (ξ⟨x := ξ y⟩).

It is not difficult to verify that I = (I, [ / ]^I, Vr^I, Ap^I, Lm^I) is a CE renset—for example, Isabelle's automatic methods discharge all the goals. This means Theorem 11 (or, since here one does not need Barendregt's variable convention, already Theorem 8) is applicable, and gives us a unique function sem that commutes with the constructors, i.e., satisfies clauses (1)–(3) (which are instances of clauses (i)–(iii) from Theorem 11), and additionally commutes with renaming, i.e., satisfies clause (4) (which is an instance of clause (iv) from Theorem 11).
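The construction can also be played out concretely. Below is a minimal Python sketch of my own (not the paper's Isabelle development): Python callables stand in for the domain D, with ap d e = d(e) and lm the identity; `upd` encodes the update ξ⟨x := d⟩; `sem` implements clauses (1)–(3); `rnI` is the renaming operator on I; and an instance of clause (4) is checked. The simplified `rn` assumes bound names are distinct from the renaming variables:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Vr:
    x: str

@dataclass(frozen=True)
class Ap:
    t1: object
    t2: object

@dataclass(frozen=True)
class Lm:
    x: str
    t: object

def rn(t, y, z):
    """Renaming t[y/z]; for simplicity, bound names are assumed distinct from y."""
    if isinstance(t, Vr):
        return Vr(y) if t.x == z else t
    if isinstance(t, Ap):
        return Ap(rn(t.t1, y, z), rn(t.t2, y, z))
    return t if t.x == z else Lm(t.x, rn(t.t, y, z))

# Semantic domain D: Python callables, with application and abstraction
ap = lambda d, e: d(e)
lm = lambda f: f

def upd(xi, x, d):
    """Environment update xi<x := d>."""
    return lambda v: d if v == x else xi(v)

def sem(t, xi):
    """Interpretation clauses (1)-(3)."""
    if isinstance(t, Vr):
        return xi(t.x)                                   # (1)
    if isinstance(t, Ap):
        return ap(sem(t.t1, xi), sem(t.t2, xi))          # (2)
    return lm(lambda d: sem(t.t, upd(xi, t.x, d)))       # (3)

def rnI(i, y, x):
    """Renaming operator on I = (Var -> D) -> D: i[y/x]^I xi = i(xi<x := xi y>)."""
    return lambda xi: i(upd(xi, x, xi(y)))

# Checking an instance of clause (4): sem (t[y/x]) xi = (sem t) (xi<x := xi y>)
xi = lambda v: {"y": 7}.get(v, 0)
t = Ap(Lm("x", Vr("x")), Vr("x"))            # (Lm x. x) x
assert sem(rn(t, "y", "x"), xi) == sem(t, upd(xi, "x", xi("y"))) == 7
```

Of course, equality of arbitrary elements of D is undecidable here (they are functions), so the check is restricted to ground results; the general verification is exactly what the Isabelle proof discharges.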

On the other hand, to apply nominal recursion for defining sem, one must identify a swapping operator on I. Similarly to the case of renaming, this identification process is guided by the goal of determining sem (t[x ∧ y]) from sem t, x and y, leading to (4') sem (t[x ∧ y]) ξ = sem t (ξ⟨x := ξ y, y := ξ x⟩), which yields the definition of [ ∧ ]^I by i[x ∧ y]^I ξ = i (ξ⟨x := ξ y, y := ξ x⟩). However, as pointed out by Pitts [39, §6.3] (in the slightly different context of interpreting simply-typed λ-calculus), the nominal recursor (Theorem 2) does *not* directly apply (hence neither does my reformulation based on CE nominal sets, Theorem 10). This is because, in my terminology, the structure I = (I, [ ∧ ]^I, Vr^I, Ap^I, Lm^I) is not a CE nominal set. The problematic condition is FCB (the freshness condition for binders), requiring that x #^I (Lm^I x i) holds for all i ∈ I. Expanding the definition of #^I (the nominal definition of freshness from swapping, recalled in Sect. 2.2) and the definitions of [ ∧ ]^I and Lm^I, one can see that x #^I (Lm^I x i) means the following:

lm (d → i (ξ⟨x := ξ y, y := ξ x⟩⟨x := d⟩)) = lm (d → i (ξ⟨x := d⟩)), i.e., lm (d → i (ξ⟨x := d, y := ξ x⟩)) = lm (d → i (ξ⟨x := d⟩)), holds for all but a finite number of variables y.

The only chance for the above to be true is if i, when applied to an environment, ignores the value of y in that environment for all but a finite number of variables y; in other words, i only analyzes the value of a finite number of variables in that environment—but this is not guaranteed to hold for arbitrary elements i ∈ I. To repair this, Pitts engages in a form of induction-recursion [17], carving out from I a smaller domain that is still large enough to interpret all terms, then proving that both FCB and the other axioms hold for this restricted domain. It all works out in the end, but the technicalities are quite involved.

Although FCB is not required by the renaming-based principle, note incidentally that this condition would actually be true (and immediate to check) if working with freshness defined not from swapping but from renaming. Indeed, the renaming-based version of x #^I (Lm^I x i) says that lm (d → i (ξ⟨x := ξ y⟩⟨x := d⟩)) = lm (d → i (ξ⟨x := d⟩)) holds for all y (or at least for some y ≠ x)—which is immediate since ξ⟨x := ξ y⟩⟨x := d⟩ = ξ⟨x := d⟩. This further illustrates the idea that semantic domains 'favor' renaming over swapping.

In conclusion, for interpreting syntax in semantic domains, my renaming-based recursor is trivial to apply, whereas the nominal recursor requires some fairly involved additional definitions and proofs.

## **6 Conclusion and Related Work**

This paper introduced and studied rensets, contributing (1) theoretically, a minimalistic equational characterization of the datatype of terms with bindings and (2) practically, an addition to the formal arsenal for manipulating syntax with bindings. It is part of a longstanding line of work by myself and collaborators on exploring convenient definition and reasoning principles for bindings [25,27,43,46,47], and will be incorporated into the ongoing implementation of a new Isabelle definitional package for binding-aware datatypes [12].

**Initial Model Characterizations of the Terms Datatype.** My results provide a truly elementary characterization of terms with bindings, as an "ordinary" datatype specified by the fundamental operations only (the constructors plus


**Fig. 1.** Initial model characterizations of the datatype of terms with bindings. "ctors" = "constructors", "perm" = "permutation", "fresh" = "the freshness predicate", "fresh-def" = "clause for defining the freshness predicate", "fin-supp" = "Finite Support"

variable-for-variable renaming) and some equations (those defining CE rensets). As far as specification simplicity goes, this is "the next best thing" after a completely free datatype such as those of natural numbers or lists.

Figure 1 shows previous characterizations from the literature, in which terms with bindings are identified as an initial model (or algebra) of some kind. For each of these, I indicate (1) the employed reasoning paradigm, (2) whether the initiality/recursion theorem features an extension with Barendregt's variable convention, (3) the underlying category (from where the carriers of the models are taken), (4) the operations and relations on terms to which the models must provide counterparts and (5) the properties required on the models.

While some of these results enjoy elegant mathematical properties of intrinsic value, my main interest is in the recursors they enable, specifically in the ease of deploying these recursors. That is, I am interested in how easy it is in principle to organize the target domain as a model of the requested type, hence obtain the desired morphism, i.e., get the recursive definition done. By this measure, elementary approaches relying on standard FOL-like models whose carriers are sets rather than pre-sheaves have an advantage. Also, it seems intuitive that a recursor is easier to apply if there are fewer operators, and fewer and structurally simpler properties required on its models—although empirical evidence of successfully deploying the recursor in practice should complement the simplicity assessment, to ensure that simplicity is not sponsored by lack of expressiveness.

The first column in Fig. 1's table contains an influential representative of the nameless paradigm: the result obtained independently by Fiore et al. [22] and Hofmann [29] characterizing terms as initial in the category of algebras over the presheaf topos *Set*^F, where F is the category of finite ordinals and functions between them. The operators required by algebras are the constructors, as well as the free-variable operator (implicitly, as part of the separation on levels) and the injective renamings (as part of the functorial structure). The algebra's carrier is required to be a functor and the constructors to be natural transformations. There are several variations of this approach, e.g., [5,11,29], some implemented in proof assistants, e.g., [3,4,31].

The other columns refer to initiality results that are more closely related to mine. They take place within the nameful paradigm, and they all rely on elementary models (with set carriers). Pitts's already discussed nominal recursor [39] (based on previous work by Gabbay and Pitts [23]) employs the constructors and permutation (or swapping), and requires that its models satisfy some Horn clauses for constructors, permutation and freshness, together with the second-order properties that (1) define freshness from swapping and (2) express Finite Support. Urban et al.'s version [56,57], implemented in Isabelle/Nominal, is an improvement of Pitts's in that it removes the Finite Support requirement from the models—which is practically significant because it enables non-finitely-supported target domains for recursion. Norrish's result [33] is explicitly inspired by nominal logic, but renounces the definability of the free-variable operator from swapping—at the price of taking both swapping and free-variables as primitives. My previous work with Gunter and Gheri takes as primitives either term-for-variable substitution and freshness [46] or swapping and freshness [25], and requires properties expressed by different Horn clauses (and does not explore a Barendregt dimension, as Pitts, Urban et al. and Norrish do). My previous focus on term-for-variable substitution [46] (as opposed to renaming, i.e., variable-for-variable substitution) impairs expressiveness—for example, the depth of a term is not definable using a recursor based on term-for-variable substitution, because we cannot say how term-for-variable substitution affects the depth of a term based on its depth and that of the substitutee alone. My current result based on rensets keeps freshness out of the primitive operator base (like nominal logic does), and provides an unconditionally equational characterization using only constructors and renaming.
The key to achieving this minimality is the simple expression of freshness from renaming in my axiomatization of rensets. In future work, I plan a systematic formal comparison of the relative expressiveness of all these nameful recursors.

**Recursors in Other Paradigms.** Figure 1 focuses on nameful recursors, including the Fiore et al./Hofmann recursor only for the sake of a rough comparison with the nameless approach. I should stress that such a comparison is necessarily rough, since the nameless recursors do not give the same "payload" as the nameful ones. This is because of the handling of bound variables. In the nameless paradigm, the λ-constructor does not explicitly take a variable as an input, as in Lm x t, i.e., does not have type Var → Trm → Trm. Instead, the bindings are indicated through nameless pointers to positions in a term. So the nameless λ-constructor, let us call it NLm, takes only a term, as in NLm t, i.e., has type Trm → Trm or a scope-safe (polymorphic or dependently typed) variation of this, e.g., Trm_n → Trm_{n+1} for n ∈ F [22,29] or Trm_α → Trm_{α+unit} for α ∈ Type [5,11]. The λ-constructor is of course matched by operators in the considered models, which appear in the clauses of the functions f defined recursively on terms: instead of a clause of the form f (Lm x t) = ⟨expression depending on x and f t⟩ from the nameful paradigm, in the nameless paradigm one gets a clause of the form f (NLm t) = ⟨expression depending on f t⟩. A nameless recursor is usually easier to prove correct and easier to apply because the nameless constructor NLm is free—whereas a nameful recursor must wrestle with the non-freeness of Lm, handled by verifying certain properties of the target models. However, once the definition is done, having nameful clauses pays off by allowing "textbook-style" proofs that stay close to the informal presentation of a calculus or logic, whereas with the nameless definition some additional index-shifting bureaucracy is necessary. (See [9] for a detailed discussion, and [14] for a hybrid solution.)

A comparison of nameful recursion with HOAS recursion is also generally difficult, since major HOAS frameworks such as Abella [7], Beluga [37] or Twelf [36] are developed within non-standard logical foundations, allowing a λ-constructor of type (Trm → Trm) → Trm, which is not amenable to typical well-foundedness-based recursion but requires custom solutions (e.g., [21,50]). However, the *weak HOAS* variant [16,27] employs a constructor of the form WHLm : (Var → Trm) → Trm which *is* recursable, and in fact yields a free datatype, let us call it WHTrm—one generated by WHVr : Var → WHTrm, WHAp : WHTrm → WHTrm → WHTrm and WHLm. WHTrm contains (natural encodings of) all terms but also additional entities referred to as "exotic terms". Partly because of the exotic terms, this free datatype by itself is not very helpful for recursively defining useful functions on terms. But the situation is dramatically improved if one employs a variant of weak HOAS called *parametric HOAS (PHOAS)* [15], i.e., takes Var not as a fixed type but as a type parameter (type variable) and works with the family (Trm_Var)_{Var ∈ Type}; this enables many useful definitions by choosing a suitable type Var (usually large enough to make the necessary distinctions) and then performing standard recursion. The functions definable in the style of PHOAS seem to be exactly those definable via the semantic domain interpretation pattern (Sect. 5.3): choosing the instantiation of Var to a type T corresponds to employing environments in Var → T. (I illustrate this at the end of [45, Appendix A] by showing the semantic-domain version of a PHOAS example.)

As a hybrid nameful/HOAS approach we can count Gordon and Melham's characterization of the datatype of terms [26], which employs the nameful constructors but formulates recursion treating Lm as if recursing in the weak-HOAS datatype WHTrm. Norrish's recursor [33] (a participant in Fig. 1) has been inferred from Gordon and Melham's. Weak-HOAS recursion also has interesting connections with nameless recursion: in presheaf toposes such as those employed by Fiore et al. [22], Hofmann [29] and Ambler et al. [6], for any object T the function space Var ⇒ T is isomorphic to the De Bruijn level shifting transformation applied to T; this effectively equates the weak-HOAS and nameless recursors. A final cross-paradigm note: in themselves, nominal sets are not confined to the nameful paradigm; their category is equivalent [23] to the Schanuel topos [30], which is attractive for pursuing the nameless approach.

**Axiomatizations of Renaming.** In his study of name-passing process calculi, Staton [52] considers an enrichment of nominal sets with renaming (in addition to swapping) and axiomatizes renaming with the help of the nominal (swapping-defined) freshness predicate. He shows that the resulting category is equivalent to the non-injective-renaming counterpart of the Schanuel topos (i.e., the subcategory of *Set*^F consisting of functors that preserve pullbacks of monos). Gabbay and Hofmann [24] provide an elementary characterization of the above category, in terms of *nominal renaming sets*, which are sets equipped with a multiple-variable-renaming action satisfying identity and composition laws, and a form of Finite Support (FS). Nominal renaming sets seem closely related to rensets satisfying FS. Indeed, any nominal renaming set forms an FS-satisfying renset when restricted to single-variable renaming. Conversely, I conjecture that any FS-satisfying renset gives rise to a nominal renaming set. This correspondence seems similar to the one between the permutation-based and swapping-based alternative axiomatizations of nominal sets—in that the two express the same concept up to an isomorphism of categories. In their paper, Gabbay and Hofmann do not study renaming-based recursion, beyond noting the availability of a recursor stemming from the functor-category view (which, as discussed above, enables nameless recursion with a weak-HOAS flavor). Pitts [41] introduces *nominal sets with 01-substitution structure*, which axiomatize the substitution of one of two possible constants for variables on top of the nominal axiomatization, and proves that they form a category equivalent to that of cubical sets [10], hence relevant for the univalent foundations [54].

**Other Work.** Sun [53] develops universal algebra for first-order languages with bindings (generalizing work by Aczel [2]) and proves a completeness theorem. In joint work with Roşu [48], I develop first-order logic and prove completeness on top of a generic syntax with axiomatized free-variables and substitution.

**Renaming Versus Swapping and Nominal Logic, Final Round.** I believe that my work complements rather than competes with nominal logic. My results do not challenge the swapping-based approach to defining syntax (defining the alpha-equivalence on pre-terms and quotienting to obtain terms) recommended by nominal logic, which is more elegant than a renaming-based alternative; but my easier-to-apply recursor can be a useful addition even on top of the nominal substratum. Moreover, some of my constructions are explicitly inspired by the nominal ones. For example, I started by adapting the nominal idea of defining freshness from swapping before noticing that renaming enables a simpler formulation. My formal treatment of Barendregt's variable convention also originates from nominal logic—as it turns out, this idea works equally well in my setting. In fact, I came to believe that the possibility of a Barendregt enhancement is largely orthogonal to the particularities of a binding-aware recursor. In future work, I plan to investigate this, i.e., seek general conditions under which an initiality principle (such as Theorems 10 and 8) is amenable to a Barendregt enhancement (such as Theorems 2 and 11, respectively).

**Acknowledgments.** I am grateful to the IJCAR reviewers for their insightful comments and suggestions, and for pointing out related work.

## **References**



# **Finite Two-Dimensional Proof Systems for Non-finitely Axiomatizable Logics**

Vitor Greati<sup>1,2</sup> and João Marcos<sup>1</sup>

<sup>1</sup> Programa de Pós-graduação em Sistemas e Computação & DIMAp, Universidade Federal do Rio Grande do Norte, Natal, Brazil vitor.greati.017@ufrn.edu.br, jmarcos@dimap.ufrn.br

<sup>2</sup> Bernoulli Institute, University of Groningen, Groningen, The Netherlands

**Abstract.** The characterizing properties of a proof-theoretical presentation of a given logic may hang on the choice of proof formalism, on the shape of the logical rules and of the sequents manipulated by a given proof system, on the underlying notion of consequence, and even on the expressiveness of its linguistic resources and on the logical framework into which it is embedded. Standard (one-dimensional) logics determined by (non-deterministic) logical matrices are known to be axiomatizable by analytic and possibly finite proof systems as soon as they turn out to satisfy a certain constraint of sufficient expressiveness. In this paper we introduce a recipe for cooking up a two-dimensional logical matrix (or B-matrix) by the combination of two (possibly partial) non-deterministic logical matrices. We will show that such a combination may result in B-matrices satisfying the property of sufficient expressiveness, even when the input matrices are not sufficiently expressive in isolation, and we will use this result to show that one-dimensional logics that are not finitely axiomatizable may inhabit finitely axiomatizable two-dimensional logics, becoming, thus, finitely axiomatizable by the addition of an extra dimension. We will illustrate the said construction using a well-known logic of formal inconsistency called **mCi**. We will first prove that this logic is not finitely axiomatizable by a one-dimensional (generalized) Hilbert-style system. Then, taking advantage of a known 5-valued non-deterministic logical matrix for this logic, we will combine it with another one, conveniently chosen so as to give rise to a B-matrix that is axiomatized by a two-dimensional Hilbert-style system that is both finite and analytic.

**Keywords:** Hilbert-style proof systems · finite axiomatizability · consequence relations · non-deterministic semantics · paraconsistency

### **1 Introduction**

A logic is commonly defined nowadays as a relation that connects collections of formulas from a formal language and satisfies some closure properties. The

© The Author(s) 2022 J. Blanchette et al. (Eds.): IJCAR 2022, LNAI 13385, pp. 640–658, 2022. https://doi.org/10.1007/978-3-031-10769-6_37

V. Greati acknowledges support from CAPES—Finance Code 001 and from the FWF project P 33548. J. Marcos acknowledges support from CNPq.

established connections are called consecutions and each of them has two parts, an antecedent and a succedent, the latter often being said to 'follow from' (or to be a consequence of) the former. A logic may be manufactured in a number of ways, in particular as being induced by the set of derivations justified by the rules of inference of a given proof system. There are different kinds of proof systems, the differences between them residing mainly in the shapes of their rules of inference and in the way derivations are built. We will be interested here in Hilbert-style proof systems ('H-systems', for short), whose rules of inference have the same shape as the consecutions of the logic they canonically induce and whose associated derivations consist in expanding a given antecedent by applications of rules of inference until the desired succedent is produced. A remarkable property of an H-system is that the logic induced by it is the least logic containing the rules of inference of the system; in the words of [24], the system constitutes a 'logical basis' for the said logic.

Conventional H-systems, which we here dub 'Set-Fmla H-systems', do not allow for more than one formula in the succedents of the consecutions that they manipulate. Since [23], however, we have learned that the simple elimination of this restriction on H-systems —that is, allowing for sets of formulas rather than single formulas in the succedents— brings numerous advantages, among which we mention: *modularity* (correspondence between rules of inference and properties satisfied by a semantical structure), *analyticity* (control over the resources demanded to produce a derivation), and the automatic generation of analytic proof systems for a wide class of logics specified by sufficiently expressive non-deterministic semantics, with an associated straightforward proof-search procedure [13,18]. Such generalized systems, here dubbed 'Set-Set H-systems', induce logics whose consecutions involve succedents consisting of a collection of formulas, intuitively understood as 'alternative conclusions'.

An H-system H is said to be an *axiomatization* for a given logic L when the logic induced by H coincides with L. A desirable property for an axiomatization is *finiteness*, namely the property of consisting of a finite collection of schematic axioms and rules of inference. A logic having a finite axiomatization is said to be 'finitely based'. In the literature, one may find examples of logics having a quite simple, finite semantic presentation, being, in contrast, not finitely based in terms of Set-Fmla H-systems [21]. These very logics, however, when seen as companions of logics with multiple formulas in the succedent, turn out to be finitely based in terms of Set-Set H-systems [18]. In other words, by updating the underlying proof-theoretical and logical formalisms, we are able to obtain a finite axiomatization for logics which in a more restricted setting could not be said to be finitely based. We may compare the above-mentioned move to the common mathematical practice of adding dimensions in order to provide better insight into some phenomenon. A well-known example of that is given by the Fundamental Theorem of Algebra, which guarantees that every non-constant polynomial in a single variable has a root, demanding only that real coefficients be replaced by complex coefficients. Another example, from Machine Learning, is the 'kernel trick' employed in support vector machines: by increasing the dimensionality of the input space, the transformed data points become more easily separable by hyperplanes, making it possible to achieve better results in classification tasks.

It is worth noting that there are logics that fail to be finitely based in terms of Set-Set H-systems. An example of a logic designed with the sole purpose of illustrating this possibility was provided in [18]. One of the goals of the present work is to show that an important logic from the literature of logics of formal inconsistency (LFIs) called **mCi** is also an example of this phenomenon. This logic results from adding infinitely many axiom schemas to the logic **mbC**, a logic that is obtained by extending positive classical logic with two axiom schemas. Incidentally, in the course of proving this result, we will show that **mCi** is the limit of a strictly increasing chain of LFIs extending **mbC** (comparable to the case of CLim in da Costa's hierarchy of increasingly weaker paraconsistent calculi [16]). A natural question, then, is whether we can enrich our technology, in the same vein, in order to provide finite axiomatizations for all these logics. We answer that in the affirmative by means of the two-dimensional frameworks developed in [11,17]. Logics, in this case, connect pairs of collections of formulas. A consecution, in this setting, may be read as involving formulas that are accepted and those that are not, as well as formulas that are rejected and those that are not. 'Acceptance' and 'rejection' are seen, thus, as two orthogonal dimensions that may interact, making it possible to express more complex consecutions than those expressible in one-dimensional logics. Two-dimensional H-systems, which we call 'Set<sup>2</sup>-Set<sup>2</sup> H-systems', generalize Set-Set H-systems so as to manipulate pairs of collections of formulas, canonically inducing two-dimensional logics and constituting logical bases for them. Another goal of the present work is, therefore, to show how to obtain a two-dimensional logic inhabited by a (possibly not finitely based) one-dimensional logic of interest.
More than that, the logic we obtain will be finitely axiomatizable in terms of a Set<sup>2</sup>-Set<sup>2</sup> analytic H-system. The only requirement is that the one-dimensional logic of interest must have an associated semantics in terms of a finite non-deterministic logical matrix and that this matrix can be combined with another one through a novel procedure that we will introduce, resulting in a two-dimensional non-deterministic matrix (a B-matrix [9]) satisfying a certain condition of sufficient expressiveness [17]. An application of this approach will be provided here in order to produce the first finite and analytic axiomatization of **mCi**.

The paper is organized as follows: Sect. 2 introduces basic terminology and definitions regarding algebras and languages. Section 3 presents the notions of one-dimensional logics and Set-Set H-systems. Section 4 proves that **mCi** is not finitely axiomatizable by one-dimensional H-systems. Section 5 introduces two-dimensional logics and H-systems, and describes the approach to extending a logical matrix to a B-matrix with the goal of finding a finite two-dimensional axiomatization for the logic associated with the former. Section 6 presents a two-dimensional finite analytic H-system for **mCi**. In the final remarks, we highlight some byproducts of our present approach and some features of the resulting proof systems, in addition to pointing to some directions for further research.<sup>1</sup>

<sup>1</sup> Detailed proofs of some results may be found in https://arxiv.org/abs/2205.08920.

#### **2 Preliminaries**

A *propositional signature* is a family Σ := {Σ_k}_{k∈ω}, where each Σ_k is a collection of k-ary *connectives*. We say that Σ *is finite* when its base set ⋃_{k∈ω} Σ_k is finite. A *non-deterministic algebra over* Σ, or simply a Σ*-nd-algebra*, is a structure **A** := ⟨A, ·_**A**⟩, where A is a non-empty collection of values called the *carrier* of **A** and, for each k ∈ ω and c ∈ Σ_k, the multifunction c_**A** : A^k → P(A) is the *interpretation of* c *in* **A**. When Σ and A are finite, we say that **A** is *finite*. When the range of every interpretation of **A** contains only singletons, **A** is said to be a *deterministic algebra over* Σ, or simply a Σ*-algebra*, matching the usual definition from Universal Algebra [12]. When ∅ is not in the range of any c_**A**, **A** is said to be *total*. Given a Σ-algebra **A** and c ∈ Σ_1, we let c^0_**A**(x) := x and c^{i+1}_**A**(x) := c_**A**(c^i_**A**(x)). A mapping f : A → B is a *homomorphism* from **A** to **B** when, for all k ∈ ω, c ∈ Σ_k and x_1, …, x_k ∈ A, we have f[c_**A**(x_1, …, x_k)] ⊆ c_**B**(f(x_1), …, f(x_k)). The set of all homomorphisms from **A** to **B** is denoted by Hom_Σ(**A**, **B**).
When **B** = **A**, we write End_Σ(**A**), rather than Hom_Σ(**A**, **A**), for the set of *endomorphisms on* **A**.

Let P be a denumerable collection of *propositional variables* and Σ be a propositional signature. The absolutely free Σ-algebra freely generated by P is denoted by **L**_Σ(P) and called the Σ*-language generated by* P. The elements of L_Σ(P) are called Σ*-formulas*, and those among them that are not propositional variables are called Σ*-compounds*. Given Φ ⊆ L_Σ(P), we denote by Φ^c the set L_Σ(P)\Φ. The homomorphisms from **L**_Σ(P) to **A** are called *valuations on* **A**, and we denote by Val_Σ(**A**) the collection thereof. Additionally, endomorphisms on **L**_Σ(P) are dubbed Σ*-substitutions*, and we let Subs^P_Σ := End_Σ(**L**_Σ(P)); when there is no risk of confusion, we may omit the superscript from this notation.

Given ϕ ∈ L_Σ(P), let props(ϕ) be the set of propositional variables occurring in ϕ. If props(ϕ) = {p_1, …, p_k}, we say that ϕ is k-ary (*unary*, for k = 1; *binary*, for k = 2) and let ϕ_**A** : A^k → P(A) be *the* k*-ary multifunction on* **A** *induced by* ϕ, where, for all x_1, …, x_k ∈ A, we have ϕ_**A**(x_1, …, x_k) := {v(ϕ) | v ∈ Val_Σ(**A**) and v(p_i) = x_i, for 1 ≤ i ≤ k}. Moreover, given ψ_1, …, ψ_k ∈ L_Σ(P), we write ϕ(ψ_1, …, ψ_k) for the Σ-formula ϕ_{**L**_Σ(P)}(ψ_1, …, ψ_k), and, where Φ ⊆ L_Σ(P) is a set of k-ary Σ-formulas, we let Φ(ψ_1, …, ψ_k) := {ϕ(ψ_1, …, ψ_k) | ϕ ∈ Φ}. Given ϕ ∈ L_Σ(P), by subf(ϕ) we refer to the set of *subformulas of* ϕ. Where θ is a unary Σ-formula, we define the set subf_θ(ϕ) as {σ(θ) | σ : P → subf(ϕ)}. Given a set Θ ⊇ {p} of unary Σ-formulas, we set subf_Θ(ϕ) := ⋃_{θ∈Θ} subf_θ(ϕ). For example, if Θ = {p, ¬p}, we will have subf_Θ(¬(q ∨ r)) = {q, r, q ∨ r, ¬(q ∨ r)} ∪ {¬q, ¬r, ¬(q ∨ r), ¬¬(q ∨ r)}. This generalized notion of subformula will be used in the next section to provide a more generous proof-theoretical concept of *analyticity*.
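The generalized subformula sets above are straightforward to compute. The following is a small executable sketch (not code from the paper) with Σ-formulas encoded as nested tuples `(connective, arg1, ...)` and propositional variables as plain strings; it reproduces the example in the text, where Θ = {p, ¬p}.

```python
# Sketch of subf_Θ(φ) for formulas encoded as nested tuples; variables are
# plain strings. The encoding and function names are illustrative only.

def subf(phi):
    """The set of subformulas of phi."""
    out = {phi}
    if not isinstance(phi, str):
        for arg in phi[1:]:
            out |= subf(arg)
    return out

def subst(theta, var, psi):
    """Substitute psi for the variable var inside the formula theta."""
    if theta == var:
        return psi
    if isinstance(theta, str):
        return theta
    return (theta[0],) + tuple(subst(a, var, psi) for a in theta[1:])

def subf_theta(phi, Theta, var="p"):
    """subf_Θ(φ) = { σ(θ) | θ ∈ Θ, σ sending var to a subformula of φ }."""
    return {subst(theta, var, s) for theta in Theta for s in subf(phi)}

Theta = {"p", ("¬", "p")}                 # Θ = {p, ¬p}
phi = ("¬", ("∨", "q", "r"))              # ¬(q ∨ r)
result = subf_theta(phi, Theta)
print(len(result))                        # 7 distinct formulas: ¬(q ∨ r)
                                          # occurs in both sets of the union
```

Note that the two 4-element sets in the text's example overlap in ¬(q ∨ r), so the union has 7 distinct members.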

#### **3 One-Dimensional Consequence Relations**

A Set-Set *statement* (or *sequent*) is a pair (Φ, Ψ) ∈ P(L_Σ(P)) × P(L_Σ(P)), where Φ is dubbed the *antecedent* and Ψ the *succedent*. A *one-dimensional consequence relation on* L_Σ(P) is a collection ▷ of Set-Set statements satisfying, for all Φ, Ψ, Φ′, Ψ′ ⊆ L_Σ(P),

**(O)** if Φ ∩ Ψ ≠ ∅, then Φ ▷ Ψ

**(D)** if Φ ▷ Ψ, then Φ ∪ Φ′ ▷ Ψ ∪ Ψ′

**(C)** if Φ ∪ Π ▷ Ψ ∪ Π^c for all Π ⊆ L_Σ(P), then Φ ▷ Ψ

Properties **(O)**, **(D)** and **(C)** are called *overlap*, *dilution* and *cut*, respectively. The relation is called *substitution-invariant* when it satisfies, for every σ ∈ Subs_Σ,

**(S)** if Φ ▷ Ψ, then σ[Φ] ▷ σ[Ψ]

and it is called *finitary* when it satisfies

**(F)** if Φ ▷ Ψ, then Φ_f ▷ Ψ_f for some finite Φ_f ⊆ Φ and Ψ_f ⊆ Ψ

One-dimensional consequence relations will also be referred to as *one-dimensional logics*. Substitution-invariant finitary one-dimensional logics will be called *standard*. We will denote by ▷̸ the complement of ▷, called the *compatibility relation associated with* ▷ [10].

A Set-Fmla *statement* is a sequent having a single formula as succedent. When we restrict standard consequence relations to collections of Set-Fmla statements, we define the so-called (substitution-invariant finitary) *Tarskian consequence relations*. Every one-dimensional consequence relation ▷ determines a Tarskian consequence relation ⊢ ⊆ P(L_Σ(P)) × L_Σ(P), dubbed *the* Set-Fmla *Tarskian companion of* ▷, such that, for all Φ ∪ {ψ} ⊆ L_Σ(P), Φ ⊢ ψ if, and only if, Φ ▷ {ψ}. It is well known that the collection of all Tarskian consequence relations over a fixed language constitutes a complete lattice under set-theoretical inclusion [25]. Given a set C of such relations, we will denote by ⋁C its supremum in this lattice.

We present in what follows two ways of obtaining one-dimensional consequence relations: one semantical, via non-deterministic logical matrices [6], and the other proof-theoretical, via Set-Set Hilbert-style systems [18,23].

A *non-deterministic* Σ*-matrix*, or simply Σ*-nd-matrix*, is a structure M := ⟨**A**, D⟩, where **A** is a Σ-nd-algebra, whose carrier is the set of *truth-values*, and D ⊆ A is the set of *designated truth-values*. Such structures are also known in the literature as 'PNmatrices' [7]; they generalize the so-called 'Nmatrices' [5], which are Σ-nd-matrices with the restriction that **A** must be total. From now on, whenever X ⊆ A, we denote A\X by X̄. In case **A** is deterministic, we simply say that M is a Σ*-matrix*. Also, M is said to be *finite* when **A** is finite. Every Σ-nd-matrix M determines a substitution-invariant one-dimensional consequence relation over Σ, denoted by ▷_M, such that Φ ▷_M Ψ if, and only if, for all v ∈ Val_Σ(**A**), v[Φ] ∩ D̄ ≠ ∅ or v[Ψ] ∩ D ≠ ∅. It is worth noting that ▷_M is finitary whenever the carrier of **A** is finite (the proof runs very similarly to that of the same result for Nmatrices [5, Theorem 3.15]).
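For finite *total* nd-matrices, the relation ▷_M can be checked by brute force: every legal assignment of values to a subformula-closed set extends to a full valuation, so it suffices to enumerate the assignments to the subformulas of Φ ∪ Ψ that respect the interpretation multifunctions. The following sketch (an illustration, not code from the paper) encodes formulas as nested tuples and exercises the check on the unary nd-matrix of Example 1 below.

```python
# Brute-force check of Φ ▷_M Ψ for a finite TOTAL Σ-nd-matrix, enumerating
# legal assignments to the subformulas of Φ ∪ Ψ. Encodings are illustrative.

def size(phi):
    return 1 if isinstance(phi, str) else 1 + sum(size(a) for a in phi[1:])

def subformulas(formulas):
    subs, stack = set(), list(formulas)
    while stack:
        phi = stack.pop()
        if phi not in subs:
            subs.add(phi)
            if not isinstance(phi, str):
                stack.extend(phi[1:])
    return sorted(subs, key=lambda f: (size(f), str(f)))   # children first

def legal_assignments(order, interp, values):
    """Assignments respecting the nd-tables; sound for total nd-algebras,
    where every legal partial valuation on a subformula-closed set extends."""
    def rec(i, a):
        if i == len(order):
            yield dict(a)
            return
        phi = order[i]
        choices = values if isinstance(phi, str) \
            else interp[phi[0]][tuple(a[arg] for arg in phi[1:])]
        for v in choices:
            a[phi] = v
            yield from rec(i + 1, a)
        a.pop(order[i], None)
    yield from rec(0, {})

def entails(interp, values, D, Phi, Psi):
    order = subformulas(set(Phi) | set(Psi))
    return all(any(a[f] not in D for f in Phi) or any(a[f] in D for f in Psi)
               for a in legal_assignments(order, interp, values))

# The nd-matrix of Example 1: A = {a, b, c}, D = {a}.
A = {"a", "b", "c"}
interp = {
    "g": {("c",): {"a"}, ("a",): set(A), ("b",): set(A)},
    "h": {("b",): {"b"}, ("a",): set(A), ("c",): set(A)},
}
print(entails(interp, A, {"a"}, [("h", "p")], ["p", ("g", "p")]))  # True
print(entails(interp, A, {"a"}, ["p"], [("g", "p")]))              # False
```

The first call confirms the instance h(p) ▷ p, g(p) of the schema displayed in Example 1; the second shows that p alone does not yield g(p).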

A *strong homomorphism* between Σ-matrices M_1 := ⟨**A**_1, D_1⟩ and M_2 := ⟨**A**_2, D_2⟩ is a homomorphism h between **A**_1 and **A**_2 such that x ∈ D_1 if, and only if, h(x) ∈ D_2. When there is a surjective strong homomorphism between M_1 and M_2, we have that ▷_{M_1} = ▷_{M_2}.

Now, to the Hilbert-style systems. A (*schematic*) Set-Set *rule of inference* <sup>R</sup><sup>s</sup> is the collection of all substitution instances of the Set-Set statement <sup>s</sup>, called the *schema* of <sup>R</sup>s. Each <sup>r</sup> <sup>∈</sup> <sup>R</sup><sup>s</sup> is called a *rule instance of* <sup>R</sup>s. A (*schematic*) Set-Set *H-system* R is a collection of Set-Set rules of inference. When we constrain the rule instances of R to having only singletons as succedents, we obtain the conventional notion of Hilbert-style system, called here Set-Fmla *H-system*.

An R*-derivation* in a Set-Set H-system R is a rooted directed tree t in which every node is labelled with a set of formulas or with a discontinuation symbol ∗, and every non-leaf node n (that is, a node with child nodes) is *expanded by a rule instance* r of R. This means that the antecedent of r is contained in the label of n and that n has exactly one child node for each formula ψ in the succedent of r. These child nodes are, in turn, labelled with the same formulas as n plus the respective formula ψ. In case r has an empty succedent, n has a single child node labelled with ∗. Here we will consider only *finitary* Set-Set H-systems, in which each rule instance has a finite antecedent and a finite succedent; in such cases, we only need to consider finite derivations. Figure 1 illustrates how derivations using only finitary rules of inference may be graphically represented. We denote by ℓ_t(n) the label of the node n in the tree t. It is worth observing that, for Set-Fmla H-systems, derivations are linear trees (as rule instances have a single formula in their succedents), or, in other words, just sequences of formulas built by applications of the rule instances, thus matching the conventional definition of Hilbert-style systems.

**Fig. 1.** Graphical representation of R-derivations, for R finitary. The dashed edges and blank circles represent other branches that may exist in the derivation. We usually omit the formulas inherited from the parent node, exhibiting only the ones introduced by the applied rule of inference. In both cases, we must have Γ ⊆ Φ to enable the application of the rule.

A node n of an R-derivation t is called Δ*-closed* in case it is a leaf node with ℓ_t(n) = ∗ or ℓ_t(n) ∩ Δ ≠ ∅. A branch of t is Δ-closed when it ends in a Δ-closed node. When every branch in t is Δ-closed, we say that t is itself Δ*-closed*. An R*-proof* of a Set-Set statement (Φ, Ψ) is a Ψ-closed R-derivation t such that ℓ_t(rt(t)) ⊆ Φ, where rt(t) denotes the root of t.

Consider the binary relation ▷_R on P(L_Σ(P)) such that Φ ▷_R Ψ if, and only if, there is an R-proof of (Φ, Ψ). This relation is the smallest substitution-invariant one-dimensional consequence relation containing the rules of inference of R, and it is finitary when R is finitary. Since Set-Set (and Set-Fmla) H-systems canonically induce one-dimensional consequence relations, we may refer to them as *one-dimensional H-systems* or *one-dimensional axiomatizations*. In case there is a proof of (Φ, Ψ) whose nodes are labelled only with subsets of subf_Θ[Φ ∪ Ψ], we write Φ ▷^Θ_R Ψ. In case ▷_R = ▷^Θ_R, we say that R is Θ*-analytic*. Note that the ordinary notion of analyticity obtains when Θ = {p}. From now on, whenever we use the word "analytic" we will mean this extended notion of Θ-analyticity, for some Θ implicit in the context. When the Θ at stake is important or there is any risk of confusion, we will mention it explicitly.
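The derivation-tree definition above suggests a simple search procedure: a node (a set of formulas) is closed if it meets Ψ or a rule with empty succedent applies, and otherwise may be expanded by a rule instance whose antecedent it contains, with one child per succedent formula. The following minimal sketch (not the proof-search procedure of [13,18]) assumes formulas are plain strings and rules are concrete instances (schematic rules would be instantiated first); a depth bound keeps the search finite.

```python
# Bounded search for an R-proof of (Φ, Ψ) in a finitary Set-Set H-system
# given as concrete rule instances (antecedent, succedent). Illustrative only.

def proves(rules, phi, psi, depth=8):
    psi = frozenset(psi)

    def close(label, d):
        if label & psi:                      # Ψ-closed leaf
            return True
        if d == 0:
            return False
        for ant, suc in rules:
            if ant <= label:
                if not suc:                  # empty succedent: * leaf
                    return True
                # only expansions adding new formulas can help close a branch
                if suc.isdisjoint(label) and \
                        all(close(label | {f}, d - 1) for f in suc):
                    return True
        return False

    return close(frozenset(phi), depth)

# Toy rule instances for a disjunction-like connective:
rules = [
    (frozenset({"p∨q"}), frozenset({"p", "q"})),   # from p∨q, branch on p, q
    (frozenset({"p"}), frozenset({"p∨q"})),
    (frozenset({"q"}), frozenset({"p∨q"})),
]
print(proves(rules, ["p∨q"], ["p", "q"]))   # True
print(proves(rules, ["p"], ["q"]))          # False
```

Restricting expansions to rules whose succedent is disjoint from the current label loses no proofs: an expansion repeating a formula already in the label yields a child identical to its parent, which a minimal proof never needs.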

In [13], based on the seminal results on axiomatizability via Set-Set H-systems by Shoesmith and Smiley [23], it was proved that any non-deterministic logical matrix M satisfying a criterion of sufficient expressiveness is axiomatizable by a Θ-analytic Set-Set Hilbert-style system, which is finite whenever M is finite, where Θ is the set of separators for the pairs of truth-values of M. According to this criterion, an nd-matrix is *sufficiently expressive* when, for every pair (x, y) of distinct truth-values, there is a unary formula S, called a *separator for* (x, y), such that S_**A**(x) ⊆ D and S_**A**(y) ⊆ D̄, or vice versa; in other words, when every pair of distinct truth-values is *separable in* M.

We emphasize that the adoption of Set-Set H-systems, instead of the more restricted Set-Fmla H-systems, is essential for the above result. In fact, while two-valued matrices may always be finitely axiomatized by Set-Fmla H-systems [22], there are sufficiently expressive three-valued deterministic matrices [21] and even quite simple two-valued non-deterministic matrices [19] that fail to be finitely axiomatizable by Set-Fmla H-systems. When the nd-matrix at hand is not sufficiently expressive, we may observe the same phenomenon of not having a finite axiomatization also in terms of Set-Set H-systems, even if the said nd-matrix is finite. The first example (and, to the best of our knowledge, the only one in the current literature) of this fact appeared in [13]; we reproduce it here for later reference:

*Example 1.* Consider the signature Σ := {Σ_k}_{k∈ω} such that Σ_1 := {g, h} and Σ_k := ∅ for all k ≠ 1. Let M := ⟨**A**, {**a**}⟩ be a Σ-nd-matrix, with A := {**a**, **b**, **c**} and

$$g\_{\mathbf{A}}(x) = \begin{cases} \{\mathbf{a}\}, & \text{if } x = \mathbf{c} \\ A, & \text{otherwise} \end{cases} \quad h\_{\mathbf{A}}(x) = \begin{cases} \{\mathbf{b}\}, & \text{if } x = \mathbf{b} \\ A, & \text{otherwise} \end{cases}$$

This matrix is not sufficiently expressive because there is no separator for the pair (**b**, **<sup>c</sup>**), and [13] proved that it is not axiomatizable by a finite Set-Set H-system, even though an infinite Set-Set system that captures it has a quite simple description in terms of the following infinite collection of schemas:

$$\frac{h^i(p)}{p, g(p)}, \text{ for all } i \in \omega.$$
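The failure of separability in Example 1 can be probed mechanically: the unary formulas of this signature are exactly the words over {g, h} applied to a variable, so we can compute the induced multifunctions up to a given depth and test each one against the pair (**b**, **c**). The sketch below (an illustration, not code from the paper) confirms that no candidate up to depth 4 separates the pair; the paper's point is that none exists at any depth.

```python
# Searching for separators in Example 1's nd-matrix (A = {a, b, c}, D = {a}):
# a unary formula S separates (x, y) iff S(x) ⊆ D and S(y) ⊆ D̄, or vice versa.
from itertools import product

A = {"a", "b", "c"}
D = {"a"}
g = {"c": {"a"}, "a": set(A), "b": set(A)}
h = {"b": {"b"}, "a": set(A), "c": set(A)}

def induced(word):
    """Multifunction on A induced by a word over {g, h}, innermost-first."""
    table = {x: {x} for x in A}
    for f in word:
        step = g if f == "g" else h
        table = {x: set().union(*(step[y] for y in table[x])) for x in A}
    return table

def separates(tab, x, y):
    return (tab[x] <= D and tab[y] <= A - D) or \
           (tab[y] <= D and tab[x] <= A - D)

found = [w for d in range(1, 5) for w in product("gh", repeat=d)
         if separates(induced(w), "b", "c")]
print(found)    # [] -- no separator for (b, c) among words up to depth 4
```

Intuitively, any word containing g sends **b** to the whole of A (since h keeps **b** at {**b**} but g(**b**) = A), while pure powers of h send **c** to A, so no word can separate the pair.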

In the next section, we reveal another example of this same phenomenon, this time concerning the well-known LFI [14] called **mCi**. In the course of proving that this logic is not axiomatizable by a finite Set-Set H-system, we will show that there are infinitely many LFIs between **mbC** and **mCi**, organized in a strictly increasing chain whose limit is **mCi** itself.

Before continuing, it is worth emphasizing that any given non-sufficiently expressive nd-matrix may be conservatively extended to a sufficiently expressive nd-matrix provided new connectives are added to the language [18]. These new connectives have the sole purpose of separating the pairs of truth-values for which no separator is available in the original language. The Set-Set system produced from this extended nd-matrix can, then, be used to reason over the original logic, since the extension is conservative. However, these new connectives, which a priori have no meaning, are very likely to appear in derivations of consecutions of the original logic. This might not look like an attractive option to inferentialists who believe that purity of the schematic rules governing a given logical constant is essential for the meaning of the latter to be coherently fixed. In the subsequent sections, we will introduce and apply a potentially more expressive notion of logic in order to provide a *finite* and *analytic* H-system for logics that are not finitely axiomatizable in one dimension, while preserving their original languages.

#### **4 The Logic mCi is Not Finitely Axiomatizable**

A one-dimensional logic ▷ over Σ is said to be ¬*-paraconsistent* when we have p, ¬p ▷̸ q, for p, q ∈ P. Moreover, ▷ is ¬*-gently explosive* in case there is a collection ○(p) ⊆ L_Σ(P) of unary formulas such that, for some ϕ ∈ L_Σ(P), we have ○(ϕ), ϕ ▷̸ ∅ and ○(ϕ), ¬ϕ ▷̸ ∅, while, for all ϕ ∈ L_Σ(P), ○(ϕ), ϕ, ¬ϕ ▷ ∅. We say that ▷ is a *logic of formal inconsistency (LFI)* in case it is ¬-paraconsistent yet ¬-gently explosive. In case ○(p) = {◦p}, for ◦ a (primitive or composite) *consistency connective*, the logic is also said to be a **C**-system. In what follows, let Σ^◦ be the propositional signature such that Σ^◦_1 := {¬, ◦}, Σ^◦_2 := {∧, ∨, ⊃}, and Σ^◦_k := ∅ for all k ∉ {1, 2}.

One of the simplest **C**-systems is the logic **mbC**, which was first presented in terms of a Set-Fmla H-system over Σ◦ obtained by extending any Set-Fmla H-system for positive classical logic (**CPL**<sup>+</sup>) with the following pair of axiom schemas:

(em) p ∨ ¬p (bc1) ◦p <sup>⊃</sup> (p <sup>⊃</sup> (¬p <sup>⊃</sup> q))

The logic **mCi**, in turn, is the **C**-system resulting from extending the H-system for **mbC** with the following (infinitely many) axiom schemas [20] (the resulting Set-Fmla H-system is denoted here by H_**mCi**):

(ci) ¬◦p ⊃ (p ∧ ¬p)   (ci)^j ◦¬^j◦p (for all 0 ≤ j < ω)

A unary connective c is said to constitute a *classical negation* in a one-dimensional logic ▷ extending **CPL**<sup>+</sup> in case, for all ϕ, ψ ∈ L_Σ(P), ∅ ▷ ϕ ∨ c(ϕ) and ∅ ▷ ϕ ⊃ (c(ϕ) ⊃ ψ). One of the main differences between **mCi** and **mbC** is that an inconsistency connective • may be defined in the former using the paraconsistent negation, instead of a classical negation, by setting •ϕ := ¬◦ϕ [20].

Both logics above were presented in [15] in ways other than H-systems: via tableau systems, via bivaluation semantics, and via possible-translations semantics. In addition, while these logics are known not to be characterizable by a single finite deterministic matrix [20], a characteristic nd-matrix is available for **mbC** [1] and a 5-valued non-deterministic logical matrix is available for **mCi** [2], witnessing the importance of non-deterministic semantics in the study of non-classical logics. Such characterizations, moreover, allow for the extraction of sequent-style systems for these logics by the methodologies developed in [3,4]. Since **mCi**'s 5-valued nd-matrix will be useful for us in later sections, we recall it below for ease of reference.

**Definition 1.** *Let* V_5 := {f, F, I, T, t} *and* Y_5 := {I, T, t}*. Define the* Σ^◦*-matrix* M_**mCi** := ⟨**A**_5, Y_5⟩ *such that* **A**_5 := ⟨V_5, ·_{**A**_5}⟩ *interprets the connectives of* Σ^◦ *according to the following:*

∧_{**A**_5}(x_1, x_2) := {f} *if* x_1 ∉ Y_5 *or* x_2 ∉ Y_5, *and* {I, t} *otherwise*

∨_{**A**_5}(x_1, x_2) := {f} *if* x_1, x_2 ∉ Y_5, *and* {I, t} *otherwise*

⊃_{**A**_5}(x_1, x_2) := {f} *if* x_1 ∈ Y_5 *and* x_2 ∉ Y_5, *and* {I, t} *otherwise*

| x | f | F | I | T | t |
|---|---|---|---|---|---|
| ¬_{**A**_5}(x) | {I, t} | {T} | {I, t} | {F} | {f} |
| ◦_{**A**_5}(x) | {T} | {T} | {F} | {T} | {T} |
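As a quick sanity check on Definition 1, the unary tables alone already witness the two defining features of an LFI: the truth-value I allows p and ¬p to be designated simultaneously (¬-paraconsistency), while no truth-value designates p, ¬p and ◦p at once (gentle explosion via ◦). The sketch below (an illustration, not code from the paper) encodes just ¬ and ◦ and checks both facts.

```python
# Unary tables of the 5-valued nd-matrix M_mCi from Definition 1.
V5 = {"f", "F", "I", "T", "t"}
Y5 = {"I", "T", "t"}                      # designated values

NEG = {"f": {"I", "t"}, "F": {"T"}, "I": {"I", "t"}, "T": {"F"}, "t": {"f"}}
CIRC = {"f": {"T"}, "F": {"T"}, "I": {"F"}, "T": {"T"}, "t": {"T"}}

# Paraconsistency witness: a value x and a legal choice nx ∈ ¬x that are
# both designated, so a valuation can designate p and ¬p while sending a
# fresh variable q to the undesignated value f: hence p, ¬p does not yield q.
witness = sorted((x, nx) for x in V5 for nx in NEG[x]
                 if x in Y5 and nx in Y5)
print(witness)        # [('I', 'I'), ('I', 't')]

# Gentle explosion: no value makes p, ¬p and ◦p all designated at once.
exploding = [x for x in V5
             if x in Y5 and NEG[x] & Y5 and CIRC[x] & Y5]
print(exploding)      # []
```

Only I supports a designated negation among the designated values, and ◦I = {F} is undesignated, which is exactly what makes ◦ a consistency connective here.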

One might be tempted to apply the axiomatization algorithm of [13] to the finite non-deterministic logical matrix defined above in order to obtain a finite and analytic Set-Set system for **mCi**. However, it is not obvious at first whether this matrix is sufficiently expressive (we will, in fact, prove that it is not). In what follows, we show that **mCi** is axiomatizable neither by a finite Set-Fmla H-system (first part) nor by a finite Set-Set H-system (second part); it is thus no accident that $\mathcal{H}_{\mathbf{mCi}}$ was originally presented with infinitely many rule schemas. For the first part, we rely on the following general result:

**Theorem 1 (**[25]**, Theorem 2.2.8, adapted).** *Let* $\vdash$ *be a standard Tarskian consequence relation. Then* $\vdash$ *is axiomatizable by a finite* Set-Fmla *H-system if, and only if, there is no strictly increasing sequence* $\vdash_0, \vdash_1, \ldots, \vdash_n, \ldots$ *of standard Tarskian consequence relations such that* $\vdash\ =\ \bigcup_{i \in \omega} \vdash_i$*.*

In order to apply the above theorem, we first present a family of finite Set-Fmla H-systems that will subsequently be used to provide an increasing sequence of standard Tarskian consequence relations whose supremum is precisely **mCi**. Next, we show that this sequence is strictly increasing, by employing the matrix methodology traditionally used for showing the independence of axioms in a proof system.

**Definition 2.** *For each* $k \in \omega$*, let* $\mathcal{H}^k_{\mathbf{mCi}}$ *be a* Set-Fmla *H-system for positive classical logic together with the schemas* (em)*,* (bc1)*,* (ci) *and* (ci)$_j$*, for all* $0 \leq j \leq k$*.*

Since $\mathcal{H}^k_{\mathbf{mCi}}$ may be obtained from $\mathcal{H}_{\mathbf{mCi}}$ by deleting some (infinitely many) axioms, it is immediate that:

**Proposition 1.** *For every* $k \in \omega$*,* $\vdash_{\mathcal{H}^k_{\mathbf{mCi}}} \;\subseteq\; \vdash_{\mathbf{mCi}}$*.*

The way we define the promised increasing sequence of consequence relations in the next result is by taking the systems $\mathcal{H}^k_{\mathbf{mCi}}$ with odd superscripts; namely, we will be working with the sequence $\mathcal{H}^1_{\mathbf{mCi}}, \mathcal{H}^3_{\mathbf{mCi}}, \mathcal{H}^5_{\mathbf{mCi}}, \ldots$ Excluding the cases where $k$ is even will facilitate, in particular, the proof of Lemma 3.

**Lemma 1.** *For each* $1 \leq k < \omega$*, let* $\vdash_k \;:=\; \vdash_{\mathcal{H}^{2k-1}_{\mathbf{mCi}}}$*. Then* $\vdash_1 \;\subseteq\; \vdash_2 \;\subseteq\; \cdots$*, and*

$$\vdash_{\mathbf{mCi}} \;=\; \bigcup_{1 \leq k < \omega} \vdash_k$$

Finally, we prove that the sequence outlined in the paragraph before Lemma 1 is strictly increasing. In order to achieve this, we define, for each $1 \leq k < \omega$, a $\Sigma^\circ$-matrix $\mathbb{M}_k$ and prove that $\mathcal{H}^{2k-1}_{\mathbf{mCi}}$ is sound with respect to this matrix. Then, in the second part of the proof (the "independence part"), we show that, for each $1 \leq k < \omega$, $\mathbb{M}_k$ fails to validate the rule schema (ci)$_j$ for $j = 2k$, which is present in $\mathcal{H}^{2(k+1)-1}_{\mathbf{mCi}}$. In this way, by the contrapositive of the soundness result proved in the first part, we will have (ci)$_j$ provable in $\mathcal{H}^{2(k+1)-1}_{\mathbf{mCi}}$ while unprovable in $\mathcal{H}^{2k-1}_{\mathbf{mCi}}$. In what follows, for any $k \in \omega$, we use $k^*$ to refer to the successor of $k$.

**Definition 3.** *Let* $1 \leq k < \omega$*. Define the* $2k^*$*-valued* $\Sigma^\circ$*-matrix* $\mathbb{M}_k := \langle \mathbf{A}_k, D_k \rangle$ *such that* $D_k := \{k^* + 1, \ldots, 2k^*\}$ *and* $\mathbf{A}_k := \langle \{1, \ldots, 2k^*\}, \cdot_{\mathbf{A}_k} \rangle$*, the interpretation of* $\Sigma^\circ$ *in* $\mathbf{A}_k$ *being given by the following operations:*

$$x \vee_{\mathbf{A}_k} y := \begin{cases} 1 & \text{if } x, y \in \overline{D_k} \\ k^* + 1 & \text{otherwise} \end{cases} \qquad x \wedge_{\mathbf{A}_k} y := \begin{cases} k^* + 1 & \text{if } x, y \in D_k \\ 1 & \text{otherwise} \end{cases}$$

$$x \supset_{\mathbf{A}_k} y := \begin{cases} 1 & \text{if } x \in D_k \text{ and } y \in \overline{D_k} \\ k^* + 1 & \text{otherwise} \end{cases}$$

$$\circ_{\mathbf{A}_k} x := \begin{cases} 1 & \text{if } x = 2k^* \\ k^* + 1 & \text{otherwise} \end{cases} \qquad \neg_{\mathbf{A}_k} x := \begin{cases} k^* + 1 & \text{if } x \in \{1, 2k^*\} \\ x + k^* & \text{if } 2 \leq x \leq k^* \\ x - (k^* - 1) & \text{if } k^* + 1 \leq x \leq 2k^* - 1 \end{cases}$$

Before continuing, we state some results concerning this construction, which will be used in the remainder of the argument. In what follows, when there is no risk of confusion, we omit the subscript '$\mathbf{A}_k$' from the interpretations to simplify the notation.

**Lemma 2.** *For all* $k \geq 1$ *and* $1 \leq m \leq 2k$*,*

$$\neg^m_{\mathbf{A}_k}(k^* + 1) = \begin{cases} (k^* + 1) + \frac{m}{2} & \text{if } m \text{ is even} \\ 1 + \frac{m+1}{2} & \text{otherwise} \end{cases}$$

**Lemma 3.** *For all* $1 \leq k < \omega$*, the schema* $\circ\neg^{2k}{\circ}p$ *is provable in* $\mathcal{H}^{2k^*-1}_{\mathbf{mCi}}$ *but not in* $\mathcal{H}^{2k-1}_{\mathbf{mCi}}$*.*
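Definition 3, Lemma 2 and the semantic core of Lemma 3 can all be checked mechanically for small $k$. The following Python sketch (our own illustration, not part of the paper) implements the $\neg$ and $\circ$ operations of $\mathbf{A}_k$, verifies the closed form for iterated negation, and confirms that $\circ\neg^{2k}{\circ}p$ fails in $\mathbb{M}_k$ while $\circ\neg^{j}{\circ}p$ remains designated for every $j < 2k$:

```python
# Sketch of the matrix M_k of Definition 3, restricted to the operations
# needed for Lemma 2 and the countermodel behind Lemma 3.
def make_ops(k):
    ks = k + 1                                  # k*, the successor of k
    D = set(range(ks + 1, 2 * ks + 1))          # designated values

    def neg(x):
        if x in (1, 2 * ks):
            return ks + 1
        if 2 <= x <= ks:
            return x + ks
        return x - (ks - 1)                     # ks+1 <= x <= 2*ks - 1

    def cons(x):                                # the ◦ connective
        return 1 if x == 2 * ks else ks + 1

    return ks, D, neg, cons

for k in range(1, 6):
    ks, D, neg, cons = make_ops(k)

    # Lemma 2: iterated negation of ks+1 follows the stated closed form.
    v = ks + 1
    for m in range(1, 2 * k + 1):
        v = neg(v)
        expected = (ks + 1) + m // 2 if m % 2 == 0 else 1 + (m + 1) // 2
        assert v == expected, (k, m, v, expected)

    # Lemma 3 (semantic core): pick x != 2k*; then ◦x = ks+1,
    # ¬^{2k}(ks+1) = 2k*, and ◦(2k*) = 1, which is undesignated.
    w = cons(1)
    for _ in range(2 * k):
        w = neg(w)
    assert cons(w) not in D
    # ...while for every j < 2k and every x, ◦¬^j ◦ x stays designated.
    for j in range(2 * k):
        for x in range(1, 2 * ks + 1):
            w = cons(x)
            for _ in range(j):
                w = neg(w)
            assert cons(w) in D
```

Running the loop for $k = 1, \ldots, 5$ exercises exactly the computation used in the independence part of the argument.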

Finally, Theorem 1, Lemma 1 and Lemma 3 give us the main result:

**Theorem 2. mCi** *is not axiomatizable by a finite* Set-Fmla *H-system.*

For the second part (namely, that no finite Set-Set H-system axiomatizes **mCi**), we make use of the following result:

**Theorem 3 (**[23]**, Theorem 5.37, adapted).** *Let a one-dimensional consequence relation over a propositional signature containing the binary connective* $\vee$ *be given. Suppose that its* Set-Fmla *Tarskian companion, denoted by* $\vdash$*, satisfies the following property:*

$$\Phi, \varphi \vee \psi \vdash \gamma \;\text{ if, and only if, }\; \Phi, \varphi \vdash \gamma \text{ and } \Phi, \psi \vdash \gamma \qquad \text{(Disj)}$$

*If a* Set-Set *H-system* $\mathcal{R}$ *axiomatizes the given consequence relation, then* $\mathcal{R}$ *may be converted into a* Set-Fmla *H-system for* $\vdash$ *that is finite whenever* $\mathcal{R}$ *is finite.*

It turns out that:

**Lemma 4. mCi** *satisfies (Disj).*

*Proof.* The non-deterministic semantics of **mCi** gives us that, for all $\varphi, \psi \in L_{\Sigma^\circ}(P)$: $\varphi \vdash_{\mathbb{M}_{\mathbf{mCi}}} \varphi \vee \psi$; $\psi \vdash_{\mathbb{M}_{\mathbf{mCi}}} \varphi \vee \psi$; and $\varphi \vee \psi \vdash_{\mathbb{M}_{\mathbf{mCi}}} \varphi, \psi$. These facts easily imply (Disj).

**Theorem 4. mCi** *is not axiomatizable by a finite* Set-Set *H-system.*

*Proof.* If R were a finite Set-Set H-system for **mCi**, then, by Lemma 4 and Theorem 3, it could be turned into a finite Set-Fmla H-system for this very logic. This would contradict Theorem 2.

Finding a finite one-dimensional H-system for **mCi** (analytic or not) over the same language is thus impossible. The previous result also tells us that there is no sufficiently expressive non-deterministic matrix characterizing **mCi** (for otherwise the recipe in [13] would deliver a finite analytic Set-Set H-system for it), and we may conclude, in particular, that:

**Corollary 1.** *The nd-matrix* $\mathbb{M}_{\mathbf{mCi}}$ *is not sufficiently expressive.*

The pairs of truth-values of $\mathbb{M}_{\mathbf{mCi}}$ that seem not to be separable (at least one of these pairs must not be, in view of the above corollary) are $(t, T)$ and $(f, F)$. The lack of expressive power needed to tell these specific pairs of values apart, however, would be circumvented if we considered instead the matrix defined below, obtained from $\mathbb{M}_{\mathbf{mCi}}$ by changing its set of designated values:

**Definition 4.** *Let* $\mathbb{M}^n_{\mathbf{mCi}} := \langle \mathbf{A}_5, \mathcal{N}_5 \rangle$*, where* $\mathcal{N}_5 := \{f, I, T\}$*.*

Note that, in $\mathbb{M}^n_{\mathbf{mCi}}$, we have $t \notin \mathcal{N}_5$ while $T \in \mathcal{N}_5$, and we have $f \in \mathcal{N}_5$ while $F \notin \mathcal{N}_5$. Therefore, the single propositional variable $p$ separates in $\mathbb{M}^n_{\mathbf{mCi}}$ the pairs $(t, T)$ and $(f, F)$. On the other hand, it is not clear now whether the pairs $(t, F)$ and $(f, T)$ are separable in this new matrix. Nonetheless, we will see, in the next section, how we can take advantage of the semantics of non-deterministic B-matrices in order to combine the expressiveness of $\mathbb{M}_{\mathbf{mCi}}$ and $\mathbb{M}^n_{\mathbf{mCi}}$ in a very simple and intuitive manner, preserving the language and the algebra shared by these matrices. The notion of logic induced by the resulting structure will not be one-dimensional, as the one presented before, but rather two-dimensional, in a sense we shall detail in a moment. We identify two important aspects of this combination: first, the logics determined by the original matrices can be fully recovered from the combined logic; and, second, since the notions of H-systems and sufficient expressiveness, as well as the axiomatization algorithm of [13], were generalized in [17], the resulting two-dimensional logic may be algorithmically axiomatized by an *analytic* two-dimensional H-system that is *finite* if the combined matrices are finite, provided the criterion of sufficient expressiveness is satisfied after the combination. This will be the case, in particular, when we combine $\mathbb{M}_{\mathbf{mCi}}$ and $\mathbb{M}^n_{\mathbf{mCi}}$. Consequently, this novel way of combining logics provides a quite general approach for producing finite and analytic axiomatizations for logics determined by non-deterministic logical matrices that fail to be finitely axiomatizable in one dimension; this includes the logics from Example 1, and also **mCi**.

#### **5 Two-Dimensional Logics**

From now on, we will employ the symbols $\mathsf{Y}$, $\overline{\mathsf{Y}}$, $\mathsf{N}$ and $\overline{\mathsf{N}}$ to informally refer to, respectively, the cognitive attitudes of *acceptance*, *non-acceptance*, *rejection* and *non-rejection*, collected in the set $\mathsf{Atts} := \{\mathsf{Y}, \overline{\mathsf{Y}}, \mathsf{N}, \overline{\mathsf{N}}\}$. Given a set $\Phi \subseteq L_\Sigma(P)$, we will write $\Phi_\alpha$ to intuitively mean that a given agent entertains the cognitive attitude $\alpha \in \mathsf{Atts}$ with respect to the formulas in $\Phi$, that is: the formulas in $\Phi_{\mathsf{Y}}$ will be understood as being accepted by the agent; the ones in $\Phi_{\overline{\mathsf{Y}}}$, as non-accepted; the ones in $\Phi_{\mathsf{N}}$, as rejected; and the ones in $\Phi_{\overline{\mathsf{N}}}$, as non-rejected. Where $\alpha \in \mathsf{Atts}$, we let $\tilde{\alpha}$ be its flipped version, that is, $\tilde{\mathsf{Y}} := \overline{\mathsf{Y}}$, $\tilde{\overline{\mathsf{Y}}} := \mathsf{Y}$, $\tilde{\mathsf{N}} := \overline{\mathsf{N}}$ and $\tilde{\overline{\mathsf{N}}} := \mathsf{N}$.

We refer to each $\frac{\Phi_{\overline{\mathsf{N}}}}{\Phi_{\mathsf{Y}}}\big|\frac{\Phi_{\overline{\mathsf{Y}}}}{\Phi_{\mathsf{N}}}$ as a B*-statement*, where $(\Phi_{\mathsf{Y}}, \Phi_{\mathsf{N}})$ is the *antecedent* and $(\Phi_{\overline{\mathsf{Y}}}, \Phi_{\overline{\mathsf{N}}})$ is the *succedent*. The sets in the latter pairs are called *components*. A B*-consequence relation* is a collection $\frac{\cdot}{\cdot}\big|\frac{\cdot}{\cdot}$ of B-statements satisfying:

**(O2)** if $\Phi_{\mathsf{Y}} \cap \Phi_{\overline{\mathsf{Y}}} \neq \emptyset$ or $\Phi_{\mathsf{N}} \cap \Phi_{\overline{\mathsf{N}}} \neq \emptyset$, then $\frac{\Phi_{\overline{\mathsf{N}}}}{\Phi_{\mathsf{Y}}}\big|\frac{\Phi_{\overline{\mathsf{Y}}}}{\Phi_{\mathsf{N}}}$

**(D2)** if $\frac{\Psi_{\overline{\mathsf{N}}}}{\Psi_{\mathsf{Y}}}\big|\frac{\Psi_{\overline{\mathsf{Y}}}}{\Psi_{\mathsf{N}}}$ and $\Psi_\alpha \subseteq \Phi_\alpha$ for every $\alpha \in \mathsf{Atts}$, then $\frac{\Phi_{\overline{\mathsf{N}}}}{\Phi_{\mathsf{Y}}}\big|\frac{\Phi_{\overline{\mathsf{Y}}}}{\Phi_{\mathsf{N}}}$

**(C2)** if $\frac{\Omega^c_{\overline{\mathsf{S}}}}{\Omega_{\mathsf{S}}}\big|\frac{\Omega^c_{\mathsf{S}}}{\Omega_{\overline{\mathsf{S}}}}$ for all $\Phi_{\mathsf{Y}} \subseteq \Omega_{\mathsf{S}} \subseteq \Phi^c_{\overline{\mathsf{Y}}}$ and $\Phi_{\mathsf{N}} \subseteq \Omega_{\overline{\mathsf{S}}} \subseteq \Phi^c_{\overline{\mathsf{N}}}$, then $\frac{\Phi_{\overline{\mathsf{N}}}}{\Phi_{\mathsf{Y}}}\big|\frac{\Phi_{\overline{\mathsf{Y}}}}{\Phi_{\mathsf{N}}}$

A B-consequence relation is called *substitution-invariant* if, in addition, $\frac{\Phi_{\overline{\mathsf{N}}}}{\Phi_{\mathsf{Y}}}\big|\frac{\Phi_{\overline{\mathsf{Y}}}}{\Phi_{\mathsf{N}}}$ holds whenever, for some $\sigma \in \mathsf{Subs}_\Sigma$:

**(S2)** $\frac{\Psi_{\overline{\mathsf{N}}}}{\Psi_{\mathsf{Y}}}\big|\frac{\Psi_{\overline{\mathsf{Y}}}}{\Psi_{\mathsf{N}}}$ and $\Phi_\alpha = \sigma(\Psi_\alpha)$ for every $\alpha \in \mathsf{Atts}$

Moreover, a B-consequence relation is called *finitary* when it enjoys the property

**(F2)** if $\frac{\Phi_{\overline{\mathsf{N}}}}{\Phi_{\mathsf{Y}}}\big|\frac{\Phi_{\overline{\mathsf{Y}}}}{\Phi_{\mathsf{N}}}$, then $\frac{\Phi^f_{\overline{\mathsf{N}}}}{\Phi^f_{\mathsf{Y}}}\big|\frac{\Phi^f_{\overline{\mathsf{Y}}}}{\Phi^f_{\mathsf{N}}}$ for some finite $\Phi^f_\alpha \subseteq \Phi_\alpha$, for each $\alpha \in \mathsf{Atts}$

In what follows, B-consequence relations will also be referred to as *two-dimensional logics*. The complement of $\frac{\cdot}{\cdot}\big|\frac{\cdot}{\cdot}$, sometimes called the *compatibility relation associated with* $\frac{\cdot}{\cdot}\big|\frac{\cdot}{\cdot}$ [10], will be denoted by $\frac{\cdot}{\cdot}\times\!\!\big|\frac{\cdot}{\cdot}$. Every B-consequence relation $\mathcal{C} := \frac{\cdot}{\cdot}\big|\frac{\cdot}{\cdot}$ induces one-dimensional consequence relations $\vdash^{\mathcal{C}}_t$ and $\vdash^{\mathcal{C}}_f$, such that $\Phi_{\mathsf{Y}} \vdash^{\mathcal{C}}_t \Phi_{\overline{\mathsf{Y}}}$ iff $\frac{\emptyset}{\Phi_{\mathsf{Y}}}\big|\frac{\Phi_{\overline{\mathsf{Y}}}}{\emptyset}$, and $\Phi_{\mathsf{N}} \vdash^{\mathcal{C}}_f \Phi_{\overline{\mathsf{N}}}$ iff $\frac{\Phi_{\overline{\mathsf{N}}}}{\emptyset}\big|\frac{\emptyset}{\Phi_{\mathsf{N}}}$. Given a one-dimensional consequence relation $\vdash$, we say that it *inhabits the* t*-aspect of* $\mathcal{C}$ if $\vdash\ =\ \vdash^{\mathcal{C}}_t$, and that it *inhabits the* f*-aspect of* $\mathcal{C}$ if $\vdash\ =\ \vdash^{\mathcal{C}}_f$. B-consequence relations actually induce many other (even non-Tarskian) one-dimensional notions of logics; the reader is referred to [9,11] for a thorough presentation of this topic.

As we did for one-dimensional consequence relations, we now present realizations of B-consequence relations, first via the semantics of nd-B-matrices, then by means of two-dimensional H-systems.

A *non-deterministic* B*-matrix over* $\Sigma$, or simply $\Sigma$*-nd-*B*-matrix*, is a structure $\mathfrak{M} := \langle \mathbf{A}, \mathcal{Y}, \mathcal{N} \rangle$, where $\mathbf{A}$ is a $\Sigma$-nd-algebra, $\mathcal{Y} \subseteq A$ is the set of *designated values* and $\mathcal{N} \subseteq A$ is the set of *antidesignated values* of $\mathfrak{M}$. For convenience, we define $\overline{\mathcal{Y}} := A \setminus \mathcal{Y}$ to be the set of *non-designated values*, and $\overline{\mathcal{N}} := A \setminus \mathcal{N}$ to be the set of *non-antidesignated values* of $\mathfrak{M}$. The elements of $\mathsf{Val}_\Sigma(\mathbf{A})$ are dubbed $\mathfrak{M}$*-valuations*. The B*-entailment relation determined by* $\mathfrak{M}$ is a collection $\frac{\cdot}{\cdot}\big|\frac{\cdot}{\cdot}_{\mathfrak{M}}$ of B-statements such that

$$(\mathsf{B}\text{-ent})\quad \frac{\Phi_{\overline{\mathsf{N}}}}{\Phi_{\mathsf{Y}}}\Big|\frac{\Phi_{\overline{\mathsf{Y}}}}{\Phi_{\mathsf{N}}}_{\mathfrak{M}} \quad\text{iff}\quad \text{there is no } \mathfrak{M}\text{-valuation } v \text{ such that } v(\Phi_\alpha) \subseteq \alpha \text{ for each } \alpha \in \mathsf{Atts},$$

for every $\Phi_{\mathsf{Y}}, \Phi_{\mathsf{N}}, \Phi_{\overline{\mathsf{Y}}}, \Phi_{\overline{\mathsf{N}}} \subseteq L_\Sigma(P)$. Whenever $\frac{\Phi_{\overline{\mathsf{N}}}}{\Phi_{\mathsf{Y}}}\big|\frac{\Phi_{\overline{\mathsf{Y}}}}{\Phi_{\mathsf{N}}}_{\mathfrak{M}}$, we say that the B-statement *holds in* $\mathfrak{M}$ or *is valid in* $\mathfrak{M}$. An $\mathfrak{M}$-valuation that bears witness to $\frac{\Phi_{\overline{\mathsf{N}}}}{\Phi_{\mathsf{Y}}}\times\!\!\big|\frac{\Phi_{\overline{\mathsf{Y}}}}{\Phi_{\mathsf{N}}}_{\mathfrak{M}}$ is called a *countermodel for the statement in* $\mathfrak{M}$. One may easily check that $\frac{\cdot}{\cdot}\big|\frac{\cdot}{\cdot}_{\mathfrak{M}}$ is a substitution-invariant B-consequence relation, which is finitary when $A$ is finite. Taking $\mathcal{C}$ as $\frac{\cdot}{\cdot}\big|\frac{\cdot}{\cdot}_{\mathfrak{M}}$, we define $\vdash^{\mathfrak{M}}_t := \vdash^{\mathcal{C}}_t$ and $\vdash^{\mathfrak{M}}_f := \vdash^{\mathcal{C}}_f$.
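For a finite *deterministic* B-matrix, where every valuation is the homomorphic extension of an assignment to the variables, (B-ent) can be decided by brute force. The sketch below is our own toy illustration with a two-valued B-matrix (not from the paper, and it does not cover the genuinely non-deterministic case, in which valuations are not determined by the variables alone); it mirrors the countermodel clause of (B-ent) directly:

```python
from itertools import product

# Toy two-valued deterministic B-matrix: A = {0, 1}, designated Y = {1},
# antidesignated N = {0}. Formulas are nested tuples: ('p',) is a variable,
# ('neg', phi) and ('and', phi, psi) are compounds.
A = (0, 1)
Y = {1}
N = {0}
OPS = {'neg': lambda x: 1 - x, 'and': lambda x, y: x & y}

def ev(phi, val):
    if len(phi) == 1:                               # propositional variable
        return val[phi[0]]
    return OPS[phi[0]](*(ev(sub, val) for sub in phi[1:]))

def variables(phi):
    if len(phi) == 1:
        return {phi[0]}
    return set().union(*(variables(sub) for sub in phi[1:]))

def b_entails(phi_Y, phi_N, phi_nY, phi_nN):
    """(B-ent): the statement holds iff there is NO valuation v with
    v(phi_Y) in Y, v(phi_N) in N, v(phi_nY) outside Y, v(phi_nN) outside N."""
    formulas = phi_Y | phi_N | phi_nY | phi_nN
    vs = sorted(set().union(set(), *(variables(f) for f in formulas)))
    for choice in product(A, repeat=len(vs)):
        val = dict(zip(vs, choice))
        if (all(ev(f, val) in Y for f in phi_Y)
                and all(ev(f, val) in N for f in phi_N)
                and all(ev(f, val) not in Y for f in phi_nY)
                and all(ev(f, val) not in N for f in phi_nN)):
            return False                            # countermodel found
    return True

p = ('p',)
# Accepting p while non-accepting its double negation has no countermodel:
assert b_entails({p}, set(), {('neg', ('neg', p))}, set())
# This classical toy matrix leaves no room for "gaps":
assert b_entails(set(), set(), {p}, {p})
# A lone accepted formula is refutable (v(p) = 1 is a countermodel):
assert not b_entails({p}, set(), set(), set())
```

The four arguments of `b_entails` play the roles of $\Phi_{\mathsf{Y}}$, $\Phi_{\mathsf{N}}$, $\Phi_{\overline{\mathsf{Y}}}$ and $\Phi_{\overline{\mathsf{N}}}$, respectively.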

**Fig. 2.** Graphical representation of finite R-derivations. We emphasize that, in both cases, we must have Ψ<sup>Y</sup> ⊆ Φ<sup>Y</sup> and Ψ<sup>N</sup> ⊆ Φ<sup>N</sup> to enable the application of the rule.

We move now to two-dimensional, or Set<sup>2</sup>-Set<sup>2</sup>, H-systems, first introduced in [17]. A *(schematic)* Set<sup>2</sup>-Set<sup>2</sup> *rule of inference* $R_{\mathsf{s}}$ is the collection of all substitution instances of the Set<sup>2</sup>-Set<sup>2</sup> statement $\mathsf{s}$, called the *schema* of $R_{\mathsf{s}}$. Each $r \in R_{\mathsf{s}}$ is said to be a *rule instance of* $R_{\mathsf{s}}$. In a proof-theoretic context, rather than writing the B-statement, we shall denote the corresponding rule by $\frac{\Phi_{\mathsf{Y}}\;\;\Phi_{\mathsf{N}}}{\Phi_{\overline{\mathsf{Y}}}\;\;\Phi_{\overline{\mathsf{N}}}}$. A *(schematic)* Set<sup>2</sup>-Set<sup>2</sup> *H-system* $\mathcal{R}$ is a collection of Set<sup>2</sup>-Set<sup>2</sup> rules of inference. Set<sup>2</sup>-Set<sup>2</sup> *derivations* are as in Set-Set H-systems, but now the nodes are labelled with pairs of sets of formulas instead of a single set. When applying a rule instance, each formula in the succedent produces a new branch as before, but now the formula goes to the same component in which it was found in the rule instance. See Fig. 2 for a general representation and compare it with Fig. 1.

Let $t$ be an $\mathcal{R}$-derivation. A node $n$ of $t$ is $(\Psi_{\overline{\mathsf{Y}}}, \Psi_{\overline{\mathsf{N}}})$*-closed* in case it is discontinued (namely, labelled with $*$) or it is a leaf node with $\ell_t(n) = (\Phi_{\mathsf{Y}}, \Phi_{\mathsf{N}})$ and either $\Phi_{\mathsf{Y}} \cap \Psi_{\overline{\mathsf{Y}}} \neq \emptyset$ or $\Phi_{\mathsf{N}} \cap \Psi_{\overline{\mathsf{N}}} \neq \emptyset$. A branch of $t$ is $(\Psi_{\overline{\mathsf{Y}}}, \Psi_{\overline{\mathsf{N}}})$*-closed* when it ends in a $(\Psi_{\overline{\mathsf{Y}}}, \Psi_{\overline{\mathsf{N}}})$-closed node. An $\mathcal{R}$-derivation $t$ is said to be $(\Psi_{\overline{\mathsf{Y}}}, \Psi_{\overline{\mathsf{N}}})$*-closed* when all of its branches are $(\Psi_{\overline{\mathsf{Y}}}, \Psi_{\overline{\mathsf{N}}})$-closed. An $\mathcal{R}$*-proof* of a B-statement is a $(\Phi_{\overline{\mathsf{Y}}}, \Phi_{\overline{\mathsf{N}}})$-closed $\mathcal{R}$-derivation $t$ with $\ell_t(\mathsf{rt}(t)) \subseteq (\Phi_{\mathsf{Y}}, \Phi_{\mathsf{N}})$. The definitions of the (finitary) substitution-invariant B-consequence relation $\frac{\cdot}{\cdot}\big|\frac{\cdot}{\cdot}_{\mathcal{R}}$ induced by a (finitary) Set<sup>2</sup>-Set<sup>2</sup> H-system $\mathcal{R}$ and of $\Theta$-analyticity are obvious generalizations of the corresponding Set-Set definitions.

In [17], the notion of sufficient expressiveness was generalized to nd-B-matrices. We reproduce the main definitions here to keep the presentation self-contained:

**Definition 5.** *Let* $\mathfrak{M} := \langle \mathbf{A}, \mathcal{Y}, \mathcal{N} \rangle$ *be a* $\Sigma$*-nd-*B*-matrix.*


In the same work [17], the axiomatization algorithm of [13] was also generalized, guaranteeing that every sufficiently expressive nd-B-matrix $\mathfrak{M}$ is axiomatizable by a $\Theta$-analytic Set<sup>2</sup>-Set<sup>2</sup> H-system, which is finite whenever $\mathfrak{M}$ is finite, where $\Theta$ is a set of separators for the pairs of truth-values of $\mathfrak{M}$. Note that, in the second bullet of the above definition, a unary formula is characterized as a separator whenever it separates a pair of truth-values according to *at least one* of the distinguished sets of values. This means that having two such sets may allow us to separate more pairs of truth-values than having a single set; that is, the nd-B-matrices are, in this sense, potentially more expressive than the (one-dimensional) logical matrices.

*Example 2.* Let $\mathbf{A}$ be the $\Sigma$-nd-algebra from Example 1, and consider the nd-B-matrix $\mathfrak{M} := \langle \mathbf{A}, \{\mathbf{a}\}, \{\mathbf{b}\} \rangle$. As we know, in this matrix the pair $(\mathbf{b}, \mathbf{c})$ is not separable if we consider only the set of designated values $\{\mathbf{a}\}$. However, as we now have the set $\{\mathbf{b}\}$ of antidesignated truth-values, the separation becomes evident: the propositional variable $p$ is a separator for this pair, since $\mathbf{b} \in \{\mathbf{b}\}$ and $\mathbf{c} \notin \{\mathbf{b}\}$. The recipe from [17] produces the following Set<sup>2</sup>-Set<sup>2</sup> axiomatization for $\mathfrak{M}$, with only three very simple schematic rules of inference:

$$\begin{array}{ccc}\frac{p \parallel p}{\parallel} & \frac{\parallel}{f(p), p \parallel p} & \frac{\parallel}{\parallel} \frac{p}{t(p)} \\ \end{array}$$

By construction, the one-dimensional logic determined by the nd-matrix of Example 1 inhabits the t-aspect of $\frac{\cdot}{\cdot}\big|\frac{\cdot}{\cdot}_{\mathfrak{M}}$, so it can be seen as being axiomatized by this *finite* and *analytic* two-dimensional system (contrast with the *infinite* Set-Set axiomatization for this logic provided in that same example).

We constructed above a $\Sigma$-nd-B-matrix from two $\Sigma$-nd-matrices in such a way that the one-dimensional logics determined by the latter are fully recoverable from the former. We formalize this construction below:

**Definition 6.** *Let* $\mathbb{M} := \langle \mathbf{A}, D \rangle$ *and* $\mathbb{M}' := \langle \mathbf{A}, D' \rangle$ *be* $\Sigma$*-nd-matrices. The* B*-product between* $\mathbb{M}$ *and* $\mathbb{M}'$ *is the* $\Sigma$*-nd-*B*-matrix* $\langle \mathbf{A}, D, D' \rangle$*.*

Note that $\Phi \vdash_{\mathbb{M}} \Psi$ iff $\frac{\emptyset}{\Phi}\big|\frac{\Psi}{\emptyset}$ holds in the B-product of $\mathbb{M}$ and $\mathbb{M}'$, iff $\Phi \vdash_t \Psi$ with respect to that B-product; and $\Phi \vdash_{\mathbb{M}'} \Psi$ iff $\frac{\Psi}{\emptyset}\big|\frac{\emptyset}{\Phi}$ holds in the B-product, iff $\Phi \vdash_f \Psi$ with respect to it. Therefore, $\vdash_{\mathbb{M}}$ and $\vdash_{\mathbb{M}'}$ are easily recoverable from the B-consequence relation determined by the B-product, since they inhabit, respectively, its t-aspect and its f-aspect. One of the applications of this novel way of putting two distinct logics together was illustrated in Example 2: producing a two-dimensional analytic and finite axiomatization for a one-dimensional logic characterized by a $\Sigma$-nd-matrix. As we have shown, the latter one-dimensional logic need not be finitely axiomatizable by a Set-Set H-system. We present this application of B-products with more generality below:

**Proposition 2.** *Let* $\mathbb{M} := \langle \mathbf{A}, D \rangle$ *be a* $\Sigma$*-nd-matrix and suppose that* $U \subseteq A \times A$ *contains all and only the pairs of distinct truth-values that fail to be separable in* $\mathbb{M}$*. If, for some* $\mathbb{M}' := \langle \mathbf{A}, D' \rangle$*, the pairs in* $U$ *are separable in* $\mathbb{M}'$*, then the* B*-product of* $\mathbb{M}$ *and* $\mathbb{M}'$ *is sufficiently expressive (thus axiomatizable by an analytic* Set<sup>2</sup>*-*Set<sup>2</sup> *H-system that is finite whenever* $\mathbf{A}$ *is finite).*

#### **6 A Finite and Analytic Proof System for mCi**

In the spirit of Proposition 2, we define below an nd-B-matrix by combining the matrices $\mathbb{M}_{\mathbf{mCi}} := \langle \mathbf{A}_5, \mathcal{Y}_5 \rangle$ and $\mathbb{M}^n_{\mathbf{mCi}} := \langle \mathbf{A}_5, \mathcal{N}_5 \rangle$ introduced in Sect. 4 (Definition 1 and Definition 4):

**Definition 7.** *Let* $\mathfrak{M}_{\mathbf{mCi}}$ *be the* B*-product of* $\mathbb{M}_{\mathbf{mCi}}$ *and* $\mathbb{M}^n_{\mathbf{mCi}}$*, that is,* $\mathfrak{M}_{\mathbf{mCi}} := \langle \mathbf{A}_5, \mathcal{Y}_5, \mathcal{N}_5 \rangle$*, with* $\mathcal{Y}_5 := \{I, T, t\}$ *and* $\mathcal{N}_5 := \{f, I, T\}$*.*

When we now consider both the set $\mathcal{Y}_5$ of designated and the set $\mathcal{N}_5$ of antidesignated truth-values, the separation of all truth-values of $\mathbf{A}_5$ becomes possible, that is, $\mathfrak{M}_{\mathbf{mCi}}$ is sufficiently expressive, as guaranteed by Proposition 2. Furthermore, notice that we have two alternatives for separating the pairs $(I, t)$ and $(I, T)$: either using the formula $\neg p$ or the formula $\circ p$. With this finite sufficiently expressive nd-B-matrix in hand, producing a *finite* $\{p, \circ p\}$-analytic two-dimensional H-system for it is immediate by [17, Theorem 2]. Since **mCi** inhabits the t-aspect of the B-consequence relation determined by $\mathfrak{M}_{\mathbf{mCi}}$, we may then conclude that:

**Theorem 5. mCi** *is axiomatizable by a finite and analytic two-dimensional H-system.*

Our axiomatization recipe delivers an H-system with about 300 rule schemas. When we simplify it using the streamlining procedures indicated in that paper, we obtain a much more succinct and insightful presentation, with 28 rule schemas, which we call R**mCi**. The full presentation of this system is given below:

*(Display of the 28 rule schemas of* $\mathcal{R}_{\mathbf{mCi}}$*, organized by connective.)*

Note that the set of rules $\{\circledast^{\mathbf{mCi}}_i \mid \circledast \in \{\wedge, \vee, \supset\},\ i \in \{1, 2, 3\}\}$ makes it clear that the t-aspect of the induced B-consequence relation is inhabited by a logic extending positive classical logic, while the remaining rules for these connectives involve interactions between the two dimensions. Also, rule $\neg^{\mathbf{mCi}}_2$ indicates that $\circ$ satisfies one of the main conditions for being taken as a consistency connective in the logic inhabiting the t-aspect. In fact, all these observations are aligned with the fact that the logic inhabiting the t-aspect of the B-consequence relation induced by $\mathcal{R}_{\mathbf{mCi}}$ is precisely **mCi**. See, in Fig. 3, $\mathcal{R}_{\mathbf{mCi}}$-derivations showing that, in **mCi**, $\neg{\circ}p$ and $p \wedge \neg p$ are logically equivalent and that $\circ\neg{\circ}p$ is a theorem.

**Fig. 3.** $\mathcal{R}_{\mathbf{mCi}}$-derivations showing, respectively, that $\frac{\emptyset}{p \wedge \neg p}\big|\frac{\neg\circ p}{\emptyset}$, $\frac{\emptyset}{\neg\circ p}\big|\frac{p \wedge \neg p}{\emptyset}$ and $\frac{\emptyset}{\emptyset}\big|\frac{\circ\neg\circ p}{\emptyset}$ hold in the B-consequence relation induced by $\mathcal{R}_{\mathbf{mCi}}$. Note that, for a cleaner presentation, we omit the formulas inherited from parent nodes.

#### **7 Concluding Remarks**

In this work, we introduced a mechanism for combining two non-deterministic logical matrices into a non-deterministic B-matrix, creating the possibility of producing finite and analytic two-dimensional axiomatizations for one-dimensional logics that may fail to be finitely axiomatizable in terms of one-dimensional Hilbert-style systems. It is worth mentioning that, as proved in [17], one may perform proof search and countermodel search over the resulting two-dimensional systems in time at most exponential in the size of the B-statement of interest, through a straightforward proof-search algorithm.

We illustrated the above-mentioned combination mechanism with two examples, one of them corresponding to a well-known logic of formal inconsistency called **mCi**. We ended up proving not only that this logic is not finitely axiomatizable in one dimension, but also that it is the limit of a strictly increasing chain of LFIs extending the logic **mbC**. From the perspective of the study of B-consequence relations, these examples allow us to eliminate the suspicion that a two-dimensional H-system $\mathcal{R}$ may always be converted into Set-Set H-systems for the logics inhabiting the one-dimensional aspects of the induced B-consequence relation without losing any desirable property (in this case, finiteness of the presentation).

At first sight, the formalism of two-dimensional H-systems may be confused with the formalism of n-sided sequents [3,4], in which the objects manipulated by rules of inference (the so-called n*-sequents*) accommodate more than two sets of formulas in their structures. The reader interested in a comparison between these two different approaches is referred to the concluding remarks of [17].

We close with some observations regarding $\mathfrak{M}_{\mathbf{mCi}}$ and the two-dimensional H-system $\mathcal{R}_{\mathbf{mCi}}$. A one-dimensional logic is said to be $\neg$*-consistent* when $\varphi, \neg\varphi \vdash \emptyset$ and $\neg$*-determined* when $\emptyset \vdash \varphi, \neg\varphi$ for all $\varphi \in L_\Sigma(P)$. A B-consequence relation $\frac{\cdot}{\cdot}\big|\frac{\cdot}{\cdot}$ is said to *allow for gappy reasoning* when $\frac{\varphi}{\emptyset}\times\!\!\big|\frac{\varphi}{\emptyset}$ and to *allow for glutty reasoning* when $\frac{\emptyset}{\varphi}\times\!\!\big|\frac{\emptyset}{\varphi}$, for some $\varphi \in L_\Sigma(P)$. Notice that $\neg$-determinedness in the logic inhabiting the t-aspect of a B-consequence relation by no means implies the disallowance of gappy reasoning in the two-dimensional setting: we still have $F \in \overline{\mathcal{Y}_5} \cap \overline{\mathcal{N}_5}$, so one may both non-accept and non-reject a formula $\varphi$ in $\frac{\cdot}{\cdot}\big|\frac{\cdot}{\cdot}_{\mathcal{R}_{\mathbf{mCi}}}$, even though non-accepting both $\varphi$ and its negation in **mCi** is not possible, in view of rule $\neg^{\mathbf{mCi}}_7$. Similarly, the recovery of $\neg$-consistency achieved via $\circ$ in such a logic does not coincide with the gentle disallowance of glutty reasoning in $\frac{\cdot}{\cdot}\big|\frac{\cdot}{\cdot}_{\mathcal{R}_{\mathbf{mCi}}}$, that is, we do not have, in general, $\frac{\emptyset}{p, \circ p}\big|\frac{\emptyset}{p}_{\mathcal{R}_{\mathbf{mCi}}}$ or $\frac{p}{\emptyset}\big|\frac{p, \circ p}{\emptyset}_{\mathcal{R}_{\mathbf{mCi}}}$, even though for binary compounds both are derivable in view of rules $\circledast^{\mathbf{mCi}}_5$, for $\circledast \in \{\wedge, \vee, \supset\}$, and $\circ^{\mathbf{mCi}}_1$. With these observations we hope to call attention to the fact that B-consequence relations open the doors for further developments concerning the study of paraconsistency (and, dually, of paracompleteness), as well as the study of recovery operators [8].

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Vampire Getting Noisy: Will Random Bits Help Conquer Chaos? (System Description)**

Martin Suda

Czech Technical University in Prague, Prague, Czech Republic martin.suda@cvut.cz

**Abstract.** Treating a saturation-based automatic theorem prover (ATP) as a Las Vegas randomized algorithm is a way to illuminate the chaotic nature of proof search and make it amenable to study by probabilistic tools. On a series of experiments with the ATP Vampire, the paper showcases some implications of this perspective for prover evaluation.

**Keywords:** Saturation-based proving *·* Evaluation *·* Randomization

# **1 Introduction**

Saturation-based proof search is known to be fragile. Even seemingly insignificant changes in the search procedure, such as shuffling the order in which input formulas are presented to the prover, can have a huge impact on the prover's running time and thus on the ability to find a proof within a given time limit.

This *chaotic* aspect of the prover behaviour is relatively poorly understood, yet has obvious consequences for evaluation. A typical experimental evaluation of a new technique T compares the number of problems solved by a baseline run with a run enhanced by T (over an established benchmark and with a fixed timeout). While a higher number of problems solved by the run enhanced by T indicates a benefit of the new technique, it is hard to claim that a certain problem P is getting solved *thanks* to T. It might be that T just helps the prover get lucky on P by a complicated chain of cause and effect not related to the technique T—and the original idea behind it—in any reasonable sense.

We propose to expose and counter the effect of chaotic behaviours by deliberately *injecting randomness* into the prover and observing the results of many independently seeded runs. Although computationally more costly than standard evaluation, such an approach promises to bring new insights. We gain the ability to apply the tools of probability theory and statistics to analyze the results, assign confidences, and single out those problems that *robustly* benefit from the evaluated technique. At the same time, by observing the changes in the corresponding runtime distributions we can even meaningfully establish the effect of the new technique on a single problem in isolation, something that is normally inconclusive due to the threat of chaotic fluctuations.

© The Author(s) 2022

This work was supported by the Czech Science Foundation project 20-06390Y and the project RICAIP no. 857306 under the EU-H2020 programme.

J. Blanchette et al. (Eds.): IJCAR 2022, LNAI 13385, pp. 659–667, 2022. https://doi.org/10.1007/978-3-031-10769-6\_38

In this paper, we report on several experiments with a randomized version of the ATP Vampire [9]. After explaining the method in more detail (Sect. 2), we first demonstrate the extent to which the success of a typical Vampire proof search strategy can be ascribed to chance (Sect. 3). Next, we use the collected data to highlight the specifics of comparing two strategies probabilistically (Sect. 4). Finally, we focus on a single problem to see a chaotic behaviour smoothed into a distribution with a high variance (Sect. 5). The paper ends with an overview of related work (Sect. 6) and a discussion (Sect. 7).

## **2 Randomizing Out Chaos**

Any developer of a saturation-based prover will confirm that the behaviour of a specific proving strategy on a specific problem is extremely hard to predict, that a typical experimental evaluation of a new technique (such as the one described earlier) invariably leads to both gains and losses in terms of the solved problems, and that a closer look at any of the "lost" problems often reveals just a complicated chain of cause and effect that steers the prover away from the original path (rather than a simple opportunity to improve the technique further).

These observations bring indirect evidence that the prover's behaviour is chaotic: A specific prover run can be likened to a single bead falling down through the pegs of the famous Galton board<sup>1</sup>. The bead follows a deterministic trajectory, but only because the code fixes every single detail of the execution, including many which the programmer did not care about and which were left as they are merely by coincidence. We put forward here that any such fixed detail (which does not contribute to an officially implemented heuristic) represents a candidate location for randomization, since a different programmer could have fixed the detail differently and we would still call the code essentially the same.

*Implementation:* We implemented randomization on top of Vampire version 4.6.1; the code is available as a separate git branch<sup>2</sup>. We divided the randomization opportunities into three groups (governed by three new Vampire options).

Shuffling the input (-si on) randomly reorders the input formulas and, recursively, sub-formulas under commutative logical operations. This is done several times throughout the preprocessing pipeline, at the end of which a finished clause normal form is produced. Randomizing traversals (-rtra on) happens during saturation and consists of several randomized reorderings, including: reordering literals in a newly generated clause and in each given clause before activation, and shuffling the order in which generated clauses are put into the passive set. It also (partially) randomizes term ids, which are used as tiebreakers in various term indexing operations and determine the default orientation of equational literals in the term sharing structure. Finally, "randomized age-weight ratio" (-rawr on) swaps the default, deterministic mechanism for choosing the next queue to select the given clause from [13] for a randomized one (which only respects the age-weight ratio probabilistically).

<sup>1</sup> https://en.wikipedia.org/wiki/Galton board.

<sup>2</sup> https://github.com/vprover/vampire/tree/randire.

**Fig. 1.** Blue: first-order TPTP problems ordered by the decreasing probability of being solved by the dis10 strategy within 50 billion instruction limit. Red: a cactus plot for the same strategy, showing the dependence between a given instruction budget (*y*-axis) and the number of problems on average solved within that budget (*x*-axis). (Color figure online)

All three options were active by default during our experiments.
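The last option can be illustrated with a minimal sketch (not Vampire's actual implementation; the function name and interface are made up for exposition): with an age-weight ratio a:w, the age-ordered queue is picked with probability a/(a+w).

```python
import random

def pick_queue(age: int, weight: int, rng: random.Random) -> str:
    """Probabilistic age-weight choice: with ratio a:w, select the
    age-ordered queue with probability a/(a+w), else the weight queue."""
    assert age >= 0 and weight >= 0 and age + weight > 0
    return "age" if rng.random() < age / (age + weight) else "weight"

# Degenerate ratios behave deterministically:
rng = random.Random(0)
assert all(pick_queue(1, 0, rng) == "age" for _ in range(100))
assert all(pick_queue(0, 1, rng) == "weight" for _ in range(100))
```

Unlike a deterministic a:w schedule, this choice respects the ratio only in expectation, which is exactly the kind of detail -rawr randomizes.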

#### **3 Experiment 1: A Single-Strategy View**

First, we set out to establish to what degree the performance of a Vampire strategy can be affected by randomization. We chose the default strategy of the prover except for the saturation algorithm, which we set to Discount, and the age-weight ratio, set to 1:10 (calling the strategy dis10). We ran our experiment on the first-order problems from the TPTP library [15] version 7.5.0<sup>3</sup>.

To collect our data, we repeatedly (with different seeds) ran the prover on the problems, performing full randomization. We measured the executed instructions<sup>4</sup> needed to successfully solve a problem and used a limit of 50 billion instructions (which roughly corresponds to 15 s of running time on our machine<sup>5</sup>) after which a run was declared unsuccessful. We ran the prover 10 times on each problem and additionally as many times as required to observe the instruction count average (over both successful and unsuccessful runs) stabilize within 1% from any of its 10 previously recorded values<sup>6</sup>.
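The stopping rule can be sketched as follows, under one plausible reading of the criterion (the running average must stay within 1% of each of the 10 previously recorded averages; the helper name and this reading are our assumptions, not the paper's exact procedure):

```python
def runs_until_stable(cost_of_run, min_runs=10, window=10, tol=0.01):
    """Keep invoking cost_of_run(i) (instruction count of the i-th seeded
    run, successful or not) until the running average differs by at most
    `tol` (1%) from each of the `window` previously recorded averages."""
    averages, total, n = [], 0.0, 0
    while True:
        total += cost_of_run(n)
        n += 1
        avg = total / n
        if (n > min_runs and len(averages) >= window
                and all(abs(avg - a) <= tol * avg for a in averages[-window:])):
            return n, avg
        averages.append(avg)
```

For a problem whose runs all cost the same, this stops after the minimum of 11 runs; high-variance problems keep accumulating runs until the average settles.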

A summary view of the experiment is given by Fig. 1. The most important thing to notice is the shaded region, which spans 965 problems that were solved by dis10 at least once but not by every run. In other words, these problems are solved with probability 0 < *p* < 1. This is a relatively large number and can be compared to the 8720 "easy" problems solved by every run. The collected data implies that 9319.1 problems are being solved on average (marked by the left-most dashed line in Fig. 1) with a standard deviation σ = 11.7. The latter should be an interesting indicator for prover developers: beating a baseline by only 12 TPTP problems can easily be ascribed just to chance.

<sup>3</sup> Materials accompanying the experiments can be found at https://bit.ly/3JDCwea.

<sup>4</sup> As measured via the perf event open Linux performance monitoring feature.

<sup>5</sup> A server with Intel(R) Xeon(R) Gold 6140 CPUs @ 2.3 GHz and 500 GB RAM.

<sup>6</sup> Utilizing all the 72 cores of our machine, such data collection took roughly 12 h.

**Fig. 2.** The effect of turning AVATAR off in the dis10 strategy (cf. Figure 1).
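The mean and standard deviation of the solved-problem count follow directly from the per-problem empirical solve probabilities, if one assumes (as a sketch; the independence assumption is ours) that each problem's outcome is an independent Bernoulli trial:

```python
import math

def solved_count_stats(probs):
    """For independent per-problem solve probabilities p_i, the number of
    solved problems has mean sum(p_i) and variance sum(p_i * (1 - p_i))."""
    mean = sum(probs)
    sigma = math.sqrt(sum(p * (1 - p) for p in probs))
    return mean, sigma

# Problems solved with certainty contribute to the mean but not the variance:
mean, sigma = solved_count_stats([1.0, 1.0, 0.5, 0.5])
assert mean == 3.0
assert abs(sigma - math.sqrt(0.5)) < 1e-12
```

Only the problems with 0 < p < 1 contribute to the variance, which is why the size of the shaded region matters for how trustworthy a "+12 problems" result is.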

Figure 1 also contains the obligatory "cactus plot" (explained in the caption), which—thanks to the collected data—can be constructed with the "on average" qualifier. By definition, the plot reaches the left-most dashed line for the full instruction budget of 50 billion. The subsequent dashed lines mark the number of problems we would on average expect to solve by running the prover (independently) on each problem two, three, four, and five times. This information is relevant for strategy scheduling: e.g., one can expect to solve an additional 137 problems by running randomized dis10 for a second time.
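The per-repetition gains can be computed from the same probabilities: a problem with solve probability p is solved by at least one of k independent runs with probability 1 − (1 − p)<sup>k</sup> (a sketch under the same independence assumption as above):

```python
def expected_coverage(probs, k):
    """Expected number of problems solved by at least one of k
    independently seeded runs, given per-problem solve probabilities."""
    return sum(1.0 - (1.0 - p) ** k for p in probs)

# A second run only helps on problems with 0 < p < 1:
probs = [1.0, 0.5, 0.0]
assert expected_coverage(probs, 1) == 1.5
assert expected_coverage(probs, 2) == 1.75
```

The dashed lines in Fig. 1 correspond to evaluating this quantity for k = 1, ..., 5 over the empirical probabilities.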

Not every strategy exhibits the same degree of variability under randomization. Observe Fig. 2 with a plot analogous to Fig. 1, but for dis10 in which AVATAR [16] has been turned off. The shaded area there is now much smaller (and only spans 448 problems). The powerful AVATAR architecture is getting convicted of making proof search more fragile and the prover less robust<sup>7</sup>.

*Remark.* Randomization incurs a small but measurable computational overhead. On a single run of dis10 over the first-order TPTP (filtering out cases that took less than 1 s to finish, to prevent distortion by rounding errors) the observed median relative time spent randomizing on a single problem was 0.47%, the average 0.59%, and the worst<sup>8</sup> 13.86%. Without randomization, the dis10 strategy solved 9335 TPTP problems under the 50 billion instruction limit, i.e., 16 problems more than the average reported above. Such is the price we pay for turning our prover into a Las Vegas randomized algorithm.

<sup>7</sup> Another example of a strong but fragile heuristic is the lookahead literal selection [5], which selects literals in a clause based on the current content of the active set: dis10 enhanced with lookahead solves 9512.4 (±13.8) TPTP problems on average, 8672 problems with *p* = 1 and an additional 1382 (!) problems with 0 < *p* < 1.

<sup>8</sup> On the hard-to-parse, trivial-to-solve HWV094-1 with 361 199 clauses.

**Fig. 3.** Scatter plots comparing probabilities of solving a TPTP problem by the baseline dis10 strategy and 1) dis10 with AVATAR turned off (left), and 2) dis10 with blocked clause elimination turned on (right). On problems marked red the respective technique could not be applied (no splittable clauses derived / no blocked clauses eliminated).

#### **4 Experiment 2: Comparing Two Strategies**

Once randomized performance profiles of multiple strategies are collected, it is interesting to look at two at a time. Figure 3 shows two very different scatter plots, each comparing our baseline dis10 to its modified version in terms of the probabilities of solving individual problems.

On the left we see the effect of turning AVATAR off. The technique affects the proving landscape quite a lot and most problems have their mark along the edges of the plot, where at least one of the two probabilities has the extreme value of either 0 or 1. What the plot does not show well is how many marks end up at the extreme corners. These are: 7896 problems easy for both, 661 easy for AVATAR and hard without, 135 hard for AVATAR and easy without.

Such "purified", one-sided gains and losses constitute a new interesting indicator of the impact of a given technique. They should be the first to look at, e.g., during debugging, as they represent the most extreme but robust examples of how the new technique changes the capabilities of the prover.

The right plot is an analogous view, but now at the effect of turning on *blocked clause elimination* (BCE). This is a preprocessing technique coming from the context of propositional satisfiability [7] extended to first-order logic [8]. We see that here most of the visible problems show up as marks along the plot's main diagonal, suggesting a (mostly) negligible effect of the technique. The extreme corners hide: 8648 problems easy for both, 17 easy with BCE (11 satisfiable and 6 unsatisfiable), and 2 easy without BCE (1 satisfiable and 1 unsatisfiable).

**Fig. 4.** 2D-histograms for the relative frequencies (color-scale) of how often, given a specific *awr* (*x*-axis), solving PRO017+2 required the shown number of instructions (*y*axis). The curves in pink highlight the mean *y*-value for every *x*. The performance of dis10 (left) and the same strategy enhanced by a goal-directed heuristic (right). (Color figure online)

#### **5 Experiment 3: Looking at One Problem at a Time**

In their paper on age/weight shapes [13, Fig. 2], Rawson and Reger plot the number of given-clause loops required by Vampire to solve the TPTP problem PRO017+2 as a function of age/weight ratio (*awr*), a ratio specifying how often the prover selects the next clause to activate from its age-ordered and weight-ordered queues, respectively. The curve they obtain is quite "jiggly", indicating a fragile (discontinuous) dependence. Randomization allows us to smoothen the picture and reveal new, until now hidden, (probabilistic) patterns.

The 2D-histogram in Fig. 4 (left) was obtained from 100 independently seeded runs for each of 1200 distinct values of *awr* between 1:1024 = 2<sup>−10</sup> and 4:1 = 2<sup>2</sup>. We can confirm Rawson and Reger's observation of the best *awr* for PRO017+2 lying at around 1:2. However, we can now also attempt to explain the "jiggly-ness" of their curve: With a fragile proof search, even a slight change in *awr* effectively corresponds to an independent sample from the prover's execution resource<sup>9</sup> distribution, which—although changing continuously with *awr*—is of a high variance for our problem (note the log-scale of the y-axis)<sup>10</sup>.
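Each column of such a 2D-histogram is conceptually simple to build: bin the instruction counts of the seeded runs for one *awr* value on a logarithmic scale (a sketch; the bin granularity is an arbitrary choice of ours, not taken from the paper):

```python
import math
from collections import Counter

def log_bins(instruction_counts, bins_per_decade=4):
    """Histogram positive instruction counts into logarithmically spaced
    bins, matching a log-scale y-axis; returns {bin_index: frequency}."""
    hist = Counter()
    for c in instruction_counts:
        hist[math.floor(bins_per_decade * math.log10(c))] += 1
    return dict(hist)

# Counts one decade apart land in adjacent bins (one bin per decade here):
assert log_bins([10, 100, 100], bins_per_decade=1) == {1: 1, 2: 2}
```

A multi-modal distribution then shows up directly as two or more separated clusters of heavy bins in a single column.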

The distribution has another interesting property: At least for certain values of *awr*, it is distinctly multi-modal. It is as if the prover either finds a proof quickly (after a lucky event?) or only after a much harder effort later, with almost nothing in between. Shedding more light on this phenomenon is left for further research.

It is also very interesting to observe the change of such a 2D-histogram when we modify the proof search strategy. Figure 4 (right) shows the effect of turning on SInE-level split queues [3], a goal-directed clause selection heuristic

<sup>9</sup> Rawson and Reger [13] counted given-clause loops, we measure instructions.

<sup>10</sup> Even with 100 samples for each value of *awr*, the mean instruction count (rendered in pink in Fig. 4) looks jiggly towards the weight-heavy end of the plot.

(Vampire option -slsq on). We can see that the mean instruction count gets worse (for every tried *awr* value) and also the variance of the distribution distinctly increases. A curious effect of this is that we observe the shortest successful runs with -slsq on, while we still could not recommend (in the case of PRO017+2) this heuristic to the user. The probabilistic view makes us realize that there are competing criteria of prover performance for which one might want to optimize.

## **6 Related Work**

The idea of randomizing a theorem prover is not new. Ertel [2] studied the speedup potential of running independently seeded instances of the connection prover SETHEO [10]. The dashed lines in our Figs. 1 and 2 capture an analogous notion in terms of "additional problems covered" for levels of parallelism 1–5. randoCoP [12] is a randomized version of another connection prover, leanCoP 2.0 [11]: especially in its incomplete setup, several restarts with different seeds helped randoCoP improve over leanCoP in terms of the number of solved problems.

Gomes et al. [4] notice that randomized complete backtracking algorithms for propositional satisfiability (SAT) lead to heavy-tailed runtime distributions on satisfiable instances. While we have not yet analyzed the runtime distributions coming from saturation-based first-order proof search in detail, we definitely observed high variance also for unsatisfiable problems. Also in the domain of SAT, Brglez et al. [1] proposed input shuffling as a way of turning a solver's runtime into a random variable and studied the corresponding distributions.

An interesting view on the trade-offs between expected performance of a randomized solver and the risk associated with waiting for an especially long run to finish is given by Huberman et al. [6]. This is related to the last remark of the previous section.

Finally, in the satisfiability modulo theories (SMT) community, input shuffling, or scrambling, has been discussed as an obfuscation measure in competitions [17], where it should prevent solvers from simply looking up a precomputed answer upon recognising a previously seen problem. Notable is also the use of randomization in solver debugging via fuzz testing [14,18].

## **7 Discussion**

As we have seen, the behaviour of a state-of-the-art saturation-based theorem prover is to a considerable degree chaotic, and on many problems a mere perturbation of seemingly unimportant execution details decides the success or failure of the corresponding run. While this may be seen as a sign of our as-of-yet imperfect grasp of the technology, the author believes that an equally plausible view is that some form of chaos is inherent and originates from the complexity of the theorem proving task itself. (Higher-order proof search is expected to exhibit an even higher degree of fragility.)

This paper has proposed randomization as a key ingredient of a prover evaluation method that takes the chaotic nature of proof search into account. The extra cost required by the repeated runs, in itself not unreasonable to pay on contemporary parallel hardware, seems more than compensated by the new insights coming from the probabilistic picture that emerges. Moreover, other uses of randomization are easy to imagine, such as data augmentation for machine learning approaches or the construction of more robust strategy schedules. It feels as if we have only scratched the surface of the possibilities opened up. More research will be needed to fully harness the potential of this perspective.

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Evolution, Termination, and Decision Problems**

# **On Eventual Non-negativity and Positivity for the Weighted Sum of Powers of Matrices**

S. Akshay(B) , Supratik Chakraborty(B) , and Debtanu Pal

Indian Institute of Technology Bombay, Mumbai 400076, India {akshayss,supratik,debtanu}@cse.iitb.ac.in

**Abstract.** The long run behaviour of linear dynamical systems is often studied by looking at eventual properties of matrices and recurrences that underlie the system. A basic problem in this setting is as follows: given a set of pairs of rational weights and matrices {(w<sub>1</sub>, A<sub>1</sub>),..., (w<sub>m</sub>, A<sub>m</sub>)}, does there exist an integer N s.t. for all n ≥ N, ∑<sub>i=1</sub><sup>m</sup> w<sub>i</sub> · A<sub>i</sub><sup>n</sup> ≥ 0 (resp. > 0)? We study this problem, its applications and its connections to linear recurrence sequences. Our first result is that for m ≥ 2, the problem is as hard as the ultimate positivity of linear recurrences, a long standing open question (known to be coNP-hard). Our second result is that for any m ≥ 1, the problem reduces to ultimate positivity of linear recurrences. This yields upper bounds for several subclasses of matrices by exploiting known results on linear recurrence sequences. Our third result is a general reduction technique for a large class of problems (including the above) from the diagonalizable case to the case where the matrices are simple (have non-repeated eigenvalues). This immediately gives a decision procedure for our problem for diagonalizable matrices.

**Keywords:** Eventual properties of matrices · Ultimate positivity · Linear recurrence sequences

# **1 Introduction**

The study of eventual or asymptotic properties of discrete-time linear dynamical systems has long been of interest to both theoreticians and practitioners. Questions pertaining to (un)-decidability and/or computational complexity of predicting the long-term behaviour of such systems have been extensively studied over the last few decades. Despite significant advances, however, there remain simple-to-state questions that have eluded answers so far. In this work, we investigate one such problem, explore its significance and links with other known problems, and study its complexity and computability landscape.

Author names are in alphabetical order of last names.

© The Author(s) 2022 J. Blanchette et al. (Eds.): IJCAR 2022, LNAI 13385, pp. 671–690, 2022. https://doi.org/10.1007/978-3-031-10769-6\_39

This work was partly supported by DST/CEFIPRA/INRIA Project EQuaVE and DST/SERB Matrices Grant MTR/2018/000744.

The time-evolution of linear dynamical systems is often modeled using linear recurrence sequences, or using sequences of powers of matrices. Asymptotic properties of powers of matrices are therefore of central interest in the study of linear differential systems, dynamic control theory, analysis of linear loop programs etc. (see e.g. [26,32,36,37]). The literature contains a rich body of work on the decidability and/or computational complexity of problems related to the long-term behaviour of such systems (see, e.g. [15,19,27,29,36,37]). A question of significant interest in this context is whether the powers of a given matrix of rational numbers eventually have only non-negative (resp. positive) entries. Such matrices, also called *eventually non-negative* (resp. *eventually positive*) matrices, enjoy beautiful algebraic properties ([13,16,25,38]), and have been studied by mathematicians, control theorists and computer scientists, among others. For example, the work of [26] investigates reachability and holdability of nonnegative states for linear differential systems – a problem in which eventually non-negative matrices play a central role. Similarly, eventual non-negativity (or positivity) of a matrix modeling a linear dynamical system makes it possible to apply the elegant Perron-Frobenius theory [24,34] to analyze the long-term behaviour of the system beyond an initial number of time steps. Another level of complexity is added if the dynamics is controlled by a set of matrices rather than a single one. For instance, each matrix may model a mode of the linear dynamical system [23]. In a partial observation setting [22,39], we may not know which mode the system has been started in, and hence have to reason about eventual properties of this multi-modal system. This reduces to analyzing the sum of powers of the per-mode matrices, as we will see.

Motivated by the above considerations, we study the problem of determining whether a given matrix of rationals is eventually non-negative or eventually positive, and also a generalized version of this problem, wherein we ask if the *weighted sum of powers of a given set of matrices of rationals* is eventually non-negative (resp. positive). Let us formalize the general problem statement. *Given a set* A = {(w<sub>1</sub>, A<sub>1</sub>),..., (w<sub>m</sub>, A<sub>m</sub>)}*, where each* w<sub>i</sub> *is a rational number and each* A<sub>i</sub> *is a* k×k *matrix of rationals, we wish to determine if* ∑<sub>i=1</sub><sup>m</sup> w<sub>i</sub> · A<sub>i</sub><sup>n</sup> *has only non-negative (resp. positive) entries for all sufficiently large values of* n*.* We call this problem the *Eventually Non-Negative (resp. Positive) Weighted Sum of Matrix Powers* problem, or ENNSoM (resp. EPSoM) for short. The eventual non-negativity (resp. positivity) of powers of a single matrix is a special case of the above problem, where A = {(1, A)}. We call this special case the *Eventually Non-Negative (resp. Positive) Matrix* problem, or ENNMat (resp. EPMat) for short.
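To make the objects concrete, here is a minimal exact-arithmetic sketch that evaluates the weighted sum of matrix powers and tests non-negativity up to a finite horizon. The function names are ours, and the horizon check is only an illustration: no finite amount of testing decides ENNSoM, which is exactly what makes the problem hard.

```python
from fractions import Fraction

def mat_mul(A, B):
    k = len(A)
    return [[sum(A[i][t] * B[t][j] for t in range(k)) for j in range(k)]
            for i in range(k)]

def mat_pow(A, n):
    """A^n by binary exponentiation over exact rationals."""
    k = len(A)
    result = [[Fraction(int(i == j)) for j in range(k)] for i in range(k)]
    base = A
    while n:
        if n & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        n >>= 1
    return result

def weighted_sum_of_powers(pairs, n):
    """sum_i w_i * A_i^n for pairs = [(w_1, A_1), ..., (w_m, A_m)]."""
    k = len(pairs[0][1])
    S = [[Fraction(0)] * k for _ in range(k)]
    for w, A in pairs:
        P = mat_pow(A, n)
        S = [[S[i][j] + w * P[i][j] for j in range(k)] for i in range(k)]
    return S

def nonnegative_up_to(pairs, N, horizon):
    """Heuristic check: is the weighted sum entrywise >= 0 for N <= n <= horizon?"""
    return all(e >= 0
               for n in range(N, horizon + 1)
               for row in weighted_sum_of_powers(pairs, n)
               for e in row)
```

For instance, with A<sub>1</sub> = diag(1, 2), A<sub>2</sub> = diag(0, 1) and weights (1, −1), the sum equals diag(1, 2<sup>n</sup> − 1), which is non-negative for every n ≥ 1.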

Given the simplicity of the ENNSoM and EPSoM problem statements, one may be tempted to think that there ought to be simple algebraic characterizations that tell us whether ∑<sub>i=1</sub><sup>m</sup> w<sub>i</sub> · A<sub>i</sub><sup>n</sup> is eventually non-negative or positive. But in fact, the landscape is significantly nuanced. On one hand, a solution to the general ENNSoM or EPSoM problem would resolve long-standing open questions in mathematics and computer science. On the other hand, efficient algorithms can indeed be obtained under certain well-motivated conditions. This paper is a study of both these aspects of the problem. Our primary contributions can be summarized as follows. Below, we use A = {(w<sub>1</sub>, A<sub>1</sub>),..., (w<sub>m</sub>, A<sub>m</sub>)} to define an instance of ENNSoM or EPSoM.

1. If |A| ≥ 2, we show that both ENNSoM and EPSoM are as hard as the ultimate non-negativity problem for linear recurrence sequences (UNNLRS, for short). The decidability of UNNLRS is closely related to Diophantine approximations, and remains unresolved despite extensive research (see e.g. [31]). Since UNNLRS is coNP-hard (in fact, as hard as the decision problem for the universal theory of reals), so are ENNSoM and EPSoM when |A| ≥ 2. Thus, unless P = NP, we cannot hope for polynomial-time algorithms, and any algorithm would also resolve long-standing open problems.


As mentioned earlier, the eventual non-negativity and positivity problems for single rational matrices are well-motivated in the literature, and EPMat (or EPSoM with |A| = 1) is known to be in PTIME [25]. But for ENNMat, no decidability results are known to the best of our knowledge. From our work, we obtain two new results about ENNMat: (i) in general, ENNMat reduces to UNNLRS, and (ii) for diagonalizable matrices, we can decide ENNMat. What is surprising (see Sect. 5) is that the latter decidability result goes via ENNSoM, i.e. the multiple-matrices case. Thus, reasoning about sums of powers of matrices, viz. ENNSoM, is useful even when reasoning about powers of a single matrix, viz. ENNMat.

*Potential Applications of* ENNSoM *and* EPSoM*.* A prime motivation for defining the generalized problem statement ENNSoM is that it is useful even when reasoning about the single matrix case ENNMat. However, and unsurprisingly, ENNSoM and EPSoM are also well-motivated independently. Indeed, for every application involving a linear dynamical system that reduces to ENNMat/EPMat, there is a naturally defined aggregated version of the application involving multiple independent linear dynamical systems that reduces to ENNSoM/EPSoM (e.g., the *swarm of robots* example in [3]).

Beyond this, ENNSoM/EPSoM arise naturally and directly when solving problems in different practical scenarios. Due to lack of space, we detail two applications here and describe more in the longer version of the paper [3].

**Partially Observable Multi-modal Systems.** Our first example comes from the domain of cyber-physical systems in a partially observable setting. Consider a system (e.g. a robot) with m modes of operation, where the i-th mode dynamics is given by a linear transformation encoded as a k×k matrix of rationals, say A<sub>i</sub>. Thus, if the system state at (discrete) time t is represented by a k-dimensional rational (row) vector **u<sub>t</sub>**, the state at time t + 1, when operating in mode i, is given by **u<sub>t</sub>**A<sub>i</sub>. Suppose the system chooses to operate in one of its various modes at time 0, and then sticks to this mode at all subsequent times. Further, the initial choice of mode is not observable, and we are only given a probability distribution over modes for the initial choice. This is natural, for instance, if our robot (multi-modal system) knows the terrain map and can make an initial choice of which path (mode) to take, but cannot change its path once it has chosen. If p<sub>i</sub> is a rational number denoting the probability of choosing mode i initially, then the expected state at time n is given by ∑<sub>i=1</sub><sup>m</sup> p<sub>i</sub> · **u<sub>0</sub>**A<sub>i</sub><sup>n</sup> = **u<sub>0</sub>** ∑<sub>i=1</sub><sup>m</sup> p<sub>i</sub> · A<sub>i</sub><sup>n</sup>. A safety question in this context is whether, starting from a state **u<sub>0</sub>** with all non-negative (resp. positive) components, the system is expected to eventually stay locked in states that have all non-negative (resp. positive) components. In other words, does **u<sub>0</sub>** ∑<sub>i=1</sub><sup>m</sup> p<sub>i</sub> · A<sub>i</sub><sup>n</sup> have all non-negative (resp. positive) entries for all sufficiently large n? Clearly, a sufficient condition for an affirmative answer to this question is to have ∑<sub>i=1</sub><sup>m</sup> p<sub>i</sub> · A<sub>i</sub><sup>n</sup> eventually non-negative (resp. positive), which is an instance of ENNSoM (resp. EPSoM).

**Commodity Flow Networks.** Consider a flow network where m different commodities {c<sub>1</sub>,...,c<sub>m</sub>} use the same flow infrastructure spanning k nodes, but have different loss/regeneration rates along different links. For every pair of nodes i, j ∈ {1,...,k} and for every commodity c ∈ {c<sub>1</sub>,...,c<sub>m</sub>}, suppose A<sub>c</sub>[i, j] gives the fraction of the flow of commodity c starting from i that reaches j through the link connecting i and j (if it exists). In general, A<sub>c</sub>[i, j] is the product of the fraction of the flow of commodity c starting at i that is sent along the link to j, and the loss/regeneration rate of c as it flows in the link from i to j. Note that A<sub>c</sub>[i, j] can be 0 if commodity c is never sent directly from i to j, or the commodity is lost or destroyed in flowing along the link from i to j. It can be shown that A<sub>c</sub><sup>n</sup>[i, j] gives the fraction of the flow of c starting from i that reaches j after n hops through the network. If commodities keep circulating through the network ad infinitum, we wish to find if the network gets *saturated*, i.e., for all sufficiently long hops through the network, there is a non-zero fraction of some commodity that flows from i to j for every pair i, j. This is equivalent to asking if there exists N ∈ N such that ∑<sub>ℓ=1</sub><sup>m</sup> A<sub>c<sub>ℓ</sub></sub><sup>n</sup> > 0 for all n ≥ N. If different commodities have different weights (or costs) associated, with commodity c<sub>ℓ</sub> having the weight w<sub>ℓ</sub>, the above formulation asks if ∑<sub>ℓ=1</sub><sup>m</sup> w<sub>ℓ</sub> · A<sub>c<sub>ℓ</sub></sub><sup>n</sup> is eventually positive, which is effectively the EPSoM problem.

*Other Related Work.* Our problems of interest are different from other well-studied problems that arise if the system is allowed to choose its mode independently at each time step (e.g. as in Markov decision processes [5,21]). The crucial difference stems from the fact that we require that the mode be chosen once initially, and subsequently, the system must follow the same mode forever. Thus, our problems are prima facie different from those related to general probabilistic or weighted finite automata, where reachability of states and questions pertaining to long-run behaviour are either known to be undecidable or have remained open for long ([6,12,17]). Even in the case of unary probabilistic/weighted finite automata [1,4,8,11], reachability is known in general to be as hard as the Skolem problem on linear recurrences – a long-standing open problem, with decidability only known in very restricted cases. The difference sometimes manifests itself in the simplicity/hardness of solutions. For example, EPMat (or EPSoM with |A| = 1) is known to be in PTIME [25] (not so for ENNMat, however), whereas it is still open whether the reachability problem for unary probabilistic/weighted automata is decidable. It is also worth remarking that instead of the sum of powers of matrices, if we considered the product of their powers, we would effectively be solving problems akin to the *mortality problem* [9,10] (which asks whether the all-0 matrix can be reached by multiplying, with repetition, from a set of matrices) – a notoriously difficult problem. The diagonalizable matrix restriction is a common feature in the context of linear loop programs (see, e.g., [7,28]), where matrices are used for updates. Finally, logics to reason about temporal properties of linear loops have been studied, although decidability is known only in restricted settings, e.g. when each predicate defines a semi-algebraic set contained in some 3-dimensional subspace, or has intrinsic dimension 1 [20].

### **2 Preliminaries**

The symbols Q, R, A and C denote the set of rational, real, algebraic and complex numbers respectively. Recall that an *algebraic number* is a root of a non-zero polynomial in one variable with rational coefficients. An algebraic number can be real or complex. We use RA to denote the set of real algebraic numbers (which includes all rationals). The sum, difference and product of two (real) algebraic numbers is again (real) algebraic. Furthermore, every root of a polynomial equation with (real) algebraic coefficients is again (real) algebraic. We call matrices with all rational (resp. real algebraic or real) entries *rational* (resp. *real algebraic* or *real*) *matrices*. We use A ∈ Q<sup>k×l</sup> (resp. A ∈ R<sup>k×l</sup> and A ∈ RA<sup>k×l</sup>) to denote that A is a k×l rational (resp. real and real algebraic) matrix, with rows indexed 1 through k, and columns indexed 1 through l. The entry in the i-th row and j-th column of a matrix A is denoted A[i, j]. If A is a column vector (i.e. l = 1), we often use boldface letters, viz. **A**, to refer to it. In such cases, we use **A**[i] to denote the i-th component of **A**, i.e. A[i, 1]. The transpose of a k × l matrix A, denoted A<sup>T</sup>, is the l × k matrix obtained by letting A<sup>T</sup>[i, j] = A[j, i] for all i ∈ {1,...,l} and j ∈ {1,...,k}. Matrix A is said to be *non-negative* (resp. *positive*) if all entries of A are non-negative (resp. positive) real numbers. Given a set A = {(w<sub>1</sub>, A<sub>1</sub>),..., (w<sub>m</sub>, A<sub>m</sub>)} of (weight, matrix) pairs, where each A<sub>i</sub> ∈ Q<sup>k×k</sup> (resp. ∈ RA<sup>k×k</sup>) and each w<sub>i</sub> ∈ Q, we use Σ<sub>A</sub><sup>n</sup> to denote the weighted matrix sum ∑<sub>i=1</sub><sup>m</sup> w<sub>i</sub> · A<sub>i</sub><sup>n</sup>, for every natural number n > 0. Note that Σ<sub>A</sub><sup>n</sup> is itself a matrix in Q<sup>k×k</sup> (resp. RA<sup>k×k</sup>).

**Definition 1.** *We say that* $\mathcal{A}$ *is eventually non-negative (resp. positive) iff there is a positive integer* $N$ *s.t.* $\Sigma\mathcal{A}^n$ *is non-negative (resp. positive) for all* $n \ge N$*.*

The ENNSoM (resp. EPSoM) problem, described in Sect. 1, can now be re-phrased as: *Given a set* $\mathcal{A}$ *of pairs of rational weights and rational* $k \times k$ *matrices, is* $\mathcal{A}$ *eventually non-negative (resp. positive)?* As mentioned in Sect. 1, if $\mathcal{A} = \{(1, A)\}$, the ENNSoM (resp. EPSoM) problem is also called ENNMat (resp. EPMat). We note that the study of ENNSoM and EPSoM with $|\mathcal{A}| = 1$ is effectively the study of ENNMat and EPMat, i.e., w.l.o.g. we can assume $w_1 = 1$.
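To make the problem statement concrete, the following is a heuristic finite-horizon check for ENNSoM: it evaluates the weighted sum of matrix powers up to a cutoff. This is only a sanity-check sketch, NOT a decision procedure (the results below show the problems are as hard as UNNLRS/UPLRS); the cutoff `N` and the example pairs are our own illustrative assumptions.

```python
# Heuristic finite-horizon check for eventual non-negativity of a weighted
# sum of matrix powers. Exact rational arithmetic avoids rounding artifacts.
from fractions import Fraction

def mat_mul(X, Y):
    k = len(X)
    return [[sum(X[i][l] * Y[l][j] for l in range(k)) for j in range(k)]
            for i in range(k)]

def weighted_power_sum(pairs, n):
    """Return sum_i w_i * A_i^n as a matrix of Fractions."""
    k = len(pairs[0][1])
    total = [[Fraction(0)] * k for _ in range(k)]
    for w, A in pairs:
        P = A
        for _ in range(n - 1):
            P = mat_mul(P, A)
        for i in range(k):
            for j in range(k):
                total[i][j] += w * P[i][j]
    return total

def looks_eventually_nonnegative(pairs, N=50):
    """True iff no negative entry occurs at the end of the horizon [1, N]."""
    last_bad = 0
    for n in range(1, N + 1):
        S = weighted_power_sum(pairs, n)
        if any(e < 0 for row in S for e in row):
            last_bad = n
    return last_bad < N

F = Fraction
pairs = [(F(1), [[F(2), F(0)], [F(0), F(1)]]),
         (F(1), [[F(0), F(1)], [F(1), F(0)]])]
print(looks_eventually_nonnegative(pairs))  # the example sum stays non-negative
```

Of course, a `True` answer over a finite horizon proves nothing in general; the point of the paper is precisely that no naive bound on $N$ suffices.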

The *characteristic polynomial* of a matrix $A \in \mathbb{R}_{\mathbb{A}}^{k \times k}$ is given by $\det(A - \lambda I)$, where $I$ denotes the $k \times k$ identity matrix. Note that this is a degree-$k$ polynomial in $\lambda$. The roots of the characteristic polynomial are called the *eigenvalues* of $A$. A non-zero vector solution of the equation $A\mathbf{x} = \lambda_i \mathbf{x}$, where $\lambda_i$ is an eigenvalue of $A$, is called an *eigenvector* of $A$. Although $A \in \mathbb{R}_{\mathbb{A}}^{k \times k}$, in general it can have eigenvalues $\lambda \in \mathbb{C}$; these are all algebraic numbers. An eigenvector is said to be positive (resp. non-negative) if each component of the eigenvector is a positive (resp. non-negative) rational number. A matrix is called *simple* if all its eigenvalues are distinct. Further, a matrix $A$ is called *diagonalizable* if there exist an invertible matrix $S$ and a diagonal matrix $D$ such that $SDS^{-1} = A$.

The study of weighted sums of powers of matrices is intimately related to the study of *linear recurrence sequences (LRS)*, as we shall see. We now present some definitions and useful properties of LRS. For more details on LRS, the reader is referred to the work of Everest et al. [14]. A sequence of rational numbers $\langle u \rangle = \langle u_n \rangle_{n=0}^{\infty}$ is called an LRS of *order* $k$ ($> 0$) if the $n$th term of the sequence, for all $n \ge k$, can be expressed using the recurrence $u_n = a_{k-1}u_{n-1} + \ldots + a_1 u_{n-k+1} + a_0 u_{n-k}$. Here, $a_0\ (\ne 0), a_1, \ldots, a_{k-1} \in \mathbb{Q}$ are called the *coefficients* of the LRS, and $u_0, u_1, \ldots, u_{k-1} \in \mathbb{Q}$ are called the *initial values* of the LRS. Given the coefficients and initial values, an LRS is uniquely defined. However, the same LRS may be defined by multiple sets of coefficients and corresponding initial values. An LRS $\langle u \rangle$ is said to be *periodic* with period $\rho$ if it can be defined by the recurrence $u_n = u_{n-\rho}$ for all $n \ge \rho$. Given an LRS $\langle u \rangle$, its *characteristic polynomial* is $p_u(x) = x^k - \sum_{i=0}^{k-1} a_i x^i$. We can factorize the characteristic polynomial as $p_u(x) = \prod_{j=1}^{d}(x - \lambda_j)^{\rho_j}$, where each $\lambda_j$ is a root, called a *characteristic root*, of *algebraic multiplicity* $\rho_j$. An LRS is called *simple* if $\rho_j = 1$ for all $j$, i.e. all characteristic roots are distinct. Let $\{\lambda_1, \lambda_2, \ldots, \lambda_d\}$ be the distinct roots of $p_u(x)$ with multiplicities $\rho_1, \rho_2, \ldots, \rho_d$ respectively, so that $\sum_{j=1}^{d} \rho_j = k$. Then the $n$th term of the LRS can be expressed as $u_n = \sum_{j=1}^{d} q_j(n)\lambda_j^n$, where the $q_j(x) \in \mathbb{C}[x]$ are univariate polynomials of degree at most $\rho_j - 1$ with complex coefficients. This representation of an LRS is known as the *exponential polynomial solution* representation. It is well known that scaling an LRS by a constant gives another LRS, and the sum and product of two LRSes is also an LRS (Theorem 4.1 in [14]). Given an LRS $\langle u \rangle$ defined by $u_n = a_{k-1}u_{n-1} + \ldots + a_1 u_{n-k+1} + a_0 u_{n-k}$, we define its *companion matrix* $M_{\langle u \rangle}$ to be the $k \times k$ matrix shown in Fig. 1.

$$M_{\langle u\rangle} = \begin{bmatrix} a_{k-1} & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ a_2 & 0 & \cdots & 1 & 0 \\ a_1 & 0 & \cdots & 0 & 1 \\ a_0 & 0 & \cdots & 0 & 0 \end{bmatrix}$$

**Fig. 1.** Companion matrix

We omit the subscript for clarity of notation, and use $M$ for $M_{\langle u \rangle}$. Let $\mathbf{u} = (u_{k-1}, \ldots, u_0)$ be a row vector containing the $k$ initial values of the recurrence, and let $\mathbf{e_k} = (0, 0, \ldots, 1)^{\mathsf{T}}$ be a column vector of $k$ dimensions with the last element equal to $1$ and the rest set to $0$. It is easy to see that for all $n \ge 1$, $\mathbf{u}M^n\mathbf{e_k}$ gives $u_n$. Note that the eigenvalues of the matrix $M$ are exactly the roots of the characteristic polynomial of the LRS $\langle u \rangle$. For $\mathbf{u} = (u_{k-1}, \ldots, u_0)$, we call the matrix $G_{\langle u \rangle} = \begin{bmatrix} 0 & \mathbf{u} \\ \mathbf{0}^{\mathsf{T}} & M_{\langle u \rangle} \end{bmatrix}$ the *generator matrix* of the LRS $\langle u \rangle$, where $\mathbf{0}$ is a $k$-dimensional vector of all $0$s. We omit the subscript and use $G$ instead of $G_{\langle u \rangle}$ when the LRS $\langle u \rangle$ is clear from the context. It is easy to show from the above that $u_n = G^{n+1}[1, k+1]$ for all $n \ge 0$.
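The two identities above can be checked mechanically with exact rational arithmetic. The Fibonacci recurrence used as the running LRS below is an illustrative assumption.

```python
# Check of the companion-matrix facts: u M^n e_k = u_n and u_n = G^{n+1}[1, k+1].
from fractions import Fraction as F

def mat_mul(X, Y):
    return [[sum(X[i][l] * Y[l][j] for l in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def mat_pow(M, n):
    P = M
    for _ in range(n - 1):
        P = mat_mul(P, M)
    return P

# Fibonacci: u_n = u_{n-1} + u_{n-2}, so a_1 = a_0 = 1, u_0 = 0, u_1 = 1, k = 2
k = 2
M = [[F(1), F(1)],      # row 1 of Fig. 1: (a_{k-1}, 1)
     [F(1), F(0)]]      # row k of Fig. 1: (a_0, 0)
u_row = [[F(1), F(0)]]  # u = (u_{k-1}, ..., u_0)
G = [[F(0)] + u_row[0]] + [[F(0)] + row for row in M]  # generator matrix

fib = [F(0), F(1), F(1), F(2), F(3), F(5), F(8), F(13)]
# u M^n e_k picks the last component of the row vector u M^n
assert all(mat_mul(u_row, mat_pow(M, n))[0][k - 1] == fib[n] for n in range(1, 8))
assert all(mat_pow(G, n + 1)[0][k] == fib[n] for n in range(0, 8))
print("u M^n e_k = u_n and G^{n+1}[1, k+1] = u_n hold for the Fibonacci LRS")
```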

We say that an LRS $\langle u \rangle$ is *ultimately non-negative* (resp. *ultimately positive*) iff there exists $N > 0$ such that $u_n \ge 0$ (resp. $u_n > 0$) for all $n \ge N$.$^1$ The problem of determining whether a given LRS is ultimately non-negative (resp. ultimately positive) is called the *Ultimate Non-negativity* (resp. *Ultimate Positivity*) problem for LRS. We use UNNLRS (resp. UPLRS) to refer to this problem. It is known [19] that UNNLRS and UPLRS are polynomially inter-reducible, and these problems have been widely studied in the literature (e.g., [27,31,32]). A closely related problem is the *Skolem problem*, wherein we are given an LRS $\langle u \rangle$ and are required to determine if there exists $n \ge 0$ such that $u_n = 0$. The relation between the Skolem problem and UNNLRS (resp. UPLRS) has been extensively studied in the literature (e.g., [18,19,33]).

### **3 Hardness of Eventual Non-negativity and Positivity**

In this section, we show that UNNLRS (resp. UPLRS) polynomially reduces to ENNSoM (resp. EPSoM) when $|\mathcal{A}| \ge 2$. Since UNNLRS and UPLRS are known to be coNP-hard (in fact, as hard as the decision problem for the universal theory of reals; see Theorem 5.3 of [31]), we conclude that ENNSoM and EPSoM are also coNP-hard and at least as hard as the decision problem for the universal theory of reals when $|\mathcal{A}| \ge 2$. Thus, unless P = NP, there is no hope of finding polynomial-time solutions to these problems.

**Theorem 1.** UNNLRS *reduces to* ENNSoM *with* |A| ≥ 2 *in polynomial time.*

*Proof.* Given an LRS $\langle u \rangle$ of order $k$ defined by the recurrence $u_n = a_{k-1}u_{n-1} + \ldots + a_1 u_{n-k+1} + a_0 u_{n-k}$ and initial values $u_0, u_1, \ldots, u_{k-1}$, we construct two matrices $A_1$ and $A_2$ such that $\langle u \rangle$ is ultimately non-negative iff $(A_1^n + A_2^n)$ is eventually non-negative. Consider $A_1 = \begin{bmatrix} 0 & \mathbf{u} \\ \mathbf{0}^{\mathsf{T}} & M \end{bmatrix}$, the generator matrix of $\langle u \rangle$, and $A_2 = \begin{bmatrix} 0 & \mathbf{0} \\ \mathbf{0}^{\mathsf{T}} & P \end{bmatrix}$, where $P \in \mathbb{Q}^{k \times k}$ is constructed such that $P[i,j] \ge |M[i,j]|$. For example, $P$ can be constructed as $P[i,j] = M[i,j]$ for all $j \in [2,k]$ and $i \in [1,k]$, and $P[i,j] = \max(|a_0|, |a_1|, \ldots, |a_{k-1}|) + 1$ for $j = 1$. Now consider the sequence of matrices defined by $A_1^n + A_2^n$, for all $n \ge 1$. By properties of the generator matrix, it is easily verified that $A_1^n = \begin{bmatrix} 0 & \mathbf{u}M^{n-1} \\ \mathbf{0}^{\mathsf{T}} & M^n \end{bmatrix}$. Similarly, we get $A_2^n = \begin{bmatrix} 0 & \mathbf{0} \\ \mathbf{0}^{\mathsf{T}} & P^n \end{bmatrix}$. Therefore, $A_1^n + A_2^n = \begin{bmatrix} 0 & \mathbf{u}M^{n-1} \\ \mathbf{0}^{\mathsf{T}} & P^n + M^n \end{bmatrix}$, for all $n \ge 1$. Now, we can observe that $P^n + M^n$ is always non-negative: since $P[i,j] \ge |M[i,j]| \ge 0$ for all $i,j \in \{1, \ldots, k\}$, a simple induction on $n$ gives $P^n[i,j] \ge |M^n[i,j]|$, and hence $P^n[i,j] + M^n[i,j] \ge 0$ for all $i,j \in \{1, \ldots, k\}$ and $n \ge 1$. Thus we conclude that $A(n) = A_1^n + A_2^n \ge 0$ for all sufficiently large $n$ iff $\langle u \rangle$ is ultimately non-negative, since the elements $A(n)[1,2], \ldots, A(n)[1,k+1]$ consist of $(u_{n+k-2}, \ldots, u_n, u_{n-1})$ and the remaining elements are non-negative. $\square$

$^1$ *Ultimately non-negative* (resp. *ultimately positive*) LRS, as defined by us, have also been called *ultimately positive* (resp. *strictly positive*) LRS elsewhere in the literature [31]. However, we choose to use terminology that is consistent across matrices and LRS, to avoid notational confusion.
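The reduction can be exercised end to end on a small example. The LRS $u_n = 2u_{n-1} - u_{n-2}$ with $u_0 = -5$, $u_1 = -4$ (so $u_n = n - 5$, ultimately non-negative) is an illustrative assumption; the matrices $A_1$, $A_2$ and $P$ are built exactly as in the proof.

```python
# Sketch of the Theorem 1 reduction: the first row of A1^n + A2^n carries the
# LRS terms, and the bottom-right block P^n + M^n is always non-negative.
from fractions import Fraction as F

def mat_mul(X, Y):
    return [[sum(X[i][l] * Y[l][j] for l in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def mat_pow(Mt, n):
    P = Mt
    for _ in range(n - 1):
        P = mat_mul(P, Mt)
    return P

a0, a1 = F(-1), F(2)            # coefficients: u_n = 2 u_{n-1} - u_{n-2}
u0, u1 = F(-5), F(-4)           # initial values, giving u_n = n - 5
M = [[a1, F(1)], [a0, F(0)]]    # companion matrix (Fig. 1)
c = max(abs(a0), abs(a1)) + 1   # first column of P; P agrees with M elsewhere
P = [[c, F(1)], [c, F(0)]]

A1 = [[F(0), u1, u0], [F(0)] + M[0], [F(0)] + M[1]]  # generator matrix of <u>
A2 = [[F(0), F(0), F(0)], [F(0)] + P[0], [F(0)] + P[1]]

def power_sum(n):
    return [[x + y for x, y in zip(r1, r2)]
            for r1, r2 in zip(mat_pow(A1, n), mat_pow(A2, n))]

# First row of A1^n + A2^n is (0, u_n, u_{n-1}); the rest is P^n + M^n >= 0.
for n in range(1, 20):
    S = power_sum(n)
    assert S[0][1] == F(n - 5) and S[0][2] == F(n - 6)
    assert all(e >= 0 for row in S[1:] for e in row)
print("negative entries occur exactly while u_n < 0, so the matrix sum is "
      "eventually non-negative iff <u> is ultimately non-negative")
```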

Observe that the same reduction technique works if we are required to use more than two matrices in ENNSoM. Indeed, we can construct matrices $A_3, A_4, \ldots, A_m$ similar to the construction of $A_2$ in the reduction above, by choosing the $k \times k$ matrix in the bottom right (see the definition of $A_2$) to have positive values greater than the maximum absolute value of every element in the companion matrix.

A simple modification of the above proof, setting $A_2 = \begin{bmatrix} 1 & \mathbf{0} \\ \mathbf{1}^{\mathsf{T}} & P \end{bmatrix}$, where $\mathbf{1}$ denotes the $k$-dimensional vector of all $1$s, gives us the corresponding hardness result for EPSoM (see [3] for details).

**Theorem 2.** UPLRS *reduces to* EPSoM *with* $|\mathcal{A}| \ge 2$ *in polynomial time.*

We remark that for the reduction technique used in Theorems 1 and 2 to work, we need at least two (weight, matrix) pairs in $\mathcal{A}$. For an explanation of why this reduction does not work when $|\mathcal{A}| = 1$, we refer the reader to [3]. Having shown the hardness of ENNSoM and EPSoM when $|\mathcal{A}| \ge 2$, we now proceed to establish upper bounds on the computational complexity of these problems.

### **4 Upper Bounds on Eventual Non-negativity and Positivity**

In this section, we show that ENNSoM (resp. EPSoM) is polynomially reducible to UNNLRS (resp. UPLRS), regardless of |A|.

**Theorem 3.** ENNSoM *reduces to* UNNLRS *in polynomial time.*

The proof is in two parts. First, we show that for a single matrix $A$, we can construct a linear recurrence sequence $\langle a \rangle$ such that $A$ is eventually non-negative iff $\langle a \rangle$ is ultimately non-negative. Then, we show that starting from such a linear recurrence sequence for each matrix in $\mathcal{A}$, we can construct a new LRS, say $\langle a \rangle$, with the property that the weighted sum of powers of the matrices in $\mathcal{A}$ is eventually non-negative iff $\langle a \rangle$ is ultimately non-negative. Our proof makes crucial use of the following property of matrices.

**Lemma 1 (Adapted from Lemma 1.1 of [19]).** *Let* $A \in \mathbb{Q}^{k \times k}$ *be a rational matrix with characteristic polynomial* $p_A(\lambda) = \det(A - \lambda I)$*. Suppose we define the sequence* $\langle a^{i,j} \rangle$ *for every* $1 \le i, j \le k$ *as follows:* $a^{i,j}_n = A^{n+1}[i,j]$*, for all* $n \ge 0$*. Then* $\langle a^{i,j} \rangle$ *is an LRS of order* $k$ *with characteristic polynomial* $p_A(x)$ *and initial values given by* $a^{i,j}_0 = A^1[i,j], \ldots, a^{i,j}_{k-1} = A^k[i,j]$*.*

This follows from the Cayley-Hamilton theorem, and the reader is referred to [19] for further details. From Lemma 1, it is easy to see that the LRSes $\langle a^{i,j} \rangle$ for all $1 \le i, j \le k$ share the same order and characteristic polynomial (and hence the defining recurrence) and differ only in their initial values. For notational convenience, we say that the LRS $\langle a^{i,j} \rangle$ is *generated by* $A[i,j]$.
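Lemma 1 is easy to check computationally on a concrete matrix: every entry sequence of the successive powers obeys the recurrence given by the characteristic polynomial. The $2 \times 2$ matrix below is an illustrative assumption.

```python
# Check of Lemma 1: each entry sequence a^{i,j}_n = A^{n+1}[i,j] satisfies the
# recurrence of A's characteristic polynomial (Cayley-Hamilton). For a 2x2
# matrix, p_A(x) = x^2 - tr(A) x + det(A), so s_n = tr(A) s_{n-1} - det(A) s_{n-2}.
from fractions import Fraction as F

A = [[F(1), F(2)], [F(3), F(4)]]
tr = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]

def mat_mul(X, Y):
    return [[sum(X[i][l] * Y[l][j] for l in range(2)) for j in range(2)]
            for i in range(2)]

powers = [A]                             # powers[n] = A^{n+1}
for _ in range(10):
    powers.append(mat_mul(powers[-1], A))

for i in range(2):
    for j in range(2):
        seq = [P[i][j] for P in powers]  # the LRS <a^{i,j}>
        assert all(seq[n] == tr * seq[n - 1] - det * seq[n - 2]
                   for n in range(2, len(seq)))
print("all four entry sequences obey s_n = tr(A) s_{n-1} - det(A) s_{n-2}")
```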

**Proposition 1.** *A matrix* $A \in \mathbb{Q}^{k \times k}$ *is eventually non-negative iff all the LRSes* $\langle a^{i,j} \rangle$ *generated by* $A[i,j]$*,* $1 \le i, j \le k$*, are ultimately non-negative.*

The proof follows from the definition of eventually non-negative matrices and the definition of $\langle a^{i,j} \rangle$. Next, we define the notion of interleaving of LRSes.

**Definition 2.** *Consider a set* $S = \{\langle u^i \rangle : 0 \le i < t\}$ *of* $t$ *LRSes, each having order* $k$ *and the same characteristic polynomial. An LRS* $\langle v \rangle$ *is said to be the LRS-interleaving of* $S$ *iff* $v_{tn+s} = u^s_n$ *for all* $n \in \mathbb{N}$ *and* $0 \le s < t$*.*

Observe that the order of $\langle v \rangle$ is $tk$ and its initial values are given by interleaving the $k$ initial values of each LRS $\langle u^i \rangle$. Formally, the initial values are $v_{tj+i} = u^i_j$ for $0 \le i < t$ and $0 \le j < k$. The characteristic polynomial $p_v(x)$ is equal to $p_{u^i}(x^t)$.
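A small sketch of this interleaving: two LRSes sharing the recurrence $u_n = u_{n-1} + u_{n-2}$ (Fibonacci and Lucas, an illustrative choice) are merged, and the merged sequence indeed satisfies the recurrence with characteristic polynomial $p_u(x^2)$.

```python
# LRS-interleaving per Definition 2, with t = 2 and k = 2.
def lrs(coeffs, init, n_terms):
    """coeffs = (a_0, ..., a_{k-1}) of u_n = a_{k-1}u_{n-1} + ... + a_0 u_{n-k}."""
    u, k = list(init), len(init)
    while len(u) < n_terms:
        u.append(sum(c * x for c, x in zip(coeffs, u[-k:])))
    return u

u0 = lrs((1, 1), (0, 1), 12)   # Fibonacci: 0, 1, 1, 2, 3, 5, ...
u1 = lrs((1, 1), (2, 1), 12)   # Lucas:     2, 1, 3, 4, 7, 11, ...

# interleave: v_{2n} = u^0_n, v_{2n+1} = u^1_n
v = [seq[n] for n in range(12) for seq in (u0, u1)]

# p_u(x) = x^2 - x - 1, so p_v(x) = x^4 - x^2 - 1, i.e. v_n = v_{n-2} + v_{n-4}
assert all(v[n] == v[n - 2] + v[n - 4] for n in range(4, len(v)))
print("interleaving satisfies the recurrence of p_u(x^2)")
```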

**Proposition 2.** *The LRS-interleaving* $\langle v \rangle$ *of a set of LRSes* $S = \{\langle u^i \rangle : 0 \le i < t\}$ *is ultimately non-negative iff each LRS* $\langle u^i \rangle$ *in* $S$ *is ultimately non-negative.*

Now, from the definitions of the LRSes $\langle a^{i,j} \rangle$, $\langle u^i \rangle$ and $\langle v \rangle$, and from Propositions 1 and 2, we obtain the following crucial lemma.

**Lemma 2.** *Given a matrix* $A \in \mathbb{Q}^{k \times k}$*, let* $S = \{\langle u^i \rangle \mid u^i_n = a^{p,q}_n$, *where* $p = \lfloor i/k \rfloor + 1$, $q = (i \bmod k) + 1$, $0 \le i < k^2\}$ *be the set of* $k^2$ *LRSes mentioned in Lemma 1. The LRS* $\langle v \rangle$ *generated by LRS-interleaving of* $S$ *satisfies the following:*

1. $v_{rk^2 + sk + t} = A^{r+1}[s+1, t+1]$ *for all* $r \in \mathbb{N}$ *and* $0 \le s, t < k$*;*
2. $\langle v \rangle$ *is ultimately non-negative iff* $A$ *is eventually non-negative.*
We lift this argument from a single matrix to a weighted sum of matrices.

**Lemma 3.** *Given* $\mathcal{A} = \{(w_1, A_1), \ldots, (w_m, A_m)\}$*, there exists a linear recurrence sequence* $\langle a \rangle$ *such that* $\sum_{i=1}^{m} w_i A_i^n$ *is eventually non-negative iff* $\langle a \rangle$ *is ultimately non-negative.*

*Proof.* For each matrix $A_i$ in $\mathcal{A}$, let $\langle v^i \rangle$ be the interleaved LRS as constructed in Lemma 2. Let $\langle w_i v^i \rangle$ denote the scaled LRS whose $n$th entry is $w_i v^i_n$ for all $n \ge 0$. The LRS $\langle a \rangle$ is obtained by adding the scaled LRSes $\langle w_1 v^1 \rangle, \langle w_2 v^2 \rangle, \ldots, \langle w_m v^m \rangle$. Clearly, $a_n$ is non-negative iff $\sum_{i=1}^{m} w_i v^i_n$ is non-negative. From the definition of $\langle v^i \rangle$ (see Lemma 2), we also know that for all $n \ge 0$, $v^i_n = A_i^{r+1}[s+1, t+1]$, where $r = \lfloor n/k^2 \rfloor$, $s = \lfloor (n \bmod k^2)/k \rfloor$ and $t = n \bmod k$. Therefore, $a_n$ is non-negative iff $\sum_{i=1}^{m} w_i A_i^{r+1}[s+1, t+1]$ is non-negative. It follows that $\langle a \rangle$ is ultimately non-negative iff $\sum_{i=1}^{m} w_i A_i^n$ is eventually non-negative. $\square$
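The construction in the proof can be sketched for two $2 \times 2$ matrices. The matrices and weights below are an illustrative assumption; for them, $p_{A_1}(x) = (x-1)^2$ and $p_{A_2}(x) = (x-1)(x-2)$, so the summed interleaving satisfies the order-16 recurrence given by $p_{A_1}(x^4) \cdot p_{A_2}(x^4) = x^{16} - 5x^{12} + 9x^8 - 7x^4 + 2$.

```python
# Sketch of Lemma 3: interleave the entries of each A_i^{r+1} (Lemma 2), scale
# by w_i, sum, and confirm the result is a single linear recurrence sequence.
from fractions import Fraction as F

def mat_mul(X, Y):
    return [[sum(X[i][l] * Y[l][j] for l in range(2)) for j in range(2)]
            for i in range(2)]

def mat_pow(M, m):
    P = M
    for _ in range(m - 1):
        P = mat_mul(P, M)
    return P

k = 2
pairs = [(F(3), [[F(1), F(1)], [F(0), F(1)]]),   # char poly (x-1)^2
         (F(-1), [[F(2), F(0)], [F(1), F(1)]])]  # char poly (x-1)(x-2)

def a(n):
    """n-th term of the summed interleaving: sum_i w_i * A_i^{r+1}[s+1, t+1]."""
    r, s, t = n // (k * k), (n % (k * k)) // k, n % k
    return sum(w * mat_pow(Ai, r + 1)[s][t] for w, Ai in pairs)

seq = [a(n) for n in range(40)]
# p_{A1}(x^4) * p_{A2}(x^4) = x^16 - 5x^12 + 9x^8 - 7x^4 + 2
assert all(seq[n] == 5 * seq[n - 4] - 9 * seq[n - 8]
           + 7 * seq[n - 12] - 2 * seq[n - 16] for n in range(16, 40))
print("the summed interleaving satisfies one linear recurrence, as in Lemma 3")
```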

From Lemma 3, we can conclude the main result of this section, i.e., the proof of Theorem 3. The following corollary can be shown *mutatis mutandis*.

**Corollary 1.** EPSoM *reduces to* UPLRS *in polynomial time.*

We note that it is also possible to reason about the eventual non-negativity (positivity) of only certain indices of the matrix using an argument similar to the one above. By interleaving only the LRSes corresponding to certain indices of the matrices in $\mathcal{A}$, we can show this problem's equivalence with UNNLRS (UPLRS).

### **5 Decision Procedures for Special Cases**

Since there are no known algorithms for solving UNNLRS in general, the results of the previous section present a bleak picture for deciding ENNSoM and EPSoM. We now show that these problems can be solved in some important special cases.

#### **5.1 Simple Matrices and Matrices with Real Algebraic Eigenvalues**

Our first positive result follows from known results for special classes of LRSes.

**Theorem 4.** ENNSoM *and* EPSoM *are decidable for* $\mathcal{A} = \{(w_1, A_1), \ldots, (w_m, A_m)\}$ *if one of the following conditions holds for all* $i \in \{1, \ldots, m\}$*:*

1. $A_i$ *is a simple matrix;*
2. *all eigenvalues of* $A_i$ *are roots of real algebraic numbers.*
*Proof.* Suppose each $A_i \in \mathbb{Q}^{k \times k}$, and let $\lambda_{i,1}, \ldots, \lambda_{i,k}$ be the (possibly repeated) eigenvalues of $A_i$. The characteristic polynomial of $A_i$ is $p_{A_i}(x) = \prod_{j=1}^{k}(x - \lambda_{i,j})$. Denote by $\langle a^i \rangle$ the LRS obtained from $A_i$ by LRS-interleaving as in Lemma 2. By Lemma 2, we have (i) $a^i_{rk^2 + sk + t} = A_i^{r+1}[s+1, t+1]$ for all $r \in \mathbb{N}$ and $0 \le s, t < k$, and (ii) $p_{a^i}(x) = \prod_{j=1}^{k}(x^{k^2} - \lambda_{i,j})$. We now define the scaled LRS $\langle b^i \rangle$, where $b^i_n = w_i a^i_n$ for all $n \in \mathbb{N}$. Since scaling does not change the characteristic polynomial of an LRS (see [3] for a simple proof), we have $p_{b^i}(x) = \prod_{j=1}^{k}(x^{k^2} - \lambda_{i,j})$. Once the LRSes $\langle b^1 \rangle, \ldots, \langle b^m \rangle$ are obtained as above, we sum them to obtain the LRS $\langle b \rangle$. Thus, for all $n \in \mathbb{N}$, we have $b_n = \sum_{i=1}^{m} b^i_n = \sum_{i=1}^{m} w_i a^i_n = \sum_{i=1}^{m} w_i A_i^{r+1}[s+1, t+1]$, where $n = rk^2 + sk + t$, $r \in \mathbb{N}$ and $0 \le s, t < k$. Hence, ENNSoM (resp. EPSoM) for $\{(w_1, A_1), \ldots, (w_m, A_m)\}$ polynomially reduces to UNNLRS (resp. UPLRS) for $\langle b \rangle$.

By [14], we know that the characteristic polynomial $p_b(x)$ is the LCM of the characteristic polynomials $p_{b^i}(x)$ for $1 \le i \le m$. If $A_i$ is simple, there are no repeated roots of $p_{b^i}(x)$. If this holds for all $i \in \{1, \ldots, m\}$, there are no repeated roots of the LCM of $p_{b^1}(x), \ldots, p_{b^m}(x)$ either. Hence, $p_b(x)$ has no repeated roots. Similarly, if all eigenvalues of $A_i$ are roots of real algebraic numbers, so are all roots of $p_{b^i}(x)$. It follows that all roots of the LCM of $p_{b^1}(x), \ldots, p_{b^m}(x)$, i.e. $p_b(x)$, are also roots of real algebraic numbers.

The theorem now follows from the following two known results about LRS:

1. UNNLRS *and* UPLRS *are decidable for simple LRS* [31]*;*
2. UNNLRS *and* UPLRS *are decidable for LRS all of whose characteristic roots are roots of real algebraic numbers.*
*Remark:* The technique used in [31] to decide UNNLRS (resp. UPLRS) for simple rational LRS also works for simple LRS with real algebraic coefficients and initial values. This allows us to generalize Theorem 4(1) to the case where all $A_i$'s are real algebraic matrices and all $w_i$'s are real algebraic weights.

#### **5.2 Diagonalizable Matrices**

We now ask whether ENNSoM and EPSoM can be decided if each matrix $A_i$ is diagonalizable. Since diagonalizable matrices strictly generalize simple matrices, Theorem 4(1) cannot answer this question directly, unless one looks under the hood of the (highly non-trivial) proof of decidability of ultimate non-negativity/positivity of simple LRSes. The main contribution of this section is a reduction that allows us to decide ENNSoM and EPSoM for diagonalizable matrices using a black-box decision procedure (i.e. without knowing operational details of the procedure or its proof of correctness) for the corresponding problem for simple real algebraic matrices.

Before we proceed further, let us consider an example of a non-simple matrix (i.e. one with repeated eigenvalues) that is diagonalizable.

$$A = \begin{bmatrix} 5 & 12 & -6 \\ -3 & -10 & 6 \\ -3 & -12 & 8 \end{bmatrix}$$

**Fig. 2.** Diagonalizable matrix

Specifically, the matrix $A$ in Fig. 2 has eigenvalues $2$, $2$ and $-1$, and can be written as $SDS^{-1}$, where $D$ is the $3 \times 3$ diagonal matrix with $D[1,1] = D[2,2] = 2$ and $D[3,3] = -1$, and $S$ is the $3 \times 3$ matrix with columns $(-4, 1, 0)^{\mathsf{T}}$, $(2, 0, 1)^{\mathsf{T}}$ and $(-1, 1, 1)^{\mathsf{T}}$.
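The decomposition described above can be verified numerically: building $A = SDS^{-1}$ from the stated $S$ and $D$ recovers the repeated eigenvalue $2$ (so $A$ is not simple), while each listed column of $S$ is indeed an eigenvector.

```python
# Verification of the Fig. 2 example: A = S D S^{-1} is diagonalizable by
# construction but non-simple (eigenvalue 2 has multiplicity 2).
import numpy as np

S = np.array([[-4.0, 2.0, -1.0],
              [ 1.0, 0.0,  1.0],
              [ 0.0, 1.0,  1.0]])
D = np.diag([2.0, 2.0, -1.0])
A = S @ D @ np.linalg.inv(S)

assert np.allclose(sorted(np.linalg.eigvals(A).real), [-1.0, 2.0, 2.0])
for col, lam in zip(S.T, [2.0, 2.0, -1.0]):
    assert np.allclose(A @ col, lam * col)   # each column of S is an eigenvector
print(np.round(A).astype(int))
```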

Interestingly, the reduction technique we develop applies to properties much more general than ENNSoM and EPSoM. Formally, given a sequence of matrices $B_n$ defined by $B_n = \sum_{i=1}^{m} w_i A_i^n$, we say that a property $P$ of the sequence is *positive scaling invariant* if it stays unchanged even if we scale all $A_i$'s by the same positive real. Examples of such properties include ENNSoM, EPSoM, non-negativity and positivity of $B_n$ (i.e. is $B_n[i,j] \ge 0$ or $> 0$, as the case may be, for all $n \ge 1$ and all $1 \le i, j \le k$), existence of zero (i.e. is $B_n$ equal to the all-$0$ matrix for some $n \ge 1$), existence of a zero element (i.e. is $B_n[i,j] = 0$ for some $n \ge 1$ and some $i, j \in \{1, \ldots, k\}$), and variants of the $r$-non-negativity (resp. $r$-positivity and $r$-zero) problem (i.e. do there exist at least/exactly/at most $r$ non-negative (resp. positive/zero) elements in $B_n$ for all $n \ge 1$, for a given $r \in [1, k]$). The main result of this section is a reduction for deciding such properties, formalized in the following theorem.

**Theorem 5.** *The decision problem for every positive scaling invariant property on rational diagonalizable matrices effectively reduces to the decision problem for the property on real algebraic simple matrices.*

While we defer the proof of this theorem to later in the section, an immediate consequence of Theorem 5 and Theorem 4(1) (read with the note at the end of Sect. 5.1) is the following result.

**Corollary 2.** ENNSoM *and* EPSoM *are decidable for* $\mathcal{A} = \{(w_1, A_1), \ldots, (w_m, A_m)\}$ *if all* $A_i$*'s are rational diagonalizable matrices and all* $w_i$*'s are rational.*

It is important to note that Theorem 5 yields a decision procedure for checking any positive scaling invariant property of diagonalizable matrices from a corresponding decision procedure for real algebraic simple matrices *without making any assumptions* about the inner workings of the latter decision procedure. Given *any* black-box decision procedure for checking *any* positive scaling invariant property for a set of weighted simple matrices, our reduction tells us how a corresponding decision procedure for checking the same property for a set of weighted diagonalizable matrices can be constructed. Interestingly, since diagonalizable matrices have an exponential-form solution with constant coefficients for the exponential terms, we can use an algorithm that exploits this specific property of the exponential form (like Ouaknine and Worrell's algorithm [31], originally proposed for checking ultimate positivity of simple LRS) to deal with diagonalizable matrices. However, our reduction technique is neither specific to this algorithm nor does it rely on any special property of the exponential form of the solution.

The proof of Theorem 5 crucially relies on the notion of perturbation of diagonalizable matrices, which we introduce first. Let $A$ be a $k \times k$ real diagonalizable matrix. Then, there exist an invertible $k \times k$ matrix $S$ and a diagonal $k \times k$ matrix $D$ such that $A = SDS^{-1}$, where $S$ and $D$ may have complex entries. It follows from basic linear algebra that for every $i \in \{1, \ldots, k\}$, $D[i,i]$ is an eigenvalue of $A$, and if $\alpha$ is an eigenvalue of $A$ with algebraic multiplicity $\rho$, then $\alpha$ appears exactly $\rho$ times along the diagonal of $D$. Furthermore, for every $i \in \{1, \ldots, k\}$, the $i$th column of $S$ (resp. $i$th row of $S^{-1}$) is an eigenvector of $A$ (resp. of $A^{\mathsf{T}}$) corresponding to the eigenvalue $D[i,i]$, and the columns of $S$ (resp. rows of $S^{-1}$) form a basis of the vector space $\mathbb{C}^k$. Let $\alpha_1, \ldots, \alpha_m$ be the eigenvalues of $A$ with algebraic multiplicities $\rho_1, \ldots, \rho_m$ respectively. W.l.o.g., we assume that $\rho_1 \ge \ldots \ge \rho_m$ and that the diagonal of $D$ is partitioned into segments as follows: the first $\rho_1$ entries along the diagonal are $\alpha_1$, the next $\rho_2$ entries are $\alpha_2$, and so on. We refer to these segments as the $\alpha_1$-segment, $\alpha_2$-segment, and so on, of the diagonal of $D$. Formally, if $\kappa_i$ denotes $\sum_{j=1}^{i-1} \rho_j$, the $\alpha_i$-segment of the diagonal of $D$ consists of the entries $D[\kappa_i + 1, \kappa_i + 1], \ldots, D[\kappa_i + \rho_i, \kappa_i + \rho_i]$, all of which are $\alpha_i$.

Since $A$ is a real matrix, its characteristic polynomial has all real coefficients, and for every eigenvalue $\alpha$ of $A$ (and hence of $A^{\mathsf{T}}$), its complex conjugate, denoted $\overline{\alpha}$, is also an eigenvalue of $A$ (and hence of $A^{\mathsf{T}}$) with the same algebraic multiplicity. This allows us to define a bijection $h_D$ from $\{1, \ldots, k\}$ to $\{1, \ldots, k\}$ as follows. If $D[i,i]$ is real, then $h_D(i) = i$. Otherwise, let $D[i,i] = \alpha \in \mathbb{C}$ and let $D[i,i]$ be the $l$th element in the $\alpha$-segment of the diagonal of $D$. Then $h_D(i) = j$, where $D[j,j]$ is the $l$th element in the $\overline{\alpha}$-segment of the diagonal of $D$. The matrix $A$ being real also implies that for every real eigenvalue $\alpha$ of $A$ (resp. of $A^{\mathsf{T}}$), there exists a basis of *real eigenvectors* of the corresponding eigenspace. Additionally, for every non-real eigenvalue $\alpha$ and for every set of eigenvectors of $A$ (resp. of $A^{\mathsf{T}}$) that forms a basis of the eigenspace corresponding to $\alpha$, the component-wise complex conjugates of these basis vectors are eigenvectors of $A$ (resp. of $A^{\mathsf{T}}$) and form a basis of the eigenspace corresponding to $\overline{\alpha}$.

Using the above notation, we choose the matrix $S^{-1}$ (and hence $S$) such that $A = SDS^{-1}$ as follows. Suppose $\alpha$ is an eigenvalue of $A$ (and hence of $A^{\mathsf{T}}$) with algebraic multiplicity $\rho$. Let $\{i+1, \ldots, i+\rho\}$ be the set of indices $j$ for which $D[j,j] = \alpha$. If $\alpha$ is real (resp. complex), the $(i+1)$th, ..., $(i+\rho)$th rows of $S^{-1}$ are chosen to be real (resp. complex) eigenvectors of $A^{\mathsf{T}}$ that form a basis of the eigenspace corresponding to $\alpha$. Moreover, if $\alpha$ is complex, the $h_D(i+s)$th row of $S^{-1}$ is chosen to be the component-wise complex conjugate of the $(i+s)$th row of $S^{-1}$, for all $s \in \{1, \ldots, \rho\}$.

**Definition 3.** *Let* $A = SDS^{-1}$ *be a* $k \times k$ *real diagonalizable matrix. We say that* $E = (\varepsilon_1, \ldots, \varepsilon_k) \in \mathbb{R}^k$ *is a* perturbation *w.r.t.* $D$ *if* $\varepsilon_i \ne 0$ *and* $\varepsilon_i = \varepsilon_{h_D(i)}$ *for all* $i \in \{1, \ldots, k\}$*. Further, the* $E$-perturbed variant of $A$ *is the matrix* $A' = SD'S^{-1}$*, where* $D'$ *is the* $k \times k$ *diagonal matrix with* $D'[i,i] = \varepsilon_i D[i,i]$ *for all* $i \in \{1, \ldots, k\}$*.*

In the following, we omit "w.r.t. $D$" and simply say "$E$ is a perturbation" when $D$ is clear from the context. Clearly, $A'$ as defined above is a diagonalizable matrix, and its eigenvalues are given by the diagonal elements of $D'$.

Recall that the diagonal of $D$ is partitioned into $\alpha_i$-segments, where each $\alpha_i$ is an eigenvalue of $A = SDS^{-1}$ with algebraic multiplicity $\rho_i$. We now use a similar idea to segment a perturbation $E$ w.r.t. $D$. Specifically, the first $\rho_1$ elements of $E$ constitute the $\alpha_1$-segment of $E$, the next $\rho_2$ elements of $E$ constitute the $\alpha_2$-segment of $E$, and so on.

**Definition 4.** *A perturbation* $E = (\varepsilon_1, \ldots, \varepsilon_k)$ *is said to be* segmented *if the* $j$th *element (whenever present) of every segment of* $E$ *has the same value, for all* $1 \le j \le \rho_1$*. Formally, if* $i = \sum_{s=1}^{l-1} \rho_s + j$ *and* $1 \le j \le \rho_l \le \rho_1$*, then* $\varepsilon_i = \varepsilon_j$*.*

Clearly, the first $\rho_1$ elements of a segmented perturbation $E$ define the whole of $E$. As an example, suppose $(\alpha_1, \alpha_1, \alpha_1, \alpha_2, \alpha_2, \overline{\alpha_2}, \overline{\alpha_2}, \alpha_3)$ is the diagonal of $D$, where $\alpha_1, \alpha_2, \overline{\alpha_2}$ and $\alpha_3$ are distinct eigenvalues of $A$. There are four segments of the diagonal of $D$ (and of $E$), of lengths 3, 2, 2 and 1 respectively.

Example segmented perturbations in this case are $(\varepsilon_1, \varepsilon_2, \varepsilon_3, \varepsilon_1, \varepsilon_2, \varepsilon_1, \varepsilon_2, \varepsilon_1)$ and $(\varepsilon_3, \varepsilon_1, \varepsilon_2, \varepsilon_3, \varepsilon_1, \varepsilon_3, \varepsilon_1, \varepsilon_3)$. If $\varepsilon_1 \neq \varepsilon_2$ or $\varepsilon_2 \neq \varepsilon_3$, a perturbation that is *not segmented* is $E = (\varepsilon_1, \varepsilon_2, \varepsilon_3, \varepsilon_2, \varepsilon_3, \varepsilon_2, \varepsilon_3, \varepsilon_1)$.

**Definition 5.** *Given a segmented perturbation $E = (\varepsilon_1, \ldots, \varepsilon_k)$ w.r.t. $D$, a* rotation *of $E$, denoted $\tau_D(E)$, is the segmented perturbation $E' = (\varepsilon'_1, \ldots, \varepsilon'_k)$ in which $\varepsilon'_{(i \bmod \rho_1)+1} = \varepsilon_i$ for $i \in \{1, \ldots, \rho_1\}$, and all other $\varepsilon'_i$s are as in Definition 4.*

Continuing with our example, if $E = (\varepsilon_1, \varepsilon_2, \varepsilon_3, \varepsilon_1, \varepsilon_2, \varepsilon_1, \varepsilon_2, \varepsilon_1)$, then $\tau_D(E) = (\varepsilon_3, \varepsilon_1, \varepsilon_2, \varepsilon_3, \varepsilon_1, \varepsilon_3, \varepsilon_1, \varepsilon_3)$, $\tau_D^2(E) = (\varepsilon_2, \varepsilon_3, \varepsilon_1, \varepsilon_2, \varepsilon_3, \varepsilon_2, \varepsilon_3, \varepsilon_2)$ and $\tau_D^3(E) = E$.
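The segment-and-rotate bookkeeping can be made concrete in a few lines of Python; the helper names below (`segmented`, `rotate`) are our own, and the segment lengths match the running example:

```python
# A small sketch (not from the paper's artifact) of building a segmented
# perturbation from its first segment and rotating it, for segment lengths
# rho = [3, 2, 2, 1] as in the running example.
def segmented(first_segment, rho):
    # Repeat the first rho_l entries of the leading segment for each segment l.
    return [eps for r in rho for eps in first_segment[:r]]

def rotate(first_segment):
    # tau_D sends eps_i to position (i mod rho_1) + 1, i.e. a cyclic shift.
    return first_segment[-1:] + first_segment[:-1]

rho = [3, 2, 2, 1]
seg = [1, 2, 3]                      # eps_1, eps_2, eps_3
E = segmented(seg, rho)
assert E == [1, 2, 3, 1, 2, 1, 2, 1]
assert segmented(rotate(seg), rho) == [3, 1, 2, 3, 1, 3, 1, 3]
# rho_1 = 3 rotations bring E back to itself.
assert rotate(rotate(rotate(seg))) == seg
```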

**Lemma 4.** *Let $A = SDS^{-1}$ be a $k \times k$ real diagonalizable matrix with eigenvalues $\alpha_i$ of algebraic multiplicity $\rho_i$. Let $E = (\varepsilon_1, \ldots, \varepsilon_k)$ be a segmented perturbation w.r.t. $D$ such that all $\varepsilon_j$s have the same sign, and let $A_u$ denote the $\tau_D^u(E)$-perturbed variant of $A$ for $0 \le u < \rho_1$, where $\tau_D^0(E) = E$. Then $A^n = \frac{1}{\sum_{j=1}^{\rho_1} \varepsilon_j^n} \sum_{u=0}^{\rho_1-1} A_u^n$, for all $n \ge 1$.*

*Proof.* Let $E_u$ denote $\tau_D^u(E)$ for $0 \le u < \rho_1$, and let $E_u[i]$ denote the $i$th element of $E_u$ for $1 \le i \le k$. It follows from Definitions 4 and 5 that for each $i, j \in \{1, \ldots, \rho_1\}$, there is a unique $u \in \{0, \ldots, \rho_1 - 1\}$ such that $E_u[i] = \varepsilon_j$. Specifically, $u = i - j$ if $i \ge j$, and $u = (\rho_1 - j) + i$ if $i < j$. Furthermore, Definition 4 ensures that the above property holds not only for $i \in \{1, \ldots, \rho_1\}$, but for all $i \in \{1, \ldots, k\}$.

Let $D_u$ denote the diagonal matrix with $D_u[i,i] = E_u[i]\, D[i,i]$ for $0 \le u < \rho_1$. Then $D_u^n$ is the diagonal matrix with $D_u^n[i,i] = (E_u[i]\, D[i,i])^n$ for all $n \ge 1$. It follows from the definition of $A_u$ that $A_u^n = S\, D_u^n\, S^{-1}$ for $0 \le u < \rho_1$ and $n \ge 1$. Therefore, $\sum_{u=0}^{\rho_1-1} A_u^n = S \left( \sum_{u=0}^{\rho_1-1} D_u^n \right) S^{-1}$. Now, $\sum_{u=0}^{\rho_1-1} D_u^n$ is a diagonal matrix whose $i$th element along the diagonal is $\sum_{u=0}^{\rho_1-1} (E_u[i]\, D[i,i])^n = \left( \sum_{u=0}^{\rho_1-1} E_u^n[i] \right) D^n[i,i]$. By virtue of the property mentioned in the previous paragraph, $\sum_{u=0}^{\rho_1-1} E_u^n[i] = \sum_{j=1}^{\rho_1} \varepsilon_j^n$ for $1 \le i \le k$. Therefore, $\sum_{u=0}^{\rho_1-1} D_u^n = \left( \sum_{j=1}^{\rho_1} \varepsilon_j^n \right) D^n$, and hence $\sum_{u=0}^{\rho_1-1} A_u^n = \left( \sum_{j=1}^{\rho_1} \varepsilon_j^n \right) S\, D^n\, S^{-1} = \left( \sum_{j=1}^{\rho_1} \varepsilon_j^n \right) A^n$. Since all $\varepsilon_j$s have the same sign and are non-zero, $\sum_{j=1}^{\rho_1} \varepsilon_j^n$ is non-zero for all $n \ge 1$. It follows that $A^n = \frac{1}{\sum_{j=1}^{\rho_1} \varepsilon_j^n} \sum_{u=0}^{\rho_1-1} A_u^n$. $\square$
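Lemma 4 can be checked numerically on a small instance. The sketch below uses made-up data (a $3 \times 3$ matrix whose eigenvalue $\alpha_1 = 2$ has multiplicity $\rho_1 = 2$) and is our own illustration, not the paper's artifact:

```python
import numpy as np

# Numerical sanity check of Lemma 4 on made-up data: for segmented E with
# first segment (e1, e2), e_j > 0, we expect
#   A^n = (1 / (e1^n + e2^n)) * (A_0^n + A_1^n).
S = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])          # invertible (det = 2)
D = np.diag([2.0, 2.0, 5.0])             # alpha_1 = 2 (rho_1 = 2), alpha_2 = 5
A = S @ D @ np.linalg.inv(S)

e = [0.5, 3.0]                           # first segment of E; same sign, non-zero
E0 = np.array([e[0], e[1], e[0]])        # the segmented perturbation E
E1 = np.array([e[1], e[0], e[1]])        # its rotation tau_D(E)
A0 = S @ np.diag(E0 * np.diag(D)) @ np.linalg.inv(S)
A1 = S @ np.diag(E1 * np.diag(D)) @ np.linalg.inv(S)

for n in range(1, 5):
    lhs = np.linalg.matrix_power(A, n)
    rhs = (np.linalg.matrix_power(A0, n) + np.linalg.matrix_power(A1, n)) \
          / (e[0] ** n + e[1] ** n)
    assert np.allclose(lhs, rhs)
```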

We are now in a position to present the proof of the main result of this section, Theorem 5. Our proof uses a variation of the idea used in the proof of Lemma 4 above.

*Proof of Theorem 5.* Consider a set $\{(w_1, A_1), \ldots, (w_m, A_m)\}$ of (weight, matrix) pairs, where each matrix $A_i$ is in $\mathbb{Q}^{k \times k}$ and each $w_i \in \mathbb{Q}$. Suppose further that each $A_i = S_i D_i S_i^{-1}$, where $D_i$ is a diagonal matrix with segments along the diagonal arranged in descending order of the algebraic multiplicities of the corresponding eigenvalues. Let $\nu_i$ be the number of distinct eigenvalues of $A_i$, and let these eigenvalues be $\alpha_{i,1}, \ldots, \alpha_{i,\nu_i}$. Let $\mu_i$ be the largest algebraic multiplicity among those of all eigenvalues of $A_i$, and let $\mu = \mathrm{lcm}(\mu_1, \ldots, \mu_m)$. We now choose *positive* rationals $\varepsilon_1, \ldots, \varepsilon_\mu$ such that (i) all $\varepsilon_j$s are distinct, and (ii) for every $i \in \{1, \ldots, m\}$, for every distinct $j, l \in \{1, \ldots, \nu_i\}$ and for every distinct $p, q \in \{1, \ldots, \mu\}$, we have $\frac{\varepsilon_p}{\varepsilon_q} \neq \left| \frac{\alpha_{i,j}}{\alpha_{i,l}} \right|$. Since $\mathbb{Q}$ is a dense set, such a choice of $\varepsilon_1, \ldots, \varepsilon_\mu$ can always be made once all $\left| \frac{\alpha_{i,j}}{\alpha_{i,l}} \right|$s are known, even if within finite precision bounds.

For $1 \le i \le m$, let $\eta_i$ denote $\mu/\mu_i$. We now define $\eta_i$ distinct segmented perturbations w.r.t. $D_i$, denoted $E_{i,1}, \ldots, E_{i,\eta_i}$, as follows. For $1 \le j \le \eta_i$, the first $\mu_i$ elements (i.e., the first segment) of $E_{i,j}$ are $\varepsilon_{(j-1)\mu_i+1}, \ldots, \varepsilon_{j\mu_i}$ (as chosen in the previous paragraph), and all other elements of $E_{i,j}$ are defined as in Definition 4. For each $E_{i,j}$ thus obtained, we also consider its rotations $\tau_{D_i}^u(E_{i,j})$ for $0 \le u < \mu_i$. For $1 \le j \le \eta_i$ and $0 \le u < \mu_i$, let $A_{i,j,u} = S_i\, D_{i,j,u}\, S_i^{-1}$ denote the $\tau_{D_i}^u(E_{i,j})$-perturbed variant of $A_i$. It follows from Definition 3 that if we consider the set of diagonal matrices $\{D_{i,j,u} \mid 1 \le j \le \eta_i,\ 0 \le u < \mu_i\}$, then for every $p \in \{1, \ldots, k\}$ and every $q \in \{1, \ldots, \mu\}$, there are a unique $u$ and $j$ such that the perturbation factor applied at position $p$ in $D_{i,j,u}$ is $\varepsilon_q$, i.e., $D_{i,j,u}[p,p] = \varepsilon_q\, D_i[p,p]$. Specifically, $j = \lceil q/\mu_i \rceil$. To find $u$, let the $p$th element of $E_{i,j}$ be the $p'$th element in its segment, where $1 \le p' \le \mu_i$, and let $q'$ be $q \bmod \mu_i$. Then $u = p' - q'$ if $p' \ge q'$, and $u = (\mu_i - q') + p'$ otherwise.
By our choice of the $\varepsilon_t$s, we also know that for all $i \in \{1, \ldots, m\}$, all $j, l \in \{1, \ldots, \nu_i\}$ and all $p, q \in \{1, \ldots, \mu\}$, we have $\varepsilon_p \alpha_{i,l} \neq \varepsilon_q \alpha_{i,j}$ unless $p = q$ *and* $j = l$. This ensures that all $D_{i,j,u}$ matrices, and hence all $A_{i,j,u}$ matrices, are simple, i.e., have distinct eigenvalues.

Using the reasoning in Lemma 4, we can now show that $A_i^n = \frac{1}{\sum_{j=1}^{\mu} \varepsilon_j^n} \times \sum_{j=1}^{\eta_i} \sum_{u=0}^{\mu_i-1} A_{i,j,u}^n$, and so $\sum_{i=1}^{m} w_i A_i^n = \frac{1}{\sum_{j=1}^{\mu} \varepsilon_j^n} \times \sum_{i=1}^{m} \sum_{j=1}^{\eta_i} \sum_{u=0}^{\mu_i-1} w_i A_{i,j,u}^n$. Since all $\varepsilon_j$s are positive, $\sum_{j=1}^{\mu} \varepsilon_j^n$ is a positive real for all $n \ge 1$.

Hence, for each $p, q \in \{1, \ldots, k\}$, $\sum_{i=1}^{m} w_i A_i^n[p,q]$ is $> 0$, $< 0$ or $= 0$ if and only if $\sum_{i=1}^{m} \sum_{j=1}^{\eta_i} \sum_{u=0}^{\mu_i-1} w_i A_{i,j,u}^n[p,q]$ is $> 0$, $< 0$ or $= 0$, respectively. The only remaining helper result needed to complete the proof of the theorem is that each $A_{i,j,u}$ is a real algebraic matrix. This is shown in Lemma 5, presented at the end of this section to minimally disturb the flow of arguments. $\square$

The reduction in the proof of Theorem 5 can easily be encoded as an algorithm, as shown in Algorithm 1. Furthermore, in addition to Corollary 2, our reduction has other consequences. One such result (with proof in [3]) is given below.

**Corollary 3.** *Given $\mathcal{A} = \{(w_1, A_1), \ldots, (w_m, A_m)\}$, where each $w_i \in \mathbb{Q}$ and each $A_i \in \mathbb{Q}^{k \times k}$ is diagonalizable, and a real value $\varepsilon > 0$, there exists $\mathcal{B} = \{(v_1, B_1), \ldots, (v_M, B_M)\}$, where each $v_i \in \mathbb{Q}$ and each $B_i \in \mathsf{RA}^{k \times k}$ is simple, such that $\left| \sum_{i=1}^{m} w_i A_i^n[p,q] - \sum_{j=1}^{M} v_j B_j^n[p,q] \right| < \varepsilon^n$ for all $p, q \in \{1, \ldots, k\}$ and all $n \ge 1$.*

We end this section with the promised helper result used at the end of the proof of Theorem 5.

#### **Algorithm 1.** Reduction procedure for diagonalizable matrices

**Input:** $\mathcal{A} = \{(w_i, A_i) : 1 \le i \le m,\ w_i \in \mathbb{Q},\ A_i \in \mathbb{Q}^{k \times k}$ and diagonalizable$\}$
**Output:** $\mathcal{B} = \{(v_i, B_i) : 1 \le i \le t,\ v_i \in \mathbb{Q},\ B_i \in \mathsf{RA}^{k \times k}$ simple$\}$ s.t. $\sum_{i=1}^{m} w_i A_i^n = f(n) \sum_{i=1}^{t} v_i B_i^n$, where $f(n) > 0$ for all $n \ge 0$

1: $P \leftarrow \{1\}$; ▷ Initialize set of forbidden ratios of the various $\varepsilon_j$s
2: **for** $i$ in 1 through $m$ **do** ▷ For each matrix $A_i$
3: &nbsp;&nbsp;&nbsp;$R_i \leftarrow \{(\alpha_{i,j}, \rho_{i,j}) : \alpha_{i,j}$ is an eigenvalue of $A_i$ with algebraic multiplicity $\rho_{i,j}\}$;
4: &nbsp;&nbsp;&nbsp;$D_i \leftarrow$ diagonal matrix of $\alpha_{i,j}$-segments ordered in decreasing order of $\rho_{i,j}$;
5: &nbsp;&nbsp;&nbsp;$S_i \leftarrow$ matrix of linearly independent eigenvectors of $A_i$ s.t. $A_i = S_i D_i S_i^{-1}$;
6: &nbsp;&nbsp;&nbsp;$P \leftarrow P \cup \{|\alpha_{i,j}/\alpha_{i,l}| : \alpha_{i,j}, \alpha_{i,l}$ are eigenvalues in $R_i\}$; $\mu_i \leftarrow \max_j \rho_{i,j}$;
7: $\mu \leftarrow \mathrm{lcm}(\mu_1, \ldots, \mu_m)$; ▷ Count of $\varepsilon_j$s needed
8: **for** $j$ in 1 through $\mu$ **do** ▷ Generate all required $\varepsilon_j$s
9: &nbsp;&nbsp;&nbsp;choose $\varepsilon_j \in \mathbb{Q}$ s.t. $\varepsilon_j > 0$ and $\varepsilon_j \notin \{\pi \varepsilon_p : 1 \le p < j,\ \pi \in P\}$;
10: $\mathcal{B} \leftarrow \emptyset$; ▷ Initialize set of (weight, simple matrix) pairs
11: **for** $i$ in 1 through $m$ **do** ▷ For each matrix $A_i$
12: &nbsp;&nbsp;&nbsp;$\eta_i \leftarrow \mu/\mu_i$; ▷ Count of segmented perturbations to be rotated for $A_i$
13: &nbsp;&nbsp;&nbsp;**for** $j$ in 0 through $\eta_i - 1$ **do** ▷ For each segmented perturbation
14: &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$E_{i,j} \leftarrow$ segmented perturbation w.r.t. $D_i$ with first $\mu_i$ elements $\varepsilon_{j\mu_i+1}, \ldots, \varepsilon_{(j+1)\mu_i}$;
15: &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;**for** $u$ in 0 through $\mu_i - 1$ **do** ▷ For each rotation of $E_{i,j}$
16: &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$A_{i,j,u} \leftarrow \tau_{D_i}^u(E_{i,j})$-perturbed variant of $A_i$;
17: &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$\mathcal{B} \leftarrow \mathcal{B} \cup \{(w_i, A_{i,j,u})\}$; ▷ Update $\mathcal{B}$
18: **return** $\mathcal{B}$;
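The $\varepsilon$-selection step of Algorithm 1 (lines 8–9) can be sketched in Python; the helper `choose_epsilons` and the candidate-enumeration strategy below are our own hypothetical choices, not part of the paper:

```python
from fractions import Fraction

# A sketch of the epsilon-selection loop (lines 8-9 of Algorithm 1) on
# hypothetical data: greedily pick distinct positive rationals eps_j that
# avoid every forbidden product pi * eps_p with pi in P.
def choose_epsilons(mu, forbidden_ratios):
    P = set(forbidden_ratios) | {Fraction(1)}   # pi = 1 also forces distinctness
    eps = []
    candidate = Fraction(1)
    while len(eps) < mu:
        if all(candidate != pi * e for pi in P for e in eps):
            eps.append(candidate)
        candidate += Fraction(1, 7)             # an arbitrary walk through Q > 0
    return eps

# Forbidden ratios for a matrix with eigenvalues 2 and 5: |2/5| and |5/2|.
P = [Fraction(2, 5), Fraction(5, 2)]
eps = choose_epsilons(3, P)
assert eps == [Fraction(1), Fraction(8, 7), Fraction(9, 7)]
# No pair of chosen epsilons realizes a forbidden eigenvalue ratio.
assert all(eps[p] / eps[q] not in set(P)
           for p in range(3) for q in range(3) if p != q)
```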

**Lemma 5.** *For every real (resp. real algebraic) diagonalizable matrix $A = SDS^{-1}$ and perturbation $E \in \mathbb{R}^k$ (resp. $\mathsf{RA}^k$), the $E$-perturbed variant of $A$ is a real (resp. real algebraic) diagonalizable matrix.*

*Proof.* We first consider the case of $A \in \mathbb{R}^{k \times k}$ and $E \in \mathbb{R}^k$. Given a perturbation $E$ w.r.t. $D$, we first define $k$ *simple* perturbations $E_i$ ($1 \le i \le k$) w.r.t. $D$ as follows: $E_i$ has all its components set to 1, except for the $i$th component, which is set to $\varepsilon_i$. Furthermore, if $D[i,i]$ is not real, then the $h_D(i)$th component of $E_i$ is also set to $\varepsilon_i$. It is easy to see from Definition 3 that each $E_i$ is a perturbation w.r.t. $D$. Moreover, if $j = h_D(i)$, then $E_j = E_i$.

Let $\mathcal{E} = \{E_{i_1}, \ldots, E_{i_u}\}$ be the set of all *unique* perturbations w.r.t. $D$ among $E_1, \ldots, E_k$. It follows once again from Definition 3 that the $E$-perturbed variant of $A$ can be obtained by a sequence of $E_{i_j}$-perturbations, where $E_{i_j} \in \mathcal{E}$. Specifically, let $A_{0,\mathcal{E}} = A$, and let $A_{v,\mathcal{E}}$ be the $E_{i_v}$-perturbed variant of $A_{v-1,\mathcal{E}}$ for all $v \in \{1, \ldots, u\}$. Then the $E$-perturbed variant of $A$ is identical to $A_{u,\mathcal{E}}$. This shows that it suffices to prove the lemma only for simple perturbations $E_i$, as defined above. We focus on this special case below.
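The chaining of simple perturbations can be illustrated numerically. The sketch below is our own, with made-up numbers and real distinct eigenvalues (so the conjugate-pair bookkeeping is vacuous); it checks that applying two simple perturbations in sequence yields the same matrix as applying $E$ directly:

```python
import numpy as np

# Made-up data: A = S D S^{-1} with real distinct eigenvalues 3 and -1.
S = np.array([[2.0, 1.0],
              [1.0, 1.0]])
D = np.diag([3.0, -1.0])
Sinv = np.linalg.inv(S)
A = S @ D @ Sinv

E = np.array([0.5, 4.0])                          # the target perturbation
A_direct = S @ np.diag(E * np.diag(D)) @ Sinv     # E-perturbed variant, directly

# Apply the simple perturbations E_1 = (0.5, 1) and E_2 = (1, 4) in sequence:
D1 = np.diag([0.5, 1.0]) @ D                      # E_1-perturbed diagonal
D2 = np.diag([1.0, 4.0]) @ D1                     # then E_2, w.r.t. D1
A_seq = S @ D2 @ Sinv

assert np.allclose(A_seq, A_direct)
```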

Let $A' = SD'S^{-1}$ be the $E_i$-perturbed variant of $A$, and let $D[i,i] = \alpha$. For every $p \in \{1, \ldots, k\}$, let $\mathbf{e}_p$ denote the $k$-dimensional unit vector whose $p$th component is 1. Then $A' \mathbf{e}_p$ gives the $p$th column of $A'$. We prove the first part of the lemma by showing that $A' \mathbf{e}_p = (SD'S^{-1})\, \mathbf{e}_p \in \mathbb{R}^{k \times 1}$ for all $p \in \{1, \ldots, k\}$.

Let $\mathbf{T}$ denote $D'S^{-1}\mathbf{e}_p$. Then $\mathbf{T}$ is a column vector with $\mathbf{T}[r] = D'[r,r]\, S^{-1}[r,p]$ for all $r \in \{1, \ldots, k\}$. Let $\mathbf{U}$ denote $S\mathbf{T}$. By definition, $\mathbf{U}$ is the $p$th column of the matrix $A'$. To compute $\mathbf{U}$, recall that the rows of $S^{-1}$ form a basis of $\mathbb{C}^k$. Therefore, for every $q \in \{1, \ldots, k\}$, $S^{-1}\mathbf{e}_q$ can be viewed as transforming the basis of the unit vector $\mathbf{e}_q$ to that given by the rows of $S^{-1}$ (modulo possible scaling by real scalars denoting the lengths of the row vectors of $S^{-1}$). Similarly, the computation of $\mathbf{U} = S\mathbf{T}$ can be viewed as applying the inverse basis transformation to $\mathbf{T}$. It follows that the components of $\mathbf{U}$ can be obtained by computing the dot product of $\mathbf{T}$ and the transformed unit vector $S^{-1}\mathbf{e}_q$, for each $q \in \{1, \ldots, k\}$. In other words, $\mathbf{U}[q] = \mathbf{T} \cdot (S^{-1}\mathbf{e}_q)$. We show below that each such $\mathbf{U}[q]$ is real.

By definition, $\mathbf{U}[q] = \sum_{r=1}^{k} (\mathbf{T}[r]\, S^{-1}[r,q]) = \sum_{r=1}^{k} (D'[r,r]\, S^{-1}[r,p]\, S^{-1}[r,q])$. We consider two cases below.


The proof for the case of $A \in \mathsf{RA}^{k \times k}$ and $E \in \mathbb{Q}^k$ follows from similar reasoning, together with the following facts about real algebraic matrices.


## **6 Conclusion**

In this paper, we investigated eventual non-negativity and positivity for matrices and for weighted sums of powers of matrices (ENNSoM/EPSoM). First, we showed reductions from and to specific problems on linear recurrences, which allowed us to give complexity lower and upper bounds. Second, we developed a new and generic perturbation-based reduction technique from simple matrices to diagonalizable matrices, which allowed us to transfer results between these settings.

Most of our results, shown here in the rational setting, hold even for real-algebraic matrices, by adapting the complexity notions and relying on corresponding results for ultimate positivity of linear recurrences and related problems over the reals. As future work, we would like to extend our techniques to other problems of interest, such as the *existence* of a matrix power where all entries are non-negative or zero. Finally, the line of work started here could lead to effective algorithms and applications in areas ranging from control theory to cyber-physical systems, where eventual properties of matrices play a crucial role.

## **References**



# **Decision Problems in a Logic for Reasoning About Reconfigurable Distributed Systems**

Marius Bozga, Lucas Bueri, and Radu Iosif

Univ. Grenoble Alpes, CNRS, Grenoble INP, VERIMAG, 38000 Saint-Martin-d'Hères, France
marius.bozga@univ-grenoble-alpes.fr

**Abstract.** We consider a logic used to describe sets of configurations of distributed systems, whose network topologies can be changed at runtime, by reconfiguration programs. The logic uses inductive definitions to describe networks with an unbounded number of components and interactions, written using a multiplicative conjunction, reminiscent of Bunched Implications [37] and Separation Logic [39]. We study the complexity of the satisfiability and entailment problems for the configuration logic under consideration. Additionally, we consider the robustness property of degree boundedness (is every component involved in a bounded number of interactions?), an ingredient for decidability of entailments.

# **1 Introduction**

Distributed systems are increasingly used as critical parts of the infrastructure of our digital society, e.g., in datacenters, e-banking and social networking. In order to address maintenance issues (e.g., replacement of faulty and obsolete network nodes by new ones) and data traffic issues (e.g., managing the traffic inside a datacenter [35]), the distributed systems community has recently put massive effort into designing algorithms for *reconfigurable systems*, whose network topologies change at runtime [23]. However, dynamic reconfiguration in the form of software or network upgrades has been recognized as one of the most important sources of cloud service outage [25].

This paper contributes to a logical framework that addresses the timely problems of formal *modeling* and *verification* of reconfigurable distributed systems. The basic building blocks of this framework are (i) a Hoare-style program proof calculus [1] used to write formal proofs of correctness of reconfiguration programs, and (ii) an invariant synthesis method [6] that proves the safety (i.e., absence of reachable error configurations) of the configurations defined by the assertions that annotate a reconfiguration program. These methods are combined to prove that an initially correct distributed system cannot reach an error state, following the execution of a given reconfiguration sequence.

The assertions of the proof calculus are written in a logic that defines infinite sets of configurations, consisting of *components* (i.e., processes running on different nodes of the network) connected by *interactions* (i.e., multi-party channels alongside which messages between components are transferred). Systems that share the same architectural style (e.g., pipeline, ring, star, tree, etc.) and differ by the number of components and interactions are described using inductively defined predicates. Such configurations can be modified either by (a) adding or removing components and interactions (reconfiguration), or (b) changing the local states of components, by firing interactions.

The assertion logic views components and interactions as *resources* that can be created or deleted, in the spirit of resource logics à la Bunched Implications [37] or Separation Logic [39]. The main advantage of using resource logics is their support for *local reasoning* [12]: reconfiguration actions are specified by pre- and postconditions mentioning only the resources involved, while framing out the rest of the configuration.

The price to pay for this expressive power is the difficulty of automating the reasoning in these logics. This paper makes several contributions in the direction of proof automation, by studying the complexity of the *satisfiability* and *entailment* problems, for the configuration logic under consideration. Additionally, we study the complexity of a robustness property [27], namely *degree boundedness* (is every component involved in a bounded number of interactions?). In particular, the latter problem is used as a prerequisite for defining a fragment with a decidable entailment problem. For space reasons, the proofs of the technical results are given in [5].

#### **1.1 Motivating Example**

The logic studied in this paper is motivated by the need for an assertion language that supports reasoning about dynamic reconfigurations in a distributed system. For instance, consider a distributed system consisting of a finite (but unknown) number of *components* (processes) placed in a ring, executing the same finite-state program and communicating via *interactions* that connect the *out* port of a component to the *in* port of its right neighbour, in a round-robin fashion, as in Fig. 1(a). The behavior of a component is a machine with two states, T and H, denoting whether the component has a token (T) or not (H). A component $c_i$ without a token may receive one by executing a transition $\mathsf{H} \xrightarrow{in} \mathsf{T}$, simultaneously with its left neighbour $c_j$, which executes the transition $\mathsf{T} \xrightarrow{out} \mathsf{H}$. Then we say that the interaction $(c_j, out, c_i, in)$ has fired, moving a token one position to the right in the ring. Note that there can be more than one token, moving independently in the system, as long as no token overtakes another token.
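To make the firing semantics concrete, here is a toy simulation (our own sketch, not the paper's formal model) of token movement in such a ring; the helper `fire` and the state encoding are hypothetical:

```python
# Each component is 'T' (has a token) or 'H' (has not); firing the interaction
# between position j and its right neighbour i moves a token one step right.
def fire(states, j):
    i = (j + 1) % len(states)
    if states[j] == 'T' and states[i] == 'H':   # T -out-> H fires with H -in-> T
        states = states.copy()
        states[j], states[i] = 'H', 'T'
        return states
    return None                                  # the interaction is not enabled

ring = ['T', 'H', 'H', 'T', 'H']
step1 = fire(ring, 0)
assert step1 == ['H', 'T', 'H', 'T', 'H']
assert fire(ring, 1) is None                     # no token at position 1 to move
# The number of tokens is invariant under firing.
assert step1.count('T') == ring.count('T')
```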

The token ring system is formally specified by the following inductive rules:

$$\begin{aligned}
\mathsf{ring}_{h,t}(x) &\leftarrow \exists y\, \exists z.\ [x]@q * \langle x.out, z.in \rangle * \mathsf{chain}_{h',t'}(z,y) * \langle y.out, x.in \rangle\\
\mathsf{chain}_{h,t}(x,y) &\leftarrow \exists z.\ [x]@q * \langle x.out, z.in \rangle * \mathsf{chain}_{h',t'}(z,y)\\
\mathsf{chain}_{0,1}(x,x) &\leftarrow [x]@\mathsf{T} \qquad \mathsf{chain}_{1,0}(x,x) \leftarrow [x]@\mathsf{H} \qquad \mathsf{chain}_{0,0}(x,x) \leftarrow [x]\\
\text{where } h' &\stackrel{\text{def}}{=} \begin{cases} \max(h-1,0), & \text{if } q = \mathsf{H}\\ h, & \text{otherwise} \end{cases} \quad\text{and}\quad t' \stackrel{\text{def}}{=} \begin{cases} \max(t-1,0), & \text{if } q = \mathsf{T}\\ t, & \text{otherwise} \end{cases}
\end{aligned}$$

The predicate $\mathsf{ring}_{h,t}(x)$ describes a ring with at least two components, such that at least $h$ (resp. $t$) components are in state H (resp. T). The ring consists of a component $x$ in state $q$, described by the formula $[x]@q$, an interaction from the *out* port of $x$ to the *in* port of another component $z$, described as $\langle x.out, z.in \rangle$, a separate chain of components stretching from $z$ to $y$ ($\mathsf{chain}_{h',t'}(z,y)$), and an interaction connecting the *out* port of component $y$ to the *in* port of component $x$ ($\langle y.out, x.in \rangle$). Inductively, a chain consists of a component $[x]@q$, an interaction $\langle x.out, z.in \rangle$ and a separate $\mathsf{chain}_{h',t'}(z,y)$. Figure 1(b) depicts the unfolding of the inductive definition of the token ring, with the existentially quantified variables $z$ from the above rules $\alpha$-renamed to $z_1, z_2, \ldots$ to avoid confusion.

**Fig. 1.** Inductive Specification and Reconfiguration of a Token Ring

A *reconfiguration program* takes as input a mapping of program variables to components and executes a sequence of *basic operations*, i.e., component/interaction creation/deletion, involving the components and interactions denoted by these variables. For instance, the reconfiguration program in Fig. 1(c) takes as input three adjacent components, mapped to the variables x, y and z, respectively, removes the component y together with its left and right interactions, and reconnects x directly with z. Programming reconfigurations is error-prone, because the interleaving between reconfiguration actions and interactions in a distributed system may lead to bugs that are hard to trace. For instance, if a reconfiguration program removes the last component in state T (resp. H) from the system, no token transfer interaction may fire and the system deadlocks.
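The basic operations can be rendered on an explicit configuration. The sketch below is a hypothetical illustration (our own encoding of components and interactions, not the paper's semantics) of the remove-and-reconnect reconfiguration described above:

```python
# Components as a dict (name -> state); interactions as
# (source, 'out', target, 'in') tuples.
def remove_middle(components, interactions, x, y, z):
    # Delete component y and every interaction it participates in,
    # then reconnect x directly to z.
    components = {c: s for c, s in components.items() if c != y}
    interactions = [i for i in interactions if y not in (i[0], i[2])]
    interactions.append((x, 'out', z, 'in'))
    return components, interactions

comps = {'x': 'H', 'y': 'H', 'z': 'T'}
inters = [('x', 'out', 'y', 'in'), ('y', 'out', 'z', 'in'), ('z', 'out', 'x', 'in')]
comps2, inters2 = remove_middle(comps, inters, 'x', 'y', 'z')
assert 'y' not in comps2
assert set(inters2) == {('z', 'out', 'x', 'in'), ('x', 'out', 'z', 'in')}
```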

We prove absence of such errors using a Hoare-style proof system [1], based on the logic introduced above as assertion language. For instance, the proof from Fig. 1(c) shows that the reconfiguration sequence, applied to a component $y$ in state H (i.e., $[y]@\mathsf{H}$) in a ring with at least $h \ge 2$ components in state H and at least $t \ge 1$ components in state T, leads to a ring with at least $h-1$ components in state H and at least $t$ components in state T; note that the states of the components may change during the execution of the reconfiguration program, as tokens are moved by interactions.

The proof in Fig. 1(c) uses *local axioms* specifying, for each basic operation, only those components and interactions required to avoid faulting, together with a *frame rule* $\{\varphi\}\ P\ \{\psi\} \Rightarrow \{\varphi * F\}\ P\ \{\psi * F\}$; for readability, the frame formulæ (from the preconditions of the conclusions of the frame rule applications) are enclosed in boxes.

The proof also uses the *consequence rule* $\{\varphi\}\ P\ \{\psi\} \Rightarrow \{\varphi'\}\ P\ \{\psi'\}$, which applies if $\varphi'$ is stronger than $\varphi$ and $\psi'$ is weaker than $\psi$. The side conditions of the consequence rule require checking the validity of the entailments $\mathsf{ring}_{h,t}(y) \models \exists x \exists z.\ \langle x.out, y.in \rangle * [y]@\mathsf{H} * \langle y.out, z.in \rangle * \mathsf{chain}_{h-1,t}(z,x)$ and $\mathsf{chain}_{h-1,t}(z,x) * \langle x.out, z.in \rangle \models \mathsf{ring}_{h-1,t}(z)$, for all $h \ge 2$ and $t \ge 1$. These side conditions can be automatically discharged using the results on the decidability of entailments given in this paper. Additionally, checking the satisfiability of a precondition is used to detect trivially valid Hoare triples.

#### **1.2 Related Work**

Formal modeling of the coordinating architectures of component-based systems has received lots of attention, with the development of architecture description languages (ADLs), such as BIP [3] or REO [2]. Many such ADLs have extensions that describe programmed reconfiguration, e.g., [19,30], classified according to the underlying formalism used to define their operational semantics: *process algebras* [13,33], *graph rewriting* [32,41,44], *chemical reactions* [43] (see the surveys [7,11]). Unfortunately, only a few ADLs support formal verification, mainly in the flavour of runtime verification [10,17,20,31] or finite-state model checking [14].

Parameterized verification of unbounded networks of distributed processes uses mostly hard-coded coordinating architectures (see [4] for a survey). A first attempt at specifying architectures by logic is the *interaction logic* of Konnov et al. [29], a combination of Presburger arithmetic with monadic uninterpreted function symbols, that can describe cliques, stars and rings. More structured architectures (pipelines and trees) can be described using a second-order extension [34]. However, these interaction logics are undecidable and lack support for automated reasoning.

Specifying parameterized component-based systems by inductive definitions is not new. *Network grammars* [26,32,40] use context-free grammar rules to describe systems with linear (pipeline, token-ring) architectures obtained by composition of an unbounded number of processes. In contrast, we use predicates of unrestricted arities to describe architectural styles that are, in general, more complex than trees. Moreover, we write inductive definitions using a resource logic, suitable also for writing Hoare logic proofs of reconfiguration programs, based on local reasoning [12].

Local reasoning about concurrent programs has traditionally been the focus of Concurrent Separation Logic (CSL), based on a parallel composition rule [36], initially with a non-interfering (race-free) semantics [8] and later combining ideas of assume- and rely-guarantee reasoning [28,38] with local reasoning [22,42] and abstract notions of framing [15,16,21]. However, the body of work on CSL deals almost entirely with shared-memory multithreaded programs rather than distributed systems, which are the aim of our work. In contrast, we develop a resource logic in which processes do not just share and own resources, but become mutable resources themselves.

The techniques developed in this paper are inspired by existing techniques for similar problems in the context of Separation Logic (SL) [39]. For instance, we use an abstract domain similar to the one defined by Brotherston et al. [9] for checking satisfiability of symbolic heaps in SL, and reduce a fragment of the entailment problem in our logic to SL entailment [18]. In particular, the use of existing automated reasoning techniques for SL has pointed out several differences between the expressiveness of our logic and that of SL. First, the configuration logic describes hypergraph structures, in which edges are $\ell$-tuples for $\ell \ge 2$, instead of directed graphs as in SL, where $\ell$ is a parameter of the problem: considering $\ell$ to be a constant strictly decreases the complexity of the problem. Second, the degree (the number of hyperedges containing a given vertex) is unbounded, unlike in SL, where the degree of heaps is constant. Therefore, we dedicate an entire section (Sect. 4) to the problem of deciding the existence of a bound (and computing a cut-off) on the degree of the models of a formula, used as a prerequisite for the encoding of the entailment problems from the configuration logic as SL entailments.

## **2 Definitions**

We denote by $\mathbb{N}$ the set of natural numbers including zero. For a set $A$, we define $A^1 \stackrel{\text{def}}{=} A$, $A^{i+1} \stackrel{\text{def}}{=} A^i \times A$, for all $i \geq 1$, and $A^+ \stackrel{\text{def}}{=} \bigcup_{i \geq 1} A^i$, where $\times$ denotes the Cartesian product. We denote by $\mathrm{pow}(A)$ the powerset of $A$ and by $\mathrm{mpow}(A)$ the power-multiset (set of multisets) of $A$. The cardinality of a finite set $A$ is denoted by $||A||$. By writing $A \subseteq_{fin} B$ we mean that $A$ is a finite subset of $B$. Given integers $i$ and $j$, we write $[i,j]$ for the set $\{i, i+1, \ldots, j\}$, assumed to be empty if $i > j$. For a tuple $\mathbf{t} = \langle t_1, \ldots, t_n \rangle$, we define $|\mathbf{t}| \stackrel{\text{def}}{=} n$, $\mathbf{t}_i \stackrel{\text{def}}{=} t_i$ and $\mathbf{t}[i,j] \stackrel{\text{def}}{=} \langle t_i, \ldots, t_j \rangle$. By writing $x = poly(y)$, for given $x, y \in \mathbb{N}$, we mean that there exists a polynomial function $f : \mathbb{N} \to \mathbb{N}$ such that $x \leq f(y)$.

#### **2.1 Configurations**

We model distributed systems as hypergraphs, whose vertices are *components* (i.e., the nodes of the network) and whose hyperedges are *interactions* (i.e., descriptions of the way the components communicate with each other). The components are taken from a countably infinite set C, called the *universe*. We consider that each component executes its own copy of the same *behavior*, represented as a finite-state machine $\mathcal{B} = (P, Q, \rightarrow)$, where $P$ is a finite set of *ports*, $Q$ is a finite set of *states* and $\rightarrow \subseteq Q \times P \times Q$ is a transition relation. Intuitively, each transition $q \xrightarrow{p} q'$ of the behavior is triggered by a visible event, represented by the port $p$. For instance, the behavior of the components of the token ring system from Fig. 1(a) is $\mathcal{B} = (\{in, out\}, \{\mathsf{H}, \mathsf{T}\}, \{\mathsf{H} \xrightarrow{in} \mathsf{T}, \mathsf{T} \xrightarrow{out} \mathsf{H}\})$. *The universe* C *and the behavior* $\mathcal{B} = (P, Q, \rightarrow)$ *are fixed in the rest of this paper.*
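As a minimal illustration (our own Python encoding, not part of the paper), the token-ring behavior $\mathcal{B}$ above can be executed as a finite-state machine:

```python
# Sketch (not from the paper): the token-ring behavior B from Fig. 1(a),
# with ports {in, out} and states {H, T}.
# Transition relation -> as a set of (state, port, state) triples:
# H --in--> T (acquire the token), T --out--> H (release it).
TRANSITIONS = {("H", "in", "T"), ("T", "out", "H")}

def step(state, port):
    """Fire the transition of B triggered by `port`, if any."""
    for (q, p, q2) in TRANSITIONS:
        if q == state and p == port:
            return q2
    return None  # no transition is enabled by this port

# A component holding the token releases it, then re-acquires one.
assert step("T", "out") == "H"
assert step("H", "in") == "T"
assert step("H", "out") is None  # `out` is not enabled in state H
```

Every component of the system runs its own replica of this machine; the ports `in` and `out` are the only visible events through which replicas synchronize.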

We introduce a logic for describing infinite sets of *configurations* of distributed systems with unboundedly many components and interactions. A configuration is a snapshot of the system, describing the topology of the network (i.e., the set of present components and interactions) together with the local state of each component:

**Definition 1.** *A* configuration *is a tuple* γ = (*C*,*I*,ρ)*, where:*


The last condition requires that there is an infinite pool of components in each state $q \in Q$; since C is infinite and $Q$ is finite, this condition is feasible. For example, the configurations of the token ring from Fig. 1(a) are $(\{c_1, \ldots, c_n\}, \{(c_i, out, c_{(i \bmod n)+1}, in) \mid i \in [1,n]\}, \rho)$, where $\rho : \mathrm{C} \to \{\mathsf{H}, \mathsf{T}\}$ is a state map. The ring topology is described by the set of components $\{c_1, \ldots, c_n\}$ and the set of interactions $\{(c_i, out, c_{(i \bmod n)+1}, in) \mid i \in [1,n]\}$.

Intuitively, an interaction $(c_1, p_1, \ldots, c_n, p_n)$ synchronizes transitions labeled by the ports $p_1, \ldots, p_n$ from the behaviors (i.e., replicas of the state machine $\mathcal{B}$) of $c_1, \ldots, c_n$, respectively. Note that the components $c_i$ are not necessarily part of the configuration. Interactions are classified according to their sequence of ports, called the *interaction type*; we let $\mathrm{Inter} \stackrel{\text{def}}{=} P^+$ be the set of interaction types. An interaction type models, for instance, the passing of a certain kind of message (e.g., request, acknowledgement, etc.). From an operational point of view, two interactions that differ by a permutation of indices, e.g., $(c_1, p_1, \ldots, c_n, p_n)$ and $(c_{i_1}, p_{i_1}, \ldots, c_{i_n}, p_{i_n})$ such that $\{i_1, \ldots, i_n\} = [1,n]$, are equivalent, since the set of synchronized transitions is the same; nevertheless, we choose to distinguish them in the following, exclusively for reasons of simplicity.

Below we define the composition of configurations, as the union of disjoint sets of components and interactions:

**Definition 2.** *The composition of two configurations* $\gamma_i = (C_i, I_i, \rho)$*, for* $i = 1,2$*, such that* $C_1 \cap C_2 = \emptyset$ *and* $I_1 \cap I_2 = \emptyset$*, is defined as* $\gamma_1 \bullet \gamma_2 \stackrel{\text{def}}{=} (C_1 \cup C_2, I_1 \cup I_2, \rho)$*. The composition* $\gamma_1 \bullet \gamma_2$ *is undefined if* $C_1 \cap C_2 \neq \emptyset$ *or* $I_1 \cap I_2 \neq \emptyset$*.*

In analogy with graphs, the *degree* of a configuration is the maximum number of interactions from the configuration that involve a (possibly absent) component:

**Definition 3.** *The* degree *of a configuration* $\gamma = (C, I, \rho)$ *is defined as* $\delta(\gamma) \stackrel{\text{def}}{=} \max_{c \in \mathrm{C}} \delta_c(\gamma)$*, where* $\delta_c(\gamma) \stackrel{\text{def}}{=} ||\{(c_1, p_1, \ldots, c_n, p_n) \in I \mid c = c_i, \text{ for some } i \in [1,n]\}||$*.*

For instance, the configuration of the system from Fig. 1(a) has degree two.
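As a concrete illustration (our own Python encoding, not the paper's code), the token-ring configuration and the degree of Definition 3 can be sketched as follows:

```python
# Sketch: a configuration as a triple (C, I, rho), instantiated with the
# n-component token ring, and the degree delta(gamma) from Definition 3.
n = 3
C = [f"c{i}" for i in range(1, n + 1)]
# Interactions (c_i, out, c_{(i mod n)+1}, in), for i in [1, n].
I = [(f"c{i}", "out", f"c{i % n + 1}", "in") for i in range(1, n + 1)]
rho = {c: "H" for c in C}
rho["c1"] = "T"  # exactly one component holds the token

def degree(C, I):
    """delta(gamma): max number of interactions involving one component."""
    def deg_of(c):
        # Components sit at the even positions of an interaction tuple.
        return sum(1 for inter in I if c in inter[0::2])
    return max(deg_of(c) for c in C)

# Each component occurs in exactly two interactions (one out, one in).
assert degree(C, I) == 2
```

The assertion matches the remark above: the ring configuration of Fig. 1(a) has degree two, independently of the number of components.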

### **2.2 Configuration Logic**

Let V and A be countably infinite sets of *variables* and *predicates*, respectively. For each predicate A ∈ A, we denote its arity by #A. The formulæ of the *Configuration Logic* (CL) are described inductively by the following syntax:

$$\phi := \mathsf{emp} \mid [x] \mid \langle x_1.p_1, \ldots, x_n.p_n \rangle \mid x@q \mid x = y \mid x \neq y \mid \mathsf{A}(x_1, \ldots, x_{\#\mathsf{A}}) \mid \phi * \phi \mid \exists x \,.\, \phi$$

where $x, y, x_1, \ldots \in \mathrm{V}$, $q \in Q$ and $\mathsf{A} \in \mathrm{A}$. A formula $[x]$, $\langle x_1.p_1, \ldots, x_n.p_n \rangle$, $x@q$ and $\mathsf{A}(x_1, \ldots, x_{\#\mathsf{A}})$ is called a *component*, *interaction*, *state* and *predicate* atom, respectively. These formulæ are also referred to as *atoms*. The connective $*$ is called the *separating conjunction*. We use the shorthand $[x]@q \stackrel{\text{def}}{=} [x] * x@q$. For instance, the formula $[x]@q * [y]@q' * \langle x.out, y.in \rangle * \langle x.in, y.out \rangle$ describes a configuration consisting of two distinct components, denoted by the values of $x$ and $y$, in states $q$ and $q'$, respectively, and two interactions binding the *out* port of one to the *in* port of the other component.

A formula is said to be *pure* if and only if it is a separating conjunction of state atoms, equalities and disequalities. A formula with no occurrences of predicate atoms (resp. existential quantifiers) is called *predicate-free* (resp. *quantifier-free*). A variable is *free* if it does not occur within the scope of an existential quantifier; we denote by $\mathrm{fv}(\phi)$ the set of free variables of $\phi$. A *sentence* is a formula with no free variables. A *substitution* $\phi[x_1/y_1 \ldots x_n/y_n]$ replaces simultaneously every free occurrence of $x_i$ by $y_i$ in $\phi$, for all $i \in [1,n]$. Before defining the semantics of CL formulæ, we introduce the sets of inductive definitions that assign meaning to predicates:

**Definition 4.** *A* set of inductive definitions (SID) Δ *consists of* rules A(*x*1,...,*x*#A) ← φ*, where x*1,...,*x*#<sup>A</sup> *are pairwise distinct variables, called* parameters*, such that* fv(φ) ⊆ {*x*1,...,*x*#A}*. The rule* A(*x*1,...,*x*#A) ← φ defines A *and we denote by* defΔ(A) *the set of rules from* Δ *that define* A*.*

Note that having distinct parameters in a rule is without loss of generality, as e.g., a rule A(*x*1,*x*1) ← φ can be equivalently written as A(*x*1,*x*2) ← *x*<sup>1</sup> = *x*<sup>2</sup> ∗ φ. As a convention, we shall always use the names *x*1,...,*x*#<sup>A</sup> for the parameters of a rule that defines A.

The semantics of CL formulæ is defined by a satisfaction relation $\gamma \models^\nu_\Delta \phi$ between configurations and formulæ. This relation is parameterized by a *store* $\nu : \mathrm{V} \to \mathrm{C}$ mapping the free variables of a formula into components from the universe (possibly absent from $\gamma$) and an SID $\Delta$. We write $\nu[x \leftarrow c]$ for the store that maps $x$ into $c$ and agrees with $\nu$ on all variables other than $x$. The definition of the satisfaction relation is by induction on the structure of formulæ, where $\gamma = (C, I, \rho)$ is a configuration (Definition 1):

$$\begin{array}{lcl}
\gamma \models^\nu_\Delta \mathsf{emp} & \iff & C = \emptyset \text{ and } I = \emptyset \\
\gamma \models^\nu_\Delta [x] & \iff & C = \{\nu(x)\} \text{ and } I = \emptyset \\
\gamma \models^\nu_\Delta \langle x_1.p_1, \ldots, x_n.p_n \rangle & \iff & C = \emptyset \text{ and } I = \{(\nu(x_1), p_1, \ldots, \nu(x_n), p_n)\} \\
\gamma \models^\nu_\Delta x@q & \iff & \gamma \models^\nu_\Delta \mathsf{emp} \text{ and } \rho(\nu(x)) = q \\
\gamma \models^\nu_\Delta x \sim y & \iff & \gamma \models^\nu_\Delta \mathsf{emp} \text{ and } \nu(x) \sim \nu(y), \text{ for all } \sim \,\in \{=, \neq\} \\
\gamma \models^\nu_\Delta \mathsf{A}(y_1, \ldots, y_{\#\mathsf{A}}) & \iff & \gamma \models^\nu_\Delta \phi[x_1/y_1, \ldots, x_{\#\mathsf{A}}/y_{\#\mathsf{A}}], \text{ for some rule } \mathsf{A}(x_1, \ldots, x_{\#\mathsf{A}}) \leftarrow \phi \text{ from } \Delta \\
\gamma \models^\nu_\Delta \phi_1 * \phi_2 & \iff & \text{there exist } \gamma_1, \gamma_2, \text{ such that } \gamma = \gamma_1 \bullet \gamma_2 \text{ and } \gamma_i \models^\nu_\Delta \phi_i, \text{ for } i = 1,2 \\
\gamma \models^\nu_\Delta \exists x \,.\, \phi & \iff & \gamma \models^{\nu[x \leftarrow c]}_\Delta \phi, \text{ for some } c \in \mathrm{C}
\end{array}$$

If $\phi$ is a sentence, the satisfaction relation $\gamma \models^\nu_\Delta \phi$ does not depend on the store, written $\gamma \models_\Delta \phi$, in which case we say that $\gamma$ is a *model* of $\phi$. If $\phi$ is a predicate-free formula, the satisfaction relation does not depend on the SID, written $\gamma \models^\nu \phi$. A formula $\phi$ is *satisfiable* if and only if the sentence $\exists x_1 \ldots \exists x_n \,.\, \phi$ has a model, where $\mathrm{fv}(\phi) = \{x_1, \ldots, x_n\}$. A formula $\phi$ *entails* a formula $\psi$, written $\phi \models_\Delta \psi$, if and only if, for any configuration $\gamma$ and store $\nu$, we have $\gamma \models^\nu_\Delta \phi$ only if $\gamma \models^\nu_\Delta \psi$.
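The semantics above can be executed directly on small configurations. The following is a toy model checker for quantifier- and predicate-free CL formulæ (a Python sketch under our own tuple encoding; the formula tags and function names are ours, not the paper's), which checks the separating conjunction by enumerating all splits of $(C, I)$:

```python
# Toy model checker for predicate-free, quantifier-free CL formulas.
# Formulas are nested tuples: ("emp",), ("comp", x), ("inter", [(x, p), ...]),
# ("state", x, q), ("eq", x, y), ("neq", x, y), ("star", f1, f2).
from itertools import chain, combinations

def subsets(s):
    s = list(s)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

def sat(C, I, rho, nu, f):
    kind = f[0]
    if kind == "emp":
        return not C and not I
    if kind == "comp":                       # [x]
        return C == {nu[f[1]]} and not I
    if kind == "inter":                      # <x1.p1, ..., xn.pn>
        tup = tuple(v for (x, p) in f[1] for v in (nu[x], p))
        return not C and I == {tup}
    if kind == "state":                      # x@q holds on the empty configuration
        return not C and not I and rho[nu[f[1]]] == f[2]
    if kind == "eq":
        return not C and not I and nu[f[1]] == nu[f[2]]
    if kind == "neq":
        return not C and not I and nu[f[1]] != nu[f[2]]
    if kind == "star":                       # try every split of C and I
        return any(sat(set(c1), set(i1), rho, nu, f[1])
                   and sat(C - set(c1), I - set(i1), rho, nu, f[2])
                   for c1 in subsets(C) for i1 in subsets(I))
    raise ValueError(kind)

# [x]@q * [y]@q2 * <x.out, y.in> * <x.in, y.out> from the running example:
phi = ("star", ("comp", "x"),
       ("star", ("state", "x", "q"),
        ("star", ("comp", "y"),
         ("star", ("state", "y", "q2"),
          ("star", ("inter", [("x", "out"), ("y", "in")]),
                   ("inter", [("x", "in"), ("y", "out")]))))))
C = {"c1", "c2"}
I = {("c1", "out", "c2", "in"), ("c1", "in", "c2", "out")}
rho = {"c1": "q", "c2": "q2"}
assert sat(C, I, rho, {"x": "c1", "y": "c2"}, phi)
assert not sat(C, I, rho, {"x": "c1", "y": "c1"}, phi)
```

The exponential enumeration of splits is only meant to mirror the definition of $*$; it is not how the decision procedures of the following sections operate.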

#### **2.3 Separation Logic**

Separation Logic (SL) [39] will be used in the following to prove several technical results concerning the decidability and complexity of certain decision problems for CL. To keep the presentation self-contained, we define SL below. The syntax of SL formulæ is described by the following grammar:

$$\phi := \mathsf{emp} \mid x_0 \mapsto (x_1, \ldots, x_\mathsf{K}) \mid x = y \mid x \neq y \mid \mathsf{A}(x_1, \ldots, x_{\#\mathsf{A}}) \mid \phi * \phi \mid \exists x \,.\, \phi$$

where $x, y, x_0, x_1, \ldots \in \mathrm{V}$, $\mathsf{A} \in \mathrm{A}$ and $\mathsf{K} \geq 1$ is an integer constant. Formulæ of SL are interpreted over finite partial functions $\mathsf{h} : \mathrm{C} \rightharpoonup_{fin} \mathrm{C}^\mathsf{K}$, called *heaps*<sup>1</sup>, by a satisfaction relation $\mathsf{h} \Vdash^\nu_\Delta \phi$, defined inductively as follows:

<sup>1</sup> We use the universe C here for simplicity; the definition works with any countably infinite set.

$$\begin{array}{lcl}
\mathsf{h} \Vdash^\nu_\Delta \mathsf{emp} & \iff & \mathsf{h} = \emptyset \\
\mathsf{h} \Vdash^\nu_\Delta x_0 \mapsto (x_1, \ldots, x_\mathsf{K}) & \iff & \mathrm{dom}(\mathsf{h}) = \{\nu(x_0)\} \text{ and } \mathsf{h}(\nu(x_0)) = \langle \nu(x_1), \ldots, \nu(x_\mathsf{K}) \rangle \\
\mathsf{h} \Vdash^\nu_\Delta \phi_1 * \phi_2 & \iff & \text{there exist } \mathsf{h}_1, \mathsf{h}_2 \text{ such that } \mathrm{dom}(\mathsf{h}_1) \cap \mathrm{dom}(\mathsf{h}_2) = \emptyset, \\
& & \mathsf{h} = \mathsf{h}_1 \cup \mathsf{h}_2 \text{ and } \mathsf{h}_i \Vdash^\nu_\Delta \phi_i, \text{ for both } i = 1,2
\end{array}$$

where $\mathrm{dom}(\mathsf{h}) \stackrel{\text{def}}{=} \{c \in \mathrm{C} \mid \mathsf{h}(c) \text{ is defined}\}$ is the domain of the heap, and (dis-)equalities, predicate atoms and existential quantifiers are interpreted in the same way as for CL.

#### **2.4 Decision Problems**

We define the decision problems that are the focus of the upcoming sections. As usual, a decision problem is a class of yes/no queries that differ only in their input. In our case, the input consists of an SID and one or two predicates, written between square brackets.

**Definition 5.** *We consider the following problems, for an SID* $\Delta$ *and predicates* $\mathsf{A}, \mathsf{B} \in \mathrm{A}$*:*

1. $\mathrm{Sat}[\Delta, \mathsf{A}]$: *is the sentence* $\exists x_1 \ldots \exists x_{\#\mathsf{A}} \,.\, \mathsf{A}(x_1, \ldots, x_{\#\mathsf{A}})$ *satisfiable?*
2. $\mathrm{Bnd}[\Delta, \mathsf{A}]$: *is the set* $\{\delta(\gamma) \mid \gamma \models_\Delta \exists x_1 \ldots \exists x_{\#\mathsf{A}} \,.\, \mathsf{A}(x_1, \ldots, x_{\#\mathsf{A}})\}$ *finite?*
3. $\mathrm{Entl}[\Delta, \mathsf{A}, \mathsf{B}]$: *does* $\mathsf{A}(x_1, \ldots, x_{\#\mathsf{A}}) \models_\Delta \exists x_{\#\mathsf{A}+1} \ldots \exists x_{\#\mathsf{B}} \,.\, \mathsf{B}(x_1, \ldots, x_{\#\mathsf{B}})$ *hold?*


The size of a formula $\phi$ is the total number of occurrences of symbols needed to write it down, denoted by $\mathrm{size}(\phi)$. The size of an SID $\Delta$ is $\mathrm{size}(\Delta) \stackrel{\text{def}}{=} \sum_{\mathsf{A}(x_1, \ldots, x_{\#\mathsf{A}}) \leftarrow \phi \in \Delta} \mathrm{size}(\phi) + \#\mathsf{A} + 1$. Other parameters of an SID $\Delta$ are:

- $\mathrm{arity}(\Delta)$, the maximal arity of a predicate occurring in $\Delta$;
- $\mathrm{intersize}(\Delta)$, the maximal number of port occurrences in an interaction atom of $\Delta$.


For a decision problem $\mathrm{P}[\Delta, \mathsf{A}, \mathsf{B}]$, we consider its $(k, \ell)$-bounded versions $\mathrm{P}^{(k,\ell)}[\Delta, \mathsf{A}, \mathsf{B}]$, obtained by restricting the predicates and interaction atoms occurring in $\Delta$ to $\mathrm{arity}(\Delta) \leq k$ and $\mathrm{intersize}(\Delta) \leq \ell$, respectively, where $k$ and $\ell$ are either positive integers or infinity. We consider, for each $\mathrm{P}[\Delta, \mathsf{A}, \mathsf{B}]$, the subproblems $\mathrm{P}^{(k,\ell)}[\Delta, \mathsf{A}, \mathsf{B}]$ corresponding to the three cases (1) $k < \infty$ and $\ell = \infty$, (2) $k = \infty$ and $\ell < \infty$, and (3) $k = \infty$ and $\ell = \infty$. As we explain next, this is because, for the decision problems considered (Definition 5), the complexity for the case $k < \infty$, $\ell < \infty$ matches the one for the case $k < \infty$, $\ell = \infty$.

Satisfiability (1) and entailment (3) arise naturally during the verification of reconfiguration programs. For instance, $\mathrm{Sat}[\Delta, \phi]$ asks whether a specification $\phi$ of a set of configurations (e.g., a pre-, post-condition, or a loop invariant) is empty or not (e.g., an empty precondition typically denotes a vacuous verification condition), whereas $\mathrm{Entl}[\Delta, \phi, \psi]$ is used as a side condition for the Hoare rule of consequence, as in, e.g., the proof from Fig. 1(c). Moreover, entailments must be proved when checking the inductiveness of a user-provided loop invariant.

The $\mathrm{Bnd}[\Delta, \phi]$ problem is used to check a necessary condition for the decidability of entailments, i.e., $\mathrm{Entl}[\Delta, \phi, \psi]$. If $\mathrm{Bnd}[\Delta, \phi]$ has a positive answer, we can reduce $\mathrm{Entl}[\Delta, \phi, \psi]$ to an entailment problem for SL, which is always interpreted over heaps of bounded degree [18]. Otherwise, the decidability status of the entailment problem is open for configurations of unbounded degree, such as the ones described by the example below.

*Example 1.* The following SID describes star topologies with a central controller connected to an unbounded number of worker stations:

$$\begin{array}{ll}
\mathrm{Controller}(x) \leftarrow [x] * \mathrm{Worker}(x) & \qquad \mathrm{Worker}(x) \leftarrow \mathsf{emp} \\
\mathrm{Worker}(x) \leftarrow \exists y \,.\, \langle x.out, y.in \rangle * [y] * \mathrm{Worker}(x) &
\end{array}$$
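Unfolding the `Worker` rule $k$ times produces a star with $k$ worker components, all connected to the central component; its degree grows with $k$, so no bound exists. This can be simulated directly (our own Python sketch; the names `star` and `degree` are illustrative):

```python
# Sketch: configurations generated by Controller(x) after k unfoldings of
# the recursive Worker rule from Example 1. The center participates in k
# interactions, so the degree of the models is unbounded.
def star(k):
    """Configuration (C, I) produced by k Worker unfoldings."""
    C = {"hub"} | {f"w{j}" for j in range(k)}
    I = {("hub", "out", f"w{j}", "in") for j in range(k)}
    return C, I

def degree(C, I):
    # Components occupy positions 0 and 2 of each interaction tuple.
    return max(sum(1 for i in I if c in (i[0], i[2])) for c in C)

# The degree equals the number of unfoldings: no finite bound exists.
assert [degree(*star(k)) for k in (1, 2, 5, 40)] == [1, 2, 5, 40]
```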

### **3 Satisfiability**

We show that the satisfiability problem (Definition 5, point 1) is decidable, using a method similar to the one pioneered by Brotherston et al. [9] for checking satisfiability of inductively defined symbolic heaps in SL. We recall that a formula $\pi$ is *pure* if and only if it is a separating conjunction of equalities, disequalities and state atoms. In the following, the order of terms in (dis-)equalities is not important, i.e., we consider $x = y$ (resp. $x \neq y$) and $y = x$ (resp. $y \neq x$) to be the same formula.

**Definition 6.** *The* closure cl(π) *of a pure formula* π *is the limit of the sequence* <sup>π</sup>0,π1,π2,... *such that* <sup>π</sup><sup>0</sup> <sup>=</sup> <sup>π</sup> *and, for each i* <sup>≥</sup> <sup>0</sup>*,* <sup>π</sup>*i*+<sup>1</sup> *is obtained by joining (with* <sup>∗</sup>*) all of the following formulæ to* <sup>π</sup>*<sup>i</sup> :*


Because only finitely many such formulæ can be added, the sequence of pure formulæ from Definition 6 is bound to stabilize after polynomially many steps. A pure formula is satisfiable if and only if its closure does not contain contradictory literals, i.e., $x = y$ and $x \neq y$, or $x@q$ and $x@q'$, for $q \neq q' \in Q$. We write $x \approx_\pi y$ (resp. $x \not\approx_\pi y$) if and only if $x = y$ (resp. $x \neq y$) occurs in $\mathrm{cl}(\pi)$, and $\mathrm{not}(x \approx_\pi y)$ (resp. $\mathrm{not}(x \not\approx_\pi y)$) whenever $x \approx_\pi y$ (resp. $x \not\approx_\pi y$) does not hold. Note that, e.g., $\mathrm{not}(x \approx_\pi y)$ is not the same as $x \not\approx_\pi y$.
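The closure and the resulting satisfiability check can be sketched in Python under one natural reading of Definition 6 (an assumption on our part: we saturate equalities by transitivity and propagate state atoms and disequalities across equalities; the encoding and names are ours):

```python
# Sketch of cl(pi) for pure formulas: eqs/neqs are sets of frozenset pairs,
# states is a set of (variable, state) pairs.
def closure(eqs, neqs, states):
    eqs, neqs, states = set(eqs), set(neqs), set(states)
    changed = True
    while changed:
        changed = False
        new_eqs = {frozenset({x, w}) for e1 in eqs for e2 in eqs
                   for x in e1 for w in e2 if e1 & e2 and x != w}   # transitivity
        new_neqs = {frozenset((e - d) | (d - e)) for e in eqs for d in neqs
                    if len(e & d) == 1}                             # x=y * y!=z adds x!=z
        new_states = {(x, q) for e in eqs for (y, q) in states
                      for x in e if y in e}                         # x=y * y@q adds x@q
        for new, old in ((new_eqs, eqs), (new_neqs, neqs), (new_states, states)):
            if not new <= old:
                old |= new
                changed = True
    return eqs, neqs, states

def satisfiable(eqs, neqs, states):
    """No contradictory literals in the closure (x=y and x!=y, or x@q and x@q')."""
    eqs, neqs, states = closure(eqs, neqs, states)
    ok_eq = all(len(d) == 2 for d in neqs) and not (eqs & neqs)
    by_var = {}
    for (x, q) in states:
        by_var.setdefault(x, set()).add(q)
    return ok_eq and all(len(qs) == 1 for qs in by_var.values())

E = lambda x, y: frozenset({x, y})
# x = y * y = z * x != z is contradictory:
assert not satisfiable({E("x", "y"), E("y", "z")}, {E("x", "z")}, set())
# x = y * x@q * y@q2 (with q != q2) is contradictory:
assert not satisfiable({E("x", "y")}, set(), {("x", "q"), ("y", "q2")})
assert satisfiable({E("x", "y")}, {E("x", "z")}, {("x", "q")})
```

Since only pairs over the finitely many variables of $\pi$ can ever be added, the loop stabilizes, matching the polynomial stabilization claim above.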

*Base tuples* constitute the abstract domain used by the algorithms for checking satisfiability (point 1 of Definition 5) and boundedness (point 2 of Definition 5), defined as follows:

**Definition 7.** *A* base tuple *is a triple* t = (*C* ,*I* ,π)*, where:*


*A base tuple is called* satisfiable *if and only if* π *is satisfiable and the following hold:*


*We denote by* SatBase *the set of satisfiable base tuples.*

Intuitively, a base tuple is an abstract representation of a configuration, where components (resp. interactions) are represented by variables (resp. tuples of variables). Note that a base tuple $(C^\sharp, I^\sharp, \pi)$ is unsatisfiable if $C^\sharp$ (resp. $I^\sharp$) contains the same variable (resp. tuple of variables, for the same interaction type) twice, hence the use of multisets in the definition of base tuples. It is easy to see that checking the satisfiability of a given base tuple $(C^\sharp, I^\sharp, \pi)$ can be done in time $poly(||C^\sharp|| + \sum_{\tau \in \mathrm{Inter}} ||I^\sharp(\tau)|| + \mathrm{size}(\pi))$.

We define a partial *composition* operation on satisfiable base tuples, as follows:

$$(C^\sharp_1, I^\sharp_1, \pi_1) \otimes (C^\sharp_2, I^\sharp_2, \pi_2) \stackrel{\text{def}}{=} (C^\sharp_1 \cup C^\sharp_2,\ I^\sharp_1 \cup I^\sharp_2,\ \pi_1 * \pi_2)$$

where the union of multisets is lifted to functions $\mathrm{Inter} \to \mathrm{mpow}(\mathrm{V}^+)$ in the usual way. The composition $\otimes$ is undefined if $(C^\sharp_1, I^\sharp_1, \pi_1) \otimes (C^\sharp_2, I^\sharp_2, \pi_2)$ is not satisfiable, e.g., if $C^\sharp_1 \cap C^\sharp_2 \neq \emptyset$, $I^\sharp_1(\tau) \cap I^\sharp_2(\tau) \neq \emptyset$, for some $\tau \in \mathrm{Inter}$, or $\pi_1 * \pi_2$ is not satisfiable. Given a pure formula $\pi$ and a set of variables $X$, the projection $\pi\downarrow_X$ removes from $\pi$ all atoms $\alpha$ such that $\mathrm{fv}(\alpha) \not\subseteq X$. The *projection* of a base tuple $(C^\sharp, I^\sharp, \pi)$ on a variable set $X$ is formally defined below:

$$\begin{array}{l}
(C^\sharp, I^\sharp, \pi)\downarrow_X \stackrel{\text{def}}{=} \big(C^\sharp \cap X,\ \lambda\tau \,.\, \{\langle x_1, \ldots, x_{|\tau|} \rangle \in I^\sharp(\tau) \mid x_1, \ldots, x_{|\tau|} \in X\},\ \mathrm{cl}(\mathrm{dist}(I^\sharp) * \pi)\downarrow_X\big) \\[1ex]
\text{where } \mathrm{dist}(I^\sharp) \stackrel{\text{def}}{=} \mathop{*}\limits_{\tau \in \mathrm{Inter}}\ \mathop{*}\limits_{\langle x_1, \ldots, x_{|\tau|} \rangle \in I^\sharp(\tau)}\ \mathop{*}\limits_{1 \leq i < j \leq |\tau|} x_i \neq x_j
\end{array}$$

The *substitution* operation $(C^\sharp, I^\sharp, \pi)[x_1/y_1, \ldots, x_n/y_n]$ replaces simultaneously each $x_i$ with $y_i$ in $C^\sharp$, $I^\sharp$ and $\pi$, respectively. We lift the composition, projection and substitution operations to sets of satisfiable base tuples, as usual.

Next, we define the base tuple corresponding to a quantifier- and predicate-free formula $\phi = \psi * \pi$, where $\psi$ consists of component and interaction atoms and $\pi$ is pure. Since, moreover, we are interested in those components and interactions that are visible through a given indexed set of parameters $X = \{x_1, \ldots, x_n\}$, for a variable $y$, we denote by $\{\!\{y\}\!\}^X_\pi$ the parameter $x_i$ with the least index, such that $y \approx_\pi x_i$, or $y$ itself, if no such parameter exists. We define the following set of base tuples:

$$\begin{array}{l}
\mathrm{Base}(\phi, X) \stackrel{\text{def}}{=} \begin{cases} \{(C^\sharp, I^\sharp, \pi)\}, & \text{if } (C^\sharp, I^\sharp, \pi) \text{ is satisfiable} \\ \emptyset, & \text{otherwise} \end{cases} \\[2ex]
\text{where } C^\sharp \stackrel{\text{def}}{=} \{\{\!\{x\}\!\}^X_\pi \mid [x] \text{ occurs in } \psi\} \\
\phantom{\text{where }} I^\sharp \stackrel{\text{def}}{=} \lambda \langle p_1, \ldots, p_s \rangle \,.\, \{\langle \{\!\{y_1\}\!\}^X_\pi, \ldots, \{\!\{y_s\}\!\}^X_\pi \rangle \mid \langle y_1.p_1, \ldots, y_s.p_s \rangle \text{ occurs in } \psi\}
\end{array}$$

We consider a tuple of variables $\vec{\mathcal{X}}$, having a variable $\mathcal{X}(\mathsf{A})$ ranging over $\mathrm{pow}(\mathrm{SatBase})$, for each predicate $\mathsf{A}$ that occurs in $\Delta$. With these definitions, each rule of $\Delta$:

$$\mathsf{A}(x_1, \ldots, x_{\#\mathsf{A}}) \leftarrow \exists y_1 \ldots \exists y_m \,.\, \phi * \mathsf{B}_1(z^1_1, \ldots, z^1_{\#\mathsf{B}_1}) * \ldots * \mathsf{B}_h(z^h_1, \ldots, z^h_{\#\mathsf{B}_h})$$

where φ is a quantifier- and predicate-free formula, induces the constraint:

$$\mathcal{X}(\mathsf{A}) \supseteq \Big(\mathrm{Base}(\phi, \{x_1, \ldots, x_{\#\mathsf{A}}\}) \otimes \bigotimes_{\ell=1}^{h} \mathcal{X}(\mathsf{B}_\ell)[x_1/z^\ell_1, \ldots, x_{\#\mathsf{B}_\ell}/z^\ell_{\#\mathsf{B}_\ell}]\Big)\downarrow_{x_1, \ldots, x_{\#\mathsf{A}}} \tag{1}$$

*(Pseudocode of Fig. 2: each variable* $\mathcal{X}(\mathsf{A})$ *is initialized with* $\mathrm{Base}(\phi, \{x_1, \ldots, x_{\#\mathsf{A}}\})$ *for the rules of* $\mathsf{A}$ *with quantifier- and predicate-free bodies, after which the constraints (1) are applied repeatedly until no* $\mathcal{X}(\mathsf{A})$ *changes.)*

**Fig. 2.** Algorithm for the Computation of the Least Solution

Consider the set of constraints of the form (1) corresponding to the rules of $\Delta$, and let $\mu\vec{\mathcal{X}}.\Delta$ be the tuple of least solutions of this constraint system, indexed by the predicates that occur in $\Delta$, such that $\mu\vec{\mathcal{X}}.\Delta(\mathsf{A})$ denotes the entry of $\mu\vec{\mathcal{X}}.\Delta$ corresponding to $\mathsf{A}$. Since composition and projection are monotonic operations, such a least solution exists and is unique. Since $\mathrm{SatBase}$ is finite, the least solution can be attained in a finite number of steps, using a Kleene iteration (see Fig. 2).
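The Kleene iteration can be rendered generically as follows (a Python sketch under our own abstraction; `kleene`, the integer domain and the cap `CAP` are ours, standing in for $\mathrm{SatBase}$ and the operations $\otimes$ and $\downarrow$):

```python
# Generic Kleene iteration for a monotone constraint system over finite
# powerset domains: each predicate A owns a set X(A), and every constraint
# joins the value of a monotone function of the current solution into X(A).
def kleene(predicates, constraints):
    """constraints: list of (head, f), f mapping the current solution
    (dict predicate -> frozenset) to a set joined into X(head)."""
    X = {A: frozenset() for A in predicates}
    changed = True
    while changed:
        changed = False
        for (A, f) in constraints:
            new = X[A] | frozenset(f(X))
            if new != X[A]:
                X[A], changed = new, True
    return X

# Toy instance: abstract values are integers, "composition" is +1, capped
# so that the domain stays finite (as SatBase is finite in the paper).
CAP = 3
rules = [
    ("B", lambda X: {0}),                                # base rule of B
    ("A", lambda X: {min(v + 1, CAP) for v in X["B"]}),  # A composes over B
    ("B", lambda X: {min(v + 1, CAP) for v in X["A"]}),  # B composes over A
]
lfp = kleene(["A", "B"], rules)
assert lfp["B"] == frozenset({0, 2, 3}) and lfp["A"] == frozenset({1, 3})
```

Monotonicity guarantees that the iteration only grows the sets, and finiteness of the domain guarantees termination, mirroring the argument in the text.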

We state below the main result leading to an elementary recursive algorithm for the satisfiability problem (Theorem 1). The intuition is that, if $\mu\vec{\mathcal{X}}.\Delta(\mathsf{A})$ is not empty, then it contains only satisfiable base tuples, from which a model of $\mathsf{A}(x_1, \ldots, x_{\#\mathsf{A}})$ can be built.

**Lemma 1.** $\mathrm{Sat}[\Delta, \mathsf{A}]$ *has a positive answer if and only if* $\mu\vec{\mathcal{X}}.\Delta(\mathsf{A}) \neq \emptyset$*.*

If the maximal arity of the predicates occurring in $\Delta$ is bounded by a constant $k$, no satisfiable base tuple $(C^\sharp, I^\sharp, \pi)$ can have a tuple $\langle y_1, \ldots, y_{|\tau|} \rangle \in I^\sharp(\tau)$, for some $\tau \in \mathrm{Inter}$, such that $|\tau| > k$, since all the variables $y_1, \ldots, y_{|\tau|}$ are parameters denoting distinct components (point 3 of Definition 7). Hence, the upper bound on the size of a satisfiable base tuple is constant, in both the $k < \infty$, $\ell < \infty$ and $k < \infty$, $\ell = \infty$ cases, which are, moreover, indistinguishable complexity-wise (i.e., both are NP-complete). In contrast, in the cases $k = \infty$, $\ell < \infty$ and $k = \infty$, $\ell = \infty$, the upper bound on the size of satisfiable base tuples is polynomial and simply exponential in $\mathrm{size}(\Delta)$, incurring a complexity gap of one and two exponentials, respectively. The theorem below states the main result of this section:

**Theorem 1.** $\mathrm{Sat}^{(k,\infty)}[\Delta, \mathsf{A}]$ *is* NP*-complete for* $k \geq 4$*,* $\mathrm{Sat}^{(\infty,\ell)}[\Delta, \mathsf{A}]$ *is* EXP*-complete and* $\mathrm{Sat}[\Delta, \mathsf{A}]$ *is in* 2EXP*.*

The upper bounds are consequences of the fact that the size of a satisfiable base tuple is bounded by a simple exponential in $\min(\mathrm{arity}(\Delta), \mathrm{intersize}(\Delta))$, hence the number of such tuples is doubly exponential in $\min(\mathrm{arity}(\Delta), \mathrm{intersize}(\Delta))$. The lower bounds are obtained by a polynomial reduction from the satisfiability problem for SL [9].

*Example 2.* The doubly-exponential upper bound for the algorithm computing the least solution of a system of constraints of the form (1) is necessary, in general, as illustrated by the following worst-case example. Let $n$ be a fixed parameter and consider the predicates $A_1, \ldots, A_n$, each of arity $n$, defined by the following SID:

$$\begin{array}{ll}
A_i(x_1, \ldots, x_n) \leftarrow \mathop{*}\limits_{j=0}^{n-i} A_{i+1}(x_1, \ldots, x_{i-1}, [x_i, \ldots, x_n]^j), & \text{for all } i \in [1, n-1] \\[1ex]
A_n(x_1, \ldots, x_n) \leftarrow \langle x_1.p, \ldots, x_n.p \rangle \qquad A_n(x_1, \ldots, x_n) \leftarrow \mathsf{emp} &
\end{array}$$

where, for a list of variables $x_i, \ldots, x_n$ and an integer $j \geq 0$, we write $[x_i, \ldots, x_n]^j$ for the list rotated to the left $j$ times (e.g., $[x_1, x_2, x_3, x_4, x_5]^2 = x_3, x_4, x_5, x_1, x_2$). In this example, starting with $A_1(x_1, \ldots, x_n)$, one eventually obtains predicate atoms $A_n(x_{i_1}, \ldots, x_{i_n})$, for every permutation $x_{i_1}, \ldots, x_{i_n}$ of $x_1, \ldots, x_n$. Since $A_n$ may or may not create an interaction with each such permutation of variables, the total number of base tuples generated for $A_1$ is $2^{n!}$. That is, the fixpoint iteration generates $2^{2^{O(n \log n)}}$ base tuples, whereas the size of the input of $\mathrm{Sat}[\Delta, \mathsf{A}]$ is $poly(n)$.
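The rotation-based blow-up can be simulated directly (our own Python sketch; `rotate_suffix` and `reachable` are illustrative names, not from the paper):

```python
# Sketch of Example 2's blow-up: rule A_i rotates the suffix [x_i, ..., x_n]
# by some j in [0, n-i], and chaining the choices for A_1, ..., A_{n-1}
# reaches every permutation of (x_1, ..., x_n), i.e. n! atoms A_n.
from math import factorial

def rotate_suffix(xs, i, j):
    """[x_1..x_{i-1}] ++ [x_i..x_n] rotated left j times (1-based i)."""
    head, tail = xs[: i - 1], xs[i - 1:]
    return head + tail[j:] + tail[:j]

assert rotate_suffix(("x1", "x2", "x3", "x4", "x5"), 1, 2) == \
    ("x3", "x4", "x5", "x1", "x2")

def reachable(n):
    tuples = {tuple(f"x{k}" for k in range(1, n + 1))}
    for i in range(1, n):                   # apply A_1, ..., A_{n-1} in turn
        tuples = {rotate_suffix(t, i, j)
                  for t in tuples for j in range(n - i + 1)}
    return tuples

assert len(reachable(4)) == factorial(4)    # all 24 orderings of x1..x4
```

Choosing the rotation at level $i$ fixes which variable lands in position $i$, so the $n \cdot (n-1) \cdots 2$ combined choices enumerate the $n!$ permutations without repetition.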

## **4 Degree Boundedness**

The boundedness problem (Definition 5, point 2) asks for the existence of a bound on the degree (Definition 3) of the models of a sentence ∃*x*<sup>1</sup> ...∃*x*#<sup>A</sup> . A(*x*1,...,*x*#A). Intuitively, the Bnd[Δ,A] problem has a negative answer if and only if there are increasingly large unfoldings (i.e., expansions of a formula by replacement of a predicate atom with one of its definitions) of A(*x*1,...,*x*#A) repeating a rule that contains an interaction atom involving a parameter of the rule, which is always bound to the same component. We formalize the notion of unfolding below:

**Definition 8.** *Given a predicate* $\mathsf{A}$ *and a sequence* $(\mathsf{r}_1, i_1), \ldots, (\mathsf{r}_n, i_n) \in (\Delta \times \mathbb{N})^+$*, where* $\mathsf{r}_1 : \mathsf{A}(x_1, \ldots, x_{\#\mathsf{A}}) \leftarrow \phi \in \Delta$*, the* unfolding $\mathsf{A}(x_1, \ldots, x_{\#\mathsf{A}}) \xRightarrow{(\mathsf{r}_1, i_1) \ldots (\mathsf{r}_n, i_n)}_\Delta \psi$ *is inductively defined as: (1)* $\psi = \phi$ *if* $n = 1$*, and (2)* $\psi$ *is obtained from* $\phi$ *by replacing its* $i_1$*-th predicate atom* $\mathsf{B}(y_1, \ldots, y_{\#\mathsf{B}})$ *with* $\psi_1[x_1/y_1, \ldots, x_{\#\mathsf{B}}/y_{\#\mathsf{B}}]$*, where* $\mathsf{B}(x_1, \ldots, x_{\#\mathsf{B}}) \xRightarrow{(\mathsf{r}_2, i_2) \ldots (\mathsf{r}_n, i_n)}_\Delta \psi_1$ *is an unfolding, if* $n > 1$*.*

We show that the Bnd[Δ,A] problem can be reduced to the existence of increasingly large unfoldings or, equivalently, a cycle in a finite directed graph, built by a variant of the least fixpoint iteration algorithm used to solve the satisfiability problem (Fig. 3).

**Definition 9.** *Given satisfiable base tuples* $\mathsf{t}, \mathsf{u} \in \mathrm{SatBase}$ *and a rule from* $\Delta$*:*

$$\mathsf{r} : \mathsf{A}(x_1, \ldots, x_{\#\mathsf{A}}) \leftarrow \exists y_1 \ldots \exists y_m \,.\, \phi * \mathsf{B}_1(z^1_1, \ldots, z^1_{\#\mathsf{B}_1}) * \ldots * \mathsf{B}_h(z^h_1, \ldots, z^h_{\#\mathsf{B}_h})$$

*where* $\phi$ *is a quantifier- and predicate-free formula, we write* $(\mathsf{A}, \mathsf{t}) \overset{(\mathsf{r}, i)}{\rightsquigarrow} (\mathsf{B}, \mathsf{u})$ *if and only if* $\mathsf{B} = \mathsf{B}_i$ *and there exist satisfiable base tuples* $\mathsf{t}_1, \ldots, \mathsf{t}_h \in \mathrm{SatBase}$*, with* $\mathsf{u} = \mathsf{t}_i$*, such that* $\mathsf{t} \in \big(\mathrm{Base}(\phi, \{x_1, \ldots, x_{\#\mathsf{A}}\}) \otimes \bigotimes_{\ell=1}^{h} \mathsf{t}_\ell[x_1/z^\ell_1, \ldots, x_{\#\mathsf{B}_\ell}/z^\ell_{\#\mathsf{B}_\ell}]\big)\downarrow_{x_1, \ldots, x_{\#\mathsf{A}}}$*. We define the directed graph with edges labeled by pairs* $(\mathsf{r}, i) \in \Delta \times \mathbb{N}$*:*

$$\mathcal{G}(\Delta) \stackrel{\text{def}}{=} \big(\mathrm{def}(\Delta) \times \mathrm{SatBase},\ \{\langle (\mathsf{A}, \mathsf{t}), (\mathsf{r}, i), (\mathsf{B}, \mathsf{u}) \rangle \mid (\mathsf{A}, \mathsf{t}) \overset{(\mathsf{r}, i)}{\rightsquigarrow} (\mathsf{B}, \mathsf{u})\}\big)$$

The graph $\mathcal{G}(\Delta)$ is built by the algorithm in Fig. 3, a slight variation of the classical Kleene iteration algorithm for the computation of the least solution of the constraints of the form (1). A path $(\mathsf{A}_1, \mathsf{t}_1) \overset{(\mathsf{r}_1, i_1)}{\rightsquigarrow} (\mathsf{A}_2, \mathsf{t}_2) \overset{(\mathsf{r}_2, i_2)}{\rightsquigarrow} \ldots \overset{(\mathsf{r}_n, i_n)}{\rightsquigarrow} (\mathsf{A}_{n+1}, \mathsf{t}_{n+1})$ in $\mathcal{G}(\Delta)$ induces a unique

*(Pseudocode of Fig. 3: as in Fig. 2, the vertices* $(\mathsf{A}, \mathsf{t})$ *are initialized from the rules with quantifier- and predicate-free bodies using* $\mathrm{Base}$*, and the vertex set* $V$ *and the labeled edge relation* $E$ *are saturated together with the least solution, until neither* $V$ *nor* $E$ *changes.)*

**Fig. 3.** Algorithm for the Construction of *G*(Δ)

unfolding $\mathsf{A}_1(x_1, \ldots, x_{\#\mathsf{A}_1}) \xRightarrow{(\mathsf{r}_1, i_1) \ldots (\mathsf{r}_n, i_n)}_\Delta \phi$ (Definition 8). Since the vertices of $\mathcal{G}(\Delta)$ are pairs $(\mathsf{A}, \mathsf{t})$, where $\mathsf{t}$ is a satisfiable base tuple, and the edges of $\mathcal{G}(\Delta)$ reflect the construction of the base tuples from the least solution of the constraints (1), the outcome $\phi$ of this unfolding is always a satisfiable formula.

An *elementary cycle* of $\mathcal{G}(\Delta)$ is a path from some vertex $(\mathsf{B}, \mathsf{u})$ back to itself, such that $(\mathsf{B}, \mathsf{u})$ does not occur on the path, except at its endpoints. The cycle is, moreover, *reachable* from $(\mathsf{A}, \mathsf{t})$ if and only if there exists a path $(\mathsf{A}, \mathsf{t}) \overset{(\mathsf{r}_1, i_1)}{\rightsquigarrow} \ldots \overset{(\mathsf{r}_n, i_n)}{\rightsquigarrow} (\mathsf{B}, \mathsf{u})$ in $\mathcal{G}(\Delta)$. We reduce the complement of the $\mathrm{Bnd}[\Delta, \mathsf{A}]$ problem, namely the existence of an infinite set of models of $\exists x_1 \ldots \exists x_{\#\mathsf{A}} \,.\, \mathsf{A}(x_1, \ldots, x_{\#\mathsf{A}})$ of unbounded degree, to the existence of a reachable elementary cycle in $\mathcal{G}(\Delta')$, where $\Delta'$ is obtained from $\Delta$ as described in the following.
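The graph-theoretic check behind this reduction can be sketched as follows (a Python sketch under our own graph encoding, not the paper's code: it decides whether a cycle through a marked edge, standing for a rule of the kind singled out below, is reachable from a given start vertex):

```python
# Decide whether some cycle through a marked edge is reachable from `start`.
# edges: dict vertex -> set of successors; marked: set of (u, v) edge pairs.
def has_reachable_marked_cycle(edges, marked, start):
    # Vertices reachable from start, by a simple DFS:
    reach, stack = {start}, [start]
    while stack:
        for w in edges.get(stack.pop(), set()):
            if w not in reach:
                reach.add(w)
                stack.append(w)

    def reaches(src, dst):
        seen, stack = {src}, [src]
        while stack:
            v = stack.pop()
            if v == dst:
                return True
            for w in edges.get(v, set()):
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        return False

    # A marked edge (u, v) lies on a cycle iff u is reachable back from v;
    # such a cycle can always be shortened to an elementary one.
    return any(u in reach and reaches(v, u) for (u, v) in marked)

g = {"A": {"B"}, "B": {"C"}, "C": {"B", "D"}}
assert has_reachable_marked_cycle(g, {("B", "C")}, "A")      # B -> C -> B
assert not has_reachable_marked_cycle(g, {("C", "D")}, "A")  # no cycle via D
```

Since the vertex set of the graph is finite, this check runs in time polynomial in the number of vertices and edges.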

First, we consider, for each predicate $\mathsf{B} \in \mathrm{def}(\Delta)$, i.e., the set of predicates for which there exists a rule in $\Delta$, a predicate $\mathsf{B}'$ of arity $\#\mathsf{B} + 1$, not in $\mathrm{def}(\Delta)$. Second, for each rule $\mathsf{B}_0(x_1, \ldots, x_{\#\mathsf{B}_0}) \leftarrow \exists y_1 \ldots \exists y_m \,.\, \phi * \mathop{*}_{\ell=2}^{h} \mathsf{B}_\ell(z^\ell_1, \ldots, z^\ell_{\#\mathsf{B}_\ell}) \in \Delta$, where $\phi$ is a quantifier- and predicate-free formula and $\mathrm{iv}(\phi) \subseteq \mathrm{fv}(\phi)$ denotes the subset of variables occurring in interaction atoms in $\phi$, the SID $\Delta'$ has the following rules:

$$\mathsf{B}'_0(x_1, \ldots, x_{\#\mathsf{B}_0}, x_{\#\mathsf{B}_0+1}) \leftarrow \exists y_1 \ldots \exists y_m \,.\, \phi * \mathop{*}\limits_{\xi \in \mathrm{iv}(\phi)} x_{\#\mathsf{B}_0+1} \neq \xi * \mathop{*}\limits_{\ell=2}^{h} \mathsf{B}'_\ell(z^\ell_1, \ldots, z^\ell_{\#\mathsf{B}_\ell}, x_{\#\mathsf{B}_0+1}) \tag{2}$$

$$\mathsf{B}'_0(x_1, \ldots, x_{\#\mathsf{B}_0}, x_{\#\mathsf{B}_0+1}) \leftarrow \exists y_1 \ldots \exists y_m \,.\, \phi * x_{\#\mathsf{B}_0+1} = \xi * \mathop{*}\limits_{\ell=2}^{h} \mathsf{B}'_\ell(z^\ell_1, \ldots, z^\ell_{\#\mathsf{B}_\ell}, x_{\#\mathsf{B}_0+1}) \tag{3}$$

for each variable $\xi \in \mathrm{iv}(\phi)$ that occurs in an interaction atom in $\phi$.

There exists a family of models (with respect to $\Delta$) of $\exists x_1 \ldots \exists x_{\#\mathsf{A}} \,.\, \mathsf{A}(x_1, \ldots, x_{\#\mathsf{A}})$ of unbounded degree if and only if these are models of $\exists x_1 \ldots \exists x_{\#\mathsf{A}+1} \,.\, \mathsf{A}'(x_1, \ldots, x_{\#\mathsf{A}+1})$ (with respect to $\Delta'$) and the last parameter of each predicate $\mathsf{B}' \in \mathrm{def}(\Delta')$ can be mapped, in each of these models, to a component that occurs in unboundedly many interactions. The latter condition is equivalent to the existence of an elementary cycle, containing a rule of the form (3), that is, moreover, reachable from some vertex $(\mathsf{A}', \mathsf{t})$ of $\mathcal{G}(\Delta')$, for some $\mathsf{t} \in \mathrm{SatBase}$. This reduction is formalized below:

**Lemma 2.** *There exists an infinite sequence of configurations* $\gamma_1, \gamma_2, \ldots$ *such that* $\gamma_i \models_\Delta \exists x_1 \ldots \exists x_{\#\mathsf{A}} \,.\, \mathsf{A}(x_1,\ldots,x_{\#\mathsf{A}})$ *and* $\delta(\gamma_i) < \delta(\gamma_{i+1})$*, for all* $i \ge 1$*, if and only if* $G(\Delta')$ *has an elementary cycle containing a rule of the form (3), reachable from a node* $(\mathsf{A}',\mathsf{t})$*, for some* $\mathsf{t} \in \mathsf{SatBase}$*.*

The complexity result below uses a similar argument on the maximal size of (hence the number of) base tuples as in Theorem 1, leading to similar complexity gaps:

**Theorem 2.** $\mathrm{Bnd}^{(k,\infty)}[\Delta,\mathsf{A}]$ *is in* co-NP*,* $\mathrm{Bnd}^{(\infty,\ell)}[\Delta,\mathsf{A}]$ *is in* EXP*, and* $\mathrm{Bnd}[\Delta,\mathsf{A}]$ *is in* 2EXP*.*

Moreover, the construction of $G(\Delta')$ allows us to prove the following cut-off result:

**Proposition 1.** *Let* $\gamma$ *be a configuration and* $\nu$ *a store such that* $\gamma \models^\nu_\Delta \mathsf{A}(x_1,\ldots,x_{\#\mathsf{A}})$*. If* $\mathrm{Bnd}^{(k,\ell)}[\Delta,\mathsf{A}]$ *holds, then (1)* $\delta(\gamma) = \mathit{poly}(\mathrm{size}(\Delta))$ *if* $k < \infty$, $\ell = \infty$*, (2)* $\delta(\gamma) = 2^{\mathit{poly}(\mathrm{size}(\Delta))}$ *if* $k = \infty$, $\ell < \infty$*, and (3)* $\delta(\gamma) = 2^{2^{\mathit{poly}(\mathrm{size}(\Delta))}}$ *if* $k = \infty$, $\ell = \infty$*.*

# **5 Entailment**

This section is concerned with the entailment problem $\mathrm{Entl}[\Delta,\mathsf{A},\mathsf{B}]$, which asks whether $\gamma \models^\nu_\Delta \exists x_{\#\mathsf{A}+1} \ldots \exists x_{\#\mathsf{B}} \,.\, \mathsf{B}(x_1,\ldots,x_{\#\mathsf{B}})$ holds for every configuration $\gamma$ and store $\nu$ such that $\gamma \models^\nu_\Delta \mathsf{A}(x_1,\ldots,x_{\#\mathsf{A}})$. For instance, the proof from Fig. 1(c) relies on the following entailments, which occur as side conditions of the Hoare-logic rule of consequence:

$$\begin{array}{l} \mathsf{ring}_{h,t}(y) \models_{\Delta} \exists x \exists z \,.\, [y]@\mathsf{H} \ast \langle y.out, z.in\rangle \ast \mathsf{chain}_{h-1,t}(z,x) \ast \langle x.out, y.in\rangle \\ {[z]}@\mathsf{H} \ast \langle z.out, x.in\rangle \ast \mathsf{chain}_{h-1,t}(x,y) \ast \langle y.out, z.in\rangle \models_{\Delta} \mathsf{ring}_{h,t}(z) \end{array}$$

By introducing two fresh predicates A<sup>1</sup> and A2, defined by the rules:

$$\mathsf{A}_1(x_1) \leftarrow \exists y \exists z \,.\, [x_1]@\mathsf{H} \ast \langle x_1.out, z.in\rangle \ast \mathsf{chain}_{h-1,t}(z,y) \ast \langle y.out, x_1.in\rangle \tag{4}$$

$$\mathsf{A}_2(x_1,x_2) \leftarrow \exists z \,.\, [x_1]@\mathsf{H} \ast \langle x_1.out, z.in\rangle \ast \mathsf{chain}_{h-1,t}(z,x_2) \ast \langle x_2.out, x_1.in\rangle \tag{5}$$

the above entailments are equivalent to $\mathrm{Entl}[\Delta,\mathsf{ring}_{h,t},\mathsf{A}_1]$ and $\mathrm{Entl}[\Delta,\mathsf{A}_2,\mathsf{ring}_{h,t}]$, respectively, where $\Delta$ consists of rules (4) and (5), together with the rules defining the $\mathsf{ring}_{h,t}$ and $\mathsf{chain}_{h,t}$ predicates (Sect. 1.1).

We show that the entailment problem is undecidable in general (Theorem 3), and recover a decidable fragment by means of three syntactic conditions that are typically met in our examples. These conditions use the following notion of *profile*:

**Definition 10.** *The* profile *of a SID* $\Delta$ *is the pointwise greatest function* $\lambda_\Delta : \mathbb{A} \to \mathit{pow}(\mathbb{N})$*, mapping each predicate* $\mathsf{A}$ *to a subset of* $[1,\#\mathsf{A}]$*, such that, for each rule* $\mathsf{A}(x_1,\ldots,x_{\#\mathsf{A}}) \leftarrow \varphi$ *of* $\Delta$*, each atom* $\mathsf{B}(y_1,\ldots,y_{\#\mathsf{B}})$ *of* $\varphi$ *and each* $i \in \lambda_\Delta(\mathsf{B})$*, there exists* $j \in \lambda_\Delta(\mathsf{A})$ *such that* $x_j$ *and* $y_i$ *are the same variable.*

The profile identifies those parameters of a predicate that are always replaced by one of the variables $x_1,\ldots,x_{\#\mathsf{A}}$ in each unfolding of $\mathsf{A}(x_1,\ldots,x_{\#\mathsf{A}})$, according to the rules in $\Delta$; it is computed by a greatest-fixpoint iteration, in time $\mathit{poly}(\mathrm{size}(\Delta))$.
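Since the rules are only described abstractly here, the greatest-fixpoint computation of $\lambda_\Delta$ can be sketched as follows; the concrete rule representation (head predicate, tuple of head parameters, list of body atoms) is an assumption of this sketch, not a data structure from the paper.

```python
# Sketch of the greatest-fixpoint computation of the profile (Definition 10).
# A rule is a triple (head, params, body), where body is a list of atoms
# (pred, args); variables are plain strings.
def profile(rules, arity):
    # Start from the greatest candidate: every parameter position of every predicate.
    prof = {a: set(range(1, arity[a] + 1)) for a in arity}
    changed = True
    while changed:
        changed = False
        for head, params, body in rules:
            for pred, args in body:
                for i in set(prof[pred]):
                    # Position i survives only if the i-th argument of the body atom
                    # equals some head parameter x_j at a surviving position j.
                    if not any(params[j - 1] == args[i - 1] for j in prof[head]):
                        prof[pred].discard(i)
                        changed = True
    return prof
```

For a chain-like SID whose recursive rule passes only the second parameter on to the recursive atom, the iteration drops position 1 and keeps position 2, as expected.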

**Definition 11.** *A rule* $\mathsf{A}(x_1,\ldots,x_{\#\mathsf{A}}) \leftarrow \exists y_1 \ldots \exists y_m \,.\, \varphi \ast \mathop{\ast}_{\ell=1}^{h} \mathsf{B}_\ell(z^\ell_1,\ldots,z^\ell_{\#\mathsf{B}_\ell})$*, where* $\varphi$ *is a quantifier- and predicate-free formula, is said to be:*


*A SID* Δ *is* progressing*,* connected *and* e-restricted *if and only if each rule in* Δ *is* progressing*,* connected *and* e-restricted*, respectively.*

For example, the SID consisting of the rules from Sect. 1.1, together with rules (4) and (5) is progressing, connected and e-restricted.

We recall that $\mathrm{def}_\Delta(\mathsf{A})$ is the set of rules from $\Delta$ that define $\mathsf{A}$, and we denote by $\mathrm{def}^*_\Delta(\mathsf{A})$ the least superset of $\mathrm{def}_\Delta(\mathsf{A})$ containing the rules that define a predicate occurring in a rule from $\mathrm{def}^*_\Delta(\mathsf{A})$. The following result shows that the entailment problem becomes undecidable as soon as the connectivity condition is even slightly lifted:

**Theorem 3.** $\mathrm{Entl}[\Delta,\mathsf{A},\mathsf{B}]$ *is undecidable, even when* $\Delta$ *is progressing and e-restricted, and only the rules in* $\mathrm{def}^*_\Delta(\mathsf{A})$ *are connected (the rules in* $\mathrm{def}^*_\Delta(\mathsf{B})$ *may be disconnected).*

On the positive side, we prove that $\mathrm{Entl}[\Delta,\mathsf{A},\mathsf{B}]$ is decidable if $\Delta$ is progressing, connected and e-restricted, assuming further that $\mathrm{Bnd}[\Delta,\mathsf{A}]$ has a positive answer. In this case, the bound on the degree of the models of $\mathsf{A}(x_1,\ldots,x_{\#\mathsf{A}})$ is effectively computable, using the algorithm from Fig. 3 (see Proposition 1 for a cut-off result); we denote this bound by $\mathfrak{B}$ throughout this section.

The proof uses a reduction of $\mathrm{Entl}[\Delta,\mathsf{A},\mathsf{B}]$ to a similar problem for SL, shown to be decidable [18]. We recall the definition of SL, interpreted over heaps $\mathsf{h} : \mathbb{C} \rightharpoonup_{\mathit{fin}} \mathbb{C}^{\mathrm{K}}$, introduced in Sect. 2.3. SL rules are denoted $\mathsf{A}(x_1,\ldots,x_{\#\mathsf{A}}) \leftarrow \varphi$, where $\varphi$ is an SL formula such that $\mathrm{fv}(\varphi) \subseteq \{x_1,\ldots,x_{\#\mathsf{A}}\}$, and SL SIDs are denoted $\overline{\Delta}$. The profile $\lambda_{\overline{\Delta}}$ is defined for SL in the same way as for CL (Definition 10).

**Definition 12.** *A SL rule* A(*x*1,...,*x*#(A)) ← φ *from a SID* Δ *is said to be:*


Note that the definitions of progressing and connected rules are different for SL, compared to CL (Definition 11); in the rest of this section, we rely on the context to distinguish progressing (connected) SL rules from progressing (connected) CL rules. Moreover, e-restricted rules are defined in the same way for CL and SL (point 3 of Definition 11). A tight upper bound on the complexity of the entailment problem between SL formulæ, interpreted by progressing, connected and e-restricted SIDs, is given below:

**Theorem 4 (**[18]**).** *The SL entailment problem is in* $2^{2^{\mathit{poly}(\mathrm{width}(\overline{\Delta}) \cdot \log \mathrm{size}(\overline{\Delta}))}}$*, for progressing, connected and e-restricted SIDs.*

The reduction of $\mathrm{Entl}[\Delta,\mathsf{A},\mathsf{B}]$ to SL entailments is based on the idea of viewing a configuration as a logical structure (hypergraph), represented by an undirected *Gaifman graph*, in which every tuple from a relation (hyperedge) becomes a clique [24]. In a similar vein, we encode a configuration of degree at most $\mathfrak{B}$ by a heap of degree $\mathrm{K}$ (Definition 13), where $\mathrm{K}$ is defined using the following integer function:

$$\mathrm{pos}(i,j,k) \stackrel{\text{def}}{=} 1 + \mathfrak{B} \cdot \sum_{\ell=1}^{j-1} |\tau_\ell| + i \cdot |\tau_j| + k$$

where $\mathsf{Inter} \stackrel{\text{def}}{=} \{\tau_1,\ldots,\tau_M\}$ is the set of interaction types and $Q = \{q_1,\ldots,q_N\}$ is the set of states of the behavior $\mathcal{B} = (P, Q, \rightarrow)$ (Sect. 2). Here $i \in [0,\mathfrak{B}-1]$ denotes an interaction of type $j \in [1,M]$ and $k \in [0,N-1]$ denotes a state. We use $M$ and $N$ throughout the rest of this section to denote the number of interaction types and states, respectively.
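As a sanity check on the indexing, $\mathrm{pos}$ can be transcribed directly; the list `tau_sizes` of interaction-type arities and the degree bound `B` are assumed inputs of this sketch, and the sentinel case $j = M+1$ (used below for $\mathrm{K} = \mathrm{pos}(0, M+1, N)$, where $|\tau_{M+1}|$ does not exist but is multiplied by $i = 0$) is handled explicitly.

```python
# Direct transcription of pos(i, j, k): entry 1 of each heap image is the
# presence flag, followed by B blocks per interaction type tau_j, then the
# state entries. tau_sizes[j-1] is |tau_j|; B is the degree bound.
def pos(i, j, k, tau_sizes, B):
    block = tau_sizes[j - 1] if j - 1 < len(tau_sizes) else 0  # |tau_j|; 0 for the sentinel j = M+1
    return 1 + B * sum(tau_sizes[:j - 1]) + i * block + k
```

With a single binary interaction type (`tau_sizes = [2]`), bound `B = 2` and `N = 2` states, the tuple length is `pos(0, 3, 2, [2], 2) == 7`: one presence flag, four interaction entries, two state entries.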

For a set $I$ of interactions, let $\mathrm{Tuples}^j_I(c) \stackrel{\text{def}}{=} \{\langle c_1,\ldots,c_n\rangle \mid (c_1,p_1,\ldots,c_n,p_n) \in I,\ \tau_j = \langle p_1,\ldots,p_n\rangle,\ c \in \{c_1,\ldots,c_n\}\}$ be the set of tuples of components from interactions of type $\tau_j$ in $I$ that contain a given component $c$.
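A direct reading of $\mathrm{Tuples}^j_I(c)$, under the assumption (for this sketch only) that an interaction is stored as an alternating tuple `(c1, p1, ..., cn, pn)` of components and ports:

```python
# Tuples^j_I(c): the component tuples of interactions of type tau_j in I
# that involve a given component c.
def tuples(I, tau, c):
    result = set()
    for inter in I:
        comps, ports = inter[0::2], inter[1::2]  # split alternating components/ports
        if tuple(ports) == tuple(tau) and c in comps:
            result.add(tuple(comps))
    return result
```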

**Definition 13.** *Given a configuration* $\gamma = (C, I, \rho)$*, such that* $\delta(\gamma) \le \mathfrak{B}$*, a* Gaifman heap *for* $\gamma$ *is a heap* $\mathsf{h} : \mathbb{C} \rightharpoonup_{\mathit{fin}} \mathbb{C}^{\mathrm{K}}$*, where* $\mathrm{K} \stackrel{\text{def}}{=} \mathrm{pos}(0, M+1, N)$*,* $\mathrm{dom}(\mathsf{h}) = \mathrm{nodes}(\gamma)$ *and, for all* $c_0 \in \mathrm{dom}(\mathsf{h})$*, such that* $\mathsf{h}(c_0) = \langle c_1,\ldots,c_{\mathrm{K}}\rangle$*, the following hold:*


*We denote by* G(γ) *the set of Gaifman heaps for* γ*.*

Intuitively, if $\mathsf{h}$ is a Gaifman heap for $\gamma$ and $c_0 \in \mathrm{dom}(\mathsf{h})$, then the first entry of $\mathsf{h}(c_0)$ indicates whether $c_0$ is present (condition 1 of Definition 13), the next $\mathfrak{B} \cdot \sum_{j=1}^{M} |\tau_j|$ entries encode the interactions of each type $\tau_j$ (condition 2 of Definition 13), and the last $N$ entries represent the state of the component (condition 3 of Definition 13). Note that the encoding of configurations by Gaifman heaps is not unique: two Gaifman heaps for the same configuration may differ in the order of the tuples within the encoding of an interaction type and in the choice of the unconstrained entries of $\mathsf{h}(c_0)$, for each $c_0 \in \mathrm{dom}(\mathsf{h})$. On the other hand, if two configurations have the same Gaifman heap encoding, they are the same configuration.
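One concrete way to build the tuple $\mathsf{h}(c_0)$ following this intuition (presence flag, then $\mathfrak{B}$ slots per interaction type, then one entry per state) could look as follows; the component and state names, and the use of `None` for unconstrained entries, are illustrative assumptions of this sketch, not the paper's encoding.

```python
# Toy Gaifman-heap image of a component: presence flag, B interaction slots
# per type, then one entry per state (set to the component itself iff the
# component is in that state).
NULL = None  # stand-in for an unconstrained entry

def encode(c0, present, inters_by_type, state, tau_sizes, B, states):
    h = [c0 if present else NULL]                 # entry 1: presence flag
    for j, size in enumerate(tau_sizes):          # one block per interaction type
        tuples_j = inters_by_type[j][:B]          # at most B interactions (degree bound)
        for i in range(B):
            if i < len(tuples_j):
                h.extend(tuples_j[i])             # tuple of the i-th interaction of this type
            else:
                h.extend([NULL] * size)           # unconstrained slots
    for q in states:                              # state entries
        h.append(c0 if state == q else NULL)
    return h
```

For a component `"c1"` in state `"H"` with one $\langle out, in\rangle$ interaction `("c1", "c2")`, bound `B = 2` and states `["H", "T"]`, this yields a tuple of length $\mathrm{K} = 7$, matching the index function above.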

*Example 3.* Figure 4(b) shows a Gaifman heap for the configuration in Fig. 4(a), where each component belongs to at most 2 interactions of type $\langle out, in\rangle$.

We build an SL SID $\overline{\Delta}$ that generates the Gaifman heaps of the models of the predicate atoms occurring in a progressing CL SID $\Delta$. The construction associates with each variable $x$ that occurs free or bound in a rule of $\Delta$ a unique $\mathrm{K}$-tuple of variables $\eta(x) \in \mathbb{V}^{\mathrm{K}}$,

**Fig. 4.** Gaifman Heap for a Chain Configuration

that represents the image of the store value $\nu(x)$ in a Gaifman heap $\mathsf{h}$, i.e., $\mathsf{h}(\nu(x)) = \nu(\eta(x))$. Moreover, we consider, for each predicate symbol $\mathsf{A} \in \mathrm{def}(\Delta)$, an annotated predicate symbol $\overline{\mathsf{A}}_\iota$ of arity $\#\overline{\mathsf{A}}_\iota = (\mathrm{K}+1) \cdot \#\mathsf{A}$, where $\iota : [1,\#\mathsf{A}] \times [1,M] \to 2^{[0,\mathfrak{B}-1]}$ is a map associating with each parameter $i \in [1,\#\mathsf{A}]$ and each interaction type $\tau_j$, for $j \in [1,M]$, a set of integers $\iota(i,j)$ denoting the positions of the encodings of the interactions of type $\tau_j$ involving the value of $x_i$, in the models of $\overline{\mathsf{A}}_\iota(x_1,\ldots,x_{\#\mathsf{A}},\eta(x_1),\ldots,\eta(x_{\#\mathsf{A}}))$ (point 2 of Definition 13). Then $\overline{\Delta}$ contains rules of the form:

$$\begin{aligned} \overline{\mathsf{A}}_\iota(x_1,\ldots,x_{\#\mathsf{A}},\eta(x_1),\ldots,\eta(x_{\#\mathsf{A}})) \leftarrow{} & \exists y_1 \ldots \exists y_m \exists \eta(y_1) \ldots \exists \eta(y_m) \,.\, \overline{\psi} \ast \pi \ast{}\\ & \mathop{\ast}_{\ell=1}^{h} \overline{\mathsf{B}^\ell}_{\iota^\ell}(z^\ell_1,\ldots,z^\ell_{\#\mathsf{B}^\ell},\eta(z^\ell_1),\ldots,\eta(z^\ell_{\#\mathsf{B}^\ell})) \end{aligned} \tag{6}$$

for which $\Delta$ has a *stem rule* $\mathsf{A}(x_1,\ldots,x_{\#\mathsf{A}}) \leftarrow \exists y_1 \ldots \exists y_m \,.\, \psi \ast \pi \ast \mathop{\ast}_{\ell=1}^{h} \mathsf{B}_\ell(z^\ell_1,\ldots,z^\ell_{\#\mathsf{B}_\ell})$, where $\psi \ast \pi$ is a quantifier- and predicate-free formula and $\pi$ is the conjunction of equalities and disequalities from $\psi \ast \pi$. However, not all rules of the form (6) are considered in $\overline{\Delta}$, but only those meeting the following condition:

**Definition 14.** *A rule of the form (6) is* well-formed *if and only if, for each* $i \in [1,\#\mathsf{A}]$ *and each* $j \in [1,M]$*, there exists a set of integers* $Y_{i,j} \subseteq [0,\mathfrak{B}-1]$*, such that:*


We denote by $\overline{\Delta}$ the set of well-formed rules of the form (6) such that, moreover:

$$\begin{array}{l} \overline{\psi} \stackrel{\text{def}}{=} x_1 \mapsto \eta(x_1) \ \ast \mathop{\ast}_{x \in \mathrm{fv}(\psi)} \mathrm{CompStates}_\psi(x) \ \ast \mathop{\ast}_{i=1}^{\#\mathsf{A}} \mathrm{InterAtoms}_\psi(x_i), \text{ where:}\\ \mathrm{CompStates}_\psi(x) \stackrel{\text{def}}{=} \mathop{\ast}_{[x] \text{ occurs in } \psi} \langle\eta(x)\rangle_1 = x \ \ast \mathop{\ast}_{x@q \text{ occurs in } \psi} \langle\eta(x)\rangle_{\mathrm{state}(q)} = x\\ \mathrm{InterAtoms}_\psi(x_i) \stackrel{\text{def}}{=} \mathop{\ast}_{j=1}^{M} \mathop{\ast}_{p=1}^{r_j} \langle\eta(x_i)\rangle_{\mathrm{inter}(j,k^j_p)} = x^j_p, \text{ where } \{k^j_1,\ldots,k^j_{r_j}\} \stackrel{\text{def}}{=} \iota(i,j) \end{array}$$

Here, for two tuples of variables $\mathbf{x} = \langle x_1,\ldots,x_k\rangle$ and $\mathbf{y} = \langle y_1,\ldots,y_k\rangle$, we denote by $\mathbf{x} = \mathbf{y}$ the formula $\mathop{\ast}_{i=1}^{k} x_i = y_i$. Intuitively, the SL formula $\mathrm{CompStates}_\psi(x)$ realizes the encoding of the component and state atoms from $\psi$, in the sense of points (1) and (3) of Definition 13, whereas the formula $\mathrm{InterAtoms}_\psi(x_i)$ realizes the encoding of the interactions involving a parameter $x_i$ of the stem rule (point 2 of Definition 13). In particular, the definition of $\mathrm{InterAtoms}_\psi(x_i)$ uses the fact that the rule is well-formed.

We state below the main result of this section on the complexity of the entailment problem. The upper bounds follow from a many-one reduction of $\mathrm{Entl}[\Delta,\mathsf{A},\mathsf{B}]$ to the SL entailment $\overline{\mathsf{A}}_\iota(x_1,\ldots,x_{\#\mathsf{A}},\eta(x_1),\ldots,\eta(x_{\#\mathsf{A}})) \models_{\overline{\Delta}} \exists x_{\#\mathsf{A}+1} \ldots \exists x_{\#\mathsf{B}} \exists \eta(x_{\#\mathsf{A}+1}) \ldots \exists \eta(x_{\#\mathsf{B}}) \,.\, \overline{\mathsf{B}}_\iota(x_1,\ldots,x_{\#\mathsf{B}},\eta(x_1),\ldots,\eta(x_{\#\mathsf{B}}))$, in combination with the upper bound provided by Theorem 4 for SL entailments. If $k < \infty$, the complexity is tight for CL, whereas gaps occur for $k = \infty$, $\ell < \infty$ and $k = \infty$, $\ell = \infty$, due to the cut-off on the degree bound (Proposition 1), which impacts the size of $\overline{\Delta}$ and the time needed to generate it from $\Delta$.

**Theorem 5.** *If* $\Delta$ *is progressing, connected and e-restricted and, moreover,* $\mathrm{Bnd}[\Delta,\mathsf{A}]$ *has a positive answer, then* $\mathrm{Entl}^{(k,\ell)}[\Delta,\mathsf{A},\mathsf{B}]$ *is in* 2EXP*,* $\mathrm{Entl}^{(\infty,\ell)}[\Delta,\mathsf{A},\mathsf{B}]$ *is in* 3EXP *and* 2EXP*-hard, and* $\mathrm{Entl}[\Delta,\mathsf{A},\mathsf{B}]$ *is in* 4EXP *and* 2EXP*-hard.*

# **6 Conclusions and Future Work**

We study the satisfiability and entailment problems for a logic used to write proofs of correctness of dynamically reconfigurable distributed systems. The logic views the components and interactions of the network as resources and also reasons about the local states of the components. We reuse existing techniques for Separation Logic [39], showing that our configuration logic is more expressive than SL, a fact confirmed by a number of complexity gaps. Closing these gaps and finding tight complexity classes in the more general cases is left for future work. In particular, we aim at lifting the boundedness assumption on the degree of the configurations that must be considered to check the validity of entailments.

# **References**



# **Proving Non-Termination and Lower Runtime Bounds with LoAT (System Description)**

Florian Frohn and Jürgen Giesl

LuFG Informatik 2, RWTH Aachen University, Aachen, Germany florian.frohn@cs.rwth-aachen.de, giesl@informatik.rwth-aachen.de

**Abstract.** We present the *Loop Acceleration Tool* (LoAT), a powerful tool for proving non-termination and worst-case lower bounds for programs operating on integers. It is based on the novel calculus from [10,11] for *loop acceleration*, i.e., transforming loops into non-deterministic straight-line code, and for finding non-terminating configurations. To implement it efficiently, LoAT uses a new approach based on unsat cores. We evaluate LoAT's power and performance by extensive experiments.

# **1 Introduction**

Efficiency is one of the most important properties of software. Consequently, *automated complexity analysis* is of high interest to the software verification community. Most research in this area has focused on deducing *upper* bounds on the worst-case complexity of programs. In contrast, the *Loop Acceleration Tool* LoAT aims to find performance bugs by deducing *lower* bounds on the worst-case complexity of programs operating on integers. Since non-termination implies the lower bound ∞, LoAT is also equipped with non-termination techniques.

LoAT is based on *loop acceleration* [4,5,9–11,15], which replaces loops by non-deterministic code: The resulting program chooses a value n, representing the number of loop iterations in the original program. To be sound, suitable constraints on n are synthesized to ensure that the original loop allows for at least n iterations. Moreover, the transformed program updates the program variables to the same values as n iterations of the original loop, but it does so in a single step. To achieve that, the loop body is transformed into a *closed form*, which is parameterized in <sup>n</sup>. In this way, LoAT is able to compute *symbolic under-approximations* of programs, i.e., every execution path in the resulting transformed program corresponds to a path in the original program, but not necessarily vice versa. In contrast to many other techniques for computing underapproximations, the symbolic approximations of LoAT cover *infinitely many runs* of *arbitrary length*.

Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) - 235950644 (Project GI 274/6-2).

© The Author(s) 2022

J. Blanchette et al. (Eds.): IJCAR 2022, LNAI 13385, pp. 712–722, 2022. https://doi.org/10.1007/978-3-031-10769-6\_41

*Contributions:* The main new feature of the novel version of LoAT presented in this paper is the *integration* of the *loop acceleration calculus* from [10,11], which combines different loop acceleration techniques in a modular way, into LoAT's framework. This enables LoAT to use the loop acceleration calculus for the analysis of full integer programs, whereas the standalone implementation of the calculus from [10,11] was only applicable to single loops without branching in the body. To control the application of the calculus, we use a new technique based on unsat cores (see Sect. 5). The new version of LoAT is evaluated in extensive experiments. See [14] for all proofs.

## **2 Preliminaries**

Let $\mathcal{L} \supseteq \{\mathit{main}\}$ be a finite set of *locations*, where $\mathit{main}$ is the *canonical start location* (i.e., the entry point of the program), and let $\vec{x} := [x_1,\ldots,x_d]$ be the vector of *program variables*. Furthermore, let $\mathcal{TV}$ be a countably infinite set of *temporary variables*, which are used to model non-determinism, and let $\sup \mathbb{Z} := \infty$. We call an arithmetic expression $e$ an *integer expression* if it evaluates to an integer when all variables in $e$ are instantiated by integers. LoAT analyzes tail-recursive programs operating on integers, represented as *integer transition systems* (ITSs), i.e., sets of *transitions* $f(\vec{x}) \xrightarrow{p} g(\vec{a})\ [\varphi]$, where $f, g \in \mathcal{L}$, the *update* $\vec{a}$ is a vector of $d$ integer expressions over $\mathcal{TV} \cup \vec{x}$, the *cost* $p$ is either an arithmetic expression over $\mathcal{TV} \cup \vec{x}$ or $\infty$, and the *guard* $\varphi$ is a conjunction of inequations over integer expressions with variables from $\mathcal{TV} \cup \vec{x}$.<sup>1</sup> For example, consider the loop on the left and the corresponding transition $t_{loop}$ on the right.

$$\text{while } x > 0 \text{ do } x \gets x - 1 \newline \qquad f(x) \xrightarrow{1} f(x - 1) \left[ x > 0 \right] \newline \qquad (t\_{loop}) \newline$$

Here, the cost 1 instructs LoAT to use the number of loop iterations as cost measure. LoAT allows for arbitrary *user defined* cost measures, since the user can choose any polynomials over the program variables as costs. LoAT synthesizes transitions with cost ∞ to represent non-terminating runs, i.e., such transitions are not allowed in the input.

A *configuration* is of the form $f(\vec{c})$ with $f \in \mathcal{L}$ and $\vec{c} \in \mathbb{Z}^d$. For any entity $s \notin \mathcal{L}$ and any arithmetic expressions $\vec{b} = [b_1,\ldots,b_d]$, let $s(\vec{b})$ denote the result of replacing each variable $x_i$ in $s$ by $b_i$, for all $1 \le i \le d$. Moreover, $\mathcal{V}\mathit{ars}(s)$ denotes the program variables and $\mathcal{TV}(s)$ the temporary variables occurring in $s$. For an integer transition system $\mathcal{T}$, a configuration $f(\vec{c})$ *evaluates to* $g(\vec{c}\,')$ *with cost* $k \in \mathbb{Z} \cup \{\infty\}$, written $f(\vec{c}) \xrightarrow{k}_{\mathcal{T}} g(\vec{c}\,')$, if there exist a transition $f(\vec{x}) \xrightarrow{p} g(\vec{a})\ [\varphi] \in \mathcal{T}$ and an instantiation of its temporary variables with integers such that the following holds:

$$
\varphi(\vec{c}) \wedge \vec{c}' = \vec{a}(\vec{c}) \wedge k = p(\vec{c}).
$$

<sup>1</sup> LoAT can also analyze the complexity of certain non-tail-recursive programs, see [9]. For simplicity, we restrict ourselves to tail-recursive programs in the current paper.

As usual, we write $f(\vec{c}) \xrightarrow{k}{}^*_{\mathcal{T}}\ g(\vec{c}\,')$ if $f(\vec{c})$ evaluates to $g(\vec{c}\,')$ in arbitrarily many steps whose costs sum to $k$. We omit the costs if they are irrelevant. The *derivation height* of $f(\vec{c})$ is

$$dh\_T(f(\vec{c})) := \sup \{ k \mid \exists g(\vec{c}').f(\vec{c}) \stackrel{k}{\to}\_T^\* g(\vec{c}') \},$$

and the *runtime complexity* of <sup>T</sup> is

$$rc_{\mathcal{T}}(n) := \sup \{ dh_{\mathcal{T}}(\mathit{main}(c_1, \dots, c_d)) \mid |c_1| + \dots + |c_d| \le n \}.$$

$\mathcal{T}$ *terminates* if no configuration $\mathit{main}(\vec{c})$ admits an infinite $\to_{\mathcal{T}}$-sequence, and $\mathcal{T}$ is *finitary* if no configuration $\mathit{main}(\vec{c})$ admits a $\to_{\mathcal{T}}$-sequence with cost $\infty$. Otherwise, $\vec{c}$ is a *witness of non-termination* or a *witness of infinitism*, respectively. Note that termination implies finitism for ITSs where no transition has cost $\infty$. However, our approach may transform non-terminating ITSs into terminating but infinitary ITSs, as it replaces non-terminating loops by transitions with cost $\infty$.
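For terminating, deterministic examples, $dh_{\mathcal{T}}$ can be computed by simply following the unique maximal run; representing a transition as a tuple of source location, guard, update, cost, and target location (with guard, update and cost as Python closures) is an assumption of this sketch, not LoAT's representation.

```python
# Tiny interpreter for deterministic integer transitions: sum the costs along
# the unique maximal run from configuration (loc, cfg). Only safe for
# terminating examples (it would diverge on a non-terminating run).
def dh(transitions, loc, cfg):
    total = 0
    while True:
        for (src, guard, update, cost, dst) in transitions:
            if src == loc and guard(cfg):
                total += cost(cfg)
                cfg, loc = update(cfg), dst
                break
        else:                      # no transition enabled: the run is maximal
            return total

# The single transition t_loop : f(x) -1-> f(x - 1) [x > 0].
t_loop = [("f", lambda c: c[0] > 0, lambda c: (c[0] - 1,), lambda c: 1, "f")]
```

For `t_loop`, the derivation height of `f(c)` is `max(c, 0)`, so its runtime complexity is linear.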

## **3 Overview of LoAT**

The goal of LoAT is to compute a lower bound on $rc_{\mathcal{T}}$ or even prove non-termination of $\mathcal{T}$. To this end, it repeatedly applies program simplifications, so-called *processors*. When applying them with a suitable strategy (see [8,9]), one eventually obtains *simplified transitions* of the form $\mathit{main}(\vec{x}) \xrightarrow{p} f(\vec{a})\ [\varphi]$ with $f \neq \mathit{main}$. As LoAT's processors are *sound for lower bounds* (i.e., if they transform $\mathcal{T}$ into $\mathcal{T}'$, then $dh_{\mathcal{T}} \ge dh_{\mathcal{T}'}$), such a simplified transition gives rise to the lower bound $\mathbb{I}_\varphi \cdot p$ on $dh_{\mathcal{T}}(\mathit{main}(\vec{x}))$ (where $\mathbb{I}_\varphi$ denotes the indicator function of $\varphi$, which is 1 for values where $\varphi$ holds and 0 otherwise). This bound can be lifted to $rc_{\mathcal{T}}$ by solving a so-called *limit problem*, see [9].

LoAT's processors are also *sound for non-termination*, as they preserve finitism. So if p = ∞, then it suffices to prove satisfiability of ϕ to prove infinitism, which implies non-termination of the original ITS, where transitions with cost ∞ are forbidden (see Sect. 2). LoAT's most important processors are:

**Loop Acceleration** (Sect. 4) transforms a *simple loop*, i.e., a single transition f(<sup>x</sup>) <sup>p</sup> −→ f(a) [ϕ], into a non-deterministic transition that can simulate several loop iterations in one step. For example, loop acceleration transforms t*loop* to

$$f(x) \xrightarrow{n} f(x-n) \left[ x \ge n \land n > 0 \right],\tag{t\_{loop^n}}$$

where n ∈TV, i.e., the value of n can be chosen non-deterministically.
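The soundness of this step can be checked empirically on small values: whenever the accelerated guard $x \ge n \wedge n > 0$ holds, $n$ iterations of the original loop are possible and produce the same result and cost.

```python
# Sanity check (not a proof): for values satisfying the accelerated guard
# x >= n and n > 0, the accelerated transition f(x) -n-> f(x - n) agrees
# with n iterations of the original loop f(x) -1-> f(x - 1) [x > 0].
def iterate(x, n):
    cost = 0
    for _ in range(n):
        assert x > 0                # the original guard must hold at every step
        x, cost = x - 1, cost + 1
    return x, cost

for x in range(0, 20):
    for n in range(1, x + 1):       # accelerated guard: x >= n, n > 0
        assert iterate(x, n) == (x - n, n)
```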

**Instantiation** [9, Theorem 3.12] replaces temporary variables by integer expressions. For example, it could instantiate n with x in t*loop*<sup>n</sup> , resulting in

$$f(x) \xrightarrow{x} f(0) \left[x > 0\right]. \tag{t\_{loop^x}}$$

**Chaining** [9, Theorem 3.18] combines two subsequent transitions into one transition. For example, chaining combines the transitions

$$\mathit{main}(x) \xrightarrow{1} f(x) \quad\text{and}\quad t_{loop^x} \quad\text{to}\quad \mathit{main}(x) \xrightarrow{x+1} f(0)\ [x > 0].$$
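Chaining can be sketched as composing guards, updates, and costs; representing a transition as a triple of closures over a single integer variable is an assumption of this sketch, not LoAT's internal representation.

```python
# Chain two transitions (guard, update, cost): the chained guard requires the
# first guard and, after the first update, the second guard; updates compose;
# costs add up (the second cost is evaluated after the first update).
def chain(t1, t2):
    (g1, u1, c1), (g2, u2, c2) = t1, t2
    return (lambda x: g1(x) and g2(u1(x)),
            lambda x: u2(u1(x)),
            lambda x: c1(x) + c2(u1(x)))

t_main = (lambda x: True, lambda x: x, lambda x: 1)      # main(x) -1-> f(x)
t_loop_x = (lambda x: x > 0, lambda x: 0, lambda x: x)   # f(x) -x-> f(0) [x > 0]
g, u, c = chain(t_main, t_loop_x)                        # main(x) -(x+1)-> f(0) [x > 0]
```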

**Nonterm** (Sect. 6) searches for witnesses of non-termination, characterized by a formula ψ. So it turns, e.g.,

$$\begin{aligned} f(x_1, x_2) &\xrightarrow{1} f(x_1 - x_2,\, x_2)\ [x_1 > 0] \qquad (t_{nonterm})\\ \text{into}\quad f(x_1, x_2) &\xrightarrow{\infty} \mathit{sink}(x_1, x_2)\ [x_1 > 0 \land x_2 \le 0] \end{aligned}$$

(where *sink* ∈ L is fresh), as each <sup>c</sup> <sup>∈</sup> <sup>Z</sup><sup>2</sup> with <sup>c</sup><sup>1</sup> <sup>&</sup>gt; <sup>0</sup> <sup>∧</sup> <sup>c</sup><sup>2</sup> <sup>≤</sup> 0 witnesses non-termination of t*nonterm*, i.e., here ψ is x<sup>1</sup> > 0 ∧ x<sup>2</sup> ≤ 0.
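The witness formula ψ can be checked empirically: for $t_{nonterm}$, any instantiation with $x_1 > 0$ and $x_2 \le 0$ keeps the guard enabled forever, since the update $x_1 - x_2 \ge x_1$ can only increase $x_1$ (this finite simulation is an illustration, not a proof).

```python
# Empirical check of the non-termination witnesses of t_nonterm:
# f(x1, x2) -1-> f(x1 - x2, x2) [x1 > 0]. If x1 > 0 and x2 <= 0, then
# x1 - x2 >= x1 > 0, so the guard stays enabled.
def stays_enabled(x1, x2, steps=1000):
    for _ in range(steps):
        if not x1 > 0:          # guard violated: the run stops
            return False
        x1 = x1 - x2            # update
    return True

assert all(stays_enabled(x1, x2) for x1 in range(1, 10) for x2 in range(-5, 1))
```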

Intuitively, LoAT uses **Chaining** to transform non-simple loops into simple loops. **Instantiation** resolves non-determinism heuristically and thus reduces the number of temporary variables, which is crucial for scalability. In addition to these processors, LoAT removes transitions after processing them, as explained in [9]. See [8,9] for heuristics and a suitable strategy to apply LoAT's processors.

# **4 Modular Loop Acceleration**

For **Loop Acceleration**, LoAT uses *conditional acceleration techniques* [10]. Given two formulas $\xi$ and $\check{\varphi}$, and a loop with update $\vec{a}$, a conditional acceleration technique yields a formula $\mathit{accel}(\xi, \check{\varphi}, \vec{a})$ which implies that $\xi$ holds throughout $n$ loop iterations (i.e., $\xi$ is an $n$*-invariant*), provided that $\check{\varphi}$ is an $n$-invariant, too. In the following, let $\vec{a}^{\,0}(\vec{x}) := \vec{x}$ and $\vec{a}^{\,m+1}(\vec{x}) := \vec{a}(\vec{a}^{\,m}(\vec{x})) = \vec{a}[\vec{x}/\vec{a}^{\,m}(\vec{x})]$.

**Definition 1 (Conditional Acceleration Technique).** *A function* $\mathit{accel}$ *is a* conditional acceleration technique *if the following implication holds for all formulas* $\xi$ *and* $\check{\varphi}$ *with variables from* $\mathcal{TV} \cup \vec{x}$*, all updates* $\vec{a}$*, all* $n > 0$*, and all instantiations of the variables with integers:*

$$\models \left( \mathit{accel}(\xi, \check{\varphi}, \vec{a}) \land \forall i \in [0, n). \; \check{\varphi}(\vec{a}^{\,i}(\vec{x})) \right) \implies \forall i \in [0, n). \; \xi(\vec{a}^{\,i}(\vec{x})).$$

The prerequisite $\forall i \in [0, n). \; \check{\varphi}(\vec{a}^{\,i}(\vec{x}))$ is ensured by previous acceleration steps, i.e., $\check{\varphi}$ is initially $\top$ (*true*), and it is refined by conjoining a part $\xi$ of the loop guard in each acceleration step. When formalizing acceleration techniques, we only specify the result of $\mathit{accel}$ for certain arguments $\xi$, $\check{\varphi}$, and $\vec{a}$, and assume $\mathit{accel}(\xi, \check{\varphi}, \vec{a}) = \bot$ (*false*) otherwise.

**Definition 2 (LoAT's Conditional Acceleration Techniques** [10,11]**).**

$$\begin{array}{ll} \textbf{Increase} & \mathit{accel}_{inc}(\xi, \check{\varphi}, \vec{a}) := \xi \quad \text{if } \models \xi \land \check{\varphi} \implies \xi(\vec{a})\\ \textbf{Decrease} & \mathit{accel}_{dec}(\xi, \check{\varphi}, \vec{a}) := \xi(\vec{a}^{\,n-1}(\vec{x})) \quad \text{if } \models \xi(\vec{a}) \land \check{\varphi} \implies \xi\\ \textbf{Eventual Decrease} & \mathit{accel}_{ev\text{-}dec}(t > 0, \check{\varphi}, \vec{a}) := t > 0 \land t(\vec{a}^{\,n-1}(\vec{x})) > 0\\ & \quad \text{if } \models (t \ge t(\vec{a}) \land \check{\varphi}) \implies t(\vec{a}) \ge t(\vec{a}^{\,2}(\vec{x}))\\ \textbf{Eventual Increase} & \mathit{accel}_{ev\text{-}inc}(t > 0, \check{\varphi}, \vec{a}) := t > 0 \land t \le t(\vec{a})\\ & \quad \text{if } \models (t \le t(\vec{a}) \land \check{\varphi}) \implies t(\vec{a}) \le t(\vec{a}^{\,2}(\vec{x}))\\ \textbf{Fixpoint} & \mathit{accel}_{fp}(t > 0, \check{\varphi}, \vec{a}) := t > 0 \land \bigwedge_{x \in \mathit{closure}_{\vec{a}}(t)} x = x(\vec{a}),\\ & \quad \text{where } \mathit{closure}_{\vec{a}}(t) := \bigcup_{i \in \mathbb{N}} \mathcal{V}\mathit{ars}(t(\vec{a}^{\,i}(\vec{x}))) \end{array}$$

The above five techniques are taken from [10,11], where only deterministic loops are considered (i.e., there are no temporary variables). Lifting them to non-deterministic loops in a way that allows for *exact* conditional acceleration techniques (which capture all possible program runs) is non-trivial and beyond the scope of this paper. Thus, we sacrifice exactness and treat temporary variables like additional constant program variables whose update is the identity, resulting in a sound under-approximation (that captures a subset of all possible runs).

So essentially, **Increase** and **Decrease** handle inequations t > 0 in the loop guard where t increases or decreases (weakly) monotonically when applying the loop's update. The canonical examples where **Increase** or **Decrease** applies are

$$f(x, \ldots) \to f(x+1, \ldots) \left[ x > 0 \land \ldots \right] \quad \text{or} \quad f(x, \ldots) \to f(x-1, \ldots) \left[ x > 0 \land \ldots \right],$$

respectively. **Eventual Decrease** applies if $t$ never increases again once it starts to decrease. The canonical example is $f(x, y, \ldots) \to f(x + y, y - 1, \ldots)\ [x > 0 \land \ldots]$. Similarly, **Eventual Increase** applies if $t$ never decreases again once it starts to increase. **Fixpoint** can be used for inequations $t > 0$ that do not behave (eventually) monotonically. It should only be used if $\mathit{accel}_{fp}(t > 0, \check{\varphi}, \vec{a})$ is satisfiable.
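The side condition of Eventual Decrease can be sanity-checked on the canonical example by brute force over a finite sample of values (an illustration only; LoAT discharges such conditions symbolically via SMT, not by enumeration).

```python
# Brute-force check (over a finite sample, not a proof) of the Eventual
# Decrease side condition for f(x, y) -> f(x + y, y - 1) [x > 0], with
# t = x and a trivial check-formula: whenever t >= t(a), also t(a) >= t(a^2).
a = lambda x, y: (x + y, y - 1)   # the loop's update
t = lambda x, y: x                # the term from the guard inequation t > 0

def ev_dec_holds(x, y):
    x1, y1 = a(x, y)              # one application of the update
    x2, y2 = a(x1, y1)            # two applications
    return (not t(x, y) >= t(x1, y1)) or t(x1, y1) >= t(x2, y2)

assert all(ev_dec_holds(x, y) for x in range(-10, 11) for y in range(-10, 11))
```

Here the premise $x \ge x + y$ means $y \le 0$, which indeed implies $x + y \ge x + 2y - 1$, so once $t$ decreases it keeps decreasing.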

LoAT uses the *acceleration calculus* of [10]. It operates on *acceleration problems* $[\![\psi \mid \check{\varphi} \mid \hat{\varphi}]\!]_{\vec{a}}$, where $\psi$ (which is initially $\top$) is repeatedly refined. When the calculus stops, $\psi$ is used as the guard of the resulting accelerated transition. The formulas $\check{\varphi}$ and $\hat{\varphi}$ are the parts of the loop guard that have already or have not yet been handled, respectively. So $\check{\varphi}$ is initially $\top$, and $\hat{\varphi}$ and $\vec{a}$ are initialized with the guard $\varphi$ and the update of the loop $f(\vec{x}) \xrightarrow{p} f(\vec{a})\ [\varphi]$ under consideration, i.e., the initial acceleration problem is $[\![\top \mid \top \mid \varphi]\!]_{\vec{a}}$. Once $\hat{\varphi}$ is $\top$, the loop is accelerated to $f(\vec{x}) \xrightarrow{q} f(\vec{a}^{\,n}(\vec{x}))\ [\psi \land n > 0]$, where the cost $q$ and a closed form for $\vec{a}^{\,n}(\vec{x})$ are computed by the recurrence solver PURRS [2].

**Definition 3 (Acceleration Calculus for Conjunctive Loops).** *The relation* $\leadsto$ *on acceleration problems is defined as*

$$\frac{\mathit{accel}(\xi, \check{\varphi}, \vec{a}) = \psi_2}{[\![\psi_1 \mid \check{\varphi} \mid \xi \land \hat{\varphi}]\!]_{\vec{a}} \leadsto [\![\psi_1 \land \psi_2 \mid \check{\varphi} \land \xi \mid \hat{\varphi}]\!]_{\vec{a}}} \quad \text{if } \mathit{accel} \text{ is a conditional acceleration technique.}$$

So to accelerate a loop, one picks a not yet handled part ξ of the guard in each step. When accelerating $f(\vec{x}) \to f(\vec{a})\ [\xi]$ using a conditional acceleration technique *accel*, one may assume $\forall i \in [0, n).\ \check{\varphi}(\vec{a}^i(\vec{x}))$. The result of *accel* is conjoined to the result $\psi\_1$ computed so far, and ξ is moved from the third to the second component of the problem, i.e., to the already handled part of the guard.

*Example 4 (Acceleration Calculus).* We show how to accelerate the loop

$$\begin{aligned} f(x,y) &\xrightarrow{x} f(x-y,y) \left[ x > 0 \land y \ge 0 \right] \qquad \text{to} \\ f(x,y) &\xrightarrow{(x+\frac{y}{2})\cdot n - \frac{y}{2}\cdot n^2} f(x-n\cdot y,y) \left[ y \ge 0 \land x - (n-1)\cdot y > 0 \land n > 0 \right] \end{aligned}$$

The closed form $\vec{a}^n(\vec{x}) = (x - n \cdot y,\ y)$ can be computed via recurrence solving. Similarly, the cost $(x + \frac{y}{2}) \cdot n - \frac{y}{2} \cdot n^2$ of $n$ loop iterations is obtained by solving the following recurrence relation (where $c^{(n)}$ and $x^{(n)}$ denote the cost and the value of x after n applications of the transition, respectively).

$$c^{(n)} = c^{(n-1)} + x^{(n-1)} = c^{(n-1)} + x - (n-1) \cdot y \qquad \text{and} \qquad c^{(1)} = x.$$
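The closed form for the cost can be cross-checked against the recurrence numerically. The following standalone Python sketch (not part of LoAT, which delegates recurrence solving to PURRS) compares both for small n and sample values of x and y:

```python
from fractions import Fraction

def cost_rec(n, x, y):
    """Cost of n iterations via c(n) = c(n-1) + x - (n-1)*y, c(1) = x."""
    c = Fraction(x)
    for i in range(2, n + 1):
        c += x - (i - 1) * y
    return c

def cost_closed(n, x, y):
    """Closed form (x + y/2)*n - (y/2)*n^2 as computed by recurrence solving."""
    n, x, y = Fraction(n), Fraction(x), Fraction(y)
    return (x + y / 2) * n - (y / 2) * n ** 2

# exact rational arithmetic avoids floating-point noise in the comparison
for n in range(1, 20):
    assert cost_rec(n, 7, 3) == cost_closed(n, 7, 3)
```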

The guard is computed as follows:

$$\begin{aligned} \left\lbrack \top \mid \top \mid x > 0 \land y \ge 0 \right\rbrack\_{\vec{a}} & \leadsto \left\lbrack y \ge 0 \mid y \ge 0 \mid x > 0 \right\rbrack\_{\vec{a}} \\ & \leadsto \left\lbrack y \ge 0 \land x - (n - 1) \cdot y > 0 \mid y \ge 0 \land x > 0 \mid \top \right\rbrack\_{\vec{a}}. \end{aligned}$$

In the 1st step, we have $\xi = (y \ge 0)$ and $\mathit{accel}\_{\mathit{inc}}(y \ge 0, \top, \vec{a}) = (y \ge 0)$. In the 2nd step, we have $\xi = (x > 0)$ and $\mathit{accel}\_{\mathit{dec}}(x > 0, y \ge 0, \vec{a}) = (x - (n-1) \cdot y > 0)$. So the inequation x − (n − 1) · y > 0 ensures n-invariance of x > 0.
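As a quick sanity check (sampling, outside LoAT), one can confirm that the accelerated guard indeed ensures n-invariance of x > 0:

```python
import random

# Sampling check (not a proof): the accelerated guard
# y >= 0 and x - (n-1)*y > 0 from Example 4 ensures n-invariance of
# x > 0, i.e., x - i*y > 0 for every 0 <= i < n.
random.seed(1)
for _ in range(10000):
    x = random.randint(-50, 50)
    y = random.randint(0, 50)       # guard atom y >= 0
    n = random.randint(1, 20)
    if x - (n - 1) * y > 0:
        assert all(x - i * y > 0 for i in range(n))
```

Since y ≥ 0, the value x − i·y is minimized at i = n − 1, which is exactly what the new guard atom constrains.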

## **5 Efficient Loop Acceleration Using Unsat Cores**

Each attempt to apply a conditional acceleration technique other than **Fixpoint** requires proving an implication, which is implemented via SMT solving by proving unsatisfiability of its negation. For **Fixpoint**, satisfiability of $\mathit{accel}\_{\mathit{fp}}(t > 0, \check{\varphi}, \vec{a})$ is checked via SMT. So even though LoAT restricts ξ to atoms, up to $\Theta(m^2)$ attempts to apply a conditional acceleration technique are required to accelerate a loop whose guard contains m inequations using a naive strategy ($5 \cdot m$ attempts for the 1st $\leadsto$-step, $5 \cdot (m-1)$ attempts for the 2nd step, . . . ).

To improve efficiency, LoAT uses a novel encoding that requires just $5 \cdot m$ attempts. For any $\alpha \in \mathrm{AT}\_{\mathit{imp}} = \{\mathit{inc}, \mathit{dec}, \mathit{ev\text{-}dec}, \mathit{ev\text{-}inc}\}$, let $\mathit{encode}\_{\alpha}(\xi, \check{\varphi}, \vec{a})$ be the implication that has to be valid in order to apply $\mathit{accel}\_{\alpha}$, whose premise is of the form $\ldots \wedge \check{\varphi}$. Instead of repeatedly refining $\check{\varphi}$, LoAT tries to prove validity<sup>2</sup> of $\mathit{encode}\_{\alpha,\xi} := \mathit{encode}\_{\alpha}(\xi, \varphi \setminus \{\xi\}, \vec{a})$ for each $\alpha \in \mathrm{AT}\_{\mathit{imp}}$ and each $\xi \in \varphi$, where φ is the (conjunctive) guard of the transition that should be accelerated. Again,

<sup>2</sup> Here and in the following, we identify conjunctions of atoms with sets of atoms.

proving validity of an implication is equivalent to proving unsatisfiability of its negation. So if validity of $\mathit{encode}\_{\alpha,\xi}$ can be shown, then SMT solvers can also provide an *unsat core* for $\neg \mathit{encode}\_{\alpha,\xi}$.

**Definition 5 (Unsat Core).** *Given a conjunction* ψ*, we call each unsatisfiable subset of* ψ *an* unsat core *of* ψ*.*

Theorem 6 shows that when handling an inequation ξ, one only has to require n-invariance for the elements of $\varphi \setminus \{\xi\}$ that occur in an unsat core of $\neg \mathit{encode}\_{\alpha,\xi}$. Thus, an unsat core of $\neg \mathit{encode}\_{\alpha,\xi}$ can be used to determine which prerequisites $\check{\varphi}$ are needed for the inequation ξ. This information can then be used to find a suitable order for handling the inequations of the guard. Thus, in this way one only has to check (un)satisfiability of the $4 \cdot m$ formulas $\neg \mathit{encode}\_{\alpha,\xi}$. If no such order is found, then LoAT either fails to accelerate the loop under consideration, or it resorts to using **Fixpoint**, as discussed below.

**Theorem 6 (Unsat Core Induces $\leadsto$-Step).** *Let* $\mathit{deps}\_{\alpha,\xi}$ *be the intersection of* $\varphi \setminus \{\xi\}$ *and an unsat core of* $\neg \mathit{encode}\_{\alpha,\xi}$*. If* $\check{\varphi}$ *implies* $\mathit{deps}\_{\alpha,\xi}$*, then* $\mathit{accel}\_{\alpha}(\xi, \check{\varphi}, \vec{a}) = \mathit{accel}\_{\alpha}(\xi, \varphi \setminus \{\xi\}, \vec{a})$*.*

*Example 7 (Controlling Acceleration Steps via Unsat Cores).* Reconsider Example 4. Here, LoAT would try to prove, among others, the following implications:

$$encode\_{dec, x>0} \quad = \quad (x-y>0 \land y>0) \implies x>0 \tag{1}$$

$$encode\_{inc, y>0} \quad = \quad (y>0 \land x>0) \implies y>0 \tag{2}$$

To do so, it would try to prove unsatisfiability of $\neg \mathit{encode}\_{\alpha,\xi}$ via SMT. For (1), we get $\neg \mathit{encode}\_{\mathit{dec},x>0} = (x - y > 0 \wedge y > 0 \wedge x \le 0)$, whose only unsat core is $\neg \mathit{encode}\_{\mathit{dec},x>0}$ itself, and its intersection with $\varphi \setminus \{x > 0\} = \{y > 0\}$ is $\{y > 0\}$.

For (2), we get $\neg \mathit{encode}\_{\mathit{inc},y>0} = (y > 0 \wedge x > 0 \wedge y \le 0)$, whose minimal unsat core is $y > 0 \wedge y \le 0$, and its intersection with $\varphi \setminus \{y > 0\} = \{x > 0\}$ is empty. So by Theorem 6, we have $\mathit{accel}\_{\mathit{inc}}(y > 0, \top, \vec{a}) = \mathit{accel}\_{\mathit{inc}}(y > 0, x > 0, \vec{a})$.
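LoAT obtains such cores from an SMT solver. As a toy illustration of Definition 5 (not LoAT's implementation), the following sketch finds a minimal unsat core by brute-force subset enumeration, restricted to univariate bound atoms, and replays the second implication from this example:

```python
from itertools import combinations

# Atoms are univariate bounds (var, op, const) with op in {">", "<="} --
# just enough to replay the core for implication (2).
def satisfiable(atoms):
    lo, hi = {}, {}  # strict lower bounds / weak upper bounds per variable
    for var, op, c in atoms:
        if op == ">":
            lo[var] = max(lo.get(var, c), c)
        else:  # "<="
            hi[var] = min(hi.get(var, c), c)
    # v > lo and v <= hi is satisfiable iff lo < hi
    return all(lo[v] < hi[v] for v in lo if v in hi)

def minimal_unsat_core(atoms):
    """Smallest unsatisfiable subset, found by exhaustive enumeration."""
    for k in range(1, len(atoms) + 1):
        for subset in combinations(atoms, k):
            if not satisfiable(subset):
                return set(subset)
    return None  # the whole conjunction is satisfiable

# neg encode_{inc, y>0} = (y > 0 and x > 0 and y <= 0)
neg_encode = [("y", ">", 0), ("x", ">", 0), ("y", "<=", 0)]
core = minimal_unsat_core(neg_encode)
# the core {y > 0, y <= 0} does not mention x > 0, so accel_inc for
# y > 0 needs no prerequisites
```

A real SMT solver returns (not necessarily minimal) cores directly; the brute force here is only to make the concept concrete.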

In this way, validity of $\mathit{encode}\_{\alpha\_1, x>0}$ and $\mathit{encode}\_{\alpha\_2, y>0}$ is proven for all $\alpha\_1 \in \mathrm{AT}\_{\mathit{imp}} \setminus \{\mathit{inc}\}$ and all $\alpha\_2 \in \mathrm{AT}\_{\mathit{imp}}$. However, the premise $x \le x - y \wedge y > 0$ of $\mathit{encode}\_{\mathit{ev\text{-}inc}, x>0}$ is unsatisfiable and thus a corresponding acceleration step would yield a transition with unsatisfiable guard. To prevent that, LoAT only uses a technique $\alpha \in \mathrm{AT}\_{\mathit{imp}}$ for ξ if the premise of $\mathit{encode}\_{\alpha,\xi}$ is satisfiable.

So for each inequation ξ from φ, LoAT synthesizes up to 4 potential $\leadsto$-steps corresponding to $\mathit{accel}\_{\alpha}(\xi, \mathit{deps}\_{\alpha,\xi}, \vec{a})$, where $\alpha \in \mathrm{AT}\_{\mathit{imp}}$. If validity of $\mathit{encode}\_{\alpha,\xi}$ cannot be shown for any $\alpha \in \mathrm{AT}\_{\mathit{imp}}$, then LoAT tries to prove satisfiability of $\mathit{accel}\_{\mathit{fp}}(\xi, \top, \vec{a})$ to see if **Fixpoint** should be applied. Note that the 2nd argument of $\mathit{accel}\_{\mathit{fp}}$ is irrelevant, i.e., **Fixpoint** does not benefit from previous acceleration steps and thus $\leadsto$-steps that use it do not have any dependencies.

It remains to find a suitably ordered subset S of m $\leadsto$-steps that constitutes a successful $\leadsto$-sequence. In the following, we define $\mathrm{AT} := \mathrm{AT}\_{\mathit{imp}} \cup \{\mathit{fp}\}$ and we extend the definition of $\mathit{deps}\_{\alpha,\xi}$ to the case $\alpha = \mathit{fp}$ by defining $\mathit{deps}\_{\mathit{fp},\xi} := \emptyset$.

**Lemma 8.** *Let* $C \subseteq \mathrm{AT} \times \varphi$ *be the smallest set such that* $(\alpha, \xi) \in C$ *implies*


*Let* $S := \{(\alpha, \xi) \in C \mid \alpha \ge\_{\mathrm{AT}} \alpha' \text{ for all } (\alpha', \xi) \in C\}$ *where* $>\_{\mathrm{AT}}$ *is the total order* $\mathit{inc} >\_{\mathrm{AT}} \mathit{dec} >\_{\mathrm{AT}} \mathit{ev\text{-}dec} >\_{\mathrm{AT}} \mathit{ev\text{-}inc} >\_{\mathrm{AT}} \mathit{fp}$*. We define* $(\alpha', \xi') \prec (\alpha, \xi)$ *if* $\xi' \in \mathit{deps}\_{\alpha,\xi}$*. Then* $\prec$ *is a strict (and hence, well-founded) order on* $S$*.*

The order $>\_{\mathrm{AT}}$ in Lemma 8 corresponds to the order proposed in [10]. Note that the set C can be computed without further (potentially expensive) SMT queries by a straightforward fixpoint iteration, and well-foundedness of ≺ follows from minimality of C. For Example 7, we get

$$\begin{aligned} C &= \{ (dec, x > 0), (ev \cdot dec, x > 0) \} \cup \{ (\alpha, y > 0) \mid \alpha \in AT \} &\text{and} \\ S &= \{ (dec, x > 0), (inc, y > 0) \} \text{ with } (inc, y > 0) \prec (dec, x > 0). \end{aligned}$$
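The fixpoint computation of C and the selection of S can be sketched in a few lines of Python. This is an illustration, not LoAT's code: the candidate steps and their dependency sets are hard-coded from Example 7, and the entry condition "every dependency is already handled by some step in C" is our reading of the lemma's side condition:

```python
# Candidate steps (alpha, xi) whose encode-implication was proven valid,
# mapped to their dependency sets deps_{alpha,xi} (from Example 7).
candidates = {
    ("dec", "x>0"): {"y>0"},
    ("ev-dec", "x>0"): {"y>0"},  # assumed deps, analogous to dec
    ("inc", "y>0"): set(),
    ("dec", "y>0"): set(),
    ("ev-dec", "y>0"): set(),
    ("ev-inc", "y>0"): set(),
    ("fp", "y>0"): set(),
}

# Fixpoint iteration: add a step once all of its dependencies are
# handled by some step already in C (no further SMT queries needed).
C = set()
changed = True
while changed:
    changed = False
    handled = {xi for (_, xi) in C}
    for step, deps in candidates.items():
        if step not in C and deps <= handled:
            C.add(step)
            changed = True

# Keep the best technique per atom w.r.t. inc > dec > ev-dec > ev-inc > fp.
rank = {"inc": 0, "dec": 1, "ev-dec": 2, "ev-inc": 3, "fp": 4}
S = {(a, xi) for (a, xi) in C
     if all(rank[a] <= rank[b] for (b, xi2) in C if xi2 == xi)}

assert S == {("dec", "x>0"), ("inc", "y>0")}
```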

Finally, we can construct a valid $\leadsto$-sequence via the following theorem.

**Theorem 9 (Finding $\leadsto$-Sequences).** *Let* S *be defined as in Lemma 8 and assume that for each* $\xi \in \varphi$*, there is an* $\alpha \in \mathrm{AT}$ *such that* $(\alpha, \xi) \in S$*. W.l.o.g., let* $\varphi = \bigwedge\_{i=1}^{m} \xi\_i$ *where* $(\alpha\_1, \xi\_1) \prec' \ldots \prec' (\alpha\_m, \xi\_m)$ *for some strict total order* $\prec'$ *containing* $\prec$*, and let* $\check{\varphi}\_j := \bigwedge\_{i=1}^{j} \xi\_i$*. Then for all* $j \in [0, m)$*, we have:*

$$
\left\lbrack \bigwedge\_{i=1}^{j} \operatorname{accel}\_{\alpha\_i}(\xi\_i, \check{\varphi}\_{i-1}, \vec{a}) \; \middle| \; \check{\varphi}\_{j} \; \middle| \; \bigwedge\_{i=j+1}^{m} \xi\_i \right\rbrack\_{\vec{a}} \leadsto \left\lbrack \bigwedge\_{i=1}^{j+1} \operatorname{accel}\_{\alpha\_i}(\xi\_i, \check{\varphi}\_{i-1}, \vec{a}) \; \middle| \; \check{\varphi}\_{j+1} \; \middle| \; \bigwedge\_{i=j+2}^{m} \xi\_i \right\rbrack\_{\vec{a}}.
$$

In our example, we have $\prec' = \prec$ as $\prec$ is total. Thus, we obtain a $\leadsto$-sequence by first processing y > 0 with **Increase** and then processing x > 0 with **Decrease**.

# **6 Proving Non-Termination of Simple Loops**

To prove non-termination, LoAT uses a variation of the calculus from Sect. 4, see [11]. To adapt it for proving non-termination, further restrictions have to be imposed on the conditional acceleration techniques, resulting in the notion of *conditional non-termination techniques*, see [11, Def. 10]. We denote a $\leadsto$-step that uses a conditional non-termination technique with $\leadsto\_{\mathit{nt}}$.

**Theorem 10 (Proving Non-Termination via $\leadsto\_{\mathit{nt}}$).** *Let* $f(\vec{x}) \to f(\vec{a})\ [\varphi] \in \mathcal{T}$*. If* $\lbrack \top \mid \top \mid \varphi \rbrack\_{\vec{a}} \leadsto\_{\mathit{nt}}^{\ast} \lbrack \psi \mid \varphi \mid \top \rbrack\_{\vec{a}}$*, then for every* $\vec{c} \in \mathbb{Z}^d$ *where* $\psi(\vec{c})$ *is satisfiable, the configuration* $f(\vec{c})$ *admits an infinite* $\to\_{\mathcal{T}}$*-sequence.*

The conditional non-termination techniques used by LoAT are **Increase**, **Eventual Increase**, and **Fixpoint**. So non-termination proofs can be synthesized while trying to accelerate a loop with very little overhead. After successfully accelerating a loop as explained in Sect. 5, LoAT tries to find a second suitably ordered $\leadsto$-sequence, where it only considers the conditional non-termination techniques mentioned above. If LoAT succeeds, then it has found a $\leadsto\_{\mathit{nt}}$-sequence which gives rise to a proof of non-termination via Theorem 10.
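For intuition only (a toy illustration, not Theorem 10 itself): for the canonical **Increase** loop f(x) → f(x + 1) [x > 0], the guard is invariant under the update, so any configuration with x > 0 starts an infinite run. A bounded simulation makes this concrete:

```python
# Toy illustration: canonical Increase loop f(x) -> f(x+1) [x > 0].
# The guard x > 0 is invariant under the update x := x + 1, so every
# configuration with x > 0 admits an infinite run (truncated here to
# 1000 steps for the demonstration).
x = 1
for _ in range(1000):
    assert x > 0  # the loop guard keeps holding
    x += 1
```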

# **7 Implementation, Experiments, and Conclusion**

Our implementation in LoAT can parse three widely used formats for ITSs (see [13]), and it is configurable via a minimalistic set of command-line options:






We evaluate three versions of LoAT: LoAT '19 uses templates to find invariants that facilitate loop acceleration for proving non-termination [8]; LoAT '20 deduces worst-case lower bounds based on loop acceleration via *metering functions* [9]; and LoAT '22 applies the calculus from [10,11] as described in Sects. 5 and 6. We also include three other state-of-the-art termination tools in our evaluation: T2 [6], VeryMax [16], and iRankFinder [3,7]. Regarding complexity, the only other tool for worst-case lower bounds of ITSs is LOBER [1]. However, we do not compare with LOBER, as it only analyses (multi-path) loops instead of full ITSs.

We use the examples from the categories *Termination* (1222 examples) and *Complexity of ITSs* (781 examples), respectively, of the *Termination Problems Data Base* [19]. All benchmarks have been performed on *StarExec* [18] (Intel Xeon E5-2609, 2.40GHz, 264GB RAM [17]) with a wall clock timeout of 300 s.



The table on the left shows that LoAT '22 is the most powerful tool for non-termination. The improvement over LoAT '19 demonstrates that the calculus from [10,11] is more powerful and efficient than the approach from [8]. The last three columns show the average, the median, and the standard deviation of the wall clock runtime, including examples where the timeout was reached.

The table on the right shows the results for complexity. The diagonal corresponds to examples where LoAT '20 and LoAT '22 yield the same result. The entries above or below the diagonal correspond to examples where LoAT '22 or LoAT '20 is better, respectively. There are 8 regressions and 79 improvements, so the calculus from [10,11] used by LoAT '22 is also beneficial for lower bounds.

LoAT is open source and its source code is available on GitHub [12]. See [13,14] for details on our evaluation, related work, all proofs, and a pre-compiled binary.

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Implicit Definitions with Differential Equations for KeYmaera X (System Description)

James Gallicchio(B) , Yong Kiam Tan(B) , Stefan Mitsch(B) , and André Platzer(B)

Computer Science Department, Carnegie Mellon University, Pittsburgh, USA jgallicc@andrew.cmu.edu, {yongkiat,smitsch,aplatzer}@cs.cmu.edu

Abstract. Definition packages in theorem provers provide users with means of defining and organizing concepts of interest. This system description presents a new definition package for the hybrid systems theorem prover KeYmaera X based on differential dynamic logic (dL). The package adds KeYmaera X support for user-defined smooth functions whose graphs can be implicitly characterized by dL formulas. Notably, this makes it possible to implicitly characterize functions, such as the exponential and trigonometric functions, as solutions of differential equations and then prove properties of those functions using dL's differential equation reasoning principles. Trustworthiness of the package is achieved by minimally extending KeYmaera X's soundness-critical kernel with a single axiom scheme that expands function occurrences with their implicit characterization. Users are provided with a high-level interface for defining functions and non-soundness-critical tactics that automate low-level reasoning over implicit characterizations in hybrid system proofs.

Keywords: Definitions · Differential dynamic logic · Verification of hybrid systems · Theorem proving

## 1 Introduction

KeYmaera X [7] is a theorem prover implementing differential dynamic logic dL [17,19–21] for specifying and verifying properties of hybrid systems mixing discrete dynamics and differential equations. Definitions enable users to express complex theorem statements in concise terms, e.g., by modularizing hybrid system models and their proofs [14]. Prior to this work, KeYmaera X had only one mechanism for definition, namely, non-recursive abbreviations via uniform substitution [14,20]. This restriction meant that common and useful functions, e.g., the trigonometric and exponential functions, could not be directly used in KeYmaera X, even though they can be uniquely characterized by dL formulas [17].

This system description introduces a new KeYmaera X definitional mechanism where functions are *implicitly defined* in dL as solutions of ordinary differential equations (ODEs). Although definition packages are available in most general-purpose proof assistants, our package is novel in tackling the question of how best to support user-defined functions in the *domain-specific* setting for hybrid systems. In contrast to tools with builtin support for *some* fixed subsets of special functions [1,9,23]; or higher-order logics that can work with functions via their infinitary series expansions [4], e.g., $\exp(t) = \sum\_{i=0}^{\infty} \frac{t^i}{i!}$; our package strikes a balance between practicality and generality by allowing users to define and reason about *any* function characterizable in dL as the solution of an ODE (Sect. 2), e.g., $\exp(t)$ solves the ODE $e' = e$ with initial value $e(0) = 1$.
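As a purely numeric illustration (entirely outside dL and KeYmaera X) of why such an ODE characterization pins down exp, one can integrate e' = e from e(0) = 1 and compare against the library exponential:

```python
import math

def rk4(f, y0, t0, t1, steps=1000):
    """Classical Runge-Kutta integration of y' = f(t, y)."""
    h = (t1 - t0) / steps
    t, y = t0, y0
    for _ in range(steps):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h / 2 * k1)
        k3 = f(t + h / 2, y + h / 2 * k2)
        k4 = f(t + h, y + h * k3)
        y += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return y

# exp is the unique solution of e' = e with e(0) = 1
approx = rk4(lambda t, e: e, 1.0, 0.0, 2.0)
assert abs(approx - math.exp(2.0)) < 1e-8
```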

Theoretically, implicit definitions strictly expand the class of ODE invariants amenable to dL's complete ODE invariance proof principles [22]; such invariants play a key role in ODE safety proofs [21] (see Proposition 3). In practice, arithmetical identities and other specifications involving user-defined functions are proved by automatically unfolding their implicit ODE characterizations and reusing existing KeYmaera X support for ODE reasoning (Sect. 3). The package is designed to provide seamless integration of implicit definitions in KeYmaera X and its usability is demonstrated on several hybrid system verification examples drawn from the literature that involve special functions (Sect. 4).

All proofs are in the supplement [8]. The definitions package is part of KeYmaera X with a usage guide at: http://keymaeraX.org/keymaeraXfunc/.

#### 2 Interpreted Functions in Differential Dynamic Logic

This section briefly recalls differential dynamic logic (dL) [17,18,20,21] and explains how its term language is extended to support implicit function definitions.

Syntax. Terms $e, \tilde{e}$ and formulas $\phi, \psi$ in dL are generated by the following grammar, with variable x, rational constant c, k-ary function symbols h (for any $k \in \mathbb{N}$), comparison operator $\sim \in \{=, \neq, \ge, >, \le, <\}$, and hybrid program α:

$$e, \tilde{e} ::= x \mid c \mid e + \tilde{e} \mid e \cdot \tilde{e} \mid h(e\_1, \dots, e\_k) \tag{1}$$

$$\phi, \psi ::= e \sim \tilde{e} \mid \phi \land \psi \mid \phi \lor \psi \mid \neg \phi \mid \forall x\, \phi \mid \exists x\, \phi \mid [\alpha]\, \phi \mid \langle \alpha \rangle\, \phi \tag{2}$$

The terms and formulas above extend the first-order language of real arithmetic ($\mathrm{FOL}\_{\mathbb{R}}$) with the box ($[\alpha]\, \phi$) and diamond ($\langle \alpha \rangle\, \phi$) modality formulas, which express that *all* or *some* runs of hybrid program α satisfy postcondition φ, respectively. Table 1 gives an intuitive overview of dL's hybrid program language for modeling systems featuring discrete and continuous dynamics and their interactions. In dL's uniform substitution calculus, function symbols h are *uninterpreted*, i.e., they semantically correspond to arbitrary (smooth) functions. Such uninterpreted function symbols (along with uninterpreted predicate and program symbols) are crucially used to give a parsimonious axiomatization of dL based on uniform substitution [20] which, in turn, enables a trustworthy microkernel implementation of the logic in the theorem prover KeYmaera X [7,16].


Table 1. Syntax and informal semantics of hybrid programs

**Hybrid program model (auxiliary variables** *s, c***):**

**Hybrid program model (trigonometric functions):**

$$\alpha\_s \equiv \begin{pmatrix} p := \* ; \text{if } \left(\frac{1}{2}(\omega - p)^2 < \frac{g}{L}\cos(\theta)\right) & \{\omega := \omega - p\}; \\\\ \{\theta' = \omega, \omega' = -\frac{g}{L}\sin(\theta) - k\omega\} \end{pmatrix}$$

**safety specification:**

$$\phi\_s \equiv g > 0 \land L > 0 \land k > 0 \land \theta = 0 \land \omega = 0 \to [\alpha\_s]\, |\theta| < \frac{\pi}{2}$$

Fig. 1. Running example of a swinging pendulum driven by an external force (left), its hybrid program models and dL safety specification (right). Program $\alpha\_s$ uses trigonometric functions directly, while program $\hat{\alpha}\_s$ uses variables s, c to implicitly track the values of sin(θ) and cos(θ), respectively (additions in red). The implicit characterizations $\phi\_{\sin}(s, \theta)$, $\phi\_{\cos}(c, \theta)$ are defined in (4), (5) and are not repeated here for brevity. (Color figure online)

Running Example. Adequate modeling of hybrid systems often requires the use of *interpreted* function symbols that denote specific functions of interest. As a running example, consider the swinging pendulum shown in Fig. 1. The ODEs describing its continuous motion are $\theta' = \omega,\ \omega' = -\frac{g}{L}\sin(\theta) - k\omega$, where θ is the swing angle, ω is the angular velocity, and g, k, L are the gravitational constant, coefficient of friction, and length of the rigid rod suspending the pendulum, respectively. The hybrid program $\alpha\_s$ models an external force that repeatedly pushes the pendulum and changes its angular velocity by a nondeterministically chosen value p; the guard if(...) condition is designed to ensure that the push does not cause the pendulum to swing above the horizontal as specified by $\phi\_s$. Importantly, the function symbols sin, cos must denote the usual real trigonometric functions in $\alpha\_s$. Program $\hat{\alpha}\_s$ shows the same pendulum modeled in dL *without* the use of interpreted symbols, but instead using auxiliary variables s, c. Note that $\hat{\alpha}\_s$ is cumbersome and subtle to get right: the implicit characterizations $\phi\_{\sin}(s, \theta)$, $\phi\_{\cos}(c, \theta)$ from (4), (5) are lengthy, and the differential equations $s' = \omega c,\ c' = -\omega s$ must be manually calculated and added to ensure that s, c correctly track the trigonometric functions as θ evolves continuously [18,22].
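The tracking property that makes $\hat{\alpha}\_s$ work can also be observed numerically (again, outside dL): integrating the pendulum ODE together with s' = ωc, c' = −ωs from matching initial values keeps s = sin(θ) and c = cos(θ). The constants and step sizes below are sample values chosen for the illustration:

```python
import math

g, L, k = 9.81, 1.0, 0.1  # sample constants for the illustration

def deriv(y):
    theta, omega, s, c = y
    # omega' uses the auxiliary s in place of sin(theta), as in the model
    return (omega, -(g / L) * s - k * omega, omega * c, -omega * s)

def rk4_step(y, h):
    def add(a, b, f):  # componentwise a + f*b
        return tuple(ai + f * bi for ai, bi in zip(a, b))
    k1 = deriv(y)
    k2 = deriv(add(y, k1, h / 2))
    k3 = deriv(add(y, k2, h / 2))
    k4 = deriv(add(y, k3, h))
    return tuple(yi + h / 6 * (a + 2 * b + 2 * c_ + d)
                 for yi, a, b, c_, d in zip(y, k1, k2, k3, k4))

y = (0.5, 0.0, math.sin(0.5), math.cos(0.5))
for _ in range(10000):   # simulate 10 time units with step 1e-3
    y = rk4_step(y, 1e-3)

theta, _, s, c = y
assert abs(s - math.sin(theta)) < 1e-6
assert abs(c - math.cos(theta)) < 1e-6
```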

Interpreted Functions. To enable extensible use of interpreted functions in dL, the term grammar (1) is enriched with k-ary function symbols h that carry an *interpretation* annotation [5,27], written $h\_{\ll\phi\gg}$, where $\phi \equiv \phi(x\_0, y\_1, \ldots, y\_k)$ is a dL formula with free variables in $x\_0, y\_1, \ldots, y\_k$ and no uninterpreted symbols. Intuitively, φ is a formula that characterizes the graph of the intended interpretation for h, where $y\_1, \ldots, y\_k$ are inputs to the function and $x\_0$ is the output. Since φ depends only on the values of its free variables, its formula semantics $[\![\phi]\!]$ can be equivalently viewed as a subset of Euclidean space $[\![\phi]\!] \subseteq \mathbb{R} \times \mathbb{R}^k$ [20,21]. The dL term semantics $\nu[\![e]\!]$ [20,21] in a state ν is extended with a case for terms $h\_{\ll\phi\gg}(e\_1, \ldots, e\_k)$ by evaluation of the smooth $C^\infty$ function characterized by $[\![\phi]\!]$:

$$\nu[\![h\_{\ll\phi\gg}(e\_1, \dots, e\_k)]\!] = \begin{cases} \hat{h}(\nu[\![e\_1]\!], \dots, \nu[\![e\_k]\!]) & \text{if } [\![\phi]\!] \text{ is the graph of a smooth } \hat{h}\colon \mathbb{R}^k \to \mathbb{R} \\ 0 & \text{otherwise} \end{cases}$$

This semantics says that, if the relation $[\![\phi]\!] \subseteq \mathbb{R} \times \mathbb{R}^k$ is the graph of some smooth $C^\infty$ function $\hat{h} : \mathbb{R}^k \to \mathbb{R}$, then the annotated syntactic symbol $h\_{\ll\phi\gg}$ is interpreted semantically as $\hat{h}$. Note that the graph relation uniquely defines $\hat{h}$ (if it exists). Otherwise, $h\_{\ll\phi\gg}$ is interpreted as the constant zero function, which ensures that the term semantics remain well-defined for all terms. An alternative is to leave the semantics of some terms (possibly) undefined, but this would require more extensive changes to the semantics of dL and extra case distinctions during proofs [2].

Axiomatics and Differentially-Defined Functions. To support reasoning for implicit definitions, annotated interpretations are reified to characterization axioms for expanding interpreted functions in the following lemma.

Lemma 1. (Function interpretation). *The FI axiom (below) for* dL *is sound, where* h *is a* k*-ary function symbol and the formula semantics* $[\![\phi]\!]$ *is the graph of a smooth* $C^\infty$ *function* $\hat{h} : \mathbb{R}^k \to \mathbb{R}$*.*

$$\text{FI} \quad e\_0 = h\_{\ll\phi\gg}(e\_1, \dots, e\_k) \leftrightarrow \phi(e\_0, e\_1, \dots, e\_k)$$

Axiom FI enables reasoning for terms $h\_{\ll\phi\gg}(e\_1, \ldots, e\_k)$ through their implicit interpretation φ, but Lemma 1 does not directly yield an implementation because it has a soundness-critical side condition that interpretation φ characterizes the graph of a smooth $C^\infty$ function. It is possible to syntactically characterize this side condition [2], e.g., the formula $\forall y\_1, \ldots, y\_k\, \exists x\_0\, \phi(x\_0, y\_1, \ldots, y\_k)$ expresses that the graph represented by φ has at least one output value $x\_0$ for each input $y\_1, \ldots, y\_k$, but this burdens users with the task of proving this side condition in dL before working with their desired function. The KeYmaera X definition package opts for a middle ground between generality and ease-of-use by implementing FI for univariate, *differentially-defined* functions, i.e., the interpretation φ has the following shape, where $x = (x\_0, x\_1, \ldots, x\_n)$ abbreviates a vector of variables, there is one input $t = y\_1$, and $X = (X\_0, X\_1, \ldots, X\_n)$, $T$ are dL terms that do not mention any free variables, e.g., rational constants, which have constant value in any dL state:

$$\phi(x\_0, t) \equiv \left\langle x\_1, \dots, x\_n := \*; \begin{cases} x' = -f(x, t), t' = -1 \cup \\ x' = f(x, t), t' = 1 \end{cases} \right\rangle \left( \begin{array}{c} x = X \wedge \\ t = T \end{array} \right) \tag{3}$$

Formula (3) says that, from point $x\_0$, there exists a choice of the remaining coordinates $x\_1, \ldots, x\_n$ such that it is possible to follow the defining ODE either forward $x' = f(x, t), t' = 1$ or backward $x' = -f(x, t), t' = -1$ in time to reach the initial values $x = X$ at time $t = T$. In other words, the implicitly defined function $h\_{\ll\phi(x\_0,t)\gg}$ is the $x\_0$-coordinate projection of the solution of the ODE starting from initial values X at initial time T. For example, the trigonometric functions used in Fig. 1 are differentially-definable as respective projections:

$$\phi\_{\sin}(s,t) \equiv \left\langle c := \*; \begin{cases} s' = -c, c' = & s, t' = -1 \cup \\ s' = & c, c' = -s, t' = 1 \end{cases} \right\rangle \left( \begin{array}{c} s = 0 \land c = 1 \land \\ t = 0 \end{array} \right) \tag{4}$$

$$\phi\_{\cos}(c,t) \equiv \left\langle s := \*; \begin{cases} s' = -c, c' = & s, t' = -1 \cup \\ s' = & c, c' = -s, t' = 1 \end{cases} \right\rangle \left( \begin{array}{c} s = 0 \land c = 1 \land \\ t = 0 \end{array} \right) \tag{5}$$

By Picard-Lindelöf [21, Thm. 2.2], the ODE $x' = f(x, t)$ has a unique solution $\Phi : (a, b) \to \mathbb{R}^{n+1}$ on an open interval (a, b) for some $-\infty \le a < b \le \infty$. Moreover, $\Phi(t)$ is $C^\infty$ smooth in t because the ODE right-hand sides are dL terms with smooth interpretations [20]. Therefore, the side condition for Lemma 1 reduces to showing that Φ exists globally, i.e., it is defined on $t \in (-\infty, \infty)$.

Lemma 2. (Smooth interpretation). *If formula* $\exists x\_0\, \phi(x\_0, t)$ *is valid,* $\phi(x\_0, t)$ *from* (3) *characterizes a smooth* $C^\infty$ *function and axiom FI is sound for* $\phi(x\_0, t)$*.*

Lemma 2 enables an implementation of axiom FI in KeYmaera X that combines a syntactic check (the interpretation has the shape of formula (3)) and a side condition check (requiring users to prove existence for their interpretations).

The addition of differentially-defined functions to dL strictly increases the deductive power of ODE invariants, a key tool in deductive ODE safety reasoning [21]. Intuitively, the added functions allow direct, syntactic descriptions of invariants, e.g., involving the exponential or trigonometric functions, that have effective invariance proofs using dL's complete ODE invariance reasoning principles [22].

Proposition 3. (Invariant expressivity). *There are valid polynomial* dL *differential equation safety properties which are provable using differentially-defined function invariants but are not provable using polynomial invariants.*

# 3 KeYmaera X Implementation

The implicit definition package adds interpretation annotations and axiom FI based on Lemma 2 in ≈170 lines of code extensions to KeYmaera X's soundness-critical core [7,16]. This section focuses on non-soundness-critical usability features provided by the package that build on those core changes.

## 3.1 Core-Adjacent Changes

KeYmaera X has a browser-based user interface with concrete, ASCII-based dL syntax [14]. The package extends KeYmaera X's parsers and pretty printers with support for interpretation annotations h«...»(...) and users can simultaneously define a family of functions as respective coordinate projections of the solution of an n-dimensional ODE (given initial conditions) with sugared syntax:

```
implicit Real h1(Real t), ..., hn(Real t) = {{initcond};{ODE}}
```

For example, the implicit definitions (4), (5) can be written with the following sugared syntax; KeYmaera X automatically inserts the associated interpretation annotations for the trigonometric function symbols. See the supplement [8] for a KeYmaera X snippet of formula $\phi\_s$ from Fig. 1 using this sugared definition.

```
implicit Real sin(Real t), cos(Real t)
   = {{sin:=0; cos:=1;}; {sin'=cos, cos'=-sin}}
```
In fact, the functions sin, cos, exp are so ubiquitous in hybrid system models that the package builds their definitions in automatically without requiring users to write them explicitly. In addition, although arithmetic involving those functions is undecidable [11,24], KeYmaera X can export those functions whenever its external arithmetic tools have partial arithmetic support for those functions.

### 3.2 Intermediate and User-Level Proof Automation

The package automatically proves three important lemmas about user-defined functions that can be transparently re-used in all subsequent proofs:

1. It proves the side condition of axiom FI using KeYmaera X's automation for proving sufficient duration existence of solutions for ODEs [26], which automatically shows global existence of solutions for all affine ODEs and some univariate nonlinear ODEs. As an example of the latter, the hyperbolic tanh function is differentially-defined as the solution of the ODE $x' = 1 - x^2$ with initial value x = 0 at t = 0, whose global existence is proved automatically.
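This tanh characterization can be sanity-checked numerically (outside KeYmaera X) by integrating x' = 1 − x² from x(0) = 0 with small Euler steps and comparing against the library tanh:

```python
import math

# Numeric sanity check: tanh solves x' = 1 - x^2 with x(0) = 0.
x, t, h = 0.0, 0.0, 1e-5
while t < 2.0:
    x += h * (1.0 - x * x)
    t += h
assert abs(x - math.tanh(t)) < 1e-4
```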


These lemmas enable the use of differentially-defined functions with all existing ODE automation in KeYmaera X [22,26]. In particular, since differentially-defined functions are univariate Noetherian functions, they admit complete ODE invariance reasoning principles in dL [22] as implemented in KeYmaera X.
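As a quick numerical sanity check of such a differential definition (a standalone sketch, not part of the KeYmaera X package; the helper names are ours), integrating x' = 1 − x² from x(0) = 0 does reproduce the hyperbolic tangent:

```python
import math

def rk4_step(f, x, h):
    # One classical Runge-Kutta step for the autonomous ODE x' = f(x).
    k1 = f(x)
    k2 = f(x + h * k1 / 2)
    k3 = f(x + h * k2 / 2)
    k4 = f(x + h * k3)
    return x + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6

def tanh_via_ode(t, steps=10_000):
    # tanh is differentially defined by x' = 1 - x^2 with x(0) = 0.
    f = lambda x: 1 - x * x
    x, h = 0.0, t / steps
    for _ in range(steps):
        x = rk4_step(f, x, h)
    return x

# The numerically integrated solution agrees with math.tanh to high precision.
assert abs(tanh_via_ode(2.0) - math.tanh(2.0)) < 1e-8
```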

The package also adds specialized support for arithmetical reasoning over differential definitions to supplement external arithmetic tools in proofs. First, it allows users to manually prove identities and bounds using KeYmaera X's ODE reasoning. For example, the bound tanh(λx)<sup>2</sup> < 1 used in the example α<sub>n</sub> from Sect. 4 is proved by *differential unfolding* as follows (see supplement [8]):

$$\frac{\vdash \tanh(0)^2 < 1 \quad \tanh(\lambda v)^2 < 1 \vdash \left[ \{ v' = 1 \& \, v \le x \} \cup \{ v' = -1 \& \, v \ge x \} \right] \tanh(\lambda v)^2 < 1}{\vdash \tanh(\lambda x)^2 < 1}$$

This deduction step says that, to show the conclusion (below the rule bar), it suffices to prove the premises (above the rule bar), i.e., the bound is true at v = 0 (left premise) and it is preserved as v is evolved forward with v' = 1 or backward with v' = −1 along the real line until it reaches x (right premise). The left premise is proved using the initial value lemma for tanh while the right premise is proved by ODE invariance reasoning with the differential axiom for tanh [22].

Second, the package uses KeYmaera X's uniform substitution mechanism [20] to implement (untrusted) abstraction of functions with fresh variables when solving arithmetic subgoals. For example, the following arithmetic bound for example α<sub>n</sub> is proved by abstraction after adding the bounds tanh(λx)<sup>2</sup> < 1 and tanh(λy)<sup>2</sup> < 1.

$$\textbf{Bound:}\quad x(\tanh(\lambda x) - \tanh(\lambda y)) + y(\tanh(\lambda x) + \tanh(\lambda y)) \le 2\sqrt{x^2 + y^2}$$

$$\textbf{Abstracted:}\quad t\_x^2 < 1 \land t\_y^2 < 1 \to x(t\_x - t\_y) + y(t\_x + t\_y) \le 2\sqrt{x^2 + y^2}$$
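The abstraction only keeps the hypotheses t² < 1 about the fresh variables, so the abstracted implication is stronger than the original bound. A random spot-check of the abstracted implication (our own numeric sketch, independent of KeYmaera X's proof):

```python
import math
import random

def abstracted_bound_holds(x, y, tx, ty):
    # tx, ty abstract tanh(lambda*x), tanh(lambda*y); only |t| < 1 is kept.
    lhs = x * (tx - ty) + y * (tx + ty)
    rhs = 2 * math.sqrt(x * x + y * y)
    return lhs <= rhs + 1e-9  # tiny slack for floating-point rounding

random.seed(0)
for _ in range(10_000):
    x, y = random.uniform(-50, 50), random.uniform(-50, 50)
    tx, ty = random.uniform(-1, 1), random.uniform(-1, 1)
    assert abstracted_bound_holds(x, y, tx, ty)
```

The check succeeds because lhs = t<sub>x</sub>(x + y) + t<sub>y</sub>(y − x) ≤ |x + y| + |y − x| ≤ 2√(x² + y²) by the Cauchy-Schwarz inequality.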

# 4 Examples

The definition package enables users to work with differentially-defined functions in KeYmaera X, including modeling and expressing their design intuitions in proofs. This section applies the package to verify various continuous and hybrid system examples from the literature featuring such functions.

*Discretely Driven Pendulum.* The specification φ<sub>s</sub> from Fig. 1 contains a discrete loop whose safety property is proved by a loop invariant, i.e., a formula that is preserved by the discrete and continuous dynamics in each loop iteration [21]. The key invariant is *Inv* ≡ (g/L)(1 − cos θ) + (1/2)ω<sup>2</sup> < g/L, which expresses that the total energy of the system (the sum of potential and kinetic energy on the LHS) is less than the energy needed to cross the horizontal (RHS). The main steps are as follows (proofs for these steps are automated by KeYmaera X):
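The LHS of *Inv* is conserved along the continuous pendulum dynamics, which is why it is a natural invariant candidate. A numerical sketch (assuming the standard pendulum dynamics θ' = ω, ω' = −(g/L) sin θ and an illustrative value of g/L, neither taken from Fig. 1):

```python
import math

G_OVER_L = 9.81 / 2.0  # illustrative value of g/L, not from the paper

def energy(theta, omega):
    # LHS of Inv: scaled potential plus kinetic energy.
    return G_OVER_L * (1 - math.cos(theta)) + omega ** 2 / 2

def step(theta, omega, h):
    # One RK4 step for theta' = omega, omega' = -(g/L) * sin(theta).
    def f(th, om):
        return om, -G_OVER_L * math.sin(th)
    k1 = f(theta, omega)
    k2 = f(theta + h * k1[0] / 2, omega + h * k1[1] / 2)
    k3 = f(theta + h * k2[0] / 2, omega + h * k2[1] / 2)
    k4 = f(theta + h * k3[0], omega + h * k3[1])
    return (theta + h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6,
            omega + h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6)

theta, omega = 0.5, 0.1
e0 = energy(theta, omega)
for _ in range(100_000):
    theta, omega = step(theta, omega, 1e-4)
assert abs(energy(theta, omega) - e0) < 1e-9  # energy conserved along the flow
```

Since the energy stays constant along the ODE, *Inv* can only be violated by the discrete pump actions, which is exactly what the loop invariant proof rules out.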


*Neuron Interaction.* The ODE α<sub>n</sub> models the interaction between a pair of neurons [12]; its specification φ<sub>n</sub> nests dL's diamond and box modalities to express that the system norm √(x<sup>2</sup> + y<sup>2</sup>) is asymptotically bounded by 2τ.

$$\begin{aligned} \alpha\_n &\equiv x' = -\frac{x}{\tau} + \tanh(\lambda x) - \tanh(\lambda y), y' = -\frac{y}{\tau} + \tanh(\lambda x) + \tanh(\lambda y) \\ \phi\_n &\equiv \tau > 0 \rightarrow \forall \varepsilon > 0 \langle \alpha\_n \rangle \left[ \alpha\_n \right] \sqrt{x^2 + y^2} \le 2\tau + \varepsilon \end{aligned}$$

The verification of φ<sub>n</sub> uses differentially-defined functions in concert with KeYmaera X's symbolic ODE safety and liveness reasoning [26]. The proof uses the decaying exponential bound

$$\sqrt{x^2 + y^2} \le \exp\left(-\frac{t}{\tau}\right)\sqrt{x\_0^2 + y\_0^2} + 2\tau\left(1 - \exp\left(-\frac{t}{\tau}\right)\right)$$

where the constants x<sub>0</sub>, y<sub>0</sub> are symbolic initial values for x, y at initial time t = 0, respectively. Notably, the arithmetic subgoals from this example are all proved using abstraction and differential unfolding (Sect. 3) without relying on external arithmetic solver support for tanh.
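The exponential bound can be spot-checked by simulation (a sketch with illustrative parameter values and a small numeric slack; the helper name is ours, and this is no substitute for the symbolic proof):

```python
import math

TAU, LAM = 1.0, 2.0  # illustrative parameter values, not from the paper

def check_norm_bound(x0, y0, t_end=5.0, steps=50_000):
    # Euler-integrates alpha_n and checks the decaying exponential norm bound
    # at every step.
    h = t_end / steps
    x, y = x0, y0
    r0 = math.sqrt(x0 * x0 + y0 * y0)
    for i in range(steps):
        decay = math.exp(-i * h / TAU)
        bound = decay * r0 + 2 * TAU * (1 - decay)
        assert math.sqrt(x * x + y * y) <= bound + 1e-3  # small numeric slack
        dx = -x / TAU + math.tanh(LAM * x) - math.tanh(LAM * y)
        dy = -y / TAU + math.tanh(LAM * x) + math.tanh(LAM * y)
        x, y = x + h * dx, y + h * dy

check_norm_bound(3.0, -4.0)
check_norm_bound(0.1, 0.2)
```

The bound follows from the differential inequality r' ≤ −r/τ + 2 for r = √(x² + y²), which is exactly where the abstracted tanh bound from Sect. 3 enters.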

*Longitudinal Flight Dynamics.* The differential equations α<sub>a</sub> below describe the 6th-order longitudinal motion of an airplane while climbing or descending [10,25]. The airplane adjusts its pitch angle θ with pitch rate q, which determines its axial velocity u and vertical velocity w, and, in turn, its range x and altitude z (illustrated on the right). The physical parameters are: gravity g, mass m, aerodynamic and thrust moment M along the lateral axis, aerodynamic and thrust forces X, Z along x and z, respectively, and the moment of inertia I<sub>yy</sub>; see [10, Sect. 6.2].

$$\begin{aligned} \alpha\_a &\equiv u' = \frac{X}{m} - g\sin(\theta) - qw, &\qquad w' = \frac{Z}{m} + g\cos(\theta) + qu, &\qquad q' = \frac{M}{I\_{yy}},\\ x' &= \cos(\theta)u + \sin(\theta)w, &\qquad z' = -\sin(\theta)u + \cos(\theta)w, &\qquad \theta' = q \end{aligned}$$

The verification of the specification J → [α<sub>a</sub>]J shows that the safety envelope J ≡ J<sub>1</sub> ∧ J<sub>2</sub> ∧ J<sub>3</sub> is invariant along the flow of α<sub>a</sub> with algebraic invariants J<sub>i</sub>:

$$\begin{aligned} J\_1 &\equiv \frac{Mz}{I\_{yy}} + g\theta + \left(\frac{X}{m} - qw\right)\cos(\theta) + \left(\frac{Z}{m} + qu\right)\sin(\theta) = 0\\ J\_2 &\equiv \frac{Mz}{I\_{yy}} - \left(\frac{Z}{m} + qu\right)\cos(\theta) + \left(\frac{X}{m} - qw\right)\sin(\theta) = 0 \quad J\_3 \equiv -q^2 + \frac{2M\theta}{I\_{yy}} = 0 \end{aligned}$$

Additional examples are available in the supplement [8], including a bouncing ball on a sinusoidal surface [6,13] and a robot collision avoidance model [15].

# 5 Conclusion

This work presents a convenient mechanism for extending the dL term language with differentially-defined functions, thereby furthering the class of real-world systems amenable to modeling and formalization in KeYmaera X. Minimal soundness-critical changes are made to the KeYmaera X kernel, which maintains its trustworthiness while allowing the use of newly defined functions in concert with all existing dL hybrid systems reasoning principles implemented in KeYmaera X. Future work could formally verify these kernel changes by extending the existing formalization of dL [3]. Further integration of external arithmetic tools [1,9,23] will also help to broaden the classes of arithmetic sub-problems that can be solved effectively in hybrid systems proofs.

Acknowledgments. We thank the anonymous reviewers for their helpful feedback on this paper. This material is based upon work supported by the National Science Foundation under Grant No. CNS-1739629. This research was sponsored by the AFOSR under grant number FA9550-16-1-0288.

# References



# **Automatic Complexity Analysis of Integer Programs via Triangular Weakly Non-Linear Loops**

Nils Lommen, Fabian Meyer, and Jürgen Giesl

LuFG Informatik 2, RWTH Aachen University, Aachen, Germany lommen@cs.rwth-aachen.de, giesl@informatik.rwth-aachen.de

**Abstract.** There exist several results on deciding termination and computing runtime bounds for *triangular weakly non-linear loops* (twn-loops). We show how to use results on such subclasses of programs, where complexity bounds are computable, within incomplete approaches for complexity analysis of full integer programs. To this end, we present a novel modular approach which computes local runtime bounds for subprograms that can be transformed into twn-loops. These local runtime bounds are then lifted to global runtime bounds for the whole program. The power of our approach is shown by our implementation in the tool KoAT, which analyzes the complexity of programs where all other state-of-the-art tools fail.

# **1 Introduction**

Most approaches for automated complexity analysis of programs are based on incomplete techniques like ranking functions (see, e.g., [1–4,6,11,12,18,20,21,31]). However, there also exist numerous results on subclasses of programs where questions concerning termination or complexity are *decidable*, e.g., [5,14,15,19,22,24,25,32,34]. In this work we consider the subclass of *triangular weakly non-linear loops* (twn-loops), for which there exist *complete* techniques for analyzing termination and runtime complexity (we discuss the "completeness" and decidability of these techniques below). An example of a twn-loop is:

$$\text{while } (x\_1^2 + x\_3^5 < x\_2 \land x\_1 \neq 0) \text{ do } (x\_1, x\_2, x\_3) \leftarrow (-2 \cdot x\_1, 3 \cdot x\_2 - 2 \cdot x\_3^3, x\_3) \tag{1}$$

Its guard is a propositional formula over (possibly *non-linear*) polynomial inequations. The update is *weakly non-linear*, i.e., no variable x<sub>i</sub> occurs non-linearly in its own update. Furthermore, it is *triangular*, i.e., we can order the variables such that the update of any x<sub>i</sub> does not depend on the variables x<sub>1</sub>,...,x<sub>i−1</sub> with smaller indices. Then, by handling one variable after the other, one can compute a *closed form* which corresponds to applying the loop's update n times. Using

© The Author(s) 2022

Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) - 235950644 (Project GI 274/6-2) and DFG Research Training Group 2236 UnRAVeL.

J. Blanchette et al. (Eds.): IJCAR 2022, LNAI 13385, pp. 734–754, 2022. https://doi.org/10.1007/978-3-031-10769-6\_43

these closed forms, termination can be reduced to an existential formula over Z [15] (whose validity is decidable for linear arithmetic and where SMT solvers often also prove (in)validity in the non-linear case). In this way, one can show that non-termination of twn-loops over Z is semi-decidable (and it is decidable over the real numbers).
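For intuition, loop (1) can simply be executed. The sketch below (the helper name is ours) runs it on concrete inputs and counts iterations; termination matches the intuition that x₁² (quadrupled in each iteration) eventually outgrows x₂ (at most tripled):

```python
def run_loop(x1, x2, x3, fuel=10_000):
    # Executes loop (1) literally; returns the number of iterations taken,
    # or None if the fuel limit is reached.
    n = 0
    while x1 ** 2 + x3 ** 5 < x2 and x1 != 0:
        x1, x2, x3 = -2 * x1, 3 * x2 - 2 * x3 ** 3, x3
        n += 1
        if n >= fuel:
            return None
    return n

# x1^2 grows by a factor of 4 per iteration while x2 grows by at most 3,
# so the guard eventually fails and the loop terminates.
print(run_loop(1, 1000, 1))
```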

While termination of twn-loops over Z is not decidable, by using the closed forms, [19] presented a "*complete*" complexity analysis technique. More precisely, for every twn-loop over Z, it infers a polynomial which is an upper bound on the runtime for all those inputs where the loop terminates. So for all (possibly non-linear) terminating twn-loops over Z, the technique of [19] *always* computes polynomial runtime bounds. In contrast, existing tools based on incomplete techniques for complexity analysis often fail for programs with non-linear arithmetic.

In [6,18] we presented such an incomplete modular technique for complexity analysis which uses individual ranking functions for different subprograms. Based on this, we now introduce a novel approach to automatically infer runtime bounds for programs possibly consisting of multiple consecutive or nested loops by handling some subprograms as twn-loops and by using ranking functions for others. In order to compute runtime bounds, we analyze subprograms in topological order, i.e., in case of multiple consecutive loops, we start with the first loop and propagate knowledge about the resulting values of variables to subsequent loops. By inferring runtime bounds for one subprogram after the other, in the end we obtain a bound on the runtime complexity of the whole program. We first try to compute runtime bounds for subprograms by so-called multiphase linear ranking functions (MΦRFs, see [3,4,18,20]). If MΦRFs do not yield a finite runtime bound for the respective subprogram, then we use our novel twn-technique on the unsolved parts of the subprogram. So for the first time, "complete" complexity analysis techniques like [19] for subclasses of programs with *non-linear* arithmetic are combined with incomplete techniques based on (linear) ranking functions like [6,18]. Based on our approach, in future work one could integrate "complete" techniques for further subclasses (e.g., for *solvable loops* [24,25,30,34] which can be transformed into twn-loops by suitable automorphisms [15]).

*Structure:* After introducing preliminaries in Sect. 2, in Sect. 3 we show how to lift a (local) runtime bound which is only sound for a subprogram to an overall global runtime bound. In contrast to previous techniques [6,18], our lifting approach works for any method of bound computation (not only for ranking functions). In Sect. 4, we improve the existing results on complexity analysis of twn-loops [14,15,19] such that they yield concrete polynomial bounds, we refine these bounds by considering invariants, and we show how to apply these results to full programs which contain twn-loops as subprograms. Section 5 extends this technique to larger subprograms which can be transformed into twn-loops. In Sect. 6 we evaluate the implementation of our approach in the complexity analysis tool KoAT and show that one can now also successfully analyze the runtime of programs containing non-linear arithmetic. We refer to [26] for all proofs.

**Fig. 1.** An Integer Program with a Nested Self-Loop

## **2 Preliminaries**

This section recapitulates preliminaries for complexity analysis from [6,18].

**Definition 1 (Atoms and Formulas).** *We fix a set* V *of variables. The set of* atoms A(V) *consists of all inequations* p<sub>1</sub> < p<sub>2</sub> *for polynomials* p<sub>1</sub>, p<sub>2</sub> ∈ Z[V]*.* F(V) *is the set of all propositional* formulas *built from atoms* A(V)*,* ∧*, and* ∨*.*

In addition to "<", we also use "≥", "=", "≠", etc., as well as negations "¬", which can all be simulated by formulas (e.g., p<sub>1</sub> ≥ p<sub>2</sub> is equivalent to p<sub>2</sub> < p<sub>1</sub> + 1 for integers).

For integer programs, we use a formalism based on transitions, which also allows us to represent **while**-programs like (1) easily. Our programs may have *non-deterministic branching*, i.e., the guards of several applicable transitions can be satisfied. Moreover, *non-deterministic sampling* is modeled by *temporary variables* whose values are updated arbitrarily in each evaluation step.

**Definition 2 (Integer Program).** (PV, L, ℓ<sub>0</sub>, T) *is an* integer program *where*


Transitions (ℓ<sub>0</sub>, ·, ·, ·) are called *initial*. Note that ℓ<sub>0</sub> has no incoming transitions.

*Example 3.* Consider the program in Fig. 1 with PV = {x<sub>i</sub> | 1 ≤ i ≤ 5}, L = {ℓ<sub>i</sub> | 0 ≤ i ≤ 3}, and T = {t<sub>i</sub> | 0 ≤ i ≤ 5}, where t<sub>5</sub> has non-linear arithmetic in its guard and update. We omitted trivial guards, i.e., ϕ = true, and identity updates of the form η(v) = v. Thus, t<sub>5</sub> corresponds to the **while**-program (1).

A *state* is a mapping σ : V → Z, Σ denotes the set of all states, and L × Σ is the set of *configurations*. We also apply states to arithmetic expressions p or formulas ϕ, where the number σ(p) resp. the Boolean value σ(ϕ) results from replacing each variable v by σ(v). So for a state with σ(x<sub>1</sub>) = −8, σ(x<sub>2</sub>) = 55, and σ(x<sub>3</sub>) = 1, the expression x<sub>1</sub><sup>2</sup> + x<sub>3</sub><sup>5</sup> evaluates to σ(x<sub>1</sub><sup>2</sup> + x<sub>3</sub><sup>5</sup>) = 65 and the formula ϕ = (x<sub>1</sub><sup>2</sup> + x<sub>3</sub><sup>5</sup> < x<sub>2</sub>) evaluates to σ(ϕ) = (65 < 55) = false. From now on, we fix a program (PV, L, ℓ<sub>0</sub>, T).

**Definition 4 (Evaluation of Programs).** *For configurations* (ℓ, σ), (ℓ′, σ′) *and* t = (ℓ<sub>t</sub>, ϕ, η, ℓ′<sub>t</sub>) ∈ T*,* (ℓ, σ) →<sub>t</sub> (ℓ′, σ′) *is an* evaluation *step if* ℓ = ℓ<sub>t</sub>*,* ℓ′ = ℓ′<sub>t</sub>*,* σ(ϕ) = true*, and* σ(η(v)) = σ′(v) *for all* v ∈ PV*. For any* T′ ⊆ T*, let* →<sub>T′</sub> = ⋃<sub>t∈T′</sub> →<sub>t</sub>*, where we also write* → *instead of* →<sub>t</sub> *or* →<sub>T′</sub>*. Let* (ℓ<sub>0</sub>, σ<sub>0</sub>) →<sup>k</sup> (ℓ<sub>k</sub>, σ<sub>k</sub>) *abbreviate* (ℓ<sub>0</sub>, σ<sub>0</sub>) → ... → (ℓ<sub>k</sub>, σ<sub>k</sub>) *and let* (ℓ, σ) →<sup>∗</sup> (ℓ′, σ′) *if* (ℓ, σ) →<sup>k</sup> (ℓ′, σ′) *for some* k ≥ 0*.*

So when denoting states σ as tuples (σ(x<sub>1</sub>),...,σ(x<sub>5</sub>)) ∈ Z<sup>5</sup> for the program in Fig. 1, we have (ℓ<sub>0</sub>, (1, 5, 7, 1, 3)) →<sub>t0</sub> (ℓ<sub>1</sub>, (1, 5, 7, 1, 3)) →<sub>t1</sub> (ℓ<sub>3</sub>, (1, 1, 3, 1, 3)) →<sup>3</sup><sub>t5</sub> (ℓ<sub>3</sub>, (1, −8, 55, 1, 3)) →<sub>t2</sub> .... The runtime complexity rc(σ<sub>0</sub>) of a program corresponds to the length of the longest evaluation starting in the initial state σ<sub>0</sub>.

**Definition 5 (Runtime Complexity).** *The* runtime complexity *is* rc : Σ → N ∪ {ω} *with* rc(σ<sub>0</sub>) = sup{k ∈ N | ∃(ℓ′, σ′). (ℓ<sub>0</sub>, σ<sub>0</sub>) →<sup>k</sup> (ℓ′, σ′)}*.*

## **3 Computing Global Runtime Bounds**

We now introduce our general approach for computing (upper) runtime bounds. We use weakly monotonically increasing functions as bounds, since they can easily be "composed" (i.e., if f and g increase monotonically, then so does f ◦ g).

**Definition 6 (Bounds** [6,18]**).** *The set of* bounds B *is the smallest set with* N ∪ {ω} ⊆ B*,* PV ⊆ B*, and* {b<sub>1</sub> + b<sub>2</sub>, b<sub>1</sub> · b<sub>2</sub>, k<sup>b<sub>1</sub></sup>} ⊆ B *for all* k ∈ N *and* b<sub>1</sub>, b<sub>2</sub> ∈ B*.*

A bound constructed from N, PV, +, and · is *polynomial*. So for PV = {x, y}, we have ω, x<sup>2</sup>, x + y, 2<sup>x+y</sup> ∈ B. Here, x<sup>2</sup> and x + y are polynomial bounds.

We measure the size of variables by their absolute values. For any σ ∈ Σ, |σ| is the state with |σ|(v) = |σ(v)| for all v ∈ V. So if σ<sub>0</sub> denotes the initial state, then |σ<sub>0</sub>| maps every variable to its initial "size", i.e., its initial absolute value. RB<sub>glo</sub> : T → B is a *global runtime bound* if for each transition t and initial state σ<sub>0</sub> ∈ Σ, RB<sub>glo</sub>(t) evaluated in the state |σ<sub>0</sub>| over-approximates the number of evaluations of t in any run starting in the configuration (ℓ<sub>0</sub>, σ<sub>0</sub>). Let →<sup>∗</sup><sub>T</sub> ◦ →<sub>t</sub> denote the relation where arbitrarily many evaluation steps are followed by a step with t.

**Definition 7 (Global Runtime Bound** [6,18]**).** *The function* RB<sub>glo</sub> : T → B *is a* global runtime bound *if for all* t ∈ T *and all states* σ<sub>0</sub> ∈ Σ *we have* |σ<sub>0</sub>|(RB<sub>glo</sub>(t)) ≥ sup{k ∈ N | ∃(ℓ′, σ′). (ℓ<sub>0</sub>, σ<sub>0</sub>) (→<sup>∗</sup><sub>T</sub> ◦ →<sub>t</sub>)<sup>k</sup> (ℓ′, σ′)}*.*

For the program in Fig. 1, in Example 12 we will infer RB<sub>glo</sub>(t<sub>0</sub>) = 1, RB<sub>glo</sub>(t<sub>i</sub>) = x<sub>4</sub> for 1 ≤ i ≤ 4, and RB<sub>glo</sub>(t<sub>5</sub>) = 8 · x<sub>4</sub> · x<sub>5</sub> + 13006 · x<sub>4</sub>. By adding the bounds for all transitions, a global runtime bound RB<sub>glo</sub> yields an upper bound on the program's runtime complexity. So for all σ<sub>0</sub> ∈ Σ we have |σ<sub>0</sub>|(∑<sub>t∈T</sub> RB<sub>glo</sub>(t)) ≥ rc(σ<sub>0</sub>).

For *local runtime bounds*, we consider the *entry transitions* of subsets T′ ⊆ T.

**Definition 8 (Entry Transitions** [6,18]**).** *Let* ∅ ≠ T′ ⊆ T*. Its* entry transitions *are* E<sub>T′</sub> = {t | t = (ℓ, ϕ, η, ℓ′) ∈ T \ T′ ∧ *there is a transition* (ℓ′, ·, ·, ·) ∈ T′}*.*

So in Fig. 1, we have E<sub>T\{t0}</sub> = {t<sub>0</sub>} and E<sub>{t5}</sub> = {t<sub>1</sub>, t<sub>4</sub>}.

In contrast to global runtime bounds, a *local* runtime bound RB<sub>loc</sub> : E<sub>T′</sub> → B only takes a subset T′ ⊆ T into account. A *local run* is started by an entry transition r ∈ E<sub>T′</sub> followed by transitions from T′. A *local runtime bound* considers a subset T′<sub>></sub> ⊆ T′ and over-approximates the number of evaluations of any transition from T′<sub>></sub> in an arbitrary local run of the subprogram with the transitions T′. More precisely, for every t ∈ T′<sub>></sub>, RB<sub>loc</sub>(r) over-approximates the number of applications of t in any run of T′, if T′ is entered via r ∈ E<sub>T′</sub>. However, local runtime bounds do not consider how often an entry transition from E<sub>T′</sub> is evaluated or how large a variable is when we evaluate an entry transition. To illustrate that RB<sub>loc</sub>(r) is a bound on the number of evaluations of transitions from T′<sub>></sub> after evaluating r, we often write RB<sub>loc</sub>(→<sub>r</sub> T′<sub>></sub>) instead of RB<sub>loc</sub>(r).

**Definition 9 (Local Runtime Bound).** *Let* ∅ ≠ T′<sub>></sub> ⊆ T′ ⊆ T*. The function* RB<sub>loc</sub> : E<sub>T′</sub> → B *is a* local runtime bound *for* T′<sub>></sub> *w.r.t.* T′ *if for all* t ∈ T′<sub>></sub>*, all* r ∈ E<sub>T′</sub> *with* r = (·, ·, ·, ℓ)*, and all* σ ∈ Σ *we have* |σ|(RB<sub>loc</sub>(→<sub>r</sub> T′<sub>></sub>)) ≥ sup{k ∈ N | ∃σ<sub>0</sub>, (ℓ′, σ′). (ℓ<sub>0</sub>, σ<sub>0</sub>) →<sup>∗</sup><sub>T</sub> ◦ →<sub>r</sub> (ℓ, σ) (→<sup>∗</sup><sub>T′</sub> ◦ →<sub>t</sub>)<sup>k</sup> (ℓ′, σ′)}*.*

Our approach is *modular* since it computes local bounds for program parts separately. To lift local to global runtime bounds, we use *size bounds* SB(t, v) to over-approximate the size (i.e., absolute value) of the variable v after evaluating t in any run of the program. See [6] for the automatic computation of size bounds.

**Definition 10 (Size Bound** [6,18]**).** *The function* SB : (T × PV) → B *is a* size bound *if for all* (t, v) ∈ T × PV *and all states* σ<sub>0</sub> ∈ Σ *we have* |σ<sub>0</sub>|(SB(t, v)) ≥ sup{|σ′(v)| | ∃(ℓ′, σ′). (ℓ<sub>0</sub>, σ<sub>0</sub>) (→<sup>∗</sup> ◦ →<sub>t</sub>) (ℓ′, σ′)}*.*

To compute global from local runtime bounds RB<sub>loc</sub>(→<sub>r</sub> T′<sub>></sub>) and size bounds SB(r, v), Theorem 11 generalizes the approach of [6,18]. Each local run is started by an entry transition r. Hence, we use an already computed global runtime bound RB<sub>glo</sub>(r) to over-approximate the number of times that such a local run is started. To over-approximate the size of each variable v when entering the local run, we instantiate it by the size bound SB(r, v). So size bounds on previous transitions are needed to compute runtime bounds, and similarly, runtime bounds are needed to compute size bounds in [6]. For any bound b, "b [v/SB(r, v) | v ∈ PV]" results from b by replacing every program variable v by SB(r, v). Here, weak monotonic increase of b ensures that the over-approximation of the variables v in b by SB(r, v) indeed also leads to an over-approximation of b. The analysis starts with an *initial* runtime bound RB<sub>glo</sub> and an *initial* size bound SB which map all transitions resp. all pairs from T × PV to ω, except for the transitions t which do not occur in cycles of T, where RB<sub>glo</sub>(t) = 1. Afterwards, RB<sub>glo</sub> and SB are refined repeatedly, where we alternate between computing runtime and size bounds.

**Theorem 11 (Computing Global Runtime Bounds).** *Let* RB<sub>glo</sub> *be a global runtime bound,* SB *be a size bound, and* ∅ ≠ T′<sub>></sub> ⊆ T′ ⊆ T *such that* T′ *contains no initial transitions. Moreover, let* RB<sub>loc</sub> *be a local runtime bound for* T′<sub>></sub> *w.r.t.* T′*. Then* RB′<sub>glo</sub> *is also a global runtime bound, where for all* t ∈ T *we define:*

$$\mathcal{R}\mathcal{B}\_{glo}^{\prime}(t) = \begin{cases} \mathcal{R}\mathcal{B}\_{glo}(t), & \text{if } t \in \mathcal{T} \backslash \mathcal{T}\_{>}^{\prime} \\ \sum\_{r \in \mathcal{E}\_{T'}} \mathcal{R}\mathcal{B}\_{glo}(r) \cdot (\mathcal{R}\mathcal{B}\_{loc}(\to\_{r} \mathcal{T}\_{>}^{\prime}) \left[v/\mathcal{S}\mathcal{B}(r,v) \mid v \in \mathcal{P}\mathcal{V}\right]), \text{ if } t \in \mathcal{T}\_{>}^{\prime} \end{cases}$$

*Example 12.* For the example in Fig. 1, we first use T′<sub>></sub> = {t<sub>2</sub>} and T′ = T \ {t<sub>0</sub>}. With the ranking function x<sub>4</sub> one obtains RB<sub>loc</sub>(→<sub>t0</sub> T′<sub>></sub>) = x<sub>4</sub>, since t<sub>2</sub> decreases the value of x<sub>4</sub> and no transition increases it. Then we can infer the global runtime bound RB<sub>glo</sub>(t<sub>2</sub>) = RB<sub>glo</sub>(t<sub>0</sub>) · (x<sub>4</sub> [v/SB(t<sub>0</sub>, v) | v ∈ PV]) = x<sub>4</sub> as RB<sub>glo</sub>(t<sub>0</sub>) = 1 (since t<sub>0</sub> is evaluated at most once) and SB(t<sub>0</sub>, x<sub>4</sub>) = x<sub>4</sub> (since t<sub>0</sub> does not change any variables). Similarly, we can infer RB<sub>glo</sub>(t<sub>1</sub>) = RB<sub>glo</sub>(t<sub>3</sub>) = RB<sub>glo</sub>(t<sub>4</sub>) = x<sub>4</sub>.

For T′<sub>></sub> = T′ = {t<sub>5</sub>}, our twn-approach in Sect. 4 will infer the local runtime bound RB<sub>loc</sub> : E<sub>{t5}</sub> → B with RB<sub>loc</sub>(→<sub>t1</sub> {t<sub>5</sub>}) = 4 · x<sub>2</sub> + 3 and RB<sub>loc</sub>(→<sub>t4</sub> {t<sub>5</sub>}) = 4 · x<sub>2</sub> + 4 · x<sub>3</sub><sup>3</sup> + 4 · x<sub>3</sub><sup>5</sup> + 3 in Example 30. By Theorem 11 we obtain the global bound

$$\begin{array}{lcl} \mathcal{R}\mathcal{B}\_{glo}(t\_{5}) &=& \mathcal{R}\mathcal{B}\_{glo}(t\_{1}) \cdot (\mathcal{R}\mathcal{B}\_{loc}(\rightarrow\_{t\_{1}}\{t\_{5}\})[v/\mathcal{S}\mathcal{B}(t\_{1},v) \mid v \in \mathcal{P}\mathcal{V}]) \; + \\ && \mathcal{R}\mathcal{B}\_{glo}(t\_{4}) \cdot (\mathcal{R}\mathcal{B}\_{loc}(\rightarrow\_{t\_{4}}\{t\_{5}\})[v/\mathcal{S}\mathcal{B}(t\_{4},v) \mid v \in \mathcal{P}\mathcal{V}]) \\ &=& x\_{4} \cdot (4 \cdot x\_{5} + 3) + x\_{4} \cdot (4 \cdot x\_{5} + 4 \cdot 5^{3} + 4 \cdot 5^{5} + 3) \\ && \qquad (\text{as } \mathcal{S}\mathcal{B}(t\_{1}, x\_{2}) = \mathcal{S}\mathcal{B}(t\_{4}, x\_{2}) = x\_{5} \text{ and } \mathcal{S}\mathcal{B}(t\_{4}, x\_{3}) = 5) \\ &=& 8 \cdot x\_{4} \cdot x\_{5} + 13006 \cdot x\_{4}. \end{array}$$

Thus, rc(σ<sub>0</sub>) ∈ O(n<sup>2</sup>), where n is the largest initial absolute value of all program variables. While the approach of [6,18] was limited to local bounds resulting from ranking functions, here we need our Theorem 11. It allows us to use both local bounds resulting from twn-loops (for the non-linear transition t<sub>5</sub>, where tools based on ranking functions cannot infer a bound, see Sect. 6) and local bounds resulting from ranking functions (for t<sub>1</sub>,...,t<sub>4</sub>, since our twn-approach of Sect. 4 and 5 is limited to so-called simple cycles and cannot handle the full program).
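The arithmetic of this instantiation can be spot-checked directly (a sketch; the function name is ours):

```python
def rb_glo_t5(x4, x5):
    # Instantiates the two local bounds from Example 30 with the size bounds
    # SB(t1, x2) = SB(t4, x2) = x5 and SB(t4, x3) = 5, then sums them as in
    # Theorem 11.
    via_t1 = x4 * (4 * x5 + 3)
    via_t4 = x4 * (4 * x5 + 4 * 5 ** 3 + 4 * 5 ** 5 + 3)
    return via_t1 + via_t4

# Agrees with the simplified bound 8*x4*x5 + 13006*x4 on all sampled inputs.
assert all(rb_glo_t5(a, b) == 8 * a * b + 13006 * a
           for a in range(10) for b in range(10))
```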

In contrast to [6,18], we allow different local bounds for different entry transitions in Definition 9 and Theorem 11. Our example demonstrates that this can indeed lead to a smaller asymptotic bound for the whole program: by distinguishing the cases where t<sub>5</sub> is reached via t<sub>1</sub> or t<sub>4</sub>, we end up with a quadratic bound, because the local bound RB<sub>loc</sub>(→<sub>t1</sub> {t<sub>5</sub>}) is linear and, while x<sub>3</sub> occurs with degrees 5 and 3 in RB<sub>loc</sub>(→<sub>t4</sub> {t<sub>5</sub>}), the size bound for x<sub>3</sub> is constant after t<sub>3</sub> and t<sub>4</sub>.

To improve size and runtime bounds repeatedly, we treat the strongly connected components (SCCs)<sup>1</sup> of the program in topological order such that

<sup>1</sup> As usual, a graph is *strongly connected* if there is a path from every node to every other node. A *strongly connected component* is a maximal strongly connected subgraph.

improved bounds for previous transitions are already available when handling the next SCC. We first try to infer local runtime bounds by multiphase linear ranking functions (see [18], which also contains a heuristic for choosing T′<sub>></sub> and T′ when using ranking functions). If ranking functions do not yield finite local bounds for all transitions of the SCC, then we apply the twn-technique from Sect. 4 and 5 on the remaining unbounded transitions (see Sect. 5 for choosing T′<sub>></sub> and T′ in that case). Afterwards, the global runtime bound is updated according to Theorem 11.

## **4 Local Runtime Bounds for Twn-Self-Loops**

In Sect. 4.1 we recapitulate twn-loops and their termination in our setting. Then in Sect. 4.2 we present a (complete) algorithm to infer polynomial runtime bounds for all terminating twn-loops. Compared to [19], we increased its precision considerably by computing bounds that take the different roles of the variables into account and by using over-approximations to remove monomials. Moreover, we show how our algorithm can be used to infer local runtime bounds for twn-loops occurring in integer programs. Section 5 will show that our algorithm can also be applied to infer runtime bounds for larger cycles in programs instead of just self-loops.

### **4.1 Termination of Twn-Loops**

Definition 13 extends the definition of twn-loops in [15,19] by an initial transition and an update-invariant. Here, ψ is an *update-invariant* if |= ψ → η(ψ) where η is the update of the transition (i.e., invariance must hold independent of the guard).

**Definition 13 (Twn-Loop).** *An integer program* (PV, L, ℓ<sub>0</sub>, T) *is a* triangular weakly non-linear loop (twn-loop) *if* PV = {x<sub>1</sub>,...,x<sub>d</sub>} *for some* d ≥ 1*,* L = {ℓ<sub>0</sub>, ℓ}*, and* T = {t<sub>0</sub>, t} *with* t<sub>0</sub> = (ℓ<sub>0</sub>, ψ, id, ℓ) *and* t = (ℓ, ϕ, η, ℓ) *for some* ψ, ϕ ∈ F(PV) *with* |= ψ → η(ψ)*, where* id(v) = v *for all* v ∈ PV*, and for all* 1 ≤ i ≤ d *we have* η(x<sub>i</sub>) = c<sub>i</sub> · x<sub>i</sub> + p<sub>i</sub> *for some* c<sub>i</sub> ∈ Z *and some polynomial* p<sub>i</sub> ∈ Z[x<sub>i+1</sub>,...,x<sub>d</sub>]*. We often denote the loop by* (ψ, ϕ, η) *and refer to* ψ*,* ϕ*,* η *as its (update-)invariant, guard, and update, respectively. If* c<sub>i</sub> ≥ 0 *holds for all* 1 ≤ i ≤ d*, then the program is a* non-negative *triangular weakly non-linear loop (tnn-loop).*

*Example 14.* The program consisting of the initial transition (ℓ<sub>0</sub>, true, id, ℓ<sub>3</sub>) and the self-loop t<sub>5</sub> in Fig. 1 is a twn-loop (corresponding to the **while**-loop (1)). This loop terminates as every iteration increases x<sub>1</sub><sup>2</sup> by a factor of 4 whereas x<sub>2</sub> is only tripled. Thus, x<sub>1</sub><sup>2</sup> + x<sub>3</sub><sup>5</sup> eventually outgrows the value of x<sub>2</sub>.

To transform programs into twn- or tnn-form, one can combine subsequent transitions by *chaining*. Here, similar to states σ, we also apply the update η to polynomials and formulas by replacing each program variable v by η(v).

**Definition 15 (Chaining).** *Let* t<sub>1</sub>,...,t<sub>n</sub> *be a sequence of transitions without temporary variables where* t<sub>i</sub> = (ℓ<sub>i</sub>, ϕ<sub>i</sub>, η<sub>i</sub>, ℓ<sub>i+1</sub>) *for all* 1 ≤ i ≤ n*, i.e., the target location of* t<sub>i</sub> *is the start location of* t<sub>i+1</sub>*. We may have* t<sub>i</sub> = t<sub>j</sub> *for* i ≠ j*, i.e., a transition may occur several times in the sequence. Then the transition* t<sub>1</sub> ⋆ ... ⋆ t<sub>n</sub> = (ℓ<sub>1</sub>, ϕ, η, ℓ<sub>n+1</sub>) *results from* chaining t<sub>1</sub>,...,t<sub>n</sub> *where*

ϕ = ϕ<sub>1</sub> ∧ η<sub>1</sub>(ϕ<sub>2</sub>) ∧ η<sub>2</sub>(η<sub>1</sub>(ϕ<sub>3</sub>)) ∧ ... ∧ η<sub>n−1</sub>(...η<sub>1</sub>(ϕ<sub>n</sub>)...) *and* η(v) = η<sub>n</sub>(...η<sub>1</sub>(v)...) *for all* v ∈ PV*, i.e.,* η = η<sub>n</sub> ◦ ... ◦ η<sub>1</sub>*.*

Similar to [15,19], we can restrict ourselves to tnn-loops, since chaining transforms any twn-loop L into a tnn-loop L ⋆ L. Chaining preserves the termination behavior, and a bound on L ⋆ L's runtime can be transformed into a bound for L.

**Lemma 16 (Chaining Preserves Asymptotic Runtime, see** [19, Lemma 18]**).** *For the twn-loop* L = (ψ, ϕ, η) *with the transitions* t<sub>0</sub> = (ℓ<sub>0</sub>, ψ, id, ℓ) *and* t = (ℓ, ϕ, η, ℓ) *and runtime complexity* rc<sub>L</sub>*, the program* L ⋆ L = (ψ, ϕ ∧ η(ϕ), η ◦ η) *with the transitions* t<sub>0</sub> *and* t ⋆ t *is a tnn-loop. For its runtime complexity* rc<sub>L⋆L</sub>*, we have* 2 · rc<sub>L⋆L</sub>(σ) ≤ rc<sub>L</sub>(σ) ≤ 2 · rc<sub>L⋆L</sub>(σ) + 1 *for all* σ ∈ Σ*.*

*Example 17.* The program of Example 14 is only a twn-loop and not a tnn-loop, as $x_1$ occurs with the negative coefficient $-2$ in its own update. Hence, we chain the loop and consider $t_5 \star t_5$. The update of $t_5 \star t_5$ is $(\eta \circ \eta)(x_1) = 4 \cdot x_1$, $(\eta \circ \eta)(x_2) = 9 \cdot x_2 - 8 \cdot x_3^3$, and $(\eta \circ \eta)(x_3) = x_3$. To ease the presentation, in this example we keep the guard $\varphi$ instead of using $\varphi \land \eta(\varphi)$ (omitting $\eta(\varphi)$ from the conjunction does not decrease the runtime complexity).

Our algorithm starts by computing a closed form for the loop update, which describes the values of the program variables after $n$ iterations of the loop. Formally, a tuple of arithmetic expressions $\mathrm{cl}_{\mathbf{x}}^n = (\mathrm{cl}_{x_1}^n, \ldots, \mathrm{cl}_{x_d}^n)$ over the variables $\mathbf{x} = (x_1, \ldots, x_d)$ and the distinguished variable $n$ is a *(normalized) closed form* for the update $\eta$ with *start value* $n_0 \geq 0$ if for all $1 \leq i \leq d$ and all $\sigma : \{x_1, \ldots, x_d, n\} \to \mathbb{Z}$ with $\sigma(n) \geq n_0$, we have $\sigma(\mathrm{cl}_{x_i}^n) = \sigma(\eta^n(x_i))$. As shown in [14,15,19], for tnn-loops such a normalized closed form and the start value $n_0$ can be computed by handling one variable after the other, and these normalized closed forms can be represented as so-called *normalized poly-exponential expressions*. Here, $\mathbb{N}_{\geq m}$ stands for $\{x \in \mathbb{N} \mid x \geq m\}$.

**Definition 18 (Normalized Poly-Exponential Expression [14,15,19]).** *Let $\mathcal{PV} = \{x_1, \ldots, x_d\}$. Then the set of all* normalized poly-exponential expressions *is* $\mathbb{NPE} = \{\sum_{j=1}^{\ell} p_j \cdot n^{a_j} \cdot b_j^n \mid \ell, a_j \in \mathbb{N},\ p_j \in \mathbb{Q}[\mathcal{PV}],\ b_j \in \mathbb{N}_{\geq 1}\}$.*

*Example 19.* A normalized closed form (with start value $n_0 = 0$) for the tnn-loop in Example 17 is $\mathrm{cl}_{x_1}^n = x_1 \cdot 4^n$, $\mathrm{cl}_{x_2}^n = (x_2 - x_3^3) \cdot 9^n + x_3^3$, and $\mathrm{cl}_{x_3}^n = x_3$.
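This closed form can be validated directly against the iterated chained update from Example 17. A quick sanity check of ours:

```python
# Verify the normalized closed form of Example 19 against the iterated
# update of the chained loop t5 * t5 from Example 17.

def eta2(s):
    x1, x2, x3 = s
    return (4 * x1, 9 * x2 - 8 * x3**3, x3)

def closed_form(s, n):
    x1, x2, x3 = s
    return (x1 * 4**n, (x2 - x3**3) * 9**n + x3**3, x3)

for start in [(1, 5, 2), (-3, 0, 1), (7, -2, -4)]:
    s = start
    for n in range(8):
        assert closed_form(start, n) == s   # cl^n agrees with eta^n
        s = eta2(s)
```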

Using the normalized closed form, similar to [15] one can express non-termination of a tnn-loop $(\psi, \varphi, \eta)$ by the formula

$$\exists\, \mathbf{x} \in \mathbb{Z}^d,\ m \in \mathbb{N}.\ \forall n \in \mathbb{N}_{\geq m}.\ \psi \land \varphi[\mathbf{x}/\mathrm{cl}_{\mathbf{x}}^n]. \tag{2}$$

Here, $\varphi[\mathbf{x}/\mathrm{cl}_{\mathbf{x}}^n]$ means that each variable $x_i$ in $\varphi$ is replaced by $\mathrm{cl}_{x_i}^n$. Since $\psi$ is an update-invariant, if $\psi$ holds, then $\psi[\mathbf{x}/\mathrm{cl}_{\mathbf{x}}^n]$ holds as well for all $n \geq n_0$. Hence, whenever $\forall n \in \mathbb{N}_{\geq m}.\ \psi \land \varphi[\mathbf{x}/\mathrm{cl}_{\mathbf{x}}^n]$ holds, then $\mathrm{cl}_{\mathbf{x}}^{\max\{n_0, m\}}$ witnesses non-termination. Thus, invalidity of (2) is equivalent to termination of the loop.

Normalized poly-exponential expressions have the advantage that it is always clear which addend determines their asymptotic growth when increasing $n$. So as in [15], (2) can be transformed into an existential formula, and we use an SMT solver to prove its invalidity in order to prove termination of the loop. As shown in [15, Theorem 42], non-termination of twn-loops over $\mathbb{Z}$ is semi-decidable, and deciding termination is Co-NP-complete if the loop is linear and the eigenvalues of the update matrix are rational.

#### **4.2 Runtime Bounds for Twn-Loops via Stabilization Thresholds**

As observed in [19], since the closed forms of tnn-loops are poly-exponential expressions that are weakly monotonic in $n$, every tnn-loop $(\psi, \varphi, \eta)$ *stabilizes* for each input $\mathbf{e} \in \mathbb{Z}^d$: there is a number of loop iterations (a *stabilization threshold* $\mathrm{sth}_{(\psi,\varphi,\eta)}(\mathbf{e})$) such that the truth value of the loop guard $\varphi$ does not change anymore when performing further loop iterations. Hence, the runtime of every terminating tnn-loop is bounded by its stabilization threshold.

**Definition 20 (Stabilization Threshold).** *Let $(\psi, \varphi, \eta)$ be a tnn-loop with $\mathcal{PV} = \{x_1, \ldots, x_d\}$. For each $\mathbf{e} = (e_1, \ldots, e_d) \in \mathbb{Z}^d$, let $\sigma_{\mathbf{e}} \in \Sigma$ with $\sigma_{\mathbf{e}}(x_i) = e_i$ for all $1 \leq i \leq d$. Let $\Psi \subseteq \mathbb{Z}^d$ such that $\mathbf{e} \in \Psi$ iff $\sigma_{\mathbf{e}}(\psi)$ holds. Then $\mathrm{sth}_{(\psi,\varphi,\eta)} : \mathbb{Z}^d \to \mathbb{N}$ is the* stabilization threshold *of $(\psi, \varphi, \eta)$ if for all $\mathbf{e} \in \Psi$, $\mathrm{sth}_{(\psi,\varphi,\eta)}(\mathbf{e})$ is the smallest number such that $\sigma_{\mathbf{e}}(\eta^n(\varphi)) \leftrightarrow \sigma_{\mathbf{e}}(\eta^{\mathrm{sth}_{(\psi,\varphi,\eta)}(\mathbf{e})}(\varphi))$ holds for all $n \geq \mathrm{sth}_{(\psi,\varphi,\eta)}(\mathbf{e})$.*

For the tnn-loop from Example 17, it will turn out that $2 \cdot x_2 + 2 \cdot x_3^3 + 2 \cdot x_3^5 + 1$ is an upper bound on its stabilization threshold, see Example 28.

To compute such upper bounds on a tnn-loop's stabilization threshold (i.e., upper bounds on its runtime if the loop is terminating), we now present a construction based on *monotonicity thresholds*, which are computable [19, Lemma 12].

**Definition 21 (Monotonicity Threshold [19]).** *Let $(b_1, a_1), (b_2, a_2) \in \mathbb{N}^2$ such that $(b_1, a_1) >_{\mathrm{lex}} (b_2, a_2)$ (i.e., $b_1 > b_2$, or both $b_1 = b_2$ and $a_1 > a_2$). For any $k \in \mathbb{N}_{\geq 1}$, the $k$-monotonicity threshold *of $(b_1, a_1)$ and $(b_2, a_2)$ is the smallest $n_0 \in \mathbb{N}$ such that for all $n \geq n_0$ we have $n^{a_1} \cdot b_1^n > k \cdot n^{a_2} \cdot b_2^n$.*

For example, the 1-monotonicity threshold of $(4, 0)$ and $(3, 1)$ is 7, as the largest root of $f(n) = 4^n - n \cdot 3^n$ is approximately $6.5139$.
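Since the first pair dominates lexicographically, the left-hand side eventually outgrows the right-hand side, so monotonicity thresholds can be found by a bounded search. A brute-force sketch of ours (the scan bound is a heuristic, not part of the computability argument in [19]):

```python
def monotonicity_threshold(k, pair1, pair2, scan=200):
    """Smallest n0 such that n^a1 * b1^n > k * n^a2 * b2^n for all n >= n0.
    Scans n < scan and assumes no failures beyond; this is valid here
    because (b1, a1) >_lex (b2, a2) makes the left side dominate."""
    (b1, a1), (b2, a2) = pair1, pair2
    last_fail = -1
    for n in range(scan):
        if n**a1 * b1**n <= k * n**a2 * b2**n:
            last_fail = n
    return last_fail + 1

# The example from the text: the largest root of 4^n - n*3^n is ~6.5139.
assert monotonicity_threshold(1, (4, 0), (3, 1)) == 7
```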

Our procedure again instantiates the variables of the loop guard $\varphi$ by the normalized closed form $\mathrm{cl}_{\mathbf{x}}^n$ of the loop's update. However, in the poly-exponential expressions $\sum_{j=1}^{\ell} p_j \cdot n^{a_j} \cdot b_j^n$ resulting from $\varphi[\mathbf{x}/\mathrm{cl}_{\mathbf{x}}^n]$, the corresponding technique of [19, Lemma 21] over-approximated the polynomials $p_j$ by a polynomial that did not distinguish the effects of the different variables $x_1, \ldots, x_d$. Such an over-approximation is only useful for a direct asymptotic bound on the runtime of the twn-loop, but it is too coarse for a useful *local* runtime bound within the complexity analysis of a larger program. For instance, in Example 12 it is crucial to obtain local bounds like $4 \cdot x_2 + 4 \cdot x_3^3 + 4 \cdot x_3^5 + 3$, which indicate that only the variable $x_3$ may influence the runtime with an exponent of 3 or 5. Thus, if the size of $x_3$ is bounded by a constant, then the resulting global bound becomes linear.

So we now improve precision and over-approximate the polynomials $p_j$ by the polynomial $\lceil \{p_1, \ldots, p_\ell\} \rceil$, which contains every monomial $x_1^{e_1} \cdot \ldots \cdot x_d^{e_d}$ of $\{p_1, \ldots, p_\ell\}$, using the absolute value of the largest coefficient with which the monomial occurs in $\{p_1, \ldots, p_\ell\}$. Thus, $\lceil \{x_3^3 - x_3^5,\ x_2 - x_3^3\} \rceil = x_2 + x_3^3 + x_3^5$. In the following, let $\mathbf{x} = (x_1, \ldots, x_d)$, and for $\mathbf{e} = (e_1, \ldots, e_d) \in \mathbb{N}^d$, let $\mathbf{x}^{\mathbf{e}}$ denote $x_1^{e_1} \cdot \ldots \cdot x_d^{e_d}$.

**Definition 22 (Over-Approximation of Polynomials).** *Let $p_1, \ldots, p_\ell \in \mathbb{Z}[\mathbf{x}]$, and for all $1 \leq j \leq \ell$, let $I_j \subseteq (\mathbb{Z} \setminus \{0\}) \times \mathbb{N}^d$ be the* index set *of the polynomial $p_j$, where $p_j = \sum_{(c, \mathbf{e}) \in I_j} c \cdot \mathbf{x}^{\mathbf{e}}$ and there are no $c \neq c'$ with $(c, \mathbf{e}), (c', \mathbf{e}) \in I_j$. For all $\mathbf{e} \in \mathbb{N}^d$ we define $c_{\mathbf{e}} \in \mathbb{N}$ with $c_{\mathbf{e}} = \max\{|c| \mid (c, \mathbf{e}) \in I_1 \cup \ldots \cup I_\ell\}$, where $\max \emptyset = 0$. Then the* over-approximation *of $p_1, \ldots, p_\ell$ is $\lceil \{p_1, \ldots, p_\ell\} \rceil = \sum_{\mathbf{e} \in \mathbb{N}^d} c_{\mathbf{e}} \cdot \mathbf{x}^{\mathbf{e}}$.*

Clearly, $\lceil \{p_1, \ldots, p_\ell\} \rceil$ indeed over-approximates the absolute value of each $p_j$.

**Corollary 23 (Soundness of $\lceil \{p_1, \ldots, p_\ell\} \rceil$).** *For all $\sigma : \{x_1, \ldots, x_d\} \to \mathbb{Z}$ and all $1 \leq j \leq \ell$, we have $|\sigma|(\lceil \{p_1, \ldots, p_\ell\} \rceil) \geq |\sigma(p_j)|$.*
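Definition 22 and Corollary 23 can be illustrated with the same dict-based polynomial representation used before (a sketch of ours, checked on the example $\lceil \{x_3^3 - x_3^5, x_2 - x_3^3\} \rceil$ from the text):

```python
import math

# Polynomials over (x1, x2, x3) as dicts: exponent tuple -> coefficient.
q1 = {(0, 0, 3): 1, (0, 0, 5): -1}   # x3^3 - x3^5
q2 = {(0, 1, 0): 1, (0, 0, 3): -1}   # x2 - x3^3

def over_approx(polys):
    """Definition 22: each monomial gets the maximal absolute coefficient
    with which it occurs in any of the polynomials."""
    result = {}
    for p in polys:
        for exps, c in p.items():
            result[exps] = max(result.get(exps, 0), abs(c))
    return result

def eval_poly(p, vals):
    return sum(c * math.prod(v**e for v, e in zip(vals, exps))
               for exps, c in p.items())

# The over-approximation of {x3^3 - x3^5, x2 - x3^3} is x2 + x3^3 + x3^5:
assert over_approx([q1, q2]) == {(0, 0, 3): 1, (0, 0, 5): 1, (0, 1, 0): 1}

# Corollary 23 on a few sample states: |sigma|(over-approx) >= |sigma(p_j)|.
for vals in [(-2, 3, -1), (0, -5, 2), (1, 1, 1)]:
    absvals = tuple(abs(v) for v in vals)
    bound = eval_poly(over_approx([q1, q2]), absvals)
    assert all(bound >= abs(eval_poly(q, vals)) for q in (q1, q2))
```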

A drawback is that $\lceil \{p_1, \ldots, p_\ell\} \rceil$ considers all monomials and, to obtain weakly monotonically increasing bounds from $\mathcal{B}$, it uses the absolute values of their coefficients. This can lead to polynomials of unnecessarily high degree. To improve the precision of the resulting bounds, we now allow over-approximating the poly-exponential expressions $\sum_{j=1}^{\ell} p_j \cdot n^{a_j} \cdot b_j^n$ which result from instantiating the variables of the loop guard by the closed form. For this over-approximation, we take the invariant $\psi$ of the tnn-loop into account. So while (2) showed that update-invariants $\psi$ can restrict the sets of possible witnesses for non-termination and thus simplify the termination proofs of twn-loops, we now show that preconditions $\psi$ can also be useful to improve the bounds on twn-loops.

More precisely, Definition 24 allows us to replace addends $p \cdot n^a \cdot b^n$ by $p \cdot n^i \cdot j^n$, where $(j, i) >_{\mathrm{lex}} (b, a)$ if the monomial $p$ is always positive (when the precondition $\psi$ is fulfilled), and where $(b, a) >_{\mathrm{lex}} (j, i)$ if $p$ is always non-positive.

**Definition 24 (Over-Approximation of Poly-Exponential Expressions).** *Let $\psi \in \mathcal{F}(\mathcal{PV})$ and let $\mathrm{npe} = \sum_{(p,a,b) \in \Lambda} p \cdot n^a \cdot b^n \in \mathbb{NPE}$, where $\Lambda$ is a set of tuples $(p, a, b)$ containing a monomial*<sup>2</sup> *$p$ and two numbers $a, b \in \mathbb{N}$. Here, we*

<sup>2</sup> Here, we consider monomials of the form $p = c \cdot x_1^{e_1} \cdot \ldots \cdot x_d^{e_d}$ with coefficients $c \in \mathbb{Q}$.

*may have $(p, a, b), (p', a, b) \in \Lambda$ for $p \neq p'$. Let $\Delta, \Gamma \subseteq \Lambda$ such that $\models \psi \to (p > 0)$ holds for all $(p, a, b) \in \Delta$ and $\models \psi \to (p \leq 0)$ holds for all $(p, a, b) \in \Gamma$.*<sup>3</sup> *Then*

$$\lceil \mathrm{npe} \rceil_{\Delta,\Gamma}^{\psi} = \sum_{(p,a,b)\in\Delta\uplus\Gamma} p \cdot n^{i_{(p,a,b)}} \cdot j_{(p,a,b)}^n \;+ \sum_{(p,a,b)\in\Lambda\setminus(\Delta\uplus\Gamma)} p \cdot n^a \cdot b^n$$

*is an* over-approximation *of $\mathrm{npe}$ if $i_{(p,a,b)}, j_{(p,a,b)} \in \mathbb{N}$ are numbers such that $(j_{(p,a,b)}, i_{(p,a,b)}) >_{\mathrm{lex}} (b, a)$ holds if $(p, a, b) \in \Delta$ and $(b, a) >_{\mathrm{lex}} (j_{(p,a,b)}, i_{(p,a,b)})$ holds if $(p, a, b) \in \Gamma$. Note that $i_{(p,a,b)}$ or $j_{(p,a,b)}$ can also be 0.*

*Example 25.* Let $\mathrm{npe} = q_3 \cdot 16^n + q_2 \cdot 9^n + q_1 = q_3 \cdot 16^n + q_2' \cdot 9^n + q_2'' \cdot 9^n + q_1' + q_1''$, where $q_3 = -x_1^2$, $q_2 = q_2' + q_2''$, $q_2' = x_2$, $q_2'' = -x_3^3$, $q_1 = q_1' + q_1''$, $q_1' = x_3^3$, $q_1'' = -x_3^5$, and $\psi = (x_3 > 0)$. We can choose $\Delta = \{(x_3^3, 0, 1)\}$ since $\models \psi \to (x_3^3 > 0)$, and $\Gamma = \{(-x_3^5, 0, 1)\}$ since $\models \psi \to (-x_3^5 \leq 0)$. Moreover, we choose $j_{(x_3^3,0,1)} = 9$ and $i_{(x_3^3,0,1)} = 0$, which is possible since $(9, 0) >_{\mathrm{lex}} (1, 0)$. Similarly, we choose $j_{(-x_3^5,0,1)} = 0$ and $i_{(-x_3^5,0,1)} = 0$, since $(1, 0) >_{\mathrm{lex}} (0, 0)$. Thus, we replace $x_3^3$ and $-x_3^5$ by the larger addends $x_3^3 \cdot 9^n$ and $0$. The motivation for the latter is that this removes all addends with exponent 5 from $\mathrm{npe}$. The motivation for the former is that then we have both the addends $-x_3^3 \cdot 9^n$ and $x_3^3 \cdot 9^n$ in the expression, which cancel out, i.e., this removes all addends with exponent 3. Hence, we obtain $\lceil \mathrm{npe} \rceil_{\Delta,\Gamma}^{\psi} = p_2 \cdot 16^n + p_1 \cdot 9^n$ with $p_2 = -x_1^2$ and $p_1 = x_2$. To find a suitable over-approximation which removes addends with high exponents, our implementation uses a heuristic for the choice of $\Delta$, $\Gamma$, $i_{(p,a,b)}$, and $j_{(p,a,b)}$.
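The domination claimed by Lemma 26 below can be checked numerically for this example. A sanity check of ours (the lemma only guarantees domination for $n \geq D$; in this particular example it already holds for all $n \geq 0$, since the difference is $x_3^3 \cdot (9^n - 1) + x_3^5 \geq 0$ whenever $x_3 > 0$):

```python
# Numeric check on Example 25: under psi = (x3 > 0), the over-approximation
# -x1^2*16^n + x2*9^n dominates npe = -x1^2*16^n + (x2-x3^3)*9^n + x3^3 - x3^5.

def npe(x1, x2, x3, n):
    return -x1**2 * 16**n + (x2 - x3**3) * 9**n + x3**3 - x3**5

def npe_over(x1, x2, x3, n):
    return -x1**2 * 16**n + x2 * 9**n

for x1 in range(-3, 4):
    for x2 in range(-3, 4):
        for x3 in range(1, 4):          # psi = (x3 > 0)
            for n in range(6):
                assert npe_over(x1, x2, x3, n) >= npe(x1, x2, x3, n)
```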

The following lemma shows the soundness of the over-approximation $\lceil \mathrm{npe} \rceil_{\Delta,\Gamma}^{\psi}$.

**Lemma 26 (Soundness of $\lceil \mathrm{npe} \rceil_{\Delta,\Gamma}^{\psi}$).** *Let $\psi$, $\mathrm{npe}$, $\Delta$, $\Gamma$, $i_{(p,a,b)}$, $j_{(p,a,b)}$, and $\lceil \mathrm{npe} \rceil_{\Delta,\Gamma}^{\psi}$ be as in Definition 24, and let*

$$D_{\lceil \mathrm{npe} \rceil_{\Delta,\Gamma}^{\psi}} = \max\big(\{\text{1-monotonicity threshold of } (j_{(p,a,b)}, i_{(p,a,b)}) \text{ and } (b, a) \mid (p, a, b) \in \Delta\}\ \cup\ \{\text{1-monotonicity threshold of } (b, a) \text{ and } (j_{(p,a,b)}, i_{(p,a,b)}) \mid (p, a, b) \in \Gamma\}\big).$$

*Then for all $\mathbf{e} \in \Psi$ and all $n \geq D_{\lceil \mathrm{npe} \rceil_{\Delta,\Gamma}^{\psi}}$, we have $\sigma_{\mathbf{e}}(\lceil \mathrm{npe} \rceil_{\Delta,\Gamma}^{\psi}) \geq \sigma_{\mathbf{e}}(\mathrm{npe})$.*

For any terminating tnn-loop $(\psi, \varphi, \eta)$, Theorem 27 now uses the new concepts of Definitions 22 and 24 to compute a polynomial $\mathrm{sth}'$ which is an upper bound on the loop's stabilization threshold (and hence, on its runtime). For any atom $\alpha = (s_1 < s_2)$ (resp. $s_2 - s_1 > 0$) in the loop guard $\varphi$, let $\mathrm{npe}_\alpha \in \mathbb{NPE}$ be the poly-exponential expression which results from multiplying $(s_2 - s_1)[\mathbf{x}/\mathrm{cl}_{\mathbf{x}}^n]$ with the least common multiple of all denominators occurring in $(s_2 - s_1)[\mathbf{x}/\mathrm{cl}_{\mathbf{x}}^n]$. Since the loop is terminating, for some of these atoms this expression will become non-positive for large enough $n$, and our goal is to compute bounds on their corresponding stabilization thresholds. First, one can replace $\mathrm{npe}_\alpha$ by an over-approximation $\lceil \mathrm{npe}_\alpha \rceil_{\Delta,\Gamma}^{\psi'}$, where $\psi' = (\psi \land \varphi)$ considers both the invariant $\psi$

<sup>3</sup> $\Delta$ and $\Gamma$ do not have to contain *all* such tuples, but can be (possibly empty) subsets.

and the guard $\varphi$. Let $\Psi' \subseteq \mathbb{Z}^d$ such that $\mathbf{e} \in \Psi'$ iff $\sigma_{\mathbf{e}}(\psi')$ holds. By Lemma 26 (i.e., $\sigma_{\mathbf{e}}(\lceil \mathrm{npe}_\alpha \rceil_{\Delta,\Gamma}^{\psi'}) \geq \sigma_{\mathbf{e}}(\mathrm{npe}_\alpha)$ for all $\mathbf{e} \in \Psi'$), it suffices to compute a bound on the stabilization threshold of $\lceil \mathrm{npe}_\alpha \rceil_{\Delta,\Gamma}^{\psi'}$ if it is always non-positive for large enough $n$, because if $\lceil \mathrm{npe}_\alpha \rceil_{\Delta,\Gamma}^{\psi'}$ is non-positive, then so is $\mathrm{npe}_\alpha$. We say that an over-approximation $\lceil \mathrm{npe}_\alpha \rceil_{\Delta,\Gamma}^{\psi'}$ is *eventually non-positive* iff, whenever $\lceil \mathrm{npe}_\alpha \rceil_{\Delta,\Gamma}^{\psi'} \neq \mathrm{npe}_\alpha$, one can show that for all $\mathbf{e} \in \Psi'$, $\sigma_{\mathbf{e}}(\lceil \mathrm{npe}_\alpha \rceil_{\Delta,\Gamma}^{\psi'})$ is non-positive for all large enough $n$.<sup>4</sup> Using over-approximations $\lceil \mathrm{npe}_\alpha \rceil_{\Delta,\Gamma}^{\psi'}$ can be advantageous because $\lceil \mathrm{npe}_\alpha \rceil_{\Delta,\Gamma}^{\psi'}$ may contain fewer monomials than $\mathrm{npe}_\alpha$, and thus the construction from Definition 22 can yield a polynomial of lower degree. So although $\mathrm{npe}_\alpha$'s stabilization threshold might be smaller than the one of $\lceil \mathrm{npe}_\alpha \rceil_{\Delta,\Gamma}^{\psi'}$, our technique might compute a smaller bound on the stabilization threshold when considering $\lceil \mathrm{npe}_\alpha \rceil_{\Delta,\Gamma}^{\psi'}$ instead of $\mathrm{npe}_\alpha$.

**Theorem 27 (Bound on Stabilization Threshold).** *Let $L = (\psi, \varphi, \eta)$ be a terminating tnn-loop, let $\psi' = (\psi \land \varphi)$, and let $\mathrm{cl}_{\mathbf{x}}^n$ be a normalized closed form for $\eta$ with start value $n_0$. For every atom $\alpha = (s_1 < s_2)$ in $\varphi$, let $\lceil \mathrm{npe}_\alpha \rceil_{\Delta,\Gamma}^{\psi'}$ be an eventually non-positive over-approximation of $\mathrm{npe}_\alpha$ and let $D_\alpha = D_{\lceil \mathrm{npe}_\alpha \rceil_{\Delta,\Gamma}^{\psi'}}$.*

*If $\lceil \mathrm{npe}_\alpha \rceil_{\Delta,\Gamma}^{\psi'} = \sum_{j=1}^{\ell} p_j \cdot n^{a_j} \cdot b_j^n$ with $p_j \neq 0$ for all $1 \leq j \leq \ell$ and $(b_\ell, a_\ell) >_{\mathrm{lex}} \ldots >_{\mathrm{lex}} (b_1, a_1)$, then let $C_\alpha = \max\{1, N_2, M_2, \ldots, N_\ell, M_\ell\}$, where we have:*

$$N_j = \begin{cases} 1, & \text{if } j = 2 \\ mt, & \text{if } j = 3 \\ \max\{mt, mt'\}, & \text{if } j > 3 \end{cases} \qquad M_j = \begin{cases} 0, & \text{if } b_j = b_{j-1} \\ \text{1-monotonicity threshold of} \\ (b_j, a_j) \text{ and } (b_{j-1}, a_{j-1} + 1), & \text{if } b_j > b_{j-1} \end{cases}$$

*Here, $mt$ is the $(j-2)$-monotonicity threshold of $(b_{j-1}, a_{j-1})$ and $(b_{j-2}, a_{j-2})$, and $mt' = \max\{\text{1-monotonicity threshold of } (b_{j-2}, a_{j-2}) \text{ and } (b_i, a_i) \mid 1 \leq i \leq j-3\}$. Let $Pol_\alpha = \lceil \{p_1, \ldots, p_{\ell-1}\} \rceil$, $Pol = \sum_{\text{atom } \alpha \text{ occurs in } \varphi} Pol_\alpha$, $C = \max\{C_\alpha \mid \text{atom } \alpha \text{ occurs in } \varphi\}$, $D = \max\{D_\alpha \mid \text{atom } \alpha \text{ occurs in } \varphi\}$, and $\mathrm{sth}' \in \mathbb{Z}[\mathbf{x}]$ with $\mathrm{sth}' = 2 \cdot Pol + \max\{n_0, C, D\}$. Then for all $\mathbf{e} \in \Psi'$, we have $|\sigma_{\mathbf{e}}|(\mathrm{sth}') \geq \mathrm{sth}_{(\psi,\varphi,\eta)}(\mathbf{e})$. If the tnn-loop has the initial transition $t_0$ and looping transition $t$, then $\mathcal{RB}_{\mathit{glo}}(t_0) = 1$ and $\mathcal{RB}_{\mathit{glo}}(t) = \mathrm{sth}'$ is a global runtime bound for $L$.*

*Example 28.* The guard $\varphi$ of the tnn-loop in Example 17 has the atoms $\alpha = (x_1^2 + x_3^5 < x_2)$, $\alpha' = (0 < x_1)$, and $\alpha'' = (0 < -x_1)$ (since $x_1 \neq 0$ is transformed into $\alpha' \lor \alpha''$). When instantiating the variables by the closed forms of Example 19 with start value $n_0 = 0$, Theorem 27 computes the bound 1 on the stabilization thresholds for $\alpha'$ and $\alpha''$. So the only interesting atom is $\alpha = (0 < s_2 - s_1)$ for $s_1 = x_1^2 + x_3^5$ and $s_2 = x_2$. We get $\mathrm{npe}_\alpha = (s_2 - s_1)[\mathbf{x}/\mathrm{cl}_{\mathbf{x}}^n] = q_3 \cdot 16^n + q_2 \cdot 9^n + q_1$, with $q_j$ as in Example 25.

<sup>4</sup> This can be shown similarly to the proof of (2) for (non-)termination of the loop. Thus, we transform $\exists\, \mathbf{x} \in \mathbb{Z}^d, m \in \mathbb{N}.\ \forall n \in \mathbb{N}_{\geq m}.\ \psi' \land \lceil \mathrm{npe}_\alpha \rceil_{\Delta,\Gamma}^{\psi'} > 0$ into an existential formula as in [15] and try to prove its invalidity with an SMT solver.

In the program of Fig. 1, the corresponding self-loop $t_5$ has two entry transitions $t_4$ and $t_1$, which result in two tnn-loops with the update-invariants $\psi_1 = \mathit{true}$ (resulting from transition $t_4$) and $\psi_2 = (x_3 > 0)$ (from $t_1$). So $\psi_2$ is an update-invariant of $t_5$ which always holds when reaching $t_5$ via transition $t_1$.

For $\psi_1 = \mathit{true}$, we choose $\Delta = \Gamma = \emptyset$, i.e., $\lceil \mathrm{npe}_\alpha \rceil_{\Delta,\Gamma}^{\psi_1'} = \mathrm{npe}_\alpha$. So we have $b_3 = 16$, $b_2 = 9$, $b_1 = 1$, and $a_j = 0$ for all $1 \leq j \leq 3$. We obtain

$M_2 = 0$, as 0 is the 1-monotonicity threshold of $(9, 0)$ and $(1, 1)$,

$M_3 = 0$, as 0 is the 1-monotonicity threshold of $(16, 0)$ and $(9, 1)$,

$N_2 = 1$ and $N_3 = 1$, as 1 is the 1-monotonicity threshold of $(9, 0)$ and $(1, 0)$.

Hence, we get $C = C_\alpha = \max\{1, N_2, M_2, N_3, M_3\} = 1$. So we obtain the runtime bound $\mathrm{sth}'_{\psi_1} = 2 \cdot \lceil \{q_1, q_2\} \rceil + \max\{n_0, C_\alpha\} = 2 \cdot x_2 + 2 \cdot x_3^3 + 2 \cdot x_3^5 + 1$ for the loop $t_5 \star t_5$ w.r.t. $\psi_1$. By Lemma 16, this means that $2 \cdot \mathrm{sth}'_{\psi_1} + 1 = 4 \cdot x_2 + 4 \cdot x_3^3 + 4 \cdot x_3^5 + 3$ is a runtime bound for the loop at transition $t_5$.
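This bound can be compared against the actual stabilization threshold of the chained loop by simulation. A sanity check of ours (guard reconstructed from Example 28's atoms; the simulation horizon is a heuristic that suffices here because the $16^n$ addend dominates quickly):

```python
# Simulate the chained loop t5 * t5 and compare its actual stabilization
# threshold (from which point on the guard's truth value no longer changes)
# with the bound sth'_{psi1} = 2*x2 + 2*x3^3 + 2*x3^5 + 1 from Example 28.

def guard(s):
    x1, x2, x3 = s
    return x1**2 + x3**5 < x2 and x1 != 0

def eta2(s):
    x1, x2, x3 = s
    return (4 * x1, 9 * x2 - 8 * x3**3, x3)

def stabilization_threshold(s, horizon=60):
    truth = []
    for _ in range(horizon):
        truth.append(guard(s))
        s = eta2(s)
    last_change = 0
    for n in range(1, horizon):
        if truth[n] != truth[n - 1]:
            last_change = n
    return last_change

for s in [(1, 100, 2), (3, 50, 1), (-2, 1000, 2)]:
    x1, x2, x3 = s
    bound = 2 * x2 + 2 * x3**3 + 2 * x3**5 + 1
    assert stabilization_threshold(s) <= bound
```

For the input $(1, 100, 2)$, the guard flips to false after 8 chained iterations and stays false, well below the bound $281$.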

For the update-invariant $\psi_2 = (x_3 > 0)$, we use the over-approximation $\lceil \mathrm{npe}_\alpha \rceil_{\Delta,\Gamma}^{\psi_2'} = p_2 \cdot 16^n + p_1 \cdot 9^n$ with $p_2 = -x_1^2$ and $p_1 = x_2$ from Example 25, where $\psi_2' = (\psi_2 \land \varphi)$ implies that it is always non-positive for large enough $n$. Now we obtain $M_2 = 0$ (the 1-monotonicity threshold of $(16, 0)$ and $(9, 1)$) and $N_2 = 1$, where $C = C_\alpha = \max\{1, N_2, M_2\} = 1$. Moreover, we have $D_\alpha = \max\{1, 0\} = 1$, since

1 is the 1-monotonicity threshold of $(9, 0)$ and $(1, 0)$, and

0 is the 1-monotonicity threshold of $(1, 0)$ and $(0, 0)$.

We now get the tighter bound $\mathrm{sth}'_{\psi_2} = 2 \cdot \lceil \{p_1\} \rceil + \max\{n_0, C_\alpha, D_\alpha\} = 2 \cdot x_2 + 1$ for $t_5 \star t_5$. So $t_5$'s runtime bound is $2 \cdot \mathrm{sth}'_{\psi_2} + 1 = 4 \cdot x_2 + 3$ when using the invariant $\psi_2$.

Theorem 29 shows how the technique of Lemma 16 and Theorem 27 can be used to compute local runtime bounds for twn-loops whenever such loops occur within an integer program. To this end, one needs the new Theorem 11, where in contrast to [6,18] these local bounds do not have to result from ranking functions.

To turn a self-loop $t$ and $r \in \mathcal{E}_{\{t\}}$ from a larger program $\mathcal{P}$ into a twn-loop $(\psi, \varphi, \eta)$, we use $t$'s guard $\varphi$ and update $\eta$. To obtain an update-invariant $\psi$, our implementation uses the Apron library [23] to compute invariants on a version of the full program where we remove all entry transitions $\mathcal{E}_{\{t\}}$ except $r$.<sup>5</sup> From the invariants computed for $t$, we take those that are also update-invariants of $t$.

**Theorem 29 (Local Bounds for Twn-Loops).** *Let $\mathcal{P} = (\mathcal{PV}, \mathcal{L}, \ell_0, \mathcal{T})$ be an integer program with $\mathcal{PV}' = \{x_1, \ldots, x_d\} \subseteq \mathcal{PV}$. Let $t = (\ell, \varphi, \eta, \ell) \in \mathcal{T}$ with $\varphi \in \mathcal{F}(\mathcal{PV}')$, $\eta(v) \in \mathbb{Z}[\mathcal{PV}']$ for all $v \in \mathcal{PV}'$, and $\eta(v) = v$ for all $v \in \mathcal{PV} \setminus \mathcal{PV}'$. For any entry transition $r \in \mathcal{E}_{\{t\}}$, let $\psi \in \mathcal{F}(\mathcal{PV}')$ such that $\models \psi \to \eta(\psi)$ and*

<sup>5</sup> Regarding invariants for the full program in the computation of local bounds for $t$ is possible since, in contrast to [6,18], our definition of local bounds from Definition 9 is restricted to states that are reachable from an initial configuration $(\ell_0, \sigma_0)$.

*such that $\sigma(\psi)$ holds whenever there is a $\sigma_0 \in \Sigma$ with $(\ell_0, \sigma_0) \to_{\mathcal{T}}^* \circ \to_r (\ell, \sigma)$. If $L = (\psi, \varphi, \eta)$ is a terminating tnn-loop, then let $\mathcal{RB}_{\mathit{loc}}(\to_r \{t\}) = \mathrm{sth}'$, where $\mathrm{sth}'$ is defined as in Theorem 27. If $L$ is a terminating twn-loop but no tnn-loop, let $\mathcal{RB}_{\mathit{loc}}(\to_r \{t\}) = 2 \cdot \mathrm{sth}' + 1$, where $\mathrm{sth}'$ is the bound of Theorem 27 computed for $L \star L$. Otherwise, let $\mathcal{RB}_{\mathit{loc}}(\to_r \{t\}) = \omega$. Then $\mathcal{RB}_{\mathit{loc}}$ is a local runtime bound for $\{t\} = \mathcal{T}'_{>} \subseteq \mathcal{T}' = \mathcal{T}$ in the program $\mathcal{P}$.*

*Example 30.* In Fig. 1, we consider the self-loop $t_5$ with $\mathcal{E}_{\{t_5\}} = \{t_4, t_1\}$ and the update-invariants $\psi_1 = \mathit{true}$ resp. $\psi_2 = (x_3 > 0)$. For $t_5$'s guard $\varphi$ and update $\eta$, both $(\psi_i, \varphi, \eta)$ are terminating twn-loops (see Example 14), i.e., (2) is invalid.

By Theorem 29 and Example 28, $\mathcal{RB}_{\mathit{loc}}$ with $\mathcal{RB}_{\mathit{loc}}(\to_{t_4} \{t_5\}) = 4 \cdot x_2 + 4 \cdot x_3^3 + 4 \cdot x_3^5 + 3$ and $\mathcal{RB}_{\mathit{loc}}(\to_{t_1} \{t_5\}) = 4 \cdot x_2 + 3$ is a local runtime bound for $\{t_5\} = \mathcal{T}'_{>} \subseteq \mathcal{T}' = \mathcal{T}$ in the program of Fig. 1. As shown in Example 12, Theorem 11 then yields the global runtime bound $\mathcal{RB}_{\mathit{glo}}(t_5) = 8 \cdot x_4 \cdot x_5 + 13006 \cdot x_4$.

### **5 Local Runtime Bounds for Twn-Cycles**

Section 4 introduced a technique to determine local runtime bounds for twn-self-loops in a program. To increase its applicability, we now extend it to larger cycles. For every entry transition of the cycle, we *chain* the transitions of the cycle, starting with the transition which follows the entry transition. In this way, we obtain loops consisting of a single transition. If the chained loop is a twn-loop, we can apply Theorem 29 to compute a local runtime bound. Any local bound on the chained transition is also a bound on each of the original transitions.<sup>6</sup>

By Theorem 29, we obtain a bound on the number of evaluations of the *complete cycle*. However, we also have to consider a *partial execution* which stops before traversing the full cycle. Therefore, we increase every local runtime bound by 1.

Note that this replacement of a cycle by a self-loop which results from chaining its transitions is only sound for *simple* cycles. A cycle is simple if each iteration through the cycle can only be done in a unique way. So the cycle must not have any subcycles, and there must not be any non-determinism concerning the next transition to be taken. Formally, $\mathcal{C} = \{t_1, \ldots, t_n\} \subset \mathcal{T}$ is a simple cycle if $\mathcal{C}$ does not contain temporary variables and there are pairwise different locations $\ell_1, \ldots, \ell_n$ such that $t_i = (\ell_i, \_, \_, \ell_{i+1})$ for $1 \leq i \leq n-1$ and $t_n = (\ell_n, \_, \_, \ell_1)$. This ensures that if there is an evaluation with $\to_{t_i} \circ \to_{\mathcal{C} \setminus \{t_i\}}^* \circ \to_{t_i}$, then the steps with $\to_{\mathcal{C} \setminus \{t_i\}}^*$ have the form $\to_{t_{i+1}} \circ \ldots \circ \to_{t_n} \circ \to_{t_1} \circ \ldots \circ \to_{t_{i-1}}$.
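Rotating a simple cycle so that chaining starts right after the entry transition can be sketched as follows (a toy example of ours; the two-transition cycle below is not the one from Fig. 2):

```python
def chain(transitions):
    """Chain a sequence of (guard, update) transitions as in Definition 15."""
    def guard(s):
        for phi, eta in transitions:
            if not phi(s):
                return False
            s = eta(s)
        return True
    def update(s):
        for _, eta in transitions:
            s = eta(s)
        return s
    return guard, update

def chain_cycle(cycle, i):
    """Start the chained iteration at t_{i+1} (0-based index i), i.e.,
    chain t_{i+1} * ... * t_n * t_1 * ... * t_i."""
    return chain(cycle[i:] + cycle[:i])

# Toy simple cycle: t_a increments, t_b doubles (states are plain ints).
t_a = (lambda x: x < 10, lambda x: x + 1)
t_b = (lambda x: True, lambda x: 2 * x)

_, upd_a = chain_cycle([t_a, t_b], 0)   # entry reaches t_a first
_, upd_b = chain_cycle([t_a, t_b], 1)   # entry reaches t_b first
assert upd_a(3) == 8    # (3 + 1) * 2
assert upd_b(3) == 7    # 3 * 2 + 1
```

The two rotations yield different self-loops, which is why Algorithm 1 computes an individual bound per entry transition.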

Algorithm 1 describes how to compute a local runtime bound for a simple cycle $\mathcal{C} = \{t_1, \ldots, t_n\}$ as above. In the loop of Line 2, we iterate over all entry transitions $r$ of $\mathcal{C}$. If $r$ reaches the transition $t_i$, then in Lines 3 and 4 we chain $t_i \star \ldots \star t_n \star t_1 \star \ldots \star t_{i-1}$, which corresponds to one iteration of the cycle starting

<sup>6</sup> This is sufficient for our improved definition of local bounds in Definition 9, where in contrast to [6,18] we do not require a bound on the *sum* but only on *each* transition in the considered set $\mathcal{T}'$. Moreover, here we again benefit from our extension to compute individual local bounds for different entry transitions.

**Algorithm 1.** Algorithm to Compute Local Runtime Bounds for Cycles

**input:** A program $(\mathcal{PV}, \mathcal{L}, \ell_0, \mathcal{T})$ and a simple cycle $\mathcal{C} = \{t_1, \ldots, t_n\} \subset \mathcal{T}$
**output:** A local runtime bound $\mathcal{RB}_{\mathit{loc}}$ for $\mathcal{C} = \mathcal{T}'_{>} \subseteq \mathcal{T}'$

**1** Set $\mathcal{RB}_{\mathit{loc}}(\to_r \mathcal{C}) = \omega$ for all $r \in \mathcal{E}_{\mathcal{C}}$.
**2** **forall** entry transitions $r \in \mathcal{E}_{\mathcal{C}}$ reaching some $t_i$ **do**
**3** &nbsp;&nbsp;&nbsp; Chain the transitions of the cycle, starting with $t_i$:
**4** &nbsp;&nbsp;&nbsp; $t = t_i \star \ldots \star t_n \star t_1 \star \ldots \star t_{i-1}$
**5** &nbsp;&nbsp;&nbsp; **if** a renaming of the variables turns $t$ into a twn-loop **then**
**6** &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Set $\mathcal{RB}_{\mathit{loc}}(\to_r \mathcal{C}) = 1 + \mathcal{RB}$, where $\mathcal{RB}$ is the bound computed for $t$ by Theorem 29.
**7** **return** local runtime bound $\mathcal{RB}_{\mathit{loc}}$.

**Fig. 2.** An Integer Program with a Nested Non-Self-Loop

in $t_i$. If a suitable renaming (and thus also reordering) of the variables turns the chained transition into a twn-loop, then we use Theorem 29 to compute a local runtime bound $\mathcal{RB}_{\mathit{loc}}(\to_r \mathcal{C})$ in Lines 5 and 6. If the chained transition does not give rise to a twn-loop, then $\mathcal{RB}_{\mathit{loc}}(\to_r \mathcal{C})$ is $\omega$ (Line 1). In practice, to use the twn-technique for a transition $t$ in a program, our tool KoAT searches for those simple cycles that contain $t$ and where the chained cycle is a twn-loop. Among those cycles, it chooses the one with the smallest runtime bounds for its entry transitions.

**Theorem 31 (Correctness of Algorithm 1).** *Let $\mathcal{P} = (\mathcal{PV}, \mathcal{L}, \ell_0, \mathcal{T})$ be an integer program and let $\mathcal{C} \subset \mathcal{T}$ be a simple cycle in $\mathcal{P}$. Then the result $\mathcal{RB}_{\mathit{loc}} : \mathcal{E}_{\mathcal{C}} \to \mathcal{B}$ of Algorithm 1 is a local runtime bound for $\mathcal{C} = \mathcal{T}'_{>} \subseteq \mathcal{T}'$.*

*Example 32.* We apply Algorithm 1 to the cycle $\mathcal{C} = \{t_{5a}, t_{5b}\}$ of the program in Fig. 2. $\mathcal{C}$'s entry transitions $t_1$ and $t_4$ both end in $\ell_3$. Chaining $t_{5a}$ and $t_{5b}$ yields the transition $t_5$ of Fig. 1, i.e., $t_5 = t_{5a} \star t_{5b}$. Thus, Algorithm 1 essentially transforms the program of Fig. 2 into Fig. 1. As in Examples 28 and 30, we obtain $\mathcal{RB}_{\mathit{loc}}(\to_{t_4} \mathcal{C}) = 1 + (2 \cdot \mathrm{sth}'_{\mathit{true}} + 1) = 4 \cdot x_2 + 4 \cdot x_3^3 + 4 \cdot x_3^5 + 4$ and $\mathcal{RB}_{\mathit{loc}}(\to_{t_1} \mathcal{C}) = 1 + (2 \cdot \mathrm{sth}'_{x_3>0} + 1) = 4 \cdot x_2 + 4$, resulting in the global runtime bound $\mathcal{RB}_{\mathit{glo}}(t_{5a}) = \mathcal{RB}_{\mathit{glo}}(t_{5b}) = 8 \cdot x_4 \cdot x_5 + 13008 \cdot x_4$, which again yields $\mathrm{rc}(\sigma_0) \in \mathcal{O}(n^2)$.

### **6 Conclusion and Evaluation**

We showed that results on subclasses of programs with computable complexity bounds like [19] are not only theoretically interesting, but also of important practical value. To our knowledge, our paper is the first to integrate such results into an incomplete approach for automated complexity analysis like [6,18]. For this integration, we developed several novel contributions which extend and improve the previous approaches in [6,18,19] substantially:


The need for these improvements is demonstrated by our leading example in Fig. 1 (where the contributions (a)–(d) are needed to infer quadratic runtime complexity) and by the example in Fig. 2 (which illustrates (e)). In this way, the power of automated complexity analysis is increased substantially, because now one can also infer runtime bounds for programs containing non-linear arithmetic.

To demonstrate the power of our approach, we evaluated the integration of our new technique for inferring local runtime bounds for twn-cycles in our re-implementation of the tool KoAT (written in OCaml) and compared the results to other state-of-the-art tools. To distinguish our re-implementation from the original version of the tool from [6], let KoAT1 refer to the tool from [6] and KoAT2 to our new re-implementation. KoAT2 applies a local control-flow refinement technique [18] (using the tool iRankFinder [8]) and preprocesses the program in the beginning, e.g., by extending the guards of transitions by invariants inferred using the Apron library [23]. For all occurring SMT problems, KoAT2 uses Z3 [28]. We tested the following configurations of KoAT2, which differ in the techniques used for the computation of local runtime bounds:


Existing approaches for automated complexity analysis are already very powerful on programs that only use linear arithmetic in their guards and updates.


**Fig. 3.** Evaluation on the Collection CINT<sup>+</sup>

The corresponding benchmark collections for *Complexity of Integer Transition Systems* (CITS) and *Complexity of* C *Integer Programs* (CINT) from the *Termination Problems Data Base* [33], which is used in the annual *Termination and Complexity Competition (TermComp)* [17], contain almost only examples with linear arithmetic. Here, the existing tools already infer finite runtimes for more than 89% of those examples in the collections CITS and CINT where this *might*<sup>7</sup> be possible.
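The 89% figure can be made concrete with the numbers from footnote 7 (LoAT proves unbounded runtime for 217 of the 781 CITS examples, and iRankFinder proves non-termination for 118 of the 484 CINT programs). The following small sketch only derives the candidate counts and the implied minimum number of solved examples; the exact solved counts per tool are not stated in the text.

```python
# Footnote 7 reports how many examples provably have no finite runtime
# bound; for the remaining examples, a finite runtime *might* be possible.
cits_total, cits_unbounded = 781, 217
cint_total, cint_nonterm = 484, 118

cits_possible = cits_total - cits_unbounded  # 564 candidates in CITS
cint_possible = cint_total - cint_nonterm    # 366 candidates in CINT

# "More than 89%" of these candidates are already solved by existing
# tools, i.e., at least this many examples per collection:
print(int(0.89 * cits_possible) + 1)  # 502
print(int(0.89 * cint_possible) + 1)  # 326
```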

The main benefit of our new integration of the twn-technique is that it also allows one to infer finite runtime bounds for programs that contain non-linear guards or updates. To demonstrate this, we extended both collections CITS and CINT by 20 examples representing typical programs of this kind, including several benchmarks from the literature [3,14,15,18,20,34] as well as our programs from Fig. 1 and 2. See [27] for a detailed list and description of these examples.

Figure 3 presents our evaluation on the collection CINT<sup>+</sup>, consisting of the 484 examples from CINT and our 20 additional examples for non-linear arithmetic. We refer to [27] for the (similar) results on the corresponding collection CITS<sup>+</sup>.

In the C programs of CINT<sup>+</sup>, all variables are interpreted as integers over Z (i.e., without overflows). For KoAT2 and KoAT1, we used Clang [7] and llvm2kittel [10] to transform C programs into integer transition systems as in Definition 2. We compare KoAT2 with KoAT1 [6] and the tools CoFloCo [11,12], MaxCore [2] with CoFloCo in the backend, and Loopus [31]. We do not compare with RaML [21], as it does not support programs whose complexity depends on (possibly negative) integers (see [29]). We also do not compare with PUBS [1] because, as stated in [9] by one of its authors, CoFloCo is stronger than PUBS. For the same reason, we only consider MaxCore with the backend CoFloCo instead of PUBS.

All tools were run inside an Ubuntu Docker container on a machine with an AMD Ryzen 7 3700X octa-core CPU and 48 GB of RAM. As in *TermComp*, we applied a timeout of 5 min for every program.

In Fig. 3, the first entry in every cell denotes the number of benchmarks from CINT<sup>+</sup> where the respective tool inferred the corresponding bound, and the number in brackets is the corresponding count when only regarding our 20 new examples for non-linear arithmetic. The runtime bounds computed by the tools are compared asymptotically, as functions of the largest initial absolute value n of all program variables. So, for instance, there are 26 + 231 = 257 programs in CINT<sup>+</sup> (5 of which come from our new examples) where KoAT2+TWN+MΦRF5 can show that rc(σ<sub>0</sub>) ∈ O(n) holds for all initial states σ<sub>0</sub> with |σ<sub>0</sub>(v)| ≤ n for all v ∈ PV. For 26 of these programs, KoAT2+TWN+MΦRF5 can even show that rc(σ<sub>0</sub>) ∈ O(1), i.e., that their runtime complexity is constant. Overall, this configuration succeeds on 344 examples: "< ∞" counts the examples where the respective tool computed a finite bound on the runtime complexity within the time limit. "AVG<sup>+</sup>(s)" is the average runtime in seconds of the tool on successful runs, i.e., runs where the tool inferred a finite time bound before reaching the timeout, whereas "AVG(s)" is the average runtime of the tool on all runs, including timeouts.

<sup>7</sup> The tool LoAT [13,16] proves unbounded runtime for 217 of the 781 examples from CITS, and iRankFinder [4,8] proves non-termination for 118 of the 484 programs of CINT.
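Since the bounds in a row of Fig. 3 are cumulative in the sense described above, the counts stated in the text for KoAT2+TWN+MΦRF5 determine how many solved programs fall beyond the linear class. A small sketch with only the numbers given in the text:

```python
# Reading one row of Fig. 3 (KoAT2+TWN+MPhiRF5, numbers as stated in the
# text; the per-class counts beyond O(n) are not given there, so we only
# derive what the text implies).
o1 = 26              # programs with constant runtime, rc(sigma_0) in O(1)
on_additional = 231  # additional programs with a linear bound
finite_total = 344   # the "< infinity" entry: all programs with a finite bound

linear_or_better = o1 + on_additional
print(linear_or_better)                 # 257 programs with rc(sigma_0) in O(n)
print(finite_total - linear_or_better)  # 87 programs solved with higher bounds
```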

On the original benchmark collection CINT, where very few examples contain non-linear arithmetic, integrating TWN into a configuration that already uses multiphase-linear ranking functions does not increase power much: KoAT2+TWN+MΦRF5 succeeds on 344 − 15 = 329 such programs, while KoAT2+MΦRF5 solves 328 − 1 = 327 examples. On the other hand, if one only has linear ranking functions, then an improvement via our twn-technique has similar effects as an improvement via multiphase-linear ranking functions (here, the success rate of KoAT2+MΦRF5 is similar to that of KoAT2+TWN+RF, which solves 341 − 15 = 326 such programs).
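The subtraction behind these scores on the original CINT collection can be spelled out explicitly; the following sketch just re-derives them from each configuration's CINT<sup>+</sup> total and its count on the 20 new non-linear examples (both as stated in the text).

```python
# Scores on the original CINT collection, obtained by subtracting the
# solved new non-linear examples from each configuration's CINT+ total
# (numbers as reported in the evaluation).
totals_on_cint_plus = {
    "KoAT2+TWN+MPhiRF5": (344, 15),  # (solved on CINT+, solved new examples)
    "KoAT2+MPhiRF5":     (328, 1),
    "KoAT2+TWN+RF":      (341, 15),
}
for config, (total, new) in totals_on_cint_plus.items():
    print(config, total - new)  # 329, 327, and 326, respectively
```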

But the main benefit of our technique is that it can also handle examples with non-linear arithmetic successfully. Here, our new technique is significantly more powerful than previous ones. The other tools and the configurations without TWN in Fig. 3 solve at most 2 of the 20 new examples. In contrast, KoAT2+TWN+RF and KoAT2+TWN+MΦRF5 both succeed on 15 of them.<sup>8</sup> In particular, our running examples from Fig. 1 and 2, and even isolated twn-loops like t<sub>5</sub> or the chained loop of t<sub>5a</sub> and t<sub>5b</sub> from Examples 14 and 17, can *only* be solved by KoAT2 with our twn-technique.

To summarize, our evaluations show that KoAT2 with the added twn-technique outperforms all other configurations and tools for automated complexity analysis on all considered benchmark sets (i.e., CINT<sup>+</sup>, CINT, CITS<sup>+</sup>, and CITS), and it is the only tool that is also powerful on examples with non-linear arithmetic.

KoAT's source code, a binary, and a Docker image are available at https://aprove-developers.github.io/KoAT_TWN/. The website also provides details on our experiments and *web interfaces* to run KoAT's configurations directly online.

**Acknowledgments.** We are indebted to M. Hark for many fruitful discussions about complexity, twn-loops, and KoAT. We are grateful to S. Genaim and J. J. Doménech for a suitable version of iRankFinder which we could use for control-flow refinement in KoAT's backend. Moreover, we thank A. Rubio and E. Martín-Martín for a static binary of MaxCore, A. Flores-Montoya and F. Zuleger for help in running CoFloCo and Loopus, F. Frohn for help and advice, and the reviewers for their feedback to improve the paper.

<sup>8</sup> One is the non-terminating leading example of [15], so at most 19 *might* terminate.

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Author Index**

Akshay, S. 671
Albert, Elvira 3
Alrabbaa, Christian 271
Baader, Franz 271
Barbosa, Haniel 15
Barrett, Clark 15, 95, 125
Berg, Jeremias 75
Bernreiter, Michael 331
Bidoit, Nicole 310
Bílková, Marta 429
Blaisdell, Eben 449
Borgwardt, Stefan 271
Bouziane, Hinde Lilia 359
Bozga, Marius 691
Bromberger, Martin 147
Brown, Chad E. 350
Bryant, Randal E. 106
Bueri, Lucas 691
Cailler, Julie 359
Cauli, Claudia 281
Chakraborty, Supratik 671
Chlebowski, Szymon 407
Dachselt, Raimund 271
Das, Anupam 509
Delahaye, David 359
Dill, David 125
Dixon, Clare 486
Dowek, Gilles 8
Draheim, Dirk 300
Duarte, André 169
Durán, Francisco 529
Eker, Steven 529
Escobar, Santiago 529
Felli, Paolo 36
Ferrari, Mauro 57
Fiorentini, Camillo 57
Frittella, Sabine 429
Frohn, Florian 712
Fujita, Tomohiro 388
Gallicchio, James 723
Giesl, Jürgen 712, 734
Girlando, Marianna 509
Gordillo, Pablo 3
Greati, Vitor 640
Grieskamp, Wolfgang 125
Haifani, Fajar 188, 208
Hernández-Cerezo, Alejandro 3
Heskes, Tom 597
Heule, Marijn J. H. 106
Holden, Sean B. 559
Holub, Štěpán 369
Hustadt, Ullrich 486
Ihalainen, Hannes 75
Indrzejczak, Andrzej 541
Iosif, Radu 691
Janota, Mikoláš 597
Järv, Priit 300
Järvisalo, Matti 75
Jukiewicz, Marcin 407
Kaliszyk, Cezary 350
Kanovich, Max 449
Koopmann, Patrick 188, 271
Korovin, Konstantin 169
Kozhemiachenko, Daniil 429
Kremer, Gereon 15, 95
Kutsia, Temur 578
Kuznetsov, Stepan L. 449
Lachnitt, Hanna 15
Lahav, Ori 468
Leidinger, Hendrik 228
Leszczyńska-Jasion, Dorota 407
Leutgeb, Lorenz 147
Lolic, Anela 331
Lommen, Nils 734
Ma, Yue 310
Maly, Jan 331
Mangla, Chaitanya 559
Marcos, João 640
Martí-Oliet, Narciso 529
Matsuzaki, Takuya 388
Méndez, Julián 271
Meseguer, José 529
Meyer, Fabian 734
Mitsch, Stefan 723
Montali, Marco 36
Nalon, Cláudia 486
Niemetz, Aina 15
Nötzli, Andres 15, 125
Ortiz, Magdalena 281
Ozdemir, Alex 15
Pal, Debtanu 671
Papacchini, Fabio 486
Park, Junkil 125
Pau, Cleo 578
Paulson, Lawrence C. 559
Piepenbrock, Jelle 597
Pimentel, Elaine 449
Piterman, Nir 281
Platzer, André 723
Popescu, Andrei 618
Preiner, Mathias 15
Qadeer, Shaz 125
Raška, Martin 369
Reeves, Joseph E. 106
Reynolds, Andrew 15, 95, 125
Robillard, Simon 359
Rodríguez-Núñez, Clara 3
Rosain, Johann 359
Rubio, Albert 3
Rubio, Rubén 529
Scedrov, Andre 449
Sheng, Ying 125
Sochański, Michał 407
Starosta, Štěpán 369
Suda, Martin 659
Talcott, Carolyn 529
Tammet, Tanel 300
Tan, Yong Kiam 723
Tinelli, Cesare 15, 95, 125
Tomczyk, Agata 407
Tourret, Sophie 188
Urban, Josef 597
Viswanathan, Arjun 15
Viteri, Scott 15
Weidenbach, Christoph 147, 188, 208, 228
Winkler, Sarah 36
Woltran, Stefan 331
Yamada, Akihisa 248
Yang, Hui 310
Zohar, Yoni 15, 125, 468