Naoki Katoh · Yuya Higashikawa · Hiro Ito · Atsuki Nagao · Tetsuo Shibuya · Adnan Sljoka · Kazuyuki Tanaka · Yushi Uno Editors

# Sublinear Computation Paradigm

Algorithmic Revolution in the Big Data Era


Editors Naoki Katoh Graduate School of Information Science University of Hyogo Kobe, Hyogo, Japan

Hiro Ito School of Informatics and Engineering University of Electro-Communications Chofu, Tokyo, Japan

Tetsuo Shibuya Human Genome Center University of Tokyo Minato, Tokyo, Japan

Kazuyuki Tanaka Graduate School of Information Science Tohoku University Sendai, Miyagi, Japan

Yuya Higashikawa Graduate School of Information Science University of Hyogo Kobe, Hyogo, Japan

Atsuki Nagao Department of Information Science Ochanomizu University Bunkyo, Tokyo, Japan

Adnan Sljoka Center for Advanced Intelligence Project RIKEN Chuo, Tokyo, Japan

Yushi Uno Graduate School of Engineering Osaka Prefecture University Sakai, Osaka, Japan

ISBN 978-981-16-4094-0 ISBN 978-981-16-4095-7 (eBook) https://doi.org/10.1007/978-981-16-4095-7

© The Editor(s) (if applicable) and The Author(s) 2022. This book is an open access publication. Open Access This book is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore

# Preface

This book gives an overview of cutting-edge work on a new paradigm called the "sublinear computation paradigm," which was proposed in the large multiyear academic research project "Foundations of Innovative Algorithms for Big Data" in Japan. In today's rapidly evolving age of big data, massive growth in data volumes has led to many new opportunities and uncharted areas of exploration, but has also brought new challenges. To handle the unprecedented explosion of big data sets in research, industry, and other areas of society, there is an urgent need to develop novel methods and approaches for big data analysis. To meet this need, we are pursuing innovative changes in algorithm theory for big data. For example, polynomial-time algorithms have thus far been regarded as "fast," but if we apply an *O*(*n*<sup>2</sup>)-time algorithm to a petabyte-scale or larger big data set, we will encounter problems in terms of computational resources or running time. To deal with this critical computational and algorithmic bottleneck, we require linear, sublinear, and constant-time algorithms. In this project, which ran from October 2014 to September 2021, we proposed the sublinear computation paradigm in order to support innovation in the big data era. We have created a foundation of innovative algorithms by developing computational procedures, data structures, and modeling techniques for big data. The project was organized into three teams that focus on sublinear algorithms, sublinear data structures, and sublinear modeling. Our work has produced high-level academic research results of strong computational and algorithmic interest, which are presented in this book.

This book consists of five parts: Part I, which consists of a single chapter introducing the concept of the sublinear computation paradigm; Parts II, III, and IV review results on sublinear algorithms, sublinear data structures, and sublinear modeling, respectively; and Part V presents some application results.

We deeply appreciate the members of this project and everyone else who was involved. This project was conducted as a subproject of the research project "Advanced Core Technologies for Big Data Integration," which was supervised by Prof. Masaru Kitsuregawa. We would like to express our gratitude to him and everyone involved in that project. We also thank the editorial office of Springer for the opportunity to publish this book.

Kobe, Japan Naoki Katoh

Tokyo, Japan Hiro Ito

Kobe, Japan Yuya Higashikawa

# Contents

#### Part I Introduction

#### Part II Sublinear Algorithms

#### Part III Sublinear Data Structures

#### Part IV Sublinear Modeling

#### Part V Applications


# **Part I Introduction**

# **Chapter 1 What Is the Sublinear Computation Paradigm?**

**Naoki Katoh and Hiro Ito**

**Abstract** This chapter introduces the "sublinear computation paradigm." A sublinear-time algorithm is an algorithm that runs in time sublinear in the size of the instance (input data). In other words, the running time is *o*(*n*), where *n* is the size of the instance. This century marks the start of the era of big data. In order to manage big data, polynomial-time algorithms, which are considered to be efficient, may sometimes be inadequate because they may require too much time or computational resources. In such cases, sublinear-time algorithms are expected to work well. We call this idea the "sublinear computation paradigm." A research project named "Foundations on Innovative Algorithms for Big Data (ABD)," in which this paradigm is the central concept, was started under the CREST program of the Japan Science and Technology Agency (JST) in October 2014 and concluded in September 2021. This book mainly introduces the results of this project.

#### **1.1 We Are in the Era of Big Data**

The twenty-first century can be called the era of big data. The number of webpages on the Internet was estimated to be more than 1 trillion (= 10<sup>12</sup>) in 2008 [22], and the number of websites has grown roughly tenfold in the ten years since [21]. Thus the number of webpages is now estimated to be more than 10 trillion (= 10<sup>13</sup>). If we assume that 10<sup>6</sup> bytes (≈ 10<sup>7</sup> bits) of data are contained in a single webpage on average,<sup>1</sup> then the total amount of data stored on the Internet would be more than 100 exabits (= 10<sup>20</sup> bits)! The various actions that everyone performs are collected by our smartphones and stored on storage devices around the world. The remarkable development of computer memory has made it possible to store this information.

N. Katoh

<sup>1</sup> Note that one 1080 × 1920 pixel digital photo consists of more than 2 × 10<sup>6</sup> pixels.

University of Hyogo, 8-2-1 Gakuennishi-machi, Nishi-ku, Kobe, Hyogo 651-2197, Japan e-mail: naoki.katoh@gsis.u-hyogo.ac.jp

H. Ito (B)

The University of Electro-Communications, 1-5-1 Chofugaoka, Chofu, Tokyo 182-8585, Japan e-mail: itohiro@uec.ac.jp

© The Author(s) 2022 N. Katoh et al. (eds.), *Sublinear Computation Paradigm*, https://doi.org/10.1007/978-981-16-4095-7\_1

However, the ability to store data and the ability to make good use of it are different problems. The peak data-transfer speed of IEEE 802.11ac is 6.9 Gbps. At this rate, it would take about 1.7 days to read 1 petabit (10<sup>15</sup> bits) of data. To read 1 exabit (10<sup>18</sup> bits) of data, we would need over 4 years! Although the speed of data transfer is expected to continue to increase, the amount of available data is expected to grow even faster.
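As a sanity check, the arithmetic behind these figures can be reproduced directly (6.9 Gbps is the peak link rate of IEEE 802.11ac cited above; the function name is ours):

```python
# Time needed to read data at the IEEE 802.11ac peak rate of 6.9 Gbps.
RATE_BPS = 6.9e9  # bits per second

def transfer_time_seconds(bits: float) -> float:
    """Seconds required to move `bits` of data at RATE_BPS."""
    return bits / RATE_BPS

petabit = 1e15
exabit = 1e18

print(f"1 petabit: {transfer_time_seconds(petabit) / 86400:.1f} days")          # about 1.7 days
print(f"1 exabit:  {transfer_time_seconds(exabit) / (86400 * 365):.1f} years")  # about 4.6 years
```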

This situation can create new problems that did not arise in past centuries, such as requiring a huge amount of time just to read an entire dataset. We are thus faced with new problems in terms of computation.

## **1.2 Theory of Computational Complexity and Polynomial-Time Algorithms**

In the area of the theory of computational complexity, the term "polynomial-time algorithm" is often used as a synonym for "efficient algorithm." A *polynomial-time algorithm* is an algorithm whose running time is bounded by a polynomial function of the size of the instance (i.e., the input). For example, consider the sorting problem, which takes a set of positive integers *a*1,..., *an* as input and outputs a permutation π : {1,..., *n*} → {1,..., *n*} such that *a*π(*i*) ≤ *a*π(*i*+1) for every *i* ∈ {1,..., *n* − 1}. In this problem, the input is expressed by *n* integers and thus the input size is *n*.<sup>2</sup>
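As a concrete illustration, the permutation π for the sorting problem can be computed in a few lines (using 0-based indices rather than the 1-based indices of the text; the function name is ours):

```python
def sorting_permutation(a):
    """Return a permutation pi of indices 0..n-1 such that
    a[pi[i]] <= a[pi[i+1]] for every i (the sorted order of a)."""
    return sorted(range(len(a)), key=lambda i: a[i])

a = [5, 2, 9, 2, 7]
pi = sorting_permutation(a)
print([a[i] for i in pi])  # [2, 2, 5, 7, 9]
```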

We now briefly introduce the theory of computational complexity. Theoretically, the computation time of an algorithm is expressed in terms of the number of basic units of calculation (i.e., the basic arithmetic operations, reading or writing a value in a cell of memory, and comparison of two values<sup>3</sup>). The complexity is then expressed as a function of *n*, say *T*(*n*), where *n* is the (data) size of the input. If there exists a fixed integer *k* such that *T*(*n*) = *O*(*n<sup>k</sup>*), then we say that the algorithm runs in polynomial time.

For example, the sorting problem can be solved in *O*(*n* log *n*) time, which is polynomial, and it has been proven that this is minimal in the big-O sense, meaning that no algorithm exists that runs in *o*(*n* log *n*) time. In contrast, for the *partitioning problem*, which is the problem of finding a subset *B* of a given set *A* consisting of *n* integers *a*1,..., *an* such that $\sum_{a_i \in B} a_i = \frac{1}{2} \sum_{a_i \in A} a_i$, no polynomial-time algorithm has been found, and the majority of researchers believe that no such algorithm exists.<sup>4</sup>

<sup>2</sup> More rigorously, representing an integer *a* requires around log<sub>2</sub> *a* bits. However, in the area of the theory of computational complexity, we usually use the assumption that one integer is stored in one cell (byte) of the memory. Since this assumption may cause strange results if pathologically huge integers are used, such integers are prohibited.

<sup>3</sup> In order to avoid excessive calculations, we assume that each integer consists of at most log2 *n* bits, where *n* is the number of integers treated in the instance.

<sup>4</sup> This is equivalent to the well-known "P vs. NP problem," which is one of the seven Millennium Prize open problems in mathematics.

For many problems, constructing an exponential-time algorithm is easy. For the partitioning problem, for example, an algorithm that tests all subsets of *A* clearly solves the problem, and this requires 2<sup>*n*</sup> · *O*(*n*) time, which is exponential. Therefore, the existence of an exponential-time algorithm is considered trivial in many cases. Constructing polynomial-time algorithms, however, often requires additional ideas.
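The trivial exponential-time algorithm for the partitioning problem can be sketched as follows; enumerating all subsets makes the 2<sup>*n*</sup> · *O*(*n*) behavior explicit (the function name and the example instance are ours):

```python
from itertools import combinations

def partition_brute_force(a):
    """Try all 2^n subsets B of the multiset a; return one whose sum is
    exactly half of sum(a), or None if no such subset exists."""
    total = sum(a)
    if total % 2:  # an odd total can never be split evenly
        return None
    for r in range(len(a) + 1):
        for subset in combinations(a, r):
            if sum(subset) * 2 == total:
                return subset
    return None

result = partition_brute_force([3, 1, 1, 2, 2, 1])  # total 10, target 5
print(result)
```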

## **1.3 Polynomial-Time Algorithms and Sublinear-Time Algorithms**

#### *1.3.1 A Brief History of Polynomial-Time Algorithms*

The idea that "polynomial-time algorithms are efficient" is sometimes called Cobham's Thesis or the Cobham–Edmonds Thesis, named after Alan Cobham and Jack Edmonds [4]. Cobham [3] identified the tractable problems with the complexity class P, the class of problems solvable in polynomial time with respect to the input size. Edmonds stated the same idea in [7].

Although these papers were published in 1965, the idea behind this thesis seems to have been a commonly held belief among researchers by the late 1950s. For example, Kruskal's algorithm and Prim's algorithm, both almost linear-time algorithms for the minimum spanning tree problem, were presented in 1956 [16] and 1957 [17], respectively. Dijkstra's algorithm, an almost linear-time algorithm for the shortest path problem with positive edge lengths, was presented in 1959 [6]. Ford and Fulkerson presented the augmenting-path algorithm for the maximum flow problem in 1956 [8]. The blossom algorithm was proposed by Jack Edmonds in 1961 for the maximum matching problem on general (i.e., not necessarily bipartite) graphs [7].

In 1971, Cook proposed the idea of NP-completeness and proved that the satisfiability problem (SAT) is NP-complete [5]. NP-complete problems are, intuitively, the most difficult problems in the class NP. NP is the set of problems that can be solved in polynomial time by nondeterministic Turing machines. Although we have no proof yet, many researchers believe that no polynomial-time algorithm exists for any NP-complete problem.<sup>5</sup> Cook's study created a new field of research in which countless combinatorial problems have been found to be NP-complete [10].

By definition, it is trivial that every problem in NP can be solved in exponential time (by a Turing machine). The theory of NP-completeness explicitly and firmly fixed the idea that "polynomial-time algorithms are efficient" in the minds of researchers. We would like to call this idea the *polynomial computation paradigm*.

<sup>5</sup> This is the "P vs. NP problem," one of the seven Millennium Prize open problems in mathematics posed at the end of the twentieth century.

Many important polynomial-time algorithms are now known, including the two basic polynomial-time algorithms for the linear programming problem (LP), namely the ellipsoid method proposed by Khachiyan in 1979 [15] and the interior-point method proposed by Karmarkar in 1984 [13]; the strongly polynomial-time algorithm for the minimum cost flow problem proposed by Éva Tardos in 1985 [19]; the linear-time shortest path algorithm for positive integer edge lengths proposed by Mikkel Thorup in 1997 [20]; and the deterministic polynomial-time algorithm for primality testing proposed by Agrawal, Kayal, and Saxena in 2002 [1]. These algorithms pioneered new perspectives in the field of algorithm research. They are gems that were found under the polynomial computation paradigm.

#### *1.3.2 Emergence of Sublinear-Time Algorithms*

Although linear-time algorithms have naturally been considered the fastest possible, since intuitively we must read all the data to solve a problem, the new idea of "sublinear-time algorithms" emerged at the end of the twentieth century. Sublinear-time algorithms run by reading only a sublinear (i.e., *o*(*n*)) amount of data from the input.

The most popular framework for sublinear-time algorithms is "property testing." This idea was first presented by Rubinfeld and Sudan [18] in 1996 (although a conference version appeared as early as 1992) in the context of program checking. In this paper, they introduced the idea of a "distance" between an instance (e.g., a function) and a property (e.g., linearity), and of "ε-farness." They also gave constant-time testers for some properties of functions. The first study on constant-time testability of combinatorial (mainly graph) structures was given by Goldreich, Goldwasser, and Ron [11], which was presented at a conference in 1995 (STOC '95). After the turn of the century, many studies following this idea of testability have appeared, and the importance of this field is growing [2, 9].

#### *1.3.3 Property Testing and Parameter Testing*

A *testing algorithm* (or *tester* for short) for a property *P* accepts a given instance *I* with probability at least 2/3 if *I* has *P*, and rejects it with probability at least 2/3 if *I* is far from having *P*. Here, *P* is defined as a (generally infinitely large) subset of instances. The distance between *I* and *P* is defined as the minimum Hamming distance between *I* and any *I*′ ∈ *P*. The distance is normalized to lie in [0, 1] (i.e., dist(*I*, *P*) ∈ [0, 1]). If an instance has the property, the distance is zero (i.e., dist(*I*, *P*) = 0 if *I* ∈ *P*). If dist(*I*, *P*) ≥ ε for an ε ∈ [0, 1], then we say that *I* is ε-far from *P*, and otherwise ε-close. A tester rejects *I* with probability at least 2/3 if *I* is ε-far from *P*.

For a property, if a tester exists whose running time<sup>6</sup> is bounded by a constant independent of the size of the input, then we call the property *testable*.<sup>7</sup> This framework is called *property testing*.
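To make the framework concrete, here is a toy tester (our own example, not taken from this chapter) for the property "the input bit string is all zeros." If the string is ε-far from the property, at least εn positions hold a 1, so a constant number of random probes finds a witness with probability at least 2/3:

```python
import random

def all_zeros_tester(query, n, eps):
    """One-sided tester: always accepts an all-zeros string; rejects a
    string that is eps-far (i.e., has >= eps*n ones) with probability
    >= 2/3.  `query(i)` plays the role of the oracle for the i-th bit."""
    samples = int(2 / eps) + 1  # constant with respect to n
    for _ in range(samples):
        if query(random.randrange(n)) == 1:
            return False  # reject: found a 1, so the property fails
    return True  # accept: no evidence against the property

bits = [0] * 1000
print(all_zeros_tester(lambda i: bits[i], len(bits), 0.1))  # True
```

The failure probability for an ε-far input is at most (1 − ε)^(2/ε) ≤ e⁻² < 1/3, which meets the 2/3 requirement.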

Property testing is a relaxation of the framework of decision problems. Correspondingly, a relaxation of the framework of optimization problems is *parameter testing*. In parameter testing, we try to find an approximation to the value of the objective function with an additive error of at most ε*N* from the optimum value, where *N* is the maximum value of the objective function.
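A minimal sketch of this flavor of approximation, on a problem of our own choosing (estimating the number of 1s in a bit string, where *N* = *n* is the maximum possible value): by the Hoeffding bound, a sample size independent of *n* suffices for an additive error of ε*N* with probability at least 2/3:

```python
import math
import random

def estimate_ones(query, n, eps, delta=1/3):
    """Estimate the number of 1s in a length-n bit string within additive
    error eps*n, with probability >= 1 - delta.  The sample size comes
    from the Hoeffding bound and does not depend on n."""
    s = math.ceil(math.log(2 / delta) / (2 * eps ** 2))
    hits = sum(query(random.randrange(n)) for _ in range(s))
    return n * hits / s

# On a constant string the estimate is exact regardless of the samples.
print(estimate_ones(lambda i: 1, 10**6, 0.1))  # 1000000.0
```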

This idea appeared at the end of the twentieth century, and was further developed in this century. See Chaps. 2 and 3 for these themes.

#### **1.4 Ways to Decrease Computational Resources**

In addition to property and parameter testing, there are various methods for decreasing the amount of computational resources needed for handling big data. Although some methods may require linear computation, each of them has strong merits. We briefly introduce these methods in this section.

#### *1.4.1 Streaming Algorithms*

Property testing generally uses the assumption that an algorithm can read any position (cell) of the input. However, this may be difficult in some situations, such as when the data arrives as a stream (sequence) and the algorithm is required to read the values one by one in the order of arrival. The key assumption of this framework is that the algorithm does not have enough memory to store the entire input. For example, to find the maximum value in a sequence of integers *a*1, ..., *an*, it is enough to use *O*(1) cells of memory.<sup>8</sup>

Although this method requires linear computation time, since it must read all of the data, the amount of memory used is constant in many cases. If we assume that the order of data arrival in the stream is random, then the setting becomes close to that of (nonadaptive<sup>9</sup>) property testing. In this book, streaming algorithms are covered in Chap. 16.
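The maximum example above can be sketched as a one-pass algorithm that stores a single value, regardless of the stream length:

```python
def streaming_max(stream):
    """Compute the maximum of a sequence in one pass using O(1) memory:
    only the current best value is kept, never the whole stream."""
    best = None
    for a in stream:
        if best is None or a > best:
            best = a
    return best

print(streaming_max(iter([3, 1, 4, 1, 5, 9, 2, 6])))  # 9
```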

<sup>6</sup> Normally we also use the "query complexity" besides the running time. See Chap. 2 for details.

<sup>7</sup> Sometimes "testable" means that the problem has an algorithm with a sublinear query complexity, and *strongly testable* may be used for distinguishing constant query complexity from mere sublinear query complexity.

<sup>8</sup> We assume that each memory cell can store any one integer among {*a*1,..., *an*}.

<sup>9</sup> *Nonadaptive* means that the query (of an algorithm) cannot depend on any answer of the queries; in other words, the queries are fixed before the algorithm starts.

#### *1.4.2 Compression*

Compression is a traditional and typical method for handling digital data. There are basically two types of compression. The first compresses data without losing any information; for this type, there is an information-theoretic lower bound on the compressed data size. It is used when the original data must be reconstructed perfectly from the compressed data, and it is thus called *lossless compression* or *reversible compression*. The second type allows some of the data to be discarded, so that the compressed data is an inexact approximation. Although such algorithms can compress data drastically, the original data cannot be reconstructed perfectly from the compressed data, and they are thus called *lossy compression* or *irreversible compression*. This approach works remarkably well in the fields of music and image compression. See Chaps. 6, 7, 10, and 16 in this book for results in this area.
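A minimal example of lossless (reversible) compression is run-length encoding; the round trip below shows that the original data is perfectly recoverable. (This simple scheme is only effective on inputs with long runs; it is our own illustration, not a method from this book.)

```python
def rle_encode(s):
    """Run-length encode a string into (character, run length) pairs.
    Lossless: the original string is exactly recoverable."""
    out = []
    i = 0
    while i < len(s):
        j = i
        while j < len(s) and s[j] == s[i]:
            j += 1
        out.append((s[i], j - i))
        i = j
    return out

def rle_decode(pairs):
    """Invert rle_encode, reconstructing the original string."""
    return "".join(ch * k for ch, k in pairs)

s = "aaaabbbcca"
print(rle_encode(s))                       # [('a', 4), ('b', 3), ('c', 2), ('a', 1)]
assert rle_decode(rle_encode(s)) == s      # lossless round trip
```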

#### *1.4.3 Succinct Data Structures*

When compressed data is used, it essentially needs to be decompressed. However, decompression requires extra computation. It is therefore useful to be able to use compressed data as-is without decompression. Succinct data structures are a framework that realizes this idea. Specifically, succinct data structures use an amount of space that is close to the information-theoretical lower bound while still allowing efficient (fast) query operations. These structures involve a tradeoff between space and time. See Chaps. 8 and 9 for details.
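A simplified sketch of the idea, under our own simplifying assumption of a plain bit array rather than a truly compressed one: a small table of per-block prefix counts lets rank queries be answered without any decompression step. Real succinct bitvectors add further levels to keep the auxiliary space within o(n) bits.

```python
class RankBitvector:
    """Simplified sketch of rank support on a bit array: the bits are kept
    as-is (no decompression step), and a table of per-block prefix counts
    lets rank1(i) be answered by one lookup plus a short scan."""

    def __init__(self, bits, block=64):
        self.bits = bits
        self.block = block
        self.prefix = [0]  # prefix[b] = number of 1s before block b
        for b in range(0, len(bits), block):
            self.prefix.append(self.prefix[-1] + sum(bits[b:b + block]))

    def rank1(self, i):
        """Number of 1s among bits[0:i]."""
        b = i // self.block
        return self.prefix[b] + sum(self.bits[b * self.block:i])

bv = RankBitvector([1, 0, 1, 1, 0, 1] * 100, block=8)
print(bv.rank1(6))  # 4
```

The block size controls the space/time trade-off mentioned above: larger blocks mean a smaller table but a longer scan per query.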

#### **1.5 Need for the Sublinear Computation Paradigm**

## *1.5.1 Sublinear and Polynomial Computation Are Both Important*

Even though the sublinear computation paradigm has become necessary, this does *not* mean that the polynomial computation paradigm is obsolete. Polynomial computation remains important for ordinary computations. The typical cases where sublinear computation is needed are those involving big data, for which traditional polynomial computation is sometimes too slow.

This relationship between the polynomial computation paradigm and the sublinear computation paradigm is analogous to the relationship between *Newtonian mechanics* and *the theory of relativity* in physics. While Newtonian mechanics is used for ordinary physical calculations, the theory of relativity is needed to calculate the motion of very fast objects such as rockets, satellites, or electrons. We entered the era of the theory of relativity in the twentieth century and the era of sublinear computation in the twenty-first century.

#### *1.5.2 Research Project ABD*

A research project named "Foundations on Innovative Algorithms for Big Data (ABD),"<sup>10</sup> in which the sublinear computation paradigm is the central concept, was started under the CREST program of the Japan Science and Technology Agency (JST) in October 2014 and concluded in September 2021. The total budget was more than 300 million yen. Although the project had 24 members at its inception, many more researchers later joined, and the final number of regular members exceeded 40. The leader of the project was Prof. Naoki Katoh of the University of Hyogo.<sup>11</sup> The project consisted of three groups: the Sublinear Algorithm Group (Team A), led by Prof. Katoh; the Sublinear Data Structure Group (Team D), led by Prof. Tetsuo Shibuya of the University of Tokyo; and the Sublinear Modeling Group (Team M), led by Prof. Kazuyuki Tanaka of Tohoku University. In this project, we worked on problems in big data computation. The main purpose of this book is to introduce the results of this project. A special issue of The Review of Socionetwork Strategies [14] is also available for this project. While some of the methods adopted in this project are not sublinear, we are confident that every piece of research conducted under the project is useful and will form the foundations of innovative algorithms for big data!

#### *1.5.3 The Organization of This Book*

This part of the book, Part I, has provided an introduction. Parts II, III, and IV present the theoretical results of Teams A, D, and M, respectively. Application results leading to scientific and technological innovation are compiled in Part V.

#### **References**


<sup>10</sup> http://crest-sublinear.jp/en/.

<sup>11</sup> Although he was at Kyoto University when the project started, he moved to Kwansei Gakuin University upon his retirement, before finally moving to his present position.

*Studies in Logic and the Foundations of Mathematics* (North-Holland Publishing Company, Amsterdam, 1965), pp. 24–30


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Part II Sublinear Algorithms**

# **Chapter 2 Property Testing on Graphs and Games**

**Hiro Ito**

**Abstract** Constant-time algorithms are powerful tools, since they run by reading only a constant-sized part of each input. Property testing is the most popular research framework for constant-time algorithms. In property testing, an algorithm determines, with high probability and by reading a constant-sized part of the input, whether a given instance satisfies some predetermined property or is far from satisfying the property. A property is said to be testable if there is a constant-time testing algorithm for it. This chapter covers property testing on graphs and games. The fields of graph algorithms and property testing are two of the main streams of research on discrete algorithms and computational complexity. In the section on graphs, we present some important results, particularly on the characterization of testable graph properties. At the end of that section, we show results that we published in 2020 on a complete characterization (a necessary and sufficient condition) of testable monotone or hereditary properties of bounded-degree digraphs. In the section on games, we present results that we published in 2019 showing that generalized chess, shogi (Japanese chess), and xiangqi (Chinese chess) are all testable. We believe that these are the first results on testable EXPTIME-complete problems.

# **2.1 Introduction**

The development of efficient algorithms for problems on big data is an urgent task. Constant-time algorithms are a powerful tool for this purpose, since they run by reading only a constant-sized part of each input. In other words, the running time is invariant regardless of the size of the input. Property testing is the most popular research framework for constant-time algorithms. In property testing, an algorithm determines, with high probability and by reading a constant-sized part of the input, whether a given instance satisfies some predetermined property or is far from satisfying that property. This section presents some results, mainly concerning property testing, that have recently been obtained in the ABD Project.


© The Author(s) 2022 N. Katoh et al. (eds.), *Sublinear Computation Paradigm*, https://doi.org/10.1007/978-981-16-4095-7\_2

#### **2.2 Basic Terms and Definitions for Property Testing**

This section gives some of the basic terms that are needed in order to explain our results. Property testing works on many different types of models, including graphs, functions, strings, grammars, and images. Although the details of the definitions differ slightly between the different models, the basic ideas are the same for all models, so we present only the definitions for digraphs.

Let ℕ = {0, 1, 2, ...} be the set of natural numbers. In this chapter, we sometimes omit floor or ceiling functions. For example, if we write *s* = √*n* in a context where *s* must be an integer and *n* is not necessarily a square number, then √*n* should be taken to mean ⌊√*n*⌋ or ⌈√*n*⌉. This allows us to disregard integrality issues that make no real difference to any of our proofs.

#### *2.2.1 Graphs and the Three Models for Property Testing*

A directed graph or *digraph G* is defined as a pair of finite sets (*V*, *E*), where *V* is a finite set of *vertices* and *E* ⊆ *V* × *V* is a set of directed edges, or *edges* for short. The vertex set *V* and the edge set *E* of a graph *G* are sometimes written as *VG* and *EG*, respectively. If the direction of each edge is ignored (i.e., (*u*, *v*) = (*v*, *u*) for any *u*, *v* ∈ *V*), then the digraph is called a *graph* (or an *undirected graph* if we want to indicate undirectedness explicitly). Every graph can be represented as a digraph with a symmetric edge set; in other words, if (*u*, *v*) ∈ *E*, then (*v*, *u*) ∈ *E* for every *u*, *v* ∈ *V*. Thus, graphs can be regarded as special cases of digraphs. This section mainly treats (undirected) graphs. Digraphs are considered in Sect. 2.4. Many of the terms and symbols we define for graphs are also used for digraphs.

The *order* of a graph *G* is given by |*VG*| and the *size* of a graph *G* is given by |*EG*|. A graph (resp., digraph) of order *n* is also called an *n-graph* (resp., *n-digraph*). The number of vertices adjacent to a vertex *v* in a graph *G* is denoted by deg*G*(*v*). If *G* is clear from the context, the subscript *G* may be omitted. In property testing, since an algorithm reads only a part of an instance (input), it obtains information about the instance through oracles, which depend on how the graphs are represented. There are three known models for treating graphs in property testing: the dense-graph model; the bounded-degree (graph) model; and the general-graph model.

In the *dense-graph model*, the *edge oracle* is used: if an algorithm queries whether (*u*, *v*) ∈ *E* or not, the oracle answers correctly, returning 1 if (*u*, *v*) ∈ *E* and 0 otherwise. This model basically treats dense (i.e., |*E*| = Θ(*n*<sup>2</sup>)) graphs. This is because if |*E*| = *o*(*n*<sup>2</sup>), then the edge oracle answers "0" almost every time when *n* is large, making the queries useless.<sup>1</sup>

In the *bounded-degree model*, there is a restriction that the degree of every vertex is bounded by a predetermined integer *d* ≥ 1, that is, deg(*v*) ≤ *d* (∀*v* ∈ *V*). From this restriction, it follows that the number of edges in a graph is at most *dn*/2 (or *dn* for a digraph); in other words, the graph is sparse (note that *d* is a constant). This model assumes that for every vertex *v*, the vertices adjacent to *v* are ordered. This model uses the *adjacent-vertex oracle*: if an algorithm queries for the *i*th (1 ≤ *i* ≤ *d*) adjacent vertex of *v* by giving a pair (*v*, *i*), the oracle answers with the name (ID) of that vertex if it exists and returns a predetermined special symbol such as ⊥ otherwise. A graph whose degree is bounded by *d* is also called a *d-bounded-degree graph*.

<sup>1</sup> It works only for determining whether a given graph is sparse.

The *general-graph model* is a mixed model of the dense-graph model and the bounded-degree model. Although this model does not have any maximum degree bound, there is a fixed upper bound *d* on the *average* degree. In many cases *d* is a constant and the graphs in this model are sparse. However, if *d* = Θ(*n*), graphs in the model may be dense. This model allows all oracles that are allowed in the other two models, in addition to the *degree oracle*: if an algorithm queries the degree of a vertex *v*, it replies with the correct answer deg(*v*).
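The three oracles can be summarized in code; the class below is our own illustrative wrapper, with the oracle holding the adjacency data that the tester itself never reads in full:

```python
class GraphOracles:
    """Sketch of the three query oracles described above, for a digraph
    on vertices 0..n-1 given as a list of directed edges."""

    def __init__(self, n, edges):
        self.n = n
        self.adj = {v: [] for v in range(n)}  # ordered adjacency lists
        self.edge_set = set()
        for u, v in edges:
            self.edge_set.add((u, v))
            self.adj[u].append(v)

    def edge_oracle(self, u, v):
        """Dense-graph model: 1 if (u, v) is an edge, 0 otherwise."""
        return 1 if (u, v) in self.edge_set else 0

    def adjacent_vertex_oracle(self, v, i):
        """Bounded-degree model: the i-th (1-indexed) neighbour of v,
        or None (playing the role of the special symbol) if none exists."""
        return self.adj[v][i - 1] if i <= len(self.adj[v]) else None

    def degree_oracle(self, v):
        """General-graph model: deg(v)."""
        return len(self.adj[v])

# An undirected edge {0,1} is stored as both (0,1) and (1,0), as in the text.
g = GraphOracles(4, [(0, 1), (1, 0), (1, 2), (2, 1)])
print(g.edge_oracle(0, 1), g.adjacent_vertex_oracle(1, 1), g.degree_oracle(1))
```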

#### *2.2.2 Properties, Distances, and Testers*

The set of graphs considered in each model—that is, the dense-graph model, the bounded-degree model, or the general-graph model—is denoted by 𝒢. The subset of 𝒢 consisting of the graphs of order *n* is denoted by 𝒢<sub>n</sub>. Hence 𝒢 = ⋃<sub>n∈ℕ</sub> 𝒢<sub>n</sub>.

A *property* is defined as a (generally infinitely large) subset of graphs closed under isomorphism.<sup>2</sup> For example, "planarity" is defined as the set of all planar graphs. For a property P, we define P<sub>n</sub> as P ∩ 𝒢<sub>n</sub>. Thus, clearly P = ⋃<sub>n∈ℕ</sub> P<sub>n</sub>.

Property testing is a relaxation of a decision problem. The objective of property testing is to distinguish, with high probability, whether a given instance satisfies some predetermined property or is "far" from satisfying the property. This requires a mathematical definition of "far."

Let *G* and *G*′ both be *n*-graphs; *G*, *G*′ ∈ 𝒢<sub>n</sub>. The distance between the two graphs is defined as the Hamming distance between them divided by the largest Hamming distance in the model (for normalization). Thus, the distance depends on the model (i.e., on how the graphs are represented). We explain this by using the dense-graph model. Let δ<sub>*E<sub>G</sub>*</sub> : *V* × *V* → {0, 1} be the characteristic function of *E<sub>G</sub>*, that is, δ<sub>*E<sub>G</sub>*</sub>(*u*, *v*) = 1 if (*u*, *v*) ∈ *E<sub>G</sub>* and 0 otherwise. The distance between *G* and *G*′ is defined as follows: we denote by *m*(*G*, *G*′) the number of edges that need to be deleted from and/or inserted into *G* in order to make *G* = *G*′, i.e.

$$m(G, G') := |\{(u, v) \in V \times V \mid \delta\_{E\_G}(u, v) \neq \delta\_{E\_{G'}}(u, v)\}|$$

Using this, we define the distance between *G* and *G*′ as follows:<sup>3</sup>

<sup>2</sup> Intuitively this means to ignore the labels on vertices and edges.

<sup>3</sup> Although the maximum number of edges in any (undirected) graph of order *<sup>n</sup>* is *<sup>n</sup>*(*<sup>n</sup>* <sup>−</sup> <sup>1</sup>)/2, we use *n*<sup>2</sup> for the denominator for simplicity.


$$\text{dist}(G, G') := \frac{m(G, G')}{n^2}. \tag{2.1}$$

Note that 0 ≤ dist(*G*, *G*′) ≤ 1 for every *G* and *G*′. In the bounded-degree model and the general-graph model, the distance is defined as follows:<sup>4</sup>

$$\text{dist}(G, G') := \frac{m(G, G')}{dn},\tag{2.2}$$

where *d* is the upper bound on the maximum (resp., the average) vertex-degrees for the bounded-degree model (resp., the general-graph model).
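When both edge sets are fully known, the distances (2.1) and (2.2) reduce to a normalized symmetric difference; a minimal sketch (edges given as sets of ordered pairs over vertices 0, ..., *n* − 1):

```python
def dense_distance(n, edges1, edges2):
    """dist (2.1): size of the symmetric difference of edge sets over n^2."""
    m = len(edges1 ^ edges2)  # pairs (u, v) on which the graphs differ
    return m / n**2

def bounded_degree_distance(n, d, edges1, edges2):
    """dist (2.2): size of the symmetric difference over d * n."""
    m = len(edges1 ^ edges2)
    return m / (d * n)
```

Of course, a sublinear algorithm never sees the full edge sets; these functions only illustrate what the oracles give partial access to.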

By using the distance between graphs, the distance between a graph *G* ∈ 𝒢<sub>n</sub> and a property P is defined as follows:

$$\text{dist}(G, \mathcal{P}) := \begin{cases} \min\_{G' \in \mathcal{P}\_n} \text{dist}(G, G') & \text{if } \mathcal{P}\_n \neq \emptyset, \\ \infty & \text{otherwise.} \end{cases}$$

This applies to all the models. For a real value 0 ≤ ε ≤ 1, we say that *G* is *ε-far* from *G*′ (resp., P) if dist(*G*, *G*′) > ε (resp., dist(*G*, P) > ε) and *ε-close* otherwise.

A *testing algorithm* for a property P is an algorithm that, given query access (by the oracles) to an instance *G* and given 0 < ε ≤ 1, accepts every *G* ∈ P with probability at least 2/3, and rejects every *G* that is ε-far from P with probability at least 2/3. If a testing algorithm accepts every *G* ∈ P with probability 1, then the algorithm is called *one-sided-error*. The number of queries made by an algorithm to the given oracle is called the *query complexity* of the algorithm. If the query complexity of a testing algorithm is bounded by a constant that is independent of *n* (but that may depend on ε and *d*), then the algorithm is called a *tester*. A property is *testable*<sup>5</sup> if there is a tester for the property.

## **2.3 Important Known Results in Property Testing on Graphs**

This section gives a very brief overview of important known results in property testing on graphs, particularly on the characterization and general properties of testability. See a recent review [11] or books [4, 8] for details.

<sup>4</sup> Although the maximum number of edges of any *d*-bounded-degree (undirected) graph of order *n* is *dn*/2, we use *dn* for the denominator for simplicity.

<sup>5</sup> Sometimes "testable" means that the problem has an algorithm with sublinear query complexity, and *strongly testable* may be used to distinguish constant query complexity from mere sublinear query complexity.

#### *2.3.1 Results for the Dense-Graph Model*

Alon et al. [2] found a combinatorial characterization (necessary and sufficient condition) of testable properties for the dense-graph model. We first present the theorem without defining the terms used in it.

**Theorem 2.1** *For the dense-graph model, a graph property is testable if and only if it is regular-reducible.*

This theorem utilizes the extremely powerful Szemerédi regularity lemma, which we now introduce briefly. For a pair of subsets of vertices *A*, *B* ⊆ *V* of a graph *G* = (*V*, *E*), den(*A*, *B*) := |*E*(*A*, *B*)|/(|*A*||*B*|) is called the *density* of the pair. A family of subsets 𝒱 = {*V*<sub>1</sub>, ..., *V<sub>k</sub>*} (*V<sub>i</sub>* ⊆ *V*, ∀*i* ∈ {1, ..., *k*}) is called a *partition* of *V* if *V<sub>i</sub>* ∩ *V<sub>j</sub>* = ∅ for all 1 ≤ *i* < *j* ≤ *k* and *V* = *V*<sub>1</sub> ∪ ··· ∪ *V<sub>k</sub>*. A partition 𝒱 = {*V*<sub>1</sub>, ..., *V<sub>k</sub>*} of the vertex set of a graph is called an *equipartition* if |*V<sub>i</sub>*| and |*V<sub>j</sub>*| differ by no more than 1 for all 1 ≤ *i* < *j* ≤ *k*.

**Definition 2.1** (*ε-regular*) Let 0 < ε ≤ 1 be a real number and *A*, *B* ⊆ *V*. A pair (*A*, *B*) is called *ε-regular* if |den(*A*, *B*) − den(*X*, *Y*)| ≤ ε for any two subsets *X* ⊆ *A* and *Y* ⊆ *B* satisfying |*X*| ≥ ε|*A*| and |*Y*| ≥ ε|*B*|. An equipartition 𝒱 = {*V*<sub>1</sub>, ..., *V<sub>k</sub>*} of the vertex set of a graph is called *ε-regular* if all but at most ε*k*<sup>2</sup> of the pairs (*V<sub>i</sub>*, *V<sub>j</sub>*) (*i*, *j* ∈ {1, ..., *k*}) are ε-regular.
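On very small graphs, den(·, ·) and Definition 2.1 can be checked by exhaustive enumeration; the following sketch is exponential in |*A*| + |*B*| and is for intuition only:

```python
import math
from itertools import combinations

def density(edges, A, B):
    """den(A, B) = |E(A, B)| / (|A||B|) for an undirected edge set."""
    cross = sum(1 for a in A for b in B if (a, b) in edges or (b, a) in edges)
    return cross / (len(A) * len(B))

def is_eps_regular(edges, A, B, eps):
    """Brute-force check of Definition 2.1: |den(A,B) - den(X,Y)| <= eps
    for all X ⊆ A with |X| >= eps|A| and Y ⊆ B with |Y| >= eps|B|."""
    A, B = list(A), list(B)
    d0 = density(edges, A, B)
    for kx in range(math.ceil(eps * len(A)), len(A) + 1):
        for X in combinations(A, kx):
            for ky in range(math.ceil(eps * len(B)), len(B) + 1):
                for Y in combinations(B, ky):
                    if abs(d0 - density(edges, X, Y)) > eps:
                        return False
    return True
```

A complete bipartite pair (all densities 1) is ε-regular for every ε, while a single edge between two 2-element sides already fails for ε = 1/2.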

**Definition 2.2** (*regularity-instance*) A *regularity-instance R* is given by an error-parameter 0 < ε ≤ 1, an integer *k*, a set of *k*(*k* − 1)/2 real numbers 0 ≤ η<sub>i,j</sub> ≤ 1 indexed by 1 ≤ *i* < *j* ≤ *k*, and a set *R̄* of pairs (*i*, *j*) of size at most ε*k*<sup>2</sup>. A graph is said to satisfy the *regularity-instance* if it has an equipartition 𝒱 = {*V*<sub>1</sub>, ..., *V<sub>k</sub>*} such that for all (*i*, *j*) ∉ *R̄* the pair (*V<sub>i</sub>*, *V<sub>j</sub>*) is ε-regular and satisfies |*E*(*V<sub>i</sub>*, *V<sub>j</sub>*)| = η<sub>i,j</sub>|*V<sub>i</sub>*||*V<sub>j</sub>*|. The *complexity* of the regularity-instance is max(*k*, 1/ε).

**Definition 2.3** (*regular-reducible*) A graph property P is *regular-reducible* if for any δ > 0 there exists *r* = *r*<sub>P</sub>(δ) such that for any *n* there is a family ℛ of at most *r* regularity-instances, each of complexity at most *r*, such that the following holds for every ε > 0 and every *n*-graph *G*:


**Theorem 2.2** (Szemerédi's regularity lemma [2, 17]) *For every pair of an integer t and a real number* ε > 0*, there exists an integer T* = *T*<sub>2</sub>(*t*, ε) *such that any graph with n* ≥ *T vertices has an ε-regular equipartition of order k, where t* ≤ *k* ≤ *T.*

An intuitive explanation of the regularity lemma is that, for any ε > 0, every graph *G* = (*V*, *E*) has an ε-approximation by a constant-size edge-weighted graph, where each edge weight approximates the density of the corresponding pair of vertex subsets. Intuitively, a property being regular-reducible means that it can be represented by a constant number of equipartitions based on the regularity lemma; in other words, the regularity lemma can be used for testing the property. See also [11] for details.

Representative regular-reducible properties are monotone or hereditary properties, which are defined as follows.

**Definition 2.4** A graph property P is *monotone* if for every *G* ∈ P and *e* ∈ *EG*, *G* − {*e*} ∈ P. A graph property P is *hereditary* if for every *G* ∈ P and *v* ∈ *VG*, *G* − {*v*} ∈ P.

Planarity, bipartiteness, *<sup>k</sup>*-colorability (for any *<sup>k</sup>* <sup>∈</sup> <sup>N</sup>), *<sup>H</sup>*-freeness (for any graph *H*),<sup>6</sup> and disconnectedness are all monotone. The former four properties are also hereditary, but the last one, disconnectedness, is not.7 A well-known non-monotone and hereditary property is perfectness: A graph is said to be *perfect* if for every induced subgraph, the chromatic number of the subgraph equals the clique number (= the order of the largest clique) of the subgraph. Every monotone or hereditary property is regular-reducible (see [2] for details).

We can say that Theorem 2.1 solves the problem of characterizing testable properties in the dense-graph model in a sense. However, the constants that appear in the algorithms obtained by Theorem 2.1 are incredibly (maybe more than astronomically) huge! Thus, developing faster (i.e., smaller constant complexity) algorithms remains an issue for each problem.

#### *2.3.2 Results for the Bounded-Degree Model*

Whereas the combinatorial characterization of testable properties shown in Theorem 2.1 was obtained for the dense-graph model, no perfect results have been obtained for the bounded-degree model despite many attempts to achieve this goal. However, progress is being made in steps. We now have an important characterization of testable properties in the bounded-degree model called "hyperfiniteness." We also found another characterization, called "forbidden configurations," for one-sided-error testability, which is explained in Sect. 2.4.

**Definition 2.5** Let ε > 0, *t* > 0, and *d* > 0. Let *G* = (*V*, *E*) be a *d*-bounded-degree *n*-graph. If one can remove at most ε*dn* edges from *G* such that each connected component of the resulting graph has at most *t* vertices, then *G* is called (ε, *t*)*-hyperfinite* (with respect to degree bound *d*). For a function ρ : ℝ<sup>+</sup> → ℝ<sup>+</sup>, if *G* is (ε, ρ(ε))-hyperfinite for every ε > 0, then *G* is called ρ*-hyperfinite*. A set 𝒢 of *d*-bounded-degree graphs is called ρ*-hyperfinite* if every *G* ∈ 𝒢 is ρ-hyperfinite. 𝒢 is called *hyperfinite* if there is a function ρ such that 𝒢 is ρ-hyperfinite.

Newman and Sohler [15] presented the following theorem.

<sup>6</sup> If a graph includes no *H* as a subgraph, then it is called *H -free*.

<sup>7</sup> A graph consisting of one connected component of order *n* − 1 and one isolated vertex is disconnected, but removing the isolated vertex from the graph makes it connected.

**Theorem 2.3** *In the bounded-degree model, every graph property is testable for any hyperfinite family of graphs.*

While this is a sufficient condition, the following necessary condition related to hyperfiniteness was obtained by Fichtenberger et al. [5].

**Definition 2.6** A *subproperty* of a property P is a property that is a subset of P. A property is *non-trivially testable* if it is testable and there exists ε > 0 such that there is an infinite number of graphs that are ε-far from the property.

**Theorem 2.4** *Every testable property of bounded-degree graphs is either finite or contains an infinite hyperfinite subproperty. Furthermore, the complement of every non-trivially testable graph property contains an infinite hyperfinite subproperty.*

These theorems show that there is a deep relation between hyperfiniteness and testability on bounded-degree graphs. However, no necessary and sufficient condition for graph testability has been found, even for one-sided error. Recently, we found necessary and sufficient conditions for one-sided-error testability on subclasses of properties of digraphs<sup>8</sup> [12]. This result was obtained through the ABD Project and is explained in Sect. 2.4.

#### *2.3.3 Results for the General-Graph Model*

Previously, no general classes of testable properties were known for the general-graph model. Through the ABD Project, a class that models complex networks, called Hierarchical Scale Free (HSF), was found to be testable. We present an outline of the result below; the details are available in [10, 11].

**Definition 2.7** *For positive real numbers c* > 0 *and* γ > 1, *a class of scale-free (multi)graphs* SF(*c*, γ) *consists of (multi)graphs G* = (*V*, *E*) *for which the following condition holds: let* ν<sub>i</sub> *be the number of vertices v of degree i. Then:*

$$\nu\_i \le c n i^{-\gamma}, \quad \forall i \in \{2, 3, \dots\}. \tag{2.3}$$

A *clique* is a subgraph in which there exists an edge between every pair of vertices. For a nonnegative integer *c* ≥ 0, a *c-isolated clique* is a clique such that the number of outgoing edges (edges between the clique and the other vertices) is less than *ck*, where *k* is the number of vertices of the clique. A 1-isolated clique is sometimes simply called an *isolated clique* (see [9] for details). Let ℰ(*G*) be the graph obtained from *G* by contracting all isolated cliques.<sup>9</sup>
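The c-isolated-clique condition is straightforward to verify for a given candidate vertex set (a sketch over an adjacency-set representation, where `adj[v]` is the set of neighbours of `v`):

```python
def is_c_isolated_clique(adj, S, c):
    """S is a c-isolated clique iff S is a clique and the number of
    outgoing edges (edges leaving S) is less than c * |S|."""
    S = set(S)
    # Clique check: every pair of distinct vertices inside S must be adjacent.
    if any(v not in adj[u] for u in S for v in S if u != v):
        return False
    # Count edges from S to the rest of the graph.
    outgoing = sum(1 for u in S for v in adj[u] if v not in S)
    return outgoing < c * len(S)
```

For example, a triangle with a single pendant edge is a 1-isolated clique (one outgoing edge, which is less than 1 × 3) but not a 0-isolated clique.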

<sup>8</sup> Note that any undirected graph can be represented by a digraph, i.e., the set of digraphs can be regarded as including the set of undirected graphs.

<sup>9</sup> Two distinct isolated cliques never overlap, except in the special case of *double-isolated-cliques*: a double-isolated-clique consists of two isolated cliques of size *k* that share *k* − 1 vertices. A double-isolated-clique

**Definition 2.8** For positive real numbers *c* > 0, γ > 1 and a positive integer *n*<sub>0</sub> ≥ 1, a class of *hierarchical scale-free (multi)graphs* HSF = HSF(*c*, γ, *n*<sub>0</sub>) consists of (multi)graphs *G* = (*V*, *E*) for which the following conditions hold:


For a graph *G* and a nonnegative integer *d* ≥ 0, *G*|<sub>d</sub> is the graph obtained by deleting all edges incident to each vertex *v* of degree more than *d*. Note that *G*|<sub>d</sub> is a *d*-bounded-degree graph. The following properties were obtained in [10].

**Lemma 2.1** *For every* SF = SF(*c*, γ) *with* γ > 2 *and every positive real number* ε > 0*, there exists a constant* δ = δ(ε, *c*, γ) *such that for every graph G* ∈ SF*, G*|<sub>δ</sub> *is* ε*-close to G.*

This lemma looks useful since it means that for any ε > 0, any scale-free graph is ε-close to a bounded-degree graph. This lemma is applied in the proof of the following theorem, which is the main theorem of [10].

**Theorem 2.5** *Every property is testable for* HSF(*c*, γ, *n*<sub>0</sub>) *with* γ > 2*.*

In the general-graph model, no other universal (constant-time) tester is known, but universal testing algorithms with polylog(*n*)-time query complexity have been found for forests [14] and outerplanar graphs [3].

## **2.4 Characterization of Testability on Bounded-Degree Digraphs**

#### *2.4.1 Bounded-Degree Model of Digraphs*

As mentioned previously, there is no complete characterization of testable graph properties in bounded-degree graphs, even for one-sided error. Through the ABD Project, however, we obtained a characterization of one-sided-error testability for monotone and hereditary properties of bounded-degree digraphs [12], which we briefly explain in this section. The set of digraphs can be regarded as including the set of undirected graphs by introducing reflexivity, i.e., ∀*u*, *v* ∈ *V*, if (*u*, *v*) ∈ *E*, then (*v*, *u*) ∈ *E*.

In this section, we consider the bounded-degree model on digraphs. For a digraph *G* = (*V*, *E*) and a vertex *v* ∈ *V*, we denote by *N*<sup>+</sup><sub>G</sub>(*v*) the set of outgoing neighbours

*Q* has no edge between *Q* and the rest of the graph (i.e., deg<sub>G</sub>(*Q*) = 0), and thus we specially define that a double-isolated-clique in *G* is contracted into a vertex in ℰ(*G*). Under this assumption, ℰ(*G*) is uniquely defined.

of *v*, i.e., *N*<sup>+</sup><sub>G</sub>(*v*) := {*u* ∈ *V* | (*v*, *u*) ∈ *E*}. Similarly, *N*<sup>−</sup><sub>G</sub>(*v*) := {*u* ∈ *V* | (*u*, *v*) ∈ *E*} and *N<sub>G</sub>*(*v*) := *N*<sup>+</sup><sub>G</sub>(*v*) ∪ *N*<sup>−</sup><sub>G</sub>(*v*). The *out-degree* of *v* is deg<sup>+</sup><sub>G</sub>(*v*) := |*N*<sup>+</sup><sub>G</sub>(*v*)|, and the *in-degree* of *v* is deg<sup>−</sup><sub>G</sub>(*v*) := |*N*<sup>−</sup><sub>G</sub>(*v*)|. The subscript *G* may be omitted if it is clear from the context.

For a (di)graph *G* = (*V*, *E*) and *F* ⊆ *E*, we denote by *G* − *F* the graph (*V*, *E* − *F*). For a (di)graph *G* = (*V*, *E*) and *W* ⊆ *V*, we denote by *G*[*W*] the subgraph of *G* induced by *W* (i.e., *G*[*W*] contains all edges in *E<sub>G</sub>* both of whose endpoints are in *W*). *G*[*V* − *W*] may also be denoted by *G* − *W*.

In the bounded-degree model for digraphs, there are two submodels: in one, only the out-degree is bounded; in the other, both the in-degree and the out-degree are bounded.<sup>10</sup> The former is represented by the *F*(*d*) model and the latter by the *FB*(*d*) model.<sup>11</sup> The *F*(*d*) model is clearly wider than the *FB*(*d*) model. Moreover, every undirected *d*-bounded-degree graph can be formulated in the *FB*(*d*) model by replacing each undirected edge with a pair of anti-parallel directed edges. That is, the *FB*(*d*) model (and thus the *F*(*d*) model as well) can be regarded as including the undirected *d*-bounded-degree model.

#### *2.4.2 Monotone Properties and Hereditary Properties*

This section extends the monotone and hereditary properties that were defined in Definition 2.4 to digraphs.

We first introduce the following notation for characterizing the testability of these properties. Let ℋ be a set of digraphs. We call ℋ an *r-set* if every member *H* ∈ ℋ has at most *r* vertices (i.e., *H* is an *r*′-digraph for some *r*′ ≤ *r*). A digraph *G* is ℋ*-free* if for every *H* ∈ ℋ, *G* contains no subgraph that is isomorphic to *H*. A digraph *G* is *induced* ℋ*-free* if for every *H* ∈ ℋ, *G* contains no induced subgraph that is isomorphic to *H*. We denote by P<sub>ℋ</sub> (resp., P<sup>∗</sup><sub>ℋ</sub>) the property that contains all digraphs that are ℋ-free (resp., induced ℋ-free). P<sub>ℋ,n</sub> (resp., P<sup>∗</sup><sub>ℋ,n</sub>) is the subproperty of P<sub>ℋ</sub> (resp., P<sup>∗</sup><sub>ℋ</sub>) that consists of all its *n*-digraphs. We can easily confirm that P<sub>ℋ</sub> is monotone and P<sup>∗</sup><sub>ℋ</sub> is hereditary for any ℋ.

Let *H* = (*V*, *E*) be a digraph. For a subset *W* ⊆ *V*, if, by disregarding the directions of the edges of *H*, *W* induces a connected component in the resulting undirected graph, then we say that *H*[*W*], the directed subgraph of *H* induced by *W*, is a *component* of *H*. A digraph *H* is *rooted* if every component *H*′ of *H* has a vertex *v* such that for every *u* ∈ *V<sub>H′</sub>* there exists a dipath (= directed path) from *v* to *u*.

<sup>10</sup> Clearly the case in which only the *in*-degree is bounded can be formulated by the model in which only the *out*-degree is bounded by changing the edge direction.

<sup>11</sup> *F* and *B* mean forward and backward, respectively.

## *2.4.3 Characterizations*

By using these terms, the characterizations of testable monotone or hereditary properties for the *F*(*d*) model were given in [12].

**Theorem 2.6** *Let* P = ⋃<sub>n∈ℕ</sub> P<sub>n</sub> *be a monotone property in the F*(*d*)*-model. Then* P *is testable if and only if there is a function r* : (0, 1) → ℕ *such that for any* 0 < ε < 1 *and n* ∈ ℕ*, there is an r*(ε)*-set of rooted digraphs* ℋ<sub>n</sub> *such that the property* P<sub>ℋ<sub>n</sub>,n</sub> *satisfies the following two conditions:*

*(a)* P<sub>n</sub> ⊆ P<sub>ℋ<sub>n</sub>,n</sub>
*(b)* P<sub>ℋ<sub>n</sub>,n</sub> *is* ε/2*-close to* P<sub>n</sub>*.*

**Theorem 2.7** *Let* P *be a hereditary property in the F*(*d*)*-model. Then* P *is testable if and only if there are functions r* : (0, 1) → ℕ *and N* : (0, 1) → ℕ *such that for any* 0 < ε < 1*, there is an r*(ε)*-set of rooted digraphs* ℋ *such that for every n* ≥ *N*(ε)*,* P<sup>∗</sup><sub>ℋ,n</sub> *satisfies the following two conditions:*
*(a)* P<sub>n</sub> ⊆ P<sup>∗</sup><sub>ℋ,n</sub>
*(b)* P<sup>∗</sup><sub>ℋ,n</sub> *is* ε/2*-close to* P<sub>n</sub>*.*

Condition (b) in both Theorems 2.6 and 2.7 is necessary, since there exists a monotone and hereditary property that is testable with one-sided error and has no ℋ<sub>n</sub> such that |ℋ<sub>n</sub>| is bounded by a constant (*r*(ε)) and "P<sub>n</sub> = P<sub>ℋ<sub>n</sub>,n</sub> or P<sub>n</sub> = P<sup>∗</sup><sub>ℋ<sub>n</sub>,n</sub>": one such property is P<sub>𝒞<sub>√n</sub></sub> (= P<sup>∗</sup><sub>𝒞<sub>√n</sub></sub>) on the *F*(1)-model,<sup>12</sup> where 𝒞<sub>k</sub> is the set of directed cycles (or *dicycles*, for short) of length in [3, *k*], i.e., P<sub>𝒞<sub>√n</sub></sub> is the property of having no dicycle of length in [3, √*n*]. This property is clearly monotone and hereditary. To express P<sub>𝒞<sub>√n</sub></sub> by a set ℋ of forbidden subgraphs (or forbidden induced subgraphs), ℋ must include 𝒞<sub>√n</sub>, and thus |ℋ| cannot be bounded by any constant. However, this property is testable with one-sided error, as shown below.

**Lemma 2.2** P<sub>𝒞<sub>√n</sub></sub> *on the F*(1)*-model is one-sided-error testable with query complexity O*(ε<sup>−2</sup>)*.*

To prove this lemma, we will use the following lemma, which is often effective for estimating the query complexity of testers.

**Lemma 2.3** *For any real number x, the following inequality holds:*

$$e^x \ge x + 1.\tag{2.4}$$

The proof of this lemma is trivial from the differentiation of *e<sup>x</sup>* − (*x* + 1), and is omitted here.

*Proof of Lemma* 2.2: If *n* ≤ 2/ε, then we can get the complete data of the graph in time 2/ε. Thus it is enough to consider the case of *n* > 2/ε. We use the following algorithm for the tester:

<sup>12</sup> P<sub>𝒞<sub>√n</sub></sub> ≠ P<sup>∗</sup><sub>𝒞<sub>√n</sub></sub> on the *F*(*d*)-model for *d* ≥ 2.

Choose *s* = 2/ε vertices *v*<sub>1</sub>, ..., *v<sub>s</sub>* from *V* uniformly at random, and denote them by *S*. For each *v<sub>i</sub>* ∈ *S*, check whether there is a dicycle of length at most *s* that includes *v<sub>i</sub>* by following each outgoing edge successively whenever it exists. (Note that in the *F*(1)-model, the outgoing edge of each vertex is unique if it exists.) If a dicycle of length in [3, *s*] is found, then the input is rejected; otherwise, it is accepted.
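A sketch of this tester, assuming the F(1)-model out-edge oracle is given as a function `out(v)` that returns the unique out-neighbour of `v`, or `None` if none exists (a hypothetical interface). It rejects only when it actually finds a forbidden short dicycle, so it has one-sided error:

```python
import math
import random

def short_dicycle_tester(n, out, eps):
    """One-sided-error tester (sketch of Lemma 2.2) for having no dicycle
    of length in [3, sqrt(n)] in the F(1) model."""
    s = math.ceil(2 / eps)
    limit = min(s, math.isqrt(n))    # only dicycles this short are forbidden
    for _ in range(s):
        v = random.randrange(n)
        u, length = out(v), 1
        # Follow the unique outgoing path from v for at most `limit` steps.
        while u is not None and length <= limit:
            if u == v:               # returned to v: found a dicycle
                if length >= 3:
                    return False     # reject: forbidden short dicycle
                break
            u, length = out(u), length + 1
    return True                      # accept: no short dicycle found
```

Since each of the *s* sampled vertices is followed for at most *s* steps, the query complexity is *O*(ε<sup>−2</sup>), matching the lemma.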

We show that the above algorithm is the desired one-sided-error tester. It clearly has one-sided error, since it never rejects without finding a short (i.e., of length at most *s*) dicycle. Thus, it is enough to show that the algorithm rejects with probability at least 2/3 if the input is ε-far from P<sub>𝒞<sub>√n</sub></sub>.

Assume that the input *G* = (*V*, *E*) is ε-far from P<sub>𝒞<sub>√n</sub></sub>, i.e., that *G* contains more than ε*n* dicycles of length in [3, √*n*]. Let 𝒞 be the set of such dicycles. We divide 𝒞 into the following two sets:

𝒞<sub>short</sub> = {*C* ∈ 𝒞 | the length of *C* is at most *s*},

𝒞<sub>long</sub> = {*C* ∈ 𝒞 | the length of *C* is more than *s* (and at most √*n*)}.

From |𝒞| > ε*n*, |𝒞<sub>short</sub>| > ε*n*/2 or |𝒞<sub>long</sub>| > ε*n*/2 holds.

First, we assume that |𝒞<sub>long</sub>| > ε*n*/2. Clearly no pair of dicycles in 𝒞 shares a common vertex, and thus more than *s*ε*n*/2 = *n* vertices would be included in the graph, a contradiction. Thus, |𝒞<sub>long</sub>| ≤ ε*n*/2.

From this, it follows that |𝒞<sub>short</sub>| > ε*n*/2. Since no pair of dicycles in 𝒞 shares a common vertex and each dicycle has at least three vertices, the dicycles in 𝒞<sub>short</sub> contain more than 3ε*n*/2 vertices. Let *W* be the set of such vertices. If the algorithm finds at least one vertex from *W*, then it will find a short dicycle that includes the vertex and reject the input. From |*W*| > 3ε*n*/2, it follows that the probability that a chosen vertex is not in *W* is less than 1 − 3ε/2. Thus, the probability that none of the *s* vertices chosen by the algorithm is in *W* is less than

$$(1 - 3\epsilon/2)^s \le e^{-3\epsilon s/2} = e^{-3} < \frac{1}{3}.$$

Note that the first inequality above uses the inequality (2.4). The probability that the algorithm finds at least one vertex from *W* is, therefore, more than 2/3. The query complexity of this tester is clearly *O*(ε<sup>−2</sup>). □

Since P<sub>𝒞<sub>√n</sub></sub> is both monotone and hereditary, Theorems 2.6 and 2.7 hold for it. If we apply Theorem 2.6, then ℋ<sub>n</sub> = 𝒞<sub>min{2/ε, √n}</sub> for each *n*. If we apply Theorem 2.7, then *N*(ε) = 4/ε<sup>2</sup> and ℋ = 𝒞<sub>2/ε</sub>. From this discussion, we observe that *N*(ε) is essential in Theorem 2.7.

## *2.4.4 An Idea to Extend the Characterizations Beyond Monotone and Hereditary*

We would like to extend Theorems 2.6 and 2.7 to general properties. We denote by P<sub>deg<sup>+</sup>(d−1)</sub> the property consisting of the digraphs having no vertex with out-degree *d* − 1 in the *F*(*d*)-model. P<sub>deg<sup>+</sup>(d−1)</sub> is one-sided-error testable, as shown below.

Let *G* = (*V*, *E*) be an input. The algorithm for P<sub>deg<sup>+</sup>(d−1)</sub> chooses 2/ε vertices from *V* uniformly at random and checks their out-degrees. If it finds a vertex of out-degree *d* − 1, then the input is rejected; otherwise, it is accepted. This algorithm has one-sided error, since it never rejects if there is no vertex of out-degree *d* − 1. If *G* is ε-far from P<sub>deg<sup>+</sup>(d−1)</sub>, then there are more than ε*n* vertices of out-degree *d* − 1. Thus, the probability that there is no vertex of out-degree *d* − 1 among the 2/ε vertices selected by the algorithm is less than

$$(1 - \epsilon)^{\frac{2}{\epsilon}} \le \left(e^{-\epsilon}\right)^{\frac{2}{\epsilon}} = e^{-2} < \frac{1}{3}.$$

Note that this also uses the inequality (2.4).

Hence, the above algorithm is a one-sided-error tester for P<sub>deg<sup>+</sup>(d−1)</sub>. However, expressing this property by using forbidden subgraphs or forbidden induced subgraphs as in Theorems 2.6 and 2.7 is impossible.<sup>13</sup>
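The sampling tester just described is only a few lines of code, assuming an out-degree oracle `outdeg(v)` (a hypothetical interface):

```python
import math
import random

def outdegree_tester(n, outdeg, d, eps):
    """One-sided-error tester (sketch) for the property that no vertex has
    out-degree d - 1, in the F(d) model."""
    for _ in range(math.ceil(2 / eps)):
        v = random.randrange(n)
        if outdeg(v) == d - 1:
            return False  # found a witness vertex: reject
    return True           # no witness found: accept
```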

To extend the idea of "forbidden something" to non-monotone and non-hereditary properties, we [12] introduced the idea of "configurations," by generalizing subgraphs and induced subgraphs. A similar idea has also appeared in [16].

**Definition 2.9** A *configuration* is a pair *O* = (*H*, *L*), where *H* = (*W*, *F*) is a digraph in the *F*(*d*)-model and *L* : *W* → {developed, frontier} is a function such that the out-degree of every frontier vertex is 0. The configuration is *rooted* if *H* is rooted.

**Definition 2.10** Let *O* = (*H* = (*W*, *F*), *L*) be a configuration and *G* = (*V*, *E*) a digraph, both in the *F*(*d*)-model. We say that *G* has an *O-appearance* if there is an injective mapping φ : *W* → *V* such that for every *v* ∈ *W* with *L*(*v*) = developed, the following two conditions hold:

(i) ∀*u* ∈ *W*, (*v*, *u*) ∈ *F* if and only if (φ(*v*), φ(*u*)) ∈ *E*.

(ii) If (φ(*v*), *x*) ∈ *E*, then ∃*u* ∈ *W*, φ(*u*) = *x*.

We say that *G* is *O-free* if *G* has no *O*-appearance. For a set 𝒪 of configurations, we say that *G* is 𝒪*-free* if *G* is *O*-free for every *O* ∈ 𝒪.
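Conditions (i) and (ii) can be checked by brute force over all injective mappings φ (exponential in |*W*|, so for small configurations only); in this sketch, labels are given as a dict and digraphs as sets of directed edges:

```python
from itertools import permutations

def has_appearance(W, F, L, V, E):
    """Does digraph (V, E) have an O-appearance of O = ((W, F), L)?
    L maps each w in W to 'developed' or 'frontier'."""
    W, V = list(W), list(V)
    for image in permutations(V, len(W)):      # all injective mappings W -> V
        phi = dict(zip(W, image))
        ok = True
        for v in W:
            if L[v] != 'developed':
                continue
            # (i) out-edges of v are mirrored exactly between H and G
            if any(((v, u) in F) != ((phi[v], phi[u]) in E) for u in W):
                ok = False; break
            # (ii) every out-edge of phi(v) in G stays inside the image of W
            if any(x not in phi.values() for (y, x) in E if y == phi[v]):
                ok = False; break
        if ok:
            return True
    return False
```

For instance, with the configuration for out-degree exactly 2 (one developed vertex with two frontier out-neighbours), a graph whose vertex 0 has out-degree 2 has an appearance, while one whose vertex 0 has out-degree 3 does not.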

As we have already stated, P<sub>deg<sup>+</sup>(d−1)</sub> cannot be defined by any set of forbidden subgraphs or induced subgraphs. However, it can be defined by using 𝒪-freeness. That is, let *O*<sub>deg<sup>+</sup>(d−1)</sub> = (*H* = (*W*, *F*), *L*) be a configuration such that *W* = {*v*<sub>0</sub>, *v*<sub>1</sub>, ..., *v*<sub>d−1</sub>}, *F* = {(*v*<sub>0</sub>, *v*<sub>1</sub>), (*v*<sub>0</sub>, *v*<sub>2</sub>), ..., (*v*<sub>0</sub>, *v*<sub>d−1</sub>)}, *L*(*v*<sub>0</sub>) = developed, and *L*(*v*<sub>1</sub>) = *L*(*v*<sub>2</sub>) = ··· = *L*(*v*<sub>d−1</sub>) = frontier. Then P<sub>deg<sup>+</sup>(d−1)</sub> is defined as the set of *O*<sub>deg<sup>+</sup>(d−1)</sub>-free digraphs.

The idea of configuration-freeness (or forbidden configurations) may work for characterizing general one-sided-error testable properties in the *F*(*d*)-model. See [12] for details.

<sup>13</sup> This follows from the fact that P<sub>deg<sup>+</sup>(d−1)</sub> is neither monotone nor hereditary, and from Theorems 2.6 and 2.7.

#### **2.5 Testable EXPTIME-Complete Games**

This section presents results on the testability of combinatorial games, particularly generalized chess, Shogi (Japanese chess), and Xiangqi (Chinese chess). Given a position on a √*n* × √*n* board with *O*(*n*) pieces, the generalized chess, Shogi, and Xiangqi problems ask whether the property "the player who moves first has a winning strategy" holds. These problems are known or believed to be EXPTIME-complete [1, 6, 7]. In [13], we proved that this property is testable for chess, Shogi, and Xiangqi. The Shogi tester and Xiangqi tester are one-sided-error testers, and, surprisingly, the chess tester is a no-error tester. Many problems have been revealed to be testable, but most such problems belong to the class NP. We think that this is the first result on the constant-time testability of EXPTIME-complete problems. We mainly focus on chess, followed by Shogi, but omit the explanation for Xiangqi since the method is similar to that for Shogi. See [13] for details.

#### *2.5.1 Definitions*

We begin by focusing mainly on generalized chess. Generalized chess is played on a √*n* × √*n* board with *O*(*n*) pieces, including two kings. White moves first, and black plays after white. A position is defined by fixing each piece to a particular cell on the board. Given a position *S*, the problem is to determine whether white wins if both players play optimally. The basic rules are the same as those of original chess and are omitted here.

In chess, there are six types of pieces: king (K), queen (Q), bishop (B), knight (N), rook (R), and pawn (P). There are only two kings: one white and one black. For each of the other piece-types (i.e., queen, bishop, knight, rook, and pawn), there exist at most *cn* pieces for white and black, respectively, where *c* is a constant. Piece-numbers from 1 to *cn* are given to the white and black pieces of each piece-type; in other words, each piece has its own *piece ID* (*k*, *o*, ℓ) comprising a piece-type *k* ∈ {K, Q, B, N, R, P}, an owner-color *o* ∈ {white, black}, and a piece-number ℓ ∈ {1, ..., *cn*}.

An algorithm can find the given position through the following oracles.

*q*<sub>1</sub> (the *piece oracle*): given a piece ID (*k*, *o*, ℓ), the oracle answers the cell (*i*, *j*) on which the piece is placed, or a special symbol if the piece is not on the board.
*q*<sub>2</sub> (the *cell oracle*): given a cell (*i*, *j*), the oracle answers the piece ID of the piece placed on the cell, or a special symbol if the cell is empty.
When we explicitly identify the position *S*, we express the oracles as *q*<sub>1</sub>(*k*, *o*, ℓ; *S*) and *q*<sub>2</sub>(*i*, *j*; *S*), respectively. We introduce the assumption that all pieces can be arranged on the board simultaneously, from which it follows that 2 × (5*cn* + 1) ≤ *n*. For simplicity, we assume that

$$c \le 1/11.\tag{2.5}$$

A position *S* is called a *winner* if white has a winning strategy (i.e., white will win if the players start from *S* and play optimally) and a *loser* otherwise. Note that a loser includes not only positions where white loses but also those where the game ends in a draw.

A position is fixed by querying the piece oracle for every piece. The number of different queries for the piece oracle is at most *n*, and thus a position is fixed by at most *n* data. From this, we define the *distance* between positions *S* and *S*′ as

$$\text{dist}(S, S') := \frac{|\{(i, j) \mid q\_2(i, j; S) \neq q\_2(i, j; S')\}|}{n}. \tag{2.6}$$

Clearly 0 ≤ dist(*S*, *S*′) ≤ 1.
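Distance (2.6) can be sketched with positions represented as dicts from occupied cells to piece IDs (a hypothetical representation; cells absent from the dict are empty):

```python
def position_distance(n, S1, S2):
    """dist (2.6): fraction of the n cells on which the two positions differ.
    S1, S2 map cells (i, j) to piece IDs; absent cells are empty."""
    cells = set(S1) | set(S2)
    diff = sum(1 for c in cells if S1.get(c) != S2.get(c))
    return diff / n
```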

Positions *S* and *S*′ are called *isomorphic* if we can make *S* identical to *S*′ by only changing their piece-numbers (changing neither the piece-type nor the owner-color is allowed). A set of positions that is closed under isomorphism is called a *property*. The distance between a position *S* and a property P is defined as follows:

$$\text{dist}(S, \mathcal{P}) := \min\_{S' \in \mathcal{P}} \text{dist}(S, S'). \tag{2.7}$$

For a positive ε > 0, *S* is *ε-far* from P if dist(*S*, P) > ε; otherwise, it is *ε-close*. Let 𝒲 be the set of winners. 𝒲 is clearly closed under isomorphism, and thus 𝒲 is a property.

Similar definitions are used for generalized Shogi and Xiangqi; they are easily deduced and are omitted here. See [13] for details.

#### *2.5.2 Testers for Generalized Chess, Shogi, and Xiangqi*

The following theorem was presented in [13]. Note that a *no-error* tester is a one-sided-error tester that also rejects every input that is ε-far from the property with probability 1; that is, it makes no error both when the input is in the property and when it is ε-far from the property.

**Theorem 2.8** *There exist a no-error tester with query complexity O*(ε<sup>−1</sup>) *for the generalized chess problem, a one-sided-error tester with query complexity O*(ε<sup>−2</sup>) *for the generalized Shogi problem, and a one-sided-error tester with query complexity O*(ε<sup>−1</sup>) *for the generalized Xiangqi problem.*

**Fig. 2.1** The black king will be checkmated by white's next move, as indicated by the arrow

*Proof of the chess part of Theorem* 2.8 Let *S* be a given position. Let *S*′ be the position made from *S* by changing the pieces in cells (*i*, *j*), *i* ∈ {1, 2, 3, 4} and *j* ∈ {1, 2, 3, 4, 5}, as shown in Fig. 2.1.

The pieces that occupied these cells in *S* become unused pieces, and the pieces that appear in these cells in *S*′ are taken from other cells or from the unused pieces. In *S*′, the white king is safe and the black king will be checkmated by white's next move (moving the queen from (3, 2) to (2, 2)), meaning that *S*′ is a winner. The number of changed cells between *S* and *S*′ is at most 20 + 8 = 28. Thus, if *n* ≥ 28/ε, then dist(*S*, *S*′) ≤ 28/*n* ≤ ε. Hence, *S* is ε-close to W, and it is sufficient to accept it. If *n* < 28/ε, it is sufficient to read all of the information by calling the piece oracle for all pieces, which requires *O*(ε<sup>−1</sup>) queries.

This algorithm always accepts a winner. Moreover, if a given position *S* is ε-far from W, then *n* < 28/ε and the algorithm knows the complete information of *S*. Therefore, this algorithm is no-error. □
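The case analysis in this proof translates directly into code. The sketch below is our illustration of the structure only: `piece_oracle` plays the role of *q*<sub>1</sub>, and `decide_full_position` is a hypothetical stand-in for an exact (exponential-time) decision procedure, affordable because it is invoked only when *n* < 28/ε.

```python
def chess_tester(n, epsilon, piece_oracle, decide_full_position, all_piece_ids):
    """No-error tester sketch for the generalized chess problem.

    If n >= 28/epsilon, every position is epsilon-close to a winner (the
    proof modifies at most 28 cells), so accepting is always correct.
    Otherwise n < 28/epsilon, so reading the whole position costs only
    O(1/epsilon) queries and we can decide exactly."""
    if n >= 28 / epsilon:
        return True  # epsilon-close to the property W: accept
    position = {pid: piece_oracle(pid) for pid in all_piece_ids}
    return decide_full_position(position)  # exact answer, no error

# With n large relative to 28/epsilon, the tester accepts without any queries:
print(chess_tester(1000, 0.1, None, None, []))  # True
```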

The algorithms for the generalized Shogi and Xiangqi problems are a little more complicated because, in Shogi and Xiangqi, there are fouls that depend on the position, and a player who commits a foul loses. In the generalized Shogi problem, the following fouls need to be considered.


<sup>14</sup> A piece of some piece-types can be promoted (to a stronger piece) when it enters the opponent's camp.

<sup>15</sup> These three pieces can move only forward.

In a given position *S*, if there is a white piece that commits a foul, then white cannot win,<sup>16</sup> and thus the position is not a winner. However, if the number of pieces involved in fouls is small (e.g., smaller than ε*n*/2), we can remove the fouls and make white win, i.e., the position is ε-close to W. To detect this, we need to perform preprocessing, and the tester may err when the input is ε-far from W. A similar discussion applies to Xiangqi. See [13] for details.

#### **2.6 Summary**

In this chapter, we introduced basic terminology and important results for property testing, which is the most examined framework for constant- or sublinear-time algorithms. In particular, we presented two of our recent results: the first is the complete characterization of one-sided-error testable monotone or hereditary properties on bounded-out-degree digraphs, and the second is the set of testers for the generalized chess, Shogi, and Xiangqi problems, which are all EXPTIME-complete.

The 21st century can be called the era of big data, and the larger big data becomes, the more we need sublinear- and constant-time algorithms. The importance of this area will continue to grow. The number of fields in which constant-time algorithms can be applied effectively will increase, and new techniques will be found accordingly. We eagerly await these developments.

#### **References**


<sup>16</sup> If both players play fouls, the game is a draw.


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Chapter 3 Constant-Time Algorithms for Continuous Optimization Problems**

**Yuichi Yoshida**

**Abstract** In this chapter, we consider constant-time algorithms for continuous optimization problems. Specifically, we consider quadratic function minimization and tensor decomposition, both of which have numerous applications in machine learning and data mining. The key component in our analysis is *graph limit theory*, which was originally developed to study graphs analytically.

#### **3.1 Introduction**

In this chapter, we turn our attention to constant-time algorithms for continuous optimization problems. Specifically, we consider quadratic function minimization and tensor decomposition, both of which have numerous applications in machine learning and data mining. The key component in our analysis is *graph limit theory*, which was originally developed to study graphs analytically.

We introduce graph limit theory in Sect. 3.2, and then discuss quadratic function minimization and tensor decomposition in Sects. 3.3 and 3.4, respectively. Throughout this chapter, we assume the real RAM model, in which we can perform basic algebraic operations on real numbers in one step. For a positive integer *n*, let [*n*] denote the set {1, 2, ..., *n*}. For real values *a*, *b*, *c* ∈ ℝ, we write *a* = *b* ± *c* as shorthand for *b* − *c* ≤ *a* ≤ *b* + *c*. The algorithms and analysis presented in this chapter are based on [5, 6].

#### **3.2 Graph Limit Theory**

This section reviews the basic concepts of graph limit theory. For further details, refer to the book by Lovász [7].

Y. Yoshida (B)

National Institute of Informatics, 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo, Japan e-mail: yyoshida@nii.ac.jp

We call a (measurable) function W : [0, 1]<sup>*K*</sup> → ℝ a *dikernel of order K*. We define

$$|\mathcal{W}|\_F = \sqrt{\int\_{[0,1]^K} \mathcal{W}(\mathbf{x})^2 d\mathbf{x}},\tag{Frobenius norm}$$

$$|\mathcal{W}|\_{\max} = \max\_{\mathbf{x} \in [0,1]^K} |\mathcal{W}(\mathbf{x})|,\tag{Max norm}$$

$$|\mathcal{W}|\_{\square} = \sup\_{S\_1, \dots, S\_K \subseteq [0,1]} \left| \int\_{S\_1 \times \cdots \times S\_K} \mathcal{W}(\mathbf{x}) d\mathbf{x} \right|.\tag{Cut norm}$$

We note that these norms satisfy the triangle inequality. For two dikernels W and W′, we define their *inner product* as ⟨W, W′⟩ = ∫<sub>[0,1]<sup>*K*</sup></sub> W(*x*)W′(*x*)d*x*. For a dikernel W : [0, 1]<sup>2</sup> → ℝ and a function *f* : [0, 1] → ℝ, we define a function W*f* : [0, 1] → ℝ by (W*f*)(*x*) = ⟨W(*x*, ·), *f*⟩.
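For a step dikernel that is constant on the cells of an *n* × *n* grid (the case that arises from matrices later in this chapter), the integrals in these norms reduce to finite sums. The numpy sketch below (our own illustration) evaluates the Frobenius and max norms this way and brute-forces the cut norm over unions of cells, which is feasible only for tiny *n*.

```python
import itertools
import numpy as np

def powerset(xs):
    xs = list(xs)
    return itertools.chain.from_iterable(
        itertools.combinations(xs, r) for r in range(len(xs) + 1))

def dikernel_norms(A):
    """Norms of the step dikernel built from an n x n matrix A
    (constant on each 1/n x 1/n cell, so every integral is a sum / n^2)."""
    n = A.shape[0]
    frobenius = np.sqrt((A ** 2).sum() / n ** 2)
    max_norm = np.abs(A).max()
    # The cut-norm supremum is attained on unions of cells, so brute-force
    # all subsets of rows and columns (exponential: tiny n only):
    cut = max(abs(A[np.ix_(rows, cols)].sum()) / n ** 2
              for rows in powerset(range(n)) for cols in powerset(range(n)))
    return float(frobenius), float(max_norm), float(cut)

print(dikernel_norms(np.array([[1.0, -1.0], [-1.0, 1.0]])))  # (1.0, 1.0, 0.25)
```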

Let λ be the Lebesgue measure. A map π : [0, 1] → [0, 1] is said to be *measure-preserving* if the pre-image π<sup>−1</sup>(*X*) is measurable for every measurable set *X*, and λ(π<sup>−1</sup>(*X*)) = λ(*X*). A *measure-preserving bijection* is a measure-preserving map whose inverse map exists and is also measurable (and, in turn, also measure-preserving). For a measure-preserving bijection π : [0, 1] → [0, 1] and a dikernel W : [0, 1]<sup>*K*</sup> → ℝ, we define a dikernel π(W) : [0, 1]<sup>*K*</sup> → ℝ as π(W)(*x*<sub>1</sub>, ..., *x*<sub>*K*</sub>) = W(π(*x*<sub>1</sub>), ..., π(*x*<sub>*K*</sub>)).

A partition P = (*V*<sub>1</sub>, ..., *V*<sub>*p*</sub>) of the interval [0, 1] is called an *equipartition* if λ(*V*<sub>*i*</sub>) = 1/*p* for every *i* ∈ [*p*]. For a dikernel W : [0, 1]<sup>*K*</sup> → ℝ and an equipartition P = (*V*<sub>1</sub>, ..., *V*<sub>*p*</sub>) of [0, 1], we define W<sub>P</sub> : [0, 1]<sup>*K*</sup> → ℝ as the dikernel obtained by averaging W over each cell *V*<sub>*i*<sub>1</sub></sub> × ··· × *V*<sub>*i*<sub>*K*</sub></sub> for *i*<sub>1</sub>, ..., *i*<sub>*K*</sub> ∈ [*p*]. More formally, we define

$$\mathcal{W}\_{\mathcal{P}}(\mathbf{x}) = \frac{1}{\prod\_{k \in [K]} \lambda(V\_{i\_k})} \int\_{V\_{i\_1} \times \cdots \times V\_{i\_K}} \mathcal{W}(\mathbf{x'}) d\mathbf{x'} = p^K \int\_{V\_{i\_1} \times \cdots \times V\_{i\_K}} \mathcal{W}(\mathbf{x'}) d\mathbf{x'},$$

where *ik* is the unique index such that *xk* ∈ *Vik* for each *k* ∈ [*K*]. The following lemma states that any dikernel W : [0, 1] *<sup>K</sup>* <sup>→</sup> <sup>R</sup> can be well approximated by WP for some equipartition P into a small number of parts.
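For *K* = 2 and a step dikernel represented by an *n* × *n* matrix, the averaging that produces W<sub>P</sub> is ordinary block-averaging. A minimal numpy sketch (our own illustration), assuming *n* is divisible by *p*:

```python
import numpy as np

def average_over_equipartition(A, p):
    """W_P for K = 2: replace each cell of the p x p equipartition by its
    average. A is n x n with n divisible by p, representing a step dikernel."""
    n = A.shape[0]
    b = n // p
    block_means = A.reshape(p, b, p, b).mean(axis=(1, 3))  # p x p block means
    return np.kron(block_means, np.ones((b, b)))           # expand back to n x n

A = np.arange(16, dtype=float).reshape(4, 4)
W_P = average_over_equipartition(A, 2)
print(W_P[0, 0])  # average of the top-left 2 x 2 block: (0 + 1 + 4 + 5) / 4 = 2.5
```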

**Lemma 3.1** (Weak regularity lemma for dikernels [4]) *Let* W<sup>1</sup>, ..., W<sup>*T*</sup> : [0, 1]<sup>*K*</sup> → ℝ *be dikernels. Then, for any* ε > 0*, there exists an equipartition* P *into* |P| ≤ 2<sup>*O*(*T*/ε<sup>2*K*</sup>)</sup> *parts such that, for every t* ∈ [*T*]*,*

$$|\mathcal{W}^t - \mathcal{W}^t\_{\mathcal{P}}|\_{\square} \le \epsilon |\mathcal{W}^t|\_F.$$

We can construct a dikernel X : [0, 1]<sup>*K*</sup> → ℝ from a tensor *X* ∈ ℝ<sup>*N*<sub>1</sub>×···×*N*<sub>*K*</sub></sup> as follows. For an integer *n* ∈ ℕ, let *I*<sup>*n*</sup><sub>1</sub> = [0, 1/*n*], *I*<sup>*n*</sup><sub>2</sub> = (1/*n*, 2/*n*], ..., *I*<sup>*n*</sup><sub>*n*</sub> = ((*n*−1)/*n*, 1]. For *x* ∈ [0, 1], we define *i*<sub>*n*</sub>(*x*) ∈ [*n*] as the unique integer such that *x* ∈ *I*<sup>*n*</sup><sub>*i*<sub>*n*</sub>(*x*)</sub>. We then define X(*x*<sub>1</sub>, ..., *x*<sub>*K*</sub>) = *X*<sub>*i*<sub>*N*<sub>1</sub></sub>(*x*<sub>1</sub>)···*i*<sub>*N*<sub>*K*</sub></sub>(*x*<sub>*K*</sub>)</sub>. The main motivation for creating a dikernel from a tensor is that, in doing so, we can define the distance between two tensors

*X* and *Y* of different sizes via the cut norm—that is, |X − Y|<sub>□</sub>, where X and Y are the dikernels corresponding to *X* and *Y*, respectively.

Let W : [0, 1]<sup>*K*</sup> → ℝ be a dikernel and let *S*<sub>*k*</sub> = (*x*<sup>*k*</sup><sub>1</sub>, ..., *x*<sup>*k*</sup><sub>*s*</sub>) for *k* ∈ [*K*] be sequences of elements of [0, 1]. We define a dikernel W|<sub>*S*<sub>1</sub>,...,*S*<sub>*K*</sub></sub> : [0, 1]<sup>*K*</sup> → ℝ as follows: we first extract a tensor *W* ∈ ℝ<sup>*s*×···×*s*</sup> by setting *W*<sub>*i*<sub>1</sub>···*i*<sub>*K*</sub></sub> = W(*x*<sup>1</sup><sub>*i*<sub>1</sub></sub>, ..., *x*<sup>*K*</sup><sub>*i*<sub>*K*</sub></sub>), and then define W|<sub>*S*<sub>1</sub>,...,*S*<sub>*K*</sub></sub> as the dikernel corresponding to *W*. The following is the key technical lemma in the analysis of the algorithms given in the subsequent sections.

**Lemma 3.2** *Let* W<sup>1</sup>, ..., W<sup>*T*</sup> : [0, 1]<sup>*K*</sup> → [−*L*, *L*] *be dikernels. Let S*<sub>1</sub>, ..., *S*<sub>*K*</sub> *be sequences of s elements uniformly and independently sampled from* [0, 1]*. Then, with probability at least* 1 − exp(−Ω<sub>*K*</sub>(*s*<sup>2</sup>(*T*/log *s*)<sup>1/*K*</sup>))*, there exists a measure-preserving bijection* π : [0, 1] → [0, 1] *such that, for every t* ∈ [*T*]*, we have*

$$|\mathcal{W}^t - \pi(\mathcal{W}^t|\_{S\_1,\dots,S\_K})|\_{\square} = L \cdot O\_K\left(\left(\frac{T}{\log s}\right)^{1/2K}\right),$$

*where O*<sub>*K*</sub>(·) *and* Ω<sub>*K*</sub>(·) *hide factors depending on K.*

#### **3.3 Quadratic Function Minimization**

Background

Quadratic functions are one of the most important function classes in machine learning, statistics, and data mining. Many fundamental problems, such as linear regression, *k*-means clustering, principal component analysis, support vector machines, and kernel methods, can be formulated as the minimization of a quadratic function. See, e.g., [8] for more details.

In some applications, it is sufficient to compute the minimum value of a quadratic function rather than its solution. For example, Yamada et al. [13] proposed an efficient method for estimating the Pearson divergence, which provides useful information about data, such as the density ratio [10]. They formulated the estimation problem as the minimization of a squared loss and showed that the Pearson divergence can be estimated from the minimum value. Least-squares mutual information [9] is another example that can be computed in a similar manner.

Despite its importance, the minimization of quadratic functions suffers from scalability issues. Let *n* ∈ ℕ be the number of variables. In general, this kind of minimization problem can be solved by quadratic programming (QP), which requires poly(*n*) time. If the problem is convex and unconstrained, it reduces to solving a system of linear equations, which requires *O*(*n*<sup>3</sup>) time. Both methods easily become infeasible, even for medium-scale problems of, say, *n* > 10,000.

Although several techniques have been proposed to accelerate quadratic function minimization, they require at least linear time in *n*. This is problematic when handling

#### **Algorithm 1**

**Input:** *n* ∈ ℕ, query access to a matrix *A* ∈ ℝ<sup>*n*×*n*</sup> and to vectors *d*, *b* ∈ ℝ<sup>*n*</sup>, and ε, δ ∈ (0, 1).
1: *S* ← a sequence of *s* = *s*(ε, δ) indices independently and uniformly sampled from [*n*].
2: **return** (*n*<sup>2</sup>/*s*<sup>2</sup>) · min<sub>*v*∈ℝ<sup>*s*</sup></sub> *p*<sub>*s*,*A*|<sub>*S*</sub>,*d*|<sub>*S*</sub>,*b*|<sub>*S*</sub></sub>(*v*).

large-scale problems, where even linear time can be prohibitive. For example, stochastic gradient descent (SGD) is an optimization method widely used for large-scale problems. A nice property of this method is that, if the objective function is strongly convex, it outputs a point sufficiently close to an optimal solution after a constant number of iterations [1]. Nevertheless, each iteration needs at least Ω(*n*) time to access the variables. Another popular technique is low-rank approximation, such as Nyström's method [12]. The underlying idea is to approximate the input matrix by a low-rank matrix, which drastically reduces the time complexity. However, we still need to compute a matrix–vector product of size *n*, which requires Ω(*n*) time. Clarkson et al. [2] proposed sublinear-time algorithms for special cases of quadratic function minimization. However, these are "sublinear" with respect to the number of pairwise interactions of the variables, which is Θ(*n*<sup>2</sup>), and the algorithms require *O*(*n* log<sup>*c*</sup> *n*) time for some *c* ≥ 1.

Constant-time algorithm for quadratic function minimization

Let *A* ∈ ℝ<sup>*n*×*n*</sup> be a matrix and *d*, *b* ∈ ℝ<sup>*n*</sup> be vectors. We consider the following quadratic problem:

$$\underset{\mathbf{v}\in\mathbb{R}^{n}}{\text{minimize}}\ p\_{n,A,\mathbf{d},\mathbf{b}}(\mathbf{v}), \text{ where } p\_{n,A,\mathbf{d},\mathbf{b}}(\mathbf{v}) = \langle \mathbf{v}, A\mathbf{v} \rangle + n\langle \mathbf{v}, \text{diag}(\mathbf{d})\mathbf{v} \rangle + n\langle \mathbf{b}, \mathbf{v} \rangle,\tag{3.1}$$

where ⟨·, ·⟩ denotes the inner product and diag(*d*) denotes the diagonal matrix whose diagonal entries are specified by *d*. Note that although a constant term can be included in (3.1), it is omitted here because it is irrelevant when optimizing (3.1).

Let *z*<sup>∗</sup> ∈ ℝ be the optimal value of (3.1) and let ε, δ ∈ (0, 1) be parameters. Our goal is then to compute *z* with |*z* − *z*<sup>∗</sup>| = *O*(ε*n*<sup>2</sup>) with probability at least 1 − δ in constant time. We further assume that we have query access to *A*, *b*, and *d*, with which we can obtain any entry by specifying its index. We note that *z*<sup>∗</sup> is typically Θ(*n*<sup>2</sup>) because ⟨*v*, *Av*⟩ consists of Θ(*n*<sup>2</sup>) terms, whereas ⟨*v*, diag(*d*)*v*⟩ and ⟨*b*, *v*⟩ consist of Θ(*n*) terms. Hence, we can regard an error of ε*n*<sup>2</sup> as an error of ε per term, which is reasonably small in typical situations.

Let ·|<sub>*S*</sub> be the operator that extracts the submatrix (or subvector) specified by an index set *S* ⊂ ℕ. Our algorithm is then given as Algorithm 1, where the parameter *s* := *s*(ε, δ) is determined later. In other words, we sample a constant number of indices from the set [*n*] and then solve problem (3.1) restricted to these indices. Note that the number of queries and the time complexity are *O*(*s*<sup>2</sup>) and poly(*s*), respectively.
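Under the additional assumptions used below (*A* symmetric and the restricted matrix positive-definite, so the subproblem has a closed-form minimizer), Algorithm 1 fits in a few lines of numpy. This is our own sketch, not the authors' implementation, and the function names are ours.

```python
import numpy as np

def p(A, d, b, v):
    """p_{n,A,d,b}(v) = <v, A v> + n <v, diag(d) v> + n <b, v>  (Eq. (3.1))."""
    n = len(b)
    return v @ A @ v + n * (v @ (d * v)) + n * (b @ v)

def estimate_min(A, d, b, s, rng):
    """Algorithm 1 sketch: sample s indices with replacement, solve the
    restricted problem in closed form (valid when A is symmetric and
    A|_S + s*diag(d|_S) is positive-definite), and rescale by n^2 / s^2."""
    n = len(b)
    S = rng.integers(n, size=s)
    A_S, d_S, b_S = A[np.ix_(S, S)], d[S], b[S]
    # Stationarity of the restricted objective: 2 (A_S + s diag(d_S)) v = -s b_S
    v = np.linalg.solve(2 * (A_S + s * np.diag(d_S)), -s * b_S)
    return (n ** 2 / s ** 2) * p(A_S, d_S, b_S, v)

# Sanity check with constant inputs, where subsampling is lossless and the
# optimum is -n^2 b0^2 / (4 (a + d0)) by symmetry:
n, s = 100, 20
A, d, b = np.full((n, n), 0.5), np.full(n, 1.0), np.full(n, -2.0)
exact = -n ** 2 * (-2.0) ** 2 / (4 * (0.5 + 1.0))
est = estimate_min(A, d, b, s, np.random.default_rng(0))
print(abs(est - exact) < 1e-6)  # True
```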

The goal of the rest of this section is to show the following approximation guarantee of Algorithm 1.

**Theorem 3.1** *Let v*<sup>∗</sup> *and z*<sup>∗</sup> *be an optimal solution and the optimal value, respectively, of problem* (3.1)*. By choosing s*(ε, δ) = 2<sup>Θ(1/ε<sup>2</sup>)</sup> + Θ(log(1/δ) · log log(1/δ))*, with probability at least* 1 − δ*, a sequence S of s indices independently and uniformly sampled from* [*n*] *satisfies the following: let* *ṽ*<sup>∗</sup> *and* *z̃*<sup>∗</sup> *be an optimal solution and the optimal value, respectively, of the problem* min<sub>*v*∈ℝ<sup>*s*</sup></sub> *p*<sub>*s*,*A*|<sub>*S*</sub>,*d*|<sub>*S*</sub>,*b*|<sub>*S*</sub></sub>(*v*)*. Then, we have*

$$\left| \frac{n^2}{s^2} \tilde{z}^\* - z^\* \right| \le \epsilon L M^2 n^2,$$

*where*

$$L = \max\left\{ \max\_{i,j} |A\_{ij}|, \max\_i |d\_i|, \max\_i |b\_i| \right\} \text{ and } M = \max\left\{ \max\_{i \in [n]} |v\_i^\*|, \max\_{i \in [n]} |\tilde{v}\_i^\*| \right\}.$$

We can show that *M* is bounded when *A* is symmetric and full rank. To see this, we first note that we can assume *A* + *n*·diag(*d*) is positive-definite, as otherwise *p*<sub>*n*,*A*,*d*,*b*</sub> is unbounded and the problem is uninteresting. Then, for any set *S* ⊆ [*n*] of *s* indices, the restricted matrix is again positive-definite because it is a principal submatrix. Hence, we have *v*<sup>∗</sup> = −(*A* + *n*·diag(*d*))<sup>−1</sup>*nb*/2 and *ṽ*<sup>∗</sup> = −(*A*|<sub>*S*</sub> + *s*·diag(*d*|<sub>*S*</sub>))<sup>−1</sup>*sb*|<sub>*S*</sub>/2, which means that *M* is bounded.

#### *3.3.1 Proof of Theorem 3.1*

To use dikernels in our analysis, we first introduce a continuous version of *p*<sub>*n*,*A*,*d*,*b*</sub>. The real-valued functional *P*<sub>*n*,*A*,*d*,*b*</sub> on functions *f* : [0, 1] → ℝ is defined as

$$P\_{n,A,\mathbf{d},\mathbf{b}}(f) = \langle f, \mathcal{A}f \rangle + \langle f^2, \mathcal{D}1 \rangle + \langle f, \mathcal{B}1 \rangle,$$

where A, D, and B are the dikernels corresponding to *A*, *d***1**<sup>⊤</sup>, and *b***1**<sup>⊤</sup>, respectively, *f*<sup>2</sup> : [0, 1] → ℝ is the function defined by *f*<sup>2</sup>(*x*) = *f*(*x*)<sup>2</sup> for every *x* ∈ [0, 1], and **1** : [0, 1] → ℝ is the constant function that takes the value 1 everywhere. The following lemma states that the minimizations of *p*<sub>*n*,*A*,*d*,*b*</sub> and *P*<sub>*n*,*A*,*d*,*b*</sub> are equivalent:

**Lemma 3.3** *Let A* ∈ ℝ<sup>*n*×*n*</sup> *be a matrix and d*, *b* ∈ ℝ<sup>*n*</sup> *be vectors. Then, we have*

$$\min\_{v \in [-M,M]^n} p\_{n,A,d,b}(v) = n^2 \cdot \inf\_{f: [0,1] \to [-M,M]} P\_{n,A,d,b}(f)$$

*for any M* > 0*.*

*Proof* First, we show that *n*<sup>2</sup> · inf<sub>*f*:[0,1]→[−*M*,*M*]</sub> *P*<sub>*n*,*A*,*d*,*b*</sub>(*f*) ≤ min<sub>*v*∈[−*M*,*M*]<sup>*n*</sup></sub> *p*<sub>*n*,*A*,*d*,*b*</sub>(*v*). Given a vector *v* ∈ [−*M*, *M*]<sup>*n*</sup>, we define *f* : [0, 1] → [−*M*, *M*] as *f*(*x*) = *v*<sub>*i*<sub>*n*</sub>(*x*)</sub>. Then,

$$\begin{split} \langle f, \mathcal{A}f \rangle &= \sum\_{i,j\in[n]} \int\_{I\_i^n} \int\_{I\_j^n} A\_{ij} f(x) f(y) \mathrm{d}x \mathrm{d}y = \frac{1}{n^2} \sum\_{i,j\in[n]} A\_{ij} v\_i v\_j = \frac{1}{n^2} \langle \mathbf{v}, A\mathbf{v} \rangle, \\ \langle f^2, \mathcal{D}1 \rangle &= \sum\_{i,j\in[n]} \int\_{I\_i^n} \int\_{I\_j^n} d\_i f(x)^2 \mathrm{d}x \mathrm{d}y = \sum\_{i\in[n]} \int\_{I\_i^n} d\_i f(x)^2 \mathrm{d}x = \frac{1}{n} \sum\_{i\in[n]} d\_i v\_i^2 = \frac{1}{n} \langle \mathbf{v}, \mathrm{diag}(\mathbf{d}) \mathbf{v} \rangle, \\ \langle f, \mathcal{B}1 \rangle &= \sum\_{i,j\in[n]} \int\_{I\_i^n} \int\_{I\_j^n} b\_i f(x) \mathrm{d}x \mathrm{d}y = \sum\_{i\in[n]} \int\_{I\_i^n} b\_i f(x) \mathrm{d}x = \frac{1}{n} \sum\_{i\in[n]} b\_i v\_i = \frac{1}{n} \langle \mathbf{b}, \mathbf{v} \rangle. \end{split}$$

Hence, we have *n*<sup>2</sup>*P*<sub>*n*,*A*,*d*,*b*</sub>(*f*) ≤ *p*<sub>*n*,*A*,*d*,*b*</sub>(*v*).

Next, we show that min<sub>*v*∈[−*M*,*M*]<sup>*n*</sup></sub> *p*<sub>*n*,*A*,*d*,*b*</sub>(*v*) ≤ *n*<sup>2</sup> · inf<sub>*f*:[0,1]→[−*M*,*M*]</sub> *P*<sub>*n*,*A*,*d*,*b*</sub>(*f*). Let *f* : [0, 1] → [−*M*, *M*] be a measurable function. For *x* ∈ [0, 1], we then have

$$\frac{\partial P\_{n,A,\mathbf{d},\mathbf{b}}(f)}{\partial f(x)} = \sum\_{i \in [n]} \int\_{I\_i^n} A\_{i\,i\_n(x)} f(y) \mathrm{d}y + \sum\_{j \in [n]} \int\_{I\_j^n} A\_{i\_n(x)\,j} f(y) \mathrm{d}y + 2d\_{i\_n(x)} f(x) + b\_{i\_n(x)}.$$

Note that the form of this partial derivative depends only on *i*<sub>*n*</sub>(*x*). Hence, in the optimal solution *f*<sup>∗</sup> : [0, 1] → [−*M*, *M*], we can assume *f*<sup>∗</sup>(*x*) = *f*<sup>∗</sup>(*y*) whenever *i*<sub>*n*</sub>(*x*) = *i*<sub>*n*</sub>(*y*). In other words, *f*<sup>∗</sup> is constant on each of the intervals *I*<sup>*n*</sup><sub>1</sub>, ..., *I*<sup>*n*</sup><sub>*n*</sub>. For such an *f*<sup>∗</sup>, we define the vector *v* ∈ ℝ<sup>*n*</sup> by *v*<sub>*i*</sub> = *f*<sup>∗</sup>(*x*), where *x* ∈ [0, 1] is any element of *I*<sup>*n*</sup><sub>*i*</sub>. Then, we have

$$
\langle \mathbf{v}, A\mathbf{v} \rangle = \sum\_{i, j \in [n]} A\_{ij} v\_i v\_j = n^2 \sum\_{i, j \in [n]} \int\_{I\_i^n} \int\_{I\_j^n} A\_{ij} f^\*(x) f^\*(y) \mathrm{d}x \mathrm{d}y = n^2 \langle f^\*, \mathcal{A}f^\* \rangle,
$$

$$
\langle \mathbf{v}, \operatorname{diag}(\mathbf{d}) \mathbf{v} \rangle = \sum\_{i \in [n]} d\_i \mathbf{v}\_i^2 = n \sum\_{i \in [n]} \int\_{I\_i^n} d\_i f^\*(\mathbf{x})^2 \mathbf{dx} = n \langle (f^\*)^2, \mathcal{D}1 \rangle,
$$

$$
\langle \mathbf{v}, \mathbf{b} \rangle = \sum\_{i \in [n]} b\_i \mathbf{v}\_i = n \sum\_{i \in [n]} \int\_{I\_i^n} b\_i f^\*(\mathbf{x}) \mathbf{dx} = n \langle f^\*, \mathcal{B}1 \rangle.
$$

Hence, we have *p*<sub>*n*,*A*,*d*,*b*</sub>(*v*) ≤ *n*<sup>2</sup>*P*<sub>*n*,*A*,*d*,*b*</sub>(*f*<sup>∗</sup>). □

*Proof* (*of Theorem* 3.1) We instantiate Lemma 3.2 with *s* = 2<sup>Θ(1/ε<sup>2</sup>)</sup> + Θ(log(1/δ) · log log(1/δ)) and the dikernels A, D, and B. Then, with probability at least 1 − δ, there exists a measure-preserving bijection π : [0, 1] → [0, 1] such that

$$\max\left\{ |\langle f, (\mathcal{A} - \pi(\mathcal{A}|\_S)) f \rangle|, |\langle f^2, (\mathcal{D} - \pi(\mathcal{D}|\_S)) 1 \rangle|, |\langle f, (\mathcal{B} - \pi(\mathcal{B}|\_S)) 1 \rangle| \right\} \le \frac{\epsilon L M^2}{3}$$

for any function *f* : [0, 1] → [−*M*, *M*]. Conditioned on this event, we have

$$\begin{split} \tilde{z}^\* &= \min\_{\mathbf{v}\in\mathbb{R}^s} p\_{s,A|\_S,\mathbf{d}|\_S,\mathbf{b}|\_S}(\mathbf{v}) = \min\_{\mathbf{v}\in[-M,M]^s} p\_{s,A|\_S,\mathbf{d}|\_S,\mathbf{b}|\_S}(\mathbf{v}) \\ &= s^2 \cdot \inf\_{f:[0,1]\to[-M,M]} P\_{s,A|\_S,\mathbf{d}|\_S,\mathbf{b}|\_S}(f) \quad \text{(by Lemma 3.3)} \\ &= s^2 \cdot \inf\_{f:[0,1]\to[-M,M]} \Bigl( \langle f, (\pi(\mathcal{A}|\_S) - \mathcal{A}) f \rangle + \langle f, \mathcal{A} f \rangle + \langle f^2, (\pi(\mathcal{D}|\_S) - \mathcal{D})1 \rangle \\ &\qquad\qquad + \langle f^2, \mathcal{D}1 \rangle + \langle f, (\pi(\mathcal{B}|\_S) - \mathcal{B})1 \rangle + \langle f, \mathcal{B}1 \rangle \Bigr) \\ &= s^2 \cdot \inf\_{f:[0,1]\to[-M,M]} \bigl( \langle f, \mathcal{A} f \rangle + \langle f^2, \mathcal{D}1 \rangle + \langle f, \mathcal{B}1 \rangle \bigr) \pm \epsilon L M^2 s^2 \\ &= \frac{s^2}{n^2} \cdot \min\_{\mathbf{v}\in[-M,M]^n} p\_{n,A,\mathbf{d},\mathbf{b}}(\mathbf{v}) \pm \epsilon L M^2 s^2 \quad \text{(by Lemma 3.3)} \\ &= \frac{s^2}{n^2} \cdot \min\_{\mathbf{v}\in\mathbb{R}^n} p\_{n,A,\mathbf{d},\mathbf{b}}(\mathbf{v}) \pm \epsilon L M^2 s^2 = \frac{s^2}{n^2} z^\* \pm \epsilon L M^2 s^2. \end{split}$$

Rearranging the inequality, we obtain the desired result.

#### **3.4 Tensor Decomposition**

Background

We say that a tensor (or a multidimensional array) is of *order K* if it is a *K*-dimensional array. Each dimension is called a *mode* in tensor terminology. Tensor decomposition, which approximates the input tensor by a number of smaller tensors, is a fundamental tool for dealing with large tensors because it drastically reduces memory usage.

Among the many existing tensor decomposition methods, *Tucker decomposition* [11] is a popular choice. To some extent, Tucker decomposition is analogous to singular-value decomposition (SVD). Whereas SVD decomposes a matrix into left and right singular vectors that interact via singular values, Tucker decomposition of an order-*K* tensor consists of *K* factor matrices that interact via the so-called core tensor. The key difference between SVD and Tucker decomposition is that, in the latter, the core tensor does not need to be diagonal and its "rank" can differ for each mode. We refer to the size of the core tensor, which is a *K*-tuple, as the *Tucker rank* of a Tucker decomposition.

We are usually interested in obtaining factor matrices and a core tensor to minimize the *residual error*—the error between the input and low-rank approximated tensors. Sometimes, however, knowing the residual error itself is a task of interest. The residual error tells us how suitable a low-rank approximation is to approximate the input tensor in the first place, and is also useful to predetermine the Tucker rank. In real applications, Tucker ranks are not explicitly given, and we must select them by considering the tradeoff between space usage and approximation accuracy. For example, if the selected Tucker rank is too small, we risk losing essential information in the input tensor, whereas if the selected Tucker rank is too large, the computational cost of computing the Tucker decomposition (even if we allow for approximation methods) increases considerably along with space usage. As with the case of the matrix rank, one might think that a reasonably good Tucker rank can be found using a grid search. Unfortunately, grid search for an appropriate Tucker rank is challenging because, for an order-*K* tensor, the Tucker rank consists of *K* free parameters and the search space grows exponentially in *K*. Hence, we want to evaluate each grid point as quickly as possible.

Although several practical algorithms have been proposed, such as the higher-order orthogonal iteration (HOOI) [3], they are not sufficiently scalable. For each mode, HOOI iteratively applies SVD to an unfolded tensor—a matrix reshaped from the input tensor. Given an *N*<sub>1</sub> × ··· × *N*<sub>*K*</sub> tensor, the computational cost is hence *O*(*K* max<sub>*k*</sub> *N*<sub>*k*</sub> · ∏<sub>*k*</sub> *N*<sub>*k*</sub>), which crucially depends on the input sizes *N*<sub>1</sub>, ..., *N*<sub>*K*</sub>. Although there are several approximation algorithms, their computational costs are still intensive.

Constant-time algorithm for the Tucker fitting problem

The problem of computing the residual error is formalized as the following *Tucker fitting* problem: given an order-*K* tensor *X* ∈ ℝ<sup>*N*<sub>1</sub>×···×*N*<sub>*K*</sub></sup> and integers *R*<sub>*k*</sub> ≤ *N*<sub>*k*</sub> (*k* = 1, ..., *K*), we want to compute the following normalized residual error:

$$\ell\_{R\_1,\ldots,R\_K}(X) := \min\_{G \in \mathbb{R}^{R\_1 \times \cdots \times R\_K},\ \{U^{(k)} \in \mathbb{R}^{N\_k \times R\_k}\}\_{k \in [K]}} \frac{\left| X - \llbracket G; U^{(1)}, \ldots, U^{(K)} \rrbracket \right|\_F^2}{\prod\_{k \in [K]} N\_k},\tag{3.2}$$

where [[*G*; *U*<sup>(1)</sup>, ..., *U*<sup>(*K*)</sup>]] ∈ ℝ<sup>*N*<sub>1</sub>×···×*N*<sub>*K*</sub></sup> is the order-*K* tensor defined as

$$\llbracket G; U^{(1)}, \dots, U^{(K)} \rrbracket\_{i\_1 \cdots i\_K} = \sum\_{r\_1 \in [R\_1], \dots, r\_K \in [R\_K]} G\_{r\_1 \cdots r\_K} \prod\_{k \in [K]} U^{(k)}\_{i\_k r\_k}$$

for every *i*<sub>1</sub> ∈ [*N*<sub>1</sub>], ..., *i*<sub>*K*</sub> ∈ [*N*<sub>*K*</sub>]. Here, *G* is the core tensor, and *U*<sup>(1)</sup>, ..., *U*<sup>(*K*)</sup> are the factor matrices. Note that we are not concerned with computing the minimizer; we only want to compute the minimum value. In addition, we do not need the exact minimum: a rough estimate still helps to narrow down promising rank candidates. The question here is how quickly we can compute the normalized residual error ℓ<sub>*R*<sub>1</sub>,...,*R*<sub>*K*</sub></sub>(*X*) with moderate accuracy.
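The multilinear product [[*G*; *U*<sup>(1)</sup>, ..., *U*<sup>(*K*)</sup>]] is conveniently expressed with einsum. The numpy sketch below (our own illustration, for *K* = 3, with the residual normalized by the number of entries of *X*) evaluates the Tucker fitting objective for one fixed candidate decomposition:

```python
import numpy as np

def tucker_reconstruct(G, U1, U2, U3):
    """[[G; U1, U2, U3]]_{ijk} = sum_{r1,r2,r3} G_{r1 r2 r3} U1_{i r1} U2_{j r2} U3_{k r3}."""
    return np.einsum('abc,ia,jb,kc->ijk', G, U1, U2, U3)

def tucker_objective(X, G, U1, U2, U3):
    """Squared Frobenius residual of one candidate (G, U1, U2, U3),
    normalized by the number of entries N1 * N2 * N3 of X."""
    return float(((X - tucker_reconstruct(G, U1, U2, U3)) ** 2).sum() / X.size)

rng = np.random.default_rng(0)
G = rng.standard_normal((2, 2, 2))
U1, U2, U3 = (rng.standard_normal((5, 2)) for _ in range(3))
X = tucker_reconstruct(G, U1, U2, U3)       # exactly Tucker rank (2, 2, 2)
print(tucker_objective(X, G, U1, U2, U3))   # 0.0
```

Taking the minimum of this objective over all (G, U1, U2, U3) would recover the value ℓ itself; here we only evaluate one candidate.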

In this section, we consider the following simple sampling algorithm and show that it can be used to approximately solve the Tucker fitting problem. First, given an order-$K$ tensor $X \in \mathbb{R}^{N_1 \times \cdots \times N_K}$, a Tucker rank $(R_1, \ldots, R_K)$, and a sample size $s \in \mathbb{N}$, we sample a sequence of indices $S_k = (x^k_1, \ldots, x^k_s)$ uniformly and independently from $[N_k]$ for each mode $k \in [K]$. We then construct a mini-tensor $X|_{S_1,\ldots,S_K} \in \mathbb{R}^{s \times \cdots \times s}$, where $(X|_{S_1,\ldots,S_K})_{i_1,\ldots,i_K} = X_{x^1_{i_1}, \ldots, x^K_{i_K}}$. Finally, we compute $\ell_{R_1,\ldots,R_K}(X|_{S_1,\ldots,S_K})$ using an arbitrary solver, such as HOOI, and output the obtained value. The details are provided in Algorithm 2.

#### **Algorithm 2** Sampling algorithm for the Tucker fitting problem

**Input:** $N_1, \ldots, N_K \in \mathbb{N}$, query access to a tensor $X \in \mathbb{R}^{N_1 \times \cdots \times N_K}$, Tucker rank $(R_1, \ldots, R_K)$, and $\epsilon, \delta \in (0, 1)$.

1: For each mode $k \in [K]$, sample a sequence $S_k = (x^k_1, \ldots, x^k_s)$ of $s = s(\epsilon, \delta)$ indices uniformly and independently from $[N_k]$.

2: Construct the mini-tensor $X|_{S_1,\ldots,S_K}$.

3: Compute $\ell_{R_1,\ldots,R_K}(X|_{S_1,\ldots,S_K})$ with an arbitrary solver (e.g., HOOI) and output the obtained value.

Note that the time complexity for computing $\ell_{R_1,\ldots,R_K}(X|_{S_1,\ldots,S_K})$ does not depend on the input size $N_1, \ldots, N_K$ but only on the sample size $s$, meaning that the algorithm runs in *constant time*, regardless of the input size.
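The sampling step can be sketched in numpy as follows. This is an illustrative toy, not the authors' implementation: truncated HOSVD stands in for HOOI as the inner solver, the residual is normalized by the number of entries, and all helper names are ours.

```python
import numpy as np

def unfold(T, mode):
    """Mode-k unfolding: move mode k to the front and flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def hosvd_residual(T, ranks):
    """Normalized Tucker residual, estimated with truncated HOSVD
    (a cheap stand-in for HOOI as the inner solver)."""
    factors = []
    for k, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(T, k), full_matrices=False)
        factors.append(U[:, :r])
    # Core: project T onto the span of the factor matrices, mode by mode.
    G = T
    for k, U in enumerate(factors):
        G = np.moveaxis(np.tensordot(U.T, np.moveaxis(G, k, 0), axes=1), 0, k)
    # Reconstruct [[G; U^(1), ..., U^(K)]] and measure the residual.
    rec = G
    for k, U in enumerate(factors):
        rec = np.moveaxis(np.tensordot(U, np.moveaxis(rec, k, 0), axes=1), 0, k)
    return ((T - rec) ** 2).sum() / T.size

def sampled_tucker_residual(X, ranks, s, seed=0):
    """Algorithm 2: estimate the residual from an s x ... x s mini-tensor."""
    rng = np.random.default_rng(seed)
    idx = [rng.integers(0, n, size=s) for n in X.shape]
    return hosvd_residual(X[np.ix_(*idx)], ranks)
```

For an exactly low-rank input, both the full and the sampled residuals are (numerically) zero, since every mini-tensor of a rank-$(1,\ldots,1)$ tensor is again rank $(1,\ldots,1)$.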

The goal of the rest of this section is to show the following approximation guarantee of Algorithm 2.

**Theorem 3.2** *Let $X \in \mathbb{R}^{N_1 \times \cdots \times N_K}$ be a tensor, let $R_1, \ldots, R_K$ be integers, and let $\epsilon, \delta \in (0, 1)$. For $s(\epsilon, \delta) = 2^{\Theta(1/\epsilon^{2K-2})} + \Theta\bigl(\log \frac{1}{\delta} \log \log \frac{1}{\delta}\bigr)$, we have the following. Let $S_1, \ldots, S_K$ be sequences of indices as defined in Algorithm 2. Let $(G^*, U^*_1, \ldots, U^*_K)$ and $(\tilde{G}^*, \tilde{U}^*_1, \ldots, \tilde{U}^*_K)$ be minimizers of problem (3.2) on $X$ and $X|_{S_1,\ldots,S_K}$, respectively, for which the factor matrices are orthonormal. Then we have*

$$\ell_{R_1,\ldots,R_K}(X|_{S_1,\ldots,S_K}) = \ell_{R_1,\ldots,R_K}(X) \pm O(\epsilon L^2 (1 + 2MR)),$$

*with probability at least $1 - \delta$, where $L = |X|_{\max}$, $M = \max\{|G^*|_{\max}, |\tilde{G}^*|_{\max}\}$, and $R = \prod_{k \in [K]} R_k$.*

We remark that, for the matrix case (i.e., $K = 2$), $|G^*|_{\max}$ and $|\tilde{G}^*|_{\max}$ are equal to the maximum singular values of the original and sampled matrices, respectively.

#### *3.4.1 Preliminaries*

Let $X \in \mathbb{R}^{N_1 \times \cdots \times N_K}$ be a tensor. We define

$$|X|_F = \sqrt{\sum_{i_1,\ldots,i_K} X_{i_1\cdots i_K}^2},\tag{Frobenius norm}$$

$$|X|_{\max} = \max_{i_1 \in [N_1], \ldots, i_K \in [N_K]} |X_{i_1\cdots i_K}|,\tag{Max norm}$$

$$|X|_{\square} = \max_{S_1 \subseteq [N_1], \ldots, S_K \subseteq [N_K]} \left| \sum_{i_1 \in S_1, \ldots, i_K \in S_K} X_{i_1\cdots i_K} \right|.\tag{Cut norm}$$

We note that these norms satisfy the triangle inequality.
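For tiny tensors, these norms can be computed directly. The brute-force cut norm below (our own helper; it enumerates all index subsets, so it is exponential in the dimensions) is useful only for sanity checks such as the triangle inequality:

```python
import itertools
import numpy as np

def frobenius(X):
    """|X|_F: square root of the sum of squared entries."""
    return np.sqrt((X ** 2).sum())

def max_norm(X):
    """|X|_max: largest absolute entry."""
    return np.abs(X).max()

def cut_norm(X):
    """|X|_cut by brute force: max over nonempty index subsets
    S_1 x ... x S_K of |sum of the selected entries|."""
    subsets_per_mode = [
        [[i for i in range(n) if mask >> i & 1] for mask in range(1, 1 << n)]
        for n in X.shape
    ]
    return max(
        abs(X[np.ix_(*combo)].sum())
        for combo in itertools.product(*subsets_per_mode)
    )

rng = np.random.default_rng(0)
X, Y = rng.standard_normal((3, 3, 3)), rng.standard_normal((3, 3, 3))
for norm in (frobenius, max_norm, cut_norm):
    assert norm(X + Y) <= norm(X) + norm(Y) + 1e-9  # triangle inequality
```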

For a vector $v \in \mathbb{R}^n$ and a sequence $S = (x_1, \ldots, x_s)$ of indices in $[n]$, we define the *restriction* $v|_S \in \mathbb{R}^s$ of $v$ as $(v|_S)_i = v_{x_i}$ for $i \in [s]$. Let $X \in \mathbb{R}^{N_1 \times \cdots \times N_K}$ be a tensor, and let $S_k = (x^k_1, \ldots, x^k_s)$ be a sequence of indices in $[N_k]$ for each mode $k \in [K]$. Then, we define the *restriction* $X|_{S_1,\ldots,S_K} \in \mathbb{R}^{s \times \cdots \times s}$ of $X$ to $S_1 \times \cdots \times S_K$ as $(X|_{S_1,\ldots,S_K})_{i_1 \cdots i_K} = X_{x^1_{i_1}, \ldots, x^K_{i_K}}$ for each $i_1 \in [s], \ldots, i_K \in [s]$.

For a tensor $G \in \mathbb{R}^{R_1 \times \cdots \times R_K}$ and vector-valued functions $\{F^{(k)}: [0,1] \to \mathbb{R}^{R_k}\}_{k \in [K]}$, we define an order-$K$ dikernel $[\![G; F^{(1)}, \ldots, F^{(K)}]\!]: [0,1]^K \to \mathbb{R}$ as

$$[\![G; F^{(1)}, \ldots, F^{(K)}]\!](x_1, \ldots, x_K) = \sum_{r_1 \in [R_1], \ldots, r_K \in [R_K]} G_{r_1 \cdots r_K} \prod_{k \in [K]} F^{(k)}(x_k)_{r_k}.$$

We note that $[\![G; F^{(1)}, \ldots, F^{(K)}]\!]$ is a continuous analogue of the Tucker decomposition.

#### *3.4.2 Proof of Theorem 3.2*

To prove Theorem 3.2, we first consider the dikernel counterpart to the Tucker fitting problem, in which we want to minimize the following:

$$\ell_{R_1,\ldots,R_K}(\mathcal{X}) := \inf_{G \in \mathbb{R}^{R_1 \times \cdots \times R_K},\ \{f^{(k)}: [0,1] \to \mathbb{R}^{R_k}\}_{k \in [K]}} \left| \mathcal{X} - [\![G; f^{(1)}, \ldots, f^{(K)}]\!] \right|_F^2. \tag{3.3}$$

The following lemma, which is proved in Sect. 3.4.3, states that the Tucker fitting problem and its dikernel counterpart have the same optimum values.

**Lemma 3.4** *Let $X \in \mathbb{R}^{N_1 \times \cdots \times N_K}$ be a tensor, and let $R_1, \ldots, R_K \in \mathbb{N}$ be integers. Then, we have*

$$\ell_{R_1,\ldots,R_K}(X) = \ell_{R_1,\ldots,R_K}(\mathcal{X}).$$

For a set of vector-valued functions $F = \{f^{(k)}: [0,1] \to \mathbb{R}^{R_k}\}_{k \in [K]}$, we define $|F|_{\max} = \max_{k \in [K], r \in [R_k], x \in [0,1]} f^{(k)}_r(x)$. For a dikernel $\mathcal{X}: [0,1]^K \to \mathbb{R}$, we define a dikernel $\mathcal{X}^2: [0,1]^K \to \mathbb{R}$ as $\mathcal{X}^2(\boldsymbol{x}) = \mathcal{X}(\boldsymbol{x})^2$ for every $\boldsymbol{x} \in [0,1]^K$. The following lemma, which is proved in Sect. 3.4.4, states that if $\mathcal{X}$ and $\mathcal{Y}$ are close in the cut norm, then the optimal values of the Tucker fitting problem applied to them are also close.

**Lemma 3.5** *Let $\mathcal{X}, \mathcal{Y}: [0,1]^K \to \mathbb{R}$ be dikernels with $|\mathcal{X} - \mathcal{Y}|_\square \le \epsilon$ and $|\mathcal{X}^2 - \mathcal{Y}^2|_\square \le \epsilon$. For $R_1, \ldots, R_K \in \mathbb{N}$, we have*

$$\ell_{R_1,\ldots,R_K}(\mathcal{X}) = \ell_{R_1,\ldots,R_K}(\mathcal{Y}) \pm 2\epsilon \left( 1 + R \left( |G_{\mathcal{X}}|_{\max} |F_{\mathcal{X}}|_{\max}^K + |G_{\mathcal{Y}}|_{\max} |F_{\mathcal{Y}}|_{\max}^K \right) \right),$$

*where $(G_{\mathcal{X}}, F_{\mathcal{X}} = \{f^{(k)}_{\mathcal{X}}\}_{k \in [K]})$ and $(G_{\mathcal{Y}}, F_{\mathcal{Y}} = \{f^{(k)}_{\mathcal{Y}}\}_{k \in [K]})$ are solutions to problem (3.3) on $\mathcal{X}$ and $\mathcal{Y}$, respectively, whose objective values exceed the infima by at most $\epsilon$, and $R = \prod_{k \in [K]} R_k$.*

*Proof* (*of Theorem* 3.2) We apply Lemma 3.2 to $\mathcal{X}$ and $\mathcal{X}^2$. Thus, with probability at least $1 - \delta$, there exists a measure-preserving bijection $\pi: [0,1] \to [0,1]$ such that

$$|\mathcal{X} - \pi(\mathcal{X}|_{S_1,\ldots,S_K})|_\square \le \epsilon L \quad \text{and} \quad |\mathcal{X}^2 - \pi(\mathcal{X}^2|_{S_1,\ldots,S_K})|_\square \le \epsilon L^2.$$

In the following, we assume that this has happened. By Lemma 3.5 and the fact that $\ell_{R_1,\ldots,R_K}(\mathcal{X}|_{S_1,\ldots,S_K}) = \ell_{R_1,\ldots,R_K}(\pi(\mathcal{X}|_{S_1,\ldots,S_K}))$, we have

$$\ell_{R_1,\ldots,R_K}(\mathcal{X}|_{S_1,\ldots,S_K}) = \ell_{R_1,\ldots,R_K}(\mathcal{X}) \pm \epsilon L^2 \Bigl(1 + 2R\bigl(|G|_{\max}|F|_{\max}^K + |\tilde{G}|_{\max}|\tilde{F}|_{\max}^K\bigr)\Bigr),$$

where $(G, F = \{f^{(k)}\}_{k \in [K]})$ and $(\tilde{G}, \tilde{F} = \{\tilde{f}^{(k)}\}_{k \in [K]})$ are as in the statement of Lemma 3.5. From the proof of Lemma 3.4, we can assume that $|G|_{\max} = |G^*|_{\max}$, $|\tilde{G}|_{\max} = |\tilde{G}^*|_{\max}$, $|F|_{\max} \le 1$, and $|\tilde{F}|_{\max} \le 1$ (owing to the orthonormality of $U^*_1, \ldots, U^*_K$ and $\tilde{U}^*_1, \ldots, \tilde{U}^*_K$). It follows that

$$\ell\_{R\_1,\ldots,R\_K}(\mathcal{X}|\_{S\_1,\ldots,S\_K}) = \ell\_{R\_1,\ldots,R\_K}(\mathcal{X}) \pm \epsilon L^2 \left(1 + 2R(|G^\*|\_{\max} + |\tilde{G}^\*|\_{\max})\right). \tag{3.4}$$

Then, we have

$$\begin{aligned}
\ell_{R_1,\ldots,R_K}(X|_{S_1,\ldots,S_K}) &= \ell_{R_1,\ldots,R_K}(\mathcal{X}|_{S_1,\ldots,S_K}) && \text{(by Lemma 3.4)}\\
&= \ell_{R_1,\ldots,R_K}(\mathcal{X}) \pm \epsilon L^2 \Bigl(1 + 2R\bigl(|G^*|_{\max} + |\tilde{G}^*|_{\max}\bigr)\Bigr) && \text{(by (3.4))}\\
&= \ell_{R_1,\ldots,R_K}(X) \pm \epsilon L^2 \Bigl(1 + 2R\bigl(|G^*|_{\max} + |\tilde{G}^*|_{\max}\bigr)\Bigr). && \text{(by Lemma 3.4)}
\end{aligned}$$

Hence, the proof is complete.

#### *3.4.3 Proof of Lemma 3.4*

We say that a vector-valued function $f: [0,1] \to \mathbb{R}^R$ is *orthonormal* if $\langle f_r, f_r \rangle = 1$ for every $r \in [R]$ and $\langle f_r, f_{r'} \rangle = 0$ if $r \ne r'$. First, we calculate the partial derivatives of the objective function. We omit the proof because it is a straightforward (but tedious) calculation.

**Lemma 3.6** *Let $\mathcal{X}: [0,1]^K \to \mathbb{R}$ be a dikernel, $G \in \mathbb{R}^{R_1 \times \cdots \times R_K}$ be a tensor, and $\{f^{(k)}: [0,1] \to \mathbb{R}^{R_k}\}_{k \in [K]}$ be a set of orthonormal vector-valued functions. Then, we have*

$$\begin{aligned}
\frac{\partial}{\partial f^{(k_0)}_{r_0}(x_0)} \left| \mathcal{X} - [\![G; f^{(1)}, \ldots, f^{(K)}]\!] \right|_F^2
&= 2 \sum_{r_1,\ldots,r_K} G_{r_1\cdots r_K} G_{r_1\cdots r_{k_0-1} r_0 r_{k_0+1}\cdots r_K} f^{(k_0)}_{r_{k_0}}(x_0)\\
&\quad - 2 \sum_{r_1,\ldots,r_K:\, r_{k_0}=r_0} G_{r_1\cdots r_K} \int_{[0,1]^K:\, x_{k_0}=x_0} \mathcal{X}(\boldsymbol{x}) \prod_{k \in [K]\setminus\{k_0\}} f^{(k)}_{r_k}(x_k)\, d\boldsymbol{x}.
\end{aligned}$$

*Proof* (*of Lemma* 3.4) First, we show that (LHS) ≤ (RHS). Consider a sequence of solutions to the continuous problem (3.3) whose objective values attain the infimum. For Tucker decompositions, it is well known that there exists a minimizer for which the factor matrices $U^{(1)}, \ldots, U^{(K)}$ are orthonormal. By similar reasoning, we can assume that the vector-valued functions $f^{(1)}, \ldots, f^{(K)}$ in each solution of the sequence are orthonormal. As the objective function is coercive with respect to the tensor $G$, we can take a subsequence along which $G$ converges; let $G^*$ be the limit. Now, for any $\delta > 0$, we can create a tensor $\tilde{G}$ by perturbing $G^*$ so that (i) fixing $G$ to $\tilde{G}$ in the continuous problem increases the infimum by only $\delta$, and (ii) the matrix constructed from $\tilde{G}$ below is invertible and has a condition number at least $\delta' = \delta'(\delta) > 0$.

Now, consider a sequence of solutions to the continuous problem (3.3) with $G$ fixed to $\tilde{G}$ whose objective values attain the infimum. We can show that the partial derivatives converge to zero almost everywhere. For any $\epsilon > 0$, there then exists a solution $(\tilde{G}, f^{(1)}, \ldots, f^{(K)})$ in the sequence such that the partial derivatives are at most $\epsilon$ almost everywhere.

Then, by Lemma 3.6, for any $k_0 \in [K]$, $r_0 \in [R_{k_0}]$, and almost all $x_0 \in [0, 1]$, we have

$$\begin{aligned}
&\sum_{r_1,\ldots,r_K} \tilde{G}_{r_1\cdots r_K} \tilde{G}_{r_1\cdots r_{k_0-1} r_0 r_{k_0+1}\cdots r_K} f^{(k_0)}_{r_{k_0}}(x_0)\\
&\quad = \sum_{r_1,\ldots,r_K:\, r_{k_0}=r_0} \tilde{G}_{r_1\cdots r_K} \int_{[0,1]^K:\, x_{k_0}=x_0} \mathcal{X}(\boldsymbol{x}) \prod_{k\in[K]\setminus\{k_0\}} f^{(k)}_{r_k}(x_k)\, d\boldsymbol{x} \pm \epsilon(k_0, r_0, x_0),
\end{aligned}\tag{3.5}$$

where $\epsilon(k_0, r_0, x_0) = O(\epsilon)$. Now, we consider the system of linear equations consisting of (3.5) for $r_0 = 1, \ldots, R_{k_0}$, where the variables are $f^{(k_0)}_1(x_0), \ldots, f^{(k_0)}_{R_{k_0}}(x_0)$. We can assume that the matrix involved in this system is invertible and has a positive condition number. For any $k \in [K]$, $r \in [R_k]$, and almost every pair $x, x' \in [0,1]$ with $i_{N_k}(x) = i_{N_k}(x')$, we then have $f^{(k)}_r(x) = f^{(k)}_r(x') \pm O(\epsilon/\delta')$. For each $k \in [K]$, we can define a matrix $U^{(k)} \in \mathbb{R}^{N_k \times R_k}$ as $U^{(k)}_{ir} = f^{(k)}_r(x)$, where $x \in [0,1]$ is an arbitrary value with $i_{N_k}(x) = i$. Then, we have

$$\begin{aligned}
\frac{1}{N}\left| X - [\![\tilde{G}; U^{(1)}, \ldots, U^{(K)}]\!] \right|_F^2
&= \frac{1}{N}\sum_{i_1,\ldots,i_K}\left( X_{i_1\cdots i_K} - [\![\tilde{G}; U^{(1)}, \ldots, U^{(K)}]\!]_{i_1\cdots i_K}\right)^2\\
&= \sum_{i_1,\ldots,i_K}\int_{I^{N_1}_{i_1}\times\cdots\times I^{N_K}_{i_K}}\left( \mathcal{X}(\boldsymbol{x}) - [\![\tilde{G}; f^{(1)}, \ldots, f^{(K)}]\!](\boldsymbol{x}) \pm O(\epsilon/\delta')\right)^2 d\boldsymbol{x}\\
&= \left| \mathcal{X} - [\![\tilde{G}; f^{(1)}, \ldots, f^{(K)}]\!] \right|_F^2 \pm O(\epsilon^2 N/(\delta')^2)
\end{aligned}$$

for $N = \prod_{k\in[K]} N_k$. As the choices of $\epsilon$ and $\delta$ are arbitrary, we obtain (LHS) ≤ (RHS).

Second, we show that (RHS) ≤ (LHS). Let $U^{(k)} \in \mathbb{R}^{N_k \times R_k}$ ($k \in [K]$) be matrices. We define a vector-valued function $f^{(k)}: [0,1] \to \mathbb{R}^{R_k}$ as $f^{(k)}_r(x) = U^{(k)}_{i_{N_k}(x)\, r}$ for each $k \in [K]$ and $r \in [R_k]$. Then, we have

$$\begin{aligned}
\left| \mathcal{X} - [\![G; f^{(1)}, \ldots, f^{(K)}]\!] \right|_F^2
&= \int_{[0,1]^K}\left( \mathcal{X}(\boldsymbol{x}) - [\![G; f^{(1)}, \ldots, f^{(K)}]\!](\boldsymbol{x})\right)^2 d\boldsymbol{x}\\
&= \sum_{i_1,\ldots,i_K}\int_{\prod_{k\in[K]} I^{N_k}_{i_k}}\left( \mathcal{X}(\boldsymbol{x}) - [\![G; f^{(1)}, \ldots, f^{(K)}]\!](\boldsymbol{x})\right)^2 d\boldsymbol{x}\\
&= \frac{1}{N}\sum_{i_1,\ldots,i_K}\left( X_{i_1\cdots i_K} - [\![G; U^{(1)}, \ldots, U^{(K)}]\!]_{i_1\cdots i_K}\right)^2
= \frac{1}{N}\left| X - [\![G; U^{(1)}, \ldots, U^{(K)}]\!] \right|_F^2,
\end{aligned}$$

from which the claim follows.

#### *3.4.4 Proof of Lemma 3.5*

For a sequence of functions $f^{(1)}, \ldots, f^{(K)}$, we define their tensor product $\bigotimes_{k\in[K]} f^{(k)}: [0,1]^K \to \mathbb{R}$ as $\bigl(\bigotimes_{k\in[K]} f^{(k)}\bigr)(x_1, \ldots, x_K) = \prod_{k\in[K]} f^{(k)}(x_k)$, which is a dikernel of order $K$.

The cut norm is useful for bounding the absolute value of the inner product between a tensor and a tensor product:

**Lemma 3.7** *Let $\epsilon \ge 0$ and $\mathcal{W}: [0,1]^K \to \mathbb{R}$ be a dikernel with $|\mathcal{W}|_\square \le \epsilon$. Then, for any functions $f^{(1)}, \ldots, f^{(K)}: [0,1] \to [-L, L]$, we have $\bigl|\bigl\langle \mathcal{W}, \bigotimes_{k\in[K]} f^{(k)} \bigr\rangle\bigr| \le \epsilon L^K$.*

*Proof* For $\tau \in \mathbb{R}$ and a function $h: [0,1] \to \mathbb{R}$, let $L_\tau(h) := \{x \in [0,1] \mid h(x) = \tau\}$ be the level set of $h$ at $\tau$. For $\bar{f}^{(i)} = f^{(i)}/L$, we have

$$\begin{aligned}
\left| \left\langle \mathcal{W}, \bigotimes_{k \in [K]} f^{(k)} \right\rangle \right|
&= L^K \left| \left\langle \mathcal{W}, \bigotimes_{k \in [K]} \bar{f}^{(k)} \right\rangle \right| \\
&= L^K \left| \int_{[-1,1]^K} \prod_{k \in [K]} \tau_k \int_{\prod_{k \in [K]} L_{\tau_k}(\bar{f}^{(k)})} \mathcal{W}(\boldsymbol{x})\, d\boldsymbol{x}\, d\boldsymbol{\tau} \right| \\
&\le L^K \int_{[-1,1]^K} \prod_{k \in [K]} |\tau_k| \left| \int_{\prod_{k \in [K]} L_{\tau_k}(\bar{f}^{(k)})} \mathcal{W}(\boldsymbol{x})\, d\boldsymbol{x} \right| d\boldsymbol{\tau} \\
&\le \epsilon L^K \int_{[-1,1]^K} \prod_{k \in [K]} |\tau_k|\, d\boldsymbol{\tau} = \epsilon L^K.
\end{aligned}$$

Thus, we have the following:

**Lemma 3.8** *Let $\mathcal{X}, \mathcal{Y}: [0,1]^K \to \mathbb{R}$ be dikernels with $|\mathcal{X} - \mathcal{Y}|_\square \le \epsilon$ and $|\mathcal{X}^2 - \mathcal{Y}^2|_\square \le \epsilon$, where $\mathcal{X}^2(\boldsymbol{x}) = \mathcal{X}(\boldsymbol{x})^2$ and $\mathcal{Y}^2(\boldsymbol{x}) = \mathcal{Y}(\boldsymbol{x})^2$ for every $\boldsymbol{x} \in [0,1]^K$. Then, for any tensor $G \in \mathbb{R}^{R_1 \times \cdots \times R_K}$ and any set of vector-valued functions $F = \{f^{(k)}: [0,1] \to \mathbb{R}^{R_k}\}_{k \in [K]}$, we have*

$$\left| \mathcal{X} - [\![G; f^{(1)}, \ldots, f^{(K)}]\!] \right|_F^2 = \left| \mathcal{Y} - [\![G; f^{(1)}, \ldots, f^{(K)}]\!] \right|_F^2 \pm \epsilon \left( 1 + 2R|G|_{\max}|F|_{\max}^K \right),$$

*where $R = \prod_{k\in[K]} R_k$.*

*Proof* We have

$$\begin{aligned}
&\left| \mathcal{X} - [\![G; f^{(1)}, \ldots, f^{(K)}]\!] \right|_F^2 - \left| \mathcal{Y} - [\![G; f^{(1)}, \ldots, f^{(K)}]\!] \right|_F^2\\
&\quad = \int_{[0,1]^K}\left( \mathcal{X}(\boldsymbol{x}) - [\![G; f^{(1)}, \ldots, f^{(K)}]\!](\boldsymbol{x})\right)^2 d\boldsymbol{x} - \int_{[0,1]^K}\left( \mathcal{Y}(\boldsymbol{x}) - [\![G; f^{(1)}, \ldots, f^{(K)}]\!](\boldsymbol{x})\right)^2 d\boldsymbol{x}\\
&\quad = \int_{[0,1]^K}\left( \mathcal{X}(\boldsymbol{x})^2 - \mathcal{Y}(\boldsymbol{x})^2\right) d\boldsymbol{x} - 2\int_{[0,1]^K}\left( \mathcal{X}(\boldsymbol{x}) - \mathcal{Y}(\boldsymbol{x})\right)[\![G; f^{(1)}, \ldots, f^{(K)}]\!](\boldsymbol{x})\, d\boldsymbol{x}\\
&\quad \le |\mathcal{X}^2 - \mathcal{Y}^2|_\square + 2\sum_{r_1\in[R_1],\ldots,r_K\in[R_K]} |G_{r_1\cdots r_K}| \cdot \left|\left\langle \mathcal{X} - \mathcal{Y}, \bigotimes_{k\in[K]} f^{(k)}_{r_k} \right\rangle\right|\\
&\quad \le \epsilon + 2\epsilon R|G|_{\max}|F|_{\max}^K
\end{aligned}$$

by Lemma 3.7.

*Proof* (*of Lemma* 3.5) By Lemma 3.8, we have

$$\begin{aligned}
\left| \mathcal{Y} - [\![G_{\mathcal{Y}}; f^{(1)}_{\mathcal{Y}}, \ldots, f^{(K)}_{\mathcal{Y}}]\!] \right|_F^2 &\le \left| \mathcal{Y} - [\![G_{\mathcal{X}}; f^{(1)}_{\mathcal{X}}, \ldots, f^{(K)}_{\mathcal{X}}]\!] \right|_F^2 + \epsilon \\
&\le \left| \mathcal{X} - [\![G_{\mathcal{X}}; f^{(1)}_{\mathcal{X}}, \ldots, f^{(K)}_{\mathcal{X}}]\!] \right|_F^2 + \left( 2\epsilon + 2\epsilon R |G_{\mathcal{X}}|_{\max} |F_{\mathcal{X}}|_{\max}^K \right).
\end{aligned}$$

Similarly, we have

$$\begin{aligned}
\left| \mathcal{X} - [\![G_{\mathcal{X}}; f^{(1)}_{\mathcal{X}}, \ldots, f^{(K)}_{\mathcal{X}}]\!] \right|_F^2 &\le \left| \mathcal{X} - [\![G_{\mathcal{Y}}; f^{(1)}_{\mathcal{Y}}, \ldots, f^{(K)}_{\mathcal{Y}}]\!] \right|_F^2 + \epsilon \\
&\le \left| \mathcal{Y} - [\![G_{\mathcal{Y}}; f^{(1)}_{\mathcal{Y}}, \ldots, f^{(K)}_{\mathcal{Y}}]\!] \right|_F^2 + \left( 2\epsilon + 2\epsilon R |G_{\mathcal{Y}}|_{\max} |F_{\mathcal{Y}}|_{\max}^K \right).
\end{aligned}$$

Hence, the claim follows.

#### **References**



# **Chapter 4 Oracle-Based Primal-Dual Algorithms for Packing and Covering Semidefinite Programs**

**Khaled Elbassioni and Kazuhisa Makino**

**Abstract** *Packing and covering* semidefinite programs (SDPs) appear in natural relaxations of many combinatorial optimization problems as well as a number of other applications. Recently, several techniques have been proposed that utilize the particular structure of this class of problems in order to obtain more efficient algorithms than those offered by general SDP solvers. For certain applications, it may be necessary to deal with SDPs with a very large number of (e.g., exponentially or even infinitely many) constraints. In this chapter, we give an overview of some of the techniques that can be used to solve this class of problems, focusing on multiplicative weight updates and logarithmic-potential methods.

## **4.1 Packing and Covering Semidefinite Programs**

We denote by $\mathbb{S}^n$ the set of all $n \times n$ real symmetric matrices and by $\mathbb{S}^n_+ \subseteq \mathbb{S}^n$ the set of all $n \times n$ positive semidefinite (psd) matrices. We consider the following pairs of *packing-covering* semidefinite programs (SDPs):

$$\begin{aligned}
z_I^* = \max\quad & C \bullet X && \text{(Packing-I)}\\
\text{s.t.}\quad & A_i \bullet X \le b_i, \quad \forall i \in [m]\\
& X \in \mathbb{S}^n,\ X \succeq 0,
\end{aligned}
\qquad\qquad
\begin{aligned}
z_I^* = \min\quad & b^T y && \text{(Covering-I)}\\
\text{s.t.}\quad & \sum_{i=1}^m y_i A_i \succeq C\\
& y \in \mathbb{R}^m,\ y \ge 0,
\end{aligned}$$

K. Elbassioni (B)

Khalifa University of Science and Technology, P.O. Box 127788, Abu Dhabi, United Arab Emirates

e-mail: khaled.elbassioni@ku.ac.ae

K. Makino

Research Institute for Mathematical Sciences (RIMS), Kyoto University, Kyoto 606-8502, Japan e-mail: makino@kurims.kyoto-u.ac.jp

$$\begin{aligned}
z_{II}^* = \min\quad & C \bullet X && \text{(Covering-II)}\\
\text{s.t.}\quad & A_i \bullet X \ge b_i, \quad \forall i \in [m]\\
& X \in \mathbb{S}^n,\ X \succeq 0,
\end{aligned}
\qquad\qquad
\begin{aligned}
z_{II}^* = \max\quad & b^T y && \text{(Packing-II)}\\
\text{s.t.}\quad & \sum_{i=1}^m y_i A_i \preceq C\\
& y \in \mathbb{R}^m,\ y \ge 0,
\end{aligned}$$

where $C, A_1, \ldots, A_m \in \mathbb{S}^n_+$ are (non-zero) psd matrices, and $b = (b_1, \ldots, b_m)^T \in \mathbb{R}^m_+$ is a non-negative vector. In the above, $C \bullet X := \mathrm{Tr}(CX) = \sum_{i=1}^n \sum_{j=1}^n c_{ij} x_{ij}$, and "$\succeq$" is the *Löwner order* on matrices: $A \succeq B$ if and only if $A - B$ is psd. This type of SDP arises in many applications; see, for example, [14, 15] and the references therein.
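A quick numerical sanity check of these definitions (numpy; the helper names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
C = (lambda M: M @ M.T)(rng.standard_normal((n, n)))  # psd by construction
X = (lambda M: M @ M.T)(rng.standard_normal((n, n)))  # psd by construction

# C • X = Tr(CX) = sum_{ij} c_ij x_ij (C and X are symmetric).
assert np.isclose(np.trace(C @ X), (C * X).sum())
# The inner product of two psd matrices is non-negative.
assert np.trace(C @ X) >= 0

def loewner_geq(A, B, tol=1e-9):
    """Löwner order: A >= B iff A - B is psd (all eigenvalues >= 0)."""
    return np.linalg.eigvalsh(A - B).min() >= -tol

assert loewner_geq(C + X, C)  # adding a psd matrix preserves the order
```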

We assume the following throughout this chapter:

(A) $b_i > 0$, and hence (after scaling each constraint) $b_i = 1$, for all $i \in [m]$.

It is known that, under assumption (A), *strong duality* holds for problems (Packing-I) and (Covering-I) (resp., (Packing-II) and (Covering-II)). Let $\epsilon \in (0, 1]$ be a given constant. We say that $(X, y)$ is an $\epsilon$-*optimal* primal-dual solution for (Packing-I)-(Covering-I) if $(X, y)$ is a primal-dual feasible pair such that

$$C \bullet X \ge (1 - \epsilon) b^T y \ge (1 - \epsilon) z_I^*. \tag{4.1}$$

Similarly, we say that $(X, y)$ is an $\epsilon$-optimal primal-dual solution for (Packing-II)-(Covering-II) if $(X, y)$ is a primal-dual feasible pair such that

$$C \bullet X \le (1 + \epsilon) b^T y \le (1 + \epsilon) z_{II}^*. \tag{4.2}$$

In this chapter, we allow the number of constraints *<sup>m</sup>* in (Packing- I) (resp., (Covering- II)) to be *exponentially* (or even infinitely) large, so we assume the availability of the following *oracle*:

Max($Y$) (resp., Min($Y$)): Given $Y \in \mathbb{S}^n_+$, find $i \in \operatorname{argmax}_{i \in [m]} A_i \bullet Y$ (resp., $i \in \operatorname{argmin}_{i \in [m]} A_i \bullet Y$).

Note that an *approximation* oracle computing the above maximum (resp., minimum) within a factor of $(1 - \epsilon)$ (resp., $(1 + \epsilon)$) is also sufficient for our purposes. A primal-dual solution $(X, y)$ to (Covering-I) (resp., (Packing-II)) is said to be $\eta$-*sparse* if the size of $\mathrm{supp}(y) := \{i \in [m] : y_i > 0\}$ is at most $\eta$.

When $C = I = I_n$ (the identity matrix in $\mathbb{R}^{n \times n}$) and $b = \mathbf{1}_m$ (the all-ones vector in $\mathbb{R}^m$), we say that the packing-covering SDPs are in *normalized* form. It can be shown (see, e.g., [7, 16]) that, to within a multiplicative factor of $(1 + \epsilon)$ in the objective, any pair of packing-covering SDPs of the form (Packing-I)-(Covering-I) can be brought to normalized form in $O(n^3)$ time while increasing the oracle time by only $O(n^\omega)$, where $\omega$ is the exponent of matrix multiplication, under the following assumption:

(B-I) There exist $r$ matrices, say $A_1, \ldots, A_r$, such that $\hat{A} := \sum_{i=1}^r A_i \succ 0$. In particular, $\mathrm{Tr}(X) \le \tau := \frac{r}{\lambda_{\min}(\hat{A})}$ for any optimal solution $X$ of (Packing-I), and we may assume that $r = 1$ and $A_1 = \frac{1}{\tau} I$.

Similarly, it can be shown that, to within a multiplicative factor of $(1 + \epsilon)$ in the objective, any pair of packing-covering SDPs of the form (Packing-II)-(Covering-II) can be brought to normalized form in $O(n^3)$ time, while increasing the oracle time by only $O(n^\omega)$. Moreover, we may assume in this normalized form that

$$\text{(B-II)}\ \lambda\_{\text{min}}(A\_i) = \Omega\left(\frac{\epsilon}{n} \cdot \text{min}\_{i'} \lambda\_{\text{max}}(A\_{i'})\right) \text{ for all } i \in [m],$$

where, for a psd matrix $B \in \mathbb{S}^n_+$, we denote by $\{\lambda_j(B) : j = 1, \ldots, n\}$ the eigenvalues of $B$, and by $\lambda_{\min}(B)$ and $\lambda_{\max}(B)$ the minimum and maximum eigenvalues of $B$, respectively. Given additional $O(mn^2)$ time, we may also assume that

$$(\text{B-II'})\,\,\frac{\lambda\_{\text{max}}(A\_i)}{\lambda\_{\text{min}}(A\_i)} = O\left(\frac{n^2}{\epsilon^2}\right) \text{ for all } i \in [m].$$

Thus, the remainder of this chapter focuses on normalized problems.

#### **Mixed packing and covering SDPs**.

We also consider the following *mixed* packing-covering feasibility SDPs:

$$\begin{aligned} A_i &\bullet X \le b_i, \quad \forall i \in [m_p] & \quad &(\text{Mix-Pack-Cover})\\ B_i &\bullet X \ge d_i, \quad \forall i \in [m_c] \\ X &\in \mathbb{S}^n, \quad X \succeq 0, \end{aligned}$$

where $A_1, \ldots, A_{m_p}, B_1, \ldots, B_{m_c} \in \mathbb{R}^{n \times n}$ are psd matrices, and $b = (b_1, \ldots, b_{m_p})^T$ and $d = (d_1, \ldots, d_{m_c})^T$ are non-negative real vectors.

A matrix $X \in \mathbb{S}^n_+$ is an $\epsilon$-*approximate* solution for (Mix-Pack-Cover) if $A_i \bullet X \le b_i$ for all $i \in [m_p]$ and $B_i \bullet X \ge (1 - \epsilon) d_i$ for all $i \in [m_c]$.

#### **4.2 Applications**

#### *4.2.1 SDP Relaxation for Robust* MaxCut

Given a simple undirected graph $G = (V, E)$ on $n = |V|$ vertices with non-negative edge weights $w \in \mathbb{R}^E_+$, the objective in the well-known MaxCut problem is to find a subset of the vertices $X \subset V$ that maximizes the weight of the cut: $w(X, V \setminus X) := \sum_{u \in X,\, v \in V \setminus X} w_{uv}$. The best-known approximation algorithm (with approximation ratio $0.878\ldots$) [10] for MaxCut is based on the following SDP relaxation:

$$\max \ L(w) \bullet X \qquad\qquad \qquad (\text{MaxCut-SDP})$$

$$\begin{aligned} \text{s.t.} \quad &\mathbf{1}\_i \mathbf{1}\_i^T \bullet X = 1, \quad \forall i \in [n] \\ &X \in \mathbb{R}^{n \times n}, \ X \succeq 0. \end{aligned} \tag{4.3}$$

By simply changing the equality in (4.3) into an inequality, this can be written in the form (Packing-I), with $A_i := \mathbf{1}_i \mathbf{1}_i^T$ and $C := L(w) \succeq 0$ being the *Laplacian* matrix of $G$, defined as follows:

$$L\_{ij}(w) = \begin{cases} \sum\_{k=1}^{n} w\_{ik} & \text{if } i = j, \\ -w\_{ij} & \text{if } \{i, j\} \in E, \\ 0 & \text{otherwise.} \end{cases}$$
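The identity behind this relaxation is that, for a cut indicator vector $s \in \{-1,+1\}^n$, the cut weight equals $\frac{1}{4} s^T L(w) s$. A small numpy check (the example graph and helper are our own):

```python
import numpy as np

def laplacian(n, weighted_edges):
    """L(w) for an undirected graph given as {(i, j): w_ij}."""
    L = np.zeros((n, n))
    for (i, j), w in weighted_edges.items():
        L[i, i] += w   # diagonal: weighted degree
        L[j, j] += w
        L[i, j] -= w   # off-diagonal: -w_ij
        L[j, i] -= w
    return L

edges = {(0, 1): 2.0, (1, 2): 1.0, (0, 2): 3.0, (2, 3): 4.0}
L = laplacian(4, edges)
s = np.array([1, 1, -1, -1])  # the cut X = {0, 1}
cut = sum(w for (i, j), w in edges.items() if s[i] != s[j])
# Cut weight via the Laplacian: w(X, V \ X) = (1/4) s^T L(w) s.
assert np.isclose(s @ L @ s / 4, cut)
```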

Based on this relaxation, the following result is obtained using the *scalar multiplicative weights update (MWU)* method:

**Theorem 4.1** ([18]) *There is a randomized algorithm for finding an $\epsilon$-optimal solution for (*MaxCut-SDP*) in time $\tilde{O}\bigl(\frac{nm}{\epsilon^3}\bigr)$, where $n$ and $m$ respectively denote the number of vertices and edges in the given graph.*
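To illustrate the multiplicative weight update rule itself, the following is a generic MWU sketch on a toy zero-sum game, not the SDP algorithm of [18]: the row player maintains multiplicative weights while the column player best-responds, and the average payoff converges to the game value.

```python
import numpy as np

def mwu_game_value(A, T=20000, eta=0.05):
    """Estimate the value of max_p min_q p^T A q by running multiplicative
    weight updates for the row player against a best-responding column."""
    m, _ = A.shape
    w = np.ones(m)
    total = 0.0
    for _ in range(T):
        p = w / w.sum()
        q = int(np.argmin(p @ A))    # column player's best response
        total += float(p @ A[:, q])  # row player's payoff this round
        w *= np.exp(eta * A[:, q])   # reward rows that did well
    return total / T

# Matching pennies has value 0.5; MWU lands close to it.
value = mwu_game_value(np.array([[1.0, 0.0], [0.0, 1.0]]))
```

The standard regret bound for this update gives an average payoff within $O(\eta + \frac{\log m}{\eta T})$ of the game value; the SDP algorithms of this chapter apply the same update scheme with weights maintained over the $m$ constraints.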

Under the *robust optimization* framework, one assumes that the weights are *not known precisely*, but instead are given by a *convex uncertainty set* $\mathcal{W} \subseteq \mathbb{R}^E_+$, and it is necessary to find a (near-)optimal solution under the *worst-case* choice $w \in \mathcal{W}$ in the uncertainty set:

$$\begin{aligned} \max \min_{w \in \mathcal{W}}\ & L(w) \bullet X && \text{(Robust-MaxCut-SDP)}\\ \text{s.t.}\quad & \mathbf{1}_i \mathbf{1}_i^T \bullet X \le 1, \quad \forall i \in [n] \\ & X \in \mathbb{R}^{n \times n},\ X \succeq 0. \end{aligned} \tag{4.4}$$

By "guessing" the value τ of an optimal solution (via binary search), (4.4) can be reduced to

$$\begin{aligned} \min\quad & I \bullet X && \text{(Robust-MaxCut-SDP)}\\ \text{s.t.}\quad & \mathbf{1}_i \mathbf{1}_i^T \bullet X \ge 1, \quad \forall i \in [n] \\ & \tfrac{1}{\tau} L(w) \bullet X \ge 1, \quad \forall w \in \mathcal{W} \\ & X \in \mathbb{R}^{n \times n},\ X \succeq 0. \end{aligned}$$

Thus, we obtain a covering SDP (of type (Covering- II)) with an *infinite* number of constraints, given by a minimization oracle over the convex set *W*. We can use the *matrix logarithmic-potential* method to obtain the following result:

**Theorem 4.2** *There is a randomized algorithm that finds an $\epsilon$-optimal solution for (4.4) in time $\tilde{O}\Bigl(\frac{n^{\omega+1}}{\epsilon^{2.5}} + \frac{nT}{\epsilon^2}\Bigr)$, where $T$ is the time needed to optimize a linear function over $\mathcal{W}$.*

Note that for this reduction to remain valid, it is sufficient to find an $\epsilon$-optimal solution to (4.4) for any $\epsilon = o\bigl(\tfrac{1}{n}\bigr)$.

#### *4.2.2 Mahalanobis Distance Learning*

Given a psd matrix $X \in \mathbb{S}^n$, the $X$-*Mahalanobis distance* between two points $a, b \in \mathbb{R}^n$ is defined as

$$d_X(a,b) := \sqrt{(a-b)^T X (a-b)}.$$

The distance function $d_X(\cdot, \cdot)$ is a semi-metric; that is, it is symmetric ($d_X(a, b) = d_X(b, a)$) and satisfies the triangle inequality ($d_X(a, c) \le d_X(a, b) + d_X(b, c)$). It is moreover a metric if $X \succ 0$, as in this case $d_X(a, b) = 0$ if and only if $a = b$.
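A numerical check of the semi-metric properties (numpy; the helper names are ours):

```python
import numpy as np

def d_X(X, a, b):
    """X-Mahalanobis distance between points a and b."""
    diff = a - b
    return np.sqrt(diff @ X @ diff)

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
X = B @ B.T  # psd (almost surely positive definite)
a, b, c = rng.standard_normal((3, 4))

assert np.isclose(d_X(X, a, b), d_X(X, b, a))              # symmetry
assert d_X(X, a, c) <= d_X(X, a, b) + d_X(X, b, c) + 1e-9  # triangle inequality
```

The triangle inequality holds because $d_X(a, b) = \|B^T(a - b)\|_2$ whenever $X = BB^T$, i.e., $d_X$ is a (semi)norm of the difference vector.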

The *Mahalanobis distance learning* problem is defined as follows [28]: given sets $\mathcal{C}_s$ and $\mathcal{C}_d$ of *similar* and *dissimilar* pairs of points in $\mathbb{R}^n$, respectively, a similarity parameter $\sigma_s \in \mathbb{R}_+$, and a dissimilarity parameter $\sigma_d \in \mathbb{R}_+$, the objective is to find a matrix $X$ such that all pairs in $\mathcal{C}_s$ are "close" and all pairs in $\mathcal{C}_d$ are "far" with respect to the distance function $d_X(\cdot, \cdot)$:

$$(a-b)^T X (a-b) \le \sigma\_s, \ \forall (a,b) \in \mathcal{C}\_s \tag{4.5}$$

$$(a-b)^T X (a-b) \ge \sigma\_d, \ \forall (a,b) \in \mathcal{C}\_d \tag{4.6}$$

$$X \in \mathbb{S}^n, \ X \succeq 0. \tag{4.7}$$

Note that this can be written in the form (Mix-Pack-Cover), with $|\mathcal{C}\_s|$ packing constraints of the form $A\_{a,b} \bullet X \le \sigma\_s$, where $A\_{a,b} = (a-b)(a-b)^T$ for $(a,b) \in \mathcal{C}\_s$, and $|\mathcal{C}\_d|$ covering constraints of the form $B\_{a,b} \bullet X \ge \sigma\_d$, where $B\_{a,b} = (a-b)(a-b)^T$ for $(a,b) \in \mathcal{C}\_d$.

We can use the *scalar MWU* method to obtain the following result:

**Theorem 4.3** *There is a deterministic algorithm that finds an* $\epsilon$*-feasible solution for (4.5)–(4.7) in time* $\tilde{O}\left(\frac{m(m+n^3)}{\epsilon^2}\right)$*, where n is the dimension of the point sets and* $m := |\mathcal{C}\_s| + |\mathcal{C}\_d|$*.*

We remark that it is plausible that further improvements (possibly by another factor of *O*(*m*)) are possible via *rank-one tricks* and the use of *approximate eigenvalue* computations.

#### *4.2.3 Related Work*

Problems (Packing-I)-(Covering-I) and (Packing-II)-(Covering-II) can be solved using general SDP solvers, such as interior-point methods. For example, the barrier method (see, e.g., [22]) can compute a solution within an *additive* error of $\epsilon$ from the optimal in time $O\left(\sqrt{nm}(n^3 + mn^2 + m^2)\log\frac{1}{\epsilon}\right)$ (see also [1, 27]). However, due to the special nature of (Packing-I)-(Covering-I) and (Packing-II)-(Covering-II), better algorithms can be obtained. Most of the improvements are obtained by using *first-order methods* [2, 3, 5, 6, 8, 15–18, 21, 23, 24] or second-order methods [13, 14]. In general, we can classify these algorithms according to whether they are *(semi) width-independent*, are *parallel*, *output sparse solutions*, or are *oracle-based*, as follows.


Table 4.1 below gives a summary<sup>1</sup> of the most relevant results together with their classifications according to the four criteria above. We note that almost all of these algorithms for packing/covering SDPs are generalizations of similar algorithms for packing/covering linear programs (LPs), and most of them are essentially based on an *exponential potential function* in the form of *scalar exponentials*, such as [3, 18], or *matrix exponentials* [2, 5, 6, 15, 17]. For instance, several of these results use the scalar or matrix versions of the MWU method (see, e.g., [4]), which are extensions of similar methods for packing/covering LPs [9, 11, 25, 29].

In [12], a different type of algorithm was given for covering LPs (indeed, more generally, for a class of concave covering inequalities) based on a *logarithmic* potential function. In [7], it was shown that this approach could be extended to provide sparse solutions for both versions of packing and covering SDPs.

As we can see from the table, among all the algorithms, only the matrix (MWU and logarithmic-potential) algorithms are oracle-based (and hence produce sparse solutions) in the sense described above, although the dependence of the size of the support on the bit length *L* is not polynomial. However, the overall running time of the matrix MWU algorithm is larger by a factor of (roughly) $O(n^{3-\omega})$ than that of the logarithmic-potential algorithm, where $\omega$ is the exponent of matrix multiplication. Moreover, we cannot extend the matrix MWU algorithm to solve (Packing-I)-(Covering-I) (in particular, it seems tricky to bound the number of iterations).

<sup>1</sup> We provide rough estimates of the bounds, as some of them are not stated explicitly in the corresponding paper in terms of the parameters we consider here.

# **4.3 General Framework for Packing-Covering SDPs**

Given a pair of packing-covering SDPs (Packing- I)-(Covering- I) or (Covering-II)-(Packing- II), we consider the following general framework in which each constraint is assigned a weight reflecting how satisfied the constraint is given the current solution:


**Algorithm 1:** A general framework for solving packing-covering SDPs

We obtain different algorithms depending on how the weights are defined. We write $a\_i := A\_i \bullet X \ge 0$. Since $a\_{\max} := \max\{a\_1,\ldots,a\_m\}$ (resp., $a\_{\min} := \min\{a\_1,\ldots,a\_m\}$) is *not a smooth* function (in $X$), it is more convenient to work with a *smooth approximation* of it, which is provided by the weighted average formed in step 3 of the framework. There are several ways to do this, for example:

• *Exponential averaging*: The weights are $p\_i := \frac{(1+\epsilon)^{a\_i}}{\sum\_{i=1}^m (1+\epsilon)^{a\_i}}$ (resp., $p\_i := \frac{(1-\epsilon)^{a\_i}}{\sum\_{i=1}^m (1-\epsilon)^{a\_i}}$). The following claim justifies the use of these sets of weights.

**Lemma 4.1** *If* $a\_{\max} \ge \frac{1+\epsilon}{\epsilon}\log\_{1+\epsilon} m$ *(resp.,* $a\_{\min} \ge \frac{1}{\epsilon}\log\_{\frac{1}{1-\epsilon}} \frac{m\cdot a\_{\max}}{\epsilon\cdot a\_{\min}}$*), then*

$$\frac{a\_{\max}}{1+\epsilon} \le \sum\_{i=1}^m \overline{p}\_i a\_i \le a\_{\max} \quad \left(resp., \ a\_{\min} \le \sum\_{i=1}^m \overline{p}\_i a\_i \le (1+\epsilon)a\_{\min}\right).$$

• *Logarithmic potential averaging*: The weights are $p\_i = \frac{\epsilon}{m}\cdot\frac{\theta^\*}{\theta^\* - a\_i}$ (resp., $p\_i = \frac{\epsilon}{m}\cdot\frac{\theta^\*}{a\_i - \theta^\*}$), where $\theta^\*$ is the minimizer (resp., maximizer) of the potential function


$$\Phi(\theta) = \ln \left( \theta \cdot \sqrt[m/\epsilon]{\prod\_{i=1}^{m} \frac{1}{\theta - a\_i}} \right) \quad \left(\text{resp., } \Phi(\theta) = \ln \left( \theta \cdot \sqrt[m/\epsilon]{\prod\_{i=1}^{m} (a\_i - \theta)} \right)\right).$$

(It can be easily verified that $\sum\_i p\_i = 1$.) The following claim justifies the use of these sets of weights.

**Lemma 4.2**

$$\frac{(1-\epsilon)a\_{\max}}{1-\epsilon/m} \le \sum\_{i=1}^m \overline{p}\_i a\_i \le a\_{\max} \ \left(resp., a\_{\min} \le \sum\_{i=1}^m \overline{p}\_i a\_i \le \frac{a\_{\min}(1+\epsilon)}{1+\epsilon/m} \right).$$
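Both weighting schemes can be tried out numerically. The sketch below (with hypothetical values $a\_i$) computes the exponential average and the logarithmic-potential average, the latter by locating $\theta^\*$ as the root of $\frac{\epsilon}{m}\sum\_i \frac{\theta}{\theta - a\_i} = 1$ via bisection just above $a\_{\max}$; both averages land in the windows promised by Lemmas 4.1 and 4.2:

```python
def exp_avg(a, eps):
    """Exponential averaging: weights proportional to (1+eps)^(a_i) smooth out max(a)."""
    w = [(1.0 + eps) ** ai for ai in a]
    s = sum(w)
    return sum(wi / s * ai for wi, ai in zip(w, a))

def log_potential_avg(a, eps, tol=1e-12):
    """Logarithmic-potential averaging: p_i = (eps/m)*theta/(theta - a_i), where
    theta (the minimizer theta* of Phi) solves (eps/m)*sum_i theta/(theta - a_i) = 1."""
    m = len(a)
    def g(theta):                      # total weight at a candidate theta
        return (eps / m) * sum(theta / (theta - ai) for ai in a)
    lo, hi = max(a) * (1 + 1e-15), 2 * max(a) + 1.0
    while g(hi) > 1.0:                 # g decreases from +inf (at max(a)) down to eps
        hi *= 2
    while hi - lo > tol * hi:          # bisection for g(theta) = 1
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid) > 1.0 else (lo, mid)
    theta = 0.5 * (lo + hi)
    p = [(eps / m) * theta / (theta - ai) for ai in a]
    return sum(pi * ai for pi, ai in zip(p, a))

a, eps = [3.0, 7.0, 50.0, 12.0], 0.5   # hypothetical constraint values a_i
print(max(a) / (1 + eps) <= exp_avg(a, eps) <= max(a))
print((1 - eps) * max(a) / (1 - eps / len(a)) <= log_potential_avg(a, eps) <= max(a))
```

The same bisection idea reappears later when $\theta(t)$ is computed inside the matrix algorithms.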

#### **4.4 Scalar Algorithms**

## *4.4.1 Scalar MWU Algorithm for (***PACKING-I***)-(***COVERING-I***)*

Given a normalized pair of packing-covering SDPs of type I (Packing-I)-(Covering-I) and a feasible primal solution $X$, we use the exponential weights $p\_i := (1+\epsilon)^{A\_i \bullet X}$, for $i \in [m]$. Averaging the inequalities with respect to the normalized weights $\overline{p}\_i := \frac{p\_i}{\sum\_i p\_i}$, we arrive at the following problem:

$$\begin{aligned} \max & \quad I \bullet X\\ \text{s.t.} & \quad \sum\_{i} \overline{p}\_{i} A\_{i} \bullet X \le 1 \\ & \quad X \in \mathbb{R}^{n \times n}, \ X \succeq 0. \end{aligned} \tag{4.8}$$

Letting $\overline{A} := \sum\_i \overline{p}\_i A\_i$ and writing $X = \sum\_{v \in B\_n} \lambda\_v vv^T$, where $B\_n := \{v \in \mathbb{R}^n : \|v\| = 1\}$ and $\lambda\_v \ge 0$ for all $v \in B\_n$, we obtain the following (*infinite-dimensional*) *knapsack* problem

$$\begin{aligned} \max & \quad \sum\_{v \in B\_n} \lambda\_v \\ \text{s.t.} & \quad \sum\_{v \in B\_n} \lambda\_v \overline{A} \bullet vv^T \le 1 \\ & \quad \lambda\_v \ge 0, \quad \forall v \in B\_n. \end{aligned} \tag{4.9}$$

An optimal solution is attained at a vector $v \in B\_n$ that *minimizes* $v^T \overline{A} v$; that is, at a unit eigenvector corresponding to $\lambda\_{\min}(\overline{A})$.

Thus, using this set of weights in our general framework (Algorithm 1) yields the following procedure (for a vector $p \in \mathbb{R}^m$, we write $\overline{p}\_i := \frac{p\_i}{\sum\_i p\_i}$):

**1** $t \leftarrow 0$; $X(0) \leftarrow 0$; $y(0) \leftarrow 0$; $M(0) \leftarrow 0$; $T \leftarrow \epsilon^{-2}\ln m$
**2** **while** $M(t) < T$ **do**
**3** $\quad p\_i(t) = (1+\epsilon)^{A\_i \bullet X(t)}$ /\* Update the weights \*/
**4** $\quad v(t) = \mathrm{argmin}\_{v:\|v\|=1} \sum\_i \overline{p}\_i(t) A\_i \bullet vv^T$ /\* Find an eigenvector corresponding to the smallest eigenvalue of the average inequality matrix \*/
**5** $\quad \delta(t) = 1/\max\_i A\_i \bullet v(t)v(t)^T$ /\* Define the update step size \*/
**6** $\quad X(t+1) = X(t) + \delta(t)v(t)v(t)^T$; $y(t+1) \leftarrow y(t) + \delta(t)\overline{p}(t)$ /\* Update the primal-dual solutions \*/
**7** $\quad M(t+1) = \max\_i A\_i \bullet X(t+1)$ /\* Compute the largest LHS \*/
**8** $\quad t \leftarrow t+1$
**9** **end**
**10** **output** $(\hat{X}, \hat{y}) = \left(\frac{X(t)}{M(t)}, \frac{y(t)}{(1-1.5\epsilon)M(t)}\right)$

**Algorithm 2:** Scalar MWU algorithm for (Packing- I)-(Covering- I)

The stopping criterion is that the left-hand side (LHS) of at least one inequality in (Packing-I) reaches some threshold $T := \epsilon^{-2}\ln m$ with respect to the current solution $X(t)$. The step size (step 5) is chosen such that in each iteration of the while-loop, this left-hand side increases by at least 1, thus guaranteeing termination in $mT$ iterations.

**Theorem 4.4** *Given a real* $\epsilon \in (0, 1]$*, Algorithm 2 outputs an* $\epsilon$*-optimal solution for (*Packing-I*)-(*Covering-I*) in* $O(m \log m/\epsilon^2)$ *iterations, where each iteration requires an oracle call that computes an eigenvector corresponding to the minimum eigenvalue of a psd matrix.*
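As a concrete illustration of the loop structure of Algorithm 2, the following Python sketch runs it on a hypothetical toy instance in which the constraint matrices $A\_i$ are *diagonal* (given as lists of diagonal entries), so that the minimum-eigenvalue oracle reduces to picking the coordinate minimizing the averaged constraint; the dual iterate is omitted and only the scaled feasible primal is returned:

```python
import math

def scalar_mwu_packing(A, eps):
    """Sketch of the scalar MWU loop for max I.X s.t. A_i.X <= 1 with diagonal A_i.
    Each A_i is a list of diagonal entries; X = diag(x)."""
    m, n = len(A), len(A[0])
    x = [0.0] * n
    T = eps ** -2 * math.log(m)        # stopping threshold
    M = 0.0
    while M < T:
        # exponential weights p_i = (1+eps)^(A_i . X), then normalize
        p = [(1 + eps) ** sum(Ai[k] * x[k] for k in range(n)) for Ai in A]
        s = sum(p)
        pbar = [pi / s for pi in p]
        # oracle: coordinate (basis eigenvector) minimizing the averaged constraint
        k = min(range(n), key=lambda k: sum(pbar[i] * A[i][k] for i in range(m)))
        delta = 1.0 / max(A[i][k] for i in range(m))   # step size
        x[k] += delta
        M = max(sum(Ai[k] * x[k] for k in range(n)) for Ai in A)
    return [xk / M for xk in x]        # scaled primal: A_i . X <= 1 for all i

# Toy instance: two diagonal packing constraints in dimension 2.
A = [[1.0, 2.0], [2.0, 1.0]]
x = scalar_mwu_packing(A, eps=0.3)
print(all(sum(Ai[k] * x[k] for k in range(2)) <= 1 + 1e-9 for Ai in A))
```

With diagonal matrices the procedure coincides with the scalar MWU algorithm for packing LPs; the general SDP case replaces the coordinate choice by an approximate minimum-eigenvector computation.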

For a given matrix $M \in \mathbb{R}^{n \times n}$, computing $\lambda\_{\min}(M)$ (almost) exactly requires $O(n^3)$ time via a full eigenvalue decomposition of the matrix. If $M$ is psd, a faster approximation of $\lambda\_{\min}(M)$ can be obtained (using Lanczos' algorithm with a random start) via the following result.

**Theorem 4.5** ([19]) *Let* $M \in \mathbb{S}^n\_+$ *be a psd matrix with N non-zeros and let* $\gamma \in (0, 1)$ *be a given constant. Then, there is a randomized algorithm that computes, with high (i.e.,* $1 - o(1)$*) probability, a unit vector* $v \in \mathbb{R}^n$ *such that* $v^T M v \ge (1-\gamma)\lambda\_{\max}(M)$*. The algorithm takes* $O\left(\frac{\log n}{\sqrt{\gamma}}\right)$ *iterations, each requiring* $O(N)$ *arithmetic operations.*
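The guarantee of Theorem 4.5 rests on Lanczos-type iterations; plain power iteration with a random start (a simpler but slower variant, needing roughly $O(\log n/\gamma)$ rather than $O(\log n/\sqrt{\gamma})$ iterations) already illustrates the idea on a toy symmetric matrix:

```python
import random

def approx_top_eigvec(M, iters=200, seed=7):
    """Power iteration with a random start: returns a unit vector v whose
    Rayleigh quotient v^T M v approaches lambda_max(M) for symmetric psd M."""
    rng = random.Random(seed)
    n = len(M)
    v = [rng.gauss(0, 1) for _ in range(n)]
    for _ in range(iters):
        w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]  # w = M v
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

M = [[3.0, 1.0], [1.0, 3.0]]          # toy psd matrix with eigenvalues 4 and 2
v = approx_top_eigvec(M)
rayleigh = sum(v[i] * M[i][j] * v[j] for i in range(2) for j in range(2))
print(round(rayleigh, 6))
```

Each iteration costs one matrix-vector product, i.e., $O(N)$ arithmetic operations for a matrix with $N$ non-zeros, matching the per-iteration cost in the theorem.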

By applying this theorem to $(\overline{A})^{-1}$, we can approximate $\lambda\_{\min}(\overline{A})$ in $\tilde{O}(n^\omega)$ time.

## *4.4.2 Scalar Logarithmic Potential Algorithm For (***PACKING-I***)–(***COVERING-I***)*

Given a normalized pair of packing-covering SDPs of type I (Packing-I)-(Covering-I) and a feasible primal solution $X$, we use the logarithmic-potential weights $p\_i = \frac{\epsilon}{m}\cdot\frac{\theta^\*}{\theta^\* - A\_i \bullet X}$ for $i \in [m]$. Averaging the inequalities with respect to this set of weights, we arrive at the knapsack problem (4.9). This gives rise to the following procedure:

**1** $s \leftarrow 0$; $\epsilon\_0 \leftarrow \frac{1}{2}$; $t \leftarrow 0$; $\nu(0) \leftarrow 1$; $X(0) \leftarrow \mathbf{1}\_i\mathbf{1}\_i^T$ (for an arbitrary $i \in [n]$)
**2** **while** $\epsilon\_s > \epsilon$ **do**
**3** $\quad \delta\_s \leftarrow \frac{\epsilon\_s^3}{32m}$
**4** $\quad$ **while** $\nu(t) > \epsilon\_s$ **do**
**5** $\qquad a\_i(t) := A\_i \bullet X(t)$, for $i \in [m]$; $\theta(t) \leftarrow \lceil\theta^\*(t)\rceil\_{\delta\_s}$, where $\theta^\*(t)$ is the smallest positive root of the equation $\frac{\epsilon\_s\theta}{m}\sum\_{i=1}^m \frac{1}{\theta - a\_i(t)} = 1$
$\qquad \overline{p}\_i(t) := \frac{\epsilon\_s\theta(t)}{m}\cdot\frac{1}{\theta(t) - a\_i(t)}$, for $i \in [m]$ /\* Set the weights \*/
$\qquad v(t) \leftarrow \mathrm{argmin}\_{v:\|v\|=1} \overline{A}(t) \bullet vv^T$, where $\overline{A}(t) := \sum\_i \overline{p}\_i(t)A\_i$ /\* Call the eigenvector oracle \*/
**6** $\qquad \nu(t+1) \leftarrow \frac{\overline{A}(t)\bullet X(t) - \overline{A}(t)\bullet v(t)v(t)^T}{\overline{A}(t)\bullet X(t) + \overline{A}(t)\bullet v(t)v(t)^T}$ /\* Compute the error \*/
**7** $\qquad \tau(t+1) \leftarrow \frac{\epsilon\_s\theta(t)\nu(t+1)}{4m(\overline{A}(t)\bullet X(t) + \overline{A}(t)\bullet v(t)v(t)^T)}$ /\* Compute the step size \*/
$\qquad X(t+1) \leftarrow (1-\tau(t+1))X(t) + \tau(t+1)v(t)v(t)^T$ /\* Update the primal solution \*/
$\qquad t \leftarrow t+1$
**8** $\quad$ **end**
**9** $\quad \epsilon\_{s+1} \leftarrow \frac{\epsilon\_s}{2}$
**10** $\quad s \leftarrow s+1$
**11** **end**
**12** **output** the appropriately scaled primal-dual pair $(\hat{X}, \hat{y})$

**Algorithm 3:** Scalar logarithmic-potential algorithm for (Packing- I)- (Covering- I)

In the above, for given numbers $x \in \mathbb{R}\_+$ and $\delta \in (0, 1)$, we define the $\delta$-(upper) approximation $\lceil x\rceil\_\delta$ of $x$ to be a number satisfying $x \le \lceil x\rceil\_\delta < (1+\delta)x$.

**Theorem 4.6** *Given* $\epsilon \in (0, 1]$*, Algorithm 3 outputs an* $\epsilon$*-optimal solution for (*Covering-I*)-(*Packing-I*) in* $O(m \log\psi + m/\epsilon^2)$ *iterations, where* $\psi := \frac{\lambda\_{\max}(\overline{A}(0))}{\lambda\_{\min}(\overline{A}(0))}$ *and each iteration requires an oracle call that computes an eigenvector corresponding to the minimum eigenvalue of a psd matrix.*

#### **4.5 Matrix Algorithms**

## *4.5.1 Matrix MWU Algorithm For (***COVERING-II***)-(***PACKING-II***)*

Let $F(y) := \sum\_{i=1}^m y\_i A\_i$. Then, we can rewrite the normalized version of (Packing-II) as follows:

$$\begin{aligned} z\_{II}^\* &= \max \mathbf{1}^T \mathbf{y} & \quad \text{(Packing-II)}\\ \text{s.t.} \quad \lambda\_j(F(\mathbf{y})) &\le 1, \quad \forall j \in [n] \\ \mathbf{y} &\in \mathbb{R}^m, \ \mathbf{y} \ge \mathbf{0}. \end{aligned}$$

Averaging the inequalities with respect to the weights $\overline{p}\_j := \frac{p\_j}{\sum\_j p\_j}$, where $p\_j := (1+\epsilon)^{\lambda\_j(F(\mathbf{y}))}$, we get

$$\begin{aligned} \max & \quad \mathbf{1}^T \mathbf{y} \\ \text{s.t.} & \quad \sum\_{j} \overline{p}\_j \lambda\_j(F(\mathbf{y})) \le 1 \\ & \quad \mathbf{y} \in \mathbb{R}^m, \ \mathbf{y} \ge \mathbf{0}. \end{aligned}$$

Using the *eigenvalue decomposition* $F(\mathbf{y}) = U\Lambda U^T$, where $\Lambda$ is the diagonal matrix containing the eigenvalues of $F(\mathbf{y})$ and $UU^T = I$, and letting

$$\overline{P} := U \begin{bmatrix} \overline{p}\_1 & 0 & \cdots & 0 \\ 0 & \overline{p}\_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \overline{p}\_n \end{bmatrix} U^T = \frac{(1+\epsilon)^{F(\mathbf{y})}}{\mathrm{Tr}\big((1+\epsilon)^{F(\mathbf{y})}\big)},$$
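The weight matrix $\overline{P}$ above is a normalized matrix exponential. The following sketch computes $(1+\epsilon)^{F} = \exp(F\ln(1+\epsilon))$ for a hypothetical small symmetric $F$ via the Taylor series of the matrix exponential (practical implementations would instead use Theorem 4.8-style approximations), and checks that $\overline{P}$ has unit trace:

```python
import math

def mat_pow_base(F, base, terms=40):
    """(base)^F = exp(F * ln(base)) for a small symmetric matrix F, computed
    with the Taylor series of the matrix exponential."""
    n = len(F)
    c = math.log(base)
    A = [[F[i][j] * c for j in range(n)] for i in range(n)]
    out = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # identity
    term = [row[:] for row in out]
    for k in range(1, terms):
        # term = A^k / k!
        term = [[sum(term[i][l] * A[l][j] for l in range(n)) / k for j in range(n)]
                for i in range(n)]
        out = [[out[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return out

F = [[1.0, 0.5], [0.5, 1.0]]          # toy F(y) with eigenvalues 1.5 and 0.5
eps = 0.5
E = mat_pow_base(F, 1 + eps)
tr = E[0][0] + E[1][1]
P = [[E[i][j] / tr for j in range(2)] for i in range(2)]
print(round(P[0][0] + P[1][1], 9))    # the weight matrix has unit trace
```

Since $F$ here has eigenvalues $1.5$ and $0.5$, the diagonal entries of $E$ agree with the average of $1.5^{1.5}$ and $1.5^{0.5}$, as the eigendecomposition formula predicts.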

we obtain the following knapsack problem:

$$\begin{aligned} \max & \quad \mathbf{1}^T \mathbf{y} \\ \text{s.t.} & \quad \sum\_{i} (\overline{P} \bullet A\_i) \mathbf{y}\_i \le 1 \\ & \quad \mathbf{y} \in \mathbb{R}^m, \ \mathbf{y} \ge \mathbf{0}. \end{aligned}$$

An optimal solution is attained at the basis vector $\mathbf{y} = \mathbf{1}\_i \in \mathbb{R}^m\_+$ that minimizes $\overline{P} \bullet A\_i$. This gives rise to the following matrix MWU algorithm:

**1** $t \leftarrow 0$; $y(0) \leftarrow 0$; $X(0) \leftarrow 0$; $M(0) \leftarrow 0$; $T \leftarrow \epsilon^{-2}\ln n$
**2** **while** $M(t) < T$ **do**
**3** $\quad P(t) = (1+\epsilon)^{\sum\_{i=1}^m y\_i(t)A\_i}$ /\* Update the weight matrix by exponentiation \*/
**4** $\quad i(t) \leftarrow \mathrm{argmin}\_i A\_i \bullet P(t)$; $\delta(t) \leftarrow 1/\lambda\_{\max}(A\_{i(t)})$ /\* Define the update step size \*/
$\quad X(t+1) \leftarrow X(t) + \delta(t)\frac{P(t)}{I \bullet P(t)}$; $y(t+1) \leftarrow y(t) + \delta(t)\mathbf{1}\_{i(t)}$ /\* Update the primal-dual solution \*/
**5** $\quad M(t+1) \leftarrow \lambda\_{\max}\big(\sum\_i y\_i(t+1)A\_i\big)$ /\* Compute the largest eigenvalue of the LHS of the dual \*/
**6** $\quad t \leftarrow t+1$
**7** **end**
**8** $L(t) \leftarrow \min\_i A\_i \bullet X(t)$
**9** **output** $(\hat{X}, \hat{y}) = \left(\frac{X(t)}{L(t)}, \frac{y(t)}{M(t)}\right)$

**Algorithm 4:** Matrix MWU algorithm for (Packing-II)-(Covering-II)

**Theorem 4.7** *Given a real* $\epsilon \in (0, 1]$*, Algorithm 4 outputs an* $\epsilon$*-optimal solution for (*Covering-II*)-(*Packing-II*) in* $O(n \log n/\epsilon^2)$ *iterations, where each iteration requires a matrix exponential computation, two oracle calls that compute the maximum eigenvalue of a psd matrix, and a single oracle call to the minimization in step 4.*

The most demanding step in the above algorithm is the matrix exponential computation, which can be done in *O*(*n*<sup>3</sup>) time via a complete eigenvalue decomposition. A more efficient approximation, particularly when the matrices *Ai* are sparse, can be obtained via the following result.

**Theorem 4.8** ([26]) *There is an algorithm for approximating the matrix exponential* $e^F$ *in time* $O(n^2 r \log^3\frac{1}{\epsilon})$*, where r denotes the number of non-zeros in* $F \in \mathbb{S}^n$*, and* $\epsilon$ *is the approximation accuracy.*

We remark that a matrix MWU algorithm and a theorem similar to Algorithm 4 and Theorem 4.7 for (Packing- I)-(Covering- I) have not yet been discovered and are left as open problems.

## *4.5.2 Matrix Logarithmic Potential Algorithm For (***PACKING-I***)-(***COVERING-I***)*

Let $F(y) := \sum\_{i=1}^m y\_i A\_i$. Then, we can rewrite the normalized version of (Covering-I) as

$$\begin{aligned} z\_I^\* &= \min \mathbf{1}^T \mathbf{y} & \quad \text{(Covering-I)}\\ \text{s.t.} \quad \lambda\_j(F(\mathbf{y})) &\ge 1, \quad \forall j \in [n] \\ \mathbf{y} &\in \mathbb{R}^m, \ \mathbf{y} \ge \mathbf{0}. \end{aligned}$$

Averaging the inequalities with respect to the weights $\overline{p}\_j := \frac{\epsilon}{n}\cdot\frac{\theta^\*}{\lambda\_j(F(\mathbf{y})) - \theta^\*}$, we get

$$\begin{aligned} \min & \quad \mathbf{1}^T \mathbf{y} \\ \text{s.t.} & \quad \sum\_{j} \overline{p}\_j \lambda\_j(F(\mathbf{y})) \ge 1 \\ & \quad \mathbf{y} \in \mathbb{R}^m, \ \mathbf{y} \ge \mathbf{0}. \end{aligned}$$

Using the *eigenvalue decomposition* $F(\mathbf{y}) = U\Lambda U^T$, where $\Lambda$ is the diagonal matrix containing the eigenvalues of $F(\mathbf{y})$ and $UU^T = I$, and letting

$$\overline{P} := U \begin{bmatrix} \overline{p}\_1 & 0 & \cdots & 0 \\ 0 & \overline{p}\_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \overline{p}\_n \end{bmatrix} U^T = \frac{\epsilon \theta^\*}{n} (F(\mathbf{y}) - \theta^\* I)^{-1},$$

we obtain the following knapsack problem:

$$\begin{aligned} \min & \quad \mathbf{1}^T \mathbf{y} \\ \text{s.t.} & \quad \sum\_{i} (\overline{P} \bullet A\_i) \mathbf{y}\_i \ge 1 \\ & \quad \mathbf{y} \in \mathbb{R}^m, \ \mathbf{y} \ge \mathbf{0}. \end{aligned}$$

An optimal solution is attained at the basis vector $\mathbf{y} = \mathbf{1}\_i \in \mathbb{R}^m\_+$ that maximizes $\overline{P} \bullet A\_i$. This gives rise to the following matrix logarithmic-potential algorithm:

**1** $s \leftarrow 0$; $\epsilon\_0 \leftarrow \frac{1}{2}$; $t \leftarrow 0$; $\nu(0) \leftarrow 1$; $y(0) \leftarrow \frac{1}{r}\sum\_{i=1}^r \mathbf{1}\_i$
**2** **while** $\epsilon\_s > \epsilon$ **do**
**3** $\quad \delta\_s \leftarrow \frac{\epsilon\_s^3}{32n}$
**4** $\quad$ **while** $\nu(t) > \epsilon\_s$ **do**
**5** $\qquad \theta(t) \leftarrow \lceil\theta^\*(t)\rceil\_{\delta\_s}$, where $\theta^\*(t)$ is the smallest positive root of the equation $\frac{\epsilon\_s\theta}{n}\mathrm{Tr}\big((F(y(t)) - \theta I)^{-1}\big) = 1$
$\qquad X(t) \leftarrow \frac{\epsilon\_s\theta(t)}{n}(F(y(t)) - \theta(t)I)^{-1}$ /\* Set the primal solution \*/
$\qquad i(t) \leftarrow \mathrm{argmax}\_i A\_i \bullet X(t)$ /\* Call the maximization oracle \*/
**6** $\qquad \nu(t+1) \leftarrow \frac{X(t)\bullet A\_{i(t)} - X(t)\bullet F(y(t))}{X(t)\bullet A\_{i(t)} + X(t)\bullet F(y(t))}$ /\* Compute the error \*/
**7** $\qquad \tau(t+1) \leftarrow \frac{\epsilon\_s\theta(t)\nu(t+1)}{4n(X(t)\bullet A\_{i(t)} + X(t)\bullet F(y(t)))}$ /\* Compute the step size \*/
$\qquad y(t+1) \leftarrow (1-\tau(t+1))y(t) + \tau(t+1)\mathbf{1}\_{i(t)}$ /\* Update the dual solution \*/
$\qquad t \leftarrow t+1$
**8** $\quad$ **end**
**9** $\quad \epsilon\_{s+1} \leftarrow \frac{\epsilon\_s}{2}$
**10** $\quad s \leftarrow s+1$
**11** **end**
**12** **output** $(\hat{X}, \hat{y}) = \left(\frac{(1-\epsilon\_{s-1})X(t-1)}{(1+\epsilon\_{s-1})^2\theta(t-1)}, \frac{y(t-1)}{\theta(t-1)}\right)$

**Algorithm 5:** Matrix logarithmic-potential algorithm for (Packing- I)- (Covering- I)

The most demanding steps are the computation of $\theta(t)$ and $X(t)$ in step 5. Computing $\theta(t)$ can be done via binary search over a region determined by repeated matrix multiplications and approximate minimum-eigenvalue computation (cf. Theorem 4.5). Once $\theta(t)$ is determined, computing $X(t)$ requires a single matrix inversion. The overall running time per iteration is $\tilde{O}(n^\omega)$ plus the time needed by the maximization oracle in step 5.
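The binary search for $\theta(t)$ can be sketched as follows. Assuming for illustration that $F(y(t))$ has already been diagonalized with eigenvalues `lams` (in practice one works with the matrix directly), the trace equation of step 5 becomes $\frac{\epsilon\_s\theta}{n}\sum\_j \frac{1}{\lambda\_j - \theta} = 1$, whose smallest positive root lies in $(0, \lambda\_{\min})$ and can be bracketed by bisection:

```python
def find_theta(lams, eps_s, tol=1e-12):
    """Smallest positive root of (eps_s*theta/n) * sum_j 1/(lam_j - theta) = 1,
    searched on (0, min(lams)) by bisection; lams are the eigenvalues of F(y(t))."""
    n = len(lams)
    lmin = min(lams)

    def g(theta):
        return (eps_s * theta / n) * sum(1.0 / (l - theta) for l in lams)

    lo, hi = 0.0, lmin * (1 - 1e-15)   # g(0) = 0, and g -> +inf as theta -> lmin-
    while hi - lo > tol * lmin:        # g is increasing on (0, lmin)
        mid = 0.5 * (lo + hi)
        if g(mid) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

lams = [1.0, 2.0, 4.0]                 # hypothetical spectrum of F(y(t))
theta = find_theta(lams, eps_s=0.5)
ok = abs((0.5 * theta / 3) * sum(1.0 / (l - theta) for l in lams) - 1.0) < 1e-6
print(0.0 < theta < 1.0 and ok)
```

Each bisection step only needs the trace of $(F(y(t)) - \theta I)^{-1}$, which is what the repeated matrix computations mentioned above provide without a full eigendecomposition.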

**Theorem 4.9** *Given* $\epsilon \in (0, 1]$*, Algorithm 5 outputs an* $\epsilon$*-optimal solution for (*Covering-I*)-(*Packing-I*) in* $O\left(n\log\psi + \frac{n}{\epsilon^2}\right)$ *iterations, where* $\psi := \frac{r\cdot\max\_i \lambda\_{\max}(A\_i)}{\lambda\_{\min}(\hat{A})}$ *and each iteration requires* $O\left(\log\frac{n}{\epsilon}\right)$ *matrix multiplications and a single oracle call to the maximization in step 5.*

## *4.5.3 Matrix Logarithmic Potential Algorithm For (***PACKING-II***)-(***COVERING-II***)*

A symmetric version of Algorithm <sup>5</sup> for (Packing- II)-(Covering- II) can be given as follows:

**1** $s \leftarrow 0$; $\epsilon\_0 \leftarrow \frac{1}{4}$; $t \leftarrow 0$; $\nu(0) \leftarrow 1$; $y(0) \leftarrow \mathbf{1}\_i$ (for an arbitrary $i \in [m]$)
**2** **while** $\epsilon\_s > \epsilon$ **do**
**3** $\quad \delta\_s \leftarrow \frac{\epsilon\_s^3}{32n}$
$\quad$ **while** $\nu(t) > \epsilon\_s$ **do**
**4** $\qquad \theta(t) \leftarrow \lceil\theta^\*(t)\rceil\_{\delta\_s}$, where $\theta^\*(t)$ is the smallest positive root of the equation $\frac{\epsilon\_s\theta}{n}\mathrm{Tr}\big((\theta I - F(y(t)))^{-1}\big) = 1$
$\qquad X(t) \leftarrow \frac{\epsilon\_s\theta(t)}{n}(\theta(t)I - F(y(t)))^{-1}$ /\* Set the primal solution \*/
$\qquad i(t) \leftarrow \mathrm{argmin}\_i A\_i \bullet X(t)$ /\* Call the minimization oracle \*/
$\qquad \nu(t+1) \leftarrow \frac{X(t)\bullet F(y(t)) - X(t)\bullet A\_{i(t)}}{X(t)\bullet A\_{i(t)} + X(t)\bullet F(y(t))}$ /\* Compute the error \*/
$\qquad \tau(t+1) \leftarrow \frac{\epsilon\_s\theta(t)\nu(t+1)}{4n(X(t)\bullet A\_{i(t)} + X(t)\bullet F(y(t)))}$ /\* Compute the step size \*/
$\qquad y(t+1) \leftarrow (1-\tau(t+1))y(t) + \tau(t+1)\mathbf{1}\_{i(t)}$ /\* Update the dual solution \*/
$\qquad t \leftarrow t+1$
**5** $\quad$ **end**
**6** $\quad \epsilon\_{s+1} \leftarrow \frac{\epsilon\_s}{2}$
**7** $\quad s \leftarrow s+1$
**8** **end**
**9** **output** $(\hat{X}, \hat{y}) = \left(\frac{(1+\epsilon\_{s-1})X(t-1)}{(1-2\epsilon\_{s-1})^2\theta(t-1)}, \frac{y(t-1)}{\theta(t-1)}\right)$

**Algorithm 6:** Matrix logarithmic-potential algorithm for (Packing-II)-(Covering-II)

**Theorem 4.10** *Given* $\epsilon \in (0, 1]$*, Algorithm 6 outputs an* $\epsilon$*-optimal solution for (*Packing-II*)-(*Covering-II*) in* $O\left(n\log\psi + \frac{n}{\epsilon^2}\right)$ *iterations, where* $\psi := O\left(\log\frac{n}{\epsilon}\right)$*, and each iteration requires* $O\left(\log\frac{n}{\epsilon}\right)$ *matrix inversions and a single oracle call to the minimization in step 4.*

**Acknowledgements** We thank Waleed Najy for many helpful discussions on this topic. This work was partially supported by JST CREST JPMJCR1402 and Grants-in-Aid for Scientific Research. The research of the first author was partially supported by Abu Dhabi Education & Knowledge − Abu Dhabi Award for Research Excellence (AARE18-152).

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Chapter 5 Almost Linear Time Algorithms for Some Problems on Dynamic Flow Networks**

**Yuya Higashikawa, Naoki Katoh, and Junichi Teruyama**

**Abstract** Motivated by evacuation planning, several problems regarding *dynamic flow networks* have been studied in recent years. A dynamic flow network consists of an undirected graph with positive edge lengths, positive edge capacities, and positive vertex weights. The road network in an area can be treated as a graph where the edge lengths are the distances along the roads and the vertex weights are the number of people at each site. An edge capacity limits the number of people that can enter the edge per unit time. In a dynamic flow network, when particular points on edges or vertices called *sinks* are given, all of the people are required to evacuate from the vertices to the sinks as quickly as possible. This chapter gives an overview of two of our recent results on the problem of locating multiple sinks in a dynamic flow path network such that the max/sum of evacuation times for all the people to sinks is minimized, and we focus on techniques that enable the problems to be solved in almost linear time.

#### **5.1 Introduction**

Recently, many parts of the world have been affected by disasters including earthquakes, nuclear plant accidents, volcanic eruptions, and flooding, highlighting the urgent need for orderly evacuation planning. One powerful tool for evacuation planning is the *dynamic flow model* introduced by Ford and Fulkerson [10], which represents movement of commodities over time in a network. In this model, we are given a graph with *source* vertices and *sink* vertices. Each source vertex is associated with a positive weight, called a *supply*; each sink vertex is associated with a positive weight, called a *demand*; and each edge is associated with a positive length and capacity.

Y. Higashikawa (B) · N. Katoh · J. Teruyama
University of Hyogo, Kobe, Japan
e-mail: higashikawa@gsis.u-hyogo.ac.jp

N. Katoh
e-mail: naoki.katoh@gsis.u-hyogo.ac.jp

J. Teruyama
e-mail: junichi.teruyama@gsis.u-hyogo.ac.jp

© The Author(s) 2022

N. Katoh et al. (eds.), *Sublinear Computation Paradigm*, https://doi.org/10.1007/978-981-16-4095-7\_5


**Table 5.1** Summary of minmax *k*-sink problems

An edge capacity limits the amount of supply that can enter the edge per unit time. One variant of the dynamic flow problem is the *quickest transshipment problem*, in which the objective is to send exactly the right amount of supply out of sources into sinks while satisfying demand constraints in the minimum overall time. Hoppe and Tardos [17] provided a polynomial time algorithm for this problem in the case where the transit times are integral. However, the complexity of their algorithm is very high. Finding a practical polynomial time solution to this is still an open problem. Readers are referred to a recent survey by Skutella [20] on dynamic flows.

This chapter discusses related problems called *k-sink problems* [3, 5–9, 14–16, 18], in which the objective is to find the locations of *k* sinks in a given dynamic flow network so that all the supply is sent to the sinks as quickly as possible. The following two criteria can be naturally considered for determining the optimality of the locations: minimization of *evacuation completion time* and *aggregate evacuation time* (i.e., *average evacuation time*). We call the *k*-sink problem that requires finding the locations of *k* sinks that minimize the evacuation completion time (resp., the aggregate evacuation time) the *minmax* (resp., *minsum*) *k-sink problem*. Although several papers have studied minmax *k*-sink problems in dynamic flow networks [3, 7–9, 14, 15, 18], minsum *k*-sink problems in dynamic flow networks have not been studied except for the case of path networks [5, 6, 15, 16].<sup>1</sup> Tables 5.1 and 5.2 summarize the previous results for the minmax *k*-sink problems and the minsum *k*-sink problems, respectively.

There are two models for the evacuation method. Under the *confluent flow model*, all the supply leaving a vertex must evacuate to the same sink through the same edges, and under the *non-confluent flow model*, there is no such restriction. To our knowledge, almost all of the papers that deal with the *k*-sink problems [3, 5–9, 15] adopt the confluent flow model, while only one paper [16] handles both of the models.

Although it may seem natural to model the evacuation behavior of people by treating each supply as a discrete quantity as in [17, 18], almost all of the previous papers on sink problems [3, 7–9, 14–16] have treated each supply as a continuous quantity, since it is easier to treat the problems mathematically and the effect is negligible when the number of people is large. Throughout this chapter, we adopt the model with continuous supplies.

<sup>1</sup> Note that the minsum 1-sink problem in general networks can be solved in polynomial time by applying the following two facts: (1) Baumann and Skutella [2] provided a polynomial time algorithm for the problem of computing a dynamic flow to a fixed sink in a general network while minimizing the aggregate evacuation time. (2) For the minsum 1-sink problem in general networks, one can prove that there exists an optimal sink located at a vertex in a similar manner to the well-known *node optimality theorem* for the 1-median problem [12].

**Table 5.2** Summary of minsum *k*-sink problems

We also give an overview of two of our recent results [7, 16] on the problems of locating multiple sinks on dynamic flow path networks such that the max/sum of evacuation times for all the people to sinks is minimized, and we focus on algorithmic frameworks that enable solving the problems in almost linear time.

#### **5.2 Preliminaries**

For two real values $a, b$ with $a < b$, let $[a, b] = \{t \in \mathbb{R} \mid a \le t \le b\}$, $[a, b) = \{t \in \mathbb{R} \mid a \le t < b\}$, $(a, b] = \{t \in \mathbb{R} \mid a < t \le b\}$, and $(a, b) = \{t \in \mathbb{R} \mid a < t < b\}$, where $\mathbb{R}$ is the set of real values. For two integers $i, j$ with $i \le j$, let $[i..j] = \{h \in \mathbb{Z} \mid i \le h \le j\}$, where $\mathbb{Z}$ is the set of integers. A dynamic flow path network $\mathcal{P}$ is given as a 5-tuple $(P, \mathbf{w}, \mathbf{c}, \mathbf{l}, \tau)$, where $P$ is a path with vertex set $V = \{v\_i \mid i \in [1..n]\}$ and edge set $E = \{e\_i = (v\_i, v\_{i+1}) \mid i \in [1..n-1]\}$, $\mathbf{w}$ is a vector $\langle w\_1,\ldots,w\_n\rangle$ of which each component $w\_i$ is the *weight* of vertex $v\_i$ representing the amount of supply (e.g., the number of evacuees or cars) located at $v\_i$, $\mathbf{c}$ is a vector $\langle c\_1,\ldots,c\_{n-1}\rangle$ of which each component $c\_i$ is the *capacity* of edge $e\_i$ representing the upper bound on the flow amount that can enter $e\_i$ per unit time, $\mathbf{l}$ is a vector $\langle \ell\_1,\ldots,\ell\_{n-1}\rangle$ of which each component $\ell\_i$ is the *length* of edge $e\_i$ (i.e., the distance between the two end vertices of $e\_i$), and $\tau$ is the time taken for unit supply to move unit distance along any edge.

We say that a point $p$ lies on path $P = (V, E)$, denoted by $p \in P$, if $p$ lies on a vertex $v \in V$ or an edge $e \in E$. We assume that path $P$ can be represented by a horizontal line segment along which the vertices $v\_1, v\_2, \ldots, v\_n$ are arranged in order from left to right. For two points $p, q \in P$, $p \prec q$ means that $p$ lies to the left of $q$, and $p \preceq q$ means that $p \prec q$ or $p$ and $q$ lie at the same location. For two points $p, q \in P$ such that $p \preceq q$, where $p$ divides an edge $(v\_i, v\_{i+1})$ in the ratio $r\_p : 1 - r\_p$ and $q$ divides an edge $(v\_j, v\_{j+1})$ in the ratio $r\_q : 1 - r\_q$, let $L(p, q)$ be the distance between $p$ and $q$, that is, $L(p, q) = (1 - r\_p)\ell\_i + r\_q \ell\_j + \sum\_{h=i+1}^{j-1} \ell\_h$ (where $\sum\_{h=i+1}^{i} \ell\_h = 0$ and $\sum\_{h=i+1}^{i-1} \ell\_h = -\ell\_i$). Let us consider two integers $i, j \in [1..n]$ with $i < j$. We denote by $P\_{i,j}$ a *subpath* of $P$ from $v\_i$ to $v\_j$, and by $\mathcal{P}\_{i,j}$ the subnetwork of $\mathcal{P}$ consisting of subpath $P\_{i,j}$. Let $L\_{i,j}$ be the distance between $v\_i$ and $v\_j$, that is, $L\_{i,j} = \sum\_{h=i}^{j-1} \ell\_h$, and let $C\_{i,j}$ be the minimum capacity among all the edges between $v\_i$ and $v\_j$, that is, $C\_{i,j} = \min\{c\_h \mid h \in [i..j-1]\}$. For $i \in [1..n]$, we denote the sum of weights from $v\_1$ to $v\_i$ by $W\_i = \sum\_{j=1}^{i} w\_j$.
Note that, given a dynamic flow path network $\mathcal{P}$, if we construct two lists of $W\_i$ and $L\_{1,i}$ for all $i \in [1..n]$ in $O(n)$ preprocessing time, we can obtain $W\_i$ for any $i \in [1..n]$ and $L\_{i,j} = L\_{1,j} - L\_{1,i}$ for any $i, j \in [1..n]$ with $i < j$ in $O(1)$ time. In addition, $C\_{i,j}$ for any $i, j \in [1..n]$ with $i < j$ can be obtained in $O(1)$ time with $O(n)$ preprocessing time, which is known as the *range minimum query* [1, 4].
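The preprocessing above can be sketched as follows. This is our own illustrative Python (the names `preprocess` and `range_min_capacity` are ours, not the chapter's); it builds the two prefix-sum lists and a standard sparse table, so $W\_i$, $L\_{i,j}$, and $C\_{i,j}$ are answered in $O(1)$ time after $O(n \log n)$ table construction. The $O(n)$-preprocessing range-minimum structures of [1, 4] are more involved.

```python
def preprocess(w, c, l):
    """Build prefix sums W_i and L_{1,i}, plus a sparse table over the
    edge capacities for range-minimum queries C_{i,j}."""
    n = len(w)
    W = [0.0] * (n + 1)                 # W[i] = w_1 + ... + w_i (1-indexed)
    for i in range(1, n + 1):
        W[i] = W[i - 1] + w[i - 1]
    L = [0.0] * (n + 1)                 # L[i] = distance from v_1 to v_i
    for i in range(2, n + 1):
        L[i] = L[i - 1] + l[i - 2]
    # table[k][i] = min of c[i .. i + 2^k - 1] (0-indexed edges)
    m = len(c)
    table = [list(c)]
    for k in range(1, m.bit_length()):
        half = 1 << (k - 1)
        prev = table[-1]
        table.append([min(prev[i], prev[i + half])
                      for i in range(m - (1 << k) + 1)])
    return W, L, table

def range_min_capacity(table, i, j):
    """C_{i,j} = min{c_h | h in [i..j-1]} for 1-indexed vertices i < j,
    answered in O(1) time from the sparse table."""
    lo, hi = i - 1, j - 1               # 0-indexed half-open edge interval
    k = (hi - lo).bit_length() - 1
    return min(table[k][lo], table[k][hi - (1 << k)])
```

With these in hand, `L[j] - L[i]` returns $L\_{i,j}$ in constant time.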

A *k-sink* $\mathbf{x}$ is a $k$-tuple $(x\_1, \ldots, x\_k)$ of points on $P$ such that $x\_i \prec x\_j$ for any $i < j$. We assume that no two sinks lie on the same edge.<sup>2</sup> We define the function $\mathrm{Id}$ for a point $p \in P$ as follows: $\mathrm{Id}(p)$ is the integer such that $v\_{\mathrm{Id}(p)} \preceq p \prec v\_{\mathrm{Id}(p)+1}$ holds, that is, if $p$ lies on edge $(v\_i, v\_{i+1})$ or at vertex $v\_i$, then $\mathrm{Id}(p) = i$. A *divider* $\mathbf{d}$ is a $(k-1)$-tuple $(d\_1, \ldots, d\_{k-1})$ of real values such that $0 \le d\_i < d\_j \le W\_n$ for any $i < j$. A pair $(\mathbf{x}, \mathbf{d})$ is called *valid* if and only if $W\_{\mathrm{Id}(x\_i)} \le d\_i \le W\_{\mathrm{Id}(x\_{i+1})}$ holds for any $i$. A valid pair $(\mathbf{x}, \mathbf{d})$ determines what amount of supply from which vertex flows to which sink, so that the portion $d\_i - d\_{i-1}$ of supply is assigned to flow to sink $x\_i$, where $d\_0 = 0$ and $d\_k = W\_n$. More precisely, given a valid pair $(\mathbf{x}, \mathbf{d})$, the portion $W\_{\mathrm{Id}(x\_i)} - d\_{i-1}$ of supply that originates from the left side of $x\_i$ flows to sink $x\_i$, and the portion $d\_i - W\_{\mathrm{Id}(x\_i)}$ of supply that originates from the right side of $x\_i$ also flows to sink $x\_i$. For instance, under the non-confluent flow model, if $W\_{h-1} < d\_i < W\_h$ where $h \in [1..n]$, the portion $d\_i - W\_{h-1}$ of the $w\_h$ supply at $v\_h$ flows to sink $x\_i$, and the remaining $W\_h - d\_i$ of supply flows to sink $x\_{i+1}$. The difference between the confluent and non-confluent flow models is that the confluent flow model requires each value $d\_i$ of a divider $\mathbf{d}$ to take a value in $\{W\_1, \ldots, W\_n\}$, whereas the non-confluent flow model does not. For a dynamic flow path network $\mathcal{P}$ and a valid pair $(\mathbf{x}, \mathbf{d})$, the *evacuation completion time* $\mathsf{CT}(\mathcal{P}, \mathbf{x}, \mathbf{d})$ is the time at which all the supply completes its evacuation. The *aggregate evacuation time* $\mathsf{AT}(\mathcal{P}, \mathbf{x}, \mathbf{d})$ is the sum of the evacuation completion times over all the supply.
Explicit definitions of these are given in Sect. 5.3.
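The bookkeeping behind a valid pair $(\mathbf{x}, \mathbf{d})$ can be illustrated with a short sketch (our own helper, assuming a 1-indexed prefix-sum list $W$ with $W[0] = 0$): it checks validity and returns, for each sink $x\_i$, the left portion $W\_{\mathrm{Id}(x\_i)} - d\_{i-1}$ and the right portion $d\_i - W\_{\mathrm{Id}(x\_i)}$.

```python
def supply_split(W, sink_edges, d):
    """Given 1-indexed prefix sums W (with W[0] = 0), the indices
    Id(x_1), ..., Id(x_k) of the edges holding the sinks, and a divider
    d = (d_1, ..., d_{k-1}), check validity and return for each sink the
    (left, right) amounts of supply it receives.  Our own helper, not the
    chapter's notation."""
    k = len(sink_edges)
    dd = [0.0] + list(d) + [W[-1]]                   # d_0 = 0, d_k = W_n
    for i in range(1, k + 1):                        # validity of (x, d)
        assert dd[i - 1] <= W[sink_edges[i - 1]] <= dd[i], "pair is not valid"
    return [(W[sink_edges[i]] - dd[i], dd[i + 1] - W[sink_edges[i]])
            for i in range(k)]
```

For example, on a 3-vertex path with $W = (1, 3, 6)$, sinks on edges $e\_1$ and $e\_2$, and divider $d = (2)$, sink $x\_1$ receives 1 unit from each side and sink $x\_2$ receives 1 unit from the left and 3 from the right.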

#### **5.3 Objective Functions**

Suppose that we are given a divider $\mathbf{d} = (d\_1, \ldots, d\_{k-1})$. This $\mathbf{d}$ implies that we have $k$ 1-sink subproblems. The $i$th subproblem consists of a subnetwork $\mathcal{P}\_{h,h'}$ such that the weight of $v\_j$ is $w\_j$ for $j \in [h+1..h'-1]$, while those of $v\_h$ and $v\_{h'}$ are $W\_h - d\_{i-1}$ and $d\_i - W\_{h'-1}$, respectively, where $W\_{h-1} < d\_{i-1} \le W\_h$ and $W\_{h'-1} < d\_i \le W\_{h'}$. To explicitly define the evacuation completion time and the aggregate evacuation time,

<sup>2</sup> It turns out that this assumption does not result in a loss of generality once the cost function is introduced later. If two adjacent sinks $x\_i$ and $x\_{i+1}$ lie on edge $(v\_j, v\_{j+1})$, that is, $v\_j \prec x\_i \prec x\_{i+1} \prec v\_{j+1}$, moving $x\_i$ to $v\_j$ or moving $x\_{i+1}$ to $v\_{j+1}$ does not increase the cost.

we first consider the case of the 1-sink problem, and then extend the argument to the general case of the *k*-sink problem.

#### *5.3.1 Objective Functions for the 1-Sink Problem*

Given a dynamic flow path network $\mathcal{P} = (P, \mathbf{w}, \mathbf{c}, \mathbf{l}, \tau)$ with $n$ vertices, we assign a unique sink to a point $x$, that is, $\mathbf{x} = (x)$ and $\mathbf{d} = ()$, which is the 0-tuple. We consider only the case where $x$ is on an edge $e\_i$ excluding its end vertices, that is, $v\_i \prec x \prec v\_{i+1}$, since the case where $x$ is on a vertex can be treated similarly. In this case, all the supply on the left side of $x$ (i.e., at $v\_1, \ldots, v\_i$) flows to the right toward sink $x$, and all the supply on the right side of $x$ (i.e., at $v\_{i+1}, \ldots, v\_n$) flows to the left toward sink $x$.

To treat this case, we introduce some new notation. Let $\theta^{x,+}(z)$ denote the time at which the first $z - W\_i$ of supply on the right side of $x$ completes its evacuation to sink $x$ (where $\theta^{x,+}(z) = 0$ for $z \in [0, W\_i]$). Similarly, let $\theta^{x,-}(z)$ denote the time at which the first $W\_i - z$ of supply on the left side of $x$ completes its evacuation to sink $x$ (where $\theta^{x,-}(z) = 0$ for $z \in [W\_i, W\_n]$). Higashikawa [13] showed that the values $\theta^{x,+}(W\_n)$ and $\theta^{x,-}(0)$, which are the evacuation completion times for all the supply on the right and left sides of $x$, respectively, are given by the following formulae:

$$\theta^{x,+}(W\_n) = \max \left\{ \frac{W\_n - W\_{j-1}}{C\_{i,j}} + \mathfrak{r} \cdot L(\mathbf{x}, v\_j) \mid j \in [i+1..n] \right\}, \text{ and} \qquad (5.1)$$

$$\theta^{x,-}(0) = \max \left\{ \frac{W\_j}{C\_{j,i+1}} + \tau \cdot L(v\_j, x) \;\middle|\; j \in [1..i] \right\}.\tag{5.2}$$

Using these, the evacuation completion time CT(*P*, (*x*), ()) is given by

$$\mathsf{CT}(\mathcal{P}, (x), ()) = \max \left\{ \theta^{x,+}(W\_n),\; \theta^{x,-}(0) \right\}. \tag{5.3}$$
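Formulae (5.1)-(5.3) can be evaluated directly by brute force. The sketch below (our own naive $O(n^2)$ evaluation, not the data-structure-based method of [7]) computes $\mathsf{CT}(\mathcal{P}, (x), ())$ for a sink $x$ dividing edge $e\_i$ in the ratio $r : 1 - r$:

```python
def completion_time(w, c, l, tau, i, r):
    """Naive O(n^2) evaluation of formulae (5.1)-(5.3) for a single sink x
    dividing edge e_i = (v_i, v_{i+1}) in the ratio r : 1-r, 0 < r < 1.
    The function name and calling convention are ours, not the chapter's."""
    n = len(w)
    W = [0.0] * (n + 1)
    L = [0.0] * (n + 1)
    for a in range(1, n + 1):
        W[a] = W[a - 1] + w[a - 1]
    for a in range(2, n + 1):
        L[a] = L[a - 1] + l[a - 2]
    x_pos = L[i] + r * l[i - 1]            # distance of x from v_1
    def C(a, b):
        # C_{a,b} = min capacity over edges e_a .. e_{b-1}
        return min(c[h - 1] for h in range(a, b))
    # theta^{x,+}(W_n), formula (5.1): supply to the right of x
    right = max((W[n] - W[j - 1]) / C(i, j) + tau * (L[j] - x_pos)
                for j in range(i + 1, n + 1))
    # theta^{x,-}(0), formula (5.2): supply to the left of x
    left = max(W[j] / C(j, i + 1) + tau * (x_pos - L[j])
               for j in range(1, i + 1))
    return max(left, right)                # formula (5.3)
```

On a 2-vertex instance with unit weights, capacity, length, and $\tau$, a sink at the midpoint of the single edge gives completion time $1.5$.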

We can generalize formulae (5.1) and (5.2) to the case of any *z* ∈ [0, *Wn*] as follows:

$$\theta^{x,+}(z) = \max \{ \theta^{x,+,j}(z) \mid j \in [i+1..n] \}, \tag{5.4}$$

where θ *<sup>x</sup>*,+,*<sup>j</sup>* (*z*) for *j* ∈ [*i* + 1..*n*] is defined as

$$\theta^{x,+,j}(z) = \begin{cases} 0 & \text{if } z \le W\_{j-1}, \\ \dfrac{z - W\_{j-1}}{C\_{i,j}} + \tau \cdot L(x, v\_j) & \text{if } z > W\_{j-1}, \end{cases} \tag{5.5}$$

and

**Fig. 5.1** The blue (resp., red) thick half-open segments indicate the function $\theta^{x,+}(z)$ (resp., $\theta^{x,-}(z)$). The gray area indicates $\mathsf{AT}(\mathcal{P}, (x), ())$

$$\theta^{x,-}(z) = \max \{ \theta^{x,-,j}(z) \mid j \in [1..i] \}, \tag{5.6}$$

where θ *<sup>x</sup>*,−,*<sup>j</sup>* (*z*) is defined for *j* ∈ [1..*i*] as

$$\theta^{x,-,j}(z) = \begin{cases} \dfrac{W\_j - z}{C\_{j,i+1}} + \tau \cdot L(v\_j, x) & \text{if } z < W\_j, \\ 0 & \text{if } z \ge W\_j. \end{cases} \tag{5.7}$$

Then, the aggregate evacuation times for the supply on the right and left sides of *x* are

$$\int\_{W\_i}^{W\_n} \theta^{x,+}(z)\,dz \quad \text{and} \quad \int\_0^{W\_i} \theta^{x,-}(z)\,dz,$$

respectively. Thus, the aggregate evacuation time AT(*P*, (*x*), ()) is given by

$$\mathsf{AT}(\mathcal{P}, (x), ()) = \int\_0^{W\_i} \theta^{x,-}(z)\,dz + \int\_{W\_i}^{W\_n} \theta^{x,+}(z)\,dz. \tag{5.8}$$

See also Fig. 5.1.
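The integrals in (5.8) can be approximated numerically from the piecewise definitions (5.4)-(5.7). The following sketch is a midpoint-rule approximation of our own, with our own naming; the actual algorithms of [7, 16] work with the piecewise-linear functions exactly rather than by sampling.

```python
def aggregate_time(w, c, l, tau, i, r, steps=2000):
    """Midpoint-rule approximation of the integrals in (5.8), built from
    the piecewise definitions (5.4)-(5.7).  Exact only when the integrand
    is linear on every grid cell; intended as an illustration."""
    n = len(w)
    W = [0.0] * (n + 1)
    L = [0.0] * (n + 1)
    for a in range(1, n + 1):
        W[a] = W[a - 1] + w[a - 1]
    for a in range(2, n + 1):
        L[a] = L[a - 1] + l[a - 2]
    x_pos = L[i] + r * l[i - 1]            # sink x divides e_i in ratio r : 1-r
    C = lambda a, b: min(c[h - 1] for h in range(a, b))
    def th_plus(z):
        # theta^{x,+}(z) via (5.4)-(5.5): only pieces with z > W_{j-1} are active
        return max([(z - W[j - 1]) / C(i, j) + tau * (L[j] - x_pos)
                    for j in range(i + 1, n + 1) if z > W[j - 1]] or [0.0])
    def th_minus(z):
        # theta^{x,-}(z) via (5.6)-(5.7): only pieces with z < W_j are active
        return max([(W[j] - z) / C(j, i + 1) + tau * (x_pos - L[j])
                    for j in range(1, i + 1) if z < W[j]] or [0.0])
    def integral(f, lo, hi):
        h = (hi - lo) / steps
        return h * sum(f(lo + (s + 0.5) * h) for s in range(steps))
    # the two integrals of formula (5.8)
    return integral(th_minus, 0.0, W[i]) + integral(th_plus, W[i], W[n])
```

On the 2-vertex unit instance with the sink at the edge midpoint, both integrands are linear and the gray area of Fig. 5.1 evaluates to $2.0$.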

#### *5.3.2 Objective Functions for k-Sink*

Let us consider a valid pair consisting of a $k$-sink $\mathbf{x} = (x\_1, \ldots, x\_k)$ and a divider $\mathbf{d} = (d\_1, \ldots, d\_{k-1})$ such that each sink is on an edge excluding its end vertices, that is, $v\_{\mathrm{Id}(x\_i)} \prec x\_i \prec v\_{\mathrm{Id}(x\_i)+1}$. In this situation, for each $i \in [1..k]$, the first $d\_i - W\_{\mathrm{Id}(x\_i)}$ of supply on the right side of $x\_i$ and the first $W\_{\mathrm{Id}(x\_i)} - d\_{i-1}$ of supply on the left side of $x\_i$ move to sink $x\_i$. By the argument of the previous section, the evacuation completion times for the supply on the right and left sides of $x\_i$ are represented by


$$
\theta^{x\_i,+}(d\_i) \text{ and } \theta^{x\_i,-}(d\_{i-1}),
$$

respectively. Thus, the evacuation completion time CT(*P*, **x**, **d**) is given by

$$\mathsf{CT}(\mathcal{P}, \mathbf{x}, \mathbf{d}) = \max \left\{ \theta^{x\_i,+}(d\_i),\; \theta^{x\_i,-}(d\_{i-1}) \;\middle|\; i \in [1..k] \right\}, \tag{5.9}$$

where *d*<sup>0</sup> = 0 and *dk* = *Wn*. The aggregate evacuation times for the supply on the right and left sides of *xi* are

$$\int\_{W\_{\mathrm{Id}(x\_i)}}^{d\_i} \theta^{x\_i,+}(z)\,dz \quad \text{and} \quad \int\_{d\_{i-1}}^{W\_{\mathrm{Id}(x\_i)}} \theta^{x\_i,-}(z)\,dz,$$

respectively. Thus, the aggregate evacuation time AT(*P*, **x**, **d**) is given by

$$\mathsf{AT}(\mathcal{P}, \mathbf{x}, \mathbf{d}) = \sum\_{i \in [1..k]} \left( \int\_{d\_{i-1}}^{W\_{\mathrm{Id}(x\_i)}} \theta^{x\_i,-}(z)\,dz + \int\_{W\_{\mathrm{Id}(x\_i)}}^{d\_i} \theta^{x\_i,+}(z)\,dz \right), \tag{5.10}$$

where *d*<sup>0</sup> = 0 and *dk* = *Wn*.

#### **5.4 Minmax** *k***-Sink Problems on Paths**

In this section, we consider the minmax *k*-sink problems on path networks under the confluent flow model, which is precisely defined as

(Minmax-*k*-Sink-Path-Confluent-Flow) **Input:** A dynamic flow path network $\mathcal{P} = (P, \mathbf{w}, \mathbf{c}, \mathbf{l}, \tau)$. **Goal:** Find a solution $(\mathbf{x}, \mathbf{d})$ to the problem

$$\begin{aligned} \text{min.} & \quad \mathsf{CT}(\mathcal{P}, \mathbf{x}, \mathbf{d})\\ \text{s.t.} & \quad \mathbf{x} = (x\_1, \ldots, x\_k) \in P^k, \qquad x\_h \prec x\_l \;\forall h < l, \\ & \quad \mathbf{d} = (d\_1, \ldots, d\_{k-1}) \in \left\{ W\_h \mid h \in [1..n] \right\}^{k-1}, \qquad W\_{\mathrm{Id}(x\_h)} \le d\_h \le W\_{\mathrm{Id}(x\_{h+1})} \;\forall h. \end{aligned}$$
For the Minmax-*k*-Sink-Path-Confluent-Flow problem, [7] reported the following result, which is the best so far:

**Theorem 5.1** (*[7]*) *The* Minmax-*k*-Sink-Path-Confluent-Flow *problem can be solved in* $O(\min\{n \log n + k^2 \log^4 n,\; n \log^3 n\})$ *time. Moreover, if the capacities of $\mathcal{P}$ are uniform, the* Minmax-*k*-Sink-Path-Confluent-Flow *problem can be solved in* $O(\min\{n + k^2 \log^2 n,\; n \log n\})$ *time.*

Theorem 5.1 implies that the problem can be solved in almost linear time for any $k$. In [7], two kinds of algorithms are provided: one is an $O(n \log n + k^2 \log^4 n)$-time algorithm based on the *parametric search method*, and the other is an $O(n \log^3 n)$-time algorithm based on the *sorted matrix method*.

Both algorithms require repeatedly solving the problem of locating a 1-sink for multiple choices of different subnetworks. Note that the optimal solution for the problem of locating a 1-sink on $\mathcal{P}\_{i,j}$ is a point $x^*$ that minimizes the following expression over $x \in P\_{i,j}$:

$$\mathsf{CT}(\mathcal{P}\_{i,j}, (x), ()) = \max \left\{ \theta^{x,+}(W\_j),\; \theta^{x,-}(W\_{i-1}) \right\}. \tag{5.11}$$

Both algorithms also require repeatedly performing *feasibility tests* for multiple choices of different subnetworks. We say that $\mathcal{P}\_{i,j}$ is $(t, q)$-*feasible* if and only if the answer to the following decision problem is "yes":

(Feasibility-Test-for-Subpath)

**Input:** A dynamic flow path network $\mathcal{P} = (P, \mathbf{w}, \mathbf{c}, \mathbf{l}, \tau)$, a positive real $t \in \mathbb{R}\_+$, and integers $q, i, j$ satisfying $q \in [1..k]$ and $i, j \in [1..n]$ with $i < j$.

**Goal:** Determine whether there exists a pair of vectors $(\mathbf{x}', \mathbf{d}')$ such that

$$\begin{aligned} \mathsf{CT}(\mathcal{P}\_{i,j}, \mathbf{x}', \mathbf{d}') &\leq t, \\ \mathbf{x}' &= (x\_1, \dots, x\_q) \in P\_{i,j}^q, \qquad x\_h \prec x\_l \;\forall h < l, \\ \mathbf{d}' &= (d\_1, \dots, d\_{q-1}) \in \{W\_h \mid h \in [i..j] \}^{q-1}, \qquad W\_{\mathrm{Id}(x\_h)} \leq d\_h \leq W\_{\mathrm{Id}(x\_{h+1})} \; \forall h. \end{aligned}$$

Note that [7] developed a data structure called the *CUE tree* to efficiently compute $\theta^{x,-}(W\_{i-1})$ and $\theta^{x,+}(W\_j)$ for any integers $i, j \in [1..n]$ with $i < j$ and any $x \in P\_{i,j}$. For the case of general edge capacities, the CUE tree can be constructed in $O(n \log n)$ time, and $\theta^{x,-}(W\_{i-1})$ and $\theta^{x,+}(W\_j)$ can be computed in $O(\log^2 n)$ time using the CUE tree. See [7] for more details.

**Lemma 5.1** (*[7]*) *Given a dynamic flow path network $\mathcal{P} = (P, \mathbf{w}, \mathbf{c}, \mathbf{l}, \tau)$ with $n$ vertices, the CUE tree can be constructed in $O(n \log n)$ time. Moreover, if the capacities of $\mathcal{P}$ are uniform, the CUE tree can be constructed in $O(n)$ time.*

**Lemma 5.2** (*[7]*) *Given a dynamic flow path network $\mathcal{P} = (P, \mathbf{w}, \mathbf{c}, \mathbf{l}, \tau)$ with $n$ vertices, suppose that the CUE tree is available. Then, for any integers $i, j \in [1..n]$ with $i < j$ and any $x \in P\_{i,j}$, $\theta^{x,-}(W\_{i-1})$ and $\theta^{x,+}(W\_j)$ can be computed in $O(\log^2 n)$ time. Moreover, if the capacities of $\mathcal{P}$ are uniform, $\theta^{x,-}(W\_{i-1})$ and $\theta^{x,+}(W\_j)$ can be computed in $O(\log n)$ time.*

In the rest of this section, we first describe how feasibility tests are performed in Sect. 5.4.1 and how the 1-sink problem for a subnetwork is solved in Sect. 5.4.2, and then show the frameworks of the parametric search method in Sect. 5.4.3 and the sorted matrix method in Sect. 5.4.4.

#### *5.4.1 Feasibility Test*

In [7], to solve the Minmax-*k*-Sink-Path-Confluent-Flow problem, an algorithm repeatedly tests the $(t, q)$-feasibility of $\mathcal{P}\_{i,j}$ for multiple choices of different 4-tuples $(t, q, i, j)$. Let $\mathsf{CT}\_{\mathsf{OPT}}(q, i, j)$ denote the optimal cost for the problem of locating a $q$-sink on $\mathcal{P}\_{i,j}$. Then, for a positive real $t \in \mathbb{R}\_+$ and integers $q, i, j$ satisfying $q \in [1..k]$ and $i, j \in [1..n]$ with $i < j$, $\mathcal{P}\_{i,j}$ is $(t, q)$-feasible if and only if $\mathsf{CT}\_{\mathsf{OPT}}(q, i, j) \le t$ holds.

**Lemma 5.3** (*[7]*) *Given a dynamic flow path network $\mathcal{P} = (P, \mathbf{w}, \mathbf{c}, \mathbf{l}, \tau)$ with $n$ vertices, suppose that the CUE tree is available. For integers $q, i, j$ satisfying $q \in [1..k]$ and $i, j \in [1..n]$ with $i < j$, the $(t, q)$-feasibility of $\mathcal{P}\_{i,j}$ can be tested in $O(\min\{n \log^2 n,\; k \log^3 n\})$ time. Moreover, if the capacities of $\mathcal{P}$ are uniform, the $(t, q)$-feasibility of $\mathcal{P}\_{i,j}$ can be tested in $O(\min\{n,\; k \log n\})$ time.*

*Proof* We prove only the case of general capacities. For the case of uniform capacity, see [7].

To determine the $(t, q)$-feasibility of $\mathcal{P}\_{i,j}$, we place the sinks consecutively from left to right, each as far to the right as possible. We first compute the maximum integer $h$ such that $\theta^{v\_h,-}(W\_{i-1}) \le t$ and $\theta^{v\_{h+1},-}(W\_{i-1}) > t$ hold. Next, we solve

$$
\theta^{v\_{h+1},-}(W\_{i-1}) - \alpha \cdot \tau \ell\_h = t \tag{5.12}
$$

for $\alpha$. If $\alpha < 1$, we move the leftmost sink $x\_1$ to the point that divides edge $e\_h = (v\_h, v\_{h+1})$ in the ratio $1 - \alpha : \alpha$; otherwise, we place $x\_1$ at $v\_h$. We then compute the maximum integer $l\_1$ such that $\theta^{x\_1,+}(W\_{l\_1}) \le t$ and $\theta^{x\_1,+}(W\_{l\_1+1}) > t$ hold. We thus determine the maximal subnetwork $\mathcal{P}\_{i,l\_1}$ such that $\mathsf{CT}\_{\mathsf{OPT}}(1, i, l\_1) \le t$. In the same manner, we repeatedly isolate the maximal subnetworks $\mathcal{P}\_{i,l\_1}, \mathcal{P}\_{l\_1+1,l\_2}, \mathcal{P}\_{l\_2+1,l\_3}, \ldots$; if the $q$th subnetwork is found to have $l\_q < j$, then $\mathcal{P}\_{i,j}$ is not $(t, q)$-feasible; otherwise, it is $(t, q)$-feasible.

Let us now look at the time complexity. Isolating $\mathcal{P}\_{i,l\_1}$ consists of (a) computing $h$, (b) solving the equation for $\alpha$, and (c) computing $l\_1$. Obviously, (b) takes $O(1)$ time. For (a), applying a binary search takes $O(\log^3 n)$ time because we compute $\theta^{v\_a,-}(W\_{i-1})$ over $a \in [i..j]$ $O(\log n)$ times, and each $\theta^{v\_a,-}(W\_{i-1})$ can be computed in $O(\log^2 n)$ time using the CUE tree by Lemma 5.2. Similarly, (c) takes $O(\log^3 n)$ time by binary search. In this way, we can isolate at most $q$ subnetworks in $O(q \log^3 n) = O(k \log^3 n)$ time. However, if we simply scan from left to right instead of using a binary search for (a) and (c), that is, if we compute $\theta^{v\_a,-}(W\_{i-1})$ for $a = i, i+1, \ldots, h, h+1$ and $\theta^{x\_1,+}(W\_b)$ for $b = h+1, h+2, \ldots, l\_1, l\_1+1$, it takes $O((l\_1 - i) \log^2 n)$ time to determine $\mathcal{P}\_{i,l\_1}$. In this way, we can isolate at most $q$ subnetworks in $O((j - i) \log^2 n) = O(n \log^2 n)$ time.
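The greedy left-to-right placement in this proof can be sketched as follows. The code is our own simplification in two respects: sinks are restricted to vertices (so the test is conservative rather than exact) and each $\theta$ value is evaluated naively in $O(n)$ time instead of in $O(\log^2 n)$ time with the CUE tree.

```python
def feasible(w, c, l, tau, t, q):
    """Greedy left-to-right feasibility test in the spirit of Lemma 5.3,
    with two simplifications of our own: sinks are restricted to vertices
    (a conservative test) and theta values are evaluated naively."""
    n = len(w)
    W = [0.0] * (n + 1)
    L = [0.0] * (n + 1)
    for a in range(1, n + 1):
        W[a] = W[a - 1] + w[a - 1]
    for a in range(2, n + 1):
        L[a] = L[a - 1] + l[a - 2]
    C = lambda a, b: min(c[h - 1] for h in range(a, b))
    def th_minus(i, s):
        # completion time of supply at v_i..v_{s-1} flowing right into sink v_s
        return max([(W[j] - W[i - 1]) / C(j, s) + tau * (L[s] - L[j])
                    for j in range(i, s)] or [0.0])
    def th_plus(s, j2):
        # completion time of supply at v_{s+1}..v_{j2} flowing left into sink v_s
        return max([(W[j2] - W[j - 1]) / C(s, j) + tau * (L[j] - L[s])
                    for j in range(s + 1, j2 + 1)] or [0.0])
    i = 1                                  # leftmost vertex not yet served
    for _ in range(q):
        s = i                              # push the sink right while the left side stays within t
        while s < n and th_minus(i, s + 1) <= t:
            s += 1
        j2 = s                             # extend coverage to the right while time permits
        while j2 < n and th_plus(s, j2 + 1) <= t:
            j2 += 1
        if j2 == n:
            return True
        i = j2 + 1
    return False
```

Pushing the sink as far right as the left side allows also maximizes its right reach, since moving a sink right only shortens right-side distances and can only raise the bottleneck capacity.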

#### *5.4.2 Solving the 1-Sink Problem*

**Lemma 5.4** (*[7]*) *Given a dynamic flow path network $\mathcal{P} = (P, \mathbf{w}, \mathbf{c}, \mathbf{l}, \tau)$ with $n$ vertices, suppose that the CUE tree is available. For any integers $i, j$ satisfying $i, j \in [1..n]$ with $i < j$, $\mathsf{CT}\_{\mathsf{OPT}}(1, i, j)$ can be computed in $O(\log^3 n)$ time. Moreover, if the capacities of $\mathcal{P}$ are uniform, $\mathsf{CT}\_{\mathsf{OPT}}(1, i, j)$ can be computed in $O(\log n)$ time.*

*Proof* We prove only the case of general capacities. See [7] for the case of uniform capacity.

Recalling Eq. (5.11), we have

$$\begin{split} \mathsf{CT}\_{\mathsf{OPT}}(1, i, j) &= \min\_{x \in P\_{i,j}} \mathsf{CT}(\mathcal{P}\_{i,j}, (x), ()) \\ &= \min\_{x \in P\_{i,j}} \max \left\{ \theta^{x,+}(W\_j),\; \theta^{x,-}(W\_{i-1}) \right\}. \end{split} \tag{5.13}$$

Because $\theta^{x,+}(W\_j)$ and $\theta^{x,-}(W\_{i-1})$ are monotonically decreasing and monotonically increasing, respectively, in $x \in P\_{i,j}$, if an integer $h \in [i..j]$ satisfies $\theta^{v\_h,-}(W\_{i-1}) \le \theta^{v\_h,+}(W\_j)$ and $\theta^{v\_{h+1},-}(W\_{i-1}) > \theta^{v\_{h+1},+}(W\_j)$, then there exists a point $x^*$ that minimizes $\mathsf{CT}(\mathcal{P}\_{i,j}, (x), ())$ on edge $e\_h$ (including its end vertices $v\_h$ and $v\_{h+1}$). We can apply binary search to compute this $h$, which can be done in $O(\log^3 n)$ time using the CUE tree (see Lemma 5.2). Once $h$ is determined, $x^*$ can be computed as follows: we solve

$$
\theta^{v\_{h+1},-}(W\_{i-1}) - \alpha \cdot \tau \ell\_h = \theta^{v\_h,+}(W\_j) - (1 - \alpha) \cdot \tau \ell\_h \tag{5.14}
$$

for $\alpha$ in $O(1)$ time. If $\alpha \le 0$, let $x^* = v\_{h+1}$ and compute $\mathsf{CT}\_{\mathsf{OPT}}(1, i, j) = \mathsf{CT}(\mathcal{P}\_{i,j}, (v\_{h+1}), ())$. If $\alpha \ge 1$, let $x^* = v\_h$ and compute $\mathsf{CT}\_{\mathsf{OPT}}(1, i, j) = \mathsf{CT}(\mathcal{P}\_{i,j}, (v\_h), ())$. Otherwise, let $x^*$ be the point that divides edge $e\_h = (v\_h, v\_{h+1})$ in the ratio $1 - \alpha : \alpha$ and compute $\mathsf{CT}\_{\mathsf{OPT}}(1, i, j) = \theta^{v\_{h+1},-}(W\_{i-1}) - \alpha \cdot \tau \ell\_h = \theta^{v\_h,+}(W\_j) - (1 - \alpha) \cdot \tau \ell\_h$. Using the CUE tree, we can compute these values in $O(\log^2 n)$ time. Thus, $\mathsf{CT}\_{\mathsf{OPT}}(1, i, j)$ can be computed in $O(\log^3 n) + O(1) + O(\log^2 n) = O(\log^3 n)$ time.
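The per-edge balance step of this proof can be sketched in brute-force form. This is our own $O(n^2)$ simplification: every edge is tried and the $\theta$ values are evaluated naively, whereas Lemma 5.4 uses binary search plus the CUE tree.

```python
def opt_1sink(w, c, l, tau):
    """Brute-force version of the computation behind Lemma 5.4: for every
    edge e_h, solve the balance equation (5.14) for alpha, clamp alpha to
    [0, 1], and keep the best completion time.  The binary search and the
    CUE tree of [7] are replaced by naive O(n) evaluations (our choice)."""
    n = len(w)
    W = [0.0] * (n + 1)
    L = [0.0] * (n + 1)
    for a in range(1, n + 1):
        W[a] = W[a - 1] + w[a - 1]
    for a in range(2, n + 1):
        L[a] = L[a - 1] + l[a - 2]
    C = lambda a, b: min(c[h - 1] for h in range(a, b))
    def th_minus_at(pos, i):
        # theta^{x,-}(0) for x at distance pos from v_1 on edge e_i, formula (5.2)
        return max(W[j] / C(j, i + 1) + tau * (pos - L[j]) for j in range(1, i + 1))
    def th_plus_at(pos, i):
        # theta^{x,+}(W_n), formula (5.1)
        return max((W[n] - W[j - 1]) / C(i, j) + tau * (L[j] - pos)
                   for j in range(i + 1, n + 1))
    best = float('inf')
    for h in range(1, n):                          # sink on edge e_h = (v_h, v_{h+1})
        a = th_minus_at(L[h + 1], h)               # left-side time if x = v_{h+1}
        b = th_plus_at(L[h], h)                    # right-side time if x = v_h
        alpha = (a - b + tau * l[h - 1]) / (2 * tau * l[h - 1])   # solve (5.14)
        alpha = min(max(alpha, 0.0), 1.0)          # clamp: optimum may be at a vertex
        pos = L[h + 1] - alpha * l[h - 1]          # x divides e_h in ratio 1-alpha : alpha
        best = min(best, max(th_minus_at(pos, h), th_plus_at(pos, h)))
    return best
```

Within one edge, both $\theta$ values shift linearly with slope $\pm\tau$ as $x$ moves, so solving (5.14) balances the two sides exactly on that edge.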

#### *5.4.3 Parametric Search Method*

In the parametric search method, we first compute the maximum integer $i\_1$ such that $\mathcal{P}\_{i\_1+1,n}$ is not $(\mathsf{CT}\_{\mathsf{OPT}}(1, 1, i\_1), k-1)$-feasible, and store $t\_1 = \mathsf{CT}\_{\mathsf{OPT}}(1, 1, i\_1+1)$ as a feasible value. Note that $t^* = \mathsf{CT}\_{\mathsf{OPT}}(k, 1, n)$ satisfies $\mathsf{CT}\_{\mathsf{OPT}}(1, 1, i\_1) < t^* \le t\_1$. To compute $i\_1$, we apply binary search by executing $O(\log n)$ tests for the $(\mathsf{CT}\_{\mathsf{OPT}}(1, 1, a), k-1)$-feasibility of $\mathcal{P}\_{a+1,n}$ over $1 \le a \le n$. For an integer $a$, we can compute $\mathsf{CT}\_{\mathsf{OPT}}(1, 1, a)$ in $O(\log^3 n)$ time by Lemma 5.4. Also, by Lemma 5.3, we can test whether $\mathcal{P}\_{a+1,n}$ is $(\mathsf{CT}\_{\mathsf{OPT}}(1, 1, a), k-1)$-feasible in $O(k \log^3 n)$ time. Summarizing these arguments, we can compute $i\_1$ and $t\_1$ in $\{O(\log^3 n) + O(k \log^3 n)\} \times O(\log n) = O(k \log^4 n)$ time. Next, we compute the maximum integer $i\_2$ such that $\mathcal{P}\_{i\_2+1,n}$ is not $(\mathsf{CT}\_{\mathsf{OPT}}(1, i\_1+1, i\_2), k-2)$-feasible, and store $t\_2 = \mathsf{CT}\_{\mathsf{OPT}}(1, i\_1+1, i\_2+1)$ as a feasible value, which can be done in $O(k \log^4 n)$ time in the same manner as the computation of $(i\_1, t\_1)$. Sequentially, we determine $(i\_3, t\_3), \ldots, (i\_{k-1}, t\_{k-1})$ in $(k-3) \times O(k \log^4 n)$ time, and eventually compute $t\_k = \mathsf{CT}\_{\mathsf{OPT}}(1, i\_{k-1}+1, n)$ in $O(\log^3 n)$ time. Note that $t^* = \min\{t\_i \mid i = 1, 2, \ldots, k\}$ holds, which can be computed in $O(k)$ time. We then execute a $(t^*, k)$-feasibility test for $\mathcal{P}$ in $O(k \log^3 n)$ time, so that the optimal $k$-sink is obtained. We thus see that the problem can be solved in $(k-1) \times O(k \log^4 n) + O(\log^3 n) + O(k) + O(k \log^3 n) = O(k^2 \log^4 n)$ time once the CUE tree is constructed. Since it takes $O(n \log n)$ time to construct the CUE tree by Lemma 5.1, the total time complexity is $O(n \log n + k^2 \log^4 n)$.

For the case of uniform capacity, the same argument holds. Applying Lemmas 5.1, 5.3, and 5.4, we have a total time complexity of $O(n + k^2 \log^2 n)$.

#### *5.4.4 Sorted Matrix Method*

A matrix *A* is *sorted* if and only if each row and column of *A* is sorted in nondecreasing order. The sorted matrix method is based on the following lemma shown in [11]:

**Lemma 5.5** (*[11]*) *Consider a minimization problem $Q$ with an instance $I$ of size $n$. Suppose that the feasibility of any value for $I$ can be tested in $g(n)$ time. Let $A$ be an $n \times n$ sorted matrix such that each element can be computed in $f(n)$ time. Then, the minimum element of $A$ that is feasible for $Q$ can be found in $O(n f(n) + g(n) \log n)$ time.*

In [7], an *n* × *n* matrix *A* is defined such that the (*i*, *j*)th entry of *A* is given by

$$A[i,j] = \begin{cases} \mathsf{CT}\_{\mathsf{OPT}}(1, n-i+1, j) & \text{if } n-i+1 \le j \\ 0 & \text{otherwise.} \end{cases} \tag{5.15}$$

Note that we do not actually compute all the elements of *A*, but compute the element *A*[*i*, *j*] on demand as needed.

Let us confirm that matrix $A$ is sorted: moving right along a row increases $j$ and moving down a column decreases $n - i + 1$, and in either case the subpath $P\_{n-i+1,j}$ only extends, so $\mathsf{CT}\_{\mathsf{OPT}}(1, n-i+1, j)$ cannot decrease. It is also clear that matrix $A$ includes $\mathsf{CT}\_{\mathsf{OPT}}(1, l, r)$ for every pair of integers $(l, r)$ such that $l, r \in [1..n]$ with $l < r$. In addition, there exists a pair $(l, r)$ such that $\mathsf{CT}\_{\mathsf{OPT}}(1, l, r) = \mathsf{CT}\_{\mathsf{OPT}}(k, 1, n)$. These facts imply that the minimum element $A[i, j]$ such that $\mathcal{P}$ is $(A[i, j], k)$-feasible is $\mathsf{CT}\_{\mathsf{OPT}}(k, 1, n)$, and hence we can apply Lemma 5.5 to solve the Minmax-*k*-Sink-Path-Confluent-Flow problem as follows: once the CUE tree is constructed, we have $f(n) = O(\log^3 n)$ by Lemma 5.4 and $g(n) = O(n \log^2 n)$ by Lemma 5.3, so the problem can be solved in $O(n \log^3 n)$ time. Because it takes $O(n \log n)$ time to construct the CUE tree by Lemma 5.1, the total time complexity is $O(n \log n) + O(n \log^3 n) = O(n \log^3 n)$.

For the case of uniform capacity, the same argument holds. Applying Lemmas 5.1, 5.3, and 5.4, we have a total time complexity of *O*(*n* log *n*).
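The search of Lemma 5.5 can be illustrated with a simplified stand-in (ours, not the method of [11]): because each row is nondecreasing and feasibility is monotone in the value, the feasible entries of a row form a suffix, so one binary search per row finds the minimum feasible element. This uses $O(n \log n)$ feasibility tests, weaker than the $O(\log n)$ tests of Lemma 5.5, but it shows why sortedness makes on-demand evaluation enough.

```python
def min_feasible_in_sorted_matrix(A, is_feasible):
    """Simplified stand-in for the search of Lemma 5.5: in each
    nondecreasing row, binary-search the leftmost feasible entry (the
    feasible entries form a suffix by monotonicity), then take the
    minimum over rows.  Returns None if no entry is feasible."""
    best = None
    for row in A:
        lo, hi = 0, len(row)               # leftmost feasible entry of this row
        while lo < hi:
            mid = (lo + hi) // 2
            if is_feasible(row[mid]):
                hi = mid
            else:
                lo = mid + 1
        if lo < len(row) and (best is None or row[lo] < best):
            best = row[lo]
    return best
```

The rows could equally be functions computing $A[i, j]$ on demand, as in (5.15); nothing in the search requires the matrix to be materialized.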

#### **5.5 Minsum** *k***-Sink Problems on Paths**

In this section, our task is to find a valid pair (**x**, **d**) that minimizes the aggregate evacuation time AT(*P*, **x**, **d**). This task can be precisely represented as follows:

(Minsum-*k*-Sink-Path) **Input:** A dynamic flow path network *P* = (*P*, **w**, **c**, **l**,τ). **Goal:** Find a solution (**x**, **d**) to the problem

$$\begin{aligned} \text{min.} & \quad \mathsf{AT}(\mathcal{P}, \mathbf{x}, \mathbf{d})\\ \text{s.t.} & \quad \mathbf{x} = (x\_1, \ldots, x\_k) \in P^k, \qquad x\_h \prec x\_l \;\forall h < l, \\ & \quad \mathbf{d} = (d\_1, \ldots, d\_{k-1}) \in \mathbb{R}^{k-1}, \qquad W\_{\mathrm{Id}(x\_h)} \le d\_h \le W\_{\mathrm{Id}(x\_{h+1})} \;\forall h. \end{aligned}$$

(Minsum-*k*-Sink-Path-Confluent-Flow) **Input:** A dynamic flow path network *P* = (*P*, **w**, **c**, **l**,τ). **Goal:** Find a solution (**x**, **d**) to the problem

$$\begin{aligned} \text{minim.} & \quad \mathsf{AT}(\mathcal{P}, \mathbf{x}, \mathbf{d})\\ \text{s.t.} & \quad \mathbf{x} = (\mathbf{x}\_{1}, \dots, \mathbf{x}\_{k}) \in P^{k}, \\ & \quad \mathbf{d} = (d\_{1}, \dots, d\_{k-1}) \in \left\{ W\_{h} \mid h \in [1..n] \right\}^{k-1}, \qquad W\_{\text{Id}(\mathbf{x}\_{h})} \le d\_{h} \le W\_{\text{Id}(\mathbf{x}\_{h+1})} \; \forall h. \end{aligned}$$

For the Minsum-*k*-Sink-Path problem, [16] reported the following result, which is the best so far:

**Theorem 5.2** (*[16]*) *The* Minsum-*k*-Sink-Path *and* Minsum-*k*-Sink-Path-Confluent-Flow *problems can be solved in* $\min\{O(kn \log^3 n),\; n 2^{O(\sqrt{\log k \log\log n})} \log^3 n\}$ *time. Moreover, if the capacities of $\mathcal{P}$ are uniform, both problems can be solved in* $\min\{O(kn \log^2 n),\; n 2^{O(\sqrt{\log k \log\log n})} \log^2 n\}$ *time.*

For the confluent flow model, it was shown in [6, 15] that for the minsum *k*-sink problems, there exists an optimal *k*-sink such that all of the *k* sinks are at vertices. [16] extended this fact to the non-confluent flow model.

**Lemma 5.6** (*[6, 15, 16]*) *For the minsum k-sink problem in a dynamic flow path network, there exists an optimal k-sink such that all of the k sinks are at vertices under the confluent/non-confluent flow model.*

Lemma 5.6 implies that it is sufficient to consider only the case where every sink is at a vertex. Thus, we suppose $\mathbf{x} = (x\_1, \ldots, x\_k) \in V^k$, where $x\_i \prec x\_j$ for $i < j$.

The fundamental idea of [16] for solving the Minsum-*k*-Sink-Path problem is to reduce it to the *minimum k-link path problem*. In the minimum $k$-link path problem, we are given a weighted complete directed acyclic graph (DAG) $G = (V', E', w')$ with $V' = \{v'\_i \mid i \in [1..n]\}$ and $E' = \{(v'\_i, v'\_j) \mid i, j \in [1..n], i < j\}$. Each edge $(v'\_i, v'\_j)$ is associated with a weight $w'(i, j)$. A *k-link path* is a path that contains exactly $k$ edges. The task is to find a $k$-link path from $v'\_1$ to $v'\_n$ that minimizes the sum of the weights of its $k$ edges. The minimum $k$-link path problem is represented as follows:

(Minimum-*k*-Link-Path)

**Input:** A weighted complete DAG $G = (V', E', w')$. **Goal:** Find a $k$-link path $(v'\_{a\_0} = v'\_1, v'\_{a\_1}, v'\_{a\_2}, \ldots, v'\_{a\_{k-1}}, v'\_{a\_k} = v'\_n)$ from $v'\_1$ to $v'\_n$ such that

$$\begin{aligned} \text{min.} & \quad \sum\_{i=1}^{k} w'(a\_{i-1}, a\_i) \\ \text{s.t.} & \quad a\_i \in [1..n], \qquad a\_0 = 1,\; a\_k = n,\; a\_h < a\_l \;\forall h < l. \end{aligned}$$

Schieber [19] showed that the Minimum-*k*-Link-Path problem can be solved in almost linear time<sup>3</sup> regardless of $k$ if the weight function $w'$ satisfies the *concave Monge property*.

**Definition 5.1** *(Concave Monge property)* We say that a function $f : \mathbb{Z} \times \mathbb{Z} \to \mathbb{R}$ satisfies the concave Monge property if, for any integers $i, j$ with $i + 1 < j$, $f(i, j) + f(i+1, j+1) \le f(i+1, j) + f(i, j+1)$ holds.
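For small instances, the property can be checked exhaustively; the sketch below (our own helper, not part of the chapter) verifies it on a finite grid:

```python
def is_concave_monge(f, n):
    """Exhaustively check the concave Monge property of Definition 5.1 on
    the grid [1..n], keeping all four arguments within [1..n]."""
    return all(f(i, j) + f(i + 1, j + 1) <= f(i + 1, j) + f(i, j + 1)
               for i in range(1, n + 1) for j in range(i + 2, n))
```

For example, $f(i, j) = (j - i)^2$ satisfies the inequality, since $2(j-i)^2 \le (j-i-1)^2 + (j-i+1)^2 = 2(j-i)^2 + 2$, while $f(i, j) = -(j - i)^2$ violates it.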

**Lemma 5.7** (*[19]*) *Given a weighted complete DAG with $n$ vertices, if the weight function satisfies the concave Monge property, the* Minimum-*k*-Link-Path *problem can be solved in* $\min\{O(kn),\; n 2^{O(\sqrt{\log k \log\log n})}\}$ *time.*

Higashikawa et al. [16] presented a reduction from Minsum-*k*-Sink-Path to Minimum-$(k+1)$-Link-Path such that the weight function $w'$ satisfies the concave Monge property. Let a dynamic flow path network $\mathcal{P} = (P = (V, E), \mathbf{w}, \mathbf{c}, \mathbf{l}, \tau)$ with $n$ vertices be an instance of Minsum-*k*-Sink-Path. We prepare a weighted complete DAG $G = (V', E', w')$ with $n + 2$ vertices, where $V' = \{v'\_i \mid i \in [0..n+1]\}$ and $E' = \{(v'\_i, v'\_j) \mid i, j \in [0..n+1], i < j\}$. We set the weight function $w'$ as

$$w'(i,j) = \begin{cases} \mathsf{AT}\_{\mathsf{OPT}}(i,j) & i, j \in [1..n],\ i < j, \\ \mathsf{AT}(\mathcal{P}\_{i,n}, (v\_i), ()) & i \in [1..n] \text{ and } j = n+1, \\ \mathsf{AT}(\mathcal{P}\_{1,j}, (v\_j), ()) & i = 0 \text{ and } j \in [1..n], \\ \infty & i = 0 \text{ and } j = n+1, \end{cases} \tag{5.16}$$

where $\mathsf{AT}_{\mathsf{OPT}}(i, j)$ is the optimal aggregate evacuation time required to move all the supply between $v_i$ and $v_j$ to one of the two sinks $v_i$ or $v_j$. On the weighted complete DAG $G'$ constructed as above, let us consider a $(k+1)$-link path $(v_{a_0} =$

<sup>3</sup> Note that we assume that the weight function $w'$ is not given explicitly, but that any value $w'(i, j)$ can be obtained in constant time whenever required.

$v_0, v_{a_1}, \ldots, v_{a_k}, v_{a_{k+1}} = v_{n+1})$ from $v_0$ to $v_{n+1}$, where $a_1, \ldots, a_k$ are integers satisfying $0 < a_1 < a_2 < \cdots < a_k < n + 1$. The sum of the weights of this $(k+1)$-link path is

$$\sum\_{l=0}^{k} w'(a\_l, a\_{l+1}) = \mathsf{AT}(\mathcal{P}\_{1, a\_1}, (v\_{a\_1}), ()) + \sum\_{l=1}^{k-1} \mathsf{AT}\_{\mathsf{OPT}}(a\_l, a\_{l+1}) + \mathsf{AT}(\mathcal{P}\_{a\_k, n}, (v\_{a\_k}), ()).$$

This value is equal to $\min_{\mathbf{d}} \mathsf{AT}(\mathcal{P}, \mathbf{x}, \mathbf{d})$ for the $k$-sink $\mathbf{x} = (v_{a_1}, v_{a_2}, \ldots, v_{a_k})$, which implies that a minimum $(k+1)$-link path on $G'$ corresponds to an optimal $k$-sink location for the dynamic flow path network $\mathcal{P}$.

Let us consider the following subtasks:

(Minsum-Flow-for-Subpath)

**Input:** A dynamic flow path network $\mathcal{P} = (P, \mathbf{w}, \mathbf{c}, \mathbf{l}, \tau)$ and integers $i, j \in [1..n]$ with $i < j$.

**Goal:** Find a value *d* such that

$$\begin{aligned} \min. \qquad & \mathsf{AT}(\mathcal{P}\_{i,j}, \mathbf{x} = (\nu\_i, \nu\_j), \mathbf{d} = (d)), \\ \text{s.t.} \qquad & W\_i \le d \le W\_{j-1}. \end{aligned}$$

(Minsum-Flow-for-Subpath-Confluent-Flow)

**Input:** A dynamic flow path network $\mathcal{P} = (P, \mathbf{w}, \mathbf{c}, \mathbf{l}, \tau)$ and integers $i, j \in [1..n]$ with $i < j$.

**Goal:** Find a value *d* such that

$$\begin{aligned} \min. \qquad & \mathsf{AT}(\mathcal{P}\_{i,j}, \mathbf{x} = (\boldsymbol{\nu}\_{i}, \boldsymbol{\nu}\_{j}), \mathbf{d} = (d)), \\ \text{s.t.} \qquad & d \in \{W\_{h} \mid h \in [i..j-1] \}. \end{aligned}$$

Note that [16] developed a data structure that can be used to solve both of these problems efficiently. The data structure can be constructed in $O(n \log^2 n)$ time, after which each Minsum-Flow-for-Subpath or Minsum-Flow-for-Subpath-Confluent-Flow instance can be solved in $O(\log^3 n)$ time. See [16] for details.

**Lemma 5.8** (*[16]*) *For a given dynamic flow path network P with n vertices, there exists a segment tree T that satisfies the following conditions:*


Because $w'$ satisfies the concave Monge property (see Sect. 5.5.2), Lemmas 5.7 and 5.8 lead to Theorem 5.2.

In the rest of this section, we first observe properties of the aggregate evacuation time in Sect. 5.5.1, and then show that the weight function $w'$ obtained by the reduction satisfies the concave Monge property in Sect. 5.5.2.

#### *5.5.1 Property of Aggregate Evacuation Time*

Recalling that we consider only the case where every sink is at a vertex, we simply write $\theta^{i,+}(z)$ and $\theta^{i,-}(z)$ instead of $\theta^{v_i,+}(z)$ and $\theta^{v_i,-}(z)$, respectively.

We next give the general form of the aggregate evacuation time. Let $\phi^{i,+}(z)$ denote the aggregate evacuation time when the first $z - W_i$ of the supply on the right side of $v_i$ flows to sink $v_i$. Similarly, let $\phi^{i,-}(z)$ denote the aggregate evacuation time when the first $W_{i-1} - z$ of the supply on the left side of $v_i$ flows to sink $v_i$. We then have

$$\begin{aligned} \phi^{i,+}(z) &= \int\_{W\_i}^z \theta^{i,+}(t)dt = \int\_0^z \theta^{i,+}(t)dt \quad \text{and} \\\phi^{i,-}(z) &= \int\_z^{W\_{i-1}} \theta^{i,-}(t)dt = \int\_z^{W\_n} \theta^{i,-}(t)dt = -\int\_{W\_n}^z \theta^{i,-}(t)dt \end{aligned} \tag{5.17}$$

(see Fig. 5.2). For *i*, *j* ∈ [1..*n*] with *i* < *j*, we define

$$\phi^{i,j}(z) = \phi^{i,+}(z) + \phi^{j,-}(z) = \int\_0^z \theta^{i,+}(t)dt + \int\_z^{W\_n} \theta^{j,-}(t)dt\tag{5.18}$$

for $z \in [W_i, W_{j-1}]$.

**Fig. 5.2** Thick half-open segments represent the function $\theta^{i,+}(t)$, and the gray area represents $\phi^{i,+}(z)$ for some $z > W_i$

Suppose that we are given a $k$-sink $\mathbf{x} = (x_1, \ldots, x_k) \in V^k$ and a divider $\mathbf{d} = (d_1, \ldots, d_{k-1})$. Recalling the definition of $\mathrm{Id}(p)$ for $p \in P$, we have $x_i = v_{\mathrm{Id}(x_i)}$ for all $i \in [1..k]$. Because each sink is at a vertex, by simply modifying the integration intervals in Eq. (5.10), the aggregate evacuation time $\mathsf{AT}(\mathcal{P}, \mathbf{x}, \mathbf{d})$ is given by

$$\begin{split} \mathsf{AT}(\mathcal{P}, \mathbf{x}, \mathbf{d}) &= \sum\_{i \in [1..k]} \left( \int\_{d\_{i-1}}^{W\_{\mathrm{Id}(x\_i)-1}} \theta^{x\_i,-}(z) dz + \int\_{W\_{\mathrm{Id}(x\_i)}}^{d\_i} \theta^{x\_i,+}(z) dz \right) \\ &= \sum\_{i \in [1..k]} \left( \int\_{d\_{i-1}}^{W\_{\mathrm{Id}(x\_i)-1}} \theta^{\mathrm{Id}(x\_i),-}(z) dz + \int\_{W\_{\mathrm{Id}(x\_i)}}^{d\_i} \theta^{\mathrm{Id}(x\_i),+}(z) dz \right) . \end{split} \tag{5.19}$$

By Eqs. (5.17), (5.18) and (5.19), we have

$$\begin{split} \mathsf{AT}(\mathcal{P}, \mathbf{x}, \mathbf{d}) &= \sum\_{i \in [1..k]} \left( \int\_{d\_{i-1}}^{W\_{\mathrm{Id}(x\_i)-1}} \theta^{\mathrm{Id}(x\_i),-}(z) dz + \int\_{W\_{\mathrm{Id}(x\_i)}}^{d\_i} \theta^{\mathrm{Id}(x\_i),+}(z) dz \right) \\ &= \sum\_{i \in [1..k]} \left( \phi^{\mathrm{Id}(x\_i),-}(d\_{i-1}) + \phi^{\mathrm{Id}(x\_i),+}(d\_i) \right) \\ &= \phi^{\mathrm{Id}(x\_1),-}(0) + \sum\_{i \in [1..k-1]} \phi^{\mathrm{Id}(x\_i),\mathrm{Id}(x\_{i+1})}(d\_i) + \phi^{\mathrm{Id}(x\_k),+}(W\_n). \end{split}$$

In the rest of this subsection, we show important properties of $\phi^{i,j}(z)$. Let us first confirm that, by Eq. (5.17), both $\phi^{i,+}(z)$ and $\phi^{j,-}(z)$ are convex in $z$, since $\theta^{i,+}(z)$ and $-\theta^{j,-}(z)$ are non-decreasing in $z$; therefore $\phi^{i,j}(z)$ is convex in $z$. The following, more useful lemma characterizes the minimizer of $\phi^{i,j}(z)$.

**Lemma 5.9** (*[16]*) *For any i*, *j* ∈ [1..*n*] *with i* < *j, there uniquely exists*

$$z^\* \in \underset{z \in [W\_i, W\_{j-1}]}{\operatorname{arg\,min}}\ \max\{\theta^{i,+}(z), \theta^{j,-}(z)\}.$$

*Furthermore,* $\phi^{i,j}(z)$ *is minimized on* $[W_i, W_{j-1}]$ *at* $z = z^*$*.*

*Proof* By Eqs. (5.4) and (5.5), $\theta^{i,+}(z)$ is strictly increasing in $z \in [W_i, W_n]$. Similarly, by Eqs. (5.6) and (5.7), $\theta^{j,-}(z)$ is strictly decreasing in $z \in [0, W_{j-1}]$. Thus, such a $z^* \in [W_i, W_{j-1}]$ uniquely exists.

We then see that for any $z' \in [W_i, z^*]$,

$$\begin{aligned} \phi^{i,j}(z^\*) - \phi^{i,j}(z') &= \phi^{i,+}(z^\*) + \phi^{j,-}(z^\*) - (\phi^{i,+}(z') + \phi^{j,-}(z')) \\ &= \int\_{z'}^{z^\*} \theta^{i,+}(t)dt - \int\_{z'}^{z^\*} \theta^{j,-}(t)dt \\ &= \int\_{z'}^{z^\*} \left\{\theta^{i,+}(t) - \theta^{j,-}(t)\right\} dt \le 0, \end{aligned}$$

and for any $z'' \in [z^*, W_{j-1}]$,

$$\begin{aligned} \phi^{i,j}(z^\*) - \phi^{i,j}(z'') &= \phi^{i,+}(z^\*) + \phi^{j,-}(z^\*) - (\phi^{i,+}(z'') + \phi^{j,-}(z'')) \\ &= \int\_{z''}^{z^\*} \theta^{i,+}(t)dt - \int\_{z''}^{z^\*} \theta^{j,-}(t)dt \\ &= -\int\_{z^\*}^{z''} \left\{\theta^{i,+}(t) - \theta^{j,-}(t)\right\} dt \le 0, \end{aligned}$$

which together imply that $z^*$ minimizes $\phi^{i,j}(z)$ on $[W_i, W_{j-1}]$. $\square$

In the following sections, this $z^*$ is called the *pseudo-intersection point*<sup>4</sup> of $\theta^{i,+}(z)$ and $\theta^{j,-}(z)$.
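Numerically, the pseudo-intersection point can be located by binary search, since one function is non-decreasing and the other non-increasing. The sketch below (hypothetical helper `pseudo_intersection`, not the data structure of [16]) illustrates both a true intersection and the discontinuous case that motivates the term.

```python
# Binary search for the point z* of Lemma 5.9 that minimizes
# max{theta_plus(z), theta_minus(z)}, given a non-decreasing theta_plus
# and a non-increasing theta_minus on [lo, hi]. A numerical sketch only.
def pseudo_intersection(theta_plus, theta_minus, lo, hi, eps=1e-9):
    while hi - lo > eps:
        mid = (lo + hi) / 2
        if theta_plus(mid) < theta_minus(mid):
            lo = mid   # the max is on the decreasing side: move right
        else:
            hi = mid   # the max is on the increasing side: move left
    return (lo + hi) / 2

# Continuous case: theta_plus(z) = z and theta_minus(z) = 10 - z
# genuinely intersect at z* = 5.
z_star = pseudo_intersection(lambda z: z, lambda z: 10 - z, 0.0, 10.0)

# Step functions may never take equal values; the search still converges
# to the "pseudo-intersection" z* = 4 where the functions jump past each other.
z_pseudo = pseudo_intersection(lambda z: 1.0 if z < 4 else 6.0,
                               lambda z: 8.0 if z < 4 else 2.0,
                               0.0, 10.0)
```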

#### *5.5.2 Concave Monge Property*

We now show that the function $w'$ defined in Eq. (5.16) satisfies the concave Monge property under the non-confluent flow model. We omit the proof for the confluent flow model, since it can be constructed similarly to the one for the non-confluent flow model. See [16] for details.

Let us make some observations about $\mathsf{AT}_{\mathsf{OPT}}(i, j)$. Under the non-confluent flow model, for any $i, j \in [1..n]$ with $i < j$, we have $\mathsf{AT}_{\mathsf{OPT}}(i, j) = \min_{z \in [W_i, W_{j-1}]} \phi^{i,j}(z)$. Lemma 5.9 implies that $\phi^{i,j}(z)$ on $[W_i, W_{j-1}]$ is minimized when $z$ is the pseudo-intersection point of $\theta^{i,+}(z)$ and $\theta^{j,-}(z)$. For any $i, j \in [1..n]$ with $i < j$, let $\alpha^{i,j}$ denote this pseudo-intersection point.

Thus, we have

$$\mathsf{AT}\_{\mathsf{OPT}}(i,j) = \phi^{i,j}(\alpha^{i,j}) = \int\_0^{\alpha^{i,j}} \theta^{i,+}(z)dz + \int\_{\alpha^{i,j}}^{W\_n} \theta^{j,-}(z)dz. \tag{5.21}$$

We give the following two lemmas.

**Lemma 5.10** (*[16]*) *For any integer i* ∈ [1..*n* − 1] *and any z* ∈ [0, *Wn*]*,*

$$
\theta^{i,+}(z) \ge \theta^{i+1,+}(z) \text{ and } \theta^{i,-}(z) \le \theta^{i+1,-}(z)
$$

*hold.*

*Proof* We prove only $\theta^{i,+}(z) \ge \theta^{i+1,+}(z)$, because the other inequality can be proved in a similar way. By Eq. (5.5), for any $j \in [i+2..n]$, we have

<sup>4</sup> We use the term "pseudo-intersection" because the two functions $\theta^{i,+}(z)$ and $\theta^{j,-}(z)$ are, in general, not continuous, whereas an "intersection" is usually defined for continuous functions.

$$\theta^{i,+,j}(z) - \theta^{i+1,+,j}(z) = \begin{cases} 0 & \text{if } z \le W\_{j-1}, \\ \frac{z - W\_{j-1}}{C\_{i,j}} + \tau \cdot L\_{i,j} & \text{if } W\_{j-1} < z \le W\_j, \\\frac{(z - W\_{j-1})(C\_{i+1,j} - C\_{i,j})}{C\_{i,j}C\_{i+1,j}} + \tau \cdot \ell\_i & \text{if } z > W\_j. \end{cases}$$

Because $C_{i+1,j} - C_{i,j} = \min\{c_h \mid h \in [i+1..j-1]\} - \min\{c_h \mid h \in [i..j-1]\} \ge 0$, we have $\theta^{i,+,j}(z) - \theta^{i+1,+,j}(z) \ge 0$. Therefore, $\theta^{i,+}(z) \ge \theta^{i+1,+}(z)$ holds, since $\theta^{i,+}(z) = \max\{\theta^{i,+,j}(z) \mid j \in [i+1..n]\}$ by Eq. (5.4). $\square$

**Lemma 5.11** (*[16]*) *For any i*, *j* ∈ [1..*n*] *with i* < *j,*

$$
\alpha^{i,j} \le \alpha^{i+1,j} \le \alpha^{i+1,j+1} \text{ and } \alpha^{i,j} \le \alpha^{i,j+1} \le \alpha^{i+1,j+1}
$$

*hold.*

*Proof* We prove only $\alpha^{i,j} \le \alpha^{i+1,j}$, because the other inequalities can be proved in a similar way. For any $i, j \in [1..n]$ with $i < j$ and any positive constant $\epsilon$, we have

$$
\theta^{i+1,+}(\alpha^{i,j}-\epsilon) \le \theta^{i,+}(\alpha^{i,j}-\epsilon) < \theta^{j,-}(\alpha^{i,j}-\epsilon)
$$

because $\theta^{i,+}(z) \ge \theta^{i+1,+}(z)$ holds by Lemma 5.10 and $\theta^{j,-}(z)$ is a non-increasing function. This implies that $\alpha^{i,j} \le \alpha^{i+1,j}$ holds, which completes the proof. $\square$

Let us show that the function *w* defined in Eq. (5.16) satisfies the concave Monge property under the non-confluent flow model.

**Lemma 5.12** (*[16]*) *The weight function w defined in Eq. (5.16) satisfies the concave Monge property under the non-confluent flow model.*

*Proof* It suffices to show that, for any $i, j \in [0..n]$ with $i < j$,

$$w'(i,j) + w'(i+1,j+1) \le w'(i,j+1) + w'(i+1,j)\tag{5.22}$$

holds. Note that condition (5.22) holds for $i = 0$ and $j = n$, because the right-hand side of (5.22) contains $w'(0, n+1) = \infty$ while all other terms are finite. It remains to consider the following three cases: (1) $0 < i < j < n$; (2) $i = 0$ and $0 < j < n$; (3) $0 < i < n$ and $j = n$.

Case 1. Consider the case of $0 < i < j < n$. By Eq. (5.16), for any $(i', j') \in \{(i, j), (i, j+1), (i+1, j), (i+1, j+1)\}$, we have $w'(i', j') = \mathsf{AT}_{\mathsf{OPT}}(i', j')$. Recall that $\alpha^{i,j}$ is the pseudo-intersection point of $\theta^{i,+}(z)$ and $\theta^{j,-}(z)$, so we have

$$w'(i,j) = \mathsf{AT}\_{\mathsf{OPT}}(i,j) = \phi^{i,j}(\alpha^{i,j}) = \int\_0^{\alpha^{i,j}} \theta^{i,+}(z)dz + \int\_{\alpha^{i,j}}^{W\_n} \theta^{j,-}(z)dz. \tag{5.23}$$

For any $i, j \in [1..n-1]$ with $i < j$, Eq. (5.23) and Lemma 5.11 give


$$\begin{split} &w'(i,j+1) + w'(i+1,j) - w'(i,j) - w'(i+1,j+1) \\ &= \phi^{i,j+1}(\alpha^{i,j+1}) + \phi^{i+1,j}(\alpha^{i+1,j}) - \phi^{i,j}(\alpha^{i,j}) - \phi^{i+1,j+1}(\alpha^{i+1,j+1}) \\ &= \int\_{\alpha^{i,j}}^{\alpha^{i,j+1}} \theta^{i,+}(z)dz + \int\_{\alpha^{i,j+1}}^{\alpha^{i+1,j+1}} \theta^{j+1,-}(z)dz \\ &\qquad - \int\_{\alpha^{i,j}}^{\alpha^{i+1,j}} \theta^{j,-}(z)dz - \int\_{\alpha^{i+1,j}}^{\alpha^{i+1,j+1}} \theta^{i+1,+}(z)dz. \tag{5.24} \end{split}$$

Now, we show that for any $z \in [\alpha^{i,j}, \alpha^{i+1,j+1})$,

$$\min\{\theta^{i,+}(z), \theta^{j+1,-}(z)\} \ge \max\{\theta^{j,-}(z), \theta^{i+1,+}(z)\}$$

holds. First, for any $z \in [0, W_n]$, $\theta^{i,+}(z) \ge \theta^{i+1,+}(z)$ and $\theta^{j,-}(z) \le \theta^{j+1,-}(z)$ hold by Lemma 5.10. For any $z \ge \alpha^{i,j}$, $\theta^{i,+}(z) \ge \theta^{j,-}(z)$ holds, because $\alpha^{i,j}$ is the pseudo-intersection point of $\theta^{i,+}(z)$ and $\theta^{j,-}(z)$. Similarly, for any $z < \alpha^{i+1,j+1}$, we have $\theta^{i+1,+}(z) \le \theta^{j+1,-}(z)$. Therefore, for any $z \in [\alpha^{i,j}, \alpha^{i+1,j+1})$, $\min\{\theta^{i,+}(z), \theta^{j+1,-}(z)\} \ge \max\{\theta^{j,-}(z), \theta^{i+1,+}(z)\}$ holds.

Thus, Eq. (5.24) continues as

$$\begin{aligned} &\quad w'(i,j+1) + w'(i+1,j) - w'(i,j) - w'(i+1,j+1) \\ &\ge \int\_{\alpha^{i,j}}^{\alpha^{i+1,j+1}} \min\{\theta^{i,+}(z), \theta^{j+1,-}(z)\}\,dz - \int\_{\alpha^{i,j}}^{\alpha^{i+1,j+1}} \max\{\theta^{j,-}(z), \theta^{i+1,+}(z)\}\,dz \\ &\ge 0, \end{aligned}$$

and thus condition (5.22) holds for any $i, j$ with $0 < i < j < n$.

Case 2. Consider the case of $i = 0$ and $j \in [1..n-1]$. Recall that $w'(0, j) = \phi^{j,-}(0)$ and $w'(0, j+1) = \phi^{j+1,-}(0)$ by Eq. (5.16). In this case, we have

$$\begin{split} &\quad w'(0,j+1) + w'(1,j) - w'(0,j) - w'(1,j+1) \\ &= \phi^{j+1,-}(0) + \phi^{1,j}(\alpha^{1,j}) - \phi^{j,-}(0) - \phi^{1,j+1}(\alpha^{1,j+1}) \\ &= \int\_{0}^{W\_{n}} \theta^{j+1,-}(z) dz + \int\_{0}^{\alpha^{1,j}} \theta^{1,+}(z) dz + \int\_{\alpha^{1,j}}^{W\_{n}} \theta^{j,-}(z) dz \\ &\quad - \int\_{0}^{W\_{n}} \theta^{j,-}(z) dz - \int\_{0}^{\alpha^{1,j+1}} \theta^{1,+}(z) dz - \int\_{\alpha^{1,j+1}}^{W\_{n}} \theta^{j+1,-}(z) dz \\ &= \int\_{0}^{\alpha^{1,j+1}} \theta^{j+1,-}(z) dz - \int\_{0}^{\alpha^{1,j}} \theta^{j,-}(z) dz - \int\_{\alpha^{1,j}}^{\alpha^{1,j+1}} \theta^{1,+}(z) dz, \end{split}$$

where the last equality uses $\alpha^{1,j} \le \alpha^{1,j+1}$, which holds by Lemma 5.11. By Lemma 5.10, we have $\theta^{j+1,-}(z) \ge \theta^{j,-}(z)$ for any $z \in [0, W_n]$. By the same argument as in the previous case, for any $z < \alpha^{1,j+1}$, we have $\theta^{1,+}(z) < \theta^{j+1,-}(z)$. Thus, we have

$$\begin{aligned} &\quad w'(0,j+1) + w'(1,j) - w'(0,j) - w'(1,j+1) \\ &= \int\_0^{\alpha^{1,j+1}} \theta^{j+1,-}(z)dz - \int\_0^{\alpha^{1,j}} \theta^{j,-}(z)dz - \int\_{\alpha^{1,j}}^{\alpha^{1,j+1}} \theta^{1,+}(z)dz \\ &= \int\_0^{\alpha^{1,j}} \left\{\theta^{j+1,-}(z) - \theta^{j,-}(z)\right\} dz + \int\_{\alpha^{1,j}}^{\alpha^{1,j+1}} \left\{\theta^{j+1,-}(z) - \theta^{1,+}(z)\right\} dz \ge 0. \end{aligned}$$

Case 3. Consider the case of $j = n$ and $i \in [1..n-1]$. Recall that $w'(i, n+1) = \phi^{i,+}(W_n)$ and $w'(i+1, n+1) = \phi^{i+1,+}(W_n)$ by Eq. (5.16). As in the second case, we use the facts that $\alpha^{i,n} \le \alpha^{i+1,n}$ by Lemma 5.11, that $\theta^{i,+}(z) \ge \theta^{i+1,+}(z)$ for any $z \in [0, W_n]$ by Lemma 5.10, and that $\theta^{i,+}(z) \ge \theta^{n,-}(z)$ for any $z \ge \alpha^{i,n}$. Then, we have

$$\begin{split} &w'(i,n+1) + w'(i+1,n) - w'(i,n) - w'(i+1,n+1) \\ &= \phi^{i,+}(W\_n) + \phi^{i+1,n}(\alpha^{i+1,n}) - \phi^{i,n}(\alpha^{i,n}) - \phi^{i+1,+}(W\_n) \\ &= \int\_0^{W\_n} \theta^{i,+}(z) dz + \int\_0^{\alpha^{i+1,n}} \theta^{i+1,+}(z) dz + \int\_{\alpha^{i+1,n}}^{W\_n} \theta^{n,-}(z) dz \\ &\quad - \int\_0^{\alpha^{i,n}} \theta^{i,+}(z) dz - \int\_{\alpha^{i,n}}^{W\_n} \theta^{n,-}(z) dz - \int\_0^{W\_n} \theta^{i+1,+}(z) dz \\ &= \int\_{\alpha^{i,n}}^{W\_n} \theta^{i,+}(z) dz - \int\_{\alpha^{i,n}}^{\alpha^{i+1,n}} \theta^{n,-}(z) dz - \int\_{\alpha^{i+1,n}}^{W\_n} \theta^{i+1,+}(z) dz \\ &= \int\_{\alpha^{i,n}}^{\alpha^{i+1,n}} \left\{\theta^{i,+}(z) - \theta^{n,-}(z)\right\} dz + \int\_{\alpha^{i+1,n}}^{W\_n} \left\{\theta^{i,+}(z) - \theta^{i+1,+}(z)\right\} dz \ge 0. \end{split}$$

Thus, for any $i, j \in [0..n]$ with $i < j$, condition (5.22) holds, which implies that the function $w'$ satisfies the concave Monge property. $\square$

**Acknowledgements** We thank Robert Benkoczi, Binay Bhattacharya, Mordecai J. Golin, and Tsunehiko Kameda for many helpful discussions on this topic. This work was supported by JST CREST Grant Number JPMJCR1402, JSPS KAKENHI Grant Number 19H04068, and JSPS KAK-ENHI Grant Number 20K19746.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Part III Sublinear Data Structures**

# **Chapter 6 Information Processing on Compressed Data**

**Yoshimasa Takabatake, Tomohiro I, and Hiroshi Sakamoto**

**Abstract** We survey our recent work related to information processing on compressed strings. Note that a "string" here refers to any finite-length sequence of symbols and therefore covers not only ordinary text but also a wide range of data, such as pixel sequences and time-series data. Over the past two decades, a variety of algorithms and applications have been proposed for compressed information processing. In this survey, we mainly focus on two problems: recompression and privacy-preserving computation over compressed strings. Recompression is a framework in which algorithms transform given compressed data into another compressed format without decompression. Recent studies have shown that a higher compression ratio can be achieved at lower cost by using an appropriate recompression algorithm as a preprocessing step. Furthermore, various privacy-preserving computation models have been proposed for information retrieval, similarity computation, and pattern mining.

#### **6.1 Restructuring Compressed Data**

Data compression plays a central role in the efficient transmission and storage of data. Recent developments have also shown that data compression is a useful tool for processing highly repetitive data, that is, data containing long common substrings. Typical examples of highly repetitive data include collections of genomes taken from similar species and versioned documents. Popular compressors for highly repetitive data include Lempel-Ziv 77 (LZ77) [40], the run-length encoded Burrows-Wheeler transform (RLBWT) [8], and grammar-based compression [34]. For each of these compression methods, researchers have developed techniques for operating on compressed data. For example, there are indexes based on LZ77 [37], RLBWT [17], and grammar-based compression [11]. Although recent studies [33, 36, 45] have investigated the fundamentals of these techniques and obtained a unified view of the compressibility of highly repetitive data, each compressed format still has pros and cons that cannot

Y. Takabatake · T. I · H. Sakamoto (B)

Kyushu Institute of Technology, Kitakyushu, Japan e-mail: hiroshi@ai.kyutech.ac.jp

N. Katoh et al. (eds.), *Sublinear Computation Paradigm*, https://doi.org/10.1007/978-981-16-4095-7\_6

<sup>©</sup> The Author(s) 2022

be ignored in practice. LZ77 usually achieves better compression than other compression methods, the index based on RLBWT (called *r*-index) supports very fast pattern search, and grammar-based compression is easy to handle in both theory and practice. Thus, in order to take advantage of the virtues of the different compressed formats, it is useful to have algorithms that can efficiently convert one compressed format to another. In this section, we present some examples of these algorithms.

#### *6.1.1 Preliminaries*

Let Σ be an ordered alphabet, that is, a set of characters with a total order. A string over Σ is a sequence of characters chosen from Σ. The length of a string w is denoted by |w|. For any 1 ≤ *i* ≤ |w|, the *i*th character of w is denoted by w[*i*]. The substring of w starting at *i* and ending at *j* is denoted by w[*i*... *j*]. The substring w[*i*... *j*] is called a *prefix* (resp., *suffix*) of w if *i* = 1 (resp., *j* = |w|). The reversed string of w is denoted by w*<sup>R</sup>*, namely, w*<sup>R</sup>* = w[|w|]w[|w| − 1]··· w[2]w[1].

Let *T* be a string of length *n* over Σ. We consider the following three compression schemes for *T*.

**LZ77**: LZ77 is characterized by the greedy factorization *T* = *f*<sub>1</sub> *f*<sub>2</sub> ··· *f*<sub>*z*</sub> of *T*. The *i*th factor *f*<sub>*i*</sub> is a single character if that character does not appear in *f*<sub>1</sub> *f*<sub>2</sub> ··· *f*<sub>*i*−1</sub>; otherwise, it is the longest substring that has another occurrence starting at a position *s*<sub>*i*</sub> ≤ |*f*<sub>1</sub> *f*<sub>2</sub> ··· *f*<sub>*i*−1</sub>|. The position *s*<sub>*i*</sub> is called the reference position of the *i*th LZ77 factor *f*<sub>*i*</sub>. We can store *T* in *O*(*z*) space because each factor *f*<sub>*i*</sub> (in the second case) can be replaced with the pair (*s*<sub>*i*</sub>, |*f*<sub>*i*</sub>|).
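As an illustration, the greedy factorization above can be computed directly by scanning all earlier source positions; the quadratic-time sketch below (hypothetical helper `lz77_factorize`) is faithful to the definition, including self-referential factors whose source occurrence may overlap the factor itself.

```python
# Greedy LZ77 factorization (quadratic time, for illustration).
# Each factor is either a fresh character, stored as (char, 1), or a pair
# (source_start, length) with 0-based source_start < p, where p is the
# length of the already parsed prefix; sources may overlap the factor.
def lz77_factorize(T):
    factors = []
    p, n = 0, len(T)
    while p < n:
        best_len, best_src = 0, None
        for s in range(p):                         # candidate source positions
            l = 0
            while p + l < n and T[s + l] == T[p + l]:
                l += 1                             # works even if source overlaps factor
            if l > best_len:
                best_len, best_src = l, s
        if best_len == 0:
            factors.append((T[p], 1))              # character not seen before
            p += 1
        else:
            factors.append((best_src, best_len))
            p += best_len
    return factors

# "abababb" parses into a, b, a copy of length 4 from position 0
# (overlapping itself), and a final copy of "b".
z = lz77_factorize("abababb")
```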

**BWT, RLBWT**: For simplicity, we assume that *T* is extended by the end marker \$, a special character not in Σ and lexicographically smaller than any character in Σ; that is, *T*[*n* + 1] = \$. The Burrows-Wheeler transform [8] is a permutation *L* of the characters of *T*[1 ... *n* + 1] obtained as follows: *L*[*i*] is the character preceding the lexicographically *i*th smallest suffix among all non-empty suffixes of *T*, with the exception that *L*[*i*] = \$ when the *i*th smallest suffix is *T* itself (and therefore has no preceding character). The resulting string *L* can be interpreted as a sequence obtained by sorting the characters of *T* according to their context (succeeding suffixes). Since characters sharing similar contexts tend to be identical, *L* is well compressible by run-length encoding. The run-length encoded BWT is called the RLBWT; we denote by *r* the number of runs in *L*.
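The construction is easy to reproduce naively on short strings by sorting suffixes; the sketch below (hypothetical helpers `bwt` and `run_length_encode`) follows the definition above, with `$` handling the suffix that is *T* itself.

```python
# Naive BWT of T$ via suffix sorting (fine for short strings), plus
# the run-length encoding that yields the RLBWT.
def bwt(T):
    s = T + "$"                                 # '$' sorts before all letters
    n = len(s)
    sa = sorted(range(n), key=lambda i: s[i:])  # suffix array of T$
    # L[i] precedes the i-th smallest suffix; s[-1] == '$' covers suffix 0.
    return "".join(s[i - 1] for i in sa)

def run_length_encode(L):
    runs = []
    for c in L:
        if runs and runs[-1][0] == c:
            runs[-1][1] += 1
        else:
            runs.append([c, 1])
    return [(c, l) for c, l in runs]

L = bwt("banana")            # the classic example: "annb$aa"
runs = run_length_encode(L)  # r = 5 runs
```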

Let SA[1 ... *n* + 1] denote the suffix array of *T* [1 ... *n* + 1], where SA[*i*] is the starting position of the lexicographically *i*th smallest suffix. We consider SA as a mapping from BWT position to text position and say that the BWT position *i* corresponds to the text position SA[*i*]. One crucial operation on the BWT string *L* is the so-called LF mapping that maps a BWT position *i* to the BWT position corresponding to text position SA[*i*] − 1. LF mapping can be implemented by a rank data structure on *L* that returns the number of occurrences of a character *c* in *L*[1 ... *i*] for any character *c* and BWT position *i*.

By using the LF mapping, we can also support backward search. For any string w that appears in *T*, there is a unique maximal interval [*b* ... *e*], called the w-interval, such that the lexicographically *i*th suffix is prefixed by w iff *i* ∈ [*b* ... *e*]. Note that *e* − *b* + 1 is the number of occurrences of w in *T*, and the text positions corresponding to these BWT positions represent those occurrences. A single step of the backward search computes the *c*w-interval from the w-interval, for a character *c*, by using the same mechanism as the LF mapping. The index based on backward search on the BWT is known as the FM-index [14]. Although it was previously known that the occurrences of a pattern can be counted by a backward search implemented in RLBWT space [41], it was recently shown that the RLBWT can be augmented with an *O*(*r*)-space data structure that reports all occurrences of the pattern efficiently. The index based on the RLBWT is called the *r*-index [17].
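The counting version of backward search can be sketched with an explicit suffix array and a naive rank; the helper below (`backward_search_count`, hypothetical) performs the w-interval → *c*w-interval step by LF-mapping the interval endpoints, without the *O*(*r*)-space machinery of the *r*-index.

```python
# Naive FM-index-style counting via backward search (for illustration;
# the r-index achieves this in run-length compressed space).
from bisect import bisect_left

def backward_search_count(T, pattern):
    s = T + "$"
    n = len(s)
    sa = sorted(range(n), key=lambda i: s[i:])
    L = [s[i - 1] for i in sa]                          # BWT string
    first_col = sorted(L)
    C = {c: bisect_left(first_col, c) for c in set(L)}  # chars smaller than c

    def occ(c, i):                                      # rank: count of c in L[0:i]
        return sum(1 for x in L[:i] if x == c)

    b, e = 0, n                                         # interval of the empty string
    for c in reversed(pattern):
        if c not in C:
            return 0
        b = C[c] + occ(c, b)                            # LF-map both interval ends
        e = C[c] + occ(c, e)
        if b >= e:
            return 0
    return e - b                                        # size of the pattern's interval

hits = backward_search_count("banana", "ana")           # "ana" occurs twice
```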

**Grammar compression**: Grammar compression is a general framework of data compression in which a context-free grammar (CFG) *S* = (Σ, *V*, *D*) that derives a single string *T* is considered to be a compressed representation of *T*, where Σ is the set of characters (terminals), *V* is the set of variables (non-terminals), and *D* is the set of deterministic production rules whose right-hand sides are strings over (*V* ∪ Σ), with the last variable deriving *T*.<sup>1</sup> The compressed size of *S* is the sum of the lengths of the right-hand sides of the production rules in *S*. We also consider run-length encoding the right-hand sides of CFGs, and call such CFGs *run-length encoded CFGs (RLCFGs)*. The compressed size of an RLCFG is the sum of the run-length encoded sizes of the right-hand sides of its production rules.
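A minimal rendering of this framework (hypothetical helpers `expand` and `grammar_size`): rules map variables to sequences over characters and variables, the last variable derives *T*, and the compressed size is the total length of the right-hand sides.

```python
# Toy grammar-compressed string: rules maps each variable to its
# right-hand side (a list of characters and variables); the last
# variable derives T.
def expand(rules, symbol):
    if symbol not in rules:            # characters are terminals
        return symbol
    return "".join(expand(rules, s) for s in rules[symbol])

def grammar_size(rules):
    return sum(len(rhs) for rhs in rules.values())

# X1 -> ab, X2 -> X1 X1, X3 -> X2 X2 a derives "ababababa":
# 9 characters represented with compressed size 2 + 2 + 3 = 7.
rules = {"X1": ["a", "b"], "X2": ["X1", "X1"], "X3": ["X2", "X2", "a"]}
T = expand(rules, "X3")
```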

**Algorithm 1:** Supposing that we have parsed the suffix *T*[*p* + 1 ...], compute the length of the next LZ77 factor ending at *p*.

```
1: p' ← p
2: w ← ε
3: c ← T[p']
4: while the cw-interval contains a text position larger than p do
5:     p' ← p' − 1
6:     w ← cw
7:     c ← T[p']
8: return max(1, p − p')
```

(Line 8 returns max(1, *p* − *p*′), since a factor consists of at least one character.)
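A runnable paraphrase of the algorithm (hypothetical helper `next_factor_length`): the backward-search test "the *c*w-interval contains a text position larger than *p*" is replaced by a naive scan for an occurrence of *c*w starting after position *p*, and a factor has length at least one.

```python
# Right-to-left LZ77 parsing step, with the BWT-interval test replaced
# by a naive occurrence scan (illustration only).
def next_factor_length(T, p):
    """The suffix T[p+1..] (1-based) is already parsed; return the length
    of the next factor, which ends at position p."""
    pp, w = p, ""
    while pp >= 1:
        cw = T[pp - 1] + w                 # prepend c = T[p'] to w
        # does cw occur starting at some 1-based position q > p ?
        if not any(T.startswith(cw, q) for q in range(p, len(T))):
            break
        pp -= 1
        w = cw
    return max(1, p - pp)                  # a factor has length at least 1

# In T = "abab" with suffix T[3..4] = "ab" already parsed, the next
# factor is "ab" (length 2), which reoccurs at position 3.
length = next_factor_length("abab", 2)
```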

#### *6.1.2 RLBWT to LZ77*

Algorithms to compute LZ77 from the RLBWT are considered in [3, 32, 46, 47, 49]. An essential task when computing LZ77 is to search for the longest prefix of *T*[|*f*<sub>1</sub> *f*<sub>2</sub> ··· *f*<sub>*i*−1</sub>| + 1 ···] that occurs before, and to compute an occurrence *s*<sub>*i*</sub> ≤ |*f*<sub>1</sub> *f*<sub>2</sub> ··· *f*<sub>*i*−1</sub>| of *f*<sub>*i*</sub>. The basic idea is to use the backward search on the RLBWT of *T<sup>R</sup>* to perform this task. One difficulty is ignoring the BWT positions that correspond

<sup>1</sup> We treat the last variable as the starting variable.

to the suffixes starting after |*f*<sub>1</sub> *f*<sub>2</sub> ··· *f*<sub>*i*−1</sub>| during the backward search. In [49], it is shown that keeping track of at most 2*r* BWT positions is sufficient to compute the longest prefix and a reference position for each LZ77 factor. This subsection gives a brief review of this idea.

For the sake of this explanation, consider LZ77 parsing from right to left (i.e., we conceptually compute the LZ77 factorization of *T<sup>R</sup>*) so that backward search on *T* (instead of the reversed string) can be used. Supposing that we have parsed the suffix *T*[*p* + 1 ...], Algorithm 1 shows how to compute the length of the next factor ending at *p*. To check whether the *c*w-interval contains a text position larger than *p*, we partition SA into *r* subintervals, each of which is the LF-mapped interval of a run of *L*, and maintain at most two positions per subinterval. Suppose that the *c*w-interval [*b* ... *e*] is non-empty and is covered by consecutive subintervals [*b*<sub>1</sub> ... *e*<sub>1</sub>], [*b*<sub>2</sub> ... *e*<sub>2</sub>], ..., [*b*<sub>*k*</sub> ... *e*<sub>*k*</sub>] with minimal integer *k*, that is, *b*<sub>1</sub> ≤ *b* < *e*<sub>1</sub> + 1 = *b*<sub>2</sub> < *e*<sub>2</sub> + 1 = *b*<sub>3</sub> < ··· < *e*<sub>*k*−1</sub> + 1 = *b*<sub>*k*</sub> ≤ *e* ≤ *e*<sub>*k*</sub>. If *k* = 1, the characters of *L* in the w-interval consist of the single character *c* and all positions in the w-interval are LF-mapped to the *c*w-interval; therefore, the *c*w-interval contains a text position larger than *p* iff the w-interval does. For the case *k* > 1, we mark, within each subinterval, the positions closest to the subinterval boundaries that correspond to text positions larger than *p*. Using this information, we can check whether SA[*b*<sub>1</sub> ... *e*<sub>1</sub>] and/or SA[*b*<sub>*k*</sub> ... *e*<sub>*k*</sub>] contain a text position larger than *p*. We also maintain a data structure to check whether some subinterval among [*b*<sub>2</sub> ... *e*<sub>2</sub>], ..., [*b*<sub>*k*−1</sub> ... *e*<sub>*k*−1</sub>] contains a text position larger than *p*, and if so, to compute which subinterval contains that position.

In this way, we can compute the lengths of the LZ77 factors. The reference position for each LZ77 factor can also be computed by maintaining the text positions corresponding to the marked positions in each subinterval. The data structures use only *O*(*r*) words of space.

In [46], the data structures are tuned to improve the time complexity. In [47], a fast implementation for the backward search in RLBWT space was proposed and applied to the above-mentioned algorithm. In [3], an online construction of *r*-index was proposed and the technique was extended to an online LZ77 factorization algorithm in RLBWT space. In [32], a different approach to converting RLBWT to LZ77 was proposed.

#### *6.1.3 Recompression on Grammar Compression*

Given that there are a number of CFGs with different properties for representing strings, we may want to transform one CFG to another without explicitly decompressing the text. In this subsection, we introduce a technique called recompression which has proven to be a powerful tool in problems related to grammar compression [26–28, 31] and word equations [29, 30].

In [27], Jeż proposed an algorithm TtoG for computing an RLCFG of *T* in *O*(*N*) time, where *N* denotes the length of the (uncompressed) string *T*. We use the term *letters* for the characters and the variables introduced by TtoG, and a run is called a *block* in this subsection. TtoG consists of two different types of compression: block compression (BComp) and pair compression (PComp).


TtoG compresses *T*<sub>0</sub> = *T* by applying BComp and PComp in turns until the string shrinks to a single letter. Because PComp shrinks a given string to at most 3/4 of its length, the height of TtoG(*T*) is *O*(lg *N*).

TtoG performs a level-by-level transformation of *T*<sub>0</sub> into strings *T*<sub>1</sub>, *T*<sub>2</sub>, ..., *T*<sub>*ĥ*</sub>, where |*T*<sub>*ĥ*</sub>| = 1. If *h* is even, the transformation from *T*<sub>*h*</sub> to *T*<sub>*h*+1</sub> is performed by BComp, and production rules of the form $c \to \ddot{c}^d$ are introduced. If *h* is odd, the transformation from *T*<sub>*h*</sub> to *T*<sub>*h*+1</sub> is performed by PComp, and production rules of the form $c \to \acute{c}\grave{c}$ are introduced. Let Σ<sub>*h*</sub> be the set of letters appearing in *T*<sub>*h*</sub>.
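On an uncompressed string the two phases are easy to simulate. The toy below (hypothetical helpers; PComp is simplified to greedy left-to-right pairing rather than the alphabet partition that guarantees the 3/4 shrink) alternates BComp and PComp until a single letter remains.

```python
# Toy TtoG-style compression on an explicit string (not on a grammar):
# BComp replaces maximal blocks c^d (d >= 2) by fresh letters; PComp here
# greedily pairs adjacent distinct letters (a simplification of Jez's
# partition-based PComp).
def bcomp(s, rules, fresh):
    out, i = [], 0
    while i < len(s):
        j = i
        while j < len(s) and s[j] == s[i]:
            j += 1
        if j - i >= 2:
            x = next(fresh)
            rules[x] = (s[i], j - i)       # rule x -> c^d
            out.append(x)
        else:
            out.append(s[i])
        i = j
    return out

def pcomp(s, rules, fresh):
    out, i = [], 0
    while i < len(s):
        if i + 1 < len(s) and s[i] != s[i + 1]:
            x = next(fresh)
            rules[x] = (s[i], s[i + 1])    # rule x -> c'c''
            out.append(x)
            i += 2
        else:
            out.append(s[i])
            i += 1
    return out

def ttog_sketch(T):
    rules, fresh = {}, iter("XYZUVWPQRS")
    s, level = list(T), 0
    while len(s) > 1:
        s = bcomp(s, rules, fresh) if level % 2 == 0 else pcomp(s, rules, fresh)
        level += 1
    return s[0], rules

# "aabbab": BComp gives X Y a b (X -> a^2, Y -> b^2), PComp gives Z U
# (Z -> XY, U -> ab), and a final PComp gives V (V -> ZU).
root, rules = ttog_sketch("aabbab")
```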

The advantage of TtoG is that it can be simulated on *S* = *S*<sub>0</sub> = (Σ<sub>0</sub>, *V*, *D*<sub>0</sub>) without decompression. We consider the level-by-level transformation of *S*<sub>0</sub> into CFGs *S*<sub>1</sub> = (Σ<sub>1</sub>, *V*, *D*<sub>1</sub>), *S*<sub>2</sub> = (Σ<sub>2</sub>, *V*, *D*<sub>2</sub>), ..., *S*<sub>*ĥ*</sub> = (Σ<sub>*ĥ*</sub>, *V*, *D*<sub>*ĥ*</sub>), where each *S*<sub>*h*</sub> generates *T*<sub>*h*</sub>. More specifically, the compression from *T*<sub>*h*</sub> to *T*<sub>*h*+1</sub> is simulated on *S*<sub>*h*</sub>. We can correctly compute the letters introduced at each level *h* + 1 while modifying *S*<sub>*h*</sub> into *S*<sub>*h*+1</sub>; hence, we obtain all the letters of TtoG(*T*) in the end. Note that new "variables" are never introduced; modifications are made by rewriting the right-hand sides of the original variables.

We now show how PComp is performed on *S*<sub>*h*</sub> for odd *h*; that is, we compute *S*<sub>*h*+1</sub> from *S*<sub>*h*</sub>. Note that any occurrence *i* of a pair *ć**c̀* in *T*<sub>*h*</sub> can be uniquely associated with the variable *X* that labels the lowest node covering the interval [*i* .. *i* + 1] in the derivation tree of *S*<sub>*h*</sub> (recall that *S*<sub>*h*</sub> generates *T*<sub>*h*</sub>). We can compute the frequency table of pairs by counting the pairs associated with *X* in *D*<sub>*h*</sub>(*X*) and multiplying by the number of occurrences of *X* in the derivation tree of *S*<sub>*h*</sub>. The frequency table is used to compute a partition of Σ<sub>*h*</sub>, which determines the pairs to be replaced. A pair either appears *explicitly* in a right-hand side or *crosses* the boundary of a variable. We can modify *S*<sub>*h*</sub> so that all the crossing occurrences to be replaced appear explicitly in some right-hand side, and then replace the explicit occurrences to obtain *S*<sub>*h*+1</sub>. In a similar way, BComp can be performed on *S*<sub>*h*</sub> for even *h*.
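The string-level procedure that these grammar-level simulations mirror can be sketched in a few lines. The code below is our own illustration (names such as `ttog_sketch` are hypothetical): it alternates run-length rounds (BComp) and pair rounds (PComp) on a plain, uncompressed string. The partition used by `pcomp` is a naive halving of the current alphabet rather than Jeż's frequency-based choice, so the 3/4-shrinking guarantee does not hold here, but the rule shapes *c* → *c̈*<sup>*d*</sup> and *c* → *ć**c̀* match the text:

```python
import itertools

def bcomp(s, rules, fresh):
    """Block compression: replace each maximal run c^d (d >= 2)
    with a fresh letter and record the rule fresh -> (c, d)."""
    out, i = [], 0
    while i < len(s):
        j = i
        while j < len(s) and s[j] == s[i]:
            j += 1
        if j - i >= 2:
            x = fresh()
            rules[x] = (s[i], j - i)      # run-length rule
            out.append(x)
        else:
            out.append(s[i])
        i = j
    return out

def pcomp(s, rules, fresh, flip=False):
    """Pair compression: partition the current alphabet into a 'left'
    and a 'right' part and replace every left-right pair ab with a
    fresh letter.  Such occurrences can never overlap."""
    letters = sorted(set(s))
    half = set(letters[: max(1, len(letters) // 2)])
    left = (set(letters) - half) if flip else half
    names, out, i = {}, [], 0
    while i < len(s):
        if i + 1 < len(s) and s[i] in left and s[i + 1] not in left:
            p = (s[i], s[i + 1])
            if p not in names:
                names[p] = fresh()
                rules[names[p]] = p       # pair rule
            out.append(names[p])
            i += 2
        else:
            out.append(s[i])
            i += 1
    return out

def ttog_sketch(text):
    """Alternate BComp and PComp until a single letter remains."""
    counter = itertools.count(1)
    fresh = lambda: "X%d" % next(counter)
    rules, s = {}, list(text)
    while len(s) > 1:
        s = bcomp(s, rules, fresh)
        if len(s) > 1:
            t = pcomp(s, rules, fresh)
            if len(t) == len(s):          # retry with the complementary partition
                t = pcomp(s, rules, fresh, flip=True)
            s = t
    return s[0], rules

def expand(sym, rules):
    """Decompress a letter back into the string it derives."""
    if sym not in rules:
        return sym
    a, b = rules[sym]
    return expand(a, rules) * b if isinstance(b, int) else expand(a, rules) + expand(b, rules)

start, rules = ttog_sketch("abababbbbab")
assert expand(start, rules) == "abababbbbab"
```

Running `ttog_sketch` returns a start letter and a rule set from which `expand` recovers the original string exactly; the rules form an RLCFG whose height is the number of rounds performed.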

In [23], it is shown that TtoG(*T*) can be used to answer longest common extension (LCE) queries, and that the transformation from an arbitrary CFG *S* to TtoG(*T*) is the key to efficient construction algorithms for LCE data structures in grammar-compressed space. In [53], the recompression technique is modified to transform an arbitrary CFG *S* into the CFG obtained by the RePair algorithm [38]. RePair is known to achieve the best compression performance in practice, and there are many studies on computing RePair in small space. Combined with *online* grammar compression algorithms, such as [43, 57], the algorithm in [53] yields the first RePair algorithm working in compressed space.

#### **6.2 Privacy-Preserving Similarity Computation**

#### *6.2.1 Related Work*

This section reviews results in privacy-preserving information retrieval over strings recently presented in [59]. As the number of strings containing personal information has increased, privacy-preserving computation has become increasingly important. Secure computation based on public-key encryption is one of the great accomplishments of modern cryptography because it allows untrusted parties to compute a function of their private inputs while revealing nothing but the result.

Rapid progress in gene sequencing technology has expanded the range of applications of edit distance to include personalized genomic medicine, diagnosis of diseases, and preventive treatment (e.g., see [1]). However, because the genome of a person is ultimately personal information that uniquely identifies the owner, the parties involved should not share personal genomic data in plaintext. We therefore consider a secure two-party model for edit distance computation: Two untrusted parties generating their own public and private keys have strings *x* and *y*, respectively, and they want to jointly compute *f* (*x*, *y*) for a given metric *f* without revealing anything about their individual strings.

Homomorphic encryption (HE) is an emerging technique for such secure multiparty computation. HE is a kind of public-key encryption between two parties Alice and Bob, where Alice wants to send a secret message to Bob. In this model, Bob generates his secret key *sk* and public key *pk* prior to communication, where *pk* is known to everyone. Alice then sends the encrypted message *E*(*m*, *pk*) to Bob, who recovers *m* using his secret key *sk* via the property *D*(*E*(*m*, *pk*), *sk*) = *m*, where *D* denotes decryption. When it is not necessary to specify the owner of *pk* and *sk*, we simply write *E*(*m*).

A public-key encryption scheme *E*(·) has the additive homomorphic property if we can obtain *E*(*m* + *n*) from *E*(*m*) and *E*(*n*) without decryption; the multiplicative property is defined similarly. If *E*(·) is additive, Alice can obtain the sum of many people's secret numbers without learning the individual values.

The first public-key encryption algorithm, RSA [51], is multiplicative because it has the following property. Let (*e*, *n*) be a public key and (*d*, *n*) be the corresponding secret key, where *e*, *d*, *n* are integers. A message *m* is encrypted as *c* = (*m*<sup>*e*</sup> mod *n*) and decrypted by *c*<sup>*d*</sup> = *m*<sup>*ed*</sup> ≡ *m* mod *n*. We can easily check the multiplicative property (*m*<sub>1</sub><sup>*e*</sup> mod *n*) · (*m*<sub>2</sub><sup>*e*</sup> mod *n*) ≡ (*m*<sub>1</sub>*m*<sub>2</sub>)<sup>*e*</sup> mod *n*. The Paillier encryption system [48] was the first system to have the additive property: parties can jointly compute the encrypted value *E*(*x* + *y*) directly from the two encrypted integers *E*(*x*) and *E*(*y*).
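To make the additive property concrete, here is a toy implementation of Paillier encryption using the standard simplification *g* = *n* + 1 (our own sketch; the tiny fixed primes are for illustration only and offer no security):

```python
import random
from math import gcd

def keygen(p=211, q=223):
    """Toy Paillier key generation; real deployments use large random primes."""
    n = p * q
    lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)   # lcm(p-1, q-1)
    mu = pow(lam, -1, n)                           # modular inverse (Python 3.8+)
    return (n,), (n, lam, mu)

def encrypt(pk, m):
    (n,) = pk
    r = random.randrange(1, n)
    while gcd(r, n) != 1:                          # r must be a unit mod n
        r = random.randrange(1, n)
    # c = (n+1)^m * r^n  mod n^2
    return (pow(n + 1, m, n * n) * pow(r, n, n * n)) % (n * n)

def decrypt(sk, c):
    n, lam, mu = sk
    # L(c^lam mod n^2) * mu mod n, where L(x) = (x - 1) / n
    return ((pow(c, lam, n * n) - 1) // n * mu) % n

pk, sk = keygen()
n2 = pk[0] ** 2
c1, c2 = encrypt(pk, 17), encrypt(pk, 25)
# E(x) * E(y) mod n^2 encrypts x + y  (additive homomorphism)
assert decrypt(sk, (c1 * c2) % n2) == 42
# E(x)^k mod n^2 encrypts k * x      (scalar multiplication)
assert decrypt(sk, pow(c1, 3, n2)) == 51
```

Multiplying ciphertexts adds the underlying plaintexts, which is exactly the property the protocols below exploit.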

By taking advantage of the homomorphic property, researchers have proposed HE-based privacy-preserving protocols for computing the Levenshtein distance *d*(*x*, *y*). For example, Inan et al. [25] designed a three-party protocol in which two parties securely compute *d*(*x*, *y*) with the help of a reliable third party. Rane and Sun [50] then improved this three-party protocol to obtain the first two-party protocol.

In this review, we focus on an extended Levenshtein distance called the *edit distance with moves* (EDM), which allows any substring to be moved with unit cost in addition to the standard operations of inserting, deleting, and replacing a character. Based on the EDM, we can find a set of approximately maximal common substrings of two strings, which can be used to detect plagiarism in documents or long repeated segments in DNA sequences. As an example, consider the two clearly similar strings *x* = *a*<sup>*N*</sup>*b*<sup>*N*</sup> and *y* = *b*<sup>*N*</sup>*a*<sup>*N*</sup>, which can be transformed into each other by a single move. While the exact EDM is simply EDM(*x*, *y*) = 1, the Levenshtein distance takes the undesirable value *d*(*x*, *y*) = 2*N*. The *n*-gram distance is preferable to the Levenshtein distance in this case, but it requires large time/space complexity depending on *N*.

Although computing EDM(*x*, *y*) is NP-hard [55], Cormode and Muthukrishnan [12] found an almost linear-time approximation algorithm, and many techniques have since been proposed for computing the EDM; for example, Ganczorz et al. [18] proposed a lightweight probabilistic algorithm. In these algorithms, each string *x* is transformed into a characteristic vector v<sub>*x*</sub> of nonnegative integers representing the frequencies of particular substrings of *x*. For two strings *x* and *y*, we then have an approximate distance guaranteeing *L*<sub>1</sub>(v<sub>*x*</sub>, v<sub>*y*</sub>) = *O*(lg<sup>∗</sup> *N* lg *N*) EDM(*x*, *y*) for *N* = |*x*| + |*y*|.

As pointed out in Appendix A of [15], there is a subtle flaw in the ESP algorithm [12] that achieves this *O*(lg<sup>∗</sup> *N* lg *N*) bound; however, the flaw can be remedied by an alternative algorithm called HSP [15]. Because lg<sup>∗</sup> *N* grows extremely slowly,<sup>2</sup> we employ *L*<sub>1</sub>(v<sub>*x*</sub>, v<sub>*y*</sub>) as a reasonable approximation to EDM(*x*, *y*).

Basically, the ESP tree is a special type of grammar compression referred to in the previous section where the length of the right-hand side of any production rule is just two or three. Therefore, EDM(*x*, *y*) is approximated by the compressed expressions for the strings *x* and *y*. The relationship between grammar compression (including ESP) and its applications has been widely investigated in the past two decades (see, e.g., [10, 21, 24, 39, 42, 52, 54, 56–58]).

<sup>2</sup> lg<sup>∗</sup> *N* is the number of times the logarithm function lg must be iteratively applied to *N* before the result becomes at most 1.

Recently, Nakagawa et al. proposed the first secure two-party protocol for EDM (sEDM) [44] based on HE. However, their algorithm suffers from a bottleneck in the step where the parties construct a shared labeling scheme. Yoshimoto et al. [59] improved this algorithm to make it practical. We review the improved algorithm here.

#### *6.2.2 Edit Distance with Moves*

Based on the notation for strings in the previous section, EDM(*S*, *S*′) is the length of the shortest sequence of edit operations that transforms *S* into *S*′, where the permitted operations (each having unit cost) are inserting, deleting, or renaming one symbol at any position, or moving an arbitrary substring. Unfortunately, as Theorem 6.1 states, computing EDM(*S*, *S*′) is NP-hard even if renaming operations are not allowed [55], so we focus on an approximation algorithm for EDM, called edit-sensitive parsing (ESP) [12].

**Theorem 6.1** (Shapira and Storer [55]) *Determining* EDM(*x*, *y*) *is NP-hard even if only three unit-cost operations are allowed, namely, inserting a character, deleting a character, and moving a substring.*

ESP constructs a parsing tree, called an ESP tree, for a given string *S*, where internal nodes are labeled *consistently*; that is, internal nodes have a common name if and only if they derive the same string. After two ESP trees *T*<sub>*S*</sub> and *T*<sub>*S*′</sub> are constructed for the strings *S* and *S*′ to be compared, the characteristic vectors v<sub>*S*</sub> and v<sub>*S*′</sub> are defined such that v<sub>*S*</sub>[*i*] is the frequency of the *i*th label in *T*<sub>*S*</sub>. EDM(*S*, *S*′) is then approximated by *L*<sub>1</sub>(v<sub>*S*</sub>, v<sub>*S*′</sub>) with the following lower/upper bounds.

**Theorem 6.2** (Cormode and Muthukrishnan [12]) *Let T*<sub>*S*</sub> *and T*<sub>*S*′</sub> *be consistently labeled ESP trees for S*, *S*′ ∈ Σ<sup>∗</sup>*, and let* v<sub>*S*</sub> *be the characteristic vector for S, where* v<sub>*S*</sub>[*k*] *is the frequency of label k in T*<sub>*S*</sub>*. Then,*

$$\frac{1}{2}\text{EDM}(S, S') \le L_1(v_S, v_{S'}) = O(\lg^* N \lg N)\,\text{EDM}(S, S'),$$

*where* $L_1(v_S, v_{S'}) = \sum_{k} |v_S[k] - v_{S'}[k]|$*, the sum ranging over all labels k.*

In Fig. 6.1, we illustrate an example of consistent labeling of the trees *T*<sub>*S*</sub> and *T*<sub>*S*′</sub> together with the resulting characteristic vectors. Since the strings *S* and *S*′ are parsed offline, the problem of preserving privacy is reduced to designing a secure protocol for creating consistent labels and computing the *L*<sub>1</sub>-distance between the characteristic vectors.
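The consistent-labeling idea can be illustrated with a simplified parse (our own sketch: a plain left-to-right pairing with a memoized label dictionary, not ESP proper, which chooses its groupings edit-sensitively so that labels survive edits and moves):

```python
from collections import Counter

def parse(s, names):
    """Left-to-right pairing parse with consistent labels: two internal
    nodes receive the same name iff their children are identical, so
    nodes deriving the same string in the same way share a label.
    (A stand-in for ESP, which parses edit-sensitively.)"""
    level, labels = list(s), []
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            name = names.setdefault((level[i], level[i + 1]), len(names) + 1)
            labels.append(name)
            nxt.append(name)
        if len(level) % 2:            # an odd leftover is carried up unchanged
            nxt.append(level[-1])
        level = nxt
    return Counter(labels)            # the characteristic vector

names = {}                            # label dictionary shared by both strings
vx = parse("abcabcabc", names)
vy = parse("abcabc", names)
l1 = sum(abs(vx[k] - vy[k]) for k in set(vx) | set(vy))
```

With the dictionary shared between the two parses, equal subtrees receive equal labels, and the *L*<sub>1</sub> distance of the resulting frequency counters plays the role of *L*<sub>1</sub>(v<sub>*x*</sub>, v<sub>*y*</sub>) above.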

**Fig. 6.1** Example of approximate EDM. For strings *S* = *adabcadeab* and *S*′ = *eabcadadab*, *S* is transformed into *S*′ by two substring moves, that is, EDM(*S*, *S*′) = 2. After constructing the ESP trees *T*<sub>*S*</sub> and *T*<sub>*S*′</sub> with consistent labeling, the corresponding characteristic vectors v<sub>*S*</sub> and v<sub>*S*′</sub> are computed offline. The exact EDM(*S*, *S*′) is approximated by *L*<sub>1</sub>(v<sub>*S*</sub>, v<sub>*S*′</sub>) = 4

#### *6.2.3 Homomorphic Encryption*

We now briefly review the framework of homomorphic encryption. Let (*pk*, *sk*) be a key pair for a public-key encryption scheme, and let *E*<sub>*pk*</sub>(*x*) be the encryption of a message *x* and *D*<sub>*sk*</sub>(*C*) the decryption of a ciphertext *C*. We say that the encryption scheme is *additively homomorphic* if it has the following properties: (1) there is an operation *h*<sub>+</sub>(·, ·) on *E*<sub>*pk*</sub>(*x*) and *E*<sub>*pk*</sub>(*y*) such that *D*<sub>*sk*</sub>(*h*<sub>+</sub>(*E*<sub>*pk*</sub>(*x*), *E*<sub>*pk*</sub>(*y*))) = *x* + *y*; (2) for any *r*, we can compute a scalar multiplication such that *D*<sub>*sk*</sub>(*r* · *E*<sub>*pk*</sub>(*x*)) = *r* · *x*.

An additively homomorphic encryption scheme that allows a sufficient number of these operations is called an additive HE.<sup>3</sup> Paillier's encryption scheme [48] is the first secure additive HE. However, few functions can be evaluated using only additive homomorphism and scalar multiplication.

The multiplication *D*<sub>*sk*</sub>(*h*<sub>×</sub>(*E*<sub>*pk*</sub>(*x*), *E*<sub>*pk*</sub>(*y*))) = *x* · *y* is another important homomorphism. If we allow both additive and multiplicative homomorphisms as well as scalar multiplication (called fully homomorphic encryption, or FHE [19] for short), we can perform any arithmetic operation on ciphertexts. For example, if we can use a sufficiently large number of additive operations and a single multiplicative operation over ciphertexts, we can obtain the inner product of two encrypted vectors.

However, there is a trade-off between the available homomorphic operations and their computational cost. To avoid this difficulty, we focus on leveled HE (LHE) where the number of homomorphic multiplications is restricted beforehand. In particular,

<sup>3</sup> In general, the number of applicable operations over ciphertexts is bounded by the size of (*pk*, *sk*).

L2HE (additive HE that allows a single homomorphic multiplication) has attracted a great deal of attention. The BGN encryption system, invented by Boneh et al. [6], was the first L2HE, supporting a single multiplication and a sufficient number of additions over ciphertexts. Using BGN, we can securely evaluate formulas in disjunctive normal form. Following this pioneering study, many practical L2HE protocols have been proposed [2, 9, 16, 22].

In terms of EDM computation, although Nakagawa et al. [44] introduced an algorithm for computing the EDM based on L2HE, their algorithm is very slow for large strings. Following on from this work, Yoshimoto et al. [59] proposed a novel secure computation of EDM for large strings based on the faster L2HE of Attrapadung et al. [2]. To our knowledge, there is no secure two-party protocol for EDM computation that uses only the additive homomorphic property; whether EDM can be computed by a two-party protocol based on additive HE alone is an interesting open question.

For the benefit of the reader, we give a simple review of the mechanism of BGN, the first L2HE. For plaintexts *m*<sub>1</sub>, *m*<sub>2</sub> ∈ {1, ..., *M*} and their corresponding ciphertexts *C*<sub>1</sub> and *C*<sub>2</sub>, the ciphertexts of *m*<sub>1</sub> + *m*<sub>2</sub> and *m*<sub>1</sub>*m*<sub>2</sub> can be computed directly from *C*<sub>1</sub> and *C*<sub>2</sub> without decrypting *m*<sub>1</sub> and *m*<sub>2</sub>, provided *m*<sub>1</sub> + *m*<sub>2</sub>, *m*<sub>1</sub>*m*<sub>2</sub> ≤ *M*.

For large primes *q*<sub>1</sub> and *q*<sub>2</sub>, the BGN encryption scheme is based on two multiplicative cyclic groups G and G′ of order *q*<sub>1</sub>*q*<sub>2</sub>, two generators *g*<sub>1</sub> and *g*<sub>2</sub> of G, an inverse function (·)<sup>−1</sup> : G → G, and a bihomomorphism *e* : G × G → G′. By definition, *e*(·, *x*) and *e*(*x*, ·) are group homomorphisms for all *x* ∈ G. In addition, we assume that both the inverse function (·)<sup>−1</sup> and the bihomomorphism *e* can be computed in time polynomial in the security parameter log<sub>2</sub> *q*<sub>1</sub>*q*<sub>2</sub>. Such a system (G, G′, *g*<sub>1</sub>, *g*<sub>2</sub>, (·)<sup>−1</sup>, *e*) can be generated by, for example, letting G be a subgroup of a supersingular elliptic curve and *e* be a Tate pairing [6]. The BGN encryption scheme proceeds as follows.

**Key generation**: Randomly generate two sufficiently large primes *q*<sub>1</sub> and *q*<sub>2</sub>, then use these to define (G, G′, *g*<sub>1</sub>, *g*<sub>2</sub>, (·)<sup>−1</sup>, *e*) as described above. Choose two random generators *g* and *u* of G, set *h* = *u*<sup>*q*<sub>2</sub></sup>, and let *M* be a positive integer bounded above by a polynomial function of the security parameter log<sub>2</sub> *q*<sub>1</sub>*q*<sub>2</sub>. The public key is then *pk* = (*q*<sub>1</sub>*q*<sub>2</sub>, G, G′, *e*, *g*, *h*, *M*) and the private key is *sk* = *q*<sub>1</sub>.

**Encryption**: Encrypt the message *m* ∈ {0, ..., *M*} using *pk* and a random *r* ∈ Z<sub>*q*<sub>1</sub>*q*<sub>2</sub></sub> as *C* = *g*<sup>*m*</sup>*h*<sup>*r*</sup> ∈ G, yielding the ciphertext *C*.

**Decryption**: Since *h*<sup>*q*<sub>1</sub></sup> = *u*<sup>*q*<sub>1</sub>*q*<sub>2</sub></sup> = 1, we have *C*<sup>*q*<sub>1</sub></sup> = (*g*<sup>*m*</sup>*h*<sup>*r*</sup>)<sup>*q*<sub>1</sub></sup> = (*g*<sup>*q*<sub>1</sub></sup>)<sup>*m*</sup>, so decryption amounts to finding the integer *m* satisfying this equation, a discrete logarithm with a known algorithm of time complexity *O*(√*M*).

**Homomorphic properties**: For the ciphertexts *C*<sub>1</sub> = *g*<sup>*m*<sub>1</sub></sup>*h*<sup>*r*<sub>1</sub></sup> and *C*<sub>2</sub> = *g*<sup>*m*<sub>2</sub></sup>*h*<sup>*r*<sub>2</sub></sup> in G corresponding to the messages *m*<sub>1</sub> and *m*<sub>2</sub>, anyone can calculate the encrypted values of *m*<sub>1</sub> + *m*<sub>2</sub> and *m*<sub>1</sub>*m*<sub>2</sub> directly from *C*<sub>1</sub> and *C*<sub>2</sub> without knowing *m*<sub>1</sub> and *m*<sub>2</sub>, as follows.

**– Additive homomorphism**:

$$C_a = C_1 C_2 h^r = (g^{m_1} h^{r_1})(g^{m_2} h^{r_2})\, h^r = g^{m_1 + m_2} h^{r_1 + r_2 + r}$$

gives the encrypted value of *m*<sup>1</sup> + *m*2.

**– Multiplicative homomorphism**: *C*<sub>*m*</sub> = *e*(*C*<sub>1</sub>, *C*<sub>2</sub>)*h*<sub>1</sub><sup>*r*</sup> ∈ G′, where *h*<sub>1</sub> = *e*(*g*, *h*), gives the encrypted value of *m*<sub>1</sub>*m*<sub>2</sub>, because

$$C_m^{q_1} = \left(e(C_1, C_2)\, h_1^r\right)^{q_1} = \left[e(g, g)^{m_1 m_2}\, e(g, h)^{m_1 r_2 + m_2 r_1 + r}\, e(h, h)^{r_1 r_2}\right]^{q_1} = \left(e(g, g)^{q_1}\right)^{m_1 m_2},$$

since every factor involving *h* is a *q*<sub>2</sub>-th power in G′ and therefore vanishes when raised to the power *q*<sub>1</sub>. We decrypt *C*<sub>*m*</sub> by computing *m*<sub>1</sub>*m*<sub>2</sub> from (*e*(*g*, *g*)<sup>*q*<sub>1</sub></sup>)<sup>*m*<sub>1</sub>*m*<sub>2</sub></sup> and *e*(*g*, *g*)<sup>*q*<sub>1</sub></sup>, as in decryption above.

Note that ciphertexts in G′ (such as *C*<sub>*m*</sub>) still have the additive homomorphic property, so BGN allows a single multiplication and unlimited additions over ciphertexts.
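The *O*(√*M*) discrete logarithm used in the decryption step is typically computed by the baby-step giant-step algorithm. A generic sketch over the multiplicative group modulo a prime (toy parameters of our own; BGN would run the same search with base *g*<sup>*q*<sub>1</sub></sup> or *e*(*g*, *g*)<sup>*q*<sub>1</sub></sup>):

```python
from math import isqrt

def bsgs(base, target, mod, bound):
    """Baby-step giant-step: return m in [0, bound] with
    base^m = target (mod mod), using O(sqrt(bound)) group operations."""
    s = isqrt(bound) + 1
    baby, cur = {}, 1
    for j in range(s):                  # baby steps: store base^j -> j
        baby.setdefault(cur, j)
        cur = (cur * base) % mod
    giant = pow(base, -s, mod)          # base^(-s) mod mod (Python 3.8+)
    cur = target
    for i in range(s + 1):              # giant steps: target * base^(-i*s)
        if cur in baby:
            return i * s + baby[cur]
        cur = (cur * giant) % mod
    return None

p, g = 1_000_000_007, 5                 # toy prime modulus and base
m = 1234
assert bsgs(g, pow(g, m, p), p, 10_000) == m
```

The square-root trade-off is why BGN restricts plaintexts to a polynomially bounded range {0, ..., *M*}.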

#### *6.2.4 L2HE-Based Algorithm for Secure EDM*

We now explain the algorithm for computing the approximate EDM based on L2HE [59]. Two parties *A* and *B* have strings *S*<sub>*A*</sub> and *S*<sub>*B*</sub>, respectively. First, they compute the corresponding ESP trees *T*<sub>*A*</sub> and *T*<sub>*B*</sub> offline and assign tentative labels to the internal nodes of *T*<sub>*A*</sub> and *T*<sub>*B*</sub> using a hash function *H* with range {0, 1, ..., *m*} for a fixed *m*; let *X* ⊆ {0, 1, ..., *m*} be the set of the *n* distinct labels appearing in *T*<sub>*A*</sub> and *T*<sub>*B*</sub>. The goal is to securely relabel *X* using a bijection *X* → {1, 2, ..., *n*}, as described in Algorithm 2. We suppose that *A* and *B* generate their own public and private keys prior to the computation.

In Algorithm 2, we assume an L2HE scheme allowing a single multiplicative operation and a sufficient number of additive operations over encrypted integers. Because these operations are usually implemented by AND (·) and XOR (⊕) logic gates (e.g., [7]), we introduce the following notation. First, *E*<sub>*A*</sub>(*x*) denotes the ciphertext generated by encrypting plaintext *x* with *A*'s public key, and *E*<sub>*A*</sub>(*x*, *y*, *z*) is an abbreviation for the vector (*E*<sub>*A*</sub>(*x*), *E*<sub>*A*</sub>(*y*), *E*<sub>*A*</sub>(*z*)). For bits *x*, *y*, *z*, *a*, *b*, *c* ∈ {0, 1}, *E*<sub>*A*</sub>(*x*, *y*, *z*) · *E*<sub>*A*</sub>(*a*, *b*, *c*) denotes (*E*<sub>*A*</sub>(*x* · *a*), *E*<sub>*A*</sub>(*y* · *b*), *E*<sub>*A*</sub>(*z* · *c*)), and *E*<sub>*A*</sub>(*x*, *y*, *z*) ⊕ *E*<sub>*A*</sub>(*a*, *b*, *c*) denotes (*E*<sub>*A*</sub>(*x* ⊕ *a*), *E*<sub>*A*</sub>(*y* ⊕ *b*), *E*<sub>*A*</sub>(*z* ⊕ *c*)). Using this notation, we describe the protocol in Algorithm 2.

Next, we define the protocol's security based on a model where both parties are assumed to be *semi-honest*; that is, corrupt parties may try to gather information from the protocol transcript but do not deviate from the protocol specification. The security is defined as follows.

**Definition 6.1** (Semi-honest security [20]) A protocol is secure against semi-honest adversaries if each party's observation of the protocol can be simulated using only the input it holds and the output it receives from the protocol.

Intuitively, this definition tells us that a corrupt party is unable to learn any extra information that cannot be derived from the input and output explicitly (for details, see [20]). Under this assumption, since the algorithm is symmetric with respect

# **Algorithm 2** for consistently labeling *T<sup>A</sup>* and *T<sup>B</sup>* [59]

**Preprocessing (tentative labeling)**: Parties *A* and *B* agree on a hash function *H* with range {0, ..., *m*} for sufficiently large *m*. Both parties compute *T*<sub>*A*</sub> and *T*<sub>*B*</sub> for their respective strings offline. Each internal node *u* is assigned the label *H*(*s*(*u*)), where *s*(*u*) is the string derived by the leaves below *u*. Now, parties *A* and *B* have tentative label sets [*T*<sub>*A*</sub>], [*T*<sub>*B*</sub>] ⊆ {0, ..., *m*}, respectively.

**Goal:** Change all the labels using a bijection [*T*<sub>*A*</sub>] ∪ [*T*<sub>*B*</sub>] → {1, ..., *n*} without either party revealing anything about its private string.

**Notation:** *EA*(*x*) denotes the ciphertext of a message *x* encrypted by an L2HE with *A*'s public key.

#### **Sharing a dictionary:**

**Step 1:** Party *A* computes the bit vector **X**[1 ... *m*] such that **X**[ℓ] = 1 iff ℓ ∈ [*T*<sub>*A*</sub>]. Similarly, party *B* computes **Y**[1 ... *m*] such that **Y**[ℓ] = 1 iff ℓ ∈ [*T*<sub>*B*</sub>].

**Step 2:** *A* sends *E*<sub>*A*</sub>(**X**) to *B*, and *B* sends *E*<sub>*B*</sub>(**Y**) to *A*.

**Step 3:** *B* computes (*E*<sub>*A*</sub>(**X**) ⊕ *E*<sub>*A*</sub>(**Y**)) ⊕ (*E*<sub>*A*</sub>(**X**) · *E*<sub>*A*</sub>(**Y**)) = *E*<sub>*A*</sub>(**X** ∪ **Y**), and *A* computes (*E*<sub>*B*</sub>(**X**) ⊕ *E*<sub>*B*</sub>(**Y**)) ⊕ (*E*<sub>*B*</sub>(**X**) · *E*<sub>*B*</sub>(**Y**)) = *E*<sub>*B*</sub>(**X** ∪ **Y**).

**Relabeling** [*T*<sub>*A*</sub>] **using** *E*<sub>*B*</sub>(**X** ∪ **Y**) ([*T*<sub>*B*</sub>] is relabeled symmetrically using *E*<sub>*A*</sub>(**X** ∪ **Y**))

$$\textbf{Step 4: } A \text{ computes } E_B(L_\ell) = E_B\left(\sum_{i=1}^{\ell} (\mathbf{X} \cup \mathbf{Y})[i]\right) \text{ for all } \ell \in [T_A].$$

**Step 5:** *A* sends all *E*<sub>*B*</sub>(*L*<sub>ℓ</sub> + *r*<sub>ℓ</sub>) to *B*, choosing each *r*<sub>ℓ</sub> uniformly at random from ℕ.

**Step 6:** *B* decrypts all *L*<sub>ℓ</sub> + *r*<sub>ℓ</sub> and sends them back to *A*.

**Step 7:** *A* recovers *L*<sub>ℓ</sub> ∈ {1, ..., *n*} for all ℓ ∈ [*T*<sub>*A*</sub>] by subtracting *r*<sub>ℓ</sub>.

to *A* and *B*, the following theorem establishes the security of the algorithm against semi-honest adversaries.

**Theorem 6.3** (Yoshimoto et al. [59]) *Let* [*T*<sub>*A*</sub>] *be the set of labels appearing in T*<sub>*A*</sub>*. The only knowledge that a semi-honest A can gain by executing Algorithm 2 is the distribution of the labels* {*L*<sub>ℓ</sub> | ℓ ∈ [*T*<sub>*A*</sub>]} *over* [1, ..., *n*]*.*

**Theorem 6.4** (Yoshimoto et al. [59]) *Algorithm 2 assigns consistent labels using the bijection* [*T*<sub>*A*</sub>] ∪ [*T*<sub>*B*</sub>] → {1, 2, ..., *n*} *without revealing the parties' private information. It has round and communication complexities of O*(1) *and O*(α(*n* lg *n* + *m* + *rn*))*, respectively, where n* = |[*T*<sub>*A*</sub>] ∪ [*T*<sub>*B*</sub>]|*, m is the modulus of the rolling hash used for preprocessing, r* = max{*r*<sub>1</sub>, ..., *r*<sub>*n*</sub>} *is the security parameter, and* α *is the cost of a single encryption, decryption, or homomorphic operation.*
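The arithmetic behind Steps 1–7 of Algorithm 2 can be checked in the clear (no encryption; the toy label sets below are our own). The gate identity *x* OR *y* = (*x* ⊕ *y*) ⊕ (*x* · *y*) is what lets Step 3 compute the union bit vector with one homomorphic multiplication per position, and the prefix sums of Step 4 are exactly the new labels:

```python
# In-the-clear simulation of Algorithm 2's arithmetic (toy data, no crypto)
m = 16
TA = {3, 7, 9}                  # A's tentative labels (hash values)
TB = {3, 9, 12}                 # B's tentative labels

X = [1 if l in TA else 0 for l in range(1, m + 1)]
Y = [1 if l in TB else 0 for l in range(1, m + 1)]

# Step 3: union via XOR/AND, the gates an L2HE scheme provides
U = [(x ^ y) ^ (x & y) for x, y in zip(X, Y)]
assert U == [x | y for x, y in zip(X, Y)]

# Step 4: the new label of tentative label l is the prefix sum of U up to l
L = {l: sum(U[:l]) for l in sorted(TA | TB)}

# The relabeling is consistent (a shared tentative label gets one new
# label) and is a bijection onto {1, ..., n} with n = |TA u TB|
assert sorted(L.values()) == list(range(1, len(TA | TB) + 1))
```

Steps 5–7 merely blind these prefix sums with one-time random offsets so that *B* can decrypt them without learning which labels belong to *A*.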

**Table 6.1** Comparison of the communication and round complexities of secure EDM computation models [44, 59] as well as a naive algorithm. Here, *N* is the total length of both parties' input strings, *n* is the number of characteristic substrings determining the approximate EDM, and *m* is the range of the rolling hash *H*(·) for the substrings satisfying *m* > *n*. "Naive" is the baseline method that uses *H*(·) as the labeling function for the characteristic substrings


#### *6.2.5 Result and Open Question*

The complexities of related algorithms are summarized in Table 6.1. Computing the approximate EDM involves two phases: the shared labeling of characteristic substrings (Phase 1) and the *L*1-distance computation of characteristic vectors (Phase 2).

Let the parties have strings *x* and *y*, respectively. In the offline phase (i.e., no privacy-preserving communication is needed), they construct the respective parsing trees *T*<sub>*x*</sub> and *T*<sub>*y*</sub> by the bottom-up parsing called ESP [12], where the node labels must be *consistent*, meaning that two labels are equal if and only if they correspond to the same substring. In an ESP tree, a substring derived by an internal node is called a characteristic substring. In the privacy-preserving model, the two parties must jointly compute these consistent labels without revealing whether a characteristic substring is common to both of them (Phase 1). After computing all the labels in *T*<sub>*x*</sub> and *T*<sub>*y*</sub>, they jointly compute the *L*<sub>1</sub>-distance of the two characteristic vectors containing the frequencies of all labels in *T*<sub>*x*</sub> and *T*<sub>*y*</sub> (Phase 2).

As reported in [44], the bottleneck lies in Phase 1. The task is to design a bijection *f* : *X* ∪ *Y* → {1, 2, ..., *n*}, where *X* and *Y* (|*X* ∪ *Y*| = *n*) are the parties' respective sets of characteristic substrings. Since *X* and *Y* are computable without communication, the goal is to jointly compute *f*(w) for any w ∈ *X* without revealing whether w ∈ *Y*. This problem is closely related to private set operations (PSO), where parties possessing private sets want to obtain the results of set operations such as intersection or union. Applying the Bloom filter [5] and HE techniques, various protocols for PSO have been proposed [4, 13, 35]. However, these protocols are not directly applicable to our problem because they require at least three parties to satisfy the security constraints. In contrast, the algorithm reviewed here introduces a novel secure two-party protocol for Phase 1.

As shown in Table 6.1, the recent result reduces the *O*(lg *N*) round complexity of the earlier method to *O*(1) while maintaining comparable communication complexity. Furthermore, the practical performance of the algorithms on real DNA sequences was reported in [44].

**Acknowledgements** The authors were supported by JST CREST (JPMJCR1402) and JSPS KAK-ENHI (16K16009, 17H01791, 17H00762, and 18K18111).

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Chapter 7 Compression and Pattern Matching**

**Takuya Kida and Isamu Furuya**

**Abstract** We introduce our research on compressed pattern matching, a technology that combines data compression and pattern matching. To present the results of this work, we explain the collage system proposed by Kida et al. in 2003, which is a unifying framework for compressed pattern matching, and we describe the Repair-VF method proposed by Yoshida and Kida in 2013 and the MR-Repair method proposed by Furuya et al. in 2019, which are grammar compressions suitable for compressed pattern matching.

#### **7.1 Introduction**

Data compression is a technology that reduces the space used to store data by compactly expressing the redundancy contained in the data. It is mainly used for efficiently storing large amounts of data and reducing communication costs. If we consider the conversion of information to digital data as a kind of data compression, it has a long history that can be traced back to the Morse code developed in the 1830s. Many compression methods have been proposed depending on the type and application of data [43–46].

Information retrieval has also long been studied as a technique for efficiently finding a target part from a large-scale dataset or data group [3, 11, 13, 14, 30, 39, 55], and there are various methods depending on the required specifications. In particular, the approaches differ between searching for images and audio data and searching for documents (*text*). In this chapter, we focus on the latter task of text searching.

T. Kida (B)

Faculty of Engineering, Hokkai-Gakuen University, 1-1, S26-W11, Chuo-ku, Sapporo 064-0926, Japan

e-mail: kida@lst.hokkai-s-u.ac.jp

I. Furuya

Graduate School of IST, Hokkaido University, N14-W9, Kitaku, Sapporo 060-0814, Japan

e-mail: furuya@ist.hokudai.ac.jp

One of the basic problems in text searching is the *pattern matching problem*, which is also called the string matching problem. This is the problem of finding the occurrences of keywords (*patterns*) in a target text. Broadly speaking, there are two approaches to solving this problem. One is to build an index for the input text in advance, which is called *text indexing*. A text index allows for efficient searching when the target text is static and is not subsequently updated. The other is to access the input text sequentially from the beginning to the end while checking if the given pattern matches at the current reference position in the text. This is called *text scanning*. Text scanning is applicable even if the text is updated from time to time and it does not require index structures. In general, text indexing is superior in terms of search speed, while text scanning is superior in terms of search flexibility. By convention, "pattern matching" often refers to the latter, text scanning, in a narrow sense.

In this chapter, we outline a fusion technology of data compression and pattern matching called *compressed pattern matching*. First, in Sect. 7.2, we look back on the history of this field of study. Then, in Sect. 7.3, we provide some notation and definitions that are used in the following sections. In addition, we recall grammar compression, which is the key compression scheme for compressed pattern matching. Next, in Sect. 7.4, we introduce the general framework of compressed pattern matching proposed by Kida et al. [21]. In Sects. 7.5 and 7.6, we present outlines of Repair-VF and MR-Repair, respectively, which are the results of our work in this study. Finally, we conclude in Sect. 7.7.

#### **7.2 History of Compressed Pattern Matching Research**

The technology of combining data compression and pattern matching emerged in the early 1990s. This has come to be known as the *compressed pattern matching problem* [1], which is the problem of performing pattern matching without first decompressing the compressed input text. Formally, given a text *T* = *t*<sub>1</sub>*t*<sub>2</sub> ... *t*<sub>*u*</sub> (each *t*<sub>*i*</sub> is a symbol) in compressed form *Z* = *z*<sub>1</sub>*z*<sub>2</sub> ... *z*<sub>*n*</sub> (each *z*<sub>*i*</sub> is an element of the compressed text) and a pattern *P* of length *m*, the problem is to find all occurrences of *P* in *T* using only *Z* and *P*. A simple method is to first decompress *Z* into *T* and then apply an ordinary pattern matching algorithm; however, this approach requires O(*m* + *u*) time for the pattern matching in addition to the decompression time.

The optimal algorithm for the compressed pattern matching problem is one that performs pattern matching in O(*m* + *n*) time in the worst case. However, it is not easy to achieve both efficient compression of text data and fast pattern matching on it. In the initial research in this field, individual pattern matching algorithms were developed for each compression method. For example, Eilam-Tzoreff and Vishkin [15] proposed an algorithm for run-length compression, Gąsieniec et al. [18] and Farach and Thorup [16] proposed algorithms for LZ77, and Amir et al. [2] proposed an algorithm for LZW [54]. However, these algorithms tend to be complicated, and as Manber [29] pointed out, it is questionable whether they are more practical than the simple method.

From the late 1990s to the early 2000s, several efficient methods for compressed pattern matching emerged [22, 38, 41]. For the first time, it was shown experimentally that such methods can perform pattern matching faster than the simple method. Furthermore, methods appeared that perform pattern matching faster than matching on the original text, by a factor of roughly the compression ratio. The key is to select a compression method suitable for pattern matching, even at the expense of compression ratio. In fact, Byte Pair Encoding (BPE), which was used by Shibata et al. [48] for this purpose, compresses natural language texts to only about 50% of their original size at best, while LZ-family methods compress the same texts to about 30% or less. However, text compression by BPE offers an advantage for pattern matching because all the codewords are fixed at 8 bits and the correspondence between each codeword and a portion of the text is relatively clear.

This caused a paradigm shift. Whereas individual pattern matching algorithms were previously developed for each data compression method, we realized that in order to increase the matching speed it would be better to develop a new data compression method suitable for pattern matching. In fact, in the 2000s, several data compression methods were proposed for this purpose.

One of the main groups of compression methods based on this idea is the family proposed by Brisaboa et al. [7–9] and by Klein and Ben-Nissan [24], which is based on a technique called *dense coding* [8]. Dense coding divides an input (natural language) text into words and then encodes them so that more frequent words receive shorter codewords. In addition, each codeword is assigned a bit pattern with an explicit end to facilitate codeword extraction. Although dense coding offers good performance in terms of both compression ratio and pattern matching speed, some ingenuity is required to apply it to data such as DNA sequences that cannot be divided into words.

The other system is *grammar-based compression* (or *grammar compression*) [23] with fixed-length coding. This idea is an extension of BPE and can be applied even when the input text cannot be separated into words. One direct improvement of BPE is a method using a context-sensitive grammar by Maruyama et al. [34], while compression methods based on context-free grammars include those of Klein and Shapira [25] and Uemura et al. [51]. In both methods, a modified version of the suffix tree [53] is used as a dictionary tree for constructing the grammar.

In this chapter, for convenience, we refer to the former system as the dense coding system and the latter as the VF coding system.

In the 2010s, new data compression algorithms began to appear that achieve compression performance comparable to well-known compression tools such as Gzip and Bzip while maintaining properties suitable for pattern matching. Of the two systems described above, the VF coding system had difficulties in terms of compression ratio and compression speed. Research therefore turned to examining and improving grammar compression, which is the basis of the VF coding system. Among this work, the algorithm RePair [26], which was proposed before the name "grammar compression" came into use, has attracted attention because of its excellent compression ratio. Yoshida and Kida [56] proposed a variant of RePair, called Repair-VF, which encodes the output using fixed-length codewords while suppressing unnecessary grammar rules to limit the resulting loss in compression ratio. The time and space complexities of Repair-VF are both O(*n*) for a text of length *n*, the same as the original RePair. Repair-VF realizes high-speed pattern matching on compressed text while achieving a compression ratio comparable to Gzip.

Very recently, we proposed a novel variant of RePair, called MR-RePair [17], which constructs more compact grammars than RePair, particularly for highly repetitive texts. This achievement comes from an analysis of RePair: we show in [17] that the main process of RePair, that is, the step-by-step substitution of the most frequent symbol pairs, works within the corresponding most frequent *maximal repeats*, and we reveal the relationship between maximal repeats and the grammars constructed by RePair.

#### **7.3 Preliminaries**

#### *7.3.1 Definitions of Notation and Terms*

Let Σ be an *alphabet*, that is, an ordered finite set of symbols. An element *T* = *t*<sub>1</sub> ... *t<sub>n</sub>* of Σ<sup>∗</sup> is called a *string* or a *text*, where |*T*| = *n* denotes its length. Let ε be the empty string of length 0, that is, |ε| = 0. We denote the concatenation of two strings *x*, *y* ∈ Σ<sup>∗</sup> by *x* · *y*, or simply *xy* if no confusion occurs.

If *T* = *xyz* with *x*, *y*, *z* ∈ Σ<sup>∗</sup>, then *x*, *y*, and *z* are called a *prefix*, *substring*, and *suffix* of *T*, respectively. Let *T*[*i* : *j*] = *t<sub>i</sub>* ··· *t<sub>j</sub>* for any 1 ≤ *i* ≤ *j* ≤ *n* denote the substring of *T* beginning at *i* and ending at *j*, and let *T*[*i*] = *t<sub>i</sub>* denote the *i*th symbol of *T*. For simplicity, let *T*[*i* : *j*] = ε if *j* < *i*.

#### *7.3.2 Grammar Compression*

A *context-free grammar* (*CFG* or simply *grammar*) *G* is defined as a four-tuple *G* = {*V*, Σ, *S*, *R*}, where *V* denotes an ordered finite set of variables, Σ denotes an ordered finite alphabet, *R* denotes a finite set of binary relations between *V* and (*V* ∪ Σ)<sup>∗</sup> called *production rules* (or *rules*), and *S* ∈ *V* denotes a special variable called the *start variable*. A production rule describes the situation where a variable is substituted, and is written in the form v → w with v ∈ *V* and w ∈ (*V* ∪ Σ)<sup>∗</sup>. Let *X*, *Y* ∈ (*V* ∪ Σ)<sup>∗</sup>. If there are *x<sub>l</sub>*, *x*, *x<sub>r</sub>*, *y* ∈ (*V* ∪ Σ)<sup>∗</sup> such that *X* = *x<sub>l</sub>* *x* *x<sub>r</sub>*, *Y* = *x<sub>l</sub>* *y* *x<sub>r</sub>*, and *x* → *y* ∈ *R*, we write *X* → *Y*, and denote the reflexive transitive closure of → by <sup>∗</sup>⇒. Let *val*(v) be the string derived from v, that is, v <sup>∗</sup>⇒ *val*(v). We call a grammar *Ĝ* = {*V̂*, Σ̂, *Ŝ*, *R̂*} a *subgrammar* of *G* if *V̂* ⊆ *V*, Σ̂ ⊆ (*V* ∪ Σ), and *R̂* ⊆ *R*.

Given a text *T*, *grammar compression* is a method of lossless text compression that constructs a restricted CFG uniquely deriving the text *T*. For *G* to be deterministic, the production rule for each variable v ∈ *V* must be unique. In what follows, we assume that every grammar is deterministic and that each production rule has the form v<sub>*i*</sub> → *expr<sub>i</sub>*, where *expr<sub>i</sub>* is either *expr<sub>i</sub>* = *a* (*a* ∈ Σ) or *expr<sub>i</sub>* = v<sub>*j*<sub>1</sub></sub> v<sub>*j*<sub>2</sub></sub> ... v<sub>*j*<sub>ℓ</sub></sub> (*i* > *j<sub>k</sub>* for all 1 ≤ *k* ≤ ℓ). To estimate the effectiveness of compression, we use the size of the constructed grammar, which is defined as the total length of the right-hand sides of all production rules of the grammar.

While the problem of constructing the smallest such grammar for a given text is known to be NP-hard [10], several approximation algorithms have been proposed. One of them is RePair [26], which is an off-line grammar compression algorithm. Despite its simple scheme, RePair is known for its high compression in practice [12, 19, 52], and hence, it has been comprehensively studied. Some examples of studies on the RePair algorithm include its extension to an online algorithm [35], practical working time/space improvements [6, 47], applications to various fields [12, 27, 49], and theoretical analysis of generated grammar sizes [10, 40, 42].

#### **7.4 Framework for Compressed Pattern Matching**

A grammar-compressed text can be expressed in a framework called collage systems [21]. A pattern matching algorithm on the compressed text can then be obtained as an instance of the general algorithm on collage systems, which can be understood as an extension of the Knuth–Morris–Pratt method (KMP method) [14].

#### *7.4.1 KMP Method*

The KMP method is a well-known linear-time algorithm for pattern matching on an ordinary (uncompressed) text. Its behavior can be modeled as a linear automaton (the KMP automaton) for a given pattern *P*.

For a given pattern *P*, a KMP automaton consists of two functions:

goto function *g* : *Q* × Σ → *Q* ∪ {*fail*}, failure function *f* : *Q* \ {0} → *Q*,

where *Q* = {0, 1, ..., |*P*|} is the set of states and *fail* is a special symbol not included in *Q*. For *j* ∈ *Q* and *a* ∈ Σ, the goto function *g* returns *j* + 1 if *P*[*j* + 1] = *a* holds; otherwise it returns *fail*. As an exception, for *j* = 0 let *g*(0, *a*) = 0 for all *a* ∈ Σ with *P*[1] ≠ *a*. For *j* ∈ *Q* \ {0}, the failure function *f* returns the maximum integer *k* < *j* such that *P*[1 : *k*] = *P*[*j* − *k* + 1 : *j*] holds.

**Fig. 7.1** KMP automaton for *P* = abacb. In this figure, each circle indicates a state, and the double circle indicates the final state. Solid arrows and dashed arrows indicate the goto function and failure function, respectively

**Fig. 7.2** Movement of KMP automaton. Solid arrows and dashed arrows indicate state transitions caused by the goto function and the failure function, respectively

The automaton repeats state transitions by tracing *g* corresponding to the characters read one by one from the input text. If *g* returns *fail*, then *f* is repeatedly called with the current state number to go back until a transition by *g* succeeds with the same character. When the automaton finally reaches the rightmost state, it can be judged that *P* has occurred.

Figure 7.1 shows the KMP automaton for pattern *P* = abacb. The movement of the KMP automaton in Fig. 7.1 for the text *T* = abacbbabaabacb is shown in Fig. 7.2. In this example, it can be judged that *P* has occurred when the automaton reaches the state number 5.

To eliminate the failure function, we define the state transition function δ : *Q* × Σ → *Q* as follows:

$$\delta(j, a) = \begin{cases} g(j, a) & \text{if } g(j, a) \neq fail, \\ \delta(f(j), a) & \text{otherwise.} \end{cases}$$

Moreover, we extend it to *Q* × Σ<sup>∗</sup> as follows:

$$
\delta^*(j,\varepsilon) = j, \quad \delta^*(j, ua) = \delta(\delta^*(j, u), a),
$$

where *j* ∈ *Q*, *u* ∈ Σ<sup>∗</sup>, and *a* ∈ Σ.
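As a concrete illustration, the functions above can be sketched in Python (our own minimal sketch, not code from the chapter; states are the integers 0, ..., |*P*|, `None` plays the role of *fail*, and text positions are 0-based):

```python
def kmp_automaton(P):
    """Build the goto and failure functions of the KMP automaton for P."""
    m = len(P)

    def g(j, a):
        # goto: advance on a match; None plays the role of fail
        if j < m and P[j] == a:        # P[j + 1] in the 1-based notation
            return j + 1
        return 0 if j == 0 else None
    # failure function f[j]: largest k < j with P[1:k] = P[j-k+1:j]
    f = [0] * (m + 1)
    k = 0
    for j in range(2, m + 1):
        while k > 0 and P[k] != P[j - 1]:
            k = f[k]
        if P[k] == P[j - 1]:
            k += 1
        f[j] = k
    return g, f

def delta(g, f, j, a):
    """State transition delta(j, a): follow failure links until goto succeeds."""
    t = g(j, a)
    while t is None:
        j = f[j]
        t = g(j, a)
    return t

def kmp_match(T, P):
    """Report all 0-based starting positions of occurrences of P in T."""
    g, f = kmp_automaton(P)
    occ, j = [], 0
    for i, a in enumerate(T):
        j = delta(g, f, j, a)
        if j == len(P):                # final state reached: P occurs
            occ.append(i - len(P) + 1)
    return occ
```

Running `kmp_match("abacbbabaabacb", "abacb")` reproduces the example of Figs. 7.1 and 7.2, reporting the occurrences that start at positions 1 and 10 in 1-based terms.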

#### *7.4.2 Collage System*

A *collage system* is a pair ⟨D, S⟩ defined as follows: D is a sequence of assignments *X*<sub>1</sub> = *expr*<sub>1</sub>; *X*<sub>2</sub> = *expr*<sub>2</sub>; ··· ; *X<sub>n</sub>* = *expr<sub>n</sub>*, where each *X<sub>k</sub>* is a token and *expr<sub>k</sub>* takes any of the following forms:

- *a* for *a* ∈ Σ ∪ {ε}, (*primitive assignment*)
- *X<sub>i</sub>* *X<sub>j</sub>* for *i*, *j* < *k*, (*concatenation*)
- <sup>[*j*]</sup>*X<sub>i</sub>* for *i* < *k* and an integer *j*, (*prefix truncation*)
- *X<sub>i</sub>*<sup>[*j*]</sup> for *i* < *k* and an integer *j*, (*suffix truncation*)
- (*X<sub>i</sub>*)<sup>*j*</sup> for *i* < *k* and an integer *j*. (*j times repetition*)

Let *F*(D) denote the set of all tokens in D. Each token represents the string obtained by evaluating its expression. Let *X*.*u* denote the string represented by token *X* ∈ *F*(D). For example, for *X*<sub>1</sub> = a; *X*<sub>2</sub> = b; *X*<sub>3</sub> = *X*<sub>1</sub> · *X*<sub>2</sub>; *X*<sub>4</sub> = (*X*<sub>3</sub>)<sup>3</sup>; *X*<sub>5</sub> = *X*<sub>4</sub><sup>[1]</sup>, we have *X*<sub>4</sub>.*u* = ababab and *X*<sub>5</sub>.*u* = ababa. In this section, however, we identify a token *X* with the string it represents and simply denote both by *X* unless confusion occurs.
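To make the five assignment forms concrete, the following Python sketch (our own illustration; the triple-based representation is an assumption, and we take <sup>[*j*]</sup>*X* to drop the first *j* symbols and *X*<sup>[*j*]</sup> to drop the last *j*) evaluates a dictionary D and reproduces the example above:

```python
def eval_collage(D):
    """Evaluate the assignments of a collage system, computing X.u for
    every token X.  Each assignment is a (name, op, args) triple."""
    u = {}
    for name, op, args in D:
        if op == 'prim':              # a for a in Sigma (or the empty string)
            u[name] = args
        elif op == 'concat':          # X_i X_j
            u[name] = u[args[0]] + u[args[1]]
        elif op == 'prefix_trunc':    # [j]X_i: drop the first j symbols
            x, j = args
            u[name] = u[x][j:]
        elif op == 'suffix_trunc':    # X_i[j]: drop the last j symbols
            x, j = args
            u[name] = u[x][:len(u[x]) - j]
        elif op == 'repeat':          # (X_i)^j
            x, j = args
            u[name] = u[x] * j
    return u

# The example from the text:
D = [('X1', 'prim', 'a'),
     ('X2', 'prim', 'b'),
     ('X3', 'concat', ('X1', 'X2')),
     ('X4', 'repeat', ('X3', 3)),
     ('X5', 'suffix_trunc', ('X4', 1))]
u = eval_collage(D)
# u['X4'] == 'ababab' and u['X5'] == 'ababa'
```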

Let the number of assignments in D be the *size* of D, denoted ||D||; that is, ||D|| = |*F*(D)| = *n*. For a sequence S = *X*<sub>*i*<sub>1</sub></sub>, *X*<sub>*i*<sub>2</sub></sub>, ..., *X*<sub>*i*<sub>k</sub></sub> of tokens defined in D, we denote by |S| the number *k* of tokens in S.

The collage system ⟨D, S⟩ represents the string obtained by concatenating *X*<sub>*i*<sub>1</sub></sub>.*u*, ..., *X*<sub>*i*<sub>k</sub></sub>.*u*. That is, D and S correspond to the dictionary and the compressed text of a compression method, respectively. Both D and S can be encoded in various ways; the compression ratio therefore depends on their encoded sizes rather than on ||D|| and |S|.

A collage system is said to be *truncation-free* if D contains no truncation operation. A collage system is said to be *regular* if D contains neither truncation nor repetition operations. A regular collage system is said to be *simple* if for every assignment *X* = *Y Z*, |*Y*.*u*| = 1 or |*Z*.*u*| = 1.

#### *7.4.3 Pattern Matching on Collage Systems*

The basic idea of pattern matching on a collage system is to simulate the movement of the KMP automaton on the uncompressed text. Using the state transition function δ<sup>∗</sup> of the KMP automaton defined in Sect. 7.4.1, we define the function *Jump* : *Q* × *F*(D) → *Q* as follows:

$$
Jump(j, X) = \delta^*(j, X.u).
$$

The intent of *Jump* is to simulate, in a single jump upon receiving token *X*, the state transitions that the original KMP automaton would make while reading *X*.*u*. Moreover, for any *j* ∈ *Q* and *X* ∈ *F*(D), we define the set *Output*(*j*, *X*) = *Occ<sub>P</sub>*(*P*[1 : *j*] · *X*.*u*), where *Occ<sub>P</sub>*(*x*) denotes the set of all positions at which *P* occurs within *x*.

**Fig. 7.3** A matching algorithm on a collage system

For a given collage system ⟨D, S⟩ representing text *T* and for a given pattern *P*, the pattern matching algorithm preprocesses the information required to calculate *Jump* and *Output* from D, and then performs matching while scanning the sequence of tokens S one by one from the head (Fig. 7.3).
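The scanning scheme of Fig. 7.3 can be sketched for a simple regular collage system as follows (our own illustration; for clarity, *Jump* and the occurrence output are computed by expanding each token's string, whereas the real algorithm of [21] precomputes them from D without full expansion):

```python
def failure(P):
    """Failure function of the KMP automaton for P (1-based states)."""
    f = [0] * (len(P) + 1)
    k = 0
    for j in range(2, len(P) + 1):
        while k > 0 and P[k] != P[j - 1]:
            k = f[k]
        if P[k] == P[j - 1]:
            k += 1
        f[j] = k
    return f

def delta(P, f, j, a):
    """State transition function with failure links resolved."""
    while j > 0 and (j == len(P) or P[j] != a):
        j = f[j]
    return j + 1 if j < len(P) and P[j] == a else 0

def collage_match(D, S, P):
    """Occurrences (0-based) of P in the text represented by <D, S>.
    D: list of (name, expr); expr is an alphabet symbol (str) or a pair
    of earlier token names, i.e. a regular collage system."""
    f = failure(P)
    u = {}                             # token name -> X.u (full expansion)
    for name, expr in D:
        u[name] = expr if isinstance(expr, str) else ''.join(u[t] for t in expr)

    j, pos, occ = 0, 0, []
    for X in S:                        # scan S from the head
        for a in u[X]:                 # Jump(j, X) = delta*(j, X.u), with
            pos += 1                   # occurrences collected on the way
            j = delta(P, f, j, a)
            if j == len(P):
                occ.append(pos - len(P))
    return occ

# A regular collage system representing "abacaba":
D = [('A', 'a'), ('B', 'b'), ('C', 'c'), ('X', ('A', 'B')), ('Y', ('X', 'A'))]
S = ['Y', 'C', 'Y']
```

Note that occurrences crossing token boundaries (such as a match of `"cab"` spanning the tokens `C` and `Y`) are found naturally, because the automaton state carries over from one token to the next.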

From the results of [21], the following theorem is obtained.

**Theorem 1** (Theorem 3 of [21]) *For a collage system* ⟨D, S⟩*, the compressed pattern matching problem can be solved in* O(||D|| + |S| + *m*<sup>2</sup> + *R*) *time using* O(||D|| + *m*<sup>2</sup>) *space if* ⟨D, S⟩ *is regular, where m is the pattern length and R is the number of occurrences of pattern P in the text represented by* ⟨D, S⟩*.*

This theorem applies to both RePair and Repair-VF because texts compressed by these can be described by regular collage systems.

#### **7.5 Repair-VF**

This section first introduces RePair and then gives an outline of Repair-VF. Repair-VF has a structure that combines RePair with a fixed-length coding. Please refer to the literature [56] for the details of Repair-VF and experimental results for its performance.

#### *7.5.1 RePair*

RePair is a grammar compression algorithm proposed by Larsson and Moffat [26]. For an input text *T*, let *G* = {*V*, Σ, *S*, *R*} be the grammar constructed by RePair. The RePair procedure can be described by the following steps:

**Step 1**. Replace each symbol *a* ∈ Σ with a new variable v<sub>*a*</sub> and add v<sub>*a*</sub> → *a* to *R*.
**Step 2**. Find the most frequent pair *p* of adjacent symbols in *T*.
**Step 3**. Replace every occurrence of *p* in *T* with a new variable v and add v → *p* to *R*; return to Step 2 while the frequency of the most frequent pair is greater than 1.
**Step 4**. Add *S* → *T*′ to *R*, where *T*′ is the final text obtained by the replacements.


**Fig. 7.4** Example of the grammar generation process of RePair for *T* = abracadabra. The generated grammar is {{v<sub>a</sub>, v<sub>b</sub>, v<sub>r</sub>, v<sub>c</sub>, v<sub>d</sub>, v<sub>1</sub>, v<sub>2</sub>, v<sub>3</sub>, *S*}, {a, b, r, c, d}, *S*, {v<sub>a</sub> → a, v<sub>b</sub> → b, v<sub>r</sub> → r, v<sub>c</sub> → c, v<sub>d</sub> → d, v<sub>1</sub> → v<sub>a</sub>v<sub>b</sub>, v<sub>2</sub> → v<sub>1</sub>v<sub>r</sub>, v<sub>3</sub> → v<sub>2</sub>v<sub>a</sub>, *S* → v<sub>3</sub>v<sub>c</sub>v<sub>a</sub>v<sub>d</sub>v<sub>3</sub>}} with a size of 16


Figure 7.4 illustrates an example of the grammar generation process of RePair.
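A naive Python rendering of Steps 1-4 is given below (our own exposition; ties among equally frequent pairs are broken by the leftmost first occurrence, which reproduces the grammar of Fig. 7.4, and the quadratic rescanning ignores the data structures that give RePair its O(*n*) expected time):

```python
def repair(T):
    """Naive RePair sketch: repeatedly replace the most frequent pair.
    (Overlapping pairs such as aa in aaa are overcounted; fine here.)"""
    rules = {}
    for a in sorted(set(T)):                 # Step 1: v_a -> a
        rules['v' + a] = (a,)
    seq = ['v' + a for a in T]
    k = 0
    while True:
        counts = {}                          # Step 2: count adjacent pairs
        for p in zip(seq, seq[1:]):
            counts[p] = counts.get(p, 0) + 1
        if not counts or max(counts.values()) < 2:
            break
        best = max(counts, key=counts.get)   # leftmost pair wins ties
        k += 1
        v = 'v%d' % k
        rules[v] = best                      # Step 3: add v -> pair and
        out, i = [], 0                       #         replace occurrences
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == best:
                out.append(v)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    rules['S'] = tuple(seq)                  # Step 4: S derives the rest
    return rules

def expand(rules, v):
    """Decompress: expand a variable back into the original text."""
    return ''.join(x if x not in rules else expand(rules, x)
                   for x in rules[v])

G = repair('abracadabra')
size = sum(len(rhs) for rhs in G.values())   # grammar size: 16, as in Fig. 7.4
```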

The following theorem relates to the performance of RePair shown by Larsson and Moffat [26].

**Theorem 2** ([26]) *RePair works in* O(*n*) *expected time and* 5*n* + 4*k*<sup>2</sup> + 4*k*′ + ⌈√(*n* + 1)⌉ − 1 *words of space, where n is the length of the source text, k denotes the cardinality of the source alphabet, and k*′ *denotes the cardinality of the final dictionary.*

#### *7.5.2 Outline of Repair-VF*

The original RePair encodes the rules in *R*, excluding *S*, using Elias gamma coding, so each codeword has a variable length, whereas Repair-VF uses a fixed-length code. The right-hand side of *S* corresponds to the compressed text and is converted into a sequence of the fixed-length codewords assigned to the rules.

Consider the number of rules to be fixed-length coded. In Step 1 of RePair, |Σ| rules are created. In addition, each iteration of Step 3 of RePair replaces the most frequent pair and at the same time adds one rule to *R*. Let *s* be the number of rules added to *R* in Step 3. The total number of rules is then |Σ| + *s*, and thus each symbol can be fixed-length encoded with ⌈log(|Σ| + *s*)⌉ bits. The information about Σ can be restored from the rules added in Step 3, so there is no need to save it explicitly. Therefore, only the rules added in Step 3 and the right-hand side of *S* added last in Step 4 need to be saved. For the former, the right-hand side of each rule consists of two symbols, so the total number of symbols to be saved is 2*s*. The length of the latter depends on the input text; let *n* be the length of the text remaining after the replacements, that is, the length of the right-hand side of *S*. The number of bits of the output compressed data is then

$$(2s+n)\lceil \log(|\Sigma|+s) \rceil\tag{7.1}$$

bits in total.

We want to find the best *s* that minimizes the total number of output bits. Note that the final output tends to become smaller as *s* increases, but only up to some point. In RePair, Step 3 is repeated until the frequency of the most frequent pair becomes 1. When a fixed-length code is used as above, this creates useless rules: increasing the number of rules can increase the bit length per symbol, resulting in a longer final output. We can eliminate this waste by terminating the process partway. However, it is difficult to determine on the fly the *s* at which the output becomes minimal: even after the output size increases for the first time, the length of *S* may become shorter by continuing Step 3, and the output size may decrease again.

Therefore, during the processing of RePair, we record the minimum value of the output size and the corresponding *s* by calculating Equation (7.1) every time a rule is added in Step 3. Note that the calculation of Equation (7.1) does not require actual coding or outputting since a fixed-length code is used. Finally, when *S* is output in Step 4, we can obtain the smallest output by outputting while expanding the rules added after the best *s*.

This is *Repair-VF* (called Repair-VF-best in the original paper). The suffix "VF" is an abbreviation of variable-to-fixed-length coding (VF coding). For the input text *T*, each rule of the output grammar *G* corresponds to a substring of *T*, and the right-hand side of *S* can be regarded as a variable-length factorization of *T*. Thus, Repair-VF can be viewed as VF coding from the viewpoint of information source coding.
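The bookkeeping for the best *s* can be sketched as follows (our own illustrative code; the snapshot numbers come from running RePair on *T* = abracadabra as in Fig. 7.4, where the sequence shrinks 11 → 9 → 7 → 5 as *s* grows from 0 to 3). For this toy text every snapshot happens to give the same size of 33 bits; on larger texts the trade-off between *s* and the sequence length produces a nontrivial minimum.

```python
def ceil_log2(x):
    """Bits of a fixed-length codeword over an alphabet of x symbols."""
    return (x - 1).bit_length()

def output_bits(s, seq_len, sigma):
    """Equation (7.1): (2s + n) * ceil(log(|Sigma| + s)) bits, where n
    (seq_len) is the length of the right-hand side of S after s rules."""
    return (2 * s + seq_len) * ceil_log2(sigma + s)

def best_s(trace, sigma):
    """trace: (s, seq_len) snapshots recorded each time a rule is added.
    Return the s minimizing the estimated output size."""
    return min(trace, key=lambda t: output_bits(t[0], t[1], sigma))[0]

# Snapshots from running RePair on T = abracadabra (|Sigma| = 5):
trace = [(0, 11), (1, 9), (2, 7), (3, 5)]
sizes = [output_bits(s, n, 5) for s, n in trace]
```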

#### **7.6 MR-Repair**

In this section we outline MR-Repair, which is a method to reduce the output grammar size by focusing on the relationship between RePair and maximal repeats. Please refer to the literature [17] for the details of MR-Repair and experimental results for its performance.

#### *7.6.1 Maximal Repeats*

Let *s* be a substring of text *T*. If the frequency of *s* is greater than 1, *s* is called a *repeat*. A *left* (or *right*) *extension* of *s* is any substring of *T* of the form w*s* (or *s*w), where w is a nonempty string over Σ. We say *s* is *left* (or *right*) *maximal* if every left (or right) extension of *s* occurs strictly fewer times in *T* than *s* does. Accordingly, *s* is a maximal repeat of *T* if *s* is both left and right maximal. In this chapter, we only consider strings of length greater than 1 as maximal repeats. For example, the substring abra of *T* = abracadabra is a maximal repeat, whereas br is not.
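The definition can be turned into a naive checker (our own quadratic sketch; it tests only single-symbol extensions, which suffices because the frequency of w*s* is at most the frequency of *cs* for the last symbol *c* of w):

```python
def maximal_repeats(T):
    """Naively enumerate the maximal repeats of T with length > 1."""
    def freq(s):
        # number of (possibly overlapping) occurrences of s in T
        return sum(1 for i in range(len(T) - len(s) + 1)
                   if T[i:i + len(s)] == s)
    result = set()
    for i in range(len(T)):
        for j in range(i + 2, len(T) + 1):
            s = T[i:j]
            f = freq(s)
            if f < 2:
                break          # longer substrings at i occur no more often
            if s in result:
                continue
            left_max = all(freq(c + s) < f for c in set(T))
            right_max = all(freq(s + c) < f for c in set(T))
            if left_max and right_max:
                result.add(s)
    return result
```

For *T* = abracadabra it returns `{"abra"}`, confirming that abra is the only maximal repeat of length greater than 1 and that br, whose right extension bra occurs just as often, is excluded.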

The following theorem describes an essential property of RePair, that is, RePair recursively replaces the most frequent maximal repeats.

**Theorem 3** (Theorem 1 of [17]) *Let T be a given text, under the condition that no most frequent maximal repeat of T appears overlapping itself. Let f be the frequency of the most frequent pairs of T, and let t be the text obtained after all pairs with frequency f in T are replaced by variables. Then there is a text s such that s is obtained after all maximal repeats with frequency f in T are replaced by variables, and s and t are isomorphic to each other.*

#### *7.6.2 MR Order*

According to Theorem 1 of [17], if there is just one most frequent maximal repeat in the current text, then RePair replaces all occurrences of it step by step. However, a problem arises if there are two or more most frequent maximal repeats and some of them overlap. In this case, the selection order of pairs (which are, of course, most frequent) determines the priority among maximal repeats. We call this order of selecting (summarizing) maximal repeats the *maximal repeat selection order* (or simply *MR-order*). Note that the selection order of pairs actually depends on the implementation of RePair. If there are several distinct most frequent pairs with overlaps, RePair constructs grammars of different sizes according to the selection order of the pairs.

However, the following theorem states that it is the MR-order, rather than the replacement order of pairs, that determines the size of the grammar generated by RePair.

**Theorem 4** (Theorem 2 of [17]) *The sizes of grammars generated by RePair are the same if they are generated in the same MR-order.*

#### *7.6.3 Algorithm*

The main strategy of the proposed method is to recursively replace the most frequent maximal repeats instead of the most frequent pairs.

**Definition 1** (*Definition 3 of* [17]) For an input text *T*, let *G* = {*V*, Σ, *S*, *R*} be the grammar generated by MR-Repair. MR-Repair constructs *G* through the following steps:

**Step 1.** Replace each symbol *a* ∈ Σ with a new variable v<sub>*a*</sub> and add v<sub>*a*</sub> → *a* to *R*.

**Step 2.** Find the most frequent maximal repeat *r* in *T* .

**Step 3.** Check if |*r*| > 2 and *r*[1] = *r*[|*r*|], and if so, use *r*[1 : |*r*| − 1] instead of *r* in Step 4.

**Step 4.** Replace every occurrence of *r* in *T* with a new variable v and add v → *r* to *R*; return to Step 2 while a repeat remains in *T*.

**Step 5.** Add *S* → *T*′ to *R*, where *T*′ is the final text obtained by the replacements.


**Fig. 7.5** Example of the grammar generation process of MR-Repair for *T* = abracadabra. The generated grammar is {{v<sub>a</sub>, v<sub>b</sub>, v<sub>r</sub>, v<sub>c</sub>, v<sub>d</sub>, v<sub>1</sub>, v<sub>2</sub>, *S*}, {a, b, r, c, d}, *S*, {v<sub>a</sub> → a, v<sub>b</sub> → b, v<sub>r</sub> → r, v<sub>c</sub> → c, v<sub>d</sub> → d, v<sub>1</sub> → v<sub>a</sub>v<sub>b</sub>v<sub>r</sub>, v<sub>2</sub> → v<sub>1</sub>v<sub>a</sub>, *S* → v<sub>2</sub>v<sub>c</sub>v<sub>a</sub>v<sub>d</sub>v<sub>2</sub>}} with a size of 15


Figure 7.5 shows an example of the grammar generation process of MR-Repair. As shown in this figure, the size of the grammar generated by MR-Repair is smaller than that generated by RePair, shown in Fig. 7.4.
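The steps above can be sketched naively in Python (our own exposition; maximal repeats are found by brute force, ties are broken toward higher frequency and then greater length, and the final rule for *S* is added as in RePair):

```python
def mr_repair(T):
    """Naive MR-Repair sketch: recursively replace the most frequent
    maximal repeat (quadratic brute force; for exposition only)."""
    def freq(seq, s):
        m = len(s)
        return sum(1 for i in range(len(seq) - m + 1) if seq[i:i + m] == s)

    def top_maximal_repeat(seq):
        # Most frequent maximal repeat of length > 1; ties favor length.
        alphabet = set(seq)
        best, best_key = None, (1, 0)
        for i in range(len(seq)):
            for j in range(i + 2, len(seq) + 1):
                s = seq[i:j]
                f = freq(seq, s)
                if f < 2:
                    break  # extending right cannot raise the frequency
                if all(freq(seq, (c,) + s) < f for c in alphabet) and \
                   all(freq(seq, s + (c,)) < f for c in alphabet):
                    if (f, len(s)) > best_key:
                        best, best_key = s, (f, len(s))
        return best

    rules = {}
    for a in sorted(set(T)):                 # Step 1: v_a -> a
        rules['v' + a] = (a,)
    seq = tuple('v' + a for a in T)
    k = 0
    while True:
        r = top_maximal_repeat(seq)          # Step 2
        if r is None:
            break
        if len(r) > 2 and r[0] == r[-1]:     # Step 3: avoid self-overlap
            r = r[:-1]
        k += 1
        v = 'v%d' % k
        rules[v] = r                         # Step 4: add the rule and
        out, i = [], 0                       #         replace occurrences
        while i < len(seq):
            if seq[i:i + len(r)] == r:
                out.append(v)
                i += len(r)
            else:
                out.append(seq[i])
                i += 1
        seq = tuple(out)
    rules['S'] = seq                         # Step 5: S derives the rest
    return rules

def expand(rules, v):
    """Decompress: expand a variable back into the original text."""
    return ''.join(x if x not in rules else expand(rules, x)
                   for x in rules[v])

G = mr_repair('abracadabra')
size = sum(len(rhs) for rhs in G.values())   # grammar size: 15, as in Fig. 7.5
```

On *T* = abracadabra the sketch first trims the maximal repeat abra to abr (Step 3) and reproduces the size-15 grammar of Fig. 7.5.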

**Theorem 5** (Theorem 5 of [17]) *Assume that RePair and MR-Repair work based on the same MR-order for a given text. Let g<sub>rp</sub> and g<sub>mr</sub> be the sizes of the grammars generated by RePair and MR-Repair, respectively. Then,* (1/2) *g<sub>rp</sub>* < *g<sub>mr</sub>* ≤ *g<sub>rp</sub>* *holds.*

#### **7.7 Conclusion**

In this chapter, we outlined research on compressed pattern matching and showed that pattern matching can be sped up by selecting a suitable compression method. This realization led to the development of compression methods designed for pattern matching. Whereas the initially developed compression methods had low compression performance, Repair-VF [56] achieves both a good compression ratio and good matching speed by combining advanced grammar compression with a fixed-length code. Collage systems [21] provide a unified algorithm for compressed pattern matching, allowing us to obtain an efficient pattern matching algorithm for grammar compression as an instance of the unified algorithm.

Since proposing Repair-VF, we have proposed several improvements to it. LT-Repair [35] makes RePair processing semi-online by adding a constraint called the *left-tall condition* to its grammar, which makes it possible to efficiently compress large-scale text data with a small amount of memory.

MR-Repair [17], which we have recently proposed, is a method that reduces the output grammar size by focusing on the relationship between RePair and maximal repeats. Although heuristic improvements [4, 20, 28, 36] focusing on non-maximal repetitive substrings have previously been proposed, MR-Repair is superior in that it provably generates smaller grammars than the original RePair. A topic for future work is to see whether compressed pattern matching using these methods can be performed efficiently.

In the present work, we mainly explained the compressed pattern matching problem based on text scanning. However, the *succinct index* technology, which combines text indexing and data compression, was also established around 2000. This indexing technology utilizes succinct data structures, which answer queries using space close to the information-theoretic lower bound. A succinct index has the excellent property of allowing full-text search while compressing the target text to a size smaller than the original. For details of this technology, refer to the excellent book by Navarro [37].

As for online grammar compression methods, there exist FOLCA, proposed by Maruyama et al. [33], and its improvement SOLCA, proposed by Takabatake et al. [50]. FOLCA is based on a string factorization called edit-sensitive parsing; it performs factorization and grammar generation in parallel while reading the input text sequentially from the beginning. It is known that the *straight-line programs* (restricted CFGs) generated by FOLCA can be used as index structures [5].

In recent years, Martinez et al. [31] proposed a novel compression method called Marlin, along with an improvement of it [32]. These methods achieve both ultrahigh-speed decompression and good compression ratios. If compressed data can be decompressed at sufficiently high speed, pattern matching can be performed efficiently even after decompressing the data. Comparative studies of these approaches are also left for future work.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Chapter 8 Orthogonal Range Search Data Structures**

**Kazuki Ishiyama and Kunihiko Sadakane**

**Abstract** We first review existing space-efficient data structures for the orthogonal range search problem. Then, we propose two improved data structures, the first of which has better query time complexity than the existing structures and the second of which has better space complexity that matches the information-theoretic lower bound.

#### **8.1 Introduction**

Consider a set *P* of *n* points in the *d*-dimensional space **R**<sup>*d*</sup>. Given an orthogonal range *Q* = [*l*<sub>0</sub><sup>(*Q*)</sup>, *u*<sub>0</sub><sup>(*Q*)</sup>] × [*l*<sub>1</sub><sup>(*Q*)</sup>, *u*<sub>1</sub><sup>(*Q*)</sup>] × ··· × [*l*<sub>*d*−1</sub><sup>(*Q*)</sup>, *u*<sub>*d*−1</sub><sup>(*Q*)</sup>], the problem of answering queries for information on *P* ∩ *Q*, the subset of *P* contained in the range *Q*, is called the *orthogonal range search* problem, and is one of the fundamental problems in computational geometry.

The information obtained about *P* ∩ *Q* differs depending on the query. The most basic queries are the *reporting* query, which enumerates all the points in *P* ∩ *Q*, and the *counting* query, which returns the number of points |*P* ∩ *Q*|. There are other queries such as the *emptiness* query, which checks whether *P* ∩ *Q* is empty or not, and aggregate queries, which compute the summation, average, or variance of weights of points in the query range.

Applications of the orthogonal range search problem include database searches [21]. For example, given a database of employees of a company, a query to count the number of employees whose duration of service is at least *x*<sub>1</sub> years and at most *x*<sub>2</sub> years, whose age is at least *y*<sub>1</sub> and at most *y*<sub>2</sub>, and whose annual income is at least *z*<sub>1</sub> and at most *z*<sub>2</sub> can be formalized as an orthogonal range search problem. Other applications include geographical information systems, CAD, and computer graphics.
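As a baseline, both basic query types can be answered by a naive scan in O(*dn*) time per query (a sketch of ours, with the employee example encoded as hypothetical (service, age, income) triples):

```python
def in_range(p, Q):
    """Check p in Q for Q given as a list of (l_i, u_i) intervals; O(d)."""
    return all(l <= x <= u for x, (l, u) in zip(p, Q))

def count_query(P, Q):
    """Counting query |P ∩ Q| by a naive O(dn)-time scan."""
    return sum(1 for p in P if in_range(p, Q))

def report_query(P, Q):
    """Reporting query: enumerate the points of P ∩ Q."""
    return [p for p in P if in_range(p, Q)]

# Hypothetical employee records as (service years, age, income):
P = [(3, 28, 400), (10, 41, 700), (7, 35, 550), (20, 58, 900)]
# service in [x1, x2] = [5, 15], age in [30, 45], income in [500, 800]:
Q = [(5, 15), (30, 45), (500, 800)]
```

The data structures discussed below exist precisely to beat this linear scan when many queries are asked on the same point set.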

K. Ishiyama · K. Sadakane (B)

Graduate School of Information Science and Technology, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan e-mail: sada@mist.i.u-tokyo.ac.jp

<sup>©</sup> The Author(s) 2022

N. Katoh et al. (eds.), *Sublinear Computation Paradigm*, https://doi.org/10.1007/978-981-16-4095-7\_8

In such applications, it is common to perform multiple queries on the same point set *P*. We therefore formulate the problem as an indexing problem: given a point set *P* a priori, we first construct a data structure *D* from *P*; then, when a query range *Q* is given, we answer the query using *D*.

#### *8.1.1 Existing Work*

In many existing works, the number *n* of points is regarded as a variable for evaluating time complexity and the number *d* of dimensions is regarded as a constant. In this chapter, however, we regard *d* as a variable too. For the computation model, we use the w-bit word RAM with w = Θ(lg *n*). That is, a constant number of coordinate values can be treated in constant time. It then takes O(*d*) time to check whether a point is inside a query range.

If more than Θ(*dn*) words of space are allowed for the data structure and *d* is assumed to be a constant, then we can perform counting and reporting queries in time polynomial in log *n*. *Range trees* [2, 14, 15, 23] are such data structures. Using the fractional cascading technique [15, 23], range trees support counting queries in O(*d* log<sup>*d*−1</sup> *n*) time and reporting queries in O(*d* log<sup>*d*−1</sup> *n* + *dk*) time using O(*dn* log<sup>*d*−1</sup> *n*) words of space, where *k* = |*P* ∩ *Q*|, that is, the number of points enumerated by a reporting query. Although these data structures are time-efficient, it is desirable to develop more space-efficient data structures.

Some data structures with linear space complexity have been proposed. For example, quad trees [6] were the first data structures used for orthogonal range search. Unfortunately, quad trees have terrible worst-case behavior. To overcome this, the kd-tree [1] is used. The query time complexity of the kd-tree is O(*d*<sup>2</sup>*n*<sup>(*d*−1)/*d*</sup>) for counting and O(*d*<sup>2</sup>*n*<sup>(*d*−1)/*d*</sup> + *dk*) for reporting [13].

These data structures store the coordinates of points separately in plain form, and can therefore be applied to the case of real-valued coordinates. However, if the coordinates take integer values from 0 to *n* − 1, then there exist data structures with even smaller space and query time complexity. For example, Chazelle [4] proposed a data structure for the two-dimensional case with linear space complexity and time complexity of O(lg *n*) for counting and O(lg *n* + *k* lg<sup>ε</sup> *n*) for reporting, where 0 < ε < 1 is any constant. Note that although the assumption that each coordinate value is an integer from 0 to *n* − 1 seems too strict, as explained in Sect. 8.2.2, any orthogonal range search problem in *d*-dimensional space can be reduced to one on the [*n*]<sup>*d*</sup> grid, so the assumption does not create any difficulties.
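The reduction to the [*n*]<sup>*d*</sup> grid can be sketched as follows (our own illustration of the standard rank-space technique; the details in Sect. 8.2.2 may differ): each coordinate is replaced by its rank along its axis, and a query interval [*l*, *u*] is mapped with binary search so that containment is preserved.

```python
import bisect

def to_rank_space(P):
    """Replace every coordinate of the points in P by its rank along its
    axis, mapping P onto the [n]^d grid (ranks 0..n-1 here)."""
    d = len(P[0])
    axes = [sorted(p[i] for p in P) for i in range(d)]
    reduced = [tuple(bisect.bisect_left(axes[i], p[i]) for i in range(d))
               for p in P]
    return reduced, axes

def reduce_query(Q, axes):
    """Translate Q = [(l_i, u_i)] to grid coordinates so that the reduced
    query contains exactly the reduced images of the points in P ∩ Q."""
    return [(bisect.bisect_left(ax, l), bisect.bisect_right(ax, u) - 1)
            for (l, u), ax in zip(Q, axes)]

P = [(1.5, 10), (3.2, 7), (2.7, 20)]
reduced, axes = to_rank_space(P)
Qr = reduce_query([(2.0, 3.5), (5, 15)], axes)   # original query ranges
```

A point lies in the original query range exactly when its rank-space image lies in the reduced range, so any grid data structure can then be used unchanged.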

There has also been research on succinct data structures for the orthogonal range search problem. The wavelet tree [9] was originally proposed for representing compressed suffix arrays, and it later turned out that it can support various queries efficiently [18]. For the orthogonal range search problem, the wavelet tree supports counting queries in $O(\lg n)$ time and reporting queries in $O((1 + k)\lg n)$ time [8]. Bose et al. [3] proposed improved succinct data structures that support counting queries in $O(\lg n/\lg\lg n)$ time and reporting queries in $O((1 + k)\lg n/\lg\lg n)$ time for the two-dimensional case.

For higher dimensions, Okajima and Maruyama [20] proposed the KDW-tree, a succinct data structure for any dimensionality whose query time complexity is smaller than that of the kd-tree. If we assume *d* is a constant, counting queries take $O(n^{\frac{d-2}{d}} \lg n)$ time and reporting queries take $O((n^{\frac{d-2}{d}} + k)\lg n)$ time. The KDW-tree has been shown to be practical by numerical experiments.

#### *8.1.2 Our Results*

Table 8.1 compares the space and time complexities of the data structures for the orthogonal range search problem explained in Sect. 8.1.1 with those of our proposed data structures. Note that these results are for the case where the coordinates are integers from 0 to *n* − 1, and the space complexities are measured in bits. Table 8.1 shows reporting time complexities; counting time complexities are obtained by letting *k* = 0.

Our data structures are space-efficient for high-dimensional orthogonal range search problems.

Our first data structure has the same space complexity as the KDW-tree and better query time complexities. Note that the result in Table 8.1 is for the case of *d* ≥ 3. If *d* = 2, we can improve the $n^{\frac{d-2}{d}}$ term to lg *n*. This result appeared in [11].

Note that, as shown in Sect. 8.2.1, the space necessary to represent a set of *n* points in *d*-dimensional space such that each coordinate takes an integer value from 0 to *n* − 1 is $(d-1)n \lg n + \Theta(n)$ bits. This means that if we assume *d* is a constant, the space complexities of the KDW-tree and our first data structure do not match the information-theoretic lower bound asymptotically.


**Table 8.1** Comparison of complexities. The results for the KDW-tree and Ours 1 are for *d* ≥ 3. Note that *k* is the number of points enumerated by a reporting query. The time complexities for counting queries are obtained by letting *k* = 0 in the time complexities for reporting queries

Our second data structure uses $(d-1)n \lg n + (d-1) \cdot o(n \lg n)$ bits of space. This asymptotically matches the information-theoretic lower bound even if *d* is assumed to be a constant; therefore, we can say this data structure is truly succinct. Unfortunately, the worst-case query time complexity is $O(dn \lg n)$, which is not fast in theory. However, this data structure is fast in practice when the number *d* of dimensions is large but the number of dimensions actually used in a query is small. This kind of query often occurs in the database search applications shown in Sect. 8.1. This result appeared in [10].

#### **8.2 Preliminaries**

In this chapter, we assume that the coordinates of points are non-negative integers. As will be explained in Sect. 8.2.2, we sometimes assume that coordinates are integers from 0 to *n* − 1; we therefore define [*n*] as the set {0, 1, ..., *n* − 1}. For a *d*-dimensional space, we denote the dimensions by dim. 0, dim. 1, ..., dim. *d* − 1, and the coordinate values of a point by the 0-th, 1-st, ..., (*d* − 1)-th coordinate values. For a rooted tree, we assume the depth of the root node is 0. Throughout the chapter, log *x* denotes the natural logarithm and lg *x* denotes the base-2 logarithm.

Next, we define two concepts used in this chapter. The first one is the containment degree, a concept describing the inclusion relationship between two orthogonal ranges introduced in [20]. For two *d*-dimensional orthogonal ranges $Q = \left[l_0^{(Q)}, u_0^{(Q)}\right] \times \cdots \times \left[l_{d-1}^{(Q)}, u_{d-1}^{(Q)}\right]$ and $R = \left[l_0^{(R)}, u_0^{(R)}\right] \times \cdots \times \left[l_{d-1}^{(R)}, u_{d-1}^{(R)}\right]$, we define CDeg(*R*, *Q*) as

$$\text{CDeg}(R,\,\mathcal{Q}) = \# \left\{ i \in [d] \; \Big|\; \left[l\_i^{(R)}, u\_i^{(R)}\right] \subseteq \left[l\_i^{(\mathcal{Q})}, u\_i^{(\mathcal{Q})}\right] \right\}$$

and call it the containment degree of *R* with respect to *Q*. This is the number of dimensions in which the interval of *R* is contained in that of *Q*. The containment degree is an important concept for analyzing the time complexities of orthogonal range search algorithms.
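As a small illustration (our own code, not from the chapter; a range is represented as a list of (lower, upper) pairs, one per dimension), CDeg can be computed directly from the definition:

```python
def cdeg(R, Q):
    """Containment degree CDeg(R, Q): the number of dimensions i for which
    the interval [l_i^(R), u_i^(R)] of R is contained in the interval
    [l_i^(Q), u_i^(Q)] of Q. Ranges are lists of (lower, upper) pairs."""
    return sum(1 for (lr, ur), (lq, uq) in zip(R, Q) if lq <= lr and ur <= uq)
```

For example, a three-dimensional range contained in *Q* in two of its three dimensions has containment degree 2.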

Next, we explain the *z*-value. This is a projection of multi-dimensional data onto one-dimensional data proposed by Morton [17]. Consider a point $p = (p_0, p_1, \ldots, p_{d-1})$ in *d*-dimensional space whose coordinate values are integers. If the coordinate values are expressed as *l*-bit binary numbers $p_0 = b_0^0 b_0^1 \cdots b_0^{l-1}$, $p_1 = b_1^0 b_1^1 \cdots b_1^{l-1}$, ..., $p_{d-1} = b_{d-1}^0 b_{d-1}^1 \cdots b_{d-1}^{l-1}$, the *z*-value *z*(*p*) of point *p* is defined as

$$z(p) = b\_0^0 b\_1^0 \cdots b\_{d-1}^0 \, b\_0^1 b\_1^1 \cdots b\_{d-1}^1 \cdots b\_0^{l-1} b\_1^{l-1} \cdots b\_{d-1}^{l-1}.$$

In the case of a two-dimensional space, if we arrange grid points in increasing order of *z*-value, we see a z-shape curve as shown in Fig. 8.1. We therefore call the value *z*-value.

**Fig. 8.1** Curve obtained by joining grid points in *z*-value order in two-dimensional space
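The bit interleaving in the definition above can be sketched as follows (an illustrative Python sketch; the function name is ours):

```python
def z_value(p, l):
    """Morton z-value of point p = (p_0, ..., p_{d-1}), whose coordinates
    are l-bit integers: interleave the bits, taking the most significant
    bit of each of the d coordinates first."""
    d = len(p)
    z = 0
    for j in range(l):                       # bit position, MSB (j = 0) first
        for i in range(d):                   # emits b_0^j b_1^j ... b_{d-1}^j
            z = (z << 1) | ((p[i] >> (l - 1 - j)) & 1)
    return z
```

Sorting two-dimensional grid points by this value produces the z-shaped curve of Fig. 8.1.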

## *8.2.1 Succinct Data Structures and Information-Theoretic Lower Bound*

Succinctness of data structures was proposed by Jacobson [12] and is one of the criteria for measuring space complexities of data structures. It is defined as follows.

Let *n* be the number of different values that an object can take. Then, we need at least lg *n* bits of space to represent the object. If the space complexity *S*(*n*) of a data structure representing the object satisfies *S*(*n*) = lg *n* + o(lg *n*) bits, we say the data structure is *succinct* and lg *n* bits is the *information-theoretic lower bound* of the size of representations of the object. Note that succinct data structures not only offer data compression, but also support some efficient queries. For orthogonal range search, a naive algorithm supports linear time queries by scanning an array containing coordinate values of points. Succinct data structures are therefore expected to answer queries in sublinear time.

The space complexity of lg *n* + o(lg *n*) bits in the definition of succinct data structures indicates that the size of auxiliary indexing data structures added to the data is negligibly small compared with the size of the data itself (lg *n* bits). In other words, the space complexity of succinct data structures asymptotically matches the information-theoretic lower bound when *n* → ∞.

We compute the information-theoretic lower bound for representing a set of points with integer coordinates. Assume that the *i*-th coordinate value takes integer values from 0 to $U_i - 1$. Because the number of grid points is $\prod_{i=0}^{d-1} U_i$, the number of different sets of *n* points is

$$
\binom{\prod\_{i=0}^{d-1} U\_i}{n}.
$$

By using Stirling's approximation formula

$$
\log n! = n \log n - n + \mathcal{O}(\log n),
$$

we obtain

$$\begin{split} \log\binom{U}{n} &= \log U! - \log(U-n)! - \log n! \\ &= U \log U - U - (U-n)\log(U-n) + (U-n) - n\log n + n + \mathcal{O}(\log U) \\ &= U \log\frac{U}{U-n} + n \log \frac{U-n}{n} + \mathcal{O}(\log U) \\ &= U \log\left(1 + \frac{n}{U-n}\right) + n \log \frac{U}{n} \left(1 - \frac{n}{U}\right) + \mathcal{O}(\log U) \\ &= U \left(\frac{n}{U-n} - \Theta\left(\left(\frac{n}{U-n}\right)^2\right)\right) + n \log \frac{U}{n} - \Theta\left(\frac{n^2}{U}\right) + \mathcal{O}(\log U) \\ &= n \log U - n \log n + \Theta\left(n\right). \end{split}$$

Therefore, the information-theoretic lower bound of the size for representing the point set is

$$\lg\binom{\prod\_{i=0}^{d-1} U\_i}{n} = \sum\_{i=0}^{d-1} n \lg U\_i - n \lg n + \Theta\left(n\right).$$

Note that storing the coordinate values of the points explicitly, using $\sum_{i=0}^{d-1} \lg U_i$ bits per point, uses $n \lg n$ more bits of space than the information-theoretic lower bound.
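The lower bound above can be checked numerically; the following sketch (with hypothetical sizes chosen by us) computes the exact value of $\lg\binom{\prod U_i}{n}$ via the log-gamma function and compares it with $\sum_i n \lg U_i - n \lg n$:

```python
import math

def lg_binom(U, n):
    # lg C(U, n), computed via the log-gamma function
    return (math.lgamma(U + 1) - math.lgamma(U - n + 1)
            - math.lgamma(n + 1)) / math.log(2)

# Hypothetical sizes: d = 3 dimensions, U_i = 2**16 for every i, n = 1000 points.
d, U, n = 3, 2 ** 16, 1000
exact = lg_binom(U ** d, n)                        # lg C(prod U_i, n)
approx = d * n * math.log2(U) - n * math.log2(n)   # sum_i n lg U_i - n lg n
gap = exact - approx                               # the Theta(n) lower-order term
```

The gap between the exact count and the approximation is indeed linear in *n*, as the derivation predicts.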

#### *8.2.2 Assumptions on Point Sets*

Because data structures such as the kd-tree or range trees, which have linear or larger space complexities, usually store the coordinates of points in a plain format, it does not matter whether the coordinates are integers or real values. However, when we consider succinct data structures, we usually assume that the coordinate values are integers from 0 to *n* − 1. We also assume that for any points *p*, *q* ∈ *P* and any *i* ∈ [*d*], the *i*-th coordinate value $p_i$ of *p* and the *i*-th coordinate value $q_i$ of *q* are different. Although this assumption may appear unrealistic and too strong, for the orthogonal range search problem it is known that an arbitrary point set on $\mathbb{R}^d$ can be transformed into a point set on $[n]^d$ [7].

Consider a set *P* of *n* points on $\mathbb{R}^d$. We create another point set *P*′ on $[n]^d$ as follows. The set *P*′ also contains *n* points, and there is a one-to-one correspondence between the points in *P* and the points in *P*′. Assume that *p* ∈ *P* corresponds to *p*′ ∈ *P*′. Then, the *i*-th coordinate value $p'_i$ of *p*′ is defined from the *i*-th coordinate value $p_i$ of *p* as

#### 8 Orthogonal Range Search Data Structures 127

$$p'\_i = \#\{q \in P \mid q\_i < p\_i\}.\tag{8.1}$$

That is, the *i*-th coordinate value of *p*′ is the number of points in *P* whose *i*-th coordinate value is smaller than $p_i$. This is called the rank value of *p* with respect to the *i*-th coordinate, and the transformation is called the transformation into rank space. We use arrays $C_0, C_1, \ldots, C_{d-1}$, each of length *n*; the array $C_i$ stores the *i*-th coordinate values of the points in *P* in increasing order.

By using the point set *P*′ on the rank space and the arrays $C_i$ (*i* = 0, ..., *d* − 1) that contain the original coordinate values of the points in *P*, we can reduce the problem of orthogonal range search on the original point set *P* to that on *P*′. Assume that a query range $Q = \left[l_0^{(Q)}, u_0^{(Q)}\right] \times \cdots \times \left[l_{d-1}^{(Q)}, u_{d-1}^{(Q)}\right] \subset \mathbb{R}^d$ is given for the point set *P*. From the construction of *P*′, there exists a range $Q' = \left[l_0^{(Q')}, u_0^{(Q')}\right] \times \cdots \times \left[l_{d-1}^{(Q')}, u_{d-1}^{(Q')}\right] \subset [n]^d$ such that

$$p \in \mathcal{Q} \iff p' \in \mathcal{Q'}.$$

The boundaries of this *Q*′ are computed by

$$\begin{aligned} l\_i^{(\mathcal{Q}')} &= \# \left\{ p \in P \; \middle|\; p\_i < l\_i^{(\mathcal{Q})} \right\} \\ u\_i^{(\mathcal{Q}')} &= \# \left\{ p \in P \; \middle|\; p\_i \le u\_i^{(\mathcal{Q})} \right\} - 1. \end{aligned}$$

These are computed in O(*d* lg *n*) time by binary searches on the arrays $C_i$. Then, a counting query is performed using *Q*′. For a reporting query, after finding a point *p*′ ∈ *P*′ included in the query range *Q*′ in the rank space, we need to recover the original coordinates of the corresponding point *p* ∈ *P*. This is done in O(*d*) time, using the arrays $C_i$ containing the coordinates of the original points, by

$$p\_i = C\_i[p'\_i].$$

Thus, an orthogonal range search problem on $\mathbb{R}^d$ can be transformed into one on $[n]^d$. Note that if coordinates are transformed as in Eq. (8.1), identical coordinate values in $\mathbb{R}^d$ are transformed into identical coordinate values in $[n]^d$. By shifting such identical coordinate values by one, we can transform the coordinate values so that for any two distinct points *p*′, *q*′ ∈ *P*′ and any *i* ∈ [*d*], the *i*-th coordinate value $p'_i$ of *p*′ is different from the *i*-th coordinate value $q'_i$ of *q*′.
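The transformation into rank space (Eq. (8.1)) and the mapping of a query range by binary search can be sketched as follows (function names are ours, not from the chapter):

```python
from bisect import bisect_left, bisect_right

def to_rank_space(P):
    """Transform a d-dimensional point set P into rank space [n]^d (Eq. (8.1)).
    Returns the transformed points and the sorted coordinate arrays C_i."""
    d = len(P[0])
    C = [sorted(p[i] for p in P) for i in range(d)]
    P_prime = [tuple(bisect_left(C[i], p[i]) for i in range(d)) for p in P]
    return P_prime, C

def query_to_rank_space(Q, C):
    """Map a query range Q = [(l_0, u_0), ...] into rank space:
    l'_i = #{p : p_i < l_i} and u'_i = #{p : p_i <= u_i} - 1."""
    return [(bisect_left(C[i], l), bisect_right(C[i], u) - 1)
            for i, (l, u) in enumerate(Q)]
```

A point lies in the original query range exactly when its rank-space image lies in the mapped range, and the original coordinates are recovered as $p_i = C_i[p'_i]$.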

If the original points have integer coordinate values, we can reduce the space further [19]. Consider the case where *P* is a point set on $[U]^d$, that is, each coordinate value takes an integer value from 0 to *U* − 1. In this case, the point set *P*′ in the rank space does not change. However, we store the coordinates of the original point set *P* in a different way: using multi-sets $M_0, M_1, \ldots, M_{d-1}$, each of which corresponds to one of the *d* dimensions. The multi-set $M_i$ stores the *i*-th coordinate values of the points in *P*. We use the data structure of [22] to store the multi-sets.

**Lemma 8.1** *There exists a data structure using $n \lg(U/n) + O(n)$ bits which supports a selectm query on a multi-set $M_i$ in constant time.*

A selectm query on a multi-set *M* finds the *j*-th smallest element in *M*; that is, $C_i[j]$ is obtained by finding the *j*-th smallest element in the multi-set $M_i$. Therefore, if a query range *Q* on $[U]^d$ is given, it can be transformed into a query range *Q*′ on the rank space by binary searches using selectm queries, and the original coordinate values are recovered by *d* selectm queries.
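The chapter does not spell out the structure of [22]; one classic encoding that attains the space bound of Lemma 8.1 is the Elias–Fano representation, sketched below (our own illustrative code, with a naive scan standing in for the constant-time select machinery):

```python
import math

class EliasFanoMultiset:
    """Sketch of an Elias-Fano-style encoding of a sorted multiset of n
    integers in [0, U): each value is split into about lg(U/n) low bits,
    stored verbatim, and high bits, stored as a unary-coded bit vector
    of n ones and roughly n zeros."""
    def __init__(self, values, U):            # values must be sorted
        n = len(values)
        self.l = max(0, int(math.log2(U / n)))          # width of the low parts
        self.low = [v & ((1 << self.l) - 1) for v in values]
        self.high = []
        prev = 0
        for v in values:
            h = v >> self.l
            self.high.extend([0] * (h - prev) + [1])
            prev = h

    def selectm(self, j):
        """Return the j-th smallest element (0-indexed). With a constant-time
        select structure on self.high this becomes O(1); here we scan."""
        ones, zeros = 0, 0
        for b in self.high:
            if b:
                if ones == j:
                    return (zeros << self.l) | self.low[j]
                ones += 1
            else:
                zeros += 1
        raise IndexError(j)
```

The low parts use $n \lg(U/n)$ bits and the unary-coded high parts use $O(n)$ bits, matching the bound of the lemma.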

Assume that there exists a succinct data structure *D* for a point set *P*′ on $[n]^d$. Then, the space complexity of *D* is $(d-1)n \lg n + (d-1) \cdot o(n \lg n)$ bits, as shown in Sect. 8.2.1. If we add the *d* data structures of Lemma 8.1, the total space complexity becomes $dn \lg U - n \lg n + (d-1) \cdot o(n \lg n)$ bits, which is succinct for the point set *P* on $[U]^d$. Therefore, if there exists a succinct data structure for a point set on $[n]^d$, we can construct a succinct data structure for a point set on $[U]^d$. From here onward, we consider only point sets on $[n]^d$.

#### **8.3 kd-Tree**

The kd-tree [1] is a well-known data structure that partitions the space recursively. It is used not only for the orthogonal range search problem but also for the nearest neighbor search problem.

#### *8.3.1 Construction of kd-Trees*

We explain the algorithm for constructing a kd-tree of a point set *P* for the two-dimensional case. First, we find the point *p* whose *x*-coordinate is the median of the point set *P* and store *p* at the root of the kd-tree. Next, we divide the set *P* \ {*p*} into two: the set $P_{\mathrm{left}}$ containing the points with *x*-coordinates smaller than that of *p*, and the set $P_{\mathrm{right}}$ containing the points with *x*-coordinates larger than that of *p*. We add two children $v_{\mathrm{left}}$, $v_{\mathrm{right}}$ to the root of the kd-tree. Next, from $P_{\mathrm{left}}$ ($P_{\mathrm{right}}$), we find the point $p_{\mathrm{left}}$ ($p_{\mathrm{right}}$) whose *y*-coordinate is the median of the set, and we store $p_{\mathrm{left}}$ ($p_{\mathrm{right}}$) in $v_{\mathrm{left}}$ ($v_{\mathrm{right}}$). Similarly, we divide the set $P_{\mathrm{left}} \setminus \{p_{\mathrm{left}}\}$ ($P_{\mathrm{right}} \setminus \{p_{\mathrm{right}}\}$) into two subsets according to *y*-coordinates, find the medians with respect to *x*-coordinates, store them in the children of $v_{\mathrm{left}}$ ($v_{\mathrm{right}}$), and repeat this recursively. Figure 8.2 shows an example of partitioning a point set.

For a *d*-dimensional space, we partition the space based on the first dimension, the second dimension, and so on. After partitioning the space based on the *d*-th dimension, we use the first dimension again.

**Fig. 8.2** Partitioning of a space based on the point set (left) and the corresponding kd-tree (right). Points *A*, *B*,*C*,... correspond to nodes *a*, *b*, *c*,.... The range corresponding to node *h* is shown in gray in the left figure
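The construction above can be sketched in a few lines of Python (an illustrative sketch, not the chapter's implementation; names such as `build_kd_tree` are ours):

```python
class Node:
    def __init__(self, point, left=None, right=None):
        self.point, self.left, self.right = point, left, right

def build_kd_tree(points, depth=0):
    """Build a kd-tree: at depth l, split on dimension l mod d around the
    median point, which is stored at the current node."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return Node(points[mid],
                build_kd_tree(points[:mid], depth + 1),
                build_kd_tree(points[mid + 1:], depth + 1))
```

For even-sized sets this sketch takes the upper median; re-sorting at every level is simple but not the asymptotically fastest construction.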

#### *8.3.2 Range Search Algorithm*

An important concept for understanding range searches using a kd-tree is the correspondence between nodes of the kd-tree and ranges. In Sect. 8.3.1, we explained that each node of the kd-tree stores a point. We can also consider that each node corresponds to an orthogonal range. Let *V*(v) denote the point in *P* stored in node v and *R*(v) denote the corresponding range. Then *R*(v) is defined as follows:

The range $R(v_{\mathrm{root}})$ of the root is the whole space. For a node v at depth *l*, the stored point *V*(v) defines a hyperplane orthogonal to the partitioning dimension *i* = *l* mod *d*; $R(v_{\mathrm{left}})$ is the part of *R*(v) in which the *i*-th coordinate is smaller than that of *V*(v), and $R(v_{\mathrm{right}})$ is the part in which it is larger.
For example, in Fig. 8.2, the range *R*(*h*) corresponding to node *h* is the gray area.

The algorithm for reporting queries using a kd-tree is as follows. The algorithm searches the space by traversing the tree nodes from the root. Each time a node v is visited, the algorithm checks whether the corresponding point *V*(v) (∈ *P*) is contained in the query range *Q*. If the range *R*(v) is fully contained in *Q*, the algorithm outputs all the points stored in the sub-tree rooted at v. If *R*(v) and *Q* have no intersection, the algorithm terminates the search of the sub-tree. For a counting query, instead of outputting all the points when *R*(v) is contained in *Q*, the algorithm accumulates the sizes of the sub-trees rooted at such nodes v. At first sight the algorithm may seem impossible to execute, since the range *R*(v) for node v is not explicitly stored in the kd-tree. However, if *R*(v) is known, the coordinate values of the hyperplane partitioning the range are given by the point *V*(v), so we can compute $R(v_{\mathrm{left}})$ and $R(v_{\mathrm{right}})$. Therefore, we can execute the algorithm by maintaining the range *R*(v) during the search.
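A minimal executable version of this counting search follows (our own sketch, not the chapter's code; a real kd-tree stores sub-tree sizes in the nodes instead of recomputing them as `size` does here):

```python
def build(points, depth=0):
    # kd-tree as nested tuples (point, left subtree, right subtree)
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    m = len(points) // 2
    return (points[m], build(points[:m], depth + 1), build(points[m + 1:], depth + 1))

def size(node):
    # sub-tree size; a real kd-tree would store this in each node
    return 0 if node is None else 1 + size(node[1]) + size(node[2])

def count(node, Q, region, depth=0):
    """Counting query: Q and region (= R(v)) are lists of (low, high) pairs;
    region is maintained during the search since it is not stored."""
    if node is None:
        return 0
    if all(ql <= rl and ru <= qu for (ql, qu), (rl, ru) in zip(Q, region)):
        return size(node)            # R(v) fully contained in Q: take whole subtree
    if any(ru < ql or qu < rl for (ql, qu), (rl, ru) in zip(Q, region)):
        return 0                     # R(v) and Q disjoint: prune this subtree
    point, left, right = node
    axis = depth % len(point)
    lo, hi = region[axis]
    left_region = region[:axis] + [(lo, point[axis])] + region[axis + 1:]
    right_region = region[:axis] + [(point[axis], hi)] + region[axis + 1:]
    inside = all(l <= c <= u for c, (l, u) in zip(point, Q))
    return (int(inside) + count(left, Q, left_region, depth + 1)
            + count(right, Q, right_region, depth + 1))
```

The closed child regions overlap at the splitting hyperplane, which keeps the containment and disjointness tests conservative and the count correct.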

#### *8.3.3 Complexity Analyses*

The time complexity of the kd-tree is analyzed in [13]. A counting query takes $O(d \cdot n^{\frac{d-1}{d}} + 2^d)$ time. In general, we assume *d* is a constant and write the complexity as $O(n^{\frac{d-1}{d}})$. For a reporting query, we output the coordinates of all points in *Q*. Because a point can be output in constant time, the query time complexity is $O(n^{\frac{d-1}{d}} + k)$.

If *d* ≥ lg *n*, the height lg *n* of the kd-tree is at most *d*, and therefore each dimension is used for partitioning at most once. In the worst case it is then necessary to traverse all the nodes, and a query takes O(*n*) time.

#### **8.4 Wavelet Tree**

The wavelet tree is a succinct data structure that supports various queries on strings and integer sequences efficiently. It was originally proposed for representing compressed suffix arrays [9], but it later became known that it can support many more operations [18]. Orthogonal range search in two-dimensional space is one of them [16].

#### *8.4.1 Construction*

The two-dimensional point sets *P* that can be represented directly using a wavelet tree are those whose coordinates take integer values from 0 to *n* − 1 and whose *x*-coordinate values are all distinct. As explained in Sect. 8.2.2, without loss of generality, we can transform any point set into a point set in the $[n]^d$ space. For such a two-dimensional point set *P*, consider the integer sequence *C* that contains the *y*-coordinates of the points in increasing order of their *x*-coordinates. For example, for the point set in Fig. 8.3, the corresponding integer sequence *C* is 4, 2, 7, 5, 0, 3, 1, 6. For this sequence *C*, we construct a wavelet tree as follows.

First, we consider that the root of the wavelet tree corresponds to *C*. Note that we do not store *C* itself in the wavelet tree. We then focus on the most significant (highest) bit of the lg *n*-bit binary representation of each integer in *C*. If it is 0 (1), the integer is moved into the left (right) child of the root. We consider that each child of the root corresponds to the integer sequence containing its integers in the same order as in the original sequence *C*. In the example in Fig. 8.3, the integers from 0 to 3 go to the left child and the integers from 4 to 7 go to the right child; therefore, the left child corresponds to the integer sequence 2, 0, 3, 1, and the right child to the sequence 4, 7, 5, 6.

**Fig. 8.3** A two-dimensional point set *P* (left) and the corresponding wavelet tree (right)

Next, for the integer sequence of each child node, we focus on the second most significant bit of the binary representation of each number. We move a number whose bit is 0 to the left child and a number whose bit is 1 to the right child. We repeat this until the integer sequence of a node consists of copies of a single integer.

Note that we do not store integer sequences in nodes of the wavelet tree. In each node, we store a bit string of the same length as the corresponding integer sequence. The *i*-th bit of the bit string is 0 (1) if the *i*-th integer in the integer sequence goes to the left (right) child. In other words, a bit string stored in a node of depth *l* is the concatenation of the (*l* + 1)-th highest bit of each integer in the integer sequence corresponding to the node. In the example in Fig. 8.3, the integer sequence corresponding to the root node is 4, 2, 7, 5, 0, 3, 1, 6, and because integers from 0 to 3 go to the left child and integers from 4 to 7 go to the right child, the bit string stored in the root node is 1, 0, 1, 1, 0, 0, 0, 1. Note that we do not store bit strings at leaf nodes. We show the information stored in the wavelet tree in the right tree in Fig. 8.3. Only bit strings drawn above the dark gray rectangles, that is, those in the lower row of each node, are stored.

Note that although it may seem impossible to recover the original information (the integer sequence) from these bit strings, it is possible. Consider recovering the fourth integer of the wavelet tree in Fig. 8.3 (right). From the bit string stored in the root node, we know that the first bit of the integer is 1. Because this 1 bit corresponding to the fourth integer is the third 1 in the bit string, the integer to be recovered corresponds to the third bit of the bit string in the right child of the root node. Looking at the third bit of the right child, we learn that the second bit of the integer is 0. Because this 0 bit is the second 0 in the bit string, the integer to be recovered corresponds to the second bit of the left child of the current node. Finally, from the second bit of that left child, we learn that the last bit of the integer to be recovered is 1. Therefore, the fourth integer is 101 in binary, that is, 5. This process is shown in Fig. 8.4.
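The construction and the recovery procedure can be sketched in Python as follows (our own illustrative code; splitting at the midpoint of the value interval equals splitting on the next highest bit when the alphabet size is a power of two, and naive list scans play the role of the rank queries on bit strings):

```python
def build_wavelet_tree(C, lo, hi):
    """Wavelet tree over an integer sequence C with values in [lo, hi]:
    each node is (bit string, left subtree, right subtree)."""
    if lo == hi or not C:
        return None
    mid = (lo + hi) // 2
    bits = [0 if c <= mid else 1 for c in C]
    return (bits,
            build_wavelet_tree([c for c in C if c <= mid], lo, mid),
            build_wavelet_tree([c for c in C if c > mid], mid + 1, hi))

def access(node, i, lo, hi):
    """Recover C[i]: at each node, a rank query gives the position of the
    i-th element inside the child it was routed to."""
    while node is not None:
        bits, left, right = node
        mid = (lo + hi) // 2
        if bits[i] == 0:
            i, node, hi = bits[:i + 1].count(0) - 1, left, mid     # rank0(B, i) - 1
        else:
            i, node, lo = bits[:i + 1].count(1) - 1, right, mid + 1  # rank1(B, i) - 1
    return lo
```

On the sequence of Fig. 8.3, the root bit string comes out as 1, 0, 1, 1, 0, 0, 0, 1, and recovering the fourth integer yields 5, as in the walkthrough above.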

In this recovery operation, we need to compute the number of zeros/ones in the first *i* bits of a bit string. This operation is also used in the range search algorithm in the next section. If we look at bits one by one from the beginning of a bit string, it takes O(*n*) time, which is too slow. We therefore represent the bit string of each node by the following data structure [5, 12].

**Lemma 8.2** *For a bit string B of length n, there exists a data structure using n* + o(*n*) *bits which answers a rank/select query in constant time, where the rank query* rank*<sub>b</sub>* (*B*, *i*) *counts the number of b bits (b* = 0, 1*) among B*[0], ..., *B*[*i*] *(i* ≥ 0*), and the select query* select*<sub>b</sub>* (*B*, *i*) *returns the position of the i-th b bit (i* ≥ 1*, b* = 0, 1*) in B.*
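As a toy stand-in for Lemma 8.2, the following class (our own sketch) supports rank via precomputed prefix sums and select by scanning; the succinct structures of [5, 12] achieve both in constant time with only o(*n*) extra bits:

```python
class BitVector:
    """Toy rank/select structure. rank uses precomputed prefix sums (O(n)
    words here, for clarity); the structures of Lemma 8.2 achieve both
    queries in constant time with only o(n) bits on top of the bit string."""
    def __init__(self, bits):
        self.bits = bits
        self.prefix = [0]              # prefix[i] = number of ones in bits[:i]
        for b in bits:
            self.prefix.append(self.prefix[-1] + b)

    def rank(self, b, i):
        """Number of b bits among bits[0..i]."""
        ones = self.prefix[i + 1]
        return ones if b == 1 else i + 1 - ones

    def select(self, b, j):
        """Position of the j-th b bit (j >= 1)."""
        for pos, bit in enumerate(self.bits):
            if bit == b:
                j -= 1
                if j == 0:
                    return pos
        raise ValueError("fewer than j occurrences of b")
```

On the root bit string of Fig. 8.3, for example, rank₀(*B*, 6) = 4 and select₀(*B*, 3) = 5, the values used in the range search examples below.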

The select query is also necessary for range searches using a wavelet tree.

#### *8.4.2 Range Search Algorithm*

We explain how to solve the two-dimensional range search problem using a wavelet tree. First, we explain the counting query, which is performed by the recursive function in Algorithm 1. For a query range *Q* = [*l*, *r*] × [*b*, *t*], the initial call is WTCounting($l, r, b, t, v_{\mathrm{root}}, 0, 2^{\lg n} - 1$), where $v_{\mathrm{root}}$ is the root node of the wavelet tree. The left (right) child of a node v is denoted $v_{\mathrm{left}}$ ($v_{\mathrm{right}}$), and the bit string stored in node v is denoted v.*B*.

We explain the algorithm in Fig. 8.5 using the example of searching a range *Q* = [1, 6]×[1, 4] for the point set *P* in Fig. 8.3.

The search algorithm traverses the tree from the root. During the search, the algorithm keeps the interval *I* of an integer sequence (or bit string) corresponding to an interval of the *x*-coordinate of the query range. In the example in Fig. 8.5, we focus on the interval *I* = [1, 6] at the root node. To move to the left child, we need to compute the interval corresponding to the query range. This is done by a rank query that counts the number of zeros from the beginning of the bit string to a specified position. In the bit string stored in the root node, the number of zeros from the beginning to the 0-th position (in general, if the interval is *I* = [*l*,*r*], to (*l* − 1)-th

**Fig. 8.5** Behavior of the algorithm when searching the range [1, 6] × [1, 4] for the two-dimensional point set in Fig. 8.3

**Algorithm 1** WTCounting($x_1, x_2, y_1, y_2, v, a, b$)

**Input:** A node v of the wavelet tree and an interval [*x*1, *x*2] in the corresponding bit string, the interval [*a*, *b*] of *y* coordinate corresponding to node v, and the interval [*y*1, *y*2] of *y* coordinate for the query range.

**Output:** The number of points stored in the sub-tree rooted at v and contained in *Q*.

```
1: if x1 > x2 then
2:   return 0
3: else if [a, b] ∩ [y1, y2] = ∅ then
4:   return 0
5: else if [a, b] ⊆ [y1, y2] then
6:   return x2 − x1 + 1
7: end if
8: xl1 ← rank0(v.B, x1 − 1)
9: xl2 ← rank0(v.B, x2) − 1
10: xr1 ← x1 − xl1
11: xr2 ← x2 − xl2 − 1
12: m ← (a + b)/2
13: return WTCounting(xl1, xl2, y1, y2, vleft, a, m) + WTCounting(xr1, xr2, y1, y2, vright, m + 1, b)
```
position) is 0, so we know the interval corresponding to the query starts at position 0. Because the number of zeros from the beginning to the 6-th position (in general, if the interval is *I* = [*l*,*r*], to *r*-th position) is four, we know the interval ends at position 3. Thus, we obtain the interval *I* = [0, 3] for the left child. Similarly, for the right child, by using rank queries counting the number of ones, we can obtain the interval *I* = [1, 2].

We repeat this process going down the tree while maintaining an interval. When we reach a leaf, we can determine whether the *y*-coordinate of the point is included in the query range. However, we can sometimes determine this at an earlier stage. For example, in Fig. 8.5, after obtaining the interval *I* = [0, 3] at the left child of the root, for the right child of that node the interval of the *y*-coordinate corresponding to the node is [2, 3], which is completely included in the interval [1, 4] of the *y*-coordinate of the query range. Therefore, for the two points we focus on at this node, both the *x*- and *y*-coordinates are included in the query range, and we have found two points in the query range. On the other hand, after computing the interval *I* = [1, 2] for the right child of the root, the interval of the *y*-coordinate corresponding to the right child of the current node is [6, 7], which has no intersection with the interval [1, 4] of the *y*-coordinate of the query range, so we do not need to search that sub-tree further.

As observed above, in a range search using a wavelet tree, if the query range is *Q* = [*l*, *r*] × [*b*, *t*], we first focus on points whose *x*-coordinates are contained in *Q*, that is, contained in the range [*l*, *r*] × [0, *n* − 1]. Next, the process of traversing down the tree corresponds to partitioning the range into two according to the *y*-coordinate. If an obtained range is completely contained in the query range, or does not intersect it, we stop searching the sub-tree.
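Algorithm 1 can be turned into runnable Python directly (an illustrative sketch; naive bit counts replace the constant-time rank queries of Lemma 8.2):

```python
def build(C, lo, hi):
    # wavelet tree node: (bit string, left child, right child)
    if lo == hi or not C:
        return None
    mid = (lo + hi) // 2
    bits = [0 if c <= mid else 1 for c in C]
    return (bits, build([c for c in C if c <= mid], lo, mid),
                  build([c for c in C if c > mid], mid + 1, hi))

def wt_counting(node, x1, x2, y1, y2, a, b):
    """Count points with x in [x1, x2] and y in [y1, y2]; [a, b] is the
    y-interval corresponding to the current node, as in Algorithm 1."""
    if x1 > x2 or b < y1 or y2 < a:
        return 0                                  # empty interval or disjoint
    if y1 <= a and b <= y2:
        return x2 - x1 + 1                        # [a, b] inside [y1, y2]
    bits, left, right = node
    xl1 = bits[:x1].count(0)                      # rank0(v.B, x1 - 1)
    xl2 = bits[:x2 + 1].count(0) - 1              # rank0(v.B, x2) - 1
    m = (a + b) // 2
    return (wt_counting(left, xl1, xl2, y1, y2, a, m)
            + wt_counting(right, x1 - xl1, x2 - xl2 - 1, y1, y2, m + 1, b))
```

On the running example of Figs. 8.3 and 8.5, the query [1, 6] × [1, 4] counts three points.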

For counting queries, it is sufficient to sum the number of points. For reporting queries, the extra work of computing the coordinates of the points is also required. This is shown in Algorithm 2.

The outline of a reporting query is the same as that of a counting query. In Algorithm 1, the number of points is returned in Line 6; we change this so as to output, one by one, the coordinates of the points corresponding to the interval [*x*1, *x*2] of the bit string v.*B*. The *x*- and *y*-coordinates of each point are obtained by WTReportX


# **Algorithm 2** WTReporting($x_1, x_2, y_1, y_2, v, a, b$)

**Input:** A node v of the wavelet tree and an interval [*x*1, *x*2] in the corresponding bit string, the interval [*a*, *b*] of *y* coordinates corresponding to node v, and the interval [*y*1, *y*2] of *y* coordinates of the query range.

**Output:** The coordinates of the points stored in the sub-tree rooted at v and contained in *Q*.

```
1: if x1 > x2 then
2:   terminate
3: else if [a, b] ∩ [y1, y2] = ∅ then
4:   terminate
5: else if [a, b] ⊆ [y1, y2] then
6:   for i = x1 to x2 do
7:     x ← WTReportX(v, i)
8:     y ← WTReportY(v, i, a, b)
9:     Output (x, y)
10:   end for
11:   terminate
12: end if
13: xl1 ← rank0(v.B, x1 − 1)
14: xl2 ← rank0(v.B, x2) − 1
15: xr1 ← x1 − xl1
16: xr2 ← x2 − xl2 − 1
17: m ← (a + b)/2
18: WTReporting(xl1, xl2, y1, y2, vleft, a, m)
19: WTReporting(xr1, xr2, y1, y2, vright, m + 1, b)
```

# **Algorithm 3** WTReportX($v, i$)

**Input:** A node v of the wavelet tree and an integer *i*.

**Output:** The *x* coordinate value of the point corresponding to the *i*-th bit of the bit string stored in v.

```
1: if v is the root then
2:   return i
3: else if v is the left child of vparent then
4:   i ← select0(vparent.B, i + 1)
5:   return WTReportX(vparent, i)
6: else
7:   i ← select1(vparent.B, i + 1)
8:   return WTReportX(vparent, i)
9: end if
```
# **Algorithm 4** WTReportY($v, i, a, b$)

**Input:** A node v of the wavelet tree, the interval [*a*, *b*] of *y* coordinate corresponding to the range for v, and an integer *i*.

**Output:** The *y* coordinate value of the point corresponding to the *i*-th bit of the bit string stored in v.

```
1: if a = b then
2:   return a
3: else if v.B[i] = 0 then
4:   i ← rank0(v.B, i) − 1
5:   return WTReportY(vleft, i, a, (a + b)/2)
6: else
7:   i ← rank1(v.B, i) − 1
8:   return WTReportY(vright, i, (a + b)/2 + 1, b)
9: end if
```
and WTReportY, respectively. The algorithm WTReportY for computing the *y*-coordinate (Algorithm 4) is similar to the algorithm for recovering a value of the original integer sequence explained in Sect. 8.4.1: we compute the *y*-coordinate by traversing down the tree using rank queries.

In contrast, the algorithm WTReportX for computing the *x*-coordinate (Algorithm 3) traverses up the tree using select queries. We explain this by example. In Fig. 8.5, assume that at node v, the right child of the left child of the root, we find that the points corresponding to the interval *I* = [0, 1] are contained in the query range. Consider the computation of the *x*-coordinate of the point corresponding to the bit v.*B*[1]. First, the node v is the right child of its parent v′. We find the position of the second 1 in the bit string of v′ by a select query, and learn that the point corresponds to the bit v′.*B*[2] of the parent node v′. Next, because v′ is the left child of its parent (the root), we find the position of the third 0 in the bit string of the root by a select query. Now we know that the point corresponds to the bit *r*.*B*[5] at the root node *r*. That is, the *x*-coordinate of the point is 5.

As shown above, we can traverse the nodes of the wavelet tree using rank and select queries on bit strings. For range searches, we traverse down the tree from the root computing the intervals of the *x*-coordinate corresponding to the query range. If we find a node where the corresponding interval of the *y*-coordinate is contained in the query range, we answer the query by computing the length of the interval or coordinate values by traversing the tree.
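As a concrete illustration of the downward traversal, the following Python sketch (our own simplified construction; the point set and helper names are not from the text, and rank values are recomputed naively instead of being answered in O(1) as in Lemma 8.2) builds a wavelet tree over a small point set and answers counting queries:

```python
# Minimal wavelet-tree sketch for counting points in [x1, x2] x [y1, y2].
# The points are (x, A[x]) for an array A of y-values; n is a power of two.

def build(A, lo, hi):
    """Node = {'B': bit string, 'L'/'R': children} for the y-range [lo, hi]."""
    if lo == hi:
        return None
    mid = (lo + hi) // 2
    B = [0 if y <= mid else 1 for y in A]          # 0: small half, 1: large half
    return {'B': B,
            'L': build([y for y in A if y <= mid], lo, mid),
            'R': build([y for y in A if y > mid], mid + 1, hi)}

def count(node, lo, hi, xa, xb, ya, yb):
    """Number of points with x in [xa, xb] and y in [ya, yb] below this node."""
    if xa > xb or ya > hi or yb < lo:
        return 0
    if ya <= lo and hi <= yb:                      # node of the set M: stop here
        return xb - xa + 1
    mid = (lo + hi) // 2
    B = node['B']
    # Map the x-interval into each child via (naively computed) rank queries.
    la = sum(1 for b in B[:xa] if b == 0)          # zeros strictly before xa
    lb = sum(1 for b in B[:xb + 1] if b == 0) - 1  # index of last zero in [0, xb]
    ra, rb = xa - la, xb - (lb + 1)                # same for the ones
    return (count(node['L'], lo, mid, la, lb, ya, yb) +
            count(node['R'], mid + 1, hi, ra, rb, ya, yb))

A = [5, 1, 7, 3, 0, 6, 2, 4]                       # y-coordinate of the point with x = index
root = build(A, 0, len(A) - 1)

def brute(xa, xb, ya, yb):
    return sum(1 for x, y in enumerate(A) if xa <= x <= xb and ya <= y <= yb)

for q in [(0, 7, 0, 7), (2, 5, 1, 6), (1, 6, 3, 3), (0, 3, 4, 7)]:
    assert count(root, 0, 7, *q) == brute(*q)
```

The recursion stops exactly at the nodes of the set *M* analyzed in the next subsection, returning the interval length there; everything above them corresponds to the set *A* of ancestors.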

#### *8.4.3 Complexity Analyses*

We now analyze the space complexity of the wavelet tree and query time complexities for the orthogonal range search problem.

First, we analyze the space complexity. The height of the wavelet tree is lg *n*. The total length of the bit strings stored in the nodes at the same depth is always *n*. Therefore, the total length of all the bit strings in the wavelet tree is *n* lg *n*. We can concatenate all the bit strings and store them as a single long bit string; then it is not necessary to store the tree structure of the wavelet tree. By using the data structure of Lemma 8.2 for this long bit string, the space complexity is *n* lg *n* + o(*n* lg *n*) bits in total.

Next, we consider query time complexities. For a counting query, we consider the number of visited nodes. In the wavelet tree, each time we traverse an edge toward a leaf, points with small *y*-coordinates go to the left child and points with large *y*-coordinates go to the right child. At the leaves, we can consider all points to be sorted in increasing order of *y*-coordinate. This means that the leaf nodes corresponding to the interval of *y*-coordinates of the query range occupy consecutive positions in the wavelet tree. Now, consider the set *M* of nodes of the wavelet tree defined as follows. The set *M* contains every maximal node v such that the *y*-coordinates corresponding to the leaf nodes in the sub-tree rooted at v are contained in the query range; that is, the *y*-coordinates of the leaves in the sub-tree of v are contained in the query range, but the sub-tree of the parent of v contains some leaf whose corresponding *y*-coordinate is not contained in the query range. This is the set of nodes from which we do not search the sub-tree further in a counting query using the wavelet tree; in Fig. 8.6, these are shown as dark gray nodes.

Let *A* be the set of nodes that are ancestors of nodes of *M*. This is the set of nodes visited before reaching nodes of *M* which are shown as light gray nodes in Fig. 8.6. The number of nodes visited in a counting query is then |*A*|+|*M*|. We now consider the size of *M* and *A*.

For the size of the set *M*, the following lemma holds.

**Lemma 8.3** *It holds* |*M*| = O(lg *n*)*.*

*Proof* (Lemma 8.3) The set *M* is constructed as follows. Let *M*′ be the set of leaf nodes of the wavelet tree corresponding to the interval of *y*-coordinates in the query range. For the nodes of *M*′, if two nodes v<sub>1</sub> and v<sub>2</sub> have a common parent node v, we remove v<sub>1</sub> and v<sub>2</sub> from *M*′ and add v to *M*′. By repeating this process until there is no such pair of nodes, the set *M*′ comes to coincide with *M*.

For each depth of the wavelet tree, the number of nodes of that depth belonging to *M* is then at most two, because if there existed more than two such nodes, two of them would have the same parent and would have been merged. This completes the proof that |*M*| = O(lg *n*).
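The merging process in this proof is exactly the canonical-node decomposition familiar from segment trees. The following sketch (the helper `decompose` is our own) checks the O(lg *n*) bound experimentally on a complete binary tree over [0, *n* − 1]:

```python
# Experimental check of Lemma 8.3: decomposing a leaf interval [a, b] into
# maximal fully-contained nodes yields at most two nodes per level, hence
# at most 2 lg n nodes in total.
import math

def decompose(a, b, lo, hi):
    """Maximal nodes (sub-ranges) of the tree over [lo, hi] covering [a, b]."""
    if b < lo or hi < a:
        return []
    if a <= lo and hi <= b:
        return [(lo, hi)]
    mid = (lo + hi) // 2
    return decompose(a, b, lo, mid) + decompose(a, b, mid + 1, hi)

n = 1024
for a, b in [(0, n - 1), (1, n - 2), (100, 777), (513, 514), (0, 0)]:
    M = decompose(a, b, 0, n - 1)
    assert len(M) <= 2 * int(math.log2(n))           # |M| = O(lg n)
    assert sum(hi - lo + 1 for lo, hi in M) == b - a + 1  # exact disjoint cover
```

The interval [1, *n* − 2] realizes the worst case, with two nodes on almost every level.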

**Fig. 8.6** Nodes visited by a counting query. We traverse light gray nodes, and when we reach a dark gray node, we do not further search the nodes below it

For the size of the set *A*, the following lemma holds.

**Lemma 8.4** *It holds* |*A*| = O(lg *n*)*.*

*Proof* (Lemma 8.4) Consider a node v in the set *A*. Among the leaf nodes in the sub-tree rooted at v, there must exist a leaf node whose corresponding *y*-coordinate is included in the query range and a leaf node whose corresponding *y*-coordinate is not included in the query range. Therefore, for each depth of the wavelet tree, there are at most two such nodes in *A*, because if there existed more than two such nodes, then for a node in the middle, all the *y*-coordinates corresponding to the leaves in the sub-tree rooted at that node would be contained in the query range. This completes the proof that |*A*| = O(lg *n*).

From the above discussion, the number of nodes visited in a counting query is |*A*|+|*M*| = O(lg *n*). When we visit a new node, we use a constant number of rank queries. Because a rank query takes constant time (Lemma 8.2), the time complexity of a counting query using the wavelet tree is O(lg *n*).

For a reporting query, it is necessary to compute coordinates of points in the query range. As explained in Sect. 8.4.2, *x*-coordinates are computed by traversing up the tree and *y*-coordinates are computed by traversing down the tree, with the coordinates of each point computed by visiting O(lg *n*) nodes. Moving to an adjacent node in the wavelet tree is done by a constant number of rank/select queries, and each rank/select query takes constant time (Lemma 8.2). Therefore, the coordinates of a point are obtained in O(lg *n*) time, and the time complexity for a reporting query using the wavelet tree is O((1 + *k*)lg *n*), where *k* is the number of output points.

We obtain the following theorem.

**Theorem 8.1** *The space complexity of the wavelet tree representing a two-dimensional point set on* [*n*]<sup>2</sup> *is n* lg *n* + o(*n* lg *n*) *bits; a counting query takes* O(lg *n*) *time and a reporting query takes* O((*k* + 1) lg *n*) *time, where k is the number of points to enumerate.*

As shown in Sect. 8.2.1, the information-theoretic lower bound for a point set on [*n*]<sup>2</sup> is *n* lg *n* + O(*n*) bits. Therefore, the wavelet tree is a succinct data structure.

**Theorem 8.2** *Let P be a set of points on M* = [1..*n*] × [1..*n*] *in which all points have distinct x-coordinates. Then, there exists a data structure using n* lg *n* + o(*n* lg *n*) *bits that answers a counting query in* O(lg lg *n*) *time and a reporting query in* O((1 + *k*) lg *n*/ lg lg *n*) *time, where k is the number of points to output.*

## **8.5 Proposed Data Structure 1: Improved Query Time Complexity**

This data structure uses the idea of adding data structures to the kd-tree to improve the query time complexity [20]. First, we explain the idea of [20] in Sect. 8.5.1 and describe the construction of our index in Sect. 8.5.2. Next, we explain the range search algorithm in Sect. 8.5.3 and analyze the complexities in Sect. 8.5.4.

#### *8.5.1 Idea for Improving the Time Complexity of the kd-Tree*

The method proposed in [20] improves the query time complexity of the kd-tree by adding *d* many wavelet trees to the kd-tree such that the term *n*<sup>(*d*−1)/*d*</sup> is replaced by *n*<sup>(*d*−2)/*d*</sup> (by lg *n* if *d* = 2), at the cost of increasing the total complexity by a factor of O(lg *n*). Note that we assume point sets are on [*n*]<sup>*d*</sup>.

First, we construct the kd-tree for a given set *P* of points in the *d*-dimensional space. Next, we label the nodes of the kd-tree with numbers based on an inorder traversal of the binary tree.


Figure 8.7 shows an example of a point set (left) and numbers assigned based on the inorder traversal of the kd-tree of the set (right).

Next, we make point sets *Pi* (*i* = 0, ..., *d* − 1), each with *n* points on [*n*]<sup>2</sup>. The two-dimensional point set *Pi* is created as follows. If a point *p* in the original *d*-dimensional point set *P* has the *i*-th coordinate value *pi* and the inorder position of the node of the kd-tree containing *p* is *j*, we add the point (*j*, *pi*) to *Pi*. Figure 8.8 shows the point sets *P*0, *P*1 created from the point set in Fig. 8.7.
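A sketch of this construction in Python may help (the point set, helper names, and the tie-breaking rule for even-sized point sets are our own choices, not taken from the text or from Fig. 8.7):

```python
# Sketch: number kd-tree nodes by inorder position and build the projected
# point sets P_i = {(inorder(p), p[i])}. Median splits cycle through the
# dimensions; the median index matches c = floor((a+b)/2) on each interval.

def build_kd(points, depth=0):
    """Nested (left, point, right) kd-tree over the given points."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = (len(pts) - 1) // 2          # median; ties broken by sort stability
    return (build_kd(pts[:mid], depth + 1), pts[mid],
            build_kd(pts[mid + 1:], depth + 1))

def inorder(tree, out):
    if tree is not None:
        left, point, right = tree
        inorder(left, out)
        out.append(point)
        inorder(right, out)

points = [(0, 5), (1, 1), (2, 7), (3, 3), (4, 0), (5, 6), (6, 2), (7, 4)]
order = []
inorder(build_kd(points), order)
assert sorted(order) == sorted(points)   # every point gets exactly one number

# P_i pairs the inorder number j of each point with its i-th coordinate.
P = [[(j, p[i]) for j, p in enumerate(order)] for i in range(2)]
assert len(P[0]) == len(points) and len(P[1]) == len(points)
```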

**Fig. 8.7** A two-dimensional point set (left) and the corresponding kd-tree (right). The numbers of nodes are assigned by an inorder traversal of the kd-tree. The dashed lines in the left figure show the partition of the space by the kd-tree

**Fig. 8.8** Two-dimensional point sets obtained from the point set in Fig. 8.7

From these two-dimensional point sets *P*0, ..., *Pd*−1, we construct wavelet trees *W*0, ..., *Wd*−1. The wavelet tree *Wi* can be thought of as being constructed from an integer sequence *Ai* containing the *i*-th coordinate values of the points of *P* in the inorder of the kd-tree.

These data structures can be used for range searches as follows. Given a query range *Q*, we perform the original search using the kd-tree. In the original algorithm, as explained in Sect. 8.3, we traverse the kd-tree and shrink the range *R*(v), and when CDeg(*R*(v), *Q*) = *d* (i.e., *R*(v) ⊆ *Q*), we know that all the points in the sub-tree rooted at v are contained in *Q*. By using the *d* wavelet trees, we can terminate the search when CDeg(*R*(v), *Q*) = *d* − 1. Assume that when a node v is visited, *R*(v) is contained in *Q* for all dimensions except for *i*. The inorder numbers of the nodes in the sub-tree rooted at v have consecutive values. Let [*a*, *b*] be the interval of these numbers. Then, the points in this interval are contained in *Q* except for dim. *i*. This implies that the points of *Pi* contained in the range [*a*, *b*] × [*l*<sub>*i*</sub><sup>(*Q*)</sup>, *u*<sub>*i*</sub><sup>(*Q*)</sup>] are contained in *Q* also for dim. *i*. Therefore, after finding the node v, it is sufficient to search the range [*a*, *b*] × [*l*<sub>*i*</sub><sup>(*Q*)</sup>, *u*<sub>*i*</sub><sup>(*Q*)</sup>] of *Pi* using the wavelet tree *Wi*.

The number of nodes of the kd-tree visited by this method is O(*n*<sup>(*d*−2)/*d*</sup>) (O(lg *n*) for the case *d* = 2). The search of the last dimension using the wavelet tree takes O(lg *n*) time for a counting query. Therefore, the time complexity for a counting query using the kd-tree is improved to O(*n*<sup>(*d*−2)/*d*</sup> lg *n*) (O(lg<sup>2</sup> *n*) for the case *d* = 2).

#### *8.5.2 Index Construction*

We now explain the proposed data structure. First, we construct the kd-tree for a given point set *P*. Note that this kd-tree is temporarily built in order to construct our data structure, and is not included in the final structure. Next, as in Sect. 8.5.1, we number the nodes of the kd-tree by an inorder traversal, and create *d* many two-dimensional point sets *P*0,..., *Pd*−1. For each *Pi* , we create the data structure of [3]. Let *Bi* be this data structure. Finally, we discard the kd-tree. The final data structure consists of *B*0,..., *Bd*−1.

#### *8.5.3 Range Search Algorithm*

We explain the algorithm for a reporting query using the data structure explained in the previous section. The pseudocode is shown in Algorithm 5. This algorithm simulates a search of the kd-tree using *B*0, ..., *Bd*−1. We explain it in comparison with the search algorithm of the kd-tree. Note that we explain the algorithm assuming that the inorder number of each node v of the kd-tree is also assigned to the point *V*(v) stored in v. That is, when we say the point with number *j*, we mean the point stored in the node with inorder position *j*. We also assume that for an interval [*a*, *b*] of point numbers, *R*([*a*, *b*]) denotes the range containing the points that have numbers in [*a*, *b*]. In Algorithm 5, the interval [*a*, *b*] of point numbers always corresponds to the interval of inorder numbers of the nodes in the sub-tree rooted at some node v of the kd-tree. Therefore, *R*([*a*, *b*]) coincides with *R*(v).

If we use the kd-tree, we shrink the focused range *R*(v) by going down the tree. In the proposed method, by shrinking the interval [*a*, *b*] of point numbers, we reduce the corresponding range *R*([*a*, *b*]). Because the kd-tree stores the point *V*(v) corresponding to a node v, we can obtain the information of the point used for

# **Algorithm 5** Reporting([*a*, *b*], *Q*)

**Input:** An interval of point numbers [*a*, *b*] and a query range *Q*.

**Output:** Points with numbers in [*a*, *b*] and which are contained in range *Q*.

```
1: if CDeg(R([a, b]), Q) = d − 1 then
2:   For the last dimension i such that R([a, b]) is not yet contained in Q,
     search [a, b] × [l_i^(Q), u_i^(Q)] of P_i using B_i and enumerate the
     points contained in Q. For each point, compute its coordinates using
     B_0, ..., B_{d−1}.
3: else if R([a, b]) has no intersection with Q then
4:   terminate
5: else if a = b then
6:   Examine the point with number a. If it is in Q, output it.
7: else
8:   c ← (a + b)/2
9:   Output the point with number c if it is in Q.
10:  Reporting([a, c − 1], Q)
11:  Reporting([c + 1, b], Q)
12: end if
```

partitioning the space. In contrast, in the proposed method, points are not explicitly stored. However, if the focused interval [*a*, *b*] coincides with the interval of inorder numbers for the sub-tree rooted at a node v, we find that *c* = (*a* + *b*)/2 is the number of the point used for partitioning.<sup>1</sup> Furthermore, the intervals [*a*, *c* − 1] and [*c* + 1, *b*] correspond to the intervals of the numbers for the sub-trees rooted at the left and right child of v, respectively. Therefore, by the recursive search of Algorithm 5, we can obtain the correct partitioning points.

For the range *R*([*a*, *b*]), we can compute the ranges after a partition from the range before partition and the coordinates of the point used for partitioning similarly to the case of the kd-tree.
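The claimed correspondence between intervals of point numbers and sub-trees is easy to check experimentally. The following sketch (the helper is our own) recurses exactly as lines 8–11 of Algorithm 5 and verifies that every inorder number in [0, *n* − 1] is produced exactly once as a partitioning point:

```python
# Check: recursing on [a, c-1] and [c+1, b] with c = floor((a+b)/2) visits
# every number of [0, n-1] exactly once, with c playing the role of the
# point stored at the sub-tree root.

def partition_points(a, b, out):
    if a > b:
        return
    if a == b:
        out.append(a)           # leaf: the single remaining point
        return
    c = (a + b) // 2
    out.append(c)               # point stored at the sub-tree root
    partition_points(a, c - 1, out)
    partition_points(c + 1, b, out)

n = 15                          # a complete binary tree with 15 nodes
seen = []
partition_points(0, n - 1, seen)
assert sorted(seen) == list(range(n))   # each number occurs exactly once
assert seen[0] == 7                     # root of the whole tree: floor((0+14)/2)
```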

#### *8.5.4 Complexity Analyses*

We now analyze the complexities of the algorithm. First, we consider its space complexity. We use *d* data structures of Bose et al. [3], each of which uses *n* lg *n* + o(*n* lg *n*) bits as in Theorem 8.2. The total space complexity is then *dn* lg *n* + o(*dn* lg *n*) bits.

Next, we consider the query time complexity. If we use the same analysis as in [20], assuming *d* is a constant, we can show that the number of nodes corresponding to cells with containment degree of at most *d* − 1 is O(*n*<sup>(*d*−2)/*d*</sup>). Here, we derive the query time complexity using a novel analysis for non-constant *d*.

<sup>1</sup> In the kd-tree, at each depth, we partition the space by the median of the point set with respect to a dimension, and therefore *c* = (*a* + *b*)/2 is the number of the point used for partitioning. If the point set contains an even number of points, we can obtain the correct partitioning point using a predetermined rule.

The proposed method partitions the space for each dimension in order, in the same fashion as the kd-tree. As in [20], we define a series of partitions with respect to dim. 0 through dim. *d* − 1 as a cycle. We then calculate the number *Tm*(*n*, *d*) of nodes at which the containment degree with respect to *Q* is at most *d* − 2 in the *m*-th cycle. When the (*m* − 1)-th cycle has finished, the space is partitioned into 2<sup>*d*(*m*−1)</sup> many cells. Among them, we count the number of cells for which the containment degree with respect to *Q* is at most *d* − 2. These cells contain a (*d* − 2)-dimensional face of *Q* (an edge of a cuboid if *d* = 3). A (*d* − 2)-dimensional face of a *d*-dimensional orthogonal range *Q* is obtained by choosing two of the *d* dimensions and choosing the upper side or the lower side of the range for each of the two chosen dimensions. Therefore, *Q* has $\binom{d}{2} 2^2$ many (*d* − 2)-dimensional faces. When the (*m* − 1)-th cycle has finished, because each dimension is partitioned into 2<sup>*m*−1</sup> cells, the number of cells containing a given (*d* − 2)-dimensional face is at most 2<sup>(*m*−1)(*d*−2)</sup>. Then, after the (*m* − 1)-th cycle, the number of cells to be searched is at most

$$
\binom{d}{2} 2^2 \cdot 2^{(m-1)(d-2)}.
$$

In the sub-trees rooted at these nodes, the number of nodes belonging to the *m*-th cycle is 2<sup>*d*</sup> − 1 each. Therefore, it holds that

$$\begin{aligned} T\_m(n,d) &\le \left(2^d - 1\right) \binom{d}{2} 2^2 \cdot 2^{(m-1)(d-2)} \\ &< 2^3 d (d-1) 2^{(d-2)m} .\end{aligned}$$

Let *N*(*n*, *d*) be the number of nodes for which the containment degree with respect to *Q* is at most *d* − 2. It then holds that

$$\begin{aligned} N(n,d) &= \sum\_{m=1}^{\frac{1}{d}\lg n} T\_m(n,d) \\ &< 2^3 d(d-1) \sum\_{m=1}^{\frac{1}{d}\lg n} 2^{(d-2)m} \\ &= 2^3 d(d-1) \frac{2^{d-2} \left(2^{\frac{d-2}{d}\lg n} - 1\right)}{2^{d-2} - 1} \\ &= \mathcal{O}\left(d^2 \cdot n^{\frac{d-2}{d}}\right). \end{aligned}$$

We use the fact that the containment degree is weakly increasing as we traverse down the tree. In the proposed method, we terminate the search when the containment degree reaches *d* − 1. The visited nodes are then those with containment degree of at most *d* − 2 and their child nodes. There are at most 2*N*(*n*, *d*) such nodes. The proposed method virtually traverses the kd-tree. It takes O(*d* lg *n*/ lg lg *n*) time to compute the coordinates of the point stored in a node. When we reach a node for which the containment degree with respect to *Q* is *d* − 1, we search the last remaining dimension in O(lg *n*/ lg lg *n*) time. The time complexity of a counting query is then O(*d*<sup>3</sup>*n*<sup>(*d*−2)/*d*</sup> lg *n*/ lg lg *n*). For a reporting query, it takes O(*d* lg *n*/ lg lg *n*) time to compute the coordinates of each reported point. The total time complexity is then O((*d*<sup>3</sup>*n*<sup>(*d*−2)/*d*</sup> + *dk*) lg *n*/ lg lg *n*), where *k* is the number of points in *Q*. In summary, we obtain the following:

**Theorem 8.3** *For the orthogonal range search problem on the* [*n*]<sup>*d*</sup> *space, there exists a data structure that has space complexity of dn* lg *n* + o(*dn* lg *n*) *bits and which answers a counting query in* O(*d*<sup>3</sup>*n*<sup>(*d*−2)/*d*</sup> lg *n*/ lg lg *n*) *time and a reporting query in* O((*d*<sup>3</sup>*n*<sup>(*d*−2)/*d*</sup> + *dk*) lg *n*/ lg lg *n*) *time, where k is the number of points in the query range.*

## **8.6 Proposed Data Structure 2: Succinct and Practically Fast**

The second proposed method is a data structure that is succinct and practically fast. In this method, we use *d* − 1 many wavelet trees to represent a point set on [*n*]<sup>*d*</sup>. In Sect. 8.6.1, we explain how to construct the data structure. In Sect. 8.6.2, we explain the algorithm for the orthogonal range search problem. In Sect. 8.6.3, we analyze the space and time complexities.

#### *8.6.1 Index Construction*

In this method, we assume that the points of *P* have distinct 0-th coordinate values.

First, we create length-*n* integer arrays *A*1, ..., *Ad*−1. The array *Ai* corresponds to dim. *i* and stores the *i*-th coordinate values of the points in increasing order of the 0-th coordinate value. Next, for these arrays we create wavelet trees *W*1, ..., *Wd*−1. The wavelet tree *Wi* can be considered to represent the two-dimensional point set *Pi* generated from the *d*-dimensional point set *P* by projecting the points onto the plane spanned by the 0-th and the *i*-th axes. Figure 8.9 shows an example.
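A minimal sketch of this construction (the helper name and the sample points are our own; distinct 0-th coordinates are assumed, as in the text):

```python
# Sketch: build the arrays A_1, ..., A_{d-1}. A_i[x] is the i-th coordinate
# of the point whose 0-th coordinate has rank x; each A_i would then be
# stored as a wavelet tree W_i.

def build_arrays(points):
    d = len(points[0])
    by_x = sorted(points)              # distinct 0-th coordinates assumed
    return [[p[i] for p in by_x] for i in range(1, d)]

points = [(3, 4, 0), (0, 1, 2), (2, 0, 3), (1, 3, 1)]   # d = 3, n = 4
A1, A2 = build_arrays(points)
assert A1 == [1, 3, 0, 4]   # 1st coordinates in increasing order of the 0th
assert A2 == [2, 1, 3, 0]   # 2nd coordinates likewise
```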

# **Algorithm 6** Report(*Q*)

**Input:** A query range *Q* = [*l*<sub>0</sub><sup>(*Q*)</sup>, *u*<sub>0</sub><sup>(*Q*)</sup>] × ··· × [*l*<sub>*d*−1</sub><sup>(*Q*)</sup>, *u*<sub>*d*−1</sub><sup>(*Q*)</sup>].

**Output:** The coordinates of the points of *P* contained in *Q*.

```
1: D := ∅
2: for i = 1 to d − 1 do
3:   if [l_i^(Q), u_i^(Q)] ≠ [0, n − 1] then
4:     D := D ∪ {i}
5:     c_i := Count(P_i, [l_0^(Q), u_0^(Q)] × [l_i^(Q), u_i^(Q)])
6:   end if
7: end for
8: Sort the elements i_1, ..., i_|D| of D in increasing order of c_i.
9: A := ReportX(P_{i_1}, [l_0^(Q), u_0^(Q)] × [l_{i_1}^(Q), u_{i_1}^(Q)])
10: for i = i_2 to i_|D| do
11:   for all a ∈ A do
12:     if the i-th coordinate of the point whose 0-th coordinate is a
         is not contained in [l_i^(Q), u_i^(Q)] then
13:       A := A \ {a}
14:     end if
15:   end for
16: end for
17: for all a ∈ A do
18:   Obtain the coordinates of the point whose 0-th coordinate is a and output them.
19: end for
```

#### *8.6.2 Range Search Algorithm*

Next, we explain how to solve the orthogonal range search problem using the data structure (Algorithm 6).

Assume that a query range *Q* = [*l*<sub>0</sub><sup>(*Q*)</sup>, *u*<sub>0</sub><sup>(*Q*)</sup>] × ··· × [*l*<sub>*d*−1</sub><sup>(*Q*)</sup>, *u*<sub>*d*−1</sub><sup>(*Q*)</sup>] is given. For each *i* = 1, ..., *d* − 1 such that [*l*<sub>*i*</sub><sup>(*Q*)</sup>, *u*<sub>*i*</sub><sup>(*Q*)</sup>] ≠ [0, *n* − 1], that is, each dimension *i* used for the search, we count the number of points of *Pi* contained in the range [*l*<sub>0</sub><sup>(*Q*)</sup>, *u*<sub>0</sub><sup>(*Q*)</sup>] × [*l*<sub>*i*</sub><sup>(*Q*)</sup>, *u*<sub>*i*</sub><sup>(*Q*)</sup>] using the wavelet tree *Wi* (a counting query). Let *m* (= |*D*|) be the number of indices *i* (= 1, ..., *d* − 1) such that [*l*<sub>*i*</sub><sup>(*Q*)</sup>, *u*<sub>*i*</sub><sup>(*Q*)</sup>] ≠ [0, *n* − 1], and let *i*<sub>1</sub>, ..., *i*<sub>*m*</sub> be these indices sorted in increasing order of the answers to the counting queries.

Using the wavelet tree *W*<sub>*i*1</sub>, we then enumerate only the *x*-coordinates of the points of *P*<sub>*i*1</sub> contained in [*l*<sub>0</sub><sup>(*Q*)</sup>, *u*<sub>0</sub><sup>(*Q*)</sup>] × [*l*<sub>*i*1</sub><sup>(*Q*)</sup>, *u*<sub>*i*1</sub><sup>(*Q*)</sup>] and store them in a set *A*. For each element *a* of *A* and for each *i* = *i*<sub>2</sub>, ..., *i*<sub>*m*</sub>, we check whether the *i*-th coordinate of the point whose 0-th coordinate is *a* is contained in the query range. The elements remaining in *A* correspond to the points in the query range. The answer to a counting query is the cardinality of *A*. For a reporting query, we compute the coordinates of the points and output them.

The reason we first compute the number of points contained in each dimension by a counting query is twofold. First, the *x*-coordinates (the 0-th coordinates) of the points contained in the query range with respect to the *i*<sub>1</sub>-th (and the 0-th) dimension can be output quickly at line 9 of the algorithm if the number of points to enumerate is small. Second, in the double loop from line 10 to line 16, we want to shrink the set *A* as quickly as possible.
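The overall strategy of Algorithm 6 can be simulated with plain Python stand-ins for the wavelet-tree queries (the naive scans below play the roles of Count and ReportX, each of which a wavelet tree would answer in O(lg *n*) time; the sample points are our own):

```python
# Simulate Algorithm 6: count per constrained dimension, enumerate candidates
# for the most selective dimension first, then filter by the remaining ones.
# points[x] is the point whose 0-th coordinate is x (coordinates on [0, n-1]).

def report(points, Q):
    """Return the 0-th coordinates of the points inside the orthogonal range Q."""
    n, d = len(points), len(points[0])
    full = (0, n - 1)
    D = [i for i in range(1, d) if Q[i] != full]   # dimensions used in the search
    if not D:                                      # only the 0-th dimension constrains
        return [x for x in range(n) if Q[0][0] <= x <= Q[0][1]]
    counts = {i: sum(1 for x in range(n)
                     if Q[0][0] <= x <= Q[0][1]
                     and Q[i][0] <= points[x][i] <= Q[i][1])
              for i in D}
    D.sort(key=lambda i: counts[i])                # most selective dimension first
    i1 = D[0]
    A = [x for x in range(n)                       # stand-in for ReportX on W_{i1}
         if Q[0][0] <= x <= Q[0][1] and Q[i1][0] <= points[x][i1] <= Q[i1][1]]
    for i in D[1:]:                                # filter the surviving candidates
        A = [x for x in A if Q[i][0] <= points[x][i] <= Q[i][1]]
    return A

points = [(0, 3, 1), (1, 0, 2), (2, 2, 0), (3, 1, 3)]   # n = 4, d = 3
Q = [(0, 2), (0, 2), (0, 2)]
assert report(points, Q) == [1, 2]
```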

#### *8.6.3 Complexity Analyses*

Consider the space and time complexities of the proposed method.

For the space complexity, we use *d* − 1 many wavelet trees. Therefore, the space complexity is (*d* − 1)*n* lg *n* + (*d* − 1) · o(*n* lg *n*) bits.

For the query time complexity, let *m* be the number of wavelet trees used in a search. The time to perform the *m* counting queries on the wavelet trees is O(*m* lg *n*). We then sort *m* integers in O(*m* lg *m*) time. Next, we enumerate the *x*-coordinates of the points contained in the query range for the dimension with the minimum number of points. Let *c*<sub>*i*1</sub> = *c*<sub>min</sub> be the number of points to enumerate. This takes O((1 + *c*<sub>min</sub>) lg *n*) time. The time to check whether these points are contained in the query range for the other dimensions is O((*m* − 1)*c*<sub>min</sub> lg *n*). Let *d*′ be the number of dimensions used in the query; then it holds that *m* ≤ *d*′. Therefore, the query time complexity can be written as O(*d*′*c*<sub>min</sub> lg *n* + *d*′ lg *d*′).

#### **8.7 Conclusion**

In this chapter, we first reviewed data structures for high-dimensional orthogonal range search. We then proposed two data structures for the problem.

The first one simulates the search of the kd-tree using *d* succinct data structures for two-dimensional orthogonal range search [3]. We improved the query time complexity of the KDW-tree while keeping the same space complexity.

The second one is succinct and practically fast. The space complexity is (*d* − 1)*n* lg *n* + (*d* − 1) · o(*n* lg *n*) bits, which is succinct. The worst-case query time complexity is O(*dn* lg *n*), which is not good. However, if the number *d* of dimensions is large but the number *d*′ of dimensions used in a search is small, it runs fast in practice.

#### **References**



# **Chapter 9 Enhanced RAM Simulation in Succinct Space**

**Taku Onodera**

**Abstract** We describe two recent results on space-efficient functional random access memory (RAM), which is RAM with non-standard functionalities. The first is about oblivious RAM, which enables a remote database to be accessed without revealing to the database owner which part of the database is being accessed. The other is about wear leveling, which enables the number of updates to be balanced among all the memory cells regardless of the content of the computation being performed on the memory.

## **9.1 Introduction**

Random access memory (RAM) underlies most modern computers, and improvements to the RAM itself can have a positive impact on a wide range of applications. For example, faster RAM access makes all RAM-based computations correspondingly faster. Some types of RAM improvements are not just about efficiency but also about *functionality*. An example is virtual memory in operating systems, which enables, among other things, application programs to utilize the memory without concern about cumbersome management issues such as allocation. Generally speaking, this type of RAM improvement functions by using conventional RAM to simulate "enhanced" RAM while introducing some performance overhead.

In this chapter, we describe two such enhanced RAM simulations—*oblivious RAM* (ORAM) and *wear leveling*—with the emphasis on how to minimize the space overhead. These topics were chosen mainly because of the authors' familiarity with them, although there are also some conceptual similarities between ORAM and wear leveling. Other functionality-enhanced RAM simulations include initializable arrays [2], memory checking [3], locally decodable codes [11], and huge random objects [8].<sup>1</sup>

T. Onodera (B)

The University of Tokyo, Tokyo, Japan e-mail: tk-ono@is.s.u-tokyo.ac.jp

© The Author(s) 2022 N. Katoh et al. (eds.), *Sublinear Computation Paradigm*, https://doi.org/10.1007/978-981-16-4095-7\_9

#### **9.2 Oblivious RAM**

#### *9.2.1 Problem*

Suppose you want to outsource a database, stored in RAM, to a server and want to access it in a privacy-preserving way. Although you can hide the data content by encryption, the server can still see which part of the RAM you are accessing. This is a serious issue in the current era of cloud computing. The same problem also appears when one wants to hide the details of software implemented in a physically secure processor that accesses insecure main memory.

Oblivious RAM (ORAM) is the formalization of, and the corresponding solution to, this problem. Typically, it works by storing the RAM in some data structure on the server and moving the RAM cells dynamically within that data structure as the user accesses the RAM.

As an example, consider a scheme where the server stores the RAM as-is except that each cell is encrypted by the user's key. To access the *i*th cell, the user performs the following procedure for *j* = 1 to *N*, where *N* is the number of cells: retrieve the *j*th encrypted cell and decrypt it; if *j* = *i*, then

	- for read access, copy the decrypted value to local memory;
	- for write access, change the decrypted value to the new value;

finally, re-encrypt the (possibly updated) value and write it back to the *j*th cell.

We assume semantically secure encryption when encryption is used in this chapter. In particular, there is an overwhelmingly high probability that the re-encrypted ciphertext looks totally different to the server from the ciphertext before re-encryption regardless of whether the plaintext is updated or not. Thus, no matter what actual access is performed, all that the server can see is that random-looking encrypted cells are updated to still random-looking re-encrypted cells in a fixed scan order. Of course, the access overhead of this method is very large since each cell access takes

<sup>1</sup> A huge random object, in this context, is a succinct representation of a pseudorandom object that supports certain queries. For example, a pseudorandom function can be thought of as a huge pseudorandom bitstring that is implicitly represented by a tiny seed and supports efficient random access. This is not a data structure in the conventional sense because the represented object is a pseudorandom bitstring instead of "data."

time linear to the entire RAM size. The purpose of this example is merely to illustrate the kind of security we want to achieve.
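The scan-based scheme is easy to sketch in code. The version below is our own and elides encryption entirely (the `server` list stands for the encrypted cells), so it demonstrates only the access-pattern property, not the cryptography:

```python
# Sketch of the linear-scan ORAM: every virtual access touches all N physical
# cells in a fixed order, so the physical access pattern is independent of
# the virtual address and operation.

class ScanORAM:
    def __init__(self, n):
        self.server = [0] * n          # stands for the encrypted cells
        self.trace = []                # physical access pattern

    def access(self, op, i, v=None):
        result = None
        for j in range(len(self.server)):
            self.trace.append(('read', j))
            cell = self.server[j]      # "decrypt"
            if j == i:
                if op == 'read':
                    result = cell
                else:
                    cell = v
            self.trace.append(('write', j))
            self.server[j] = cell      # "re-encrypt" and write back
        return result

oram = ScanORAM(4)
oram.access('write', 2, 42)
t1 = list(oram.trace)
oram.trace.clear()
oram.access('read', 0)
t2 = list(oram.trace)
assert t1 == t2                        # pattern is independent of the query
assert oram.access('read', 2) == 42
```

Note that every cell is written back even on a read; with real re-encryption this is what makes the fixed scan reveal nothing about which cell was touched.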

We now give a more formal problem description. We have three parties: the user, the server, and the simulator. The simulator models a program that runs in the local environment of the user. The simulator provides the user with an access interface to RAM that we call the virtual RAM, while the server provides the simulator with an access interface to RAM that we call the physical RAM. That is, the user gives the simulator a series of queries of the form (type, *i*, *v*), where type ∈ {read, write}, *i* ∈ [*N*], and *v* ∈ {0, 1}<sup>*B*</sup>. We call these virtual queries. The parameter *N* specifies the number of virtual cells—cells in the virtual RAM—while *B* specifies the size of each virtual cell. Given a virtual query, the simulator gives the server another series of queries of the form (type, *i*, *v*), where type ∈ {read, write}, *i* ∈ [*N*′], and *v* ∈ {0, 1}<sup>*B*′</sup>. We call these physical queries. The simulator, and thus the physical queries, is probabilistic in general.<sup>2</sup> The server responds to physical queries in the obvious way. That is, for (read, *i*, ∗), where "∗" means that the third component is arbitrary, the server returns the value of the *i*th physical cell, and for (write, *i*, *v*), the server updates the value of the *i*th physical cell to *v*. If the virtual query from the user is of the form (read, *i*, ∗), the simulator derives the value of the *i*th virtual cell through interaction with the server and returns it to the user. If the virtual query is of the form (write, *i*, *v*), the simulator updates the value of the *i*th virtual cell to *v*. The simulator must respond to the virtual queries online. We call the sequence of second components of the virtual queries (resp. physical queries) a virtual access pattern (resp. physical access pattern). For a virtual query sequence *q*, let *a*(*q*) denote the physical access pattern induced by *q*. Recall that *a*(*q*) is a random variable in general.

The ORAM scheme is secure if *a*(*q*<sub>1</sub>) and *a*(*q*<sub>2</sub>) are indistinguishable for any virtual query sequences *q*<sub>1</sub> and *q*<sub>2</sub> of the same length. There are some variations in the exact meaning of "indistinguishable". Typically used meanings, in descending order of security, are (a) equally distributed, (b) statistically closely distributed, and (c) computationally indistinguishable. The main performance metrics of ORAM include the access overhead, which is the number of physical queries processed for each virtual query, the simulator local space size, and the server space size, which is *B*′*N*′ bits.

As mentioned above, the simulator models a program running in the local environment of the user. Thus, in practice, we do not distinguish the user and the simulator. For example, we refer to the simulator local space as user space.

The ORAM problem is non-trivial only if the user space is smaller than *BN* bits since otherwise, the simulator can store the entire RAM locally and ignore the server.

<sup>2</sup> The simulator of the scan-based method is deterministic, and it is not hard to see that its linear access overhead is optimal if we restrict the simulator to be deterministic. Thus, all simulators of interest are indeed probabilistic.
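To make the interface concrete, the scan-based simulator mentioned in the footnote can be sketched in a few lines (an illustrative sketch, not code from the chapter; all names are ours): every virtual query triggers a read and a write of all *N* physical cells, so the physical access pattern is a fixed sequence independent of the virtual indices.

```python
class ScanORAM:
    """Deterministic scan-based ORAM sketch: every virtual query reads and
    rewrites all N physical cells, so the physical access pattern is the
    fixed sequence 0, 0, 1, 1, ..., N-1, N-1 regardless of the virtual index."""

    def __init__(self, n_cells):
        self.server = [0] * n_cells  # physical RAM held by the server
        self.trace = []              # physical access pattern (for inspection)

    def _physical(self, op, i, v=None):
        self.trace.append(i)
        if op == "read":
            return self.server[i]
        self.server[i] = v

    def access(self, op, i, v=None):
        result = None
        for j in range(len(self.server)):     # touch every cell every time
            cur = self._physical("read", j)
            if j == i:
                result = cur
                cur = v if op == "write" else cur
            self._physical("write", j, cur)
        return result
```

Two query sequences of the same length produce identical traces, which is the (deterministic) security property; the linear access overhead is the price, and the footnote notes that this is optimal for deterministic simulators.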


**Table 9.1** Summary of existing results. † means an amortized bound. ≈ log *N* means *O*(*f*(*N*)) for any *f* ∈ ω(log *N*). The constant factor of the user space of [26] is 1

#### *9.2.2 Existing Results*

Table 9.1 gives a summary of some of the existing results. Every method has physical cell size *B*′ = *B* + Θ(log *N*). There is an Ω(log *N*) lower bound for the access overhead if the user space is at most *N*<sup>1−ε</sup> for constant ε > 0 [14].

There are mainly two types of techniques that are actively studied: hierarchical approaches and tree-based approaches.<sup>3</sup> Asymptotically, the state-of-the-art hierarchical method [12] has access overhead matching the lower bound mentioned above, while the state-of-the-art tree-based methods have about log *N* times larger asymptotic access overhead. Yet, tree-based methods are still of practical interest because they tend to have much smaller constant factors in their access overhead than the hierarchical methods. The access overheads of tree-based methods are also worst-case bounds, while those of the hierarchical methods are often amortized. Although there are techniques to achieve competitive worst-case access overhead via the hierarchical methods [13, 19], they tend to be complex and add further constant factors to the performance bounds.

In the past, the ORAM research community has focused mainly on reducing the access overhead because it was the biggest obstacle to applying ORAM in practice. However, some recent studies have achieved practical access performance [16, 22] by combining tree-based ORAM with special hardware. For example, the PHANTOM secure processor system [16] supports access pattern-hiding SQL queries with a time overhead of 1.2–6× compared to the standard insecure version. Thus, at least for tree-based ORAM, exploration of aspects other than access overhead is beginning to make sense.

<sup>3</sup> The square root method [7] was the first non-trivial ORAM and is the origin of some of the ideas underlying the hierarchical methods.

We describe the recent development of techniques for reducing the number of physical cells *N*′ of the tree-based ORAM to (1 + *o*(1))*N* [17]. Note that there also is a space overhead originating from the cell size: typically, *B*′ = *B* + *c* lg *N* where *c* is a small constant such as 2. We ignore the cell size overhead and focus on the cell number because the typical value of *B* is 128 bytes in the secure processor setting, and the overhead with respect to the cell size is then just a few percent.

#### *9.2.3 Tree-Based Methods*

The tree-based method of Stefanov et al. [27] works as follows. The server organizes the physical cells into a complete binary tree with *N* leaves where each node is a bucket—a container that can accommodate a constant number of virtual cells. Each virtual cell has a position label—an integer in [*N*]—and a virtual cell with position label *i* is stored either in some bucket on the path from the root to the *i*th leaf or in a stash, which is a container in the user's local memory that can accommodate a small number of virtual cells. Let *v*<sub>*i*</sub> be the *i*th virtual cell and let *p*<sub>*i*</sub> be the position label of *v*<sub>*i*</sub>. Suppose the user maintains *p*<sub>*i*</sub> for all *i* ∈ [*N*] in local memory. This requires Θ(*N*) user space but simplifies the exposition. We will reduce the user space later. To access *v*<sub>*i*</sub>, the user retrieves all of the buckets on the path from the root to the *p*<sub>*i*</sub>th leaf. Let this path be *P*. At this point, *v*<sub>*i*</sub> must be in the stash. The user copies the value of *v*<sub>*i*</sub> to somewhere in its local memory for a read query or changes it to some other value for a write query. Then, the user updates *p*<sub>*i*</sub> to a fresh random value in [*N*]. After that, the user scans the buckets on *P* from the leaf to the root, and for each bucket, moves cells from the stash to the bucket greedily while respecting the position labels and the bucket capacity. See Fig. 9.1 for an example.
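The access procedure above can be sketched as follows (an illustrative sketch in the spirit of [27], not code from the chapter; the heap-indexed tree layout, the locally held position map, and all names are our own simplifications):

```python
import random

Z = 5   # bucket capacity (the chapter notes that size at least 5 suffices)
N = 8   # number of virtual cells / leaves (a power of two, for simplicity)

tree = {k: {} for k in range(1, 2 * N)}  # heap-indexed buckets: node -> {cell: value}
stash = {}                               # user-side container for leftover cells
pos = {i: random.randrange(N) for i in range(N)}  # position map, kept locally here

def path(leaf):
    """Nodes on the root-to-leaf path, root first (root = 1, leaves = N..2N-1)."""
    node, nodes = N + leaf, []
    while node >= 1:
        nodes.append(node)
        node //= 2
    return nodes[::-1]

def access(op, i, v=None):
    p = pos[i]
    # 1. read the whole path into the stash
    for node in path(p):
        stash.update(tree[node])
        tree[node].clear()
    # 2. read or update v_i, then assign a fresh random position label
    result = stash.get(i)
    if op == "write":
        stash[i] = v
    pos[i] = random.randrange(N)
    # 3. evict: scan the path leaf-to-root, greedily writing back stash cells
    for node in reversed(path(p)):
        depth = node.bit_length() - 1
        for j in list(stash):
            if len(tree[node]) >= Z:
                break
            # cell j may live in `node` iff `node` lies on the path to leaf pos[j]
            if path(pos[j])[depth] == node:
                tree[node][j] = stash.pop(j)
    return result
```

Note that the path retrieved depends only on the (uniformly random) position label, which is resampled on every access; this is what hides the virtual access pattern.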

Sometimes, some virtual cells in the stash cannot be moved back to the tree. For example, if all cells are assigned the same position label, only *O*(log *N*) physical cells can be used to store the virtual cells and thus, most virtual cells must end up in the stash. (Of course, if *N* is large, such an event happens only extremely rarely.) Stefanov et al. proved that if the bucket size is at least 5, the probability that more than *R* cells are left in the stash after processing a query is exponentially small in *R*. Thus, if the stash size is ω(log *N*), the stash overflows during the processing of a polynomial number of queries with only negligible probability.

To reduce the user space for position labels, the user outsources the position labels using the same method recursively. That is, each position label is a lg *N*-bit integer and the table of position labels of all virtual cells can be thought of as a RAM storing the *N* lg *N*-bit concatenation of the integers. Thus, the original problem of hiding the access pattern to RAM consisting of *N* cells, each of *B* bits, is reduced to hiding the access pattern to RAM consisting of *N* lg *N*/*B* cells, each of *B* bits. If, say, *B* ≥ 2 lg *N*, which is a completely reasonable assumption for all reasonable *N*,<sup>4</sup> the problem size (cell number) decreases exponentially and reaches *O*(1) after

<sup>4</sup> Recall that the typical value of *B* is 128 bytes in secure processor applications.

*O*(lg *N*) levels of recursion. At that point, the user can store the *O*(1)-size RAM locally, terminating the recursion.

The tree at the top level of recursion has size Θ(*N*) and the sizes of the trees at deeper recursion levels decrease exponentially. The access overhead is proportional to the sum of the heights of the trees at all recursion levels, which is *O*(log<sup>2</sup> *N*). The server space is proportional to the sum of the sizes of the trees at all recursion levels, which is Θ(*N*).
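The recursion can be made concrete with a small calculation (an illustrative sketch; the exact packing of labels into cells and the guard condition are our own simplifications):

```python
import math

def recursion_profile(n_cells, cell_bits):
    """Cell counts of the RAMs at each recursion level: each level stores the
    position labels (lg n bits each) of the level above, packed into cells of
    `cell_bits` bits.  Requires cell_bits >= ~2 lg n so that levels shrink."""
    sizes, n = [], n_cells
    while n > 1:
        sizes.append(n)
        nxt = math.ceil(n * math.ceil(math.log2(n)) / cell_bits)
        if nxt >= n:
            raise ValueError("cell size too small for the recursion to shrink")
        n = nxt
    return sizes

sizes = recursion_profile(2 ** 20, 1024)  # e.g. N = 2^20 cells of B = 128 bytes
overhead = sum(math.floor(math.log2(s)) + 1 for s in sizes)  # ~ sum of tree heights
server_cells = sum(2 * s for s in sizes)                     # ~ 2N nodes per tree
```

For these parameters the level sizes drop from 2<sup>20</sup> to 20480 to 300 to 3 before terminating, illustrating the exponential decrease and why the top-level tree dominates both sums.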

Although each recursion level requires a stash, the numbers of cells left in those stashes are independent and it turns out that the total number of cells left in all stashes is still exponentially small. Thus, *f* (*N*) user space is enough for any *f* (*N*) ∈ ω(log *N*).

#### *9.2.4 Succinct Construction*

The constant factor hidden in the Θ(*N*) server space bound of the method described above is about 10: the top-level tree has 2*N* nodes, each of capacity 5, while the size of the recursive trees is negligible because typically, *B* is much larger than lg *N*. (Theoretically, we assume *B* = ω(lg *N*).) Though one can reduce this constant factor to some extent by decreasing the tree height while tuning the bucket size, it is not possible to achieve a factor ≤ 2 while maintaining a meaningful stash overflow probability, at least using the currently known analysis techniques. This approach also leads to prohibitively large access overhead as the server space becomes close to 2*N*. We now describe a method for achieving (1 + *o*(1))*N* server space with a modest sacrifice in access overhead [17].

#### **Fig. 9.2** Large leaf layout

The idea is to modify the layout of the tree at the top recursion level so that the leaf number is *N*/lg<sup>1.4</sup> *N* and the leaf size is lg<sup>1.4</sup> *N* + lg<sup>1.3</sup> *N* (see Fig. 9.2). It is obvious that the tree size is (1 + 1/lg<sup>0.1</sup> *N*)*N* while the access overhead remains *O*(log<sup>2</sup> *N*). We now explain why the stash overflow probability remains small.

Let *N*<sub>*i*</sub> be the number of cells with position label *i* for *i* ∈ [*N*/lg<sup>1.4</sup> *N*]. Let the load of a bucket be the number of cells stored in the bucket. At each moment in the lifetime of the scheme, *N*<sub>*i*</sub> follows the binomial distribution with parameters *N* and lg<sup>1.4</sup> *N*/*N* for each *i*. The probability that this becomes larger than lg<sup>1.4</sup> *N* + lg<sup>1.3</sup> *N* is negligible. Thus, no leaf becomes full while processing a polynomial number of queries. Under this assumption, the distribution of the internal bucket loads is dominated by the distribution of the loads of the corresponding *N*/lg<sup>1.4</sup> *N* − 1 internal buckets in the standard *N*-leaf layout scheme described above. This is so because the internal buckets in the large leaf layout do not need to store the cells that overflow from the leaves. Thus, assuming no leaf becomes full, the stash overflow probability of the large leaf case is negligible. The same is true even without this assumption because there is only a negligible probability that the assumed case does not occur.
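The negligibility of the leaf overflow probability can be checked numerically with a standard multiplicative Chernoff bound (the specific tail bound is our choice; the chapter does not fix one). The mean of *N*<sub>*i*</sub> is lg<sup>1.4</sup> *N*, the slack lg<sup>1.3</sup> *N* is a relative deviation of lg<sup>−0.1</sup> *N*, and so the exponent of the bound grows like lg<sup>1.2</sup> *N*:

```python
import math

def leaf_overflow_bound(n):
    """Chernoff upper bound on P(N_i >= lg^1.4 n + lg^1.3 n) when
    N_i ~ Binomial(n, lg^1.4(n)/n), i.e. mean mu = lg^1.4 n."""
    lg = math.log2(n)
    mu = lg ** 1.4
    delta = lg ** 1.3 / mu                 # relative slack, = lg^{-0.1} n
    # multiplicative Chernoff: P(X >= (1 + d) mu) <= exp(-d^2 mu / (2 + d))
    return math.exp(-delta ** 2 * mu / (2 + delta))

bound = leaf_overflow_bound(2 ** 32)
```

A union bound over the *N*/lg<sup>1.4</sup> *N* leaves and a polynomial number of queries then keeps the overall failure probability negligible, since the exponent is superlogarithmic.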

We can reduce the *N*/lg<sup>0.1</sup> *N* extra term of the tree size even further by "the power of two choices". That is, we give two random position labels to each virtual cell. One is primary, which determines the path on which the cell can reside, while the other is secondary, which is a dummy needed to hide the access pattern. Now, *N*<sub>*i*</sub> is the number of virtual cells with primary position label *i*. We maintain *N*<sub>*i*</sub> for all *i* in a sub-ORAM in the same way we store position labels in the recursive ORAM. To access a virtual cell *v* with labels *p*<sub>1</sub> and *p*<sub>2</sub>, we retrieve all cells on the path from the root to the *p*<sub>1</sub>th leaf and the path from the root to the *p*<sub>2</sub>th leaf. We choose two random labels *p*′<sub>1</sub>, *p*′<sub>2</sub> and let *p*′<sub>1</sub> (resp. *p*′<sub>2</sub>) be the new primary (resp. secondary) label of *v* if *N*<sub>*p*′<sub>1</sub></sub> < *N*<sub>*p*′<sub>2</sub></sub>. Otherwise, we exchange the roles of *p*′<sub>1</sub> and *p*′<sub>2</sub>. We then scan the paths specified by the old labels and greedily move back the cells as in the previous method. Here, *N*<sub>*i*</sub> is not binomial but concentrated much more tightly around the mean due to the effect of the two choices. Thus, the "head space" for each leaf can be much smaller than lg<sup>1.3</sup> *N*, leading to a smaller tree size.
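The tightening effect of two choices is easy to observe in a toy simulation (purely illustrative; the parameters, seed, and names are ours, and the simulation ignores the ORAM machinery entirely):

```python
import random

def max_load(n_cells, n_leaves, choices, seed=1):
    """Throw n_cells cells into n_leaves leaves; with two choices, each cell
    goes to the currently lighter of two uniformly random candidate leaves."""
    rng = random.Random(seed)
    load = [0] * n_leaves
    for _ in range(n_cells):
        picks = [rng.randrange(n_leaves) for _ in range(choices)]
        best = min(picks, key=lambda leaf: load[leaf])
        load[best] += 1
    return max(load), load

one, load1 = max_load(100_000, 1_000, choices=1)
two, load2 = max_load(100_000, 1_000, choices=2)
```

With one choice the maximum leaf load exceeds the mean of 100 by a multiple of the standard deviation, while with two choices it hugs the mean, which is exactly the slack reduction exploited above.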

By tuning the parameters, the first technique (large leaf layout) alone can achieve about (1 + Θ((log *N*)/*B* + 1/√(log *N*)))*N* server space while the second technique (two choices) decreases it to (1 + Θ((log *N*)/*B* + (log log *N*)/log<sup>2</sup> *N*))*N*.<sup>5</sup>

#### *9.2.5 Open Problem*

It is unknown whether the optimal *O*(log *N*) access overhead and (1 + *o*(1))*N* server space can be achieved at the same time. There are two natural approaches for answering this question affirmatively:


#### **9.3 Wear Leveling**

#### *9.3.1 Problem*

Consider the case where you have RAM with the limitation that each cell state can be updated at most a certain number of times. Once the number of updates has reached the limit, the cell dies and you can no longer update it. The utility of the RAM quickly degrades as the cells start to die because the total amount of information that can be stored decreases, and it becomes cumbersome to manage which cells are still alive. Thus, the number of times you can support updates before cells start to die is of primary interest. This number depends heavily on the case. In the best case where the updates are uniform among the cells, you can perform *nL* updates where *n* is the number of cells and *L* is the number of times each cell can be updated. In contrast, in the worst case where all updates fall onto a particular cell, you can perform updates only *L* times. Wear leveling is the problem of prolonging the memory lifetime as much as possible while keeping the associated overhead, if any, as small as possible.

The system community has been studying wear leveling for decades. Historically, flash memory was the main motivation for studies conducted from the late 1980s to the mid 2000s [1, 6, 15]. Today, the main motivation for wear leveling comes

<sup>5</sup> These bounds include the cell size overhead that we ignored in the main explanation for brevity.

from phase change memory (PCM), which is an emerging next-generation memory technology that has many desirable features, including low latency, energy efficiency, and non-volatility [5]. Each PCM cell supports only 10<sup>8</sup>–10<sup>9</sup> updates, which means that cells can start dying within minutes or even seconds if no effort is made to perform wear leveling. PCM differs from flash memory in certain important respects, such as latency, access granularity, and in-place write capability, and thus requires a different wear leveling formalization than flash memory.

Most existing studies on wear leveling are conducted mainly from a practical point of view. Often, they do not have a formal problem statement or rigorous theoretical analyses. While this might not be a serious problem if the only thing that matters is the performance, some relatively recent studies have repeatedly emphasized the security aspects of wear leveling [21, 23, 24, 28, 29]. In particular, it is important to take into account the case of malicious users who actively try to reduce the memory lifetime. (Consider, for example, a computing outsourcing service.)

Below, we describe a recent theoretical study that constructed a problem formalization to capture the wear leveling for PCM explained above, and the corresponding solutions [18].

The formal problem statement is as follows. There are two parties: the user and the server. The server has three resources: physical RAM, wear-free memory, and private randomness. The physical RAM is RAM that consists of *N* *B*-bit cells while the wear-free memory is RAM that consists of a small number of *B*-bit cells. The user provides the server with adversarially chosen read/write queries to virtual RAM—a RAM consisting of *n* *b*-bit cells—and the server must respond to these queries "correctly." That is, each request is of the form (type, *i*, *v*) where type ∈ {read, write}, *i* ∈ [*n*], and *v* ∈ {0, 1}<sup>*b*</sup> and, for (read, *i*, ∗) where "∗" means that the third component is arbitrary, the server must return the last value written to the *i*th virtual cell (the *v* in the last query of the form (write, *i*, *v*)). The server not only needs to return the correct responses but also needs to support as many write queries as possible with high probability without updating any physical cell more than *L* times, where *L* is a parameter. We assume *L* = *n*<sup>δ</sup> for some constant δ > 0. Equivalently, we define δ := log<sub>*n*</sub> *L* and assume it is a constant. This assumption is reasonable even though, in reality, *L* and *n* are independent, because *L* is 10<sup>8</sup>–10<sup>9</sup> and 1/δ = log<sub>*L*</sub> *n* is at most 2 or 3 for reasonable *n*.
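A quick numerical check of the last claim (the parameter values are illustrative):

```python
import math

def log_base_l(limit_l, n_cells):
    """log_L n for per-cell write limit L and memory size n (in cells)."""
    return math.log(n_cells) / math.log(limit_l)

# e.g. L = 10^8 writes per PCM cell and an 8-billion-cell memory:
ratio = log_base_l(10 ** 8, 2 ** 33)
```

Even for an absurdly large memory of 10<sup>24</sup> cells, log<sub>*L*</sub> *n* only reaches 3 when *L* = 10<sup>8</sup>, so treating δ as a constant loses little.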

The performance metric for wear leveling includes the physical memory size, the wear-free memory size, the number of write queries supported, and the access overhead, which is the number of physical RAM accesses needed for each virtual RAM access. We say a wear-leveling scheme is "optimal" if it satisfies the following conditions (asymptotic notations are in terms of *n* → ∞):


<sup>6</sup> We ignore the cell size overhead for brevity. Security Refresh [23], described below, does not have any cell size overhead (*B* = *b*), while the method of Onodera and Shibuya [18], described after that, has a cell size of *B* = *b* + 2 lg *n* + 1.

**Fig. 9.3** Movement of cells in an epoch of Security Refresh. All the numbers are binary. *n* = 4 (= *N*), *r*<sub>0</sub> = "10", and *r*<sub>1</sub> = "11". Solid arrows mean cell swaps while dotted arrows mean skipped cell swaps


#### *9.3.2 Security Refresh*

The wear leveling scheme of Seong et al. [23] is optimal if *L* = *n*<sup>δ</sup> for δ > 1 while it is non-optimal (in fact, "far from" optimal) for δ < 1 [18].

In this method, the *n* virtual cells are stored in the *N* = *n* physical cells in permuted order. The method works in epochs. In each epoch, two random lg *n*-bit integers *r*<sub>0</sub> and *r*<sub>1</sub> are maintained. (We assume *n* is a power of two for brevity.) At the start of an epoch, for each *i* ∈ [*n*], the *i*th virtual cell *v*<sub>*i*</sub> is stored in the (*i* ⊕ *r*<sub>0</sub>)th physical cell *V*<sub>*i*⊕*r*<sub>0</sub></sub>, where ⊕ means bit-wise XOR. During the epoch, each *v*<sub>*i*</sub> is moved from *V*<sub>*i*⊕*r*<sub>0</sub></sub> to *V*<sub>*i*⊕*r*<sub>1</sub></sub>. Note that the virtual cell stored in the destination *V*<sub>*i*⊕*r*<sub>1</sub></sub> of *v*<sub>*i*</sub> is *v*<sub>*i*⊕*r*<sub>0</sub>⊕*r*<sub>1</sub></sub> and its destination is *V*<sub>*i*⊕*r*<sub>0</sub></sub>; that is, *v*<sub>*i*</sub> and *v*<sub>*i*⊕*r*<sub>0</sub>⊕*r*<sub>1</sub></sub> swap their positions. This is done as follows. After every *t* write queries processed, where *t* is a parameter, we perform a remap subroutine. At the *i*th remap subroutine call in an epoch, we check whether *i* < *i* ⊕ *r*<sub>0</sub> ⊕ *r*<sub>1</sub>. If so, *v*<sub>*i*</sub> is still in *V*<sub>*i*⊕*r*<sub>0</sub></sub> and thus, we swap the contents of *V*<sub>*i*⊕*r*<sub>0</sub></sub> and *V*<sub>*i*⊕*r*<sub>1</sub></sub>. Otherwise, *v*<sub>*i*</sub> is already in *V*<sub>*i*⊕*r*<sub>1</sub></sub> and we skip the swap. The epoch ends after the *n*th remap subroutine call finishes. At that point, each virtual cell *v*<sub>*i*</sub> is stored in *V*<sub>*i*⊕*r*<sub>1</sub></sub>. We update *r*<sub>0</sub> to *r*<sub>1</sub>, and *r*<sub>1</sub> to a fresh random lg *n*-bit integer. Now every *v*<sub>*i*</sub> is in *V*<sub>*i*⊕*r*<sub>0</sub></sub> as required at an epoch start, and we restart another epoch at this point. See Fig. 9.3 for an example. To access *v*<sub>*i*</sub>, we access *V*<sub>*i*⊕*r*<sub>1</sub></sub> if *v*<sub>*i*</sub> was already remapped in the epoch. (We have already seen how to check this.) Otherwise, we access *V*<sub>*i*⊕*r*<sub>0</sub></sub>.
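An epoch of Security Refresh can be simulated directly (an illustrative sketch; names are ours). The key invariant is that each pair {*i*, *i* ⊕ *r*<sub>0</sub> ⊕ *r*<sub>1</sub>} is swapped exactly once, at the remap call indexed by the smaller of the two:

```python
def run_epoch(phys, r0, r1):
    """One Security Refresh epoch: at the start, v_i sits in phys[i ^ r0];
    remap call i swaps the pair {i, i ^ r0 ^ r1} exactly once, so at the
    end v_i sits in phys[i ^ r1]."""
    n = len(phys)
    for i in range(n):                 # i-th remap subroutine call
        if i < (i ^ r0 ^ r1):          # pair not yet swapped in this epoch
            phys[i ^ r0], phys[i ^ r1] = phys[i ^ r1], phys[i ^ r0]

def locate(i, r0, r1, calls_done):
    """Physical address of v_i mid-epoch, after `calls_done` remap calls:
    the pair of i was handled at call min(i, i ^ r0 ^ r1)."""
    remapped = min(i, i ^ r0 ^ r1) < calls_done
    return i ^ (r1 if remapped else r0)
```

`locate` is exactly the access rule from the text: the new address *i* ⊕ *r*<sub>1</sub> once the cell's remap call has passed, and the old address *i* ⊕ *r*<sub>0</sub> before that.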

The non-trivial part of the analysis is the proof of a high-probability guarantee on the memory lifetime. We outline the key points. Fix a physical cell and let *X*<sub>*i*</sub> be the number of times it is updated during the *i*th epoch. We need to place a bound on the probability that the sum of the *X*<sub>*i*</sub>s deviates from its expected value. To do this, it suffices to bound the deviation of the sum of the odd-indexed variables *X*<sub>1</sub>, *X*<sub>3</sub>, ... from its expected value and do the same for the sum of the even-indexed variables *X*<sub>2</sub>, *X*<sub>4</sub>, ... separately. This is helpful because each *X*<sub>*i*</sub> is a random variable that depends on the *r*<sub>0</sub>, *r*<sub>1</sub> of the *i*th epoch (and the queries); since the *r*<sub>1</sub> of one epoch becomes the *r*<sub>0</sub> of the next, only consecutive epochs share randomness, and thus the odd-indexed variables *X*<sub>1</sub>, *X*<sub>3</sub>, ... are independent of each other, and so are the even-indexed variables. Regardless of the queries, *X*<sub>*i*</sub> is bounded by the number of write queries processed in an epoch, *tn*. Although this suggests the use of the Hoeffding inequality, it turns out that this does not work for the case δ < 2, essentially because the condition *X*<sub>*i*</sub> ≤ *tn* alone does not capture the fact that some cell being updated many, say, ≈ *tn*, times in an epoch negatively affects the number of times the other cells are updated in the epoch. To derive the bound for the case 1 < δ < 2, one bounds the second moment of *X*<sub>*i*</sub> and applies the Bernstein inequality [4].

If the user keeps updating *v*<sub>*i*</sub> continuously, one of *V*<sub>*i*⊕*r*<sub>0</sub></sub> and *V*<sub>*i*⊕*r*<sub>1</sub></sub> is updated *tn*/2 = Θ(*n*) times during the first epoch, and this physical cell dies if δ < 1. Thus, this method is not optimal for δ < 1.

#### *9.3.3 Construction for Small Write Limit Cases*

We now briefly describe a method for achieving optimality for the case δ < 1, that is, the case where the memory is large relative to the write limit [18]. The idea is to prepare spare cells and adaptively remap frequently updated cells to free spare cells. (We maintain the write counts of cells by appending a counter to each cell.) We store pointers to the new locations in the old locations to trace the remapped cells. To keep the number of pointers to follow small, we connect pointers in a manner that is similar to the DFS of a complete *d*-ary tree with *d*<sup>*h*</sup> leaves, where *d*, *h* are parameters (see Fig. 9.4). As we continue to process write queries, the data structure gradually degrades: the free spare cells become scarce and the trees become saturated. To reset the degradation, we perform a Security Refresh-style remapping. That is, we treat the structure in Fig. 9.4 as residing in another RAM *u* and maintain a global mapping—a gradually changing one-to-one map between the cells of *u* and the physical cells *V*<sub>1</sub>, *V*<sub>2</sub>, ... . Once we have globally remapped a cell of *u* corresponding to a tree root, we reset the "DFS" starting from that cell. For example, if we globally remap *u*<sub>*i*</sub> in state (5) of Fig. 9.4, we free *u*<sub>*n*+1</sub> and *u*<sub>*n*+4</sub>, resetting the DFS for the tree rooted at *u*<sub>*i*</sub>. Garbage such as *u*<sub>*n*+3</sub> is also reclaimed sooner or later, when it is globally remapped. To access *v*<sub>*i*</sub>, the tree path traversal in *u* starting from *i* is simulated, translating between addresses in *u* and addresses in *V*.

Although analysis of the bound on memory lifetime is cumbersome, the same idea as the analysis of Security Refresh applies. Indeed, the core argument is easier because the Hoeffding bound suffices.

#### *9.3.4 Open Problem*

The access overhead of the method for the small write limit case described above is about 1/δ. It is easy to obtain amortized 1 + *o*(1) and worst-case Θ(*n*) access overhead if we allow relatively large wear-free memory, for example, *O*(*n*<sup>ε</sup>) for an appropriate constant 0 < ε < 1. It seems possible and practically relevant to

**Fig. 9.4** Example evolution of *u*. *d* = *h* = 2. *u*1,..., *un* are default locations while the rest are spare cells. Each panel shows the state just after the thick-bordered cell was allocated because the cell previously storing its content was updated and the write count reached the threshold

achieve amortized 1 + *o*(1) and worst-case *O*(1) access overhead in this setting. A theoretically more interesting challenge is to give negative results that justify the use of such large wear-free memory.

#### **9.4 Conclusion**

We reviewed two recent studies on ORAM and wear leveling that achieve succinct space usage. Though these problems have totally different motivations and are studied in different communities, there are some similarities between them. As mentioned in the introduction, several other concepts with a similar flavor are known, including initializable RAM, memory checking, locally decodable codes, and huge random objects. There are probably many more such enhanced RAM notions yet to be found, and trying to find them can be an avenue for making progress in the study of data structures.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Part IV Sublinear Modelling**

# **Chapter 10 Review of Sublinear Modeling in Probabilistic Graphical Models by Statistical Mechanical Informatics and Statistical Machine Learning Theory**

#### **Kazuyuki Tanaka**

**Abstract** We review sublinear modeling in probabilistic graphical models by statistical mechanical informatics and statistical machine learning theory. Our statistical mechanical informatics schemes are based on advanced mean-field methods including loopy belief propagations. This chapter explores how phase transitions appear in loopy belief propagations for prior probabilistic graphical models. The frameworks are mainly explained for loopy belief propagations in the Ising model which is one of the elementary versions of probabilistic graphical models. We also expand the schemes to quantum statistical machine learning theory. Our framework can provide us with sublinear modeling based on the momentum space renormalization group methods.

# **10.1 Introduction**

Statistical machine learning frameworks using **probabilistic graphical models** are useful for many applications, including information communication technologies [1–3], compressed sensing [4, 5] and neural information processing systems [6–10] in data-driven sciences.

Most probabilistic graphical models belong to the exponential family [11] and can be regarded as classical spin systems in **statistical mechanical informatics**[12–17]. However, it is well known that many applicable formulations in data sciences as well as computational sciences can be reduced to combinatorial problems with some constraint conditions which can be regarded as an **Ising Model** in statistical mechanical informatics [18, 19]. Moreover, much interest has focused on applying **quantum annealing** as a novel high-speed optimization technology to massive optimization problems [20–24].

K. Tanaka (B)

Graduate School of Information Sciences, Tohoku University, Sendai, Japan e-mail: kazu@tohoku.ac.jp

N. Katoh et al. (eds.), *Sublinear Computation Paradigm*, https://doi.org/10.1007/978-981-16-4095-7\_10

#### **10.2 Statistical Machine Learning**

In statistical machine learning, most mathematical frameworks are based on maximum likelihood methods [25, 26] from the statistical mathematical sciences. The important points are how to assume the prior distribution and the data generative probability distribution and how to express the joint probability between the parameters and the data vector. In this section, we explore maximum likelihood frameworks in terms of model selection and parameter selection from a given data vector.

## *10.2.1 Bayesian Statistics and Maximization of Marginal Likelihood*

Let us consider a graph specified by nodes and edges, (*V*, *E*), where *V* is the set of all nodes *i* and *E* is the set of all edges {*i*, *j*}. State variables *s*<sub>*i*</sub> and *d*<sub>*i*</sub> are associated with

$$\text{each node } i. \text{ The vectors } \mathbf{s} = \begin{pmatrix} s_1 \\ s_2 \\ \vdots \\ s_{|V|} \end{pmatrix} \text{ and } \mathbf{d} = \begin{pmatrix} d_1 \\ d_2 \\ \vdots \\ d_{|V|} \end{pmatrix} \text{ correspond to the parameter vector}$$

and the data vector, respectively. The state spaces of *s*<sub>*i*</sub> and *d*<sub>*i*</sub> are given by Ω and (−∞, +∞), respectively. Now ρ(**d**|**s**, β) and *P*(**s**|α), which correspond to the data generative and prior models, respectively, are assumed to be as follows:

$$\rho(\mathbf{d}|\mathbf{s},\beta) = \prod\_{i \in V} \sqrt{\frac{\beta}{2\pi}} \exp\left(-\frac{1}{2}\beta \left(d\_i - s\_i\right)^2\right),\tag{10.1}$$

$$P(\mathbf{s}|\alpha) = \frac{\prod_{\{i,j\} \in E} \exp\left(-\frac{1}{2}\alpha \left(s_i - s_j\right)^2\right)}{\sum_{s_1 \in \Omega} \sum_{s_2 \in \Omega} \cdots \sum_{s_{|V|} \in \Omega} \prod_{\{i,j\} \in E} \exp\left(-\frac{1}{2}\alpha \left(s_i - s_j\right)^2\right)}. \tag{10.2}$$

The expressions for the posterior probability *P*(*s*|*d*, α, β), joint probability ρ(*s*, *d*|α, β), and marginal likelihood ρ(*d*|α, β) are given by Bayes formulas as follows:

$$P(\mathbf{s}|\mathbf{d},\alpha,\beta) = \frac{\rho(\mathbf{s},\mathbf{d}|\alpha,\beta)}{\rho(\mathbf{d}|\alpha,\beta)} = \frac{\rho(\mathbf{d}|\mathbf{s},\beta)P(\mathbf{s}|\alpha)}{\rho(\mathbf{d}|\alpha,\beta)},\tag{10.3}$$


$$
\rho(\mathbf{s}, \mathbf{d} | \alpha, \beta) = \rho(\mathbf{d} | \mathbf{s}, \beta) P(\mathbf{s} | \alpha), \tag{10.4}
$$

$$\begin{split} \rho(\mathbf{d}|\alpha,\beta) &= \sum_{s_1 \in \Omega} \sum_{s_2 \in \Omega} \cdots \sum_{s_{|V|} \in \Omega} \rho(\mathbf{s}, \mathbf{d}|\alpha, \beta) \\ &= \sum_{s_1 \in \Omega} \sum_{s_2 \in \Omega} \cdots \sum_{s_{|V|} \in \Omega} \rho(\mathbf{d}|\mathbf{s}, \beta) P(\mathbf{s}|\alpha) . \end{split} \tag{10.5}$$

Estimates of the hyperparameters α, β and the parameter vector **s** = (*s*<sub>1</sub>, *s*<sub>2</sub>, ···, *s*<sub>|*V*|</sub>) are determined by

$$\left(\widehat{\alpha}(\mathbf{d}), \widehat{\beta}(\mathbf{d})\right) = \operatorname*{argmax}_{(\alpha,\beta)} \rho\left(\mathbf{d} \,|\, \alpha, \beta\right), \tag{10.6}$$

$$\widehat{s}_{i}(\mathbf{d}) = \operatorname*{argmax}_{s_{i} \in \Omega} P_{i}\left( s_{i} \,\middle|\, \mathbf{d}, \widehat{\alpha}(\mathbf{d}), \widehat{\beta}(\mathbf{d}) \right) \ (i \in V). \tag{10.7}$$

Equations (10.6) and (10.7) are referred to as the **maximization of marginal likelihood (MML)** [25, 26] and the **maximization of posterior marginal (MPM)** [27], respectively.
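For intuition, Eqs. (10.6) and (10.7) can be evaluated by brute-force enumeration when |*V*| is tiny. The sketch below assumes a 4-node chain, Ω = {−1, +1}, and hypothetical data (the chapter leaves the graph, Ω, and the data abstract); the constant Gaussian prefactor of ρ(**d**|**s**, β) is dropped, which does not affect the argmax of Eq. (10.7) at fixed (α, β):

```python
import itertools, math

# Tiny chain graph: V = {0,1,2,3}, E = {{0,1},{1,2},{2,3}}; Omega = {-1,+1}.
V = range(4)
E = [(0, 1), (1, 2), (2, 3)]
Omega = (-1, +1)
d = (0.9, 1.1, -0.8, -1.0)      # hypothetical observed data vector

def joint(s, alpha, beta):
    """rho(s, d | alpha, beta) as in Eq. (10.4), with the prior of Eq. (10.2)
    normalized by enumeration and the Gaussian prefactor of Eq. (10.1) dropped."""
    energy = lambda t: 0.5 * alpha * sum((t[i] - t[j]) ** 2 for i, j in E)
    z = sum(math.exp(-energy(t)) for t in itertools.product(Omega, repeat=len(V)))
    like = math.exp(-0.5 * beta * sum((di - si) ** 2 for di, si in zip(d, s)))
    return like * math.exp(-energy(s)) / z

def marginal_likelihood(alpha, beta):
    """Eq. (10.5) by brute force over Omega^{|V|}."""
    return sum(joint(s, alpha, beta) for s in itertools.product(Omega, repeat=len(V)))

def mpm(alpha, beta):
    """Eq. (10.7): per-node argmax of the posterior marginal (cf. Eq. 10.11)."""
    ml = marginal_likelihood(alpha, beta)
    estimates = []
    for i in V:
        pm = {x: sum(joint(s, alpha, beta)
                     for s in itertools.product(Omega, repeat=len(V)) if s[i] == x) / ml
              for x in Omega}
        estimates.append(max(pm, key=pm.get))
    return estimates
```

The enumeration costs |Ω|<sup>|V|</sup>, which is exactly the exponential blow-up that the advanced mean-field methods of the following sections are designed to avoid.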

#### *10.2.2 Expectation-Maximization Algorithm*

The expectation-maximization (EM) algorithm is often used to maximize the marginal likelihood in Eq. (10.6) [25, 26]. The *Q*-function for the EM algorithm in the present framework is defined by

$$\mathcal{Q}(\alpha,\beta|\alpha',\beta',\mathbf{d}) \equiv \sum_{s_1 \in \Omega} \sum_{s_2 \in \Omega} \cdots \sum_{s_{|V|} \in \Omega} P(\mathbf{s}|\mathbf{d},\alpha',\beta') \ln\left(\rho(\mathbf{s}, \mathbf{d}|\alpha,\beta)\right) . \tag{10.8}$$

The EM algorithm is a procedure that repeats the following **E-step** and **M-step** for *t* = 0, 1, 2, ··· until α(**d**, *t*) and β(**d**, *t*) converge:

**E-step:** Compute *Q*(α, β | α(**d**, *t*), β(**d**, *t*), **d**) for various values of α and β.

**M-step:** Determine (α(**d**, *t* + 1), β(**d**, *t* + 1)) so as to satisfy the extremum conditions of *Q*(α, β | α(**d**, *t*), β(**d**, *t*), **d**) with respect to α and β. Update α(**d**) ← α(**d**, *t* + 1) and β(**d**) ← β(**d**, *t* + 1).

The update rule from (α(**d**, *t*), β(**d**, *t*)) to (α(**d**, *t* + 1), β(**d**, *t* + 1)) for the extremum conditions can be written as

$$\begin{split} \frac{1}{|E|} \sum_{\{i,j\} \in E} \sum_{s_1 \in \Omega} \sum_{s_2 \in \Omega} \cdots \sum_{s_{|V|} \in \Omega} \left( s_i - s_j \right)^2 P(s_1, s_2, \ldots, s_{|V|} | \alpha(\mathbf{d}, t+1)) \\ = \frac{1}{|E|} \sum_{\{i,j\} \in E} \sum_{s_1 \in \Omega} \sum_{s_2 \in \Omega} \cdots \sum_{s_{|V|} \in \Omega} \left( s_i - s_j \right)^2 P(s_1, s_2, \ldots, s_{|V|} | \mathbf{d}, \alpha(\mathbf{d}, t), \beta(\mathbf{d}, t)), \end{split} \tag{10.9}$$

$$\frac{1}{\beta(\mathbf{d}, t+1)} = \frac{1}{|V|} \sum_{i \in V} \sum_{s_1 \in \Omega} \sum_{s_2 \in \Omega} \cdots \sum_{s_{|V|} \in \Omega} (s_i - d_i)^2 P(s_1, s_2, \ldots, s_{|V|} | \mathbf{d}, \alpha(\mathbf{d}, t), \beta(\mathbf{d}, t)). \tag{10.10}$$

The marginal probability distributions of $P(\mathbf{s}|\mathbf{d},\alpha,\beta)$ and $P(\mathbf{s}|\alpha)$ are introduced as

$$P\_i(s\_i|\mathbf{d},\alpha,\beta) \equiv \sum\_{\tau\_1\in\Omega}\sum\_{\tau\_2\in\Omega}\cdots\sum\_{\tau\_{|V|}\in\Omega}\delta\_{s\_i,\tau\_i} P(\tau\_1,\tau\_2,\cdots,\tau\_{|V|}|\mathbf{d},\alpha,\beta)\ (i\in V),\tag{10.11}$$

$$\begin{split}
P\_{ij}(s\_i,s\_j|\mathbf{d},\alpha,\beta) &= P\_{ji}(s\_j,s\_i|\mathbf{d},\alpha,\beta) \\
&\equiv \sum\_{\tau\_1\in\Omega}\sum\_{\tau\_2\in\Omega}\cdots\sum\_{\tau\_{|V|}\in\Omega}\delta\_{s\_i,\tau\_i}\delta\_{s\_j,\tau\_j} P\left(\tau\_1,\tau\_2,\cdots,\tau\_{|V|}\middle|\mathbf{d},\alpha,\beta\right)\ (\{i,j\}\in E),
\end{split}\tag{10.12}$$

$$\begin{split}
P\_{ij}(s\_i,s\_j|\alpha) &= P\_{ji}(s\_j,s\_i|\alpha) \\
&\equiv \sum\_{\tau\_1\in\Omega}\sum\_{\tau\_2\in\Omega}\cdots\sum\_{\tau\_{|V|}\in\Omega}\delta\_{s\_i,\tau\_i}\delta\_{s\_j,\tau\_j} P\left(\tau\_1,\tau\_2,\cdots,\tau\_{|V|}\middle|\alpha\right)\ (\{i,j\}\in E).
\end{split}\tag{10.13}$$

In this way, the extremum conditions can be reduced to

$$\begin{split}
\frac{1}{|E|}\sum\_{\{i,j\}\in E}\sum\_{s\_i\in\Omega}\sum\_{s\_j\in\Omega}\left(s\_i-s\_j\right)^2 P\_{ij}\left(s\_i,s\_j\middle|\alpha(\mathbf{d},t+1)\right) \\
=\frac{1}{|E|}\sum\_{\{i,j\}\in E}\sum\_{s\_i\in\Omega}\sum\_{s\_j\in\Omega}\left(s\_i-s\_j\right)^2 P\_{ij}\left(s\_i,s\_j\middle|\mathbf{d},\alpha(\mathbf{d},t),\beta(\mathbf{d},t)\right),
\end{split}\tag{10.14}$$

$$\frac{1}{\beta(\mathbf{d},t+1)}=\frac{1}{|V|}\sum\_{i\in V}\sum\_{s\_i\in\Omega}(s\_i-d\_i)^2 P\_i(s\_i|\mathbf{d},\alpha(\mathbf{d},t),\beta(\mathbf{d},t)).\tag{10.15}$$
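For the continuous state space $\Omega=(-\infty,+\infty)$ the posterior is Gaussian, so the expectations in Eqs. (10.14) and (10.15) have closed forms and the whole EM loop can be sketched directly. The following Python sketch is an illustration, not part of the original text: it assumes a connected graph, writes $\sum\_{\{i,j\}\in E}(s\_i-s\_j)^2=\mathbf{s}^{\mathrm{T}}L\mathbf{s}$ with the graph Laplacian $L$, and uses the fact that the (improper) prior expectation of this quantity is $(|V|-1)/\alpha$ because $L$ has a single zero mode. The chain graph, noise level, and function names are assumptions made for the example.

```python
import numpy as np

def graph_laplacian(n_nodes, edges):
    """Laplacian L with s^T L s = sum over edges of (s_i - s_j)^2."""
    L = np.zeros((n_nodes, n_nodes))
    for i, j in edges:
        L[i, i] += 1.0
        L[j, j] += 1.0
        L[i, j] -= 1.0
        L[j, i] -= 1.0
    return L

def em_update(alpha, beta, d, L, n_edges):
    """One EM step: Eqs. (10.9)-(10.10) evaluated in closed (Gaussian) form."""
    n = len(d)
    cov = np.linalg.inv(alpha * L + beta * np.eye(n))   # posterior covariance
    mean = beta * cov @ d                               # posterior mean
    # posterior average of (1/|E|) sum_{ij in E} (s_i - s_j)^2
    edge_term = (np.trace(L @ cov) + mean @ L @ mean) / n_edges
    # prior average of the same quantity is (|V|-1)/(alpha |E|), so Eq. (10.9) gives:
    alpha_new = (n - 1) / (n_edges * edge_term)
    # Eq. (10.10): 1/beta(t+1) = (1/|V|) sum_i E[(s_i - d_i)^2]
    beta_new = n / (np.trace(cov) + np.sum((mean - d) ** 2))
    return alpha_new, beta_new

# illustrative data: a noisy smooth signal on a chain graph
rng = np.random.default_rng(0)
edges = [(i, i + 1) for i in range(63)]
L = graph_laplacian(64, edges)
d = np.sin(np.linspace(0.0, np.pi, 64)) + 0.3 * rng.standard_normal(64)

alpha, beta = 1.0, 1.0
for _ in range(100):
    alpha, beta = em_update(alpha, beta, d, L, len(edges))
```

The restriction of the prior average to the nonzero modes of $L$ reflects the fact that the continuous prior is improper on the constant mode.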

To realize the EM procedure as a practical algorithm, **Markov chain Monte Carlo (MCMC) methods** are often used, which are powerful probabilistic methods [28, 29]. In some recent developments, advanced mean-field methods from statistical mechanical informatics are also used as powerful deterministic algorithms, as shown in Sect. 10.3. Consider the expectation values of both sides of Eqs. (10.9) and (10.10) with respect to the data vector $\mathbf{d}$ according to the following probability density function, in which the hyperparameters $\alpha$ and $\beta$ are set to their true values $\alpha^{\*}$ and $\beta^{\*}$, respectively:
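As a concrete illustration of the MCMC route, the posterior expectations on the right-hand sides of Eqs. (10.14) and (10.15) can be estimated by single-site Gibbs sampling of $P(\mathbf{s}|\mathbf{d},\alpha,\beta)$ for the binary state space $\Omega=\{+1,-1\}$. The Python sketch below is assumption-laden: the chain graph, the sweep counts, and the parameter values are all made up for the example.

```python
import numpy as np

def gibbs_expectations(d, edges, alpha, beta, n_sweeps=2000, burn_in=500, seed=0):
    """Estimate E[(s_i - s_j)^2] over edges and E[(s_i - d_i)^2] over nodes
    under P(s|d, alpha, beta) with Omega = {+1, -1}, by Gibbs sampling."""
    rng = np.random.default_rng(seed)
    n = len(d)
    nbrs = [[] for _ in range(n)]
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    s = rng.choice([-1.0, 1.0], size=n)
    edge_acc = node_acc = 0.0
    count = 0
    for sweep in range(n_sweeps):
        for i in range(n):
            # the terms of the energy linear in s_i give the local field
            field = alpha * sum(s[j] for j in nbrs[i]) + beta * d[i]
            p_plus = 1.0 / (1.0 + np.exp(-2.0 * field))
            s[i] = 1.0 if rng.random() < p_plus else -1.0
        if sweep >= burn_in:
            edge_acc += np.mean([(s[i] - s[j]) ** 2 for i, j in edges])
            node_acc += np.mean((s - d) ** 2)
            count += 1
    return edge_acc / count, node_acc / count

# illustrative run on a 16-node chain with a piecewise-constant signal
edges = [(i, i + 1) for i in range(15)]
d = np.where(np.arange(16) < 8, 1.0, -1.0)
e_edge, e_node = gibbs_expectations(d, edges, alpha=0.5, beta=1.0)
```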

$$\rho\left(\mathbf{d}\middle|\alpha^{\*},\beta^{\*}\right)=\sum\_{\boldsymbol{\tau}\in\Omega^{|V|}}\rho\left(\mathbf{d}\middle|\boldsymbol{\tau},\beta^{\*}\right)P\left(\boldsymbol{\tau}\middle|\alpha^{\*}\right),\tag{10.16}$$

such that

$$\begin{split}
\rho\left(d\_1,d\_2,\cdots,d\_{|V|}\middle|\alpha^{\*},\beta^{\*}\right) = \sum\_{\tau\_1\in\Omega}\sum\_{\tau\_2\in\Omega}\cdots\sum\_{\tau\_{|V|}\in\Omega}\rho\left(d\_1,d\_2,\cdots,d\_{|V|}\middle|\tau\_1,\tau\_2,\cdots,\tau\_{|V|},\beta^{\*}\right) \\
\times P\left(\tau\_1,\tau\_2,\cdots,\tau\_{|V|}\middle|\alpha^{\*}\right) \\
\left(d\_1\in(-\infty,+\infty),\ d\_2\in(-\infty,+\infty),\ \cdots,\ d\_{|V|}\in(-\infty,+\infty)\right).
\end{split}\tag{10.17}$$

We can then derive simultaneous equations for the statistical trajectory $\{\overline{\alpha}(\alpha^{\*},\beta^{\*},t),\overline{\beta}(\alpha^{\*},\beta^{\*},t)|t=1,2,3,\cdots\}$ of the convergence process $\{(\alpha(\mathbf{d},t),\beta(\mathbf{d},t))|t=1,2,3,\cdots\}$ of the above EM algorithm.

Equations (10.9) and (10.10) can be rewritten as follows:

$$\frac{1}{|E|} \frac{\partial}{\partial \alpha(\mathbf{d}, t+1)} (\ln(Z(\alpha(\mathbf{d}, t+1)))) = \frac{1}{|E|} \frac{\partial}{\partial \alpha(\mathbf{d}, t)} (\ln(Z(\mathbf{d}, \alpha(\mathbf{d}, t), \beta(\mathbf{d}, t)))),\tag{10.18}$$

$$\frac{1}{\beta(\mathbf{d}, t+1)} = \frac{1}{|V|} \frac{\partial}{\partial \beta(\mathbf{d}, t)} (\ln(Z(\mathbf{d}, \alpha(\mathbf{d}, t), \beta(\mathbf{d}, t)))),\tag{10.19}$$

where

$$Z(\alpha) \equiv \sum\_{s\_1 \in \Omega} \sum\_{\mathbf{s}\_2 \in \Omega} \cdots \sum\_{s\_{|V|} \in \Omega} \prod\_{\{i,j\} \in E} \exp\left(-\frac{1}{2}\alpha \left(s\_i - s\_j\right)^2\right),\tag{10.20}$$

$$Z(\mathbf{d},\alpha,\beta) \equiv \sum\_{s\_1\in\Omega}\sum\_{s\_2\in\Omega}\cdots\sum\_{s\_{|V|}\in\Omega} w(s\_1,s\_2,\cdots,s\_{|V|}|\mathbf{d},\alpha,\beta),\tag{10.21}$$

$$\begin{split}
w(\mathbf{s}|\mathbf{d},\alpha,\beta) &= w(s\_1,s\_2,\cdots,s\_{|V|}|d\_1,d\_2,\cdots,d\_{|V|},\alpha,\beta) \\
&\equiv \left(\prod\_{i\in V}\exp\left(-\frac{1}{2}\beta(s\_i-d\_i)^2\right)\right)\left(\prod\_{\{i,j\}\in E}\exp\left(-\frac{1}{2}\alpha(s\_i-s\_j)^2\right)\right).
\end{split}\tag{10.22}$$

By taking the expectation values of both sides of Eqs. (10.18) and (10.19) with respect to the state vector of the data point *d* in the probability density function ρ(*d*|α∗, β∗), the simultaneous deterministic equation for the statistical trajectory {α(α∗, β∗, *t*), β(α∗, β∗, *t*)|*t* = 1, 2, 3, · · ·} of the EM procedure can be derived as follows:

$$\begin{split}
\frac{1}{|E|}\frac{\partial}{\partial\overline{\alpha}(\alpha^{\*},\beta^{\*},t+1)}\Big(\ln\big(Z(\overline{\alpha}(\alpha^{\*},\beta^{\*},t+1))\big)\Big) \\
=\frac{1}{|E|}\frac{\partial}{\partial\overline{\alpha}(\alpha^{\*},\beta^{\*},t)}\left(\int\_{-\infty}^{+\infty}\int\_{-\infty}^{+\infty}\cdots\int\_{-\infty}^{+\infty}\rho(\mathbf{d}|\alpha^{\*},\beta^{\*})\ln\big(Z(\mathbf{d},\overline{\alpha}(\alpha^{\*},\beta^{\*},t),\overline{\beta}(\alpha^{\*},\beta^{\*},t))\big)\,dd\_1 dd\_2\cdots dd\_{|V|}\right),
\end{split}\tag{10.23}$$

$$\begin{split}
\frac{1}{\overline{\beta}(\alpha^{\*},\beta^{\*},t+1)} = \frac{1}{|V|}\frac{\partial}{\partial\overline{\beta}(\alpha^{\*},\beta^{\*},t)}\left(\int\_{-\infty}^{+\infty}\int\_{-\infty}^{+\infty}\cdots\int\_{-\infty}^{+\infty}\rho(\mathbf{d}|\alpha^{\*},\beta^{\*})\ln\big(Z(\mathbf{d},\overline{\alpha}(\alpha^{\*},\beta^{\*},t),\overline{\beta}(\alpha^{\*},\beta^{\*},t))\big)\,dd\_1 dd\_2\cdots dd\_{|V|}\right).
\end{split}\tag{10.24}$$

In the case of a continuous state space $\Omega=(-\infty,+\infty)$, the posterior and prior probabilistic models correspond to Gaussian graphical models, and the statistical trajectory in Eqs. (10.23) and (10.24) can be computed exactly by means of the multi-dimensional Gaussian integral formula [30].
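In that Gaussian case even $\ln Z(\mathbf{d},\alpha,\beta)$ itself has a closed form: with $A=\alpha L+\beta I$, where $L$ is the graph Laplacian, the Gaussian integral formula gives $\ln Z=\frac{|V|}{2}\ln(2\pi)-\frac{1}{2}\ln\det A+\frac{1}{2}\beta^2\mathbf{d}^{\mathrm{T}}A^{-1}\mathbf{d}-\frac{1}{2}\beta\,\mathbf{d}^{\mathrm{T}}\mathbf{d}$. The Python sketch below checks this against brute-force numerical integration; the two-node example and all numeric values are illustrative assumptions.

```python
import numpy as np

def log_Z_gaussian(d, L, alpha, beta):
    """Closed-form log of Eq. (10.21) for Omega = (-inf, +inf), obtained
    from the multi-dimensional Gaussian integral formula."""
    n = len(d)
    A = alpha * L + beta * np.eye(n)
    _, logdet = np.linalg.slogdet(A)
    return (0.5 * n * np.log(2.0 * np.pi) - 0.5 * logdet
            + 0.5 * beta**2 * d @ np.linalg.solve(A, d) - 0.5 * beta * d @ d)

# brute-force check on a single-edge graph (|V| = 2)
L2 = np.array([[1.0, -1.0], [-1.0, 1.0]])
d2 = np.array([0.3, -0.7])
alpha, beta = 0.8, 1.2
xs = np.linspace(-8.0, 8.0, 801)
X, Y = np.meshgrid(xs, xs)
energy = 0.5 * alpha * (X - Y) ** 2 + 0.5 * beta * ((X - d2[0]) ** 2 + (Y - d2[1]) ** 2)
logZ_grid = np.log(np.exp(-energy).sum() * (xs[1] - xs[0]) ** 2)  # Riemann sum
logZ_exact = log_Z_gaussian(d2, L2, alpha, beta)
```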

For a discrete state space $\Omega$, it is generally hard to treat Eqs. (10.23) and (10.24) analytically. To estimate them, the following quantity is often introduced in statistical mechanical informatics [13, 17]:

$$\int\_{-\infty}^{+\infty}\int\_{-\infty}^{+\infty}\cdots\int\_{-\infty}^{+\infty}\rho(\mathbf{d}|\alpha^{\*},\beta^{\*})\ln(Z(\mathbf{d},\alpha,\beta))\,dd\_1 dd\_2\cdots dd\_{|V|}.\tag{10.25}$$

The quantity in Eq. (10.25) can be rewritten as follows:

$$\begin{split}
&\int\_{-\infty}^{+\infty}\cdots\int\_{-\infty}^{+\infty}\rho\left(\mathbf{d}|\alpha^{\*},\beta^{\*}\right)\lim\_{n\to+0}\frac{1}{n}\left(Z(\mathbf{d},\alpha,\beta)^{n}-1\right)dd\_1 dd\_2\cdots dd\_{|V|} \\
&=\lim\_{n\to+0}\frac{1}{n}\left(\int\_{-\infty}^{+\infty}\cdots\int\_{-\infty}^{+\infty}\rho\left(\mathbf{d}|\alpha^{\*},\beta^{\*}\right)Z(\mathbf{d},\alpha,\beta)^{n}\,dd\_1 dd\_2\cdots dd\_{|V|}-1\right) \\
&=\lim\_{n\to+0}\frac{1}{n}\left(\int\_{-\infty}^{+\infty}\cdots\int\_{-\infty}^{+\infty}\rho\left(\mathbf{d}|\alpha^{\*},\beta^{\*}\right)\left(\sum\_{s\_1\in\Omega}\sum\_{s\_2\in\Omega}\cdots\sum\_{s\_{|V|}\in\Omega}w\left(\mathbf{s}|\mathbf{d},\alpha,\beta\right)\right)^{n}dd\_1 dd\_2\cdots dd\_{|V|}-1\right) \\
&=\lim\_{n\to+0}\frac{1}{n}\left(\int\_{-\infty}^{+\infty}\cdots\int\_{-\infty}^{+\infty}\rho\left(\mathbf{d}|\alpha^{\*},\beta^{\*}\right)\prod\_{j=1}^{n}\left(\sum\_{s\_{1,j}\in\Omega}\sum\_{s\_{2,j}\in\Omega}\cdots\sum\_{s\_{|V|,j}\in\Omega}w\left(s\_{1,j},s\_{2,j},\cdots,s\_{|V|,j}|\mathbf{d},\alpha,\beta\right)\right)dd\_1 dd\_2\cdots dd\_{|V|}-1\right) \\
&=\lim\_{n\to+0}\frac{1}{n}\left(\frac{1}{Z(\alpha^{\*})}\left(\frac{\beta^{\*}}{2\pi}\right)^{|V|/2}\int\_{-\infty}^{+\infty}\cdots\int\_{-\infty}^{+\infty}\left(\sum\_{\tau\_1\in\Omega}\sum\_{\tau\_2\in\Omega}\cdots\sum\_{\tau\_{|V|}\in\Omega}w\left(\tau\_1,\tau\_2,\cdots,\tau\_{|V|}|\mathbf{d},\alpha^{\*},\beta^{\*}\right)\right)\right. \\
&\qquad\qquad\left.\times\prod\_{j=1}^{n}\left(\sum\_{s\_{1,j}\in\Omega}\sum\_{s\_{2,j}\in\Omega}\cdots\sum\_{s\_{|V|,j}\in\Omega}w\left(s\_{1,j},s\_{2,j},\cdots,s\_{|V|,j}|\mathbf{d},\alpha,\beta\right)\right)dd\_1 dd\_2\cdots dd\_{|V|}-1\right).
\end{split}\tag{10.26}$$

Equation (10.26) means that computation of the statistical quantity in Eq. (10.25) can be reduced, up to some normalization constant, to computation of the statistical quantity in the probabilistic model given by the weight factor

$$w\left(\tau\_1,\tau\_2,\cdots,\tau\_{|V|}\middle|\mathbf{d},\alpha^{\*},\beta^{\*}\right)\prod\_{j=1}^{n}w\left(s\_{1,j},s\_{2,j},\cdots,s\_{|V|,j}\middle|\mathbf{d},\alpha,\beta\right).\tag{10.27}$$

We remark that the weight factor (10.27) is expressed by considering $n$ replicas of the posterior probabilistic model $P(\mathbf{s}|\mathbf{d},\alpha,\beta)$, and the analysis starting from the weight factor (10.27) is referred to as a **replica method** [13, 17]. One case that admits analytical treatment is the EM algorithm with the prior and posterior probabilistic models in Eqs. (10.2) and (10.3) for the complete graph $(V,E)$. The dynamics of the EM algorithm with the MCMC method can be analyzed by using the replica method and the **master equations** for **Glauber dynamics** [31].<sup>1</sup>
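The algebraic identity behind Eq. (10.26) is simply $\ln Z=\lim\_{n\to+0}\frac{1}{n}(Z^{n}-1)$: the replica method evaluates the average of $Z^{n}$ for integer $n$ ($n$ coupled copies of the posterior) and then continues the result to $n\to+0$. A minimal numeric illustration of the limit (the value of $Z$ below is arbitrary):

```python
import numpy as np

Z = 7.3  # stand-in for any positive partition-function value
# (Z^n - 1)/n approaches ln Z as n -> +0, with error of order n (ln Z)^2 / 2
approximations = {n: (Z ** n - 1.0) / n for n in (1e-1, 1e-3, 1e-6)}
```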

## *10.2.3 Expectation-Maximization Algorithm for Probabilistic Image Segmentations*

This section extends the framework of the previous section to statistical machine learning for probabilistic image segmentation. In probabilistic image segmentation, we consider a square grid graph $(V,E)$ in which a light intensity vector $\mathbf{d}\_i=(d\_{i\mathrm{R}},d\_{i\mathrm{G}},d\_{i\mathrm{B}})$ for the three color components red $d\_{i\mathrm{R}}$, green $d\_{i\mathrm{G}}$, and blue $d\_{i\mathrm{B}}$ is assigned to each node $i$. The state vector $\mathbf{s}$ for the labeled configuration and the data matrix $D$ for the color image configuration are expressed as

$$\mathbf{s}=\begin{pmatrix} s\_1 \\ s\_2 \\ \vdots \\ s\_{|V|} \end{pmatrix},\ D=\begin{pmatrix} \mathbf{d}\_1 \\ \mathbf{d}\_2 \\ \vdots \\ \mathbf{d}\_{|V|} \end{pmatrix}=\begin{pmatrix} d\_{1\mathrm{R}} & d\_{1\mathrm{G}} & d\_{1\mathrm{B}} \\ d\_{2\mathrm{R}} & d\_{2\mathrm{G}} & d\_{2\mathrm{B}} \\ \vdots & \vdots & \vdots \\ d\_{|V|\mathrm{R}} & d\_{|V|\mathrm{G}} & d\_{|V|\mathrm{B}} \end{pmatrix}.\tag{10.28}$$

Here ρ(*D*|*s*, *a*(+1), *a*(−1), *C*(+1), *C*(−1)) and *P*(*s*|α) are assumed to be as follows:

$$\rho(D|\mathbf{s},\mathbf{a}(+1),\mathbf{a}(-1),C(+1),C(-1))=\prod\_{i\in V}g\left(\mathbf{d}\_i\middle|s\_i,\mathbf{a}(s\_i),C(s\_i)\right),\tag{10.29}$$

<sup>1</sup> **Glauber dynamics** was proposed in Ref. [32].

172 K. Tanaka

$$P(\mathbf{s}|\alpha)=\frac{\prod\_{\{i,j\}\in E}\exp(-2\alpha(1-\delta\_{s\_i,s\_j}))}{\sum\_{s\_1\in\Omega}\sum\_{s\_2\in\Omega}\cdots\sum\_{s\_{|V|}\in\Omega}\prod\_{\{i,j\}\in E}\exp(-2\alpha(1-\delta\_{s\_i,s\_j}))},\tag{10.30}$$

where

$$g\left(\mathbf{d}\_i\middle|s\_i,\mathbf{a}(s\_i),C(s\_i)\right)\equiv\sqrt{\frac{1}{\det(2\pi C(s\_i))}}\exp\left(-\frac{1}{2}(\mathbf{d}\_i-\mathbf{a}(s\_i))C^{-1}(s\_i)(\mathbf{d}\_i-\mathbf{a}(s\_i))^{\mathrm{T}}\right),\tag{10.31}$$

$$\mathbf{a}(+1) = \begin{pmatrix} a\_{\mathrm{R}}(+1) \\ a\_{\mathrm{G}}(+1) \\ a\_{\mathrm{B}}(+1) \end{pmatrix}, \; \mathbf{a}(-1) = \begin{pmatrix} a\_{\mathrm{R}}(-1) \\ a\_{\mathrm{G}}(-1) \\ a\_{\mathrm{B}}(-1) \end{pmatrix}, \tag{10.32}$$

$$C(+1)=\begin{pmatrix} C\_{\mathrm{RR}}(+1) & C\_{\mathrm{RG}}(+1) & C\_{\mathrm{RB}}(+1) \\ C\_{\mathrm{GR}}(+1) & C\_{\mathrm{GG}}(+1) & C\_{\mathrm{GB}}(+1) \\ C\_{\mathrm{BR}}(+1) & C\_{\mathrm{BG}}(+1) & C\_{\mathrm{BB}}(+1) \end{pmatrix},\ C(-1)=\begin{pmatrix} C\_{\mathrm{RR}}(-1) & C\_{\mathrm{RG}}(-1) & C\_{\mathrm{RB}}(-1) \\ C\_{\mathrm{GR}}(-1) & C\_{\mathrm{GG}}(-1) & C\_{\mathrm{GB}}(-1) \\ C\_{\mathrm{BR}}(-1) & C\_{\mathrm{BG}}(-1) & C\_{\mathrm{BB}}(-1) \end{pmatrix}.\tag{10.33}$$

Note that the probabilistic graphical model in Eq. (10.30) is referred to as a **Potts model** [33].

In probabilistic segmentation and clustering, ρ(*D*|*s*, *a*(+1), *a*(−1), *C*(+1), *C*(−1)) in Eq. (10.29) and *P*(*s*|α) in Eq. (10.30) correspond to the data generative and prior models, respectively. The joint probability of *s* and *D* is expressed in terms of the data generative and prior distributions, ρ(*D*|*s*, *a*(+1), *a*(−1), *C*(+1), *C*(−1)) and *P*(*s*|α), as follows:

$$\rho(\mathbf{s},D|\alpha,\mathbf{a}(+1),\mathbf{a}(-1),C(+1),C(-1))\equiv\rho(D|\mathbf{s},\mathbf{a}(+1),\mathbf{a}(-1),C(+1),C(-1))P(\mathbf{s}|\alpha).\tag{10.34}$$

By using the joint probability distribution, the posterior probability *P*(*s*|*D*, α, *a*(+1), *a*(−1), *C*(+1), *C*(−1)) and the marginal likelihood ρ(*D*|α, *a*(+1), *a*(−1), *C*(+1), *C*(−1)) are defined by using Bayes formulas as follows:

$$P(\mathbf{s}|D,\alpha,\mathbf{a}(+1),\mathbf{a}(-1),C(+1),C(-1))\equiv\frac{\rho(\mathbf{s},D|\alpha,\mathbf{a}(+1),\mathbf{a}(-1),C(+1),C(-1))}{\rho(D|\alpha,\mathbf{a}(+1),\mathbf{a}(-1),C(+1),C(-1))},\tag{10.35}$$

$$\begin{split}
\rho(D|\alpha,\mathbf{a}(+1),\mathbf{a}(-1),C(+1),C(-1)) \\
\equiv\sum\_{s\_1\in\Omega}\sum\_{s\_2\in\Omega}\cdots\sum\_{s\_{|V|}\in\Omega}\rho(\mathbf{s},D|\alpha,\mathbf{a}(+1),\mathbf{a}(-1),C(+1),C(-1)).
\end{split}\tag{10.36}$$

Estimates of the hyperparameters and the parameter vector, namely $\widehat{\alpha}(D)$, $\widehat{\mathbf{a}}(+1|D)$, $\widehat{\mathbf{a}}(-1|D)$, $\widehat{C}(+1|D)$, $\widehat{C}(-1|D)$, and $\widehat{\mathbf{s}}(D)=\left(\widehat{s}\_1(D),\widehat{s}\_2(D),\cdots,\widehat{s}\_{|V|}(D)\right)$, are determined by

$$\begin{split} \left( \widehat{\boldsymbol{a}}(\mathcal{D}), \widehat{\boldsymbol{a}}(+1|\mathcal{D}), \widehat{\boldsymbol{a}}(-1|\mathcal{D}), \widehat{\boldsymbol{C}}(+1|\mathcal{D}), \widehat{\boldsymbol{C}}(-1|\mathcal{D}) \right) \\ = \mathop{\arg\max}\_{\left( \boldsymbol{a}, \boldsymbol{a}(+1), \boldsymbol{a}(-1), \boldsymbol{C}(+1), \boldsymbol{C}(-1) \right)} \rho \left( \boldsymbol{D} \middle| \boldsymbol{a}, \boldsymbol{a}(+1), \boldsymbol{a}(-1), \boldsymbol{C}(+1), \boldsymbol{C}(-1) \right), \end{split} \tag{10.37}$$
 
$$\widehat{\boldsymbol{s}}\_{i}(\mathcal{D}) = \mathop{\arg\max}\_{\boldsymbol{s}\_{i} \in \Omega} P\_{i} \Big( \boldsymbol{s}\_{i} \big| \boldsymbol{D}, \widehat{\boldsymbol{a}}(\mathcal{D}), \widehat{\boldsymbol{a}}(+1|\mathcal{D}), \widehat{\boldsymbol{a}}(-1|\mathcal{D}), \widehat{\boldsymbol{C}}(+1|\mathcal{D}), \widehat{\boldsymbol{C}}(-1|\mathcal{D}) \Big) (i \in V). \tag{10.38}$$

The *Q*-function for the EM algorithm in the present framework is defined by

$$\begin{split}
\mathcal{Q}(\alpha,\mathbf{a}(+1),\mathbf{a}(-1),C(+1),C(-1)|\alpha',\mathbf{a}'(+1),\mathbf{a}'(-1),C'(+1),C'(-1),D) \\
\equiv\sum\_{s\_1\in\Omega}\sum\_{s\_2\in\Omega}\cdots\sum\_{s\_{|V|}\in\Omega}P(\mathbf{s}|D,\alpha',\mathbf{a}'(+1),\mathbf{a}'(-1),C'(+1),C'(-1)) \\
\quad\times\ln(\rho(\mathbf{s},D|\alpha,\mathbf{a}(+1),\mathbf{a}(-1),C(+1),C(-1))).
\end{split}\tag{10.39}$$

The EM algorithm is a procedure that performs the following **E-step** and **M-step** repeatedly for $t=0,1,2,\cdots$ until $\widehat{\alpha}(D)$, $\widehat{\mathbf{a}}(+1,D)$, $\widehat{\mathbf{a}}(-1,D)$, $\widehat{C}(+1,D)$, and $\widehat{C}(-1,D)$ converge:


$$\begin{split}
(\alpha(t+1),\mathbf{a}(+1,t+1),\mathbf{a}(-1,t+1),C(+1,t+1),C(-1,t+1)) \\
\leftarrow\mathop{\mathrm{extremum}}\_{\alpha,\mathbf{a}(+1),\mathbf{a}(-1),C(+1),C(-1)}\mathcal{Q}(\alpha,\mathbf{a}(+1),\mathbf{a}(-1),C(+1),C(-1)|\alpha(t),\mathbf{a}(+1,t),\mathbf{a}(-1,t),C(+1,t),C(-1,t),D).
\end{split}\tag{10.40}$$

Update $\widehat{\alpha}(D)\leftarrow\alpha(t+1)$, $\widehat{\mathbf{a}}(+1,D)\leftarrow\mathbf{a}(+1,t+1)$, $\widehat{\mathbf{a}}(-1,D)\leftarrow\mathbf{a}(-1,t+1)$, $\widehat{C}(+1,D)\leftarrow C(+1,t+1)$, and $\widehat{C}(-1,D)\leftarrow C(-1,t+1)$.

By using the equalities in Eqs. (10.29), (10.30), (10.34), and (10.35), the EM algorithm by the *Q*-function can be reduced to the following simultaneous update rules:

$$\begin{split}
\frac{1}{|E|}\sum\_{\{i,j\}\in E}\sum\_{s\_i\in\Omega}\sum\_{s\_j\in\Omega}\left(1-\delta\_{s\_i,s\_j}\right)P\_{ij}\left(s\_i,s\_j\middle|\alpha(t+1)\right) \\
=\frac{1}{|E|}\sum\_{\{i,j\}\in E}\sum\_{s\_i\in\Omega}\sum\_{s\_j\in\Omega}\left(1-\delta\_{s\_i,s\_j}\right)P\_{ij}\left(s\_i,s\_j\middle|D,\alpha(t),\mathbf{a}(+1,t),\mathbf{a}(-1,t),C(+1,t),C(-1,t)\right),
\end{split}\tag{10.41}$$

$$\mathbf{a}(s\_i,t+1)=\frac{\sum\_{i\in V}\mathbf{d}\_i P\_i(s\_i|D,\alpha(t),\mathbf{a}(+1,t),\mathbf{a}(-1,t),C(+1,t),C(-1,t))}{\sum\_{i\in V}P\_i(s\_i|D,\alpha(t),\mathbf{a}(+1,t),\mathbf{a}(-1,t),C(+1,t),C(-1,t))}\ (s\_i\in\Omega),\tag{10.42}$$

$$\begin{split}
C(s\_i,t+1) \\
=\frac{\sum\_{i\in V}(\mathbf{d}\_i-\mathbf{a}(s\_i,t))^{\mathrm{T}}(\mathbf{d}\_i-\mathbf{a}(s\_i,t))P\_i(s\_i|D,\alpha(t),\mathbf{a}(+1,t),\mathbf{a}(-1,t),C(+1,t),C(-1,t))}{\sum\_{i\in V}P\_i(s\_i|D,\alpha(t),\mathbf{a}(+1,t),\mathbf{a}(-1,t),C(+1,t),C(-1,t))} \\
(s\_i\in\Omega),
\end{split}\tag{10.43}$$

where

$$\begin{split}
P\_i(s\_i|D,\alpha,\mathbf{a}(+1),\mathbf{a}(-1),C(+1),C(-1)) \\
\equiv\sum\_{\tau\_1\in\Omega}\sum\_{\tau\_2\in\Omega}\cdots\sum\_{\tau\_{|V|}\in\Omega}\delta\_{s\_i,\tau\_i}P\left(\tau\_1,\tau\_2,\cdots,\tau\_{|V|}\middle|D,\alpha,\mathbf{a}(+1),\mathbf{a}(-1),C(+1),C(-1)\right)\ (i\in V),
\end{split}\tag{10.44}$$

$$\begin{split}
P\_{ij}(s\_i,s\_j|D,\alpha,\mathbf{a}(+1),\mathbf{a}(-1),C(+1),C(-1)) \\
=P\_{ji}(s\_j,s\_i|D,\alpha,\mathbf{a}(+1),\mathbf{a}(-1),C(+1),C(-1)) \\
\equiv\sum\_{\tau\_1\in\Omega}\sum\_{\tau\_2\in\Omega}\cdots\sum\_{\tau\_{|V|}\in\Omega}\delta\_{s\_i,\tau\_i}\delta\_{s\_j,\tau\_j}P\left(\tau\_1,\tau\_2,\cdots,\tau\_{|V|}\middle|D,\alpha,\mathbf{a}(+1),\mathbf{a}(-1),C(+1),C(-1)\right)\ (\{i,j\}\in E),
\end{split}\tag{10.45}$$

$$\begin{split}
P\_{ij}(s\_i,s\_j|\alpha)&=P\_{ji}(s\_j,s\_i|\alpha) \\
&\equiv\sum\_{\tau\_1\in\Omega}\sum\_{\tau\_2\in\Omega}\cdots\sum\_{\tau\_{|V|}\in\Omega}\delta\_{s\_i,\tau\_i}\delta\_{s\_j,\tau\_j}P(\tau\_1,\tau\_2,\cdots,\tau\_{|V|}|\alpha)\ (\{i,j\}\in E).
\end{split}\tag{10.46}$$
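Equations (10.42) and (10.43) are weighted-mean and weighted-covariance updates: each pixel contributes to the parameters of label $s$ with weight given by its marginal $P\_i(s|D,\cdots)$. A Python sketch of this M-step follows; the array names and shapes are assumptions made for the example, and the responsibilities `resp` would come from approximating the marginals in Eq. (10.44), e.g. by MCMC or mean-field methods.

```python
import numpy as np

def m_step(D, resp, prev_means):
    """Eqs. (10.42)-(10.43): per-label weighted means and covariances.
    D: (n_pixels, 3) RGB rows of the data matrix, resp: (n_pixels, n_labels)
    marginals P_i(s|...), prev_means: (n_labels, 3) means a(s, t)."""
    n_labels = resp.shape[1]
    means = np.empty((n_labels, D.shape[1]))
    covs = np.empty((n_labels, D.shape[1], D.shape[1]))
    for k in range(n_labels):
        w = resp[:, k]
        means[k] = (w[:, None] * D).sum(axis=0) / w.sum()   # Eq. (10.42)
        diff = D - prev_means[k]                            # a(s, t) from step t, as in Eq. (10.43)
        covs[k] = (w[:, None, None] * diff[:, :, None] * diff[:, None, :]).sum(axis=0) / w.sum()
    return means, covs

# illustrative check: two well-separated "colors" with hard responsibilities
rng = np.random.default_rng(1)
D = np.vstack([rng.normal(0.2, 0.02, (50, 3)), rng.normal(0.8, 0.02, (50, 3))])
resp = np.zeros((100, 2))
resp[:50, 0] = 1.0
resp[50:, 1] = 1.0
means, covs = m_step(D, resp, prev_means=np.array([[0.2] * 3, [0.8] * 3]))
```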

#### **10.3 Statistical Mechanical Informatics**

In statistical mechanical informatics [13–17], Ising models are very familiar probabilistic models for which computations are done by statistical mechanical techniques, including advanced mean-field methods, renormalization group methods, Monte Carlo simulations, and replica methods [36, 37]. This section reviews the framework of the Ising model and associated advanced mean-field methods.<sup>2</sup>

<sup>2</sup> A review of both exact results and approximate results as well as perturbative computations for Ising models is given in Refs. [34, 35].

#### *10.3.1 Ising Model*

Let us consider an Ising model defined by the following probability distribution for the state space $\Omega=\{+1,-1\}$ of the state variable $s\_i$ at each node $i(\in V)$:

$$\begin{split}
P\left(\mathbf{s}\middle|\mathbf{d},\frac{J}{k\_{\mathrm{B}}T},\frac{h}{k\_{\mathrm{B}}T}\right) &= P\left(s\_1,s\_2,\cdots,s\_{|V|}\middle|d\_1,d\_2,\cdots,d\_{|V|},\frac{J}{k\_{\mathrm{B}}T},\frac{h}{k\_{\mathrm{B}}T}\right) \\
&=\frac{\exp\left(-\frac{1}{k\_{\mathrm{B}}T}\left(\frac{1}{2}J\sum\_{\{i,j\}\in E}\left(s\_i-s\_j\right)^2+\frac{1}{2}h\sum\_{i\in V}\left(s\_i-d\_i\right)^2\right)\right)}{\sum\_{s\_1\in\Omega}\sum\_{s\_2\in\Omega}\cdots\sum\_{s\_{|V|}\in\Omega}\exp\left(-\frac{1}{k\_{\mathrm{B}}T}\left(\frac{1}{2}J\sum\_{\{i,j\}\in E}\left(s\_i-s\_j\right)^2+\frac{1}{2}h\sum\_{i\in V}\left(s\_i-d\_i\right)^2\right)\right)} \\
&\quad\left(J>0,\ T>0,\ d\_i\in(-\infty,+\infty)\ (\forall i\in V)\right).
\end{split}\tag{10.47}$$

Because $s\_i^2=1$ $(i\in V)$, the probability distribution $P(\mathbf{s})$ can be reduced to

$$P\left(s\middle|\mathbf{d}, \frac{J}{k\_{\rm B}T}, \frac{h}{k\_{\rm B}T}\right) = \frac{1}{Z} \exp\left(-\frac{1}{k\_{\rm B}T}H(\mathbf{s})\right)(T > 0),\tag{10.48}$$

$$\begin{aligned} H(\mathbf{s}) &= H(s\_1, s\_2, \dots, s\_{|V|}) \\ &\equiv -J \sum\_{\{i,j\} \in E} s\_i s\_j - h \sum\_{i \in V} d\_i s\_i \left( J > 0, \ h \ge 0, \ d\_i \in ( -\infty, +\infty) \ (\forall i \in V) \right), \end{aligned} \tag{10.49}$$

$$Z\equiv\sum\_{s\_1\in\Omega}\sum\_{s\_2\in\Omega}\cdots\sum\_{s\_{|V|}\in\Omega}\exp\left(-\frac{1}{k\_{\mathrm{B}}T}H(\mathbf{s})\right),\tag{10.50}$$

where $H(\mathbf{s})$ and $Z$ are referred to in statistical mechanical informatics as the **energy function** (or **Hamiltonian**) and the **partition function**, respectively. The probability distribution in Eq. (10.47) is called the **Gibbs distribution**, $k\_{\mathrm{B}}$ is the **Boltzmann constant**, $T$ is the **(absolute) temperature**, $J$ is the **(ferromagnetic) interaction**, and $h$ is the **external field**.

Let us consider the **Kullback-Leibler divergence**

$$\mathrm{KL}[P||R]\equiv\sum\_{s\_1\in\Omega}\sum\_{s\_2\in\Omega}\cdots\sum\_{s\_{|V|}\in\Omega}R(\mathbf{s})\ln\left(\frac{R(\mathbf{s})}{P\left(\mathbf{s}\middle|\frac{J}{k\_{\mathrm{B}}T},\frac{h}{k\_{\mathrm{B}}T}\right)}\right),\tag{10.51}$$

which is always non-negative for two probability distributions $P\left(\mathbf{s}\middle|\frac{J}{k\_{\mathrm{B}}T},\frac{h}{k\_{\mathrm{B}}T}\right)$ and $R(\mathbf{s})$ and is regarded as a pseudo-distance between them. By substituting the explicit expression for $P(\mathbf{s})$ in Eqs. (10.48), (10.49), and (10.50) into Eq. (10.51), the Kullback-Leibler divergence (10.51) can be expressed in terms of the partition function $Z$ and the free energy functional $\mathcal{F}[R]$ as follows:

$$\text{KL}[P||R] = \frac{1}{k\_{\text{B}}T} \left( k\_{\text{B}}T \ln(Z) + \mathcal{F}[R] \right),\tag{10.52}$$

where

$$\mathcal{F}[R]\equiv\sum\_{s\_1\in\Omega}\sum\_{s\_2\in\Omega}\cdots\sum\_{s\_{|V|}\in\Omega}H(\mathbf{s})R(\mathbf{s})+k\_{\mathrm{B}}T\sum\_{s\_1\in\Omega}\sum\_{s\_2\in\Omega}\cdots\sum\_{s\_{|V|}\in\Omega}R(\mathbf{s})\ln(R(\mathbf{s})).\tag{10.53}$$

For the free energy functional *F*[*R*], it is valid that

$$\mathop{\arg\min}\_{R}\left\{\mathcal{F}[R]\,\middle|\,\sum\_{\tau\_1\in\Omega}\sum\_{\tau\_2\in\Omega}\cdots\sum\_{\tau\_{|V|}\in\Omega}R\left(\tau\_1,\tau\_2,\cdots,\tau\_{|V|}\right)=1\right\}=P\left(\mathbf{s}\middle|\frac{J}{k\_{\mathrm{B}}T},\frac{h}{k\_{\mathrm{B}}T}\right),\tag{10.54}$$

$$\min\_{R}\left\{\mathcal{F}[R]\,\middle|\,\sum\_{\tau\_1\in\Omega}\sum\_{\tau\_2\in\Omega}\cdots\sum\_{\tau\_{|V|}\in\Omega}R\left(\tau\_1,\tau\_2,\cdots,\tau\_{|V|}\right)=1\right\}=-k\_{\mathrm{B}}T\ln(Z).\tag{10.55}$$

Note that −*k*B*T* ln(*Z*) is referred to as the **free energy** for the Gibbs distribution in Eq. (10.47).
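For a system small enough to enumerate, Eqs. (10.52)-(10.55) can be checked directly by brute force: build the Gibbs distribution, evaluate $\mathcal{F}[R]$ at $R=P$ and at some other distribution, and confirm that the minimum equals $-k\_{\mathrm{B}}T\ln(Z)$. The triangle graph and parameter values below are illustrative assumptions.

```python
import itertools
import numpy as np

kBT, J, h = 1.0, 1.0, 0.5
edges = [(0, 1), (1, 2), (0, 2)]
d = np.array([1.0, -1.0, 1.0])

# enumerate all 2^3 spin configurations and their energies H(s), Eq. (10.49)
states = np.array(list(itertools.product([-1.0, 1.0], repeat=3)))
H = np.array([-J * sum(s[i] * s[j] for i, j in edges) - h * d @ s for s in states])
w = np.exp(-H / kBT)
Z = w.sum()
P = w / Z  # Gibbs distribution, Eq. (10.48)

def free_energy(R):
    """F[R] of Eq. (10.53): average energy plus kBT times negative entropy."""
    return (H * R).sum() + kBT * (R * np.log(R)).sum()

F_gibbs = free_energy(P)                       # should equal -kBT ln Z
R_uniform = np.full(len(states), 1.0 / len(states))
```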

#### *10.3.2 Advanced Mean-Field Method*

This section reviews the fundamental framework of advanced mean-field methods [12], including the **mean-field approximation** [35–37] and the **Bethe approximation** [35, 39–41]. Our framework is given for the Ising model in Eqs. (10.48), (10.49), and (10.50). It is known that a generalization of the present framework can be realized by using the **cluster variation method** in Refs. [42–45].

We introduce a trial probability distribution *R*(*s*) = *R*(*s*1,*s*2, ···,*s*|*V*|) which is restricted to the following functional form:

$$R(\mathbf{s}) = R\left(\mathbf{s}\_1, \mathbf{s}\_2, \dots, \mathbf{s}\_{|V|}\right) = \prod\_{i \in V} R\_i(\mathbf{s}\_i),\tag{10.56}$$

$$R\_i(s\_i) = \sum\_{\mathbf{r}\_1 \in \Omega} \sum\_{\mathbf{r}\_2 \in \Omega} \cdots \sum\_{\mathbf{r}\_{|V|} \in \Omega} \delta\_{s\_i, \mathbf{r}\_i} R\left(\mathbf{r}\_1, \mathbf{r}\_2, \cdots, \mathbf{r}\_{|V|}\right) (i \in V). \tag{10.57}$$

By using the definition of *Ri*(*si*) and the normalization condition

#### 10 Review of Sublinear Modeling in Probabilistic Graphical Models … 177

$$\sum\_{\tau\_1 \in \Omega} \sum\_{\tau\_2 \in \Omega} \cdots \sum\_{\tau\_{|V|} \in \Omega} \mathcal{R}(\tau\_1, \tau\_2, \cdots, \tau\_{|V|}) = 1,\tag{10.58}$$

we confirm that

$$\sum\_{\tau\_i \in \Omega} R\_i(\tau\_i) = R\_i(+1) + R\_i(-1) = 1 \ (i \in V). \tag{10.59}$$

By substituting the expression for $R(\mathbf{s})$ in Eq. (10.56) in terms of the marginal probability distributions $R\_i(s\_i)$ $(i\in V)$, represented as $R\_i=\begin{pmatrix} R\_i(+1) & 0 \\ 0 & R\_i(-1)\end{pmatrix}$, into Eq. (10.53), the free energy functional $\mathcal{F}[R]$ can be reduced to the following mean-field free energy functional:

$$\mathcal{F}[R] = \mathcal{F}\_{\rm MF}[\{R\_i | i \in V\}],\tag{10.60}$$

where


$$\begin{split}
\mathcal{F}\_{\mathrm{MF}}[\{R\_i|i\in V\}] &= \mathcal{F}\_{\mathrm{MF}}\left[\left\{\begin{pmatrix} R\_i(+1) & 0 \\ 0 & R\_i(-1)\end{pmatrix}\middle|\,i\in V\right\}\right] \\
&\equiv -J\sum\_{\{i,j\}\in E}\left(\sum\_{\tau\_i\in\Omega}\tau\_i R\_i(\tau\_i)\right)\left(\sum\_{\tau\_j\in\Omega}\tau\_j R\_j(\tau\_j)\right)-h\sum\_{i\in V}d\_i\left(\sum\_{\tau\_i\in\Omega}\tau\_i R\_i(\tau\_i)\right) \\
&\quad+k\_{\mathrm{B}}T\sum\_{i\in V}\sum\_{\tau\_i\in\Omega}R\_i(\tau\_i)\ln(R\_i(\tau\_i)).
\end{split}\tag{10.61}$$

Let us suppose the following conditional minimization of the free energy functional:

$$\begin{split}
\left\{\widehat{R}\_i=\begin{pmatrix} \widehat{R}\_i(+1) & 0 \\ 0 & \widehat{R}\_i(-1)\end{pmatrix}\middle|\,i\in V\right\} \\
=\mathop{\arg\min}\_{\{R\_i|i\in V\}}\left\{\mathcal{F}\_{\mathrm{MF}}[\{R\_i|i\in V\}]\,\middle|\,\sum\_{\tau\_i\in\Omega}R\_i(\tau\_i)=1,\ i\in V\right\}.
\end{split}\tag{10.62}$$

First we introduce Lagrange multipliers $\lambda\_i$ $(i\in V)$ to ensure the normalization conditions $\sum\_{\tau\_i\in\Omega}R\_i(\tau\_i)=1$ $(i\in V)$ as follows:

$$\mathcal{L}\_{\text{MF}}[\{\mathcal{R}\_{i}|i \in V\}] \equiv \mathcal{F}\_{\text{MF}}[\{\mathcal{R}\_{i}|i \in V\}] - \sum\_{i \in V} \lambda\_{i} \left(\sum\_{\tau\_{i} \in \Omega} \mathcal{R}\_{i}(\tau\_{i}) - 1\right). \tag{10.63}$$

$\widehat{R}\_i$ $(i\in V)$ are determined so as to satisfy the following extremum conditions:

$$\left[\frac{\partial}{\partial \mathcal{R}\_i(-1)} \mathcal{L}\_{\text{MF}}[\{\mathcal{R}\_i | i \in V\}]\right]\_{\{\mathcal{R}\_i = \widehat{\mathcal{R}}\_i | i \in V\}} = 0 \ (i \in V),\tag{10.64}$$

$$\left[\frac{\partial}{\partial \mathcal{R}\_i(+1)} \mathcal{L}\_{\text{MF}}[\{\mathcal{R}\_i | i \in V\}]\right]\_{\{\mathcal{R}\_i = \tilde{\mathcal{R}}\_i | i \in V\}} = 0 \ (i \in V), \tag{10.65}$$

such that

$$\left[\frac{\partial}{\partial \mathcal{R}\_i} \mathcal{L}\_{\text{MF}}[\{\mathcal{R}\_i | i \in V\}]\right]\_{\{\mathcal{R}\_i = \widehat{\mathcal{R}}\_i | i \in V\}} = 0 \ (i \in V). \tag{10.66}$$

It can be shown that $\widehat{R}\_i$ $(i\in V)$ are derived as follows:

$$\begin{split} \widehat{R}\_{i}(\boldsymbol{s}\_{i}) \\ = & \exp\left(-1 + \frac{\lambda\_{i}}{k\_{\mathrm{B}}T}\right) \exp\left(\frac{1}{k\_{\mathrm{B}}T} \left(hd\_{i} + J\sum\_{j \in \partial i} \left(\sum\_{\tau\_{j} \in \Omega} \tau\_{j}\widehat{R}\_{j}(\tau\_{j})\right)\right) \mathbf{s}\_{i}\right) \text{ ( $i \in V$ ,  $s\_{i} \in \Omega$ )}. \end{split} \tag{10.67}$$

Finally, $\lambda\_i$ needs to be determined so that the normalization condition of the marginal probability $\widehat{R}\_i(s\_i)$ is satisfied. The marginal probabilities $\{\widehat{R}\_i\,|\,i\in V\}$ are then derived as

$$\widehat{R}\_{i}(s\_{i}) = \frac{\exp\left(\frac{1}{k\_{\mathrm{B}}T} \left(hd\_{i} + J \sum\_{j \in \partial i} \left(\sum\_{\tau\_{j} \in \Omega} \tau\_{j} \widehat{R}\_{j}(\tau\_{j})\right)\right) s\_{i}\right)}{\sum\_{\tau\_{i} \in \Omega} \exp\left(\frac{1}{k\_{\mathrm{B}}T} \left(hd\_{i} + J \sum\_{j \in \partial i} \left(\sum\_{\tau\_{j} \in \Omega} \tau\_{j} \widehat{R}\_{j}(\tau\_{j})\right)\right) \tau\_{i}\right)} \ (i \in V, \ s\_i \in \Omega). \tag{10.68}$$

We introduce the local magnetization

$$m\_i \equiv \sum\_{\tau\_i \in \Omega} \tau\_i \widehat{R}\_i(\tau\_i). \tag{10.69}$$

By solving the simultaneous equations

$$\sum\_{\tau\_i \in \Omega} \widehat{R}\_i(\tau\_i) = \widehat{R}\_i(+1) + \widehat{R}\_i(-1) = 1 \ (i \in V), \tag{10.70}$$

$$\sum\_{\tau\_i \in \Omega} \tau\_i \widehat{R}\_i(\tau\_i) = \widehat{R}\_i(+1) - \widehat{R}\_i(-1) = m\_i \ (i \in V),\tag{10.71}$$

with respect to $\widehat{R}\_i(+1)$ and $\widehat{R}\_i(-1)$, we derive the following expression for the marginal probability:

10 Review of Sublinear Modeling in Probabilistic Graphical Models … 179

$$
\widehat{R}\_i(\mathbf{s}\_i) = \frac{1}{2} \left( 1 + m\_i \mathbf{s}\_i \right) (i \in V). \tag{10.72}
$$

The extremum conditions in Eq. (10.68) can be reduced to the following simultaneous deterministic equations for {*m*<sub>*i*</sub> | *i* ∈ *V*}:

$$m\_i = \tanh\left(\frac{1}{k\_\text{B}T} \left(hd\_i + J\sum\_{j \in \partial i} m\_j\right)\right) (i \in V),\tag{10.73}$$

which is referred to as the **mean-field equation**.<sup>3</sup>

By substituting Eq. (10.72) into Eq. (10.61), the mean-field free energy functional can be reduced to

$$\mathcal{F}\_{\rm MF} \left[ \left\{ \widehat{R}\_i(-1), \widehat{R}\_i(+1) \middle| i \in V \right\} \right] = F\_{\rm MF} (m\_1, m\_2, \dots, m\_{|V|}), \tag{10.74}$$

$$\begin{aligned} F\_{\text{MF}}(m\_1, m\_2, \dots, m\_{|V|}) & \equiv -J \sum\_{\{i,j\} \in E} m\_i m\_j - h \sum\_{i \in V} d\_i m\_i \\ &+ k\_{\text{B}} T \sum\_{i \in V} \frac{1}{2} (1 + m\_i) \ln \left( \frac{1}{2} (1 + m\_i) \right) \\ &+ k\_{\text{B}} T \sum\_{i \in V} \frac{1}{2} (1 - m\_i) \ln \left( \frac{1}{2} (1 - m\_i) \right) . \end{aligned} (10.75)$$

The extremum conditions

$$\frac{\partial}{\partial m\_i} F\_{\rm MF} \left( m\_1, m\_2, \dots, m\_{|V|} \right) = 0 \ (i \in V) \tag{10.76}$$

can be reduced to the mean-field equations in Eq. (10.73).
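As a concrete illustration, the mean-field equations (10.73) can be solved by simple fixed-point iteration. The following Python sketch is our own illustrative implementation (the function name, the 3×3 periodic grid, and the parameter values are assumptions for the example, not from the text):

```python
import math

def mean_field(adj, d, J, h, kBT, iters=1000, tol=1e-10):
    """Solve the mean-field equations (10.73),
    m_i = tanh((h*d_i + J*sum_{j in adj(i)} m_j) / kBT),
    by fixed-point iteration over all nodes."""
    m = {i: 0.5 for i in adj}  # positive start breaks the m = 0 symmetry
    for _ in range(iters):
        new = {i: math.tanh((h * d[i] + J * sum(m[j] for j in adj[i])) / kBT)
               for i in adj}
        if max(abs(new[i] - m[i]) for i in adj) < tol:
            return new
        m = new
    return m

# 3x3 square grid with periodic boundaries (every node has degree 4), d_i = 1
n = 3
nodes = [(x, y) for x in range(n) for y in range(n)]
adj = {(x, y): [((x + 1) % n, y), ((x - 1) % n, y),
                (x, (y + 1) % n), (x, (y - 1) % n)] for (x, y) in nodes}
d = {v: 1.0 for v in nodes}
m = mean_field(adj, d, J=1.0, h=0.0, kBT=2.0)  # J/(kB*T) = 0.5 > 1/4: ordered
```

By symmetry every *m*<sub>*i*</sub> converges to the same value, the positive solution of *m* = tanh(4(*J*/(*k*<sub>B</sub>*T*))*m*).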

We now explore the framework of the Bethe approximation for the Ising model in Eqs. (10.48), (10.49), and (10.50). Our framework is based on the cluster variation method [39, 42–45].

We introduce a trial probability distribution *R*(*s*) = *R*(*s*1,*s*2, ···,*s*|*V*|) that is restricted to the following functional form:

<sup>3</sup> Equation (10.73) is often referred to as the **naive mean-field equation** in statistical machine learning theory [2, 3, 12].

$$\begin{aligned} R(s) = R(s\_1, s\_2, \dots, s\_{|V|}) &= \left(\prod\_{i \in V} R\_i(s\_i)\right) \left(\prod\_{\{i,j\} \in E} \frac{R\_{ij}(s\_i, s\_j)}{R\_i(s\_i)R\_j(s\_j)}\right) \\ &= \left(\prod\_{i \in V} R\_i(s\_i)^{1 - |\partial i|}\right) \left(\prod\_{\{i,j\} \in E} R\_{ij}(s\_i, s\_j)\right), (10.77) \end{aligned}$$

where

$$R\_{ij}(s\_{i}, s\_{j}) = R\_{ji}(s\_{j}, s\_{i}) \equiv \sum\_{\tau\_{1} \in \Omega} \sum\_{\tau\_{2} \in \Omega} \cdots \sum\_{\tau\_{|V|} \in \Omega} \delta\_{s\_{i}, \tau\_{i}} \delta\_{s\_{j}, \tau\_{j}} R(\tau\_{1}, \tau\_{2}, \cdots, \tau\_{|V|}) \ (\{i, j\} \in E). \tag{10.78}$$

By using Eqs. (10.57) and (10.78), we can derive the normalization and reducibility conditions in the marginal probabilities as follows:

$$\sum\_{\tau\_i \in \Omega} R\_i(\tau\_i) = 1 \ (i \in V), \quad \sum\_{\tau\_i \in \Omega} \sum\_{\tau\_j \in \Omega} R\_{ij}(\tau\_i, \tau\_j) = 1 \ (\{i, j\} \in E), \tag{10.79}$$

$$R\_i(s\_i) = \sum\_{\tau\_j \in \Omega} R\_{ij}(s\_i, \tau\_j), \quad R\_j(s\_j) = \sum\_{\tau\_i \in \Omega} R\_{ij}(\tau\_i, s\_j) \ (\{i, j\} \in E). \tag{10.80}$$

By substituting the explicit expression for *P*(*s*) and the expression for ln(*R*(*s*)) in terms of the marginal probability distributions

$$R\_{i} = \begin{pmatrix} R\_{i}(+1) & 0 \\ 0 & R\_{i}(-1) \end{pmatrix} \ (i \in V)$$

and

$$R\_{ij} = \begin{pmatrix} R\_{ij}(+1, +1) & 0 & 0 & 0 \\ 0 & R\_{ij}(+1, -1) & 0 & 0 \\ 0 & 0 & R\_{ij}(-1, +1) & 0 \\ 0 & 0 & 0 & R\_{ij}(-1, -1) \end{pmatrix} \ (\{i, j\} \in E)$$

in Eq. (10.77) into Eq. (10.51), the Kullback-Leibler divergence can be reduced to the following expression in terms of the partition function *Z* and the Bethe free energy functional $\mathcal{F}\_{\text{Bethe}}[\{R\_i \mid i \in V\}, \{R\_{ij} \mid \{i, j\} \in E\}]$:

$$\text{KL}\{P||R\} = \frac{1}{k\_{\text{B}}T} \left( k\_{\text{B}}T \ln Z + \mathcal{F}\_{\text{Bethe}} \left[ \{ \mathcal{R}\_i | i \in V \}, \{ \mathcal{R}\_{ij} | \{ i, j \} \in E \} \right] \right), \quad (10.81)$$

where


$$\begin{split} & \mathcal{F}\_{\text{Bethe}}\left[ \{R\_{i} \mid i \in V\}, \{R\_{ij} \mid \{i, j\} \in E\} \right] \\ & \equiv -J \sum\_{\{i, j\} \in E} \left( \sum\_{\tau\_{i} \in \Omega} \sum\_{\tau\_{j} \in \Omega} \tau\_{i} \tau\_{j} R\_{ij}(\tau\_{i}, \tau\_{j}) \right) - h \sum\_{i \in V} d\_{i} \left( \sum\_{\tau\_{i} \in \Omega} \tau\_{i} R\_{i}(\tau\_{i}) \right) \\ & \quad + k\_{\text{B}} T \sum\_{i \in V} (1 - |\partial i|) \left( \sum\_{\tau\_{i} \in \Omega} R\_{i}(\tau\_{i}) \ln\left( R\_{i}(\tau\_{i}) \right) \right) \\ & \quad + k\_{\text{B}} T \sum\_{\{i, j\} \in E} \left( \sum\_{\tau\_{i} \in \Omega} \sum\_{\tau\_{j} \in \Omega} R\_{ij}(\tau\_{i}, \tau\_{j}) \ln\left( R\_{ij}(\tau\_{i}, \tau\_{j}) \right) \right). \end{split} \tag{10.82}$$

Let us consider the following constrained minimization of the Bethe free energy functional:

$$\begin{split} & \left( \left\{ \widehat{R}\_{i} \,\middle|\, i \in V \right\}, \left\{ \widehat{R}\_{ij} \,\middle|\, \{i, j\} \in E \right\} \right) \\ & = \arg\min\_{\{R\_{i} \mid i \in V\}, \{R\_{ij} \mid \{i, j\} \in E\}} \Biggl\{ \mathcal{F}\_{\text{Bethe}}\left[ \{R\_{i} \mid i \in V\}, \{R\_{ij} \mid \{i, j\} \in E\} \right] \,\Biggm|\, \\ & \qquad \sum\_{\tau\_{i} \in \Omega} R\_{i}(\tau\_{i}) = 1 \ (i \in V), \ \sum\_{\tau\_{i} \in \Omega} \sum\_{\tau\_{j} \in \Omega} R\_{ij}(\tau\_{i}, \tau\_{j}) = 1 \ (\{i, j\} \in E), \\ & \qquad R\_{i}(-1) = \sum\_{\tau\_{j} \in \Omega} R\_{ij}(-1, \tau\_{j}) \ (j \in \partial i, \ i \in V), \\ & \qquad R\_{i}(+1) = \sum\_{\tau\_{j} \in \Omega} R\_{ij}(+1, \tau\_{j}) \ (j \in \partial i, \ i \in V) \Biggr\}. \end{split} \tag{10.83}$$

We introduce the Lagrange multipliers λ<sub>*i*</sub> (*i* ∈ *V*), λ<sub>{*i*,*j*}</sub>, λ<sub>*i*,*ij*</sub>(−1) = λ<sub>*i*,*ji*</sub>(−1), and λ<sub>*i*,*ij*</sub>(+1) = λ<sub>*i*,*ji*</sub>(+1) ({*i*, *j*} ∈ *E*) to ensure the normalization and reducibility conditions as follows:

$$\begin{split} \mathcal{L}\_{\text{Bethe}}\left[ \{R\_{i} \mid i \in V\}, \{R\_{ij} \mid \{i, j\} \in E\} \right] & \equiv \mathcal{F}\_{\text{Bethe}}\left[ \{R\_{i} \mid i \in V\}, \{R\_{ij} \mid \{i, j\} \in E\} \right] \\ & - \sum\_{i \in V} \lambda\_{i} \left( \sum\_{\tau\_{i} \in \Omega} R\_{i}(\tau\_{i}) - 1 \right) \\ & - \sum\_{\{i, j\} \in E} \lambda\_{\{i, j\}} \left( \sum\_{\tau\_{i} \in \Omega} \sum\_{\tau\_{j} \in \Omega} R\_{ij}(\tau\_{i}, \tau\_{j}) - 1 \right) \\ & - \sum\_{i \in V} \sum\_{j \in \partial i} \lambda\_{i,ij}(-1) \left( R\_{i}(-1) - \sum\_{\tau\_{j} \in \Omega} R\_{ij}(-1, \tau\_{j}) \right) \\ & - \sum\_{i \in V} \sum\_{j \in \partial i} \lambda\_{i,ij}(+1) \left( R\_{i}(+1) - \sum\_{\tau\_{j} \in \Omega} R\_{ij}(+1, \tau\_{j}) \right). \end{split} \tag{10.84}$$

The marginal probabilities $\widehat{R}\_i$ (*i* ∈ *V*) and $\widehat{R}\_{ij}$ ({*i*, *j*} ∈ *E*) are determined so as to satisfy the following extremum conditions:

$$\left[\frac{\partial}{\partial \mathcal{R}\_i} \mathcal{L}\_{\text{Bethe}} \left[ \{ \mathcal{R}\_i | i \in V \}, \{ \mathcal{R}\_{ij} | \{ i, j \} \in E \} \right] \right]\_{\{ \mathcal{R}\_i = \widehat{\mathcal{R}}\_i | i \in V \}, \{ \mathcal{R}\_{ij} = \widehat{\mathcal{R}}\_{ij} | \{ i, j \} \in E \}} = 0 \ (i \in V), \tag{10.85}$$


$$\left[\frac{\partial}{\partial \mathcal{R}\_{ij}} \mathcal{L}\_{\text{Bethe}} \Big[ \{ \mathcal{R}\_i | i \in V \}, \{ \mathcal{R}\_{ij} | \{i, j\} \in E \} \Big] \right]\_{\{ \mathcal{R}\_i = \widehat{\mathcal{R}}\_i | i \in V \}, \{ \mathcal{R}\_{ij} = \widehat{\mathcal{R}}\_{ij} | \{i, j\} \in E \}} = 0 \ (\{i, j\} \in E). \tag{10.86}$$


It can be shown that $\widehat{R}\_i(s\_i)$ (*i* ∈ *V*) and $\widehat{R}\_{ij}(s\_i, s\_j) = \widehat{R}\_{ji}(s\_j, s\_i)$ ({*i*, *j*} ∈ *E*) are derived as follows:

$$\begin{split} \widehat{R}\_{i}(\mathbf{s}\_{i}) &= \exp\left(-1 - \frac{\lambda\_{i}}{k\_{\mathrm{B}}T\left(|\partial i|-1\right)}\right) \\ &\times \exp\left(\frac{1}{k\_{\mathrm{B}}T} \left(-\frac{1}{|\partial i|-1}hd\_{i}\mathbf{s}\_{i} - \frac{1}{|\partial i|-1}\sum\_{j\in\partial i}\lambda\_{i,ij}(\mathbf{s}\_{i})\right)\right) (i\in V), \quad (10.87) \end{split}$$

$$\begin{split} \widehat{R}\_{ij}(s\_i, s\_j) = \widehat{R}\_{ji}(s\_j, s\_i) & = \exp\left(-1 + \frac{\lambda\_{\{i,j\}}}{k\_{\mathrm{B}}T}\right) \\ & \times \exp\left(\frac{1}{k\_{\mathrm{B}}T} \left( J s\_i s\_j - \lambda\_{i,ij}(s\_i) - \lambda\_{j,ij}(s\_j) \right)\right) \ (\{i, j\} \in E). \end{split} \tag{10.88}$$

Finally, λ<sub>*i*</sub> and λ<sub>{*i*,*j*}</sub> need to be determined so as to satisfy the normalization conditions of the marginal probabilities $\widehat{R}\_i(s\_i)$ and $\widehat{R}\_{ij}(s\_i, s\_j)$.

By introducing the messages μ*<sup>k</sup>*→*<sup>i</sup>*(*si*) and μ*<sup>l</sup>*→*<sup>j</sup>*(*sj*) in the transformations

$$\exp\left(-\frac{\lambda\_{i,ij}(s\_i)}{k\_{\mathrm{B}}T}\right) = \left(\prod\_{k \in \partial i \backslash \{j\}} \mu\_{k \to i}(s\_i)\right) \exp\left(\frac{h}{k\_{\mathrm{B}}T} d\_i s\_i\right), \tag{10.89}$$

$$\exp\left(-\frac{\lambda\_{j,ij}(s\_j)}{k\_{\mathrm{B}}T}\right) = \left(\prod\_{l \in \partial j \backslash \{i\}} \mu\_{l \to j}(s\_j)\right) \exp\left(\frac{h}{k\_{\mathrm{B}}T} d\_j s\_j\right). \tag{10.90}$$

The expressions for the marginal probabilities $\widehat{R}\_i$ and $\widehat{R}\_{ij}$ in Eqs. (10.87) and (10.88) can be reduced to the following expressions:

$$\widehat{R}\_{i}(\mathbf{s}\_{i}) = \frac{1}{Z\_{i}} \left( \prod\_{k \in \partial i} \mu\_{k \to i}(\mathbf{s}\_{i}) \right) \exp\left(\frac{h}{k\_{\mathrm{B}}T} d\_{i}\mathbf{s}\_{i}\right) (i \in V), \tag{10.91}$$

$$\begin{split} \widehat{R}\_{ij}(s\_{i}, s\_{j}) = \widehat{R}\_{ji}(s\_{j}, s\_{i}) & = \frac{1}{Z\_{\{i,j\}}} \left( \prod\_{k \in \partial i \backslash \{j\}} \mu\_{k \to i}(s\_{i}) \right) \left( \prod\_{l \in \partial j \backslash \{i\}} \mu\_{l \to j}(s\_{j}) \right) \\ & \times \exp\left( \frac{1}{k\_{\mathrm{B}}T} (J s\_{i} s\_{j} + h d\_{i} s\_{i} + h d\_{j} s\_{j}) \right) \ (\{i, j\} \in E), \end{split} \tag{10.92}$$

$$Z\_i \equiv \sum\_{\tau\_i \in \Omega} \left( \prod\_{k \in \partial i} \mu\_{k \to i}(\tau\_i) \right) \exp\left(\frac{h}{k\_{\mathrm{B}}T} d\_i \tau\_i\right) \ (i \in V), \tag{10.93}$$

$$\begin{split} Z\_{\{i,j\}} & \equiv \sum\_{\tau\_i \in \Omega} \sum\_{\tau\_j \in \Omega} \left( \prod\_{k \in \partial i \backslash \{j\}} \mu\_{k \to i}(\tau\_i) \right) \left( \prod\_{l \in \partial j \backslash \{i\}} \mu\_{l \to j}(\tau\_j) \right) \\ & \times \exp\left(\frac{1}{k\_{\mathrm{B}}T} \left( J \tau\_i \tau\_j + h d\_i \tau\_i + h d\_j \tau\_j \right) \right) \ (\{i, j\} \in E). \end{split} \tag{10.94}$$

By substituting Eqs. (10.91) and (10.92) into the reducibility conditions in Eq. (10.80), the simultaneous deterministic equations for the messages can be derived as follows:

$$\mu\_{j \to i}(s\_i) = \frac{Z\_i}{Z\_{\{i,j\}}} \sum\_{\tau\_j \in \Omega} \left( \prod\_{l \in \partial j \backslash \{i\}} \mu\_{l \to j}(\tau\_j) \right) \exp\left(\frac{1}{k\_{\mathrm{B}}T} (J s\_i \tau\_j + h d\_j \tau\_j)\right), \tag{10.95}$$

$$\mu\_{i \to j}(s\_j) = \frac{Z\_j}{Z\_{\{i,j\}}} \sum\_{\tau\_i \in \Omega} \left( \prod\_{k \in \partial i \backslash \{j\}} \mu\_{k \to i}(\tau\_i) \right) \exp\left( \frac{1}{k\_{\mathrm{B}}T} (J \tau\_i s\_j + h d\_i \tau\_i) \right) \ (\{i, j\} \in E). \tag{10.96}$$

The Bethe free energy functional is given by

$$\mathcal{F}\_{\text{Bethe}}\left[\{\widehat{\mathcal{R}}\_{i}|i\in V\}, \{\widehat{\mathcal{R}}\_{ij}|\{i,j\}\in E\}\right] = -k\_{\text{B}}T\sum\_{i\in V} (1-|\partial i|) \ln Z\_{i} - k\_{\text{B}}T \sum\_{\{i,j\}\in E} \ln Z\_{\{i,j\}}.\tag{10.97}$$

The framework of the Bethe approximation using Eqs. (10.91), (10.92), (10.93), and (10.94) with Eqs. (10.95) and (10.96) is referred to as **loopy belief propagation** in statistical machine learning theory [12, 46–48]. The present derivation is based on the cluster variation method in Refs. [39, 42–44], and [45]. Recently, some novel approaches to loopy belief propagation have been proposed, including the **approximate message passing algorithm** [49] and the **replica cluster variation method** [50, 51]. A review summarizing recent developments in loopy belief propagation methods is given in Ref. [52].
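As a concrete illustration of Eqs. (10.91)–(10.96), the following Python sketch implements the message-passing updates with normalized messages (normalization plays the role of the $Z\_i/Z\_{\{i,j\}}$ prefactors); all function names and the 3-node example are our own illustrative choices, not from the text. On a tree, the resulting marginals coincide with exact enumeration:

```python
import itertools
import math

def loopy_bp(nodes, edges, d, J, h, kBT, iters=100):
    """Loopy belief propagation for the Ising model: message updates as in
    Eqs. (10.95)-(10.96) with per-step normalization, and one-body
    marginals as in Eq. (10.91)."""
    adj = {i: [] for i in nodes}
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    # mu[(j, i)][s]: message from node j to node i at receiver spin s
    mu = {(a, b): {+1: 0.5, -1: 0.5}
          for i, j in edges for a, b in ((i, j), (j, i))}
    for _ in range(iters):
        new = {}
        for j, i in mu:
            msg = {}
            for s in (+1, -1):
                msg[s] = sum(
                    math.exp((J * s * t + h * d[j] * t) / kBT)
                    * math.prod(mu[(l, j)][t] for l in adj[j] if l != i)
                    for t in (+1, -1))
            z = msg[+1] + msg[-1]
            new[(j, i)] = {s: msg[s] / z for s in (+1, -1)}
        mu = new
    marg = {}
    for i in nodes:
        b = {s: math.exp(h * d[i] * s / kBT)
                * math.prod(mu[(k, i)][s] for k in adj[i]) for s in (+1, -1)}
        z = b[+1] + b[-1]
        marg[i] = {s: b[s] / z for s in (+1, -1)}
    return marg

def exact_marginals(nodes, edges, d, J, h, kBT):
    """Brute-force marginals by enumerating all 2^|V| configurations."""
    Z = 0.0
    p1 = {i: {+1: 0.0, -1: 0.0} for i in nodes}
    for conf in itertools.product((+1, -1), repeat=len(nodes)):
        s = dict(zip(nodes, conf))
        energy = (-J * sum(s[i] * s[j] for i, j in edges)
                  - h * sum(d[i] * s[i] for i in nodes))
        w = math.exp(-energy / kBT)
        Z += w
        for i in nodes:
            p1[i][s[i]] += w
    return {i: {s: p1[i][s] / Z for s in (+1, -1)} for i in nodes}

# On a tree (here a 3-node chain) belief propagation is exact.
nodes, edges = [0, 1, 2], [(0, 1), (1, 2)]
d = {i: 1.0 for i in nodes}
bp = loopy_bp(nodes, edges, d, J=1.0, h=0.3, kBT=1.0)
ex = exact_marginals(nodes, edges, d, J=1.0, h=0.3, kBT=1.0)
```

On graphs with cycles the same updates give the Bethe approximation to the marginals rather than the exact values.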

By solving

$$\sum\_{\tau\_i \in \Omega} \sum\_{\tau\_j \in \Omega} \widehat{R}\_{ij}(\tau\_i, \tau\_j) = \widehat{R}\_{ij}(+1, +1) + \widehat{R}\_{ij}(-1, +1) + \widehat{R}\_{ij}(+1, -1) + \widehat{R}\_{ij}(-1, -1) = 1, \tag{10.98}$$

$$\sum\_{\tau\_i \in \Omega} \sum\_{\tau\_j \in \Omega} \tau\_i \widehat{R}\_{ij}(\tau\_i, \tau\_j) = \widehat{R}\_{ij}(+1, +1) - \widehat{R}\_{ij}(-1, +1) + \widehat{R}\_{ij}(+1, -1) - \widehat{R}\_{ij}(-1, -1) = m\_i, \tag{10.99}$$

$$\sum\_{\tau\_i \in \Omega} \sum\_{\tau\_j \in \Omega} \tau\_j \widehat{R}\_{ij}(\tau\_i, \tau\_j) = \widehat{R}\_{ij}(+1, +1) + \widehat{R}\_{ij}(-1, +1) - \widehat{R}\_{ij}(+1, -1) - \widehat{R}\_{ij}(-1, -1) = m\_j, \tag{10.100}$$

$$\sum\_{\tau\_i \in \Omega} \sum\_{\tau\_j \in \Omega} \tau\_i \tau\_j \widehat{R}\_{ij}(\tau\_i, \tau\_j) = \widehat{R}\_{ij}(+1, +1) - \widehat{R}\_{ij}(-1, +1) - \widehat{R}\_{ij}(+1, -1) + \widehat{R}\_{ij}(-1, -1) = c\_{\{i, j\}} = c\_{\{j, i\}}, \tag{10.101}$$

as simultaneous linear equations for $\widehat{R}\_{ij}(+1, +1)$, $\widehat{R}\_{ij}(-1, +1)$, $\widehat{R}\_{ij}(+1, -1)$, and $\widehat{R}\_{ij}(-1, -1)$, we can confirm the following equality:

$$
\widehat{R}\_{ij}(\mathbf{s}\_i, \mathbf{s}\_j) = \frac{1}{4} (1 + m\_i \mathbf{s}\_i + m\_j \mathbf{s}\_j + c\_{\{i,j\}} \mathbf{s}\_i \mathbf{s}\_j). \tag{10.102}
$$
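A quick numeric sanity check (with arbitrary illustrative values of *m*<sub>*i*</sub>, *m*<sub>*j*</sub>, and *c*<sub>{*i*,*j*}</sub>, which are our own choices for the example) confirms that the parameterization in Eq. (10.102) reproduces the constraints (10.98)–(10.101):

```python
m_i, m_j, c = 0.3, -0.2, 0.15  # arbitrary illustrative values with |m|, |c| < 1

def R_pair(si, sj):
    """Two-body marginal of Eq. (10.102)."""
    return 0.25 * (1.0 + m_i * si + m_j * sj + c * si * sj)

spins = (+1, -1)
norm = sum(R_pair(si, sj) for si in spins for sj in spins)            # Eq. (10.98)
mean_i = sum(si * R_pair(si, sj) for si in spins for sj in spins)     # Eq. (10.99)
mean_j = sum(sj * R_pair(si, sj) for si in spins for sj in spins)     # Eq. (10.100)
corr = sum(si * sj * R_pair(si, sj) for si in spins for sj in spins)  # Eq. (10.101)
```

The cross terms cancel in each sum because the spins average to zero over {+1, −1}, leaving exactly 1, *m*<sub>*i*</sub>, *m*<sub>*j*</sub>, and *c*.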

By substituting Eqs. (10.72) and (10.102) into Eq. (10.82), the Bethe free energy functional can be reduced to

$$\mathcal{F}\_{\text{Bethe}}\left[\left\{\widehat{\mathcal{R}}\_{i}\middle|i\in V\right\}, \left\{\widehat{\mathcal{R}}\_{ij}\middle|\left\langle i,j\right\rangle\in E\right\}\right] = F\_{\text{Bethe}}\left(\left\{m\_{i}\middle|i\in V\right\}, \left\{c\_{\left\langle i,j\right\rangle}\middle|\left\langle i,j\right\rangle\in E\right\}\right), (10.103)\right)$$


$$\begin{split} & F\_{\text{Bethe}}\left( \{m\_{i} \mid i \in V\}, \{c\_{\{i,j\}} \mid \{i, j\} \in E\} \right) \equiv -J \sum\_{\{i,j\} \in E} c\_{\{i,j\}} - h \sum\_{i \in V} d\_{i} m\_{i} \\ & \quad + k\_{\text{B}}T \sum\_{i \in V} (1 - |\partial i|) \sum\_{s\_{i} = \pm 1} \widehat{R}\_{i}(s\_{i}) \ln\left(\widehat{R}\_{i}(s\_{i})\right) + k\_{\text{B}}T \sum\_{\{i,j\} \in E} \sum\_{s\_{i} = \pm 1} \sum\_{s\_{j} = \pm 1} \widehat{R}\_{ij}(s\_{i}, s\_{j}) \ln\left(\widehat{R}\_{ij}(s\_{i}, s\_{j})\right). \end{split} \tag{10.104}$$

The extremum conditions

$$\frac{\partial}{\partial m\_k} F\_{\text{Bethe}}\left(\{m\_i \, | \, i \in V\}, \{c\_{\{i,j\}} \, | \, \{i,j\} \in E\}\right) = 0 \, (k \in V), \tag{10.105}$$

$$\frac{\partial}{\partial c\_{\{k,l\}}} F\_{\text{Bethe}}\left(\left\{m\_i \,|\, i \in V\right\}, \left\{c\_{\{i,j\}} \,|\, \{i,j\} \in E\right\}\right) = 0 \ (\{k,l\} \in E) \qquad (10.106)$$

can be reduced to the following simultaneous equations:

$$\frac{h}{k\_{\rm B}T} d\_{i} = \frac{1}{2}(1 - |\partial i|) \sum\_{\tau\_{i} \in \Omega} \tau\_{i} \ln\left(\widehat{R}\_{i}(\tau\_{i})\right) + \frac{1}{4} \sum\_{j \in \partial i} \sum\_{\tau\_{i} \in \Omega} \sum\_{\tau\_{j} \in \Omega} \tau\_{i} \ln\left(\widehat{R}\_{ij}(\tau\_{i}, \tau\_{j})\right) \ (i \in V), \tag{10.107}$$

$$\frac{J}{k\_{\rm B}T} = \frac{1}{4} \sum\_{\tau\_i \in \Omega} \sum\_{\tau\_j \in \Omega} \tau\_i \tau\_j \ln\left( \widehat{R}\_{ij}(\tau\_i, \tau\_j) \right) \ (\{i, j\} \in E). \tag{10.108}$$

The schemes for the derivations of Eqs. (10.107) and (10.108) from the Bethe free energy (Eqs. (10.103)–(10.104)) are given in Refs. [41, 53–55].

In the advanced mean-field method, some researchers are interested in perturbative computation of the correction terms with respect to *J*/(*k*<sub>B</sub>*T*) to the mean-field free energy [56, 57], which is referred to as the **Thouless-Anderson-Palmer (TAP) free energy**. The scheme used in the derivations has been extended to a classical Heisenberg model [58]. One familiar perturbative method in statistical mechanical informatics is the Plefka expansion, in which higher-order correction terms with respect to *J*/(*k*<sub>B</sub>*T*) are obtained from the mean-field free energy [12]. By substituting Eq. (10.102) into Eq. (10.108), *c*<sub>{*i*,*j*}</sub> can be expressed in terms of *m*<sub>*i*</sub>, *m*<sub>*j*</sub>, and *J*/(*k*<sub>B</sub>*T*). It is known that the TAP equation can be derived by expanding the expression for *c*<sub>{*i*,*j*}</sub> up to the second-order term (*J*/(*k*<sub>B</sub>*T*))<sup>2</sup> with respect to an infinitesimal *J*/(*k*<sub>B</sub>*T*) and by substituting it into Eq. (10.82) with Eqs. (10.72) and (10.102) [41]. The fundamental framework of the TAP free energy and its expansion using the advanced mean-field method has been clarified [59]. The Bethe free energy functional and the TAP free energy, as well as loopy belief propagation, have been applied to Boltzmann machine learning [53, 60–64]. Some recent developments appear in Chap. 7 of Part 3 in this book.

The EM schemes with advanced mean-field methods in the previous sections have been applied to noise reduction in probabilistic image processing [30, 51, 65–69]. The basic frameworks are based on Eqs. (10.14) and (10.15) with the two-body and

**Fig. 10.1** Fundamental framework of Bayesian noise reduction by generalized sparse prior and additive white Gaussian noise

one-body posterior marginal probability distributions in Eqs. (10.11) and (10.12) as well as the two-body prior marginal probability distribution in Eq. (10.13). They can be computed by means of the message passing algorithms in Eqs. (10.91) and (10.92) with Eqs. (10.93), (10.94), (10.95), and (10.96) for the Ising model in Eqs. (10.47), (10.48), and (10.50) with the prior and posterior probability distributions in Eqs. (10.2) and (10.3), respectively. The framework and some numerical experimental results are shown in Figs. 10.1 and 10.2, respectively. Moreover, loopy belief propagation is applicable to Bayesian image segmentation in the framework of Sect. 10.2.3 [70]. It is also useful for community detection by means of the stochastic block model for modular networks [71–73].

**Fig. 10.2** Numerical experiments in Bayesian noise reduction by the generalized sparse prior and additive white Gaussian noise

## *10.3.3 Free Energy Landscapes and Phase Transitions in the Thermodynamic Limit*

In this section, we consider the Ising model defined by

$$P\left(s\,\Big|\,\frac{J}{k\_{\rm B}T},\frac{h}{k\_{\rm B}T}\right) \equiv \frac{1}{Z\left(\frac{J}{k\_{\rm B}T},\frac{h}{k\_{\rm B}T}\right)}\exp\left(-\frac{1}{k\_{\rm B}T}H(\mathbf{s})\right),\tag{10.109}$$

where

$$\begin{aligned} H(\mathbf{s}) &= H(\mathbf{s}\_1, \mathbf{s}\_2, \dots, \mathbf{s}\_{|V|})\\ &\equiv -J \sum\_{\{i,j\} \in E} s\_i \mathbf{s}\_j - h \sum\_{i \in V} \mathbf{s}\_i \ (J > 0), \end{aligned} \tag{10.110}$$

$$Z\left(\frac{J}{k\_{\rm B}T}, \frac{h}{k\_{\rm B}T}\right) \equiv \sum\_{s\_1 \in \Omega} \sum\_{s\_2 \in \Omega} \cdots \sum\_{s\_{|V|} \in \Omega} \exp\left(-\frac{1}{k\_{\rm B}T}H(\mathbf{s})\right). \tag{10.111}$$

The energy function in Eq. (10.110) corresponds to the one in Eq. (10.49) for the case of *d*<sub>*i*</sub> = 1 for every node *i* (∈ *V*). In the present section, we consider a regular graph of degree 4, which includes the square grid graph with periodic boundary conditions along the *x*- and *y*-directions as shown in Fig. 10.3.

For the Ising model in Eq. (10.110) and its partition function in Eq. (10.50), we have the free energy per node

$$f(J, h, T) = -k\_{\mathbb{B}} T \times \lim\_{|V| \to +\infty} \frac{1}{|V|} \ln(Z),\tag{10.112}$$

the internal energy for zero external field

$$\begin{split} Ju & \equiv \frac{\partial}{\partial \left(\frac{1}{k\_{\text{B}}T}\right)} \left(\frac{1}{k\_{\text{B}}T} f(J, h = 0, T)\right) \\ & = \lim\_{|V| \to +\infty} \frac{1}{|E|} \sum\_{\{i, j\} \in E} \sum\_{s} (-s\_{i} s\_{j}) P\left(s \,\Big|\, \frac{J}{k\_{\text{B}}T}, \frac{h}{k\_{\text{B}}T} = 0\right), \end{split} \tag{10.113}$$

and the spontaneous magnetization

$$\begin{split} m\_{\pm} & \equiv \lim\_{h \to \pm 0} \frac{\partial}{\partial \left( \frac{h}{k\_{\text{B}}T} \right)} \left( \frac{1}{k\_{\text{B}}T} f(J, h, T) \right) \\ &= \lim\_{h \to \pm 0} \lim\_{|V| \to +\infty} \frac{1}{|V|} \sum\_{i \in V} \sum\_{s} s\_{i} P\left( s \Big| \frac{J}{k\_{\text{B}}T}, \frac{h}{k\_{\text{B}}T} \right), \end{split} \tag{10.114}$$

as important statistical quantities in the thermodynamic limit |*V*| → +∞. The existence of the thermodynamic limit |*V*| → +∞ means that the limit on the right-hand side of Eq. (10.112) converges. Sufficient conditions for the existence of the thermodynamic limit of the Ising model of Eqs. (10.109), (10.110), and (10.111) have been given by Ruelle in Ref. [38].

In the thermodynamic limit |*V*|→ + ∞ for the Ising model in Eq. (10.110) on a square grid graph with periodic boundary conditions along the *x*- and *y*-direction as shown in Fig. 10.3,

$$\frac{u}{J} = -\coth\left(\frac{2J}{k\_B T}\right) \left(1 + \left(2\tanh^2\left(\frac{2J}{k\_B T}\right) - 1\right) \left(\frac{2}{\pi}\right) \int\_0^{\frac{\pi}{2}} \left(1 - \left(\frac{2\sinh\left(\frac{2J}{k\_B T}\right)}{\cosh^2\left(\frac{2J}{k\_B T}\right)}\right)^2 \sin^2(\theta)\right)^{-\frac{1}{2}} d\theta\right). \tag{10.115}$$

$$\begin{split} m\_{\pm}{}^{2} & = \lim\_{|r\_{i} - r\_{j}| \to +\infty} \lim\_{|V| \to +\infty} \sum\_{s} s\_{i} s\_{j} \, P\left(s \,\middle|\, \frac{J}{k\_{\rm B}T}, \frac{h}{k\_{\rm B}T} = 0\right) \\ & = \begin{cases} 0 & \left(\frac{J}{k\_{\rm B}T} < \frac{1}{2} \text{arcsinh}(1)\right) \\ \left(1 - \sinh^{-4}\left(\frac{2J}{k\_{\rm B}T}\right)\right)^{\frac{1}{4}} & \left(\frac{J}{k\_{\rm B}T} > \frac{1}{2} \text{arcsinh}(1)\right) \end{cases}, \end{split} \tag{10.116}$$

where *r<sup>i</sup>* is the position vector of each node *i*(∈*V*) [34, 74, 75]. In Eq. (10.116), the spontaneous magnetizations *m*<sup>+</sup> and *m*<sup>−</sup> correspond to each branch of *m*+≥0 and *m*−≤0, respectively. They are as shown in Fig. 10.4. Note that for the Ising model in Eq. (10.110) on such regular graphs,

$$m\_{i,V}\left(\frac{J}{k\_{\rm B}T}, \frac{h}{k\_{\rm B}T}\right) \equiv \sum\_{s} s\_{i}P\left(s \Big| \frac{J}{k\_{\rm B}T}, \frac{h}{k\_{\rm B}T}\right),\tag{10.117}$$

for every *i* (∈ *V*), does not depend on *i* and can be expressed as *m*<sub>*V*</sub>(*J*/(*k*<sub>B</sub>*T*), *h*/(*k*<sub>B</sub>*T*)).
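The closed-form result in Eq. (10.116) is straightforward to evaluate numerically. A minimal Python sketch (the function name is our own illustrative choice), taking the positive root *m*<sub>+</sub> of *m*<sub>±</sub><sup>2</sup>:

```python
import math

K_c = 0.5 * math.asinh(1.0)  # critical coupling J/(k_B T) = arcsinh(1)/2 ~ 0.4407

def onsager_m_plus(K):
    """Spontaneous magnetization m_+ from Eq. (10.116) with K = J/(k_B T):
    the positive square root of m_+^2, zero at and below K_c."""
    if K <= K_c:
        return 0.0
    return (1.0 - math.sinh(2.0 * K) ** (-4.0)) ** 0.125

m_sub = onsager_m_plus(0.40)  # K < K_c: disordered phase, m_+ = 0
m_sup = onsager_m_plus(0.50)  # K > K_c: ordered phase, m_+ > 0
```

Near *K*<sub>c</sub> the magnetization rises continuously from zero, as in Fig. 10.4.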

In the mean-field approximation of the previous subsection, the spontaneous magnetizations

$$m\_{\pm} = \lim\_{h \to \pm 0} \lim\_{|V| \to +\infty} m\_{V}\left( \frac{J}{k\_{\text{B}}T}, \frac{h}{k\_{\text{B}}T} \right), \tag{10.118}$$

are given as solutions of the following mean-field equation:

$$m\_{\pm} = \tanh\left(\frac{1}{k\_{\rm B}T}(h + 4Jm\_{\pm})\right)(h \to \pm 0),\tag{10.119}$$

and the internal energy *Ju* in Eq. (10.113) in the mean-field approximation is given as

$$u = -m\_{\pm}{}^{2}. \tag{10.120}$$

The solutions of Eq. (10.119) correspond to the extremum values of the following mean-field free energy:

**Fig. 10.4** Internal energy *Ju* in Eq. (10.113) and magnetization *m*<sub>±</sub> in Eq. (10.114) in the Onsager solution of the Ising model of Eqs. (10.48) and (10.50) with Eq. (10.110) on the square grid graph (*V*, *E*) with periodic boundary conditions along the *x*- and *y*-directions

$$\begin{aligned} f\_{\text{MF}}(m) & \equiv \frac{1}{|V|} F\_{\text{MF}}(m\_1, m\_2, \dots, m\_{|V|}) \\ &= -2Jm^2 + k\_{\text{B}}T \left( \frac{1}{2} (1+m) \right) \ln \left( \frac{1}{2} (1+m) \right) \\ &+ k\_{\text{B}}T \left( \frac{1}{2} (1-m) \right) \ln \left( \frac{1}{2} (1-m) \right), \end{aligned} \tag{10.121}$$

which corresponds to (1/|*V*|)*F*<sub>MF</sub>(*m*<sub>1</sub>, *m*<sub>2</sub>, ···, *m*<sub>|*V*|</sub>) in Eq. (10.75) for *h* = 0 with *m*<sub>1</sub> = *m*<sub>2</sub> = ··· = *m*<sub>|*V*|</sub> = *m*. The spontaneous magnetization *m*<sub>±</sub> and the internal energy *u* for *h* = 0 are computed numerically by setting 0 < *h*/(*k*<sub>B</sub>*T*) < 10<sup>−5</sup> and using the iteration method for Eq. (10.119). The graphs of (*J*/(*k*<sub>B</sub>*T*), *u*) and (*J*/(*k*<sub>B</sub>*T*), *m*<sub>±</sub>) are shown in Fig. 10.5. Moreover, the graphs of (*m*, (1/*k*<sub>B</sub>*T*) *f*<sub>MF</sub>(*m*)) for *J*/(*k*<sub>B</sub>*T*) = 0.20, 0.25, and 0.40 are shown in Fig. 10.6. The mean-field equation always has the trivial solution *m*<sub>±</sub> = 0; by expanding the right-hand side of Eq. (10.119) around *m* = 0 and keeping the first-order term of *m*, one finds that Eq. (10.119) begins to have non-trivial solutions with *m*<sub>+</sub> > 0 and *m*<sub>−</sub> < 0 in the region *J*/(*k*<sub>B</sub>*T*) > 1/4.
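The numerical procedure described above can be sketched as follows (the function name and the small pinning field 10<sup>−6</sup> are our illustrative choices); below *K* = *J*/(*k*<sub>B</sub>*T*) = 1/4 the iteration collapses to *m* = 0, while above it a non-trivial *m*<sub>+</sub> appears:

```python
import math

def mf_magnetization(K, h_over_kT=1e-6, iters=5000):
    """Fixed-point iteration of the mean-field equation (10.119),
    m = tanh(h/(k_B T) + 4*K*m), with K = J/(k_B T); the tiny positive
    field selects the m_+ branch when it exists."""
    m = 0.0
    for _ in range(iters):
        m = math.tanh(h_over_kT + 4.0 * K * m)
    return m

m_below = mf_magnetization(0.20)  # K < 1/4: only the trivial solution survives
m_above = mf_magnetization(0.40)  # K > 1/4: non-trivial m_+ > 0
```

Replacing the pinning field by a small negative one selects the *m*<sub>−</sub> branch instead.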

**Fig. 10.5** Internal energy *u* from Eq. (10.113) and magnetization *m*<sub>±</sub> from Eq. (10.118) in mean-field approximation for the Ising model in Eqs. (10.48), (10.50), and (10.110) on the regular graph (*V*, *E*) of degree 4

**Fig. 10.6** Free energy from Eq. (10.121) in mean-field approximation for the Ising model in Eqs. (10.48), (10.50), and (10.110) on the regular graph (*V*, *E*) of degree 4. **a** *J*/(*k*<sub>B</sub>*T*) = 0.2, *h*/(*k*<sub>B</sub>*T*) = 0. **b** *J*/(*k*<sub>B</sub>*T*) = 0.25, *h*/(*k*<sub>B</sub>*T*) = 0. **c** *J*/(*k*<sub>B</sub>*T*) = 0.4, *h*/(*k*<sub>B</sub>*T*) = 0

Next, we consider the Bethe approximation for the Ising model in Eqs. (10.48), (10.50), and (10.110) on the regular graph (*V*, *E*) of degree 4. In this case, the average *m*<sub>*i*</sub>, the correlation *c*<sub>{*i*,*j*}</sub>, and the messages μ<sub>*i*→*j*</sub>(+1) and μ<sub>*i*→*j*</sub>(−1) do not depend on *i* and *j*, and can be expressed as *m*, *c*, μ(+1), and μ(−1). We now introduce

$$
\Lambda \equiv \frac{1}{2} k\_{\rm B} T \ln \left( \frac{\mu(+1)}{\mu(-1)} \right). \tag{10.122}
$$

The message passing equations in Eqs. (10.95) and (10.96) and the magnetization are reduced to

$$\frac{\Lambda}{k\_{\rm B}T} = \text{arctanh}\left(\tanh\left(\frac{J}{k\_{\rm B}T}\right)\tanh\left(\frac{h+3\Lambda}{k\_{\rm B}T}\right)\right). \tag{10.123}$$
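Eq. (10.123) can be solved by the same simple iteration, written here in the reduced variable λ = Λ/(*k*<sub>B</sub>*T*) (the function name and starting value are our illustrative choices); non-trivial fixed points appear for *J*/(*k*<sub>B</sub>*T*) > arctanh(1/3) ≈ 0.3466:

```python
import math

K_c_bethe = math.atanh(1.0 / 3.0)  # Bethe critical coupling ~ 0.3466

def bethe_lam(K, h_over_kT=0.0, lam0=0.1, iters=5000):
    """Fixed-point iteration of Eq. (10.123),
    lam = arctanh(tanh(K) * tanh(h/(k_B T) + 3*lam)),
    with K = J/(k_B T); lam0 > 0 selects the positive branch if it exists."""
    lam = lam0
    for _ in range(iters):
        lam = math.atanh(math.tanh(K) * math.tanh(h_over_kT + 3.0 * lam))
    return lam

lam_below = bethe_lam(0.30)           # K < K_c_bethe: decays to lam = 0
lam_above = bethe_lam(0.40)           # K > K_c_bethe: positive fixed point
m_above = math.tanh(4.0 * lam_above)  # magnetization via Eq. (10.124), h = 0
```

Note that the Bethe critical coupling arctanh(1/3) lies between the mean-field value 1/4 and the exact square-lattice value arcsinh(1)/2.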

Moreover, since the marginal probabilities $\widehat{R}\_i(+1)$ and $\widehat{R}\_i(-1)$ are also independent of *i*, we can derive the expression for the magnetization in terms of Λ as follows:

$$m\_{\text{Bethe}}\left( \frac{J}{k\_{\text{B}}T}, \frac{h}{k\_{\text{B}}T} \right) \equiv \frac{1}{|V|} \sum\_{i \in V} \left( \sum\_{s\_i \in \Omega} s\_i \widehat{R}(s\_i) \right) = \tanh\left( \frac{h + 4\Lambda}{k\_{\text{B}}T} \right). \tag{10.124}$$

For the infinitesimal limits of *h*, such that *h* → +0 and *h* → −0, the magnetization *m*<sub>Bethe</sub>(*J*/(*k*<sub>B</sub>*T*), *h*/(*k*<sub>B</sub>*T*)) in Eq. (10.124) can be computed numerically by using the iteration method. Moreover, Eqs. (10.104) and (10.108) can be reduced to

$$\begin{split} f\_{\text{Bethe}}(m, c) & = \frac{1}{|V|} F\_{\text{Bethe}}\left( \{m\_{i} = m \mid i \in V\}, \{c\_{\{i,j\}} = c \mid \{i, j\} \in E\} \right) \\ & = -2Jc - hm - 3 k\_{\text{B}}T \sum\_{s = \pm 1} \frac{1 + ms}{2} \ln\left( \frac{1 + ms}{2} \right) \\ & \quad + 2 k\_{\text{B}}T \sum\_{s = \pm 1} \sum\_{s' = \pm 1} \frac{1 + ms + ms' + css'}{4} \ln\left( \frac{1 + ms + ms' + css'}{4} \right), \end{split} \tag{10.125}$$

and

$$\frac{J}{k\_{\rm B}T} = \frac{1}{4} \ln \frac{(1+c)^2 - 4m^2}{(1-c)^2},\tag{10.126}$$

such that


$$c = \frac{1}{\tanh\left(\frac{2J}{k\_BT}\right)} \left(1 - \sqrt{1 - (1 - 2m^2)\tanh^2\left(\frac{2J}{k\_BT}\right) - 2m^2 \tanh\left(\frac{2J}{k\_BT}\right)}\right). \tag{10.127}$$

The graphs of

$$\left(m, \frac{1}{k\_{\rm B}T} f\_{\rm Bethe}\left(m, \frac{1}{\tanh\left(\frac{2J}{k\_{\rm B}T}\right)}\left(1 - \sqrt{1 - (1 - 2m^2)\tanh^2\left(\frac{2J}{k\_{\rm B}T}\right) - 2m^2 \tanh\left(\frac{2J}{k\_{\rm B}T}\right)}\right)\right)\right). \tag{10.128}$$

for *J*/(*k*<sub>B</sub>*T*) = 0.25, *J*/(*k*<sub>B</sub>*T*) = arctanh(1/3), and *J*/(*k*<sub>B</sub>*T*) = 0.40 in the case of *h* = 0 are shown in Figs. 10.7, 10.8, and 10.9, respectively. Figure 10.7 shows the internal energy *u* from Eq. (10.113) and the spontaneous magnetization *m*<sub>±</sub> in Eq. (10.118) in loopy belief propagation (Bethe approximation) for the Ising model in Eqs. (10.48), (10.50), and (10.110) on the regular graph (*V*, *E*) of degree 4. These quantities *u* and *m*<sub>±</sub> are obtained by

$$\mu\_{\rm Bethe} \left( \frac{J}{k\_{\rm B}T}, \frac{h}{k\_{\rm B}T} = 0 \right) = \frac{\cosh \left( \frac{6\Lambda}{k\_{\rm B}T} \right) - \exp \left( -\frac{2J}{k\_{\rm B}T} \right)}{\cosh \left( \frac{6\Lambda}{k\_{\rm B}T} \right) + \exp \left( -\frac{2J}{k\_{\rm B}T} \right)},\tag{10.129}$$

$$m\_{\rm Bethe} \left( \frac{J}{k\_{\rm B}T}, \frac{h}{k\_{\rm B}T} = 0 \right) = \tanh \left( \frac{4\,\Lambda}{k\_{\rm B}T} \right), \tag{10.130}$$

where

$$\frac{\Lambda}{k\_{\rm B}T} = \text{arctanh}\left(\tanh\left(\frac{J}{k\_{\rm B}T}\right)\tanh\left(\frac{3\Lambda}{k\_{\rm B}T}\right)\right). \tag{10.131}$$

These always give the same results as Eq. (10.110) on the regular graph (*V*, *E*) of degree 4. In particular, it is known that the results for Eqs. (10.48), (10.50), and (10.110) on the regular tree graph (*V*, *E*) of degree 4 are exact. Equation (10.123) always has the trivial solution Λ = 0 but, as one sees by expanding the right-hand side of Eq. (10.123) around Λ = 0 and keeping the first-order term in Λ, it begins to have non-trivial solutions in the region *J*/*k*<sub>B</sub>*T* > arctanh(1/3). In Fig. 10.7, the blue curves correspond to the global minimum states, which are stable, and the red lines correspond to the local maximum states, which are unstable, for each value of *J*/*k*<sub>B</sub>*T* in the Bethe free energy *f*<sub>Bethe</sub>(*m*, *c*) of Eq. (10.125) for the case of *h* = 0. The Bethe free energy landscapes *f*<sub>Bethe</sub>(*m*, *c*) of Eq. (10.125) in the case of *h* = 0 for several values of *J*/*k*<sub>B</sub>*T* are shown in Figs. 10.8 and 10.9.

**Fig. 10.7** Internal energy *u* from Eq. (10.113) and magnetization *m* from Eq. (10.118) in loopy belief propagation (Bethe approximation) for the Ising model in Eqs. (10.48), (10.50), and (10.110) on the regular graph (*V*, *E*) of degree 4
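As a concrete check on these expressions, the fixed-point equation (10.131) can be iterated numerically and the magnetization recovered from Eq. (10.130). The following sketch (plain Python; the iteration count and starting value are arbitrary choices, not from the text) reproduces the onset of a non-trivial solution above arctanh(1/3) ≈ 0.3466:

```python
import math

def bethe_magnetization(K, iters=500, lam0=0.5):
    """Solve Eq. (10.131), lam = arctanh(tanh(K) * tanh(3*lam)),
    by fixed-point iteration for the degree-4 regular graph
    (K = J / k_B T), then return m from Eq. (10.130): m = tanh(4*lam)."""
    lam = lam0
    for _ in range(iters):
        lam = math.atanh(math.tanh(K) * math.tanh(3.0 * lam))
    return math.tanh(4.0 * lam)

# Below K = arctanh(1/3) ~ 0.3466 only the trivial solution lam = 0
# survives; above it, a non-trivial solution (spontaneous
# magnetization) appears.
print(bethe_magnetization(0.25))  # ~ 0.0
print(bethe_magnetization(0.40))  # > 0
```

Below the threshold the iteration collapses to Λ = 0 and *m* = 0; above it a spontaneous magnetization appears, matching the behavior plotted in Fig. 10.7.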

Now we consider the |Ω|-state Potts model [33] given by

$$\begin{split} &P\left(s\,\middle|\,\frac{J}{k\_{\rm B}T},\frac{h\_{0}}{k\_{\rm B}T},\frac{h\_{1}}{k\_{\rm B}T},\dots,\frac{h\_{|\Omega|-1}}{k\_{\rm B}T}\right) \\ &=\frac{1}{Z}\left(\prod\_{\{i,j\}\in E}\exp\left(\frac{J}{k\_{\rm B}T}\delta\_{s\_{i}s\_{j}}\right)\right)\left(\prod\_{i\in V}\prod\_{n\in\Omega}\exp\left(\frac{h\_{n}}{k\_{\rm B}T}\delta\_{s\_{i},n}\right)\right),\end{split} \tag{10.132}$$

**Fig. 10.8** Bethe free energy *f*<sub>Bethe</sub>(*m*, *c*) for the Ising model in Eqs. (10.48), (10.50), and (10.110) on the regular graph (*V*, *E*) of degree 4. **a** *J*/*k*<sub>B</sub>*T* = 0.25, *h*/*k*<sub>B</sub>*T* = 0. **b** *J*/*k*<sub>B</sub>*T* = arctanh(1/3), *h*/*k*<sub>B</sub>*T* = 0. **c** *J*/*k*<sub>B</sub>*T* = 0.4, *h*/*k*<sub>B</sub>*T* = 0

**Fig. 10.9** Bethe free energy extremum<sub>*c*</sub> *f*<sub>Bethe</sub>(*m*, *c*) for the Ising model of Eqs. (10.48) and (10.50) with Eq. (10.110) on the regular graph (*V*, *E*) of degree 4. **a** *J*/*k*<sub>B</sub>*T* = 0.25, *h*/*k*<sub>B</sub>*T* = 0. **b** *J*/*k*<sub>B</sub>*T* = arctanh(1/3), *h*/*k*<sub>B</sub>*T* = 0. **c** *J*/*k*<sub>B</sub>*T* = 0.4, *h*/*k*<sub>B</sub>*T* = 0

$$Z \equiv \sum\_{s\_1 \in \Omega} \sum\_{s\_2 \in \Omega} \cdots \sum\_{s\_{|V|} \in \Omega} \left( \prod\_{\{i,j\} \in E} \exp\left(\frac{J}{k\_{\rm B}T} \delta\_{s\_i s\_j}\right) \right) \left( \prod\_{i \in V} \prod\_{n \in \Omega} \exp\left(\frac{h\_n}{k\_{\rm B}T} \delta\_{s\_i, n}\right) \right), \tag{10.133}$$

where Ω = {0, 1, 2, ···, |Ω| − 1}. By similar arguments to those for Eqs. (10.72) and (10.102), the marginal probabilities *R̂*<sub>*i*</sub>(*s*<sub>*i*</sub>) and *R̂*<sub>*ij*</sub>(*s*<sub>*i*</sub>, *s*<sub>*j*</sub>) can be expressed as orthonormal expansions as follows:

$$\widehat{R}\_{i}(\mathbf{s}\_{i}) = \left(\frac{1}{|\Omega|}\right) + \sum\_{k \in \Omega \backslash \{0\}} m\_{i}^{(k)} \Phi\_{k}(\mathbf{s}\_{i}),\tag{10.134}$$

$$\begin{split} \widehat{R}\_{ij}(s\_i, s\_j) &= \left(\frac{1}{|\Omega|}\right)^2 + \left(\frac{1}{|\Omega|}\right) \sum\_{k \in \Omega \backslash \{0\}} m\_i^{(k)} \Phi\_k(s\_i) \\ &\quad + \left(\frac{1}{|\Omega|}\right) \sum\_{l \in \Omega \backslash \{0\}} m\_j^{(l)} \Phi\_l(s\_j) + \sum\_{k \in \Omega \backslash \{0\}} \sum\_{l \in \Omega \backslash \{0\}} c\_{ij}^{(k,l)} \Phi\_k(s\_i) \Phi\_l(s\_j), \end{split} \tag{10.135}$$

where {Φ<sub>*k*</sub>(*s*<sub>*i*</sub>) | *s*<sub>*i*</sub>∈Ω, *k*∈Ω} is the set of orthonormal polynomials satisfying the following relationships:

$$\Phi\_0(s\_i) \equiv \sqrt{\frac{1}{|\Omega|}},\tag{10.136}$$

$$\sum\_{s\_i \in \Omega} \Phi\_k(\mathbf{s}\_i) \Phi\_l(\mathbf{s}\_i) = \delta\_{k,l} \ (k \in \Omega, l \in \Omega). \tag{10.137}$$

Because it is valid that

$$\sum\_{s\_i \in \Omega} \sum\_{s\_j \in \Omega} \Phi\_k(s\_i) \Phi\_l(s\_j) \delta\_{s\_i, s\_j} = \sum\_{s\_i \in \Omega} \Phi\_k(s\_i) \Phi\_l(s\_i) = \delta\_{k,l} \ (k \in \Omega, l \in \Omega), \tag{10.138}$$

we have the following orthonormal expansion of δ*si*,*sj* :

$$\delta\_{s\_i, s\_j} = \sum\_{k \in \Omega} \sum\_{l \in \Omega} \left( \sum\_{s\_i' \in \Omega} \sum\_{s\_j' \in \Omega} \Phi\_k(s\_i') \Phi\_l(s\_j') \delta\_{s\_i', s\_j'} \right) \Phi\_k(s\_i) \Phi\_l(s\_j) $$
 
$$= \sum\_{k \in \Omega} \Phi\_k(s\_i) \Phi\_k(s\_j) \ (s\_i \in \Omega, s\_j \in \Omega). \tag{10.139}$$
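The conditions (10.136)-(10.139) can be verified mechanically: any orthonormal basis of ℝ<sup>|Ω|</sup> whose first member is the constant function serves as {Φ<sub>*k*</sub>}. A small numerical sketch (NumPy; the QR-based construction is one convenient choice, not prescribed by the text):

```python
import numpy as np

q = 3  # |Omega|; any q >= 2 works
# Build an orthonormal basis {Phi_k} of R^q whose k = 0 member is the
# constant function 1/sqrt(q), as required by Eqs. (10.136)-(10.137),
# via QR decomposition of a matrix whose first column is all ones.
A = np.ones((q, q))
A[:, 1:] = np.eye(q)[:, 1:]
Phi, _ = np.linalg.qr(A)        # columns Phi[:, k] are Phi_k(s), s = 0..q-1
Phi *= np.sign(Phi[0, 0])       # fix the overall sign so Phi_0(s) > 0

# Orthonormality, Eq. (10.137): sum_s Phi_k(s) Phi_l(s) = delta_{k,l}
assert np.allclose(Phi.T @ Phi, np.eye(q))
# Expansion of the Kronecker delta, Eq. (10.139):
# delta_{s,s'} = sum_k Phi_k(s) Phi_k(s')
assert np.allclose(Phi @ Phi.T, np.eye(q))
print(Phi[:, 0])  # constant column, each entry 1/sqrt(3)
```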

By using Eqs. (10.134) and (10.135) and the orthonormal expansion of the two-body interaction part of the Potts model, the Bethe free energy functional for the Potts model in Eqs. (10.132) and (10.133) can be reduced to

$$\begin{aligned} &\mathcal{F}\_{\rm Bethe}\left[ \left\{ \widehat{R}\_{i} \,\middle|\, i \in V \right\}, \left\{ \widehat{R}\_{ij} \,\middle|\, \{i,j\} \in E \right\} \right] \\ &= F\_{\rm Bethe}\left( \left\{ m\_{i}^{(k)} \,\middle|\, i \in V, \ k \in \Omega \backslash \{0\} \right\}, \left\{ c\_{\{i,j\}}^{(k,l)} \,\middle|\, \{i,j\} \in E, \ k, l \in \Omega \backslash \{0\} \right\} \right), \end{aligned} \tag{10.140}$$

where


$$\begin{split} &F\_{\rm Bethe}\left(\left\{m\_{i}^{(k)} \,\middle|\, i \in V, \ k \in \Omega \backslash \{0\} \right\}, \left\{c\_{\{i,j\}}^{(k,l)} \,\middle|\, \{i,j\} \in E, \ k, l \in \Omega \backslash \{0\} \right\} \right) \\ &\equiv -J \sum\_{i \in V} |\partial i| \left(\frac{1}{|\Omega|}\right)^{2} - J \sum\_{i \in V} |\partial i| \frac{1}{|\Omega|} \sum\_{k \in \Omega \backslash \{0\}} m\_{i}^{(k)} - J \sum\_{\{i,j\} \in E} \sum\_{k \in \Omega \backslash \{0\}} c\_{\{i,j\}}^{(k,k)} \\ &\quad + k\_{\rm B} T \sum\_{i \in V} (1 - |\partial i|) \sum\_{s\_i \in \Omega} \widehat{R}\_{i}(s\_{i}) \ln \left(\widehat{R}\_{i}(s\_{i})\right) \\ &\quad + k\_{\rm B} T \sum\_{\{i,j\} \in E} \sum\_{s\_i \in \Omega} \sum\_{s\_j \in \Omega} \widehat{R}\_{ij}(s\_{i}, s\_{j}) \ln \left(\widehat{R}\_{ij}(s\_{i}, s\_{j})\right). \end{split} \tag{10.141}$$

For the case of spatial uniformity, *m*<sub>*i*</sub><sup>(*k*)</sup> and *c*<sub>*ij*</sub><sup>(*k*,*l*)</sup> are independent of *i* and {*i*, *j*} and can be represented by *m*<sup>(*k*)</sup> and *c*<sup>(*k*,*l*)</sup>, respectively, in the Bethe free energy in Eq. (10.141). For the three-state and four-state Potts models, the Bethe free energy in Eq. (10.141) can be represented by

$$F\_{\text{Bethe}}\left(\binom{m^{(1)}}{m^{(2)}}, \binom{c^{(1,1)}}{c^{(2,1)}} \stackrel{c^{(1,2)}}{c^{(2,2)}}\right),\tag{10.142}$$

and

$$F\_{\text{Bethe}}\left(\begin{pmatrix}m^{(1)}\\m^{(2)}\\m^{(3)}\end{pmatrix}, \begin{pmatrix}c^{(1,1)}\ c^{(1,2)}\ c^{(1,3)}\\c^{(2,1)}\ c^{(2,2)}\ c^{(2,3)}\\c^{(3,1)}\ c^{(3,2)}\ c^{(3,3)}\end{pmatrix}\right),\tag{10.143}$$

respectively. Figures 10.10 and 10.11 show the internal energy with no external fields

$$u = \lim\_{|V| \to +\infty} \frac{1}{|E|} \sum\_{\{i,j\} \in E} \sum\_{s} (-\delta\_{s\_i, s\_j}) P\left(s \,\middle|\, \frac{J}{k\_{\rm B}T}, \frac{h\_0}{k\_{\rm B}T} = 0, \frac{h\_1}{k\_{\rm B}T} = 0, \dots, \frac{h\_{|\Omega|-1}}{k\_{\rm B}T} = 0\right),\tag{10.144}$$

in loopy belief propagation (Bethe approximation) on the regular graph (*V*, *E*) of degree 4. We now also consider the moments *m*<sup>(2)</sup> and *m*<sup>(1)</sup> as **order parameters** for the three-state and four-state Potts models, respectively, for the following cases:

$$\begin{cases} \text{(I)}\lim\_{h\_0 \to +0} m^{(2)}, h\_1 = h\_2 = 0, \\ \text{(II)}\lim\_{h\_1 \to +0} m^{(2)}, h\_0 = h\_2 = 0, \\ \text{(III)}\lim\_{h\_2 \to +0} m^{(2)}, h\_0 = h\_1 = 0, \\ \text{(IV)}\ m^{(2)}\text{ under }h\_0 = h\_1 = h\_2 = 0 \text{ and } \mu(0) = \mu(1) = \mu(2) = \frac{1}{3}, \end{cases} \tag{10.145}$$

for the three-state Potts model, and

$$\begin{cases} \text{(I)} \ \lim\_{h\_0 \to +0} m^{(2)}, \ h\_1 = h\_2 = h\_3 = 0, \\ \text{(II)} \ \lim\_{h\_1 \to +0} m^{(2)}, \ h\_0 = h\_2 = h\_3 = 0, \\ \text{(III)} \ \lim\_{h\_2 \to +0} m^{(2)}, \ h\_0 = h\_1 = h\_3 = 0, \\ \text{(IV)} \ \lim\_{h\_3 \to +0} m^{(2)}, \ h\_0 = h\_1 = h\_2 = 0, \\ \text{(V)} \ m^{(2)} \text{ under } h\_0 = h\_1 = h\_2 = h\_3 = 0 \text{ and } \mu(0) = \mu(1) = \mu(2) = \mu(3) = \frac{1}{4}, \end{cases} \tag{10.146}$$

for the four-state Potts model. These are also shown in Figs. 10.10 and 10.11, where blue, green, and red lines show the global minimum states, local minimum states, and local maximum states, respectively, of the Bethe free energies given by Eq. (10.142) for the three-state Potts model and by Eq. (10.143) for the four-state Potts model. In the global minimum states, there exist discontinuous points in *m*<sup>(2)</sup> and *m*<sup>(1)</sup> as well as in *u*. In the Ising model, the first derivative *u* of the free energy with respect to 1/(*k*<sub>B</sub>*T*) is always continuous, but the second derivative diverges or has a discontinuity, as shown in Figs. 10.4, 10.5, and 10.7; this kind of singularity is referred to as a **second-order phase transition** in statistical mechanics. In the Potts models considered here, by contrast, the first derivative *u* of the free energy with respect to 1/(*k*<sub>B</sub>*T*) itself has a discontinuity, as shown in Figs. 10.10 and 10.11; this singularity is referred to as a **first-order phase transition** in statistical mechanics. Figures 10.12 and 10.13 show the Bethe free energy landscapes

$$f\_{\rm Bethe}\left(m^{(1)}, m^{(2)}\right) \equiv \frac{1}{|V|} \operatorname\*{extremum}\_{\begin{pmatrix} c^{(1,1)} & c^{(1,2)} \\ c^{(2,1)} & c^{(2,2)} \end{pmatrix}} F\_{\rm Bethe}\left(\begin{pmatrix}m^{(1)}\\m^{(2)}\end{pmatrix}, \begin{pmatrix}c^{(1,1)} & c^{(1,2)}\\ c^{(2,1)} & c^{(2,2)}\end{pmatrix}\right),\tag{10.147}$$

for the three-state Potts model and

$$f\_{\rm Bethe}\left(m^{(1)}, m^{(2)}\right) \equiv \frac{1}{|V|} \operatorname\*{extremum}\_{m^{(3)},\ \begin{pmatrix} c^{(1,1)} & c^{(1,2)} & c^{(1,3)} \\ c^{(2,1)} & c^{(2,2)} & c^{(2,3)} \\ c^{(3,1)} & c^{(3,2)} & c^{(3,3)} \end{pmatrix}} F\_{\rm Bethe}\left(\begin{pmatrix} m^{(1)} \\ m^{(2)} \\ m^{(3)} \end{pmatrix}, \begin{pmatrix} c^{(1,1)} & c^{(1,2)} & c^{(1,3)} \\ c^{(2,1)} & c^{(2,2)} & c^{(2,3)} \\ c^{(3,1)} & c^{(3,2)} & c^{(3,3)} \end{pmatrix}\right),\tag{10.148}$$

for the four-state Potts model, respectively.

#### *10.3.4 Ising Model on a Complete Graph*

This section considers a complete graph (*V*, *E*) for which the energy function *H*(*s*) is defined by

$$P\_{\ell}(s\_{\ell}) \equiv \lim\_{|V| \to +\infty} \sum\_{s\_1 \in \Omega} \cdots \sum\_{s\_{\ell-1} \in \Omega} \sum\_{s\_{\ell+1} \in \Omega} \cdots \sum\_{s\_{|V|} \in \Omega} P\left(s \,\middle|\, \frac{J}{k\_{\rm B}T}, \frac{h\_0}{k\_{\rm B}T}, \frac{h\_1}{k\_{\rm B}T}, \frac{h\_2}{k\_{\rm B}T}\right) \ (\forall s\_{\ell} \in \Omega)$$

**Fig. 10.10** Internal energy *u* in Eq. (10.144) and (I) lim<sub>*h*<sub>0</sub>→+0</sub> *m*<sup>(2)</sup> for *h*<sub>1</sub> = *h*<sub>2</sub> = 0, (II) lim<sub>*h*<sub>1</sub>→+0</sub> *m*<sup>(2)</sup> for *h*<sub>0</sub> = *h*<sub>2</sub> = 0, (III) lim<sub>*h*<sub>2</sub>→+0</sub> *m*<sup>(2)</sup> for *h*<sub>0</sub> = *h*<sub>1</sub> = 0, and (IV) *m*<sup>(2)</sup> under *h*<sub>0</sub> = *h*<sub>1</sub> = *h*<sub>2</sub> = 0 and μ(0) = μ(1) = μ(2) = 1/3 such that *P*<sub>*i*</sub>(0) = *P*<sub>*i*</sub>(1) = *P*<sub>*i*</sub>(2), in loopy belief propagation (Bethe approximation) for the three-state Potts model in Eqs. (10.132) and (10.133) on the regular graph (*V*, *E*) of degree 4

$$H(\mathbf{s}) = H(\mathbf{s}\_1, \mathbf{s}\_2, \dots, \mathbf{s}\_{|V|}) \equiv -\frac{J}{|V|} \sum\_{\{i,j\} \in E} s\_i \mathbf{s}\_j - h \sum\_{i \in V} s\_i \ (J > 0), \tag{10.149}$$

instead of Eq. (10.110) in Eqs. (10.109) and (10.111). Note that the interaction between every pair of connected nodes is set to *J*/|*V*| to guarantee the existence of the thermodynamic limit |*V*| → +∞ for the complete graph in the sense of Ruelle in Ref. [38].

**Fig. 10.11** Internal energy *u* in Eq. (10.144) and order parameter *m*(1) in loopy belief propagation (Bethe approximation) for the four-state Potts model in Eqs. (10.132) and (10.133) on the regular graph (*V*, *E*) of degree 4

**Fig. 10.12** Bethe free energy *f*<sub>Bethe</sub>(*m*<sup>(1)</sup>, *m*<sup>(2)</sup>) for the three-state Potts model in Eqs. (10.132) and (10.133) on the regular graph (*V*, *E*) of degree 4. **a** *J*/*k*<sub>B</sub>*T* = 0.850. **b** *J*/*k*<sub>B</sub>*T* = 0.880. **c** *J*/*k*<sub>B</sub>*T* = 0.881. **d** *J*/*k*<sub>B</sub>*T* = 0.882. **e** *J*/*k*<sub>B</sub>*T* = 0.885. **f** *J*/*k*<sub>B</sub>*T* = 0.920

**Fig. 10.13** Bethe free energy *f*<sub>Bethe</sub>(*m*<sup>(1)</sup>, *m*<sup>(2)</sup>) for the four-state Potts model in Eqs. (10.132) and (10.133) on the regular graph (*V*, *E*) of degree 4. **a** *J*/*k*<sub>B</sub>*T* = 0.900. **b** *J*/*k*<sub>B</sub>*T* = 1.000. **c** *J*/*k*<sub>B</sub>*T* = 1.010. **d** *J*/*k*<sub>B</sub>*T* = 1.020. **e** *J*/*k*<sub>B</sub>*T* = 1.050. **f** *J*/*k*<sub>B</sub>*T* = 1.100

The free energy in Eq. (10.55) is expressed as follows:

$$\begin{split} & -k\_{\rm B}T\ln\left(Z\left(\frac{J}{k\_{\rm B}T},\frac{h}{k\_{\rm B}T}\right)\right) \\ &= -k\_{\rm B}T\ln\left(\sum\_{\tau\_{1}\in\Omega}\sum\_{\tau\_{2}\in\Omega}\cdots\sum\_{\tau\_{|V|}\in\Omega}\exp\left(\frac{h}{k\_{\rm B}T}\sum\_{i\in V}\tau\_{i} + \frac{J}{|V|k\_{\rm B}T}\sum\_{\{i,j\}\in E}\tau\_{i}\tau\_{j}\right)\right) \\ &= \frac{J}{2} - k\_{\rm B}T\ln\left(\sum\_{\tau\_{1}\in\Omega}\sum\_{\tau\_{2}\in\Omega}\cdots\sum\_{\tau\_{|V|}\in\Omega}\exp\left(\frac{h}{k\_{\rm B}T}\sum\_{i\in V}\tau\_{i}\right)\exp\left(\frac{1}{2}\left(\sqrt{\frac{J}{|V|k\_{\rm B}T}}\sum\_{i\in V}\tau\_{i}\right)^{2}\right)\right). \end{split} \tag{10.150}$$

By using the Gauss integral formula

$$\frac{1}{\sqrt{2\pi}}\int\_{-\infty}^{+\infty} \exp\left(-\frac{1}{2}x^2 + ax\right)dx = \exp\left(\frac{1}{2}a^2\right),\tag{10.151}$$

the expression for the free energy is rewritten as

$$\begin{split} &-k\_{\rm B}T \ln \left( Z\left( \frac{J}{k\_{\rm B}T}, \frac{h}{k\_{\rm B}T} \right) \right) \\ &= \frac{1}{2}J - k\_{\rm B}T\ln \left( \sum\_{\tau\_{1} \in \Omega} \sum\_{\tau\_{2} \in \Omega} \cdots \sum\_{\tau\_{|V|} \in \Omega} \frac{1}{\sqrt{2\pi}} \int\_{-\infty}^{+\infty} e^{-\frac{1}{2}x^{2}} \exp\left( \sum\_{i \in V} \left( \sqrt{\frac{J}{|V| k\_{\rm B} T}}\, x + \frac{h}{k\_{\rm B}T} \right) \tau\_{i} \right) dx \right) \\ &= \frac{1}{2}J - k\_{\rm B}T\ln \left( \frac{1}{\sqrt{2\pi}} \int\_{-\infty}^{+\infty} e^{-\frac{1}{2}x^{2}} \prod\_{i \in V} \left( \sum\_{\tau\_{i} \in \Omega} \exp\left( \left( \sqrt{\frac{J}{|V| k\_{\rm B} T}}\, x + \frac{h}{k\_{\rm B}T} \right) \tau\_{i} \right) \right) dx \right) \\ &= \frac{1}{2}J - k\_{\rm B}T\ln \left( \frac{1}{\sqrt{2\pi}} \int\_{-\infty}^{+\infty} e^{-\frac{1}{2}x^{2}} \prod\_{i \in V} \left( 2 \cosh\left( \sqrt{\frac{J}{|V| k\_{\rm B} T}}\, x + \frac{h}{k\_{\rm B}T} \right) \right) dx \right). \end{split} \tag{10.152}$$

Note that the procedure in which a new continuous variable *x* is introduced in Eq. (10.152) is referred to as a **Hubbard-Stratonovich transformation** [13]. Moreover, by replacing the variable *x* by *y* = *x*√(*k*<sub>B</sub>*T*/(|*V*|*J*)), the free energy can be written as

$$-k\_{\rm B}T\ln\left(Z\left(\frac{J}{k\_{\rm B}T},\frac{h}{k\_{\rm B}T}\right)\right) = \frac{1}{2}J - k\_{\rm B}T\ln\left(\frac{1}{\sqrt{2\pi}}\int\_{-\infty}^{+\infty} \exp\{|V|\psi(y)\}\,dy\right), \tag{10.153}$$

where

$$\psi(y) \equiv -\frac{1}{2} \left( \frac{J}{k\_{\rm B}T} \right) y^2 + \frac{1}{|V|} \sum\_{i \in V} \ln \left( 2 \cosh \left( \frac{1}{k\_{\rm B}T} (Jy + h) \right) \right). \tag{10.154}$$
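The Gauss integral formula (10.151) that drives the Hubbard-Stratonovich step can be checked numerically; a minimal sketch (trapezoidal quadrature, with the integration range and step chosen ad hoc):

```python
import math

def gauss_lhs(a, L=12.0, n=20001):
    """Evaluate (1/sqrt(2*pi)) * int exp(-x^2/2 + a*x) dx over
    [a - L, a + L] (the integrand is a Gaussian centered at x = a)
    by the trapezoidal rule, to compare with the closed form
    exp(a^2/2) of Eq. (10.151)."""
    h = 2.0 * L / (n - 1)
    total = 0.0
    for i in range(n):
        x = (a - L) + i * h
        w = 0.5 if i in (0, n - 1) else 1.0
        total += w * math.exp(-0.5 * x * x + a * x)
    return total * h / math.sqrt(2.0 * math.pi)

for a in (0.0, 0.7, 1.5):
    assert abs(gauss_lhs(a) - math.exp(0.5 * a * a)) < 1e-6
```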

We now consider the magnetization

$$m\left(\frac{J}{k\_{\rm B}T}, \frac{h}{k\_{\rm B}T}\right) = \lim\_{|V| \to +\infty} \frac{1}{|V|} \sum\_{i \in V} \sum\_{s\_1 \in \Omega} \sum\_{s\_2 \in \Omega} \cdots \sum\_{s\_{|V|} \in \Omega} s\_i \, P\left(s \,\middle|\, \frac{J}{k\_{\rm B}T}, \frac{h}{k\_{\rm B}T}\right)$$

for the probability distribution in Eqs. (10.109) and (10.111) with Eq. (10.149), as follows:

$$\begin{split} m\left(\frac{J}{k\_{\rm B}T}, \frac{h}{k\_{\rm B}T}\right) &= -\lim\_{|V| \to +\infty} \frac{1}{|V|k\_{\rm B}T} \frac{\partial}{\partial \left(\frac{h}{k\_{\rm B}T}\right)} \left(-k\_{\rm B}T \ln\left(Z\left(\frac{J}{k\_{\rm B}T}, \frac{h}{k\_{\rm B}T}\right)\right)\right) \\ &= \lim\_{|V| \to +\infty} \frac{\int\_{-\infty}^{+\infty} \exp\left(|V|\left(\psi(y) + \frac{1}{|V|} \ln\left(\tanh\left(\frac{1}{k\_{\rm B}T}(Jy + h)\right)\right)\right)\right) dy}{\int\_{-\infty}^{+\infty} \exp(|V|\psi(y))\, dy}. \end{split} \tag{10.155}$$

Because it is valid that

$$\begin{split} \lim\_{|V| \to +\infty} \frac{\partial}{\partial \mathbf{y}} \psi(\mathbf{y}) &= \lim\_{|V| \to +\infty} \frac{\partial}{\partial \mathbf{y}} \left( \psi(\mathbf{y}) + \frac{1}{|V|} \ln \left( \tanh \left( \frac{1}{k\_{\mathrm{B}}T} (J\mathbf{y} + h) \right) \right) \right) \\ &= -\frac{J}{k\_{\mathrm{B}}T} \left( \mathbf{y} - \tanh \left( \frac{1}{k\_{\mathrm{B}}T} (J\mathbf{y} + h) \right) \right), \end{split} \tag{10.156}$$

we obtain the magnetization as

$$m\left(\frac{J}{k\_{\rm B}T}, \frac{h}{k\_{\rm B}T}\right) = \lim\_{|V| \to +\infty} \frac{\exp\left(|V|\left(\psi\left(\mathbf{y}\_{\rm max}\right) + \frac{1}{|V|}\ln\left(\tanh\left(\frac{1}{k\_{\rm B}T}(J\mathbf{y}\_{\rm max} + h)\right)\right)\right)\right)}{\exp\left(|V|\psi\left(\mathbf{y}\_{\rm max}\right)\right)}$$

$$= \tanh\left(\frac{1}{k\_{\rm B}T}(J\mathbf{y}\_{\rm max} + h)\right),\tag{10.157}$$

where

$$\text{y}\_{\text{max}} = \tanh\left(\frac{1}{k\_{\text{B}}T}(J\text{y}\_{\text{max}} + h)\right) \tag{10.158}$$

by using a **saddle point method** [37]. Equations (10.157) and (10.158) reduce to the following mean-field equation for *m*(*J*/*k*<sub>B</sub>*T*, *h*/*k*<sub>B</sub>*T*):

$$m\left(\frac{J}{k\_{\rm B}T}, \frac{h}{k\_{\rm B}T}\right) = \tanh\left(\frac{1}{k\_{\rm B}T}\left(Jm\left(\frac{J}{k\_{\rm B}T}, \frac{h}{k\_{\rm B}T}\right) + h\right)\right). \tag{10.159}$$

This means that it is possible to treat the Ising model on the complete graph in the thermodynamic limit analytically using the mean-field method.
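A minimal sketch of solving the mean-field equation (10.159) by fixed-point iteration (plain Python; *K* stands for *J*/*k*<sub>B</sub>*T*, and the starting point and iteration count are arbitrary choices):

```python
import math

def mean_field_m(K, h=0.0, m0=0.5, iters=1000):
    """Fixed-point iteration of the mean-field equation (10.159):
    m = tanh(K*m + h/k_B T), with K = J/k_B T and h given in units
    of k_B T."""
    m = m0
    for _ in range(iters):
        m = math.tanh(K * m + h)
    return m

# K < 1: only the trivial solution m = 0 exists;
# K > 1: a spontaneous magnetization survives even as h -> 0.
print(mean_field_m(0.8))   # ~ 0.0
print(mean_field_m(1.5))   # ~ 0.86
```

The threshold *K* = 1 is the mean-field critical point of this model; the same bifurcation structure appears for *y*<sub>max</sub> in Eq. (10.158).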

By combining the replica method with the Hubbard-Stratonovich transformation and the saddle point method, it is possible to treat the random average in Eq. (10.25) for the Ising model with non-uniform external fields on the complete graph analytically [13, 76]. In statistical mechanics, this kind of approach has been developed as the **spin glass theory** [77–80]. Such computational techniques that use the replica method for Ising models with spatially non-uniform interactions and external fields on the complete graph have been used to estimate statistical performance analysis for many probabilistic information processing systems [13, 15–17].

Next, we consider the belief propagation method for the Ising model on the complete graph in Eq. (10.149) with Eqs. (10.109) and (10.111). For infinitesimally small |*V*|<sup>−1</sup>, the message passing rule in Eq. (10.95) can be expanded as

$$\begin{split} \mu\_{j\to i}(s\_i) &= \frac{Z\_i}{Z\_{\{i,j\}}} \sum\_{\tau\_j \in \Omega} \left( \prod\_{l \in \partial j \backslash \{i\}} \mu\_{l\to j}(\tau\_j) \right) \exp\left( \frac{1}{k\_{\rm B}T} \left( \frac{J}{|V|}\, s\_i \tau\_j + h \tau\_j \right) \right) \\ &= \frac{Z\_i}{Z\_{\{i,j\}}} \sum\_{\tau\_j \in \Omega} \left( \prod\_{l \in \partial j \backslash \{i\}} \mu\_{l\to j}(\tau\_j) \right) \exp\left( \frac{h}{k\_{\rm B}T} \tau\_j \right) \left( 1 + \frac{1}{|V|} \frac{J}{k\_{\rm B}T} s\_i \tau\_j + O\left(|V|^{-2}\right) \right) \\ &= \frac{Z\_i Z\_j}{Z\_{\{i,j\}}} \sum\_{\tau\_j \in \Omega} \widehat{R}\_j(\tau\_j) \left( 1 + \frac{1}{|V|} \frac{J}{k\_{\rm B}T} s\_i \tau\_j + O\left(|V|^{-2}\right) \right) \\ &= \frac{Z\_i Z\_j}{Z\_{\{i,j\}}} \left( 1 + \frac{1}{k\_{\rm B}T} \left( \frac{J}{|V|} \sum\_{\tau\_j \in \Omega} \tau\_j \widehat{R}\_j(\tau\_j) \right) s\_i + O\left(|V|^{-2}\right) \right) \\ &= \frac{Z\_i Z\_j}{Z\_{\{i,j\}}} \left( \exp\left( \frac{1}{k\_{\rm B}T} \left( \frac{J}{|V|} \sum\_{\tau\_j \in \Omega} \tau\_j \widehat{R}\_j(\tau\_j) \right) s\_i \right) + O\left(|V|^{-2}\right) \right). \end{split} \tag{10.160}$$

By substituting Eq. (10.160) into Eq. (10.91), the marginal probabilities can be expressed as follows:

$$\widehat{R}\_{i}(s\_{i}) = \frac{\exp\left(\frac{1}{k\_{\rm B}T} \left(\frac{J}{|V|} \sum\_{j \in V \backslash \{i\}} \sum\_{\tau\_{j} \in \Omega} \tau\_{j} \widehat{R}\_{j}(\tau\_{j}) + h\right) s\_{i}\right)}{\sum\_{\tau\_{i} \in \Omega} \exp\left(\frac{1}{k\_{\rm B}T} \left(\frac{J}{|V|} \sum\_{j \in V \backslash \{i\}} \sum\_{\tau\_{j} \in \Omega} \tau\_{j} \widehat{R}\_{j}(\tau\_{j}) + h\right) \tau\_{i}\right)} + O(|V|^{-1}) \ (|V| \to +\infty, \ s\_{i} \in \Omega, \ i \in V), \tag{10.161}$$

$$\widehat{R}\_{ij}(s\_{i},s\_{j}) = \widehat{R}\_{i}(s\_{i})\widehat{R}\_{j}(s\_{j}) + O(|V|^{-1}) \ (|V| \to +\infty, \ s\_{i} \in \Omega, \ s\_{j} \in \Omega, \ \{i,j\} \in E). \tag{10.162}$$

Equation (10.161) can be regarded as a system of simultaneous deterministic equations for {*R̂*<sub>*i*</sub>(*s*<sub>*i*</sub>) | *s*<sub>*i*</sub>∈Ω, *i*∈*V*} and is equivalent to the mean-field equation in Eq. (10.68) for Eq. (10.149) with Eqs. (10.109) and (10.111).

## *10.3.5 Probabilistic Segmentation by Potts Prior and Loopy Belief Propagation*

In Sect. 10.2.3, we gave the fundamental framework of probabilistic segmentation based on the Potts prior, and reduced the framework of the EM procedure for estimating hyperparameters to the extremum conditions of the *Q*-function as shown in Eqs. (10.41), (10.42), and (10.43) with Eqs. (10.44), (10.45), and (10.46). These frameworks can be realized by combining them with the loopy belief propagation in Sect. 10.3.2 to give the following practical procedures [70]:

**Probabilistic segmentation algorithm** (**Input**: *D*; **Output**: α̂(*D*), *û*(*D*), *â*(*D*), *Ĉ*(*D*), *ŝ*(*D*))


$$\mu\_{j\to i}(s\_{i}) \leftarrow \frac{\sum\_{\tau\_{j}\in\Omega} \exp\left(2\widehat{\alpha}(\mathbf{D})\delta\_{s\_{i},\tau\_{j}}\right) g\left(\mathbf{d}\_{j}\middle|\tau\_{j},\widehat{\mathbf{a}}(\tau\_{j},\mathbf{D}),\widehat{\mathbf{C}}(\tau\_{j},\mathbf{D})\right) \prod\_{k\in\partial j\backslash\{i\}} \widehat{\mu}\_{k\to j}(\tau\_{j},\mathbf{D})}{\sum\_{\tau\_{i}\in\Omega} \sum\_{\tau\_{j}\in\Omega} \exp\left(2\widehat{\alpha}(\mathbf{D})\delta\_{\tau\_{i},\tau\_{j}}\right) g\left(\mathbf{d}\_{j}\middle|\tau\_{j},\widehat{\mathbf{a}}(\tau\_{j},\mathbf{D}),\widehat{\mathbf{C}}(\tau\_{j},\mathbf{D})\right) \prod\_{k\in\partial j\backslash\{i\}} \widehat{\mu}\_{k\to j}(\tau\_{j},\mathbf{D})} \ (s\_{i}\in\Omega,\ i\in V,\ j\in\partial i), \tag{10.163}$$

$$
\widehat{\mu}\_{j \to i}(\mathbf{s}\_i, \mathbf{D}) \leftarrow \mu\_{j \to i}(\mathbf{s}\_i) \text{ (}\mathbf{s}\_i \in \Omega, \text{ } i \in V, \text{ } j \in \partial i), \tag{10.164}
$$

$$B\_{i} \leftarrow \sum\_{\tau\_{i} \in \Omega} g\left( \mathbf{d}\_{i} \middle| \tau\_{i}, \widehat{\mathbf{a}}(\tau\_{i}, \mathbf{D}), \widehat{\mathbf{C}}(\tau\_{i}, \mathbf{D}) \right) \prod\_{k \in \partial i} \widehat{\mu}\_{k \to i}(\tau\_{i}, \mathbf{D}) \ (i \in V), \tag{10.165}$$

$$\begin{split} B\_{\{i,j\}} \leftarrow \sum\_{\tau\_{i} \in \Omega} \sum\_{\tau\_{j} \in \Omega} &\left( \prod\_{k \in \partial i \backslash \{j\}} \widehat{\mu}\_{k \to i}(\tau\_{i}, \mathbf{D}) \right) g\left( \mathbf{d}\_{i} \middle| \tau\_{i}, \widehat{\mathbf{a}}(\tau\_{i}, \mathbf{D}), \widehat{\mathbf{C}}(\tau\_{i}, \mathbf{D}) \right) \exp\left( 2\widehat{\alpha}(\mathbf{D}) \delta\_{\tau\_{i}, \tau\_{j}} \right) \\ &\times g\left( \mathbf{d}\_{j} \middle| \tau\_{j}, \widehat{\mathbf{a}}(\tau\_{j}, \mathbf{D}), \widehat{\mathbf{C}}(\tau\_{j}, \mathbf{D}) \right) \left( \prod\_{k \in \partial j \backslash \{i\}} \widehat{\mu}\_{k \to j}(\tau\_{j}, \mathbf{D}) \right) \ (\{i,j\} \in E), \end{split} \tag{10.166}$$

$$\mathbf{a}(s\_{i}) \leftarrow \frac{\sum\_{i \in V} \frac{1}{B\_{i}}\, \mathbf{d}\_{i}\, g\left(\mathbf{d}\_{i} \middle| s\_{i}, \widehat{\mathbf{a}}(s\_{i}, \mathbf{D}), \widehat{\mathbf{C}}(s\_{i}, \mathbf{D})\right) \prod\_{k \in \partial i} \widehat{\mu}\_{k \to i}(s\_{i}, \mathbf{D})}{\sum\_{i \in V} \frac{1}{B\_{i}}\, g\left(\mathbf{d}\_{i} \middle| s\_{i}, \widehat{\mathbf{a}}(s\_{i}, \mathbf{D}), \widehat{\mathbf{C}}(s\_{i}, \mathbf{D})\right) \prod\_{k \in \partial i} \widehat{\mu}\_{k \to i}(s\_{i}, \mathbf{D})} \ (s\_{i} \in \Omega), \tag{10.167}$$

$$\mathbf{C}(s\_{i}) \leftarrow \frac{\sum\_{i \in V} \frac{1}{B\_{i}} \left(\mathbf{d}\_{i} - \widehat{\mathbf{a}}(s\_{i}, \mathbf{D})\right)\left(\mathbf{d}\_{i} - \widehat{\mathbf{a}}(s\_{i}, \mathbf{D})\right)^{\mathsf{T}} g\left(\mathbf{d}\_{i} \middle| s\_{i}, \widehat{\mathbf{a}}(s\_{i}, \mathbf{D}), \widehat{\mathbf{C}}(s\_{i}, \mathbf{D})\right) \prod\_{k \in \partial i} \widehat{\mu}\_{k \to i}(s\_{i}, \mathbf{D})}{\sum\_{i \in V} \frac{1}{B\_{i}}\, g\left(\mathbf{d}\_{i} \middle| s\_{i}, \widehat{\mathbf{a}}(s\_{i}, \mathbf{D}), \widehat{\mathbf{C}}(s\_{i}, \mathbf{D})\right) \prod\_{k \in \partial i} \widehat{\mu}\_{k \to i}(s\_{i}, \mathbf{D})} \ (s\_{i} \in \Omega), \tag{10.168}$$

$$\begin{split} u(\mathbf{D}) \leftarrow \frac{1}{|E|} \sum\_{\{i,j\} \in E} \Bigg( \frac{1}{B\_{\{i,j\}}} \sum\_{\tau\_{i} \in \Omega} \sum\_{\tau\_{j} \in \Omega} &\left( -\delta\_{\tau\_{i},\tau\_{j}} \right) \exp\left(2\widehat{\alpha}(\mathbf{D})\delta\_{\tau\_{i},\tau\_{j}}\right) \\ &\times \left( \prod\_{k \in \partial i \backslash \{j\}} \widehat{\mu}\_{k \to i}(\tau\_{i}, \mathbf{D}) \right) g\left(\mathbf{d}\_{i} \middle| \tau\_{i}, \widehat{\mathbf{a}}(\tau\_{i}, \mathbf{D}), \widehat{\mathbf{C}}(\tau\_{i}, \mathbf{D})\right) \\ &\times \left( \prod\_{k \in \partial j \backslash \{i\}} \widehat{\mu}\_{k \to j}(\tau\_{j}, \mathbf{D}) \right) g\left(\mathbf{d}\_{j} \middle| \tau\_{j}, \widehat{\mathbf{a}}(\tau\_{j}, \mathbf{D}), \widehat{\mathbf{C}}(\tau\_{j}, \mathbf{D})\right) \Bigg), \end{split} \tag{10.169}$$

$$\widehat{u}(\mathbf{D}) \leftarrow u(\mathbf{D}),$$

$$
\widehat{\mathfrak{a}}(\mathbf{s}\_i, \mathbf{D}) \leftarrow \mathfrak{a}(\mathbf{s}\_i) \ (\mathbf{s}\_i \in \Omega), \tag{10.170}
$$

$$
\widehat{\mathbf{C}}(\mathbf{s}\_i, \mathbf{D}) \leftarrow \mathbf{C}(\mathbf{s}\_i) \; (\mathbf{s}\_i \in \Omega). \tag{10.171}
$$

Here, *g*(**d**<sub>*i*</sub> | ξ, *â*(ξ, *D*), *Ĉ*(ξ, *D*)) is defined by Eq. (10.31) for each state ξ (∈ Ω).

**Step 3 (M-step):** Set the initial values of the messages {λ(ξ) | ξ∈Ω} in the loopy belief propagation for the Potts prior and repeat the following procedure until α̂(*D*) and {λ̂(ξ, *D*) | ξ∈Ω} converge:

$$\lambda(s\_i) \leftarrow \frac{\sum\_{\tau\_j \in \Omega} \exp\left(2\widehat{\alpha}(\mathbf{D})\delta\_{s\_i, \tau\_j}\right)\widehat{\lambda}(\tau\_j, \mathbf{D})^3}{\sum\_{\tau\_i \in \Omega}\sum\_{\tau\_j \in \Omega} \exp\left(2\widehat{\alpha}(\mathbf{D})\delta\_{\tau\_i, \tau\_j}\right)\widehat{\lambda}(\tau\_j, \mathbf{D})^3} \ (s\_i \in \Omega),\qquad(10.172)$$

$$\widehat{\lambda}(s\_i, \mathbf{D}) \leftarrow \lambda(s\_i) \ (s\_i \in \Omega),\tag{10.173}$$

$$\widehat{\alpha}(\mathbf{D}) \leftarrow \widehat{\alpha}(\mathbf{D}) \times \left(\frac{1}{1+\widehat{u}(\mathbf{D})} \cdot \frac{\sum\_{\tau\_{i}\in\Omega}\sum\_{\tau\_{j}\in\Omega} \left(1-\delta\_{\tau\_{i},\tau\_{j}}\right) \widehat{\lambda}(\tau\_{i},\mathbf{D})^{3} \exp\left(2\widehat{\alpha}(\mathbf{D})\delta\_{\tau\_{i},\tau\_{j}}\right) \widehat{\lambda}(\tau\_{j},\mathbf{D})^{3}}{\sum\_{\tau\_{i}\in\Omega}\sum\_{\tau\_{j}\in\Omega} \widehat{\lambda}(\tau\_{i},\mathbf{D})^{3} \exp\left(2\widehat{\alpha}(\mathbf{D})\delta\_{\tau\_{i},\tau\_{j}}\right) \widehat{\lambda}(\tau\_{j},\mathbf{D})^{3}}\right)^{1/4}. \tag{10.174}$$

**Step 4:** Compute the output *ŝ*(*D*) = (*ŝ*<sub>1</sub>(*D*), *ŝ*<sub>2</sub>(*D*), ···, *ŝ*<sub>|*V*|</sub>(*D*)) as follows:

$$\widehat{s}\_i(\mathbf{D}) \leftarrow \operatorname\*{argmax}\_{s\_i \in \Omega}\, g\left(\mathbf{d}\_i \middle| s\_i, \widehat{\mathbf{a}}(s\_i, \mathbf{D}), \widehat{\mathbf{C}}(s\_i, \mathbf{D})\right) \prod\_{k \in \partial i} \widehat{\mu}\_{k \to i}(s\_i, \mathbf{D}) \ (i \in V). \tag{10.175}$$

Stop if the hyperparameters α̂(*D*), *â*(*s*<sub>*i*</sub>, *D*) (*s*<sub>*i*</sub>∈Ω), and *Ĉ*(*s*<sub>*i*</sub>, *D*) (*s*<sub>*i*</sub>∈Ω) have converged; otherwise return to **Step 2**.
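The inner loop of Step 3, Eqs. (10.172) and (10.173), can be sketched as a uniform-message fixed-point iteration on a degree-4 graph (hence the cubed messages). Here *q* = 3 and the value standing in for α̂(*D*) is purely illustrative; the α update of Eq. (10.174) is not included:

```python
import math

def potts_lambda_fixed_point(alpha, q=3, iters=2000, seed=(0.6, 0.2, 0.2)):
    """Iterate the uniform-message update of Eq. (10.172) for a q-state
    Potts prior on a degree-4 graph:
      lambda(s) <- sum_t exp(2*alpha*delta_{s,t}) * lambda_hat(t)**3,
    normalized so the messages sum to one, with Eq. (10.173)'s
    assignment lambda_hat <- lambda after each sweep."""
    lam = list(seed)
    for _ in range(iters):
        new = [sum(math.exp(2.0 * alpha * (s == t)) * lam[t] ** 3
                   for t in range(q)) for s in range(q)]
        z = sum(new)
        lam = [v / z for v in new]
    return lam

print(potts_lambda_fixed_point(0.0))   # uniform: [1/3, 1/3, 1/3]
print(potts_lambda_fixed_point(0.6))   # biased toward the seeded state
```

For small α the uniform message is the only stable fixed point; for sufficiently large α a biased fixed point takes over, which is the message-level signature of the first-order transition of the Potts prior discussed below.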

Some of the numerical experimental results are shown in Fig. 10.14. The Potts prior has a first-order phase transition, as shown in Sect. 10.3.6. Figure 10.14 shows how the hyperparameter 2α̂ = *J*/*k*<sub>B</sub>*T* converges in the EM procedure with loopy belief propagation under the first-order phase transition.

## *10.3.6 Real-Space Renormalization Group Method and Sublinear Modeling of Statistical Machine Learning*

First, we explore the most fundamental real-space renormalization procedure for the Ising model in Eq. (10.49) on the ring graph (*V*, *E*), where

$$E \equiv \left\{ \{1, 2\}, \{2, 3\}, \{3, 4\}, \dots, \{|V| - 1, |V|\}, \{|V|, 1\} \right\}, \tag{10.176}$$

in the case of |*V*| = 2<sup>*L*</sup>. We have the following equality:

$$\begin{split} &\sum\_{s\_{2}\in\Omega}\sum\_{s\_{4}\in\Omega}\sum\_{s\_{6}\in\Omega}\cdots\sum\_{s\_{|V|}\in\Omega}\prod\_{\{i,j\}\in E} \exp\left(\frac{1}{k\_{\rm B}T} J s\_{i}s\_{j}\right) \\ &= \left(\sum\_{s\_{2}\in\Omega} \exp\left(\frac{1}{k\_{\rm B}T} J(s\_{1}+s\_{3})s\_{2}\right)\right) \left(\sum\_{s\_{4}\in\Omega} \exp\left(\frac{1}{k\_{\rm B}T} J(s\_{3}+s\_{5})s\_{4}\right)\right) \\ &\qquad\times \cdots \times \left(\sum\_{s\_{|V|-2}\in\Omega} \exp\left(\frac{1}{k\_{\rm B}T} J\left(s\_{|V|-3}+s\_{|V|-1}\right)s\_{|V|-2}\right)\right) \end{split}$$

**Fig. 10.14** Numerical experimental results of probabilistic segmentations by Potts prior and loopy belief propagation. The graph (*V*, *E*) is a square grid graph with periodic boundary conditions along the *x*- and *y*-directions

$$\begin{split} &\quad\times \left( \sum\_{s\_{|V|}\in\Omega} \exp\left( \frac{1}{k\_{\rm B}T} J\left( s\_{|V|-1} + s\_{1} \right) s\_{|V|} \right) \right) \\ &= 2^{\frac{|V|}{2}} \left( \cosh\left( \frac{2J}{k\_{\rm B}T} \right) \right)^{\frac{1}{2}\left(1+s\_{1}s\_{3}\right)} \left( \cosh\left( \frac{2J}{k\_{\rm B}T} \right) \right)^{\frac{1}{2}\left(1+s\_{3}s\_{5}\right)} \\ &\qquad\times \cdots \times \left( \cosh\left( \frac{2J}{k\_{\rm B}T} \right) \right)^{\frac{1}{2}\left(1+s\_{|V|-3}s\_{|V|-1}\right)} \left( \cosh\left( \frac{2J}{k\_{\rm B}T} \right) \right)^{\frac{1}{2}\left(1+s\_{|V|-1}s\_{1}\right)} \\ &= 2^{\frac{|V|}{2}} \left( \prod\_{i=0}^{\frac{|V|}{2}-2} \exp\left( \left( 1 + s\_{2i+1} s\_{2i+3} \right) \times \frac{1}{2} \ln\left( \cosh\left( \frac{2J}{k\_{\rm B}T} \right) \right) \right) \right) \\ &\qquad\times \exp\left( \left( 1 + s\_{|V|-1} s\_{1} \right) \times \frac{1}{2} \ln\left( \cosh\left( \frac{2J}{k\_{\rm B}T} \right) \right) \right). \end{split} \tag{10.177}$$

For the $\frac{|V|}{2}$-dimensional state vector $\left(s\_1, s\_3, s\_5, \cdots, s\_{|V|-3}, s\_{|V|-1}\right)$, the marginal probability distribution $P\_{\{1,3,5,\cdots,|V|-3,|V|-1\}}\left(s\_1, s\_3, s\_5, \cdots, s\_{|V|-3}, s\_{|V|-1}\right)$ is expressed as

$$\begin{split} &P\_{\{1,3,5,\cdots,|V|-3,|V|-1\}}\left(s\_1, s\_3, s\_5, \cdots, s\_{|V|-3}, s\_{|V|-1}\right) \\ &\equiv \sum\_{s\_2\in\Omega}\sum\_{s\_4\in\Omega}\sum\_{s\_6\in\Omega}\cdots\sum\_{s\_{|V|-2}\in\Omega}\sum\_{s\_{|V|}\in\Omega}P\left(s\_1, s\_2, s\_3, s\_4, s\_5, s\_6, \cdots, s\_{|V|-3}, s\_{|V|-2}, s\_{|V|-1}, s\_{|V|}\right) \\ &= \frac{\left(\prod\_{i=0}^{\frac{|V|}{2}-2}\exp\left(\alpha^{(1)}s\_{2i+1}s\_{2i+3}\right)\right)\exp\left(\alpha^{(1)}s\_{|V|-1}s\_1\right)}{\sum\_{s\_1\in\Omega}\sum\_{s\_3\in\Omega}\cdots\sum\_{s\_{|V|-1}\in\Omega}\left(\prod\_{i=0}^{\frac{|V|}{2}-2}\exp\left(\alpha^{(1)}s\_{2i+1}s\_{2i+3}\right)\right)\exp\left(\alpha^{(1)}s\_{|V|-1}s\_1\right)}, \end{split}\tag{10.178}$$

where

$$\alpha^{(1)} \equiv \frac{1}{2} \ln \left( \cosh \left( \frac{2}{k\_{\mathrm{B}}T} J \right) \right). \tag{10.179}$$

The remaining nodes, which are denoted by odd numbers, are now renumbered by replacing $i$ with $\frac{i+1}{2}$ for $i = 1, 3, 5, \cdots, |V|-3, |V|-1$, and new sets $V^{(1)}$ and $E^{(1)}$ of nodes and edges and a new state vector $\boldsymbol{s}^{(1)} = \left(s\_1^{(1)}, s\_2^{(1)}, s\_3^{(1)}, \cdots, s\_{\frac{|V|}{2}-1}^{(1)}, s\_{\frac{|V|}{2}}^{(1)}\right)^{\mathrm{T}}$ are introduced as follows:

$$V^{(1)} \equiv \left\{ 1, 2, 3, 4, \cdots, \frac{|V|}{2} - 1, \frac{|V|}{2} \right\},\tag{10.180}$$

$$E^{(1)} \equiv \left\{ \{1, 2\}, \{2, 3\}, \{3, 4\}, \dots, \{\frac{|V|}{2} - 1, \frac{|V|}{2}\}, \{\frac{|V|}{2}, 1\} \right\}, \quad (10.181)$$

$$s\_i^{(1)} = s\_{2i-1}\ (i = 1, 2, \cdots, |V|/2). \tag{10.182}$$

For the $\frac{|V|}{2}$-dimensional state vector $\boldsymbol{s}^{(1)} = \left(s\_1^{(1)}, s\_2^{(1)}, s\_3^{(1)}, \cdots, s\_{\frac{|V|}{2}-1}^{(1)}, s\_{\frac{|V|}{2}}^{(1)}\right)^{\mathrm{T}}$, we define a new renormalized probability distribution by

$$P^{(1)}(\mathbf{s}^{(1)}) \equiv \frac{\prod\_{\{i,j\} \in E^{(1)}} \exp\left(\alpha^{(1)} s\_i^{(1)} s\_j^{(1)}\right)}{\sum\_{s\_1^{(1)} \in \Omega}\sum\_{s\_2^{(1)} \in \Omega} \cdots \sum\_{s\_{|V|/2}^{(1)} \in \Omega} \prod\_{\{i,j\} \in E^{(1)}} \exp\left(\alpha^{(1)} s\_i^{(1)} s\_j^{(1)}\right)}.\tag{10.183}$$

By repeating the above renormalizing procedures,

#### 10 Review of Sublinear Modeling in Probabilistic Graphical Models … 211

$$\begin{split} &\sum\_{s\_2^{(r-1)}\in\Omega}\sum\_{s\_4^{(r-1)}\in\Omega}\sum\_{s\_6^{(r-1)}\in\Omega}\cdots\sum\_{s\_{|V^{(r-1)}|}^{(r-1)}\in\Omega}\prod\_{\{i,j\}\in E^{(r-1)}}\exp\left(\alpha^{(r-1)}s\_i^{(r-1)}s\_j^{(r-1)}\right) \\ &= 2^{\frac{|V^{(r-1)}|}{2}}\left(\prod\_{i=0}^{\frac{|V^{(r-1)}|}{2}-2}\exp\left(\left(1+s\_{2i+1}^{(r-1)}s\_{2i+3}^{(r-1)}\right)\times\frac{1}{2}\ln\left(\cosh\left(2\alpha^{(r-1)}\right)\right)\right)\right) \\ &\qquad\times\exp\left(\left(1+s\_{|V^{(r-1)}|-1}^{(r-1)}s\_1^{(r-1)}\right)\times\frac{1}{2}\ln\left(\cosh\left(2\alpha^{(r-1)}\right)\right)\right), \end{split}\tag{10.184}$$

$$\alpha^{(r)} \equiv \frac{1}{2} \ln \left( \cosh \left( 2 \alpha^{(r-1)} \right) \right), \tag{10.185}$$

$$s\_i^{(r)} = s\_{2i-1}^{(r-1)} \left( i = 1, 2, \dots, \frac{|V|}{2^r} \right),\tag{10.186}$$

the renormalized probability of the *r*-th step is generated as follows:

$$P^{(r)}(\boldsymbol{s}^{(r)}) \equiv \frac{\prod\_{\{i,j\} \in E^{(r)}} \exp\left(\alpha^{(r)} s\_i^{(r)} s\_j^{(r)}\right)}{\sum\_{s\_1^{(r)} \in \Omega}\sum\_{s\_2^{(r)} \in \Omega} \cdots \sum\_{s\_{|V|/2^r}^{(r)} \in \Omega} \prod\_{\{i,j\} \in E^{(r)}} \exp\left(\alpha^{(r)} s\_i^{(r)} s\_j^{(r)}\right)},\qquad(10.187)$$

where

$$V^{(r)} \equiv \{1, 2, \dots, \frac{|V|}{2^r}\},\tag{10.188}$$

$$E^{(r)} \equiv \left\{ \{1, 2\}, \{2, 3\}, \{3, 4\}, \dots, \{\frac{|V|}{2^r} - 1, \frac{|V|}{2^r}\}, \{\frac{|V|}{2^r}, 1\} \right\}.\tag{10.189}$$

Note that $V^{(0)} = V$, $E^{(0)} = E$, $\alpha^{(0)} = \frac{J}{k\_{\mathrm{B}}T}$, $\boldsymbol{s}^{(0)} = \boldsymbol{s}$, and $P^{(0)}\left(\boldsymbol{s}^{(0)}\right) = P(\boldsymbol{s})$.
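As a sanity check, the decimation identity (10.177) and the update rule (10.185) can be verified numerically on a small ring by brute-force summation. The following is an illustrative sketch (not part of the original derivation), with the ring size and coupling chosen arbitrarily:

```python
import numpy as np
from itertools import product

def ring_partition(N, alpha):
    # brute-force partition function of the Ising ring: sum over all 2^N states
    Z = 0.0
    for s in product([-1, 1], repeat=N):
        E = sum(s[i] * s[(i + 1) % N] for i in range(N))
        Z += np.exp(alpha * E)
    return Z

N, alpha = 8, 0.7                                  # alpha = J / (k_B T)
alpha1 = 0.5 * np.log(np.cosh(2 * alpha))          # Eq. (10.185) with r = 1
# Eq. (10.177): summing out the even spins leaves a ring of N/2 spins with
# coupling alpha1 and the prefactor 2^{N/2} (cosh(2*alpha))^{N/4}
lhs = ring_partition(N, alpha)
rhs = 2 ** (N / 2) * np.cosh(2 * alpha) ** (N / 4) * ring_partition(N // 2, alpha1)
assert abs(lhs - rhs) < 1e-8 * lhs
# inverse transformation, Eq. (10.190)
assert abs(alpha - 0.5 * np.arccosh(np.exp(2 * alpha1))) < 1e-12
```

The two assertions hold to machine precision because the decimation step is exact on the ring.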

Equation (10.185) corresponds to the update rule from α(*r*−1) to α(*r*) . By solving Eq. (10.185) with respect to α(*r*−1) , we can derive the inverse transformation rule of the real-space renormalization group procedure as follows:

$$\alpha^{(r-1)} = \frac{1}{2} \text{arccosh}(\exp(2\alpha^{(r)})).\tag{10.190}$$

If the hyperparameter $\alpha^{(r)}$ in the $r$-th renormalized probability distribution $P^{(r)}\left(\boldsymbol{s}^{(r)}\right)$ has been estimated from given data vectors by means of the EM algorithm for renormalized probabilistic graphical models on ring graphs $\left(V^{(r)}, E^{(r)}\right)$, we can estimate the hyperparameter $\alpha^{(0)} = \frac{J}{k\_{\mathrm{B}}T}$ of the probabilistic graphical models (10.49) on ring graphs $(V, E)$ by using the inverse transformation rule of the real-space renormalization group procedure (10.190).
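A minimal sketch of this estimation step: iterating Eq. (10.185) forward and Eq. (10.190) backward are exact inverses of each other, so a hyperparameter estimated on the $r$-times-renormalized model can be mapped back to the original model. The initial value below is illustrative:

```python
import numpy as np

def forward_rg(alpha0, r):
    a = alpha0
    for _ in range(r):
        a = 0.5 * np.log(np.cosh(2 * a))     # Eq. (10.185)
    return a

def inverse_rg(alpha_r, r):
    a = alpha_r
    for _ in range(r):
        a = 0.5 * np.arccosh(np.exp(2 * a))  # Eq. (10.190)
    return a

alpha0 = 1.2
alpha5 = forward_rg(alpha0, 5)   # hyperparameter of the 5-times-renormalized model
assert abs(inverse_rg(alpha5, 5) - alpha0) < 1e-8
```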

Now, we extend the real-space renormalization group scheme for the probabilistic graphical model on the ring graph to the square grid graph as a pair approximation in the real-space renormalization group framework as follows:

$$\exp(\alpha^{(r)}s\_1s\_3) \propto \sum\_{s\_2 \in \Omega}\sum\_{s\_4 \in \Omega} \exp(\alpha^{(r-1)}(s\_1s\_2 + s\_2s\_3 + s\_1s\_4 + s\_4s\_3)).\tag{10.191}$$

Equation (10.191) can be reduced to

$$\alpha^{(r)} = \ln(\cosh(2\alpha^{(r-1)})).\tag{10.192}$$
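The proportionality in Eq. (10.191) can be checked directly: tracing out the two intermediate spins must give the same constant of proportionality for both relative alignments of $s\_1$ and $s\_3$, which is exactly what the update rule (10.192) guarantees. A small numerical sketch (the coupling value is arbitrary):

```python
import numpy as np

def pair_sum(alpha, s1, s3):
    # RHS of Eq. (10.191): trace out the two intermediate spins s2 and s4
    return sum(np.exp(alpha * (s1 * s2 + s2 * s3 + s1 * s4 + s4 * s3))
               for s2 in (-1, 1) for s4 in (-1, 1))

alpha_prev = 0.4
alpha_new = np.log(np.cosh(2 * alpha_prev))        # Eq. (10.192)
# the proportionality constant must not depend on the alignment of s1 and s3
c_plus = pair_sum(alpha_prev, +1, +1) / np.exp(alpha_new * (+1 * +1))
c_minus = pair_sum(alpha_prev, +1, -1) / np.exp(alpha_new * (+1 * -1))
assert abs(c_plus - c_minus) < 1e-12
```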

The *r*-th renormalized probability distribution for Eq. (10.49) is expressed as

$$P^{(r)}\left(\mathbf{s}^{(r)}\right) \propto \prod\_{\{i,j\} \in E^{(r)}} \exp\left(\alpha^{(r)}\mathbf{s}\_i^{(r)}\mathbf{s}\_j^{(r)}\right). \tag{10.193}$$

The inversion formula for Eq. (10.192) can be derived as

$$\alpha^{(r-1)} = \frac{1}{2} \text{arccosh}(\exp(\alpha^{(r)})).\tag{10.194}$$

The above framework can be extended to the $|\Omega|$-state Potts model, as shown in Fig. 10.15. The inverse renormalization group transformation can also be applied to the probabilistic segmentations in Eqs. (10.41), (10.42), and (10.43) with Eqs. (10.44), (10.45), and (10.46) in Sect. 10.2.3 [81]. One of the numerical experimental results of the inverse renormalization group transformation in probabilistic segmentations is shown in Fig. 10.16.

**Fig. 10.15** Fundamental framework of sublinear computational modeling by the inverse renormalization group transformation in probabilistic segmentations

**Fig. 10.16** Numerical experimental results of sublinear computational modeling in the inverse renormalization group transformation in probabilistic segmentations

#### **10.4 Quantum Statistical Machine Learning**

This section explores the fundamental frameworks of quantum probabilistic graphical models based on energy matrices and density matrices. Note that every energy matrix must be Hermitian, and its density matrix is defined through all the eigenvalues and all the eigenvectors of the energy matrix. If all the off-diagonal elements of the density matrix are zero, the diagonal elements correspond to the probability distribution of a probabilistic graphical model. First, we explain general frameworks of density matrices and their differentiations and define the minimization of free energies of density matrices. Second, we give the definitions of tensor products of matrices as well as vectors. By using Pauli spin matrices together with tensor products, we introduce quantum probabilistic graphical models. Finally, we extend the conventional EM algorithm to a quantum expectation-maximization (QEM) algorithm.

## *10.4.1 Elementary Function and Differentiations of Hermitian Matrices*

Before proceeding with the quantum statistical mechanical extension of statistical machine learning, we need to explore some essential formulas for Hermitian matrices and their derivatives. Some fundamental properties of matrices for statistical inference have appeared in Ref. [82]. In the present section, we give some useful formulas for treating the entropy in quantum probabilistic graphical models.

We consider the *M*×*M* Hermitian matrix *A*

$$A = \begin{pmatrix} A\_{11} & A\_{12} & \cdots & A\_{1M} \\ A\_{21} & A\_{22} & \cdots & A\_{2M} \\ \vdots & \vdots & \ddots & \vdots \\ A\_{M1} & A\_{M2} & \cdots & A\_{MM} \end{pmatrix},\tag{10.195}$$

which satisfies $A = \overline{A}^{\mathrm{T}}$. Here we remark that $A^{\mathrm{T}}$ and $\overline{A}$ are the transpose and the conjugate matrix of $A$, respectively. We introduce vertical and horizontal basis vectors in the $M$-dimensional space as follows:

$$|1\rangle = \begin{pmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \\ 0 \end{pmatrix},\ |2\rangle = \begin{pmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \\ 0 \end{pmatrix},\ |3\rangle = \begin{pmatrix} 0 \\ 0 \\ 1 \\ \vdots \\ 0 \\ 0 \end{pmatrix},\ \cdots,\ |M-1\rangle = \begin{pmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 1 \\ 0 \end{pmatrix},\ |M\rangle = \begin{pmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix}, \tag{10.196}$$

and

$$\begin{cases} \quad \langle 1 \vert = (1, 0, 0, 0, \cdots, 0, 0, 0), \\ \quad \langle 2 \vert = (0, 1, 0, 0, \cdots, 0, 0, 0), \\ \quad \langle 3 \vert = (0, 0, 1, 0, \cdots, 0, 0, 0), \\ \quad \vdots \\ \quad \langle M - 1 \vert = (0, 0, 0, 0, \cdots, 0, 1, 0), \\ \quad \langle M \vert = (0, 0, 0, 0, \cdots, 0, 0, 1). \end{cases} \tag{10.197}$$

We can confirm that

$$\langle i|A|j\rangle = A\_{ij} \ (i \in \{1, 2, \cdots, M\}, \ j \in \{1, 2, \cdots, M\}).\tag{10.198}$$

The Hermitian matrix *A* is diagonalized as

$$A = U\Lambda U^{-1},\tag{10.199}$$

$$\Lambda \equiv \begin{pmatrix} \lambda\_1 & 0 & 0 & \cdots & 0 \\ 0 & \lambda\_2 & 0 & \cdots & 0 \\ 0 & 0 & \lambda\_3 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \lambda\_M \end{pmatrix},\tag{10.200}$$

where all the eigenvalues $\lambda\_1, \lambda\_2, \cdots, \lambda\_M$ are always real numbers. For the eigenvector $\boldsymbol{u}\_i = \left(U\_{1i}, U\_{2i}, \cdots, U\_{Mi}\right)^{\mathrm{T}}$ corresponding to the eigenvalue $\lambda\_i$, such that $A\boldsymbol{u}\_i = \lambda\_i\boldsymbol{u}\_i$ for every $i \in \{1, 2, 3, \cdots, M\}$, the matrix $U$ is defined by

$$U \equiv (\mathfrak{u}\_1, \mathfrak{u}\_2, \mathfrak{u}\_3, \dots, \mathfrak{u}\_M) = \begin{pmatrix} U\_{11} & U\_{12} & U\_{13} & \cdots & U\_{1M} \\ U\_{21} & U\_{22} & U\_{23} & \cdots & U\_{2M} \\ U\_{31} & U\_{32} & U\_{33} & \cdots & U\_{3M} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ U\_{M1} & U\_{M2} & U\_{M3} & \cdots & U\_{MM} \end{pmatrix} . \tag{10.201}$$

It is known that $U$ is a unitary matrix that satisfies $U^{-1} = \overline{U}^{\mathrm{T}}$ for any Hermitian matrix $A$. If $\lambda\_1$ is the maximum eigenvalue, its corresponding eigenvector $\boldsymbol{u}\_1$ is expressed using the following notation:

$$
\boldsymbol{u}\_1 = \operatorname\*{argmax} A.\tag{10.202}
$$

Note that argmax*A* is the eigenvector that corresponds to the maximum eigenvalue of *A*.

For any Hermitian matrix *A*, the exponential function is defined by

$$\begin{aligned} \exp(A) & \equiv \sum\_{n=0}^{+\infty} \frac{1}{n!} A^n \\ &= U \begin{pmatrix} \exp(\lambda\_1) & 0 & 0 & \cdots & 0 \\ 0 & \exp(\lambda\_2) & 0 & \cdots & 0 \\ 0 & 0 & \exp(\lambda\_3) & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \exp(\lambda\_M) \end{pmatrix} U^{-1}, \quad (10.203) \end{aligned}$$

and ln(*A*) is defined by the inverse function of exp(*A*) such that

$$\exp(\ln(A)) = A.\tag{10.204}$$

In the present definition, we have

$$\exp(A \otimes I) = (\exp(A)) \otimes I,\\ \exp(I \otimes A) = I \otimes (\exp(A)), \qquad (10.205)$$

where *I* is an identity matrix.

For |1 − λ<sub>1</sub>| < 1, |1 − λ<sub>2</sub>| < 1, ···, |1 − λ*<sub>M</sub>* | < 1, ln(*A*) is defined by

$$\begin{aligned} \ln(A) = \ln(I - (I - A)) &\equiv -\sum\_{n=1}^{+\infty} \frac{1}{n} (I - A)^n \\ &= U \begin{pmatrix} \ln(\lambda\_1) & 0 & 0 & \cdots & 0 \\ 0 & \ln(\lambda\_2) & 0 & \cdots & 0 \\ 0 & 0 & \ln(\lambda\_3) & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \ln(\lambda\_M) \end{pmatrix} U^{-1} . \end{aligned} \tag{10.206}$$

By using Eqs. (10.203) and (10.206), we can confirm that

$$\begin{aligned} \exp(\ln(A)) &= \sum\_{n=0}^{+\infty} \frac{1}{n!} (\ln(A))^n \\ &= \sum\_{n=0}^{+\infty} \frac{1}{n!} U \begin{pmatrix} (\ln(\lambda\_1))^n & 0 & 0 & \cdots & 0 \\ 0 & (\ln(\lambda\_2))^n & 0 & \cdots & 0 \\ 0 & 0 & (\ln(\lambda\_3))^n & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & (\ln(\lambda\_M))^n \end{pmatrix} U^{-1} \end{aligned}$$

$$= U \begin{pmatrix} \exp(\ln(\lambda\_1)) & 0 & 0 & \cdots & 0 \\ 0 & \exp(\ln(\lambda\_2)) & 0 & \cdots & 0 \\ 0 & 0 & \exp(\ln(\lambda\_3)) & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \exp(\ln(\lambda\_M)) \end{pmatrix} U^{-1}$$
 
$$= U \begin{pmatrix} \lambda\_1 & 0 & 0 & \cdots & 0 \\ 0 & \lambda\_2 & 0 & \cdots & 0 \\ 0 & 0 & \lambda\_3 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \lambda\_M \end{pmatrix} U^{-1} = A. \tag{10.207}$$
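The relations (10.203), (10.206), and (10.207) can be checked numerically. The following sketch (assuming a randomly generated Hermitian matrix with eigenvalues kept close to 1 so that the series (10.206) converges) compares the series definition of $\ln(A)$ against the eigendecomposition and verifies $\exp(\ln(A)) = A$:

```python
import numpy as np

rng = np.random.default_rng(0)
M = 4
B = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
H = B + B.conj().T
A = np.eye(M) + 0.5 * H / np.linalg.norm(H, 2)   # Hermitian, eigenvalues in [0.5, 1.5]

# ln(A) via the eigendecomposition, Eqs. (10.199) and (10.206)
lam, U = np.linalg.eigh(A)
logA = U @ np.diag(np.log(lam)) @ U.conj().T

# ln(A) via the series, valid here since |1 - lambda_i| <= 1/2
S = np.zeros_like(A)
P = np.eye(M)
for n in range(1, 200):
    P = P @ (np.eye(M) - A)
    S -= P / n
assert np.allclose(S, logA)

# exp(ln(A)) = A, Eqs. (10.204) and (10.207)
mu, W = np.linalg.eigh(logA)
expLogA = W @ np.diag(np.exp(mu)) @ W.conj().T
assert np.allclose(expLogA, A)
```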

Moreover, we have

$$\begin{aligned} &\sum\_{n=0}^{N} (I - A)^{n} - (I - A) \sum\_{n=0}^{N} (I - A)^{n} \\ &= I + (I - A) + (I - A)^{2} + \cdots + (I - A)^{N} \\ &\quad - (I - A) - (I - A)^{2} - \cdots - (I - A)^{N} - (I - A)^{N + 1} \\ &= I - (I - A)^{N + 1}, \end{aligned} \tag{10.208}$$

such that

$$\sum\_{n=0}^{N} (I - A)^n = (I - (I - A))^{-1} \left( I - (I - A)^{N+1} \right). \tag{10.209}$$

We have

$$(I - A)^{N+1} = U \begin{pmatrix} (1 - \lambda\_1)^{N+1} & 0 & 0 & \cdots & 0 \\ 0 & (1 - \lambda\_2)^{N+1} & 0 & \cdots & 0 \\ 0 & 0 & (1 - \lambda\_3)^{N+1} & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & (1 - \lambda\_M)^{N+1} \end{pmatrix} U^{-1} \to 0 \ (N \to +\infty),\tag{10.210}$$

so it is valid that

$$A^{-1} = (I - (I - A))^{-1} = \sum\_{n=0}^{+\infty} (I - A)^n. \tag{10.211}$$

Note that $\exp(A)$ and $\ln(A)$ as well as $A^{-1}$ are also Hermitian matrices in the present case. (This can be shown by using $\overline{A^n}^{\mathrm{T}} = \left(\overline{A}^{\mathrm{T}}\right)^{n}$.)

We now introduce a Hermitian matrix function *G*(*x*) for any real number *x* as follows:

$$\mathbf{G}(\mathbf{x}) \equiv \begin{pmatrix} G\_{11}(\mathbf{x}) & G\_{12}(\mathbf{x}) & G\_{13}(\mathbf{x}) & \cdots & G\_{1M}(\mathbf{x}) \\ G\_{21}(\mathbf{x}) & G\_{22}(\mathbf{x}) & G\_{23}(\mathbf{x}) & \cdots & G\_{2M}(\mathbf{x}) \\ G\_{31}(\mathbf{x}) & G\_{32}(\mathbf{x}) & G\_{33}(\mathbf{x}) & \cdots & G\_{3M}(\mathbf{x}) \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ G\_{M1}(\mathbf{x}) & G\_{M2}(\mathbf{x}) & G\_{M3}(\mathbf{x}) & \cdots & G\_{MM}(\mathbf{x}) \end{pmatrix} . \tag{10.212}$$

We have

$$G\_{ij}(\mathbf{x}) = \overline{G}\_{ji}(\mathbf{x}) \ (i \in \{1, 2, \dots, M\}, \ j \in \{1, 2, \dots, M\}),\tag{10.213}$$

such that

$$\langle i | \mathbf{G}(\mathbf{x}) | j \rangle = \langle j | \overline{\mathbf{G}}(\mathbf{x}) | i \rangle \ (i \in \{1, 2, \dots, M\}, \ j \in \{1, 2, \dots, M\}). \quad (10.214)$$

It is obvious that the derivative of the matrix *G*(*x*) with respect to *x*, namely,

$$\frac{d}{dx}G(\mathbf{x}) \equiv \begin{pmatrix} \frac{d}{dx}G\_{11}(\mathbf{x}) & \frac{d}{dx}G\_{12}(\mathbf{x}) & \frac{d}{dx}G\_{13}(\mathbf{x}) & \cdots & \frac{d}{dx}G\_{1M}(\mathbf{x})\\ \frac{d}{dx}G\_{21}(\mathbf{x}) & \frac{d}{dx}G\_{22}(\mathbf{x}) & \frac{d}{dx}G\_{23}(\mathbf{x}) & \cdots & \frac{d}{dx}G\_{2M}(\mathbf{x})\\ \frac{d}{dx}G\_{31}(\mathbf{x}) & \frac{d}{dx}G\_{32}(\mathbf{x}) & \frac{d}{dx}G\_{33}(\mathbf{x}) & \cdots & \frac{d}{dx}G\_{3M}(\mathbf{x})\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ \frac{d}{dx}G\_{M1}(\mathbf{x}) & \frac{d}{dx}G\_{M2}(\mathbf{x}) & \frac{d}{dx}G\_{M3}(\mathbf{x}) & \cdots & \frac{d}{dx}G\_{MM}(\mathbf{x}) \end{pmatrix} \\ (10.215)$$

is also a Hermitian matrix such that

$$\langle i \vert \frac{d}{dx} \mathbf{G}(x) \vert j \rangle = \langle j \vert \frac{d}{dx} \overline{\mathbf{G}}(x) \vert i \rangle \ (i \in \{1, 2, \cdots, M\},\ j \in \{1, 2, \cdots, M\}). \tag{10.216}$$

We have the following equalities:

$$\frac{d}{d\boldsymbol{\chi}}(\text{Tr}[\mathbf{G}(\boldsymbol{\chi})]) = \text{Tr}\left[\frac{d}{d\boldsymbol{\chi}}\mathbf{G}(\boldsymbol{\chi})\right],\tag{10.217}$$

and

$$\operatorname{Tr}\left(\left(\frac{d}{dx}\mathbf{G}(\mathbf{x})\right)\mathbf{G}(\mathbf{x})\right) = \operatorname{Tr}\left(\mathbf{G}(\mathbf{x})\left(\frac{d}{dx}\mathbf{G}(\mathbf{x})\right)\right). \tag{10.218}$$

Equation (10.218) can be confirmed as follows:

$$\begin{split} \text{Tr}\Big(\Big(\frac{d}{dx}\mathbf{G}(x)\Big)\mathbf{G}(x)\Big) &= \sum\_{i=1}^{M} \langle i | \Big(\frac{d}{dx}\mathbf{G}(x)\Big) \mathbf{G}(x) | i \rangle = \sum\_{i=1}^{M} \sum\_{j=1}^{M} \langle i | \Big(\frac{d}{dx}\mathbf{G}(x)\Big) | j \rangle \langle j | \mathbf{G}(x) | i \rangle \\ &= \sum\_{j=1}^{M} \sum\_{i=1}^{M} \langle j | \mathbf{G}(x) | i \rangle \langle i | \Big(\frac{d}{dx}\mathbf{G}(x)\Big) | j \rangle = \text{Tr}\Big(\mathbf{G}(x) \Big(\frac{d}{dx}\mathbf{G}(x)\Big) \Big). \end{split} \tag{10.219}$$

By using Eqs. (10.217) and (10.219), we derive the following fundamental formula

$$\begin{split} \frac{d}{dx}\mathrm{Tr}\left[\left(G(x)\right)^{n}\right] &= \mathrm{Tr}\left[\left(\frac{d}{dx}G(x)\right)\left(G(x)\right)^{n-1}\right] + \mathrm{Tr}\left[G(x)\left(\frac{d}{dx}\left(G(x)\right)^{n-1}\right)\right] \\ &= \mathrm{Tr}\left[\left(\frac{d}{dx}G(x)\right)\left(G(x)\right)^{n-1}\right] + \mathrm{Tr}\left[G(x)\left(\frac{d}{dx}G(x)\right)\left(G(x)\right)^{n-2}\right] \\ &\qquad + \mathrm{Tr}\left[\left(G(x)\right)^{2}\left(\frac{d}{dx}\left(G(x)\right)^{n-2}\right)\right] \\ &= \mathrm{Tr}\left[\left(\frac{d}{dx}G(x)\right)2\left(G(x)\right)^{n-1}\right] + \mathrm{Tr}\left[\left(G(x)\right)^{2}\left(\frac{d}{dx}\left(G(x)\right)^{n-2}\right)\right] \\ &= \cdots \\ &= \mathrm{Tr}\left[\left(\frac{d}{dx}G(x)\right)(n-1)\left(G(x)\right)^{n-1}\right] + \mathrm{Tr}\left[\left(G(x)\right)^{n-1}\left(\frac{d}{dx}G(x)\right)\right] \\ &= \mathrm{Tr}\left[\left(\frac{d}{dx}G(x)\right)n\left(G(x)\right)^{n-1}\right]. \end{split}\tag{10.220}$$

From Eqs. (10.217) and (10.220), we can confirm the following equality:

$$\begin{split} \frac{d}{dx} \text{Tr}(\ln(G(\boldsymbol{x}))) &= \frac{d}{dx} \text{Tr}(\ln(I - (I - G(\boldsymbol{x})))) \\ &= \frac{d}{dx} \text{Tr}\left(-\sum\_{n=1}^{+\infty} \frac{1}{n} (I - G(\boldsymbol{x}))^n\right) \\ &= -\sum\_{n=1}^{+\infty} \frac{1}{n} \frac{d}{dx} \text{Tr}\left((I - G(\boldsymbol{x}))^n\right) \\ &= -\sum\_{n=1}^{+\infty} \frac{1}{n} \text{Tr}\left(\left(\frac{d}{dx}(I - G(\boldsymbol{x}))\right) n (I - G(\boldsymbol{x}))^{n-1}\right) \\ &= \text{Tr}\left(\left(-\frac{d}{dx}(I - G(\boldsymbol{x}))\right) \left(\sum\_{n=1}^{+\infty} (I - G(\boldsymbol{x}))^{n-1}\right)\right) \\ &= \text{Tr}\left(\left(\frac{d}{dx}G(\boldsymbol{x})\right) \left(\sum\_{n=1}^{+\infty} (I - G(\boldsymbol{x}))^{n-1}\right)\right) \\ &= \text{Tr}\left(\left(\frac{d}{dx}G(\boldsymbol{x})\right) (I - (I - G(\boldsymbol{x})))^{-1}\right). \end{split} \tag{10.221}$$

By using Eqs. (10.221), we can confirm the following equality:

$$\frac{d}{dA}\text{Tr}[A\ln(A)] \equiv \begin{pmatrix} \frac{d}{dA\_{11}}\text{Tr}[A\ln(A)] & \frac{d}{dA\_{12}}\text{Tr}[A\ln(A)] & \cdots & \frac{d}{dA\_{1M}}\text{Tr}[A\ln(A)] \\ \frac{d}{dA\_{21}}\text{Tr}[A\ln(A)] & \frac{d}{dA\_{22}}\text{Tr}[A\ln(A)] & \cdots & \frac{d}{dA\_{2M}}\text{Tr}[A\ln(A)] \\ \vdots & \vdots & \ddots & \vdots \\ \frac{d}{dA\_{M1}}\text{Tr}[A\ln(A)] & \frac{d}{dA\_{M2}}\text{Tr}[A\ln(A)] & \cdots & \frac{d}{dA\_{MM}}\text{Tr}[A\ln(A)] \end{pmatrix} = \ln(A) + I. \tag{10.222}$$
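Equation (10.222) can be checked component-wise by finite differences. The sketch below is illustrative: it assumes a real symmetric $A$ with eigenvalues close to 1 (so that the series (10.206) converges), and `logm_series` is a helper written for this check, not a library routine:

```python
import numpy as np

def logm_series(A, terms=300):
    # ln(A) = -sum_{n>=1} (I - A)^n / n, Eq. (10.206); needs |1 - lambda_i| < 1
    M = A.shape[0]
    S = np.zeros_like(A)
    P = np.eye(M)
    for n in range(1, terms + 1):
        P = P @ (np.eye(M) - A)
        S -= P / n
    return S

def f(A):
    return np.trace(A @ logm_series(A))          # Tr[A ln(A)]

rng = np.random.default_rng(4)
M = 3
B = rng.standard_normal((M, M)); H = B + B.T
A = np.eye(M) + 0.4 * H / np.linalg.norm(H, 2)   # eigenvalues within (0.6, 1.4)

grad = logm_series(A) + np.eye(M)                # Eq. (10.222): ln(A) + I
eps = 1e-6
for i in range(M):
    for j in range(M):
        E = np.zeros((M, M)); E[i, j] = eps
        fd = (f(A + E) - f(A)) / eps             # numerical d/dA_ij Tr[A ln(A)]
        assert abs(fd - grad[i, j]) < 1e-4
```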

## *10.4.2 Minimization of Free Energy Functionals for Density Matrices*

For any $M \times M$ Hermitian matrix $H$ that satisfies $H = \overline{H}^{\mathrm{T}}$, the free energy functional for an $M \times M$ trial density matrix

$$\mathbf{R} = \begin{pmatrix} \ R\_{11} & \ R\_{12} & \cdots & \ R\_{1M} \\ \ R\_{21} & \ R\_{22} & \cdots & \ R\_{2M} \\ \vdots & \vdots & \ddots & \vdots \\ \ R\_{M1} & \ R\_{M2} & \cdots & \ R\_{MM} \end{pmatrix} \tag{10.223}$$

is defined by

$$\mathcal{F}[\mathcal{R}] = \text{Tr}[\mathcal{R}(H + k\_{\mathcal{B}}T\ln(\mathcal{R}))].\tag{10.224}$$

The density matrix *P* is determined by the following conditional minimization under the normalization condition:

$$\mathcal{P} = \operatorname\*{argmin}\_{\mathcal{R}} \{ \mathcal{F}[\mathcal{R}] \, \Big| \, \text{Tr}[\mathcal{R}] = 1 \},\tag{10.225}$$

and this reduces to


$$P = \frac{1}{Z} \exp\left(-\frac{1}{k\_{\rm B}T}H\right),\tag{10.226}$$

$$Z \equiv \text{Tr}\left[\exp\left(-\frac{1}{k\_{\text{B}}T}H\right)\right].\tag{10.227}$$

First, we introduce the Lagrange multiplier λ to ensure the normalization condition as follows:

$$\mathcal{L}[\mathbf{R}] \equiv \mathcal{F}[\mathbf{R}] - \lambda \left( \text{Tr}[\mathbf{R}] - 1 \right). \tag{10.228}$$

The matrix *R* is determined so as to satisfy the following extremum condition:

$$\frac{\partial}{\partial R\_{mm'}} \mathcal{L}[R] = 0 \ (m = 1, 2, \cdots, M, \ m' = 1, 2, \cdots, M). \qquad (10.229)$$

Finally, by determining λ so as to satisfy the normalization condition $\mathrm{Tr}[R] = 1$, Eqs. (10.226) and (10.227) can be derived.

Because the energy matrix $H$ is a Hermitian matrix, all the eigenvalues $h^{(m)}$ are always real numbers, and all the eigenvectors $\left(\psi^{(m)}(1), \psi^{(m)}(2), \cdots, \psi^{(m)}(M)\right)^{\mathrm{T}}$ can be chosen as real vectors; they are defined by

$$H\begin{pmatrix}\psi^{(m)}(1)\\\psi^{(m)}(2)\\\vdots\\\psi^{(m)}(M)\end{pmatrix} = h^{(m)}\begin{pmatrix}\psi^{(m)}(1)\\\psi^{(m)}(2)\\\vdots\\\psi^{(m)}(M)\end{pmatrix} (m = 1, 2, \cdots, M),\qquad(10.230)$$

where

222 K. Tanaka

$$\left(\psi^{(m)}(1), \psi^{(m)}(2), \dots, \psi^{(m)}(M)\right)\begin{pmatrix} \psi^{(m)}(1) \\ \psi^{(m)}(2) \\ \vdots \\ \psi^{(m)}(M) \end{pmatrix} = 1 \ (m = 1, 2, \dots, M). \tag{10.231}$$

By using these eigenvalues and eigenvectors of *H*, the density matrix can be expressed as

$$
\widehat{\mathbf{R}} = \begin{pmatrix}
\psi^{(1)}(1) & \psi^{(2)}(1) & \cdots & \psi^{(M)}(1) \\
\psi^{(1)}(2) & \psi^{(2)}(2) & \cdots & \psi^{(M)}(2) \\
\vdots & \vdots & \ddots & \vdots \\
\psi^{(1)}(M) & \psi^{(2)}(M) & \cdots & \psi^{(M)}(M)
\end{pmatrix} \begin{pmatrix}
p^{(1)} & 0 & \cdots & 0 \\
0 & p^{(2)} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & p^{(M)}
\end{pmatrix} \begin{pmatrix}
\psi^{(1)}(1) & \psi^{(2)}(1) & \cdots & \psi^{(M)}(1) \\
\psi^{(1)}(2) & \psi^{(2)}(2) & \cdots & \psi^{(M)}(2) \\
\vdots & \vdots & \ddots & \vdots \\
\psi^{(1)}(M) & \psi^{(2)}(M) & \cdots & \psi^{(M)}(M)
\end{pmatrix}^{\mathsf{T}}, \tag{10.232}
$$

where

$$p^{(m)} = \frac{\exp\left(-\frac{1}{k\_{\mathrm{B}}T} h^{(m)}\right)}{\text{Tr}\left[\exp\left(-\frac{1}{k\_{\mathrm{B}}T} H\right)\right]} \ (m = 1, 2, \cdots, M). \tag{10.233}$$

This means that the probability of each state $\left(\psi^{(m)}(1), \psi^{(m)}(2), \cdots, \psi^{(m)}(M)\right)^{\mathrm{T}}$ is $p^{(m)}$ for $m = 1, 2, \cdots, M$.
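The variational characterization (10.225) and the explicit solution (10.226), (10.232), and (10.233) can be checked numerically: the Gibbs density matrix never has a larger free energy than any other trial density matrix. The sketch below assumes a randomly generated real symmetric energy matrix and random trial spectra; it is an illustration of the variational principle, not part of the original text:

```python
import numpy as np

rng = np.random.default_rng(3)
M, kBT = 4, 1.0
B = rng.standard_normal((M, M))
H = B + B.T                                  # real symmetric (Hermitian) energy matrix

h, Psi = np.linalg.eigh(H)                   # eigenvalues and eigenvectors, Eq. (10.230)
p = np.exp(-h / kBT); p /= p.sum()           # Boltzmann weights, Eq. (10.233)
P = Psi @ np.diag(p) @ Psi.T                 # density matrix, Eq. (10.232)

def free_energy(R):
    lam, U = np.linalg.eigh(R)
    lnR = U @ np.diag(np.log(lam)) @ U.T
    return np.trace(R @ (H + kBT * lnR))     # Eq. (10.224)

F_min = free_energy(P)
for _ in range(20):
    V, _ = np.linalg.qr(rng.standard_normal((M, M)))
    q = rng.random(M); q /= q.sum()          # random spectrum of a trial density matrix
    R = V @ np.diag(q) @ V.T
    assert free_energy(R) >= F_min - 1e-10   # Gibbs state minimizes the free energy
```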

#### *10.4.3 Tensor Products*

This section explores **tensor products (Kronecker products)** [82]. Tensor products provide some of the fundamental mathematical concepts needed for achieving quantum statistical mechanical extensions of probabilistic graphical models.

We introduce tensor products for matrices and vectors by the following definitions:

$$
\begin{pmatrix} A\_{11} & A\_{12} \\ A\_{21} & A\_{22} \end{pmatrix} \otimes \begin{pmatrix} B\_{11} & B\_{12} \\ B\_{21} & B\_{22} \end{pmatrix} = \begin{pmatrix} A\_{11} \begin{pmatrix} B\_{11} & B\_{12} \\ B\_{21} & B\_{22} \end{pmatrix} & A\_{12} \begin{pmatrix} B\_{11} & B\_{12} \\ B\_{21} & B\_{22} \end{pmatrix} \\ A\_{21} \begin{pmatrix} B\_{11} & B\_{12} \\ B\_{21} & B\_{22} \end{pmatrix} & A\_{22} \begin{pmatrix} B\_{11} & B\_{12} \\ B\_{21} & B\_{22} \end{pmatrix} \end{pmatrix},
$$

$$
= \begin{pmatrix} A\_{11}B\_{11} & A\_{11}B\_{12} & A\_{12}B\_{11} & A\_{12}B\_{12} \\ A\_{11}B\_{21} & A\_{11}B\_{22} & A\_{12}B\_{21} & A\_{12}B\_{22} \\ A\_{21}B\_{11} & A\_{21}B\_{12} & A\_{22}B\_{11} & A\_{22}B\_{12} \\ A\_{21}B\_{21} & A\_{21}B\_{22} & A\_{22}B\_{21} & A\_{22}B\_{22} \end{pmatrix},\tag{10.234}
$$


$$
\begin{pmatrix} A\_1 \\ A\_2 \end{pmatrix} \otimes \begin{pmatrix} B\_1 \\ B\_2 \end{pmatrix} = \begin{pmatrix} A\_1 \begin{pmatrix} B\_1 \\ B\_2 \end{pmatrix} \\ A\_2 \begin{pmatrix} B\_1 \\ B\_2 \end{pmatrix} \end{pmatrix} = \begin{pmatrix} A\_1 B\_1 \\ A\_1 B\_2 \\ A\_2 B\_1 \\ A\_2 B\_2 \end{pmatrix}. \tag{10.235}
$$

We remark that

$$
\begin{split}
&\left(\begin{pmatrix}A\_{11} & A\_{12} \\ A\_{21} & A\_{22} \end{pmatrix} \otimes \begin{pmatrix}B\_{11} & B\_{12} \\ B\_{21} & B\_{22} \end{pmatrix}\right) \left(\begin{pmatrix}C\_{11} & C\_{12} \\ C\_{21} & C\_{22} \end{pmatrix} \otimes \begin{pmatrix}D\_{11} & D\_{12} \\ D\_{21} & D\_{22} \end{pmatrix}\right) \\
&= \left(\begin{pmatrix}A\_{11} & A\_{12} \\ A\_{21} & A\_{22} \end{pmatrix} \begin{pmatrix}C\_{11} & C\_{12} \\ C\_{21} & C\_{22} \end{pmatrix}\right) \otimes \left(\begin{pmatrix}B\_{11} & B\_{12} \\ B\_{21} & B\_{22} \end{pmatrix} \begin{pmatrix}D\_{11} & D\_{12} \\ D\_{21} & D\_{22} \end{pmatrix}\right). \end{split} \tag{10.236}
$$

Moreover, for the following general matrices *A* and *B*,

$$\mathbf{A} = \begin{pmatrix} A\_{11} & A\_{12} & \cdots & A\_{1M} \\ A\_{21} & A\_{22} & \cdots & A\_{2M} \\ \vdots & \vdots & \ddots & \vdots \\ A\_{M1} & A\_{M2} & \cdots & A\_{MM} \end{pmatrix}, \quad \mathbf{B} = \begin{pmatrix} B\_{11} & B\_{12} & \cdots & B\_{1N} \\ B\_{21} & B\_{22} & \cdots & B\_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ B\_{N1} & B\_{N2} & \cdots & B\_{NN} \end{pmatrix}, \tag{10.237}$$

we define the tensor product *A*⊗*B* as

$$A \otimes B = \begin{pmatrix} A\_{11} & A\_{12} & \cdots & A\_{1M} \\ A\_{21} & A\_{22} & \cdots & A\_{2M} \\ \vdots & \vdots & \ddots & \vdots \\ A\_{M1} & A\_{M2} & \cdots & A\_{MM} \end{pmatrix} \otimes \begin{pmatrix} B\_{11} & B\_{12} & \cdots & B\_{1N} \\ B\_{21} & B\_{22} & \cdots & B\_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ B\_{N1} & B\_{N2} & \cdots & B\_{NN} \end{pmatrix} = \begin{pmatrix} A\_{11}B & A\_{12}B & \cdots & A\_{1M}B \\ A\_{21}B & A\_{22}B & \cdots & A\_{2M}B \\ \vdots & \vdots & \ddots & \vdots \\ A\_{M1}B & A\_{M2}B & \cdots & A\_{MM}B \end{pmatrix}. \tag{10.238}$$

Similarly, for vectors


$$\mathbf{a} = \begin{pmatrix} a\_1 \\ a\_2 \\ \vdots \\ a\_M \end{pmatrix}, \mathbf{b} = \begin{pmatrix} b\_1 \\ b\_2 \\ \vdots \\ b\_N \end{pmatrix}, \tag{10.239}$$

the tensor product *a*⊗*b* is defined as

$$\mathbf{a} \otimes \mathbf{b} = \begin{pmatrix} a\_1 \\ a\_2 \\ \vdots \\ a\_M \end{pmatrix} \otimes \begin{pmatrix} b\_1 \\ b\_2 \\ \vdots \\ b\_N \end{pmatrix} = \begin{pmatrix} a\_1\mathbf{b} \\ a\_2\mathbf{b} \\ \vdots \\ a\_M\mathbf{b} \end{pmatrix} = \begin{pmatrix} a\_1b\_1 \\ a\_1b\_2 \\ \vdots \\ a\_1b\_N \\ a\_2b\_1 \\ a\_2b\_2 \\ \vdots \\ a\_2b\_N \\ \vdots \\ a\_Mb\_1 \\ a\_Mb\_2 \\ \vdots \\ a\_Mb\_N \end{pmatrix}, \tag{10.240}$$

$$\begin{aligned} \mathbf{a}^{\mathrm{T}} \otimes \mathbf{b}^{\mathrm{T}} &= (a\_1, a\_2, \dots, a\_M) \otimes (b\_1, b\_2, \dots, b\_N) = \left(a\_1 \mathbf{b}^{\mathrm{T}}, a\_2 \mathbf{b}^{\mathrm{T}}, \dots, a\_M \mathbf{b}^{\mathrm{T}}\right) \\ &= \left(a\_1 b\_1, a\_1 b\_2, \dots, a\_1 b\_N, a\_2 b\_1, a\_2 b\_2, \dots, a\_2 b\_N, \dots, a\_M b\_1, a\_M b\_2, \dots, a\_M b\_N\right). \end{aligned} \tag{10.241}$$

We introduce the following two-dimensional fundamental vectors:

$$|1\rangle \equiv \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \ |2\rangle \equiv \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \tag{10.242}$$

$$\langle 1| \equiv (1,0), \ \langle 2| \equiv (0,1). \tag{10.243}$$

By using the fundamental vectors in two-dimensional space, we define the vertical and horizontal fundamental vectors in four-dimensional space by using the tensor product as follows:

$$\begin{cases} |1,1\rangle \equiv |1\rangle \otimes |1\rangle = \begin{pmatrix} 1\\0\\0\\0 \end{pmatrix}, \ |1,2\rangle \equiv |1\rangle \otimes |2\rangle = \begin{pmatrix} 0\\1\\0\\0 \end{pmatrix}, \\ |2,1\rangle \equiv |2\rangle \otimes |1\rangle = \begin{pmatrix} 0\\0\\1\\0 \end{pmatrix}, \ |2,2\rangle \equiv |2\rangle \otimes |2\rangle = \begin{pmatrix} 0\\0\\0\\1 \end{pmatrix}, \end{cases} \tag{10.244}$$

$$\begin{cases} \langle 1, 1 \vert \equiv \langle 1 \vert \otimes \langle 1 \vert = (1, 0, 0, 0), \,\langle 1, 2 \vert \equiv \langle 1 \vert \otimes \langle 2 \vert = (0, 1, 0, 0), \\\\ \langle 2, 1 \vert \equiv \langle 2 \vert \otimes \langle 1 \vert = (0, 0, 1, 0), \,\langle 2, 2 \vert \equiv \langle 2 \vert \otimes \langle 2 \vert = (0, 0, 0, 1), \end{cases} \tag{10.245}$$

It is easy to confirm the following equality:

$$
\langle i,j|\begin{pmatrix} A\_{11} \ A\_{12} \\ A\_{21} \ A\_{22} \end{pmatrix} \otimes \begin{pmatrix} B\_{11} \ B\_{12} \\ B\_{21} \ B\_{22} \end{pmatrix} | i',j' \rangle = A\_{i,i'} B\_{j,j'}.\tag{10.246}
$$

By extending the above example to general-dimensional fundamental vectors, the $\langle i, j| \cdot |i', j'\rangle$-components of $A \otimes B$ for any $M \times M$ matrix $A$ and $N \times N$ matrix $B$ are expressed as

$$
\langle i,j|\mathbf{A}\otimes\mathbf{B}|i',j'\rangle = \langle i|\mathbf{A}|i'\rangle \langle j|\mathbf{B}|j'\rangle = A\_{i,i'}B\_{j,j'}.\tag{10.247}
$$

For *M*×*M* matrices *A* and *C* and *N*×*N* matrices *B* and *D*, we have

$$(A \otimes B)(C \otimes D) = (AC) \otimes (BD),\tag{10.248}$$

and

$$\text{Tr}[A \otimes B] = (\text{Tr}[A])(\text{Tr}[B]).\tag{10.249}$$

In deriving the equality in Eq. (10.248), the $\langle i, j| \cdot |i', j'\rangle$-components of the $MN \times MN$ matrix $(A \otimes B)(C \otimes D)$ are given by

$$
\langle i,j|(\mathbf{A}\otimes\mathbf{B})(\mathbf{C}\otimes\mathbf{D})|i',j'\rangle = \sum\_{i''=1}^{M}\sum\_{j''=1}^{N}\langle i,j|(\mathbf{A}\otimes\mathbf{B})|i'',j''\rangle\langle i'',j''|(\mathbf{C}\otimes\mathbf{D})|i',j'\rangle
$$

$$
(i\in\{1,2,\cdots,M\},\ i'\in\{1,2,\cdots,M\},\ j\in\{1,2,\cdots,N\},\ j'\in\{1,2,\cdots,N\}).\tag{10.250}
$$

For the $M \times M$ and $N \times N$ identity matrices $I^{(M)}$ and $I^{(N)}$, it is valid that

$$(A \otimes I^{(N)}) \left( I^{(M)} \otimes \mathcal{B} \right) = \left( I^{(M)} \otimes \mathcal{B} \right) \left( A \otimes I^{(N)} \right) = A \otimes \mathcal{B}. \tag{10.251}$$
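The tensor product identities (10.248), (10.249), and (10.251) can be checked directly with NumPy's Kronecker product on randomly generated matrices; the dimensions below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 3, 4
A = rng.standard_normal((M, M)); C = rng.standard_normal((M, M))
B = rng.standard_normal((N, N)); D = rng.standard_normal((N, N))
IM, IN = np.eye(M), np.eye(N)

# Eq. (10.248): (A x B)(C x D) = (AC) x (BD)
assert np.allclose(np.kron(A, B) @ np.kron(C, D), np.kron(A @ C, B @ D))
# Eq. (10.249): Tr[A x B] = Tr[A] Tr[B]
assert np.allclose(np.trace(np.kron(A, B)), np.trace(A) * np.trace(B))
# Eq. (10.251): (A x I)(I x B) = (I x B)(A x I) = A x B
assert np.allclose(np.kron(A, IN) @ np.kron(IM, B), np.kron(A, B))
assert np.allclose(np.kron(IM, B) @ np.kron(A, IN), np.kron(A, B))
```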

Moreover, by using mathematical induction, we can confirm the following binomial expansion:

$$\left(A \otimes I^{(N)} + I^{(M)} \otimes B\right)^n = \sum\_{k=0}^n \frac{n!}{k!(n-k)!} \left(A \otimes I^{(N)}\right)^k \left(I^{(M)} \otimes B\right)^{n-k}.\quad(10.252)$$

By using Eq. (10.252), we can derive the following equality:

$$\begin{aligned} \exp\left(A\otimes I^{(N)} + I^{(M)}\otimes B\right) &= \sum\_{n=0}^{+\infty}\frac{1}{n!}\left(A\otimes I^{(N)} + I^{(M)}\otimes B\right)^{n} \\ &= \sum\_{n=0}^{+\infty}\frac{1}{n!}\sum\_{k=0}^{n}\frac{n!}{k!(n-k)!}\left(A\otimes I^{(N)}\right)^{k}\left(I^{(M)}\otimes B\right)^{n-k} \\ &= \sum\_{k=0}^{+\infty}\sum\_{n=k}^{+\infty}\frac{1}{k!(n-k)!}\left(A\otimes I^{(N)}\right)^{k}\left(I^{(M)}\otimes B\right)^{n-k} \\ &= \sum\_{k=0}^{+\infty}\sum\_{l=0}^{+\infty}\frac{1}{k!\,l!}\left(A\otimes I^{(N)}\right)^{k}\left(I^{(M)}\otimes B\right)^{l} \\ &= \left(\sum\_{k=0}^{+\infty}\frac{1}{k!}A^{k}\otimes I^{(N)}\right)\left(\sum\_{l=0}^{+\infty}\frac{1}{l!}I^{(M)}\otimes B^{l}\right) \\ &= \left(\exp(A)\otimes I^{(N)}\right)\left(I^{(M)}\otimes \exp(B)\right) = \exp(A)\otimes\exp(B). \end{aligned} \tag{10.253}$$

By taking the logarithm of both sides of Eq. (10.253), we have

$$
\ln(\exp(A)) \otimes I^{(N)} + I^{(M)} \otimes \ln(\exp(B)) = \ln(\exp(A) \otimes \exp(B)). \quad (10.254)
$$
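The identity in Eq. (10.253) can also be checked numerically. The following minimal sketch (assuming NumPy and SciPy are available) verifies exp(*A*⊗*I*<sup>(*N*)</sup> + *I*<sup>(*M*)</sup>⊗*B*) = exp(*A*)⊗exp(*B*) for random matrices:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
M, N = 3, 4
A = rng.standard_normal((M, M))   # any square A, B work: A⊗I and I⊗B commute
B = rng.standard_normal((N, N))
IM, IN = np.eye(M), np.eye(N)

lhs = expm(np.kron(A, IN) + np.kron(IM, B))   # exp(A⊗I^(N) + I^(M)⊗B)
rhs = np.kron(expm(A), expm(B))               # exp(A)⊗exp(B)
assert np.allclose(lhs, rhs)
```

The identity holds because *A*⊗*I*<sup>(*N*)</sup> and *I*<sup>(*M*)</sup>⊗*B* commute by Eq. (10.251), which is what the binomial expansion in Eq. (10.252) relies on.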

## *10.4.4 Quantum Probabilistic Graphical Models and Quantum Expectation-Maximization Algorithm*

This section explores a type of probabilistic graphical modeling based on Pauli spin matrices from the quantum statistical mechanical point of view. Our review focuses on the transverse Ising model in statistical mechanical informatics [37, 83, 84]. Note that generalization of the framework is possible.

Consider a graph (*V*, *E*) specified by nodes and edges, where *V* is the set of all nodes *i* and *E* is the set of all edges {*i*, *j*}. We introduce the Pauli spin matrices σ<sup>*z*</sup> and σ<sup>*x*</sup> as well as an identity matrix *I*, defined by

$$
\sigma^z = \begin{pmatrix} +1 & 0 \\ 0 & -1 \end{pmatrix}, \ \sigma^x = \begin{pmatrix} 0 & +1 \\ +1 & 0 \end{pmatrix}, \ I = \begin{pmatrix} +1 & 0 \\ 0 & +1 \end{pmatrix}. \tag{10.255}
$$

The Pauli spin matrices at each node *i*∈*V*≡{1, 2, ···, |*V*|} are defined by

$$
\begin{cases}
\sigma_1^{x} \equiv \sigma^{x} \otimes I \otimes I \otimes \cdots \otimes I \otimes I, & \sigma_1^{y} \equiv \sigma^{y} \otimes I \otimes I \otimes \cdots \otimes I \otimes I, & \sigma_1^{z} \equiv \sigma^{z} \otimes I \otimes I \otimes \cdots \otimes I \otimes I,\\
\sigma_2^{x} \equiv I \otimes \sigma^{x} \otimes I \otimes \cdots \otimes I \otimes I, & \sigma_2^{y} \equiv I \otimes \sigma^{y} \otimes I \otimes \cdots \otimes I \otimes I, & \sigma_2^{z} \equiv I \otimes \sigma^{z} \otimes I \otimes \cdots \otimes I \otimes I,\\
\quad\vdots & \quad\vdots & \quad\vdots\\
\sigma_{|V|}^{x} \equiv I \otimes I \otimes I \otimes \cdots \otimes I \otimes \sigma^{x}, & \sigma_{|V|}^{y} \equiv I \otimes I \otimes I \otimes \cdots \otimes I \otimes \sigma^{y}, & \sigma_{|V|}^{z} \equiv I \otimes I \otimes I \otimes \cdots \otimes I \otimes \sigma^{z}.
\end{cases}\tag{10.256}
$$
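The construction in Eq. (10.256) places a Pauli matrix at position *i* of a Kronecker product of identities. A small sketch (NumPy assumed; `site_op` is a hypothetical helper) illustrates it and checks that operators on different nodes commute while σ<sup>*z*</sup> and σ<sup>*x*</sup> on the same node do not:

```python
import numpy as np
from functools import reduce

sz = np.diag([1.0, -1.0])                  # σ^z
sx = np.array([[0.0, 1.0], [1.0, 0.0]])    # σ^x
I2 = np.eye(2)

def site_op(op, i, n):
    # op acting on node i (1-indexed) of an n-node system: I ⊗ … ⊗ op ⊗ … ⊗ I
    return reduce(np.kron, [op if k == i else I2 for k in range(1, n + 1)])

n = 3
sz1, sz2 = site_op(sz, 1, n), site_op(sz, 2, n)
sx1 = site_op(sx, 1, n)
assert np.allclose(sz1 @ sz2, sz2 @ sz1)        # different nodes commute
assert not np.allclose(sz1 @ sx1, sx1 @ sz1)    # same node: σ^z σ^x ≠ σ^x σ^z
```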

The vertical and horizontal 2<sup>|*V*|</sup>-dimensional state vectors are defined by

$$|s_1, s_2, \cdots, s_{|V|}\rangle \equiv |s_1\rangle \otimes |s_2\rangle \otimes \cdots \otimes |s_{|V|}\rangle\ (s_1 \in \Omega,\ s_2 \in \Omega,\ \cdots,\ s_{|V|} \in \Omega),\tag{10.257}$$

$$\langle s_1, s_2, \cdots, s_{|V|}| \equiv \langle s_1| \otimes \langle s_2| \otimes \cdots \otimes \langle s_{|V|}|\ (s_1 \in \Omega,\ s_2 \in \Omega,\ \cdots,\ s_{|V|} \in \Omega),\tag{10.258}$$

where

$$\langle +| \equiv (1, 0),\ \langle -| \equiv (0, 1),\ |+\rangle \equiv \begin{pmatrix} 1 \\ 0 \end{pmatrix},\ |-\rangle \equiv \begin{pmatrix} 0 \\ 1 \end{pmatrix}.\tag{10.259}$$

By using the state vector representations, the (*s*<sub>1</sub>, *s*<sub>2</sub>, ···, *s*<sub>|*V*|</sub>|*s*′<sub>1</sub>, *s*′<sub>2</sub>, ···, *s*′<sub>|*V*|</sub>)-elements of σ<sub>*i*</sub><sup>*x*</sup>, σ<sub>*i*</sub><sup>*z*</sup>, and σ<sub>*i*</sub><sup>*z*</sup>σ<sub>*j*</sub><sup>*z*</sup> are given as

$$\langle s_1, s_2, \cdots, s_{|V|}|\sigma_i^{x}|s'_1, s'_2, \cdots, s'_{|V|}\rangle = \left(\prod_{k \in V\setminus\{i\}}\delta_{s_k, s'_k}\right)\langle s_i|\sigma^{x}|s'_i\rangle$$

$$(s_1 \in \Omega,\ s_2 \in \Omega,\ \cdots,\ s_{|V|} \in \Omega;\ s'_1 \in \Omega,\ s'_2 \in \Omega,\ \cdots,\ s'_{|V|} \in \Omega),\tag{10.260}$$

$$\langle s_1, s_2, \cdots, s_{|V|}|\sigma_i^{z}|s'_1, s'_2, \cdots, s'_{|V|}\rangle = \left(\prod_{k \in V\setminus\{i\}}\delta_{s_k, s'_k}\right)\langle s_i|\sigma^{z}|s'_i\rangle$$

$$(s_1 \in \Omega,\ s_2 \in \Omega,\ \cdots,\ s_{|V|} \in \Omega;\ s'_1 \in \Omega,\ s'_2 \in \Omega,\ \cdots,\ s'_{|V|} \in \Omega),\tag{10.261}$$

$$\langle s_1, s_2, \cdots, s_{|V|}|\sigma_i^{z}\sigma_j^{z}|s'_1, s'_2, \cdots, s'_{|V|}\rangle = \left(\prod_{k \in V\setminus\{i,j\}}\delta_{s_k, s'_k}\right)\langle s_i, s_j|(\sigma^{z}\otimes I)(I\otimes\sigma^{z})|s'_i, s'_j\rangle$$

$$(s_1 \in \Omega,\ s_2 \in \Omega,\ \cdots,\ s_{|V|} \in \Omega;\ s'_1 \in \Omega,\ s'_2 \in \Omega,\ \cdots,\ s'_{|V|} \in \Omega).\tag{10.262}$$

The prior density matrix *P*(α, γ ) and the data generative density matrix *P*(*d*|β) for a given data vector *d* are assumed to be

$$P(\alpha, \gamma) = \frac{\exp\left(-\frac{1}{2}\alpha\sum_{\{i,j\}\in E}\left(\sigma_i^z - \sigma_j^z\right)^2 + \gamma\sum_{i\in V}\sigma_i^x\right)}{\operatorname{Tr}\left[\exp\left(-\frac{1}{2}\alpha\sum_{\{i,j\}\in E}\left(\sigma_i^z - \sigma_j^z\right)^2 + \gamma\sum_{i\in V}\sigma_i^x\right)\right]},\tag{10.263}$$

$$P(d|\beta) = \left(\sqrt{\frac{\beta}{2\pi}}\right)^{|V|} \exp\left(-\frac{1}{2}\beta \sum\_{i \in V} \left(d\_i I^{(2^{|V|})} - \sigma\_i^z\right)^2\right),\qquad(10.264)$$

where α, β, and γ are hyperparameters. The data generative density matrix *P*(*d*|β) is expressed as a 2<sup>|*V*|</sup>×2<sup>|*V*|</sup> diagonal matrix in which all the off-diagonal elements are zero. Each diagonal element ⟨*s*<sub>1</sub>, *s*<sub>2</sub>, ···, *s*<sub>|*V*|</sub>|*P*(*d*|β)|*s*<sub>1</sub>, *s*<sub>2</sub>, ···, *s*<sub>|*V*|</sub>⟩ ((*s*<sub>1</sub>, *s*<sub>2</sub>, ···, *s*<sub>|*V*|</sub>)<sup>T</sup>∈Ω<sup>|*V*|</sup>) corresponds to the probability of the data vector *d* according to additive white Gaussian noise when the state vector (*s*<sub>1</sub>, *s*<sub>2</sub>, ···, *s*<sub>|*V*|</sub>) is given, and β corresponds to the inverse of the variance of the additive white Gaussian noise. By considering a quantum statistical mechanical extension of the Bayes formula, a posterior density matrix *P*(*d*, α, β, γ) and a joint density matrix *P*(*d*|α, β, γ) can be expressed as follows:
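As a hedged illustration of Eqs. (10.263) and (10.264), the following sketch (NumPy and SciPy assumed; the three-node chain and the hyperparameter values are hypothetical) builds both density matrices and checks that *P*(α, γ) has unit trace and that *P*(*d*|β) is diagonal:

```python
import numpy as np
from functools import reduce
from scipy.linalg import expm

# Hypothetical small instance: a three-node chain with edges {1,2} and {2,3}.
n, edges = 3, [(1, 2), (2, 3)]
sz = np.diag([1.0, -1.0])
sx = np.array([[0.0, 1.0], [1.0, 0.0]])
I2 = np.eye(2)

def site_op(op, i):
    # I ⊗ … ⊗ op (at node i) ⊗ … ⊗ I, as in Eq. (10.256)
    return reduce(np.kron, [op if k == i else I2 for k in range(1, n + 1)])

def prior(alpha, gamma):                       # Eq. (10.263)
    H = sum(-0.5 * alpha * (site_op(sz, i) - site_op(sz, j)) @
            (site_op(sz, i) - site_op(sz, j)) for i, j in edges)
    H = H + gamma * sum(site_op(sx, i) for i in range(1, n + 1))
    R = expm(H)
    return R / np.trace(R)

def generative(d, beta):                       # Eq. (10.264)
    Iv = np.eye(2 ** n)
    S = sum((d[i - 1] * Iv - site_op(sz, i)) @
            (d[i - 1] * Iv - site_op(sz, i)) for i in range(1, n + 1))
    return (beta / (2 * np.pi)) ** (n / 2) * expm(-0.5 * beta * S)

P = prior(alpha=0.5, gamma=0.3)
G = generative(d=np.array([0.9, -1.1, 1.2]), beta=2.0)
assert np.isclose(np.trace(P), 1.0)            # normalized prior
assert np.allclose(G, np.diag(np.diag(G)))     # P(d|β) is diagonal
```

The off-diagonal elements of *P*(*d*|β) vanish because every σ<sub>*i*</sub><sup>*z*</sup> is diagonal in the chosen basis; only the transverse-field term γΣσ<sub>*i*</sub><sup>*x*</sup> of the prior introduces off-diagonal structure.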

$$
\begin{split}
P(d, \alpha, \beta, \gamma) &\equiv \frac{\exp\left(\ln\left(P(d|\beta)\right) + \ln\left(P(\alpha, \gamma)\right)\right)}{\operatorname{Tr}\left[\exp\left(\ln\left(P(d|\beta)\right) + \ln\left(P(\alpha, \gamma)\right)\right)\right]}\\
&= \frac{\exp\left(-\frac{1}{2}\alpha\sum_{\{i,j\}\in E}\left(\sigma_i^z - \sigma_j^z\right)^2 - \frac{1}{2}\beta\sum_{i\in V}\left(d_i I^{(2^{|V|})} - \sigma_i^z\right)^2 + \gamma\sum_{i\in V}\sigma_i^x\right)}{\operatorname{Tr}\left[\exp\left(-\frac{1}{2}\alpha\sum_{\{i,j\}\in E}\left(\sigma_i^z - \sigma_j^z\right)^2 - \frac{1}{2}\beta\sum_{i\in V}\left(d_i I^{(2^{|V|})} - \sigma_i^z\right)^2 + \gamma\sum_{i\in V}\sigma_i^x\right)\right]},
\end{split}\tag{10.265}
$$

#### 10 Review of Sublinear Modeling in Probabilistic Graphical Models … 229

$$
\begin{split}
P(d|\alpha, \beta, \gamma) &\equiv \exp\left(\ln\left(P(d|\beta)\right) + \ln\left(P(\alpha, \gamma)\right)\right)\\
&= \frac{\exp\left(-\frac{1}{2}\alpha\sum_{\{i,j\}\in E}\left(\sigma_i^z - \sigma_j^z\right)^2 - \frac{1}{2}\beta\sum_{i\in V}\left(d_i I^{(2^{|V|})} - \sigma_i^z\right)^2 + \gamma\sum_{i\in V}\sigma_i^x\right)}{\left(\sqrt{\frac{2\pi}{\beta}}\right)^{|V|}\operatorname{Tr}\left[\exp\left(-\frac{1}{2}\alpha\sum_{\{i,j\}\in E}\left(\sigma_i^z - \sigma_j^z\right)^2 + \gamma\sum_{i\in V}\sigma_i^x\right)\right]}.
\end{split}\tag{10.266}
$$

The estimates of the hyperparameters, (α̂(*d*), β̂(*d*), γ̂(*d*)), are the values that maximize the marginal likelihood Tr[*P*(*d*|α, β, γ)], as follows:

$$\left(\widehat{\alpha}(\mathbf{d}), \widehat{\beta}(\mathbf{d}), \widehat{\gamma}(\mathbf{d})\right) \equiv \arg\max\_{(\alpha, \beta, \gamma)} \text{Tr}\left[\mathbf{P}\left(\mathbf{d}|\alpha, \beta, \gamma\right)\right].\tag{10.267}$$

To achieve the estimation criteria for the hyperparameters α, β, and γ in Eq. (10.267), we extend the *Q*-function in Eq. (10.8) to the following expression from a quantum statistical mechanical point of view:

$$Q\left(\alpha, \beta, \gamma\,\middle|\,\alpha', \beta', \gamma', d\right) \equiv \operatorname{Tr}\left[P\left(d, \alpha', \beta', \gamma'\right)\ln\left(P\left(d\,\middle|\,\alpha, \beta, \gamma\right)\right)\right].\tag{10.268}$$

The quantum EM algorithm can be summarized as a procedure consisting of the following **E-** and **M-steps**, which are repeated for *t* = 0, 1, 2, ··· until α̂, β̂, and γ̂ converge:

**E-step:** Compute *Q*(α, β, γ|α(*t*), β(*t*), γ(*t*), *d*) for various values of α, β, and γ.

**M-step:** Determine (α(*t*+1), β(*t*+1), γ(*t*+1)) so as to satisfy the extremum conditions of *Q*(α, β, γ|α(*t*), β(*t*), γ(*t*), *d*) with respect to α, β, and γ. Update α̂←α(*t*+1), β̂←β(*t*+1), and γ̂←γ(*t*+1).

The quantum EM algorithm can obtain the solution of the extremum condition of the marginal likelihood Tr[*P*(*d*|α, β, γ)], because we have the following equalities:

$$
\begin{cases}
\left[\dfrac{\partial}{\partial\alpha}Q\left(\alpha, \beta, \gamma\,\middle|\,\widehat{\alpha}, \widehat{\beta}, \widehat{\gamma}, d\right)\right]_{(\alpha,\beta,\gamma)=(\widehat{\alpha},\widehat{\beta},\widehat{\gamma})} = \left[\dfrac{\partial}{\partial\alpha}\ln\operatorname{Tr}\left[P\left(d\,\middle|\,\alpha, \beta, \gamma\right)\right]\right]_{(\alpha,\beta,\gamma)=(\widehat{\alpha},\widehat{\beta},\widehat{\gamma})},\\[2ex]
\left[\dfrac{\partial}{\partial\beta}Q\left(\alpha, \beta, \gamma\,\middle|\,\widehat{\alpha}, \widehat{\beta}, \widehat{\gamma}, d\right)\right]_{(\alpha,\beta,\gamma)=(\widehat{\alpha},\widehat{\beta},\widehat{\gamma})} = \left[\dfrac{\partial}{\partial\beta}\ln\operatorname{Tr}\left[P\left(d\,\middle|\,\alpha, \beta, \gamma\right)\right]\right]_{(\alpha,\beta,\gamma)=(\widehat{\alpha},\widehat{\beta},\widehat{\gamma})},\\[2ex]
\left[\dfrac{\partial}{\partial\gamma}Q\left(\alpha, \beta, \gamma\,\middle|\,\widehat{\alpha}, \widehat{\beta}, \widehat{\gamma}, d\right)\right]_{(\alpha,\beta,\gamma)=(\widehat{\alpha},\widehat{\beta},\widehat{\gamma})} = \left[\dfrac{\partial}{\partial\gamma}\ln\operatorname{Tr}\left[P\left(d\,\middle|\,\alpha, \beta, \gamma\right)\right]\right]_{(\alpha,\beta,\gamma)=(\widehat{\alpha},\widehat{\beta},\widehat{\gamma})}.
\end{cases}\tag{10.269}
$$

By substituting Eq. (10.265) into Eq. (10.268), the *Q*-function can be rewritten as follows:

$$
\begin{split}
Q(\alpha, \beta, \gamma|\alpha', \beta', \gamma', d) &= -\frac{1}{2}\alpha\sum_{\{i,j\}\in E}\operatorname{Tr}\left[\left(\sigma^z\otimes I - I\otimes\sigma^z\right)^2 P_{ij}(d, \alpha', \beta', \gamma')\right]\\
&\quad - \frac{1}{2}\beta\sum_{i\in V}\operatorname{Tr}\left[\left(d_i I - \sigma^z\right)^2 P_i(d, \alpha', \beta', \gamma')\right]\\
&\quad + \gamma\sum_{i\in V}\operatorname{Tr}\left[\sigma^x P_i(d, \alpha', \beta', \gamma')\right]\\
&\quad - |V|\ln\left(\sqrt{\frac{2\pi}{\beta}}\right)\\
&\quad - \ln\left(\operatorname{Tr}\left[\exp\left(-\frac{1}{2}\alpha\sum_{\{i,j\}\in E}\left(\sigma_i^z - \sigma_j^z\right)^2 + \gamma\sum_{i\in V}\sigma_i^x\right)\right]\right).
\end{split}\tag{10.270}
$$

The extremum conditions of *Q*(α, β, γ|α(*t*), β(*t*), γ(*t*), *d*) with respect to α, β, and γ, namely,

$$
\begin{cases}
\dfrac{\partial}{\partial\alpha}Q(\alpha, \beta, \gamma|\alpha(t), \beta(t), \gamma(t), d) = 0,\\[1.5ex]
\dfrac{\partial}{\partial\beta}Q(\alpha, \beta, \gamma|\alpha(t), \beta(t), \gamma(t), d) = 0,\\[1.5ex]
\dfrac{\partial}{\partial\gamma}Q(\alpha, \beta, \gamma|\alpha(t), \beta(t), \gamma(t), d) = 0,
\end{cases}\tag{10.271}
$$

can be reduced to the following simultaneous update rules in the quantum EM algorithm:

$$
\begin{split}
&\sum_{\{i,j\}\in E}\operatorname{Tr}\left[\left(\sigma^z\otimes I - I\otimes\sigma^z\right)^2 P_{ij}(\alpha(t+1), \gamma(t+1))\right]\\
&\qquad = \sum_{\{i,j\}\in E}\operatorname{Tr}\left[\left(\sigma^z\otimes I - I\otimes\sigma^z\right)^2 P_{ij}(d, \alpha(t), \beta(t), \gamma(t))\right],
\end{split}\tag{10.272}
$$

$$\frac{1}{\beta(t+1)} = \frac{1}{|V|}\sum_{i\in V}\operatorname{Tr}\left[\left(d_i I - \sigma^z\right)^2 P_i(d, \alpha(t), \beta(t), \gamma(t))\right],\tag{10.273}$$

$$\sum_{i\in V}\operatorname{Tr}\left[\sigma^x P_i(\alpha(t+1), \gamma(t+1))\right] = \sum_{i\in V}\operatorname{Tr}\left[\sigma^x P_i(d, \alpha(t), \beta(t), \gamma(t))\right],\tag{10.274}$$

where

$$
\begin{split}
\langle s_i|P_i(d, \alpha, \beta, \gamma)|s'_i\rangle &= \langle s_i|\operatorname{Tr}_{V\setminus\{i\}}P(d, \alpha, \beta, \gamma)|s'_i\rangle\\
&\equiv \sum_{\tau_1\in\Omega}\sum_{\tau_2\in\Omega}\cdots\sum_{\tau_{|V|}\in\Omega}\sum_{\tau'_1\in\Omega}\sum_{\tau'_2\in\Omega}\cdots\sum_{\tau'_{|V|}\in\Omega}\delta_{s_i,\tau_i}\delta_{s'_i,\tau'_i}\\
&\quad\times\left(\prod_{j\in V\setminus\{i\}}\delta_{\tau_j,\tau'_j}\right)\langle\tau_1, \tau_2, \cdots, \tau_{|V|}|P(d, \alpha, \beta, \gamma)|\tau'_1, \tau'_2, \cdots, \tau'_{|V|}\rangle\\
&\qquad(s_i\in\Omega,\ s'_i\in\Omega,\ i\in V),
\end{split}\tag{10.275}
$$
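The partial trace in Eq. (10.275) can be sketched as follows (NumPy assumed; `reduce_to_node` is a hypothetical helper, and the random matrix merely stands in for a posterior density matrix):

```python
import numpy as np

def reduce_to_node(P, i, n):
    # Reshape the 2^n x 2^n matrix into 2n two-valued indices
    # (τ_1..τ_n, τ'_1..τ'_n) and trace out every node except i (0-indexed),
    # mirroring the delta products in Eq. (10.275).
    T = P.reshape((2,) * (2 * n))
    for j in reversed(range(n)):
        if j == i:
            continue
        T = np.trace(T, axis1=j, axis2=T.ndim // 2 + j)
    return T  # the 2x2 reduced matrix P_i

# Sanity check on a random trace-one positive matrix standing in for P(d, α, β, γ).
rng = np.random.default_rng(1)
n = 3
A = rng.standard_normal((2 ** n, 2 ** n))
P = A @ A.T
P /= np.trace(P)
Pi = reduce_to_node(P, 0, n)
assert Pi.shape == (2, 2)
assert np.isclose(np.trace(Pi), 1.0)   # partial traces preserve the trace
```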


$$
\begin{split}
\langle s_i, s_j|P_{ij}(d, \alpha, \beta, \gamma)|s'_i, s'_j\rangle &= \langle s_i, s_j|P_{ji}(d, \alpha, \beta, \gamma)|s'_i, s'_j\rangle\\
&= \langle s_i, s_j|\operatorname{Tr}_{V\setminus\{i,j\}}P(d, \alpha, \beta, \gamma)|s'_i, s'_j\rangle\\
&\equiv \sum_{\tau_1\in\Omega}\sum_{\tau_2\in\Omega}\cdots\sum_{\tau_{|V|}\in\Omega}\sum_{\tau'_1\in\Omega}\sum_{\tau'_2\in\Omega}\cdots\sum_{\tau'_{|V|}\in\Omega}\delta_{s_i,\tau_i}\delta_{s_j,\tau_j}\delta_{s'_i,\tau'_i}\delta_{s'_j,\tau'_j}\\
&\quad\times\left(\prod_{k\in V\setminus\{i,j\}}\delta_{\tau_k,\tau'_k}\right)\langle\tau_1, \tau_2, \cdots, \tau_{|V|}|P(d, \alpha, \beta, \gamma)|\tau'_1, \tau'_2, \cdots, \tau'_{|V|}\rangle\\
&\qquad(s_i\in\Omega,\ s_j\in\Omega,\ s'_i\in\Omega,\ s'_j\in\Omega,\ i\in V,\ j\in V,\ i\neq j),
\end{split}\tag{10.276}
$$

$$
\begin{split}
\langle s_i, s_j|P_{ij}(\alpha, \gamma)|s'_i, s'_j\rangle &= \langle s_i, s_j|P_{ji}(\alpha, \gamma)|s'_i, s'_j\rangle\\
&= \langle s_i, s_j|\operatorname{Tr}_{V\setminus\{i,j\}}P(\alpha, \gamma)|s'_i, s'_j\rangle\\
&\equiv \sum_{\tau_1\in\Omega}\sum_{\tau_2\in\Omega}\cdots\sum_{\tau_{|V|}\in\Omega}\sum_{\tau'_1\in\Omega}\sum_{\tau'_2\in\Omega}\cdots\sum_{\tau'_{|V|}\in\Omega}\delta_{s_i,\tau_i}\delta_{s_j,\tau_j}\delta_{s'_i,\tau'_i}\delta_{s'_j,\tau'_j}\\
&\quad\times\left(\prod_{k\in V\setminus\{i,j\}}\delta_{\tau_k,\tau'_k}\right)\langle\tau_1, \tau_2, \cdots, \tau_{|V|}|P(\alpha, \gamma)|\tau'_1, \tau'_2, \cdots, \tau'_{|V|}\rangle\\
&\qquad(s_i\in\Omega,\ s_j\in\Omega,\ s'_i\in\Omega,\ s'_j\in\Omega,\ i\in V,\ j\in V,\ i\neq j).
\end{split}\tag{10.277}
$$

Finally, we explain how the state at each node is estimated from the reduced posterior density matrix *P<sub>i</sub>*(*d*, α, β, γ) in Eq. (10.275) for each node *i*(∈*V*). The reduced posterior density matrix *P<sub>i</sub>*(*d*, α, β, γ) is a real symmetric matrix and can be diagonalized as

$$
\begin{split}
P_i(d, \alpha, \beta, \gamma) &= \begin{pmatrix} \psi_i^{(1)}(+1|d, \alpha, \beta, \gamma) & \psi_i^{(2)}(+1|d, \alpha, \beta, \gamma)\\ \psi_i^{(1)}(-1|d, \alpha, \beta, \gamma) & \psi_i^{(2)}(-1|d, \alpha, \beta, \gamma) \end{pmatrix}\\
&\quad\times \begin{pmatrix} P_i^{(1)}(d, \alpha, \beta, \gamma) & 0\\ 0 & P_i^{(2)}(d, \alpha, \beta, \gamma) \end{pmatrix}\\
&\quad\times \begin{pmatrix} \psi_i^{(1)}(+1|d, \alpha, \beta, \gamma) & \psi_i^{(2)}(+1|d, \alpha, \beta, \gamma)\\ \psi_i^{(1)}(-1|d, \alpha, \beta, \gamma) & \psi_i^{(2)}(-1|d, \alpha, \beta, \gamma) \end{pmatrix}^{\mathrm{T}},
\end{split}\tag{10.278}
$$

where the eigenvalues *P*<sub>*i*</sub><sup>(1)</sup>(*d*, α, β, γ) and *P*<sub>*i*</sub><sup>(2)</sup>(*d*, α, β, γ) are always real numbers. The vectors (ψ<sub>*i*</sub><sup>(1)</sup>(+1|*d*, α, β, γ), ψ<sub>*i*</sub><sup>(1)</sup>(−1|*d*, α, β, γ))<sup>T</sup> and (ψ<sub>*i*</sub><sup>(2)</sup>(+1|*d*, α, β, γ), ψ<sub>*i*</sub><sup>(2)</sup>(−1|*d*, α, β, γ))<sup>T</sup> correspond to the eigenvectors for the eigenvalues *P*<sub>*i*</sub><sup>(1)</sup>(*d*, α, β, γ) and *P*<sub>*i*</sub><sup>(2)</sup>(*d*, α, β, γ), such that


$$P_i(d, \alpha, \beta, \gamma)\begin{pmatrix}\psi_i^{(n)}(+1|d, \alpha, \beta, \gamma)\\\psi_i^{(n)}(-1|d, \alpha, \beta, \gamma)\end{pmatrix} = P_i^{(n)}(d, \alpha, \beta, \gamma)\begin{pmatrix}\psi_i^{(n)}(+1|d, \alpha, \beta, \gamma)\\\psi_i^{(n)}(-1|d, \alpha, \beta, \gamma)\end{pmatrix}\quad(i\in V,\ n\in\{1, 2\}).\tag{10.279}$$

This means that the eigenvectors correspond to all possible states, and that the probabilities of the states (ψ<sub>*i*</sub><sup>(1)</sup>(+1|*d*, α, β, γ), ψ<sub>*i*</sub><sup>(1)</sup>(−1|*d*, α, β, γ)) and (ψ<sub>*i*</sub><sup>(2)</sup>(+1|*d*, α, β, γ), ψ<sub>*i*</sub><sup>(2)</sup>(−1|*d*, α, β, γ)) are *P*<sub>*i*</sub><sup>(1)</sup>(*d*, α, β, γ) and *P*<sub>*i*</sub><sup>(2)</sup>(*d*, α, β, γ), respectively, in the reduced density matrix *P<sub>i</sub>*(*d*, α, β, γ). The estimates for the state at each node *i*(∈*V*), (ψ̂<sub>*i*</sub>(+1|*d*, α̂, β̂, γ̂), ψ̂<sub>*i*</sub>(−1|*d*, α̂, β̂, γ̂)), are given by

$$
\begin{pmatrix}
\widehat{\psi}_i(+1|d, \widehat{\alpha}, \widehat{\beta}, \widehat{\gamma})\\
\widehat{\psi}_i(-1|d, \widehat{\alpha}, \widehat{\beta}, \widehat{\gamma})
\end{pmatrix} \equiv \begin{pmatrix}
\psi_i^{(\widehat{n})}(+1|d, \widehat{\alpha}, \widehat{\beta}, \widehat{\gamma})\\
\psi_i^{(\widehat{n})}(-1|d, \widehat{\alpha}, \widehat{\beta}, \widehat{\gamma})
\end{pmatrix},\quad \widehat{n} \equiv \arg\max_{n\in\{1,2\}} P_i^{(n)}(d, \widehat{\alpha}, \widehat{\beta}, \widehat{\gamma})\quad(i\in V).\tag{10.280}
$$

These estimation criteria in Eqs. (10.267) and (10.280) correspond to quantum statistical mechanical extensions of the maximizations of marginal likelihood and posterior marginal.
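The diagonalization in Eqs. (10.278)-(10.280) amounts to an eigendecomposition of each 2×2 reduced density matrix, keeping the eigenvector whose eigenvalue (probability) is largest. A minimal sketch with a hypothetical *P<sub>i</sub>* (NumPy assumed):

```python
import numpy as np

# Hypothetical 2x2 reduced posterior density matrix (real symmetric, trace one).
Pi = np.array([[0.7, 0.1],
               [0.1, 0.3]])

evals, evecs = np.linalg.eigh(Pi)       # eigenvalues in ascending order
psi_hat = evecs[:, np.argmax(evals)]    # eigenvector of the largest eigenvalue,
                                        # i.e. the estimate in Eq. (10.280)
assert np.isclose(np.sum(evals), 1.0)   # eigenvalues sum to Tr[P_i] = 1
assert np.isclose(np.linalg.norm(psi_hat), 1.0)
```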

## *10.4.5 Quantum Expectation-Maximization (EM) Algorithm for Probabilistic Image Segmentation*

This section applies the framework of Sect. 10.4.4 to the EM algorithm for probabilistic image segmentation in Sect. 10.2.3. In the present framework, **Hubbard operators** [85] are used instead of Pauli spin matrices.

First, we introduce Hubbard operators *X*<sub>*i*</sub><sup>(τ,τ′)</sup> at each node *i*(∈*V*) as follows:

$$
\begin{cases}
X_1^{(\tau,\tau')} \equiv X^{(\tau,\tau')} \otimes I \otimes I \otimes \cdots \otimes I \otimes I,\\
X_2^{(\tau,\tau')} \equiv I \otimes X^{(\tau,\tau')} \otimes I \otimes \cdots \otimes I \otimes I,\\
\quad\vdots\\
X_{|V|}^{(\tau,\tau')} \equiv I \otimes I \otimes I \otimes \cdots \otimes I \otimes X^{(\tau,\tau')},
\end{cases}\tag{10.281}
$$

where

$$X^{(+1,+1)} \equiv \begin{pmatrix} 1 & 0\\ 0 & 0 \end{pmatrix},\ X^{(+1,-1)} \equiv \begin{pmatrix} 0 & 0\\ 1 & 0 \end{pmatrix},\ X^{(-1,+1)} \equiv \begin{pmatrix} 0 & 1\\ 0 & 0 \end{pmatrix},\ X^{(-1,-1)} \equiv \begin{pmatrix} 0 & 0\\ 0 & 1 \end{pmatrix}.\tag{10.282}$$
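As a quick check of Eq. (10.282), the two diagonal Hubbard operators are projectors that resolve the 2×2 identity (a minimal sketch, NumPy assumed):

```python
import numpy as np

# The four 2x2 Hubbard operators of Eq. (10.282), indexed by (τ, τ') ∈ Ω × Ω.
X = {(+1, +1): np.array([[1, 0], [0, 0]]),
     (+1, -1): np.array([[0, 0], [1, 0]]),
     (-1, +1): np.array([[0, 1], [0, 0]]),
     (-1, -1): np.array([[0, 0], [0, 1]])}

# The diagonal operators are projectors and resolve the identity.
assert np.array_equal(X[+1, +1] + X[-1, -1], np.eye(2, dtype=int))
```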

In probabilistic segmentation and clustering, ρ(*D*|*s*, *a*(+1), *a*(−1), *C*(+1), *C*(−1)) in Eq. (10.29) and *P*(*s*|α) in Eq. (10.30) correspond to the data generative and prior models, respectively. By using the Hubbard operators and extending Eq. (10.29) and Eq. (10.30) from the standpoint of quantum statistical mechanical informatics,

the density matrices of the data generative model and the prior model in quantum machine learning systems for probabilistic image processing can be expressed as follows:

$$
\begin{split}
&P(D|a(+1), a(-1), C(+1), C(-1))\\
&= \prod_{i\in V}\sum_{s_i\in\Omega}X_i^{(s_i,s_i)}\sqrt{\frac{1}{\det(2\pi C(s_i))}}\exp\left(-\frac{1}{2}(d_i - a(s_i))C^{-1}(s_i)(d_i - a(s_i))^{\mathrm{T}}\right)\\
&= \exp\left(-\frac{1}{2}\sum_{i\in V}\sum_{s_i\in\Omega}\left((d_i - a(s_i))C^{-1}(s_i)(d_i - a(s_i))^{\mathrm{T}} + \ln(\det(2\pi C(s_i)))\right)X_i^{(s_i,s_i)}\right),
\end{split}\tag{10.283}
$$

$$P(\alpha, \gamma) = \frac{\exp\left(-2\alpha\sum_{\{i,j\}\in E}\left(I^{(2^{|V|})} - X_i^{(+1,+1)}X_j^{(+1,+1)} - X_i^{(-1,-1)}X_j^{(-1,-1)}\right) + \gamma\sum_{i\in V}\left(X_i^{(+1,-1)} + X_i^{(-1,+1)}\right)\right)}{\operatorname{Tr}\left[\exp\left(-2\alpha\sum_{\{i,j\}\in E}\left(I^{(2^{|V|})} - X_i^{(+1,+1)}X_j^{(+1,+1)} - X_i^{(-1,-1)}X_j^{(-1,-1)}\right) + \gamma\sum_{i\in V}\left(X_i^{(+1,-1)} + X_i^{(-1,+1)}\right)\right)\right]},\tag{10.284}$$

where

$$a(+1) = \begin{pmatrix} a_{\mathrm{R}}(+1)\\ a_{\mathrm{G}}(+1)\\ a_{\mathrm{B}}(+1) \end{pmatrix},\ a(-1) = \begin{pmatrix} a_{\mathrm{R}}(-1)\\ a_{\mathrm{G}}(-1)\\ a_{\mathrm{B}}(-1) \end{pmatrix},\tag{10.285}$$

$$C(+1) = \begin{pmatrix} C_{\mathrm{RR}}(+1) & C_{\mathrm{RG}}(+1) & C_{\mathrm{RB}}(+1)\\ C_{\mathrm{GR}}(+1) & C_{\mathrm{GG}}(+1) & C_{\mathrm{GB}}(+1)\\ C_{\mathrm{BR}}(+1) & C_{\mathrm{BG}}(+1) & C_{\mathrm{BB}}(+1) \end{pmatrix},\ C(-1) = \begin{pmatrix} C_{\mathrm{RR}}(-1) & C_{\mathrm{RG}}(-1) & C_{\mathrm{RB}}(-1)\\ C_{\mathrm{GR}}(-1) & C_{\mathrm{GG}}(-1) & C_{\mathrm{GB}}(-1)\\ C_{\mathrm{BR}}(-1) & C_{\mathrm{BG}}(-1) & C_{\mathrm{BB}}(-1) \end{pmatrix}.\tag{10.286}$$

The joint density matrix of *s* and *D* is expressed in terms of the data generative and prior density matrices as follows:

$$
\begin{split}
&P(D|\alpha, \gamma, a(+1), a(-1), C(+1), C(-1))\\
&\equiv \exp\left(\ln\left(P(D|a(+1), a(-1), C(+1), C(-1))\right) + \ln\left(P(\alpha, \gamma)\right)\right).
\end{split}\tag{10.287}
$$

By using the joint density matrix *P*(*D*|α, γ, *a*(+1), *a*(−1), *C*(+1), *C*(−1)), the posterior density matrix *P*(*D*, α, γ, *a*(+1), *a*(−1), *C*(+1), *C*(−1)) is defined by using the Bayes formula as follows:

$$P(D, \alpha, \gamma, a(+1), a(-1), C(+1), C(-1)) \equiv \frac{P(D|\alpha, \gamma, a(+1), a(-1), C(+1), C(-1))}{\operatorname{Tr}\left[P(D|\alpha, \gamma, a(+1), a(-1), C(+1), C(-1))\right]}.\tag{10.288}$$

Estimates of the hyperparameters and parameter vectors, α̂(*D*), γ̂(*D*), *â*(+1|*D*), *â*(−1|*D*), *Ĉ*(+1|*D*), *Ĉ*(−1|*D*), are given by

$$\left(\widehat{\alpha}(D), \widehat{\gamma}(D), \widehat{a}(+1|D), \widehat{a}(-1|D), \widehat{C}(+1|D), \widehat{C}(-1|D)\right) \equiv \arg\max_{(\alpha, \gamma, a(+1), a(-1), C(+1), C(-1))}\operatorname{Tr}\left[P(D|\alpha, \gamma, a(+1), a(-1), C(+1), C(-1))\right].\tag{10.289}$$

The labeling vector *ŝ*(*D*) = (*ŝ*<sub>1</sub>(*D*), *ŝ*<sub>2</sub>(*D*), ···, *ŝ*<sub>|*V*|</sub>(*D*)) can be estimated from the reduced posterior density matrix at each node *i* of *P*(*D*, α, γ, *a*(+1), *a*(−1), *C*(+1), *C*(−1)) by arguments similar to those for Eqs. (10.278), (10.279), and (10.280).

The *Q*-function for the EM algorithm in the present framework is defined by

$$
\begin{split}
&Q(\alpha, \gamma, a(+1), a(-1), C(+1), C(-1)\,|\,\alpha', \gamma', a'(+1), a'(-1), C'(+1), C'(-1), D)\\
&\equiv \operatorname{Tr}\Big[P\left(D, \alpha', \gamma', a'(+1), a'(-1), C'(+1), C'(-1)\right)\\
&\qquad\times \ln\left(P(D|\alpha, \gamma, a(+1), a(-1), C(+1), C(-1))\right)\Big].
\end{split}\tag{10.290}
$$

The EM algorithm is a procedure that performs the following **E-** and **M-steps** repeatedly for *t* = 0, 1, 2, ··· until α̂(*D*), γ̂(*D*), *â*(+1|*D*), *â*(−1|*D*), *Ĉ*(+1|*D*), and *Ĉ*(−1|*D*) converge:


$$
\begin{split}
&\left(\alpha(t+1), \gamma(t+1), a(+1, t+1), a(-1, t+1), C(+1, t+1), C(-1, t+1)\right)\\
&\leftarrow \mathop{\operatorname{extremum}}_{(\alpha, \gamma, a(+1), a(-1), C(+1), C(-1))} Q\big(\alpha, \gamma, a(+1), a(-1), C(+1), C(-1)\,\big|\,\alpha(t), \gamma(t), a(+1, t), a(-1, t), C(+1, t), C(-1, t), D\big).
\end{split}\tag{10.291}
$$

Update α̂(*D*)←α(*t*+1), γ̂(*D*)←γ(*t*+1), *â*(+1|*D*)←*a*(+1, *t*+1), *â*(−1|*D*)←*a*(−1, *t*+1), *Ĉ*(+1|*D*)←*C*(+1, *t*+1), and *Ĉ*(−1|*D*)←*C*(−1, *t*+1).

By using the equalities in Eqs. (10.283), (10.284), (10.287), and (10.288), the EM algorithm using the *Q*-function can be reduced to the following update rules:

$$
\begin{split}
&\frac{1}{|E|}\sum_{\{i,j\}\in E}\operatorname{Tr}\Big[\left(I\otimes I - X^{(+1,+1)}\otimes X^{(+1,+1)} - X^{(-1,-1)}\otimes X^{(-1,-1)}\right)P_{ij}(\alpha(t+1), \gamma(t+1))\Big]\\
&\qquad = \frac{1}{|E|}\sum_{\{i,j\}\in E}\operatorname{Tr}\Big[\left(I\otimes I - X^{(+1,+1)}\otimes X^{(+1,+1)} - X^{(-1,-1)}\otimes X^{(-1,-1)}\right)\\
&\qquad\qquad\times P_{ij}(D, \alpha(t), \gamma(t), a(+1, t), a(-1, t), C(+1, t), C(-1, t))\Big],
\end{split}\tag{10.292}
$$

$$
\begin{split}
&\frac{1}{|V|}\sum_{i\in V}\operatorname{Tr}\left[\left(X^{(+1,-1)} + X^{(-1,+1)}\right)P_i(\alpha(t+1), \gamma(t+1))\right]\\
&\qquad = \frac{1}{|V|}\sum_{i\in V}\operatorname{Tr}\Big[\left(X^{(+1,-1)} + X^{(-1,+1)}\right)\\
&\qquad\qquad\times P_i(D, \alpha(t), \gamma(t), a(+1, t), a(-1, t), C(+1, t), C(-1, t))\Big],
\end{split}\tag{10.293}
$$

$$a(\xi, t+1) = \frac{\sum_{i\in V}d_i\operatorname{Tr}\left[X^{(\xi,\xi)}P_i(D, \alpha(t), \gamma(t), a(+1, t), a(-1, t), C(+1, t), C(-1, t))\right]}{\sum_{i\in V}\operatorname{Tr}\left[X^{(\xi,\xi)}P_i(D, \alpha(t), \gamma(t), a(+1, t), a(-1, t), C(+1, t), C(-1, t))\right]}\ (\xi\in\Omega),\tag{10.294}$$

$$C(\xi, t+1) = \frac{\sum_{i\in V}(d_i - a(\xi, t))^{\mathrm{T}}(d_i - a(\xi, t))\operatorname{Tr}\left[X^{(\xi,\xi)}P_i(D, \alpha(t), \gamma(t), a(+1, t), a(-1, t), C(+1, t), C(-1, t))\right]}{\sum_{i\in V}\operatorname{Tr}\left[X^{(\xi,\xi)}P_i(D, \alpha(t), \gamma(t), a(+1, t), a(-1, t), C(+1, t), C(-1, t))\right]}\ (\xi\in\Omega),\tag{10.295}$$
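The updates in Eqs. (10.294) and (10.295) are responsibility-weighted sample moments, as in the M-step of a Gaussian mixture model. The following sketch (NumPy assumed; the weights `w` stand in for the traces Tr[*X*<sup>(ξ,ξ)</sup>*P<sub>i</sub>*(*D*, ...)], and, for brevity, the residuals use the newly updated mean rather than *a*(ξ, *t*)):

```python
import numpy as np

rng = np.random.default_rng(2)
V = 50
d = rng.standard_normal((V, 3))                  # RGB data vectors d_i
w = rng.random((V, 2))                           # w[i, k] stands in for
w /= w.sum(axis=1, keepdims=True)                # Tr[X^{(ξ,ξ)} P_i(D, ...)]

a, C = {}, {}
for col, xi in enumerate((+1, -1)):
    wt = w[:, col]
    a[xi] = (wt[:, None] * d).sum(axis=0) / wt.sum()          # Eq. (10.294)
    r = d - a[xi]                                             # residuals
    C[xi] = (wt[:, None, None] *
             np.einsum('ij,ik->ijk', r, r)).sum(axis=0) / wt.sum()  # Eq. (10.295)

assert C[+1].shape == (3, 3)
assert np.allclose(C[+1], C[+1].T)               # covariances are symmetric
```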

where

$$
\begin{split}
&\langle s_i|P_i(D, \alpha, \gamma, a(+1), a(-1), C(+1), C(-1))|s'_i\rangle\\
&= \langle s_i|\operatorname{Tr}_{V\setminus\{i\}}P(D, \alpha, \gamma, a(+1), a(-1), C(+1), C(-1))|s'_i\rangle\\
&\equiv \sum_{\tau_1\in\Omega}\sum_{\tau_2\in\Omega}\cdots\sum_{\tau_{|V|}\in\Omega}\sum_{\tau'_1\in\Omega}\sum_{\tau'_2\in\Omega}\cdots\sum_{\tau'_{|V|}\in\Omega}\delta_{s_i,\tau_i}\delta_{s'_i,\tau'_i}\\
&\quad\times\left(\prod_{j\in V\setminus\{i\}}\delta_{\tau_j,\tau'_j}\right)\langle\tau_1, \tau_2, \cdots, \tau_{|V|}|P(D, \alpha, \gamma, a(+1), a(-1), C(+1), C(-1))|\tau'_1, \tau'_2, \cdots, \tau'_{|V|}\rangle\\
&\qquad(s_i\in\Omega,\ s'_i\in\Omega,\ i\in V),
\end{split}\tag{10.296}
$$

$$\begin{split}
&\langle s\_i,s\_j|\mathbf{P}\_{ij}(\mathbf{D},\mathbf{a},\boldsymbol{\gamma},\mathbf{a}(+1),\mathbf{a}(-1),\mathbf{C}(+1),\mathbf{C}(-1))|s'\_i,s'\_j\rangle \\
&\quad= \langle s\_i,s\_j|\mathbf{P}\_{ji}(\mathbf{D},\mathbf{a},\boldsymbol{\gamma},\mathbf{a}(+1),\mathbf{a}(-1),\mathbf{C}(+1),\mathbf{C}(-1))|s'\_i,s'\_j\rangle \\
&\quad= \langle s\_i,s\_j|\mathrm{Tr}\_{\backslash\{i,j\}}\,\mathbf{P}(\mathbf{D},\mathbf{a},\boldsymbol{\gamma},\mathbf{a}(+1),\mathbf{a}(-1),\mathbf{C}(+1),\mathbf{C}(-1))|s'\_i,s'\_j\rangle \\
&\quad= \sum\_{\tau\_1\in\Omega}\sum\_{\tau\_2\in\Omega}\cdots\sum\_{\tau\_{|V|}\in\Omega}\sum\_{\tau'\_1\in\Omega}\sum\_{\tau'\_2\in\Omega}\cdots\sum\_{\tau'\_{|V|}\in\Omega}\delta\_{s\_i,\tau\_i}\delta\_{s\_j,\tau\_j}\delta\_{s'\_i,\tau'\_i}\delta\_{s'\_j,\tau'\_j}\left(\prod\_{k\in V\backslash\{i,j\}}\delta\_{\tau\_k,\tau'\_k}\right) \\
&\qquad\times\langle\tau\_1,\tau\_2,\cdots,\tau\_{|V|}|\mathbf{P}(\mathbf{D},\mathbf{a},\boldsymbol{\gamma},\mathbf{a}(+1),\mathbf{a}(-1),\mathbf{C}(+1),\mathbf{C}(-1))|\tau'\_1,\tau'\_2,\cdots,\tau'\_{|V|}\rangle \\
&\qquad (s\_i\in\Omega,\ s\_j\in\Omega,\ s'\_i\in\Omega,\ s'\_j\in\Omega,\ i\in V,\ j\in V,\ i<j),
\end{split}\tag{10.297}$$

$$\begin{split}
\langle s\_i|\mathbf{P}\_i(\boldsymbol{\alpha},\boldsymbol{\gamma})|s'\_i\rangle &= \langle s\_i|\mathrm{Tr}\_{\backslash i}\,\mathbf{P}(\boldsymbol{\alpha},\boldsymbol{\gamma})|s'\_i\rangle \\
&= \sum\_{\tau\_1\in\Omega}\sum\_{\tau\_2\in\Omega}\cdots\sum\_{\tau\_{|V|}\in\Omega}\sum\_{\tau'\_1\in\Omega}\sum\_{\tau'\_2\in\Omega}\cdots\sum\_{\tau'\_{|V|}\in\Omega}\delta\_{s\_i,\tau\_i}\delta\_{s'\_i,\tau'\_i} \\
&\quad\times\left(\prod\_{k\in V\backslash\{i\}}\delta\_{\tau\_k,\tau'\_k}\right)\langle\tau\_1,\tau\_2,\cdots,\tau\_{|V|}|\mathbf{P}(\boldsymbol{\alpha},\boldsymbol{\gamma})|\tau'\_1,\tau'\_2,\cdots,\tau'\_{|V|}\rangle \\
&\quad (s\_i\in\Omega,\ s'\_i\in\Omega,\ i\in V),
\end{split}\tag{10.298}$$

$$\begin{split}
\langle s\_i,s\_j|\mathbf{P}\_{ij}(\boldsymbol{\alpha},\boldsymbol{\gamma})|s'\_i,s'\_j\rangle &= \langle s\_i,s\_j|\mathbf{P}\_{ji}(\boldsymbol{\alpha},\boldsymbol{\gamma})|s'\_i,s'\_j\rangle \\
&= \langle s\_i,s\_j|\mathrm{Tr}\_{\backslash\{i,j\}}\,\mathbf{P}(\boldsymbol{\alpha},\boldsymbol{\gamma})|s'\_i,s'\_j\rangle \\
&= \sum\_{\tau\_1\in\Omega}\sum\_{\tau\_2\in\Omega}\cdots\sum\_{\tau\_{|V|}\in\Omega}\sum\_{\tau'\_1\in\Omega}\sum\_{\tau'\_2\in\Omega}\cdots\sum\_{\tau'\_{|V|}\in\Omega}\delta\_{s\_i,\tau\_i}\delta\_{s\_j,\tau\_j}\delta\_{s'\_i,\tau'\_i}\delta\_{s'\_j,\tau'\_j} \\
&\quad\times\left(\prod\_{k\in V\backslash\{i,j\}}\delta\_{\tau\_k,\tau'\_k}\right)\langle\tau\_1,\tau\_2,\cdots,\tau\_{|V|}|\mathbf{P}(\boldsymbol{\alpha},\boldsymbol{\gamma})|\tau'\_1,\tau'\_2,\cdots,\tau'\_{|V|}\rangle \\
&\quad (s\_i\in\Omega,\ s\_j\in\Omega,\ s'\_i\in\Omega,\ s'\_j\in\Omega,\ i\in V,\ j\in V,\ i<j).
\end{split}\tag{10.299}$$

#### **10.5 Quantum Statistical Mechanical Informatics**

This section explains quantum graphical modeling based on quantum mechanical extensions of statistical mechanical informatics, in particular advanced quantum mean-field methods. Fundamental frameworks and recent developments have been explored in textbooks on statistical mechanics [37, 86]. In applications of quantum annealing to massive optimization problems, the transverse Ising model is an important quantum probabilistic graphical model [83, 84], and it is known that the density matrices of some familiar quantum statistical machine learning systems, for example those in Eqs. (10.263), (10.265), and (10.266), can be reduced to transverse Ising models.

In quantum statistical mechanical informatics, one of the most important schemes is the Suzuki–Trotter decomposition [87, 88]. It is used to realize quantum Monte Carlo methods by mapping *d*-dimensional density matrices to corresponding (*d* + 1)-dimensional probability distributions [89]. Recently, some quantum annealing schemes have been realized as actual quantum computers, for example, the D-Wave machine.

In the first part of this section, we explain some basic frameworks in advanced quantum mean-field methods for realizing familiar quantum statistical machine learning systems for the transverse Ising model, including conventional frameworks of quantum belief propagation. In the second part, we propose a quantum adaptive Thouless–Anderson–Palmer (TAP) method and a new approach using the momentum space renormalization group method to realize coarse graining for the transverse Ising model, not only on regular graphs but also on random graphs. In the third part, we introduce the Suzuki–Trotter decomposition [87, 88], show the basic scheme for mapping a *d*-dimensional transverse Ising model to a (*d* + 1)-dimensional Ising model, and apply the scheme to the message passing rules of the conventional quantum belief propagation.

## *10.5.1 Advanced Mean-Field Methods for the Transverse Ising Model*

This section explores the detailed derivation of the deterministic equations in both the quantum mean-field method and the quantum loopy belief propagation method for the transverse Ising model [83, 84]. Note that the present framework of the quantum mean-field method and the quantum loopy belief propagation method are constructed in real space, while other familiar frameworks in quantum statistical mechanics such as spin wave theory are constructed in momentum space.

For a graph (*V*, *E*) with a set of nodes *V* and set of edges *E*, we consider a density matrix *P* as

$$\mathbf{P} = \frac{\exp\left(-\frac{1}{k\_{\mathrm{B}}T} \left(\frac{1}{2} J \sum\_{\{i,j\} \in E} \left(\sigma\_i^z - \sigma\_j^z\right)^2 + \frac{1}{2} h \sum\_{i \in V} \left(\sigma\_i^z - d\_i I^{(2^{|V|})}\right)^2 - \Gamma \sum\_{i \in V} \sigma\_i^x\right)\right)}{\mathrm{Tr}\left[\exp\left(-\frac{1}{k\_{\mathrm{B}}T} \left(\frac{1}{2} J \sum\_{\{i,j\} \in E} \left(\sigma\_i^z - \sigma\_j^z\right)^2 + \frac{1}{2} h \sum\_{i \in V} \left(\sigma\_i^z - d\_i I^{(2^{|V|})}\right)^2 - \Gamma \sum\_{i \in V} \sigma\_i^x\right)\right)\right]}.\tag{10.300}$$

Because $\sigma\_i^z \sigma\_i^z = I^{(2^{|V|})}$, the density matrix in Eq. (10.300) can be reduced to Eqs. (10.226) and (10.227) with

$$H = -J\sum\_{\{i,j\} \in E} \sigma\_i^z \sigma\_j^z - h \sum\_{i \in V} d\_i \sigma\_i^z - \Gamma \sum\_{i \in V} \sigma\_i^x. \tag{10.301}$$

Here, all the nodes *j* connected with the node *i* by an edge {*i*, *j*} are referred to as neighboring nodes of the node *i*, and the set of all neighboring nodes of the node *i* is denoted by the notation ∂*i*. The quantum probabilistic graphical model in Eqs. (10.300) and (10.301) is referred to as the **Transverse Ising Model** [83, 84].
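As a concrete illustration, the Hamiltonian of Eq. (10.301) and the density matrix of Eq. (10.300) can be built explicitly for a few spins using Kronecker products. The following is a minimal numerical sketch in Python (the helper names `site_op`, `transverse_ising_H`, and `density_matrix` are ours, not from the text); it is feasible only for small $|V|$, since the matrices are $2^{|V|}\times 2^{|V|}$:

```python
import numpy as np

# Pauli matrices and the 2x2 identity
sz = np.array([[1.0, 0.0], [0.0, -1.0]])
sx = np.array([[0.0, 1.0], [1.0, 0.0]])
I2 = np.eye(2)

def site_op(op, i, n):
    """Embed a single-site operator at site i into the 2^n-dimensional space."""
    mats = [I2] * n
    mats[i] = op
    out = mats[0]
    for m in mats[1:]:
        out = np.kron(out, m)
    return out

def transverse_ising_H(edges, d, J, h, Gamma):
    """H = -J sum sz_i sz_j - h sum d_i sz_i - Gamma sum sx_i, as in Eq. (10.301)."""
    n = len(d)
    H = np.zeros((2**n, 2**n))
    for (i, j) in edges:
        H -= J * site_op(sz, i, n) @ site_op(sz, j, n)
    for i in range(n):
        H -= h * d[i] * site_op(sz, i, n)
        H -= Gamma * site_op(sx, i, n)
    return H

def density_matrix(H, T, kB=1.0):
    """P = exp(-H/(kB T)) / Tr exp(-H/(kB T)), as in Eq. (10.300)."""
    w, U = np.linalg.eigh(H)                  # H is real symmetric
    expw = np.exp(-(w - w.min()) / (kB * T))  # shift for numerical stability
    P = (U * expw) @ U.T
    return P / np.trace(P)
```

For example, `density_matrix(transverse_ising_H([(0, 1), (1, 2)], [1.0, -1.0, 0.5], 1.0, 0.5, 0.7), T=1.0)` yields a positive semidefinite $8\times 8$ matrix of unit trace.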

First, we explain the conventional quantum mean-field method for the transverse Ising model. We introduce a $2^N \times 2^N$ trial density matrix $\mathbf{R}$ and its $2\times 2$ trial reduced density matrix $\mathbf{R}\_i$ for each node $i(\in V)$ defined by

$$\mathbf{R}\_{i} = \mathrm{Tr}\_{\backslash i}\,\mathbf{R} = \begin{pmatrix} \langle +1|\mathbf{R}\_{i}|+1\rangle & \langle +1|\mathbf{R}\_{i}|-1\rangle\\ \langle -1|\mathbf{R}\_{i}|+1\rangle & \langle -1|\mathbf{R}\_{i}|-1\rangle \end{pmatrix},\tag{10.302}$$

where

$$\begin{split}
\langle s\_{i}|\mathbf{R}\_{i}|s'\_{i}\rangle &= \langle s\_{i}|\mathrm{Tr}\_{\backslash i}\,\mathbf{R}|s'\_{i}\rangle \\
&= \sum\_{\tau\_1\in\Omega}\sum\_{\tau\_2\in\Omega}\cdots\sum\_{\tau\_{|V|}\in\Omega}\sum\_{\tau'\_1\in\Omega}\sum\_{\tau'\_2\in\Omega}\cdots\sum\_{\tau'\_{|V|}\in\Omega}\delta\_{s\_i,\tau\_i}\delta\_{s'\_i,\tau'\_i}\left(\prod\_{j\in V\backslash\{i\}}\delta\_{\tau\_j,\tau'\_j}\right) \\
&\quad\times\langle\tau\_1,\tau\_2,\cdots,\tau\_{|V|}|\mathbf{R}|\tau'\_1,\tau'\_2,\cdots,\tau'\_{|V|}\rangle \\
&\quad (s\_i\in\Omega,\ s'\_i\in\Omega,\ i\in V).
\end{split}\tag{10.303}$$

By using Eq. (10.303), the average $\mathrm{Tr}(\sigma\_i^x\mathbf{R})$ can be expressed in terms of the reduced density matrix $\mathbf{R}\_i$ as follows:

$$\begin{split}
\mathrm{Tr}(\sigma\_i^x\mathbf{R}) &= \sum\_{s\_1\in\Omega}\cdots\sum\_{s\_{|V|}\in\Omega}\sum\_{s'\_1\in\Omega}\cdots\sum\_{s'\_{|V|}\in\Omega}\langle s\_1,s\_2,\cdots,s\_{|V|}|\sigma\_i^x|s'\_1,s'\_2,\cdots,s'\_{|V|}\rangle\langle s'\_1,s'\_2,\cdots,s'\_{|V|}|\mathbf{R}|s\_1,s\_2,\cdots,s\_{|V|}\rangle \\
&= \sum\_{s\_1\in\Omega}\cdots\sum\_{s\_{|V|}\in\Omega}\sum\_{s'\_1\in\Omega}\cdots\sum\_{s'\_{|V|}\in\Omega}\left(\prod\_{k\in V\backslash\{i\}}\delta\_{s\_k,s'\_k}\right)\langle s\_i|\sigma^x|s'\_i\rangle\langle s'\_1,s'\_2,\cdots,s'\_{|V|}|\mathbf{R}|s\_1,s\_2,\cdots,s\_{|V|}\rangle \\
&= \sum\_{s\_i\in\Omega}\sum\_{s'\_i\in\Omega}\langle s\_i|\sigma^x|s'\_i\rangle\langle s'\_i|\mathbf{R}\_i|s\_i\rangle = \mathrm{Tr}(\sigma^x\mathbf{R}\_i).
\end{split}\tag{10.304}$$

By similar arguments to those for Eq. (10.304), we derive

$$\operatorname{Tr}(\sigma\_i^z \mathbf{R}) = \operatorname{Tr}(\sigma^z \mathbf{R}\_i). \tag{10.305}$$
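The reduction from $\mathbf{R}$ to $\mathbf{R}\_i$ in Eq. (10.302) is a partial trace, and identities such as Eqs. (10.304) and (10.305) can be checked numerically. A brief sketch (the helper names are illustrative, not from the text):

```python
import numpy as np

sx = np.array([[0.0, 1.0], [1.0, 0.0]])
I2 = np.eye(2)

def kron_chain(mats):
    out = mats[0]
    for m in mats[1:]:
        out = np.kron(out, m)
    return out

def reduced_density_matrix(R, i, n):
    """Trace out every spin except site i from a 2^n x 2^n matrix (Eq. 10.302)."""
    t = R.reshape([2] * n + [2] * n)    # ket axes 0..n-1, bra axes n..2n-1
    for j in reversed(range(n)):        # highest site first keeps axes valid
        if j == i:
            continue
        t = np.trace(t, axis1=j, axis2=j + t.ndim // 2)
    return t

# A random 3-spin density matrix: positive semidefinite with unit trace.
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 8))
R = A @ A.T
R /= np.trace(R)

# Tr(sigma_i^x R) computed in the full space and via the reduced matrix agree.
i = 1
sx_i = kron_chain([sx if k == i else I2 for k in range(3)])
lhs = np.trace(sx_i @ R)
rhs = np.trace(sx @ reduced_density_matrix(R, i, 3))
assert np.isclose(lhs, rhs)
```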

Now, we assume that the trial density matrix *R* is expressed as

$$\mathcal{R} = \mathcal{R}\_1 \otimes \mathcal{R}\_2 \otimes \cdots \otimes \mathcal{R}\_{|V|}. \tag{10.306}$$

In this case, the average $\mathrm{Tr}(\sigma\_i^z\sigma\_j^z\mathbf{R})$ and the entropy $-k\_{\mathrm{B}}\mathrm{Tr}(\mathbf{R}\ln\mathbf{R})$ can be expressed as

$$\begin{split}
\mathrm{Tr}(\sigma\_i^z\sigma\_j^z\mathbf{R}) &= \mathrm{Tr}\big(\sigma\_i^z\sigma\_j^z\,\mathbf{R}\_1\otimes\mathbf{R}\_2\otimes\cdots\otimes\mathbf{R}\_{|V|}\big) \\
&= \left(\sum\_{s\_i\in\Omega}\sum\_{s'\_i\in\Omega}\langle s\_i|\sigma^z|s'\_i\rangle\langle s'\_i|\mathbf{R}\_i|s\_i\rangle\right)\left(\sum\_{s\_j\in\Omega}\sum\_{s'\_j\in\Omega}\langle s\_j|\sigma^z|s'\_j\rangle\langle s'\_j|\mathbf{R}\_j|s\_j\rangle\right)\prod\_{k\in V\backslash\{i,j\}}\mathrm{Tr}(\mathbf{R}\_k) \\
&= \big(\mathrm{Tr}(\sigma^z\mathbf{R}\_i)\big)\big(\mathrm{Tr}(\sigma^z\mathbf{R}\_j)\big),
\end{split}\tag{10.307}$$

$$\begin{split}
-k\_{\mathrm{B}}\mathrm{Tr}(\mathbf{R}\ln(\mathbf{R})) &= -k\_{\mathrm{B}}\mathrm{Tr}\big(\mathbf{R}\_1\otimes\mathbf{R}\_2\otimes\cdots\otimes\mathbf{R}\_{|V|}\,\ln\big(\mathbf{R}\_1\otimes\mathbf{R}\_2\otimes\cdots\otimes\mathbf{R}\_{|V|}\big)\big) \\
&= -k\_{\mathrm{B}}\sum\_{i=1}^{N}\mathrm{Tr}\big(\mathbf{R}\_1\otimes\mathbf{R}\_2\otimes\cdots\otimes\mathbf{R}\_{|V|}\,\big(I^{(i-1)}\otimes\ln(\mathbf{R}\_i)\otimes I^{(N-i)}\big)\big) \\
&= -k\_{\mathrm{B}}\sum\_{i=1}^{N}\mathrm{Tr}\big(\mathbf{R}\_i\ln(\mathbf{R}\_i)\big)\prod\_{k\in V\backslash\{i\}}\mathrm{Tr}(\mathbf{R}\_k) \\
&= -k\_{\mathrm{B}}\sum\_{i=1}^{N}\mathrm{Tr}\big(\mathbf{R}\_i\ln(\mathbf{R}\_i)\big).
\end{split}\tag{10.308}$$
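The factorization of the entropy in Eq. (10.308) under the product ansatz of Eq. (10.306) can also be verified numerically. A brief sketch (the helper names `entropy` and `random_density` are illustrative):

```python
import numpy as np

def entropy(R):
    """von Neumann entropy -Tr(R ln R) computed from the eigenvalues of R."""
    w = np.linalg.eigvalsh(R)
    w = w[w > 1e-12]
    return -np.sum(w * np.log(w))

def random_density(n, seed):
    """A random n x n density matrix: positive definite with unit trace."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, n))
    R = A @ A.T
    return R / np.trace(R)

# product ansatz R = R_1 (x) R_2 (x) R_3 as in Eq. (10.306)
Rs = [random_density(2, s) for s in range(3)]
R = Rs[0]
for Ri in Rs[1:]:
    R = np.kron(R, Ri)

# the entropy factorizes into a sum of single-site entropies, Eq. (10.308)
assert np.isclose(entropy(R), sum(entropy(Ri) for Ri in Rs))
```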

The free energy functional can be reduced to


$$\begin{split}
\mathcal{F}[\mathbf{R}] = \mathcal{F}\_{\mathrm{MF}}\big[\mathbf{R}\_1,\mathbf{R}\_2,\ldots,\mathbf{R}\_{|V|}\big] &\equiv -J\sum\_{\{i,j\}\in E}\big(\mathrm{Tr}(\sigma^z\mathbf{R}\_i)\big)\big(\mathrm{Tr}(\sigma^z\mathbf{R}\_j)\big) - h\sum\_{i\in V}d\_i\,\mathrm{Tr}(\sigma^z\mathbf{R}\_i) \\
&\quad -\Gamma\sum\_{i\in V}\mathrm{Tr}(\sigma^x\mathbf{R}\_i) + k\_{\mathrm{B}}T\sum\_{i\in V}\mathrm{Tr}(\mathbf{R}\_i\ln(\mathbf{R}\_i)).
\end{split}\tag{10.309}$$

We define the optimal reduced density matrix $\widehat{\mathbf{R}}\_i$ for each node $i(\in V)$ by

$$\widehat{\mathbf{R}}\_{i} = \arg\operatorname\*{extremum}\_{\mathbf{R}\_{i}}\left\{\mathcal{F}\_{\mathrm{MF}}\big[\widehat{\mathbf{R}}\_{1},\widehat{\mathbf{R}}\_{2},\ldots,\widehat{\mathbf{R}}\_{i-1},\mathbf{R}\_{i},\widehat{\mathbf{R}}\_{i+1},\widehat{\mathbf{R}}\_{i+2},\ldots,\widehat{\mathbf{R}}\_{|V|}\big]\ \Big|\ \mathrm{Tr}\,\mathbf{R}\_{i}=1\right\}\quad(i\in V).\tag{10.310}$$

The simultaneous self-consistent equations for the reduced density matrices are expressed as

$$\widehat{\mathbf{R}}\_{i} = \frac{1}{Z\_{i}} \exp\left(\frac{1}{k\_{\mathrm{B}}T} \left( \left( J \sum\_{j \in \partial i} (\mathrm{Tr}(\sigma^{z} \widehat{\mathbf{R}}\_{j})) + hd\_{i} \right) \sigma^{z} + \Gamma \sigma^{x} \right) \right), \quad (10.311)$$

$$Z\_i \equiv \operatorname{Tr} \left[ \exp \left( \frac{1}{k\_{\mathrm{B}} T} \left( \left( J \sum\_{j \in \partial i} (\operatorname{Tr}(\sigma^z \widehat{R}\_j)) + h d\_i \right) \sigma^z + \Gamma \sigma^x \right) \right) \right] . (10.312)$$


From Eq. (10.311), we can derive the following simultaneous self-consistent equations for the magnetizations $\widehat{m}\_i^z \equiv \mathrm{Tr}(\sigma^z\widehat{\mathbf{R}}\_i)$ $(i\in V)$ and $\widehat{m}\_i^x \equiv \mathrm{Tr}(\sigma^x\widehat{\mathbf{R}}\_i)$ $(i\in V)$:

$$\widehat{m}\_{i}^{z} = \frac{\frac{J}{k\_{\mathrm{B}}T} \sum\_{j \in \partial i} \widehat{m}\_{j}^{z} + \frac{h}{k\_{\mathrm{B}}T} d\_{i}}{\sqrt{\left(\frac{J}{k\_{\mathrm{B}}T} \sum\_{j \in \partial i} \widehat{m}\_{j}^{z} + \frac{h}{k\_{\mathrm{B}}T} d\_{i}\right)^{2} + \left(\frac{\Gamma}{k\_{\mathrm{B}}T}\right)^{2}}} \tanh\left(\sqrt{\left(\frac{J}{k\_{\mathrm{B}}T} \sum\_{j \in \partial i} \widehat{m}\_{j}^{z} + \frac{h}{k\_{\mathrm{B}}T} d\_{i}\right)^{2} + \left(\frac{\Gamma}{k\_{\mathrm{B}}T}\right)^{2}}\right),\tag{10.313}$$

$$\widehat{m}\_{i}^{x} = \frac{\frac{\Gamma}{k\_{\mathrm{B}}T}}{\sqrt{\left(\frac{J}{k\_{\mathrm{B}}T} \sum\_{j \in \partial i} \widehat{m}\_{j}^{z} + \frac{h}{k\_{\mathrm{B}}T} d\_{i}\right)^{2} + \left(\frac{\Gamma}{k\_{\mathrm{B}}T}\right)^{2}}} \tanh\left(\sqrt{\left(\frac{J}{k\_{\mathrm{B}}T} \sum\_{j \in \partial i} \widehat{m}\_{j}^{z} + \frac{h}{k\_{\mathrm{B}}T} d\_{i}\right)^{2} + \left(\frac{\Gamma}{k\_{\mathrm{B}}T}\right)^{2}}\right).\tag{10.314}$$
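In practice, Eqs. (10.313) and (10.314) are solved by fixed-point iteration. A minimal sketch (the function name and the convergence scheme are our choices, not prescribed by the text):

```python
import numpy as np

def mean_field_magnetizations(adj, d, J, h, Gamma, T, kB=1.0,
                              iters=500, tol=1e-10):
    """Iterate Eqs. (10.313)-(10.314) for the transverse Ising model.

    adj[i] is the neighbor list of node i; d contains the data values d_i.
    Returns the fixed-point magnetizations (m^z, m^x).
    """
    n = len(adj)
    mz = np.zeros(n)
    mx = np.zeros(n)
    beta = 1.0 / (kB * T)
    for _ in range(iters):
        mz_new = np.empty(n)
        mx_new = np.empty(n)
        for i in range(n):
            a = beta * J * sum(mz[j] for j in adj[i]) + beta * h * d[i]
            b = beta * Gamma
            r = np.hypot(a, b)                    # sqrt(a^2 + b^2)
            t = np.tanh(r) / r if r > 0 else 1.0  # tanh(r)/r -> 1 as r -> 0
            mz_new[i] = a * t
            mx_new[i] = b * t
        if max(np.abs(mz_new - mz).max(), np.abs(mx_new - mx).max()) < tol:
            mz, mx = mz_new, mx_new
            break
        mz, mx = mz_new, mx_new
    return mz, mx
```

Note that $(\widehat{m}\_i^z)^2 + (\widehat{m}\_i^x)^2 = \tanh^2(\cdot) \le 1$ at every step, so the iterates stay inside the physical region.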

The mean-field free energy $\mathcal{F}\_{\mathrm{MF}}\big[\widehat{\mathbf{R}}\_1,\widehat{\mathbf{R}}\_2,\cdots,\widehat{\mathbf{R}}\_{|V|}\big]$ of the present system is expressed as

$$\begin{split}
\mathcal{F}\_{\mathrm{MF}}\big[\widehat{\mathbf{R}}\_1,\widehat{\mathbf{R}}\_2,\ldots,\widehat{\mathbf{R}}\_{|V|}\big] &= \sum\_{i\in V}\big(-k\_{\mathrm{B}}T\ln(Z\_i)\big) \\
&= -k\_{\mathrm{B}}T\sum\_{i\in V}\ln\left(\mathrm{Tr}\left[\exp\left(\frac{1}{k\_{\mathrm{B}}T}\left(\left(J\sum\_{j\in\partial i}\big(\mathrm{Tr}(\sigma^z\widehat{\mathbf{R}}\_j)\big) + hd\_i\right)\sigma^z + \Gamma\sigma^x\right)\right)\right]\right) \\
&= -k\_{\mathrm{B}}T\sum\_{i\in V}\ln\left(2\cosh\left(\sqrt{\left(\frac{J}{k\_{\mathrm{B}}T}\sum\_{j\in\partial i}\widehat{m}\_j^z + \frac{h}{k\_{\mathrm{B}}T}d\_i\right)^2 + \left(\frac{\Gamma}{k\_{\mathrm{B}}T}\right)^2}\right)\right).
\end{split}\tag{10.315}$$

Next, we extend the above framework for the mean-field method for the transverse Ising model to the quantum loopy belief propagation method based on the quantum cluster variation method in Ref. [90]. We introduce a $2^N\times 2^N$ trial density matrix $\mathbf{R}$, its $2\times 2$ trial reduced density matrices $\mathbf{R}\_i$ for each node $i(\in V)$, and $4\times 4$ trial reduced density matrices $\mathbf{R}\_{ij}=\mathbf{R}\_{ji}$ for each edge $\{i,j\}(\in E)$ defined by

$$\mathbf{R}\_{ij} = \mathbf{R}\_{ji} = \mathrm{Tr}\_{\backslash\{i,j\}}\,\mathbf{R} = \begin{pmatrix}
\langle +1,+1|\mathbf{R}\_{ij}|+1,+1\rangle & \langle +1,+1|\mathbf{R}\_{ij}|+1,-1\rangle & \langle +1,+1|\mathbf{R}\_{ij}|-1,+1\rangle & \langle +1,+1|\mathbf{R}\_{ij}|-1,-1\rangle \\
\langle +1,-1|\mathbf{R}\_{ij}|+1,+1\rangle & \langle +1,-1|\mathbf{R}\_{ij}|+1,-1\rangle & \langle +1,-1|\mathbf{R}\_{ij}|-1,+1\rangle & \langle +1,-1|\mathbf{R}\_{ij}|-1,-1\rangle \\
\langle -1,+1|\mathbf{R}\_{ij}|+1,+1\rangle & \langle -1,+1|\mathbf{R}\_{ij}|+1,-1\rangle & \langle -1,+1|\mathbf{R}\_{ij}|-1,+1\rangle & \langle -1,+1|\mathbf{R}\_{ij}|-1,-1\rangle \\
\langle -1,-1|\mathbf{R}\_{ij}|+1,+1\rangle & \langle -1,-1|\mathbf{R}\_{ij}|+1,-1\rangle & \langle -1,-1|\mathbf{R}\_{ij}|-1,+1\rangle & \langle -1,-1|\mathbf{R}\_{ij}|-1,-1\rangle
\end{pmatrix} \quad (i\in V,\ j\in V,\ i<j),\tag{10.316}$$

where

$$\begin{split}
\langle s\_i,s\_j|\mathbf{R}\_{ij}|s'\_i,s'\_j\rangle &= \langle s\_i,s\_j|\mathbf{R}\_{ji}|s'\_i,s'\_j\rangle = \langle s\_i,s\_j|\mathrm{Tr}\_{\backslash\{i,j\}}\,\mathbf{R}|s'\_i,s'\_j\rangle \\
&= \sum\_{\tau\_1\in\Omega}\sum\_{\tau\_2\in\Omega}\cdots\sum\_{\tau\_{|V|}\in\Omega}\sum\_{\tau'\_1\in\Omega}\sum\_{\tau'\_2\in\Omega}\cdots\sum\_{\tau'\_{|V|}\in\Omega}\delta\_{s\_i,\tau\_i}\delta\_{s\_j,\tau\_j}\delta\_{s'\_i,\tau'\_i}\delta\_{s'\_j,\tau'\_j}\left(\prod\_{k\in V\backslash\{i,j\}}\delta\_{\tau\_k,\tau'\_k}\right) \\
&\quad\times\langle\tau\_1,\tau\_2,\cdots,\tau\_{|V|}|\mathbf{R}|\tau'\_1,\tau'\_2,\cdots,\tau'\_{|V|}\rangle \\
&\quad (s\_i\in\Omega,\ s\_j\in\Omega,\ s'\_i\in\Omega,\ s'\_j\in\Omega,\ i\in V,\ j\in V,\ i<j).
\end{split}\tag{10.317}$$


By similar arguments to those for Eq. (10.304), we derive

$$\operatorname{Tr}\left(\sigma\_i^z \sigma\_j^z \mathbf{R}\right) = \operatorname{Tr}\left((\sigma^z \otimes \mathbf{I})(\mathbf{I} \otimes \sigma^z)\mathbf{R}\_{ij}\right). \tag{10.318}$$

We now assume that the free energy functional can be expressed as

$$\begin{split}
\mathcal{F}[\mathbf{R}] &= \mathcal{F}\_{\mathrm{Bethe}}\big[\{\mathbf{R}\_i\,|\,i\in V\},\{\mathbf{R}\_{\{i,j\}}\,|\,\{i,j\}\in E\}\big] \\
&\equiv -J\sum\_{\{i,j\}\in E}\mathrm{Tr}\big((\sigma^z\otimes I)(I\otimes\sigma^z)\mathbf{R}\_{\{i,j\}}\big) - h\sum\_{i\in V}d\_i\,\mathrm{Tr}(\sigma^z\mathbf{R}\_i) - \Gamma\sum\_{i\in V}\mathrm{Tr}(\sigma^x\mathbf{R}\_i) \\
&\quad + k\_{\mathrm{B}}T\sum\_{i\in V}\mathrm{Tr}(\mathbf{R}\_i\ln(\mathbf{R}\_i)) + k\_{\mathrm{B}}T\sum\_{\{i,j\}\in E}\big(\mathrm{Tr}\big(\mathbf{R}\_{\{i,j\}}\ln(\mathbf{R}\_{\{i,j\}})\big) - \mathrm{Tr}(\mathbf{R}\_i\ln(\mathbf{R}\_i)) - \mathrm{Tr}(\mathbf{R}\_j\ln(\mathbf{R}\_j))\big) \\
&= -J\sum\_{\{i,j\}\in E}\mathrm{Tr}\big((\sigma^z\otimes I)(I\otimes\sigma^z)\mathbf{R}\_{\{i,j\}}\big) - h\sum\_{i\in V}d\_i\,\mathrm{Tr}(\sigma^z\mathbf{R}\_i) - \Gamma\sum\_{i\in V}\mathrm{Tr}(\sigma^x\mathbf{R}\_i) \\
&\quad + k\_{\mathrm{B}}T\sum\_{\{i,j\}\in E}\mathrm{Tr}\big(\mathbf{R}\_{\{i,j\}}\ln(\mathbf{R}\_{\{i,j\}})\big) + k\_{\mathrm{B}}T\sum\_{i\in V}(1-|\partial i|)\mathrm{Tr}(\mathbf{R}\_i\ln(\mathbf{R}\_i)).
\end{split}\tag{10.319}$$

We define the optimal reduced density matrices $\widehat{\mathbf{R}}\_k$ for each node $k(\in V)$ and $\widehat{\mathbf{R}}\_{\{k,l\}}$ for each edge $\{k,l\}(\in E)$ by

$$\widehat{\mathbf{R}}\_{k} = \arg\operatorname\*{extremum}\_{\mathbf{R}\_{k}}\left\{\mathcal{F}\_{\mathrm{Bethe}}\Big[\mathbf{R}\_{k},\big\{\widehat{\mathbf{R}}\_{i}\,\big|\,i\in V\backslash\{k\}\big\},\big\{\widehat{\mathbf{R}}\_{\{i,j\}}\,\big|\,\{i,j\}\in E\big\}\Big]\ \bigg|\ \mathrm{Tr}\,\mathbf{R}\_{k}=1,\ \mathbf{R}\_{k}=\mathrm{Tr}\_{\backslash k}\widehat{\mathbf{R}}\_{\{k,j\}}\ (j\in\partial k)\right\}\quad(k\in V),\tag{10.320}$$


$$\widehat{\mathbf{R}}\_{\{k,l\}} = \arg\operatorname\*{extremum}\_{\mathbf{R}\_{\{k,l\}}}\left\{\mathcal{F}\_{\mathrm{Bethe}}\Big[\mathbf{R}\_{\{k,l\}},\big\{\widehat{\mathbf{R}}\_{i}\,\big|\,i\in V\big\},\big\{\widehat{\mathbf{R}}\_{\{i,j\}}\,\big|\,\{i,j\}\in E\backslash\{\{k,l\}\}\big\}\Big]\ \bigg|\ \mathrm{Tr}\,\mathbf{R}\_{\{k,l\}}=1,\ \widehat{\mathbf{R}}\_{k}=\mathrm{Tr}\_{\backslash k}\mathbf{R}\_{\{k,l\}},\ \widehat{\mathbf{R}}\_{l}=\mathrm{Tr}\_{\backslash l}\mathbf{R}\_{\{k,l\}}\right\}\quad(\{k,l\}\in E).\tag{10.321}$$

To ensure the constraint conditions, we introduce the Lagrange multipliers as follows:

$$\begin{split}
&\mathcal{L}\big[\{\mathbf{R}\_i\,|\,i\in V\},\{\mathbf{R}\_{ij}\,|\,\{i,j\}\in E\}\big] \\
&\quad= -J\sum\_{\{i,j\}\in E}\mathrm{Tr}\big((\sigma^z\otimes I)(I\otimes\sigma^z)\mathbf{R}\_{ij}\big) - h\sum\_{i\in V}d\_i\,\mathrm{Tr}(\sigma^z\mathbf{R}\_i) - \Gamma\sum\_{i\in V}\mathrm{Tr}(\sigma^x\mathbf{R}\_i) \\
&\qquad + k\_{\mathrm{B}}T\sum\_{\{i,j\}\in E}\mathrm{Tr}\big(\mathbf{R}\_{ij}\ln(\mathbf{R}\_{ij})\big) + k\_{\mathrm{B}}T\sum\_{i\in V}(1-|\partial i|)\mathrm{Tr}(\mathbf{R}\_i\ln(\mathbf{R}\_i)) \\
&\qquad - \sum\_{i\in V}\lambda\_i\big(\mathrm{Tr}\,\mathbf{R}\_i-1\big) - \sum\_{\{i,j\}\in E}\lambda\_{\{i,j\}}\big(\mathrm{Tr}\,\mathbf{R}\_{ij}-1\big) \\
&\qquad - \sum\_{\{i,j\}\in E}\mathrm{Tr}\big(\lambda\_{i,ij}\big(\mathbf{R}\_i-\mathrm{Tr}\_{\backslash i}\mathbf{R}\_{ij}\big)\big) - \sum\_{\{i,j\}\in E}\mathrm{Tr}\big(\lambda\_{j,ij}\big(\mathbf{R}\_j-\mathrm{Tr}\_{\backslash j}\mathbf{R}\_{ij}\big)\big) \\
&\quad= \sum\_{\{i,j\}\in E}\mathrm{Tr}\Big(\mathbf{R}\_{ij}\Big(-J(\sigma^z\otimes I)(I\otimes\sigma^z) + k\_{\mathrm{B}}T\ln(\mathbf{R}\_{ij}) + \lambda\_{i,ij}\otimes I + I\otimes\lambda\_{j,ij} - \lambda\_{\{i,j\}}(I\otimes I)\Big)\Big) \\
&\qquad + \sum\_{i\in V}\mathrm{Tr}\Big(\mathbf{R}\_i\Big(-hd\_i\sigma^z - \Gamma\sigma^x + k\_{\mathrm{B}}T\big(1-|\partial i|\big)\ln(\mathbf{R}\_i) - \sum\_{j\in\partial i}\lambda\_{i,ij} - \lambda\_i I\Big)\Big) \\
&\qquad + \sum\_{i\in V}\lambda\_i + \sum\_{\{i,j\}\in E}\lambda\_{\{i,j\}}.
\end{split}\tag{10.322}$$

Here we remark that $\lambda\_{i,ij} = \lambda\_{i,ji}$ and $\lambda\_{j,ij} = \lambda\_{j,ji}$ $(\{i,j\}\in E,\ i<j)$.


We define the reduced density matrices $\widehat{\mathbf{R}}\_i$ for each node $i(\in V)$ and $\widehat{\mathbf{R}}\_{ij}=\widehat{\mathbf{R}}\_{ji}$ for each edge $\{i,j\}(\in E)$ by

$$\widehat{\mathbf{R}}\_{i} = \arg\operatorname\*{extremum}\_{\mathbf{R}\_{i}}\mathrm{Tr}\left[\mathbf{R}\_{i}\left(-hd\_i\sigma^z - \Gamma\sigma^x - \sum\_{j\in\partial i}\lambda\_{i,ij} + k\_{\mathrm{B}}T\left(1-|\partial i|\right)\ln(\mathbf{R}\_{i}) - \lambda\_i I\right)\right]\quad(i\in V),\tag{10.323}$$

$$\widehat{\mathbf{R}}\_{ij} = \arg\operatorname\*{extremum}\_{\mathbf{R}\_{ij}}\mathrm{Tr}\left[\mathbf{R}\_{ij}\left(-J(\sigma^z\otimes I)(I\otimes\sigma^z) + \lambda\_{i,ij}\otimes I + I\otimes\lambda\_{j,ij} + k\_{\mathrm{B}}T\ln(\mathbf{R}\_{ij}) - \lambda\_{\{i,j\}}(I\otimes I)\right)\right]\quad(\{i,j\}\in E).\tag{10.324}$$

The simultaneous self-consistent equations for the reduced density matrices are expressed as

$$\widehat{\mathbf{R}}\_{i} = \exp\left(-1+\frac{\lambda\_i}{k\_{\mathrm{B}}T}\right)\exp\left(\frac{1}{k\_{\mathrm{B}}T}\left(\frac{1}{|\partial i|-1}\right)\left(-hd\_i\sigma^z-\Gamma\sigma^x-\sum\_{j\in\partial i}\lambda\_{i,ij}\right)\right),\tag{10.325}$$

$$\widehat{\mathbf{R}}\_{ij} = \exp\left(-1+\frac{\lambda\_{\{i,j\}}}{k\_{\mathrm{B}}T}\right)\exp\left(\frac{1}{k\_{\mathrm{B}}T}\left(J(\sigma^z\otimes I)(I\otimes\sigma^z)-\lambda\_{i,ij}\otimes I-I\otimes\lambda\_{j,ij}\right)\right),\tag{10.326}$$

$$\exp\left(1-\frac{\lambda\_i}{k\_{\mathrm{B}}T}\right) = \mathrm{Tr}\left[\exp\left(\frac{1}{k\_{\mathrm{B}}T}\left(\frac{1}{|\partial i|-1}\right)\left(-hd\_i\sigma^z-\Gamma\sigma^x-\sum\_{j\in\partial i}\lambda\_{i,ij}\right)\right)\right],\tag{10.327}$$

$$\exp\left(1-\frac{\lambda\_{\{i,j\}}}{k\_{\mathrm{B}}T}\right) = \mathrm{Tr}\left[\exp\left(\frac{1}{k\_{\mathrm{B}}T}\left(J(\sigma^z\otimes I)(I\otimes\sigma^z)-\lambda\_{i,ij}\otimes I-I\otimes\lambda\_{j,ij}\right)\right)\right].\tag{10.328}$$

By introducing the linear transformations

$$\lambda\_{i,ij} = \lambda\_{i,ji} = -hd\_i\sigma^z - \Gamma\sigma^x - \sum\_{k\in\partial i\backslash\{j\}}\lambda\_{k\to i},\tag{10.329}$$

Equations (10.325) and (10.326) can be rewritten as

$$\widehat{\mathbf{R}}\_{i} = \frac{1}{Z\_{i}} \exp\left(\frac{1}{k\_{\mathrm{B}}T} \left(hd\_{i}\sigma^{z} + \Gamma \sigma^{x} + \sum\_{k \in \partial i} \lambda\_{k \to i}\right)\right),\tag{10.330}$$

$$\widehat{\mathbf{R}}\_{ij} = \frac{1}{Z\_{\{i,j\}}}\exp\bigg(\frac{1}{k\_{\mathrm{B}}T}\Big(J(\sigma^z\otimes I)(I\otimes\sigma^z) + h\big(d\_i(\sigma^z\otimes I) + d\_j(I\otimes\sigma^z)\big) + \Gamma\big(\sigma^x\otimes I + I\otimes\sigma^x\big) + \sum\_{k\in\partial i\backslash\{j\}}\lambda\_{k\to i}\otimes I + \sum\_{l\in\partial j\backslash\{i\}}I\otimes\lambda\_{l\to j}\Big)\bigg),\tag{10.331}$$

$$Z\_i = \text{Tr}\left[\exp\left(\frac{1}{k\_\text{B}T}\left(hd\_i\sigma^z + \Gamma\sigma^x + \sum\_{k \in \partial i} \lambda\_{k \to i}\right)\right)\right],\tag{10.332}$$

$$Z\_{\{i,j\}} = \mathrm{Tr}\bigg[\exp\bigg(\frac{1}{k\_{\mathrm{B}}T}\Big(J(\sigma^z\otimes I)(I\otimes\sigma^z) + h\big(d\_i(\sigma^z\otimes I) + d\_j(I\otimes\sigma^z)\big) + \Gamma\big(\sigma^x\otimes I + I\otimes\sigma^x\big) + \sum\_{k\in\partial i\backslash\{j\}}\lambda\_{k\to i}\otimes I + \sum\_{l\in\partial j\backslash\{i\}}I\otimes\lambda\_{l\to j}\Big)\bigg)\bigg].\tag{10.333}$$

Then, by substituting Eqs. (10.330) and (10.331) into

$$\widehat{\mathbf{R}}\_{i} = \mathrm{Tr}\_{\backslash i}\,\widehat{\mathbf{R}}\_{ij},\tag{10.334}$$

we derive the following simultaneous self-consistent equations for the effective fields:

$$\begin{split}
&\exp\bigg(\frac{1}{k\_{\mathrm{B}}T}\lambda\_{j\to i} + \frac{1}{k\_{\mathrm{B}}T}\Big(hd\_i\sigma^z + \Gamma\sigma^x + \sum\_{k\in\partial i\backslash\{j\}}\lambda\_{k\to i}\Big)\bigg) \\
&\quad= \frac{Z\_i}{Z\_{\{i,j\}}}\mathrm{Tr}\_{\backslash i}\bigg[\exp\bigg(\frac{1}{k\_{\mathrm{B}}T}\Big(J(\sigma^z\otimes I)(I\otimes\sigma^z) + I\otimes\Big(hd\_j\sigma^z + \Gamma\sigma^x + \sum\_{l\in\partial j\backslash\{i\}}\lambda\_{l\to j}\Big)\Big) \\
&\qquad + \frac{1}{k\_{\mathrm{B}}T}\Big(hd\_i\sigma^z + \Gamma\sigma^x + \sum\_{k\in\partial i\backslash\{j\}}\lambda\_{k\to i}\Big)\otimes I\bigg)\bigg],
\end{split}\tag{10.335}$$

such that

$$\begin{split}
\frac{1}{k\_{\mathrm{B}}T}\lambda\_{j\to i} &= -\frac{1}{k\_{\mathrm{B}}T}\Big(hd\_i\sigma^z + \Gamma\sigma^x + \sum\_{k\in\partial i\backslash\{j\}}\lambda\_{k\to i}\Big) \\
&\quad + \ln\bigg(\frac{Z\_i}{Z\_{\{i,j\}}}\mathrm{Tr}\_{\backslash i}\bigg[\exp\bigg(\frac{1}{k\_{\mathrm{B}}T}\Big(J(\sigma^z\otimes I)(I\otimes\sigma^z) + I\otimes\Big(hd\_j\sigma^z + \Gamma\sigma^x + \sum\_{l\in\partial j\backslash\{i\}}\lambda\_{l\to j}\Big)\Big) \\
&\qquad + \frac{1}{k\_{\mathrm{B}}T}\Big(hd\_i\sigma^z + \Gamma\sigma^x + \sum\_{k\in\partial i\backslash\{j\}}\lambda\_{k\to i}\Big)\otimes I\bigg)\bigg]\bigg).
\end{split}\tag{10.336}$$

Note that Eqs. (10.335) and (10.336) can be regarded as the conventional message passing rules of quantum loopy belief propagation. The Bethe free energy $\mathcal{F}\_{\mathrm{Bethe}}\big[\{\widehat{\mathbf{R}}\_i\,|\,i\in V\},\{\widehat{\mathbf{R}}\_{ij}\,|\,\{i,j\}\in E\}\big]$ of the present system is given by

$$\mathcal{F}\_{\mathrm{Bethe}}\big[\{\widehat{\mathbf{R}}\_i\,|\,i\in V\},\{\widehat{\mathbf{R}}\_{ij}\,|\,\{i,j\}\in E\}\big] = \sum\_{i\in V}\big(-k\_{\mathrm{B}}T\ln(Z\_i)\big) + \sum\_{\{i,j\}\in E}\big(-k\_{\mathrm{B}}T\ln(Z\_{\{i,j\}}) + k\_{\mathrm{B}}T\ln(Z\_i) + k\_{\mathrm{B}}T\ln(Z\_j)\big).\tag{10.337}$$

The conventional quantum message passing rules in Eqs. (10.335) and (10.336) reduce to Eqs. (10.95) and (10.96) for the case of $\Gamma = 0$.
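The effective-field update of Eq. (10.336) can be sketched numerically for the $2\times 2$ messages $\lambda\_{j\to i}$ using matrix exponentials and logarithms. In the sketch below, the constants $Z\_i/Z\_{\{i,j\}}$ are absorbed by projecting each message onto its traceless part; this gauge choice, and all function names, are ours rather than prescribed by the text:

```python
import numpy as np
from scipy.linalg import expm, logm

sz = np.array([[1.0, 0.0], [0.0, -1.0]])
sx = np.array([[0.0, 1.0], [1.0, 0.0]])
I2 = np.eye(2)

def tr_second(M):
    """Partial trace over the second spin of a 4x4 operator."""
    return np.trace(M.reshape(2, 2, 2, 2), axis1=1, axis2=3)

def qbp_step(lam, adj, d, J, h, Gamma, beta):
    """One sweep of the effective-field update, Eq. (10.336).

    lam[(j, i)] is the 2x2 message matrix lambda_{j->i}; messages are
    renormalized to be traceless, which absorbs the Z_i/Z_{ij} constants
    (adding a multiple of I to a message is a gauge freedom).
    """
    new = {}
    for (j, i) in lam:
        # cavity fields h d sigma^z + Gamma sigma^x + sum of incoming messages
        cav_i = h * d[i] * sz + Gamma * sx + sum(
            lam[(k, i)] for k in adj[i] if k != j)
        cav_j = h * d[j] * sz + Gamma * sx + sum(
            lam[(l, j)] for l in adj[j] if l != i)
        # pair operator of Eq. (10.335): site i on the first tensor factor
        K = expm(beta * (J * np.kron(sz, sz)
                         + np.kron(cav_i, I2) + np.kron(I2, cav_j)))
        M = tr_second(K)                        # trace out spin j
        lam_ji = logm(M) / beta - cav_i
        lam_ji = lam_ji - 0.5 * np.trace(lam_ji) * I2   # traceless gauge
        new[(j, i)] = np.real(lam_ji)
    return new
```

On a tree (e.g., a chain), these updates converge exactly after a number of sweeps equal to the graph diameter, mirroring classical belief propagation.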

Because we have the orthonormal relationships

$$\begin{cases} \operatorname{Tr}[\sigma^z \sigma^z] = \operatorname{Tr}[\sigma^x \sigma^x] = 2, \\ \operatorname{Tr}[\sigma^z I] = \operatorname{Tr}[I \sigma^z] = \operatorname{Tr}[\sigma^x I] = \operatorname{Tr}[I \sigma^x] = 0, \\ \operatorname{Tr}[\sigma^z \sigma^x] = \operatorname{Tr}[\sigma^x \sigma^z] = 0, \end{cases} \tag{10.338}$$

$$\begin{cases}
\mathrm{Tr}\big[(\sigma^z\otimes I)(\sigma^z\otimes I)\big] = \mathrm{Tr}\big[(\sigma^x\otimes I)(\sigma^x\otimes I)\big] = \mathrm{Tr}\big[(I\otimes\sigma^z)(I\otimes\sigma^z)\big] = \mathrm{Tr}\big[(I\otimes\sigma^x)(I\otimes\sigma^x)\big] = 4, \\
\mathrm{Tr}\big[(\sigma^z\otimes I)(\sigma^x\otimes I)\big] = \mathrm{Tr}\big[(\sigma^x\otimes I)(\sigma^z\otimes I)\big] = \mathrm{Tr}\big[(I\otimes\sigma^x)(I\otimes\sigma^z)\big] = \mathrm{Tr}\big[(I\otimes\sigma^z)(I\otimes\sigma^x)\big] = 0, \\
\mathrm{Tr}\big[(\sigma^z\otimes I)(I\otimes\sigma^z)\big] = \mathrm{Tr}\big[(\sigma^x\otimes I)(I\otimes\sigma^x)\big] = \mathrm{Tr}\big[(\sigma^z\otimes I)(I\otimes\sigma^x)\big] = \mathrm{Tr}\big[(\sigma^x\otimes I)(I\otimes\sigma^z)\big] = 0,
\end{cases}\tag{10.339}$$

the reduced density matrices $\mathcal{R}_i$ and $\mathcal{R}_{ij} = \mathcal{R}_{ji}$ can be expanded in the following orthonormal forms:

$$\mathcal{R}_i = \frac{1}{2}\left(I + m_i^{x}\sigma^{x} + m_i^{z}\sigma^{z}\right), \tag{10.340}$$

$$\begin{split} \mathcal{R}_{ij} &= \mathcal{R}_{ji} \\ &= \frac{1}{4}\Big((I\otimes I) + m_i^{x}(\sigma^{x}\otimes I) + m_i^{z}(\sigma^{z}\otimes I) + m_j^{x}(I\otimes\sigma^{x}) + m_j^{z}(I\otimes\sigma^{z}) \\ &\quad + c_{\{i,j\}}^{zz}(\sigma^{z}\otimes I)(I\otimes\sigma^{z}) + c_{\{i,j\}}^{xz}(\sigma^{x}\otimes I)(I\otimes\sigma^{z}) \\ &\quad + c_{\{i,j\}}^{zx}(\sigma^{z}\otimes I)(I\otimes\sigma^{x}) + c_{\{i,j\}}^{xx}(\sigma^{x}\otimes I)(I\otimes\sigma^{x})\Big), \end{split} \tag{10.341}$$

where

$$\begin{cases} m_i^{\nu} = \operatorname{Tr}[\sigma^{\nu}\mathcal{R}_i] = \operatorname{Tr}\big[(\sigma^{\nu}\otimes I)\mathcal{R}_{ij}\big], \\ m_j^{\nu'} = \operatorname{Tr}[\sigma^{\nu'}\mathcal{R}_j] = \operatorname{Tr}\big[(I\otimes\sigma^{\nu'})\mathcal{R}_{ij}\big], \\ c_{\{i,j\}}^{\nu\nu'} = \operatorname{Tr}\big[(\sigma^{\nu}\otimes I)(I\otimes\sigma^{\nu'})\mathcal{R}_{ij}\big], \end{cases} \quad (\{i,j\}\in E,\ i<j,\ \nu\in\{x,z\},\ \nu'\in\{x,z\}). \tag{10.342}$$
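As a small numerical illustration of the expansions in Eqs. (10.340)-(10.342), the following sketch (assuming NumPy; the two-spin density matrix and all parameter values are illustrative, not the fixed-point solution of the present method) reads off the coefficients by the trace formulas and checks that the single-site coefficients can be computed equivalently from $\mathcal{R}_i$ or from $\mathcal{R}_{ij}$:

```python
import numpy as np

I2 = np.eye(2)
sz = np.array([[1.0, 0.0], [0.0, -1.0]])
sx = np.array([[0.0, 1.0], [1.0, 0.0]])

def expm_sym(A):
    """Matrix exponential of a real symmetric matrix via eigendecomposition."""
    w, v = np.linalg.eigh(A)
    return v @ np.diag(np.exp(w)) @ v.T

# Orthogonality under the trace inner product, Eqs. (10.338)-(10.339).
assert np.trace(sz @ sz) == 2 and np.trace(sz @ sx) == 0
assert np.trace(np.kron(sz, I2) @ np.kron(I2, sz)) == 0

# An illustrative two-spin density matrix (a Gibbs state of a pair energy).
J, Gamma, beta = 1.0, 0.5, 0.8
H_pair = J * np.kron(sz, sz) + Gamma * (np.kron(sx, I2) + np.kron(I2, sx))
R_ij = expm_sym(beta * H_pair)
R_ij /= np.trace(R_ij)

# Single-site reduced density matrix R_i = Tr_j R_ij (partial trace over j).
R_i = np.einsum('abcb->ac', R_ij.reshape(2, 2, 2, 2))

# Expansion coefficients of Eq. (10.342).
m_x = np.trace(sx @ R_i)
m_z = np.trace(sz @ R_i)
c_zz = np.trace(np.kron(sz, I2) @ np.kron(I2, sz) @ R_ij)

# Eq. (10.342): site coefficients agree whether computed from R_i or R_ij.
assert np.isclose(m_x, np.trace(np.kron(sx, I2) @ R_ij))
assert np.isclose(m_z, np.trace(np.kron(sz, I2) @ R_ij))
```

Here the ferromagnetic pair state gives a positive correlation coefficient $c^{zz}$, while $m^z$ vanishes by the spin-flip symmetry of the chosen pair energy.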

By using these orthonormal expansions of the reduced density matrices, the Bethe free energy functional in Eq. (10.319) can be rewritten as

$$\begin{split} \mathcal{F}[\mathcal{R}] &= \mathcal{F}_{\text{Bethe}}\Big[\big\{m_i^{\nu} \,\big|\, i\in V,\ \nu\in\{x,z\}\big\}, \big\{c_{\{i,j\}}^{\nu\nu'} \,\big|\, \{i,j\}\in E,\ \nu\in\{x,z\},\ \nu'\in\{x,z\}\big\}\Big] \\ &\equiv -J\sum_{\{i,j\}\in E} c_{\{i,j\}}^{zz} - h\sum_{i\in V} d_i m_i^{z} - \Gamma\sum_{i\in V} m_i^{x} \\ &\quad + k_{\mathrm{B}}T\sum_{i\in V}(1-|\partial i|)\operatorname{Tr}\big(\mathcal{R}_i\ln(\mathcal{R}_i)\big) + k_{\mathrm{B}}T\sum_{\{i,j\}\in E}\operatorname{Tr}\big(\mathcal{R}_{ij}\ln(\mathcal{R}_{ij})\big). \end{split} \tag{10.343}$$

The extremum conditions

$$\frac{\partial}{\partial m_k^{\nu}}\mathcal{F}_{\text{Bethe}}\Big[\big\{m_i^{\nu} \,\big|\, i\in V,\ \nu\in\{x,z\}\big\}, \big\{c_{\{i,j\}}^{\nu\nu'} \,\big|\, \{i,j\}\in E,\ \nu\in\{x,z\},\ \nu'\in\{x,z\}\big\}\Big] = 0 \quad (k\in V,\ \nu\in\{x,z\}), \tag{10.344}$$

$$\frac{\partial}{\partial c_{\{k,l\}}^{\nu\nu'}}\mathcal{F}_{\text{Bethe}}\Big[\big\{m_i^{\nu} \,\big|\, i\in V,\ \nu\in\{x,z\}\big\}, \big\{c_{\{i,j\}}^{\nu\nu'} \,\big|\, \{i,j\}\in E,\ \nu\in\{x,z\},\ \nu'\in\{x,z\}\big\}\Big] = 0 \quad (\{k,l\}\in E,\ \nu\in\{x,z\},\ \nu'\in\{x,z\}), \tag{10.345}$$

can be reduced to the following simultaneous equations:

$$\begin{cases} \dfrac{h}{k_{\mathrm{B}}T}d_l = \dfrac{1}{2}(1-|\partial l|)\operatorname{Tr}\Big[\sigma^{z}\ln\big(\widehat{\mathcal{R}}_l\big)\Big] + \dfrac{1}{4}\displaystyle\sum_{\{j\in\partial l,\, l<j\}}\operatorname{Tr}\Big[(\sigma^{z}\otimes I)\ln\big(\widehat{\mathcal{R}}_{lj}\big)\Big] + \dfrac{1}{4}\displaystyle\sum_{\{k\in\partial l,\, k<l\}}\operatorname{Tr}\Big[(I\otimes\sigma^{z})\ln\big(\widehat{\mathcal{R}}_{kl}\big)\Big], \\ \dfrac{\Gamma}{k_{\mathrm{B}}T} = \dfrac{1}{2}(1-|\partial l|)\operatorname{Tr}\Big[\sigma^{x}\ln\big(\widehat{\mathcal{R}}_l\big)\Big] + \dfrac{1}{4}\displaystyle\sum_{\{j\in\partial l,\, l<j\}}\operatorname{Tr}\Big[(\sigma^{x}\otimes I)\ln\big(\widehat{\mathcal{R}}_{lj}\big)\Big] + \dfrac{1}{4}\displaystyle\sum_{\{k\in\partial l,\, k<l\}}\operatorname{Tr}\Big[(I\otimes\sigma^{x})\ln\big(\widehat{\mathcal{R}}_{kl}\big)\Big], \end{cases} \quad (l\in V), \tag{10.346}$$


$$\begin{cases} \dfrac{J}{k_{\mathrm{B}}T} = \dfrac{1}{4}\operatorname{Tr}\Big[(\sigma^{z}\otimes I)(I\otimes\sigma^{z})\ln\big(\widehat{\mathcal{R}}_{lj}\big)\Big], \\ 0 = \dfrac{1}{4}\operatorname{Tr}\Big[(\sigma^{z}\otimes I)(I\otimes\sigma^{x})\ln\big(\widehat{\mathcal{R}}_{lj}\big)\Big] = \dfrac{1}{4}\operatorname{Tr}\Big[(\sigma^{x}\otimes I)(I\otimes\sigma^{z})\ln\big(\widehat{\mathcal{R}}_{lj}\big)\Big], \\ 0 = \dfrac{1}{4}\operatorname{Tr}\Big[(\sigma^{x}\otimes I)(I\otimes\sigma^{x})\ln\big(\widehat{\mathcal{R}}_{lj}\big)\Big], \end{cases} \quad (\{l,j\}\in E), \tag{10.347}$$

where

$$\widehat{\mathcal{R}}_i = \frac{1}{2}\left(I + \widehat{m}_i^{x}\sigma^{x} + \widehat{m}_i^{z}\sigma^{z}\right) \quad (i\in V), \tag{10.348}$$

$$\begin{split} \widehat{\mathcal{R}}_{ij} &= \widehat{\mathcal{R}}_{ji} \\ &= \frac{1}{4}\Big((I\otimes I) + \widehat{m}_i^{x}(\sigma^{x}\otimes I) + \widehat{m}_i^{z}(\sigma^{z}\otimes I) + \widehat{m}_j^{x}(I\otimes\sigma^{x}) + \widehat{m}_j^{z}(I\otimes\sigma^{z}) \\ &\quad + \widehat{c}_{\{i,j\}}^{zz}(\sigma^{z}\otimes I)(I\otimes\sigma^{z}) + \widehat{c}_{\{i,j\}}^{xz}(\sigma^{x}\otimes I)(I\otimes\sigma^{z}) \\ &\quad + \widehat{c}_{\{i,j\}}^{zx}(\sigma^{z}\otimes I)(I\otimes\sigma^{x}) + \widehat{c}_{\{i,j\}}^{xx}(\sigma^{x}\otimes I)(I\otimes\sigma^{x})\Big) \quad (\{i,j\}\in E,\ i<j). \end{split} \tag{10.349}$$

For $\Gamma = 0$, Eq. (10.347) with Eqs. (10.348) and (10.349) reduces to Eqs. (10.107) and (10.108) with Eqs. (10.72) and (10.102).

Before concluding the present subsection, we briefly review another framework of advanced quantum mean-field methods. As mentioned above, advanced quantum mean-field methods have also been formulated in momentum space. One familiar formulation is spin-wave theory [91]. A general formulation of the quantum cluster variation method from the viewpoint of spin-wave theory was proposed in Refs. [92, 93].


## *10.5.2 Real-Space Renormalization Group Method for the Transverse Ising Model*

We now present sublinear modeling in statistical machine learning procedures by using the real-space renormalization group method for the transverse Ising model in Eq. (10.301) on the ring graph $(V, E)$ of Eq. (10.176) for the case of $|V| = 2^{L}$ and $h = 0$. The present scheme follows the one in Refs. [37, 94]. Some extensions of the present framework from the ring graph in Eq. (10.176) to higher-dimensional graphs, such as the torus graph, may be available according to the frameworks of Ref. [94].

The important part of the transverse Ising model in Eq. (10.301), $-J(\sigma^{z}\otimes I)(I\otimes\sigma^{z}) - \Gamma(\sigma^{x}\otimes I)$, can be diagonalized as

$$\begin{split} &-J(\sigma^{z}\otimes I)(I\otimes\sigma^{z}) - \Gamma(\sigma^{x}\otimes I) = \begin{pmatrix} -J & 0 & -\Gamma & 0 \\ 0 & J & 0 & -\Gamma \\ -\Gamma & 0 & J & 0 \\ 0 & -\Gamma & 0 & -J \end{pmatrix} \\ &\quad = U\begin{pmatrix} -\sqrt{J^{2}+\Gamma^{2}} & 0 & 0 & 0 \\ 0 & -\sqrt{J^{2}+\Gamma^{2}} & 0 & 0 \\ 0 & 0 & \sqrt{J^{2}+\Gamma^{2}} & 0 \\ 0 & 0 & 0 & \sqrt{J^{2}+\Gamma^{2}} \end{pmatrix}U^{\mathrm{T}}, \\ &U \equiv \begin{pmatrix} \sqrt{\frac{1}{2}\left(1+\frac{J}{\sqrt{J^{2}+\Gamma^{2}}}\right)} & 0 & \sqrt{\frac{1}{2}\left(1-\frac{J}{\sqrt{J^{2}+\Gamma^{2}}}\right)} & 0 \\ 0 & \sqrt{\frac{1}{2}\left(1-\frac{J}{\sqrt{J^{2}+\Gamma^{2}}}\right)} & 0 & \sqrt{\frac{1}{2}\left(1+\frac{J}{\sqrt{J^{2}+\Gamma^{2}}}\right)} \\ \sqrt{\frac{1}{2}\left(1-\frac{J}{\sqrt{J^{2}+\Gamma^{2}}}\right)} & 0 & -\sqrt{\frac{1}{2}\left(1+\frac{J}{\sqrt{J^{2}+\Gamma^{2}}}\right)} & 0 \\ 0 & \sqrt{\frac{1}{2}\left(1+\frac{J}{\sqrt{J^{2}+\Gamma^{2}}}\right)} & 0 & -\sqrt{\frac{1}{2}\left(1-\frac{J}{\sqrt{J^{2}+\Gamma^{2}}}\right)} \end{pmatrix}. \end{split} \tag{10.350}$$

The eigenvalues $\varepsilon_1 = \varepsilon_2 = -\sqrt{J^{2}+\Gamma^{2}}$ and $\varepsilon_3 = \varepsilon_4 = +\sqrt{J^{2}+\Gamma^{2}}$ satisfy the relationship $\varepsilon_1 = \varepsilon_2 < \varepsilon_3 = \varepsilon_4$, and the corresponding eigenvectors are given by

$$|1\rangle = \begin{pmatrix} \sqrt{\frac{1}{2}\left(1+\frac{J}{\sqrt{J^{2}+\Gamma^{2}}}\right)} \\ 0 \\ \sqrt{\frac{1}{2}\left(1-\frac{J}{\sqrt{J^{2}+\Gamma^{2}}}\right)} \\ 0 \end{pmatrix}, \quad |2\rangle = \begin{pmatrix} 0 \\ \sqrt{\frac{1}{2}\left(1-\frac{J}{\sqrt{J^{2}+\Gamma^{2}}}\right)} \\ 0 \\ \sqrt{\frac{1}{2}\left(1+\frac{J}{\sqrt{J^{2}+\Gamma^{2}}}\right)} \end{pmatrix}, \quad |3\rangle = \begin{pmatrix} \sqrt{\frac{1}{2}\left(1-\frac{J}{\sqrt{J^{2}+\Gamma^{2}}}\right)} \\ 0 \\ -\sqrt{\frac{1}{2}\left(1+\frac{J}{\sqrt{J^{2}+\Gamma^{2}}}\right)} \\ 0 \end{pmatrix}, \quad |4\rangle = \begin{pmatrix} 0 \\ \sqrt{\frac{1}{2}\left(1+\frac{J}{\sqrt{J^{2}+\Gamma^{2}}}\right)} \\ 0 \\ -\sqrt{\frac{1}{2}\left(1-\frac{J}{\sqrt{J^{2}+\Gamma^{2}}}\right)} \end{pmatrix}. \tag{10.351}$$
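The diagonalization in Eqs. (10.350) and (10.351) can be checked numerically; the following sketch assumes NumPy, and the values of $J$ and $\Gamma$ are illustrative:

```python
import numpy as np

# Illustrative coupling values; lam = sqrt(J^2 + Gamma^2).
J, Gamma = 1.3, 0.7
lam = np.hypot(J, Gamma)
a = np.sqrt(0.5 * (1 + J / lam))
b = np.sqrt(0.5 * (1 - J / lam))

# Bond matrix -J(sz x I)(I x sz) - Gamma(sx x I) of Eq. (10.350).
M = np.array([[-J, 0.0, -Gamma, 0.0],
              [0.0, J, 0.0, -Gamma],
              [-Gamma, 0.0, J, 0.0],
              [0.0, -Gamma, 0.0, -J]])

# Columns of U are the eigenvectors |1>, |2>, |3>, |4> of Eq. (10.351).
U = np.array([[a, 0.0, b, 0.0],
              [0.0, b, 0.0, a],
              [b, 0.0, -a, 0.0],
              [0.0, a, 0.0, -b]])

assert np.allclose(U.T @ U, np.eye(4))                # orthonormal basis
assert np.allclose(U.T @ M @ U,
                   np.diag([-lam, -lam, lam, lam]))   # Eq. (10.350)
```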

To realize the coarse graining of the present transverse Ising model for the case of zero temperature *T* = 0 for the density matrix *P* in Eq. (10.300), we introduce the following projection operator:

$$\mathbb{P}^{(2^{L})} = \underbrace{\mathbb{P}\otimes\mathbb{P}\otimes\cdots\otimes\mathbb{P}}_{2^{L-1}\ \mathbb{P}\text{s}}, \tag{10.352}$$

where

$$\mathbb{P} = \begin{pmatrix} \langle 1| \\ \langle 2| \end{pmatrix} = \begin{pmatrix} \sqrt{\frac{1}{2}\left(1+\frac{J}{\sqrt{J^{2}+\Gamma^{2}}}\right)} & 0 & \sqrt{\frac{1}{2}\left(1-\frac{J}{\sqrt{J^{2}+\Gamma^{2}}}\right)} & 0 \\ 0 & \sqrt{\frac{1}{2}\left(1-\frac{J}{\sqrt{J^{2}+\Gamma^{2}}}\right)} & 0 & \sqrt{\frac{1}{2}\left(1+\frac{J}{\sqrt{J^{2}+\Gamma^{2}}}\right)} \end{pmatrix}. \tag{10.353}$$
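A minimal numerical check of the coarse-graining operator (10.353), assuming NumPy (the values of $J$ and $\Gamma$ are illustrative):

```python
import numpy as np

J, Gamma = 1.3, 0.7          # illustrative values
lam = np.hypot(J, Gamma)
a = np.sqrt(0.5 * (1 + J / lam))
b = np.sqrt(0.5 * (1 - J / lam))

# The 2x4 coarse-graining operator P of Eq. (10.353); its rows are the
# two lowest eigenvectors |1>, |2> of the pair energy matrix.
P = np.array([[a, 0.0, b, 0.0],
              [0.0, b, 0.0, a]])

M = np.array([[-J, 0.0, -Gamma, 0.0],
              [0.0, J, 0.0, -Gamma],
              [-Gamma, 0.0, J, 0.0],
              [0.0, -Gamma, 0.0, -J]])

assert np.allclose(P @ P.T, np.eye(2))               # rows are orthonormal
assert np.allclose(P @ M @ P.T, -lam * np.eye(2))    # Eq. (10.354)
```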

Because it is valid that

$$\mathbb{P}\begin{pmatrix} -J & 0 & -\Gamma & 0\\ 0 & J & 0 & -\Gamma\\ -\Gamma & 0 & J & 0\\ 0 & -\Gamma & 0 & -J \end{pmatrix} \mathbb{P}^{\mathrm{T}} = -\sqrt{J^{2} + \Gamma^{2}} \begin{pmatrix} \langle 1|1\rangle \ \langle 1|2\rangle\\ \langle 2|1\rangle \ \langle 2|2\rangle \end{pmatrix} = -\sqrt{J^{2} + \Gamma^{2}}I,\quad(10.354)$$

we can derive the following equalities

$$\begin{split} &\mathbb{P}^{(2^{L})}\left(-J\sigma^{z}_{2i-1}\sigma^{z}_{2i} - \Gamma\sigma^{x}_{2i-1}\right)\left(\mathbb{P}^{(2^{L})}\right)^{\mathrm{T}} \\ &= \underbrace{\mathbb{P}(I\otimes I)\mathbb{P}^{\mathrm{T}}\otimes\cdots\otimes\mathbb{P}(I\otimes I)\mathbb{P}^{\mathrm{T}}}_{(i-1)\ \text{factors}}\otimes\mathbb{P}\big(-J(\sigma^{z}\otimes I)(I\otimes\sigma^{z}) - \Gamma(\sigma^{x}\otimes I)\big)\mathbb{P}^{\mathrm{T}}\otimes\underbrace{\mathbb{P}(I\otimes I)\mathbb{P}^{\mathrm{T}}\otimes\cdots\otimes\mathbb{P}(I\otimes I)\mathbb{P}^{\mathrm{T}}}_{(2^{L-1}-i)\ \text{factors}} \\ &= \underbrace{I\otimes I\otimes\cdots\otimes I}_{(i-1)\ I\text{s}}\otimes\left(-\sqrt{J^{2}+\Gamma^{2}}\,I\right)\otimes\underbrace{I\otimes I\otimes\cdots\otimes I}_{(2^{L-1}-i)\ I\text{s}}, \end{split} \tag{10.355}$$

and, together with the identities $\mathbb{P}(I\otimes\sigma^{z})\mathbb{P}^{\mathrm{T}} = \sigma^{z}$, $\mathbb{P}(\sigma^{z}\otimes I)\mathbb{P}^{\mathrm{T}} = \frac{J}{\sqrt{J^{2}+\Gamma^{2}}}\sigma^{z}$, and $\mathbb{P}(I\otimes\sigma^{x})\mathbb{P}^{\mathrm{T}} = \frac{\Gamma}{\sqrt{J^{2}+\Gamma^{2}}}\sigma^{x}$,

$$\begin{split} &\mathbb{P}^{(2^{L})}\left(-J\sigma^{z}_{2i}\sigma^{z}_{2i+1} - \Gamma\sigma^{x}_{2i}\right)\left(\mathbb{P}^{(2^{L})}\right)^{\mathrm{T}} \\ &= \underbrace{I\otimes\cdots\otimes I}_{(i-1)\ I\text{s}}\otimes\left(-J\,\mathbb{P}(I\otimes\sigma^{z})\mathbb{P}^{\mathrm{T}}\otimes\mathbb{P}(\sigma^{z}\otimes I)\mathbb{P}^{\mathrm{T}} - \Gamma\,\mathbb{P}(I\otimes\sigma^{x})\mathbb{P}^{\mathrm{T}}\otimes\mathbb{P}(I\otimes I)\mathbb{P}^{\mathrm{T}}\right)\otimes\underbrace{I\otimes\cdots\otimes I}_{(2^{L-1}-i-1)\ I\text{s}} \\ &= \underbrace{I\otimes\cdots\otimes I}_{(i-1)\ I\text{s}}\otimes\left(-\frac{J^{2}}{\sqrt{J^{2}+\Gamma^{2}}}(\sigma^{z}\otimes I)(I\otimes\sigma^{z}) - \frac{\Gamma^{2}}{\sqrt{J^{2}+\Gamma^{2}}}(\sigma^{x}\otimes I)\right)\otimes\underbrace{I\otimes\cdots\otimes I}_{(2^{L-1}-i-1)\ I\text{s}}, \end{split} \tag{10.356}$$

$$\begin{split} &\mathbb{P}^{(2^{L})}\left(-J\sigma^{z}_{1}\sigma^{z}_{2^{L}} - \Gamma\sigma^{x}_{2^{L}}\right)\left(\mathbb{P}^{(2^{L})}\right)^{\mathrm{T}} \\ &= -\frac{J^{2}}{\sqrt{J^{2}+\Gamma^{2}}}\left(\sigma^{z}\otimes\underbrace{I\otimes I\otimes\cdots\otimes I}_{(2^{L-1}-2)\ I\text{s}}\otimes I\right)\left(I\otimes\underbrace{I\otimes I\otimes\cdots\otimes I}_{(2^{L-1}-2)\ I\text{s}}\otimes\sigma^{z}\right) \\ &\quad - \frac{\Gamma^{2}}{\sqrt{J^{2}+\Gamma^{2}}}\left(I\otimes\underbrace{I\otimes I\otimes\cdots\otimes I}_{(2^{L-1}-2)\ I\text{s}}\otimes\sigma^{x}\right). \end{split} \tag{10.357}$$

By using these equalities, the first step of the renormalized energy matrix $H^{(2^{L-1})} \equiv \mathbb{P}^{(2^{L})} H \big(\mathbb{P}^{(2^{L})}\big)^{\mathrm{T}}$ can be reduced as follows:

$$\begin{split} H^{(2^{L-1})} &= \mathbb{P}^{(2^{L})} H \left(\mathbb{P}^{(2^{L})}\right)^{\mathrm{T}} \\ &= -2^{L-1}\sqrt{J^{2}+\Gamma^{2}}\,\big(\underbrace{I\otimes I\otimes\cdots\otimes I}_{2^{L-1}\ I\text{s}}\big) \\ &\quad - \sum_{i=1}^{2^{L-1}-1}\Big(\underbrace{I\otimes\cdots\otimes I}_{(i-1)\ I\text{s}}\Big)\otimes\left(\frac{J^{2}}{\sqrt{J^{2}+\Gamma^{2}}}(\sigma^{z}\otimes I)(I\otimes\sigma^{z}) + \frac{\Gamma^{2}}{\sqrt{J^{2}+\Gamma^{2}}}(\sigma^{x}\otimes I)\right)\otimes\Big(\underbrace{I\otimes\cdots\otimes I}_{(2^{L-1}-i-1)\ I\text{s}}\Big) \\ &\quad - \frac{J^{2}}{\sqrt{J^{2}+\Gamma^{2}}}\left(\sigma^{z}\otimes\underbrace{I\otimes\cdots\otimes I}_{(2^{L-1}-2)\ I\text{s}}\otimes I\right)\left(I\otimes\underbrace{I\otimes\cdots\otimes I}_{(2^{L-1}-2)\ I\text{s}}\otimes\sigma^{z}\right) \\ &\quad - \frac{\Gamma^{2}}{\sqrt{J^{2}+\Gamma^{2}}}\left(I\otimes\underbrace{I\otimes\cdots\otimes I}_{(2^{L-1}-2)\ I\text{s}}\otimes\sigma^{x}\right). \end{split} \tag{10.358}$$
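As a concrete check of Eq. (10.358), the following sketch (assuming NumPy; $J$, $\Gamma$, and $L = 2$ are illustrative values) builds the transverse Ising energy matrix on a ring of $2^{L}$ spins and verifies that projecting with $\mathbb{P}^{(2^{L})}$ reproduces the renormalized couplings $J^{(1)} = J^{2}/\sqrt{J^{2}+\Gamma^{2}}$ and $\Gamma^{(1)} = \Gamma^{2}/\sqrt{J^{2}+\Gamma^{2}}$:

```python
import numpy as np

I2 = np.eye(2)
sz = np.array([[1.0, 0.0], [0.0, -1.0]])
sx = np.array([[0.0, 1.0], [1.0, 0.0]])
J, Gamma = 1.3, 0.7                      # illustrative values
lam = np.hypot(J, Gamma)

def kron_all(*ops):
    out = np.array([[1.0]])
    for op in ops:
        out = np.kron(out, op)
    return out

def site_op(op, i, n):
    """`op` acting on site i of an n-spin system."""
    return kron_all(*[op if k == i else I2 for k in range(n)])

# Transverse Ising energy matrix on the ring of 2^L spins with h = 0.
L = 2
n = 2 ** L
H = sum(-J * site_op(sz, i, n) @ site_op(sz, (i + 1) % n, n)
        - Gamma * site_op(sx, i, n) for i in range(n))

# Coarse-graining operator P^(2^L) = P x P x ... x P (2^(L-1) factors).
a = np.sqrt(0.5 * (1 + J / lam))
b = np.sqrt(0.5 * (1 - J / lam))
P = np.array([[a, 0.0, b, 0.0],
              [0.0, b, 0.0, a]])
P_full = kron_all(*[P] * (n // 2))

# Renormalized energy matrix predicted by Eq. (10.358) for 2^(L-1) = 2
# sites; on a two-site ring the bulk bond and the boundary bond coincide.
J1, G1 = J ** 2 / lam, Gamma ** 2 / lam
H_pred = (-(n // 2) * lam * np.eye(4)
          - 2 * J1 * kron_all(sz, sz)
          - G1 * (kron_all(sx, I2) + kron_all(I2, sx)))

assert np.allclose(P_full @ H @ P_full.T, H_pred)
```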

By similar arguments to those for the above procedure, the $r$-th step of the renormalized energy matrix $H^{(2^{L-r})} \equiv \mathbb{P}^{(2^{L-r+1})}\cdots\mathbb{P}^{(2^{L})}\, H\, \big(\mathbb{P}^{(2^{L})}\big)^{\mathrm{T}}\cdots\big(\mathbb{P}^{(2^{L-r+1})}\big)^{\mathrm{T}}$ can be reduced to the following recursion formulas:

$$\begin{split} H^{(2^{L-r})} &= \mathbb{P}^{(2^{L-r+1})} H^{(2^{L-r+1})} \left(\mathbb{P}^{(2^{L-r+1})}\right)^{\mathrm{T}} \\ &= 2^{L-r}\varepsilon_1^{(r)}\big(\underbrace{I\otimes I\otimes\cdots\otimes I}_{2^{L-r}\ I\text{s}}\big) \\ &\quad - \sum_{i=1}^{2^{L-r}-1}\Big(\underbrace{I\otimes\cdots\otimes I}_{(i-1)\ I\text{s}}\Big)\otimes\left(J^{(r)}(\sigma^{z}\otimes I)(I\otimes\sigma^{z}) + \Gamma^{(r)}(\sigma^{x}\otimes I)\right)\otimes\Big(\underbrace{I\otimes\cdots\otimes I}_{(2^{L-r}-i-1)\ I\text{s}}\Big) \\ &\quad - J^{(r)}\left(\sigma^{z}\otimes\underbrace{I\otimes\cdots\otimes I}_{(2^{L-r}-2)\ I\text{s}}\otimes I\right)\left(I\otimes\underbrace{I\otimes\cdots\otimes I}_{(2^{L-r}-2)\ I\text{s}}\otimes\sigma^{z}\right) \\ &\quad - \Gamma^{(r)}\left(I\otimes\underbrace{I\otimes\cdots\otimes I}_{(2^{L-r}-2)\ I\text{s}}\otimes\sigma^{x}\right), \end{split} \tag{10.359}$$

where

$$\begin{cases} J^{(r)} = \frac{(J^{(r-1)})^2}{\sqrt{(J^{(r-1)})^2 + (\Gamma^{(r-1)})^2}},\\ \Gamma^{(r)} = \frac{(\Gamma^{(r-1)})^2}{\sqrt{(J^{(r-1)})^2 + (\Gamma^{(r-1)})^2}},\end{cases} \tag{10.360}$$

$$\varepsilon\_1^{(r)} = -\sqrt{(J^{(r-1)})^2 + (\Gamma^{(r-1)})^2},\tag{10.361}$$

$$\begin{cases} H^{(2^L)} \equiv H, \\ J^{(0)} \equiv J, \\ \Gamma^{(0)} \equiv \Gamma. \end{cases} \tag{10.362}$$

The inverse of the real-space renormalization group is given by

$$\begin{cases} J^{(r-1)} = \sqrt{J^{(r)} \left( J^{(r)} + \Gamma^{(r)} \right)},\\ \Gamma^{(r-1)} = \sqrt{\Gamma^{(r)} \left( J^{(r)} + \Gamma^{(r)} \right)}, \end{cases} \tag{10.363}$$
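A minimal sketch of the recursion (10.360) and its inverse (10.363), assuming NumPy (the coupling values are illustrative):

```python
import numpy as np

def rg_step(J, Gamma):
    """One real-space RG step, Eq. (10.360)."""
    lam = np.hypot(J, Gamma)
    return J ** 2 / lam, Gamma ** 2 / lam

def rg_inverse(J, Gamma):
    """Inverse RG step, Eq. (10.363)."""
    return np.sqrt(J * (J + Gamma)), np.sqrt(Gamma * (J + Gamma))

J0, Gamma0 = 1.2, 0.8                    # illustrative couplings
J1, Gamma1 = rg_step(J0, Gamma0)

# The inverse transformation recovers the original couplings exactly,
# since J^(r) + Gamma^(r) = sqrt((J^(r-1))^2 + (Gamma^(r-1))^2).
assert np.allclose(rg_inverse(J1, Gamma1), (J0, Gamma0))

# Iterating the flow: the ratio Gamma/J squares at every step, so for
# J > Gamma the renormalized transverse field vanishes double-exponentially.
J, Gamma = J0, Gamma0
for _ in range(30):
    J, Gamma = rg_step(J, Gamma)
assert Gamma / J < 1e-6
```

The symmetric point $J = \Gamma$ is left invariant in ratio by the flow, which is consistent with the couplings only separating when one of them dominates.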

If the hyperparameters $J^{(r)}$ and $\Gamma^{(r)}$ in the $r$-th renormalized energy matrix $H^{(2^{L-r})}$ have been estimated from given data vectors by using the QEM algorithm for the renormalized density matrix on the ring graph $\big(V^{(r)}, E^{(r)}\big)$, we can estimate the hyperparameters $J^{(0)} = J$ and $\Gamma^{(0)} = \Gamma$ of the transverse Ising model (10.301) on the ring graph $(V, E)$ of Eq. (10.176) for the case of $|V| = 2^{L}$ and $h = 0$ by using the inverse transformation rule (10.363) of the real-space renormalization group procedure.

## *10.5.3 Sublinear Modeling Using a Quantum Adaptive TAP Approach and Momentum Space Renormalization Group in the Transverse Ising Model*

This section proposes a novel scheme for momentum-space renormalization group approaches in **Adaptive Thouless–Anderson–Palmer (TAP) Approaches** for the transverse Ising model on random graphs. The adaptive TAP approach is a familiar advanced mean-field method for probabilistic graphical models, and many extensions have been proposed [95–98]. Furthermore, sublinear modeling for the EM procedure in probabilistic graphical models has been realized by introducing **Momentum Space Renormalization Group Approaches** [99, 100]. The method proposed in this section is formulated by combining the adaptive TAP approaches with the momentum-space renormalization group approaches. Moreover, our method is applicable not only to regular graphs but also to random graphs.

The density matrix *P* in Eq. (10.300) can be rewritten as

$$P = \frac{\exp\left(-\frac{J}{2k_{\mathrm{B}}T}\sum_{\{i,j\}\in E}\left(\sigma_i^{z} - \sigma_j^{z}\right)^{2} - \frac{h}{2k_{\mathrm{B}}T}\sum_{i\in V}\left(\sigma_i^{z} - d_i I^{(2^{|V|})}\right)^{2} - \frac{1}{2k_{\mathrm{B}}T}\sum_{i\in V}\left(\sigma_i^{x} - \Gamma I^{(2^{|V|})}\right)^{2}\right)}{\operatorname{Tr}\left[\exp\left(-\frac{J}{2k_{\mathrm{B}}T}\sum_{\{i,j\}\in E}\left(\sigma_i^{z} - \sigma_j^{z}\right)^{2} - \frac{h}{2k_{\mathrm{B}}T}\sum_{i\in V}\left(\sigma_i^{z} - d_i I^{(2^{|V|})}\right)^{2} - \frac{1}{2k_{\mathrm{B}}T}\sum_{i\in V}\left(\sigma_i^{x} - \Gamma I^{(2^{|V|})}\right)^{2}\right)\right]}. \tag{10.364}$$
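The quadratic rewriting in Eq. (10.364) works because each squared bond term differs from the usual coupling only by a constant, which cancels between numerator and denominator. A minimal numerical check of this identity, assuming NumPy:

```python
import numpy as np

I2 = np.eye(2)
sz = np.array([[1.0, 0.0], [0.0, -1.0]])

# Two sites, one edge: the quadratic bond term in Eq. (10.364) differs
# from the usual -J sz_i sz_j coupling only by a constant shift, because
# (sz_i - sz_j)^2 = 2(I - sz_i sz_j); the shift cancels in the normalized P.
J = 1.4
szi = np.kron(sz, I2)
szj = np.kron(I2, sz)
quad = -0.5 * J * (szi - szj) @ (szi - szj)
lin = J * szi @ szj
assert np.allclose(quad - lin, -J * np.eye(4))
```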

The density matrix $P$ is characterized as the minimizer of the following free energy functional:

$$P = \operatorname*{argmin}_{\mathcal{R}}\left\{\mathcal{F}[\mathcal{R}] \;\middle|\; \operatorname{Tr}\mathcal{R} = 1\right\}, \tag{10.365}$$

$$\begin{split} \mathcal{F}[\mathcal{R}] &\equiv \frac{1}{2}J\sum_{\{i,j\}\in E}\operatorname{Tr}\left[\left(\sigma_i^{z} - \sigma_j^{z}\right)^{2}\mathcal{R}\right] + \frac{1}{2}h\sum_{i\in V}\operatorname{Tr}\left[\left(\sigma_i^{z} - d_i I^{(2^{|V|})}\right)^{2}\mathcal{R}\right] \\ &\quad + \frac{1}{2}\sum_{i\in V}\operatorname{Tr}\left[\left(\sigma_i^{x} - \Gamma I^{(2^{|V|})}\right)^{2}\mathcal{R}\right] + k_{\mathrm{B}}T\operatorname{Tr}[\mathcal{R}\ln(\mathcal{R})]. \end{split} \tag{10.366}$$

Because all the off-diagonal elements of $\left(\sigma_i^{z} - \sigma_j^{z}\right)^{2}$ are zero, we have

$$\begin{split} \mathcal{F}[\mathcal{R}] &= \frac{1}{2}J\sum_{\{i,j\}\in E}\sum_{s_1\in\{-1,+1\}}\sum_{s_2\in\{-1,+1\}}\cdots\sum_{s_{|V|}\in\{-1,+1\}}\langle s_1,s_2,\ldots,s_{|V|}|\left(\sigma_i^{z}-\sigma_j^{z}\right)^{2}|s_1,s_2,\ldots,s_{|V|}\rangle \\ &\qquad\qquad\times\langle s_1,s_2,\ldots,s_{|V|}|\mathcal{R}|s_1,s_2,\ldots,s_{|V|}\rangle \\ &\quad + \frac{1}{2}h\sum_{i\in V}\sum_{s_1\in\{-1,+1\}}\cdots\sum_{s_{|V|}\in\{-1,+1\}}\langle s_1,\ldots,s_{|V|}|\left(\sigma_i^{z}-d_i I^{(2^{|V|})}\right)^{2}|s_1,\ldots,s_{|V|}\rangle\langle s_1,\ldots,s_{|V|}|\mathcal{R}|s_1,\ldots,s_{|V|}\rangle \\ &\quad + \frac{1}{2}\sum_{i\in V}\operatorname{Tr}\left[\left(\sigma_i^{x}-\Gamma I^{(2^{|V|})}\right)^{2}\mathcal{R}\right] + k_{\mathrm{B}}T\operatorname{Tr}[\mathcal{R}\ln(\mathcal{R})] \\ &= \frac{1}{2}J\sum_{\{i,j\}\in E}\sum_{s_1\in\{-1,+1\}}\cdots\sum_{s_{|V|}\in\{-1,+1\}}\left(s_i-s_j\right)^{2}\langle s_1,\ldots,s_{|V|}|\mathcal{R}|s_1,\ldots,s_{|V|}\rangle \\ &\quad + \frac{1}{2}h\sum_{i\in V}\sum_{s_1\in\{-1,+1\}}\cdots\sum_{s_{|V|}\in\{-1,+1\}}\left(s_i-d_i\right)^{2}\langle s_1,\ldots,s_{|V|}|\mathcal{R}|s_1,\ldots,s_{|V|}\rangle \\ &\quad + \frac{1}{2}\sum_{i\in V}\operatorname{Tr}\left[\left(\sigma_i^{x}-\Gamma I^{(2^{|V|})}\right)^{2}\mathcal{R}\right] + k_{\mathrm{B}}T\operatorname{Tr}[\mathcal{R}\ln(\mathcal{R})]. \end{split} \tag{10.367}$$
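The reduction above uses only the fact that $\sigma_i^{z}-\sigma_j^{z}$ is diagonal in the product basis. A minimal two-spin numerical check, assuming NumPy (the matrix $R$ below is an arbitrary normalized positive semidefinite matrix, not a specific model state):

```python
import numpy as np

I2 = np.eye(2)
sz = np.array([[1.0, 0.0], [0.0, -1.0]])

# sigma_i^z - sigma_j^z is diagonal in the product basis, so its quadratic
# expectation only involves the diagonal elements <s|R|s>, as in Eq. (10.367).
szi = np.kron(sz, I2)
szj = np.kron(I2, sz)
D = (szi - szj) @ (szi - szj)

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
R = A @ A.T                      # an arbitrary positive semidefinite matrix
R /= np.trace(R)                 # normalized like a density matrix

lhs = np.trace(D @ R)
s = np.array([1.0, -1.0])        # eigenvalues s_i of sigma^z
rhs = sum((s[a] - s[b]) ** 2 * R[2 * a + b, 2 * a + b]
          for a in range(2) for b in range(2))
assert np.isclose(lhs, rhs)
```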

By introducing the reduced density matrix *Ri* in Eq. (10.275) and

$$\rho\left(s_1, s_2, \ldots, s_{|V|}\right) \equiv \langle s_1, s_2, \ldots, s_{|V|} | \mathcal{R} | s_1, s_2, \ldots, s_{|V|} \rangle, \tag{10.368}$$

and by extending $\rho\left(s_1, s_2, \ldots, s_{|V|}\right)$ to

$$\rho(\boldsymbol{\phi}) = \rho\left(\phi\_1, \phi\_2, \dots, \phi\_{|V|}\right) \left(\boldsymbol{\phi} = \left(\phi\_1, \phi\_2, \dots, \phi\_{|V|}\right) \in (-\infty, +\infty)^{|V|}\right), \qquad (10.369)$$

the free energy functional can be expressed as

$$\begin{split} \mathcal{F}[\mathcal{R}] &= \frac{1}{2}J\sum_{\{i,j\}\in E}\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty}\cdots\int_{-\infty}^{+\infty}\left(\prod_{k\in V}\{\delta(\phi_k - 1) + \delta(\phi_k + 1)\}\right)\left(\phi_i - \phi_j\right)^{2}\rho(\boldsymbol{\phi})\,d\phi_1 d\phi_2\cdots d\phi_{|V|} \\ &\quad + \frac{1}{2}h\sum_{i\in V}\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty}\cdots\int_{-\infty}^{+\infty}\left(\prod_{k\in V}\{\delta(\phi_k - 1) + \delta(\phi_k + 1)\}\right)\left(\phi_i - d_i\right)^{2}\rho(\boldsymbol{\phi})\,d\phi_1 d\phi_2\cdots d\phi_{|V|} \\ &\quad + \frac{1}{2}\sum_{i\in V}\operatorname{Tr}\left[\left(\sigma_i^{x} - \Gamma I^{(2^{|V|})}\right)^{2}\mathcal{R}_i\right] + k_{\mathrm{B}}T\operatorname{Tr}[\mathcal{R}\ln(\mathcal{R})]. \end{split} \tag{10.370}$$

Now we consider the following approximate free energy:

$$\begin{split} \mathcal{F}_{\text{Adaptive TAP}}\left[\rho, \{\mathcal{R}_i, \rho_i \mid i\in V\}\right] &= \frac{1}{2}J\sum_{\{i,j\}\in E}\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty}\cdots\int_{-\infty}^{+\infty}\left(\phi_i - \phi_j\right)^{2}\rho(\boldsymbol{\phi})\,d\phi_1 d\phi_2\cdots d\phi_{|V|} \\ &\quad + \frac{1}{2}h\sum_{i\in V}\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty}\cdots\int_{-\infty}^{+\infty}\left(\phi_i - d_i\right)^{2}\rho(\boldsymbol{\phi})\,d\phi_1 d\phi_2\cdots d\phi_{|V|} \\ &\quad + \frac{1}{2}\sum_{i\in V}\operatorname{Tr}\left[\left(\sigma_i^{x} - \Gamma I^{(2^{|V|})}\right)^{2}\mathcal{R}_i\right] \\ &\quad + k_{\mathrm{B}}T\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty}\cdots\int_{-\infty}^{+\infty}\rho(\boldsymbol{\phi})\ln\left(\rho(\boldsymbol{\phi})\right)d\phi_1 d\phi_2\cdots d\phi_{|V|} \\ &\quad + k_{\mathrm{B}}T\sum_{i\in V}\left(\operatorname{Tr}[\mathcal{R}_i\ln(\mathcal{R}_i)] - \int_{-\infty}^{+\infty}\rho_i(\phi_i)\ln\left(\rho_i(\phi_i)\right)d\phi_i\right), \end{split} \tag{10.371}$$

where

$$\rho\_i(\phi\_i) \equiv \int\_{-\infty}^{+\infty} \int\_{-\infty}^{+\infty} \cdots \int\_{-\infty}^{+\infty} \delta(\phi\_i - \phi\_i') \rho\left(\phi\_1', \phi\_2', \dots, \phi\_{|V|}'\right) d\phi\_1' d\phi\_2' \cdots d\phi\_{|V|}'$$

$$(i \in V, \phi\_i \in (-\infty, +\infty)). \quad (10.372)$$

The reduced density matrix $\mathcal{R}_i$ and the marginal probability density functions $\rho_i(\phi_i)$ and $\rho(\boldsymbol{\phi})$ need to satisfy the consistency conditions


$$\begin{cases} \displaystyle\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty}\cdots\int_{-\infty}^{+\infty}\phi_i\,\rho(\boldsymbol{\phi})\,d\phi_1 d\phi_2\cdots d\phi_{|V|} = \int_{-\infty}^{+\infty}\phi_i\,\rho_i(\phi_i)\,d\phi_i = \operatorname{Tr}[\sigma^{z}\mathcal{R}_i]\ \ (i\in V), \\ \displaystyle\sum_{i\in V}\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty}\cdots\int_{-\infty}^{+\infty}\phi_i^{2}\,\rho(\boldsymbol{\phi})\,d\phi_1 d\phi_2\cdots d\phi_{|V|} = \sum_{i\in V}\int_{-\infty}^{+\infty}\phi_i^{2}\,\rho_i(\phi_i)\,d\phi_i = |V|, \end{cases} \tag{10.373}$$

and the normalizations

$$\begin{cases} \int\_{-\infty}^{+\infty} \int\_{-\infty}^{+\infty} \cdots \int\_{-\infty}^{+\infty} \rho(\Phi) d\phi\_1 d\phi\_2 \cdots d\phi\_{|V|} = 1, \\\int\_{-\infty}^{+\infty} \rho\_i(\phi\_i) d\phi\_i = 1 \ (i \in V). \end{cases} \tag{10.374}$$

$\mathcal{R}_i$, $\rho_i(\phi_i)$, and $\rho(\boldsymbol{\phi})$ are determined so as to minimize the approximate free energy $\mathcal{F}_{\text{Adaptive TAP}}\left[\rho, \{\mathcal{R}_i, \rho_i \mid i\in V\}\right]$ under the constraint conditions in Eqs. (10.373) and (10.374). We introduce the Lagrange multipliers $\boldsymbol{f} = \left(f_1, f_2, \ldots, f_{|V|}\right)^{\mathrm{T}}$ and $\boldsymbol{g} = \left(g_1, g_2, \ldots, g_{|V|}\right)^{\mathrm{T}}$, as well as $D$, $L$, $\lambda$, and $\lambda_i$, to ensure the constraint conditions in Eqs. (10.373) and (10.374), as follows:

$$\begin{split} \mathcal{L}_{\text{Adaptive TAP}}\left[\rho, \{\mathcal{R}_i, \rho_i \mid i\in V\}\right] &\equiv \mathcal{F}_{\text{Adaptive TAP}}\left[\rho, \{\mathcal{R}_i, \rho_i \mid i\in V\}\right] \\ &\quad - \sum_{i\in V}g_i\left(\int_{-\infty}^{+\infty}\cdots\int_{-\infty}^{+\infty}\phi_i\,\rho(\boldsymbol{\phi})\,d\phi_1\cdots d\phi_{|V|} - \int_{-\infty}^{+\infty}\phi_i\,\rho_i(\phi_i)\,d\phi_i\right) \\ &\quad - \sum_{i\in V}f_i\left(\int_{-\infty}^{+\infty}\cdots\int_{-\infty}^{+\infty}\phi_i\,\rho(\boldsymbol{\phi})\,d\phi_1\cdots d\phi_{|V|} - \operatorname{Tr}[\sigma^{z}\mathcal{R}_i]\right) \\ &\quad - D\left(\sum_{i\in V}\int_{-\infty}^{+\infty}\cdots\int_{-\infty}^{+\infty}\phi_i^{2}\,\rho(\boldsymbol{\phi})\,d\phi_1\cdots d\phi_{|V|} - |V|\right) \\ &\quad - L\left(\sum_{i\in V}\int_{-\infty}^{+\infty}\phi_i^{2}\,\rho_i(\phi_i)\,d\phi_i - |V|\right) \\ &\quad - \lambda\left(\int_{-\infty}^{+\infty}\cdots\int_{-\infty}^{+\infty}\rho(\boldsymbol{\phi})\,d\phi_1\cdots d\phi_{|V|} - 1\right) - \sum_{i\in V}\lambda_i\left(\int_{-\infty}^{+\infty}\rho_i(\phi_i)\,d\phi_i - 1\right). \end{split} \tag{10.375}$$

By taking the first variation of $\mathcal{L}\_{\text{Adaptive TAP}}\big[\rho, \{R\_i, \rho\_i \,|\, i \in V\}\big]$ with respect to the marginals, we can derive the approximate expressions of $\widehat{R}\_i$, $\widehat{\rho}\_i(\phi\_i)$, and $\widehat{\rho}(\boldsymbol{\phi})$ as follows:

$$\widehat{R}\_i = \frac{\exp\left(\frac{1}{k\_{\rm B}T} \left(f\_i\sigma^{z} + \Gamma \sigma^{x}\right)\right)}{\mathrm{Tr}\exp\left(\frac{1}{k\_{\rm B}T} \left(f\_i\sigma^{z} + \Gamma \sigma^{x}\right)\right)} \qquad (i \in V), \tag{10.376}$$

$$\widehat{\rho}(\boldsymbol{\phi}) = \frac{\exp\left(\frac{1}{k\_{\rm B}T}\left(-\frac{1}{2}D\sum\_{i\in V}\phi\_i^2 + \sum\_{i\in V}(f\_i+g\_i)\phi\_i - \frac{1}{2}h\sum\_{i\in V}(\phi\_i-d\_i)^2 - \frac{1}{2}J\sum\_{\{i,j\}\in E}(\phi\_i-\phi\_j)^2\right)\right)}{\int\_{-\infty}^{+\infty}\cdots\int\_{-\infty}^{+\infty}\exp\left(\frac{1}{k\_{\rm B}T}\left(-\frac{1}{2}D\sum\_{i\in V}\phi\_i^2 + \sum\_{i\in V}(f\_i+g\_i)\phi\_i - \frac{1}{2}h\sum\_{i\in V}(\phi\_i-d\_i)^2 - \frac{1}{2}J\sum\_{\{i,j\}\in E}(\phi\_i-\phi\_j)^2\right)\right) d\phi\_1 d\phi\_2 \cdots d\phi\_{|V|}}, \tag{10.377}$$

$$\widehat{\rho}\_i(\phi\_i) = \frac{\exp\left(\frac{1}{k\_{\rm B}T}\left(-\frac{1}{2}L\phi\_i^{2} + g\_i\phi\_i - \frac{1}{2}h\left(\phi\_i - d\_i\right)^2\right)\right)}{\int\_{-\infty}^{+\infty} \exp\left(\frac{1}{k\_{\rm B}T}\left(-\frac{1}{2}L\phi\_i^{2} + g\_i\phi\_i - \frac{1}{2}h\left(\phi\_i - d\_i\right)^2\right)\right)d\phi\_i} \quad (i \in V), \tag{10.378}$$

where *<sup>C</sup>* is the <sup>|</sup>*V*|×|*V*<sup>|</sup> matrix in which the (*i*, *<sup>j</sup>*)-elements are defined by

$$C\_{ij} \equiv \begin{cases} |\partial i| \ (i=j), \\ -1 \ (\{i,j\} \in E), \\ 0 \ \text{(otherwise)}, \end{cases} \tag{10.379}$$

for any nodes $i(\in V)$ and $j(\in V)$. Equations (10.376), (10.377), and (10.378) can be rewritten as

$$
\widehat{R}\_{i} = \frac{1}{2\cosh\left(\frac{1}{k\_{\rm B}T}\sqrt{f\_{i}^{2} + \Gamma^{2}}\right)} \frac{f\_i + \sqrt{f\_{i}^{2} + \Gamma^{2}}}{2\sqrt{f\_{i}^{2} + \Gamma^{2}}} \begin{pmatrix} 1 & -\frac{\Gamma}{f\_{i} + \sqrt{f\_{i}^{2} + \Gamma^{2}}}\\ +\frac{\Gamma}{f\_{i} + \sqrt{f\_{i}^{2} + \Gamma^{2}}} & 1 \end{pmatrix}
$$

$$
\times \begin{pmatrix} \exp\left(+\frac{1}{k\_{\rm B}T}\sqrt{f\_{i}^{2} + \Gamma^{2}}\right) & 0\\ 0 & \exp\left(-\frac{1}{k\_{\rm B}T}\sqrt{f\_{i}^{2} + \Gamma^{2}}\right) \end{pmatrix} \begin{pmatrix} 1 & +\frac{\Gamma}{f\_{i} + \sqrt{f\_{i}^{2} + \Gamma^{2}}}\\ -\frac{\Gamma}{f\_{i} + \sqrt{f\_{i}^{2} + \Gamma^{2}}} & 1 \end{pmatrix}. \tag{10.380}
$$
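Equation (10.380) packages the spectral decomposition of the $2\times 2$ matrix in Eq. (10.376). As a quick numerical sanity check (the parameter values below are illustrative, not taken from the text), the magnetization $\mathrm{Tr}\,\sigma^z\widehat{R}\_i$ obtained from a direct matrix exponential agrees with the closed form $\frac{f\_i}{\sqrt{f\_i^2+\Gamma^2}}\tanh\big(\frac{1}{k\_{\rm B}T}\sqrt{f\_i^2+\Gamma^2}\big)$ that appears later in Eq. (10.384):

```python
import numpy as np
from scipy.linalg import expm

# Illustrative effective field f, transverse field G, and temperature kT.
f, G, kT = 0.7, 0.4, 1.2
sz = np.diag([1.0, -1.0])
sx = np.array([[0.0, 1.0], [1.0, 0.0]])

# Density matrix of Eq. (10.376): exp((f sz + G sx)/kT), normalized by its trace.
E = expm((f * sz + G * sx) / kT)
R = E / np.trace(E)
m = np.trace(sz @ R)

# Closed form implied by the spectral decomposition (10.380).
r = np.sqrt(f * f + G * G)
m_closed = (f / r) * np.tanh(r / kT)
assert abs(m - m_closed) < 1e-10
```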

$$\begin{split} \widehat{\rho}(\boldsymbol{\phi}) &= \sqrt{\frac{\det\left( (h+D)I^{(|V|)} + JC \right)}{(2\pi k\_{\rm B}T)^{|V|}}} \\ &\times \exp\left( -\frac{1}{2k\_{\rm B}T} \left( \boldsymbol{\phi} - \left( (h+D)I^{(|V|)} + JC \right)^{-1} (\boldsymbol{f}+\boldsymbol{g}+h\boldsymbol{d}) \right)^{\mathrm{T}} \left( (h+D)I^{(|V|)} + JC \right) \right.\\ &\qquad\qquad \left.\times \left( \boldsymbol{\phi} - \left( (h+D)I^{(|V|)} + JC \right)^{-1} (\boldsymbol{f}+\boldsymbol{g}+h\boldsymbol{d}) \right) \right), \end{split} \tag{10.381}$$

$$\widehat{\rho}\_i(\phi\_i) = \sqrt{\frac{h+L}{2\pi k\_{\rm B}T}} \exp\left(-\frac{1}{2k\_{\rm B}T}(h+L)\left(\phi\_i - \frac{g\_i + hd\_i}{h+L}\right)^2\right) \quad (i \in V). \tag{10.382}$$
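Equations (10.381) and (10.382) are ordinary Gaussians, so they can be instantiated directly. The sketch below (a toy 3-node path graph with illustrative parameter values) builds the mean vector and covariance matrix of $\widehat{\rho}(\boldsymbol{\phi})$ and checks that the covariance is symmetric positive definite:

```python
import numpy as np

# Toy instance of the Gaussian marginal (10.381): a 3-node path graph.
# All parameter values below are illustrative.
V = 3
E = [(0, 1), (1, 2)]
C = np.zeros((V, V))
for i, j in E:  # C as in Eq. (10.379): degree on the diagonal, -1 on edges
    C[i, i] += 1.0; C[j, j] += 1.0
    C[i, j] -= 1.0; C[j, i] -= 1.0

h, D, J, kT = 0.5, 1.0, 0.8, 1.0
f = np.array([0.2, -0.1, 0.3])
g = np.zeros(V)
d = np.array([1.0, 0.0, -1.0])

A = (h + D) * np.eye(V) + J * C           # precision-like matrix in (10.381)
mean = np.linalg.solve(A, f + g + h * d)  # Gaussian mean
cov = kT * np.linalg.inv(A)               # Gaussian covariance

assert np.allclose(cov, cov.T)
assert np.all(np.linalg.eigvalsh(cov) > 0)
```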

The Lagrange multipliers *f* , *g*, *L*, and *D* are often referred to as the **effective fields** and are determined so as to satisfy the consistencies in Eq. (10.373), which reduce to the following simultaneous equations:

$$\boldsymbol{g} + h\boldsymbol{d} = (h+L)\left((D-L)I^{(|V|)} + JC\right)^{-1}\boldsymbol{f}, \tag{10.383}$$

$$\boldsymbol{f}+\boldsymbol{g}+h\boldsymbol{d} = \left((D-L)I^{(|V|)} + JC\right)^{-1} \begin{pmatrix} \frac{f\_1}{\sqrt{f\_1^2+\Gamma^2}}\tanh\left(\frac{1}{k\_{\rm B}T}\sqrt{f\_1^2+\Gamma^2}\right)\\ \frac{f\_2}{\sqrt{f\_2^2+\Gamma^2}}\tanh\left(\frac{1}{k\_{\rm B}T}\sqrt{f\_2^2+\Gamma^2}\right)\\ \vdots\\ \frac{f\_{|V|}}{\sqrt{f\_{|V|}^2+\Gamma^2}}\tanh\left(\frac{1}{k\_{\rm B}T}\sqrt{f\_{|V|}^2+\Gamma^2}\right) \end{pmatrix}, \tag{10.384}$$

$$L = -h + \frac{1}{2} + \sqrt{\frac{1}{4} + \frac{1}{|V|}(\boldsymbol{f} + \boldsymbol{g} + h\boldsymbol{d})^{\mathrm{T}}(\boldsymbol{f} + \boldsymbol{g} + h\boldsymbol{d})}, \tag{10.385}$$

$$\frac{1}{\frac{1}{2} + \sqrt{\frac{1}{4} + \frac{1}{|V|}(\boldsymbol{f} + \boldsymbol{g} + h\boldsymbol{d})^{\mathrm{T}}(\boldsymbol{f} + \boldsymbol{g} + h\boldsymbol{d})}} = \frac{1}{|V|} \mathrm{Tr}\left[ \left( (h + D)I^{(|V|)} + JC \right)^{-1} \right]. \tag{10.386}$$

The real symmetric matrix *C* is diagonalized as

$$\mathcal{C} = U\Lambda U^{-1},\tag{10.387}$$

$$\Lambda \equiv \begin{pmatrix} \lambda\_1 & 0 & 0 & \cdots & 0 \\ 0 & \lambda\_2 & 0 & \cdots & 0 \\ 0 & 0 & \lambda\_3 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \lambda\_{|V|} \end{pmatrix}, \tag{10.388}$$

where $\lambda\_1 \geq \lambda\_2 \geq \lambda\_3 \geq \cdots \geq \lambda\_{|V|}$. All the eigenvalues $\lambda\_1, \lambda\_2, \cdots, \lambda\_{|V|}$ are always real numbers. For the eigenvector $\boldsymbol{u}\_i = \begin{pmatrix} U\_{1i}\\ U\_{2i}\\ \vdots\\ U\_{|V|i}\end{pmatrix}$ corresponding to the eigenvalue $\lambda\_i$, such that $C\boldsymbol{u}\_i = \lambda\_i \boldsymbol{u}\_i$ for every $i \in \{1, 2, 3, \cdots, |V|\}$, the matrix $U$ is defined by

$$U \equiv (\boldsymbol{u}\_1, \boldsymbol{u}\_2, \boldsymbol{u}\_3, \cdots, \boldsymbol{u}\_{|V|}) = \begin{pmatrix} U\_{11} & U\_{12} & U\_{13} & \cdots & U\_{1|V|} \\ U\_{21} & U\_{22} & U\_{23} & \cdots & U\_{2|V|} \\ U\_{31} & U\_{32} & U\_{33} & \cdots & U\_{3|V|} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ U\_{|V|1} & U\_{|V|2} & U\_{|V|3} & \cdots & U\_{|V||V|} \end{pmatrix}. \tag{10.389}$$
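The diagonalization in Eqs. (10.387)-(10.389) can be reproduced with a standard symmetric eigensolver. The sketch below (the graph is illustrative) builds $C$ from Eq. (10.379) and verifies $U^{-1} = U^{\mathrm{T}}$ and $C = U\Lambda U^{\mathrm{T}}$:

```python
import numpy as np

# Build the |V| x |V| matrix C of Eq. (10.379) for a small illustrative graph:
# C_ii = |∂i| (degree of node i), C_ij = -1 for {i,j} in E, 0 otherwise.
V = 4
E = [(0, 1), (1, 2), (2, 3), (0, 2)]
C = np.zeros((V, V))
for i, j in E:
    C[i, i] += 1.0; C[j, j] += 1.0
    C[i, j] -= 1.0; C[j, i] -= 1.0

# Diagonalize as in Eqs. (10.387)-(10.389); eigh returns real eigenvalues and
# orthonormal eigenvectors for a real symmetric matrix.  (eigh orders the
# eigenvalues in ascending order; reverse the columns for the descending
# convention used in the text.)
lam, U = np.linalg.eigh(C)

assert np.allclose(U @ U.T, np.eye(V))            # U^{-1} = U^T
assert np.allclose(U @ np.diag(lam) @ U.T, C)     # C = U Λ U^T
```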

It is known that $U$ is a unitary matrix that satisfies $U^{-1} = U^{\mathrm{T}}$ for the real symmetric matrix $C$. By using the diagonal matrix $\Lambda$ and the unitary matrix $U$, the density matrix $R$ in Eq. (10.390) can be represented as follows:

$$R = \frac{\exp\left(-\frac{1}{2k\_{\rm B}T}\boldsymbol{\xi}^{\mathrm{T}}\left((hI^{(|V|)} + J\Lambda)\otimes I^{(2^{|V|})}\right)\boldsymbol{\xi} - \frac{1}{2k\_{\rm B}T}\boldsymbol{\zeta}^{\mathrm{T}}\boldsymbol{\zeta}\right)}{\mathrm{Tr}\left[\exp\left(-\frac{1}{2k\_{\rm B}T}\boldsymbol{\xi}^{\mathrm{T}}\left((hI^{(|V|)} + J\Lambda)\otimes I^{(2^{|V|})}\right)\boldsymbol{\xi} - \frac{1}{2k\_{\rm B}T}\boldsymbol{\zeta}^{\mathrm{T}}\boldsymbol{\zeta}\right)\right]}, \tag{10.390}$$

where

$$\boldsymbol{\xi} = \begin{pmatrix} \xi\_1 \\ \xi\_2 \\ \vdots \\ \xi\_{|V|} \end{pmatrix} \equiv \left( U^{\mathrm{T}} \otimes I^{(2^{|V|})} \right) \begin{pmatrix} \sigma\_1^z \\ \sigma\_2^z \\ \vdots \\ \sigma\_{|V|}^z \end{pmatrix} - \left( \left( hI^{(|V|)} + J\Lambda \right)^{-1} U^{\mathrm{T}} \begin{pmatrix} d\_1 \\ d\_2 \\ \vdots \\ d\_{|V|} \end{pmatrix} \right) \otimes I^{(2^{|V|})}, \tag{10.391}$$

$$\boldsymbol{\zeta} = \begin{pmatrix} \zeta\_1 \\ \zeta\_2 \\ \vdots \\ \zeta\_{|V|} \end{pmatrix} \equiv \left( U^{\mathrm{T}} \otimes I^{(2^{|V|})} \right) \begin{pmatrix} \sigma\_1^x \\ \sigma\_2^x \\ \vdots \\ \sigma\_{|V|}^x \end{pmatrix} - \gamma \begin{pmatrix} I^{(2^{|V|})} \\ I^{(2^{|V|})} \\ \vdots \\ I^{(2^{|V|})} \end{pmatrix}. \tag{10.392}$$

By using the Gram-Schmidt orthonormalization in the framework of Fig. 10.17, we introduce a new unitary matrix

$$
\widetilde{U} = \begin{pmatrix}
\widetilde{U}\_{11} & \widetilde{U}\_{12} & \widetilde{U}\_{13} & \cdots & \widetilde{U}\_{1|\widetilde{V}|} \\
\widetilde{U}\_{21} & \widetilde{U}\_{22} & \widetilde{U}\_{23} & \cdots & \widetilde{U}\_{2|\widetilde{V}|} \\
\widetilde{U}\_{31} & \widetilde{U}\_{32} & \widetilde{U}\_{33} & \cdots & \widetilde{U}\_{3|\widetilde{V}|} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\widetilde{U}\_{|\widetilde{V}|1} & \widetilde{U}\_{|\widetilde{V}|2} & \widetilde{U}\_{|\widetilde{V}|3} & \cdots & \widetilde{U}\_{|\widetilde{V}|,|\widetilde{V}|}
\end{pmatrix} \equiv \left(\widetilde{u}\_{1}, \widetilde{u}\_{2}, \widetilde{u}\_{3}, \cdots, \widetilde{u}\_{|\widetilde{V}|}\right),
\tag{10.393}
$$

where

**Fig. 10.17** Momentum space renormalization group for graphical models on random graphs

$$\boldsymbol{v}\_1 = \begin{pmatrix} U\_{11} \\ U\_{21} \\ \vdots \\ U\_{|\widetilde{V}|1} \end{pmatrix},\quad \boldsymbol{v}\_2 = \begin{pmatrix} U\_{12} \\ U\_{22} \\ \vdots \\ U\_{|\widetilde{V}|2} \end{pmatrix},\quad \cdots,\quad \boldsymbol{v}\_{|\widetilde{V}|} = \begin{pmatrix} U\_{1|\widetilde{V}|} \\ U\_{2|\widetilde{V}|} \\ \vdots \\ U\_{|\widetilde{V}||\widetilde{V}|} \end{pmatrix}, \tag{10.394}$$

$$\begin{cases} \boldsymbol{u}\_1' = \boldsymbol{v}\_1, & \widetilde{\boldsymbol{u}}\_1 = \dfrac{\boldsymbol{u}\_1'}{\sqrt{\boldsymbol{u}\_1'^{\mathrm{T}}\boldsymbol{u}\_1'}},\\ \boldsymbol{u}\_2' = \boldsymbol{v}\_2 - \dfrac{\boldsymbol{u}\_1'^{\mathrm{T}}\boldsymbol{v}\_2}{\boldsymbol{u}\_1'^{\mathrm{T}}\boldsymbol{u}\_1'}\boldsymbol{u}\_1', & \widetilde{\boldsymbol{u}}\_2 = \dfrac{\boldsymbol{u}\_2'}{\sqrt{\boldsymbol{u}\_2'^{\mathrm{T}}\boldsymbol{u}\_2'}},\\ \boldsymbol{u}\_3' = \boldsymbol{v}\_3 - \dfrac{\boldsymbol{u}\_1'^{\mathrm{T}}\boldsymbol{v}\_3}{\boldsymbol{u}\_1'^{\mathrm{T}}\boldsymbol{u}\_1'}\boldsymbol{u}\_1' - \dfrac{\boldsymbol{u}\_2'^{\mathrm{T}}\boldsymbol{v}\_3}{\boldsymbol{u}\_2'^{\mathrm{T}}\boldsymbol{u}\_2'}\boldsymbol{u}\_2', & \widetilde{\boldsymbol{u}}\_3 = \dfrac{\boldsymbol{u}\_3'}{\sqrt{\boldsymbol{u}\_3'^{\mathrm{T}}\boldsymbol{u}\_3'}},\\ \quad\vdots\\ \boldsymbol{u}\_{|\widetilde{V}|}' = \boldsymbol{v}\_{|\widetilde{V}|} - \dfrac{\boldsymbol{u}\_1'^{\mathrm{T}}\boldsymbol{v}\_{|\widetilde{V}|}}{\boldsymbol{u}\_1'^{\mathrm{T}}\boldsymbol{u}\_1'}\boldsymbol{u}\_1' - \dfrac{\boldsymbol{u}\_2'^{\mathrm{T}}\boldsymbol{v}\_{|\widetilde{V}|}}{\boldsymbol{u}\_2'^{\mathrm{T}}\boldsymbol{u}\_2'}\boldsymbol{u}\_2' - \cdots - \dfrac{\boldsymbol{u}\_{|\widetilde{V}|-1}'^{\mathrm{T}}\boldsymbol{v}\_{|\widetilde{V}|}}{\boldsymbol{u}\_{|\widetilde{V}|-1}'^{\mathrm{T}}\boldsymbol{u}\_{|\widetilde{V}|-1}'}\boldsymbol{u}\_{|\widetilde{V}|-1}', & \widetilde{\boldsymbol{u}}\_{|\widetilde{V}|} = \dfrac{\boldsymbol{u}\_{|\widetilde{V}|}'}{\sqrt{\boldsymbol{u}\_{|\widetilde{V}|}'^{\mathrm{T}}\boldsymbol{u}\_{|\widetilde{V}|}'}}. \end{cases} \tag{10.395}$$
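The Gram-Schmidt recursion in Eq. (10.395) is the standard one; a minimal sketch (the test vectors are random and purely illustrative) is:

```python
import numpy as np

# Classical Gram-Schmidt orthonormalization as in Eq. (10.395):
# given vectors v_1, ..., v_n, produce orthonormal u~_1, ..., u~_n.
def gram_schmidt(vectors):
    """vectors: list of 1-D arrays; returns a list of orthonormal arrays."""
    basis = []
    for v in vectors:
        u = v.astype(float).copy()
        for b in basis:           # subtract projections onto earlier directions
            u -= (b @ v) * b
        basis.append(u / np.sqrt(u @ u))
    return basis

rng = np.random.default_rng(0)
vs = [rng.normal(size=5) for _ in range(3)]
us = gram_schmidt(vs)

# Gram matrix of the result is the identity: the vectors are orthonormal.
G = np.array([[a @ b for b in us] for a in us])
assert np.allclose(G, np.eye(3))
```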

By using the new unitary matrix $\widetilde{U}$ and a diagonal matrix

$$
\widetilde{\mathbf{A}} \equiv \begin{pmatrix}
\lambda\_1 & 0 & 0 & \cdots & 0 \\
0 & \lambda\_2 & 0 & \cdots & 0 \\
0 & 0 & \lambda\_3 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & \lambda\_{|\widetilde{V}|} \\
\end{pmatrix}, \tag{10.396}
$$

we introduce a renormalized density matrix $\widetilde{R}$ from the standpoint of the momentum space renormalization group for general graphs as

$$\widetilde{R} \equiv \frac{\exp\left(-\frac{1}{2k\_{\rm B}T}\widetilde{\boldsymbol{\xi}}^{\mathrm{T}}\left((hI^{(|\widetilde{V}|)} + J\widetilde{\Lambda})\otimes I^{(2^{|\widetilde{V}|})}\right)\widetilde{\boldsymbol{\xi}} - \frac{1}{2k\_{\rm B}T}\widetilde{\boldsymbol{\zeta}}^{\mathrm{T}}\widetilde{\boldsymbol{\zeta}}\right)}{\mathrm{Tr}\left[\exp\left(-\frac{1}{2k\_{\rm B}T}\widetilde{\boldsymbol{\xi}}^{\mathrm{T}}\left((hI^{(|\widetilde{V}|)} + J\widetilde{\Lambda})\otimes I^{(2^{|\widetilde{V}|})}\right)\widetilde{\boldsymbol{\xi}} - \frac{1}{2k\_{\rm B}T}\widetilde{\boldsymbol{\zeta}}^{\mathrm{T}}\widetilde{\boldsymbol{\zeta}}\right)\right]}, \tag{10.397}$$

where

$$\widetilde{\boldsymbol{\xi}} = \begin{pmatrix} \widetilde{\xi}\_1 \\ \widetilde{\xi}\_2 \\ \vdots \\ \widetilde{\xi}\_{|\widetilde{V}|} \end{pmatrix} \equiv \left( \widetilde{U}^{\mathrm{T}} \otimes I^{(2^{|\widetilde{V}|})} \right) \begin{pmatrix} \sigma\_1^z \\ \sigma\_2^z \\ \vdots \\ \sigma\_{|\widetilde{V}|}^z \end{pmatrix} - \left( \left( hI^{(|\widetilde{V}|)} + J\widetilde{\Lambda} \right)^{-1} \widetilde{U}^{\mathrm{T}} \begin{pmatrix} \widetilde{d}\_1 \\ \widetilde{d}\_2 \\ \vdots \\ \widetilde{d}\_{|\widetilde{V}|} \end{pmatrix} \right) \otimes I^{(2^{|\widetilde{V}|})}, \tag{10.398}$$

$$\widetilde{\boldsymbol{\zeta}} = \begin{pmatrix} \widetilde{\zeta}\_1 \\ \widetilde{\zeta}\_2 \\ \vdots \\ \widetilde{\zeta}\_{|\widetilde{V}|} \end{pmatrix} \equiv \left( \widetilde{U}^{\mathrm{T}} \otimes I^{(2^{|\widetilde{V}|})} \right) \begin{pmatrix} \sigma\_1^x \\ \sigma\_2^x \\ \vdots \\ \sigma\_{|\widetilde{V}|}^x \end{pmatrix} - \gamma \begin{pmatrix} I^{(2^{|\widetilde{V}|})} \\ I^{(2^{|\widetilde{V}|})} \\ \vdots \\ I^{(2^{|\widetilde{V}|})} \end{pmatrix}, \tag{10.399}$$

$$\begin{pmatrix} \widetilde{d}\_1 \\ \widetilde{d}\_2 \\ \vdots \\ \widetilde{d}\_{|\widetilde{V}|} \end{pmatrix} = \begin{pmatrix} \widetilde{U}\_{11} & \widetilde{U}\_{12} & \widetilde{U}\_{13} & \cdots & \widetilde{U}\_{1|\widetilde{V}|} \\ \widetilde{U}\_{21} & \widetilde{U}\_{22} & \widetilde{U}\_{23} & \cdots & \widetilde{U}\_{2|\widetilde{V}|} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \widetilde{U}\_{|\widetilde{V}|1} & \widetilde{U}\_{|\widetilde{V}|2} & \widetilde{U}\_{|\widetilde{V}|3} & \cdots & \widetilde{U}\_{|\widetilde{V}||\widetilde{V}|} \end{pmatrix} \begin{pmatrix} U\_{11} & U\_{21} & U\_{31} & \cdots & U\_{|V|1} \\ U\_{12} & U\_{22} & U\_{32} & \cdots & U\_{|V|2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ U\_{1|\widetilde{V}|} & U\_{2|\widetilde{V}|} & U\_{3|\widetilde{V}|} & \cdots & U\_{|V||\widetilde{V}|} \end{pmatrix} \begin{pmatrix} d\_1 \\ d\_2 \\ d\_3 \\ \vdots \\ d\_{|V|} \end{pmatrix}. \tag{10.400}$$

For this density matrix $\widetilde{R}$ in Eq. (10.397), we can formulate the approximate reduced density matrix $\widehat{\widetilde{R}}\_i$, the approximate Gaussian marginal probability density function $\widehat{\widetilde{\rho}}(\widetilde{\boldsymbol{\phi}}) = \widehat{\widetilde{\rho}}(\phi\_1, \phi\_2, \cdots, \phi\_{|\widetilde{V}|})$, and $\widehat{\widetilde{\rho}}\_i(\phi\_i)$ for the corresponding quantum adaptive TAP approximation as follows:

#### 10 Review of Sublinear Modeling in Probabilistic Graphical Models … 261


$$\begin{split} \widehat{\widetilde{R}}\_i &= \frac{1}{2\cosh\left(\frac{1}{k\_{\rm B}T}\sqrt{\widetilde{f}\_i^2+\Gamma^2}\right)} \frac{\widetilde{f}\_i + \sqrt{\widetilde{f}\_i^2+\Gamma^2}}{2\sqrt{\widetilde{f}\_i^2+\Gamma^2}} \begin{pmatrix} 1 & -\frac{\Gamma}{\widetilde{f}\_i+\sqrt{\widetilde{f}\_i^2+\Gamma^2}}\\ +\frac{\Gamma}{\widetilde{f}\_i+\sqrt{\widetilde{f}\_i^2+\Gamma^2}} & 1 \end{pmatrix}\\ &\times \begin{pmatrix} \exp\left(+\frac{1}{k\_{\rm B}T}\sqrt{\widetilde{f}\_i^2+\Gamma^2}\right) & 0\\ 0 & \exp\left(-\frac{1}{k\_{\rm B}T}\sqrt{\widetilde{f}\_i^2+\Gamma^2}\right) \end{pmatrix} \begin{pmatrix} 1 & +\frac{\Gamma}{\widetilde{f}\_i+\sqrt{\widetilde{f}\_i^2+\Gamma^2}}\\ -\frac{\Gamma}{\widetilde{f}\_i+\sqrt{\widetilde{f}\_i^2+\Gamma^2}} & 1 \end{pmatrix}, \end{split} \tag{10.401}$$

$$\begin{split} \widehat{\widetilde{\rho}}(\phi\_1, \phi\_2, \cdots, \phi\_{|\widetilde{V}|}) &= \sqrt{\frac{\det\left((h+\widetilde{D})I^{(|\widetilde{V}|)} + J\widetilde{\Lambda}\right)}{(2\pi k\_{\rm B}T)^{|\widetilde{V}|}}}\\ &\times \exp\left(-\frac{1}{2k\_{\rm B}T}\left(\widetilde{\boldsymbol{\phi}} - \widetilde{U}\left((h+\widetilde{D})I^{(|\widetilde{V}|)} + J\widetilde{\Lambda}\right)^{-1}\widetilde{U}^{\mathrm{T}}\left(\widetilde{\boldsymbol{f}} + \widetilde{\boldsymbol{g}} + h\widetilde{\boldsymbol{d}}\right)\right)^{\mathrm{T}}\right.\\ &\qquad\times \widetilde{U}\left((h+\widetilde{D})I^{(|\widetilde{V}|)} + J\widetilde{\Lambda}\right)\widetilde{U}^{\mathrm{T}}\\ &\qquad\left.\times \left(\widetilde{\boldsymbol{\phi}} - \widetilde{U}\left((h+\widetilde{D})I^{(|\widetilde{V}|)} + J\widetilde{\Lambda}\right)^{-1}\widetilde{U}^{\mathrm{T}}\left(\widetilde{\boldsymbol{f}} + \widetilde{\boldsymbol{g}} + h\widetilde{\boldsymbol{d}}\right)\right)\right), \end{split} \tag{10.402}$$

$$\widehat{\widetilde{\rho}}\_i(\phi\_i) = \sqrt{\frac{h + \widetilde{L}}{2\pi k\_{\rm B}T}} \exp\left(-\frac{1}{2k\_{\rm B}T} (h + \widetilde{L}) \left(\phi\_i - \frac{\widetilde{g}\_i + h\widetilde{d}\_i}{h + \widetilde{L}}\right)^2\right) \quad (i \in \widetilde{V}). \tag{10.403}$$

The reduced density matrix $\widehat{\widetilde{R}}\_i$ and the marginal probability density functions $\widehat{\widetilde{\rho}}\_i(\phi\_i)$ and $\widehat{\widetilde{\rho}}(\phi\_1, \phi\_2, \cdots, \phi\_{|\widetilde{V}|})$ need to satisfy the consistencies

$$\begin{cases} \int\_{-\infty}^{+\infty}\int\_{-\infty}^{+\infty}\cdots\int\_{-\infty}^{+\infty} \phi\_i\, \widehat{\widetilde{\rho}}(\phi\_1, \phi\_2, \cdots, \phi\_{|\widetilde{V}|})\, d\phi\_1 d\phi\_2 \cdots d\phi\_{|\widetilde{V}|} = \int\_{-\infty}^{+\infty} \phi\_i\, \widehat{\widetilde{\rho}}\_i(\phi\_i)\, d\phi\_i = \mathrm{Tr}\,\sigma^z \widehat{\widetilde{R}}\_i \ \ (i \in \widetilde{V}),\\ \sum\_{i \in \widetilde{V}} \int\_{-\infty}^{+\infty}\int\_{-\infty}^{+\infty}\cdots\int\_{-\infty}^{+\infty} \phi\_i^2\, \widehat{\widetilde{\rho}}(\phi\_1, \phi\_2, \cdots, \phi\_{|\widetilde{V}|})\, d\phi\_1 d\phi\_2 \cdots d\phi\_{|\widetilde{V}|} = \sum\_{i \in \widetilde{V}} \int\_{-\infty}^{+\infty} \phi\_i^2\, \widehat{\widetilde{\rho}}\_i(\phi\_i)\, d\phi\_i = 1. \end{cases} \tag{10.404}$$

The Lagrange multipliers $\widetilde{\boldsymbol{f}}$, $\widetilde{\boldsymbol{g}}$, $\widetilde{L}$, and $\widetilde{D}$ are determined so as to satisfy the consistencies in Eq. (10.404), which reduce to the following simultaneous equations:

$$\widetilde{\boldsymbol{g}} + h\widetilde{\boldsymbol{d}} = (h + \widetilde{L})\widetilde{U}\left((\widetilde{D} - \widetilde{L})I^{(|\widetilde{V}|)} + J\widetilde{\Lambda}\right)^{-1}\widetilde{U}^{\mathrm{T}}\widetilde{\boldsymbol{f}}, \tag{10.405}$$

$$\widetilde{\boldsymbol{f}} + \widetilde{\boldsymbol{g}} + h\widetilde{\boldsymbol{d}} = \widetilde{U}\left((\widetilde{D} - \widetilde{L})I^{(|\widetilde{V}|)} + J\widetilde{\Lambda}\right)^{-1}\widetilde{U}^{\mathrm{T}} \begin{pmatrix} \frac{\widetilde{f}\_1}{\sqrt{\widetilde{f}\_1^2+\Gamma^2}}\tanh\left(\frac{1}{k\_{\rm B}T}\sqrt{\widetilde{f}\_1^2+\Gamma^2}\right)\\ \frac{\widetilde{f}\_2}{\sqrt{\widetilde{f}\_2^2+\Gamma^2}}\tanh\left(\frac{1}{k\_{\rm B}T}\sqrt{\widetilde{f}\_2^2+\Gamma^2}\right)\\ \vdots\\ \frac{\widetilde{f}\_{|\widetilde{V}|}}{\sqrt{\widetilde{f}\_{|\widetilde{V}|}^2+\Gamma^2}}\tanh\left(\frac{1}{k\_{\rm B}T}\sqrt{\widetilde{f}\_{|\widetilde{V}|}^2+\Gamma^2}\right) \end{pmatrix}, \tag{10.406}$$

$$\widetilde{L} = -h + \frac{1}{2} + \sqrt{\frac{1}{4} + \frac{1}{|\widetilde{V}|} \left(\widetilde{\boldsymbol{f}} + \widetilde{\boldsymbol{g}} + h\widetilde{\boldsymbol{d}}\right)^{\mathrm{T}} \left(\widetilde{\boldsymbol{f}} + \widetilde{\boldsymbol{g}} + h\widetilde{\boldsymbol{d}}\right)}, \tag{10.407}$$

$$\frac{1}{\frac{1}{2} + \sqrt{\frac{1}{4} + \frac{1}{|\widetilde{V}|} \left(\widetilde{\boldsymbol{f}} + \widetilde{\boldsymbol{g}} + h\widetilde{\boldsymbol{d}}\right)^{\mathrm{T}} \left(\widetilde{\boldsymbol{f}} + \widetilde{\boldsymbol{g}} + h\widetilde{\boldsymbol{d}}\right)}} = \frac{1}{|\widetilde{V}|} \mathrm{Tr}\left[\left((\widetilde{D} - \widetilde{L})I^{(|\widetilde{V}|)} + J\widetilde{\Lambda}\right)^{-1}\right]. \tag{10.408}$$

## *10.5.4 Suzuki-Trotter Decomposition in the Transverse Ising Model*

In this section, we review the Suzuki-Trotter formulas and the extensions of conventional quantum loopy belief propagation that use them. In quantum probabilistic graphical models, the state space is defined by all the eigenvectors of the density matrix $R$, and the probability of each eigenvector is given by the corresponding eigenvalue, as mentioned in Sect. 10.4.2. To compute statistical quantities by the Monte Carlo method, it is necessary to diagonalize the energy matrix $H$, which is a massive computation. Instead of such a scheme, quantum Monte Carlo methods based on the Suzuki-Trotter formulas were proposed [89]. One important part of the quantum Monte Carlo method is the mapping from a quantum probabilistic graphical model to a conventional (classical) probabilistic graphical model by introducing the techniques of Suzuki-Trotter decompositions. It is known that statistical quantities for conventional (classical) probabilistic graphical models can be computed by MCMC methods; this is the basic idea behind quantum Monte Carlo methods. Let us first review the Suzuki-Trotter formulas and explicitly give a detailed scheme of Suzuki-Trotter decomposition for the transverse Ising model in Eqs. (10.226) and (10.227) with Eq. (10.301).

From the definition of the exponential function for square matrices, we have


$$\exp(x(A+B)) = I + x(A+B) + \frac{1}{2}x^2(A+B)^2 + \mathcal{O}(x^3) \ (x \to +0), \tag{10.409}$$

$$\exp(\mathbf{x}A) = I + \mathbf{x}A + \frac{1}{2}\mathbf{x}^2 A^2 + \mathcal{O}(\mathbf{x}^3) \ (\mathbf{x} \to +0),\tag{10.410}$$

$$\exp(\mathbf{x}\,\mathbf{B}) = I + \mathbf{x}\,\mathbf{B} + \frac{1}{2}\mathbf{x}^2\,\mathbf{B}^2 + \mathcal{O}(\mathbf{x}^3) \text{ ( $\mathbf{x} \to +0$ )}.\tag{10.411}$$

From these equalities, the following formula can be confirmed:

$$\exp(\mathbf{x}(A+\mathbf{B})) = \exp(\mathbf{x}A)\exp(\mathbf{x}B) + \mathcal{O}(\mathbf{x}^2) \text{ ( $\mathbf{x} \rightarrow +0$ )}.\tag{10.412}$$

Moreover, we have

$$\exp(\mathbf{x}A) = \left[\exp\left(\frac{\mathbf{x}}{M}A\right)\right]^M + \mathcal{O}\left(\frac{\mathbf{x}^2}{M}\right)(\mathbf{x}^2 \ll M),\tag{10.413}$$

$$\exp(\mathbf{x}(A+\mathcal{B})) = \left[\exp\left(\frac{\mathbf{x}}{M}A\right)\exp\left(\frac{\mathbf{x}}{M}\mathcal{B}\right)\right]^M + \mathcal{O}\left(\frac{\mathbf{x}^2}{M}\right)(\mathbf{x}^2 \ll M).\tag{10.414}$$

Generally, for a graph $(V, E)$ with the set of nodes $V = \{1, 2, \cdots, N\}$ and the set of edges $E = \big\{\{i, j\}\big\}$, we have

$$\exp\left(x\sum\_{\{i,j\}\in E} A\_{\{i,j\}}\right) = \left[\prod\_{\{i,j\}\in E} \exp\left(\frac{x}{M} A\_{\{i,j\}}\right)\right]^M + \mathcal{O}\left(\frac{x^2}{M}\right) (x^2 \ll M), \quad (10.415)$$

$$\begin{split} & \exp\left(\boldsymbol{x}\sum\_{\{i,j\}\in E} \mathcal{A}\_{\{i,j\}} + \boldsymbol{x}\sum\_{\{i,j\}\in E} \mathcal{B}\_{\{i,j\}}\right) \\ &= \left[ \left( \prod\_{\{i,j\}\in E} \exp\left(\frac{\boldsymbol{x}}{M} \mathcal{A}\_{\{i,j\}}\right) \right) \left( \prod\_{\{i,j\}\in E} \exp\left(\frac{\boldsymbol{x}}{M} \mathcal{B}\_{\{i,j\}}\right) \right) \right]^M + \mathcal{O}\left(\frac{\boldsymbol{x}^2}{M}\right) (\boldsymbol{x}^2 \ll M). \end{split} \tag{10.416}$$

These are referred to as **Suzuki-Trotter decompositions** [87, 88].
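The error behavior claimed in Eq. (10.414) is easy to observe numerically; the sketch below (random symmetric test matrices, purely illustrative) compares $\left[\exp\left(\frac{x}{M}A\right)\exp\left(\frac{x}{M}B\right)\right]^M$ with $\exp(x(A+B))$ for increasing Trotter number $M$:

```python
import numpy as np
from scipy.linalg import expm

# Numerical check of the Suzuki-Trotter formula (10.414):
# exp(x(A+B)) ≈ [exp(xA/M) exp(xB/M)]^M with error O(x^2/M).
rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4)); A = (A + A.T) / 2   # non-commuting symmetric
B = rng.normal(size=(4, 4)); B = (B + B.T) / 2   # test matrices
x = 1.0

exact = expm(x * (A + B))
errors = []
for M in (10, 100, 1000):
    step = expm(x / M * A) @ expm(x / M * B)
    approx = np.linalg.matrix_power(step, M)
    errors.append(np.linalg.norm(approx - exact))

# The error shrinks roughly like 1/M as M grows.
assert errors[0] > errors[1] > errors[2]
```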

For the case of *<sup>N</sup>* <sup>=</sup> 2, we consider an energy matrix *<sup>H</sup>* defined by

$$H = -J\sigma\_1^z \sigma\_2^z - h\_1 \sigma\_1^z - h\_2 \sigma\_2^z - \Gamma \sigma\_1^x - \Gamma \sigma\_2^x. \tag{10.417}$$

It is referred to as a quantum transverse Ising model on a chain $\big(V = \{1, 2\},\ E = \big\{\{1, 2\}\big\}\big)$ with two nodes and one edge. By using the above Suzuki-Trotter formula, we have

$$\begin{split} &\langle s\_{1,1}, s\_{2,1}|\exp\left(-\frac{1}{k\_{\rm B}T}H\right)|s'\_{1,1}, s'\_{2,1}\rangle\\ &= \lim\_{M\to+\infty} \langle s\_{1,1}, s\_{2,1}|\left[\exp\left(\frac{1}{k\_{\rm B}TM}\left(J\sigma\_1^z\sigma\_2^z + h\_1\sigma\_1^z + h\_2\sigma\_2^z\right)\right)\exp\left(\frac{1}{k\_{\rm B}TM}\Gamma\left(\sigma\_1^x + \sigma\_2^x\right)\right)\right]^M|s'\_{1,1}, s'\_{2,1}\rangle\\ &= \lim\_{M\to+\infty} \sum\_{\tau\_{1,1}\in\Omega}\sum\_{\tau\_{2,1}\in\Omega}\sum\_{s\_{1,2}\in\Omega}\sum\_{s\_{2,2}\in\Omega}\sum\_{\tau\_{1,2}\in\Omega}\sum\_{\tau\_{2,2}\in\Omega}\cdots\sum\_{s\_{1,M}\in\Omega}\sum\_{s\_{2,M}\in\Omega}\sum\_{\tau\_{1,M}\in\Omega}\sum\_{\tau\_{2,M}\in\Omega} \delta\_{s\_{1,M+1},s'\_{1,1}}\,\delta\_{s\_{2,M+1},s'\_{2,1}}\\ &\quad\times \prod\_{m=1}^{M} \langle s\_{1,m}, s\_{2,m}|\exp\left(\frac{1}{k\_{\rm B}TM}\left(J\sigma\_1^z\sigma\_2^z + h\_1\sigma\_1^z + h\_2\sigma\_2^z\right)\right)|\tau\_{1,m}, \tau\_{2,m}\rangle\\ &\qquad\times \langle\tau\_{1,m}, \tau\_{2,m}|\exp\left(\frac{1}{k\_{\rm B}TM}\Gamma\left(\sigma\_1^x + \sigma\_2^x\right)\right)|s\_{1,m+1}, s\_{2,m+1}\rangle. \end{split} \tag{10.418}$$

Note that

$$
\Gamma \sigma\_1^{\mathbf{x}} + \Gamma \sigma\_2^{\mathbf{x}} = \Gamma \sigma^{\mathbf{x}} \otimes I + \Gamma I \otimes \sigma^{\mathbf{x}}.\tag{10.419}
$$

By using Eq. (10.253), Eq. (10.418) can be rewritten as

$$\begin{split} &\langle s\_{1,1}, s\_{2,1}|\exp\left(-\frac{1}{k\_{\rm B}T}H\right)|s'\_{1,1}, s'\_{2,1}\rangle\\ &= \lim\_{M\to+\infty} \sum\_{s\_{1,2}\in\Omega}\sum\_{s\_{2,2}\in\Omega}\cdots\sum\_{s\_{1,M}\in\Omega}\sum\_{s\_{2,M}\in\Omega} \delta\_{s\_{1,M+1},s'\_{1,1}}\,\delta\_{s\_{2,M+1},s'\_{2,1}}\\ &\quad\times \prod\_{m=1}^{M}\left(\exp\left(\frac{1}{k\_{\rm B}TM}\left(Js\_{1,m}s\_{2,m} + h\_1 s\_{1,m} + h\_2 s\_{2,m}\right)\right)\right.\\ &\qquad\left.\times \langle s\_{1,m}, s\_{2,m}|\left(\exp\left(\frac{1}{k\_{\rm B}TM}\Gamma\sigma^x\right)\otimes I\right)\left(I\otimes\exp\left(\frac{1}{k\_{\rm B}TM}\Gamma\sigma^x\right)\right)|s\_{1,m+1}, s\_{2,m+1}\rangle\right). \end{split} \tag{10.420}$$

Moreover, by the definition of the tensor product for *<sup>A</sup>*⊗*<sup>I</sup>* and *<sup>I</sup>*⊗*<sup>A</sup>* for any matrix *<sup>A</sup>* in terms of Eq. (10.238), we have

$$\begin{split} &\langle s\_{1,m}, s\_{2,m}|\left(\exp\left(\frac{1}{k\_{\rm B}TM}\Gamma\sigma^x\right)\otimes I\right)\left(I\otimes\exp\left(\frac{1}{k\_{\rm B}TM}\Gamma\sigma^x\right)\right)|s\_{1,m+1}, s\_{2,m+1}\rangle\\ &= \langle s\_{1,m}|\exp\left(\frac{1}{k\_{\rm B}TM}\Gamma\sigma^x\right)|s\_{1,m+1}\rangle\,\langle s\_{2,m}|\exp\left(\frac{1}{k\_{\rm B}TM}\Gamma\sigma^x\right)|s\_{2,m+1}\rangle\\ &= \langle s\_{1,m}|\begin{pmatrix}\cosh\left(\frac{1}{k\_{\rm B}TM}\Gamma\right) & \sinh\left(\frac{1}{k\_{\rm B}TM}\Gamma\right)\\ \sinh\left(\frac{1}{k\_{\rm B}TM}\Gamma\right) & \cosh\left(\frac{1}{k\_{\rm B}TM}\Gamma\right)\end{pmatrix}|s\_{1,m+1}\rangle\\ &\quad\times \langle s\_{2,m}|\begin{pmatrix}\cosh\left(\frac{1}{k\_{\rm B}TM}\Gamma\right) & \sinh\left(\frac{1}{k\_{\rm B}TM}\Gamma\right)\\ \sinh\left(\frac{1}{k\_{\rm B}TM}\Gamma\right) & \cosh\left(\frac{1}{k\_{\rm B}TM}\Gamma\right)\end{pmatrix}|s\_{2,m+1}\rangle. \end{split} \tag{10.421}$$
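The $2\times 2$ hyperbolic matrix in Eq. (10.421) is simply $\exp(a\sigma^x)$ with $a = \frac{\Gamma}{k\_{\rm B}TM}$; a one-line numerical check (the value of $a$ is illustrative):

```python
import numpy as np
from scipy.linalg import expm

# Identity behind Eq. (10.421):
# exp(a * sigma^x) = [[cosh a, sinh a], [sinh a, cosh a]].
sx = np.array([[0.0, 1.0], [1.0, 0.0]])
a = 0.37  # stands for Gamma / (k_B T M); value illustrative
lhs = expm(a * sx)
rhs = np.array([[np.cosh(a), np.sinh(a)],
                [np.sinh(a), np.cosh(a)]])
assert np.allclose(lhs, rhs)
```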

Equation (10.420) can be rewritten in terms of the two-dimensional representation as

$$\begin{split} &\langle s\_{1,1}, s\_{2,1}|\exp\left(-\frac{1}{k\_{\rm B}T}H\right)|s'\_{1,1}, s'\_{2,1}\rangle\\ &= \lim\_{M\to+\infty} \sum\_{s\_{1,2}\in\Omega}\sum\_{s\_{2,2}\in\Omega}\cdots\sum\_{s\_{1,M}\in\Omega}\sum\_{s\_{2,M}\in\Omega} \delta\_{s\_{1,M+1},s'\_{1,1}}\,\delta\_{s\_{2,M+1},s'\_{2,1}}\\ &\quad\times \prod\_{m=1}^{M}\left(\exp\left(\frac{1}{k\_{\rm B}TM}\left(Js\_{1,m}s\_{2,m} + h\_1 s\_{1,m} + h\_2 s\_{2,m}\right)\right)\right.\\ &\qquad\times \langle s\_{1,m}|\begin{pmatrix}\cosh\left(\frac{1}{k\_{\rm B}TM}\Gamma\right) & \sinh\left(\frac{1}{k\_{\rm B}TM}\Gamma\right)\\ \sinh\left(\frac{1}{k\_{\rm B}TM}\Gamma\right) & \cosh\left(\frac{1}{k\_{\rm B}TM}\Gamma\right)\end{pmatrix}|s\_{1,m+1}\rangle\\ &\qquad\left.\times \langle s\_{2,m}|\begin{pmatrix}\cosh\left(\frac{1}{k\_{\rm B}TM}\Gamma\right) & \sinh\left(\frac{1}{k\_{\rm B}TM}\Gamma\right)\\ \sinh\left(\frac{1}{k\_{\rm B}TM}\Gamma\right) & \cosh\left(\frac{1}{k\_{\rm B}TM}\Gamma\right)\end{pmatrix}|s\_{2,m+1}\rangle\right). \end{split} \tag{10.422}$$

Eventually, the density matrix *P* of the transverse Ising model for two nodes in Eq. (10.417), such that

$$P \equiv \frac{\exp\left(-\frac{1}{k\_{\rm B}T}H\right)}{\text{Tr}\left[\exp\left(-\frac{1}{k\_{\rm B}T}H\right)\right]},\tag{10.423}$$

can be reduced to the probability distribution $P^{(M)}(\boldsymbol{s}\_1, \boldsymbol{s}\_2, \boldsymbol{s}\_3, \cdots, \boldsymbol{s}\_M, \boldsymbol{s}\_{M+1})$ for $\boldsymbol{s}\_m = \begin{pmatrix} s\_{1,m}\\ s\_{2,m}\end{pmatrix}$ $(m = 1, 2, \cdots, M+1)$ on the $2\times(M+1)$ ladder graph as follows:

$$\langle s\_{1,1}, s\_{2,1}|P|s'\_{1,1}, s'\_{2,1}\rangle = \lim\_{M\to+\infty} \sum\_{\boldsymbol{s}\_1\in\Omega^2}\sum\_{\boldsymbol{s}\_2\in\Omega^2}\cdots\sum\_{\boldsymbol{s}\_M\in\Omega^2}\sum\_{\boldsymbol{s}\_{M+1}\in\Omega^2} \delta\_{s\_{1,M+1},s'\_{1,1}}\,\delta\_{s\_{2,M+1},s'\_{2,1}}\, P^{(M)}(\boldsymbol{s}\_1, \boldsymbol{s}\_2, \boldsymbol{s}\_3, \cdots, \boldsymbol{s}\_M, \boldsymbol{s}\_{M+1}), \tag{10.424}$$

where

$$\begin{split} \boldsymbol{P}^{(M)}(\boldsymbol{s}\_{1}, \boldsymbol{s}\_{2}, \boldsymbol{s}\_{3}, \dots, \boldsymbol{s}\_{M}, \boldsymbol{s}\_{M+1}) \\ \equiv & \frac{1}{Z^{(M)}} \prod\_{m=1}^{M} \Big( \exp\Big( \frac{1}{k\_{\mathrm{B}}TM} \Big( Js\_{1,m}s\_{2,m} + h\_{1}s\_{1,m} + h\_{2}s\_{2,m} \Big) \Big) \\ & \times \exp\Big( \frac{1}{k\_{\mathrm{B}}TM} \Big( K(MT, \Gamma)s\_{1,m}s\_{1,m+1} + K(MT, \Gamma)s\_{2,m}s\_{2,m+1} \Big) \Big) \Big), \end{split} \tag{10.425}$$

$$\begin{split} Z^{(M)} &\equiv \sum\_{s\_{1,2}\in\Omega}\sum\_{s\_{2,2}\in\Omega}\cdots\sum\_{s\_{1,M}\in\Omega}\sum\_{s\_{2,M}\in\Omega} \delta\_{s\_{1,M+1},s\_{1,1}}\,\delta\_{s\_{2,M+1},s\_{2,1}}\\ &\quad\times \prod\_{m=1}^{M}\left(\exp\left(\frac{1}{k\_{\rm B}TM}\left(Js\_{1,m}s\_{2,m} + h\_1 s\_{1,m} + h\_2 s\_{2,m}\right)\right)\right.\\ &\qquad\left.\times \exp\left(\frac{1}{k\_{\rm B}TM}\left(K(MT,\Gamma)s\_{1,m}s\_{1,m+1} + K(MT,\Gamma)s\_{2,m}s\_{2,m+1}\right)\right)\right), \end{split} \tag{10.426}$$

$$K(T,\Gamma) \equiv k\_{\rm B}T \ln\left(\sqrt{\frac{\cosh\left(\frac{\Gamma}{k\_{\rm B}T}\right)}{\sinh\left(\frac{\Gamma}{k\_{\rm B}T}\right)}}\right). \tag{10.427}$$
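The mapping in Eqs. (10.417)-(10.427) can be checked numerically for the two-spin model: collecting the single-spin factors of Eq. (10.421) into a four-state transfer matrix over the pairs $(s\_{1,m}, s\_{2,m})$, the trace $\mathrm{Tr}\,T^M$ converges to $\mathrm{Tr}\exp\big(-\frac{1}{k\_{\rm B}T}H\big)$ as $M$ grows. A sketch with illustrative parameter values:

```python
import numpy as np
from scipy.linalg import expm

# Two-spin transverse Ising model, Eq. (10.417); parameter values illustrative.
J, h1, h2, G, kT = 1.0, 0.3, -0.2, 0.8, 1.0
sz = np.diag([1.0, -1.0])
sx = np.array([[0.0, 1.0], [1.0, 0.0]])
I2 = np.eye(2)
H = (-J * np.kron(sz, sz) - h1 * np.kron(sz, I2) - h2 * np.kron(I2, sz)
     - G * np.kron(sx, I2) - G * np.kron(I2, sx))
Z_exact = np.trace(expm(-H / kT))

M = 400
gam = G / (kT * M)
# Single-spin factor <s|exp(gam sigma^x)|s'> from Eq. (10.421).
W = {(1, 1): np.cosh(gam), (-1, -1): np.cosh(gam),
     (1, -1): np.sinh(gam), (-1, 1): np.sinh(gam)}
states = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
T = np.zeros((4, 4))
for a, (s1, s2) in enumerate(states):
    diag = np.exp((J * s1 * s2 + h1 * s1 + h2 * s2) / (kT * M))
    for b, (t1, t2) in enumerate(states):
        T[a, b] = diag * W[(s1, t1)] * W[(s2, t2)]

# Periodic boundary in the Trotter direction = trace of the M-fold product.
Z_trotter = np.trace(np.linalg.matrix_power(T, M))
assert abs(Z_trotter - Z_exact) / Z_exact < 1e-2
```

The residual discrepancy is the $\mathcal{O}(1/M)$ Trotter error of Eq. (10.414) and shrinks as $M$ increases.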

The density matrix $P$ in Eq. (10.423) of the transverse Ising model for $|V|$ nodes $V = \{1, 2, \cdots, |V|\}$, which is given by Eqs. (10.226) and (10.227) with Eq. (10.301), can be reduced to the probability distribution

$$P^{(M)}(\boldsymbol{s}\_1, \boldsymbol{s}\_2, \boldsymbol{s}\_3, \cdots, \boldsymbol{s}\_M, \boldsymbol{s}\_{M+1}) \ \text{for} \ \boldsymbol{s}\_m = \begin{pmatrix} s\_{1,m}\\ s\_{2,m}\\ \vdots\\ s\_{|V|,m}\end{pmatrix} \ (m = 1, 2, \cdots, M+1) \ \text{on the} \ |V|\times(M+1)$$

ladder graph as follows:

$$\begin{split} \langle s\_{1,1}, s\_{2,1}, \dots, s\_{|V|,1} | P | s'\_{1,1}, s'\_{2,1}, \dots, s'\_{|V|,1} \rangle \\ = \lim\_{M \to +\infty} \sum\_{s\_1 \in \Omega^{|V|}} \sum\_{s\_2 \in \Omega^{|V|}} \dots \sum\_{s\_M \in \Omega^{|V|}} \sum\_{s\_{M+1} \in \Omega^{|V|}} \left( \prod\_{i \in V} \delta\_{s\_{i,M+1}, s'\_{i,1}} \right) P^{(M)} (s\_1, s\_2, s\_3, \dots, s\_M, s\_{M+1}), \end{split} \tag{10.428}$$

where

$$\begin{split} P^{(M)}(\boldsymbol{s}\_1, \boldsymbol{s}\_2, \boldsymbol{s}\_3, \cdots, \boldsymbol{s}\_M, \boldsymbol{s}\_{M+1}) &\equiv \frac{1}{Z^{(M)}} \prod\_{m=1}^{M}\left(\prod\_{\{i,j\}\in E}\exp\left(\frac{1}{k\_{\rm B}TM}Js\_{i,m}s\_{j,m}\right)\right)\\ &\quad\times \left(\prod\_{i\in V}\exp\left(\frac{1}{k\_{\rm B}TM}K(MT,\Gamma)s\_{i,m}s\_{i,m+1}\right)\right)\left(\prod\_{i\in V}\exp\left(\frac{1}{k\_{\rm B}TM}h\_i s\_{i,m}\right)\right), \end{split} \tag{10.429}$$

$$\begin{split} Z^{(M)} &\equiv \sum\_{\mathbf{s}\_{1} \in \Omega^{|V|}} \sum\_{\mathbf{s}\_{2} \in \Omega^{|V|}} \cdots \sum\_{\mathbf{s}\_{M} \in \Omega^{|V|}} \left( \prod\_{i \in V} \delta\_{s\_{i,M+1}, s\_{i,1}} \right) \\ &\times \prod\_{m=1}^{M} \left( \prod\_{\{i,j\} \in E} \exp\left( \frac{1}{k\_{\rm B} T M} J s\_{i,m} s\_{j,m} \right) \right) \\ &\times \left( \prod\_{i \in V} \exp\left( \frac{1}{k\_{\rm B} T M} K(MT, \Gamma) s\_{i,m} s\_{i,m+1} \right) \right) \left( \prod\_{i \in V} \exp\left( \frac{1}{k\_{\rm B} T M} h\_i s\_{i,m} \right) \right). \end{split} \tag{10.430}$$
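The Suzuki-Trotter representation above rests on the product formula $e^{A+B} = \lim\_{M\to+\infty}\left(e^{A/M}e^{B/M}\right)^{M}$. A minimal numeric illustration with $2\times 2$ matrices (our own sketch; the matrices and coefficient values are illustrative, not from the text):

```python
import math

def matmul(X, Y):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def expm_pauli(a, b):
    """Exact exp(a*sigma_x + b*sigma_z) via cosh(r) I + (sinh(r)/r) H."""
    r = math.hypot(a, b)
    c, s = math.cosh(r), (math.sinh(r) / r if r else 1.0)
    return [[c + s * b, s * a], [s * a, c - s * b]]

def trotter(a, b, M):
    """(exp((a/M) sigma_x) exp((b/M) sigma_z))^M."""
    ex = [[math.cosh(a / M), math.sinh(a / M)],
          [math.sinh(a / M), math.cosh(a / M)]]
    ez = [[math.exp(b / M), 0.0], [0.0, math.exp(-b / M)]]
    step = matmul(ex, ez)
    out = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(M):
        out = matmul(out, step)
    return out

a, b = 0.8, -0.5
exact = expm_pauli(a, b)
for M in (1, 10, 100, 1000):
    approx = trotter(a, b, M)
    err = max(abs(approx[i][j] - exact[i][j]) for i in range(2) for j in range(2))
    print(M, err)  # the error shrinks roughly like 1/M
```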

The dynamics of quantum Monte Carlo methods based on Suzuki-Trotter decompositions have been analyzed by using Glauber dynamics [101, 102] and Langevin dynamics [103, 104]. Recently, these analyses have been applied to some statistical machine learning systems with quantum annealing [105, 106]. Statistical analyses of quantum Monte Carlo methods for statistical inference based on Suzuki-Trotter decompositions [87, 88] are shown in Chaps. 12 and 13 of Part III of this book.

We now try to construct a modification of the conventional quantum message passing rule in Eq. (10.335) for the transverse Ising model in Eqs. (10.226) and (10.227) with Eq. (10.301) by imposing the assumption that all off-diagonal elements of $\lambda\_{j \to i}$ and $\lambda\_{i \to j}$ for any edge $\{i, j\} (\in E)$ are zero. By using the Suzuki-Trotter formulas in Eqs. (10.415)–(10.416), Eq. (10.335) can be represented as follows:

$$\begin{split} &\lim\_{M\to+\infty}\left[\exp\left(\frac{1}{k\_{\rm B}TM}\Gamma\sigma^{x}\right)\exp\left(\frac{1}{k\_{\rm B}TM}hd\_{i}\sigma^{z}\right)\exp\left(\frac{1}{k\_{\rm B}TM}\lambda\_{j\to i}\right)\exp\left(\frac{1}{k\_{\rm B}TM}\sum\_{k\in\partial i\backslash\{j\}}\lambda\_{k\to i}\right)\right]^{M} \\ &=\frac{Z\_{i}}{Z\_{\{i,j\}}}\lim\_{M\to+\infty}{\rm Tr}\_{\backslash i}\left[\exp\left(\frac{1}{k\_{\rm B}TM}\Gamma\sigma^{x}\otimes I\right)\exp\left(\frac{1}{k\_{\rm B}TM}hd\_{i}\sigma^{z}\otimes I\right)\right. \\ &\quad\times\exp\left(\frac{1}{k\_{\rm B}TM}J(\sigma^{z}\otimes I)(I\otimes\sigma^{z})\right)\exp\left(\frac{1}{k\_{\rm B}TM}hd\_{j}I\otimes\sigma^{z}\right)\exp\left(\frac{1}{k\_{\rm B}TM}\Gamma I\otimes\sigma^{x}\right) \\ &\quad\times\left.\exp\left(\frac{1}{k\_{\rm B}TM}I\otimes\left(\sum\_{l\in\partial j\backslash\{i\}}\lambda\_{l\to j}\right)\right)\exp\left(\frac{1}{k\_{\rm B}TM}\left(\sum\_{k\in\partial i\backslash\{j\}}\lambda\_{k\to i}\right)\otimes I\right)\right]^{M}, \end{split} \tag{10.431}$$

such that,

$$\begin{split} &\lim\_{M\to+\infty}\sum\_{s\_{i,2}\in\Omega}\cdots\sum\_{s\_{i,M}\in\Omega}\left(\prod\_{m=1}^{M}\langle s\_{i,m}|\exp\left(\frac{1}{k\_{\rm B}TM}\sum\_{k\in\partial i\backslash\{j\}}\lambda\_{k\to i}\right)|s\_{i,m}\rangle\exp\left(\frac{1}{k\_{\rm B}TM}hd\_{i}s\_{i,m}\right)\right) \\ &\quad\times\delta\_{s'\_{i,1},s\_{i,M+1}}\prod\_{m=1}^{M}\left(\langle s\_{i,m}|\exp\left(\frac{1}{k\_{\rm B}TM}\lambda\_{j\to i}\right)|s\_{i,m}\rangle\langle s\_{i,m}|\exp\left(\frac{1}{k\_{\rm B}TM}\Gamma\sigma^{x}\right)|s\_{i,m+1}\rangle\right) \\ &=\lim\_{M\to+\infty}\sum\_{s\_{i,2}\in\Omega}\cdots\sum\_{s\_{i,M}\in\Omega}\left(\prod\_{m=1}^{M}\langle s\_{i,m}|\exp\left(\frac{1}{k\_{\rm B}TM}\sum\_{k\in\partial i\backslash\{j\}}\lambda\_{k\to i}\right)|s\_{i,m}\rangle\exp\left(\frac{1}{k\_{\rm B}TM}hd\_{i}s\_{i,m}\right)\right) \\ &\quad\times\Bigg(\delta\_{s'\_{i,1},s\_{i,M+1}}\frac{Z\_{i}}{Z\_{\{i,j\}}}\sum\_{s\_{j,1}\in\Omega}\sum\_{s'\_{j,1}\in\Omega}\delta\_{s\_{j,1},s'\_{j,1}} \\ &\qquad\times\sum\_{s\_{j,2}\in\Omega}\cdots\sum\_{s\_{j,M}\in\Omega}\delta\_{s'\_{j,1},s\_{j,M+1}}\prod\_{m=1}^{M}\Bigg(\exp\left(\frac{1}{k\_{\rm B}TM}Js\_{i,m}s\_{j,m}\right)\exp\left(\frac{1}{k\_{\rm B}TM}hd\_{j}s\_{j,m}\right) \\ &\qquad\times\langle s\_{j,m}|\exp\left(\frac{1}{k\_{\rm B}TM}\sum\_{l\in\partial j\backslash\{i\}}\lambda\_{l\to j}\right)|s\_{j,m}\rangle \\ &\qquad\times\langle s\_{i,m}|\exp\left(\frac{1}{k\_{\rm B}TM}\Gamma\sigma^{x}\right)|s\_{i,m+1}\rangle\langle s\_{j,m}|\exp\left(\frac{1}{k\_{\rm B}TM}\Gamma\sigma^{x}\right)|s\_{j,m+1}\rangle\Bigg)\Bigg). \end{split} \tag{10.432}$$

The sufficient conditions for Eq. (10.432) are given by

$$\begin{split} &\delta\_{s'\_{i,1},s\_{i,M+1}}\prod\_{m=1}^{M}\left(\langle s\_{i,m}|\exp\left(\frac{1}{k\_{\rm B}TM}\lambda\_{j\to i}\right)|s\_{i,m}\rangle\langle s\_{i,m}|\exp\left(\frac{1}{k\_{\rm B}TM}\Gamma\sigma^{x}\right)|s\_{i,m+1}\rangle\right) \\ &=\delta\_{s'\_{i,1},s\_{i,M+1}}\frac{Z\_{i}}{Z\_{\{i,j\}}}\sum\_{s\_{j,1}\in\Omega}\sum\_{s'\_{j,1}\in\Omega}\delta\_{s\_{j,1},s'\_{j,1}} \\ &\quad\times\sum\_{s\_{j,2}\in\Omega}\cdots\sum\_{s\_{j,M}\in\Omega}\delta\_{s'\_{j,1},s\_{j,M+1}}\prod\_{m=1}^{M}\Bigg(\exp\left(\frac{1}{k\_{\rm B}TM}Js\_{i,m}s\_{j,m}\right)\exp\left(\frac{1}{k\_{\rm B}TM}hd\_{j}s\_{j,m}\right) \\ &\quad\times\langle s\_{j,m}|\exp\left(\frac{1}{k\_{\rm B}TM}\sum\_{l\in\partial j\backslash\{i\}}\lambda\_{l\to j}\right)|s\_{j,m}\rangle \\ &\quad\times\langle s\_{i,m}|\exp\left(\frac{1}{k\_{\rm B}TM}\Gamma\sigma^{x}\right)|s\_{i,m+1}\rangle\langle s\_{j,m}|\exp\left(\frac{1}{k\_{\rm B}TM}\Gamma\sigma^{x}\right)|s\_{j,m+1}\rangle\Bigg). \end{split} \tag{10.433}$$

By taking the summations $\sum\_{s\_{i,2}\in\Omega}\cdots\sum\_{s\_{i,M}\in\Omega}$ and the limit $M\to+\infty$ on both sides of Eq. (10.433), modified message passing rules can be derived as follows:

$$\begin{split} \exp&\left(\frac{1}{k\_{\rm B}T}\lambda\_{j\rightarrow i} + \frac{1}{k\_{\rm B}T}\Gamma\sigma^{x}\right) \\ =&\frac{Z\_{i}}{Z\_{\{i,j\}}}{\rm Tr}\_{\backslash i}\left[\exp\left(\frac{1}{k\_{\rm B}T}\left(J\left(\sigma^{z}\otimes I\right)\left(I\otimes\sigma^{z}\right) + \Gamma\left(\sigma^{x}\otimes I\right)\right)\right.\right. \\ &\qquad\left.\left. + \frac{1}{k\_{\rm B}T}I\otimes\left(hd\_{j}\sigma^{z} + \Gamma\sigma^{x} + \sum\_{l\in\partial j\backslash\{i\}}\lambda\_{l\rightarrow j}\right)\right)\right]. \end{split} \tag{10.434}$$

We remark that the modified message passing rules of Eq. (10.434) can be derived by considering the Bethe free energy functional in the cluster variation method with a ladder-type basic cluster for the probabilistic graphical model [107] in Eqs. (10.429)–(10.430). While the conventional framework of quantum belief propagation was given as a quantum cluster variation method in Ref. [90], some extensions of loopy belief propagation have been proposed in Refs. [108–111] from a quantum statistical mechanical standpoint.

#### **10.6 Concluding Remarks**

This chapter explored sublinear modeling based on statistical mechanical informatics for statistical machine learning. In statistical machine learning, we need to compute statistical quantities in massive probabilistic graphical models, and statistical mechanical informatics provides many approximate computational techniques for this purpose. One is the advanced mean-field framework, which includes mean-field methods and loopy belief propagation methods such as the Bethe approximation. The advanced mean-field framework can provide good accuracy for statistical quantities, including averages and covariances. Statistical quantities in probabilistic graphical models sometimes exhibit phase transitions when computed with advanced mean-field methods. As already shown in Sect. 10.3.3, there are two familiar types, namely, first- and second-order phase transitions. Each step of the EM algorithm is often affected by a first-order phase transition because the internal energy in the prior probabilistic model has a discontinuity. This difficulty appears in the convergence procedure of the EM algorithm, in which the trajectory of a hyperparameter passes through not only the equilibrium state but also metastable and unstable states in the loopy belief propagation of probabilistic segmentations in Sect. 10.3.5. We showed in Sect. 10.3.6 that some algorithms based on loopy belief propagation in probabilistic segmentations can be accelerated by inverse real-space renormalization group techniques.

The second part of this chapter explored quantum statistical machine learning and some statistical approximate algorithms in quantum statistical mechanical informatics for realizing this framework. Quantum mechanical computations for machine learning are rapidly developing in terms of both academic research and industrial implementation. In Sect. 10.4, we explained the modeling framework of density matrices and some of its fundamental mathematics, and expanded the modeling framework to the quantum expectation-maximization algorithm. In Sect. 10.5, we showed the fundamental frameworks of quantum loopy belief propagation and quantum statistical mechanical extensions of the adaptive TAP method. Moreover, we reviewed the Suzuki-Trotter expansion and the real-space and momentum-space renormalization groups for sublinear modeling of density matrices.

Recently, frameworks for massive fundamental mathematical modeling in statistical machine learning theory have been developed for many practical applications, chiefly sparse modeling [4, 5] and deep learning [10]. Many academic researchers are interested in interpreting such models from the standpoint of probabilistic graphical models in statistical mathematics [2, 3, 7] and statistical mechanical informatics [8, 9, 13, 17]. Novel technologies for realizing quantum computing from the standpoint of quantum mechanical extensions of statistical mechanical informatics are now available, for example, the **D-Wave quantum annealer**. Results in which D-Wave quantum annealers have achieved high-performance computing have appeared in Refs. [112–116]. Recent developments in probabilistic graphical modeling and the static and dynamical analyses of advanced mean-field methods and Suzuki-Trotter decompositions, as well as replica methods for realizing sublinear modeling, are shown from the statistical mechanical point of view in the subsequent Chaps. 12 and 13 of the present part of this book.

**Acknowledgements** This work was partly supported by the JST-CREST program (No. JPMJCR1402) of the Japan Science and Technology Agency and a JSPS KAKENHI Grant (No. 18H03303) from the Ministry of Education, Culture, Sports, Science, and Technology. The author is thankful for valuable comments from Profs. Masayuki Ohzeki and Manaka Okuyama of Tohoku University and Prof. Muneki Yasuda of Yamagata University in Japan, Prof. Federico Ricci-Tersenghi of Università di Roma La Sapienza in Italy, and Prof. Anthony C.C. Coolen of Radboud University in the Netherlands.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Chapter 11 Empirical Bayes Method for Boltzmann Machines**

**Muneki Yasuda**

**Abstract** The framework of the empirical Bayes method allows the estimation of the values of the hyperparameters in the Boltzmann machine by maximizing a specific likelihood function referred to as the empirical Bayes likelihood function. However, this maximization is computationally difficult because the empirical Bayes likelihood function involves intractable integrations of the partition function. The method presented in this chapter avoids this computational problem by using the replica method and the Plefka expansion; it is quite simple and fast because it does not require any iterative procedures, and it gives reasonable estimates under certain conditions.

#### **11.1 Introduction**

*Boltzmann machine learning* (BML) [1] has been actively studied in the fields of machine learning and statistical mechanics. In statistical mechanics, the problem of BML is sometimes referred to as the *inverse Ising problem* because a Boltzmann machine is the same as an Ising model, and it can be treated as an inverse problem for the Ising model. The framework of the *usual* BML is as follows. Given a set of observed data points, the appropriate values of the Boltzmann machine parameters, namely the biases and couplings, are estimated through maximum likelihood (ML) estimation. Because BML involves intractable multiple summations (i.e., evaluation of the partition function), several approximations have been proposed for it from the viewpoint of statistical mechanics [2]. Examples include methods based on mean-field approximations (e.g., the Plefka expansion [3] and the cluster variation method [4]) [5–11] and methods based on other approximations [12–14].

This chapter focuses on another type of learning problem for the Boltzmann machine. Consider the prior distributions of the Boltzmann machine parameters and assume that the prior distributions are governed by some hyperparameters. The introduction of the prior distributions is strongly connected to regularized ML estimation, in which the hyperparameters can be regarded as regularization coefficients. The regularized ML estimation is important for preventing overfitting to the dataset. As mentioned above, the *usual* BML aims to optimize the values of the Boltzmann machine parameters using a set of observed data points. However, the aim of the problem presented in this chapter is the estimation of the appropriate values of the hyperparameters from the dataset *without* estimating the specific values of the Boltzmann machine parameters. From the Bayesian viewpoint, this can be potentially accomplished by the *empirical Bayes method* (also known as type-II ML estimation or evidence approximation) [15, 16]. The schemes of the *usual* BML and the problem investigated in this chapter are illustrated in Fig. 11.1.

M. Yasuda (B)

Graduate School of Science and Engineering, Yamagata University, Yamagata, Japan e-mail: muneki@yz.yamagata-u.ac.jp

© The Author(s) 2022

N. Katoh et al. (eds.), *Sublinear Computation Paradigm*, https://doi.org/10.1007/978-981-16-4095-7\_11

**Fig. 11.1** Illustration of the scheme of the empirical Bayes method considered in this chapter

Recently, an effective algorithm was proposed for the empirical Bayes method for the Boltzmann machine [17]. Using this method, the hyperparameter estimates can be obtained without costly operations. This chapter aims to explain this effective method.

The rest of this chapter is organized as follows. The formulations of the Boltzmann machine and its usual and regularized ML estimations are presented in Sect. 11.2. The empirical Bayes method for the Boltzmann machine is presented in Sect. 11.3. Section 11.4 describes a statistical mechanical analysis for the empirical Bayes method and an inference algorithm obtained from the analysis. Experimental results for the presented algorithm are presented in Sect. 11.5. The summary and some discussions are presented in Sect. 11.6. The appendices for this chapter are given in Sect. 11.7.

#### **11.2 Boltzmann Machine with Prior Distributions**

Consider a fully connected Boltzmann machine with $n$ bipolar variables $\mathbf{S} := \{S\_i \in \{-1, +1\} \mid i = 1, 2, \dots, n\}$ [1]:

$$P(\mathbf{S} \mid h, \mathbf{J}) := \frac{1}{Z(h, \mathbf{J})} \exp\left(h \sum\_{i=1}^{n} S\_i + \sum\_{i<j} J\_{ij} S\_i S\_j \right), \tag{11.1}$$

where $\sum\_{i<j}$ is the sum over all distinct pairs of variables, that is, $\sum\_{i<j} = \sum\_{i=1}^{n} \sum\_{j=i+1}^{n}$. $Z(h, \mathbf{J})$ is the partition function defined by

$$Z(h, \mathbf{J}) := \sum\_{\mathbf{S}} \exp\left(h \sum\_{i=1}^n S\_i + \sum\_{i<j} J\_{ij} S\_i S\_j \right),$$

where $\sum\_{\mathbf{S}}$ is the sum over all possible configurations of $\mathbf{S}$, that is,

$$\sum\_{\mathbf{S}} := \prod\_{i=1}^{n} \sum\_{\mathbf{S}\_i = \pm 1} \cdot$$

The parameters $h \in (-\infty, +\infty)$ and $\mathbf{J} := \{J\_{ij} \in (-\infty, +\infty) \mid i < j\}$ denote the bias and couplings, respectively.

Given $N$ observed data points, $\mathcal{D} := \{\mathbf{S}^{(\mu)} \in \{-1, +1\}^n \mid \mu = 1, 2, \dots, N\}$, the log-likelihood function is defined as

$$L\_{\rm ML}(h, \mathbf{J}) := \frac{1}{nN} \sum\_{\mu=1}^{N} \ln P(\mathbf{S}^{(\mu)} \mid h, \mathbf{J}). \tag{11.2}$$

The maximization of the log-likelihood function with respect to *h* and *J* (i.e., the ML estimation) corresponds to BML (or the inverse Ising problem), that is,

$$\{\hat{h}\_{\rm ML}, \hat{J}\_{\rm ML}\} = \underset{h, \mathbf{J}}{\text{arg}\, \max} \, L\_{\rm ML}(h, \mathbf{J}). \tag{11.3}$$

However, the exact ML estimates cannot be obtained because the gradients of the log-likelihood function include intractable sums over $O(2^n)$ terms.
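The $O(2^n)$ cost can be seen directly: evaluating $Z(h, \mathbf{J})$ exactly requires enumerating every spin configuration. A brute-force sketch (ours, not from the text; the couplings are illustrative random values):

```python
import itertools, math, random

def partition_function(h, J, n):
    """Exact Z(h, J) by summing over all 2**n spin configurations;
    this exponential enumeration is why exact ML estimation is intractable."""
    Z = 0.0
    for S in itertools.product((-1, 1), repeat=n):
        energy = h * sum(S) + sum(J[i][j] * S[i] * S[j]
                                  for i in range(n) for j in range(i + 1, n))
        Z += math.exp(energy)
    return Z

n = 8
rng = random.Random(0)
J = [[rng.gauss(0.0, 0.3) for _ in range(n)] for _ in range(n)]  # only i < j entries used
Z = partition_function(0.1, J, n)
print(Z)  # 2**8 = 256 terms already; n = 100 would require 2**100 terms
```

For independent spins ($\mathbf{J} = 0$) the sum factorizes into $(2\cosh h)^n$, which gives a quick correctness check.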

We now introduce the prior distributions of the parameters *h* and *J* as *P*prior(*h* | *H*) and

$$P\_{\text{prior}}(\mathbf{J} \mid \gamma) := \prod\_{i<j} P\_{\text{prior}}(J\_{ij} \mid \gamma), \tag{11.4}$$

where *H* and γ are the hyperparameters of these prior distributions. One of the most important motivations for introducing the prior distributions is the Bayesian interpretation of the regularized ML estimation [16]. Given the observed dataset D, using the prior distributions, the posterior distribution of *h* and *J* is expressed as

$$P\_{\text{post}}(h, \mathbf{J} \mid \mathcal{D}, H, \gamma) = \frac{P(\mathcal{D} \mid h, \mathbf{J}) P\_{\text{prior}}(h \mid H) P\_{\text{prior}}(\mathbf{J} \mid \gamma)}{P(\mathcal{D} \mid H, \gamma)},\tag{11.5}$$

where

$$P(\mathcal{D} \mid h, J) := \prod\_{\mu=1}^N P(\mathbf{S}^{(\mu)} \mid h, J).$$

The denominator of Eq. (11.5) is sometimes referred to as *evidence*. Using the posterior distribution, the maximum a posteriori (MAP) estimation of the parameters is obtained as

$$\{\hat{h}\_{\text{MAP}}, \hat{J}\_{\text{MAP}}\} = \underset{h, J}{\text{arg}\,\text{max}}\, L\_{\text{MAP}}(h, J), \tag{11.6}$$

where

$$\begin{split} L\_{\text{MAP}}(h, \mathbf{J}) &:= \frac{1}{nN} \ln P\_{\text{post}}(h, \mathbf{J} \mid \mathcal{D}, H, \gamma) \\ &= L\_{\text{ML}}(h, \mathbf{J}) + \frac{1}{nN} R\_0(h) + \frac{1}{nN} R\_1(\mathbf{J}) + \text{constant}. \end{split} \tag{11.7}$$

The MAP estimation of Eq. (11.6) corresponds to the regularized ML estimation, in which $R\_0(h) := \ln P\_{\text{prior}}(h \mid H)$ and $R\_1(\mathbf{J}) := \ln P\_{\text{prior}}(\mathbf{J} \mid \gamma)$ work as penalty terms. For example, (i) when the prior distribution of $\mathbf{J}$ is a Gaussian prior,

$$P\_{\text{prior}}(J\_{ij} \mid \gamma) = \sqrt{\frac{n}{2\pi\gamma}} \exp\left(-\frac{nJ\_{ij}^2}{2\gamma}\right), \quad \gamma > 0,\tag{11.8}$$

$R\_1(\mathbf{J})$ corresponds to the $L\_2$ regularization term and $\gamma$ corresponds to its coefficient; (ii) when the prior distribution of $\mathbf{J}$ is a Laplace prior,

$$P\_{\text{prior}}(J\_{ij} \mid \mathcal{Y}) = \sqrt{\frac{n}{2\gamma}} \exp\left(-\sqrt{\frac{2n}{\gamma}} |J\_{ij}|\right), \quad \gamma > 0,\tag{11.9}$$

$R\_1(\mathbf{J})$ corresponds to the $L\_1$ regularization term and $\gamma$ again corresponds to its coefficient. The variances of these prior distributions are identical, that is, $\mathrm{Var}[J\_{ij}] = \gamma/n$.
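As a quick check (our sketch, not from the text) that the two priors indeed share the variance $\gamma/n$, one can integrate $x^2$ against each density numerically:

```python
import math

def variance(pdf, lo, hi, steps=20000):
    """Second moment of a zero-mean density by the midpoint rule."""
    dx = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * dx
        total += x * x * pdf(x) * dx
    return total

n, gamma = 10, 0.7  # illustrative values; the claim is Var[J_ij] = gamma / n

# Gaussian prior of Eq. (11.8) and Laplace prior of Eq. (11.9)
gauss = lambda x: math.sqrt(n / (2 * math.pi * gamma)) * math.exp(-n * x * x / (2 * gamma))
laplace = lambda x: math.sqrt(n / (2 * gamma)) * math.exp(-math.sqrt(2 * n / gamma) * abs(x))

print(variance(gauss, -5.0, 5.0), variance(laplace, -5.0, 5.0), gamma / n)
```

Both numerical variances agree with $\gamma/n$ to within the quadrature error.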

The following uses the Gaussian prior for *J* and the following as a simple test case:

$$P\_{\text{prior}}(h \mid H) = \delta(h - H),\tag{11.10}$$

where $\delta(x)$ is the Dirac delta function; that is, in this test case, $h$ does not fluctuate. It is noteworthy that the resultant algorithm obtained based on the Gaussian prior can be applied to the case of the Laplace prior without modification [17].

#### **11.3 Empirical Bayes Method**

Using the empirical Bayes method, the values of the hyperparameters, *H* and γ , can be inferred from the observed dataset, D. For the empirical Bayes method, a marginal log-likelihood function is defined as

$$L\_{\rm EB}(H,\gamma) := \frac{1}{nN} \ln \left[ P(\mathcal{D} \mid h, \mathbf{J}) \right]\_{h,\mathbf{J}},\tag{11.11}$$

where $[\cdots]\_{h,\mathbf{J}}$ is the average over the prior distributions, that is,

$$[\cdots]\_{h,\mathbf{J}} := \int d\mathbf{J} \int dh \,(\cdots)\, P\_{\text{prior}}(h \mid H) P\_{\text{prior}}(\mathbf{J} \mid \gamma).$$

This marginal log-likelihood function is referred to as the *empirical Bayes likelihood function* in this section. From the perspective of the empirical Bayes method, the optimal values of the hyperparameters, *H*ˆ and γˆ, are obtained by maximizing the empirical Bayes likelihood function, that is,

$$\{\hat{H}, \hat{\gamma}\} = \underset{H, \gamma}{\text{arg}\, \max}\, L\_{\text{EB}}(H, \gamma). \tag{11.12}$$

It is noteworthy that [*P*(D | *h*, *J*)]*h*, *<sup>J</sup>* in Eq. (11.11) is identified as the evidence appearing in Eq. (11.5).

The marginal log-likelihood function can be rewritten as

$$L\_{\rm EB}(H,\gamma) = \frac{1}{nN} \ln \left[ \exp \left( nN L\_{\rm ML}(h, \mathbf{J}) \right) \right]\_{h,\mathbf{J}}.\tag{11.13}$$

Consider the case $N \gg n$. In this case, by using the saddle point evaluation, Eq. (11.13) is reduced to

$$L\_{\rm EB}(H,\gamma) \approx \frac{1}{nN} \ln P\_{\rm prior}(\hat{h}\_{\rm ML} \mid H) + \frac{1}{nN} \ln P\_{\rm prior}(\hat{J}\_{\rm ML} \mid \gamma) + \text{constant}.$$

In this case, the empirical Bayes estimates $\{\hat{H}, \hat{\gamma}\}$ thus converge to the ML estimates of the hyperparameters in the prior distributions in which the ML estimates of the parameters $\{\hat{h}\_{\rm ML}, \hat{\mathbf{J}}\_{\rm ML}\}$ (i.e., the solution for BML) are inserted. This indicates that parameter estimation can be conducted independently of hyperparameter estimation. This trivial case is not considered in this section. Remember that the objective is to estimate the hyperparameter values *without* estimating the specific values of the parameters.

## **11.4 Statistical Mechanical Analysis of Empirical Bayes Likelihood**

The empirical Bayes likelihood function in Eq. (11.11) involves intractable multiple integrations. This section presents an evaluation of the empirical Bayes likelihood function using statistical mechanical analysis. The outline of the evaluation is as follows. First, the intractable multiple integrations in Eq. (11.11) are evaluated using the *replica method* [18, 19]. This evaluation leads to a quantity with a certain intractable multiple summation. The quantity is approximately evaluated using the *Plefka expansion* [3]. Thus, from the two approximations, the replica method and Plefka expansion, the evaluation result for the empirical Bayes likelihood function is obtained.

#### *11.4.1 Replica Method*

The empirical Bayes likelihood function in Eq. (11.11) can be represented as

$$L\_{\rm EB}(H,\gamma) = \frac{1}{nN} \ln \lim\_{x \to -1} \Psi\_x(H,\gamma),\tag{11.14}$$

where

$$\Psi\_{x}(H,\gamma) := \left[ Z(h,\mathbf{J})^{xN} \exp N \left( h \sum\_{i=1}^{n} d\_i + \sum\_{i<j} J\_{ij} d\_{ij} \right) \right]\_{h,\mathbf{J}}, \tag{11.15}$$

and

$$d\_i := \frac{1}{N} \sum\_{\mu=1}^N \mathbf{S}\_i^{(\mu)}, \quad d\_{ij} := \frac{1}{N} \sum\_{\mu=1}^N \mathbf{S}\_i^{(\mu)} \mathbf{S}\_j^{(\mu)}$$

are the sample averages of the observed data points. We now assume that $\tau\_x := xN$ is a natural number larger than zero. Accordingly, Eq. (11.15) can be expressed as

$$\begin{split} \Psi\_x(H,\gamma) = \Bigg[\sum\_{\mathcal{S}\_x} \exp\Bigg\{h \sum\_{i=1}^n \Bigg(\sum\_{a=1}^{\tau\_x} S\_i^{\{a\}} + N d\_i\Bigg) + \sum\_{i<j} J\_{ij}\Bigg(\sum\_{a=1}^{\tau\_x} S\_i^{\{a\}} S\_j^{\{a\}} + N d\_{ij}\Bigg)\Bigg\}\Bigg]\_{h,\mathbf{J}}, \end{split} \tag{11.16}$$

where $a, b \in \{1, 2, \dots, \tau\_x\}$ are the replica indices and $S\_i^{\{a\}}$ is the $i$th variable in the $a$th replica. $\mathcal{S}\_x := \{S\_i^{\{a\}} \mid i = 1, 2, \dots, n;\ a = 1, 2, \dots, \tau\_x\}$ is the set of all variables in the replicated system (see Fig. 11.2) and $\sum\_{\mathcal{S}\_x}$ is the sum over all possible configurations of $\mathcal{S}\_x$, that is,

$$\sum\_{\mathcal{S}\_x} := \prod\_{i=1}^n \prod\_{a=1}^{\mathfrak{r}\_x} \sum\_{\mathcal{S}\_i^{\{a\}} = \pm 1} \cdot$$

We evaluate $\Psi\_x(H,\gamma)$ under the assumption that $\tau\_x$ is a natural number, and then take the limit $x \to -1$ of the evaluation result as an analytic continuation,<sup>1</sup> to obtain the empirical Bayes likelihood function (this is the so-called *replica trick*).

By employing the Gaussian prior in Eq. (11.8), Eq. (11.16) becomes

$$\Psi\_x^{\text{Gauss}}(H,\gamma) = \exp\left\{nNHM + \frac{\gamma(n-1)N^2}{4}\left(C\_2 + \frac{x}{N}\right) - F\_x(H,\gamma)\right\},\tag{11.17}$$

where

$$M := \frac{1}{n} \sum\_{i=1}^{n} d\_i, \quad C\_k := \frac{2}{n(n-1)} \sum\_{i<j} d\_{ij}^k, \tag{11.18}$$

and

$$F\_x(H, \gamma) := -\ln \sum\_{\mathcal{S}\_x} \exp \left( -E\_x(\mathcal{S}\_x; H, \gamma) \right) \tag{11.19}$$

is the replicated (Helmholtz) free energy [20–23], where

<sup>1</sup> The justification for this analytic continuation may not be guaranteed mathematically. Thus, this type of analysis is regarded as a "trick."

$$\begin{split} E\_x(\mathcal{S}\_x; H, \gamma) := -H \sum\_{i=1}^n \sum\_{a=1}^{\tau\_x} S\_i^{\{a\}} - \frac{\gamma N}{n} \sum\_{i<j} d\_{ij} \sum\_{a=1}^{\tau\_x} S\_i^{\{a\}} S\_j^{\{a\}} - \frac{\gamma}{n} \sum\_{i<j} \sum\_{a<b} S\_i^{\{a\}} S\_j^{\{a\}} S\_i^{\{b\}} S\_j^{\{b\}} \end{split} \tag{11.20}$$

is the Hamiltonian (or energy function) of the replicated system, where $\sum\_{a<b}$ is the sum over all distinct pairs of replicas, that is, $\sum\_{a<b} = \sum\_{a=1}^{\tau\_x} \sum\_{b=a+1}^{\tau\_x}$.
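The data statistics entering these expressions, $d\_i$, $d\_{ij}$, $M$, and $C\_k$ of Eq. (11.18), are cheap to compute from the dataset; a toy sketch (ours, with a hypothetical four-sample dataset):

```python
import itertools

def sample_averages(data):
    """d_i, d_ij of the text and M, C_2 of Eq. (11.18) for a dataset of +/-1 rows."""
    N, n = len(data), len(data[0])
    d = [sum(S[i] for S in data) / N for i in range(n)]
    dij = {(i, j): sum(S[i] * S[j] for S in data) / N
           for i, j in itertools.combinations(range(n), 2)}
    M = sum(d) / n                                        # mean magnetization statistic
    C2 = 2.0 * sum(v * v for v in dij.values()) / (n * (n - 1))  # C_k with k = 2
    return M, C2

# hypothetical dataset: N = 4 observations of n = 3 bipolar variables
data = [(+1, +1, -1), (+1, -1, -1), (+1, +1, +1), (-1, +1, -1)]
print(sample_averages(data))
```

This single pass over the data is the only place the dataset enters, consistent with the $O(Nn^2)$ overall cost stated later for the method.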

#### *11.4.2 Plefka Expansion*

Because the replicated free energy in Eq. (11.19) includes intractable multiple summations, an approximation is required to proceed with the current evaluation. In this section, the replicated free energy in Eq. (11.19) is approximated using the Plefka expansion [3]. In brief, the Plefka expansion is a perturbative expansion of the Gibbs free energy, which is a dual form of the corresponding Helmholtz free energy.

The Gibbs free energy is obtained as

$$G\_x(m, H, \gamma) = -n\tau\_x H m + \underset{\lambda}{\mathrm{extr}} \left\{ \lambda n \tau\_x m - \ln \sum\_{\mathcal{S}\_x} \exp \left( -E\_x(\mathcal{S}\_x; \lambda, \gamma) \right) \right\}.\tag{11.21}$$

The derivation of this Gibbs free energy is described in Sect. 11.7.1. The summation in Eq. (11.21) can be performed when γ = 0, which gives

$$G\_x(m, H, 0) = -n\tau\_x Hm + n\tau\_x \mathop{\rm extr}\_{\lambda} \{\lambda m - \ln(2 \cosh \lambda)\}$$

$$= -n\tau\_x Hm + n\tau\_x e(m), \tag{11.22}$$

where *e*(*m*) is the negative mean-field entropy defined by

$$e(m) := \frac{1+m}{2} \ln \frac{1+m}{2} + \frac{1-m}{2} \ln \frac{1-m}{2}.\tag{11.23}$$
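The extremization in Eq. (11.22) can be checked directly: the stationary point is $\lambda = \mathrm{atanh}\,m$, and the extremal value of $\lambda m - \ln(2\cosh\lambda)$ reproduces $e(m)$ of Eq. (11.23). A small sketch of ours:

```python
import math

def e(m):
    """Negative mean-field entropy of Eq. (11.23)."""
    return ((1 + m) / 2) * math.log((1 + m) / 2) + ((1 - m) / 2) * math.log((1 - m) / 2)

def objective(lam, m):
    """The function extremized over lambda in Eq. (11.22)."""
    return lam * m - math.log(2 * math.cosh(lam))

for m in (-0.6, -0.1, 0.3, 0.8):
    lam = math.atanh(m)                       # stationary point: m = tanh(lambda)
    assert abs(objective(lam, m) - e(m)) < 1e-12
    assert abs(m - math.tanh(lam)) < 1e-12    # stationarity condition holds
print("extremum identity verified")
```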

In the context of the Plefka expansion, the Gibbs free energy $G\_x(m, H, \gamma)$ is approximated by the perturbation from $G\_x(m, H, 0)$. Expanding $G\_x(m, H, \gamma)$ around $\gamma = 0$ gives

$$\frac{G\_x(m, H, \gamma)}{nN} = -xHm + xe(m) + \phi\_x^{(1)}(m)\gamma + \phi\_x^{(2)}(m)\gamma^2 + O(\gamma^3), \quad (11.24)$$

where $\phi\_x^{(1)}(m)$ and $\phi\_x^{(2)}(m)$ are the expansion coefficients defined by

$$\phi\_x^{(k)}(m) := \frac{1}{nNk!} \lim\_{\gamma \to 0} \frac{\partial^k G\_x(m, H, \gamma)}{\partial \gamma^k}.$$

The forms of the two coefficients are presented in Eqs. (11.34) and (11.35) in Sect. 11.7.2.

From Eqs. (11.14), (11.17), (11.24), and (11.33), the approximation of the empirical Bayes likelihood function is obtained as

$$L\_{\rm EB}(H,\gamma) \approx HM - \underset{m}{\mathrm{extr}} \left[ Hm - e(m) + \Phi(m)\gamma + \phi\_{-1}^{(2)}(m)\gamma^2 \right],\tag{11.25}$$

where

$$\Phi(m) := \phi\_{-1}^{(1)}(m) - \frac{(n-1)N}{4n} \left(C\_2 - \frac{1}{N}\right).$$

The forms of $\phi\_{-1}^{(1)}(m)$ and $\phi\_{-1}^{(2)}(m)$ are presented in Eqs. (11.37) and (11.38) in Sect. 11.7.2.

#### *11.4.3 Algorithm for Hyperparameter Estimation*

As mentioned in Sect. 11.3, the empirical Bayes inference is achieved by maximizing $L\_{\rm EB}(H,\gamma)$ with respect to $H$ and $\gamma$ (cf. Eq. (11.12)). The extremum condition in Eq. (11.25) with respect to $H$ leads to

$$
\hat{m} = M,\tag{11.26}
$$

where $\hat{m}$ is the value of $m$ that satisfies the extremum condition in Eq. (11.25). By combining the extremum condition of Eq. (11.25) with respect to $m$ with Eq. (11.26),

$$\hat{H} = \text{atanh}M - \left(\frac{\partial \phi\_{-1}^{(1)}(M)}{\partial M}\gamma + \frac{\partial \phi\_{-1}^{(2)}(M)}{\partial M}\gamma^2\right) \tag{11.27}$$

is obtained, where atanh*x* is the inverse function of tanh *x*. From Eqs. (11.25) and (11.26), the optimal value of γ is obtained by

$$\hat{\gamma} = \underset{\gamma}{\text{arg}\,\max} \left[ -\Phi(M)\gamma - \phi\_{-1}^{(2)}(M)\gamma^2 \right].\tag{11.28}$$

Since Eq. (11.28) represents a univariate quadratic optimization, $\hat{\gamma}$ is immediately obtained as follows: (i) when $\phi\_{-1}^{(2)}(M) > 0$ and $\Phi(M) \geq 0$, or when $\phi\_{-1}^{(2)}(M) = 0$ and $\Phi(M) > 0$, $\hat{\gamma} = 0$; (ii) when $\phi\_{-1}^{(2)}(M) > 0$ and $\Phi(M) < 0$, $\hat{\gamma} = -\Phi(M)/\big(2\phi\_{-1}^{(2)}(M)\big)$; and (iii) $\hat{\gamma} \to \infty$ elsewhere. The case of $\phi\_{-1}^{(2)}(M) = \Phi(M) = 0$ is ignored because it is rarely observed in realistic settings. Using Eqs. (11.27) and (11.28), the solution to the empirical Bayes inference can be obtained without any iterative process. The pseudocode of the presented procedure is shown in Algorithm 1. The order of the computational complexity of the presented method is $O(Nn^2)$. Remember that the order of the computational complexity of the exact ML estimation is $O(2^n)$.
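Because Eq. (11.28) is a univariate quadratic maximization over $\gamma \geq 0$, the closed-form case analysis above can be transcribed directly (our sketch, not from the text; `Phi` and `phi2` stand for $\Phi(M)$ and $\phi\_{-1}^{(2)}(M)$):

```python
import math

def gamma_hat(Phi, phi2):
    """Maximizer of -Phi*g - phi2*g**2 over g >= 0, following the case analysis;
    the degenerate case Phi == phi2 == 0 is ignored per the text."""
    if (phi2 > 0 and Phi >= 0) or (phi2 == 0 and Phi > 0):   # case (i)
        return 0.0
    if phi2 > 0 and Phi < 0:                                  # case (ii): interior optimum
        return -Phi / (2.0 * phi2)
    return math.inf                                           # case (iii): elsewhere

print(gamma_hat(-0.3, 0.5))  # case (ii): -(-0.3)/(2*0.5) = 0.3
```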

#### **Algorithm 1** Proposed Inference Algorithm

1: **Input** Observed dataset: $\mathcal{D} := \{\mathbf{S}^{(\mu)} \in \{-1, +1\}^n \mid \mu = 1, 2, \dots, N\}$.

2: Compute $M$, $C\_1$, and $C\_2$ according to Eq. (11.18), and the statistic according to Eq. (11.36), using the dataset.

3: Determine $\hat{\gamma}$ using Eq. (11.28):

$$
\hat{\gamma} = \begin{cases}
0 & \text{case (i)} \\
-\Phi(M)/\big(2\phi\_{-1}^{(2)}(M)\big) & \text{case (ii)} \\
\infty & \text{elsewhere,}
\end{cases}
$$

where case (i): $\phi\_{-1}^{(2)}(M) > 0,\ \Phi(M) \geq 0$ or $\phi\_{-1}^{(2)}(M) = 0,\ \Phi(M) > 0$; and case (ii): $\phi\_{-1}^{(2)}(M) > 0,\ \Phi(M) < 0$.

4: Using $\hat{\gamma}$, determine $\hat{H}$ using Eq. (11.27).

5: **Output** $\hat{\gamma}$ and $\hat{H}$.

In the presented method, the value of $\hat{H}$ does not affect the determination of $\hat{\gamma}$. Several mean-field-based methods for BML (e.g., those listed in Sect. 11.1) have similar procedures, in which $\hat{\mathbf{J}}\_{\rm ML}$ is determined separately from $\hat{h}\_{\rm ML}$. This is a common property of mean-field-based methods for BML, including the current empirical Bayes problem.

Although the presented method is derived based on the Gaussian prior presented in Eq. (11.8), the same procedure can be applied to the case of the Laplace prior presented in Eq. (11.9) [17].

#### **11.5 Demonstration**

This section discusses the results of numerical experiments. In these experiments, the observed dataset $\mathcal{D}$ was generated by the generative Boltzmann machine (gBM), which has the same form as Eq. (11.1), via Gibbs sampling (with a simulated-annealing-like strategy). The parameters of the gBM were drawn from the prior distributions in Eqs. (11.4) and (11.10). This means that the model-matched case (i.e., the generative and learning models are identical) was considered. In the following, the notations $\alpha := N/n$ and $J := \sqrt{\gamma}$ are used. The standard deviations of the Gaussian prior in Eq. (11.8) and the Laplace prior in Eq. (11.9) can thus be represented as $J/\sqrt{n}$. The hyperparameters of the gBM are denoted by $H\_{\mathrm{true}}$ and $J\_{\mathrm{true}}$.
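As a concrete illustration of how such a dataset can be generated, the following is a minimal single-site Gibbs sampler for an Ising-type Boltzmann machine of the form of Eq. (11.1); the annealing-like schedule used in the actual experiments is omitted, and all names are ours:

```python
import numpy as np

def gibbs_sample_bm(J, h, n_samples, burn_in=1000, rng=None):
    """Draw n_samples spin configurations in {-1,+1}^n from a Boltzmann
    machine with symmetric couplings J (zero diagonal) and biases h, by
    single-site Gibbs sampling. A minimal sketch; no annealing schedule."""
    rng = rng or np.random.default_rng(0)
    n = len(h)
    s = rng.choice([-1, 1], size=n)
    samples = []
    for t in range(burn_in + n_samples):
        for i in range(n):
            # conditional distribution of spin i given all the others
            field = h[i] + J[i] @ s - J[i, i] * s[i]
            p_up = 1.0 / (1.0 + np.exp(-2.0 * field))  # P(s_i = +1 | rest)
            s[i] = 1 if rng.random() < p_up else -1
        if t >= burn_in:
            samples.append(s.copy())
    return np.array(samples)
```

The returned array has shape `(n_samples, n)`, matching the dataset format of Algorithm 1.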

#### *11.5.1 Gaussian Prior Case*

We now consider the case in which the prior distribution of *J* is the Gaussian prior in Eq. (11.8). In this case, the Boltzmann machine corresponds to the Sherrington-Kirkpatrick (SK) model [24], and thus exhibits a spin-glass transition at *J* = 1 when *h* = 0 (i.e., when *H* = 0).

We consider the case $H\_{\mathrm{true}} = 0$. The scatter plots of the estimate $\hat{J}$ for various $J\_{\mathrm{true}}$ when $H\_{\mathrm{true}} = 0$ and $\alpha = 0.4$ are shown in Fig. 11.3. When $J\_{\mathrm{true}} < 1$, the estimates $\hat{J}$ are highly consistent with $J\_{\mathrm{true}}$, whereas they deviate when $J\_{\mathrm{true}} > 1$. This implies that the validity of our perturbative approximation is lost in the spin-glass phase, as is often the case with mean-field approximations. Figure 11.4 shows the scatter plots for various $\alpha$. A smaller $\alpha$ causes $\hat{J}$ to be overestimated, and a larger $\alpha$ causes it to be underestimated. In our experiments, at least, the optimal value of $\alpha$ seems to be $\alpha\_{\mathrm{opt}} \approx 0.4$ when $H\_{\mathrm{true}} = 0$. Our method can also estimate $\hat{H}$. The results for the estimation of $\hat{H}$ when $H\_{\mathrm{true}} = 0$ and $\alpha = 0.4$ are shown in Fig. 11.5. Figure 11.5a, b shows the average of $|H\_{\mathrm{true}} - \hat{H}|$ (i.e., the mean absolute error (MAE)) and the standard deviation of $\hat{H}$ over 300 experiments, respectively. Both the MAE and the standard deviation increase in the region where $J\_{\mathrm{true}} > 1$.

**Fig. 11.3** Scatter plots of $J\_{\mathrm{true}}$ (horizontal axis) versus $\hat{J}$ (vertical axis) when $H\_{\mathrm{true}} = 0$ and $\alpha = 0.4$: **a** *n* = 300 and **b** *n* = 500. These plots represent the average values over 300 experiments

#### *11.5.2 Laplace Prior Case*

We now consider the case in which the prior distribution of *J* is the Laplace prior in Eq. (11.9). The scatter plots for the estimation of *J*ˆ for various values of *J*true when *H*true = 0 are shown in Fig. 11.6. The plots shown in Fig. 11.6 almost completely overlap with those in Fig. 11.4.

**Fig. 11.4** Scatter plots of *J*true (horizontal axis) versus *J*ˆ (vertical axis) for various α = *N*/*n* when *H*true = 0: **a** *n* = 300 and **b** *n* = 500. These plots represent the average values over 300 experiments

**Fig. 11.5** Results of estimation of *H*ˆ versus *J*true when *H*true = 0 and α = 0.4: **a** the MAE and **b** standard deviation. These plots represent the average values over 300 experiments

**Fig. 11.6** Scatter plots of *J*true (horizontal axis) versus *J*ˆ (vertical axis) for various α = *N*/*n*, when *H*true = 0, in the case of the Laplace prior: **a** *n* = 300 and **b** *n* = 500. These plots represent the average values over 300 experiments

#### **11.6 Summary and Discussion**

This chapter has described the hyperparameter inference algorithm proposed in [17]. As is evident from the numerical experiments, the proposed inference method works efficiently in both the Gaussian and Laplace prior cases, except in the spin-glass phase. However, the presented method has the drawback of being sensitive to the value of $\alpha = N/n$. In the experiments in Sect. 11.5, $\alpha \approx 0.4$ was appropriate when $H\_{\mathrm{true}} = 0$, but it is known that the appropriate value decreases as $H\_{\mathrm{true}}$ increases [17]. Since we cannot know the value of $H\_{\mathrm{true}}$ in advance, the appropriate setting of $\alpha$ is also unknown; the estimation of $\alpha\_{\mathrm{opt}}$ remains an open problem. The existence of an optimal value of $\alpha$ seems unnatural, because in usual machine learning larger datasets are always better. This peculiar behavior can be attributed to the truncating approximation in the Plefka expansion. A more detailed discussion of this issue is presented in [17].

Finally, we review the presented method from the perspective of sublinear computation, setting aside the aforementioned issues. The Boltzmann machine in Eq. (11.1) has $p$ parameters, where $p = O(n^2)$. In usual machine learning, $N = O(p)$ is required, at least, to obtain a good ML estimate for the Boltzmann machine. Therefore, a hyperparameter inference "without" the empirical Bayes method (namely, the strategy in which the hyperparameters are inferred through the ML estimate, in a similar manner as discussed in the latter part of Sect. 11.3) requires a dataset of size $O(p)$. The presented method, however, requires only $N = O(n) = O(\sqrt{p})$, because $\alpha = O(1)$ with respect to $n$.

**Acknowledgements** This work was partially supported by JSPS KAKENHI (Grant Numbers: 15H03699, 18K11459, 18H03303, 25120013, and 17H00764), JST CREST (Grant Number: JPMJCR1402), and the COI Program from the JST (Grant Number JPMJCE1312).

#### **11.7 Appendices**

#### *11.7.1 Appendix 1: Gibbs Free Energy*

In this appendix, we derive the Gibbs free energy for the replicated (Helmholtz) free energy in Eq. (11.19).

The replicated free energy is obtained by minimizing the variational free energy, defined by

$$f[\mathcal{Q}] := \sum\_{\mathcal{S}\_x} E\_x(\mathcal{S}; H, \mathcal{\boldsymbol{\nu}}) \mathcal{Q}(\mathcal{S}\_x) + \sum\_{\mathcal{S}\_x} \mathcal{Q}(\mathcal{S}\_x) \ln \mathcal{Q}(\mathcal{S}\_x), \tag{11.29}$$

under the normalization constraint $\sum\_{\mathcal{S}\_x} Q(\mathcal{S}\_x) = 1$, where $Q(\mathcal{S}\_x)$ is a test distribution over $\mathcal{S}\_x$, and $E\_x(\mathcal{S}\_x; H, \gamma)$ is the Hamiltonian for the replicated system defined in Eq. (11.20).

The Gibbs free energy is obtained by adding new constraints to the minimization of *f* [*Q*]. We add the relationship

$$m = \frac{1}{n\tau\_x} \sum\_{i=1}^{n} \sum\_{a=1}^{\tau\_x} \sum\_{\mathcal{S}\_x} S\_i^{[a]}\, Q(\mathcal{S}\_x) \tag{11.30}$$

as the constraint. Using Lagrange multipliers, the Gibbs free energy is obtained as

$$G\_x(m, H, \gamma) := \underset{Q, \lambda, r}{\text{extr}} \bigg\{ f[Q] - r \Big( \sum\_{\mathcal{S}\_x} Q(\mathcal{S}\_x) - 1 \Big) - \lambda \Big( \sum\_{i=1}^{n} \sum\_{a=1}^{\tau\_x} \sum\_{\mathcal{S}\_x} S\_i^{[a]}\, Q(\mathcal{S}\_x) - n \tau\_x m \Big) \bigg\}, \tag{11.31}$$

where "extr" denotes the extremum with respect to the assigned parameters, and $r$ and $\lambda$ are the Lagrange multipliers for the normalization constraint of $Q(\mathcal{S}\_x)$ and for the constraint in Eq. (11.30), respectively. Performing the extremum operation with respect to $Q(\mathcal{S}\_x)$ and $r$ in Eq. (11.31) gives

$$G\_x(m, H, \gamma) = \underset{\lambda}{\text{extr}} \left\{ \lambda n \tau\_x m - \ln \sum\_{\mathcal{S}\_x} \exp \left( -E\_x(\mathcal{S}\_x; H + \lambda, \gamma) \right) \right\}. \tag{11.32}$$

The replicated free energy in Eq. (11.19) coincides with the extremum of this Gibbs free energy with respect to *m*, that is,

$$F\_x(H, \gamma) = \underset{m}{\text{extr}}\; G\_x(m, H, \gamma). \tag{11.33}$$

By performing the shift *H* + λ → λ in Eq. (11.32), Eq. (11.21) is obtained.

#### *11.7.2 Appendix 2: Coefficients of Plefka Expansion*

This appendix presents the coefficients of the Plefka expansion in Eq. (11.24). Refer to Ref. [17] for a detailed derivation. The first-order coefficient is given by

$$\phi\_x^{(1)}(m) = -\frac{(n-1)\tau\_x C\_1}{2n}m^2 - \frac{(n-1)K\_x}{2nN}m^4,\tag{11.34}$$

where *Kx* := τ*<sup>x</sup>* (τ*<sup>x</sup>* − 1)/2. The second-order coefficient is given by

$$\begin{split} \phi\_x^{(2)}(m) &= -\frac{(n-1)^2 \tau\_x N \Omega}{2n^2} m^2 (1-m^2) - \frac{(n-1)\tau\_x N C\_2}{4n^2} (1-m^2)^2 \\ &- \frac{(n-1)K\_x C\_1}{n^2} m^2 (1-m^2)^2 - \frac{(n-1)K\_x}{2n^2 N} (n+\tau\_x-3) m^4 (1-m^2)^2 \\ &- \frac{(n-1)K\_x}{4n^2 N} (1-m^4)^2, \end{split} \tag{11.35}$$

where $\Omega$ in the first term of Eq. (11.35) is defined as

$$\Omega := \frac{1}{n} \sum\_{i=1}^{n} \omega\_i^2, \quad \omega\_i := \frac{1}{n-1} \sum\_{j \in \partial(i)} d\_{ij} - C\_1,\tag{11.36}$$

where $\partial(i) := \{1, 2, \ldots, n\} \setminus \{i\}$. When $x = -1$, these coefficients are

$$
\phi\_{-1}^{(1)}(m) = \frac{(n-1)NC\_1}{2n}m^2 - \frac{(n-1)(N+1)}{4n}m^4,\tag{11.37}
$$

$$
\begin{split}
\phi\_{-1}^{(2)}(m) &= \frac{(n-1)^2N^2\Omega}{2n^2}m^2(1-m^2) + \frac{(n-1)N^2C\_2}{4n^2}(1-m^2)^2 \\ &- \frac{(n-1)N(N+1)C\_1}{2n^2}m^2(1-m^2)^2 \\ &- \frac{(n-1)(N+1)}{4n^2}(n-N-3)m^4(1-m^2)^2 - \frac{(n-1)(N+1)}{8n^2}(1-m^4)^2.\end{split}\tag{11.38}
$$

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Chapter 12 Dynamical Analysis of Quantum Annealing**

**Anthony C. C. Coolen, Theodore Nikoletopoulos, Shunta Arai, and Kazuyuki Tanaka**

**Abstract** Quantum annealing aims to provide a faster method than classical computing for finding the minima of complicated functions, and it has created increasing interest in the relaxation dynamics of quantum spin systems. Moreover, problems in quantum annealing caused by first-order phase transitions can be reduced via appropriate temporal adjustment of control parameters, and in order to do this optimally, it is helpful to predict the evolution of the system at the level of macroscopic observables. Solving the dynamics of quantum ensembles is nontrivial, requiring modeling of both the quantum spin system and its interaction with the environment with which it exchanges energy. An alternative approach to the dynamics of quantum spin systems was proposed about a decade ago. It involves creating stochastic proxy dynamics via the Suzuki-Trotter mapping of the quantum ensemble to a classical one (the quantum Monte Carlo method), and deriving from this new dynamics closed macroscopic equations for macroscopic observables using the dynamical replica method. In this chapter, we give an introduction to this approach, focusing on the ideas and assumptions behind the derivations, and on its potential and limitations.

## **12.1 Quantum Ensembles and Their Dynamics**

We imagine an ensemble of $K$ independent quantum systems $|\psi^\alpha\rangle$, labeled by $\alpha = 1 \ldots K$, all with the same Hamiltonian but distinct initial conditions. Making a measurement of an observable $A$ in this ensemble means randomly picking one of the $K$ systems, with equal probabilities, and measuring $A$ in the selected system. The average of the observable $A$ can then be written as $\langle A \rangle = \text{Tr}(\rho A)$, where $\rho$, the density matrix, is the Hermitian nonnegative definite operator $\rho = K^{-1} \sum\_{\alpha=1}^{K} |\psi^\alpha\rangle\langle\psi^\alpha|$,

S. Arai · K. Tanaka
Graduate School of Information Sciences, Tohoku University, Sendai 980-8579, Japan

© The Author(s) 2022
N. Katoh et al. (eds.), *Sublinear Computation Paradigm*, https://doi.org/10.1007/978-981-16-4095-7\_12

A. C. C. Coolen (B) · T. Nikoletopoulos

Department of Biophysics, Faculty of Science, Radboud University, 6525AJ Nijmegen, The Netherlands

e-mail: a.coolen@science.ru.nl

Saddle Point Science Ltd, York, United Kingdom

with $\text{Tr}(\rho) = 1$. Since $\rho$ is Hermitian, it has a complete basis of eigenstates $\{|k\rangle\}$. Its eigenvalues $w\_k$, which are nonnegative and normalized according to $\sum\_k w\_k = 1$, can be interpreted as probabilities. We can now write $\langle A \rangle = \sum\_n a\_n \sum\_k w\_k |\langle k|n\rangle|^2$. Hence the probability of measuring eigenvalue $a\_n$ of observable $A$ in the ensemble is $P\_n = \sum\_k w\_k |\langle k|n\rangle|^2$, where $|\langle k|n\rangle|^2$ is the probability of observing $a\_n$ in eigenstate $k$ of the density matrix, and $w\_k$ is the probability of finding the ensemble in eigenstate $k$.

The evolution of the density matrix follows from the evolution of the states $|\psi^\alpha\rangle$, each governed by the Schrödinger equation, giving $\frac{\mathrm{d}}{\mathrm{d}t}\rho = (\mathrm{i}\hbar)^{-1}[H, \rho]$. The solution is $\rho\_t = \mathrm{e}^{-\mathrm{i}Ht/\hbar}\,\rho\_{t=0}\,\mathrm{e}^{\mathrm{i}Ht/\hbar}$. In particular, it follows using the eigenbasis $\{|E\rangle\}$ of $H$ that

$$\langle H \rangle = \sum\_{E} \langle E | \mathbf{e}^{-iHt/\hbar} \rho\_{t=0} \mathbf{e}^{\mathrm{i}Ht/\hbar} H | E \rangle \, = \,\langle H \rangle\_{t=0}. \tag{12.1}$$

At equilibrium $[H, \rho] = 0$. The density matrix can therefore be diagonalized simultaneously with $H$, that is, $\rho = \sum\_E f(E)\, |E\rangle\langle E|$. The values of $f(E)$ define the type of equilibrium ensemble at hand. In the canonical ensemble we have $f(E) = \exp(-\beta E)/\mathcal{Z}(\beta)$, so

$$\rho = \frac{1}{\mathcal{Z}(\beta)} \sum\_{E} \mathbf{e}^{-\beta E} |E\rangle\langle E| \, \, = \, \frac{1}{\mathcal{Z}(\beta)} \mathbf{e}^{-\beta H}.\tag{12.2}$$

The quantum partition function $\mathcal{Z}(\beta)$ follows from $\text{Tr}(\rho) = 1$: $\mathcal{Z}(\beta) = \text{Tr}(\mathrm{e}^{-\beta H})$. The free energy and the average internal energy are given by $F = -\beta^{-1}\log\mathcal{Z}(\beta)$ and $E = -\frac{\partial}{\partial\beta}\log\mathcal{Z}(\beta)$. The expectation values of operators become $\langle A \rangle = \mathcal{Z}(\beta)^{-1}\,\text{Tr}(\mathrm{e}^{-\beta H} A)$. Note that if the systems of the ensemble evolve strictly according to the Schrödinger equation, there cannot be generic evolution of $\rho$ toward the equilibrium form in Eq. (12.2): for any initial density operator whose energy $\langle H \rangle\_{t=0}$ differs from the equilibrium value, this is ruled out by Eq. (12.1). Since the state in Eq. (12.2) describes the result of equilibration of quantum systems in a heat bath with which they can exchange energy, a correct description of the dynamics requires a Hamiltonian that also describes the degrees of freedom of the heat bath.
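The canonical-ensemble identities above are easy to verify numerically for a single spin in a transverse field; the helper below is ours, with the matrix exponential computed via the eigendecomposition of the Hermitian Hamiltonian:

```python
import numpy as np

def canonical_rho(H, beta):
    """Return (rho, Z) for rho = exp(-beta H)/Z, using eigh on Hermitian H."""
    w, V = np.linalg.eigh(H)
    Z = np.exp(-beta * w).sum()
    rho = (V * np.exp(-beta * w)) @ V.conj().T / Z
    return rho, Z

Gamma, beta = 0.8, 2.0
H = -Gamma * np.array([[0.0, 1.0], [1.0, 0.0]])   # single spin, x field only
rho, Z = canonical_rho(H, beta)
assert np.isclose(np.trace(rho), 1.0)              # Tr(rho) = 1
# E = Tr(rho H) agrees with -d/d(beta) log Z (central difference)
E = np.trace(rho @ H).real
db = 1e-6
E_num = -(np.log(canonical_rho(H, beta + db)[1])
          - np.log(canonical_rho(H, beta - db)[1])) / (2 * db)
assert np.isclose(E, E_num, atol=1e-5)
```

For this Hamiltonian the internal energy is $-\Gamma\tanh(\beta\Gamma)$, which the code reproduces.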

This is the first obstacle in the analysis of the dynamics of quantum ensembles: it is difficult even to write down the correct microscopic dynamical laws. A similar situation occurs in the classical setting. Without a heat bath we have a micro-canonical ensemble with conserved energy. Deriving the Gibbs-Boltzmann distribution from the joint dynamics of the system and heat bath requires us to connect deterministic trajectories to invariant measures via ergodic theory, and to subsequently derive the form of these measures, which has so far proven possible for only a handful of models.

The approach followed in [1] was to circumvent ensembles altogether and solve the Schrödinger equation for small systems in which a decaying longitudinal field acts as quantum noise (which is indeed what happens in quantum annealing). In classical systems one often *defines* the pain away. One constructs an intuitively reasonable stochastic process that evolves toward the Gibbs-Boltzmann state, usually of the Markov Chain Monte Carlo (MCMC) form. This process is studied as a proxy for the dynamics of the original system. The price paid is that one cannot be sure to what extent the stochastic dynamics are close to those of the original system. The MCMC equations are not even unique, since there are many choices that evolve to the Gibbs-Boltzmann state. The same dynamics strategy can be applied to quantum systems if the latter can be mapped to classical ones. This is achieved by the Suzuki-Trotter formalism [4].

#### **12.2 Quantum Monte Carlo Dynamics**

In order to apply quantum annealing to optimization problems formulated in terms of binary variables, one needs spin-$\frac{1}{2}$ particles [1]. These are labeled by $i = 1 \ldots N$, with Pauli matrices $\{\sigma\_i^x, \sigma\_i^y, \sigma\_i^z\}$. In the standard representation of $\sigma^z$-eigenstates:

$$
\sigma^x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad \sigma^y = \begin{pmatrix} 0 & -\mathrm{i} \\ \mathrm{i} & 0 \end{pmatrix}, \qquad \sigma^z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.
$$

In quantum annealing one chooses Hamiltonians of the form $H = H\_0 + H\_1$, in which $H\_0$ is obtained by replacing the classical spins $\sigma\_i = \pm 1$ in an Ising Hamiltonian by the matrices $\sigma\_i^z$, and a second part $H\_1$ acts as a form of quantum noise<sup>1</sup>:

$$H\_0 = -\sum\_{i<j} J\_{ij}\, \sigma\_i^z \sigma\_j^z - h \sum\_{i} \sigma\_i^z, \qquad H\_1 = -\Gamma \sum\_{i} \sigma\_i^x. \tag{12.3}$$

$H\_0$ represents the quantity to be minimized in our optimization problem. The classical state achieving this minimum follows from the quantum ground state of the system upon moving the parameters $\Gamma$ and $\beta^{-1}$ adiabatically slowly to zero, and is hence obtained from the partition function $\mathcal{Z}(\beta) = \text{Tr}(\mathrm{e}^{-\beta H\_0 - \beta H\_1})$. For excellent reviews of the physics and the applications of the above types of quantum spin systems with transverse fields, we refer to [2, 3].

The Suzuki-Trotter procedure [4] allows us to convert the above quantum problem into a classical one using the operator identity

$$\mathbf{e}^{A+B} = \lim\_{M \to \infty} \left( \mathbf{e}^{A/M} \mathbf{e}^{B/M} \right)^M. \tag{12.4}$$
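The identity above is easy to check numerically for a pair of non-commuting Hermitian matrices; the error of the finite-$M$ product shrinks roughly like $1/M$ (the helper function is ours):

```python
import numpy as np

def expm_herm(A):
    """Matrix exponential of a Hermitian matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * np.exp(w)) @ V.conj().T

# sigma^x and sigma^z do not commute, so e^{A+B} differs from e^A e^B
sx = np.array([[0.0, 1.0], [1.0, 0.0]])
sz = np.array([[1.0, 0.0], [0.0, -1.0]])

exact = expm_herm(sx + sz)
errors = {}
for M in (1, 10, 100):
    trotter = np.linalg.matrix_power(expm_herm(sx / M) @ expm_herm(sz / M), M)
    errors[M] = np.max(np.abs(trotter - exact))
# errors[1] > errors[10] > errors[100]: the Trotter product converges
```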

From now on we assume that $A$ and $B$ are Hermitian operators, and we write the basis of eigenstates of $A$ as $\{|n\rangle\}$, with $A|n\rangle = a\_n |n\rangle$. We then obtain after some simple manipulations:

<sup>1</sup> For simplicity, we choose *H*<sup>0</sup> here to be quadratic in the spins, and the external field to be uniform, but this is not essential.


$$\operatorname{Tr}(\mathbf{e}^{A+B}) = \lim\_{M \to \infty} \sum\_{n\_1 \dots n\_M} \mathbf{e}^{\sum\_{k=1}^M a\_{n\_k}/M} \prod\_{k, \text{ mod}(M)} \langle n\_k | \mathbf{e}^{B/M} | n\_{k+1} \rangle . \tag{12.5}$$

Application to $A = -\beta H\_0$ and $B = -\beta H\_1$, where the relevant basis is that of the joint eigenstates of all $\{\sigma\_i^z\}$, that is, $|s\_1, \ldots, s\_N\rangle = |s\_1\rangle \otimes \cdots \otimes |s\_N\rangle$, with $s\_i = \pm 1$ and $\sigma\_i^z |s\_1, \ldots, s\_N\rangle = s\_i |s\_1, \ldots, s\_N\rangle$, gives $\mathcal{Z}(\beta) = \lim\_{M\to\infty} \mathcal{Z}\_M(\beta)$, where

$$\begin{split} \mathcal{Z}\_{M}(\beta) &= \sum\_{\{s\_{ik} = \pm 1\}} \mathrm{e}^{(\beta/M) \sum\_{k=1}^{M} [\sum\_{i<j} J\_{ij} s\_{ik} s\_{jk} + h \sum\_{i} s\_{ik}]} \prod\_{k,\, \text{mod}(M)} \prod\_{i=1}^{N} \langle s\_{ik} | \mathrm{e}^{(\beta \Gamma/M) \sigma\_i^x} | s\_{i,k+1} \rangle \\ &= \mathrm{e}^{\frac{1}{2}NM \log\left[\frac{1}{2} \sinh(2\beta \Gamma/M)\right]} \\ &\quad\times \sum\_{\{s\_{ik} = \pm 1\}} \mathrm{e}^{(\beta/M) \sum\_{k=1}^{M} [\sum\_{i<j} J\_{ij} s\_{ik} s\_{jk} + h \sum\_{i} s\_{ik}] + B \sum\_{k,\,\text{mod}(M)} \sum\_{i} s\_{ik} s\_{i,k+1}}, \end{split} \tag{12.6}$$

in which *<sup>B</sup>* = −<sup>1</sup> <sup>2</sup> log tanh(β/*M*). Thus the partition function of the *N*-spin quantum system is mapped (apart from a constant) onto the limit *M* → ∞ of that of a classical Ising model with *N M* spins *s* = {*sik* }, with Hamiltonian *H*(*s*) and asymptotic free energy density *f* = lim*<sup>N</sup>*→∞ lim*<sup>M</sup>*→∞ *fN*,*<sup>M</sup>* :

$$H(\mathbf{s}) = -\frac{1}{M} \sum\_{k=1}^{M} \Big( \sum\_{i<j} J\_{ij} s\_{ik} s\_{jk} + h \sum\_{i} s\_{ik} \Big) - \frac{B}{\beta} \sum\_{k,\, \text{mod}(M)} \sum\_{i} s\_{ik} s\_{i,k+1}, \tag{12.7}$$

$$\begin{split} f\_{N,M} &= -\frac{M}{2\beta} \log \left[ \frac{1}{2} \sinh(2\beta \Gamma/M) \right] \\ &\quad - \frac{1}{\beta N} \log \sum\_{\{s\_{ik} = \pm 1\}} \mathrm{e}^{\frac{\beta}{M} \sum\_{k=1}^{M} (\sum\_{i<j} J\_{ij} s\_{ik} s\_{jk} + h \sum\_{i} s\_{ik}) + B \sum\_{k,\,\text{mod}(M)} \sum\_{i} s\_{ik} s\_{i,k+1}}. \end{split} \tag{12.8}$$

The new system in Eq. (12.8), which for $M \to \infty$ is equivalent to the original quantum one, lends itself to the construction of a stochastic dynamics. We first write the Suzuki-Trotter Hamiltonian in the standard form of $NM$ interacting Ising spins in an external field:

$$H(\mathbf{s}) = -\frac{1}{2} \sum\_{ik, j\ell} s\_{ik} J\_{ik, j\ell} s\_{j\ell} - \theta \sum\_{ik} s\_{ik},\tag{12.9}$$

$$J\_{ik,j\ell} = \frac{1}{M}\delta\_{k\ell}J\_{ij}(1-\delta\_{ij}) + \frac{B}{\beta}\delta\_{ij}(\delta\_{k,\ell+1}+\delta\_{\ell,k+1}), \qquad \theta = h/M. \tag{12.10}$$

The conventional Glauber dynamics by which this classical system evolves toward the equilibrium state with the above Hamiltonian is, after switching to continuous time [5] and denoting by *pt*(*s*) the probability of finding the system in state *s* at time *t*:

$$\tau \frac{\mathrm{d}}{\mathrm{d}t} p\_t(\mathbf{s}) = \sum\_{i=1}^{N} \sum\_{k=1}^{M} \left\{ p\_t(F\_{ik}\mathbf{s})\, w\_{ik}(F\_{ik}\mathbf{s}) - p\_t(\mathbf{s})\, w\_{ik}(\mathbf{s}) \right\},\tag{12.11}$$

$$w\_{ik}(\mathbf{s}) = \frac{1}{2} [1 - s\_{ik} \tanh(\beta h\_{ik}(\mathbf{s}))], \qquad h\_{ik}(\mathbf{s}) = \sum\_{j\ell} J\_{ik, j\ell}\, s\_{j\ell} + \theta. \tag{12.12}$$

This master equation describes a process in which at each step a site $i \in \{1, \ldots, N\}$ and a Trotter slice $k \in \{1, \ldots, M\}$ are picked at random, followed by an attempt to flip the spin $s\_{ik}$. The $w\_{ik}(\mathbf{s})$ denote the transition rates for $s\_{ik} \to -s\_{ik}$, and $F\_{ik}$ is an operator that flips the spin $s\_{ik}$ and leaves all others invariant. The parameter $\tau$ defines the time units, such that the average duration of a single spin update is $\tau/N$. Working out the local fields $h\_{ik}(\mathbf{s})$ gives

$$h\_{ik}(\mathbf{s}) = \frac{1}{M} \sum\_{j \neq i} J\_{ij} \mathbf{s}\_{jk} + \frac{B}{\beta} (\mathbf{s}\_{i,k+1} + \mathbf{s}\_{i,k-1}) + h/M. \tag{12.13}$$

The process in Eqs. (12.11, 12.12) is suitable for numerical simulation and defines the quantum Monte Carlo dynamics for the ensemble with Hamiltonian given by Eq. (12.3) provided we take *M* →∞. When applied to quantum annealing models, some authors have called it 'simulated quantum annealing'. The definition in Eqs. (12.11, 12.12), however, is not unique. Many alternative stochastic processes evolve toward the same Gibbs-Boltzmann state (see, e.g., [6]).
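A single sweep of this quantum Monte Carlo dynamics can be sketched as follows (illustrative only: the call signature and names are ours, the couplings $J$ are assumed symmetric with zero diagonal, and the Trotter index $k$ is periodic as in the text):

```python
import numpy as np

def glauber_sweep(s, J, h, Gamma, beta, rng):
    """One sweep (N*M attempted flips) of the Glauber dynamics of
    Eqs. (12.11)-(12.12) on an N x M array s of +/-1 spins, with the
    local fields of Eq. (12.13). J: symmetric, zero diagonal."""
    N, M = s.shape
    B = -0.5 * np.log(np.tanh(beta * Gamma / M))  # Trotter coupling
    for _ in range(N * M):
        i, k = rng.integers(N), rng.integers(M)
        h_ik = (J[i] @ s[:, k]) / M \
            + (B / beta) * (s[i, (k + 1) % M] + s[i, (k - 1) % M]) \
            + h / M
        # transition rate w_ik(s) = [1 - s_ik tanh(beta h_ik)] / 2
        if rng.random() < 0.5 * (1.0 - s[i, k] * np.tanh(beta * h_ik)):
            s[i, k] *= -1
    return s
```

Annealing would correspond to slowly decreasing $\Gamma$ (and possibly $\beta^{-1}$) between sweeps.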

#### **12.3 Dynamical Replica Analysis**

The remaining challenge is to extract formulae describing the evolution of relevant macroscopic quantities from Eqs. (12.11, 12.12). This was addressed in [7–10] using the so-called dynamical replica theory (DRT) [12–14]. In this chapter, we deviate from the definitions in [7–10] and stay closer to the original DRT ideas.

The dynamics (12.11, 12.12) imply that expectation values $\langle G(\mathbf{s})\rangle = \sum\_{\mathbf{s}} p\_t(\mathbf{s})\, G(\mathbf{s})$ evolve according to:

$$\tau \frac{\mathrm{d}}{\mathrm{d}t} \langle G(\mathbf{s}) \rangle = \sum\_{i=1}^{N} \sum\_{k=1}^{M} \sum\_{\mathbf{s}} p\_t(\mathbf{s})\, w\_{ik}(\mathbf{s}) \Big[ G(F\_{ik}\mathbf{s}) - G(\mathbf{s}) \Big]. \tag{12.14}$$

To study the joint dynamics of a set of $L$ observables $\mathbf{\Omega}(\mathbf{s}) = (\Omega\_1(\mathbf{s}), \ldots, \Omega\_L(\mathbf{s}))$ we substitute $G(\mathbf{s}) = \delta[\mathbf{\Omega} - \mathbf{\Omega}(\mathbf{s})]$. Now $\langle G(\mathbf{s})\rangle = P\_t(\mathbf{\Omega})$, and

$$\tau \frac{\mathrm{d}}{\mathrm{d}t} P\_t(\mathbf{\Omega}) = \sum\_{i=1}^{N} \sum\_{k=1}^{M} \sum\_{\mathbf{s}} p\_t(\mathbf{s})\, w\_{ik}(\mathbf{s}) \Big[ \delta[\mathbf{\Omega} - \mathbf{\Omega}(F\_{ik}\mathbf{s})] - \delta[\mathbf{\Omega} - \mathbf{\Omega}(\mathbf{s})] \Big]. \tag{12.15}$$

If the observables $\Omega\_\mu(\mathbf{s})$ are $O(1)$ and macroscopic in nature, their susceptibility to single spin flips, $\Delta\_{ik\mu}(\mathbf{s}) = \Omega\_\mu(F\_{ik}\mathbf{s}) - \Omega\_\mu(\mathbf{s})$, will be small. We can then define $\mathbf{\Delta}\_{ik} = (\Delta\_{ik1}(\mathbf{s}), \ldots, \Delta\_{ikL}(\mathbf{s})) \in \mathbb{R}^L$, and expand (12.15) in a distributional sense, that is,

$$\begin{split} \tau \frac{\mathrm{d}}{\mathrm{d}t} \int \mathrm{d}\mathbf{\Omega}\, P\_t(\mathbf{\Omega}) G(\mathbf{\Omega}) &= \int \mathrm{d}\mathbf{\Omega}\, G(\mathbf{\Omega}) \sum\_{\ell \geq 1} \frac{(-1)^{\ell}}{\ell!} \sum\_{\mu\_1=1}^{L} \cdots \sum\_{\mu\_\ell=1}^{L} \frac{\partial^{\ell}}{\partial \Omega\_{\mu\_1} \cdots \partial \Omega\_{\mu\_\ell}} \\ &\quad\times \left\{ \sum\_{i=1}^{N} \sum\_{k=1}^{M} \left\langle w\_{ik}(\mathbf{s})\, \delta[\mathbf{\Omega} - \mathbf{\Omega}(\mathbf{s})]\, \Delta\_{ik\mu\_1}(\mathbf{s}) \cdots \Delta\_{ik\mu\_\ell}(\mathbf{s}) \right\rangle \right\}. \end{split} \tag{12.16}$$

We thereby arrive at the following Kramers-Moyal expansion

$$\tau \frac{\mathrm{d}}{\mathrm{d}t} P\_t(\mathbf{\Omega}) = \sum\_{\ell \geq 1} \frac{(-1)^{\ell}}{\ell!} \sum\_{\mu\_1=1}^{L} \cdots \sum\_{\mu\_\ell=1}^{L} \frac{\partial^{\ell}}{\partial \Omega\_{\mu\_1} \cdots \partial \Omega\_{\mu\_\ell}} \Big\{ P\_t(\mathbf{\Omega})\, F^{(\ell)}\_{\mu\_1 \ldots \mu\_\ell}[\mathbf{\Omega}; t] \Big\}, \tag{12.17}$$

with

$$F^{(\ell)}\_{\mu\_1 \ldots \mu\_\ell}[\mathbf{\Omega}; t] = \left\langle \sum\_{i=1}^{N} \sum\_{k=1}^{M} w\_{ik}(\mathbf{s})\, \Delta\_{ik\mu\_1}(\mathbf{s}) \cdots \Delta\_{ik\mu\_\ell}(\mathbf{s}) \right\rangle\_{\mathbf{\Omega}; t},\tag{12.18}$$

$$\langle f(\mathbf{s})\rangle\_{\mathbf{\Omega};t} = \frac{\sum\_{\mathbf{s}} p\_t(\mathbf{s})\, \delta[\mathbf{\Omega} - \mathbf{\Omega}(\mathbf{s})]\, f(\mathbf{s})}{\sum\_{\mathbf{s}} p\_t(\mathbf{s})\, \delta[\mathbf{\Omega} - \mathbf{\Omega}(\mathbf{s})]}.\tag{12.19}$$

Asymptotically, that is, for *N*, *M* → ∞, only the first term of Eq. (12.17) survives if

$$\lim\_{N,M\to\infty} \sum\_{\ell\geq 2} \frac{1}{\ell!} \sum\_{\mu\_1=1}^L \cdots \sum\_{\mu\_\ell=1}^L \sum\_{i=1}^N \sum\_{k=1}^M \left\langle \left| \Delta\_{ik\mu\_1}(\mathbf{s}) \cdots \Delta\_{ik\mu\_\ell}(\mathbf{s}) \right| \right\rangle\_{\mathbf{\Omega};t} = 0. \tag{12.20}$$

If all $\Delta\_{ik\mu}(\mathbf{s})$ scale similarly, that is, if there exists a $\tilde{\Delta}\_{N,M}$ such that $\Delta\_{ik\mu}(\mathbf{s}) = O(\tilde{\Delta}\_{N,M})$ for $N, M \to \infty$, then Eq. (12.17) retains only its first term if $\lim\_{N,M\to\infty} L \tilde{\Delta}\_{N,M} \sqrt{NM} = 0$. In that case it reduces to a Liouville equation describing deterministic evolution of $\mathbf{\Omega}$:

$$\tau \frac{\mathrm{d}}{\mathrm{d}t} \Omega\_{\mu} = \left\langle \sum\_{i=1}^{N} \sum\_{k=1}^{M} w\_{ik}(\mathbf{s})\, \Delta\_{ik\mu}(\mathbf{s}) \right\rangle\_{\mathbf{\Omega};t}. \tag{12.21}$$

If $\lim\_{N,M\to\infty} L \tilde{\Delta}\_{N,M} \sqrt{NM} > 0$, we can no longer ignore the fluctuations in our observables $\mathbf{\Omega}(\mathbf{s})$, which limits our choice of observables.

Equation (12.21) is closed if $\sum\_{i=1}^{N} \sum\_{k=1}^{M} w\_{ik}(\mathbf{s})\, \Delta\_{ik\mu}(\mathbf{s})$ is a function of $\mathbf{\Omega}(\mathbf{s})$ only (in which case the conditional average would simply drop out). If this is not the case, we close Eq. (12.21) using a maximum entropy argument: we approximate $p\_t(\mathbf{s})$ in Eq. (12.21) by a form that assumes all micro-states with the same value of $\mathbf{\Omega}(\mathbf{s})$ to be equally likely. Now Eq. (12.21) becomes

$$\tau \frac{\mathrm{d}}{\mathrm{d}t} \Omega\_{\mu} = \frac{\sum\_{\mathbf{s}} \delta[\mathbf{\Omega} - \mathbf{\Omega}(\mathbf{s})] \sum\_{i=1}^{N} \sum\_{k=1}^{M} w\_{ik}(\mathbf{s})\, \Delta\_{ik\mu}(\mathbf{s})}{\sum\_{\mathbf{s}} \delta[\mathbf{\Omega} - \mathbf{\Omega}(\mathbf{s})]}. \tag{12.22}$$

Within the replica formalism [16, 17], this closed equation can also be written as

$$\tau \frac{\mathrm{d}}{\mathrm{d}t} \Omega\_{\mu} = \lim\_{n \to 0} \sum\_{\mathbf{s}^1 \ldots \mathbf{s}^n} \Big( \prod\_{a=1}^{n} \delta[\mathbf{\Omega} - \mathbf{\Omega}(\mathbf{s}^a)] \Big) \sum\_{i=1}^{N} \sum\_{k=1}^{M} w\_{ik}(\mathbf{s}^1)\, \Delta\_{ik\mu}(\mathbf{s}^1). \tag{12.23}$$

The accuracy of Eq. (12.22) depends on our choice of the observables $\Omega\_\mu(\mathbf{s})$. We want them to be $O(1)$, obeying $\lim\_{N,M\to\infty} L \tilde{\Delta}\_{N,M} \sqrt{NM} = 0$, and such that the probability equipartitioning assumption is as harmless as possible. Including $H(\mathbf{s})/N$ and $N^{-1} \log p\_0(\mathbf{s})$ in our set of observables ensures that equipartitioning holds for $t \to 0$ and $t \to \infty$. If we have disorder in the couplings $\{J\_{ij}\}$, and for $N \to \infty$ our observables are self-averaging with respect to its realization, we can average over the disorder.<sup>2</sup> This gives

$$\tau \frac{\mathrm{d}}{\mathrm{d}t} \Omega\_{\mu} = \lim\_{n \to 0} \sum\_{\mathbf{s}^1 \ldots \mathbf{s}^n} \overline{ \Big( \prod\_{a=1}^{n} \delta[\mathbf{\Omega} - \mathbf{\Omega}(\mathbf{s}^a)] \Big) \sum\_{i=1}^{N} \sum\_{k=1}^{M} w\_{ik}(\mathbf{s}^1)\, \Delta\_{ik\mu}(\mathbf{s}^1) }. \tag{12.24}$$

For the system in Eq. (12.8) and the typical initial conditions in quantum annealing, there are two natural and simple routes for choosing the observables in the DRT method,<sup>3</sup> both involving the normalized distinct energy contributions in Eq. (12.7):

• *Trotter slice-dependent observables* We choose, for *k* = 1 ... *M* (mod *M*),

$$E\_k(\mathbf{s}) = -\frac{1}{N} \sum\_{i<j} J\_{ij} s\_{ik} s\_{jk}, \qquad m\_k(\mathbf{s}) = \frac{1}{N} \sum\_{i} s\_{ik}, \qquad \mathcal{E}\_k(\mathbf{s}) = \frac{1}{N} \sum\_{i} s\_{ik} s\_{i,k+1}. \tag{12.25}$$

Now $L = 3M$, and the susceptibilities of the observables to single spin flips are, using $\sum\_{j} J\_{ij} s\_{jk} = O(1)$ for all $k$ (as required for an extensive Hamiltonian):

$$
\Delta\_{ik} E\_q(\mathbf{s}) = 2N^{-1} \delta\_{qk} s\_{ik} \sum\_{j \neq i} J\_{ij} s\_{jk} = \mathcal{O}(N^{-1}), \tag{12.26}
$$

$$
\Delta\_{ik} m\_q(\mathbf{s}) = -2N^{-1} \delta\_{qk} s\_{ik} = \mathcal{O}(N^{-1}), \tag{12.27}
$$

$$\Delta\_{ik}\mathcal{E}\_q(\mathbf{s}) = -2N^{-1}s\_{ik}(\delta\_{qk}s\_{i,k+1} + \delta\_{k,q+1}s\_{i,k-1}) = \mathcal{O}(N^{-1}).\tag{12.28}$$

<sup>2</sup> Without disorder one does not need the replica formalism yet and can work directly with (12.22).

<sup>3</sup> One can always add further observables, or split the present ones into distinct contributions. This generally improves the accuracy of the theory, provided $\lim\_{N,M\to\infty} L \tilde{\Delta}\_{N,M} \sqrt{NM} = 0$ still holds.

Hence $\tilde{\Delta}\_{N,M} = N^{-1}$, so deterministic evolution requires that $M \ll N^{\frac{1}{3}}$ as $M, N \to \infty$. Hence, on choosing Eq. (12.25) we can no longer take $M \to \infty$ before $N \to \infty$, which would have been the correct order, and must rely on these limits commuting.<sup>4</sup>

• *Trotter slice-independent observables*

These are simply averages over all Trotter slices of the previous set in Eq. (12.25), that is,

$$E(\mathbf{s}) = \frac{1}{M} \sum\_{k=1}^{M} E\_k(\mathbf{s}), \quad m(\mathbf{s}) = \frac{1}{M} \sum\_{k=1}^{M} m\_k(\mathbf{s}), \quad \mathcal{E}(\mathbf{s}) = \frac{1}{M} \sum\_{k=1}^{M} \mathcal{E}\_k(\mathbf{s}). \tag{12.29}$$

Hence *L* = 3, and the spin-flip susceptibilities come out as

$$
\Delta\_{ik}E(\mathbf{s}) = 2(NM)^{-1}s\_{ik}\sum\_{j\neq i}J\_{ij}s\_{jk} = \mathcal{O}((NM)^{-1}),\tag{12.30}
$$

$$
\Delta\_{ik} m(\mathbf{s}) = -2(NM)^{-1} s\_{ik} = \mathcal{O}((NM)^{-1}), \tag{12.31}
$$

$$
\Delta\_{ik}\mathcal{E}(\mathbf{s}) = -2(NM)^{-1}s\_{ik}(s\_{i,k+1} + s\_{i,k-1}) = \mathcal{O}((NM)^{-1}).\tag{12.32}
$$

Now $\tilde{\Delta}\_{N,M} = (NM)^{-1}$. Deterministic evolution requires $\lim\_{N,M\to\infty}(NM)^{-\frac{1}{2}} = 0$, which always holds. We can therefore take our two limits in any desired order without having to worry about fluctuations in our macroscopic observables.
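The two scaling regimes can be made concrete in a few lines. The following sketch (our own illustration, not part of the chapter; the function names are ours) evaluates the fluctuation measure $L\tilde{\Delta}\_{N,M}\sqrt{NM}$ for both routes:

```python
import numpy as np

# Fluctuation measure L * Delta * sqrt(N*M), which must vanish as N, M -> infinity
# for deterministic macroscopic evolution.
def criterion_slice_dependent(N, M):
    # L = 3M observables, single-flip susceptibility scale Delta = 1/N
    return 3 * M * (1.0 / N) * np.sqrt(N * M)

def criterion_slice_independent(N, M):
    # L = 3 observables, susceptibility scale Delta = 1/(N*M)
    return 3 * (1.0 / (N * M)) * np.sqrt(N * M)
```

For the slice-dependent route the measure grows like $M^{3/2}N^{-1/2}$, so it only vanishes when $M \ll N^{1/3}$ (e.g. $M = N^{1/4}$), whereas the slice-independent measure equals $3/\sqrt{NM}$ and vanishes for any joint growth of *N* and *M*.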

#### **12.4 Simple Examples**

We illustrate the previous approach by application to simple models. We investigate the commutation of the limits *N* → ∞ and *M* → ∞, and the link between stationary states of the dynamical equations and the equilibrium theory. We start with the simplest case of non-interacting spins in a uniform *x* field, followed by non-interacting spins in uniform *x* and *z* fields and ferromagnetically interacting quantum systems.

#### *12.4.1 Non-interacting Quantum Spins in a Uniform x Field*

This is the simplest case of Eq. (12.8), where $h = J\_{ij} = 0$ for all (*i*, *j*). Although this specific model is physically trivial, it is still instructive since it already reveals many general features of the more general dynamical theory. The statics analysis gives

$$\mathcal{Z}\_M(\beta) = \left\{ \mathbf{e}^{\frac{1}{2}M \log\left\{\frac{1}{2}\sinh(2\beta\Gamma/M)\right\}} \text{Tr}(\mathbf{K}^M) \right\}^N,\tag{12.33}$$

with a 2 × 2 transfer matrix of the one-dimensional Ising chain:

<sup>4</sup> The assumption that the order of the limits *<sup>N</sup>* →∞ and *<sup>M</sup>* →∞ can be changed is also made in equilibrium studies such as [15], where steepest descent integration is used as *N* →∞ for fixed *M*.

$$K = \begin{pmatrix} \mathbf{e}^B & \mathbf{e}^{-B} \\ \mathbf{e}^{-B} & \mathbf{e}^B \end{pmatrix}, \quad \text{eigenvalues: } \lambda\_+ = 2\cosh(B), \ \lambda\_- = 2\sinh(B). \tag{12.34}$$

After some rewriting and insertion of the definition of *B* we obtain:

$$\mathcal{Z}\_M(\beta) = \left\{ \mathbf{e}^{\frac{1}{2}M\log\left[\frac{1}{2}\sinh(2\beta\Gamma/M)\right]} 2^M \left[\cosh^M(B) + \sinh^M(B)\right] \right\}^N$$

$$= \left[2\cosh(\beta\Gamma)\right]^N. \tag{12.35}$$

This gives the correct free energy density $f\_{N,M} = -\frac{1}{\beta}\log[2\cosh(\beta\Gamma)]$.
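Equation (12.35) is easy to verify numerically. The sketch below (our own illustration; `Z_single_spin` and the parameter values are not from the chapter) evaluates the single-spin Suzuki-Trotter expression of Eqs. (12.33, 12.34) and compares it with $2\cosh(\beta\Gamma)$; for this non-interacting model the identity holds at every finite *M*:

```python
import numpy as np

def Z_single_spin(beta_Gamma, M):
    # One quantum spin in a transverse field, via the M-slice Trotter
    # representation: prefactor * Tr(K^M), Eqs. (12.33)-(12.34).
    a = beta_Gamma / M
    B = -0.5 * np.log(np.tanh(a))            # Trotter coupling, e^{-2B} = tanh(a)
    K = np.array([[np.exp(B), np.exp(-B)],
                  [np.exp(-B), np.exp(B)]])
    prefactor = (0.5 * np.sinh(2 * a)) ** (M / 2)
    return prefactor * np.trace(np.linalg.matrix_power(K, M))
```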

Next, we turn to the macroscopic dynamical equations in Eq. (12.21). Since $J\_{ij} = 0$, the order parameters $E\_k(\mathbf{s})$ and $E(\mathbf{s})$ are always zero. The two dynamical routes give:

• *Trotter slice-dependent observables*

The observables are $\{m\_k(\mathbf{s}), \mathcal{E}\_k(\mathbf{s})\}$, and we are forced to take *N* → ∞ before *M* → ∞. Using identities such as $\tanh[B(s+s')] = \frac{1}{2}(s+s')\tanh(2B)$ we obtain:

$$
\tau \frac{\mathrm{d}}{\mathrm{d}t} m\_k = -m\_k + \frac{1}{2} (m\_{k+1} + m\_{k-1}) \tanh(2B), \tag{12.36}
$$

$$\tau \frac{\mathrm{d}}{\mathrm{d}t} \mathcal{E}\_k = \tanh(2B)\left[1 + \frac{1}{2}(C\_k + C\_{k+1})\right] - 2\mathcal{E}\_k,\tag{12.37}$$

in which, using the equivalence of the *N* sites *i*, we have the 2-slice correlators:

$$C\_k = \frac{\sum\_s \left[ \prod\_q \delta[m\_q - m\_q(\mathbf{s})] \delta[\mathcal{E}\_q - \mathcal{E}\_q(\mathbf{s})] \right] s\_{1,k-1} s\_{1,k+1}}{\sum\_s \left[ \prod\_q \delta[m\_q - m\_q(\mathbf{s})] \delta[\mathcal{E}\_q - \mathcal{E}\_q(\mathbf{s})] \right]}. \tag{12.38}$$

One can compute these for *N* → ∞ with fixed *M* via steepest descent integration:

$$C\_k = \frac{\sum\_{s\_1 \dots s\_M} \mathbf{e}^{\sum\_q (x\_q s\_q + y\_q s\_q s\_{q+1})}\, s\_{k-1} s\_{k+1}}{\sum\_{s\_1 \dots s\_M} \mathbf{e}^{\sum\_q (x\_q s\_q + y\_q s\_q s\_{q+1})}},\tag{12.39}$$

in which $\mathbf{x} = (x\_1, \ldots, x\_M)$ and $\mathbf{y} = (y\_1, \ldots, y\_M)$ are to be solved from

$$m\_k = \frac{\partial \log Z}{\partial x\_k}, \quad \mathcal{E}\_k = \frac{\partial \log Z}{\partial y\_k}, \quad Z(\mathbf{x}, \mathbf{y}) = \sum\_{s\_1 \dots s\_M} \mathbf{e}^{\sum\_q (x\_q s\_q + y\_q s\_q s\_{q+1})}. \tag{12.40}$$

• *Trotter slice-independent observables* In this case, we only have $m(\mathbf{s})$ and $\mathcal{E}(\mathbf{s})$, and working out Eq. (12.21) gives

$$
\tau \frac{\mathrm{d}}{\mathrm{d}t} m = -m[1 - \tanh(2B)], \qquad \tau \frac{\mathrm{d}}{\mathrm{d}t} \mathcal{E} = (1 + C)\tanh(2B) - 2\mathcal{E}, \tag{12.41}
$$

with

$$C = \frac{\sum\_{\mathbf{s}} \delta[m - m(\mathbf{s})]\, \delta[\mathcal{E} - \mathcal{E}(\mathbf{s})]\, s\_{1,1} s\_{1,3}}{\sum\_{\mathbf{s}} \delta[m - m(\mathbf{s})]\, \delta[\mathcal{E} - \mathcal{E}(\mathbf{s})]}. \tag{12.42}$$

Calculating the 2-slice correlator *C* using steepest descent results in

$$C = \frac{\sum\_{s\_1 \dots s\_M} \mathbf{e}^{\frac{1}{M} \sum\_q (x s\_q + y s\_q s\_{q+1})}\, s\_1 s\_3}{\sum\_{s\_1 \dots s\_M} \mathbf{e}^{\frac{1}{M} \sum\_q (x s\_q + y s\_q s\_{q+1})}},\tag{12.43}$$

$$m = \frac{\partial \log Z}{\partial x}, \qquad \mathcal{E} = \frac{\partial \log Z}{\partial y}, \qquad Z(x, y) = \sum\_{s\_1 \dots s\_M} \mathbf{e}^{\frac{1}{M} \sum\_q (x s\_q + y s\_q s\_{q+1})}. \tag{12.44}$$

If at time zero the $m\_k$ and $\mathcal{E}\_k$ in Eqs. (12.36, 12.37) are independent of *k*, this will remain true at all times<sup>5</sup> and the dynamics in Eqs. (12.36, 12.37) simplify to Eq. (12.41). Computing *C* involves solving a one-dimensional Ising model with a constant external field, whereas computing $C\_k$ requires solving heterogeneous spin chain models in equilibrium for arbitrary coupling constants and fields. This is a second reason, in addition to the issue with the order of limits, why it is preferable to work with Trotter slice-independent observables.
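The reduction of *C* to a homogeneous Ising ring can be cross-checked directly. In this sketch (our own; the values of the conjugate fields *x*, *y* and the chain length are arbitrary) the two-slice correlator is computed both by brute-force enumeration and with the transfer matrix used later in the chapter:

```python
import itertools
import numpy as np

def corr_bruteforce(x, y, M):
    # <s_1 s_3> in a periodic chain with weight exp(sum_k x*s_k + y*s_k*s_{k+1})
    num = den = 0.0
    for s in itertools.product([-1, 1], repeat=M):
        w = np.exp(sum(x * s[k] + y * s[k] * s[(k + 1) % M] for k in range(M)))
        num += w * s[0] * s[2]
        den += w
    return num / den

def corr_transfer(x, y, M):
    # same correlator via the transfer matrix K_{ss'} = exp(x(s+s')/2 + y*s*s'):
    # <s_1 s_3> = Tr[Sz K^2 Sz K^{M-2}] / Tr[K^M]
    s = np.array([1.0, -1.0])
    K = np.exp(0.5 * x * (s[:, None] + s[None, :]) + y * s[:, None] * s[None, :])
    Sz = np.diag(s)
    num = np.trace(Sz @ np.linalg.matrix_power(K, 2) @ Sz @ np.linalg.matrix_power(K, M - 2))
    return num / np.trace(np.linalg.matrix_power(K, M))
```

The transfer-matrix route needs only powers of a 2 × 2 matrix, whereas the heterogeneous chains behind $C\_k$ would require a different transfer matrix per bond.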

For non-interacting spins with $h \neq 0$ the analysis is similar. Here $f = \lim\_{M\to\infty} f\_{N,M} = -\beta^{-1}\log[2\cosh(\beta\sqrt{\Gamma^2 + h^2})]$, with equilibrium magnetisation

$$
m = -\partial f / \partial h = \tanh(\beta \sqrt{h^2 + \Gamma^2}) \frac{h}{\sqrt{h^2 + \Gamma^2}},
\tag{12.45}
$$

and the Trotter slice-independent observables are predicted to obey

$$\tau \frac{\mathrm{d}}{\mathrm{d}t} m = \frac{1}{2} (1 - C) \tanh(\beta h/M) + \frac{1}{2} \mathcal{Q}\_{+} (1 + C) - m (1 - \mathcal{Q}\_{-}), \quad (12.46)$$

$$
\tau \frac{\mathrm{d}}{\mathrm{d}t} \mathcal{E} = (1 + C) \mathcal{Q}\_{-} + 2 \mathcal{Q}\_{+} m - 2 \mathcal{E}, \tag{12.47}
$$

with $\mathcal{Q}\_{\pm} = \frac{1}{2}[\tanh(\beta h/M + 2B) \pm \tanh(\beta h/M - 2B)]$. Since $\lim\_{h\to 0} \mathcal{Q}\_+ = 0$ and $\lim\_{h\to 0} \mathcal{Q}\_- = \tanh(2B)$, Eqs. (12.46, 12.47) indeed revert back to Eq. (12.41) for *h* → 0. We inspect the fixed-points of Eqs. (12.46, 12.47) after having also added spin interactions in the next section. Clearly, since $\lim\_{M\to\infty} \mathcal{Q}\_+ = \lim\_{M\to\infty}(1 - \mathcal{Q}\_-) = 0$, the relaxation time of the system diverges for *M* → ∞, with closer inspection revealing that $\mathrm{d}m/\mathrm{d}t = \mathcal{O}(M^{-2})$. This makes physical sense: for large *M*, hence large *B*, the Trotter slices increasingly prefer identical states, so state changes (in a single slice) become rare, as they require the mounting energetic cost of breaking the Trotter symmetry.
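The slow-down can be seen directly in the magnetization equation of Eq. (12.41): its decay rate is $1-\tanh(2B) = 2\tanh^2(\beta\Gamma/M)/[1+\tanh^2(\beta\Gamma/M)] \approx 2(\beta\Gamma/M)^2$. A numerical illustration (our own; the parameter values are arbitrary):

```python
import numpy as np

def relaxation_rate(beta_Gamma, M):
    # decay rate 1 - tanh(2B) in tau dm/dt = -m[1 - tanh(2B)], Eq. (12.41),
    # with the Trotter coupling defined by e^{-2B} = tanh(beta*Gamma/M)
    t = np.tanh(beta_Gamma / M)
    return 2 * t**2 / (1 + t**2)
```

Doubling *M* divides the rate by four, so the relaxation time diverges as $M^2$.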

<sup>5</sup> In [7, 8, 10] this is called the static approximation.

## *12.4.2 Ferromagnetic z-interactions and Uniform x and z Fields*

We now choose $h \neq 0$, $\Gamma \neq 0$, and $J\_{ij} = J\_0/N$ for all $i \neq j$, so that the quantum Hamiltonian is $H = -(J\_0/N)\sum\_{i<j} \sigma\_i^z \sigma\_j^z - \sum\_i (h\sigma\_i^z + \Gamma\sigma\_i^x)$. This is known as the Husimi-Temperley-Curie-Weiss model in a transverse field [11]. In the statics, after some simple manipulations and using the short-hand $\mathrm{D}z = (2\pi)^{-\frac{1}{2}}\mathrm{e}^{-\frac{1}{2}z^2}\mathrm{d}z$, we find:

$$\begin{split} Z\_{M}(\boldsymbol{\beta}) &= \mathbf{e}^{\frac{1}{2}NM} \log[\frac{1}{2}\sinh(2\boldsymbol{\beta}\boldsymbol{\Gamma}/M)] - \frac{1}{2}\boldsymbol{\beta}J\_{0} \\ &\times \int \left[\prod\_{k=1}^{M} \mathbf{D}z\_{k}\right] \left\{ \text{Tr}\prod\_{k=1}^{M} \mathbf{K}\left(z\_{k}\sqrt{\frac{M}{\boldsymbol{\beta}J\_{0}N}}\right) \right\}^{N} \end{split} \tag{12.48}$$

with the non-symmetric transfer matrix

$$\mathbf{K}(x) = \begin{pmatrix} \mathbf{e}^{B + \beta h/M + \beta J\_0 x/M} & \mathbf{e}^{-B + \beta J\_0 x/M} \\ \mathbf{e}^{-B - \beta J\_0 x/M} & \mathbf{e}^{B - \beta h/M - \beta J\_0 x/M} \end{pmatrix} = \mathbf{e}^{x(\beta J\_0/M)\sigma^{z}} \mathbf{K}(0). \tag{12.49}$$

We first turn to the statics of the model. It is not immediately clear whether or not the limits *N*, *M* →∞ in Eq. (12.48) commute. Upon taking the limit *N* → ∞ first, one obtains via steepest descent integration:

$$\begin{split} \lim\_{N \to \infty} f\_{N,M} &= -\frac{M}{2\beta} \log \left[ \frac{1}{2} \sinh \left( \frac{2\beta \Gamma}{M} \right) \right] \\ &- \frac{1}{\beta} \text{extr}\_{\mathbf{x}} \left\{ \log \text{Tr} \prod\_{k=1}^{M} \mathbf{K}(\mathbf{x}\_{k}) - \frac{\beta J\_{0}}{2M} \mathbf{x}^{2} \right\}. \end{split} \tag{12.50}$$

We find the derivatives of the quantity $\Psi(\mathbf{x})$ to be extremized, with $\overline{\delta}\_{ab} = 1 - \delta\_{ab}$:

$$\frac{\partial\Psi}{\partial\mathbf{x}\_{q}} = \frac{\beta J\_{0}}{M} \left\{ \frac{\text{Tr}\,\prod\_{k=1}^{M} (\overline{\delta}\_{kq}\mathbf{I} + \delta\_{kq}\sigma^{z})\,\mathbf{K}(\mathbf{x}\_{k})}{\text{Tr}\,\prod\_{k=1}^{M}\,\mathbf{K}(\mathbf{x}\_{k})} - \mathbf{x}\_{q} \right\}, \tag{12.51}$$
 
$$\frac{\partial^{2}\Psi}{\partial\mathbf{x}\_{q}\,\partial\mathbf{x}\_{r}} = \left(\frac{\beta J\_{0}}{M}\right)^{2} \left\{ \frac{\text{Tr}\,\prod\_{k=1}^{M} (\overline{\delta}\_{kq}\mathbf{I} + \delta\_{kq}\sigma^{z})(\overline{\delta}\_{kr}\mathbf{I} + \delta\_{kr}\sigma^{z})\,\mathbf{K}(\mathbf{x}\_{k})}{\text{Tr}\,\prod\_{k=1}^{M}\,\mathbf{K}(\mathbf{x}\_{k})} \right.$$
 
$$-\frac{\text{Tr}\,\prod\_{k=1}^{M}(\overline{\delta}\_{kq}\mathbf{I} + \delta\_{kq}\sigma^{z})\,\mathbf{K}(\mathbf{x}\_{k})}{\text{Tr}\,\prod\_{k=1}^{M}\,\mathbf{K}(\mathbf{x}\_{k})} \frac{\text{Tr}\,\prod\_{k=1}^{M}(\overline{\delta}\_{kr}\mathbf{I} + \delta\_{kr}\sigma^{z})\,\mathbf{K}(\mathbf{x}\_{k})}{\text{Tr}\,\prod\_{k=1}^{M}\,\mathbf{K}(\mathbf{x}\_{k})} \right\} - \frac{\beta J\_{0}}{M}\delta\_{qr}. \tag{12.52}$$

In Trotter-symmetric solutions, $x\_k = m$ for all *k*, these derivatives simplify to

$$\frac{\partial \Psi}{\partial x\_q} = \frac{\beta J\_0}{M} \left\{ \frac{\text{Tr}[\sigma^z \mathbf{K}^M(m)]}{\text{Tr}[\mathbf{K}^M(m)]} - m \right\}, \tag{12.53}$$

$$\frac{\partial^2 \Psi}{\partial x\_q \partial x\_r} = \left(\frac{\beta J\_0}{M}\right)^2 \left\{ \frac{\text{Tr}[\sigma^z \mathbf{K}^{|q-r|}(m) \sigma^z \mathbf{K}^{M-|q-r|}(m)]}{\text{Tr}[\mathbf{K}^M(m)]} - \left(\frac{\text{Tr}[\sigma^z \mathbf{K}^M(m)]}{\text{Tr}[\mathbf{K}^M(m)]}\right)^2 \right\}$$

$$- (\beta J\_0/M) \delta\_{qr}. \tag{12.54}$$

and *m* is the solution of

$$m = \frac{\text{Tr}[\sigma^z \mathbf{K}^M(m)]}{\text{Tr}[\mathbf{K}^M(m)]}.\tag{12.55}$$

Trotter symmetry-breaking bifurcations occur when $\mathrm{Det}[(\beta J\_0/M)A - \mathbf{I}] = 0$, where

$$A\_{qr} = \frac{\text{Tr}[\sigma^z K^{|q-r|}(m)\sigma^z K^{M-|q-r|}(m)]}{\text{Tr}[K^M(m)]} - m^2. \tag{12.56}$$

We introduce the symmetric matrix $Q(m) = \mathbf{e}^{-\frac{1}{2}m(\beta J\_0/M)\sigma^z}\, \mathbf{K}(m)\, \mathbf{e}^{\frac{1}{2}m(\beta J\_0/M)\sigma^z}$, with eigenvalues $\lambda\_\pm(m)$ and orthogonal eigenbasis $|\pm\rangle$. Now for any $\ell \in \mathbb{N}$ we have

$$\mathbf{K}^{\ell}(m) = \mathbf{e}^{\frac{1}{2}m(\beta J\_0/M)\sigma^{z}} \left(\lambda^{\ell}\_+(m)|+\rangle\langle+| + \lambda^{\ell}\_-(m)|-\rangle\langle-|\right)\mathbf{e}^{-\frac{1}{2}m(\beta J\_0/M)\sigma^{z}},\tag{12.57}$$

and hence, with the short-hand $\sigma^z\_{ab} = \langle a|\sigma^z|b\rangle$ and $\phi = \lambda\_-(m)/\lambda\_+(m) \in (-1, 1)$:

$$\frac{\text{Tr}[\sigma^z \mathbf{K}^M(m)]}{\text{Tr}[\mathbf{K}^M(m)]} = \frac{\sigma^z\_{++} + \sigma^z\_{--} \phi^M}{1 + \phi^M},\tag{12.58}$$

$$A\_{qr} = \frac{\sigma^{z2}\_{++} + [\phi^{|q-r|} + \phi^{M-|q-r|}]|\sigma^z\_{+-}|^2 + \phi^M \sigma^{z2}\_{--}}{1 + \phi^M} - m^2.\tag{12.59}$$

Since *A* has a Toeplitz form, we know its eigenvalues:

$$k = 1 \dots M \text{:} \quad a\_k = \frac{|\sigma\_{+-}^z|^2}{1 + \phi^M} \frac{(1 - \phi^M)(1 - \phi^2)}{1 + \phi^2 - 2\phi \cos(2\pi (k - 1)/M)}. \tag{12.60}$$
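Equation (12.60) follows because *A* is a symmetric circulant matrix, whose eigenvectors are Fourier modes. A quick numerical check (our own sketch; the values of φ, *M* and $\sigma^z\_{++}$ are arbitrary, chosen with $\sigma^{z2}\_{++} + |\sigma^z\_{+-}|^2 = 1$ and $\phi^M$ negligibly small):

```python
import numpy as np

# Build A from Eq. (12.59) and compare its spectrum with Eq. (12.60).
M, phi, s_pp = 20, 0.3, 0.6
s_pm2 = 1.0 - s_pp**2                         # |sigma^z_{+-}|^2
m = s_pp * (1 - phi**M) / (1 + phi**M)        # Eq. (12.58), using sigma^z_{--} = -sigma^z_{++}

d = np.abs(np.arange(M)[:, None] - np.arange(M)[None, :])
A = (s_pp**2 + (phi**d + phi**(M - d)) * s_pm2 + phi**M * s_pp**2) / (1 + phi**M) - m**2

k = np.arange(1, M + 1)
a_pred = (s_pm2 / (1 + phi**M)) * (1 - phi**M) * (1 - phi**2) \
         / (1 + phi**2 - 2 * phi * np.cos(2 * np.pi * (k - 1) / M))
```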

Finally we need to diagonalize *Q*(*m*) for large *M*. This gives:

$$\mathcal{Q}(m) = \begin{pmatrix} \mathbf{e}^{B + \beta(h + J\_0 m)/M} & \mathbf{e}^{-B} \\ \mathbf{e}^{-B} & \mathbf{e}^{B - \beta(h + J\_0 m)/M} \end{pmatrix} \tag{12.61}$$

$$\lambda\_{\pm}(m) = \mathbf{e}^{B \pm \frac{\beta}{M}\sqrt{(h + J\_0 m)^2 + \Gamma^2}} + \mathcal{O}(M^{-2}),\tag{12.62}$$

$$\lim\_{M \to \infty} |\pm\rangle = \frac{1}{C\_{\pm}(m)} \left( \Gamma, -(h + J\_0 m) \pm \sqrt{(h + J\_0 m)^2 + \Gamma^2} \right), \tag{12.63}$$

$$C\_{\pm}(m) = \sqrt{2} \left[ (h + J\_0 m)^2 + \Gamma^2 \mp (h + J\_0 m) \sqrt{(h + J\_0 m)^2 + \Gamma^2} \right]^{\frac{1}{2}}.\tag{12.64}$$

It follows that

$$\phi = \mathbf{e}^{-\frac{2\beta}{M}\sqrt{(h+J\_0 m)^2 + \Gamma^2}} + \mathcal{O}(M^{-2}).\tag{12.65}$$

Hence $\lim\_{M\to\infty} \phi = 1$, $\lim\_{M\to\infty} \phi^M = \exp[-2\beta\sqrt{(h+J\_0 m)^2+\Gamma^2}\,]$, $\lim\_{M\to\infty} \sigma^z\_{++} = -\lim\_{M\to\infty} \sigma^z\_{--} = (h+J\_0 m)/\sqrt{(h+J\_0 m)^2+\Gamma^2}$, and $\lim\_{M\to\infty} \sigma^z\_{+-} = \Gamma/\sqrt{(h+J\_0 m)^2+\Gamma^2}$. The equation for the magnetization *m* and the eigenvalues of *A* thereby become

$$m = \frac{(h + J\_0 m) \tanh[\beta \sqrt{(h + J\_0 m)^2 + \Gamma^2}]}{\sqrt{(h + J\_0 m)^2 + \Gamma^2}},\tag{12.66}$$

$$a\_k = \frac{\Gamma^2 \tanh[\beta \sqrt{(h + J\_0 m)^2 + \Gamma^2}]}{\left(h + J\_0 m\right)^2 + \Gamma^2} \Big[ 1 + 2 \lim\_{M \to \infty} \frac{1 - \cos(2\pi (k - 1)/M)}{1 - \phi^2} \Big]^{-1}. \tag{12.67}$$

Since all $a\_k$ are bounded for large *M*, the condition $\beta J\_0 a\_k/M = 1$ for bifurcations away from the Trotter-symmetric state is never met, indicating that the state described by Eq. (12.66) is the physical one. The free energy density $f = \lim\_{M\to\infty} \lim\_{N\to\infty} f\_{N,M}$ is

$$f = \frac{1}{2}J\_0m^2 - \lim\_{M \to \infty} \left\{ \frac{M}{2\beta} \log \left[ \frac{1}{2} \sinh \left( \frac{2\beta \Gamma}{M} \right) \right] + \frac{1}{\beta} \log \left( \lambda\_+^M(m) + \lambda\_-^M(m) \right) \right\}$$

$$= \frac{1}{2}J\_0m^2 - \frac{1}{\beta} \log \left[ 2 \cosh \left( \beta \sqrt{(h + J\_0m)^2 + \Gamma^2} \right) \right]. \tag{12.68}$$

Extremizing the expression in Eq. (12.68) over *m* reproduces Eq. (12.66).
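This consistency is easily confirmed numerically. A minimal sketch (our own; the parameter values are arbitrary) solves Eq. (12.66) by fixed-point iteration and checks that the result is a stationary point of the free energy of Eq. (12.68):

```python
import numpy as np

def solve_m(beta, Gamma, J0, h, m0=0.5, iters=500):
    # fixed-point iteration of the self-consistency law, Eq. (12.66)
    m = m0
    for _ in range(iters):
        r = np.sqrt((h + J0 * m)**2 + Gamma**2)
        m = (h + J0 * m) * np.tanh(beta * r) / r
    return m

def free_energy(m, beta, Gamma, J0, h):
    # free energy density of Eq. (12.68)
    r = np.sqrt((h + J0 * m)**2 + Gamma**2)
    return 0.5 * J0 * m**2 - np.log(2 * np.cosh(beta * r)) / beta
```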

We return to Eq. (12.48), and now seek to take the Trotter limit *M* →∞ first. The complexities are all in the evaluation for large *M* of the quantity

$$Z\_M = \int \left[\prod\_{k=1}^M \mathrm{D}z\_k\right] \left\{ \text{Tr} \left[\prod\_{k=1}^M \text{e}^{z\_k \sqrt{\frac{\beta J\_0}{NM}} \sigma^z} \left(\begin{array}{cc} \text{e}^{B+\beta h/M} & \text{e}^{-B} \\ \text{e}^{-B} & \text{e}^{B-\beta h/M} \end{array}\right) \right] \right\}^N. \tag{12.69}$$

This can be analyzed using random field Ising chain techniques [18]. Alternatively, we can use the fact that in summations of the form $\sum\_k z\_k$, each $z\_k$ effectively scales as $\mathcal{O}(M^{-\frac{1}{2}})$, enabling us to use $\mathrm{e}^{-B} = \sqrt{\tanh(\beta\Gamma/M)}$ and a modified version of the Trotter identity, viz. $\prod\_{k\leq M} \mathrm{e}^{u\_k/M}\mathrm{e}^{v/M} = \mathrm{e}^{M^{-1}\sum\_{k\leq M} u\_k + v}$, to derive

308 A. C. C. Coolen et al.

$$\begin{split} Z\_{M} &= \mathbf{e}^{NMB} \int \left[ \prod\_{k=1}^{M} \mathrm{D}z\_{k} \right] \left\{ \text{Tr} \left[ \prod\_{k=1}^{M} \mathbf{e}^{z\_{k} \sqrt{\frac{\beta J\_0}{NM}} \sigma^{z}} \left( \mathbf{I} + \frac{\beta}{M} (h\sigma^{z} + \Gamma\sigma^{x}) + \mathcal{O}\left(\frac{1}{M^{2}}\right) \right) \right] \right\}^{N} \\ &= \sqrt{\beta J\_{0} N}\, \mathbf{e}^{NMB} \int \frac{\mathrm{d}m}{\sqrt{2\pi}}\, \mathbf{e}^{-\frac{1}{2}\beta J\_{0} N m^{2}} \left\{ \text{Tr} \, \mathbf{e}^{\beta (h + J\_{0} m) \sigma^{z} + \beta \Gamma \sigma^{x} + \mathcal{O}(M^{-1})} \right\}^{N} . \end{split} \tag{12.70}$$

The free energy density $f = \lim\_{N\to\infty} \lim\_{M\to\infty} f\_{N,M}$ then becomes

$$f = -\frac{1}{\beta} \text{extr}\_m \left\{ \log \left( \mathbf{e}^{\beta \mu\_+ (m)} + \mathbf{e}^{\beta \mu\_- (m)} \right) - \frac{1}{2} \beta J\_0 m^2 \right\},\tag{12.71}$$

in which $\mu\_\pm(m)$ are the eigenvalues of the matrix $L(m) = (h+J\_0 m)\sigma^z + \Gamma\sigma^x$:

$$L(m) = \begin{pmatrix} h + J\_0 m & \Gamma \\ \Gamma & -(h + J\_0 m) \end{pmatrix}, \quad \mu\_{\pm}(m) = \pm \sqrt{(h + J\_0 m)^2 + \Gamma^2}. \tag{12.72}$$

We now recover Eqs. (12.66, 12.68), so the limits *N* → ∞ and *M* → ∞ can be interchanged:

$$f = \text{extr}\_m \left\{ \frac{1}{2} J\_0 m^2 - \frac{1}{\beta} \log \left[ 2 \cosh \left( \beta \sqrt{(h + J\_0 m)^2 + \Gamma^2} \right) \right] \right\}. \quad (12.73)$$
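The interchange of the limits rests on the modified Trotter identity used to derive Eq. (12.70). The sketch below (entirely our own illustration; `trotter_error` and all parameter values are assumptions, and the accumulated *z*-field is replaced by a single constant *w*) measures how fast the product of *M* per-slice factors approaches the corresponding single exponential:

```python
import numpy as np

def expm_sym(A):
    # matrix exponential of a real symmetric matrix via its eigendecomposition
    w, V = np.linalg.eigh(A)
    return (V * np.exp(w)) @ V.T

sz = np.array([[1.0, 0.0], [0.0, -1.0]])   # sigma^z
sx = np.array([[0.0, 1.0], [1.0, 0.0]])    # sigma^x

def trotter_error(M, w=0.5, beta=0.8, h=0.3, Gamma=0.7):
    # product of M identical per-slice factors, as inside the trace of
    # Eq. (12.70), versus the single exponential it approximates
    slice_factor = expm_sym((w / M) * sz) @ (np.eye(2) + (beta / M) * (h * sz + Gamma * sx))
    P = np.linalg.matrix_power(slice_factor, M)
    Q = expm_sym(w * sz + beta * (h * sz + Gamma * sx))
    return np.max(np.abs(P - Q))
```

The deviation decays as $\mathcal{O}(M^{-1})$, consistent with the $\mathcal{O}(M^{-1})$ correction shown in the exponent of Eq. (12.70).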

We next turn to the DRT dynamics. The energy and the usual initial conditions can once more be expressed in terms of $\{m\_k, \mathcal{E}\_k\}$ (slice-dependent observables) or $(m, \mathcal{E})$ (slice-independent ones). We define the short-hand $\mathcal{Q}\_\pm(m) = \frac{1}{2}\tanh(\beta(J\_0 m + h)/M + 2B) \pm \frac{1}{2}\tanh(\beta(J\_0 m + h)/M - 2B) \in (-1, 1)$. Upon inserting Eqs. (12.27, 12.28) and Eqs. (12.31, 12.32) into Eq. (12.21), with the fields $h\_{ik}(\mathbf{s}) = M^{-1}[h + J\_0 m\_k(\mathbf{s})] + (B/\beta)(s\_{i,k+1} + s\_{i,k-1}) + \mathcal{O}(N^{-1})$, and using expressions such as $\tanh[a + b(s+s')] = \frac{1}{4}(1+s)(1+s')\tanh(a+2b) + \frac{1}{4}(1-s)(1-s')\tanh(a-2b) + \frac{1}{2}(1-ss')\tanh(a)$, one finds the following descriptions:

• *Trotter slice-dependent observables*

Our observables are $m\_q(\mathbf{s}) = N^{-1}\sum\_i s\_{i,q}$ and $\mathcal{E}\_q(\mathbf{s}) = N^{-1}\sum\_i s\_{i,q} s\_{i,q+1}$, for *q* = 1 ... *M*, and we must take the limit *N* → ∞ before *M* → ∞. We note that

$$\begin{split} \tanh(\beta h\_{ik}(\mathbf{s})) &= \frac{1}{2} (1 + s\_{i,k+1} s\_{i,k-1}) \mathcal{Q}\_+(m\_k(\mathbf{s})) + \frac{1}{2} (s\_{i,k+1} + s\_{i,k-1}) \mathcal{Q}\_-(m\_k(\mathbf{s})) \\ &+ \frac{1}{2} (1 - s\_{i,k+1} s\_{i,k-1}) \tanh(\beta (h + J\_0 m\_k(\mathbf{s})) / M), \end{split} \tag{12.74}$$

so with the correlators *Ck* in Eq. (12.39) the dynamical laws take the form

#### 12 Dynamical Analysis of Quantum Annealing 309

$$\begin{split} \tau \frac{\mathrm{d}}{\mathrm{d}t} m\_{q} &= \frac{1}{2} (1 + C\_{q}) \mathcal{Q}\_{+} (m\_{q}) + \frac{1}{2} (m\_{q+1} + m\_{q-1}) \mathcal{Q}\_{-} (m\_{q}) - m\_{q} \\ &\quad + \frac{1}{2} (1 - C\_{q}) \tanh\left(\frac{\beta}{M} (h + J\_{0} m\_{q})\right), \end{split} \tag{12.75}$$

$$\begin{split} \tau \frac{\mathrm{d}}{\mathrm{d}t} \mathcal{E}\_{q} &= \frac{1}{2} (m\_{q+1} + m\_{q-1}) \mathcal{Q}\_{+} (m\_{q}) + \frac{1}{2} (1 + C\_{q}) \mathcal{Q}\_{-} (m\_{q}) \\ &\quad + \frac{1}{2} (m\_{q} + m\_{q+2}) \mathcal{Q}\_{+} (m\_{q+1}) + \frac{1}{2} (1 + C\_{q+1}) \mathcal{Q}\_{-} (m\_{q+1}) \\ &\quad + \frac{1}{2} (m\_{q+1} - m\_{q-1}) \tanh\left(\frac{\beta}{M} (h + J\_{0} m\_{q})\right) \\ &\quad + \frac{1}{2} (m\_{q} - m\_{q+2}) \tanh\left(\frac{\beta}{M} (h + J\_{0} m\_{q+1})\right) - 2 \mathcal{E}\_{q}. \end{split} \tag{12.76}$$

For slice-independent initial conditions, where *mk* = *m* and *E<sup>k</sup>* = *E*, this becomes

$$\tau \frac{d}{dt} m = \frac{1}{2} (1 + C) \mathcal{Q}\_{+}(m) + m \mathcal{Q}\_{-}(m) - m + \frac{1}{2} (1 - C) \tanh\left(\frac{\beta}{M} (h + J\_0 m)\right), \tag{12.77}$$

$$
\tau \frac{\mathrm{d}}{\mathrm{d}t} \mathcal{E} = 2m \mathcal{Q}\_{+}(m) + (1+C)\mathcal{Q}\_{-}(m) - 2\mathcal{E},\tag{12.78}
$$

with the correlator *C* in Eq. (12.43).

• *Trotter slice-independent observables*

For the choice $(m, \mathcal{E})$ there is no constraint on the order of limits, but the quantities $m\_k(\mathbf{s})$ appearing inside $\tanh(\beta h\_{ik}(\mathbf{s}))$ can no longer be replaced by deterministic macroscopic observables, but must now be calculated. Using Trotter slice permutation symmetry wherever possible, one finds

$$\tau \frac{\mathbf{d}}{\mathbf{d}t} m = \frac{1}{2M} \sum\_{k=1}^{M} \left\langle \left[ 1 + C\_k(\mathbf{s}) \right] \mathcal{Q}\_+(m\_k(\mathbf{s})) + \left[ m\_{k+1}(\mathbf{s}) + m\_{k-1}(\mathbf{s}) \right] \mathcal{Q}\_-(m\_k(\mathbf{s})) \right. $$

$$\left. + \left[ 1 - C\_k(\mathbf{s}) \right] \tanh(\beta(h + J\_0 m\_k(\mathbf{s})) / M) \right\rangle\_{m, \mathcal{E}} - m, \tag{12.79}$$

$$\tau \frac{\mathbf{d}}{\mathbf{d}t} \mathcal{E} = \frac{1}{M} \sum\_{k=1}^{M} \left\langle \left[ m\_{k+1}(\mathbf{s}) + m\_{k-1}(\mathbf{s}) \right] \mathcal{Q}\_+(m\_k(\mathbf{s})) \right\rangle\_{m, \mathcal{E}} $$

$$+ \frac{1}{M} \sum\_{k=1}^{M} \left\langle \left[ 1 + C\_k(\mathbf{s}) \right] \mathcal{Q}\_-(m\_k(\mathbf{s})) \right\rangle\_{m, \mathcal{E}} - 2\mathcal{E}, \tag{12.80}$$

with $C\_k(\mathbf{s}) = N^{-1}\sum\_i s\_{i,k+1} s\_{i,k-1}$. For large *M* and *N*, and in view of the interchangeability of the limits *M* → ∞ and *N* → ∞ in the equilibrium calculation, we may anticipate (and can indeed show) that we can neglect the fluctuations in the values of $\{m\_k(\mathbf{s})\}$ and simply replace $m\_k(\mathbf{s}) \to m(\mathbf{s}) + o(1)$ in the right-hand sides of the above equations, upon which these simplify to Eqs. (12.77, 12.78).

#### **12.5 Link Between Statics and Dynamics**

We now show that for *M* → ∞, the stationary state of Eqs. (12.77, 12.78) reproduces the equilibrium result in Eq. (12.66) as expected. The fixed-point equations of Eqs. (12.77, 12.78) are

$$m = \frac{1}{2}(1+C)\mathcal{Q}\_{+}(m) + m\mathcal{Q}\_{-}(m) + \frac{1}{2}(1-C)\tanh\left(\frac{\beta}{M}(h+J\_{0}m)\right),\tag{12.81}$$

$$\mathcal{E} = m\mathcal{Q}\_{+}(m) + \frac{1}{2}(1+C)\mathcal{Q}\_{-}(m),\tag{12.82}$$

with the correlator *C* = *C*(*m*, *E*) ∈ (−1, 1) to be solved from

$$C = \frac{\sum\_{s\_1 \dots s\_M} \mathbf{e}^{\sum\_{k=1}^M (x s\_k + y s\_k s\_{k+1})}\, s\_1 s\_3}{\sum\_{s\_1 \dots s\_M} \mathbf{e}^{\sum\_{k=1}^M (x s\_k + y s\_k s\_{k+1})}},\tag{12.83}$$

$$m = \frac{1}{M} \frac{\partial \log Z}{\partial x}, \qquad \mathcal{E} = \frac{1}{M} \frac{\partial \log Z}{\partial y}, \qquad Z(x, y) = \sum\_{s\_{1} \dots s\_{M}} \mathbf{e}^{\sum\_{k=1}^{M} (x s\_{k} + y s\_{k} s\_{k+1})}.\tag{12.84}$$

We compute $Z(x, y)$ via the transfer matrix $K(x, y)$ with elements $K\_{ss'} = \mathrm{e}^{\frac{1}{2}x(s+s') + yss'}$. This gives $Z(x, y) = \lambda\_+^M(x, y) + \lambda\_-^M(x, y)$, where $\lambda\_\pm(\cdot)$ are the eigenvalues of $K(\cdot)$,

$$\lambda\_{\pm}(\mathbf{x}, \mathbf{y}) = \mathbf{e}^{\mathbf{y}} \left( \cosh(\mathbf{x}) \pm \sqrt{\sinh^{2}(\mathbf{x}) + \mathbf{e}^{-4\mathbf{y}}} \right). \tag{12.85}$$

For the equilibrium values of $(m, \mathcal{E})$, Eqs. (12.84) are solved by

$$x = \beta (h + J\_0 m) / M, \qquad y = B = -\frac{1}{2} \log \tanh \left( \frac{\beta \Gamma}{M} \right), \quad \text{so} \quad \mathbf{e}^{-4y} = \tanh^2 \left( \frac{\beta \Gamma}{M} \right). \tag{12.86}$$

This claim is confirmed by substituting these as ansätze into the expressions given in the appendix. The key ingredient $\phi = \lambda\_-/\lambda\_+$ of our formulae then becomes

$$\log \phi = -\frac{2\beta}{M} \sqrt{(h + J\_0 m)^2 + \Gamma^2} + \mathcal{O}(M^{-3}).\tag{12.87}$$

Hence for *M* → ∞ the formulae for *m* and *E* in Eq. (12.84) become

$$m = \frac{(h + J\_0 m)\tanh[\beta\sqrt{(h + J\_0 m)^2 + \Gamma^2}]}{\sqrt{(h + J\_0 m)^2 + \Gamma^2}}, \qquad \mathcal{E} = 1. \tag{12.88}$$

in which we recognize (12.66). For large *M* one finds $\mathcal{Q}\_+(m) = \mathcal{O}(M^{-3})$ and $\mathcal{Q}\_-(m) = 1 - 2(\beta\Gamma/M)^2 + \mathcal{O}(M^{-3})$, so expansion of the fixed-point equations gives


$$m = M(1 - C)\frac{h + J\_0 m}{4\beta\Gamma^2} + \mathcal{O}(M^{-1}),\tag{12.89}$$

$$\mathcal{E} = \frac{1}{2}(1 + C)[1 - 2(\beta \Gamma/M)^2] + \mathcal{O}(M^{-3}).\tag{12.90}$$

The first equation implies that $C = 1 - \tilde{C}/M$ for *M* → ∞, with $\tilde{C} = \mathcal{O}(1)$. In turn, this gives $\mathcal{E} = 1 - \frac{\tilde{C}}{2M} + \mathcal{O}(M^{-2})$. What is left in our proof is to show that *m* obeys

$$m = \frac{h + J\_0 m}{4\beta \Gamma^2} \lim\_{M \to \infty} M(1 - C). \tag{12.91}$$

We hence compute the correlator *C* to order $M^{-1}$, using the identities in the appendix:

$$\begin{split} C &= \langle + | \sigma^{z} | + \rangle^{2} + \frac{\cosh\left[\left(\frac{1}{2}M - 2\right)\log\phi\right]}{\cosh\left[\frac{1}{2}M\log\phi\right]} \Big(1 - \langle + | \sigma^{z} | + \rangle^{2}\Big) \\ &= \frac{(h + J\_{0}m)^{2}}{(h + J\_{0}m)^{2} + \Gamma^{2}} + \frac{\cosh[\beta(1 - 4/M)\sqrt{(h + J\_{0}m)^{2} + \Gamma^{2}}]}{\cosh[\beta\sqrt{(h + J\_{0}m)^{2} + \Gamma^{2}}]} \\ &\quad\times \frac{\Gamma^{2}}{(h + J\_{0}m)^{2} + \Gamma^{2}} + \mathcal{O}\left(\frac{1}{M^{2}}\right) \\ &= 1 - \frac{1}{M}\tanh\left[\beta\sqrt{(h + J\_{0}m)^{2} + \Gamma^{2}}\right] \frac{4\beta\Gamma^{2}}{\sqrt{(h + J\_{0}m)^{2} + \Gamma^{2}}} + \mathcal{O}\left(\frac{1}{M^{2}}\right). \end{split} \tag{12.92}$$

We can now read off the value of *C*˜, and the condition in Eq. (12.91) is found to reduce to Eq. (12.66), so that it is indeed satisfied. This completes the demonstration that for large *M*, the macroscopic Eqs. (12.77, 12.78) indeed have the equilibrium state as their fixed-point.
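The final expansion in Eq. (12.92) can be verified numerically. In this sketch (our own; `a` stands for $h+J\_0m$ and the parameter values are arbitrary) the exact cosh-ratio, with $\log\phi$ taken from Eq. (12.87), is compared with the predicted limit of $M(1-C)$:

```python
import numpy as np

def C_exact(beta, Gamma, a, M):
    # first line of Eq. (12.92), with log(phi) from Eq. (12.87); a = h + J0*m
    r = np.sqrt(a**2 + Gamma**2)
    w2 = a**2 / r**2                          # <+|sigma^z|+>^2
    logphi = -2 * beta * r / M
    return w2 + (1 - w2) * np.cosh((0.5 * M - 2) * logphi) / np.cosh(0.5 * M * logphi)

def C_tilde_limit(beta, Gamma, a):
    # predicted lim_{M -> infinity} of M*(1 - C), read off from Eq. (12.92)
    r = np.sqrt(a**2 + Gamma**2)
    return 4 * beta * Gamma**2 * np.tanh(beta * r) / r
```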

#### **12.6 Evolution on Adiabatically Separated Timescales**

We return to the dynamical laws in Eqs. (12.77, 12.78). As noted earlier, these exhibit a divergent relaxation time for the magnetization for large *M*, suggesting that the dynamics have distinct phases. The first phase is studied by choosing τ = *O*(1). Using

$$\mathcal{Q}\_{+}(m) = \frac{4\beta^3 \Gamma^2(J\_0 m + h)}{M^3} + \mathcal{O}(M^{-4}), \ \mathcal{Q}\_{-}(m) = 1 - \frac{2\beta^2 \Gamma^2}{M^2} + \mathcal{O}(M^{-4}), \tag{12.93}$$

we here find that

$$m = m\_0 + \mathcal{O}(M^{-1}), \qquad \tau \frac{\mathbf{d}}{\mathbf{d}t} \mathcal{E} = 1 + \mathcal{C}(m\_0, \mathcal{E}) - 2\mathcal{E} + \mathcal{O}(M^{-1}).\tag{12.94}$$

So on these timescales the magnetization does not change, whereas the Trotter energy evolves to the solution of the fixed-point equation $\mathcal{E} = \frac{1}{2} + \frac{1}{2}C(m\_0, \mathcal{E})$, in which $C(m\_0, \mathcal{E})$ is to be solved from the following equations according to the appendix:

$$m = -\frac{\sinh(x)\tanh[\frac{1}{2}M\log\phi]}{\sqrt{\sinh^2(x) + \mathbf{e}^{-4y}}},\tag{12.95}$$

$$\mathcal{E} = \frac{\sinh^2(x)}{\sinh^2(x) + \mathbf{e}^{-4y}} + \frac{\cosh[(\frac{1}{2}M - 1)\log\phi]}{\cosh[\frac{1}{2}M\log\phi]} \frac{\mathbf{e}^{-4y}}{\sinh^2(x) + \mathbf{e}^{-4y}},\qquad(12.96)$$

$$C = \frac{\sinh^2(x)}{\sinh^2(x) + \mathbf{e}^{-4y}} + \frac{\cosh[(\frac{1}{2}M - 2)\log\phi]}{\cosh[\frac{1}{2}M\log\phi]} \frac{\mathbf{e}^{-4y}}{\sinh^2(x) + \mathbf{e}^{-4y}},\qquad(12.97)$$

with $\phi = [\cosh(x) - \sqrt{\sinh^2(x) + \mathrm{e}^{-4y}}\,]/[\cosh(x) + \sqrt{\sinh^2(x) + \mathrm{e}^{-4y}}\,]$. Inspection of these equations reveals that the correct scaling with *M* requires $(x, \mathrm{e}^{-2y}) = (u, v)/M$, with $u, v = \mathcal{O}(1)$. Now $\frac{1}{2}M\log\phi = -\sqrt{u^2+v^2} + \mathcal{O}(M^{-2})$, $\mathcal{E} = 1 - \tilde{\mathcal{E}}/M + \mathcal{O}(M^{-2})$, and $C = 1 - 2\tilde{\mathcal{E}}/M + \mathcal{O}(M^{-2})$, in which $(u, v)$ are solved from

$$m\_0 = \frac{u \tanh(\sqrt{u^2 + v^2})}{\sqrt{u^2 + v^2}}, \quad \tilde{\mathcal{E}} = \frac{2v^2 \tanh(\sqrt{u^2 + v^2})}{\sqrt{u^2 + v^2}}.\tag{12.98}$$

Although the fixed-point equation for $\mathcal{E}$ is now solved to order $\mathcal{O}(M^{-1})$, computation of $\tilde{\mathcal{E}}$ requires higher orders in $M^{-1}$. Once $\mathcal{E} = 1 - \tilde{\mathcal{E}}/M + \mathcal{O}(M^{-2})$ and $C(m, \mathcal{E}) = 1 - 2\tilde{\mathcal{E}}/M + \mathcal{O}(M^{-2})$, we find $\mathrm{d}m/\mathrm{d}t = \mathcal{O}(M^{-2})$ and $\mathrm{d}\mathcal{E}/\mathrm{d}t = \mathcal{O}(M^{-2})$, so nothing evolves further macroscopically on these finite timescales.
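The short-hands $\mathcal{Q}\_\pm(m)$ and their expansions in Eq. (12.93) are easy to check numerically (our own sketch; all parameter values are arbitrary):

```python
import numpy as np

def Q_pm(beta, Gamma, J0, h, m, M):
    # Q_{+-}(m) as defined in Sect. 12.4.2, with e^{-2B} = tanh(beta*Gamma/M)
    a = beta * (J0 * m + h) / M
    B = -0.5 * np.log(np.tanh(beta * Gamma / M))
    return (0.5 * (np.tanh(a + 2 * B) + np.tanh(a - 2 * B)),
            0.5 * (np.tanh(a + 2 * B) - np.tanh(a - 2 * B)))
```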

Since we need $\tau = \mathcal{O}(M^{-2})$ to probe the macroscopic evolution of the system on larger timescales, spin flips in the Trotter system are attempted on unit timescales of $\mathcal{O}(M^3 N)$.<sup>6</sup> With the choice $\tau = M^{-2}$, and upon defining $M(1-\mathcal{E}) = \tilde{\mathcal{E}}$ and $M(1-C) = \tilde{C}$, the macroscopic laws (12.77, 12.78) become

$$\frac{d}{dt}m = \frac{1}{2}\tilde{C}\beta(h + J\_0m) - 2m\beta^2\Gamma^2 + \mathcal{O}\left(\frac{1}{M}\right),\tag{12.99}$$

$$\frac{d}{dt}\tilde{\mathcal{E}} = 4M\beta^2\Gamma^2 - M^2(2\tilde{\mathcal{E}} - \tilde{C}) - 8\beta^3\Gamma^2 m(J\_0 m + h) - 2\beta^2\Gamma^2\tilde{C} + \mathcal{O}\left(\frac{1}{M}\right).\tag{12.100}$$

The quantity $\tilde{C} = \tilde{C}(m, \tilde{\mathcal{E}})$ is to be solved together with $(x, y)$ from Eqs. (12.95, 12.96, 12.97). The relevant scaling is still $(x, \mathrm{e}^{-2y}) = (u, v)/M$, with $u, v = \mathcal{O}(1)$, but according to Eq. (12.100) we now need more than just the leading order in $M^{-1}$. Using

$$\log \phi = -\frac{2\sqrt{u^2 + v^2}}{M} + \mathcal{O}(M^{-3}),\tag{12.101}$$

<sup>6</sup> This reflects the high energy cost of breaking Trotter symmetry to induce magnetization changes.

the equations for $\mathcal{E}$ and *C* take the form $\mathcal{E} = \Xi\_1(u, v)$ and $C = \Xi\_2(u, v)$, where

$$\begin{split} \Xi\_{\ell}(u, v) &= \left[ \sinh^{2} \left( \frac{u}{M} \right) + \frac{v^{2}}{M^{2}} \right]^{-1} \Big[ \sinh^{2} \left( \frac{u}{M} \right) + \frac{v^{2}}{M^{2}} \frac{F\_{\ell}(u, v)}{F\_{0}(u, v)} \Big], \tag{12.102} \\ F\_{\ell}(u, v) &= \cosh \left[ \left( \frac{1}{2}M - \ell \right) \log \phi \right]. \end{split} \tag{12.103}$$

Now, after a tedious but straightforward expansion in $M^{-1}$, one finds that

$$\frac{F\_{\ell}(\boldsymbol{u},\boldsymbol{v})}{F\_{0}(\boldsymbol{u},\boldsymbol{v})} = 1 - \frac{2\ell\sqrt{\boldsymbol{u}^{2} + \boldsymbol{v}^{2}}}{M} \tanh\left(\sqrt{\boldsymbol{u}^{2} + \boldsymbol{v}^{2}}\right)$$

$$+ \frac{2\ell^{2}(\boldsymbol{u}^{2} + \boldsymbol{v}^{2})}{M^{2}} + \mathcal{O}(M^{-3}).\tag{12.104}$$

Hence

$$\Xi\_{\ell}(u, v) = 1 - \frac{2\ell v^2}{M}\,\frac{\tanh(\sqrt{u^2 + v^2})}{\sqrt{u^2 + v^2}} + \frac{2\ell^2 v^2}{M^2} + \mathcal{O}(M^{-3}).\tag{12.105}$$

It follows that the equations for $\tilde{E} = M(1-E)$ and $\tilde{C} = M(1-C)$ take the form

$$\tilde{\mathcal{E}} = 2v^2\,\frac{\tanh(\sqrt{u^2 + v^2})}{\sqrt{u^2 + v^2}} - \frac{2v^2}{M} + \mathcal{O}(M^{-2}), \quad \tilde{\mathcal{C}} = 2\tilde{\mathcal{E}} - \frac{4v^2}{M} + \mathcal{O}(M^{-2}).\tag{12.106}$$
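These expansions are straightforward to verify numerically. The following sketch (our own check, not part of the original derivation) evaluates $\Xi\_\ell$ exactly from Eqs. (12.102, 12.103), with $\phi = \lambda\_-/\lambda\_+$ computed from Eq. (12.114) under the scaling $x = u/M$, $\mathrm{e}^{-2y} = v/M$, and compares the result with the two-order expansion of Eq. (12.105):

```python
import numpy as np

def xi_exact(ell, u, v, M):
    # Xi_ell from Eqs. (12.102, 12.103); x = u/M, e^{-2y} = v/M
    x, e4y = u / M, (v / M) ** 2
    root = np.sqrt(np.sinh(x) ** 2 + e4y)
    # eigenvalue ratio phi = lambda_-/lambda_+ from Eq. (12.114); the e^y prefactor cancels
    log_phi = np.log((np.cosh(x) - root) / (np.cosh(x) + root))
    F = lambda l: np.cosh((0.5 * M - l) * log_phi)
    return (np.sinh(x) ** 2 + e4y * F(ell) / F(0)) / (np.sinh(x) ** 2 + e4y)

def xi_expanded(ell, u, v, M):
    # two-order expansion, Eq. (12.105)
    A = np.hypot(u, v)
    return 1 - (2 * ell * v ** 2 / M) * np.tanh(A) / A + 2 * ell ** 2 * v ** 2 / M ** 2

for M in (50, 200):
    for ell in (1, 2):
        print(M, ell, abs(xi_exact(ell, 0.7, 0.9, M) - xi_expanded(ell, 0.7, 0.9, M)))
```

The residual shrinks like $M^{-3}$, as claimed.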

The dynamical equations then become

$$\frac{\mathrm{d}}{\mathrm{d}t}m = \tilde{\mathcal{E}}\beta(h + J\_0 m) - 2m\beta^2 \Gamma^2 + \mathcal{O}\left(\frac{1}{M}\right),\tag{12.107}$$

$$\frac{\mathrm{d}}{\mathrm{d}t}\tilde{\mathcal{E}} = 4M(\beta^2 \Gamma^2 - v^2) - 8\beta^3 \Gamma^2 m(J\_0 m + h) - 4\beta^2 \Gamma^2 \tilde{\mathcal{E}} + \mathcal{O}\left(\frac{1}{M}\right). \tag{12.108}$$

What remains is to express $v$ in terms of $(m, \tilde{E})$, to the leading two orders, by solving Eq. (12.106) for $\tilde{E}$ alongside our equation for $m$. The latter is

$$m = \frac{u\tanh(\sqrt{u^2 + v^2})}{\sqrt{u^2 + v^2}} + \mathcal{O}(M^{-2}). \tag{12.109}$$

Equation (12.106) shows that $v = 0$ corresponds to $\tilde{E} = 0$, and that $\tilde{E}$ increases with $v^2$. On intermediate timescales $\tau = M^{-1}$, we have

$$\frac{\mathrm{d}}{\mathrm{d}t}m = \mathcal{O}\left(\frac{1}{M}\right), \quad \frac{\mathrm{d}}{\mathrm{d}t}\tilde{\mathcal{E}} = 4(\beta^2\Gamma^2 - v^2) + \mathcal{O}\left(\frac{1}{M}\right),\tag{12.110}$$

where $m$ remains constant and $\tilde{E}$ evolves toward the value for which $v = \beta\Gamma + \mathcal{O}(M^{-1})$ (which is also the equilibrium value of $v$). Hence, in the dynamical equations (12.107, 12.108) describing the process on timescales of $\tau = M^{-2}$, we must substitute $v^2 = \beta^2\Gamma^2 + \mathcal{O}(M^{-1})$. Thus, during the slow process where $m$ evolves we always have

$$
\tilde{\mathcal{E}} = 2\beta^2 \Gamma^2 m/u. \tag{12.111}
$$

Upon insertion into Eq. (12.107), this results in a closed dynamical equation for *m* only:

$$\frac{\mathrm{d}}{\mathrm{d}t}m = 2\beta^2\Gamma^2\left(\frac{\beta(h+J\_0m)\tanh(\sqrt{u^2+\beta^2\Gamma^2})}{\sqrt{u^2+\beta^2\Gamma^2}}-m\right),\tag{12.112}$$

without requiring additional approximations, and with $u$ to be solved from<sup>7</sup>

$$m = \frac{u\tanh(\sqrt{u^2 + \beta^2 \Gamma^2})}{\sqrt{u^2 + \beta^2 \Gamma^2}}.\tag{12.113}$$

In equilibrium we recover from Eqs. (12.112, 12.113) the correct equilibrium state of Eq. (12.88), with $u = \beta(J\_0 m + h)$. Comparison with Eq. (10) in [7] reveals, apart from a harmless difference in time units, that the approximation of [7] (used also in [8–10]) amounts to replacing $u$ *at any time* by $\beta(J\_0 m + h)$. While this indeed holds in equilibrium, the approximation may be dangerous far from equilibrium.
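As a concrete illustration, Eqs. (12.112, 12.113) can be integrated numerically: at each time step $u$ is obtained from Eq. (12.113) by bisection, and $m$ is advanced with a simple Euler step. This is our own sketch, with illustrative parameter values chosen to mimic the left panel of Fig. 12.1 ($T = 0.5$, $J\_0 = 1$, $h = 0.1$; the value $\Gamma = 1$ is our assumption):

```python
import numpy as np

beta, Gamma, J0, h = 2.0, 1.0, 1.0, 0.1   # illustrative values; Gamma = 1 is assumed
bG = beta * Gamma

def m_of_u(u):
    # right-hand side of Eq. (12.113)
    A = np.sqrt(u ** 2 + bG ** 2)
    return u * np.tanh(A) / A

def solve_u(m, lo=-50.0, hi=50.0):
    # invert Eq. (12.113) by bisection; for these parameters m_of_u is monotone in u
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if m_of_u(mid) < m else (lo, mid)
    return 0.5 * (lo + hi)

def dm_dt(m):
    # Eq. (12.112), with u solved from Eq. (12.113) at every instant
    A = np.sqrt(solve_u(m) ** 2 + bG ** 2)
    return 2 * bG ** 2 * (beta * (h + J0 * m) * np.tanh(A) / A - m)

m, dt = 0.0, 1e-3                          # start from the paramagnetic state
for _ in range(20000):                     # simple Euler integration up to t = 20
    m += dt * dm_dt(m)
print(round(m, 3))                         # relaxed equilibrium magnetization
```

For certain parameter values Eq. (12.113) admits multiple roots (see the footnote), in which case the bisection above would have to be replaced by a root search that selects the physical branch.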

In Fig. 12.1 we test the predictions of Eqs. (12.112, 12.113) against numerical simulations of the process in Eqs. (12.11, 12.12). The approximate co-location of the simulation curves for widely varying values of $M$ confirms that $\tau = \mathcal{O}(1/M^2)$ (inferred from the dynamical theory) indeed captures the characteristic timescale of the macroscopic process. Moreover, away from stationarity the full theory of Eqs. (12.112, 12.113) is reasonably accurate and improves upon the approximation proposed in [7]; perfect agreement with the simulation data is not expected, in view of the probability equipartitioning assumption used to close the macroscopic dynamical equations.

#### **12.7 Discussion**

In this chapter we aimed to explain the basic ideas and assumptions behind the DRT strategy for deriving and closing macroscopic dynamical equations, and its application to the types of spin systems used in quantum annealing with transverse fields.

<sup>7</sup> For certain values of *m* and β, Eq. (12.113) may have more than one solution *u*. In such cases the physical solution is the one with the largest absolute value.

**Fig. 12.1** Theory versus computer simulations of the microscopic process in Eqs. (12.11, 12.12) for the Trotter representation of the system with Hamiltonian $H = -(J\_0/N)\sum\_{i<j}\sigma^z\_i\sigma^z\_j - \sum\_i(h\sigma^z\_i + \Gamma\sigma^x\_i)$, with $N = 10000$ and $M \in \{3, 12, 48, 192\}$. In all cases $J\_0 = 1$, $T = 0.5$, and $\tau = 1/M^2$ (so time units correspond to $NM^3$ attempted moves per spin). Left figure: magnetization versus time for $h = 0.1$; right figure: the same for $h = 0.5$. The simulation data are shown as connected markers. The black curve is the theoretical prediction, that is, the solution of Eqs. (12.112, 12.113). The light blue curve is the approximated theory of [7], obtained by solving Eq. (12.112) with the equilibrium value $u = \beta(J\_0 m + h)$

We focused on technicalities relating to the commutation of the limits *N* → ∞ and *M* → ∞, the possible choices of macroscopic observables, the distinct *M*-dependent timescales in the evolution of the Trotter system, and how an additional approximation made in earlier studies can be avoided, leading to a more precise dynamical theory. We tested the predictions of the theory against numerical MCMC simulations of a ferromagnetic quantum system [11] with transverse external fields in the Trotter representation and found good agreement.

Since there was no disorder in the examples used in this text, we could work with the dynamical laws in Eq. (12.22). If, in contrast, there is disorder in the problem, the macroscopic laws need to be averaged over its realization, and the main tool is Eq. (12.24). For models with random interactions, performing this disorder average is, however, relatively painless and does not make the dynamical theory significantly more complicated.

We hope that this introduction to the method may aid the development of further analytical studies of the macroscopic dynamics of quantum annealing, including models with time-dependent control parameters, more realistic quantum systems with disordered spin interactions or with interactions on finitely connected graphs, and more precise analytical descriptions in which the macroscopic dynamical observables are functions [14, 19, 20] instead of scalars.

**Acknowledgements** The authors are grateful for stimulating discussions with Professor Hidetoshi Nishimori.

# **Appendix: Mathematical Identities**

Here we list some basic properties of relevant transfer matrices and expectation values in the single-site Trotter system. The transfer matrix and its eigenvalues are

$$\mathbf{K} = \begin{pmatrix} e^{y+x} & e^{-y} \\ e^{-y} & e^{y-x} \end{pmatrix}, \quad \lambda\_{\pm} = e^{y}\Big[\cosh(x) \pm \sqrt{\sinh^{2}(x) + e^{-4y}}\Big]. \tag{12.114}$$

The corresponding normalized eigenvectors are

$$|+\rangle = \frac{1}{L}\Big(e^{-2y},\ \sqrt{\sinh^2(x) + e^{-4y}} - \sinh(x)\Big),\tag{12.115}$$

$$|-\rangle = \frac{1}{L}\Big(\sqrt{\sinh^2(x) + e^{-4y}} - \sinh(x),\ -e^{-2y}\Big),\tag{12.116}$$

$$L^2 = e^{-4y} + \left(\sqrt{\sinh^2(x) + e^{-4y}} - \sinh(x)\right)^2. \tag{12.117}$$

From these expressions one can find $\langle\pm|\sigma^z|\pm\rangle = \pm\sinh(x)/\sqrt{\sinh^2(x) + e^{-4y}}$, and compute the following observables (with $\phi = \lambda\_-/\lambda\_+$):

$$\frac{\sum\_{s\_1,\ldots,s\_M} s\_1 \prod\_{k=1}^{M} K\_{s\_k s\_{k+1}}}{\sum\_{s\_1,\ldots,s\_M} \prod\_{k=1}^{M} K\_{s\_k s\_{k+1}}} = -\frac{\sinh(x)\tanh\left[\frac{1}{2}M\log\phi\right]}{\sqrt{\sinh^{2}(x)+e^{-4y}}},\tag{12.118}$$

$$\frac{\sum\_{s\_1,\ldots,s\_M} s\_1 s\_2 \prod\_{k=1}^{M} K\_{s\_k s\_{k+1}}}{\sum\_{s\_1,\ldots,s\_M} \prod\_{k=1}^{M} K\_{s\_k s\_{k+1}}} = \frac{\sinh^2(x)}{\sinh^2(x) + e^{-4y}} + \frac{\cosh\left[\left(\frac{1}{2}M - 1\right)\log\phi\right]}{\cosh\left[\frac{1}{2}M\log\phi\right]}\,\frac{e^{-4y}}{\sinh^2(x) + e^{-4y}},\tag{12.119}$$

$$\frac{\sum\_{s\_1,\ldots,s\_M} s\_1 s\_3 \prod\_{k=1}^{M} K\_{s\_k s\_{k+1}}}{\sum\_{s\_1,\ldots,s\_M} \prod\_{k=1}^{M} K\_{s\_k s\_{k+1}}} = \frac{\sinh^2(x)}{\sinh^2(x) + e^{-4y}} + \frac{\cosh\left[\left(\frac{1}{2}M - 2\right)\log\phi\right]}{\cosh\left[\frac{1}{2}M\log\phi\right]}\,\frac{e^{-4y}}{\sinh^2(x) + e^{-4y}}.\tag{12.120}$$
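The first of these identities is easily verified by brute force for small $M$. The sketch below (our own verification code) sums over all $2^M$ periodic spin configurations with weights $\prod\_k K\_{s\_k s\_{k+1}}$, using $K\_{ss'} = e^{y s s' + x(s+s')/2}$ from Eq. (12.114), and compares the result with the closed form of Eq. (12.118):

```python
import itertools, math

def brute_force_s1(M, x, y):
    # <s_1> from an explicit sum over all 2^M periodic configurations
    num = den = 0.0
    for s in itertools.product((1, -1), repeat=M):
        w = math.exp(sum(y * s[k] * s[(k + 1) % M] + 0.5 * x * (s[k] + s[(k + 1) % M])
                         for k in range(M)))                 # product of K_{s_k s_{k+1}}
        num += s[0] * w
        den += w
    return num / den

def closed_form_s1(M, x, y):
    # right-hand side of Eq. (12.118)
    root = math.sqrt(math.sinh(x) ** 2 + math.exp(-4 * y))
    phi = (math.cosh(x) - root) / (math.cosh(x) + root)      # phi = lambda_-/lambda_+
    return -math.sinh(x) * math.tanh(0.5 * M * math.log(phi)) / root

print(abs(brute_force_s1(6, 0.3, 0.2) - closed_form_s1(6, 0.3, 0.2)))  # ~ machine precision
```

The identity is exact for every $M$, so the two numbers agree to rounding error.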

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Chapter 13 Mean-Field Analysis of Sourlas Codes with Adiabatic Reverse Annealing**

**Shunta Arai**

**Abstract** In this chapter, we analyze the typical performance of adiabatic reverse annealing (ARA) for Sourlas codes. Sourlas codes are representative error-correcting codes related to *p*-body spin-glass models and have a first-order phase transition for *p* > 2, which degrades the estimation performance. In the ARA formulation, we introduce the initial Hamiltonian which incorporates the prior information of the solution into a vanilla quantum annealing (QA) formulation. The ground state of the initial Hamiltonian represents the initial candidate solution. To avoid the first-order phase transition, we apply ARA to Sourlas codes. We evaluate the typical ARA performance for Sourlas codes using the replica method. We show that ARA can avoid the first-order phase transition if we prepare for the proper initial candidate solution.

#### **13.1 Introduction**

Problems in information processing have been studied analytically from the viewpoint of statistical mechanics [12]. Associative memory, Sourlas codes, code-division multiple-access (CDMA), and image restoration are very popular examples [5, 6, 21, 24]. Many studies have focused on the degradation of the original signal or information due to noise. The noise can be physically regarded as thermal fluctuations. The original information can be estimated from the degraded data by tuning the strength of thermal fluctuations.

In this chapter, we focus mainly on error-correcting codes such as Sourlas codes, which are described by *p*-body spin-glass problems [21]. The main idea of error-correcting codes is to add redundancy while sending information, so that the original signal can be decoded from noisy outputs. In Sourlas codes, the original signal is encoded in the interactions of the spins. To estimate the original signal, we search the ground

S. Arai (B)

Graduate School of Information Sciences, Tohoku University, 980-8579 Sendai, Japan e-mail: shunta.arai.d8@tohoku.ac.jp

Sigma-i Co.,Ltd., 108-0075 Minato, Tokyo, Japan

<sup>©</sup> The Author(s) 2022 N. Katoh et al. (eds.), *Sublinear Computation Paradigm*, https://doi.org/10.1007/978-981-16-4095-7\_13

state of the Hamiltonian or compute the expectation value over the Gibbs–Boltzmann distribution at a finite temperature.

In addition to thermal fluctuations, quantum fluctuations can also be used to infer the original information. Several studies have demonstrated that quantum fluctuations such as the transverse field do not necessarily enhance the performance of decoding for image restoration, Sourlas codes, or CDMA [2, 6, 15, 16]. The optimal estimation performance using quantum fluctuations is inferior to that using thermal fluctuations in Bayes-optimal cases. However, in some non-Bayes optimal cases, the estimation performance using finite quantum fluctuations and thermal fluctuations surpasses that using only thermal fluctuations; for example, when the assigned temperature is lower than the true noise scale. This implies the potential of combining quantum and thermal fluctuations for signal recovery problems.

Signal estimation using quantum fluctuations is related to optimization using quantum fluctuations, known as quantum annealing (QA) [9] or adiabatic quantum computation (AQC) [3]. The QA algorithm is physically implemented in the quantum annealer [7]. The quantum annealer has been tested in numerous applications, including traffic optimization [11] and vehicle operations in factories [14].

In a closed system, the QA procedure is as follows. First, we set the initial state as the trivial ground state of the transverse field term. Next, we gradually decrease the strength of the transverse field. Following the Schrödinger equation, the trivial ground state evolves adiabatically into a nontrivial ground state of the target Hamiltonian, which corresponds to a solution of the combinatorial optimization problem. The quantum adiabatic theorem indicates that the total computational time for searching the ground state is characterized by the minimum energy gap between the ground state and the first excited state [23]. When the target Hamiltonian has a first-order phase transition, the computational time to find the ground state grows exponentially.
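The protocol can be illustrated by integrating the Schrödinger equation directly for a toy system. The sketch below (our own; the three-spin ferromagnetic chain and all parameter values are illustrative, not taken from this chapter) starts from the uniform superposition, the ground state of the transverse-field term, and slowly interpolates toward a target Ising Hamiltonian:

```python
import numpy as np

sx = np.array([[0., 1.], [1., 0.]]); sz = np.diag([1., -1.]); I2 = np.eye(2)

def op_on(single, sites, N):
    # tensor product with `single` on the given sites and identity elsewhere
    out = np.array([[1.0]])
    for j in range(N):
        out = np.kron(out, single if j in sites else I2)
    return out

N = 3
# target: ferromagnetic chain plus a small symmetry-breaking field (illustrative)
Hz = -sum(op_on(sz, {i, i + 1}, N) for i in range(N - 1)) \
     - 0.2 * sum(op_on(sz, {i}, N) for i in range(N))
Hx = -sum(op_on(sx, {i}, N) for i in range(N))

psi = np.ones(2 ** N) / np.sqrt(2 ** N)     # ground state of Hx: uniform superposition

T, steps = 200.0, 4000                      # slow sweep -> adiabatic evolution
dt = T / steps
for k in range(steps):
    s = (k + 0.5) / steps                   # annealing parameter grows from 0 to 1
    w, V = np.linalg.eigh(s * Hz + (1 - s) * Hx)
    psi = V @ (np.exp(-1j * w * dt) * (V.conj().T @ psi))

w, V = np.linalg.eigh(Hz)
fidelity = abs(V[:, 0].conj() @ psi) ** 2   # overlap with the target ground state
print(round(fidelity, 3))
```

For a sufficiently slow sweep the final fidelity approaches one; shortening the total time $T$ degrades it, in line with the adiabatic theorem.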

Reverse annealing (RA) is a protocol for restarting quantum dynamics from the final state of the standard QA procedure [17]. The RA algorithm can be used to avoid or mitigate the first-order phase transition and is classified into two methods: *adiabatic reverse annealing* (ARA) [13] and *iterated reverse annealing* (IRA) [26]. ARA and IRA are distinguished by how the final state is utilized: one implements the final state by introducing the initial Hamiltonian, and the other incorporates it as the initial condition.

In a recent study [2], ARA was applied to CDMA multiuser detection, where it can avoid or mitigate the first-order phase transition. In this chapter, we apply ARA to Sourlas codes. Sourlas codes have a first-order phase transition for *p* > 2, and its existence deteriorates the estimation performance. We evaluate the typical performance of ARA for Sourlas codes using the replica method. We demonstrate that ARA can avoid the first-order phase transition of Sourlas codes if we prepare proper initial conditions.

#### **13.2 Sourlas Codes Using Quantum Fluctuations**

Following a previous study [15], we formulate Sourlas codes using quantum fluctuations. Sourlas codes are set up to send a set of products of $p$ spins, $J\_{i\_1\ldots i\_p} = \xi\_{i\_1}\ldots\xi\_{i\_p}$, through a channel. The symbol $\xi\_i = \pm 1\ (i = 1, \ldots, N)$ represents the original signal, which is independently generated from the uniform distribution $P(\xi\_i) = 1/2$. We consider the Gauss channel as

$$P(J\_{i\_1\ldots i\_p}|\{\xi\}) = \left(\frac{N^{p-1}}{J^2\pi p!}\right)^{\frac{1}{2}} \exp\left\{-\frac{N^{p-1}}{J^2 p!} \left(J\_{i\_1\ldots i\_p} - \frac{J\_0 p! \xi\_{i\_1}\ldots\xi\_{i\_p}}{N^{p-1}}\right)^2\right\},\tag{13.1}$$

where $J$ and $J\_0$ are hyperparameters; the ratio $J\_0/J$ represents the signal-to-noise ratio. The distribution $P(J\_{i\_1\ldots i\_p}|\{\xi\})$ is the conditional probability of the signal $J\_{i\_1\ldots i\_p}$ given the encoded signal $\xi\_{i\_1}\ldots\xi\_{i\_p}$. We infer the original signal $\{\xi\}$ from the noisy outputs $\{J\_{i\_1\ldots i\_p}\}$. Using the Bayes formula, we introduce the posterior probability for the estimated signal $\sigma = \{\sigma\_1, \ldots, \sigma\_N\} \in \{\pm 1\}^N$ as

$$P(\sigma \mid \{J\_{i\_1 \ldots i\_p}\}) = \frac{P(\{J\_{i\_1 \ldots i\_p}\}|\sigma)P(\sigma)}{\sum\_{\sigma} P(\{J\_{i\_1 \ldots i\_p}\}|\sigma)P(\sigma)},\tag{13.2}$$

where $P(\{J\_{i\_1\ldots i\_p}\}|\sigma)$ and $P(\sigma)$ are the likelihood and prior distribution, respectively. The summation over the spin variables $\sigma$ runs over all possible configurations. The likelihood can be expressed as

$$P(\{J\_{i\_1\ldots i\_p}\}|\sigma) \propto \exp\left(\beta \sum\_{i\_1 < \cdots < i\_p} J\_{i\_1 \ldots i\_p} \sigma\_{i\_1} \ldots \sigma\_{i\_p}\right),\tag{13.3}$$

where $\beta$ is the inverse temperature and the summation $\sum\_{i\_1<\cdots<i\_p}$ runs over all possible combinations of $p$ spins out of $N$ spins. According to Eqs. (13.2) and (13.3), the posterior distribution can be written using the Gibbs–Boltzmann distribution with the classical Hamiltonian $\mathcal{H}(\sigma)$, as follows:

$$P(\sigma \mid \{J\_{i\_1 \dots i\_p}\}) = \frac{1}{Z} \exp\left\{-\beta \left(\mathcal{H}(\sigma) + \mathcal{H}\_{\text{init}}(\sigma)\right)\right\},\tag{13.4}$$

$$Z = \sum\_{\sigma} \exp\left\{-\beta \left(\mathcal{H}(\sigma) + \mathcal{H}\_{\text{init}}(\sigma)\right)\right\},\tag{13.5}$$

$$\mathcal{H}(\sigma) = -\sum\_{i\_1 < \dots < i\_p} J\_{i\_1 \dots i\_p} \sigma\_{i\_1} \dots \sigma\_{i\_p},\tag{13.6}$$

where $Z$ is the partition function and $\mathcal{H}\_{\text{init}}(\sigma)$ is the initial Hamiltonian, which represents the prior information on the estimated signal. We generally assume that the prior of the estimated signal follows a uniform distribution $P(\sigma) = 1/2^N$.
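For concreteness, each coupling in the Gauss channel of Eq. (13.1) is a Gaussian with mean $J\_0 p!\,\xi\_{i\_1}\ldots\xi\_{i\_p}/N^{p-1}$ and variance $J^2 p!/(2N^{p-1})$. A minimal sampler of the noisy outputs (our own sketch; the function and variable names are illustrative):

```python
import itertools, math
import numpy as np

rng = np.random.default_rng(0)

def sourlas_couplings(xi, p, J0, J):
    # sample {J_{i1..ip}} from the Gauss channel of Eq. (13.1)
    N = len(xi)
    fac = math.factorial(p)
    sigma = math.sqrt(J ** 2 * fac / (2 * N ** (p - 1)))     # channel noise level
    return {idx: rng.normal(J0 * fac * np.prod(xi[list(idx)]) / N ** (p - 1), sigma)
            for idx in itertools.combinations(range(N), p)}

xi = rng.choice([-1, 1], size=8)          # original signal
Js = sourlas_couplings(xi, p=3, J0=1.0, J=1.0)
print(len(Js))                            # C(8,3) = 56 transmitted couplings
```

Decoding then amounts to inferring $\xi$ from the dictionary `Js` alone.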

To decode the original signal, one decoding strategy is the maximum a posteriori (MAP) estimation, which corresponds to searching the ground state of the classical Hamiltonian of Sourlas codes in the limit of zero temperature. Another is the marginal posterior mode (MPM) estimation, which corresponds to finding the expectation value over the posterior distribution at a finite temperature. In the limit of zero temperature, the MPM estimation is consistent with the MAP estimation. In this chapter, we mainly consider the MPM estimation. The estimation performance can be evaluated by the overlap between the original and estimated signal as

$$\mathcal{M}(\boldsymbol{\beta}) = \text{Tr}\_{\xi} \prod\_{i\_1 < \dots < i\_p} \int dJ\_{i\_1 \dots i\_p} P(\boldsymbol{\xi}) P(\{J\_{i\_1 \dots i\_p} \} | \boldsymbol{\xi}) \xi\_i \text{sgn}\langle \sigma\_i \rangle \tag{13.7}$$

where $\langle\cdot\rangle$ is the expectation over the posterior distribution $P(\sigma|\{J\_{i\_1\ldots i\_p}\})$. This quantity is expected to exhibit a "self-averaging" property in the thermodynamic limit $N \to \infty$: observables such as the overlap for a quenched realization of the data $\{J\_{i\_1\ldots i\_p}\}$ and $\xi$ are equivalent to their expectation over the data distribution $P(\xi)P(\{J\_{i\_1\ldots i\_p}\}|\xi)$. In this case, the overlap can be expressed as $\lim\_{N\to\infty} \mathcal{M} = [\xi\_i\,\mathrm{sgn}\langle\sigma\_i\rangle]$, where the bracket $[\cdot]$ indicates the expectation over the data distribution.
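As a toy illustration of finite-temperature MPM decoding, the following Metropolis sketch (our own; it is not the analytical method used in this chapter) samples the posterior of a small $p = 3$ Sourlas code and estimates $\mathrm{sgn}\langle\sigma\_i\rangle$ by time averaging. To keep the run short, the chain is started close to the original signal, in the spirit of the initial candidate solutions introduced later in this chapter:

```python
import itertools, math
import numpy as np

rng = np.random.default_rng(1)
N, p, J0, J, beta = 10, 3, 1.5, 1.0, 2.0      # illustrative; J0/J is the signal-to-noise ratio

xi = rng.choice([-1, 1], size=N)
fac = math.factorial(p)
sig = math.sqrt(J ** 2 * fac / (2 * N ** (p - 1)))
Jc = {idx: rng.normal(J0 * fac * np.prod(xi[list(idx)]) / N ** (p - 1), sig)
      for idx in itertools.combinations(range(N), p)}

nbr = {i: [] for i in range(N)}                # triples each spin participates in
for (a, b, c), val in Jc.items():
    nbr[a].append((b, c, val)); nbr[b].append((a, c, val)); nbr[c].append((a, b, val))

s = xi.copy()
s[:2] *= -1                                    # informed start: two spins corrupted
acc = np.zeros(N)
for sweep in range(2000):
    for i in range(N):
        dE = 2.0 * s[i] * sum(v * s[j] * s[k] for j, k, v in nbr[i])
        if rng.random() < math.exp(min(0.0, -beta * dE)):
            s[i] = -s[i]
    if sweep >= 200:                           # accumulate sgn<sigma_i> after burn-in
        acc += s

overlap = float(np.mean(xi * np.sign(acc)))    # cf. Eq. (13.7)
print(overlap)
```

For these strong-signal parameters the time-averaged signs reproduce most of the original spins; near a first-order transition such local dynamics would instead get trapped, which is precisely the motivation for ARA below.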

Quantum fluctuations can be utilized to decode the original information. The Hamiltonian of Sourlas code using quantum fluctuations is expressed as follows:

$$
\hat{\mathcal{H}} = s\hat{\mathcal{H}}\_0 + (1-s)\hat{\mathcal{H}}\_{\text{TF}},\tag{13.8}
$$

$$\hat{\mathcal{H}}\_0 = -\sum\_{i\_1 < \dots < i\_p} J\_{i\_1 \dots i\_p} \hat{\sigma}\_{i\_1}^z \dots \hat{\sigma}\_{i\_p}^z,\tag{13.9}$$

$$\hat{\mathcal{H}}\_{\text{TF}} = -\sum\_{i=1}^{N} \hat{\sigma}\_i^x,\tag{13.10}$$

where <sup>σ</sup><sup>ˆ</sup> *<sup>z</sup> <sup>i</sup>* and <sup>σ</sup><sup>ˆ</sup> *<sup>x</sup> <sup>i</sup>* are the *z* and *x* components of the Pauli matrix at site *i*. We parameterize the Hamiltonian by the annealing parameter *s* for the ARA formulation. Note that *H*ˆ <sup>0</sup> and *H*ˆ TF consist of the *z* and *x* components of the Pauli matrices, respectively. As in the classical case, we can consider the MPM estimation using quantum fluctuations. The performance of the MPM estimation using quantum fluctuations can be evaluated by the overlap as follows:

$$M(\beta, s) = \operatorname{Tr}\_{\{\xi\}} \int \prod\_{i\_1 < \dots < i\_p} dJ\_{i\_1 \dots i\_p}\, P(\{J\_{i\_1 \dots i\_p}\}|\{\xi\})\, P(\{\xi\})\, \xi\_i\, \operatorname{sgn}\langle \hat{\sigma}\_i^z \rangle\_{\text{TF}}$$

$$\equiv \left[\xi\_i\, \operatorname{sgn}(\langle \hat{\sigma}\_i^z \rangle\_{\text{TF}})\right],\tag{13.11}$$

where $\langle\cdot\rangle\_{\text{TF}} \equiv \operatorname{Tr}\,(\cdot)\,\hat{\rho}$ denotes the expectation with respect to the density matrix $\hat{\rho} \equiv e^{-\beta\hat{\mathcal{H}}}/\operatorname{Tr} e^{-\beta\hat{\mathcal{H}}}$.
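For small $N$, the Hamiltonian of Eqs. (13.8)–(13.10) can be built explicitly from Kronecker products of Pauli matrices. The sketch below (our own) also checks the classical limit: $\hat{\mathcal{H}}\_0$ is diagonal in the $z$ basis, so its smallest diagonal entry must coincide with the ground-state energy of the classical Hamiltonian in Eq. (13.6):

```python
import itertools, math
import numpy as np

rng = np.random.default_rng(2)
N, p = 4, 3
sx = np.array([[0., 1.], [1., 0.]]); sz = np.diag([1., -1.]); I2 = np.eye(2)

def op_on(single, sites):
    # tensor product with `single` on the given sites and identity elsewhere
    out = np.array([[1.0]])
    for j in range(N):
        out = np.kron(out, single if j in sites else I2)
    return out

xi = rng.choice([-1, 1], size=N)
fac = math.factorial(p)
Jc = {idx: rng.normal(fac * np.prod(xi[list(idx)]) / N ** (p - 1),
                      math.sqrt(fac / (2 * N ** (p - 1))))
      for idx in itertools.combinations(range(N), p)}

H0 = -sum(val * op_on(sz, set(idx)) for idx, val in Jc.items())     # Eq. (13.9)
HTF = -sum(op_on(sx, {i}) for i in range(N))                        # Eq. (13.10)
H = 0.7 * H0 + 0.3 * HTF                                            # Eq. (13.8), s = 0.7

E_quantum = np.diag(H0).min()        # H0 is diagonal in the z basis
E_classical = min(-sum(val * np.prod(np.array(cfg)[list(idx)]) for idx, val in Jc.items())
                  for cfg in itertools.product([1, -1], repeat=N))
print(np.isclose(E_quantum, E_classical))   # True
```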

#### **13.3 Replica Analysis for Adiabatic Reverse Annealing**

Following Ref. [13], we formulate Sourlas codes using quantum fluctuations in ARA as follows:

$$
\hat{\mathcal{H}} = s\hat{\mathcal{H}}\_0 + (1 - s)(1 - \lambda)\hat{\mathcal{H}}\_{\text{init}} + (1 - s)\lambda\hat{\mathcal{H}}\_{\text{TF}},\tag{13.12}
$$

$$
\hat{\mathcal{H}}\_{\text{init}} = -\sum\_{i=1}^{N} \tau\_{i} \hat{\sigma}\_{i}^{z}, \tag{13.13}
$$

where λ (0 ≤ λ ≤ 1) is the RA parameter. We now introduce the initial candidate solution τ*<sup>i</sup>* = ±1 that is expected to be close to the correct ground state ξ*<sup>i</sup>* . We define the probability distribution of the initial candidate solutions as follows:

$$P(\tau) = \prod\_{i=1}^{N} P(\tau\_i) = \prod\_{i=1}^{N} \left(c\_1 \delta(\tau\_i - \xi\_i) + c\_{-1} \delta(\tau\_i + \xi\_i)\right),\tag{13.14}$$

where we utilize the symbol *c*<sup>1</sup> = *c* and *c*−<sup>1</sup> = 1 − *c*. The number *c* (0 ≤ *c* ≤ 1) denotes the fraction of the original signal τ*<sup>i</sup>* = ξ*<sup>i</sup>* in the initial candidate solution as

$$c = \frac{1}{N} \sum\_{i=1}^{N} \delta\_{\tau\_i \xi\_i}. \tag{13.15}$$

The ARA formulation corresponds to adopting $P(\sigma^z|\tau) \propto \exp(-\beta \hat{\mathcal{H}}\_{\text{init}})$ as the prior distribution.
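Drawing an initial candidate solution from Eq. (13.14) amounts to letting each $\tau\_i$ agree with $\xi\_i$ with probability $c$ (a minimal sketch of our own):

```python
import numpy as np

rng = np.random.default_rng(3)

def initial_candidate(xi, c):
    # tau_i = xi_i with probability c, and -xi_i otherwise, cf. Eq. (13.14)
    agree = rng.random(len(xi)) < c
    return np.where(agree, xi, -xi)

xi = rng.choice([-1, 1], size=100_000)
tau = initial_candidate(xi, c=0.8)
frac = np.mean(tau == xi)                 # empirical version of Eq. (13.15)
print(round(float(frac), 2))
```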

The typical behaviors of the order parameters, such as the overlap, can be obtained via the free energy. The free energy density $f$ can be evaluated as $-\beta f = \lim\_{N\to\infty}(1/N)[\ln Z]$, where $Z = \operatorname{Tr}\exp(-\beta\hat{\mathcal{H}})$ is the partition function of Eq. (13.12). In general, the direct computation of the free energy density is hard due to the configuration average of $\ln Z$ and the off-diagonal elements in Eq. (13.12). The configuration average can be performed using the replica trick [20]. Even though we can avoid the direct computation of $[\ln Z]$, we cannot apply the standard techniques to evaluate the free energy density due to the non-commutativity of the Hamiltonian.

First, to eliminate the non-commutativity of the Hamiltonian, we apply the Suzuki–Trotter decomposition [22] to the partition function:

$$\begin{split} Z &= \lim\_{M \to \infty} \text{Tr} \left\{ \exp \left( -\frac{\beta}{M} \left( s\hat{\mathcal{H}}\_0 + (1-s)(1-\lambda)\hat{\mathcal{H}}\_{\text{init}} \right) \right) \exp \left( -\frac{\beta(1-s)\lambda}{M} \hat{\mathcal{H}}\_{\text{TF}} \right) \right\}^M \\ &= \lim\_{M \to \infty} Z\_M, \end{split} \tag{13.16}$$

where

$$Z\_M = \operatorname{Tr}\exp\left(\frac{\beta s}{M} \sum\_{t=1}^{M} \sum\_{i\_1 < \cdots < i\_p} J\_{i\_1 \ldots i\_p} \sigma\_{i\_1}^{z}(t) \ldots \sigma\_{i\_p}^{z}(t) + \frac{\beta(1-s)(1-\lambda)}{M} \sum\_{t=1}^{M} \sum\_{i=1}^{N} \tau\_i \sigma\_i^{z}(t)\right.$$

$$\left.+\, \frac{\beta(1-s)\lambda}{M} \sum\_{t=1}^{M} \sum\_{i=1}^{N} \sigma\_i^{x}(t)\right) \times \prod\_{i=1}^{N} \prod\_{t=1}^{M} \langle \sigma\_i^{z}(t) | \sigma\_i^{x}(t) \rangle \langle \sigma\_i^{x}(t) | \sigma\_i^{z}(t+1) \rangle, \tag{13.17}$$

where the symbol $t$ is the index of the Trotter slice, $M$ is the Trotter number, and Tr denotes the trace in the $z$ and $x$ bases. We impose the periodic boundary conditions $\sigma\_i^z(1) = \sigma\_i^z(M+1)$ for all $i$ and insert the identity operators $\hat{1} = \sum\_{\{\sigma^z(t)\}} |\{\sigma^z(t)\}\rangle\langle\{\sigma^z(t)\}|$ and $\hat{1} = \sum\_{\{\sigma^x(t)\}} |\{\sigma^x(t)\}\rangle\langle\{\sigma^x(t)\}|$. The detailed calculation is given in Appendix 13.5.
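The decomposition is easy to check numerically on small matrices: for two non-commuting symmetric matrices, $(\mathrm{e}^{A/M}\mathrm{e}^{B/M})^M$ converges to $\mathrm{e}^{A+B}$ with an error of order $1/M$ (our own sketch; the random matrices merely stand in for the two non-commuting parts of the Hamiltonian):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(4, 4)); A = A + A.T      # stand-ins for the z-part and
B = rng.normal(size=(4, 4)); B = B + B.T      # the transverse-field part

def expm_sym(H):
    # matrix exponential of a symmetric matrix via its eigendecomposition
    w, V = np.linalg.eigh(H)
    return (V * np.exp(w)) @ V.T

exact = expm_sym(A + B)
errs = [np.linalg.norm(np.linalg.matrix_power(expm_sym(A / M) @ expm_sym(B / M), M) - exact)
        for M in (1, 10, 100)]
print([round(e, 4) for e in errs])             # error shrinks roughly like 1/M
```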

To evaluate [ln *Z*], we utilize the replica trick [20]:

$$[\ln Z] = \lim\_{n \to 0} \frac{[Z^n] - 1}{n},\tag{13.18}$$

where *n* is the replica number. The replicated partition function can be written as

$$[Z^n] = \lim\_{M \to \infty} \sum\_{\{\xi\_i = \pm 1\}} \sum\_{\{\tau\_i = \pm \xi\_i\}} P(\xi) P(\tau) \prod\_{i\_1 < \dots < i\_p} \int dJ\_{i\_1 \dots i\_p}\, P(\{J\_{i\_1 \dots i\_p}\} | \xi\_{i\_1} \dots \xi\_{i\_p})$$

$$\times \operatorname{Tr} \exp \left\{ \frac{\beta s}{M} \sum\_{t, a} \sum\_{i\_1 < \dots < i\_p} J\_{i\_1 \dots i\_p} \sigma\_{i\_1 a}^z(t) \dots \sigma\_{i\_p a}^z(t) + \frac{\beta(1 - s)(1 - \lambda)}{M} \sum\_{i, t, a} \tau\_i \sigma\_{i a}^z(t) \right.$$

$$\left. + \frac{\beta(1 - s)\lambda}{M} \sum\_{i, t, a} \sigma\_{i a}^x(t) \right\} \prod\_{i, t, a} \langle \sigma\_{i a}^z(t) | \sigma\_{i a}^x(t) \rangle \langle \sigma\_{i a}^x(t) | \sigma\_{i a}^z(t + 1) \rangle, \tag{13.19}$$

in which *a* denotes the replica index.

To remove the dependency on the original signal $\{\xi\}$, we apply the gauge transformation $J\_{i\_1\ldots i\_p} \to J\_{i\_1\ldots i\_p}\xi\_{i\_1}\ldots\xi\_{i\_p}$ and $\sigma\_{ia}^z(t) \to \sigma\_{ia}^z(t)\xi\_i$ to the partition function $[Z\_M^n]$. Performing the Gaussian integration over the distribution in Eq. (13.1), we introduce the following order parameters:

$$m\_a(t) = \frac{1}{N} \sum\_{i=1}^{N} \sigma\_{ia}^z(t),\tag{13.20}$$

$$q\_{ab}(t, t') = \frac{1}{N} \sum\_{i=1}^{N} \sigma\_{ia}^{z}(t) \sigma\_{ib}^{z}(t'),\tag{13.21}$$

$$R\_a(t, t') = \frac{1}{N} \sum\_{i=1}^{N} \sigma\_{ia}^{z}(t) \sigma\_{ia}^{z}(t'),\tag{13.22}$$


$$m\_a^x(t) = \frac{1}{N} \sum\_{i=1}^N \sigma\_{ia}^x(t). \tag{13.23}$$

The physical meanings of the order parameters are as follows: $m\_a(t)$ is the magnetization, $q\_{ab}(t, t')$ is the spin-glass order parameter, $R\_a(t, t')$ is the correlation between Trotter slices, and $m\_a^x(t)$ is the transverse magnetization. Moreover, we introduce the auxiliary parameters $\tilde{m}\_a(t)$, $\tilde{q}\_{ab}(t, t')$, $\tilde{R}\_a(t, t')$, $\tilde{m}\_a^x(t)$ conjugate to the order parameters via the delta function and its Fourier integral representation. Under the replica symmetry (RS) ansatz and the static approximation, $m\_a(t) = m$, $q\_{ab}(t, t') = q$, $R\_a(t, t') = R$, $m\_a^x(t) = m^x$, $\tilde{m}\_a(t) = \tilde{m}$, $\tilde{q}\_{ab}(t, t') = \tilde{q}$, $\tilde{R}\_a(t, t') = \tilde{R}$, $\tilde{m}\_a^x(t) = \tilde{m}^x$, we obtain the RS free energy density:

$$-\beta f\_{\rm RS} = \beta s J\_0 m^p + \frac{\beta^2 s^2 J^2}{4} (R^p - q^p) + \beta (1 - s) \lambda m^x - \beta m \tilde{m} - \beta m^x \tilde{m}^x$$

$$-\frac{\beta^2}{2} (R \tilde{R} - q \tilde{q}) + \sum\_{a = \pm 1} c\_a \int Dz \ln 2Y\_a,\tag{13.24}$$

$$Y\_a \equiv \int Dy \cosh \beta u\_a,\tag{13.25}$$

$$
u\_a \equiv \sqrt{g\_a^2 + (\tilde{m}^x)^2},
\tag{13.26}
$$

$$g\_a \equiv \tilde{m} + a(1-s)(1-\lambda) + \sqrt{\tilde{q}}\,z + \sqrt{\tilde{R}-\tilde{q}}\,y,\tag{13.27}$$

where $Dz$ denotes the Gaussian measure $Dz := dz\, e^{-z^2/2}/\sqrt{2\pi}$, and similarly for $Dy$. Detailed calculations for deriving the free energy density in Eq. (13.24) are provided in Appendix 13.5. The order parameters and their auxiliary parameters are determined by the saddle-point conditions of the free energy density. The extremization of Eq. (13.24) yields the following saddle-point equations:

$$m = \sum\_{a=\pm 1} c\_a \int Dz\, Y\_a^{-1} \int Dy \left(\frac{g\_a}{u\_a}\right) \sinh \beta u\_a,\tag{13.28}$$

$$q = \sum\_{a=\pm 1} c\_a \int Dz \left\{ Y\_a^{-1} \int Dy \left(\frac{g\_a}{u\_a}\right) \sinh \beta u\_a \right\}^2,\tag{13.29}$$

$$R = \sum\_{a=\pm 1} c\_a \int Dz\, Y\_a^{-1} \int Dy \left\{ \left( \frac{(\tilde{m}^x)^2}{\beta u\_a^3} \right) \sinh \beta u\_a + \left( \frac{g\_a}{u\_a} \right)^2 \cosh \beta u\_a \right\},\tag{13.30}$$

$$m^{x} = \sum\_{a=\pm 1} c\_{a} \int Dz\, Y\_{a}^{-1} \int Dy \left(\frac{\tilde{m}^x}{u\_a}\right) \sinh \beta u\_a,\tag{13.31}$$


$$
\tilde{m} = sJ\_0 p m^{p-1},
\tag{13.32}
$$

$$
\tilde{q} = \frac{s^2 J^2}{2} pq^{p-1},
\tag{13.33}
$$

$$
\tilde{\mathcal{R}} = \frac{s^2 J^2}{2} p \mathcal{R}^{p-1},
\tag{13.34}
$$

$$
\tilde{m}^x = (1 - s)\lambda. \tag{13.35}
$$

From Eq. (13.11), the overlap function is easily expressed as

$$M(\beta, s, \lambda) = \sum\_{a=\pm 1} c\_a \int Dz\, \text{sgn} \left\{ Y\_a^{-1} \int Dy \left(\frac{g\_a}{u\_a}\right) \sinh \beta u\_a \right\}. \tag{13.36}$$

In the low-temperature region, the *p*-body spin-glass model is known to exhibit replica symmetry breaking (RSB) [4]. The stability condition of RS solutions under the static approximation is expressed as

$$\frac{\beta^2 s^2 J^2 p(p-1)}{2} q^{p-2} \left( \sum\_{a=\pm 1} c\_a A\_a \right) < 1,\tag{13.37}$$

$$A\_a \equiv \int Dz \left\{ \left( Y\_a^{-1} \int Dy \left( \frac{g\_a}{u\_a} \right) \sinh \beta u\_a \right)^2 \right.$$

$$\left. - Y\_a^{-1} \left( \int Dy \left( \frac{(\tilde{m}^x)^2}{\beta u\_a^3} \right) \sinh \beta u\_a + \int Dy \left( \frac{g\_a}{u\_a} \right)^2 \cosh \beta u\_a \right) \right\}^2. \tag{13.38}$$

This condition, called the Almeida–Thouless (AT) condition [1], can be obtained by considering perturbations around the RS solutions. This result is consistent with the previous result in Ref. [25] for $p = 2$, $J\_0 = 0$, and $\lambda = 1$.

#### **13.4 Numerical Experiments**

We numerically solve the saddle-point equations in Eqs. (13.28)–(13.35) with *p* = 5, temperature *T* = 0.05, and signal-to-noise ratio *J*0/*J* = 1.5. To evaluate the typical MPM estimation performance, we often utilize the overlap *M*(β,*s*, λ). In this chapter, we focus mainly on the possibility of avoiding the first-order phase transition by ARA. For simplicity and to reduce computational cost, we adopt the magnetization as a measure of the average MPM estimation performance using ARA. Figure 13.1a shows the phase diagram of the Sourlas codes using quantum fluctuations in ARA. We consider three initial conditions: *c* = 0.7, 0.8, and 0.95. Each line represents a point of the first-order phase transition. We call these lines "critical" lines. We can avoid a first-order phase transition by preparing proper initial conditions. When

**Fig. 13.1 a** Phase diagram of Sourlas codes in ARA for *c* = 0.7, 0.8, and 0.95. The vertical and horizontal axes represent the annealing parameter and the RA parameter, respectively. Each line represents the point where the first-order phase transition occurs. The AT line indicates where the AT condition is broken above the line. **b** Differences in magnetization between two local minima at the first-order phase transition in Fig. 13.1 (**a**). The vertical axis denotes the differences in the magnetization between two local minima at the first-order phase transition while the horizontal axis represents the RA parameter

we increase the ratio of the ground state in the initial Hamiltonian, the region where we can avoid the first-order phase transition becomes wider.
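Numerically, solving saddle-point equations such as Eqs. (13.28)–(13.35) reduces to fixed-point iteration in which the Gaussian averages $\int Dz\,(\cdot)$ (with $Dz = e^{-z^2/2}dz/\sqrt{2\pi}$) are evaluated by quadrature at every step. A minimal, standard-library-only sketch of such an integrator (the function name is ours, not from the chapter):

```python
import math

def gauss_measure(f, zmax=8.0, n=4000):
    """Approximate \\int Dz f(z), where Dz = exp(-z^2/2)/sqrt(2*pi) dz,
    by the trapezoidal rule on [-zmax, zmax].  Because the integrand
    decays like exp(-z^2/2), truncating at |z| = 8 is already accurate
    to near machine precision for smooth f."""
    h = 2.0 * zmax / n
    total = 0.0
    for k in range(n + 1):
        z = -zmax + k * h
        w = 0.5 if k in (0, n) else 1.0   # trapezoidal endpoint weights
        total += w * math.exp(-0.5 * z * z) * f(z)
    return h * total / math.sqrt(2.0 * math.pi)

# Sanity checks against closed forms:
#   \int Dz z^2 = 1,   \int Dz cosh(z) = e^{1/2}.
second_moment = gauss_measure(lambda z: z * z)
cosh_average = gauss_measure(lambda z: math.cosh(z))
```

For the double integrals $\int Dz \int Dy\,(\cdot)$ of Eqs. (13.28)–(13.31), the same routine is simply nested.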

We also compute the AT condition, Eq. (13.37). As shown in Fig. 13.1a, the AT condition is broken between the AT line and the "critical" line for *c* = 0.7. If the fraction of the ground state in the initial candidate solution is not large enough, the spin-glass phase emerges and RSB occurs. The emergence of RSB implies the existence of a metastable state. Figure 13.1a shows that we can avoid RSB if we tune the RA parameter λ. For *c* = 0.8, the AT condition is broken in the low-λ region. The region where the AT condition is broken is smaller than that for *c* = 0.7. Since we cannot distinguish the AT line from the "critical" line at this scale, we omit the AT line from Fig. 13.1a. For *c* = 0.95, the AT condition holds. Therefore, the local stability of the RS solution is recovered if we prepare proper initial conditions.

To evaluate the extent to which ARA mitigates the difficulty of estimating the original signal, we plot the differences in the magnetization *m* between the two local minima at the first-order phase transition for *c* = 0.7, 0.8, and 0.95. Significant differences in the magnetization result in the separation of the two local minima of the free energy. Figure 13.1b shows that this difference decreases as *c* increases; that is, the two local minima of the free energy are brought closer by ARA. As discussed in Ref. [13], the quantum tunneling rate between two local minima in the free-energy landscape increases as the distance between the two local minima decreases. Our results demonstrate that ARA for Sourlas codes enhances the quantum tunneling effects if we prepare an appropriate initial condition. This result is consistent with that for the CDMA model [2].
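The coexistence of two local minima that underlies the first-order transition can be illustrated with a drastically simplified classical toy model: the fixed-point equation $m = \tanh(\beta p J\_0 m^{p-1})$ of a ferromagnetic *p*-body system. This is our illustrative stand-in, not the full RS system of Eqs. (13.28)–(13.35); the point is only that for *p* > 2 at low temperature, fixed-point iteration converges to two distinct locally stable solutions depending on the starting point:

```python
import math

def iterate_magnetization(m0, beta, p, J0, iters=200):
    """Fixed-point iteration m <- tanh(beta * p * J0 * m**(p-1)) for a
    classical ferromagnetic p-body toy model (an illustrative stand-in
    for the full RS saddle-point equations)."""
    m = m0
    for _ in range(iters):
        m = math.tanh(beta * p * J0 * m ** (p - 1))
    return m

# For p = 3 at low temperature two locally stable solutions coexist:
# a high-magnetization ("ferromagnetic") branch and m = 0.
high = iterate_magnetization(0.99, beta=2.0, p=3, J0=1.0)
low = iterate_magnetization(0.01, beta=2.0, p=3, J0=1.0)
```

Starting near *m* = 1 the iteration stays on the high branch, while starting near *m* = 0 it collapses to zero: the hallmark of two separated free-energy minima.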

#### **13.5 Summary**

In this chapter, we explained a mean-field analysis of ARA for Sourlas codes. Sourlas codes with *p* > 2 exhibit a first-order phase transition, which deteriorates their estimation performance. To avoid the first-order phase transition, we applied ARA to Sourlas codes. The first-order phase transition can be avoided by preparing proper initial conditions, and the region where it can be avoided becomes larger as *c* increases. We investigated the differences in magnetization between the two local minima at the first-order phase transition. When ARA was applied with proper initial conditions, the two local minima of the free energy came closer together. ARA thus improved the probability of escaping a local minimum by quantum tunneling. This study shows that ARA can be useful for error-correcting codes.

In practice, we need to prepare the initial candidate solution using some algorithm. In the previous study [2] on CDMA multiuser detection, we utilized the approximate message passing algorithm [8] to prepare the initial candidate solution. The performance of ARA in the practical case differed from that in the oracle case, where the initial candidate solution was generated from the original signal. Evaluating the performance of ARA in the practical case for Sourlas codes is an interesting future direction.

**Acknowledgements** We are grateful for valuable comments from Kazuyuki Tanaka, Masayuki Ohzeki, Manaka Okuyama, and ACC Coolen. This work was partly supported by JST-CREST (No. JPMJCR1402).

#### **Appendix 1: Derivation of Eq. (13.17)**

In this appendix, we derive Eq. (13.17) in detail. We mainly follow Refs. [10, 18, 19]. We take the *z* basis as the computational basis. In this case, Tr is replaced by $\sum\_{\{\sigma^z\}}\langle\{\sigma^z\}|(\cdot)|\{\sigma^z\}\rangle$ with $|\{\sigma^z\}\rangle \equiv \otimes\_{i=1}^N |\sigma\_i^z\rangle$. For the *z* basis, we introduce *M* copies of the identity operator $\hat{1} = \sum\_{\{\sigma^z(t)\}} |\{\sigma^z(t)\}\rangle\langle\{\sigma^z(t)\}|$ into Eq. (13.16),

$$Z\_M = \lim\_{M \to \infty} \prod\_{t=1}^M \sum\_{\{\sigma^z(t)\}} \exp\left(-\frac{\beta}{M} \sum\_{t=1}^M \left(s\mathcal{H}\_0 + (1-s)(1-\lambda)\mathcal{H}\_{\text{init}}\right)\right)$$

$$\times \prod\_{t=1}^M \langle\{\sigma^z(t)\}|\exp\left(-\frac{\beta(1-s)\lambda}{M}\hat{\mathcal{H}}\_{\text{TF}}\right)|\{\sigma^z(t+1)\}\rangle\tag{13.39}$$

where we introduce the periodic boundary condition $|\{\sigma^z(1)\}\rangle = |\{\sigma^z(M+1)\}\rangle$. To show the dependence of the spin operators on the Trotter index, arguments are added to each Hamiltonian in Eq. (13.39). For the *x* basis, we similarly introduce *M* copies of the identity operator $\hat{1} = \sum\_{\{\sigma^x(t)\}} |\{\sigma^x(t)\}\rangle\langle\{\sigma^x(t)\}|$ into Eq. (13.39). The last term in Eq. (13.39) can then be written as

$$\prod\_{t=1}^{M} \sum\_{\{\sigma^{x}(t)\}} \exp\left(-\frac{\beta(1-s)\lambda}{M} \mathcal{H}\_{\text{TF}}\right) \prod\_{t=1}^{M} \langle\{\sigma^{z}(t)\}|\{\sigma^{x}(t)\}\rangle\langle\{\sigma^{x}(t)\}|\{\sigma^{z}(t+1)\}\rangle. \tag{13.40}$$

Finally, we can obtain Eq. (13.17) in the main text as

$$\begin{split} Z\_M &= \prod\_{t=1}^M \text{Tr} \exp\left(-\frac{\beta}{M} \sum\_{t=1}^M (s\mathcal{H}\_0 + (1-s)(1-\lambda)\mathcal{H}\_{\text{init}}) - \frac{\beta(1-s)\lambda}{M} \sum\_{t=1}^M \mathcal{H}\_{\text{TF}}\right) \\ &\times \prod\_{i=1}^N \prod\_{t=1}^M \langle \sigma\_i^z(t) | \sigma\_i^x(t) \rangle \langle \sigma\_i^x(t) | \sigma\_i^z(t+1) \rangle, \end{split} \tag{13.41}$$

where Tr denotes the summation over all possible spin configurations $\{\sigma\_i^z\}$ and $\{\sigma\_i^x\}$. Since the first term in Eq. (13.41) consists of commuting numbers, we can take the configuration average over the data distribution.
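The single-spin building block behind this construction, namely that the Suzuki–Trotter product reproduces $\mathrm{Tr}\,e^{-\beta(h\sigma^z + \Gamma\sigma^x)} = 2\cosh\beta\sqrt{h^2+\Gamma^2}$ as $M \to \infty$, can be checked numerically. A stand-alone sketch (names and parameter values are ours):

```python
import math

def matmul(A, B):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def trotter_partition_function(h, gamma, beta, M):
    """Approximate Tr exp(-beta*(h*sigma_z + gamma*sigma_x)) by the
    Suzuki-Trotter product of M slices, each of the form
    exp(-(beta/M) h sigma_z) * exp(-(beta/M) gamma sigma_x)."""
    a, b = beta * h / M, beta * gamma / M
    Ez = [[math.exp(-a), 0.0], [0.0, math.exp(a)]]      # exp(-a sigma_z)
    Ex = [[math.cosh(b), -math.sinh(b)],                 # exp(-b sigma_x)
          [-math.sinh(b), math.cosh(b)]]
    slice_ = matmul(Ez, Ex)
    P = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(M):
        P = matmul(P, slice_)
    return P[0][0] + P[1][1]                             # trace

# Exact single-spin result: Tr e^{-beta H} = 2 cosh(beta sqrt(h^2 + gamma^2))
exact = 2.0 * math.cosh(1.0 * math.sqrt(0.7 ** 2 + 0.4 ** 2))
approx = trotter_partition_function(0.7, 0.4, 1.0, 2000)
```

With *M* = 2000 slices the Trotter product already matches the exact trace to well below 10⁻³.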

#### **Appendix 2: Derivation of the RS Free Energy**

We derive the free energy density under the RS ansatz and the static approximation. After the gauge transformation $J\_{i\_1\ldots i\_p} \to J\_{i\_1\ldots i\_p}\xi\_{i\_1}\cdots\xi\_{i\_p}$ and $\sigma\_{ia}^z(t) \to \sigma\_{ia}^z(t)\xi\_i$, we integrate over $J\_{i\_1\ldots i\_p}$ as

$$\begin{split} &\prod\_{i\_1<\cdots<i\_p} \int dJ\_{i\_1\ldots i\_p} P(J\_{i\_1\ldots i\_p}) \exp\left\{ \frac{\beta s}{M} J\_{i\_1\ldots i\_p} \sum\_{a,t} \sigma\_{i\_1 a}^z(t) \cdots \sigma\_{i\_p a}^z(t) \right\} \\ &= \prod\_{i\_1<\cdots<i\_p} \exp\left\{ \frac{\beta s J\_0 p!}{M N^{p-1}} \sum\_{a,t} \sigma\_{i\_1 a}^z(t) \cdots \sigma\_{i\_p a}^z(t) + \frac{\beta^2 s^2 J^2 p!}{4M^2 N^{p-1}} \sum\_{a,b,t,t'} \sigma\_{i\_1 a}^z(t) \sigma\_{i\_1 b}^z(t') \cdots \sigma\_{i\_p a}^z(t) \sigma\_{i\_p b}^z(t') \right\} \\ &\simeq \exp\left\{ \frac{\beta s J\_0 N}{M} \sum\_{a,t} \left( \frac{1}{N} \sum\_{i=1}^N \sigma\_{ia}^z(t) \right)^p + \frac{\beta^2 s^2 J^2 N}{4M^2} \sum\_{a,b,t,t'} \left( \frac{1}{N} \sum\_{i=1}^N \sigma\_{ia}^z(t) \sigma\_{ib}^z(t') \right)^p \right\}, \end{split} \tag{13.42}$$

where we use the expression $\sum\_{i\_1<\cdots<i\_p} \sigma\_{i\_1}^z \cdots \sigma\_{i\_p}^z = (N^p/p!)\left(\sum\_{i=1}^N \sigma\_i^z/N\right)^p + \mathcal{O}(N^{p-1})$. We introduce the delta functions and their Fourier integral representations for Eqs. (13.20)–(13.23) as follows:

$$\begin{split} &\prod\_{a,t} \int dm\_a(t)\,\delta\left(m\_a(t) - \frac{1}{N} \sum\_{i=1}^N \xi\_i \sigma\_{ia}^z(t)\right) \\ &= \prod\_{a,t} \int \frac{\beta i N dm\_a(t) d\tilde{m}\_a(t)}{2\pi M} e^{-\frac{\beta \tilde{m}\_a(t)}{M} \left( N m\_a(t) - \sum\_{i=1}^N \xi\_i \sigma\_{ia}^z(t) \right)}, \end{split} \tag{13.43}$$

$$\begin{split} &\prod\_{a,t,t'} \int dR\_a(t,t')\,\delta\left(R\_a(t,t') - \frac{1}{N} \sum\_{i=1}^N \sigma\_{ia}^z(t) \sigma\_{ia}^z(t')\right) \\ &= \prod\_{a,t,t'} \int \frac{\beta^2 i N dR\_a(t,t') d\tilde{R}\_a(t,t')}{4\pi M^2} e^{-\frac{\beta^2 \tilde{R}\_a(t,t')}{2M^2} \left( N R\_a(t,t') - \sum\_{i=1}^N \sigma\_{ia}^z(t) \sigma\_{ia}^z(t') \right)}, \end{split} \tag{13.44}$$

$$\begin{split} &\prod\_{a \neq b,t,t'} \int dq\_{ab}(t,t')\,\delta\left(q\_{ab}(t,t') - \frac{1}{N} \sum\_{i=1}^N \sigma\_{ia}^z(t) \sigma\_{ib}^z(t')\right) \\ &= \prod\_{a \neq b,t,t'} \int \frac{\beta^2 i N dq\_{ab}(t,t') d\tilde{q}\_{ab}(t,t')}{4\pi M^2} e^{-\frac{\beta^2 \tilde{q}\_{ab}(t,t')}{2M^2} \left( N q\_{ab}(t,t') - \sum\_{i=1}^N \sigma\_{ia}^z(t) \sigma\_{ib}^z(t') \right)}, \end{split} \tag{13.45}$$

$$\begin{split} & \prod\_{a,t} \int dm\_a^x(t) \delta\left(m\_a^x(t) - \frac{1}{N} \sum\_{i=1}^N \sigma\_{ia}^x(t)\right) \\ &= \prod\_{a,t} \int \frac{\beta i N d m\_a^x(t) d\tilde{m}\_a^x(t)}{2\pi \, M} e^{-\frac{\beta \tilde{m}\_a^x(t)}{M} \left(N m\_a^x(t) - \sum\_{i=1}^N \sigma\_{ia}^x(t)\right)}. \end{split} \tag{13.46}$$

The partition function can be written as

$$\begin{aligned} [Z^n] &\simeq \lim\_{M\to\infty} \prod\_{a,t} \int \frac{\beta i N dm\_a(t) d\tilde{m}\_a(t)}{2\pi M} \prod\_{a,t\neq t'} \int \frac{\beta^2 i N dR\_a(t,t') d\tilde{R}\_a(t,t')}{4\pi M^2} \\ &\times \prod\_{a\neq b,t,t'} \int \frac{\beta^2 i N dq\_{ab}(t,t') d\tilde{q}\_{ab}(t,t')}{4\pi M^2} \prod\_{a,t} \int \frac{\beta i N dm\_a^x(t) d\tilde{m}\_a^x(t)}{2\pi M} e^{G\_1+G\_2+G\_3}, \end{aligned} \tag{13.47}$$
 
$$e^{G\_1} \equiv \exp\left\{ \frac{\beta s J\_0 N}{M} \sum\_{a,t} (m\_a(t))^p + \frac{\beta^2 s^2 J^2 N}{4M^2} \left( \sum\_{a \neq b,t,t'} q\_{ab}^p(t,t') + \sum\_{a,t \neq t'} R\_a^p(t,t') + nM \right) \right\}, \tag{13.48}$$
 
$$\begin{split} e^{G\_2} &\equiv \sum\_{\{\xi\_i=\pm 1\}} \sum\_{\{\tau\_i=\pm \xi\_i\}} P(\xi) P(\tau)\, \text{Tr} \exp\left\{ \frac{\beta}{M} \sum\_{a,t} \tilde{m}\_a(t) \sum\_{i=1}^N \sigma\_{ia}^z(t) \right. \\ &\quad + \frac{\beta(1-s)(1-\lambda)}{M} \sum\_{a,t,i} \tau\_i \xi\_i \sigma\_{ia}^z(t) + \frac{\beta^2}{2M^2} \sum\_{a,t\neq t'} \tilde{R}\_a(t,t') \sum\_{i=1}^N \sigma\_{ia}^z(t) \sigma\_{ia}^z(t') \\ &\quad + \left. \frac{\beta^2}{2M^2} \sum\_{a\neq b} \sum\_{t,t'} \tilde{q}\_{ab}(t,t') \sum\_{i=1}^N \sigma\_{ia}^z(t) \sigma\_{ib}^z(t') + \frac{\beta}{M} \sum\_{a,t} \tilde{m}\_a^x(t) \sum\_{i=1}^N \sigma\_{ia}^x(t) \right\} \\ &\quad \times \prod\_{a,t,i} \langle \sigma\_{ia}^z(t) | \sigma\_{ia}^x(t) \rangle \langle \sigma\_{ia}^x(t) | \sigma\_{ia}^z(t+1) \rangle, \end{split} \tag{13.49}$$

$$\begin{split} e^{G\_3} &\equiv \exp\left\{ -\frac{\beta N}{M} \sum\_{a,t} \tilde{m}\_a(t) m\_a(t) - \frac{\beta^2 N}{2M^2} \sum\_{a,t\neq t'} \tilde{R}\_a(t,t') R\_a(t,t') \right. \\ &\quad - \frac{\beta^2 N}{2M^2} \sum\_{a\neq b,t,t'} \tilde{q}\_{ab}(t,t') q\_{ab}(t,t') - \frac{\beta N}{M} \sum\_{a,t} \tilde{m}\_a^x(t) m\_a^x(t) \\ &\quad + \left. \frac{\beta(1-s)\lambda N}{M} \sum\_{a,t} m\_a^x(t) \right\}. \end{split} \tag{13.50}$$

We assume the RS ansatz and the static approximation as

$$m\_a(t) = m, \, q\_{ab}(t, t') = q \,\, (a \neq b), \, R\_a(t, t') = R \,\, (t \neq t'), \, m\_a^x(t) = m^x,$$

$$\tilde{m}\_a(t) = \tilde{m}, \, \tilde{q}\_{ab}(t, t') = \tilde{q} \,\, (a \neq b), \, \tilde{R}\_a(t, t') = \tilde{R} \,\, (t \neq t'), \, \tilde{m}\_a^x(t) = \tilde{m}^x. \,\, (13.51)$$

Under the RS ansatz and the static approximation, $e^{G\_1}$ is represented as

$$e^{G\_1} \equiv \exp\left\{ \beta nN \left( sJ\_0 m^p + \frac{\beta s^2 J^2}{4} \left( (n-1)q^p + R^p \right) + \mathcal{O}\left(\frac{1}{M}\right) \right) \right\}. \tag{13.52}$$

We compute $e^{G\_2}$ under the RS ansatz and the static approximation as follows:

$$\begin{split} e^{G\_2} &= \sum\_{\{\xi\_i=\pm 1\}} \sum\_{\{\tau\_i=\pm \xi\_i\}} P(\xi) P(\tau)\, \text{Tr} \exp\left\{ \frac{\beta \tilde{m}}{M} \sum\_{a,t,i} \sigma\_{ia}^z(t) + \frac{\beta(1-s)(1-\lambda)}{M} \sum\_{a,t,i} \tau\_i \xi\_i \sigma\_{ia}^z(t) \right. \\ &\quad + \left. \frac{\beta^2 \tilde{R}}{2M^2} \sum\_{a,t\neq t'} \sum\_{i=1}^N \sigma\_{ia}^z(t) \sigma\_{ia}^z(t') + \frac{\beta^2 \tilde{q}}{2M^2} \sum\_{a\neq b} \sum\_{t,t'} \sum\_{i=1}^N \sigma\_{ia}^z(t) \sigma\_{ib}^z(t') + \frac{\beta \tilde{m}^x}{M} \sum\_{a,t} \sum\_{i=1}^N \sigma\_{ia}^x(t) \right\} \\ &\quad \times \prod\_{a,t,i} \langle \sigma\_{ia}^z(t) | \sigma\_{ia}^x(t) \rangle \langle \sigma\_{ia}^x(t) | \sigma\_{ia}^z(t+1) \rangle \\ &= \prod\_{i=1}^N \sum\_{\xi\_i=\pm 1} \sum\_{\tau\_i=\pm \xi\_i} \frac{1}{2} P(\tau\_i) \int Dz \prod\_{a=1}^n \int Dy\, \text{Tr} \prod\_{t=1}^M \exp\left\{ \frac{\beta}{M} \left( \tilde{m} + (1-s)(1-\lambda)\tau\_i \xi\_i + \sqrt{\tilde{q}}z + \sqrt{\tilde{R}-\tilde{q}}\,y \right) \sigma\_{ia}^z(t) + \frac{\beta \tilde{m}^x}{M} \sigma\_{ia}^x(t) \right\} \\ &\quad \times \prod\_{t=1}^M \langle \sigma\_{ia}^z(t) | \sigma\_{ia}^x(t) \rangle \langle \sigma\_{ia}^x(t) | \sigma\_{ia}^z(t+1) \rangle \\ &= \prod\_{i=1}^N \sum\_{\xi\_i=\pm 1} \sum\_{\tau\_i=\pm \xi\_i} \frac{1}{2} P(\tau\_i) \int Dz \left[ \int Dy\, 2\cosh\beta\sqrt{g^2(\tau\_i,\xi\_i) + (\tilde{m}^x)^2} \right]^n \end{split}$$

$$\begin{split} &\simeq \prod\_{i=1}^{N} \sum\_{\xi\_i=\pm 1} \frac{1}{2} \exp\left\{ n \int Dz \sum\_{\tau\_i=\pm \xi\_i} P(\tau\_i) \ln \int Dy\, 2\cosh\beta\sqrt{g^2(\tau\_i,\xi\_i) + (\tilde{m}^x)^2} \right\} \\ &= \exp\left\{ nN \left( \sum\_{a=\pm 1} c\_a \int Dz \ln \int Dy\, 2\cosh\beta\sqrt{g\_a^2 + (\tilde{m}^x)^2} \right) \right\}, \end{split} \tag{13.53}$$

where

$$g(\tau\_i, \xi\_i) = \tilde{m} + (1-s)(1-\lambda)\tau\_i \xi\_i + \sqrt{\tilde{q}}z + \sqrt{\tilde{R}-\tilde{q}}\,y,\tag{13.54}$$

$$g\_a = \tilde{m} + a(1-s)(1-\lambda) + \sqrt{\tilde{q}}z + \sqrt{\tilde{R}-\tilde{q}}\,y.\tag{13.55}$$

We apply the Hubbard–Stratonovich transformation,

$$\exp\left(\frac{x^2}{2}\right) = \int Dv\_1 \exp\left(x v\_1\right),\tag{13.56}$$

to the terms $\frac{1}{2}\left(\frac{\beta\sqrt{\tilde{q}}}{M}\sum\_{a,t}\sigma\_{ia}^z(t)\right)^2$ and $\sum\_a \frac{1}{2}\left(\frac{\beta\sqrt{\tilde{R}-\tilde{q}}}{M}\sum\_{t}\sigma\_{ia}^z(t)\right)^2$. We then perform the inverse operation of the Suzuki–Trotter decomposition and take the trace.
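The Hubbard–Stratonovich identity (13.56) is easy to verify numerically; a stand-alone check (the integrator name and the value of $x$ are ours):

```python
import math

def Dv_average(f, vmax=8.0, n=4000):
    """\\int Dv f(v), with Dv = exp(-v^2/2)/sqrt(2*pi) dv, by the
    trapezoidal rule on [-vmax, vmax]."""
    h = 2.0 * vmax / n
    s = sum(math.exp(-0.5 * v * v) * f(v)
            for v in (-vmax + k * h for k in range(1, n)))
    s += 0.5 * math.exp(-0.5 * vmax * vmax) * (f(-vmax) + f(vmax))
    return h * s / math.sqrt(2.0 * math.pi)

# exp(x^2/2) should equal the Gaussian average of exp(x*v):
x = 1.3
lhs = math.exp(0.5 * x * x)
rhs = Dv_average(lambda v: math.exp(x * v))
```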

Under the RS ansatz and the static approximation, $e^{G\_3}$ is expressed as

$$e^{G\_3} = \exp\left\{\beta nN\left(-m\tilde{m} - m^x\tilde{m}^x - \frac{\beta}{2}R\tilde{R} - \frac{\beta(n-1)}{2}q\tilde{q} + (1-s)\lambda m^x\right) + \mathcal{O}\left(\frac{1}{M}\right)\right\}.\tag{13.57}$$

In the thermodynamic limit *N* → ∞, the saddle-point method can be used. The RS free energy density is then expressed as

$$\begin{split} -\beta f\_{\text{RS}} &= \lim\_{n \to 0} \frac{[Z^n] - 1}{nN} \\ &= \underset{\substack{m,q,R,m^x \\ \tilde{m},\tilde{q},\tilde{R},\tilde{m}^x}}{\text{ext}} \left[ \beta s J\_0 m^p + \frac{\beta^2 s^2 J^2}{4} (R^p - q^p) + \beta (1-s) \lambda m^x - \beta m \tilde{m} - \beta m^x \tilde{m}^x \right. \\ &\quad \left. - \frac{\beta^2}{2} (R \tilde{R} - q \tilde{q}) + \sum\_{a = \pm 1} c\_a \int Dz \ln \int Dy\, 2 \cosh \beta \sqrt{g\_a^2 + (\tilde{m}^x)^2} \right]. \end{split} \tag{13.58}$$

The order parameters and their auxiliary parameters can be determined from the saddle-point conditions.
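As a consistency check, the first saddle-point equation can be recovered directly from Eq. (13.58). Writing $u\_a \equiv \sqrt{g\_a^2 + (\tilde{m}^x)^2}$ and $Y\_a \equiv \int Dy \cosh \beta u\_a$ (as the notation of Eqs. (13.28)–(13.31) suggests), and extremizing $-\beta f\_{\text{RS}}$ with respect to $\tilde{m}$ (noting $\partial g\_a / \partial \tilde{m} = 1$), we obtain

$$0 = -\beta m + \sum\_{a=\pm 1} c\_a \int Dz\, \frac{\int Dy\, \beta \left(\frac{g\_a}{u\_a}\right) \sinh \beta u\_a}{\int Dy\, \cosh \beta u\_a} \quad\Longrightarrow\quad m = \sum\_{a=\pm 1} c\_a \int Dz\, Y\_a^{-1} \int Dy \left(\frac{g\_a}{u\_a}\right) \sinh \beta u\_a,$$

which is Eq. (13.28); the remaining saddle-point equations follow analogously.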

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Part V Applications**

# **Chapter 14 Structural and Functional Analysis of Proteins Using Rigidity Theory**

#### **Adnan Sljoka**

**Abstract** Over the past two decades, we have witnessed an unprecedented explosion in available biological data. In the age of big data, large biological datasets have created an urgent need for the development of bioinformatics methods and innovative fast algorithms. Bioinformatics tools can enable data-driven hypotheses and interpretation of complex biological data that can advance biological and medicinal knowledge discovery. Advances in structural biology and computational modelling have led to the characterization of atomistic structures of many biomolecular components of cells. Proteins in particular are the most fundamental biomolecules and the key constituent elements of all living organisms, as they are necessary for cellular functions. Proteins play crucial roles in immunity, catalysis, metabolism and the majority of biological processes, and hence there is significant interest in understanding how these macromolecules carry out their complex functions. The mechanical heterogeneity of protein structures and a delicate mix of rigidity and flexibility, which dictates their dynamic nature, are linked to their highly diverse biological functions. Mathematical rigidity theory and related algorithms have opened up many exciting opportunities to accurately analyse protein dynamics and probe various biological enigmas at a molecular level. Importantly, rigidity theoretical algorithms and methods run in almost linear time complexity, which makes them suitable for high-throughput and big-data style analysis. In this chapter, we discuss the importance of protein flexibility and dynamics and review concepts in mathematical rigidity theory for analysing stability and the dynamics of protein structures.
We then review some recent breakthrough studies, where we designed rigidity theory methods to understand complex biological events, such as allosteric communication, large-scale analysis of immune system antibody proteins, the highly complex dynamics of intrinsically disordered proteins and the validation of Nuclear Magnetic Resonance (NMR) solved protein structures.

A. Sljoka (B)

RIKEN Center for Advanced Intelligence Project, RIKEN, 1-4-1 Nihombashi, Chuo-ku, Tokyo 103-0027, Japan e-mail: adnan.sljoka@riken.jp

#### **14.1 Introduction**

In the current post-genomics era, advances in experimental and computational techniques have revolutionized biological and biomedical research. High-throughput technologies have paved the way to novel research avenues where we can systematically analyse whole genomes of organisms and individual proteins or collections of proteins, including their structures and interactions with other proteins, which in many cases allow researchers to successfully decipher their biological functions. Proteins are macromolecules that are fundamental to most cellular function [1]. They comprise the highest levels of molecular and cellular structure and organization, and because the majority of physiological and disease processes are manifested within proteins, structural and computational biology research is focused on understanding protein function.

Proteins and other biomolecules are nanomachines. Accurate representation of their three-dimensional structure is a critical first step to understanding how they perform their functions. Advances in molecular biology, instrumentation, and imaging technologies such as X-ray crystallography, nuclear magnetic resonance (NMR), and electron microscopy have led to a revolution in structural biology. These techniques allow us to see beautiful yet complex three-dimensional shapes of protein structures and how they interact with other proteins and ligands. Protein imaging techniques are continuously improving, and for many proteins, we can now characterize their structures at an individual-atom-level resolution. A rapidly growing and revolutionary cryogenic-electron microscopy (cryo-EM) technique has been attracting significant attention, as very recently it has broken various resolution barriers [2] and can now discern individual atoms of very large protein structures (see Fig. 14.1). Cryo-EM complements X-ray crystallography because it reveals atomistic structural details without the need for a crystalline specimen. The Protein Data Bank (PDB), a repository of experimentally solved protein structures, together with computationally determined protein structures, makes up a rich source of protein structural data. Recent advances in AI and deep learning have provided significant improvements in inferring protein structures from a sequence of amino acids [3]. DeepMind's AlphaFold method has demonstrated that deep learning structure predictions can come astonishingly close to experimentally determined structures, and in the near future, we expect this will result in huge growth of macromolecular structural data. The increasing richness of the available protein structural data and the rapidly growing proteomics and bioinformatics big-data repositories open up possibilities to systematically analyse complex biological questions and gain novel biological insights.
To facilitate data-driven biological knowledge discovery, many bioinformatics and computational biology tools, software packages, and databases have been developed [4].

Despite tremendous advances in bioinformatics, structural biology and imaging technologies which have generated hundreds of thousands of atomic snapshots of protein structures, many fundamental biological problems such as protein folding, allosteric regulation, receptor signalling, and enzyme catalysis, to name a few, still remain largely unresolved [5–12].

**Fig. 14.1** Cryo-EM snapshot structure of viral spike protein of SARS-CoV-2 (a key protein involved in COVID-19), which is a very large protein structure consisting of three chains (distinct colours), each consisting of nearly 1300 amino acids

While the static high-quality representation of

protein structures can offer clues to structure-function mechanisms, protein function is almost purely controlled by its dynamic character through a delicate mix of rigidity and flexibility. Research must move beyond static snapshot representations of proteins, as the mechanical heterogeneity of protein structures that dictates their dynamic nature is intimately linked to their highly diverse biological functions. A deep understanding of the connection between structures and internal protein flexibility, rigidity, and dynamics is absolutely critical, as it can lead to solutions to the protein folding problem, elusive allosteric regulation and other dynamically driven biological secrets of protein regulation.

The primary desire of any protein researcher is to see proteins move in real time at the atomistic level while they carry out their biological functions. Yet, despite many advances in experimental techniques and molecular dynamics simulations, such a goal is still very far from being realized. Analysing and comprehending protein flexibility and dynamics has proven to be extremely difficult. One major challenge is that the main molecular simulation methods, such as classical molecular dynamics simulations, require a prohibitive amount of computational power and are not suitable to reach biologically relevant functional dynamics that occur on longer (millisecond-second) timescales. Furthermore, with rapid growth in the number of experimentally solved biomolecular structures and the increasing size of structural protein databases, including the expanding big-data size sets of computationally predicted protein structures, we are faced with a pressing need to develop fast algorithms and novel mathematical and computational techniques that simplify the classical force fields and can offer experimentally verified accurate predictions of protein flexibility and dynamics.

Techniques inspired from the field of mathematical-structural rigidity theory [13– 16] have gained special attention as they are suitable for handling the many challenges with computational analysis of protein flexibility and its dynamics. Biological functions of protein structures are often related to their network (or graph) properties. Mathematical rigidity theory offers considerable promise in deciphering graph theoretical properties of protein networks to better understand protein function [13, 15–18]. In rigidity theory, proteins are modelled as geometric molecular frameworks consisting of atoms and various connecting intermolecular forces. Such frameworks are essentially multigraphs (networks), in which atoms are vertices and edges form various bonding and non-bonding constraints (see Sect. 14.3). The programme FIRST [15] and related methods [19] apply mathematical results that provide combinatorial characterization of rigidity and flexibility on a molecular multigraph, which can rapidly decompose a protein framework (i.e., multigraph) into flexible and rigid regions. Starting with a decomposition of a protein into rigid and flexible regions, fast Monte Carlo geometric simulation methods, such as FRODA and FRODAN [19–22], can sample the highly complex conformational space of proteins and simulate their functionally relevant motions. The main advantage of rigidity theory methods over classical molecular dynamics simulations is that their predictions of rigidity and flexibility are very fast, they are not affected by timescale issues (see Sect. 14.2), and they are suitable for high-throughput and big-data style analyses. Moreover, predictions based on rigidity theory have been widely shown to be consistent with experimental measures of protein flexibility and dynamics [11, 12, 15, 17–19, 22, 24–26].
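The combinatorial flavour of these rigidity algorithms can be illustrated with the classical two-dimensional (2,3) pebble game, which decides, edge by edge, whether each distance constraint in a graph is independent or redundant; a graph on *n* vertices is generically minimally rigid in the plane exactly when 2*n* − 3 independent edges are placed. FIRST itself uses a more elaborate body–bar/molecular variant of this idea; the following sketch and its function name are ours, given only to convey the algorithmic style:

```python
def pebble_game_2_3(n, edges):
    """(2,3) pebble game: count independent edges of a graph on vertices
    0..n-1 (generic 2D rigidity).  Each vertex starts with 2 pebbles; an
    edge is independent iff 4 pebbles can be gathered on its endpoints."""
    pebbles = {v: 2 for v in range(n)}
    out = {v: set() for v in range(n)}          # accepted edges, directed

    def collect_pebble(root, other):
        """Try to move one free pebble to `root` by a directed search,
        reversing the edges along the path used."""
        seen = {root, other}
        parent = {}
        stack = [root]
        while stack:
            u = stack.pop()
            for w in list(out[u]):
                if w in seen:
                    continue
                seen.add(w)
                parent[w] = u
                if pebbles[w] > 0:              # free pebble found:
                    pebbles[w] -= 1             # move it to root and
                    pebbles[root] += 1          # reverse the search path
                    x = w
                    while x != root:
                        p = parent[x]
                        out[p].discard(x)
                        out[x].add(p)
                        x = p
                    return True
                stack.append(w)
        return False

    independent = 0
    for u, v in edges:
        while pebbles[u] + pebbles[v] < 4:
            if not (collect_pebble(u, v) or collect_pebble(v, u)):
                break                           # redundant (dependent) edge
        if pebbles[u] + pebbles[v] >= 4:
            pebbles[u] -= 1                     # accept: pin one pebble
            out[u].add(v)
            independent += 1
    return independent
```

For the triangle, all 3 = 2·3 − 3 edges come out independent (rigid), whereas the complete graph K₄ has one redundant edge (5 independent out of 6). Each edge insertion costs at most a graph search, which is what makes this family of algorithms fast enough for whole-protein networks.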

In this chapter, we first discuss the importance of protein flexibility and dynamics for biological function (Sect. 14.2). We then provide a brief review of fundamental concepts in rigidity theory (Sect. 14.3) that enables us to perform fast predictions of flexibility and dynamics of protein structures. We next discuss how to represent biomolecules as a graph constraint network, the mathematical/algorithmic background for analysing protein networks, and the basic uses of rigidity theory software for analysing protein flexibility and its dynamics. We then review some major advances contributed by the author of this chapter, in which rigidity theory and algorithms were used to elucidate and provide new perspectives on very complex biological phenomena, such as long-range allosteric communication, enzyme catalysis, antibody dynamics, and NMR structural validation (Sect. 14.4). We conclude by reviewing some of these recent developments and some surprising breakthroughs that have led to rich protein function discoveries that were mainly driven by mathematical rigidity theory.

#### **14.2 Protein Structural Flexibility and Dynamics**

In this section, we briefly cover for non-biologists the background and the importance of predicting protein flexibility, which is arguably one of the most fundamental research topics in biochemistry, structural biology, and bioinformatics.

## *14.2.1 Protein Flexibility and Dynamics Is Central to Protein Function*

Proteins are polypeptide chains composed of a linear sequence(s) of amino acids [1]. Through a complex protein folding process, forces are exerted on atoms which steer a polypeptide chain(s) into a defined three-dimensional biologically functional native-state structural ensemble. High-resolution X-ray crystallography and other techniques have revealed aesthetic structural complexity of protein structures and have revolutionized our understanding of their function, which have spearheaded the development of novel experimental and computational methods for examining protein function in atomistic detail. It is important to stress that solved protein structures are only snapshots or pictures of proteins at some low-energy state. This can often provide a misleading representation of proteins and potentially misinform about their function, which must include kinetic and thermodynamic descriptions [5] (see Fig. 14.2).

Proteins are composed of rigid and connecting flexible regions that can be highly dynamic, which facilitates sampling a wide variety of conformations spanning a complex multidimensional energy landscape. In this conformational biomolecular dance, proteins undergo dynamical fluctuations even under conditions that are preferentially biased towards a well-defined low-energy 'native' state [5]. Such dynamically driven conformational states and fluctuations are critical to long-range allosteric regulations, ligand recognition, catalytic efficiency, antibody–antigen recognition and the majority of functional mechanisms. Understanding protein flexibility and rigidity and how it is modified by mutations and ligand binding is critical to understanding and modulating protein function [5, 7, 8, 11, 12]. Most globular proteins (excluding intrinsically disordered proteins) function through utilizing a delicate mix of rigidity and flexibility. Achieving an appropriate balance between rigidity and flexibility is one of the most important keys for biological function. Protein rigidity is necessary, as it maintains the overall structural fold, while flexibility and dynamics enable proteins to perform specific functions. Protein defects can lead to alterations in overall folding, or they can cause proteins to be overly flexible, interfering with protein function, or cause other extreme defects that can result in an indestructible, rigid protein. These scenarios are related to numerous medical conditions, including neurological disorders, Alzheimer's disease, and Mad Cow disease [22, 27]. Hence, predicting and examining protein flexibility and dynamics is the most important, and probably the

**Fig. 14.2** The structure of an enzyme (Protein Data Bank ID 2jz3) showing **a** protein snapshot representation and conformational ensemble depicting its dynamical characteristics **b**

most complex, component of protein research. This is an active area of research in both experimental protein science and computational biology.

Protein structures can have thousands of conformational degrees of freedom. It is therefore easy to imagine that their motions can be extremely complex, and determining flexible and rigid regions and how they move relative to one another can seem like a daunting task. Moreover, many proteins are oligomeric structures consisting of two or more interacting polypeptide chains, and in some cases the structures are very large, consisting of thousands of amino acids (see Fig. 14.1). Protein flexibility and rigidity are often regulated by interactions with small ligands, drugs, hormones, and cations (e.g., calcium and magnesium) and changes in temperature, pressure, and pH [11, 15, 17, 18, 24]. Internal motion and conformational change can be rapid and transient and result in a structural ensemble that can often be spectroscopically indistinguishable from the snapshot ground state determined by X-ray crystallography or other imaging techniques (see Fig. 14.2). Protein dynamics occur across a wide range of timescales, from very rapid short-amplitude motions caused by bond vibrations occurring in the femtosecond range, to side-chain motions on the picosecond to nanosecond timescale, all the way up to very slow larger-amplitude collective domain motions, which are often biologically most significant, occurring in the milliseconds to seconds range [5] (see Fig. 14.3). Dynamics on longer timescales (i.e., millisecond to second timescales) are functionally very important because many biological processes—including allostery, enzyme catalysis, receptor activations, and protein–protein interactions—occur on such timescales [5, 9, 11, 12, 24, 28]. Fluctuations between different low-energy states and the heights of their energy barriers can also be affected by mutations, ligand binding, and changes in temperature or pH. The timescale component of protein dynamics is one critical factor that complicates the computational examination of protein dynamics.
Another important characteristic of protein dynamics is the amplitude and directionality of conformational fluctuations [5]. All these factors combine to contribute to the difficulty in obtaining knowledge about the flexibility and motion of proteins.

Despite this complexity, functional motions often involve large domain–domain motions (i.e., relative motions dominated by a few rigid bodies), and many degrees of freedom can be neglected or suppressed to study the functionally most important motions. Hence, the decomposition of a protein into rigid and flexible regions is a highly important aspect of deciphering protein dynamics.

## *14.2.2 Techniques for Analysing and Predicting Protein Flexibility and Dynamics*

In terms of experimental techniques, NMR measurements such as order parameters and chemical shifts are very useful in studying protein dynamics [24, 29]. Mass spectrometry, hydrogen–deuterium exchange, crystallographic B-values, and related techniques can also provide deep insights into the dynamical nature of protein structures [5, 11, 24, 25]. Fluorescence resonance energy transfer (FRET) [30] measurements in particular have high practical value, as they can characterize changes in distance for single molecules over time as well as possible corresponding conformational changes. However, the disadvantage of FRET is that only a single distance change is measured. Experimental measurements are useful as they can be used to infer information about dynamics across a particular range of timescales (see Fig. 14.3) and are especially helpful in supporting and validating computational predictions. The disadvantages of experimental tools are their high cost, susceptibility to measurement uncertainty, and frequent inability to provide information about very dynamic regions of protein structures. Moreover, protein structures often have to be stabilized to extract structural and dynamical information. Experimental measurements can also take a long time to perform and require the maintenance of very expensive equipment; yet, such measurements can rarely provide dynamical information about individual atoms.

Computationally, it should in theory be possible to describe protein dynamics in their entirety. Molecular dynamics (MD) simulation has been the most widely used approach for simulating the motions of proteins and other biopolymers [28], and has been a common tool in biochemistry and biophysics since the 1970s [31]. It has been successfully applied to protein folding problems, the impact of protein motions on enzyme catalysis, and the effects of mutations and ligand binding on protein motions [28]. Its use has increased in recent years, pointing to the key importance of deciphering the relationships between complex motions and protein function. In molecular dynamics simulations, the trajectories of individual atoms in protein structures are predicted by repeated numerical solution of Newton's equation of motion (i.e., *F* = *ma*), with forward integration in time, where the force *F* is derived from a force field (energy function). A force field models all potential forces and energies between the molecules and is supposed to be a

**Fig. 14.3 a** A one-dimensional cross-section of a protein's high-dimensional energy landscape. Proteins populate multiple low-energy conformational states (defined as minima in the energy surface), with many conformational substates of the ensemble interconverting on very fast timescales. The time it takes a protein to transition from one low-energy state to another depends on the height of the energy barrier between the states. When the barrier is high, this can occur in a relatively long microsecond to second range. **b** Timescales of different dynamic processes in proteins and different experimental methods that can detect fluctuations on each timescale. Longer timescales are largely inaccessible to classical MD simulations. However, rigidity theory methods and simulations are not confined by this timescale issue. Figure adapted from [5]

simple parameterization of the energy surface of the protein. A number of different methods and force field models exist for parametrizing the potential energy surface. Assuming one has an accurate description of a force field (itself a difficult and heavily debated issue), molecular dynamics simulation can be extremely useful in tracking the precise positions of atoms over time. However, the major downside of molecular dynamics simulations is that they require prohibitively large computational power. Indeed, even with today's computational advances and special-purpose simulation machines [32], in the majority of cases molecular dynamics simulations remain largely impractical for investigating biologically relevant protein motions on relatively long microsecond timescales. Given the growth of protein structural data, the increasing size of solved structures, and advances in emerging cryo-EM technology and deep learning, it is clear that there is an urgent need to develop alternative efficient and accurate computational methods for molecular flexibility and dynamics simulations.

A large class of computational approaches that simplify classical force fields has been developed. Coarse-grained simulations, normal mode analysis, principal component analysis, contact network analysis, and other related methods have become popular alternatives to classical MD simulations [33]. In coarse-grained and network approaches, physical units such as individual amino acids or clusters of amino acids (including rigid clusters) can be treated as nodes (vertices), where edges indicate possible interactions or contacts. For more precise modelling, individual atoms can be treated as vertices, with edges modelling pairwise bonded and non-bonded contacts.

Arguably, one of the most powerful ways of analysing the flexibility and rigidity of protein structures, especially using an all-atom representation, is based on mathematical rigidity theory [13–16, 19, 34]. Rigorous mathematical results in rigidity theory, whose details are explained below, can be used in combination with fast algorithms to rapidly decompose a protein constraint graph into rigid and flexible regions. Moreover, how rigidity is modified through protein–protein, protein–ligand, or other interactions can be quickly predicted. Such decompositions are very informative as they can be combined with other methods such as MD simulations, normal mode analysis, or Monte Carlo simulations [19, 22] to directly infer information about protein dynamics. This is discussed in more detail below. We now turn the discussion to mathematical formulations and the uses of rigidity theory for the analysis of protein structures.

#### **14.3 Rigidity Theory**

In this section, we present a basic introduction and results of rigidity theory that are essential for applications to protein structure and function analysis, with a focus on combinatorial rigidity theory concepts. For a thorough review of rigidity theory see [13, 19, 34].

## *14.3.1 Combinatorial Rigidity Theory and the Molecular Theorem*

In general terms, flexibility is the ability of a material or framework to reversibly change the configuration of its joints, bodies, or building blocks. Rigidity, which is the opposite property of flexibility, describes a state in which no relative motions are allowed between the framework's elements. In a rigid structure, only rigid body motions are possible (i.e., motions arising from congruences of space, rotations, translations, etc.). In biochemistry and biophysics, a notion related to rigidity is the concept of stability and robustness, where internal protein dynamics are not changed in response to small atomic fluctuations and the breaking of a few noncovalent interactions. Although to a non-expert, rigidity and stability may seem like related concepts, care should be taken to understand the potential differences and their implications.

Mathematical *rigidity theory*, sometimes called *structural rigidity* because of its close connections to structural and mechanical engineering, offers the most mathematically sound concepts and algorithms for analysis of rigidity and flexibility of frameworks [13, 14, 34]. Rigidity theory analyses the rigidity and flexibility of frameworks, as specified by geometric constraints such as fixed distances, directions, and volumes defined by a collection of points, lines, planes, or rigid bodies. Frameworks can be natural structures (molecules, crystals, proteins, etc.) or engineered structures (bridges, robots, etc.), and because rigidity is an essential property of most frameworks and materials, rigidity theory naturally has many applications in engineering, robotics, material science, and biology.

Rigidity theory has both geometric and combinatorial characteristics relying on techniques in linear algebra, discrete and algebraic geometry, graph theory, and combinatorics. Rigidity theory has a very long and rich history in mathematics, with early work appearing in the form of Euler's (1766) conjectures on rigidity of polyhedra. Maxwell's (1864) [14, 34] work on counting constraints in a framework for generic rigidity led to the birth of so-called '*combinatorial rigidity*'. Combinatorial characterization of rigidity theory, 140 years later, has turned out to be absolutely crucial for rapid flexibility analysis of materials such as glass networks and protein structures [14].

The classical and simplest frameworks studied in rigidity theory are the bar and joint frameworks (see Fig. 14.4), which are composed of universal (rotating) joints that are connected by bars that fix the distances between pairs of joints. A bar and joint framework is defined as a pair *(G, p)*, where *G* = *(V, E)* is an undirected graph and *p* : *V* → R<sup>*d*</sup>; vertices correspond to joints and edges correspond to bars that connect some pairs of joints, and *p* represents a configuration of joints in R<sup>*d*</sup>. A framework *(G, p)* in R<sup>*d*</sup> is rigid if the only edge-length-preserving continuous motions of the vertices are derived from isometries of R<sup>*d*</sup>. If *d* ≥ 2, it is NP-hard to determine if a bar and joint framework is rigid [34]. As determining the rigidity of frameworks is very difficult, a common approach is to linearize the problem by differentiating the length/bar constraints of the corresponding pair of connecting

**Fig. 14.4** Bar and joint framework examples: **a** is flexible as it can deform its shape (note it is one edge too short in terms of Laman's count, |*E*| *<* 2|*V*| − 3); **b** is minimally rigid in 2D (but flexible in 3D as one can rotate two triangles around the diagonal). **c** is redundantly rigid in 2D as it has a redundant (i.e., extra) edge and is minimally rigid in 3D

points/joints, which leads to a system of linear equations (one equation per edge) and a corresponding rigidity matrix. The solution to such a homogeneous system can be captured by calculating the rank of the rigidity matrix, which indicates whether a framework is infinitesimally rigid [34, 35]. However, in many applications and for large frameworks such as proteins, this is not particularly practical owing to numerical errors and uncertainty in rank computations of the rigidity matrix.
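As a concrete illustration of this linearization, the following minimal Python sketch builds a 2D rigidity matrix and tests infinitesimal rigidity via its rank. The function names and the small elimination-based rank routine are our own illustrative choices, not part of any rigidity software, and the approach is only suitable for toy frameworks.

```python
def rigidity_matrix_2d(points, edges):
    """One row per bar: differentiating |p_i - p_j|^2 = const gives a
    linear equation with coefficients (p_i - p_j) on vertex i's columns
    and (p_j - p_i) on vertex j's columns."""
    n = len(points)
    rows = []
    for (i, j) in edges:
        row = [0.0] * (2 * n)
        dx = points[i][0] - points[j][0]
        dy = points[i][1] - points[j][1]
        row[2 * i], row[2 * i + 1] = dx, dy
        row[2 * j], row[2 * j + 1] = -dx, -dy
        rows.append(row)
    return rows

def rank(mat, eps=1e-9):
    """Numerical rank by Gaussian elimination (adequate for tiny examples)."""
    m = [row[:] for row in mat]
    r = 0
    for c in range(len(m[0]) if m else 0):
        piv = next((k for k in range(r, len(m)) if abs(m[k][c]) > eps), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for k in range(len(m)):
            if k != r and abs(m[k][c]) > eps:
                f = m[k][c] / m[r][c]
                m[k] = [a - f * b for a, b in zip(m[k], m[r])]
        r += 1
    return r

def infinitesimally_rigid_2d(points, edges):
    # Infinitesimally rigid iff rank = 2n - 3: only the trivial isometries
    # (two translations, one rotation) remain in the kernel.
    return rank(rigidity_matrix_2d(points, edges)) == 2 * len(points) - 3

triangle = [(0.0, 0.0), (1.0, 0.0), (0.4, 0.9)]
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
```

For the triangle the rank is 3 = 2 × 3 − 3 (rigid); the four-bar quadrilateral has rank 4 < 5 (flexible), and adding one diagonal restores rigidity, matching Fig. 14.4.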

A well-known fact within rigidity theory is that if the framework is generic (i.e., it does not have special singular geometry), then rigidity and infinitesimal rigidity coincide [34]. Generic frameworks are very important, as rigidity can be studied by pure graph and combinatorial techniques—a subfield of rigidity theory called combinatorial rigidity theory. A framework is generically rigid if it maintains rigidity even after minor changes to the position of its joints, and almost all frameworks are generic [13, 34, 36]. By assuming that a framework is in a generic position, one can neglect the geometric embedding of joints and actual distances of bars to focus on only the topology of the bar and joint framework and discuss the generic rigidity of *(G, p)* in terms of graph *G*.

#### **14.3.1.1 Counting for Rigidity and Flexibility**

We now motivate the characterization of rigidity of generic frameworks using combinatorial arguments. For bar and joint frameworks in dimension *d*, each joint (point, vertex) has *d* conformational degrees of freedom; hence, *N* joints have a total of *dN* degrees of freedom. The number of trivial rigid body motions (isometries) in dimension *d* is *d(d* + 1*)/*2. Therefore, in a generic rigid bar and joint framework, the number of bars is at least *dN* − *d(d* + 1*)/*2. This is known as Maxwell's counting condition. In the plane (*d* = 2), Laman's theorem [34] strengthens this result by proving that the 2*N* − 3 count is both necessary and sufficient for generic rigidity of two-dimensional bar and joint frameworks. More formally, a two-dimensional bar and joint framework is generically *minimally rigid* if and only if |*E*| = 2|*N*| − 3 and, for all subsets of edges, |*E'*| ≤ 2|*N'*| − 3. In other words, this remarkable theorem says one can count the vertices and edges in a graph and their distributions over subgraphs to predict the generic rigidity of two-dimensional bar and joint frameworks. A framework is minimally rigid if removal of any edge (bar) results in a flexible framework (see Fig. 14.4).

**Fig. 14.5** Maxwell's counts in 3D do not guarantee rigidity. A bar and joint framework in 3D (known as the double banana graph) satisfies the 3|*N*| − 6 count condition but is flexible (two yellow rigid subgraphs can rotate about an imaginary hinge shown as a red dashed line)
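Laman's counting condition can be checked directly on small graphs. The brute-force Python sketch below (the function name is illustrative, and the subset enumeration is exponential, which is precisely why the fast pebble game algorithms discussed later matter) verifies both the global count and the subgraph condition.

```python
from itertools import combinations

def laman_minimally_rigid(n, edges):
    """Brute-force Laman check for generic minimal rigidity in 2D:
    |E| = 2n - 3, and every induced subgraph on k >= 2 vertices has
    at most 2k - 3 edges. Exponential; for small examples only."""
    if len(edges) != 2 * n - 3:
        return False
    for k in range(2, n + 1):
        for sub in combinations(range(n), k):
            s = set(sub)
            if sum(1 for u, v in edges if u in s and v in s) > 2 * k - 3:
                return False
    return True

# A triangle (3 = 2*3 - 3 edges) is minimally rigid in 2D; a four-bar
# quadrilateral is one edge short of the global count and hence flexible.
triangle = [(0, 1), (1, 2), (0, 2)]
quad = [(0, 1), (1, 2), (2, 3), (3, 0)]
```

The triangle passes, while the quadrilateral fails the global count (|*E*| < 2|*V*| − 3), as in the Fig. 14.4 examples.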

Unfortunately, Maxwell's counting results are not sufficient for minimally rigid bar and joint graphs in dimension 3 and higher. For example, a well-known counterexample is the double banana graph, which satisfies Maxwell's 3|*N*| − 6 count but is flexible (see Fig. 14.5). Not only is there no Laman-type theorem for generic bar and joint frameworks in dimension 3 and higher, there are also no known polynomial-time algorithms for testing rigidity of general three-dimensional graphs [34]. Extensive research has been conducted on this problem and, to date, only partial results and approximation algorithms are known [34, 35]. Fortunately, for different classes of frameworks, called *body-bar* and *body-hinge* frameworks, which include molecular frameworks, there is a complete and rich combinatorial characterization of rigidity, which is discussed next.

#### **14.3.1.2 Rigidity Model of Molecules and the Molecular Theorem**

To build a computational method based on rigidity theory that can provide fast and accurate prediction of protein rigidity and flexibility, three requirements must be met: (i) a realistic physical model of a basic molecular framework; (ii) an accurate model of molecular interactions; and (iii) a fast algorithm for predicting rigidity/flexibility properties of the protein framework model.

Protein structures consist of atoms and various chemical interactions (forces) of different strengths. In rigidity theory, strong interactions between atoms are usually assumed to be fixed rigid constraints in terms of distances and angles. In such a rigidity model of a molecule, bonding interactions are assumed to fix distances between a pair of bonded atoms, and the angles between the bonds of an atom are fixed, allowing only dihedral angle rotations. High-frequency motions such as bond vibrations are neglected. This is a sensible modelling assumption, as single covalent bond lengths are essentially invariant. For example, the length of a covalent bond between two carbon atoms varies by less than one percent from its equilibrium value of 1.53 angstroms [14]. Double bonds and peptide bonds lock dihedral angles, and noncovalent interactions such as hydrogen bonds and hydrophobic contacts also impose additional constraints.

A *molecular framework* in rigidity theory is a collection of atoms, which can be modelled as fully rigid bodies with the six conformational degrees of freedom of a rigid body, and bonds as rotatable hinges, which allow for rotational degrees of freedom between single-bonded atoms. Such frameworks in rigidity theory are a special case of *body-hinge framework*. Hinges (i.e., bonds) remove five degrees of freedom, and for algorithmic and theoretical reasons, it is useful to model hinges as a set of five rigid bars, where each bar (i.e., edge) generically removes a single degree of freedom between bonded atoms. This finally leads to a body-bar framework representation of a molecular body-hinge framework—that is, a collection of rigid bodies connected by linear bars. Special geometric criteria should be considered, as bonds are not generic hinges (since bonds intersect at the centres of atoms) and the five bars have to pass through the hinge axis to geometrically give the same model as a hinge, but such discussion is beyond the scope of this chapter (details can be found elsewhere; see [13]). Double bonds are modelled as a set of six bars between two atoms. Moreover, non-covalent interactions such as hydrogen bonds and hydrophobic interactions, which are important for overall protein structure folding and rigidity, can also be modelled as a set of one to five bars (where one bar indicates the bond is least restricting and five bars indicate it is most restricting) [25]. This overall model, consisting of rigid bodies for atoms and both covalent bonds and non-covalent interactions, defines the *body-bar framework* model of a protein structure (see Fig. 14.6).

The topological structure of a body-bar (and body-hinge and molecular body-hinge) framework is a multigraph *G* = *(V, E)*. Vertex set *V* corresponds to a set of bodies (i.e., atoms) and edge set *E* to a set of bars (i.e., bond constraints). An analogue of Laman's theorem for body-bar frameworks was formulated by Tay [37]. Tay's theorem confirms that the rigidity of generic body-bar frameworks in 3D (and indeed in all dimensions) can be checked using the 6|*V*| − 6 count in a body-bar multigraph. Tay's theorem also extends to generic body-hinge structures [20]. It was proven by Katoh and Tanigawa [38] that the same counting condition stated in Tay's theorem also characterizes the rigidity of generic molecular body-hinge frameworks. This result is known as the molecular theorem, which is here combined with Tay's theorem into one statement.

**Theorem 1** (Tay's Theorem/Molecular Theorem) *A generic three-dimensional body-bar framework (body-hinge/molecular framework where bonds (hinges) are*

**Fig. 14.6 a** 3D body-hinge framework composed of seven rigid bodies connected by hinges (lines) can be modelled as a body-bar framework (with a corresponding body-bar multigraph shown). **b** A molecule consisting of two carbon atoms and a single bond can be viewed as a body-hinge structure where atoms are rigid bodies (one-valent hydrogen atoms are a part of a carbon atom rigid body, as their angles are fixed and can only spin around their axes) and a hinge is a rotatable bond, with corresponding body-bar multigraph. **c** A ring of seven carbon atoms (ignoring one-valent hydrogens) with a corresponding multigraph. (According to the molecular theorem a ring of seven atoms will have one internal degree of freedom. The total number of edges is 7 × 5 = 35, while 6(7) − 6 = 36 would be needed). **d** Protein structure can be modelled as a molecular body-bar multigraph with black, red, and green lines corresponding to covalent bonds, hydrogen bonds, and hydrophobic contacts, respectively

*replaced by five bars) on a multigraph G* = *(V, E) is minimally rigid if and only if* |*E*| *=* 6|*V*| − 6 *and, for all subsets of edges,* |*E'*| ≤ 6|*V'*| − 6*.*

In the stated original form, Tay's theorem leads to an exponential algorithm, as it requires counting the number of edges in every subgraph. However, because these counts on *G* (like Laman's counts) define the independent sets of a matroid [13, 35], greedy algorithms can be used to efficiently track them; it is well known that all matroidal structures admit greedy algorithms. A number of fast polynomial-time algorithms based on matroid unions, tree decompositions, and extensions of bipartite matching algorithms, such as the *pebble game algorithm*, were subsequently developed for tracking these rigidity-certifying counts (independence) in graphs and subgraphs [16, 39].
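The naive exponential check described above can be written down directly. The following Python sketch (with illustrative names) applies the 6|*V*| − 6 counts to ring examples like those of Fig. 14.6.

```python
from itertools import combinations

def tay_minimally_rigid(n, edges):
    """Naive check of the Tay/molecular counts on a body-bar multigraph:
    |E| = 6n - 6 and every induced subgraph on k vertices has at most
    6k - 6 edges. Exponential in n; the pebble game replaces this."""
    if len(edges) != 6 * n - 6:
        return False
    for k in range(2, n + 1):
        for sub in combinations(range(n), k):
            s = set(sub)
            if sum(1 for u, v in edges if u in s and v in s) > 6 * k - 6:
                return False
    return True

# A ring of six single-bonded atoms (5 bars per bond, 30 = 6*6 - 6 edges)
# is minimally rigid; a ring of seven atoms has 35 edges where 36 would
# be needed, so it is flexible with one internal degree of freedom,
# matching the count in the Fig. 14.6c caption.
ring6 = [(i, (i + 1) % 6) for i in range(6) for _ in range(5)]
ring7 = [(i, (i + 1) % 7) for i in range(7) for _ in range(5)]
```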

#### **14.3.1.3 Pebble Game Algorithm**

The pebble game algorithm can very rapidly decompose a body-bar/molecular graph (i.e., protein structure) into rigid and flexible regions and quantify the overall number of degrees of freedom. The main step of the pebble game algorithm is to determine if a constraint (edge) is 'independent' (i.e., removes degrees of freedom) or is 'redundant', as its insertion has no effect on rigidity. The algorithm iteratively builds a maximal independent set of edges. We give a basic procedure of how the main steps of the pebble game algorithm are carried out for Tay's theorem, without the full details or speedups, which can be found in previous publications [16, 39]. A similar procedure can be derived for Laman's counts or other matroidal independence counting conditions. The pebble game routine given here, which tracks the counts in the molecular theorem, underlies the protein flexibility analysis that has been implemented in several software packages, such as FIRST (see below).

#### **The Pebble Game Algorithm** (6|*V*| − 6):

Input: A multigraph *G* = *(V, E)*.

1. Initialize *I(G)* and R*(G)* to an empty set of edges, and place six pebbles on each vertex of *G* (Fig. 14.6a).
2. For each edge *e* = *(u, v)* of *E*, tested in an arbitrary order:

	- (a) If the vertices *u* and *v* together have at least seven free pebbles, then place any pebble from either *u* or *v* onto *e*, directing the edge *e* from that vertex (Fig. 14.6b). Place *e* into *I(G)* (independent edges) and continue with the next edge.
	- (b) Else, search for a free pebble from *u* or *v* by following the directed edges (covered edges) in the partially constructed directed graph *I(G)* (Fig. 14.6c).
		- (i) If a free pebble is found on some vertex *w* at the end of the directed path *P* (which starts at *u* or *v*), we perform a swap or sequence of swaps (cascade), reversing the entire path *P*, until a free pebble appears on the initial vertex (*u* or *v*) of the path *P* (i.e., *w* loses one free pebble, and *u* or *v* gains one free pebble) (Fig. 14.6c–e). Return to (a).
		- (ii) Else, we could not find the seventh free pebble, and the edge is declared redundant (it could not be covered by a pebble) (Fig. 14.7). Place *e* into R*(G)* (redundant edges). Continue with the next edge.

When the algorithm is finished, *I(G)* is the maximal independent set of edges (edges that are covered by pebbles). R*(G)* is the set of redundant edges (edges that were not covered by a pebble). Total degrees of freedom (DOF) in a graph = number of remaining free pebbles.
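The procedure above can be condensed into the following Python sketch of the 6|*V*| − 6 pebble game. It omits the speedups of [39], and the function names, the recursive pebble search, and the ring test molecules are our own illustrative choices.

```python
from collections import defaultdict

def pebble_game_6v6(num_vertices, edges):
    """Play the 6|V| - 6 pebble game on a multigraph given as a list of
    (u, v) pairs (repeats allowed). Returns (independent edges I(G),
    redundant edges R(G), number of remaining free pebbles = total DOF)."""
    pebbles = {v: 6 for v in range(num_vertices)}
    out = defaultdict(list)          # covered (directed) edges: tail -> heads
    independent, redundant = [], []

    def search(v, seen):
        """DFS along covered edges; on finding a free pebble downstream,
        pull it back to v, reversing the traversed edges (the cascade)."""
        seen.add(v)
        for w in list(out[v]):
            if w in seen:
                continue
            if pebbles[w] > 0 or search(w, seen):
                pebbles[w] -= 1
                pebbles[v] += 1
                out[v].remove(w)     # reverse the edge v -> w
                out[w].append(v)
                return True
        return False

    for (u, v) in edges:
        # step (b): gather pebbles until u and v hold seven, or none remain
        while pebbles[u] + pebbles[v] < 7:
            seen = {u, v}            # never steal pebbles from the endpoints
            if not (search(u, seen) or search(v, seen)):
                break
        if pebbles[u] + pebbles[v] >= 7:
            tail = u if pebbles[u] > 0 else v     # step (a): cover the edge
            head = v if tail == u else u
            pebbles[tail] -= 1
            out[tail].append(head)
            independent.append((u, v))
        else:
            redundant.append((u, v)) # step (b)(ii): seventh pebble not found
    return independent, redundant, sum(pebbles.values())

# Ring test molecules: single bonds modelled as five parallel bars.
ring6 = [(i, (i + 1) % 6) for i in range(6) for _ in range(5)]
ring7 = [(i, (i + 1) % 7) for i in range(7) for _ in range(5)]
```

On the six-atom ring (30 bars), every edge is independent and six free pebbles remain (rigid); on the seven-atom ring, seven free pebbles remain, that is, one internal degree of freedom beyond the six trivial rigid body motions, in agreement with Fig. 14.8. Adding an extra cross-ring edge to the already rigid six-ring leaves it redundant.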

The pebble game algorithm described here tracks the independence of edges in graphs prescribed by the molecular theorem. The initialization of placing six free pebbles on each vertex (corresponding to the six trivial rigid body motions) tracks the 6|*V*| part of the count. Pebbles are synonymous with degrees of freedom, and removal of a pebble indicates that the inserted constraint (edge) is independent. Redundant constraints do not remove degrees of freedom (pebbles), as their insertion into (or deletion from) an already rigid region causes no change in rigidity. Every time an edge is pebbled, it grows the set of independent edges. The pebble game builds a maximal independent subset of edges; at every stage, the edges covered by pebbles satisfy |*E'*| ≤ 6|*V'*| − 6 on all subsets. The requirement of at least seven free pebbles on the vertices before an edge is pebbled (i.e., declared independent) ensures the critical

**Fig. 14.7** A demonstration of a 6|*V*| − 6 pebble game algorithm on a 3D cyclohexane graph. Edges are pebbled one by one when there are at least seven free pebbles on their end vertices (**a**, **b**). If we cannot locate seven free pebbles, we search for free pebbles along the partially created directed graph, swapping pebbles back. The graph has six remaining free pebbles and all edges are pebbled, indicating it is minimally rigid

subtraction in 6|*V*| − 6 is respected on all subsets of edges. The algorithm is greedy: regardless of the order in which the edges are pebbled (i.e., tested for independence), the algorithm always gives the same answers for the total number of remaining free pebbles and for the sizes of the maximal independent set *I(G)* and redundant set R*(G)* of edges. The pebble game algorithm is a very intuitive algorithm, which in the worst case runs in *O*(*V*<sup>2</sup>) time [39]; in practice, it runs in linear time [15] (Fig. 14.8).

There are many extensions one can extract from the pebble game [16]. For example, when we cannot locate the seventh free pebble, the failed search over the directed graph indicates a rigid cluster. By using this procedure, it is possible to find all the maximal rigid clusters and redundantly rigid clusters (Fig. 14.7). Prediction of highly redundant rigid clusters provides useful information to a biochemist, as these regions have additional robustness and will not become unstable (flexible) due to the breaking of one or a few edges. For example, when a hydrogen bond breaks in a significantly redundantly rigid region, it will not alter its rigidity. We can also extract the relative degree-of-freedom count for any subgraph in *G*. This is very useful for predicting the flexibility of particular regions of interest in protein graphs, for example, in antibody protein flexibility studies and in allostery predictions, which are discussed in the next sections.

**Fig. 14.8** 6|*V*| − 6 pebble game algorithm. **a** When we cannot pebble an edge, it indicates that the edge is redundant, and the corresponding failed search locates a redundantly rigid subgraph (**b**). Overall, the graph is flexible with one internal degree of freedom, as indicated by the remaining seven free pebbles. Rigid clusters are circled. Each one of the bonds can be moved with one internal rotational degree of freedom

## **14.4 Protein Flexibility, Dynamics, and Function Analysis with Rigidity Theory**

#### *14.4.1 FIRST and Rigid Cluster Decomposition*

The pebble game algorithm is the main component of the programme FIRST [15] and other related software for analysing protein rigidity and flexibility. Starting with a protein structure (experimentally or computationally determined) in Protein Data Bank file format, the programme FIRST begins by creating a molecular body-bar multigraph. The multigraph consists of all atoms (including hydrogen atoms) represented by vertices, with covalent bonds, hydrogen bonds, hydrophobic contacts, and electrostatic interactions represented by edges. Covalent bonds are modelled as five edges, with six edges for double bonds and peptide bonds (as they do not have bond rotation), while hydrogen bonds and hydrophobic interactions are modelled with between one and five edges [25]. Hydrophobic contacts are defined as a pair of carbon–carbon, carbon–sulfur, or sulfur–sulfur atoms in close contact. Each hydrogen bond is assigned an energy strength in kcal/mol using an energy potential based on hydrogen donor and acceptor geometries. Hydrogen bonds are very important to the overall protein shape and stability. A hydrogen bond cutoff energy value (which mimics temperature) is selected such that all bonds weaker than this cutoff are ignored in the graph. Once the final constraint multigraph is obtained (Fig. 14.6d), FIRST then uses the pebble game algorithm and molecular theorem to decompose the protein into rigid and flexible regions.
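As a sketch of this graph-building step (this is not FIRST's actual code or file parser; the record formats, names, and multiplicity table are illustrative assumptions based on the description above):

```python
# Illustrative sketch of assembling a molecular body-bar multigraph with
# the edge multiplicities described in the text; the bond records and
# function names are invented for the example.
BAR_COUNT = {"single": 5, "double": 6, "peptide": 6}

def build_constraint_multigraph(covalent, hbonds, hydrophobic,
                                energy_cutoff=-1.0):
    """covalent: (u, v, kind) records; hbonds: (u, v, energy_kcal, bars);
    hydrophobic: (u, v, bars). Hydrogen bonds weaker than the cutoff
    (i.e., with energy above it) are dropped, mimicking temperature."""
    edges = []
    for u, v, kind in covalent:
        edges += [(u, v)] * BAR_COUNT[kind]
    for u, v, energy, bars in hbonds:
        if energy <= energy_cutoff:          # keep only strong enough bonds
            edges += [(u, v)] * bars
    for u, v, bars in hydrophobic:
        edges += [(u, v)] * bars
    return edges

covalent = [(0, 1, "single"), (1, 2, "peptide")]
hbonds = [(0, 2, -2.5, 3), (0, 3, -0.4, 2)]  # the second is too weak
edges = build_constraint_multigraph(covalent, hbonds, [(2, 3, 1)])
```

Feeding the resulting edge list to a 6|*V*| − 6 pebble game routine then yields the rigid cluster decomposition.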

**Fig. 14.9** Rigidity and flexibility analysis using FIRST and the pebble game algorithm on protein data from the Protein Data Bank (Protein Data Bank ID 2jz3). The hydrogen bond dilution plot indicates how the protein breaks down as the hydrogen bond cutoff is increased (i.e., energy is increased), breaking hydrogen bonds one by one. Flexible regions are indicated by thin black lines and rigid regions are indicated by blocks, with separate colours indicating distinct rigid clusters. Flexible regions are coloured black on the protein structure. Initially, with inclusion of all potential hydrogen bonds, the protein is dominated by a few large rigid clusters (indicated by separate colours), and as hydrogen bonds are gradually broken with increasing energy, most of the protein becomes flexible (black) with a few remaining rigid clusters

Figures 14.9 and 14.10 show some examples of rigid cluster decompositions obtained with FIRST and the pebble game algorithm for two proteins. The rigid cluster decomposition of a very large spike protein complex consisting of nearly 4000 residues was obtained in less than one second of running time (Fig. 14.10). We can monitor gradual changes in the rigid cluster decomposition as hydrogen bonds are removed one by one (i.e., by lowering the hydrogen bond energy threshold) in order of increasing bond strength. The change in rigidity can be visualized using a hydrogen bond 'dilution plot' (Fig. 14.9). Because the pebble game is a combinatorial integer algorithm (tracking molecular theorem counts) as opposed to a numeric algorithm, FIRST always gives a unique exact answer.
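A dilution run can be sketched as a simple loop (illustrative Python; `decompose` stands in for a pebble-game decomposition routine such as the one in FIRST, and the bond records are invented for the example):

```python
from collections import namedtuple

# Illustrative records; energies in kcal/mol (more negative = stronger).
HBond = namedtuple("HBond", "u v energy")

def hbond_dilution(covalent_edges, hbonds, decompose):
    """Remove hydrogen bonds one by one in order of increasing strength
    (least negative energy first), re-running the rigidity analysis after
    each removal. 'decompose' stands in for a pebble-game routine."""
    remaining = sorted(hbonds, key=lambda b: b.energy)  # strongest first
    snapshots = []
    while remaining:
        weakest = remaining.pop()                       # least negative last
        edges = covalent_edges + [(b.u, b.v) for b in remaining]
        snapshots.append((weakest.energy, decompose(edges)))
    return snapshots
```

Each snapshot pairs the energy of the bond just removed with the analysis of the remaining constraint graph, which is what a dilution plot visualizes.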

While tremendous computational power and resources are needed to simulate protein flexibility with MD simulations, FIRST can predict rigid clusters and flexible connections in less than one second on a typical PC/laptop. Because of its speed and efficiency, rigidity theory analysis using FIRST and other related programmes has been widely applied to various aspects of protein function and flexibility analysis, such as viral capsids [40] (with enormous structures containing hundreds of copies of protein structures), protein engineering, and the prediction and replication of experimental measures of dynamics such as hydrogen–deuterium exchange, allostery, and enzyme catalysis [11, 12, 15, 17–19, 23, 24, 26].

**Fig. 14.10** Rigid cluster decomposition obtained with FIRST on a very large SARS-CoV-2 (in COVID-19) spike protein complex (Protein Data Bank ID 6vyb). At −1 kcal/mol energy cutoff, the spike protein consists of more than 70 rigid clusters, each containing at least 20 atoms

#### *14.4.2 Large-Scale Rigidity and Flexibility Analysis*

As an illustration of the efficiency and wider applicability of rigidity theory for large-scale, high-throughput analyses of protein structures, we review work in which the author and colleagues carried out the largest study to date of flexibility predictions for antibody protein structures [41].

Antibodies are proteins produced by B cells that play a central role in the adaptive immune system. They recognize a variety of pathogens and induce further immune responses to protect the organism from external threats. Molecules that are bound by antibodies are called antigens. The focus of this study was to characterize the flexibility of the key hyper-variable binding region on antibodies, the CDR H3 loop, which is the most important region for the binding and recognition of various antigens. More specifically, we analysed whether the conformational flexibility of the CDR H3 loop changes as antibodies undergo affinity maturation. Antibodies can rapidly evolve

**Fig. 14.11** An antibody is a large Y-shaped molecule. The CDR H3 loop (shown in red) is located on the surface of each antibody arm, acting as a key region for antigen binding and recognition. In the study, the authors applied extensions of the pebble game algorithm to analyse the flexibility of the H3 loop using thousands of naïve and mature structures. There was no significant difference in flexibility between the naïve and mature H3 loops (figure on right adapted from [41])

to specific antigens, where affinity maturation drives this evolution through multiple cycles of mutation, leading to enhanced antibody specificity and affinity. In this study, we utilized various extensions of the pebble game algorithm, initially developed in [16], which enable quantification of the local flexibility of any subgraph, with a focus on CDR H3 regions. By analysing thousands of mature and naïve antibody crystal structures and homology models, we found no clear statistically significant difference in the flexibility of CDR H3 loops (Fig. 14.11), which also correlated with experimental measures of flexibility. Such a large-scale analysis of protein structure flexibility could be carried out because of the speed of the underlying FIRST method and our various pebble game extensions.
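The combinatorial engine behind these analyses is the pebble game. As a minimal illustration (not FIRST itself, which runs a 3D molecular variant on much larger constraint graphs), the classic 2D (2,3) pebble game of Jacobs and Hendrickson can be sketched in a few dozen lines. It counts independent distance constraints; by Laman's theorem, a generic 2D bar-joint framework on n ≥ 2 joints is rigid exactly when 2n − 3 independent edges are found.

```python
# Minimal sketch of the 2D (2,3) pebble game for generic bar-joint rigidity.
# Illustrative only; FIRST uses a 3D molecular variant of this mechanism.

def pebble_game_2d(n, edges):
    """Return the number of independent edges; a graph on n >= 2 joints is
    generically rigid in 2D iff this equals 2*n - 3 (Laman's theorem)."""
    pebbles = [2] * n                   # two free pebbles per vertex
    out = [set() for _ in range(n)]     # u -> v means u's pebble covers edge (u, v)

    def fetch(v, visited):
        # Bring one extra free pebble to v by reversing a path of covered edges.
        visited.add(v)
        for w in list(out[v]):
            if w in visited:
                continue
            if pebbles[w] > 0 or fetch(w, visited):
                pebbles[w] -= 1         # w's pebble now covers the edge,
                pebbles[v] += 1         # reversed to w -> v, freeing one at v
                out[v].remove(w)
                out[w].add(v)
                return True
        return False

    independent = 0
    for u, v in edges:
        # Gather 4 pebbles on {u, v}; if impossible, (u, v) is redundant.
        while pebbles[u] + pebbles[v] < 4:
            if pebbles[u] < 2 and fetch(u, {u, v}):
                continue
            if pebbles[v] < 2 and fetch(v, {u, v}):
                continue
            break
        if pebbles[u] + pebbles[v] == 4:
            pebbles[u] -= 1             # accept: cover the new edge with a pebble
            out[u].add(v)
            independent += 1
    return independent
```

For a triangle, the routine accepts all three edges (3 = 2·3 − 3, rigid); a four-cycle yields only four of the five constraints needed for rigidity (flexible), and a repeated edge is detected as redundant.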

#### *14.4.3 Protein Allostery Analysis with Rigidity Theory*

We now briefly discuss and review an important application of rigidity theory: the analysis of allosteric signalling in protein structures. Allostery is one of the most powerful and fundamental mechanisms regulating protein function [8–12, 42–44]. Allostery refers to regulation of protein function at a distance: a perturbation at one part of a protein structure (for example, due to a binding or mutational event) can affect conformations and dynamics at another, distant site, resulting in regulation of protein function. Allostery is a common event in the cell, and most dynamic proteins exhibit some form of allosteric control mechanism. Allostery has been referred to as 'the second secret of life', second only to the genetic code [8]. Monod and Jacob first introduced the allostery concept in the 1960s [43]; however, most questions pertaining to allostery remain largely unresolved. Decoding the allosteric mechanism is one of the key long-standing unsolved problems in the biological sciences.

One of the important areas in allostery research is describing the physical mechanism of distant coupled conformational changes. The utilization and extension of our earlier fundamental work on modelling allostery in frameworks and graphs [16], together with a first rigidity-based mechanistic model of allosteric signalling, has led to several important breakthroughs in understanding how allostery controls enzyme and receptor function [11, 12, 24, 44]. Our rigidity theory methods predict that a mechanical perturbation of rigidity at one site of a protein can transmit and propagate across the structure and, in turn, change the available conformational degrees of freedom, and hence the conformation and dynamics, at a second, distant site, resulting in allosteric transmission (Fig. 14.12a). Using various extensions of the pebble game algorithm, we can analyse how long-range conformational coupling occurs in protein structures, map out allosteric pathways (regions of the protein that are important for allosteric signalling), and extract various other properties and features of long-range coupling.

A popular hypothesis is that dynamical effects play a central role in enzyme catalysis. Dynamical changes are often manifested in proteins through allosteric effects, where substrate binding can cause changes in dynamics at remote parts of a protein. In a study published in *Science* [11] concerning the bacterial homodimeric enzyme fluoroacetate dehalogenase, experimental NMR chemical shift data suggested that when a substrate binds to one monomer, the second, empty monomer undergoes asymmetrically pronounced conformational changes through an increase in flexibility and dynamics, thereby entropically favouring the forward reaction. Our rigidity-based allostery theory was able to verify this and to elucidate in great detail the key residues in the allosteric pathways responsible for the changes in dynamics, and how substrate binding enhances allosteric communication between the two subunits (Fig. 14.12b). These findings also provided deep insights into the energetic nature of the allosteric processes that drive catalysis.

In a follow-up study [24], we showed that when there is a high concentration of substrate, the enzyme undergoes catalysis inhibition through the reduction in dynamics and dampening of interprotomer allosteric effects. Our computational rigidity predictions of allosteric networks and resulting changes in dynamics when additional substrates were bound to the enzyme were validated with NMR and functional experimental studies. These studies represented a major breakthrough in illustrating the role of dynamics and allostery in enzyme function.

Our rigidity-theoretical approaches have been extremely useful for studying allostery in other enzymes and proteins. Indeed, we were able to provide a major advance and a new level of insight into key allosteric processes in the activation of G-protein-coupled receptors (GPCRs). GPCRs are situated in the plasma membrane, engage the G-protein and initiate cell signalling [45]. In several studies [12], we have shown how interactions

**Fig. 14.12** Rigidity-theoretical model of allosteric communication. **a** Conformational changes in one region of a framework (or protein structure) can propagate and change conformations and rigidity at distant regions. **b** Rigidity theory allostery analysis showed that the homodimeric fluoroacetate dehalogenase enzyme with the substrate fluoroacetate (shown as orange spheres) exhibits allosteric communication between the two subunits (shown in distinct colours), which is critical for enzyme catalysis [11, 24]. **c** In a study of the human adenosine A2A receptor [12, 18], a member of the superfamily of receptors called G-protein-coupled receptors (GPCRs), a similar approach was used to discover that allosteric communication between the receptor and different domains of the G-protein is critical for full receptor activation

between a GPCR and its natural G-protein binding partner affect activation networks, which is critical for optimal GPCR activation (Fig. 14.12c), and how sodium, calcium, and magnesium can affect this activation process [18]. Our rigidity theory-based approaches offer a new perspective and opportunity to study the various facets of allosteric regulation of protein function, which will allow us to examine complicated signalling events in the cell.

#### *14.4.4 Using Rigidity Theory to Simulate Protein Dynamics*

So far, the discussion has focused on infinitesimal flexibility (which is equivalent to finite flexibility, assuming atom positions are in a generic configuration) and not on continuous motions. In other words, FIRST and the pebble game outputs neither simulate protein dynamics nor indicate the amplitude of motions. One useful extension is to combine the rigid cluster decomposition with Monte Carlo-based geometric dynamics simulations [20, 21]. Rigid cluster decomposition can remove hundreds of degrees of freedom from the overall protein framework and serve as a natural coarse-graining step to speed up protein dynamics simulations [19, 46]. For example, the all-atom geometric simulation method FRODA (Framework Rigidity Optimized Dynamic Algorithm), which runs about 100,000 times faster than MD simulations [20], uses rigid clusters as a preprocessing step to explore the conformational space of protein motions. The rigid clusters, whose size and number depend on the selected energy threshold and the type of protein structure being analysed, can be kept fixed as rigid-body geometric components during the simulated motion (see Fig. 14.13). The atoms belonging to a rigid cluster can only move using trivial rigid-body degrees of freedom. With this in mind, simulations can be focused on simulating the relevant degrees of freedom belonging to intermediate flexible regions. FRODA rapidly generates geometrically valid conformations that are consistent with bond-length and angular constraints while maintaining all rigid clusters. In these protein motion simulations, we need to add van der Waals collisions of atoms as constraints, so that only geometries accessible to protein motions (valid stereochemistry, bonding angles, Ramachandran plots, etc.) are simulated. Figure 14.13b shows the output of FRODA for an antibody protein, which exemplifies large-amplitude motions.

**Fig. 14.13** Geometric simulation as used in FRODA/FRODAN. **a** A part of a 2D slice through the 3N-dimensional conformational space, where red indicates disallowed states and blue indicates allowed states [21]. A random move (green arrows) is accepted if it falls within a blue region (green dots) and rejected if it falls within a red region (yellow dots), followed by enforcement of the constraints (yellow arrows). The black path is a valid geometric path within the allowed conformational space. Any rigid region (which can potentially be very large) identified with FIRST moves as a single rigid body within FRODA, or as very small rigid clusters or individual atoms within FRODAN. **b** FRODA applied to a large antibody protein to explore the large-scale motions of the arms (green and orange) of the Y-shaped antibody structure, where three distinct colours represent three separate large rigid bodies. **c** FRODAN dynamics simulation illustrating the internal dynamics of a Spike protein [47]
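The move-then-enforce loop of a geometric simulation step can be illustrated with a toy 2D version: random trial moves followed by iterative projection back onto the distance constraints. This is only a sketch of the principle; real FRODA additionally treats rigid clusters as single bodies and enforces steric, angular, and Ramachandran constraints.

```python
# Toy 2D sketch of a FRODA-style geometric simulation step (illustrative).
import math
import random

def project(points, constraints, iters=500, tol=1e-9):
    # Gauss-Seidel-style projection: repeatedly correct each constrained pair
    # toward its target distance until all constraints are (nearly) satisfied.
    for _ in range(iters):
        worst = 0.0
        for i, j, d in constraints:
            dx = points[j][0] - points[i][0]
            dy = points[j][1] - points[i][1]
            dist = math.hypot(dx, dy)
            if dist == 0.0:
                continue
            err = dist - d
            worst = max(worst, abs(err))
            corr = 0.5 * err / dist       # split the correction between endpoints
            points[i][0] += corr * dx
            points[i][1] += corr * dy
            points[j][0] -= corr * dx
            points[j][1] -= corr * dy
        if worst < tol:
            break
    return points

def geometric_step(points, constraints, step=0.3):
    # Random trial move of every point, then re-enforce the constraints.
    for p in points:
        p[0] += random.uniform(-step, step)
        p[1] += random.uniform(-step, step)
    return project(points, constraints)
```

Each call produces a new, geometrically valid conformation; iterating the step explores the allowed (blue) region of conformational space in Fig. 14.13a.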

We have applied and extended FRODA using the related constrained geometric simulation programme FRODAN [21], which, like FRODA, provides very fast motion simulations but is better suited to proteins that are not dominated by large rigid clusters. In a FRODAN simulation, the rigid clusters are typically small, ranging from single atoms up to small rigid cycles (e.g., proline rings and rigid loops). This makes FRODAN useful for simulating protein motions that include substantial unfolding and refolding, and for analysing the motions of intrinsically disordered proteins. Indeed, we have utilized a similar approach in combination with an experimental measure of dynamics, hydrogen–deuterium exchange, to characterize the highly complex motions and conformational ensemble of the large intrinsically disordered Tau protein [22]. Tau protein is a key protein in a number of pathologies and dementias such as

**Fig. 14.14 a** Tau protein is a large intrinsically disordered protein. Because of its high flexibility and disordered structure, it can take a wide variety of shapes, which makes it difficult to study with conventional MD simulations. **b** By performing large-scale rigidity theory geometric simulations using FRODAN and its extensions, we were able to characterize representative structures for the native and defective (i.e., hyperphosphorylated) forms of Tau, which were shown to be in agreement with HDX experimental data (the figure in b is adapted from [22])

Alzheimer's disease, and its primary physiological role is to stabilize microtubules in neuronal axons at all stages of development. One of the main challenges in understanding the Tau structure–function relationship and finding successful therapeutics for Alzheimer's disease is the poor understanding of the atomic structural ensemble and dynamics of the Tau protein. Moreover, Tau protein undergoes modifications to its shape and internal dynamics mediated by a hyperphosphorylation defect. By performing FRODAN simulations with our various extensions, we were able to provide a first detailed view of the structural and dynamic characteristics of both the normal and the defective hyperphosphorylated forms of Tau [22]. This study provided a rich understanding of the structural basis of Tau pathology (see Fig. 14.14).

FRODA, FRODAN and our various extensions can be applied to probe the dynamics of very large structures such as Spike proteins [47] or disordered proteins, which provides a significant advantage over traditional MD simulations. Probing the motions of intrinsically disordered proteins with MD simulation is extremely challenging, if not essentially impossible, owing to their highly dynamic character. The rigidity theory-inspired methodologies FRODA/N discussed here can be run in either targeted or non-targeted modes, and we have recently combined these techniques with search algorithms from reinforcement learning (under review). The targeted mode employs a biasing force during transitions, while the non-targeted mode explores unbiased random fluctuations, which enables the exploration of a broad conformational space. Additionally, the targeted mode is useful for determining conformational transition pathways between distinct conformations (i.e., opening and closing motions such as hinge-bending motions, GPCR activation, etc.).

#### **14.5 Protein Structure Validation with Rigidity Theory**

We now discuss another application of rigidity theory to structural biology. In a very recent study, we made an important breakthrough in the area of protein structure validation [48, 49].

Experimentally solved protein structures are only useful if they are known to be accurate and to realistically represent the protein in its native environment. The vast majority of protein structures in the Protein Data Bank [50] have been solved by X-ray crystallography or NMR experiments. Both X-ray crystal structures and NMR structures are only model representations of experimental data, which are prone to uncertainties and errors. It is widely accepted that experimentally solved protein structures must be validated by (i) geometric tests and (ii) how well the structures match the input experimental data (restraints) [51]. Geometric criteria are easy to check for both X-ray and NMR structures, and measures such as the R factor and Rfree can be used to check how well X-ray structures match the input X-ray diffraction data [48]. Unfortunately, no such validation criteria exist for NMR structures [51], and unlike crystal structures, validating the quality of NMR structures has been extremely difficult. In fact, since the first protein structure was determined by NMR in 1985, there has been no effective method for NMR protein structure validation, which has largely limited the application and use of NMR structures among protein researchers [51–55]. This has created a problem not only for users of structural information, but also for scientists who use NMR to solve structures and want to know how accurate their solved structures are.

While structures solved by NMR represent less than 10% of all structures in the PDB, they are extremely important: not all proteins can be crystallized, and NMR structures include a high proportion of proteins with under-represented folds (shapes). NMR structures are determined in solution (a protein's natural environment), whereas X-ray structures are determined in a crystalline environment, which arguably makes NMR structures more representative of in vivo structures. Hence, there has been a pressing need for an acceptable validation measure for NMR structures.

We have developed the method ANSURR (Accuracy of NMR Structures Using Random Coil Index and Rigidity) [48], which addresses this critical long-standing gap for NMR protein structure validation. ANSURR assesses the quality of NMR structures by comparing two measures of local protein rigidity, one derived from the original NMR input data and the other derived from rigidity theory prediction of protein flexibility using structural data. The measure of rigidity using input data is based on the Random Coil Index (RCI), which uses experimental NMR chemical shifts (a readily available data type for each NMR structure) to quantify the extent of disordered structure for each amino acid in solution. The second measure is based on FIRST and our rigidity theory extensions, which involves calculating the dilution plot (see Fig. 14.9) and extracting a flexibility score for each residue. ANSURR then compares these two measures of local rigidity and provides a residue-by-residue test of how well the rigidity of the structure (obtained from rigidity theory) compares

**Fig. 14.15 a** The ANSURR method evaluates the accuracy of nuclear magnetic resonance (NMR) protein structure (which are given as an ensemble of models) by comparing two measures of protein flexibility (orange predicted from structure, using mathematical rigidity theory using extensions of the method FIRST, and blue derived from the random coil index [RCI] using experimental NMR chemical shift data). **b** Analysis of ANSURR using four models from NMR (Protein Data Bank ID, 1e17). ANSURR provides two metrics for accuracy: a correlation score between FIRST (rigidity) and RCI and a root mean square difference (RMSD) score. The structures in the top right portion of the plot (high correlation and high RMSD scores) are high-quality NMR structures, and structures in the bottom left of the plot are considered poor structures (Figure adapted from [48]). **c** ANSURR output for an example NMR structure (Protein Data Bank ID, 2kpp) that has high accuracy for most models in the ensemble

to the experimentally determined (true, RCI chemical shift) rigidity. ANSURR provides two metrics for accuracy measurement. One is a correlation score between FIRST (rigidity) and RCI, which assesses the accuracy of protein folding (secondary structures), and the second is an RMSD score, which measures how well the overall rigidity and flexibility between FIRST and RCI match (Fig. 14.15).
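In outline, the two metrics can be sketched as a comparison of two per-residue flexibility profiles. The function name, the choice of a Pearson correlation, and the min-max rescaling below are illustrative assumptions, not the published ANSURR implementation [48].

```python
# Hypothetical sketch of two profile-comparison metrics in the spirit of
# ANSURR: a correlation between FIRST-derived and RCI-derived per-residue
# flexibility, and the RMSD between the rescaled profiles. Illustrative only.
import math

def flexibility_scores(first_flex, rci_flex):
    n = len(first_flex)
    # Pearson correlation between the two flexibility profiles
    mf = sum(first_flex) / n
    mr = sum(rci_flex) / n
    cov = sum((f - mf) * (r - mr) for f, r in zip(first_flex, rci_flex))
    sf = math.sqrt(sum((f - mf) ** 2 for f in first_flex))
    sr = math.sqrt(sum((r - mr) ** 2 for r in rci_flex))
    corr = cov / (sf * sr) if sf > 0 and sr > 0 else 0.0

    # RMSD between min-max rescaled profiles (both mapped into [0, 1])
    def rescale(xs):
        lo, hi = min(xs), max(xs)
        return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]

    a, b = rescale(first_flex), rescale(rci_flex)
    rmsd = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / n)
    return corr, rmsd
```

A perfectly predicted profile gives correlation 1 and RMSD 0; each model of an NMR ensemble can be scored independently in this way.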

Unlike crystal structures, NMR structures are always represented as an ensemble of (typically around 20) possible structural models. Because it is unclear which models are useful or accurate, this has created substantial and unnecessary confusion for users of NMR structures. A nice feature of ANSURR is its ability to estimate the accuracy of each model.

The performance of ANSURR was tested in several ways [48]: first, ANSURR was applied to structures refined in explicit solvent (which were found to score much better than unrefined structures), and then to a large set of good and bad structures (generated as decoys). ANSURR was also compared against previously proposed measures of accuracy (mostly restraint-based tests and geometric checks). Several of these indicators, such as restraint violations and restraints per residue, were shown to be poor measures of accuracy. On the other hand, Ramachandran analysis (a standard check of whether a protein backbone has correct geometry) was found to be a useful geometric check of accuracy. A typical way to compare two structures is the backbone root mean square deviation, which shows whether the structures resemble each other when superimposed. However, this measure can miss many important structural differences in amino acid side-chain orientations, which form critical hydrogen-bonding interactions that directly affect protein stability and functional aspects such as protein dynamics and enzyme catalysis. As rigidity measures are sensitive to side chains, ANSURR can also be used to assess the quality of side-chain atomic positions, which makes it a powerful tool for the assessment and refinement of protein structures.

Recent work [49] applied ANSURR to more than 7000 NMR structures in the PDB, showing that NMR structures span a wide range of accuracy. Most NMR structures have accurate secondary structures, but are too floppy, particularly in their loops. Our studies also indicate that both crystal structures and NMR structures have equally accurate secondary structural elements (helices, sheets), but crystal structures are typically too rigid in disordered regions, whereas NMR structures are too flexible overall.

Development of ANSURR is a major advancement in the long-standing problem of protein structure validation, as it provides the first workable measure of the accuracy of NMR structures and is expected to give researchers more confidence in the use and application of structural NMR. Ultimately, this should lead to a better understanding of how proteins perform their functions, with general implications for structural biology research. This work opens up enormous new research avenues in protein structure determination and the improvement of standards for protein structure refinement.

#### **14.6 Conclusion**

Studying the rigidity and flexibility of geometric frameworks has advanced considerably since Maxwell's combinatorial characterization of the rigidity of mechanical frameworks in the 1800s. Mathematical advances in rigidity theory over the last two decades have been tremendous, opening up many exciting opportunities in applied science and engineering. In this chapter, we have reviewed some of the latest advances in rigidity theory and its applications to the analysis of protein function at an atomistic scale. Moreover, we have shown how rigidity theory-based methods and our various algorithms and extensions can rapidly and accurately predict protein flexibility and dynamics, which can be used to decipher various aspects of protein function, including the elusive issues of allostery, enzyme catalysis, GPCR signalling, and the motions of intrinsically disordered proteins. Our recent application of rigidity theory to protein structure validation has led to the development of the first workable method for validating NMR protein structures. This advance will provide confidence to users of protein structures and is expected to accelerate and improve the process of protein structure determination and aid computational drug discovery.

Rigidity theory is heavily rooted in deep mathematical formulations in the areas of discrete applied geometry and combinatorics, which have unfortunately remained largely inaccessible to most researchers in applied science and engineering. While there has been some cross-fertilization between the various scientific fields studying different aspects of rigidity and flexibility, stronger interactions and interdisciplinary training between the applied and theoretical communities are needed to realize the enormous potential of rigidity theory applications.
We advocate that rigidity theory, through both algorithmic and mathematical progress, has advanced to the point that it can be widely applied in the analysis of structural biological data, complementing experimental approaches to reveal novel insights into intractable and fundamental biological enigmas of living organisms. Rigidity theory exemplifies how mathematics and algorithms can make significant contributions to structural biology, biological big-data analyses, and progress in biological applications.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Chapter 15 Optimization of Evacuation and Walking-Home Routes from Osaka City After a Nankai Megathrust Earthquake Using Road Network Big Data**

#### **Atsushi Takizawa and Yutaka Kawagishi**

**Abstract** When a disaster such as a large earthquake occurs, the resulting breakdown in public transportation leaves urban areas with many people who are struggling to return home. With people from various surrounding areas gathered in the city, unusually heavy congestion may occur on the roads when the commuters start to return home all at once on foot. In this chapter, it is assumed that a large earthquake caused by the Nankai Trough occurs at 2 p.m. on a weekday in Osaka City, where there are many commuters. We then assume a scenario in which evacuation from the resulting tsunami is carried out in the flooded area and people return home on foot in the other areas. At this time, evacuation and returning-home routes with the shortest possible travel times are obtained by solving the evacuation planning problem. However, the road network big data for Osaka City make such optimization difficult. Therefore, we propose methods for simplifying the large network while keeping those properties necessary for solving the optimization problem and then recovering the network. The obtained routes are then verified by large-scale pedestrian simulation, and the effect of the optimization is verified.

## **15.1 Introduction**

When a disaster such as a large earthquake occurs, the resulting breakdown in public transportation leaves urban areas with many people who are struggling to return home. With people from various surrounding areas gathered in the city, unusually

Y. Kawagishi Graduate School of Engineering, Osaka City University, Sugimoto 3-3-138, Sumiyoshi-ku, Osaka 558-8585, Japan

A. Takizawa (B)

Graduate School of Human Life Science, Osaka City University, Sugimoto 3-3-138, Sumiyoshi-ku, Osaka 558-8585, Japan e-mail: takizawa@osaka-cu.ac.jp

heavy congestion may occur on the roads when the commuters start to return home all at once on foot. In Japan, the Great East Japan Earthquake on March 11, 2011 left many people in central Tokyo unable to return home, and roads were flooded with pedestrians attempting to do so. After the Osaka North Earthquake on June 18, 2018, the Shin-Yodogawa Bridge and its surroundings were extremely congested by displaced people crossing the Yodo River from Umeda.

Reflecting on such confusion, many local governments have already decided on countermeasures for people who struggle to return home [16]. Common to these countermeasures is that people who need to return home from their places of work are urged not to do so immediately after the disaster but rather to remain in place. Meanwhile, although it is known empirically that great confusion arises when people have difficulty returning home, the associated countermeasures tend to be approximate because how much congestion occurs, and where, is not known until after the disaster. Pedestrian simulation of the whole city would seem useful in such cases, but until recently this had not been attempted because it requires large-scale, detailed data and a high-speed computing environment. Recently, however, Hiroi et al. carried out a simulation of mass returning-home behavior on foot after a large earthquake for an area within 40 km of Tokyo Station [3].

In the case of western Japan, including Osaka City, an earthquake originating from the Nankai Trough is the most dangerous. In Osaka City, the resulting tsunami is predicted to reach the shore in 1 hour and 50 minutes and to flood about half of the city [14]. One of the major problems with the tsunami is that it will travel up the Yodo River in the northern part of Osaka City and spread to the coastal area. As mentioned above, people returning home after the 2018 Northern Osaka Earthquake became congested around bridges crossing the Yodo River, a phenomenon that Kawagishi and Takizawa predicted by means of a large-scale simulation of returning home from Osaka City [7].

However, if the timing of the tsunami flooding overlaps with the movement of people, a large-scale secondary disaster may occur. Therefore, in Osaka City, it is necessary to consider risks such as delayed escape from the tsunami along with the countermeasures for people struggling to return home, but it is difficult to say that the current countermeasures consider such risks. The purpose of the study by Kawagishi and Takizawa [7] was to investigate how a Nankai Trough earthquake would affect the return home of commuters in Osaka City. The results confirmed that bridges over the Yodo River from the center of Osaka City would be congested for a long time with people crossing, and that there would be a danger of delayed escape from the tsunami for those remaining in the vicinity. However, that study did not consider evacuation behavior from the tsunami, and it assumed that people who walk home take the shortest route to do so. Therefore, problems remained, such as the excessive concentration of pedestrians on specific roads and bridges.

In the present study, it is assumed that a large earthquake caused by the Nankai Trough occurs at 2 p.m. on a weekday in Osaka City, where there are many commuters. We then assume a scenario in which evacuation from tsunami is carried out in the flooded area and people return home on foot in the other areas. At this time, the evacuation and returning-home routes with the shortest possible travel times are obtained by solving the evacuation planning problem [8, 19]. However, the road network big data for Osaka City make such optimization difficult. Therefore, we propose methods for simplifying the large network while keeping those properties that are necessary for solving the optimization problem and then recovering the network. The obtained routes are then verified by large-scale pedestrian simulation, and the effect of the optimization is verified.

The remainder of this chapter is organized as follows. The next section explains the evacuation planning model. Next, the pedestrian simulation model for a large-scale network is described. Then, the results are discussed, and conclusions and suggestions for future work are presented.

#### **15.2 Quickest Evacuation Planning Problem**

This section describes the quickest evacuation planning problem based on a dynamic network in which the flow rate changes over time. Meanwhile, a network in which the flow rate does not change over time is called a static network.

#### *15.2.1 Dynamic Network*

We define a directed graph $D = (V, E)$ with vertex set $V$ and edge set $E$. In $V$, the sources (i.e., the starting vertices of the flow) and sinks (i.e., the destinations) are given. A directed edge with start vertex $u \in V$ and end vertex $v \in V$ is expressed as $e = (u, v)$; the start vertex of $e$ is denoted $tail(e)$ and the end vertex $head(e)$. For a vertex $v \in V$, $\delta_D^+(v) \subset E$ is defined as the set of edges going out of $v$, and $\delta_D^-(v) \subset E$ is the set of edges going toward $v$. For each edge $e \in E$, we define a travel time function $\tau : E \to \mathbb{Z}_+$ that denotes the time required to flow on $e$ from $tail(e)$ to $head(e)$. The maximum value of the flow on $e$ is denoted by the capacity function $c : E \to \mathbb{R}_+$. For each vertex $v \in V$, we define a supply function $b : V \to \mathbb{R}_+$ that denotes the amount of supply at that vertex, and the set of vertices with one or more supplies is $S^+ \subseteq V$. Furthermore, the sink set $S^- \subseteq V$ is also defined.

Using the above definitions, a dynamic network $N = (D, c, \tau, b, S^+, S^-)$ is defined; Fig. 15.1 shows an example of $N$. Assuming application to evacuation planning, the flow denotes the movement of evacuees, a sink denotes an evacuation site, and flow reaching a sink denotes the accommodation of evacuees at that evacuation site. An evacuee who arrives at a vertex moves along an edge and is deemed evacuated upon reaching a sink. The total number of evacuees at a vertex $v \in V$ is regarded as the supply $b(v)$ at that vertex.

Next, we define a dynamic flow $f : E \times \mathbb{Z}_+ \to \mathbb{R}_+$ on the dynamic network $N$ as the flow rate entering edge $e \in E$ at discrete time $\theta \in \mathbb{Z}_+$, expressed as $f(e, \theta)$. Note that the flow that enters $tail(e)$ of edge $e$ at time $\theta$ arrives at $head(e)$ at time $\theta + \tau(e)$. On the dynamic flow, the following three constraints are defined. First, the capacity constraint is given by

$$0 \le f(e, \theta) \le c(e) \quad (\forall e \in E,\ \forall \theta \in \mathbb{Z}_+), \tag{15.1}$$

then the flow conservation law is given by

$$\sum_{e \in \delta_D^+(v)} \sum_{\theta=0}^{\Theta} f(e, \theta) - \sum_{e \in \delta_D^-(v)} \sum_{\theta=0}^{\Theta - \tau(e)} f(e, \theta) \le b(v) \quad (\forall v \in V,\ \forall \Theta \in \mathbb{Z}_+), \tag{15.2}$$

and the demand constraint is given by

$$\sum_{s \in S^-} \sum_{e \in \delta_D^-(s)} \sum_{\theta=0}^{\Theta - \tau(e)} f(e, \theta) = \sum_{v \in V} b(v) \quad (\exists \Theta \in \mathbb{Z}_+). \tag{15.3}$$

A dynamic flow that satisfies these three constraints is said to be feasible, and the feasible dynamic flow that achieves the minimum evacuation completion time Θ<sup>∗</sup> is called the quickest flow. The quickest evacuation planning problem is to find the minimum evacuation completion time Θ<sup>∗</sup>.

Considering application to an actual evacuation planning problem, there is an upper limit on the number of evacuees that can be accepted at each sink, which is an evacuation center. As defined by Kamiyama et al. [6], it is assumed that a capacity function *l* : *S*<sup>−</sup> → ℤ<sub>+</sub> pertains to each sink, and the feasible flow *f* satisfies

$$\sum_{e \in \delta_D^-(s)} \sum_{\theta=0}^{\Theta} f(e, \theta) \le l(s) \quad (\forall s \in S^-,\ \forall \Theta \in \mathbb{Z}_+). \tag{15.4}$$

#### *15.2.2 Time-Expanded Network*

Ford and Fulkerson [1, 2] proposed the time-expanded network to obtain the quickest flow. This is a static network corresponding to dynamic network *N* with time constraint Θ, and it is designated as *N*(Θ). The set of vertices for *N*(Θ) is defined by

$$\{v(\theta) \mid v \in V,\ \theta \in \{0, \dots, \Theta\}\}. \tag{15.5}$$

That is, for each vertex *v* of the original network, a vertex *v*(θ) is provided corresponding to each time θ ∈ {0, ..., Θ} (see Fig. 15.2).

The edge set of *N*(Θ) consists of two parts. First, for each edge *e* = (*u*, *v*) ∈ *E* and each time θ ∈ {0, ..., Θ − τ(*e*)}, we have an edge *e*(θ) = (*u*(θ), *v*(θ + τ(*e*))) of capacity *c*(*e*). Second, for each vertex *v* ∈ *V* and each time θ ∈ {0, ..., Θ − 1}, we add a stagnant edge (*v*(θ), *v*(θ + 1)) of capacity +∞ (the horizontal edges in Fig. 15.2). For each vertex *v* ∈ *V*, the supply of *v*(0) is defined as *b*(*v*), and the supply of *v*(θ) for θ ∈ {1, ..., Θ} is set to zero. The sink set of *N*(Θ) is {*s*(θ) | *s* ∈ *S*<sup>−</sup>, θ ∈ {0, ..., Θ}}.
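The construction above can be sketched in a few lines of code. The following Python fragment (all function and variable names are ours, not from the chapter) builds the node and arc lists of *N*(Θ) from dict-based inputs:

```python
def time_expanded_network(V, E, c, tau, Theta):
    """Build the static network N(Theta) from dynamic network data.
    V: iterable of vertices; E: list of (u, v) edges;
    c, tau: dicts mapping edges to capacity and integer travel time."""
    INF = float("inf")
    nodes = [(v, t) for v in V for t in range(Theta + 1)]
    arcs = []
    # movement copies: e(t) = (u(t), v(t + tau(e))) with capacity c(e)
    for (u, v) in E:
        for t in range(Theta - tau[(u, v)] + 1):
            arcs.append(((u, t), (v, t + tau[(u, v)]), c[(u, v)]))
    # stagnant (holdover) edges of infinite capacity
    for v in V:
        for t in range(Theta):
            arcs.append(((v, t), (v, t + 1), INF))
    return nodes, arcs
```

For a single edge with travel time 1 and Θ = 2, this yields two movement copies and four stagnant edges, matching the structure in Fig. 15.2.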

## *15.2.3 Algorithm for Solving Quickest Evacuation Planning Problem*

Ford and Fulkerson [1, 2] showed that the necessary and sufficient condition for the evacuation completion time in dynamic network *N* to be at most Θ is that there exists a flow of size Σ<sub>*s*∈*S*<sup>+</sup></sub> *b*(*s*) from the source set {*s*(0) | *s* ∈ *S*<sup>+</sup>} to the sink set in *N*(Θ). The existence of such a feasible flow can be examined by computing the maximum flow of *N*(Θ). To account for the sink capacities of evacuation sites, we add a super sink *st*

and edges of capacity *l*(*s*) from *s*(θ) to *st* for *s* ∈ *S*<sup>−</sup> and θ ∈ {0, ..., Θ}. Then, the necessary and sufficient condition for the evacuation completion time to be less than or equal to Θ in dynamic network *N* is that the aforementioned feasible flow exists in the time-expanded network *N*(Θ).
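Combining this feasibility test with the time-expanded construction gives a direct, if naive, way to compute the minimum completion time: increase Θ and check, via a maximum-flow computation with a super source and super sink, whether all supply can reach the sinks. A minimal sketch (plain Edmonds–Karp; all names are ours, and the sink-capacity extension is omitted):

```python
from collections import defaultdict, deque

def max_flow(cap, s, t):
    """Edmonds-Karp on a nested capacity dict cap[u][v]."""
    flow = 0
    while True:
        parent = {s: None}
        q = deque([s])
        while q and t not in parent:           # BFS for an augmenting path
            u = q.popleft()
            for v, r in cap[u].items():
                if r > 0 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            return flow
        path, v = [], t                        # recover the path and bottleneck
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        aug = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= aug
            cap[v][u] += aug
        flow += aug

def quickest_completion_time(V, E, c, tau, b, sinks, Theta_max=100):
    """Smallest Theta for which all supply can reach the sinks in N(Theta)."""
    total = sum(b.values())
    for Theta in range(Theta_max + 1):
        cap = defaultdict(lambda: defaultdict(int))
        for (u, v) in E:                       # movement copies
            for t in range(Theta - tau[(u, v)] + 1):
                cap[(u, t)][(v, t + tau[(u, v)])] += c[(u, v)]
        for v in V:                            # stagnant edges (total acts as infinity)
            for t in range(Theta):
                cap[(v, t)][(v, t + 1)] = total
        for v, supply in b.items():            # super source feeds v(0)
            cap["src"][(v, 0)] = supply
        for s in sinks:                        # every sink copy drains to a super sink
            for t in range(Theta + 1):
                cap[(s, t)]["snk"] = total
        if max_flow(cap, "src", "snk") == total:
            return Theta
    return None
```

This linear scan over Θ is only illustrative; in practice a binary search over Θ, or the structure of the residual network, would be used to avoid rebuilding *N*(Θ) from scratch.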

In this way, it is possible to obtain the quickest flow for evacuation planning in pseudo-polynomial time using the time-expanded network; however, as the size of the actual network increases, so does that of the time-expanded one. Meanwhile, Hoppe and Tardos [4, 5] proposed a quickest transshipment algorithm that does not use a time-expanded network. Although it runs in polynomial time, it requires iterative minimization of a submodular function, and this algorithm is currently inefficient for a large-scale network such as the one in the present study.

Generally, there is more than one quickest flow. Among them, the one that maximizes the cumulative number of evacuees who have completed evacuation at every time before the evacuation completion time is called the universal quickest flow. It is obtained by first finding the evacuation completion time and then finding the lexicographic maximum flow [9] on the corresponding time-expanded network. When the sinks are subject to the capacity constraint, the universal quickest flow does not always exist; however, even when this constraint is imposed, the obtained flow is experimentally close to the universal quickest flow [19].

#### **15.3 Pedestrian Simulation Model**

Because both the travel time and time interval of a dynamic network model are approximate, pedestrian simulations are carried out for the obtained route to improve the accuracy, and the travel time and congestion are confirmed. Because the present study deals with a large-scale road network, we use the one-dimensional pedestrian model with high computational efficiency developed by Yamashita et al. [20]. In this model, pedestrians walking in the same direction move in a row on an edge. This row is called a lane, and the number of lanes is determined according to the width of the sidewalk as determined in Sect. 15.4.1. As illustrated in Fig. 15.3, it is assumed that pedestrians move in their specified lane and do not overtake. A discrete-time simulation is performed to determine the speed of each pedestrian in a lane at the next time step from their current speed and the distance between each pedestrian and the one walking immediately in front.

In a lane as illustrated in Fig. 15.3, the leading pedestrian is defined as the one closest to the target node. Let *x*<sub>*i*</sub>(*t*) be the distance of the *i*-th pedestrian from the starting vertex of the edge at time *t*. The velocity *ẋ*<sub>*i*</sub>(*t* + δ*t*) of pedestrian *i* in the lane at time *t* + δ*t* is considered to depend on the pedestrian's current velocity and the distance to the pedestrian walking immediately in front, and it is determined by

$$\dot{x}_i(t+\delta t) = \dot{x}_i(t) + \left( a_1 \left( v_0 - \dot{x}_i(t) \right) - a_2 \exp\left( \frac{r - (x_{i-1}(t) - x_i(t))}{a_3} \right) \right) \delta t, \tag{15.6}$$

**Fig. 15.3** Movement of pedestrians in a lane according to one-dimensional pedestrian model

where *v*<sub>0</sub> is the free walking speed, *r* is the radius of the pedestrian, and *a*<sub>1</sub>, *a*<sub>2</sub>, and *a*<sub>3</sub> are parameters. According to a previous study [20], we set *v*<sub>0</sub> = 1.023 [m/s], *r* = 0.522 [m], *a*<sub>1</sub> = 0.962, *a*<sub>2</sub> = 0.869, and *a*<sub>3</sub> = 0.214.
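Equation (15.6) with the quoted parameter values can be exercised as a single Euler update step. In the following sketch, the function name and the step size δ*t* = 0.1 s are illustrative choices of ours:

```python
import math

# Parameter values quoted from Yamashita et al. in the text
V0, R = 1.023, 0.522           # free walking speed [m/s], pedestrian radius [m]
A1, A2, A3 = 0.962, 0.869, 0.214

def next_speed(v_i, x_i, x_front, dt=0.1):
    """One Euler step of Eq. (15.6) for a pedestrian at position x_i
    following a pedestrian at x_front (positions along the lane)."""
    accel = A1 * (V0 - v_i) - A2 * math.exp((R - (x_front - x_i)) / A3)
    return v_i + accel * dt
```

With a large headway the exponential term vanishes and the speed relaxes toward *v*<sub>0</sub>; with a small headway the repulsive term dominates and the follower decelerates.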

#### **15.4 Data Preparation**

The geographic information system (GIS) datasets used in this study are listed in Table 15.1, and Fig. 15.4 shows the city of Osaka covered by this study, the 20-km zone within which people walk home, and the flooded area. In the following, we explain the data preparation.

#### *15.4.1 Road Network*

Based on the approach of the Cabinet Office of Japan for people struggling to return home [10], the road network was constructed from the roads, excluding expressways, in Osaka City and a 20-km buffer zone around it. Consequently, a large-scale road network comprising 815 739 edges and 621 670 nodes was obtained. Simplification of this large-scale road network is described in Sect. 15.5. In the case of a Nankai Trough earthquake, a seismic intensity of 6-lower (on the Japan Meteorological Agency scale) is assumed in Osaka City. Little major damage to roads and buildings is expected at this intensity; therefore, in this study, buildings and roads are assumed to be undamaged.


**Table 15.1** GIS datasets used for optimization and simulation

**Fig. 15.4** Osaka City and its 20-km surrounding area

We assume that pedestrians move on sidewalks, but this road network contains no sidewalk data. Therefore, referring to the regulations of the Ministry of Land, Infrastructure and Transport [12], we sampled the sidewalk width every 10 blocks using the distance-measuring function of Google Maps for each of the six road types obtained from the road network data, and we assigned a uniform sidewalk width to each road type.

Next, a strip of sidewalk wide enough for one person to pass at a time was treated as a lane, with a uniform lane width of 0.75 m. The number of lanes was set to an even value not exceeding the sidewalk width divided by the lane width; an even value was used so that edges in opposite directions had the same number of lanes. For each road type, the maximum and minimum numbers of lanes obtained under these conditions were eight and two, respectively.
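The lane-count rule reduces to a single arithmetic step; a small sketch (the function name is ours):

```python
def lane_count(sidewalk_width_m, lane_width_m=0.75):
    """Largest even number of lanes not exceeding sidewalk width / lane width,
    per the rule in the text (the reported range is 2 to 8 lanes)."""
    n = int(sidewalk_width_m // lane_width_m)
    return n - (n % 2)
```

For example, a 6.1-m sidewalk yields eight lanes and a 2.0-m sidewalk yields two, consistent with the maximum and minimum reported in the text.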

#### *15.4.2 Tsunami Evacuation Buildings*

As tsunami evacuation buildings, we used 649 buildings designated by Osaka City in 2016. These were inputted as GIS point data, and each point was connected to the nearest road edge by a straight line. The capacity for tsunami evacuees was set for each tsunami evacuation building; in total, 560 816 people could be accommodated in all the tsunami evacuation buildings. Figure 15.5 shows the tsunami evacuation buildings that were used. As described in Sect. 15.6, when we optimize and simulate the routes including the bridges over the Yodo River, the bridges in the flooded area are set to be impassable.

**Fig. 15.5** Tsunami evacuation buildings and passable bridges

#### *15.4.3 Daytime Population*

The daytime population was calculated from mobile spatial statistics generated from the travel histories of users of mobile phones. As shown in Fig. 15.6, we used 500-m mesh data of the population at 2 p.m. on a weekday in Osaka City in April 2015. There were 2 696 546 residents and commuters in Osaka City during this period, but note that mobile spatial statistics cover only the population between 15 and 79 years of age. We allocated the daytime population equally to nodes of the road network in each mesh, and this became the initial arrangement of evacuees and stranded people. Because the mobile spatial statistics also contain the population of each residential area, we chose the node of the home place for each pedestrian randomly according to this information.

## *15.4.4 Decisions on Number of People Struggling to Return Home and Number of Evacuees*

The polygons of the tsunami-flooded area were superimposed on the road network, and the flooded nodes and edges were determined. For each visitor, one of three actions (evacuate, walk home, or remain in place) was chosen according to the flooded condition of the present node, the flooded condition of the home node, and the distance to the home node. Whether or not to return home on foot was determined by the method used by the Cabinet Office to estimate the number of people struggling to return home [10].

Let *R* denote the set of commuters struggling to return home. In this approach, the probability *P*<sub>*r*</sub> of commuter *r* ∈ *R* deciding to return home on foot is determined by the following equation based on the distance *d*<sub>*r*</sub> [km] from the current place to the home place:

$$P_r = \begin{cases} 1 & (d_r < 10), \\ \frac{20 - d_r}{10} & (10 \le d_r < 20), \\ 0 & (20 \le d_r). \end{cases} \tag{15.7}$$

In this study, the return distance of each visitor is the length of the shortest path from their present node to their home node, obtained on the road network before simplification. For a commuter *r* whose return distance satisfies 10 ≤ *d*<sub>*r*</sub> < 20, the action is decided probabilistically according to *P*<sub>*r*</sub> using a uniform random number. The conditions for each action are summarized in Table 15.2.
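Equation (15.7) and the probabilistic decision translate directly into code; the function names and the use of Python's default random generator are our own choices:

```python
import random

def prob_walk_home(d_r):
    """Eq. (15.7): probability of returning home on foot, given distance d_r [km]."""
    if d_r < 10:
        return 1.0
    if d_r < 20:
        return (20 - d_r) / 10
    return 0.0

def decides_to_walk(d_r, rng=random):
    """Probabilistic decision with a uniform random number, as in the text."""
    return rng.random() < prob_walk_home(d_r)
```

So a commuter 5 km from home always walks, one 15 km away walks with probability 0.5, and one 25 km away never walks.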

**Fig. 15.6** Distribution of commuters in Osaka City at 2 p.m. on a weekday in April 2015


**Table 15.2** Decision rules for each action


**Table 15.3** Breakdown of numbers of people involved in each activity

Table 15.3 lists the breakdown of the number of people for each activity classified according to these rules. Of the people who return home on foot, approximately 150 000 cross bridges over the Yodo River, and they become the objects for route optimization. Everyone else returning on foot was deemed to take the shortest route.

## **15.5 Simplifying and Restoring Large Road Network for Route Optimization**

The time needed to compute the quickest flow depends greatly on the scale of the network. Although the main part of the quickest-flow algorithm is a maximum-flow computation, the effect of parallelization on this algorithm is limited. Therefore, it is difficult to apply this algorithm to a large network, even using a recent central processing unit with many cores. In this study, we simplify the large-scale road network and optimize the routes for evacuation and returning home using the quickest flow. We also develop a method for restoring the optimized routes to the original road network. The proposed method is outlined below and illustrated in Fig. 15.7.

#### *15.5.1 Simplification of Road Network*

The basic idea is to construct a simplified road network by dividing the space by polygons of the sub-regions of the area and connecting their centers of gravity with straight lines between adjacent sub-regions. At this time, the sum of the numbers of lanes of the original edges crossing the line segments shared by two polygons is made to be the number of lanes of a simplified edge. If no original edges crossed between two sub-regions, then the two polygons are not connected by a simplified edge. The length of a simplified edge is the Euclidean distance between centers of gravity, and commuters assigned to nodes in a polygon are aggregated on its center of gravity.

**Fig. 15.7** Simplification and restoration of road network

The following procedures were carried out using GIS software to simplify the original road network: recognizing adjacent polygons, decomposing polygons into line segments, generating the centers of gravity of the sub-region polygons, extracting the road edge that crosses the line segment of each pair of adjacent polygons, and generating the simplified road network. Consequently, there were 36 276 edges and 15 853 vertices, these being approximately 4% and 3%, respectively, of those of the original road network.

## *15.5.2 Restoring Optimized Routes on Original Road Network*

Let *A* be the set of sub-regions traversed by an origin–destination (OD) path in the set of optimized OD paths on the simplified road network. We refer to obtaining the corresponding OD path in the original road network as the shortest path within the part of the road network contained in *A* as route restoration. With this method, however, the destination may not be reachable using only the road network in *A* (see Fig. 15.7d2). In that case, the route obtained by the optimization is not used and is replaced by the shortest path in the whole road network. The extent to which the original OD paths could be restored from the routes in *A* is evaluated as the reproduction rate.
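The restoration rule can be sketched with a standard Dijkstra search restricted to the node set covered by *A*, falling back to the full network when the destination is unreachable. All names below are ours, and `allowed` stands for the nodes lying in the sub-regions of *A*:

```python
import heapq

def shortest_path_length(adj, src, dst, allowed=None):
    """Dijkstra; adj maps node -> list of (neighbor, length). If `allowed`
    is given, the search is restricted to that node set."""
    dist = {src: 0.0}
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == dst:
            return d
        if d > dist.get(u, float("inf")):
            continue                      # stale queue entry
        for v, w in adj.get(u, []):
            if allowed is not None and v not in allowed and v != dst:
                continue
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return None                           # dst unreachable under the restriction

def restore_route(adj, src, dst, A):
    """Restoration rule from the text: try the road network inside A;
    if the destination is unreachable there, fall back to the full network."""
    d = shortest_path_length(adj, src, dst, allowed=A)
    return d if d is not None else shortest_path_length(adj, src, dst)
```

Only the path length is returned here; recording predecessors in the same loop would recover the restored route itself.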

#### **15.6 Route Optimization Settings**

We now optimize the routes for evacuation and returning home. To prioritize human life, we first secure evacuation routes for tsunami evacuees and then optimize the routes for people walking home. The procedure and settings are described below.

#### *15.6.1 Optimization Steps*

First, we explain the concept of optimization for tsunami evacuees. As mentioned in Sect. 15.4, in Osaka City, there are many tsunami evacuation buildings in the expected flooded area, and the plan is to evacuate to those buildings. However, many areas may continue to be flooded for several days even after drainage is carried out, and it is feared that many tsunami evacuation buildings will be isolated by flooding.

In the event of a tsunami disaster in a large city, it is preferable that as few evacuees as possible rely on the tsunami evacuation buildings, given the limited resources for rescuing evacuees from such buildings. Therefore, it is necessary to clarify which areas contain evacuees who can evacuate only to a tsunami evacuation building. In this study, we optimize the destinations and routes of evacuees in the following three steps.

#### Step 1

In the simplified road network, the destinations of evacuees are set not to the tsunami evacuation buildings but to the points where edges cross the boundaries of the flooded-area polygons. These points are then connected to one super sink, the routes are optimized by the universal quickest flow, and the evacuation completion time of each evacuee is calculated.

#### Step 2

For evacuees whose evacuation completion time determined in step 1 exceeds 1 h 50 min, the evacuation routes are optimized again using the universal quickest flow, this time with the tsunami evacuation buildings as destinations. This optimization is executed on the residual network of the time-expanded network used in step 1, with the step-1 flow of the evacuees being re-optimized here removed.

#### Step 3

The routes of approximately 150 000 commuters walking home across the four passable bridges over the Yodo River shown in Fig. 15.5 are optimized using the universal quickest flow. We refer to such pedestrians as "bridge passers." In this case, sinks are set to nodes on the north side of each bridge and are connected to one super sink. In addition, the residual network remaining after the optimization up to step 2 is used. People in areas other than the flooded area return home via the shortest route because there is less congestion there than on the bridges; moreover, the optimization problem in that case becomes a general multi-commodity flow problem, which is more difficult than the quickest-flow problem.

#### *15.6.2 Computational Conditions*

The time unit of the quickest flow was set to 10 s considering the computational time and available memory. The walking speed of a pedestrian was set to 1 m/s. Meanwhile, the free walking speed in the one-dimensional pedestrian model was set to 1.023 m/s, which is similar to that in the original model [20]. The optimization and simulation code was implemented using Visual C++ 2015, and LEDA 6.4 was also used as a network library to solve the maximum-flow problem. The optimization and simulation were carried out on a personal computer (PC) with Windows 10 Professional 64 bit, an Intel Core i7-6700k, and 32 GB of memory.

#### **15.7 Results of Route Optimization**

The optimization results are shown below, where the evacuation completion time is the result of each pedestrian walking along the designated route using the one-dimensional pedestrian model on the original road network.

#### *15.7.1 Computational Times*

The computational times for the route optimization for the two types of pedestrian are listed in Table 15.4. Even though the PC that was used was of an older specification dating back several generations, the computation took only a matter of days. In other words, the problem could be computed even with such a low-specification PC. Although there were fewer bridge passers, their optimization took longer, probably because their routes were longer.


**Table 15.4** Computational time for each optimization

#### *15.7.2 Reproducibility of Restored Routes*

We analyze the reproducibility of the routes optimized by the simplified network after they are restored to the original network. The reproducibility is evaluated by the difference in the length of a route before and after the restoration and the selectivity of the route described above. Table 15.5 lists the mean route lengths before and after network restoration for each type of pedestrian. In the case of evacuees, the mean route length increases after restoration, whereas it decreases for bridge passers. However, the restoration does not cause an extremely large difference in either case.

Table 15.6 lists the selection ratio, which is the percentage of each type of pedestrian using the routes obtained by the universal quickest flow after network restoration. Although the selection ratio for evacuees exceeded 80%, that for bridge passers was only 64%. Because the route became longer for the latter, this is thought to have increased the number of cases in which a route cannot be constructed within the limited range.

#### *15.7.3 Optimization Results*

Here, we assess by how much the optimization shortened the travel time compared with that of the shortest route.

First, regarding evacuation movement, Fig. 15.8 shows how the cumulative number of evacuees for each type of route varies with time, and Table 15.7 lists the mean evacuation time and the evacuation completion time. Although the cumulative numbers of evacuees for the two types of route vary similarly, the effect of the optimization is evident: it shortens the evacuation completion time by approximately 1 h compared with that of the shortest route. However, the evacuation completion time is over 7 h, which is too long to avoid the impact of the tsunami. This is considered to be a result of interference between the routes of evacuees and people returning home: at the time of optimization, priority was given to evacuees, but this assumption may have collapsed upon restoring the routes. Regardless, it is suggested that evacuees should avoid evacuating outside the flooded area by using the tsunami evacuation buildings as much as possible.


**Table 15.5** Mean route lengths for each type of pedestrian before and after network restoration


**Table 15.6** Selection ratios for optimized routes

**Fig. 15.8** Cumulative number of arriving evacuees for each route type

**Table 15.7** Comparison of travel times of evacuees for both route types

Next, we perform a similar verification for bridge passers. Figure 15.9 shows how the cumulative number of arriving people varies with time, and Table 15.8 compares the mean travel times and travel completion times for both route types. In the case of the shortest route, congestion on the bridges begins early, after which the slope of the cumulative arrival curve remains relatively low. By contrast, the optimization, which also takes the securing of routes for tsunami evacuees into consideration, drastically shortened the completion time of returning home by about 3 h 20 min.

**Fig. 15.9** Cumulative number of arriving bridge passers for each route type


**Table 15.8** Comparison of travel times of bridge passers for both route types

The effect of the optimization was demonstrated, especially for bridge passers. To understand the changes concretely, the total numbers of pedestrians passing along each road for both route types are visualized in Fig. 15.10. In the case of the shortest route, people returning home are concentrated on the Nagara Bridge, but when the route is optimized, two bridges upstream from the Nagara Bridge are used.

People generally use the Shin-Yodogawa Bridge to travel to the north of the Yodo River from Osaka City, but in this case that bridge cannot be used because it is in the tsunami-flooded area. Therefore, with no restrictions, most people returning home would cross the Nagara Bridge, which is the next one upstream of the Shin-Yodogawa Bridge. Bridges further upstream than the Nagara Bridge are not usually used for transportation from Osaka City because they are located more than 3 km away. However, the optimization means that these bridges are also used for returning home, and congestion is reduced.

#### **15.8 Conclusion**

In this study, we proposed a method for network simplification and restoration to optimize the traveling routes of more than 2 million pedestrians with a large-scale and detailed road network in Osaka City and its surrounding area. We then showed that such route optimization worked well.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Chapter 16 Stream-Based Lossless Data Compression**

**Shinichi Yamagiwa**

**Abstract** In this chapter, we introduce aspects of applying data-compression techniques. First, we study the background of recent communication data paths. The focus of this chapter is a fast lossless data-compression mechanism that handles data streams completely. A data stream comprises continuous data with no termination of the massive data generated by sources such as movies and sensors. In this chapter, we introduce LCA-SLT and LCA-DLT, which accept the data streams, as well as several implementations of these stream-based compression techniques. We also show optimization techniques for optimal implementation in hardware.

#### **16.1 Introduction to Stream-Based Data Compression**

Rapid communication data paths are demanded in computer systems to improve performance, and the fastest data paths, implemented over optical fiber, have recently reached the order of tens of gigahertz. One solution for achieving rapid communication data paths is to parallelize paths over multiple connections, but technological trials have offered no clear solutions because of electrical and physical limitations such as crosstalk and reflections. To overcome the problems associated with high-speed communication, this chapter focuses on data compression on the data path. There are two ways in which this can be implemented. One is software-based compression, which is typically implemented in a lower layer of the communication data path, such as the device-driver level of Ethernet [18]. The other is hardware-based implementation, which must provide low latency and stream-based compression and decompression.

S. Yamagiwa (B)

Faculty of Engineering, Information and Systems, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki 305-8573, Japan e-mail: yamagiwa@cs.tsukuba.ac.jp

JST, PRESTO, 4-1-8 Honcho, Kawaguchi, Saitama 332-0012, Japan

Well-known algorithms such as Huffman encoding [17] and Lempel–Ziv–Welch (LZW) compression [21, 22] perform data encoding by creating a symbol lookup table (LUT) in which frequent data patterns are replaced by compressed symbols. However, hardware implementation presents the following difficulties: (1) the processing time is unpredictable because the data length is not deterministic, (2) maximal memory must be prepared because the lengths of the data patterns are not deterministic, and (3) decompression is a blocking operation. Here, we focus on a stream-based lossless data-compression mechanism that overcomes these problems. The key technology is a histogram mechanism that caches the compressed data. The decompressor must manage the same table contents as the compressor side and reproduce the original data from the table. In this chapter, we introduce challenges in implementing stream-based lossless compression in hardware. The ultimate goal is to implement compact and fast data-compression hardware that accepts continuous data streams without blocking the compression operations. We begin by focusing on a technique with a static LUT, called LCA-SLT, and then we show one with a dynamic table, called LCA-DLT. We also describe performance optimizations for LCA-DLT.

## **16.2 Stream-Based Lossless Data Compression with Static Look-Up Table**

#### *16.2.1 Design of LCA-SLT*

We begin by focusing on a compression algorithm called online LCA (lowest common ancestor) [12], which converts a symbol pair into an unused symbol using a LUT of symbol pairs, managed as shown in Fig. 16.1; the figure shows an example of compressing the sequence ABCDFFBC to the symbol Z. Online LCA avoids the problems caused by conventional dynamic LUT management and provides a fixed time complexity owing to the two-symbol matching. During decompression, online LCA invokes the opposite mappings by repeatedly converting one symbol into two according to the table, starting from the deepest compression step.
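A single pass of this pair-replacement step can be sketched as follows. The table layout (a dict from symbol pairs to replacement symbols) and the function name are ours; real online LCA applies such passes recursively until sequences like ABCDFFBC collapse to a single symbol:

```python
def replace_pairs(seq, table):
    """One pair-replacement pass: scan the sequence left to right and
    replace each adjacent pair found in `table` with its compressed symbol."""
    out, i = [], 0
    while i < len(seq):
        pair = tuple(seq[i:i + 2])
        if len(pair) == 2 and pair in table:
            out.append(table[pair])     # pair hit: emit the compressed symbol
            i += 2
        else:
            out.append(seq[i])          # miss: pass the symbol through
            i += 1
    return out
```

For instance, with a table mapping (A, B) to X and (C, D) to Y, one pass turns ABCDFF into X Y F F; feeding the output back in with further table entries models the recursive behavior.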

Applying the concept of online LCA, we show here the mechanism of LCA-SLT (LCA Static Look-up Table) [20], which prepares statically allocated LUTs for converting symbol pairs. The compressor encodes the inputted symbols using the LUTs, and the decompressor does the opposite. The table contents are fixed and loaded before compression/decompression begins. The tables are prepared heuristically in the following steps: (1) a test set of the target data is processed by online LCA, (2) LUTs are created from all the original symbol pairs and their matching symbols, (3) the entries in the LUTs are sorted in descending order of frequency, and finally (4) the top-ranked entries are registered as the table contents. These steps register the best-matching patterns in the original data set as determined by the frequency analysis.
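Steps (1)–(4) can be approximated in a few lines. For simplicity, this sketch of ours counts overlapping adjacent pairs in the raw test data rather than running full online LCA, and it assigns table indices as the compressed symbols:

```python
from collections import Counter

def build_static_table(test_data, n_entries):
    """Heuristic static-table construction: count symbol-pair frequencies
    in a test data set and keep the n_entries most frequent pairs,
    mapping each to a fresh compressed symbol (here just an index)."""
    counts = Counter(zip(test_data, test_data[1:]))   # adjacent-pair frequencies
    top = [pair for pair, _ in counts.most_common(n_entries)]
    return {pair: idx for idx, pair in enumerate(top)}
```

The resulting dict plays the role of one statically allocated LUT; in the actual scheme the frequency analysis is performed on the output of online LCA at each pipeline stage.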

As shown in Fig. 16.2, the compressor and decompressor perform online LCA using the tables created from a set of test data patterns. The modules are connected one after another via the compressed and decompressed data lines, forming a pipeline for recursive compression/decompression operations.


The method with static LUTs has two main advantages. First, the compressed data never include any additional information for table management. Second, the amount of table resources is deterministic. Therefore, LCA-SLT can be implemented on compact hardware and is fast because of its simple compression/decompression operations.

#### *16.2.2 Implementation of LCA-SLT*

On hardware, the compressor and decompressor can be implemented using a content-addressable memory (CAM) [8] and a normal memory (MEM), respectively. A CAM is a type of memory into which a set of data bits is inputted and which outputs the matched address where those data are stored. Figure 16.3 shows the organization of the compression part. As an example, the combination of two symbols becomes 16 bits when the symbol width is 8 bits, and we add one bit per compressed datum to mark whether it is compressed, called the *compression mark (CMark) bit*. Figure 16.3 shows a compression pipeline in which four modules are connected. Each module adds another CMark bit, so the number of bits in the compressed data is extended by one bit per module. Thus, the compression module at the end of the pipeline generates 12-bit compressed data. Decompression involves the same operations as the compression steps but in the opposite direction.
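The bit-width growth along the pipeline is simple arithmetic; a hypothetical helper of ours makes the example concrete:

```python
def compressed_word_bits(symbol_bits, n_modules):
    """Each pipeline module replaces a 2-symbol pair with one symbol and
    appends one CMark bit, so the output word grows by one bit per module."""
    return symbol_bits + n_modules
```

With 8-bit symbols and four modules, this reproduces the 12-bit compressed data mentioned in the text.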

#### *16.2.3 Performance Evaluations*

We discuss here the performance of LCA-SLT. We evaluate the compression ratio and the matching ratio of symbol pairs in the LUT during compression. The table is implemented with a fixed number of entries, namely 32, 64, 128, or 256. For the evaluations, we use Linux source code of 50 and 200 MB, as well as a DNA sequence of 50 MB downloaded from [2]. Figure 16.4 shows the compression ratio (the data size after compression divided by the original size) and the matching ratio of the symbol pairs during the compression. With an increasing number of table entries, the compression ratio improves, and the matching ratio of the symbol pairs reaches about 60%.

Next, we show the implementation of the LCA-SLT module with 8-bit symbols and a 4-bit CMark on a Xilinx Spartan-6 field-programmable gate array (FPGA; device code XC6SLX453CSG324). We have two options for implementing the CAM: a shift-register LUT (SRL)-based or a block RAM (BRAM)-based CAM. The MEM can be implemented with BRAM on the FPGA. Table 16.1 shows the compilation reports. The operation timings of the SRL-based and BRAM-based CAMs are exactly the same. However, in the BRAM case, the number of used slice registers is larger than the number of LUTs because the latches are not packed into the LUTs; the LUTs are used for the combinational logic of the I/O buses around the memory. The SRL-based case, by contrast, increases the number of LUTs. Therefore, when an application itself needs many LUTs, such as wide data/address buses

**Fig. 16.4** Performances of LCA-SLT

for a processor interface, implementing LCA-SLT with the BRAM-based CAM is effective. On the other hand, with the SRL-based implementation, the maximal input clock frequency decreases drastically as the number of LUT entries increases. In the FPGA case, we must consider how the number of table entries affects performance, because the limited number of physical wires in the large-scale integration reduces routing availability when the matching address bits of the CAM become wide.

Thus, LCA-SLT implements a compression mechanism with small overhead for data streams. It is reconfigurable to the characteristics of the target data: the desired performance can be tuned through the number of compression/decompression modules, the number of bits in a symbol, and the number of symbol mapping entries available in the LUT.


**Table 16.1** Compilation reports regarding hardware implementations of LCA-SLT

## **16.3 Stream-Based Lossless Data Compression with Dynamic Look-Up Table**

#### *16.3.1 Design of LCA-DLT*

Next, we focus on another algorithm for stream-based data compression with dynamic table management, called LCA-DLT (LCA Dynamic Look-up Table) [19]. It allocates corresponding symbol LUTs to the compressor and the decompressor, respectively. Each table has some number *N* of entries, and the *i*-th entry *E*<sub>*i*</sub> includes a pair of original symbols (*s*0<sub>*i*</sub>, *s*1<sub>*i*</sub>), a compressed symbol *S*<sub>*i*</sub>, and a frequency counter *count*<sub>*i*</sub>. The compressor side uses the following rules: (1) it reads two symbols (*s*0, *s*1) from the input data stream, and if they match *s*0<sub>*i*</sub> and *s*1<sub>*i*</sub> in a table entry *E*<sub>*i*</sub>, it increments *count*<sub>*i*</sub> and outputs *S*<sub>*i*</sub> as the compressed data; (2) if the symbols do not match any entry in the table, it outputs (*s*0, *s*1) and registers an entry (*s*0<sub>*k*</sub>, *s*1<sub>*k*</sub>, *S*<sub>*k*</sub>, *count*<sub>*k*</sub> = 1), where *S*<sub>*k*</sub> is the index number of the entry; (3) if all entries in the table are used, it decrements all *count*<sub>*i*</sub> (0 ≤ *i* < *N*) until one or more counts become zero, and then deletes the corresponding entries from the table. When compressed data *S* are transmitted from the compressor, the steps in the decompressor are equivalent to those in the compressor. The symbol matching is performed on *S*<sub>*k*</sub> in an entry. If the compressed symbol *S* matches *S*<sub>*k*</sub> in a table entry, then (*s*0<sub>*k*</sub>, *s*1<sub>*k*</sub>) is outputted. If not, another symbol *S*′ is read from the compressed data stream, the pair (*S*, *S*′) is outputted, and the pair is then registered in the table.
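The three compressor rules above can be sketched in software as follows. This is an illustrative model only (the function name `dlt_compress` and the tagged output format are hypothetical): table slots keep their indices as in hardware, the compressed symbol is the slot index, and the output marks each item as a code (`'C'`) or a raw pair (`'R'`) in place of the CMark bit.

```python
# Hypothetical software sketch of the LCA-DLT compressor rules (1)-(3).
def dlt_compress(data, n=4):
    table = [None] * n  # slots of [s0, s1, count]; slot index = code
    out = []
    for i in range(0, len(data) - 1, 2):
        s0, s1 = data[i], data[i + 1]
        hit = next((j for j, e in enumerate(table)
                    if e and e[0] == s0 and e[1] == s1), None)
        if hit is not None:                      # rule (1): match -> emit code
            table[hit][2] += 1
            out.append(('C', hit))
            continue
        out.append(('R', (s0, s1)))              # rule (2): pass raw pair
        if all(table):                           # rule (3): full -> evict
            while not any(e[2] == 0 for e in table):
                for e in table:
                    e[2] -= 1                    # decrement all counts
            table = [None if e[2] == 0 else e for e in table]
        free = table.index(None)                 # reuse smallest free slot
        table[free] = [s0, s1, 1]
    return out
```

Note that freed slots are reused in place rather than compacted, so codes already emitted for surviving entries stay valid.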

**Fig. 16.5** Compression example for the LCA-DLT

**Fig. 16.6** Decompression example for the LCA-DLT

When the table is full, the same operations as those of the compressor are performed.

Figures 16.5 and 16.6 show examples of compression and decompression operations, respectively. Here, the input data stream for the compressor is ABABCDACABEFDCAB. First, the compressor reads the first two symbols AB and tries to match that pair in the table (Fig. 16.5a). The matching fails, so the compressor registers A and B as *s*0 and *s*1 in the table. The compressed symbol assigned to the entry is the table index, 0; thus, a rule AB→0 is created. The *count* is initially set to 1. When the compressor next reads a pair of symbols that matches in the table (again AB), it translates AB to 0 (Fig. 16.5b). The equivalent operations are performed subsequently. If the table becomes full (Fig. 16.5c), the compressor decrements the *count*(s) of all entries until one or more *count*s become zero. Here, three entries are invalidated from the table in the figure. The compressor registers new entries in the invalidated slots, starting from the smallest table index. Figure 16.5d shows that the compressor added a new entry after the invalidation. Finally, the original input data are compressed to AB0CDAC0EFDC0.

The decompressor reads A first (Fig. 16.6a), but it does not match any compressed symbol in the table (because the table is empty). The decompressor then reads another symbol B and registers AB in a new table entry. The entry saves the rule AB→0, and the output becomes AB. The decompressor reads the next symbol 0 (Fig. 16.6b), which matches the table entry; the decompressor translates it to AB and outputs it again. After the subsequent decompression operations, when the table becomes full, the decompressor decrements the *count*(s) just as on the compressor side (Fig. 16.6c). The invalidated entries must be equivalent to those on the compressor side; therefore, the compressed symbols are consistently associated with the original symbols. Finally, the compressed data inputted to the decompressor are decoded and outputted as ABABCDACABEFDCAB, which is the same as the input data on the compressor side.
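The decompressor side can be sketched symmetrically. Again this is an illustrative model with hypothetical names: the input is a stream of tagged items, where `('C', i)` stands for a compressed symbol (table index) and `('R', pair)` for a raw symbol pair, in place of the CMark bit; the table is updated with exactly the same eviction rule as the compressor, which is what keeps both tables consistent.

```python
# Hypothetical sketch of the LCA-DLT decompressor. It mirrors the
# compressor's table updates so that slot indices stay synchronized.
def dlt_decompress(items, n=4):
    table = [None] * n   # mirrors the compressor's table
    out = bytearray()
    for tag, payload in items:
        if tag == 'C':                      # compressed symbol = slot index
            s0, s1, _ = table[payload]
            table[payload][2] += 1          # count tracking, as on compressor
            out.extend((s0, s1))
        else:                               # raw pair: output and register
            s0, s1 = payload
            out.extend((s0, s1))
            if all(table):                  # full -> same eviction rule
                while not any(e[2] == 0 for e in table):
                    for e in table:
                        e[2] -= 1
                table = [None if e[2] == 0 else e for e in table]
            table[table.index(None)] = [s0, s1, 1]
    return bytes(out)
```

Feeding the tagged stream produced by a matching compressor model back through this function reproduces the original byte stream.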

#### *16.3.2 Implementation of LCA-DLT*

Figure 16.7 shows an implementation of LCA-DLT. The input data are propagated through latches, and the compressed/decompressed data are processed in a pipelined manner. The LUT in the compressor is organized as shown in Fig. 16.8a. The symbol LUT performs the compressed/decompressed data association: the index becomes the compressed symbol, and the enable signal from the matching part increments the *count*. The full-management logic of the LUT activates the invalidate control, which decrements the *count* and resets the valid bits (v in the figure) of the invalidated entries. The LUT in the decompressor is organized with a RAM and a CAM as shown in Fig. 16.8b. The management part for *count* performs equivalently to that of the compressor, based on a CAM, whereas the matching part is implemented simply in a RAM: the compressed data are inputted to the RAM as the address, and the original uncompressed data pair is associated.

The invalidate operation looks for the minimal *count*s in the table entries by decrementing the counts. During this operation, a stall signal is outputted to stop the compression/decompression data pipeline. Figure 16.9a shows an implementation based on parallel decrement logic, and Fig. 16.9b shows one based on serial decrement logic. These two implementations trade off the amount of logic against the compression speed when the table becomes full.

In the LCA-DLT as in the LCA-SLT, the compressor adds the CMark bit that indicates whether or not the symbol is compressed. Moreover, by combining the compressor and decompressor in a module and cascading the modules as shown in Fig. 16.10, we can compress long symbol patterns corresponding to 2, 4, 8, or 16

**Fig. 16.7** Overall functional block diagrams of the compressor and decompressor in LCA-DLT. The compressor's LUT receives two input symbols from the latches and outputs the selected signal to the multiplexer for the output data. The decompressor's LUT performs the opposite data translation

**Fig. 16.8** Detailed organization of LUTs in LCA-DLT. The table has 2n entries when a symbol is n bits. The matching part for s0 and s1 must be organized as a content-addressable memory (CAM), which outputs the index (i.e., the address in the CAM) matched to an inputted pair of (s0, s1). The management part for count is also organized by a CAM


**Fig. 16.9** Decrementing logic for entry invalidation in LCA-DLT

**Fig. 16.10** Cascading modules of LCA-DLT. This example compresses long symbol patterns corresponding to 2, 4, 8, and 16 symbols. If the input data at the first compressor are 8 bits long, then the output compressed data become 12 bits because of the CMark bits

symbols when there are four modules. If the input data at the first compressor are 8 bits long, then the output compressed data become 12 bits after four modules because of the CMark bits.
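The cascade arithmetic above is easy to state explicitly: with *k* modules, one output code can cover up to 2^*k* input symbols, and each module widens the word by one CMark bit. A small helper (the name `cascade_stats` is illustrative) captures both quantities:

```python
# Pattern coverage and word width for a cascade of LCA modules:
# k modules cover up to 2**k symbols per code, and each module appends
# one CMark bit to the w-bit symbol width (8-bit symbols, four modules
# -> 16 symbols per code and 12-bit output words, as in Fig. 16.10).
def cascade_stats(symbol_bits, modules):
    """Return (max symbols covered by one output code, output word width)."""
    return 2 ** modules, symbol_bits + modules
```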

**Fig. 16.11** Compression performances of LCA-DLT

#### *16.3.3 Performance Evaluations*

Figure 16.11 shows the compression ratios ((compressed data size ÷ original data size) × 100). The number of table entries is varied from 16 to 256. Focusing on the performance impact of the number of table entries, the compression ratios improve linearly, except for the DNA sequence; because the DNA data contain few distinct patterns, all patterns can be saved in 16 entries. Furthermore, focusing on the impact of the number of modules, the compression ratios degrade with more than two modules. This means that a communication data path using too many compression modules becomes disadvantageous because of the CMark bit added after each module.
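For reference, the ratio plotted in Fig. 16.11 is simply the following percentage, where lower values mean better compression (the function name is illustrative):

```python
# Compression ratio as defined above: (compressed / original) * 100.
def compression_ratio_percent(compressed_size, original_size):
    return compressed_size / original_size * 100
```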

Figures 16.12 and 16.13 show the hardware performances of LCA-DLT. It is implemented with only hundreds of slices and a memory block in the FPGA. LCA-DLT works at 100 MHz with any number of modules, thereby achieving 800 Mbit/s. Resource usage is dominated by logic rather than memory because recent FPGAs do not have any dedicated hardware

**Fig. 16.12** Hardware resources of LCA-DLT. It is compiled with 8-bit data input in the first compressor for the Xilinx Artix7 device (XC7A200T-1FBG676C)

**Fig. 16.13** Performance comparison between parallel and serial invalidation mechanisms with two modules

macros for CAMs; a CAM is inevitably implemented with LUTs and registers in the FPGA. We also compare the amounts of hardware resources used by the parallel and serial invalidation mechanisms. The parallel version uses more hardware resources; regarding dynamic performance, it involves very few stalls, but its hardware resources explode. Assuming that the hardware works at 100 MHz, the effective bandwidth at the input of the first compressor is about 800 Mbit/s with parallel invalidation and 340–730 Mbit/s with serial invalidation. The output of the second compressor is reduced to 35–80% of the original data size. This means that LCA-DLT realizes a communication data path that can send more data even if the path itself is slow, and it contributes largely to realizing a high-speed communication data path while providing flexible adjustment between hardware resources and compression performance.

#### **16.4 Optimization Techniques for LCA-DLT**

Here we introduce optimization techniques for implementing the LCA-DLT. We consider two available optimization techniques: lazy management and time-sharing multi-threading.

#### *16.4.1 Lazy Management of Look-Up Tables*

First, we consider the techniques of dynamic invalidation on LUTs and lazy compression [11] that eliminate stalls during the LUT invalidations.

#### **16.4.1.1 Dynamic Invalidation for Look-Up Table**

With the dynamic-invalidation management technique for the symbol LUT, we prepare a *remove pointer* and an *insertion pointer*. Initially, the remove pointer points to an arbitrary entry of the symbol LUT. When the pointer comes to table index *i*, *count*<sub>*i*</sub> is decremented, and if *count*<sub>*i*</sub> becomes zero after the decrement, the entry is removed from the table. The pointer moves to the next table index after any table search operation. The insertion pointer initially points to an arbitrary empty entry in the symbol LUT; when that entry is used, the pointer moves to another unused entry. Using these two pointers, we can expect that a moderate number of the entries occupying the symbol LUT will be removed.
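The remove-pointer bookkeeping can be sketched as follows. The class and method names are hypothetical, and the insertion pointer is modeled simply as "first free slot"; the one behavioral rule carried over from the text is that the entry that just matched is not decremented even if the remove pointer sits on it.

```python
# Hypothetical sketch of the remove/insertion-pointer mechanism.
class DynInvalTable:
    def __init__(self, n):
        self.entries = [None] * n       # slots of [s0, s1, count] or None
        self.rp = 0                     # remove pointer

    def step_remove_pointer(self, skip_idx=None):
        """After a table search: decrement the pointed entry's count
        (unless it is the entry that just matched, skip_idx), free it
        when the count reaches zero, then advance the pointer."""
        e = self.entries[self.rp]
        if e is not None and self.rp != skip_idx:
            e[2] -= 1
            if e[2] == 0:
                self.entries[self.rp] = None   # invalidate in place
        self.rp = (self.rp + 1) % len(self.entries)

    def insert(self, s0, s1):
        """Register into the first free slot (the insertion pointer);
        return False when no slot is free (the pipeline would stall)."""
        for i, e in enumerate(self.entries):
            if e is None:
                self.entries[i] = [s0, s1, 1]
                return True
        return False
```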

Figure 16.14 shows an example of the dynamic invalidation mechanism for compression. We assume that DCAADCBBDB is inputted to the compressor and that the remove pointer starts at the second entry of the table. First, DC does not match any entry in the table (Fig. 16.14a), and the compressor waits for an empty entry to appear. The remove pointer moves to the next entry, and that entry's count value is decremented. In Fig. 16.14b, the count value of the third entry becomes zero, whereupon the entry is removed. The insertion pointer moves to point to the empty entry, and the new entry for DC is registered where the insertion pointer is pointing. Now, DC is outputted. During these operations, the input and output of the compressor stall. When the input symbol pair matches an entry, it is compressed as shown in Fig. 16.14c, d, the remove pointer is moved, and the count value is decremented. If the entry that matches the input symbol pair is the one pointed to by the remove pointer, the count value does not change, as shown in Fig. 16.14e. Finally, after the initially inserted DC is removed because of its count value, the entry is reused as a new one. Because DB was not found in the table, it is outputted. Thus, the compressed data stream becomes DC012DB.

Figure 16.15 shows the steps of the decompression mechanism using the dynamic invalidation. The inputted compressed data stream is the one generated by the compression in Fig. 16.14. The insertion and the remove pointers begin from the same

**Fig. 16.15** Example of the dynamic invalidation mechanism for decompression

**Fig. 16.16** Example of the lazy compression on compressor side

**Fig. 16.17** Example of the lazy compression on decompressor side

entries initially defined on the compressor side. Although the matching target is the compressed data, the steps are equivalent to those performed on the compressor side. In Fig. 16.15a, b, the input and output of the decompressor stall. When the compressed symbol matches an entry, the decompressor outputs the corresponding symbol pair, as in Fig. 16.15c, d. Again, a stall occurs during the invalidation of an entry, as shown in Fig. 16.15e, f. Finally, the original data stream is decoded.

#### **16.4.1.2 Lazy Compression**

Another optimization technique is lazy compression. This technique skips compression when the symbol LUT is full, which eliminates stalls and keeps the output flowing continuously to the decompressor side.
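One step of lazy compression can be sketched as follows (the function name and tagged output format are illustrative, not the hardware interface): a hit emits the table index, while a miss always passes the pair through and registers it only if a free entry exists, so the step never blocks.

```python
# Hypothetical sketch of one lazy-compression step on a table of
# [s0, s1, count] slots (None = free). A miss on a full table is
# simply passed through without registration, so no stall occurs.
def lazy_compress_step(table, pair, out):
    for idx, e in enumerate(table):
        if e is not None and (e[0], e[1]) == pair:
            e[2] += 1
            out.append(('C', idx))       # hit: emit the code
            return
    out.append(('R', pair))              # miss: always pass through
    for i, e in enumerate(table):
        if e is None:                    # register only if a slot is free
            table[i] = [pair[0], pair[1], 1]
            return
    # table full: no registration, no stall
```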

Figure 16.16 shows an example of lazy compression applied to LCA-DLT with dynamic invalidation. First, DC does not match any entry in the table. Here, lazy compression just passes the symbol pair through without registering it in the table; therefore, no stall occurs, as in Fig. 16.16a, e. When the symbol pair matches an entry, the pair is compressed to the corresponding symbol, as shown in Fig. 16.16b, d. If the table contains empty entries when the inputted symbol pair does not match any entry, the pair is registered in an empty entry and is also passed through to the output, as in Fig. 16.16c. The output from the compressor becomes DC0DC1DB, which is larger than DC012DB in the case of eager compression.

Figure 16.17 shows the case of the decompressor. First, D is not included in the table; the input is therefore treated as an original data pair (in practice, the CMark bit attached to the compressed data indicates this). The decompressor does not register the pair and passes DC through to the output, as shown in Fig. 16.17a, e, without any stall. If the compressed data are in the table, the decompressor translates them to the original symbol

**Fig. 16.18** Compression ratios with optimizations. The orange lines show lazy compression against the full search method, and the blue ones show dynamic invalidation against the full search method. The results depicted as lines were from using a compressor with four modules

pair, as in Fig. 16.17b, d. If the symbol is not in the table and there are empty entries, the inputted symbol pair is registered.

#### **16.4.1.3 Performance Evaluations**

Figure 16.18 shows the compression ratios with the above optimizations in LCA-DLT. The bars show the ratios (i.e., the compressed data size divided by the original data size). We can confirm that lazy compression effectively eliminates stalls without disturbing compression, although it does not compress an inputted data pair that matches no entry of the symbol LUT. Overall, both proposed mechanisms provide more effective compression ratios than does the full

**Fig. 16.19** Stall cycles and the stall ratios against the total clock cycles in LCA-DLT with optimizations

search method. These mechanisms work well if the randomness of the data is high (i.e., the data entropy is high).

We measured the stall clock cycles to compare the dynamic performance of the hardware implementations of the proposed techniques. We used a Xilinx Artix-7 FPGA (XC7A200T-1FBG676C). The full search method works at 100 MHz in this device, as described in the previous section. By contrast, the implementation with both proposed mechanisms works at 130 MHz because the lazy management of the symbol LUT simplifies the implementation.

Figure 16.19 shows the stall cycles (bars) and the stall ratios against the total clock cycles (lines). The total throughput of the data stream becomes much better than that with the full search method: the throughput degradation is 30% with the full search but less than 3% with dynamic invalidation. With lazy compression, the compression delay equals the number of clock cycles needed to input the data stream, i.e., the number of bytes of the input data (10M cycles), because lazy compression never causes stalls.

#### *16.4.2 Time-Sharing Multithreading on Compression*

#### **16.4.2.1 Design and Implementation of Time-Sharing Multithreading**

The time-sharing multi-threading [10] allows the compressor and decompressor to accept multiple different data streams by dividing the dictionary updating operations

**Fig. 16.20** Example structure of time-sharing multi-threading (TSM)

among the various input streams. When *N* data streams are inputted to the compressor/decompressor, the dictionary updating for each data stream allows *N*−1 clock cycles to be inserted, which solves the updating problem. For example, Fig. 16.20 shows a structure with two compressors that share the pipeline stages for the dictionary updating operations while accepting two different data streams. This mechanism does not cause any stalls in the input data streams; the bandwidth of a single data stream through the whole compressor/decompressor module therefore degrades to 1/*N*, but the clock frequency is expected to increase.
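The round-robin sharing described above can be sketched behaviorally as follows: one pipeline serves the streams in turn, so each stream sees 1/N of the module bandwidth and the dictionary-update slots of different streams never overlap. The function name `tsm_schedule` and the list-based stream model are illustrative, not the hardware interface.

```python
# Behavioral sketch of time-sharing multi-threading: interleave N input
# streams round-robin through one shared pipeline.
from itertools import zip_longest

def tsm_schedule(streams):
    """Yield (stream_id, item) pairs in round-robin order, skipping
    streams that have run out of items."""
    for group in zip_longest(*streams):
        for sid, item in enumerate(group):
            if item is not None:
                yield sid, item
```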

In implementing the compression mechanism, the following operations are assigned to the stages of the encoder pipeline in the compressor hardware: a preprocess operation prepares the subsequent table matching; the table search operation is then performed; the symbol registration operation registers symbols in the LUT; and finally, the symbolizing/lookup operations are performed against the LUT. For decompression, the operations are performed in the opposite way to translate the compressed data back to an original data pair.

Next we discuss an implementation example of time-sharing multi-threading in the LCA-DLT. Assume that there are two input data streams for the compressor/decompressor, and the pipeline of the compressor is organized as shown in Fig. 16.20. The compression in both data streams takes eight cycles to process a data pair, as does the decompression. The compression pipeline consists of the search stage and the registration stage. The search stage compares the contents of the LUT with the incoming data and then creates a match flag list, and the registration stage updates the corresponding entry in the table according to the match flag list. The decompression pipeline consists of the same stages but is organized in the opposite direction.

#### **16.4.2.2 Performance Evaluations**

Here we discuss the performance effect of time-sharing multi-threading. The example structure with two data streams per module explained above is implemented on a


**Table 16.2** Performance comparisons of the time-sharing multi-threading (TSM)

Xilinx Kintex UltraScale FPGA (XCKU025-FFVA1156-1-C), and Table 16.2 shows the comparisons. Compared with the clock frequency without time-sharing multi-threading, the optimized version is faster by a factor of approximately 1.23 for compression and 1.08 for decompression, meaning that the total throughputs of the compressor and decompressor increase by the same factors. However, the improvement is shared by the two data streams, so a single data stream achieves approximately 62% of the unoptimized throughput for compression and 54% for decompression. Regarding the resource usage given in Table 16.2, the optimization reduces the combinational logic by 23–65%; the registers in the compressor module increase by approximately 32%, while the number of registers in the decompressor is reduced to a third of that for the implementation without the optimization.

#### **16.5 Related Works and Literatures**

The most important lossless-compression algorithm is LZW, which is simple and effective and can be found in lossless-compression software such as gz, bzip2, rar, and lzh. However, when one attempts to implement a compressor in hardware, the problems discussed in this chapter inevitably arise: to implement compact hardware for LZW, we must prepare memory on the order of kilobytes. For example, Fowers et al. [3] and Kim et al. [5] addressed the longest-matching problem by parallelizing the operations. However, it is impossible to increase the size of the sliding dictionary because the number of start indices increases with the length of the symbols. Another important research topic is how to manage the symbol LUT in a limited memory space.

The field of machine learning contains well-known algorithms such as lossy counting [9] and space saving [13]. However, these algorithms use pointer-based operations and are implemented in software. For a data stream with *k* different symbols, an attractive algorithm for frequency counting has been proposed in which the top-θ*k* frequent items are counted exactly within *O*(1/θ) space [4] for any constant 0 < θ < 1. However, this too is a software solution. Various hardware implementations of lossless data-compression techniques have been investigated in this decade; a well-known approach is *arithmetic coding* [6], which is used widely to compress multimedia data. Arithmetic coding involves heavy computation with floating-point numbers to achieve high compression ratios. To avoid floating-point calculations, arithmetic coding based on binary numbers has been proposed [1, 7, 15]. However, the underlying fractional computation cannot be avoided entirely, which is why hardware implementations such as those by Pande et al. [16] and Mitchell et al. [14] have been proposed to accelerate the computing speed.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.