**Financial Mathematics and Fintech**

# Zhiyong Zheng

# Modern Cryptography Volume 1

A Classical Introduction to Informational and Mathematical Principle

# **Financial Mathematics and Fintech**

# **Series Editors**

Zhiyong Zheng, Renmin University of China, Beijing, China

Alan Peng, University of Toronto, Toronto, ON, Canada

This series addresses emerging advances in mathematical theory related to finance, together with application research from all fintech perspectives. It is a series of monographs and contributed volumes focusing on in-depth exploration of financial mathematics, such as applied mathematics, statistics, optimization, and scientific computation, and of fintech applications, such as artificial intelligence, blockchain, cloud computing, and big data. The series features both comprehensive understanding and practical application of financial mathematics and fintech, and covers their cutting-edge applications in practical programs and companies.

The Financial Mathematics and Fintech book series promotes the exchange of emerging theory and technology of financial mathematics and fintech between academia and financial practitioners. It aims to provide a timely reflection of the state of the art in mathematics and computer science as applied to finance. As a collection, this book series provides valuable resources to a wide audience in academia, the finance community, government employees related to finance, and anyone else looking to expand their knowledge in financial mathematics and fintech.

The key words in this series include but are not limited to:


More information about this series at https://link.springer.com/bookseries/16497

Zhiyong Zheng

# Modern Cryptography Volume 1

A Classical Introduction to Informational and Mathematical Principle

Zhiyong Zheng School of Mathematics Renmin University of China Beijing, China

ISSN 2662-7167 ISSN 2662-7175 (electronic) Financial Mathematics and Fintech ISBN 978-981-19-0919-1 ISBN 978-981-19-0920-7 (eBook) https://doi.org/10.1007/978-981-19-0920-7

© The Editor(s) (if applicable) and The Author(s) 2022. This book is an open access publication.

**Open Access** This book is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore

# **Preface**

When I organized several seminars on cryptography, the students generally felt that cryptography does not need much mathematics, and that computer languages and computing environments are more important. Later, I reviewed several common cryptography textbooks at home and abroad; indeed, most of these textbooks are written for engineering students, with the purpose of training cryptographic engineers. It was my original intention to write a textbook on theoretical cryptography for students in mathematics departments and science postgraduates, one that systematically teaches the statistical characteristics of cryptosystems, the computational complexity theory of cryptographic algorithms, and the mathematical principles behind various encryption and decryption algorithms.

With the rapid development of the new generation of digital technology, China has entered the era of information, networks and intelligence. Cryptography is not only the cornerstone of national security in the information age, but also a sharp sword protecting people's property, privacy and dignity. Following the establishment of Cyberspace Security as a first-class discipline, China established Cryptography as a first-class discipline as well. In particular, on December 19, 2019, China officially promulgated the Cryptography Law, formulating a law for a single discipline, which is rare anywhere in the world. Recently, the central government has explicitly called for cultivating our own cryptography professionals. It can be seen that the discipline construction and personnel training of cryptography have been raised to the height of national security and have become a major national strategic demand. Writing a textbook on cryptography theory that aims to cultivate our own cryptographers is the ultimate reason for writing this book.

Cryptography is an ancient art; it has existed almost as long as humanity itself. For example, the means of secret communication used in war, and the marks and conventions used by special groups, all fall into the category of the art of cryptography. Among them, the famous Caesar cipher can be regarded as the representative work of ancient cryptography. For thousands of years, cryptography was a craft relying on personal intelligence and ingenuity; only occasionally and fragmentarily were mathematical ideas and methods used. This era of the cryptographer changed fundamentally only with the arrival of the great American mathematician C. E. Shannon.

In 1948 and 1949, Shannon successively published two epoch-making papers in the *Bell System Technical Journal*. In the first paper, Shannon established the mathematical theory of communication and a probabilistic measure of information, thus laying the foundation of modern information theory. In the second paper, Shannon established the information-theoretic principles of cryptography, introduced the probabilistic and statistical methods of mathematics into cipher construction and cryptanalysis, and transformed the ancient technique of cryptography from an art into a science. Therefore, Shannon is called not only the father of modern information theory, but also the father of modern cryptography.

After Shannon's great transformation of the cryptographer's craft into a science, the ancient technology of cryptography ushered in its second historic leap in 1976: the transition from symmetric cryptography to public key cryptography. In 1976, two Stanford University scholars, W. Diffie and M. Hellman, published a pioneering paper on asymmetric cryptography in *IEEE Transactions on Information Theory*, opening the era of public key cryptography. Public key cryptography is more deeply intertwined with mathematics, making cryptography an inseparable branch of mathematics. The defining characteristic of this era is that cryptography changed from a tool of a few users into a mass consumer product, which greatly improves its efficiency and social value. Nowadays, asymmetric cryptosystems are widely used in message authentication, identity authentication, digital signatures, digital currency and blockchain architectures, where they cannot be replaced by classical cryptosystems.

Based on Shannon's information theory, this book systematically introduces the information theory, statistical characteristics and computational complexity theory of public key cryptography, focusing on the three main algorithms of public key cryptography: RSA, discrete logarithm and elliptic curve cryptosystems. It strives to explain not only what these systems are but also why they work, and lays a solid theoretical foundation for new cryptosystem design, cryptanalysis and attack.

Lattice-based cryptography is a representative technology of post-quantum cryptography, recognized by the academic community as able to resist quantum computing attacks. At present, the theory and technology of lattice cryptography have not yet entered university textbooks; the various achievements and expositions have been scattered across research papers at home and abroad over the past two decades. The greatest feature of this book is that it systematically simplifies and organizes the theory and technology of lattice cryptography, making it a classroom textbook for senior undergraduates and postgraduates in cryptography, which will play an important role in accelerating the training of modern cryptography talents in China.

This book requires the reader to have a good foundation in algebra, number theory, and probability and statistics. It is suitable for senior students majoring in mathematics and as a compulsory text for postgraduates in cryptography and in science and engineering. It can also serve as a main reference for researchers engaged in cryptography research and cryptographic engineering.

The main contents of this book have been taught in seminars. My doctoral students Hong Ziwei, Chen Man, Xu Jie, Zhang Mingpei, Associate Professor Huang Wenlin and Dr. Tian Kun have all offered many useful suggestions and much help with the contents of this book. In particular, Chen Man devoted a great deal of time and energy to typesetting and proofreading. I would like to express my deep gratitude to them all!

Beijing, China November 2021 Zhiyong Zheng

# **Contents**



# **Acronyms**

1. [*x*] denotes the largest integer not greater than the real number *x* (the floor of *x*); ⌈*x*⌉ denotes the smallest integer not less than *x* (the ceiling of *x*). So there are

$$
[x] \le x < [x] + 1, \qquad \lceil x \rceil - 1 < x \le \lceil x \rceil .
$$


# **Chapter 1 Preparatory Knowledge**

Modern cryptography and information theory form a rapidly developing branch of mathematics. Almost all mathematical knowledge, such as algebra, geometry, analysis, probability and statistics, has very important applications in information theory. In particular, some modern mathematical theories, such as algebraic geometry, elliptic curves and ergodic theory, play an increasingly important role in coding and cryptography. It can be said that information theory is one of the most dynamic branches of modern mathematics, with wide applications and strong interdisciplinary connections. This chapter requires the reader to have preliminary knowledge of analysis, algebra, number theory, and probability and statistics.

# **1.1 Injective**

Let σ be a mapping from a nonempty set *A* to a nonempty set *B*, denoted as σ : *A* → *B*. Generally, the mappings between sets can be divided into three categories: injective, surjective and bijective.

**Definition 1.1** Let σ be a mapping between two nonempty sets, σ : *A* → *B*. We define:

(i) For *a*, *b* ∈ *A*, if *a* ≠ *b* ⇒ σ(*a*) ≠ σ(*b*), we call σ an injection of *A* → *B*, or injective for short.

(ii) If for any *b* ∈ *B* there is an *a* ∈ *A* such that σ(*a*) = *b*, we call σ a surjection of *A* → *B*.

(iii) If σ : *A* → *B* is both an injection and a surjection, we call σ a bijection of *A* → *B*.

(iv) Let 1*<sup>A</sup>* be the identity mapping of *A* → *A*, which is defined as

$$1_A(a) = a, \ \forall \ a \in A.$$

(v) Suppose σ : *A* → *B* and τ : *B* → *C* are two mappings. The product mapping of τ and σ, τσ : *A* → *C*, is defined as


$$
\tau\sigma(a) = \tau(\sigma(a)), \ \forall \, a \in A.
$$

Obviously, the product of two mappings is not commutative, but it obeys the following associative law.
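As a quick illustration (a minimal sketch; the sets and dictionary names are ours, not the book's), composing two maps on small finite sets shows that the two products act on different sets and cannot coincide, while composition with a third map is associative:

```python
# Mappings on finite sets represented as Python dictionaries.
sigma = {1: 'a', 2: 'b'}      # sigma : A = {1, 2} -> B = {'a', 'b'}
tau   = {'a': 2, 'b': 1}      # tau   : B -> A
delta = {1: 'x', 2: 'y'}      # delta : A -> D = {'x', 'y'}

def compose(g, f):
    """Product mapping g∘f: apply f first, then g."""
    return {x: g[f[x]] for x in f}

tau_sigma = compose(tau, sigma)    # tau∘sigma : A -> A
sigma_tau = compose(sigma, tau)    # sigma∘tau : B -> B
print(tau_sigma)                   # {1: 2, 2: 1}
print(sigma_tau)                   # {'a': 'b', 'b': 'a'}

# Composition is not commutative (the two products above live on different
# sets), but it is associative: (delta∘tau)∘sigma = delta∘(tau∘sigma).
assert compose(compose(delta, tau), sigma) == compose(delta, compose(tau, sigma))
```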

*Property 1* Let σ : *A* → *B*, τ : *B* → *C*, δ : *C* → *D* be three mappings; then we have

$$(\delta \cdot \tau) \cdot \sigma = \delta \cdot (\tau \cdot \sigma). \tag{1.1}$$

*Proof* It can be verified directly by definition.

If σ : *A* → *B* is a given mapping, then obviously

$$
\sigma 1_A = \sigma, \ 1_B \sigma = \sigma. \tag{1.2}
$$

The above formula shows that the identity mapping plays the role of a multiplicative identity in the product of mappings.

**Definition 1.2** (i) Suppose σ : *A* → *B* and τ : *B* → *A* are two mappings. If τσ = 1*A*, we call τ a left inverse mapping of σ, and σ a right inverse mapping of τ.

(ii) Let σ : *A* → *B* and τ : *B* → *A*. If τσ = 1*A* and στ = 1*B*, we call τ an inverse mapping of σ, denoted as τ = σ⁻¹.

The essential properties of injective, surjective and bijective between sets are described by the following lemma.

**Lemma 1.1** *(i) If σ : A → B has an inverse mapping τ : B → A, that is, στ = 1B and τσ = 1A, then τ is unique (denoted as τ = σ⁻¹).*

*(ii) σ : A → B is injective if and only if σ has a left inverse mapping τ : B → A, that is, τσ = 1A.*

*(iii) σ : A → B is surjective if and only if σ has a right inverse mapping τ : B → A, that is, στ = 1B.*

*(iv) σ : A → B is bijective if and only if σ has an inverse mapping τ, and τ is unique.*

*Proof* First, we prove (i). Let τ₁ : *B* → *A* and τ₂ : *B* → *A* be two inverse mappings of σ; then we have

$$
\tau_1 \sigma = 1_A, \ \tau_2 \sigma = 1_A, \quad \text{and} \quad \sigma \tau_1 = 1_B, \ \sigma \tau_2 = 1_B.
$$

From (1.2), we have

$$
\tau_1 = \tau_1 1_B = \tau_1(\sigma \tau_2) = (\tau_1 \sigma)\tau_2 = 1_A \tau_2 = \tau_2,
$$

so if σ has an inverse mapping, then the inverse mapping is unique.

To prove (ii), we note that if σ has a left inverse mapping τ, that is, τσ = 1*A*, then σ must be injective: if *a*, *b* ∈ *A* with *a* ≠ *b*, then σ(*a*) ≠ σ(*b*), for σ(*a*) = σ(*b*) ⇒ τ(σ(*a*)) = τ(σ(*b*)) ⇒ *a* = *b*, contradicting *a* ≠ *b*. Conversely, if σ : *A* → *B* is injective, then for each element σ(*a*) ∈ σ(*A*) ⊂ *B*, let τ(σ(*a*)) = *a*; for the elements of the difference set *B*\σ(*A*), assign images arbitrarily. Then τ : *B* → *A* satisfies τσ = 1*A*. Similarly, we can prove (iii) and (iv); we thus complete the proof.
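The construction in this proof can be sketched concretely (a minimal example; the sets and the injective map are ours, chosen for illustration):

```python
# Sketch of the proof of Lemma 1.1(ii): from an injective sigma,
# build a left inverse tau with tau∘sigma = 1_A.
A = [1, 2, 3]
B = ['p', 'q', 'r', 's']
sigma = {1: 'q', 2: 's', 3: 'p'}       # injective: distinct inputs, distinct images

# On the image sigma(A), tau must send sigma(a) back to a; elements of
# the difference set B \ sigma(A) may be mapped arbitrarily (here: to A[0]).
tau = {sigma[a]: a for a in A}
for b in B:
    tau.setdefault(b, A[0])

assert all(tau[sigma[a]] == a for a in A)   # tau∘sigma = 1_A
```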

In many books on information theory, injective and bijective are often confused, but they are two different concepts in mathematics; this needs attention.

# **1.2 Computational Complexity**

In a binary computing environment, the complexity of an algorithm is measured by the number of bit operations. Bit, short for "binary digit," is the basic unit of information: one bit represents one binary digit, two bits represent two binary digits. So what is a "bit operation"?

To understand "bit operation" accurately, we start with the *b*-ary expression of real number. Let *b* > 1 be a positive integer, and any nonnegative real number *x*can be uniquely expanded into the following geometric series.

$$\begin{aligned} x &= \sum_{-\infty < i \le k-1} d_i b^i \\ &= d_{k-1} b^{k-1} + d_{k-2} b^{k-2} + \dots + d_1 b + d_0 + \sum_{i=1}^{+\infty} d_{-i} b^{-i}, \end{aligned} \tag{1.3}$$

where ∀ *di* satisfies 0 ≤ *di* < *b*. So we can express *x* as

$$x = (d_{k-1}d_{k-2}\cdots d_0 d_{-1} d_{-2}\cdots)_b, \tag{1.4}$$

where $(d_{k-1}d_{k-2}\cdots d_0)_b$ is called a *b*-ary integer, $(0.d_{-1}d_{-2}\cdots)_b$ is called a *b*-ary decimal, and

$$x = (d_{k-1}d_{k-2}\cdots d_0)_b + (0.d_{-1}d_{-2}\cdots)_b. \tag{1.5}$$

If *b* = 2, then $x = (d_{k-1}d_{k-2}\cdots d_0 d_{-1}\cdots)_2$ is called the binary representation of *x*. If *b* = 10, then

$$\begin{aligned} x &= (d_{k-1}d_{k-2}\cdots d_1 d_0 d_{-1} d_{-2}\cdots)_{10} \\ &= d_{k-1}d_{k-2}\cdots d_1 d_0 . d_{-1} d_{-2}\cdots . \end{aligned} \tag{1.6}$$

It is our customary decimal expression. It is worth noting that in any base, integers correspond one-to-one to integers, and decimals to decimals: an integer in the decimal system corresponds to an integer in the binary system, and likewise for decimals. In other words, the real numbers of the interval (0, 1) on the real axis correspond one-to-one to binary decimals in (0, 1). It should be noted that binary decimals are often overlooked; in fact, they are the main technical support of various arithmetic codes, such as the Shannon code.
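The integer part of expansion (1.4) can be computed by repeated division, a minimal sketch (the function name `to_base` is ours):

```python
def to_base(n, b):
    """Digit list (d_{k-1}, ..., d_1, d_0) of a nonnegative integer n in
    base b, as in (1.4); most significant digit first."""
    if n == 0:
        return [0]
    digits = []
    while n > 0:
        n, d = divmod(n, b)      # peel off the lowest b-ary digit
        digits.append(d)
    return digits[::-1]

print(to_base(150, 2))    # [1, 0, 0, 1, 0, 1, 1, 0], i.e. (10010110)_2
print(to_base(150, 16))   # [9, 6], i.e. (96)_16
```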

Now let us consider the *b*-ary expression of a positive integer *n* given in the decimal system. Let

$$n = (d_{k-1}d_{k-2} \cdots d_1 d_0)_b, \ 0 \le d_i < b, \ d_{k-1} \ne 0,$$

*k* is the number of *b*-ary digits of *n*.

**Lemma 1.2** *The number k of b-ary digits of positive integer n can be calculated according to the following formula:*

$$k = [\log_b n] + 1, \tag{1.7}$$

[*x*] *denotes the largest integer not greater than the real number x.*

*Proof* Because $d_{k-1} \ne 0$, there is $b^{k-1} \le n < b^k$, that is

$$k - 1 \le \log_b n < k,$$

The left-hand inequality gives $k - 1 \le [\log_b n]$, the right-hand one gives $[\log_b n] + 1 \le k$, and together there is

$$k = [\log_b n] + 1.$$

We complete the proof of Lemma 1.2.
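Formula (1.7) is easy to check numerically (a sketch; we count digits exactly by division, since a floating-point logarithm can round badly for very large *n*):

```python
import math

def num_digits(n, b):
    """Exact number k of b-ary digits of a positive integer n, by division."""
    k = 0
    while n > 0:
        n //= b
        k += 1
    return k

# Lemma 1.2: k = [log_b n] + 1.
for n in [1, 7, 8, 255, 1000, 10**6]:
    assert num_digits(n, 2) == math.floor(math.log2(n)) + 1

print(num_digits(10**6, 2))   # 20: a million is a 20-bit number
```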

Now let us consider addition in the *b*-ary system. For simplicity, we consider the addition of two positive integers in binary. Let *n* = (1111000)₂ and *m* = (11110)₂; then *n* + *m* = 1111000 + 0011110 = 10010110, that is, *n* + *m* = (10010110)₂. The addition of digits in the same bit position actually comprises the following five cases (or operations).



**Definition 1.3** A bit operation is an addition operation on a single bit position in binary addition. Suppose *A* is an algorithm in the binary system; we use Time(*A*) to denote the number of bit operations in algorithm *A*, that is, Time(*A*) = the total number of bit operations performed by algorithm *A* to completion.

It is easy to deduce from the definition the number of bit operations for binary addition and subtraction. Let *n*, *m* be two positive integers whose binary expressions have *k* and *l* bits, respectively; then

$$\text{Time}(n \pm m) = \max\{k, l\}.\tag{1.8}$$
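The worked example above is easy to verify (a quick sanity sketch in Python):

```python
# The addition example: n = (1111000)_2, m = (11110)_2.
n = int('1111000', 2)          # 120
m = int('11110', 2)            # 30
print(bin(n + m))              # 0b10010110, i.e. n + m = (10010110)_2 = 150

# (1.8): adding the aligned bits costs max{k, l} single-bit additions,
# where k and l are the bit lengths of n and m.
k, l = n.bit_length(), m.bit_length()
assert (k, l) == (7, 5) and max(k, l) == 7
```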

In the same way, the number of bit operations required for the multiplication of *n* and *m* in the binary system satisfies

$$\text{Time}(nm) \le (k+l) \cdot \min\{k, l\} \le 2kl. \tag{1.9}$$

It is very convenient to estimate the number of bit operations using the symbol "*O*" commonly used in number theory. If *f*(*x*) and *g*(*x*) are two real-valued functions, *g*(*x*) > 0, and there are two absolute constants *B* and *C* such that when |*x*| > *B* we have

$$|f(x)| \le C g(x), \quad \text{we write } f(x) = O(g(x)).$$

This notation indicates that as *x* → ∞, the order of growth of *f*(*x*) is at most that of *g*(*x*). For example, let $f(x) = a_d x^d + a_{d-1}x^{d-1} + \cdots + a_1 x + a_0$ ($a_d > 0$); then

$$f(x) = O(|x|^d), \quad \text{or } f(n) = O(n^d), \ n \ge 1.$$

For any ε > 0, there is

$$
\log n = O(n^{\epsilon}), \ n \ge 1.
$$

From Lemma 1.2, (1.8) and (1.9), we have

**Lemma 1.3** *Let n*, *m be two positive integers, and let k and l be the numbers of bits of their binary expressions, respectively; if m* ≤ *n, then l* ≤ *k, and*

$$\operatorname{Time}(n \pm m) = O(k) = O(\log n);$$

*Time*(*nm*) = *O*(*kl*) = *O*(log *n* log *m*);

$$\text{and}\quad \text{Time}(\frac{n}{m}) = O(kl) = O(\log n \log m).$$

In the above lemma, division is similar to multiplication. Next, we discuss the number of bit operations required to convert a binary representation into a decimal one, and the number required to compute *n*! in binary.

#### **Lemma 1.4** *Let k be the number of digits in binary of n, then*

*Time*(*n converted to decimal expression*) = *O*(*k*²) = *O*(log² *n*)

*and*

$$\text{Time}(n!) = O(n^2 k^2) = O(n^2 \log^2 n).$$

*Proof* To convert $n = (d_{k-1}d_{k-2}\cdots d_1 d_0)_2$ to a decimal expression, divide *n* by 10 = (1010)₂; the remainder is one of the binary numbers 0, 1, 10, 11, 100, 101, 110, 111, 1000, 1001, and these ten numbers correspond to the digits 0 to 9. Denote it as *a*₀ (0 ≤ *a*₀ ≤ 9); *a*₀ is the units digit of *n* in decimal. Similarly, divide the quotient by 10 = (1010)₂, and the remainder, converted into a digit from 0 to 9, gives the tens digit of *n* in decimal. Continuing in this way, we use division $[\log_{10} n] + 1$ times, and the bit operations required for each division are *O*(4*k*), so

> Time(*n* converted to decimal expression) ≤ *k* · *O*(4*k*) = *O*(*k*²).

In the same way, we can prove the bit operation estimation of *n*!. We complete the proof of Lemma 1.4.

Let us now deduce the computational complexity of some common number-theoretic algorithms. Let *m* and *n* be two positive integers; then there is a nonnegative integer *r* such that *m* ≡ *r* (mod *n*), where 0 ≤ *r* < *n*. We call *r* the smallest nonnegative residue of *m* mod *n* and write *r* = *m* mod *n*. If 1 ≤ *m* ≤ *n*, Euclid's division method is usually used to find the greatest common divisor (*n*, *m*) of *n* and *m*. If (*m*, *n*) = 1, then there is a positive integer *a* such that *ma* ≡ 1 (mod *n*); *a* is called the multiplicative inverse of *m* mod *n*, denoted *m*⁻¹ mod *n*. By the Bezout formula, if (*n*, *m*) = 1, then there are integers *x* and *y* such that *xm* + *yn* = 1; we usually use the extended Euclid algorithm to find *x* and *y*. Once we find *x*, we have in fact calculated *m*⁻¹ mod *n*. Under the above definitions and notations, we have

**Lemma 1.5** *(i) Suppose m and n are two positive integers, then*

*Time*(*calculate m* mod *n*) = *O*(log *n* · log *m*).

*(ii) Suppose m and n are two positive integers, and m* ≤ *n, then*

*Time*(*calculate* (*n*, *m*)) = *O*(log³ *n*).

*(iii) Suppose m and n are two positive integers, and* (*m*, *n*) = 1*, then*

*Time*(*calculate m*⁻¹ mod *n*) = *O*(log³ max(*n*, *m*)).

*(iv) Suppose n*, *m*, *b are positive integers, b* < *n, then*

$$\operatorname{Time}(b^m \bmod n) = O(\log m \cdot \log^2 n).$$

*Proof* To find the minimum nonnegative residue *r* of *m* mod *n* is actually a division with remainder:

$$m = kn + r, \ 0 \le r < n.$$

From the lemma 1.3,

$$\text{Time}(\text{calculate } m \bmod n) = O(\log n \cdot \log m),$$

(i) holds. The Euclid algorithm used to calculate the greatest common divisor (*n*, *m*) of *n* and *m* is in fact *O*(log *n*) divisions with remainder, so

Time(calculate (*n*, *m*)) = *O*(log³ *n*).

In the Euclid algorithm, we can obtain *x* and *y* by back-substitution from bottom to top, such that *xm* + *yn* = 1; this backward process is called the extended Euclid algorithm. Therefore, if *m* ≤ *n*, then

Time(calculate *m*⁻¹ mod *n*) = Time(calculate (*n*, *m*)) = *O*(log³ *n*).
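The extended Euclid algorithm just described can be sketched as follows (a minimal recursive version; the function names are ours):

```python
def ext_gcd(a, b):
    """Return (g, x, y) with g = (a, b) and x*a + y*b = g (Bezout formula)."""
    if b == 0:
        return a, 1, 0
    g, x, y = ext_gcd(b, a % b)
    # Back-substitute: g = x*b + y*(a mod b) = y*a + (x - (a//b)*y)*b.
    return g, y, x - (a // b) * y

def mod_inverse(m, n):
    """m^{-1} mod n, defined when (m, n) = 1."""
    g, x, _ = ext_gcd(m, n)
    if g != 1:
        raise ValueError("m has no inverse mod n")
    return x % n

g, x, y = ext_gcd(240, 46)
assert g == 2 and 240 * x + 46 * y == 2
assert (7 * mod_inverse(7, 26)) % 26 == 1
```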

(iv) concerns the computational complexity of the power of an integer mod *n*. The proof method is the famous "repeated square method". Let

> $m = (m_{k-1}m_{k-2} \cdots m_1 m_0)_2 = m_0 + 2m_1 + 4m_2 + \cdots + 2^{k-1} m_{k-1}$

be the binary representation of *m*, where *mᵢ* = 0 or 1. First, let *a* = 1. If *m*₀ = 1, replace *a* with *b*; if *m*₀ = 0, then *a* = 1 remains unchanged. Let *b*₁ = *b*² mod *n*; this is the first squaring. If *m*₁ = 1, replace *a* with *ab*₁ mod *n*; if *m*₁ = 0, *a* remains unchanged. Let *b*₂ = *b*₁² mod *n*; this is the second squaring. Continuing to the *j*th squaring, we have

$$b\_j \equiv b^{2^j} (\bmod n).$$

Our calculation ends after the (*k* − 1)th squaring; at this point,

$$a \equiv b^{m_0 + 2m_1 + 4m_2 + \dots + 2^{k-1}m_{k-1}} \equiv b^m \pmod{n}.$$

Obviously, the number of bit operations per squaring is $O((\log n^2)^2) = O(\log^2 n)$. There are *k* squaring operations in total, and *k* = *O*(log *m*), so (iv) holds. We have completed the proof.
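The repeated square method in this proof can be sketched directly (a minimal version scanning the binary digits of *m* from the lowest bit up; the function name is ours):

```python
def power_mod(b, m, n):
    """Repeated square method: b^m mod n, as in the proof of Lemma 1.5(iv)."""
    a = 1
    b %= n
    while m > 0:
        if m & 1:              # current binary digit m_i = 1: multiply in
            a = (a * b) % n
        b = (b * b) % n        # next square: b_{j+1} = b_j^2 mod n
        m >>= 1                # move to the next binary digit
    return a

assert power_mod(3, 218, 1000) == pow(3, 218, 1000)
print(power_mod(2, 10, 1000))   # 24, since 2^10 = 1024
```

Python's built-in `pow(b, m, n)` implements the same idea, which is why it serves as a cross-check here.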

**Definition 1.4** If an algorithm *f* involves positive integers *n*₁, *n*₂, ..., *n_r*, whose numbers of binary digits are *k*₁, *k*₂, ..., *k_r*, and there are absolute nonnegative integers *d*₁, *d*₂, ..., *d_r* such that

$$\text{Time}(f) = O(k_1^{d_1} k_2^{d_2} \cdots k_r^{d_r}), \tag{1.10}$$

then the complexity of algorithm *f* is called polynomial; otherwise, it is called nonpolynomial.

From Lemma 1.4, we can see that addition, subtraction, multiplication and division of positive integers are polynomial algorithms, while the computation of *n*! is the simplest example of a nonpolynomial algorithm. If we do not need the exact value of *n*! but only an approximation, we can obtain one by a polynomial-time algorithm based on the Stirling formula (see Sect. 1.4 of this chapter). In formula (1.10), if *d*₁ = *d*₂ = ··· = *d_r* = 0, the complexity of algorithm *f* is constant; if *d*₁ = *d*₂ = ··· = *d_r* = 1, the complexity of the algorithm *f* is said to be linear (and similarly quadratic, cubic, etc.). In order to characterize nonpolynomial algorithms, we introduce two concepts: exponential and subexponential algorithms.

**Definition 1.5** Suppose that an algorithm *f* involves a positive integer *n* whose number of binary digits is *k*. If

$$\text{Time}(f) = O(t^{g(k)}),\tag{1.11}$$

where *t* is a constant greater than 1 and *g*(*k*) is a polynomial function of *k* with deg *g* ≥ 1, then the computational complexity of *f* is exponential. If *g*(*k*) is not a polynomial function, but a function smaller than a polynomial, such as $e^{\sqrt{k \log k}}$, then the computational complexity of *f* is subexponential.

From the above definitions, we can determine the computational complexity of *n*!: let *k* be the number of binary digits of *n*; from Lemma 1.2, *n* = *O*(2*ᵏ*), and then from Lemma 1.4,

$$\text{Time}(n!) = O(n^2 k^2) = O(k^2 2^{2k}) = O(2^{3k}),$$

so the computational complexity of *n*! in the binary system is exponential. This is the simplest example of an exponential algorithm.

The bit-operation count not only defines computational complexity but also describes the running speed and time complexity of a computer. The so-called computer speed refers to the total number of bit operations that the computer can complete in unit time (such as one second, or one microsecond). With the hardware fixed, there is thus no difference between the computational complexity and the time complexity of an algorithm. Suppose a computer can complete 10⁶ bit operations in one second; when the binary size of the input is *k* = 10⁶, Table 1.1 lists the running times of algorithms with different computational complexities on this computer (Table 1.1).

Note that 1 year ≈ 3 × 10⁷ seconds, and the age of the universe is about 10¹⁰ years. When the number of binary digits *k* is large, an algorithm with exponential or subexponential computational complexity is practically impossible to complete on the computer; in that case, the only way to solve the problem is to improve the speed of the computer.
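The scale of these numbers is easy to recompute (our own back-of-the-envelope arithmetic for the setting of Table 1.1, a machine performing 10⁶ bit operations per second on inputs of *k* = 10⁶ bits; not the book's table itself):

```python
speed = 10**6        # bit operations per second
k = 10**6            # input size in bits
year = 3 * 10**7     # 1 year is roughly 3 * 10^7 seconds

print(k / speed)                  # O(k):   1.0 second
print(k**2 / speed / 86400)       # O(k^2): about 11.6 days
print(k**3 / speed / year)        # O(k^3): about 33,000 years
# O(2^k) would need 2**(10**6) / 10**6 seconds -- astronomically beyond
# the age of the universe (~10^10 years).
```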

Computational complexity is often used to describe the complexity of a problem, because when the computer hardware conditions (such as computing speed and storage capacity) remain unchanged, computational complexity is also time complexity.

**Table 1.1** Time requirements of algorithms with different computational complexity (*k* = 10⁶)

At present, the complexity of algorithms is defined in a model called the Turing machine. A Turing machine is a kind of finite state machine with unbounded read and write capability. If the result of each operation and the content of the next operation are uniquely determined, the Turing machine is called a deterministic Turing machine. Accordingly, a deterministic polynomial algorithm is one carried out on a deterministic Turing machine.

**Definition 1.6** If a problem can be solved by a polynomial algorithm on a deterministic Turing machine, it is called a *P*-class problem; *P*-class problems are often called tractable problems. If a problem can be solved by a polynomial algorithm on a nondeterministic Turing machine, it is called an *NP*-class problem.

By definition, a *P*-class problem is certainly an *NP*-class problem, since an algorithm that runs in polynomial time on a deterministic Turing machine also runs in polynomial time on a nondeterministic one. Conversely, is the *NP* class strictly larger than the *P* class? This is an open problem in theoretical computer science. There is neither a proof nor a counterexample showing that a problem solvable in polynomial time on a nondeterministic Turing machine cannot be solved in polynomial time on a deterministic one. It is widely conjectured that the *P* class and the *NP* class are not equal, and this conjecture is the cornerstone of many cryptosystems.

# **1.3 Jensen Inequality**

A real-valued function *f*(*x*) on the interval (*a*, *b*) is called a strictly convex function if for ∀ *x*₁, *x*₂ ∈ (*a*, *b*), λ₁ > 0, λ₂ > 0, λ₁ + λ₂ = 1, we have

$$
\lambda_1 f(x_1) + \lambda_2 f(x_2) \le f(\lambda_1 x_1 + \lambda_2 x_2),
$$

and equality holds if and only if $x\_1 = x\_2$. By induction, we can prove the following Jensen inequality.

**Lemma 1.6** *If $f(x)$ is a strictly convex function over $(a, b)$, then for any positive integer $n > 1$, any positive numbers $\lambda\_i\ (1 \le i \le n)$ with $\lambda\_1 + \lambda\_2 + \cdots + \lambda\_n = 1$, and any $x\_i \in (a, b)\ (1 \le i \le n)$, we have*

$$\sum\_{i=1}^{n} \lambda\_i f(\mathbf{x}\_i) \le f(\sum\_{i=1}^{n} \lambda\_i \mathbf{x}\_i),\tag{1.12}$$

*where equality holds if and only if $x\_1 = x\_2 = \cdots = x\_n$.*

*Proof* We argue by induction. The proposition holds when $n = 1$ and $n = 2$. Suppose the proposition holds for $n - 1$. When $n > 2$, let

$$x' = \frac{\lambda\_1}{\lambda\_1 + \lambda\_2}x\_1 + \frac{\lambda\_2}{\lambda\_1 + \lambda\_2}x\_2,$$

it can be seen that $x' \in (a, b)$ and $(\lambda\_1 + \lambda\_2)x' = \lambda\_1 x\_1 + \lambda\_2 x\_2$; therefore,

$$\begin{aligned} \sum\_{i=1}^n \lambda\_i f(\mathbf{x}\_i) &= \lambda\_1 f(\mathbf{x}\_1) + \lambda\_2 f(\mathbf{x}\_2) + \sum\_{i=3}^n \lambda\_i f(\mathbf{x}\_i) \\ &\le (\lambda\_1 + \lambda\_2) f(\mathbf{x}') + \sum\_{i=3}^n \lambda\_i f(\mathbf{x}\_i) \\ &\le f(\lambda\_1 \mathbf{x}\_1 + \lambda\_2 \mathbf{x}\_2 + \dots + \lambda\_n \mathbf{x}\_n) .\end{aligned}$$

where the last inequality follows from the induction hypothesis applied to the $n - 1$ points $x', x\_3, \ldots, x\_n$ with weights $\lambda\_1 + \lambda\_2, \lambda\_3, \ldots, \lambda\_n$. Hence the proposition holds for $n$, and inequality (1.12) follows.

From mathematical analysis, $f(x)$ is a strictly convex function in the interval $(a, b)$ if and only if $f''(x) < 0$. Take $f(x) = \log x$; then $f''(x) = -\frac{1}{x^2 \ln 2}$, so $\log x$ is a strictly convex function on the interval $(0, +\infty)$, and from the Jensen inequality we obtain the following.

**Lemma 1.7** *Let $g(x)$ be a positive function, that is, $g(x) > 0$. Then for any positive numbers $\lambda\_i\ (1 \le i \le n)$ with $\lambda\_1 + \lambda\_2 + \cdots + \lambda\_n = 1$, and any $a\_1, a\_2, \ldots, a\_n$, we have*

$$\sum\_{i=1}^{n} \lambda\_i \log \mathbf{g}(a\_i) \le \log \sum\_{i=1}^{n} \lambda\_i \mathbf{g}(a\_i),\tag{1.13}$$

*where equality holds if and only if $g(a\_1) = g(a\_2) = \cdots = g(a\_n)$.*

*Proof* Since $\log x$ is strictly convex, let $x\_i = g(a\_i)$; then $x\_i \in (0, +\infty)\ (1 \le i \le n)$, and by the Jensen inequality,


$$\begin{aligned} \sum\_{i=1}^n \lambda\_i \log g(a\_i) &= \sum\_{i=1}^n \lambda\_i \log x\_i \\ &\le \log(\sum\_{i=1}^n \lambda\_i x\_i) \\ &= \log(\sum\_{i=1}^n \lambda\_i g(a\_i)). \end{aligned}$$

So the lemma holds.

A real-valued function $f(x)$ is called a strictly convex function in the interval $(a, b)$ (now in the downward, more common sense) if for all $x\_1, x\_2 \in (a, b)$ and $\lambda\_1 > 0$, $\lambda\_2 > 0$ with $\lambda\_1 + \lambda\_2 = 1$, we have

$$f(\lambda\_1 \mathbf{x}\_1 + \lambda\_2 \mathbf{x}\_2) \le \lambda\_1 f(\mathbf{x}\_1) + \lambda\_2 f(\mathbf{x}\_2),$$

and equality holds if and only if $x\_1 = x\_2$. By induction, we can prove the following general inequality.

**Lemma 1.8** *If $f(x)$ is a strictly convex function in the interval $(a, b)$, then for any positive integer $n \ge 2$, any positive numbers $\lambda\_i\ (1 \le i \le n)$ with $\lambda\_1 + \lambda\_2 + \cdots + \lambda\_n = 1$, and any $x\_i \in (a, b)\ (1 \le i \le n)$, we have*

$$f(\sum\_{i=1}^{n} \lambda\_i \mathbf{x}\_i) \le \sum\_{i=1}^{n} \lambda\_i f(\mathbf{x}\_i),\tag{1.14}$$

*where equality holds if and only if $x\_1 = x\_2 = \cdots = x\_n$.*

We know that $f(x)$ is strictly convex in this sense in the interval $(a, b)$ if and only if $f''(x) > 0$. Let $f(x) = x \log x$; then $f''(x) = \frac{1}{x \ln 2} > 0$ for $x \in (0, +\infty)$. We then have the following logarithmic inequality.

**Lemma 1.9** *If $a\_1, a\_2, \ldots, a\_n$ and $b\_1, b\_2, \ldots, b\_n$ are two groups of positive numbers, then*

$$\sum\_{i=1}^{n} a\_i \log \frac{a\_i}{b\_i} \ge (\sum\_{i=1}^{n} a\_i) \log \frac{\sum\_{i=1}^{n} a\_i}{\sum\_{i=1}^{n} b\_i}.\tag{1.15}$$

*Proof* Since $f(x) = x \log x$ is a strictly convex function, Lemma 1.8 gives

$$f(\sum\_{i=1}^{n} \lambda\_i x\_i) \le \sum\_{i=1}^{n} \lambda\_i f(x\_i),$$

where $\sum\_{i=1}^{n} \lambda\_i = 1$. Take $\lambda\_i = \frac{b\_i}{\sum\_{j=1}^{n} b\_j}$ and $x\_i = \frac{a\_i}{b\_i}$; then

$$\frac{1}{\sum\_{j=1}^n b\_j} \sum\_{i=1}^n a\_i \log \frac{\sum\_{i=1}^n a\_i}{\sum\_{i=1}^n b\_i} \le \sum\_{i=1}^n \frac{a\_i}{\sum\_{j=1}^n b\_j} \log \frac{a\_i}{b\_i},$$

Multiplying both sides by $\sum\_{j=1}^{n} b\_j$ gives

$$(\sum\_{i=1}^n a\_i) \log \frac{\sum\_{i=1}^n a\_i}{\sum\_{i=1}^n b\_i} \le \sum\_{i=1}^n a\_i \log \frac{a\_i}{b\_i},$$

thus (1.15) holds.

The above inequality is called the log sum inequality; it is used frequently in information theory.
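The log sum inequality is straightforward to verify numerically. The sketch below (illustrative only; base-2 logarithms as in the text, random positive inputs) compares both sides of (1.15):

```python
import math
import random

def log_sum_lhs(a, b):
    # left-hand side of (1.15): sum_i a_i * log2(a_i / b_i)
    return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b))

def log_sum_rhs(a, b):
    # right-hand side of (1.15): (sum_i a_i) * log2(sum a_i / sum b_i)
    return sum(a) * math.log2(sum(a) / sum(b))

random.seed(0)
for _ in range(1000):
    n = random.randint(1, 10)
    a = [random.uniform(0.01, 10.0) for _ in range(n)]
    b = [random.uniform(0.01, 10.0) for _ in range(n)]
    # small tolerance for floating-point rounding
    assert log_sum_lhs(a, b) >= log_sum_rhs(a, b) - 1e-9
```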

# **1.4 Stirling Formula**

In number theory (see Apostol 1976), we can obtain average asymptotic formulas for some arithmetic functions by using the *Euler* summation formula; the most important for our purposes is the following *Stirling* formula. For all real numbers $x \ge 1$, we have

$$\sum\_{1 \le m \le x} \log m = x \log x - x + O(\log x),\tag{1.16}$$

where the *O*-constant is absolute. Taking $x = n \ge 1$ a positive integer, we obtain the *Stirling* formula

$$
\log n! = n \log n - n + O(\log n). \tag{1.17}
$$

In number theory, the *Stirling* formula appears in the more precise form below,

$$n! \approx \sqrt{2\pi n} (\frac{n}{e})^n$$

or

$$\lim\_{n \to \infty} \frac{n!}{\sqrt{2\pi n}(\frac{n}{e})^n} = 1.$$
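Both forms of the Stirling formula can be checked numerically. The sketch below uses natural logarithms (a change of base, like the $\frac{1}{2}\log(2\pi n)$ correction, is absorbed by the $O(\log n)$ term in (1.17)) and `math.lgamma` for $\ln n!$:

```python
import math

# log n! versus n*log(n) - n: the error should stay within O(log n)
for n in [10, 100, 1000]:
    exact = math.lgamma(n + 1)            # ln(n!)
    approx = n * math.log(n) - n
    assert abs(exact - approx) <= 2 * math.log(n) + 2

# sharper form: n! / (sqrt(2*pi*n) * (n/e)^n) -> 1
for n in [10, 100, 1000]:
    ratio = math.exp(math.lgamma(n + 1)
                     - 0.5 * math.log(2 * math.pi * n)
                     - n * math.log(n) + n)
    assert abs(ratio - 1) < 0.01
```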

**Lemma 1.10** *Let $0 \le m \le n$, where $n, m$ are nonnegative integers, and let $\binom{n}{m}$ be the binomial coefficient; then*

$$
\binom{n}{m} \le \frac{n^n}{m^m (n-m)^{n-m}}.\tag{1.18}
$$


*Proof*

$$n^n = (m + (n - m))^n \ge \binom{n}{m} m^m (n - m)^{n - m},$$

and (1.18) follows at once.

We define the binary entropy function $H(x)\ (0 \le x \le 1)$ as follows:

$$H(x) = \begin{cases} 0, & \text{if } x = 0, \\ -x\log x - (1 - x)\log(1 - x), & \text{if } 0 < x \le 1. \end{cases} \tag{1.19}$$

It is obvious that $H(x) = H(1 - x)$, so we only need to consider the case $0 \le x \le \frac{1}{2}$. $H(x)$ is the information entropy of the binary information space (see the example 3.5 in Sect. 1.1 of Chap. 3); its graph is shown in Fig. 1.1.

**Lemma 1.11** *Let $0 \le \lambda \le \frac{1}{2}$; then*

*(i)* $\sum\_{0 \le i \le \lambda n} \binom{n}{i} \le 2^{nH(\lambda)}$;

*(ii)* $\log \sum\_{0 \le i \le \lambda n} \binom{n}{i} \le nH(\lambda)$;

*(iii)* $\lim\_{n \to \infty} \frac{1}{n} \log \sum\_{0 \le i \le \lambda n} \binom{n}{i} = H(\lambda)$.

*Proof* We first prove (i); (ii) follows directly by taking the logarithm of (i).

$$\begin{split} 1 = (\lambda + (1 - \lambda))^n &\geq \sum\_{0 \leq i \leq \lambda n} \binom{n}{i} \lambda^i (1 - \lambda)^{n - i} \\ &= \sum\_{0 \leq i \leq \lambda n} \binom{n}{i} (1 - \lambda)^n (\frac{\lambda}{1 - \lambda})^i \\ &\geq \sum\_{0 \leq i \leq \lambda n} \binom{n}{i} (1 - \lambda)^n (\frac{\lambda}{1 - \lambda})^{\lambda n} \\ &= 2^{-nH(\lambda)} \sum\_{0 \leq i \leq \lambda n} \binom{n}{i} .\end{split}$$

To prove (iii), write $m = [\lambda n] = \lambda n + O(1)$. From (ii), we have

$$\frac{1}{n}\log\sum\_{0\le i\le \lambda n} \binom{n}{i} \le H(\lambda).$$

On the other hand,

$$
\frac{1}{n}\log\sum\_{0\le i\le \lambda n} \binom{n}{i} \ge \frac{1}{n}\log\binom{n}{m}
$$

$$
=\frac{1}{n}\{\log n! - \log m! - \log(n-m)!\}.
$$

From the *Stirling* formula (1.17), we have

$$\log n! - \log m! - \log(n-m)! = n\log n - m\log m - (n-m)\log(n-m) + O(\log n).$$

Hence

$$\begin{aligned} \frac{1}{n}\log\sum\_{0\le i\le\lambda n} \binom{n}{i} &\ge \log n - \lambda\log(\lambda n) - (1-\lambda)\log\big((1-\lambda)n\big) + O(\frac{\log n}{n}) \\ &= -\lambda\log\lambda - (1-\lambda)\log(1-\lambda) + O(\frac{\log n}{n}) \\ &= H(\lambda) + O(\frac{\log n}{n}). \end{aligned}$$

In the end, we have

$$\lim\_{n \to \infty} \frac{1}{n} \log \sum\_{0 \le i \le \lambda n} \binom{n}{i} = H(\lambda).$$

Lemma 1.11 holds.
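Lemma 1.11 is easy to check numerically; the sketch below (base-2 logarithms, modest illustrative values of $n$) verifies the bound (i) and the limit (iii):

```python
import math

def H(x):
    # binary entropy function (1.19), base-2 logarithm
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def tail(n, lam):
    # sum of binomial coefficients C(n, i) for 0 <= i <= lam * n
    return sum(math.comb(n, i) for i in range(int(lam * n) + 1))

# (i): the partial sum is at most 2^(n * H(lambda)) for lambda <= 1/2
for n in [10, 50, 200]:
    for lam in [0.1, 0.25, 0.5]:
        assert tail(n, lam) <= 2 ** (n * H(lam))

# (iii): (1/n) * log2(tail) approaches H(lambda) as n grows
n, lam = 2000, 0.3
assert abs(math.log2(tail(n, lam)) / n - H(lam)) < 0.01
```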

# **1.5** *n***-fold Bernoulli Experiment**

In a given probability space, let $x$ and $y$ be random events. We denote by $p(x)$ the probability that event $x$ occurs, by $p(xy)$ the probability that $x$ and $y$ occur jointly, and by $p(x|y)$ the probability that $x$ occurs given that event $y$ occurs, called the conditional probability. Obviously, we have the following multiplication formula:

$$p(\mathbf{x}\mathbf{y}) = p(\mathbf{y})p(\mathbf{x}|\mathbf{y}).\tag{1.20}$$

For two events $x$ and $y$: if $p(xy) = 0$, we say $x$ and $y$ are incompatible; if $p(xy) = p(x)p(y)$, we say the two events are independent of each other.

A finite set of events $\{x\_1, x\_2, \ldots, x\_n\}$ is called a complete event group if

$$\sum\_{i=1}^{n} p(x\_i) = 1,\text{ and } p(x\_i x\_j) = 0 \text{ when } i \neq j. \tag{1.21}$$

In a complete event group, we can assume that 0 < *p*(*xi*) ≤ 1(1 ≤ *i* ≤ *n*).

Total probability formula: If {*x*1, *x*2,..., *xn*} is a complete event group, *y* is any random event, then we have

$$p(\mathbf{y}) = \sum\_{i=1}^{n} p(\mathbf{y}x\_i) \tag{1.22}$$

and

$$p(\mathbf{y}) = \sum\_{i=1}^{n} p(\mathbf{x}\_i) p(\mathbf{y}|\mathbf{x}\_i). \tag{1.23}$$

**Lemma 1.12** *Let $\{x\_1, x\_2, \ldots, x\_n\}$ be a complete event group. The event $y$ can occur only jointly with exactly one $x\_i$, so for any $i$, $1 \le i \le n$, we have the following Bayes formula:*

$$p(\mathbf{x}\_i|\mathbf{y}) = \frac{p(\mathbf{x}\_i)p(\mathbf{y}|\mathbf{x}\_i)}{\sum\_{j=1}^n p(\mathbf{x}\_j)p(\mathbf{y}|\mathbf{x}\_j)}, \ 1 \le i \le n. \tag{1.24}$$

*Proof* From the product formula (1.20), we have

$$p(\mathbf{x}\_i \mathbf{y}) = p(\mathbf{y})p(\mathbf{x}\_i|\mathbf{y}) = p(\mathbf{x}\_i)p(\mathbf{y}|\mathbf{x}\_i).$$

and hence

$$p(\mathbf{x}\_i|\mathbf{y}) = \frac{p(\mathbf{x}\_i)p(\mathbf{y}|\mathbf{x}\_i)}{p(\mathbf{y})}.$$

Substituting the total probability formula (1.23) for $p(y)$, we obtain

$$p(\mathbf{x}\_i|\mathbf{y}) = \frac{p(\mathbf{x}\_i)p(\mathbf{y}|\mathbf{x}\_i)}{\sum\_{j=1}^n p(\mathbf{x}\_j)p(\mathbf{y}|\mathbf{x}\_j)}, \ 1 \le i \le n,$$

the Bayes formula (1.24) is proved.
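As an illustration of the total probability formula (1.23) and the Bayes formula (1.24), the following sketch uses made-up priors and likelihoods for a three-element complete event group:

```python
# Hypothetical numbers: a complete event group {x1, x2, x3} with priors p(x_i),
# and likelihoods p(y | x_i) of observing the event y under each x_i.
prior = [0.5, 0.3, 0.2]
likelihood = [0.1, 0.6, 0.3]

# total probability formula (1.23): p(y) = sum_i p(x_i) p(y | x_i)
p_y = sum(p * l for p, l in zip(prior, likelihood))

# Bayes formula (1.24): p(x_i | y) = p(x_i) p(y | x_i) / p(y)
posterior = [p * l / p_y for p, l in zip(prior, likelihood)]

# the posteriors form a probability distribution over the event group
assert abs(sum(posterior) - 1.0) < 1e-12
```

Here $p(y) = 0.29$, and the posterior probability of $x\_2$ rises from 0.3 to about 0.62, since $y$ is most likely to occur under $x\_2$.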

Now we discuss the *n*-fold Bernoulli experiment. In statistics, a trial with only two possible outcomes is called a Bernoulli trial, and an experiment satisfying the following conditions is called an *n*-fold Bernoulli experiment:

(1) Each trial has exactly two possible outcomes: $a$ or $\bar{a}$.

(2) The probability $p$ of $a$ occurring is the same in every trial.

(3) The trials are statistically independent.

(4) A total of $n$ trials are carried out.

**Lemma 1.13** (Bernoulli theorem) *In a Bernoulli trial, let the probability of event $a$ be $p$. Then in the $n$-fold Bernoulli experiment, the probability $B(k; n, p)$ that $a$ appears exactly $k\ (0 \le k \le n)$ times is*

$$B(k;n,p) = \binom{n}{k} p^k q^{n-k}, \; q = 1-p. \tag{1.25}$$

*Proof* Denote the outcome of the $i$-th Bernoulli trial by $x\_i$ ($x\_i = a$ or $\bar{a}$); then the $n$-fold Bernoulli experiment forms the following joint event $x$

$$\mathbf{x} = \mathbf{x}\_1 \mathbf{x}\_2 \cdots \mathbf{x}\_n, \ \mathbf{x}\_i = a \text{ or } \bar{a}.$$

By the independence of the trials, when exactly $k$ of the $x\_i$ equal $a$, the probability of $x$ is

$$p(\mathbf{x}) = p(\mathbf{x}\_1)p(\mathbf{x}\_2)\cdots p(\mathbf{x}\_n) = p^k q^{n-k}.$$

Obviously, the number of joint events $x$ in which exactly $k$ of the $x\_i$ equal $a$ is $\binom{n}{k}$, so

$$B(k;n,p) = \binom{n}{k} p^k q^{n-k}.$$

Lemma 1.13 holds.

In the same way, we can calculate the probability that event $a$ appears for the first time at the $k$-th trial in repeated Bernoulli trials.

**Lemma 1.14** *Suppose that $a$ and $\bar{a}$ are the two possible outcomes of a Bernoulli trial. Then the probability that $a$ appears for the first time at the $k$-th trial is $pq^{k-1}$.*

*Proof* Consider the joint event $x = x\_1 x\_2 \cdots x\_k$ formed by $k$ Bernoulli trials, where $x\_i = \bar{a}$ for the first $k - 1$ trials and $x\_k = a$; then

$$p(x) = p(x\_1) \cdots p(x\_{k-1})p(x\_k) = pq^{k-1}.$$

We have completed the proof.
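Lemma 1.14 describes the geometric distribution of the first success. A small simulation (illustrative parameter $p = 0.3$) agrees with $pq^{k-1}$:

```python
import random

p, q = 0.3, 0.7   # illustrative parameter p; q = 1 - p

# the probabilities p * q**(k-1) over k = 1, 2, ... sum to 1
assert abs(sum(p * q**(k - 1) for k in range(1, 200)) - 1.0) < 1e-12

# Monte Carlo check of P(first occurrence of a at trial k = 3) = p * q**2
random.seed(3)
def first_success():
    k = 1
    while random.random() >= p:   # event a occurs with probability p per trial
        k += 1
    return k

trials = [first_success() for _ in range(100_000)]
freq_k3 = sum(t == 3 for t in trials) / len(trials)
assert abs(freq_k3 - p * q**2) < 0.005
```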

The *n*-fold Bernoulli experiment is not only the most basic probability model in probability and statistics but also a common tool in the communications field. Next, we illustrate with the error probability of binary channel transmission.

*Example 1.1 (Error probability of a binary channel)* In binary channel transmission, a codeword $x$ of length $n$ is a vector $x = (x\_1, x\_2, \ldots, x\_n)$ in the $n$-dimensional vector space $\mathbb{F}\_2^n$, where $x\_i = 0$ or $1\ (1 \le i \le n)$. For convenience, we write $x = x\_1 x\_2 \cdots x\_n$. Due to channel interference, the characters 0 and 1 may be corrupted in transmission, that is, 0 becomes 1 or 1 becomes 0. Let the error probability be $p$ ($p$ may be very small), constant for each transmitted character. Under these assumptions, transmitting a codeword $x$ of length $n$ can be regarded as an $n$-fold Bernoulli experiment, and the probability $B(k; n, p)$ that $x$ suffers exactly $k\ (0 \le k \le n)$ errors in transmission is

$$B(k;n,p) = \binom{n}{k} p^k q^{n-k}, \; q = 1-p.$$
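For example, with illustrative parameters $n = 7$ and $p = 0.01$, the probability that a codeword arrives with at most one error, $B(0; n, p) + B(1; n, p)$, comes out to about 0.998; a sketch:

```python
import math

def B(k, n, p):
    # probability of exactly k errors among n transmitted characters (1.25)
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 7, 0.01   # illustrative: length-7 codeword, 1% character error rate

# the error counts 0..n form a complete distribution
assert abs(sum(B(k, n, p) for k in range(n + 1)) - 1.0) < 1e-12

# probability that the codeword arrives with at most one error
at_most_one = B(0, n, p) + B(1, n, p)
assert 0.99 < at_most_one < 1.0
```

A code that corrects a single error would therefore decode correctly with this probability.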

# **1.6 Chebyshev Inequality**

We call a variable $\xi$ taking real values in a probability space a random variable. For any real number $x \in (-\infty, +\infty)$, let $p(x)$ denote the probability that the random variable $\xi$ takes the value $x$, i.e.,

$$p(\mathbf{x}) = P\{\xi = \mathbf{x}\},\tag{1.26}$$

We call $p(x)$ the probability function of $\xi$. If $\xi$ takes only finitely many or countably many values, then $\xi$ is called a discrete random variable; otherwise, $\xi$ is called a continuous random variable. The distribution function $F(x)$ of a random variable $\xi$ is defined as

$$F(x) = P\{\xi \le x\}, \ x \in (-\infty, +\infty). \tag{1.27}$$

Obviously, the distribution function $F(x)$ of $\xi$ is a monotone increasing function on the whole real axis $(-\infty, +\infty)$, and it is right continuous, that is, $F(x\_0) = \lim\_{x \to x\_0 + 0} F(x)$. The probability distribution of a random variable $\xi$ is completely determined by its distribution function $F(x)$; in fact, for any $x$,

$$p(x) = P\{\xi = x\} = F(x) - F(x - 0).$$

Let $f(x)$ be a nonnegative integrable function on the real axis such that

$$F(\mathbf{x}) = \int\_{-\infty}^{\mathbf{x}} f(t) \, \mathbf{d}t,\tag{1.28}$$

then $f(x)$ is called the density function of the random variable $\xi$. Obviously, a density function satisfies:

$$f(\mathbf{x}) \ge 0, \ \forall \mathbf{x} \in (-\infty, +\infty), \ \int\_{-\infty}^{+\infty} f(\mathbf{x}) \, d\mathbf{x} = 1. \tag{1.29}$$

On the other hand, any function $f(x)$ satisfying (1.29) is the density function of some random variable. Here, we introduce several common continuous random variables and their probability distributions.

1. Uniform distribution (equal probability distribution)

A random variable $\xi$ taking values in the interval $[a, b]$ with equal probability is said to be uniformly distributed (a uniform random variable); its density function is

$$f(\mathbf{x}) = \begin{cases} \frac{1}{b-a}, & a \le \mathbf{x} \le b. \\ 0, & \text{otherwise.} \end{cases}$$

Its distribution function *F*(*x*) is

$$F(x) = \begin{cases} 0, & \text{when } x < a. \\ \frac{x - a}{b - a}, & \text{when } a \le x \le b. \\ 1, & \text{when } x > b. \end{cases}$$

2. Exponential distribution

The density function of random variable ξ is

$$f(x) = \begin{cases} \lambda e^{-\lambda x}, & \text{when } x \ge 0. \\ 0, & \text{when } x < 0. \end{cases}$$

where $\lambda > 0$ is a given parameter, and its distribution function is

$$F(x) = \begin{cases} 1 - \mathbf{e}^{-\lambda x}, & \text{when } x \ge 0. \\ 0, & \text{when } x < 0. \end{cases}$$

We call ξ an exponential distribution with parameter λ or a random variable with exponential distribution.
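Because $F(x) = 1 - e^{-\lambda x}$ has the explicit inverse $F^{-1}(u) = -\ln(1-u)/\lambda$, exponential samples can be generated by inverse-transform sampling (a standard technique, not from the text); the sample mean should approach the expectation $1/\lambda$:

```python
import math
import random

def exp_sample(lam, u):
    # invert F(x) = 1 - exp(-lam * x):  x = -ln(1 - u) / lam,  u in [0, 1)
    return -math.log(1.0 - u) / lam

random.seed(1)
lam = 2.0
samples = [exp_sample(lam, random.random()) for _ in range(200_000)]
mean = sum(samples) / len(samples)
# the expectation of an exponential variable with parameter lam is 1 / lam
assert abs(mean - 1.0 / lam) < 0.01
```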

3. Normal distribution

A continuous random variable ξ whose density function *f* (*x*) is defined as:

$$f(\mathbf{x}) = \frac{1}{\sqrt{2\pi}\sigma} \mathbf{e}^{-\frac{(\mathbf{x}-\mu)^2}{2\sigma^2}}, \ x \in (-\infty, +\infty).$$

where $\mu$ and $\sigma$ are constants, $\sigma > 0$. We say that $\xi$ obeys the normal distribution with parameters $\mu$ and $\sigma^2$, denoted $\xi \sim N(\mu, \sigma^2)$. By the Poisson integral,

$$\int\_{-\infty}^{+\infty} \mathbf{e}^{-x^2} \, \mathrm{d}x = \sqrt{\pi},$$

it is not hard to verify

$$\int\_{-\infty}^{+\infty} f(\mathbf{x}) \, \mathbf{dx} = 1.$$

The distribution function $F(x)$ of the normal distribution $N(\mu, \sigma^2)$ is

$$F(\mathbf{x}) = \frac{1}{\sqrt{2\pi}\sigma} \int\_{-\infty}^{\mathbf{x}} \mathbf{e}^{-\frac{(t-\mu)^2}{2\sigma^2}} \,\mathrm{d}t.$$

When $\mu = 0$, $\sigma = 1$, $N(0, 1)$ is called the standard normal distribution.

Let us define the mathematical expectation and variance of a random variable $\xi$, beginning with the mathematical expectation of a discrete random variable.

(1) Let ξ be a discrete random variable whose value space is {*x*1, *x*2,..., *xn*,...}. And let *p*(*xi*) = *P* {ξ = *xi*}. If

$$\sum\_{i=1}^{+\infty} |\mathbf{x}\_i| \, p(\mathbf{x}\_i) < \infty,$$

Then the mathematical expectation *E*(ξ ) of ξ is defined as

$$E\xi = E(\xi) = \sum\_{i=1}^{+\infty} x\_i p(x\_i). \tag{1.30}$$

(2) Let ξ be a continuous random variable and *f* (*x*) be its density function, if

$$\int\_{-\infty}^{+\infty} |x| f(x) dx < +\infty,$$

Then the mathematical expectation *E*(ξ ) of ξ is defined as

$$E\xi = E(\xi) = \int\_{-\infty}^{+\infty} x f(\mathbf{x})d\mathbf{x}.\tag{1.31}$$

(3) Let $h(x)$ be a real-valued function; then $h(\xi)$ is also a random variable, called a function of the random variable $\xi$. Its mathematical expectation is denoted $Eh(\xi)$.

**Lemma 1.15** *(1) Let* ξ *be a discrete random variable whose value space is*{*x*1, *x*2,...*, xn*,...}*, if E*(ξ ) *exists, then Eh*(ξ ) *also exists, and*

$$Eh(\xi) = \sum\_{i=1}^{+\infty} h(x\_i) p(x\_i).$$

*(2) If* ξ *is a continuous random variable, and E*(ξ ) *exists, then Eh*(ξ ) *also exists, and*

$$Eh(\xi) = \int\_{-\infty}^{\infty} h(x) f(x) dx.$$

*Proof* Let the value space of η = *h*(ξ ) be {*y*1, *y*2,..., *yn*,...}, then

$$P\left\{\eta = \mathbf{y}\_j\right\} = P(\bigcup\_{\substack{i=1\\h(\mathbf{x}\_i) = \mathbf{y}\_j}}^{+\infty} \left\{\xi = \mathbf{x}\_i\right\}) = \sum\_{\substack{i=1\\h(\mathbf{x}\_i) = \mathbf{y}\_j}}^{+\infty} P\left\{\xi = \mathbf{x}\_i\right\}.$$

By the definition of *E*(η), then

$$\begin{split} Eh(\xi) = E(\eta) &= \sum\_{j=1}^{+\infty} y\_j P\left\{\eta = y\_j\right\} \\ &= \sum\_{j=1}^{+\infty} y\_j \sum\_{\substack{i=1 \\ h(x\_i) = y\_j}}^{+\infty} P\left\{\xi = x\_i\right\} \\ &= \sum\_{i=1}^{+\infty} (\sum\_{\substack{j=1 \\ h(x\_i) = y\_j}}^{+\infty} y\_j) P\left\{\xi = x\_i\right\} \\ &= \sum\_{i=1}^{+\infty} h(x\_i) p(x\_i) .\end{split}$$

(2) can be proved in the same way.

The following basic properties of mathematical expectation are easy to prove.

**Lemma 1.16** *(1) If $\xi = c$ is constant, then $E(\xi) = c$. (2) If $a$ and $b$ are constants, then $E(a\xi + b) = aE(\xi) + b$. (3) If $a \le \xi \le b$, then $a \le E(\xi) \le b$.*

If the mathematical expectation $E(\xi)$ of a random variable exists, then $(\xi - E\xi)^2$ is also a random variable (take $h(x) = (x - a)^2$ with $a = E(\xi)$). We define the mathematical expectation $Eh(\xi)$ of this $h(\xi)$ as the variance of $\xi$, denoted $D(\xi)$, that is,

$$D(\xi) = E(\left(\xi - E\xi\right)^2).$$

and $\sigma = \sqrt{D(\xi)}$ is called the standard deviation of $\xi$. Here are some basic properties of the variance.

**Lemma 1.17** *(1) $D(\xi) = E(\xi^2) - E^2(\xi)$. (2) If $\xi = a$ is constant, then $D(\xi) = 0$. (3) $D(\xi + c) = D(\xi)$. (4) $D(c\xi) = c^2 D(\xi)$. (5) If $c \neq E\xi$, then $D(\xi) < E((\xi - c)^2)$.*

*Proof* (1) can be seen from the definition,

$$\begin{aligned} D(\xi) &= E\left(\left(\xi - E\xi\right)^2\right) \\ &= E\left(\xi^2 - 2\xi E\xi + E^2(\xi)\right) \\ &= E(\xi^2) - 2(E\xi)^2 + (E(\xi))^2 \\ &= E(\xi^2) - (E\xi)^2. \end{aligned}$$

(2) is trivial. Let us prove (3). By (1),

$$\begin{split} D(\xi+c) &= E\left( (\xi+c)^2 \right) - \left( E(\xi+c) \right)^2 \\ &= E(\xi^2 + 2c\xi + c^2) - \left( (E\xi)^2 + 2cE(\xi) + c^2 \right) \\ &= E(\xi^2) + 2cE(\xi) + c^2 - (E\xi)^2 - 2cE(\xi) - c^2 \\ &= E(\xi^2) - (E\xi)^2 = D(\xi). \end{split}$$

(4) can also be derived directly from (1). In fact,

$$\begin{split} D(c\xi) &= E(c^2\xi^2) - \left(E(c\xi)\right)^2 \\ &= c^2E(\xi^2) - c^2(E\xi)^2 \\ &= c^2D(\xi). \end{split}$$

To prove (5), we note from Lemma 1.16 that the mathematical expectation of $\xi - E\xi$ is 0, so if $c \neq E(\xi)$, then by (3), we have

$$D(\xi) = D(\xi - c) = E((\xi - c)^2) - (E(\xi - c))^2.$$

Since the last term of the above formula, $(E\xi - c)^2$, is not zero, we always have

$$D(\xi) < E((\xi - c)^2).$$

so (5) holds. This property shows that $E((\xi - c)^2)$ attains its minimum value $D(\xi)$ at $c = E\xi$. We have completed the proof.

Now we give the main result of this section, known in mathematics as the Chebyshev-type inequality. It is essentially a moment inequality, since the mathematical expectation $E\xi$ of a random variable $\xi$ is its first-order origin moment and the variance is its second-order central moment.

**Theorem 1.1** *Let $h(x)$ be a nonnegative real-valued function and $\xi$ a random variable for which the expectation $Eh(\xi)$ exists; then for any $\varepsilon > 0$, we have*

$$P\{h(\xi) \ge \varepsilon\} \le \frac{Eh(\xi)}{\varepsilon},\tag{1.32}$$

*and*

$$P\{h(\xi) > \varepsilon\} < \frac{Eh(\xi)}{\varepsilon}.\tag{1.33}$$

*Proof* We prove the theorem only for a continuous random variable $\xi$. Let $f(x)$ be the density function of $\xi$; then by Lemma 1.15,

$$\begin{aligned} Eh(\xi) &= \int\_{-\infty}^{+\infty} h(x) f(x) \, \mathrm{d}x \\ &\ge \int\_{h(x) \ge \varepsilon} h(x) f(x) \, \mathrm{d}x \\ &\ge \varepsilon \int\_{h(x) \ge \varepsilon} f(x) \, \mathrm{d}x \\ &= \varepsilon P\{h(\xi) \ge \varepsilon\}. \end{aligned}$$

so (1.32) holds. Similarly, we can prove (1.33).

From the theorem, different Chebyshev-type inequalities are obtained for different choices of $h$; replacing $h(\xi)$ by $(\xi - E\xi)^2$ gives the following corollary.

**Corollary 1.1** (Chebyshev) *If the variance D*(ξ ) *of the random variable* ξ *exists, then for any* ε > 0*, we have*

$$P\{|\xi - E\xi| \ge \varepsilon\} \le \frac{D(\xi)}{\varepsilon^2}.\tag{1.34}$$

*Proof* Take $h(\xi) = (\xi - E\xi)^2$ in Theorem 1.1; then $|\xi - E\xi| \ge \varepsilon$ if and only if $h(\xi) \ge \varepsilon^2$, and by definition, $Eh(\xi) = D(\xi)$. Thus

$$P\{|\xi - E\xi| \ge \varepsilon\} = P\{h(\xi) \ge \varepsilon^2\} \le \frac{Eh(\xi)}{\varepsilon^2} = \frac{D(\xi)}{\varepsilon^2}.$$

The Corollary holds.

**Corollary 1.2** (Chebyshev) *Suppose that both the expected value E*ξ *and the variance D*(ξ ) *of the random variable* ξ *exist, then for any k* > 0*, we have*

$$P\{|\xi - E\xi| \ge k\sqrt{D(\xi)}\} \le \frac{1}{k^2}.\tag{1.35}$$

*Proof* Take ε = *k* <sup>√</sup>*D*(ξ ) in Corollary 1.1, then

$$P\{|\xi - E\xi| \ge k\sqrt{D(\xi)}\} \le \frac{D(\xi)}{k^2 D(\xi)} = \frac{1}{k^2}.$$

Corollary 1.2 holds.

In mathematics, $\mu$ is often used for the expected value and $\sigma = \sqrt{D(\xi)}\ (\sigma \ge 0)$ for the standard deviation, that is,

$$
\mu = E\xi,\ \sigma = \sqrt{D(\xi)},\ \sigma \ge 0.
$$

Then the Chebyshev inequality of Corollary 1.2 can be written as follows:

$$P\{ |\xi - \mu| \ge k\sigma \} \le \frac{1}{k^2}.\tag{1.36}$$
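Chebyshev's inequality (1.36) can be checked empirically; the sketch below uses a uniform variable on $[0, 1]$ (illustrative choice, with $\mu = 1/2$ and $\sigma = \sqrt{1/12}$):

```python
import random

# Empirical check of Chebyshev's inequality (1.36).
random.seed(2)
n = 100_000
xs = [random.random() for _ in range(n)]   # uniform on [0, 1]
mu = 0.5
sigma = (1.0 / 12.0) ** 0.5                # standard deviation of uniform [0, 1]

for k in [1.5, 2.0, 3.0]:
    # observed frequency of |xi - mu| >= k * sigma
    freq = sum(abs(x - mu) >= k * sigma for x in xs) / n
    # (1.36) guarantees at most 1 / k^2 (small slack for sampling noise)
    assert freq <= 1 / k**2 + 0.01
```

The bound is far from tight here: for $k = 2$ the true probability is 0, while (1.36) only guarantees at most $1/4$.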

**Corollary 1.3** (Markov) *If for a positive integer $k \ge 1$ the expected value $E|\xi|^k$ of the random variable $\xi$ exists, then*

$$P\{|\xi| \geq \varepsilon\} \leq \frac{E|\xi|^k}{\varepsilon^k}.$$

*Proof* Take $h(\xi) = |\xi|^k$ in Theorem 1.1 and replace $\varepsilon$ with $\varepsilon^k$; the Markov inequality then follows directly from Theorem 1.1.

Next, we introduce several common discrete random variables with their probability distributions and calculate their expected values and variances.

*Example 1.2 (Degenerate distribution)* A random variable $\xi$ takes a constant value $a$ with probability 1, that is, $\xi = a$, $P\{\xi = a\} = 1$; such a $\xi$ is called degenerate. By Lemma 1.16 (1), $E\xi = a$, and its variance is $D(\xi) = 0$.

*Example 1.3 (Two point distribution)* A random variable ξ has only two values {*x*1, *x*2}, and its probability distribution is

$$P\{\xi = x\_1\} = p,\ P\{\xi = x\_2\} = 1 - p,\ 0 < p < 1.$$

We call $\xi$ a two-point distribution with parameter $p$; its mathematical expectation and variance are

$$\begin{cases} E(\xi) = \mathbf{x}\_1 p + \mathbf{x}\_2 (1 - p), \\ D(\xi) = p(1 - p)(\mathbf{x}\_1 - \mathbf{x}\_2)^2. \end{cases}$$

In particular, taking $x\_1 = 1$, $x\_2 = 0$, the expected value and variance of the two-point distribution are

$$E(\xi) = p, \ D(\xi) = p(1-p).$$

*Example 1.4 (Equal probability distribution)* Let a random variable $\xi$ take $n$ values $\{x\_1, x\_2, \ldots, x\_n\}$ with equal probability, that is,

$$P\{\xi = x\_i\} = \frac{1}{n}, \ 1 \le i \le n.$$

Then $\xi$ is said to obey the equal probability (uniform) distribution on the $n$ points $x\_1, x\_2, \ldots, x\_n$. The expected value and variance are

$$E(\xi) = \frac{1}{n} \sum\_{i=1}^{n} x\_i, \; D(\xi) = \frac{1}{n} \sum\_{i=1}^{n} (x\_i - E(\xi))^2.$$

*Example 1.5 (Binomial distribution)* In the $n$-fold Bernoulli experiment, the number of occurrences $\xi$ of event $a$ is a random variable taking values from 0 to $n$. Its probability distribution is (see the Bernoulli experiment above)

$$P\{\xi=k\} = b(k;n,p) = \binom{n}{k} p^k q^{n-k},$$

where $0 \le k \le n$ and $p$ is the probability of event $a$ occurring in each trial. We call $\xi$ a binomial distribution with parameters $n, p$, denoted $\xi \sim b(n, p)$. In fact, the probabilities $b(k; n, p)$ are the terms of the binomial expansion of $(p + q)^n$.

**Lemma 1.18** *Let* ξ ∼ *b*(*n*, *p*)*, then*

$$E(\xi) = np, \ D(\xi) = npq, \ q = 1 - p.$$


*Proof* By definition,

$$\begin{aligned} E(\xi) &= \sum\_{k=0}^{n} k b(k; n, p) = \sum\_{k=1}^{n} k \binom{n}{k} p^k q^{n-k} \\ &= np \sum\_{k=1}^{n} \binom{n-1}{k-1} p^{k-1} q^{(n-1)-(k-1)} \\ &= np \sum\_{k=0}^{n-1} \binom{n-1}{k} p^k q^{n-1-k} \\ &= np \sum\_{k=0}^{n-1} b(k; n-1, p) \\ &= np. \end{aligned}$$

Similarly, it can be calculated

$$E(\xi^2) = \sum\_{k=0}^{n} k^2 b(k; n, p) = n^2 p^2 + npq,$$

thus

$$D(\xi) = E(\xi^2) - \left(E(\xi)\right)^2 = npq.$$

This completes the proof.
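The formulas $E(\xi) = np$ and $D(\xi) = npq$ of Lemma 1.18 can be verified directly from the definition of expectation, for instance with the illustrative values $n = 20$, $p = 0.3$:

```python
import math

def b(k, n, p):
    # binomial probability (see Example 1.5)
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 20, 0.3   # illustrative parameters
E = sum(k * b(k, n, p) for k in range(n + 1))
E2 = sum(k * k * b(k, n, p) for k in range(n + 1))
D = E2 - E**2    # D(xi) = E(xi^2) - (E(xi))^2, Lemma 1.17 (1)

assert abs(E - n * p) < 1e-9               # E(xi) = np
assert abs(D - n * p * (1 - p)) < 1e-9     # D(xi) = npq
```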

**Lemma 1.19** *Let $p\_n$ be the probability of event $a$ in the $n$-fold Bernoulli experiment. If $np\_n \to \lambda$, then*

$$\lim\_{n \to \infty} b(k; n, p\_n) = \frac{\lambda^k}{k!} e^{-\lambda}.$$

*Proof* Write λ*<sup>n</sup>* = *npn*, then

$$\begin{aligned} b(k;n,p\_n) &= \binom{n}{k} (p\_n)^k (1-p\_n)^{n-k} \\ &= \frac{n(n-1)\cdots(n-(k-1))}{k!} (\frac{\lambda\_n}{n})^k (1-\frac{\lambda\_n}{n})^{n-k} \\ &= \frac{(\lambda\_n)^k}{k!} (1-\frac{1}{n}) \cdots (1-\frac{k-1}{n}) (1-\frac{\lambda\_n}{n})^{n-k} .\end{aligned}$$

Because for fixed $k$, $\lim\_{n \to \infty} (\lambda\_n)^k = \lambda^k$, and

$$\lim\_{n \to \infty} (1 - \frac{\lambda\_n}{n})^{n-k} = e^{-\lambda},$$

also

$$\lim\_{n \to \infty} (1 - \frac{1}{n})(1 - \frac{2}{n}) \cdots (1 - \frac{k - 1}{n}) = 1.$$

Hence

$$\lim\_{n \to \infty} b(k; n, p\_n) = \frac{\lambda^k}{k!} e^{-\lambda}.$$

So Lemma 1.19 holds.

*Example 1.6 (Poisson distribution)* A discrete random variable $\xi$ takes the values $0, 1, \ldots, n, \ldots$, and $\lambda \ge 0$ is a nonnegative real number. If the probability distribution of $\xi$ is

$$P\{\xi = k\} = p(k, \lambda) = \frac{\lambda^k}{k!}e^{-\lambda},$$

then $\xi$ is called a random variable obeying the Poisson distribution. It can be proved that the expected value and the variance of a Poisson random variable are both $\lambda$. When $p$ is very small, the random variable $\xi\_n$ of the $n$-fold Bernoulli experiment is close to a Poisson random variable $\xi$; in this case, the probability distribution $b(k; n, p)$ can be approximated by the Poisson distribution, that is,

$$b(k;n,p) \approx \frac{(np)^k}{k!}e^{-np}.$$
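The quality of this approximation can be checked numerically; with the illustrative values $n = 1000$ and $p = 0.005$ (so $np = 5$), each $b(k; n, p)$ agrees with the Poisson probability to within $10^{-3}$:

```python
import math

def b(k, n, p):
    # binomial probability (1.25)
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson(k, lam):
    # Poisson probability: lam^k / k! * e^(-lam)
    return lam**k / math.factorial(k) * math.exp(-lam)

n, p = 1000, 0.005   # illustrative: small p, lambda = n * p = 5
for k in range(15):
    assert abs(b(k, n, p) - poisson(k, n * p)) < 1e-3
```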

# **1.7 Stochastic Process**

A so-called stochastic process considers the statistical characteristics of a family of random variables $\{\xi\_i\}\_{i=1}^{n}$, which we may describe as an $n$-dimensional random vector. Let $\{\xi\_i\}\_{i=1}^{n}$ be $n$ compatible random variables on a given probability space; then $\xi = (\xi\_1, \xi\_2, \ldots, \xi\_n)$ is called an $n$-dimensional random vector taking values in $\mathbb{R}^n$.

A stochastic process, or an $n$-dimensional random vector $\xi = (\xi\_1, \xi\_2, \ldots, \xi\_n)$, is uniquely determined by the occurrence probabilities of the following joint events. Let $A(\xi\_i) \subset \mathbb{R}$ be the value space of the random variable $\xi\_i\ (1 \le i \le n)$; then for any $(x\_1, x\_2, \ldots, x\_n) \in A(\xi\_1) \times A(\xi\_2) \times \cdots \times A(\xi\_n) \subset \mathbb{R}^n$, the probability of the joint event is denoted by

$$p(\mathbf{x}\_1 \mathbf{x}\_2 \cdots \mathbf{x}\_n) = p((\mathbf{x}\_1, \mathbf{x}\_2, \dots, \mathbf{x}\_n)) = P\{\xi\_1 = \mathbf{x}\_1, \xi\_2 = \mathbf{x}\_2, \dots, \xi\_n = \mathbf{x}\_n\}.$$

**Definition 1.7** If for any $x\_i \in \mathbb{R}\ (1 \le i \le n)$, we have

$$p(\mathbf{x}\_1\mathbf{x}\_2\cdots\mathbf{x}\_n) = p(\mathbf{x}\_1)p(\mathbf{x}\_2)\cdots p(\mathbf{x}\_n).$$

then the stochastic process $\{\xi\_i\}\_{i=1}^{n}$ is called statistically independent.

Strictly speaking, each real number $x\_i$ in Definition 1.7 should belong to a Borel set on the line, to ensure that the event $\{\xi\_i = x\_i\}$ generated by $\xi\_i$ is an event in the given probability space.

Similarly, we can define a function $F(x\_1, x\_2, \ldots, x\_n)$ on $\mathbb{R}^n$ by

$$F(\mathbf{x}\_1, \mathbf{x}\_2, \dots, \mathbf{x}\_n) = P\{\xi\_1 \le \mathbf{x}\_1, \xi\_2 \le \mathbf{x}\_2, \dots, \xi\_n \le \mathbf{x}\_n\},$$

which is the distribution function of the random vector $\xi = (\xi\_1, \xi\_2, \ldots, \xi\_n)$. Its marginal distribution functions are

$$F\_i(\mathbf{x}\_i) = P\{\xi\_i \le \mathbf{x}\_i\} = F(+\infty, +\infty, \dots, \mathbf{x}\_i, +\infty, \dots, +\infty).$$

We state the following properties of stochastic processes without proof; the reader can find them in classical probability textbooks (see Rényi 1970; Li 2010; Long 2020).

**Lemma 1.20** *(1) A stochastic process $\{\xi\_i\}\_{i=1}^{n}$ is statistically independent if and only if*

$$F(x\_1, x\_2, \ldots, x\_n) = F\_1(x\_1) F\_2(x\_2) \cdots F\_n(x\_n).$$

*(2) Suppose $\{\xi_i\}_{i=1}^n$ is statistically independent; then for any real-valued functions $g_i(x)$, the process $\{g_i(\xi_i)\}_{i=1}^n$ is also statistically independent.*

*(3) If $\xi_1, \xi_2, \dots, \xi_n$ are $n$ random variables, then*

$$E(\xi\_1 + \xi\_2 + \dots + \xi\_n) = E(\xi\_1) + E(\xi\_2) + \dots + E(\xi\_n).$$

*(4) If $\{\xi_i\}_{i=1}^n$ is statistically independent and the expected value $E(\xi_i)$ of each random variable exists, then the mathematical expectation of the product $\xi_1 \xi_2 \cdots \xi_n$ exists, and*

$$E(\xi\_1 \xi\_2 \cdots \xi\_n) = E(\xi\_1) E(\xi\_2) \cdots E(\xi\_n).$$

**Definition 1.8** Let $\{\xi_i\}_{i=1}^{\infty}$ be a sequence of random variables and let $\xi$ be a given random variable. If for any $\varepsilon > 0$, we have

$$\lim\_{n \to \infty} P\{ |\xi\_n - \xi| > \varepsilon \} = 0,$$

then $\{\xi_n\}$ is said to converge to $\xi$ in probability, denoted as $\xi_n \xrightarrow{P} \xi$.

Obviously, $\xi_n \xrightarrow{P} \xi$ if and only if for any $\varepsilon > 0$, we have

$$\lim\_{n \to \infty} P\{ |\xi\_n - \xi| \le \varepsilon \} = 1.$$

If the probability of occurrence of an event is $p$, the frequency of the event in repeated statistical trials gradually approaches its probability $p$. The rigorous mathematical statement and proof of this fact constitute the Bernoulli law of large numbers.

**Theorem 1.2** (Bernoulli) *Let $\mu_n$ be the number of occurrences of event $a$ in an $n$-fold Bernoulli experiment, where the probability of occurrence of $a$ in each trial is $p$ $(0 < p < 1)$. Then the frequency $\frac{\mu_n}{n}$ of $a$ converges in probability to $p$; that is, for any $\varepsilon > 0$,*

$$\lim\_{n \to \infty} P\{ |\frac{\mu\_n}{n} - p| > \varepsilon \} = 0.$$

*Proof* Consider $\frac{\mu_n}{n}$ as a random variable; its expected value and variance are

$$E(\frac{\mu\_n}{n}) = \frac{1}{n}E(\mu\_n) = p$$

and

$$D(\frac{\mu\_n}{n}) = \frac{1}{n^2}D(\mu\_n) = \frac{pq}{n}, \ q = 1 - p,$$

respectively. By Chebyshev inequality (1.34), we have

$$P\{ |\frac{\mu\_n}{n} - p| > \varepsilon \} < \frac{pq}{n\varepsilon^2}.$$

For any given ε > 0, we have

$$\lim\_{n \to \infty} P\{ |\frac{\mu\_n}{n} - p| > \varepsilon \} = 0.$$

So Bernoulli's law of large numbers holds.

In order to better understand Bernoulli's law of large numbers, we can describe it in terms of a stochastic process. Define

$$
\xi\_i = \begin{cases} 1, & \text{if event } a \text{ occurs in the } i \text{-th experiment.}\\ 0, & \text{if event } a \text{ does not occur in the } i \text{-th experiment.} \end{cases}
$$

Then $\xi_i$ follows a two-point distribution with parameter $p$ (see Sect. 1.6, Example 1.3), and $\{\xi_i\}_{i=1}^{+\infty}$ is an independent and identically distributed stochastic process. Obviously,

$$\mu\_n = \sum\_{i=1}^n \xi\_i, \; E(\xi\_i) = p.$$

#### 1.7 Stochastic Process 29

So Bernoulli's law of large numbers can be rewritten as follows:

$$\lim\_{n \to \infty} P\left\{ |\frac{1}{n} \sum\_{i=1}^n \xi\_i - \frac{1}{n} E(\sum\_{i=1}^n \xi\_i)| < \varepsilon \right\} = 1,$$

where $\{\xi_i\}$ is a sequence of independent random variables with the same $0$–$1$ two-point distribution with parameter $p$. It is not difficult to generalize this conclusion to a more general case.
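The convergence of the frequency $\frac{\mu_n}{n}$ to $p$ is easy to observe numerically. The following is a minimal simulation sketch (the function name and the choice $p = 0.3$ are ours, not the book's):

```python
import random

def bernoulli_frequency(p, n, rng):
    """Run an n-fold Bernoulli experiment and return the frequency mu_n / n."""
    mu_n = sum(1 for _ in range(n) if rng.random() < p)
    return mu_n / n

rng = random.Random(0)  # fixed seed for reproducibility
p = 0.3
for n in (100, 10_000, 1_000_000):
    # |mu_n/n - p| shrinks as n grows, as Theorem 1.2 predicts
    print(n, abs(bernoulli_frequency(p, n, rng) - p))
```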

**Theorem 1.3** (Chebyshev's law of large numbers) *Let $\{\xi_i\}_{i=1}^{+\infty}$ be a sequence of independent random variables whose expected values $E(\xi_i)$ and variances $D(\xi_i)$ exist, with the variances uniformly bounded, i.e., $D(\xi_i) \le C$ for all $i \ge 1$. Then for any $\varepsilon > 0$, we have*

$$\lim\_{n \to \infty} P\{ |\frac{1}{n} \sum\_{i=1}^{n} \xi\_i - \frac{1}{n} \sum\_{i=1}^{n} E(\xi\_i)| < \varepsilon \} = 1.$$

*Proof* By Chebyshev inequality,

$$\begin{split} &P\{\left|\frac{1}{n}\sum\_{i=1}^{n}\xi\_{i} - \frac{1}{n}E(\sum\_{i=1}^{n}\xi\_{i})\right| \geq \varepsilon\} \\ &\leq \frac{D(\frac{1}{n}\sum\_{i=1}^{n}\xi\_{i})}{\varepsilon^{2}} \\ &= \frac{D(\sum\_{i=1}^{n}\xi\_{i})}{n^{2}\varepsilon^{2}} \\ &= \frac{1}{n^{2}\varepsilon^{2}}\sum\_{i=1}^{n}D(\xi\_{i}) \\ &\leq \frac{C}{n\varepsilon^{2}}. \end{split}$$

Therefore,

$$\lim\_{n \to \infty} P\{ |\frac{1}{n} \sum\_{i=1}^{n} \xi\_i - \frac{1}{n} \sum\_{i=1}^{n} E(\xi\_i)| \ge \varepsilon \} = 0.$$

That is, Theorem 1.3 holds.

Chebyshev's law of large numbers is more general than Bernoulli's. It can be understood as follows: for a sequence of independent random variables $\{\xi_i\}$, the arithmetic mean of the random variables converges in probability to the arithmetic mean of their expected values.

As a special case, we consider an independent and identically distributed stochastic process $\{\xi_i\}$. Since the variables share the same probability distribution, they have the same expectation and variance.

**Corollary 1.4** *Let $\{\xi_i\}$ be an independent and identically distributed random process with common expectation $\mu$ and variance $\sigma^2$, that is, $E(\xi_i) = \mu$, $D(\xi_i) = \sigma^2$ $(i = 1, 2, \dots)$; then we have*

$$\lim\_{n \to \infty} P\{ |\frac{1}{n} \sum\_{i=1}^n \xi\_i - \mu| < \varepsilon \} = 1,$$

*that is, $\frac{1}{n} \sum_{i=1}^{n} \xi_i \xrightarrow{P} \mu$.*

In the above corollary, the existence of the variance is unnecessary: Khinchin proved that for an independent and identically distributed stochastic process $\{\xi_i\}$, as long as the expected value $E(\xi_i) = \mu$ exists, $\frac{1}{n} \sum_{i=1}^{n} \xi_i$ converges to $\mu$ in probability. This conclusion is called Khinchin's law of large numbers.

Finally, we state the Lindeberg–Lévy central limit theorem without proof.

**Theorem 1.4** (Central limit theorem) *Let $\{\xi_i\}_{i=1}^{+\infty}$ be an independent and identically distributed stochastic process with expected value $E(\xi_i) = \mu$ and variance $D(\xi_i) = \sigma^2 > 0$ $(i = 1, 2, \dots)$. Then for any $x$, we have*

$$\lim\_{n \to \infty} P\{\frac{\sum\_{i=1}^{n} \xi\_i - n\mu}{\sigma \sqrt{n}} \le x\} = \frac{1}{\sqrt{2\pi}} \int\_{-\infty}^{x} e^{-\frac{t^2}{2}} \,\mathrm{d}t\,,$$

*That is, the standardized variable of the sum $\sum_{i=1}^{n} \xi_i$ converges in distribution to the standard normal distribution $N(0, 1)$.*
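A quick numerical illustration of Theorem 1.4 (our own sketch, not from the text): summing i.i.d. Uniform(0, 1) variables, whose mean is $\mu = 1/2$ and variance $\sigma^2 = 1/12$, the standardized sums should be distributed approximately as $N(0, 1)$.

```python
import math
import random

def standardized_sum(n, rng):
    """(S_n - n*mu) / (sigma * sqrt(n)) for S_n a sum of n Uniform(0,1) variables."""
    mu, sigma = 0.5, math.sqrt(1 / 12)
    s = sum(rng.random() for _ in range(n))
    return (s - n * mu) / (sigma * math.sqrt(n))

rng = random.Random(1)
samples = [standardized_sum(30, rng) for _ in range(20_000)]
# Empirical P{Z <= 0} should be close to Phi(0) = 1/2
prop = sum(1 for z in samples if z <= 0) / len(samples)
print(prop)
```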

**Exercise 1** (Nie and Ding 2000)


$$(a^{2^{n}} + 1, \; a^{2^{m}} + 1) = 1 \text{ or } 2, \quad m \neq n.$$

Thus prove Polya's theorem: there are infinitely many primes.

6. On the positive integer set, the Möbius function μ(*n*) is defined as


$$\mu(n) = \begin{cases} 1, & \text{when } n = 1, \\ 0, & \text{when } n \text{ contains a square factor,} \\ (-1)^t, & \text{when } n = p\_1 p\_2 \cdots p\_t, \; p\_i \text{ are distinct primes.} \end{cases}$$

Prove Möbius identity

$$\sum\_{d|n} \mu(d) = \begin{cases} 1, & \text{when } n = 1, \\ 0, & \text{when } n > 1. \end{cases}$$

7. Suppose $\varphi(n)$ is the Euler function; prove

$$
\varphi(n) = n \sum\_{d|n} \frac{\mu(d)}{d}.
$$

8. Let $n > 1$ be a positive integer; prove Wilson's theorem:

$$(n-1)! + 1 \equiv 0 \ (\text{mod } n)$$

if and only if *n* is prime.

9. Let $n$ and $b$ be positive integers, $n > b > 1$; prove that $n$ can be uniquely expressed as the following $b$-ary number:

$$n = b\_0 + b\_1 b + b\_2 b^2 + \dots + b\_{r-1} b^{r-1}, \text{ where } 0 \le b\_i < b, \ b\_{r-1} \neq 0, \ r \ge 1.$$

$n = (b_{r-1} b_{r-2} \cdots b_1 b_0)_b$ is called the $b$-ary expression of $n$, and $r$ is called the number of $b$-ary digits of $n$.

10. Let $f(n)$ be a complex-valued function on the set of positive integers, and prove the Möbius inversion formula:

$$F(n) = \sum\_{d|n} f(d), \ \forall n \ge 1 \Leftrightarrow f(n) = \sum\_{d|n} \mu(d) F(\frac{n}{d}), \ \forall n \ge 1.$$

11. Prove the following sum formula:

$$\sum\_{\substack{1 \le r \le n \\ (r,n)=1}} r = \frac{n\varphi(n)}{2}.$$


$$\int\_{-\infty}^{+\infty} |F(x+a) - F(x)| dx = a.$$

21. (Generalization of Bernoulli's law of large numbers) Let $\mu_n$ be the number of occurrences of event $A$ in the first $n$ trials of a sequence of independent Bernoulli experiments, where the probability of occurrence of $A$ in the $i$-th trial is $p_i$. Write down the corresponding law of large numbers and prove it.
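Exercises 6, 7 and 10 above lend themselves to numerical spot checks. The sketch below (helper names are ours) implements the Möbius function by trial division and verifies the Möbius identity and the formula for Euler's function on small arguments:

```python
from math import gcd

def mobius(n):
    """Möbius function mu(n): 0 if n has a square factor, else (-1)^t
    where t is the number of distinct prime factors."""
    if n == 1:
        return 1
    t, d = 0, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:   # square factor found
                return 0
            t += 1
        d += 1
    if n > 1:                # one remaining prime factor
        t += 1
    return (-1) ** t

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def phi(n):
    """Euler's function via exercise 7: phi(n) = n * sum_{d|n} mu(d)/d."""
    return round(n * sum(mobius(d) / d for d in divisors(n)))

# Möbius identity (exercise 6): sum_{d|n} mu(d) = 1 if n == 1 else 0
assert sum(mobius(d) for d in divisors(1)) == 1
assert all(sum(mobius(d) for d in divisors(n)) == 0 for n in range(2, 200))

# Compare with the direct definition of phi (count of r <= n coprime to n)
assert all(phi(n) == sum(1 for r in range(1, n + 1) if gcd(r, n) == 1)
           for n in range(1, 100))
```

Exercise 11 can be checked the same way: for every `n > 1`, `sum(r for r in range(1, n + 1) if gcd(r, n) == 1)` equals `n * phi(n) // 2`.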

# **References**

Apostol, T. M. (1976). *Introduction to analytic number theory*. Springer.


Li, X. (2010). *Basic probability theory*. Higher Education Press (in Chinese).


Nie, L., & Ding, S. (2000). *Introduction to algebra*. Higher Education Press (in Chinese).


Rosen, M. H. (2002). *Number theory in function fields*. Springer.


van der Waerden, B. L. (1976). *Algebra (II)*. Translated by Xihua Cao, Kencheng Zeng, Fuxin Hao. Beijing: Science Press (in Chinese).

van Lint, J. H. (1991). *Introduction to coding theory*. Springer.

**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Chapter 2 The Basis of Code Theory**

The medium of information transmission is called the channel for short. Commonly used channels include cable, optical fiber, radio-wave media and carrier lines, as well as storage media such as tape and optical disk. The channel constitutes the physical condition for social information to be exchanged across space and time. In addition, for a piece of social information, such as language information, picture information or data information, to be exchanged across time and space, information coding is the basic technical means. What is information coding? In short, it is the process of digitizing all kinds of social information. Digitization is not a simple numerical substitution of social information, but is full of profound mathematical principles and elegant mathematical techniques. For example, source coding, used for data compression and storage, applies the principles of probability and statistics to attach the required statistical characteristics to social information, so source codes are also called random codes. The other kind is channel coding, which is used to overcome channel interference. These codes are full of beautiful algebraic, geometric and combinatorial techniques aimed at improving the accuracy of information transmission, so channel codes are also called algebraic combinatorial codes. The main purpose of this chapter is to introduce the basic theory of channel coding. Source coding will be introduced in Chap. 3.

With the hardware support of channel and the software technology of information coding, we can implement the long-distance exchange of various social information across time and space. Taking channel coding as an example, this process can be described as the following diagram (Fig. 2.1).

In 1948, the American mathematician Shannon published his pioneering paper "A Mathematical Theory of Communication" in the Bell System Technical Journal, marking the advent of the era of electronic information. In this paper, Shannon used probability theory to prove the existence of "good codes" whose rate is arbitrarily close to the channel capacity and whose transmission error probability is arbitrarily small (see Theorem 2.10 in this chapter); on the other hand, if the transmission

**Fig. 2.1** Channel Coding

error probability is arbitrarily small, the code rate (transmission efficiency) does not exceed an upper bound (channel capacity) (see Theorem in Chap. 3). This upper bound is called Shannon's limit, which is regarded as the golden rule in the field of electronic communication engineering technology.

Shannon's theorem is an existence proof rather than a constructive one. How to construct such good codes, which both ensure communication efficiency (a code rate as large as possible) and control the transmission error rate, has been the unremitting goal ever since the advent of Shannon's theory. From Hamming and Golay to Elias, Goppa, Berrou and the Turkish mathematician Arikan, and from Hamming codes and Golay codes to convolutional codes, turbo codes and polar codes, over the past decades electronic communication has reached one peak after another, creating one technological miracle after another, up to today's 5G era. In 1969, the U.S. Mars probe used the Hadamard code to transmit image information, and for the first time mankind was lucky enough to witness beautiful pictures from outer space; in 1971, the U.S. Jupiter and Saturn probe used the famous Golay code G23 to send hundreds of frames of color photos of Jupiter and Saturn back to Earth. Seventy years of exploration of channel coding is a magnificent chapter in the history of electronic communication.

The main purpose of this chapter is to rigorously define and prove the mathematical properties of general codes, so as to provide a solid mathematical foundation for further study of coding technology and cryptography. This chapter covers the Hamming distance, the Lee distance, linear codes, some typical good codes, the MacWilliams theorem and the famous Shannon coding theorem. Mastering the content of this chapter will give the reader a basic and comprehensive understanding of channel coding theory (error-correcting codes).

# **2.1 Hamming Distance**

In channel coding, the alphabet is usually a finite field $\mathbb{F}_q$ of $q$ elements, where $q$ is a power of a prime, and sometimes a ring $\mathbb{Z}_m$. Let $n \geq 1$ be a positive integer; then $\mathbb{F}_q^n$ is an $n$-dimensional linear space over $\mathbb{F}_q$, also called the codeword space:

$$\mathbb{F}\_q^n = \{x = (x\_1, x\_2, \dots, x\_n) \mid x\_i \in \mathbb{F}\_q\}.$$

A vector $x = (x_1, x_2, \dots, x_n)$ in $\mathbb{F}_q^n$ is called a codeword of length $n$. For convenience, we write a codeword $x$ as $x = x_1 x_2 \dots x_n$; each $x_i \in \mathbb{F}_q$ is called a character. The zero codeword is denoted by $0 = (0, 0, \dots, 0)$.

The Hamming distance of two codewords $x = x_1 x_2 \dots x_n$ and $y = y_1 y_2 \dots y_n$ is defined as the number of characters in which $x$ and $y$ differ, that is,

$$d(\mathbf{x}, \mathbf{y}) = \#\{i | 1 \le i \le n, \mathbf{x}\_i \ne \mathbf{y}\_i\}. \tag{2.1}$$

Obviously $0 \leq d(x, y) \leq n$ is a nonnegative integer. The weight of a codeword $x \in \mathbb{F}_q^n$ is defined as $w(x) = d(x, 0)$, that is, the Hamming distance between $x$ and $0$. The following properties are obvious.

**Property 2.1** *If $x, y \in \mathbb{F}_q^n$, then*


*Property (i) is called nonnegativity, property (ii) symmetry and property (iv) translation invariance. These are the basic properties of a distance function in mathematics, and we can make an analogy with the distance between two points in the plane or in Euclidean space.*

**Lemma 2.1** *Let $x, y \in \mathbb{F}_q^n$ be two codewords; then*

$$w(\mathfrak{x} \pm \mathfrak{y}) \leqslant w(\mathfrak{x}) + w(\mathfrak{y}).$$

*Proof* Since $w(-x) = w(x)$, we have $w(x - y) = w(x + (-y))$, so we need only prove $w(x + y) \leq w(x) + w(y)$. Let $x = x_1 \dots x_n$, $y = y_1 \dots y_n$; then

$$
x + y = (x\_1 + y\_1)(x\_2 + y\_2) \dots (x\_n + y\_n).
$$

Obviously, if $x_i + y_i \neq 0$, then $x_i \neq 0$ or $y_i \neq 0$ $(1 \leq i \leq n)$. Thus $w(x + y) \leq w(x) + w(y)$, and therefore

$$w(x - y) = w(x + (-y)) \leq w(x) + w(-y) = w(x) + w(y).$$

We have completed the proof.

**Lemma 2.2** (Triangle inequality) *If $x, y, z \in \mathbb{F}_q^n$ are three codewords, then*

$$d(\mathbf{x}, \mathbf{y}) \le d(\mathbf{x}, \mathbf{z}) + d(\mathbf{z}, \mathbf{y}).$$

*Proof* By Lemma 2.1, for any $z \in \mathbb{F}_q^n$ we have

$$w(\mathbf{x} - \mathbf{y}) \le w(\mathbf{x} - \mathbf{z}) + w(\mathbf{z} - \mathbf{y}).$$

Then by property (v), *d*(*x*, *y*) = w(*x* − *y*), we have

$$d(\mathbf{x}, \mathbf{y}) \le d(\mathbf{x}, z) + d(z, \mathbf{y}).$$

The Lemma holds.
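In code, the Hamming distance (2.1) and the weight $w(x) = d(x, 0)$ are one-liners; a minimal sketch (function names are ours):

```python
def hamming_distance(x, y):
    """d(x, y): number of positions in which the codewords differ (Eq. 2.1)."""
    assert len(x) == len(y)
    return sum(1 for a, b in zip(x, y) if a != b)

def weight(x):
    """w(x) = d(x, 0): number of nonzero characters of x."""
    return sum(1 for a in x if a != 0)

# Triangle inequality of Lemma 2.2 on a small example over F_3
x, y, z = (0, 1, 2, 1), (1, 1, 0, 1), (1, 2, 0, 1)
assert hamming_distance(x, y) <= hamming_distance(x, z) + hamming_distance(z, y)
```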

The nonnegativity, symmetry and translation invariance of the Hamming distance, together with the triangle inequality of Lemma 2.2, show that the Hamming distance between two codewords behaves like the distance between two points in physical space: it is a genuine distance function in the mathematical sense. Similarly, we can define the concept of a ball. The Hamming sphere with radius $\rho$ centered at the codeword $x$ is defined as

$$B\_{\rho}(\mathbf{x}) = \{ \mathbf{y} | \mathbf{y} \in \mathbb{F}\_q^n, d(\mathbf{x}, \mathbf{y}) \leqslant \rho \},\tag{2.2}$$

where ρ is a nonnegative integer. Obviously, *B*0(*x*) = {*x*} contains only one codeword.

**Lemma 2.3** *For any $x \in \mathbb{F}_q^n$ and $0 \leq \rho \leq n$, we have*

$$|\mathcal{B}\_{\rho}(x)| = \sum\_{i=0}^{\rho} \binom{n}{i} (q-1)^i,\tag{2.3}$$

*where* |*B*ρ(*x*)| *is the number of codewords in Hamming ball B*ρ(*x*)*.*

*Proof* Let $x = x_1 x_2 \dots x_n$ and fix $i$ with $0 \leq i \leq \rho$; let

$$A\_i = \#\{\mathbf{y} \in \mathbb{F}\_q^n | d(\mathbf{y}, \mathbf{x}) = i\}.$$

Obviously,

$$A\_i = \binom{n}{i} (q-1)^i,$$

so

$$|B\_{\rho}(x)| = \sum\_{i=0}^{\rho} A\_i = \sum\_{i=0}^{\rho} \binom{n}{i} (q-1)^i.$$

**Corollary 2.1** *For any $x \in \mathbb{F}_q^n$, we have*

$$|B\_{\rho}(\mathbf{x})| = |B\_{\rho}(\mathbf{0})|.$$

*That is to say, the number of codewords in B*ρ(*x*) *is a constant which only depends on radius* ρ*. This constant is usually denoted as B*ρ*.*
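The counting formula (2.3) is easy to confirm by exhaustive enumeration for small parameters; a sketch (names are ours):

```python
from itertools import product
from math import comb

def ball_size(n, q, rho):
    """|B_rho(x)| by formula (2.3); independent of the center x."""
    return sum(comb(n, i) * (q - 1) ** i for i in range(rho + 1))

def ball_size_brute(n, q, rho, center):
    """Count the y in F_q^n with d(center, y) <= rho directly."""
    return sum(
        1
        for y in product(range(q), repeat=n)
        if sum(1 for a, b in zip(center, y) if a != b) <= rho
    )

# The count does not depend on the center, as Corollary 2.1 states
for rho in range(5):
    assert ball_size(4, 3, rho) == ball_size_brute(4, 3, rho, (0, 1, 2, 0))
```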

**Definition 2.1** If $C \subset \mathbb{F}_q^n$, then $C$ is called a $q$-ary code, or code for short; $|C|$ is the number of codewords in $C$. If $|C| = 1$, we call $C$ a trivial code; all the codes we discuss are nontrivial.

For a code $C$, the following five mathematical quantities are of basic importance.

#### **Definition 2.2** If *C* is a code, define

- Code rate of $C$: $R = R_C = \frac{1}{n} \log_q |C|$
- Minimum distance of $C$: $d = \min\{d(x, y) \mid x, y \in C, \ x \neq y\}$
- Minimum weight of $C$: $w = \min\{w(x) \mid x \in C, \ x \neq 0\}$
- Covering radius of $C$: $\rho = \max\{\min\{d(x, c) \mid c \in C\} \mid x \in \mathbb{F}_q^n\}$
- Disjoint radius of $C$: $\rho_1 = \max\{r \mid 0 \leq r \leq n, \ B_r(c_1) \cap B_r(c_2) = \emptyset, \ \forall c_1, c_2 \in C, \ c_1 \neq c_2\}$
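For small codes, all five quantities of Definition 2.2 can be found by brute force over $\mathbb{F}_q^n$; a sketch (the function is our own illustration, not the book's):

```python
from itertools import product
from math import log

def hd(x, y):
    """Hamming distance between equal-length tuples."""
    return sum(1 for a, b in zip(x, y) if a != b)

def code_parameters(C, q):
    """Brute-force the five quantities of Definition 2.2 for a small code C in F_q^n."""
    C = list(C)
    n = len(C[0])
    space = list(product(range(q), repeat=n))
    R = log(len(C), q) / n                                     # code rate
    d = min(hd(x, y) for x in C for y in C if x != y)          # minimum distance
    w = min(sum(1 for a in x if a != 0) for x in C if any(x))  # minimum weight
    rho = max(min(hd(x, c) for c in C) for x in space)         # covering radius

    def balls_disjoint(r):
        return all(
            not any(hd(y, c1) <= r and hd(y, c2) <= r for y in space)
            for i, c1 in enumerate(C)
            for c2 in C[i + 1:]
        )

    rho1 = max(r for r in range(n + 1) if balls_disjoint(r))   # disjoint radius
    return R, d, w, rho, rho1

# Repetition code of length 3 over F_2
R, d, w, rho, rho1 = code_parameters([(0, 0, 0), (1, 1, 1)], 2)
```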

It is important for the study of codes to discuss the relationships among the above five quantities. We begin by proving Lemma 2.4.

**Lemma 2.4** *Let d be minimum distance of C,* ρ<sup>1</sup> *be disjoint radius of C, then*

$$d = 2\rho\_1 + 1 \quad \text{or} \quad d = 2\rho\_1 + 2.$$

*Proof* It suffices to prove $2\rho_1 + 1 \leq d \leq 2\rho_1 + 2$. If $d \leq 2\rho_1$, then there are codewords $c_1, c_2 \in C$, $c_1 \neq c_2$, such that

$$d(c\_1, c\_2) \leqslant 2\rho\_1.$$

This means that $c_1$ and $c_2$ differ in at most $2\rho_1$ characters. Without loss of generality, we may assume that $c_1$ and $c_2$ differ only in the first $2\rho_1$ characters, that is,

$$\begin{cases} c\_1 = a\_1 a\_2 \dots a\_{\rho\_1} a\_{\rho\_1 + 1} \dots a\_{2\rho\_1} \ast \ast \cdot \ast \ast \\ c\_2 = b\_1 b\_2 \dots b\_{\rho\_1} b\_{\rho\_1 + 1} \dots b\_{2\rho\_1} \ast \ast \cdot \cdot \ast \ast \end{cases}$$

where $*$ represents characters that are the same in both codewords. Now put

$$x = a\_1 a\_2 \dots a\_{\rho\_1} b\_{\rho\_1 + 1} \dots b\_{2\rho\_1} \ast \dots \ast,$$

this shows that

$$d(x, c\_1) \leq \rho\_1, \quad d(x, c\_2) \leq \rho\_1.$$

That is

$$
x \in B\_{\rho\_1}(c\_1) \cap B\_{\rho\_1}(c\_2).
$$

This contradicts $B_{\rho_1}(c_1) \cap B_{\rho_1}(c_2) = \emptyset$, so we have $d \geq 2\rho_1 + 1$. If $d > 2\rho_1 + 2 = 2(\rho_1 + 1)$, then we can prove the following formula, which contradicts the definition of the disjoint radius $\rho_1$:

$$B\_{\rho\_1+1}(c\_1) \cap B\_{\rho\_1+1}(c\_2) = \phi,\ \forall c\_1,\ c\_2 \in \mathcal{C},\ c\_1 \neq c\_2.$$

Indeed, if the above formula does not hold, then there exist $c_1, c_2 \in C$, $c_1 \neq c_2$, such that $B_{\rho_1+1}(c_1)$ intersects $B_{\rho_1+1}(c_2)$; we may take

$$
x \in B\_{\rho\_1+1}(c\_1) \cap B\_{\rho\_1+1}(c\_2),
$$

and then the triangle inequality of Lemma 2.2 gives

$$d(c\_1, c\_2) \leq d(c\_1, x) + d(c\_2, x) \leq 2(\rho\_1 + 1),$$

which contradicts the hypothesis $d > 2(\rho_1 + 1)$. So we have $2\rho_1 + 1 \leq d \leq 2\rho_1 + 2$. The Lemma holds.

In order to discuss the geometric meaning of the covering radius $\rho$, we consider the set $\{B_\rho(c) \mid c \in C\}$ of balls over the code $C$. If

$$\bigcup\_{c \in C} B\_{\rho}(c) = \mathbb{F}\_q^n,$$

then $\{B_\rho(c) \mid c \in C\}$ is called a cover of the codeword space $\mathbb{F}_q^n$.

**Lemma 2.5** *Let $\rho$ be the covering radius of $C$; then $\rho$ is the smallest integer such that $\{B_\rho(c) \mid c \in C\}$ covers $\mathbb{F}_q^n$.*

*Proof* By the definition of $\rho$, for all $x \in \mathbb{F}_q^n$, there is

$$\min \{ d(x, c) | c \in C \} \leqslant \rho.$$

Therefore, when $x \in \mathbb{F}_q^n$ is given, there is a codeword $c \in C$ such that $d(x, c) \leq \rho$, that is, $x \in B_\rho(c)$; this shows that

$$\bigcup\_{c \in C} B\_{\rho}(c) = \mathbb{F}\_q^n.$$

That is, $\{B_\rho(c) \mid c \in C\}$ forms a cover of $\mathbb{F}_q^n$. Obviously, $\{B_{\rho-1}(c) \mid c \in C\}$ cannot cover $\mathbb{F}_q^n$, because if

$$\bigcup\_{c \in C} B\_{\rho - 1}(c) = \mathbb{F}\_q^n.$$

then for any $x \in \mathbb{F}_q^n$ there exists $c \in C$ such that $x \in B_{\rho-1}(c)$, so

$$\min \{ d(\mathbf{x}, c) | c \in \mathbf{C} \} \leqslant \rho - 1.$$

Thus

$$\rho = \max\{\min\{d(\mathbf{x}, c) | c \in C\} | \mathbf{x} \in \mathbb{F}\_q^n\} \leqslant \rho - 1.$$

The contradiction indicates that $\rho$ is the smallest such integer. The lemma holds.

**Lemma 2.6** *Let d be the minimum distance of C and* ρ *be the covering radius of C, then*

$$d \leqslant 2\rho + 1.$$

*Proof* If $d > 2\rho + 1$, let $c_0 \in C$ be given; then we have

$$B\_{\rho+1}(c\_0) \cap B\_{\rho}(c) = \phi, \ \forall c \in C, \ c \neq c\_0.$$

So we can choose $x \in B_{\rho+1}(c_0)$ with $d(x, c_0) = \rho + 1$; then

$$x \notin B\_{\rho}(c\_0), \quad x \notin B\_{\rho}(c), \ \forall c \in C.$$

That is, $\{B_\rho(c) \mid c \in C\}$ cannot cover $\mathbb{F}_q^n$, which contradicts Lemma 2.5. So we always have $d \leq 2\rho + 1$. The Lemma holds.

Combining the above three lemmas, we can get the following simple but very important corollaries.

**Corollary 2.2** *Let $C \subset \mathbb{F}_q^n$ be an arbitrary $q$-ary code, and let $d$, $\rho$, $\rho_1$ be the minimum distance, covering radius and disjoint radius of $C$, respectively; then*

*(i)* $\rho_1 \leq \rho$. *(ii) If the minimum distance of $C$ is $d = 2e + 1$, then $e = \rho_1$.*

*Proof* (i) follows directly from $2\rho_1 + 1 \leq d \leq 2\rho + 1$. (ii) If $d = 2e + 1$ is odd, then by Lemma 2.4, $d = 2\rho_1 + 1 = 2e + 1$, so $e = \rho_1$.

**Definition 2.3** A code $C$ with $\rho = \rho_1$ is called a perfect code.

**Corollary 2.3** *(i) The minimum distance of any perfect code $C$ is $d = 2\rho + 1$. (ii) Let the minimum distance of a code $C$ be $d = 2e + 1$; then $C$ is a perfect code if and only if for every $x \in \mathbb{F}_q^n$ there exists exactly one ball $B_e(c)$, $c \in C$, such that $x \in B_e(c)$.*

*Proof* (i) follows directly from $2\rho_1 + 1 \leq d \leq 2\rho + 1$. To prove (ii): if $C$ is a perfect code with minimum distance $d = 2e + 1$, then $\rho_1 = \rho = e$. On the other hand, if the condition holds, then the covering radius of $C$ satisfies $\rho \leq e = \rho_1 \leq \rho$, so $\rho_1 = \rho$ and $C$ is a perfect code.

In order to introduce the concept of error-correcting codes, we discuss the so-called decoding principle in electronic information transmission, commonly known as the principle of decoding to the codeword it "looks most like". What does this mean? When we transmit through a channel with interference, we receive a codeword $x' \in \mathbb{F}_q^n$; consider a codeword $x \in C$. If

$$d(\mathbf{x}, \mathbf{x'}) = \min \{ d(c, \mathbf{x'}) | c \in C \},$$

then $x$ is the codeword in $C$ most similar to $x'$, and we decode $x'$ to $x$. If the most similar codeword $x$ is unique in $C$, then theoretically $x'$ is the codeword received after $x$ was transmitted, so the decoding $x' \to x$ is accurate.
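The "looks most like" principle is precisely minimum-distance decoding; a minimal sketch (names are ours, ties broken arbitrarily):

```python
def hamming_distance(x, y):
    return sum(1 for a, b in zip(x, y) if a != b)

def decode(received, C):
    """Return a codeword of C nearest to `received` in Hamming distance."""
    return min(C, key=lambda c: hamming_distance(c, received))

# Repetition code of length 3: a single flipped character is corrected
C = [(0, 0, 0), (1, 1, 1)]
assert decode((0, 1, 0), C) == (0, 0, 0)
assert decode((1, 0, 1), C) == (1, 1, 1)
```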

**Definition 2.4** A code $C$ is called an $e$-error-correcting code ($e \geq 1$) if for any $x \in \mathbb{F}_q^n$, whenever there is a $c \in C$ such that $x \in B_e(c)$, this $c$ is unique.

An error-correcting code allows transmission errors without affecting correct decoding. For example, suppose $C$ is an $e$-error-correcting code, and let $c \in C$ be transmitted through a channel with interference, the received codeword being $x$. If at most $e$ characters are in error during the transmission of $c$, that is, $d(c, x) \leq e$, then the codeword in $C$ most similar to $x$ must be $c$, so we can correctly decode $x \to c$.

**Corollary 2.4** *A perfect code with minimum distance $d = 2e + 1$ is an $e$-error-correcting code.*

*Proof* The disjoint radius $\rho_1$ and the covering radius $\rho$ of $C$ satisfy $\rho_1 = \rho = e$. Therefore, for any received codeword $x \in \mathbb{F}_q^n$, there exists one and only one $c \in C$ such that $x \in B_e(c)$. That is, $C$ is an $e$-error-correcting code.

Finally, we prove the main conclusion of this section.

**Theorem 2.1** *Let the minimum distance of a code $C$ be $d = 2e + 1$; then $C$ is a perfect code if and only if the following sphere-packing condition holds:*

$$|C| \sum\_{i=0}^{e} \binom{n}{i} (q-1)^i = q^n. \tag{2.4}$$

*Proof* If the minimum distance of $C$ is $d = 2e + 1$ and $C$ is a perfect code, then $\rho = \rho_1 = e$. So

$$\bigcup\_{c \in C} B\_e(c) = \mathbb{F}\_q^n.$$

Then we have

$$\left| \bigcup\_{c \in C} B\_e(c) \right| = q^n,$$

thus

$$|C| B\_e = |C| \sum\_{i=0}^{e} \binom{n}{i} (q-1)^i = q^n.$$

Conversely, suppose the sphere-packing condition (2.4) holds. Since the minimum distance of $C$ is $d = 2e + 1$, Corollary 2.2 gives $\rho_1 = e$, so the disjoint balls $B_e(c)$ exhaust $\mathbb{F}_q^n$ and we have

$$\bigcup\_{c \in C} B\_e(c) = \mathbb{F}\_q^n.$$

It follows that $\rho \leq e = \rho_1 \leq \rho$, thus $\rho = \rho_1$ and $C$ is a perfect code. The theorem holds.
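Condition (2.4) is a pure counting identity and can be checked directly. As an illustration (our own sketch), the parameters of the binary $[7, 4]$ Hamming code ($n = 7$, $|C| = 16$, $d = 3$, so $e = 1$) and of the Golay code $G_{23}$ mentioned in the introduction ($n = 23$, $|C| = 2^{12}$, $d = 7$, so $e = 3$) both satisfy it:

```python
from math import comb

def sphere_packing(n, q, M, e):
    """Check the sphere-packing condition (2.4): M * |B_e| == q^n."""
    return M * sum(comb(n, i) * (q - 1) ** i for i in range(e + 1)) == q ** n

print(sphere_packing(7, 2, 16, 1))      # [7,4] Hamming code parameters
print(sphere_packing(23, 2, 2**12, 3))  # Golay code G23 parameters
```

Both checks print `True`; for instance $16 \cdot (1 + 7) = 128 = 2^7$ and $2^{12} \cdot (1 + 23 + 253 + 1771) = 2^{12} \cdot 2048 = 2^{23}$.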


When $q = 2$, the alphabet $\mathbb{F}_2$ is the finite field of two elements $\{0, 1\}$; in this case the code is called a binary code, and the transmission channel is called a binary channel. In binary channel transmission, the most important function is the binary entropy function $H(\lambda)$, defined as

$$H(\lambda) = \begin{cases} 0, & \text{when } \lambda = 0 \text{ or } \lambda = 1, \\ -\lambda \log \lambda - (1 - \lambda) \log(1 - \lambda), & \text{when } 0 < \lambda < 1. \end{cases} \tag{2.5}$$

Obviously, $H(\lambda) = H(1 - \lambda)$ and $0 \leq H(\lambda) \leq 1$, with $H(\frac{1}{2}) = 1$; that is, the maximum is attained at $\lambda = \frac{1}{2}$. For further properties of $H(\lambda)$, please refer to Chap. 1.
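Equation (2.5) translates directly into code; a sketch (the function name is ours):

```python
from math import log2

def binary_entropy(lam):
    """H(lambda) of Eq. (2.5), with the conventions H(0) = H(1) = 0."""
    if lam == 0 or lam == 1:
        return 0.0
    return -lam * log2(lam) - (1 - lam) * log2(1 - lam)

assert binary_entropy(0) == 0.0 and binary_entropy(1) == 0.0
assert abs(binary_entropy(0.5) - 1.0) < 1e-12                   # maximum at 1/2
assert abs(binary_entropy(0.3) - binary_entropy(0.7)) < 1e-12   # H(l) = H(1-l)
```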

**Theorem 2.2** *Let $C$ be a perfect code with minimum distance $d = 2e + 1$, and let $R_C$ be the code rate of $C$; then*

$$(i) \ 1 - R\_C = \frac{1}{n} \log\_2 \sum\_{i=0}^{e} \binom{n}{i} \leqslant H\left(\frac{e}{n}\right).$$

*(ii) When the codeword length $n \to \infty$, if $\lim_{n \to \infty} R_C = a$, then*

$$\lim\_{n \to \infty} H\left(\frac{e}{n}\right) = 1 - a.$$

*Proof* (i) According to the sphere-packing condition, since *C* is the perfect code, so

$$|C| \sum\_{i=0}^{e} \binom{n}{i} = 2^n.$$

We have

$$\frac{1}{n}\log\_2|C| + \frac{1}{n}\log\_2\sum\_{i=0}^{e} \binom{n}{i} = 1.$$

That is

$$1 - R\_C = \frac{1}{n} \log\_2 \sum\_{i=0}^{e} \binom{n}{i} \leqslant H\left(\frac{e}{n}\right),$$

The last inequality is derived from Lemma 1.11 in Chap. 1, so (i) holds. If $R_C$ has a limit as $n \to \infty$, then again by Lemma 1.11 in Chap. 1, we have

$$\lim\_{n \to \infty} H\left(\frac{e}{n}\right) = 1 - \lim\_{n \to \infty} R\_C = 1 - a.$$

The Theorem 2.2 holds.

Finally, we give an example of perfect code.

*Example 2.1* Let $n = 2e + 1$ be an odd number; then the repetition code $A = \{0, 1\}$ in $\mathbb{F}_2^n$, where $0 = 00 \dots 0$ and $1 = 11 \cdots 1$, is a perfect code of length $n$.

First, the repetition code $A = \{0, 1\} \subset \mathbb{F}_2^n$ contains only two codewords, $0 = 0 \dots 0 \in \mathbb{F}_2^n$ and $1 = 1 \dots 1 \in \mathbb{F}_2^n$. Because $n = 2e + 1$ is odd, by Corollary 2.2 the disjoint radius of $A$ is $\rho_1 = e$. Let us prove that the covering radius of $A$ is $\rho = \rho_1 = e$: for any $x \in \mathbb{F}_2^n$, if $d(0, x) > e$, that is, $d(0, x) \geq e + 1$, then at least $e + 1$ characters of $x = x_1 x_2 \dots x_n \in \mathbb{F}_2^n$ are 1 and at most $e$ characters are 0, thus $d(1, x) \leq e$. This shows that if $x \notin B_e(0)$, then $x \in B_e(1)$, that is,

$$B\_e(0) \cup B\_e(1) = \mathbb{F}\_2^n,$$

so $\rho \leq e = \rho_1 \leq \rho$, and we have $\rho = \rho_1$; that is, $A$ is a perfect code. Note that in this example $e$ can be any positive integer, so the code rate of the repetition code has the limit value

$$\lim\_{n \to \infty} R\_A = 0,\\
\Rightarrow \lim\_{n \to \infty} H(\frac{e}{n}) = 1.$$
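Both claims of Example 2.1 can be checked numerically (our own sketch): the sphere-packing condition (2.4) holds for every odd length $n = 2e + 1$, and $H(e/n)$ approaches 1 as the rate $R_A = 1/n$ tends to 0.

```python
from math import comb, log2

def H(x):
    """Binary entropy of Eq. (2.5)."""
    return 0.0 if x in (0.0, 1.0) else -x * log2(x) - (1 - x) * log2(1 - x)

for e in range(1, 12):
    n = 2 * e + 1
    # |A| * sum_{i<=e} C(n, i) = 2^n: the repetition code is perfect
    assert 2 * sum(comb(n, i) for i in range(e + 1)) == 2 ** n

# H(e/n) approaches 1 - lim R_A = 1 as e grows (Theorem 2.2 (ii))
print([round(H(e / (2 * e + 1)), 4) for e in (1, 10, 100, 1000)])
```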

To close this section, we discuss and define the equivalence of two codes. Let $C \subset \mathbb{F}_q^n$ be a code of length $n$, and let $S_n$ be the permutation group on $n$ elements. Any $\sigma \in S_n$ is a permutation of $n$ elements; for $x = x_1 x_2 \dots x_n \in \mathbb{F}_q^n$, we define $\sigma(x)$ as

$$
\sigma(x) = x\_{\sigma(1)} x\_{\sigma(2)} \dots x\_{\sigma(n)} \in \mathbb{F}\_q^n,\tag{2.6}
$$

$$\sigma(\mathcal{C}) = \{\sigma(c) \mid c \in \mathcal{C}\}.\tag{2.7}$$

**Definition 2.5** Let $C$ and $C\_1$ be two codes in $\mathbb{F}\_q^n$. If there is $\sigma \in S\_n$ such that $\sigma(C) = C\_1$, we call $C$ and $C\_1$ equivalent, denoted $C \sim C\_1$. Obviously, equivalence is an equivalence relation between codes: taking $\sigma = 1$ gives $C \sim C$; if $C \sim C\_1$, that is $C\_1 = \sigma(C)$, then $C = \sigma^{-1}(C\_1)$, so $C \sim C\_1 \Rightarrow C\_1 \sim C$; similarly, if $C \sim C\_1$ and $C\_1 \sim C\_2$, then $C \sim C\_2$, because $C\_1 = \sigma(C)$, $C\_2 = \tau(C\_1) \Rightarrow C\_2 = \tau\sigma(C)$. Another obvious property is that the action of $\sigma$ does not change the Hamming distance between two codewords, that is,

$$d(\sigma(\mathbf{x}), \sigma(\mathbf{y})) = d(\mathbf{x}, \mathbf{y}), \forall \sigma \in S\_n. \tag{2.8}$$

**Lemma 2.7** *Suppose $C \sim C\_1$ are two equivalent codes; then they have the same code rate, the same minimum distance, the same covering radius and the same disjoint radius. In particular, if $C$ is a perfect code, then every code $C\_1$ equivalent to $C$ is a perfect code.*

*Proof* All the results of the lemma follow easily from Eq. (2.8).
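Lemma 2.7 can be illustrated numerically; the sketch below (plain Python, with a toy binary code and a permutation of our own choosing) checks that $\sigma(C)$ has the same size and the same minimum distance as $C$:

```python
from itertools import combinations

def hamming_dist(x, y):
    return sum(a != b for a, b in zip(x, y))

def min_distance(code):
    return min(hamming_dist(x, y) for x, y in combinations(code, 2))

def apply_perm(sigma, x):
    """sigma(x) = x_{sigma(1)} ... x_{sigma(n)} as in Eq. (2.6), 0-indexed."""
    return tuple(x[sigma[i]] for i in range(len(x)))

# A small binary code and a permutation of its 4 coordinates
C = [(0, 0, 0, 0), (1, 1, 0, 0), (0, 0, 1, 1), (1, 1, 1, 1)]
sigma = (2, 0, 3, 1)
C1 = [apply_perm(sigma, c) for c in C]       # the equivalent code sigma(C)

assert min_distance(C) == min_distance(C1)   # same minimum distance
assert len(set(C)) == len(set(C1))           # same size, hence same code rate
```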

# **2.2 Linear Code**

Let $C \subset \mathbb{F}\_q^n$ be a code. If $C$ is a $k$-dimensional linear subspace of $\mathbb{F}\_q^n$, then $C$ is called a linear code, denoted $C = [n, k]$. So for a linear code $C$, we have

$$R\_C = \frac{1}{n} \log\_q |C| = \frac{k}{n}, \text{ minimal distance } d = \text{minimal weight } w.$$

Let $\{\alpha\_1, \alpha\_2, \dots, \alpha\_k\} \subset C$ be a basis of the linear code $C$, where

$$
\alpha\_i = \alpha\_{i1} \alpha\_{i2} \cdots \alpha\_{in} \in \mathbb{F}\_q^n, \ 1 \le i \le k.
$$

**Definition 2.6** If $\{\alpha\_1, \alpha\_2, \dots, \alpha\_k\}$ is a basis of the linear code $C = [n, k]$, then we have the $k \times n$ matrix

$$G = \begin{bmatrix} \alpha\_1 \\ \alpha\_2 \\ \vdots \\ \alpha\_k \end{bmatrix} = \begin{bmatrix} \alpha\_{11} & \alpha\_{12} & \cdots & \alpha\_{1n} \\ \alpha\_{21} & \alpha\_{22} & \cdots & \alpha\_{2n} \\ \vdots & \vdots & & \vdots \\ \alpha\_{k1} & \alpha\_{k2} & \cdots & \alpha\_{kn} \end{bmatrix}.$$

$G$ is called a generator matrix of $C$. When $G$ has the form

$$G = [I\_k, A\_{k \times (n-k)}], \quad I\_k \text{ the identity matrix of order } k,$$

it is called the standard form of $G$.

**Lemma 2.8** *Let $C = [n, k]$ be a linear code with generator matrix $G$; then*

$$\mathcal{C} = \{aG|a \in \mathbb{F}\_q^k\}.$$

*Proof* Because $\{\alpha\_1, \alpha\_2, \dots, \alpha\_k\}$ is a basis of the linear code $C$, for every $x \in C$ we have

$$\mathbf{x} = a\_1\alpha\_1 + a\_2\alpha\_2 + \dots + a\_k\alpha\_k = (a\_1, a\_2, \dots, a\_k) \begin{bmatrix} \alpha\_1\\ \vdots\\ \alpha\_k \end{bmatrix} = a \cdot G.$$

where $a = (a\_1, a\_2, \dots, a\_k) \in \mathbb{F}\_q^k$. The Lemma holds.
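Lemma 2.8 gives a direct way to enumerate a linear code from its generator matrix. A minimal sketch in Python (the $[5, 3]$ matrix below is our own toy example in standard form):

```python
from itertools import product

# Generator matrix in standard form G = [I_3, A] for a [5, 3] binary code
G = [
    [1, 0, 0, 1, 1],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
]
k, n = len(G), len(G[0])

def encode(a, G):
    """Codeword x = aG over F_2 (Lemma 2.8)."""
    return tuple(sum(a[i] * G[i][j] for i in range(len(G))) % 2
                 for j in range(len(G[0])))

# C = { aG : a in F_2^k }, so |C| = 2^k and the rate is k/n
code = {encode(a, G) for a in product((0, 1), repeat=k)}
assert len(code) == 2 ** k
```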

Define the inner product on $\mathbb{F}\_q^n$: for $x = x\_1 \dots x\_n$, $y = y\_1 \dots y\_n \in \mathbb{F}\_q^n$, let $\langle x, y \rangle = \sum\_{i=1}^{n} x\_i y\_i$. If $\langle x, y \rangle = 0$, we say $x$ and $y$ are orthogonal, denoted $x \perp y$.

**Definition 2.7** Let *C* = [*n*, *k*] be a linear code whose orthogonal complement space *C*<sup>⊥</sup> is

$$\mathcal{C}^\perp = \{ y \in \mathbb{F}\_q^n \mid \langle x, y \rangle = 0, \ \forall x \in \mathcal{C} \}.$$

Obviously, *C*<sup>⊥</sup> is an [*n*, *n* − *k*]-linear code, and *C*<sup>⊥</sup> is the dual code of *C*. The generating matrix *H* of *C*<sup>⊥</sup> is called the check matrix of *C*.

**Lemma 2.9** *C* = [*n*, *k*] *is a linear code, H is a check matrix, then we have*


$$xH'=0 \Leftrightarrow x \in C.$$

*where H*′ *is the transpose matrix of H.*

*Proof* We only prove the conclusion by taking the standard form of the generating matrix *G* of *C*. Let

$$G = [I\_k, A\_{k \times (n-k)}] = [I\_k, A], \quad A = A\_{k \times (n-k)}.$$

Then the check matrix of *C*, that is, the generating matrix of dual code *C*<sup>⊥</sup> is

$$H = [-A', I\_{n-k}], \ H' = \begin{bmatrix} -A \\ I\_{n-k} \end{bmatrix}.$$

By Lemma 2.8, if $x \in C$, then there exists $a \in \mathbb{F}\_q^k$ such that $x = aG$; thus

$$xH' = aGH' = a[I\_k, A] \begin{bmatrix} -A \\ I\_{n-k} \end{bmatrix} = a(-A + A) = 0.$$

Conversely, suppose $xH' = 0$. Because $H$ is the generator matrix of $C^\perp$, again by Lemma 2.8, for every $y \in C^\perp$ there exists $b \in \mathbb{F}\_q^{n-k}$ such that $y = bH$; thus

$$\langle x, y \rangle = xy' = xH'b' = 0 \Rightarrow x \in (\mathcal{C}^\perp)^\perp = \mathcal{C}.$$

The Lemma holds.

By Lemma 2.9, for all $x, y \in \mathbb{F}\_q^n$,

$$\mathbf{x}H' = \mathbf{y}H' \Leftrightarrow \mathbf{x} - \mathbf{y} \in \mathcal{C}.$$

$xH'$ is called the check value of the codeword $x$. Because $C$ is an additive subgroup of $\mathbb{F}\_q^n$, the check values of two codewords are equal if and only if the two codewords lie in the same additive coset of $C$. This yields the following decoding principle of linear codes.

**Decoding principle:** If the linear code $C = [n, k]$ is used for coding and, after transmission through an interference channel, the received codeword is $x \in \mathbb{F}\_q^n$, then find a codeword $x\_0$ of least weight in the additive coset $x + C$ of $x$; that is, $x\_0$ satisfies

$$\mathbf{x}\_0 \in \mathbf{x} + \mathcal{C}, \quad w(\mathbf{x}\_0) = \min \{ w(\boldsymbol{\alpha}) \mid \boldsymbol{\alpha} \in \mathbf{x} + \mathcal{C} \}.$$

$x\_0$ is called the leader codeword in the coset $x + C$. We then decode $x$ as $x - x\_0$.
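The decoding principle can be sketched by brute force for a small code; the following Python fragment (our own toy $[5, 2]$ binary code with minimum distance 3) finds a coset leader and corrects a single flipped bit:

```python
from itertools import product

# Toy [5, 2] binary code with minimum distance d = 3 (so e = 1)
G = [(1, 0, 1, 1, 0), (0, 1, 0, 1, 1)]
k, n = 2, 5
code = [tuple(sum(a[i] * G[i][j] for i in range(k)) % 2 for j in range(n))
        for a in product((0, 1), repeat=k)]

def weight(x):
    return sum(x)

def decode(x):
    """Decoding principle: take a least-weight word x0 in the coset x + C
    and decode x as x - x0 (over F_2, x - x0 = x + x0)."""
    coset = [tuple((xi + ci) % 2 for xi, ci in zip(x, c)) for c in code]
    x0 = min(coset, key=weight)            # a coset leader (unique here)
    return tuple((xi + x0i) % 2 for xi, x0i in zip(x, x0))

c = (1, 0, 1, 1, 0)                        # a codeword (first row of G)
received = (1, 0, 0, 1, 0)                 # one character flipped in transit
assert decode(received) == c
```

Since $d = 3 = 2 \cdot 1 + 1$, Lemma 2.10 below guarantees the weight-1 coset leader is unique, so the single error is corrected.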

**Lemma 2.10** *If the minimum distance of the linear code $C = [n, k]$ is $d = 2e + 1$, then there is at most one codeword $x\_0$ with $w(x\_0) \le e$ in any additive coset $x + C$ of $C$.*

*Proof* Suppose $\alpha, \beta \in x + C$ with $w(\alpha) \le e$ and $w(\beta) \le e$. Then $\alpha - \beta \in C$ and $w(\alpha - \beta) \le w(\alpha) + w(\beta) \le 2e$. If $\alpha \neq \beta$, then $\alpha - \beta$ is a nonzero codeword, but the minimal weight of $C$ = minimal distance of $C$ = $2e + 1$, a contradiction; thus $\alpha = \beta$. The Lemma holds.

**Corollary 2.5** *For a perfect linear code $C = [n, k]$ with minimal distance $d = 2e + 1$, there exists exactly one codeword with weight $\le e$ in any additive coset $x + C$ of $C$. In other words, the leader codeword in any additive coset is unique.*

*Proof* Since $C$ is perfect, for any $x \in \mathbb{F}\_q^n$ there exists $c \in C$ such that $x \in B\_e(c)$, that is $d(c, x) \le e$, so $w(x - c) \le e$. But $x - c \in x + C$, and by Lemma 2.10 such a codeword is unique. The Corollary holds.

**Definition 2.8** If any two column vectors of the generator matrix *G* of a linear code *C* = [*n*, *k*] are linearly independent, *C* is called a projective code.

In order to discuss the true meaning of projective codes, we consider the $(k - 1)$-dimensional projective space $PG(k - 1, q)$ over $\mathbb{F}\_q$.

In $\mathbb{F}\_q^k$, for any two vectors $a = (a\_1, a\_2, \dots, a\_k)$ and $b = (b\_1, b\_2, \dots, b\_k)$, we say $a \sim b$ if there exists $\lambda \in \mathbb{F}\_q^*$ such that $a = \lambda b$. This is an equivalence relation on $\mathbb{F}\_q^k$. Obviously $b \sim 0 \Leftrightarrow b = 0$, and for any $a \in \mathbb{F}\_q^k$, $\overline{a} = \{\lambda a \mid \lambda \in \mathbb{F}\_q^*\}$. The quotient set $\mathbb{F}\_q^k/\sim$ is called the $(k - 1)$-dimensional projective space over $\mathbb{F}\_q$, denoted $PG(k - 1, q)$; therefore

$$PG(k-1,q) = \mathbb{F}\_q^k /\_{\sim} = \{\overline{a} | a \in \mathbb{F}\_q^k\}.$$

The number of nonzero points in (*k* − 1)-dimensional projective space *PG*(*k* − 1, *q*) is

$$|PG(k-1,q)| = \frac{q^k - 1}{q - 1} = 1 + q + \dots + q^{k-1}.$$

For a linear code $[n, n - k]$, its check matrix $H$ is a $k \times n$ matrix. If any two column vectors of $H$ are linearly independent, writing $H = [a\_1, a\_2, \dots, a\_n]$, then $\{\overline{a}\_1, \overline{a}\_2, \dots, \overline{a}\_n\} \subset PG(k - 1, q)$ are $n$ different nonzero points. So the generator matrix of an $[n, k]$ projective code consists of $n$ different nonzero points of the projective space $PG(k - 1, q)$. Since $n \le |PG(k - 1, q)|$, the maximum value is reached when

$$n = |PG(k-1,q)| = \frac{q^k - 1}{q - 1}.$$

This leads to a perfect example of linear codes, called Hamming codes.

**Definition 2.9** Let $k > 1$ and $n = \frac{q^k - 1}{q - 1}$. A linear code $C = [n, n - k]$ is called a Hamming code if any two column vectors of the check matrix $H$ of $C$ are linearly independent.

Since $C$ is an $(n - k)$-dimensional linear subspace and $C^\perp$ is a $k$-dimensional linear subspace, the generator matrix $H$ of $C^\perp$ is a $k \times n$ matrix. Therefore, if any two column vectors of $H$ are linearly independent, they represent $n$ different points of the projective space $PG(k - 1, q)$. Because $n = \frac{q^k - 1}{q - 1} = |PG(k - 1, q)|$, the columns of $H$ represent every point of $PG(k - 1, q)$ exactly once, so a Hamming code has the largest possible length for given $k$.

**Theorem 2.3** *Any Hamming code C* = [*n*, *n* − *k*] *is a perfect code, its minimum distance is d* = 3*; therefore, Hamming codes are perfect* 1−*error correcting codes.*

*Proof* We first prove that the minimum distance of the Hamming code $C$ is $d \ge 3$. Suppose $d \le 2$; since the minimum distance $d$ = minimum weight $w$ for a linear code, there is a nonzero codeword $x = x\_1 x\_2 \dots x\_n$ with $w(x) \le 2$, that is, at most two characters $x\_i$ and $x\_j$ are nonzero.

Let $H = (\alpha\_1, \alpha\_2, \dots, \alpha\_n)$ be the check matrix of $C$. Since $xH' = 0$, then

$$(\boldsymbol{x}\_1, \boldsymbol{x}\_2, \dots, \boldsymbol{x}\_n) \begin{bmatrix} \alpha\_1 \\ \alpha\_2 \\ \vdots \\ \alpha\_n \end{bmatrix} = \boldsymbol{0}.$$

We have $\alpha\_i x\_i + \alpha\_j x\_j = 0$; thus $\alpha\_i$ and $\alpha\_j$ are linearly dependent, a contradiction. So $d \ge 3$, and by Lemma 2.4 the disjoint radius of $C$ is $\rho\_1 \ge 1$.

On the other hand, for any $c \in C$, by Lemma 2.3, the number of elements in the ball $B\_1(c)$ is

$$|B\_1(c)| = 1 + n(q - 1) = q^k.$$

Because $C$ is an $(n - k)$-dimensional linear subspace, that is $|C| = q^{n-k}$, and the balls $B\_1(c)$ are pairwise disjoint (since $\rho\_1 \ge 1$), we have

$$|\bigcup\_{c \in \mathcal{C}} B\_1(c)| = |C|q^k = q^n = |\mathbb{F}\_q^n|,$$

$$\Rightarrow \bigcup\_{c \in \mathcal{C}} B\_1(c) = \mathbb{F}\_q^n.$$

We have $1 \le \rho\_1 \le \rho \le 1 \Rightarrow \rho\_1 = \rho = 1$, so $C$ is a perfect code. Its minimal distance is $d = 2\rho + 1 = 3$. The Theorem holds.
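As a concrete check of Theorem 2.3, the sketch below (plain Python; helper names are ours) builds the binary Hamming code with $k = 3$, $n = 7$ by taking all nonzero vectors of $\mathbb{F}\_2^3$ as columns of $H$, and verifies $|C| = 2^{n-k}$, $d = 3$, and the perfect sphere-packing count:

```python
from itertools import product

# Binary Hamming code with k = 3: n = (2^3 - 1)/(2 - 1) = 7.
# Columns of H are the 7 nonzero vectors of F_2^3 (Definition 2.9).
cols = [c for c in product((0, 1), repeat=3) if any(c)]
n, k = len(cols), 3                          # C = [7, 7 - 3] = [7, 4]

def syndrome(x):
    """x H' over F_2; x is a codeword iff this is zero (Lemma 2.9)."""
    return tuple(sum(x[j] * cols[j][i] for j in range(n)) % 2 for i in range(3))

code = [x for x in product((0, 1), repeat=n) if syndrome(x) == (0, 0, 0)]
assert len(code) == 2 ** (n - k)             # |C| = q^{n-k} = 16

# Minimum distance = minimum nonzero weight = 3
assert min(sum(x) for x in code if any(x)) == 3

# Perfect: |C| * |B_1(c)| = 16 * (1 + 7) = 2^7 covers all of F_2^7
assert len(code) * (1 + n) == 2 ** n
```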

Next, we discuss the weight polynomial of a linear code *C* and prove the famous MacWilliams theorem.

For $x \in C = [n, k]$, the weight function $w(x)$ takes values from 0 to $n$; in fact $w(x) = 0 \Leftrightarrow x = 0 \in C$, and $w(x) = n \Leftrightarrow x = x\_1 \dots x\_n$ with every $x\_i \neq 0$. So for each $i$, $0 \le i \le n$, define

$$A\_i = \#\{\mathbf{x} \in C \mid w(\mathbf{x}) = i\}$$

and the weight polynomial of $C$

$$A(z) = \sum\_{i=0}^{n} A\_i z^i, \text{  $z$  is a variable.}$$

Obviously, for any given *c* ∈ *C*, then the number of codewords in *C* whose Hamming distance to *c* is exactly equal to *i* is *Ai* , that is

$$A\_i = \#\{x \in C | d(x, c) = i\}.$$

The codes with the above properties are called distance invariant codes; obviously, linear codes are distance invariant codes.

The following result was proved by MacWilliams in 1963; he established the relationship between the weight polynomials of a linear code *C* and its dual code *C*⊥, which is the most basic achievement in code theory.

**Theorem 2.4** (MacWilliams) *Let C* = [*n*, *<sup>k</sup>*] *be a linear code over* <sup>F</sup>*<sup>q</sup> and the weight polynomial be A*(*z*)*, C*<sup>⊥</sup> *is the dual code of C, the weight polynomial is B*(*z*)*, then*

$$B(z) = q^{-k}(1 + (q - 1)z)^n A(\frac{1 - z}{1 + (q - 1)z}).$$

*Specially, when q* = 2*,*

$$2^k B(z) = (1+z)^n A(\frac{1-z}{1+z}).$$
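Theorem 2.4 can be verified numerically on a small code by exact rational evaluation; the sketch below (plain Python; the toy $[5, 2]$ binary code is our own choice) checks the identity $2^k B(z) = (1+z)^n A\left(\frac{1-z}{1+z}\right)$ at several sample points:

```python
from itertools import product
from fractions import Fraction

# A small binary linear code: [5, 2], q = 2, generated by the rows below
rows = [(1, 0, 1, 1, 0), (0, 1, 0, 1, 1)]
n, k = 5, 2

def span(gens):
    out = set()
    for coef in product((0, 1), repeat=len(gens)):
        out.add(tuple(sum(c * g[j] for c, g in zip(coef, gens)) % 2
                      for j in range(n)))
    return out

C = span(rows)
Cperp = {y for y in product((0, 1), repeat=n)
         if all(sum(a * b for a, b in zip(y, x)) % 2 == 0 for x in C)}

def weight_poly(code, z):
    """A(z) = sum_i A_i z^i, evaluated at z by summing z^w(x) over the code."""
    return sum(z ** sum(x) for x in code)

# MacWilliams (q = 2): 2^k B(z) = (1 + z)^n A((1 - z)/(1 + z))
for z in (Fraction(1, 3), Fraction(2, 5), Fraction(7, 2)):
    lhs = 2 ** k * weight_poly(Cperp, z)
    rhs = (1 + z) ** n * weight_poly(C, (1 - z) / (1 + z))
    assert lhs == rhs
```

Exact `Fraction` arithmetic makes the check a true polynomial identity test at each sample point, with no floating-point error.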

*Proof* Let $\psi(a)$ be an additive character on $\mathbb{F}\_q$; $\psi(a)$ can be constructed as follows:

$$\psi(a) = \exp\left(\frac{2\pi i \, tr(a)}{p}\right), \quad tr: \mathbb{F}\_q \to \mathbb{F}\_p \text{ the trace map}.$$

For any *c* ∈ *C*, we define the polynomial *gc*(*z*) as

$$g\_c(z) = \sum\_{x \in \mathbb{F}\_q^n} z^{w(x)} \psi(\langle c, x \rangle),\tag{2.9}$$

therefore,

$$\sum\_{c \in C} g\_c(z) = \sum\_{x \in \mathbb{F}\_q^n} z^{w(x)} \sum\_{c \in C} \psi(\langle c, x \rangle).\tag{2.10}$$

Let's calculate the inner sum of (2.10). If *x* ∈ *C*⊥, then

$$\sum\_{c \in C} \psi(\langle c, x \rangle) = |C|.$$

If *x* ∈/ *C*⊥, let's prove

$$\sum\_{c \in C} \psi(\langle c, x \rangle) = 0.\tag{2.11}$$

If $x \in \mathbb{F}\_q^n$ and $x \notin C^\perp$, let

$$T(x) = \{ y \in \mathcal{C} \mid \langle y, x \rangle = 0 \} \subsetneq \mathcal{C},$$

so $T(x)$ is a linear subspace of $C$. For $c \in C$, consider the additive coset $c + T(x)$: for any two codewords $c + y\_1$ and $c + y\_2$ in this coset, we have

$$\langle c + y\_1, x \rangle = \langle c, x \rangle = \langle c + y\_2, x \rangle.$$

Conversely, for any two additive cosets $c\_1 + T(x)$ and $c\_2 + T(x)$, if $\langle c\_1, x \rangle = \langle c\_2, x \rangle$, then $\langle c\_1 - c\_2, x \rangle = 0$, that is $c\_1 - c\_2 \in T(x)$, so $c\_1 + T(x) = c\_2 + T(x)$. Therefore, any two codewords in one coset $c + T(x) \subset C$ have the same inner product with $x$, and codewords in different cosets have different inner products with $x$. Since $x \notin C^\perp$, there exists $c\_0 \in C$ such that $\langle c\_0, x \rangle = a \neq 0$; then $\langle a^{-1} c\_0, x \rangle = 1$. Let $c\_1 = a^{-1} c\_0 \in C$, so $\langle c\_1, x \rangle = 1$ and, for every $a \in \mathbb{F}\_q$, $\langle a c\_1, x \rangle = a$. Thus $\langle c, x \rangle$ takes every value of $\mathbb{F}\_q$ as $c$ runs over the cosets, so

$$\sum\_{c \in C} \psi(\langle c, x \rangle) = |T(x)| \sum\_{a \in \mathbb{F}\_q} \psi(a) = 0.$$

That is, (2.11) holds. From (2.10), we can get that

$$\sum\_{c \in \mathcal{C}} g\_c(z) = |\mathcal{C}| \sum\_{x \in \mathcal{C}^\perp} z^{w(x)} = |\mathcal{C}| B(z).\tag{2.12}$$

Define the weight function on $\mathbb{F}\_q$ by $w(a) = 1$ if $a \neq 0$ and $w(0) = 0$. For any $x \in \mathbb{F}\_q^n$, $c \in C$, write $x = x\_1 x\_2 \dots x\_n$, $c = c\_1 c\_2 \dots c\_n$; then by (2.9) we have

$$\begin{split} g\_c(z) &= \sum\_{x\_1, \dots, x\_n \in \mathbb{F}\_q} z^{w(x\_1) + w(x\_2) + \dots + w(x\_n)} \, \psi(c\_1 x\_1 + \dots + c\_n x\_n) \\ &= \prod\_{i=1}^n \sum\_{x \in \mathbb{F}\_q} z^{w(x)} \, \psi(c\_i x) .\end{split} \tag{2.13}$$

The inner layer sum of the above formula can be calculated as

$$\sum\_{\mathbf{x}\in\mathbb{F}\_q} z^{w(\mathbf{x})} \psi(c\_i \mathbf{x}) = \begin{cases} 1 - z, & if \, c\_i \neq 0, \\ 1 + (q - 1)z, & if \, c\_i = 0. \end{cases}$$

From (2.13), then we have

$$g\_c(z) = (1 - z)^{w(c)} (1 + (q - 1)z)^{n - w(c)}.$$

Thus

$$\begin{aligned} \sum\_{c \in C} g\_c(z) &= \left( 1 + (q - 1)z \right)^n \sum\_{c \in C} (\frac{1 - z}{1 + (q - 1)z})^{w(c)} \\ &= (1 + (q - 1)z)^n A(\frac{1 - z}{1 + (q - 1)z}). \end{aligned}$$

Finally, from (2.12), we have

$$\begin{aligned} B(z) &= \frac{1}{|C|} (1 + (q - 1)z)^n A(\frac{1 - z}{1 + (q - 1)z}) \\ &= q^{-k} (1 + (q - 1)z)^n A(\frac{1 - z}{1 + (q - 1)z}) .\end{aligned}$$

We have completed the proof of the theorem.

# **2.3 Lee Distance**

Let $m > 1$ be a positive integer and $\mathbb{Z}\_m$ the residue class ring mod $m$. If $\mathbb{Z}\_m$ is the alphabet and $C \subset \mathbb{Z}\_m^n$ is a proper subset, then $C$ is called an $m$-ary code. In this case, the Hamming distance is not the best tool to measure errors; we substitute the Lee distance and Lee weight. For $i \in \mathbb{Z}\_m$, identified with its representative $0 \le i < m$, define the Lee weight as

$$W\_L(i) = \min\{i, m - i\}.\tag{2.14}$$

Obviously,

$$W\_L(0) = 0, \ W\_L(-i) = W\_L(m - i) = W\_L(i). \tag{2.15}$$

Suppose $a = (a\_1, a\_2, \dots, a\_n) = a\_1 a\_2 \dots a\_n \in \mathbb{Z}\_m^n$ and $b = b\_1 b\_2 \dots b\_n \in \mathbb{Z}\_m^n$; define the Lee weight and Lee distance on an $m$-ary code $C$ as follows:

$$\begin{cases} W\_L(a) = \sum\_{i=1}^n W\_L(a\_i), \\ d\_L(a, b) = W\_L(a - b) .\end{cases}$$

From (2.15), we have

$$W\_L(-a) = W\_L(a), \\ d\_L(a,b) = d\_L(b,a), \forall \, a, b \in \mathbb{Z}\_m^n.$$

**Lemma 2.11** *For all $a, b, c \in \mathbb{Z}\_m^n$, we have the triangle inequality*

$$d\_L(a,b) \le d\_L(a,c) + d\_L(c,b).$$

*Proof* Suppose 0 ≤ *i* < *m*, 0 ≤ *j* < *m*, we have

$$W\_L(i+j) \le W\_L(i) + W\_L(j). \tag{2.16}$$

If $0 \le i + j \le \frac{m}{2}$, then

$$W\_L(i+j) = i+j = W\_L(i) + W\_L(j).$$

If $\frac{m}{2} < i + j < m$, we distinguish three cases. (1) If $i \le \frac{m}{2}$ and $j \le \frac{m}{2}$, then

$$W\_L(i+j) = m - i - j < i + j = W\_L(i) + W\_L(j).$$

(2) If $i \le \frac{m}{2}$ and $j > \frac{m}{2}$, then

$$W\_L(i+j) = m - i - j \le m - j = W\_L(j) \le W\_L(i) + W\_L(j).$$

(3) If $i > \frac{m}{2}$ and $j \le \frac{m}{2}$, then

$$W\_L(i+j) = m - i - j \le m - i = W\_L(i) \le W\_L(i) + W\_L(j).$$

Finally, if $i + j \ge m$, then $W\_L(i + j) = W\_L(2m - i - j) = W\_L((m - i) + (m - j))$ with $(m - i) + (m - j) \le m$, $W\_L(m - i) = W\_L(i)$ and $W\_L(m - j) = W\_L(j)$, so this case reduces to the ones above. So we always have (2.16). In $\mathbb{Z}\_m^n$, (2.16) can be extended to

$$W\_L(a+b) \le W\_L(a) + W\_L(b), \forall \ a, b \in \mathbb{Z}\_m^n.$$

So

$$\begin{aligned} d\_L(a,b) &= W\_L(a-b) = W\_L((a-c) + (c-b)) \\ &\le W\_L(a-c) + W\_L(c-b) = d\_L(a,c) + d\_L(c,b). \end{aligned}$$

The Lemma holds.
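Lemma 2.11 can be confirmed by exhaustive search over a small case; the following sketch (our choice $m = 6$, $n = 2$) checks the triangle inequality for every triple of words:

```python
from itertools import product

def lee_weight(i, m):
    """W_L(i) = min{i, m - i} for i in Z_m (Eq. 2.14)."""
    i %= m
    return min(i, m - i)

def lee_dist(a, b, m):
    """d_L(a, b) = W_L(a - b), computed coordinatewise."""
    return sum(lee_weight(x - y, m) for x, y in zip(a, b))

# Exhaustively check the triangle inequality of Lemma 2.11 for m = 6, n = 2
m, n = 6, 2
words = list(product(range(m), repeat=n))
for a in words:
    for b in words:
        for c in words:
            assert lee_dist(a, b, m) <= lee_dist(a, c, m) + lee_dist(c, b, m)
```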

Next let $m = 4$, with alphabet $\mathbb{Z}\_4$; on 4-ary codes we discuss the Lee weight and Lee distance. Suppose $a \in \mathbb{Z}\_4^n$ and $0 \le i \le 3$; let

$$n\_i(a) = \#\{j | 1 \le j \le n, a = a\_1 a\_2 \dots a\_n, a\_j = i\}.\tag{2.17}$$

$n\_i(a)$ is the number of characters equal to $i$ in the codeword $a$. For a 4-ary code $C \subset \mathbb{Z}\_4^n$, the symmetric polynomial and Lee weight polynomial of $C$ are defined as

$$swe\_{\mathcal{C}}(w, x, y) = \sum\_{c \in \mathcal{C}} w^{n\_0(c)} x^{n\_1(c) + n\_3(c)} y^{n\_2(c)} \tag{2.18}$$

and

$$Lee\_{\mathcal{C}}(\mathbf{x}, \mathbf{y}) = \sum\_{c \in \mathcal{C}} \mathbf{x}^{2n - W\_L(c)} \mathbf{y}^{W\_L(c)}.\tag{2.19}$$

**Lemma 2.12** *Let $C \subset \mathbb{Z}\_4^n$ be a 4-ary code with codeword length $n$; then the symmetric polynomial and Lee weight polynomial satisfy the following relation on $C$:*

$$Lee\_C(\mathfrak{x}, \mathfrak{y}) = swe\_C(\mathfrak{x}^2, \mathfrak{x}\mathfrak{y}, \mathfrak{y}^2).$$

*Proof* For $a \in \mathbb{Z}\_4^n$, by definition

$$n\_0(a) + n\_1(a) + n\_2(a) + n\_3(a) = n.$$

Let *a* = *a*1*a*<sup>2</sup> ... *an*, then

$$W\_L(a) = \sum\_{i=1}^n W\_L(a\_i) = n\_1(a) + 2n\_2(a) + n\_3(a).$$

So

$$\begin{aligned} Lee\_C(\mathbf{x}, \mathbf{y}) &= \sum\_{c \in \mathcal{C}} \mathbf{x}^{2n\_0(c)} \cdot (\mathbf{x}\mathbf{y})^{n\_1(c) + n\_3(c)} \mathbf{y}^{2n\_2(c)} \\ &= swe\_C(\mathbf{x}^2, \mathbf{x}\mathbf{y}, \mathbf{y}^2). \end{aligned}$$

The Lemma holds.
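The identity of Lemma 2.12 is easy to check numerically; the sketch below (plain Python; the toy 4-ary code is our own choice) evaluates both sides at exact rational points:

```python
from fractions import Fraction

def counts(a):
    """n_i(a): number of coordinates of a equal to i (Eq. 2.17)."""
    return [sum(1 for x in a if x == i) for i in range(4)]

def lee_weight(a):
    return sum(min(x, 4 - x) for x in a)

# A toy 4-ary code in Z_4^3 (any subset of Z_4^n works for this identity)
C = [(0, 0, 0), (1, 2, 3), (2, 2, 0), (3, 1, 1)]
n = 3

def swe(w, x, y):
    """Symmetric polynomial of C, Eq. (2.18)."""
    return sum(w ** counts(c)[0] * x ** (counts(c)[1] + counts(c)[3])
               * y ** counts(c)[2] for c in C)

def lee_poly(x, y):
    """Lee weight polynomial of C, Eq. (2.19)."""
    return sum(x ** (2 * n - lee_weight(c)) * y ** lee_weight(c) for c in C)

# Lemma 2.12: Lee_C(x, y) = swe_C(x^2, x*y, y^2)
for x, y in [(Fraction(1, 2), Fraction(3)), (Fraction(2, 3), Fraction(5, 7))]:
    assert lee_poly(x, y) == swe(x * x, x * y, y * y)
```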

By using Lee weight and Lee distance, we can extend the MacWilliams theorem to Z<sup>4</sup> codes, we have

**Theorem 2.5** *Let $C \subset \mathbb{Z}\_4^n$ be a linear code, $C^\perp$ its dual code, and $Lee\_C(x, y)$ the Lee weight polynomial of $C$; then*

$$Lee\_{C^\perp}(\mathbf{x}, \mathbf{y}) = \frac{1}{|C|} Lee\_C(\mathbf{x} + \mathbf{y}, \mathbf{x} - \mathbf{y}).$$

*Proof* Let $\psi$ be a nontrivial additive character of $\mathbb{Z}\_4$, given by

$$\psi(i) = (\sqrt{-1})^i, i = 0, 1, 2, 3.$$

Let *f* (*u*) be a function defined on Z*<sup>n</sup>* 4, we let

$$g(c) = \sum\_{u \in \mathbb{Z}\_4^n} f(u)\psi(\langle c, u \rangle). \tag{2.20}$$

As in the proof of Theorem 2.4, we have

$$\sum\_{c \in C} \mathbf{g}(c) = |C| \sum\_{u \in C^\perp} f(u). \tag{2.21}$$

Take

$$f(u) = w^{n\_0(u)} x^{n\_1(u) + n\_3(u)} y^{n\_2(u)}, \ u \in \mathbb{Z}\_4^n.$$

Write *<sup>u</sup>* <sup>=</sup> *<sup>u</sup>*1*u*<sup>2</sup> ... *un* <sup>∈</sup> <sup>Z</sup>*<sup>n</sup>* 4, then for each *i*, 0 ≤ *i* ≤ 3, we have

$$n\_i(u) = n\_i(u\_1) + n\_i(u\_2) + \dots + n\_i(u\_n).$$

Thus

$$f(u) = \prod\_{i=1}^{n} f(u\_i).$$

Let *<sup>c</sup>* <sup>=</sup> *<sup>c</sup>*1*c*<sup>2</sup> ... *cn* <sup>∈</sup> <sup>Z</sup>*<sup>n</sup>* 4, by (2.20),

$$g(c) = \prod\_{i=1}^{n} \left( \sum\_{u \in \mathbb{Z}\_4} f(u)\psi(c\_i u) \right).\tag{2.22}$$

Now we calculate the inner sum on the right side of equation (2.22).

$$\sum\_{u \in \mathbb{Z}\_4} f(u)\psi(c\_i u) = \begin{cases} w + 2\mathbf{x} + \mathbf{y}, & \text{if } c\_i = 0, \\ w - \mathbf{y}, & \text{if } c\_i = 1 \text{ or } 3, \\ w - 2\mathbf{x} + \mathbf{y}, & \text{if } c\_i = 2. \end{cases}$$

by (2.22),

$$\mathbf{g}(c) = (w + 2\mathbf{x} + \mathbf{y})^{n\_0(c)} (w - \mathbf{y})^{n\_1(c) + n\_3(c)} (w - 2\mathbf{x} + \mathbf{y})^{n\_2(c)}.$$

So

$$\sum\_{c \in C} \mathbf{g}(c) = sw e\_C(w + 2x + \mathbf{y}, w - \mathbf{y}, w - 2x + \mathbf{y}).$$

by (2.21),

$$|C|sw e\_{\mathbb{C}^\perp}(w, x, \mathbf{y}) = sw e\_{\mathbb{C}}(w + 2x + \mathbf{y}, w - \mathbf{y}, w - 2x + \mathbf{y}).$$

By Lemma 2.12, substituting $(w, x, y) \to (x^2, xy, y^2)$ and noting $x^2 + 2xy + y^2 = (x+y)^2$, $x^2 - y^2 = (x+y)(x-y)$ and $x^2 - 2xy + y^2 = (x-y)^2$, we obtain

$$Lee\_{C^\perp}(\mathbf{x}, \mathbf{y}) = \frac{1}{|C|} Lee\_C(\mathbf{x} + \mathbf{y}, \mathbf{x} - \mathbf{y}).$$

We have completed the proof.
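Theorem 2.5 can likewise be checked on a toy $\mathbb{Z}\_4$-linear code; in the sketch below the code $C = \{(c, c) : c \in \mathbb{Z}\_4\}$ and its brute-forced dual are our own illustrative choice:

```python
from fractions import Fraction
from itertools import product

def lee_weight(a):
    return sum(min(x % 4, 4 - x % 4) for x in a)

n = 2
# A Z_4-linear code C = {(c, c) : c in Z_4} and its dual under <u, v> mod 4
C = [(c, c) for c in range(4)]
Cperp = [u for u in product(range(4), repeat=n)
         if all(sum(ci * ui for ci, ui in zip(c, u)) % 4 == 0 for c in C)]

def lee_poly(code, x, y):
    """Lee_C(x, y), Eq. (2.19)."""
    return sum(x ** (2 * n - lee_weight(c)) * y ** lee_weight(c) for c in code)

# Theorem 2.5: Lee_{C^perp}(x, y) = (1/|C|) Lee_C(x + y, x - y)
for x, y in [(Fraction(2), Fraction(1, 3)), (Fraction(5, 7), Fraction(1, 2))]:
    assert lee_poly(Cperp, x, y) == lee_poly(C, x + y, x - y) / len(C)
```

Here both $C$ and $C^\perp$ have Lee weight enumerator $x^4 + 2x^2y^2 + y^4 = (x^2 + y^2)^2$, and the identity reduces to $\frac{1}{4}\bigl((x+y)^2 + (x-y)^2\bigr)^2 = (x^2 + y^2)^2$.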

# **2.4 Some Typical Codes**

# *2.4.1 Hadamard Codes*

In order to introduce Hadamard codes, we first define a Hadamard matrix of order $n$. Let $H = (a\_{ij})$ be an $n \times n$ matrix with $a\_{ij} = \pm 1$ and

$$HH'=nI\_n = \begin{bmatrix} n & 0 & \cdots & 0 \\ 0 & n & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & n \end{bmatrix},$$

*H* is called a Hadamard matrix of order *n*. It is easy to verify that the following *H*<sup>2</sup> is a Hadamard matrix of second order

$$H\_2 = \left[\begin{matrix} 1 & 1 \\ 1 & -1 \end{matrix}\right].$$

In order to obtain higher-order Hadamard matrices, a useful tool is the so-called Kronecker product. Let $A = (a\_{ij})\_{m \times m}$ and $B = (b\_{ij})\_{n \times n}$; then the Kronecker product $A \otimes B$ of $A$ and $B$ is defined as

$$A \otimes B = \begin{bmatrix} a\_{11}B & a\_{12}B & \cdots & a\_{1m}B \\ a\_{21}B & a\_{22}B & \cdots & a\_{2m}B \\ \vdots & \vdots & \cdots & \vdots \\ a\_{m1}B & a\_{m2}B & \cdots & a\_{mm}B \end{bmatrix}.$$

Obviously, $A \otimes B$ is a square matrix of order $nm$. The following results are easy to prove.

**Lemma 2.13** *Let $A$ be a Hadamard matrix of order $m$ and $B$ a Hadamard matrix of order $n$; then $A \otimes B$ is a Hadamard matrix of order $nm$.*

*Proof* Let *A* = (*ai j*)*<sup>m</sup>*×*<sup>m</sup>*, *B* = (*bi j*)*<sup>n</sup>*×*<sup>n</sup>*, *H* = *A* ⊗ *B*, then

$$\begin{aligned} HH' &= \begin{bmatrix} a\_{11}B & a\_{12}B & \cdots & a\_{1m}B \\ a\_{21}B & a\_{22}B & \cdots & a\_{2m}B \\ \vdots & \vdots & & \vdots \\ a\_{m1}B & a\_{m2}B & \cdots & a\_{mm}B \end{bmatrix} \cdot \begin{bmatrix} a\_{11}B' & a\_{21}B' & \cdots & a\_{m1}B' \\ a\_{12}B' & a\_{22}B' & \cdots & a\_{m2}B' \\ \vdots & \vdots & & \vdots \\ a\_{1m}B' & a\_{2m}B' & \cdots & a\_{mm}B' \end{bmatrix} \\ &= \begin{bmatrix} c\_{11}BB' & c\_{12}BB' & \cdots & c\_{1m}BB' \\ \vdots & \vdots & & \vdots \\ c\_{m1}BB' & c\_{m2}BB' & \cdots & c\_{mm}BB' \end{bmatrix}, \quad c\_{ij} = \sum\_{k=1}^{m} a\_{ik} a\_{jk}, \\ &= \begin{bmatrix} mnI\_{n} & 0 & \cdots & 0 \\ 0 & mnI\_{n} & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & mnI\_{n} \end{bmatrix} = mn I\_{nm}. \end{aligned}$$

The Lemma holds.

Since *H*<sup>2</sup> is a Hadamard matrix of order 2, then

$$H\_2 \otimes H\_2 = H\_2^{\otimes 2}, \quad \underbrace{H\_2 \otimes H\_2 \otimes \dots \otimes H\_2}\_{n} = H\_2^{\otimes n}$$

are Hadamard matrix of order 4 and order 2*<sup>n</sup>* respectively.
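The Kronecker construction can be sketched directly (plain Python; helper names are ours):

```python
def kron(A, B):
    """Kronecker product A ⊗ B of two square matrices (lists of lists)."""
    m, n = len(A), len(B)
    return [[A[i // n][j // n] * B[i % n][j % n] for j in range(m * n)]
            for i in range(m * n)]

def is_hadamard(H):
    """Check H H' = n I_n: rows pairwise orthogonal with squared norm n."""
    n = len(H)
    return all(sum(H[i][k] * H[j][k] for k in range(n)) == (n if i == j else 0)
               for i in range(n) for j in range(n))

H2 = [[1, 1], [1, -1]]
H4 = kron(H2, H2)             # H_2 ⊗ H_2, order 4
H8 = kron(H2, H4)             # order 8 = 2^3
assert is_hadamard(H2) and is_hadamard(H4) and is_hadamard(H8)
```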

Let *n* be an even number and *Hn* be a Hadamard matrix of order *n*, take α1, α2,...,α*<sup>n</sup>* as *n* row vectors, i.e.,

$$H\_n = \begin{bmatrix} \alpha\_1 \\ \alpha\_2 \\ \vdots \\ \alpha\_n \end{bmatrix}, -H\_n = \begin{bmatrix} -\alpha\_1 \\ -\alpha\_2 \\ \vdots \\ -\alpha\_n \end{bmatrix}.$$

We get $2n$ row vectors $\{\pm\alpha\_1, \pm\alpha\_2, \dots, \pm\alpha\_n\}$. In each row vector $\pm\alpha\_i$ we replace every component $-1$ with $0$; the vector obtained from $\alpha\_i$ is denoted $\overline{\alpha}\_i$ and the one obtained from $-\alpha\_i$ is denoted $\overline{-\alpha}\_i$, so each $\overline{\pm\alpha}\_i$ is a vector of $\mathbb{F}\_2^n$. Write

$$\mathcal{C} = \{ \overline{\pm \alpha\_1}, \overline{\pm \alpha\_2}, \dots, \overline{\pm \alpha\_n} \} \subset \mathbb{F}\_2^n.$$

*C* is called a Hadamard code.

**Theorem 2.6** *The minimum distance of a Hadamard code $C$ of length $n$ ($n$ even) is $d = \frac{n}{2}$.*

*Proof* Let $H\_n$ be a Hadamard matrix of order $n$ with row vectors $\alpha\_1, \alpha\_2, \dots, \alpha\_n$. Substitute $\alpha\_i \xrightarrow{\sigma} \overline{\alpha}\_i$, so that each $\overline{\alpha}\_i \in \mathbb{F}\_2^n$ becomes a binary codeword. This substitution does not change the corresponding Hamming distance, that is

$$\begin{cases} d(\alpha\_i, \alpha\_j) = d(\overline{\alpha}\_i, \overline{\alpha}\_j) \\ d(-\alpha\_i, -\alpha\_j) = d(\overline{\alpha}\_i, \overline{\alpha}\_j), \end{cases}$$

where $i \neq j$. Let us prove that the minimum distance of $C$ is $\frac{n}{2}$. Let $a = a\_1 a\_2 \dots a\_n$ and $b = b\_1 b\_2 \dots b\_n$ be two different row vectors of the Hadamard matrix $H\_n$; because of

$$ab'=0 \Rightarrow \sum\_{i=1}^{n} a\_i b\_i = 0.$$

And $a\_i = \pm 1$, $b\_i = \pm 1$. Let $d\_1$ be the number of positions where the characters agree and $d = d(a, b)$ the number where they differ; then $d\_1 - d = 0$, that is $d\_1 = d$, but $d\_1 + d = n$, so $d = \frac{n}{2}$. The Theorem holds.

**Corollary 2.6** *Let $C = \{\overline{\pm\alpha}\_1, \overline{\pm\alpha}\_2, \dots, \overline{\pm\alpha}\_n\}$ be a Hadamard code; then the Hamming distance between any two codewords $\overline{\pm\alpha}\_i$ and $\overline{\pm\alpha}\_j$ with $i \neq j$ is $\frac{n}{2}$.*

*Proof* $\{\pm\alpha\_1, \pm\alpha\_2, \dots, \pm\alpha\_n\}$ are the row vectors of the Hadamard matrix and their negatives. Let $a = \pm\alpha\_i$, $b = \pm\alpha\_j$ ($i \neq j$); then

$$ab' = \pm \sum\_{i=1}^{n} a\_i b\_i = 0 \Rightarrow d(a, b) = \frac{n}{2}.$$

A code of length $n$ with $M$ codewords and minimum distance $d$ is denoted $(n, M, d)$, in contrast with the notation $[n, k]$ or $[n, k, d]$ for linear codes. A Hadamard code is thus

$$C = (n, 2n, \frac{n}{2}).$$

When $n = 8$, $d = 4$, this is an extension of the Hamming code. When $n = 32$, $(32, 64, 16)$ is the code used by the U.S. Mars probe in 1969 to transmit pictures taken on Mars.
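Theorem 2.6 and Corollary 2.6 can be checked concretely for $n = 8$; the sketch below builds the $(8, 16, 4)$ Hadamard code from $H\_2^{\otimes 3}$ (helper names are ours):

```python
def kron(A, B):
    """Kronecker product of two square ±1 matrices."""
    m, n = len(A), len(B)
    return [[A[i // n][j // n] * B[i % n][j % n] for j in range(m * n)]
            for i in range(m * n)]

H2 = [[1, 1], [1, -1]]
H8 = kron(H2, kron(H2, H2))             # Hadamard matrix of order n = 8
n = 8

def to_binary(row):
    """Replace each -1 by 0 to get a codeword of F_2^n."""
    return tuple(1 if x == 1 else 0 for x in row)

# The Hadamard code: the 2n rows of H_n and -H_n, mapped into F_2^n
code = [to_binary(r) for r in H8] + [to_binary([-x for x in r]) for r in H8]
assert len(set(code)) == 2 * n          # an (n, 2n, n/2) = (8, 16, 4) code

def hd(x, y):
    return sum(a != b for a, b in zip(x, y))

# Codewords from different ± pairs are at distance exactly n/2; a row and
# its negation are at distance n, so the minimum distance is n/2.
d = min(hd(code[i], code[j]) for i in range(2 * n) for j in range(i + 1, 2 * n))
assert d == n // 2
```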

# *2.4.2 Binary Golay Codes*

In the theory and application of channel coding, binary Golay code is the most famous one. In order to introduce Golay code *G*<sup>23</sup> completely, we first introduce the concept of *t* − (*m*, *k*, λ) design.

Let *S* be a set of *m* elements, that is |*S*| = *m*. The elements in *S* are called points. Let R be the set of subsets with *k* elements in *S*, |R| = *M*, i.e.,

$$\mathcal{R} = \{B\_1, B\_2, \dots, B\_M\}, B\_i \subset S, |B\_i| = k, \ 1 \le i \le M.$$

Each element $B\_i$ of $\mathcal{R}$ is called a block.

**Definition 2.10** (*S*, R) is called *t* − (*m*, *k*, λ) design, if for any *T* ⊂ *S*, |*T* | = *t*, then there are exactly λ blocks *B* in R such that *T* ⊂ *B*. If (*S*, R) is a *t* − (*m*, *k*, λ) design, denote as (*S*, R) = *t* − (*m*, *k*, λ). If λ = 1, then *t* − (*m*, *k*, 1) is called a Steiner system.

In a *t* − (*m*, *k*, λ) design (*S*, R), we introduce its occurrence matrix. For any *a* ∈ *S*, the characteristic function χ*i*(*a*) is defined as

$$\chi\_i(a) = \begin{cases} 1, & \text{if } a \in B\_i, \\ 0, & \text{if } a \notin B\_i, \end{cases}$$

write *S* = {*a*1, *a*2,..., *am*}, R = {*B*1, *B*2,..., *BM* }, |R| = *M*. Matrix

$$A = (\chi\_j(a\_i))\_{m \times M} = \begin{bmatrix} \chi\_1(a\_1) & \chi\_2(a\_1) & \cdots & \chi\_M(a\_1) \\ \chi\_1(a\_2) & \chi\_2(a\_2) & \cdots & \chi\_M(a\_2) \\ \vdots & \vdots & & \vdots \\ \chi\_1(a\_m) & \chi\_2(a\_m) & \cdots & \chi\_M(a\_m) \end{bmatrix},$$

*A* is called the occurrence matrix of *t* − (*m*, *k*, λ) design.

Let's now consider a concrete example, the $2 - (11, 6, 3)$ design, where $S$ has 11 points, each block in $\mathcal{R}$ has 6 points, and any two points of $S$ are contained in exactly three blocks.

**Lemma 2.14** *The parameters of a* $2 - (11, 6, 3)$ *design are uniquely determined; that is, let $S = \{a\_1, a\_2, \dots, a\_{11}\}$, then there are 11 blocks in $\mathcal{R}$,*

$$\mathfrak{R} = \{B\_1, B\_2, \dots, B\_{11}\}.$$

*And for any a* ∈ *S, exactly* 6 *blocks Bj in* R *contain a.*

*Proof* Suppose each $a \in S$ is contained in exactly $l$ blocks $B\_j$. Each such block contains $6 - 1 = 5$ points other than $a$, and each of the other 10 points lies with $a$ in exactly 3 blocks, so $5l = 10 \times 3$, hence $l = 6$. In addition, suppose $|\mathcal{R}| = M$; counting point-block incidences, since each block has 6 points and each point lies in exactly 6 blocks, we get $6M = 11 \times 6$, so $M = 11$.

By Lemma 2.14, the occurrence matrix $N$ of the $2 - (11, 6, 3)$ design is a square matrix of order 11:

$$N = \begin{bmatrix} \chi\_1(a\_1) & \chi\_2(a\_1) & \cdots & \chi\_{11}(a\_1) \\ \chi\_1(a\_2) & \chi\_2(a\_2) & \cdots & \chi\_{11}(a\_2) \\ \vdots & \vdots & \cdots & \vdots \\ \chi\_1(a\_{11}) & \chi\_2(a\_{11}) & \cdots & \chi\_{11}(a\_{11}) \end{bmatrix}.$$

And every row of *N* has exactly six 1's and five 0's, and every column of *N* has exactly six 1's and five 0's.

**Lemma 2.15** *Let N be the occurrence matrix of* 2 − (11, 6, 3) *design, then*

$$NN' = 3I\_{11} + 3J\_{11}, \ J\_{11} = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \cdots & 1 \end{bmatrix}.$$

*If N is regarded as a square matrix of order 11 over* F2*, then*

$$NN' = I\_{11} + J\_{11}.$$

*Further,* rank$(N) = 10$*, and the solutions of the linear equation system $XN = 0$ are exactly the two repetition codewords $\mathbf{0}$ and $\mathbf{1}$ ($\mathbf{0} = (0, 0, \dots, 0)$, $\mathbf{1} = (1, 1, \dots, 1)$) in $\mathbb{F}\_2^{11}$.*

*Proof* Let *N N*′ = (*bi j*)<sub>11×11</sub>, defined by

$$b\_{ij} = \sum\_{k=1}^{11} \chi\_k(a\_i)\chi\_k(a\_j).$$

When *i* ≠ *j*, *bi j* = 3; when *i* = *j*, *bi j* = 6, so we have

$$NN' = 3I\_{11} + 3J\_{11} \equiv I\_{11} + J\_{11}(\text{mod } 2).$$

Let *N*(mod 2) still be denoted by *N*, a square matrix of order 11 over F2. Then we have

$$\text{rank}(N) \ge \text{rank}(NN') = \text{rank}(I\_{11} + J\_{11}) = 10.$$

On the other hand, each column vector of *N* has exactly six 1's and five 0's, so

$$(1, 1, \dots, 1)N = (0, 0, \dots, 0) \in \mathbb{F}\_2^{11},$$

hence rank(*N*) = 10 and the solution space of *X N* = 0 is a one-dimensional linear subspace of F<sub>2</sub><sup>11</sup>. So there are exactly two solutions of *X N* = 0:

$$\mathbf{x} = (0, 0, \dots, 0), \ \mathbf{x} = (1, 1, \dots, 1).$$

The Lemma holds.

Next, let's construct a 12 × 24 matrix *G* = (*I*12, *P*), where

$$P = \begin{pmatrix} 0 & 1 & \cdots & 1 \\ 1 & & & \\ \vdots & & N & \\ 1 & & & \end{pmatrix}, \text{ and } G = \begin{bmatrix} \alpha\_1 \\ \alpha\_2 \\ \vdots \\ \alpha\_{12} \end{bmatrix},$$

where α*i* ∈ F<sub>2</sub><sup>24</sup> (1 ≤ *i* ≤ 12) are the 12 row vectors of *G*. Obviously we have the weight function

$$w(\alpha\_1) = 12, \ w(\alpha\_i) = 8, \ 2 \le i \le 12. \tag{2.23}$$

**Lemma 2.16** *Let G be as above with row vectors* α1, α2,...,α12*, then* {α1, α2,...,α12} ⊂ F<sub>2</sub><sup>24</sup> *is a linearly independent set, and the weight of any nonzero linear combination is at least* 8*, that is*

$$w(a\_1\alpha\_1 + a\_2\alpha\_2 + \dots + a\_{12}\alpha\_{12}) \ge 8, \ a\_i \text{ not all zero.}\tag{2.24}$$

*Proof* Let's first prove that {α*i*}, 1 ≤ *i* ≤ 12, is a set of mutually orthogonal vectors, that is, the inner product < α*i*, α*j* > = α*i*α′*j* = 0 for *i* ≠ *j*. Obviously we have

$$<\alpha\_1, \alpha\_j> = \alpha\_1 \alpha\_j' = 6 \equiv 0 (\text{mod } 2), \ j \neq 1.$$

If *i* ≠ 1, *j* ≠ 1, *i* ≠ *j*, then

$$<\alpha\_i, \alpha\_j> = 1 + \sum\_{k=1}^{11} \chi\_k(a\_i)\chi\_k(a\_j) = 4 \equiv 0(\text{mod } 2).$$

So < α*i*, α*j* > = 0 whenever *i* ≠ *j*. Moreover, since the first 12 components of the α*i* form the identity matrix *I*12, {α1, α2,...,α12} is a linearly independent set in F<sub>2</sub><sup>24</sup>. If *ai* ∈ F2 are not all zero, write *a* = *a*1*a*2 ... *a*12; let's prove (2.24) by induction on w(*a*). If w(*a*) = 1, the proposition holds by (2.23). When w(*a*) ≥ 8, the identity block already contributes w(*a*) ≥ 8 nonzero components, so the proposition is immediate; for 2 ≤ w(*a*) ≤ 7, we can still prove

$$w(a\_1\alpha\_1 + a\_2\alpha\_2 + \dots + a\_{12}\alpha\_{12}) \ge 8.$$

So the Lemma holds.

**Definition 2.11** The linear code [24, 12] generated by the row vectors {α1, α2,..., α12} of *G* in F<sub>2</sub><sup>24</sup> is called the Golay code, denoted *G*24. Removing the last component of each α*i*, α*i* → ᾱ*i*, gives ᾱ*i* ∈ F<sub>2</sub><sup>23</sup>. The linear code [23, 12] generated by {ᾱ1, ᾱ2,..., ᾱ12} in F<sub>2</sub><sup>23</sup> is called the Golay code *G*23.

**Theorem 2.7** *Golay code G*<sup>23</sup> *is a perfect code* [23, 12] *with minimal distance of d* = 7*.*

*Proof* Because the minimal distance of linear codes is minimal weight, by Lemma 2.16,

$$w(a\_1\overline{\alpha}\_1 + a\_2\overline{\alpha}\_2 + \dots + a\_{12}\overline{\alpha}\_{12}) \ge w(a\_1\alpha\_1 + a\_2\alpha\_2 + \dots + a\_{12}\alpha\_{12}) - 1 \ge 7.$$

On the other hand, w(α*i*) = 8 for every *i* ≠ 1, and some such α*i* has last component 1, so that w(ᾱ*i*) = w(α*i*) − 1 = 7. So the minimum distance of *G*23 is *d* = 7. Finally, we note that

$$|G\_{23}| \sum\_{i=0}^{3} \binom{23}{i} = 2^{12} \sum\_{i=0}^{3} \binom{23}{i} = 2^{12} \cdot 2^{11} = 2^{23}.$$

By the sphere-packing condition of Theorem 2.1, *G*23 is a perfect code. The Theorem holds.
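The whole construction can be verified by brute force. Below is a minimal Python sketch, assuming the 2 − (11, 6, 3) design realized from the quadratic-residue difference set mod 11 (since by Lemma 2.14 the design is unique, any valid occurrence matrix gives the same parameters); it checks the design property, forms *G* = (*I*12, *P*), and confirms the minimum distances of *G*24 and *G*23.

```python
from itertools import combinations

# Blocks of the 2-(11,6,3) design: complements of translates of the
# quadratic-residue set mod 11 (an assumed standard realization).
residues = {1, 3, 4, 5, 9}                       # nonzero squares mod 11
blocks = [set(range(11)) - {(r + j) % 11 for r in residues} for j in range(11)]

# Occurrence matrix N: N[i][j] = 1 iff point i lies in block B_j
N = [[int(i in blocks[j]) for j in range(11)] for i in range(11)]

# Any two points lie in exactly 3 common blocks (the 2-(11,6,3) property)
for i, k in combinations(range(11), 2):
    assert sum(N[i][j] & N[k][j] for j in range(11)) == 3

# G = (I_12, P), with P bordered by 0, 1, ..., 1 as in the text
P = [[0] + [1] * 11] + [[1] + N[i] for i in range(11)]
G = [[int(i == j) for j in range(12)] + P[i] for i in range(12)]

def min_weight(rows, ncols):
    """Minimum weight over all nonzero F_2-combinations of the rows."""
    best = ncols
    for mask in range(1, 1 << len(rows)):
        word = [0] * ncols
        for i, row in enumerate(rows):
            if mask >> i & 1:
                word = [a ^ b for a, b in zip(word, row)]
        best = min(best, sum(word))
    return best

print(min_weight(G, 24))                        # 8: minimum distance of G24
print(min_weight([row[:23] for row in G], 23))  # 7: minimum distance of G23
```

The 4096-codeword search is instant; it reproduces exactly the bounds proved in Lemma 2.16 and Theorem 2.7.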

# *2.4.3 3-Ary Golay Code*

In order to introduce 3-ary Golay codes, we first define the Paley matrix of order *q*. Let *q* ≥ 3 be an odd prime power, and define a real-valued quadratic multiplicative character χ(*a*) on the finite field F*q* as

$$\chi(a) = \begin{cases} 0, & \text{if } a = 0; \\ 1, & \text{if } a \in (\mathbb{F}\_q^\*)^2; \\ -1, & \text{if } a \notin (\mathbb{F}\_q^\*)^2. \end{cases}$$

Obviously, χ is a character on F<sup>∗</sup>*q*. Because F<sup>∗</sup>*q* is a cyclic multiplicative group of order *q* − 1, we have

$$\chi(-1) = (-1)^{\frac{q-1}{2}} = \begin{cases} 1, & \text{if } q \equiv 1 (\text{mod } 4); \\ -1, & \text{if } q \equiv 3 (\text{mod } 4). \end{cases}$$

Write F*q* = {*a*0, *a*1,..., *aq*−1}, where *a*0 = 0; then the Paley matrix *Sq* of order *q* is defined as

$$S\_q = (\chi(a\_i - a\_j))\_{q \times q} = \begin{bmatrix} 0 & \chi(-a\_1) & \chi(-a\_2) & \cdots & \chi(-a\_{q-1}) \\ \chi(a\_1) & 0 & \chi(a\_1 - a\_2) & \cdots & \chi(a\_1 - a\_{q-1}) \\ \chi(a\_2) & \chi(a\_2 - a\_1) & 0 & \cdots & \chi(a\_2 - a\_{q-1}) \\ \cdots & \cdots & \cdots & \cdots & \cdots \\ \chi(a\_{q-1}) & \chi(a\_{q-1} - a\_1) & \cdots & \cdots & 0 \end{bmatrix}.$$

**Lemma 2.17** *The Paley matrix Sq of order q has the following properties:*

*(i) Sq Jq* = *Jq Sq* = 0*.*

*(ii) Sq S*′*q* = *q Iq* − *Jq .*

*(iii) S*′*q* = (−1)<sup>(*q*−1)/2</sup> *Sq .*

*Here, Iq is the unit matrix of order q and Jq is the square matrix of order q with all elements of 1.*

*Proof* Let *Sq Jq* = (*bi j*)*<sup>q</sup>*×*<sup>q</sup>* , then for ∀ 0 ≤ *i* ≤ *q* − 1, 0 ≤ *j* ≤ *q* − 1, there is

$$b\_{ij} = \sum\_{k=0}^{q-1} \chi(a\_i - a\_k) = \sum\_{c \in \mathbb{F}\_q} \chi(c) = 0.$$

So (i) holds. To prove (ii), let *Sq S*′*q* = (*ci j*)*q*×*q*, then

$$c\_{ij} = \sum\_{k=0}^{q-1} \chi(a\_i - a\_k)\chi(a\_j - a\_k).$$

Obviously, we have

$$c\_{ij} = \begin{cases} q-1, & \text{if } i=j; \\ -1, & \text{if } i \neq j. \end{cases}$$

So (ii) holds. To prove (iii), notice that χ(−1) = (−1)<sup>(*q*−1)/2</sup>, so

$$S\_q' = \chi(-1)S\_q = (-1)^{\frac{q-1}{2}}S\_q,$$

the Lemma holds.

Let *q* = 5 and consider the Paley matrix *S*5 of order 5; direct calculation gives

$$\mathcal{S}\_5 = \begin{bmatrix} 0 & 1 & -1 & -1 & 1 \\ 1 & 0 & 1 & -1 & -1 \\ -1 & 1 & 0 & 1 & -1 \\ -1 & -1 & 1 & 0 & 1 \\ 1 & -1 & -1 & 1 & 0 \end{bmatrix}.$$

In F<sup>11</sup> <sup>3</sup> , we consider a linear code *C* whose generator matrix is

$$G = \left(I\_6 \;\middle|\; \begin{matrix} 1 & 1 & 1 & 1 & 1 \\ & & S\_5 & & \end{matrix}\right).$$

So *C* is a six-dimensional linear subspace in F<sup>11</sup> <sup>3</sup> , that is *C* = [11, 6]. This code is called 3-ary Golay code. In order to further discuss 3-ary Golay codes [11, 6], we discuss the concept of extended codes of linear codes.

If *C* ⊂ F<sub>q</sub><sup>n</sup> is a *q*-ary linear code of length *n*, the extension code *C̄* of *C* is defined as

$$\overline{C} = \{ (c\_1, c\_2, \dots, c\_{n+1}) | (c\_1, c\_2, \dots, c\_n) \in \mathcal{C}, \text{ and } \sum\_{i=1}^{n+1} c\_i = 0 \}.$$

Obviously, *C̄* ⊂ F<sub>q</sub><sup>n+1</sup> is a linear code.

**Lemma 2.18** *If C* ⊂ F<sub>q</sub><sup>n</sup> *is a linear code with generator matrix G and parity-check matrix H, then the extension code C̄* ⊂ F<sub>q</sub><sup>n+1</sup> *has length n* + 1*, and its generator matrix Ḡ and parity-check matrix H̄ are*

$$
\overline{G} = [G, \beta], \text{ and } \overline{H} = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ & & & 0 \\ & H & & \vdots \\ & & & 0 \end{pmatrix},
$$

*respectively, where* β *is a column vector chosen so that the sum of* β *and all the column vectors of G is* 0*. Further, let q* = 2*; if the minimum distance d of C is odd, then the minimum distance of C̄ is d* + 1*.*

*Proof* The generator matrix and check matrix of *C̄* are given directly by the definition. For the last claim, take a codeword *c* = *c*1*c*2 ... *cn* ∈ *C* of minimal weight w(*c*) = *d*. Because *q* = 2, exactly *d* of the *ci* equal 1; since *d* is odd, the sum of the *ci* is 1 ≠ 0, so the extension component is *cn*+1 = 1, and

$$c^\* = c\_1 c\_2 \dots c\_{n+1} \in \overline{C} \text{ and } w(c^\*) = d+1.$$

This is the minimal weight in *C*. The lemma is proved.

Consider the extension code *C̄* = [12, 6] of the 3-ary Golay code *C* = [11, 6]; its generator matrix is

$$
\overline{G} = \left(I\_6 \;\middle|\; \begin{matrix} 1 & 1 & 1 & 1 & 1 & 0 \\ & & & & & -1 \\ & & S\_5 & & & \vdots \\ & & & & & -1 \end{matrix}\right). \tag{2.25}
$$

Note that the sum of the components of each row vector of *S*5 is 0, the inner product of two different row vectors of *S*5 is −1, and the inner product of a row vector with itself is 4 ≡ 1 (mod 3), so

$$
\overline{G} \cdot \overline{G}' = 0.
$$

Therefore, the extended code *C* is a self-dual code, that is (*C*)<sup>⊥</sup> = *C*.

**Theorem 2.8** *3-ary Golay code C is a perfect linear code* [11, 6]*, its minimum distance is 5, so it is a 2-error correcting code.*

*Proof* The weight of each row vector of *Ḡ* is 6, and by direct calculation every nonzero linear combination of the row vectors of *Ḡ* has weight at least 6, so the minimum distance of the extension code *C̄* is 6 ⇒ the minimum distance of *C* is at least 5; since some row of *G* has weight 5, the minimum distance of *C* is 5. So the disjoint radius of *C* is ρ1 = 2. And because

$$|C| = 3^6, \ \sum\_{i=0}^{2} \binom{11}{i} 2^i = 3^5,$$

the sphere-packing condition is satisfied:

$$|C| \sum\_{i=0}^{2} \binom{11}{i} 2^i = 3^6 \cdot 3^5 = 3^{11}.$$

Thus by Theorem 2.1, *C* is a perfect code, the Theorem holds.
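Theorem 2.8 can be confirmed by exhausting all 3⁶ = 729 codewords. A minimal Python sketch, using the concrete *S*5 above with entries reduced mod 3 (so −1 is represented by 2):

```python
from itertools import product
from math import comb

# Quadratic character mod 5 and the Paley matrix S5, reduced mod 3
chi = {0: 0, 1: 1, 2: -1, 3: -1, 4: 1}
S5 = [[chi[(i - j) % 5] % 3 for j in range(5)] for i in range(5)]

# G = (I6 | P): P has a first row of five 1's above S5, as in the text
P = [[1] * 5] + S5
G = [[int(i == j) for j in range(6)] + P[i] for i in range(6)]

# Weights of all 3^6 codewords of the 3-ary Golay code C = [11, 6]
weights = []
for coeffs in product(range(3), repeat=6):
    word = [sum(c * G[i][j] for i, c in enumerate(coeffs)) % 3 for j in range(11)]
    weights.append(sum(x != 0 for x in word))

print(min(w for w in weights if w > 0))   # 5: minimum distance of C

# Sphere-packing equality of Theorem 2.1
assert 3**6 * sum(comb(11, i) * 2**i for i in range(3)) == 3**11
```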

*Remark 2.1* It is worth noting that J. H. van Lint in 1971 (see reference 2 [24]) and A. Tietäväinen in 1973 (see reference 2 [43]) independently proved that the only nontrivial perfect codes with minimum distance greater than 3 over any finite field are the 2-ary Golay code *G*23 and the 3-ary Golay code [11, 6].

# *2.4.4 Reed–Muller Codes*

Reed and Muller proposed a class of 2-ary linear codes based on finite geometry in 1954. In order to discuss the structure and properties of these codes, we first prove some results in number theory.

**Lemma 2.19** *Let p be a prime and k*, *n two nonnegative integers whose p-adic expansions are*

$$n = \sum\_{i=0}^{l} n\_i p^i, \; k = \sum\_{i=0}^{l} k\_i p^i.$$

*Then*

$$\binom{n}{k} \equiv \prod\_{i=0}^{l} \binom{n\_i}{k\_i} (\text{mod } p), \text{ where } \binom{n\_i}{k\_i} = 0 \text{ if } k\_i > n\_i.$$

*Proof* If *k* = 0, then all *ki* = 0, so the above formula holds. If *n* = *k*, then *ni* = *ki*, and the formula also holds. So we may assume 1 ≤ *k* < *n*. Note the polynomial congruence

$$(1+x)^p \equiv 1 + x^p(\text{mod } p),$$

#### 2.4 Some Typical Codes 65

so we have

$$\begin{aligned} (1+x)^n &= (1+x)^{\sum\_{i=0}^l n\_i p^i} \\ &\equiv \prod\_{i=0}^l (1+x^{p^i})^{n\_i} (\text{mod } p), \end{aligned}$$

Comparing the coefficients of the *x<sup>k</sup>* terms on both sides of the above formula: if some *kj* > *nj*, then the *x<sup>k</sup>* term does not appear on the right side, which means that the coefficient of *x<sup>k</sup>* on the left side satisfies

$$
\binom{n}{k} \equiv 0 (\text{mod } p).
$$

If *ki* ≤ *ni* , ∀ 0 ≤ *i* ≤ *l*, then

$$\binom{n}{k} \equiv \prod\_{i=0}^{l} \binom{n\_i}{k\_i} (\text{mod } p).$$

We complete the proof of Lemma.
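Lemma 2.19 (Lucas' congruence) is easy to check numerically. A short Python sketch; `digits` and `lucas` are illustrative helper names, not from the text:

```python
from math import comb

def digits(n, p):
    """Base-p digits of n, least significant first."""
    out = []
    while n:
        out.append(n % p)
        n //= p
    return out or [0]

def lucas(n, k, p):
    """C(n, k) mod p via the digit-wise product of Lemma 2.19."""
    dn, dk = digits(n, p), digits(k, p)
    dk += [0] * (len(dn) - len(dk))          # pad k's digits with zeros
    prod = 1
    for ni, ki in zip(dn, dk):
        prod = prod * comb(ni, ki) % p       # comb(ni, ki) = 0 when ki > ni
    return prod

# Exhaustive comparison with the true binomial coefficient
for p in (2, 3, 5):
    for n in range(60):
        for k in range(n + 1):
            assert comb(n, k) % p == lucas(n, k, p)
```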

Massey first defined the concept of polynomial weight in 1973. On a finite field of characteristic 2 (*q* = 2*<sup>r</sup>*), the Hamming weight of a polynomial *f* (*x*) ∈ F*q*[*x*] is defined as

$$w(f(\mathbf{x})) = \text{The number of nonzero coefficients of } f(\mathbf{x})\text{.}$$

**Lemma 2.20** (Massey, 1973) *Let f* (*x*) = *b*0 + *b*1(*x* + *c*) + ··· + *bl*(*x* + *c*)*<sup>l</sup>* ∈ F*q*[*x*] *with bl* ≠ 0*, and let i*0 *be the smallest subscript i with bi* ≠ 0*, then*

$$w(f(\mathbf{x})) \ge w((\mathbf{x} + c)^{i\_0}).$$

*Proof* If *l* = 0, then *i*0 = 0 and the lemma holds. Suppose the lemma holds for all *l* < 2*<sup>n</sup>*; we consider 2*<sup>n</sup>* ≤ *l* < 2*<sup>n+1</sup>* and write *f* (*x*) as

$$\begin{aligned} f(\mathbf{x}) &= \sum\_{i=0}^{2^n - 1} b\_i (\mathbf{x} + c)^i + \sum\_{i=2^n}^{l} b\_i (\mathbf{x} + c)^{i} \\ &= f\_1(\mathbf{x}) + (\mathbf{x} + c)^{2^n} f\_2(\mathbf{x}) \\ &= f\_1(\mathbf{x}) + c^{2^n} f\_2(\mathbf{x}) + \mathbf{x}^{2^n} f\_2(\mathbf{x}), \end{aligned}$$

where deg *f*1(*x*) < 2*<sup>n</sup>*, deg *f*2(*x*) < 2*<sup>n</sup>*, and we used (*x* + *c*)<sup>2<sup>n</sup></sup> = *x*<sup>2<sup>n</sup></sup> + *c*<sup>2<sup>n</sup></sup> in characteristic 2. There are two situations to discuss:

(i) If *f*1(*x*) = 0, then w( *f* (*x*)) = 2w( *f*2(*x*)). Because *i*<sup>0</sup> ≥ 2*<sup>n</sup>*, so

$$\begin{split} w((\mathbf{x} + c)^{i\_0}) &= w((\mathbf{x}^{2^n} + c^{2^n})(\mathbf{x} + c)^{i\_0 - 2^n}) \\ &= 2w((\mathbf{x} + c)^{i\_0 - 2^n}). \end{split}$$

From inductive hypothesis

$$w(f\_2(\mathbf{x})) \ge w((\mathbf{x} + c)^{i\_0 - 2^n}).$$

So there are

$$w(f(\mathbf{x})) = 2w(f\_2(\mathbf{x})) \ge 2w((\mathbf{x} + c)^{i\_0 - 2^n}) = w((\mathbf{x} + c)^{i\_0}).$$

(ii) If *f*1(*x*) ≠ 0, let *i*1 be the smallest subscript of a nonzero coefficient in the (*x* + *c*)-expansion of *f*1(*x*), and *i*2 that of *f*2(*x*). If a nonzero term of *f*1(*x*) is cancelled by the corresponding term of *c*<sup>2<sup>n</sup></sup> *f*2(*x*), then *x*<sup>2<sup>n</sup></sup> *f*2(*x*) has a corresponding nonzero term, so we always have

$$w(f(\mathbf{x})) \ge w(f\_1(\mathbf{x})), \ w(f(\mathbf{x})) \ge w(f\_2(\mathbf{x})).$$

If *i*<sup>1</sup> < *i*2, then *i*<sup>0</sup> = *i*1, from inductive hypothesis,

$$w(f(\mathbf{x})) \ge w(f\_1(\mathbf{x})) \ge w((\mathbf{x} + c)^{i\_1}) = w((\mathbf{x} + c)^{i\_0}).$$

Similarly, if *i*<sup>2</sup> < *i*1, then *i*<sup>0</sup> = *i*2, there is

$$w(f(\mathbf{x})) \ge w(f\_2(\mathbf{x})) \ge w((\mathbf{x} + c)^{i\_2}) = w((\mathbf{x} + c)^{i\_0}).$$

If *i*1 = *i*2, the situation can always be reduced to the case *i*1 = *i*0, so in every case the Lemma holds.
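Lemma 2.20 can be tested by random sampling over F2 with *c* = 1; in the sketch below a polynomial over F2 is represented as a bit mask, bit *i* being the coefficient of *x<sup>i</sup>* (an assumed encoding, not from the text):

```python
from random import Random

def mul(a, b):
    """Carry-less product in F2[x] on bit-mask polynomials."""
    out = 0
    while a:
        if a & 1:
            out ^= b
        a >>= 1
        b <<= 1
    return out

def power_x_plus_1(i):
    """(x + 1)^i in F2[x]."""
    out = 1
    for _ in range(i):
        out = mul(out, 0b11)                 # 0b11 encodes x + 1
    return out

rng = Random(0)
for _ in range(500):
    l = rng.randrange(1, 64)
    coeffs = [rng.randrange(2) for _ in range(l + 1)]
    coeffs[l] = 1                            # b_l != 0
    i0 = next(i for i, b in enumerate(coeffs) if b)
    f = 0
    for i, b in enumerate(coeffs):           # f = sum b_i (x+1)^i
        if b:
            f ^= power_x_plus_1(i)
    # Lemma 2.20: w(f) >= w((x+1)^{i0}); weight = popcount of the mask
    assert bin(f).count("1") >= bin(power_x_plus_1(i0)).count("1")
```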

Next, we use Massey's method to construct Reed–Muller codes. Let *m* ≥ 1 and regard F<sub>2</sub><sup>m</sup> as an *m*-dimensional affine space, denoted *AG*(*m*, 2); a point α ∈ *AG*(*m*, 2) is written as an *m*-dimensional column vector. Let {*u*0, *u*1,..., *um*−1} be the standard basis of F<sub>2</sub><sup>m</sup>, that is

$$\boldsymbol{\alpha} = \begin{bmatrix} a\_0 \\ a\_1 \\ \vdots \\ a\_{m-1} \end{bmatrix}, \boldsymbol{u}\_0 = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \dots, \boldsymbol{u}\_{m-1} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix},$$

where *ai* = 0 or 1. Let's establish a 1 − 1 correspondence between the integers 0 ≤ *j* < 2*<sup>m</sup>* and the points of *AG*(*m*, 2). Let 0 ≤ *j* < 2*<sup>m</sup>*, then


$$j = \sum\_{i=0}^{m-1} a\_{ij} 2^i, a\_{ij} \in \mathbb{F}\_2.$$

We define

$$x\_j = \sum\_{i=0}^{m-1} a\_{ij} u\_i = \begin{bmatrix} a\_{0j} \\ a\_{1j} \\ \vdots \\ a\_{(m-1)j} \end{bmatrix} \in \mathbb{F}\_2^m,$$

Because *x<sub>j1</sub>* ≠ *x<sub>j2</sub>* whenever *j*1 ≠ *j*2, the set {*x<sub>j</sub>* | 0 ≤ *j* < 2*<sup>m</sup>*} gives all the points in F<sub>2</sub><sup>m</sup>. Write *n* = 2*<sup>m</sup>* and consider the matrix

$$E = [\mathbf{x}\_0, \mathbf{x}\_1, \dots, \mathbf{x}\_{n-1}] = \begin{bmatrix} a\_{00} & a\_{01} & \cdots & a\_{0(n-1)} \\ a\_{10} & a\_{11} & \cdots & a\_{1(n-1)} \\ \vdots & \vdots & \cdots & \vdots \\ a\_{(m-1)0} & a\_{(m-1)1} & \cdots & a\_{(m-1)(n-1)} \end{bmatrix}\_{m \times n}$$

Each row vector α*i* = (*ai*0, *ai*1,..., *ai*(*n*−1)) (0 ≤ *i* ≤ *m* − 1) of *E* is a vector in F<sub>2</sub><sup>n</sup>, so *E* is written as

$$E = \begin{bmatrix} \alpha\_0 \\ \alpha\_1 \\ \vdots \\ \alpha\_{m-1} \end{bmatrix} = (a\_{ij})\_{m \times n} (0 \le i < m, 0 \le j < 2^m = n).$$

For each *<sup>i</sup>*, 0 <sup>≤</sup> *<sup>i</sup>* <sup>&</sup>lt; *<sup>m</sup>*, define a linear subspace in <sup>F</sup>*<sup>m</sup>* 2 ,

$$B\_i = \{ \mathbf{x}\_j \in \mathbb{F}\_2^m | a\_{ij} = 0 \}.$$

Obviously, *Bi* is a linear subspace, and an additive coset of *Bi* is called an (*m* − 1)-dimensional flat in F<sub>2</sub><sup>m</sup>. We consider *Ai* = *Bi* + *ui*,

$$A\_i = \{ \mathbf{x}\_j \in \mathbb{F}\_2^m | a\_{ij} = 1, 0 \le j < n \} \Rightarrow |A\_i| = 2^{m-1}.$$

We define the characteristic function χ*i*(α) in F*<sup>m</sup>* <sup>2</sup> according to *Ai* ,

$$\chi\_i(\alpha) = \begin{cases} 1, & \text{if } \alpha \in A\_i; \\ 0, & \text{if } \alpha \notin A\_i. \end{cases}$$

where <sup>α</sup> <sup>∈</sup> <sup>F</sup>*<sup>m</sup>* <sup>2</sup> . So each row vector α*i*(0 ≤ *i* < *m*) in *E* can be expressed as

$$\alpha\_i = (\chi\_i(\mathbf{x}\_0), \chi\_i(\mathbf{x}\_1), \dots, \chi\_i(\mathbf{x}\_{n-1})).$$


For any two vectors <sup>α</sup> <sup>=</sup> (*b*0, *<sup>b</sup>*1,..., *bn*−1), β <sup>=</sup> (*c*0, *<sup>c</sup>*1,..., *cn*−1) in <sup>F</sup>*<sup>n</sup>* 2, define the product vector

$$\alpha\beta = (b\_0c\_0, b\_1c\_1, \dots, b\_{n-1}c\_{n-1}) \in \mathbb{F}\_2^n.$$

So for 0 ≤ *i*1,*i*<sup>2</sup> < *m*, we have the product of row vectors of *E*

$$
\alpha\_{i\_1}\alpha\_{i\_2} = (\chi\_{i\_1}(\mathbf{x}\_0)\chi\_{i\_2}(\mathbf{x}\_0), \chi\_{i\_1}(\mathbf{x}\_1)\chi\_{i\_2}(\mathbf{x}\_1), \dots, \chi\_{i\_1}(\mathbf{x}\_{n-1})\chi\_{i\_2}(\mathbf{x}\_{n-1})) .
$$

So the *j*-th (0 ≤ *j* < 2*<sup>m</sup>*) component of α*<sup>i</sup>*1α*<sup>i</sup>*<sup>2</sup> is

$$\chi\_{i\_1}(x\_j)\chi\_{i\_2}(x\_j) = \begin{cases} 1, & \text{if } x\_j \in A\_{i\_1} \cap A\_{i\_2}; \\ 0, & \text{if } x\_j \notin A\_{i\_1} \cap A\_{i\_2}. \end{cases}$$

From the definition of *Ai* , obviously,

$$|A\_{i\_1} \cap A\_{i\_2}| = 2^{m-2}.$$

**Lemma 2.21** *Let i*1,*i*2,...,*is be s* (0 ≤ *s* < *m*) *distinct indexes between* 0 *and m* − 1*, then*

$$|A\_{i\_1} \cap A\_{i\_2} \cap \dots \cap A\_{i\_s}| = 2^{m-s},$$

*And* <sup>α</sup>*<sup>i</sup>*1α*<sup>i</sup>*<sup>2</sup> ··· <sup>α</sup>*is* <sup>∈</sup> <sup>F</sup>*<sup>n</sup>* <sup>2</sup> *has a weight function*

$$w(\alpha\_{i\_1}\alpha\_{i\_2}\cdots\alpha\_{i\_s}) = 2^{m-s}.$$

*Proof* The first conclusion is obvious. Let's just prove the second conclusion,

$$\alpha = \alpha\_{i\_1}\alpha\_{i\_2}\cdots\alpha\_{i\_s} = (\chi\_{i\_1}(\mathbf{x}\_0)\cdots\chi\_{i\_s}(\mathbf{x}\_0), \chi\_{i\_1}(\mathbf{x}\_1)\cdots\chi\_{i\_s}(\mathbf{x}\_1), \dots, \chi\_{i\_1}(\mathbf{x}\_{n-1})\cdots\chi\_{i\_s}(\mathbf{x}\_{n-1}))$$

has exactly 2*<sup>m</sup>*<sup>−</sup>*<sup>s</sup>* indices *j* with *x<sub>j</sub>* ∈ *Ai*1 ∩ *Ai*2 ∩···∩ *Ais*, so there are 2*<sup>m</sup>*<sup>−</sup>*<sup>s</sup>* components of α equal to 1 and the others are 0, so

$$w(\alpha) = w(\alpha\_{i\_1}\alpha\_{i\_2}\cdots\alpha\_{i\_s}) = 2^{m-s},$$

the Lemma holds.
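For a small case, Lemma 2.21 can be verified directly. A Python sketch for *m* = 4 (*n* = 16), where row *i* of *E* is the characteristic vector of *Ai*, i.e., bit *i* of the index *j*:

```python
from itertools import combinations

m = 4
n = 2 ** m
# E[i][j] = a_ij, the i-th binary digit of j; row i is the
# characteristic vector of A_i = {x_j : a_ij = 1}
E = [[(j >> i) & 1 for j in range(n)] for i in range(m)]

# Every product of s distinct rows has weight 2^(m-s)
for s in range(1, m):
    for idx in combinations(range(m), s):
        prod = [1] * n
        for i in idx:
            prod = [p & e for p, e in zip(prod, E[i])]
        assert sum(prod) == 2 ** (m - s)
```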

For 0 ≤ *l* < 2*<sup>m</sup>*, *I*(*l*) is defined as an indicator set,

$$I(l) = \{\, i \mid 0 \le i < m, \ a\_{il} = 0 \,\}, \text{ where } l = \sum\_{i=0}^{m-1} a\_{il} 2^i.$$

The following properties of the indicator set *I*(*l*) are obvious:


The above properties are easy to verify. For example (*iii*): because *l* = *n* − 1 = 2*<sup>m</sup>* − 1 = 1 + 2 +···+ 2*<sup>m</sup>*<sup>−1</sup>, there is no subscript *i* with *ail* = 0, that is, *I*(*n* − 1) = ∅. Sometimes we write the indicator set as *I*(*l*) = {*i*1,*i*2,...,*is*}*<sub>l</sub>*.

**Lemma 2.22** *Let* 0 ≤ *l* < *n* = 2*<sup>m</sup>* *and I*(*l*) = {*i*1,*i*2,...,*is*}*, and suppose*

$$
\alpha\_{i\_1}\alpha\_{i\_2}\cdots \alpha\_{i\_s} = (b\_{l0}, b\_{l1}, \dots, b\_{l(n-1)}) \in \mathbb{F}\_2^n,
$$

*then in the ring* <sup>F</sup>2[*x*]*, there is*

$$(1+x)^l = \sum\_{j=0}^{n-1} b\_{lj} x^{n-1-j}.\tag{2.26}$$

*Proof* For 0 ≤ *j* < *n*, write *j* = *a*0*j* + *a*1*j* · 2 + ··· + *a*(*m*−1)*j* · 2*<sup>m</sup>*<sup>−1</sup>, then

$$n - 1 - j = \sum\_{i=0}^{m-1} c\_{ij} 2^i \text{, where } c\_{ij} = 1 - a\_{ij}.$$

By Lemma 2.19,

$$\binom{l}{n-1-j} \equiv \prod\_{i=0}^{m-1} \binom{a\_{il}}{c\_{ij}} (\text{mod } 2).$$

If

$$
\binom{l}{n-1-j} \equiv 1 \ (\text{mod } 2),
$$

then whenever *ail* = 0 we must have *ci j* = 0, i.e., *ai j* = 1; that is to say

$$\binom{l}{n-1-j} \equiv 1 \pmod{2} \Leftrightarrow a\_{ij} = 1, \text{ for } \forall \ i \in I(l).$$

on the other hand, from Lemma 2.21,

$$b\_{lj} = 1 \Leftrightarrow x\_j \in A\_{i\_1} \bigcap A\_{i\_2} \bigcap \dots \bigcap A\_{i\_s} \Leftrightarrow a\_{ij} = 1, \text{ when } i \in I(l).$$

Comparing the coefficients of the *x*<sup>*n*−1−*j*</sup> terms on both sides, we obtain formula (2.26):

$$(1+x)^l = \sum\_{j=0}^{n-1} b\_{lj} x^{n-1-j}.$$

The Lemma holds.
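The correspondence (2.26) between the vectors built from the rows of *E* and the polynomials (1 + *x*)*<sup>l</sup>* can be checked exhaustively for *m* = 4:

```python
from math import comb

m = 4
n = 2 ** m
E = [[(j >> i) & 1 for j in range(n)] for i in range(m)]

for l in range(n):
    # I(l): indices i whose binary digit a_il of l is 0
    I_l = [i for i in range(m) if not (l >> i) & 1]
    # N_l: componentwise product of the corresponding rows of E
    # (the empty product, l = n - 1, is the all-ones vector)
    N_l = [1] * n
    for i in I_l:
        N_l = [p & e for p, e in zip(N_l, E[i])]
    # Lemma 2.22: coefficient of x^(n-1-j) in (1+x)^l mod 2 is b_lj
    for j in range(n):
        assert comb(l, n - 1 - j) % 2 == N_l[j]
```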

For any 0 ≤ *l* < *n* = 2*<sup>m</sup>*, we define the index set *I*(*l*) = {*i*1,*i*2,...,*is*} and the corresponding vector in F<sub>2</sub><sup>n</sup>,

$$N\_l = \alpha\_{i\_1}\alpha\_{i\_2}\cdots\alpha\_{i\_s}.$$

The index set *I*(*l*) corresponding to different *l* is different, so the corresponding vector *Nl* is different; since the index set corresponding to *l* = *n* − 1 is an empty set, the corresponding vector *Nn*−<sup>1</sup> is defined as

$$N\_{n-1} = (1, 1, \ldots, 1) = e.$$

Let *e*0 = (1, 0,..., 0), . . . , *e<sub>n−1</sub>* = (0, 0,..., 1) be the standard basis of F<sub>2</sub><sup>n</sup>.

**Lemma 2.23** *For* 0 ≤ *j* < *n, we have*

$$e\_j = \prod\_{i=0}^{m-1} (\alpha\_i + (1 + a\_{ij})e),$$

*where* α*<sup>i</sup> is the i -th row of matrix E.*

*Proof* For a vector α in F<sub>2</sub><sup>n</sup>, its complement ᾱ is defined by replacing each component 1 of α with 0 and each component 0 with 1. So we have

$$
\alpha + \overline{\alpha} = e = (1, 1, \dots, 1), \forall \, \alpha \in \mathbb{F}\_2^n.
$$

When 0 ≤ *j* < *n* is given, we define the *j*-th complement of row vector α*i*(0 ≤ *i* < *m*) of matrix *E* as

$$
\overline{\alpha\_i(j)} = \begin{cases}
\alpha\_i, & \text{if } a\_{ij} = 1; \\
\overline{\alpha\_i}, & \text{if } a\_{ij} = 0.
\end{cases}
$$

Obviously, there is

$$
\alpha\_i + (1 + a\_{ij})e = \overline{\alpha\_i(j)},
$$

from the definition of index set *I*(*l*), we have

$$
\overline{\alpha\_i(j)} = \begin{cases}
\alpha\_i, & \text{if } i \notin I(j); \\
e - \alpha\_i, & \text{if } i \in I(j).
\end{cases}
$$


Now let's calculate

$$\begin{aligned} \prod\_{i=0}^{m-1} (\alpha\_i + (1 + a\_{ij})e) &= \prod\_{i=0}^{m-1} \overline{\alpha\_i(j)} \\ &= \prod\_{i \in I(j)} (e - \alpha\_i) \prod\_{i \notin I(j)} \alpha\_i = b. \end{aligned}$$

where *b* ∈ F<sub>2</sub><sup>n</sup>; let *b* = (*b*0, *b*1,..., *bn*−1). Obviously, *bj* = 1. If *k* ≠ *j*, then

$$b\_k = \prod\_{i \notin I(j)} a\_{ik} \cdot \prod\_{i \in I(j)} (1 - a\_{ik}) = 0.$$

Thus *b* = *e <sup>j</sup>* . We have completed the proof of Lemma.
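Lemma 2.23 can likewise be verified mechanically for *m* = 4; over F2 the factor α*i* + (1 + *ai j*)*e* is computed componentwise:

```python
m, n = 4, 16
E = [[(j >> i) & 1 for j in range(n)] for i in range(m)]

for j in range(n):
    b = [1] * n
    for i in range(m):
        aij = (j >> i) & 1
        # alpha_i + (1 + a_ij)e over F2: alpha_i itself if a_ij = 1,
        # its complement if a_ij = 0
        factor = [(e + 1 + aij) % 2 for e in E[i]]
        b = [x & y for x, y in zip(b, factor)]
    # Lemma 2.23: the product is the standard basis vector e_j
    assert b == [int(k == j) for k in range(n)]
```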

**Lemma 2.24** {*Nl*}<sub>0≤*l*<*n*</sub> *constitutes a basis of* F<sub>2</sub><sup>n</sup>*, where Nn*−1 = *e* = (1, 1,..., 1)*.*

*Proof* {*Nl*}0≤*l*<*<sup>n</sup>* has exactly *n* different vectors, let's prove that they are linearly independent. Let

$$N\_l = \alpha\_{i\_1}\alpha\_{i\_2}\cdots\alpha\_{i\_s} = (b\_{l0}, b\_{l1}, \dots, b\_{l(n-1)}),$$

$$\sum\_{l=0}^{n-1} c\_l N\_l = (\sum\_{l=0}^{n-1} c\_l b\_{l0}, \sum\_{l=0}^{n-1} c\_l b\_{l1}, \dots, \sum\_{l=0}^{n-1} c\_l b\_{l(n-1)}),$$

be a linear combination, where *c* = (*c*0, *c*1,..., *cn*−1) ≠ 0. Since the polynomials (1 + *x*)*<sup>l</sup>*, 0 ≤ *l* < *n*, have distinct degrees,

$$f(\mathbf{x}) = \sum\_{l=0}^{n-1} c\_l (1+\mathbf{x})^l \in \mathbb{F}\_2[\mathbf{x}], \ f(\mathbf{x}) \neq 0.$$

By Lemma 2.22, we have

$$f(\mathbf{x}) = \sum\_{j=0}^{n-1} (\sum\_{l=0}^{n-1} c\_l b\_{lj}) \mathbf{x}^{n-1-j}.$$

Since *f* (*x*) ≠ 0, some coefficient Σ<sup>*n*−1</sup><sub>*l*=0</sub> *clblj* ≠ 0, so Σ*l* *cl Nl* ≠ 0; that is, {*Nl*}<sub>0≤*l*<*n*</sub> is a basis. The Lemma holds.

**Definition 2.12** Let 0 ≤ *r* < *m*. The Reed–Muller code *R*(*r*, *m*) of order *r* is the linear code

$$R(r,m) = L(\{\alpha\_{i\_1}\alpha\_{i\_2}\cdots\alpha\_{i\_s} \mid 0 \le s \le r\}) \subset \mathbb{F}\_2^n,$$

where the vector corresponding to *s* = 0 is *e*.

Obviously, when *r* = 0, *R*(0, *m*) is the repetition code in F<sub>2</sub><sup>n</sup>:

$$R(0,m) = \{(0,0,\ldots,0), (1,1,\ldots,1)\}.$$

For general *r*, 0 ≤ *r* < *m*, *R*(*r*, *m*) is a *t*-dimensional linear subspace in F<sub>2</sub><sup>n</sup>, where

$$t = \sum\_{s=0}^{r} \binom{m}{s}.$$

**Lemma 2.25** *The dual code of Reed–Muller code R*(*r*, *m*) *of order r is R*(*m* − *r* − 1, *m*)*.*

*Proof* The dimensions of *R*(*r*, *m*) and *R*(*m* − *r* − 1, *m*) are

$$\dim(R(r,m)) = \sum\_{s=0}^{r} \binom{m}{s}$$

and

$$\dim(R(m-r-1,m)) = \sum\_{s=0}^{m-r-1} \binom{m}{s}.$$

Because

$$\begin{aligned} &\sum\_{s=0}^{r} \binom{m}{s} + \sum\_{s=0}^{m-r-1} \binom{m}{m-s} \\ &= \sum\_{s=0}^{r} \binom{m}{s} + \sum\_{s=r+1}^{m} \binom{m}{s} \\ &= \sum\_{s=0}^{m} \binom{m}{s} = (1+1)^{m} \\ &= 2^{m} = n. \end{aligned}$$

That is

$$
\dim(R(r,m)) + \dim(R(m-r-1,m)) = n.
$$

Let α*<sup>i</sup>*1α*<sup>i</sup>*<sup>2</sup> ··· α*is* , α*<sup>j</sup>*1α*<sup>j</sup>*<sup>2</sup> ··· α*jt* be the basis vectors of R(r,m) and R(m-r-1,m), respectively. Let

$$
\alpha = \alpha\_{i\_1} \alpha\_{i\_2} \cdots \alpha\_{i\_s}, \ \beta = \alpha\_{j\_1} \alpha\_{j\_2} \cdots \alpha\_{j\_t}, \ 
$$

by Lemma 2.21,

$$w(\alpha) = 2^{m-s}, \ w(\beta) = 2^{m-t}, \ s \le r < m, \ t \le m - r - 1,$$

and because the number *u* of distinct indices among *i*1,...,*is*, *j*1,..., *jt* satisfies *u* ≤ *s* + *t* < *m*, the product αβ = α*i*1α*i*2 ··· α*is* · α*j*1α*j*2 ··· α*jt* has weight

$$w(\alpha\beta) = 2^{m-u} \equiv 0 \ (\text{mod } 2),$$

so

$$<\alpha, \beta> = 0,$$

That is, the dual code of *R*(*r*, *m*) is *R*(*m* − *r* − 1, *m*). The Lemma holds.
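Both the dimension count and the orthogonality argument can be confirmed for *m* = 4. In the sketch, `basis(r)` is an illustrative helper listing *e* and the products of up to *r* distinct rows of *E*:

```python
from itertools import combinations

m, n = 4, 16
E = [[(j >> i) & 1 for j in range(n)] for i in range(m)]

def basis(r):
    """Spanning vectors of R(r, m): e (s = 0) and products of s <= r rows."""
    vecs = [[1] * n]
    for s in range(1, r + 1):
        for idx in combinations(range(m), s):
            v = [1] * n
            for i in idx:
                v = [a & b for a, b in zip(v, E[i])]
            vecs.append(v)
    return vecs

for r in range(m):
    Br, Bd = basis(r), basis(m - r - 1)
    assert len(Br) + len(Bd) == n            # dimensions add up to n = 2^m
    for a in Br:                             # every pair is orthogonal
        for b in Bd:
            assert sum(x & y for x, y in zip(a, b)) % 2 == 0
```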

**Theorem 2.9** *Reed–Muller codes R*(*r*, *m*) *of order r have minimal distance d* = 2*<sup>m</sup>*<sup>−</sup>*<sup>r</sup>; in particular, when r* = *m* − 2*, R*(*m* − 2, *m*) *is a linear code* [*n*, *n* − *m* − 1]*.*

*Proof* From Lemma 2.21, we have

$$w(\alpha\_{i\_1}\alpha\_{i\_2}\cdots\alpha\_{i\_s}) = 2^{m-s},$$

so the minimum distance of *R*(*r*, *m*) satisfies *d* ≤ 2*<sup>m</sup>*<sup>−</sup>*<sup>r</sup>*. On the other hand, let *I*1(*r*) be the set of all values *l* whose corresponding index sets {*i*1,*i*2,...,*is*} satisfy *s* ≤ *r*, and let

$$
\alpha\_{i\_1}\alpha\_{i\_2}\cdots \alpha\_{i\_s} = (b\_{l0}, b\_{l1}, \dots, b\_{l(n-1)}),
$$

then

$$f(\mathbf{x}) = \sum\_{l \in I\_1(r)} c\_l (1+\mathbf{x})^l = \sum\_{j=0}^{n-1} (\sum\_{l \in I\_1(r)} c\_l b\_{lj}) \mathbf{x}^{n-1-j}.$$

Therefore, the weight function of linear combination has the following relationship:

$$w(\sum\_{l \in I\_1(r)} c\_l \alpha\_{i\_1} \alpha\_{i\_2} \cdots \alpha\_{i\_s}) = w(f(\mathbf{x})).$$

Define *i*<sup>0</sup> as

$$i\_0 = \min\{l | l \in I\_1(r)\}.$$

Obviously,

$$i\_0 = 1 + 2 + \dots + 2^{m-r-1} = 2^{m-r} - 1,$$

from Lemma 2.20, then there is

$$w(f(\mathbf{x})) \ge w((\mathbf{x} + 1)^{i\_0}) = i\_0 + 1 = 2^{m - r}.$$

Because the combination numbers

$$
\binom{i\_0}{k} = \binom{2^{m-r}-1}{k} \ (0 \le k \le 2^{m-r}-1)
$$

are all odd, this is because

$$i\_0 = 1 + 2 + \dots + 2^{m-r-1}, \; k = k\_0 + k\_1 \cdot 2 + \dots + k\_{m-r-1} 2^{m-r-1},$$

with every *ki* ≤ 1, so by Lemma 2.19 we deduce

$$
\binom{i\_0}{k} \equiv \prod\_{i} \binom{1}{k\_i} (\text{mod } 2).
$$

So there is

$$
\binom{i\_0}{k} \equiv 1 \ (\text{mod } 2).
$$

In the end, we have *d* = 2*<sup>m</sup>*<sup>−</sup>*<sup>r</sup>*. If *r* = *m* − 2, then the minimum distance is 4, and the dimension of *R*(*m* − 2, *m*) is

$$\begin{aligned} t &= \sum\_{s=0}^{m-2} \binom{m}{s} = \sum\_{s=0}^{m} \binom{m}{s} - \binom{m}{m-1} - \binom{m}{m} \\ &= 2^m - m - 1 \\ &= n - m - 1. \end{aligned}$$

So *R*(*m* − 2, *m*) is a linear code [*n*, *n* − *m* − 1]. The theorem is proved.

Because *R*(*m* − 2, *m*) is a linear code of the form [*n*, *n* − *k*] with minimum distance 4, we regard *R*(*m* − 2, *m*) as a class of extended Hamming codes; it is not perfect, although Hamming codes themselves are perfect linear codes.
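For *m* = 4, the claim that *R*(*m* − 2, *m*) is an [*n*, *n* − *m* − 1] code with minimum distance 4 (here a [16, 11, 4] code) can be checked by brute force:

```python
from itertools import combinations

m, n = 4, 16
E = [[(j >> i) & 1 for j in range(n)] for i in range(m)]

# Spanning vectors of R(m-2, m) = R(2, 4): e and products of 1 or 2 rows
vecs = [[1] * n]
for s in range(1, m - 1):
    for idx in combinations(range(m), s):
        v = [1] * n
        for i in idx:
            v = [a & b for a, b in zip(v, E[i])]
        vecs.append(v)

assert len(vecs) == n - m - 1                # dimension 2^m - m - 1 = 11

# Minimum weight over all 2^11 - 1 nonzero codewords
best = n
for mask in range(1, 1 << len(vecs)):
    word = [0] * n
    for i, v in enumerate(vecs):
        if mask >> i & 1:
            word = [a ^ b for a, b in zip(word, v)]
    best = min(best, sum(word))
print(best)                                  # 4: minimum distance of R(2, 4)
```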

# **2.5 Shannon Theorem**

In channel transmission, due to channel interference, a codeword *x* ∈ *C* may fail to be decoded correctly after it is sent; the probability of this error is denoted *p*(*x*) and called the error probability of the codeword *x*. Once the code *C* is selected, then by the Hamming-distance "looks most like" decoding principle, the error probability *p*(*x*) of sending *x* and receiving *x*′ satisfies

$$\begin{cases} p(\mathbf{x}) = 0, & \text{if } d(\mathbf{x}, \mathbf{x}') \le \rho\_1 < \frac{1}{2}n; \\\\ p(\mathbf{x}) > 0, & \text{if } d(\mathbf{x}, \mathbf{x}') > \rho\_1. \end{cases}$$

where ρ<sup>1</sup> is the disjoint radius of code *C*. Therefore, the error probability *p*(*x*) of code word *x* is related to code *C*. The error probability of code *C* is

$$p(C) = \frac{1}{|C|} \sum\_{\mathbf{x} \in \mathcal{C}} p(\mathbf{x}) .$$


It is difficult to calculate the error probability of a codeword mathematically. We take the binary channel as an example: *C* ⊂ F<sub>2</sub><sup>n</sup> is a binary code of length *n*. To calculate the error probability *p*(*x*) of *x* ∈ *C*, we agree that the transmission error probability of the character 0 is *p*, *p* < 1/2, that is, the probability of receiving 1 when 0 is sent; the probability of a transmission error for the character 1 is also *p*. Although the value of *p* is very small, errors occur due to the interference of the channel. We further agree that the error probability of each transmission of a character 0 or 1 is *p*, independently of the other transmissions; this is called a memoryless channel. In a memoryless binary channel, the transmission of a codeword *x* = *x*1*x*2 ... *xn* ∈ *C* constitutes an *n*-fold Bernoulli trial. This probability model provides a theoretical basis for calculating the error probability of a codeword; let's take the repetition code as an example.

**Lemma 2.26** *Let An be the binary repetition code of length n, that is, An* = {**0**, **1**} ⊂ F<sub>2</sub><sup>n</sup>*, and let p*(*An*) *be its error probability, then*

$$\lim\_{n \to \infty} p(A\_n) = 0.$$

*Proof* The transmission of the codeword **0** = (0, 0,..., 0) is regarded as an *n*-fold Bernoulli trial: after each transmission the character 0 is received as either 0 or 1, the probability of receiving 0 is *q* = 1 − *p*, and the probability of receiving 1 is *p* < 1/2. Let 0 ≤ *k* ≤ *n*, then the probability that 0 appears *k* times is

$$
\binom{n}{k} q^k p^{n-k}.
$$

If *k* > *n*/2, then more than half of the characters in the received word are 0 after the codeword **0** is transmitted, and the received word 0̄ satisfies *d*(**0**, 0̄) ≤ *n* − *k* < *n*/2. Because the disjoint radius of the repetition code is *n*/2, according to the decoding principle we can always decode **0** → **0** correctly; therefore, an error for the codeword **0** = (0, 0,..., 0) ∈ F<sub>2</sub><sup>n</sup> occurs if and only if *k* ≤ *n*/2, and the error probability is

$$p(0) = \sum\_{0 \le k \le \frac{n}{2}} \binom{n}{k} q^k p^{n-k}.$$

Similarly, the error probability of the codeword $\mathbf{1} = (1, 1, \dots, 1) \in \mathbb{F}\_2^n$ is

$$p(1) = \sum\_{0 \le k \le \frac{n}{2}} \binom{n}{k} q^k p^{n-k}.$$

Therefore, the error probability of the repetition code $A\_n$ is

$$p(A\_n) = \sum\_{0 \le k \le \frac{n}{2}} \binom{n}{k} q^k p^{n-k}.$$

To calculate the limit of the above expression as *n* → ∞, first note that

$$\sum\_{0 \le k \le \frac{n}{2}} \binom{n}{k} < \sum\_{0 \le k \le n} \binom{n}{k} = 2^n.$$

Because $p < \frac{1}{2}$, we have $p < q$, and when $k \le \frac{n}{2}$,

$$k \log \frac{q}{p} \le \frac{n}{2} \log \frac{q}{p}.$$

From the above inequality it follows directly that

$$q^k p^{n-k} \le (qp)^{\frac{n}{2}}.$$

Thus

$$p(A\_n) \le 2^n (qp)^{\frac{n}{2}} = (4qp)^{\frac{n}{2}}.$$

Because when $p < \frac{1}{2}$,

$$p^2 - p + \frac{1}{4} = (p - \frac{1}{2})^2 > 0,$$

so

$$p(1-p) = pq < \frac{1}{4}, \text{ that is } 4pq < 1.$$

Therefore,

$$0 \le \lim\_{n \to \infty} p(A\_n) \le \lim\_{n \to \infty} (4qp)^{\frac{n}{2}} = 0.$$

The Lemma holds.
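
The proof can be checked numerically. The following Python sketch (function names ours, chosen only for illustration) evaluates the exact error probability of the repetition code and the bound $(4pq)^{\frac{n}{2}}$ obtained above:

```python
from math import comb

def repetition_error_prob(n, p):
    # Exact p(A_n): decoding fails iff the correct character appears
    # at most n/2 times (the sum in the proof of Lemma 2.26).
    q = 1 - p
    return sum(comb(n, k) * q**k * p**(n - k) for k in range(n // 2 + 1))

def proof_bound(n, p):
    # The upper bound p(A_n) <= (4pq)^(n/2) derived in the proof.
    return (4 * p * (1 - p)) ** (n / 2)

p = 0.1
for n in (3, 11, 51):
    exact, bound = repetition_error_prob(n, p), proof_bound(n, p)
    assert exact <= bound  # the proof's inequality
    print(f"n={n:3d}  p(A_n)={exact:.3e}  (4pq)^(n/2)={bound:.3e}")
```

Both quantities visibly tend to 0 as *n* grows, as the lemma asserts.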

Below, we assume that the channel is a memoryless symmetric binary channel and that each code is a binary code. The error probability of each transmission of the characters 0 and 1 is *p*, with $q = 1 - p$ and $p < \frac{1}{2}$. For a given codeword length *n* and number of codewords $M = M\_n$, we define Shannon's probability $P^\*(n, M\_n, p)$ as

$$P^\*(n, M\_n, p) = \min\{P(C) | C \subset \mathbb{F}\_2^n, \ |C| = M\_n\}.$$

Shannon proved the following famous theorem in 1948.

**Theorem 2.10** (Shannon) *In a memoryless symmetric binary channel, let $0 < \lambda < 1 + p \log p + q \log q$ be a given real number and $M\_n = 2^{[\lambda n]}$, then we have*

$$\lim\_{n \to \infty} P^\*(n, M\_n, p) = 0.$$

In order to understand the meaning of Shannon's theorem and prove it, we need some auxiliary conclusions.

**Lemma 2.27** *Let $0 < \lambda < 1 + p \log p + q \log q$ be a given real number and $C \subset \mathbb{F}\_2^n$ any binary code; if $|C| = 2^{[\lambda n]}$, then the code rate $R\_C$ of C satisfies*

$$
\lambda - \frac{1}{n} < R\_C \le \lambda.
$$

*In particular, when n* → ∞*, the rate of C approaches* λ*.*

#### *Proof*

$$|C| = 2^{[\lambda n]} \Rightarrow \log\_2|C| = [\lambda n] \le \lambda n.$$

Therefore,

$$R\_C = \frac{1}{n} \log\_2 |C| \le \lambda.$$

From the properties of square bracket function,

$$
\lambda n < [\lambda n] + 1,
$$

so

$$
\lambda n - 1 < [\lambda n] = \log\_2 |C|.
$$

There are

$$
\lambda - \frac{1}{n} < \frac{1}{n} \log\_2 |C| = R\_C.
$$

Lemma 2.27 holds.
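
Lemma 2.27 is easy to verify numerically; in the following Python sketch (variable names ours), [λ*n*] is the floor function:

```python
from math import floor, log2

lam = 0.31  # an arbitrary rate target, 0 < lam < 1
for n in (10, 100, 1000, 10000):
    M = 2 ** floor(lam * n)          # |C| = 2^[lam*n], [.] the floor function
    R = log2(M) / n                  # code rate R_C = (1/n) * log2 |C|
    assert lam - 1 / n < R <= lam    # the bounds of Lemma 2.27
    print(n, R)
```

As *n* grows, *R* approaches λ, as the lemma states.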

Combining Lemma 2.27, the significance of Shannon's theorem is that as the code length *n* increases to infinity, the code rate tends to the capacity 1 − *H*(*p*) of the channel while there exists a code *C* whose error probability is arbitrarily small; following Shannon, such a code is called a "good code". Shannon first proved the existence of "good codes" under more general conditions by the probabilistic method. Theorem 2.10 is only a special case of Shannon's channel coding theorem. To prove Shannon's theorem, we must accurately estimate the error probability of a given number of codewords under the decoding principle.

**Lemma 2.28** *In the memoryless binary channel, let the probability of each transmission error of the characters 0 and 1 be p, $q = 1 - p$. Suppose a codeword $x = x\_1 x\_2 \cdots x\_n \in \mathbb{F}\_2^n$ has exactly ω character errors during transmission. Then for any ε > 0, letting $b = \sqrt{npq/\varepsilon}$, we have*

$$P\{\omega > np + b\} \le \varepsilon.$$

*Proof* For any codeword $x = x\_1 x\_2 \cdots x\_n \in \mathbb{F}\_2^n$ transmitted in a memoryless binary channel, the transmission can be regarded as an *n*-fold Bernoulli trial. The number ω of errors in *x* can be regarded as a discrete random variable with values 0, 1, 2,..., *n*; the probability that this random variable takes the value ω is

$$b(\omega; n, p) = \binom{n}{\omega} p^{\omega} q^{n-\omega}.$$

Therefore, ω is a discrete random variable obeying the binomial distribution. From Lemma 1.18 of the first chapter, the expected value *E*(ω) and variance *D*(ω) of ω are as follows:

$$E(\omega) = np, \; D(\omega) = npq.$$

From the Chebyshev inequality of Corollary 1.2, for any *k* > 0,

$$P\{|\omega - E(\omega)| \ge k\sqrt{D(\omega)}\} \le \frac{1}{k^2}.$$

Take $k = \frac{1}{\sqrt{\varepsilon}}$; then $k\sqrt{D(\omega)} = \sqrt{npq/\varepsilon} = b$, and we have

$$P\{\omega > np + b\} \le P\{|\omega - np| \ge b\} \le \varepsilon.$$

That is

$$P\{\omega > np + b\} \le \varepsilon.$$

Lemma 2.28 holds.
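
The guarantee of Lemma 2.28 can be tested against the exact binomial tail; the Python sketch below (names and parameters ours) shows that the Chebyshev bound ε is in fact quite generous:

```python
from math import comb, sqrt

def tail_prob(n, p, t):
    # P{omega > t} for omega ~ Binomial(n, p): exact sum of the upper tail.
    q = 1 - p
    return sum(comb(n, w) * p**w * q**(n - w) for w in range(int(t) + 1, n + 1))

n, p, eps = 200, 0.1, 0.05
b = sqrt(n * p * (1 - p) / eps)      # b = sqrt(npq/eps) as in Lemma 2.28
exact = tail_prob(n, p, n * p + b)
assert exact <= eps                  # the Chebyshev guarantee of the lemma
print(f"exact tail {exact:.2e} <= eps = {eps}")
```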

**Lemma 2.29** *Take $\rho = [np + b]$, where $b = \sqrt{np(1-p)/\varepsilon}$, then*

$$\begin{aligned} \frac{\rho}{n} \log \frac{\rho}{n} &= p \log p + O(\frac{1}{\sqrt{n}}),\\ (1 - \frac{\rho}{n}) \log(1 - \frac{\rho}{n}) &= q \log q + O(\frac{1}{\sqrt{n}}).\end{aligned}$$

*Proof* When ε > 0 is given, $b = O(\sqrt{n})$, so ρ can be rewritten as

$$
\rho = np + O(\sqrt{n}), \ \frac{\rho}{n} = p + O(\frac{1}{\sqrt{n}}).
$$

Thus

$$\begin{aligned} \frac{\rho}{n} \log \frac{\rho}{n} &= (p + O(\frac{1}{\sqrt{n}})) \log(p + O(\frac{1}{\sqrt{n}})) \\ &= (p + O(\frac{1}{\sqrt{n}})) (\log p + \log(1 + O(\frac{1}{\sqrt{n}}))) .\end{aligned}$$

For a real number *x* with |*x*| < 1, we have the following Taylor expansion

$$\log(1+x) = x - \frac{1}{2}x^2 + \frac{1}{3}x^3 - \frac{1}{4}x^4 + \cdots$$

So when |*x*| < 1, we have

$$
\log(1+x) = O(|x|),
$$

thus

$$
\log(1 + O(\frac{1}{\sqrt{n}})) = O(\frac{1}{\sqrt{n}}),
$$

we have

$$\begin{split} \frac{\rho}{n} \log \frac{\rho}{n} &= (p + O(\frac{1}{\sqrt{n}})) (\log p + O(\frac{1}{\sqrt{n}})) \\ &= p \log p + O(\frac{1}{\sqrt{n}}). \end{split}$$

Similarly, for the second asymptotic formula,

$$(1 - \frac{\rho}{n})\log(1 - \frac{\rho}{n}) = q \log q + O(\frac{1}{\sqrt{n}}),$$

and Lemma 2.29 holds.
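
The two asymptotic formulas can be observed numerically; in the Python sketch below (names and parameters ours), the error of the first formula shrinks like $O(\frac{1}{\sqrt{n}})$ as *n* grows:

```python
from math import floor, log2, sqrt

p, eps = 0.1, 0.05

def lhs(n):
    # (rho/n) * log2(rho/n), with rho = [np + b] and b = sqrt(np(1-p)/eps)
    b = sqrt(n * p * (1 - p) / eps)
    rho = floor(n * p + b)
    r = rho / n
    return r * log2(r)

target = p * log2(p)   # the limiting value p*log(p) of Lemma 2.29
errors = [abs(lhs(n) - target) for n in (10**3, 10**4, 10**5)]
assert errors[0] > errors[1] > errors[2]   # error decays like O(1/sqrt(n))
print(errors)
```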

To prove Shannon's theorem we define the following auxiliary function: for any two codewords $x, y \in \mathbb{F}\_2^n$ and ρ ≥ 0, define

$$f\_{\rho}(\mathbf{x}, \mathbf{y}) = \begin{cases} 0, & \text{if } d(\mathbf{x}, \mathbf{y}) > \rho; \\ 1, & \text{if } d(\mathbf{x}, \mathbf{y}) \le \rho. \end{cases}$$

Let *<sup>C</sup>* = {*x*1, *<sup>x</sup>*2,..., *xM* } ⊂ <sup>F</sup>*<sup>n</sup>* <sup>2</sup> be a binary code of |*C*| = *M*, define

$$g\_i(\mathbf{y}) = 1 - f\_{\rho}(\mathbf{y}, \mathbf{x}\_i) + \sum\_{j \neq i} f\_{\rho}(\mathbf{y}, \mathbf{x}\_j).$$

**Lemma 2.30** *Assuming $y \in \mathbb{F}\_2^n$ is a given codeword, then*

$$\begin{cases} g\_i(\mathbf{y}) = 0, & \text{if } \mathbf{x}\_i \in C \text{ is the only codeword so that } d(\mathbf{y}, \mathbf{x}\_i) \le \rho, \\ g\_i(\mathbf{y}) \ge 1, & \text{otherwise.} \end{cases}$$

*Proof* If there is a unique $x\_i \in C$ such that $d(y, x\_i) \le \rho$, then $f\_\rho(y, x\_i) = 1$ but $f\_\rho(y, x\_j) = 0$ for $j \neq i$, therefore

$$g\_i(\mathbf{y}) = 1 - f\_\rho(\mathbf{y}, \mathbf{x}\_i) + \sum\_{j \neq i} f\_\rho(\mathbf{y}, \mathbf{x}\_j) = 0.$$

If *d*(*y*, *xi*) > ρ, then *f*ρ(*y*, *xi*) = 0, so

$$g\_i(\mathbf{y}) = 1 - f\_\rho(\mathbf{y}, \mathbf{x}\_i) + \sum\_{j \neq i} f\_\rho(\mathbf{y}, \mathbf{x}\_j) = 1 + \sum\_{j \neq i} f\_\rho(\mathbf{y}, \mathbf{x}\_j) \ge 1.$$

If $d(y, x\_i) \le \rho$ but there is at least one $x\_k \neq x\_i$ such that $d(y, x\_k) \le \rho$, then

$$\begin{aligned} g\_i(\mathbf{y}) &= 1 - f\_\rho(\mathbf{y}, \mathbf{x}\_i) + \sum\_{j \neq i} f\_\rho(\mathbf{y}, \mathbf{x}\_j), \\ &= 1 + \sum\_{j \neq i, j \neq k} f\_\rho(\mathbf{y}, \mathbf{x}\_j) \ge 1. \end{aligned}$$

Lemma 2.30 holds.
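
Lemma 2.30 can be verified exhaustively on a toy code; in the Python sketch below, the choice of the length-3 repetition code with ρ = 1 is ours, made only for illustration:

```python
from itertools import product

def hamming(x, y):
    # Hamming distance d(x, y)
    return sum(a != b for a, b in zip(x, y))

def f(rho, y, x):
    # f_rho(y, x) = 1 iff d(y, x) <= rho
    return 1 if hamming(y, x) <= rho else 0

def g(C, i, y, rho):
    # g_i(y) = 1 - f_rho(y, x_i) + sum_{j != i} f_rho(y, x_j)
    return 1 - f(rho, y, C[i]) + sum(f(rho, y, C[j]) for j in range(len(C)) if j != i)

C = [(0, 0, 0), (1, 1, 1)]   # toy example: the length-3 repetition code
rho = 1
for y in product((0, 1), repeat=3):
    for i in range(len(C)):
        unique = (f(rho, y, C[i]) == 1
                  and all(f(rho, y, C[j]) == 0 for j in range(len(C)) if j != i))
        # Lemma 2.30: g_i(y) = 0 exactly in the unique-ball case, else >= 1
        assert (g(C, i, y, rho) == 0) == unique
        assert unique or g(C, i, y, rho) >= 1
print("Lemma 2.30 verified on the toy code")
```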

With the above preparation, we give the proof of Shannon's theorem.

*Proof* (Proof of Theorem 2.10) According to the assumptions of the theorem, let $0 < \lambda < 1 + p \log p + q \log q$ be a given positive real number ($p < \frac{1}{2}$), and

$$M = M\_n = 2^{[\lambda n]}, \ |C| = M.$$

Let

$$C = \{ \mathbf{x}\_1, \mathbf{x}\_2, \dots, \mathbf{x}\_M \} \subset \mathbb{F}\_2^n,$$

ε > 0 is any given positive number,

$$b = \sqrt{\frac{npq}{\varepsilon}}, \ \rho = [pn + b].$$

Because $p < \frac{1}{2}$, when *n* is sufficiently large we have $\rho = pn + O(\sqrt{n}) < \frac{1}{2}n$. To calculate the error probability of a codeword $x\_i \in C$, suppose $x\_i$ is transmitted and *y* is received. If $d(x\_i, y) \le \rho$ and $x\_i$ is the unique codeword in *C* such that $d(y, x\_i) \le \rho$, then according to the decoding principle of "look the most like", $x\_i$ is the codeword in *C* most similar to *y*, so we can decode correctly as $y \to x\_i$; in this case the error probability of $x\_i$ is 0. Otherwise, a real decoding error occurs. On the other hand, the probability that *y* is received when $x\_i$ is transmitted is the conditional probability $p(y|x\_i)$, so the error probability of $x\_i$ is estimated as

$$\begin{split} P\_i &= p(\mathbf{x}\_i) \le \sum\_{\mathbf{y} \in \mathbb{F}\_2^n} p(\mathbf{y}|\mathbf{x}\_i) g\_i(\mathbf{y}) \\ &= \sum\_{\mathbf{y} \in \mathbb{F}\_2^n} p(\mathbf{y}|\mathbf{x}\_i) (1 - f\_\rho(\mathbf{y}, \mathbf{x}\_i)) + \sum\_{\mathbf{y} \in \mathbb{F}\_2^n} \sum\_{\substack{j=1 \\ j \neq i}}^M p(\mathbf{y}|\mathbf{x}\_i) f\_\rho(\mathbf{y}, \mathbf{x}\_j) .\end{split} \tag{2.27}$$

According to the definition of *f*ρ(*y*, *xi*), the first term of the above formula is the probability that the received codeword *y* sent by *xi* is not in ball *B*ρ(*xi*), i.e.

$$\sum\_{\mathbf{y}\in\mathbb{F}\_2^n} p(\mathbf{y}|\mathbf{x}\_i)(1-f\_\rho(\mathbf{y},\mathbf{x}\_i)) = P\{\text{received codes } \mathbf{y}|\mathbf{y}\notin B\_\rho(\mathbf{x}\_i)\}.$$

Because $\omega = d(y, x\_i)$ is exactly the number of character errors in the transmission $x\_i \to y$, from the Chebyshev inequality of Lemma 2.28 we have

$$P\{\text{received codeword } \mathbf{y} \notin B\_\rho(\mathbf{x}\_i)\} = P\{\omega > \rho\} \le P\{\omega > np + b\} \le \varepsilon,$$

from (2.27), we have

$$P\_i = p(\mathbf{x}\_i) \le \varepsilon + \sum\_{\mathbf{y} \in \mathbb{F}\_2^n} \sum\_{\substack{j=1 \\ j \ne i}}^M p(\mathbf{y}|\mathbf{x}\_i) f\_\rho(\mathbf{y}, \mathbf{x}\_j). \tag{2.28}$$

By the definition of the error probability *p*(*C*) of the code *C*, we have

$$p(C) = \frac{1}{M} \sum\_{i=1}^{M} p(\mathbf{x}\_i) \le \varepsilon + M^{-1} \sum\_{i=1}^{M} \sum\_{\mathbf{y} \in \mathbb{F}\_2^n} p(\mathbf{y}|\mathbf{x}\_i) \sum\_{\substack{j=1\\j\neq i}}^{M} f\_\rho(\mathbf{y}, \mathbf{x}\_j).$$

Since *C* is randomly selected, we can regard *p*(*C*) as a random variable, so Shannon's probability $P^\*(n, M\_n, p)$ is the minimum value of *p*(*C*) and hence no greater than the expected value of *p*(*C*), i.e.

$$\begin{aligned} &P^\*(n, M\_n, p) \le E(P(C)) \\ &\le \varepsilon + M^{-1} \sum\_{i=1}^M \sum\_{\mathbf{y} \in \mathbb{F}\_2^n} \sum\_{j=1 \atop j \ne i}^M E(p(\mathbf{y}|\mathbf{x}\_i) \cdot f\_\rho(\mathbf{y}, \mathbf{x}\_j)). \end{aligned}$$

When *i* is given, the random variables $p(y|x\_i)$ and $f\_\rho(y, x\_j)$ ($j \neq i$) are statistically independent, so

$$E(p(\mathbf{y}|\mathbf{x}\_i)\cdot f\_\rho(\mathbf{y}, \mathbf{x}\_j)) = E(p(\mathbf{y}|\mathbf{x}\_i))E(f\_\rho(\mathbf{y}, \mathbf{x}\_j)).$$

So there is

$$P^\*(n, M\_n, p) \le \varepsilon + M^{-1} \sum\_{i=1}^M \sum\_{\mathbf{y} \in \mathbb{F}\_2^n} \sum\_{\substack{j=1 \\ j \ne i}}^M E(p(\mathbf{y}|\mathbf{x}\_i)) E(f\_\rho(\mathbf{y}, \mathbf{x}\_j)). \tag{2.29}$$

Let us calculate the expected value of $f\_\rho(y, x\_j)$; because *y* is selected in $\mathbb{F}\_2^n$ with equal probability,

$$\begin{aligned} E(f\_\rho(\mathbf{y}, x\_j)) &= \sum\_{\mathbf{y} \in \mathbb{F}\_2^n} p(\mathbf{y}) f\_\rho(\mathbf{y}, x\_j) \\ &= \frac{1}{2^n} |B\_\rho(x\_j)| \\ &= \frac{1}{2^n} |B\_\rho(0)|. \end{aligned}$$

So there is

$$\begin{split} P^\*(n, M\_n, p) &\le \varepsilon + M^{-1} \sum\_{i=1}^M \sum\_{\mathbf{y} \in \mathbb{F}\_2^n} E\left(p(\mathbf{y}|\mathbf{x}\_i)\right) \sum\_{\substack{j=1 \\ j \neq i}}^M E\left(f\_\rho(\mathbf{y}, \mathbf{x}\_j)\right) \\ &= \varepsilon + M^{-1} \sum\_{i=1}^M \sum\_{\mathbf{y} \in \mathbb{F}\_2^n} E\left(p(\mathbf{y}|\mathbf{x}\_i)\right) \frac{(M-1)|B\_\rho(\mathbf{0})|}{2^n} .\end{split} \tag{2.30}$$

Now let us calculate the expected value of $p(y|x\_i)$ (*y* fixed, $x\_i$ randomly selected in *C*):

$$E(p(\mathbf{y}|\mathbf{x}\_i)) = \sum\_{i=1}^{M} p(\mathbf{x}\_i) p(\mathbf{y}|\mathbf{x}\_i) = p(\mathbf{y}),$$

thus

$$\sum\_{i=1}^{M} \sum\_{\mathbf{y} \in \mathbb{F}\_2^n} E(p(\mathbf{y}|\mathbf{x}\_i)) = \sum\_{i=1}^{M} \sum\_{\mathbf{y} \in \mathbb{F}\_2^n} p(\mathbf{y}) = M.$$

From (2.30),

$$P^\*(n, M\_n, p) \le \varepsilon + \frac{M - 1}{2^n} |B\_\rho(0)|,$$

$$
\log\_2(P^\*(n, M\_n, p) - \varepsilon) \le \log\_2 M + \log\_2 |B\_\rho(0)| - n.
$$

That is

$$\frac{1}{n}\log\_2(P^\*(n,M\_n,p)-\varepsilon) \le \frac{1}{n}\log\_2 M + \frac{1}{n}\log\_2|B\_\rho(0)| - 1.$$

From Lemma 1.11 of Chap. 1,

$$\frac{1}{n}\log\_2 |B\_\rho(0)| = \frac{1}{n}\log\_2\sum\_{i=0}^\rho \binom{n}{i} \le H(\frac{\rho}{n}),$$

where $H(x) = -x \log x - (1 - x)\log(1 - x)$ ($0 < x < \frac{1}{2}$) is the binary entropy function, so

$$\frac{1}{n}\log\_2(P^\*(n,M\_n,p)-\varepsilon) \le \frac{1}{n}\log\_2 M + H(\frac{\rho}{n}) - 1.$$

By hypothesis $M = 2^{[\lambda n]}$, $\rho = [pn + b]$, $b = O(\sqrt{n})$, we have

$$\begin{aligned} \frac{1}{n} \log\_2(P^\*(n, M\_n, p) - \varepsilon) &\le \frac{[\lambda n]}{n} + H(\frac{\rho}{n}) - 1 \\ &= \lambda + H(\frac{\rho}{n}) - 1 + O(\frac{1}{n}). \end{aligned}$$

By Lemma 2.29,

$$\begin{aligned} H(\frac{\rho}{n}) &= -(\frac{\rho}{n}\log\frac{\rho}{n} + (1 - \frac{\rho}{n})\log(1 - \frac{\rho}{n})) \\ &= -(p\log p + q\log q + O(\frac{1}{\sqrt{n}})).\end{aligned}$$

So

$$\frac{1}{n}\log\_2(P^\*(n,M\_n,p)-\varepsilon) \le \lambda - (1+p\log p + q\log q) + O(\frac{1}{\sqrt{n}}).$$

By hypothesis λ < 1 + *p* log *p* + *q* log *q*, when *n* is sufficiently large, we have

$$\frac{1}{n}\log\_2(P^\*(n,M\_n,p)-\varepsilon) \le -\beta \quad (\beta>0).$$

Therefore, $0 \le P^\*(n, M\_n, p) \le \varepsilon + 2^{-\beta n}$; taking the limit *n* → ∞ on both sides and noting that ε > 0 is arbitrary, we finally obtain

$$\lim\_{n \to \infty} P^\*(n, M\_n, p) = 0.$$

This completes the proof of the theorem.
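
The decisive point of the proof is that the exponent λ + *H*(ρ/*n*) − 1 is eventually negative when λ lies below the capacity 1 − *H*(*p*). The Python sketch below (names and parameter values ours) computes this exponent directly from the exact ball size:

```python
from math import comb, floor, log2, sqrt

def ball_size(n, rho):
    # |B_rho(0)| = sum_{i <= rho} C(n, i)
    return sum(comb(n, i) for i in range(rho + 1))

def excess_exponent(n, lam, p, eps):
    # (1/n) * log2( M * |B_rho(0)| / 2^n ), the non-epsilon part of the bound
    b = sqrt(n * p * (1 - p) / eps)
    rho = floor(n * p + b)
    return (floor(lam * n) + log2(ball_size(n, rho)) - n) / n

p, eps = 0.05, 0.01
lam = 0.4   # below the capacity 1 - H(0.05), which is about 0.71
e1 = excess_exponent(1000, lam, p, eps)
e2 = excess_exponent(4000, lam, p, eps)
assert e1 < 0 and e2 < e1   # the bound decays: P* <= eps + 2^(n * exponent)
print(e1, e2)
```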

According to Shannon, the code rate is close to a given positive real number λ,

$$0 < \lambda < 1 + p \log p + q \log q = 1 - H(p),$$

the code with arbitrarily small error probability is called a "good code"; we further analyze the construction of this kind of "good code" (Shannon only proved the existence of "good codes" probabilistically).

**Theorem 2.11** *For given λ, $0 < \lambda < 1 + p \log p + q \log q$ ($p < \frac{1}{2}$), $M\_n = 2^{[\lambda n]}$, if there is a perfect code $C\_n$ with $|C\_n| = M\_n$, then we have*

$$\lim\_{n \to \infty} p(C\_n) = 0.$$

*Proof* If perfect code *Cn* exists, by Lemma 2.27,

$$
\lambda - \frac{1}{n} < R\_{C\_n} \le \lambda.
$$

Therefore, the code rate of $C\_n$ can be arbitrarily close to λ and the error probability of $C\_n$ can be arbitrarily small, so $C\_n$ is a "good code" in the mathematical sense. To prove Theorem 2.11, note that since $C\_n$ is a perfect code, its minimum distance $d\_n$ can be written as

$$d\_n = 2e\_n + 1, \ e\_n < \frac{n}{2}.$$

Because $\lim\_{n\to\infty} R\_{C\_n} = \lambda$, by Theorem 2.2 we have

$$\lim\_{n \to \infty} H(\frac{e\_n}{n}) = 1 - \lambda > H(p).$$

Because the binary entropy function *H*(*x*) is continuous and strictly increasing on $0 < x < \frac{1}{2}$, the limit $\lim\_{n\to\infty} \frac{e\_n}{n}$ exists, and

$$\lim\_{n \to \infty} \frac{e\_n}{n} > p, \text{ that is } \frac{e\_n}{n} > p \text{ when } n \text{ is sufficiently large.}$$

Now consider the error probability *p*(*x*) of a codeword $x = x\_1 x\_2 \cdots x\_n \in C\_n$. Since $C\_n$ is an $e\_n$-error-correcting code, whenever $x \to x'$ with $d(x, x') \le e\_n$ we can always decode correctly, and the error probability is 0. Therefore, a transmission error of *x*, that is, the case where *x* cannot be decoded correctly, occurs only when $d(x', x) = w\_n > e\_n$. At this point we have (when *n* is sufficiently large)

$$\frac{w\_n}{n} > \frac{e\_n}{n} > p + \varepsilon \quad \text{(for some } \varepsilon > 0\text{).}$$

So the error probability *p*(*x*) of $x \in C\_n$ is estimated as

$$\begin{aligned} p(\mathbf{x}) &\leq P\{\frac{w\_n}{n} > p + \varepsilon\} \\ &\leq P\{ |\frac{w\_n}{n} - p| > \varepsilon \}. \end{aligned}$$

When *n* → ∞, the random variable sequence $\{w\_n\}$ is a Bernoulli random process (i.e., for each *n* it is an *n*-fold Bernoulli trial). From Theorem 1.2 in Chap. 1, we have

$$\lim\_{n \to \infty} p(\mathbf{x}) \le \lim\_{n \to \infty} P\{ |\frac{w\_n}{n} - p| > \varepsilon \} = 0.$$

This holds for every $x \in C\_n$, so

$$\lim\_{n \to \infty} p(C\_n) = 0.$$

Theorem 2.11 holds.
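
The weak law of large numbers invoked at the end of the proof can be computed exactly for the binomial distribution; in the Python sketch below (names and parameters ours), the deviation probability visibly tends to 0 as *n* grows:

```python
from math import comb

def deviation_prob(n, p, eps):
    # P{ |w_n/n - p| > eps } for w_n ~ Binomial(n, p), computed exactly
    q = 1 - p
    return sum(comb(n, w) * p**w * q**(n - w)
               for w in range(n + 1) if abs(w / n - p) > eps)

p, eps = 0.1, 0.05
probs = [deviation_prob(n, p, eps) for n in (50, 200, 800)]
assert probs[0] > probs[1] > probs[2]   # tends to 0 (weak law of large numbers)
print(probs)
```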

From the proof of Theorems 2.10 and 2.11, it can be seen that Shannon randomly selects a code and randomly selects a codeword, which essentially regards the input information as a random event in a given probability space, and the transmission process of information is essentially a random process. The fundamental difference between Shannon and other mathematicians at the same time is that he regards information or a code as a random variable. The mathematical model of information transmission is a dynamic probability model rather than a static algebraic model. The most important method to study a code naturally is probability statistics rather than the algebraic combination method of traditional mathematics. From the perspective of probability theory, Theorems 2.10 and 2.11 regard a code as a random variable, but they have great particularity. The probability distribution of this random variable obeys Bernoulli binomial distribution, especially the statistical characteristics of code rate, which are not clearly expressed. It is the core content of Shannon's information theory to study the relationship between random variables with general probability distribution and codes. One of the most basic concepts is information entropy, or code entropy. Using the concept of code entropy, the statistical characteristics of a code are clearly displayed. Therefore, we see a basic framework and prototype of modern information theory. In the next chapter, we explain and prove these basic ideas and results of Shannon information theory in detail. One of the most important results is Shannon channel coding theorem (see Theorem 3.12 in Chap. 3). Shannon uses the probability method to prove that the so-called good code with a code rate up to the transmission capacity and an arbitrarily small error probability exists for the general memoryless channel (whether symmetrical or not). 
On the contrary, the code rate of a code with an arbitrarily small error probability must not be greater than the capacity of the channel. This channel capacity is called Shannon's limit, which has long been pursued in the field of electronic communication engineering. People want to find a channel coding scheme with error probability in a controllable range (e.g., less than ε) and transmission efficiency (i.e., code rate) reaching Shannon's limit. In today's 5G era, this engineering problem seems to have been overcome. Returning to Theorem 2.10, we see that the upper limit 1 − *H*(*p*) of the code rate is the channel capacity of the memoryless symmetric binary channel (see Example 2 in Sect. 8 of Chap. 3). From this example, we can get a glimpse of Shannon's channel coding theory.

#### **Exercise 2**

	- (*i*) Each codeword has a weight of 6.
	- (*ii*) Any two codewords have Hamming distance of 8.

Prove: |*C*| ≤ 16. Does a binary code *C* with |*C*| = 16 exist?

3. Let *C* be a binary code of length *n* and an error correcting code of one character, prove

$$|C| \le \frac{2^n}{n+2} \text{ ( $n$  is even)}.$$


$$\sum\_{\mathbf{x}\in\mathcal{C}} w(\mathbf{x}) = n(q-1)q^{k-1}.$$

where w(*x*) is the weight of the codeword *x*.


$$
\begin{pmatrix}
1 & 0 & 0 & 0 & 1 & 0 & 1 \\
0 & 1 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 1 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 1 & 0 & 1 & 1
\end{pmatrix},
$$

Please decode the received codewords as follows: *y*<sup>1</sup> = (1101011), *y*<sup>2</sup> = (0110111), *y*<sup>3</sup> = (0111000).




$$A = H - I, G = (I, A), I \text{ is the unit matrix.}$$

Prove that *G* is a generator matrix of a ternary [24, 12] code with minimum distance 9.

20. Let *C* = [4, 2] be a ternary Hamming code. *H* is the check matrix of *C*, let *I* be the unit matrix of order 4, *J* is a square matrix of order 4 with all elements of 1, define

$$G = \begin{bmatrix} J+I & I & I \\ 0 & H & -H \end{bmatrix},$$

prove that *G* generates a ternary code *C* = [12, 6] and the minimum distance is 6.

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Chapter 3 Shannon Theory**

# **3.1 Information Space**

According to Shannon, a message *x* is a random event. Let *p*(*x*) be the probability of occurrence of the event *x*. If *p*(*x*) = 0, the event does not occur; if *p*(*x*) = 1, the event must occur. When *p*(*x*) = 0 or *p*(*x*) = 1, the information *x* is called trivial information or spam information. Therefore, the real mathematical significance of information *x* lies in its uncertainty, that is, 0 < *p*(*x*) < 1. Quantitative research on the uncertainty of nontrivial information constitutes the starting point of Shannon's theory; this starting point is now called information quantity or information entropy, or entropy for short. Shannon and his colleagues at Bell Laboratories considered the "bit" as the basic quantitative unit of information. What is a "bit"? We can simply understand it as a digit in the binary system. According to Shannon, the binary system with *n* digits can express up to $2^n$ numbers. From the point of view of probability and statistics, the probability of occurrence of each of these $2^n$ numbers is $\frac{1}{2^n}$. Therefore, a bit is the amount of information contained in an event *x* with probability $\frac{1}{2}$. Taking this as the starting point, Shannon defined the self-information *I*(*x*) contained in an information *x* as

$$I(\mathbf{x}) = -\log\_2 p(\mathbf{x}).\tag{3.1}$$

Therefore, one piece of information *x* contains *I*(*x*) bits of information; when $p(x) = \frac{1}{2}$, *I*(*x*) = 1. Equation (3.1) was Shannon's first extraordinary step in the quantification of information. On the other hand, with the emergence of the telegraph and telephone, binary became widely used in the conversion and transmission of information. Therefore, we can assert that without binary there would be no Shannon theory, let alone today's informatics and information age. The purpose of this section is to deduce and simplify, in a mathematically rigorous way, the most basic and important conclusions of Shannon's theory. First, we start with the rationality of the definition in formula (3.1).
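
Formula (3.1) is easy to experiment with; the following Python sketch (function name ours) computes self-information in bits and checks additivity for independent events:

```python
from math import log2

def self_information(p):
    # I(x) = -log2 p(x), in bits (formula 3.1); requires 0 < p <= 1
    assert 0 < p <= 1
    return -log2(p)

assert self_information(1 / 2) == 1.0   # one bit
assert self_information(1 / 4) == 2.0
# additivity for independent events: I(xy) = I(x) + I(y)
assert self_information((1 / 2) * (1 / 4)) == self_information(1 / 2) + self_information(1 / 4)
print("self-information checks passed")
```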

If *I*(*x*) is used to represent the self-information of a random event *x*, then the greater the probability of occurrence *p*(*x*), the smaller the uncertainty. Therefore, *I*(*x*) should be a monotonically decreasing function of the probability *p*(*x*). If *x y* is a joint event and *x*, *y* are statistically independent, that is, *p*(*x y*) = *p*(*x*)*p*(*y*), then the self-information is additive: *I*(*x y*) = *I*(*x*) + *I*(*y*). Of course, the self-information *I*(*x*) is nonnegative, that is, *I*(*x*) ≥ 0. Shannon proved that the self-information *I*(*x*) satisfying the above three assumptions must be

$$I(\boldsymbol{x}) = -c \log p(\boldsymbol{x}),$$

where *c* is a constant. This conclusion can be derived directly from the following mathematical theorems.

**Lemma 3.1** *If the real function f* (*x*) *satisfies the following conditions in interval* [1, +∞)*:*

*(i) f* (*x*) ≥ 0, *(ii) x* < *y* ⇒ *f* (*x*) < *f* (*y*), *(iii) f* (*x y*) = *f* (*x*) + *f* (*y*).

*Then f* (*x*) = *c* log *x, where c is a constant.*

*Proof* Using condition (*iii*) repeatedly, we get

$$f(\mathbf{x}^k) = kf(\mathbf{x}), \ k \ge 1$$

for any positive integer *k*. Taking *x* = 1 gives *f*(1) = *k f*(1) for all *k*, hence *f*(1) = 0. It can be seen from (*ii*) that *f*(*x*) > 0 when *x* > 1. Let *x* > 1, *y* > 1 and *k* ≥ 1 be given; we can always find a nonnegative integer *n* satisfying

$$
\mathbf{y}^{n} \le \mathbf{x}^{k} < \mathbf{y}^{n+1},
$$

Take logarithms on both sides to get

$$
\frac{n}{k} \le \frac{\log x}{\log y} < \frac{n+1}{k},
$$

On the other hand, we have

$$nf(\mathbf{y}) \le kf(\mathbf{x}) < (n+1)f(\mathbf{y}),$$

thus

$$|\frac{f(x)}{f(y)} - \frac{\log x}{\log y}| \le \frac{1}{k},$$

when *k* → ∞, we have

$$\frac{f(\mathbf{x})}{f(\mathbf{y})} = \frac{\log \mathbf{x}}{\log \mathbf{y}}, \text{ } \forall \; \mathbf{x}, \mathbf{y} \in (1, +\infty).$$

Therefore,


$$\frac{f(\mathbf{x})}{\log \mathbf{x}} = \frac{f(\mathbf{y})}{\log \mathbf{y}} = c, \text{ } \forall \text{ } \mathbf{x}, \mathbf{y} \in (1, +\infty).$$

That is *f* (*x*) = *c* log *x*. The Lemma holds.

In Lemma 3.1, let $I(x) = f(\frac{1}{p(x)})$; then *f* satisfies the conditions (*i*), (*ii*) and (*iii*), thus *I*(*x*) = −*c* log *p*(*x*), that is, (3.1) holds.

In order to introduce the definition of information space, we use *X* to represent a finite set of original information, or a countable and additive information set, which is called source state set. It can be an alphabet, a finite number of symbols or a set of numbers. For example, 26 letters in English and 2-element finite field F<sup>2</sup> are commonly used source state sets. Elements in *X* can be called messages, events, etc., or characters. We often use English capital letters such as *X*, *Y*, *Z* to represent a source state set, and lowercase Greek letters ξ, η, . . . to represent a random variable in a given probability space.

**Definition 3.1** The value space of a random variable ξ is a source state set *X*; the probability distribution of characters on *X* as events is defined as

$$p(\mathbf{x}) = P\{\xi = \mathbf{x}\}, \ \forall \mathbf{x} \in X. \tag{3.2}$$

We call (*X*,ξ) an information space in a given probability space, when the random variable ξ is clear, we usually record the information space (*X*,ξ) as *X*. If η is another random variable valued on *X*, and ξ and η obey the same probability distribution, that is

$$P\{\xi = \mathbf{x}\} = P\{\eta = \mathbf{x}\}, \ \forall \ \mathbf{x} \in X.$$

then we call the two information spaces equal, (*X*,ξ) = (*X*, η), usually still recorded as *X*.

As can be seen from Definition 3.1, an information space *X* constitutes a finite complete event group, that is, we have

$$\sum\_{\mathbf{x}\in X} p(\mathbf{x}) = 1,\ 0 \le p(\mathbf{x}) \le 1,\ \mathbf{x} \in X. \tag{3.3}$$

It should be noted that if there are two random variables ξ and η with values on *X*, when the probability distributions obeyed by ξ and η are not equal, then (*X*,ξ) and (*X*, η) are two different information spaces; at this point, we must distinguish the two different information spaces with *X*<sup>1</sup> = (*X*,ξ) and *X*<sup>2</sup> = (*X*, η).

**Definition 3.2** *X* and *Y* are two source state sets, and the random variables ξ and η are taken on *X* and *Y* , respectively; if ξ and η are compatible random variables, the probability distribution of joint event *x y*(*x* ∈ *X*, *y* ∈ *Y* ) is defined as

$$p(\text{xy}) = P\{\xi = \text{x}, \eta = \text{y}\}, \text{ } \forall \text{x} \in X, \text{ } \text{y} \in Y. \tag{3.4}$$

Then, we call the joint event set

$$XY = \{ \mathbf{x} \mathbf{y} | \mathbf{x} \in X, \mathbf{y} \in Y \}$$

together with the corresponding random variables ξ and η, the product space of the information spaces (*X*,ξ) and (*Y*, η), denoted (*XY*, ξ, η); when ξ and η are clear, it is abbreviated as *XY* = (*XY*, ξ, η). If *X* = *Y* are the same source state set and ξ, η have the same probability distribution, then the product space *XY* is denoted $X^2$ and called a power space.

Since an information space is a complete event group, for the product information space we have the following full probability formula and probability product formula:

$$\begin{cases} \sum\_{\mathbf{x} \in X} p(\mathbf{y} \mathbf{x}) = p(\mathbf{y}), \ \forall \ \mathbf{y} \in Y \\\\ \sum\_{\mathbf{y} \in Y} p(\mathbf{x} \mathbf{y}) = p(\mathbf{x}), \ \forall \ \mathbf{x} \in X. \end{cases} \tag{3.5}$$

And

*p*(*x*)*p*(*y*|*x*) = *p*(*x y*), ∀ *x* ∈ *X*, *y* ∈ *Y*.

where *p*(*y*|*x*) is the conditional probability of *y* given *x*.
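
As a concrete check of formula (3.5) and the product formula, the following Python sketch verifies both on a small hypothetical joint distribution (the numbers are ours, chosen only for illustration):

```python
# a hypothetical joint distribution p(xy) on X = Y = {0, 1}
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

# marginals via the full probability formula (3.5)
p_x = {x: sum(v for (a, _), v in joint.items() if a == x) for x in (0, 1)}
p_y = {y: sum(v for (_, b), v in joint.items() if b == y) for y in (0, 1)}
assert abs(sum(joint.values()) - 1) < 1e-12   # a complete event group
assert abs(p_x[0] - 0.5) < 1e-12 and abs(p_y[0] - 0.6) < 1e-12

# probability product formula: p(x) * p(y|x) = p(xy)
for (x, y), pxy in joint.items():
    p_cond = pxy / p_x[x]            # conditional probability p(y|x)
    assert abs(p_x[x] * p_cond - pxy) < 1e-12
print("formulas (3.5) and the product formula verified")
```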

**Definition 3.3** Let *X*1, *X*2,..., *Xn*(*n* ≥ 2) be *n* source state sets, ξ1, ξ2,...,ξ*<sup>n</sup>* be *n* compatible random variables with values, respectively, in *Xi* , the probability distribution of joint event *x*1*x*<sup>2</sup> ··· *xn* is

$$P(\mathbf{x}\_1 \mathbf{x}\_2 \cdots \mathbf{x}\_n) = P\{\xi\_1 = \mathbf{x}\_1, \xi\_2 = \mathbf{x}\_2, \dots, \xi\_n = \mathbf{x}\_n\}.\tag{3.6}$$

Then the set

$$X\_1 X\_2 \cdots X\_n = \{ \mathbf{x}\_1 \mathbf{x}\_2 \cdots \mathbf{x}\_n | \mathbf{x}\_i \in X\_i, \ 1 \le i \le n \}$$

is called the product of the *n* information spaces; in particular, when $X\_1 = X\_2 = \cdots = X\_n = X$ and each $\xi\_i$ has the same probability distribution on *X*, we define $X^n = X\_1 X\_2 \cdots X\_n$, called the *n*-th power space of the information space *X*.

Let us give some classic examples of information space.

*Example 3.1* (*Two-point information space with parameter* λ) Let $X = \{0, 1\} = \mathbb{F}\_2$ be the binary finite field, and let the random variable ξ on *X* obey the two-point distribution with parameter λ, that is

$$\begin{cases} p(0) = P\{\xi = 0\} = \lambda, \\ p(1) = P\{\xi = 1\} = 1 - \lambda. \end{cases}$$

where 0 < λ < 1; then (*X*, ξ) is called a two-point information space with parameter λ, still denoted *X*.

*Example 3.2* (*Equal probability information space*) Let *X* = {*x*1, *x*2,..., *xn*} be a source state set, and let the random variable ξ on *X* obey the equal probability (uniform) distribution, that is

$$p(\mathbf{x}) = P\{\xi = \mathbf{x}\} = \frac{1}{|X|}, \ \forall \ x \in X.$$

Then (*X*, ξ) is called an equal probability information space, still denoted *X*.

*Example 3.3* (*Bernoulli information space*) Let *X*<sub>0</sub> = {0, 1} = F<sub>2</sub>, and let the random variable ξ*<sup>i</sup>* be the *i*-th Bernoulli trial; then {ξ*i*}<sub>*i*=1</sub><sup>*n*</sup> is a set of independent and identically distributed random variables. We form the product space

$$X = (X\_0, \xi\_1)(X\_0, \xi\_2) \cdots (X\_0, \xi\_n) = X\_0^n \subset \mathbb{F}\_2^n,$$

the power space *X* is called a Bernoulli information space, also called a memoryless binary information space. The probability function *p*(*x*) on *X* is

$$p(\mathbf{x}) = p(\mathbf{x}\_1 \mathbf{x}\_2 \cdots \mathbf{x}\_n) = \prod\_{i=1}^n p(\mathbf{x}\_i), \ x\_i = 0 \text{ or } 1. \tag{3.7}$$

where *p*(0) = λ, *p*(1) = 1 − λ.
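Formula (3.7) is straightforward to evaluate directly; a minimal sketch (the word and the value of λ below are arbitrary illustrative choices):

```python
from math import prod

def bernoulli_prob(word, lam):
    """p(x) for the memoryless binary source of Example 3.3:
    p(0) = lam, p(1) = 1 - lam, and p(x1 ... xn) = prod p(xi) by (3.7)."""
    return prod(lam if xi == 0 else 1 - lam for xi in word)

# e.g. with lam = 0.25, the word 001 has probability 0.25 * 0.25 * 0.75
p = bernoulli_prob([0, 0, 1], 0.25)
assert abs(p - 0.046875) < 1e-12
```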

*Example 3.4* (*Degenerate information space*) If *X* = {*x*} contains only one character, *X* is called a degenerate information space, or trivial information space. The random variable ξ takes the value *x* with probability 1, that is, *P*{ξ = *x*} = 1; thus ξ has a degenerate distribution in the sense of probability theory.

**Definition 3.4** Let *X* = {*x*1, *x*2,..., *xn*} be a source state set; if *X* is an information space, the information entropy *H*(*X*) of *X* is defined as

$$H(X) = -\sum\_{\mathbf{x} \in X} p(\mathbf{x}) \log p(\mathbf{x}) = -\sum\_{i=1}^{n} p(\mathbf{x}\_i) \log p(\mathbf{x}\_i),\tag{3.8}$$

if *p*(*xi*) = 0 in the above formula, we agree that *p*(*xi*) log *p*(*xi*) = 0. The base of the logarithm can be chosen arbitrarily; if the base is *D* (*D* ≥ 2), then *H*(*X*) is called the *D*-ary entropy, sometimes denoted *HD*(*X*).

**Theorem 3.1** *For any information space X, always have*

$$0 \le H(X) \le \log |X|. \tag{3.9}$$

*Moreover, H*(*X*) = 0 *if and only if X is a degenerate information space, and H*(*X*) = log |*X*| *if and only if X is an equal probability information space.*

*Proof H*(*X*) ≥ 0 is trivial. We only prove the inequality on the right of Eq. (3.9). Because *f*(*x*) = log *x* is a strictly convex real-valued function, by Lemma 1.7 in Chap. 1, taking the positive function *g*(*x*) = 1/*p*(*x*) with *p*(*x*) > 0, and writing *X* = {*x*1, *x*2,..., *xm*}, we have

$$H(X) = \sum\_{i=1}^{m} p(\mathbf{x}\_i) \log \frac{1}{p(\mathbf{x}\_i)} \le \log \sum\_{i=1}^{m} \frac{p(\mathbf{x}\_i)}{p(\mathbf{x}\_i)} = \log m.$$

The equal sign above holds if and only if *p*(*x*1) = *p*(*x*2) = ··· = *p*(*xm*) = 1/*m*, that is, *X* is an equal probability information space. If *X* = {*x*} is a degenerate information space, then *p*(*x*) = 1, so *H*(*X*) = 0. Conversely, suppose *H*(*X*) = 0 with *X* = {*x*1, *x*2,..., *xm*}; if there were some *xi* ∈ *X* with 0 < *p*(*xi*) < 1, then

$$0 < p(\mathbf{x}\_i) \log \frac{1}{p(\mathbf{x}\_i)} \le H(X).$$

So some *p*(*xi*) = 1 while *p*(*xj*) = 0 for *j* ≠ *i*; at this time, *X* degenerates into *X* = {*xi*}, a trivial information space. The Theorem holds.
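Definition 3.4 and the bounds of Theorem 3.1 can be illustrated with a small Python sketch (the distributions below are toy examples):

```python
import math

def entropy(dist, D=2):
    """D-ary information entropy H_D(X) of Definition 3.4; `dist` maps each
    source state to its probability, and p(x) = 0 terms contribute 0."""
    return -sum(p * math.log(p, D) for p in dist.values() if p > 0)

# Bounds of Theorem 3.1: 0 <= H(X) <= log|X|.
uniform = {x: 1 / 4 for x in range(4)}       # equal probability space
assert abs(entropy(uniform) - math.log2(4)) < 1e-12
degenerate = {'x': 1.0}                      # degenerate space
assert entropy(degenerate) == 0.0
```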

An information space is a dynamic code (it changes with the random variable on it). For a "dynamic code", that is, for the code rate of an information space *X*, Shannon replaced (1/*n*)*H*(*X*) by the information entropy, so information entropy *H*(*X*) became the first mathematical quantity to describe a dynamic code. By Theorem 3.1, the minimum rate 0 of a dynamic code occurs when the code is degenerate, and the maximum rate, that of the usual static code, occurs when the code is equal probability.

Next, we discuss the information entropy of several typical information spaces.

*Example 3.5* (i) Let *X* be the two-point information space of parameter λ, then

$$H(X) = -\lambda \log \lambda - (1 - \lambda) \log(1 - \lambda) = H(\lambda).$$

We defined *H*(λ) in Chap. 1, where it was called the binary information entropy function; now we know why it is called an entropy function.

(ii) If *X* = {*x*} is a degenerate information space, then *H*(*X*) = 0.

(iii) When *X* is an equal probability information space, then *H*(*X*) = log |*X*|.

*Remark* Most authors directly regard a random variable as an information space. Mathematically, it is convenient to do so and call it the information measurement of random variables. However, from the perspective of information, using the concept of information space can better understand and simplify Shannon's theory; the core idea of this theory is the random measurement of information, not the information measurement of random variables.

# **3.2 Joint Entropy, Conditional Entropy, Mutual Information**

**Definition 3.5** Let *X*, *Y* be two information spaces with corresponding random variables ξ, η, respectively. If ξ and η are independent random variables, that is

$$P\{\xi = \mathbf{x}, \eta = \mathbf{y}\} = P\{\xi = \mathbf{x}\} \cdot P\{\eta = \mathbf{y}\}, \text{ } \forall \mathbf{x} \in X, \mathbf{y} \in Y.$$

then *X* and *Y* are called independent information spaces, and the probability distribution of joint events is

$$p(\mathbf{x}\mathbf{y}) = p(\mathbf{x})p(\mathbf{y}), \ \forall \mathbf{x} \in X, \mathbf{y} \in Y.$$

**Definition 3.6** Let *X*, *Y* be two information spaces, the information entropy *H*(*XY* ) of the product space *XY* is called the joint entropy of *X* and *Y* , that is

$$H(XY) = -\sum\_{\mathbf{x}\in X} \sum\_{\mathbf{y}\in Y} p(\mathbf{x}\mathbf{y}) \log p(\mathbf{x}\mathbf{y}).\tag{3.10}$$

The conditional entropy *H*(*X*|*Y* ) of *X* given *Y* is defined as

$$H(X|Y) = -\sum\_{\mathbf{x}\in X} \sum\_{\mathbf{y}\in Y} p(\mathbf{x}\mathbf{y}) \log p(\mathbf{x}|\mathbf{y}).\tag{3.11}$$

**Lemma 3.2** (Addition formula of entropy) *For any two information spaces X and Y , we have*

$$H(XY) = H(X) + H(Y|X) = H(Y) + H(X|Y).$$

*Generally, for n information spaces X*1, *X*2,..., *Xn, we have*

$$H(X\_1 X\_2 \cdots X\_n) = \sum\_{i=1}^n H(X\_i | X\_{i-1} X\_{i-2} \cdots X\_1). \tag{3.12}$$

*Proof* By (3.10) and probability multiplication formula,

$$\begin{aligned} H(XY) &= -\sum\_{\mathbf{x}\in X} \sum\_{\mathbf{y}\in Y} p(\mathbf{x}\mathbf{y}) \log p(\mathbf{x}\mathbf{y}) \\ &= -\sum\_{\mathbf{x}\in X} \sum\_{\mathbf{y}\in Y} p(\mathbf{x}\mathbf{y}) (\log p(\mathbf{x}) + \log p(\mathbf{y}|\mathbf{x})) \\ &= -\sum\_{\mathbf{x}\in X} p(\mathbf{x}) \log p(\mathbf{x}) + H(Y|X) \\ &= H(X) + H(Y|X). \end{aligned}$$

Similarly, it can be proved that

$$H(XY) = H(Y) + H(X|Y).$$

We prove (3.12) by induction, when *n* = 2,

$$H(X\_1 X\_2) = H(X\_1) + H(X\_2 | X\_1).$$

The proposition is true, and for general *n*, we have

$$\begin{aligned} H(X\_1 X\_2 \cdots X\_n) &= H(X\_1 X\_2 \cdots X\_{n-1}) + H(X\_n | X\_1 X\_2 \cdots X\_{n-1}) \\ &= \sum\_{i=1}^{n-1} H(X\_i | X\_{i-1} X\_{i-2} \cdots X\_1) + H(X\_n | X\_1 X\_2 \cdots X\_{n-1}) \\ &= \sum\_{i=1}^n H(X\_i | X\_{i-1} X\_{i-2} \cdots X\_1). \end{aligned}$$

The Lemma 3.2 holds.
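The addition formula of Lemma 3.2 can be verified numerically; in this sketch the joint distribution *p*(*xy*) is a made-up example:

```python
import math

def H(dist):
    # Shannon entropy (base 2) of a distribution {outcome: probability}.
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# A made-up joint distribution p(xy) for illustration.
joint = {('0', 'a'): 0.125, ('0', 'b'): 0.375,
         ('1', 'a'): 0.25,  ('1', 'b'): 0.25}

p_x = {}
for (x, y), p in joint.items():
    p_x[x] = p_x.get(x, 0.0) + p

# Conditional entropy H(Y|X) = -sum p(xy) log p(y|x), as in (3.11).
H_Y_given_X = -sum(p * math.log2(p / p_x[x]) for (x, y), p in joint.items())

# Addition formula: H(XY) = H(X) + H(Y|X).
assert abs(H(joint) - (H(p_x) + H_Y_given_X)) < 1e-12
```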

**Theorem 3.2** *We have*

$$H(XY) \le H(X) + H(Y). \tag{3.13}$$

*Equality holds if and only if X and Y are statistically independent information spaces, i.e.,*

$$H(XY) = H(X) + H(Y). \tag{3.14}$$

*Generally, we have*

$$H(X\_1 X\_2 \cdots X\_n) \le H(X\_1) + H(X\_2) + \cdots + H(X\_n). \tag{3.15}$$

*Equality holds if and only if X*1, *X*2,..., *Xn form an independent random process, i.e.,*

$$H(X\_1 X\_2 \cdots X\_n) = H(X\_1) + H(X\_2) + \cdots + H(X\_n). \tag{3.16}$$

*Proof* By definition and Jensen inequality, we have

$$\begin{aligned} H(XY) - H(X) - H(Y) &= \sum\_{x \in X} \sum\_{\mathbf{y} \in Y} p(\mathbf{xy}) \log \frac{p(\mathbf{x}) p(\mathbf{y})}{p(\mathbf{xy})} \\ &\le \log \sum\_{\mathbf{x} \in X} \sum\_{\mathbf{y} \in Y} p(\mathbf{x}) p(\mathbf{y}) \\ &= 0. \end{aligned}$$

The equal sign above holds if and only if *p*(*x*)*p*(*y*)/*p*(*x y*) = *c* (a constant) for all *x* ∈ *X*, *y* ∈ *Y* , that is, *p*(*x*)*p*(*y*) = *cp*(*x y*). Summing both sides over all *x* and *y*, we have

$$1 = \sum\_{\mathbf{x} \in X} p(\mathbf{x}) \sum\_{\mathbf{y} \in Y} p(\mathbf{y}) = c \sum\_{\mathbf{x} \in X} \sum\_{\mathbf{y} \in Y} p(\mathbf{x} \mathbf{y}),$$

thus *c* = 1, *p*(*x y*) = *p*(*x*)*p*(*y*). So if and only if *X* and *Y* are independent information spaces, (3.14) holds. By induction, we have (3.15) and (3.16). Theorem 3.2 holds.

By (3.15), we have the following direct corollary; for any information space *X* and *n* ≥ 1, we have

$$H(X^n) \le nH(X). \tag{3.17}$$

**Definition 3.7** Let *X* and *Y* be two information spaces. We say that *X* is completely determined by *Y* if for any given *x* ∈ *X* there is a subset *Nx* ⊂ *Y* satisfying

$$\begin{cases} p(\mathbf{x}|\mathbf{y}) = 1, & \text{if } \mathbf{y} \in N\_{\mathbf{x}}; \\ p(\mathbf{x}|\mathbf{y}) = 0, & \text{if } \mathbf{y} \notin N\_{\mathbf{x}}. \end{cases} \tag{3.18}$$

With regard to conditional information entropy *H*(*X*|*Y* ), we have the following two important special cases.

**Lemma 3.3** *(i)* 0 ≤ *H*(*X*|*Y* ) ≤ *H*(*X*)*. (ii) If the information space X is completely determined by Y , then*

$$H(X|Y) = 0.\tag{3.19}$$

*(iii) If X and Y are two independent information spaces, then*

$$H(X|Y) = H(X). \tag{3.20}$$

*Proof* (i) is trivial. Let us prove (3.19) first. By Definition 3.7 and (3.18), for given *x* ∈ *X*, we have

$$p(\mathbf{x}\mathbf{y}) = p(\mathbf{y})p(\mathbf{x}|\mathbf{y}) = 0, \ \mathbf{y} \notin N\_{\mathbf{x}}.$$

Thus

$$\begin{aligned} H(X|Y) &= -\sum\_{\mathbf{x}\in X} \sum\_{\mathbf{y}\in Y} p(\mathbf{x}\mathbf{y}) \log p(\mathbf{x}|\mathbf{y}) \\ &= -\sum\_{\mathbf{x}\in X} \sum\_{\mathbf{y}\in N\_{\mathbf{x}}} p(\mathbf{x}\mathbf{y}) \log p(\mathbf{x}|\mathbf{y}) = 0. \end{aligned}$$

The proof of the formula (3.20) is obvious. Because *X* and *Y* are independent, the conditional probability

$$p(\mathbf{x}|\mathbf{y}) = p(\mathbf{x}), \ \forall \mathbf{x} \in X, \mathbf{y} \in Y.$$

Thus

$$\begin{aligned} H(X|Y) &= -\sum\_{\mathbf{x}\in X} \sum\_{\mathbf{y}\in Y} p(\mathbf{x}) p(\mathbf{y}) \log p(\mathbf{x}) \\ &= -\sum\_{\mathbf{x}\in X} p(\mathbf{x}) \log p(\mathbf{x}) = H(X). \end{aligned}$$

The Lemma 3.3 holds.

Next, we define the mutual information *I*(*X*, *Y* ) of two information spaces *X* and *Y* .

**Definition 3.8** Let *X* and *Y* be two information spaces, and then their mutual information *I*(*X*, *Y* ) is defined as

$$I(X,Y) = \sum\_{\mathbf{x} \in X} \sum\_{\mathbf{y} \in Y} p(\mathbf{x}\mathbf{y}) \log \frac{p(\mathbf{x}|\mathbf{y})}{p(\mathbf{x})}.\tag{3.21}$$

From the multiplication formula of probability, for all *x* ∈ *X*, *y* ∈ *Y* ,

$$p(\mathbf{x})p(\mathbf{y}|\mathbf{x}) = p(\mathbf{y})p(\mathbf{x}|\mathbf{y}) = p(\mathbf{x}\mathbf{y}).$$

We have

$$\frac{p(\mathbf{x}|\mathbf{y})}{p(\mathbf{x})} = \frac{p(\mathbf{y}|\mathbf{x})}{p(\mathbf{y})}.$$

Therefore, there is a direct conclusion from the definition of mutual information *I*(*X*, *Y* )

$$I(X,Y) = I(Y,X).$$

**Lemma 3.4**

$$I(X,Y) = H(X) - H(X|Y) = H(Y) - H(Y|X).$$

*Proof* By definition,

$$\begin{aligned} I(X,Y) &= \sum\_{\mathbf{x}\in X} \sum\_{\mathbf{y}\in Y} p(\mathbf{x}\mathbf{y}) \log \frac{p(\mathbf{x}|\mathbf{y})}{p(\mathbf{x})} \\ &= \sum\_{\mathbf{x}\in X} \sum\_{\mathbf{y}\in Y} p(\mathbf{x}\mathbf{y}) \log p(\mathbf{x}|\mathbf{y}) - \sum\_{\mathbf{x}\in X} \sum\_{\mathbf{y}\in Y} p(\mathbf{x}\mathbf{y}) \log p(\mathbf{x}) \\ &= -H(X|Y) - \sum\_{\mathbf{x}\in X} p(\mathbf{x}) \log p(\mathbf{x}) \\ &= H(X) - H(X|Y). \end{aligned}$$

Similarly, it can be proved that

$$I(X,Y) = H(Y) - H(Y|X).$$

**Lemma 3.5** *Assuming that X and Y are two information spaces, I*(*X*, *Y* ) *is the amount of mutual information, then*

$$H(XY) = H(X) + H(Y) - I(X, Y). \tag{3.22}$$

*Further, we have I*(*X*, *Y* ) ≥ 0*, with I*(*X*, *Y* ) = 0 *if and only if X and Y are independent.*

*Proof* By the addition formula of Lemma 3.2,

$$\begin{aligned} H(XY) &= H(X) + H(Y|X) \\ &= H(X) + H(Y) - (H(Y) - H(Y|X)) \\ &= H(X) + H(Y) - I(X,Y). \end{aligned}$$

The conclusion about *I*(*X*, *Y* ) ≥ 0 can be deduced directly from Theorem 3.2.
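Lemma 3.5 and the nonnegativity of *I*(*X*, *Y*) can be checked on a toy joint distribution (all numbers below are illustrative):

```python
import math

def H(dist):
    # Shannon entropy (base 2) of a distribution {outcome: probability}.
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# A made-up joint distribution p(xy).
joint = {('0', 'a'): 0.3, ('0', 'b'): 0.1,
         ('1', 'a'): 0.2, ('1', 'b'): 0.4}
p_x, p_y = {}, {}
for (x, y), p in joint.items():
    p_x[x] = p_x.get(x, 0.0) + p
    p_y[y] = p_y.get(y, 0.0) + p

# I(X, Y) via (3.22): H(X) + H(Y) - H(XY); nonnegative by Theorem 3.2.
I = H(p_x) + H(p_y) - H(joint)
assert I >= 0

# For an independent joint distribution p(xy) = p(x)p(y), I vanishes.
indep = {(x, y): p_x[x] * p_y[y] for x in p_x for y in p_y}
I0 = H(p_x) + H(p_y) - H(indep)
assert abs(I0) < 1e-12
```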

Let us prove an equation about entropy commonly used in the statistical analysis of cryptography.

**Theorem 3.3** *If X*, *Y*, *Z are three information spaces, then*

$$\begin{split}H(XY|Z) &= H(X|Z) + H(Y|XZ) \\ &= H(Y|Z) + H(X|YZ). \end{split} \tag{3.23}$$

*Proof* By the definition, we have

$$H(XY|Z) = -\sum\_{x \in X} \sum\_{\mathbf{y} \in Y} \sum\_{z \in Z} p(\mathbf{x} \mathbf{y} z) \log p(\mathbf{x} \mathbf{y} | z).$$

By probability product formula,

$$p(\mathbf{x}yz) = p(z)p(\mathbf{x}y|z) = p(\mathbf{x}z)p(\mathbf{y}|\mathbf{x}z).$$

Thus

$$p(\mathbf{x}\mathbf{y}|z) = \frac{p(\mathbf{x}z)p(\mathbf{y}|\mathbf{x}z)}{p(z)} = p(\mathbf{x}|z)p(\mathbf{y}|\mathbf{x}z).$$

So we have

$$\begin{split} H(XY|Z) &= -\sum\_{x \in X} \sum\_{\mathbf{y} \in Y} \sum\_{z \in Z} p(\mathbf{x} \mathbf{y}z) \log p(\mathbf{x}|z) p(\mathbf{y}|\mathbf{x}z) \\ &= -\sum\_{x \in X} \sum\_{\mathbf{y} \in Y} \sum\_{z \in Z} p(\mathbf{x}\mathbf{y}z) (\log p(\mathbf{x}|z) + \log p(\mathbf{y}|\mathbf{x}z)) \\ &= -\sum\_{x \in X} \sum\_{z \in Z} p(\mathbf{x}z) \log p(\mathbf{x}|z) - \sum\_{x \in X} \sum\_{\mathbf{y} \in Y} \sum\_{z \in Z} p(\mathbf{x}\mathbf{y}z) \log p(\mathbf{y}|\mathbf{x}z) \\ &= H(X|Z) + H(Y|XZ). \end{split}$$

Similarly, the second formula can be proved.

Finally, we extend the formula (3.15) to conditional entropy.

**Lemma 3.6** *Let X*1, *X*2,..., *Xn*, *Y be information spaces, then we have*

$$H(X\_1 X\_2 \cdots X\_n | Y) \le H(X\_1 | Y) + \cdots + H(X\_n | Y). \tag{3.24}$$

*Specially, when X*<sup>1</sup> = *X*<sup>2</sup> =···= *Xn* = *X,*

$$H(X^n|Y) \le nH(X|Y). \tag{3.25}$$

*Proof* We use induction on *n*. The proposition is trivial when *n* = 1. Suppose the proposition is true for *n*, i.e.,

$$H(X\_1 X\_2 \cdots X\_n | Y) \le H(X\_1 | Y) + \cdots + H(X\_n | Y).$$

Then for *n* + 1, we let *X* = *X*1*X*<sup>2</sup> ··· *Xn*, so that

$$H(X\_1 X\_2 \cdots X\_{n+1} | Y) = H(X X\_{n+1} | Y)$$

$$= -\sum\_{x \in X} \sum\_{z \in X\_{n+1}} \sum\_{\mathbf{y} \in Y} p(\mathbf{x} z \mathbf{y}) \log p(\mathbf{x} z | \mathbf{y}).$$

From the full probability formula,

$$H(X|Y) + H(X\_{n+1}|Y) = -\sum\_{\mathbf{x}\in X} \sum\_{\mathbf{z}\in X\_{n+1}} \sum\_{\mathbf{y}\in Y} p(\mathbf{x}\mathbf{z}\mathbf{y}) \log p(\mathbf{x}|\mathbf{y}) p(\mathbf{z}|\mathbf{y}) .$$

So by Jensen inequality,

$$\begin{aligned} &H(XX\_{n+1}|Y) - H(X|Y) - H(X\_{n+1}|Y) \\ &= \sum\_{\mathbf{x}\in X} \sum\_{\mathbf{z}\in X\_{n+1}} \sum\_{\mathbf{y}\in Y} p(\mathbf{x}\mathbf{z}\mathbf{y}) \log \frac{p(\mathbf{x}|\mathbf{y})p(\mathbf{z}|\mathbf{y})}{p(\mathbf{x}\mathbf{z}|\mathbf{y})} \\ &\leq \log \sum\_{\mathbf{x}\in X} \sum\_{\mathbf{z}\in X\_{n+1}} \sum\_{\mathbf{y}\in Y} p(\mathbf{y})p(\mathbf{x}|\mathbf{y})p(\mathbf{z}|\mathbf{y}). \end{aligned}$$

By product formula

$$\begin{aligned} &\sum\_{\mathbf{x}\in X} \sum\_{\mathbf{z}\in X\_{n+1}} \sum\_{\mathbf{y}\in Y} p(\mathbf{y}) p(\mathbf{x}|\mathbf{y}) p(\mathbf{z}|\mathbf{y}) \\ &= \sum\_{\mathbf{x}\in X} \sum\_{\mathbf{y}\in Y} p(\mathbf{x}|\mathbf{y}) p(\mathbf{y}) \\ &= \sum\_{\mathbf{x}\in X} p(\mathbf{x}) = 1. \end{aligned}$$

So by the inductive hypothesis,

$$\begin{aligned} H(XX\_{n+1}|Y) &\le H(X\_{n+1}|Y) + H(X|Y) \\ &\le H(X\_1|Y) + H(X\_2|Y) + \dots + H(X\_{n+1}|Y) .\end{aligned}$$

The proposition holds for *n* + 1. So the Lemma holds.

# **3.3 Redundancy**

Select an alphabet F*<sup>q</sup>* or a residue class ring Z*<sup>m</sup>* modulo *m*; each element of the alphabet is called a character. In the field of communication, an alphabet is also called a source state set, and a character is also called a transmission signal. If the length of a *q*-ary code is increased, redundant transmission signals or characters appear in each codeword. The numerical measurement of these "redundant characters" is called redundancy; adding redundancy is a technical means to improve the accuracy of codeword transmission, and redundancy is the mathematical quantity describing this technique. We start by proving the following lemma.

**Lemma 3.7** *Let X*, *Y*, *Z be three information spaces, then*

$$H(X|YZ) \le H(X|Z). \tag{3.26}$$

*Proof* By total probability formula,

$$\begin{aligned} H(X|Z) &= -\sum\_{\mathbf{x}\in X} \sum\_{z\in Z} p(\mathbf{x}z) \log p(\mathbf{x}|z) \\ &= -\sum\_{\mathbf{x}\in X} \sum\_{z\in Z} \sum\_{\mathbf{y}\in Y} p(\mathbf{x}\mathbf{y}z) \log p(\mathbf{x}|z) .\end{aligned}$$

So

$$\begin{split} &H(X|YZ) - H(X|Z) \\ &= \sum\_{x \in X} \sum\_{\mathbf{y} \in Y} \sum\_{z \in Z} p(xyz) \log \frac{p(x|z)}{p(x|zy)} \\ &\leq \log \sum\_{x \in X} \sum\_{\mathbf{y} \in Y} \sum\_{z \in Z} \frac{p(xyz)p(x|z)}{p(x|zy)} \\ &= \log \sum\_{x \in X} \sum\_{\mathbf{y} \in Y} \sum\_{z \in Z} p(\mathbf{y}z)p(x|z) \\ &= \log \sum\_{x \in X} \sum\_{z \in Z} p(z)p(x|z) \\ &= 0. \end{split}$$

Thus *H*(*X*|*Y Z*) ≤ *H*(*X*|*Z*). The Lemma holds.

Let *X* be a source state set; randomly selecting codewords to enter the channel of information transmission is a discrete random process. This mathematical model can be constructed and studied by taking the values of a group of random variables {ξ*i*}*<sup>i</sup>*≥1 on *X*. Firstly, we assume that the {ξ*i*}*<sup>i</sup>*≥<sup>1</sup> obey the same probability distribution on *X*, obtaining a set of information spaces {*X<sup>i</sup>* }*<sup>i</sup>*≥1; let *H*<sup>0</sup> = log |*X*| be the entropy of *X* as an equal probability information space, and for *n* ≥ 1, let

$$H\_n = H(X|X^{n-1}),\ H\_1 = H(X).$$

By Lemma 3.7, {*Hn*} is a monotonically decreasing sequence bounded below, so its limit exists, that is

$$\lim\_{n \to \infty} H\_n = a \ (a \ge 0). \tag{3.27}$$

We will extend the above observation to the general case: Let {ξ*i*}*i*≥<sup>1</sup> be any set of random variables valued on *X*, for any *n* ≥ 1, we let

$$X\_n = (X, \xi\_n), \ n \ge 1.$$

**Definition 3.9** A source state set *X* together with a set of random variables {ξ*i*}*i*≥<sup>1</sup> valued on *X* is called a source. (i) If {ξ*i*}*i*≥<sup>1</sup> are independent and identically distributed random variables, then *X* is called a memoryless source. (ii) If for any integers *t*1, *t*2,...,*tk* (*k* ≥ 1) and *h*, the random vectors

$$(\xi\_{t\_1}, \xi\_{t\_2}, \dots, \xi\_{t\_k}) \text{ and } (\xi\_{t\_1+h}, \xi\_{t\_2+h}, \dots, \xi\_{t\_k+h})$$

obey the same joint probability distribution, then *X* is called a stationary source. (iii) If {ξ*i*}*<sup>i</sup>*≥<sup>1</sup> is a *k*-order Markov process, that is, for ∀ *m* > *k* ≥ 1,

$$\begin{aligned} &p(\mathbf{x}\_m|\mathbf{x}\_{m-1}\mathbf{x}\_{m-2}\cdots\mathbf{x}\_1) \\ &= p(\mathbf{x}\_m|\mathbf{x}\_{m-1}\mathbf{x}\_{m-2}\cdots\mathbf{x}\_{m-k}), \ \forall \ \mathbf{x}\_1, \mathbf{x}\_2, \ldots, \mathbf{x}\_m \in X, \end{aligned}$$

then *X* is called a *k*-order Markov source; specially, when *k* = 1, i.e.,

$$p(\mathbf{x}\_m|\mathbf{x}\_{m-1}\mathbf{x}\_{m-2}\cdots\mathbf{x}\_1) = p(\mathbf{x}\_m|\mathbf{x}\_{m-1}), \ \forall \,\mathbf{x}\_1, \mathbf{x}\_2, \ldots, \mathbf{x}\_m \in X,$$

we call *X* a Markov source.

In passing from an information space to a source, a single random variable taking values on *X* is replaced by an infinite-dimensional random vector, so that the transmission process of the code *X* constitutes a discrete random process. By definition, we have

**Lemma 3.8** *Let X be a source state set, and* {ξ*i*}*<sup>i</sup>*≥<sup>1</sup> *be a set of random variables valued on X, we write*

$$X\_i = (X, \xi\_i), \ i \ge 1. \tag{3.28}$$

*(i) If X is a memoryless source, the joint probability distribution on X satisfies*

$$p(\mathbf{x}\_1 \mathbf{x}\_2 \cdots \mathbf{x}\_n) = \prod\_{i=1}^n p(\mathbf{x}\_i), \ x\_i \in X\_i, \ n \ge 1. \tag{3.29}$$

*(ii) If X is a stationary source, then for all integers t*1, *t*2,...,*tk* (*k* ≥ 1) *and h, there is the following joint probability distribution,*

$$p(\mathbf{x}\_{t\_1}\mathbf{x}\_{t\_2}\cdots\mathbf{x}\_{t\_k}) = p(\mathbf{x}\_{t\_1+h}\mathbf{x}\_{t\_2+h}\cdots\mathbf{x}\_{t\_k+h}),\tag{3.30}$$

*where xi* ∈ *Xi*, *i* ≥ 1.

*(iii) If X is a stationary Markov source, then the conditional probability distribution on X satisfies for any m* ≥ 1 *and x*1*x*<sup>2</sup> ··· *xm* ∈ *X*1*X*<sup>2</sup> ··· *Xm, we have*

$$\begin{split}p(\mathbf{x}\_m|\mathbf{x}\_1\cdots\mathbf{x}\_{m-1}) &= p(\mathbf{x}\_m|\mathbf{x}\_{m-1})\\ &= P\{\xi\_{i+1} = \mathbf{x}\_m|\xi\_i = \mathbf{x}\_{m-1}\},\ \forall \ 1 \le i \le m-1.\end{split}\tag{3.31}$$

*Proof* (i) and (ii) follow directly from the definition. We only prove (iii). By (ii) of Definition 3.9, for ∀ *i* ≥ 1, we have

$$P\{\xi\_i = \mathbf{x}\_{m-1}, \xi\_{i+1} = \mathbf{x}\_m\} = P\{\xi\_{m-1} = \mathbf{x}\_{m-1}, \xi\_m = \mathbf{x}\_m\}$$

and

$$P\{\xi\_i = \mathbf{x}\_{m-1}\} = P\{\xi\_{m-1} = \mathbf{x}\_{m-1}\}.$$

Thus

$$\begin{aligned} &P\{\xi\_i = \mathbf{x}\_{m-1}\} P\{\xi\_{i+1} = \mathbf{x}\_m | \xi\_i = \mathbf{x}\_{m-1}\} \\ &= P\{\xi\_{m-1} = \mathbf{x}\_{m-1}\} P\{\xi\_m = \mathbf{x}\_m | \xi\_{m-1} = \mathbf{x}\_{m-1}\}. \end{aligned}$$

We have

$$P\{\xi\_{i+1} = \mathbf{x}\_m | \xi\_i = \mathbf{x}\_{m-1}\} = p(\mathbf{x}\_m | \mathbf{x}\_{m-1}).$$

The Lemma holds.

**Corollary 3.1** *A memoryless source X must be a stationary source.*

*Proof* Derived directly from Definition 3.9.

Next, we extend the limit formula for memoryless sources revealed by formula (3.27) to general stationary sources. For this purpose, we first prove two lemmas.

**Lemma 3.9** *Let* { *f* (*n*)}*n*≥<sup>1</sup> *be a sequence of real numbers satisfying the following subadditivity:*

$$f(n+m) \le f(n) + f(m), \quad \forall n \ge 1, \ m \ge 1.$$

*Then* lim *n*→∞ 1 *<sup>n</sup> <sup>f</sup>* (*n*) *exists, and*

$$\lim\_{n \to \infty} \frac{1}{n} f(n) = \inf \left\{ \frac{1}{n} f(n) | n \geqslant 1 \right\}.\tag{3.32}$$

*Proof* Let

$$\delta = \inf \left\{ \frac{1}{n} f(n) | n \geqslant 1 \right\},$$

and suppose first that δ ≠ −∞.

For any ε > 0, select a sufficiently large positive integer *m* so that

$$\frac{1}{m}f(m) < \delta + \frac{\varepsilon}{2}.$$

Let *n* = *am* + *b*, where *a* is a nonnegative integer and 0 ≤ *b* < *m*; by subadditivity, we have

$$f(n) \le af(m) + (n - am)f(1).$$

Dividing both sides by *n*, we have

$$\frac{1}{n}f(n) \leqslant \frac{a}{am+b}f(m) + \frac{b}{am+b}f(1).$$

Since *m* is fixed and 0 ≤ *b* < *m*, when *n* (and hence *a*) is large enough, we have

$$\frac{bf(1)}{am+b} < \frac{1}{2}\varepsilon.$$

So there is

$$\frac{1}{n}f(n) < \frac{1}{m}f(m) + \frac{1}{2}\varepsilon < \delta + \varepsilon. \tag{3.33}$$

Thus we have

$$\delta \le \varliminf\_{n \to \infty} \frac{1}{n} f(n) \le \varlimsup\_{n \to \infty} \frac{1}{n} f(n) \le \delta + \varepsilon.$$

So

$$\lim\_{n \to \infty} \frac{1}{n} f(n) = \delta.$$

If δ = −∞, by (3.33),

$$\overline{\lim\_{n \to \infty}} \frac{1}{n} f(n) = -\infty,$$

so we still have

$$\lim\_{n \to \infty} \frac{1}{n} f(n) = \delta = -\infty.$$

The Lemma holds.
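Lemma 3.9 is the classical subadditivity (Fekete) lemma. A quick numeric illustration uses the subadditive sequence *f*(*n*) = *n* + √*n* (an arbitrary choice; for this particular *f* the ratios happen to decrease monotonically, which the lemma does not require in general):

```python
import math

# f(n) = n + sqrt(n) is subadditive, since sqrt(n + m) <= sqrt(n) + sqrt(m).
def f(n):
    return n + math.sqrt(n)

# Spot-check subadditivity on a grid.
assert all(f(n + m) <= f(n) + f(m)
           for n in range(1, 50) for m in range(1, 50))

# By Lemma 3.9, f(n)/n converges to inf{f(n)/n : n >= 1}; here the limit is 1,
# since f(n)/n = 1 + 1/sqrt(n).
ratios = [f(n) / n for n in (1, 10, 100, 10_000, 10**8)]
assert abs(ratios[-1] - 1) < 1e-3
```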

**Lemma 3.10** *Let* {*an*}*n*≥<sup>1</sup> *be a sequence of real numbers with limit* lim*<sup>n</sup>*→∞ *an* = *a, then*

$$\lim\_{n \to \infty} \frac{1}{n} \sum\_{i=1}^{n} a\_i = a.$$

*Proof*

$$\begin{aligned} \left| \frac{1}{n} \sum\_{i=1}^{n} a\_i - a \right| &= \left| \frac{1}{n} \sum\_{i=1}^{n} (a\_i - a) \right| &\leq \frac{1}{n} \sum\_{i=1}^{n} |(a\_i - a)| \\ &= \frac{1}{n} \sum\_{i=1}^{N} |a\_i - a| + \frac{1}{n} \sum\_{i=N+1}^{n} |a\_i - a| \\ &< \frac{1}{n} \sum\_{i=1}^{N} |a\_i - a| + \frac{n - N}{n} \varepsilon \\ &< \frac{1}{n} \sum\_{i=1}^{N} |a\_i - a| + \varepsilon. \end{aligned}$$

Here, given ε > 0, *N* is chosen so that |*ai* − *a*| < ε for all *i* > *N*; the first term of the above formula tends to 0 as *n* → ∞. So for any ε > 0, there is *N*<sup>0</sup> such that for *n* > *N*0,

$$\left| \frac{1}{n} \sum\_{i=1}^n a\_i - a \right| < 2\varepsilon.$$

Thus there is

$$\lim\_{n \to \infty} \frac{1}{n} \sum\_{i=1}^{n} a\_i = a.$$

The Lemma holds.

With the above preparations, we now give the main results of this section.

**Theorem 3.4** *Let X be any source, and let* {ξ*i*}*i*≥<sup>1</sup> *be a set of random variables valued on X. For any positive integer n* ≥ 1*, let*

$$X\_n = (X, \xi\_n), \quad n \ge 1.$$

*Then when X is a stationary source, we have the following two limits that exist and are equal, that is*

$$\lim\_{n \to \infty} \frac{1}{n} H(X\_1 X\_2 \dots X\_n) = \lim\_{n \to \infty} H(X\_n | X\_1 X\_2 \dots X\_{n-1}).$$

*We denote the above common limit as H*∞(*X*)*.*

*Proof* Because *X* is a stationary source, for any *n* ≥ 1, *m* ≥ 1, the joint probability distribution of the random vector (ξ*<sup>n</sup>*+<sup>1</sup>, ξ*<sup>n</sup>*+<sup>2</sup>,...,ξ*<sup>n</sup>*+*m*) on *X* equals that of the random vector (ξ1, ξ2,...,ξ*m*); therefore, we have

$$H(X\_1 X\_2 \cdots X\_m) = H(X\_{n+1} X\_{n+2} \cdots X\_{n+m}).\tag{3.34}$$

By Theorem 3.2, then

$$\begin{aligned} H(X\_1 X\_2 \cdots X\_n X\_{n+1} \cdots X\_{n+m}) &\leqslant H(X\_1 \cdots X\_n) + H(X\_{n+1} \cdots X\_{n+m}) \\ &= H(X\_1 \cdots X\_n) + H(X\_1 \cdots X\_m). \end{aligned}$$

Let *f* (*n*) = *H*(*X*<sup>1</sup> ··· *Xn*); then *f* (*n* + *m*) ≤ *f* (*n*) + *f* (*m*), so { *f* (*n*)}*n*≥<sup>1</sup> is a nonnegative real sequence with the subadditivity property, and by Lemma 3.9, we have

$$\lim\_{n \to \infty} \frac{1}{n} H(X\_1 X\_2 \cdots X\_n) = \inf \left\{ \frac{1}{n} H(X\_1 X\_2 \cdots X\_n) | n \geqslant 1 \right\} \geqslant 0.$$

Next, we prove that the second limit

$$\lim\_{n \to \infty} H(X\_n | X\_1 X\_2 \cdots X\_{n-1})$$

exists.

Firstly, we prove that the sequence is monotonically decreasing, because *X* is a stationary source, so

$$H(X\_1 X\_2 \cdots X\_{n-1}) = H(X\_2 X\_3 \cdots X\_n)$$

and

$$H(X\_2X\_3\cdots X\_nX\_{n+1}) = H(X\_1X\_2\cdots X\_n).$$

So we have

$$H(X\_{n+1}|X\_2X\_3\cdots X\_n) = H(X\_n|X\_1X\_2\cdots X\_{n-1}).\tag{3.35}$$

By Lemma 3.7,

$$\begin{aligned} H(X\_{n+1}|X\_1X\_2\cdots X\_n) &\leqslant H(X\_{n+1}|X\_2X\_3\cdots X\_n) \\ &= H(X\_n|X\_1X\_2\cdots X\_{n-1}). \end{aligned}$$

So {*H*(*Xn*|*X*1*X*<sup>2</sup> ··· *Xn*−1)}*n*≥<sup>1</sup> is a monotonically decreasing sequence bounded below, so lim*<sup>n</sup>*→∞ *<sup>H</sup>*(*Xn*|*X*1*X*<sup>2</sup> ··· *Xn*−1) exists. Further, by the addition formula of Lemma 3.2,

$$\frac{1}{n}H(X\_1X\_2\cdots X\_n) = \frac{1}{n}\sum\_{i=1}^n H(X\_i|X\_1X\_2\cdots X\_{i-1}).$$

By Lemma 3.10, finally we have

$$\lim\_{n \to \infty} \frac{1}{n} H(X\_1 X\_2 \cdots X\_n) = \lim\_{n \to \infty} H(X\_n | X\_1 X\_2 \cdots X\_{n-1}) = H\_{\infty}(X).$$

We completed the proof of the Theorem.

We call *H*∞(*X*) the entropy rate of the source *X*. Obviously, there is the following corollary.

**Corollary 3.2** *(i) For any stationary source X, we have*

$$H\_{\infty}(X) \le H(X\_1) \le \log |X|.$$

*(ii) If X is a memoryless source, then*

$$H\_{\infty}(X) = H(X\_1).$$

*(iii) If X is a stationary Markov source, then*

$$H\_{\infty}(X) = H(X\_2|X\_1).$$

*Proof* Since {*H*(*Xn*|*X*<sup>1</sup> ··· *Xn*−<sup>1</sup>)}*n*≥<sup>1</sup> is a monotonically decreasing sequence, then

$$H\_{\infty}(X) \le H(X\_1).$$

That is, (i) holds. If *X* is a memoryless source, then

$$\begin{aligned} H(X\_1 \cdots X\_n) &= -\sum\_{\mathbf{x}\_1 \in X\_1} \dots \sum\_{\mathbf{x}\_n \in X\_n} p(\mathbf{x}\_1 \mathbf{x}\_2 \cdots \mathbf{x}\_n) \log p(\mathbf{x}\_1 \mathbf{x}\_2 \cdots \mathbf{x}\_n) \\ &= -\sum\_{\mathbf{x}\_1 \in X\_1} \dots \sum\_{\mathbf{x}\_n \in X\_n} p(\mathbf{x}\_1 \dots \mathbf{x}\_n) \left\{ \log p(\mathbf{x}\_1) + \dots + \log p(\mathbf{x}\_n) \right\} \\ &= nH(X\_1) .\end{aligned}$$

So we have

$$H\_{\infty}(X) = H(X\_1).$$

Similarly, we can prove (iii).
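Corollary 3.2 (iii) can be computed concretely. The sketch below uses a hypothetical 2-state stationary Markov source with a made-up transition matrix; *H*∞(*X*) = *H*(*X*2|*X*1) is evaluated from the transition probabilities and the stationary distribution:

```python
import math

# A hypothetical 2-state stationary Markov source on X = {0, 1}.
# P[i][j] = p(next symbol = j | current symbol = i); the numbers are made up.
P = [[0.9, 0.1],
     [0.3, 0.7]]

# Stationary distribution pi (solving pi P = pi); for two states,
# pi(0) = P[1][0] / (P[0][1] + P[1][0]).
pi0 = P[1][0] / (P[0][1] + P[1][0])
pi = [pi0, 1 - pi0]
# Sanity check of stationarity: (pi P)(0) = pi(0).
assert abs(pi[0] * P[0][0] + pi[1] * P[1][0] - pi[0]) < 1e-12

# Entropy rate H_inf(X) = H(X2|X1) = -sum_i pi_i sum_j P_ij log2 P_ij.
H_inf = -sum(pi[i] * P[i][j] * math.log2(P[i][j])
             for i in range(2) for j in range(2))
```

With these made-up numbers, *H*∞(*X*) ≈ 0.572 bits per symbol, strictly below log |*X*| = 1 bit, as Corollary 3.2 (i) requires.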

**Definition 3.10** Let *X* be a stationary source, we define

$$\delta = \log|X| - H\_{\infty}(X), \ r = 1 - \frac{H\_{\infty}(X)}{\log|X|},\tag{3.36}$$

where δ is called the redundancy of the information space *X*, and *r* the relative redundancy of *X*.

We write

$$H\_0 = \log|X|,\ H\_n = H(X\_n|X\_1X\_2\cdots X\_{n-1}),\ \forall n \ge 1.$$

By Theorem 3.4 and (3.36), we have *H*∞(*X*) = (1 − *r*)*H*<sup>0</sup> ≤ *Hn*, so

$$H\_n \ge (1 - r)H\_0, \ \forall n \ge 1. \tag{3.37}$$

In information theory, redundancy is used to describe the effectiveness of the information carried by the source output symbol. The smaller the redundancy, the higher the effectiveness of the information carried by the source output symbol, and vice versa.
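For a concrete illustration of Definition 3.10, consider a memoryless binary source with a made-up parameter λ; by Corollary 3.2 (ii) its entropy rate is *H*(λ), and with binary logarithms log |*X*| = 1:

```python
import math

def H2(lam):
    # Binary entropy function H(lambda) from Example 3.5, base-2 logarithm.
    return -lam * math.log2(lam) - (1 - lam) * math.log2(1 - lam)

# Memoryless binary source with made-up parameter lambda = 0.1:
# by Corollary 3.2 (ii), H_inf(X) = H(X1) = H(lambda), and log|X| = log2(2) = 1.
lam = 0.1
H_inf = H2(lam)

# Redundancy and relative redundancy from (3.36).
delta = 1.0 - H_inf       # log|X| - H_inf(X)
r = 1.0 - H_inf / 1.0     # 1 - H_inf(X) / log|X|
```

Here δ = *r* ≈ 0.531 because log |*X*| = 1: a strongly biased source carries little information per symbol, hence high redundancy.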

# **3.4 Markov Chain**

Let *X*, *Y*, *Z* be three information spaces. If the following conditional probability formula holds:

$$p(\mathbf{x}\mathbf{y}|z) = p(\mathbf{x}|z)p(\mathbf{y}|z), \tag{3.38}$$

we say that *X* and *Y* are statistically independent given *Z*.

**Definition 3.11** If the information spaces *X* and *Y* are statistically independent under condition *Z*, the triple *X*, *Z*, *Y* is called a Markov chain, denoted *X* → *Z* → *Y* .

**Theorem 3.5** *X* → *Z* → *Y is a Markov chain if and only if the probability of occurrence of the joint event x zy is*

$$p(\mathbf{x}z\mathbf{y}) = p(\mathbf{x})p(z|\mathbf{x})p(\mathbf{y}|z),\tag{3.39}$$

*if and only if*

$$p(\mathbf{x}z\mathbf{y}) = p(\mathbf{y})p(z|\mathbf{y})p(\mathbf{x}|z). \tag{3.40}$$

*Proof* If *X* → *Z* → *Y* is a Markov chain, then *p*(*x y*|*z*) = *p*(*x*|*z*)*p*(*y*|*z*), thus

$$\begin{aligned} p(\mathbf{x}z\mathbf{y}) &= p(z)p(\mathbf{x}\mathbf{y}|z) \\ &= p(z)p(\mathbf{x}|z)p(\mathbf{y}|z) \\ &= p(\mathbf{x})p(z|\mathbf{x})p(\mathbf{y}|z) .\end{aligned}$$

Similarly,

$$\begin{aligned} p(\mathbf{x}z\mathbf{y}) &= p(\mathbf{z})p(\mathbf{y}|\mathbf{z})p(\mathbf{x}|\mathbf{z})\\ &= p(\mathbf{y})p(\mathbf{z}|\mathbf{y})p(\mathbf{x}|\mathbf{z}). \end{aligned}$$

That is (3.39) and (3.40) holds. Conversely, if (3.39) holds, then

$$\begin{aligned} p(\mathbf{x}z\mathbf{y}) &= p(\mathbf{x})p(\mathbf{z}|\mathbf{x})p(\mathbf{y}|\mathbf{z})\\ &= p(\mathbf{z})p(\mathbf{x}|\mathbf{z})p(\mathbf{y}|\mathbf{z}). \end{aligned}$$

On the other hand, by the product formula,

$$p(\mathbf{x}z\mathbf{y}) = p(z)p(\mathbf{x}\mathbf{y}|z).$$

So we have

$$p(xy|z) = p(x|z)p(y|z).$$

That is *X* → *Z* → *Y* is a Markov chain. Similarly, if (3.40) holds, then *X* → *Z* → *Y* also is a Markov chain. The Theorem holds.

According to the above Theorem, or by Definition 3.11, obviously, if *X* → *Z* → *Y* is a Markov chain, then *Y* → *Z* → *X* is also a Markov chain.

**Definition 3.12** Let *U*, *X*, *Z*, *Y* be four information spaces, and let the probability of the joint event *uxzy* be

$$p(u\mathbf{x}z\mathbf{y}) = p(u)p(\mathbf{x}|u)p(z|\mathbf{x})p(\mathbf{y}|z),\tag{3.41}$$

then we call *U*, *X*, *Z*, *Y* a Markov chain, denoted *U* → *X* → *Z* → *Y* .

**Theorem 3.6** *If U* → *X* → *Z* → *Y is a Markov chain, then U* → *X* → *Z and U* → *Z* → *Y are also Markov chains.*

*Proof* Assuming that *U* → *X* → *Z* → *Y* is a Markov chain, then

$$p(u\mathbf{x}z\mathbf{y}) = p(u)p(\mathbf{x}|u)p(z|\mathbf{x})p(\mathbf{y}|z),$$

Summing both sides over *y* ∈ *Y* , and noticing that ∑*<sup>y</sup>*∈*<sup>Y</sup> p*(*y*|*z*) = 1, we have

$$p(u\mathbf{x}z) = p(u)p(\mathbf{x}|u)p(z|\mathbf{x}).$$

By Theorem 3.5, *U* → *X* → *Z* is a Markov chain. The left side of the above formula can be expressed as

$$p(u\mathbf{x}z) = p(u\mathbf{x})p(z|u\mathbf{x}).$$

So we have

$$p(z|u\mathbf{x}) = p(z|\mathbf{x}).$$

Because *U* → *X* → *Z* → *Y* is a Markov chain, then

$$\begin{aligned} p(u\mathbf{x}z\mathbf{y}) &= p(u)p(\mathbf{x}|u)p(z|\mathbf{x})p(\mathbf{y}|z) \\ &= p(u\mathbf{x})p(z|u\mathbf{x})p(\mathbf{y}|z) \\ &= p(u\mathbf{x}z)p(\mathbf{y}|z) .\end{aligned}$$

Summing both sides over *x* ∈ *X*, we have

$$\begin{aligned} p(uz\mathbf{y}) &= p(uz)p(\mathbf{y}|z) \\ &= p(u)p(z|u)p(\mathbf{y}|z) .\end{aligned}$$

Thus *U* → *Z* → *Y* is also a Markov chain. The Theorem holds.

In the previous section, we defined the mutual information *I*(*X*, *Y* ) of two information spaces *X* and *Y* as

$$I(X,Y) = \sum\_{\mathbf{x}\in X} \sum\_{\mathbf{y}\in Y} p(\mathbf{x}\mathbf{y}) \log \frac{p(\mathbf{x}\mathbf{y})}{p(\mathbf{x})p(\mathbf{y})}.$$

Now we define the mutual information *I*(*X*, *Y* |*Z*) of *X* and *Y* under condition *Z* as

$$I(X,Y|Z) = \sum\_{\mathbf{x}\in X} \sum\_{\mathbf{y}\in Y} \sum\_{z\in Z} p(\mathbf{x}\mathbf{y}z) \log \frac{p(\mathbf{x}\mathbf{y}|z)}{p(\mathbf{x}|z)p(\mathbf{y}|z)}.\tag{3.42}$$

By definition, we have

$$I(X,Y|Z) = I(Y,X|Z). \tag{3.43}$$

*I*(*X*, *Y* |*Z*) is called the conditional mutual information of *X* and *Y* .

For conditional mutual information, we first prove the following formula.

**Theorem 3.7** *Let X*, *Y*, *Z be three information spaces, then*

$$I(X,Y|Z) = H(X|Z) - H(X|YZ) \tag{3.44}$$

*and*

$$I(X,Y|Z) = H(Y|Z) - H(Y|XZ). \tag{3.45}$$

*Proof* We only prove (3.44), the same is true for equation (3.45). Because

$$\begin{aligned} H(X|Z) - H(X|YZ) &= \sum\_{\mathbf{x} \in X} \sum\_{\mathbf{y} \in Y} \sum\_{z \in Z} p(\mathbf{x} \mathbf{y}z) \log \frac{p(\mathbf{x}|\mathbf{y}z)}{p(\mathbf{x}|z)} \\ &= \sum\_{\mathbf{x} \in X} \sum\_{\mathbf{y} \in Y} \sum\_{z \in Z} p(\mathbf{x}\mathbf{y}z) \log \frac{p(\mathbf{x}\mathbf{y}|z)}{p(\mathbf{x}|z)p(\mathbf{y}|z)} \\ &= I(X, Y|Z). \end{aligned}$$

So (3.44) holds.

**Corollary 3.3** *We have I*(*X*, *Y* |*Z*) ≥ 0*, and I*(*X*, *Y* |*Z*) = 0 *if and only if X* → *Z* → *Y is a Markov chain.*

*Proof* By Theorem 3.7,

$$I(X,Y|Z) = H(X|Z) - H(X|YZ) \ge 0.$$

If *X* → *Z* → *Y* is a Markov chain, by (3.42),

$$\log \frac{p(\mathbf{x} \mathbf{y} | z)}{p(\mathbf{x} | z)p(\mathbf{y} | z)} = \log 1 = 0,$$

that is, *I*(*X*, *Y* |*Z*) = 0. Conversely, if *I*(*X*, *Y* |*Z*) = 0, then $p(\mathbf{x}\mathbf{y}|z) = p(\mathbf{x}|z)p(\mathbf{y}|z)$ for all *x*, *y*, *z*, so *X* → *Z* → *Y* is a Markov chain.
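Corollary 3.3 can be checked numerically. The Python sketch below computes *I*(*X*, *Y*|*Z*) directly from definition (3.42) for a joint probability of the Markov form $p(\mathbf{x}z\mathbf{y}) = p(\mathbf{x})p(z|\mathbf{x})p(\mathbf{y}|z)$; the binary conditional distributions used are illustrative assumptions:

```python
import math

def cond_mutual_info(pxyz):
    """I(X,Y|Z) per (3.42): sum of p(xyz) log[ p(xy|z) / (p(x|z)p(y|z)) ]."""
    pz, pxz, pyz = {}, {}, {}
    for (x, y, z), p in pxyz.items():
        pz[z] = pz.get(z, 0.0) + p
        pxz[(x, z)] = pxz.get((x, z), 0.0) + p
        pyz[(y, z)] = pyz.get((y, z), 0.0) + p
    total = 0.0
    for (x, y, z), p in pxyz.items():
        if p > 0:
            pxy_given_z = p / pz[z]
            px_given_z = pxz[(x, z)] / pz[z]
            py_given_z = pyz[(y, z)] / pz[z]
            total += p * math.log2(pxy_given_z / (px_given_z * py_given_z))
    return total

# Markov chain X -> Z -> Y: p(xzy) = p(x) p(z|x) p(y|z)  (assumed channels)
px = {0: 0.5, 1: 0.5}
pz_x = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.2, (1, 1): 0.8}  # p(z|x), keyed (z, x)
py_z = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.4, (1, 1): 0.6}  # p(y|z), keyed (y, z)
joint = {}
for x in (0, 1):
    for z in (0, 1):
        for y in (0, 1):
            joint[(x, y, z)] = px[x] * pz_x[(z, x)] * py_z[(y, z)]

print(abs(cond_mutual_info(joint)) < 1e-9)  # True: I(X,Y|Z) = 0 for a Markov chain
```

Perturbing `joint` away from the Markov form makes the returned value strictly positive, matching *I*(*X*, *Y*|*Z*) ≥ 0.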

Conditional mutual information can be used to establish the addition formula of mutual information.

**Corollary 3.4** (Addition formula of mutual information) *If X*1, *X*2,..., *Xn*, *Y are information spaces, then*

$$I(X\_1 X\_2 \cdots X\_n, Y) = \sum\_{i=1}^n I(X\_i, Y | X\_{i-1} \cdots X\_1). \tag{3.46}$$

*Specially, when n* = 2*, we have*

$$I(X\_1 X\_2, Y) = I(X\_1, Y) + I(X\_2, Y | X\_1). \tag{3.47}$$

*Proof* By Lemma 3.4, we have

$$\begin{aligned} I(X\_1 X\_2 \cdots X\_n, Y) &= H(X\_1 X\_2 \cdots X\_n) - H(X\_1 X\_2 \cdots X\_n | Y) \\ &= \sum\_{i=1}^n H(X\_i | X\_{i-1} \cdots X\_1) - \sum\_{i=1}^n H(X\_i | X\_{i-1} \cdots X\_1 Y) .\end{aligned}$$

Applying (3.44) to each term of the difference, we get

$$I(X\_1X\_2\cdots X\_n, Y) = \sum\_{i=1}^n I(X\_i, Y | X\_1X\_2\cdots X\_{i-1}).$$

Therefore, the corollary holds.
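The addition formula can be verified numerically. The Python sketch below computes both sides of (3.47) from first principles, using (3.42) for the conditional term; the joint distribution on (*X*1, *X*2, *Y*) is an arbitrary illustrative assumption:

```python
import math
from itertools import product

def mi(pab):
    """I(A,B) from a joint distribution keyed (a, b)."""
    pa, pb = {}, {}
    for (a, b), p in pab.items():
        pa[a] = pa.get(a, 0.0) + p
        pb[b] = pb.get(b, 0.0) + p
    return sum(p * math.log2(p / (pa[a] * pb[b]))
               for (a, b), p in pab.items() if p > 0)

def cmi(pabc):
    """I(A,B|C) per (3.42), from a joint distribution keyed (a, b, c)."""
    pc, pac, pbc = {}, {}, {}
    for (a, b, c), p in pabc.items():
        pc[c] = pc.get(c, 0.0) + p
        pac[(a, c)] = pac.get((a, c), 0.0) + p
        pbc[(b, c)] = pbc.get((b, c), 0.0) + p
    return sum(p * math.log2(p * pc[c] / (pac[(a, c)] * pbc[(b, c)]))
               for (a, b, c), p in pabc.items() if p > 0)

# arbitrary strictly positive joint distribution on (x1, x2, y)
raw = {(x1, x2, y): 1 + x1 + 2 * x2 + 3 * y + x1 * y
       for x1, x2, y in product((0, 1), repeat=3)}
s = sum(raw.values())
joint = {k: v / s for k, v in raw.items()}

p_x1y = {}
for (x1, x2, y), p in joint.items():
    p_x1y[(x1, y)] = p_x1y.get((x1, y), 0.0) + p

lhs = mi({((x1, x2), y): p for (x1, x2, y), p in joint.items()})       # I(X1X2, Y)
rhs = mi(p_x1y) + cmi({(x2, y, x1): p for (x1, x2, y), p in joint.items()})
print(abs(lhs - rhs) < 1e-9)  # True: (3.47) holds
```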

Finally, we use Markov chain to prove the inequality of mutual information.

**Theorem 3.8** *Suppose X* → *Z* → *Y is a Markov chain, then we have*

$$I(X,Y) \le I(X,Z) \tag{3.48}$$

*and*

$$I(X,Y) \le I(Y,Z). \tag{3.49}$$

*Proof* We only prove (3.48), the same is true for equation (3.49). From equation (3.47) and corollary 3.3:

$$I(YZ, X) = I(Y, X) + I(X, Z | Y).$$

Thus we have

$$\begin{aligned} I(X,Y) &= I(X,YZ) - I(X,Z|Y) \\ &\le I(X,YZ) \\ &= I(X,Z) + I(X,Y|Z) \\ &= I(X,Z). \end{aligned}$$

In the last step, we use the Markov chain condition, thus *I*(*X*, *Y* |*Z*) = 0. The Theorem holds.

**Theorem 3.9** (Data processing inequality) *Suppose U* → *X* → *Y* → *V is a Markov chain, then we have*

$$I(U, V) \le I(X, Y).$$

*Proof* According to the conditions, *U* → *X* → *Y* and *U* → *Y* → *V* are Markov chains, respectively; by Theorem 3.8,

$$I(U,Y) \le I(X,Y)$$

and

$$I(U, V) \le I(U, Y).$$

Thus

*I*(*U*, *V* ) ≤ *I*(*X*, *Y* ).

The Theorem holds.
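Theorem 3.9 can be checked on a small concrete chain. The Python sketch below builds the joint distribution of a binary chain *U* → *X* → *Y* → *V* (the channel matrices are illustrative assumptions) and compares *I*(*U*, *V*) with *I*(*X*, *Y*):

```python
import math
from itertools import product

def mutual_info(pxy):
    """I(X,Y) = sum p(xy) log[ p(xy) / (p(x)p(y)) ], in bits."""
    px, py = {}, {}
    for (x, y), p in pxy.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in pxy.items() if p > 0)

# hypothetical chain U -> X -> Y -> V; each arrow is a binary channel
pu = {0: 0.3, 1: 0.7}
chan_ux = {(0, 0): 0.8, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.9}   # p(x|u), keyed (u, x)
chan_xy = {(0, 0): 0.75, (0, 1): 0.25, (1, 0): 0.3, (1, 1): 0.7}  # p(y|x)
chan_yv = {(0, 0): 0.6, (0, 1): 0.4, (1, 0): 0.2, (1, 1): 0.8}   # p(v|y)

puv, pxy = {}, {}
for u, x, y, v in product((0, 1), repeat=4):
    p = pu[u] * chan_ux[(u, x)] * chan_xy[(x, y)] * chan_yv[(y, v)]
    puv[(u, v)] = puv.get((u, v), 0.0) + p
    pxy[(x, y)] = pxy.get((x, y), 0.0) + p

print(mutual_info(puv) <= mutual_info(pxy))  # True: I(U,V) <= I(X,Y)
```

Since the two outer arrows only add noise, the inner pair (*X*, *Y*) always carries at least as much mutual information as (*U*, *V*), exactly as the data processing inequality asserts.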

# **3.5 Source Coding Theorem**

The theory of information coding is usually divided into two parts: channel coding and source coding. Channel coding ensures the success rate of decoding by increasing the length of codewords; channel codes, also known as error correction codes, are discussed in detail in Chap. 2. Source coding compresses data containing redundant information, so that after information or data is stored, decoding and recovery still succeed with high probability. Another important result of Shannon's theory is that there exist so-called good codes in source coding, characterized by using as few codewords as possible in order to improve storage efficiency, while the error of decoding and restoration can be made arbitrarily small; such codes are also called typical codes. Shannon first proved the asymptotic equipartition property of block codes for memoryless sources, and from it derived the statistical characteristics of typical codes. At Shannon's suggestion, McMillan (1953) and Breiman (1957) proved a similar asymptotic equipartition property for stationary ergodic sources. This is the famous Shannon–McMillan–Breiman theorem in source coding, which constitutes the core content of modern typical code theory. The main purpose of this section is to rigorously prove the asymptotic equipartition property for memoryless sources, and thereby derive the source coding theorem for data compression (see Theorem 3.10). For the more general Shannon–McMillan–Breiman theorem, Chap. 2 of Ye Zhongxing's *Fundamentals of Information Theory* (see Zhongxing, 2003 in reference 3) gives a proof under the condition of stationary ergodic Markov sources; interested readers can refer to it or to more original documents (see McMillan, 1953; Moy, 1961; Shannon, 1959 in reference 3).

Firstly, let *X* = (*X*, ξ) be an information space; the entropy *H*(*X*) of *X* essentially depends only on the probability function *p*(*x*) (*x* ∈ *X*) of the random variable ξ. According to *p*(*x*), we can define the following random variables taking values determined by *X*:

$$
\eta\_1 = p(X), \ \eta\_2 = \log p(X). \tag{3.50}
$$

The probability function is

$$P\{\eta\_1 = p(\mathbf{x})\} = P\{\eta\_2 = \log p(\mathbf{x})\} = p(\mathbf{x}). \tag{3.51}$$

It is easy to compute the expected value of $\eta\_2$:

$$\begin{aligned} -E(\eta\_2) &= -E(\log p(X)) \\ &= -\sum\_{\mathbf{x} \in X} p(\mathbf{x}) \log p(\mathbf{x}) = H(X). \end{aligned} \tag{3.52}$$

Therefore, we can regard the entropy *H*(*X*) of *X* as the mathematical expectation of the random variable $\log \frac{1}{p(X)}$.

**Lemma 3.11** *Let X be a memoryless source, and let p*(*X<sup>n</sup>*) *and* log *p*(*X<sup>n</sup>*) *be two random variables taking values on the power space X<sup>n</sup>. Then* $-\frac{1}{n}\log p(X^n)$ *converges to H*(*X*) *in probability, that is,*

$$-\frac{1}{n}\log p(X^n) \stackrel{P}{\longrightarrow} H(X).$$

*Proof* Since *X* is a memoryless source, $\{\xi\_i\}\_{i\ge 1}$ is a group of independent and identically distributed random variables. With $X\_i = (X, \xi\_i)$ ($i \ge 1$) and the power space $X^n = X\_1X\_2\cdots X\_n$ ($n \ge 1$), we have

$$\begin{cases} p(X^n) = p(X\_1)p(X\_2) \cdots p(X\_n), \\ \log p(X^n) = \sum\_{i=1}^n \log p(X\_i). \end{cases}$$

Because $\{\xi\_i\}\_{i\ge 1}$ is independent and identically distributed, $\{p(X\_i)\}$ and $\{\log p(X\_i)\}$ are also groups of independent and identically distributed random variables. According to Chebyshev's law of large numbers (see Theorem 1.3 of Chap. 1),

$$-\frac{1}{n}\log p(X^n) = \frac{1}{n}\sum\_{i=1}^n \log \frac{1}{p(X\_i)}$$

converges to the common expected value *H*(*X*), that is

$$E\left(\log\frac{1}{p(X\_i)}\right) = E\left(\log\frac{1}{p(X)}\right) = H(X).$$

Therefore, for any ε > 0, when *n* is sufficiently large, we have

$$P\{|\ -\frac{1}{n}\log p(X^n) - H(X)| < \varepsilon\} > 1 - \varepsilon. \tag{3.53}$$

The proof is completed.
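Lemma 3.11 can be illustrated by simulation: draw a long i.i.d. sequence from a source and compute $-\frac{1}{n}\log p(X^n)$. The Python sketch below uses a hypothetical four-letter source with probabilities (1/2, 1/4, 1/8, 1/8), whose entropy is 1.75 bits; the sample size 20000 is an arbitrary choice:

```python
import math
import random

# hypothetical memoryless source with H(X) = 1.75 bits
p = {'a': 0.5, 'b': 0.25, 'c': 0.125, 'd': 0.125}
H = -sum(q * math.log2(q) for q in p.values())

random.seed(1)  # fixed seed for reproducibility
symbols, weights = zip(*p.items())
n = 20000
sample = random.choices(symbols, weights=weights, k=n)

# -(1/n) log p(X^n) = (1/n) sum_i log(1/p(X_i)), by memorylessness
emp = -sum(math.log2(p[s]) for s in sample) / n
print(H, round(emp, 3))
```

With high probability the empirical value lands within a few hundredths of a bit of *H*(*X*) = 1.75, in line with the convergence in probability asserted by the lemma.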

**Definition 3.13** Let *X* be a memoryless source, power space *X<sup>n</sup>*, also known as block code,

$$X^n = \{ \mathbf{x} = \mathbf{x}\_1 \cdots \mathbf{x}\_n | \mathbf{x}\_i \in X, \ 1 \le i \le n \}, \ n \ge 1. \tag{3.54}$$

For any given ε > 0 and *n* ≥ 1, we define the typical code, or typical sequence set, $W\_\varepsilon^{(n)}$ in the power space *X<sup>n</sup>* as

$$W\_{\varepsilon}^{(n)} = \{ \mathbf{x} = \mathbf{x}\_1 \cdots \mathbf{x}\_n \mid |-\frac{1}{n} \log \, p(\mathbf{x}) - H(X)| < \varepsilon \}. \tag{3.55}$$

By the definition, for any ε > 0 and *n* ≥ 1, we have

$$W\_\varepsilon^{(n)} \subset X^n, \ |X^n| = |X|^n. \tag{3.56}$$

**Lemma 3.12** (Asymptotic equipartition) *Let* $|W\_\varepsilon^{(n)}|$ *denote the number of codewords in the typical code* $W\_\varepsilon^{(n)}$*. Then for any* ε > 0*, in binary channels, when n is sufficiently large, we have*

$$(1 - \varepsilon)2^{n(H(X) - \varepsilon)} \le |W\_{\varepsilon}^{(n)}| \le 2^{n(H(X) + \varepsilon)}.\tag{3.57}$$

*Proof* By Lemma 3.11 and (3.53), when *n* is sufficiently large, we have

$$P\{| - \frac{1}{n} \log p(X^n) - H(X) | < \varepsilon \} > 1 - \varepsilon.$$

In other words, all codewords $\mathbf{x} = \mathbf{x}\_1\mathbf{x}\_2 \cdots \mathbf{x}\_n \in W\_\varepsilon^{(n)}$ satisfy

$$H(X) - \varepsilon < -\frac{1}{n}\log p(\mathbf{x}) < H(X) + \varepsilon.$$

Equivalently, in a binary channel,

$$2^{-n(H(X)+\varepsilon)} \le p(\mathbf{x}) \le 2^{-n(H(X)-\varepsilon)}.\tag{3.58}$$

Denote the probability of occurrence of $W\_\varepsilon^{(n)}$ as $P\{W\_\varepsilon^{(n)}\}$; then

$$P\{W\_{\varepsilon}^{(n)}\} = P\{\mathbf{x} \in X^n : \mathbf{x} \in W\_{\varepsilon}^{(n)}\} > 1 - \varepsilon.$$

On the other hand,

$$P\{W\_\varepsilon^{(n)}\} = \sum\_{\mathbf{x}\in W\_\varepsilon^{(n)}} p(\mathbf{x}),$$

by (3.58),

$$|W\_{\varepsilon}^{(n)}| \cdot 2^{-n(H(X)+\varepsilon)} \le P\{W\_{\varepsilon}^{(n)}\} \le 1.$$

So

$$|W\_{\varepsilon}^{(n)}| \le 2^{n(H(X) + \varepsilon)}.$$

Again by (3.58), there is

$$|W\_{\varepsilon}^{(n)}| \cdot 2^{-n(H(X)-\varepsilon)} \ge P\{W\_{\varepsilon}^{(n)}\} > 1 - \varepsilon.$$

So we have

$$|W\_\varepsilon^{(n)}| > (1 - \varepsilon)2^{n(H(X) - \varepsilon)}.$$

Combining the two inequalities above, we have

$$(1 - \varepsilon)2^{n(H(X) - \varepsilon)} \le |W\_{\varepsilon}^{(n)}| \le 2^{n(H(X) + \varepsilon)}.$$

We completed the proof.

By Lemma 3.12, for a memoryless source *X*, the probability of a typical codeword in the power space *X<sup>n</sup>* is approximately

$$p(\mathbf{x}) \sim 2^{-nH(X)}, \ \forall \ \mathbf{x} \in W\_\varepsilon^{(n)},$$

and the number of codewords $|W\_\varepsilon^{(n)}|$ in the typical code $W\_\varepsilon^{(n)}$ is approximately

$$|W\_\varepsilon^{(n)}| \sim 2^{nH(X)}.$$

Further analysis shows that the proportion of the typical code $W\_\varepsilon^{(n)}$ in the block code *X<sup>n</sup>* is nevertheless very small, which can be summarized as the following Lemma.

**Lemma 3.13** *For a sufficiently small* ε > 0 *given, when X is not an equal probability information space, we have*

$$\lim\_{n \to \infty} \frac{|W\_{\varepsilon}^{(n)}|}{|X|^n} = 0.$$

*Proof* By Lemma 3.12, we have

$$\frac{|W\_{\varepsilon}^{(n)}|}{|X|^n} \le \frac{2^{n(H(X)+\varepsilon)}}{|X|^n}.$$

So

$$\frac{|W\_\varepsilon^{(n)}|}{|X|^n} \le 2^{-n(\log|X| - H(X) - \varepsilon)}.$$

By Theorem 3.1, since *X* is not an equal probability information space, when ε is sufficiently small, we have

$$H(X) + \varepsilon < \log |X|.$$

Therefore, when *n* is sufficiently large, the ratio $|W\_\varepsilon^{(n)}|/|X|^n$ can be arbitrarily small. Lemma 3.13 holds.

Combining Lemmas 3.11, 3.12 and 3.13, we can describe that the typical codes in block codes have the following statistical characteristics.

**Corollary 3.5** *Assuming that X is a memoryless source and the typical sequence (or typical code)* $W\_\varepsilon^{(n)}$ *in block code X<sup>n</sup> is defined by formula (3.55), then for any* ε > 0 *and n* ≥ 1*, we have*

*(i)* (*Asymptotic equipartition*)

$$(1 - \varepsilon)2^{n(H(X) - \varepsilon)} \le |W\_{\varepsilon}^{(n)}| \le 2^{n(H(X) + \varepsilon)}.$$

*(ii) The occurrence probability* $P\{W\_\varepsilon^{(n)}\}$ *of* $W\_\varepsilon^{(n)}$ *is infinitely close to 1, that is,*

$$P\{W\_{\varepsilon}^{(n)}\} = P\{\mathbf{x} \in X^n : \mathbf{x} \in W\_{\varepsilon}^{(n)}\} > 1 - \varepsilon.$$

*(iii) When X is not an equal probability information space, the proportion of* $W\_\varepsilon^{(n)}$ *in block code X<sup>n</sup> is arbitrarily small, that is,*

$$\lim\_{n \to \infty} \frac{|W\_{\varepsilon}^{(n)}|}{|X|^n} = 0.$$
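For a small alphabet the typical code can be enumerated exhaustively, making the characteristics of Corollary 3.5 concrete. The Python sketch below uses a hypothetical binary source with *p* = (0.7, 0.3) and takes *n* = 12, ε = 0.2, so that brute force over |*X*|<sup>*n*</sup> = 4096 sequences is feasible:

```python
import math
from itertools import product

p = {'a': 0.7, 'b': 0.3}
H = -sum(q * math.log2(q) for q in p.values())  # about 0.881 bits

n, eps = 12, 0.2
size, prob = 0, 0.0
for seq in product(p, repeat=n):
    px = math.prod(p[s] for s in seq)
    if abs(-math.log2(px) / n - H) < eps:   # membership test (3.55)
        size += 1
        prob += px

# the upper bound of (3.57) holds for every n, and the proportion is small
assert size <= 2 ** (n * (H + eps))
assert size / 2 ** n < 0.5
print(size, round(prob, 3))
```

Even at *n* = 12 the typical code already carries most of the probability mass while occupying well under half of the 4096 sequences; as *n* grows, the mass tends to 1 and the proportion tends to 0.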

The above statistical characteristics of typical codes are an important theoretical basis for source coding and data compression. We therefore seek an effective way to compress block code information, so that the re-encoded codewords are as few as possible and the error probability of decoding and recovery is as small as possible. An effective method is to divide the codewords in block code *X<sup>n</sup>* into two parts: the codewords of the typical code $W\_\varepsilon^{(n)}$ are uniformly numbered from 1 to *M*, that is, the codewords in $W\_\varepsilon^{(n)}$ are put in one-to-one correspondence with the following set *I* of positive integers,

$$I = \{1, 2, \ldots, M\}, \ M = |W\_{\varepsilon}^{(n)}|.$$

For codewords that do not belong to $W\_\varepsilon^{(n)}$, we uniformly number them as 1. Obviously, for each *i* with 1 ≤ *i* ≤ *M* and *i* ≠ 1, there is a unique codeword $\mathbf{x}^{(i)} \in W\_\varepsilon^{(n)}$, so we can accurately restore *i* to $\mathbf{x}^{(i)}$; that is, $i \rightarrow \mathbf{x}^{(i)}$ is the correct decoding. For *i* = 1, we will not be able to decode correctly, resulting in a decoding recovery error. We denote the code rate of the typical code $W\_\varepsilon^{(n)}$ as $\frac{1}{n}\log M$; by Lemma 3.12,

$$(1 - \varepsilon)2^{n(H(X) - \varepsilon)} \le M \le 2^{n(H(X) + \varepsilon)}.$$

Equivalently,

$$
\log(1 - \varepsilon) + n(H(X) - \varepsilon) \le \log M \le n(H(X) + \varepsilon),
$$

Therefore, the code rate of the typical code $W\_\varepsilon^{(n)}$ is estimated as follows:

$$\frac{1}{n}\log(1-\varepsilon) + H(X) - \varepsilon \le \frac{1}{n}\log M \le H(X) + \varepsilon,\tag{3.59}$$

For a given 0 < ε < 1, we have

$$H(X) - \varepsilon \le \lim\_{n \to \infty} \frac{1}{n} \log M \le H(X) + \varepsilon.$$

In other words, the code rate of a typical code is close to *H*(*X*). Let us now look at the decoding error probability $P\_e$ after this numbering, where

$$P\_e = P\{\mathbf{x} \in X^n : \mathbf{x} \notin W\_\varepsilon^{(n)}\}.$$

Because

$$P\_e + P\{W\_\varepsilon^{(n)}\} = 1,$$

according to the statistical characteristic (*ii*) of the typical code $W\_\varepsilon^{(n)}$,

$$P\_e = 1 - P\{W\_{\varepsilon}^{(n)}\} < \varepsilon. \tag{3.60}$$

From this, we derive the main result of this section, the so-called source coding theorem.

**Theorem 3.10** (*Shannon, 1948*) *Assuming that X is a memoryless source, then*

*(i) for any code rate R* > *H*(*X*) *and any given* ε > 0*, there exists a compression coding scheme whose decoding error probability satisfies* $P\_e < \varepsilon$ *when n is sufficiently large;*

*(ii) conversely, if the code rate R* < *H*(*X*)*, then the decoding error probability satisfies* $P\_e \rightarrow 1$ *as n* → ∞*.*

*Proof* The above analysis has given the proof of (*i*). In fact, if

$$\mathcal{R} = \frac{1}{n} \log M\_1 > H(X),$$

then, when ε is sufficiently small, by (3.59) the typical code in block code $X^n$ satisfies

$$R \ge \frac{1}{n} \log |W\_{\varepsilon}^{(n)}|, \ M\_1 \ge |W\_{\varepsilon}^{(n)}|.$$

Therefore, we construct a code *C* ⊂ *X<sup>n</sup>*, which satisfies

$$W\_\varepsilon^{(n)} \subset C, |C| = M\_1.$$

Thus, the code rate of *C* is exactly equal to *R*, and the decoding error probability $P\_e(C)$ after compression coding satisfies $P\_e(C) < \varepsilon$. For the probability of occurrence of *C*,

$$P\{C\} + P\_e(C) = 1.$$

But

$$P\{C\} \ge P\{W\_{\varepsilon}^{(n)}\} > 1 - \varepsilon,$$

so (*i*) holds. To prove (*ii*), we note that every $\mathbf{x} \in W\_\varepsilon^{(n)}$ satisfies

$$| - \frac{1}{n} \log p(x) - H(X) | < \varepsilon.$$

The above formula implies, for all $\mathbf{x} \in W\_\varepsilon^{(n)}$,

$$p(\mathbf{x}) < 2^{-n(H(X)-\epsilon)}.$$

Thus, the probability of occurrence of $W\_\varepsilon^{(n)}$ satisfies

$$P\{W\_{\varepsilon}^{(n)}\} = \sum\_{\mathbf{x} \in W\_{\varepsilon}^{(n)}} p(\mathbf{x}) \le |W\_{\varepsilon}^{(n)}| \cdot 2^{-n(H(X)-\varepsilon)}.\tag{3.61}$$

If we use *R* as the code rate, since

$$\mathcal{R} = \frac{1}{n} \log M < H(X) - \delta,$$

then we have

$$|W\_{\varepsilon}^{(n)}| \le M < 2^{n(H(X) - \delta)}.$$

By (3.61),

$$P\{W\_{\varepsilon}^{(n)}\} < 2^{n(H(X)-\delta)} \cdot 2^{-n(H(X)-\varepsilon)} = 2^{-n(\delta - \varepsilon)}.\tag{3.62}$$

When 0 < ε < δ, the right side of (3.62) tends to 0 as *n* → ∞, so we have

$$1 - P\_e = P\{W\_{\varepsilon}^{(n)}\} \longrightarrow 0.$$

Thus

$$\lim\_{n \to \infty} P\_e = 1.$$

Thus the theorem holds.

# **3.6 Optimal Code Theory**

Let *X* be a source state set, let $\mathbf{x} = \mathbf{x}\_1\mathbf{x}\_2\cdots \mathbf{x}\_n \in X^n$ be a message sequence, and let *x* be output after compression coding as a codeword $u = u\_1u\_2\cdots u\_k \in \mathbb{Z}\_D^k$ of length *k*, where *D* ≥ 1 is a positive integer and $\mathbb{Z}\_D$ is the residue class ring mod *D*; $u = u\_1u\_2\cdots u\_k \in \mathbb{Z}\_D^k$ is called a *D*-ary codeword of length *k*. The codeword *u* is decoded and translated back into the message *x*, that is, *u* → *x*. The purpose of source coding is to find a good coding scheme that makes the code rate as small as possible under the requirement of sufficiently small decoding error. Below, we give the strict mathematical definitions of equal length codes and variable length codes.

**Definition 3.14** Let *X* be a source state set, $\mathbb{Z}\_D$ the residue class ring mod *D*, and *n*, *k* positive integers. The mapping $f: X^n \rightarrow \mathbb{Z}\_D^k$ is called an equal length coding function, and $\psi: \mathbb{Z}\_D^k \rightarrow X^n$ is called the corresponding decoding function. For $\forall\ \mathbf{x} = \mathbf{x}\_1 \cdots \mathbf{x}\_n \in X^n$, $f(\mathbf{x}) = u = u\_1 \cdots u\_k \in \mathbb{Z}\_D^k$, and $u = u\_1 \cdots u\_k$ is called a codeword of length *k*. Write

$$C = \{ f(\mathbf{x}) \in \mathbb{Z}\_D^k | \mathbf{x} \in X^n \},\tag{3.63}$$

*C* is called the code generated by *f*, and $R = \frac{k}{n} \log D$ is the coding rate of *f*, also known as the code rate of *C*. *C* is called an equal length code; it is sometimes also called a block code with block length *k*.

By Definition 3.14, the error probability of an equal length code coding scheme ( *f*,ψ) is

$$P\_e = P\{\psi(f(\mathbf{x})) \neq \mathbf{x}, \ \mathbf{x} \in X^n\}.\tag{3.64}$$

Let us first consider error free coding, that is, $P\_e = 0$. Obviously, $P\_e = 0$ if and only if *f* is an injection and $\psi = f^{-1}$ is a left inverse mapping of *f*. A coding function $f: X^n \rightarrow \mathbb{Z}\_D^k$ can be chosen to be an injection if and only if $|\mathbb{Z}\_D^k| \ge |X^n|$, that is, $D^k \ge N^n$, where *N* = |*X*|. Taking logarithms on both sides,

$$R = \frac{k}{n} \log D \ge \log N = \log |X|. \tag{3.65}$$

Therefore, the code rate of error free compression coding *f* is at least log<sub>2</sub> |*X*| bits or ln |*X*| nats.

We now consider asymptotically error free coding, that is, for any given ε > 0, the decoding error probability is required to satisfy $P\_e \le \varepsilon$. By Theorem 3.10, only a code rate *R* ≥ *H*(*X*) is needed. In fact, take *X* as an information space and encode the length-*n* message sequence $\mathbf{x} = \mathbf{x}\_1\mathbf{x}\_2\cdots \mathbf{x}\_n \in X^n$ as follows: if $\mathbf{x} \in W\_\varepsilon^{(n)}$ is a typical sequence (typical codeword), *x* corresponds to one of the numbers $1, 2, \ldots, M$, where $M = |W\_\varepsilon^{(n)}|$; if $\mathbf{x} \notin W\_\varepsilon^{(n)}$, uniformly encode *x* as 1. If the *M* codewords in $W\_\varepsilon^{(n)}$ are represented by *D*-ary digits, let $D^k = M$ (the insufficient part can be padded), and the code rate *R* is

$$\mathcal{R} = \frac{1}{n} \log M = \frac{k}{n} \log D.$$

Since *M* is approximately $2^{nH(X)}$, *R* is approximately *H*(*X*), that is, $R = \frac{1}{n}\log M \sim H(X)$. By the asymptotic equipartition property, the error probability of such coding is

$$P\_e = P\{\mathbf{x} = \mathbf{x}\_1 \cdots \mathbf{x}\_n \notin W^{(n)}\_{\varepsilon}\} < \varepsilon, \text{ when } n \text{ is sufficiently large.}$$

However, in practical applications, *n* cannot increase indefinitely, which requires us to find the best coding scheme for a given finite *n*, so that the code rate is as close as possible to the theoretical value *H*(*X*). Moreover, in applications we find that equal length codes are not efficient, while variable length codes are more practical. For example:

*Example 3.6* Let *X* = {1, 2, 3, 4} be an information space, and the probability distribution of random variable ξ taking value on *X* is

$$
\xi \sim \begin{pmatrix} 1 & 2 & 3 & 4 \\ \frac{1}{2} & \frac{1}{4} & \frac{1}{8} & \frac{1}{8} \end{pmatrix} .
$$

The entropy *H*(*X*) of information space *X* is

$$H(X) = -\frac{1}{2}\log\_2\frac{1}{2} - \frac{1}{4}\log\_2\frac{1}{4} - \frac{1}{8}\log\_2\frac{1}{8} - \frac{1}{8}\log\_2\frac{1}{8} = 1.75 \text{bits.}$$

If an equal length code is used, the code length is 2, and the code is, for example,

$$1 \rightarrow 00, \quad 2 \rightarrow 01, \quad 3 \rightarrow 10, \quad 4 \rightarrow 11.$$
Then the code rate *R*(*k* = 2, *n* = 1) is

$$R = 2\log\_2 2 = 2 > 1.75 \text{ bits}.$$

Obviously, the efficiency of equal length codes is not high. If the above code is replaced with an unequal length code whose codeword lengths are 1, 2, 3, 3, for example

$$1 \rightarrow 0, \quad 2 \rightarrow 10, \quad 3 \rightarrow 110, \quad 4 \rightarrow 111.$$

We use *l*(*x*) to represent the code length after the source letter *x* is encoded, then the average code length *L* required for *X* encoding is

$$L = \sum\_{i=1}^{4} p(\mathbf{x}\_i)l(\mathbf{x}\_i) = \frac{1}{2} \times 1 + \frac{1}{4} \times 2 + \frac{1}{8} \times 3 + \frac{1}{8} \times 3 = 1.75 \text{ bits} = H(X).$$

It can be seen that using an unequal length code to encode *X* gives higher efficiency. This example also illustrates the following compression coding principle: characters with high probability of occurrence are assigned shorter codewords, and characters with low probability of occurrence are assigned longer codewords, so that the average coding length is as small as possible.
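The computation in Example 3.6 can be reproduced directly. The Python sketch below uses a prefix (real-time) code with the lengths 1, 2, 3, 3 from the example; the particular codewords are an assumption, since any real-time code with these lengths has the same average length:

```python
import math

p = {'1': 0.5, '2': 0.25, '3': 0.125, '4': 0.125}
code = {'1': '0', '2': '10', '3': '110', '4': '111'}  # assumed prefix code

# no codeword is a prefix of another (the real-time code property)
words = list(code.values())
assert all(not b.startswith(a) for a in words for b in words if a != b)

H = -sum(q * math.log2(q) for q in p.values())   # entropy H(X)
L = sum(p[s] * len(code[s]) for s in p)          # average code length
print(H, L)  # 1.75 1.75
```

Because each probability here is a negative power of 2, the average length meets the entropy exactly; in general it can only be matched up to the bound of Theorem 3.11 below.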

Next, we give the mathematical definition of variable length coding. For this purpose, let $X^*$ and $\mathbb{Z}\_D^*$ be the sets of all finite length sequences over *X* and $\mathbb{Z}\_D$, respectively; that is, $X^* = \bigcup\_{1 \le k < \infty} X^k$.


(v) $f: X^* \longrightarrow \mathbb{Z}\_D^*$ is called a real-time code mapping if *f* is a block code mapping and, for any $\mathbf{x}, \mathbf{y} \in X^*$, $f(\mathbf{x})$ and $f(\mathbf{y})$ cannot be prefixes of each other.

*Remark 3.1* Let $a = a\_1a\_2 \cdots a\_n \in \mathbb{Z}\_D^n$ and $b = b\_1b\_2 \cdots b\_m \in \mathbb{Z}\_D^m$; codeword *a* is called a prefix of *b* if *m* ≥ *n* and $a\_i = b\_i$ for every 1 ≤ *i* ≤ *n*.

**Lemma 3.14** *A block code mapping* $f: X^* \longrightarrow \mathbb{Z}\_D^*$ *is a uniquely decodable mapping if and only if, for every n* ≥ 1*, the restriction of f to* $X^n$ *is an injection.*

*Proof* The necessity is obvious; we prove the sufficiency, that is, for $\forall\ \mathbf{x} = \mathbf{x}\_1\mathbf{x}\_2 \cdots \mathbf{x}\_n \in X^n$ and $\mathbf{y} = \mathbf{y}\_1\mathbf{y}\_2 \cdots \mathbf{y}\_m \in X^m$ with $\mathbf{x} \ne \mathbf{y}$, we have $f(\mathbf{x}) \ne f(\mathbf{y})$. Suppose on the contrary that $f(\mathbf{x}) = f(\mathbf{y})$. Because *f* is a block code mapping, there is a mapping $g: X \longrightarrow \mathbb{Z}\_D^*$ such that

$$f(\mathbf{x}) = \mathbf{g}(\mathbf{x}\_1)\mathbf{g}(\mathbf{x}\_2)\cdots\mathbf{g}(\mathbf{x}\_n) = \mathbf{g}(\mathbf{y}\_1)\mathbf{g}(\mathbf{y}\_2)\cdots\mathbf{g}(\mathbf{y}\_m) = f(\mathbf{y}).$$

Then

$$\begin{aligned} f(\mathbf{x}\mathbf{y}) &= \mathbf{g}(\mathbf{x}\_1)\mathbf{g}(\mathbf{x}\_2)\cdots\mathbf{g}(\mathbf{x}\_n)\mathbf{g}(\mathbf{y}\_1)\mathbf{g}(\mathbf{y}\_2)\cdots\mathbf{g}(\mathbf{y}\_m) \\ &= \mathbf{g}(\mathbf{y}\_1)\mathbf{g}(\mathbf{y}\_2)\cdots\mathbf{g}(\mathbf{y}\_m)\mathbf{g}(\mathbf{x}\_1)\mathbf{g}(\mathbf{x}\_2)\cdots\mathbf{g}(\mathbf{x}\_n) \\ &= f(\mathbf{y}\mathbf{x}). \end{aligned}$$

But $\mathbf{x}\mathbf{y} \ne \mathbf{y}\mathbf{x}$, and this contradicts the fact that *f* restricted to $X^{n+m}$ is an injection.

**Lemma 3.15** *A real-time code is uniquely decodable, but the converse is not true.*

*Proof* Suppose $f: X^* \longrightarrow \mathbb{Z}\_D^*$ is a real-time code mapping, and let $\mathbf{x}, \mathbf{y} \in X^*$ with $\mathbf{x} \ne \mathbf{y}$, $f(\mathbf{x}) = a\_1a\_2 \cdots a\_n \in \mathbb{Z}\_D^n$, $f(\mathbf{y}) = b\_1b\_2 \cdots b\_m \in \mathbb{Z}\_D^m$ ($m \ge n$). Because $f(\mathbf{x})$ is not a prefix of $f(\mathbf{y})$, there exists *i* (1 ≤ *i* ≤ *n*) with $a\_i \ne b\_i$, thus $f(\mathbf{x}) \ne f(\mathbf{y})$; that is, *f* is an injection, so the code is uniquely decodable. For the converse, we give a counterexample,



where *X* = {1, 2, 3, 4} is the information space and $f: X \rightarrow \mathbb{Z}\_2^*$ is a variable length code in which *f*(1) is a prefix of *f*(2); that is, *f* is not a real-time code mapping, but obviously *f* is a uniquely decodable mapping. The Lemma holds.

What conditions must the code lengths of a real-time code satisfy? The following Kraft inequality gives a satisfactory answer.

**Lemma 3.16** *For a uniquely decodable code C with values in* $\mathbb{Z}\_D^*$*,* |*C*| = *m and code lengths* $l\_1, l\_2, \ldots, l\_m$*, the following McMillan–Kraft inequality holds:*

$$\sum\_{i=1}^{m} D^{-l\_i} \le 1.\tag{3.66}$$

*Conversely, if the* $l\_i$ *satisfy the above condition, then there exists a real-time code C whose set of code lengths is* $\{l\_1, l\_2, \ldots, l\_m\}$*.*

*Proof* Consider

$$\left(\sum\_{i=1}^{m} D^{-l\_i}\right)^n = (D^{-l\_1} + D^{-l\_2} + \dots + D^{-l\_m})^n \ .$$

Each term has the form $D^{-l\_{i\_1}-l\_{i\_2}-\cdots-l\_{i\_n}} = D^{-k}$, where $l\_{i\_1} + l\_{i\_2} + \cdots + l\_{i\_n} = k$. Let $l = \max\{l\_1, l\_2, \ldots, l\_m\}$; then *k* ranges from *n* to *nl*. Let $N\_k$ be the number of terms equal to $D^{-k}$; then

$$\left(\sum\_{i=1}^{m} D^{-l\_i}\right)^n = \sum\_{k=n}^{nl} N\_k D^{-k}.$$

Note that *Nk* can be regarded as the number of codeword sequences with a total length of *k* just assembled by *n* codewords in *C*, i.e.,

$$N\_k = |\{(c\_1, c\_2, \dots, c\_n) \mid |c\_1 c\_2 \cdots c\_n| = k, c\_i \in \mathcal{C}\}|.$$

Each concatenated codeword sequence still lies in $\mathbb{Z}\_D^*$, and because $f: X^* \longrightarrow \mathbb{Z}\_D^*$ is an injection, we have $N\_k \le D^k$. Then we have

$$\left(\sum\_{i=1}^{m} D^{-l\_i}\right)^n = \sum\_{k=n}^{nl} N\_k D^{-k} \le \sum\_{k=n}^{nl} D^k D^{-k} = nl - n + 1 \le nl.$$

If $x > 1$, then $x^n > nl$ when *n* is sufficiently large. But the above inequality holds for every *n*; therefore $\sum\_{i=1}^m D^{-l\_i} \le 1$.

Conversely, suppose the Kraft inequality holds, that is, the given lengths $l\_i$ (1 ≤ *i* ≤ *m*) satisfy formula (3.66); we now construct a real-time code with these lengths (the $l\_i$ need not be all distinct). Let $n\_j$ be the number of codewords of length *j*; if $l = \max\{l\_1, l\_2, \ldots, l\_m\}$, then

$$\sum\_{j=1}^{l} n\_j = m.$$

(3.66) is equivalent to

$$\sum\_{j=1}^{l} n\_j D^{-j} \le 1.$$

Multiplying both sides by $D^l$ gives $\sum\_{j=1}^{l} n\_j D^{l-j} \le D^l$. Hence

$$n\_l \le D^l - n\_1 D^{l-1} - n\_2 D^{l-2} - \dots - n\_{l-1} D,$$

$$n\_{l-1} \le D^{l-1} - n\_1 D^{l-2} - n\_2 D^{l-3} - \dots - n\_{l-2} D,$$

$$\dots$$

$$n\_3 \le D^3 - n\_1 D^2 - n\_2 D,$$

$$n\_2 \le D^2 - n\_1 D,$$

$$n\_1 \le D.$$

Because $n\_1 \le D$, we can choose these $n\_1$ codewords of length 1 arbitrarily, and the remaining $D - n\_1$ symbols of length 1 can be used as prefixes of other codewords. Therefore, there are $(D - n\_1)D$ options for codewords of length 2, which is consistent with $n\_2 \le D^2 - n\_1 D$. Similarly, $(D - n\_1)D - n\_2$ codewords of length 2 can serve as prefixes of subsequent codewords, so there are at most $((D - n\_1)D - n\_2)D$ options for codewords of length 3, which is consistent with $n\_3 \le D^3 - n\_1 D^2 - n\_2 D$. Continuing in this way, we can always construct a real-time code with lengths $\{l\_1, l\_2, \ldots, l\_m\}$. The Lemma holds.
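The constructive half of Lemma 3.16 can be sketched in code. Given lengths satisfying the Kraft inequality, the canonical construction below (a minimal sketch of the idea, not the book's exact procedure) assigns codewords in order of increasing length so that no codeword extends an earlier one:

```python
def prefix_code_from_lengths(lengths, D=2):
    """Build a D-ary real-time (prefix) code with the given codeword lengths,
    assumed to satisfy the Kraft inequality (3.66).  Codewords are assigned
    in order of increasing length; each is the smallest value that does not
    extend a previously assigned codeword (canonical construction)."""
    assert sum(D ** -l for l in lengths) <= 1 + 1e-12, "Kraft inequality violated"
    code = []
    value, prev = 0, 0
    for l in sorted(lengths):
        value *= D ** (l - prev)        # pad the next free value to length l
        digits, v = [], value
        for _ in range(l):
            digits.append(str(v % D))
            v //= D
        code.append(''.join(reversed(digits)))
        value += 1
        prev = l
    return code

codes = prefix_code_from_lengths([1, 2, 3, 3])
print(codes)  # ['0', '10', '110', '111']
assert all(not b.startswith(a) for a in codes for b in codes if a != b)
```

For the lengths 1, 2, 3, 3 of Example 3.6 this recovers exactly the prefix code used there; the Kraft inequality guarantees the running value never overflows its length.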

Let us give an example of a code that is not uniquely decodable.

*Example 3.7* Let *X* = {1, 2, 3, 4} and $\mathbb{Z}\_D = \mathbb{F}\_2$; the coding scheme is

$$f(1) = 0, \quad f(2) = 1, \quad f(3) = 00, \quad f(4) = 11.$$
Because the encoder outputs and the decoder receives a continuous stream of codeword symbols, if the string received by the decoder is 001101, there may be two decoding results, 112212 and 3412. This shows that the extension $f^*$ is not an injection, that is, the code produced by *f* is not uniquely decodable.
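The ambiguity in Example 3.7 can be found mechanically by enumerating every parse of the received string. A brute-force Python sketch (the codeword assignment is the one forced by the two decodings quoted in the example):

```python
def all_decodings(code, received):
    """Enumerate every message that encodes to `received` (depth-first search
    over codewords matching a prefix of the remaining string)."""
    results = []

    def walk(rest, msg):
        if not rest:
            results.append(''.join(msg))
            return
        for sym, w in code.items():
            if rest.startswith(w):
                walk(rest[len(w):], msg + [sym])

    walk(received, [])
    return results

# codewords forced by the decodings 112212 and 3412 of 001101
f = {'1': '0', '2': '1', '3': '00', '4': '11'}
parses = all_decodings(f, '001101')
print(sorted(parses))
assert '112212' in parses and '3412' in parses
```

Besides the two decodings quoted in the example, the enumeration turns up further parses of the same string, which only reinforces that the code is not uniquely decodable.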

By Lemma 3.16, real-time codes, or more generally uniquely decodable codes, must satisfy the Kraft inequality. However, a variable length code compiled merely according to the Kraft inequality is not necessarily an optimal code: from the perspective of random coding, an optimal code requires not only accurate decoding but also efficiency, that is, the shortest possible average code length. We state the strict mathematical definition of the optimal code as follows.

**Definition 3.16** Let $X = \{\mathbf{x}\_1, \mathbf{x}\_2, \ldots, \mathbf{x}\_m\}$ be an information space. A real-time code $C = \{f(\mathbf{x}\_1), f(\mathbf{x}\_2), \ldots, f(\mathbf{x}\_m)\}$ is called an optimal code if its average code length

$$L = \sum\_{i=1}^{m} p\_i l\_i \tag{3.67}$$

is the smallest, where *pi* = *p*(*xi*) is the occurrence probability of *xi* and *li* is the code length of *xi* .

For a source state set *X*, once its statistical characteristics are determined, that is, once *X* becomes an information space, the probability distribution {*p*(*x*) | *x* ∈ *X*} is given. Therefore, to find the optimal compression coding scheme for an information space *X* is to find the optimal solution $\{l\_1, l\_2, \ldots, l\_m\}$ of (3.67) under the constraint of the Kraft inequality. We use the Lagrange multiplier method to find the optimal solution. Let

$$J = \sum\_{i=1}^{m} p\_i l\_i + \lambda \left(\sum\_{i=1}^{m} D^{-l\_i}\right),$$

Taking the partial derivative with respect to $l\_i$ and setting it to zero,

$$\frac{\partial J}{\partial l\_i} = p\_i - \lambda D^{-l\_i} \log D.$$

Thus

$$D^{-l\_i} = \frac{p\_i}{\lambda \log D}.$$

By Kraft inequality, that is

$$\sum\_{i=1}^{m} D^{-l\_i} \le 1.$$

We get

$$1 \ge \sum\_{i=1}^m D^{-l\_i} = \frac{1}{\lambda \log D} \sum\_{i=1}^m p\_i \Rightarrow \lambda \ge \frac{1}{\log D}.$$

Thus, the optimal code length *li* is

$$l\_i \ge -\log\_D p\_i, \quad p\_i \ge D^{-l\_i}.\tag{3.68}$$

The corresponding optimal average code length *L* is

$$L = \sum\_{i=1}^{m} p\_i l\_i \ge -\sum\_{i=1}^{m} p\_i \log\_D p\_i = H\_D(X). \tag{3.69}$$

That is, the lower bound of *L* is the *D*-ary information entropy $H\_D(X)$ of *X*. From this, we obtain the main result of this section.

**Theorem 3.11** *The average length L of any D-ary real-time code in an information space X satisfies*

$$L \ge H\_D(X).$$

*The equal sign holds if and only if pi* = *D*−*li .*

Next, we give another proof of Theorem 3.11. To this end, consider two random variables ξ and η on a source state set *X*, whose probability distributions are

$$p(\mathbf{x}) = P\{\xi = \mathbf{x}\}, \ q(\mathbf{x}) = P\{\eta = \mathbf{x}\}, \ \forall \ \mathbf{x} \in X.$$

The relative entropy of random variables is defined as

$$D(p||q) = \sum\_{\mathbf{x} \in X} p(\mathbf{x}) \log \frac{p(\mathbf{x})}{q(\mathbf{x})}.\tag{3.70}$$

**Lemma 3.17** *The relative entropy D*(*p*||*q*) *of two random variables on X satisfies*

$$D(p||q) \ge 0, \text{ and } D(p||q) = 0 \iff p(\mathbf{x}) = q(\mathbf{x}), \forall \mathbf{x} \in X.$$

*Proof* For a real number *x* > 0, expanding *e*<sup>*x*−1</sup> as a power series gives

$$e^{x-1} = 1 + (x - 1) + \frac{1}{2}(x - 1)^2 + \cdots$$

Thus *e*<sup>*x*−1</sup> ≥ *x*, so log *x* ≤ *x* − 1. By (3.70), then

$$\begin{aligned} -D(p||q) &= \sum\_{\mathbf{x} \in X} p(\mathbf{x}) \log \frac{q(\mathbf{x})}{p(\mathbf{x})} \\ &\leq \sum\_{\mathbf{x} \in X} p(\mathbf{x}) \left(\frac{q(\mathbf{x})}{p(\mathbf{x})} - 1\right) = 0. \end{aligned}$$

Thus *D*(*p*||*q*) ≥ 0. The condition for *D*(*p*||*q*) = 0 is immediate, since equality holds above only when *q*(*x*)/*p*(*x*) = 1 for all *x* ∈ *X*.
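Lemma 3.17 is easy to confirm numerically; the following is a minimal sketch of (3.70) in Python (base-2 logarithm; the two distributions are arbitrary illustrative examples).

```python
import math

def relative_entropy(p, q):
    """D(p||q) = sum_x p(x) log2(p(x)/q(x)), per (3.70).

    Assumes q(x) > 0 whenever p(x) > 0."""
    return sum(px * math.log2(px / qx) for px, qx in zip(p, q) if px > 0)

p = [0.5, 0.3, 0.2]
q = [0.25, 0.25, 0.5]

assert relative_entropy(p, q) > 0       # D(p||q) >= 0, Lemma 3.17
assert relative_entropy(p, p) == 0      # equality if and only if p = q
print(relative_entropy(p, q))
```

Note that *D*(*p*||*q*) is not symmetric in *p* and *q*; the lemma asserts only nonnegativity.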

*Proof* (Another proof of Theorem 3.11) Consider *L* − *HD*(*X*),

$$\begin{split} L - H\_D(X) &= \sum\_{i=1}^m p\_i l\_i - \sum\_{i=1}^m p\_i \log\_D \frac{1}{p\_i} \\ &= -\sum\_{i=1}^m p\_i \log\_D D^{-l\_i} + \sum\_{i=1}^m p\_i \log\_D p\_i. \end{split} \tag{3.71}$$

Define

$$r\_i = \frac{D^{-l\_i}}{c}, \ c = \sum\_{j=1}^{m} D^{-l\_j}.$$

By Kraft inequality, we have

$$c \le 1, \text{ and } \sum\_{i=1}^{m} r\_i = 1.$$

Therefore, {*ri*, 1 ≤ *i* ≤ *m*} is a probability distribution on *X*, by (3.71),

$$L - H\_D(X) = -\sum\_{i=1}^m p\_i \log\_D cr\_i + \sum\_{i=1}^m p\_i \log\_D p\_i = \sum\_{i=1}^m p\_i \left(\log\_D \frac{p\_i}{r\_i} + \log\_D \frac{1}{c}\right).$$

By Lemma 3.17 and *c* ≤ 1, we have

$$L - H\_D(X) \ge 0, \ \text{and } L = H\_D(X) \text{ if and only if } c = 1 \text{ and } r\_i = p\_i,$$

that is

$$p\_i = D^{-l\_i}, \text{ or } l\_i = \log\_D \frac{1}{p\_i}.$$

We complete the proof of Theorem 3.11.

By Theorem 3.11, if we code according to the probabilities, the code length of a *D*-ary optimal code is

$$l\_i = \log\_D \frac{1}{p\_i}, \ 1 \le i \le m.$$

But in general, log*D*(1/*pi*) is not an integer. We use ⌈*a*⌉ to denote the smallest integer not less than the real number *a*. Take

$$l\_i = \left\lceil \log\_D \frac{1}{p\_i} \right\rceil, \ 1 \le i \le m. \tag{3.72}$$

Then

$$\sum\_{i=1}^{m} D^{-l\_i} \le \sum\_{i=1}^{m} D^{-\log\_D \frac{1}{p\_i}} = \sum\_{i=1}^{m} p\_i = 1.$$

Thus the code lengths {*l*1,*l*2,...,*lm*} defined by formula (3.72) satisfy the Kraft inequality. From Lemma 3.16, we can define the corresponding real-time code.

**Definition 3.17** Let *X* = {*x*1, *x*2,..., *xm*} be an information space, *pi* = *p*(*xi*),

$$l(f(\mathbf{x}\_i)) = l\_i = \left\lceil \log\_D \frac{1}{p\_i} \right\rceil, \ 1 \le i \le m.$$

Then the real-time code corresponding to {*l*1,*l*2,...,*lm*} is called a Shannon code.

**Corollary 3.6** *The code length l*( *f* (*xi*)) *of a Shannon code C* = { *f* (*xi*)|1 ≤ *i* ≤ *m*} *satisfies*


$$l\_i = \left\lceil \log\_D \frac{1}{p(\mathbf{x}\_i)} \right\rceil, \log\_D \frac{1}{p(\mathbf{x}\_i)} \le l\_i < \log\_D \frac{1}{p(\mathbf{x}\_i)} + 1 \tag{3.73}$$

*and*

$$H\_D(X) \le L < H\_D(X) + 1.$$

*where L is the average code length of C.*

*Proof* By the definition of ⌈*a*⌉, *a* ≤ ⌈*a*⌉ < *a* + 1, thus

$$\log\_D \frac{1}{p(\mathbf{x}\_i)} \le l\_i < \log\_D \frac{1}{p(\mathbf{x}\_i)} + 1.$$

Multiplying both sides by *p*(*xi*) and summing over 1 ≤ *i* ≤ *m* gives

$$\sum\_{i=1}^{m} p(\mathbf{x}\_i) \log\_D \frac{1}{p(\mathbf{x}\_i)} \le \sum\_{i=1}^{m} p(\mathbf{x}\_i) l\_i < \sum\_{i=1}^{m} p(\mathbf{x}\_i) \left( \log\_D \frac{1}{p(\mathbf{x}\_i)} + 1 \right).$$

That is

$$H\_D(X) \le L < H\_D(X) + 1.$$

The Corollary holds.
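The Shannon code lengths (3.72) and the bounds of Corollary 3.6 can be checked in a few lines; the distribution below is an illustrative non-dyadic example (binary case *D* = 2).

```python
import math

p = [0.4, 0.3, 0.2, 0.1]                          # a non-dyadic distribution, D = 2
l = [math.ceil(math.log2(1 / pi)) for pi in p]    # Shannon code lengths (3.72)

kraft = sum(2 ** -li for li in l)
L = sum(pi * li for pi, li in zip(p, l))
H = -sum(pi * math.log2(pi) for pi in p)

assert kraft <= 1       # Kraft inequality holds, so a real-time code exists (Lemma 3.16)
assert H <= L < H + 1   # Corollary 3.6
print(l, round(L, 3), round(H, 3))
```

Here the average length 2.4 sits between *H*(*X*) ≈ 1.846 and *H*(*X*) + 1, exactly as the corollary predicts.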

# **3.7 Several Examples of Compression Coding**

# *3.7.1 Morse Codes*

In variable length codes, in order to make the average code length as close to the source entropy as possible, the code lengths should match the occurrence probabilities of the corresponding coded characters as closely as possible. The principle of probabilistic coding is that characters with high occurrence probability are assigned short codewords, and characters with low occurrence probability are assigned long codewords, so as to make the average code length as close to the source entropy as possible. This idea existed long before Shannon theory. For example, Morse code, invented in 1838, uses the three symbols dot, dash and space to encode the 26 letters of English. Expressed in binary, a dot is 10 (2 bits), a dash is 1110 (4 bits), and the space is 000 (3 bits). For example, the most frequently used English letter E is represented by a single dot, while the infrequently used letter Q is represented by two dashes, one dot and one dash, which makes the average codeword length of English text shorter. However, Morse code does not completely match the occurrence probabilities, so it is not an optimal code, and it is basically no longer used. The following table is the coding table of Morse code (Fig. 3.1)
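Under the binary representation just described (dot = 10, dash = 1110, letter space = 000), the codeword lengths of E and Q can be computed directly; the dot-dash patterns used below are the standard international Morse assignments (E = ·, Q = − − · −).

```python
# Binary Morse representation from the text: dot = "10", dash = "1110",
# and "000" terminates a letter.
ELEMENT = {".": "10", "-": "1110"}

def morse_bits(pattern):
    """Binary encoding of one letter's dot/dash pattern, plus the letter space."""
    return "".join(ELEMENT[e] for e in pattern) + "000"

e_bits = morse_bits(".")       # E, the most frequent English letter: 1 dot
q_bits = morse_bits("--.-")    # Q, an infrequent letter: dash dash dot dash

assert len(e_bits) == 2 + 3                # one dot + letter space = 5 bits
assert len(q_bits) == 4 + 4 + 2 + 4 + 3    # 17 bits
print(e_bits, q_bits)
```

The frequent letter thus costs 5 bits while the rare one costs 17, illustrating the probability-matching principle.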


**Fig. 3.1** The coding table of Morse code

It is worth noting that Morse code appeared in its early days as a kind of cipher, widely used in the transmission and storage of sensitive information (such as military intelligence). Early cipher machines were also built on the principle of Morse code, which quickly mechanized the encryption and decryption of messages. In this sense, Morse code played an important role in promoting the development of cryptography.

# *3.7.2 Huffman Codes*

Shannon, Fano and Huffman have all studied the coding methods of variable length codes, among which Huffman codes have the highest coding efficiency. We focus on the coding methods of Huffman binary and ternary codes.

Let *X* = {*x*1, *x*2,..., *xm*} be a source letter set of *m* symbols. Arrange the *m* symbols in order of occurrence probability, take the two letters with the lowest probability and assign them the digits "0" and "1," respectively, then add their probabilities to form a new letter and rearrange it, in order of probability, together with the source letters that have not yet been assigned binary digits. Again take the two letters with the lowest probability and assign them the digits "0" and "1," respectively, add their probabilities to form the probability of a new letter, and re-sort; continue this process until the remaining probabilities add up to 1. At this point every source letter corresponds to a string of "0"s and "1"s, and we obtain a variable length code, which is called a Huffman code. Take the information space *X* = {1, 2, 3, 4, 5} as an example, with corresponding probability distribution

$$
\xi \sim \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 0.25 & 0.25 & 0.2 & 0.15 & 0.15 \end{pmatrix}.
$$

Binary information entropy *H*2(*X*) and ternary information entropy *H*3(*X*) are

$$\begin{aligned} H\_2(X) &= -0.25 \log\_2 0.25 - 0.25 \log\_2 0.25 - 0.2 \log\_2 0.2 \\ &- 0.15 \log\_2 0.15 - 0.15 \log\_2 0.15 \\ &= 2.28 \text{ bits}, \\ H\_3(X) &= -0.25 \log\_3 0.25 - 0.25 \log\_3 0.25 - 0.2 \log\_3 0.2 \\ &- 0.15 \log\_3 0.15 - 0.15 \log\_3 0.15 \\ &= 1.44 \text{ ternary units}, \end{aligned}$$

respectively. The binary Huffman coding diagram of *X* is (Fig. 3.2).

The ternary Huffman coding diagram of *X* is (Fig. 3.3).
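The merging procedure described above can be sketched with a priority queue; the following is an illustrative binary Huffman implementation (not the book's own code), applied to the example distribution.

```python
import heapq
from itertools import count

def huffman(probs):
    """Binary Huffman code: repeatedly merge the two least probable nodes."""
    tick = count()   # unique tie-breaker so equal probabilities never compare dicts
    heap = [(p, next(tick), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)          # least probable subtree: bit "0"
        p1, _, c1 = heapq.heappop(heap)          # next least probable: bit "1"
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(tick), merged))
    return heap[0][2]

probs = {1: 0.25, 2: 0.25, 3: 0.2, 4: 0.15, 5: 0.15}   # the example in the text
code = huffman(probs)
L = sum(probs[s] * len(w) for s, w in code.items())
print(code, L)   # average length 2.3, slightly above H_2(X) = 2.28 bits
```

Any tie-breaking in the merges yields the same optimal average length *L* = 2.3 bits for this distribution, slightly above the entropy bound *H*2(*X*) = 2.28 bits.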

In summary, Huffman codes have the following characteristic: if the occurrence probability of the *i*-th source letter is *pi* and the corresponding code length is *li* , then shorter codewords correspond to larger probabilities, that is, *pi* > *pj* implies *li* ≤ *lj* .


Huffman codes have been applied in practice, mainly in the compression standard for fax images. However, in actual data compression, the statistical characteristics of some sources change over time. In order to make the statistics underlying the coding adapt to the actual statistical characteristics of the source, adaptive coding techniques have been developed: at each step, the coding of a new message is based on the statistical characteristics of the previous messages. For example, R. G. Gallager first proposed the step-by-step updating technique for Huffman codes in 1978, and D. E. Knuth turned this technique into a practical algorithm in 1985. Adaptive Huffman coding requires complex data structures and continuous updating of the codeword set according to the statistics of the source; we will not go into details here.

# *3.7.3 Shannon–Fano Codes*

Shannon–Fano code is an arithmetic code. Let *X* be an information space. From Corollary 3.6 of the previous section, the code length of a Shannon code on *X* is

$$l(\mathbf{x}) = \left\lceil \log \frac{1}{p(\mathbf{x})} \right\rceil, \forall \, \mathbf{x} \in X.$$

Here, we introduce a constructive coding method that uses the cumulative distribution function to allocate codewords, commonly known as the Shannon–Fano coding method. Without loss of generality, assume that *p*(*x*) > 0 for each letter *x* ∈ *X*, and define the cumulative distribution function *F*(*x*) and the modified distribution function *F*¯(*x*) as

$$F(\mathbf{x}) = \sum\_{a \le \mathbf{x}} p(a), \ \bar{F}(\mathbf{x}) = \sum\_{a < \mathbf{x}} p(a) + \frac{1}{2} p(\mathbf{x}), \tag{3.74}$$

where *X* = {1, 2,..., *m*} is a given information space. Without loss of generality, let *p*(1) ≤ *p*(2) ≤···≤ *p*(*m*).

As can be seen from the definition, if *x* ∈ *X*, then *p*(*x*) = *F*(*x*) − *F*(*x* − 1); in particular, if *x*, *y* ∈ *X* and *x* ≠ *y*, then we have

$$
\bar{F}(\mathbf{x}) \neq \bar{F}(\mathbf{y}).
$$

So when we know *F*¯(*x*), we can recover the corresponding *x*. The basic idea of the Shannon–Fano arithmetic code is to use *F*¯(*x*) to encode *x*. Since *F*¯(*x*) is a real number, we truncate its binary expansion to the first *l*(*x*) bits, denoted {*F*¯(*x*)}*l*(*x*); then

$$
\bar{F}(\mathbf{x}) - \{\bar{F}(\mathbf{x})\}\_{l(\mathbf{x})} < 2^{-l(\mathbf{x})}.\tag{3.75}
$$

Take *l*(*x*) = ⌈log (1/*p*(*x*))⌉ + 1, then we have

$$\frac{1}{2^{I(\mathbf{x})}} = \frac{1}{2 \cdot 2^{\left[ \log \frac{1}{p(\mathbf{x})} \right]}} < \frac{p(\mathbf{x})}{2} = \bar{F}(\mathbf{x}) - F(\mathbf{x} - \mathbf{1}),\tag{3.76}$$

Now let the binary decimal of *F*¯(*x*) be expressed as

$$\bar{F}(\mathbf{x}) = 0.a\_1 a\_2 \cdots a\_{l(\mathbf{x})} a\_{l(\mathbf{x})+1} \cdots, \ \forall \ a\_i \in \mathbb{F}\_2.$$

Then Shannon–Fano code is

$$f(\mathbf{x}) = a\_1 a\_2 \cdots a\_{l(\mathbf{x})}, \text{ that is } \mathbf{x} \xrightarrow{\mathbf{encode}} a\_1 a\_2 \cdots a\_{l(\mathbf{x})} \in \mathbb{F}\_2^{l(\mathbf{x})}.\tag{3.77}$$

**Lemma 3.18** *The binary Shannon–Fano code is a real-time code, and its average length L differs from the theoretical optimal value H*(*X*) *by at most two bits.*

*Proof* By (3.76),

$$2^{-l(\mathbf{x})} < \frac{1}{2}p(\mathbf{x}) = \bar{F}(\mathbf{x}) - F(\mathbf{x} - 1).$$

Let the binary decimal of *F*¯(*x*) be expressed as

$$\bar{F}(\mathbf{x}) = 0.a\_1 a\_2 \cdots a\_{l(\mathbf{x})} \cdots, \ \forall \ a\_i \in \mathbb{F}\_2.$$

We use [*A*, *B*] to represent a closed interval on the real axis, so

$$\bar{F}(\mathbf{x}) \in [0.a\_1a\_2\cdots a\_{l(\mathbf{x})}, \ 0.a\_1a\_2\cdots a\_{l(\mathbf{x})} + \frac{1}{2^{l(\mathbf{x})}}].$$

If *y* ∈ *X*, *x* ≠ *y*, and *f* (*x*) is a prefix of *f* (*y*), then we have

$$\bar{F}(\mathbf{y}) \in [0.a\_1a\_2\cdots a\_{l(x)}, 0.a\_1a\_2\cdots a\_{l(x)} + \frac{1}{2^{l(x)}}].$$

But

$$
\bar{F}(\mathbf{y}) - \bar{F}(\mathbf{x}) \ge \frac{1}{2}p(\mathbf{y}) \ge \frac{1}{2}p(\mathbf{x}) > \frac{1}{2^{l(\mathbf{x})}},
$$

which contradicts the fact that *F*¯(*x*) and *F*¯(*y*) lie in the same interval. Therefore, no codeword of *f* is a prefix of another; that is, the Shannon–Fano code is a real-time code. Consider its average code length *L*:

$$L = \sum\_{\mathbf{x} \in X} p(\mathbf{x})l(\mathbf{x}) = \sum\_{\mathbf{x} \in X} p(\mathbf{x}) \left( \left\lceil \log \frac{1}{p(\mathbf{x})} \right\rceil + 1 \right) < \sum\_{\mathbf{x} \in X} p(\mathbf{x}) \left( \log \frac{1}{p(\mathbf{x})} + 2 \right) = H(X) + 2.$$

We complete the proof of the Lemma.
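The construction (3.74)–(3.77) can be implemented directly; the following is a minimal sketch assuming *X* = {1, . . . , *m*}, using the probabilities of the earlier Huffman example. The bit-extraction loop is simply one way of reading off the binary expansion of *F*¯(*x*).

```python
import math

def shannon_fano_elias(p):
    """Codeword of x: the first l(x) bits of F-bar(x), with
    l(x) = ceil(log2 1/p(x)) + 1, following (3.74)-(3.77)."""
    code, F = {}, 0.0
    for x, px in enumerate(p, start=1):
        Fbar = F + px / 2                     # modified distribution function (3.74)
        l = math.ceil(math.log2(1 / px)) + 1
        bits, frac = "", Fbar
        for _ in range(l):                    # first l bits of the binary expansion
            frac *= 2
            bit, frac = divmod(frac, 1)
            bits += str(int(bit))
        code[x] = bits
        F += px
    return code

p = [0.25, 0.25, 0.2, 0.15, 0.15]
code = shannon_fano_elias(p)

# Real-time (prefix) property, as in Lemma 3.18
words = list(code.values())
assert all(not w2.startswith(w1) for w1 in words for w2 in words if w1 != w2)
L = sum(px * len(code[x]) for x, px in enumerate(p, start=1))
H = -sum(px * math.log2(px) for px in p)
assert L < H + 2                              # average length within 2 bits of H(X)
print(code, round(L, 2), round(H, 2))
```

Because the intervals around the truncated values of *F*¯(*x*) are disjoint, a decoder can recover *x* from the codeword alone.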

Let *n* ≥ 1 and let *X<sup>n</sup>* be the power space of the information space; *x* = *x*<sup>1</sup> ··· *xn* ∈ *X<sup>n</sup>* is called a message sequence of length *n*. In order to improve coding efficiency, it is often necessary to compress the power space *X<sup>n</sup>* directly, which is called arithmetic coding. The Shannon–Fano code can also be used for arithmetic coding: the basic method is to find a fast algorithm for computing the joint probability distribution *p*(*x*1*x*<sup>2</sup> ··· *xn*) and the cumulative distribution function *F*(*x*), and then use the Shannon–Fano method to encode *x* = *x*<sup>1</sup> ··· *xn*. We will not go into the specific details here.

# **3.8 Channel Coding Theorem**

Let *X* be the input alphabet and *Y* the output alphabet, and let ξ and η be two random variables with values on *X* and *Y* . The probability functions *p*(*x*) and *p*(*y*) of *X* and *Y* and the conditional probability function *p*(*y*|*x*) are

$$p(\mathbf{x}) = P\{\xi = \mathbf{x}\}, \ p(\mathbf{y}) = P\{\eta = \mathbf{y}\}, \ p(\mathbf{y}|\mathbf{x}) = P\{\eta = \mathbf{y}|\xi = \mathbf{x}\},$$

respectively.

From the full probability formula,

$$\begin{cases} p(\mathbf{y}|\mathbf{x}) \ge 0, & \forall \, \mathbf{x} \in X, \, \mathbf{y} \in Y. \\ \sum\_{\mathbf{y} \in Y} p(\mathbf{y}|\mathbf{x}) = 1, & \forall \, \mathbf{x} \in X. \end{cases} \tag{3.78}$$

If *X* and *Y* are finite sets, the conditional probability matrix *T* = (*p*(*y*|*x*)) of size |*X*| × |*Y*| is called the transition probability matrix from *X* to *Y* , i.e.,

$$T = \begin{pmatrix} p(\mathbf{y}\_1|\mathbf{x}\_1) & p(\mathbf{y}\_2|\mathbf{x}\_1) & \dots & p(\mathbf{y}\_N|\mathbf{x}\_1) \\ p(\mathbf{y}\_1|\mathbf{x}\_2) & p(\mathbf{y}\_2|\mathbf{x}\_2) & \dots & p(\mathbf{y}\_N|\mathbf{x}\_2) \\ \vdots \\ p(\mathbf{y}\_1|\mathbf{x}\_M) & p(\mathbf{y}\_2|\mathbf{x}\_M) & \dots & p(\mathbf{y}\_N|\mathbf{x}\_M) \end{pmatrix},\tag{3.79}$$

where |*X*| = *M*, |*Y* | = *N*. By (3.78), each row of the transition probability matrix *T* sums to 1.


In discrete channel {*X*, *T*, *Y* }, codeword spaces *X<sup>n</sup>* and *Y <sup>n</sup>* with length *n* are defined as

$$X^n = \{ \mathbf{x} = \mathbf{x}\_1 \cdots \mathbf{x}\_n | \mathbf{x}\_i \in X \}, \\ Y^n = \{ \mathbf{y} = \mathbf{y}\_1 \cdots \mathbf{y}\_n | \mathbf{y}\_i \in Y \}, n \ge 1.$$

The probabilities of joint events *x* = *x*<sup>1</sup> ··· *xn* and *y* = *y*<sup>1</sup> ··· *yn* are defined as

$$p(\mathbf{x}) = p(\mathbf{x}\_1 \cdot \cdots \cdot \mathbf{x}\_n) = \prod\_{i=1}^n p(\mathbf{x}\_i), \quad p(\mathbf{y}) = p(\mathbf{y}\_1 \cdots \mathbf{y}\_n) = \prod\_{i=1}^n p(\mathbf{y}\_i), \tag{3.80}$$

then *X* and *Y* become a memoryless source, *X<sup>n</sup>* and *Y <sup>n</sup>* are power spaces, respectively.

**Definition 3.19** Discrete channel {*X*, *T*, *Y* } is called a memoryless channel if for any positive integer *n* ≥ 1, *x* = *x*<sup>1</sup> ··· *xn* ∈ *X<sup>n</sup>*, *y* = *y*<sup>1</sup> ··· *yn* ∈ *Y <sup>n</sup>*, we have

$$\begin{cases} p(\mathbf{y}|\mathbf{x}) = \prod\_{i=1}^{n} p(\mathbf{y}\_i|\mathbf{x}\_i), \\ p(\mathbf{x}\_i\mathbf{y}\_i) = p(\mathbf{x}\_1\mathbf{y}\_1), \forall \ i \ge 1. \end{cases} \tag{3.81}$$

From the joint event probability *p*(*xi yi*) = *p*(*x*<sup>1</sup> *y*1) in Eq. (3.81), we obtain

$$p(\mathbf{y}\_i|\mathbf{x}\_i) = \frac{p(\mathbf{x}\_1)}{p(\mathbf{x}\_i)} p(\mathbf{y}\_1|\mathbf{x}\_1). \tag{3.82}$$

The above formula shows that in a memoryless channel, the conditional probability *p*(*yi*|*xi*) does not depend on the time index *i*.

Definition 3.19 is the statistical characteristic of a memoryless channel. The following lemma gives a mathematical characterization of a memoryless channel.

**Lemma 3.19** *A discrete channel* {*X*, *T*, *Y* } *is a memoryless channel if and only if the product information space XY is a memoryless source, and a power space* (*XY* )*<sup>n</sup>* = *XnY n.*

*Proof* If *XY* is a memoryless source (see Definition 3.9), then for any *n* ≥ 1, *x* = *x*<sup>1</sup> ··· *xn* ∈ *X<sup>n</sup>*, *y* = *y*<sup>1</sup> ··· *yn* ∈ *Y <sup>n</sup>*, and *x y* ∈ *XnY <sup>n</sup>*, there is

$$p(\mathbf{x}\mathbf{y}) = p(\mathbf{x}\_1 \cdots \mathbf{x}\_n \mathbf{y}\_1 \cdots \mathbf{y}\_n) = p(\mathbf{x}\_1 \mathbf{y}\_1 \cdots \mathbf{x}\_n \mathbf{y}\_n) = \prod\_{i=1}^n p(\mathbf{x}\_i \mathbf{y}\_i).$$

Thus

$$p(\mathbf{x})p(\mathbf{y}|\mathbf{x}) = p(\mathbf{x})\prod\_{i=1}^{n}p(\mathbf{y}\_i|\mathbf{x}\_i),$$

so we have

$$p(\mathbf{y}|\mathbf{x}) = \prod\_{i=1}^{n} p(\mathbf{y}\_i|\mathbf{x}\_i).$$

*p*(*xi yi*) = *p*(*x*<sup>1</sup> *y*1) is given by the definition of memoryless source, so {*X*, *T*, *Y* } is a memoryless channel. Conversely, if {*X*, *T*, *Y* } is a memoryless channel, by (3.81), there are

$$p(\mathbf{x}\mathbf{y}) = \prod\_{i=1}^{n} p(\mathbf{x}\_i \mathbf{y}\_i)$$

and *p*(*xi yi*) = *p*(*x*<sup>1</sup> *y*1), then for any *a* = *a*1*a*<sup>2</sup> ··· *an* ∈ (*XY* )*<sup>n</sup>*, where *ai* = *xi yi* , we have

$$p(a) = p(\mathbf{x}\_1 \cdots \mathbf{x}\_n \mathbf{y}\_1 \cdots \mathbf{y}\_n) = p(\mathbf{x} \mathbf{y}) = \prod\_{i=1}^n p(\mathbf{x}\_i \mathbf{y}\_i) = \prod\_{i=1}^n p(a\_i)$$

and *p*(*ai*) = *p*(*a*1); therefore, *XY* is a memoryless source, that is, a sequence of independent and identically distributed random vectors ξ = (ξ1, ξ2,...,ξ*n*, . . .) taking values in *XY*, and (*XY* )*<sup>n</sup>* = *XnY <sup>n</sup>* is its power space. The Lemma holds.

The following lemma further characterizes the statistical characteristics of a memoryless channel.

**Lemma 3.20** *If* {*X*, *T*, *Y* } *is a discrete memoryless channel, the conditional entropy H*(*Y <sup>n</sup>*|*X<sup>n</sup>*) *and mutual information I*(*X<sup>n</sup>*, *Y <sup>n</sup>*) *of the information spaces X<sup>n</sup> and Y <sup>n</sup> satisfy, for all n* ≥ 1*,*


$$\begin{cases} H(Y^n|X^n) = nH(Y|X). \\ I(X^n, Y^n) = nI(X, Y). \end{cases} \tag{3.83}$$

*Proof* Because *XY* is a memoryless source, we have

$$H(X^n Y^n) = H((XY)^n) = nH(XY) = nH(X) + nH(Y|X).$$

On the other hand, by the addition formula of entropy, there is

$$H(X^n Y^n) = H(X^n) + H(Y^n | X^n) = nH(X) + H(Y^n | X^n).$$

The combination of the above two formulas has

$$H(Y^n|X^n) = nH(Y|X).$$

According to the definition of mutual information,

$$\begin{aligned} I(X^n, Y^n) &= H(Y^n) - H(Y^n | X^n) \\ &= nH(Y) - nH(Y|X) \\ &= n(H(Y) - H(Y|X)) = nI(X, Y). \end{aligned}$$

The Lemma holds.
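Equation (3.83) can be verified by brute force for a small memoryless channel; the binary alphabet, input distribution and transition matrix below are illustrative, with *n* = 2.

```python
import math
from itertools import product

p_x = [0.3, 0.7]          # input distribution on X = {0, 1}
T = [[0.9, 0.1],          # transition matrix p(y|x); each row sums to 1
     [0.2, 0.8]]

def H_cond_1():
    """H(Y|X) = -sum_x p(x) sum_y p(y|x) log2 p(y|x)."""
    return -sum(p_x[x] * T[x][y] * math.log2(T[x][y])
                for x in range(2) for y in range(2))

def H_cond_n(n):
    """H(Y^n|X^n) for the memoryless channel, by direct enumeration via (3.80)-(3.81)."""
    total = 0.0
    for xs in product(range(2), repeat=n):
        px = math.prod(p_x[x] for x in xs)
        for ys in product(range(2), repeat=n):
            pyx = math.prod(T[x][y] for x, y in zip(xs, ys))  # p(y|x) = prod p(y_i|x_i)
            total -= px * pyx * math.log2(pyx)
    return total

assert abs(H_cond_n(2) - 2 * H_cond_1()) < 1e-9   # (3.83): H(Y^2|X^2) = 2 H(Y|X)
print(H_cond_1())
```

The same enumeration with larger *n* confirms *H*(*Y <sup>n</sup>*|*X<sup>n</sup>*) = *nH*(*Y*|*X*), at exponential cost, which is why the identity is proved instead of computed.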

Let us define the channel capacity of a discrete channel; this concept plays an important role in channel coding. First, we note that the joint probability distribution *p*(*x y*) on the product space *XY* is uniquely determined by the probability distribution *p*(*x*) on *X* and the transition probability matrix *T* , that is, *p*(*x y*) = *p*(*x*)*p*(*y*|*x*); therefore, the mutual information *I*(*X*, *Y* ) of *X* and *Y* is also uniquely determined by *p*(*x*) and *T* . In fact,

$$\begin{aligned} I(X,Y) &= \sum\_{\mathbf{x}\in X} \sum\_{\mathbf{y}\in Y} p(\mathbf{x}\mathbf{y}) \log \frac{p(\mathbf{x}\mathbf{y})}{p(\mathbf{x})p(\mathbf{y})} \\ &= \sum\_{\mathbf{x}\in X} p(\mathbf{x}) \sum\_{\mathbf{y}\in Y} p(\mathbf{y}|\mathbf{x}) \log \frac{p(\mathbf{y}|\mathbf{x})}{\sum\_{\mathbf{x}\in X} p(\mathbf{x})p(\mathbf{y}|\mathbf{x})}. \end{aligned}$$

**Definition 3.20** The channel capacity *B* of a discrete memoryless channel{*X*, *T*, *Y* } is defined as

$$B = \max\_{p(x)} I(X, Y),\tag{3.84}$$

where the maximum in formula (3.84) is taken over all probability distributions *p*(*x*) on *X*.

**Lemma 3.21** *The channel capacity B of a discrete memoryless channel* {*X*, *T*, *Y* } *is estimated as follows:*

$$0 \le B \le \min\{\log|X|, \ \log|Y|\}.$$

*Proof* The amount of mutual information between the two information spaces is *I*(*X*, *Y* ) ≥ 0 (see Lemma 3.5), so there is *B* ≥ 0. By Lemma 3.4,

$$I(X,Y) = H(X) - H(X|Y) \le H(X) \le \log |X|.$$

and

$$I(X,Y) = H(Y) - H(Y|X) \le H(Y) \le \log |Y|,$$

so we have

$$0 \le B \le \min\{\log|X|, \ \log|Y|\}.$$

The calculation of channel capacity is a constrained convex optimization problem. We will not discuss it in detail here, but we calculate the channel capacity of two simple channels.

*Example 3.8* The channel capacity of noiseless channel {*X*, *T*, *Y* } is *B* = log |*X*|.

*Proof* Let {*X*, *T*, *Y* } be a noiseless channel, then |*X*|=|*Y* |, and the transition probability matrix *T* is the identity matrix, so

$$\begin{aligned} I(X,Y) &= \sum\_{\mathbf{x}\in X} \sum\_{\mathbf{y}\in Y} p(\mathbf{x}\mathbf{y}) \log \frac{p(\mathbf{y}|\mathbf{x})}{p(\mathbf{y})} \\ &= \sum\_{\mathbf{x}\in X} p(\mathbf{x}) \sum\_{\mathbf{y}\in Y} p(\mathbf{y}|\mathbf{x}) \log \frac{p(\mathbf{y}|\mathbf{x})}{p(\mathbf{y})}. \end{aligned}$$

Because *p*(*y*|*x*) = 0 if *y* ≠ *x*, and *p*(*y*|*x*) = 1 if *y* = *x*, there is

$$I(X,Y) = \sum\_{\mathbf{x} \in X} p(\mathbf{x}) \log \frac{1}{p(\mathbf{x})} = H(X) \le \log |X|.$$

Thus

$$B = \max\_{p(x)} I(X, Y) = \log |X|.$$

*Example 3.9* The channel capacity *B* of binary symmetric channel {*X*, *T*, *Y* } is

$$B = 1 + p \log p + (1 - p) \log(1 - p) = 1 - H(p),$$

where *p* < 1/2 and *H*(*p*) is the binary entropy function.

*Proof* In the binary symmetric channel {*X*, *T*, *Y* }, *X* = *Y* = F2 = {0, 1}, and *T* is a second-order symmetric matrix

$$T = \begin{pmatrix} 1-p & p \\ p & 1-p \end{pmatrix}, \ p < 1.$$

Let *a* be the random variable on the input space F2 and *b* the random variable on the output space F2, both obeying two-point distributions; then the transition matrix *T* of the binary symmetric channel can be described more explicitly by

$$\begin{cases} P\{b=1|a=0\} = P\{b=0|a=1\} = p\\ P\{b=0|a=0\} = P\{b=1|a=1\} = 1-p. \end{cases}$$

Calculating the mutual information *I*(*X*, *Y* ), there is

$$I(X,Y) = H(Y) - H(Y|X),$$

however,

$$\begin{aligned} H(Y|X) &= -\sum\_{\mathbf{x} \in \mathbb{F}\_2} \sum\_{\mathbf{y} \in \mathbb{F}\_2} p(\mathbf{x} \mathbf{y}) \log p(\mathbf{y}|\mathbf{x}) \\ &= -p \log p - (1 - p) \log(1 - p) = H(p). \end{aligned}$$

Since *H*(*Y* ) ≤ 1, with equality when the input distribution is uniform, we have

$$B = \max\_{p(x)} I(X, Y) = \max\_{p(x)} \{H(Y) - H(p)\} = 1 - H(p).$$
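The maximization (3.84) can also be carried out numerically. The sketch below uses the Blahut–Arimoto alternating-optimization algorithm, which is not covered in the text, to confirm the capacity formula of Example 3.9 for a binary symmetric channel.

```python
import math

def capacity(T, iters=500):
    """Blahut-Arimoto: maximize I(X,Y) over input distributions for transition matrix T."""
    M, N = len(T), len(T[0])
    r = [1.0 / M] * M                              # start from the uniform input
    for _ in range(iters):
        # q(x|y) proportional to r(x) T[x][y]
        q = [[r[x] * T[x][y] for x in range(M)] for y in range(N)]
        for y in range(N):
            s = sum(q[y])
            q[y] = [v / s for v in q[y]]
        # r(x) proportional to exp( sum_y T[x][y] log q(x|y) )
        w = [math.exp(sum(T[x][y] * math.log(q[y][x]) for y in range(N)))
             for x in range(M)]
        s = sum(w)
        r = [v / s for v in w]
    # I(X,Y) in bits at the final input distribution r
    return sum(r[x] * T[x][y] * math.log2(q[y][x] / r[x])
               for x in range(M) for y in range(N))

p = 0.1
bsc = [[1 - p, p], [p, 1 - p]]
B = capacity(bsc)
Hp = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
assert abs(B - (1 - Hp)) < 1e-6       # Example 3.9: B = 1 - H(p)
print(B)
```

The same routine applies to any row-stochastic matrix with positive entries, which is how capacities are computed when no closed form like Example 3.8 or 3.9 is available.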

In order to state and prove the channel coding theorem, we introduce the concept of joint typical sequences. By Definition 3.13 of Sect. 3.5 of this chapter, if *X* is a memoryless source, then for any small ε > 0 and positive integer *n* ≥ 1, in the power space *X<sup>n</sup>* we define the typical sequence *W*(*n*) <sup>ε</sup> as

$$W\_{\varepsilon}^{(n)} = \left\{ \mathbf{x} = \mathbf{x}\_1 \cdots \mathbf{x}\_n \in X^n \,\middle|\, \left| -\frac{1}{n} \log p(\mathbf{x}) - H(X) \right| < \varepsilon \right\}.$$

If {*X*, *T*, *Y* } is a memoryless channel, by Lemma 3.19, *XY* is a memoryless source; in the power space (*XY* )*<sup>n</sup>* = *XnY <sup>n</sup>*, we define the joint typical sequence *W*(*n*) <sup>ε</sup> as (Fig. 3.4)

$$W\_{\varepsilon}^{(n)} = \left\{ \mathbf{x}\mathbf{y} \in X^n Y^n \,\middle|\, \left| -\frac{1}{n} \log p(\mathbf{x}) - H(X) \right| < \varepsilon, \ \left| -\frac{1}{n} \log p(\mathbf{y}) - H(Y) \right| < \varepsilon, \ \left| -\frac{1}{n} \log p(\mathbf{x}\mathbf{y}) - H(XY) \right| < \varepsilon \right\}. \tag{3.85}$$

**Lemma 3.22** (Joint asymptotic equipartition) *In a memoryless channel* {*X*, *T*, *Y* }*, the joint typical sequence W*(*n*) <sup>ε</sup> *satisfies the following asymptotic equipartition properties:*

$$(i)\lim\_{n\to\infty} P\{\text{xy}\in W\_{\varepsilon}^{(n)}\} = 1;$$

#### **Fig. 3.4** The transfer matrix

*(ii)* $(1 - \varepsilon)\, 2^{n(H(XY)-\varepsilon)} \le |W\_{\varepsilon}^{(n)}| \le 2^{n(H(XY)+\varepsilon)}$*;*
*(iii) If x* ∈ *X<sup>n</sup>, y* ∈ *Y<sup>n</sup>, and p*(*x y*) = *p*(*x*)*p*(*y*)*, then*

$$(1 - \varepsilon) \ 2^{-n(I(X, Y) + 3\varepsilon)} \le P\{\text{xy} \in W\_{\varepsilon}^{(n)}\} \le 2^{-n(I(X, Y) - 3\varepsilon)}.$$

*Proof* By Lemma 3.13, we have

$$-\frac{1}{n}\log p(X^n) \to H(X) \text{ in probability, as } n \to \infty;$$

$$-\frac{1}{n}\log p(Y^n) \to H(Y) \text{ in probability, as } n \to \infty;$$

$$-\frac{1}{n}\log p(X^n Y^n) \to H(XY) \text{ in probability, as } n \to \infty.$$

So when ε is given, as long as *n* is sufficiently large, there is

$$\begin{aligned} P\_1 &= P\left\{ | -\frac{1}{n} \log p(\mathbf{x}) - H(X) | > \varepsilon \right\} < \frac{1}{3}\varepsilon, \\\\ P\_2 &= P\left\{ | -\frac{1}{n} \log p(\mathbf{y}) - H(Y) | > \varepsilon \right\} < \frac{1}{3}\varepsilon, \\\\ P\_3 &= P\left\{ | -\frac{1}{n} \log p(\mathbf{x}\mathbf{y}) - H(XY) | > \varepsilon \right\} < \frac{1}{3}\varepsilon, \end{aligned}$$

where *x* ∈ *X<sup>n</sup>*, *y* ∈ *Y <sup>n</sup>*. Thus, it can be obtained

$$P\left\{\text{xy}\notin W\_{\varepsilon}^{(n)}\right\} \le P\_1 + P\_2 + P\_3 < \varepsilon.$$

Thus

$$P\left\{\mathbf{x}\mathbf{y}\in W\_{\varepsilon}^{(n)}\right\} > 1 - \varepsilon,$$

in other words,

$$\lim\_{n \to \infty} P\{\mathbf{x}\mathbf{y} \in W\_{\varepsilon}^{(n)}\} = 1.$$

Property (*i*) holds. To prove (*ii*), let *x* ∈ *X<sup>n</sup>*, *y* ∈ *Y <sup>n</sup>*, and *x y* ∈ *W*(*n*) <sup>ε</sup> , then

$$H(XY) - \varepsilon < -\frac{1}{n} \log p(\text{xy}) < H(XY) + \varepsilon.$$

Equivalently,

$$2^{-n(H(XY)+\varepsilon)} < p(\mathbf{x}\mathbf{y}) < 2^{-n(H(XY)-\varepsilon)}.$$

By total probability formula,

$$1 = \sum\_{\mathbf{x}\mathbf{y}\in X^{\mathbf{n}}Y^{\mathbf{n}}} p(\mathbf{x}\mathbf{y}) \ge \sum\_{\mathbf{x}\mathbf{y}\in W\_{\varepsilon}^{(n)}} p(\mathbf{x}\mathbf{y}) \ge |W\_{\varepsilon}^{(n)}| \cdot 2^{-n(H(XY)+\varepsilon)}.$$

So there is

$$|W\_{\varepsilon}^{(n)}| \le 2^{n(H(XY) + \varepsilon)}.$$

On the other hand, when *n* is sufficiently large,

$$\begin{aligned} 1 - \varepsilon &< P\{\mathbf{x}\mathbf{y} \in W\_{\varepsilon}^{(n)}\} = \sum\_{\mathbf{x}\mathbf{y} \in W\_{\varepsilon}^{(n)}} p(\mathbf{x}\mathbf{y}) \\ &\leq |W\_{\varepsilon}^{(n)}| \ 2^{-n(H(XY) - \varepsilon)} .\end{aligned}$$

So there is

$$(1 - \varepsilon) \, 2^{n(H(XY) - \varepsilon)} \le |W\_{\varepsilon}^{(n)}| \le 2^{n(H(XY) + \varepsilon)},$$

property (*ii*) holds. Now let's prove property (*iii*). If *p*(*x y*) = *p*(*x*)*p*(*y*), then

$$\begin{aligned} \mathcal{P}\{\mathbf{x}\mathbf{y}\in W\_{\varepsilon}^{(n)}\} &= \sum\_{\mathbf{x}\mathbf{y}\in W\_{\varepsilon}^{(n)}} p(\mathbf{x})p(\mathbf{y}) \\ &\leq |W\_{\varepsilon}^{(n)}| 2^{-n(H(X)-\varepsilon)} 2^{-n(H(Y)-\varepsilon)} \\ &\leq 2^{n(H(XY)+\varepsilon-H(X)-H(Y)+2\varepsilon)} \\ &= 2^{-n(I(X,Y)-3\varepsilon)}. \end{aligned}$$

The lower bound can be proved similarly, so we have

$$(1 - \varepsilon) \ 2^{-n(I(X, Y) + 3\varepsilon)} \le P\{\text{xy} \in W\_{\varepsilon}^{(n)}\} \le 2^{-n(I(X, Y) - 3\varepsilon)}.$$

We have completed the proof of the Lemma.
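Property (i) can be observed numerically. For a single memoryless binary source (the joint case behaves the same way), −(1/*n*) log *p*(*x*) depends only on the number of ones in *x*, so P{*x* ∈ *W*(*n*) ε} is an exact binomial sum; the parameters below are illustrative.

```python
import math

def prob_typical(n, p=0.3, eps=0.1):
    """P{ |-(1/n) log2 p(x) - H(X)| < eps } for a memoryless binary source.

    Since -(1/n) log2 p(x) depends only on the number k of ones in x,
    the probability is an exact sum of binomial terms."""
    H = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    total = 0.0
    for k in range(n + 1):
        emp = -(k * math.log2(p) + (n - k) * math.log2(1 - p)) / n
        if abs(emp - H) < eps:
            total += math.comb(n, k) * p**k * (1 - p)**(n - k)
    return total

p20, p200 = prob_typical(20), prob_typical(200)
assert p200 > p20      # the typical set captures more probability as n grows
assert p200 > 0.95
print(round(p20, 3), round(p200, 3))
```

As property (ii) records, the typical set itself remains an exponentially small fraction of the 2^*n* sequences even as its total probability tends to 1.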

The following lemma has important applications in proving the channel coding theorem. In fact, the conclusion of the lemma is valid in a general probability space.

**Lemma 3.23** *In a memoryless channel* {*X*, *T*, *Y* }*, suppose the codeword y* ∈ *Y <sup>n</sup> is uniquely determined by x* ∈ *X<sup>n</sup>, and let x*′ ∈ *X<sup>n</sup>. If x and x*′ *are independent, then y and x*′ *are also independent.*

*Proof* If *y* is uniquely determined by *x*, then *p*(*x*) = *p*(*y*) = *p*(*x y*), or *p*(*y*|*x*) = 1. Therefore, the probability of the joint event *y x x*′ is

$$p(\mathbf{y}\mathbf{x}\mathbf{x'}) = p(\mathbf{x}\mathbf{x'}) = p(\mathbf{x})p(\mathbf{x'}) = p(\mathbf{y})p(\mathbf{x'}).$$

on the other hand,

$$p(\mathbf{y}\mathbf{x}\mathbf{x'}) = p(\mathbf{y}\mathbf{x'}).$$

Thus

$$p(\mathbf{y}\mathbf{x'}) = p(\mathbf{y})p(\mathbf{x'})\,.$$

The Lemma holds.

In order to define the error probability of channel transmission, we first introduce the workflow of channel coding. After source compression coding, a source message input set is generated,

*W* = {1, 2,..., *M*}, where *M* ≥ 1 is a positive integer.

An injection *f* : *W* → *X<sup>n</sup>* is called the encoding function; *f* encodes each input message w ∈ *W* as *f* (w) ∈ *X<sup>n</sup>*. A codeword *x* = *f* (w) ∈ *X<sup>n</sup>* is received as a codeword *y* ∈ *Y <sup>n</sup>* after transmission through the channel {*X*, *T*, *Y* }; we write *y* = *T* (*x*). A mapping *g* : *Y <sup>n</sup>* → *W* is called the decoding function. Therefore, the so-called channel coding is a pair of mappings ( *f*, *g*). Obviously,

$$\mathcal{C} = f(W) = \{ f(w) | w \in W \} \subset X^n$$

is a code with length *n* in the codeword space *X<sup>n</sup>*, with number of codewords |*C*|=|*W*| = *M*; *C* is called the code of *f* . The code rate *RC* is

$$R\_C = \frac{1}{n} \log |C| = \frac{1}{n} \log M.$$

For each input message w ∈ *W*, if *g*(*T* ( *f* (w))) ≠ w, we say that the channel transmission is wrong; the transmission error probability λw is

$$
\lambda\_w = P\{g(T(f(w))) \neq w\}, \ w \in W. \tag{3.86}
$$

The transmission error probability of the codeword *x* = *f* (w) ∈ *C* is denoted *Pe*(*x*); obviously, *Pe*(*x*) = λw, that is, *Pe*(*x*) is the conditional probability

$$\begin{split} P\_{\epsilon}(\mathbf{x}) &= P\{g(T(\mathbf{x})) \neq w | \mathbf{x} = f(w) \} \\ &= P\{g(T(f(w))) \neq w \} = \lambda\_w. \end{split} \tag{3.87}$$

We define the transmission error probability of code *C* = *f* (*W*) ⊂ *X<sup>n</sup>* as *Pe*(*C*),


$$P\_{\epsilon}(C) = \frac{1}{M} \sum\_{\mathbf{x} \in C} P\_{\epsilon}(\mathbf{x}) = \frac{1}{M} \sum\_{w=1}^{M} \lambda\_{w}. \tag{3.88}$$

As before, a code *C* with length *n* and number of codewords *M* is recorded as *C* = (*n*, *M*).

**Theorem 3.12** (Shannon's channel coding theorem, 1948) *Let*{*X*, *T*, *Y* } *be a memoryless channel and B be the channel capacity, then*

*(i) When R* < *B, there exists a sequence of codes Cn* = (*n*, 2[*n R*] ) *whose transmission error probability Pe*(*Cn*) *satisfies*

$$\lim\_{n \to \infty} P\_{\epsilon}(C\_n) = 0;\tag{3.89}$$

*(ii) Conversely, if the transmission error probability of the codes Cn* = (*n*, 2[*n R*] ) *satisfies Eq. (3.89), then there is an absolute positive constant N*0 *such that the code rate RCn of Cn satisfies*

$$R\_{C\_n} \le B, \text{ when } n \ge N\_0.$$

If *Cn* = (*n*, 2[*n R*] ), by Lemma 2.27 of Chap. 2,

$$R - \frac{1}{n} < R\_{C\_n} \le R. \tag{3.90}$$

So (*i*) of Theorem 3.12 shows that, with the code rate arbitrarily close to the channel capacity *B*, "good codes" with arbitrarily small transmission error probability exist. (*ii*) shows that the rate of any such good code with sufficiently small transmission error probability cannot exceed the channel capacity. Shannon's proof of Theorem 3.12 uses the random coding technique; this idea of using a random method to prove a deterministic result is widely used in information theory and has found more and more applications in other fields.
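For contrast with Theorem 3.12, consider the *n*-fold repetition code over a binary symmetric channel (an illustration, not from the text): majority decoding drives the error probability to zero, but only because the code rate *RC* = 1/*n* also tends to zero, whereas the theorem guarantees good codes at any fixed rate *R* < *B*.

```python
import math

def repetition_error(n, p):
    """Exact P_e of the n-fold repetition code over BSC(p) with majority
    decoding (n odd): an error occurs when more than n/2 bits are flipped."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

p = 0.2
errs = {n: repetition_error(n, p) for n in (1, 5, 11)}
rates = {n: 1 / n for n in (1, 5, 11)}

assert errs[1] > errs[5] > errs[11]     # error probability -> 0 ...
assert rates[11] < rates[5] < rates[1]  # ... but the code rate -> 0 as well
print(errs)
```

Theorem 3.12 promises far more: over this channel, codes with vanishing error exist at any rate below *B* = 1 − *H*(0.2) ≈ 0.28, which no repetition code achieves.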

*Proof* (Proof of Theorem 3.12) First, a probability function *p*(*xi*) is chosen arbitrarily on the input alphabet *X*, and the joint probability on the power space *X<sup>n</sup>* is defined as

$$p(\mathbf{x}) = \prod\_{i=1}^{n} p(\mathbf{x}\_i), \ \mathbf{x} = \mathbf{x}\_1 \cdots \mathbf{x}\_n \in X^n,\tag{3.91}$$

In this way, we get a memoryless source *X*, whose power space *X<sup>n</sup>* constitutes the codeword space of channel coding. Then *M* = 2<sup>⌊*nR*⌋</sup> codewords are randomly selected in *X<sup>n</sup>* to obtain a random code *Cn* = (*n*, 2<sup>⌊*nR*⌋</sup>). In order to describe the random selection of codewords, we use the source message set *W* = {1, 2,..., *M*}, where *M* = 2<sup>⌊*nR*⌋</sup>. For every message w, 1 ≤ w ≤ *M*, the randomly generated codeword is denoted by *X*<sup>(*n*)</sup>(w). So we get a random code

$$C\_n = \{ X^{(n)}(1), X^{(n)}(2), \dots, X^{(n)}(M) \} \subset X^n.$$

The generation probability *P*{*Cn*} of *Cn* is

$$P\{C\_n\} = \prod\_{w=1}^M P\{X^{(n)}(w)\} = \prod\_{w=1}^M \prod\_{i=1}^n p(x\_i(w)),$$

where *X*(*n*) (w) = *x*1(w)*x*2(w)··· *xn*(w) ∈ *X<sup>n</sup>*.

We take *An* = {*Cn*} as the set of all random codes *Cn*, which is called the random code set. The average transmission error probability on random code set *An* is defined as

$$\bar{P}\_e(A\_n) = \sum\_{C\_n \in A\_n} P\{C\_n\} P\_e(C\_n). \tag{3.92}$$

If we can prove that for any ε > 0, *P̄e*(*An*) < ε when *n* is sufficiently large, then there is at least one code *Cn* ∈ *An* such that *Pe*(*Cn*) < ε, which proves (*i*). Therefore, we proceed in two steps.

(1) Principles of constructing random codes and encoding and decoding

We select each message in the source message set *W* = {1, 2,..., *M*} with equal probability, that is w ∈ *W*, the selection probability of w is

$$p(w) = \frac{1}{M} = 2^{-\lfloor nR \rfloor}, \ w = 1, 2, \dots, M.$$

In this way, *W* becomes an equal probability information space. For each input message w, it is randomly coded as *X*(*n*) (w) ∈ *X<sup>n</sup>*, where

$$X^{(n)}(w) = \mathbf{x}\_1(w)\mathbf{x}\_2(w)\cdots\mathbf{x}\_n(w) \in X^n.$$

Codeword *X*(*n*) (w) is transmitted through memoryless channel {*X*, *T*, *Y* } with conditional probability

$$p(\mathbf{y}|X^{(n)}(w)) = \prod\_{i=1}^{n} p(\mathbf{y}\_i|\mathbf{x}\_i(w))$$

and the received codeword is *y* = *y*1 *y*2 ··· *yn* ∈ *Y<sup>n</sup>*. The decoding principle for *y* is: if *X*<sup>(*n*)</sup>(w) is the only input codeword such that *X*<sup>(*n*)</sup>(w)*y* is jointly typical, that is, *X*<sup>(*n*)</sup>(w)*y* ∈ *W*ε<sup>(*n*)</sup>, then decode *g*(*y*) = w; if there is no such codeword *X*<sup>(*n*)</sup>(w), or if two or more codewords *X*<sup>(*n*)</sup>(w) are jointly typical with *y*, then *y* cannot be decoded correctly.

(2) Estimating the average error probability of random code set *An*

By (3.92) and (3.88),


$$\begin{split} \tilde{P}\_{\varepsilon}(A\_{n}) &= \sum\_{C\_{n} \in A\_{n}} P\{C\_{n}\} P\_{\varepsilon}(C\_{n}) \\ &= \sum\_{C\_{n} \in A\_{n}} P\{C\_{n}\} \frac{1}{M} \sum\_{x \in C\_{n}} P\_{\varepsilon}(x) \\ &= \frac{1}{M} \sum\_{w=1}^{M} \lambda\_{w} \sum\_{C\_{n} \in A\_{n}} P\{C\_{n}\} \\ &= \frac{1}{M} \sum\_{w=1}^{M} \lambda\_{w}, \end{split} \tag{3.93}$$

where λw is given by Eq. (3.86). Because each w is input, and hence encoded, with equal probability, the transmission error probability λw does not depend on w, that is

$$
\lambda\_1 = \lambda\_2 = \dots = \lambda\_M.
$$

By (3.93), we have *P̄e*(*An*) = λ1. To estimate λ1, we define

$$E\_i = \{ \mathbf{y} \in Y^n | X^{(n)}(i) \mathbf{y} \in W\_\varepsilon^{(n)} \}, \ i = 1, 2, \dots, M.\tag{3.94}$$

Let *E*1<sup>c</sup> = *Y<sup>n</sup>* \ *E*1 be the complement of *E*1; then by the decoding principle,

$$\lambda\_1 = P\{E\_1^c \cup E\_2 \cup \dots \cup E\_M\} \le P\{E\_1^c\} + \sum\_{i=2}^M P\{E\_i\}.\tag{3.95}$$

By property (*i*) of Lemma 3.22,

$$\lim\_{n \to \infty} P\{\mathbf{x}\mathbf{y} \notin W\_{\varepsilon}^{(n)}\} = 0.$$

So there is

$$\lim\_{n \to \infty} P\{X^{(n)}(1) \mathbf{y} \notin W\_{\varepsilon}^{(n)}\} = 0.$$

Therefore, when *n* is sufficiently large,

$$P\{E\_1^c\} < \varepsilon.$$

Obviously, codeword *X*<sup>(*n*)</sup>(1) and the other codewords *X*<sup>(*n*)</sup>(*i*) (*i* = 2,..., *M*) are mutually independent (see (3.91)). By Lemma 3.23, *y* = *T*(*X*<sup>(*n*)</sup>(1)) and *X*<sup>(*n*)</sup>(*i*) (*i* ≠ 1) are also independent of each other. Then by property (*iii*) of Lemma 3.22,

$$P\{E\_i\} = P\{X^{(n)}(i) \mathbf{y} \in W\_\varepsilon^{(n)}\} \le 2^{-n(I(X,Y) - 3\varepsilon)} \quad (i \ne 1).$$

To sum up,

$$\begin{aligned} \bar{P}\_{e}(A\_n) &= \lambda\_1 \le \varepsilon + \sum\_{i=2}^{M} 2^{-n(I(X,Y) - 3\varepsilon)} \\ &\le \varepsilon + 2^{\lfloor nR \rfloor} 2^{-n(I(X,Y) - 3\varepsilon)} \\ &\le \varepsilon + 2^{-n(I(X,Y) - R - 3\varepsilon)} .\end{aligned}$$

If *R* < *I*(*X*, *Y*), then *I*(*X*, *Y*) − *R* − 3ε > 0 when ε is sufficiently small, so when *n* is large enough we have *P̄e*(*An*) < 2ε. Since the channel capacity *B* = max{*I*(*X*, *Y*)}, we can choose *p*(*x*) such that *B* = *I*(*X*, *Y*). So when *R* < *B*, we have *P̄e*(*An*) < 2ε, which completes the proof of (*i*).

To prove (*ii*), let us look at a special case first. If the error probability of *C* = (*n*, 2<sup>⌊*nR*⌋</sup>) is *Pe*(*C*) = 0, then the code rate of *C* satisfies *RC* < *B* + 1/*n*, so when *n* is sufficiently large, *RC* ≤ *B*.

In fact, because *Pe*(*C*) = 0, the decoding function *g* : *Y<sup>n</sup>* → *W* determines *W* uniquely, so *H*(*W*|*Y<sup>n</sup>*) = 0. Because *W* is an equal probability information space,

$$H(W) = \log\_2|W| = \lfloor nR \rfloor.$$

Using the decomposition of mutual information, we have

$$I(W, Y^n) = H(W) - H(W|Y^n) = H(W) = \lfloor nR \rfloor.\tag{3.96}$$

On the other hand, *W* → *X<sup>n</sup>* → *Y<sup>n</sup>* forms a Markov chain, so by the data processing inequality (see Theorem 3.8),

$$I(W, Y^n) \le I(X^n, Y^n).$$

By Lemma 3.20,

$$I(W, Y^n) \le I(X^n, Y^n) = nI(X, Y) \le nB.$$

By (3.96), we get ⌊*nR*⌋ ≤ *nB*. Because *nR* − 1 < ⌊*nR*⌋ ≤ *nR*, we have *nR* < *nB* + 1, that is, *R* < *B* + 1/*n*; by (3.90), we have

$$
R\_C \le R < B + \frac{1}{n},
$$

thus

*RC* ≤ *B*, when *n* is sufficiently large.

The above formula shows that when the transmission error probability is 0, as long as *n* is sufficiently large, *RC* ≤ *B*. Secondly, suppose transmission errors are allowed, that is, the error probability of *Cn* = (*n*, 2<sup>⌊*nR*⌋</sup>) satisfies *Pe*(*Cn*) < ε. Then when *n* is sufficiently large, we still have *RCn* ≤ *B*.

In order to prove the above conclusion, we note the error probability of random code *Cn* is

$$P\_e(C\_n) = \lambda\_w,\tag{3.97}$$

where w ∈ *W* is any given message. When w is given, we define a random variable ξw with a value on {0, 1} as

$$\xi\_w = \begin{cases} 1, & \text{if } \operatorname{g}(T(f(w))) \neq w; \\ 0, & \text{if } \operatorname{g}(T(f(w))) = w. \end{cases}$$

Let *E* = (F2, ξw) be a binary information space; by (3.97), we have

$$P\_e(C\_n) = P\{\xi\_w = 1\}.$$

By Theorem 3.3,

$$\begin{split}H(EW|Y^{n}) &= H(W|Y^{n}) + H(E|WY^{n})\\ &= H(E|Y^{n}) + H(W|EY^{n}).\end{split}\tag{3.98}$$

Note that *E* is uniquely determined by *Y<sup>n</sup>* and *W*, so *H*(*E*|*W Y<sup>n</sup>*) = 0; at the same time, *E* is a binary information space, so *H*(*E*) ≤ log 2 = 1, and therefore

$$H(E|Y^n) \le H(E) \le 1.$$

On the other hand, when ξw = 1 the decoded message ranges over at most |*W*| − 1 incorrect values, so

$$H(W|EY^n) \le P\_e(C\_n) \log\_2(|W| - 1) \le n R P\_e(C\_n).$$

By (3.98), we have

$$H(W|Y^n) \le 1 + n R P\_e(C\_n).$$

Because *f*(*W*) = *X<sup>n</sup>*(*W*) is a function of *W*, we have the following Fano inequality:

$$H(f(W)|Y^n) \le H(W|Y^n) \le 1 + nR P\_e(C\_n).$$

Finally,

$$\begin{aligned} \lfloor nR \rfloor &= H(W) = H(W|Y^n) + I(W, Y^n) \\ &\le H(W|Y^n) + I(f(W), Y^n) \\ &\le 1 + n R P\_e(C\_n) + I(X^n, Y^n) \\ &\le 1 + n R P\_e(C\_n) + n B, \end{aligned}$$

Because *nR* − 1 < ⌊*nR*⌋, we then have

$$nR < 2 + nRP\_e(C\_n) + nB.$$

Thus

$$R\_{C\_n} \le R < B + \frac{2}{n} + \varepsilon,$$

so when *n* is sufficiently large, we obtain *RCn* ≤ *B*, which completes the proof of the theorem.

It can be seen from Example 3.9 that the channel capacity of a binary symmetric channel is *B* = 1 − *H*(*p*). Therefore, Theorem 3.12 extends Theorem 2.10 of the previous chapter to the more general memoryless channel; at the same time, it also proves that the code rate of a good code does not exceed the channel capacity.
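The formula *B* = 1 − *H*(*p*) can be checked numerically against the definition *B* = max{*I*(*X*, *Y*)}. The following Python sketch (a simple grid search over input distributions, purely illustrative; all names are ours) does this for a binary symmetric channel with crossover probability *p* = 0.1:

```python
import math

def h2(p):
    """Binary entropy H(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def mutual_information(a, p):
    """I(X;Y) for a BSC with crossover p and input P(X=1)=a:
    I = H(Y) - H(Y|X)."""
    py1 = a * (1 - p) + (1 - a) * p   # P(Y = 1)
    return h2(py1) - h2(p)

p = 0.1
# Grid search over input distributions; the maximum is attained at a = 1/2,
# giving the channel capacity B = 1 - H(p).
best = max(mutual_information(a / 1000, p) for a in range(1001))
assert abs(best - (1 - h2(p))) < 1e-6
```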

#### **Exercise 3**

1. The joint probability functions of the two information spaces *X* and *Y* are as follows:

Solve *H*(*X*), *H*(*Y* ), *H*(*XY* ), *H*(*X*|*Y* ), *H*(*Y* |*X*), and *I*(*X*, *Y* ).

2. Let *X*1, *X*2, *X*3 be three information spaces on F2. Given that *I*(*X*1, *X*2) = 0 and *I*(*X*1, *X*2, *X*3) = 1, prove:

$$H(X\_3) = 1, \text{and } H(X\_1 X\_2 X\_3) = 2.$$

	- (i) *H*(*XY* |*Z*) ≥ *H*(*X*|*Z*);
	- (ii) *I*(*XY*, *Z*) ≥ *I*(*X*, *Z*);
	- (iii) *H*(*XYZ*) − *H*(*XY* ) ≤ *H*(*X Z*) − *H*(*X*);
	- (iv) *I*(*X*, *Z*|*Y* ) = *I*(*Z*, *Y* |*Z*) − *I*(*Z*, *Y* ) + *I*(*X*, *Z*).

Also explain under what conditions the equality sign holds.


$$p(n) = P\{\xi = n\}, \ n = 0, 1, 2, \dots$$

Given the mathematical expectation *E*ξ = *A* > 0 of ξ, find the probability distribution {*p*(*n*)|*n* = 0, 1,...} that maximizes *H*(*X*), and the corresponding maximum information entropy.

	- (i) *H*(*X*1) ≥ *H*(*X*2), give the conditions under which the equal sign holds.
	- (ii) *H*(*X*1|*X*2) ≥ *H*(*X*2|*X*1), give the conditions under which the equal sign holds.



**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Chapter 4 Cryptosystem and Authentication System**

In 1949, Shannon published the famous paper "Communication Theory of Secrecy Systems" in the *Bell System Technical Journal*. Based on the mathematical theory of information he had established in 1948 (see Chap. 3), this paper gives a comprehensive treatment of the problem of secure communication and establishes the mathematical theory of secrecy systems. It had a great impact on the later development of cryptography. It is generally believed that Shannon transformed cryptography from an art (creative ways and methods) into a science, so he is also known as the father of modern cryptography. The main purpose of this chapter is to introduce Shannon's important ideas and results in cryptography, which are the cornerstone of the whole of modern cryptography.

# **4.1 Definition and Statistical Characteristics of Cryptosystem**

Let *X* = {*a*1, *a*2,..., *aq*} be the plaintext alphabet, regarded as a source, and let {ξ*i*}, *i* ≥ 1, be a sequence of random variables taking values in *X*. For any given positive integer *n* ≥ 1, we define the plaintext space *P* as the product information space *X*1*X*2 ··· *Xn*, that is

$$P = X\_1 X\_2 \cdots X\_n \text{, where } X\_i = (X, \xi\_i), \ 1 \le i \le n.$$

If *m* = *m*1*m*2 ··· *mn* ∈ *P* (*mi* ∈ *Xi*), *m* is called a plaintext sequence of alphabet length *n*, or a plaintext string of length *n*, and the joint probability *p*(*m*) is defined as

$$p(m) = p(m\_1 m\_2 \cdots m\_n) = P\{\xi\_1 = m\_1, \xi\_2 = m\_2, \dots, \xi\_n = m\_n\}.\tag{4.1}$$

© The Author(s) 2022 Z. Zheng, *Modern Cryptography Volume 1*, Financial Mathematics and Fintech, https://doi.org/10.1007/978-981-19-0920-7\_4

Let *Z* = {*b*1, *b*2,..., *bs*} be the key alphabet, which is also a memoryless source (see Definition 3.9 of Chap. 3), and let {η*i*}, *i* ≥ 1, be a group of independent random variables taking values in *Z* with equal probability distribution; then for any *b* ∈ *Z*,

$$p(b) = P\{\eta\_i = b\} = \frac{1}{|Z|} = \frac{1}{s}, \text{ } \forall \ i \ge 1. \tag{4.2}$$

We define the power space *Z<sup>r</sup>* as a key space, denoted by *K*, that is

$$K = Z^r = \{ k = k\_1 k\_2 \cdots k\_r | k\_i \in Z, \ 1 \le i \le r \}.$$

Each *k* = *k*1*k*<sup>2</sup> ··· *kr* ∈ *K* is called a key of length *r*, and the joint probability *p*(*k*) is

$$p(k) = p(k\_1 k\_2 \cdots k\_r) = \prod\_{i=1}^r p(k\_i) = \frac{1}{|Z|^r} = \frac{1}{|K|}. \tag{4.3}$$

This shows that the *r*-dimensional random vector η = (η1, η2,...,η*r*) taking values in the key space *K* is also uniformly distributed on *K*. Unless otherwise specified, we generally stipulate that the plaintext space *P* and the key space *K* are independent information spaces, that is

$$p(mk) = p(m)p(k), \forall m \in P, k \in K. \tag{4.4}$$

For every *k* ∈ *K*, *k* defines or controls an encryption transformation *Ek*; we write

$$E = \{ E\_k | k \in K \}.$$

*E* is called the encryption algorithm. When *k* ∈ *K* is given, the encryption transformation *Ek* acts on the plaintext *m* ∈ *P* to produce a ciphertext *Ek*(*m*). Each encryption transformation *Ek* is an injection, and its left inverse mapping is denoted by *Dk*, which is called the decryption transformation. Taking 1*P* as the identity transformation of the plaintext space, that is, 1*P*(*m*) = *m*, ∀ *m* ∈ *P*, we have

$$D\_k E\_k = 1\_P,\text{ or } D\_k(E\_k(m)) = m,\forall\ m \in P.\tag{4.5}$$

Define the ciphertext space *C* as

$$C = \{E\_k(m) | m \in P, k \in K\} \subset X\_1 X\_2 \cdots X\_n. \tag{4.6}$$

That is, the ciphertext space *C* and the plaintext space *P* have the same alphabet and the same string length.

For each ciphertext *c* ∈ *C*, *c* = *Ek*(*m*), *c* is uniquely determined by plaintext *m* and key *k*, so we can define the occurrence probability *p*(*c*) of ciphertext *c* as


$$p(c) = p(E\_k(m)) = p(km) = p(k)p(m). \tag{4.7}$$

Obviously,

$$\sum\_{c \in C} p(c) = \sum\_{k \in K} \sum\_{m \in P} p(km) = \sum\_{k \in K} p(k) \sum\_{m \in P} p(m) = 1,$$

Therefore, the ciphertext space *C* defined by formula (4.7) is also an information space.

When *k* ∈ *K* is given, we let

$$A\_k = \{ E\_k(m) | m \in P \} \subset C. \tag{4.8}$$

Then the encryption transformation *Ek* is the full mapping of *P* → *Ak* , so *Ek* is a 1–1 correspondence of *P* → *Ak* , and its inverse mapping is the decryption transformation *Dk* , that is

$$D\_k E\_k = 1\_P, \; E\_k D\_k = 1\_{A\_k}, k \in K.$$

We denote *D* as all decryption transformations, that is

$$D = \{ D\_k | k \in K \}. \tag{4.9}$$

*D* is called decryption algorithm.

**Definition 4.1** Under the above provisions, R = {*P*,*C*, *K*, *E*, *D*} is called a cryptosystem, where *P*, *C*, *K* are information spaces, *K* and *P* are statistically independent, *E* is the encryption algorithm and *D* is the decryption algorithm.

The statistical characteristics of a cryptosystem are attributed to the following theorem.

**Theorem 4.1** *If* R = {*P*,*C*, *K*, *E*, *D*}*, then*

*(1)* ∀ *c* ∈ *C*,*we have*

$$p(c) = \sum\_{\substack{k \in K\\ c \in A\_k}} p(k) \, p(D\_k(c)). \tag{4.10}$$

*(2) c* ∈ *C*, *m* ∈ *P*, *then*

$$p(c|m) = \sum\_{\substack{k \in K\\E\_k(m) = c}} p(k). \tag{4.11}$$

*(3) c* ∈ *C*, *m* ∈ *P*, *then*

$$p(m|c) = \frac{p(m)\sum\_{\substack{k\in K\\D\_k(c) = m}} p(k)}{\sum\_{\substack{k\in K\\c\in A\_k}} p(k)p(D\_k(c))},\tag{4.12}$$

*where Ak is given by equation (4.8).*

*Proof* By (4.7), if *c* ∈ *C*, then

$$\begin{aligned} p(c) &= \sum\_{\substack{k \in K, m \in P \\ E\_k(m) = c}} p(km) = \sum\_{\substack{k \in K, m \in P \\ E\_k(m) = c}} p(k)p(m) \\ &= \sum\_{k \in K} p(k) \sum\_{\substack{m \in P \\ E\_k(m) = c}} p(m) = \sum\_{\substack{k \in K \\ c \in A\_k}} p(k)p(D\_k(c)). \end{aligned}$$

(1) holds. (2) is trivial: when *m* ∈ *P* is given, the occurrence probability *p*(*c*|*m*) of ciphertext *c* satisfies

$$p(c|m) = \sum\_{\substack{k \in K, \\ E\_k(m) = c}} p(k).$$

To prove (3), by (1.24),

$$p(m|c) = \frac{p(m)p(c|m)}{\sum\_{m' \in P} p(m')p(c|m')},$$

the items in the denominator are

$$\begin{aligned} \sum\_{m' \in P} p(m')p(c|m') &= \sum\_{m' \in P} p(m') \sum\_{\substack{k \in K\\E\_k(m') = c}} p(k) \\ &= \sum\_{k \in K} p(k) \sum\_{\substack{m' \in P\\E\_k(m') = c}} p(m') \\ &= \sum\_{\substack{k \in K\\c \in A\_k}} p(k)p(D\_k(c)). \end{aligned}$$

So in the end

$$p(m|c) = \frac{p(m)\sum\_{\substack{k\in K\\E\_k(m) = c}} p(k)}{\sum\_{\substack{k\in K\\c\in A\_k}} p(k)p(D\_k(c))},$$

Theorem 4.1 holds!

By Theorem 4.1, the statistical characteristics of a cryptosystem can be summarized as follows: the probability distribution of the ciphertext space and the conditional probability distribution of plaintext given ciphertext are completely determined by the probability distributions of the plaintext space and the key space. That is, anyone who knows the probability distributions of the plaintext space and the key space also knows the probability distribution of the ciphertext and the conditional probability distribution of plaintext given ciphertext.
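The formulas of Theorem 4.1 can be made concrete on a toy cryptosystem. The sketch below (a hypothetical shift cipher on Z3 with uniform keys and a non-uniform plaintext distribution; all names are ours) computes *p*(*c*) by (4.10) and *p*(*m*|*c*) by (4.12) with exact rational arithmetic:

```python
from fractions import Fraction

# Hypothetical toy cryptosystem: P = C = K = Z_3, E_k(m) = (m + k) mod 3,
# uniform keys, non-uniform plaintext distribution.
P = K = [0, 1, 2]
pm = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 6)}
pk = {k: Fraction(1, 3) for k in K}

def E(k, m): return (m + k) % 3
def D(k, c): return (c - k) % 3

# p(c) by (4.10): sum over keys of p(k) p(D_k(c)).
pc = {c: sum(pk[k] * pm[D(k, c)] for k in K) for c in [0, 1, 2]}
assert sum(pc.values()) == 1

# p(m|c) by (4.12).
def pmc(m, c):
    num = pm[m] * sum(pk[k] for k in K if D(k, c) == m)
    return num / pc[c]

# With uniform keys every shift is equally likely, so p(m|c) = p(m).
assert all(pmc(m, c) == pm[m] for m in P for c in [0, 1, 2])
```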

It is assumed that the plaintext space and the key space are statistically independent, by (3.14) of Theorem 3.2 of Chap. 3, we have

$$H(PK) = H(P) + H(K). \tag{4.13}$$

It has been previously specified that the key source alphabet *Z* is an equal probability information space without memory, and the probability *p*(*k*) of the key space *K* = {*k* = *k*1*k*<sup>2</sup> ··· *kr*|*ki* ∈ *Z*} is

$$p(k) = \prod\_{i=1}^{r} p(k\_i) = \frac{1}{|Z|^r} = \frac{1}{|K|}. \tag{4.14}$$

Therefore, the key space is also an equal probability information space.

From the definition of a cryptosystem, when the plaintext space and key space are given, the ciphertext space is completely determined. Conversely, when the ciphertext space and key space are known, the plaintext space is also determined. Combined with Lemma 3.3 in the previous chapter, we have

**Theorem 4.2** *In a cryptosystem* R = {*P*,*C*, *K*, *E*, *D*}*, we have*

$$H(P|KC) = 0, \ H(C|KP) = 0, \ H(K|PC) = 0.$$

*Proof* We only prove *H*(*P*|*K C*) = 0; the proofs of *H*(*C*|*K P*) = 0 and *H*(*K*|*PC*) = 0 are similar. For given *m* ∈ *P*, let

$$N\_m = \{ kc | c \in C, k \in K, \text{ and } E\_k(m) = c \}.$$

Thus *Nm* ⊂ *K C*, and

$$\begin{cases} p(m|kc) = 1, & \text{if } kc \in N\_m; \\ p(m|kc) = 0, & \text{if } kc \notin N\_m. \end{cases}$$

Indeed, if *kc* ∈ *Nm*, then *Ek*(*m*) = *c*, thus *m* = *Dk*(*c*) is determined. Conversely, if *kc* ∉ *Nm*, then when the joint event *kc* occurs, *m* cannot occur, thus *p*(*m*|*kc*) = 0. By Lemma 3.3 of the previous chapter, *H*(*P*|*K C*) = 0, and we complete the proof of Theorem 4.2.

**Corollary 4.1** *In a cryptosystem* R = {*P*,*C*, *K*, *E*, *D*}*, we always have*

$$H(P) \le H(C).$$

*Proof* It is stipulated that *P* and *K* are statistically independent, so we have

$$\begin{aligned} H(P) &= H(P|K) = H(P|K) + H(C|PK) \\ &= H(PC|K) \\ &= H(C|K) + H(P|KC) \\ &= H(C|K) \\ &\le H(C). \end{aligned}$$

The Corollary holds.

The Corollary shows that in a cryptosystem the uncertainty of the plaintext is at most that of the ciphertext.

# **4.2 Fully Confidential System**

Generally speaking, the mutual information *I*(*P*,*C*) between the plaintext space and the ciphertext space (see Definition 3.8 in the previous chapter) reflects how much information about the plaintext space is contained in the ciphertext space, so minimizing *I*(*P*,*C*) is an important design goal of a cryptosystem. If the ciphertext does not provide any information about the plaintext, or equivalently the analyst cannot obtain any information about the plaintext by observing the ciphertext, such a cryptosystem is called completely confidential.

**Definition 4.2** A cryptosystem R = {*P*,*C*, *K*, *E*, *D*} with *H*(*P*|*C*) = *H*(*P*), or equivalently *I*(*P*,*C*) = 0, is called a complete secrecy system, or unconditional secrecy system.

**Theorem 4.3** *For any cryptosystem* R = {*P*,*C*, *K*, *E*, *D*}*, we have*

$$I(P,C) \ge H(P) - H(K). \tag{4.15}$$

*Proof* By Theorem 4.2, we have *H*(*P*|*K C*) = 0, and

$$\begin{aligned} H(P|C) &= H(P|C) + H(K|PC) \\ &= H(PK|C) \\ &= H(K|C) + H(P|KC). \end{aligned}$$

So we have

$$H(P|C) = H(K|C) \le H(K).$$

By definition,

$$I(P, C) = H(P) - H(P|C) \ge H(P) - H(K).$$

So the Theorem holds.

From the previous chapter, we know that the mutual information *I*(*X*, *Y*) ≥ 0, so we have

**Corollary 4.2** *In a completely confidential cryptosystem* R = {*P*,*C*, *K*, *E*, *D*}*, there is always*

$$H(P) \le H(K) = \log\_2 |K|. \tag{4.16}$$

*Proof* Since R = {*P*,*C*, *K*, *E*, *D*} is a completely confidential system, *I*(*P*,*C*) = 0. From the above theorem,

$$H(P) - H(K) \le I(P, C) = 0,$$

Thus *H*(*P*) ≤ *H*(*K*). By (4.14), *K* is uniformly distributed, so

$$H(P) \le \log\_2 |K|.$$

It can be seen from the above that the larger the scale |*K*| of the key space, the better the confidentiality of the system!

**Definition 4.3** A cryptosystem R = {*P*,*C*, *K*, *E*, *D*} is called a "one-time pad" system if for any given *m* ∈ *P* and *c* ∈ *C* there is a unique key *k* ∈ *K* such that *c* = *Ek*(*m*).

As can be seen from the above definition, for a given *m* ∈ *P*, if *k* ≠ *k*′, then *Ek*(*m*) ≠ *Ek*′(*m*). In other words, each key *k* is used to encrypt only one plaintext–ciphertext pair; this is the origin of the term "one-time pad". Thus, for any given plaintext *m* ∈ *P* and ciphertext *c* ∈ *C*, there is exactly one key *k* ∈ *K* such that *Ek*(*m*) = *c*. Therefore, for fixed *c*, when *k* traverses the key space *K*, *m* = *Dk*(*c*) traverses the plaintext space *P*, and each *m* appears exactly once. Thus, for *c* ∈ *C*, we have

$$\begin{split} p(c) &= \sum\_{\substack{k \in K\\ E\_k(m) = c}} \sum\_{m \in P} p(k)p(m) \\ &= \sum\_{\substack{k \in K\\ E\_k(m) = c}} p(k) \sum\_{m \in P} p(m) \\ &= \frac{1}{|K|} \sum\_{m \in P} p(m) = \frac{1}{|K|}. \end{split} \tag{4.17}$$

That is to say, in a one-time pad cryptosystem, the ciphertext space *C* is also an equal probability information space.

**Theorem 4.4** *The one-time pad system is a completely confidential system.*

*Proof* When *c*, *m* are given, by (4.11),

$$p(c|m) = \sum\_{\substack{k \in K\\E\_k(m) = c}} p(k) = \frac{1}{|K|}.$$

By (4.12) and (4.17),

$$p(m|c) = \frac{\frac{1}{|K|}\,p(m)}{\frac{1}{|K|}\sum\_{m' \in P} p(m')} = \frac{p(m)}{\sum\_{m' \in P} p(m')} = p(m).$$

Thus

$$\begin{aligned} H(P|C) &= -\sum\_{m \in P} \sum\_{c \in C} p(mc) \log\_2 p(m|c) \\ &= -\sum\_{m \in P} \sum\_{c \in C} p(mc) \log\_2 p(m) \\ &= -\sum\_{m \in P} p(m) \log\_2 p(m) \\ &= H(P). \end{aligned}$$

Therefore, R = {*P*,*C*, *K*, *E*, *D*} is a completely confidential system.
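This perfect secrecy can be observed empirically on a byte-level one-time pad (a hypothetical illustration of ours, not from the text): encrypting a fixed plaintext byte under fresh uniform keys yields a uniform ciphertext distribution, matching (4.17), *p*(*c*) = 1/|*K*|:

```python
import os
from collections import Counter

# One-time pad over bytes: E_k(m) = m XOR k, with a fresh uniform key
# byte drawn for every message.
def encrypt(m: bytes, k: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(m, k))

msg = b"A"                       # fixed plaintext
trials = 200_000
counts = Counter()
for _ in range(trials):
    key = os.urandom(1)          # uniform key, used only once
    counts[encrypt(msg, key)] += 1

# All 256 ciphertext values occur, each with frequency close to 1/256.
assert len(counts) == 256
assert all(abs(c / trials - 1 / 256) < 2e-3 for c in counts.values())
```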

# **4.3 Ideal Security System**

In order to introduce Shannon's concepts of unique solution distance and ideal cryptosystem, we first consider the scenario of a ciphertext-only attack. In this scenario, when the cryptanalyst intercepts a ciphertext *c*, he may apply every decryption transformation *Dk* to *c* to obtain

$$m' = D\_k(c), \ k \in K.$$

Therefore, he records the keys corresponding to all meaningful messages *m*′; only one key in this set is the correct key, while the other incorrect keys are called pseudo keys. A large number of ciphertexts is required as a sample in a ciphertext-only attack. Therefore, we consider the product spaces *P<sup>n</sup>* and *C<sup>n</sup>*, whose joint events are plaintext strings and ciphertext strings.

**Definition 4.4** For a ciphertext string *y* ∈ *C<sup>n</sup>* of given length *n*, let

$$K(\mathbf{y}) = \{ k \in K | \exists \ x \in P^n \text{ such that } E\_k(\mathbf{x}) = \mathbf{y} \}. \tag{4.18}$$

Then the number of pseudo keys is |*K*(*y*)| − 1. The mathematical expectation *Sn* of the number of pseudo keys is defined as

$$S\_n = \sum\_{\mathbf{y} \in C^n} p(\mathbf{y})(|K(\mathbf{y})| - 1). \tag{4.19}$$

Therefore, the mathematical expectation of the number of pseudo keys is the weighted average, over all ciphertext strings, of the number of pseudo keys. We first prove the following two theorems.

**Theorem 4.5** *If* R = {*P*,*C*, *K*, *E*, *D*} *is a cryptosystem, then*

$$H(K|C) = H(K) + H(P) - H(C). \tag{4.20}$$

*Proof* From the addition formula of information entropy (see Theorem 3.2 in the previous chapter),

$$H(KPC) = H(KP) + H(C|KP) = H(KC) + H(P|KC).$$

By Theorem 4.2, we have

$$H(C|KP) = H(P|KC) = 0.$$

thus,

$$H(KP) = H(KC).$$

Again, from the addition formula and note that *K* and *P* are statistically independent, so

$$H(KP) = H(P) + H(K|P) = H(P) + H(K)$$

and

$$H(P) + H(K) = H(KC) = H(C) + H(K|C).$$

So we have

$$H(K|C) = H(P) + H(K) - H(C).$$

The Theorem holds.
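Identity (4.20) can also be verified numerically on a toy cryptosystem. The following sketch (a hypothetical shift cipher on Z3 with uniform keys; all names are ours) computes *H*(*K*|*C*) directly from the joint distribution and compares it with *H*(*K*) + *H*(*P*) − *H*(*C*):

```python
import math
from itertools import product

# Hypothetical shift cipher: P = C = K = Z_3, E_k(m) = (m + k) mod 3,
# uniform keys, non-uniform plaintexts.
pm = {0: 0.5, 1: 0.3, 2: 0.2}
pk = {k: 1 / 3 for k in range(3)}

def H(dist):
    """Entropy in bits of a distribution given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Joint distribution p(k, c) with c = (m + k) mod 3.
pkc = {}
for k, m in product(range(3), range(3)):
    c = (m + k) % 3
    pkc[(k, c)] = pkc.get((k, c), 0) + pk[k] * pm[m]
pc = {c: sum(pkc[(k, c)] for k in range(3)) for c in range(3)}

H_K_given_C = H(pkc) - H(pc)          # H(K|C) = H(KC) - H(C)
assert abs(H_K_given_C - (H(pk) + H(pm) - H(pc))) < 1e-12
```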

**Theorem 4.6** *Let* R = {*P*,*C*, *K*, *E*, *D*} *be a cryptosystem with* |*C*|=|*P*|*, and let r be the redundancy of P. Then the expected number Sn of pseudo keys for a ciphertext string of given length n satisfies*


$$S\_n \ge \frac{2^{H(K)}}{|P|^{nr}} - 1.\tag{4.21}$$

*Proof* From the definition and properties of product space,

$$\mathcal{R}\_n = \{P^n, C^n, K, E\_n, D\_n\}$$

also constitutes a cryptosystem. By Theorem 4.5, then

$$H(K|C^n) = H(K) + H(P^n) - H(C^n).$$

By (3.9), we have

$$H(C^n) \le n \log\_2 |C|, \ |C| = |P|.$$

Replacing the information space *X* by *P*, we have

$$\begin{aligned} H(P^n) &= H(P^{n-1}) + H(P|P^{n-1}) \\ &= H(P^{n-1}) + H\_n \\ &\ge H(P^{n-1}) + H\_{\infty} .\end{aligned}$$

So we have

$$H(P^n) \ge nH\_{\infty} = n(1-r)H\_0 = n(1-r)\log\_2|P|.\tag{4.22}$$

Combined with the above formula, we have an estimate

$$H(K|C^n) \ge H(K) + n(1 - r)\log\_2|P| - n\log\_2|P|.\tag{4.23}$$

By definition,

$$\begin{aligned} H(K|C^n) &= -\sum\_{\mathbf{y}\in C^n} \sum\_{k\in K} p(k\mathbf{y}) \log\_2 p(k|\mathbf{y}) \\ &= -\sum\_{\mathbf{y}\in C^n} p(\mathbf{y}) \sum\_{k\in K} p(k|\mathbf{y}) \log\_2 p(k|\mathbf{y}) \\ &= -\sum\_{\mathbf{y}\in C^n} p(\mathbf{y}) \sum\_{k\in K(\mathbf{y})} p(k|\mathbf{y}) \log\_2 p(k|\mathbf{y}) .\end{aligned}$$

We get

$$\sum\_{k \in K(\mathbf{y})} p(k|\mathbf{y}) = \sum\_{k \in K} p(k) = 1.$$

Then by Jensen inequality,

$$\begin{aligned} H(K|C^n) &\le \sum\_{\mathbf{y}\in C^n} p(\mathbf{y}) \log\_2 |K(\mathbf{y})| \\ &\le \log\_2 \sum\_{\mathbf{y}\in C^n} p(\mathbf{y}) |K(\mathbf{y})| \\ &= \log\_2(S\_n + 1). \end{aligned}$$

Finally, (4.21) is obtained by combining this with (4.23), which completes the proof!

When the mathematical expectation of the number of pseudo keys is greater than 0, a ciphertext-only attack cannot break the cipher in theory, so we define the unique solution distance of a cryptosystem as the smallest value of *n* for which *Sn* = 0.

**Definition 4.5** A cryptosystem whose unique solution distance is infinite is called an ideal security system.

From Theorem 4.6, we can obtain an approximate value of the unique solution distance:

$$n\_0 \approx \frac{H(K)}{r \log\_2 |P|}.$$

The unique solution distance indicates the minimum amount of ciphertext from which an exhaustive attack may succeed in recovering the key. Generally speaking, the greater the unique solution distance, the better the confidentiality of the system. However, Shannon only established the existence of the unique solution distance and did not give a specific calculation procedure. In practice, the amount of ciphertext required to break a cipher is far greater than the theoretical value of the unique solution distance.
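As a concrete illustration (ours, not from the text), the classical estimate for a simple substitution cipher on the 26-letter English alphabet follows from this formula, with the assumed values *H*(*K*) = log2(26!) ≈ 88.4 bits and redundancy *r* ≈ 0.75 for English:

```python
import math

# Hypothetical estimate of n0 = H(K) / (r log2 |P|) for a simple
# substitution cipher on English. Assumed: H(K) = log2(26!), r = 0.75.
H_K = math.log2(math.factorial(26))   # ≈ 88.4 bits of key entropy
r = 0.75                              # assumed redundancy of English
P_size = 26

n0 = H_K / (r * math.log2(P_size))
print(n0)   # roughly 25 letters of ciphertext
```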

# **4.4 Message Authentication**

The authentication system, also known as the authentication code, is an important tool to ensure the authenticity and integrity of messages. In 1984, Simmons systematically put forward the information theory of authentication systems for the first time. He used mathematics to study the theoretical and practical security of authentication systems, put forward the performance limits of authentication systems, and stated the mathematical principles that should be followed in the design of authentication codes. Although Simmons' theory is not yet mature and perfect, its position in authentication theory is as important as Shannon's theory in cryptography, and it lays a theoretical foundation for the mathematical study of authentication systems.

In cryptography, authentication system includes entity authentication and message authentication. We mainly discuss message authentication system. At present, there are two main models of authentication system. One is the arbiter-free authentication system model. In this model, the participants of the system are mainly message sender, message receiver and attacker, in which the message sender and receiver trust each other. They share the same key information; another model is the authentication system model with arbiter. In this model, the participants of the system have arbiters in addition to the information sender, receiver and attacker. At this time, the sender and receiver of the message do not trust each other, but they all trust the arbiter. The arbiter shares the key information with the sender and receiver.

An authentication system without privacy (confidentiality) function and without an arbiter is composed of four parts: a finite set *S* of source states, called the source set; a finite set *A* of authentication tags, called the tag set; a key space *K* composed of all possible keys; and an authentication rule set *E* = {*ek*(*s*)|*k* ∈ *K*, *s* ∈ *S*}, where for any *k* ∈ *K*, *ek* is an authentication rule, a mapping of *S* → *A*.

**Definition 4.6** An authentication system is *T* = {*S*, *A*, *K*, *E*}, where *S*, *A*, *K* are information spaces: *S* is the source space or source set, *A* is the tag space or tag set, and *K* is the key space, where *S* and *K* are statistically independent, and

$$E = \{e\_k(s) | k \in K, s \in \mathcal{S}\}.$$

Each *ek* is an injection of *S* → *A*, which is called an authentication rule.

**Definition 4.7** The product space *S A* is called the message space, and *M* represents *S A*.

Authentication protocol: The sender and receiver of a message use the following protocol to transmit information. First, they secretly select and share a random key *k* ∈ *K*. If the sender wants to transmit a source state *s* ∈ *S* to the receiver, the sender computes *a* = *ek*(*s*) and sends the message *sa* ∈ *M* to the receiver. When the receiver receives the message *sa*, he computes *a*′ = *ek*(*s*) again; if *a*′ = *a*, he confirms that the message is authentic and accepts it, otherwise he rejects the message *sa*.

**Definition 4.8** The matrix [*ek*(*s*)] of order |*K*| × |*S*| is called the authentication matrix. Its rows are indexed by keys *k* ∈ *K* and its columns by source states *s* ∈ *S*; the element at the intersection of row *k* and column *s* is *ek*(*s*).

Authentication matrix is an important tool in authentication theory research. Our detailed list is as follows:

Let $K = \{k_1, k_2, \ldots, k_n\}$, $S = \{s_1, s_2, \ldots, s_m\}$. Then the authentication matrix is the following $n \times m$ matrix:

$$\begin{bmatrix} e_{k_1}(s_1) & e_{k_1}(s_2) & \cdots & e_{k_1}(s_m) \\ e_{k_2}(s_1) & e_{k_2}(s_2) & \cdots & e_{k_2}(s_m) \\ \vdots & \vdots & & \vdots \\ e_{k_n}(s_1) & e_{k_n}(s_2) & \cdots & e_{k_n}(s_m) \end{bmatrix}_{n \times m}.$$
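For concreteness, the authentication matrix of Definition 4.8 can be tabulated for a toy system, a hypothetical example with source set $S = \{0, 1\}$, keys $k = (k_1, k_2) \in \mathbb{Z}_2^2$, and rule $e_k(0) = k_1$, $e_k(1) = k_2$:

```python
from itertools import product

S = [0, 1]                                   # source states
K = list(product([0, 1], repeat=2))          # key space Z_2^2

def e(k, s):
    return k[s]                              # e_k(0) = k_1, e_k(1) = k_2

# rows indexed by keys, columns by source states (Definition 4.8)
matrix = [[e(k, s) for s in S] for k in K]
for k, row in zip(K, matrix):
    print(k, row)
```

Each column of this matrix takes every tag value equally often, which is why the system resists forgery as well as possible for its tag size.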

# **4.5 Forgery Attack**

In the process of message authentication, the attacker is an intruder in the middle. We usually consider two types of attacks, forgery attacks and substitution attacks, which correspond to ciphertext-only attacks and known-plaintext attacks on a cryptosystem. In a forgery attack, the attacker injects a message $sa \in M$ into the channel and wants the receiver to accept it as authentic; in a substitution attack, the attacker first observes a message $sa \in M$ in the channel, analyzes the coding rule currently in use, then replaces $sa$ with a message $s'a' \in M$, where $s' \neq s$, and wants the receiver to accept it as authentic.

We assume that the attacker adopts an optimal deception strategy. Let $p_{d_0}$ denote the maximum probability that a forgery attack deceives the receiver, and $p_{d_1}$ the maximum probability that a substitution attack deceives the receiver. The overall probability $p_d$ of successful deception is defined as

$$p\_d = \max\{p\_{d\_0}, p\_{d\_1}\}.\tag{4.24}$$

Simmons' theory mainly estimates lower bounds for $p_d$, providing a theoretical basis for constructing authentication codes whose deception probability $p_d$ is as small as possible.

First, let us look at the definition and estimation of the maximum probability $p_{d_0}$ of successful deception by a forgery attacker.

Let $A = \{a_1, a_2, \ldots, a_r\}$ denote the authentication tag space. The attacker selects a source state $s \in S$ and an authentication tag $a \in A$. Let $k_0 \in K$ denote the shared key selected by the sender and receiver; if $a = e_{k_0}(s)$, the forgery deceives the receiver. We use $\text{pay off}(s, a)$ to denote the probability that the receiver accepts $sa$ as an authentic message, that is

$$\text{pay off } (s, a) = p(a = e\_{k\_0}(s)) = \sum\_{\substack{k \in K \\ e\_k(s) = a}} p(k). \tag{4.25}$$

If the attacker adopts the optimal strategy, then

$$p\_{d\_0} = \max\{\text{pay off } (s, a) | s \in S, a \in A\}. \tag{4.26}$$

**Theorem 4.7** *If the authentication tag space has size* $|A| = r$, *then for any fixed source state* $s \in S$ *there is always an authentication tag* $a \in A$ *such that*

$$\text{pay off}(s, a) \ge \frac{1}{r}, \quad \text{thus} \quad p_{d_0} \ge \frac{1}{r}.$$

*Proof* By the definition of pay off (*s*, *a*),

166 4 Cryptosystem and Authentication System

$$\sum_{a \in A} \text{pay off}(s, a) = \sum_{a \in A} \sum_{\substack{k \in K \\ e_k(s) = a}} p(k).$$

When $a$ runs through the column of the authentication matrix indexed by $s$, $k$ traverses the whole key space, so

$$\sum\_{a \in A} \text{pay off } (s, a) = \sum\_{k \in K} p(k) = 1.$$

Therefore, there is at least one *a* ∈ *A* such that

$$\text{pay off } (s, a) \ge \frac{1}{|A|} = \frac{1}{r} \text{.}$$

**Theorem 4.8** *Let T* = {*S*, *A*, *K*, *E*} *be a message authentication system,*

$$p_{d_0} = \max\{\text{pay off}(s, a) \mid s \in S, a \in A\}$$

*is the maximum probability of successful forgery attack, then*

$$
\log\_2 p\_{d\_0} \ge H(K|\mathcal{S}A) - H(K).
$$

*and*

$$p\_{d\_0} \ge \frac{1}{2^{H(K) - H(K|SA)}}.$$

*Proof* By definition, $p_{d_0}$ is not less than the mathematical expectation of $\text{pay off}(s, a)$, that is

$$p\_{d\_0} \ge \sum\_{s \in S, a \in A} p(sa) \text{pay off } (s, a).$$

Then by Jensen inequality, we have

$$\begin{aligned} \log\_2 p\_{d\_0} &\geq \log\_2 \sum\_{s \in S, a \in A} p(sa) \text{pay off } (s, a) \\ &\geq \sum\_{s \in S, a \in A} p(sa) \log\_2 \text{pay off } (s, a) .\end{aligned}$$

Obviously,

$$\text{pay off } (\mathbf{s}, a) = p(a|\mathbf{s}).$$

Thus

$$p(sa) = p(\mathbf{s})p(a|\mathbf{s}) = p(\mathbf{s}) \text{pay off } (\mathbf{s}, a). \tag{4.27}$$

So

$$\begin{aligned} \log\_2 p\_{d\_0} &\geq \sum\_{s \in S} \sum\_{a \in A} p(sa) \log\_2 \text{pay off } (s, a) \\ &= \sum\_{s \in S} \sum\_{a \in A} p(sa) \log\_2 p(a|s) \\ &= -H(A|S). \end{aligned}$$

Because the source space *S* and the key space *K* are statistically independent, so

$$H(\mathbb{S}K) = H(K) + H(\mathbb{S}).$$

Also, the tag space *A* is completely determined by the source space *S* and the key space *K*, so

$$H(A|KS) = 0.$$

By the addition formula of information space,

$$\begin{aligned} H(KAS) &= H(AS) + H(K|AS) \\ &= H(S) + H(A|S) + H(K|AS). \end{aligned}$$

On the other hand,

$$\begin{aligned} H(KAS) &= H(KS) + H(A|KS) \\ &= H(KS) = H(K) + H(S) .\end{aligned}$$

On the whole, we have

$$-H(A|S) = H(K|AS) - H(K).$$

Thus

$$
\log_2 p_{d_0} \ge -H(A|S) = H(K|AS) - H(K).
$$

We completed the proof of the theorem.

$M = S \times A$ is called the message space. It can be seen from Theorem 4.8 that the maximum success probability $p_{d_0}$ of a forgery attack satisfies

$$p_{d_0} \ge \frac{1}{2^{I(K,M)}},$$

where $I(K, M)$ is the average mutual information between the key space and the message space. The larger the mutual information $I(K, M)$, the lower the maximum success probability of a forgery attack; conversely, the smaller the mutual information, the higher the success rate of a forgery attack.
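The quantities in (4.25) and (4.26) can be computed exactly for a small example. The sketch below assumes a toy system (uniform keys $(k_1, k_2) \in \mathbb{Z}_2^2$ with rule $e_k(0) = k_1$, $e_k(1) = k_2$); it is an illustration of the definitions, not part of Simmons' theory.

```python
from itertools import product
from fractions import Fraction

S = [0, 1]
A = [0, 1]
K = list(product([0, 1], repeat=2))          # key space Z_2^2
p_k = Fraction(1, len(K))                    # uniform key distribution

def e(k, s):
    return k[s]

def payoff(s, a):
    # formula (4.25): sum of p(k) over the keys with e_k(s) = a
    return sum((p_k for k in K if e(k, s) == a), Fraction(0))

p_d0 = max(payoff(s, a) for s in S for a in A)   # formula (4.26)
assert p_d0 == Fraction(1, 2)                # equals 1/|A|, the bound of Theorem 4.7
```

Here every payoff equals $1/2 = 1/|A|$, so the bound of Theorem 4.7 is attained with equality.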

# **4.6 Substitute Attack**

In a substitution attack, the attacker first observes a message $(s, a)$ in the channel and then replaces it with a message $(s', a')$, hoping that the receiver will accept $(s', a')$ as authentic. The maximum success probability $p_{d_1}$ of a substitution attack is harder to analyze than that of a forgery attack, mainly because $p_{d_1}$ depends on the probability distributions of both the source state space $S$ and the key space $K$.

Let $(s', a')$ and $(s, a)$ be two messages with $s' \neq s$. We use $\text{pay off}(s', a', s, a)$ to denote the probability that substituting $(s', a')$ for $(s, a)$ deceives the receiver; then

$$\text{pay off}(s', a', s, a) = p(a' = e_{k_0}(s') \mid a = e_{k_0}(s)), \quad k_0 \in K.$$

The above formula is the conditional probability of $a' = e_{k_0}(s')$ given $a = e_{k_0}(s)$ under the same key $k_0$, so

$$\begin{split} \text{pay off}(s', a', s, a) &= \frac{p(a' = e_{k_0}(s'), a = e_{k_0}(s))}{p(a = e_{k_0}(s))} \\ &= \frac{\sum_{\substack{k \in K \\ e_k(s') = a', \; e_k(s) = a}} p(k)}{\text{pay off}(s, a)}. \end{split} \tag{4.28}$$

When the message (*s*, *a*) ∈ *M* is given, the attacker uses the optimal strategy to maximize the success probability of the deceiver, so let

$$p\_{s,a} = \max\{\text{pay off } (s', a', s, a) | s' \in S, s' \neq s, a' \in A\},\tag{4.29}$$

Taking $p_{s,a}$ as a random variable, its mathematical expectation over the message set $M = S \times A$ is

$$p_{d_1} = \sum_{s \in S, a \in A} p(sa) \, p_{s,a}. \tag{4.30}$$

Formula (4.30) is the formal definition of $p_{d_1}$: the weighted average over the message space $M$ of the maximum success probability $p_{s,a}$ of $\text{pay off}(s', a', s, a)$.
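The quantities in (4.28), (4.29) and (4.30) can likewise be evaluated exactly on a toy system (uniform keys in $\mathbb{Z}_2^2$, uniform source, rule $e_k(0) = k_1$, $e_k(1) = k_2$; an illustrative assumption, not a system from the text):

```python
from itertools import product
from fractions import Fraction

S, A = [0, 1], [0, 1]
K = list(product([0, 1], repeat=2))          # key space Z_2^2
p_k = Fraction(1, len(K))                    # uniform keys
p_s = Fraction(1, len(S))                    # uniform source

def e(k, s):
    return k[s]

def payoff(s, a):
    # formula (4.25)
    return sum((p_k for k in K if e(k, s) == a), Fraction(0))

def payoff_sub(s_, a_, s, a):
    # formula (4.28): conditional success of substituting (s', a') for (s, a)
    num = sum((p_k for k in K if e(k, s) == a and e(k, s_) == a_), Fraction(0))
    return num / payoff(s, a)

def p_sa(s, a):
    # formula (4.29): attacker's best choice of (s', a')
    return max(payoff_sub(s_, a_, s, a) for s_ in S if s_ != s for a_ in A)

# formula (4.30), using p(sa) = p(s) * payoff(s, a) from (4.27)
p_d1 = sum(p_s * payoff(s, a) * p_sa(s, a) for s in S for a in A)
assert p_d1 == Fraction(1, 2)                # substitution fares no better here
```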

Like Theorem 4.7, we have

**Theorem 4.9** *Let* $T = \{S, A, K, E\}$ *be an authentication code with* $|A| = r$. *Then for any given* $s, s' \in S$ *with* $s' \neq s$ *and any* $a \in A$, *there is a tag* $a' \in A$ *such that*

$$\text{pay off}(s', a', s, a) \ge \frac{1}{|A|} = \frac{1}{r}.$$


*So we have*

$$p\_{d\_1} \geq \frac{1}{r}.$$

*Proof* By (4.28),

$$\begin{aligned} \sum_{a' \in A} \text{pay off}(s', a', s, a) &= \frac{1}{\text{pay off}(s, a)} \sum_{a' \in A} \sum_{\substack{k \in K \\ e_k(s) = a, \; e_k(s') = a'}} p(k) \\ &= \frac{1}{\text{pay off}(s, a)} \sum_{\substack{k \in K \\ e_k(s) = a}} p(k) = 1. \end{aligned}$$

So there is at least one $a' \in A$ such that

$$\text{pay off } (s', a', s, a) \ge \frac{1}{|A|} = \frac{1}{r}.$$

By the definition of $p_{s,a}$, for all $s \in S$ and $a \in A$, we have

$$p_{s,a} \ge \frac{1}{|A|} = \frac{1}{r}.$$

Thus

$$p_{d_1} \ge \sum_{s \in S, a \in A} p(sa) \, p_{s,a} \ge \frac{1}{r} \sum_{s \in S, a \in A} p(sa) = \frac{1}{r}.$$

**Theorem 4.10** *Let* $T = \{S, A, K, E\}$ *be an authentication code, and for each* $(s, a) \in M$ *consider a substitution attack replacing it with some* $(s', a')$. *Let* $p_{d_1}$ *be the mathematical expectation of* $p_{s,a}$ *over the space* $M$. *Then*

$$
\log\_2 p\_{d\_1} \ge H(K|M^2) - H(K|M).
$$

*and*

$$p\_{d\_1} \ge \frac{1}{2^{H(K|M) - H(K|M^2)}}.\tag{4.31}$$

*Proof* By (4.29), *ps*,*<sup>a</sup>* will not be less than the mathematical expectation of pay off (*s* , *a* ,*s*, *a*) on *s* ∈ *S*, *a* ∈ *A*, that is

$$p\_{s,a} \ge \sum\_{s' \in S, a' \in A} p(s'a'|sa) \text{pay off } (s', a', s, a).$$

By (4.30) and Jensen inequality, we have

$$\begin{split} \log\_2 p\_{d\_1} &\geq \sum\_{s \in S, a \in A} p(sa) \log\_2 p\_{s, a} \\ &\geq \sum\_{s \in S, a \in A} p(sa) \sum\_{s' \in S, a' \in A} p(s'a'|sa) \log\_2 \text{pay off } (s', a', s, a) \\ &= \sum\_{s \in S, a \in A} \sum\_{s' \in S, a' \in A} p(sa s' a') \log\_2 \text{ pay off } (s', a', s, a) \\ &= \sum\_{s \in S, a \in A} \sum\_{s' \in S, a' \in A} p(sa s' a') \log\_2 p(a' s' | as) \\ &= -H(M | M). \end{split}$$

In addition,

$$\begin{aligned} H(KM^2) &= H(M) + H(M|M) + H(K|M^2) \\ &= H(M) + H(K|M) + H(M|KM). \end{aligned}$$

So there are

$$-H(M|M) = H(K|M^2) - H(K|M) - H(M|KM).$$

It can be proved that

$$H(M|KM) = 0.$$

So there are

$$-H(M|M) = H(K|M^2) - H(K|M).$$

Thus

$$
\log\_2 p\_{d\_1} \ge H(K|M^2) - H(K|M).
$$

That is

$$p\_{d\_1} \geq \frac{1}{2^{H(K|M) - H(K|M^2)}}.$$

The theorem holds.

**Definition 4.9** An authentication code {*S*, *A*, *K*, *E*} is called perfect if

$$p\_d = 2^{H(K \mid M) - H(K)}.$$

**Theorem 4.11** *Perfect authentication systems exist.*

*Proof* The theorem is proved directly by the construction method. First, let the source state space be *S* = {0, 1}. Let *N* be a positive even number, and define the label space *A* and the key space *K* as follows:

$$A = \mathbb{Z}\_2^{\frac{N}{2}} = \{a\_1 a\_2 \cdots a\_{\frac{N}{2}} | a\_i \in \mathbb{Z}\_2, \ 1 \le i \le \frac{N}{2}\}$$

and

$$K = \mathbb{Z}\_2^N = \{k\_1 k\_2 \cdots k\_N | k\_i \in \mathbb{Z}\_2, 1 \le i \le N\}.$$

The authentication rule $e_k(s)$ determined by $k = k_1 k_2 \cdots k_{\frac{N}{2}} k_{\frac{N}{2}+1} \cdots k_N$ is defined as

$$e\_k(0) = k\_1 k\_2 \cdots k\_{\frac{N}{2}}$$

and

$$e\_k(1) = k\_{\frac{N}{2}+1} \cdots k\_N.$$

Assuming that all $2^N$ keys $k$ are selected with equal probability, for $s \in S$ and $a \in A$ we have

$$\text{pay off}(s, a) = p(a = e_k(s)) = 2^{-\frac{N}{2}}.$$

So $p_{d_0} = 2^{-\frac{N}{2}}$, and similarly $p_{d_1} = 2^{-\frac{N}{2}}$, so

$$p_d = 2^{-\frac{N}{2}}.$$

It is easy to calculate

$$H(K|M) - H(K) = \frac{N}{2} - N = -\frac{N}{2}.$$

So

$$p\_d = 2^{H(K \mid M) - H(K)}.$$

Therefore, {*S*, *A*, *K*, *E*} is a perfect authentication system.
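The construction in this proof is easy to program. The sketch below instantiates it with $N = 8$; the helper names are ours, not the text's.

```python
import secrets

N = 8                                        # any positive even N

def keygen():
    # key k = k_1 k_2 ... k_N: N uniform random bits
    return [secrets.randbelow(2) for _ in range(N)]

def e(k, s):
    # e_k(0) = first N/2 key bits, e_k(1) = last N/2 key bits
    half = N // 2
    return k[:half] if s == 0 else k[half:]

k = keygen()
s = 1
msg = (s, e(k, s))                           # message sa
assert msg[1] == e(k, msg[0])                # receiver's check passes
# without k, a forged or substituted tag matches with probability 2^(-N/2) = p_d
```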

# **4.7 Basic Algorithm**

# *4.7.1 Affine Transformation*

Encryption with matrices originates from the classical Vigenère cipher. Let $X = \{a_1, a_2, \ldots, a_N\}$ be a plaintext alphabet of $N$ characters; we identify the characters of $X$ with the elements of $\mathbb{Z}_N$, the residue class ring mod $N$. Let $P = \mathbb{Z}_N^k$ be the plaintext space; $x = x_1 x_2 \cdots x_k \in P$ is called a plaintext unit, or a plaintext message of length $k$. Let $M_k(\mathbb{Z}_N)$ be the ring of $k$-order matrices over $\mathbb{Z}_N$, let $A \in M_k(\mathbb{Z}_N)$ be an invertible matrix of order $k$, and let $b = b_1 b_2 \cdots b_k \in \mathbb{Z}_N^k$ be a given fixed vector. Each plaintext unit $x = x_1 x_2 \cdots x_k$ in $P$ is encrypted by the affine transformation $(A, b)$:

$$
\begin{pmatrix} \mathbf{x}\_1'\\ \mathbf{x}\_2'\\ \vdots \\ \mathbf{x}\_k' \end{pmatrix} = A \begin{pmatrix} \mathbf{x}\_1\\ \mathbf{x}\_2\\ \vdots \\ \mathbf{x}\_k \end{pmatrix} + \begin{pmatrix} b\_1\\ b\_2\\ \vdots \\ b\_k \end{pmatrix}. \tag{4.32}
$$

where $x = x_1 x_2 \cdots x_k$ is the plaintext and $x' = x'_1 x'_2 \cdots x'_k$ is the ciphertext. The decryption algorithm is:

$$
\begin{pmatrix} x\_1 \\ x\_2 \\ \vdots \\ x\_k \end{pmatrix} = A^{-1} \begin{pmatrix} x\_1' \\ x\_2' \\ \vdots \\ x\_k' \end{pmatrix} - A^{-1} \begin{pmatrix} b\_1 \\ b\_2 \\ \vdots \\ b\_k \end{pmatrix} \tag{4.33}
$$

Because the affine transformation $(A, b)$ is a 1–1 correspondence $\mathbb{Z}_N^k \to \mathbb{Z}_N^k$ with inverse transformation $(A^{-1}, -A^{-1}b)$, using affine transformations $(A, b)$ we obtain the so-called higher-order affine cryptosystem. This cryptosystem was first proposed by the mathematician Lester Hill in the American Mathematical Monthly in 1929, so it is also called the Hill cipher.

The Hill cipher divides the plaintext into groups of $k$ characters and then encrypts each plaintext unit in turn using a $k$-order affine transformation $(A, b)$ over $\mathbb{Z}_N$. The advantage of this cipher is that it hides the statistical characteristics of single characters (such as the 26 letters of English), so it resists frequency analysis well and has strong resistance to ciphertext-only attacks. However, given a large amount of matching plaintext, it is not difficult to recover the key $(A, b)$, so the Hill cipher is weak against known-plaintext attacks.
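A Hill cipher with $k = 2$ and $N = 26$ can be sketched as follows; the matrix $A$ and vector $b$ are illustrative values, and the inverse follows the $k = 2$ formula appearing in the proof of Lemma 4.2 below.

```python
N = 26

A = [[3, 3], [2, 5]]                         # det = 9, gcd(9, 26) = 1
b = [1, 2]

def inv_2x2(M):
    # A^{-1} = D_1 * [[d, -b], [-c, a]] mod N, where D_1 * det(M) = 1 (mod N)
    d = (M[0][0] * M[1][1] - M[0][1] * M[1][0]) % N
    d1 = pow(d, -1, N)                       # modular inverse of the determinant
    return [[ d1 * M[1][1] % N, -d1 * M[0][1] % N],
            [-d1 * M[1][0] % N,  d1 * M[0][0] % N]]

def encrypt(x):                              # x' = A x + b  (mod N), eq. (4.32)
    return [(A[i][0] * x[0] + A[i][1] * x[1] + b[i]) % N for i in range(2)]

def decrypt(y):                              # x = A^{-1} y - A^{-1} b, eq. (4.33)
    Ai = inv_2x2(A)
    t = [(y[0] - b[0]) % N, (y[1] - b[1]) % N]
    return [(Ai[i][0] * t[0] + Ai[i][1] * t[1]) % N for i in range(2)]

x = [7, 4]                                   # plaintext unit "HE"
assert decrypt(encrypt(x)) == x
```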

The mathematical principles used by the Hill cipher are the following two conclusions.

**Lemma 4.1** *The set of all k-order affine transformations on* Z*<sup>N</sup> is written as Gk , that is*

> $G_k = \{(A, b) \mid A$ *is a* $k$*-order invertible square matrix over* $\mathbb{Z}_N$, $b \in \mathbb{Z}_N^k\}$.

*Then* $G_k$ *forms a group under composition of transformations, called the* $k$*-order affine transformation group of the ring* $\mathbb{Z}_N$*.*

*Proof* Take $A$ to be the $k$-order identity matrix $E$ and $b = 0$ the $k$-dimensional zero vector; then $(E, 0)$ is the identity transformation of $\mathbb{Z}_N^k \to \mathbb{Z}_N^k$ and the unit element of $G_k$. Next, consider the product of two affine transformations $(A_1, b_1)$ and $(A_2, b_2)$:

$$(A\_1, b\_1)(A\_2, b\_2) = (A\_1 A\_2, A\_1 b\_2 + b\_1) \in G\_k.$$

Obviously, the inverse transformation of (*A*, *b*) is

$$(A,b)^{-1} = (A^{-1}, -A^{-1}b) \in G\_k.$$

Therefore, *Gk* is a group. The Lemma holds.

From the above lemma, any element $(A, b) \in G_k$ of the affine transformation group yields a Hill cipher. If we select $n$ group elements $(A_1, b_1), (A_2, b_2), \ldots, (A_n, b_n)$ in $G_k$ and let

$$(A, b) = \prod\_{i=1}^{n} (A\_i, b\_i).$$

Using (*A*, *b*) to encrypt, we get a more complex Hill cryptosystem.

**Lemma 4.2** *Let* $A \in M_k(\mathbb{Z}_N)$ *and let* $|A| = D$ *be the determinant of* $A$. *Then* $A$ *is invertible if and only if* $D$ *and* $N$ *are coprime, that is,* $(D, N) = 1$*.*

*Proof* If $(D, N) = 1$, then there is $D_1$ such that $D_1 D \equiv 1 \pmod N$; let

$$A^* = D_1 \begin{bmatrix} A_{11} & A_{21} & \cdots & A_{k1} \\ A_{12} & A_{22} & \cdots & A_{k2} \\ \vdots & \vdots & & \vdots \\ A_{1k} & A_{2k} & \cdots & A_{kk} \end{bmatrix}, \tag{4.34}$$

where $A = (a_{ij})_{k \times k}$ and $A_{ij}$ is the algebraic cofactor of $a_{ij}$. Obviously, we have

$$A^\*A = AA^\* = \begin{bmatrix} 1 \ 0 \ \cdots \ 0 \\ 0 \ 1 \ \cdots \ 0 \\ \vdots \ \vdots \\ 0 \ 0 \ \cdots \ 1 \end{bmatrix},$$

So $A$ is invertible with $A^{-1} = A^*$. Let us take $k = 2$ as an example: if $|A| = D$ and $(D, N) = 1$,

$$A = \begin{pmatrix} a \ b \\ c \ d \end{pmatrix} \Longrightarrow A^{-1} = \begin{pmatrix} D\_1 d & -D\_1 b \\ -D\_1 c & D\_1 a \end{pmatrix}.$$

Conversely, if $A$ is invertible with inverse matrix $A^{-1}$, then because $A^{-1}A = AA^{-1} = E$, we get

$$|AA^{-1}| = |A||A^{-1}| \equiv 1 (\text{mod } N).$$

So we have (*D*, *N*) = 1. The Lemma holds.

If $k = 1$, the first-order affine cipher $x' \equiv ax + b \pmod N$, where $(a, N) = 1$, contains many famous classical ciphers. In particular, when $a = 1$, $b = 3$, $N = 26$, the transformation $x' = x + 3 \pmod{26}$ is the famous Caesar cipher.
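The $k = 1$ case fits in a few lines; with $a = 1$, $b = 3$ the sketch below reproduces the Caesar cipher (the alphabet handling and function names are our assumptions):

```python
def affine_encrypt(text, a, b, N=26):
    # x' = a*x + b (mod N) on uppercase letters A..Z
    return "".join(chr((a * (ord(c) - 65) + b) % N + 65) for c in text)

def affine_decrypt(text, a, b, N=26):
    # x = a^{-1} * (x' - b) (mod N); requires gcd(a, N) = 1
    a_inv = pow(a, -1, N)
    return "".join(chr((a_inv * (ord(c) - 65 - b)) % N + 65) for c in text)

assert affine_encrypt("ATTACK", 1, 3) == "DWWDFN"   # Caesar shift by 3
assert affine_decrypt(affine_encrypt("ATTACK", 5, 8), 5, 8) == "ATTACK"
```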

Next, we analyze the computational complexity of affine ciphers. We have the following lemma.

**Lemma 4.3** *If* $A = (a_{ij})_{k \times k}$ *is a* $k$*-order invertible square matrix over* $\mathbb{Z}_N$*, the number of bit operations needed to compute* $A^{-1}$ *is estimated as follows:*

$$\operatorname{Time}(A^{-1}) = O(k^4 k! \log^3 N).$$

*Therefore, when k is a fixed constant, the algorithm for finding A*−<sup>1</sup> *is polynomial. When N is a fixed constant, the algorithm for finding A*−<sup>1</sup> *is exponential. In other words, the greater the order of the matrix, the higher the computational complexity.*

*Proof* Since $A = (a_{ij})_{k \times k}$ is invertible, its determinant is

$$D = |A| = \sum_{j_1 j_2 \cdots j_k} (-1)^{\tau(j_1 j_2 \cdots j_k)} a_{1 j_1} a_{2 j_2} \cdots a_{k j_k},$$

where $j_1 j_2 \cdots j_k$ runs over the permutations of $1, 2, \ldots, k$ and $\tau(j_1 j_2 \cdots j_k)$ is the number of inversions of the permutation. The number of bit operations for each summand is $O(k^3 \log^2 N)$, and there are $k!$ summands in total, thus

$$\text{Time}(D) = O(k^3 k! \log^2 N).$$

By Lemma 1.5 of Chapter 1, finding the multiplicative inverse $D_1 = D^{-1} \bmod N$ of $D$ under mod $N$ costs

$$\text{Time}(D_1) = O(\log^3 N).$$

The number of bit operations for each algebraic cofactor $A_{ij}$ of the adjoint matrix $A^*$ in formula (4.34) is $O((k-1)^3 (k-1)! \log^2 N)$, and there are $k^2$ algebraic cofactors, thus

$$\operatorname{Time}(A^\*) = O(k^4 k! \log^2 N).$$

So

$$\text{Time}(A^{-1}) = O(k^4 k! \log^3 N).$$

When $k$ is constant, the algorithm for finding $A^{-1}$ is polynomial; when $N$ is constant and $k \to \infty$, it is clearly exponential. The Lemma holds.

# *4.7.2 RSA*

In 1976, Diffie and Hellman of Stanford University put forward a new idea for cryptosystem design: in short, to design the encryption algorithm and decryption algorithm on a principle of asymmetry. We can illustrate it with the following schematic diagram:


$$P \xrightarrow{f} C \xrightarrow{f^{-1}} P,\tag{4.35}$$

where $P$ is the plaintext space, $C$ is the ciphertext space, $f$ is the encryption algorithm and $f^{-1}$ the decryption algorithm. If $f$ and $f^{-1}$ are the same algorithm (such as the involution operation in a binary system), or if the encryption algorithm $f$ easily yields the decryption algorithm $f^{-1}$ (for example, the matrix encryption algorithm of the previous section when the matrix order is very small), the cipher is called a symmetric cryptosystem. The essence of a symmetric cryptosystem is that the encryption key and the decryption key are equally sensitive. Diffie and Hellman proposed that if $f \neq f^{-1}$, with $f$ an encryption algorithm that is easy to compute while $f^{-1}$ is very difficult to compute, then the key can be divided into an encryption key and a decryption key: even if the encryption key is made public, the security of the decryption key is not affected. Such an encryption algorithm $f$ is called a trapdoor one-way function, and a cipher using such an asymmetric $f$ is called an asymmetric cipher or public key cryptosystem. Thanks to the bold innovation of Diffie and Hellman, cryptography entered a new era, the era of public key cryptography. Its basic feature is that ciphers changed from serving few users to serving many users, which greatly improves the efficiency and social value of cryptography.

How can one design an asymmetric encryption algorithm? Rivest, Shamir and Adleman jointly put forward the first secure and practical one-way encryption algorithm, known in academic circles as the RSA algorithm. This public key cryptosystem has been widely used in cryptographic design and has become an international standard algorithm. Besides being simple and practical, its security rests entirely on the difficulty of factoring huge integers into large primes.

Let *p*, *q* be two large and relatively safe prime numbers, assume

$$10^{300} < p, \; q, \quad \text{with binary length} \; k > 1024 \; \text{bits}. \tag{4.36}$$

Let *n* = *pq*, ϕ(*n*) be an Euler function, then

$$
\varphi(n) = (p-1)(q-1) = n+1-p-q.
$$

Randomly select a positive integer *e* to satisfy

$$1 < e < \varphi(n), (e, \varphi(n)) = 1. \tag{4.37}$$

The large primes $p$ and $q$, and an $e$ satisfying formula (4.37), are generated randomly; that is, $p$, $q$ and $e$ are selected at random with the help of a computer random number generator (or pseudo-random number generator). The computational complexity of this generation is as follows.

**Lemma 4.4** *Let* $p$ *and* $q$ *be randomly generated large primes,* $n = pq$*,* $\varphi(n)$ *the Euler function, and* $1 < e < \varphi(n)$ *with* $(e, \varphi(n)) = 1$*. Then*


$$\begin{cases} \text{Time } (select \ out \ n) = O(\log^4 n), \\ \text{Time } (find \ e) = O(\log^2 n). \end{cases} $$

*Proof* Use the random number generator to generate a huge integer $m$, say $m > 10^{300}$, and then test whether $m, m+1, m+2, \ldots$ is prime. From the prime number theorem, the density of primes near $m$ is about $O(\frac{1}{\log m})$, so only about $O(\log m)$ tests are needed to find the required prime $p$; by Lemma 1.5 of Chapter 1,

$$\text{Time (find prime } p) = O(\log^2 m) = O(\log^2 n).$$

Similarly,

$$\text{Time (find prime } \mathbf{q}) = O(\log^2 n).$$

Because *n* = *pq*, so

$$\text{Time (select out n)} = O(\log^4 n).$$

After $n$ is determined, $\varphi(n) = (p-1)(q-1)$. A positive integer $a$, $1 < a < \varphi(n)$, is randomly generated by the random number generator, and then whether $a, a+1, a+2, \ldots$ is coprime to $\varphi(n)$ is tested in turn. Again by the prime number theorem, the density of the prime factors of $\varphi(n)$ near $a$ is $O(\frac{1}{\log a})$, so only $O(\log a)$ tests are needed to get the required $e$. Thus

$$\text{Time } (\text{select out e}) = O(\log^2 a) = O(\log^2 n).$$

The Lemma holds.
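The random generation described in Lemma 4.4 can be sketched as follows. The sizes are tiny and the trial-division primality test is a stand-in for a real test; both are illustrative assumptions.

```python
import math
import secrets

def is_prime(m):
    # naive trial division, adequate only for toy sizes
    if m < 2:
        return False
    return all(m % d for d in range(2, math.isqrt(m) + 1))

def next_prime_from(m):
    # scan m, m+1, m+2, ... as in the proof of Lemma 4.4
    while not is_prime(m):
        m += 1
    return m

p = next_prime_from(secrets.randbelow(900) + 100)    # prime near a random start
q = next_prime_from(secrets.randbelow(900) + 1000)   # a second, distinct prime
n, phi = p * q, (p - 1) * (q - 1)

e = secrets.randbelow(phi - 3) + 2       # random start a with 1 < a < phi
while math.gcd(e, phi) != 1:             # scan a, a+1, ... for coprimality
    e = e + 1 if e + 1 < phi else 2
assert 1 < e < phi and math.gcd(e, phi) == 1
```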

After randomly selecting $p$, $q$, $n = pq$ and $e$, since $(e, \varphi(n)) = 1$, there exists $d = e^{-1} \bmod \varphi(n)$, that is,

$$de \equiv 1 \, (\text{mod } \varphi(n)), \, 1 \, < \, d \, < \, \varphi(n). \tag{4.38}$$

**Definition 4.10** After $n = pq$, $e$ and $d$ have been randomly determined, $P_e = (n, e)$ is called the public key and $P_d = (n, d)$ the private key; or simply, $e$ is the public key and $d$ is the private key.

By Lemma 1.5 of Chapter 1, the number of bit operations required to compute $d = e^{-1} \bmod \varphi(n)$ is $\text{Time}(d) = O(\log^3 \varphi(n)) = O(\log^3 n)$. By Lemma 4.4, we have

**Corollary 4.3** *The computational complexity of randomly generated public key Pe* = (*n*, *e*) *and private key Pd* = (*n*, *d*) *is polynomial.*

The key mathematical principle used in the RSA design is the generalized Euler congruence theorem. Let $n \ge 1$ be a positive integer with $(m, n) = 1$; from Euler's theorem it can be seen that

$$m^{\varphi(n)} \equiv 1 \pmod n \Longrightarrow m^{\varphi(n)+1} \equiv m \pmod n. \tag{4.39}$$

We will prove that when $n$ is a positive integer free of square factors, formula (4.39) holds for all positive integers $m$, whether $(m, n) = 1$ or $(m, n) > 1$.

**Lemma 4.5** *If* $n = pq$ *is the product of two different primes, then for all positive integers* $m, k$*, we have*

$$m^{k\varphi(n)+1} \equiv m \pmod n. \tag{4.40}$$

*Proof* If (*m*, *n*) = 1, then by Euler Theorem,

$$m^{k\varphi(n)} \equiv 1 \pmod n \Longrightarrow m^{k\varphi(n)+1} \equiv m \pmod n.$$

We only need to consider the case $(m, n) > 1$. Since $n = pq$, we have $(m, n) = p$, $(m, n) = q$, or $(m, n) = n$. If $(m, n) = n$, then $n \mid m$ and (4.40) holds trivially. Without loss of generality, let $(m, n) = p$; then $m = pt$ with $(t, q) = 1$. By Euler's theorem, since $(m, q) = 1$,

$$m^{\varphi(q)} \equiv 1 \pmod q \Longrightarrow m^{k\varphi(q)\varphi(p)} \equiv 1 \pmod q.$$

For every $k \ge 1$, we have

$$m^{k\varphi(n)} \equiv 1 \pmod q.$$

We write

$$m^{k\varphi(n)} = rq + 1.$$

Multiplying both sides by $m = pt$,

$$m^{k\varphi(n)+1} = rtn + m.$$

The above formula implies

$$m^{k\varphi(n)+1} \equiv m \pmod n.$$

This completes the proof of the lemma.
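Lemma 4.5 is easy to check numerically for a small square-free modulus (the parameters below are illustrative):

```python
p, q = 11, 13                                # square-free n = pq
n, phi = p * q, (p - 1) * (q - 1)

# (4.40) holds for every m in Z_n, whether or not m is coprime to n
for k in range(1, 4):
    for m in range(n):
        assert pow(m, k * phi + 1, n) == m
```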

With the above preparations, the workflow of RSA password can be divided into the following three steps:

(1) Suppose $A$ is an RSA user. $A$ randomly generates two huge primes $p = p(A)$, $q = q(A)$ and sets $n = n(A) = pq$, $\varphi(n) = (p-1)(q-1)$. Then $A$ randomly generates a positive integer $e = e(A)$ satisfying $1 < e < \varphi(n)$, $(e, \varphi(n)) = 1$, and computes $d \equiv e^{-1} \pmod{\varphi(n)}$ with $1 < d < \varphi(n)$. User $A$ destroys the two primes $p$ and $q$ and keeps only the three numbers $n$, $e$, $d$; after publishing $P_e = (n, e)$ as the public key, he keeps the private key $P_d = (n, d)$ strictly confidential.

(2) Another RSA user $B$ sends encrypted information to user $A$ using $A$'s known public key $(n, e)$. $B$ takes $P = \mathbb{Z}_n$ as the plaintext space and encrypts each $m \in \mathbb{Z}_n$. The encryption algorithm $c = f(m)$ is defined as

$$c = f(m) \equiv m^e(\text{mod } n), \ 1 \le c \le n. \tag{4.41}$$

where $c$ is the ciphertext.

(3) After receiving the ciphertext $c$ sent by user $B$, user $A$ decrypts it with his own private key $(n, d)$. The decryption algorithm $f^{-1}$ is defined as:

$$m = f^{-1}(c) \equiv c^d(\text{mod } n), \ 1 \le m \le n. \tag{4.42}$$

User $A$ obtains the plaintext $m$ sent by user $B$. At this point, the RSA cryptosystem has completed encryption and decryption.
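The three steps can be traced with toy parameters (far below the key sizes required by (4.36); all numbers are illustrative):

```python
# Step (1): user A's key generation.
p, q = 61, 53
n, phi = p * q, (p - 1) * (q - 1)            # n = 3233, phi(n) = 3120
e = 17                                       # 1 < e < phi(n), gcd(e, phi(n)) = 1
d = pow(e, -1, phi)                          # d = e^{-1} mod phi(n)

# Step (2): user B encrypts m in Z_n with A's public key (n, e).
m = 65
c = pow(m, e, n)                             # c = m^e mod n, formula (4.41)

# Step (3): A decrypts with the private key (n, d), formula (4.42).
assert pow(c, d, n) == m
```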

The correctness and uniqueness of RSA password are guaranteed by the following Lemma.

**Lemma 4.6** *The encryption algorithm* $f$ *defined by Eq. (4.41) is a 1–1 correspondence* $\mathbb{Z}_n \to \mathbb{Z}_n$*, and* $f^{-1}$ *defined by Eq. (4.42) is the inverse mapping of* $f$*.*

*Proof* By Lemma 4.5, for all $m \in \mathbb{Z}_n$ and every positive integer $k$,

$$m^{k\varphi(n)+1} \equiv m \pmod n.$$

Because of *ed* ≡ 1(mod ϕ(*n*)), we can write

$$ed = k\varphi(n) + 1.$$

By (4.41), then there is

$$c^d \equiv m^{ed} \equiv m^{k\varphi(n)+1} \equiv m \pmod n.$$

That is to say, for all *<sup>m</sup>* <sup>∈</sup> <sup>Z</sup>*n*,

$$f^{-1}(f(m)) = m.$$

In the same way, we have

$$m^e \equiv c^{ed} \equiv c^{k\varphi(n)+1} \equiv c \pmod n.$$

In other words,

$$f(f^{-1}(c)) = c.$$

By Lemma 1.1 of Chap. 1, $f$ is a 1–1 correspondence $\mathbb{Z}_n \to \mathbb{Z}_n$, and $f f^{-1} = 1$, $f^{-1} f = 1$. The Lemma holds.


Another very important application of RSA is digital signatures. From the RSA workflow it can be seen that the encryption algorithm defined in formula (4.41) is based on the public key $(n_A, e_A)$ of user $A$; we denote this $f$ by $f_A$ and the decryption algorithm defined in formula (4.42) by $f_A^{-1}$. The workflow of an RSA digital signature is: user $A$ sends his digital signature to user $B$, that is, $A$ sends an encrypted, signed message to $B$. Let $P_e(A) = (n_A, e_A)$ be the public key of $A$ and $P_d(A) = (n_A, d_A)$ the private key of $A$; similarly, $P_e(B) = (n_B, e_B)$ is the public key of $B$ and $P_d(B) = (n_B, d_B)$ the private key of $B$. Then the digital signature sent by user $A$ to user $B$ is

$$\begin{cases} f_B f_A^{-1}(m), & \text{if } n_A < n_B, \\ f_A^{-1} f_B(m), & \text{if } n_A > n_B. \end{cases} \tag{4.43}$$

where $m \in \mathbb{Z}_{n_A}$ is the message signed by user $A$. After receiving the above digital signature, user $B$ applies one of the following two verifications, according to whether $n_A < n_B$ or $n_A > n_B$, to confirm that formula (4.43) is the genuine signature of user $A$.

(i) If $n_A < n_B$, user $B$ first decrypts with his own private key, applying $f_B^{-1}$ given by $(n_B, d_B)$, and then applies user $A$'s public key $f_A$ given by $(n_A, e_A)$; the verification is

$$f_A f_B^{-1}(f_B f_A^{-1}(m)) = f_A f_A^{-1}(m) = m.$$

(ii) If $n_A > n_B$, user $B$ first applies user $A$'s public key $f_A$ given by $(n_A, e_A)$, then decrypts and verifies with his own private key $f_B^{-1}$ given by $(n_B, d_B)$:

$$f_B^{-1} f_A(f_A^{-1} f_B(m)) = f_B^{-1} f_B(m) = m.$$
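The signature exchange (4.43) can be traced for the case $n_A < n_B$ with toy parameters (all numbers illustrative):

```python
def keypair(p, q, e):
    # returns (n, e, d) with d = e^{-1} mod phi(n)
    n, phi = p * q, (p - 1) * (q - 1)
    return n, e, pow(e, -1, phi)

nA, eA, dA = keypair(61, 53, 17)             # user A, n_A = 3233
nB, eB, dB = keypair(89, 97, 5)              # user B, n_B = 8633 > n_A

m = 1234                                     # message in Z_{n_A}
sig = pow(pow(m, dA, nA), eB, nB)            # f_B(f_A^{-1}(m)), case n_A < n_B
check = pow(pow(sig, dB, nB), eA, nA)        # verification f_A(f_B^{-1}(sig))
assert check == m                            # verification recovers m
```

The condition $n_A < n_B$ matters: since $f_A^{-1}(m) < n_A < n_B$, reducing mod $n_B$ leaves the signed value intact, so the verification chain collapses as in case (i).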

The security of RSA rests on the difficulty of factoring $n$ into large primes. Each user selects the large primes $p$ and $q$, sets $n = pq$, then destroys $p$ and $q$, retaining only $(n, e)$ and the secret key information $(n, d)$. Even if $(n, e)$ is published, outsiders know only $n$ and not $\varphi(n)$, so they cannot obtain the private key $(n, d)$, because computing $\varphi(n)$ requires the prime factorization of $n$: from Euler's product formula, it is not difficult to see

$$\varphi(n) = n \prod\_{p|n} (1 - \frac{1}{p})\,.$$

Because our knowledge of prime numbers is still very limited, and no general formula is known that produces the infinitely many primes, the prime factorization of a huge integer $n$ remains a genuinely difficult problem.

# *4.7.3 Discrete Logarithm*

Let $G$ be a finite group and $b, y \in G$ two elements of $G$. Let $t$ be the minimal positive integer satisfying $b^t = 1$; $t$ is called the order of $b$, denoted $t = o(b)$. If there is an $x$, $1 \le x \le o(b)$, such that $y = b^x$, then $x$ is called the discrete logarithm of $y$ to base $b$. Given $b \in G$ and $0 \le x \le o(b)$, it is easy to compute $y = b^x$. Conversely, for a given group element $y$ it is very difficult to find the discrete logarithm of $y$ to base $b$. Therefore, encryption based on the discrete logarithm has become the most mainstream approach in public key cryptography, including the famous ElGamal cryptosystem and elliptic curve cryptosystems. The ElGamal cryptosystem uses the discrete logarithm on the multiplicative group $\mathbb{F}_q^*$ of all nonzero elements of the finite field $\mathbb{F}_q$; elliptic curve cryptography uses the discrete logarithm on the Mordell group of an elliptic curve. Here we mainly discuss the ElGamal cryptosystem; elliptic curve cryptography is discussed in Chap. 6. We first prove several basic conclusions about finite fields.
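Before turning to those conclusions, the asymmetry can be seen in a toy group: computing $y = b^x$ is a fast loop, while recovering $x$ by exhaustive search costs up to $o(b)$ steps, which is hopeless at cryptographic sizes. A sketch, where the prime $p = 1019$ and base $b = 2$ are illustrative assumptions:

```python
# Discrete logarithm by exhaustive search in F_p^* (illustration only:
# the cost is linear in o(b), infeasible for real group sizes).

def dlog_bruteforce(b, y, p):
    """Return x with b^x = y (mod p), or None if y is not in <b>."""
    t, x = 1, 0
    while True:
        if t == y:
            return x
        t, x = (t * b) % p, x + 1
        if t == 1:                 # walked through all of <b> without a hit
            return None

p, b = 1019, 2                     # 2 generates F_1019^* (easy to check)
y = pow(b, 777, p)                 # the forward direction is easy
assert dlog_bruteforce(b, y, p) == 777
```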

**Lemma 4.7** *Let $\mathbb{F}_q$ be a finite field of $q$ elements, where $q = p^n$ is a power of a prime $p$, and let $\mathbb{F}_q^* = \mathbb{F}_q \setminus \{0\}$ be the set of all nonzero elements of $\mathbb{F}_q$. Then $\mathbb{F}_q^*$ is a cyclic group of order $q - 1$ under multiplication, and a generating element $g$ of $\mathbb{F}_q^*$ is called a generator of the finite field $\mathbb{F}_q$.*

*Proof* According to Lagrange's theorem, the number of zeros of a polynomial over any field is not greater than its degree. The group $\mathbb{F}_q^*$ is a finite group of order $q - 1$ under multiplication. To prove that $\mathbb{F}_q^*$ is cyclic, it suffices to prove that for any divisor $d$ of $q - 1$, $d \mid q - 1$, the number of solutions of the equation $x^d = 1$ in $\mathbb{F}_q^*$ is not greater than $d$. This follows from Lagrange's theorem: the number of zeros of the polynomial $x^d - 1$ in the whole field $\mathbb{F}_q$ is not greater than $d$, so its number of zeros in $\mathbb{F}_q^*$ is not greater than $d$. So $\mathbb{F}_q^*$ is a finite cyclic group. The Lemma holds.

**Lemma 4.8** *Let $\mathbb{F}_q$ be a $q$-element finite field, $q = p^n$, and $\mathbb{F}_p \subset \mathbb{F}_q$ a subfield, so that $\mathbb{F}_p^* < \mathbb{F}_q^*$ is a subgroup of $\mathbb{F}_q^*$. If $g$ is a generator of $\mathbb{F}_q^*$, then $g' = g^{\frac{q-1}{p-1}}$ is a generator of $\mathbb{F}_p^*$.*

*Proof* Since $g$ is a generator of $\mathbb{F}_q^*$, we have $o(g) = q - 1$. Let $g' = g^{\frac{q-1}{p-1}}$; then

$$o(g') = \frac{o(g)}{(q-1, \frac{q-1}{p-1})} = p-1.$$

Thus $(g')^{p-1} = 1$, that is $(g')^p = g'$, so $g' \in \mathbb{F}_p$. Because $\mathbb{F}_p^*$ is a cyclic group of order $p - 1$ and $o(g') = p - 1$, we have $\mathbb{F}_p^* = \langle g' \rangle$, i.e., $g'$ is a generator of $\mathbb{F}_p^*$. The Lemma holds.

**Lemma 4.9** *Let $\mathbb{F}_q$ be a $q$-element finite field, $q = p^n$. For any $d \mid n$, let*

$$A_d = \{p(x) \in \mathbb{F}_p[x] \mid \deg p(x) = d,\ p(x)\ \text{is a monic irreducible polynomial}\}$$

*and*

$$f_d(x) = \prod_{p(x) \in A_d} p(x).$$

*Then we have*

$$x^q - x = x^{p^n} - x = \prod_{d|n} f_d(x). \tag{4.44}$$

*Proof* We know

$$x^{p^d} - x \mid x^{p^n} - x \Longleftrightarrow d \mid n.$$

Let $p(x) \in A_d$, that is, $p(x) \in \mathbb{F}_p[x]$, $\deg p(x) = d$, and $p(x)$ is a monic irreducible polynomial. Let $\alpha$ be a root of $p(x)$; adjoining $\alpha$ to $\mathbb{F}_p$ gives the finite extension $\mathbb{F}_p(\alpha)$, which is a $d$-th degree extension of $\mathbb{F}_p$. If $d \mid n$, then

$$\mathbb{F}_p(\alpha) = \mathbb{F}_{p^d} \subset \mathbb{F}_q,$$

so $\alpha \in \mathbb{F}_q$. Since all zeros of $p(x)$ lie in $\mathbb{F}_q$, we have $p(x) \mid x^q - x$. Every $p(x)$ in $A_d$ satisfies $p(x) \mid x^q - x$, so

$$f_d(x) = \prod_{p(x) \in A_d} p(x), \quad f_d(x) \mid x^q - x.$$

Conversely, let $p(x)$ be a monic irreducible polynomial with $\deg p(x) = d$. If $p(x) \mid x^q - x$, then all zeros of $p(x)$ lie in $\mathbb{F}_q$. Let $\alpha$ be a zero of $p(x)$; then $\mathbb{F}_p(\alpha) \subset \mathbb{F}_q$, that is $\mathbb{F}_{p^d} \subset \mathbb{F}_q = \mathbb{F}_{p^n}$, so $d \mid n$. Finally,

$$x^q - x = \prod_{d|n} f_d(x).$$

The Lemma holds.

**Lemma 4.10** *Let $N_p(d)$ denote the number of monic irreducible polynomials of degree $d$ in $\mathbb{F}_p[x]$. Then*

$$N\_p(n) = \frac{1}{n} \sum\_{d|n} \mu(d) \, p^{\frac{n}{d}},\tag{4.45}$$

*where* μ *is Möbius function.*

*Proof* By Lemma 4.9 and (4.44),

$$x^q - x = x^{p^n} - x = \prod\_{d|n} f\_d(x).$$

Comparing the degree of polynomials on both sides, there is

$$p^n = \sum\_{d|n} dN\_p(d).$$

By the Möbius inverse formula,

$$nN_p(n) = \sum_{d|n} \mu(d)p^{\frac{n}{d}},$$

so there is (4.45), the Lemma holds.

**Corollary 4.4** *If $d$ is a prime number, then the number of monic irreducible polynomials of degree $d$ in $\mathbb{F}_p[x]$ is $\frac{1}{d}(p^d - p)$, that is,*

$$N\_p(d) = \frac{1}{d}(p^d - p), \text{ if } d \text{ is a prime number.} $$

*Proof* By (4.45),

$$N_p(d) = \frac{1}{d} \sum_{\delta \mid d} \mu(\delta)\, p^{\frac{d}{\delta}} = \frac{1}{d}(p^d - p).$$

The Corollary holds.
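Formula (4.45), together with the degree identity $p^n = \sum_{d|n} d\,N_p(d)$ used in the proof, is easy to check numerically. A sketch for small $p$ and $n$ (the function names are ours):

```python
# Check formula (4.45): N_p(n) = (1/n) * sum_{d|n} mu(d) * p^(n/d).

def mobius(n):
    """Mobius function mu(n) by trial division."""
    result, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:      # a squared prime factor: mu = 0
                return 0
            result = -result
        d += 1
    if n > 1:                   # one remaining prime factor
        result = -result
    return result

def N(p, n):
    """Number of monic irreducible polynomials of degree n over F_p."""
    return sum(mobius(d) * p ** (n // d) for d in range(1, n + 1) if n % d == 0) // n

assert N(2, 4) == 3   # x^4+x+1, x^4+x^3+1, x^4+x^3+x^2+x+1
assert N(3, 2) == 3
# identity p^n = sum_{d|n} d * N_p(d) from comparing degrees in (4.44)
assert all(sum(d * N(5, d) for d in range(1, n + 1) if n % d == 0) == 5 ** n
           for n in range(1, 8))
```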

Based on the above basic conclusions about finite fields, we introduce two methods for solving discrete logarithms. The first is the Silver–Pohlig–Hellman method for smooth group orders, and the second is the so-called exponential integral method, better known as the index calculus method.

#### Silver–Pohlig–Hellman

Let $\mathbb{F}_q$ be a $q$-element finite field and $b$ a generator of $\mathbb{F}_q^*$, that is, $\mathbb{F}_q^* = \langle b \rangle$,

$$o(b) = |\mathbb{F}_q^*| = q - 1 = p_1^{\alpha_1} p_2^{\alpha_2} \cdots p_s^{\alpha_s}, \tag{4.46}$$

where the $p_i$ are distinct primes. If every prime factor $p$ of $q - 1$, $p \mid q - 1$, is relatively "small", then the positive integer $q - 1$ is called smooth. Under the condition that $q - 1$ is smooth, for each prime factor $p$ we calculate all $p$-th roots of unity $r_{p,j}$ in $\mathbb{F}_q^*$, where

$$r\_{p,j} = b^{\frac{j(q-1)}{p}}, \ 1 \le j \le p. \tag{4.47}$$

Denote by $R(p) = \{r_{p,j} \mid 1 \le j \le p\}$ the set of $p$-th roots of unity in $\mathbb{F}_q^*$; then in $\mathbb{F}_q^*$ we obtain a unit root table $R$,

$$\text{unit root table } R = \{R(p\_1), R(p\_2), \dots, R(p\_s)\}.\tag{4.48}$$

Now let us look at the computation of discrete logarithms in $\mathbb{F}_q^*$. Let $y \in \mathbb{F}_q^*$ have discrete logarithm $m$ to base $b$, that is, $y = b^m$. Given $y$ and $b$, we want the value of $m$ ($1 \le m \le q - 1$). By the prime factorization (4.46) of $q - 1$, if for each $p_i^{\alpha_i}$ ($1 \le i \le s$) the least nonnegative residue of $m$ under mod $p_i^{\alpha_i}$ is $m_i = m \bmod p_i^{\alpha_i}$, then by the Chinese remainder theorem there is a unique $m \bmod q - 1$ such that

$$m \equiv m\_i(\text{mod}(p\_i^{\alpha\_i})), \forall \ i, \ 1 \le i \le s.$$

Therefore the discrete logarithm $m$ of $y$ is determined by the residues $m_i$. Now the question is: given $p^\alpha \| q - 1$, determine $m \bmod p^\alpha$. Let

$$m \bmod p^\alpha = m_0 + m_1 p + m_2 p^2 + \dots + m_{\alpha-1}p^{\alpha-1}, \quad 0 \le m_i < p$$

be the least nonnegative residue of $m$ mod $p^\alpha$; let us determine each $m_i$. First, we calculate $m_0$. Because $y = b^m$,

$$y^{\frac{q-1}{p}} = b^{\frac{m(q-1)}{p}} = b^{\frac{m_0(q-1)}{p}}.$$

That is, $y^{\frac{q-1}{p}}$ is a $p$-th root of unity in $\mathbb{F}_q^*$; comparing with the unit root table $R$ gives the index $j$, $1 \le j \le p$, with $y^{\frac{q-1}{p}} = r_{p,j}$, so $m_0 \equiv j \pmod p$, which determines $m_0$. Next, we calculate $m_1$. Let $y_1 = \frac{y}{b^{m_0}} = b^{m-m_0}$; the discrete logarithm of $y_1$ is $m - m_0$, and

$$m - m_0 \equiv m_1 p + m_2 p^2 + \dots + m_{\alpha-1}p^{\alpha-1} \pmod{p^\alpha},$$

so

$$y_1^{\frac{q-1}{p^2}} = b^{\frac{(m-m_0)(q-1)}{p^2}} = b^{\frac{m_1(q-1)}{p}}.$$

In other words, $y_1^{\frac{q-1}{p^2}}$ is a $p$-th root of unity in $\mathbb{F}_q^*$; comparing with the unit root table $R$, we can determine $m_1$. Continuing in this way, we compute $m_2, \dots, m_{\alpha-1}$ in turn, so $m \bmod p^\alpha$ is obtained; the Chinese remainder theorem then yields the discrete logarithm $m$ of $y$ to base $b$.
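For a prime field $\mathbb{F}_p$ (the case $n = 1$ of the text), the digit-by-digit procedure above fits in a few lines. The prime $p = 41$, whose group order $p - 1 = 40 = 2^3 \cdot 5$ is smooth, and the base $b = 6$, a generator of $\mathbb{F}_{41}^*$, are illustrative assumptions:

```python
def crt(residues, moduli):
    """Chinese remainder theorem for pairwise coprime moduli."""
    M = 1
    for mod in moduli:
        M *= mod
    x = 0
    for r, mod in zip(residues, moduli):
        Mi = M // mod
        x += r * Mi * pow(Mi, -1, mod)
    return x % M

def pohlig_hellman(b, y, p, factorization):
    """Discrete log of y to base b in F_p^*, given {prime: exponent} of p - 1."""
    q1 = p - 1
    residues, moduli = [], []
    for prime, alpha in factorization.items():
        # table of prime-th roots of unity b^{j(q-1)/prime}, cf. (4.47)
        # (indexed here by j = 0, ..., prime-1 rather than 1, ..., prime)
        roots = {pow(b, j * (q1 // prime), p): j for j in range(prime)}
        m_mod, yk = 0, y
        for k in range(alpha):
            # y_k^{(q-1)/prime^{k+1}} is a prime-th root of unity; its
            # index in the table is the digit m_k
            digit = roots[pow(yk, q1 // prime ** (k + 1), p)]
            m_mod += digit * prime ** k
            yk = (yk * pow(b, -(digit * prime ** k), p)) % p
        residues.append(m_mod)
        moduli.append(prime ** alpha)
    return crt(residues, moduli)

p, b = 41, 6                      # p - 1 = 40 = 2^3 * 5 is smooth
y = pow(b, 29, p)
assert pohlig_hellman(b, y, p, {2: 3, 5: 1}) == 29
```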

#### Exponential integral method

Let $\mathbb{F}_q$ be the finite field of $q$ elements, $q = p^n$, with $p$ a relatively small prime and $n$ a large positive integer, so that the security of $q$ meets the given requirements. Let $\mathbb{F}_p$ be the finite field of $p$ elements. We may regard $\mathbb{F}_q$ as an $n$-th degree extension field of $\mathbb{F}_p$; by the theory of finite field extensions, $\mathbb{F}_q$ is equivalent (isomorphic) to a quotient ring of the polynomial ring $\mathbb{F}_p[T]$ over $\mathbb{F}_p$. Let $f(T) \in \mathbb{F}_p[T]$ be a monic irreducible polynomial of degree $n$; then

$$\mathbb{F}_q = \mathbb{F}_p[T]/\langle f(T)\rangle = \{a_0 + a_1 T + \dots + a_{n-1}T^{n-1} \mid \forall\, a_i \in \mathbb{F}_p\}. \tag{4.49}$$

Therefore, every element $a$ of $\mathbb{F}_q$ corresponds to a polynomial $a(T)$ over $\mathbb{F}_p$ with $\deg a(T) \le n - 1$. Let $b \in \mathbb{F}_q$ be a generator of $\mathbb{F}_q^*$, $b = b(T)$. If $a_0 \in \mathbb{F}_p$ is a constant polynomial, $a_0$ is called a constant in $\mathbb{F}_q$.

By Lemma 4.8, the discrete logarithm of a constant in $\mathbb{F}_q$ is easily determined. Let $b' = b^{\frac{q-1}{p-1}}$ be the corresponding generator of $\mathbb{F}_p^*$. If $m'$ is the discrete logarithm of the constant $a_0 \in \mathbb{F}_p$ to base $b'$, then by Lemma 4.8, $m = m'\frac{q-1}{p-1}$ is the discrete logarithm of $a_0 \in \mathbb{F}_q$ to base $b$. Writing $m'(a_0)$ for the discrete logarithm of $a_0$ to base $b'$, and since $p$ is small, we can easily compute and list the discrete logarithms of all constants in $\mathbb{F}_q$:

$$L\_0 = \{ m'(a\_0) \frac{q-1}{p-1} | a\_0 \in \mathbb{F}\_p \}. \tag{4.50}$$

Next, we determine the discrete logarithm of a nonconstant polynomial under base *b*(*T* ). Let 1 < *m* < *n*, define

$$L\_m = \{ p(\mathbf{x}) \in \mathbb{F}\_p[\mathbf{x}] \, | \, p(\mathbf{x}) \text{ is monic irreducible polynomial}, \text{deg } p(\mathbf{x}) \le m \},\tag{4.51}$$

The number of irreducible polynomials in $L_m$ is denoted $h_m$, that is, $|L_m| = h_m$. We first calculate the discrete logarithms of the irreducible polynomials in $L_m$.

Let $b = b(T)$ be a generator of $\mathbb{F}_q^*$, $b(T) \in \mathbb{F}_p[T]$, $\deg b(T) \le n - 1$. Obviously, when $t$ runs through all positive integers from 1 to $q - 1$, $b^t(T)$ runs through all nonzero polynomials in Eq. (4.49). Choosing $t$ appropriately, let

$$b^t(T) \equiv c(T) \pmod{f(T)}, \quad \deg c(T) \le n - 1,$$

such that $c(T)$ factors completely over $L_m$:

$$c(T) = c_0 \prod_{p(T) \in L_m} p(T)^{\alpha_{c,p}},$$

Denote by $\operatorname{ind}(a(T))$ the discrete logarithm of $a(T)$ to base $b(T)$; from the above formula we obtain

$$\operatorname{ind}(c(T)) - \operatorname{ind}(c_0) \equiv \sum_{p(T) \in L_m} \alpha_{c,p} \operatorname{ind}(p(T)) \pmod{q-1}.$$

Since $\operatorname{ind}(c(T)) = t$, we thus have

$$t - \operatorname{ind}(c_0) \equiv \sum_{p(T) \in L_m} \alpha_{c,p} \operatorname{ind}(p(T)) \pmod{q-1}. \tag{4.52}$$

By (4.50), $\operatorname{ind}(c_0)$ is known; therefore the above formula is a linear equation in the $h_m$ unknowns $\operatorname{ind}(p(T))$. By repeatedly selecting suitable $t$, we can obtain $h_m$ independent linear equations, that is, the $h_m \times h_m$ matrix formed by the coefficients of the $h_m$ unknowns in the $h_m$ linear equations is invertible mod $q - 1$; by Lemma 4.2, this holds as long as its determinant is coprime to $q - 1$. From linear algebra, we can then compute all $\operatorname{ind}(p(T))$ by solving this linear system, obtaining the following exponential integral table $B_m$:

$$B\_m = \{ \text{ind}(p(T)) | p(T) \in L\_m \}.\tag{4.53}$$

With the exponential integral table $B_m$, the discrete logarithm of any element $a(T) \in \mathbb{F}_q^*$ can be easily calculated. Let $a_1(T) = a(T)b(T)^t$ and select $t$ such that

$$a_1(T) \equiv a_0 \prod_{p(T) \in L_m} p(T)^{\alpha_p} \pmod{f(T)}.$$

Once such a decomposition is obtained, we have

$$\operatorname{ind}(a_1(T)) \equiv \operatorname{ind}(a_0) + \sum_{p(T) \in L_m} \alpha_p \operatorname{ind}(p(T)) \pmod{q-1}.$$

Thus

$$\text{ind}(a(T)) = \text{ind}(a\_1(T)) - t.$$

The discrete logarithm of *a*(*T* ) is obtained.

*Remark 4.1* The key to the above calculation is to select an appropriate $m$ to obtain the exponential integral table $B_m$. This $m$ cannot be too large, because $h_m$ grows exponentially with $m$; for example, if $m$ is a prime number, then by Corollary 4.4,

$$h\_m = |L\_m| = \frac{1}{m}(p^m - p).$$

When $h_m$ is too large, computing the exponential integral table $B_m$ requires solving an $h_m \times h_m$ matrix system, whose computational complexity is exponential. Obviously, $m$ cannot be too small either; the choice of $m$ depends on $p$ and $n$. When $p = 2$ and $n = 127$, the best choice is $m = 17$. The finite field $\mathbb{F}_q$ with $q = 2^{127}$ is a popular choice at present, because $q - 1 = 2^{127} - 1$ is a Mersenne prime.

#### ElGamal cryptosystem

Using the computational complexity of the discrete logarithm to design an asymmetric cryptosystem is the basic idea of the ElGamal cryptosystem. Each user randomly selects a finite field $\mathbb{F}_q$, $q = p^n$, where $p$ is a sufficiently large prime, computes a generator $g$ of $\mathbb{F}_q^*$, randomly selects a positive integer $x$, $1 < x < q - 1$, and calculates $y = g^x$, to get the public key $P_e = (y, g, q)$ and his own private key $P_d = (x, g, q)$.

Encryption algorithm: To send an encrypted message to user *A*, user *B* first maps each plaintext unit of the plaintext space $P$ to an element of $\mathbb{F}_q^*$ and then encrypts each plaintext unit. Let $m \in \mathbb{F}_q^*$ be a plaintext unit. User *B* randomly selects an integer $k$, $1 < k < q - 1$; then user *A*'s public key $(y, g, q)$ is used to encrypt $m$, and the encryption algorithm $f$ is


$$f(m) = c', \quad \text{where } \begin{cases} c' = m y^k, \\ c = g^k. \end{cases} \tag{4.54}$$

This gives the ciphertext $(c, c')$.

Decryption algorithm: After receiving the ciphertext $(c, c')$ sent by user *B*, user *A* decrypts it with his own private key $(x, g, q)$; the decryption algorithm $f^{-1}$ is

$$f^{-1}(c') = c'c^{-x}.\tag{4.55}$$

**Lemma 4.11** *The encryption algorithm $f$ defined by Eq. (4.54) is a 1–1 correspondence of $\mathbb{F}_q^* \longrightarrow \mathbb{F}_q^*$, and the inverse mapping $f^{-1}$ of $f$ is given by Eq. (4.55).*

*Proof* By (4.54), $c = g^k$ and $c' = my^k$; then

$$c'c^{-x} = my^k g^{-kx} = mg^{xk}g^{-xk} = m.$$

That is to say, $f^{-1}(f(m)) = m$. Conversely,

$$c'c^{-x}y^k = c'g^{-xk}g^{xk} = c',$$

that is, $f(f^{-1}(c')) = c'$. Therefore $f$ is a 1–1 correspondence of $\mathbb{F}_q^* \longrightarrow \mathbb{F}_q^*$ and the inverse mapping of $f$ is $f^{-1}$. The Lemma holds.
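A minimal sketch of the ElGamal scheme over a prime field $\mathbb{F}_p$ (the text's $\mathbb{F}_q$ with $n = 1$); the prime $p = 2579$ and generator $g = 2$ follow a common textbook example and are assumptions for illustration only:

```python
# Toy ElGamal over F_p (not secure: p is far too small).
import secrets

p, g = 2579, 2
x = secrets.randbelow(p - 3) + 2          # private key, 1 < x < q - 1
y = pow(g, x, p)                          # public key P_e = (y, g, p)

def encrypt(m, y, g, p):
    k = secrets.randbelow(p - 3) + 2      # ephemeral 1 < k < q - 1
    return pow(g, k, p), (m * pow(y, k, p)) % p   # (c, c') as in (4.54)

def decrypt(c, c_prime, x, p):
    return (c_prime * pow(c, -x, p)) % p          # (4.55): c' * c^{-x}

m = 1299
c, c_prime = encrypt(m, y, g, p)
assert decrypt(c, c_prime, x, p) == m
```

Note that a fresh random $k$ is drawn per message, so the same plaintext encrypts differently each time.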

Finally, we discuss the computational complexity over finite fields.

**Lemma 4.12** *Let $\mathbb{F}_q$ be a finite field, $q = p^n$, $\alpha, \beta \in \mathbb{F}_q^*$, and let $k \ge 1$ be a positive integer. Then*

$$\begin{aligned} \operatorname{Time}(\alpha \beta) &= O(\log^3 q), \\\\ \operatorname{Time}(\frac{\alpha}{\beta}) &= O(\log^3 q), \\\\ \operatorname{Time}(\alpha^k) &= O(\log k \log^3 q). \end{aligned}$$

*Proof* Let *<sup>f</sup>* (*x*) <sup>∈</sup> <sup>F</sup>*p*[*x*], deg *<sup>f</sup>* (*x*) <sup>=</sup> *<sup>n</sup>*, *<sup>f</sup>* (*x*) is a monic irreducible polynomial, then

$$\mathbb{F}_q = \mathbb{F}_p[x]/\langle f(x)\rangle = \{a_0 + a_1 x + \dots + a_{n-1}x^{n-1} \mid \forall\, a_i \in \mathbb{F}_p\}.$$

Let α, β <sup>∈</sup> <sup>F</sup><sup>∗</sup> *<sup>q</sup>* , then

$$\alpha = a\_0 + a\_1 \mathbf{x} + \dots + a\_{n-1} \mathbf{x}^{n-1}, \ \beta = b\_0 + b\_1 \mathbf{x} + \dots + b\_{n-1} \mathbf{x}^{n-1}.$$

Multiplying the two polynomials requires $n^2$ mod $p$ multiplications, and each mod $p$ operation takes $O(\log^2 p)$ bit operations, so computing $\alpha \cdot \beta$ as a polynomial in $\mathbb{F}_p[x]$ takes $O(n^2\log^2 p) = O(\log^2 q)$ bit operations. The resulting polynomial is then divided by $f(x)$ to obtain a polynomial of degree $\le n - 1$, the final result of $\alpha \cdot \beta$; this division requires $O(n\log^3 p)$ bit operations. Therefore,

$$\operatorname{Time}(\alpha\beta) = O(\log^2 q + n\log^3 p) = O(\log^3 q),$$

and $\operatorname{Time}(\frac{\alpha}{\beta})$ and $\operatorname{Time}(\alpha^k)$ can be estimated in the same way. The Lemma holds.

# *4.7.4 Knapsack Problem*

Given a pile of items with different weights, can you put some of them into a knapsack so that their total weight equals a given weight? This is the knapsack problem, arising from real life. Abstracted into mathematics: suppose $A = \{a_0, a_1, \dots, a_{n-1}\}$ is a set of $n$ positive integers and $N$ is a positive integer. Is $N$ the sum of the elements of some subset of $A$? Using binary notation, the knapsack problem can be expressed as follows.

Knapsack problem: Given $N$ and $A = \{a_0, a_1, \dots, a_{n-1}\}$, where each $a_i \ge 1$ is a positive integer, does there exist a binary integer $e = (e_{n-1}e_{n-2}\cdots e_1e_0)_2$ such that

$$\sum\_{i=0}^{n-1} e\_i a\_i = N, \text{ where } e\_i = 0 \text{ or } e\_i = 1.$$

If $e$ exists, the knapsack problem $(A, N)$ is called solvable, denoted $\psi(A, N) = e$. If $N = 0$, then $\psi(A, 0) = 0$ (each $e_i = 0$) is called the trivial solution; therefore $N \ge 1$ is assumed to be a positive integer.

The above knapsack problem may have one solution, no solution, or multiple solutions. Solving the general knapsack problem $(A, N)$ is very difficult: it is an "NP-complete" problem, and if the conjecture $P \neq NP$ holds, there is no general algorithm whose computational complexity is polynomial in $n$ and $\log N$. However, under some special conditions, such as for a so-called super-increasing sequence, solving the problem becomes very easy. Next, we introduce the polynomial-time solution under the super-increasing assumption.

**Definition 4.11** A positive integer sequence $\{a_i\}_{i\ge 0}$ is called a super-increasing sequence if each $a_i$ ($i \ge 1$) is greater than the sum of the previous $i$ terms, that is,

$$a\_i > \sum\_{j=0}^{i-1} a\_j, \ 1 \le i < \infty. \tag{4.56}$$

Solving the knapsack problem for a super-increasing sequence amounts to finding a strictly decreasing index sequence $\{i_k\}_{k\ge 0}$, where $i_k > i_{k+1}$ and $0 \le i_k \le n - 1$ for all $k \ge 0$. First, $i_0$ is defined as

$$i\_0 = \max\{i | a\_i \le N\}.\tag{4.57}$$

If $N - a_{i_0} = 0$, the algorithm is complete and $N = a_{i_0}$. If $N - a_{i_0} > 0$, define

$$i\_1 = \max\{i | a\_i \le N - a\_{i\_0}\}.$$

For any *k* ≥ 1, define

$$i\_k = \max\{i | a\_i \le N - a\_{i\_0} - \dots - a\_{i\_{k-1}}\}.\tag{4.58}$$

If equality holds in Eq. (4.58), that is, $a_{i_k} = N - a_{i_0} - \dots - a_{i_{k-1}}$, the algorithm completes and yields the solution $N = a_{i_0} + a_{i_1} + \dots + a_{i_k}$ of $(A, N)$. If $i_k$ does not exist, that is,

$$N - a_{i_0} - \dots - a_{i_{k-1}} < a_i, \quad \forall\ i \neq i_0, i_1, \dots, i_{k-1},$$

we say the algorithm terminates. Obviously the indices satisfy $i_0 > i_1 > \dots > i_k > \dots$. Let $I$ be the set of selected indices, and denote the above algorithm by $\psi$.
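The algorithm $\psi$ is a greedy scan: at each step take the largest remaining $a_i$ not exceeding what is left of $N$. A sketch, with an illustrative super-increasing sequence:

```python
# The algorithm psi for a super-increasing sequence A.

def psi(A, N):
    """Solve the knapsack problem (A, N) for super-increasing A.
    Returns the bit list [e_0, ..., e_{n-1}], or None if psi terminates."""
    e = [0] * len(A)
    remainder = N
    for i in range(len(A) - 1, -1, -1):   # finds i_0 > i_1 > ... in turn
        if A[i] <= remainder:
            e[i] = 1
            remainder -= A[i]
    return e if remainder == 0 else None

A = [2, 3, 7, 15, 31]                     # each term > sum of all previous
assert psi(A, 24) == [1, 0, 1, 1, 0]      # 24 = 2 + 7 + 15
assert psi(A, 6) is None                  # psi terminates: no solution
```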

**Lemma 4.13** *Let $A = \{a_0, a_1, \dots, a_{n-1}\}$ be a given set of positive integers, where $a_i$ ($i \ge 0$) is a super-increasing sequence, and let $N$ be a positive integer. If there is a $k \ge 0$ at which $\psi$ completes, that is, $a_{i_k} = N - a_{i_0} - \dots - a_{i_{k-1}}$, then the knapsack problem $(A, N)$ has a solution, namely*

$$
\psi(A, N) = e = (e\_{n-1}e\_{n-2}\cdots e\_1e\_0)\_2,
$$

*where*

$$\begin{cases} e_i = 1, & \text{if } i \in I, \\ e_i = 0, & \text{if } i \notin I. \end{cases}$$

*If there is a $k \ge 0$ at which $\psi$ terminates, i.e.,*

$$N - a\_{i\_0} - \dots - a\_{i\_{k-1}} < a\_i, \forall i \notin \{i\_0, i\_1, \dots, i\_{k-1}\}.$$

*Then the knapsack problem* (*A*, *N*) *has no solution.*

*Proof* If ψ is completed at *k* ≥ 0, then

$$N = a\_{i\_0} + a\_{i\_1} + \dots + a\_{i\_k}, \ I = \{i\_0, i\_1, \dots, i\_k\},$$

Let $e_i = 1$ when $i \in I$ and $e_i = 0$ when $i \notin I$; obviously,

$$\sum\_{i=0}^{n-1} e\_i a\_i = N.$$

So $\psi(A, N) = e = (e_{n-1}e_{n-2}\cdots e_1e_0)_2$. If there exists $k \ge 0$ such that $\psi$ terminates at $k$, that is,

$$N - a\_{i\_0} - \dots - a\_{i\_{k-1}} < a\_i, \forall i \notin \{i\_0, i\_1, \dots, i\_{k-1}\}.$$

then the knapsack problem $(A, N)$ has no solution. We prove this by contradiction. Suppose $(A, N)$ has a solution, say

$$N = a_{j_0} + a_{j_1} + \dots + a_{j_t}.$$

Adjusting the order, we may assume $j_0 > j_1 > \dots > j_t$. By the definition of $i_0$ and $a_{j_0} \le N$, we know $j_0 \le i_0$. If $j_0 < i_0$, then by the super-increasing property,

$$N \ge a_{i_0} > \sum_{r=0}^{i_0-1} a_r \ge a_{j_0} + a_{j_1} + \dots + a_{j_t} = N,$$

a contradiction; hence $j_0 = i_0$. Repeating this argument with $N$ replaced by $N - a_{i_0}$ gives $j_1 = i_1$, and so on, so $\psi$ would complete with $N = a_{i_0} + a_{i_1} + \dots + a_{i_t}$, contradicting the assumption that $\psi$ terminates at $k$. So $(A, N)$ has no solution. The Lemma holds.

#### MH knapsack public key encryption system

Merkle and Hellman first proposed an encryption method based on the knapsack problem in 1978; it was among the first public key encryption schemes. Let $A = \{a_0, a_1, \dots, a_{n-1}\}$ be a super-increasing sequence of positive integers, and take two primes $p$, $b$ satisfying

$$p > \sum\_{i=0}^{n-1} a\_i, \ 1 \le b \le p - 1. \tag{4.59}$$

Calculate $t_i \equiv ba_i \pmod p$, $0 \le i \le n - 1$; then the public key is $t = (t_0, t_1, \dots, t_{n-1})$, and the private key consists of $A$ and $b$.

Encryption algorithm: The plaintext space is $P = \mathbb{F}_2^n$. For each plaintext unit $m = (m_0m_1\cdots m_{n-1}) \in P$, the encryption algorithm is

$$c = f(m) \equiv \sum_{i=0}^{n-1} t_i m_i \pmod p, \quad 0 \le c < p, \tag{4.60}$$

where $c$ is the ciphertext.

Decryption algorithm: First use the private key to compute $N \equiv b^{-1}c \pmod p$, $0 \le N \le p - 1$. Then use the algorithm $\psi = f^{-1}$ for the knapsack problem $(A, N)$ to solve

$$f^{-1}(N) = (m\_0 m\_1 \cdots m\_{n-1}) \in \mathbb{F}\_2^n,\tag{4.61}$$

to get plaintext *m* = (*m*0*m*<sup>1</sup> ··· *mn*−<sup>1</sup>).
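Putting the pieces together, here is a toy MH instance following (4.59)-(4.61); the sequence $A$ and the primes $p = 61$, $b = 17$ are illustrative assumptions:

```python
# Toy Merkle-Hellman knapsack cipher (illustrative, known to be insecure).

A = [2, 3, 7, 15, 31]          # super-increasing private sequence, sum = 58
p, b = 61, 17                  # p > sum(A), 1 <= b <= p - 1, per (4.59)
t = [(b * a) % p for a in A]   # public key: t_i = b * a_i mod p

def mh_encrypt(m_bits):
    return sum(ti * mi for ti, mi in zip(t, m_bits)) % p    # (4.60)

def mh_decrypt(c):
    N = (pow(b, -1, p) * c) % p           # N = b^{-1} c mod p
    # solve the super-increasing knapsack (A, N) greedily (Lemma 4.13)
    e = [0] * len(A)
    for i in range(len(A) - 1, -1, -1):
        if A[i] <= N:
            e[i], N = 1, N - A[i]
    return e

m = [1, 0, 1, 1, 0]
assert mh_decrypt(mh_encrypt(m)) == m
```

Decryption recovers $N = \sum m_i a_i$ exactly because $p > \sum a_i$ keeps the sum from wrapping mod $p$.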

The correctness of MH knapsack public key cryptography rests on the following Lemma.

**Lemma 4.14** *The encryption algorithm $f$ defined by Eq. (4.60) is a 1–1 correspondence of $\mathbb{F}_2^n \longrightarrow \mathbb{F}_p$, and its inverse mapping $f^{-1}$ is given by Eq. (4.61).*

*Proof* If $m = 0$ is the zero vector in $\mathbb{F}_2^n$, then $c = 0$ and thus $N = 0$. The knapsack problem $(A, 0)$ has the unique trivial solution $\psi(A, 0) = 0 \in \mathbb{F}_2^n$, the zero vector. Therefore the zero vector in $\mathbb{F}_2^n$ corresponds 1–1 to the zero element of $\mathbb{F}_p$. Now let $m \neq 0$. If

$$N \equiv b^{-1}c(\text{mod } p), \ c \equiv \sum\_{i=0}^{n-1} t\_i m\_i(\text{mod } p).$$

Then

$$N \equiv \sum\_{i=0}^{n-1} m\_i b^{-1} t\_i \equiv \sum\_{i=0}^{n-1} m\_i a\_i \pmod{p}.$$

By (4.59) and $0 \le N < p$, we obtain

$$N = \sum\_{i=0}^{n-1} m\_i a\_i, \Longrightarrow \psi(A, N) = m = m\_0 m\_1 \cdots m\_{n-1}.$$

So we have

$$f^{-1}(f(m)) = m, \,\forall \, m \in \mathbb{F}\_2^n.$$

Conversely, if

$$N = \sum_{i=0}^{n-1} m_i a_i,$$

then

$$bN \equiv \sum\_{i=0}^{n-1} m\_i a\_i b \equiv \sum\_{i=0}^{n-1} m\_i t\_i (\text{mod } p).$$

So there is *N* ≡ *b*−<sup>1</sup>*c*(mod *p*), that is

$$f(f^{-1}(c)) = c, \,\forall \, c \in \mathbb{F}\_p.$$

It can be seen that $f$ is a 1–1 correspondence of $\mathbb{F}_2^n \longrightarrow \mathbb{F}_p$ with inverse mapping $f^{-1} = \psi$. The Lemma holds.

It can be seen from the above discussion that if $A = \{a_0, a_1, \dots, a_{n-1}\}$ is not a super-increasing sequence, the decryption algorithm $f^{-1}$ faces a difficult "NP-complete" problem, so the encryption and decryption algorithms of the MH knapsack cryptosystem define a typical trapdoor one-way function. Because of this, people long believed that MH knapsack public key cryptography was very secure. However, in 1982, Shamir proved that a class of non-super-increasing sequences can be transformed into super-increasing sequences by a simple transformation $x \longrightarrow ax \bmod m$, and the resulting problems can then be solved by a polynomial algorithm. Although this class of convertible non-super-increasing knapsack problems is quite special, it was enough to shake people's confidence in the security of knapsack public key cryptosystems. It is now generally accepted that knapsack public key cryptography is no longer secure.

#### Shamir transform

Let $A_1 = \{\alpha_0, \alpha_1, \dots, \alpha_{n-1}\}$ be a super-increasing sequence of positive integers. Randomly select four positive integers $m_1, a_1, m_2, a_2$, where

$$m\_1 > \sum\_{i=0}^{n-1} \alpha\_i, \ m\_2 > nm\_1, \ (a\_1, m\_1) = (a\_2, m\_2) = 1. \tag{4.62}$$

A new positive integer sequence is defined by *m*<sup>1</sup> and *a*1,

$$A_2 = \{\rho_0, \rho_1, \dots, \rho_{n-1}\}, \quad \text{where } \rho_i = a_1\alpha_i \bmod m_1.$$

Here $a_1\alpha_i \bmod m_1$ denotes the least nonnegative residue of $a_1\alpha_i$ mod $m_1$, that is,

$$0 \le \rho_i < m_1, \ \text{ and } \ \rho_i \equiv a_1\alpha_i \pmod{m_1}. \tag{4.63}$$

The third sequence of positive integers is defined by $m_2$ and $a_2$:

$$A_3 = \{u_0, u_1, \dots, u_{n-1}\}, \quad u_i = a_2\rho_i \bmod m_2,$$

that is

$$0 \le u_i < m_2, \quad u_i \equiv a_2\rho_i \pmod{m_2}. \tag{4.64}$$

Because $\{u_i\}$ is not a super-increasing sequence, encryption with $A_3$ appears to pose a general knapsack problem, whose difficulty would be NP-complete; but the Shamir transform shows that its decryption algorithm is polynomial.

Let $x = (e_{n-1}e_{n-2}\cdots e_1e_0)_2 \in \mathbb{F}_2^n$ be the plaintext and encrypt it with $A_3$,

$$c = f(\mathbf{x}) = \sum\_{i=0}^{n-1} e\_i u\_i,\tag{4.65}$$

to get the ciphertext $c$. Decrypting the received ciphertext $c$ looks like a general knapsack problem, but with the private key $(b_1, m_1, b_2, m_2)$ it becomes quite simple, where

$$\begin{cases} 0 \le b\_1 < m\_1, & a\_1 b\_1 \equiv 1 (\text{mod} \, m\_1) \\ 0 \le b\_2 < m\_2, & a\_2 b\_2 \equiv 1 (\text{mod} \, m\_2) . \end{cases}$$


First, compute the least nonnegative residue of $b_2c$ under mod $m_2$:

$$N_0 = b_2c \bmod m_2 = \sum_{i=0}^{n-1} e_i\rho_i. \tag{4.66}$$

Because by (4.65),

$$b_2c \equiv \sum_{i=0}^{n-1} e_ib_2u_i \equiv \sum_{i=0}^{n-1} e_i\rho_i \pmod{m_2}.$$

By the assumption $m_2 > nm_1$ in formula (4.62), together with (4.63), we have

$$0 \le \sum_{i=0}^{n-1} e_i\rho_i < m_2.$$

So (4.66) holds. Next consider the least nonnegative residue $N = b_1N_0 \bmod m_1$ ($0 \le N < m_1$); by (4.63),

$$N \equiv b_1N_0 \equiv \sum_{i=0}^{n-1} e_ib_1\rho_i \equiv \sum_{i=0}^{n-1} e_i\alpha_i \pmod{m_1}.$$

So there is

$$N = \sum\_{i=0}^{n-1} e\_i \alpha\_i, \ \alpha\_i \in A\_1.$$

Since $A_1$ is a super-increasing sequence, the polynomial algorithm $\psi$ (see Lemma 4.13) gives

$$\psi(A_1, N) = (e_{n-1}e_{n-2}\cdots e_1e_0)_2 = x,$$

To get plaintext *x*.

Therefore, Shamir used a simple transformation to reduce this general knapsack problem to a super-increasing knapsack problem. Although *A*3 is very special, we have reason to suspect that public key cryptography based on the general knapsack problem is not as secure as people once thought.
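The decryption chain above can be checked numerically. The following is a minimal sketch of the double-modular scheme; all concrete parameters (*A*1, *m*1, *a*1, *m*2, *a*2) are illustrative choices of ours, not taken from the text.

```python
# A toy sketch of the double-modular knapsack scheme described above.
# All concrete parameters below are illustrative choices.

def solve_superincreasing(A, N):
    """Greedy algorithm psi(A, N) for a super-increasing sequence A."""
    bits = [0] * len(A)
    for i in reversed(range(len(A))):
        if N >= A[i]:
            bits[i], N = 1, N - A[i]
    assert N == 0, "N is not representable over A"
    return bits

A1 = [1, 2, 4, 9, 20, 38]        # super-increasing alpha_i (private)
m1, a1 = 89, 34                  # m1 > sum(A1) = 74, (a1, m1) = 1
m2, a2 = 617, 71                 # m2 > n * m1 = 534, (a2, m2) = 1

W = [a1 * a % m1 for a in A1]    # omega_i = a1 * alpha_i mod m1
U = [a2 * w % m2 for w in W]     # public key A3 = {u_i}, cf. (4.64)

b1 = pow(a1, -1, m1)             # a1 * b1 = 1 (mod m1)
b2 = pow(a2, -1, m2)             # a2 * b2 = 1 (mod m2)

x = [1, 0, 1, 1, 0, 1]           # plaintext bits e_0 ... e_5
c = sum(e * u for e, u in zip(x, U))   # ciphertext, formula (4.65)

N0 = b2 * c % m2                 # recovers sum e_i * omega_i, (4.66)
N = b1 * N0 % m1                 # recovers sum e_i * alpha_i
assert solve_superincreasing(A1, N) == x   # plaintext restored
```

The greedy step at the end is exactly why the private key makes decryption polynomial: after the two modular untwistings, only a super-increasing instance remains.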

#### **Exercise 4**

	- (1) What are the advantages and disadvantages of symmetric cryptosystems and asymmetric cryptosystems?
	- (2) What is the goal of an authentication system?



$$A = \begin{bmatrix} 1 & 3 \\ 4 & 3 \end{bmatrix} \bmod 5,\ A = \begin{bmatrix} 1 & 3 \\ 4 & 3 \end{bmatrix} \bmod 29,$$

$$A = \begin{bmatrix} 15 & 17 \\ 4 & 9 \end{bmatrix} \bmod 26,\ A = \begin{bmatrix} 197 & 62 \\ 603 & 271 \end{bmatrix} \bmod 841.$$

5. In number theory, the Fibonacci numbers are defined by *a*1 = 1, *a*2 = 1, *a*3 = 2, and *an*+1 = *an* + *an*−1 for *n* > 1. Prove

$$
\begin{bmatrix} a\_{n+1} & a\_n \\ a\_n & a\_{n-1} \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}^n,
$$

and *an* is even if and only if 3|*n*. More generally, find the law of *d*|*an*.
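The matrix identity and the parity law in this exercise can be spot-checked numerically; the sketch below uses exact integer arithmetic and our own helper names `mat_mul`, `mat_pow`.

```python
# Numerical check of the Fibonacci matrix identity and the parity law
# a_n even <=> 3 | n, using fast exponentiation by squaring.

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_pow(M, n):
    R = [[1, 0], [0, 1]]        # 2x2 identity
    while n:
        if n & 1:
            R = mat_mul(R, M)
        M = mat_mul(M, M)
        n >>= 1
    return R

# Fibonacci numbers a_1 = a_2 = 1, a_{n+1} = a_n + a_{n-1} (a_0 = 0)
a = [0, 1, 1]
for _ in range(30):
    a.append(a[-1] + a[-2])

Q = [[1, 1], [1, 0]]
for n in range(1, 30):
    assert mat_pow(Q, n) == [[a[n + 1], a[n]], [a[n], a[n - 1]]]
    assert (a[n] % 2 == 0) == (n % 3 == 0)   # parity law of the exercise
```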

	- (i) The mapping *A* −→σ (*A*1, *A*2) is a 1–1 correspondence *M*2(Z*N*) −→ *M*2(Z*m*) × *M*2(Z*n*).
	- (ii) Under the correspondence σ, *A* is an invertible matrix (mod *N*) if and only if *A*1 is an invertible matrix (mod *m*) and *A*2 is an invertible matrix (mod *n*).

$$19683 = 27^3 < n\_A < 28^3 = 21952,$$


$$a^{de} \equiv a(\text{mod} \, n)$$

for all integers *a*.

	- (i) *A* = {2, 3, 7, 20, 35, 69}, *N* = 45;
	- (ii) *A* = {1, 2, 5, 9, 20, 49}, *N* = 73;
	- (iii) *A* = {1, 3, 7, 12, 22, 45}, *N* = 67;
	- (iv) *A* = {2, 3, 6, 11, 21, 40}, *N* = 39;
	- (v) *A* = {4, 5, 10, 30, 50, 101}, *N* = 186.



**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Chapter 5 Prime Test**

In the RSA algorithm of the previous chapter, we saw that the difficulty of factoring out large prime factors constitutes the basis of RSA cryptosystem security. In theory this security is hard to question, because mathematics offers only the definition of a prime and no general method for detecting primes. The main purpose of this chapter is to introduce some basic prime test methods, including the Fermat test, the Euler test, the Monte Carlo method, the continued fraction method, etc. Understanding the content of this chapter requires some specialized number theory knowledge.

# **5.1 Fermat Test**

According to Fermat's congruence theorem (commonly known as Fermat's little theorem, a special case of the Euler congruence theorem), if *n* is a prime number, the following congruence holds for all integers *b* with (*b*, *n*) = 1,

$$b^{n-1} \equiv 1 (\text{mod } n). \tag{5.1}$$

The above formula is an important characteristic of prime numbers. Although an *n* satisfying the above formula is not necessarily prime, it can be used as an important basis for detecting primes, because an *n* not satisfying the above formula is definitely not a prime number. Using Formula (5.1) as the standard to detect primes is called the Fermat test.

**Definition 5.1** Let *n* be an odd composite number (not a prime number). If there is a positive integer *b*, (*b*, *n*) = 1, satisfying

$$b^{n-1} \equiv 1 (\bmod n),$$


then the composite number *n* is called a Fermat pseudo prime to base *b*.
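The Fermat test of Formula (5.1) is one modular exponentiation. A minimal sketch, with 341 = 11 · 31 as the classical Fermat pseudo prime to base 2 (the function name `fermat_test` is ours):

```python
# Sketch of the Fermat test (5.1): n passes to base b iff
# b^(n-1) = 1 (mod n); passing does not prove primality.

def fermat_test(n, b):
    """True if n passes the Fermat test to base b, assuming (b, n) = 1."""
    return pow(b, n - 1, n) == 1

assert fermat_test(341, 2)       # 341 = 11 * 31 passes, yet is composite
assert not fermat_test(341, 3)   # base 3 exposes 341 as composite
assert fermat_test(97, 2)        # a genuine prime always passes
```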

We now discuss the basic properties of pseudo primes. Our working platform is the finite Abelian group Z∗*n*, defined as

$$\mathbb{Z}\_n^\* = \{\bar{a} | 1 \le a \le n, (a, n) = 1\}, \ n > 1,\tag{5.2}$$

where *a*¯ is the congruence class of mod *n* represented by *a*. The multiplication of two congruence classes is defined as *a*¯ · *b*¯ = *ab*; obviously, Z∗*n* forms an Abelian group of order ϕ(*n*) under multiplication. In a finite group *G*, the order of a group element *g* ∈ *G* is defined as

$$o(g) = \min\{m : g^m = 1, \ 1 \le m \le |G|\}.$$

*o*(*g*) = 1 if and only if *g* is the unit element of group *G*. By the definition of *o*(*g*), obviously,

$$g^t = 1 \Leftrightarrow o(g) \mid t. \tag{5.3}$$

The following two lemmas are the basic conclusions about the order of group element *g*.

**Lemma 5.1** *G is a finite group, g* <sup>∈</sup> *G, k* <sup>∈</sup> <sup>Z</sup> *is an integer, then*

$$o(\mathbf{g}^k) = \frac{o(\mathbf{g})}{(k, o(\mathbf{g}))},\tag{5.4}$$

*where the denominator is the greatest common divisor of k and o*(*g*)*.*

*Proof* Let *o*(*g*) = *m*, *o*(*g<sup>k</sup>* ) = *t*, obviously, (*g<sup>k</sup>* )*<sup>m</sup>* = 1, in particular,

$$g^{\frac{k\cdot m}{(k,m)}} = 1, \implies t\left|\frac{m}{(k,m)}\right.$$

On the other hand, by *gkt* = 1, there is *m*|*kt*, thus

$$\frac{m}{(k,m)} \,\Big|\, \frac{k}{(k,m)}\, t \Longrightarrow \frac{m}{(k,m)} \,\Big|\, t,$$

since *m*/(*k*, *m*) and *k*/(*k*, *m*) are coprime.

So we have *<sup>t</sup>* <sup>=</sup> *<sup>m</sup>* (*k*,*m*), the Lemma holds.
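Lemma 5.1 can be checked numerically in the cyclic group Z∗*p*. In the sketch below, 2 being a primitive root mod 101 is an assumption that the code itself verifies; `order` is our own helper name.

```python
# Spot check of Lemma 5.1, o(g^k) = o(g) / (k, o(g)), in Z_p^*.
from math import gcd

def order(a, p):
    """Multiplicative order of a mod p, assuming gcd(a, p) == 1."""
    t, x = 1, a % p
    while x != 1:
        x = x * a % p
        t += 1
    return t

p, g = 101, 2
m = order(g, p)
assert m == p - 1            # 2 is a primitive root mod 101, so o(g) = 100
for k in range(1, 2 * m):
    assert order(pow(g, k, p), p) == m // gcd(k, m)
```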

**Lemma 5.2** *Suppose G is a finite Abel group, a*, *b* ∈ *G,* (*o*(*a*), *o*(*b*)) = 1*, then*

$$o(ab) = o(a)o(b).$$

*Proof* Let *o*(*a*) = *m*1, *o*(*b*) = *m*2, then (*m*1, *m*2) = 1. Let *o*(*ab*) = *t*. By (*ab*)*m*1*m*2 = *a<sup>m</sup>*1*m*2 *b<sup>m</sup>*1*m*2 = 1, there is *t*|*m*1*m*2. On the other hand, (*ab*)*<sup>t</sup>* = 1, then (*ab*)*tm*1 = 1, thus *btm*1 = 1, *m*2|*m*1*t*, *m*2|*t*. By the same reason, there is *m*1|*t*, thus *m*1*m*2|*t*, *t* = *m*1*m*2. The Lemma holds.

Back to the finite group Z∗*n*: for any integer *a* ∈ Z with (*a*, *n*) = 1, we have *a*¯ ∈ Z∗*n*; we denote *o*(*a*¯) by *o*(*a*), called the order of *a* mod *n*. Obviously, *o*(*a*) = *o*(*b*) if *a* ≡ *b*(mod *n*). A basic problem in number theory is the existence of primitive roots of mod *n*; equivalently, is Z∗*n* a cyclic group? If there is a positive integer *a*, (*a*, *n*) = 1, with *o*(*a*¯) = |Z∗*n*| = ϕ(*n*), then Z∗*n* is a cyclic group of order ϕ(*n*), so the primitive root of mod *n* exists and *a* is a primitive root of mod *n*.

**Lemma 5.3** (Existence of primitive root) *The primitive root of* mod *n exists if and only if n* = 2, 4, *p*α (α ≥ 1)*, or* 2*p*α (α ≥ 1)*, where p* > 2 *is an odd prime.*

*Proof* If *n* = 2, 4, the lemma holds trivially. If *n* = *p*, then Z*n* = F*p*, Z∗*n* = F∗*p*; by Lemma 4.7 of Chap. 4, F∗*p* is a cyclic group of order *p* − 1, so mod *p* has primitive roots. Now we prove that for every positive integer α, the primitive root of mod *p*α also exists. Let *a* be a primitive root of mod *p*, that is, the order of *a* mod *p* is *p* − 1. If the order of *a* mod *p*α is denoted by *o*(*a*), then

$$a^{o(a)} \equiv 1(\text{mod } p^{\alpha}), \Longrightarrow a^{o(a)} \equiv 1(\text{mod } p),$$

so there is *<sup>p</sup>* <sup>−</sup> <sup>1</sup>|*o*(*a*). And the number of elements of <sup>Z</sup><sup>∗</sup> *<sup>p</sup>*<sup>α</sup> is ϕ(*p*α) = *p*<sup>α</sup>−<sup>1</sup>(*p* − 1), obviously, *o*(*a*)|*p*<sup>α</sup>−<sup>1</sup>(*p* − 1), thus, *o*(*a*) = *p<sup>i</sup>* (*p* − 1), 0 ≤ *i* ≤ α − 1.

We may assume *o*(*a*) = *p* − 1: if *o*(*a*) = *p<sup>i</sup>*(*p* − 1), 1 ≤ *i*, then replace *a* with *a<sup>p<sup>i</sup></sup>*. By Lemma 5.1,

$$o(a^{p^i}) = \frac{p^i(p-1)}{(p^i, p^i(p-1))} = p - 1.$$

Therefore, without loss of generality, let *o*(*a*) = *p* − 1. By the Sylow theorem, when α > 1, since *p*α−1|ϕ(*p*α), there is an integer *b*, (*b*, *n*) = 1, whose order mod *p*α is *o*(*b*) = *p*α−1. Because (*o*(*a*), *o*(*b*)) = 1, by Lemma 5.2 there is

$$o(ab) = o(a)o(b) = p^{\alpha-1}(p-1) = \varphi(p^{\alpha}),$$

So the primitive root of mod *p*<sup>α</sup> exists.

When *n* = 2*p*α, *p* > 2 an odd prime, then ϕ(*n*) = ϕ(*p*α). Thus, the (odd) primitive root *a* of mod *p*α is also a primitive root of mod 2*p*α. The Lemma holds.
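Lemma 5.3 can be spot-checked by brute force for small moduli; `order`, `euler_phi`, and `has_primitive_root` below are our own helper names.

```python
# Brute-force check of Lemma 5.3 for small n: a primitive root exists
# iff some a with gcd(a, n) = 1 has multiplicative order phi(n).
from math import gcd

def order(a, n):
    """Multiplicative order of a mod n, assuming gcd(a, n) == 1."""
    t, x = 1, a % n
    while x != 1:
        x = x * a % n
        t += 1
    return t

def euler_phi(n):
    """Euler's totient via trial-division factorization."""
    result, p, m = 1, 2, n
    while p * p <= m:
        if m % p == 0:
            k = 0
            while m % p == 0:
                m //= p
                k += 1
            result *= (p - 1) * p ** (k - 1)
        p += 1
    if m > 1:
        result *= m - 1
    return result

def has_primitive_root(n):
    return any(gcd(a, n) == 1 and order(a, n) == euler_phi(n)
               for a in range(1, n))

# n = 2, 4, p^alpha, 2p^alpha have primitive roots; others do not.
assert all(has_primitive_root(n) for n in (2, 4, 9, 27, 18, 54))
assert not any(has_primitive_root(n) for n in (8, 12, 15, 16, 24))
```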

#### **Lemma 5.4** *Let n be an odd composite number, then*

- *(i) n is a Fermat pseudo prime to base b* ∈ Z∗*n if and only if o*(*b*)|*n* − 1*;*
- *(ii) the bases b* ∈ Z∗*n satisfying Eq. (5.1) form a subgroup of* Z∗*n;*
- *(iii) if there exists one b* ∈ Z∗*n that does not satisfy Eq. (5.1), then at least half of b* ∈ Z∗*n do not satisfy Eq. (5.1).*

*Proof* (i) and (ii) are trivial: (i) can be obtained by (5.3), and for *b*1, *b*2, *b* ∈ Z∗*n*,

$$\begin{cases} b\_1^{n-1} \equiv 1(\text{mod } n), \ b\_2^{n-1} \equiv 1(\text{mod } n) \Longrightarrow (b\_1 b\_2)^{n-1} \equiv 1(\text{mod } n), \\ b^{n-1} \equiv 1(\text{mod } n) \Longrightarrow (b^{-1})^{n-1} \equiv 1(\text{mod } n). \end{cases}$$

So there is (ii). To prove (iii): let *n* not be a Fermat pseudo prime to base *b*; if *n* is a Fermat pseudo prime to base *a*, then by (ii), *n* is not a Fermat pseudo prime to base *ab*. Therefore, every base making *n* a Fermat pseudo prime yields, after multiplication by *b*, a distinct base making *n* not a Fermat pseudo prime, so at least half of the bases make *n* not a Fermat pseudo prime. The Lemma holds.

By Lemma 5.4, if there is a base *b* such that *n* is not a Fermat pseudo prime, then testing *a*, 1 ≤ *a* ≤ *n*, (*a*, *n*) = 1 in sequence for whether *an*−1 ≡ 1(mod *n*), there is more than a 50% chance of finding a *b* such that *bn*−1 ≢ 1(mod *n*), which proves that *n* is not a prime number. Is it possible that for all *a*, 1 ≤ *a* ≤ *n*, (*a*, *n*) = 1, *n* is a Fermat pseudo prime to base *a*? The answer is yes; such a number *n* is called a Carmichael number.

**Definition 5.2** A Carmichael number *n* is an odd compound number, and for ∀ *b* ∈ Z∗ *<sup>n</sup>*, there is

$$b^{n-1} \equiv 1(\text{mod } n).$$

For Carmichael numbers, we have the following characterization.

**Theorem 5.1** *Let n be an odd composite number, then*

- *(i) if n is a Carmichael number, then n is square free;*
- *(ii) a square free n is a Carmichael number if and only if p* − 1|*n* − 1 *for every prime p*|*n;*
- *(iii) a Carmichael number n has at least three different prime factors.*
*Proof* Let's prove (i) first. Suppose *p*2|*n*, *p* a prime number; by Lemma 5.3, mod *p*2 has primitive roots. Let *g* be a primitive root of mod *p*2, that is *o*(*g*) = *p*(*p* − 1), and let

$$n' = \prod\_{p'|n, p' \neq p} p', \ p' \text{ is a prime number.}$$

According to the Chinese remainder theorem, there is a positive integer *b* such that

$$\begin{cases} b \equiv g(\text{mod } p^2), \\ b \equiv 1(\text{mod } n'). \end{cases}$$

Then *b* is a primitive root of mod *p*2, and (*b*, *n*) = 1. We assert that *n* to base *b* is not a Fermat pseudo prime. If *n* to base *b* were a Fermat pseudo prime, then

$$b^{n-1} \equiv 1 (\text{mod } n), \Longrightarrow b^{n-1} \equiv 1 (\text{mod } p^2), \Longrightarrow o(b) | n - 1.$$

That is *p*(*p* − 1)|*n* − 1; but *p*|*n* contradicts *p*|*n* − 1. So *bn*−1 ≢ 1(mod *n*), *n* is not a Carmichael number, and (i) holds.

Now to prove (ii). If <sup>∀</sup> *<sup>p</sup>*, *<sup>p</sup>*|*n*, there is *<sup>p</sup>* <sup>−</sup> <sup>1</sup>|*<sup>n</sup>* <sup>−</sup> 1, then <sup>∀</sup> *<sup>b</sup>* <sup>∈</sup> <sup>Z</sup><sup>∗</sup> *n*,

$$b^{n-1} = (b^{\frac{n-1}{p-1}})^{p-1} \equiv 1 (\text{mod } p), \forall \ p | n.$$

Because *n* is a square free number,

$$b^{n-1} \equiv 1 (\text{mod } n), \ \forall \ b \in \mathbb{Z}\_n^\*.$$

Therefore, *n* is a Carmichael number. Conversely, if there is a prime number *p*, *p*|*n*, but *p* − 1 ∤ *n* − 1, let *g* be a primitive root of mod *p*, and choose *b* by the Chinese remainder theorem,

$$\begin{cases} b \equiv g(\text{mod } p), \\ b \equiv 1\left(\text{mod } \frac{n}{p}\right). \end{cases}$$

Then (*b*, *n*) = 1, and

$$b^{p-1} \equiv g^{p-1} \equiv 1(\text{mod } p).$$

By *p* − 1 ∤ *n* − 1, then *gn*−1 ≢ 1(mod *p*), so there is *bn*−1 ≢ 1(mod *n*), which contradicts the assumption that *n* is a Carmichael number. So (ii) holds.

To prove (iii), we just need to exclude that *n* is the product of two prime numbers. By (ii), let *n* = *pq*, *p* < *q*, if *n* is a Carmichael number, then *q* − 1 | *n* − 1, but *n* − 1 = *p*(*q* − 1 + 1) − 1 = *p*(*q* − 1) + *p* − 1, then

$$n - 1 \equiv p - 1 (\text{mod } q - 1),$$

this contradicts *n* − 1 ≡ 0(mod *q* − 1), since 0 < *p* − 1 < *q* − 1, so *n* = *pq* must not be a Carmichael number. The Theorem holds.

Below we give some examples of Carmichael numbers; by property (ii) in Theorem 5.1, we can easily verify whether a square free number is a Carmichael number.

*Example 5.1* The following positive integers *n* are Carmichael numbers,

$$\begin{aligned} &n = 1105 = 5 \cdot 13 \cdot 17, \ n = 1729 = 7 \cdot 13 \cdot 19, \ n = 2465 = 5 \cdot 17 \cdot 29, \\ &n = 2821 = 7 \cdot 13 \cdot 31, \ n = 6601 = 7 \cdot 23 \cdot 41. \end{aligned}$$

*Example 5.2* The positive integer 561 = 3 · 11 · 17 is the smallest Carmichael number.

*Proof* By definition, a Carmichael number is odd and composite, so the smallest Carmichael number has the form

*n* = 3 · *p* · *q*, where *p* − 1|*n* − 1, *q* − 1|*n* − 1, *p* < *q* is a prime.

For *p* = 5 and *p* = 7, the congruence equation

$$3 \cdot p \cdot q \equiv 1 (\text{mod}\, q - 1), q > p$$

has no prime solution *q*; when *p* = 11, the above formula has the minimum solution *q* = 17, so *n* = 3 · 11 · 17 is the smallest Carmichael number.
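Property (ii) of Theorem 5.1 gives an effective check for Carmichael numbers. The sketch below uses naive trial-division factorization, adequate for small *n*; `is_carmichael` and `prime_factors` are our own helper names.

```python
# Check for Carmichael numbers via Theorem 5.1 (ii): n must be odd,
# composite, square free, and satisfy p - 1 | n - 1 for every p | n.

def prime_factors(n):
    """Prime factorization of n as a dict {p: exponent} (trial division)."""
    factors, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def is_carmichael(n):
    if n % 2 == 0 or n < 9:
        return False
    factors = prime_factors(n)
    if len(factors) < 2 or any(k > 1 for k in factors.values()):
        return False          # must be composite and square free
    return all((n - 1) % (p - 1) == 0 for p in factors)

# 561 and 1105 are the only Carmichael numbers below 1200.
assert [n for n in range(3, 1200, 2) if is_carmichael(n)] == [561, 1105]
```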

*Example 5.3* For a given prime number *r* ≥ 3, the congruence equations

$$\begin{cases} rpq \equiv 1(\text{mod } p - 1), \\ rpq \equiv 1(\text{mod } q - 1) \end{cases}$$

have only finitely many different prime solutions *p*, *q*. We leave this conclusion for reflection.

# **5.2 Euler Test**

Let *p* > 2 be an odd prime. The Euler test uses the Euler criterion for quadratic residues of mod *p* to detect whether a positive integer *n* is prime. As with the Fermat test, an *n* that passes the test cannot be concluded to be prime, but an *n* that fails the test is certainly not prime. We know that when the positive integers *a* and *n* are given (*n* > 1), solving the quadratic congruence equation *x*2 ≡ *a*(mod *n*) is a famous "NP complete" problem: we cannot find a general solution in an effective time. However, in the special case where *n* = *p* > 2 is an odd prime number, we have rich theoretical knowledge for discussing the quadratic residues of mod *p*; this knowledge includes the famous Gauss quadratic reciprocity law and the Euler criterion, which constitute the core knowledge system of elementary number theory. First, we introduce the Legendre symbol; let *p* > 2 be a given odd prime number.

Z∗*p* is a cyclic group of order *p* − 1. For *a* ∈ Z∗*p* (i.e., (*a*, *p*) = 1), we define the Legendre symbolic function as

$$\left(\frac{a}{p}\right) = \begin{cases} 1, & \text{when } x^2 \equiv a(\text{mod } p) \text{ is solvable,} \\ -1, & \text{when } x^2 \equiv a(\text{mod } p) \text{ is unsolvable.} \end{cases}$$

If (*a*, *p*) > 1, that is *p* | *a*, we let ( *a*/*p* ) = 0. Thus for ∀ *a* ∈ Z the Legendre symbolic function ( *a*/*p* ) is defined, and it is a completely multiplicative function Z → {1, −1, 0}:

$$\left(\frac{ab}{p}\right) = \left(\frac{a}{p}\right)\left(\frac{b}{p}\right), \forall \, a, b \in \mathbb{Z}$$

and

$$
\left(\frac{a}{p}\right) = \left(\frac{b}{p}\right), \text{ if } a \equiv b(\text{mod } p).
$$

If ( *a*/*p* ) = 1, then *x*2 ≡ *a*(mod *p*) is solvable, and *a* is called a quadratic residue of mod *p*; if ( *a*/*p* ) = −1, then *x*2 ≡ *a*(mod *p*) is unsolvable, and *a* is called a quadratic nonresidue of mod *p*.

**Lemma 5.5** *a* ∈ Z, *p* ∤ *a, then the necessary and sufficient condition for a to be a quadratic residue of* mod *p is*

$$a^{\frac{p-1}{2}} \equiv 1(\text{mod } p).$$

*Proof* Z∗*p* is a cyclic group of order *p* − 1. Let *g* be a primitive root of mod *p*, that is, *g*¯ is a generator of Z∗*p*; then for ∀*a* ∈ Z, (*a*, *p*) = 1, we have

$$a \equiv g^t(\text{mod } p), \text{ where } 1 \le t \le p - 1.$$

Obviously, *a* is the quadratic residue of mod *p* ⇔ *t* is even. Therefore, if *t* is even, then

$$a^{\frac{p-1}{2}} \equiv g^{\frac{t(p-1)}{2}} \equiv (g^{\frac{t}{2}})^{p-1} \equiv 1(\text{mod } p).$$

Conversely, if *a <sup>p</sup>*−<sup>1</sup> <sup>2</sup> <sup>≡</sup> <sup>1</sup>(mod *<sup>p</sup>*), then *<sup>o</sup>*(*a*) <sup>|</sup> *<sup>p</sup>*−<sup>1</sup> <sup>2</sup> , and by Lemma 5.1, can calculate

$$o(a) = o(\mathbf{g}^t) = \frac{p-1}{(t, \, p-1)}.$$

So

$$o(a) \left| \frac{p-1}{2} \right. \Leftrightarrow 2 \,|\, (t, p-1) \Leftrightarrow 2 \,|\, t,$$

that is *t* is even, thus, *a* is a quadratic residue of mod *p*, the Lemma holds.

**Lemma 5.6** (Euler criterion). *For* <sup>∀</sup> *<sup>a</sup>* <sup>∈</sup> <sup>Z</sup>*, we have*

$$a^{\frac{p-1}{2}} \equiv \left(\frac{a}{p}\right)(\text{mod } p). \tag{5.5}$$

*Proof* If (*a*, *p*) > 1, that is *p*|*a*, the above formula holds trivially, so we may let *p* ∤ *a*. By the Fermat congruence theorem *a<sup>p</sup>*−1 ≡ 1(mod *p*), there is

$$(a^{\frac{p-1}{2}} + 1)(a^{\frac{p-1}{2}} - 1) \equiv 0(\text{mod } p).$$

Thus

$$a^{\frac{p-1}{2}} \equiv \pm 1 (\text{mod } p).$$

If *a <sup>p</sup>*−<sup>1</sup> <sup>2</sup> <sup>≡</sup> <sup>1</sup>(mod *<sup>p</sup>*), by Lemma 5.5, then ( *<sup>a</sup> <sup>p</sup>* ) <sup>=</sup> 1. If *<sup>a</sup> <sup>p</sup>*−<sup>1</sup> <sup>2</sup> ≡ −1(mod *<sup>p</sup>*), then ( *<sup>a</sup> <sup>p</sup>* ) = −1. So (5.5) holds.

**Definition 5.3** Suppose *n* is an odd compound number, if there is an integer *b*, (*b*, *n*) = 1, it satisfies

$$b^{\frac{n-1}{2}} \equiv \left(\frac{b}{n}\right)(\text{mod } n),\tag{5.6}$$

then *n* is called an Euler pseudo prime to base *b*, where ( *b*/*n* ) is the Jacobi symbol, defined as

$$\left(\frac{b}{n}\right) = \left(\frac{b}{p\_1}\right)^{a\_1} \left(\frac{b}{p\_2}\right)^{a\_2} \cdots \left(\frac{b}{p\_s}\right)^{a\_s}, \text{ if } n = p\_1^{a\_1} \cdots p\_s^{a\_s}.\tag{5.7}$$

From the definition we obviously have a corollary: if *n* is an Euler pseudo prime to base *b*, then *n* is a Fermat pseudo prime to base *b*. This can be proved by squaring both sides of Eq. (5.6).

The following example shows that the converse does not hold; that is, *n* may be a Fermat pseudo prime to base *b* without being an Euler pseudo prime.

*Example 5.4 n* = 91 is a Fermat pseudo prime to base *b* = 3, but not an Euler pseudo prime. In fact, it is easy to calculate 3<sup>6</sup> ≡ 1(mod 91), thus 3<sup>90</sup> ≡ 1(mod 91). From 3<sup>6</sup> ≡ 1(mod 91), we have

$$3^{42} \equiv 1(\text{mod } 91), \Longrightarrow 3^{45} \equiv 27(\text{mod } 91).$$

Since 27 ≢ ±1(mod 91), 91 to base 3 is not an Euler pseudo prime.

*Example 5.5 n* = 91 to base *b* = 10 is an Euler pseudo prime. Because

$$10^{45} \equiv 10^3 \equiv -1 \pmod{91},$$

and calculating the Jacobi symbol gives

$$
\left(\frac{10}{91}\right) = \left(\frac{2}{91}\right) \cdot \left(\frac{5}{91}\right) = -1,
$$

so *n* = 91 to base *b* = 10 is an Euler pseudo prime.
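Examples 5.4 and 5.5 can be reproduced mechanically. The sketch below computes the Jacobi symbol by the standard binary algorithm (built on Lemma 5.8 and the reciprocity law of Lemma 5.9); `jacobi` and `euler_test` are our own helper names.

```python
# Jacobi symbol via the standard binary algorithm, then the Euler
# test (5.6) applied to n = 91 as in Examples 5.4 and 5.5.

def jacobi(b, n):
    """Jacobi symbol (b/n) for odd n > 0."""
    b %= n
    result = 1
    while b != 0:
        while b % 2 == 0:              # peel off factors of 2 via (2/n)
            b //= 2
            if n % 8 in (3, 5):
                result = -result
        b, n = n, b                    # quadratic reciprocity flip
        if b % 4 == 3 and n % 4 == 3:
            result = -result
        b %= n
    return result if n == 1 else 0     # gcd(b, n) > 1 gives 0

def euler_test(n, b):
    """Check condition (5.6): b^((n-1)/2) = (b/n) (mod n)."""
    return pow(b, (n - 1) // 2, n) == jacobi(b, n) % n

assert not euler_test(91, 3)    # Example 5.4: Fermat pp, not Euler pp
assert euler_test(91, 10)       # Example 5.5: Euler pseudo prime
```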

From the Euler criterion of Lemma 5.6, we can easily calculate the Legendre symbols of −1 and 2.

**Lemma 5.7** *Let p* > 2 *be an odd prime, then we have*

$$\left(\frac{-1}{p}\right) = (-1)^{\frac{p-1}{2}}, \ \left(\frac{2}{p}\right) = (-1)^{\frac{1}{8}(p^2 - 1)}.\tag{5.8}$$

*Proof* By Lemma 5.6,

$$(-1)^{\frac{p-1}{2}} \equiv \left(\frac{-1}{p}\right)(\text{mod } p),$$

Since both sides of the congruence are ±1 and *p* > 2, there is ( −1/*p* ) = (−1)(*p*−1)/2. To calculate the Legendre symbol of 2, we notice that

$$\begin{cases} p - 1 \equiv 1 \cdot (-1)^1(\text{mod } p) \\ 2 \equiv 2 \cdot (-1)^2(\text{mod } p) \\ p - 3 \equiv 3 \cdot (-1)^3(\text{mod } p) \\ \quad \vdots \\ r \equiv \frac{p-1}{2} \cdot (-1)^{\frac{p-1}{2}}(\text{mod } p), \end{cases}$$

where *r* = (*p* − 1)/2 if (*p* − 1)/2 is even, and *r* = *p* − (*p* − 1)/2 if (*p* − 1)/2 is odd. Multiplying these congruences, there is

$$2 \cdot 4 \cdot 6 \cdots (p-1) \equiv \left(\frac{p-1}{2}\right)! \ (-1)^{\frac{1}{8}(p^2-1)}(\text{mod } p),$$

that is

$$2^{\frac{p-1}{2}} \equiv (-1)^{\frac{1}{8}(p^2-1)}(\text{mod } p),$$

by Lemma 5.6,

$$
\left(\frac{2}{p}\right) \equiv (-1)^{\frac{1}{8}(p^2 - 1)} (\text{mod } p),
$$

there is

$$
\left(\frac{2}{p}\right) = (-1)^{\frac{1}{8}(p^2 - 1)},
$$

Lemma 5.7 holds.

Let ( *a*/*n* ) be the Jacobi symbol defined by Eq. (5.7); then Lemma 5.7 can be extended to the Jacobi symbol.

**Lemma 5.8** *Let n be an odd number, then we have*

$$\left(\frac{-1}{n}\right) = (-1)^{\frac{n-1}{2}}, \ \left(\frac{2}{n}\right) = (-1)^{\frac{1}{8}(n^2 - 1)}.\tag{5.9}$$

*Proof* The square of any odd number is congruent to 1 mod 8, that is *a*2 ≡ 1(mod 8). Write *n* = *a*2 · *p*1 *p*2 ··· *pt*, where the *pi* are different prime numbers; then

$$n \equiv p\_1 p\_2 \cdots p\_t (\text{mod } 8) .$$

Similarly, for ∀ *b* ∈ Z, by (5.7),

$$
\left(\frac{b}{n}\right) = \left(\frac{b}{p\_1}\right)\left(\frac{b}{p\_2}\right)\cdots\left(\frac{b}{p\_t}\right),\tag{5.10}
$$

thus

$$\left(\frac{-1}{n}\right) = \left(\frac{-1}{p\_1}\right)\left(\frac{-1}{p\_2}\right)\cdots\left(\frac{-1}{p\_t}\right) = (-1)^{\frac{p\_1 - 1}{2} + \frac{p\_2 - 1}{2} + \cdots + \frac{p\_t - 1}{2}} = (-1)^{\frac{n - 1}{2}}.\tag{5.11}$$

The formula for ( 2/*n* ) can be proved in the same way; the Lemma holds.

**Corollary 5.1** *All odd composite numbers n are Euler pseudo primes to the bases* ±1*.*

*Proof* It is trivial that (5.6) holds for *b* = 1; for *b* = −1 it follows directly from Lemma 5.8.

**Lemma 5.9** (Gauss quadratic reciprocity law) *Let p and q be two different odd primes, then*

$$\left(\frac{q}{p}\right)\left(\frac{p}{q}\right) = (-1)^{\frac{1}{4}(p-1)(q-1)}.$$

*Proof* According to incomplete statistics, there are currently more than 270 ways to prove the Gauss quadratic reciprocity law. To save space, we leave the proof to the readers, hoping that everyone can find their favorite proof method.

Next, we discuss the computational complexity of Fermat test and Euler test.

**Lemma 5.10** *Let n be an odd,* 1 ≤ *b* < *n,* (*b*, *n*) = 1*, then*

$$\begin{cases} \text{Time}(n \text{ to base } b \text{'s Fermat test}) = O(\log^3 n), \\ \text{Time}(n \text{ to base } b \text{'s Euler test}) = O(\log^4 n). \end{cases}$$

*Proof* By (5.1), the Fermat test of *n* to base *b* is actually a computation of *bn*−1 mod *n*; by Lemma 1.5 of Chap. 1, the number of bit operations of *bn*−1 mod *n* is

$$\operatorname{Time}(b^{n-1}\operatorname{mod} n) = O(\log n \log^2 n) = O(\log^3 n).$$

For the Euler test of *n* to base *b*, by (5.6), the number of bit operations on the left side is *O*(log3 *n*). To find the Jacobi symbol ( *b*/*n* ), from Eq. (5.7) and the quadratic reciprocity law, the calculation can be transformed into the calculation of Legendre symbols. Each application of reciprocity is essentially a division, so we only consider the calculation of Legendre symbols. By the Euler criterion,

$$\text{Time}\left(\text{calculate } \left(\frac{b}{p}\right)\right) = \text{Time}\left(b^{\frac{p-1}{2}} \text{mod } p\right) = O(\log^3 n).$$

The number of prime factors of each *n* has an estimated *O*(log log *n*), so

$$\text{Time}\left(\text{calculate Jacobi symbol } \left(\frac{b}{n}\right)\right) = O\left(\log\log n \cdot \log^3 n\right) = O(\log^4 n).$$

This completes the proof of Lemma 5.10.

Solovay and Strassen proposed a probabilistic method to detect prime numbers by the Euler test in 1977. When *n* > 1 is an odd number, *k* numbers *b*1, *b*2,..., *bk* are randomly selected, where 1 < *bi* < *n*, (*bi*, *n*) = 1. Use Eq. (5.6) to calculate both sides for each *bi* in turn; the required bit operations are *O*(log4 *n*). If the two sides of Eq. (5.6) are unequal for some *bi*, then *n* is not a prime number and the test terminates. If all *k* bases *bi* pass the Euler test of Eq. (5.6), then the probability that *n* is composite is less than 1/2*<sup>k</sup>*, that is

$$P\{n \text{ is not prime}\} \le 2^{-k}.$$

The above formula is directly derived from Lemma 5.3. Let's now introduce the Miller–Rabin method, which in a sense is better than the Solovay–Strassen method.

**Definition 5.4** Let *n* be an odd composite number; write *n* − 1 = 2*<sup>t</sup>* · *m*, where *t* ≥ 1 and *m* is odd. Let *b* ∈ Z∗*n*; if *n* and *b* satisfy one of the following conditions,

$$b^m \equiv 1(\text{mod } n), \text{ or exists one } r, 0 \le r < t, \text{ such that } b^{2^r m} \equiv -1(\text{mod } n). \quad (5.12)$$

Then *n* is called a strong pseudo prime to base *b*.

**Lemma 5.11** *Suppose n* ≡ 3(mod 4)*, then n is a strong pseudo prime to base b if and only if n is an Euler pseudo prime to base b.*

*Proof* Because *n* ≡ 3(mod 4), we have *n* − 1 = 2*m*, that is *t* = 1, *m* = (*n* − 1)/2. By Definition 5.4, *n* is a strong pseudo prime to base *b* if and only if

$$b^m = b^{\frac{n-1}{2}} \equiv \pm 1 (\text{mod } n).$$

Therefore, if *n* is an Euler pseudo prime to base *b*, the above formula holds, so it is also a strong pseudo prime to base *b*. Conversely, if the above formula holds, then because *n* ≡ 3(mod 4), (*n* − 1)/2 is odd, so ( −1/*n* ) = −1, and

$$\left(\frac{b}{n}\right) = \left(\frac{b}{n}\right)^{\frac{n-1}{2}} = \left(\frac{b^{\frac{n-1}{2}}}{n}\right) \equiv b^{\frac{n-1}{2}}(\text{mod } n).$$

Therefore, *n* to base *b* is an Euler pseudo prime. The Lemma holds.

Below we give the main results of this section.

**Theorem 5.2** *Let n be an odd composite number, b* ∈ Z∗*n, then*

- *(i) if n is a strong pseudo prime to base b, then n is an Euler pseudo prime to base b;*
- *(ii) n is a strong pseudo prime to base b for at most a quarter of the bases b* ∈ Z∗*n.*
Before proving Theorem 5.2, let's introduce Miller–Rabin's test method. To test whether a large odd number *n* is prime, we write *n* − 1 = 2*<sup>t</sup>* · *m*, *m* an odd number, *t* ≥ 1, and select one *b* at random, 1 ≤ *b* < *n*, (*b*, *n*) = 1. We first calculate *b<sup>m</sup>* mod *n*; if the result is ±1, then *n* passes the strong pseudo prime test (5.12). If *b<sup>m</sup>* mod *n* ≠ ±1, then we repeatedly square, each time taking the minimum nonnegative residue under mod *n*, to see whether we obtain −1, performing at most *t* − 1 squarings. If we never obtain −1, then *n* to base *b* fails test Formula (5.12), and we assert that *n* to base *b* is not a strong pseudo prime. If −1 is obtained at some squaring, then *n* passes the test under base *b*.

In Miller–Rabin's test, if *n* to base *b* fails to pass the test Formula (5.12), then *n* must not be a prime number; if *n* passes the test for *k* randomly selected bases *b* = {*b*1, *b*2,..., *bk*}, then by property (ii) of Theorem 5.2, each *bi* has a chance of no more than 25% of passing when *n* is composite, so

$$P\{n \text{ not prime}\} \le \frac{1}{4^k}.\tag{5.13}$$

Compared with the Solovay–Strassen method using Euler test, the Miller–Rabin method using strong pseudo prime test is more powerful.
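The procedure just described can be sketched as follows; `strong_test` implements condition (5.12), and `miller_rabin` repeats it for *k* random bases (both function names are ours).

```python
# A sketch of the Miller-Rabin test built on the strong pseudo prime
# condition (5.12); k random bases give error probability <= 4^(-k).
import random

def strong_test(n, b):
    """True if odd n > 2 passes condition (5.12) to base b."""
    t, m = 0, n - 1
    while m % 2 == 0:          # write n - 1 = 2^t * m, m odd
        t += 1
        m //= 2
    x = pow(b, m, n)
    if x == 1 or x == n - 1:
        return True
    for _ in range(t - 1):     # at most t - 1 squarings
        x = x * x % n
        if x == n - 1:
            return True
    return False

def miller_rabin(n, k=20):
    if n < 4:
        return n in (2, 3)
    if n % 2 == 0:
        return False
    return all(strong_test(n, random.randrange(2, n - 1)) for _ in range(k))

assert strong_test(2047, 2)        # 2047 = 23 * 89 is a strong pp to base 2
assert not strong_test(2047, 3)    # but base 3 exposes it
assert miller_rabin(2 ** 61 - 1)   # a Mersenne prime passes
assert not miller_rabin(561)       # Carmichael numbers are caught
```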

To prove Theorem 5.2, we first prove the following two lemmas.

**Lemma 5.12** *Let G* = ⟨*g*⟩ *be a finite cyclic group of order m, that is o*(*g*) = *m, then the equation x<sup>k</sup>* = 1 *has exactly d solutions in G, where d* = (*k*, *m*)*.*

*Proof* For *x* ∈ *G*, write *x* = *g<sup>t</sup>*; then *x<sup>k</sup>* = *g<sup>kt</sup>* = 1 ⇔ *m*|*kt*, that is (*m*/*d*) | (*k*/*d*) · *t*, thus (*m*/*d*) | *t*. Let *t* = (*m*/*d*) · *s*; then for *s* = 1, 2,..., *d*, *x* = *g<sup>t</sup>* gives exactly *d* solutions. The Lemma holds.

**Lemma 5.13** *Let p be an odd prime number, p* − 1 = 2*<sup>t</sup> m*'*, t* ≥ 1*, m*' *is odd, then the equation*

$$
x^{2^r m} \equiv -1(\text{mod } p), \ m \text{ is odd} \tag{5.14}
$$

*The number of solutions N in* Z<sup>∗</sup> *<sup>p</sup> satisfies*

$$N = \begin{cases} 0, & \text{if } r \ge t; \\ 2^r(m, m'), & \text{if } r < t. \end{cases}$$

*Proof* Let *g* be a generator of Z<sup>∗</sup> *<sup>p</sup>*, write *x* = *g <sup>j</sup>* , 1 ≤ *j* ≤ *p* − 1, because *o*(*g*) = *p* − 1, so

$$g^{\frac{p-1}{2}} \equiv -1(\text{mod } p).$$

Thus

$$x^{2^r m} \equiv -1(\text{mod } p) \Leftrightarrow 2^r m j \equiv \frac{p-1}{2}(\text{mod } p-1).$$

Because *p* − 1 = 2*<sup>t</sup> m*', the above congruence for *j* is equivalent to

$$2^{r}mj \equiv 2^{t-1}m'(\text{mod }2^{t}m').\tag{5.15}$$

If *r* > *t* − 1, the congruence (5.15) has no solution for *j*, because *m* and *m*' are odd; so when *r* ≥ *t*, (5.14) is unsolvable. If *r* < *t*, let *d* = (*m*, *m*'), then

$$(2^r m, 2^t m') = 2^r d,$$

then Eq. (5.15) has exactly 2*<sup>r</sup> d* solutions for *j* mod *p* − 1. Each *j* corresponds to one *x* = *g<sup>j</sup>*, so the number of solutions of Eq. (5.14) for *x* is *N* = 2*<sup>r</sup> d*. The Lemma holds.
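The solution count of Lemma 5.13 can be verified exhaustively for a small prime; below we take *p* = 97, so *p* − 1 = 2⁵ · 3, i.e. *t* = 5 and *m*' = 3 (the variable names are ours).

```python
# Exhaustive check of the solution count N in Lemma 5.13 for p = 97.
from math import gcd

p, t, m_prime = 97, 5, 3          # p - 1 = 2^5 * 3
for m in (1, 3, 5, 9, 15):        # several odd values of m
    for r in range(7):            # r both below and above t
        N = sum(1 for x in range(1, p)
                if pow(x, 2 ** r * m, p) == p - 1)   # x^(2^r m) = -1
        expected = 0 if r >= t else 2 ** r * gcd(m, m_prime)
        assert N == expected
```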

With the above preparation, we now give the proof of Theorem 5.2.

*Proof* (*The proof of Theorem* 5.2). Let's first prove (i): assuming *n* and *b* satisfy Eq. (5.12), we want to prove that Formula (5.6) is satisfied; that is, if *n* to base *b* is a strong pseudo prime, then *n* to base *b* is an Euler pseudo prime. Write *n* − 1 = 2*<sup>t</sup> m*, *m* odd. We prove property (i) of Theorem 5.2 in three cases.

(1) *b<sup>m</sup>* ≡ 1(mod *n*). In this case, it is obvious that *b*<sup>(*n*−1)/2</sup> ≡ 1(mod *n*). Let's prove (*b*/*n*) = 1. In fact, for any prime *p* | *n*,

$$1 = \left(\frac{1}{p}\right) = \left(\frac{b^m}{p}\right) = \left(\frac{b}{p}\right)^m = \left(\frac{b}{p}\right),$$

since *m* is odd; so (*b*/*p*) = 1 for every prime *p* | *n*, hence (*b*/*n*) = 1. There is

$$b^{\frac{n-1}{2}} \equiv \left(\frac{b}{n}\right) \equiv 1 \pmod{n}.$$

That is, *n* is an Euler pseudo prime to base *b*.

(2) *b*<sup>(*n*−1)/2</sup> ≡ −1(mod *n*). In this case, we have to prove (*b*/*n*) = −1. Let *p* | *n* be any prime factor of *n*, write *p* − 1 = 2<sup>*t*<sub>1</sub></sup>*m*<sub>1</sub>, where *t*<sub>1</sub> ≥ 1 and *m*<sub>1</sub> is an odd number. Let's calculate the Legendre symbol (*b*/*p*); in fact, *t*<sub>1</sub> ≥ *t*, and

$$\left(\frac{b}{p}\right) = \begin{cases} -1, & \text{if } t\_1 = t; \\ 1, & \text{if } t\_1 > t. \end{cases} \tag{5.16}$$

Because

$$b^{\frac{n-1}{2}} = b^{2^{t-1}m} \equiv -1 \pmod{n} \Longrightarrow b^{2^{t-1}m m_1} \equiv -1 \pmod{n}$$

(raising to the odd power *m*<sub>1</sub>), and *p* | *n*, we have

$$b^{2^{t-1}m m_1} \equiv -1 \pmod{p}. \tag{5.17}$$

If *t*<sub>1</sub> < *t*, then (*p* − 1) | 2<sup>*t*−1</sup>*mm*<sub>1</sub>, so

$$b^{2^{t-1}m m_1} = \left(b^{p-1}\right)^{2^{t-1-t_1}m} \equiv 1 \pmod{p}.$$

This contradicts (5.17) by Fermat's congruence theorem, so we always have *t*<sub>1</sub> ≥ *t*. If *t*<sub>1</sub> = *t*, by (5.17), then

$$\left(\frac{b}{p}\right) \equiv b^{\frac{p-1}{2}} = b^{2^{t-1}m_1} \equiv -1 \pmod{p}.$$

Indeed, if the above value were 1, raising both sides to the *m*-th power would give *b*<sup>2<sup>*t*−1</sup>*mm*<sub>1</sub></sup> ≡ 1(mod *p*), contradicting Formula (5.17). If *t*<sub>1</sub> > *t*, raise both sides of Eq. (5.17) to the power 2<sup>*t*<sub>1</sub>−*t*</sup>; then (*b*/*p*) = 1, so we have (5.16). We now complete the proof of case (2) using Eq. (5.16). Write *n* = ∏<sub>*p*|*n*</sub> *p* as a product of primes, not necessarily distinct, and define the positive integer *k* as

$$k = \#\{p \mid p \mid n, \, p - 1 = 2^{t\_1} m\_1, m\_1 \text{ is odd}, t\_1 = t\}.$$

By (5.16), then

$$
\left(\frac{b}{n}\right) = \prod \left(\frac{b}{p}\right) = (-1)^k. \tag{5.18}
$$

Let's prove that *k* is an odd number. Because *t*<sub>1</sub> ≥ *t*, *p* − 1 = 2<sup>*t*<sub>1</sub></sup>*m*<sub>1</sub> and *n* − 1 = 2<sup>*t*</sup>*m*, under mod 2<sup>*t*+1</sup> we have

$$p \equiv \begin{cases} 1 \pmod{2^{t+1}}, & \text{if } t\_1 > t; \\ 1 + 2^t \pmod{2^{t+1}}, & \text{if } t\_1 = t. \end{cases}$$

Because *n* − 1 = 2<sup>*t*</sup>*m* with *m* odd, *n* ≡ 1 + 2<sup>*t*</sup> (mod 2<sup>*t*+1</sup>); multiplying the above congruences over all prime factors of *n* gives

$$n = \prod_{p|n} p \equiv 1 + k \cdot 2^t \pmod{2^{t+1}}.$$

So *k* · 2<sup>*t*</sup> ≡ 2<sup>*t*</sup> (mod 2<sup>*t*+1</sup>), and *k* must be odd; by (5.18), then (*b*/*n*) = −1. Case (2) is proved.

(3) *b*<sup>2<sup>*r*−1</sup>·*m*</sup> ≡ −1(mod *n*), where 1 ≤ *r* ≤ *t* − 1, *n* − 1 = 2<sup>*t*</sup> · *m* (the case *r* = *t* is case (2)).

In this case, Eq. (5.12) holds with *r* replaced by *r* − 1. Because *r* − 1 ≤ *t* − 2, repeated squaring gives *b*<sup>(*n*−1)/2</sup> ≡ 1(mod *n*). To prove property (i) of Theorem 5.2, we have to prove (*b*/*n*) = 1. As in case (2), let *p* | *n*, write *p* − 1 = 2<sup>*t*<sub>1</sub></sup> · *m*<sub>1</sub>, *m*<sub>1</sub> odd; then *t*<sub>1</sub> ≥ *r*, and

$$\left(\frac{b}{p}\right) = \begin{cases} -1, & \text{if } t\_1 = r; \\ 1, & \text{if } t\_1 > r. \end{cases} \tag{5.19}$$

The proof of Formula (5.19) is the same as in case (2). Write *n* = ∏ *p*, the primes *p* not required to be distinct, and define the positive integer *k*<sub>1</sub>:

$$k\_1 = \#\{p \mid p \mid n, \, p - 1 = 2^{t\_1} m\_1, \, m\_1 \text{ is odd}, t\_1 = r\}.$$

As in case (2), we have (*b*/*n*) = (−1)<sup>*k*<sub>1</sub></sup>. Similarly, working under mod 2<sup>*r*+1</sup>, where now *n* ≡ 1(mod 2<sup>*r*+1</sup>) since *r* < *t*, it can be proved that *k*<sub>1</sub> must be even. Thus (*b*/*n*) = 1, and we have completed all the proofs of property (i) in Theorem 5.2.

Next, we prove property (ii) in Theorem 5.2. It is also discussed in three cases.

(1) *n* is divisible by a square number; that is, there is a prime number *p* with *p*<sup>α</sup> || *n*, α ≥ 2.

In this case, we prove that the proportion of *b* ∈ Z<sup>∗</sup><sub>*n*</sub> for which *n* is a Fermat pseudo prime to base *b*, let alone a strong pseudo prime, is at most 1/4. First, suppose *b*<sup>*n*−1</sup> ≡ 1(mod *n*); since there is a prime *p* with *p*<sup>2</sup> | *n*, we get *b*<sup>*n*−1</sup> ≡ 1(mod *p*<sup>2</sup>). Because Z<sup>∗</sup><sub>*p*<sup>2</sup></sub> is a *p*(*p* − 1)-order cyclic group (see Theorem 5.3), let *g* be a generator of Z<sup>∗</sup><sub>*p*<sup>2</sup></sub>, then

$$
\mathbb{Z}\_{p^2}^\* = \{ \mathbf{g}, \mathbf{g}^2, \dots, \mathbf{g}^{p(p-1)} \}.
$$

By Lemma 5.12, the number of *b* ∈ Z<sup>∗</sup><sub>*p*<sup>2</sup></sub> satisfying *b*<sup>*n*−1</sup> ≡ 1(mod *p*<sup>2</sup>) is *d*, where

$$d = (n-1, p(p-1)) = (n-1, p-1).$$

Because *p* | *n*, we have *p* ∤ *n* − 1 and hence *p* ∤ *d*; therefore *d* is at most *p* − 1, and the proportion of *b* with *b*<sup>*n*−1</sup> ≡ 1(mod *p*<sup>2</sup>) among 1 ≤ *b* < *n* does not exceed

$$\frac{p-1}{p^2-1} = \frac{1}{p+1} \le \frac{1}{4}.$$

Therefore, at most a proportion 1/4 of the bases *b* make *n* a Fermat pseudo prime to base *b*; in case (1), this proves property (ii) of Theorem 5.2.
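The bound of case (1) is easy to check numerically: for an odd *n* divisible by a square, the bases *b* with *b*<sup>*n*−1</sup> ≡ 1(mod *n*) number at most (*n* − 1)/4. A minimal sketch; the sample moduli are illustrative:

```python
from math import gcd

def fermat_liars(n):
    # bases 1 <= b < n, (b, n) = 1, with b^(n-1) ≡ 1 (mod n)
    return [b for b in range(1, n) if gcd(b, n) == 1 and pow(b, n - 1, n) == 1]

# when a square divides n, the proportion of such bases is at most 1/4
for n in (9, 45, 99, 153):       # each divisible by 3^2
    assert len(fermat_liars(n)) <= (n - 1) / 4
```
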

(2) *n* = *pq* is the product of two different prime numbers.

In this case, let *p* − 1 = 2<sup>*t*<sub>1</sub></sup>*m*<sub>1</sub>, *q* − 1 = 2<sup>*t*<sub>2</sub></sup>*m*<sub>2</sub>, with *m*<sub>1</sub>, *m*<sub>2</sub> odd. Without loss of generality, assume *t*<sub>1</sub> ≤ *t*<sub>2</sub>. Let *b* ∈ Z<sup>∗</sup><sub>*n*</sub>; for *n* to be a strong pseudo prime to base *b*, it is necessary that

$$b^m \equiv 1 \pmod{p}, \; b^m \equiv 1 \pmod{q} \tag{5.20}$$

or

$$b^{2^r m} \equiv -1 (\text{mod } p), \; b^{2^r m} \equiv -1 (\text{mod } q), 0 \le r < t. \tag{5.21}$$

By Lemma 5.12, the number of *b* satisfying (5.20) is (*m*, *m*<sub>1</sub>)(*m*, *m*<sub>2</sub>) ≤ *m*<sub>1</sub>*m*<sub>2</sub>. By Lemma 5.13, for each *r*, 0 ≤ *r* < min(*t*<sub>1</sub>, *t*<sub>2</sub>) = *t*<sub>1</sub>, the number of *b* satisfying *b*<sup>2<sup>*r*</sup>*m*</sup> ≡ −1(mod *n*) is 2<sup>*r*</sup>(*m*, *m*<sub>1</sub>) · 2<sup>*r*</sup>(*m*, *m*<sub>2</sub>) ≤ 4<sup>*r*</sup>*m*<sub>1</sub>*m*<sub>2</sub>, and for *r* ≥ *t*<sub>1</sub> there is none. Because *n* = *pq*, we have ϕ(*n*) = (*p* − 1)(*q* − 1) = 2<sup>*t*<sub>1</sub>+*t*<sub>2</sub></sup>*m*<sub>1</sub>*m*<sub>2</sub> and *n* − 1 > ϕ(*n*); therefore, the proportion of *b* for which *n* is a strong pseudo prime to base *b* does not exceed

$$\frac{m\_1m\_2 + m\_1m\_2 + 4m\_1m\_2 + \dots + 4^{t\_1 - 1}m\_1m\_2}{2^{t\_1 + t\_2}m\_1m\_2} = 2^{-t\_1 - t\_2} \left( 1 + \frac{4^{t\_1} - 1}{4 - 1} \right) \tag{5.22}$$

in 1 ≤ *b* < *n*, (*b*, *n*) = 1.

If *t*<sub>1</sub> < *t*<sub>2</sub>, then the above formula does not exceed

$$2^{-2t_1-1}\left(\frac{2}{3} + \frac{4^{t_1}}{3}\right) \le 2^{-3} \cdot \frac{2}{3} + \frac{1}{6} = \frac{1}{4}.$$

If *t*<sub>1</sub> = *t*<sub>2</sub>, then *m*<sub>1</sub> ≠ *m*<sub>2</sub> (otherwise *p* = *q*), and at least one of the inequalities (*m*, *m*<sub>1</sub>) ≤ *m*<sub>1</sub>, (*m*, *m*<sub>2</sub>) ≤ *m*<sub>2</sub> must be strict. Indeed, if both were equalities, then *m*<sub>1</sub> | *m* and *m*<sub>2</sub> | *m*; since *m* | *n* − 1 and *n* − 1 = *pq* − 1 ≡ *q* − 1(mod *m*<sub>1</sub>), we get *m*<sub>1</sub> | *q* − 1 = 2<sup>*t*<sub>2</sub></sup>*m*<sub>2</sub>, and *m*<sub>1</sub> odd gives *m*<sub>1</sub> | *m*<sub>2</sub>; symmetrically *m*<sub>2</sub> | *m*<sub>1</sub>, hence *m*<sub>1</sub> = *m*<sub>2</sub>, a contradiction. Since a proper divisor of the odd number *m*<sub>*i*</sub> is at most *m*<sub>*i*</sub>/3, we have

$$(m, m_1) \cdot (m, m_2) \le \frac{1}{3} m_1 m_2.$$

Substituting (1/3)*m*<sub>1</sub>*m*<sub>2</sub> for *m*<sub>1</sub>*m*<sub>2</sub> in Eq. (5.22), the proportion of *b* for which *n* is a strong pseudo prime to base *b* does not exceed

$$\frac{1}{3} \cdot 2^{-2t_1}\left(\frac{2}{3} + \frac{4^{t_1}}{3}\right) \le \frac{1}{18} + \frac{1}{9} = \frac{1}{6} < \frac{1}{4}.$$

We complete the proof of property (ii) of Theorem 5.2 in case (2).

(3) Finally, suppose *n* = *p*<sub>1</sub>*p*<sub>2</sub> ··· *p*<sub>*k*</sub>, *k* ≥ 3, is the product of distinct prime factors. In this case, write *p*<sub>*i*</sub> − 1 = 2<sup>*t*<sub>*i*</sub></sup>*m*<sub>*i*</sub>, *m*<sub>*i*</sub> odd. As in case (2), without loss of generality *t*<sub>1</sub> ≤ *t*<sub>*j*</sub> (1 ≤ *j* ≤ *k*). Similarly to the proof of formula (5.22), the proportion of *b* for which *n* is a strong pseudo prime to base *b* does not exceed

$$\begin{aligned} 2^{-t_1-t_2-\dots-t_k}\left(1 + \frac{2^{kt_1}-1}{2^k-1}\right) &\le 2^{-kt_1}\left(\frac{2^k-2}{2^k-1} + \frac{2^{kt_1}}{2^k-1}\right) \\ &= 2^{-kt_1}\cdot\frac{2^k-2}{2^k-1} + \frac{1}{2^k-1} \\ &\le 2^{-k}\cdot\frac{2^k-2}{2^k-1} + \frac{1}{2^k-1} \\ &= 2^{1-k} \\ &\le \frac{1}{4}, \end{aligned}$$

because *k* ≥ 3, in this way, we have completed all the proofs of Theorem 5.2.

The Euler test and the strong pseudo prime test require some complex quadratic residue techniques. We summarize the main conclusions of this section as follows:


*P*{detect whether odd *n* is prime} > 1 − ε, ∀ ε > 0 given.

Moreover, the computational complexity of the detection algorithm is polynomial.
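Run with *k* independently chosen random bases, the strong pseudo prime test of this section is the classical Miller–Rabin primality test: a composite *n* survives one random base with probability at most 1/4, hence all *k* bases with probability at most 4<sup>−*k*</sup>. A minimal sketch (the function names are ours):

```python
import random

def is_strong_probable_prime(n, b):
    # condition (5.12): b^m ≡ 1 or b^(2^r * m) ≡ -1 (mod n) for some 0 <= r < t,
    # where n - 1 = 2^t * m with m odd
    t, m = 0, n - 1
    while m % 2 == 0:
        t, m = t + 1, m // 2
    x = pow(b, m, n)
    if x == 1 or x == n - 1:
        return True
    for _ in range(t - 1):
        x = x * x % n
        if x == n - 1:
            return True
    return False

def miller_rabin(n, k=20):
    # probabilistic primality test; never rejects a prime,
    # accepts a composite with probability at most 4^(-k)
    if n < 4:
        return n in (2, 3)
    if n % 2 == 0:
        return False
    return all(is_strong_probable_prime(n, random.randrange(2, n - 1))
               for _ in range(k))

# 2047 = 23 * 89 is a strong pseudo prime to base 2, but base 3 exposes it
assert is_strong_probable_prime(2047, 2) and not is_strong_probable_prime(2047, 3)
assert miller_rabin(2**61 - 1)           # a Mersenne prime
```
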

# **5.3 Monte Carlo Method**

Using all the prime number tests introduced in the previous two sections, for a huge odd number *n*, even if we already know that *n* is not a prime number, we cannot successfully decompose *n*, because a primality test provides no information about the prime factorization. A more direct method, like the sieve method, verifies whether *n* is divisible by each prime *p* ≤ √*n*, because a composite number *n* must have a prime factor *p* ≤ √*n*. For each selected *p* ≤ √*n*, the bit operations required to divide *n* by *p* are *O*(log *n*), and there are *O*(√*n*/log *n*) primes *p* ≤ √*n* in total; therefore, such a verification requires *O*(√*n*) bit operations. A more effective method was proposed by J. M. Pollard in 1975. We call it the Monte Carlo method, or the "rho" method.

First, find a convenient mapping *f* : Z<sub>*n*</sub> → Z<sub>*n*</sub>; for example, *f*(*x*) an integer coefficient polynomial, such as *f*(*x*) = *x*<sup>2</sup> + 1. Secondly, an initial value *x*<sub>0</sub> ∈ Z<sub>*n*</sub> is randomly generated; let *x*<sub>1</sub> = *f*(*x*<sub>0</sub>), *x*<sub>2</sub> = *f*(*x*<sub>1</sub>), . . . , *x*<sub>*j*+1</sub> = *f*(*x*<sub>*j*</sub>) (*j* = 0, 1, 2, ···). Among these *x*<sub>*j*</sub>, we want to find two integers *x*<sub>*j*</sub> and *x*<sub>*k*</sub> which are different elements in Z<sub>*n*</sub> but, for some factor *d* of *n*, *d* | *n*, are the same element in Z<sub>*d*</sub>; that is to say,

$$x_j \not\equiv x_k \pmod{n}, \quad (x_j - x_k, n) > 1. \tag{5.23}$$

Once *x <sup>j</sup>* and *xk* are found, the algorithm is said to be completed.

**Theorem 5.3** *Let S be a set of r elements, f* : *S* → *S a mapping, and x*<sub>0</sub> ∈ *S; define x*<sub>*j*+1</sub> = *f*(*x*<sub>*j*</sub>) (*j* = 0, 1, 2, . . .)*. Suppose* λ *is a positive real number and let l* = 1 + [√(2λ*r*)]*. Then, among all mappings f* : *S* → *S and all initial values x*<sub>0</sub> ∈ *S, the proportion of pairs* (*f*, *x*<sub>0</sub>) *for which x*<sub>0</sub>, *x*<sub>1</sub>, . . . , *x*<sub>*l*</sub> *are all distinct elements of S is* ≤ e<sup>−λ</sup>*.*

*Proof* The total number of mappings *f* : *S* → *S* is *r<sup>r</sup>*, because for each *x* ∈ *S* the image *f*(*x*) has *r* choices. The initial value *x*<sub>0</sub> has *r* choices, so the total number of pairs (*f*, *x*<sub>0</sub>) is *r*<sup>*r*+1</sup>. The question is how many of these pairs (*f*, *x*<sub>0</sub>) satisfy the condition that *x*<sub>0</sub>, *x*<sub>1</sub>, . . . , *x*<sub>*l*</sub> are distinct elements of *S*; we want to prove that the proportion of such pairs among all *r*<sup>*r*+1</sup> pairs is not greater than e<sup>−λ</sup>.

Fix *x*<sub>0</sub> ∈ *S* (*r* choices); then *x*<sub>1</sub> = *f*(*x*<sub>0</sub>) has only *r* − 1 admissible values, *x*<sub>2</sub> = *f*(*x*<sub>1</sub>) has only *r* − 2, and so on, until *x*<sub>*l*</sub> = *f*(*x*<sub>*l*−1</sub>), which has *r* − *l*. The images of the remaining *r* − *l* elements of *S* may be chosen arbitrarily, giving *r*<sup>*r*−*l*</sup> choices. Therefore the number *N* of pairs (*f*, *x*<sub>0</sub>) meeting the condition is

$$N = r^{r-l} \prod_{j=0}^{l} (r-j).$$

Dividing *N* by *r*<sup>*r*+1</sup>, the proportion of (*f*, *x*<sub>0</sub>) satisfying the condition is

$$\frac{N}{r^{r+1}} = r^{-l} \prod\_{j=1}^{l} (r-j) = \prod\_{j=1}^{l} \left( 1 - \frac{j}{r} \right),\tag{5.24}$$

We notice that for any real number *x* ∈ (0, 1), log(1 − *x*) < −*x*. Taking the logarithm of the right side of the above formula, then

$$\sum\_{j=1}^{l} \log\left(1 - \frac{j}{r}\right) < -\sum\_{j=1}^{l} \frac{j}{r} = \frac{-l(l+1)}{2r} < -\frac{l^2}{2r}.$$

Because *l* = 1 + [√(2λ*r*)] > √(2λ*r*), from the above formula,

$$\sum\_{j=1}^{l} \log\left(1 - \frac{j}{r}\right) < -\lambda.$$

By (5.24), we have

$$\frac{N}{r^{r+1}} \le \mathbf{e}^{-\lambda}.$$

We complete the proof of Theorem 5.3.
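For a very small set *S*, the statement of Theorem 5.3 can be confirmed by enumerating all *r*<sup>*r*+1</sup> pairs (*f*, *x*<sub>0</sub>); the sketch below takes *r* = 5 and λ = 1 for illustration and also checks the exact proportion against formula (5.24):

```python
from itertools import product
from math import exp, sqrt, prod, isclose

r, lam = 5, 1.0
l = 1 + int(sqrt(2 * lam * r))        # l = 1 + [sqrt(2*lambda*r)] = 4
S = range(r)

good = 0                              # pairs (f, x0) with x_0, ..., x_l distinct
for images in product(S, repeat=r):   # all r^r mappings f : S -> S
    for x0 in S:
        x, seen, ok = x0, {x0}, True
        for _ in range(l):            # iterate x_{j+1} = f(x_j)
            x = images[x]
            if x in seen:
                ok = False
                break
            seen.add(x)
        good += ok

ratio = good / r ** (r + 1)
assert isclose(ratio, prod(1 - j / r for j in range(1, l + 1)))  # formula (5.24)
assert ratio <= exp(-lam)                                        # Theorem 5.3 bound
```
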

The Monte Carlo method uses a polynomial *f*(*x*) ∈ Z[*x*] such that, for a positive integer *n*, congruence mod *n* is preserved by the polynomial *f*(*x*), that is,

$$a \equiv b(\operatorname{mod} n), \Longrightarrow f(a) \equiv f(b)(\operatorname{mod} n). \tag{5.25}$$

Given *x*<sub>0</sub> ∈ Z<sub>*n*</sub> and *x*<sub>*j*+1</sub> = *f*(*x*<sub>*j*</sub>) (*j* = 0, 1, . . .), if we find an *x*<sub>*k*<sub>0</sub></sub> ∈ Z<sub>*n*</sub> that satisfies *x*<sub>*k*<sub>0</sub></sub> ≡ *x*<sub>*j*<sub>0</sub></sub>(mod *r*), where *r* | *n*, *r* > 1, *k*<sub>0</sub> > *j*<sub>0</sub>, then by (5.25),

$$f(x_{k_0}) \equiv f(x_{j_0}) \pmod{r}, \Longrightarrow x_{k_0+1} \equiv x_{j_0+1} \pmod{r}.$$

Thus, for any *j* ≥ *j*<sub>0</sub> and *k* = *j* + (*k*<sub>0</sub> − *j*<sub>0</sub>), we have *x*<sub>*k*</sub> ≡ *x*<sub>*j*</sub>(mod *r*). This proves that the polynomial mapping *f* : Z<sub>*n*</sub> → Z<sub>*n*</sub> produces at most *k*<sub>0</sub> different residue classes under mod *r* (*r* | *n*), namely

$$\{x_0, x_1, \dots, x_{k_0-1}\}.$$

Therefore, there is the following Lemma 5.14.

**Lemma 5.14** *Let f*(*x*) ∈ Z[*x*] *be a polynomial, n* > 1 *a positive integer, x*<sub>0</sub> ∈ Z<sub>*n*</sub>*, x*<sub>*j*</sub> = *f*(*x*<sub>*j*−1</sub>) (*j* = 1, 2, . . .)*, and let k be the first subscript for which there is a j,* 0 ≤ *j* < *k, such that*

$$(x_k - x_j, n) = r > 1.$$

*Then* {*x*<sub>0</sub>, *x*<sub>1</sub>, . . . , *x*<sub>*k*−1</sub>} *are k different residue classes under* mod *r, so they are also k different residue classes under* mod *n. Moreover, the Monte Carlo calculation defined by f can only produce these k different residue classes.*

We call the polynomial *f* and the initial value *x*<sub>0</sub> described in Lemma 5.14 an average mapping. When the first subscript *k* is very large, the amount of calculation is very large. Here we give an improved Monte Carlo algorithm.

Given *f*(*x*) ∈ Z[*x*], the Monte Carlo algorithm needs to compute *x*<sub>*k*</sub> (*k* = 1, 2, . . .) successively. Let 2<sup>*h*</sup> ≤ *k* < 2<sup>*h*+1</sup> (*h* ≥ 0), *j* = 2<sup>*h*</sup> − 1; that is, *k* is an (*h* + 1)-bit number and *j* is the largest *h*-bit number. Compare *x*<sub>*k*</sub> with *x*<sub>*j*</sub> and calculate (*x*<sub>*k*</sub> − *x*<sub>*j*</sub>, *n*); if (*x*<sub>*k*</sub> − *x*<sub>*j*</sub>, *n*) > 1, the calculation terminates, otherwise consider *k* + 1. The improved Monte Carlo algorithm computes only one (*x*<sub>*k*</sub> − *x*<sub>*j*</sub>, *n*) for each *k*, with *j* = 2<sup>*h*</sup> − 1; there is no need to verify every *j*, 0 ≤ *j* < *k*, which saves a great deal of computation when *k* is very large. The disadvantage is that it may miss the smallest subscript *k* satisfying the condition, but the error is controllable. In fact, we have the following error estimation.

**Lemma 5.15** *Let f*(*x*) ∈ Z[*x*]*, n* ≥ 1 *be given, x*<sub>0</sub> ∈ Z<sub>*n*</sub>*, x*<sub>*j*</sub> = *f*(*x*<sub>*j*−1</sub>) (*j* = 1, 2, . . .)*. Let k*<sub>0</sub> *be the smallest subscript satisfying* (*x*<sub>*k*<sub>0</sub></sub> − *x*<sub>*j*<sub>0</sub></sub>, *n*) > 1 *for some* 0 ≤ *j*<sub>0</sub> < *k*<sub>0</sub>*. If k is the smallest positive integer satisfying* (*x*<sub>*k*</sub> − *x*<sub>*j*</sub>, *n*) > 1 *in the improved Monte Carlo algorithm, then k* ≤ 4*k*<sub>0</sub>*.*

*Proof* Suppose *k*<sub>0</sub> has (*h* + 1) bits. Let *j* = 2<sup>*h*+1</sup> − 1, *k* = *j* + (*k*<sub>0</sub> − *j*<sub>0</sub>). By Lemma 5.14, then

$$(\mathbf{x}\_{k\_0} - \mathbf{x}\_{j\_0}, n) > 1, \Longrightarrow (\mathbf{x}\_k - \mathbf{x}\_j, n) > 1.$$

Obviously, *j* is the maximum number of (*h* + 1) bits and *k* is the number of (*h* + 2) bits, so *k* is the required subscript calculated by the improved Monte Carlo algorithm. Obviously,

$$k = j + (k\_0 - j\_0) \le 2^{h+1} - 1 + 2^{h+1} < 4 \cdot 2^h \le 4k\_0.$$

Lemma 5.15 holds.

*Example 5.6* Let *n* = 91, *f*(*x*) = *x*<sup>2</sup> + 1, *x*<sub>0</sub> = 1. By the Monte Carlo algorithm, *x*<sub>1</sub> = 2, *x*<sub>2</sub> = 5, *x*<sub>3</sub> = 26 and *x*<sub>4</sub> = 40 (because 26<sup>2</sup> + 1 = 677 ≡ 40(mod 91)). By the improved Monte Carlo algorithm, only (*x*<sub>4</sub> − *x*<sub>3</sub>, 91) needs to be detected to obtain

$$(x_4 - x_3, 91) = (14, 91) = 7.$$
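A compact sketch of the improved Monte Carlo algorithm (comparing *x*<sub>*k*</sub> only with the saved *x*<sub>*j*</sub>, *j* = 2<sup>*h*</sup> − 1), reproducing Example 5.6; the iteration cap `max_steps` is our safeguard, not part of the algorithm:

```python
from math import gcd

def pollard_rho(n, x0=1, f=lambda x, n: (x * x + 1) % n, max_steps=10**6):
    # improved Monte Carlo ("rho") method: for 2^h <= k < 2^(h+1) compare x_k
    # only with the saved x_j, j = 2^h - 1, and test d = (x_k - x_j, n)
    x, saved = x0, x0              # saved = x_j with j = 0 = 2^0 - 1
    for k in range(1, max_steps + 1):
        x = f(x, n)                # x = x_k
        d = gcd(abs(x - saved), n)
        if 1 < d < n:
            return d
        if k & (k + 1) == 0:       # k = 2^(h+1) - 1: save x_j for the next range
            saved = x
    return None                    # no factor found within max_steps

assert pollard_rho(91) == 7        # Example 5.6: (x_4 - x_3, 91) = 7
```
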

**Lemma 5.16** *Let n be an odd number and a compound number, and r be a factor of n, r*|*n,* <sup>1</sup> <sup>&</sup>lt; *<sup>r</sup>* <sup>&</sup>lt; <sup>√</sup>*n. Let f* (*x*) <sup>∈</sup> <sup>Z</sup>[*x*]*, x*<sup>0</sup> <sup>∈</sup> <sup>Z</sup>*<sup>n</sup> given, then the computational complexity of finding r by Monte Carlo algorithm* ( *f*, *x*0) *is*

$$\text{Time}((f, \mathbf{x}\_0)) = O(\sqrt{n} \log^3 n) \text{ bits.} \tag{5.26}$$

*Further, there is a constant C such that for any positive real number* λ*, the success probability of the Monte Carlo algorithm* (*f*, *x*<sub>0</sub>) *in finding a nontrivial factor r of n is greater than* 1 − e<sup>−λ</sup>*, that is,*

$$P\{(f, \mathbf{x}\_0) \text{find out } r | n, r > 1\} \ge 1 - \mathbf{e}^{-\lambda}.\tag{5.27}$$

*The number of bit operations required by the algorithm, which depends on the parameter* λ *(to ensure the success rate of the algorithm), is O*(√λ · <sup>4</sup>√*n* · log<sup>3</sup> *n*)*.*

*Proof* From the discussion of computational complexity in Chap. 1, finding the greatest common divisor of two integers, as well as addition, subtraction, multiplication and division mod *n*, is polynomial. Let *C*<sub>1</sub> satisfy

$$\text{Time}((\text{y} - z, n)) \le C\_1 \log^3 n, \text{ where } \text{y}, z \le n.$$

*C*<sup>2</sup> satisfies

$$\text{Time}(f(\mathbf{x}) \bmod n) \le C\_2 \log^3 n, \mathbf{x} \in \mathbb{Z}\_n.$$

If *k*<sub>0</sub> is the first subscript in the (*f*, *x*<sub>0</sub>) calculation satisfying (*x*<sub>*k*<sub>0</sub></sub> − *x*<sub>*j*<sub>0</sub></sub>, *n*) > 1, then by the improved Monte Carlo algorithm we find *k* with (*x*<sub>*k*</sub> − *x*<sub>*j*</sub>, *n*) > 1, where *j* = 2<sup>*h*</sup> − 1, 2<sup>*h*</sup> ≤ *k* < 2<sup>*h*+1</sup>. By Lemma 5.15, *k* ≤ 4*k*<sub>0</sub>. Thus

$$\text{Time}(k \text{ found by } (f, x_0)) \le 4k_0 (C_1 \log^3 n + C_2 \log^3 n). \tag{5.28}$$

Let (*xk*<sup>0</sup> <sup>−</sup> *<sup>x</sup> <sup>j</sup>*<sup>0</sup> , *<sup>n</sup>*) <sup>=</sup> *<sup>r</sup>* <sup>&</sup>gt; 1, *<sup>r</sup>* <sup>&</sup>lt; <sup>√</sup>*n*, by Lemma 5.14, *<sup>k</sup>*<sup>0</sup> <sup>≤</sup> *<sup>r</sup>*, so

$$\text{Time}(\text{find } r, \; r \mid n, \; r < \sqrt{n}) \le 4\sqrt{n}(C_1 \log^3 n + C_2 \log^3 n).$$

This proves Equation (5.26). In the probabilistic sense, that is, on the premise of allowing a certain error, Eq. (5.26) can be further improved.

Let λ > 0 be any given real number. By Theorem 5.3, the proportion of pairs (*f*, *x*<sub>0</sub>) with *k*<sub>0</sub> ≥ 1 + √(2λ*r*) is ≤ e<sup>−λ</sup>; in other words, the probability of successfully finding *r*, *r* | *n*, *r* ≤ √*n* is

$$P\{\text{find out}\,r,r|n,r<\sqrt{n}\} \ge 1 - \mathbf{e}^{-\lambda}.$$

To ensure this success rate, we may assume *k*<sub>0</sub> ≤ 1 + √(2λ*r*). By (5.28), the number of bit operations required is not greater than

$$4\left(1+\sqrt{2\lambda r}\right)\left(C_1\log^3 n + C_2\log^3 n\right) = O\left(\sqrt{\lambda}\sqrt[4]{n}\log^3 n\right).$$

We have completed the proof of Lemma.

*Remark 5.1* A basic assumption of the Monte Carlo method is that the integer coefficient polynomial *f* can be used as an average mapping (see Lemma 5.14); this has not yet been proved.

# **5.4 Fermat Decomposition and Factor Basis Method**

**Lemma 5.17** *Suppose n is an odd number. There is a 1-1 correspondence between factorizations n* = *a* · *b* (*a* ≥ *b* > 0) *of n and expressions n* = *t*<sup>2</sup> − *s*<sup>2</sup> *(t and s nonnegative integers) of n. The correspondence* σ : (*a*, *b*) → (*t*, *s*) *can be written as* σ((*a*, *b*)) = (*t*, *s*)*, where*

$$
\sigma\left(\left(a,b\right)\right) = \left(\frac{a+b}{2}, \frac{a-b}{2}\right).
$$

*Inverse mapping is*

$$
\sigma^{-1}((t,s)) = (t+s, t-s).
$$

*Proof* If *n* = *ab*, because both *a* and *b* are odd, then *n* = ((*a* + *b*)/2)<sup>2</sup> − ((*a* − *b*)/2)<sup>2</sup>, so define

$$
\sigma\left(\left(a,b\right)\right) = \left(\frac{a+b}{2}, \frac{a-b}{2}\right).
$$

Conversely, if *n* = *t*<sup>2</sup> − *s*<sup>2</sup>, then *n* = (*t* + *s*)(*t* − *s*). So define σ<sup>−1</sup>((*t*, *s*)) = (*t* + *s*, *t* − *s*); we prove σ<sup>−1</sup>σ = 1 and σσ<sup>−1</sup> = 1. By the definition,

$$\begin{cases} \sigma^{-1}\sigma((a,b)) = \sigma^{-1}\left(\frac{a+b}{2}, \frac{a-b}{2}\right) = (a,b), \\\\ \sigma(\sigma^{-1}((t,s))) = \sigma(t+s, t-s) = (t,s). \end{cases}$$

So σ is a 1-1 correspondence between the two decompositions *n* = *ab* and *n* = *t*<sup>2</sup> − *s*<sup>2</sup>; the Lemma holds.

The above simple lemma provides us with a method of factor decomposition, called Fermat factorization: if *n* = *ab* with *a* very close to *b*, then *n* = ((*a* + *b*)/2)<sup>2</sup> − ((*a* − *b*)/2)<sup>2</sup> = *t*<sup>2</sup> − *s*<sup>2</sup>, where *s* is very small and *t* is only a little larger than √*n*. Therefore, starting from *t* = [√*n*] + 1, we successively detect whether *t*<sup>2</sup> − *n* is a perfect square; if not, we try *t* = [√*n*] + 2, and so on, until *t*<sup>2</sup> − *n* = *s*<sup>2</sup>. Then we get *n* = (*t* + *s*)(*t* − *s*) through Fermat factorization. This method is effective when *n* = *ab* with *a* and *b* very close.
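A minimal sketch of the Fermat factorization just described, assuming *n* is odd, composite and not a perfect square:

```python
from math import isqrt

def fermat_factor(n):
    # search t = [sqrt(n)] + 1, [sqrt(n)] + 2, ... until t^2 - n is a perfect
    # square s^2; then n = (t + s)(t - s).  The loop terminates at the latest
    # at the trivial decomposition n = n * 1, i.e. t = (n + 1) / 2.
    t = isqrt(n) + 1
    while True:
        s = isqrt(t * t - n)
        if s * s == t * t - n:
            return t + s, t - s
        t += 1

assert fermat_factor(4633) == (113, 41)   # found at t = 77, s = 36
assert fermat_factor(91) == (13, 7)       # found immediately at t = 10, s = 3
```
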

Fermat factor decomposition can be further expanded into the factor base method, a more effective factor decomposition method. Its basic idea is: in Fermat factorization, *t*<sup>2</sup> − *n* is required to be a perfect square, which rarely happens in practice, but *t*<sup>2</sup> ≡ *s*<sup>2</sup>(mod *n*) with *t* ≢ ±*s*(mod *n*) occurs easily. Calculating the greatest common divisors (*t* + *s*, *n*) and (*t* − *s*, *n*), we then have the factorization

$$n = (t+s, n)(t-s, n).$$

**Definition 5.5** Let *B* = {*p*<sub>1</sub>, *p*<sub>2</sub>, . . . , *p*<sub>*h*</sub>} be *h* different primes (allowing *p*<sub>1</sub> = −1); *B* is called a factor base. An integer *b* is called a *B*-number if the residue of *b*<sup>2</sup> of least absolute value under mod *n* can be expressed as the product of numbers in *B*, where *n* is the given positive integer.

*Example 5.7* Let *n* = 4633, *B* = {−1, 2, 3}; then 67, 68, 69 are all *B*-numbers, because 67<sup>2</sup> ≡ −144(mod 4633), 68<sup>2</sup> ≡ −9(mod 4633), 69<sup>2</sup> ≡ 128(mod 4633).
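The data of Example 5.7 can be checked with a short routine that computes the residue of least absolute value and factors it over *B* (the function names are ours):

```python
def least_absolute_residue(a, n):
    # residue of a mod n with least absolute value
    r = a % n
    return r - n if r > n // 2 else r

def b_number_exponents(b, n, primes):
    # if b is a B-number for B = {-1} + primes, return the exponent vector
    # (alpha_1, ..., alpha_h) of b^2 mod n over B; otherwise return None
    a = least_absolute_residue(b * b, n)
    if a == 0:
        return None
    exps = [1 if a < 0 else 0]    # exponent of p_1 = -1
    a = abs(a)
    for p in primes:
        e = 0
        while a % p == 0:
            a, e = a // p, e + 1
        exps.append(e)
    return exps if a == 1 else None

# Example 5.7: n = 4633, B = {-1, 2, 3}
n = 4633
assert b_number_exponents(67, n, [2, 3]) == [1, 4, 2]   # 67^2 ≡ -144 = -1 * 2^4 * 3^2
assert b_number_exponents(68, n, [2, 3]) == [1, 0, 2]   # 68^2 ≡ -9  = -1 * 3^2
assert b_number_exponents(69, n, [2, 3]) == [0, 7, 0]   # 69^2 ≡ 128 = 2^7
```
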

If *b* is a *B*-number, let *b*<sup>2</sup> mod *n* denote the residue of *b*<sup>2</sup> of least absolute value under mod *n*; by the definition,

$$b^2 \bmod n = \prod\_{i=1}^h p\_i^{\alpha\_i}, \alpha\_i \ge 0.$$

Let *<sup>e</sup>* = {*e*1, *<sup>e</sup>*2,..., *eh*} ∈ <sup>F</sup>*<sup>h</sup>* <sup>2</sup> be an *h*-dimensional binary vector, define


$$e_j = \begin{cases} 0, & \text{if } \alpha_j \text{ is even;}\\ 1, & \text{if } \alpha_j \text{ is odd,} \end{cases} \quad 1 \le j \le h.$$

*e* is called the binary vector corresponding to *b*. If *A* = {*b*<sub>*i*</sub>} is a set of *B*-numbers, the binary vector corresponding to each *b*<sub>*i*</sub> is denoted *e*<sub>*i*</sub> = (*e*<sub>*i*1</sub>, *e*<sub>*i*2</sub>, . . . , *e*<sub>*ih*</sub>), and we write *a*<sub>*i*</sub> = *b*<sup>2</sup><sub>*i*</sub> mod *n*. We have

$$\prod_{i \in A} a_i = \prod_{j=1}^h p_j^{\sum_{i \in A} \alpha_{ij}}, \text{ where } a_i = \prod_{j=1}^h p_j^{\alpha_{ij}}.$$

Suppose Σ<sub>*i*∈*A*</sub> *e*<sub>*i*</sub> = (0, 0, . . . , 0) is the zero vector in F<sup>*h*</sup><sub>2</sub>, then

$$\sum\_{i \in A} \alpha\_{ij} \equiv 0 (\text{mod } 2), \forall \ 1 \le j \le h.$$

That is, ∏<sub>*i*∈*A*</sub> *a*<sub>*i*</sub> is a square number. Let *r*<sub>*j*</sub> = (1/2) Σ<sub>*i*∈*A*</sub> α<sub>*ij*</sub>, then

$$\prod\_{i \in A} a\_i = \left(\prod\_{j=1}^h p\_j^{r\_j}\right)^2, \text{define } c = \prod\_{j=1}^h p\_j^{r\_j}, \tag{5.29}$$

On the other hand, *bi* mod *n* represents the minimum nonnegative residue of *bi* under mod *n*, let

$$b = \prod\_{i \in A} (b\_i \bmod n) = \prod\_{i \in A} \delta\_i,\tag{5.30}$$

where δ*<sup>i</sup>* = *bi* mod *n*, that is 0 ≤ δ*<sup>i</sup>* < *n*, and *bi* ≡ δ*i*(mod *n*), thus

$$\prod\_{i \in A} b\_i \equiv b(\operatorname{mod} n).$$

Because *a*<sub>*i*</sub> = *b*<sup>2</sup><sub>*i*</sub> mod *n*, we have *b*<sup>2</sup><sub>*i*</sub> ≡ *a*<sub>*i*</sub>(mod *n*). There is

$$\prod_{i \in A} b_i^2 \equiv b^2 \equiv \prod_{i \in A} a_i = c^2 \pmod{n}.$$

The two integers *b* and *c* defined by Eqs. (5.30) and (5.29) satisfy *b*<sup>2</sup> ≡ *c*<sup>2</sup>(mod *n*). We write the above analysis as the following lemma.

**Lemma 5.18** *Let A* = {*b*<sub>1</sub>, *b*<sub>2</sub>, . . . , *b*<sub>*i*</sub>, . . .} *be a finite set of B-numbers, let e*<sub>*i*</sub> = (*e*<sub>*i*1</sub>, *e*<sub>*i*2</sub>, . . . , *e*<sub>*ih*</sub>) ∈ F<sup>*h*</sup><sub>2</sub> *be the binary vector corresponding to b*<sub>*i*</sub>*, a*<sub>*i*</sub> = *b*<sup>2</sup><sub>*i*</sub> mod *n, and* δ<sub>*i*</sub> = *b*<sub>*i*</sub> mod *n. If* Σ<sub>*i*∈*A*</sub> *e*<sub>*i*</sub> = 0 *is the zero vector in* F<sup>*h*</sup><sub>2</sub>*, then* ∏<sub>*i*∈*A*</sub> *a*<sub>*i*</sub> *is a square number. Write*

$$a_i = \prod_{j=1}^h p_j^{\alpha_{ij}}, \quad \prod_{i \in A} a_i = \prod_{j=1}^h p_j^{\sum_{i \in A} \alpha_{ij}} = c^2,$$

*where*

$$c = \prod\_{j=1}^{h} p\_j^{\frac{1}{2} \sum\_{i \in A} \alpha\_{ij}},$$

*Further, let b* = δ<sub>1</sub>δ<sub>2</sub> ···*; then b*<sup>2</sup> ≡ *c*<sup>2</sup>(mod *n*)*.*

From the above lemma, if *b*<sup>2</sup> ≡ *c*<sup>2</sup>(mod *n*) and *b* ≢ ±*c*(mod *n*), then we find a nontrivial factor *d* = (*b* + *c*, *n*) of *n*. Now the question is: if *b*<sup>2</sup> ≡ *c*<sup>2</sup>(mod *n*), how likely is *b* ≢ ±*c*(mod *n*)? We may assume (*b*, *n*) = (*c*, *n*) = 1, since otherwise a nontrivial factor of *n* already appears. From *b*<sup>2</sup> ≡ *c*<sup>2</sup>(mod *n*) it follows that (*bc*<sup>−1</sup>)<sup>2</sup> ≡ 1(mod *n*). The problem is transformed into how many solutions *x* there are of *x*<sup>2</sup> ≡ 1(mod *n*), 1 ≤ *x* < *n*.

**Lemma 5.19** *Let n be an odd number; then the number of solutions of x*<sup>2</sup> ≡ 1(mod *n*) *is* 2<sup>*r*</sup>*, where r is the number of different prime factors of n.*

*Proof* If *r* = 1, then *n* = *p*<sup>α</sup> (α ≥ 1), *p* an odd prime, and *x*<sup>2</sup> ≡ 1(mod *p*<sup>α</sup>) has the two solutions *x* ≡ ±1. Indeed, let *g* be a primitive root mod *p*<sup>α</sup>; then *x* = *g<sup>t</sup>* (1 ≤ *t* ≤ *p*<sup>α−1</sup>(*p* − 1)), and *x*<sup>2</sup> = 1 ⇔ *p*<sup>α−1</sup>(*p* − 1) | 2*t*. So there are only the two solutions *t* = (1/2)*p*<sup>α−1</sup>(*p* − 1) and *t* = *p*<sup>α−1</sup>(*p* − 1), that is, *x* ≡ ±1(mod *p*<sup>α</sup>). If *n* = *p*<sup>α<sub>1</sub></sup><sub>1</sub> ··· *p*<sup>α<sub>*r*</sub></sup><sub>*r*</sub>, the number of solutions of *x*<sup>2</sup> ≡ 1(mod *n*) is 2<sup>*r*</sup> by the Chinese remainder theorem. The Lemma holds!
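Lemma 5.19 is easy to confirm by direct count for small odd *n*:

```python
def num_distinct_prime_factors(n):
    # trial division; count distinct prime factors of n
    r, d = 0, 2
    while d * d <= n:
        if n % d == 0:
            r += 1
            while n % d == 0:
                n //= d
        d += 1
    return r + (1 if n > 1 else 0)

def square_roots_of_one(n):
    return [x for x in range(1, n) if x * x % n == 1]

# Lemma 5.19: for odd n the congruence x^2 ≡ 1 (mod n) has 2^r solutions
for n in (9, 15, 105, 1155):       # r = 1, 2, 3, 4 distinct prime factors
    assert len(square_roots_of_one(n)) == 2 ** num_distinct_prime_factors(n)
```
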

**Lemma 5.20** *Let n be an odd number divisible by at least two different primes, and let B* = {*p*<sub>1</sub>, *p*<sub>2</sub>, . . . , *p*<sub>*h*</sub>} *be a factor base. If b and c with b*<sup>2</sup> ≡ *c*<sup>2</sup>(mod *n*) *are selected at random, then the probability that b* ≡ ±*c*(mod *n*) *is* ≤ 1/2*.*

*Proof* *x*<sup>2</sup> ≡ 1(mod *n*) has 2<sup>*r*</sup> different solutions (mod *n*), *r* ≥ 2. Among them, only the two solutions *x* ≡ ±1(mod *n*) correspond to *b* ≡ ±*c*(mod *n*). Thus

$$P\{b \equiv \pm c \,(\text{mod } n) \mid b^2 \equiv c^2 \,(\text{mod } n)\} \le \frac{2}{2^r} \le \frac{1}{2},$$

Lemma 5.20 holds.

According to Lemma 5.20, *b* and *c* are selected by using the factor base; if *b* ≡ ±*c*(mod *n*), the selection fails, and the probability of failure is ≤ 1/2. If the selection fails, select another *b*<sub>1</sub> and *c*<sub>1</sub>; selecting *k* pairs *b* and *c* in this way, almost independently, the probability that some pair satisfies *b* ≢ ±*c*(mod *n*) is

$$P\{b^2 \equiv c^2(\text{mod } n), b \not\equiv \pm c(\text{mod } n)\} \ge 1 - \frac{1}{2^k}.\tag{5.31}$$

In other words, the probability of finding a nontrivial factor *d* = (*b* + *c*, *n*) of *n* by using the factor base can be made arbitrarily close to 1. Below, we systematically summarize the factor base decomposition method as follows:

Factor-based method

Let *n* be a large odd number and *y* an appropriately selected bound (e.g., *y* ≤ *n*<sup>1/10</sup>); let the factor base be

$$B = \{-1\} \cup \{p \mid p \text{ is prime}, \; p \le y\}.$$

Select a certain number of *B*-numbers at random, *A*<sub>1</sub> = {*b*<sub>1</sub>, *b*<sub>2</sub>, . . . , *b*<sub>*N*</sub>}; usually *N* = π(*y*) + 2 will meet the needs. Each *b*<sub>*i*</sub> is expressed as the product of numbers in *B*; calculate the corresponding binary vector *e*<sub>*i*</sub>, and select a subset *A* ⊂ *A*<sub>1</sub> such that Σ<sub>*i*∈*A*</sub> *e*<sub>*i*</sub> = 0, denoted *A* = {*b*<sub>1</sub>, *b*<sub>2</sub>, . . . , *b*<sub>*i*</sub>, . . .}. Let

$$b = \prod\_{i \in A} (b\_i \bmod n) = \prod\_{i \in A} \delta\_i, \text{ where } \delta\_i = b\_i \bmod n$$

and

$$c = \prod\_{j \in B} p\_j^{r\_j} \bmod n, \ r\_j = \frac{1}{2} \sum\_{i \in A} \alpha\_{ij}.$$

We have *b*<sup>2</sup> ≡ *c*<sup>2</sup> (mod *n*). If *b* ≡ ±*c* (mod *n*), then reselect the subset *A*, until finally *b* ≢ ±*c* (mod *n*). In this way, we find a nontrivial factor *d* | *n* of *n*, *d* = (*b* + *c*, *n*), and therefore the factorization *n* = *d* · (*n*/*d*).
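The procedure above can be sketched in a few lines of Python; the book itself gives no code, and the search range, the factor base passed in, and the function name here are illustrative assumptions, not part of the method's specification.

```python
from math import gcd, isqrt
from itertools import combinations

def factor_base_method(n, primes):
    """Illustrative sketch of the factor-based method for odd composite n."""
    base_len = 1 + len(primes)   # slot 0 tracks the sign (-1) exponent
    rows = []                    # pairs (b, exponent vector of b^2 mod n)
    # Collect B-numbers: b whose least absolute residue of b^2 mod n
    # factors completely over the factor base.
    for b in range(isqrt(n) + 1, isqrt(n) + 200):
        r = b * b % n
        if r > n // 2:
            r -= n                                  # least absolute residue
        m = abs(r)
        exps = [1 if r < 0 else 0] + [0] * len(primes)
        for j, p in enumerate(primes, start=1):
            while m % p == 0:
                exps[j] += 1
                m //= p
        if m == 1:                                  # fully factored over B
            rows.append((b, exps))
    # Look for a subset A whose exponent vectors sum to zero mod 2.
    for k in range(2, len(rows) + 1):
        for subset in combinations(rows, k):
            total = [sum(col) for col in zip(*(e for _, e in subset))]
            if any(t % 2 for t in total):
                continue
            b = 1
            for bi, _ in subset:
                b = b * bi % n
            c = 1
            for j, p in enumerate(primes, start=1):
                c = c * pow(p, total[j] // 2, n) % n
            if (b - c) % n and (b + c) % n:         # b != +-c (mod n): success
                d = gcd(b + c, n)
                return d, n // d
    return None

print(factor_base_method(9073, [2, 3, 5, 7]))   # a nontrivial splitting of 9073
```

Since *b*² ≡ *c*² (mod *n*) with *b* ≢ ±*c* (mod *n*), the gcd is guaranteed to be a proper factor; the subset search here is brute force, whereas a real implementation would use Gaussian elimination over 𝔽₂.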

Factorization by the factor-based method cannot guarantee a success rate of 100%, because *b* ≢ ±*c* (mod *n*) cannot be deduced from *b*<sup>2</sup> ≡ *c*<sup>2</sup> (mod *n*); however, the success probability of factoring a large odd *n* can be made arbitrarily close to 1. Under the condition of success probability ≥ 1 − 1/2<sup>*k*</sup> (*k* a given positive integer), the computational complexity of factoring *n* by the factor-based method can be estimated as

$$\text{Time(factor-based method to factor } n) = O\!\left(e^{c\sqrt{\log n \log\log n}}\right). \tag{5.32}$$

The proof of Formula (5.32) is relatively complex, and no detailed proof is given here; interested readers can refer to pages 136–141 of (Pomerance, 1982a) in reference 5. The exact value of *c* in (5.32) is unknown. It is generally conjectured that *c* = 1 + ε, where ε > 0 is an arbitrarily small positive real number.

Let *k* be the number of bits of *n*; the estimate on the right of (5.32) can then be written as *O*(e<sup>*c*√(*k* log *k*)</sup>). Therefore, the computational complexity of the factor-based method is sub-exponential. By contrast, the computational complexity of the Monte Carlo method introduced in the previous section (see (5.31)) is exponential, because

$$O(\sqrt{n}) = O(e^{c_1 k}), \text{ where } c_1 = \frac{1}{2} \log 2.$$

As we all know, the security of the RSA public key cryptosystem is based on the difficulty of the prime factorization *n* = *pq*. There is no general method to factor an arbitrary large odd *n*; although the Monte Carlo method and the factor-based method are probabilistic methods whose probability of successful factorization is very large, their computational complexity is exponential and sub-exponential, respectively. This is the reason for choosing huge primes *p* and *q* in RSA.

# **5.5 Continued Fraction Method**

In the factor-based method introduced in the previous section, *b*<sup>2</sup> mod *n* can be taken to be the residue of minimum absolute value of *b*<sup>2</sup> under mod *n*, that is

$$b^2 \equiv b^2 \bmod n \,(\text{mod } n), \ |b^2 \bmod n| \le \frac{n}{2}.$$

In this way, *b*<sup>2</sup> mod *n* can be decomposed into the product of some smaller prime numbers. The key question is how to find integers *b* such that |*b*<sup>2</sup> mod *n*| < 2√*n*, for then *b*<sup>2</sup> mod *n* is more likely to decompose into a product of small primes; the continued fraction method is at present the best way to do this. First, we introduce what a continued fraction is and some of its basic properties.

Suppose *x* ∈ ℝ is a real number, [*x*] is the integer part of *x*, and {*x*} is the fractional part of *x*. Let *a*<sub>0</sub> = [*x*]; if {*x*} ≠ 0, let *a*<sub>1</sub> = [1/{*x*}]. Because *x* = [*x*] + {*x*}, there is

$$x = a_0 + \{x\} = a_0 + \cfrac{1}{a_1 + \{\{x\}^{-1}\}}.$$

If {{*x*}<sup>−1</sup>} ≠ 0, write

$$a_2 = [\{\{x\}^{-1}\}^{-1}],$$

and consider $\{\{\{x\}^{-1}\}^{-1}\}$ in the same way. Continuing this process, we get

$$x = a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \dotsb}}}.$$

The above formula is called the continued fraction expansion of the real number *x*. To save space, write *x* = [*a*<sub>0</sub>, *a*<sub>1</sub>,..., *a<sub>n</sub>*,...]. The continued fraction expansion of *x* is finite if and only if *x* is a rational number; in that case we write

$$x = [a_0, a_1, \dots, a_n], \text{ where } a_n > 1,$$

which is called the standard expansion of the rational number *x*.
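For a rational number the expansion is exactly the Euclidean algorithm; a small Python sketch (the function names are ours, not the book's):

```python
from fractions import Fraction

def contfrac_rational(p, q):
    """Standard continued fraction expansion [a0, a1, ..., an] of p/q
    (p, q > 0): each step is one division of the Euclidean algorithm."""
    terms = []
    while q:
        a, r = divmod(p, q)   # p = a*q + r with 0 <= r < q
        terms.append(a)
        p, q = q, r
    return terms

def contfrac_value(terms):
    """Rebuild the rational number from its expansion, from the inside out."""
    v = Fraction(terms[-1])
    for a in reversed(terms[:-1]):
        v = a + 1 / v
    return v

print(contfrac_rational(355, 113))                       # → [3, 7, 16]
assert contfrac_value([3, 7, 16]) == Fraction(355, 113)  # round trip
```

Note that the last partial quotient 16 is > 1, so [3, 7, 16] is the standard expansion of 355/113.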

**Definition 5.6** Let *x* = [*a*<sub>0</sub>, *a*<sub>1</sub>,..., *a<sub>n</sub>*,...] be the continued fraction expansion of *x*. For *i* ≥ 0, call *b<sub>i</sub>*/*c<sub>i</sub>* = [*a*<sub>0</sub>, *a*<sub>1</sub>,..., *a<sub>i</sub>*] the *i*th asymptotic fraction of *x*; in particular,

$$\frac{b_0}{c_0} = \frac{a_0}{1}, \ \frac{b_1}{c_1} = \frac{a_1 a_0 + 1}{a_1}.$$

The asymptotic fraction *b<sub>i</sub>*/*c<sub>i</sub>* of the real number *x* is in lowest terms, that is, (*b<sub>i</sub>*, *c<sub>i</sub>*) = 1, and it has the following properties.

**Lemma 5.21** *Let x* = [*a*<sub>0</sub>, *a*<sub>1</sub>,..., *a<sub>n</sub>*, ···] *be the continued fraction expansion of x and b<sub>i</sub>*/*c<sub>i</sub>* *its asymptotic fractions, then*

*(i) when i* ≥ 2*,*

$$\frac{b\_i}{c\_i} = \frac{a\_i b\_{i-1} + b\_{i-2}}{a\_i c\_{i-1} + c\_{i-2}}.\tag{5.33}$$

*(ii) If i* ≥ 1*, then*

$$b\_i c\_{i-1} - b\_{i-1} c\_i = (-1)^{i-1}.\tag{5.34}$$

*Proof* We prove (i) by induction. Obviously, the proposition holds for *i* = 2, that is

$$\frac{b\_2}{c\_2} = \frac{a\_2b\_1 + b\_0}{a\_2c\_1 + c\_0} = \frac{a\_2(a\_1a\_0 + 1) + a\_0}{a\_2a\_1 + 1}.$$

If the proposition holds for *i*, that is

$$\frac{b\_i}{c\_i} = \frac{a\_i b\_{i-1} + b\_{i-2}}{a\_i c\_{i-1} + c\_{i-2}}.$$

Then write $[a_0, a_1, \dots, a_i, a_{i+1}] = [a_0, a_1, \dots, a_{i-1}, a_i + \frac{1}{a_{i+1}}]$, hence

$$\frac{b\_{i+1}}{c\_{i+1}} = \frac{\left(a\_i + \frac{1}{a\_{i+1}}\right)b\_{i-1} + b\_{i-2}}{\left(a\_i + \frac{1}{a\_{i+1}}\right)c\_{i-1} + c\_{i-2}} = \frac{a\_{i+1}b\_i + b\_{i-1}}{a\_{i+1}c\_i + c\_{i-1}}.$$

So (i) holds.

We prove Formula (5.34) by induction, when *i* = 1,

$$b\_1c\_0 - b\_0c\_1 = a\_1a\_0 + 1 - a\_1a\_0 = 1 = (-1)^0.$$

So the proposition holds when *i* = 1. Suppose it holds for *i*, that is

$$b\_i c\_{i-1} - b\_{i-1} c\_i = (-1)^{i-1}.$$

Then

$$\begin{aligned} b\_{i+1}c\_i - b\_ic\_{i+1} &= (a\_{i+1}b\_i + b\_{i-1})c\_i - b\_i(a\_{i+1}c\_i + c\_{i-1}) \\ &= b\_{i-1}c\_i - b\_ic\_{i-1} \\ &= (-1)^i. \end{aligned}$$

Lemma 5.21 holds.
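Recurrence (5.33) and identity (5.34) are easy to check numerically; a short Python sketch (the initial values *b*<sub>−1</sub> = 1, *c*<sub>−1</sub> = 0 are the standard seeds implied by *b*<sub>0</sub>/*c*<sub>0</sub> and *b*<sub>1</sub>/*c*<sub>1</sub>):

```python
def convergents(terms):
    """Asymptotic fractions b_i/c_i computed by recurrence (5.33):
    b_i = a_i*b_{i-1} + b_{i-2}, c_i = a_i*c_{i-1} + c_{i-2}."""
    b_prev, b = 1, terms[0]      # b_{-1} = 1, b_0 = a_0
    c_prev, c = 0, 1             # c_{-1} = 0, c_0 = 1
    out = [(b, c)]
    for a in terms[1:]:
        b_prev, b = b, a * b + b_prev
        c_prev, c = c, a * c + c_prev
        out.append((b, c))
    return out

cs = convergents([1, 2, 2, 2, 2, 2])   # start of the expansion of sqrt(2)
print(cs)   # → [(1, 1), (3, 2), (7, 5), (17, 12), (41, 29), (99, 70)]
for i in range(1, len(cs)):
    (bi, ci), (bp, cp) = cs[i], cs[i - 1]
    assert bi * cp - bp * ci == (-1) ** (i - 1)   # identity (5.34)
```

The computed pairs 1/1, 3/2, 7/5, 17/12, 41/29, 99/70 are the familiar rational approximations of √2, and each consecutive pair satisfies (5.34).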

Continued fractions have many important applications in number theory, such as the rational approximation of real numbers and of algebraic numbers. Periodic continued fractions are an important special case in the rational approximation of algebraic numbers: if the *a<sub>i</sub>* in *x* = [*a*<sub>0</sub>, *a*<sub>1</sub>,..., *a<sub>n</sub>*,...] eventually repeat with a fixed period, the expansion is called a periodic continued fraction. The famous Lagrange theorem states that the continued fraction expansion of *x* is periodic if and only if *x* is a quadratic real algebraic number. Here we do not discuss the deeper properties of continued fractions, but only prove the properties we need.

**Lemma 5.22** *Let x* > 1 *be a real number and b<sub>i</sub>*/*c<sub>i</sub>* (*i* ≥ 0) *the asymptotic fractions of x, then*

$$|b_i^2 - x^2 c_i^2| < 2x, \ \forall \ i \ge 0.$$

*Proof* Because *x* lies between the asymptotic fractions *b<sub>i</sub>*/*c<sub>i</sub>* and *b<sub>i+1</sub>*/*c<sub>i+1</sub>*, by property (ii) of Lemma 5.21 there is

$$\left| \frac{b\_{i+1}}{c\_{i+1}} - \frac{b\_i}{c\_i} \right| = \frac{1}{c\_i c\_{i+1}}, i \ge 0.$$

Thus

$$\begin{aligned} |b_i^2 - x^2 c_i^2| &= c_i^2 \left| x - \frac{b_i}{c_i} \right| \left| x + \frac{b_i}{c_i} \right| \\ &< c_i^2 \cdot \frac{1}{c_i c_{i+1}} \left( x + \left( x + \frac{1}{c_i c_{i+1}} \right) \right). \end{aligned}$$

So

$$\begin{aligned} |b_i^2 - x^2 c_i^2| - 2x &< 2x \left( -1 + \frac{c_i}{c_{i+1}} + \frac{1}{2x c_{i+1}^2} \right) \\ &< 2x \left( -1 + \frac{c_i}{c_{i+1}} + \frac{1}{c_{i+1}} \right) \\ &\le 2x \left( -1 + \frac{c_{i+1}}{c_{i+1}} \right) = 0. \end{aligned}$$

The Lemma holds.

**Lemma 5.23** *Let n be a positive integer that is not a perfect square. Let* {*b<sub>i</sub>*/*c<sub>i</sub>*}<sub>*i*≥0</sub> *be the asymptotic fractions of the continued fraction expansion of* √*n, and let b*<sup>2</sup><sub>*i*</sub> mod *n be the residue of minimum absolute value of b*<sup>2</sup><sub>*i*</sub> *under* mod *n, then we have*

$$|b_i^2 \bmod n| < 2\sqrt{n}, \ \forall i \ge 0.$$

*Proof* Take *x* = √*n* in Lemma 5.22, and note that

$$b\_i^2 \equiv b\_i^2 - n c\_i^2 (\text{mod } n).$$

Because

$$|b_i^2 - nc_i^2| < 2\sqrt{n} \Longrightarrow |b_i^2 \bmod n| < 2\sqrt{n}, \ \forall i \ge 0.$$

The Lemma holds.
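Lemma 5.23 can be checked directly using the classical integer recurrence for the partial quotients of √*n*; a Python sketch (the variable names are ours), run here on *n* = 9073, whose first residues reappear in Example 5.8 below:

```python
from math import isqrt

def sqrt_cf_residues(n, nterms):
    """Least absolute residues b_i^2 mod n for the convergent numerators b_i
    of sqrt(n), using the classical integer recurrence for the partial
    quotients: m' = d*a - m, d' = (n - m'^2)/d, a' = (a0 + m') // d'."""
    a0 = isqrt(n)
    m, d, a = 0, 1, a0
    b_prev, b = 1, a0                     # b_{-1} = 1, b_0 = a_0
    residues = []
    for _ in range(nterms):
        r = b * b % n
        if r > n // 2:
            r -= n                        # least absolute residue
        assert abs(r) < 2 * isqrt(n) + 2  # Lemma 5.23 (integer-safe bound)
        residues.append(r)
        m = d * a - m                     # next partial quotient of sqrt(n)
        d = (n - m * m) // d
        a = (a0 + m) // d
        b_prev, b = b, a * b + b_prev     # recurrence (5.33) for numerators
    return residues

print(sqrt_cf_residues(9073, 5))   # → [-48, 139, -7, 87, -27]
```

All five residues are tiny compared with 2√9073 ≈ 190.5, which is exactly why they factor over a small factor base.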

Combining the above Lemma 5.23 with the factorization method, we obtain the continued fraction decomposition method.

Continued fraction decomposition method:

The operations mod *n* in this algorithm are, unless specially pointed out, the minimum nonnegative residues mod *n*. Let *n* be a large odd composite number. First let *b*<sub>−1</sub> = 1, *b*<sub>0</sub> = *a*<sub>0</sub> = [√*n*], and *x*<sub>0</sub> = √*n* − *a*<sub>0</sub> = {√*n*}, and calculate *b*<sub>0</sub><sup>2</sup> mod *n*; in fact, *b*<sub>0</sub><sup>2</sup> mod *n* = *b*<sub>0</sub><sup>2</sup> − *n*. Second, consider *i* = 1, 2,.... To determine *b<sub>i</sub>*, we proceed in several steps:


By Lemma 5.23, |*b*<sup>2</sup><sub>*i*</sub> mod *n*| < 2√*n*, so it can be decomposed into the product of some small prime numbers. If a prime number *p* appears in the decomposition of two or more of the *b*<sup>2</sup><sub>*i*</sub> mod *n*, or appears to an even power in the decomposition of one *b*<sup>2</sup><sub>*i*</sub> mod *n*, then *p* is called a standard prime; in other words, a standard prime *p* satisfies

$$p|b\_i^2 \bmod n, \ p|b\_j^2 \bmod n, i \neq j.$$

Or

$$p^{\alpha} \| b\_i^2 \bmod n, \text{ } \alpha \text{ is even.}$$

We choose factor base *B* as

$$B = \{-1, \text{ standard prime}\}.$$

In this way, all *b*<sup>2</sup><sub>*i*</sub> mod *n* are *B*-numbers, and the corresponding binary vector is *e<sub>i</sub>*. Select a subset *A* = {*b<sub>i</sub>*} such that $\sum_{i \in A} e_i = 0$. Let

$$b = \prod\_{i \in A} (b\_i \bmod n) = \prod\_{i \in A} \delta\_i$$

and $c = \prod_{j \in B} p_j^{r_j}$, where

$$r\_j = \frac{1}{2} \sum\_{i \in A} \alpha\_{ij}, \forall j \in B.$$

If *b* ≢ ±*c* (mod *n*), then (*b* + *c*, *n*) is a nontrivial factor of *n*, and we obtain the factorization of *n*. If *b* ≡ ±*c* (mod *n*), then another subset *A* is selected; repeating this completes the continued fraction factorization method.

*Example 5.8* The continued fraction method is used to factorize *n* = 9073.

Solution: We calculate *a<sub>i</sub>*, *b<sub>i</sub>* and *b*<sup>2</sup><sub>*i*</sub> mod *n* in turn, where *b<sub>i</sub>* = (*a<sub>i</sub>b<sub>i−1</sub>* + *b<sub>i−2</sub>*) mod *n*; the table is as follows:


From the values of *b*<sup>2</sup><sub>*i*</sub> mod *n*, we can choose the factor base *B* as *B* = {−1, 2, 3, 7}. Then *b*<sup>2</sup><sub>*i*</sub> mod *n* is a *B*-number when *i* = 0, 2, 4,.... The corresponding exponent vectors over *B* are

*e*<sup>0</sup> = (1, 4, 1, 0), *e*<sup>2</sup> = (1, 0, 0, 1), and *e*<sup>4</sup> = (1, 0, 3, 0).

It is easy to check that *e*<sub>0</sub> + *e*<sub>4</sub> ≡ (0, 0, 0, 0) (mod 2). Therefore, we choose

$$\begin{cases} b = 95 \cdot 2619 \equiv 3834 \bmod 9073; \\ c = 2^2 \cdot 3^2 = 36. \end{cases}$$

Because *b*<sup>2</sup> ≡ *c*<sup>2</sup> (mod 9073), that is 3834<sup>2</sup> ≡ 36<sup>2</sup> (mod 9073), but 3834 ≢ ±36 (mod 9073), we get a nontrivial factor of *n* = 9073: *d* = (3834 + 36, 9073) = 43. Thus 9073 = 43 · 211, and the factorization of 9073 is obtained.
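The arithmetic of this example is easy to verify directly; a short Python check (not from the book):

```python
from math import gcd

n = 9073
b = 95 * 2619 % n                       # b = b_0 * b_4 mod n
c = 2**2 * 3**2                         # c = sqrt((-48)*(-27)) = sqrt(2^4 * 3^4)
assert (b, c) == (3834, 36)
assert (b * b - c * c) % n == 0         # b^2 ≡ c^2 (mod 9073)
assert b != c and b != n - c            # but b ≢ ±c (mod 9073)
d = gcd(b + c, n)
print(d, n // d)                        # → 43 211
```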

#### **Exercise 5**


7. Prove that the following integers are Carmichael numbers:

1105 = 5 · 13 · 17, 1729 = 7 · 13 · 19, 2465 = 5 · 17 · 29, 2821 = 7 · 13 · 31, 6601 = 7 · 23 · 41, 29,341 = 13 · 37 · 61, 172,081 = 7 · 13 · 31 · 61, 278,545 = 5 · 17 · 29 · 113.


*n* = 8633, *n* = 809,009, *n* = 92,296,873, *n* = 88,169,891.

14. Use the Fermat factorization method to factor the following positive integers:

*n* = 68,987, *n* = 29,895,581, *n* = 19,578,079, *n* = 17,018,759.


# **References**


Kranakis, E. (1986). *Primality and cryptography*. Wiley.

Lehman, R. S. (1974). Factoring large integers. *Mathematics of Computation, 28,* 637–646.


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Chapter 6 Elliptic Curve**

In 1985, the mathematician V. Miller introduced elliptic curves into cryptography for the first time. In 1987, the mathematician N. Koblitz further improved and perfected Miller's work and formed the famous elliptic curve public key cryptosystem. The elliptic curve public key cryptosystem, the RSA public key cryptosystem, and the ElGamal public key cryptosystem based on discrete logarithms are recognized as the three major public key cryptosystems, and they occupy the most prominent position in modern cryptography. Compared with RSA cryptography, elliptic curve cryptography can provide the same or a higher level of security with a shorter key. Compared with the ElGamal cryptosystem, the two are based on the same mathematical principle and are essentially discrete logarithm cryptosystems: the ElGamal cryptosystem is based on the discrete logarithm in the multiplicative group of a finite field, while the elliptic curve cryptosystem is based on the discrete logarithm in the Mordell group of an elliptic curve over a finite field. But choosing an elliptic curve offers more flexibility than choosing a finite field, so elliptic curve cryptosystems have attracted more attention. This chapter systematically and comprehensively introduces elliptic curve cryptography from the aspects of basic theory, cryptographic mechanism and factorization, in order to help readers better understand and master this public key mechanism.

# **6.1 Basic Theory**

The working platform of this chapter is a field *E*, especially one of the four common fields *E* = ℝ (real number field), *E* = ℂ (complex number field), *E* = ℚ (rational number field) or *E* = 𝔽<sub>*q*</sub> (finite field of *q* elements). The characteristic χ(*E*) of a field *E* is the order of the multiplicative unit element *e* of *E* in the additive group. That is, χ(*E*) = *o*(*e*) is a prime number or ∞; specifically,

$$\chi(E) = \begin{cases} \infty, & \text{if } E = \mathbb{C}, \mathbb{R}, \mathbb{Q}, \\ p, & \text{if } E = \mathbb{F}_q, \ q = p^r. \end{cases}$$

**Definition 6.1** (i) Suppose *E* is a field of characteristic χ(*E*) ≠ 2, 3, and *f*(*x*) = *x*<sup>3</sup> + *ax* + *b* ∈ *E*[*x*] is a cubic polynomial with no multiple roots in its splitting field. An elliptic curve over the field *E* is the set of finite points (*x*, *y*) ∈ *E*<sup>2</sup> on the "plane," together with a point at infinity, where the finite points (*x*, *y*) satisfy

$$y^2 = x^3 + ax + b, \ \text{where } a, b \in E \text{ are given.}$$

*CE* represents the elliptic curve, and "O" represents the infinity point, i.e.,

$$C\_E = \{ (\mathbf{x}, \mathbf{y}) \in E^2 | \mathbf{y}^2 = \mathbf{x}^3 + a\mathbf{x} + b \} \cup \{ O \}. \tag{6.1}$$

(ii) If χ (*E*) = 2, then an elliptic curve *CE* on the field *E* with the characteristic of 2 is defined as

$$\mathcal{C}\_E = \{ (\mathbf{x}, \mathbf{y}) \in E^2 | \mathbf{y}^2 + \mathbf{y} = \mathbf{x}^3 + a\mathbf{x} + b \} \cup \{ O \}. \tag{6.2}$$

(iii) If χ (*E*) = 3, *x* <sup>3</sup> + *ax* <sup>2</sup> + *bx* + *c* ∈ *E*[*x*] has no multiple roots in the split field, then an elliptic curve *CE* on *E* is defined as

$$C\_E = \{ (\mathbf{x}, \mathbf{y}) \in E^2 | \mathbf{y}^2 = \mathbf{x}^3 + a\mathbf{x}^2 + b\mathbf{x} + c \} \cup \{ O \}. \tag{6.3}$$

Let *F*(*x*, *y*) ∈ *E*[*x*, *y*] be a bivariate polynomial; then *F*(*x*, *y*) = 0 defines an algebraic curve *C* over *E*. A point (*x*<sub>0</sub>, *y*<sub>0</sub>) ∈ *C* is called a nonsingular point of *C* if at least one of the partial derivatives ∂*F*/∂*x* and ∂*F*/∂*y* at (*x*<sub>0</sub>, *y*<sub>0</sub>) is not 0. If χ(*E*) ≠ 2, 3, let *f*(*x*) = *x*<sup>3</sup> + *ax* + *b*; then all finite points of the elliptic curve *F*(*x*, *y*) = *y*<sup>2</sup> − *f*(*x*) = 0 over *E* are nonsingular, and the same holds in the cases χ(*E*) = 2 and χ(*E*) = 3. Therefore, an elliptic curve is also called a nonsingular cubic curve.

Among the many profound arithmetic properties of elliptic curves, the Mordell group of an elliptic curve is the most beautiful and important basic structure. Firstly, we introduce the Mordell group in the familiar case where *E* = ℝ is the real number field, and then extend it to finite fields.

Elliptic curve over real number field

**Definition 6.2** Let *E* = ℝ be the real number field, *CE* an elliptic curve, and *P* and *Q* two points on *CE*, that is, *P* ∈ *CE*, *Q* ∈ *CE*. We define addition according to the following rules:

(1) If *P* = *O* is the point at infinity, define *O* + *O* = *O*; that is, infinity is the unit element of the addition, and the negative element of *O* is −*O* = *O*.

(2) If *P* = (*x*, *y*) ∈ *CE* is a finite point, define −*P* = (*x*, −*y*); obviously −*P* ∈ *CE*, and it is the mirror reflection of the point *P* in the *x*-axis of the *xy*-plane.

(3) If *P*, *Q* ∈ *CE* are two finite points with different *x*-coordinates (i.e., *P* = (*x*<sub>1</sub>, *y*<sub>1</sub>), *Q* = (*x*<sub>2</sub>, *y*<sub>2</sub>), *x*<sub>1</sub> ≠ *x*<sub>2</sub>), then the line through *P* and *Q* in the *xy*-plane has exactly one further intersection point *R* with the elliptic curve; define *P* + *Q* = −*R*, the mirror reflection of *R*. If *Q* is the point at infinity, define *P* + *O* = *P*.

(4) If *Q* = −*P*, that is, *P* and *Q* have the same *x*-coordinate, then *P* + *Q* = *O* is defined to be the point at infinity.

(5) If *P* = *Q* is a finite point on *CE*, then the tangent of *CE* at *P* has exactly one further intersection *R* with *CE*; define *P* + *P* = −*R*.

We have used a geometric construction to define the addition on the elliptic curve *CE*; that the line through two finite points with different *x*-coordinates, and the tangent at a finite point, each have exactly one further intersection with *CE* requires strict mathematical proof. We formulate this as the following lemma.

**Lemma 6.1** *Let P* = (*x*<sub>1</sub>, *y*<sub>1</sub>), *Q* = (*x*<sub>2</sub>, *y*<sub>2</sub>) *be two finite points on the elliptic curve CE with x*<sub>1</sub> ≠ *x*<sub>2</sub>*, then*

*(i) The line through P and Q has exactly one further intersection R* = (*x*<sub>3</sub>, *y*<sub>3</sub>) *with CE, satisfying R* ≠ *P, R* ≠ *Q, where*

$$\begin{cases} \mathbf{x}\_{3} = (\frac{\mathbf{y}\_{2} - \mathbf{y}\_{1}}{x\_{2} - x\_{1}})^{2} - \mathbf{x}\_{1} - \mathbf{x}\_{2}, \\ \mathbf{y}\_{3} = -\mathbf{y}\_{1} + (\frac{\mathbf{y}\_{2} - \mathbf{y}\_{1}}{x\_{2} - x\_{1}})(\mathbf{x}\_{1} - \mathbf{x}\_{3}). \end{cases} \tag{6.4}$$

*(ii) Let* α *be the value of the derivative dy*/*dx at the point P; then the tangent at P has exactly one further intersection R* = (*x*<sub>3</sub>, *y*<sub>3</sub>) *with CE, R* ≠ *P, where*

$$\begin{cases} x\_3 = (\frac{3x\_1^2 + a}{2y\_1})^2 - 2x\_1, \\ y\_3 = -y\_1 + (\frac{3x\_1^2 + a}{2y\_1})(x\_1 - x\_3). \end{cases} \tag{6.5}$$

*Proof* Let the functional equation of the connecting line between *P* and *Q* be *y* = α*x* + β on the *x y*−plane, where

$$\alpha = \frac{y\_2 - y\_1}{x\_2 - x\_1}, \; \beta = y\_1 - \alpha x\_1.$$

A point (*x*, α*x* + β) on line *y* = α*x* + β is on elliptic curve *CE* if and only if

$$\left(\alpha\mathbf{x} + \boldsymbol{\beta}\right)^{2} = \mathbf{x}^{3} + a\mathbf{x} + b.\tag{6.6}$$

Therefore, the *x*-coordinates of the intersection points are the three roots of *x*<sup>3</sup> − (α*x* + β)<sup>2</sup> + *ax* + *b* = 0, and each root produces an intersection. But *P* and *Q* are already intersection points, so there is only one third intersection *R* = (*x*<sub>3</sub>, α*x*<sub>3</sub> + β). Because the three roots *x*<sub>1</sub>, *x*<sub>2</sub>, *x*<sub>3</sub> of equation (6.6) satisfy the relation

$$x\_1 + x\_2 + x\_3 = \alpha^2.$$

There is

$$\begin{cases} x\_3 = (\frac{\mathbf{y}\_2 - \mathbf{y}\_1}{x\_2 - x\_1})^2 - x\_1 - x\_2, \\ y\_3 = \alpha x\_3 + \beta = -\mathbf{y}\_1 + (\frac{\mathbf{y}\_2 - \mathbf{y}\_1}{x\_2 - x\_1})(x\_1 - x\_3). \end{cases}$$

Thus, (6.4) holds. If the point *Q* approaches the point *P*, the line becomes the tangent of the curve *CE* at the point *P*, and now

$$\alpha = \frac{d\mathbf{y}}{dx}|\_{(\mathbf{x}\_1, \mathbf{y}\_1)} = \frac{\mathbf{3}\mathbf{x}\_1^2 + a}{2\mathbf{y}\_1}.$$

So the tangent has exactly one further intersection *R* with *CE*, *R* ≠ *P*, *R* = (*x*<sub>3</sub>, α*x*<sub>3</sub> + β), where

$$\begin{cases} x_3 = \alpha^2 - 2x_1 = (\frac{3x_1^2 + a}{2y_1})^2 - 2x_1, \\ y_3 = -y_1 + (\frac{3x_1^2 + a}{2y_1})(x_1 - x_3). \end{cases}$$

Thus (6.5) holds, which completes the proof of the Lemma (Fig. 6.1).
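Formulas (6.4) and (6.5) translate directly into code; a Python sketch over ℚ using exact `Fraction` arithmetic (representing the infinity point *O* by `None` is our convention, not the book's):

```python
from fractions import Fraction

def ec_add(P, Q, a):
    """Addition on y^2 = x^3 + a*x + b by (6.4)/(6.5); points are pairs of
    Fractions, and None stands for the infinity point O."""
    if P is None: return Q
    if Q is None: return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and y1 == -y2:
        return None                          # P + (-P) = O
    if P == Q:
        if y1 == 0:
            return None                      # vertical tangent (Remark 6.1)
        s = (3 * x1 * x1 + a) / (2 * y1)     # tangent slope, Eq. (6.5)
    else:
        s = (y2 - y1) / (x2 - x1)            # chord slope, Eq. (6.4)
    x3 = s * s - x1 - x2
    y3 = -y1 + s * (x1 - x3)                 # already the mirrored point -R
    return (x3, y3)

# On y^2 = x^3 - x (Example 6.1): (0,0) + (1,0) = (-1,0), and 2*(0,0) = O.
a = Fraction(-1)
print(ec_add((Fraction(0), Fraction(0)), (Fraction(1), Fraction(0)), a))
print(ec_add((Fraction(0), Fraction(0)), (Fraction(0), Fraction(0)), a))
```

The two printed results, (−1, 0) and *O*, illustrate Corollary 6.1 (ii): sums of rational points are rational, since the formulas involve only field operations.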

*Example 6.1* On the real plane, we give a specific example *y*<sup>2</sup> = *x* <sup>3</sup> − *x* to illustrate the addition rule on this elliptic curve:

The points of *CE* in the left half plane are called the torsion points of *CE*, and the points of *CE* in the right half plane are called the free points of *CE*.

*Remark 6.1* In Lemma 6.1, if *P* = (*x*<sub>1</sub>, 0), that is *y*<sub>1</sub> = 0, then the only intersection of the tangent at *P* with *CE* is defined to be the infinity point "*O*".

From Definition 6.1 and Lemma 6.1, we have the following important corollaries.

**Corollary 6.1** *(i) All points of elliptic curve CE form an Abel group under addition, in which the infinity point "O" is the zero element of the group. This group is called Mordell group.*

*(ii) If P* = (*x*<sub>1</sub>, *y*<sub>1</sub>), *Q* = (*x*<sub>2</sub>, *y*<sub>2</sub>) *are rational points, that is, x*<sub>1</sub>, *y*<sub>1</sub>, *x*<sub>2</sub>, *y*<sub>2</sub> *are rational numbers, then the unique third intersection R of the line through P and Q with CE is also a rational point.*

*Proof* (i) follows directly from Definition 6.1. Conclusion (ii) follows directly from Formulas (6.4) and (6.5) of Lemma 6.1.

Elliptic curves over rational fields

Let *E* = ℚ; then *a*, *b*, *c* in Definition 6.1 are rational numbers. Elliptic curves over the rational number field are one of the most important research topics in modern number theory. There are many important conclusions and famous number-theoretic problems related to them, such as the famous "BSD" conjecture and the ancient congruent number problem. The Mordell theorem is the most basic conclusion about elliptic curves over the rationals. Since cryptography only cares about elliptic curves over finite fields, here we briefly introduce some important results without proof.

Let *CE* be an elliptic curve over the field of rational numbers. By Corollary 6.1, all points of *CE* form an Abel group *G*. In algebra, an Abel group is equivalent to a module over the ring of integers, so an Abel group is also called a ℤ-module. Regarding the Mordell group of the elliptic curve *CE* as a ℤ-module *G*, by the decomposition theorem for modules over a principal ideal ring, a ℤ-module can be decomposed into the direct sum of a torsion module and a free module. Therefore, the Mordell group *G* of *CE* satisfies

$$G = \operatorname{Tor}(G) \oplus \operatorname{Free}(G).$$

Mordell first proved the following important conclusion. Mordell Theorem: The Abel group *G* of an elliptic curve *CE* (*E* = ℚ the rational number field) is finitely generated; in other words, *G* is a finitely generated ℤ-module. Therefore, the Mordell group *G* can be decomposed as

$$G = \operatorname{Tor}(G) \oplus \mathbb{Z}(\alpha\_1, \alpha\_2, \dots, \alpha\_r).$$

where α<sub>1</sub>, α<sub>2</sub>,..., α<sub>*r*</sub> is a basis of the free module Free(*G*) and *r* is the rank of the free module. The rank *r* is only known to be finite; how to calculate it is a famous number-theoretic problem. The so-called BSD conjecture holds that *r* can be read off from the values of the *L*-function of the elliptic curve, but this has not been fully proved at present.

Another problem related to elliptic curves is the ancient congruent number problem, which can be traced back to Plato's time in ancient Greece.

The congruent number problem: for a positive integer *n* > 1, is there a right triangle with rational side lengths whose area is exactly *n*?

This problem is equivalent to the elliptic curve *y*<sup>2</sup> = *x*<sup>3</sup> − *n*<sup>2</sup>*x* having rank *r* > 0; at present, the problem has not been completely solved. The Chinese mathematician Prof. Ye Tian has made substantial progress on this problem.

Elliptic curves over finite fields

Let *E* = 𝔽<sub>*q*</sub> be a *q*-element finite field, *q* = *p*<sup>*r*</sup>, with *p* a prime number. Let

$$F(\mathbf{x}, \mathbf{y}) = \begin{cases} \mathbf{y}^2 - f(\mathbf{x}), & \text{if } p \neq 2, \\ \mathbf{y}^2 + \mathbf{y} - f(\mathbf{x}), & \text{if } p = 2. \end{cases} \tag{6.7}$$

where

$$f(\mathbf{x}) = \begin{cases} \mathbf{x}^3 + a\mathbf{x} + b, & \text{if } p \neq 2, \\ \mathbf{x}^3 + a\mathbf{x}^2 + b\mathbf{x} + c, & \text{if } p = 2. \end{cases} \\ a, b, c \in \mathbb{F}\_q. \tag{6.8}$$

Then an elliptic curve *CE* on F*<sup>q</sup>* is defined as

$$C\_E = \{ (\mathbf{x}, \mathbf{y}) \in \mathbb{F}\_q^2 | F(\mathbf{x}, \mathbf{y}) = \mathbf{0} \} \cup \{ O \}. \tag{6.9}$$

where "*O*" is the infinity point.

Obviously, the number of points of *CE* is finite; let *N<sub>q</sub>* = |*CE*| be called the number of points of the elliptic curve over 𝔽<sub>*q*</sub>. *N<sub>q</sub>* ≤ 2*q* + 1 is a trivial estimate, because each *x* ∈ 𝔽<sub>*q*</sub> yields at most two values of *y*, together with the infinity point. A more accurate estimate of *N<sub>q</sub>* depends on the Riemann hypothesis for function fields in one variable proved by A. Weil, a very profound result in mathematics. A. Hasse proved the following result in the case where *F*(*x*, *y*) = 0 defines an elliptic curve.

**Theorem 6.1** (Hasse Theorem) *Let N<sub>q</sub> be the number of points of the elliptic curve F*(*x*, *y*) = 0 *over* 𝔽<sub>*q*</sub>*, then we have*

$$|N\_q - (q+1)| \le 2\sqrt{q}.$$

*Proof* Let χ be the quadratic character of 𝔽<sub>*q*</sub>, that is

$$\chi(a) = \begin{cases} 0, & \text{if } a = 0, \\ 1, & \text{if } a = b^2, b \in \mathbb{F}\_q, \\ -1, & \text{otherwise.} \end{cases}$$

By definition, it is obvious that the number of solutions of *u*<sup>2</sup> = *a* in 𝔽<sub>*q*</sub> is 1 + χ(*a*). So, letting *N<sub>q</sub>* be the number of points of the elliptic curve *F*(*x*, *y*) = 0 over 𝔽<sub>*q*</sub>, where *F*(*x*, *y*) is given by Eq. (6.7), we have


$$\begin{split} N\_q &= 1 + \sum\_{\mathbf{x} \in \mathbb{F}\_q} (1 + \chi(\mathbf{x}^3 + a\mathbf{x} + b)) \\ &= q + 1 + \sum\_{\mathbf{x} \in \mathbb{F}\_q} \chi(\mathbf{x}^3 + a\mathbf{x} + b). \end{split} \tag{6.10}$$

We use 𝔽<sub>*q*</sub>(*x*) to denote the rational function field over 𝔽<sub>*q*</sub>; then the function field in one variable defined by *y*<sup>2</sup> = *f*(*x*) can be regarded as a quadratic extension of 𝔽<sub>*q*</sub>(*x*). Let *d* = deg *f* = 3. Hasse proved that the Riemann hypothesis holds for this special algebraic function field; that is, all zeros of the corresponding zeta function lie on the line *s* = 1/2 + *it*. A special case of this conclusion is

$$\left|\sum_{x \in \mathbb{F}_q} \chi(x^3 + ax + b)\right| \le (d - 1)\sqrt{q} = 2\sqrt{q}.\tag{6.11}$$

By (6.10),

$$|N\_q - (q+1)| \le 2\sqrt{q}.$$

We have completed the proof.
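The counting formula (6.10) gives a direct (exponential-time, small-*p*-only) way to compute *N<sub>p</sub>* and confirm the Hasse bound; a Python sketch, with the curve *y*² = *x*³ + *x* + 1 chosen arbitrarily for illustration:

```python
def count_points(a, b, p):
    """N_p for y^2 = x^3 + a*x + b over F_p (p an odd prime), computed as in
    (6.10): each x contributes 1 + chi(x^3 + a*x + b) points, plus O."""
    squares = {x * x % p for x in range(p)}   # the nonzero squares, and 0

    def chi(t):                               # the quadratic character of F_p
        t %= p
        return 0 if t == 0 else (1 if t in squares else -1)

    return 1 + sum(1 + chi(x**3 + a * x + b) for x in range(p))

print(count_points(1, 1, 5))   # → 9
for p in (5, 13, 101):
    N = count_points(1, 1, p)
    # Hasse: |N_p - (p + 1)| <= 2*sqrt(p), checked in integer arithmetic
    assert (N - (p + 1)) ** 2 <= 4 * p
```

For *p* = 5 the count is 9, comfortably inside the Hasse interval [6 − 2√5, 6 + 2√5].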

*Remark 6.2* The sum in (6.11) is called a character sum over a finite field. Let *g*(*x*) ∈ 𝔽<sub>*q*</sub>[*x*] be any polynomial and χ any nontrivial multiplicative character of 𝔽<sub>*q*</sub>; according to A. Weil's famous theorem, we have the following general character sum estimate,

$$|\sum\_{\mathbf{x}\in\mathbb{F}\_q} \chi(\mathbf{g}(\mathbf{x}))| \le (\deg \mathbf{g} - 1)\sqrt{q}.$$

Let us briefly introduce A. Weil's theorem. Let 𝔽<sub>*q*<sup>*n*</sup></sub> be the *n*-th degree extension of 𝔽<sub>*q*</sub>, that is, *n* = [𝔽<sub>*q*<sup>*n*</sup></sub> : 𝔽<sub>*q*</sub>], and let *N*<sub>*q*<sup>*n*</sup></sub> be the number of points of the elliptic curve *F*(*x*, *y*) = 0 in the extension field 𝔽<sub>*q*<sup>*n*</sup></sub>. The zeta function *Z*(*T*, *CE*) of the elliptic curve *CE* is defined as the formal power series in *T*:

$$Z(T) = Z(T, C\_E) = \exp(\sum\_{n=1}^{+\infty} \frac{1}{n} N\_{q^n} T^n). \tag{6.12}$$

where exp(*a*) = *e<sup>a</sup>* is the exponential function. A. Weil proved that *Z*(*T*) is a rational function, i.e.,

$$Z(T) = \frac{qT^2 - \alpha T + 1}{(1 - T)(1 - qT)}.\tag{6.13}$$

where α is an integer depending on the elliptic curve *CE*. In fact, the above formula is valid for general algebraic curves. Because *N<sub>q</sub>* = *q* + 1 − α and α<sup>2</sup> − 4*q* < 0 (Hasse theorem), the zeta function *Z*(*T*) has two complex roots, namely the two solutions α<sub>1</sub> and α<sub>2</sub> of *qT*<sup>2</sup> − α*T* + 1 = 0, and |1/α<sub>1</sub>| = |1/α<sub>2</sub>| = √*q*. This is the Riemann hypothesis for elliptic curves; A. Weil proved it for general algebraic curves for the first time. See Chap. 5 of Silverman (1986, reference 6) for the detailed proof.

From the above A. Weil theorem, taking logarithms on both sides of Eq. (6.12) and comparing the coefficients of the formal power series, and letting *N*<sub>*q*<sup>*n*</sup></sub> be the number of points of the elliptic curve in 𝔽<sub>*q*<sup>*n*</sup></sub>, we get

$$|N\_{q^n} - (q^n + 1)| \le 2q^{\frac{n}{2}} (n \ge 1). \tag{6.14}$$

The above formula can also be derived directly from Hasse theorem.

Now let us look at a specific elliptic curve over 𝔽<sub>2</sub>, *y*<sup>2</sup> + *y* = *x*<sup>3</sup>, to get a better understanding of A. Weil's theorem. Because *F*(*x*, *y*) = *y*<sup>2</sup> + *y* − *x*<sup>3</sup> = 0 has three points over 𝔽<sub>2</sub>, the zeta function of the elliptic curve is

$$\begin{split} Z(T) &= \exp(\sum\_{n=1}^{+\infty} \frac{N\_n}{n} T^n) \\ &= \frac{2T^2 + 1}{(1 - T)(1 - 2T)}. \end{split}$$

Write 2*T*<sup>2</sup> + 1 = (1 − α<sub>1</sub>*T*)(1 − α<sub>2</sub>*T*), where α<sub>1</sub> = *i*√2, α<sub>2</sub> = −*i*√2. Taking logarithms on both sides of the above formula and comparing the coefficients of *T*<sup>*n*</sup> on both sides gives

$$N\_n = \begin{cases} 2^n + 1, & \text{if } n \text{ is odd,} \\ 2^n + 1 - 2(-2)^{\frac{n}{2}}, & \text{if } n \text{ is even.} \end{cases}$$

where *N<sub>n</sub>* denotes the number of points of the elliptic curve *y*<sup>2</sup> + *y* = *x*<sup>3</sup> in 𝔽<sub>2<sup>*n*</sup></sub>.
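This closed form is easy to confirm for *n* = 1, 2 by brute-force counting; a Python sketch realizing 𝔽₄ as 𝔽₂[*x*]/(*x*² + *x* + 1), with elements encoded as 2-bit integers (the encoding is our choice, not the book's):

```python
def gf4_mul(u, v):
    """Multiply in F_4 = F_2[x]/(x^2 + x + 1); elements are integers 0..3
    whose bits are the polynomial coefficients."""
    r = 0
    for i in range(2):             # carry-less (XOR) multiplication
        if (v >> i) & 1:
            r ^= u << i
    if (r >> 2) & 1:               # reduce the degree-2 term by x^2 = x + 1
        r ^= 0b111
    return r

def count(mul, elems):
    """Points of y^2 + y = x^3 over the given field, plus infinity.
    In characteristic 2, field addition is XOR."""
    pts = sum(1 for x in elems for y in elems
              if mul(y, y) ^ y == mul(mul(x, x), x))
    return pts + 1

N1 = count(lambda u, v: u & v, [0, 1])   # F_2: multiplication is AND
N2 = count(gf4_mul, [0, 1, 2, 3])        # F_4
print(N1, N2)                            # → 3 9
assert N1 == 2**1 + 1                    # n odd:  2^n + 1
assert N2 == 2**2 + 1 - 2 * (-2)**1      # n even: 2^n + 1 - 2(-2)^(n/2)
```

Both counts match the closed form: *N*₁ = 3 and *N*₂ = 2² + 1 + 4 = 9.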

Finally, the Mordell group of an elliptic curve over F_q is a finite Abel group of order N_q ; by the classification theorem of finite Abel groups, it can be expressed as the direct sum of two cyclic groups, which will be explained further when necessary.

# **6.2 Elliptic Curve Public Key Cryptosystem**

An elliptic curve over a finite field F_q forms a finite Abel group *G*, similar to F_q^* ; therefore an elliptic curve public key cryptosystem can be constructed using discrete logarithms. Compared with other public key cryptosystems based on discrete logarithms (such as the ElGamal cryptosystem), elliptic curve cryptosystems have greater flexibility: once a huge *q* is selected, the ElGamal cryptosystem has only the single multiplicative group F_q^* as its working platform, while many elliptic curves can be defined over F_q , so there are many Mordell groups to choose from, giving elliptic curve cryptosystems greater concealment and security.

Before introducing elliptic curve cryptography, we first discuss the computational complexity on the two kinds of groups. The computational complexity of multiplication over a finite field F_q was discussed in Lemma 4.12 of Chap. 4: for α ∈ F_q and an integer *k*, Time(α^k) = O(log k log³ q). In the case of elliptic curves, the Mordell group *G* is written additively; for a point *P* ∈ *G*, *kP* means *P* added to itself *k* times.

**Lemma 6.2** *Let* F_q *be a q-element finite field, C_E an elliptic curve on* F_q *defined by the Weierstrass equations (6.7), (6.8) and (6.9), G the Mordell group of C_E , and P* ∈ *G. Then for any integer k,*

$$\begin{cases} Time(kP) = O(\log k \log^3 q), & \text{if } k \le N\_q; \\ Time(kP) = O(\log^4 q), & \text{if } k > N\_q. \end{cases}$$

*where Nq is the number of points of curve CE and the order of Mordell group G.*

*Proof* Let *P* = (*x*, *y*), *y* ≠ 0, then *P* + *P* = (*x*′, *y*′), where *x*′ and *y*′ are determined by Equation (6.5); the operations (addition, subtraction, multiplication, division, etc.) involved in the formula do not exceed 20, and each has bit complexity O(log³ q). By the "repeated square method" (here repeated doubling), *kP* can be computed in log *k* such steps, thus

$$Time(kP) = O(\log k \log^3 q).$$

If *y* = 0, then *P* + *P* = *O* by definition, and by the repeated doubling method Time(*kP*) = O(log *k*).

If *k* > N_q , because N_q · *P* = *O*, write *k* = *s* · N_q + *r*, 1 ≤ *r* ≤ N_q , thus *kP* = *rP*, and we need only calculate *rP*. Thus

$$Time(kP) = O(\log N\_q + \log N\_q \log^3 q) = O(\log^4 q).$$

We use Hasse's theorem: N_q = q + 1 + O(√q), so N_q = O(q), thus

$$\log N_q = O(\log q).$$

Lemma 6.2 holds.
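The repeated doubling used in the proof can be sketched in code. The addition follows the standard chord-and-tangent formulas referred to as (6.4) and (6.5); the curve y² = x³ + 2x + 3 over F_97 and the point P = (3, 6) are illustrative choices, not from the text, and None stands for the point at infinity O.

```python
def ec_add(P, Q, a, p):
    """Chord-and-tangent addition on y^2 = x^3 + ax + b over F_p."""
    if P is None: return Q
    if Q is None: return P
    x1, y1 = P; x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                                   # P + (-P) = O
    if P == Q:
        lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p  # tangent slope
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p         # chord slope
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def ec_mul(k, P, a, p):
    """kP by repeated doubling: O(log k) group operations."""
    R = None
    while k:
        if k & 1:
            R = ec_add(R, P, a, p)
        P = ec_add(P, P, a, p)
        k >>= 1
    return R

p, a = 97, 2
P = (3, 6)                      # lies on y^2 = x^3 + 2x + 3 over F_97
assert ec_mul(2, P, a, p) == (80, 10)   # doubling (3, 6) gives (80, 10)
```

For k > N_q one would first reduce k mod N_q , exactly as in the proof of Lemma 6.2.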

Secondly, we consider how to map a plaintext unit *m* to a point on a given elliptic curve C_E , which is a necessary premise for encryption with elliptic curves. Unfortunately, there is no known deterministic polynomial-time algorithm that maps an arbitrary huge integer *m* to a point on an arbitrary elliptic curve. Instead, we can only choose a probabilistic algorithm with sufficiently low error probability to realize the correspondence from number to point. A probabilistic algorithm does not guarantee a 100% success rate (each run may fail), but its success probability should be large enough, that is,

*P*{number-to-point correspondence succeeds} > 1 − ε, for ε > 0 sufficiently small.

Next, we introduce a probabilistic algorithm to realize the correspondence from number to point, which makes theoretical preparation for the construction of elliptic curve cryptosystem.

Probabilistic algorithm

Treat each plaintext unit as an integer *m*, 0 ≤ *m* < *M*, and let *k* be an integer. Select a finite field F_q , *q* = p^r, satisfying *q* > *kM*. We write the positive integers *n* from 1 to *kM* as follows,

$$1 \le n \le kM, n = mk + j, 0 \le m < M, 1 \le j \le k. \tag{6.15}$$

**Lemma 6.3** *There is a 1-1 correspondence* τ *between the set of integers A* = {1, 2, ..., *kM*} *and a subset of the finite field* F_q (*q* > *kM*)*.*

*Proof* Because *<sup>q</sup>* <sup>=</sup> *<sup>p</sup><sup>r</sup>*, let *<sup>g</sup>*(*x*) <sup>∈</sup> <sup>F</sup>*<sup>q</sup>* [*x*] be a monic irreducible polynomial, and deg *<sup>g</sup>*(*x*) <sup>=</sup> *<sup>r</sup>* <sup>−</sup> 1, from the finite extension theory of fields, <sup>F</sup>*<sup>q</sup>* is isomorphic to a quotient ring of polynomial ring <sup>F</sup>*p*[*x*] over subfield <sup>F</sup>*p*, that is

$$\mathbb{F}_q \cong \mathbb{F}_p[x]/\langle g(x)\rangle = \{a_0 + a_1x + \dots + a_{r-1}x^{r-1} \mid a_i \in \mathbb{F}_p\}.$$

Each element α ∈ F_q uniquely corresponds to a polynomial a_0 + a_1 x + ··· + a_{r−1} x^{r−1}; we write

$$\alpha = (a_{r-1}a_{r-2}\cdots a_1a_0)_p,$$

called the *p*-ary representation of α.

For every *m*, 0 ≤ *m* < *M*, and each *j*, 1 ≤ *j* ≤ *k*, the integer *n* = *mk* + *j* is uniquely determined; express *n* as a *p*-ary number (a_{r−1}a_{r−2} ··· a_1 a_0)_p and let τ(*n*) = α ∈ F_q . By the uniqueness of the *p*-ary representation, τ is an injection. Let

$$A' = \{\tau(n)|1 \le n \le kM\} \subset \mathbb{F}\_q.$$

Therefore, we establish a 1-1 correspondence τ of *A* → *A*′. The Lemma holds.
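The proof is constructive: τ just reads off base-*p* digits. A small sketch with illustrative parameters (p = 5, r = 4, kM = 300 < p^r = 625):

```python
def tau(n, p, r):
    """Write n in base p; the digit vector names an element of F_{p^r}."""
    digits = []
    for _ in range(r):
        digits.append(n % p)
        n //= p
    return tuple(digits)        # (a_0, a_1, ..., a_{r-1})

p, r, kM = 5, 4, 300
images = {tau(n, p, r) for n in range(1, kM + 1)}
assert len(images) == kM        # distinct n give distinct images: injective
```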

Next, for each *m* (0 ≤ *m* < *M*), we establish a 1-1 correspondence σ between *m* and a point on the elliptic curve C_E . Choose an arbitrary 1 ≤ *j* ≤ *k*; then *n* = *mk* + *j* corresponds to an element of F_q , namely τ(*n*) = x_j ∈ F_q . For each x_j , consider the solutions of the following equation.

$$\mathbf{y}^2 = f(\mathbf{x}\_j) = \mathbf{x}\_j^3 + a\mathbf{x}\_j + b.\tag{6.16}$$

If the above equation has a solution, let y_1 be one of the solutions; then P_m = (x_j , y_1) ∈ C_E , and we let σ(*m*) = P_m . The inverse mapping σ^{−1}(P_m) of σ is

$$\sigma^{-1}(P_m) = \left\lfloor \frac{\tau^{-1}(x_j) - 1}{k} \right\rfloor. \tag{6.17}$$

where τ is the 1-1 correspondence in Lemma 6.3. Because τ^{−1}(x_j) = *mk* + *j*, we have

$$\left\lfloor \frac{\tau^{-1}(x_j) - 1}{k} \right\rfloor = \left\lfloor m + \frac{j-1}{k} \right\rfloor = m.$$

So σ^{−1} is exactly the inverse mapping of σ. Through σ, a 1-1 correspondence between each *m* and a point on the elliptic curve is established; σ is called a probabilistic algorithm.

**Lemma 6.4** *The probabilistic algorithm* σ *succeeds with probability* ≥ 1 − 1/2^k*, that is,*

*P*{σ establishes the 1-1 correspondence between the number *m* and a point} ≥ 1 − 1/2^k.

*Proof* Given *m*, 0 ≤ *m* < *M*, let *n* = *mk* + *j*, where *k* is any given positive integer and 1 ≤ *j* ≤ *k*. By Lemma 6.3, τ(*n*) = x_j ∈ F_q , and the probability that *f*(x_j) = x_j³ + a x_j + *b* is a square is 1/2; in other words, the probability that Equation (6.16) has a solution in F_q is 1/2, and the probability of no solution is also 1/2. We select *j*, 1 ≤ *j* ≤ *k*, randomly and independently; the error probability for each *j* (Eq. (6.16) has no solution) is 1/2, so the error probability over all *k* values of *j* is 1/2^k. Once Equation (6.16) has a solution, then P_m = (x_j , *y*) ∈ C_E , and we can establish the 1-1 correspondence σ between *m* and points on C_E , σ(*m*) = P_m . Thus

$$P\{\sigma \text{ succeeds}\} \ge 1 - \frac{1}{2^k}.$$

We complete the proof of lemma.

*Remark 6.3* The probability that *f*(x_j) = x_j³ + a x_j + *b* is a square, that is, that Equation (6.16) has a solution, is exactly N_q /2q, where N_q is the number of points of C_E . By Hasse's theorem, N_q /2q is very close to 1/2.
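For primes p ≡ 3 (mod 4), square roots of quadratic residues are t^{(p+1)/4} mod p, so the whole probabilistic embedding can be sketched compactly. Here τ is taken to be the identity on integers (a prime field, r = 1); the prime 10007, the curve y² = x³ + 2x + 3, and k = 30 are all illustrative choices.

```python
def encode_point(m, a, b, p, k=30):
    """Try j = 1..k with x = m*k + j until f(x) is a square mod p.
    Assumes p ≡ 3 (mod 4); fails with probability about 2**(-k)."""
    for j in range(1, k + 1):
        x = m * k + j
        t = (x**3 + a * x + b) % p
        y = pow(t, (p + 1) // 4, p)       # candidate square root
        if y * y % p == t:
            return (x, y)                  # the point P_m
    return None

def decode_point(P, k=30):
    # sigma^{-1}: floor((x - 1)/k) recovers m, as in (6.17)
    return (P[0] - 1) // k

P = encode_point(17, 2, 3, 10007)
assert P is not None and decode_point(P) == 17
```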

**Definition 6.3** Let C_E be an elliptic curve over a finite field F_q and *B* ∈ C_E a point. For any point *P* on C_E , if there is an integer *x* such that *xB* = *P*, then *x* is called the discrete logarithm of *P* to the base *B*.

With the above preparation, we can establish elliptic curve public key cryptosystem.

Diffie–Hellman key conversion principle

Symmetric cryptosystem, also known as classical or traditional cryptosystem, was the mainstream cryptosystem before the advent of public key cryptosystems. It is highly efficient because encryption and decryption share the same algorithm (such as DES, the Data Encryption Standard launched by the U.S. National Bureau of Standards in 1977). When Diffie and Hellman proposed the asymmetric cryptosystem, they pointed out that symmetric and asymmetric cryptosystems are not completely separate: the two are interrelated and can even be used together. The Diffie–Hellman key conversion principle is based on the following mathematical facts.

**Lemma 6.5** *Let p be a prime number, q* = p^r, F_q *a q-element finite field,* Z_{p^r} *the residue class ring* mod p^r, *and* F_p^r *the r-dimensional vector space over* F_p*. Then* F_q , Z_{p^r} , F_p^r *are in 1-1 correspondence with each other.*

*Proof* F_q is an *r*-th finite extension of F_p , so the additive group F_q^+ of F_q is isomorphic to F_p^r, that is, F_q^+ ≅ F_p^r; therefore there is a 1-1 correspondence between F_q and F_p^r. For each *a* = (a_0, a_1, ..., a_{r−1}) ∈ F_p^r, define

$$\sigma(a) = a_0 + a_1 p + \dots + a_{r-1} p^{r-1} \in \mathbb{Z}_{p^r}.$$

Then σ is both a surjection and an injection of F_p^r → Z_{p^r} , so σ is a 1-1 correspondence of F_p^r → Z_{p^r} . Since there is a 1-1 correspondence between F_q and F_p^r, and a 1-1 correspondence between F_p^r and Z_{p^r} , there is also a 1-1 correspondence between F_q and Z_{p^r} . The Lemma holds.

From the above lemma, we have the following conclusions.

**Lemma 6.6** *Let N be a positive integer and* Z_N *the residue class ring* mod *N. Then for any prime p, there is a finite field* F_{p^r} *such that there is an injection* σ *of* Z_N → F_{p^r}*; this injection is also called an embedding.*

*Proof* Given *N*, for any prime *p*, express *N* as a *p*-ary number; then there exists a positive integer *r* ≥ 1 such that p^{r−1} ≤ *N* < p^r. We write

$$\mathbb{Z}_N = \{0, 1, 2, \dots, N-1\} \subset \{0, 1, 2, \dots, N-1, N, \dots, p^r - 1\} = \mathbb{Z}_{p^r}.$$

That is, Z_N is regarded as a subset of Z_{p^r} . Let σ: Z_{p^r} → F_{p^r} be a 1-1 correspondence; then σ restricted to Z_N gives an injection Z_N → F_{p^r} . The Lemma holds.

From the above conclusions, we can establish Diffie–Hellman's key conversion principle. Because symmetric cryptographic keys correspond to numbers in Z_N , each number in Z_N can be embedded into a finite field F_q by Lemma 6.6. Therefore, the discrete logarithm on F_q can encrypt each embedded number asymmetrically, so the two cryptosystems can be combined with each other.

Taking the affine cryptosystem introduced in Chap. 4 as an example, *A* is a *k* × *k* invertible square matrix over Z_N , *b* = (b_1, b_2, ..., b_k) ∈ Z_N^k is a given vector, and the affine transformation *f* = (*A*, *b*) gives the encryption algorithm for each plaintext unit *m* = m_1 m_2 ··· m_k ∈ Z_N^k.

$$f(m) = c = A \begin{pmatrix} m\_1 \\ \vdots \\ m\_k \end{pmatrix} + \begin{pmatrix} b\_1 \\ \vdots \\ b\_k \end{pmatrix}.$$

Let *A* = (a_{ij})_{k×k} , each a_{ij} ∈ Z_N . By Lemma 6.6, we can embed a_{ij} into a finite field F_q ; a_{ij} is then encrypted again using the discrete logarithm algorithm on F_q , so that the two cryptosystems can be effectively combined.

In the case of elliptic curves, we introduce the workflow of Diffie–Hellman elliptic curve cryptography. First, the users select a public finite field F_q and an elliptic curve C_E over F_q , and randomly select a point *P* ∈ C_E ; let *P* = (*x*, *y*), then *x* ∈ F_q . By Lemma 6.5, *x* corresponds to an *r*-dimensional vector (a_0, a_1, ..., a_{r−1}) in the space F_p^r (where *q* = p^r); consider (a_0, a_1, ..., a_{r−1}) as a *p*-ary number, that is,

$$(a\_0, a\_1, \dots, a\_{r-1}) \to a\_0 + a\_1 p + \dots + a\_{r-1} p^{r-1}.$$

Then (*a*0, *a*1,..., *ar*−<sup>1</sup>) can be used as the key of other cryptosystems, especially symmetric cryptosystems.

Secondly, the users select a common point *B* ∈ C_E , as in the finite field case, as the base of the discrete logarithm on the Mordell group. The difference from the finite field case is that the Mordell group of an elliptic curve need not be a cyclic group, so the point *B* is not a generator of the Mordell group. However, we require the order *o*(*B*) of *B* to be as large as possible (*o*(*B*) | N_q ). When the point *B* is selected, the working platform of elliptic curve cryptography is actually the subgroup ⟨*B*⟩ generated by *B*.

To generate keys, each user randomly selects an integer *a*, whose size is roughly the same as N_q , as their own private key; *a* should be kept strictly confidential. Calculate *aB* = *A* ∈ C_E ; the point *A* is that user's public key. Now each user has a public key (*A*, *B*) and a private key (*a*, *B*).
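The key generation above leads directly to Diffie–Hellman key agreement on ⟨B⟩: both sides can form the common point a_1(a_2 B) = a_2(a_1 B). A minimal self-contained sketch; the curve y² = x³ + 2x + 3 over F_97, the base point (3, 6), and the private keys are illustrative choices, and None denotes the point at infinity O.

```python
def ec_add(P, Q, a, p):
    if P is None: return Q
    if Q is None: return P
    x1, y1 = P; x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                                   # P + (-P) = O
    if P == Q:
        lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def ec_mul(k, P, a, p):
    R = None
    while k:
        if k & 1:
            R = ec_add(R, P, a, p)
        P = ec_add(P, P, a, p)
        k >>= 1
    return R

p, a = 97, 2
B = (3, 6)                        # public base point
a1, a2 = 31, 57                   # the two users' private keys
A1 = ec_mul(a1, B, a, p)          # public keys A = aB
A2 = ec_mul(a2, B, a, p)
shared1 = ec_mul(a1, A2, a, p)    # user 1's view of the shared point
shared2 = ec_mul(a2, A1, a, p)    # user 2's view
assert shared1 == shared2         # both equal (a1*a2)B
```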

Massey–Omura elliptic curve cryptography.

In order to encrypt and send a plaintext unit *m* (0 ≤ *m* < *M*), *m* is mapped to a unique point P_m ∈ C_E by the probabilistic algorithm introduced earlier. Let *N* = N_q = |C_E |; that is, the order of the Mordell group is known. Each user randomly selects an integer *e* satisfying

$$1 \le e \le N, \text{and } (e, N) = 1.$$

*d* = e^{−1} mod *N* is calculated by the Euclidean division algorithm, that is,

$$de \equiv 1 \pmod{N}, \text{ and } 1 \le d \le N.$$

Suppose user *A* wants to encrypt and send the plaintext message P_m to user *B*, and let (e_A, d_A) and (e_B, d_B) be the respective private keys of *A* and *B*. First, *A* sends the message e_A P_m to *B*; then *B* returns the message e_B e_A P_m to *A*, and *A* processes it with the private key d_A . Because N P_m = *O* and d_A e_A ≡ 1 (mod *N*), we have

$$d\_A e\_B e\_A P\_m = e\_B P\_m.$$

Finally, user *A* sends the calculation result *eB Pm* to *B*, and user *B* can read the original real message *Pm* of user *A* by using the private key *dB*, because *dBeB* ≡ 1(mod *N*), so

$$d_B e_B P_m = P_m.$$

It should be noted that when user *B* receives the message e_A P_m sent by *A* for the first time, e_A P_m is given to *B* only as a point *Q* = e_A P_m on the elliptic curve; without computing a discrete logarithm, *B* cannot learn e_A or d_A . Even though user *B* eventually knows the plaintext P_m , computing the discrete logarithm of *Q* to the base P_m is very hard. Similarly, when user *A* receives the reply from user *B* and computes e_B P_m , he cannot learn *B*'s private key (e_B, d_B).
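The three-pass flow can be exercised end to end on a small illustrative curve (y² = x³ + 2x + 3 over F_97, plaintext point (3, 6)); here the order N is found by brute-force counting, and the point arithmetic is restated so the sketch is self-contained.

```python
import math

def ec_add(P, Q, a, p):
    if P is None: return Q
    if Q is None: return P
    x1, y1 = P; x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None
    if P == Q:
        lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def ec_mul(k, P, a, p):
    R = None
    while k:
        if k & 1:
            R = ec_add(R, P, a, p)
        P = ec_add(P, P, a, p)
        k >>= 1
    return R

def group_order(a, b, p):
    """N = |C_E| by brute force: affine solutions plus O."""
    sq = {}
    for y in range(p):
        sq[y * y % p] = sq.get(y * y % p, 0) + 1
    return 1 + sum(sq.get((x**3 + a * x + b) % p, 0) for x in range(p))

p, a, b = 97, 2, 3
N = group_order(a, b, p)
Pm = (3, 6)                                   # the plaintext point

# choose e coprime to N; d = e^{-1} mod N
eA = next(e for e in range(2, N) if math.gcd(e, N) == 1)
eB = next(e for e in range(eA + 1, N) if math.gcd(e, N) == 1)
dA, dB = pow(eA, -1, N), pow(eB, -1, N)

c1 = ec_mul(eA, Pm, a, p)                     # pass 1: A -> B
c2 = ec_mul(eB, c1, a, p)                     # pass 2: B -> A
c3 = ec_mul(dA, c2, a, p)                     # pass 3: A -> B, equals eB * Pm
assert ec_mul(dB, c3, a, p) == Pm             # B recovers the plaintext
```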

ElGamal elliptic curve cryptography

The ElGamal cryptosystem is another elliptic curve cryptosystem, completely different from the Massey–Omura cryptosystem. In this system, the order *N* of the Mordell group of the elliptic curve need not be known. All users jointly select a fixed finite field F_q , an elliptic curve C_E over F_q , and a fixed point *B* ∈ C_E as the base of the discrete logarithm. Each user randomly selects an integer *a* (0 ≤ *a* < N_q ) as the private key, calculates *Q* = *aB* ∈ C_E and publishes it. The workflow is as follows:

If user *A* wants to encrypt and send a plaintext unit P_m to user *B*, the public key of *A* is Q_A = a_A · B with private key a_A , and the public key of *B* is Q_B = a_B · B with private key a_B . The encryption algorithm *f* of *A* → *B* is

$$f(m) = f(P\_m) = (kB, P\_m + k\mathcal{Q}\_B) = c.\tag{6.18}$$

The decryption algorithm is that user *B* multiplies the first point by the private key a_B and subtracts the result from the second point. That is,

$$f^{-1}(c) = P\_m + k \, \mathcal{Q}\_B - a\_B(kB). \tag{6.19}$$

Because *QB* = *aB* · *B*, there is

$$f^{-1}(c) = P_m + ka_B \cdot B - ka_B \cdot B = P_m.$$

where *k* is an integer randomly selected by user *A*. This integer *k* does not appear in the ciphertext *c* and is called a layer of "mask" added by user *A* to protect the plaintext P_m . In fact, the ciphertext *c* = (A_1, A_2) received by user *B* consists of two points on the elliptic curve C_E , where

$$A_1 = kB, \qquad A_2 = P_m + kQ_B = P_m + k(a_B \cdot B).$$

Even if a third user knows the private key a_B of user *B* (assuming *B*'s private key has leaked), decryption with A_2 − a_B · B cannot obtain the plaintext P_m , because

$$A_2 - a_B \cdot B = P_m + kQ_B - a_B \cdot B = P_m + k(a_B \cdot B) - a_B \cdot B \neq P_m,$$

if *k* ≠ 1.
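Formulas (6.18) and (6.19) can be exercised concretely. The curve y² = x³ + 2x + 3 over F_97, the base point B = (3, 6), the plaintext point (80, 10), and the keys below are all illustrative choices; the point arithmetic is restated so the sketch is self-contained.

```python
def ec_add(P, Q, a, p):
    if P is None: return Q
    if Q is None: return P
    x1, y1 = P; x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None
    if P == Q:
        lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def ec_mul(k, P, a, p):
    R = None
    while k:
        if k & 1:
            R = ec_add(R, P, a, p)
        P = ec_add(P, P, a, p)
        k >>= 1
    return R

def ec_neg(P, p):
    return None if P is None else (P[0], (-P[1]) % p)

p, a = 97, 2
B = (3, 6)                 # public base point
aB = 23                    # user B's private key
QB = ec_mul(aB, B, a, p)   # user B's public key Q_B = aB * B
Pm = (80, 10)              # plaintext point on the curve
k = 41                     # the random "mask" chosen by user A

# (6.18): c = (kB, P_m + k*Q_B)
c = (ec_mul(k, B, a, p), ec_add(Pm, ec_mul(k, QB, a, p), a, p))
# (6.19): subtract aB times the first component from the second
recovered = ec_add(c[1], ec_neg(ec_mul(aB, c[0], a, p), p), a, p)
assert recovered == Pm
```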

The two elliptic curve cryptosystems introduced above are based on the selected elliptic curve *CE* and a point *B* on *CE* as the basis of discrete logarithm. How to randomly select *CE* and *B* needs further research.

**Lemma 6.7** *Let x*³ + *ax* + *b* ∈ F_q[*x*] *be a cubic polynomial; then x*³ + *ax* + *b* = 0 *has no multiple roots in its splitting field if and only if the discriminant* 4a³ + 27b² ≠ 0*.*

*Proof* This conclusion can be deduced directly from the root formula of cubic algebraic equation.

In order to randomly select an elliptic curve over F_q when the characteristic of F_q is not 2 or 3, C_E is determined by an equation *y*² = *x*³ + *ax* + *b*. Randomly select three elements (x_0, y_0, *a*) in F_q and let

$$b = y_0^2 - (x_0^3 + ax_0).$$

Check whether *f*(*x*) = *x*³ + *ax* + *b* has multiple roots: by Lemma 6.7, just check whether the discriminant 4a³ + 27b² is 0. If *f*(*x*) has no multiple roots, select the elliptic curve *y*² = *x*³ + *ax* + *b*, on which (x_0, y_0) ∈ C_E is a point, and let *B* = (x_0, y_0) be the base of the discrete logarithm. Similarly, for *q* = 2^r or *q* = 3^r, we can also randomly draw an elliptic curve C_E and determine the base *B* ∈ C_E of the discrete logarithm at the same time.
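The selection procedure just described is a few lines of code; the field size p = 101 below is an illustrative choice.

```python
import random

def random_curve_with_point(p):
    """Draw (x0, y0, a) at random, set b so (x0, y0) lies on the curve,
    and retry while the discriminant 4a^3 + 27b^2 vanishes mod p."""
    while True:
        x0, y0, a = (random.randrange(p) for _ in range(3))
        b = (y0 * y0 - x0 ** 3 - a * x0) % p
        if (4 * a ** 3 + 27 * b * b) % p != 0:
            return a, b, (x0, y0)   # curve y^2 = x^3 + ax + b, base point B

a, b, (x0, y0) = random_curve_with_point(101)
assert (y0 * y0 - (x0 ** 3 + a * x0 + b)) % 101 == 0   # B is on the curve
```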

It should be noted that at present no practical algorithm easily computes the number of points N_q of an arbitrary elliptic curve. Some special algorithms, such as Schoof's algorithm, have polynomial computational complexity but are quite complex and lengthy in practical application.

Now we introduce the second method of selecting elliptic curves, called the mod *p* method. An elliptic curve C_E is called a global curve if *E* is a number field, such as *E* = Q, R, C. We use the mod *p* method to convert a global curve into a "local" curve. First, a global curve C_E over *E* = Q (the rational number field) and a point *B* ∈ C_E are selected, where *B* is an element of the Mordell group whose additive order is ∞.

$$\mathcal{C}\_E: \mathbf{y}^2 = \mathbf{x}^3 + a\mathbf{x} + b, \ a, b \in \mathbb{Q}.$$

Let *p* be a prime number coprime with the integers in the denominators of *a* and *b*; then we obtain an elliptic curve over F_p ,

$$C\_E \bmod p : y^2 \equiv x^3 + ax + b (\text{mod } p), a, b \in \mathbb{F}\_p.$$

and a point *B* mod *p* on C_E mod *p*. When localizing an elliptic curve, the choice of the prime *p* only needs to satisfy

$$p \nmid \text{the denominators of } a \text{ and } b, \quad \text{and} \quad 4a^3 + 27b^2 \not\equiv 0 \ (\text{mod } p).$$

In fact, we can ask further

$$N\_p = |C\_E \bmod p| = \text{prime.}\tag{6.20}$$

In this way, the Mordell group of C_E mod *p* is a cyclic group, and any finite point of C_E mod *p* is a generator of the group. At present, there is no deterministic algorithm for selecting a prime *p* satisfying Formula (6.20), and it is generally conjectured that a probabilistic algorithm with success probability ≥ O(1/log *p*) exists.

# **6.3 Elliptic Curve Factorization**

In 1986, the mathematician H. W. Lenstra used elliptic curves to find a new method of factorization. Lenstra's method has advantages over the known older algorithms in many respects, which is one of the main reasons elliptic curves have attracted more and more attention in the field of cryptography. We first introduce a classical factorization method called the Pollard (*p* − 1) algorithm.

(*p* − 1) algorithm

Suppose *n* is a composite number, and *p* is a prime factor of *n*; of course, *p* is unknown and needs to be determined. If *p* − 1 happens to have only small prime factors, or all prime power factors of *p* − 1 are not too large, the essence of the (*p* − 1) method is to find a prime factor *p* of *n* with this property. The (*p* − 1) method can be completed in the following steps:

1. Arbitrarily select a sufficiently large bound *B*, and let *k* be the least common multiple of 1, 2, ..., *B*;
2. Randomly select an integer *a*, 1 < *a* < *n*;
3. Calculate the least nonnegative residue of a^k under mod *n*;
4. Calculate *d* = (a^k − 1, *n*); if 1 < *d* < *n*, then *d* is a nontrivial factor of *n*; otherwise, reselect *a* and repeat.

In order to explain the working principle of the (*p* − 1) algorithm, we further assume that *k* is a multiple of all positive integers not exceeding *B*, and that *p*|*n* satisfies

$$p - 1 = \prod p\_i^{\alpha\_i}, \text{ where } \forall \ p\_i^{\alpha\_i} \le B. \tag{6.21}$$

There is *p* − 1|*k*. By Fermat congruence theorem,

$$a^{p-1} \equiv 1 \pmod{p}, \Longrightarrow a^k \equiv 1 \pmod{p}.$$

So *p*|*d*, where *d* = (*a<sup>k</sup>* − 1, *n*).

**Definition 6.4** Suppose *n* is a composite number, *p*|*n*, and *B* is a sufficiently large, arbitrarily selected positive integer. *p* is called a *B*-smooth prime if Eq. (6.21) holds, that is, if *p* − 1 can be decomposed into a product of prime powers not exceeding *B*.

**Lemma 6.8** *Suppose n is a composite number and B is a positive integer. If n has a B-smooth prime factor p, and k and a are selected according to algorithm steps 1–4, then d* = (a^k − 1, *n*) > 1*, so we have the factor decomposition n* = *d* · (*n*/*d*)*.*

*Proof* If *p* is a *B*-smooth prime factor of *n*, then *p* | (a^k − 1, *n*), thus *d* > 1. The Lemma holds.

In the above algorithm, if *d* = (a^k − 1, *n*) = *n*, that is, *n* | a^k − 1, the algorithm fails, and we must reselect *a* and carry out a new round of testing.

*Example 6.2* Factorize *n* = 540143 by the (*p* − 1) method: choose *B* = 8 and *k* = 840, the least common multiple of 1, 2, ..., 8; let *a* = 2 and calculate the least nonnegative residue of 2^840 under mod *n*,

$$2^{840} \equiv 53047(\text{mod } 540143).$$

Calculate (2<sup>840</sup> − 1, *n*),

$$d = (2^{840} - 1, n) = (53046, 540143) = 421.$$

So we have factorization 540143 = 421 × 1283.
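The steps above, applied to Example 6.2, can be sketched as:

```python
import math

def pollard_p_minus_1(n, B=8, a=2):
    """Pollard's (p-1) method: k = lcm(1..B), d = gcd(a^k - 1, n)."""
    k = 1
    for i in range(2, B + 1):
        k = math.lcm(k, i)                 # k = lcm(1, 2, ..., B)
    d = math.gcd(pow(a, k, n) - 1, n)      # d = (a^k - 1, n)
    return d if 1 < d < n else None        # None: reselect a or enlarge B

assert pollard_p_minus_1(540143) == 421    # 540143 = 421 * 1283
```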

Pollard's (*p* − 1) method works essentially in the multiplicative group of Z_p ; if the order of Z_p^* is divisible by a huge prime, the method fails. Lenstra overcame this disadvantage by using elliptic curves for factorization: since there are many elliptic curves to choose from, we can always hope that the order of the Mordell group of some elliptic curve is not divisible by a huge prime. Next, we introduce Lenstra's method in detail. First, we discuss elliptic curves mod *n*.

The general assumption in what follows is that *n* is odd and composite, *p*|*n* (*p* unknown) and *p* > 3. Let *m* be a positive integer and x_1, x_2 two rational numbers whose denominators are coprime with *m*; write x_1 − x_2 = *c*/*d* as a reduced fraction, and define

$$x_1 \equiv x_2 \ (\text{mod } m), \ \text{if } m \mid c. \tag{6.22}$$

**Lemma 6.9** *Suppose x_1* ∈ Q *is a rational number whose denominator is coprime with m. Then there is a unique nonnegative integer r < m such that x_1* ≡ *r* (mod *m*)*; r is called the nonnegative residue of x_1 under* mod *m, denoted r* = *x_1* mod *m.*

*Proof* Write x_1 = *b*/*a*, where (*a*, *m*) = 1. For an integer *x*, x_1 − *x* = (−*ax* + *b*)/*a*; because the congruence equation −*ax* + *b* ≡ 0 (mod *m*) has a unique solution *r*, 0 ≤ *r* < *m*, there is a unique *r* such that x_1 ≡ *r* (mod *m*). The Lemma holds.
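Lemma 6.9 is also how one computes with these residues in practice: r = b · a⁻¹ mod m. A small sketch:

```python
from fractions import Fraction
import math

def rational_mod(x, m):
    """Nonnegative residue of a rational x = b/a mod m, gcd(a, m) = 1."""
    assert math.gcd(x.denominator, m) == 1
    return x.numerator * pow(x.denominator, -1, m) % m

assert rational_mod(Fraction(7, 3), 10) == 9   # since 3 * 9 = 27 ≡ 7 (mod 10)
```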

In order to randomly generate an elliptic curve C_E over the rational number field Q, we randomly select three integers *a*, x_0, y_0 ∈ Z and let *b* = y_0² − x_0³ − a x_0 , so as to satisfy

$$\Delta = 4a^3 + 27b^2 \neq 0, \quad \text{and } (\Delta, n) = 1. \tag{6.23}$$

We get an elliptic curve C_E : *y*² = *x*³ + *ax* + *b* with (x_0, y_0) ∈ C_E . Because *a*, *b* ∈ Z and Δ = 4a³ + 27b² is coprime with *n*, for every prime *p* | *n* we have Δ ≢ 0 (mod *p*). Therefore, as a cubic algebraic equation over the finite field F_p , *x*³ + *ax* + *b* has no multiple roots, so we obtain a "local" elliptic curve C_E mod *p*, where

$$C\_E \bmod p : \text{y}^2 \equiv \text{x}^3 + a\text{x} + b(\text{mod } p). \tag{6.24}$$

and a point (x_0 mod *p*, y_0 mod *p*) ∈ C_E mod *p*; let us denote this point on C_E mod *p* by *P*, that is,

$$P = (x_0 \bmod p, \ y_0 \bmod p) \in C_E \bmod p.$$

Next, we want to calculate *kP*. Just as multiplication has the "repeated square method," addition has a similar repeated doubling method.

**Lemma 6.10** *When k is a huge integer, the computational complexity of k P is*

$$Time(kP) = \log k \cdot Time(P).$$

*Proof* Express *k* as a binary integer, i.e.,

$$k = a_0 + a_1 2 + a_2 2^2 + \dots + a_{m-1} 2^{m-1}, \quad a_i = 0 \text{ or } 1.$$

We can double repeatedly, that is, 2^j P + 2^j P = 2 · 2^j P (0 ≤ *j* ≤ *m* − 2), and thus obtain *kP*; here *m* is the number of binary digits of *k*, *m* = O(log *k*), so

$$Time(kP) = \log k \cdot Time(P).$$

The Lemma holds.
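Lenstra's idea is to run this kP computation with all arithmetic mod *n*: when a slope denominator is not invertible mod *n*, its gcd with *n* exposes a nontrivial factor. The curve y² = x³ + 5x − 5, the point (1, 1) and n = 455839 = 599 · 761 below form a classical illustrative example; the bound 10 (computing 10! · P step by step) is likewise illustrative.

```python
import math

class FactorFound(Exception):
    def __init__(self, d):
        self.d = d

def inv_mod(t, n):
    d = math.gcd(t % n, n)
    if 1 < d < n:
        raise FactorFound(d)      # a nontrivial factor of n has appeared
    return pow(t % n, -1, n)

def ec_add(P, Q, a, n):
    """Chord-and-tangent addition with coordinates mod n; None is O."""
    if P is None: return Q
    if Q is None: return P
    x1, y1 = P; x2, y2 = Q
    if (x1 - x2) % n == 0 and (y1 + y2) % n == 0:
        return None
    if (x1 - x2) % n == 0:
        lam = (3 * x1 * x1 + a) * inv_mod(2 * y1, n) % n
    else:
        lam = (y2 - y1) * inv_mod(x2 - x1, n) % n
    x3 = (lam * lam - x1 - x2) % n
    return (x3, (lam * (x1 - x3) - y1) % n)

def ec_mul(k, P, a, n):
    R = None
    while k:
        if k & 1:
            R = ec_add(R, P, a, n)
        P = ec_add(P, P, a, n)
        k >>= 1
    return R

def lenstra_try(n, a, P, bound):
    """Compute bound! * P mod n; return a factor if one is exposed."""
    try:
        Q = P
        for i in range(2, bound + 1):
            Q = ec_mul(i, Q, a, n)
        return None               # no factor this time: try another curve
    except FactorFound as e:
        return e.d

d = lenstra_try(455839, 5, (1, 1), 10)
assert d is not None and 455839 % d == 0
```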

**Theorem 6.2** *Let C_E be an elliptic curve over the rational field* Q*, defined by the equation y*² = *x*³ + *ax* + *b, where a*, *b* ∈ Z *and* (4a³ + 27b², *n*) = 1*. Let P_1 and P_2 be two points on C_E whose denominators are coprime with n, with P_1* ≠ −*P_2 and P_1* + *P_2* ∈ *C_E . Let P_1* + *P_2* = (*x*, *y*)*; then the denominators of x and y are coprime with n if and only if there is no prime factor p*|*n of n such that, as two points on the local curve C_E* mod *p,*

$$P_1 \bmod p + P_2 \bmod p = O.$$

*Proof* Let P_1 = (x_1, y_1), P_2 = (x_2, y_2) be two points on C_E , and P_1 + P_2 = (*x*, *y*). If the denominators of *x* and *y* are coprime with *n*, we have to prove

$$P_1 \bmod p + P_2 \bmod p \neq O, \quad \forall \ p \mid n. \tag{6.25}$$

If x_1 ≢ x_2 (mod *p*), Formula (6.25) obviously holds. So we may assume x_1 ≡ x_2 (mod *p*). If P_1 = P_2 , then x_1 = x_2 , y_1 = y_2 , and we only need to show *p* ∤ 2y_1 . Suppose *p* | 2y_1 ; the coordinates of 2P_1 = (*x*, *y*) are determined by Equation (6.5),

$$\begin{cases} x = \left(\frac{3x_1^2 + a}{2y_1}\right)^2 - 2x_1; \\ y = \frac{3x_1^2 + a}{2y_1}(x_1 - x) - y_1. \end{cases}$$

where α = (3x_1² + a)/(2y_1) is the slope. Since the denominators of *x* and *y* are coprime with *n*, *p* | 2y_1 forces 3x_1² + *a* ≡ 0 (mod *p*). Because *n* is odd, *p* is odd, so *p* | y_1 , and we have

$$\begin{aligned} x\_1^3 + a x\_1 + b &\equiv 0 (\text{mod } p); \\ 3x\_1^2 + a &\equiv 0 (\text{mod } p). \end{aligned}$$

That is, x_1 is a common root of *f*(*x*) = *x*³ + *ax* + *b* and its derivative *f*′(*x*) = 3x² + *a* under mod *p*. This contradicts (4a³ + 27b², *n*) = 1. Now suppose P_1 ≠ P_2 ; then x_1 ≡ x_2 (mod *p*) but x_1 ≠ x_2 (because P_1 ≠ −P_2), and we can write

$$\mathbf{x}\_2 = \mathbf{x}\_1 + tp^r, r \ge 1.$$

where the numerator and denominator of *t* are both coprime with *p*. From Formula (6.4) one deduces

$$y\_2 = y\_1 + sp^r.$$

On the other hand, since *y*<sub>2</sub><sup>2</sup> = *x*<sub>2</sub><sup>3</sup> + *ax*<sub>2</sub> + *b*, we have

$$\begin{split} y\_2^2 &= (x\_1 + tp^r)^3 + a(x\_1 + tp^r) + b \\ &\equiv x\_1^3 + ax\_1 + b + tp^r(3x\_1^2 + a)(\text{mod } p^{r+1}) \\ &\equiv y\_1^2 + tp^r(3x\_1^2 + a)(\text{mod } p^{r+1}). \end{split} \tag{6.26}$$

But *x*<sub>1</sub> ≡ *x*<sub>2</sub> (mod *p*) and *y*<sub>1</sub> ≡ *y*<sub>2</sub> (mod *p*), so

$$P\_1 \bmod p + P\_2 \bmod p = 2(P\_1 \bmod p).$$

The right-hand side is the point at infinity if and only if *y*<sub>1</sub> ≡ *y*<sub>2</sub> ≡ 0 (mod *p*). If *y*<sub>1</sub> ≡ *y*<sub>2</sub> ≡ 0 (mod *p*), then *y*<sub>2</sub><sup>2</sup> − *y*<sub>1</sub><sup>2</sup> = (*y*<sub>2</sub> − *y*<sub>1</sub>)(*y*<sub>2</sub> + *y*<sub>1</sub>) is divisible by *p*<sup>*r*+1</sup>. Therefore, Equation (6.26) implies 3*x*<sub>1</sub><sup>2</sup> + *a* ≡ 0 (mod *p*), which is impossible: since *x*<sup>3</sup> + *ax* + *b* (mod *p*) has no multiple roots, *x*<sub>1</sub> cannot be a common root of *x*<sup>3</sup> + *ax* + *b* and its derivative 3*x*<sup>2</sup> + *a* modulo *p*. This proves that Formula (6.25) holds under the assumption.

Conversely, if Eq. (6.25) holds, we prove that the denominator of *P*<sub>1</sub> + *P*<sub>2</sub> is coprime with *n*. Fix *p* | *n*. If *x*<sub>1</sub> ≢ *x*<sub>2</sub> (mod *p*), then by Equation (6.4) the denominator of *P*<sub>1</sub> + *P*<sub>2</sub> is coprime with *p*. So we may assume *x*<sub>1</sub> ≡ *x*<sub>2</sub> (mod *p*), and then *y*<sub>2</sub> ≡ ±*y*<sub>1</sub> (mod *p*). Because *P*<sub>1</sub> mod *p* + *P*<sub>2</sub> mod *p* ≠ 0, we have *y*<sub>2</sub> ≡ *y*<sub>1</sub> ≢ 0 (mod *p*). First assume *P*<sub>2</sub> = *P*<sub>1</sub>; then Equation (6.5) and the fact that *y*<sub>1</sub> ≢ 0 (mod *p*) show that the denominator of *P*<sub>1</sub> + *P*<sub>2</sub> = 2*P*<sub>1</sub> is coprime with *p*. Finally, let *P*<sub>2</sub> ≠ *P*<sub>1</sub>, and write *x*<sub>2</sub> = *x*<sub>1</sub> + *tp*<sup>*r*</sup> with (*t*, *p*) = 1; using the congruence of Formula (6.26), we get

$$\frac{y\_2^2 - y\_1^2}{x\_2 - x\_1} \equiv 3x\_1^2 + a(\text{mod } p).$$

Because *p* ∤ *y*<sub>2</sub> + *y*<sub>1</sub> ≡ 2*y*<sub>1</sub> (mod *p*), the denominator of

$$\frac{\mathbf{y}\_2^2 - \mathbf{y}\_1^2}{(\mathbf{y}\_2 + \mathbf{y}\_1)(\mathbf{x}\_2 - \mathbf{x}\_1)} = \frac{\mathbf{y}\_2 - \mathbf{y}\_1}{\mathbf{x}\_2 - \mathbf{x}\_1}$$

is not divisible by *p*, so by (6.4) the denominator of *P*<sub>1</sub> + *P*<sub>2</sub> is not divisible by *p*. Since *p* | *n* is arbitrary, this completes the proof of the whole theorem.

**Lenstra's algorithm.**

Let *n* be an odd composite number. We hope to find a nontrivial factor *d* of *n*, *d* | *n*, 1 < *d* < *n*, giving the factorization *n* = *d* · *n*/*d*. Previously, we introduced the random selection of an elliptic curve *C<sub>E</sub>* over the rational field Q and a point *P* on *C<sub>E</sub>*. Lenstra's algorithm hopes to factorize *n* by means of (*C<sub>E</sub>*, *P*). There is no doubt that the Lenstra algorithm explained below is a probabilistic algorithm: if (*C<sub>E</sub>*, *P*) fails to produce a factorization, then as long as the probability of failure is *p* < 1, we select another elliptic curve and a point on it. Continuing in this way, after randomly and independently selecting *n* elliptic curves, the probability of successfully factorizing *n* satisfies

$$P\left\{n = d \cdot \frac{n}{d}\right\} \ge 1 - p^n \quad (p < 1).$$

When *n* is sufficiently large, the success probability of the Lenstra algorithm comes arbitrarily close to 1. Therefore, the Lenstra algorithm can be summarized as an algorithm that factorizes *n* by using any rational elliptic curve (*C<sub>E</sub>*, *P*), with failure probability *p* < 1.

Let (*C<sub>E</sub>*, *P*) be a given rational elliptic curve, and let *B* and *C* be chosen positive upper bounds. Let *k* be a product of small prime powers; to be exact,

$$k = \prod\_{1 \le l \le B} l^{\alpha\_l},\tag{6.27}$$

where the product runs over primes *l* and α<sub>*l*</sub> is the largest exponent satisfying *l*<sup>α<sub>*l*</sub></sup> ≤ *C*. Thus α<sub>*l*</sub> = [log *C*/log *l*].
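As a concrete sketch of (6.27), the exponent *k* can be computed as follows (a minimal Python illustration; the function names and the sample bounds *B*, *C* below are our own, not part of the original algorithm description):

```python
from math import isqrt, log

def primes_up_to(B):
    """All primes l <= B by a simple sieve of Eratosthenes."""
    flags = [True] * (B + 1)
    flags[0:2] = [False, False]
    for i in range(2, isqrt(B) + 1):
        if flags[i]:
            flags[i * i :: i] = [False] * len(flags[i * i :: i])
    return [i for i, is_p in enumerate(flags) if is_p]

def lenstra_k(B, C):
    """k = product over primes l <= B of l**alpha_l, where alpha_l is the
    largest exponent with l**alpha_l <= C, i.e. alpha_l = floor(log C / log l),
    as in (6.27)."""
    k = 1
    for l in primes_up_to(B):
        alpha = int(log(C) / log(l))
        # guard against floating-point rounding at exact powers of l
        while l ** (alpha + 1) <= C:
            alpha += 1
        while alpha > 0 and l ** alpha > C:
            alpha -= 1
        k *= l ** alpha
    return k
```

For instance, with *B* = 10 and *C* = 100 this yields *k* = 2<sup>6</sup> · 3<sup>4</sup> · 5<sup>2</sup> · 7<sup>2</sup>.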

Next, we calculate *kP* (mod *n*) by (6.4) and (6.5). If among the rational numbers *x*<sub>2</sub> − *x*<sub>1</sub> and 2*y*<sub>1</sub> one has a denominator not coprime with *n*, we obtain, for example, *d* = (*x*<sub>2</sub> − *x*<sub>1</sub>, *n*) with 1 < *d* < *n*, and hence the factorization *n* = *d* · *n*/*d*. If *d* = *n*, then we reselect the rational elliptic curve *C<sub>E</sub>* and the point *P* on *C<sub>E</sub>*. By Theorem 6.2, *d* > 1 appears in these rational numbers *x*<sub>2</sub> − *x*<sub>1</sub> and 2*y*<sub>1</sub> if and only if there is a *k*<sub>1</sub> such that

$$k\_1 \cdot (P \bmod p) = 0, \forall \ p | n.$$

From the selection of *k* in (6.27), with high probability *k*<sub>1</sub> | *k*, thus

$$k \cdot (P \bmod p) = 0, \forall \ p \vert n.$$

Therefore, in the Lenstra algorithm, by calculating the rational point *kP*, with high probability there is a certain prime *p*, *p* | *n*, such that

$$k(P \bmod p) = 0, \quad p \mid n \ \text{a prime}.\tag{6.28}$$

By Theorem 6.2, writing *P* = (*x*<sub>1</sub>, *y*<sub>1</sub>) and (*k* − 1)*P* = (*x*<sub>2</sub>, *y*<sub>2</sub>), we then get *d* = (*x*<sub>2</sub> − *x*<sub>1</sub>, *n*) > 1 or *d* = (2*y*<sub>1</sub>, *n*) > 1, and we obtain a nontrivial factorization of *n*.

From the above, the key step of the Lenstra algorithm is to calculate *k* · *P*. Using the continuous doubling method given in Lemma 6.10, we only need to calculate 2*P*, 2(2*P*), 2(4*P*), ..., 2<sup>α<sub>2</sub></sup>*P*, then 3(2<sup>α<sub>2</sub></sup>*P*), 3(3 · 2<sup>α<sub>2</sub></sup>*P*), ..., 3<sup>α<sub>3</sub></sup>2<sup>α<sub>2</sub></sup>*P*, and so on, until we reach (∏<sub>1≤*l*≤*B*</sub> *l*<sup>α<sub>*l*</sub></sup>)*P*, i.e., *kP*.
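The whole procedure can be sketched in code. The following is a simplified Python illustration, not the book's algorithm verbatim: it works with curve points reduced modulo *n* directly, multiplies by every integer *m* < *B* (so the accumulated multiplier contains all small prime powers) instead of using the schedule (6.27), and it reports a factor the moment a slope denominator is not invertible mod *n*, which is exactly the event described by Theorem 6.2. All names and bounds are illustrative assumptions.

```python
from math import gcd
import random

def ec_add(P, Q, a, n):
    """Add P and Q on y^2 = x^3 + a*x + b, working modulo n.
    Returns ('point', R) with R = None for the point at infinity,
    or ('factor', d) when a slope denominator shares a factor d with n."""
    if P is None:
        return 'point', Q
    if Q is None:
        return 'point', P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % n == 0:
        return 'point', None                      # P + (-P) = O
    if P == Q:
        num, den = (3 * x1 * x1 + a) % n, (2 * y1) % n
    else:
        num, den = (y2 - y1) % n, (x2 - x1) % n
    d = gcd(den, n)
    if d > 1:
        return 'factor', d                        # the event of Theorem 6.2
    lam = num * pow(den, -1, n) % n
    x3 = (lam * lam - x1 - x2) % n
    return 'point', (x3, (lam * (x1 - x3) - y1) % n)

def ec_mul(k, P, a, n):
    """k*P by continuous doubling, propagating any discovered factor."""
    R = None
    while k:
        if k & 1:
            tag, R = ec_add(R, P, a, n)
            if tag == 'factor':
                return tag, R
        k >>= 1
        if k:
            tag, P = ec_add(P, P, a, n)
            if tag == 'factor':
                return tag, P
    return 'point', R

def lenstra_factor(n, curves=200, B=100):
    """Try random curves (C_E, P): b = y^2 - x^3 - a*x (mod n) is implicit,
    and the addition formulas never need it."""
    for _ in range(curves):
        a = random.randrange(n)
        Q = (random.randrange(n), random.randrange(n))
        for m in range(2, B):                     # Q accumulates m! * P
            tag, val = ec_mul(m, Q, a, n)
            if tag == 'factor':
                if val < n:
                    return val                    # nontrivial d | n
                break                             # d = n: pick a new curve
            if val is None:
                break
            Q = val
    return None
```

For a toy composite such as *n* = 187 = 11 · 17 a factor is found almost immediately; a serious implementation would also check (4*a*<sup>3</sup> + 27*b*<sup>2</sup>, *n*) = 1 and use the prime-power schedule of (6.27).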

For the probability estimation and computational complexity of the Lenstra algorithm, see Reference 6 (1986).

#### **Exercise 6**


5. Calculate the order of points on the following rational elliptic curves: (*i*) *P* = (0, 16), *C<sub>E</sub>*: *y*<sup>2</sup> = *x*<sup>3</sup> + 256; (*ii*) *P* = (1/2, 1/2), *C<sub>E</sub>*: *y*<sup>2</sup> = *x*<sup>3</sup> + *x*/4; (*iii*) *P* = (3, 8), *C<sub>E</sub>*: *y*<sup>2</sup> = *x*<sup>3</sup> − 43*x* + 16; (*iv*) *P* = (0, 0), *C<sub>E</sub>*: *y*<sup>2</sup> + *y* = *x*<sup>3</sup> − *x*<sup>2</sup>.

	- (*c*) *y*<sup>2</sup> + *y* = *x*<sup>3</sup>, when *q* ≡ 2 (mod 3).

9. A deterministic algorithm can realize the embedding of plaintext units into any elliptic curve over F<sub>*q*</sub>. Give the specific algorithm for the following elliptic curves:

(1) *C<sub>E</sub>*: *y*<sup>2</sup> = *x*<sup>3</sup> − *x*, when *q* ≡ 3 (mod 4); (2) *C<sub>E</sub>*: *y*<sup>2</sup> + *y* = *x*<sup>3</sup>, when *q* ≡ 2 (mod 3).

10. Let *C<sub>E</sub>* be an elliptic curve over the finite field F<sub>*p*</sub>, and let *N<sub>r</sub>* denote the number of points of *C<sub>E</sub>* over the finite field F<sub>*p*<sup>*r*</sup></sub>. Then (*i*) if *p* > 3 and *r* > 1, *N<sub>r</sub>* is not prime.

(*ii*) When *p* = 2 or 3, give a counterexample to show that *N<sub>r</sub>* can be a prime number.

	- (*i*) *n<sub>k</sub>* is a prime if and only if there is an integer *a* with *a*<sup>2<sup>2<sup>*k*</sup>−1</sup></sup> ≡ −1 (mod *n<sub>k</sub>*).
	- (*ii*) If *n<sub>k</sub>* is a prime, then more than 50% of *a* ∈ Z<sup>∗</sup><sub>*n<sub>k</sub>*</sub> have the congruence property of (*i*).
	- (*iii*) When *k* > 1, we can always choose *a* = 3, 5, or *a* = 7.

# **References**

Fulton, W. (1969). *Algebraic curves*. Benjamin.


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Chapter 7 Lattice-Based Cryptography**

# **7.1 Geometry of Numbers**

Let R<sup>*n*</sup> be an *n*-dimensional Euclidean space and *x* = (*x*<sub>1</sub>, *x*<sub>2</sub>, ..., *x<sub>n</sub>*) ∈ R<sup>*n*</sup> be an *n*-dimensional vector; *x* can be a row vector or a column vector, depending on the situation. If *x* ∈ Z<sup>*n*</sup>, then *x* is called an integral point. R<sup>*m*×*n*</sup> denotes the set of all *m* × *n* matrices over R. For *x* = (*x*<sub>1</sub>, *x*<sub>2</sub>, ..., *x<sub>n</sub>*) ∈ R<sup>*n*</sup> and *y* = (*y*<sub>1</sub>, *y*<sub>2</sub>, ..., *y<sub>n</sub>*) ∈ R<sup>*n*</sup>, define the inner product of *x* and *y* as

$$
\langle \mathbf{x}, \mathbf{y} \rangle = \sum\_{i=1}^{n} x\_i \mathbf{y}\_i. \tag{7.1}
$$

The length |*x*| of vector *x* is defined as

$$|\mathbf{x}| = \sqrt{\langle \mathbf{x}, \mathbf{x} \rangle} = \left(\sum\_{i=1}^{n} x\_i^2\right)^{\frac{1}{2}}. \tag{7.2}$$

For λ ∈ R, the scalar multiple λ · *x* is defined as

$$
\lambda \mathbf{x} = (\lambda x\_1, \lambda x\_2, \dots, \lambda x\_n). \tag{7.3}
$$

If the inner product of two vectors *x* and *y* satisfies ⟨*x*, *y*⟩ = 0, then *x* and *y* are said to be orthogonal, denoted *x*⊥*y*.

# **Lemma 7.1** *Let x*, *<sup>y</sup>* <sup>∈</sup> <sup>R</sup>*n,* <sup>λ</sup> <sup>∈</sup> <sup>R</sup> *is any real number, then*



*(iv) (Pythagorean theorem) If and only if x*⊥*y, we have*

$$|\mathbf{x} \pm \mathbf{y}|^2 = |\mathbf{x}|^2 + |\mathbf{y}|^2.$$

*Proof* (i) and (ii) can be derived directly from the definition. To prove (iii), let *x* = (*x*<sub>1</sub>, *x*<sub>2</sub>, ..., *x<sub>n</sub>*), *y* = (*y*<sub>1</sub>, *y*<sub>2</sub>, ..., *y<sub>n</sub>*) ∈ R<sup>*n*</sup>; by Hölder's inequality:

$$\left|\sum\_{i=1}^n \boldsymbol{\chi}\_i \boldsymbol{\text{y}}\_i\right| \le \left[\left(\sum\_{i=1}^n \boldsymbol{\chi}\_i^2\right)\left(\sum\_{i=1}^n \boldsymbol{\text{y}}\_i^2\right)\right]^{\frac{1}{2}}.$$

So there is

$$\begin{aligned} |\mathbf{x} + \mathbf{y}|^2 &= \sum\_{i=1}^n (\mathbf{x}\_i + \mathbf{y}\_i)^2 = \sum\_{i=1}^n \mathbf{x}\_i^2 + 2\sum\_{i=1}^n \mathbf{x}\_i \mathbf{y}\_i + \sum\_{i=1}^n \mathbf{y}\_i^2 \\ &\le \sum\_{i=1}^n \mathbf{x}\_i^2 + 2\left|\sum\_{i=1}^n \mathbf{x}\_i \mathbf{y}\_i\right| + \sum\_{i=1}^n \mathbf{y}\_i^2 \\ &\le \left(\sqrt{\sum\_{i=1}^n \mathbf{x}\_i^2} + \sqrt{\sum\_{i=1}^n \mathbf{y}\_i^2}\right)^2 = (|\mathbf{x}| + |\mathbf{y}|)^2,\end{aligned}$$

so (iii) holds. Then, by the definition of inner product,

$$
\langle \mathbf{x} \pm \mathbf{y}, \mathbf{x} \pm \mathbf{y} \rangle = \langle \mathbf{x}, \mathbf{x} \rangle \pm 2 \langle \mathbf{x}, \mathbf{y} \rangle + \langle \mathbf{y}, \mathbf{y} \rangle,
$$

if *x*⊥*y*, then

$$|\mathbf{x} \pm \mathbf{y}|^2 = |\mathbf{x}|^2 + |\mathbf{y}|^2.$$

Conversely, if *x* is not orthogonal to *y*, then ⟨*x*, *y*⟩ ≠ 0, thus

$$|\mathbf{x} \pm \mathbf{y}|^2 \neq |\mathbf{x}|^2 + |\mathbf{y}|^2.$$

Lemma 7.1 holds.

From Pythagorean theorem, for orthogonal vector *x*⊥*y*, we have the following conclusion,

$$|\mathbf{x} + \mathbf{y}| = |\mathbf{x} - \mathbf{y}|, \text{ if } \mathbf{x} \perp \mathbf{y}. \tag{7.4}$$
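These definitions are easy to check numerically. The following small Python sketch (the helper names and example vectors are our own choices) verifies orthogonality, the Pythagorean identity, and the consequence (7.4):

```python
import math

def inner(x, y):
    """Inner product <x, y> of (7.1)."""
    return sum(a * b for a, b in zip(x, y))

def length(x):
    """Length |x| = sqrt(<x, x>) of (7.2)."""
    return math.sqrt(inner(x, x))

def add(x, y):
    """Componentwise sum x + y."""
    return tuple(a + b for a, b in zip(x, y))

def sub(x, y):
    """Componentwise difference x - y."""
    return tuple(a - b for a, b in zip(x, y))
```

With *x* = (3, 4, 0) and *y* = (4, −3, 5) one has ⟨*x*, *y*⟩ = 0, hence |*x* + *y*|<sup>2</sup> = |*x*|<sup>2</sup> + |*y*|<sup>2</sup> and |*x* + *y*| = |*x* − *y*|.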

**Definition 7.1** Let *<sup>R</sup>* <sup>⊂</sup> <sup>R</sup>*<sup>n</sup>* be a subset, 0 <sup>∈</sup> *<sup>R</sup>*, *<sup>R</sup>* is called a symmetric convex body of R*<sup>n</sup>*, if

(i) *x* ∈ *R*,⇒ −*x* ∈ *R* (Symmetry);

(ii) Let *x*, *y* ∈ *R*, λ ≥ 0, μ ≥ 0, and λ + μ = 1, then λ*x* + μ*y* ∈ *R* (Convexity).


The following is a famous example of a symmetric convex body defined by a set of linear inequalities. Let *A* ∈ R<sup>*m*×*n*</sup> be an *m* × *n* matrix and *c* = (*c*<sub>1</sub>, *c*<sub>2</sub>, ..., *c<sub>n</sub>*) ∈ R<sup>*n*</sup> with ∀ *c<sub>i</sub>* ≥ 0. Writing *A* = (*a<sub>ij</sub>*)<sub>*m*×*n*</sub>, define *R*(*A*, *c*) as the set of solutions *x* = (*x*<sub>1</sub>, *x*<sub>2</sub>, ..., *x<sub>n</sub>*) ∈ R<sup>*n*</sup> of the following *m* linear inequalities:

$$\left| \sum\_{j=1}^{n} a\_{ij} x\_j \right| \le c\_i, \quad 1 \le i \le m. \tag{7.5}$$

We have

**Lemma 7.2** *For any A* ∈ R<sup>*m*×*n*</sup> *and any positive vector c* = (*c*<sub>1</sub>, *c*<sub>2</sub>, ..., *c<sub>n</sub>*) ∈ R<sup>*n*</sup>*, R*(*A*, *c*) *is a symmetric convex body in* R<sup>*n*</sup>*.*

*Proof* Obviously zero vector *x* = (0, 0,..., 0) ∈ *R*(*A*, *c*), and if *x* ∈ *R*(*A*, *c*) ⇒ −*x* ∈ *R*(*A*, *c*). So we only prove the convexity of *R*(*A*, *c*). Suppose *x*, *y* ∈ *R*(*A*, *c*), let

$$z = \lambda x + \mu y, \lambda > 0, \mu > 0, \lambda + \mu = 1.$$

Then for any 1 ≤ *i* ≤ *m*, we have

$$\begin{aligned} &|a\_{i1}z\_1 + a\_{i2}z\_2 + \cdots + a\_{in}z\_n| \\ &\le \lambda |a\_{i1}\mathbf{x}\_1 + a\_{i2}\mathbf{x}\_2 + \cdots + a\_{in}\mathbf{x}\_n| + \mu |a\_{i1}\mathbf{y}\_1 + a\_{i2}\mathbf{y}\_2 + \cdots + a\_{in}\mathbf{y}\_n| \\ &\le \lambda c\_i + \mu c\_i = c\_i. \end{aligned}$$

So there is *z* = λ*x* + μ*y* ∈ *R*(*A*, *c*). Thus, *R*(*A*, *c*) is a symmetrical convex body. Lemma 7.2 holds.
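Membership in *R*(*A*, *c*) is a direct check of the *m* inequalities (7.5). A small Python sketch (the example *A*, *c*, and test points are our own choices) also illustrates the symmetry and convexity just proved:

```python
def in_R(A, c, x):
    """x is in R(A, c)  iff  |sum_j a_ij * x_j| <= c_i for every row i,
    as in (7.5)."""
    return all(
        abs(sum(A[i][j] * x[j] for j in range(len(x)))) <= c[i]
        for i in range(len(A))
    )
```

For *A* = [[1, 0], [0, 1], [1, 1]] and *c* = (1, 1, 1.5), the points ±(1, 0) and (0, 1) lie in *R*(*A*, *c*), as does the convex combination (1/2, 1/2), while (1, 1) violates the third inequality.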

**Lemma 7.3** *Let <sup>R</sup>* <sup>⊂</sup> <sup>R</sup>*<sup>n</sup> be a symmetrical convex body, x* <sup>∈</sup> *<sup>R</sup>, then when* <sup>|</sup>λ| ≤ <sup>1</sup>*, we have* λ*x* ∈ *R.*

*Proof* By convexity, let

$$
\rho = \frac{1}{2}(1+\lambda), \ \sigma = \frac{1}{2}(1-\lambda).
$$

Then ρ ≥ 0, σ ≥ 0, and ρ + σ = 1. So there is

$$
\rho \mathbf{x} + \sigma(-\mathbf{x}) = \lambda \mathbf{x} \in \mathcal{R}.
$$

The Lemma holds.

**Lemma 7.4** *If x*, *y* ∈ *R, then* λ*x* + μ*y* ∈ *R, where* λ,μ *are real numbers, and satisfies* |λ|+|μ| ≤ 1*.*

*Proof* Let η<sup>1</sup> be the sign of λ and η<sup>2</sup> be the sign of μ, then by Lemma 7.3,

$$\begin{aligned} \mathbf{x}' &= \eta\_1 (|\lambda| + |\mu|) \mathbf{x} \in \mathcal{R}, \\ \mathbf{y}' &= \eta\_2 (|\lambda| + |\mu|) \mathbf{y} \in \mathcal{R}. \end{aligned}$$

Let ρ = |λ|/(|λ| + |μ|) and σ = |μ|/(|λ| + |μ|), then ρ + σ = 1. By definition, we have

$$
\lambda \mathbf{x} + \mu \mathbf{y} = \rho \mathbf{x}' + \sigma \mathbf{y}' \in \mathcal{R},
$$

thus the Lemma holds. This result is easily extended to the case of *n* vectors.

**Lemma 7.5** (Blichfeldt) *Let R* ⊂ R<sup>*n*</sup> *be any region in* R<sup>*n*</sup> *and V be the volume of R. If V* > 1*, then there are two different vectors x* ∈ *R*, *x*′ ∈ *R so that x* − *x*′ *is an integral point (thus a nonzero integral point).*

*Proof* For <sup>∀</sup>*<sup>x</sup>* <sup>=</sup> (*x*1, *<sup>x</sup>*2,..., *xn*) <sup>∈</sup> <sup>R</sup>*<sup>n</sup>*, we define

$$[\mathbf{x}] = ([x\_1], [x\_2], \dots, [x\_n]) \in \mathbb{Z}^n \tag{7.6}$$

and

$$\{\mathbf{x}\} = (\delta\_1, \delta\_2, \dots, \delta\_n) \in [0, 1)^n,\tag{7.7}$$

where [*x<sub>i</sub>*] is the integer part (square bracket function) of *x<sub>i</sub>* and δ<sub>*i*</sub> = *x<sub>i</sub>* − [*x<sub>i</sub>*] is its fractional part.

For each integral point *<sup>u</sup>* <sup>∈</sup> <sup>Z</sup>*<sup>n</sup>*, define

$$\mathcal{R}\_{u} = \{ \mathbf{x} \in \mathcal{R} \mid [\mathbf{x}] = u \}$$

and

$$D\_u = \{ \mathbf{x} - \boldsymbol{\mu} | \mathbf{x} \in \mathcal{R}\_u \}.$$

Note that *R*<sub>*u*<sub>1</sub></sub> ∩ *R*<sub>*u*<sub>2</sub></sub> = ∅ if *u*<sub>1</sub> ≠ *u*<sub>2</sub>. Thus, by *R* = ∪<sub>*u*</sub> *R<sub>u</sub>*,

$$V = \text{Vol}(\mathcal{R}) = \sum\_{u} \text{Vol}(\mathcal{R}\_{u}) = \sum\_{u} V\_{u} > 1,$$

where *V<sub>u</sub>* = Vol(*R<sub>u</sub>*), and thus *V<sub>u</sub>* = Vol(*D<sub>u</sub>*). If the sets *D<sub>u</sub>* were pairwise disjoint, then, since

$$\sum\_{u} V\_{u} = \text{Vol}\left(\bigcup\_{u} D\_{u}\right), \qquad \bigcup\_{u} D\_{u} \subset [0,1) \times \dots \times [0,1).$$

There is

$$\sum\_{u} V\_{u} \le 1,$$

so there is a contradiction. Therefore, there must be two different integral points *u* and *u*′ (*u* ≠ *u*′) with *D<sub>u</sub>* ∩ *D*<sub>*u*′</sub> ≠ ∅, that is, *x* − *u* = *x*′ − *u*′ ⇒ *x* − *x*′ = *u* − *u*′ ∈ Z<sup>*n*</sup>. The Lemma holds.

**Lemma 7.6** (Minkowski) *Let R be a symmetric convex body, and the volume of R*

$$V = \text{Vol}(\mathcal{R}) > 2^n,$$

*then R contains at least one nonzero integer point.*

*Proof* Let

$$\frac{1}{2}\mathcal{R} = \left\{\frac{1}{2}\mathbf{x}|\mathbf{x} \in \mathcal{R}\right\}.$$

Thus

$$\text{Vol}\left(\frac{1}{2}\mathcal{R}\right) = \frac{1}{2^n}V > 1,$$

by Lemma 7.5, there are two points *x*′, *x*″ ∈ (1/2)*R* such that *u* = *x*′ − *x*″ is a nonzero integral point. We prove *u* ∈ *R*. Write *x*′ = (1/2)*y*, *x*″ = (1/2)*z*, where *y*, *z* ∈ *R*. Then

$$
\mathbf{u} = \frac{1}{2}\mathbf{y} - \frac{1}{2}\mathbf{z}, \quad \mathbf{y} \in \mathcal{R}, \; \mathbf{z} \in \mathcal{R}.
$$

By Lemma 7.4, then *u* ∈ *R*. The Lemma holds.
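Minkowski's theorem is existential, but for a concrete body the promised point can be located by brute force. A Python sketch (the search helper and the example ellipse below are our own illustrative choices):

```python
from itertools import product

def nonzero_integral_point(contains, n, B):
    """Search the box [-B, B]^n for a nonzero integral point of the
    symmetric convex body described by the membership predicate."""
    for u in product(range(-B, B + 1), repeat=n):
        if any(u) and contains(u):
            return u
    return None
```

The ellipse *x*<sub>1</sub><sup>2</sup>/9 + *x*<sub>2</sub><sup>2</sup>/1.2 ≤ 1 has area π · 3 · √1.2 ≈ 10.3 > 2<sup>2</sup> = 4, and the search indeed finds a nonzero integral point such as (±3, 0).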

*Remark 7.1* The above conclusion of Minkowski cannot be improved; that is, *V* > 2<sup>*n*</sup> cannot be weakened to *V* ≥ 2<sup>*n*</sup>. A counterexample is

$$\mathcal{R} = \{ \mathbf{x} \in \mathbb{R}^n \mid \mathbf{x} = (x\_1, x\_2, \dots, x\_n), \forall \, |x\_i| < 1 \}.$$

Obviously Vol(*R*) = 2<sup>*n*</sup>, but *R* contains no nonzero integral point.

When Vol(*R*) = 2<sup>*n*</sup>, in order for a symmetric convex body *R* still to contain a nonzero integral point, we need some supplementary constraints on *R*. First, we consider boundedness. Let *R* ⊂ R<sup>*n*</sup>; we call *R* bounded if

$$\mathcal{R} \subset \{ \mathbf{x} = (x\_1, x\_2, \dots, x\_n) \in \mathbb{R}^n \mid |x\_i| \le B, 1 \le i \le n \},$$

where *B* is a bounded constant.

**Lemma 7.7** *Let A*∈R*<sup>n</sup>*×*<sup>n</sup> be a reversible matrix, d* = |det(*A*)<sup>|</sup> <sup>&</sup>gt; <sup>0</sup>*, c* <sup>=</sup> (*c*1, *<sup>c</sup>*2,..., *cn*) <sup>∈</sup> <sup>R</sup>*<sup>n</sup> is a positive vector, that is* <sup>∀</sup> *ci* <sup>&</sup>gt; <sup>0</sup>*, then the symmetric convex body <sup>R</sup>*(*A*, *<sup>c</sup>*) *defined by Eq. (7.5) is bounded and its volume*

$$\text{Vol}(\mathcal{R}(A, c)) = 2^n d^{-1} c\_1 c\_2 \cdots c\_n.$$

*Proof* Let *A* = (*ai j*)*<sup>n</sup>*×*<sup>n</sup>*. Write *Ax* = *y*, then *x* = *A*−<sup>1</sup> *y*. And let *A*−<sup>1</sup> = (*bi j*)*<sup>n</sup>*×*<sup>n</sup>*, then for any *xi* , there is

$$|\mathbf{x}\_i| = \left| \sum\_{j=1}^n b\_{ij} y\_j \right| \le \sum\_{j=1}^n |b\_{ij}| \cdot c\_j \le B,$$

where *B* is a bounded constant. Therefore, *R*(*A*, *c*) is a bounded set. Obviously

$$\text{Vol}(\mathcal{R}(A,c)) = \int\limits\_{\mathbf{x} = (\mathbf{x}\_1, \mathbf{x}\_2, \dots, \mathbf{x}\_n) \in \mathcal{R}(A,c)} \text{d}\mathbf{x}\_1 \text{d}\mathbf{x}\_2 \cdots \text{d}\mathbf{x}\_n,$$

do variable replacement *Ax* = *y*, then

$$\mathrm{d}x = \mathrm{d}x\_1 \cdots \mathrm{d}x\_n = \frac{1}{|\mathrm{det}(A)|} \mathrm{d}y\_1 \mathrm{d}y\_2 \cdots \mathrm{d}y\_n.$$

Thus

$$\begin{aligned} \text{Vol}(\mathcal{R}(A, c)) &= \frac{1}{|\text{det}(A)|} \int\_{-c\_1}^{c\_1} \cdots \int\_{-c\_n}^{c\_n} \text{d}\mathbf{y}\_1 \text{d}\mathbf{y}\_2 \cdots \text{d}\mathbf{y}\_n \\ &= 2^n d^{-1} \prod\_{i=1}^n c\_i, \end{aligned}$$

Lemma 7.7 holds.

*Remark 7.2* If in (7.5) "≤" is changed to "<" in the definition of *R*(*A*, *c*), the above lemma still holds.

Now consider the general situation, and let *A* = (*a<sub>ij</sub>*)<sub>*m*×*n*</sub>. If *m* > *n* and rank(*A*) = *n*, then *R*(*A*, *c*) defined by Eq. (7.5) is still a bounded region. Obviously, if *m* < *n*, or *m* = *n* and rank(*A*) < *n*, then *R*(*A*, *c*) is an unbounded region and *V* = ∞. Therefore, we have the following Corollary.

**Corollary 7.1** *Let A* = (*ai j*)*<sup>m</sup>*×*n, m* < *n or m* = *n,* det(*A*) = 0*, then for any small positive vector c* = (*c*1, *c*2,..., *cn*), 0 < *ci* < ε*, R*(*A*, *c*) *contains a nonzero integer point. In other words, the following m inequalities*

$$\left| \sum\_{j=1}^{n} a\_{ij} x\_j \right| < \varepsilon, \ 1 \le i \le m,$$

*have a nonzero integer solution x* = (*x*<sub>1</sub>, *x*<sub>2</sub>, ..., *x<sub>n</sub>*) ∈ Z<sup>*n*</sup>*.*

*Proof* For any given ε > 0, Vol(*R*(*A*, *c*)) = ∞ > 2<sup>*n*</sup>. By Lemma 7.6, *R*(*A*, *c*) contains at least one nonzero integral point.

Let *A* ∈ R<sup>*m*×*n*</sup> be an *m* × *n* matrix and *c* = (*c*<sub>1</sub>, *c*<sub>2</sub>, ..., *c<sub>n</sub>*) ∈ R<sup>*n*</sup> a positive vector, that is, ∀ *c<sub>i</sub>* > 0. Writing *A* = (*a<sub>ij</sub>*)<sub>*m*×*n*</sub>, define *R*′(*A*, *c*) as the set of solutions *x* = (*x*<sub>1</sub>, *x*<sub>2</sub>, ..., *x<sub>n</sub>*) of the following linear inequalities:

$$\begin{cases} \left| \sum\_{j=1}^{n} a\_{1j} x\_j \right| \le c\_1, \\\\ \left| \sum\_{j=1}^{n} a\_{ij} x\_j \right| < c\_i, \ i = 2, \dots, m. \end{cases} \tag{7.8}$$

When *A* ∈ R<sup>*n*×*n*</sup> is a reversible square matrix, we discuss the nonzero integral points in the symmetric convex body *R*′(*A*, *c*).

**Lemma 7.8** *If A* <sup>∈</sup> <sup>R</sup>*n*×*<sup>n</sup> is a reversible matrix and c* <sup>=</sup> (*c*1, *<sup>c</sup>*2,..., *cn*) *is a positive vector, when*

$$c\_1 c\_2 \cdots c\_n \ge |\det(A)|,\tag{7.9}$$

*then R*′(*A*, *c*) *contains a nonzero integer point.*

*Proof* When *c*1*c*<sup>2</sup> ··· *cn* > |det(*A*)|, because of

$$\text{Vol}(\mathcal{R}'(A, c)) = \frac{2^n c\_1 c\_2 \cdots c\_n}{|\text{det}(A)|} > 2^n,$$

by Lemmas 7.6 and 7.7 the proposition holds; it remains to discuss the case when the equal sign in Formula (7.9) holds.

Let ε be any positive real number, 0 < ε < 1; then by Lemma 7.7, there is a nonzero integral solution *x*<sup>(ε)</sup> = (*x*<sub>1</sub><sup>(ε)</sup>, *x*<sub>2</sub><sup>(ε)</sup>, ..., *x<sub>n</sub>*<sup>(ε)</sup>) ∈ Z<sup>*n*</sup> satisfying

$$\begin{cases} \left| \sum\_{j=1}^{n} a\_{1j} x\_j^{(\boldsymbol{\epsilon})} \right| \le c\_1 + \varepsilon \le c\_1 + 1, \\\\ \left| \sum\_{j=1}^{n} a\_{ij} x\_j^{(\boldsymbol{\epsilon})} \right| < c\_i, \ 2 \le i \le n. \end{cases} \tag{7.10}$$

And there is an upper bound *B* independent of ε, which satisfies

$$|x\_j^{(\varepsilon)}| \le B, \quad 1 \le j \le n.$$

The integral points *x*<sup>(ε)</sup> satisfying the above bound are finite in number, so there must be a nonzero integral point *x* ≠ 0 which satisfies (7.10) for every ε > 0. Letting ε → 0, the Lemma holds.

In the following discussion, we make the following restrictions on *<sup>R</sup>* <sup>⊂</sup> <sup>R</sup>*<sup>n</sup>*:

*R* is a symmetric convex body, *R* is bounded, and *R* is a closed subset of R*<sup>n</sup>*.

(7.11)

Obviously, when *A* is an *n*-order reversible square matrix, for any positive vector *c* = (*c*<sub>1</sub>, *c*<sub>2</sub>, ..., *c<sub>n</sub>*), *R*(*A*, *c*) satisfies the above restriction (7.11), but *R*′(*A*, *c*) does not, because *R*′(*A*, *c*) is not closed.

**Definition 7.2** If *<sup>R</sup>* <sup>⊂</sup> <sup>R</sup>*<sup>n</sup>* satisfies the restriction (7.11), then for any *<sup>x</sup>* <sup>∈</sup> <sup>R</sup>*<sup>n</sup>*, define the distance function *F*(*x*) as

$$F(\mathbf{x}) = F\_{\mathcal{R}}(\mathbf{x}) = \inf \{ \lambda | \lambda > 0, \lambda^{-1} \mathbf{x} \in \mathcal{R} \}. \tag{7.12}$$

By definition, it is obvious that we have the following ordinary conclusions:


$$F(\mathbf{x}) = \max\_{1 \le i \le n} c\_i^{-1} \left| \sum\_{j=1}^n a\_{ij} x\_j \right|. \tag{7.12'}$$

Property (i) can be derived from the boundedness of *R*, and property (ii) can be derived directly from the definition of *R*(*A*, *c*). Later we will see that 0 ≤ *F*(*x*) < ∞ holds for all *<sup>x</sup>* <sup>∈</sup> <sup>R</sup>*<sup>n</sup>*. The main property of distance function *<sup>F</sup>*(*x*) is the following Lemma.

**Lemma 7.9** *If F*(*x*) *is a distance function defined by R satisfying the constraints, then*


*(iii) F*(*<sup>x</sup>* <sup>+</sup> *<sup>y</sup>*) <sup>≤</sup> *<sup>F</sup>*(*x*) <sup>+</sup> *<sup>F</sup>*(*y*), <sup>∀</sup> *<sup>x</sup>*, *<sup>y</sup>* <sup>∈</sup> <sup>R</sup>*n.*

*Proof* Since *R* is closed, by the definition, *F*−<sup>1</sup>(*x*)*x* ∈ *R*. Thus, if λ ≥ *F*(*x*), by Lemma 7.3, then

$$
\lambda^{-1}x = \frac{F(\mathbf{x})}{\lambda} \cdot F^{-1}(\mathbf{x})\mathbf{x}, \ \left|\frac{F(\mathbf{x})}{\lambda}\right| \le 1.
$$

We have λ<sup>−1</sup>*x* ∈ *R* ⇒ *x* ∈ λ*R*. Conversely, if λ < *F*(*x*), then λ<sup>−1</sup>*x* ∉ *R*. So when *x* ∈ λ*R*, there must be λ ≥ *F*(*x*), and (i) holds.

(ii) is straightforward: because |λ|<sup>−1</sup>*F*<sup>−1</sup>(*x*) · λ*x* ∈ *R*, there is

$$F(\lambda x) \le |\lambda| F(x).$$

Conversely, let δ = *F*(λ*x*); because δ<sup>−1</sup>λ*x* ∈ *R*, we may assume λ > 0, thus

$$F(\mathbf{x}) \le \frac{\delta}{\lambda} \implies \lambda F(\mathbf{x}) \le F(\lambda \mathbf{x}).$$

So there is *F*(λ*x*) = |λ|*F*(*x*), (ii) holds.

To prove (iii), let μ<sub>1</sub> = *F*(*x*), μ<sub>2</sub> = *F*(*y*), so that μ<sub>1</sub><sup>−1</sup>*x* ∈ *R* and μ<sub>2</sub><sup>−1</sup>*y* ∈ *R*. By Lemma 7.4, we have

$$(\mu\_1 + \mu\_2)^{-1}(\mathbf{x} + \mathbf{y}) = \frac{\mu\_1}{\mu\_1 + \mu\_2}(\mu\_1^{-1}\mathbf{x}) + \frac{\mu\_2}{\mu\_1 + \mu\_2}(\mu\_2^{-1}\mathbf{y}) \in \mathcal{R}.$$

Thus

$$F(\mathbf{x} + \mathbf{y}) \le \mu\_1 + \mu\_2.$$

The Lemma holds.
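For the concrete body *R*(*A*, *c*), the distance function is given by (7.12′) and can be computed directly. The following minimal Python sketch (the matrix *A*, vector *c*, and test points are our own illustrative choices) exhibits the homogeneity (ii) and the triangle inequality (iii) of Lemma 7.9 numerically:

```python
def dist_F(A, c, x):
    """Distance function of R(A, c) via (7.12'):
    F(x) = max over i of c_i^(-1) * |sum_j a_ij * x_j|."""
    return max(
        abs(sum(A[i][j] * x[j] for j in range(len(x)))) / c[i]
        for i in range(len(A))
    )
```

With *A* the 2 × 2 identity and *c* = (1, 1), *F* is the sup-norm: *F*((3, −4)) = 4, *F*(2 · (3, −4)) = 8, and *F*(*x* + *y*) ≤ *F*(*x*) + *F*(*y*) holds for any sample pair.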

Let the volume of *<sup>R</sup>* <sup>∈</sup> <sup>R</sup>*<sup>n</sup>* be *<sup>V</sup>* <sup>&</sup>gt; 0, there are *<sup>n</sup>* linearly independent vectors {α1, α2,...,α*n*} in *<sup>R</sup>* to form a set of bases of <sup>R</sup>*<sup>n</sup>*. For any real number μ1, μ2,...,μ*n*, by Lemma 7.9, we have

$$\begin{aligned} F(\mu\_1\alpha\_1 + \dots + \mu\_n\alpha\_n) &\leq |\mu\_1|F(\alpha\_1) + |\mu\_2|F(\alpha\_2) + \dots + |\mu\_n|F(\alpha\_n) \\ &\leq |\mu\_1| + |\mu\_2| + \dots + |\mu\_n|. \end{aligned}$$

Because α<sub>*i*</sub> ∈ *R* ⇒ *F*(α<sub>*i*</sub>) ≤ 1, the above formula holds. This proves that *F*(*x*) < ∞ for all *x* ∈ R<sup>*n*</sup>.

**Corollary 7.2** *Let <sup>R</sup>* <sup>⊂</sup> <sup>R</sup>*<sup>n</sup> meet the limiting conditions (7.11), and* Vol(*R*) > <sup>0</sup>*, then*

*(i)* <sup>∀</sup> *<sup>x</sup>* <sup>∈</sup> <sup>R</sup>*n, there is* <sup>λ</sup> *such that x* <sup>∈</sup> <sup>λ</sup>*R;*

*(ii) Let* {α1, α2,...,α*n*} ⊂ *<sup>R</sup> be a set of bases of* <sup>R</sup>*n, then*

$$\left\{ \sum\_{i=1}^{n} \mu\_i \alpha\_i \,\middle|\, |\mu\_1| + |\mu\_2| + \dots + |\mu\_n| \le 1 \right\} \subset \mathcal{R}.$$

*Proof* Because *F*(*x*) < ∞, (i) follows directly from (i) of Lemma 7.9, and (ii) is given directly by Lemma 7.4.

Now let *j* be a subscript, and we define λ*<sup>j</sup>* as

λ<sub>*j*</sub> = min{λ ≥ 0 | λ*R* contains *j* linearly independent integral points of R<sup>*n*</sup>}, (7.13)

and λ<sub>*j*</sub> is called the *j*th continuous minimum of *R*. By Lemma 7.3, λ*R* ⊂ λ′*R* if 0 ≤ λ ≤ λ′. Therefore, as λ increases continuously, λ*R* can eventually contain any desired set of vectors, so the existence of λ<sub>*j*</sub> is proved.

Let *V* be the volume of *R*; then Vol(λ*R*) = λ<sup>*n*</sup>*V*, and by Lemma 7.6, for the first continuous minimum λ<sub>1</sub> we have the estimate

$$
\lambda\_1^n V \le 2^n. \tag{7.14}
$$

For λ*<sup>j</sup>* (*j* ≥ 2), there is no explicit upper bound estimation, but we have the following conclusions.

**Lemma 7.10** *Let <sup>R</sup>* <sup>⊂</sup> <sup>R</sup>*<sup>n</sup> be a convex body satisfying the limiting condition (7.11), V* = Vol(*R*)*,* λ1, λ2,...,λ*<sup>n</sup> be n continuous minima of R, then we have*

$$\frac{2^n}{n!} \le V\lambda\_1\lambda\_2\cdots\lambda\_n \le 2^n. \tag{7.15}$$

*Proof* We only prove the left inequality of the above formula. Successively select linearly independent integral points *x*<sup>(1)</sup>, *x*<sup>(2)</sup>, ..., *x*<sup>(*n*)</sup> such that *x*<sup>(*j*)</sup> ∈ λ<sub>*j*</sub>*R* and *x*<sup>(1)</sup>, *x*<sup>(2)</sup>, ..., *x*<sup>(*j*)</sup> are linearly independent. Let *x*<sup>(*j*)</sup> = (*x*<sub>*j*1</sub>, *x*<sub>*j*2</sub>, ..., *x<sub>jn</sub>*) ∈ Z<sup>*n*</sup>. Because the matrix *A* = (*x<sub>ji</sub>*)<sub>*n*×*n*</sub> is an integer matrix with det(*A*) ≠ 0,

$$|\det(A)| \ge 1.$$

By Lemma 7.9, for any constant μ1, μ2,...,μ*n*, we have

$$\begin{aligned} &F(\mu\_1 \mathbf{x}^{(1)} + \mu\_2 \mathbf{x}^{(2)} + \dots + \mu\_n \mathbf{x}^{(n)}) \\ &\le |\mu\_1| F(\mathbf{x}^{(1)}) + |\mu\_2| F(\mathbf{x}^{(2)}) + \dots + |\mu\_n| F(\mathbf{x}^{(n)}) \\ &\le |\mu\_1| \lambda\_1 + |\mu\_2| \lambda\_2 + \dots + |\mu\_n| \lambda\_n. \end{aligned}$$

Thus, if |μ1|λ<sup>1</sup> + |μ2|λ<sup>2</sup> +···+|μ*n*|λ*<sup>n</sup>* ≤ 1, then

$$
\mu\_1 \mathbf{x}^{(1)} + \mu\_2 \mathbf{x}^{(2)} + \dots + \mu\_n \mathbf{x}^{(n)} \in \mathcal{R}.
$$

So the set

$$\mathcal{R}\_1 = \{\mu\_1 \mathbf{x}^{(1)} + \mu\_2 \mathbf{x}^{(2)} + \dots + \mu\_n \mathbf{x}^{(n)} \mid |\mu\_1|\lambda\_1 + |\mu\_2|\lambda\_2 + \dots + |\mu\_n|\lambda\_n \le 1\} \subset \mathcal{R}.$$

The volume of the left set *R*<sup>1</sup> is

$$\begin{split} \mathrm{Vol}(\mathcal{R}\_{1}) &= \int \cdots \int \cdots \int\_{|\mu\_{1}|\lambda\_{1} + |\mu\_{2}|\lambda\_{2} + \cdots + |\mu\_{n}|\lambda\_{n} \leq 1} \mathrm{d}\mu\_{1} \mathrm{d}\mu\_{2} \cdots \mathrm{d}\mu\_{n} = \frac{2^{n}|\det(A)|}{n!\lambda\_{1}\cdots\lambda\_{n}} \\ &\geq \frac{2^{n}}{n!\lambda\_{1}\cdots\lambda\_{n}} .\end{split}$$

So there is

$$\frac{2^n}{n!\lambda\_1\cdots\lambda\_n} \le \text{Vol}(\mathcal{R}\_1) \le \text{Vol}(\mathcal{R}) = V.$$

Therefore, the left inequality of (7.15) holds. The proof of the right inequality is quite complex and omitted here; interested readers may consult the classic works (1963, 1971) of J. W. S. Cassels.

An important application of the above geometry of numbers is to solve the problem of rational approximation of real numbers, which is called Diophantine approximation in classical number theory. The main conclusion of this section is the following simultaneous rational approximation theorem of *n* real numbers.

**Theorem 7.1** *Let* θ<sub>1</sub>, θ<sub>2</sub>, ..., θ<sub>*n*</sub> *be any n real numbers with* θ<sub>*i*</sub> ≠ 0*. Then for any positive number N* > 1*, there is a nonzero integer q and integers p*<sub>1</sub>, *p*<sub>2</sub>, ..., *p<sub>n</sub> satisfying*

$$\begin{cases} |q\theta\_i - p\_i| < N^{-\frac{1}{n}}, & 1 \le i \le n; \\\ |q| \le N. \end{cases} \tag{7.16}$$

*Proof* The proof of the theorem is a simple application of Minkowski's linear type theorem (see Lemma 7.8). Let $A \in \mathbb{R}^{(n+1)\times(n+1)}$ be the invertible square matrix of order $n+1$ defined as

$$A = \begin{pmatrix} -1 & 0 & \cdots & \cdots & 0 & \theta\_1 \\ 0 & -1 & \cdots & \cdots & 0 & \theta\_2 \\ \cdots & \cdots & \cdots & \cdots & \cdots & \cdots \\ 0 & 0 & \cdots & 0 & -1 & \theta\_n \\ 0 & \cdots & \cdots & 0 & 0 & -1 \end{pmatrix}.$$

Obviously $|\det(A)| = 1$. Take the $(n+1)$-dimensional positive vector $c = (N^{-\frac{1}{n}}, N^{-\frac{1}{n}}, \dots, N^{-\frac{1}{n}}, N)$; then

$$c_1 c_2 \cdots c_n c_{n+1} = N^{-1} \cdot N = 1 \ge |\det(A)|.$$

So by Lemma 7.8, the symmetric convex body $\mathcal{R}(A, c)$ defined by $A$ and $c$ contains a nonzero integral point $\mathbf{x} = (p_1, p_2, \dots, p_n, q) \neq \mathbf{0}$. We prove $q \neq 0$. Because $\mathbf{x} \neq \mathbf{0}$, if $q = 0$, then $p_k \neq 0$ for some $k$ $(1 \le k \le n)$; therefore, the $k$-th inequality in Eq. (7.16) produces the following contradiction,

$$1 \le |q\theta\_k - p\_k| < N^{-\frac{1}{n}} < 1.$$

So $q \neq 0$, and we complete the proof of Theorem 7.1.

**Corollary 7.3** *Let* $\theta_1, \dots, \theta_n$ *be any* $n$ *real numbers. Then for any* $\varepsilon > 0$*, there are rational numbers* $\frac{p_i}{q}$ $(1 \le i \le n)$ *satisfying*

$$\left|\theta\_i - \frac{p\_i}{q}\right| < \frac{\varepsilon}{q}.\tag{7.17}$$

*Proof* Given any $\varepsilon > 0$, choose $N$ with $N^{-\frac{1}{n}} < \varepsilon$; Formula (7.17) then follows directly from Theorem 7.1.
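
Theorem 7.1 and Corollary 7.3 are existence statements, but for moderate $N$ they can be checked by direct search. The sketch below (function name and example values are mine, not from the text) brute-forces a denominator $q$ satisfying (7.16); for a fixed $q$ the best numerators are $p_i = \operatorname{round}(q\theta_i)$.

```python
def simultaneous_approximation(thetas, N):
    """Brute-force search for q, p_i with |q*theta_i - p_i| < N**(-1/n)
    and 1 <= q <= N, as guaranteed by Theorem 7.1.  For a fixed q the
    best choice of p_i is round(q*theta_i)."""
    n = len(thetas)
    bound = N ** (-1.0 / n)
    for q in range(1, N + 1):
        ps = [round(q * t) for t in thetas]
        if all(abs(q * t - p) < bound for t, p in zip(thetas, ps)):
            return q, ps
    return None  # cannot happen, by the pigeonhole argument of Theorem 7.1

# Approximate sqrt(2) and sqrt(3) by fractions with one common denominator q.
q, ps = simultaneous_approximation([2 ** 0.5, 3 ** 0.5], N=1000)
```

The returned $p_i/q$ approximate both irrationals simultaneously to within $N^{-1/2}/q$, as in (7.17).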

# **7.2 Basic Properties of Lattice**

The lattice is one of the most important concepts in modern cryptography: most of the so-called anti-quantum-computing cryptosystems are lattice-based. What is a lattice? In short, a lattice is a geometric object in the $n$-dimensional Euclidean space $\mathbb{R}^n$. For example, $L = \mathbb{Z}^n \subset \mathbb{R}^n$ is a lattice in $\mathbb{R}^n$, called the integer lattice or trivial lattice. Applying a rotation (or, more generally, an invertible linear transformation) to $\mathbb{Z}^n$ yields the concept of a general lattice in $\mathbb{R}^n$. This is the geometric description of a lattice; next we give the precise algebraic definition.

**Definition 7.3** Let $L \subset \mathbb{R}^n$ be a nonempty subset. $L$ is called a lattice in $\mathbb{R}^n$ if (i) $L$ is an additive subgroup of $\mathbb{R}^n$; and (ii) the minimum

$$\min\{|\mathbf{x}| \mid \mathbf{x}\in L,\ \mathbf{x}\neq \mathbf{0}\} = \lambda \tag{7.18}$$

exists, i.e. $\lambda > 0$.

λ = λ(*L*) is called the minimal distance of a lattice *L*.

By Definition 7.3, a lattice is simply a discrete additive subgroup in R*<sup>n</sup>*, in which the minimum distance λ = λ(*L*) is the most important mathematical quantity of the lattice. Obviously, we have

$$\lambda = \min\{|\mathbf{x} - \mathbf{y}| \mid \mathbf{x} \in L,\ \mathbf{y} \in L,\ \mathbf{x} \neq \mathbf{y} \},\tag{7.19}$$

Equation (7.19) shows the reason why λ is called the minimal distance of a lattice. If *x* ∈ *L* and |*x*| = λ, *x* is called the shortest vector of *L*.

In order to obtain a more explicit and concise mathematical expression of any lattice, we can regard an additive subgroup as a Z-module. First, we prove that any lattice is a finitely generated Z-module.

**Lemma 7.11** *Let* $L \subset \mathbb{R}^n$ *be a lattice and* $\{\alpha_1, \alpha_2, \dots, \alpha_m\} \subset L$ *be a set of vectors in* $L$*. Then* $\{\alpha_1, \alpha_2, \dots, \alpha_m\}$ *is linearly independent over* $\mathbb{R}$ *if and only if* $\{\alpha_1, \alpha_2, \dots, \alpha_m\}$ *is linearly independent over* $\mathbb{Z}$*.*

*Proof* If $\{\alpha_1, \alpha_2, \dots, \alpha_m\}$ is linearly independent over $\mathbb{R}$, it is obviously linearly independent over $\mathbb{Z}$. Conversely, suppose $\{\alpha_1, \alpha_2, \dots, \alpha_m\}$ is linearly independent over $\mathbb{Z}$; that is, for any integral linear combination

$$a_1\alpha_1 + \dots + a_m\alpha_m = 0, \quad a_i \in \mathbb{Z},\ 1 \le i \le m,$$

we have $a_1 = a_2 = \dots = a_m = 0$. Now suppose a linear combination over $\mathbb{R}$ is equal to 0, that is

$$
\theta\_1 \alpha\_1 + \theta\_2 \alpha\_2 + \dots + \theta\_m \alpha\_m = 0, \ \theta\_i \in \mathbb{R}. \tag{7.20}
$$

We prove $\theta_1 = \theta_2 = \dots = \theta_m = 0$. By Theorem 7.1, for sufficiently large $N > 1$, there are an integer $q \neq 0$ and integers $p_1, p_2, \dots, p_m$ such that

$$\begin{cases} |q\theta\_i - p\_i| < N^{-\frac{1}{m}}, & 1 \le i \le m; \\ q \le N. \end{cases}$$

By (7.20), we have

$$\begin{aligned} |p_1\alpha_1 + \dots + p_m\alpha_m| &= |(q\theta_1 - p_1)\alpha_1 + \dots + (q\theta_m - p_m)\alpha_m| \\ &\le N^{-\frac{1}{m}}(|\alpha_1| + \dots + |\alpha_m|) \\ &\le N^{-\frac{1}{m}}\, m \max_{1 \le i \le m} |\alpha_i|. \end{aligned}$$

Let $\lambda$ be the minimal distance of $L$ and $\varepsilon > 0$ a sufficiently small positive number. We choose

$$N > \max\left\{ \varepsilon^{-m},\ m^m \max_{1 \le i \le m} \frac{|\alpha_i|^m}{\lambda^m} \right\},$$

then $N^{-\frac{1}{m}} < \varepsilon$, and

$$N^{-\frac{1}{m}}\, m \max_{1 \le i \le m} |\alpha_i| < \lambda.$$

Thus

$$|p_1\alpha_1 + \dots + p_m\alpha_m| < \lambda.$$

Notice that $p_1\alpha_1 + \dots + p_m\alpha_m \in L$, so $p_1\alpha_1 + \dots + p_m\alpha_m = 0$. Since $\{\alpha_1, \alpha_2, \dots, \alpha_m\}$ is linearly independent over $\mathbb{Z}$, it follows that $p_1 = p_2 = \dots = p_m = 0$. For any $i$, $1 \le i \le m$, we then get $|\theta_i| \le |q\theta_i| = |q\theta_i - p_i| < N^{-\frac{1}{m}} < \varepsilon$. Since $\varepsilon$ is an arbitrarily small positive number, $\theta_1 = \theta_2 = \dots = \theta_m = 0$. This proves that $\{\alpha_1, \alpha_2, \dots, \alpha_m\}$ is also linearly independent over $\mathbb{R}$. Lemma 7.11 holds.

From the above lemma, any lattice $L$ in $\mathbb{R}^n$ is a finitely generated $\mathbb{Z}$-module. Let $\{\beta_1, \beta_2, \dots, \beta_m\} \subset L$ be a set of $\mathbb{Z}$-bases of $L$; then the rank of $L$ as a $\mathbb{Z}$-module satisfies

$$\text{rank}(L) = m \le n,\tag{7.21}$$

and

$$L = \left\{\sum\_{i=1}^{m} a\_i \beta\_i | a\_i \in \mathbb{Z}\right\}.\tag{7.22}$$

If {β1, β2,...,β*m*} is a <sup>Z</sup>-basis of *<sup>L</sup>* and each <sup>β</sup>*<sup>i</sup>* is regarded as a column vector, then the matrix

$$B = [\beta\_1, \beta\_2, \dots, \beta\_m] \in \mathbb{R}^{n \times m}, \quad \text{rank}(B) = m.$$

Equation (7.22) can be written as

$$L = L(B) = \{ Bx | x \in \mathbb{Z}^m \} \subset \mathbb{R}^n. \tag{7.23}$$

We call $m$ the rank of the lattice $L$, $B \in \mathbb{R}^{n\times m}$ a generating matrix of the lattice $L$, and $\{\beta_1, \beta_2, \dots, \beta_m\}$ a set of generating bases of $L$.
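
Equation (7.23) makes a lattice directly computable: the points of $L(B)$ are exactly the integer combinations of the columns of $B$. A small illustration (function name and the example basis are mine, not from the text):

```python
from itertools import product

def lattice_points(columns, coeff_range):
    """All points Bx of Eq. (7.23) with integer coefficients |x_i| <= coeff_range.
    `columns` lists the generating vectors beta_i of B as tuples."""
    n = len(columns[0])
    pts = set()
    for x in product(range(-coeff_range, coeff_range + 1), repeat=len(columns)):
        pts.add(tuple(sum(xi * beta[k] for xi, beta in zip(x, columns))
                      for k in range(n)))
    return pts

# The rank-2 lattice in R^2 generated by beta_1 = (2, 1) and beta_2 = (1, 3).
L = lattice_points([(2, 1), (1, 3)], coeff_range=2)
```

Only the finitely many points with small coefficients are enumerated, of course; the lattice itself is infinite.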

If $\{\alpha_1, \alpha_2, \dots, \alpha_m\} \subset \mathbb{R}^n$ are any $m$ column vectors in $\mathbb{R}^n$ and $A = [\alpha_1, \alpha_2, \dots, \alpha_m] \in \mathbb{R}^{n\times m}$, the Gram matrix of $\{\alpha_1, \alpha_2, \dots, \alpha_m\}$ is defined as

$$T = (\langle \alpha_i, \alpha_j \rangle)_{m \times m}.$$

Obviously, we have $T = A'A$, where $A'$ is the transpose matrix of $A$.

**Lemma 7.12** *Let* $A \in \mathbb{R}^{n\times m}$*,* $b \in \mathbb{R}^n$ *(*$m \le n$ *is not required), then*

*(i) Let* $\mathbf{x}_0 \in \mathbb{R}^m$ *be a solution of* $A'A\mathbf{x} = A'b$*, then*

$$|A\mathbf{x}_0 - b|^2 = \min_{\mathbf{x} \in \mathbb{R}^m} |A\mathbf{x} - b|^2.$$

*(ii)* $\operatorname{rank}(A'A) = \operatorname{rank}(A)$*.*

*(iii) The linear system* $A'A\mathbf{x} = A'b$ *always has a solution; when* $\operatorname{rank}(A) = m$*, the solution is unique, namely*

$$\mathbf{x} = \left(A'A\right)^{-1} A'b.$$

*Proof* First we prove (i). Let $\mathbf{x}_0 \in \mathbb{R}^m$ satisfy $A'A\mathbf{x}_0 = A'b$; then for any $\mathbf{x} \in \mathbb{R}^m$, we have

$$A\mathbf{x} - b = (A\mathbf{x}_0 - b) + A(\mathbf{x} - \mathbf{x}_0) = \gamma + \gamma_1 \in \mathbb{R}^n.$$

We prove that γ and γ<sup>1</sup> are two orthogonal vectors in R*<sup>n</sup>*. Because

$$\begin{aligned} (A(\mathbf{x} - \mathbf{x}\_0))'(A\mathbf{x}\_0 - b) \\ &= (\mathbf{x} - \mathbf{x}\_0)'A'(A\mathbf{x}\_0 - b) \\ &= (\mathbf{x} - \mathbf{x}\_0)'(A'A\mathbf{x}\_0 - A'b) = 0. \end{aligned}$$

So γ ⊥γ1, by Pythagorean theorem, we have

$$|A\mathbf{x} - b|^2 = |A\mathbf{x}\_0 - b|^2 + |A(\mathbf{x} - \mathbf{x}\_0)|^2 \ge |A\mathbf{x}\_0 - b|^2.$$

So (i) holds.

To prove (ii), let $V_A$ be the solution space of $A\mathbf{x} = 0$ and $V_{A'A}$ the solution space of $A'A\mathbf{x} = 0$; let us prove $V_A = V_{A'A}$. First, $V_A \subset V_{A'A}$. Conversely, let $\mathbf{x} \in V_{A'A}$, that is $A'A\mathbf{x} = 0$, then

$$\mathbf{x}' A' A \mathbf{x} = 0 \Rightarrow (A\mathbf{x})'(A\mathbf{x}) = \langle A\mathbf{x}, A\mathbf{x} \rangle = 0.$$

The above formula holds if and only if $A\mathbf{x} = 0$, so $\mathbf{x} \in V_A$. Hence $V_A = V_{A'A}$. Notice that

$$\begin{cases} \dim V\_A = m - \text{rank}(A) \\ \dim V\_{A'A} = m - \text{rank}(A'A) .\end{cases}$$

So $\operatorname{rank}(A) = \operatorname{rank}(A'A)$, and (ii) holds. To prove (iii), for given $b \in \mathbb{R}^n$, the rank of the augmented matrix of the linear system $A'A\mathbf{x} = A'b$ is

$$\begin{aligned} \text{rank}[A'A, A'b] &= \text{rank}(A'[A, b]) \\ &\le \text{rank}(A') = \text{rank}(A) = \text{rank}(A'A). \end{aligned}$$

Therefore, the augmented matrix and the coefficient matrix have the same rank, so the linear system has solutions. When $\operatorname{rank}(A) = m$, then $\operatorname{rank}(A'A) = m$, that is, $A'A$ is an invertible square matrix of order $m$; thus

$$\mathbf{x} = (A'A)^{-1} A'b,$$

and the solution is unique.

Lemma 7.12 holds!
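
The proof of Lemma 7.12 is constructive: to minimize $|A\mathbf{x} - b|$, form and solve the normal equations $A'A\mathbf{x} = A'b$. A self-contained sketch of this recipe (helper names are mine; it assumes $\operatorname{rank}(A) = m$, so Gaussian elimination on $A'A$ succeeds):

```python
def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def least_squares(A, b):
    """Minimize |Ax - b| by solving the normal equations A'Ax = A'b
    (Lemma 7.12) with Gaussian elimination; assumes rank(A) = m."""
    At = transpose(A)
    M = matmul(At, A)                                        # A'A  (m x m)
    rhs = [row[0] for row in matmul(At, [[v] for v in b])]   # A'b
    m = len(M)
    aug = [M[i] + [rhs[i]] for i in range(m)]
    for i in range(m):                                       # forward elimination
        piv = max(range(i, m), key=lambda r: abs(aug[r][i]))
        aug[i], aug[piv] = aug[piv], aug[i]
        for r in range(i + 1, m):
            f = aug[r][i] / aug[i][i]
            for c in range(i, m + 1):
                aug[r][c] -= f * aug[i][c]
    x = [0.0] * m
    for i in range(m - 1, -1, -1):                           # back substitution
        x[i] = (aug[i][m] - sum(aug[i][c] * x[c]
                                for c in range(i + 1, m))) / aug[i][i]
    return x

# Overdetermined 3x2 system: A'A = [[2,1],[1,2]], A'b = [5,6], x = (4/3, 7/3).
x = least_squares([[1, 0], [0, 1], [1, 1]], [1, 2, 4])
```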

**Lemma 7.13** *Let* $A \in \mathbb{R}^{n\times m}$ *with* $\operatorname{rank}(A) = m$*; then* $A'A$ *is a positive definite real symmetric matrix of order* $m$*, so there is a real orthogonal matrix* $P \in \mathbb{R}^{m\times m}$ *of order* $m$ *satisfying*

$$P'A'AP = \operatorname{diag}\{\delta_1, \delta_2, \dots, \delta_m\},\tag{7.24}$$

*where the* $\delta_i > 0$ *are the* $m$ *eigenvalues of* $A'A$*.*

*Proof* $\operatorname{rank}(A) = m \Rightarrow m \le n$. Let $T = A'A$; then $T$ is a symmetric matrix of order $m$. Let $\mathbf{x} \in \mathbb{R}^m$ be a vector of $m$ variables; the quadratic form satisfies

$$\mathbf{x}' T \mathbf{x} = \mathbf{x}' A' A \mathbf{x} = (A\mathbf{x})'(A\mathbf{x}) = \langle A\mathbf{x}, A\mathbf{x} \rangle \ge 0.$$

Because $\operatorname{rank}(A) = m$, we have $\mathbf{x}'T\mathbf{x} = 0$ if and only if $\mathbf{x} = 0$. So $T$ is a positive definite matrix. From linear algebra, there is an orthogonal matrix $P$ of order $m$ such that $P'TP$ is a diagonal matrix, that is

$$P'TP = \text{diag}\{\delta\_1, \delta\_2, \dots, \delta\_m\}.$$

Because $P'TP$ and $T$ have the same eigenvalues, $\delta_1, \delta_2, \dots, \delta_m$ are the eigenvalues of $T$, and every $\delta_i > 0$. The Lemma holds.

Lemma 7.12 is called the least squares method in linear algebra. Its significance is to find a vector $\mathbf{x}_0$ of the shortest length in the set $\{A\mathbf{x} - b \mid \mathbf{x} \in \mathbb{R}^m\}$ for a given $n \times m$ matrix $A$ and a given vector $b \in \mathbb{R}^n$. Lemma 7.12 gives an effective algorithm: solve the linear system $A'A\mathbf{x} = A'b$, and $\mathbf{x}_0$ is a solution of this system. Lemma 7.13 is called the diagonalization of quadratic forms. Now, the main results are as follows:

**Theorem 7.2** *Let* $L \subset \mathbb{R}^n$*. Then* $L$ *is a lattice with* $\operatorname{rank}(L) = m$ $(m \le n)$ *if and only if there is a real matrix* $B \in \mathbb{R}^{n\times m}$ *of order* $n \times m$*,* $\operatorname{rank}(B) = m$*, such that*

$$L = \{ B\mathbf{x} \mid \mathbf{x} \in \mathbb{Z}^m \} = \left\{ \sum_{i=1}^m a_i \beta_i \,\Big|\, a_i \in \mathbb{Z} \right\},\tag{7.25}$$

*where B* = [β1, β2,...,β*m*]*, each* <sup>β</sup>*<sup>i</sup>* <sup>∈</sup> <sup>R</sup>*<sup>n</sup> is a column vector.*

*Proof* Equation (7.23) proves the necessity of the condition, so we only prove sufficiency. If a subset $L$ of $\mathbb{R}^n$ is given by Eq. (7.25), then $L$ is an additive subgroup of $\mathbb{R}^n$: for any $\alpha = B\mathbf{x}_1$, $\beta = B\mathbf{x}_2$ with $\mathbf{x}_1, \mathbf{x}_2 \in \mathbb{Z}^m$, we have $\mathbf{x} = \mathbf{x}_1 - \mathbf{x}_2 \in \mathbb{Z}^m$, and

$$
\alpha - \beta = B(\mathbf{x}\_1 - \mathbf{x}\_2) = B\mathbf{x} \in L.
$$

So we only need to prove the discreteness of $L$. Let $T = B'B$; then by Lemma 7.13, $T$ is a positive definite real symmetric matrix. Let $\delta_1, \delta_2, \dots, \delta_m$ be the eigenvalues of $T$, and put

$$\delta = \min\{\delta\_1, \delta\_2, \dots, \delta\_m\} > 0.$$

We prove

$$\min_{\substack{\mathbf{x} \in \mathbb{Z}^m \\ \mathbf{x} \neq 0}} |B\mathbf{x}| \ge \sqrt{\delta} > 0. \tag{7.26}$$

By Lemma 7.13, there is an orthogonal matrix *P* of order *m* such that

$$P'TP = \text{diag}\{\delta\_1, \delta\_2, \dots, \delta\_m\}.$$

For any given $\mathbf{x} \in \mathbb{Z}^m$, $\mathbf{x} \neq 0$, we have

$$|B\mathbf{x}|^2 = \mathbf{x}'T\mathbf{x} = \mathbf{x}'P(P'TP)P'\mathbf{x} \ge \delta |P'\mathbf{x}|^2 = \delta |\mathbf{x}|^2.$$

Because $\mathbf{x} \neq 0$, we have $|\mathbf{x}|^2 \ge 1$, so

$$\left|B\mathbf{x}\right|^2 \ge \delta, \quad \forall \ \mathbf{x} \in \mathbb{Z}^m, \ \mathbf{x} \ne \mathbf{0}.$$

This shows that the distance between any two different points in $L$ is $\ge \sqrt{\delta} > 0$. Therefore, in a sphere with 0 as the center and $r$ as the radius, the number of points of $L$ is finite, and among these finitely many vectors there is an $\alpha \in L$ such that

$$|\alpha| = \min_{\substack{\mathbf{x} \in L \\ \mathbf{x} \neq \mathbf{0}}} |\mathbf{x}| = \lambda \ge \sqrt{\delta} > 0.$$

According to the definition of a lattice, $L$ is a lattice in $\mathbb{R}^n$. Theorem 7.2 holds.

It can be directly deduced from the above theorem

**Corollary 7.4** *Let* $L = L(B) \subset \mathbb{R}^n$ *be a lattice of* $\operatorname{rank}(L) = m$*,* $\lambda$ *the minimal distance of* $L$*,* $B \in \mathbb{R}^{n\times m}$*, and* $\delta$ *the minimum eigenvalue of* $B'B$*; then* $\lambda \ge \sqrt{\delta}$*.*
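
Corollary 7.4 can be checked numerically on a small example. Below (names and the example basis are mine), $\delta$ is obtained from the quadratic formula applied to the $2 \times 2$ Gram matrix $B'B$, and $\lambda$ is found by brute-force search over small coefficient vectors; the search window $R$ is a heuristic, adequate for this basis.

```python
import math

def min_eigen_2x2(T):
    """Smaller eigenvalue of a symmetric 2x2 matrix [[a, b], [b, c]]."""
    a, b, c = T[0][0], T[0][1], T[1][1]
    tr, det = a + c, a * c - b * b
    return (tr - math.sqrt(tr * tr - 4 * det)) / 2

def min_distance(b1, b2, R=8):
    """Brute-force the minimal distance lambda of L([b1, b2]) over |x_i| <= R."""
    return min(math.hypot(x1 * b1[0] + x2 * b2[0], x1 * b1[1] + x2 * b2[1])
               for x1 in range(-R, R + 1) for x2 in range(-R, R + 1)
               if (x1, x2) != (0, 0))

b1, b2 = (3, 0), (1, 2)          # columns of B
T = [[9, 3], [3, 5]]             # Gram matrix B'B
delta = min_eigen_2x2(T)         # ~ 3.394
lam = min_distance(b1, b2)       # sqrt(5), attained at beta_2
```

Here $\lambda = \sqrt{5} \approx 2.236 \ge \sqrt{\delta} \approx 1.842$, as the corollary predicts.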

**Definition 7.4** If $L \subset \mathbb{R}^n$ is a lattice with $\operatorname{rank}(L) = n$, we call $L$ a full rank lattice of $\mathbb{R}^n$.

By Theorem 7.2, a necessary and sufficient condition for $L$ to be a full rank lattice of $\mathbb{R}^n$ is the existence of an invertible square matrix $B \in \mathbb{R}^{n\times n}$, $\det(B) \neq 0$, such that

$$L = L(B) = \left\{ \sum\_{i=1}^{n} a\_i \beta\_i | a\_i \in \mathbb{Z}, \; 1 \le i \le n \right\} = \{ B\mathbf{x} | \mathbf{x} \in \mathbb{Z}^n \}.\tag{7.27}$$

If *L* = *L*(*B*) is a full rank lattice, define *d* = *d*(*L*) as

$$d = d(L) = |\det(B)|,\tag{7.28}$$

$d$ is called the determinant of $L$. $d = d(L)$ is the second most important mathematical quantity of a lattice. The lattices we discuss below are always assumed to be full rank.

For a (full rank) lattice, the generating matrix is not unique, but $d = d(L)$ is unique. To prove this, we first define the so-called unimodular matrices. Define

$$SL\_n(\mathbb{Z}) = \{ A = (a\_{ij})\_{n \times n} | a\_{ij} \in \mathbb{Z}, \det(A) = \pm 1 \},\tag{7.29}$$

Obviously, $SL_n(\mathbb{Z})$ forms a group under matrix multiplication: the identity matrix $I_n \in SL_n(\mathbb{Z})$, and if $A_1 \in SL_n(\mathbb{Z})$, $A_2 \in SL_n(\mathbb{Z})$, then $A = A_1A_2 \in SL_n(\mathbb{Z})$. In particular, if $A \in SL_n(\mathbb{Z})$, $A = (a_{ij})_{n\times n}$, then the inverse matrix of $A$ is

$$A^{-1} = \pm \begin{pmatrix} a_{11}^* & a_{12}^* & \cdots & a_{1n}^* \\ a_{21}^* & a_{22}^* & \cdots & a_{2n}^* \\ \cdots & \cdots & \cdots & \cdots \\ a_{n1}^* & \cdots & \cdots & a_{nn}^* \end{pmatrix} \in SL_n(\mathbb{Z}),$$

where $a_{ij}^*$ is the algebraic cofactor of $a_{ji}$ in $A$.

**Lemma 7.14** *Let* $L = L(B) \subset \mathbb{R}^n$ *be a (full rank) lattice and* $B_1 \in \mathbb{R}^{n\times n}$*; then* $L = L(B) = L(B_1)$ *if and only if there is a unimodular matrix* $U \in SL_n(\mathbb{Z})$ *such that* $B = B_1U$*.*

*Proof* If $B = B_1U$ with $U \in SL_n(\mathbb{Z})$, we prove $L(B) = L(B_1)$. Let $\alpha = B_1\mathbf{x} \in L(B_1)$, where $\mathbf{x} \in \mathbb{Z}^n$; then

$$\alpha = B\_1 \mathbf{x} = B\_1 U U^{-1} \mathbf{x} = B U^{-1} \mathbf{x}.$$

Because $U^{-1}\mathbf{x} \in \mathbb{Z}^n$, we have $\alpha \in L(B)$, that is $L(B_1) \subset L(B)$. Similarly, if $\alpha = B\mathbf{x}$, $\mathbf{x} \in \mathbb{Z}^n$, then

$$\alpha = Bx = B\_1 Ux \text{, where } Ux \in \mathbb{Z}^n.$$

Thus, α ∈ *L*(*B*1), that is *L*(*B*) = *L*(*B*1).

Conversely, if $L(B) = L(B_1)$, let $B = [\beta_1, \beta_2, \dots, \beta_n]$, $B_1 = [\alpha_1, \alpha_2, \dots, \alpha_n]$, and write the transition matrix $U$:

$$(\beta\_1, \beta\_2, \dots, \beta\_n) = (\alpha\_1, \alpha\_2, \dots, \alpha\_n)U.$$

Since $\beta_i \in L(B_1)$ $(1 \le i \le n)$, $U$ is an integer matrix; and

$$(\alpha_1, \alpha_2, \dots, \alpha_n) = (\beta_1, \beta_2, \dots, \beta_n) U_1.$$

Because $\alpha_i \in L(B)$ $(1 \le i \le n)$, $U_1$ is also an integer matrix. Since

$$(\beta\_1, \beta\_2, \dots, \beta\_n) = (\alpha\_1, \alpha\_2, \dots, \alpha\_n)U = (\beta\_1, \beta\_2, \dots, \beta\_n)U\_1U.$$

We have $U_1U = I_n$, thus $\det(U) = \pm 1$, that is $U \in SL_n(\mathbb{Z})$ and $B = B_1U$. The Lemma holds.

By Lemma 7.14, if $B$, $B_1$ are any two generating matrices of a lattice $L$, then

$$|\det(B)| = |\det(B\_1)| = d = d(L).$$

That is, the determinant *d*(*L*) of a lattice is an invariant.
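Lemma 7.14 and this invariance are easy to observe on a concrete pair of bases. In the sketch below (helper names and example matrices are mine), membership in $L(B)$ is tested exactly by solving $B\mathbf{y} = p$ with rational arithmetic and checking that $\mathbf{y}$ is integral.

```python
from fractions import Fraction

def det2(B):
    return B[0][0] * B[1][1] - B[0][1] * B[1][0]

def matmul2(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def in_lattice(B, p):
    """Test p in L(B) by solving By = p exactly (Cramer's rule) and
    checking that both coordinates of y are integers."""
    d = Fraction(det2(B))
    y1 = (B[1][1] * p[0] - B[0][1] * p[1]) / d
    y2 = (B[0][0] * p[1] - B[1][0] * p[0]) / d
    return y1.denominator == 1 and y2.denominator == 1

B = [[2, 1], [1, 3]]              # columns beta_1 = (2,1), beta_2 = (1,3)
U = [[1, 1], [0, 1]]              # unimodular, det(U) = 1
B1 = matmul2(B, U)                # second generating matrix of the same lattice
```

The columns of $B_1$ lie in $L(B)$ and vice versa, and $|\det(B)| = |\det(B_1)| = 5 = d(L)$.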

For a (full rank) lattice $L \subset \mathbb{R}^n$, the dual lattice of $L$ is defined as

$$L^\* = \{ \alpha \in \mathbb{R}^n | \langle \alpha, \beta \rangle \in \mathbb{Z}, \forall \, \beta \in L \}. \tag{7.30}$$

**Lemma 7.15** *Let* $L = L(B)$ *be a lattice; then the dual lattice of* $L$ *is* $L^* = L((B^{-1})')$*. That is, if* $B$ *is a generating matrix of* $L$*, then* $(B^{-1})'$ *is a generating matrix of* $L^*$*.*

*Proof* Let

$$L((B^{-1})') = \{ (B^{-1})'\mathbf{y} \mid \mathbf{y} \in \mathbb{Z}^n \}.$$

For any $\alpha \in L((B^{-1})')$, $\alpha = (B^{-1})'\mathbf{y}$ with $\mathbf{y} \in \mathbb{Z}^n$, and any $\beta \in L$, $\beta = B\mathbf{x}$ with $\mathbf{x} \in \mathbb{Z}^n$, we have

$$
\langle \alpha, \beta \rangle = \alpha' \beta = \mathbf{y}' B^{-1} B \mathbf{x} = \mathbf{y}' \mathbf{x} \in \mathbb{Z}.
$$

That means $L((B^{-1})') \subset L^*$. Conversely, for any $\alpha \in L^*$ and all $\beta \in L$, there is $\langle \alpha, \beta \rangle \in \mathbb{Z}$. So let $B = [\beta_1, \beta_2, \dots, \beta_n]$; then


$$\left\langle \alpha, \sum\_{i=1}^{n} x\_i \beta\_i \right\rangle = \sum\_{i=1}^{n} x\_i \langle \alpha, \beta\_i \rangle \in \mathbb{Z}, \forall \ x\_i \in \mathbb{Z},$$

therefore, for each generating vector $\beta_i$ $(1 \le i \le n)$, there is $\langle \alpha, \beta_i \rangle \in \mathbb{Z}$. Write $\alpha = (y_1, y_2, \dots, y_n)'$, then

$$\langle \alpha, \beta_i \rangle \in \mathbb{Z},\ 1 \le i \le n \Longrightarrow B'\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \in \mathbb{Z}^n.$$

Thus

$$
\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} = (B')^{-1} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}.
$$

That is $\alpha \in L((B')^{-1})$. Because $B \cdot B^{-1} = I_n \Longrightarrow (B^{-1})'B' = I_n$, we have $(B^{-1})' = (B')^{-1}$. So $\alpha \in L((B^{-1})')$, that is $L^* \subset L((B^{-1})')$. Hence $L^* = L((B^{-1})')$. The Lemma holds.

By Lemma 7.15, we immediately have the following corollary.

**Corollary 7.5** *Let L* = *L*(*B*) *be a full rank lattice, L*<sup>∗</sup> *is the dual lattice of L, then*

$$d(L^\*) = d^{-1}(L).$$
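
Lemma 7.15 and Corollary 7.5 can be illustrated with exact rational arithmetic (function name and the example matrix are mine): compute $(B^{-1})'$, check that the dual basis vectors pair integrally with the primal ones, and compare determinants.

```python
from fractions import Fraction

def dual_basis_2x2(B):
    """Generating matrix (B^{-1})' of the dual lattice L* (Lemma 7.15),
    computed exactly for a 2x2 matrix B."""
    d = Fraction(B[0][0] * B[1][1] - B[0][1] * B[1][0])
    inv = [[ B[1][1] / d, -B[0][1] / d],
           [-B[1][0] / d,  B[0][0] / d]]
    return [[inv[j][i] for j in range(2)] for i in range(2)]  # transpose

B = [[2, 1], [1, 3]]
D = dual_basis_2x2(B)

# Inner products of dual basis vectors with primal ones: D'B = B^{-1}B = I.
ip = [[sum(D[k][i] * B[k][j] for k in range(2)) for j in range(2)]
      for i in range(2)]
det_D = D[0][0] * D[1][1] - D[0][1] * D[1][0]
```

Here $d(L) = 5$ and $d(L^*) = \det_D = 1/5$, matching Corollary 7.5.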

An equivalence relation in <sup>R</sup>*<sup>n</sup>* can be defined by using a lattice *<sup>L</sup>*, for all α, β <sup>∈</sup> <sup>R</sup>*<sup>n</sup>*, we define

$$\alpha \equiv \beta (\text{mod } L) \Longleftrightarrow \alpha - \beta \in L.$$

Obviously, this is an equivalence relation, called the congruence relation mod $L$.

**Definition 7.5** Let $F \subset \mathbb{R}^n$ be a subset. $F$ is called a basic region of a (full rank) lattice $L$ if every $\alpha \in \mathbb{R}^n$ is congruent mod $L$ to exactly one element of $F$.

By definition, a basic region of a lattice is a set of representatives of the additive quotient group $\mathbb{R}^n/L$. Therefore, a basic region of any $L$ forms an additive group under addition mod $L$.

**Lemma 7.16** *Let* $L = L(B)$ *be a full rank lattice, then*

*(i) Any two basic regions of* $L$ *are isomorphic;*

*(ii)* $F = \{B\mathbf{x} \mid \mathbf{x} = (x_1, x_2, \dots, x_n)',\ 0 \le x_i < 1,\ 1 \le i \le n\}$ *is a basic region of* $L$*;*

*(iii)* $\operatorname{Vol}(F) = d = d(L)$*.*

*Proof* (i) is trivial, because

$$F\_1 \cong \mathbb{R}^n / L, \, F\_2 \cong \mathbb{R}^n / L, \, \Longrightarrow \, F\_1 \cong F\_2.$$

To prove (ii), let $B = [\beta_1, \beta_2, \dots, \beta_n]$; then $\{\beta_1, \beta_2, \dots, \beta_n\}$ is a set of bases of $\mathbb{R}^n$, and any $\alpha \in \mathbb{R}^n$ can be expressed as a linear combination of $\beta_1, \beta_2, \dots, \beta_n$, that is

$$\alpha = \sum_{i=1}^{n} a_i \beta_i, \quad a_i \in \mathbb{R}.$$

Let $[\alpha]_B = \sum_{i=1}^n [a_i]\beta_i$, where $[a_i]$ denotes the integer part of $a_i$, and $\{\alpha\}_B = \alpha - [\alpha]_B$; then $\{\alpha\}_B$ can be expressed as

$$\{\alpha\}_B = B \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}, \text{ where } 0 \le x_i < 1, \quad 1 \le i \le n.$$

That is $\{\alpha\}_B \in F$. Because $\alpha - \{\alpha\}_B = [\alpha]_B \in L$, for any $\alpha \in \mathbb{R}^n$ there is an $\{\alpha\}_B \in F$ such that

$$\alpha \equiv \{\alpha\}_B (\text{mod } L).$$

Now suppose two points $\alpha = B\mathbf{x}$ and $\beta = B\mathbf{y}$ of $F$ satisfy $\alpha \equiv \beta (\text{mod } L)$; then

$$\alpha - \beta = B(\mathbf{x} - \mathbf{y}) = B\mathbf{z} \in L,$$

where $\mathbf{z} = (z_1, z_2, \dots, z_n)'$, $|z_i| < 1$. Since $B\mathbf{z} \in L$ forces $\mathbf{z} \in \mathbb{Z}^n$, we get $\mathbf{z} = 0$, i.e. $\alpha = \beta$. So $F$ is a basic region of $L$.

Let us prove (iii). Because all basic regions of $L$ are isomorphic, they have the same volume, and (ii) gives a specific basic region $F$ of $L$; so it suffices to prove $\operatorname{Vol}(F) = d = d(L)$. Obviously,

$$\operatorname{Vol}(F) = \int\limits_{\mathbf{y} = (y_1, y_2, \dots, y_n) \in F} \mathrm{d}y_1 \mathrm{d}y_2 \cdots \mathrm{d}y_n.$$

Making the variable substitution $\mathbf{y} = B\mathbf{x}$ and computing the Jacobian of the change of variables gives

$$\mathrm{d}y_1 \mathrm{d}y_2 \cdots \mathrm{d}y_n = |\det(B)|\, \mathrm{d}x_1 \cdots \mathrm{d}x_n = d(L)\, \mathrm{d}x_1 \cdots \mathrm{d}x_n.$$


Thus

$$\operatorname{Vol}(F) = \int_0^1 \cdots \int_0^1 d(L)\, \mathrm{d}x_1 \cdots \mathrm{d}x_n = d(L).$$

We have completed the proof of Lemma 7.16.
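
The proof of Lemma 7.16(ii) is effectively an algorithm: to reduce any $\alpha \in \mathbb{R}^n$ into the basic region $F$, compute the coordinates $B^{-1}\alpha$, keep their fractional parts, and map back. A 2-dimensional sketch in exact arithmetic (function name and example values are mine):

```python
import math
from fractions import Fraction

def reduce_mod_lattice(B, alpha):
    """Compute {alpha}_B in the basic region F = {Bx | 0 <= x_i < 1},
    so that alpha - {alpha}_B lies in L (proof of Lemma 7.16(ii))."""
    d = Fraction(B[0][0] * B[1][1] - B[0][1] * B[1][0])
    # coordinates a = B^{-1} alpha, by Cramer's rule in exact arithmetic
    a1 = (B[1][1] * alpha[0] - B[0][1] * alpha[1]) / d
    a2 = (B[0][0] * alpha[1] - B[1][0] * alpha[0]) / d
    f1 = a1 - math.floor(a1)          # fractional parts, in [0, 1)
    f2 = a2 - math.floor(a2)
    return (B[0][0] * f1 + B[0][1] * f2, B[1][0] * f1 + B[1][1] * f2)

B = [[2, 1], [1, 3]]
r = reduce_mod_lattice(B, (7, 9))     # coordinates (12/5, 11/5) -> (2/5, 1/5)
```

Here $(7, 9)$ reduces to $(1, 1) \in F$, and the difference $(6, 8) = B(2, 2)'$ lies in $L$, so $(7,9) \equiv (1,1) (\text{mod } L)$.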

Next, we discuss the Gram–Schmidt orthogonalization algorithm. If $B = [\beta_1, \beta_2, \dots, \beta_n]$ is a generating matrix of $L$, then $\{\beta_1, \beta_2, \dots, \beta_n\}$ can be transformed into a set of orthogonal bases $\{\beta_1^*, \beta_2^*, \dots, \beta_n^*\}$, where $\beta_1^* = \beta_1$, and

$$
\beta\_i^\* = \beta\_i - \sum\_{j=1}^{i-1} \frac{\langle \beta\_i, \beta\_j^\* \rangle}{\langle \beta\_j^\*, \beta\_j^\* \rangle} \beta\_j^\*, \tag{7.31}
$$

$\{\beta_1^*, \beta_2^*, \dots, \beta_n^*\}$ is called the orthogonal basis corresponding to $\{\beta_1, \beta_2, \dots, \beta_n\}$, and $B^* = [\beta_1^*, \dots, \beta_n^*]$ is the orthogonal matrix corresponding to $B$. For any $1 \le i \le n$, denote

$$\begin{cases} u_{ii} = 1,\ u_{ij} = 0, & \text{when } j > i, \\ u_{ij} = \frac{\langle \beta_i, \beta_j^* \rangle}{|\beta_j^*|^2}, & \text{when } 1 \le j < i \le n, \\ U = (u_{ij})_{n \times n}. \end{cases} \tag{7.32}$$

Then *U* is a lower triangular matrix, and

$$
\begin{pmatrix} \beta\_1 \\ \beta\_2 \\ \vdots \\ \beta\_n \end{pmatrix} = U \begin{pmatrix} \beta\_1^\* \\ \beta\_2^\* \\ \vdots \\ \beta\_n^\* \end{pmatrix}. \tag{7.33}
$$

If both sides are transposed at the same time, there is

$$(\beta\_1, \beta\_2, \dots, \beta\_n) = (\beta\_1^\*, \beta\_2^\*, \dots, \beta\_n^\*)U'.\tag{7.34}$$

Therefore, *U* is the transition matrix between two groups of bases.
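
The recursion (7.31) translates directly into code. The following sketch (function names and the example basis are mine) computes the orthogonal vectors $\beta_i^*$:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(basis):
    """Gram-Schmidt orthogonalization, Eq. (7.31): each beta_i^* is beta_i
    minus its projections onto the earlier beta_j^*."""
    ortho = []
    for b in basis:
        u = [dot(b, o) / dot(o, o) for o in ortho]     # the u_ij of (7.32)
        star = tuple(bk - sum(c * o[k] for c, o in zip(u, ortho))
                     for k, bk in enumerate(b))
        ortho.append(star)
    return ortho

basis = [(3, 1), (2, 2)]
ortho = gram_schmidt(basis)        # (3, 1) and (-0.4, 1.2), orthogonal
```

For this basis $\prod_i |\beta_i^*| = \sqrt{10}\cdot\sqrt{1.6} = 4 = |\det(B)|$, illustrating the first equality of (7.35) below.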

**Lemma 7.17** *Let* $L = L(B) \subset \mathbb{R}^n$ *be a lattice with generating matrix* $B = [\beta_1, \beta_2, \dots, \beta_n]$*, let* $B^* = [\beta_1^*, \beta_2^*, \dots, \beta_n^*]$ *be the corresponding orthogonal matrix, and* $d = d(L)$ *the determinant of* $L$*; then we have*

$$d = \prod\_{i=1}^{n} |\beta\_i^\*| \le \prod\_{i=1}^{n} |\beta\_i|. \tag{7.35}$$

*Proof* By (7.34), we have $B = B^*U'$; because $\det(U') = \det(U) = 1$,

$$\det(B) = \det(B^\*).$$

By the definition,

$$\begin{aligned} d^2 &= \det(B^\prime B) = \det(U(B^\*)^\prime B^\* U^\prime) \\ &= \det((B^\*)^\prime B^\*) \\ &= \det(\text{diag}(|\beta\_1^\*|^2, |\beta\_2^\*|^2, \dots, |\beta\_n^\*|^2)). \end{aligned}$$

So there is

$$d = \prod_{i=1}^{n} |\beta_i^*|.$$

In order to prove the inequality on the right of Eq. (7.35), we only need to prove

$$|\beta\_i^\*| \le |\beta\_i|, \quad 1 \le i \le n. \tag{7.36}$$

Because $\beta_i = \sum_{j=1}^{i} u_{ij}\beta_j^*$, we have

$$\begin{aligned} |\beta\_i|^2 &= \langle \beta\_i, \beta\_i \rangle = \left\langle \sum\_{j=1}^i u\_{ij} \beta\_j^\*, \sum\_{j=1}^i u\_{ij} \beta\_j^\* \right\rangle \\ &= \sum\_{j=1}^i u\_{ij}^2 \langle \beta\_j^\*, \beta\_j^\* \rangle \\ &= \langle \beta\_i^\*, \beta\_i^\* \rangle + \sum\_{j=1}^{i-1} u\_{ij}^2 \langle \beta\_j^\*, \beta\_j^\* \rangle. \end{aligned}$$

Therefore $|\beta_i^*| \le |\beta_i|$, the inequality on the right of (7.35) holds, and the Lemma is proved.

Equation (7.35) is usually called Hadamard's inequality; we have given another proof of it here.

In order to define the concept of successive minima of a lattice $L$, we denote the minimal distance of $L$ by $\lambda_1$, that is $\lambda_1 = \lambda(L)$. An equivalent definition of $\lambda_1$ is: the minimum positive real number $r$ such that the linear space spanned by $L \cap \text{Ball}(0,r)$ is one-dimensional, where

$$\text{Ball}(0, r) = \{x \in \mathbb{R}^n | |x| \le r\}$$

is the closed sphere with 0 as the center and $r$ as the radius. The concept of the $n$ successive minima $\lambda_1, \lambda_2, \dots, \lambda_n$ of $L$ can now be given.

**Definition 7.6** Let $L = L(B) \subset \mathbb{R}^n$ be a full rank lattice. The $i$-th successive minimum $\lambda_i$ is defined as

$$
\lambda\_i = \lambda\_i(L) = \inf \{ r | \dim(\text{span}(L \cap \text{Ball}(0, r))) \ge i \}.
$$
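
For a 2-dimensional lattice, Definition 7.6 can be evaluated by brute force (function name, example basis, and the coefficient window $R$ are mine; the window is a heuristic that is large enough for this basis):

```python
import math
from itertools import product

def successive_minima_2d(b1, b2, R=8):
    """Brute-force lambda_1, lambda_2 of a rank-2 lattice (Definition 7.6):
    lambda_2 is the length of the shortest lattice vector not parallel to
    a shortest one."""
    vecs = sorted(
        (math.hypot(x1 * b1[0] + x2 * b2[0], x1 * b1[1] + x2 * b2[1]),
         (x1 * b1[0] + x2 * b2[0], x1 * b1[1] + x2 * b2[1]))
        for x1, x2 in product(range(-R, R + 1), repeat=2)
        if (x1, x2) != (0, 0))
    lam1, v1 = vecs[0]
    lam2 = next(r for r, v in vecs if v1[0] * v[1] - v1[1] * v[0] != 0)
    return lam1, lam2

lam1, lam2 = successive_minima_2d((2, 1), (1, 3))
```

For this lattice $\lambda_1 = \lambda_2 = \sqrt{5}$: the vectors $(2, 1)$ and $(-1, 2)$ both have length $\sqrt{5}$ and are linearly independent.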


The following lemma is a useful lower bound estimate of the minimum distance λ1.

**Lemma 7.18** *Let* $L = L(B) \subset \mathbb{R}^n$ *be a (full rank) lattice and* $B^* = [\beta_1^*, \beta_2^*, \dots, \beta_n^*]$ *the corresponding orthogonal matrix; then*

$$
\lambda\_1 = \lambda(L) \ge \min\_{1 \le i \le n} |\beta\_i^\*|. \tag{7.37}
$$

*Proof* For any $\mathbf{x} \in \mathbb{Z}^n$, $\mathbf{x} \neq 0$, we prove

$$|B\mathbf{x}| \ge \min_{1 \le i \le n} |\beta_i^*|, \quad \mathbf{x} \in \mathbb{Z}^n,\ \mathbf{x} \neq 0.$$

Let $\mathbf{x} = (x_1, x_2, \dots, x_n) \neq 0$ and let $j$ be the largest subscript with $x_j \neq 0$; then

$$|\langle B\mathbf{x}, \beta_j^* \rangle| = \left| \left\langle \sum_{i=1}^j x_i \beta_i, \beta_j^* \right\rangle \right| = |x_j| |\beta_j^*|^2.$$

Because when *i* < *j*,

$$
\langle \beta\_i, \beta\_j^\* \rangle = 0, \text{ and } \langle \beta\_j, \beta\_j^\* \rangle = \langle \beta\_j^\*, \beta\_j^\* \rangle.
$$

On the other hand,

$$|\langle Bx, \beta\_j^\* \rangle| \le |Bx||\beta\_j^\*|.$$

So

$$|Bx| \ge |x\_j| |\beta\_j^\*| \ge \min\_{1 \le i \le n} |\beta\_i^\*|.$$

Lemma 7.18 holds!

**Corollary 7.6** *The successive minima* $\lambda_1, \lambda_2, \dots, \lambda_n$ *of a lattice* $L$ *are attained; that is, there exist* $\alpha_i \in L$ *with* $|\alpha_i| = \lambda_i$*,* $1 \le i \le n$*.*

*Proof* The lattice points contained in the ball $\text{Ball}(0, \delta)$ with center 0 and radius $\delta$ $(\delta > \lambda_i)$ are finite in number: if a bounded region (of finite volume) contained infinitely many lattice points, there would be a convergent subsequence, but the distance between any two different points of $L$ is at least $\lambda_1$. This indicates that

$$|L \cap \text{Ball}(0, \delta)| < \infty, \quad \delta > \lambda_i,$$

that is, each such ball contains only finitely many lattice points. It is then not hard to find $\alpha_1 \in L$ with $|\alpha_1| = \lambda_1$, $\alpha_2 \in L$ with $|\alpha_2| = \lambda_2$, ..., $\alpha_n \in L$ with $|\alpha_n| = \lambda_n$. The Corollary holds.

In Sect. 7.1, the geometry of numbers is relative to the integer lattice Z*<sup>n</sup>*; next, we extend the main results to the general full rank lattice.

**Lemma 7.19** (Compare with Lemma 7.5) *Let* $L = L(B) \subset \mathbb{R}^n$ *be a (full rank) lattice and* $\mathcal{R} \subset \mathbb{R}^n$*. If* $\operatorname{Vol}(\mathcal{R}) > d(L)$*, then there are two different points* $\alpha \in \mathcal{R}$*,* $\beta \in \mathcal{R}$ *such that* $\alpha - \beta \in L$*.*

*Proof* Let *F* be a basic region of *L*, that is

$$F = \{ B\mathbf{x} \mid \mathbf{x} = (x_1, \dots, x_n)',\ 0 \le x_i < 1,\ 1 \le i \le n \}.$$

Obviously, R*<sup>n</sup>* can be divided into the following disjoint subsets,

$$\begin{aligned} \mathbb{R}^n &= \cup\_{\alpha \in L} \{\alpha + \mathbf{y} | \mathbf{y} \in F\}, \\ &= \cup\_{\alpha \in L} \{\alpha + F\}. \end{aligned}$$

For a given lattice point α ∈ *L*, define

$$
\mathcal{R}\_{\alpha} = \mathcal{R} \cap \{\alpha + F\} = \alpha + D\_{\alpha}, D\_{\alpha} \subset F.
$$

Therefore, *R* can be divided into the following disjoint subsets,

$$\mathcal{R} = \cup_{\alpha \in L} \mathcal{R}_\alpha \Longrightarrow \operatorname{Vol}(\mathcal{R}) = \sum_{\alpha \in L} \operatorname{Vol}(\mathcal{R}_\alpha) = \sum_{\alpha \in L} \operatorname{Vol}(D_\alpha).$$

If for any $\alpha, \beta \in L$, $\alpha \neq \beta$, we had $D_\alpha \cap D_\beta = \emptyset$, then

$$\text{Vol}(\mathcal{R}) = \text{Vol}(\cup\_{\alpha \in L} D\_\alpha) \le \text{Vol}(F) = d(L),$$

which contradicts the assumption. So there must exist α, β ∈ *L*, α ≠ β, with *D*<sup>α</sup> ∩ *D*<sup>β</sup> ≠ ∅. Let *x* ∈ *D*<sup>α</sup> ∩ *D*β, then α + *x* ∈ *R*, β + *x* ∈ *R*, and

$$(\alpha + \mathbf{x}) - (\beta + \mathbf{x}) = \alpha - \beta \in L.$$

The Lemma holds.

**Lemma 7.20** (Compare with Lemma 7.6) *Let L be a full rank lattice and R* ⊂ R*<sup>n</sup> a symmetric convex body. If* Vol(*R*) > 2*<sup>n</sup>d*(*L*)*, then R contains a nonzero lattice point; that is,* ∃ α ∈ *L,* α ≠ 0*, such that* α ∈ *R.*

*Proof* Let

$$\frac{1}{2}\mathcal{R} = \{\mathbf{x}|2\mathbf{x} \in \mathcal{R}\}.$$

Then

$$\text{Vol}\left(\frac{1}{2}\mathcal{R}\right) = 2^{-n}\text{Vol}(\mathcal{R}) > d(L).$$

By Lemma 7.19, there are *x* ∈ ½*R*, *y* ∈ ½*R* with *x* ≠ *y* and *x* − *y* ∈ *L*. Because 2*x* ∈ *R*, 2*y* ∈ *R* and *R* is a symmetric convex body, by Lemma 7.4,


$$\frac{1}{2}(2x - 2y) = x - y \in \mathcal{R}.$$

The Lemma holds.

**Corollary 7.7** *Let L be a full rank lattice and* λ(*L*) = λ<sup>1</sup> *its minimum distance. Then*

$$
\lambda\_1 = \lambda(L) \le \sqrt{n} (d(L))^{\frac{1}{n}}.\tag{7.38}
$$

*Proof* First we prove

$$\text{Vol}(\text{Ball}(0, r)) \ge \left(\frac{2r}{\sqrt{n}}\right)^n. \tag{7.39}$$

This is because Ball(0, *r*) contains the cube

$$\left\{ \mathbf{x} \in \mathbb{R}^n | \mathbf{x} = (\mathbf{x}\_1, \dots, \mathbf{x}\_n), \forall \, |\mathbf{x}\_i| < \frac{r}{\sqrt{n}} \right\} \subset \text{Ball}(0, r).$$

By definition, there are no nonzero lattice points in the open ball Ball(0, λ1); since Ball(0, λ1) is a symmetric convex body, Lemma 7.20 gives

$$\text{Vol}(\text{Ball}(0, \lambda\_1)) \le 2^n d(L).$$

Thus

$$\left(\frac{2\lambda\_1}{\sqrt{n}}\right)^n \le 2^n d(L).$$

That is

$$
\lambda\_1 \le \sqrt{n} (d(L))^{\frac{1}{n}}.
$$

The Corollary holds.

Combined with Eq. (7.37), we obtain the estimation of the upper and lower bounds of the minimum distance of a lattice,

$$\min\_{1 \le i \le n} |\beta\_i^\*| \le \lambda(L) \le \sqrt{n} (d(L))^{\frac{1}{n}}.\tag{7.40}$$
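For small examples, the upper bound in (7.40) can be checked directly by brute force. The following sketch (the basis matrix and the search box are illustrative choices of ours, not from the text) enumerates small integer combinations of the basis rows to find the minimum distance:

```python
from itertools import product

def shortest_vector_sq(B, box=4):
    """Brute-force the squared minimum distance lambda_1^2 of the lattice
    generated by the rows of B, searching coefficient vectors in
    [-box, box]^n (adequate for small, well-conditioned examples)."""
    n = len(B)
    best = None
    for c in product(range(-box, box + 1), repeat=n):
        if all(x == 0 for x in c):
            continue  # skip the zero vector
        v = [sum(c[i] * B[i][k] for i in range(n)) for k in range(n)]
        norm_sq = sum(x * x for x in v)
        best = norm_sq if best is None else min(best, norm_sq)
    return best
```

For the basis with rows (2, 1) and (1, 2) we have *d*(*L*) = 3, and the brute-force minimum λ1² = 2 indeed satisfies λ1² ≤ *n*·*d*(*L*)<sup>2/*n*</sup> = 6, consistent with (7.40).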

**Lemma 7.21** *Let L* ⊂ R*<sup>n</sup> be a* (*full rank*) *lattice,* λ1, λ2,...,λ*<sup>n</sup> the successive minima of L, and d* = *d*(*L*) *the determinant of L, then*

$$
\lambda\_1 \lambda\_2 \dots \lambda\_n \le n^{\frac{n}{2}} d(L). \tag{7.41}
$$

*Proof* Let {α1, α2,...,α*n*} ⊂ *L* with |α*i*| = λ*<sup>i</sup>* be a set of linearly independent vectors, hence a basis of R*<sup>n</sup>*. Let

$$T = \left\{ \mathbf{y} \in \mathbb{R}^n | \sum\_{i=1}^n \left( \frac{\langle \mathbf{y}, \alpha\_i^\* \rangle}{\lambda\_i |\alpha\_i^\*|} \right)^2 < 1 \right\},\tag{7.42}$$

where {α∗1 , α∗2 ,...,α∗*<sup>n</sup>*} is the orthogonal basis corresponding to {α1, α2,...,α*n*}. Let us prove that *T* contains no nonzero lattice point. Let *y* ∈ *L*, *y* ≠ 0, and let *k* be the largest subscript such that |*y*| ≥ λ*<sup>k</sup>*; then

$$\mathbf{y} \in \text{Span}(\alpha\_1^\*, \alpha\_2^\*, \dots, \alpha\_k^\*) = \text{Span}(\alpha\_1, \alpha\_2, \dots, \alpha\_k).$$

Because if *y* is linearly independent of α1, α2,...,α*<sup>k</sup>* , then

$$k+1 \le \dim(\text{Span}(\alpha\_1, \alpha\_2, \dots, \alpha\_k, \text{y}) \cap \text{Ball}(\mathbf{0}, |\mathbf{y}|)).$$

From the definition of λ*<sup>k</sup>*+<sup>1</sup> we would obtain λ*<sup>k</sup>*+<sup>1</sup> ≤ |*y*|, contradicting the definition of *k*. By *y* ∈ Span(α1, α2,...,α*<sup>k</sup>*),

$$\sum\_{i=1}^{n} \left( \frac{\langle \mathbf{y}, \alpha\_i^\* \rangle}{\lambda\_i |\alpha\_i^\*|} \right)^2 = \sum\_{i=1}^{k} \left( \frac{\langle \mathbf{y}, \alpha\_i^\* \rangle}{\lambda\_i |\alpha\_i^\*|} \right)^2$$

$$\geq \frac{1}{\lambda\_k^2} \sum\_{i=1}^{k} \frac{\langle \mathbf{y}, \alpha\_i^\* \rangle^2}{|\alpha\_i^\*|^2} = \frac{1}{\lambda\_k^2} |\mathbf{y}|^2 \geq 1.$$

Therefore *y* ∉ *T*. Since *T* is a symmetric convex body, Lemma 7.20 gives

$$\text{Vol}(T) \le 2^n d.$$

On the other hand,

$$\begin{aligned} \text{Vol}(T) &= \left(\prod\_{i=1}^n \lambda\_i\right) \cdot \text{Vol}(\text{Ball}(0, 1)) \\ &\geq \prod\_{i=1}^n \lambda\_i \left(\frac{2}{\sqrt{n}}\right)^n. \end{aligned}$$

So

$$\prod\_{i=1}^n \lambda\_i \le n^{\frac{n}{2}} d.$$

Lemma 7.21 holds.

The above lemma shows that the upper bound (7.38) of λ<sup>1</sup> is valid for λ*<sup>i</sup>* in the sense of geometric average.

Finally, we discuss computationally hard problems on lattices. These problems are the main scientific basis and technical support for the design of trapdoor functions, and they are also the cornerstone of the security of lattice cryptography.

#### 1. Shortest Vector Problem (SVP)

A lattice *L* is a discrete geometric object in R*<sup>n</sup>*, and its minimum distance λ<sup>1</sup> = λ(*L*) is the length of a shortest vector in *L*. The shortest vector problem asks how to find, for any full rank lattice *L*, a shortest vector *u*<sup>0</sup> ∈ *L* such that

$$|u\_0| = \min\_{\mathbf{x} \in L, \mathbf{x} \neq \mathbf{0}} |\mathbf{x}| = \lambda\_1.$$

This is the so-called shortest vector problem. At present, there are insurmountable difficulties in theory and computation: we only know that *u*<sup>0</sup> exists, but we cannot compute it efficiently. Current research therefore mainly focuses on the approximation of the shortest vector, that is, finding a nonzero vector *u* ∈ *L* such that

$$|u| \le r(n)\lambda\_1, \ u \in L, u \ne 0,$$

where *r*(*n*) ≥ 1 is called the approximation coefficient, which only depends on the dimension of lattice *L*.

In 1982, A. K. Lenstra, H. W. Lenstra and L. Lovász (1982) creatively developed an algorithm that effectively solves the approximation problem of the shortest vector: the famous LLL algorithm of lattice theory. The computational complexity of the LLL algorithm is polynomial for any lattice, and its approximation coefficient is *r*(*n*) = 2<sup>(*n*−1)/2</sup>. How to improve the approximation coefficient of the LLL algorithm to a polynomial in *n* is a main current research topic. For example, Schnorr's work in 1987 and Gama and Nguyen's work (2008a, 2008b) are very representative, but they are still far from a polynomial function, so it is generally conjectured:
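To give a concrete feeling for the procedure, here is a minimal, unoptimized sketch of LLL reduction in exact rational arithmetic (a sketch of ours, not the authors' pseudocode; δ = 3/4 is the classical parameter choice, and the Gram–Schmidt data is recomputed from scratch at each step for clarity rather than speed):

```python
from fractions import Fraction

def gram_schmidt(B):
    """Gram-Schmidt data of the rows of B: orthogonal vectors b* and
    coefficients mu[i][j] = <b_i, b*_j> / <b*_j, b*_j>."""
    n, m = len(B), len(B[0])
    Bs, mu = [], [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        v = [Fraction(x) for x in B[i]]
        for j in range(i):
            mu[i][j] = (sum(Fraction(B[i][k]) * Bs[j][k] for k in range(m))
                        / sum(x * x for x in Bs[j]))
            v = [v[k] - mu[i][j] * Bs[j][k] for k in range(m)]
        Bs.append(v)
    return Bs, mu

def lll(B, delta=Fraction(3, 4)):
    """Minimal LLL reduction of an integer basis (rows of B)."""
    B = [list(row) for row in B]
    n = len(B)
    k = 1
    while k < n:
        for j in range(k - 1, -1, -1):          # size reduction of b_k
            _, mu = gram_schmidt(B)
            r = round(mu[k][j])
            if r:
                B[k] = [a - r * b for a, b in zip(B[k], B[j])]
        Bs, mu = gram_schmidt(B)
        lov = (delta - mu[k][k - 1] ** 2) * sum(x * x for x in Bs[k - 1])
        if sum(x * x for x in Bs[k]) >= lov:    # Lovasz condition holds
            k += 1
        else:                                   # swap b_{k-1}, b_k; back up
            B[k - 1], B[k] = B[k], B[k - 1]
            k = max(k - 1, 1)
    return B
```

On the basis with rows (1, 1, 1), (−1, 0, 2), (3, 5, 6), for example, the reduced first vector is much shorter than the input vectors, while the determinant of the basis is of course preserved.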

Conjecture 1: There is no polynomial-time algorithm that approximates the shortest vector with an approximation coefficient *r*(*n*) that is a polynomial function of *n*.

#### 2. Closest Vector Problem (CVP)

Let *L* ⊂ R*<sup>n</sup>* be a lattice and *t* ∈ R*<sup>n</sup>* an arbitrary given vector. It is easy to prove that there is a lattice point *ut* ∈ *L* such that

$$|u\_t - t| = \min\_{\mathbf{x} \in L} |\mathbf{x} - t|,$$

*ut* is called the closest lattice point (vector) to *t*. When *t* = 0 is the zero vector and the minimum is taken over nonzero lattice points, *u*<sup>0</sup> is a shortest vector of *L*, so the closest vector problem is a general form of the shortest vector problem. Similarly, we only know that the closest vector *ut* exists; there is no efficient algorithm to find it, and one studies instead the approximation problem for the closest vector: for *x* ∈ *L*, if

$$|x - t| \le r\_1(n)|u\_t - t|,$$

then *x* is called an approximate closest vector with approximation coefficient *r*1(*n*). In 1986, Babai proposed an effective algorithm to approximate the closest vector in Babai (1986), and its approximation coefficient *r*1(*n*) is generally of the same order as the approximation coefficient *r*(*n*) of the shortest vector.
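Babai's nearest plane algorithm works with the Gram–Schmidt basis; an even simpler relative, the rounding procedure, already illustrates the idea. The following two-dimensional sketch (ours, not Babai's exact algorithm; basis and target below are illustrative) solves *t* = *c*1β1 + *c*2β2 exactly and rounds the coefficients:

```python
from fractions import Fraction

def babai_round_2d(b1, b2, t):
    """Approximate the closest lattice point to t in L(b1, b2) by
    solving t = c1*b1 + c2*b2 exactly and rounding c1, c2."""
    det = b1[0] * b2[1] - b1[1] * b2[0]
    # Cramer's rule for the 2x2 system with columns b1, b2
    c1 = Fraction(t[0] * b2[1] - t[1] * b2[0], det)
    c2 = Fraction(b1[0] * t[1] - b1[1] * t[0], det)
    r1, r2 = round(c1), round(c2)
    return (r1 * b1[0] + r2 * b2[0], r1 * b1[1] + r2 * b2[1])
```

For a target already on the lattice the procedure returns it exactly; in general it returns a nearby (not necessarily the nearest) lattice point, which is precisely the approximation behavior described above.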

There are many other hard computational problems on lattices, such as the successive shortest vector problem, which is essentially to find a deterministic algorithm approximating each α*<sup>i</sup>* ∈ *L* with |α*i*| = λ*<sup>i</sup>*, the successive minima of *L*. However, SVP and CVP are the problems most commonly used in the design and analysis of lattice cryptosystems, and most research is based on the integer lattice.

# **7.3 Integer Lattice and** *q***-Ary Lattice**

**Definition 7.7** A full rank lattice *L* is called an integer lattice if *L* ⊂ Z*<sup>n</sup>*; an integer lattice *L* is called a *q*-ary lattice if *q*Z*<sup>n</sup>* ⊂ *L* ⊂ Z*<sup>n</sup>*, where *q* ≥ 1 is a positive integer.

It is easy to see from the definition that a lattice *L* = *L*(*B*) is an integer lattice ⇔ *B* ∈ Z*<sup>n</sup>*×*<sup>n</sup>* is an integer square matrix, so the determinant *d* = *d*(*L*) of an integer lattice *L* is a positive integer.

**Lemma 7.22** *Let L* = *L*(*B*) ⊂ Z*<sup>n</sup> be an integer lattice and d* = *d*(*L*) *the determinant of L, then d*Z*<sup>n</sup>* ⊂ *L* ⊂ Z*<sup>n</sup>; therefore, an integer lattice is always a d-ary lattice* (*q* = *d*)*.*

*Proof* Let α ∈ *d*Z*<sup>n</sup>*; let us prove that α ∈ *L*, that is, that α = *Bx* always has an integer solution *x* ∈ Z*<sup>n</sup>*. Let *B*−<sup>1</sup> be the inverse matrix of *B*, then

$$B^{-1} = \frac{1}{\det(B)} B^\* = \frac{1}{\det(B)} \begin{bmatrix} b\_{11}^\* & b\_{12}^\* & \cdots & b\_{1n}^\* \\ b\_{21}^\* & b\_{22}^\* & \cdots & b\_{2n}^\* \\ \vdots & \vdots & & \vdots \\ b\_{n1}^\* & b\_{n2}^\* & \cdots & b\_{nn}^\* \end{bmatrix},$$

where *B* = (*bi j*)*<sup>n</sup>*×*<sup>n</sup>* and *b*<sup>∗</sup>*i j* is the algebraic cofactor of *bji*. Because *B* ∈ Z*<sup>n</sup>*×*<sup>n</sup>*, we have *B*<sup>∗</sup> ∈ Z*<sup>n</sup>*×*<sup>n</sup>*, thus *d B*−<sup>1</sup> = ±*B*<sup>∗</sup> ∈ Z*<sup>n</sup>*×*<sup>n</sup>*. Write α = *d*β, then β ∈ Z*<sup>n</sup>*, and

$$\mathbf{x} = B^{-1}\alpha = dB^{-1}\beta = \pm B^\*\beta \in \mathbb{Z}^n.$$

Thus α ∈ *L*. That is, *d*Z*<sup>n</sup>* ⊂ *L*, and the Lemma holds.
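The key step of the proof, that *dB*−<sup>1</sup> = ±*B*<sup>∗</sup> is an integer matrix, is easy to check numerically. A 2 × 2 sketch (the basis below is an arbitrary example of ours):

```python
from fractions import Fraction

def d_times_inverse(B):
    """For a 2x2 integer basis matrix B, return d(L) = |det B| and d * B^{-1}.
    Lemma 7.22 says the latter is an integer matrix, so d*Z^2 is a sublattice
    of L(B)."""
    det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    d = abs(det)
    # B^{-1} = adj(B) / det, with adj the adjugate of a 2x2 matrix
    adj = [[B[1][1], -B[0][1]], [-B[1][0], B[0][0]]]
    dBinv = [[Fraction(d * a, det) for a in row] for row in adj]
    return d, dBinv
```

For *B* with columns (2, 0) and (1, 3), for instance, *d* = 6 and *dB*−<sup>1</sup> has integer entries, so 6Z² ⊂ *L*(*B*) as the lemma asserts.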

The following lemma is a simple conclusion in algebra. For completeness, we prove the following.

**Lemma 7.23** *Let L be a q-ary lattice and* Z*<sup>q</sup> the residue class ring* mod *q, then*

$$\mathbb{Z}^n/L \cong \mathbb{Z}\_q^n \big/ \left( L/q\mathbb{Z}^n \right).$$
*Proof* Let α = (*a*1, *a*2,..., *an*) ∈ Z*<sup>n</sup>* and β = (*b*1, *b*2,..., *bn*) ∈ Z*<sup>n</sup>*; if *ai* ≡ *bi*(mod *q*) for all *i*, we write α ≡ β(mod *q*). For any α ∈ Z*<sup>n</sup>*, define

$$
\bar{\alpha} = (\bar{a}\_1, \bar{a}\_2, \dots, \bar{a}\_n) \in \mathbb{Z}\_q^n,
$$

where *a*¯*<sup>i</sup>* is the minimum nonnegative residue of *ai* mod *q*; thus α ≡ ¯α(mod *q*). Define the mapping σ : Z*<sup>n</sup>* → Z*<sup>n</sup>q* by σ(α) = ¯α; this is a surjection, and

$$
\sigma(\alpha + \beta) = \bar{\alpha} + \bar{\beta} = \sigma(\alpha) + \sigma(\beta).
$$

Therefore, σ is a surjective group homomorphism. Obviously Ker σ = *q*Z*<sup>n</sup>*; therefore, by the isomorphism theorem of groups, we have

$$
\mathbb{Z}^n / q \mathbb{Z}^n \cong \mathbb{Z}\_q^n.
$$

Because *q*Z*<sup>n</sup>* ⊂ *L* ⊂ Z*<sup>n</sup>*, again by the isomorphism theorem of groups,

$$\mathbb{Z}^{n}/L \cong \left(\mathbb{Z}^{n}/q\mathbb{Z}^{n}\right) \big/ \left(L/q\mathbb{Z}^{n}\right) \cong \mathbb{Z}\_{q}^{n} \big/ \left(L/q\mathbb{Z}^{n}\right).$$

The Lemma holds.

Next, we will prove that Z*<sup>n</sup>*/*L* is a finite group. To this end, we first discuss the elementary transformations of matrices. The elementary transformations of a matrix are the elementary row transformations and the elementary column transformations, specifically the following three kinds:

(1) Exchange two rows or two columns of matrix *A*:

 σ*i j*(*A*): exchange rows *i* and *j* of *A*; τ*i j*(*A*): exchange columns *i* and *j* of *A*

(2) Multiply a row or a column of *A* by −1:

 σ−*<sup>i</sup>*(*A*): multiply row *i* of *A* by −1; τ−*<sup>i</sup>*(*A*): multiply column *i* of *A* by −1

(3) Add *k* times of a row (column) to another row (column), *k* ∈ R; in many cases, we require *k* ∈ Z to be an integer:

 σ*ki*+*<sup>j</sup>*(*A*): add *k* times row *i* of *A* to row *j*; τ*ki*+*<sup>j</sup>*(*A*): add *k* times column *i* of *A* to column *j*

The *n*-order identity matrix is denoted by *In*; a matrix obtained from *In* by one of the above elementary transformations is called an elementary matrix. We note that all elementary matrices are unimodular matrices (see (7.29)), and

$$\begin{cases} \sigma\_{ij}(A) = \sigma\_{ij}(I\_n)A, \ \tau\_{ij}(A) = A\tau\_{ij}(I\_n) \\ \sigma\_{-i}(A) = \sigma\_{-i}(I\_n)A, \ \tau\_{-i}(A) = A\tau\_{-i}(I\_n) \\ \sigma\_{ki+j}(A) = \sigma\_{ki+j}(I\_n)A, \ \tau\_{ki+j}(A) = A\tau\_{ki+j}(I\_n) \end{cases} \tag{7.43}$$

That is, elementary row transformation for *A* is equal to multiplying the corresponding elementary matrix from the left, and elementary column transformation for *A* is equal to multiplying the corresponding elementary matrix from the right.

**Lemma 7.24** *Let L* = *L*(*B*) ⊂ Z*<sup>n</sup> be an integer lattice, then* Z*<sup>n</sup>*/*L is a finite group, and*

$$|\mathbb{Z}^n/L| = d(L).$$

*Proof* According to linear algebra, an integer square matrix *B* ∈ Z*<sup>n</sup>*×*<sup>n</sup>* can always be transformed into a lower triangular matrix by elementary row transformations; that is, there is a unimodular matrix *U* ∈ *SLn*(Z) such that

$$UB = \begin{bmatrix} \* & 0 & \cdots & 0 \\ \* & \* & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ \* & \* & \cdots & \* \end{bmatrix}.$$

Then, by elementary column transformations, *U B* can further be transformed into an upper triangular matrix, hence into a diagonal matrix; that is, there is a unimodular matrix *U*<sup>1</sup> ∈ *SLn*(Z) such that

$$UBU\_1 = \text{diag}\{\delta\_1, \delta\_2, \dots, \delta\_n\}.$$

where δ*<sup>i</sup>* ≠ 0, δ*<sup>i</sup>* ∈ Z, and

$$d(L) = |\det(UBU\_1)| = \prod\_{i=1}^{n} |\delta\_i|.$$

Let *L*(*U BU*1) be an integral lattice generated by *U BU*1, we have quotient group isomorphism

$$\mathbb{Z}^n/L(UBU\_1) \cong \oplus\_{i=1}^n \mathbb{Z}/|\delta\_i|\mathbb{Z} = \oplus\_{i=1}^n \mathbb{Z}\_{|\delta\_i|}.$$

Thus

$$|\mathbb{Z}^n/L(UBU\_1)| = \prod\_{i=1}^n |\delta\_i| = d(L).$$

Because *L*(*B*) = *L*(*BU*1) and *L*(*B*) ∼= *L*(*U B*), we have *L*(*B*) ∼= *L*(*U BU*1), so

$$|\mathbb{Z}^n/L(B)| = |\mathbb{Z}^n/L(UBU\_1)| = d(L).$$

Lemma 7.24 holds.
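Lemma 7.24 can be checked directly in small dimension by counting the integer points in a half-open fundamental parallelepiped of *L*: each coset of Z*<sup>n</sup>*/*L* has exactly one representative there. A 2 × 2 sketch with an illustrative basis (the columns of *B* are the basis vectors):

```python
from fractions import Fraction
from itertools import product

def residues_mod_lattice(B, box=10):
    """Collect the integer points in the half-open fundamental
    parallelepiped F = {Bc : 0 <= c_i < 1} of the 2x2 integer basis B.
    Each coset of Z^2 / L(B) has exactly one representative in F,
    so the number of points equals d(L)."""
    det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    pts = []
    for x, y in product(range(-box, box + 1), repeat=2):
        # c = B^{-1} (x, y)^T, computed exactly via the adjugate
        c1 = Fraction(B[1][1] * x - B[0][1] * y, det)
        c2 = Fraction(-B[1][0] * x + B[0][0] * y, det)
        if 0 <= c1 < 1 and 0 <= c2 < 1:
            pts.append((x, y))
    return pts
```

For the basis with columns (2, 0) and (1, 3), where *d*(*L*) = 6, the search finds exactly 6 integer points, matching |Z²/*L*| = *d*(*L*).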

An integer square matrix *B* = (*bi j*)*<sup>n</sup>*×*<sup>n</sup>* ∈ Z*<sup>n</sup>*×*<sup>n</sup>* is called a Hermite normal form matrix if *B* is an upper triangular matrix, that is, *bi j* = 0 for 1 ≤ *j* < *i* ≤ *n*, and

$$b\_{ii} \ge 1, \quad 0 \le b\_{ij} < b\_{ii}, \ 1 \le i < j \le n. \tag{7.44}$$

A Hermite normal form matrix is referred to as an HNF matrix.

**Definition 7.8** Let *L* = *L*(*B*) ⊂ Z*<sup>n</sup>* be an integer lattice with *B* an HNF matrix; then *B* is called the HNF basis of *L*, denoted *B* = HNF(*L*).

The following lemma proves that an integer lattice has a unique HNF basis, so it is reasonable to use HNF(*L*) to denote the HNF basis.

**Lemma 7.25** *Let L* ⊂ Z*<sup>n</sup> be an integer lattice, then there is a unique HNF matrix B such that L* = *L*(*B*)*.*

*Proof* Let *L* = *L*(*A*), where *A* is a generating matrix of *L*. By elementary column transformations, *A* can be transformed into an upper triangular matrix, that is,

$$AU\_1 = \begin{bmatrix} c\_{11} & c\_{12} & \cdots & c\_{1n} \\ 0 & c\_{22} & \cdots & c\_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & c\_{nn} \end{bmatrix}, \quad U\_1 \in SL\_n(\mathbb{Z}),$$

where *cii* > 0, 1 ≤ *i* ≤ *n*. Transforming *AU*<sup>1</sup> further, there is a unimodular matrix *U*<sup>2</sup> such that *AU*1*U*<sup>2</sup> = *B* is an HNF matrix; because *L*(*B*) = *L*(*AU*1*U*2), we see that *L* has an HNF basis *B*.

Let us prove the uniqueness of the HNF basis *B*. If there are two HNF matrices *B*1, *B*<sup>2</sup> with *L*(*B*1) = *L*(*B*2), then from Lemma 7.14 there is a unimodular matrix *U* ∈ *SLn*(Z) such that *B*<sup>1</sup> = *B*2*U*; that is, *B*<sup>1</sup> can be obtained from *B*<sup>2</sup> by successively applying the elementary column transformations defined by formula (7.43). But applying any column transformation τ*i j*, τ−*<sup>i</sup>* or τ*ki*+*<sup>j</sup>* to *B*<sup>2</sup> destroys the HNF property, so *U* = *In* is the identity matrix, that is, *B*<sup>1</sup> = *B*2. The Lemma holds.
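The existence proof is constructive, and the same column-operation strategy is easy to implement. The following sketch (a minimal version of ours, for nonsingular integer matrices whose columns are the basis vectors) triangularizes by running Euclid's algorithm on columns, then reduces the entries to the right of the diagonal as in (7.44):

```python
def hnf(A):
    """Hermite normal form of a nonsingular integer matrix via unimodular
    column operations: upper triangular, positive diagonal, and each entry
    right of the diagonal reduced modulo the diagonal entry of its row."""
    A = [list(row) for row in A]
    n = len(A)
    for i in range(n - 1, -1, -1):
        for j in range(i):                 # Euclid on columns j and i, row i
            while A[i][j] != 0:
                q = A[i][i] // A[i][j]
                for r in range(n):         # col_i -= q * col_j
                    A[r][i] -= q * A[r][j]
                for r in range(n):         # swap col_i and col_j
                    A[r][i], A[r][j] = A[r][j], A[r][i]
        if A[i][i] < 0:                    # make the diagonal entry positive
            for r in range(n):
                A[r][i] = -A[r][i]
    for i in range(n - 1, -1, -1):         # reduce entries right of diagonal
        for j in range(i + 1, n):
            q = A[i][j] // A[i][i]
            for r in range(n):
                A[r][j] -= q * A[r][i]
    return A
```

For example, the basis with columns (2, 1) and (4, 3) has HNF basis with columns (2, 0) and (0, 1): the same lattice, in the unique normalized form of Lemma 7.25.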

**Lemma 7.26** *Let L* = *L*(*B*) *be an integer lattice, where B* = (*bi j*)*<sup>n</sup>*×*<sup>n</sup> is an HNF matrix, and let B*<sup>∗</sup> = [β∗1 , β∗2 ,...,β∗*<sup>n</sup>*] *be the orthogonal basis corresponding to B* = [β1, β2,...,β*n*]*, then*

$$B^\* = [\beta\_1^\*, \beta\_2^\*, \dots, \beta\_n^\*] = \text{diag}\{b\_{11}, b\_{22}, \dots, b\_{nn}\}$$

*is a diagonal matrix.*

*Proof* We prove β∗*<sup>i</sup>* = (0,..., 0, *bii*, 0,..., 0)′ by induction on *i*. When *i* = 1, β∗<sup>1</sup> = β<sup>1</sup> = (*b*11, 0,..., 0)′ and the proposition holds. Suppose β∗*<sup>j</sup>* = (0,..., 0, *bj j*, 0,..., 0)′ holds for all *j* ≤ *i*; then for *i* + 1, by (7.31),


$$\begin{aligned} \beta\_{i+1}^{\*} &= \beta\_{i+1} - \sum\_{j=1}^{i} \frac{\langle \beta\_{i+1}, \beta\_{j}^{\*} \rangle}{|\beta\_{j}^{\*}|^{2}} \beta\_{j}^{\*} = \beta\_{i+1} - \sum\_{j=1}^{i} \frac{b\_{j(i+1)}}{b\_{jj}} \beta\_{j}^{\*} \\ &= \begin{pmatrix} b\_{1(i+1)} \\ \vdots \\ b\_{i(i+1)} \\ b\_{(i+1)(i+1)} \\ 0 \\ \vdots \\ 0 \end{pmatrix} - \begin{pmatrix} b\_{1(i+1)} \\ \vdots \\ b\_{i(i+1)} \\ 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ \vdots \\ 0 \\ b\_{(i+1)(i+1)} \\ 0 \\ \vdots \\ 0 \end{pmatrix}. \end{aligned}$$

Thus the proposition holds.
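Lemma 7.26 is easy to confirm numerically: running exact Gram–Schmidt on the columns of any HNF matrix returns its diagonal. A sketch of ours (the sample matrix below is illustrative):

```python
from fractions import Fraction

def gram_schmidt_cols(B):
    """Gram-Schmidt orthogonalization of the columns of a square matrix B,
    in exact rational arithmetic; returns the orthogonal columns."""
    n = len(B)
    cols = [[Fraction(B[r][j]) for r in range(n)] for j in range(n)]
    ortho = []
    for v in cols:
        w = v[:]
        for u in ortho:
            c = sum(a * b for a, b in zip(v, u)) / sum(a * a for a in u)
            w = [wi - c * ui for wi, ui in zip(w, u)]
        ortho.append(w)
    return ortho
```

For the HNF matrix with columns (2, 0, 0), (1, 3, 0), (1, 2, 4), the orthogonalized columns are exactly (2, 0, 0), (0, 3, 0), (0, 0, 4), the diagonal entries, as the lemma predicts.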

Next, we discuss *q*-ary lattices, where *q* ≥ 1 is a positive integer, the following two *q*-ary lattices are often used in lattice cryptosystems.

**Definition 7.9** Let Z*<sup>q</sup>* be the residue class ring mod *q* and *A* ∈ Z*q*<sup>*n*×*m*</sup>; the following two *q*-ary lattices are defined as

$$\Lambda\_q(A) = \{ \mathbf{y} \in \mathbb{Z}^m | \text{there is } \mathbf{x} \in \mathbb{Z}^n \Rightarrow \mathbf{y} \equiv A^\prime \mathbf{x} (\text{mod } q) \}, \tag{7.45}$$

and

$$\Lambda\_q^\perp(A) = \{ \mathbf{y} \in \mathbb{Z}^m | A \mathbf{y} \equiv \mathbf{0} (\text{mod } q) \}. \tag{7.46}$$

By the definition, Λ*<sup>q</sup>*(*A*) ⊂ Z*<sup>m</sup>* and Λ<sup>⊥</sup>*<sup>q</sup>*(*A*) ⊂ Z*<sup>m</sup>* are *m*-dimensional integer lattices. For any α ∈ *q*Z*<sup>m</sup>*, there is *x* = 0 ∈ Z*<sup>n</sup>* such that α ≡ *A*′*x*(mod *q*), and *A*α ≡ 0(mod *q*), so

$$\begin{cases} q\mathbb{Z}^m \subset \Lambda\_q(A) \subset \mathbb{Z}^m\\ q\mathbb{Z}^m \subset \Lambda\_q^\perp(A) \subset \mathbb{Z}^m \end{cases}$$

That is, Λ*<sup>q</sup>*(*A*) and Λ<sup>⊥</sup>*<sup>q</sup>*(*A*) are *q*-ary lattices of dimension *m*.
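Membership in both lattices is directly checkable for small parameters. In the following sketch (*q* = 3 and the 1 × 2 matrix *A* are illustrative choices of ours), enumerating residues mod *q* recovers the coset counts *q*<sup>*n*</sup> and *q*<sup>*m*−*n*</sup> that appear in Lemma 7.28 below:

```python
from itertools import product

q, n, m = 3, 1, 2
A = [[1, 2]]                      # a 1 x 2 matrix over Z_3 (illustrative)

def in_lambda_q(y):
    """y is in Lambda_q(A): y ≡ A'x (mod q) for some x in Z^n."""
    return any(all((y[i] - A[0][i] * x) % q == 0 for i in range(m))
               for x in range(q))

def in_lambda_q_perp(y):
    """y is in Lambda_q^perp(A): Ay ≡ 0 (mod q)."""
    return sum(A[0][i] * y[i] for i in range(m)) % q == 0
```

Counting the residues of Z*<sup>m</sup>* mod *q* lying in each lattice gives *q*<sup>*n*</sup> = 3 cosets for Λ*<sup>q</sup>*(*A*) and *q*<sup>*m*−*n*</sup> = 3 for Λ<sup>⊥</sup>*<sup>q</sup>*(*A*), and every point of *q*Z*<sup>m</sup>* belongs to both, as stated above.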

**Lemma 7.27** *We have*

$$\begin{cases} \Lambda\_q^\perp(A) = q \Lambda\_q(A)^\* \\ \Lambda\_q(A) = q \Lambda\_q^\perp(A)^\* \end{cases}$$

*Proof* Let α ∈ Λ*<sup>q</sup>*(*A*)<sup>∗</sup>; by the definition,

$$
\langle \mathbf{y}, \boldsymbol{\alpha} \rangle \in \mathbb{Z}, \forall \, \mathbf{y} \in \Lambda\_q(A).
$$

And

$$
\langle \mathbf{y}, \boldsymbol{\alpha} \rangle = \mathbf{y}^{\prime} \boldsymbol{\alpha} \in \mathbb{Z} \Rightarrow \mathbf{y}^{\prime} \boldsymbol{\alpha} \equiv 0 \, (\text{mod } 1).
$$

There is

$$
\mathbf{y}'(q\alpha) \equiv 0 \ (\text{mod } q), \ \forall \, \mathbf{y} \in \Lambda\_q(A).
$$

Because *y* ∈ Λ*<sup>q</sup>*(*A*), there is *x* ∈ Z*<sup>n</sup>* with *y* ≡ *A*′*x*(mod *q*); from the above formula,

$$\mathbf{x}^{\prime}A(q\alpha) \equiv 0(\text{mod}\,q), \forall \,\mathbf{x} \in \mathbb{Z}^{n}.$$

Thus

$$Aq\alpha \equiv 0(\text{mod}\,q), \Rightarrow q\alpha \in \Lambda\_q^{\perp}(A).$$

This proves

$$q\Lambda\_q(A)^\* \subset \Lambda\_q^\perp(A).$$

Conversely, if *y* ∈ Λ<sup>⊥</sup>*<sup>q</sup>*(*A*), we have

$$A\mathbf{y} \equiv \mathbf{0} (\text{mod } q) \Rightarrow A\left(\frac{1}{q}\mathbf{y}\right) \equiv \mathbf{0} \, (\text{mod } 1).$$

For any α ∈ Λ*<sup>q</sup>*(*A*), let *x* ∈ Z*<sup>n</sup>* with α ≡ *A*′*x*(mod *q*), then

$$\left\langle \alpha, \frac{1}{q} \mathbf{y} \right\rangle = \mathbf{x}' A \left( \frac{1}{q} \mathbf{y} \right) \equiv \mathbf{0} (\text{mod } 1), \forall \, \mathbf{x} \in \mathbb{Z}^n.$$

We have

$$\frac{1}{q}\mathbf{y} \in \Lambda\_q(A)^\* \Rightarrow \mathbf{y} \in q\Lambda\_q(A)^\*.$$

That is

$$
\Lambda\_q^\perp(A) \subset q \Lambda\_q(A)^\*.
$$

Thus Λ<sup>⊥</sup>*<sup>q</sup>*(*A*) = *q*Λ*<sup>q</sup>*(*A*)<sup>∗</sup>. Similarly, the second equation can be proved.

**Lemma 7.28** *Let q be a prime, A* ∈ Z*q*<sup>*n*×*m*</sup>*, m* ≥ *n, and* rank(*A*) = *n, then*

$$|\det(\Lambda\_q^\perp(A))| = q^n,\tag{7.47}$$

*and*

$$|\det(\Lambda\_q(A))| = q^{m-n}.\tag{7.48}$$

*Proof* In the finite field Z*<sup>q</sup>*, rank(*A*) = *n*, so the linear equation system *Ay* = 0 has exactly *q<sup>m</sup>*−*<sup>n</sup>* solutions in Z*q*<sup>*m*</sup>, from which we get

$$|\Lambda\_q^\perp(A)/q\mathbb{Z}^m| = q^{m-n}.$$

By Lemma 7.23,

$$|\mathbb{Z}^m/\Lambda\_q^\perp(A)| = |\mathbb{Z}\_q^m \big/ (\Lambda\_q^\perp(A)/q\mathbb{Z}^m)| = q^n.$$

By Lemma 7.24,

$$|\det(\Lambda\_q^\perp(A))| = |\mathbb{Z}^m/\Lambda\_q^\perp(A)| = q^n.$$

So (7.47) holds. By Corollary 7.5 of the previous section, we have

$$|\det(\Lambda\_q^\perp(A)^\*)| = q^{-n}.$$

By Lemma 7.27,

$$|\det(\Lambda\_q(A))| = q^m |\det(\Lambda\_q^\perp(A)^\*)| = q^{m-n}.$$

The Lemma holds.
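Formula (7.47) can be verified by brute force for small parameters, using |Z*<sup>m</sup>*/Λ| = *d*(Λ) from Lemma 7.24; the matrix and modulus below are illustrative choices of ours:

```python
from itertools import product

def det_perp_lattice(A, q, m):
    """d(Lambda_q^perp(A)) computed by counting: |Z^m / Lambda_q^perp(A)|
    equals q^m divided by the number of solutions of Ay ≡ 0 (mod q)."""
    sols = sum(
        all(sum(row[i] * y[i] for i in range(m)) % q == 0 for row in A)
        for y in product(range(q), repeat=m)
    )
    return q ** m // sols
```

With *q* = 5, *m* = 3 and the rank-2 matrix with rows (1, 2, 0) and (0, 1, 1), the count gives *q*<sup>*n*</sup> = 25, exactly as (7.47) predicts.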

# **7.4 Reduced Basis**

In lattice theory, the reduced basis and the corresponding LLL algorithm are the most important contents; they have had an important impact on computational algebra, computational number theory and other fields, and are recognized as among the most important computational methods of the past century. In order to introduce reduced bases and the LLL algorithm, we recall the Gram–Schmidt orthogonalization process summarized by Eqs. (7.31)–(7.34). Let {β1, β2,...,β*n*} ⊂ R*<sup>n</sup>* be a set of bases of R*<sup>n</sup>* and {β∗1 , β∗2 ,...,β∗*<sup>n</sup>*} the corresponding Gram–Schmidt orthogonal basis, where

$$
\beta\_1^\* = \beta\_1, \ \beta\_i^\* = \beta\_i - \sum\_{j=1}^{i-1} \frac{\langle \beta\_i, \beta\_j^\* \rangle}{\langle \beta\_j^\*, \beta\_j^\* \rangle} \beta\_j^\*, \ 1 < i \le n. \tag{7.49}
$$

The above formula can be written as

$$\beta\_i = \sum\_{j=1}^i \frac{\langle \beta\_i, \beta\_j^\* \rangle}{\langle \beta\_j^\*, \beta\_j^\* \rangle} \beta\_j^\*, \ 1 \le i \le n. \tag{7.50}$$

We then have the following lemma.

**Lemma 7.29** *Let* {β1, β2,...,β*n*} *be a set of bases of* R*<sup>n</sup>,* {β∗1 , β∗2 ,...,β∗*<sup>n</sup>*} *the corresponding Gram–Schmidt orthogonal basis, and let L*(β1, β2,...,β*<sup>k</sup>*) = Span{β1, β2,...,β*<sup>k</sup>*} *be the linear subspace spanned by* β1, β2,...,β*<sup>k</sup>, then*

*(i)*

$$L(\beta\_1, \beta\_2, \dots, \beta\_k) = L(\beta\_1^\*, \beta\_2^\*, \dots, \beta\_k^\*), \ 1 \le k \le n. \tag{7.51}$$

*(ii) For* 1 ≤ *i* ≤ *n, there is*

$$\begin{cases} \langle \beta\_i, \beta\_k^\* \rangle = 0, & \text{when } k > i;\\ \langle \beta\_i, \beta\_k^\* \rangle = \langle \beta\_k^\*, \beta\_k^\* \rangle, & \text{when } k = i. \end{cases} \tag{7.52}$$

*(iii)* ∀ *x* ∈ R*<sup>n</sup>, write x* = *x*1β∗<sup>1</sup> + *x*2β∗<sup>2</sup> + ··· + *x<sup>n</sup>*β∗*<sup>n</sup>, then*

$$x\_i = \frac{\langle \mathbf{x}, \beta\_i^\* \rangle}{\langle \beta\_i^\*, \beta\_i^\* \rangle}, \ 1 \le i \le n. \tag{7.53}$$

*Proof* The above three properties can be derived directly from Eq. (7.49) or (7.50).

Let *U* = (*Ui j*)*<sup>n</sup>*×*<sup>n</sup>*, where

$$U\_{ij} = \frac{\langle \beta\_i, \beta\_j^\* \rangle}{\langle \beta\_j^\*, \beta\_j^\* \rangle}, \Rightarrow U\_{ij} = 0, \text{ when } j > i. \ U\_{ii} = 1. \tag{7.54}$$

Therefore, *U* is the lower triangular matrix with element 1 on the diagonal, and

$$
\begin{bmatrix}
\beta\_1\\\beta\_2\\\vdots\\\beta\_n
\end{bmatrix} = U \begin{bmatrix}
\beta\_1^\*\\\beta\_2^\*\\\vdots\\\beta\_n^\*
\end{bmatrix}.\tag{7.55}
$$

*U* is called the coefficient matrix when {β1, β2,...,β*n*} is orthogonalized.
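The decomposition (7.55) is straightforward to compute. A sketch of ours in exact arithmetic (the function name is our own), returning both *B*<sup>∗</sup> and the unit lower triangular *U*:

```python
from fractions import Fraction

def gs_with_coefficients(rows):
    """Gram-Schmidt on row vectors; returns (B*, U) with B = U B* as in
    Eq. (7.55): U is lower triangular with 1s on the diagonal."""
    n = len(rows)
    Bs, U = [], [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        v = [Fraction(x) for x in rows[i]]
        for j in range(i):
            U[i][j] = (sum(Fraction(rows[i][k]) * Bs[j][k] for k in range(n))
                       / sum(x * x for x in Bs[j]))
            v = [v[k] - U[i][j] * Bs[j][k] for k in range(n)]
        U[i][i] = Fraction(1)
        Bs.append(v)
    return Bs, U
```

For the basis with rows (3, 0) and (1, 2), for instance, *B*<sup>∗</sup> has rows (3, 0), (0, 2) and *U*21 = 1/3, and multiplying *U* by *B*<sup>∗</sup> recovers the original rows.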

Let us introduce the concept of orthogonal projection: suppose *V* ⊂ R*<sup>k</sup>* ⊂ R*<sup>n</sup>* (1 ≤ *k* ≤ *n*); the orthogonal complement space *V*<sup>⊥</sup> of *V* in R*<sup>k</sup>* is

$$V^{\perp} = \{ \mathfrak{x} \in \mathbb{R}^{k} | \langle \mathfrak{x}, \alpha \rangle = 0, \forall \, \alpha \in V \}. \tag{7.56}$$

Because R*<sup>k</sup>* = *V* ⊕ *V*<sup>⊥</sup>, every *x* ∈ R*<sup>k</sup>* can be uniquely expressed as

$$
\mathbf{x} = \alpha + \beta, \text{ where } \alpha \in V, \beta \in V^{\perp}.
$$

α is called the orthogonal projection of *x* on the subspace *V*; obviously |*x*|<sup>2</sup> = |α|<sup>2</sup> + |β|<sup>2</sup>.

**Lemma 7.30** *Let* {β1, β2,...,β*n*} *be a set of bases of* R*<sup>n</sup> and* {β∗1 , β∗2 ,...,β∗*<sup>n</sup>*} *the corresponding orthogonal basis,* 1 ≤ *k* ≤ *n, then* β∗*<sup>k</sup> is the orthogonal projection of* β*<sup>k</sup> on the orthogonal complement space V of the subspace L*(β1, β2,...,β*<sup>k</sup>*−<sup>1</sup>) *in L*(β1, β2,...,β*<sup>k</sup>*)*.*

*Proof* When *k* = 1, the proposition is trivial. If *k* > 1, then by Lemma 7.29,

$$L(\beta\_1, \beta\_2, \dots, \beta\_{k-1}) = L(\beta\_1^\*, \beta\_2^\*, \dots, \beta\_{k-1}^\*).$$

Therefore, the orthogonal complement space *V* = *L*(β∗*<sup>k</sup>*) of *L*(β1, β2,...,β*<sup>k</sup>*−1) in *L*(β1, β2,...,β*<sup>k</sup>*−1, β*<sup>k</sup>*) is a one-dimensional space. Because

$$
\beta\_k = \beta\_k^\* + \sum\_{j=1}^{k-1} \mu\_{kj} \beta\_j^\*,
$$

and

$$\left\langle \beta\_k^\*, \sum\_{j=1}^{k-1} \mu\_{kj} \beta\_j^\* \right\rangle = 0,$$

So β<sup>∗</sup> *<sup>k</sup>* is the orthogonal projection of β*<sup>k</sup>* on *V*. The Lemma holds.

Next, we discuss the transformation law of the corresponding orthogonal basis when making the elementary column transformation of the base matrix [β1, β2,...,β*n*].

**Lemma 7.31** *Let* {β1, β2,...,β*n*} ⊂ R*<sup>n</sup> be a set of bases,* {β∗1 , β∗2 ,...,β∗*<sup>n</sup>*} *the corresponding orthogonal basis, and A* = (*ui j*)*<sup>n</sup>*×*<sup>n</sup> the coefficient matrix. Exchange* β*<sup>k</sup>*−<sup>1</sup> *with* β*<sup>k</sup> to get a set of bases* {α1, α2,...,α*n*} *of* R*<sup>n</sup>, where*

$$
\alpha\_{k-1} = \beta\_k, \alpha\_k = \beta\_{k-1}, \alpha\_i = \beta\_i, \text{when } i \neq k - 1, k.
$$

*Let* {α∗1 , α∗2 ,...,α∗*<sup>n</sup>*} *be the corresponding orthogonal basis and A*<sup>1</sup> = (v*i j*)*<sup>n</sup>*×*<sup>n</sup> the corresponding coefficient matrix, then we have*

*(i)* α∗*<sup>i</sup>* = β∗*<sup>i</sup>, if i* ≠ *k* − 1, *k*. *(ii)*

$$\begin{cases} \alpha\_{k-1}^\* = \beta\_k^\* + \mu\_{kk-1} \beta\_{k-1}^\* \\ \alpha\_k^\* = \beta\_{k-1}^\* - \upsilon\_{kk-1} \alpha\_{k-1}^\*. \end{cases}$$

*(iii)* v*i j* = *ui j, if* 1 ≤ *j* < *i* ≤ *n, and* {*i*, *j*} ∩ {*k*, *k* − 1} = ∅*. (iv)*

$$\begin{cases} v\_{ik-1} = \mu\_{ik-1} v\_{kk-1} + \mu\_{ik} \frac{|\beta\_k^\*|^2}{|\alpha\_{k-1}^\*|^2}, & i > k, \\ v\_{ik} = \mu\_{ik-1} - \mu\_{ik} \mu\_{kk-1}, & i > k. \end{cases}$$

*(v)* v*<sup>k</sup>*−<sup>1</sup>,*<sup>j</sup>* = *ukj,* v*kj* = *uk*−<sup>1</sup>,*<sup>j</sup>*, 1 ≤ *j* < *k* − 1*.*

*Proof* If 1 ≤ *i* < *k* − 1 or *k* < *i* ≤ *n*, then *L*(α1, α2,...,α*i*) = *L*(β1, β2,...,β*i*), and the orthogonal complement space

$$V = L^\perp(\alpha\_1, \alpha\_2, \dots, \alpha\_{i-1}) = L^\perp(\beta\_1, \beta\_2, \dots, \beta\_{i-1}) .$$

Therefore, the orthogonal projection α∗*<sup>i</sup>* of α*<sup>i</sup>* = β*<sup>i</sup>* on *V* is the same as the orthogonal projection β∗*<sup>i</sup>* of β*<sup>i</sup>* on *V*, that is, α∗*<sup>i</sup>* = β∗*<sup>i</sup>* (*i* ≠ *k* − 1, *k*), so (i) holds.

To prove (ii): α∗*<sup>k</sup>*−<sup>1</sup> is the orthogonal projection of β*<sup>k</sup>* (= α*<sup>k</sup>*−1) on the orthogonal complement space of *L*(β1, β2,...,β*<sup>k</sup>*−2). Because

$$\begin{aligned} \beta\_k^\* &= \beta\_k - \sum\_{j=1}^{k-1} \mu\_{kj} \beta\_j^\* \\ &= \beta\_k - \mu\_{kk-1} \beta\_{k-1}^\* - \sum\_{j=1}^{k-2} \mu\_{kj} \beta\_j^\*, \end{aligned}$$

and *L*(β1, β2,...,β*<sup>k</sup>*−<sup>2</sup>) = *L*(β∗1 , β∗2 ,...,β∗*<sup>k</sup>*−<sup>2</sup>), there is

$$
\alpha\_{k-1}^\* = \beta\_k^\* + \mu\_{kk-1} \beta\_{k-1}^\*.
$$

Similarly, α∗*<sup>k</sup>* is the orthogonal projection of β∗*<sup>k</sup>*−<sup>1</sup> on the orthogonal complement of *L*(α∗*<sup>k</sup>*−<sup>1</sup>), thus

$$
\alpha\_k^\* = \beta\_{k-1}^\* - \upsilon\_{kk-1} \alpha\_{k-1}^\*.
$$

where

$$\begin{split} \upsilon\_{kk-1} &= \frac{\langle \beta^\*\_{k-1}, \alpha^\*\_{k-1} \rangle}{|\alpha^\*\_{k-1}|^2} \\ &= \frac{\langle \beta^\*\_{k-1}, u\_{kk-1} \beta^\*\_{k-1} \rangle}{|\alpha^\*\_{k-1}|^2} \\ &= u\_{kk-1} \frac{|\beta^\*\_{k-1}|^2}{|\alpha^\*\_{k-1}|^2}, \end{split}$$

thus (ii) holds. Similarly, other properties can be proved. Lemma 7.31 holds.

**Lemma 7.32** *Let* $\{\beta\_1, \beta\_2, \dots, \beta\_n\}$ *be a set of bases of* $\mathbb{R}^n$*,* $\{\beta\_1^\*, \beta\_2^\*, \dots, \beta\_n^\*\}$ *be the corresponding orthogonal basis, and* $A = (u\_{ij})\_{n \times n}$ *be the coefficient matrix. For any* $k \ge 2$*, if we replace* $\beta\_k$ *with* $\beta\_k - r\beta\_{k-1}$ *and keep the other* $\beta\_i$ *unchanged* ($i \ne k$)*, we get a new set of bases*

$$\{\alpha\_1, \alpha\_2, \dots, \alpha\_n\} = \{\beta\_1, \beta\_2, \dots, \beta\_{k-1}, \beta\_k - r\beta\_{k-1}, \beta\_{k+1}, \dots, \beta\_n\}.$$

*Let* {α<sup>∗</sup> <sup>1</sup> , α<sup>∗</sup> <sup>2</sup> ,...,α<sup>∗</sup> *<sup>n</sup>* } *be the corresponding orthogonal basis and A*<sup>1</sup> = (v*i j*)*<sup>n</sup>*×*<sup>n</sup> the corresponding coefficient matrix, then we have*

*(i)* $\alpha\_i^\* = \beta\_i^\*$, $\forall\, 1 \le i \le n$, *that is, the* $\beta\_i^\*$ *remain unchanged.*
*(ii)* $v\_{ij} = u\_{ij}$, *if* $1 \le j < i \le n$, $i \ne k$.
*(iii)* $v\_{kj} = u\_{kj} - ru\_{k-1,j}$, *if* $j < k - 1$, *and* $v\_{kk-1} = u\_{kk-1} - r$, *if* $j = k - 1$.

*Proof* When $i < k$ or $i > k$, $\alpha\_i^\* = \beta\_i^\*$ is trivial, so to prove (i) we only need the case $i = k$. Because $\alpha\_k^\*$ is the orthogonal projection of $\alpha\_k = \beta\_k - r\beta\_{k-1}$ in the orthogonal complement space $L^\perp(\beta\_1, \beta\_2, \dots, \beta\_{k-1}) = L^\perp(\alpha\_1, \alpha\_2, \dots, \alpha\_{k-1})$, and

$$\begin{aligned} \beta\_k^\* &= \beta\_k - \sum\_{j=1}^{k-1} \mu\_{kj} \beta\_j^\* \\ &= \beta\_k - r\beta\_{k-1} - \left(\sum\_{j=1}^{k-2} (\mu\_{kj} - r\mu\_{k-1,j}) \beta\_j^\* + (\mu\_{kk-1} - r)\beta\_{k-1}^\*\right) \\ &= \alpha\_k - \left(\sum\_{j=1}^{k-2} (\mu\_{kj} - r\mu\_{k-1,j}) \beta\_j^\* + (\mu\_{kk-1} - r)\beta\_{k-1}^\*\right), \end{aligned}$$

this proves that $\alpha\_k^\* = \beta\_k^\*$, thus (i) holds. To prove (ii), when $i \ne k$, we have

$$v\_{ij} = \frac{\langle \alpha\_i, \alpha\_j^\* \rangle}{|\alpha\_j^\*|^2} = \frac{\langle \beta\_i, \beta\_j^\* \rangle}{|\beta\_j^\*|^2} = \mu\_{ij},$$

that is (ii) holds. When *i* = *k*,

$$\begin{split} v\_{kj} &= \frac{\langle \alpha\_k, \alpha\_j^\* \rangle}{|\alpha\_j^\*|^2} \\ &= \frac{\langle \beta\_k - r\beta\_{k-1}, \beta\_j^\* \rangle}{|\alpha\_j^\*|^2} (1 \le j < k \le n) \\ &= \frac{\langle \beta\_k, \beta\_j^\* \rangle}{|\beta\_j^\*|^2} - r \frac{\langle \beta\_{k-1}, \beta\_j^\* \rangle}{|\beta\_j^\*|^2} \\ &= u\_{kj} - r u\_{k-1j} .\end{split}$$

The above formula holds for all $1 \le j \le k - 1$, thus (iii) holds and the Lemma is proved.
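The update rules of Lemma 7.32 can be checked numerically. The following is a small sketch in Python using exact rational arithmetic (the function and variable names are ours, not the book's): we replace $\beta\_2$ by $\beta\_2 - 2\beta\_1$ in a $3$-dimensional basis and verify that the orthogonal basis is unchanged while the coefficient matrix transforms as in (i)–(iii).

```python
from fractions import Fraction as F

def gram_schmidt(basis):
    """Exact Gram-Schmidt: basis[i] = b_star[i] + sum_{j<i} mu[i][j]*b_star[j]."""
    n = len(basis)
    b_star, mu = [], [[F(0)] * n for _ in range(n)]
    for i in range(n):
        v = [F(x) for x in basis[i]]
        for j in range(i):
            num = sum(F(basis[i][t]) * b_star[j][t] for t in range(len(v)))
            den = sum(c * c for c in b_star[j])
            mu[i][j] = num / den
            v = [v[t] - mu[i][j] * b_star[j][t] for t in range(len(v))]
        b_star.append(v)
    return b_star, mu

B = [[3, 1, 0], [1, 4, 1], [2, 0, 5]]
bs, mu = gram_schmidt(B)

k, r = 1, 2                                  # replace beta_k by beta_k - r*beta_{k-1}
B2 = [row[:] for row in B]
B2[k] = [B[k][t] - r * B[k - 1][t] for t in range(3)]
bs2, mu2 = gram_schmidt(B2)

assert bs2 == bs                             # (i): all beta*_i unchanged
assert mu2[k][k - 1] == mu[k][k - 1] - r     # (iii): mu_{k,k-1} -> mu_{k,k-1} - r
assert mu2[2][0] == mu[2][0] and mu2[2][1] == mu[2][1]  # (ii): rows i != k unchanged
```

Because `Fraction` arithmetic is exact, the equalities of the Lemma can be tested with `==` rather than floating-point tolerances.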

Next, we introduce the concept of a set of Reduced bases of R*<sup>n</sup>*.

**Definition 7.10** Let $\{\beta\_1, \beta\_2, \dots, \beta\_n\} \subset \mathbb{R}^n$ be a set of bases, $\{\beta\_1^\*, \beta\_2^\*, \dots, \beta\_n^\*\}$ be the corresponding orthogonal basis, and $A = (u\_{ij})\_{n \times n}$ be the coefficient matrix. We call $\{\beta\_1, \beta\_2, \dots, \beta\_n\}$ a set of Reduced bases of $\mathbb{R}^n$ if

$$\begin{cases} \text{ (i) } |u\_{ij}| \le \frac{1}{2}, \ \forall \ 1 \le j < i \le n, \\ \text{ (ii) } |\beta\_i^\* + u\_{ii-1}\beta\_{i-1}^\*|^2 \ge \frac{3}{4}|\beta\_{i-1}^\*|^2, \ \forall \ 1 < i \le n. \end{cases} \tag{7.57}$$

A set of Reduced bases of $\mathbb{R}^n$ is sometimes called a set of Lovász Reduced bases, which is of great significance in lattice theory. The important result of this section is that any lattice $L$ in $\mathbb{R}^n$ has Reduced bases, and the method to calculate the Reduced bases is the famous LLL algorithm.
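The two conditions of (7.57) can be checked mechanically. The following is a minimal sketch in Python with exact rational arithmetic (the function names are ours); it tests the size-reduction condition (i) and the Lovász condition (ii), using the orthogonality identity $|\beta\_i^\* + u\_{ii-1}\beta\_{i-1}^\*|^2 = |\beta\_i^\*|^2 + u\_{ii-1}^2|\beta\_{i-1}^\*|^2$.

```python
from fractions import Fraction as F

def gram_schmidt(basis):
    """Exact Gram-Schmidt: basis[i] = b_star[i] + sum_{j<i} mu[i][j]*b_star[j]."""
    n = len(basis)
    b_star, mu = [], [[F(0)] * n for _ in range(n)]
    for i in range(n):
        v = [F(x) for x in basis[i]]
        for j in range(i):
            num = sum(F(basis[i][t]) * b_star[j][t] for t in range(len(v)))
            den = sum(c * c for c in b_star[j])
            mu[i][j] = num / den
            v = [v[t] - mu[i][j] * b_star[j][t] for t in range(len(v))]
        b_star.append(v)
    return b_star, mu

def is_lll_reduced(basis, delta=F(3, 4)):
    """Check the two conditions of (7.57); delta = 3/4 as in the text."""
    b_star, mu = gram_schmidt(basis)
    n = len(basis)
    norm2 = [sum(c * c for c in v) for v in b_star]
    if any(abs(mu[i][j]) > F(1, 2) for i in range(n) for j in range(i)):
        return False                    # condition (i) fails
    for i in range(1, n):
        # |b*_i + mu_{i,i-1} b*_{i-1}|^2 = |b*_i|^2 + mu^2 |b*_{i-1}|^2
        if norm2[i] + mu[i][i - 1] ** 2 * norm2[i - 1] < delta * norm2[i - 1]:
            return False                # condition (ii) fails
    return True

assert is_lll_reduced([[2, 0], [1, 2]])        # mu = 1/2 and Lovasz holds
assert not is_lll_reduced([[1, 0], [100, 1]])  # mu = 100 violates (i)
```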

**Theorem 7.3** *Let* $L \subset \mathbb{R}^n$ *be a lattice (full rank lattice); then there is a generating matrix* $B = [\beta\_1, \beta\_2, \dots, \beta\_n]$ *of* $L$*, where* $\{\beta\_1, \beta\_2, \dots, \beta\_n\}$ *is a Reduced basis of* $\mathbb{R}^n$ *and will also be a Reduced basis of the lattice* $L = L(B)$*.*

*Proof* Let *B* = [β1, β2,...,β*n*], *L* = *L*(*B*), first we prove

$$|u\_{kk-1}| \le \frac{1}{2}, \ \forall \ 1 < k \le n. \tag{7.58}$$

If there is a $k > 1$ such that the above formula does not hold, let $r$ be the nearest integer to $u\_{kk-1}$; obviously,

$$|\mu\_{kk-1} - r| \le \frac{1}{2}.$$

In {β1, β2,...,β*n*}, replace β*<sup>k</sup>* with β*<sup>k</sup>* − *r*β*<sup>k</sup>*−1, thus by Lemma 7.32,

$$
\mu\_{kj} \to \mu\_{kj} - r\mu\_{k-1,j}, \ 1 \le j \le k - 1.
$$

Specially, when *j* = *k* − 1,

$$
\mu\_{kk-1} \to \mu\_{kk-1} - r,
$$

Under the new basis, all $\beta\_i^\*$ and all $u\_{ij}$ with $1 \le j < i \le n$, $i \ne k$, remain unchanged, so Eq. (7.58) holds under the new basis.

In the second step of the LLL algorithm, we prove that

$$|\beta\_k^\* - u\_{kk-1}\beta\_{k-1}^\*|^2 \ge \frac{3}{4}|\beta\_{k-1}^\*|^2, \forall \ 1 < k \le n. \tag{7.59}$$

By (7.4),

$$\left| \beta\_k^\* + \mu\_{kk-1} \beta\_{k-1}^\* \right|^2 = \left| \beta\_k^\* - \mu\_{kk-1} \beta\_{k-1}^\* \right|^2.$$

Therefore, the sign in the absolute value on the right of Eq. (7.59) can be changed arbitrarily. If there is a *k*, 1 < *k* ≤ *n* such that (7.59) does not hold, that is

$$\left| \beta\_k^\* + \mu\_{kk-1} \beta\_{k-1}^\* \right|^2 < \frac{3}{4} |\beta\_{k-1}^\*|^2. \tag{7.60}$$

In this case, if β*<sup>k</sup>* and β*<sup>k</sup>*−<sup>1</sup> are exchanged and the other β*<sup>i</sup>* remains unchanged, there is a new set of bases {α1, α2,...,α*n*}, the corresponding orthogonal basis {α<sup>∗</sup> <sup>1</sup> , α<sup>∗</sup> <sup>2</sup> ,...,α<sup>∗</sup> *<sup>n</sup>* } and the coefficient matrix *A*<sup>1</sup> = (v*i j*)*<sup>n</sup>*×*<sup>n</sup>*, where

$$
\alpha\_i = \beta\_i(i \neq k - 1, k), \ \alpha\_{k-1} = \beta\_k, \ \alpha\_k = \beta\_{k-1}.
$$

Let's prove that under the new base {α1, α2,...,α*n*}, there is

$$\left|\alpha\_{k}^{\*} + \upsilon\_{kk-1}\alpha\_{k-1}^{\*}\right|^{2} \geq \frac{3}{4} \left|\alpha\_{k-1}^{\*}\right|^{2},\tag{7.61}$$

by Lemma 7.31,

$$\begin{cases} \alpha\_{k-1}^\* = \beta\_k^\* + \mu\_{kk-1} \beta\_{k-1}^\*, \\ \alpha\_k^\* = \beta\_{k-1}^\* - \upsilon\_{kk-1} \alpha\_{k-1}^\*. \end{cases}$$

By (7.60), we have

$$\left|\alpha\_{k-1}^\*\right|^2 < \frac{3}{4} |\alpha\_k^\* + \upsilon\_{kk-1} \alpha\_{k-1}^\*|^2.$$

That is

$$|\alpha\_k^\* + \upsilon\_{kk-1} \alpha\_{k-1}^\*|^2 > \frac{4}{3} |\alpha\_{k-1}^\*|^2 > \frac{3}{4} |\alpha\_{k-1}^\*|^2.$$

Thus (7.61) holds. Repeating this exchange procedure, it can be proved that formula (7.59) is valid for all $k > 1$; however, when $k$ is replaced by $k - 1$, the new $\beta\_{k-1}^\*$ is replaced by

$$
\beta\_{k-1}^\* \to \beta\_{k-1}^\* + \mu\_{k-1k-2} \beta\_{k-2}^\* = \overline{\beta\_{k-1}^\*}.
$$

We have to prove that (7.59) remains valid for index $k$ when $\beta\_{k-1}^\*$ is replaced by $\overline{\beta\_{k-1}^\*}$. In fact,

$$\begin{split} |\beta\_{k}^{\*} + u\_{kk-1}\overline{\beta}\_{k-1}^{\*}|^{2} &= |\beta\_{k}^{\*} + u\_{kk-1}(\beta\_{k-1}^{\*} + u\_{k-1k-2}\beta\_{k-2}^{\*})|^{2} \\ &= |\beta\_{k}^{\*} + u\_{kk-1}\beta\_{k-1}^{\*}|^{2} + |u\_{kk-1}u\_{k-1k-2}\beta\_{k-2}^{\*}|^{2} \\ &\geq \frac{3}{4} (|\beta\_{k-1}^{\*}|^{2} + u\_{kk-1}^{2}|u\_{k-1k-2}\beta\_{k-2}^{\*}|^{2}) \\ &\geq \frac{3}{4} (|\beta\_{k-1}^{\*}|^{2} + |u\_{k-1k-2}\beta\_{k-2}^{\*}|^{2}) \\ &= \frac{3}{4} |\beta\_{k-1}^{\*} + u\_{k-1k-2}\beta\_{k-2}^{\*}|^{2} \\ &= \frac{3}{4} |\overline{\beta\_{k-1}^{\*}}|^{2}. \end{split}$$

Therefore, Eq. (7.59) does not change when the transformation of commutative vector is carried out continuously; that is, Eq. (7.59) holds for all *k*, 1 < *k* ≤ *n*.

In the third step of the LLL algorithm, we prove that

$$|\mu\_{kj}| \le \frac{1}{2}, \ \forall \ 1 \le j < k \le n. \tag{7.62}$$

When $j = k - 1$, (7.62) is just (7.58). For a given $k$, $1 < k \le n$, if (7.62) does not hold, let $l$ be the largest subscript such that $|u\_{kl}| > \frac{1}{2}$. Let $r$ be the nearest integer to $u\_{kl}$; then $|u\_{kl} - r| \le \frac{1}{2}$. Replace $\beta\_k$ with $\beta\_k - r\beta\_l$; by Lemma 7.32, all $\beta\_i^\*$ remain unchanged and the coefficient matrix is changed to:

$$\begin{aligned} v\_{kj} &= u\_{kj} - ru\_{lj}, \; 1 \le j < l, \\ v\_{kl} &= u\_{kl} - r. \end{aligned}$$

The other $u\_{ij}$ remain unchanged; at this time,

$$|\mu\_{kl} - r| = |v\_{kl}| \le \frac{1}{2}.$$

So we have Eq. (7.62) for all 1 ≤ *j* < *k* ≤ *n*.

The above matrix transformations are equivalent to multiplying a unimodular matrix from the right, so a Reduced basis $B$ of the lattice $L = L(B)$ is finally obtained. We complete the proof of Theorem 7.3.
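The three steps of the proof (size-reduce $\mu\_{k,k-1}$, test the Lovász condition, exchange) are exactly the LLL algorithm. Below is a minimal, non-optimized sketch in Python with exact rational arithmetic; the function and variable names are ours, and the example basis is a standard illustration, not taken from the text. By Lemma 7.32, size reduction leaves the $\beta\_i^\*$ unchanged, so one Gram–Schmidt computation per loop iteration suffices.

```python
from fractions import Fraction as F

def lll(basis, delta=F(3, 4)):
    """Minimal LLL reduction over the integers (exact arithmetic)."""
    B = [[F(x) for x in row] for row in basis]
    n = len(B)

    def gram_schmidt():
        # mu has 1 on the diagonal so size-reduction updates are uniform
        b_star = []
        mu = [[F(int(i == j)) for j in range(n)] for i in range(n)]
        for i in range(n):
            v = B[i][:]
            for j in range(i):
                d = sum(c * c for c in b_star[j])
                mu[i][j] = sum(B[i][t] * b_star[j][t] for t in range(n)) / d
                v = [v[t] - mu[i][j] * b_star[j][t] for t in range(n)]
            b_star.append(v)
        return b_star, mu

    k = 1
    while k < n:
        b_star, mu = gram_schmidt()
        # Step 1/3: size-reduce beta_k; the b_star are unchanged (Lemma 7.32)
        for j in range(k - 1, -1, -1):
            r = round(mu[k][j])
            if r:
                B[k] = [B[k][t] - r * B[j][t] for t in range(n)]
                for t in range(j + 1):
                    mu[k][t] -= r * mu[j][t]
        # Step 2: Lovasz condition (7.57)(ii) between beta_{k-1} and beta_k
        nk = sum(c * c for c in b_star[k])
        nk1 = sum(c * c for c in b_star[k - 1])
        if nk >= (delta - mu[k][k - 1] ** 2) * nk1:
            k += 1
        else:
            # Exchange step of the proof: swap beta_{k-1} and beta_k
            B[k - 1], B[k] = B[k], B[k - 1]
            k = max(k - 1, 1)
    return [[int(x) for x in row] for row in B]

reduced = lll([[1, 1, 1], [-1, 0, 2], [3, 5, 6]])
# reduced == [[0, 1, 0], [1, 0, 1], [-1, 0, 2]]
```

On this example the sketch returns $[(0,1,0), (1,0,1), (-1,0,2)]$; the first vector has length $1$, which is a shortest vector of this lattice, and the swaps and row operations preserve $|\det B| = 3$.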

**Lemma 7.33** *Let L* = *L*(*B*) *be a lattice, B is a Reduced basis of L, and B*<sup>∗</sup> = [β<sup>∗</sup> <sup>1</sup> , β<sup>∗</sup> <sup>2</sup> ,...,β<sup>∗</sup> *<sup>n</sup>* ] *is the corresponding orthogonal basis, then for any* 1 ≤ *j* < *i* ≤ *n, we have*

$$|\beta\_j^\*|^2 \le 2^{i-j} |\beta\_i^\*|^2.$$

*Proof* Because *B* = [β1, β2,...,β*n*] is a Reduced basis, then

$$\left|\beta\_k^\* + \mu\_{kk-1} \beta\_{k-1}^\*\right|^2 \ge \frac{3}{4} |\beta\_{k-1}^\*|^2.$$

Thus

$$|\beta\_k^\* + u\_{kk-1}\beta\_{k-1}^\*|^2 = |\beta\_k^\*|^2 + u\_{kk-1}^2|\beta\_{k-1}^\*|^2 \ge \frac{3}{4}|\beta\_{k-1}^\*|^2.$$

There is

$$\begin{aligned} |\beta\_k^\*|^2 &\ge \frac{3}{4} |\beta\_{k-1}^\*|^2 - \mu\_{kk-1}^2 |\beta\_{k-1}^\*|^2 \\ &\ge \frac{3}{4} |\beta\_{k-1}^\*|^2 - \frac{1}{4} |\beta\_{k-1}^\*|^2 \\ &= \frac{1}{2} |\beta\_{k-1}^\*|^2. \end{aligned}$$

So for given $1 \le j < i \le n$, we have

$$\begin{aligned} |\beta\_i^\*|^2 &\geq \frac{1}{2} |\beta\_{i-1}^\*|^2 \\ &\geq \frac{1}{4} |\beta\_{i-2}^\*|^2 \\ &\geq \dotsb \\ &\geq 2^{-(i-j)} |\beta\_j^\*|^2, \end{aligned}$$

thus

$$|\beta\_j^\*|^2 \le 2^{i-j} |\beta\_i^\*|^2.$$

*Remark 7.3* In the definition of Reduced bases, the coefficient $\frac{3}{4}$ in the second inequality of (7.57) can be replaced by any $\delta$ with $\frac{1}{4} < \delta < 1$. Specially, Babai pointed out in (1986) that the second inequality of Eq. (7.57) can be replaced by the following weaker inequality,

$$|\beta\_{i-1}^\*|^2 \le 2|\beta\_i^\*|^2. \tag{7.63}$$

Let's discuss the computational complexity of the LLL algorithm. Let *B* = {β1, β2,...,β*n*} be any set of bases, for any 0 ≤ *k* ≤ *n*, we define

$$d\_0 = 1, \qquad d\_k = \det\left(\left(\langle \beta\_i, \beta\_j \rangle\right)\_{1 \le i, j \le k}\right). \tag{7.64}$$

If {β<sup>∗</sup> <sup>1</sup> , β<sup>∗</sup> <sup>2</sup> ,...,β<sup>∗</sup> *<sup>n</sup>* } is the orthogonal basis corresponding to {β1, β2,...,β*n*}, there is obviously

$$d\_k = \prod\_{i=1}^k |\beta\_i^\*|^2, 0 < k \le n. \tag{7.65}$$

Thus, each $d\_k$ is a positive number, and $d\_n = d(L)^2$. Let

$$D = \prod\_{k=1}^{n-1} d\_k,\tag{7.66}$$

We first prove that *dk* (0 < *k* ≤ *n*) and *D* have lower bounds.

**Lemma 7.34** *Let*

$$m(L) = \lambda(L)^2 = \min\{|x|^2 : x \in L, x \neq 0\}.$$

*Then*

$$d\_k \ge \left(\frac{3}{4}\right)^{\frac{k(k-1)}{2}} m(L)^k, \ 1 \le k \le n.$$

*Proof* The determinant of the $k$-dimensional lattice $L\_k = L(\beta\_1, \beta\_2, \dots, \beta\_k) \subset \mathbb{R}^k$ ($1 \le k \le n$) satisfies

$$d^2(L\_k) = d\_k.$$

By the conclusion of Cassels (1971), there is a nonzero lattice point $x$ in $L\_k$, that is $x \in L\_k$, $x \ne 0$, satisfying

$$|\mathbf{x}|^2 \le \left(\frac{4}{3}\right)^{\frac{k-1}{2}} d\_k^{\frac{1}{k}}.\tag{7.67}$$

Then

$$\begin{aligned} d\_k &\geq \left(\frac{3}{4}\right)^{\frac{k(k-1)}{2}} m(L\_k)^k\\ &\geq \left(\frac{3}{4}\right)^{\frac{k(k-1)}{2}} (m(L))^k. \end{aligned}$$

The Lemma holds.

Another important conclusion of this section is an estimate, for an integer lattice $L$, of the computational complexity of obtaining a Reduced basis by the LLL algorithm. We prove that the LLL algorithm on an integer lattice runs in polynomial time.

**Theorem 7.4** *Let* $L = L(B) \subset \mathbb{Z}^n$ *be an integer lattice with generating matrix* $B = [\beta\_1, \beta\_2, \dots, \beta\_n]$*, and suppose* $N$ *satisfies*

$$\max\_{1 \le i \le n} \left| \beta\_i \right|^2 \le N.$$

*Then the computational complexity of the Reduced basis of L obtained by B using the LLL algorithm is*

$$\text{Time}(\text{LLL algorithm}) = O(n^4 \log N).$$

*The binary digits of all integers in the LLL algorithm are O*(*n* log *N*)*, so the computational complexity of the LLL algorithm on the integer lattice is polynomial.*

*Proof* By (7.36), we have

$$|\beta\_i^\*| \le |\beta\_i|, \ 1 \le i \le n.$$

where {β<sup>∗</sup> <sup>1</sup> , β<sup>∗</sup> <sup>2</sup> ,...,β<sup>∗</sup> *<sup>n</sup>* } is the orthogonal basis corresponding to {β1, β2,...,β*n*}, then by (7.65) and (7.66), we have

$$d\_k = \prod\_{i=1}^k |\beta\_i^\*|^2 \le \prod\_{i=1}^k |\beta\_i|^2 \le N^k, 1 \le k \le n.$$

And

$$1 \le D \le N^{\frac{n(n-1)}{2}}.\tag{7.68}$$

The inequality on the left of the above formula holds because $d\_k \in \mathbb{Z}$ and $d\_k \ge 1$, so by (7.66), $D \ge 1$. Therefore, $O(n)$ arithmetic operations are required in the first step of the LLL algorithm, $O(n^3)$ arithmetic operations are required in the second and third steps, and the number of bit operations per arithmetic operation is at most Time(calculate $D$); thus

$$\text{Time}(\text{LLL algorithm}) \le O(n^3)\,\text{Time}(\text{calculate } D) = O(n^4 \log N).$$

Therefore, the first conclusion of Theorem 7.4 is proved. The second conclusion is more complex, and we omit it. Interested readers can refer to the original paper (1982) of A. K. Lenstra, H. W. Lenstra and L. Lovász.
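The quantities $d\_k$ and $D$ of (7.64)–(7.66) drive the termination argument: size reductions leave $D$ unchanged, while every exchange step decreases it by a constant factor. They can be computed exactly on a small example (the function name and the basis are ours, not the book's):

```python
from fractions import Fraction as F

def d_k(basis, k):
    """d_k of (7.64): determinant of the k x k Gram matrix (<beta_i, beta_j>)."""
    G = [[F(sum(basis[i][t] * basis[j][t] for t in range(len(basis[0]))))
          for j in range(k)] for i in range(k)]

    def det(M):  # Laplace expansion; fine for tiny k, exact with Fractions
        if len(M) == 1:
            return M[0][0]
        return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
                   for j in range(len(M)))

    return det(G)

B = [[1, 1, 1], [-1, 0, 2], [3, 5, 6]]
D = d_k(B, 1) * d_k(B, 2)     # D of (7.66) for n = 3
assert d_k(B, 3) == 9         # d_n = d(L)^2 = det(B)^2 = (-3)^2
assert D == 42                # d_1 = 3, d_2 = 3*5 - 1^2 = 14
```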

# **7.5 Approximation of SVP and CVP**

The most important application of lattice Reduced bases and the LLL algorithm is to provide approximation algorithms for the shortest vector problem (SVP) and the closest vector problem (CVP), and to obtain some approximation results. Firstly, we prove the following Lemma.

**Lemma 7.35** *Let* $\{\beta\_1, \beta\_2, \dots, \beta\_n\}$ *be a Reduced basis of a lattice* $L$*,* $\{\beta\_1^\*, \beta\_2^\*, \dots, \beta\_n^\*\}$ *be the corresponding orthogonal basis, and* $d(L)$ *be the determinant of* $L$*; then we have*

*(i)*

$$d(L) \le \prod\_{i=1}^{n} |\beta\_i| \le 2^{\frac{n(n-1)}{4}} d(L). \tag{7.69}$$

*(ii)*

$$|\beta\_1| \le 2^{\frac{n-1}{4}} d(L)^{\frac{1}{n}}.\tag{7.70}$$

*Proof* The inequality on the left of (i), called the Hadamard inequality, has been given by Lemma 7.17. The inequality on the right of (i) gives an upper bound of $\prod\_{i=1}^{n} |\beta\_i|$; by Lemma 7.33,

$$|\beta\_j^\*| \le 2^{\frac{i-j}{2}} |\beta\_i^\*|, 1 \le j < i \le n. \tag{7.71}$$

Since

$$
\beta\_i = \beta\_i^\* + \sum\_{j=1}^{i-1} u\_{ij} \beta\_j^\*, \tag{7.72}
$$

We get

$$\begin{aligned} |\beta\_i|^2 &= |\beta\_i^\*|^2 + \sum\_{j=1}^{i-1} \mu\_{ij}^2 |\beta\_j^\*|^2 \\ &\le |\beta\_i^\*|^2 + \frac{1}{4} \sum\_{j=1}^{i-1} |\beta\_j^\*|^2 \\ &\le \left(1 + \frac{1}{4} \sum\_{j=1}^{i-1} 2^{i-j}\right) |\beta\_i^\*|^2 \\ &= \left(1 + \frac{1}{4}(2^i - 2)\right) |\beta\_i^\*|^2 \\ &\le 2^{i-1} |\beta\_i^\*|^2. \end{aligned}$$

There is

$$\begin{aligned} \prod\_{i=1}^n |\beta\_i|^2 &\le \prod\_{i=1}^n 2^{i-1} |\beta\_i^\*|^2 \\ &= 2^{\sum\_{i=0}^{n-1} i} \prod\_{i=1}^n |\beta\_i^\*|^2 \\ &= 2^{\frac{n}{2}(n-1)} \prod\_{i=1}^n |\beta\_i^\*|^2 \\ &= 2^{\frac{n}{2}(n-1)} (d(L))^2. \end{aligned}$$

So

$$\prod\_{i=1}^n |\beta\_i| \le 2^{\frac{n}{4}(n-1)}d(L).$$

Thus (7.69) holds. To prove (ii), by (7.72) and (7.71), then

$$|\beta\_j|^2 \le 2^{j-1} |\beta\_j^\*|^2 \le 2^{j-1} 2^{i-j} |\beta\_i^\*|^2 = 2^{i-1} |\beta\_i^\*|^2. \tag{7.73}$$

For all 1 ≤ *j* ≤ *i* ≤ *n*, especially,

$$|\beta\_1|^2 \le 2^{i-1} |\beta\_i^\*|^2, \ 1 \le i \le n.$$

Thus

$$\begin{aligned} |\beta\_1|^{2n} &\le 2^{\sum\_{i=1}^n (i-1)} \prod\_{i=1}^n |\beta\_i^\*|^2 \\ &= 2^{\frac{n}{2}(n-1)} (d(L))^2. \end{aligned}$$

So

$$|\beta\_1| \le 2^{\frac{n-1}{4}} d(L)^{\frac{1}{n}}.$$

Lemma 7.35 holds!

The following theorem shows that if $\{\beta\_1, \beta\_2, \dots, \beta\_n\}$ is a set of Reduced bases of a lattice $L$, then $\beta\_1$ is an approximation of the shortest vector $u\_0$ of lattice $L$, with approximation coefficient $r\_n = 2^{\frac{n-1}{2}}$.

**Theorem 7.5** *Let L* <sup>=</sup> *<sup>L</sup>*(*B*) <sup>⊂</sup> <sup>R</sup>*<sup>n</sup> be a lattice (full rank lattice), B* = [β1, β2, ...,β*n*] *is a set of Reduced bases of L,* λ<sup>1</sup> = λ(*L*) *is the minimal distance of L, then*

298 7 Lattice-Based Cryptography

$$|\beta\_1| \le 2^{\frac{n-1}{2}} \lambda\_1 = 2^{\frac{n-1}{2}} \lambda(L). \tag{7.74}$$

*Proof* We only need to prove that

$$|\beta\_1|^2 \le 2^{n-1} |x|^2, \ \forall\, x \in L, \ x \ne 0. \tag{7.75}$$

For given $x \in L$, $x \ne 0$, let

$$x = \sum\_{i=1}^{n} r\_i \beta\_i = \sum\_{i=1}^{n} r\_i' \beta\_i^\*, \ r\_i \in \mathbb{Z}, \ r\_i' \in \mathbb{R}, \ 1 \le i \le n.$$

Let $k$ be the largest subscript such that $r\_k \ne 0$; then $r\_k = r\_k'$ (see Remark 7.4 below). So

$$|\mathbf{x}|^2 \ge r\_k^2 |\beta\_k^\*|^2 \ge |\beta\_k^\*|^2 \ge 2^{1-k} |\beta\_1|^2. \tag{7.76}$$

Thus

$$|\beta\_1|^2 \le 2^{k-1} |x|^2 \le 2^{n-1} |x|^2, \ x \in L, \ x \ne 0.$$

That is (7.75) holds, thus Theorem 7.5 holds.
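The guarantee (7.74) can be observed numerically: for a Reduced basis, brute-force $\lambda(L)$ over a box of integer coefficients and compare with $|\beta\_1|$. A small sketch (the basis and the search box are our choices; the box is large enough that the minimum is attained inside it):

```python
from itertools import product

B = [[2, 0], [1, 2]]   # a 2-dimensional Reduced basis (conditions (7.57) hold)

# Brute-force lambda_1^2 over a small box of integer coefficients
lam1_sq = min(
    (a * B[0][0] + b * B[1][0]) ** 2 + (a * B[0][1] + b * B[1][1]) ** 2
    for a, b in product(range(-4, 5), repeat=2)
    if (a, b) != (0, 0)
)
beta1_sq = B[0][0] ** 2 + B[0][1] ** 2

# Squared form of (7.74): |beta_1|^2 <= 2^{n-1} * lambda_1^2 with n = 2
assert beta1_sq <= 2 ** (2 - 1) * lam1_sq
```

Here $\lambda\_1^2 = 4$ is attained by $\beta\_1 = (2, 0)$ itself, so the bound holds with room to spare.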

The following results show that not only the shortest vector but every vector of a Reduced basis approximates the corresponding successive shortest vector of the lattice.

**Lemma 7.36** *Let* $L \subset \mathbb{R}^n$ *be a lattice,* $\{\beta\_1, \beta\_2, \dots, \beta\_n\}$ *be a Reduced basis of* $L$*, and let* $\{x\_1, x\_2, \dots, x\_t\} \subset L$ *be* $t$ *linearly independent lattice points; then*

$$|\beta\_j|^2 \le 2^{n-1} \max\{|x\_1|^2, |x\_2|^2, \dots, |x\_t|^2\}. \tag{7.77}$$

*For all* 1 ≤ *j* ≤ *t holds.*

*Proof* Write

$$x\_j = \sum\_{i=1}^n r\_{ij}\beta\_i, r\_{ij} \in \mathbb{Z}, \ 1 \le i \le n, \ 1 \le j \le t.$$

For fixed $j$, let $i(j)$ be the largest positive integer $i$ such that $r\_{ij} \ne 0$; by (7.76), we have

$$|x\_j|^2 \ge |\beta\_{i(j)}^\*|^2, \ 1 \le j \le t.$$

Change the order of the $x\_j$ to ensure $i(1) \le i(2) \le \dots \le i(t)$; then $j \le i(j)$ holds for all $1 \le j \le t$. Otherwise, if $i(j) < j$ for some $j$, then

$$\{x\_1, x\_2, \dots, x\_j\} \subset L(\beta\_1, \beta\_2, \dots, \beta\_{j-1}),$$

which contradicts the linear independence of $x\_1, x\_2, \dots, x\_j$. Thus $j \le i(j)$. By (7.73) of Lemma 7.35, then

$$\begin{aligned} \left| \beta\_j \right|^2 &\le 2^{i(j)-1} \left| \beta\_{i(j)}^\* \right|^2 \\ &\le 2^{n-1} \left| \beta\_{i(j)}^\* \right|^2 \\ &\le 2^{n-1} \left| x\_j \right|^2, \forall \ 1 \le j \le t. \end{aligned}$$

Thus (7.77) holds, the Lemma holds.

*Remark 7.4* We give a proof of $r\_k = r\_k'$ in Theorem 7.5. Because $k$ is the largest subscript such that $r\_k \ne 0$, we have

$$x = \sum\_{i=1}^{k} r\_i \beta\_i = \sum\_{i=1}^{k} r\_i' \beta\_i^\*.$$

By (7.52) and (7.53),

$$r\_k^{'} = \frac{\langle \mathbf{x}, \beta\_k^\* \rangle}{|\beta\_k^\*|^2}, r\_k = \frac{\langle \mathbf{x}, \beta\_k^\* \rangle}{\langle \beta\_k, \beta\_k^\* \rangle}.$$

Because $\langle \beta\_k, \beta\_k^\* \rangle = \langle \beta\_k^\*, \beta\_k^\* \rangle$, so

$$r\_k' = \frac{\langle x, \beta\_k^\* \rangle}{\langle \beta\_k, \beta\_k^\* \rangle} = r\_k.$$

In order to discuss the approximation of the successive shortest vectors of a lattice, let us recall the definitions of the continuous minimum $\lambda\_1, \lambda\_2, \dots, \lambda\_n$ and the successive shortest vectors of a lattice. By Definition 7.6 and Corollary 7.6 in Sect. 7.2, the continuous minimum $\lambda\_1, \lambda\_2, \dots, \lambda\_n$ of a full rank lattice is attainable: for all $1 \le i \le n$, there is

$$|\alpha\_i| = \lambda\_i, \ \alpha\_i \in L, 1 \le i \le n.$$

Such $\alpha\_1, \alpha\_2, \dots, \alpha\_n$ are called successive shortest vectors: $|\alpha\_i|$ is smallest under the condition that $\alpha\_i$ is linearly independent of $\{\alpha\_1, \alpha\_2, \dots, \alpha\_{i-1}\}$.

**Theorem 7.6** *Let*{β1, β2,...,β*n*} *be a Reduced basis of lattice L, and* λ1, λ2,...,λ*<sup>n</sup> be the continuous minimum of L, then we have*

$$|\beta\_i|^2 \le 2^{n-1} \lambda\_i^2, \ 1 \le i \le n. \tag{7.78}$$

*Proof* We proceed by induction on $i$. Because $\{\beta\_1, \beta\_2, \dots, \beta\_i\}$ is a Reduced basis of the lattice $L\_i$ in $\mathbb{R}^i$, the proposition is obviously true when $i = 1$ (see Theorem 7.5). If the proposition holds for $i - 1$, then by Lemma 7.36,

$$|\beta\_i|^2 \le 2^{n-1} \max\{\lambda\_1^2, \lambda\_2^2, \dots, \lambda\_i^2\} = 2^{n-1}\lambda\_i^2.$$

Therefore, (7.78) holds for all *i*. The Theorem holds.

Next, we use the Reduced basis to solve the closest vector problem (CVP). For any given $t \in \mathbb{R}^n$, because a lattice $L$ has only finitely many lattice points in the ball $\mathrm{Ball}(t, r)$ with center $t$ and radius $r$, there is a lattice point $u\_t$ closest to $t$, that is

$$|u\_t - t| = \min\_{x \in L} |x - t|. \tag{7.79}$$

We use the Reduced basis to find a lattice point $\omega \in L$ such that

$$|\omega - t| \le r\_1(n)|u\_t - t|, \tag{7.80}$$

where $\omega$ is called an approximation of the nearest lattice point $u\_t$, and $r\_1(n)$ is called the approximation coefficient. Following Babai (1986), to approximate the nearest lattice point $u\_t$ we adopt the following two technical means:

(A) Rounding off: for $\forall\, x \in \mathbb{R}^n$, let $B = [\beta\_1, \beta\_2, \dots, \beta\_n]$ be a Reduced basis of lattice $L$. The discard vector $[x]\_B$ of $x$ is defined as follows: let

$$x = \sum\_{i=1}^{n} x\_i \beta\_i, \ x\_i \in \mathbb{R},$$

Let δ*<sup>i</sup>* be the nearest integer to *xi* , then define

$$[x]\_B = \sum\_{i=1}^n \delta\_i \beta\_i, \tag{7.81}$$

[*x*]*<sup>B</sup>* is called the discard vector of *x* under base *B*, write *x* = [*x*]*<sup>B</sup>* + {*x*}*B*, then

$$\{x\}\_B \in \left\{\sum\_{i=1}^n a\_i \beta\_i \,\middle|\, -\frac{1}{2} < a\_i \le \frac{1}{2}, \ 1 \le i \le n\right\}.$$
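The rounding off technique of (7.81) translates directly into code: solve for the coordinates of $x$ in the basis and round each one. A sketch in Python with exact rational arithmetic (the function name and the example are ours):

```python
from fractions import Fraction as F

def babai_round(basis, x):
    """Babai's rounding off (7.81): write x = sum x_i beta_i, round each
    coordinate x_i to the nearest integer delta_i, return [x]_B."""
    n = len(basis)
    # Solve sum_i c_i * basis[i] = x exactly by Gauss-Jordan elimination
    M = [[F(basis[j][i]) for j in range(n)] + [F(x[i])] for i in range(n)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                M[r] = [a - M[r][col] * b for a, b in zip(M[r], M[col])]
    delta = [round(M[i][n]) for i in range(n)]
    return [sum(delta[i] * basis[i][j] for i in range(n)) for j in range(n)]

# x = (17/5, 2) has coordinates (6/5, 1) in this basis; rounding gives
# delta = (1, 1), i.e. [x]_B = beta_1 + beta_2 = (3, 2).
assert babai_round([[2, 0], [1, 2]], [F(17, 5), 2]) == [3, 2]
```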

(B) Adjacent plane

Let $U = \sum\_{i=1}^{n-1} \mathbb{R}\beta\_i = L(\beta\_1, \beta\_2, \dots, \beta\_{n-1}) \subset \mathbb{R}^n$ be an $(n-1)$-dimensional subspace, $L' = \sum\_{i=1}^{n-1} \mathbb{Z}\beta\_i \subset L$ be a sublattice of $L$, and $v \in L$; we call $U + v$ an affine plane of $\mathbb{R}^n$. For given $x \in \mathbb{R}^n$, if the distance between $x$ and $U + v$ is the smallest, $U + v$ is called the nearest affine plane of $x$.

Let $x'$ be the orthogonal projection of $x$ on the nearest affine plane $U + v$, let $y \in L'$ be the vector closest to $x' - v$ in $L'$, and let $\omega = y + v$ be the approximation of the vector closest to $x$ in $L$.

Let $L(\beta\_1, \beta\_2, \dots, \beta\_n) \subset \mathbb{R}^n$ be a lattice and $\{\beta\_1^\*, \beta\_2^\*, \dots, \beta\_n^\*\}$ be the corresponding orthogonal basis. For $\forall\, x \in \mathbb{R}^n$, write $x = \sum\_{i=1}^{n} x\_i \beta\_i^\*$, $x\_i \in \mathbb{R}$, and let $\delta\_i$ denote the nearest integer to $x\_i$. According to the nearest plane method, we take (see Lemma 7.43 below)

$$\begin{cases} U = L(\beta\_1^\*, \beta\_2^\*, \dots, \beta\_{n-1}^\*) = L(\beta\_1, \beta\_2, \dots, \beta\_{n-1}), \\ v = \delta\_n \beta\_n \in L, \\ x' = \sum\_{i=1}^{n-1} x\_i \beta\_i^\* + \delta\_n \beta\_n^\*, \\ y \text{ is the lattice point closest to } x' - v \text{ in the sublattice } L' = \sum\_{i=1}^{n-1} \mathbb{Z}\beta\_i, \\ \omega = y + v. \end{cases} \tag{7.82}$$

We prove that

**Theorem 7.7** *Let* $L = L(B) \subset \mathbb{R}^n$ *be a lattice and* $B = [\beta\_1, \beta\_2, \dots, \beta\_n]$ *a Reduced basis of* $L$*. For given* $x \in \mathbb{R}^n$*, the adjacent plane method produces a lattice point* $\omega = y + v$ *close to* $x$ *in* $L$ *(by (7.82)), satisfying*

$$|\omega - x| \le 2^{\frac{n}{2}} |u\_x - x|, \tag{7.83}$$

*where ux is given by Eq. (7.79) and further*

$$|x - \omega| \le 2^{\frac{n}{2} - 1} |\beta\_n^\*|. \tag{7.84}$$

*Proof* If *<sup>n</sup>* <sup>=</sup> 1, then *<sup>B</sup>* <sup>=</sup> <sup>θ</sup> <sup>∈</sup> <sup>R</sup>, <sup>θ</sup> <sup>=</sup> 0. Let *<sup>x</sup>* <sup>∈</sup> <sup>R</sup>, *<sup>x</sup>* <sup>=</sup> *<sup>x</sup>*1θ, *<sup>L</sup>* <sup>=</sup> *<sup>n</sup>*θ, then when *<sup>n</sup>* <sup>∈</sup> <sup>Z</sup>,

$$|\mathbf{x} - n\theta| = |\mathbf{x}\_1 \theta - n\theta| = |\mathbf{x}\_1 - n||\theta| \ge |\mathbf{x}\_1 - \delta||\theta|,$$

where δ is the nearest integer to *x*1, let ω = δθ, then

$$|\mathbf{x} - \boldsymbol{\omega}| = |\mathbf{x}\_1 - \boldsymbol{\delta}| |\boldsymbol{\theta}| \le |\mathbf{x} - \boldsymbol{n}\boldsymbol{\theta}|, \ \forall \, \boldsymbol{n} \in \mathbb{Z}.$$

So ω = δθ is the lattice point closest to *x* in *L*, so ω = *ux* ∈ *L*, that is

$$|x - \omega| = |u\_x - x|.$$

Thus (7.83) holds.

Let $n \ge 2$. We observe (see (7.82)) that $v = \delta\_n\beta\_n$ and $x' = \sum\_{i=1}^{n-1} x\_i\beta\_i^\* + \delta\_n\beta\_n^\*$, then

$$|\mathbf{x} - \mathbf{x}^{\prime}| = |\mathbf{x}\_n - \delta\_n| |\beta\_n^\*| \le \frac{1}{2} |\beta\_n^\*|,\tag{7.85}$$

since the distance between distinct affine planes $\{U + z \,|\, z \in L\}$ is at least $|\beta\_n^\*|$, and $|x - x'|$ is the distance between $x$ and the nearest affine plane, there is

$$|x - x'| \le |u\_x - x|. \tag{7.86}$$

Let ω = *y* + v = *y* + δ*n*β*<sup>n</sup>* ∈ *L*, we prove that

$$|x - \omega|^2 = |x - x'|^2 + |x' - \omega|^2. \tag{7.87}$$

Because $x - x' = (x\_n - \delta\_n)\beta\_n^\*$ and $x' - \omega = x' - v - y \in U$, we have $(x - x') \perp (x' - \omega)$. Therefore, by the Pythagorean theorem, (7.87) holds. Applying (7.85) at each level of the recursion, we have

$$\left|\mathbf{x} - \boldsymbol{\omega}\right|^2 \le \frac{1}{4} (|\boldsymbol{\beta}\_1^\*|^2 + |\boldsymbol{\beta}\_2^\*|^2 + \dots + |\boldsymbol{\beta}\_n^\*|^2).$$

By (7.71),

$$|\beta\_i^\*|^2 \le 2^{n-i} |\beta\_n^\*|^2.$$

Thus

$$\begin{aligned} |x - \omega|^2 &\le \frac{1}{4} |\beta\_n^\*|^2 (1 + 2 + 2^2 + \dots + 2^{n-1}) \\ &= \frac{1}{4} (2^n - 1) |\beta\_n^\*|^2 \\ &\le 2^{n-2} |\beta\_n^\*|^2. \end{aligned}$$

There is

$$|\mathbf{x} - \boldsymbol{\omega}| \le 2^{\frac{n}{2} - 1} |\boldsymbol{\beta}\_n^\*|,\tag{7.88}$$

that is (7.84) holds. To prove (7.83), we have two situations:

Case 1: $u\_x \in U + v$.

In this case, $u\_x - v \in U$ and hence $u\_x - v \in L'$; applying the induction hypothesis to the sublattice $L'$, there is

$$|\mathbf{x}' - \boldsymbol{\omega}| = |\mathbf{x}' - \boldsymbol{v} - \mathbf{y}| \le C\_{n-1}|\mathbf{x}' - \boldsymbol{u}\_x| \le C\_{n-1}|\mathbf{x} - \boldsymbol{u}\_x|,$$

where $C\_n = 2^{\frac{n}{2}}$. By (7.87), we have

$$|x - \omega| \le (1 + C\_{n-1}^2)^{\frac{1}{2}} |x - u\_x| < C\_n |x - u\_x|.$$

The proposition holds.

Case 2: If $u\_x \notin U + v$, then

$$|x - u\_x| \ge \frac{1}{2} |\beta\_n^\*|.$$

By (7.88), we get

$$|x - \omega| \le 2^{\frac{n}{2}}|x - u\_x|.$$

Thus, Theorem 7.7 holds.
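The recursion behind (7.82) unrolls into a simple loop: working from $\beta\_n$ down to $\beta\_1$, round the $\beta^\*$-coordinate of the current target and subtract the corresponding lattice vector. A sketch in Python with exact rational arithmetic (function names and the example are ours):

```python
from fractions import Fraction as F

def gram_schmidt(basis):
    """Return the orthogonal vectors beta*_i (exact arithmetic)."""
    b_star = []
    for row in basis:
        v = [F(c) for c in row]
        for bj in b_star:
            mu = sum(F(c) * b for c, b in zip(row, bj)) / sum(b * b for b in bj)
            v = [vt - mu * bt for vt, bt in zip(v, bj)]
        b_star.append(v)
    return b_star

def nearest_plane(basis, x):
    """Babai's nearest plane method, the recursion of (7.82): from beta_n
    down to beta_1, snap to the closest affine plane and recurse."""
    b_star = gram_schmidt(basis)
    t = [F(c) for c in x]
    w = [F(0)] * len(x)
    for i in range(len(basis) - 1, -1, -1):
        # delta_i = nearest integer to <t, beta*_i> / |beta*_i|^2
        c = round(sum(tk * bk for tk, bk in zip(t, b_star[i]))
                  / sum(b * b for b in b_star[i]))
        t = [tk - c * F(bk) for tk, bk in zip(t, basis[i])]
        w = [wk + c * F(bk) for wk, bk in zip(w, basis[i])]
    return w

# For the Reduced basis [[2,0],[1,2]] and x = (16/5, 9/5), the method returns
# the lattice point (3, 2), which is indeed the closest one here.
assert nearest_plane([[2, 0], [1, 2]], [F(16, 5), F(9, 5)]) == [3, 2]
```

Theorem 7.7 guarantees that the output is within a factor $2^{\frac{n}{2}}$ of the true closest lattice point whenever the input basis is Reduced.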

Comparing Theorems 7.6 and 7.7: when $x = 0$, the approximation coefficient of Theorem 7.6 is $2^{\frac{n-1}{2}}$; for general $x \in \mathbb{R}^n$, there is an additional factor $\sqrt{2}$ in the approximation coefficient. Using the rounding off technique, we can also give an approximation to the closest vector; another main result in this section is

**Theorem 7.8** *Let* $B = [\beta\_1, \beta\_2, \dots, \beta\_n]$ *be a Reduced basis of* $L$*,* $x \in \mathbb{R}^n$ *be given arbitrarily,* $u\_x \in L$ *be the lattice point closest to* $x$*, and* $[x]\_B$ *be given by Eq. (7.81); then* $\omega = [x]\_B \in L$*, and*

$$|x - [x]\_B| \le \left(1 + 2n\left(\frac{9}{2}\right)^{\frac{n}{2}}\right)|x - u\_x|. \tag{7.89}$$

By Theorem 7.8, $[x]\_B \in L$ is an approximation of the nearest lattice point $u\_x$, with approximation coefficient $\gamma\_1(n) = 1 + 2n\left(\frac{9}{2}\right)^{\frac{n}{2}}$. It is a little worse than the approximation coefficient produced by the adjacent plane method, but the approximation vector is simpler to compute. In lattice cryptosystems, $[x]\_B$ as input information has higher efficiency. To prove Theorem 7.8, we need the following Lemma.

**Lemma 7.37** *Let* $B = [\beta\_1, \beta\_2, \dots, \beta\_n]$ *be a Reduced basis of* $\mathbb{R}^n$*, and let* $\theta\_k$ *denote the angle between the vector* $\beta\_k$ *and the subspace* $U\_k$*, where*

$$U\_k = \sum\_{i \neq k} \mathbb{R} \beta\_i. \tag{7.90}$$

*Then for each k,* 1 ≤ *k* ≤ *n, we have*

$$
\sin \theta\_k \ge \left(\frac{\sqrt{2}}{3}\right)^n. \tag{7.91}
$$

*Proof* For given $1 \le k \le n$ and $\forall\, m \in U\_k$, we prove

$$|\beta\_k| \le \left(\frac{9}{2}\right)^{\frac{n}{2}} |m - \beta\_k|, m \in U\_k. \tag{7.92}$$

Because

$$\sin \theta\_k = \min\_{m \in U\_k} \frac{|m - \beta\_k|}{|\beta\_k|},$$

so (7.92) implies (7.91) and the Lemma holds. To prove (7.92), let $\{\beta\_1^\*, \beta\_2^\*, \dots, \beta\_n^\*\}$ be the orthogonal basis corresponding to the Reduced basis $\{\beta\_1, \beta\_2, \dots, \beta\_n\}$; then $m \in U\_k$ can be expressed as

$$m = \sum\_{i \neq k} a\_i \beta\_i = \sum\_{j=1}^n b\_j \beta\_j^\*, a\_i, b\_j \in \mathbb{R}.$$

Write

$$m = (a\_1, \dots, a\_n) \begin{bmatrix} \beta\_1 \\ \beta\_2 \\ \vdots \\ \beta\_n \end{bmatrix} = (a\_1, \dots, a\_n) U \begin{bmatrix} \beta\_1^\* \\ \beta\_2^\* \\ \vdots \\ \beta\_n^\* \end{bmatrix}.$$

where *ak* = 0, *U* is the transition matrix of Gram–Schmidt orthogonalization (see (7.87)). Then for any 1 ≤ *j* ≤ *n*, 1 ≤ *k* ≤ *n*, there is

$$b\_j = \sum\_{i \neq k} a\_i \mu\_{ij}, \beta\_k = \sum\_{i=1}^n \mu\_{ki} \beta\_i^\*.$$

So

$$m - \beta\_k = \sum\_{j=1}^{n} \gamma\_j \beta\_j^\*, \text{ where } \gamma\_j = b\_j - \mu\_{kj}.$$

Let *ak* = −1, then

$$\gamma\_j = \sum\_{i=1}^n a\_i u\_{ij} = a\_j + \sum\_{i=j+1}^n a\_i u\_{ij}. \tag{7.93}$$

Therefore, Eq. (7.92) can be rewritten as

$$|\beta\_k|^2 = \sum\_{j=1}^k \mu\_{kj}^2 |\beta\_j^\*|^2 \le \left(\frac{9}{2}\right)^{\frac{n}{2}} \sum\_{j=1}^n \gamma\_j^2 |\beta\_j^\*|^2. \tag{7.94}$$

Let us first prove the following assertion:

$$\sum\_{j=k}^{n} \gamma\_j^2 \ge \left(\frac{2}{3}\right)^{2(n-k)}. \tag{7.95}$$

If the above formula does not hold, i.e.,

$$\sum\_{j=k}^{n} \gamma\_j^2 < \left(\frac{2}{3}\right)^{2(n-k)}.$$

Then for all *j*, *k* ≤ *j* ≤ *n*, there is

$$\nu\_j^2 < \left(\frac{2}{3}\right)^{2(n-k)} \Rightarrow |\nu\_j| < \left(\frac{2}{3}\right)^{n-k}.\tag{7.96}$$

By (7.93),

$$\begin{cases} \nu\_n = a\_n \\ \nu\_{n-1} = a\_{n-1} + a\_n \mu\_{n,n-1} \\ \nu\_{n-2} = a\_{n-2} + a\_{n-1} \mu\_{n-1,n-2} + a\_n \mu\_{n,n-2} \\ \quad\cdots \\ \nu\_k = a\_k + a\_{k+1} \mu\_{k+1,k} + \cdots + a\_n \mu\_{n,k} \end{cases}$$

We can prove

$$|a\_j| < \left(\frac{3}{2}\right)^{n-j} \cdot \left(\frac{2}{3}\right)^{n-k}.\tag{7.97}$$

Because when *j* = *n*, *an* = ν*n*, (7.96) ensures that (7.97) holds. We proceed by reverse induction on *j* (*k* ≤ *j* ≤ *n*): by (7.93) and |μ*ij*| ≤ 1/2,

$$\begin{aligned} |a\_j| &= \Big|\nu\_j - \sum\_{i=j+1}^n a\_i \mu\_{ij}\Big| \le |\nu\_j| + \sum\_{i=j+1}^n \frac{|a\_i|}{2} \\ &< \left(\frac{2}{3}\right)^{n-k} + \frac{1}{2} \sum\_{i=j+1}^n \left(\frac{3}{2}\right)^{n-i} \left(\frac{2}{3}\right)^{n-k} \\ &= \left(\frac{2}{3}\right)^{n-k} + \frac{1}{2} \left(\frac{2}{3}\right)^{n-k} \sum\_{i=0}^{n-j-1} \left(\frac{3}{2}\right)^i \\ &= \left(\frac{2}{3}\right)^{n-k} + \left(\frac{2}{3}\right)^{n-k} \left(\left(\frac{3}{2}\right)^{n-j} - 1\right) \\ &= \left(\frac{3}{2}\right)^{n-j} \left(\frac{2}{3}\right)^{n-k} .\end{aligned}$$

Therefore, under the assumption (7.96), we have (7.97). Taking *j* = *k* in (7.97) gives |*ak*| < 1, but *ak* = −1; this contradiction shows that (7.96) does not hold, and thus (7.95) holds.

We now prove Formula (7.94) to complete the proof of Lemma. By Lemma 7.33,

$$|\beta\_k^\*|^2 \ge 2^{j-k} |\beta\_j^\*|^2, \ 1 \le j \le k \le n.$$

And

$$|\beta\_k^\*|^2 \le 2^{j-k} |\beta\_j^\*|^2, \; 1 \le k \le j \le n.$$

Therefore, there is an estimate on the left of Eq. (7.94)

$$\begin{aligned} \sum\_{j=1}^{k} \mu\_{kj}^2 |\beta\_j^\*|^2 &\le |\beta\_k^\*|^2 \sum\_{j=1}^{k} \mu\_{kj}^2\, 2^{k-j} \\ &\le |\beta\_k^\*|^2 \Big(1 + \frac{1}{4} \sum\_{j=1}^{k-1} 2^{k-j}\Big) \\ &= |\beta\_k^\*|^2 \Big(1 + \frac{2^k - 2}{4}\Big) \\ &< 2^k |\beta\_k^\*|^2, \end{aligned}$$

where we used μ*kk* = 1 and |μ*kj*| ≤ 1/2 for *j* < *k*.

On the other hand, there is an estimate on the right of (7.94),

$$\begin{aligned} \sum\_{j=1}^n \nu\_j^2 |\beta\_j^\*|^2 &\geq \sum\_{j=k}^n \nu\_j^2 |\beta\_j^\*|^2 \\ &\geq \sum\_{j=k}^n \nu\_j^2\, 2^{k-j} |\beta\_k^\*|^2 \\ &\geq 2^{k-n} |\beta\_k^\*|^2 \sum\_{j=k}^n \nu\_j^2 \\ &\geq 2^{k-n} \left(\frac{2}{3}\right)^{2(n-k)} |\beta\_k^\*|^2 \\ &= \left(\frac{2}{9}\right)^{n-k} |\beta\_k^\*|^2. \end{aligned}$$

Since 2^k ≤ (9/2)^k = (9/2)^n (2/9)^{n−k}, combining the two estimates yields (7.94). We complete the proof of Lemma 7.37.
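The bound (7.91) can be checked numerically in a small case. The sketch below (the two-dimensional basis is an illustrative choice, not taken from the text) computes sin θk as the distance from βk to the line spanned by the other basis vector, divided by |βk|:

```python
from math import sqrt

def sin_angle_to_line(b, u):
    """sin of the angle between vector b and the line spanned by u (2-D case)."""
    dot = b[0] * u[0] + b[1] * u[1]
    uu = u[0] ** 2 + u[1] ** 2
    proj = (dot / uu * u[0], dot / uu * u[1])        # projection of b onto u
    dist = sqrt((b[0] - proj[0]) ** 2 + (b[1] - proj[1]) ** 2)
    return dist / sqrt(b[0] ** 2 + b[1] ** 2)        # dist(b, span u) / |b|

n = 2
beta = [(2, 1), (0, 2)]          # a sample well-reduced basis of R^2
bound = (sqrt(2) / 3) ** n       # right-hand side of (7.91)
for k in range(n):
    s = sin_angle_to_line(beta[k], beta[1 - k])
    assert s >= bound
```

For this basis both angles have sine about 0.894, comfortably above the bound (√2/3)² ≈ 0.222.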

Now we give the proof of Theorem 7.8:

*Proof* (*The proof of Theorem* 7.8) Let *B* = {β1, β2,...,β*n*} be a Reduced basis of the lattice *L* = *L*(*B*). Fix 1 ≤ *k* ≤ *n*, and let *Uk* be the linear subspace generated by *B* − {β*k*}. By Lemma 7.37, we have

$$|\beta\_k| \le \left(\frac{9}{2}\right)^{\frac{n}{2}} |m - \beta\_k|, \ \forall\, m \in U\_k. \tag{7.98}$$

Let *x* ∈ ℝ*n*, ω = [*x*]*B* ∈ *L*, then

$$x - \omega = x - [x]\_B = \sum\_{i=1}^{n} c\_i \beta\_i, \ |c\_i| \le \frac{1}{2}\ (1 \le i \le n).$$

Let *ux* ∈ *L* be the lattice point nearest to *x*, and let


$$
u\_x - \omega = \sum\_{i=1}^n a\_i \beta\_i, \ a\_i \in \mathbb{Z}.
$$

We prove

$$|u\_x - \omega| \le 2n\left(\frac{9}{2}\right)^{\frac{n}{2}}|u\_x - x|.\tag{7.99}$$

We may assume *ux* ≠ ω (otherwise (7.99) is trivial), and suppose

$$|a\_k \beta\_k| = \max\_{1 \le j \le n} |a\_j \beta\_j| > 0.$$

Obviously,

$$|u\_x - \omega| \le n|a\_k\beta\_k|.\tag{7.100}$$

On the other hand,

$$
u\_x - x = (u\_x - \omega) + (\omega - x) = \sum\_{i=1}^n (a\_i + c\_i)\beta\_i = (a\_k + c\_k)(\beta\_k - m),
$$

where

$$m = -\frac{1}{a\_k + c\_k} \sum\_{j \neq k} (a\_j + c\_j)\beta\_j \in U\_k.$$

By (7.98),

$$|u\_x - x| = |a\_k + c\_k|\, |\beta\_k - m| \ge \frac{1}{2} \left(\frac{2}{9}\right)^{\frac{n}{2}} |\beta\_k|\, |a\_k|,$$

since *ak* is a nonzero integer and |*ck*| ≤ 1/2, so that |*ak* + *ck*| ≥ |*ak*|/2.

There is

$$|a\_k \beta\_k| \le 2\left(\frac{9}{2}\right)^{\frac{n}{2}} |u\_x - x|.$$

So

$$|u\_x - \omega| \le 2n\left(\frac{9}{2}\right)^{\frac{n}{2}}|u\_x - x|.$$

That is, (7.99) holds. Finally,

$$|x - \omega| \le |x - u\_x| + |u\_x - \omega| \le \left(1 + 2n\left(\frac{9}{2}\right)^{\frac{n}{2}}\right)|x - u\_x|.$$

We complete the proof of Theorem 7.8.
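The proof uses Babai's rounding-off point ω = [x]_B = B[B^{-1}x]. A minimal exact-arithmetic sketch in dimension 2 (the basis and target below are illustrative choices, not from the text):

```python
from fractions import Fraction as Fr
from math import floor

def rnd(t):
    """nearest integer to t (|t - rnd(t)| <= 1/2)."""
    return floor(t + Fr(1, 2))

def babai_round(B, x):
    """Babai rounding-off: the lattice point [x]_B = B [B^{-1} x],
    for a 2x2 basis matrix B whose columns are the basis vectors."""
    (a, b), (c, d) = B
    det = a * d - b * c
    t1 = (d * x[0] - b * x[1]) / Fr(det)   # first coordinate of B^{-1} x
    t2 = (-c * x[0] + a * x[1]) / Fr(det)
    k1, k2 = rnd(t1), rnd(t2)
    return (a * k1 + b * k2, c * k1 + d * k2)

# lattice basis columns (2,0) and (1,3); x is close to 3*(2,0) + 1*(1,3) = (7,3)
B = ((2, 1), (0, 3))
print(babai_round(B, (Fr(69, 10), Fr(31, 10))))   # -> (7, 3)
```

The returned point (7, 3) is, in this easy instance, also the exact closest vector; the theorem bounds how far the rounded point can be from it in general.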

# **7.6 GGH/HNF Cryptosystem**

Lattice-based cryptosystems are the main research object of postquantum cryptography. Since they were first proposed in 1996, they have a history of only some twenty years. The representative schemes are the Ajtai–Dwork cryptosystem, the GGH cryptosystem, the McEliece–Niederreiter cryptosystem based on algebraic coding theory, and the NTRU cryptosystem. We will introduce them, respectively, below.

The GGH cryptosystem is a cryptosystem based on lattice theory proposed by Goldreich, Goldwasser and Halevi in 1997. It is generally considered a candidate public key cryptosystem to replace RSA in the postquantum cryptosystem era.

Let *L* ⊂ ℤ*n* be an integer lattice, and let *B* and *R* be two generating matrices of *L*, that is

$$L = L(B) = L(R).$$

Because there is a unique HNF basis in *L* (see Lemma 3.4), let *B* = HNF(*L*) be the HNF matrix, with *B* as the public key and *R* as the private key. Let v ∈ ℤ*n* be an integer point and *e* ∈ ℝ*n* an error vector. Let σ be a parameter vector, and take *e* = σ or *e* = −σ, each chosen with probability 1/2.

Encryption: for the encoded plaintext v ∈ ℤ*n* and the error vector *e* randomly selected according to the parameter vector σ, the public key *B* is used for encryption. The encryption function *fB*,σ is defined as

$$f\_{B, \sigma}(v, e) = Bv + e = c \in \mathbb{R}^n. \tag{7.101}$$

Decryption: we decrypt the ciphertext *c* with the private key *R*. Because *c* ∈ ℝ*n* and *R* = [α1, α2,...,α*n*], *c* can be expressed linearly in {α1, α2,...,α*n*}:

$$c = \sum\_{i=1}^{n} \boldsymbol{x}\_i \boldsymbol{\alpha}\_i \text{ , } \boldsymbol{x}\_i \in \mathbb{R} \text{.} $$

Let δ*<sup>i</sup>* be the nearest integer to *xi* , define (see (7.81))

$$[c]\_R = \sum\_{i=1}^n \delta\_i \alpha\_i \in L. \tag{7.102}$$

Define the decryption function as

$$\begin{cases} f\_{B,\sigma}^{-1}(c) = B^{-1}[c]\_R = v, \\ e = c - Bv. \end{cases} \tag{7.103}$$
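The maps (7.101)–(7.103) can be traced on a toy two-dimensional instance. In the sketch below, the private basis R, the unimodular matrix U and the error vector e are illustrative choices (not parameters from the text); real instantiations use high dimension:

```python
from fractions import Fraction as Fr
from math import floor

def rnd(t):
    """nearest integer to t."""
    return floor(t + Fr(1, 2))

def mat_vec(M, v):
    """2x2 matrix times a vector."""
    return (M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1])

def inv2(M):
    """exact inverse of a 2x2 matrix."""
    (a, b), (c, d) = M
    det = Fr(a * d - b * c)
    return ((d / det, -b / det), (-c / det, a / det))

def mat_mul(A, B):
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))

R = ((5, 1), (-1, 6))        # private key: short, nearly orthogonal basis
U = ((1, 3), (1, 4))         # unimodular matrix, det U = 1
B = mat_mul(R, U)            # public key: a "bad" basis of the same lattice

v = (2, -3)                  # plaintext
e = (Fr(1, 2), Fr(-1, 2))    # small error vector
c = tuple(p + q for p, q in zip(mat_vec(B, v), e))   # c = Bv + e, see (7.101)

# decryption (7.103): [c]_R = R [R^{-1} c], then v = B^{-1} [c]_R
t = mat_vec(inv2(R), c)
c_R = mat_vec(R, (rnd(t[0]), rnd(t[1])))
v_rec = mat_vec(inv2(B), c_R)
assert v_rec == v            # decryption succeeds: the rounding removed e
```

Decryption succeeds here precisely because the rounding [R⁻¹c] recovers the exact coefficient vector of Bv; the next paragraphs make this condition precise.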

In order to verify the correctness of the decryption function *f*−1*B*,σ, we first prove the following simple Lemma. For any *x* ∈ ℝ*n*, let *R* = [α1, α2,...,α*n*] ∈ ℝ*n*×*n* be any basis of ℝ*n*. If *x* = (*a*1, *a*2,..., *an*) ∈ ℝ*n* and γ*i* denotes the integer closest to *ai*, then define (see (7.7))

$$[x] = (\gamma\_1, \gamma\_2, \dots, \gamma\_n) \in \mathbb{Z}^n. \tag{7.104}$$

Write *x* = *x*1α1 + *x*2α2 + ··· + *xn*α*n*, and let δ*i* be the nearest integer to *xi*; then define (see (7.102))

$$[x]\_R = \sum\_{i=1}^n \delta\_i \alpha\_i \in L(R). \tag{7.105}$$

**Lemma 7.38** *For any x* ∈ ℝ*n and any basis R* ∈ ℝ*n*×*n of* ℝ*n, we have*

$$[x]\_R = R[R^{-1}x].$$

*Proof* Write

$$\boldsymbol{x} = \begin{bmatrix} a\_1 \\ a\_2 \\ \vdots \\ a\_n \end{bmatrix} \in \mathbb{R}^n \Rightarrow \left[ \boldsymbol{x} \right] = \begin{bmatrix} \delta\_1 \\ \delta\_2 \\ \vdots \\ \delta\_n \end{bmatrix} \in \mathbb{Z}^n, \left| a\_i - \delta\_i \right| \le \frac{1}{2}.$$

If *x* = *x*1α1 + *x*2α2 + ··· + *xn*α*n*, *R* = [α1, α2,...,α*n*], then

$$x = R \begin{bmatrix} x\_1 \\ x\_2 \\ \vdots \\ x\_n \end{bmatrix}, \text{ and } [x]\_R = R \begin{bmatrix} \delta\_1 \\ \delta\_2 \\ \vdots \\ \delta\_n \end{bmatrix}, \ \delta\_i \text{ the nearest integer to } x\_i.$$

Thus

$$R^{-1}[x]\_R = \begin{bmatrix} \delta\_1\\ \delta\_2\\ \vdots\\ \delta\_n \end{bmatrix} = [R^{-1}x].$$

Lemma 7.38 holds.

**Theorem 7.9** *Let L* = *L*(*R*) = *L*(*B*) ⊂ ℤ*n be an integer lattice, B the public key, R the private key,* v ∈ ℤ*n the plaintext, and e the error vector. Then*

$$f\_{B, \sigma}^{-1}(c) = v \Longleftrightarrow [R^{-1}e] = 0.$$

*Proof* By definition, the ciphertext is *c* = *B*v + *e* = *fB*,σ (v, *e*), and

$$f\_{B, \sigma}^{-1}(c) = B^{-1}[c]\_R = B^{-1}R[R^{-1}c] = T[R^{-1}c].\tag{7.106}$$

where *T* = *B*−1*R* ∈ ℝ*n*×*n*. Because *L*(*B*) = *L*(*R*), we have

$$B = RU, \quad U \in GL\_n(\mathbb{Z}).$$

So

$$T = B^{-1}R = U^{-1}R^{-1}R = U^{-1},$$

that is *T* is a unimodular matrix. By (7.106),

$$\begin{aligned} T[R^{-1}c] &= T[R^{-1}(Bv+e)] \\ &= T[R^{-1}Bv + R^{-1}e] \\ &= T[T^{-1}v + R^{-1}e]. \end{aligned}$$

Because *<sup>T</sup>* is a unimodular matrix, <sup>v</sup> <sup>∈</sup> <sup>Z</sup>*<sup>n</sup>*, so

$$[T^{-1}v + R^{-1}e] = T^{-1}v + [R^{-1}e].\tag{7.107}$$

Thus

$$T[R^{-1}c] = v + T[R^{-1}e].$$

That is

$$f\_{B, \sigma}^{-1}(c) = v + T[R^{-1}e].$$

Because *T* is a unimodular matrix, *T* [*R*−1*e*] = 0 ⇔ [*R*−1*e*] = 0, so the Theorem holds.

By Theorem 7.9, whether the GGH cryptographic mechanism is correct or not depends entirely on whether [*R*−1*e*] is a 0 vector, that is

$$f\_{B, \sigma}^{-1}(c) = v \Leftrightarrow [R^{-1}e] = 0. \tag{7.108}$$

Therefore, when the private key *R* is given, the selection of the error vector *e* and the parameter vector σ becomes the key to the correctness of the GGH cryptosystem. Notice from (7.106) that, if we decrypt with the public key *B*, then

$$[B^{-1}c] = [B^{-1}(Bv+e)] = [v+B^{-1}e] = v+[B^{-1}e].$$

Therefore, the basic condition for the security and correctness of the GGH cryptosystem is

$$\begin{cases} \left[ R^{-1} e \right] = 0 \\ \left[ B^{-1} e \right] \neq 0. \end{cases} \tag{7.109}$$

Because the public key *B* we choose is an HNF matrix, [*B*−1*e*] ≠ 0 is easy to satisfy. Let *B* = (*bij*)*n*×*n* ⇒ *B*−1 = (*cij*)*n*×*n*, where *cii* = *b*−1*ii*. Let *e* = (*e*1, *e*2,..., *en*), where each *ei* has the same absolute value |*ei*| = σ, σ being the parameter. Then 2|*en*| > *bnn* ⇒ [*B*−1*e*] ≠ 0. Let us focus on [*R*−1*e*] = 0.

<sup>∀</sup> *<sup>x</sup>* <sup>=</sup> (*x*1, *<sup>x</sup>*2,..., *xn*) <sup>∈</sup> <sup>R</sup>*<sup>n</sup>*, define the *<sup>L</sup>*<sup>1</sup> norm <sup>|</sup>*x*|<sup>1</sup> and *<sup>L</sup>*<sup>∞</sup> norm <sup>|</sup>*x*|∞ of *<sup>x</sup>* as

$$\|\mathbf{x}\|\_{\infty} = \max\_{1 \le i \le n} |\mathbf{x}\_i|, \|\mathbf{x}\|\_1 = \sum\_{i=1}^n |\mathbf{x}\_i|. \tag{7.110}$$

**Lemma 7.39** *Let R* ∈ ℝ*n*×*n be an invertible square matrix, and let* α*1,* α*2, ...,* α*n denote the row vectors of R*−1*. Let e* = (*e*1, *e*2,..., *en*) ∈ ℝ*n with* |*ei*| = σ *for all* 1 ≤ *i* ≤ *n, and let*

$$\rho = \max\_{1 \le i \le n} |\alpha\_i|\_1 \tag{7.111}$$

*be the maximum of the L*1 *norms of the n row vectors of R*−1*. Then when* σ < 1/(2ρ)*, we have* [*R*−1*e*] = 0*.*

*Proof* Suppose α*i* = (*ci*1, *ci*2,..., *cin*). The *i*-th component of *R*−1*e* can be bounded as

$$\Big|\sum\_{j=1}^n c\_{ij} e\_j \Big| \le \sigma \sum\_{j=1}^n |c\_{ij}| = \sigma |\alpha\_i|\_1 \le \sigma \rho.$$

If σ < 1/(2ρ), then each component of *R*−1*e* has absolute value < 1/2, so [*R*−1*e*] = 0.
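Lemma 7.39 gives a sufficient condition that is easy to test. The sketch below (with an illustrative 2 × 2 private matrix R, not from the text) verifies that for σ slightly below 1/(2ρ), every sign pattern e = (±σ, ±σ) satisfies [R⁻¹e] = 0:

```python
from fractions import Fraction as Fr
from itertools import product
from math import floor

def rnd(t):
    """nearest integer to t."""
    return floor(t + Fr(1, 2))

R = ((5, 1), (-1, 6))                    # illustrative private matrix
det = R[0][0] * R[1][1] - R[0][1] * R[1][0]
R_inv = ((Fr(R[1][1], det), Fr(-R[0][1], det)),
         (Fr(-R[1][0], det), Fr(R[0][0], det)))

rho = max(abs(row[0]) + abs(row[1]) for row in R_inv)   # max L1 norm of rows
sigma = Fr(1, 2) / rho - Fr(1, 100)                     # any sigma < 1/(2 rho)

# every error vector with components +-sigma rounds to the zero vector
for s1, s2 in product((1, -1), repeat=2):
    e = (s1 * sigma, s2 * sigma)
    a = tuple(row[0] * e[0] + row[1] * e[1] for row in R_inv)
    assert (rnd(a[0]), rnd(a[1])) == (0, 0)
```

Exact rational arithmetic is used so that the comparison with 1/2 is not blurred by floating-point rounding.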

**Lemma 7.40** *Let R* ∈ ℝ*n*×*n, let* α*1,* α*2, ...,* α*n be the row vectors of R*−1*, and suppose* max1≤*i*≤*n* |α*i*|∞ = γ/√*n. Then the probability of* [*R*−1*e*] ≠ 0 *satisfies*

$$P\{[R^{-1}e] \neq 0\} \leq 2n \exp\left(-\frac{1}{8\sigma^2 \gamma^2}\right). \tag{7.112}$$

*where* σ *is the parameter, error vector e* = (*e*1,..., *en*)*,* |*ei*| = σ*.*

$$Proof \text{ Let } R^{-1} = (c\_{ij})\_{n \times n}, \ R^{-1}e = \begin{bmatrix} a\_1 \\ a\_2 \\ \vdots \\ a\_n \end{bmatrix} \text{, where } a\_i = \sum\_{j=1}^n c\_{ij} e\_j.$$

Because |*cij*| ≤ γ/√*n* and |*ej*| = σ, each *cij ej* lies in the interval [−γσ/√*n*, γσ/√*n*]; therefore, by the Hoeffding inequality, we have


$$P\left\{|a\_i|>\frac{1}{2}\right\}=P\left\{\Big|\sum\_{j=1}^n c\_{ij}e\_j\Big|>\frac{1}{2}\right\}<2\exp\left(-\frac{1}{8\sigma^2\gamma^2}\right).$$

If [*R*−1*e*] ≠ 0, then at least one of the events {|*ai*| > 1/2} must occur. Thus

$$\begin{aligned} P\{ [R^{-1}e] \neq 0 \} &= P\left\{ \bigcup\_{i=1}^{n} \left\{ |a\_i| > \frac{1}{2} \right\} \right\} \\ &\leq \sum\_{i=1}^{n} P\left\{ |a\_i| > \frac{1}{2} \right\} \\ &< 2n \exp\left( -\frac{1}{8\sigma^2 \gamma^2} \right). \end{aligned}$$

The Lemma holds.

**Corollary 7.8** *For any given* ε > 0*, when the parameter* σ *satisfies*

$$
\sigma \le \left(\gamma\sqrt{8\log\frac{2n}{\varepsilon}}\right)^{-1} \Rightarrow P\{[R^{-1}e] \neq 0\} < \varepsilon. \tag{7.113}
$$

In order to get a direct impression of Eq. (7.113), let us give an example. Let *n* = 120, ε = 10−5. When the elements of the matrix *R* vary in the interval [−4, 4], it can be verified that the maximum *L*∞ norm of the row vectors of *R*−1 is approximately 1/(30√120), so that γ = 1/30. By the Corollary, when σ ≤ ((1/30)√(8 log(240 × 105)))−1 ≈ 30/11.6 ≈ 2.6, we have

$$P\{\left[R^{-1}e\right] \neq 0\} < 10^{-5}.$$
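The arithmetic of this example is easy to reproduce (natural logarithms, numeric values as in the text):

```python
from math import exp, log, sqrt

n, eps, gamma = 120, 1e-5, 1 / 30
# the threshold for sigma given by (7.113)
sigma_max = 1 / (gamma * sqrt(8 * log(2 * n / eps)))
# substituting sigma_max back into the bound (7.112) gives exactly eps
p_bound = 2 * n * exp(-1 / (8 * sigma_max ** 2 * gamma ** 2))
assert abs(sigma_max - 2.6) < 0.1
assert abs(p_bound - eps) < 1e-9
```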

It can be seen from the above analysis that the GGH cryptosystem does not effectively settle the selection of the private key *R*, the public key *B*, and especially the parameter σ and the error vector *e*. In 2001, Professor Micciancio of the University of California, San Diego further improved the GGH cryptosystem by using the HNF basis and Babai's nearest plane method. In order to introduce the GGH/HNF cryptosystem, we review several important results from the previous sections.

**Lemma 7.41** *Let L* <sup>=</sup> *<sup>L</sup>*(*B*) <sup>⊂</sup> <sup>R</sup>*<sup>n</sup> be a lattice, B* = [β1, β2,...,β*n*] *is the generating base, B*<sup>∗</sup> = [β<sup>∗</sup> <sup>1</sup> , β<sup>∗</sup> <sup>2</sup> ,...,β<sup>∗</sup> *<sup>n</sup>* ] *is the corresponding orthogonal basis,* λ<sup>1</sup> = λ(*L*) *is the minimum distance of L, then*

*(i)*

$$
\lambda\_1 = \lambda(L) \ge \min\_{1 \le i \le n} |\beta\_i^\*|. \tag{7.114}
$$

*For L* = *L*(*B*)*, take parameter* ρ = ρ(*B*) *as*

$$\rho = \frac{1}{2} \min\_{1 \le i \le n} |\beta\_i^\*|. \tag{7.115}$$

*Then for any x* ∈ ℝ*n, there is at most one lattice point* α ∈ *L with*

$$
|x - \alpha| < \rho. \tag{7.116}
$$

*(ii) Suppose L* <sup>⊂</sup> <sup>Z</sup>*<sup>n</sup> is an integer lattice, then L has a unique HNF base B, that is L* = *L*(*B*)*, B* = (*bi j*)*n*×*n, satisfies*

$$0 \le b\_{ij} < b\_{ii}, \text{ when } 1 \le i < j \le n,\\
b\_{ij} = 0, \text{ when } 1 \le j < i \le n.$$

*That is, B is an upper triangular matrix, and the corresponding orthogonal basis B*<sup>∗</sup> *of B is a diagonal matrix, that is*

$$B^\* = \text{diag}\{b\_{11}, b\_{22}, \dots, b\_{nn}\}.$$

*Proof* Equation (7.114) is given by Lemma 7.18 and property (ii) by Lemma 7.26. We only prove that if a lattice point α ∈ *L* satisfies |*x* − α| < ρ, then it is unique. Let α1 ∈ *L*, α2 ∈ *L*, and

$$|\alpha\_1 - \mathbf{x}| < \rho, |\alpha\_2 - \mathbf{x}| < \rho \Rightarrow |\alpha\_1 - \alpha\_2| < 2\rho = \min\_{1 \le i \le n} |\beta\_i^\*| \le \lambda\_1.$$

Because α1 − α2 ∈ *L*, if α1 ≠ α2 this contradicts the definition of λ1; hence α1 = α2.
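The uniqueness part of Lemma 7.41 can be checked exhaustively in a small case. The sketch below uses an illustrative basis with |β1*| = |β2*| = 3, hence ρ = 3/2, and verifies that no open ball of radius ρ contains two lattice points:

```python
from math import hypot

# basis vectors b1 = (3, 0), b2 = (1, 3); Gram-Schmidt gives
# b1* = (3, 0), b2* = (0, 3), hence rho = (1/2) * min(3, 3) = 1.5
b1, b2 = (3, 0), (1, 3)
rho = 1.5

# lattice points with small coefficients (enough to cover the sample region)
pts = [(i * b1[0] + j * b2[0], i * b1[1] + j * b2[1])
       for i in range(-5, 6) for j in range(-5, 6)]

# on a fine grid of targets x, no open rho-ball contains two lattice points
for xs in range(-30, 31):
    for ys in range(-30, 31):
        x = (xs / 10, ys / 10)
        near = [p for p in pts if hypot(x[0] - p[0], x[1] - p[1]) < rho]
        assert len(near) <= 1
```

Here the minimum distance of the lattice is λ1 = 3 = 2ρ, so the assertion is exactly the dichotomy proved above.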

In the previous section, we introduced Babai's nearest plane method (see (7.82)). The distance between two subsets *A*1 and *A*2 of ℝ*n* is defined as

$$|A\_1 - A\_2| = \min\{|x - y| |x \in A\_1, y \in A\_2\}.$$

*<sup>x</sup>* <sup>∈</sup> <sup>R</sup>*<sup>n</sup>* is a vector, *<sup>A</sup>* <sup>⊂</sup> <sup>R</sup>*<sup>n</sup>* is a subset, the distance between *<sup>x</sup>* and *<sup>A</sup>* is defined as

$$|\mathbf{x} - A| = \min\{|\mathbf{x} - \mathbf{y}| |\mathbf{y} \in A\}.$$

Suppose *L* ⊂ ℝ*n* is a lattice, *B* = [β1, β2,...,β*n*] is a generating basis, and *B*∗ = [β∗1, β∗2,...,β∗*n*] is the corresponding orthogonal basis. Define the subspace

$$\begin{cases} U = \sum\_{i=1}^{n-1} \mathbb{R}\beta\_i \cong \mathbb{R}^{n-1}, & L' = \sum\_{i=1}^{n-1} \mathbb{Z}\beta\_i \text{ is a sublattice,} \\ A\_v = U + v, & v \in L. \end{cases}$$

*<sup>A</sup>*<sup>v</sup> is called an affine plane with <sup>v</sup> as the representative element. Any *<sup>x</sup>* <sup>∈</sup> <sup>R</sup>*<sup>n</sup>*, let *<sup>A</sup>*<sup>v</sup> be the affine plane closest to *x*, that is

$$|\mathbf{x} - A\_v| = \min\{|\mathbf{x} - A\_a||\mathbf{a} \in L\}.$$

Let *x*′ be the orthogonal projection of *x* on *A*v. Because *x*′ − v ∈ *U* ≅ ℝ*n*−1, we can recursively let *y* ∈ *L*′ be the lattice point of *L*′ nearest to *x*′ − v. Then we define the nearest plane operator τ*B* of *x* under the basis *B* as

$$
\tau\_B(x) = w = y + v \in L. \tag{7.117}
$$
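The recursive definition (7.117) translates directly into code. The following floating-point sketch (basis and target are illustrative choices) returns the lattice point w = y + v:

```python
from math import floor

def gram_schmidt(B):
    """orthogonalize the basis vectors b_1..b_n of B (lists of numbers)."""
    Bs = []
    for b in B:
        v = list(b)
        for u in Bs:
            mu = sum(x * y for x, y in zip(b, u)) / sum(x * x for x in u)
            v = [vi - mu * ui for vi, ui in zip(v, u)]
        Bs.append(v)
    return Bs

def nearest_plane(B, x):
    """Babai's nearest-plane operator tau_B(x) for basis vectors b_1..b_n."""
    if not B:
        return [0.0] * len(x)
    Bs = gram_schmidt(B)
    bn, bns = B[-1], Bs[-1]
    # delta = nearest integer to the beta_n*-coefficient gamma_n of x
    gamma = sum(p * q for p, q in zip(x, bns)) / sum(q * q for q in bns)
    delta = floor(gamma + 0.5)
    v = [delta * c for c in bn]                       # v = delta * beta_n
    y = nearest_plane(B[:-1], [xi - vi for xi, vi in zip(x, v)])
    return [yi + vi for yi, vi in zip(y, v)]          # w = y + v

# lattice with basis (3,0) and (1,3); target near 2*(3,0) - 1*(1,3) = (5,-3)
w = nearest_plane([[3, 0], [1, 3]], [5.2, -2.9])
print(w)   # -> [5.0, -3.0]
```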

**Lemma 7.42** *Under the above definition, if* v1, v2 ∈ *L and A*v1 ≠ *A*v2*, then*

$$|A\_{v\_1} - A\_{v\_2}| \ge |\beta\_n^\*|.\tag{7.118}$$

*Proof* For v1, v2 ∈ *L*, they can be expressed as linear combinations of {β∗1, β∗2,...,β∗*n*}, that is

$$\begin{cases} \upsilon\_1 = \sum\_{i=1}^n a\_i \beta\_i^\*, & \text{where } a\_i \in \mathbb{R}, a\_n \in \mathbb{Z}, \\\upsilon\_2 = \sum\_{i=1}^n b\_i \beta\_i^\*, & \text{where } b\_i \in \mathbb{R}, b\_n \in \mathbb{Z}. \end{cases}$$

To prove that the *n*-th coefficients *an* and *bn* are integers, write

$$v\_1 = \sum\_{i=1}^n a\_i^\* \beta\_i, v\_2 = \sum\_{i=1}^n b\_i^\* \beta\_i, a\_i^\*, b\_i^\* \in \mathbb{Z}.$$

Therefore,

$$a\_n = \frac{\langle v\_1, \beta\_n^\* \rangle}{|\beta\_n^\*|^2} = \frac{\langle a\_n^\* \beta\_n, \beta\_n^\* \rangle}{|\beta\_n^\*|^2} = a\_n^\* \in \mathbb{Z}.$$

The above equation uses Eq. (7.52); *bn* ∈ ℤ can be proved in the same way. By the condition v1 − v2 ∉ *U*, we get *an* ≠ *bn*, therefore

$$|A\_{v\_1} - A\_{v\_2}| = |a\_n - b\_n| |\beta\_n^\*| \ge |\beta\_n^\*|.$$

We have completed the proof of Lemma.

**Lemma 7.43** *Under the above definitions and symbols, suppose x* ∈ ℝ*n is written as x* = γ1β∗1 + ··· + γ*n*β∗*n, and* δ *is the nearest integer to* γ*n. Then (i)*

$$v = \delta\beta\_n, \quad x' = \sum\_{i=1}^{n-1} \gamma\_i \beta\_i^\* + \delta\beta\_n^\*. \tag{7.119}$$

*That is, the affine plane closest to x is A*δβ*n, and the orthogonal projection of x on A*v *is x*′*.*

*(ii) Let ux* ∈ *L be the lattice point closest to x, then*

$$|\mathbf{x} - \mathbf{x}'| \le |\mathbf{x} - \boldsymbol{u}\_x|. \tag{7.120}$$

*Proof* Take v = δβ*n*; then v ∈ *L*, and we want to prove that the distance between *x* and *A*v is the smallest. Because *x* = γ1β∗1 + ··· + γ*n*β∗*n*, we have (see (7.119))

$$x - v = u + (\gamma\_n - \delta)\beta\_n^\*, \ u \in U \Longrightarrow |x - A\_v| = |\gamma\_n - \delta|\,|\beta\_n^\*| \le \frac{1}{2}|\beta\_n^\*|.$$

Let v1 ∈ *L* with v − v1 ∉ *U*; by the triangle inequality,

$$|x - A\_{v\_1}| \ge |A\_{v\_1} - A\_v| - |x - A\_v| \ge |\beta\_n^\*| - \frac{1}{2}|\beta\_n^\*| = \frac{1}{2}|\beta\_n^\*| \ge |x - A\_v|.$$

So it is correct to take v = δβ*n*. Secondly, we prove that the orthogonal projection *x*′ of *x* on the affine plane *A*v is

$$x' = \sum\_{i=1}^{n-1} \gamma\_i \beta\_i^\* + \delta \beta\_n^\*.$$

Let us first prove *x*′ ∈ *A*v. Because v = δβ*n*, and

$$\beta\_n = \sum\_{i=1}^{n-1} c\_i \beta\_i^\* + \beta\_n^\* \Rightarrow \delta \beta\_n = \sum\_{i=1}^{n-1} \delta c\_i \beta\_i^\* + \delta \beta\_n^\* = \upsilon. \tag{7.121}$$

Thus

$$x' - v = \sum\_{i=1}^{n-1} (\gamma\_i - \delta c\_i) \beta\_i^\* \in U.$$

That is *x*′ ∈ *U* + v = *A*v. And *x* − *x*′ = (γ*n* − δ)β∗*n* ⇒ (*x* − *x*′)⊥*U*. Because

$$U \cap A\_v = \varnothing.$$

Then *A*v and *U* are two parallel planes, thus (*x* − *x*′)⊥*A*v. This proves that the orthogonal projection of *x* on *A*v is *x*′, and thus (i) holds.

The proof of (ii) is direct. By the definition of *x*′, for any affine plane *A*α with α ∈ *L*, the distance satisfies

$$|x - \alpha| \ge |x - A\_{\alpha}|.$$

When α = v, because (*x* − *x*′)⊥*A*v, we have

$$|x - x'| = |x - A\_v| \le |x - A\_\alpha|, \ \forall\, \alpha \in L.$$

Let *ux* ∈ *L* be the lattice point closest to *x*, then take α = *ux* , there is

$$|x - x'| \le |x - A\_{u\_x}| \le |x - u\_x|.$$

The Lemma holds.

**Lemma 7.44** *Let L* = *L*(*B*) ⊂ ℝ*n be a lattice, x* ∈ ℝ*n,* α ∈ *L. If* |*x* − α| < ρ*, where* ρ = (1/2) min{|β∗*i*| | 1 ≤ *i* ≤ *n*}*, then the nearest plane operator* τ*B satisfies*

$$
\tau\_B(x) = \alpha. \tag{7.122}
$$

*Proof* Because of

$$|\mathbf{x} - A\_{\alpha}| \le |\mathbf{x} - \alpha| < \rho.$$

By Lemma 7.42, *A*α is the plane *A*v closest to *x*, that is *A*α = *A*v. And τ*B*(*x*) = w = *y* + v, so we have

$$|\mathbf{x} - w| \le |\mathbf{x} - \alpha| < \rho. \tag{7.123}$$

By Lemma 7.41, we have α = w = τ*B*(*x*). The Lemma holds!

Now let us introduce the workflow of the GGH/HNF cryptosystem:

1. *<sup>L</sup>* <sup>=</sup> *<sup>L</sup>*(*B*) <sup>=</sup> *<sup>L</sup>*(*R*) <sup>⊂</sup> <sup>Z</sup>*<sup>n</sup>* is an integer lattice, *<sup>R</sup>* = [*r*1,*r*2,...,*rn*] is the private key, *B* = [β1, β2,...,β*n*] is the public key, and is the HNF basis of *L*, where

$$B^\* = \text{diag}\{b\_{11}, b\_{22}, \dots, b\_{nn}\}.$$

We choose the private key *R* to be a particularly good basis, with ρ = (1/2) min{|*r*∗*i*| | 1 ≤ *i* ≤ *n*}. In particular, the public key *B* satisfies

$$\frac{1}{2}b\_{ii} < \rho, \forall \ 1 \le i \le n.$$


2. Choose the error vector *e* ∈ ℝ*n* with |*e*| < ρ.
3. Encryption: the plaintext v ∈ ℤ*n* is encrypted with the public key *B*:

$$f\_{B, \rho}(v, e) = Bv + e = c.$$

4. Decryption: we decrypt the ciphertext *c* with the private key *R*. Decryption is given by

$$f\_{B,\rho}^{-1}(c) = B^{-1}\tau\_{R}(c),$$

where τ*<sup>R</sup>* is the nearest plane operator defined by *R*.

By Lemma 7.44, when |*e*| < ρ, ⇒ |*c* − *B*v|=|*e*| < ρ, thus

$$B^{-1}\tau\_R(c) = B^{-1}\tau\_R(Bv+e) = B^{-1}Bv = v. \tag{7.124}$$

This ensures the correctness of decryption.
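The whole GGH/HNF flow can be traced on a toy two-dimensional instance (the private basis, the unimodular change of basis and the error vector below are illustrative choices with |e| < ρ ≈ 2.55):

```python
from fractions import Fraction as Fr
from math import floor

def nearest_plane(basis, x):
    """Babai's nearest-plane operator tau_B(x); basis is a list of vectors."""
    if not basis:
        return [Fr(0)] * len(x)
    gs = []                              # Gram-Schmidt orthogonalization
    for b in basis:
        v = [Fr(c) for c in b]
        for u in gs:
            mu = sum(p * q for p, q in zip(b, u)) / sum(q * q for q in u)
            v = [vi - mu * ui for vi, ui in zip(v, u)]
        gs.append(v)
    bn, bns = basis[-1], gs[-1]
    gamma = sum(p * q for p, q in zip(x, bns)) / sum(q * q for q in bns)
    delta = floor(gamma + Fr(1, 2))      # nearest integer to gamma_n
    v = [delta * c for c in bn]          # v = delta * beta_n
    y = nearest_plane(basis[:-1], [xi - vi for xi, vi in zip(x, v)])
    return [yi + vi for yi, vi in zip(y, v)]    # w = y + v, see (7.117)

# private basis r1, r2 (short); public basis b1 = r1 + r2, b2 = 3 r1 + 4 r2
r1, r2 = [5, -1], [1, 6]
b1 = [p + q for p, q in zip(r1, r2)]
b2 = [3 * p + 4 * q for p, q in zip(r1, r2)]

v = [2, -3]                              # plaintext
e = [Fr(3, 4), Fr(-3, 4)]                # error vector, |e| ~ 1.06 < rho
Bv = [v[0] * p + v[1] * q for p, q in zip(b1, b2)]
c = [ci + ei for ci, ei in zip(Bv, e)]   # ciphertext c = Bv + e

w = nearest_plane([r1, r2], c)           # tau_R(c)
assert w == Bv                           # (7.124): tau_R(Bv + e) = Bv
# recover v from w = v1*b1 + v2*b2 by Cramer's rule (this is B^{-1} w)
det = b1[0] * b2[1] - b2[0] * b1[1]
v_rec = [(b2[1] * w[0] - b2[0] * w[1]) / det,
         (-b1[1] * w[0] + b1[0] * w[1]) / det]
assert v_rec == [2, -3]
```

Any error vector of length below ρ would decrypt correctly here, illustrating the flexibility discussed below.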

Comparing GGH with GGH/HNF: they choose the same encryption function, but the decryption transformations are very different. GGH adopts Babai's rounding-off method, while GGH/HNF adopts Babai's nearest plane method. The two also differ in the selection of the error vector *e*: in GGH, *e* depends on the parameter σ, each component of *e* being ±σ; in GGH/HNF, *e* only needs to have length less than ρ. Therefore, GGH/HNF has greater flexibility in the selection of the error vector *e*.

Next, we explain the reason why the public key *B* is chosen as an HNF basis. For any integer lattice *L* = *L*(*B*) ⊂ ℤ*n*, *B*∗ = [β∗1, β∗2,...,β∗*n*] is the corresponding orthogonal basis. Using the congruence relation mod *L*, we define an equivalence relation in ℝ*n*, which is also an equivalence relation between integer points in ℤ*n*. By Lemma 7.24, the quotient group ℤ*n*/*L* is a finite group with |ℤ*n*/*L*| = *d*(*L*). We further give a set of representative elements of ℤ*n*/*L*. Let

$$F(B^\*) = \left\{ \sum\_{i=1}^n x\_i \beta\_i^\* | 0 \le x\_i < 1 \right\} \tag{7.125}$$

be a parallelepiped; it can be compared with the fundamental region *F* = *F*(*B*) of ℝ*n*/*L* (see Lemma 7.16).

$$F = F(B) = \left\{ \sum\_{i=1}^{n} x\_i \beta\_i |0 \le x\_i < 1 \right\}.$$

*F* is in general a skew parallelepiped.

**Lemma 7.45** *For any integer point* <sup>α</sup> <sup>∈</sup> <sup>Z</sup>*n, there is a unique* <sup>w</sup> <sup>∈</sup> *<sup>F</sup>*(*B*∗) *such that*

$$
\alpha \equiv w(\text{mod } L).
$$

*Proof* α ∈ ℤ*n* is an integer point, so α can be expressed as a linear combination of *B*∗; write

$$\alpha = \sum\_{i=1}^{n} a\_i \beta\_i^\*, a\_i \in \mathbb{R}.$$

[*ai*] denotes the largest integer not greater than *ai*. Set

$$w = \sum\_{i=1}^{n} a\_i \beta\_i^\* - \sum\_{i=1}^{n} [a\_i] \beta\_i. \tag{7.126}$$

Then

$$\alpha - w = \sum\_{i=1}^{n} [a\_i] \beta\_i \in L \Rightarrow \alpha \equiv w(\text{mod } L).$$

We prove that w ∈ *F*(*B*∗). Express w linearly in the basis vectors of *B*∗:

$$w = \sum\_{i=1}^{n} b\_i \beta\_i^\*.$$

We need only prove that 0 ≤ *bi* < 1. By (7.52), it is not difficult to obtain

$$b\_n = \frac{\langle w, \beta\_n^\* \rangle}{|\beta\_n^\*|^2} = \frac{(a\_n - [a\_n]) |\beta\_n^\*|^2}{|\beta\_n^\*|^2} = a\_n - [a\_n].$$

Thus 0 ≤ *bn* < 1, and by induction it is not difficult to verify that 0 ≤ *bi* < 1 for all 1 ≤ *i* ≤ *n*, that is w ∈ *F*(*B*∗). It remains to prove uniqueness. Let

$$w = \sum\_{i=1}^{n} a\_i \beta\_i^\*, \text{ where } |a\_i| < 1.$$

We prove that if

$$w \equiv 0\ (\text{mod } L) \Leftrightarrow w = 0. \tag{7.127}$$

Write w = *b*1β1 + *b*2β2 + ··· + *bn*β*n*; then by (7.52), there is

$$a\_n = \frac{\langle w, \beta\_n^\* \rangle}{\langle \beta\_n^\*, \beta\_n^\* \rangle} = \frac{b\_n |\beta\_n^\*|^2}{|\beta\_n^\*|^2} = b\_n.$$

Because w ∈ *L* and |*bn*| < 1 ⇒ *bn* = 0. By induction it is not difficult to obtain *b*1 = *b*2 = ··· = *bn* = 0. That is w = 0, and (7.127) holds.

<sup>α</sup> <sup>∈</sup> <sup>Z</sup>*<sup>n</sup>*, if <sup>w</sup><sup>1</sup> <sup>∈</sup> *<sup>F</sup>*(*B*∗), w<sup>2</sup> <sup>∈</sup> *<sup>F</sup>*(*B*∗), α <sup>≡</sup> <sup>w</sup>1(mod *<sup>L</sup>*), α <sup>≡</sup> <sup>w</sup>2(mod *<sup>L</sup>*), then

$$w\_1 - w\_2 \equiv 0 (\text{mod } L).$$

By (7.127), there is w1 = w2. Conversely, if w1 ∈ *F*(*B*∗), w2 ∈ *F*(*B*∗) and w1 ≠ w2, then w1 ≢ w2(mod *L*); that is, distinct points in *F*(*B*∗) are not congruent under mod *L*. The Lemma holds.

From the above lemma, any two distinct points in the parallelepiped *F*(*B*∗) are not congruent mod *L*; therefore, for distinct lattice points α1, α2 ∈ *L*, we have

$$\{F(B^\*) + \alpha\_1\} \cap \{F(B^\*) + \alpha\_2\} = \varnothing.$$

Thus, R*<sup>n</sup>* can be split into

$$\mathbb{R}^n = \bigcup\_{\alpha \in L} (F(B^\*) + \alpha). \tag{7.128}$$

By Lemma 7.45, any <sup>α</sup> <sup>∈</sup> <sup>Z</sup>*<sup>n</sup>*, there exists a unique <sup>w</sup> <sup>∈</sup> *<sup>F</sup>*(*B*∗) <sup>⇒</sup> <sup>α</sup> <sup>≡</sup> w(mod *<sup>L</sup>*), define

$$w = \alpha \bmod L.$$

Then α → α mod *L* gives a surjection ℤ*n* → ℤ*n* ∩ *F*(*B*∗), and this mapping induces a 1-1 correspondence ℤ*n*/*L* → ℤ*n* ∩ *F*(*B*∗), because if α, β ∈ ℤ*n*, then

$$\begin{cases} \alpha \equiv \beta(\text{mod } L) \Rightarrow \alpha \bmod L = \beta \bmod L \in \mathbb{Z}^n \cap F(B^\*)\\ \alpha \not\equiv \beta(\text{mod } L) \Rightarrow \alpha \bmod L \neq \beta \bmod L. \end{cases}$$

By Lemma 7.24, we obviously have the following Corollary.

**Corollary 7.9** *If L* <sup>=</sup> *<sup>L</sup>*(*B*) <sup>⊂</sup> <sup>Z</sup>*<sup>n</sup> is an integer lattice, then F*(*B*∗) <sup>∩</sup> <sup>Z</sup>*<sup>n</sup> is a representative element set of* Z*<sup>n</sup>*/*L, and*

$$|F(B^\*) \cap \mathbb{Z}^n| = d(L). \tag{7.129}$$

If *B* is the HNF basis of the integer lattice *L*, then *B*∗ = diag{*b*11, *b*22,..., *bnn*}; thus, the parallelepiped *F*(*B*∗) takes the simplest form:

$$F(B^\*) = \{ (\mathbf{x}\_1, \mathbf{x}\_2, \dots, \mathbf{x}\_n) | 0 \le \mathbf{x}\_i < b\_{ii} \}. \tag{7.130}$$

This is a rectangular box of volume *d*(*L*). Thus

$$\mathbb{Z}^{\mathbb{n}}/L = F(\mathcal{B}^\*) \cap \mathbb{Z}^{\mathbb{n}} = \{ (\mathbf{x}\_1, \mathbf{x}\_2, \dots, \mathbf{x}\_n) | 0 \le \mathbf{x}\_i < b\_{ii}, \mathbf{x}\_i \in \mathbb{Z} \}. \tag{7.131}$$

This is another proof of Lemma 7.24.
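Because B* is diagonal for an HNF basis, the reduction α mod L of (7.131) is computed componentwise. A sketch for the integer lattice with HNF basis vectors (2, 0) and (1, 3) (an illustrative choice, with d(L) = 6):

```python
from itertools import product

# HNF basis of an integer lattice L in Z^2 (basis vectors as columns):
# beta1 = (2, 0), beta2 = (1, 3);  B* = diag(2, 3), d(L) = 2 * 3 = 6
beta1, beta2 = (2, 0), (1, 3)

def reduce_mod_L(a):
    """the representative a mod L inside the box F(B*) = [0, 2) x [0, 3)."""
    k2 = a[1] // 3                          # strip multiples of beta2
    x1, x2 = a[0] - k2 * beta2[0], a[1] - k2 * beta2[1]
    k1 = x1 // 2                            # strip multiples of beta1
    return (x1 - k1 * beta1[0], x2)

# every integer point reduces into the box, and exactly d(L) = 6 classes occur
reps = {reduce_mod_L(a) for a in product(range(-9, 10), repeat=2)}
assert reps == {(x1, x2) for x1 in range(2) for x2 in range(3)}
```

For instance, (5, 7) reduces to (1, 1), and (5, 7) − (1, 1) = (4, 6) = 1·β1 + 2·β2 ∈ L.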

α mod *L* is called the reduction vector of α modulo *L*. For any α ∈ ℤ*n*, the number of bits needed to represent the reduction vector α mod *L* is about

$$\sum\_{i=1}^{n} \log b\_{ii} = \log \prod\_{i=1}^n b\_{ii} = \log d(L). \tag{7.132}$$

To sum up, the parallelogram of the HNF basis of *L* has a particularly simple geometry (it is actually a cube), which is very helpful for calculating the reduction vector *x* mod *L* of an integer point *x* ∈ Z<sup>n</sup>. The reduction vector is of great significance in the further improvement and analysis of the GGH/HNF cryptosystem. For detailed work, please refer to D. Micciancio's paper (Micciancio, 2001).
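As a small illustration (not from the text): when the HNF basis is the diagonal matrix of (7.130), the reduction *x* mod *L* is computed coordinatewise. A minimal sketch in Python, assuming a diagonal basis:

```python
def reduce_mod_diagonal(x, b):
    """Reduction vector x mod L for an integer lattice whose HNF basis
    is diagonal, B* = diag(b_11, ..., b_nn): reduce each coordinate
    into the cube 0 <= w_i < b_ii of (7.130)-(7.131)."""
    return [xi % bi for xi, bi in zip(x, b)]

# d(L) = 2 * 3 * 4 = 24 residue classes in Z^3 / L
print(reduce_mod_diagonal([7, -3, 10], [2, 3, 4]))  # → [1, 0, 2]
```

Each output coordinate is the unique representative of *x<sub>i</sub>* modulo *b<sub>ii</sub>*, so the result is the representative element of *x* + *L* lying in the cube.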

# **7.7 NTRU Cryptosystem**

The NTRU cryptosystem is a public key cryptosystem proposed in 1996 by the Number Theory Research Unit (NTRU), composed of the three number theorists J. Hoffstein, J. Pipher and J. Silverman of Brown University in the USA. Its main features are that key generation is very simple and that the encryption and decryption algorithms are much faster than the commonly used RSA and elliptic curve cryptography. In particular, NTRU can resist quantum computing attacks and is considered a potential public key cryptosystem that could replace RSA in the postquantum cryptography era.

The essence of the NTRU design is a generalization of RSA to polynomials, so it is called a cryptosystem based on polynomial rings. However, NTRU can be given a completely equivalent form using the concept of a *q*-ary lattice, so NTRU is also a lattice-based cryptosystem. For simplicity, we start with polynomial rings.

Let Z[*x*] be the polynomial ring with integer coefficients and *N* ≥ 1 be a positive integer. We define the polynomial quotient ring *R* as

$$R = \mathbb{Z}[x] / \langle x^{N} - 1 \rangle = \{a\_0 + a\_1 x + \dots + a\_{N-1}x^{N-1} \mid a\_i \in \mathbb{Z}\}.$$

Any *F*(*x*) ∈ *R* can be written as an integer vector,

$$F(\mathbf{x}) = \sum\_{i=0}^{N-1} F\_i \mathbf{x}^i = (F\_0, F\_1, \dots, F\_{N-1}) \in \mathbb{Z}^N. \tag{7.133}$$

In *R*, we define a new operation ⊗ called the convolution of two polynomials. Let

$$F(\boldsymbol{x}) = \sum\_{i=0}^{N-1} F\_i \boldsymbol{x}^i,\\ G(\boldsymbol{x}) = \sum\_{i=0}^{N-1} G\_i \boldsymbol{x}^i.$$

Define

$$F \otimes G = H(x) = \sum\_{i=0}^{N-1} H\_i x^i = (H\_0, H\_1, \dots, H\_{N-1}).$$

For any *k*, 0 ≤ *k* ≤ *N* − 1,

$$\begin{split} H\_k &= \sum\_{i=0}^k F\_i G\_{k-i} + \sum\_{i=k+1}^{N-1} F\_i G\_{N+k-i} \\ &= \sum\_{\substack{0 \le i < N \\ 0 \le j < N \\ i+j \equiv k(\text{mod }N)}} F\_i G\_j. \end{split} \tag{7.134}$$
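The convolution (7.134) is ordinary polynomial multiplication with exponents reduced mod *N* (since *x*<sup>*N*</sup> = 1 in *R*). A direct sketch in Python:

```python
def convolve(F, G):
    """Cyclic convolution H = F ⊗ G in Z[x]/(x^N - 1), per (7.134):
    H_k is the sum of F_i * G_j over all i + j ≡ k (mod N)."""
    N = len(F)
    H = [0] * N
    for i in range(N):
        for j in range(N):
            H[(i + j) % N] += F[i] * G[j]
    return H

print(convolve([1, 2, 3], [4, 5, 6]))  # → [31, 31, 28]
```

Because the condition *i* + *j* ≡ *k*(mod *N*) is symmetric in *i* and *j*, this operation is visibly commutative, which is the key point of Lemma 7.46 below.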

**Lemma 7.46** *Under the new multiplication, R is a commutative ring with unit elements.*

*Proof* By (7.134),

$$F \otimes G = G \otimes F,\\ F \otimes (G + H) = F \otimes G + F \otimes H.$$

So *R* forms a commutative ring under ⊗.

If *a* ∈ Z, 0 ≤ *a* ≤ *N* − 1, is a constant polynomial in *R*, then

$$a \otimes F = aF = (aF\_0, aF\_1, \dots, aF\_{N-1}).$$

Therefore, *R* has the unit element *a* = 1. The Lemma holds.

Let *F*(*x*) = (*F*<sub>0</sub>, *F*<sub>1</sub>,..., *F*<sub>*N*−1</sub>) ∈ *R*. Define

$$\tilde{F} = \frac{1}{N} \sum\_{i=0}^{N-1} F\_i, \tag{7.135}$$

the arithmetic mean of the coefficients of *F*.

The *L*<sup>2</sup> norm (Euclidean norm) and *L*<sup>∞</sup> norm of *F* are defined as

$$\begin{cases} |F|\_2 = (\sum\_{i=0}^{N-1} (F\_i - \tilde{F})^2)^{\frac{1}{2}} \\ |F|\_\infty = \max\_{0 \le i \le N-1} F\_i - \min\_{0 \le i \le N-1} F\_i. \end{cases} \tag{7.136}$$

**Definition 7.11** Let *d*<sub>1</sub>, *d*<sub>2</sub> be two positive integers with *d*<sub>1</sub> + *d*<sub>2</sub> ≤ *N*; define the polynomial set *A*(*d*<sub>1</sub>, *d*<sub>2</sub>) as

$$A(d\_1, d\_2) = \{ F \in R \mid F \text{ has } d\_1 \text{ coefficients equal to } 1,\ d\_2 \text{ coefficients equal to } -1, \text{ and all other coefficients equal to } 0 \}. \tag{7.137}$$

**Lemma 7.47** *Let* 1 ≤ *d* < [*N*/2]*. (i) Suppose F* ∈ *A*(*d*, *d* − 1)*; then*

$$|F|\_2 = \sqrt{2d - 1 - \frac{1}{N}}.$$

*(ii) If F* ∈ *A*(*d*, *d*)*, then*

$$|F|\_2 = \sqrt{2d}.$$

*Proof* If *F* ∈ *A*(*d*, *d* − 1), then by (7.135), *F*˜ = 1/*N*, thus

$$\begin{aligned} (|F|\_2)^2 &= \sum\_{i=0}^{N-1} \left( F\_i - \frac{1}{N} \right)^2 \\ &= \sum\_{i=0}^{N-1} \left( F\_i^2 - \frac{2}{N} F\_i + \frac{1}{N^2} \right) \\ &= 2d - 1 - \frac{2}{N} + \frac{1}{N} = 2d - 1 - \frac{1}{N}, \end{aligned}$$

so (i) holds. If *F* ∈ *A*(*d*, *d*), then *F*˜ = 0, thus

322 7 Lattice-Based Cryptography

$$(\vert F \vert\_2)^2 = 2d, \Rightarrow \vert F \vert\_2 = \sqrt{2d}.$$

The Lemma holds.
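The two formulas of Lemma 7.47 can be checked numerically; the sketch below uses hypothetical sample polynomials from *A*(2, 1) and *A*(2, 2) with *N* = 7:

```python
import math

def centered_l2(F):
    """|F|_2 of (7.136): the l2 norm of F about its coefficient mean (7.135)."""
    N = len(F)
    mean = sum(F) / N
    return math.sqrt(sum((f - mean) ** 2 for f in F))

d, N = 2, 7
F = [1, 1, -1, 0, 0, 0, 0]       # F in A(d, d-1): two 1s, one -1
G = [1, 1, -1, -1, 0, 0, 0]      # G in A(d, d):   two 1s, two -1s
assert abs(centered_l2(F) - math.sqrt(2 * d - 1 - 1 / N)) < 1e-12  # (i)
assert abs(centered_l2(G) - math.sqrt(2 * d)) < 1e-12              # (ii)
```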

The parameters of the NTRU cryptosystem are three positive integers *N*, *q*, *p*, where 1 ≤ *p* < *q* and (*p*, *q*) = 1; that is,

$$\text{parameter system} = \{(N, q, p) | 1 \le p < q, \text{and } (p, q) = 1\}.$$

When the parameter (*N*, *q*, *p*) is selected, we will discuss the key generation of NTRU.

Key generation. Each NTRU user selects two polynomials *f* ∈ *R*, *g* ∈ *R* with deg *f* = deg *g* = *N* − 1 as the private key. Writing *f* = (*f*<sub>0</sub>, *f*<sub>1</sub>,..., *f*<sub>*N*−1</sub>), *g* = (*g*<sub>0</sub>, *g*<sub>1</sub>,..., *g*<sub>*N*−1</sub>) as row vectors, we have (*f*, *g*) ∈ Z<sup>2*N*</sup> ⊂ *R*<sup>2*N*</sup>. We require that *f* mod *q* be invertible as a polynomial over Z<sub>*q*</sub> and *f* mod *p* be invertible as a polynomial over Z<sub>*p*</sub>; that is, there exist *F<sub>q</sub>* ∈ Z<sub>*q*</sub>[*x*], *F<sub>p</sub>* ∈ Z<sub>*p*</sub>[*x*] such that

$$F\_q \otimes f \equiv 1(\text{mod } q), \text{ and } F\_p \otimes f \equiv 1(\text{mod } p). \tag{7.138}$$

When the private key ( *f*, *g*) is selected, the public key *h* is given by the following formula:

$$h \equiv F\_q \otimes g(\text{mod } q). \tag{7.139}$$

*h* can be regarded as a polynomial on Z*<sup>q</sup>* . Quotient rings Z*<sup>q</sup>* and Z*<sup>p</sup>* are

$$\mathbb{Z}\_q = \mathbb{Z}/q\mathbb{Z} = \left\{ a \in \mathbb{Z} \Big| -\frac{q}{2} \le a < \frac{q}{2} \right\}.$$

$$\mathbb{Z}\_p = \mathbb{Z}/p\mathbb{Z} = \left\{ a \in \mathbb{Z} \Big| -\frac{p}{2} \le a < \frac{p}{2} \right\}.$$

Encryption transformation. User *B* wants to use NTRU to send encrypted information *m* to user *A*. First, the plaintext is encoded as *m* ∈ *R*, that is, *m* ∈ Z<sup>*N*</sup>, and then reduced mod *p*, that is,

$$m \in \mathbb{Z}\_p^N.$$

Then a polynomial φ ∈ *R* with deg φ = *N* − 1 is selected at random, and the public key *h* of user *A* is used for encryption. The encryption function σ is

$$
\sigma(m) = c \equiv p\phi \otimes h + m(\text{mod } q). \tag{7.140}
$$

*c* is the ciphertext received by user *A*; *c* is a polynomial over Z<sub>*q*</sub> and a vector in Z<sub>*q*</sub><sup>*N*</sup>.

Decryption transformation. After receiving the ciphertext *c*, user *A* decrypts it with its own private keys *f* and *F<sub>p</sub>*, first calculating

$$a \equiv f \otimes c(\text{mod}\, q). \tag{7.141}$$

*a* is taken as a polynomial over Z<sub>*q*</sub>; that is, the representative *a* ∈ Z<sub>*q*</sub><sup>*N*</sup> is unique. Finally, the decryption transform σ<sup>−1</sup> is

$$
\sigma^{-1}(c) \equiv a \otimes F\_p(\text{mod } p). \tag{7.142}
$$

Why is the decryption transformation correct? If the parameter selection satisfies

$$p\phi \otimes h + m \in \mathbb{Z}\_q^N. \tag{7.143}$$

Then

$$c = p\phi \otimes h + m.\tag{7.144}$$

Similarly, if *c* ⊗ *f* ∈ Z<sub>*q*</sub><sup>*N*</sup>, then *a* = *f* ⊗ *c*. By (7.142),

$$a \otimes F\_p = F\_p \otimes f \otimes c \equiv c = p\phi \otimes h + m(\text{mod } p).$$

Thus

$$a \otimes F\_p \equiv m(\text{mod } p).$$

Because *m* ∈ Z<sub>*p*</sub><sup>*N*</sup>, so

$$
\sigma^{-1}(c) \equiv a \otimes F\_p \equiv m(\text{mod } p), \\
\Rightarrow \sigma^{-1}(c) = m.
$$

Therefore, the decryption transformation is correct under the conditions (7.143) and *c* ⊗ *f* ∈ Z<sub>*q*</sub><sup>*N*</sup>.

NTRU's encryption and decryption transformations cannot guarantee 100% correct decryption, because *a* is taken out as a polynomial mod *q* for the decryption operation (see (7.142)). To satisfy (7.144) and *c* ⊗ *f* ∈ Z<sub>*q*</sub><sup>*N*</sup>, the following formula is necessary:

$$|f \otimes c|\_{\infty} = |f \otimes (p\phi \otimes h + m)|\_{\infty} < q. \tag{7.145}$$

Therefore, it is sufficient that the following formula holds, for then (7.145) holds:

$$|f \otimes m|\_{\infty} \le \frac{q}{4}, \text{and } |p\phi \otimes g|\_{\infty} \le \frac{q}{4}. \tag{7.146}$$

**Lemma 7.48** *For any* ε > 0*, there are constants r*<sub>1</sub> *and r*<sub>2</sub> > 0*, depending only on* ε *and N, such that for randomly selected polynomials F*, *G* ∈ *R, the following formula holds with probability* ≥ 1 − ε*; that is,*

$$P\{r\_1|F|\_2|G|\_2 \le |F \otimes G|\_\infty \le r\_2|F|\_2|G|\_2\} \ge 1-\varepsilon.$$

*Proof* See reference Hoffstein et al. (1998) in this chapter.

By the Lemma, to satisfy (7.146), we choose three parameters *d<sub>f</sub>*, *d<sub>g</sub>* and *d*, where

$$f \in A(d\_f, d\_f - 1), \ g \in A(d\_g, d\_g), \ \phi \in A(d, d). \tag{7.147}$$

By Lemma 7.47, |*f*|<sub>2</sub>, |*g*|<sub>2</sub> and |φ|<sub>2</sub> are known; we choose

$$|f|\_2 \cdot |m|\_2 \approx \frac{q}{4r\_2}, |\phi|\_2 \cdot |g|\_2 \approx \frac{q}{4pr\_2}.\tag{7.148}$$

Then, Eq. (7.146) can be guaranteed to be true (in the sense of probability), so that the success rate of the decryption algorithm will be greater than 1 − ε. Thus, (7.148) becomes the main parameter selection index of NTRU.
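The polynomial-ring NTRU described above can be sketched end to end. The toy Python implementation below is illustrative only: the parameters (*N*, *p*, *q*) = (11, 3, 2053) are hypothetical, *q* is taken prime (deployed NTRU parameter sets typically use *q* = 2<sup>*k*</sup>) so that the inverses *F<sub>q</sub>* and *F<sub>p</sub>* of (7.138) can be found by Gauss-Jordan elimination, and all of *f*, *g*, φ, *m* are ternary, which keeps |*f* ⊗ *c*|<sub>∞</sub> far below *q*/2 so that decryption always succeeds:

```python
import random

def conv(F, G, m=None):
    """Cyclic convolution F ⊗ G in Z[x]/(x^N - 1); reduced mod m if given."""
    N = len(F)
    H = [0] * N
    for i in range(N):
        for j in range(N):
            H[(i + j) % N] += F[i] * G[j]
    return [h % m for h in H] if m else H

def poly_inv(f, m):
    """Inverse of f in Z_m[x]/(x^N - 1) for PRIME m, or None if f is not
    invertible: multiplication by f is multiplication by the circulant
    matrix of f, so solve (circulant of f) x = (1,0,...,0)' mod m."""
    N = len(f)
    A = [[f[(i - j) % N] % m for j in range(N)] + [1 if i == 0 else 0]
         for i in range(N)]
    for col in range(N):               # Gauss-Jordan elimination mod m
        piv = next((r for r in range(col, N) if A[r][col]), None)
        if piv is None:
            return None
        A[col], A[piv] = A[piv], A[col]
        inv = pow(A[col][col], -1, m)
        A[col] = [x * inv % m for x in A[col]]
        for r in range(N):
            if r != col and A[r][col]:
                c = A[r][col]
                A[r] = [(A[r][k] - c * A[col][k]) % m for k in range(N + 1)]
    return [A[i][N] for i in range(N)]

def center(v, m):
    """Centered representatives in (-m/2, m/2], as for Z_q and Z_p."""
    return [x % m - m if x % m > m // 2 else x % m for x in v]

def ternary(N, rng):
    return [rng.choice([-1, 0, 1]) for _ in range(N)]

N, p, q = 11, 3, 2053            # toy parameters: q prime, (p, q) = 1
rng = random.Random(1)

# key generation: retry until f is invertible mod q and mod p, cf. (7.138)
while True:
    f = ternary(N, rng)
    Fq, Fp = poly_inv(f, q), poly_inv(f, p)
    if Fq is not None and Fp is not None:
        break
g = ternary(N, rng)
h = conv(Fq, g, q)               # public key h ≡ F_q ⊗ g (mod q), (7.139)

# encryption (7.140): c ≡ p·φ ⊗ h + m (mod q)
m_poly = ternary(N, rng)
phi = ternary(N, rng)
c = center([x + y for x, y in zip(conv([p * t for t in phi], h), m_poly)], q)

# decryption (7.141)-(7.142): a ≡ f ⊗ c (mod q), then m ≡ a ⊗ F_p (mod p)
a = center(conv(f, c), q)
recovered = center(conv(Fp, a), p)
assert recovered == m_poly
```

The elimination-based inverse uses the fact, proved for cyclic matrices later in this chapter, that the inverse of an invertible circulant matrix is again circulant, with first column equal to the coefficient vector of the inverse polynomial.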

Next, we use the concept of a *q*-ary lattice to give an equivalent description of NTRU. We first discuss the cyclic matrix. Let *T* and *T*<sub>1</sub> be the following two *N*-order square matrices:

$$T = \begin{pmatrix} 0 & \cdots & 0 & 1 \\ 1 & & & 0 \\ & \ddots & & \vdots \\ & & 1 & 0 \end{pmatrix}, \quad T\_1 = \begin{pmatrix} 0 & 1 & & \\ \vdots & & \ddots & \\ 0 & & & 1 \\ 1 & 0 & \cdots & 0 \end{pmatrix}.$$

Then *T*<sup>*N*</sup> = *T*<sub>1</sub><sup>*N*</sup> = *I<sub>N</sub>*, *T*<sub>1</sub> = *T*′, and *T*<sub>1</sub> = *T*<sup>−1</sup> = *T*<sup>*N*−1</sup> because *T* is an orthogonal matrix, where *I<sub>N</sub>* is the *N*-th order identity matrix. Let *a* = (*a*<sub>1</sub>, *a*<sub>2</sub>,..., *a<sub>N</sub>*) ∈ R<sup>*N*</sup>; it is easy to verify

$$T \cdot \begin{bmatrix} a\_1 \\ a\_2 \\ a\_3 \\ \vdots \\ a\_N \end{bmatrix} = \begin{bmatrix} a\_N \\ a\_1 \\ a\_2 \\ \vdots \\ a\_{N-1} \end{bmatrix}, \quad (a\_1, a\_2, a\_3, \dots, a\_N)T\_1 = (a\_N, a\_1, a\_2, \dots, a\_{N-1}). \tag{7.149}$$

In what follows, we generally assume *a* = (*a*<sub>1</sub>, *a*<sub>2</sub>,..., *a<sub>N</sub>*)′ ∈ R<sup>*N*</sup> is a column vector. The *N*-order cyclic matrix *T*<sup>∗</sup>(*a*) generated by *a* is defined as

$$T^\*(a) = [a, Ta, T^2a, \dots, T^{N-1}a].\tag{7.150}$$

If *b* = (*b*<sub>1</sub>, *b*<sub>2</sub>,..., *b<sub>N</sub>*) ∈ R<sup>*N*</sup> is a row vector, we define an *N*-order matrix


$$T\_1^\*(b) = \begin{bmatrix} b \\ bT\_1 \\ \vdots \\ bT\_1^{N-1} \end{bmatrix}.\tag{7.151}$$

In order to distinguish them in mathematical formulas, *T*<sup>∗</sup>(*a*) and *T*<sub>1</sub><sup>∗</sup>(*a*) are sometimes written as *T*<sup>∗</sup>*a* and *T*<sub>1</sub><sup>∗</sup>*a*, or [*T*<sup>∗</sup>*a*] and [*T*<sub>1</sub><sup>∗</sup>*a*]. Obviously, the transpose of *T*<sup>∗</sup>(*a*) is

$$(T^\*(a))' = \begin{bmatrix} a' \\ a'T\_1 \\ \vdots \\ a'T\_1^{N-1} \end{bmatrix} = T\_1^\*(a'). \tag{7.152}$$

Equation (7.150) is the column-vector partition of the cyclic matrix; in order to obtain the row-vector partition, for any *x* = (*x*<sub>1</sub>,..., *x<sub>N</sub>*) ∈ R<sup>*N*</sup>, we let

$$\overline{\mathfrak{x}} = (\mathfrak{x}\_N, \mathfrak{x}\_{N-1}, \dots, \mathfrak{x}\_2, \mathfrak{x}\_1) \Rightarrow \overline{\overline{\mathfrak{x}}} = \mathfrak{x}.$$

Similarly, we define the reversal $\overline{x}$ of a column vector *x*. So for any column vector *a* ∈ R<sup>*N*</sup>, we have

$$T^\*(a) = [a, Ta, T^2a, \dots, T^{N-1}a] = \begin{bmatrix} \overline{a'}T\_1 \\ \overline{a'}T\_1^2 \\ \vdots \\ \overline{a'}T\_1^N \end{bmatrix}. \tag{7.153}$$

On the right side of (7.153) is a cyclic matrix, which is partitioned by rows. We first prove that the transpose of the cyclic matrix is still a cyclic matrix.

**Lemma 7.49** *For any column vector a* = (*a*<sub>1</sub>, *a*<sub>2</sub>,..., *a<sub>N</sub>*)′ ∈ R<sup>*N*</sup>*, we have*

$$(T^\*(a))' = T^\*\left(\overline{T^{-1}a}\right).$$

*Proof* Let α = (α<sub>1</sub>, α<sub>2</sub>,..., α<sub>*N*</sub>)′ ∈ R<sup>*N*</sup> be a column vector. By (7.152), (*T*<sup>∗</sup>(α))′ = *T*<sub>1</sub><sup>∗</sup>(α′), where α′ = (α<sub>1</sub>,..., α<sub>*N*</sub>) is the transpose of α. Let

$$\beta = (\alpha\_1, \alpha\_N, \alpha\_{N-1}, \dots, \alpha\_2) = \overline{\alpha'}T\_1.$$

Easy to verify


$$T\_1^\*(\beta) = \begin{bmatrix} \beta \\ \beta T\_1 \\ \vdots \\ \beta T\_1^{N-1} \end{bmatrix} = T^\*(\alpha).$$

By (7.152) applied to β′, there is

$$T\_1^\*(\beta) = (T^\*(\beta'))' = T^\*(\alpha).$$

Since $\overline{\alpha'} = (\alpha\_N, \alpha\_{N-1}, \dots, \alpha\_2, \alpha\_1)$ and $\beta = \overline{\alpha'}T\_1$, we get

$$
\beta' = T\overline{\alpha} \Rightarrow T^{-1}\beta' = \overline{\alpha} \Rightarrow \alpha = \overline{T^{-1}\beta'}.
$$

We let *a* = β′; then

$$(T^\*(a))' = (T^\*(\beta'))' = T^\*(\alpha) = T^\*(\overline{T^{-1}a}).$$

We have completed the proof of Lemma.

Next, we give an equivalent characterization of cyclic matrix.

**Lemma 7.50** *Let A* = (*a<sub>ij</sub>*)<sub>*N*×*N*</sub> *and let a* = (*a*<sub>11</sub>, *a*<sub>21</sub>,..., *a*<sub>*N*1</sub>)′ ∈ R<sup>*N*</sup> *be the first column of A. Then A* = *T*<sup>∗</sup>(*a*) *is a cyclic matrix if and only if for all* 1 ≤ *k* ≤ *N, whenever* 1 + *i* − *j* ≡ *k*(mod *N*)*, then a<sub>ij</sub>* = *a*<sub>*k*1</sub>*.*

*Proof* If *A* = *T* <sup>∗</sup>(*a*) is a cyclic matrix, by simple observation, there is

$$\begin{cases} a\_{11} = a\_{22} = \dots = a\_{NN} \\ a\_{21} = a\_{32} = \dots = a\_{N,N-1} \\ \vdots \\ a\_{(N-1)1} = a\_{N2} \\ a\_{N1}, \end{cases}$$

so that for *i* ≥ *j*, *a<sub>ij</sub>* = *a*<sub>*k*1</sub> with 1 + *i* − *j* = *k*. The same holds for *i* < *j*, since then

$$k = N + 1 + i - j \Rightarrow 1 + i - j \equiv k(\text{mod } N).$$

So the Lemma holds.

The following lemma characterizes the main properties of cyclic matrices.

**Lemma 7.51** *If a* = (*a*<sub>1</sub>,..., *a<sub>N</sub>*)′*, b* = (*b*<sub>1</sub>,..., *b<sub>N</sub>*)′ *are two column vectors, then*

*(i) T*<sup>∗</sup>(*a*) + *T*<sup>∗</sup>(*b*) = *T*<sup>∗</sup>(*a* + *b*)*;*

*(ii) T*<sup>∗</sup>(*a*) · *T*<sup>∗</sup>(*b*) = *T*<sup>∗</sup>([*T*<sup>∗</sup>(*a*)]*b*) = *T*<sup>∗</sup>(*b*) · *T*<sup>∗</sup>(*a*)*;*

*(iii)* det(*T*<sup>∗</sup>(*a*)) = ∏<sub>*k*=1</sub><sup>*N*</sup>(*a*<sub>1</sub> + *a*<sub>2</sub>ξ<sub>*k*</sub> + ··· + *a<sub>N</sub>*ξ<sub>*k*</sub><sup>*N*−1</sup>)*, where* ξ<sub>1</sub>,..., ξ<sub>*N*</sub> *are the N distinct N-th roots of unity;*

*(iv) if T*<sup>∗</sup>(*a*) *is invertible, then* (*T*<sup>∗</sup>(*a*))<sup>−1</sup> = *T*<sup>∗</sup>(*b*)*, where b is the first column of* (*T*<sup>∗</sup>(*a*))<sup>−1</sup>*.*
*Proof* (i) is trivial, because

$$T^\*(a) + T^\*(b) = [a+b, T(a+b), \dots, T^{N-1}(a+b)] = T^\*(a+b).$$

To prove (ii), using the row-vector partition of the cyclic matrix (see (7.153)), we have

$$[T^\*(a)]b = \begin{bmatrix} \overline{a'}T\_1\\ \overline{a'}T\_1^2\\ \vdots\\ \overline{a'}T\_1^N \end{bmatrix} b = \begin{bmatrix} \overline{a'}T\_1b\\ \overline{a'}T\_1^2b\\ \vdots\\ \overline{a'}T\_1^Nb \end{bmatrix},$$

and

$$T^\*(a) \cdot T^\*(b) = \begin{bmatrix} \overline{a'}T\_1 \\ \overline{a'}T\_1^2 \\ \vdots \\ \overline{a'}T\_1^N \end{bmatrix} [b, Tb, \dots, T^{N-1}b] = (A\_{ij})\_{N \times N},$$

where

$$A\_{ij} = \overline{a'}T\_1^i \cdot T^{j-1}b = \overline{a'}T\_1^{N+i-j+1}b = \overline{a'}T\_1^{i+1-j}b.$$

By Lemma 7.50, *T*<sup>∗</sup>(*a*) · *T*<sup>∗</sup>(*b*) = *T*<sup>∗</sup>([*T*<sup>∗</sup>(*a*)]*b*), since *A<sub>ij</sub>* depends only on 1 + *i* − *j* mod *N* and the first column of (*A<sub>ij</sub>*) is [*T*<sup>∗</sup>(*a*)]*b*; this is the first conclusion of (ii). We notice that

$$A\_{ij} = A'\_{ij} = b'T\_1^{j-1}T^i \overline{a} = b'T\_1^{N-i-1+j} \overline{a} = b'T\_1^{j-i-1} \overline{a}.$$

It is easy to prove that for any row vector *x* and column vector *y*, there is $x \cdot y = \overline{x} \cdot \overline{y}$, and

$$
\overline{\mathfrak{x}T\_1^k} = \overline{\mathfrak{x}} \cdot T\_1^{N-k}, \; 1 \le k \le N. \tag{7.154}
$$

Thus,

$$A\_{ij} = b'T\_1^{j-i-1}\overline{a} = \overline{b'}T\_1^{N+i+1-j}a = \overline{b'}T\_1^{i+1-j}a.$$

This proves that *T*<sup>∗</sup>(*a*)*T*<sup>∗</sup>(*b*) = *T*<sup>∗</sup>(*b*)*T*<sup>∗</sup>(*a*); that is, the multiplication of cyclic matrices is commutative.
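Conclusion (ii) is easy to check numerically. The sketch below (with hypothetical vectors *a*, *b*) builds *T*<sup>∗</sup>(*a*) from its defining pattern *a*<sub>(*i*−*j*) mod *N*</sub> and verifies both equalities of (ii):

```python
def circulant(a):
    """T*(a): column j is T^j a, so entry (i, j) equals a[(i - j) mod N]."""
    N = len(a)
    return [[a[(i - j) % N] for j in range(N)] for i in range(N)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

a, b = [1, 2, 3, 4], [5, 6, 7, 8]
AB = matmul(circulant(a), circulant(b))
first_col = [row[0] for row in AB]     # equals [T*(a)]b, the convolution a ⊗ b
assert AB == matmul(circulant(b), circulant(a))   # commutativity
assert AB == circulant(first_col)                 # T*(a)T*(b) = T*([T*(a)]b)
print(first_col)  # → [66, 68, 66, 60]
```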

To prove (iii), let *A* = (*T*<sup>∗</sup>(*a*))′; since det(*T*<sup>∗</sup>(*a*)) = det((*T*<sup>∗</sup>(*a*))′), we just need to calculate det(*A*). Form the polynomial *f*(*x*) = *a*<sub>1</sub> + *a*<sub>2</sub>*x* + ··· + *a<sub>N</sub>x*<sup>*N*−1</sup>, let ξ<sub>1</sub>, ξ<sub>2</sub>,..., ξ<sub>*N*</sub> be the *N* distinct *N*-th roots of unity, and let

$$V = \begin{bmatrix} 1 & 1 & 1 & \cdots & 1 \\ \xi\_1 & \xi\_2 & \xi\_3 & \cdots & \xi\_N \\ \xi\_1^2 & \xi\_2^2 & \xi\_3^2 & \cdots & \xi\_N^2 \\ \vdots & \vdots & \vdots & & \vdots \\ \xi\_1^{N-1} & \xi\_2^{N-1} & \xi\_3^{N-1} & \cdots & \xi\_N^{N-1} \end{bmatrix}.$$

Then

$$AV = \begin{bmatrix} f(\xi\_1) & f(\xi\_2) & \cdots & f(\xi\_N) \\ \xi\_1 f(\xi\_1) & \xi\_2 f(\xi\_2) & \cdots & \xi\_N f(\xi\_N) \\ \vdots & \vdots & & \vdots \\ \xi\_1^{N-1} f(\xi\_1) & \xi\_2^{N-1} f(\xi\_2) & \cdots & \xi\_N^{N-1} f(\xi\_N) \end{bmatrix}.$$

So

$$\det(A)\det(V) = \det(AV) = f(\xi\_1)f(\xi\_2)\cdots f(\xi\_N)\det(V).$$

Because the ξ<sub>*i*</sub> are distinct, det(*V*) ≠ 0, so

$$\begin{aligned} \det(A) &= f(\xi\_1) f(\xi\_2) \cdots f(\xi\_N) \\ &= \prod\_{k=1}^N f(\xi\_k) \\ &= \prod\_{k=1}^N (a\_1 + a\_2 \xi\_k + \cdots + a\_N \xi\_k^{N-1}). \end{aligned}$$
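The determinant formula of (iii) can be checked numerically against the complex *N*-th roots of unity; a minimal sketch for *N* = 3 with a hypothetical vector *a*:

```python
import cmath

a = [1, 2, 3]                      # a_1, a_2, a_3
N = len(a)
# T*(a) has entries a[(i - j) mod N]; compute its det by the 3x3 rule
M = [[a[(i - j) % N] for j in range(N)] for i in range(N)]
det = (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
       - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
       + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))
# product of f(xi_k) over all N-th roots of unity, f(x) = a_1 + a_2 x + a_3 x^2
prod = 1
for k in range(N):
    xi = cmath.exp(2j * cmath.pi * k / N)
    prod *= sum(a[i] * xi ** i for i in range(N))
print(det)                         # → 18
assert abs(prod - det) < 1e-9      # det(T*(a)) = f(xi_1) f(xi_2) f(xi_3)
```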

Now we prove (iv). Let *e* = (1, 0,..., 0)′ ∈ R<sup>*N*</sup>; then

$$T^\*(e) = [e, Te, \dots, T^{N-1}e] = I\_N.$$

So take *<sup>b</sup>* <sup>∈</sup> <sup>R</sup>*<sup>N</sup>* to satisfy

$$T^\*(a) \cdot b = e \Rightarrow b = (T^\*(a))^{-1}e.$$

Obviously, *b* is the first column of (*T*<sup>∗</sup>(*a*))<sup>−1</sup>; by (ii),

$$T^\*(a)T^\*(b) = T^\*([T^\*(a)]b) = T^\*(e) = I\_N.$$

Thus, (*T*<sup>∗</sup>(*a*))<sup>−1</sup> = *T*<sup>∗</sup>(*b*). In other words, the inverse of an invertible cyclic matrix is also a cyclic matrix.

**Corollary 7.10** *Let N be a prime and a* = (*a*<sub>1</sub>,..., *a<sub>N</sub>*)′ ∈ R<sup>*N*</sup> *satisfy that a is not a constant vector and* ∑<sub>*i*=1</sub><sup>*N*</sup> *a<sub>i</sub>* ≠ 0*; then the cyclic matrix T*<sup>∗</sup>(*a*) *generated by a is an invertible square matrix.*

*Proof* Under the given conditions, we need only prove det(*T*<sup>∗</sup>(*a*)) ≠ 0. Let ε<sub>*k*</sub> = exp(2π*ik*/*N*), 1 ≤ *k* ≤ *N* − 1, be the *N* − 1 primitive *N*-th roots of unity (because *N* is a prime). If det(*T*<sup>∗</sup>(*a*)) = 0, then because ∑<sub>*i*=1</sub><sup>*N*</sup> *a<sub>i</sub>* ≠ 0, there must be a *k*, 1 ≤ *k* ≤ *N* − 1, such that

$$a\_1 + \varepsilon\_k a\_2 + \varepsilon\_k^2 a\_3 + \dots + \varepsilon\_k^{N-1} a\_N = 0.$$

In other words, ε<sub>*k*</sub> is a root of the polynomial φ(*x*) = *a*<sub>1</sub> + *a*<sub>2</sub>*x* + ··· + *a<sub>N</sub>x*<sup>*N*−1</sup>, so φ(*x*) and 1 + *x* + ··· + *x*<sup>*N*−1</sup> have the common root ε<sub>*k*</sub>; therefore, the greatest common divisor of the two polynomials is nontrivial:

$$(\phi(x), 1 + x + \dots + x^{N-1}) \neq 1.$$

Since 1 + *x* + ··· + *x*<sup>*N*−1</sup> is a cyclotomic polynomial, it is irreducible (as *N* is prime), so it divides φ(*x*); since deg φ ≤ *N* − 1, φ(*x*) must be a constant multiple of it, i.e., *a* is a constant vector, a contradiction. This shows det(*T*<sup>∗</sup>(*a*)) ≠ 0, and the Corollary holds.

Next, we give an equivalent description of the NTRU lattice by using the cyclic matrix. Firstly, we define a linear transformation σ on the even-dimensional Euclidean space R<sup>2*N*</sup>: if *x* and *y* are two column vectors, define

$$
\sigma \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} Tx \\ Ty \end{bmatrix} \in \mathbb{R}^{2N}.\tag{7.155}
$$

Equivalently, if *x* ∈ R<sup>*N*</sup>, *y* ∈ R<sup>*N*</sup> are two row vectors, define

$$
\sigma(x, y) = (xT\_1, yT\_1) \in \mathbb{R}^{2N}.\tag{7.156}
$$

Obviously, σ defined above is a linear transformation R<sup>2*N*</sup> → R<sup>2*N*</sup>.

**Definition 7.12** An integer lattice *L* ⊂ R<sup>2*N*</sup> is called a convolution *q*-ary lattice if


$$
\begin{bmatrix} \mathbf{x} \\ \mathbf{y} \end{bmatrix} \in L \Rightarrow \sigma \begin{bmatrix} \mathbf{x} \\ \mathbf{y} \end{bmatrix} = \begin{bmatrix} T\mathbf{x} \\ T\mathbf{y} \end{bmatrix} \in L.
$$

Recall that NTRU's private key consists of two polynomials of degree *N* − 1, *f* = ∑<sub>*i*=0</sub><sup>*N*−1</sup> *f<sub>i</sub>x<sup>i</sup>*, *g* = ∑<sub>*i*=0</sub><sup>*N*−1</sup> *g<sub>i</sub>x<sup>i</sup>*; write *f* and *g* in column vector form:

$$f = \begin{bmatrix} f\_0 \\ \vdots \\ f\_{N-1} \end{bmatrix} \in \mathbb{Z}^N, \quad f' = (f\_0, f\_1, \dots, f\_{N-1}) \in \mathbb{Z}^N.$$

And

$$g = \begin{bmatrix} g\_0 \\ \vdots \\ g\_{N-1} \end{bmatrix} \in \mathbb{Z}^N, \quad g' = (g\_0, g\_1, \dots, g\_{N-1}) \in \mathbb{Z}^N.$$

NTRU's parameter system consists of three positive integers *N*, *q*, *p*, where *N* is prime and *p* < *q*. Define the polynomial set

$$A\_d\{p, 0, -p\} = \{f(x) \in \mathbb{Z}^N \mid d+1 \text{ coefficients of } f \text{ are } p,\ d \text{ coefficients of } f \text{ are } -p, \text{ the others are } 0\}. \tag{7.157}$$

Select two polynomials *f*, *g* ∈ Z<sup>*N*</sup> of degree *N* − 1 and a positive integer parameter *d<sub>f</sub>* which meet the following restrictions:


$$\text{(A)}\ f - 1 \in A\_{d\_f}\{p, 0, -p\}, \qquad \text{(B)}\ g \in A\_{d\_f}\{p, 0, -p\}.$$

(C) *T*<sup>∗</sup>(*f*) is invertible mod *q*.

The above (A)–(C) are the parameter constraints of NTRU. Obviously, under these conditions, *T*<sup>∗</sup>(*f*) and *T*<sup>∗</sup>(*g*) are invertible matrices, and

$$T^\*(f) \equiv I\_N(\text{mod } p), \\ T^\*(g) \equiv 0(\text{mod } p). \tag{7.158}$$
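Conditions (A) and (B) yield the congruences (7.158) immediately, since every coefficient of *f* − 1 and of *g* is a multiple of *p*. A quick numerical check, with hypothetical *f*, *g* for *p* = 3, *N* = 5:

```python
def circulant(a):
    """T*(a): entry (i, j) equals a[(i - j) mod N]."""
    N = len(a)
    return [[a[(i - j) % N] for j in range(N)] for i in range(N)]

p, N = 3, 5
f = [1 + 3, 0, -3, 3, 0]       # f - 1 has coefficients in {p, 0, -p}
g = [3, -3, 0, 0, 3]           # g has coefficients in {p, 0, -p}
Tf_mod_p = [[x % p for x in row] for row in circulant(f)]
Tg_mod_p = [[x % p for x in row] for row in circulant(g)]
I = [[1 if i == j else 0 for j in range(N)] for i in range(N)]
assert Tf_mod_p == I                                 # T*(f) ≡ I_N (mod p)
assert all(x == 0 for row in Tg_mod_p for x in row)  # T*(g) ≡ 0 (mod p)
```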

After polynomials *f* and *g* satisfying the above conditions are selected as the private key, we have $\binom{f}{g} \in \mathbb{Z}^{2N}$; let us construct a minimal convolution *q*-ary lattice containing $\binom{f}{g}$. Suppose

$$A = [T\_1^\*(f'), T\_1^\*(g')]\_{N \times 2N}, \text{ and } A' = \begin{bmatrix} T^\*(f) \\ T^\*(g) \end{bmatrix}. \tag{7.159}$$

Consider *A* as an *N* × 2*N* matrix over Z<sub>*q*</sub>, that is, *A* ∈ Z<sub>*q*</sub><sup>*N*×2*N*</sup>; then by (7.45), *A* defines a 2*N*-dimensional *q*-ary lattice Λ<sub>*q*</sub>(*A*), that is,

$$\Lambda\_q(A) = \{ y \in \mathbb{Z}^{2N} \mid \text{there is } x \in \mathbb{Z}^N \text{ such that } y \equiv A'x(\text{mod } q) \}. \tag{7.160}$$

We prove that Λ<sub>*q*</sub>(*A*) is a convolution *q*-ary lattice containing $\binom{f}{g}$. First, we prove the following general identity.

**Lemma 7.52** *Suppose a* = (*a*<sub>1</sub>,..., *a<sub>N</sub>*)′ ∈ R<sup>*N*</sup>*; then for x* ∈ R<sup>*N*</sup> *and* 0 ≤ *k* ≤ *N* − 1*, we have*

$$T^k(T^\*(a)x) = T^\*(a)(T^kx), \text{ where } T^0 = I\_N.$$

*Proof* The case *k* = 0 is trivial, and by iteration it suffices to prove the case *k* = 1, that is,

$$T(T^\*(a)\mathbf{x}) = T^\*(a)(Tx). \tag{7.161}$$

By (7.153),

$$T(T^\*(a)\boldsymbol{x}) = T\begin{bmatrix} \overline{a'}T\_1\boldsymbol{x} \\ \overline{a'}T\_1^2\boldsymbol{x} \\ \vdots \\ \overline{a'}T\_1^N\boldsymbol{x} \end{bmatrix} = \begin{bmatrix} \overline{a'}\boldsymbol{x} \\ \overline{a'}T\_1\boldsymbol{x} \\ \vdots \\ \overline{a'}T\_1^{N-1}\boldsymbol{x} \end{bmatrix}.$$

Because *T* = *T*<sub>1</sub><sup>*N*−1</sup>, the right side of Eq. (7.161) is

$$T^\*(a)(Tx) = \begin{bmatrix} \overline{a'}T\_1Tx\\ \overline{a'}T\_1^2Tx\\ \vdots\\ \overline{a'}T\_1^NTx \end{bmatrix} = \begin{bmatrix} \overline{a'}x\\ \overline{a'}T\_1x\\ \vdots\\ \overline{a'}T\_1^{N-1}x \end{bmatrix}.$$

So (7.161) holds, the Lemma holds.

**Lemma 7.53** *Λ<sub>q</sub>*(*A*) *is a convolution q-ary lattice, and* $\binom{f}{g} \in \Lambda\_q(A)$*.*

*Proof* By Lemma 7.27, Λ<sub>*q*</sub>(*A*) is a *q*-ary lattice, that is, *q*Z<sup>2*N*</sup> ⊂ Λ<sub>*q*</sub>(*A*) ⊂ Z<sup>2*N*</sup>; we only need to prove that Λ<sub>*q*</sub>(*A*) is closed under the linear transformation σ. If *y* ∈ Λ<sub>*q*</sub>(*A*), then there is *x* ∈ Z<sup>*N*</sup> such that *y* ≡ *A*′*x*(mod *q*); by the definition of σ and Lemma 7.52,

$$\sigma(y) \equiv \begin{bmatrix} T(T^\*(f)x) \\ T(T^\*(g)x) \end{bmatrix} = \begin{bmatrix} T^\*(f)Tx \\ T^\*(g)Tx \end{bmatrix} \equiv A'Tx(\text{mod } q).$$

Because *x* ∈ Z<sup>*N*</sup> ⇒ *Tx* ∈ Z<sup>*N*</sup>, thus σ(*y*) ∈ Λ<sub>*q*</sub>(*A*); that is, Λ<sub>*q*</sub>(*A*) is a convolution *q*-ary lattice. It remains to prove $\binom{f}{g} \in \Lambda\_q(A)$. Let *e* = (1, 0,..., 0)′ ∈ Z<sup>*N*</sup>; then *T*<sup>∗</sup>(*f*) · *e* is the first column of *T*<sup>∗</sup>(*f*), that is,

$$T^\*(f)e = f,\\ T^\*(g)e = g.$$

Thus,

$$A'e = \begin{bmatrix} T^\*(f)e \\ T^\*(g)e \end{bmatrix} = \begin{bmatrix} f \\ g \end{bmatrix} \in \Lambda\_q(A).$$

The Lemma holds.

With the above preparation, we now introduce the equivalent form of NTRU in lattice theory.

Public key generation. After the private key $\binom{f}{g} \in \mathbb{Z}^{2N}$ is selected, NTRU's public key is generated as follows. Because the convolution *q*-ary lattice Λ<sub>*q*</sub>(*A*) containing $\binom{f}{g}$ is an integer lattice, Λ<sub>*q*</sub>(*A*) has a unique HNF basis *H*, where

$$H = \begin{bmatrix} I\_N & T^\*(h) \\ 0 & qI\_N \end{bmatrix}, \quad h \equiv [T^\*(f)]^{-1} g(\text{mod } q). \tag{7.162}$$

By (7.48) of Lemma 7.28, the determinant *d*(Λ<sub>*q*</sub>(*A*)) of Λ<sub>*q*</sub>(*A*) is

$$d(\Lambda\_q(A)) = q^{2N-N} = q^N.$$

So the diagonal blocks of *H* are *I<sub>N</sub>* and *qI<sub>N</sub>*. By assumption, *T*<sup>∗</sup>(*f*) ∈ Z<sup>*N*×*N*</sup> is invertible mod *q*, and [*T*<sup>∗</sup>(*f*)]<sup>−1</sup> is the inverse matrix of *T*<sup>∗</sup>(*f*) mod *q*; *h* ∈ Z<sup>*N*</sup>, with each component *h<sub>i</sub>* selected between −*q*/2 and *q*/2, that is, −*q*/2 ≤ *h<sub>i</sub>* < *q*/2, and such an *h* is unique. It is not difficult to verify that *H* is an HNF matrix and the lattice generated by *H* is Λ<sub>*q*</sub>(*A*), so *H* is the HNF basis of Λ<sub>*q*</sub>(*A*). *H* is published as the public key.

Encryption transformation. The message sender encodes the plaintext as *m* ∈ Z<sup>*N*</sup> and randomly selects a vector *r* ∈ Z<sup>*N*</sup> satisfying

$$m \in A\_{d\_f}\{1, 0, -1\}, \quad r \in A\_{d\_f}\{1, 0, -1\}. \tag{7.163}$$

That is, *m* has *d<sub>f</sub>* + 1 components equal to 1 and *d<sub>f</sub>* components equal to −1, and its other components are 0. Then the plaintext *m* is encrypted with the public key *H* of the message recipient:

$$c = H \begin{bmatrix} m \\ r \end{bmatrix} \equiv \begin{bmatrix} m + [T^\*(h)]r \\ 0 \end{bmatrix} (\text{mod } q). \tag{7.164}$$

*c* is called the ciphertext; its first *N* components are *m* + [*T*<sup>∗</sup>(*h*)]*r* and its last *N* components are 0.

Decryption transformation. If all components of *m* + [*T*<sup>∗</sup>(*h*)]*r* lie in the interval [−*q*/2, *q*/2), the message receiver can determine that the ciphertext *c* is

$$c = \begin{bmatrix} m + [T^\*(h)]r \\ 0 \end{bmatrix}.$$

Then it is decrypted with the receiver's own private key *T*<sup>∗</sup>(*f*), applied to the first *N* components of *c* (still denoted *c*):

$$\begin{aligned} [T^\*(f)]c &\equiv [T^\*(f)]m + [T^\*(f)][T^\*(h)]r(\text{mod } q) \\ &\equiv [T^\*(f)]m + [T^\*(g)]r(\text{mod } q). \end{aligned} \tag{7.165}$$

By the definition of *h*, there is

$$[T^\*(f)]h \equiv g(\text{mod } q) \Rightarrow T^\*([T^\*(f)]h) \equiv T^\*(g)(\text{mod } q).$$

And by Lemma 7.51, *T*<sup>∗</sup>([*T*<sup>∗</sup>(*f*)]*h*) ≡ *T*<sup>∗</sup>(*f*) · *T*<sup>∗</sup>(*h*), so

$$T^\*(f)T^\*(h) \equiv T^\*(g)(\text{mod } q).$$

Equation (7.165) holds.

If

$$[T^\*(f)]m + [T^\*(g)]r \in \left[ -\frac{q}{2}, \frac{q}{2} \right]^N. \tag{7.166}$$

then do the mod *p* operation on [*T*<sup>∗</sup>(*f*)]*m* + [*T*<sup>∗</sup>(*g*)]*r*; by (7.158),

$$\left( [T^\*(f)]m + [T^\*(g)]r \right) \text{mod } p = I\_N m + 0 \cdot r = m. \tag{7.167}$$

The correctness of decryption transformation is guaranteed.

In order to ensure that (7.167) holds, it can be seen from the above analysis that the following conditions are necessary.

$$\begin{cases} m + [T^\*(h)]r \in [-\frac{q}{2}, \frac{q}{2}]^N\\ [T^\*(f)]m + [T^\*(g)]r \in [-\frac{q}{2}, \frac{q}{2}]^N. \end{cases} \tag{7.168}$$

Obviously, the first condition can be derived from the second; that is, (7.168) can be derived from (7.166). We first prove the following Lemma.

**Lemma 7.54** *If the parameter meets d<sub>f</sub>* < (*q*/4 − 1)/(2*p*)*, then*

$$[T^\*(f)]m + [T^\*(g)]r \in \left[ -\frac{q}{2}, \frac{q}{2} \right]^N.$$

*Proof* Because all components of *m* and *r* are ±1 or 0, we only need to prove that the sum of the absolute values of the entries in each row of [*T*<sup>∗</sup>(*f*)] and of [*T*<sup>∗</sup>(*g*)] is less than *q*/4. Write *f*′ = (*f*<sub>0</sub>, *f*<sub>1</sub>,..., *f*<sub>*N*−1</sub>); because *f* − 1 ∈ *A*<sub>*d<sub>f</sub>*</sub>{*p*, 0, −*p*},

$$\left| \sum\_{i=0}^{N-1} f\_i \right| \le \sum\_{i=0}^{N-1} |f\_i| = 1 + (2d\_f + 1)p < \frac{q}{4}.$$

Similarly,

$$\left| \sum\_{i=0}^{N-1} g\_i \right| \le \sum\_{i=0}^{N-1} |g\_i| = (2d\_f + 1)p < \frac{q}{4}.$$

Thus

$$[T^\*(f)]m + [T^\*(g)]r \in \left[ -\frac{q}{2}, \frac{q}{2} \right]^N.$$

The Lemma holds.

According to the above lemma, the NTRU algorithm needs the following additional condition to ensure the correctness of the decryption transformation:

(D)

$$d\_f < \frac{(\frac{q}{4} - 1)}{2p}.$$

To sum up, when the NTRU cryptosystem satisfies the additional restrictions (A)–(D) on the parameter system, the private key is $\binom{f}{g}$ and the public key is the HNF matrix *H*; the encryption and decryption algorithms are as introduced above.

# **7.8 McEliece/Niederreiter Cryptosystem**

The McEliece/Niederreiter cryptosystem is a cryptosystem designed around the asymmetry between encoding and decoding for a special class of linear codes (Goppa codes) over a finite field. It was proposed by McEliece in 1978 and by Niederreiter in 1985, and it belongs to the category of post-quantum cryptography. We start with cyclic codes. Recall the concept of a linear code from Chap. 2: let $\mathbb{F}_q$ be a $q$-element finite field, also known as the alphabet, whose elements are called letters or characters. The $N$-dimensional linear space $\mathbb{F}_q^N$ over $\mathbb{F}_q$ is called the codeword space of length $N$. Any vector $a = (a_0, a_1, \dots, a_{N-1}) \in \mathbb{F}_q^N$ is called a codeword of length $N$, usually written as $a = a_0 a_1 \cdots a_{N-1} \in \mathbb{F}_q^N$. From the previous section, we have

$$aT\_1 = (a\_0, a\_1, \dots, a\_{N-1})T\_1 = (a\_{N-1}, a\_0, a\_1, \dots, a\_{N-2}).\tag{7.169}$$
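The action of $T_1$ in (7.169), a one-step cyclic shift of the codeword, can be sketched directly (a minimal list-based sketch):

```python
def shift_T1(a):
    """Cyclic shift (7.169): (a_0, ..., a_{N-1}) -> (a_{N-1}, a_0, ..., a_{N-2})."""
    return [a[-1]] + a[:-1]

print(shift_T1([0, 1, 2, 3, 4]))   # [4, 0, 1, 2, 3]
```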

The reverse codeword $\overline{a}$ of a codeword $a = a_0 a_1 \cdots a_{N-1}$ is defined as

$$
\overline{a} = a\_{N-1} a\_{N-2} \cdots a\_1 a\_0 \in \mathbb{F}\_q^N. \tag{7.170}
$$

If $C \subset \mathbb{F}_q^N$ is a $k$-dimensional linear subspace of $\mathbb{F}_q^N$, then $C$ is called a linear code, usually written as $C = [N, k]$. For $k = 0$ or $k = N$, the codes $[N, 0]$ and $[N, N]$ are called trivial codes; in fact,

$$[N,0] = \{0 = 00 \cdots 0\}, \quad [N,N] = \mathbb{F}_q^N.$$

The reverse-order code $\overline{C}$ of a code $C$ is defined as $\overline{C} = \{\overline{c} \mid c \in C\}$; obviously, if $C = [N, k]$, then $\overline{C} = [N, k]$.

**Definition 7.13** A linear code $C$ of length $N$ is called a cyclic code if $\forall\, c \in C \Rightarrow cT_1 \in C$.

Next, we give an algebraic description of cyclic codes using ideal theory. For this purpose, note that $\mathbb{F}_q[x]$ is the univariate polynomial ring over $\mathbb{F}_q$, and $\langle x^N - 1 \rangle$ is the principal ideal generated by the polynomial $x^N - 1$. Write $R = \mathbb{F}_q[x]/\langle x^N - 1 \rangle$ for the quotient ring. If $a = a_0 a_1 \cdots a_{N-1} \in \mathbb{F}_q^N$, then $a(x) = a_0 + a_1 x + \cdots + a_{N-1} x^{N-1} \in R$, so $a \to a(x)$ is a 1-1 correspondence $\mathbb{F}_q^N \to R$ and an isomorphism of additive groups. Under this correspondence we identify the codeword $a$ with the polynomial $a(x)$; that is, $a = a(x)$, $\mathbb{F}_q^N = R = \mathbb{F}_q[x]/\langle x^N - 1 \rangle$, and any code $C \subset \mathbb{F}_q^N$ corresponds to

$$C = C(x) = \{c(x) \mid c \in C\} \subset R.$$

That is, a code $C$ is equivalent to a subset of $\mathbb{F}_q[x]/\langle x^N - 1 \rangle$. The following lemma reveals the algebraic meaning of a cyclic code.

**Lemma 7.55** *$C \subset \mathbb{F}_q^N$ is a cyclic code $\Leftrightarrow$ $C(x)$ is an ideal in $\mathbb{F}_q[x]/\langle x^N - 1 \rangle$.*

*Proof* If $C(x)$ is an ideal of $\mathbb{F}_q[x]/\langle x^N - 1 \rangle$, then obviously $C$ is a linear code, and for any codeword $c = c_0 c_1 \cdots c_{N-1}$ there is $c(x) = c_0 + c_1 x + \cdots + c_{N-1} x^{N-1} \in C(x)$, thus $xc(x) = c_{N-1} + c_0 x + c_1 x^2 + \cdots + c_{N-2} x^{N-1} \in C(x)$. So $cT_1 = c_{N-1} c_0 c_1 \cdots c_{N-2} \in C$, and $C$ is a cyclic code over $\mathbb{F}_q$. Conversely, if $C$ is a cyclic code, then $cT_1 \in C$, and thus $cT_1^k \in C$ for all $0 \le k \le N - 1$, where $T_1^0 = I_N$ is the $N$-th order identity matrix. Since the polynomial $cT_1^k(x)$ corresponding to $cT_1^k$ is

$$cT\_1^k(\mathbf{x}) = \mathbf{x}^k c(\mathbf{x}).\tag{7.171}$$

So ∀ *g*(*x*) ∈ *R* ⇒ *g*(*x*)*c*(*x*) ∈ *C*(*x*). This proves that *C*(*x*) is an ideal. The Lemma holds.

Using the homomorphism theorem of rings, we give the mathematical expressions of all ideals in $R$. Let $\pi$ be the natural homomorphism $\mathbb{F}_q[x] \xrightarrow{\pi} \mathbb{F}_q[x]/\langle x^N - 1 \rangle$; then the ideals in $R$ correspond one by one to the ideals of $\mathbb{F}_q[x]$ containing $\ker \pi = \langle x^N - 1 \rangle$, that is

$$\ker \pi = \langle \mathfrak{x}^N - 1 \rangle \subset A \subset \mathbb{F}\_q[\mathfrak{x}] \xrightarrow{\pi} \mathbb{F}\_q[\mathfrak{x}]/\langle \mathfrak{x}^N - 1 \rangle = R.$$

Since $\mathbb{F}_q[x]$ is a principal ideal ring, $A$ is an ideal of $\mathbb{F}_q[x]$, and $\langle x^N - 1 \rangle \subset A$, then

$$A = \langle \mathbf{g}(\mathbf{x}) \rangle, \text{ where } \mathbf{g}(\mathbf{x}) | \mathbf{x}^N - 1. \tag{7.172}$$

Therefore, all ideals in $R$ are principal, there are only finitely many, and they can be listed as follows:

$$\{\langle g(x) \rangle \bmod x^N - 1 \mid g(x) \text{ divides } x^N - 1\},$$

where *g*(*x*) mod *x <sup>N</sup>* − 1 represents the principal ideal generated by *g*(*x*) in *R*, that is

$$\langle \mathbf{g}(\mathbf{x}) \rangle \mathbf{mod} \, \mathbf{x}^N - 1 = \{ \mathbf{g}(\mathbf{x}) f(\mathbf{x}) | 0 \le \deg f(\mathbf{x}) \le N - \deg(\mathbf{g}(\mathbf{x})) - 1 \}. \tag{7.173}$$

This proves that $\mathbb{F}_q[x]/\langle x^N - 1 \rangle$ is a principal ideal ring, and the number of its ideals is $d + 1$, where $d$ is the number of positive factors of $x^N - 1$. A positive factor here means a factor whose leading coefficient is 1. Therefore, we have the following Corollary:

**Corollary 7.11** *Let $d$ be the number of positive factors of $x^N - 1$; then the number of cyclic codes of length $N$ is $d + 1$.*

A cyclic code $C$ corresponds to an ideal $C(x) = \langle g(x) \rangle \bmod x^N - 1$ in $R$, so we define:

**Definition 7.14** Let $C$ be a cyclic code. If $C(x) = \langle g(x) \rangle \bmod x^N - 1$, then $g(x)$ is called the generating polynomial of $C$, where $g(x) \mid x^N - 1$.

If *g*(*x*) = *x <sup>N</sup>* − 1, then *x <sup>N</sup>* − 1 mod *x <sup>N</sup>* − 1 = 0, corresponding to zero ideal in *R*. Thus, the corresponding cyclic code *C* = {0 = 00 ··· 0} is called zero code. If *<sup>g</sup>*(*x*) <sup>=</sup> 1, then *g*(*x*) mod *<sup>x</sup> <sup>N</sup>* <sup>−</sup> <sup>1</sup> <sup>=</sup> *<sup>R</sup>*. The corresponding code *<sup>C</sup>* <sup>=</sup> <sup>F</sup>*<sup>N</sup> <sup>q</sup>* . Therefore, there are always two trivial cyclic codes in cyclic codes of length *N*, zero code and F*<sup>N</sup> <sup>q</sup>* , which correspond to zero ideal in *R* and *R* itself, respectively.

**Lemma 7.56** *Let $g(x) \mid x^N - 1$, let $g(x)$ be the generating polynomial of a cyclic code $C$, and let $\deg g(x) = N - k$; then $C$ is an $[N, k]$ linear code. Further, write $g(x) = g_0 + g_1 x + \cdots + g_{N-k-1} x^{N-k-1} + g_{N-k} x^{N-k}$ with corresponding codeword $g = (g_0, g_1, \dots, g_{N-k}, 0, 0, \dots, 0) \in C$; then the generating matrix $G$ of $C$ is*

$$G = \begin{bmatrix} g \\ gT\_1 \\ \vdots \\ gT\_1^{k-1} \end{bmatrix}\_{k \times N} . \tag{7.174}$$

*Proof* Let $C$ correspond to the ideal $C(x) = \langle g(x) \rangle \bmod x^N - 1$; then $g(x), xg(x), \dots, x^{k-1} g(x) \in C(x)$, and their corresponding codewords are $\{g, gT_1, \dots, gT_1^{k-1}\} \subset C$. Let us prove that $\{g, gT_1, \dots, gT_1^{k-1}\}$ is a set of bases of $C$. If $\exists\, a_i \in \mathbb{F}_q \Rightarrow \sum_{i=0}^{k-1} a_i g T_1^i = 0$, then the corresponding polynomial is 0, that is

$$\left(\sum_{i=0}^{k-1} a_i g T_1^i \right)(x) = \sum_{i=0}^{k-1} a_i g T_1^i(x) = \sum_{i=0}^{k-1} a_i x^i g(x) = 0.$$

Thus

$$\sum\_{i=0}^{k-1} a\_i x^i = 0 \Rightarrow \forall \; a\_i = 0, 0 \le i \le k - 1.$$

That is, $\{g, gT_1, \dots, gT_1^{k-1}\}$ is a linearly independent set in $C$. Further, for $\forall\, c \in C$ we can prove that $c$ can be expressed linearly in terms of it. Suppose $c \in C$; then $c(x) \in C(x)$, and by (7.173) there is $f(x)$,

$$\begin{aligned} f(\mathbf{x}) &= f\_0 + f\_1 \mathbf{x} + \dots + f\_{k-1} \mathbf{x}^{k-1} \Rightarrow c(\mathbf{x}) = g(\mathbf{x}) f(\mathbf{x}), \\ &= \sum\_{i=0}^{k-1} f\_i \mathbf{x}^i g(\mathbf{x}) \Rightarrow c = \sum\_{i=0}^{k-1} f\_i g \, T\_1^i. \end{aligned}$$

This proves that the dimension of linear subspace *C* is *N* − deg *g*(*x*) = *k*; that is, *C* is [*N*, *k*] linear code. Its generating matrix *G* is

$$G = \begin{bmatrix} g \\ gT_1 \\ \vdots \\ gT_1^{k-1} \end{bmatrix}_{k \times N}.$$

The Lemma holds.
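Lemma 7.56 can be checked numerically over $\mathbb{F}_2$; the sketch below (illustrative values $N = 7$, $g(x) = x^3 + x + 1$, so $k = 4$, anticipating Example 7.1) builds the rows $g, gT_1, \dots, gT_1^{k-1}$, spans them, and confirms that the resulting $[7, 4]$ code has $2^4 = 16$ codewords and is closed under the cyclic shift:

```python
# Sketch of Lemma 7.56 over F_2: the rows g, gT_1, ..., gT_1^{k-1} generate
# the cyclic code with generating polynomial g(x). Values are illustrative.
from itertools import product

def shift(v):
    """One-step cyclic shift vT_1."""
    return [v[-1]] + v[:-1]

def gen_matrix(g, k):
    """Generating matrix G of (7.174): rows g, gT_1, ..., gT_1^{k-1}."""
    rows, row = [], g[:]
    for _ in range(k):
        rows.append(row)
        row = shift(row)
    return rows

def span_f2(G):
    """All F_2-linear combinations of the rows of G."""
    N = len(G[0])
    code = set()
    for coeffs in product([0, 1], repeat=len(G)):
        c = [sum(a * row[j] for a, row in zip(coeffs, G)) % 2 for j in range(N)]
        code.add(tuple(c))
    return code

g = [1, 1, 0, 1, 0, 0, 0]          # codeword of g(x) = 1 + x + x^3, N = 7
C = span_f2(gen_matrix(g, 4))
print(len(C))                       # 16 codewords: C is a [7, 4] code
print(all(tuple(shift(list(c))) in C for c in C))  # closed under T_1
```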

Next, we discuss the dual code of cyclic code and its check matrix.

**Lemma 7.57** *Let $C \subset \mathbb{F}_q^N$ be a cyclic code and $g(x)$ be the generating polynomial of $C$, $\deg g(x) = N - k$. Let $g(x)h(x) = x^N - 1$, $h(x) = h_0 + h_1 x + \cdots + h_k x^k$, with corresponding codeword $h = (h_0, h_1, \dots, h_k, 0, 0, \dots, 0) \in \mathbb{F}_q^N$, and let $\overline{h}$ be the reverse-order codeword; then the check matrix of $C$ is*

$$H = \begin{bmatrix} \overline{h} \\ \overline{h}T\_1 \\ \vdots \\ \overline{h}T\_1^{N-k-1} \end{bmatrix}\_{(N-k)\times N}.\tag{7.175}$$

*The dual code C*<sup>⊥</sup> *of C is* [*N*, *N* − *k*] *linear code, and*

$$C^\perp = \{aH|a \in \mathbb{F}\_q^{N-k}\},$$

*h*(*x*) *is called the check polynomial of cyclic code C.*

*Proof* By Lemma 7.56, $C$ is a $k$-dimensional linear subspace whose generating matrix $G$ is given by (7.174). Because $g(x)h(x) = x^N - 1$, we have $g(x)h(x) = 0$ in the ring $R$. Equivalently,

$$g\_0 h\_i + g\_1 h\_{i-1} + \dots + g\_{N-k} h\_{i-N+k} = 0, \forall \ 0 \le i \le N-1.$$

In matrix form, the above formula reads $G H' = 0$, so $H$ is the generating matrix of the dual code of $C$, and the Lemma holds.

*Remark 7.5* The polynomial $\overline{h}(x)$ corresponding to the reverse codeword $\overline{h}$ is

$$\overline{h}(x) = h_0 x^{N-1} + h_1 x^{N-2} + \cdots + h_k x^{N-k-1}.$$

In general, when $h(x) \mid x^N - 1$, $\overline{h}(x) \nmid x^N - 1$; therefore, the dual code of a cyclic code is not necessarily a cyclic code.

**Definition 7.15** Let $x^N - 1 = g_1(x) g_2(x) \cdots g_t(x)$ be the irreducible decomposition of $x^N - 1$ over $\mathbb{F}_q$, where each $g_i(x)$ $(1 \le i \le t)$ is an irreducible polynomial with leading coefficient 1 in $\mathbb{F}_q[x]$. Then the cyclic code generated by $g_i(x)$ is called the $i$-th maximal cyclic code in $\mathbb{F}_q^N$, denoted $M_i^+$. The cyclic code generated by $\frac{x^N - 1}{g_i(x)}$ is called the $i$-th minimal cyclic code, denoted $M_i^-$.

Minimal cyclic codes are also called irreducible cyclic codes because $M_i^-$ no longer contains any nontrivial cyclic code of $\mathbb{F}_q^N$. The irreducibility of minimal cyclic codes can be derived from the fact that the ideal $M_i^-(x)$ in $R$ corresponding to $M_i^-$ is a field. We give a purely algebraic proof.

**Corollary 7.12** *Let $M_i^-$ be the $i$-th minimal cyclic code of $\mathbb{F}_q^N$ $(1 \le i \le t)$ and $M_i^-(x)$ be the ideal corresponding to $M_i^-$ in $R$; then $M_i^-(x)$ is a field, and thus $M_i^-$ no longer contains any nontrivial cyclic code of $\mathbb{F}_q^N$.*

*Proof* Let $g(x) = (x^N - 1)/g_i(x)$, where $g_i(x)$ is an irreducible polynomial in $\mathbb{F}_q[x]$; by (7.173),

$$M\_i^-(\mathbf{x}) = \mathbf{g}(\mathbf{x}) \mathbb{F}\_q[\mathbf{x}]/(\mathbf{x}^N - 1)\mathbb{F}\_q[\mathbf{x}] \cong \mathbb{F}\_q[\mathbf{x}]/g\_i(\mathbf{x})\mathbb{F}\_q[\mathbf{x}],$$

where *<sup>g</sup>*(*x*)F*<sup>q</sup>* [*x*] is the principal ideal generated by *<sup>g</sup>*(*x*) in <sup>F</sup>*<sup>q</sup>* [*x*]. Since *gi*(*x*) is an irreducible polynomial, so *M*<sup>−</sup> *<sup>i</sup>* (*x*) is a field.

*Example 7.1* Determine all cyclic codes of length 7 over the binary finite field $\mathbb{F}_2$.

**Solution:** The polynomial $x^7 - 1$ has the following irreducible decomposition over $\mathbb{F}_2$:

$$x^7 - 1 = (x - 1)(x^3 + x + 1)(x^3 + x^2 + 1).$$

Therefore, $x^7 - 1$ has 7 positive factors over $\mathbb{F}_2$, and by Corollary 7.11 there are 8 cyclic codes of length 7 over $\mathbb{F}_2$, among which $0$ and $\mathbb{F}_2^7$ are the two trivial cyclic codes. There are three maximal cyclic codes, generated by $g(x) = x - 1$, $g(x) = x^3 + x + 1$ and $g(x) = x^3 + x^2 + 1$, respectively; the dimensions of the corresponding cyclic codes are 6, 4 and 4. Similarly, there are three minimal cyclic codes, one of dimension 1 and two of dimension 3.
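The factorization used in this example can be verified with a few lines of polynomial arithmetic mod 2 (coefficient lists, low degree first):

```python
# Verify the irreducible decomposition of x^7 - 1 over F_2 used in Example 7.1
# by multiplying the three factors with coefficient arithmetic mod 2.
def polymul_mod2(p, q):
    """Multiply polynomials (coefficient lists, low degree first) over F_2."""
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] = (r[i + j] + a * b) % 2
    return r

f1 = [1, 1]              # x + 1  (= x - 1 over F_2)
f2 = [1, 1, 0, 1]        # x^3 + x + 1
f3 = [1, 0, 1, 1]        # x^3 + x^2 + 1
prod = polymul_mod2(polymul_mod2(f1, f2), f3)
print(prod)              # [1, 0, 0, 0, 0, 0, 0, 1], i.e. x^7 + 1 = x^7 - 1 over F_2
```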

Another characterization of cyclic codes is by the zeros of their polynomials. If $x^N - 1 = g_1(x) \cdots g_t(x)$, the ideal $M_i^+(x)$ in $R$ corresponding to the maximal cyclic code $M_i^+$ $(1 \le i \le t)$ generated by $g_i(x)$ is

$$M\_i^+(\mathbf{x}) = \{ \mathbf{g}\_i(\mathbf{x}) f(\mathbf{x}) | 0 \le \deg f(\mathbf{x}) \le N - \deg \mathbf{g}\_i(\mathbf{x}) - 1 \}.$$

Let $\beta$ be a root of $g_i(x)$ in its splitting field. Then $g_i(x)$ is the minimal polynomial of $\beta$ in $\mathbb{F}_q[x]$, and every $c(x) \in M_i^+(x) \Rightarrow c(\beta) = 0$. Therefore,

$$M_i^+(x) = \{ c(x) \mid c(x) \in R \text{ and } c(\beta) = 0 \}.$$

*Example 7.2* Suppose $N = (q^m - 1)/(q - 1)$, $(m, q - 1) = 1$, and $\beta$ is a primitive $N$-th root of unity in $\mathbb{F}_{q^m}$; then the cyclic code

$$C = \{c(\mathbf{x}) | c(\boldsymbol{\beta}) = 0, c(\mathbf{x}) \in R\}$$

is equivalent to Hamming code [*N*, *N* − *m*].

*Proof* Because (*m*, *q* − 1) = 1, and

$$N = q^{m-1} + q^{m-2} + \dots + q + 1 = (q-1)(q^{m-2} + 2q^{m-3} + \dots + (m-1)) + m.$$

So $(N, q - 1) = 1$. Therefore, $\beta^{i(q-1)} \neq 1$ for $1 \le i \le N - 1$; in other words, $\beta^i \notin \mathbb{F}_q$ holds for $\forall\, 1 \le i \le N - 1$. In $\mathbb{F}_{q^m}$, any two elements of $\{1, \beta, \beta^2, \dots, \beta^{N-1}\}$ are linearly independent over $\mathbb{F}_q$. If each element is regarded as an $m$-dimensional column vector over $\mathbb{F}_q$, then the $m \times N$-order matrix

$$H = [1, \beta, \beta^2, \dots, \beta^{N-1}]\_{m \times N}$$

constitutes the check matrix of the cyclic code $C$, and any two columns of $H$ are linearly independent over $\mathbb{F}_q$; by the definition, $C$ is the $[N, N - m]$ Hamming code.

**Lemma 7.58** *Let $C \subset \mathbb{F}_q^N$ be a cyclic code, let $C(x) \subset \mathbb{F}_q[x]/\langle x^N - 1 \rangle$ be the corresponding ideal, and let $(N, q) = 1$; then $C(x)$ contains a multiplicative unit element $c(x) \in C(x)$, i.e.,*

$$c(x)d(x) \equiv d(x) \ (\mathrm{mod}\ x^N - 1), \quad \forall\, d(x) \in C(x).$$

*The unit element c*(*x*) *in C*(*x*) *is unique.*

*Proof* Because $(N, q) = 1 \Rightarrow x^N - 1$ has no double root, let $g(x)$ be the generating polynomial of $C$ and $h(x)$ be the check polynomial of $C$, that is, $g(x)h(x) = x^N - 1$. Therefore $(g(x), h(x)) = 1$, and there are $a(x), b(x) \in \mathbb{F}_q[x] \Rightarrow$

$$a(\mathbf{x})g(\mathbf{x}) + b(\mathbf{x})h(\mathbf{x}) = 1.$$

Let *c*(*x*) = *a*(*x*)*g*(*x*) = 1 − *b*(*x*)*h*(*x*) ∈ *C*(*x*), so for ∀ *d*(*x*) ∈ *C*(*x*), write *d*(*x*) = *g*(*x*) *f* (*x*), thus

$$\begin{aligned} c(\mathbf{x})d(\mathbf{x}) &= a(\mathbf{x})g(\mathbf{x})g(\mathbf{x})f(\mathbf{x}) \\ &= (1 - b(\mathbf{x})h(\mathbf{x}))g(\mathbf{x})f(\mathbf{x}) \\ &= g(\mathbf{x})f(\mathbf{x}) - b(\mathbf{x})h(\mathbf{x})g(\mathbf{x})f(\mathbf{x}). \end{aligned}$$

Therefore

$$c(\mathbf{x})d(\mathbf{x}) \equiv d(\mathbf{x})(\text{mod } \mathbf{x}^N - 1).$$

There is $c(x)d(x) = d(x)$ in $R = \mathbb{F}_q[x]/\langle x^N - 1 \rangle$; that is, $c(x)$ is the multiplicative unit element of $C(x)$. Obviously, $c(x)$ is unique. The Lemma holds.
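The construction in the proof can be carried out concretely over $\mathbb{F}_2$ with $N = 7$ and $g(x) = x^3 + x + 1$: the sketch below computes $a(x)g(x) + b(x)h(x) = 1$ by the extended Euclidean algorithm and checks that $c(x) = a(x)g(x)$ acts as a unit on an element of $C(x)$ (the helper functions are illustrative, not from the text):

```python
# Sketch of Lemma 7.58 over F_2, N = 7: find the idempotent c(x) = a(x)g(x)
# from a(x)g(x) + b(x)h(x) = 1 and verify c(x)d(x) = d(x) in R.
def trim(p):
    while len(p) > 1 and p[-1] == 0:
        p = p[:-1]
    return p

def mul(p, q):
    """Polynomial product over F_2 (coefficient lists, low degree first)."""
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] ^= a & b
    return trim(r)

def add(p, q):
    n = max(len(p), len(q))
    p, q = p + [0] * (n - len(p)), q + [0] * (n - len(q))
    return trim([a ^ b for a, b in zip(p, q)])

def divmod2(p, q):
    """Quotient and remainder of p by q over F_2."""
    p, quo = p[:], [0] * max(1, len(p) - len(q) + 1)
    while len(trim(p)) >= len(q) and trim(p) != [0]:
        p = trim(p)
        d = len(p) - len(q)
        quo[d] = 1
        p = add(p, mul([0] * d + [1], q))
    return trim(quo), trim(p)

def egcd(p, q):
    """Extended Euclid: returns (gcd, s, t) with s*p + t*q = gcd over F_2."""
    if trim(q) == [0]:
        return p, [1], [0]
    quo, rem = divmod2(p, q)
    d, s, t = egcd(q, rem)
    return d, t, add(s, mul(quo, t))

g = [1, 1, 0, 1]                     # g(x) = 1 + x + x^3
h = [1, 1, 1, 0, 1]                  # h(x) = (x^7 - 1)/g(x) over F_2
modpoly = [1, 0, 0, 0, 0, 0, 0, 1]   # x^7 + 1
gcd, a, b = egcd(g, h)
assert gcd == [1]                    # (g(x), h(x)) = 1
c = divmod2(mul(a, g), modpoly)[1]          # c(x) = a(x)g(x) mod x^7 - 1
d = divmod2(mul(g, [1, 0, 1]), modpoly)[1]  # some d(x) = g(x)f(x) in C(x)
print(divmod2(mul(c, d), modpoly)[1] == d)  # True: c(x)d(x) = d(x) in R
```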

**Definition 7.16** Let $C \subset \mathbb{F}_q^N$ be a cyclic code; the multiplicative unit element $c(x)$ of $C(x)$ is called the idempotent element of $C$. If $C = M_i^-$ is the $i$-th minimal cyclic code, the idempotent element of $C$ is called a primitive idempotent element, denoted $\theta_i(x)$.

**Lemma 7.59** *Let $C_1 \subset \mathbb{F}_q^N$, $C_2 \subset \mathbb{F}_q^N$ be two cyclic codes, $(N, q) = 1$, and let their idempotent elements be $c_1(x)$ and $c_2(x)$, respectively; then $C_1 \cap C_2$ and $C_1 + C_2$ are cyclic codes, whose idempotent elements are $c_1(x)c_2(x)$ and $c_1(x) + c_2(x) - c_1(x)c_2(x)$, respectively.*
*Proof* It is obvious that $C_1 \cap C_2$ and $C_1 + C_2$ are cyclic codes in $\mathbb{F}_q^N$: they correspond to the ideals $C_1(x)$ and $C_2(x)$ in $R$, and

$$C_1(x) \cap C_2(x) \text{ and } C_1(x) + C_2(x)$$

are still ideals in $R$. Therefore, the corresponding codes $C_1 \cap C_2$ and $C_1 + C_2$ are still cyclic codes, and the conclusion on the idempotents is not difficult to verify. The Lemma holds.

In 1959, A. Hocquenghem, and in 1960, R. Bose and D. Chaudhuri, independently proposed a special class of cyclic codes with a guaranteed minimum distance. At present, they are generally called BCH codes in academic circles.

**Definition 7.17** A cyclic code $C \subset \mathbb{F}_q^N$ of length $N$ is called a $\delta$-BCH code if its generating polynomial is the least common multiple of the minimal polynomials of $\beta, \beta^2, \dots, \beta^{\delta-1}$, where $\delta$ is a positive integer and $\beta$ is a primitive $N$-th root of unity. A $\delta$-BCH code is also called a BCH code with designed distance $\delta$. If $\beta \in \mathbb{F}_{q^m}$ and $N = q^m - 1$, such BCH codes are called primitive.

**Lemma 7.60** *Let $d$ be the minimum distance of a $\delta$-BCH code; then $d \ge \delta$.*

*Proof* Suppose *x <sup>N</sup>* − 1 = (*x* − 1)*g*1(*x*)*g*2(*x*)··· *gt*(*x*), β is a primitive *N*-th unit root on <sup>F</sup>*<sup>q</sup>* , then <sup>β</sup> is the root of a *gi*(*x*). Let deg *gi*(*x*) <sup>=</sup> *<sup>m</sup>* <sup>⇒</sup> <sup>β</sup> <sup>∈</sup> <sup>F</sup>*qm* . Because of [F*qm* : <sup>F</sup>*<sup>q</sup>* ] = *<sup>m</sup>*, we can think of β, β<sup>2</sup>,...,β<sup>δ</sup>−<sup>1</sup> as an *<sup>m</sup>*-dimensional column vector. Let *H* be the following *m*(δ − 1) × *N*-order matrix.

$$H = \begin{bmatrix} 1 & \beta & \beta^2 & \cdots & \beta^{N-1} \\ 1 & \beta^2 & \beta^4 & \cdots & \beta^{2(N-1)} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & \beta^{\delta-1} & \beta^{2(\delta-1)} & \cdots & \beta^{(N-1)(\delta-1)} \end{bmatrix}\_{m(\delta-1)\times N}$$

In fact, *H* is the check matrix of δ-BCH code *C*, that is

$$c \in C \Longleftrightarrow cH' = 0.$$

We prove that any $(\delta - 1)$ column vectors of $H$ are linearly independent. Let the first components of these $(\delta - 1)$ column vectors be $\beta^{i_1}, \beta^{i_2}, \dots, \beta^{i_{\delta-1}}$, where $i_j \ge 0$; the corresponding determinant is a Vandermonde determinant, and

$$\Delta = \beta^{i_1 + i_2 + \cdots + i_{\delta-1}} \prod_{r > s} (\beta^{i_r} - \beta^{i_s}) \neq 0.$$

Therefore, any (δ − 1) column vectors of *H* are linearly independent. Thus, the minimum distance of *C* is *d* ≥ δ.

Now, we can introduce the design principle of the McEliece/Niederreiter cryptosystem. Its basic mathematical idea is based on the decoding principle of error-correcting codes. Recall the concept of an error-correcting code from Chap. 2: a code $C \subset \mathbb{F}_q^N$ is called a $t$-error-correcting code ($t \ge 1$ a positive integer) if for $\forall\, y \in \mathbb{F}_q^N$ there is at most one codeword $c \in C \Rightarrow d(c, y) \le t$, where $d(c, y)$ is the Hamming distance between $c$ and $y$. We know that if the minimum distance of a code $C$ is $d$, then $C$ is a $t$-error-correcting code with $t = \lfloor \frac{d-1}{2} \rfloor$, the largest integer not greater than $\frac{d-1}{2}$. Lemma 7.60 proves the existence of $t$-error-correcting codes for any positive integer $t$, namely the $(2t + 1)$-BCH codes ($\delta = 2t + 1$); this kind of code is called a Goppa code (see the next section), which provides a theoretical basis for the McEliece/Niederreiter cryptosystem. Next, we will introduce the working mechanism of this kind of cryptosystem in detail. First, let us look at the generation of the key.

Private key: Select a $t$-error-correcting code $C \subset \mathbb{F}_q^N$, $C = [N, k]$, and let $H$ be the check matrix of $C$; $H$ is an $(N - k) \times N$-dimensional matrix. For $\forall\, x \in \mathbb{F}_q^N$, the map $x \to xH' \in \mathbb{F}_q^{N-k}$ is a correspondence from the space $\mathbb{F}_q^N$ to $\mathbb{F}_q^{N-k}$. Let us prove that this correspondence is injective on the special codewords whose weight is not greater than $t$.

**Lemma 7.61** $\forall\, x, y \in \mathbb{F}_q^N$*, if* $xH' = yH'$ *and* $w(x) \le t$, $w(y) \le t$*, then* $x = y$*.*

*Proof* By hypothesis,

$$xH' = yH' \Rightarrow (x - y)H' = 0 \Rightarrow x - y \in C.$$

Obviously, the Hamming distance *d*(0, *x*) = w(*x*) ≤ *t* between *x* and 0, and the Hamming distance *d*(*x*, *x* − *y*) between *x* and *x* − *y* is

$$d(\mathbf{x}, \mathbf{x} - \mathbf{y}) = w(\mathbf{x} - (\mathbf{x} - \mathbf{y})) = w(\mathbf{y}) \le t.$$

Because $C$ is a $t$-error-correcting code, there is at most one codeword within distance $t$ of $x$, so $x - y = 0$. The Lemma holds.

We use the $t$-error-correcting code $C$ and its check matrix $H$ as the private key.
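Lemma 7.61 can be illustrated with the binary $[7, 4]$ Hamming code, which is a $t$-error-correcting code with $t = 1$; the sketch below checks that the syndrome map $x \to xH'$ is injective on all words of weight at most 1:

```python
H = [[1, 0, 1, 0, 1, 0, 1],    # check matrix of the binary [7, 4] Hamming
     [0, 1, 1, 0, 0, 1, 1],    # code: column j is the binary expansion of j + 1
     [0, 0, 0, 1, 1, 1, 1]]

def syndrome(x):
    """x -> xH' over F_2 (x as a length-7 row vector)."""
    return tuple(sum(xi * hi for xi, hi in zip(x, row)) % 2 for row in H)

# all words of weight <= t = 1: the zero word plus the 7 unit vectors
words = [[0] * 7] + [[1 if j == i else 0 for j in range(7)] for i in range(7)]
syndromes = [syndrome(x) for x in words]
print(len(set(syndromes)) == len(words))   # True: the syndrome map is injective
```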

Public key: In order to generate the public key, we randomly select a permutation matrix $P_{N \times N}$: let $I_N = [e_1, e_2, \dots, e_N]$ be the $N$-order identity matrix and $\sigma \in S_N$ an $N$-ary permutation; then

$$P = \sigma(I_N) = [e_{\sigma(1)}, e_{\sigma(2)}, \dots, e_{\sigma(N)}].$$

This kind of matrix is also called a Weyl matrix. A nonsingular diagonal matrix $\mathrm{diag}\{\lambda_1, \lambda_2, \dots, \lambda_N\}$ ($\lambda_i \in \mathbb{F}_q$, $\lambda_i \neq 0$) can also be randomly selected, and suppose

$$P = \sigma(\text{diag}\{\lambda\_1, \lambda\_2, \dots, \lambda\_N\}) = \text{diag}\{\lambda\_{\sigma\_1}, \lambda\_{\sigma\_2}, \dots, \lambda\_{\sigma\_N}\}.$$

Let $M$ be an $(N - k) \times (N - k)$-order invertible matrix. The public key is the $N \times (N - k)$-order matrix $K$ generated as follows:

$K = PH'M$; this is an $N \times (N - k)$-order matrix.

We take *K* as the public key and *H*, *P* and *M* as the private key.

Encryption: Let $m \in \mathbb{F}_q^N$ be a codeword with $w(m) \le t$; $m$ is encrypted as plaintext as follows.

$$c = mK \in \mathbb{F}_q^{N-k}, \quad c \text{ is the ciphertext.}$$

In fact, a plaintext of length $N$ and weight no greater than $t$ over $\mathbb{F}_q$ is encrypted into a ciphertext of length $N - k$ over $\mathbb{F}_q$ through the public key $K$.

Decryption: After receiving the ciphertext $c$, decrypt it through the private keys $H$, $P$ and $M$.

$$c \cdot M^{-1} = m \, K M^{-1} = m \, PH' M M^{-1} = m \, PH'.$$

Since $mP \in \mathbb{F}_q^N$ and $m$ have the same weight, that is

$$w(m) = w(mP) \le t.$$

Using the decoding principle of error-correcting codes: all $x \in \mathbb{F}_q^N$ satisfying $xH' = mPH'$ actually constitute an additive coset of the code $C$; as the leader vector of this additive coset, $mP$ can be obtained exactly. That is

$$mPH' \stackrel{\text{decode}}{\longrightarrow} mP.$$

Finally, we have $m = (mP) \cdot P^{-1}$ and obtain the plaintext.
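The whole encryption/decryption cycle above can be run end to end on a toy instance; the sketch below uses the binary $[7, 4]$ Hamming code ($t = 1$) with illustrative choices of $\sigma$ and $M$ (not from the text):

```python
# Toy run of the encryption/decryption described above with the [7, 4]
# Hamming code (t = 1). P (from sigma) and M are illustrative choices.
H = [[1, 0, 1, 0, 1, 0, 1],
     [0, 1, 1, 0, 0, 1, 1],
     [0, 0, 0, 1, 1, 1, 1]]               # 3 x 7 check matrix, t = 1

def matmul(A, B):
    """Matrix product over F_2."""
    return [[sum(a * b for a, b in zip(row, col)) % 2
             for col in zip(*B)] for row in A]

sigma = [2, 0, 3, 6, 4, 1, 5]             # a permutation in S_7 (illustrative)
P = [[1 if sigma[j] == i else 0 for j in range(7)] for i in range(7)]
M    = [[1, 1, 0], [0, 1, 1], [0, 0, 1]]  # invertible 3 x 3 matrix over F_2
Minv = [[1, 1, 1], [0, 1, 1], [0, 0, 1]]
assert matmul(M, Minv) == [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

Ht = [list(col) for col in zip(*H)]       # H' (transpose), 7 x 3
K = matmul(matmul(P, Ht), M)              # public key K = P H' M, 7 x 3

m = [0, 0, 0, 1, 0, 0, 0]                 # plaintext of weight <= t
c = matmul([m], K)[0]                     # ciphertext of length N - k = 3

s = matmul([c], Minv)[0]                  # c M^{-1} = (mP) H', a syndrome
mP = [0] * 7
if s != [0, 0, 0]:                        # nonzero syndrome = a column of H
    mP[Ht.index(s)] = 1
Pinv = [list(col) for col in zip(*P)]     # P^{-1} = P' for a permutation matrix
print(matmul([mP], Pinv)[0] == m)         # True: plaintext recovered
```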

# **7.9 Ajtai/Dwork Cryptosystem**

By choosing an appropriate $n \times m$-order matrix $A \in \mathbb{Z}_q^{n \times m}$, two $m$-dimensional $q$-element lattices $\Lambda_q(A)$ and $\Lambda_q^\perp(A)$ are defined (see (7.45) and (7.46)),

$$\Lambda_q(A) = \{ y \in \mathbb{Z}^m \mid \exists\ x \in \mathbb{Z}^n \Rightarrow y \equiv A'x \ (\mathrm{mod}\ q) \}$$

and

$$\Lambda\_q^\perp(A) = \{ \mathbf{y} \in \mathbb{Z}^m | A\mathbf{y} \equiv 0 (\text{mod } q) \}.$$

Using the matrix $A$, a collision-resistant Hash function can be defined:

$$f\_A: \{0, 1, \ldots, d-1\}^m \to \mathbb{Z}\_q^n,\tag{7.176}$$

where for any $y \in \{0, 1, \dots, d-1\}^m$, $f_A(y)$ is defined as

$$f\_A(\mathbf{y}) = A\mathbf{y} \mod q,\tag{7.177}$$

If the parameters $d, q, n, m$ satisfy

$$n\log q < m\log d \Rightarrow \frac{n\log q}{\log d} < m. \tag{7.178}$$

then the Hash function $f_A$ will produce collisions; that is, there are $y, y' \in \{0, 1, \dots, d-1\}^m$ with $y \neq y'$ and $f_A(y) = f_A(y')$. By (7.177), we directly have

$$A(y - y') \equiv 0 \ (\mathrm{mod}\ q) \Rightarrow y - y' \in \Lambda_q^\perp(A),$$

this shows that the collision points $y$ and $y'$ of the Hash function $f_A$ directly lead to a short vector $y - y'$ in the $q$-element lattice $\Lambda_q^\perp(A)$.
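This pigeonhole argument can be demonstrated concretely: with $d = 2$, $n = 2$, $q = 4$, $m = 6$ we have $m \log d > n \log q$, so $f_A$ must collide, and any collision difference is a nonzero vector of $\Lambda_q^\perp(A)$ with entries in $\{-1, 0, 1\}$ (the matrix $A$ below is an arbitrary illustrative choice):

```python
# Collision of f_A(y) = Ay mod q gives a short vector in Lambda_q^perp(A).
from itertools import product

n, m, q, d = 2, 6, 4, 2
A = [[1, 2, 3, 0, 1, 2],
     [3, 1, 0, 2, 2, 1]]               # an n x m matrix over Z_q (illustrative)

def f_A(y):
    """f_A(y) = A y mod q, as in (7.177)."""
    return tuple(sum(a * yi for a, yi in zip(row, y)) % q for row in A)

def find_short_vector():
    """First collision of f_A on {0, 1}^m yields y - y' in Lambda_q^perp(A)."""
    seen = {}
    for y in product(range(d), repeat=m):   # 2^6 = 64 inputs, q^n = 16 outputs
        h = f_A(y)
        if h in seen:
            return [a - b for a, b in zip(y, seen[h])]
        seen[h] = y

v = find_short_vector()
print(any(v))                          # nonzero difference of a collision pair
print(all(sum(a * vi for a, vi in zip(row, v)) % q == 0 for row in A))  # Av = 0 mod q
```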

In order to obtain a collision-resistant Hash function, the selection of the $n \times m$-order matrix $A$ is very important. First, we can select the parameter system: let $d = 2$, $q = n^2$, $n \mid m$, and $m \log 2 > n \log q$, where $n$ is a positive integer. In the Ajtai/Dwork cryptographic algorithm, there are two choices of the parameter matrix $A$: one is a cyclic matrix and the other is a more general ideal matrix. The corresponding $q$-element lattices $\Lambda_q^\perp(A)$ are cyclic lattices and ideal lattices, respectively.

Cyclic lattice

Because $n \mid m$, $A$ can be divided into $\frac{m}{n}$ cyclic matrices of order $n \times n$, that is

$$A = [A^{(1)}, A^{(2)}, \dots, A^{(\frac{m}{n})}],\tag{7.179}$$

where $\alpha^{(i)} \in \mathbb{Z}_q^n$ is an $n$-dimensional column vector and $A^{(i)}$ is the cyclic matrix generated by $\alpha^{(i)}$ (see (7.149)), that is

$$A^{(i)} = T^\*(\alpha^{(i)}) = [\alpha^{(i)}, T\alpha^{(i)}, \dots, T^{n-1}\alpha^{(i)}], 1 \le i \le \frac{m}{n}.$$

*A* is called an *n* × *m*-dimensional generalized cyclic matrix, and the *q*-element lattice in R*<sup>m</sup>* defined by *A*,

$$\Lambda\_q^\perp(A) = \{ \mathbf{y} \in \mathbb{Z}^m | A \mathbf{y} \equiv \mathbf{0}(\text{mod } q) \}$$

is called a cyclic lattice. The Ajtai/Dwork cryptosystem based on cyclic lattice can be stated as follows:

Algorithm 1: Hash function based on a cyclic lattice. Parameters: $q, n, m, d$ are positive integers, $n \mid m$, $m \log d > n \log q$. Secret key: $\frac{m}{n}$ column vectors $\alpha^{(i)} \in \mathbb{Z}_q^n$, $1 \le i \le \frac{m}{n}$. The Hash function $f_A: \{0, 1, \dots, d-1\}^m \to \mathbb{Z}_q^n$ is defined as

$$f\_A(\mathbf{y}) \equiv A \mathbf{y} (\mathbf{mod} \, q),$$

where the generalized cyclic matrix $A \in \mathbb{Z}_q^{n \times m}$ is given by (7.179).
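Building the generalized cyclic matrix $A$ of (7.179) from the secret vectors $\alpha^{(i)}$ and evaluating $f_A$ can be sketched as follows (the values of $n$, $q$ and the $\alpha^{(i)}$ are illustrative only):

```python
# Sketch of Algorithm 1: assemble A = [A^(1), ..., A^(m/n)] from cyclic
# blocks T*(alpha^(i)) and evaluate f_A(y) = Ay mod q. Values illustrative.
n, q = 3, 9                              # q = n^2
alphas = [[1, 2, 3], [4, 0, 7], [2, 2, 5], [8, 1, 0]]   # m/n = 4 secret vectors

def cyclic_block(alpha):
    """n x n cyclic matrix T*(alpha): column k is T^k alpha (cyclic down-shift)."""
    nn = len(alpha)
    cols = [alpha[nn - k:] + alpha[:nn - k] for k in range(nn)]
    return [list(row) for row in zip(*cols)]

# Concatenate the blocks row-wise to get the n x m generalized cyclic matrix A.
A = [sum(rows, []) for rows in zip(*(cyclic_block(a) for a in alphas))]
m = len(A[0])                            # m = 12

def f_A(y):
    """f_A(y) = A y mod q, as in the algorithm above."""
    return [sum(a * yi for a, yi in zip(row, y)) % q for row in A]

print(m, f_A([1] * m))                   # 12 [8, 8, 8]
```

Each row of a cyclic block sums to $\sum_i \alpha_i$, which is why $f_A$ of the all-ones input has equal components here.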

We can extend the above concepts of cyclic matrix and cyclic lattice to more general cases and obtain the concepts of ideal matrix and ideal lattice. Let $h(x)$ be a monic integer polynomial of degree $n$, $h(x) = x^n + a_{n-1}x^{n-1} + \cdots + a_1 x + a_0 \in \mathbb{Z}[x]$, and define the rotation matrix $T_h$ as

$$T_h = \begin{pmatrix} 0 & \cdots & 0 & -a_0 \\ & & & -a_1 \\ & I_{n-1} & & \vdots \\ & & & -a_{n-1} \end{pmatrix},\tag{7.180}$$

if *h*(*x*) = *x <sup>n</sup>* − 1 is a special polynomial, then *Th* = *T* . *T* is highlighted in Sect. 7.7 of this chapter. Here, we discuss the more general *Th*. Obviously, when the constant term *a*<sup>0</sup> = 0, *Th* is a reversible *n*-order square matrix, and *Th* = det(*Th*) = (−1)*na*0.

**Lemma 7.62** *The characteristic polynomial of rotation matrix Th is f* (λ) = *h*(λ)*.*

*Proof* By the definition, the characteristic polynomial *f* (λ) of *Th* is

$$f(\lambda) = \det(\lambda I_n - T_h) = \begin{vmatrix} \lambda & 0 & \cdots & 0 & a_0 \\ -1 & \lambda & \cdots & 0 & a_1 \\ 0 & -1 & \ddots & \vdots & \vdots \\ \vdots & & \ddots & \lambda & a_{n-2} \\ 0 & 0 & \cdots & -1 & \lambda + a_{n-1} \end{vmatrix}.$$

Adding $\lambda$ times the second row, $\lambda^2$ times the third row, …, $\lambda^{n-1}$ times the last row to the first row turns the first row into $(0, 0, \dots, 0, \lambda^n + a_{n-1}\lambda^{n-1} + \cdots + a_1\lambda + a_0)$. Expanding along this row, the complementary minor is triangular with diagonal entries $-1$, so

$$f(\lambda) = (-1)^{1+n}\,h(\lambda)\,(-1)^{n-1} = \lambda^n + a_{n-1}\lambda^{n-1} + \cdots + a_1\lambda + a_0 = h(\lambda).$$

**Lemma 7.63** *Let $h(x) = x^n + a_{n-1}x^{n-1} + \cdots + a_1x + a_0 \in \mathbb{Z}[x]$. If $a_0 \ne 0$, then the rotation matrix $T_h$ is an invertible $n$-order square matrix, and*

$$T_h^{-1} = \begin{pmatrix} -a_0^{-1}\alpha & I_{n-1} \\ -a_0^{-1} & 0 \end{pmatrix}, \quad \alpha = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_{n-1} \end{pmatrix} \in \mathbb{Z}^{n-1}.$$

*Proof* By the definition of *Th*,

$$T_h \cdot \begin{pmatrix} -a_0^{-1}\alpha & I_{n-1} \\ -a_0^{-1} & 0 \end{pmatrix} = \begin{pmatrix} 0 & -a_0 \\ I_{n-1} & -\alpha \end{pmatrix}\begin{pmatrix} -a_0^{-1}\alpha & I_{n-1} \\ -a_0^{-1} & 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & I_{n-1} \end{pmatrix} = I_n.$$

So

$$T\_h^{-1} = \begin{bmatrix} -a\_0^{-1}\alpha \ I\_{n-1} \\ -a\_0^{-1} & 0 \end{bmatrix}.$$
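The block formula of Lemma 7.63 can be checked numerically. The following sketch (our own, with exact rational arithmetic since $T_h^{-1}$ need not be integral) verifies $T_h \cdot T_h^{-1} = I_n$ for a sample $h$:

```python
# Verify Lemma 7.63: T_h^{-1} = [[-a_0^{-1} alpha, I_{n-1}], [-a_0^{-1}, 0]].
# Fractions keep the arithmetic exact even when a_0 does not divide the a_i.
from fractions import Fraction

def rotation_matrix(a):
    n = len(a)
    T = [[Fraction(0)] * n for _ in range(n)]
    for i in range(1, n):
        T[i][i - 1] = Fraction(1)
    for i in range(n):
        T[i][n - 1] = Fraction(-a[i])
    return T

def rotation_inverse(a):
    n = len(a)
    inv_a0 = Fraction(1, a[0])
    X = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n - 1):
        X[i][0] = -inv_a0 * a[i + 1]   # the column -a_0^{-1} * alpha
        X[i][i + 1] = Fraction(1)      # the I_{n-1} block
    X[n - 1][0] = -inv_a0              # bottom-left entry -a_0^{-1}
    return X

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]
```

For instance, with $h(x) = x^3 + 5x^2 + 3x + 2$ (so $a = [2, 3, 5]$), `matmul(rotation_matrix(a), rotation_inverse(a))` is the 3 × 3 identity matrix.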

For a given monic polynomial $h(x) = x^n + a_{n-1}x^{n-1} + \cdots + a_1x + a_0 \in \mathbb{Z}[x]$ of degree $n$, let $R$ be the residue class ring of $\mathbb{Z}[x]$ modulo $h(x)$, i.e.,

$$R = \mathbb{Z}[\mathbf{x}]/\langle h(\mathbf{x})\rangle,\tag{7.181}$$

where $\langle h(x)\rangle$ is the ideal generated by $h(x)$ in $\mathbb{Z}[x]$. Because $\deg h(x) = n$, every polynomial $g(x) \in R$ has a unique expression $g(x) = g_{n-1}x^{n-1} + g_{n-2}x^{n-2} + \cdots + g_1x + g_0$. Define the mapping $\sigma : R \longrightarrow \mathbb{Z}^n$ as

$$\sigma(g(x)) = \begin{bmatrix} g_0 \\ g_1 \\ \vdots \\ g_{n-1} \end{bmatrix} \in \mathbb{Z}^n. \tag{7.182}$$

Obviously, $\sigma$ is an Abelian group isomorphism of $R \longrightarrow \mathbb{Z}^n$. Therefore, any polynomial $g(x)$ in $R$ can be regarded as an $n$-dimensional integer column vector.

**Definition 7.18** For any $n$-dimensional column vector $g = \sigma(g(x)) = (g_0, g_1, \dots, g_{n-1})^T \in \mathbb{Z}^n$, define

$$T\_h^\*(\mathbf{g}) = [\mathbf{g}, T\_h(\mathbf{g}), T\_h^2(\mathbf{g}), \dots, T\_h^{n-1}(\mathbf{g})]\_{n \times n},\tag{7.183}$$

the $n$-order square matrix $T_h^*(g)$ is called the ideal matrix generated by the vector $g$.
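Definition 7.18 translates directly into code. The following sketch (our own helper names) builds $T_h^*(g)$ column by column, applying $T_h$ repeatedly to $g$:

```python
# Build the ideal matrix T_h^*(g) = [g, T_h g, T_h^2 g, ..., T_h^{n-1} g].

def rotation_matrix(a):
    """Companion matrix of h(x) = x^n + a_{n-1}x^{n-1} + ... + a_0."""
    n = len(a)
    T = [[0] * n for _ in range(n)]
    for i in range(1, n):
        T[i][i - 1] = 1
    for i in range(n):
        T[i][n - 1] = -a[i]
    return T

def apply(T, v):
    """Matrix-vector product T v."""
    n = len(v)
    return [sum(T[i][j] * v[j] for j in range(n)) for i in range(n)]

def ideal_matrix(a, g):
    T = rotation_matrix(a)
    cols, v = [], list(g)
    for _ in range(len(g)):
        cols.append(v)
        v = apply(T, v)
    # place the columns g, T_h g, ... side by side
    return [[cols[j][i] for j in range(len(g))] for i in range(len(g))]
```

With $h(x) = x^3 - 1$ the ideal matrix degenerates to the cyclic matrix of Sect. 7.7: `ideal_matrix([-1, 0, 0], [1, 2, 3])` is `[[1, 3, 2], [2, 1, 3], [3, 2, 1]]`.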

The ideal matrix is a natural generalization of the cyclic matrix: the former corresponds to a monic polynomial $h(x)$ of degree $n$, and the latter to the special polynomial $x^n - 1$. We first prove that the ideal matrix $T_h^*(g)$ generated by any vector $g \in \mathbb{Z}^n$ and the rotation matrix $T_h$ are commutative under matrix multiplication.

**Lemma 7.64** *For any given monic polynomial $h(x) \in \mathbb{Z}[x]$ of degree $n$, and any $n$-dimensional column vector $g \in \mathbb{Z}^n$, we have*

$$T\_h \cdot T\_h^\*(\mathbf{g}) = T\_h^\*(\mathbf{g}) \cdot T\_h.$$

*Proof* Let $h(x) = x^n + a_{n-1}x^{n-1} + \cdots + a_1x + a_0 \in \mathbb{Z}[x]$. By Lemma 7.62, the characteristic polynomial of the rotation matrix $T_h$ is $h(\lambda)$, so by the Cayley–Hamilton theorem we have

$$T_h^n + a_{n-1}T_h^{n-1} + \cdots + a_1T_h + a_0I_n = 0,\tag{7.184}$$

hence

$$\begin{aligned} T_h^*(g)T_h &= [g, T_hg, T_h^2g, \dots, T_h^{n-1}g]\begin{pmatrix} 0 & -a_0 \\ I_{n-1} & -\alpha \end{pmatrix} \\ &= [T_hg, T_h^2g, \dots, T_h^{n-1}g, -a_0g - a_1T_hg - \cdots - a_{n-1}T_h^{n-1}g] \\ &= [T_hg, T_h^2g, \dots, T_h^{n-1}g, T_h^ng] \\ &= T_h[g, T_hg, \dots, T_h^{n-1}g] \\ &= T_h \cdot T_h^*(g). \end{aligned}$$
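Lemma 7.64 is easy to confirm numerically. The sketch below (our own helper names) checks $T_h \cdot T_h^*(g) = T_h^*(g) \cdot T_h$ for a sample $h$ and $g$:

```python
# Check the commutativity T_h * T_h^*(g) = T_h^*(g) * T_h of Lemma 7.64.

def rotation_matrix(a):
    n = len(a)
    T = [[0] * n for _ in range(n)]
    for i in range(1, n):
        T[i][i - 1] = 1
    for i in range(n):
        T[i][n - 1] = -a[i]
    return T

def apply(T, v):
    n = len(v)
    return [sum(T[i][j] * v[j] for j in range(n)) for i in range(n)]

def ideal_matrix(a, g):
    T = rotation_matrix(a)
    cols, v = [], list(g)
    for _ in range(len(g)):
        cols.append(v)
        v = apply(T, v)
    return [[cols[j][i] for j in range(len(g))] for i in range(len(g))]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Example: h(x) = x^3 + 2x + 1 (a = [1, 2, 0]) and g = (1, 4, 2)^T.
a, g = [1, 2, 0], [1, 4, 2]
T, M = rotation_matrix(a), ideal_matrix(a, g)
assert matmul(T, M) == matmul(M, T)
```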

When a monic integer coefficient polynomial $h(x)$ of degree $n$ is selected, we want to establish the correspondence between the ideals of the quotient ring $R = \mathbb{Z}[x]/\langle h(x)\rangle$ and the integer lattices $L \subset \mathbb{Z}^n$. First, we define the concept of an ideal lattice. In short, an ideal lattice is an integer lattice generated by an ideal matrix.

**Definition 7.19** Let $g = (g_0, g_1, \dots, g_{n-1})^T \in \mathbb{Z}^n$ be a given column vector and $T_h^*(g)$ be the ideal matrix generated by $g$. The integer lattice $L = L(T_h^*(g))$ is called an ideal lattice.

Our main result is the 1-1 correspondence between the principal ideals of $R = \mathbb{Z}[x]/\langle h(x)\rangle$ and the ideal lattices in $\mathbb{Z}^n$. This also explains why $L(T_h^*(g))$ is called an ideal lattice.

**Theorem 7.10** *The principal ideals in $R = \mathbb{Z}[x]/\langle h(x)\rangle$ correspond 1-1 to the ideal lattices in $\mathbb{Z}^n$. Specifically,*

*(i) If $N = \langle g(x)\rangle$ is any principal ideal in $R$, then*

$$\sigma(N) = \{\sigma(f)\,|\,f \in N\} = L(T_h^*(\sigma(g(x)))) = L(T_h^*(g)).$$

*(ii) If $g = (g_0, g_1, \dots, g_{n-1})^T \in \mathbb{Z}^n$ and $L(T_h^*(g)) \subset \mathbb{Z}^n$ is any ideal lattice, then*

$$\sigma^{-1}(L(T_h^*(g))) = \langle \sigma^{-1}(b)\,|\,b \in L(T_h^*(g))\rangle = \langle g(x)\rangle \subset R,$$

*where $g(x) = g_0 + g_1x + \cdots + g_{n-1}x^{n-1} = \sigma^{-1}(g)$.*

*Proof* We first prove (i). Let $g(x) = g_0 + g_1x + \cdots + g_{n-1}x^{n-1} \in R$ be a given polynomial and $N = \langle g(x)\rangle \subset R$ the principal ideal generated by $g(x)$ in $R$. By (7.182),

$$\sigma(g(x)) = (g_0, g_1, \dots, g_{n-1})^T = T_h^*(g) \cdot \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} \in L(T_h^*(g)).$$

And because

$$\begin{aligned} xg(x) &= g_{n-1}x^n + g_{n-2}x^{n-1} + \cdots + g_1x^2 + g_0x \\ &= (g_{n-2} - g_{n-1}a_{n-1})x^{n-1} + (g_{n-3} - g_{n-1}a_{n-2})x^{n-2} + \cdots \\ &\quad + (g_0 - g_{n-1}a_1)x - g_{n-1}a_0 \quad (\text{in } R), \end{aligned}$$

so

$$\sigma(xg(x)) = \begin{bmatrix} -g_{n-1}a_0 \\ g_0 - g_{n-1}a_1 \\ \vdots \\ g_{n-2} - g_{n-1}a_{n-1} \end{bmatrix} = T_h \cdot g = T_h^*(g)\begin{bmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} \in L(T_h^*(g)).$$

For the same reason, for 0 ≤ *k* ≤ *n* − 1, we have

$$\sigma(x^kg(x)) = T_h^k \cdot g = T_h^*(g) \cdot \begin{bmatrix} 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{bmatrix} \in L(T_h^*(g)),$$

where the 1 is in the $(k+1)$-th position.

Suppose $f(x) \in N = \langle g(x)\rangle$; then $f(x) = b(x) \cdot g(x)$ in $R$, where $b(x) = b_0 + b_1x + \cdots + b_{n-1}x^{n-1}$, and we have

$$\begin{aligned} \sigma(f(\mathbf{x})) &= \sigma(b(\mathbf{x})\mathbf{g}(\mathbf{x})) \\ &= \sum\_{k=0}^{n-1} b\_k \sigma(\mathbf{x}^k \mathbf{g}(\mathbf{x})) \\ &= T\_h^\*(\mathbf{g}) \begin{bmatrix} b\_0 \\ b\_1 \\ \vdots \\ b\_{n-1} \end{bmatrix} \in L(T\_h^\*(\mathbf{g})). \end{aligned} \tag{7.185}$$

That proves

$$
\sigma(N) = \sigma(\langle g(x) \rangle) \subset L(T_h^*(g)).
$$

Conversely, for any lattice point $\alpha \in L(T_h^*(g))$, there is a vector $b = (b_0, b_1, \dots, b_{n-1})^T \in \mathbb{Z}^n$ such that

$$\alpha = T_h^*(g)b = T_h^*(g)\begin{bmatrix} b_0 \\ b_1 \\ \vdots \\ b_{n-1} \end{bmatrix}.$$

Since $\sigma$ is a 1-1 correspondence, by (7.185),

$$f(x) = \sigma^{-1}(\alpha) = \sigma^{-1}(T_h^*(g)b) \in N = \langle g(x) \rangle.$$

So we have

$$
\sigma(N) = \sigma(\langle \mathbf{g}(\mathbf{x}) \rangle) = L(T\_h^\*(\mathbf{g})).
$$

Thus (i) holds. Again, since $\sigma$ is a 1-1 correspondence, (ii) follows directly from (i). This completes the proof of Theorem 7.10.
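The key computation (7.185) behind Theorem 7.10, namely that reducing $b(x)g(x)$ modulo $h(x)$ agrees coefficient-wise with the matrix-vector product $T_h^*(g)b$, can be sanity-checked numerically. The sketch below is our own, with our own helper names:

```python
# Check Eq. (7.185): sigma(b(x) g(x)) = T_h^*(g) b, computed two ways.

def polymul_mod(b, g, a):
    """Coefficients (lowest degree first) of b(x)g(x) mod h(x),
    where h(x) = x^n + a_{n-1}x^{n-1} + ... + a_0."""
    n = len(a)
    prod = [0] * (2 * n - 1)
    for i, bi in enumerate(b):
        for j, gj in enumerate(g):
            prod[i + j] += bi * gj
    for d in range(2 * n - 2, n - 1, -1):   # reduce x^d via x^n = -sum_k a_k x^k
        c = prod[d]
        prod[d] = 0
        for k in range(n):
            prod[d - n + k] -= c * a[k]
    return prod[:n]

def matvec_side(b, g, a):
    """T_h^*(g) b computed as sum_k b_k T_h^k g."""
    n = len(a)
    out = [0] * n
    v = list(g)
    for bk in b:
        out = [out[i] + bk * v[i] for i in range(n)]
        # v <- T_h v: new top entry -a_0 v_{n-1}, shift down minus a_i v_{n-1}
        v = [-a[0] * v[-1]] + [v[i - 1] - a[i] * v[-1] for i in range(1, n)]
    return out
```

For example, with $h(x) = x^3 - 1$, $g(x) = 1 + 2x + 3x^2$ and $b(x) = 1 + x$, both sides give the coefficient vector $(4, 3, 5)^T$.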

The above discussion of ideal matrices and ideal lattices can be extended to the finite field $\mathbb{Z}_q$, because every quotient ring $\mathbb{Z}_q[x]/\langle h(x)\rangle$ of the polynomial ring $\mathbb{Z}_q[x]$ over a finite field is a principal ideal ring. Therefore, we can establish the 1-1 correspondence between all ideals in $R = \mathbb{Z}_q[x]/\langle h(x)\rangle$ and linear codes over $\mathbb{Z}_q$.

Back to the Ajtai/Dwork cryptosystem: let $h(x) \in \mathbb{Z}_q[x]$ be a given polynomial, and select an $n \times m$-dimensional matrix $A \in \mathbb{Z}_q^{n \times m}$ as the generalized ideal matrix, i.e.,

$$A = [A_1, A_2, \dots, A_{\frac{m}{n}}],\tag{7.186}$$

where $A_i$ $(1 \le i \le \frac{m}{n})$ is the ideal matrix generated by $g^{(i)} \in \mathbb{Z}_q^n$, that is,

$$A\_i = T\_h^\*(\mathbf{g}^{(i)}) = [\mathbf{g}^{(i)}, T\_h \mathbf{g}^{(i)}, \dots, T\_h^{n-1} \mathbf{g}^{(i)}],\tag{7.187}$$

we get the second algorithm of Ajtai/Dwork cryptosystem:

Algorithm 2: Hash function based on ideal lattice.

Parameter: *q*, *n*, *m*, *d* are positive integers, *n*|*m*, *m* log *d* > *n* log *q*.

Secret key: $\frac{m}{n}$ column vectors $g^{(i)} \in \mathbb{Z}_q^n$ $(1 \le i \le \frac{m}{n})$, and a polynomial $h(x) = x^n + a_{n-1}x^{n-1} + \cdots + a_1x + a_0 \in \mathbb{Z}_q[x]$.

Hash function $f_A : \{0, 1, \dots, d-1\}^m \longrightarrow \mathbb{Z}_q^n$ defined as

$$f\_A(\mathbf{y}) \equiv A\mathbf{y}(\mathbf{mod}\,q),$$

The ideal matrix $A \in \mathbb{Z}_q^{n \times m}$ is given by Eq. (7.186).
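Algorithm 2 generalizes the sketch given for Algorithm 1: the cyclic shift is replaced by the rotation $T_h$ reduced mod $q$. The following toy implementation is ours (function name `hash_ideal` is an assumption, not from the text):

```python
# Illustrative sketch of Algorithm 2: f_A(y) = A y (mod q), where A is built
# from the ideal-matrix blocks T_h^*(g^(i)) without ever storing A explicitly.

def hash_ideal(gs, a, y, q):
    """gs: list of m/n secret vectors g^(i); a = [a_0, ..., a_{n-1}] are the
    non-leading coefficients of h(x); y in {0, ..., d-1}^m; modulus q."""
    n = len(a)
    out = [0] * n
    for k, g in enumerate(gs):
        v = [x % q for x in g]                  # current column T_h^j g^(k)
        for j in range(n):
            yj = y[k * n + j]
            out = [(out[i] + yj * v[i]) % q for i in range(n)]
            # advance v <- T_h v (mod q) for the next column
            top = (-a[0] * v[-1]) % q
            v = [top] + [(v[i - 1] - a[i] * v[-1]) % q for i in range(1, n)]
    return out
```

Taking $h(x) = x^3 - 1$ recovers the cyclic-lattice hash of Algorithm 1: `hash_ideal([[1, 2, 3]], [-1, 0, 0], [0, 1, 0], 7)` returns `[3, 1, 2]`, the second column of the cyclic matrix mod 7.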

We will not discuss here the collision resistance of the hash functions constructed from cyclic lattices and ideal lattices; interested readers may refer to Micciancio and Regev (2009) in the references of this chapter.

#### **Exercise 7**


$$
\lambda\_1(L) \cdot \lambda\_1(L^\*) \le n.
$$

4. Let $\lambda_1(L), \lambda_2(L), \dots, \lambda_n(L)$ be the successive minima of the lattice $L$. Prove

$$
\lambda\_1(L) \cdot \lambda\_n(L^\*) \ge 1.
$$

5\*. Let $L$ be a lattice, $B = [\beta_1, \beta_2, \dots, \beta_n]$ a generating matrix of $L$, and $B^* = [\beta_1^*, \beta_2^*, \dots, \beta_n^*]$ the corresponding Gram–Schmidt orthogonalization. Prove: every lattice $L$ has a basis $\{\beta_1, \beta_2, \dots, \beta_n\}$ such that

$$\frac{1}{n}\lambda\_1(L) \le \min\{|\beta\_1^\*|, |\beta\_2^\*|, \dots, |\beta\_n^\*|\} \le \lambda\_1(L).$$

(Hint: use KZ basis on lattice *L*).

6. Under the assumptions of Exercise 5, let $\lambda_1(L), \lambda_2(L), \dots, \lambda_n(L)$ be the successive minima of the lattice $L$. Prove:

$$
\lambda\_j(L) \ge \min\_{j \le i \le n} |\beta\_i^\*|, \ 1 \le j \le n.
$$

7. For a full-rank lattice $L \subset \mathbb{R}^n$, define its covering radius $\mu(L)$ as

$$\mu(L) = \max_{\mathbf{x} \in \mathbb{R}^n} \min_{\mathbf{v} \in L} |\mathbf{x} - \mathbf{v}|.$$

Prove: the covering radius of any lattice *L* exists.


$$
\lambda\_1(L) \cdot \mu(L^\*) \le n.
$$

# **References**


Micciancio, D. (2001). Improving lattice based cryptosystems using the Hermite normal form. In J. Silverman (Ed.), *Lecture Notes in Computer Science: Vol. 2146. Cryptography and lattices conference—CaLC 2001* (pp. 126–145). Springer.

Micciancio, D., & Regev, O. (2009). *Lattice-based cryptography*. Springer.


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.


© The Editor(s) (if applicable) and The Author(s) 2022

Z. Zheng, *Modern Cryptography Volume 1*, Financial Mathematics and Fintech, https://doi.org/10.1007/978-981-19-0920-7

