**Financial Mathematics and Fintech**

# Zhiyong Zheng Editor

Proceedings of the Second International Forum on Financial Mathematics and Financial Technology

# **Financial Mathematics and Fintech**

# **Series Editors**

Zhiyong Zheng, Renmin University of China, Beijing, Beijing, China

Alan Peng, University of Toronto, Toronto, ON, Canada

This series addresses emerging advances in mathematical theory related to finance, together with application research from all fintech perspectives. It is a series of monographs and contributed volumes focusing on in-depth exploration of financial mathematics (applied mathematics, statistics, optimization, and scientific computation) and of fintech applications (artificial intelligence, blockchain, cloud computing, and big data). The series is characterized by a comprehensive understanding and practical application of financial mathematics and fintech, and it covers cutting-edge applications of financial mathematics and fintech in practical programs and companies.

The Financial Mathematics and Fintech book series promotes the exchange of emerging theory and technology in financial mathematics and fintech between academia and financial practitioners. It aims to provide a timely reflection of the state of the art in mathematics and computer science as applied to finance. As a collection, the series provides valuable resources to a wide audience in academia, the finance community, government employees working in finance, and anyone else looking to expand their knowledge of financial mathematics and fintech.

The key words in this series include but are not limited to:



*Editor* Zhiyong Zheng Renmin University of China Beijing, China

ISSN 2662-7167 ISSN 2662-7175 (electronic)

Financial Mathematics and Fintech

ISBN 978-981-99-2365-6 ISBN 978-981-99-2366-3 (eBook)

https://doi.org/10.1007/978-981-99-2366-3

© The Editor(s) (if applicable) and The Author(s) 2023. This book is an open access publication.

**Open Access** This book is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore

# **Preface**

Financial technology is reshaping the ecology of the financial industry with explosive growth, enabling China's financial industry to achieve continual breakthroughs on a new track. At the same time, the rapid development of FinTech is driving the in-depth integration of mathematics, finance, and advanced technology.

The Second International Academic Forum on Financial Mathematics and Financial Technology was successfully held online on August 13–15, 2021. It was jointly organized by the School of Mathematics of Renmin University of China, the Engineering Research Center of Finance Computation and Digital Engineering of the Ministry of Education, the Statistics and Big Data Research Institute of Renmin University of China, the Blockchain Research Institute of Renmin University of China, the Zhongguancun Internet Finance Research Institute, and the Renmin University Press. Several distinguished scholars engaged in interdisciplinary research in mathematics, statistics, information technology, and finance delivered excellent talks and discussed in depth the bottlenecks faced by emerging technologies such as big data, AI, cloud computing, and blockchain. The forum provided insight into the frontiers and research hotspots of financial mathematics and financial technology, and strengthened ties between our institute and research institutions at home and abroad.

The proceedings highlight selected aspects of current and upcoming trends in FinTech, presenting innovative mathematical models and state-of-the-art technologies. They will benefit both scholars and practitioners pursuing the seamless integration of elegant mathematical models and up-to-date data mining technologies in financial market analysis.

Chapter "On the Development of Fintech in Asia" provides a general overview of the development of fintech in Asia. Chapter "A Probability Inequality with Application to Lattice Theory" uses the Hoeffding inequality to give a more precise estimate of the decryption error probability of the GGH public-key encryption scheme. With suitable parameters, the upper bound on this probability can be made close to 0, which means that the decryption error probability of the cryptosystem can be made sufficiently small. This also confirms that the GGH public-key cryptosystem can achieve high security. Chapter "Robust Identification of Gene-Environment Interactions Under High-Dimensional Accelerated Failure Time Models" considers censored survival data and adopts a high-dimensional accelerated failure time (AFT) model for robust identification of gene-environment interactions. Chapter "A Novel Approach for Improving Accuracy for Distributed Storage Networks" proposes two approaches, periodic self-verification and user verification, to guarantee the reliability of the storage network while improving the efficiency of distributed storage. Chapter "Iterative Learning Control Based on Random Variance Reduction Gradient Method" proposes a novel iterative learning control scheme based on the stochastic variance reduced gradient (SVRG) method, which is not only suitable for resolving the problem of incomplete information but also converges efficiently under both strongly convex and non-strongly convex control objectives. Chapter "A Generalization of NTRUEncrypt" first discusses a more general form of the ordinary cyclic code and then gives a generalized construction of NTRU based on ideal matrices and q-ary lattice theory. Compared with other variants of NTRU, such as CTRU, GNTRU, QTRU, and BITRU, the extended NTRU cryptosystem is constructed from a general ideal matrix rather than from special algebraic structures.
Chapter "Cyclic Lattices, Ideal Lattices, and Bounds for the Smoothing Parameter" shows that ideal lattices are a special subclass of cyclic lattices and proves that there is a one-to-one correspondence between cyclic lattices and finitely generated R-modules. Chapter "On the LWE Cryptosystem with More General Disturbance" estimates the decryption error probability under Gaussian disturbances and proves that the decryption error can be made sufficiently small. Its most salient contribution is showing that the decryption error can also be made small enough for more general disturbances. This indicates the high security and reliability of LWE-based cryptosystems: they are secure against passive eavesdroppers and could be applied in many kinds of encryption processes. Chapter "On the High Dimensional RSA Algorithm—A Public Key Cryptosystem Based on Lattice and Algebraic Number Theory" proves that high-dimensional RSA is a lattice-based public-key cryptosystem, which could be considered a new member of the post-quantum cryptography family. Moreover, it gives a matrix expression for any algebraic number field, which is a new result even in classical algebraic number theory. Chapter "Central Bank Digital Currency Cross-Border Payment Model Based on Blockchain Technology" proposes a blockchain-based cross-border payment model for central bank digital currency. Chapter "LLE Based K-Nearest Neighbor Smoothing for scRNA-Seq Data Imputation" proposes LLE-based k-nearest neighbor smoothing for imputing scRNA-seq data, which are high-dimensional, sparse, and noisy. Chapter "The Application of Time Series Analysis in the Fiscal Budget Variance of China" combines time-series models with fiscal science and puts forward a model for the fiscal budget variance of China's national general public budget.

We would like to take this opportunity to thank all the participants in the Second International Forum on Financial Mathematics and FinTech. We are also pleased to acknowledge the support of the School of Mathematics, Renmin University of China, and the Engineering Research Center of Finance Computation and Digital Engineering, Ministry of Education.

Beijing, China Zhiyong Zheng


# **On the Development of Fintech in Asia**

# **Liu Yong**

**Abstract** There are five models of fintech development in the world: the technology promotion model represented by the USA, the rule-driven model represented by the UK, the market pull model represented by China, the mixed competition model represented by Japan and Indonesia, and the model of fanning out from point to area represented by South Korea and Israel. In terms of spatial layout, the transformation of traditional financial hubs has accelerated, China and the USA have outstanding advantages in fintech, and the Asia-Pacific region has great potential for fintech development. China's fintech has reached a world-leading level; Japan is boosting the rapid growth of fintech through the advantage of backwardness; Singapore gathers innovative resources with a relaxed and inclusive atmosphere; South Korea promotes the scaled development of its fintech industry by fanning out from point to area; India is gradually realizing its potential for fintech development; Israel is building a fintech highland through "guidance plus service"; Indonesia has gradually become a rising star of fintech development in Southeast Asia; and Hong Kong sustains the momentum of sound fintech development with government assistance.

**Keywords** Fintech development · Asia · Policy and regulatory measures · Digital transformation

# **1 Overview of Global Fintech Development**

In recent years, global fintech has maintained rapid development, its adoption rate has steadily increased, and a large number of fintech unicorns have emerged. As big data, blockchain, AI, and other technologies mature in the financial field, new models and forms of financial services have come into being. Among them, some application fields have developed especially rapidly, including digital currency, open banking, and digital banking.

L. Yong, Zhongguancun Internet Finance Institute, Beijing, China. e-mail: liuyong@guopeijigou.com

© The Author(s) 2023

Z. Zheng (ed.), *Proceedings of the Second International Forum on Financial Mathematics and Financial Technology*, Financial Mathematics and Fintech, https://doi.org/10.1007/978-981-99-2366-3\_1

# *1.1 Development Dynamics*

Fintech enterprises are growing fast. According to relevant data, as of November 2021 there were 1057 unicorn enterprises worldwide. Fintech unicorns play a decisive role on this list: with 139 enterprises they form the largest group, and their total valuation of 4.7 trillion yuan accounts for 19% of the total valuation of the unicorns on the list. By country, the USA has the largest number of fintech unicorns, followed by China. In the 2019 Fintech 100 list published by Klynveld Peat Marwick Goerdeler (KPMG), enterprises in the Asia-Pacific region (including Australia and New Zealand) performed brilliantly, with a total of 42 enterprises on the list. Payment companies led the list with 27 entries, followed by 19 wealth management companies, 17 insurance companies, 15 lending companies, and 13 companies with relatively comprehensive financial businesses.

Developing economies, represented by Southeast Asia and Latin America, show distinctive characteristics in fintech development. According to the Future of Southeast Asian Fintech report by the consultancy Dealroom, the European venture capital firm Finch Capital, and the Indonesian venture capital firm MDI Ventures, the COVID-19 pandemic has accelerated the digital transformation of fintech in the region, especially in digital payments. Indonesia is expected to become the region's largest fintech hub by 2025, with an expected market value of US \$130 billion in related fields. According to CB Insights' global fintech report for the second quarter of 2021, fintech financing in Latin America has grown at a compound annual growth rate of 57% since 2016, reaching US \$4.246 billion by the second quarter of 2021. Fintech companies in Brazil alone account for 70% of the region's total financing.

More and more central banks have begun to actively study the issuance of CBDC (central bank digital currency), and some countries have even begun to build the underlying CBDC infrastructure and launch pilots of CBDC technology. As the first country in the world to launch a sovereign digital currency, China has conducted DC/EP pilot projects in some domestic cities, commercial banks, and cross-border payments since April 2020, and completed the country's first digital RMB insurance policy in December 2020. In 2020, the Bank of France launched a digital currency pilot project. European and American countries are unwilling to fall behind: the central banks of Canada, Sweden, the UK, and other countries jointly set up a CBDC group with the BIS. In May 2020, a white paper on the Digital Dollar Project (DDP) was released in the United States, introducing in detail the basic architecture, purposes of issuance, and potential application scenarios of a CBDC in the United States.

The evolution of digital banking is accelerating. As a banking model that has emerged in recent years, digital banking is an important result of banks' digital transformation. Currently, 60% of the world's banking population uses digital banking through online services and cashless transactions. According to a report by Nets (a European transaction processing center), contactless digital wallet transactions increased by more than two-thirds in the first half of 2020 compared with 2019. As the number of digital banking users has grown, so has the number of digital banks. In 2019, the Hong Kong Monetary Authority (HKMA) approved the establishment of 8 virtual banks, and in 2020 the Monetary Authority of Singapore (MAS) opened applications for digital banking licences. In addition, digital banks in many countries conduct online banking business under traditional banking licenses or in traditional authorized business forms, such as Monzo Bank in the UK, N26 Bank in Germany, and aiBank, WeBank, and MYbank in China.

The world has gained a deeper understanding of sustainable development, and fintech applications in green finance have multiplied. In terms of application scenarios, fintech tools cover ESG investment and financing, national carbon market trading, green building, green consumption, green agriculture, small and micro enterprises, and other fields. Fintech is widely used for environmental data, ESG data and evaluation, the green credit information management systems of financial institutions, and other scenarios.

# *1.2 The Financing Profile*

Global fintech investment and financing grew strongly. In 2020, the number of financing transactions reached 3443; in the first three quarters of 2021 it was 3549, already exceeding the total for the whole of 2020. Total fintech financing in 2020 was US \$48.4 billion, while financing in the first three quarters of 2021 reached US \$94.7 billion, nearly twice the 2020 total.

Financing is mainly concentrated in North America, Asia, and Europe, each with a quarter-on-quarter increase of more than 50%. North America has the highest total financing, accounting for more than half of global investment and peaking in the second quarter of 2021 at US \$16.56 billion. Asia follows, peaking in the third quarter of 2021 at US \$5.9 billion. South America exceeded US \$1 billion for the first time in the second quarter of 2021. In Africa and Oceania, financing is relatively stable, with little change (Fig. 1).

# *1.3 Regulatory Environment*

In recent years, financial authorities in various economies have steadily strengthened their regulation of fintech activities and have promoted the healthy and orderly development of fintech through measures such as continuous monitoring, the establishment of regulators, and the introduction of regulatory policies. On the one hand, financial authorities support the entry of fintech companies into the market to shore up the current weak links in financial services; on the other hand, countries set high thresholds for access to financial business to reasonably guard against systemic risks.

The legislative process for data protection has accelerated. In recent years, with the iterative innovation of new technologies, business entities are accelerating the development of new data resources, and the resulting problems, such as data privacy protection, are receiving increasing attention from governments. EU countries have been reviewing and improving data legislation in practice: since the second half of 2019, the European Commission (EC) and the Council of the European Union have asked each member country's regulators to submit a law enforcement summary, and they have received 19 such reports from different countries. In September 2020, the European Data Protection Board (EDPB) issued the Guidelines on the Targeting of Social Media Users (the Draft Guidelines), setting out the data protection requirements for social media. At the beginning of 2020, the California Consumer Privacy Act (CCPA) of the USA formally came into force and was incorporated into California's legal system. On October 21, 2020, the People's Republic of China released the draft Personal Information Protection Law and solicited public comments. It is the first law that specifically regulates personal information protection and, once promulgated, became the basic law in this field: the Personal Information Protection Law of the People's Republic of China officially came into force on November 1, 2021.

**Fig. 1** The amount of fintech financing by continent from 2010 to Q3 2021

The innovation of fintech regulatory tools has been continuously strengthened. First, some countries have established fintech innovation mechanisms. For example, in March 2021 France proposed establishing a European exemption mechanism for blockchain, relaxing some legal requirements that do not meet the needs of blockchain development, and suggested that exempted entities should still follow the key principles of financial regulation. Second, some countries have further improved their regulatory sandbox mechanisms. Third, many countries vigorously support the development of RegTech. For example, the Central Bank of Brazil (CBB) announced in April that Pier, a blockchain-based information integration platform for financial regulators, had gone live. It helps participating institutions quickly access the latest data of other institutions, shortening data queries that might previously have taken a month to a few seconds.

The fintech policy system has been continuously improved. Countries around the world have gradually recognized the potential value of fintech and are formulating development strategies and improving policy systems to support it. At present, in addition to policies on AI, blockchain, big data, and other key underlying technologies of fintech, policies increasingly cover areas such as digital banking, online payment, and crypto assets. Regulation of fintech applications has achieved basically full coverage, and the fintech policy system continues to improve.

# *1.4 The Models of Fintech Development*

At present, there are generally five models of fintech development around the world. The first is the Technology Promotion Model represented by the USA, characterized by the mutual promotion of finance and technology and a win-win relationship between industry and culture. The second is the Rule Driven Model represented by the UK, characterized by innovating regulatory methods and boosting industrial development through rules. The third is the Market Pull Model represented by China, characterized by accelerated digital transformation and breakthroughs sought within strict regulation. The fourth is the Mixed Competition Model represented by Japan and Indonesia, characterized by an accelerating pace of reform and the continuous unleashing of potential. The fifth is the Model of Fanning out from Point to Area represented by South Korea and Israel, characterized by pinpointing breakthroughs and focusing on tackling key problems (Fig. 2).

# *1.5 Spatial Layout*

**Fig. 2** The models of fintech development around the world

In recent years, fintech hubs such as Shanghai, Beijing, Shenzhen, Hangzhou, San Francisco (Silicon Valley), New York, London, and Chicago have been rising rapidly, building on their financial industries and driven by technology. China and the USA have distinctive advantages in fintech and have become world leaders in its development. The Asia-Pacific region has gradually demonstrated its potential for fintech development and has attracted a large influx of capital, showing its advantage of backwardness.

The transformation of traditional financial hubs has accelerated, and there is great potential for fintech development in the Asia-Pacific region. As technology comprehensively empowers and transforms finance, traditional financial hubs are transforming faster, emerging financial cities are upgrading across the board, and a new regional economic ecology is taking shape at a strategic level. In the future, financial hubs will without exception treat fintech as a core element of urban competitiveness and compete for the commanding heights of the field. According to the Global Fintech Hub Report 2021, the 9 cities in the first tier of global fintech hubs were Beijing, San Francisco (Silicon Valley), New York, Shanghai, Shenzhen, London, Hangzhou, Singapore, and Chicago. These cities are home to large financial institutions and national headquarters of financial institutions, and most of them have a solid foundation in the financial industry. They are now embarking on an all-round, technology-supported digital transformation of their financial industries. In terms of fintech experience, developing countries and Asia continue to lead overall: not only are the top 10 cities all located in developing countries in Asia, but for two consecutive years developing countries have accounted for 80% of the top 20 cities, with Asian cities accounting for 65%.

# **2 Practice of Fintech Development in Asia**

# *2.1 China—Fintech Has Been Promoted to the World's Leading Level*

### **2.1.1 Development Features: Accelerated Digital Transformation**

Judged by the stages of technology application in the financial industry, the milestones of China's fintech industry are relatively clear. The development of fintech in China can be divided into four stages, as shown in Fig. 3. China has entered the fintech 4.0 era, in which finance and technology develop in a highly integrated way.

China's fintech industry has leapt into the front ranks of the world. There are 139 unicorns in China's fintech industry, ranking first in the world. The market scale of China's fintech enterprises is growing steadily. According to forecasts by the Forward Looking Industry Research Institute, the market scale of China's fintech enterprises was expected to reach 463.1 billion yuan in 2021, an increase of nearly 17% over the previous year, and stable growth is expected to continue in 2022.

Great progress has been made in technological innovation. From 2015 to the first half of 2019, more than 22,000 enterprises in China applied for fintech-related patents, totaling more than 88,000 patents. Among them, big data analysis, interconnection technology, and cloud computing accounted for the highest proportions, while big data, cloud computing, biometric security, and AI maintained relatively smooth and steady growth. Blockchain technology performed brilliantly, with explosive growth: its share of patents rose from 0.4% in 2015 to 8.5% in 2019 (Fig. 4).

**Fig. 3** Development process of China's fintech

**Fig. 4** Fintech patents

There is a shortage of fintech talent. At present, fintech talent is in short supply, and its growth rate is far lower than that of the fintech industry itself. According to the 2018 China Fintech Employment Report released by Michael Page (China), 92% of the fintech enterprises surveyed believed that China is facing a severe shortage of fintech professionals, 85% of the employers surveyed said that they had encountered recruitment difficulties, and 45% said that their greatest difficulty was finding talent that could meet specific position requirements. According to the survey, the most in-demand fintech positions were in big data, AI, and risk management, accounting for 40%, 32%, and 12%, respectively.

# **2.1.2 Policies and Regulatory Measures: Finding Breakthroughs in Strict Regulation and Ensuring Steady Development of Data Protection**

The top-level design for fintech development has been continuously improved. In August 2019, the People's Bank of China issued the Financial Technology (fintech) Development Plan (2019–2021). This programmatic document established the "four beams and eight columns" top-level design for fintech. In December 2021, the central bank issued the Fintech Development Plan (2022–2025), its second round of fintech development planning after the 2019 plan. Compared with the first round, this plan focuses on solving the problem of uneven and insufficient fintech development, with clearer key tasks, a clearer development direction, and stronger implementation guarantees. The plan also sets out the vision of "striving to achieve a leap forward in the overall level and core competitiveness by 2025", opening up a broader development space for China's fintech industry.

The system of fintech supervision rules has been gradually improved. While improving the rules for individual technical fields, it enriches the supervision of business links such as innovative fintech product design, operation modes, and risk control measures. It also further supplements and improves the regulatory rules for protecting consumer rights and interests, personal privacy, and financial information data.

The standardization of fintech has been gradually strengthened. The central bank has issued and implemented technical standards for payment tokenization, payment information protection, acceptance terminal registration management, trusted execution environments for mobile terminals, and mobile financial client application software; has incorporated fintech products into the national unified certification system; and has continued to carry out standards "leader" activities for point-of-sale (POS) terminals, self-service terminals (ATMs), barcode payment acceptance terminals, and online banking services.

### **2.1.3 Layout of Key Fintech Cities: The Cities in East China are Leading, but Each of the Cities has Its Own Characteristics**

At present, China already leads global fintech. However, the speed and level of fintech development differ among its cities. Cities in East China are relatively strong overall: Beijing's fintech layout is being optimized and is developing steadily, Shanghai is working to build an international fintech brand, Shenzhen strives to play a leading role in the fintech development of the Guangdong-Hong Kong-Macao Greater Bay Area, and Hangzhou is adopting a "policy plus talent" strategy to bring new vitality to the city's development. Cities such as Chengdu, Chongqing, Guangzhou, Nanjing, and Qingdao are also actively building up their fintech sectors.

# *2.2 Japan—Boosting the Rapid Growth of Fintech Through Advantages of Backwardness*

## **2.2.1 Development Features: The Advantage of Backwardness in Fintech Has Become Apparent**

Japan's comprehensive competitive strength lays the foundation for the development of fintech. Japan is the world's third-largest economy, but its fintech development began relatively late. In 2018, the scale of Japan's fintech market reached 214.5 billion yen, and it has kept rising since. It was expected to reach 572.7 billion yen in 2020, with an average annual growth rate of more than 50% (Fig. 5).
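The growth rate quoted above can be checked with a little arithmetic. Reading "average annual growth rate" as a compound annual growth rate (CAGR), that is `(end / start) ** (1 / years) - 1` (an assumption, since the text does not name the formula), the minimal Python sketch below applies it to the two market-size figures. The helper name `cagr` is hypothetical and introduced only for illustration, and the 2020 value is treated as exact even though the source gives it as a forecast.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate between two positive values `years` apart.

    Hypothetical helper for illustration: (end / start) ** (1 / years) - 1.
    """
    if start_value <= 0 or end_value <= 0 or years <= 0:
        raise ValueError("start_value, end_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Japan's fintech market size in billion yen, as quoted in the text:
# 214.5 in 2018 (reported) and 572.7 in 2020 (expected).
market_2018 = 214.5
market_2020 = 572.7

rate = cagr(market_2018, market_2020, years=2)
print(f"Implied average annual growth, 2018-2020: {rate:.1%}")
print("Consistent with 'more than 50%':", rate > 0.5)
```

The implied rate is about 63% a year, which is consistent with "more than 50%". Applying the same function to the blockchain market sizes given for 2018 and 2020 (8.07 and 33.57 billion yen) yields roughly 104% a year.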

**Fig. 5** The scale of Japan's fintech market (Unit: 100 million yen, %)

Japan is also optimizing its cultural soft environment and accelerating the shift to a cashless society. On the consumer fintech adoption index in the EY Global Fintech Adoption Index 2019, Japan ranked last among 27 markets, with only 34%. In May 2017 the Japanese government issued the Fintech Vision, which explicitly called for attention to the added value of fintech and a focus on raising the adoption rate of electronic payments. The government then issued the Future Investment Strategy 2017, which explicitly proposed tripling the share of non-cash payments to more than 40% by Expo Osaka 2025. Since then, the Japanese government has been promoting rebates for non-cash payments (consumers received a rebate of about 2–5% on each non-cash payment), continuously optimizing the cultural environment and accelerating the shift to a cashless society.

The commercial landscape across sub-sectors has gradually taken shape. The mobile payment sector has moved past its phase of unchecked growth into a duopoly of LINE Pay and PayPay, and has nurtured a number of outstanding fintech start-ups along the way. Japan attaches great importance to blockchain: its blockchain market reached 8.07 billion yen in 2018 and 33.57 billion yen by 2020. With this strong momentum, a number of capable and distinctive blockchain start-ups have emerged, such as double jump.tokyo and Nayuta. Japan's regulation of network lending is relatively loose, and network lending and crowdfunding have become an important part of Japan's inclusive finance, which explains the industry's rapid development. Crowdfunding came to the fore in 2014, when Japan amended its Financial Instruments and Exchange Act, and in 2018 Japan's crowdfunding market reached 204.5 billion yen. During the COVID-19 pandemic, many crowdfunding platforms also helped support commercial tenants. Japan is also actively promoting sectors such as personal lending, robo-advisory, and supply chain finance; although these are still at an early stage, they have great potential (Fig. 6).

**Fig. 6** A diagram of the industrial ecology of Japan's fintech

# **2.2.2 Policy and Regulatory Measures: Optimizing the Policy System and Forging Ahead with Determination**

In its fintech development policies and regulatory measures, Japan combines strict regulation with easing measures. Regulation is relatively strict for some traditional industries, especially with respect to their digital transformation, but relatively loose for sectors such as crowdfunding and network lending, which has allowed these sectors to develop rapidly. Strict regulation can effectively control the risks of fintech innovation. In addition, in 2018 the Japan Financial Services Agency (JFSA) began implementing a sandbox mechanism for financial innovation, allowing financial and insurance products to be trialed within a defined risk range and steadily promoting healthy, sustainable innovation. Meanwhile, the looser measures in some sectors invigorate the fintech industry, allowing it to reform as it grows and to remain stable as it progresses, and help create a safe and controllable development ecosystem overall.

# **2.2.3 Layout of Key Fintech Cities: Tokyo Bay Area Endowed with Good Resources to Push Traditional Financial Institutions on the Way of Reform**

Fintech developed in Japan later than in other developed economies, so the country has not yet formed a broad network of fintech hubs. Whether in the fintech hub report released by the Global Fintech Hub Federation or in the fintech hub indices and rankings released by institutions such as Deloitte and Z/Yen Group, the Japanese hub that makes these lists is usually Tokyo. This chapter therefore focuses on Tokyo as a fintech hub and on the related policy measures. Tokyo ranks alongside New York and London among the world's renowned international financial centers, and it is both the capital and the financial capital of Japan. Since the 1960s, the Japanese government has been planning the Tokyo capital region, linking Tokyo with several neighboring prefectures for joint development and construction. Today, the Tokyo Bay Area has become one of the world's eight recognized bay areas.

Since the 1990s, the Japanese government has formulated and promulgated a series of science and technology innovation strategies and policy measures to accelerate innovation in the Tokyo Bay Area. Drawing on world-class universities and research institutions, clusters of innovative enterprises, and the Japanese government's preferential policies, the Tokyo Bay Area has absorbed advanced technology and ideas through its openness to the outside world and vigorously developed its scientific and technological productivity. It has formed an ecosystem conducive to innovation, spawned numerous innovation institutions, and produced a large number of scientific and technological achievements, gradually becoming a world center of innovation with international influence.

# *2.3 Singapore—Gathering Innovative Resources with a Relaxed and Inclusive Atmosphere*

### **2.3.1 Development Features: An Active Atmosphere for Fintech Innovation**

International innovation elements and diverse resources converge. Singapore is an international trade hub, an Asian financial center, and a strategically important location for technological innovation, and its convenient geography has helped financial and technological innovation resources converge there. On the one hand, as a global financial center, Singapore counts finance as its service industry with the highest added value, with more than 1,200 financial institutions based there. On the other hand, Singapore's scientific and technological innovation has developed rapidly. In the Global Innovation Index 2018 released by the World Intellectual Property Organization (WIPO), Singapore ranked fifth, ahead of traditional science and technology powers such as the USA, Germany, Israel, South Korea, and Japan, placing it among the world's leading science and technology innovation centers.

Fintech activities are diverse and vibrant. Since 2016, Singapore has hosted the Singapore Fintech Festival (SFF) and the Singapore Week of Innovation & Technology (SWITCH), which merged into SFF X SWITCH for the first time in 2019. On June 8, 2020, building on its experience with previous events, the Monetary Authority of Singapore (MAS) launched the first MAS Global Fintech Innovation Challenge in a new format. With the theme "Building Defenses, Seizing Opportunities, and Emerging Stronger," the competition offered a total prize pool of S\$1.75 million and comprised two parts: the MAS FinTech Awards and the MAS Global FinTech Hackcelerator.

Digital banking is booming and digital finance is accelerating. Singapore is gradually easing restrictions on applications for digital full bank licenses; the introduction of digital bank licenses is the largest liberalization of Singapore's banking sector in the past 20 years. In December 2020, MAS issued a total of 4 digital bank licenses: 2 digital full bank (DFB) licenses and 2 digital wholesale bank (DWB) licenses. Digital banks will compete with traditional banks in Singapore, but they will also promote the rapid development of fintech there.

Actively extending the application scenarios of blockchain technology. Singapore is regarded as one of the most blockchain-friendly countries in Southeast Asia and indeed the world. A large number of mature blockchain projects operate in sectors such as trading platforms, public blockchains, hosting, cloud storage, infrastructure, consulting, and insurance. Singapore vigorously promotes the use of blockchain in financial scenarios: on the one hand, to advance digital payments; on the other, to support SME financing and supply chain finance. It also uses blockchain to address service pain points in industrial finance more broadly.

A sound fintech ecosystem in which diverse participants develop together. Singapore's rich and diverse international fintech activities and its open, inclusive innovation environment have attracted a wide range of fintech talent. In August 2020, the Asian Institute of Digital Finance (AIDF) was jointly founded by MAS, the National Research Foundation (NRF) of Singapore, and the National University of Singapore (NUS) to meet Asia's demand for digital financial services. The strong community effect has drawn entrepreneurs, domain experts, angel investors, and industry mentors, providing a platform for exchange and collaboration among entrepreneurs, investors, and financial institutions. It has also attracted high-quality start-ups from other Asian countries, such as India and Indonesia, to relocate to Singapore, forming a highly open international fintech ecosystem (Fig. 7).

**Fig. 7** Fintech ecosystem

## **2.3.2 Policy and Regulatory Measures: The Top-Down Design is Optimized and Special Policies are Increased**

Refining the top-down design of fintech. The Singaporean government has authorized MAS to serve as the lead authority for fintech innovation and development, with full responsibility for strategic planning, the policy framework, and policy coordination. To further promote the coordinated and efficient development of fintech, MAS has established dedicated fintech units. First, it set up the FinTech & Innovation Group (FTIG), which comprises three offices responsible, respectively, for payments and technology solutions, technology infrastructure, and the technology innovation lab. FTIG invested S\$225 million in the Financial Sector Technology & Innovation Scheme (FSTI) to encourage global financial institutions to set up innovation and R&D centers in Singapore. Second, it established the Fintech Office, which has three main tasks: (1) reviewing, aligning, and improving fintech-related funding schemes across government agencies; (2) identifying gaps in industry infrastructure and between talent training and manpower demand, and proposing targeted strategies, policies, and programs to strengthen the competitiveness of the industry and its enterprises; and (3) managing Singapore's fintech brand and marketing strategy through fintech events and related initiatives, in support of Singapore's ambition to become a global fintech hub.

Formulating special fintech regulatory policies to promote the healthy development of enterprises. The Singaporean government has taken a multi-pronged approach to developing fintech domestically. Some measures are general, such as providing a supportive environment for start-ups, adopting a collaborative approach, and attracting foreign investment. Beyond these, Singapore has formulated special fintech regulatory policies to guide the development of specific segments of fintech, focusing mainly on artificial intelligence and data analytics (AIDA), blockchain technology, digital assets, payments legislation, and open banking.

Promoting RegTech applications to reduce risks. The Singapore Exchange (SGX) has launched a new RegTech scheme that automatically reports market irregularities and promotes fair trading. Several Singaporean companies now rank among the world's top 100 RegTech firms, in sectors such as compliance management and anti-money laundering. In addition, the Singaporean government has established a Regulatory Sandbox to encourage innovation by local fintech start-ups and to nurture and incubate outstanding enterprises.

# *2.4 South Korea—Promoting Scale Development of Fintech Industry by Fanning Out From Point to Area*

## **2.4.1 Development Features: The Foundation for the Development of Fintech is Solid**

South Korea leads in 5G information and communication technology. According to South Korea's Ministry of Science and ICT (MSIT), South Korea was the first country in the world to launch commercial 5G. Since 5G services officially launched on April 3, 2019, the number of users has risen steadily, reaching nearly 10 million by the end of October 2020.

Blockchain technology is developing rapidly. According to the Korean Intellectual Property Office (KIPO), 1,301 blockchain patents were registered in South Korea in 2019, more than 50 times the 24 registered in 2015, and the number rose further in 2020 after the COVID-19 pandemic began.

A refined big data credit system built on "two kinds of institutions and three modes of sharing." South Korea has established a highly developed big data credit system covering both individuals and companies. In this system, the Korea Federation of Banks (KFB) is the pillar of the credit industry, alongside private credit reporting companies. On this basis, credit information is shared in three modes: first, financial institutions are required to submit credit information to KFB, which then provides it to private credit reporting companies; second, information is shared within industries through associations or corporate groups; and third, credit reporting companies collect other information through commercial contracts. Under this design, the South Korean credit industry can collect nationwide credit information quickly and ensure that valuable credit information is legally and fully shared across society.

The P2P lending industry is developing rapidly. According to South Korean government statistics, total investment in P2P loans in South Korea rose from 37.3 billion won at the end of 2015 to 2.34 trillion won at the end of 2017, and then to 6.2 trillion won by the end of June 2019. In August 2020, the Act on Online Investment-Linked Financial Business and Protection of Users (the P2P Act) officially took effect, strengthening investor protection and formally establishing a legal framework for P2P lending.

## **2.4.2 Policy and Regulatory Measures: Strengthen Planning and Launch Fintech Development Strategy**

On December 4, 2019, the Financial Services Commission (FSC) of the Republic of Korea announced that it would vigorously promote the large-scale development of the fintech industry and introduced 8 measures across different areas: improving the current Regulatory Sandbox system, carrying out regulatory reform to promote fintech, loosening entry restrictions in the financial industry, establishing a regulatory basis for the digital age, developing new growth engines for financial innovation, promoting investment in fintech and building a venture capital ecosystem centered on private investment, helping fintech enterprises expand overseas, and expanding public support for fintech enterprises.

Preferential tax treatment for blockchain research and development. According to a report released by South Korea's Ministry of Strategy and Finance, under the latest tax laws, which took effect in February 2019, blockchain has been added to the list of research and development activities eligible for tax credits. This means that companies developing blockchain technology can claim a tax credit for part of their R&D expenses, with the size of the credit depending on the size of the company.

Implementing the Regulatory Sandbox plan to accelerate digital transformation. On April 1, 2019, the FSC officially launched a fintech Regulatory Sandbox program, aiming to promote competition in South Korea's financial industry and bring better services to consumers. As of May 2020, the FSC had held 14 assessment committee meetings. After assessments of business innovation, consumer convenience, and project stability and feasibility, a total of 102 innovative financial services were selected for the Regulatory Sandbox and granted exemptions from licensing and other regulations.

### **2.4.3 Layout of Key Fintech Cities: Seoul—Taking Various Measures to Create a Business Environment for Fintech**

Seoul is the largest city on the Korean Peninsula and one of Asia's major financial cities, with advantages in economic, technological, and cultural development. It ranks fifth among Asian cities by economic output, behind only Tokyo, Shanghai, Beijing, and Hong Kong, and its economy accounts for about 23% of South Korea's total. Seoul's population of about 10.2 million makes up 20% of South Korea's total population. In addition, Seoul accounts for half of South Korea's personal income tax, corporate income tax, and bank deposits, and for 30% of the country's innovative enterprises and college and university graduates. As South Korea's political and economic center, Seoul has a strong foundation for fintech development. In recent years, it has created a favorable business environment for fintech by holding a fintech week, establishing fintech labs, and setting up innovation funds to increase investment in fintech industries.

Holding a fintech week to create an innovative atmosphere. South Korea held its first Korea Fintech Week from May 23 to 25, 2019, at Seoul's Dongdaemun Design Plaza (DDP). It was South Korea's first global fintech fair, and the FSC hopes to develop it into an important annual fintech event in Asia. The 2019 event brought together global financial institutions, international organizations, and global fintech companies to discuss relevant policies and to help local fintech companies build ties with domestic and global investors. It also offered counseling for college students and young job seekers interested in the fintech industry.

Because of the COVID-19 pandemic, the 2020 Korea Fintech Week was held online. According to the FSC, it attracted more than 170,000 visitors and more than 110 million page views. Financial companies and fintech enterprises set up a total of 150 virtual exhibition halls. In the online job fair session, in which 35 enterprises took part, more than 1,000 job seekers competed for about 80 positions offered by 21 fintech enterprises.

Launching the SEOUL FINTECH LAB. In April 2018, the Seoul municipal government launched the first SEOUL FINTECH LAB, aimed mainly at early-stage start-ups. In July 2019, a second fintech lab was launched for growth-stage start-ups, with room for approximately 14 start-ups from South Korea, the USA, Hong Kong, and Singapore. The fintech labs are intended to become a key anchor for South Korea's fintech industry and to help promising South Korean fintech start-ups expand abroad.

Establishing the Seoul Innovation Growth Fund. At the end of 2018, Seoul set up an innovation fund of around 130 billion won (approximately USD 116 million) to be invested exclusively in innovative industries such as blockchain and fintech. In early 2019, the Seoul municipal government announced that it would invest 1.2 trillion won (USD 1.07 billion) in fintech start-ups through the investment fund by 2022.

# *2.5 Kazakhstan—Digital Transformation Speeds Up the Construction of Central Asian Fintech Hub*

### **2.5.1 Development Features: The Digital Economy is Booming**

Central Asia lies between two of the world's largest economies, China and Europe. Under the Belt and Road Initiative, it can be regarded as a bridge connecting Europe and China. As a fast-growing market with strong potential, Kazakhstan has played a key role in Central Asia's emergence as a global fintech hub.

Although it started from a relatively low base, digitization in Kazakhstan is advancing rapidly, mainly through (1) the rapid growth of e-commerce and mobile commerce; (2) the transition from cash to contactless and digital payments; and (3) the growth of innovative digital financial products and services. Since the outbreak of COVID-19, these long-running structural changes have accelerated, creating a favorable environment for the further development of fintech.

The fast-growing e-commerce market is one of the driving forces behind fintech development. Compared with other emerging markets and developed economies, e-commerce penetration in Kazakhstan has significant room to grow. According to Euromonitor's data, Kazakhstan's e-commerce market was worth an estimated KZT 401.3 billion (USD 1.1 billion) in 2019, equivalent to 3.4% of total retail trade, with a compound annual growth rate (CAGR) of 33.3% from 2016 to 2019.
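The CAGR quoted above can also be run backwards. Assuming, purely for illustration, that the 33.3% rate applied uniformly in each of the three years from 2016 to 2019, the sketch below estimates the implied 2016 market size and the total retail trade implied by the 3.4% share. These are back-of-the-envelope values derived from the quoted figures, not Euromonitor data, and the `implied_start_value` helper is hypothetical.

```python
def implied_start_value(end_value: float, annual_rate: float, years: int) -> float:
    """Back out a starting value from an ending value and a constant annual growth rate."""
    return end_value / (1.0 + annual_rate) ** years


# Kazakhstan e-commerce in KZT billion: 401.3 in 2019, CAGR 33.3% over 2016-2019.
# Assumption: constant 33.3% growth in every year (illustrative only).
growth_multiple = (1.0 + 0.333) ** 3
base_2016 = implied_start_value(401.3, 0.333, 3)

# Total retail trade implied by e-commerce being 3.4% of it in 2019
implied_retail = 401.3 / 0.034

print(f"growth multiple 2016-2019: {growth_multiple:.2f}x")
print(f"implied 2016 e-commerce value: KZT {base_2016:.1f} billion")
print(f"implied 2019 retail trade: KZT {implied_retail:,.0f} billion")
```

Under these assumptions the market grew roughly 2.37 times over the period, implying a 2016 value of roughly KZT 169 billion and total 2019 retail trade of roughly KZT 11.8 trillion.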

Digital payment is developing rapidly, as Internet and mobile phone adoption has risen significantly. According to Ovum, a telecommunications research firm, the number of smartphones in Kazakhstan increased from 12.7 million in 2016 to 19.2 million in 2019 and is expected to reach 23.4 million by 2024. Banks are using mobile and Internet banking to provide better financial services to remote and rural areas. Fintech companies have had fewer opportunities to cooperate with the traditional financial sector, but rising mobile Internet adoption has given them huge opportunities in areas not covered by traditional financial markets. In 2019, Kazakhstan's digital payments more than tripled to about USD 34.8 billion. As with e-commerce, this trend has been accelerated by the COVID-19 pandemic. In 2019, Kaspi.kz accounted for 83% of the growth of Kazakhstan's entire payment market, making it the largest contributor to the country's transition to digital payment.

The digital transformation of banking and insurance is well timed. The COVID-19 pandemic has moved most retail banking activities online, promoting the development of digital banking. Banking services are rapidly shifting from branch-based, product-centric organizations using traditional technologies to more personalized, consumer-centric digital solutions delivered seamlessly. Since January 2019, Kazakh citizens have been able to buy electronic insurance and submit their applications online. Within the conceptual framework for the development of the financial sector of the Republic of Kazakhstan to 2030, electronic sales of compulsory insurance policies are expected to be introduced.

### **2.5.2 Policy and Regulatory Measures: Being Committed to Promoting Financial Innovation in a Wider Range of Areas**

The Astana Financial Services Authority (AFSA), established on January 1, 2018, is independent of the National Bank of Kazakhstan (NBK) and of Kazakhstan's financial market supervision and development department. It is the independent regulator of financial and non-financial services activities in the Astana International Financial Center (AIFC), the financial center of Astana, which was established on July 5, 2018. Through its fintech hub, AIFC helps companies develop new fintech products and services in several ways. One is through acceleration programs in which start-ups work closely with mentors from around the world to build the necessary capabilities. Another is for AFSA's fintech department to provide a legal and regulatory basis for developing new financial products and technologies and to test them in the fintech lab (Regulatory Sandbox). At present, 30 projects are being tested in the Regulatory Sandbox, and more than 125 start-ups work with AIFC's fintech department. These companies span sectors such as payments and mobile wallets, markets, credit, AI and machine learning, blockchain, digital identification, network security, and fraud prevention.

AIFC's fintech department supports venture capital and corporate innovation, aiming to create a healthy venture capital ecosystem and to expand opportunities for start-ups in Central Asia and the Commonwealth of Independent States (CIS) to attract investment and conduct transactions. In this way, AIFC is building a comprehensive ecosystem that covers strengthening regulation, supporting start-ups, helping attract investment, and implementing fintech solutions within enterprises. To address the challenge of innovation in the financial sector, AIFC is taking a series of regulatory measures to promote innovation and strengthen protection for consumers of financial services and products. These include setting up a fintech lab; introducing a regulatory framework for crowdfunding; expanding the list of regulated and market activities; implementing policies for the healthy development of digital assets; creating a more flexible banking regime to strengthen inclusive finance; implementing open APIs to promote innovation in digital currency and payment services; providing corporate income tax and value-added tax exemptions for fintech companies; optimizing e-commerce regulation; improving the framework for venture capital financing; and launching the Global Financial Innovation Network (GFIN) to promote cross-border regulation and innovation.

### **2.5.3 Layout of Key Fintech Cities: Nur Sultan and Almaty—Leading the Development of Non-cash Payment**

The most active fintech cities in Kazakhstan are undoubtedly Nur Sultan (the capital) and Almaty (the country's largest city). As the most densely populated and economically developed cities, they lead in both the volume and the share of non-cash payments. Almaty had the largest non-cash payment volume, nearly KZT 7 trillion (about USD 16.5 billion), as well as the highest share of non-cash payments, 76.8%. Nur Sultan ranked second, with non-cash payments of KZT 2.9 trillion (approximately USD 6.8 billion) and a non-cash share of 72.2%, also second (Fig. 8).

**Fig. 8** Market share and proportion of non-cash payment in different cities and regions of Kazakhstan

Besides leading the country in non-cash payments, Almaty and Nur Sultan are also the major hubs for fintech start-ups. The country's largest fintech accelerators, science parks, and innovation centers are located in these two cities: the AIFC Fintech Hub, AIFC, and Nuris in Nur Sultan (Astana), and TechGarden and Most in Almaty.

# *2.6 India—Potential for Fintech Development Has Been Gradually Exerted*

# **2.6.1 Development Features: Digital Technology Promotes the Innovative Development of Fintech**

Indian fintech enterprises as a whole are transitioning from the initial stage to the growth stage. According to available statistics, as of 2019 India had the second-largest number of fintech start-ups in the world, after the USA. Several major sub-sectors, however, remain young. There are only a few dozen network lending platforms in India, all at an early stage, and few Indians have invested through online lending. India's credit reporting industry is still at the exploratory stage, with a huge long-tail market. India's crowdfunding industry is in the early stage of development and has grown slowly since 2014; it lacks clear regulation, and its main regulator, the Securities and Exchange Board of India (SEBI), has not yet issued specific rules.

India's fintech business models are steadily maturing, with sectors such as payments, lending, wealth management technology, personal finance, insurance, and RegTech flourishing across the board. Take payments as an example: Internet payments involve enterprises from many backgrounds, including telecom operators, e-commerce companies, banks, and wallet companies, and representative payment companies such as Paytm have attracted strong investor interest. Indian fintech also has great potential in payments, online lending, blockchain, robo-advisory, inclusive finance, technology-driven integrated banking services, Internet financial security, and biometrics.

Digital payment helps India seize the high ground of fintech innovation. The digital payment industry has become the core field for accelerating digital capacity building in India and has greatly helped India take the lead in fintech innovation. In 2016, Prime Minister Narendra Modi put forward the slogan "Stand Up India," giving national policy backing to entrepreneurship with a view to building a new financial ecosystem; the government also pressed ahead with the Digital India program, launched the demonetization campaign, and made clear that digital ID should be linked to financial services. Demonetization pushed India's fintech industry directly into the mainstream, and India's distinctive payment infrastructure, built on the Unified Payments Interface (UPI), won public trust, reducing reliance on cash and easing enterprises' financing difficulties. In 2019, India further advanced the Digital India program, aiming to digitize every offline transaction by unifying the payment industry and the e-commerce system. Meanwhile, Mastercard launched a project called Team Cashless India to help merchants accept digital payments and extend the reach of fintech. The huge potential of India's fintech market has also drawn more companies from around the world into it, such as the well-known Chinese enterprises Alibaba, Tencent, and JD.com and investment institutions such as Sequoia Capital and Hillhouse Capital, steadily raising the international profile of India's fintech development.

A population advantage lays the foundation for fintech development. India has the second-largest population in the world: according to World Bank statistics, India's population was 1.353 billion as of 2018. People under 35 make up 65% of the total and people under 25 make up 50%, a young and highly favorable age structure. This provides a large pool of talent for fintech development, gives more young entrepreneurs the opportunity to start fintech businesses, and creates more long-tail fintech users. By 2019, India had 560 million Internet users, about 41% of the total population. With the second-largest Internet user base after China, India has great room to develop and deploy intelligent applications. As for fintech adoption, public data show that India's fintech adoption rate had reached 87% as of 2019.

# **2.6.2 Fintech Development Policies and Measures: Two-Pronged Approach of Regulation and Publicity to Accelerate Fintech Development**

Supervision is key to fintech development, and India adheres to the principle of "doing by learning, learning by doing" in fintech regulation. It has gradually improved the top-down design of fintech regulation, brought fintech within the scope of regulation, and launched a fintech Regulatory Sandbox, while raising fintech awareness and adoption by setting up funds and launching fintech publicity campaigns.

## **2.6.3 Fintech Development Measures in Key Cities: Mumbai—Sound Financial Foundation and Satisfying Scene Experience**

In 2019, Mumbai rose 6 places in the overall ranking of the Global Fintech Hub Index (GFHI) and entered the world's top 20 for the first time. Its fintech industry has improved rapidly: with 6 highly funded unlisted fintech enterprises, such as Freecharge and InCred, Mumbai ranked 16th in the world on this measure. Moreover, thanks to its huge population and favorable age structure (an average age of only 27), Mumbai had a fintech user adoption rate of 64%, ranking 12th in the world. Ranking first among Asian cities outside China, Mumbai has made its fintech user experience a major advantage that attracts worldwide attention.

Fintech is being actively built into one of Mumbai's signature industries. As India's largest financial center, Mumbai keeps improving its fintech ecosystem. It is home to large financial institutions such as HDFC Bank, Kotak Mahindra Bank, ICICI Bank, and State Bank of India, and it ranks seventh in the world by the total market value of its institutions among the global Top 200 financial institutions. Traditional finance is actively accelerating its digital transformation. For example, State Bank of India has launched YONO, a comprehensive lifestyle and financial services platform, and HDFC Bank has launched UltraCash, a mobile payment app. As a result, the use of fintech is growing rapidly.

# *2.7 Israel—Guidance Plus Service to Create a Highland for the Development of Fintech*

## **2.7.1 Development Features: Technology and International Resources are Transformed Into Fintech Development Advantages**

Israel is an internationally recognized innovation powerhouse. It has the world's highest proportion of scientists and engineers engaged in high-tech research and development. Israel ranks second in the number of high-tech companies listed on Nasdaq in the United States. With more than 6,000 technology start-ups, it ranks first in the world. More than 270 multinational companies have set up research centers in Israel. This strong capacity for scientific and technological innovation and these international resources have laid a solid foundation for Israel's fintech innovation and development. Overall, Israeli fintech development shows three major features.

First, the underlying technology shows clear endowment advantages. Israel is a global model of military-civilian integration, and demand for cutting-edge technology and related innovations in the military field transfers smoothly to the commercial field. Israel has strong military applications in security, computer vision, and natural language processing, and these technologies are also applied to fintech. Israel currently has the highest usage density of fintech applications of any country. It was one of the first countries in the world to adopt blockchain and digital encryption technology, and it has many start-ups with core technologies in these fields, such as QEDit and DAGLabs. From 2013 to 2019, financing for fintech and related underlying technologies (artificial intelligence, network security, etc.) trended upward. In artificial intelligence in particular, total financing more than doubled, from USD 1.463 billion in 2017 to USD 3.182 billion in 2019. Over the same period, the average size of a single financing round rose from USD 7.07 million to USD 16.41 million, an increase of more than 100%.

Second, the science and innovation ecosystem is increasingly mature. Israel attaches great importance to building its science and innovation ecosystem, and its government has taken various measures to keep scientific research a priority. It supports research through tax and fee reductions while increasing spending on the industry. In the 2020 OECD R&D intensity ranking (R&D investment as a share of GDP), Israel maintained its leading position, and its total R&D expenditure is expected to keep rising. In the latest annual global startup ecosystem rankings released by the research organization StartupBlink, Israel rose one place from the previous year to third in the world.

Third, Israeli fintech is highly internationalized. Because of geographic and market constraints, Israeli fintech has attracted international investment and foreign fintech companies since its emergence, and it has exported products and services abroad from the start. According to a 2020 report by the Israel Venture Capital Data Center, the participation rate of foreign investors in Israeli fintech equity investment rose from 57% in 2018 to 69% in 2019. In payments, trading, and digital currency, more than 90% of fintech companies provide international services. In 2019, Israel's high-tech exports continued to grow, reaching a record USD 45.8 billion. This was about 46% of Israel's total exports and 1.2% more than in 2018. Exports of fintech, artificial intelligence, and related technology products and services made up a relatively large share of this total.

## **2.7.2 Policies and Regulatory Measures: Guidance but Not Leading, and Strengthening of Communication Between Government and Industries**

Israel has introduced many policies and regulatory measures. These range from the establishment of a regulatory innovation fintech hub and a fintech assistance center to changes in the fintech license application process and the launch of a data sandbox program. Throughout, the Israeli government acts mainly as a market facilitator and a guide for industrial development rather than as a leader. Strengthening communication between government and industry and acting as a partner in fintech development are the defining characteristics of Israel's fintech policies and regulatory measures.

In July 2018, the Israel Securities Authority (ISA) announced the establishment of a regulatory innovation fintech hub, mainly aimed at promoting dialogue between regulators and fintech industry participants. In 2019, the Capital Market Authority of Israel joined the Global Financial Innovation Network (GFIN). It took part in global fintech regulatory reform together with international institutions such as the World Bank and the International Monetary Fund. In July 2020, the ISA and the Israel Innovation Authority jointly launched a data sandbox program for fintech start-ups.

## **2.7.3 Layout of Key Fintech Cities: Tel Aviv—The Integration of Internal and External Strategies to Promote the Innovation and Development of Fintech**

Tel Aviv is Israel's second-largest city. The city cluster centered on Tel Aviv has become Israel's largest metropolitan area and economic hub, and it is known as Israel's economic capital and technology center. Tel Aviv hosts 77% of Israeli start-ups, 81% of investment institutions, 72% of incubators, and 85% of R&D centers. It is home to Israel's only stock exchange, the Tel Aviv Stock Exchange (TASE). The city has become an international headquarters for venture capital firms and research institutions and a gathering place for high-tech companies. Tel Aviv also has a relatively complete innovation incubation system and a strong pool of scientific talent. Among the top 200 global startup ecosystem cities, Tel Aviv ranks seventh, and its advantage in driving financial innovation with leading technology is evident. In the 27th Global Financial Centres Index (GFCI 27), Tel Aviv ranks 36th.

Tel Aviv's fintech development follows an international strategy. Thanks to its excellent innovation environment and concentration of international capital, major fintech companies treat Tel Aviv as an effective test bed for their products and technologies. They test products and services there before marketing them internationally. The Tel Aviv Global City Office carries out targeted marketing to international fintech customers. Besides promoting the city's global media image, various activities are held to match fintech services with demand, connect start-ups with investment capital, and foster cross-bank and cross-sector cooperation.

# *2.8 Indonesia—A Rising Star of Fintech Development in Southeast Asia*

## **2.8.1 Development Features: Fintech is in the Preliminary Development Stage, and Its Potential Continues to be Highlighted**

Indonesia enjoys clear advantages in Internet development. According to the 2019 Southeast Asia Digital Economy Report, Indonesia has the largest Internet economy in Southeast Asia. That economy had more than quadrupled to over USD 40 billion by 2019 and is expected to reach USD 130 billion in 2025. Internet users in Indonesia are growing rapidly. According to a report by the social media marketing companies We Are Social and Hootsuite, Indonesia's Internet penetration rate was 64% in January 2020, with average annual growth close to 20%. Moreover, Indonesia has leapfrogged the mature stage of conventional Internet development and moved directly to the mobile Internet stage. In January 2019, Indonesia had 356 million mobile connections, a mobile penetration rate of 133%, and 142 million active mobile Internet users.

The fintech industry is developing rapidly. A market report by Swiss Global Enterprise, a Swiss export promotion agency, expects Indonesia's digital financial services revenue to grow at a compound annual growth rate (CAGR) of 34% and reach USD 8.6 billion in 2025. The research report Future of Southeast Asia Financial Technology shows that Indonesian fintech companies had a total valuation of USD 35 billion in 2020, 32% of the Southeast Asian total.

Online lending and payments are booming. Cumulative online lending in Indonesia rose from IDR 2.56 trillion in December 2017 to IDR 102.52 trillion in March 2020, a roughly 40-fold increase. According to data compiled by the Otoritas Jasa Keuangan (OJK), total fintech lending in Indonesia in May 2020 rose 166.03% year on year. OJK estimates that there are more than 25 million borrower accounts and more than 654,200 entities providing loans. In fintech payments, the total number of electronic money transactions reached 5.2 billion at the end of 2019, up 79.3% from 2.9 billion the previous year. By May 2020, Bank Indonesia (BI) had issued licenses to 51 electronic money operators. The main players include GoPay, Ovo, Dana, and LinkAja.

## **2.8.2 Policies and Regulatory Measures: The System Continues to be Improved and the Supervision Continues to be Upgraded**

The fintech policy and supervision system has been continuously improved to encourage the industry's development. Indonesia's fintech sector is supervised by Bank Indonesia (BI) and the Otoritas Jasa Keuangan (OJK), with the Ministry of Information and Communications of Indonesia playing a supporting role. Bank Indonesia and OJK are responsible for different regulatory fields. Each has its own supervisory team, and the two learn from and complement each other (Table 1).

In October 2017, the Otoritas Jasa Keuangan (OJK) issued its 2017–2022 Development Plan. The plan set out 10 major policies and implementation plans and stated clearly that appropriate supervision should be carried out to optimize the development of financial technology. On November 30, 2017, Bank Indonesia issued its first fintech regulation, Financial Technology Regulatory Regulation No. 19/12/PBI/2017. The regulation aimed to govern fintech activities so as to promote innovation, protect consumers, and manage risks, thereby maintaining a stable currency and financial system and building an efficient, safe, and reliable payment system. In the same year, Bank Indonesia launched a fintech Regulatory Sandbox covering six major sectors:

- payment system development (including blockchain and distributed ledgers);
- aggregate payment;
- Internet investment management and risk control;
- Internet insurance;
- credit, financing business, and capital allocation;
- other financial services, as judged by Bank Indonesia.

Fintech companies in these sectors could test their services for six months under Bank Indonesia's supervision.

**Table 1** Fintech regulatory authorities and their responsibilities

On August 16, 2018, OJK issued Digital Financial Innovation Regulation No. 13/POJK.02/2018, drawing on Bank Indonesia's experience with the fintech Regulatory Sandbox and the pre-audit mechanism in the payment field. The regulation set out a package of rules for fintech supervision, established a Regulatory Sandbox system, and filled gaps left by Bank Indonesia Regulation No. 19/12/PBI/2017.

International fintech cooperation has been strengthened. In October 2018, at the annual meetings of the International Monetary Fund and the World Bank, the Indonesian government promoted the adoption of the Bali Fintech Agenda. In the same year, China's Alibaba Cloud announced the establishment of its first data center in Indonesia and officially put it into operation. Since then, various Chinese fintech companies and investors have entered the Indonesian market. In September 2020, the Securities Commission of Malaysia (SC) signed a fintech cooperation agreement with Indonesia's Otoritas Jasa Keuangan (OJK), establishing a framework for developing the fintech ecosystems of the two markets.

## **2.8.3 Layout of Key Fintech Cities: Jakarta—The Rapid Development of Fintech with Inclusive Finance as the Core**

Jakarta is the capital, largest city, and economic center of Indonesia. Greater Jakarta, which takes in the surrounding towns, is the second-largest metropolitan area in the world. The city's economy is dominated by finance, and it accounts for about one-third of the country's GDP. Jakarta hosts the country's largest financial institutions and major industrial and commercial enterprises, and Indonesia's stock and futures exchanges are all located there. Jakarta is also Indonesia's fintech hub: most fintech companies are based in Jakarta (or Greater Jakarta), and most of their domestic business customers are in the same area.

Jakarta has become the birthplace of fintech companies and the first test site and first launch site for their products and services. Indonesia's first fintech unicorn, OVO, was born in Jakarta, and Akselan, the first equity crowdfunding platform, was officially established there. More than 70% of Jakarta's many fintech companies work in digital financial inclusion, mainly providing financing and lending services to small and micro enterprises and rural populations.

# *2.9 Hong Kong of China—The Government Assists the Strong Development of Fintech*

## **2.9.1 Development Features: Seek Innovation While Maintaining Stability to Build a Fintech Hub**

Enthusiasm for fintech development is running high. Hong Kong has a large financial system, a relatively complete financial ecosystem, and favorable conditions for fintech development, and the construction of the Guangdong-Hong Kong-Macao Greater Bay Area brings further opportunities. According to the Hong Kong Investment Promotion Agency, more than 600 fintech companies currently operate in Hong Kong across multiple business areas. According to KPMG's statistics on global fintech investment and financing, Hong Kong, China ranked ninth in Asia for investment and financing in 2018. Between 2014 and 2018, total investment in fintech companies operating in Hong Kong amounted to USD 1.1 billion. In the first half of 2019, Hong Kong fintech companies raised a total of USD 150 million, up 561% year on year. Hong Kong's key fintech application areas currently include mobile payment, cross-border e-commerce payment, securities settlement, online financing platforms, wealth technology, and commercial insurance.

Hong Kong's fintech is clearly international across its various fields.

- **Payment and settlement.** Hong Kong offers a wide variety of financial products and has opened Shanghai-Hong Kong Stock Connect, Shenzhen-Hong Kong Stock Connect, and Bond Connect. Together these give post-trade processing platforms huge scope for fintech.
- **Wealth management.** As an international investment and asset management center, Hong Kong has already applied many technologies in asset management, such as computerized trading and investment. It still has great potential in automated advisory services, big data, and artificial intelligence.
- **Trade.** Cross-border e-commerce involves payment and exchange in multiple currencies. With free currency convertibility, its status as an offshore RMB center, and favorable tax laws and other conditions, Hong Kong remains a preferred base for cross-border e-commerce. Building on these advantages, a large number of cross-border payment companies have recently emerged in Hong Kong.
- **Supply chain finance.** Blockchain trade financing platforms are the focus of Hong Kong's development efforts.

Hong Kong has a complete fintech talent training system, in which talent is cultivated mainly by two types of institutions: colleges and universities, and social organizations.

- **Colleges and universities.** Many universities have set up fintech programs at the undergraduate, master's, and doctoral levels. The Chinese University of Hong Kong, the University of Hong Kong, City University of Hong Kong, and the Open University of Hong Kong offer fintech majors at the undergraduate level. The Chinese University of Hong Kong, the Hong Kong University of Science and Technology, Hong Kong Baptist University, and others offer them at the master's level. The Hong Kong Polytechnic University offers a fintech program at the doctoral level.
- **Social organizations.** The Hong Kong Youth Association Continuing Education Center, the Vocational Training Council, the Institute of Financial Technologists of Asia, and the Hong Kong Institute of Bankers have strengthened fintech talent training online and offline by offering basic fintech courses and issuing certificates.

## **2.9.2 Fintech Development Policies and Measures: Innovative Support Measures Contribute to Strong Development of Fintech**

Government supervision and services are efficient and well developed, as Hong Kong has successively established and improved its government service systems.

- **Specialized agencies and Regulatory Sandboxes.** The Government of the Hong Kong Special Administrative Region has established an Innovation and Technology Bureau to coordinate fintech development. The Hong Kong Monetary Authority, the Securities and Futures Commission, and the Insurance Regulatory Authority have each set up fintech Regulatory Sandboxes, giving companies a pilot regulatory environment for applying innovative technologies. The government has also set up a fast track for license applications by online insurance companies such as ZhongAn Insurance.
- **Attracting enterprises.** The fintech team of the Hong Kong Investment Promotion Agency has attracted 19 fintech companies to set up in Hong Kong and has assisted more than 310 others.

Hong Kong is integrating into the development of the Guangdong-Hong Kong-Macao Greater Bay Area. It cooperates with companies such as Tencent, and the Hong Kong Monetary Authority has issued its first batch of third-party payment licenses to such companies. Through products such as WeChat Hong Kong Wallet, Passenger QR Code, and We Remit, Tencent has combined the mobile payment capabilities it has accumulated over many years with the advantages of Hong Kong and Macao. With support from all regulators, Tencent's E-Pass will first pilot the integration of multiple virtual credentials in the Greater Bay Area. This will let residents of Guangdong, Hong Kong, and Macao use a single digital identity to access multiple services, promoting interconnection across the Greater Bay Area.

**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **A Probability Inequality with Application to Lattice Theory**

**Tian Kun**

**Abstract** Here we mainly provide a probability inequality for the GGH public-key encryption scheme. Given a constant $\sigma$, we first choose a lattice vector $v \in \mathbb{Z}^n$ and generate a small error vector $e$ satisfying $|e|_\infty \leqslant \sigma$. The ciphertext $c$ is computed by the function $f_{B,\sigma}(v, e) = Bv + e$ with a public basis $B$. To extract the message $v$, the function $f_{B,\sigma}^{-1}(c) = B^{-1}[c]_R$ is used, based on the private basis $R$. In this work we bound the error probability, that is, the probability that $B^{-1}[c]_R \neq v$. We also show how to choose $\sigma$ so that the error probability is arbitrarily small.

**Keywords** Probability inequality · Encryption scheme · Lattice

# **1 Introduction**

Given a full-rank lattice $L \subset \mathbb{Z}^n$, we denote the public basis of $L$ by $B$ and the private basis of $L$ by $R$. Both $B$ and $R$ are $n \times n$ invertible matrices. In the GGH public-key encryption scheme, for a plaintext vector $v \in \mathbb{Z}^n$, the random error vector $e$ is chosen so that the absolute value of each entry is at most a constant $\sigma$, where $\sigma$ is a positive real number. The ciphertext is computed as $c = f_{B,\sigma}(v, e) = Bv + e \in \mathbb{R}^n$. Using the results of Babai and others (Ajtai, 1996; Ajtai & Dwork, 1997; Babai, 1986; Coppersmith & Shamir, 1997; Goldreich et al., 1997; Micciancio, 2001; Hoffstein et al., 2017, 1998), we can decipher the plaintext as $v = B^{-1}[c]_R$ given $B$, $R$, and the ciphertext $c$. Here the lattice point $[c]_R$ is obtained by writing $c$ as a linear combination of the columns of $R$ and rounding the coefficients of this combination to the nearest integers, that is, $[c]_R = R[R^{-1}c]$. The question is how $\sigma$ should be chosen so that we recover the correct plaintext $v$, or at least keep the error probability low. We prove three theorems that answer this question; in particular, a probability inequality bounds the inversion error probability.
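The encryption and decryption maps above can be sketched in a few lines of Python. This is a toy illustration, not a reference GGH implementation. The 2 × 2 bases `R` and `B`, the message, and the error vector are hypothetical values chosen by hand, with `B = RU` for the unimodular matrix `U = [[2, 3], [1, 2]]`. Linear systems are solved exactly over the rationals. Rounding ties are broken upward; this is an assumption, since the text only specifies rounding to the nearest integer.

```python
from fractions import Fraction
import math


def mat_vec(M, x):
    """Multiply the matrix M (a list of rows) by the vector x."""
    return [sum(Fraction(m) * xi for m, xi in zip(row, x)) for row in M]


def solve(M, y):
    """Solve M x = y exactly by Gauss-Jordan elimination over the rationals."""
    n = len(M)
    A = [[Fraction(m) for m in row] + [Fraction(b)] for row, b in zip(M, y)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [a / p for a in A[col]]          # normalize the pivot row
        for r in range(n):
            if r != col and A[r][col] != 0:       # eliminate this column elsewhere
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [row[n] for row in A]


def round_vec(x):
    """Round each entry to the nearest integer, breaking ties upward."""
    return [math.floor(xi + Fraction(1, 2)) for xi in x]


def encrypt(B, v, e):
    """Ciphertext c = f_{B,sigma}(v, e) = B v + e."""
    return [bv + ei for bv, ei in zip(mat_vec(B, v), e)]


def decrypt(B, R, c):
    """Babai rounding with the private basis: return B^{-1} [c]_R."""
    c_R = mat_vec(R, round_vec(solve(R, c)))  # [c]_R = R [R^{-1} c]
    x = solve(B, c_R)
    # B and R span the same lattice, so the B-coordinates of [c]_R are integers.
    assert all(xi.denominator == 1 for xi in x)
    return [int(xi) for xi in x]


R = [[7, 1], [-1, 8]]     # hypothetical "good" private basis (columns are basis vectors)
B = [[15, 23], [6, 13]]   # hypothetical public basis, B = R U
v = [3, -5]               # plaintext
e = [Fraction(1, 10), Fraction(-1, 20)]  # small error, |e|_inf <= sigma = 1/10
print(decrypt(B, R, encrypt(B, v, e)))   # [3, -5]
```

With a large error such as `e = [40, 0]`, the rounding step lands on a different lattice point and `decrypt` returns `[12, -9]` instead of `v`. Theorem 1 characterizes exactly when this happens.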

T. Kun

School of Mathematics, Renmin University of China, Beijing 100872, China e-mail: tkun19891208@ruc.edu.cn

<sup>©</sup> The Author(s) 2023

Z. Zheng (ed.), *Proceedings of the Second International Forum on Financial Mathematics and Financial Technology*, Financial Mathematics and Fintech, https://doi.org/10.1007/978-981-99-2366-3\_2

# **2 Main Results**

**Theorem 1** *Let $B$ be the public basis and $R$ the private basis of the lattice $L$, let $v \in \mathbb{Z}^n$, let $e$ be the random error vector with $|e|_\infty \leqslant \sigma$, and let $c = f_{B,\sigma}(v, e) = Bv + e$. Then $B^{-1}[c]_R = v$ if and only if $[R^{-1}e] = 0$, where $[R^{-1}e]$ denotes the vector in $\mathbb{Z}^n$ obtained by rounding each entry of $R^{-1}e$ to the nearest integer.*

*Proof* Let $T = B^{-1}R$. Then

$$B^{-1}[c]_R = B^{-1}[Bv + e]_R = B^{-1}R[R^{-1}(Bv + e)] = T[T^{-1}v + R^{-1}e].$$

Since $B$ and $R$ are bases of the same lattice, $T = B^{-1}R$ is a unimodular matrix, and so is $T^{-1}$. Because $v \in \mathbb{Z}^n$, we have $T^{-1}v \in \mathbb{Z}^n$, and rounding commutes with adding an integer vector. Hence

$$B^{-1}[c]_R = T[T^{-1}v + R^{-1}e] = T(T^{-1}v + [R^{-1}e]) = v + T[R^{-1}e].$$

Thus $B^{-1}[c]_R = v$ is equivalent to $T[R^{-1}e] = 0$. Since $T$ is invertible, this equality holds if and only if $[R^{-1}e] = 0$.

**Remark 1** This theorem gives an equivalent condition to check whether the decryption result is accurate.
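Theorem 1 can be checked numerically on a toy example. The sketch below uses hypothetical 2 × 2 bases: a private basis `R` and a public basis `B = RU`, so that $T = B^{-1}R = U^{-1}$. For one small and one large error vector, it confirms that the decryption result equals $v + T[R^{-1}e]$, which equals $v$ exactly when $[R^{-1}e] = 0$. The closed-form inverse is specific to the 2 × 2 case. Rounding ties are broken upward, an assumption.

```python
from fractions import Fraction
import math


def inv2(M):
    """Exact inverse of a 2 x 2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]


def mul(M, x):
    """Matrix-vector product."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def round_vec(x):
    """Round each entry to the nearest integer, breaking ties upward."""
    return [math.floor(xi + Fraction(1, 2)) for xi in x]


R = [[7, 1], [-1, 8]]    # hypothetical private basis
B = [[15, 23], [6, 13]]  # hypothetical public basis B = R U, U = [[2, 3], [1, 2]]
T = [[2, -3], [-1, 2]]   # T = B^{-1} R = U^{-1}, a unimodular matrix
v = [3, -5]


def check(e):
    """Return ([R^{-1} e], whether B^{-1}[c]_R == v + T [R^{-1} e])."""
    c = [bv + ei for bv, ei in zip(mul(B, v), e)]
    decrypted = mul(inv2(B), mul(R, round_vec(mul(inv2(R), c))))
    k = round_vec(mul(inv2(R), e))                         # [R^{-1} e]
    predicted = [vi + ti for vi, ti in zip(v, mul(T, k))]  # v + T [R^{-1} e]
    return k, decrypted == predicted


print(check([Fraction(1, 10), Fraction(-1, 20)]))  # ([0, 0], True): decryption recovers v
print(check([Fraction(40), Fraction(0)]))          # ([6, 1], True): decryption returns v + T[6, 1]
```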

**Theorem 2** *Let $R$ be the private basis of the lattice $L$, and let $e$ be the random error vector with $|e|_\infty \leqslant \sigma$. Suppose the maximum $L_1$ norm of the rows of $R^{-1}$ is $\rho$. If $\sigma < \frac{1}{2\rho}$, then $[R^{-1}e] = 0$.*

*Proof* Let $R^{-1} = (c_{ij})_{n\times n}$ and $R^{-1}e = (a_1, a_2, \ldots, a_n)^T$, i.e., $a_i = \sum_{j=1}^n c_{ij}e_j$ for $1 \leqslant i \leqslant n$. Then

$$|a_i| = \left|\sum_{j=1}^n c_{ij} e_j\right| \leqslant \sum_{j=1}^n |c_{ij}|\,|e_j| \leqslant \sigma \sum_{j=1}^n |c_{ij}| \leqslant \sigma \rho < \frac{1}{2}.$$

Every entry of $R^{-1}e$ therefore rounds to $0$, which means that $[R^{-1}e] = 0$.

**Remark 2** Theorem 2 shows how σ can be chosen so that no inversion error occurs.
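The quantities in Theorem 2 are easy to compute. For a hypothetical 2 × 2 private basis, the sketch below computes $\rho$, the largest row $L_1$ norm of $R^{-1}$, and the threshold $\frac{1}{2\rho}$. It then evaluates the worst case of $\max_i |a_i|$ over all errors with $|e|_\infty \leqslant \sigma$. Because each $a_i$ is linear in $e$, that worst case is attained at a corner $e_j = \pm\sigma$ of the box, so only the $2^n$ sign patterns need to be enumerated.

```python
from fractions import Fraction
import itertools


def inv2(M):
    """Exact inverse of a 2 x 2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]


def max_row_l1(M):
    """Largest L1 norm among the rows of M (the rho of Theorem 2)."""
    return max(sum(abs(x) for x in row) for row in M)


def worst_case_entry(R_inv, sigma):
    """max over |e|_inf <= sigma of max_i |a_i| with a = R^{-1} e; attained at a corner of the box."""
    n = len(R_inv)
    return max(
        abs(sum(c * s * sigma for c, s in zip(row, signs)))
        for row in R_inv
        for signs in itertools.product((1, -1), repeat=n)
    )


R = [[7, 1], [-1, 8]]      # hypothetical private basis
R_inv = inv2(R)            # (1/57) [[8, -1], [1, 7]]
rho = max_row_l1(R_inv)    # 9/57 = 3/19
threshold = 1 / (2 * rho)  # Theorem 2 requires sigma < 19/6
print(rho, threshold)      # 3/19 19/6
print(worst_case_entry(R_inv, threshold * Fraction(99, 100)) < Fraction(1, 2))  # True
print(worst_case_entry(R_inv, threshold * Fraction(11, 10)) < Fraction(1, 2))   # False
```

The second check shows that the condition is tight for worst-case errors in the box, because the maximum of $|a_i|$ over the box equals $\sigma\rho$ exactly. Just above $\frac{1}{2\rho}$, some admissible error vector already rounds to a nonzero entry.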

**Theorem 3** *Let the $n \times n$ matrix $R$ be the private basis used in the inversion of $f_{B,\sigma}$, and denote the maximum $L_\infty$ norm of the rows of $R^{-1}$ by $\frac{r}{\sqrt{n}}$. Then the probability of inversion errors is bounded by*

$$P\{[R^{-1}e] \neq 0\} \leqslant 2n \cdot \exp\left(-\frac{1}{8\sigma^2 r^2}\right),$$

*where $e = (e_1, e_2, \ldots, e_n)^T$ and $e_1, e_2, \ldots, e_n$ are $n$ independent random variables such that $|e_i| \leqslant \sigma$ and $E(e_i) = 0$ for $1 \leqslant i \leqslant n$.*

**Lemma 1** *For any non-negative random variable X with finite expectation E*(*X*) *and any positive real number* μ*, we have*

$$P\{X \geqslant \mu\} \leqslant \frac{E(X)}{\mu}.$$

*Proof* We treat $X$ as a continuous random variable; the other cases are similar. Let $f(x)$ be the probability density function of $X$. Since

$$E(X) = \int_0^{+\infty} x f(x)\,dx \geqslant \int_\mu^{+\infty} x f(x)\,dx \geqslant \int_\mu^{+\infty} \mu f(x)\,dx = \mu P\{X \geqslant \mu\},$$

we have $P\{X \geqslant \mu\} \leqslant \frac{E(X)}{\mu}$.

**Lemma 2** *Let $X$ be a random variable satisfying $-a \leqslant X \leqslant a$ and $E(X) = 0$, where $a > 0$. Then for any real number $\lambda$,*

$$E(e^{\lambda X}) \le \exp\left(\frac{\lambda^2 a^2}{2}\right).$$

*Proof* For any real number $\lambda$, $f(x) = e^{\lambda x}$ is a convex function. Notice that

$$x = \frac{x+a}{2a} \cdot a + \frac{a-x}{2a} \cdot (-a), \quad -a \le x \le a,$$

where the two weights are non-negative and sum to $1$. By convexity,

$$f(x) \le \frac{x+a}{2a}f(a) + \frac{a-x}{2a}f(-a),$$

that is,

$$e^{\lambda x} \le \frac{x+a}{2a}e^{\lambda a} + \frac{a-x}{2a}e^{-\lambda a}.$$

Taking expectations and using $E(X) = 0$,

$$E(e^{\lambda X}) \leqslant E\left(\frac{X+a}{2a}e^{\lambda a} + \frac{a-X}{2a}e^{-\lambda a}\right) = \frac{1}{2}(e^{\lambda a} + e^{-\lambda a}).$$

Let $t = \lambda a$. It remains to prove that $\frac{1}{2}(e^t + e^{-t}) \leqslant \exp\left(\frac{t^2}{2}\right)$. This inequality is equivalent to

$$
\ln \frac{e^t + e^{-t}}{2} \le \frac{t^2}{2}
$$

Let $g(t) = \frac{t^2}{2} - \ln\frac{e^t + e^{-t}}{2}$. Then $g'(t) = t - \frac{e^t - e^{-t}}{e^t + e^{-t}}$ and $g'(0) = 0$. Since $g''(t) = 1 - \frac{4}{(e^t + e^{-t})^2} \geqslant 0$, we get $g'(t) \leqslant 0$ for $t \leqslant 0$ and $g'(t) \geqslant 0$ for $t \geqslant 0$. Hence $g(t) \geqslant g(0) = 0$, which completes the proof.
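The chain of inequalities in Lemma 2 can be spot-checked numerically. The sketch below takes a hypothetical bound $a = 1.5$ and compares three quantities over a grid of $\lambda$ values:

- the moment generating function of a mean-zero variable uniform on $[-a, a]$;
- the two-point variable $X = \pm a$, for which the convexity step is an equality, $E(e^{\lambda X}) = \cosh(\lambda a)$;
- the bound $\exp(\lambda^2 a^2/2)$.

A grid check is an illustration, not a proof.

```python
import math


def mgf_uniform(lam, a):
    """E[exp(lam X)] for X ~ Uniform[-a, a] (mean zero): sinh(lam a) / (lam a)."""
    t = lam * a
    return 1.0 if t == 0 else math.sinh(t) / t


def mgf_two_point(lam, a):
    """E[exp(lam X)] for X = -a or +a with probability 1/2: cosh(lam a), the convexity bound."""
    return math.cosh(lam * a)


def lemma2_bound(lam, a):
    """Right-hand side of Lemma 2: exp(lam^2 a^2 / 2)."""
    return math.exp(lam ** 2 * a ** 2 / 2)


a = 1.5                                   # hypothetical bound on |X|
grid = [k / 10 for k in range(-50, 51)]   # lambda from -5 to 5
ok = all(mgf_uniform(lam, a) <= mgf_two_point(lam, a) <= lemma2_bound(lam, a) for lam in grid)
print(ok)  # True
```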

**Lemma 3** *Suppose $X_1, X_2, \ldots, X_n$ are $n$ independent random variables such that $-a \leqslant X_i \leqslant a$ and $E(X_i) = 0$ for $1 \leqslant i \leqslant n$, where $a > 0$. Let $S_n = \sum_{i=1}^n X_i$ and $\varepsilon > 0$. Then*

$$P\{|S_n| \geqslant \varepsilon\} \leqslant 2\exp\left(-\frac{\varepsilon^2}{2na^2}\right).$$

*Proof* For any $\lambda > 0$, Lemma 1 applied to the non-negative variable $e^{\lambda S_n}$ gives

$$P\{S_n \geqslant \varepsilon\} = P\{e^{\lambda S_n} \geqslant e^{\lambda \varepsilon}\} \leqslant \frac{E(e^{\lambda S_n})}{e^{\lambda \varepsilon}}.$$

Since $X_1, X_2, \ldots, X_n$ are independent, Lemma 2 gives

$$E(e^{\lambda S_n}) = \prod_{i=1}^n E(e^{\lambda X_i}) \le \prod_{i=1}^n e^{\frac{\lambda^2 a^2}{2}} = e^{\frac{n\lambda^2 a^2}{2}},$$

so

$$P\{S_n \ge \varepsilon\} \le \frac{E(e^{\lambda S_n})}{e^{\lambda \varepsilon}} \le e^{-\lambda \varepsilon + \frac{n\lambda^2 a^2}{2}}.$$

Choosing $\lambda = \frac{\varepsilon}{na^2}$, which minimizes the exponent, the inequality becomes

$$P\{S_n \geqslant \varepsilon\} \leqslant \exp\left(-\frac{\varepsilon^2}{2na^2}\right).$$

Applying the same argument to $-X_1, -X_2, \ldots, -X_n$, we obtain

$$P\{S_n \leqslant -\varepsilon\} \leqslant \exp\left(-\frac{\varepsilon^2}{2na^2}\right).$$

Thus

$$P\{|S_n| \geqslant \varepsilon\} \leqslant 2\exp\left(-\frac{\varepsilon^2}{2na^2}\right).$$
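Lemma 3 is a form of Hoeffding's inequality, and a quick Monte Carlo experiment shows how conservative it can be. The sketch below draws $X_i = \pm a$ with equal probability, one hypothetical choice of bounded, mean-zero variables. It estimates $P\{|S_n| \geqslant \varepsilon\}$ with a fixed seed and compares the estimate with $2\exp(-\varepsilon^2/(2na^2))$. The parameters $n = 100$, $a = 1$, and $\varepsilon = 25$ are arbitrary.

```python
import math
import random


def hoeffding_bound(n, a, eps):
    """Right-hand side of Lemma 3: 2 exp(-eps^2 / (2 n a^2))."""
    return 2 * math.exp(-eps ** 2 / (2 * n * a ** 2))


def empirical_tail(n, a, eps, trials, seed=0):
    """Monte Carlo estimate of P{|S_n| >= eps} when each X_i is -a or +a with probability 1/2."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        s = sum(rng.choice((-a, a)) for _ in range(n))
        if abs(s) >= eps:
            hits += 1
    return hits / trials


n, a, eps = 100, 1.0, 25.0
estimate = empirical_tail(n, a, eps, trials=10_000)
bound = hoeffding_bound(n, a, eps)
print(f"estimate = {estimate:.4f}, Lemma 3 bound = {bound:.4f}")
```

With these parameters the bound is about 0.088. The true two-sided tail of this scaled binomial sum is roughly 0.01, so the estimate should come out well below the bound.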

**Proof of Theorem 3**. We now prove Theorem 3 using Lemma 3.

Let $R^{-1} = (c_{ij})_{n\times n}$ and $e = (e_1, e_2, \ldots, e_n)^T$, where $e_1, e_2, \ldots, e_n$ are $n$ independent random variables such that $|e_i| \leqslant \sigma$ and $E(e_i) = 0$ for $1 \leqslant i \leqslant n$.

We write $R^{-1}e = (a_1, a_2, \ldots, a_n)^T$, i.e., $a_i = \sum_{j=1}^n c_{ij}e_j$ for $1 \leqslant i \leqslant n$.

Since $|c_{ij}| \leqslant \frac{r}{\sqrt{n}}$ and $|e_j| \leqslant \sigma$, the random variable $c_{ij}e_j$ lies in the interval $\left[-\frac{r\sigma}{\sqrt{n}}, \frac{r\sigma}{\sqrt{n}}\right]$. Moreover, for fixed $i$ the variables $c_{i1}e_1, \ldots, c_{in}e_n$ are independent with $E(c_{ij}e_j) = c_{ij}E(e_j) = 0$. Applying Lemma 3 with $a = \frac{r\sigma}{\sqrt{n}}$ and $\varepsilon = \frac{1}{2}$,

$$P\left\{|a_i| \geqslant \frac{1}{2}\right\} = P\left\{\left|\sum_{j=1}^n c_{ij} e_j\right| \geqslant \frac{1}{2}\right\} \leqslant 2\exp\left(-\frac{(\frac{1}{2})^2}{2n(\frac{r\sigma}{\sqrt{n}})^2}\right) = 2\exp\left(-\frac{1}{8\sigma^2r^2}\right).$$

An entry of $R^{-1}e$ can round to a nonzero integer only if $|a_i| \geqslant \frac{1}{2}$, so by the union bound

$$P\{[R^{-1}e] \ne 0\} \leqslant \sum_{i=1}^n P\left\{|a_i| \geqslant \frac{1}{2}\right\} \leqslant 2n \cdot \exp\left(-\frac{1}{8\sigma^2r^2}\right).$$

Thus the inequality in Theorem 3 holds.
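The following sketch shows Theorem 3 at work beyond the worst-case guarantee of Theorem 2. It uses a hypothetical private basis $R = H$, a 64 × 64 Sylvester Hadamard matrix. Then $R^{-1} = H/n$ has entries $\pm 1/n$, giving $r = 1/\sqrt{n}$ and $\rho = 1$. Errors are drawn i.i.d. uniform on $[-\sigma, \sigma]$ with $\sigma = 1$, twice the largest value Theorem 2 allows. The observed rate of $[R^{-1}e] \neq 0$ is compared with the bound $2n\exp(-1/(8\sigma^2 r^2))$. The basis, the error distribution, and the trial count are illustrative assumptions.

```python
import math
import random


def sylvester_hadamard(n):
    """n x n Hadamard matrix for n a power of 2; H H^T = n I and H is symmetric, so H^{-1} = H / n."""
    H = [[1]]
    while len(H) < n:
        H = [row + row for row in H] + [row + [-x for x in row] for row in H]
    return H


def theorem3_bound(n, r, sigma):
    """Right-hand side of Theorem 3: 2n exp(-1 / (8 sigma^2 r^2))."""
    return 2 * n * math.exp(-1 / (8 * sigma ** 2 * r ** 2))


def empirical_error_rate(R_inv, sigma, trials, seed=0):
    """Fraction of trials with [R^{-1} e] != 0, where e_j ~ Uniform[-sigma, sigma] i.i.d."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        e = [rng.uniform(-sigma, sigma) for _ in range(len(R_inv))]
        a = [sum(c * ej for c, ej in zip(row, e)) for row in R_inv]
        if any(math.floor(ai + 0.5) != 0 for ai in a):  # nearest-integer rounding, ties up
            errors += 1
    return errors / trials


n, sigma = 64, 1.0
H = sylvester_hadamard(n)                    # hypothetical private basis R = H
R_inv = [[x / n for x in row] for row in H]  # R^{-1} = H / n
r = math.sqrt(n) * max(abs(c) for row in R_inv for c in row)  # max |c_ij| = r / sqrt(n)
rho = max(sum(abs(c) for c in row) for row in R_inv)           # rho from Theorem 2
bound = theorem3_bound(n, r, sigma)
estimate = empirical_error_rate(R_inv, sigma, trials=300)
print(f"r = {r}, rho = {rho}, Theorem 3 bound = {bound:.4f}, observed rate = {estimate:.4f}")
```

Here the bound evaluates to $128e^{-8} \approx 0.043$. Each $a_i$ has standard deviation $1/\sqrt{3n} \approx 0.07$, so errors are essentially never observed. Theorem 3 thus permits a larger $\sigma$ than Theorem 2 at the price of a small, quantified failure probability.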

**Corollary 1** *$P\{[R^{-1}e] \neq 0\} < \varepsilon$ if $\sigma < \left(2r\sqrt{2\ln\frac{2n}{\varepsilon}}\right)^{-1}$.*

*Proof* For $0 < \varepsilon < 2n$, squaring both sides, rearranging, and exponentiating shows that

$$\sigma < \left( 2r \sqrt{2 \ln \frac{2n}{\varepsilon}} \right)^{-1} \Leftrightarrow 2n \cdot \exp \left( -\frac{1}{8\sigma^2 r^2} \right) < \varepsilon,$$

so by Theorem 3,

$$P\{ [R^{-1}e] \neq 0 \} \leqslant 2n \cdot \exp \left( -\frac{1}{8\sigma^2 r^2} \right) < \varepsilon.$$

**Remark 3** Theorem 3 gives an upper bound on the inversion error probability, and Corollary 1 turns this bound into an explicit condition on σ that keeps the error probability below a prescribed constant ε.
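Corollary 1 is easy to apply in code. The following Python sketch computes the threshold $\left(2r\sqrt{2\ln(2n/\varepsilon)}\right)^{-1}$ and checks it against the bound of Theorem 3; the function names and example parameters are hypothetical. At the threshold itself the bound equals ε exactly, so any strictly smaller σ gives a bound below ε.

```python
import math


def theorem3_bound(n: int, r: float, sigma: float) -> float:
    """2 n exp(-1 / (8 sigma^2 r^2)), the bound of Theorem 3."""
    return 2.0 * n * math.exp(-1.0 / (8.0 * sigma**2 * r**2))


def sigma_threshold(n: int, r: float, eps: float) -> float:
    """Supremum of the sigma values allowed by Corollary 1: (2 r sqrt(2 ln(2n/eps)))^{-1}.

    Requires 0 < eps < 2n so that the logarithm is positive.
    """
    if not 0.0 < eps < 2.0 * n:
        raise ValueError("need 0 < eps < 2n")
    return 1.0 / (2.0 * r * math.sqrt(2.0 * math.log(2.0 * n / eps)))


n, r, eps = 256, 1.0, 1e-3
s_max = sigma_threshold(n, r, eps)
print(f"sigma must be below {s_max:.4f}")
print("bound at 0.99 * threshold:", theorem3_bound(n, r, 0.99 * s_max))
```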

# **3 Conclusions**

In this work we present a probability inequality for the GGH public-key encryption scheme. In this scheme, we first take an integer vector $v \in \mathbb{Z}^n$ and generate a small error vector $e$ such that $|e_i| \leqslant \sigma$ for all $i$. Given a public basis $B$, the function $f_{B,\sigma}(v, e) = Bv + e$ computes the ciphertext $c$. To decrypt, the private basis $R$ and the function $f^{-1}_{B,\sigma}(c) = B^{-1}[c]_R$ are used to recover the message $v$. We bound the probability of a decryption error, i.e., of $v \neq B^{-1}[c]_R$, and explain how to choose σ so that this probability does not exceed a given constant ε.
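To make the encryption and decryption maps concrete, here is a toy Python sketch with hypothetical key generation (not secure and not the original GGH parameter choice): a nearly orthogonal private basis $R$, a public basis $B = RU$ with $U$ unimodular, $c = Bv + e$ with $e_i = \pm\sigma$, and $v = B^{-1}[c]_R$ where $[c]_R = R\,[R^{-1}c]$. Since $R^{-1}c = Uv + R^{-1}e$ with $Uv$ integral, decryption recovers $v$ exactly when $[R^{-1}e] = 0$, which is the event bounded in Theorem 3.

```python
import numpy as np


def keygen(n: int, rng: np.random.Generator, scale: int = 20):
    """Toy GGH-style key pair (hypothetical construction, illustration only).

    Private basis R: scale * I plus a {-1, 0, 1} perturbation (nearly orthogonal).
    Public basis B = R @ U with U unimodular: the product of a unit lower and a
    unit upper triangular integer matrix, so det(U) = 1.
    """
    eye = np.eye(n, dtype=np.int64)
    R = scale * eye + rng.integers(-1, 2, size=(n, n))
    lower = np.tril(rng.integers(-1, 2, size=(n, n)), k=-1) + eye
    upper = np.triu(rng.integers(-1, 2, size=(n, n)), k=1) + eye
    return R, R @ lower @ upper


def encrypt(B, v, sigma, rng):
    """c = f_{B,sigma}(v, e) = B v + e, with each e_i drawn from {-sigma, +sigma}."""
    e = sigma * rng.choice([-1.0, 1.0], size=B.shape[0])
    return B @ v + e


def decrypt(R, B, c):
    """f^{-1}_{B,sigma}(c) = B^{-1} [c]_R, with [c]_R = R * round(R^{-1} c)."""
    rounded = np.rint(np.linalg.solve(R, c))     # round(R^{-1} c) componentwise
    lattice_point = R @ rounded                  # [c]_R
    return np.rint(np.linalg.solve(B, lattice_point)).astype(np.int64)


rng = np.random.default_rng(1)
n, sigma = 8, 1.0
R, B = keygen(n, rng)
v = rng.integers(-50, 51, size=n)
c = encrypt(B, v, sigma, rng)
print("recovered:", np.array_equal(decrypt(R, B, c), v))
# Deterministic sufficient condition: sigma * (max row L1 norm of R^{-1}) < 1/2
print("guaranteed:", sigma * np.abs(np.linalg.inv(R)).sum(axis=1).max() < 0.5)
```

With `scale = 20` and $n = 8$, the row $L_1$ norms of $R^{-1}$ stay below 0.24 for every perturbation, so decryption with σ = 1 always succeeds; larger σ or a less orthogonal $R$ brings the probabilistic regime of Theorem 3 into play.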



**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Robust Identification of Gene-Environment Interactions Under High-Dimensional Accelerated Failure Time Models**

## **Qingzhao Zhang, Hao Chai, Weijuan Liang, and Shuangge Ma**

**Abstract** For complex diseases, beyond the main effects of genetic (G) and environmental (E) factors, gene-environment (G-E) interactions also play an important role. Many of the existing G-E interaction methods conduct marginal analysis, which may not appropriately describe disease biology. Joint analysis methods have been developed, with most of the existing loss functions constructed based on likelihood. In practice, data contamination is not uncommon, yet robust methods for interaction analysis that can accommodate it remain very limited. In this study, we consider censored survival data and adopt an accelerated failure time (AFT) model. An exponential squared loss is adopted to achieve robustness. A sparse group penalization approach, which respects the "main effects, interactions" hierarchy, is adopted for estimation and identification. Consistency properties are rigorously established. Simulation shows that the proposed method outperforms direct competitors. In data analysis, the proposed method makes biologically sensible findings.

**Keywords** Robust identification · Gene-environment interactions · High-dimensional · Accelerated failure time models

Q. Zhang

School of Economics and the Wang Yanan Institute for Studies in Economics, Xiamen University, Xiamen, China e-mail: zhangqingzhao@amss.ac.cn

H. Chai · S. Ma (✉) Department of Biostatistics, School of Public Health, Yale University, New Haven, Connecticut, US e-mail: shuangge.ma@yale.edu

W. Liang School of Statistics, Renmin University of China, Beijing, China e-mail: weijuanliang@yeah.net

© The Author(s) 2023 Z. Zheng (ed.), *Proceedings of the Second International Forum on Financial Mathematics and Financial Technology*, Financial Mathematics and Fintech, https://doi.org/10.1007/978-981-99-2366-3\_3

# **1 Introduction**

For many complex diseases, it is essential to identify important risk factors that are associated with prognosis. In the omics era, profiling studies have been extensively conducted. It has been found that, beyond the main effects of genetic (G) and environmental (E) risk factors, gene-environment (G-E) interactions can also have important implications.

Denote $T$ and $C$ as the prognosis and censoring times, respectively. Denote $X = (X_1, \dots, X_q)^\top$ as the $q$ environmental/clinical variables, and $Z = (Z_1, \dots, Z_p)^\top$ as the $p$ genetic variables. The existing G-E interaction analysis methods mainly belong to two families. The first family conducts marginal analysis (Hunter, 2005; Shi et al., 2014; Thomas, 2010), under which one or a small number of genes are analyzed at a time. Despite its computational simplicity, marginal analysis contradicts the fact that the prognosis of complex diseases is attributable to the *joint* effects of multiple main effects and interactions. The second family of methods, which is biologically more sensible, conducts joint analysis (Liu et al., 2013; Wu et al., 2014; Zhu et al., 2014). Among the existing joint analyses, regression-based analysis is the most popular and proceeds as follows. Consider the model $T \sim \phi\left(\alpha_0 + \sum_{j=1}^{q} X_j \alpha_j + \sum_{k=1}^{p} Z_k \beta_k + \sum_{j=1}^{q} \sum_{k=1}^{p} X_j Z_k \gamma_{j,k}\right)$, where the model $\phi(\cdot)$ is known up to the regression coefficients $\alpha_0$, $\{\alpha_j\}_1^q$, $\{\beta_k\}_1^p$, and $\{\gamma_{j,k}\}_1^{q,p}$. Conclusions on the importance of interactions are drawn based on $\{\gamma_{j,k}\}_1^{q,p}$. With high-dimensional data and the demand for selecting relevant effects, regularized estimation is usually required.

In the vast majority of existing studies, estimation is based on the standard likelihood, which is nonrobust. In practice, data contamination is not uncommon and can arise for multiple reasons. Many diseases are heterogeneous, and different subtypes behave differently. When subtype information is accurately available, subtype-specific analysis can be conducted. However, when such information is unavailable or only partially available, which is often the case in practice (He et al., 2015), subjects belonging to small subtypes may be viewed as "contamination" of the leading subtype. Human errors can also happen. It has been well noted that survival information extracted from medical records is not always reliable (Bowman, 2015; Fall et al., 2008), creating contamination in prognosis distributions. In low-dimensional biomedical studies, it has been well established that even a single contaminated observation can lead to biased model estimation and, consequently, false marker identification (Huber & Ronchetti, 2009). Our literature review suggests that in the analysis of G-E interactions, robust methods that can effectively accommodate contamination in prognosis outcomes have been very rare. For marginal interaction analysis, a few robust methods, for example, multifactor dimensionality reduction (MDR), have been developed. However, they are not directly applicable to joint analysis because of both methodological and computational challenges. As discussed in Wu and Ma (2015), a handful of robustness studies have been conducted under high-dimensional settings for joint analysis. However, they mostly focus on main effects and are not directly applicable to interaction analysis because of the additional complexity caused by the "main effects, interactions" hierarchy. Most of them adopt the quantile regression technique. Studies under low-dimensional settings suggest that no single robust technique dominates. It is thus desirable to examine alternative robust techniques under high-dimensional settings. In addition, for quite a few existing methods, statistical properties have not been well studied, casting doubt on their validity.

Consider data with a prognosis outcome and both G and E measurements. Our goal is to conduct joint analysis and identify important G-E interactions as well as main G and E effects. This study advances the literature in multiple aspects. Specifically, we consider the scenario with possible contamination in the prognosis outcome, which is commonly encountered but little addressed. We adopt an exponential squared loss to achieve robustness. This loss function provides a useful alternative to the popular quantile regression and other robust approaches but has not been well investigated under high-dimensional settings, especially not for interaction analysis. This study also marks a novel extension of the exponential squared loss to censored survival data. For regularized estimation and selection of relevant effects, we adopt a penalization technique that respects the "main effects, interactions" hierarchy. Unlike most existing studies, we rigorously establish consistency properties. Theoretical research on high-dimensional robust methods remains limited, so this study may provide valuable insights. With both methodological and theoretical developments, this study goes beyond the existing literature.

# **2 Methods**

# *2.1 Data and Model Settings*

For describing prognosis, we adopt the AFT model, which has been the choice of multiple studies with high-dimensional genetic data (Liu et al., 2013; Shi et al., 2014). Compared to alternatives such as the Cox model, the AFT model offers intuitive interpretations and low computational cost, which are especially desirable with high-dimensional genetic data. With a slight abuse of notation, we still use $T$ and $C$ to denote the logarithms of the event and censoring times, and let $\delta = I\{T \leq C\}$. The AFT model specifies that

$$T = \alpha_0 + \sum_{j=1}^{q} X_j \alpha_j + \sum_{k=1}^{p} Z_k \beta_k + \sum_{j=1}^{q} \sum_{k=1}^{p} X_j Z_k \gamma_{j,k} + \varepsilon,$$

where $\varepsilon$ is the random error. Following Stute (1993, 1996), we assume that $T$ and $C$ are independent, and that $\delta$ is conditionally independent of $(X^\top, Z^\top)^\top$ given $T$. Let $W_k = (Z_k, X_1 Z_k, \dots, X_q Z_k)^\top$ and $b_k = (\beta_k, \gamma_{1,k}, \dots, \gamma_{q,k})^\top$, which collect, respectively, the covariates and the coefficients of all main and interaction effects corresponding to the $k$th genetic variable.

With $n$ independent subjects, we use subscript "$i$" to denote the $i$th subject. For subject $i$, let $y_i = \min\{T_i, C_i\}$ and $\delta_i = I\{T_i \leq C_i\}$ be the observed time and event indicator, respectively. Then the $i$th observation consists of $(y_i, \delta_i, \mathbf{x}_i, \mathbf{z}_i)$, with $\mathbf{x}_i = (x_{i1}, \dots, x_{iq})^\top$, $\mathbf{z}_i = (z_{i1}, \dots, z_{ip})^\top$, and $W_{k,i} = (z_{ik}, x_{i1} z_{ik}, \dots, x_{iq} z_{ik})^\top$ denoting the $i$th realizations of $X$, $Z$, and $W_k$, respectively. Denote $\mathbf{u}_{i,\cdot}^\top = (1, \mathbf{x}_i^\top, W_{1,i}^\top, \dots, W_{p,i}^\top)$, $\mathbf{U} = (\mathbf{u}_{1,\cdot}, \dots, \mathbf{u}_{n,\cdot})^\top$, and $\boldsymbol{\zeta} = (\alpha_0, \dots, \alpha_q, b_1^\top, \dots, b_p^\top)^\top$. Without loss of generality, assume that the $(y_i, \delta_i, \mathbf{u}_{i,\cdot})$'s have been sorted by $y_i$ in ascending order.

# *2.2 Robust Estimation and Identification*

Consider the scenario where the distribution of $\varepsilon$ is not specified, which differs significantly from existing parametric studies and makes the proposed method more flexible. To motivate the proposed estimation, first consider data without contamination. Stute (1993) developed a weighted least squares estimation approach. Under low-dimensional settings, Stute's estimator is defined as the minimizer of the loss function

$$\sum_{i=1}^{n} \omega_i \left( y_i - \mathbf{u}_{i,\cdot}^{\top} \boldsymbol{\zeta} \right)^2.$$

Here the weights $\boldsymbol{\omega} = (\omega_i)_{i=1}^{n}$ are derived from the Kaplan-Meier estimator and defined as

$$\omega\_1 = \frac{\delta\_1}{n}, \omega\_i = \frac{\delta\_i}{n - i + 1} \prod\_{j=1}^{i-1} \left( \frac{n - j}{n - j + 1} \right)^{\delta\_j}, i = 2, \dots, n.$$
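The weights above can be computed in one vectorized pass. The Python sketch below is a direct transcription of the formula, assuming the observations are already sorted by $y_i$ and that there are no tied observed times (tie handling is not specified here); the function name is hypothetical.

```python
import numpy as np


def stute_weights(delta) -> np.ndarray:
    """Kaplan-Meier (Stute) weights omega_1, ..., omega_n.

    delta: event indicators (1 = event, 0 = censored) for observations already
    sorted by observed time y_i in ascending order (ties are not handled here).
    """
    delta = np.asarray(delta, dtype=float)
    n = delta.size
    i = np.arange(1, n + 1)                       # 1-based positions
    factors = ((n - i) / (n - i + 1)) ** delta    # ((n-j)/(n-j+1))^{delta_j} for j = 1..n
    # Product over j = 1..i-1; the empty product for i = 1 equals 1.
    prefix = np.concatenate(([1.0], np.cumprod(factors[:-1])))
    return delta / (n - i + 1) * prefix


print(stute_weights([1, 1, 1, 1]))   # no censoring: every weight equals 1/n
print(stute_weights([1, 0, 1]))      # the censored subject's mass moves to the later event
```

Without censoring the product telescopes and every weight equals $1/n$, so the loss reduces to ordinary least squares; censored subjects get weight zero, and their mass is redistributed to later events.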

Stute's estimator is not necessarily the most efficient. However, under high-dimensional settings, it can be computationally the most convenient, as it retains the least squares loss. Note that if $\omega_i \neq 0$, a single contaminated $y_i$ can lead to severely biased model estimation, because its squared residual enters the loss without bound.

Now consider the scenario with possible outliers in the prognosis data. We propose the objective function

$$Q_{\theta}(\boldsymbol{\zeta}) = \sum_{i=1}^{n} \omega_{i} \exp\left(-(y_{i} - \mathbf{u}_{i,\cdot}^{\top}\boldsymbol{\zeta})^{2}/\theta\right). \tag{1}$$

This function is motivated by the following considerations. For low-dimensional regression analysis without censoring, Wang et al. (2013) adopted an exponential squared loss to achieve robustness. The intuition is as follows. For a contaminated subject whose observed $y_i$ deviates from $\mathbf{u}_{i,\cdot}^\top \boldsymbol{\zeta}$ (the value "predicted" by the model), $(y_i - \mathbf{u}_{i,\cdot}^\top \boldsymbol{\zeta})^2$ is large. The exponential function down-weights such a contaminated observation. The degree of down-weighting is controlled by θ: the smaller θ is, the smaller the influence of contaminated observations. While sharing some common ground with Wang et al. (2013) and others, the present study faces three main challenges and makes corresponding advancements. The first is the high dimensionality, which brings tremendous challenges to theoretical and computational developments. The second is the need to respect the "main effects, interactions" hierarchy (more details below). The third is censoring, which we accommodate by introducing the weights $\omega_i$ motivated by Stute's approach. As the weights are data-dependent, they complicate the establishment of theoretical properties.
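The down-weighting argument can be seen numerically. The Python sketch below evaluates objective (1) and Stute's weighted least squares loss on a hypothetical intercept-only example with one outlying observation, then moves that outlier much further away; the data, equal weights, and θ = 1 are illustrative assumptions.

```python
import numpy as np


def exp_squared_objective(zeta, U, y, omega, theta):
    """Q_theta(zeta) in (1): sum_i omega_i exp(-(y_i - u_i' zeta)^2 / theta), to be maximized."""
    r = y - U @ zeta
    return float(np.sum(omega * np.exp(-r**2 / theta)))


def weighted_ls_loss(zeta, U, y, omega):
    """Stute-type weighted least squares loss, to be minimized."""
    r = y - U @ zeta
    return float(np.sum(omega * r**2))


U = np.ones((4, 1))                              # intercept-only design (illustration)
zeta = np.array([0.0])
omega = np.full(4, 0.25)                         # equal weights, i.e. no censoring
y_outlier = np.array([0.1, -0.2, 0.05, 10.0])    # one contaminated response
y_far = np.array([0.1, -0.2, 0.05, 1000.0])      # the same outlier moved much further

for name, y in (("y_outlier", y_outlier), ("y_far", y_far)):
    print(name, weighted_ls_loss(zeta, U, y, omega), exp_squared_objective(zeta, U, y, omega, theta=1.0))
```

The least squares loss grows by roughly four orders of magnitude, whereas $Q_\theta$ is essentially unchanged, because $\exp(-r^2/\theta)$ is already negligible for the residual of 10.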

When $p \gg n$, regularized estimation is needed. In addition, out of a large number of profiled G factors and G-E interactions, only a few are expected to be associated with prognosis. We adopt penalization for regularized estimation and identification, which has been the choice of a large number of genetic studies, especially recent interaction analyses (Bien et al., 2013; Liu et al., 2013; Shi et al., 2014). Specifically, consider the penalized robust objective function

$$L_{\lambda_1, \lambda_2, \theta}(\boldsymbol{\zeta}) = Q_{\theta}(\boldsymbol{\zeta}) - \sum_{k=1}^{p} \rho(\|b_k\|; \lambda_1, s) - \sum_{k=1}^{p} \sum_{j=2}^{q+1} \rho(|b_{kj}|; \lambda_2, s), \tag{2}$$

where $\|\cdot\|$ is the $\ell_2$ norm, $\rho(t; \lambda, s) = \lambda \int_0^{|t|} \left(1 - \frac{x}{\lambda s}\right)_+ dx$ is the MCP (minimax concave penalty; Zhang, 2010), and $b_{kj}$ is the $j$th element of $b_k$. $\lambda_1$ and $\lambda_2$ are data-dependent tuning parameters, and $s$ is the regularization parameter, following the terminology of Zhang (2010). The robust estimator $\widehat{\boldsymbol{\zeta}}$ is defined as the maximizer of $L_{\lambda_1, \lambda_2, \theta}(\boldsymbol{\zeta})$. An interaction term (or main effect) is concluded to be important if its estimate is nonzero.
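Evaluating the integral gives the closed form $\rho(t; \lambda, s) = \lambda|t| - t^2/(2s)$ for $|t| \leq \lambda s$ and $s\lambda^2/2$ otherwise. The Python sketch below implements this together with the derivative $\dot{\rho}(t; \lambda, s) = \mathrm{sgn}(t)(\lambda - |t|/s)_+$ used in the computation section; the function names are hypothetical.

```python
import numpy as np


def mcp(t, lam, s):
    """MCP rho(t; lam, s) = lam * int_0^{|t|} (1 - x/(lam*s))_+ dx, in closed form."""
    a = np.abs(t)
    return np.where(a <= lam * s, lam * a - a**2 / (2.0 * s), 0.5 * s * lam**2)


def mcp_deriv(t, lam, s):
    """rho_dot(t; lam, s) = sgn(t) * (lam - |t|/s)_+."""
    return np.sign(t) * np.maximum(lam - np.abs(t) / s, 0.0)


lam, s = 1.0, 6.0                      # s = 6 is the value fixed in the numerical study
for t in (0.0, 2.0, 6.0, 10.0):
    print(t, float(mcp(t, lam, s)), float(mcp_deriv(t, lam, s)))
```

The penalty grows like the lasso near zero and becomes flat beyond $|t| = \lambda s$, so large coefficients are not shrunk; this is why MCP reduces the estimation bias of the $L_1$ penalty.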

In recent genetic interaction analyses, it has been stressed that the "main effects, interactions" hierarchy should be respected. That is, if an interaction term is identified as important, its corresponding main effect(s) should be automatically identified. G-E interaction analysis has its own particularities. The E variables usually have a low dimensionality and are manually chosen. As such, selection is usually not conducted on the E variables (if desirable, this can easily be achieved). Thus, for G-E interaction analysis, the hierarchy postulates that if a G-E interaction is identified as important, its corresponding main G effect is automatically identified. In the adopted sparse group penalty, the first penalty, which is a group MCP, determines which groups are selected. Here one group corresponds to one genetic variable and its interactions. As the group MCP does not induce within-group sparsity, the second penalty is imposed on the interaction terms to determine which of them are nonzero. Because the second penalty is imposed only on interactions, important interactions belong to selected groups, which automatically makes the estimates of the corresponding main G effects nonzero. As such, the combination of the two penalties guarantees the hierarchy. We note that although sparse group penalization has been studied in the literature (Liu et al., 2013), it has very rarely been coupled with robust loss functions. MCP could also potentially be replaced by other penalties.

# *2.3 Computation*

In this section, we develop an efficient algorithm to compute the maximizer of $L_{\lambda_1, \lambda_2, \theta}(\boldsymbol{\zeta})$. The basic strategy is to iteratively approximate the objective function by its quadratic minorization. A coordinate-wise updating procedure is then used to find the maximizer of each approximated objective function, and this maximizer serves as the starting point for the next minorization. Overall, this is a coordinate-descent (CD) algorithm nested in a Minorize-Maximization (MM) algorithm.

Let $\mathbf{W}(\boldsymbol{\zeta})$ be the diagonal matrix with $i$th diagonal element $\mathbf{W}_{i,i} = 2\omega_i \exp\left(-(y_i - \mathbf{u}_{i,\cdot}^\top \boldsymbol{\zeta})^2/\theta\right)/\theta$. Also let $\mathbf{v}(\boldsymbol{\zeta}) = (v_1, \dots, v_n)^\top$ with $v_i = y_i - \mathbf{u}_{i,\cdot}^\top \boldsymbol{\zeta}$. Define $\mathbf{U}_{\cdot,-j}$ as the submatrix of $\mathbf{U}$ with the $j$th column excluded, $\mathbf{u}_{\cdot,j}$ as the $j$th column of $\mathbf{U}$, and $u_{i,j}$ as the $j$th component of the vector $\mathbf{u}_{i,\cdot}$. Similarly, define $\boldsymbol{\zeta}_{-j}$ as the subvector of $\boldsymbol{\zeta}$ with the $j$th element excluded. For the exponential squared objective function in (1), its first- and second-order derivatives with respect to $\boldsymbol{\zeta}$ are

$$\begin{split} \frac{\partial Q_{\theta}(\boldsymbol{\zeta})}{\partial \zeta_{j}} &= 2 \sum_{i=1}^{n} \omega_{i} \exp\left(-(y_{i} - \mathbf{u}_{i,\cdot}^{\top}\boldsymbol{\zeta})^{2}/\theta\right) u_{i,j} (y_{i} - \mathbf{u}_{i,\cdot}^{\top}\boldsymbol{\zeta})/\theta = \mathbf{u}_{\cdot,j}^{\top} \mathbf{W}(\boldsymbol{\zeta}) \mathbf{v}(\boldsymbol{\zeta}), \\ \frac{\partial^{2} Q_{\theta}(\boldsymbol{\zeta})}{\partial \zeta_{j} \partial \zeta_{k}} &= 2 \sum_{i=1}^{n} \omega_{i} \exp\left(-(y_{i} - \mathbf{u}_{i,\cdot}^{\top}\boldsymbol{\zeta})^{2}/\theta\right) u_{i,j} u_{i,k} \left[2(y_{i} - \mathbf{u}_{i,\cdot}^{\top}\boldsymbol{\zeta})^{2}/\theta - 1\right]/\theta. \end{split}$$
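The derivative formulas above translate directly into matrix operations. The Python sketch below computes the gradient $\mathbf{U}^\top \mathbf{W}(\boldsymbol{\zeta})\mathbf{v}(\boldsymbol{\zeta})$ and the Hessian, and checks both against central finite differences on a small hypothetical data set (the data, θ, and weights are illustrative assumptions).

```python
import numpy as np


def q_theta(zeta, U, y, omega, theta):
    """Q_theta(zeta) = sum_i omega_i exp(-(y_i - u_i' zeta)^2 / theta), as in (1)."""
    r = y - U @ zeta
    return float(np.sum(omega * np.exp(-r**2 / theta)))


def grad_hess(zeta, U, y, omega, theta):
    """Analytic gradient U' W(zeta) v(zeta) and Hessian of Q_theta."""
    v = y - U @ zeta                                    # residuals v_i
    w = 2.0 * omega * np.exp(-v**2 / theta) / theta     # diagonal entries W_{i,i}
    grad = U.T @ (w * v)
    curv = w * (2.0 * v**2 / theta - 1.0)               # sign depends on v_i^2 / theta vs 0.5
    hess = U.T @ (curv[:, None] * U)
    return grad, hess


# Hypothetical small data set used only to check the formulas numerically
rng = np.random.default_rng(0)
n, d, theta = 30, 4, 2.0
U = np.column_stack([np.ones(n), rng.standard_normal((n, d - 1))])
y = U @ np.array([0.5, 1.0, -1.0, 0.0]) + rng.standard_normal(n)
omega = np.full(n, 1.0 / n)
zeta = 0.1 * rng.standard_normal(d)

g, H = grad_hess(zeta, U, y, omega, theta)
h = 1e-5
E = np.eye(d)
g_fd = np.array([(q_theta(zeta + h * e, U, y, omega, theta)
                  - q_theta(zeta - h * e, U, y, omega, theta)) / (2 * h) for e in E])
H_fd = np.column_stack([(grad_hess(zeta + h * e, U, y, omega, theta)[0]
                         - grad_hess(zeta - h * e, U, y, omega, theta)[0]) / (2 * h) for e in E])
print("gradient error:", np.max(np.abs(g - g_fd)))
print("Hessian error :", np.max(np.abs(H - H_fd)))
```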

The $i$th summand of the Hessian is positive semidefinite if $(y_i - \mathbf{u}_{i,\cdot}^\top \boldsymbol{\zeta})^2/\theta > 0.5$ and negative semidefinite if $(y_i - \mathbf{u}_{i,\cdot}^\top \boldsymbol{\zeta})^2/\theta \leq 0.5$. Hence $Q_\theta(\boldsymbol{\zeta})$ is not concave in general, and to find its maximizer, the simple Newton-Raphson approach may diverge if the starting value is too far from the true value. To tackle this problem, a minorization of $Q_\theta(\boldsymbol{\zeta})$ is used to approximate $Q_\theta(\boldsymbol{\zeta})$. Since $2(y_i - \mathbf{u}_{i,\cdot}^\top \boldsymbol{\zeta})^2/\theta - 1 \geq -1$, the Hessian satisfies $\frac{\partial^2 Q_\theta(\boldsymbol{\zeta})}{\partial \boldsymbol{\zeta} \partial \boldsymbol{\zeta}^\top} \succeq -\mathbf{U}^\top \mathbf{W}(\boldsymbol{\zeta}) \mathbf{U} = -2\sum_{i=1}^n \omega_i \exp\left(-(y_i - \mathbf{u}_{i,\cdot}^\top \boldsymbol{\zeta})^2/\theta\right) \mathbf{u}_{i,\cdot} \mathbf{u}_{i,\cdot}^\top/\theta$. Hence a minorized approximation to $Q_\theta(\boldsymbol{\zeta})$ at $\boldsymbol{\zeta}^m$ is

$$Q_{\theta}(\boldsymbol{\zeta}^{m}) + \mathbf{v}^{\top}(\boldsymbol{\zeta}^{m})\mathbf{W}(\boldsymbol{\zeta}^{m})\mathbf{U}(\boldsymbol{\zeta} - \boldsymbol{\zeta}^{m}) - \frac{1}{2}(\boldsymbol{\zeta} - \boldsymbol{\zeta}^{m})^{\top}\mathbf{U}^{\top}\mathbf{W}(\boldsymbol{\zeta}^{m})\mathbf{U}(\boldsymbol{\zeta} - \boldsymbol{\zeta}^{m}).$$

Write $\boldsymbol{\zeta}^m = (\alpha_0^m, \dots, \alpha_q^m, b_1^{m\top}, \dots, b_p^{m\top})^\top$ with $b_k^m = (\beta_k^m, \gamma_{1,k}^m, \dots, \gamma_{q,k}^m)^\top$. For the penalty, we apply a local linear approximation at $\boldsymbol{\zeta}^m$, which is given by

$$-\sum_{k=1}^{p} \dot{\rho}(\|b_{k}^{m}\|; \lambda_{1}, s) \frac{|\beta_{k}^{m}|}{\|b_{k}^{m}\|} |\beta_{k}| - \sum_{k=1}^{p} \sum_{j=1}^{q} \left\{ \dot{\rho}(\|b_{k}^{m}\|; \lambda_{1}, s) \frac{|\gamma_{j,k}^{m}|}{\|b_{k}^{m}\|} + \dot{\rho}(|\gamma_{j,k}^{m}|; \lambda_{2}, s) \right\} |\gamma_{j,k}|,$$

ignoring the terms that do not depend on $\boldsymbol{\zeta}$, where $\dot{\rho}(t; \lambda, s) = \mathrm{sgn}(t)\left(\lambda - \frac{|t|}{s}\right)_+$. If we replace $Q_\theta(\boldsymbol{\zeta})$ in (2) with its minorized approximation and plug in the approximation of the penalty, the penalized objective function takes the form


$$\begin{split} L_{\lambda_{1},\lambda_{2},\theta}(\boldsymbol{\zeta}\mid\boldsymbol{\zeta}^{m}) &= Q_{\theta}(\boldsymbol{\zeta}^{m}) + \mathbf{v}^{\top}(\boldsymbol{\zeta}^{m})\mathbf{W}(\boldsymbol{\zeta}^{m})\mathbf{U}(\boldsymbol{\zeta}-\boldsymbol{\zeta}^{m}) \\ &- \frac{1}{2}(\boldsymbol{\zeta}-\boldsymbol{\zeta}^{m})^{\top}\mathbf{U}^{\top}\mathbf{W}(\boldsymbol{\zeta}^{m})\mathbf{U}(\boldsymbol{\zeta}-\boldsymbol{\zeta}^{m}) - \sum_{k=1}^{p} \dot{\rho}(\|b_{k}^{m}\|;\lambda_{1},s)\frac{|\beta_{k}^{m}|}{\|b_{k}^{m}\|}|\beta_{k}| \\ &- \sum_{k=1}^{p} \sum_{j=1}^{q} \left\{\dot{\rho}(\|b_{k}^{m}\|;\lambda_{1},s)\frac{|\gamma_{j,k}^{m}|}{\|b_{k}^{m}\|} + \dot{\rho}(|\gamma_{j,k}^{m}|;\lambda_{2},s)\right\}|\gamma_{j,k}|. \end{split} \tag{3}$$

This function has a "weighted quadratic + penalty" form and can be optimized using the coordinate-descent approach.
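Expanding the quadratic shows why coordinate descent applies: since $\mathbf{v}(\boldsymbol{\zeta}^m) - \mathbf{U}(\boldsymbol{\zeta} - \boldsymbol{\zeta}^m) = \mathbf{y} - \mathbf{U}\boldsymbol{\zeta}$, the quadratic part of (3) equals $-\frac{1}{2}(\mathbf{y} - \mathbf{U}\boldsymbol{\zeta})^\top \mathbf{W}(\boldsymbol{\zeta}^m)(\mathbf{y} - \mathbf{U}\boldsymbol{\zeta})$ up to a constant, so (3) is a weighted lasso with observation weights $\mathbf{W}_{i,i}(\boldsymbol{\zeta}^m)$ and per-coefficient penalty weights from the local linear approximation. The Python sketch below solves such a problem by cyclic soft-thresholding updates. It is an illustration under these assumptions, not the authors' implementation; the interface (`w`, `pen`) is hypothetical, and computing `pen` from $\boldsymbol{\zeta}^m$ is left to the caller.

```python
import numpy as np


def soft_threshold(z: float, t: float) -> float:
    """sign(z) * max(|z| - t, 0)."""
    return float(np.sign(z) * max(abs(z) - t, 0.0))


def cd_weighted_l1(U, y, w, pen, zeta0, tol=1e-10, max_sweeps=1000):
    """Coordinate descent for
        max_zeta  -0.5 * sum_i w_i (y_i - u_i' zeta)^2 - sum_j pen_j |zeta_j|.

    w    : diagonal of W(zeta^m), nonnegative observation weights
    pen  : penalty weights from the local linear approximation at zeta^m
           (0 for unpenalized coefficients such as alpha_0, ..., alpha_q)
    zeta0: starting value, e.g. the current MM iterate zeta^m
    """
    zeta = np.array(zeta0, dtype=float)
    r = y - U @ zeta                              # current residuals
    a = (w[:, None] * U**2).sum(axis=0)           # a_j = sum_i w_i u_ij^2
    for _ in range(max_sweeps):
        zeta_old = zeta.copy()
        for j in range(U.shape[1]):
            if a[j] == 0.0:
                continue
            r += U[:, j] * zeta[j]                # residual with coordinate j removed
            z = float(np.dot(w * U[:, j], r))
            zeta[j] = soft_threshold(z, pen[j]) / a[j]
            r -= U[:, j] * zeta[j]                # restore the full residual
        if np.max(np.abs(zeta - zeta_old)) < tol:
            break
    return zeta


# Hypothetical example: without penalty, the result matches weighted least squares
rng = np.random.default_rng(0)
n, d = 60, 5
U = np.column_stack([np.ones(n), rng.standard_normal((n, d - 1))])
y = U @ np.array([1.0, 2.0, 0.0, -1.5, 0.0]) + 0.1 * rng.standard_normal(n)
w = rng.uniform(0.5, 1.5, size=n)
zeta_cd = cd_weighted_l1(U, y, w, pen=np.zeros(d), zeta0=np.zeros(d))
zeta_wls = np.linalg.solve(U.T @ (w[:, None] * U), U.T @ (w * y))
print("max difference from WLS:", np.max(np.abs(zeta_cd - zeta_wls)))
```

Each update has a closed form, so a sweep costs $O(n)$ per coefficient; this is the property that keeps the nested MM/CD scheme cheap.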

The algorithm starts with $m = 0$ and $\boldsymbol{\zeta}^0 = \mathbf{0}$, where $m$ is the index of the MM iteration. At iteration $m$, the objective function is approximated by its minorization $L_{\lambda_1, \lambda_2, \theta}(\boldsymbol{\zeta} \mid \boldsymbol{\zeta}^m)$ given in (3). The penalized weighted quadratic function is then maximized using the coordinate-descent algorithm. Denote $\bar{\boldsymbol{\zeta}}^{old}$ as the estimate of $\boldsymbol{\zeta}$ before updating. We update each element of the estimate and denote the new estimate as $\bar{\boldsymbol{\zeta}}^{new}$. This is repeated until the distance between $\bar{\boldsymbol{\zeta}}^{old}$ and $\bar{\boldsymbol{\zeta}}^{new}$ is smaller than a prefixed constant. Then $\boldsymbol{\zeta}^{m+1} = \bar{\boldsymbol{\zeta}}^{new}$ serves as the new expansion point for the next minorization. The overall procedure is repeated until convergence. Convergence properties of the MM and CD techniques have been well studied in the literature. For our problem, the objective function increases at each step and is bounded above, so the sequence of objective values converges. In the numerical study, we conclude convergence if the difference between the estimates from two consecutive MM steps is small enough. We observe convergence in all numerical examples after a small to moderate number of MM iterations.

The proposed method involves tuning parameters. For $s$ in MCP, we follow Zhang (2010) and other published studies, which suggest examining a small number of values or fixing it. In our numerical study, we fix $s = 6$, a value adopted in published studies (Shi et al., 2014; Xu et al., 2018). We have also examined $s$ values near 6 and observed similar performance (details omitted). In practice, for settings significantly different from ours, other $s$ values may need to be considered. Under low-dimensional settings, Wang et al. (2013) proposed an iterative approach to select the robust tuning parameter θ. However, their approach is computationally infeasible for high-dimensional data. Under the present setting, we compute the solution for each combination of $(\lambda_1, \lambda_2, \theta)$, which yields a solution surface over a three-dimensional tuning parameter grid. This is feasible as the proposed computational algorithm only involves simple updates and incurs low cost. The tuning parameters are then selected using a prediction-based method that proceeds as follows: (a) compute the cross-validated sum of prediction errors for each $(\lambda_1, \lambda_2, \theta)$ combination; (b) for each fixed θ, average the sum of prediction errors over $(\lambda_1, \lambda_2)$, and select the θ with the smallest average; (c) with the selected θ, select the $(\lambda_1, \lambda_2)$ with the smallest sum of prediction errors. This procedure first pools all $(\lambda_1, \lambda_2)$ values to select the best θ and then, with this θ fixed, selects the optimal $(\lambda_1, \lambda_2)$. Our numerical experiments suggest that this procedure generates more stable estimates than directly searching over the three-dimensional $(\lambda_1, \lambda_2, \theta)$ grid.
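Steps (b) and (c) above can be sketched as follows, assuming the cross-validated errors from step (a) are already stored in a three-dimensional array indexed by (θ, λ₁, λ₂); the array layout, function name, and example values are hypothetical.

```python
import numpy as np


def select_tuning(cv_err, thetas, lam1s, lam2s):
    """Two-stage selection of (theta, lambda1, lambda2).

    cv_err[a, b, c]: cross-validated sum of prediction errors at
    theta = thetas[a], lambda1 = lam1s[b], lambda2 = lam2s[c].
    """
    cv_err = np.asarray(cv_err)
    # (b) average over the (lambda1, lambda2) grid for each theta; keep the smallest
    a = int(np.argmin(cv_err.mean(axis=(1, 2))))
    # (c) with theta fixed, keep the (lambda1, lambda2) pair with the smallest error
    b, c = np.unravel_index(np.argmin(cv_err[a]), cv_err[a].shape)
    return thetas[a], lam1s[b], lam2s[c]


# Hypothetical 2 x 2 x 2 grid: the global minimum (0.5) sits at an isolated
# point of theta = 0.5, but theta = 1.0 is better on average.
cv = np.array([[[0.5, 9.0], [9.0, 9.0]],
               [[2.0, 1.0], [2.0, 2.0]]])
print(select_tuning(cv, [0.5, 1.0], [0.1, 0.2], [0.01, 0.02]))
print("direct 3-D argmin:", np.unravel_index(np.argmin(cv), cv.shape))
```

The example shows how pooling over $(\lambda_1, \lambda_2)$ can avoid an isolated, possibly noisy minimum that a direct search over the full grid would pick.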

With a complex robust goodness-of-fit measure and a penalty that respects the hierarchy, the proposed method is inevitably computationally more expensive than some simpler alternatives. However, as the proposed algorithm is composed of relatively simple calculations, the overall computational cost is affordable. With fixed tuning parameters, the analysis of one simulated dataset (described in detail below) takes about nine minutes on a regular laptop. Tuning parameter selection can be conducted in a highly parallel manner to save computing time.

# *2.4 Consistency Properties*

In this section, we rigorously prove that the proposed method can consistently identify the important interactions (and main effects) under ultrahigh-dimensional settings. In the literature, theoretical development for robust methods under high-dimensional settings has been limited, especially for methods other than quantile-based ones. With established consistency properties, the proposed method can be preferred over alternatives whose statistical properties have not been well established. Our theoretical development not only provides a solid ground for the proposed method but also offers insights for other robust methods under high-dimensional settings.

For any two subsets $S_1$ and $S_2$ of $\{1, \dots, p + q + pq + 1\}$ and a matrix $H$, we denote by $H_{S_1 S_2}$ the submatrix of $H$ with rows and columns indexed by $S_1$ and $S_2$, respectively. Let $\boldsymbol{\zeta}^* = (\alpha_0^*, \dots, \alpha_q^*, b_1^{*\top}, \dots, b_p^{*\top})^\top$ denote the true value of $\boldsymbol{\zeta}$, where $b_k^* = (\beta_k^*, \gamma_{1,k}^*, \dots, \gamma_{q,k}^*)^\top$. We make the sparsity assumption that only a subset of the components of $\boldsymbol{\zeta}^*$ are nonzero. Define the three groups of parameters:

$$\begin{aligned} A_1 &= \{\alpha_0^*, \dots, \alpha_q^*\}, \quad A_2 = \{\gamma_{j,k}^* : \gamma_{j,k}^* \neq 0, j = 1, \dots, q; k = 1, \dots, p\}, \\ A_3 &= \{\beta_k^* : \beta_k^* \neq 0 \text{ or there exists some } 1 \le j \le q \text{ such that } \gamma_{j,k}^* \neq 0, k = 1, \dots, p\}. \end{aligned}$$

Denote $\mathcal{A}$ as the set of indices of $A_1 \cup A_2 \cup A_3$ in the vector $\boldsymbol{\zeta}^*$. Let $\mathcal{A}^c$ and $|\mathcal{A}|$ denote the complement and cardinality of $\mathcal{A}$, respectively. We then divide $\mathcal{A}^c$ into three sets of indices $B_1$, $B_2$, and $B_3$, which correspond to the following three sets

$$\begin{aligned} B\_1 &= \{ \beta\_k^\* : \beta\_k^\* = 0, k = 1, \dots, p \}, \\ B\_2 &= \{ \gamma\_{j,k}^\* : \gamma\_{j,k}^\* = 0 \text{ but } \beta\_k^\* \neq 0, j = 1, \dots, q; k = 1, \dots, p \}, \\ B\_3 &= \{ \gamma\_{j,k}^\* : \gamma\_{j,k}^\* = 0 \text{ and } \beta\_k^\* = 0, j = 1, \dots, q; k = 1, \dots, p \}, \end{aligned}$$

respectively. Define

$$D_n(\boldsymbol{\zeta}) = \sum_{i=1}^n \omega_i \exp\left(-(y_i - \mathbf{u}_{i,\cdot}^\top \boldsymbol{\zeta})^2/\theta\right) \frac{2(y_i - \mathbf{u}_{i,\cdot}^\top \boldsymbol{\zeta})}{\theta} \mathbf{u}_{i,\cdot}$$

and


$$I_n(\boldsymbol{\zeta}) = \frac{2}{\theta} \sum_{i=1}^n \omega_i \exp\left(-(y_i - \mathbf{u}_{i,\cdot}^\top \boldsymbol{\zeta})^2 / \theta\right) \left(\frac{2(y_i - \mathbf{u}_{i,\cdot}^\top \boldsymbol{\zeta})^2}{\theta} - 1\right) \mathbf{u}_{i,\cdot} \mathbf{u}_{i,\cdot}^\top.$$
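Here $D_n$ and $I_n$ are the gradient and Hessian of $Q_\theta$, and Theorem 1 below uses the quantities $\Phi_t = \|I_{B_t \mathcal{A}}(\boldsymbol{\zeta}^*) I_{\mathcal{A}\mathcal{A}}(\boldsymbol{\zeta}^*)^{-1}\|_\infty$. The Python sketch below builds the index sets from a true coefficient vector laid out as $\boldsymbol{\zeta} = (\alpha_0, \dots, \alpha_q, b_1^\top, \dots, b_p^\top)^\top$ and evaluates $\Phi_t$ for a given matrix. Two assumptions are made: $B_1$ is taken as the $\beta$ positions outside $\mathcal{A}$, which coincides with $\{\beta_k^* = 0\}$ when the true model obeys the hierarchy ($\gamma_{j,k}^* \neq 0$ implies $\beta_k^* \neq 0$); and $\|\cdot\|_\infty$ is read as the maximum absolute row sum.

```python
import numpy as np


def index_sets(betas, gammas):
    """0-based index sets A, B1, B2, B3 for zeta = (alpha_0..alpha_q, b_1, ..., b_p),
    with b_k = (beta_k, gamma_{1,k}, ..., gamma_{q,k}).

    betas : length p, true beta_k^*
    gammas: shape (q, p), true gamma_{j,k}^*
    Assumption: B1 = beta positions outside A (equals {beta_k^* = 0} under the hierarchy).
    """
    q, p = gammas.shape
    A, B1, B2, B3 = list(range(q + 1)), [], [], []   # all alphas belong to A
    for k in range(p):
        beta_idx = q + 1 + k * (q + 1)               # position of beta_k in zeta
        group_active = betas[k] != 0 or np.any(gammas[:, k] != 0)
        (A if group_active else B1).append(beta_idx)
        for j in range(q):
            g_idx = beta_idx + 1 + j                 # position of gamma_{j+1,k}
            if gammas[j, k] != 0:
                A.append(g_idx)
            elif betas[k] != 0:
                B2.append(g_idx)
            else:
                B3.append(g_idx)
    return A, B1, B2, B3


def phi(I, A, Bt):
    """Phi_t = || I_{B_t A} I_{A A}^{-1} ||_inf, read as the maximum absolute row sum."""
    if len(Bt) == 0:
        return 0.0
    M = I[np.ix_(Bt, A)] @ np.linalg.inv(I[np.ix_(A, A)])
    return float(np.abs(M).sum(axis=1).max())


# Hypothetical example with q = 2 and p = 3, so zeta has p + q + pq + 1 = 12 entries
betas = [1.0, 0.0, 0.8]
gammas = np.array([[0.5, 0.0, 0.0],
                   [0.0, 0.0, 0.0]])
A, B1, B2, B3 = index_sets(betas, gammas)
print("A:", A, "B1:", B1, "B2:", B2, "B3:", B3)
```

In practice the matrix passed to `phi` would be $I_n(\boldsymbol{\zeta}^*)$; any symmetric matrix with invertible $I_{\mathcal{A}\mathcal{A}}$ works for experimenting with the definition.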

The following conditions are needed to establish the consistency properties.


C1 and C2 have been commonly assumed in the literature; see, for example, Stute (1993, 1996) and Huang et al. (2007). We note that the independent censoring assumption usually holds in practice, although from a theoretical perspective, quite a few studies have made the weaker conditional independence assumption. We have explored relaxing this assumption and found that alternative, less intuitive assumptions would have to be made. The zero expectations in C3 and C5 ensure the consistency of estimation. C4 is required for Theorem 1, and a similar assumption has been made in Ma and Du (2012). C6 requires that the smallest signal does not decay too fast, which is common in studies on high-dimensional inference. The following theorem establishes the consistency of the proposed estimator $\widehat{\boldsymbol{\zeta}}$.

### **Theorem 1** *Suppose that conditions C1-C6 hold.*

*Let <sup>n</sup>* = (λ<sup>1</sup> ∧ λ2)/{max(Φ1, Φ2, Φ3)}*, where* Φ*<sup>t</sup>* = *I<sup>B</sup><sup>t</sup> <sup>A</sup>* (ζ <sup>∗</sup>)*IA A* (ζ <sup>∗</sup>)−<sup>1</sup>∞*, t* = 1, 2, 3*. If* |*A* | = *o*(*n*)*,* λ<sup>1</sup> ∨ λ<sup>2</sup> → 0*, n*<sup>2</sup> *<sup>n</sup>* → ∞*, and* log *p* = *o*(*n*<sup>2</sup> *n* )*, with probability tending to one, we have*

$$(a) \quad \|\widehat{\boldsymbol{\zeta}}_{\mathcal{A}} - \boldsymbol{\zeta}^*_{\mathcal{A}}\|_2 = O_p\left(\sqrt{|\mathcal{A}|/n}\right); \quad (b) \quad \widehat{\boldsymbol{\zeta}}_{\mathcal{A}^c} = \mathbf{0}.$$

*Proof* See the Appendix. □

This theorem establishes that the proposed method can accommodate ultrahigh-dimensional $p$, with $\log p$ allowed to grow at the rate specified in Theorem 1. With probability approaching one, the penalized robust estimator enjoys the same asymptotic properties as the oracle estimator. This property holds under high dimensions without restrictive conditions on the errors. To the best of our knowledge, properties of the robust exponential loss, even without censoring, have not been studied under high-dimensional settings. Thus our theoretical investigation may have independent value.

# **3 Simulations**

In simulation, we set *n* = 300, *q* = 5, and *p* = 1000. The underlying true model contains a total of 35 nonzero effects, including 5 main E effects, 10 main G effects, and 20 interactions. The "positions" of nonzero main G effects are randomly placed. The nonzero interactions are generated to respect the "main effects, interactions" hierarchy. The nonzero regression coefficients are randomly generated from uniform (0.7, 1.3). We consider both continuous and categorical distributions to mimic, for example, gene expression and SNP data. Specifically, under the continuous scenario, the E and G factors are generated from multivariate normal distributions with marginal means zero, marginal variances one, and the following variance matrix structures: Independent, AR(0.3), AR(0.8), Band(0.3), Band(0.6), and CS(0.2). Under the independent scenario, all factors have zero correlations. Under the AR(ρ) correlation structure, for the *i*th and *j*th factors, *corr* = ρ|*i*<sup>−</sup> *<sup>j</sup>*<sup>|</sup> . Under the Band(ρ) correlation structure, for the *i*th and *j*th factors, *corr* = ρ · *I*(|*i* − *j*| = 2) + 0.3 · *I*(|*i* − *j*| = 1) + *I*(|*i* − *j*| = 0). Under the CS(ρ) correlation structure, for the *i*th and *j*th factors, the correlation coefficient *corr* = ρ*<sup>I</sup>*(*i*= *<sup>j</sup>*) . Under the categorical scenario, we first apply the same data generating approach as described above to obtain **U**. Then for each *ui*,*<sup>j</sup>* , the categorical measurement is generated as *I*(*ui*,*<sup>j</sup>* > −0.7). The threshold value −0.7 is chosen such that the proportion of 1's for each factor is roughly 75%. Under each of the above simulation settings, consider the random error distribution (1 − ξ )*N*(0, 1) + ξ*Cauchy*, with the contamination probability ξ = 0, 0.1, and 0.3. When ξ = 0, the error distribution has no contamination and favors the nonrobust approaches, while the latter two values lead to different levels of contamination. 
The log event times are generated from the AFT model. The censoring times are generated independently from Weibull distributions, with parameters adjusted so that the censoring rate is about 25%. Beyond the above scenarios, we also consider a set of parallel scenarios with 10 main E effects, 20 main G effects, and 40 interactions (that is, the number of important effects is doubled). In these scenarios, the nonzero coefficients are generated from Uniform(0.4, 0.6) (that is, the signal levels are reduced by about 50%). Other settings remain the same.
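A sketch of the survival-time step is given below. It assumes the log event times have already been formed as $\log T = \mathbf{u}^\top\boldsymbol{\beta} + \varepsilon$ from an AFT model. The Weibull shape (1.5 here), the bisection on the log-scale parameter, and the name `censor_log_times` are our own illustrative choices for calibrating the censoring rate to about 25%. The source does not state how the censoring parameters were tuned.

```python
import numpy as np

def censor_log_times(log_t, shape=1.5, target_rate=0.25, n_iter=100, rng=None):
    """Independent Weibull censoring calibrated to a target censoring rate.

    log_t: log event times from an AFT model, log T = u' beta + error.
    Returns (observed log time, event indicator with 1 = death observed)."""
    rng = np.random.default_rng(rng)
    log_w = np.log(rng.weibull(shape, size=log_t.shape[0]))  # standard Weibull, scale 1
    # Censoring time C = exp(a) * W; tune the log-scale a by bisection.
    lo = log_t.min() - log_w.max() - 1.0  # every subject censored at a = lo
    hi = log_t.max() - log_w.min() + 1.0  # no subject censored at a = hi
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if np.mean(mid + log_w < log_t) > target_rate:
            lo = mid  # too much censoring: move censoring times later
        else:
            hi = mid
    log_c = hi + log_w
    delta = (log_t <= log_c).astype(int)
    return np.minimum(log_t, log_c), delta
```

Censoring is generated independently of the event times, so only the location of the log censoring distribution needs tuning. The censoring rate is monotone in that location, which is why bisection suffices.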

The simulated data are analyzed using the proposed method. We also consider two alternatives:

(a) the nonrobust method, which adopts the weighted least squares loss and the same penalty as the proposed method;
(b) the quantile regression-based method, which adopts an $L_1$ robust loss and the same penalty as the proposed method.

Multiple other methods are potentially applicable. Comparison with the nonrobust method directly establishes the merit of robustness. The quantile regression-based approach is the most popular robust approach for high-dimensional data (Wu & Ma, 2015). These two alternatives are therefore the most sensible comparators.
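The three methods share the same penalty and differ only in the loss. As a hedged illustration, the sketch below writes each loss as a quantity to minimize:

- weighted least squares;
- weighted $L_1$;
- the exponential squared loss whose weighted sum is maximized in (4) of the Appendix. Equivalently, $\sum_i \omega_i\{1 - \exp(-r_i^2/\theta)\}$ is minimized.

The weights $\omega_i$ and the value of $\theta$ are placeholders, and the function names are hypothetical. The derivative with respect to a single residual shows why a Cauchy-sized error dominates least squares but not the other two losses.

```python
import numpy as np

def wls_loss(r, w):
    """Weighted least squares: sum_i w_i r_i^2 (to be minimized)."""
    return np.sum(w * r**2)

def l1_loss(r, w):
    """Weighted L1 loss (the median case of quantile regression): sum_i w_i |r_i|."""
    return np.sum(w * np.abs(r))

def exp_sq_loss(r, w, theta):
    """Exponential squared loss as a minimization: sum_i w_i (1 - exp(-r_i^2 / theta)).
    Minimizing this is equivalent to maximizing sum_i w_i exp(-r_i^2 / theta)."""
    return np.sum(w * (1.0 - np.exp(-r**2 / theta)))

def influence(r, theta=1.0):
    """Derivative of each per-observation loss with respect to the residual r."""
    return {
        "wls": 2.0 * r,                                       # grows without bound
        "l1": np.sign(r),                                     # bounded
        "exp_sq": (2.0 * r / theta) * np.exp(-r**2 / theta),  # redescends toward 0
    }
```

For a residual of 50, the three derivatives are 100, 1, and (numerically) 0 for least squares, $L_1$, and the exponential squared loss with $\theta = 1$. For small residuals, the exponential squared loss behaves like least squares.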

All three methods involve tuning parameters. To eliminate the (possibly different) effects of tuning parameter selection on identification accuracy, we consider a sequence of tuning parameter values, evaluate identification accuracy at each value, and calculate the AUC (area under the ROC curve) as the overall measure. This approach has been adopted extensively in published studies (Zhu et al., 2014).
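The AUC computation over a tuning path can be sketched as follows. It assumes each tuning value yields a boolean vector of selected effects. True and false positive rates are computed at each value, and the ROC curve is anchored at (0, 0) and (1, 1). The area is then obtained with the trapezoidal rule. The function name `path_auc` is hypothetical, and published studies may construct or interpolate the curve differently.

```python
import numpy as np

def path_auc(selected_path, true_support):
    """ROC/AUC for variable selection along a tuning-parameter path.

    selected_path: iterable of boolean arrays, one per tuning value (True = selected).
    true_support: boolean array, True for truly nonzero effects."""
    truth = np.asarray(true_support, dtype=bool)
    n_pos, n_neg = truth.sum(), (~truth).sum()
    points = {(0.0, 0.0), (1.0, 1.0)}  # anchor the curve at its endpoints
    for sel in selected_path:
        sel = np.asarray(sel, dtype=bool)
        tpr = float((sel & truth).sum() / n_pos)   # true positive rate
        fpr = float((sel & ~truth).sum() / n_neg)  # false positive rate
        points.add((fpr, tpr))
    fpr, tpr = map(np.array, zip(*sorted(points)))  # sort by FPR, then TPR
    auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))  # trapezoidal rule
    return fpr, tpr, auc
```

A path that adds all true effects before any false one has AUC 1. A path that selects true and false effects at the same rate has AUC 0.5.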

Summary statistics are computed based on 500 replicates. The AUC results for interactions and main effects combined are presented in Tables 1 and 2 for the scenarios with 35 and 70 important effects, respectively. For thoroughness, we have also evaluated identification accuracy for interactions and main effects separately; the AUC results are in Tables 4, 5, 6, and 7 in the Appendix. For all three methods, the AUC value decreases as the contamination proportion increases, as expected. In Table 1, the proposed method outperforms the two alternatives under all but one scenario. In Table 2, it outperforms them under all scenarios. Under some scenarios, the proposed method leads to a substantial improvement in identification accuracy. For example, in Table 1, with the continuous G distribution, 30% contamination, and Band(0.3) correlation, the proposed method has a mean AUC of 0.901, while the alternatives have mean AUCs of 0.761 and 0.789. Compared to the nonrobust alternative, the proposed method also has smaller standard errors (Tables 1 and 2).

We have also experimented with a few other scenarios and made similar observations. In particular, we have examined scenarios where the event and censoring times have weak to moderate correlations and observed similarly satisfactory performance (details omitted). The proposed method and the two alternatives respect the hierarchy. We have also looked into simpler alternatives that may violate the hierarchy, including MCP and Lasso, and observed inferior performance.

# **4 Analysis of the TCGA Lung Adenocarcinoma Data**

Adenocarcinoma of the lung is the leading cause of cancer death worldwide. Profiling studies have been extensively conducted to search for its prognostic factors. Here we analyze the TCGA (The Cancer Genome Atlas Research Network, 2014) data on the prognosis of lung adenocarcinoma. The TCGA data were recently collected and published by the NCI and are of high quality. The prognosis outcome of interest is overall survival. The dataset contains measurements on 43 clinical/environmental variables and 18,897 gene expressions. There are a total of 468 patients, among whom 117 died during follow-up. The median follow-up time is 8 months.

We select four E factors for downstream analysis: age, gender, smoking pack years, and smoking history. These factors have relatively low missing rates in the TCGA dataset and have previously been suggested as potentially related to lung cancer prognosis. There are a total of 436 samples with both E and G measurements available. Among them, 110 died during follow-up, and the median follow-up time is 23 months. For the 326 censored subjects, the median follow-up time is 6 months.

In principle, the proposed method can directly analyze all of the available gene expressions. To improve stability and reduce the computational cost, we conduct marginal prescreening. Specifically, genes are screened based on their univariate regression significance (p-value less than or equal to 0.1) and interquartile range (above the median of all interquartile ranges). Similar prescreening has been adopted in the literature. A total of 819 gene expressions are included in the downstream model fitting. Note that with the main G effects as well as interactions, the number of unknown parameters is much larger than the sample size.


**Table 1** Simulation: identification of both G-E interactions and main G effects. In each cell, mean AUC (se). There are a total of 35 nonzero effects, with coefficients ∼uniform (0.7, 1.3)


**Table 2** Simulation: identification of both G-E interactions and main G effects. There are a total of 70 nonzero effects, with coefficients ∼uniform (0.4, 0.6)
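The marginal prescreening described above can be sketched as follows, together with a count of the unknown parameters. The sketch assumes the univariate p-values are already available; the source does not specify the univariate model, so they are taken as input here. The parameter count uses the index range $p + q + pq + 1$ that appears in the Appendix proof. The function names are hypothetical.

```python
import numpy as np

def prescreen(G, pvalues, p_cut=0.1):
    """Indices of genes with univariate p-value <= p_cut AND an interquartile
    range above the median IQR of all genes.

    G: (n, p) expression matrix; pvalues: length-p univariate p-values (precomputed)."""
    q75, q25 = np.percentile(G, [75, 25], axis=0)
    iqr = q75 - q25
    keep = (np.asarray(pvalues) <= p_cut) & (iqr > np.median(iqr))
    return np.flatnonzero(keep)

def n_parameters(p, q):
    """Unknowns with main E, main G, and all G-E interactions: p + q + pq + 1."""
    return p + q + p * q + 1
```

With $p = 819$ genes and $q = 4$ E factors, `n_parameters(819, 4)` returns 4100. That is roughly nine times the 436 available samples.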

Detailed estimation results are presented in Table 3 for the proposed method and in Tables 8 and 9 in the Appendix for the two alternatives. The three methods lead to quite different findings. Specifically, the proposed and quantile methods share four main G effects and four interactions; otherwise, there is no overlap in identification. The "signals" in practical data can be weaker than those in simulated data, which may explain the substantial differences across methods.

With the proposed method, sixteen genes are identified as having interactions with either age or smoking status. As for many other cancer types, age has been identified as a critical factor in lung cancer prognosis. Smoking has been confirmed as the most important E factor for lung cancer risk and prognosis.

In the literature, G-E interaction analysis for lung cancer prognosis is still very limited. However, there have been many studies on the functionalities of genes, and searching such studies can provide partial support for the validity of our results. Many of the identified genes have been implicated in cancer:

- **AGPAT6.** The AGPAT family, which includes AGPAT6, has been found to play a role in multiple cancer types. For example, AGPAT2 and AGPAT11 have been found to be upregulated in ovarian, breast, cervical, and colorectal cancers (Agarwal & Garg, 2010).
- **ATF6.** ATF6 acts both as a sensor and as a transcription factor during endoplasmic reticulum stress. ATF6α has been found to promote hepatocarcinogenesis and cancer cell proliferation by activating the downstream target gene BIP, and its efficiency of stress recognition and signaling has been found to decrease with age (Naidoo, 2009).
- **COLCA2.** We find that the gene COLCA2 (colorectal cancer associated 2) interacts with smoking pack years. Studies have shown that COLCA2 may have critical functions in suppressing tumor formation in epithelial cells (Peltekova et al., 2014).
- **NOS1AP.** We also identify an interaction between NOS1AP and age. The protein complex of SCRIB, NOS1AP, and VANGL1 has been found to regulate cell polarity and migration, and this complex can be associated with cancer progression (Anastas et al., 2012).
- **PPP1R15B.** An interaction between PPP1R15B and smoking pack years has also been identified. It has been suggested that PPP1R15B is likely regulated by Nrf2, which mediates a protective response to smoking-induced oxidative stress in the lung (Taylor et al., 2008). PPP1R15B may also promote cancer cell proliferation.

To complement the identification and estimation analysis, we also evaluate stability. Specifically, we randomly select 3/4 of the subjects and apply the proposed method and the alternatives. This procedure is repeated 200 times, and we compute the proportion of repetitions in which each interaction is identified. Similar procedures have been extensively adopted in published studies. The stability results are also provided in Tables 3, 8, and 9. Most of the identified interactions are relatively stable, and many have identification probabilities close to one.
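The subsampling-based stability evaluation can be sketched as follows. `select` is a hypothetical placeholder for any of the three fitting routines and must return a boolean vector of identified effects. The sketch only illustrates the resampling loop with 3/4 of the subjects and 200 repetitions. It is not the authors' implementation.

```python
import numpy as np

def selection_stability(X, y, select, n_rep=200, frac=0.75, rng=None):
    """Proportion of random subsamples in which each effect is identified.

    select(X_sub, y_sub) -> boolean array over the columns of X (placeholder for
    any fitting routine). y may hold several columns, e.g. (time, event)."""
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    m = int(round(frac * n))  # size of each subsample (3/4 of the subjects)
    counts = np.zeros(X.shape[1])
    for _ in range(n_rep):
        idx = rng.choice(n, size=m, replace=False)  # subsample without replacement
        counts += np.asarray(select(X[idx], y[idx]), dtype=float)
    return counts / n_rep
```

An effect chosen in every subsample gets stability 1. An effect that is never chosen gets 0.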


**Table 3** Analysis of the TCGA lung adenocarcinoma data using the proposed method. The identified interactions are denoted as "gene \* environmental variable". For the interactions, values in "()" are the stability results



**Table 4** Simulation: identification of main G effects. In each cell, mean AUC (se). There are a total of 10 nonzero main effects, with coefficients ∼uniform (0.7, 1.3)


**Table 5** Simulation: identification of G-E interactions. In each cell, mean AUC (se). There are a total of 20 nonzero interactions, with coefficients ∼uniform (0.7, 1.3)


**Table 6** Simulation: identification of main G effects. In each cell, mean AUC (se). There are a total of 20 nonzero main effects, with coefficients ∼uniform (0.4, 0.6)


**Table 7** Simulation: identification of G-E interactions. In each cell, mean AUC (se). There are a total of 40 nonzero interactions, with coefficients ∼uniform (0.4, 0.6)


**Table 8** Analysis of the TCGA lung adenocarcinoma data using the nonrobust method. The identified interactions are denoted as "gene \* environmental variable". For the interactions, values in "()" are the stability results

# **5 Discussions**

To understand the prognosis of complex diseases, it is essential to study G-E interactions. In "classic" low-dimensional biomedical studies, data contamination has been found to be not uncommon, and it has been suggested that robust methods are needed to accommodate it. This study has developed a robust method for high-dimensional genetic interaction analysis, which is still limited in the literature. The proposed method consists of a novel robust loss function and a penalized identification strategy that respects the "main effects, interactions" hierarchy; both are novel advancements.

Also advancing beyond the literature, we have rigorously established the consistency properties. The theoretical results may seem "familiar", which is "comforting" in that consistency is not sacrificed for the additional robustness, high dimensionality, and interactions. Notably, the consistency results do not require the restrictive assumptions on the error distribution that are usually needed in the existing literature.

In simulation, the proposed method outperforms the nonrobust alternative. Interestingly, it has superior performance even when there is no contamination. It also outperforms the quantile-based robust method. Most existing high-dimensional robust studies have adopted the quantile regression technique, and our simulation suggests that it is prudent to develop alternative robust methods. In the analysis of the TCGA lung cancer data, the proposed method generates results that partly overlap with those of the quantile regression method but not with those of the nonrobust method. The identified genes have important implications, and the identified interactions are stable.

| Effect | Estimate ×100 |
|---|---|
| Age | 0.891 |
| ATP6V1C1 | 0.252 |
| C1ORF27 | 7.321 |
| SDE2 | 0.337 |
| CD46 | 0.584 |
| DNAJC21 | 1.272 |
| KLHL7 | 0.932 |
| PTK2 | 9.426 |
| PVT1 | 1.148 |
| RAB3GAP2 | 0.845 |
| TSPAN3 | 8.557 |
| TWISTNB | 0.872 |
| WDR26 | 1.265 |
| WIPI2 | 7.227 |
| YWHAZ | 1.883 |
| ATP6V1C1 \* age | 0.0172 (0.724) |
| C1ORF27 \* age | 0.295 (0.899) |
| SDE2 \* age | 0.0153 (0.758) |
| CD46 \* age | 0.0344 (0.862) |
| DNAJC21 \* age | 0.327 (0.791) |
| KLHL7 \* age | 0.372 (0.514) |
| PTK2 \* age | 1.074 (0.927) |
| PVT1 \* age | 0.876 (0.711) |
| RAB3GAP2 \* age | 0.923 (0.757) |
| TSPAN3 \* age | 1.388 (0.942) |
| TWISTNB \* age | 0.915 (0.812) |
| WDR26 \* age | 1.279 (0.798) |
| WIPI2 \* age | 1.891 (0.906) |
| YWHAZ \* age | 1.596 (0.796) |

**Table 9** Analysis of the TCGA lung adenocarcinoma data using the quantile method. The identified interactions are denoted as "gene \* environmental variable". For the interactions, values in "()" are the stability results

The proposed study can potentially be extended in multiple directions. In survival analysis, there are many models beyond the AFT, and it may be of interest to develop robust methods based on them. We have studied G-E interactions; extending the approach to G-G interactions may also be of interest. In theoretical analysis, one open problem is the breakdown point. Because of its extremely high complexity, this problem has been left uninvestigated in many other robust studies too. In our simulation, we have experimented with contamination rates as high as 30%, much higher than in many existing studies, and observed the superiority of the proposed method over the quantile regression method. The relative efficiency of different robust methods, although of interest, is left to future studies. In data analysis, the proposed method identifies a different set of main effects and interactions. Mining the literature and the stability evaluation support the validity of the findings to a certain extent, and further validation needs to be pursued in the future.

# **Appendix**

### **Proof of Theorem 1**

*Proof* Define the oracle estimator $\widehat{\boldsymbol{\zeta}}$ with $\widehat{\boldsymbol{\zeta}}_{\mathcal{A}^c} = \mathbf{0}$ and

$$\widehat{\boldsymbol{\zeta}}_{\mathcal{A}} = \arg\max_{\boldsymbol{\zeta}_{\mathcal{A}}} \sum_{i=1}^{n} \omega_{i} \exp\left(-(y_{i} - \mathbf{u}_{i,\mathcal{A}}^{\top} \boldsymbol{\zeta}_{\mathcal{A}})^2 / \theta\right). \tag{4}$$

Recall that the proposed objective function is

$$L_{\lambda_1,\lambda_2,\theta}(\boldsymbol{\zeta}) = \mathcal{Q}_{\theta}(\boldsymbol{\zeta}) - \sum_{k=1}^{p} \rho(\|b_k\|; \lambda_1, s) - \sum_{k=1}^{p} \sum_{j=2}^{q+1} \rho(|b_{kj}|; \lambda_2, s). \tag{5}$$

In what follows, we first establish the estimation consistency of $\widehat{\boldsymbol{\zeta}}$ in Step 1, and then show in Step 2 that $\widehat{\boldsymbol{\zeta}}$ is a local maximizer of $L_{\lambda_1,\lambda_2,\theta}(\boldsymbol{\zeta})$.

**Step 1**. Define the objective function

$$R_n(\boldsymbol{\zeta}_{\mathcal{A}}) = \sum_{i=1}^n \omega_i \exp\left(-(y_i - \mathbf{u}_{i,\mathcal{A}}^\top \boldsymbol{\zeta}_{\mathcal{A}})^2/\theta\right).$$


Then $\widehat{\boldsymbol{\zeta}}_{\mathcal{A}} = \arg\max R_n(\boldsymbol{\zeta}_{\mathcal{A}})$. Let $r_n = \sqrt{|\mathcal{A}|/n}$. To prove $\|\widehat{\boldsymbol{\zeta}}_{\mathcal{A}} - \boldsymbol{\zeta}^*_{\mathcal{A}}\|_2 = O_p(r_n)$, it suffices to show that for any given $\eta > 0$, there exists a sufficiently large constant $C > 0$ such that

$$\Pr\left(\sup_{\boldsymbol{\zeta}_{\mathcal{A}} \in \mathcal{J}} R_n(\boldsymbol{\zeta}_{\mathcal{A}}) < R_n(\boldsymbol{\zeta}_{\mathcal{A}}^*)\right) \ge 1 - \eta, \tag{6}$$

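where $\mathcal{J} = \{\boldsymbol{\zeta}_{\mathcal{A}} : \|\boldsymbol{\zeta}_{\mathcal{A}} - \boldsymbol{\zeta}^*_{\mathcal{A}}\|_2 = C r_n\}$. Event (6) implies that, with probability at least $1 - \eta$, $R_n(\boldsymbol{\zeta}_{\mathcal{A}})$ has a local maximizer $\widehat{\boldsymbol{\zeta}}_{\mathcal{A}}$ inside the ball $\{\boldsymbol{\zeta}_{\mathcal{A}} : \|\boldsymbol{\zeta}_{\mathcal{A}} - \boldsymbol{\zeta}^*_{\mathcal{A}}\|_2 < C r_n\}$. Hence $\|\widehat{\boldsymbol{\zeta}}_{\mathcal{A}} - \boldsymbol{\zeta}^*_{\mathcal{A}}\|_2 = O_p(r_n)$.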

Recall the definitions of $D_n(\boldsymbol{\zeta})$ and $I_n(\boldsymbol{\zeta})$. By Taylor's expansion, we have

$$\begin{split} R_{n}(\boldsymbol{\zeta}_{\mathcal{A}}) - R_{n}(\boldsymbol{\zeta}_{\mathcal{A}}^{*}) &= \sum_{i=1}^{n} \omega_{i} \left\{ \exp\left(-(y_{i} - \mathbf{u}_{i,\mathcal{A}}^{\top} \boldsymbol{\zeta}_{\mathcal{A}})^{2}/\theta\right) - \exp\left(-(y_{i} - \mathbf{u}_{i,\mathcal{A}}^{\top} \boldsymbol{\zeta}_{\mathcal{A}}^{*})^{2}/\theta\right) \right\} \\ &= D_{n,\mathcal{A}}(\boldsymbol{\zeta}^{*})^{\top}(\boldsymbol{\zeta}_{\mathcal{A}} - \boldsymbol{\zeta}_{\mathcal{A}}^{*}) \\ &\quad + \frac{1}{2} (\boldsymbol{\zeta}_{\mathcal{A}} - \boldsymbol{\zeta}_{\mathcal{A}}^{*})^{\top} I_{n,\mathcal{A}\mathcal{A}}(\bar{\boldsymbol{\zeta}}) (\boldsymbol{\zeta}_{\mathcal{A}} - \boldsymbol{\zeta}_{\mathcal{A}}^{*}) \\ &\doteq Q_{1} + Q_{2}, \end{split} \tag{7}$$

where $\bar{\boldsymbol{\zeta}}$ lies between $\boldsymbol{\zeta}^*$ and $\boldsymbol{\zeta}$. By C3 and C4, for all $j \in \{1, \ldots, p+q+pq+1\}$ and any given $t$, we have $\Pr(|D_{n,j}(\boldsymbol{\zeta}^*)| > t) \le 2\exp(-nt^2/\sigma^2)$. Integrating this tail bound gives $E(n D_{n,j}(\boldsymbol{\zeta}^*)^2) \le 2\sigma^2$, so $E(|\sqrt{n} D_{n,j}(\boldsymbol{\zeta}^*)|^2) < K < \infty$ for all $j$. With Markov's inequality,

$$\Pr(\|D_{n,\mathcal{A}}(\boldsymbol{\zeta}^*)\|_2 > t) \le \frac{E[\|\sqrt{n}D_{n,\mathcal{A}}(\boldsymbol{\zeta}^*)\|_2^2]}{nt^2} \le \frac{|\mathcal{A}|K}{nt^2}.$$

By the Cauchy-Schwarz inequality, $Q_1 \le C\|D_{n,\mathcal{A}}(\boldsymbol{\zeta}^*)\|_2 r_n$. Let $t = C\rho_* r_n/3$, where $\rho_*$ is the smallest eigenvalue of $-I_{\mathcal{A}\mathcal{A}}(\boldsymbol{\zeta}^*_{\mathcal{A}})$. By C5, $\rho_*$ is bounded away from zero and infinity. Then we have

$$\Pr\left(Q_1 \le \frac{1}{3}\rho_* C^2 r_n^2\right) \ge 1 - \frac{9K}{C^2 \rho_*^2}. \tag{8}$$
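For completeness, the bound behind (8) follows from four ingredients: the Cauchy-Schwarz step, the Markov bound above, the choice $t = C\rho_* r_n/3$, and $r_n^2 = |\mathcal{A}|/n$. Combining them gives

$$\Pr\left(Q_1 > \tfrac{1}{3}\rho_* C^2 r_n^2\right) \le \Pr\left(C r_n \|D_{n,\mathcal{A}}(\boldsymbol{\zeta}^*)\|_2 > \tfrac{1}{3}\rho_* C^2 r_n^2\right) = \Pr\left(\|D_{n,\mathcal{A}}(\boldsymbol{\zeta}^*)\|_2 > t\right) \le \frac{|\mathcal{A}|K}{n t^2} = \frac{9K}{C^2\rho_*^2}.$$

Taking complements gives $\Pr(Q_1 \le \frac{1}{3}\rho_* C^2 r_n^2) \ge 1 - 9K/(C^2\rho_*^2)$.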

For $Q_2$, we have

$$\begin{split} 2Q_{2} &= (\boldsymbol{\zeta}_{\mathcal{A}} - \boldsymbol{\zeta}_{\mathcal{A}}^{*})^{\top} I_{\mathcal{A}\mathcal{A}}(\boldsymbol{\zeta}^{*}) (\boldsymbol{\zeta}_{\mathcal{A}} - \boldsymbol{\zeta}_{\mathcal{A}}^{*}) \\ &\quad + (\boldsymbol{\zeta}_{\mathcal{A}} - \boldsymbol{\zeta}_{\mathcal{A}}^{*})^{\top} \left\{ I_{\mathcal{A}\mathcal{A}}(\bar{\boldsymbol{\zeta}}) - I_{\mathcal{A}\mathcal{A}}(\boldsymbol{\zeta}^{*}) \right\} (\boldsymbol{\zeta}_{\mathcal{A}} - \boldsymbol{\zeta}_{\mathcal{A}}^{*}) \\ &\quad + (\boldsymbol{\zeta}_{\mathcal{A}} - \boldsymbol{\zeta}_{\mathcal{A}}^{*})^{\top} \left\{ I_{n,\mathcal{A}\mathcal{A}}(\bar{\boldsymbol{\zeta}}) - I_{\mathcal{A}\mathcal{A}}(\bar{\boldsymbol{\zeta}}) \right\} (\boldsymbol{\zeta}_{\mathcal{A}} - \boldsymbol{\zeta}_{\mathcal{A}}^{*}) \\ &\doteq Q_{21} + Q_{22} + Q_{23}. \end{split} \tag{9}$$

Since $\lambda_{\max}(I_{\mathcal{A}\mathcal{A}}(\boldsymbol{\zeta}^*)) \le -\rho_*$ by C5, we have

$$Q_{21} \le -\rho_* C^2 r_n^2. \tag{10}$$

Under C4, we have

$$Q_{22} \le \kappa \|\bar{\boldsymbol{\zeta}} - \boldsymbol{\zeta}^*\|_2 C^2 r_n^2 < \kappa C^3 r_n^3 < \frac{1}{6} C^2 \rho_* r_n^2. \tag{11}$$

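The second inequality holds because $\bar{\boldsymbol{\zeta}}$ lies between $\boldsymbol{\zeta}^*$ and $\boldsymbol{\zeta}$, which yields $\|\bar{\boldsymbol{\zeta}} - \boldsymbol{\zeta}^*\|_2 < C r_n$. The last inequality holds when $n$ is large enough that $\kappa C r_n < \rho_*/6$. With C4 and Bonferroni's inequality,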

$$\Pr\left(\|I_{n,\mathcal{A}\mathcal{A}}(\bar{\boldsymbol{\zeta}}) - I_{\mathcal{A}\mathcal{A}}(\bar{\boldsymbol{\zeta}})\|_F^2 \geq \rho_*^2/9\right) \leq 2|\mathcal{A}|^2 \exp\left(-n\rho_*^2/\sigma^2\right),$$

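where $\|\cdot\|_F$ denotes the Frobenius norm. By the inequality $\lambda_{\max}(I_{n,\mathcal{A}\mathcal{A}}(\bar{\boldsymbol{\zeta}}) - I_{\mathcal{A}\mathcal{A}}(\bar{\boldsymbol{\zeta}})) \le \|I_{n,\mathcal{A}\mathcal{A}}(\bar{\boldsymbol{\zeta}}) - I_{\mathcal{A}\mathcal{A}}(\bar{\boldsymbol{\zeta}})\|_F$, we have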

$$Q_{23} \le \frac{1}{3} \rho_* C^2 r_n^2 \text{ with probability at least } 1 - 2|\mathcal{A}|^2 \exp\left(-n\rho_*^2/\sigma^2\right). \tag{12}$$

Combining (9), (10), (11), and (12), we have

$$\Pr\left(Q_2 < -\frac{1}{2}\rho_* C^2 r_n^2\right) \ge 1 - 2|\mathcal{A}|^2 \exp\left(-n\rho_*^2/\sigma^2\right). \tag{13}$$

With (7), (8), and (13), we have

$$R_n(\boldsymbol{\zeta}_{\mathcal{A}}) - R_n(\boldsymbol{\zeta}_{\mathcal{A}}^*) < -\frac{1}{6} \rho_* C^2 r_n^2 < 0 \tag{14}$$

with probability at least

$$1 - \frac{9K}{C^2 \rho_*^2} - 2|\mathcal{A}|^2 \exp\left(-n\rho_*^2/\sigma^2\right).$$

Note that $\rho_*$ is bounded away from zero and infinity by C5. As $n \to \infty$, the above probability exceeds $1 - 16K/(C^2\rho_*^2)$. Letting $C = 4\rho_*^{-1}\sqrt{K/\eta}$, we conclude (6).

**Step 2**. Next we show that the oracle estimator $\widehat{\boldsymbol{\zeta}}$ studied in Step 1 satisfies the Karush-Kuhn-Tucker (KKT) conditions, and hence $\widehat{\boldsymbol{\zeta}}$ is a local maximizer of $L_{\lambda_1,\lambda_2,\theta}(\boldsymbol{\zeta})$. Based on the results in Step 1 and C6, we only need to check that the following conditions

$$\left\|D_{n,\mathcal{B}_1}(\widehat{\boldsymbol{\zeta}})\right\|_{\infty} < \lambda_1, \quad \left\|D_{n,\mathcal{B}_2}(\widehat{\boldsymbol{\zeta}})\right\|_{\infty} < \lambda_2, \quad \left\|D_{n,\mathcal{B}_3}(\widehat{\boldsymbol{\zeta}})\right\|_{\infty} < \lambda_1 + \lambda_2 \tag{15}$$

hold with asymptotic probability one, where $\|\boldsymbol{\nu}\|_\infty = \max_i |\nu_i|$ for any vector $\boldsymbol{\nu} = (\nu_1, \ldots, \nu_{|\mathcal{A}^c|})$. Applying Taylor's expansion,

$$D_{n,\mathcal{B}_1}(\widehat{\boldsymbol{\zeta}}) = D_{n,\mathcal{B}_1}(\boldsymbol{\zeta}^*) + I_{n,\mathcal{B}_1\mathcal{A}}(\widetilde{\boldsymbol{\zeta}})(\widehat{\boldsymbol{\zeta}}_{\mathcal{A}} - \boldsymbol{\zeta}^*_{\mathcal{A}}), \tag{16}$$

where $\widetilde{\boldsymbol{\zeta}}$ lies between $\boldsymbol{\zeta}^*$ and $\widehat{\boldsymbol{\zeta}}$. From (4) and the proof of Theorem 1(a), we have

$$\widehat{\boldsymbol{\zeta}}_{\mathcal{A}} - \boldsymbol{\zeta}_{\mathcal{A}}^{*} = -I_{n,\mathcal{A}\mathcal{A}}(\bar{\boldsymbol{\zeta}})^{-1} D_{n,\mathcal{A}}(\boldsymbol{\zeta}^{*}), \tag{17}$$

where $\bar{\boldsymbol{\zeta}}$ lies between $\boldsymbol{\zeta}^*$ and the oracle estimator $\widehat{\boldsymbol{\zeta}}$ defined in Step 1. Substituting (17) into (16) gives

$$D_{n,\mathcal{B}_1}(\widehat{\boldsymbol{\zeta}}) = D_{n,\mathcal{B}_1}(\boldsymbol{\zeta}^*) - I_{n,\mathcal{B}_1\mathcal{A}}(\widetilde{\boldsymbol{\zeta}}) I_{n,\mathcal{A}\mathcal{A}}(\bar{\boldsymbol{\zeta}})^{-1} D_{n,\mathcal{A}}(\boldsymbol{\zeta}^*). \tag{18}$$

Here we define

$$\Delta_{n,\mathcal{B}_1}^* = D_{n,\mathcal{B}_1}(\boldsymbol{\zeta}^*) - I_{\mathcal{B}_1\mathcal{A}}(\boldsymbol{\zeta}^*)I_{\mathcal{A}\mathcal{A}}(\boldsymbol{\zeta}^*)^{-1}D_{n,\mathcal{A}}(\boldsymbol{\zeta}^*).$$

Following the argument used for $Q_2$ in Step 1, we can establish that

$$\Pr(\|D_{n,\mathcal{B}_1}(\widehat{\boldsymbol{\zeta}})\|_{\infty} > \lambda_1) \asymp \Pr(\|\Delta_{n,\mathcal{B}_1}^*\|_{\infty} > \lambda_1).$$

That is, to evaluate the probability of $\{\|D_{n,\mathcal{B}_1}(\widehat{\boldsymbol{\zeta}})\|_\infty < \lambda_1\}$ in (15), we only need to focus on $\|\Delta^*_{n,\mathcal{B}_1}\|_\infty$. Note that

$$\begin{split} \|\Delta_{n,\mathcal{B}_1}^{*}\|_{\infty} &\leq \|D_{n,\mathcal{B}_1}(\boldsymbol{\zeta}^{*})\|_{\infty} + \|I_{\mathcal{B}_1\mathcal{A}}(\boldsymbol{\zeta}^{*})I_{\mathcal{A}\mathcal{A}}(\boldsymbol{\zeta}^{*})^{-1}D_{n,\mathcal{A}}(\boldsymbol{\zeta}^{*})\|_{\infty} \\ &\leq \|D_{n}(\boldsymbol{\zeta}^{*})\|_{\infty} + \|I_{\mathcal{B}_1\mathcal{A}}(\boldsymbol{\zeta}^{*})I_{\mathcal{A}\mathcal{A}}(\boldsymbol{\zeta}^{*})^{-1}\|_{\infty} \|D_{n}(\boldsymbol{\zeta}^{*})\|_{\infty}. \end{split} \tag{19}$$

Recall that $\Phi_1 = \|I_{\mathcal{B}_1\mathcal{A}}(\boldsymbol{\zeta}^*)I_{\mathcal{A}\mathcal{A}}(\boldsymbol{\zeta}^*)^{-1}\|_\infty$. If

$$\|D\_n(\zeta^\*)\|\_{\infty} < \frac{\lambda\_1}{1 + \Phi\_1},$$

then, along with (19), we have $\|\Delta^*_{n,\mathcal{B}_1}\|_\infty < \lambda_1$. Similarly, we also need

$$\|D_n(\boldsymbol{\zeta}^*)\|_{\infty} < \frac{\lambda_2}{1 + \Phi_2} \quad \text{and} \quad \|D_n(\boldsymbol{\zeta}^*)\|_{\infty} < \frac{\lambda_1 + \lambda_2}{1 + \Phi_3}$$

to satisfy the other two conditions in (15). Here $\Phi_2 = \|I_{\mathcal{B}_2\mathcal{A}}(\boldsymbol{\zeta}^*)I_{\mathcal{A}\mathcal{A}}(\boldsymbol{\zeta}^*)^{-1}\|_\infty$ and $\Phi_3 = \|I_{\mathcal{B}_3\mathcal{A}}(\boldsymbol{\zeta}^*)I_{\mathcal{A}\mathcal{A}}(\boldsymbol{\zeta}^*)^{-1}\|_\infty$. Since $\lambda_1 \wedge \lambda_2 \le \lambda_1, \lambda_2, \lambda_1 + \lambda_2$ and $1 + \max_t \Phi_t \ge 1 + \Phi_s$ for each $s$, all three requirements hold whenever

$$\|D_n(\boldsymbol{\zeta}^*)\|_{\infty} < \frac{\lambda_1 \wedge \lambda_2}{1 + \max_{1 \le t \le 3} \Phi_t} \le \min\left\{\frac{\lambda_1}{1 + \Phi_1}, \frac{\lambda_2}{1 + \Phi_2}, \frac{\lambda_1 + \lambda_2}{1 + \Phi_3}\right\}.$$

We now derive the probability bound for the above event. By Bonferroni's inequality and C4, we can obtain

$$\Pr\left\{\|D_{n}(\boldsymbol{\zeta}^{*})\|_{\infty} < \frac{\lambda_{1} \wedge \lambda_{2}}{1 + \max_{1 \le t \le 3} \Phi_{t}}\right\} \geq 1 - 2(pq + p + q + 1)\exp\left(-\frac{n(\lambda_{1} \wedge \lambda_{2})^{2}}{(1 + \max_{1 \le t \le 3} \Phi_{t})^{2}\sigma^{2}}\right).$$

Combining the results in Steps 1 and 2, we conclude that $\widehat{\boldsymbol{\zeta}}$ is a local maximizer of $L_{\lambda_1,\lambda_2,\theta}(\boldsymbol{\zeta})$ with probability at least

$$1 - O\left(p \exp\left(-\frac{n(\lambda\_1 \wedge \lambda\_2)^2}{(1 + \max\_{t=1}^3 \Phi\_t)^2 \sigma^2}\right)\right),$$

and satisfies $\|\widehat{\boldsymbol{\zeta}}_{\mathcal{A}} - \boldsymbol{\zeta}^*_{\mathcal{A}}\|_2 = O_p(\sqrt{|\mathcal{A}|/n})$ and $\widehat{\boldsymbol{\zeta}}_{\mathcal{A}^c} = \mathbf{0}$. Under C6, with $\log p = O(n\epsilon_n^2)$ and $\epsilon_n = (\lambda_1 \wedge \lambda_2)/\{\max(\Phi_1, \Phi_2, \Phi_3)\}$, this tail probability is exponentially small. The theorem is thus proved.

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **A Novel Approach for Improving Accuracy for Distributed Storage Networks**

**Liu Lu, Ke Yuanyuan, and Yuan Yong**

**Abstract** With the development of storage and Internet technologies, cloud storage continues to grow in impact. Scalability, reliability, and lower costs have led to its wide and successful adoption by businesses and individuals. The advent of blockchain has brought further changes. As the incentive layer for IPFS, Filecoin makes storage resources tradable, greatly extending storage capacity. However, the process of verifying data integrity still needs continued improvement. In this chapter, we propose a new data audit proof in which nodes continuously upload hashes of data combined with random numbers, and a smart contract compares the results to verify data integrity. Meanwhile, the data owner can compute the expected result and issue a challenge to verify data integrity. Audit miners are responsible for regulating the behavior of miners and protecting users' data, and they participate only partially (semi-participation). We demonstrate later in the chapter that this proof is sufficiently accurate and resistant to attacks.

**Keywords** Distributed storage networks · Cloud storage · Blockchain

# **1 Introduction**

Storage technology has evolved rapidly over the last few decades, with continuously decreasing hard disk prices and ever faster data speeds. However, the rapid growth of the online economy and big data technology has caused the demand for data storage to expand exponentially. This has led to the idea of cloud storage, in which data are stored on cloud servers provided by third parties so that users can access them in a timely and convenient manner. Typically, cloud storage providers use technologies such as distributed storage (Mattson et al., 1970) to significantly reduce storage costs. However, centralized storage makes cloud storage providers vulnerable to single points of failure. It can also create risks such as providers overstepping their privileges and information leakage. Efficient centralized storage provisioning will remain mainstream in the future, but there is also an emerging and urgent need to meet users' requirements for information security.

L. Lu · K. Yuanyuan · Y. Yong (B)

School of Mathematics, Renmin University of China, Beijing, China e-mail: yong.yuan@ruc.edu.cn

L. Lu e-mail: liulu0309@ruc.edu.cn

K. Yuanyuan e-mail: ke\_yy@163.com

© The Author(s) 2023 Z. Zheng (ed.), *Proceedings of the Second International Forum on Financial Mathematics and Financial Technology*, Financial Mathematics and Fintech, https://doi.org/10.1007/978-981-99-2366-3\_4

Traditional cloud storage providers, such as Amazon and Google, build cloud storage architectures with vast resources, using distributed storage technology to serve billions of users. In distributed storage, data are spread across multiple storage servers, and these scattered storage resources form a virtual storage device. The data are thus effectively stored in various places across the provider's infrastructure. The benefits of distributed storage are increased system reliability, availability, and access efficiency, as well as improved scalability. But the disadvantages are also obvious. We store our data on Google Cloud Drive on the basis that we trust Google to protect them from being tampered with or lost, and this reliance brings other disadvantages. The central server is vulnerable to attacks from adversaries, and internal failures and malpractice can also lead to data loss. As such, the security of cloud storage has been a focus of attention in recent years.

Traditional symmetric encryption schemes keep the keys on a central server, which makes it easier for attackers to obtain these keys and thus reduces the security of information. Moreover, data integrity verification is a crucial part of cloud storage services: it checks whether data are stored correctly and without deletion. Priyadharshini (2012) summarizes data integrity verification in traditional cloud storage, where it is performed by a TPA (Third-Party Auditor) between the user and the CSP (Cloud Service Provider). The user poses a challenge to verify the integrity of their cloud data. The TPA responds by comparing the original data or, as in Zikratov et al. (2017), its hash value. However, inefficiency and malicious behavior by any of the three parties, individually or in collusion, can make opaque audit proofs unreliable. The convenience of centralized services thus comes with corresponding pitfalls. With the emergence of Bitcoin, decentralized technology has continued to improve, and decentralized storage has become an important addition to the traditional storage market.
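The hash-comparison audit described above can be sketched as a simple challenge-response protocol. This is a minimal illustration under our own assumptions: SHA-256 from Python's `hashlib`, 16-byte random nonces, and challenges precomputed by the data owner before outsourcing. It is not the exact protocol of Priyadharshini (2012), Zikratov et al. (2017), or the scheme proposed in this chapter, and the names `respond` and `DataOwner` are hypothetical. The fresh nonce prevents a storer from answering with a cached hash after deleting the data.

```python
import hashlib
import secrets

def respond(data: bytes, nonce: bytes) -> str:
    """Storer's answer: SHA-256 of a fresh random nonce prepended to the stored data."""
    return hashlib.sha256(nonce + data).hexdigest()

class DataOwner:
    """Hypothetical owner who precomputes one-time challenges before outsourcing data."""

    def __init__(self, data: bytes, n_challenges: int = 3):
        self.challenges = []
        for _ in range(n_challenges):
            nonce = secrets.token_bytes(16)
            self.challenges.append((nonce, respond(data, nonce)))
        # From here on, only the (nonce, expected hash) pairs need to be kept.

    def audit(self, prove) -> bool:
        """Spend one challenge; prove(nonce) is the storer's response."""
        nonce, expected = self.challenges.pop()
        return secrets.compare_digest(prove(nonce), expected)
```

A storer that still holds the data passes `owner.audit(lambda n: respond(stored, n))`. A storer that has discarded the data cannot produce the hash for a nonce it has never seen. Each precomputed challenge can be used only once, which is one reason practical schemes are more elaborate.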

The idea of decentralized storage has become popular with the rise of blockchain technology, and the combination of the two could be considered a perfect fit. Blockchain enables consensus among decentralized, untrusted nodes, and its development has spurred intensive research in areas such as cryptography, data structures, and consensus algorithms. When data are stored in multiple copies on the hard drives of different nodes, we cannot guarantee that all nodes are trustworthy, so ensuring the security and integrity of the data is a crucial issue. Beyond ensuring stable storage, we also need to consider how to motivate people to become nodes and contribute their own storage capacity, which requires a reasonable incentive mechanism.

Much of the current research focuses on issues such as access control, integrity verification, data retrieval, and traceability, and many platforms offering distributed storage have already been launched. For example, the Sia (Vorick & Champine, 2023) storage system, which launched earlier, has not developed effectively owing to its less-than-optimal incentive design. IPFS (Benet, 2014) is a relatively complete platform: a distributed storage protocol for distributing and storing resources of various data types. Filecoin (Protocol Labs, 2023), its incentive layer, issues tokens to incentivize storage miners and retrieval miners to complete their work.

Taking Filecoin as an example, there are three roles:

- **Clients** pay for the service of storing and retrieving data and can choose from the available service providers. If they want to store private data, they need to encrypt it before submitting it to the service provider.
- **Storage miners** store clients' data for a reward and decide for themselves how much storage space to provide. After the client and the storage miner have reached an agreement, the miner is obliged to provide proof of the stored data on an ongoing basis. Anyone can view this proof and confirm that the storage miner is reliable.
- **Retrieval miners** serve data to clients upon request, obtaining the data from clients or storage miners. Retrieval miners and clients use micropayments to exchange data for tokens: the data are fragmented, and the client pays a small amount of tokens for each fragment. Retrieval miners can also act as storage miners at the same time.

We now show how a decentralized storage network stores and audits data. As shown in Fig. 1, a cloud service with blockchain participation has two aspects: storage and audit. Data owners upload their data to miners on the server, who store the data and record the transactions on the blockchain. The blockchain also verifies the data owner's information and protects the user's privacy. To ensure that their data are stored intact on the server, the data owner issues a challenge to the TPA, which sends a request to the system and verifies the data provided by the miner in response to the data owner's challenge. The verified result is then recorded on the blockchain. Each block in the blockchain therefore stores information such as the block height, the block header, information about the previous block, a timestamp, storage messages, an ID, and auditing messages.

**Fig. 1** Decentralized storage network framework
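To make the block contents listed above concrete, the following minimal Python sketch models a block as a dataclass whose `prev_hash` field links it to its predecessor. The field names, the SHA-256 hash over a canonical JSON encoding, and the helpers `make_block` and `chain_is_linked` are illustrative assumptions, not Filecoin's actual block format.

```python
import hashlib
import json
import time
from dataclasses import asdict, dataclass, field


@dataclass
class Block:
    height: int
    prev_hash: str        # hash of the previous block (the "previous block" information)
    timestamp: float
    node_id: str          # hypothetical: ID of the node that produced the block
    storage_messages: list = field(default_factory=list)  # e.g. storage transactions
    audit_messages: list = field(default_factory=list)    # e.g. verified audit results

    def header_hash(self) -> str:
        # Hash a canonical JSON encoding of all fields (assumed encoding).
        payload = json.dumps(asdict(self), sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()


def make_block(prev, node_id, storage_messages, audit_messages, timestamp=None):
    """Create the next block, linked to `prev` (or to a genesis placeholder)."""
    return Block(
        height=0 if prev is None else prev.height + 1,
        prev_hash="0" * 64 if prev is None else prev.header_hash(),
        timestamp=time.time() if timestamp is None else timestamp,
        node_id=node_id,
        storage_messages=list(storage_messages),
        audit_messages=list(audit_messages),
    )


def chain_is_linked(chain):
    """True if every block stores the hash of the block before it."""
    return all(b.prev_hash == a.header_hash() for a, b in zip(chain, chain[1:]))
```

Changing any field of an earlier block changes its hash, so the link check fails at the block that follows it.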

Data integrity verification in distributed storage, i.e., the red lines in Fig. 1, is our concern. Data integrity verification checks that data are stored intact in the storage space of each untrusted node. It affects the security of the data and is key to the availability of the storage service. Current data integrity validation falls into two kinds: traditional cloud-based validation and blockchain-based validation. Blockchain-based audit solutions can be further divided according to whether a TPA is involved. Most of these schemes verify against raw data and use consensus and incentive mechanisms to deter dishonest behaviors such as delayed audits, sybil attacks, and generation attacks. Blockchain-based data integrity verification can be used not only to audit cloud data in cloud storage networks but also in other data scenarios to improve system security. However, efficiency and accuracy cannot both be achieved in decentralized data auditing, as described in Sect. 3. Most of the better-performing verification schemes in current research do not run on public blockchains or require the participation of trusted central nodes. In fully decentralized blockchains, by contrast, most schemes favor efficiency to ensure availability, so the accuracy of verification cannot be fully guaranteed and the system is vulnerable to dishonest attacks. Our algorithm improves long-term efficiency and stability in a fully decentralized blockchain while guaranteeing accuracy.

Our work modifies Filecoin to verify the integrity of data in distributed storage. The audit miner in this algorithm is only semi-involved, and whether the data are kept intact is determined by comparing the hash values of the data shards. If the result does not meet the desired goal, the audit miner first ensures the integrity of the data and then finds the storage miner that caused the problem, acting as a reasonable supervisor. In Sect. 2, we summarize past work on distributed audit algorithms and describe the characteristics of each platform. In Sect. 3, we present our audit algorithm and analyze its advantages and the problems it solves. Sect. 4 analyzes the fault tolerance of this audit proof.

# **2 Related Works**

# *2.1 Audit Research*

Ensuring data integrity in cloud computing has always been an important issue and is a prerequisite for the wide adoption of cloud computing. Traditional data integrity verification can be divided into deterministic and probabilistic types. When a TPA is entrusted with integrity verification, its possible dishonest behavior is also an important issue for audit algorithms. Blockchain technology, with its decentralized architecture, no longer relies heavily on the honesty of third parties. The core question at this stage is how to reach overall consensus through reasonable consensus and incentive mechanisms so that all parties benefit.

Zikratov et al. (2017) propose a private blockchain called Zeppar, which determines the integrity of data by comparing the hash values of files. Using cryptographic techniques to verify data integrity by comparison with the original data is a common and widely applicable method. Wei et al. (2020) use a similar approach, in which smart contracts monitor data changes based on the unique hash value of each file generated by a Merkle Hash Tree (MHT). Constructing an MHT is a relatively convenient way to verify data integrity (Bai et al., 2018; Li et al., 2020). In Li et al. (2020), the data owner (DO) stores the verification tags of the data on the blockchain and verifies data integrity by constructing an MHT: after receiving a request from the DO, the blockchain network calculates the MHT root of the specified data; the CSP receives the DO's challenge and also calculates the corresponding MHT root; and the DO verifies the integrity of the data by comparing the two roots. Neither comparing file hash values nor constructing an MHT requires the involvement of a TPA. Such approaches verify efficiently but compromise on decentralization or fault tolerance, so they are best suited to distributed storage systems where efficiency is the priority.
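The MHT root comparison described above can be sketched as follows. This is a minimal illustration assuming SHA-256 leaves and the convention of duplicating the last node on any level with an odd count; the cited schemes may build and store their trees differently, and `merkle_root` and `verify` are hypothetical helper names.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(chunks):
    """Compute a Merkle Hash Tree root over a list of byte chunks."""
    if not chunks:
        raise ValueError("need at least one chunk")
    level = [_h(c) for c in chunks]          # leaf hashes
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])          # pair an odd node with itself
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def verify(expected_root: bytes, stored_chunks) -> bool:
    """Integrity holds if the root recomputed from the stored data matches."""
    return merkle_root(stored_chunks) == expected_root
```

Both parties only exchange a single root, which is why this style of check is cheap compared with transferring the data itself.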

To ensure data liveness, some auditing schemes use fresh random numbers so that the results of data validation cannot be falsified in advance. Pinheiro et al. (2020) use the user's data information to generate random challenges and use a smart contract to audit the challenge-response information sent by the CSP. This scheme also assesses the trustworthiness of each CSP.

# *2.2 Distributed Storage Project*


deferred encoding of data to get a copy of the data and then generates a zero-knowledge proof to guarantee the correctness of the encoding process. Its other consensus algorithm, Proof-of-Spacetime, requires miners to periodically generate Merkle proofs for the replicas and submit them to the blockchain, compressed with zero-knowledge proofs, in exchange for token rewards. This incentive encourages miners to store data correctly and to prove data liveness in order to be rewarded.


Table 1 summarizes the differences among these four platforms. Their audit proofs differ, which leads to differences in their other properties as well.

**Table 1** Distributed storage networks comparison

However, some flaws remain. Current work verifies the integrity of distributed storage data only under specific conditions, and none of it systematically analyzes the limitations of auditing. In the next section, we analyze the compromises that audit algorithms in distributed storage can entail. We also analyze what the Filecoin platform requires of auditing and what constraints it should place on storage miners. We then design an audit proof for distributed storage and show that it is sufficiently accurate and fault-tolerant.

# **3 Audit Algorithm**

# *3.1 An Audit Framework*

In this chapter, we reformulate the audit proof of Filecoin to address the current problems of the Filecoin platform. Our goal is to retain decentralization and let the distributed storage network complete the audit process on its own, with audit miners appearing only when necessary. This ensures the accuracy of the audit and improves the efficiency with which all nodes reach consensus on the audit results. We state the following audit impossibility proposition for distributed storage networks:

**Proposition 1** (Audit impossibility): The degree of decentralization, the accuracy of audit results, and audit efficiency cannot all be achieved at the same time.

When integrity checks are performed on a fully centralized storage server, the CSP can invest significant resources to increase the efficiency and accuracy of the audit, as many cloud storage providers do today; this approach currently dominates the cloud storage market. With decentralization, however, given today's computing power and the sheer volume of data, we cannot perform fast and efficient integrity checks on untrustworthy storage nodes. Balancing accuracy and efficiency is currently the key issue for auditing in all distributed storage. For Filecoin, decentralization is its biggest advantage. However, overly frequent auditing not only affects the accuracy of the audit results but also makes the system less stable when nodes go offline. This chapter therefore considers how to improve audit efficiency while ensuring the accuracy of the audit results.

Our design first slices the data owner's encrypted data into numbered shards using the shard technique and then generates multiple copies (*k* copies) of each shard by replication, which are stored randomly on storage miners. When auditing these files, we take the last 16 bits of the hash of the previous block as the new random number *N*, which every miner combines with each of its stored shards before hashing. The results must be uploaded to the hash pool in a prescribed order together with the miners' signatures, and all hash values are matched automatically by the smart contract. If each hash value occurs exactly *k* times, we conclude that the data are stored intact in the distributed storage network. This simple comparison of the results determines whether the data owners' data are completely stored across the network. The data owner can also, optionally, combine his/her own shard data with the random number *N* and hash them; the results are then compared against the hash pool, and finding the same *k* values in the network result confirms that the data are stored correctly on the miners. If a storage miner has not stored the data validly, the audit miner needs to find the problem miner quickly and back up the data in time (Fig. 2).

**Fig. 2** Algorithm overview
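The core of this check can be sketched in a few lines of Python. The sketch assumes SHA-256 as the hash function and appends *N* to each shard as two big-endian bytes; the text specifies neither choice, and the helpers `random_n`, `shard_hash`, and `audit_round` are hypothetical. Signatures and upload ordering are omitted here.

```python
import hashlib
from collections import Counter


def random_n(prev_block_hash: bytes) -> int:
    """N = last 16 bits of the previous block's hash."""
    return int.from_bytes(prev_block_hash, "big") & 0xFFFF


def shard_hash(shard: bytes, n: int) -> str:
    # Assumption: N is appended to the shard as 2 big-endian bytes.
    return hashlib.sha256(shard + n.to_bytes(2, "big")).hexdigest()


def audit_round(miner_storage, n, k):
    """miner_storage: {miner_id: [shard bytes in local storage order]}.

    Returns the hash pool (hash -> number of occurrences) and the set of
    hash values whose count differs from k.
    """
    pool = Counter(
        shard_hash(shard, n)
        for shards in miner_storage.values()
        for shard in shards
    )
    anomalies = {h for h, count in pool.items() if count != k}
    return pool, anomalies
```

Because *N* changes with every block, a miner cannot precompute these values once and then discard the shards.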

There are three roles in our platform: data owners *U*, storage miners *M*, and audit miners *A*. Data owners upload encrypted data according to their needs and can challenge the integrity of the data. Storage miners store the data sequentially as assigned by the smart contract and periodically upload proofs of data integrity. Audit miners are responsible for distributing data, reviewing and supervising miners, protecting data integrity, regulating content, and assuming legal responsibility. The number of audit miners is limited, and storage miners can also act as audit miners. Audit miners only appear if there are problems with the audit.

# *3.2 Data Uploading*

From the moment the user uploads data, user *Ui* should divide his/her data *D*(*i*) into several shards using slicing and encryption techniques to keep the data secure. If he/she lacks the computing power to process large data, he/she can pay some tokens to have audit miner *A* slice and encrypt them. All the shards are then distributed by the audit miner to storage miners *M* and also returned to user *Ui*. Data slicing is a common technique in distributed storage for protecting data. We use *D*(*i*, *j*) to denote the *j*th shard of *Ui*'s data. The number of shards of *Ui*, *Ji*, is determined by the size of *Ui*'s data. Replication then produces *k* copies of *D*(*i*, *j*), denoted *D*(*i*, *j*, *k*). Before uploading, *Ui* can combine all *D*(*i*, *j*) with a random number *N* known only to him/her, compute the results, and keep them in encrypted form as a later audit option. Next, a storage request with specific requirements (Definition 1) is sent to a randomly chosen miner *Ma*, which decides whether to store the data depending on its storage capacity. A miner that confirms storage keeps the corresponding *D*(*i*, *j*) on its local hard disk. The label (*i*, *j*) of *D*(*i*, *j*) is stored only in the smart contract and is never transmitted to the miner who stores it. The miner therefore does not know the exact label (*i*, *j*) of the data it stores; it only numbers the shards sequentially in the order in which it stores them. If *D*(*i*, *j*) is the sixth shard stored by *Ma*, it is referred to as *Ma*(6), and we use *Ma*() to denote the set of shards stored by *Ma*. This better protects the user's information and data and prevents the exchange of content between miners as much as possible.
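The split between what the contract knows and what a miner knows can be sketched as a small ledger class. `ShardLedger` and its methods are hypothetical names for illustration; the only point is that the contract maps each miner's local index back to (*i*, *j*), while the miner receives nothing but the index.

```python
from collections import defaultdict


class ShardLedger:
    """Hypothetical smart-contract state: the private mapping from each
    miner's local storage index to the global shard label (i, j)."""

    def __init__(self):
        self._labels = defaultdict(list)   # miner -> [(i, j), ...] in storage order

    def assign(self, miner: str, i: int, j: int) -> int:
        """Record that `miner` stored D(i, j); return only the 1-based local
        index (e.g. 6 means the shard is Ma(6)), never the label."""
        self._labels[miner].append((i, j))
        return len(self._labels[miner])

    def label_of(self, miner: str, local_index: int):
        """Contract-side lookup: recover (i, j) from Ma(local_index)."""
        return self._labels[miner][local_index - 1]

    def count(self, miner: str) -> int:
        """Number of shards the contract believes `miner` stores."""
        return len(self._labels[miner])
```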

Slicing and replicating the data and storing them in a decentralized manner effectively prevents sybil attacks and other attacks by malicious miners. We also impose specific requirements on how shards are sent, to avoid joint attacks by miners.

**Definition 1** *(Request of distributing shards):* The distribution of the set {*D*(*i*, *j*, *k*), ∀*i*, *j*} to miners is subject to the following principles:


This ensures that the data are stored in a sufficiently decentralized manner, with enough miners jointly storing the data owner's data that a single point of failure has no major impact on overall storage. It also ensures that the user's data cannot be stolen in their entirety, guaranteeing data security. Definition 1 further ensures that the data stored by any two nodes differ, which prevents outsourcing attacks. We analyze the effectiveness of our algorithm in Sect. 4.
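Since the full list of principles in Definition 1 is not reproduced here, the following checker is only a sketch of two constraints we assume for illustration: every shard is held by exactly *k* distinct miners, and any two miners share fewer than *z* shards (the bound *z* is the one used in the sybil-attack analysis of Sect. 4). `check_distribution` is a hypothetical helper.

```python
from itertools import combinations


def check_distribution(assignment, k, z):
    """assignment: {miner: set of (i, j) labels stored by that miner}.

    Assumed constraints (illustrative, not the complete Definition 1):
      1. every shard (i, j) is held by exactly k distinct miners;
      2. any two miners share fewer than z shards.
    """
    holders = {}
    for miner, labels in assignment.items():
        for label in labels:
            holders.setdefault(label, set()).add(miner)
    if any(len(miners) != k for miners in holders.values()):
        return False
    return all(len(assignment[a] & assignment[b]) < z
               for a, b in combinations(assignment, 2))
```

Bounding pairwise overlap is what limits how much two colluding miners can gain by sharing results.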

# *3.3 Self-integrity Verification*

Once all data owners' data have been uploaded to the storage miners, we need to interact continuously with all miners to guarantee data liveness and ensure that the data are stored intact. This is the core of the work in this chapter. As described in Sect. 2, current auditing methods, whether fully decentralized or not, can do the job but fall short in either the accuracy of audit results or audit efficiency. This chapter proposes a solution that does not require comparison with the data owner's data and achieves self-auditing through self-comparison in the blockchain network, which substantially improves the long-term stability of the system. Our proof is also efficient: with responses from all nodes, it can quickly reach consensus on the integrity of all data. We also allow data owners to initiate challenges and quickly check the integrity of their own data through the hash algorithm.

The blockchain network audits whether the storage miners have correctly stored the corresponding data within a period *T*. To ensure timeliness, we use the last 16 bits of the hash value of the previous block as a random number *N*. After obtaining *N*, each miner must upload, within the specified time *T*, the result of hashing each of its shards with *N*, together with its signature *Ma*\_*sign*. This gives a new set {*hash*(*Ma*(), *N*), *Ma*\_*sign*} containing the hashes of all of *Ma*'s shards and its signature. Note that the set is ordered, again according to the order in which *Ma* stores the shards. The advantage of this design is that, even when faced with a pile of results, the smart contract can determine the corresponding label (*i*, *j*) from each result's position. We use *H*(*a*, *b*) to denote the hash of the *b*th shard of miner *a* with *N*, and *D*(*i*, *j*)\_*hash* to denote the hash of *D*(*i*, *j*) with *N* (Table 2).
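A miner's ordered response can be sketched as below. The signature is an HMAC over the ordered hash list, used purely as a stand-in for a real public-key signature *Ma*\_*sign*; SHA-256 and the two-byte encoding of *N* are likewise assumptions, and `miner_response` and `check_signature` are hypothetical names.

```python
import hashlib
import hmac


def miner_response(shards, n, miner_key: bytes):
    """Build {hash(Ma(), N), Ma_sign} for one audit period.

    `shards` is the miner's local storage in order, so position b in the
    returned list corresponds to Ma(b + 1). The HMAC "signature" is only a
    stand-in for a real digital signature.
    """
    suffix = n.to_bytes(2, "big")                 # N is a 16-bit value
    hashes = [hashlib.sha256(s + suffix).hexdigest() for s in shards]
    message = "|".join(hashes).encode()           # order matters
    signature = hmac.new(miner_key, message, hashlib.sha256).hexdigest()
    return hashes, signature


def check_signature(hashes, signature, miner_key: bytes) -> bool:
    """Recompute the stand-in signature over the ordered hash list."""
    message = "|".join(hashes).encode()
    expected = hmac.new(miner_key, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)
```

Signing the ordered list means a reordered upload is rejected, which protects the position-to-label mapping the contract relies on.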

After storage miner *Ma* has uploaded {*hash*(*Ma*(), *N*), *Ma*\_*sign*}, the smart contract quickly checks whether the number of *H*(*a*, *b*) values equals the number of shards stored by *Ma*. If it does not, the contract invalidates the result and requires *Ma* to recalculate and upload a new one. If it does, the result is accepted into the hash pool. Next, the smart contract counts the occurrences of every result in the hash pool. If there are exactly *k* identical results, i.e., *k* pairs (*a*, *b*) whose *H*(*a*, *b*) values are all equal, then all copies of the shard are deemed correctly stored. This is the best achievable outcome. Storage miners need to store their data correctly in their own interest, and if all miners do so, every node can quickly and accurately conclude that the data are stored intact. We now determine whether the data are stored correctly from the number of occurrences of each hash value.
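The contract-side steps above can be sketched as follows. For clarity, this sketch groups uploaded hashes by the label (*i*, *j*) recovered from each result's position and then checks that exactly *k* identical values are present per shard; `process_uploads` and the `labels` mapping are hypothetical stand-ins for the contract's private state.

```python
from collections import defaultdict


def process_uploads(uploads, labels, k):
    """Illustrative contract logic for one audit period.

    uploads: {miner: [H(a, 1), H(a, 2), ...]} in local storage order
    labels:  {miner: [(i, j), ...]} kept privately by the contract
    Returns (rejected miners, {(i, j): True if k identical copies reported}).
    """
    rejected = []
    pool = defaultdict(list)                      # (i, j) -> reported hashes
    for miner, hashes in uploads.items():
        if len(hashes) != len(labels[miner]):
            rejected.append(miner)                # count mismatch: must recompute
            continue
        for position, h in enumerate(hashes):
            pool[labels[miner][position]].append(h)   # position -> (i, j)
    verdict = {
        label: len(hs) == k and len(set(hs)) == 1
        for label, hs in pool.items()
    }
    return rejected, verdict
```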

**Definition 2** *(strong integrity)*: The number of occurrences of *D*(*i*, *j*)\_*hash* is exactly equal to *k*.

**Definition 3** *(weak integrity)*: The number of occurrences of *D*(*i*, *j*)\_*hash* is greater than or equal to 2 and less than *k*.



If all shards achieve strong integrity, we can assume that the storage network has stored all data correctly and that all nodes would agree on this. If all shards achieve weak integrity, at least two matching copies of every shard remain, so we can assume that all data are still stored securely on the storage network. Weak integrity is thus a lower requirement for data availability. Because auditing is also meant to constrain the miners' behavior, distributed storage networks require strong integrity.
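Definitions 2 and 3 translate directly into a small classification function. The label `"other"` for counts below 2 or above *k* is our own shorthand for the cases discussed next, which require the audit miner's attention; `classify` is a hypothetical name.

```python
from collections import Counter


def classify(reference_hash, pool_hashes, k):
    """Classify one shard by how often D(i, j)_hash occurs in the hash pool.

    "strong": exactly k occurrences (Definition 2)
    "weak":   2 <= count < k (Definition 3)
    "other":  fewer than 2 or more than k occurrences
    """
    count = Counter(pool_hashes)[reference_hash]
    if count == k:
        return "strong"
    if 2 <= count < k:
        return "weak"
    return "other"
```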

We now discuss what to do if strong integrity is not achieved. If a hash value occurs more than *k* times, the likely scenario is that miners are colluding and copying the same result. Because only *k* copies of each shard are distributed, a miner that never received the shard can output the matching result only if it has obtained another miner's shard. In this case, the audit miners *A* assign each of the *k* + α results a label (*i*, *j*) according to its position and then compare these labels to find the miners with incorrect results. The first step is to find the set of labels {(*i*, *j*)} corresponding to the suspicious hash value and then to check whether the number of occurrences of that hash value under the correct label is *k*. If it is *k*, the shard has been completely stored in the storage network. Otherwise, the number can only be less than *k*; in that case, *A* needs to find the miners that did not upload the correct value and ask them to upload it promptly. If they upload wrong results, *A* asks them to re-store the shard correctly.
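The position-based resolution of an over-counted hash can be sketched as follows. We assume, for illustration only, that the true label is the one occurring most often among the suspicious results; `resolve_overcount` is a hypothetical helper, and positions here are 0-based rather than the 1-based *Ma*(*b*) notation.

```python
from collections import Counter


def resolve_overcount(entries, labels, k):
    """entries: (miner, position) pairs that all reported the same hash value.
    labels:  {miner: [(i, j), ...]} from the contract (0-based positions).

    Assumption: the true label is the most common label among the entries.
    Returns (true label, sorted list of suspected copiers, whether exactly
    k legitimate copies remain).
    """
    resolved = [(miner, pos, labels[miner][pos]) for miner, pos in entries]
    true_label, _ = Counter(lab for _, _, lab in resolved).most_common(1)[0]
    copiers = sorted({miner for miner, _, lab in resolved if lab != true_label})
    legit = sum(1 for _, _, lab in resolved if lab == true_label)
    return true_label, copiers, legit == k
```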

In practice, it is more common for the number to be less than *k*. For hash values that occur fewer than *k* times, the corresponding miners must upload proofs that *D*(*i*, *j*) is stored correctly. The following results may occur:


Audit miners *A* must immediately copy *D*(*i*, *j*) to ensure that it is not lost. After that, *A* handles the errant storage miners as described above. This handling effectively avoids errors caused by miners going offline. Storage miners that make frequent errors are judged to be malicious. If, for the same *D*(*i*, *j*), all the hash results differ or their correctness cannot be determined, *A* can ask all miners storing *D*(*i*, *j*) to recalculate it with the random number *N* and compare the results with the value calculated by *Ui*. *A* then promptly copies the data from the miners that output the correct result and asks *Ui* to choose another random number *N*, compute the new result, and keep it for future use (Fig. 3).
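The fallback that relies on the owner's precomputed value can be sketched as below. The owner's secret is modeled as a byte string nonce (16 random bytes on refresh), which is an assumption, since the text only calls it a random number known to *Ui*; `owner_reference`, `fallback_check`, and `refresh` are hypothetical names.

```python
import hashlib
import secrets


def owner_reference(shard: bytes, nonce: bytes) -> str:
    """Value Ui precomputes and keeps before uploading D(i, j)."""
    return hashlib.sha256(shard + nonce).hexdigest()


def fallback_check(reported, reference):
    """reported: {miner: hash it recomputed for D(i, j) with Ui's nonce}.
    Returns (a miner to copy the shard from, or None; miners to repair)."""
    good = sorted(m for m, h in reported.items() if h == reference)
    bad = sorted(m for m, h in reported.items() if h != reference)
    return (good[0] if good else None), bad


def refresh(shard: bytes):
    """Once the nonce has been revealed, Ui draws a fresh one (16 random
    bytes, an assumption) and keeps the new reference for later use."""
    nonce = secrets.token_bytes(16)
    return nonce, owner_reference(shard, nonce)
```

Refreshing the nonce after each use matters because a revealed nonce would let a miner cache the answer instead of keeping the shard.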

The above is the process by which a blockchain storage network audits itself. It allows consensus to be reached quickly when all the data are stored correctly and identifies malicious nodes when consensus is not reached.

# *3.4 Data Owner's Integrity Verification*

After obtaining *N*, the data owner can also compute the hash values *D*(*i*, *j*)\_*hash* by hashing his/her own data shards *D*(*i*, *j*) with *N*. The smart contract then looks for *k* occurrences of each of these values in the hash pool to determine whether his/her data have been stored completely. If every result occurs exactly *k* times, then it is almost certain that *Ui*'s data have been stored correctly. If not, the problematic storage miner can be found quickly, and the audit miner can promptly copy the affected data into its own storage. This audit approach compensates for the shortcomings of self-integrity verification and increases the accuracy of data integrity verification.
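The owner's check can be sketched as follows, assuming SHA-256 and that *N* is appended to each shard as two big-endian bytes (neither choice is specified in the text). `owner_check` is a hypothetical helper that returns the labels of shards whose hash does not occur exactly *k* times.

```python
import hashlib
from collections import Counter


def owner_check(own_shards, n, pool_hashes, k):
    """Data owner's verification against the hash pool (illustrative).

    own_shards:  {(i, j): shard bytes} held by Ui
    pool_hashes: all H(a, b) values currently in the hash pool
    """
    counts = Counter(pool_hashes)
    suffix = n.to_bytes(2, "big")     # assumed encoding of the 16-bit N
    return sorted(
        label for label, shard in own_shards.items()
        if counts[hashlib.sha256(shard + suffix).hexdigest()] != k
    )
```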

# *3.5 The Game of Miners Versus Storage Networks*

Storage miners can only earn rewards if they store the user's data correctly and upload *H*(*a*, *b*) correctly. A miner who wants to earn without storing correctly must collude with other miners. Because a miner does not know the label (*i*, *j*) of the data it stores, it would have to send requests to all miners in the network, and other miners can be rewarded for reporting such malicious nodes. Colluding storage miners earn no reward; only individual fulfillment of the storage function maximizes the benefits of the storage network. Audit miners are granted audit access only when there is a problem with a storage miner, and they can earn more rewards only by continuously completing audit tasks and tasks delegated by data owners. Together, these rules ensure that all miners are driven by profit toward stable behavior.

Thus our system satisfies incentive compatibility as well as data integrity, recoverability, public verifiability, and auditability. These five properties follow directly from the design above and are the same properties that Filecoin satisfies, so we consider our audit proof reasonable.

# **4 Fault-Tolerance Verification**

We now describe three attacks that are common in distributed storage networks.

• Sybil Attack (Douceur, 2002): A sybil attack is a type of attack in peer-to-peer networks in which a node actively operates multiple identities at the same time and undermines the authority/power in reputation systems. In a distributed storage network, a malicious miner can create multiple sybil identities and pretend to store many copies in order to be rewarded, while storing only one copy locally.

In our proof, a miner cannot claim to have stored extra copies of a shard, because the number of copies of each shard is limited to *k*. Moreover, a malicious miner gains little from pretending to store multiple copies by creating multiple identities, because each miner stores different content and any two storage miners share fewer than *z* shards. We control revenue so that storage miners do not gain enough from creating a sybil identity to make the risk worthwhile. Such situations can be limited further by monitoring IP addresses, generating *Ma*() proofs, etc. Together, these measures make sybil attacks much less profitable.

• Outsourcing Attack: By relying on fast access to data from other storage providers, malicious miners promise to store more data than they can actually store.

To launch an outsourcing attack, a malicious miner, who does not know the shard labels, can only detect overlapping shards by exchanging *H*(*a*, *b*) sets with other miners; if an overlapping shard exists, its hash result can later be retrieved quickly during the audit. However, the benefit to the provider is small, and the reporting mechanism keeps miners from taking extreme measures for such a small benefit. We therefore conclude that the benefits of a small number of cooperating miners are much smaller than the risks associated with incomplete storage.

• Generation Attack: Malicious miners claim to have more storage than they actually have through a small program to gain a greater advantage in the mining competition.

With slicing and encryption, miners cannot effectively generate the data with a small program. The proof results must be computed with a hash function, for which a small change in the input leads to a large difference in the output. Filecoin also imposes strict penalties for generation attacks, so this attack can be substantially avoided from an incentive point of view.
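The sensitivity of the hash function can be demonstrated directly: flipping a single input bit changes, typically, around half of the 256 output bits of SHA-256. The shard contents below are placeholder bytes, and `bit_difference` is a hypothetical helper.

```python
import hashlib


def bit_difference(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length digests."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


original = b"shard contents" * 64          # placeholder shard data
tampered = bytearray(original)
tampered[0] ^= 0x01                        # flip a single input bit
d1 = hashlib.sha256(original).digest()
d2 = hashlib.sha256(bytes(tampered)).digest()
print(bit_difference(d1, d2), "of 256 digest bits differ")
```

A miner that regenerates data which is even slightly wrong therefore produces a hash that matches none of the *k* honest copies.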

# **5 Concluding Remarks**

In this chapter, we review current research on auditing and point out its imperfections. We also analyze Filecoin's audit requirements and redesign its audit algorithm. In the algorithm, storage miners upload hash results, and whether the data have been stored intact in the storage network is determined by comparing the results in the hash pool. The audit miner is semi-participating: it joins and gains access only when a problem arises. This auditing algorithm is relatively accurate and secure for decentralized storage networks, and our analysis indicates that it is highly fault-tolerant.

The incentive design of our algorithm is not yet complete, and its suitability for widespread use remains to be demonstrated. Incentives are key to getting the algorithm adopted, and the game between miners and the storage network must be analyzed so that both sides can reach optimal outcomes for their interests. Data regulation also needs to be considered in the next phase. We will make the algorithm more complete in future work.

**Acknowledgements** This work was supported in part by the National Natural Science Foundation of China (72171230), and the Science and Technology Development Fund, Macau SAR (File No. 0050/2020/A1).

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Iterative Learning Control Based on Random Variance Reduction Gradient Method**

**Yihua Gao, Dong Shen, and Jiaxi Qian**

**Abstract** Traditional iterative learning control (ILC) algorithms usually assume that full system information and operation data can be utilized. However, due to the uncertainty and complexity of actual systems, it is difficult to access full system information and operation data accurately and completely. In this chapter, a novel ILC scheme based on the stochastic variance reduced gradient (SVRG) method is proposed. This scheme not only addresses the incomplete information problem but also converges efficiently under both strongly convex and non-strongly convex control objectives. To demonstrate these advantages, this chapter studies two scenarios, i.e., random error data dropout and a model-free data-driven approach, and proposes an SVRG-based ILC algorithm for each. It is theoretically demonstrated and experimentally verified that the proposed SVRG-based ILC scheme converges faster than both the full gradient and stochastic gradient methods in both scenarios.

**Keywords** Iterative learning control · Variance reduction gradient method · ILC algorithms

Y. Gao

Renmin University of China, Beijing, China e-mail: gaoyihua@ruc.edu.cn

D. Shen (✉)

Distributed Artificial Intelligence Laboratory, Renmin University of China, Beijing, China e-mail: dshen@ruc.edu.cn

J. Qian

School of Mathematics, Renmin University of China, Beijing, China e-mail: qianjx1917@ruc.edu.cn

© The Author(s) 2023 Z. Zheng (ed.), *Proceedings of the Second International Forum on Financial Mathematics and Financial Technology*, Financial Mathematics and Fintech, https://doi.org/10.1007/978-981-99-2366-3\_5

# **1 Introduction**

# *1.1 Background*

Iterative learning control (ILC) is a control method for systems that perform repeated operations. The basic idea is to use the input and error signals from the previous iteration to improve the input for the next iteration. Arimoto et al. first proposed iterative learning control for robotic arms in 1984 and clarified its basic idea (Arimoto et al., 1984). Since then, numerous papers on ILC have been published. ILC has gradually become an important branch of control and is widely used in robotics, industrial production, hard disk manufacturing, and other controlled systems with repeated operations.
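The basic idea can be illustrated with a P-type update on a lifted system. As a minimal sketch, we assume a first-order plant *x*(*t* + 1) = *a x*(*t*) + *b u*(*t*), *y*(*t*) = *c x*(*t*) with zero initial state, so that one trial maps the input vector *u* to the output vector *y* = *G u*, where *G* is lower-triangular with Markov parameters *c a*<sup>*t*−*s*</sup> *b*. The update *u*<sub>*k*+1</sub> = *u*<sub>*k*</sub> + γ *e*<sub>*k*</sub> converges when |1 − γ *c b*| < 1. The plant, gain, and helper names are illustrative assumptions, not taken from this chapter.

```python
def lifted_matrix(a, b, c, n):
    """Lower-triangular Toeplitz G with entries c * a**(r - s) * b (r >= s)."""
    return [[c * a ** (r - s) * b if s <= r else 0.0 for s in range(n)]
            for r in range(n)]


def matvec(m, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def p_type_ilc(g, reference, gain, iterations):
    """Repeat trials; after each one set u_{k+1} = u_k + gain * e_k."""
    u = [0.0] * len(reference)
    history = []                        # max |tracking error| per trial
    for _ in range(iterations):
        e = [r - y for r, y in zip(reference, matvec(g, u))]
        history.append(max(abs(x) for x in e))
        u = [ui + gain * ei for ui, ei in zip(u, e)]
    return u, history
```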

To achieve excellent control performance, most ILC algorithms assume that full operational data and system information can be obtained and utilized. In real systems, however, data delays and dropouts often occur due to various uncertainties. Moreover, when the system structure is complex or unstable, it is difficult to obtain system information accurately. Designing high-performance ILC algorithms that address these incomplete information problems is therefore of great theoretical and practical significance.

Information incompleteness can be classified into two categories: objective and subjective incompleteness. Information incompleteness caused by objective factors is often related to the uncertainty of the system itself. For example, during signal transmission, channel instability can cause data packet loss. Three main random packet dropout models have been developed for this problem: the random sequence model, the Bernoulli distribution model, and the Markov chain model. Shen (2018) designed iterative learning control algorithms based on the stochastic approximation algorithm for the three models and proved that the algorithms satisfy mean-square convergence and probabilistic strong convergence. In the case of subjective incompleteness, the system information is usually deliberately assumed to be unknown, which avoids the complexity of system modeling and the effects of system instability. For example, Oomen et al. (2014) designed a model-free data-driven iterative learning control algorithm for *H*∞-parametric estimation of multi-input multi-output (MIMO) systems, which obtains the full gradient by conducting $n_0 \times n_1$ experiments on an $n_0 \times n_1$-dimensional MIMO system. However, this algorithm is difficult to apply to large MIMO systems because it requires too many experiments. Subsequently, Aarnoudse and Oomen (2020) designed an iterative learning control algorithm based on the stochastic approximation method that estimates the gradient by constructing a random matrix, which effectively reduces the number of experiments.

Note that the effect of information incompleteness on ILC tracking performance is essentially a question of the robustness of ILC. However, this robustness differs between objective and subjective information incompleteness. The former is usually model-based and emphasizes modeling to analyze the causes of information deficiency, whereas the latter is data-based and is generally concerned not with the causes of information deficiency but with its inherent limitations on control performance. Model-based and data-based control methods are not mutually exclusive; to achieve the best control effect, the two can also be combined. Existing studies on ILC under information incompleteness are usually based on the stochastic approximation method or other gradient methods. In this chapter, we use a stochastic variance reduced gradient (SVRG) method to provide a general framework for solving system information incompleteness problems.

# *1.2 Design and Analysis of SVRG-Based ILC*

Owens et al. (2009) modeled the control objective of a deterministic discrete-time linear system as an optimization problem. On this basis they proposed a gradient-type ILC algorithm and analyzed its stability, monotonicity, and robustness. For noisy discrete-time linear systems, Yang and Ruan (2017) proposed an enhanced gradient-based ILC algorithm that converges effectively in the presence of perturbations. However, these gradient-based ILC algorithms require full error and system information at every iteration. When this information is not fully available, traditional gradient-based ILC is no longer applicable.

Notice that in Machine Learning, the stochastic gradient descent (SGD) method replaces the full gradient with a randomly selected partial gradient at each step. In control problems, a partial gradient can likewise be obtained when information about the error or the system is insufficient. This correspondence motivates us to seek suitable stochastic gradient methods for problems with insufficient error or system information.

Recent research in Machine Learning has produced many improved variants of stochastic gradient descent. They aim to converge faster and to handle non-smooth and non-strongly convex objective functions, and they include momentum methods, variance reduction methods, and incremental aggregated gradients. Allen-Zhu (2018) divided these algorithms into three generations according to their complexity under strong convexity:

- The first generation consists of momentum-based gradient algorithms.
- The second generation includes variance reduction-based gradient algorithms and the proximal stochastic variance reduced gradient algorithm.
- The third generation includes the Katyusha algorithm and incremental aggregated gradient algorithms.

In most cases, the complexity of the algorithms decreases from one generation to the next. For the control problems considered here, first-generation algorithms converge slowly and often fail to meet practical needs. Third-generation algorithms require accurate system models to achieve faster convergence and are difficult to apply to data-driven ILC. This chapter is therefore mainly based on a second-generation algorithm, the stochastic variance reduced gradient (SVRG) algorithm.

# *1.3 Main Work and Organization*

The purpose of this chapter is to construct SVRG-based ILC and to use this framework to solve specific information incompleteness problems. Two scenarios are selected as representatives of objective and subjective information incompleteness: random dropouts of error data, and model-free ILC. A corresponding SVRG-based ILC algorithm is given for each. The contribution is threefold.


Section 2 serves as the basis of the chapter and gives the SVRG-based ILC framework for SISO systems. Section 3 applies the framework to the problem of random error data dropouts. Section 4 extends the framework to MIMO systems in the model-free scenario. Section 2 only gives the algorithm framework and does not address a specific scenario. It therefore contains no numerical simulations and consists of three parts: system description, algorithm design, and convergence analysis. Sections 3 and 4 each include four parts: system description, algorithm design, convergence analysis, and numerical simulation.

# **2 SVRG-Based ILC Framework**

As the basis of the following sections, this section uses SISO systems to present the basic framework of the SVRG-based ILC algorithm. It includes three parts: system description, algorithm design, and convergence analysis.

# *2.1 System Description*

Consider the following single-input single-output (SISO) discrete-time linear system

$$\begin{cases} \mathbf{x}\_k(t+1) = A\mathbf{x}\_k(t) + Bu\_k(t), \\ \mathbf{y}\_k(t) = C\mathbf{x}\_k(t), \end{cases} \tag{1}$$

where $t = 0, 1, \ldots, N-1$ is the time index and $k = 1, 2, 3, \ldots$ denotes the iteration index. $\mathbf{x}\_k(t) \in \mathbb{R}^n$, $u\_k(t) \in \mathbb{R}$, and $y\_k(t) \in \mathbb{R}$ represent the system state, input, and output, respectively. $A \in \mathbb{R}^{n \times n}$, $B \in \mathbb{R}^{n}$, and $C \in \mathbb{R}^{1 \times n}$ are the system matrices. The initial condition is the same for every iteration, i.e., $\mathbf{x}\_k(0) = \mathbf{x}\_0$, $\forall k \in \mathbb{N}^\*$.

Taking *t* = 0, 1,..., *N* − 1 in (1) yields

$$\begin{aligned} \mathbf{y}\_k(1) &= CA\mathbf{x}\_k(0) + CB\boldsymbol{u}\_k(0) = CB\boldsymbol{u}\_k(0) + CA\boldsymbol{x}\_0, \\ \mathbf{y}\_k(2) &= CA\mathbf{x}\_k(1) + CB\boldsymbol{u}\_k(1) = CAB\boldsymbol{u}\_k(0) + CB\boldsymbol{u}\_k(1) + CA^2\boldsymbol{x}\_0, \\ &\vdots \\ \mathbf{y}\_k(N) &= CA\mathbf{x}\_k(N-1) + CB\boldsymbol{u}\_k(N-1) \\ &= CA^{N-1}Bu\_k(0) + CA^{N-2}Bu\_k(1) + \cdots \\ &+ CAB\boldsymbol{u}\_k(N-2) + CB\boldsymbol{u}\_k(N-1) + CA^N\boldsymbol{x}\_0. \end{aligned}$$

Combining the above equations, system (1) can be rewritten in the following equivalent form

$$\mathbf{y}\_k = H\boldsymbol{u}\_k + K\boldsymbol{x}\_0,\tag{2}$$

where $u\_k = [u\_k(0), u\_k(1), \ldots, u\_k(N-1)]^T \in \mathbb{R}^N$, $y\_k = [y\_k(1), y\_k(2), \ldots, y\_k(N)]^T \in \mathbb{R}^N$,

$$H = \begin{bmatrix} h\_{11} & 0 & \cdots & 0 \\ h\_{21} & h\_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ h\_{N1} & h\_{N2} & \cdots & h\_{NN} \end{bmatrix} = \begin{bmatrix} CB & 0 & \cdots & 0 \\ CAB & CB & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ CA^{N-1}B & CA^{N-2}B & \cdots & CB \end{bmatrix}, \quad K = \begin{bmatrix} CA \\ CA^2 \\ \vdots \\ CA^N \end{bmatrix}.$$
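The following plain-Python snippet is a minimal sketch of how (1) is lifted into (2). It builds $H$ and $K$ from $A$, $B$, $C$ and checks $Hu + K\mathbf{x}\_0$ against a direct simulation of (1). The helper names (`lift`, `simulate`) and the numerical values of $A$, $B$, $C$, $\mathbf{x}\_0$, and the input are illustrative assumptions, not taken from the chapter.

```python
# Sketch: lifting the SISO system (1) into the static map (2), y_k = H u_k + K x0.
# The system matrices and input below are hypothetical example values.

def matmul(X, Y):
    """Plain-Python matrix product of nested lists."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def matvec(M, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]

def lift(A, B, C, N):
    """Return H (N x N, lower triangular) and K (N x n) of (2)."""
    n = len(A)
    powers = [[[float(i == j) for j in range(n)] for i in range(n)]]  # A^0 = I
    for _ in range(N):
        powers.append(matmul(powers[-1], A))                       # A^1, ..., A^N
    C_row, B_col = [C], [[b] for b in B]
    H = [[0.0] * N for _ in range(N)]
    for t in range(N):              # row t holds y(t+1)
        for j in range(t + 1):      # entry C A^(t-j) B multiplies u(j)
            H[t][j] = matmul(matmul(C_row, powers[t - j]), B_col)[0][0]
    K = [matmul(C_row, powers[t + 1])[0] for t in range(N)]  # row t is C A^(t+1)
    return H, K

def simulate(A, B, C, x0, u):
    """Run (1) forward and return [y(1), ..., y(N)]."""
    x, ys = list(x0), []
    for ut in u:
        x = [ax + b * ut for ax, b in zip(matvec(A, x), B)]
        ys.append(sum(c * xi for c, xi in zip(C, x)))
    return ys

A = [[0.5, 0.1], [0.0, 0.8]]
B = [1.0, 0.5]
C = [1.0, 0.0]
x0 = [0.2, -0.1]
u = [1.0, -0.5, 0.3, 0.0]

H, K = lift(A, B, C, len(u))
y_lifted = [hu + kx for hu, kx in zip(matvec(H, u), matvec(K, x0))]
y_sim = simulate(A, B, C, x0, u)
print(max(abs(a - b) for a, b in zip(y_lifted, y_sim)))  # round-off level
```

The lifted map and the recursion agree up to round-off. $H$ is lower triangular, with the Markov parameter $CA^{t-j}B$ in row $t$ and column $j$.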

For further analysis, the following assumptions are required.

**Assumption 1** The input/output coupling matrix satisfies $CB \neq 0$.

**Assumption 2** For the desired trajectory $y\_d(t)$, there exist a unique desired input $u\_d(t)$ and an initial state $\mathbf{x}\_d(0)$ such that

$$\begin{cases} \mathbf{x}\_d(t+1) = A\mathbf{x}\_d(t) + Bu\_d(t), \\ y\_d(t) = C\mathbf{x}\_d(t). \end{cases} \tag{3}$$

Writing (3) in the form of (2), we have

$$\mathbf{y}\_d = H\boldsymbol{u}\_d + K\boldsymbol{x}\_d(0). \tag{4}$$

**Remark 1** Assumptions 1 and 2 describe the realizability of the desired trajectory $y\_d$. Specifically, Assumption 1 means that the relative degree of the system is 1. Assumption 2 guarantees the existence of an input signal $u\_d$ that tracks $y\_d$ exactly. If the system does not satisfy Assumption 2, the system output can only approach the desired trajectory $y\_d$ as closely as possible.

**Assumption 3** The initial states of (1) and (3) are identical, i.e., $\mathbf{x}\_d(0) = \mathbf{x}\_k(0) = \mathbf{x}\_0$, $\forall k$. Assume that $\mathbf{x}\_0 = 0$.

**Remark 2** Assumption 3 reflects the repeatability requirement of ILC. To simplify the algorithm, and without loss of generality, we take $\mathbf{x}\_0 = 0$. It is easy to verify that the results of this chapter remain valid when $\mathbf{x}\_0 \neq 0$.

The above three assumptions are adopted throughout this chapter. In fact, SVRG-based ILC can also be established when Assumptions 1 and 2 are appropriately relaxed, and Section 4 explains how to relax them.

Figure 1 illustrates the basic framework of ILC. The plant takes the input $u\_k$ and generates the output $y\_k$. The error $e\_k = y\_d - y\_k$ between the desired trajectory $y\_d$ and the output is transmitted to the controller. The controller uses the error $e\_k$ and the input $u\_k$ to compute the input signal $u\_{k+1}$ for the next batch and transmits it to the plant. Our goal is to find a sequence of inputs $\{u\_k\}$ such that

$$\lim\_{k \to \infty} \|e\_k\| = \lim\_{k \to \infty} \|\mathbf{y}\_d - \mathbf{y}\_k\| = 0,\tag{5}$$

where $\|\cdot\|$ denotes the vector 2-norm and its induced matrix norm. This norm is used throughout unless otherwise specified.

By Assumptions 2 and 3, (5) is equivalent to the following optimization problem for the function $F$:

$$F\left(u\_{k}\right) \stackrel{\Delta}{=} \frac{1}{2N} \left\|e\_{k}\right\|^{2} = \frac{1}{2N} \left\|y\_{d} - Hu\_{k}\right\|^{2}, \quad \lim\_{k \to \infty} F\left(u\_{k}\right) = 0. \tag{6}$$

# *2.2 Algorithm Design*

The traditional gradient-based ILC updating law (Gu et al., 2019) is

Iterative Learning Control Based on Random Variance … 87

$$
u\_{k+1} = u\_k - \eta\_k \nabla\_k,\tag{7}
$$

where $\eta\_k$ denotes the step size and $\nabla\_k$ is the gradient of the objective function. From (6), we have

$$
\nabla\_k = \nabla F\left(u\_k\right) = -\frac{1}{N} H^T e\_k. \tag{8}
$$

In (8), computing the full gradient requires complete knowledge of the transpose matrix $H^T$ and the error $e\_k$. To obtain a gradient from partial information, we decompose the error $e\_k = [e\_k(1), e\_k(2), \ldots, e\_k(N)]^T$ according to the time index. Let $f\_i(u\_k) = \frac{1}{2} e\_k(i)^2 = \frac{1}{2}\left(y\_d(i) - h\_i^T u\_k\right)^2$, where $h\_i = [h\_{i1}, \ldots, h\_{ii}, 0, \ldots, 0]^T$ is the $i$-th row of the matrix $H$, written as a column vector. Then (6) can be rewritten as

$$F\left(u\_k\right) = \frac{1}{N} \sum\_{i=1}^{N} f\_i\left(u\_k\right). \tag{9}$$

Taking the gradient of both sides, we have

$$\nabla F\left(u\_k\right) = \frac{1}{N} \sum\_{i=1}^N \nabla f\_i\left(u\_k\right) = -\frac{1}{N} \sum\_{i=1}^N h\_i e\_k(i). \tag{10}$$

Define the random gradient $\tilde{\nabla}\_k$ as a discrete random variable that takes values uniformly over $\{\nabla f\_i(u\_k)\}\_{i=1}^N$, i.e., $P\left(\tilde{\nabla}\_k = \nabla f\_i(u\_k)\right) = \frac{1}{N}$. Therefore $\mathbb{E}\left[\tilde{\nabla}\_k\right] = \nabla\_k$, i.e., $\tilde{\nabla}\_k$ is an unbiased estimate of $\nabla\_k$.

By the decomposition (6)–(9), computing a specific value $\nabla f\_i(u\_k)$ of the random vector $\tilde{\nabla}\_k$ requires only one row of the system matrix $H$ and one entry of the error $e\_k$. The decomposition therefore substantially reduces the information required at each iteration. This gradient decomposition technique is the basis on which this chapter applies the SVRG method to ILC with incomplete information. Sections 3 and 4 present two specific decompositions, for incomplete error information and incomplete system information, respectively.
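The decomposition (9)–(10) and the unbiasedness of $\tilde{\nabla}\_k$ can be checked numerically. The sketch below assumes a hypothetical $3 \times 3$ lifted matrix $H$ (lower triangular, $CB = 1$), an arbitrary $y\_d$, and $\mathbf{x}\_0 = 0$. It computes the full gradient (8) and the average of the component gradients $\nabla f\_i(u) = -h\_i e(i)$, and the two should agree. Each component uses only row $i$ of $H$ and the single entry $e(i)$. All function names are illustrative.

```python
# Sketch of the gradient decomposition (9)-(10) with x0 = 0 (Assumption 3).
# H and y_d are hypothetical example values (H lower triangular, CB = 1).
H = [[1.0, 0.0, 0.0],
     [0.6, 1.0, 0.0],
     [0.3, 0.6, 1.0]]
y_d = [1.0, 0.5, -0.2]
N = len(y_d)

def error(u):
    """e = y_d - H u."""
    return [yd - sum(h * uj for h, uj in zip(row, u)) for yd, row in zip(y_d, H)]

def F(u):
    """Objective (6): ||e||^2 / (2N)."""
    return sum(ei * ei for ei in error(u)) / (2 * N)

def f(i, u):
    """Component f_i = e(i)^2 / 2; uses only row i of H and y_d(i)."""
    e_i = y_d[i] - sum(h * uj for h, uj in zip(H[i], u))
    return 0.5 * e_i * e_i

def grad_F(u):
    """Full gradient (8): -(1/N) H^T e, needs all of H and e."""
    e = error(u)
    return [-sum(H[i][j] * e[i] for i in range(N)) / N for j in range(N)]

def grad_f(i, u):
    """Component gradient -h_i e(i) from (10)."""
    e_i = y_d[i] - sum(h * uj for h, uj in zip(H[i], u))
    return [-h * e_i for h in H[i]]

u = [0.2, -0.1, 0.4]
# Expectation of the random gradient when i is uniform on {1, ..., N}
mean_grad = [sum(grad_f(i, u)[j] for i in range(N)) / N for j in range(N)]
print(grad_F(u), mean_grad)                      # equal up to round-off
print(F(u), sum(f(i, u) for i in range(N)) / N)  # (9): F = (1/N) sum f_i
```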

Consider the stochastic gradient descent (SGD) method used in Machine Learning. Replacing the full gradient $\nabla\_k$ with the stochastic gradient $\tilde{\nabla}\_k$ in the ILC updating law (7) yields the SGD-based ILC algorithm. However, the convergence rate of SGD is $O(1/\sqrt{k})$ even under strong convexity (Allen-Zhu, 2018), which cannot meet practical requirements. The reason is that the stochastic gradient $\tilde{\nabla}\_k$, although an unbiased estimate of the full gradient $\nabla\_k$, has a variance that does not vanish as the iterations proceed.

To reduce this variance, Johnson and Zhang (2013) proposed the stochastic variance reduced gradient (SVRG) descent method. SVRG records a "snapshot" $\tilde{u}^s$ every few updates and uses it to construct an upper bound on the gradient variance that converges to zero. Its convergence rate is $O(\rho^k)$ for some $0 < \rho < 1$ under strong convexity and $O(1/k)$ without strong convexity. Based on this method, we denote the input updated with the "snapshot" $\tilde{u}^s$ by $u\_{s,k}$, and the SVRG-based ILC updating law is

$$u\_{s,k+1} = u\_{s,k} - \eta \left(\nabla f\_i\left(u\_{s,k}\right) - \nabla f\_i\left(\tilde{u}^s\right) + \nabla F\left(\tilde{u}^s\right)\right). \tag{11}$$

For system (2), the SVRG-based ILC algorithm with updating law (11) is shown in Algorithm 1.

**Algorithm 1** SVRG-based ILC framework for SISO systems

```
Input: η, S, u_{0,0};
 m ← 2N; ũ^0 ← u_{0,0};
 for s ← 0 to S − 1 do
   u_{s,0} ← ũ^s; μ_s ← ∇F(ũ^s);
   for k ← 0 to m − 1 do
     w_k ← ∇f_i(u_{s,k}) − ∇f_i(ũ^s) + μ_s;   where i is drawn uniformly at random from {1, 2, ..., N}
     u_{s,k+1} ← u_{s,k} − η w_k;
   end for
   Option I:  ũ^{s+1} ← (1/m) Σ_{k=0}^{m−1} u_{s,k};
   Option II: ũ^{s+1} ← u_{s,m};
 end for
```
Algorithm 1 has two loops. The outer loop updates the "snapshot" $\tilde{u}^s$ once every $m$ iterations of the inner loop. The inner-loop length $m$ is taken as an integer multiple of $N$ and is empirically set to $2N$. Lines 9 and 10 of Algorithm 1 show two ways of updating the "snapshot", Option I and Option II:

- Option I takes the average of the $m$ inputs $u\_{s,0}, \ldots, u\_{s,m-1}$ as the "snapshot" and does not use $u\_{s,m}$, so the inner loop actually requires only $m-1$ updates.
- Option II instead uses the final inner-loop input $u\_{s,m}$ as the "snapshot".

Algorithm 1 converges with either "snapshot" updating method (Bottou et al., 2018). Due to space limitations, we only prove the convergence of Option I under strong convexity in this section, and that of Option II without strong convexity in Sect. 4.
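Below is a minimal plain-Python sketch of Algorithm 1 on a hypothetical $3 \times 3$ lifted matrix $H$ ($CB = 1$, $\mathbf{x}\_0 = 0$). The step size is $\eta = 0.1/L$ with $L = \max\_i \|h\_i\|^2$, in line with Remark 3 and Proposition 1, and $m = 2N$. Indices are 0-based. The seed, the number of outer loops `S`, and the function name `svrg_ilc` are illustrative assumptions.

```python
import random

# Hypothetical lifted SISO system (N = 3, CB = 1); x0 = 0 by Assumption 3.
H = [[1.0, 0.0, 0.0],
     [0.6, 1.0, 0.0],
     [0.3, 0.6, 1.0]]
y_d = [1.0, 0.5, -0.2]
N = len(y_d)
L = max(sum(h * h for h in row) for row in H)  # smoothness constant of each f_i

def F(u):
    e = [yd - sum(h * uj for h, uj in zip(row, u)) for yd, row in zip(y_d, H)]
    return sum(ei * ei for ei in e) / (2 * N)

def grad_f(i, u):
    """Gradient of f_i: -h_i e(i) (one row of H, one error entry)."""
    e_i = y_d[i] - sum(h * uj for h, uj in zip(H[i], u))
    return [-h * e_i for h in H[i]]

def grad_F(u):
    """Full gradient: average of all component gradients."""
    grads = [grad_f(i, u) for i in range(N)]
    return [sum(col) / N for col in zip(*grads)]

def svrg_ilc(eta, u00, S, option="I", seed=0):
    """Algorithm 1: outer loop over snapshots, inner loop of length m = 2N."""
    rng = random.Random(seed)
    m = 2 * N
    snapshot = list(u00)
    for _ in range(S):
        u = list(snapshot)          # u_{s,0} <- snapshot
        mu = grad_F(snapshot)       # full gradient at the snapshot
        history = []
        for _ in range(m):
            history.append(u)       # keeps u_{s,0}, ..., u_{s,m-1}
            i = rng.randrange(N)    # i drawn uniformly at random
            w = [a - b + c for a, b, c in zip(grad_f(i, u), grad_f(i, snapshot), mu)]
            u = [uj - eta * wj for uj, wj in zip(u, w)]
        if option == "I":
            snapshot = [sum(col) / m for col in zip(*history)]  # average of u_{s,k}
        else:
            snapshot = u            # Option II: u_{s,m}
    return snapshot

eta = 0.1 / L
u_I = svrg_ilc(eta, [0.0] * N, S=3000, option="I")
u_II = svrg_ilc(eta, [0.0] * N, S=3000, option="II")
print(F([0.0] * N), F(u_I), F(u_II))
```

Option I is the variant analyzed in Theorem 3, and Option II is included only for comparison. Each inner step touches a single row of $H$ and a single error entry. The full gradient is needed only once per snapshot.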

# *2.3 Convergence Analysis*

This subsection is divided into two parts. It first gives the convex optimization background required for the proofs in this chapter. It then analyzes the convergence of Algorithm 1 for system (2) when the "snapshot" is updated with Option I.

### **2.3.1 Preliminaries of Convex Optimization**

The basics of convex optimization required for this chapter are given below (Lyubashevsky, 2005).

**Definition 1** *(Smoothness)* Suppose $S$ is a nonempty convex subset of $\mathbb{R}^d$ and $f: S \to \mathbb{R}$, $f \in C^1$. If $\exists L > 0$ such that $\forall \mathbf{x}, \mathbf{y} \in S$,

$$\|\nabla f(\mathbf{x}) - \nabla f(\mathbf{y})\| \le L \|\mathbf{x} - \mathbf{y}\|,$$

then we say that *f* is *L*-smooth or ∇ *f* (*x*) is *L*-Lipschitz continuous on *S*, where *L* is the Lipschitz constant.

**Definition 2** *(Strong convexity)* Suppose $S$ is a nonempty convex subset of $\mathbb{R}^d$ and $f: S \to \mathbb{R}$, $f \in C^1$. If $\exists \sigma > 0$ such that $\forall \mathbf{x}, \mathbf{y} \in S$,

$$f(\mathbf{y}) \ge f(\mathbf{x}) + \langle \nabla f(\mathbf{x}), \mathbf{y} - \mathbf{x} \rangle + \frac{\sigma}{2} \left\| \mathbf{x} - \mathbf{y} \right\|^2,$$

then we say that $f$ is $\sigma$-strongly convex on $S$. When $\sigma = 0$, i.e., $f(\mathbf{y}) \ge f(\mathbf{x}) + \langle \nabla f(\mathbf{x}), \mathbf{y} - \mathbf{x} \rangle$, $f$ is convex.

**Definition 3** *(Condition number)* If $f$ is $L$-smooth and $\sigma$-strongly convex, $\kappa = L/\sigma$ is the condition number of $f$.

**Theorem 1** *For a convex function $f$, the following are equivalent:*

*a.* $\nabla f(\mathbf{x})$ is $L$-Lipschitz continuous;

*b.* $f(\mathbf{y}) \le f(\mathbf{x}) + \langle \nabla f(\mathbf{x}), \mathbf{y} - \mathbf{x} \rangle + \frac{L}{2} \|\mathbf{y} - \mathbf{x}\|^2$;

*c.* $f(\mathbf{y}) \ge f(\mathbf{x}) + \langle \nabla f(\mathbf{x}), \mathbf{y} - \mathbf{x} \rangle + \frac{1}{2L} \|\nabla f(\mathbf{y}) - \nabla f(\mathbf{x})\|^2$;

*d.* $\frac{1}{L} \|\nabla f(\mathbf{y}) - \nabla f(\mathbf{x})\|^2 \le \langle \nabla f(\mathbf{x}) - \nabla f(\mathbf{y}), \mathbf{x} - \mathbf{y} \rangle$.

*Proof* $a \to b$: Denote $g(t) = f(t(\mathbf{y} - \mathbf{x}) + \mathbf{x})$. Then $f(\mathbf{x}) = g(0)$, $f(\mathbf{y}) = g(1)$, and $g'(t) = \langle \nabla f(t(\mathbf{y} - \mathbf{x}) + \mathbf{x}), \mathbf{y} - \mathbf{x} \rangle$. Therefore,

$$\begin{aligned} f(\mathbf{y}) - f(\mathbf{x}) - \langle \nabla f(\mathbf{x}), \mathbf{y} - \mathbf{x} \rangle &= g(1) - g(0) - \langle \nabla f(\mathbf{x}), \mathbf{y} - \mathbf{x} \rangle \\ &= \int\_0^1 g'(t) dt - \langle \nabla f(\mathbf{x}), \mathbf{y} - \mathbf{x} \rangle \\ &= \int\_0^1 \langle \nabla f(t(\mathbf{y} - \mathbf{x}) + \mathbf{x}) - \nabla f(\mathbf{x}), \mathbf{y} - \mathbf{x} \rangle dt \\ &\leq \int\_0^1 \| \nabla f(t(\mathbf{y} - \mathbf{x}) + \mathbf{x}) - \nabla f(\mathbf{x}) \| \cdot \| \mathbf{y} - \mathbf{x} \| dt \\ &\leq \int\_0^1 L \| t(\mathbf{y} - \mathbf{x}) \| \cdot \| \mathbf{y} - \mathbf{x} \| dt = \frac{L}{2} \| \mathbf{y} - \mathbf{x} \|^2. \end{aligned}$$

$b \to c$: Denote $f\_x(z) = f(z) - \langle \nabla f(x), z \rangle$. For all $z, z' \in \mathbb{R}^d$, applying $b$ gives

$$f\left(z'\right) - f(z) \le \left\langle \nabla f(z), z' - z\right\rangle + \frac{L}{2} \left\|z' - z\right\|^2,$$

$$f\left(z'\right) - f(z) - \left\langle \nabla f(x), z' - z\right\rangle \le \left\langle \nabla f(z) - \nabla f(x), z' - z\right\rangle + \frac{L}{2} \left\|z' - z\right\|^2,$$

$$f\_x\left(z'\right) - f\_x(z) \le \left\langle \nabla f\_x(z), z' - z\right\rangle + \frac{L}{2} \left\|z' - z\right\|^2. \tag{12}$$

By the convexity of $f$, we have

$$\begin{aligned} f\_x\left(z'\right) - f\_x(z) &= f\left(z'\right) - f(z) - \left\langle \nabla f(x), z' - z \right\rangle \\ &\geq \left\langle \nabla f(z), z' - z \right\rangle - \left\langle \nabla f(x), z' - z \right\rangle = \left\langle \nabla f\_x(z), z' - z \right\rangle. \end{aligned}$$

Therefore $f\_x$ is also convex. Since $\nabla f\_x(z) = \nabla f(z) - \nabla f(x)$ vanishes at $z = x$, $f\_x$ attains its minimum at $z = x$. By (12),

$$\begin{split} f\_x(\mathbf{x}) = \min\_{z'} f\_x\left(z'\right) &\leq \min\_{z'} \left\{ f\_x(z) + \left\langle \nabla f\_x(z), z' - z \right\rangle + \frac{L}{2} \left\| z' - z \right\|^2 \right\} \\ &= f\_x(z) + \min\_{\|\mathbf{y}\| = 1} \min\_{t \geq 0} \left\{ t \left\langle \nabla f\_x(z), \mathbf{y} \right\rangle + \frac{L}{2} t^2 \right\} \\ &= f\_x(z) + \min\_{\|\mathbf{y}\| = 1} \left\{ -\frac{\left\langle \nabla f\_x(z), \mathbf{y} \right\rangle^2}{2L} \right\} \\ &= f\_x(z) - \frac{1}{2L} \left\| \nabla f\_x(z) \right\|^2. \end{split}$$

Therefore $f\_x(z) - f\_x(x) \ge \frac{1}{2L} \left\|\nabla f\_x(z)\right\|^2$, which implies

$$\begin{aligned} f(\mathbf{y}) - f(\mathbf{x}) - \langle \nabla f(\mathbf{x}), \mathbf{y} - \mathbf{x} \rangle &= f\_x(\mathbf{y}) - f\_x(\mathbf{x}) \\ &\geq \frac{1}{2L} \left\| \nabla f\_x(\mathbf{y}) \right\|^2 = \frac{1}{2L} \left\| \nabla f(\mathbf{y}) - \nabla f(\mathbf{x}) \right\|^2. \end{aligned}$$

$c \to d$: Swapping $\mathbf{x}$ and $\mathbf{y}$ in $c$, we have

$$f(\mathbf{x}) \ge f(\mathbf{y}) + \langle \nabla f(\mathbf{y}), \mathbf{x} - \mathbf{y} \rangle + \frac{1}{2L} \left\| \nabla f(\mathbf{y}) - \nabla f(\mathbf{x}) \right\|^2.$$

Adding this inequality to $c$, we have

$$\frac{1}{L} \left\| \nabla f(\mathbf{y}) - \nabla f(\mathbf{x}) \right\|^2 \le \langle \nabla f(\mathbf{x}) - \nabla f(\mathbf{y}), \mathbf{x} - \mathbf{y} \rangle.$$

$d \to a$: By $d$ and the Cauchy–Schwarz inequality, $\|\nabla f(\mathbf{y}) - \nabla f(\mathbf{x})\|^2 \le L\langle \nabla f(\mathbf{x}) - \nabla f(\mathbf{y}), \mathbf{x} - \mathbf{y} \rangle \le L\|\nabla f(\mathbf{x}) - \nabla f(\mathbf{y})\| \cdot \|\mathbf{x} - \mathbf{y}\|$. Thus $\|\nabla f(\mathbf{x}) - \nabla f(\mathbf{y})\| \le L\|\mathbf{x} - \mathbf{y}\|$.

**Theorem 2** *Let $f(\mathbf{x}) = \frac{1}{2} \mathbf{x}^T Q \mathbf{x} + q^T \mathbf{x} + c$, where $Q$ is symmetric positive definite. Then $f(\mathbf{x})$ is $L$-smooth and $\sigma$-strongly convex, where $L = \lambda\_M$ and $\sigma = \lambda\_m$ are the maximum and minimum eigenvalues of $Q$, respectively.*

*Proof* Since ∇ *f* (*x*) = *Qx* + *q*, we have

$$\|\nabla f(\mathbf{x}) - \nabla f(\mathbf{y})\| \le \|Q(\mathbf{x} - \mathbf{y})\| \le \|Q\| \cdot \|\mathbf{x} - \mathbf{y}\| = \lambda\_M \|\mathbf{x} - \mathbf{y}\|.$$

Hence *f* (*x*) is λ*<sup>M</sup>* -smooth. It is easy to verify that

$$f(\mathbf{x}) - f(\mathbf{y}) - \langle \nabla f(\mathbf{y}), \mathbf{x} - \mathbf{y} \rangle = \frac{1}{2} (\mathbf{x} - \mathbf{y})^T Q (\mathbf{x} - \mathbf{y}) .$$

Since $Q$ is symmetric positive definite, it can be orthogonally diagonalized as $Q = P^T \Lambda P$. Here $P$ is an orthogonal matrix, $\Lambda$ is the diagonal matrix of eigenvalues, and $\lambda\_m > 0$. Letting $\mathbf{z} = P(\mathbf{x} - \mathbf{y})$, so that $\|\mathbf{z}\| = \|\mathbf{x} - \mathbf{y}\|$, we obtain

$$\frac{1}{2}(\mathbf{x} - \mathbf{y})^T Q(\mathbf{x} - \mathbf{y}) = \frac{1}{2} \mathbf{z}^T \Lambda \mathbf{z} = \frac{1}{2} \sum\_{i} \lambda\_i z\_i^2 \ge \frac{1}{2} \lambda\_m \left\| \mathbf{z} \right\|^2 = \frac{1}{2} \lambda\_m \left\| \mathbf{x} - \mathbf{y} \right\|^2.$$

Thus *f* (*x*) is λ*m*-strongly convex.
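Theorems 1 and 2 can be sanity-checked on a small quadratic. The sketch below assumes a hypothetical symmetric positive definite $2 \times 2$ matrix $Q$ with arbitrary $q$ and $c$. It takes $L = \lambda\_M$ and $\sigma = \lambda\_m$ from the closed-form eigenvalues of a symmetric $2 \times 2$ matrix. It then counts violations of inequalities b, c, d and of strong convexity at random point pairs. This illustrates the statements but does not prove them.

```python
import math
import random

# Hypothetical quadratic f(x) = x^T Q x / 2 + q^T x + c with Q symmetric positive definite.
Q = [[3.0, 1.0], [1.0, 2.0]]
q = [0.5, -1.0]
c = 0.3

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def f(x):
    Qx = [dot(row, x) for row in Q]
    return 0.5 * dot(x, Qx) + dot(q, x) + c

def grad(x):
    return [dot(row, x) + qi for row, qi in zip(Q, q)]

# Eigenvalues of a symmetric 2 x 2 matrix in closed form
tr = Q[0][0] + Q[1][1]
det = Q[0][0] * Q[1][1] - Q[0][1] * Q[1][0]
disc = math.sqrt(tr * tr / 4 - det)
L, sigma = tr / 2 + disc, tr / 2 - disc   # Theorem 2: L = lambda_M, sigma = lambda_m

def count_violations(trials=1000, tol=1e-9, seed=1):
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        x = [rng.uniform(-5, 5) for _ in range(2)]
        y = [rng.uniform(-5, 5) for _ in range(2)]
        lin = f(x) + dot(grad(x), sub(y, x))      # first-order model at x
        d2 = dot(sub(y, x), sub(y, x))
        g2 = dot(sub(grad(y), grad(x)), sub(grad(y), grad(x)))
        checks = [
            f(y) <= lin + L / 2 * d2 + tol,                          # (b)
            f(y) >= lin + g2 / (2 * L) - tol,                        # (c)
            g2 / L <= dot(sub(grad(x), grad(y)), sub(x, y)) + tol,   # (d)
            f(y) >= lin + sigma / 2 * d2 - tol,                      # strong convexity
        ]
        bad += checks.count(False)
    return bad

print(L, sigma, count_violations())
```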

For system (2) and objective function (6), we have:

**Proposition 1** *Each fi is convex and L-smooth.*

*Proof* Consider $f(\mathbf{x}) = \frac{1}{2}\left(q^T \mathbf{x} + c\right)^2$, where $q = [q\_1, q\_2, \ldots, q\_n]^T \in \mathbb{R}^n$, $\mathbf{x} \in \mathbb{R}^n$, and $c \in \mathbb{R}$. Obviously, $f$ is convex, and $\nabla f(\mathbf{x}) = qq^T \mathbf{x} + cq$. Hence $\|\nabla f(\mathbf{x}) - \nabla f(\mathbf{y})\| = \|qq^T (\mathbf{x} - \mathbf{y})\| \le \|qq^T\| \cdot \|\mathbf{x} - \mathbf{y}\| \le \|q\|^2 \cdot \|\mathbf{x} - \mathbf{y}\|$. Each $f\_i$ in (9) has this form with $q = -h\_i$ and $c = y\_d(i)$, where $h\_i = [h\_{i1}, \ldots, h\_{ii}, 0, \ldots, 0]^T$. Therefore, letting $L = \max\_i\{\|h\_i\|^2\} > 0$, every $f\_i$ is $L$-smooth.

**Proposition 2** *F is L-smooth and* σ*-strongly convex.*

*Proof* By (6), we have

$$\begin{aligned} F\left(\boldsymbol{u}\_{k}\right) &= \frac{1}{2N} \left(\mathbf{y}\_{d} - H\boldsymbol{u}\_{k}\right)^{T} \left(\mathbf{y}\_{d} - H\boldsymbol{u}\_{k}\right) \\ &= \frac{1}{2N} \left(\boldsymbol{u}\_{k}^{T}\boldsymbol{H}^{T}H\boldsymbol{u}\_{k} - \mathbf{y}\_{d}^{T}H\boldsymbol{u}\_{k} - \boldsymbol{u}\_{k}^{T}\boldsymbol{H}^{T}\mathbf{y}\_{d} + \mathbf{y}\_{d}^{T}\mathbf{y}\_{d}\right), \end{aligned}$$

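where $H^T H$ is positive definite because $H$ is lower triangular with diagonal entries $CB \neq 0$ (Assumption 1) and hence nonsingular. By Theorem 2 with $Q = \frac{1}{N} H^T H$, $F(u\_k)$ is $L$-smooth and $\sigma$-strongly convex. Here $L$ and $\sigma$ are $\frac{1}{N}$ times the maximal and minimal eigenvalues of $H^T H$, respectively. Since $\lambda\_{\max}(H^T H) \le \operatorname{tr}(H^T H) = \sum\_{i=1}^N \|h\_i\|^2$, this smoothness constant is at most $\max\_i \|h\_i\|^2$. Hence $F$ is also smooth with the constant $L$ of Proposition 1.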
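The sketch below makes Propositions 1 and 2 concrete for a hypothetical $N = 2$ system with $CB = 1$ and $CAB = 0.5$. It computes the component constant $L = \max\_i \|h\_i\|^2$ and the constants $\frac{1}{N}\lambda\_{\max}(H^TH)$ and $\sigma = \frac{1}{N}\lambda\_{\min}(H^TH)$ of $F$. It also computes the ratio $L/\sigma$ that enters $\alpha$ in Theorem 3. The closed-form $2 \times 2$ eigenvalue formula is used only to keep the example dependency-free.

```python
import math

# Hypothetical N = 2 lifted matrix H = [[CB, 0], [CAB, CB]] with CB = 1, CAB = 0.5.
H = [[1.0, 0.0],
     [0.5, 1.0]]
N = len(H)

# Proposition 1: each f_i is L-smooth with L = max_i ||h_i||^2.
L = max(sum(h * h for h in row) for row in H)

# Gram matrix H^T H (symmetric 2 x 2).
G = [[sum(H[k][i] * H[k][j] for k in range(N)) for j in range(N)] for i in range(N)]
tr = G[0][0] + G[1][1]
det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
disc = math.sqrt(tr * tr / 4 - det)
lam_max, lam_min = tr / 2 + disc, tr / 2 - disc

# Proposition 2: F = ||y_d - H u||^2 / (2N) has Hessian H^T H / N.
L_F = lam_max / N
sigma = lam_min / N
kappa = L / sigma  # ratio L / sigma entering alpha in Theorem 3

print(L, L_F, sigma, kappa)
```

Because $H$ is lower triangular with unit diagonal here, $\det(H^TH) = 1$, so $\lambda\_{\min} > 0$ as the proof requires.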

### **2.3.2 Proof of Convergence**

In Algorithm 1, set the "snapshot" updating as Option I. Then, under the assumptions of system (2), the convergence of Algorithm 1 is given by the following theorem.

**Theorem 3** *Suppose each $f\_i$ is convex and $L$-smooth and $F$ is $\sigma$-strongly convex. Denote the optimal point by $u^\* = \arg\min\_u F(u)$. Assume that the step size satisfies $0 < \eta < \frac{1}{2L}$ and that $m$ is large enough that*

$$\alpha = \frac{1}{\sigma \eta (1 - 2L\eta) m} + \frac{2L\eta}{1 - 2L\eta} < 1.$$

*Then the convergence of Algorithm 1 satisfies*

$$\mathbb{E}\left[F\left(\tilde{u}^S\right) - F\left(u^\*\right)\right] \le \alpha^S \left(F\left(\tilde{u}^0\right) - F\left(u^\*\right)\right).$$

*Proof* Since *fi* is convex and *L*-smooth, for any *i*, by Theorem 1,

$$\left\|\nabla f\_i(\boldsymbol{u}) - \nabla f\_i\left(\boldsymbol{u}^\*\right)\right\|^2 \le 2L \left[f\_i(\boldsymbol{u}) - f\_i\left(\boldsymbol{u}^\*\right) - \left\langle \nabla f\_i\left(\boldsymbol{u}^\*\right), \boldsymbol{u} - \boldsymbol{u}^\* \right\rangle\right].\tag{13}$$

We have $\frac{1}{N}\sum\_{i=1}^{N} \nabla f\_i(u) = \nabla F(u)$ and $\nabla F(u^\*) = 0$. Regarding $\nabla f\_i$ as a random vector that takes values uniformly in $\{\nabla f\_i\}\_{i=1}^N$ and averaging (13) over $i$ therefore gives

$$\mathbb{E}\left[\left\|\nabla f\_i(u) - \nabla f\_i\left(u^\*\right)\right\|^2\right] = \frac{1}{N} \sum\_{i=1}^N \left\|\nabla f\_i(u) - \nabla f\_i\left(u^\*\right)\right\|^2 \le 2L\left[F(u) - F\left(u^\*\right)\right].\tag{14}$$

For any fixed $s$, let $w\_k = \nabla f\_i(u\_{s,k}) - \nabla f\_i(\tilde{u}^s) + \nabla F(\tilde{u}^s)$. Then

$$\begin{aligned} \mathbb{E}\left[\left\|w\_k\right\|^2\right] \leq\; & 2\mathbb{E}\left[\left\|\nabla f\_i\left(u\_{s,k}\right)-\nabla f\_i\left(u^\*\right)\right\|^2\right] + 2\mathbb{E}\left[\left\|\nabla f\_i\left(\tilde{u}^s\right)-\nabla f\_i\left(u^\*\right)-\nabla F\left(\tilde{u}^s\right)\right\|^2\right] \\ =\; & 2\mathbb{E}\left[\left\|\nabla f\_i\left(u\_{s,k}\right)-\nabla f\_i\left(u^\*\right)\right\|^2\right] \\ & + 2\mathbb{E}\left[\left\|\nabla f\_i\left(\tilde{u}^s\right)-\nabla f\_i\left(u^\*\right)-\mathbb{E}\left[\nabla f\_i\left(\tilde{u}^s\right)-\nabla f\_i\left(u^\*\right)\right]\right\|^2\right] \\ \leq\; & 2\mathbb{E}\left[\left\|\nabla f\_i\left(u\_{s,k}\right)-\nabla f\_i\left(u^\*\right)\right\|^2\right] + 2\mathbb{E}\left[\left\|\nabla f\_i\left(\tilde{u}^s\right)-\nabla f\_i\left(u^\*\right)\right\|^2\right] \\ \leq\; & 4L\left[F\left(u\_{s,k}\right)-F\left(u^\*\right)+F\left(\tilde{u}^s\right)-F\left(u^\*\right)\right]. \end{aligned} \tag{15}$$

In the above, we have used the inequality $\|a + b\|^2 \le 2\|a\|^2 + 2\|b\|^2$, the property $\mathbb{E}\|\zeta - \mathbb{E}\zeta\|^2 = \mathbb{E}\|\zeta\|^2 - \|\mathbb{E}\zeta\|^2 \le \mathbb{E}\|\zeta\|^2$, and (14). Notice that, conditioned on $u\_{s,k}$, $\mathbb{E}[w\_k] = \nabla F(u\_{s,k})$. Thus


$$\begin{aligned} &\mathbb{E}\left[\left\|u\_{s,k+1}-u^\*\right\|^2\right] \\ &=\left\|u\_{s,k}-u^\*\right\|^2-2\eta\mathbb{E}\left[\left\langle w\_k,u\_{s,k}-u^\*\right\rangle\right]+\eta^2\mathbb{E}\left[\left\|w\_k\right\|^2\right] \\ &\leq\left\|u\_{s,k}-u^\*\right\|^2-2\eta\left\langle\nabla F\left(u\_{s,k}\right),u\_{s,k}-u^\*\right\rangle \\ &\quad+4L\eta^2\left[F\left(u\_{s,k}\right)-F\left(u^\*\right)+F\left(\tilde{u}^s\right)-F\left(u^\*\right)\right] \\ &\leq\left\|u\_{s,k}-u^\*\right\|^2-2\eta\left[F\left(u\_{s,k}\right)-F\left(u^\*\right)\right] \\ &\quad+4L\eta^2\left[F\left(u\_{s,k}\right)-F\left(u^\*\right)+F\left(\tilde{u}^s\right)-F\left(u^\*\right)\right] \\ &=\left\|u\_{s,k}-u^\*\right\|^2-2\eta(1-2L\eta)\left[F\left(u\_{s,k}\right)-F\left(u^\*\right)\right] \\ &\quad+4L\eta^2\left[F\left(\tilde{u}^s\right)-F\left(u^\*\right)\right]. \end{aligned}$$

In the above, we have used (15) and the convexity of $F$, i.e., $\left\langle \nabla F(u\_{s,k}), u\_{s,k} - u^\* \right\rangle \ge F(u\_{s,k}) - F(u^\*)$.

We sum the expectations of the above inequality over $k = 0, 1, \ldots, m-1$. By the convexity of $F$ and the choice of $\tilde{u}^{s+1}$ under Option I, $F(\tilde{u}^{s+1}) = F\left(\frac{1}{m}\sum\_{k=0}^{m-1} u\_{s,k}\right) \le \frac{1}{m}\sum\_{k=0}^{m-1} F(u\_{s,k})$. Therefore,

$$\begin{aligned} &\mathbb{E}\left[\left\|u\_{s,m}-u^\*\right\|^2\right]+2\eta(1-2L\eta)m\mathbb{E}\left[F\left(\tilde{u}^{s+1}\right)-F\left(u^\*\right)\right] \\ &\leq\mathbb{E}\left[\left\|u\_{s,0}-u^\*\right\|^2\right]+4L\eta^2m\mathbb{E}\left[F\left(\tilde{u}^s\right)-F\left(u^\*\right)\right] \\ &\leq\frac{2}{\sigma}\mathbb{E}\left[F\left(\tilde{u}^s\right)-F\left(u^\*\right)\right]+4L\eta^2m\mathbb{E}\left[F\left(\tilde{u}^s\right)-F\left(u^\*\right)\right]. \end{aligned}$$

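The last inequality uses $u\_{s,0} = \tilde{u}^s$ and the strong convexity of $F$, which gives $\|\tilde{u}^s - u^\*\|^2 \le \frac{2}{\sigma}\left[F(\tilde{u}^s) - F(u^\*)\right]$. Dropping the nonnegative term $\mathbb{E}\left[\|u\_{s,m} - u^\*\|^2\right]$ and dividing by $2\eta(1-2L\eta)m$, we obtain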

$$\begin{aligned} \mathbb{E}\left[F\left(\tilde{u}^{s+1}\right) - F\left(u^\*\right)\right] &\leq \left(\frac{1}{\sigma\eta\left(1 - 2L\eta\right)m} + \frac{2L\eta}{1 - 2L\eta}\right) \mathbb{E}\left[F\left(\tilde{u}^s\right) - F\left(u^\*\right)\right] \\ &= \alpha \mathbb{E}\left[F\left(\tilde{u}^s\right) - F\left(u^\*\right)\right]. \end{aligned}$$

Applying the above inequality recursively for $s = 0, 1, \ldots, S-1$, we have

$$\mathbb{E}\left[F\left(\tilde{u}^S\right) - F\left(u^\*\right)\right] \le \alpha^S \mathbb{E}\left[F\left(\tilde{u}^0\right) - F\left(u^\*\right)\right].$$
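On a small example, the key variance bound (15) can be checked exactly, because the expectation over a uniform $i$ is a finite average. The sketch below assumes a hypothetical $3 \times 3$ lifted matrix $H$ ($CB = 1$, $\mathbf{x}\_0 = 0$) and computes $u^\*$ by forward substitution, which is possible because $H$ is lower triangular with $CB \neq 0$. It then compares $\mathbb{E}\|w\_k\|^2$ with $4L[F(u\_{s,k}) - F(u^\*) + F(\tilde{u}^s) - F(u^\*)]$ at random pairs $(u\_{s,k}, \tilde{u}^s)$. Function names are illustrative.

```python
import random

# Hypothetical lifted system (N = 3, CB = 1), x0 = 0.
H = [[1.0, 0.0, 0.0],
     [0.6, 1.0, 0.0],
     [0.3, 0.6, 1.0]]
y_d = [1.0, 0.5, -0.2]
N = len(y_d)
L = max(sum(h * h for h in row) for row in H)

def F(u):
    e = [yd - sum(h * uj for h, uj in zip(row, u)) for yd, row in zip(y_d, H)]
    return sum(ei * ei for ei in e) / (2 * N)

def grad_f(i, u):
    e_i = y_d[i] - sum(h * uj for h, uj in zip(H[i], u))
    return [-h * e_i for h in H[i]]

def grad_F(u):
    grads = [grad_f(i, u) for i in range(N)]
    return [sum(col) / N for col in zip(*grads)]

def forward_sub(y):
    """Solve H u = y; H is lower triangular with nonzero diagonal CB."""
    u = []
    for i, row in enumerate(H):
        u.append((y[i] - sum(row[j] * u[j] for j in range(i))) / row[i])
    return u

u_star = forward_sub(y_d)
F_star = F(u_star)  # zero up to round-off (exact tracking, Assumption 2)

def sq_norm(v):
    return sum(x * x for x in v)

def variance_and_bound(u, u_snap):
    """Exact E||w||^2 over uniform i, and the right-hand side of (15)."""
    mu = grad_F(u_snap)
    Ew2 = sum(sq_norm([a - b + c for a, b, c in zip(grad_f(i, u), grad_f(i, u_snap), mu)])
              for i in range(N)) / N
    return Ew2, 4 * L * (F(u) - F_star + F(u_snap) - F_star)

rng = random.Random(0)
violations = 0
for _ in range(500):
    u = [rng.uniform(-2, 2) for _ in range(N)]
    u_snap = [rng.uniform(-2, 2) for _ in range(N)]
    Ew2, bound = variance_and_bound(u, u_snap)
    if Ew2 > bound + 1e-12:
        violations += 1
print(u_star, violations)
```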

**Remark 3** Theorem 3 indicates that Algorithm 1 converges at the rate $O(\alpha^S)$, which depends on the value of $\alpha$. If the system information $H$ is known, we generally take $\eta = \frac{0.1}{L}$ and $m = (n)$ to make $\alpha$ small, so that $\alpha$ is close to $\frac{1}{2}$. If the system information is unknown, suitable values of $\eta$ and $m$ must be found by experiment.
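A few lines of arithmetic show how $\alpha$ depends on $\eta$ and $m$. The sketch below simply evaluates $\alpha$ from Theorem 3. With $\eta = 0.1/L$, the second term equals $0.2/0.8 = 1/4$ and the first term equals $12.5\,\kappa/m$ with $\kappa = L/\sigma$. For example, $m = 50\kappa$ gives $\alpha = 1/2$. The values $L = 1$ and $\sigma = 0.01$ are hypothetical, and the function name is illustrative.

```python
def svrg_alpha(sigma, L, eta, m):
    """Contraction factor alpha of Theorem 3; requires 0 < 2*L*eta < 1."""
    if not 0 < 2 * L * eta < 1:
        raise ValueError("Theorem 3 assumes 0 < eta < 1/(2L)")
    return 1 / (sigma * eta * (1 - 2 * L * eta) * m) + 2 * L * eta / (1 - 2 * L * eta)

L, sigma = 1.0, 0.01           # hypothetical constants, ratio L / sigma = 100
kappa = L / sigma
eta = 0.1 / L
for m in (1000, 5000, 20000):  # inner-loop lengths 10*kappa, 50*kappa, 200*kappa
    print(m, svrg_alpha(sigma, L, eta, m))
```

The second term does not depend on $m$, so $\alpha \ge \frac{2L\eta}{1-2L\eta}$ for every $m$. With $\eta = 0.1/L$, lengthening the inner loop alone cannot push $\alpha$ below $\frac{1}{4}$. If $m$ is too short (here $m = 10\kappa$), then $\alpha > 1$ and the bound says nothing.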

# **3 SVRG-Based ILC Under Random Data Dropouts**

This section considers the SISO system of Sect. 2 but assumes that random data dropouts occur during error signal transmission. It consists of four parts: system description, algorithm design, convergence analysis, and numerical simulation.

# *3.1 System Description*

For the SISO system (2), we retain Assumptions 1–3 but assume that data dropouts occur during transmission of the error signal, as shown in Fig. 2. We further assume that the dropouts follow the Bernoulli distribution model (Shen, 2018). The ILC updating law (7) therefore becomes

$$
u\_{k+1} = u\_k + \eta \frac{1}{N} H^T \Gamma\_k e\_k,\tag{16}
$$

where $\Gamma\_k = \mathrm{diag}\{\gamma\_k(1), \gamma\_k(2), \ldots, \gamma\_k(N)\}$ and $\{\gamma\_k(i)\}\_{i=1}^N$ are i.i.d. Bernoulli random variables. Let $\gamma \triangleq \mathbb{E}[\gamma\_k(i)]$ be the successful transmission rate. $\gamma\_k(i) = 0$ means that a data dropout occurs at time $i$ of the $k$-th batch, and $\gamma\_k(i) = 1$ means that no dropout occurs.

**Fig. 2** ILC with random data dropouts

# *3.2 Algorithm Design*

Based on the gradient descent method, there are two main approaches to the data dropout problem:


The first method wastes a lot of time when data retransmission is slow. The second method saves the time spent on data retransmission, but its actual running time may exceed that of the first method when data retransmission is fast.

Based on the framework of Algorithm 1, SVRG-based ILC under error data dropouts uses the second method at each iteration and computes the full gradient with the first method every few iterations. Compared with the first method, this algorithm does not require data retransmission in most iterations. Compared with the second method, it converges significantly faster. It therefore balances convergence rate against data retransmission cost well and is more suitable for general data dropout cases.

The formal construction of the algorithm is given below.

First, we take the random gradient $\tilde{\nabla}\_k \triangleq -\frac{1}{\gamma N} H^T \Gamma\_k e\_k$. Since $\mathbb{E}[\Gamma\_k] = \gamma I$, we have $\mathbb{E}\left[\tilde{\nabla}\_k\right] = -\frac{1}{N} H^T e\_k = \nabla\_k$. For convenience of proof, we write $\tilde{\nabla}\_k$ as

$$\nabla \tilde{F}\_k \left( u\_k \right) = \frac{1}{\gamma N} \sum\_{\{i \mid \gamma\_k(i) = 1\}} \nabla f\_i \left( u\_k \right), \tag{17}$$

where $\nabla f\_i$ is defined in the same way as in (10). If there are no dropouts, then $\gamma = 1$ and $\gamma\_k(i) = 1$ for all $i$, so (17) reduces to (10).
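The unbiasedness $\mathbb{E}[\tilde{\nabla}\_k] = \nabla\_k$ behind (17) can be checked numerically. The sketch below assumes, consistently with the gradient $\nabla\_k = -\frac{1}{N}H^T e\_k$ stated above, that $F(u) = \frac{1}{2N}\|y\_d - Hu\|^2$, so that the dropout gradient is $-\frac{1}{\gamma N}H^T\Gamma\_k e\_k$. It averages this estimator over many random dropout patterns and compares the average with the full gradient; the toy `H`, `e` and the function names are hypothetical.

```python
import numpy as np

# Monte Carlo check that the dropout gradient in (17) is unbiased, assuming
# F(u) = (1/(2N)) ||y_d - H u||^2 so that grad F = -(1/N) H^T e.


def full_gradient(H, e):
    """nabla_k = -(1/N) H^T e_k."""
    return -H.T @ e / H.shape[0]


def dropout_gradient(H, e, mask, gamma):
    """-(1/(gamma N)) H^T Gamma_k e_k: only received entries (mask = 1) contribute."""
    return -H.T @ (mask * e) / (gamma * H.shape[0])


def monte_carlo_mean(H, e, gamma, trials, seed=0):
    """Average dropout_gradient over `trials` independent Bernoulli(gamma) masks."""
    rng = np.random.default_rng(seed)
    N = H.shape[0]
    total = np.zeros(N)
    for _ in range(trials):
        mask = (rng.random(N) < gamma).astype(float)
        total += dropout_gradient(H, e, mask, gamma)
    return total / trials


H = np.eye(10) + 0.5 * np.eye(10, k=-1)   # toy lower-triangular lifted matrix
e = np.ones(10)
estimate = monte_carlo_mean(H, e, gamma=0.7, trials=20000)
```

The averaged estimate agrees with `full_gradient(H, e)` up to Monte Carlo error, even though a single draw can differ substantially from it.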

**Remark 4** Equation (17) resembles the batch gradient descent (BGD) method in machine learning, but the two are fundamentally different. In (17), the number of terms $\nabla f\_i$ in each sum $\sum\_i \nabla f\_i$ varies with the realization of the random vector $\{\gamma\_k(i)\}\_{i=1}^N$, whereas in BGD this number is fixed. Therefore, the algorithm based on the gradient $\nabla \tilde{F}\_k$ cannot be directly applied to BGD.

Second, similarly to (11), the ILC updating law under random data dropouts is constructed as

$$
u\_{s,k+1} = u\_{s,k} - \eta \left( \nabla \tilde{F}\_k \left( u\_{s,k} \right) - \nabla \tilde{F}\_k \left( \tilde{u}^s \right) + \nabla F \left( \tilde{u}^s \right) \right). \tag{18}
$$

Finally, change the iteration length $m$ in Algorithm 1 from $2N$ to $2\gamma N$. Because the expected number of summands in each batch is $\mathbb{E}\left[\sum\_{i=1}^N \gamma\_k(i)\right] = \gamma N$, $\nabla \tilde{F}\_k$ is, in expectation, a sum of $\gamma N$ terms $\nabla f\_i$.

In conclusion, SVRG-based ILC under random data dropouts is shown in Algorithm 2.

**Algorithm 2** Data dropout SISO SVRG-based ILC

```
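Input: η, u_{0,0};
 m ← 2γN;  ũ^0 ← u_{0,0};
 for s ← 0 to S − 1 do
   u_{s,0} ← ũ^s;  μ^s ← ∇F(ũ^s);              // full gradient (dropped data retransmitted)
   for k ← 0 to m − 1 do
      w_k ← ∇F̃_k(u_{s,k}) − ∇F̃_k(ũ^s) + μ^s;   // update (18): same dropout pattern Γ_k at both points
      u_{s,k+1} ← u_{s,k} − η w_k;
   end for
   ũ^{s+1} ← (1/m) Σ_{k=0}^{m−1} u_{s,k};       // Option I snapshot
 end for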
```
For the "snapshot" of Algorithm 2, the update follows Option I in Algorithm 1, and the recommended iteration length is $m = 2\gamma N$. When γ is unknown, an appropriate $m$ has to be found experimentally.
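A compact NumPy sketch of Algorithm 2 follows. It assumes the objective $F(u) = \frac{1}{2N}\|y\_d - Hu\|^2$ with $\nabla f\_i(u) = -H\_i^T(y\_d(i) - H\_i u)$, where $H\_i$ is the $i$-th row of $H$ (an assumption consistent with (17), since (9) and (10) are not restated here), a zero initial input, and simulated Bernoulli dropouts. The full gradient at each snapshot is computed as if all dropped data had been retransmitted. Function names and the fixed random seed are illustrative choices, not part of the original method.

```python
import numpy as np

# Illustrative sketch of Algorithm 2 (not the authors' code). Assumes
# F(u) = (1/(2N)) ||y_d - H u||^2 and grad f_i(u) = -H_i^T (y_d(i) - H_i u).


def loss(H, y_d, u):
    """F(u) = (1/(2N)) ||y_d - H u||^2 (assumed form of the objective)."""
    e = y_d - H @ u
    return 0.5 * (e @ e) / len(y_d)


def svrg_ilc_dropout(H, y_d, gamma, eta, S, seed=0):
    """Run S outer iterations of SVRG-based ILC under Bernoulli error dropouts."""
    rng = np.random.default_rng(seed)
    N = H.shape[0]
    m = max(1, int(round(2 * gamma * N)))      # recommended inner length 2*gamma*N

    def grad_dropout(u, mask):
        # Stochastic gradient (17): only the received error entries contribute.
        return -H.T @ (mask * (y_d - H @ u)) / (gamma * N)

    u_snap = np.zeros(N)                       # snapshot u~^0 = u_{0,0} = 0
    for _ in range(S):
        mu = -H.T @ (y_d - H @ u_snap) / N     # full gradient: all data retransmitted
        u = u_snap.copy()
        running_sum = np.zeros(N)
        for _ in range(m):
            mask = (rng.random(N) < gamma).astype(float)   # dropout pattern Gamma_k
            # Update (18): the same mask is used at u_{s,k} and at the snapshot.
            w = grad_dropout(u, mask) - grad_dropout(u_snap, mask) + mu
            running_sum += u                   # accumulate u_{s,0}, ..., u_{s,m-1}
            u = u - eta * w
        u_snap = running_sum / m               # Option I: average of inner iterates
    return u_snap
```

With γ = 1 every mask is all ones, so $w\_k = \nabla F(u\_{s,k})$ and the inner loop reduces to plain gradient descent; the step size `eta` still has to be tuned for the system at hand.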

# *3.3 Convergence Analysis*

By Proposition 1, every *fi* is convex and *L*-smooth, and the following proposition holds:

**Proposition 3** *Each realization of* $\tilde{F}\_k$ *is convex and* $L'$*-smooth, where* $L' = L/\gamma$ *and L is the Lipschitz constant corresponding to the smoothness of* $f\_i$ *in Proposition 1.*

*Proof* Since every $f\_i$ is convex and $L$-smooth, $\frac{1}{\gamma N}\sum\_{i=1}^N \nabla f\_i(u\_k)$ is $L'$-Lipschitz continuous with $L' = L/\gamma$. Each realization of $\nabla \tilde{F}\_k(u\_k)$ combines the $\nabla f\_i$ with the same coefficient $\frac{1}{\gamma N}$, but its number of summands does not exceed that of $\frac{1}{\gamma N}\sum\_{i=1}^N \nabla f\_i(u\_k)$, so it is also $L'$-Lipschitz continuous; moreover, as a nonnegative combination of convex functions, $\tilde{F}\_k$ is convex. As a result, each realization of $\tilde{F}\_k$ is convex and $L'$-smooth.
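The bound in Proposition 3 can be probed numerically for quadratic $f\_i$. Assuming $f\_i(u) = \frac{1}{2}(y\_d(i) - H\_i u)^2$, each $f\_i$ is smooth with constant $\|H\_i\|^2$, and the gradient of $\tilde{F}\_k$ has Lipschitz constant $\left\|\frac{1}{\gamma N}H^T\Gamma\_k H\right\|\_2$. The sketch below checks that this never exceeds $L/\gamma$ over random dropout patterns; it is a sanity check under these assumptions with a hypothetical toy matrix, not a proof.

```python
import numpy as np

# Numerical sanity check of Proposition 3, assuming f_i(u) = (1/2)(y_d(i) - H_i u)^2.


def row_smoothness(H):
    """Smoothness constant of each f_i: the Hessian H_i^T H_i has norm ||H_i||^2."""
    return np.sum(H ** 2, axis=1)


def max_dropout_lipschitz(H, gamma, trials, seed=0):
    """Largest Lipschitz constant of grad F~_k observed over random dropout patterns."""
    rng = np.random.default_rng(seed)
    N = H.shape[0]
    worst = 0.0
    for _ in range(trials):
        mask = (rng.random(N) < gamma).astype(float)
        hessian = H.T @ (mask[:, None] * H) / (gamma * N)   # (1/(gamma N)) H^T Gamma_k H
        worst = max(worst, np.linalg.norm(hessian, 2))     # spectral norm
    return worst


H = 0.5 * np.tril(np.ones((8, 8))) + 0.5 * np.eye(8)     # toy lifted matrix
gamma = 0.6
L = row_smoothness(H).max()
observed = max_dropout_lipschitz(H, gamma, trials=500)
```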

For the convergence of Algorithm 2, we have the following theorem.

**Theorem 4** *If each realization of* $\tilde{F}\_k$ *is convex and* $L'$*-smooth, F is L-smooth and σ-strongly convex, and, for the optimal point* $u^\* = \arg\min\_u F(u)$*, m is large enough that*

$$\alpha = \frac{1}{\sigma \eta \left(1 - 2L'\eta\right)m} + \frac{2L'\eta}{1 - 2L'\eta} < 1.$$

*Then the convergence of Algorithm 2 satisfies*

$$\mathbb{E}\left[F\left(\tilde{u}^S\right) - F\left(u^\*\right)\right] \le \alpha^S \left(F\left(\tilde{u}^0\right) - F\left(u^\*\right)\right).$$

*Proof* By Proposition 3, replacing $f\_i$ in the proof of Theorem 3 with $\tilde{F}\_k$, Eq. (13) is rewritten as:

$$\left\|\nabla \tilde{F}\_k(u) - \nabla \tilde{F}\_k\left(u^\*\right)\right\|^2 \le \frac{1}{\gamma N} \sum\_{\{i \mid \gamma\_k(i) = 1\}} 2L' \left[f\_i(u) - f\_i\left(u^\*\right) - \left\langle \nabla f\_i\left(u^\*\right), u - u^\* \right\rangle\right].$$

The corresponding Eq. (14) is

$$\begin{split} &\mathbb{E}\left[\left\|\nabla\tilde{F}\_{k}(u)-\nabla\tilde{F}\_{k}\left(u^{\*}\right)\right\|^{2}\right] \\ &\leq \frac{1}{\gamma N}\mathbb{E}\left[\sum\_{\{i\mid\gamma\_{k}(i)=1\}}2L^{\prime}\left[f\_{i}(u)-f\_{i}\left(u^{\*}\right)-\left\langle\nabla f\_{i}\left(u^{\*}\right),u-u^{\*}\right\rangle\right]\right] \\ &=\frac{1}{\gamma N}\,\gamma N\,\mathbb{E}\left[2L^{\prime}\left[F(u)-F\left(u^{\*}\right)-\left\langle\nabla F\left(u^{\*}\right),u-u^{\*}\right\rangle\right]\right] \\ &=2L^{\prime}\mathbb{E}\left[F(u)-F\left(u^{\*}\right)\right]. \end{split}$$

The rest of the proof repeats the proof of Theorem 3.

**Remark 5** Theorem 4 shows that Algorithm 2 also converges linearly, with a rate governed by α. Regarding the choice of $m$, Theorem 4 differs from Theorem 3 only in the Lipschitz constant of the smoothness condition: by Proposition 3, $L' = L/\gamma$ appears in

$$\alpha = \frac{1}{\sigma \eta \left(1 - 2L' \eta \right) m} + \frac{2L' \eta}{1 - 2L' \eta}.$$

We therefore consider multiplying $m$ by γ, i.e., changing $m$ from $2N$ to $2\gamma N$, to approximately preserve the convergence of Algorithm 2.
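The following Python sketch evaluates α from Theorem 4 with $L' = L/\gamma$. Since $\alpha = \frac{A}{m} + B$ with $A = \frac{1}{\sigma\eta(1-2L'\eta)}$ and $B = \frac{2L'\eta}{1-2L'\eta}$, a short calculation shows that $\alpha < 1$ holds exactly when $4L'\eta < 1$ and $m > \frac{1}{\sigma\eta(1-4L'\eta)}$; the second helper returns the smallest such integer $m$. The helper names and the parameter values in the example are illustrative, not taken from the paper.

```python
import math

# Evaluate alpha of Theorem 4 with L' = L / gamma (Proposition 3).


def svrg_rate(sigma, eta, L, gamma, m):
    """alpha = 1/(sigma eta (1 - 2L'eta) m) + 2L'eta / (1 - 2L'eta)."""
    Lp = L / gamma
    c = 1.0 - 2.0 * Lp * eta
    if c <= 0:
        raise ValueError("alpha is only defined here for eta < 1/(2L')")
    return 1.0 / (sigma * eta * c * m) + 2.0 * Lp * eta / c


def smallest_inner_length(sigma, eta, L, gamma):
    """Smallest integer m with alpha < 1, using alpha < 1 iff m > 1/(sigma eta (1 - 4L'eta))."""
    Lp = L / gamma
    if 4.0 * Lp * eta >= 1.0:
        raise ValueError("alpha < 1 requires eta < 1/(4L')")
    return math.floor(1.0 / (sigma * eta * (1.0 - 4.0 * Lp * eta))) + 1
```

For instance, with σ = 1, L = 1, γ = 1 and η = 0.1, `svrg_rate` gives α = 0.375 for m = 100, and `smallest_inner_length` returns 17.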

# *3.4 Numerical Simulation*

In SISO system (1), take the system matrix (*A*, *B*,*C*) as

$$A = \begin{bmatrix} 0.50 & -0.25 & 1.00 \\ 0.15 & 0.30 & -0.50 \\ -0.75 & 0.25 & -0.25 \end{bmatrix}, \quad B = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}, \quad C = \begin{bmatrix} 0 & 0 & 1.0 \end{bmatrix}.$$

Take the desired trajectory $y\_d(t) = \sin(2\pi t/50)$, time length $N = 50$, initial state $x\_0 = 0$, and initial input $u\_0 = 0$. For γ = 0.9 and γ = 0.6, the performance of ILC based on the full gradient (GD), the stochastic gradient (SGD), and the stochastic variance reduced gradient (SVRG) is shown in Fig. 3.
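For reproducibility, the lifted matrix of this example can be built from $(A, B, C)$. The sketch assumes the standard lifted form of (2) with $x\_0 = 0$, in which the $(t, j)$ entry of $H$ is the Markov parameter $CA^{t-j}B$ for $j \le t$, mapping $u(0), \dots, u(N-1)$ to $y(1), \dots, y(N)$; the function and variable names are illustrative.

```python
import numpy as np

# Example system of Sect. 3.4; the lifted form below is an assumption about (2).
A = np.array([[0.50, -0.25, 1.00],
              [0.15, 0.30, -0.50],
              [-0.75, 0.25, -0.25]])
B = np.array([[0.0], [0.0], [1.0]])
C = np.array([[0.0, 0.0, 1.0]])


def lifted_matrix(A, B, C, N):
    """H[t, j] = C A^(t-j) B for j <= t: maps u(0..N-1) to y(1..N) when x0 = 0."""
    markov = [(C @ np.linalg.matrix_power(A, j) @ B).item() for j in range(N)]
    H = np.zeros((N, N))
    for t in range(N):
        for j in range(t + 1):
            H[t, j] = markov[t - j]
    return H


N = 50
H = lifted_matrix(A, B, C, N)
y_d = np.sin(2 * np.pi * np.arange(1, N + 1) / N)   # y_d(t) for t = 1, ..., N
u0 = np.zeros(N)                                     # initial input u_0 = 0
```

Here $CB = 1$, so every diagonal entry of $H$ equals 1.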

When the full gradient is computed, each data retransmission adds 1 to the iteration count; that is, the controller skips one round of computation per retransmission until all error information has been transmitted. For each of the three methods, we use the best step size for which it converges.
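The retransmission overhead of the full-gradient (GD) baseline can be estimated by simulation. The sketch below is a hypothetical model, not the authors' protocol: it assumes that only the dropped samples are resent and that each resent sample again arrives independently with probability γ, and it counts the transmission rounds until all $N$ error samples are available. Under the counting rule above, each round beyond the first adds one iteration.

```python
import numpy as np

# Hypothetical retransmission model: only dropped samples are resent, and each
# resent sample again arrives independently with probability gamma.


def transmission_rounds(N, gamma, rng):
    """Number of rounds until all N error samples have been received."""
    if not 0.0 < gamma <= 1.0:
        raise ValueError("gamma must lie in (0, 1]")
    missing = N
    rounds = 0
    while missing > 0:
        rounds += 1
        missing -= rng.binomial(missing, gamma)   # samples received this round
    return rounds
```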

When γ = 0.9, Fig. 3a shows that the SVRG-based ILC converges slightly faster than the GD- and SGD-based ILC. When γ = 0.6, Fig. 3b shows a significant difference among the convergence speeds of the three types of ILC: from fastest to slowest, they are the SVRG-, SGD-, and GD-based ILC. In summary, the SVRG-based ILC under error data dropouts (Algorithm 2) outperforms the GD- and SGD-based ILC at both tested successful transmission rates, and the advantage becomes more pronounced as γ decreases.

# **4 Model-Free SVRG-Based ILC for MIMO Systems**

This section extends Algorithm 1 in Sect. 2 from SISO systems to MIMO systems. First, a description of the discrete linear MIMO system is given. Second, existing model-free data-driven methods are introduced, and a new model-free data-driven ILC based on the SVRG method is constructed. Third, the convergence of the algorithm under non-strongly convex conditions is proved. Finally, numerical simulations are conducted to verify the convergence performance of the SVRG-based ILC in deterministic and noisy systems.

# *4.1 System Description*

Consider the following discrete linear multi-input multi-output (MIMO) system $\mathcal{J}$ with $q$ inputs $u\_k^1, u\_k^2, \dots, u\_k^q$ and $p$ outputs $y\_k^1, y\_k^2, \dots, y\_k^p$. Rewriting the system in the form of (2) gives

$$\mathbf{y}\_k = \mathcal{J}\mathbf{u}\_k,\tag{19}$$

where

$$\mathcal{J} = \begin{bmatrix} J\_{11} & J\_{12} & \cdots & J\_{1q} \\ J\_{21} & J\_{22} & \cdots & J\_{2q} \\ \vdots & \vdots & \ddots & \vdots \\ J\_{p1} & J\_{p2} & \cdots & J\_{pq} \end{bmatrix}, \quad \mathbf{u}\_k = \begin{bmatrix} u\_k^1 \\ u\_k^2 \\ \vdots \\ u\_k^q \end{bmatrix}, \quad \mathbf{y}\_k = \begin{bmatrix} y\_k^1 \\ y\_k^2 \\ \vdots \\ y\_k^p \end{bmatrix}.$$

Each $J\_{ij} \in \mathbb{R}^{N\times N}$ has the same properties as the matrix $H$ in (2); $y\_k^i = \left[y\_k^i(1), \dots, y\_k^i(N)\right]^T$, $u\_k^j = \left[u\_k^j(0), \dots, u\_k^j(N-1)\right]^T$, $N$ is the time length, and the desired trajectory is $\mathbf{y}\_d = \left[(y\_d^1)^T, (y\_d^2)^T, \dots, (y\_d^p)^T\right]^T$.

For this system, consider the following assumptions:

**Assumption 4** The system matrix $\mathcal{J} \neq 0$.

**Assumption 5** The dimension of input signal does not exceed the dimension of output signal, i.e., *p* ≥ *q*.

**Remark 6** If $p = q = 1$, system (19) degenerates to SISO system (2). Unlike Assumption 1, Assumption 4 no longer guarantees that the system matrix $\mathcal{J}$ has full rank. Assumption 4 is the most fundamental assumption, since if $\mathcal{J} = 0$, no input signal can track $\mathbf{y}\_d$. Assumption 2 is also relaxed for this system; the reason is given in the convergence analysis. In addition, Assumption 5 is added for the MIMO system, because if the input dimension exceeds the output dimension, the input contains redundant information.

For the desired trajectory $\mathbf{y}\_d$, consider a control objective similar to (6), i.e., find a sequence $\{\mathbf{u}\_k\}$ such that

$$G\left(\mathbf{u}\_{k}\right) = \frac{1}{2p} \left\|\mathbf{e}\_{k}\right\|^{2} = \frac{1}{2p} \left\|\mathbf{y}\_{d} - \mathcal{J}\mathbf{u}\_{k}\right\|^{2}, \quad \lim\_{k \to \infty} G\left(\mathbf{u}\_{k}\right) = G\left(\mathbf{u}^{\*}\right), \tag{20}$$

where $\mathbf{u}^\*$ is the input that minimizes $G$, and the error signal $\mathbf{e}\_k$ is

$$\mathbf{e}\_{k} = \mathbf{y}\_{d} - \mathbf{y}\_{k} = \begin{bmatrix} y\_{d}^{1} - y\_{k}^{1} \\ y\_{d}^{2} - y\_{k}^{2} \\ \vdots \\ y\_{d}^{p} - y\_{k}^{p} \end{bmatrix} = \begin{bmatrix} e\_{k}^{1} \\ e\_{k}^{2} \\ \vdots \\ e\_{k}^{p} \end{bmatrix}.$$

# *4.2 Algorithm Design*

The full gradient of $G$ in (20) is $\nabla\_k = \nabla G(\mathbf{u}\_k) = -\frac{1}{p}\mathcal{J}^T\left(\mathbf{y}\_d - \mathcal{J}\mathbf{u}\_k\right)$, whose computation requires $\mathcal{J}^T$. In model-free learning, however, we want to obtain the gradient only by conducting experiments on system $\mathcal{J}$. For this purpose, Oomen et al. (2014) give the following method to estimate $\mathcal{J}^T$.

**Lemma 1** *For the SISO system* $\mathcal{J} = J\_{11}$*, the transpose* $J\_{11}^T$ *can be obtained by matrix multiplication*

$$(J\_{11})^T = \mathcal{T}\_N J\_{11} \mathcal{T}\_N,$$

*where* $\mathcal{T}\_N$ *is the permutation matrix of order N with ones on the anti-diagonal, i.e.,*

$$\mathcal{T}\_N = \begin{bmatrix} 0 & \cdots & 0 & 1 \\ 0 & \cdots & 1 & 0 \\ \vdots & & \vdots & \vdots \\ 1 & \cdots & 0 & 0 \end{bmatrix}.$$

*Therefore, the full gradient of the SISO system,* $-\frac{1}{p}J\_{11}^T\mathbf{e}\_k = -\frac{1}{p}\mathcal{T}\_N J\_{11}\mathcal{T}\_N\mathbf{e}\_k$*, can be obtained by a single experiment.*
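Lemma 1 can be checked with NumPy on a random lower-triangular Toeplitz matrix, which is the structure of a lifted SISO system. The helper `transpose_times` shows how $J\_{11}^T\mathbf{e}$ is obtained from one experiment: reverse the signal, apply the system once, and reverse the result. Here `experiment` stands for a hypothetical callable that runs the plant on an input sequence; both helper names are illustrative.

```python
import numpy as np

# Sketch of Lemma 1 (helper names hypothetical).


def toeplitz_lower(markov):
    """Lower-triangular Toeplitz matrix with first column `markov` (a lifted SISO system)."""
    N = len(markov)
    J = np.zeros((N, N))
    for t in range(N):
        J[t, : t + 1] = markov[t::-1]         # J[t, j] = markov[t - j] for j <= t
    return J


def transpose_times(experiment, e):
    """J^T e = T_N J T_N e: reverse e, run one experiment, reverse the output."""
    return experiment(e[::-1])[::-1]
```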

**Lemma 2** *For the MIMO system* $\mathcal{J}$*, the transpose* $\mathcal{J}^T$ *is*


$$\mathcal{J}^{T} = \begin{bmatrix} J\_{11}^{T} & \cdots & J\_{p1}^{T} \\ \vdots & \ddots & \vdots \\ J\_{1q}^{T} & \cdots & J\_{pq}^{T} \end{bmatrix} = \underbrace{\begin{bmatrix} \mathcal{T}\_{N} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \mathcal{T}\_{N} \end{bmatrix}}\_{T^{qN}} \underbrace{\begin{bmatrix} J\_{11} & \cdots & J\_{p1} \\ \vdots & \ddots & \vdots \\ J\_{1q} & \cdots & J\_{pq} \end{bmatrix}}\_{\tilde{\mathcal{J}}} \underbrace{\begin{bmatrix} \mathcal{T}\_{N} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \mathcal{T}\_{N} \end{bmatrix}}\_{T^{pN}}.$$

For non-symmetric MIMO systems, the middle factor $\tilde{\mathcal{J}}$ above (whose $(i, j)$ block is $J\_{ji}$) differs from $\mathcal{J}$, so the full gradient $-\frac{1}{p}\mathcal{J}^T\mathbf{e}\_k$ of a MIMO system cannot be obtained from a single experiment on system $\mathcal{J}$. The method proposed by Oomen et al. (2014) estimates $\mathcal{J}^T$ from $pq$ experiments:

$$\mathcal{J}^T = T^{qN} \left( \sum\_{i=1}^q \sum\_{j=1}^p \mathcal{L}^{ij} \mathcal{J} \mathcal{L}^{ij} \right) T^{pN},\tag{21}$$

where $\mathcal{L}^{ij}$ is a matrix consisting of $q \times p$ blocks whose $(i, j)$ block is the identity matrix of order $N$ and whose remaining blocks are all 0:

$$\mathcal{L}^{ij} = \begin{bmatrix} 0 & \cdots & 0 & \cdots & 0 \\ \vdots & \ddots & \vdots & & \vdots \\ 0 & \cdots & I\_N & \cdots & 0 \\ \vdots & & \vdots & \ddots & \vdots \\ 0 & \cdots & 0 & \cdots & 0 \end{bmatrix} \in \mathbb{R}^{qN \times pN}.$$
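Equation (21) can be verified numerically. In the sketch below, `block_unit(i, j, ...)` builds $\mathcal{L}^{ij}$ with 0-based block indices, and each term $\mathcal{L}^{ij}\mathcal{J}\mathcal{L}^{ij}$ (one experiment) places the single block $J\_{ji}$ at position $(i, j)$. The blocks of $\mathcal{J}$ are assumed to be lower-triangular Toeplitz, as for $H$ in (2); the helper names are hypothetical.

```python
import numpy as np

# Numerical check of (21) with 0-based block indices (helper names hypothetical).


def block_unit(i, j, q, p, N):
    """L^{ij} in R^{qN x pN}: identity in block (i, j), zeros elsewhere."""
    E = np.zeros((q, p))
    E[i, j] = 1.0
    return np.kron(E, np.eye(N))


def transpose_by_experiments(J, p, q, N):
    """J^T = T^{qN} (sum_ij L^{ij} J L^{ij}) T^{pN}; each term is one experiment."""
    T = np.fliplr(np.eye(N))
    acc = np.zeros((q * N, p * N))
    for i in range(q):
        for j in range(p):
            L_ij = block_unit(i, j, q, p, N)
            acc += L_ij @ J @ L_ij            # only block (i, j) is nonzero: J_{ji}
    return np.kron(np.eye(q), T) @ acc @ np.kron(np.eye(p), T)
```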

In (21), left multiplication by $\mathcal{L}^{ij}$ extracts a single block row of $\mathcal{J}$ and right multiplication extracts a single block column, so each experiment retrieves only one block and most of the system information is lost. We improve on this method by extracting as much system information as possible from each experiment. To this end, consider the following decomposition, analogous to (9).

Set $g\_i(\mathbf{u}\_k) = \frac{1}{2}\left\|e\_k^i\right\|^2 = \frac{1}{2}\left\|y\_d^i - \sum\_{j=1}^q J\_{ij} u\_k^j\right\|^2$; then (20) can be written as

$$G\left(\mathbf{u}\_{k}\right) = \frac{1}{p} \sum\_{i=1}^{p} g\_{i}\left(\mathbf{u}\_{k}\right). \tag{22}$$

Taking the gradient of both sides, we have

$$\nabla G\left(\mathbf{u}\_{k}\right) = \frac{1}{p} \sum\_{i=1}^{p} \nabla g\_{i}\left(\mathbf{u}\_{k}\right),\tag{23}$$

where

$$\nabla g\_i(\mathbf{u}\_k) = -\begin{bmatrix} J\_{i1}^T e\_k^i \\ J\_{i2}^T e\_k^i \\ \vdots \\ J\_{iq}^T e\_k^i \end{bmatrix} \in \mathbb{R}^{qN}.$$
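Decomposition (22)-(23) is easy to check numerically: averaging the $p$ partial gradients $\nabla g\_i$ recovers the full gradient $-\frac{1}{p}\mathcal{J}^T\mathbf{e}\_k$. The sketch stores $\mathbf{u}\_k$, $\mathbf{y}\_d$ and $\mathbf{e}\_k$ as stacked NumPy vectors with blocks of length $N$ and uses 0-based block indices; the function names are illustrative.

```python
import numpy as np

# Check of (22)-(23) on stacked vectors with blocks of length N (0-based i).


def grad_g(J, y_d, u, i, N):
    """grad g_i(u) = -[J_i1^T e^i; ...; J_iq^T e^i] = -J_i^T e^i, J_i = block row i."""
    J_i = J[i * N:(i + 1) * N, :]
    e_i = y_d[i * N:(i + 1) * N] - J_i @ u
    return -J_i.T @ e_i


def grad_G(J, y_d, u, p):
    """Full gradient of G(u) = (1/(2p)) ||y_d - J u||^2, i.e. -(1/p) J^T e."""
    return -J.T @ (y_d - J @ u) / p
```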

Note that $\nabla g\_i(\mathbf{u}\_k)$ can be computed from a single block row of the system matrix. The following lemma helps us design a controller that extracts this information.

**Lemma 3** *Calculating* ∇*gi* (*u<sup>k</sup>* ) *only needs a single experiment.*

*Proof* First, we note that

$$
\underbrace{\begin{bmatrix} 0, \dots, 0, \mathcal{T}\_{N}, 0, \dots, 0 \end{bmatrix}}\_{T\_{i}^{pN}} \underbrace{\begin{bmatrix} J\_{11} & J\_{12} & \dots & J\_{1q} \\ J\_{21} & J\_{22} & \dots & J\_{2q} \\ \vdots & \vdots & \ddots & \vdots \\ J\_{p1} & J\_{p2} & \dots & J\_{pq} \end{bmatrix}}\_{\mathcal{J}} \underbrace{\begin{bmatrix} \mathcal{T}\_{N} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \mathcal{T}\_{N} \end{bmatrix}}\_{T^{qN}} = \begin{bmatrix} J\_{i1}^{T}, \dots, J\_{iq}^{T} \end{bmatrix} \in \mathbb{R}^{N \times qN},
$$

where $T\_i^{pN} \in \mathbb{R}^{N\times pN}$ is the matrix whose $i$-th block is $\mathcal{T}\_N$ and whose other blocks are 0. For $e\_k^i \in \mathbb{R}^N$,

$$e\_k^i = \underbrace{\begin{bmatrix} 0, \dots, 0, I\_N, 0, \dots, 0 \end{bmatrix}}\_{L\_i} \mathbf{e}\_k,$$

where $L\_i \in \mathbb{R}^{N\times pN}$ is the matrix whose $i$-th block is the identity matrix of order $N$ and whose other blocks are 0.

Matrix multiplication can retrieve one block row of the system, but it does not directly produce the quantities needed for further computation. Therefore, simple and easy-to-implement linear mappings are used to reshape the matrices to suitable dimensions.

Let $E\_k^i \in \mathbb{R}^{qN\times q}$ and define a linear mapping $\Psi: \mathbb{R}^N \to \mathbb{R}^{qN\times q}$:

$$
\Psi e\_k^i = E\_k^i = \begin{bmatrix} e\_k^i & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & e\_k^i \end{bmatrix},
$$

where $E\_k^i$ is a block matrix with $N \times 1$ blocks, having $e\_k^i$ on the diagonal and 0 in the remaining blocks.

Since

$$T\_i^{pN} \mathcal{J} T^{qN} E\_k^i = \begin{bmatrix} J\_{i1}^T e\_k^i, \dots, J\_{iq}^T e\_k^i \end{bmatrix} \in \mathbb{R}^{N \times q},$$

we can define the linear map $\Phi: \mathbb{R}^{N\times q} \to \mathbb{R}^{qN}$, which maps a matrix in $\mathbb{R}^{N\times q}$ to $\mathbb{R}^{qN}$ by stacking its columns in order into a vector, i.e.,

$$\Phi\left(\left[\mathbf{a}\_1, \dots, \mathbf{a}\_q\right]\right) = \left[\mathbf{a}\_1^T, \dots, \mathbf{a}\_q^T\right]^T.$$

Combining the above, we have

$$\nabla g\_i\left(\mathbf{u}\_k\right) = -\Phi\left(T\_i^{pN} \mathcal{J} T^{qN} \Psi\left(L\_i \mathbf{e}\_k\right)\right).$$

Thus, ∇*gi* (*u<sup>k</sup>* ) can be calculated in a single experiment.
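The construction in the proof of Lemma 3 can be reproduced with Kronecker products. In this sketch (0-based index `i`, hypothetical helper name), $\Psi$ is realized as `np.kron(np.eye(q), e_i[:, None])` and $\Phi$ as column-major flattening; the blocks of $\mathcal{J}$ are assumed to be lower-triangular Toeplitz so that $\mathcal{T}\_N J\_{ij}\mathcal{T}\_N = J\_{ij}^T$.

```python
import numpy as np

# Sketch of the Lemma 3 construction (0-based block index i; helper name hypothetical).


def grad_g_lemma3(J, e, i, p, q, N):
    """Return -Phi(T_i^{pN} J T^{qN} Psi(L_i e)) for the stacked error e in R^{pN}."""
    T = np.fliplr(np.eye(N))                  # T_N: ones on the anti-diagonal
    unit = np.zeros((1, p))
    unit[0, i] = 1.0
    T_i = np.kron(unit, T)                    # T_i^{pN}: i-th block is T_N
    L_i = np.kron(unit, np.eye(N))            # L_i: i-th block is I_N
    T_q = np.kron(np.eye(q), T)               # T^{qN} = diag(T_N, ..., T_N)
    e_i = L_i @ e                             # e_k^i = L_i e_k
    E_i = np.kron(np.eye(q), e_i[:, None])    # Psi(e_k^i): e_k^i on the diagonal blocks
    M = T_i @ J @ T_q @ E_i                   # [J_i1^T e^i, ..., J_iq^T e^i], N x q
    return -M.reshape(-1, order="F")          # Phi: stack the q columns into a vector
```

The result can be compared directly with $-J\_i^T e\_k^i$, where $J\_i$ is the $i$-th block row of $\mathcal{J}$.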

Based on Lemma 3, a controller can be designed as shown in Fig. 4.

This controller reduces the number of experiments needed to compute the full gradient from $pq$ in (21) to $p$. However, when the system is noisy, there is no guarantee that the partial gradients $\nabla g\_i(\mathbf{u}\_k)$ estimated in different experiments all correspond to the same full gradient $\nabla G(\mathbf{u}\_k)$. In noisy systems, we can use the random gradient $\tilde{\nabla}\_k$, which takes values uniformly in $\{\nabla g\_i(\mathbf{u}\_k)\}\_{i=1}^p$, i.e., $P\left(\tilde{\nabla}\_k = \nabla g\_i(\mathbf{u}\_k)\right) = \frac{1}{p}$. However, the SGD method converges slowly. Taking into account both the convergence speed and the effect of noise in the system, we design an SVRG-based ILC algorithm similar to Algorithm 1, as shown in Algorithm 3.

**Algorithm 3** Data-driven MIMO SVRG-based ILC

$$
\begin{array}{l}
\textbf{Input: } \eta,\ \mathbf{u}\_{0,0};\quad m \leftarrow 2p;\quad \tilde{\mathbf{u}}^{0} \leftarrow \mathbf{u}\_{0,0};\\
\textbf{for } s \leftarrow 0 \textbf{ to } S-1 \textbf{ do}\\
\quad \mathbf{u}\_{s,0} \leftarrow \tilde{\mathbf{u}}^{s},\ \boldsymbol{\mu}^{s} \leftarrow \nabla G\left(\tilde{\mathbf{u}}^{s}\right);\\
\quad \textbf{for } k \leftarrow 0 \textbf{ to } m-1 \textbf{ do}\\
\qquad \mathbf{w}\_{k} \leftarrow \nabla g\_{i}\left(\mathbf{u}\_{s,k}\right) - \nabla g\_{i}\left(\tilde{\mathbf{u}}^{s}\right) + \boldsymbol{\mu}^{s}, \text{ where } i \text{ is drawn randomly from } \{1, 2, \dots, p\};\\
\qquad \mathbf{u}\_{s,k+1} \leftarrow \mathbf{u}\_{s,k} - \eta \mathbf{w}\_{k};\\
\quad \textbf{end for}\\
\quad \tilde{\mathbf{u}}^{s+1} \leftarrow \mathbf{u}\_{s,m};\\
\textbf{end for}
\end{array}
$$

Algorithm 3 uses Option II in Algorithm 1 to update the "snapshot". With $m = 2p$, each inner loop requires $2p$ experiments, and computing the full gradient requires another $p$ experiments, so each outer iteration requires $3p$ system experiments in total. Compared with the SGD-based ILC, Algorithm 3 requires $p$ additional experiments every $m$ iterations to compute the full gradient, in exchange for faster convergence.
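A minimal NumPy sketch of Algorithm 3 follows. It assumes access to a hypothetical oracle `grad_g(i, u)` returning $\nabla g\_i(\mathbf{u})$ (in practice obtained from an experiment as in Lemma 3; in the example it would be computed from a known matrix), uses 0-based channel indices, and reuses the $p$ partial gradients at the snapshot both to form $\boldsymbol{\mu}^s$ and as the $\nabla g\_i(\tilde{\mathbf{u}}^s)$ terms, which matches the experiment count stated above for a deterministic system.

```python
import numpy as np

# Illustrative sketch of Algorithm 3. `grad_g(i, u)` is a hypothetical oracle
# returning grad g_i(u); in practice it would come from an experiment (Lemma 3).


def svrg_ilc_mimo(grad_g, p, dim, eta, S, seed=0):
    """Run S outer iterations of SVRG-based ILC with the Option II snapshot."""
    rng = np.random.default_rng(seed)
    m = 2 * p                                         # inner length m = 2p
    u_snap = np.zeros(dim)                            # u~^0 = u_{0,0} = 0
    for _ in range(S):
        snap = [grad_g(i, u_snap) for i in range(p)]  # p experiments at the snapshot
        mu = sum(snap) / p                            # full gradient grad G(u~^s)
        u = u_snap.copy()
        for _ in range(m):
            i = rng.integers(p)                       # channel drawn uniformly (0-based)
            w = grad_g(i, u) - snap[i] + mu           # variance-reduced gradient w_k
            u = u - eta * w
        u_snap = u                                    # Option II: u~^{s+1} = u_{s,m}
    return u_snap
```

In a noisy system, the stored snapshot gradients would carry the noise of the experiments that produced them; the sketch does not model noise.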

# *4.3 Convergence Analysis*

From the following two propositions, we will see that *G* is not necessarily strongly convex.

**Proposition 4** *G and gi are convex and L-smooth.*

*Proof* We note that

$$\nabla \mathbf{g}\_i(\mathbf{x}) = -\begin{bmatrix} J\_{i1}^T \\ J\_{i2}^T \\ \vdots \\ J\_{iq}^T \end{bmatrix} \left( \mathbf{y}\_d^i - \sum\_{j=1}^q J\_{ij} \mathbf{x}^j \right) = -J\_i^T \left( \mathbf{y}\_d^i - J\_i \mathbf{x} \right),$$

where $J\_i = \left[J\_{i1}, J\_{i2}, \dots, J\_{iq}\right]$ and $\mathbf{x} \in \mathbb{R}^{qN}$. Hence $g\_i$ is convex, and

$$\begin{aligned} \left\| \nabla g\_i(\mathbf{x}) - \nabla g\_i(\mathbf{y}) \right\| &= \left\| J\_i^T J\_i (\mathbf{x} - \mathbf{y}) \right\| \\ &\le \left\| J\_i^T J\_i \right\| \cdot \left\| \mathbf{x} - \mathbf{y} \right\| \le \sum\_{j=1}^q \left\| J\_{ij} \right\|^2 \cdot \left\| \mathbf{x} - \mathbf{y} \right\|. \end{aligned}$$

Let $L = \max\_i \left\{\sum\_{j=1}^q \lambda\_{ij}\right\}$, where $\lambda\_{ij}$ is the maximum eigenvalue of $J\_{ij}^T J\_{ij}$. Since $J\_{ij}^T J\_{ij}$ is always positive semidefinite, $\lambda\_{ij} = 0$ if and only if $J\_{ij} = 0$. Hence, by Assumption 4, $L > 0$, and $g\_i$ is convex and $L$-smooth.

Since *G* is a convex combination of *gi* , *G* is convex and *L*-smooth.
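The constant $L$ in Proposition 4 can be computed directly from the blocks of $\mathcal{J}$. The sketch below (hypothetical helper name, 0-based indices) evaluates $L = \max\_i \sum\_j \lambda\_{ij}$ with `numpy.linalg.eigvalsh`; the result can then be compared with the spectral norm of $J\_i^T J\_i$ for each block row.

```python
import numpy as np

# L of Proposition 4 from the blocks of J (0-based indices; helper name hypothetical).


def smoothness_constant(J, p, q, N):
    """L = max_i sum_j lambda_ij, lambda_ij = largest eigenvalue of J_ij^T J_ij."""
    best = 0.0
    for i in range(p):
        total = 0.0
        for j in range(q):
            J_ij = J[i * N:(i + 1) * N, j * N:(j + 1) * N]
            total += np.linalg.eigvalsh(J_ij.T @ J_ij).max()
        best = max(best, total)
    return best
```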

For example, when $p = q = 1$ and $\mathcal{J} = J\_{11} = \mathrm{diag}\{1, 0, \dots, 0\}$, $G$ is not strongly convex. The following proposition gives a sufficient condition for $G$ to be strongly convex.

**Proposition 5** *If the system matrix J is of full column rank, then G is* σ*-strongly convex.*

*Proof* By Assumption 5, *p* ≥ *q*, and

$$G\left(\mathbf{u}\_k\right) = \frac{1}{2p} \left(\mathbf{u}\_k^T \mathcal{J}^T \mathcal{J}\,\mathbf{u}\_k - \mathbf{y}\_d^T \mathcal{J}\,\mathbf{u}\_k - \mathbf{u}\_k^T \mathcal{J}^T \mathbf{y}\_d + \mathbf{y}\_d^T \mathbf{y}\_d\right).$$

The matrix $\mathcal{J}^T\mathcal{J}$ is positive definite if and only if $\mathcal{J}$ has full column rank. Therefore, by Theorem 2.2, $G(\mathbf{u}\_k)$ is σ-strongly convex when $\mathcal{J}$ has full column rank, and the strong convexity factor is $\frac{1}{2p}$ of the minimum eigenvalue of $\mathcal{J}^T\mathcal{J}$.
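Proposition 5 suggests a simple numerical test: check whether $\mathcal{J}$ has full column rank and compute $\lambda\_{\min}(\mathcal{J}^T\mathcal{J})$. The sketch below keeps the factor $\frac{1}{2p}$ stated above; under the convention $\nabla^2 G \succeq \sigma I$ one would use $\lambda\_{\min}(\mathcal{J}^T\mathcal{J})/p$ instead, because $\nabla^2 G = \mathcal{J}^T\mathcal{J}/p$. The helper name is hypothetical, and the rank-deficient SISO case $J\_{11} = \mathrm{diag}\{1, 0, \dots, 0\}$ is a natural second input.

```python
import numpy as np

# Numerical check related to Proposition 5 (helper name hypothetical).


def strong_convexity(J, p):
    """Return (full column rank?, lambda_min(J^T J) / (2p))."""
    lam_min = np.linalg.eigvalsh(J.T @ J).min()
    full_rank = np.linalg.matrix_rank(J) == J.shape[1]
    return bool(full_rank), lam_min / (2 * p)
```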

When $G$ is strongly convex, it can be shown, similarly to Algorithm 1, that Algorithm 3 converges linearly (Bottou et al., 2018). Owing to limited space, we omit this proof and give only the convergence proof for the non-strongly convex case.

First we have the following lemma (Reddi et al., 2016).

**Lemma 4** *Assume that* $c\_k, c\_{k+1}, \beta > 0$ *and*

$$c\_k = c\_{k+1} \left( 1 + \eta \beta + 2\eta^2 L^2 \right) + \eta^2 L^3.$$

*If* η, β *and* $c\_{k+1}$ *are chosen such that*

$$\mathcal{T}\_k = \left(\eta - \frac{c\_{k+1}\eta}{\beta} - \eta^2 L - 2c\_{k+1}\eta^2\right) > 0,$$

*then each iteration of Algorithm 3 has an upper bound*

$$\mathbb{E}\left[\left\|\nabla G\left(\mathbf{u}\_{s,k}\right)\right\|^2\right] \le \frac{R\_{s,k} - R\_{s,k+1}}{\mathcal{T}\_k},$$

*where* $R\_{s,k} \triangleq \mathbb{E}\left[G\left(\mathbf{u}\_{s,k}\right) + c\_k\left\|\mathbf{u}\_{s,k} - \tilde{\mathbf{u}}^s\right\|^2\right]$*.*

*Proof* Since *gi* is *L*-smooth,

$$g\_i \left( \mathbf{u}\_{s,k+1} \right) \le g\_i \left( \mathbf{u}\_{s,k} \right) + \left\langle \nabla g\_i \left( \mathbf{u}\_{s,k} \right), \mathbf{u}\_{s,k+1} - \mathbf{u}\_{s,k} \right\rangle + \frac{L}{2} \left\| \mathbf{u}\_{s,k+1} - \mathbf{u}\_{s,k} \right\|^2.$$

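Averaging this inequality over $i = 1, \dots, p$ gives the same inequality for $G$. For fixed $s$, let $\mathbf{w}\_k = \nabla g\_i\left(\mathbf{u}\_{s,k}\right) - \nabla g\_i\left(\tilde{\mathbf{u}}^s\right) + \boldsymbol{\mu}^s$ with $i$ drawn uniformly from $\{1, \dots, p\}$; then $\mathbb{E}[\mathbf{w}\_k] = \nabla G\left(\mathbf{u}\_{s,k}\right)$. Since $\mathbf{u}\_{s,k+1} - \mathbf{u}\_{s,k} = -\eta\mathbf{w}\_k$, substituting into the inequality for $G$ and taking expectations on both sides yields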

$$\mathbb{E}\left[G\left(\mathbf{u}\_{s,k+1}\right)\right] \le \mathbb{E}\left[G\left(\mathbf{u}\_{s,k}\right) - \eta \left\|\nabla G\left(\mathbf{u}\_{s,k}\right)\right\|^2\right] + \frac{L\eta^2}{2}\mathbb{E}\left[\left\|\mathbf{w}\_k\right\|^2\right].\tag{24}$$

In addition, for $\left\|\mathbf{u}\_{s,k+1} - \tilde{\mathbf{u}}^s\right\|^2$, we have

$$\begin{split} & \mathbb{E}\left[\left\|\mathbf{u}\_{s,k+1} - \tilde{\mathbf{u}}^{s}\right\|^{2}\right] \\ &= \mathbb{E}\left[\left\|\mathbf{u}\_{s,k+1} - \mathbf{u}\_{s,k}\right\|^{2} + \left\|\mathbf{u}\_{s,k} - \tilde{\mathbf{u}}^{s}\right\|^{2} + 2\left\langle\mathbf{u}\_{s,k+1} - \mathbf{u}\_{s,k}, \mathbf{u}\_{s,k} - \tilde{\mathbf{u}}^{s}\right\rangle\right] \\ &= \mathbb{E}\left[\eta^{2}\left\|\mathbf{w}\_{k}\right\|^{2} + \left\|\mathbf{u}\_{s,k} - \tilde{\mathbf{u}}^{s}\right\|^{2}\right] - 2\eta\mathbb{E}\left[\left\langle\nabla G\left(\mathbf{u}\_{s,k}\right), \mathbf{u}\_{s,k} - \tilde{\mathbf{u}}^{s}\right\rangle\right] \\ & \leq \mathbb{E}\left[\eta^{2}\left\|\mathbf{w}\_{k}\right\|^{2} + \left\|\mathbf{u}\_{s,k} - \tilde{\mathbf{u}}^{s}\right\|^{2}\right] \\ & \quad + 2\eta\mathbb{E}\left[\frac{1}{2\beta}\left\|\nabla G\left(\mathbf{u}\_{s,k}\right)\right\|^{2} + \frac{\beta}{2}\left\|\mathbf{u}\_{s,k} - \tilde{\mathbf{u}}^{s}\right\|^{2}\right]. \end{split} \tag{25}$$

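In the above, we have used Young's inequality $\langle \mathbf{x}, \mathbf{y} \rangle \le \frac{1}{2\beta}\|\mathbf{x}\|^2 + \frac{\beta}{2}\|\mathbf{y}\|^2$ with $\mathbf{x} = -\nabla G\left(\mathbf{u}\_{s,k}\right)$ and $\mathbf{y} = \mathbf{u}\_{s,k} - \tilde{\mathbf{u}}^s$.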

For $\mathbb{E}\left[\|\mathbf{w}\_k\|^2\right]$, we have the following estimate:

$$\begin{split} \mathbb{E}\left[\left\|\mathbf{w}\_{k}\right\|^{2}\right] &= \mathbb{E}\left[\left\|\nabla g\_{i}\left(\mathbf{u}\_{s,k}\right) - \nabla g\_{i}\left(\tilde{\mathbf{u}}^{s}\right) + \nabla G\left(\tilde{\mathbf{u}}^{s}\right)\right\|^{2}\right] \\ &= \mathbb{E}\left[\left\|\nabla g\_{i}\left(\mathbf{u}\_{s,k}\right) - \nabla g\_{i}\left(\tilde{\mathbf{u}}^{s}\right) + \nabla G\left(\tilde{\mathbf{u}}^{s}\right) - \nabla G\left(\mathbf{u}\_{s,k}\right) + \nabla G\left(\mathbf{u}\_{s,k}\right)\right\|^{2}\right] \\ &\leq 2\mathbb{E}\left[\left\|\nabla g\_{i}\left(\mathbf{u}\_{s,k}\right) - \nabla g\_{i}\left(\tilde{\mathbf{u}}^{s}\right) - \left(\nabla G\left(\mathbf{u}\_{s,k}\right) - \nabla G\left(\tilde{\mathbf{u}}^{s}\right)\right)\right\|^{2}\right] \\ &\quad + 2\mathbb{E}\left[\left\|\nabla G\left(\mathbf{u}\_{s,k}\right)\right\|^{2}\right] \\ &\leq 2\mathbb{E}\left[\left\|\nabla g\_{i}\left(\mathbf{u}\_{s,k}\right) - \nabla g\_{i}\left(\tilde{\mathbf{u}}^{s}\right)\right\|^{2}\right] + 2\mathbb{E}\left[\left\|\nabla G\left(\mathbf{u}\_{s,k}\right)\right\|^{2}\right] \\ &\leq 2L^{2}\mathbb{E}\left[\left\|\mathbf{u}\_{s,k} - \tilde{\mathbf{u}}^{s}\right\|^{2}\right] + 2\mathbb{E}\left[\left\|\nabla G\left(\mathbf{u}\_{s,k}\right)\right\|^{2}\right], \end{split} \tag{26}$$

where the inequalities follow from $\|\mathbf{a} + \mathbf{b}\|^2 \le 2\|\mathbf{a}\|^2 + 2\|\mathbf{b}\|^2$, $\mathbb{E}\|\zeta - \mathbb{E}\zeta\|^2 = \mathbb{E}\|\zeta\|^2 - \|\mathbb{E}\zeta\|^2 \le \mathbb{E}\|\zeta\|^2$, and the smoothness of $g\_i$, i.e., $\left\|\nabla g\_i\left(\mathbf{u}\_{s,k}\right) - \nabla g\_i\left(\tilde{\mathbf{u}}^s\right)\right\| \le L\left\|\mathbf{u}\_{s,k} - \tilde{\mathbf{u}}^s\right\|$.

Denoting $R\_{s,k} \triangleq \mathbb{E}\left[G\left(\mathbf{u}\_{s,k}\right) + c\_k\left\|\mathbf{u}\_{s,k} - \tilde{\mathbf{u}}^s\right\|^2\right]$, by (24) and (25), we have

$$\begin{split} R\_{s,k+1} &\leq \mathbb{E}\left[G\left(\mathbf{u}\_{s,k}\right) - \eta \left\|\nabla G\left(\mathbf{u}\_{s,k}\right)\right\|^2\right] + \frac{L\eta^2}{2} \mathbb{E}\left[\left\|\mathbf{w}\_{k}\right\|^2\right] \\ &\quad + c\_{k+1} \mathbb{E}\left[\eta^2 \left\|\mathbf{w}\_{k}\right\|^2 + \left\|\mathbf{u}\_{s,k} - \tilde{\mathbf{u}}^s\right\|^2\right] \\ &\quad + 2c\_{k+1} \eta \mathbb{E}\left[\frac{1}{2\beta} \left\|\nabla G\left(\mathbf{u}\_{s,k}\right)\right\|^2 + \frac{\beta}{2} \left\|\mathbf{u}\_{s,k} - \tilde{\mathbf{u}}^s\right\|^2\right] \\ &= \mathbb{E}\left[G\left(\mathbf{u}\_{s,k}\right) - \eta \left(1 - \frac{c\_{k+1}}{\beta}\right) \left\|\nabla G\left(\mathbf{u}\_{s,k}\right)\right\|^2\right] \\ &\quad + \eta^2 \left(\frac{L}{2} + c\_{k+1}\right) \mathbb{E}\left[\left\|\mathbf{w}\_{k}\right\|^2\right] \\ &\quad + c\_{k+1} (1 + \eta\beta) \mathbb{E}\left[\left\|\mathbf{u}\_{s,k} - \tilde{\mathbf{u}}^s\right\|^2\right]. \end{split}$$

From (26), we have

$$\begin{split} R\_{s,k+1} &\leq \mathbb{E}\left[G\left(\mathbf{u}\_{s,k}\right)\right] + \left(c\_{k+1}\left(1+\eta\beta+2\eta^{2}L^{2}\right)+\eta^{2}L^{3}\right)\mathbb{E}\left[\left\|\mathbf{u}\_{s,k}-\tilde{\mathbf{u}}^{s}\right\|^{2}\right] \\ &\quad - \left(\eta-\frac{c\_{k+1}\eta}{\beta}-L\eta^{2}-2c\_{k+1}\eta^{2}\right)\mathbb{E}\left[\left\|\nabla G\left(\mathbf{u}\_{s,k}\right)\right\|^{2}\right] \\ &= R\_{s,k} - \left(\eta-\frac{c\_{k+1}\eta}{\beta}-L\eta^{2}-2c\_{k+1}\eta^{2}\right)\mathbb{E}\left[\left\|\nabla G\left(\mathbf{u}\_{s,k}\right)\right\|^{2}\right]. \end{split}$$

Let $\mathcal{T}\_k \triangleq \eta - \frac{c\_{k+1}\eta}{\beta} - \eta^2 L - 2c\_{k+1}\eta^2$. If $\mathcal{T}\_k > 0$, then

$$\mathbb{E}\left[\left\|\nabla G\left(\mathbf{u}\_{s,k}\right)\right\|^2\right] \le \frac{R\_{s,k} - R\_{s,k+1}}{\mathcal{T}\_k}.\tag{27}$$

Because the non-strongly convex problem is more delicate, we do not use the convergence criterion $\mathbb{E}[G(\mathbf{u}) - G(\mathbf{u}^\*)] \le \varepsilon$ of Theorems 3 and 4; instead, we prove $\mathbb{E}\left[\|\nabla G(\mathbf{u})\|^2\right] \le \varepsilon$ for Algorithm 3. Note that if $G$ is σ-strongly convex, it is easy to verify that

$$G(\mathbf{u}) - G\left(\mathbf{u}^\*\right) \le \frac{1}{2\sigma} \left\| \nabla G(\mathbf{u}) \right\|^2.$$
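One way to verify this inequality, assuming the usual convention that σ-strong convexity means $G(\mathbf{v}) \ge G(\mathbf{u}) + \langle \nabla G(\mathbf{u}), \mathbf{v} - \mathbf{u} \rangle + \frac{\sigma}{2}\|\mathbf{v} - \mathbf{u}\|^2$ for all $\mathbf{v}$, is to minimize both sides over $\mathbf{v}$; the right-hand side attains its minimum at $\mathbf{v} = \mathbf{u} - \nabla G(\mathbf{u})/\sigma$:

$$G\left(\mathbf{u}^\*\right) = \min\_{\mathbf{v}} G(\mathbf{v}) \ge \min\_{\mathbf{v}} \left[ G(\mathbf{u}) + \left\langle \nabla G(\mathbf{u}), \mathbf{v} - \mathbf{u} \right\rangle + \frac{\sigma}{2} \left\| \mathbf{v} - \mathbf{u} \right\|^2 \right] = G(\mathbf{u}) - \frac{1}{2\sigma} \left\| \nabla G(\mathbf{u}) \right\|^2.$$

Rearranging the outer terms gives the stated bound.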

Thus $\mathbb{E}\left[\|\nabla G(\mathbf{u})\|^2\right] \le \varepsilon$ implies $\mathbb{E}[G(\mathbf{u}) - G(\mathbf{u}^\*)] \le \varepsilon/(2\sigma)$. However, this relationship does not hold in general in the non-strongly convex case (Ghadimi & Lan, 2013). The following theorem establishes the convergence $\mathbb{E}\left[\|\nabla G(\mathbf{u})\|^2\right] \le \varepsilon$ of Algorithm 3.

**Theorem 5** *Suppose each* $g\_i$ *is convex and L-smooth, and G is convex. For* $0 \le k \le m-1$*, let* $c\_k, c\_{k+1}, \beta > 0$ *and* $c\_m = 0$ *satisfy*

$$c\_k = c\_{k+1} \left( 1 + \eta \beta + 2\eta^2 L^2 \right) + \eta^2 L^3.$$

*and let* η*,* β*,* $c\_{k+1}$ *be chosen such that*

$$\mathcal{T}\_k = \left(\eta - \frac{c\_{k+1}\eta}{\beta} - \eta^2 L - 2c\_{k+1}\eta^2\right) > 0.$$

*Let* $\tau\_m = \min\_k \mathcal{T}\_k$*, and let* $\mathbf{u}\_a$ *be a random vector uniformly distributed over* $\{\mathbf{u}\_{s,k} \mid 0 \le s \le S-1,\ 0 \le k \le m-1\}$*. Denote* $\mathbf{u}^\* = \arg\min\_{\mathbf{u}} G(\mathbf{u})$*; then Algorithm 3 satisfies*

$$\mathbb{E}\left[\left\|\nabla G\left(\mathbf{u}\_{a}\right)\right\|^{2}\right] \leq \frac{\mathbb{E}\left[G\left(\tilde{\mathbf{u}}^{0}\right) - G\left(\mathbf{u}^{\*}\right)\right]}{Sm\tau\_{m}}.$$

*Proof* Taking $k = 0, 1, \dots, m-1$ in Lemma 4 and summing, we obtain

$$\sum\_{k=0}^{m-1} \mathbb{E}\left[ \left\| \nabla G \left( \mathbf{u}\_{s,k} \right) \right\|^2 \right] \le \frac{R\_{s,0} - R\_{s,m}}{\tau\_m} = \frac{\mathbb{E}\left[ G \left( \tilde{\mathbf{u}}^s \right) - G \left( \tilde{\mathbf{u}}^{s+1} \right) \right]}{\tau\_m}.$$

Here, by the definition of $R\_{s,k}$ and $c\_m = 0$, with $\mathbf{u}\_{s,0} = \tilde{\mathbf{u}}^s$ and $\mathbf{u}\_{s,m} = \tilde{\mathbf{u}}^{s+1}$, we have $R\_{s,0} = \mathbb{E}[G(\tilde{\mathbf{u}}^s)]$ and $R\_{s,m} = \mathbb{E}[G(\tilde{\mathbf{u}}^{s+1})]$. Summing the above over $s = 0, 1, \dots, S-1$ gives

$$\begin{split} \mathbb{E}\left[\left\|\nabla G\left(\mathbf{u}\_{a}\right)\right\|^{2}\right] &= \frac{1}{Sm} \sum\_{s=0}^{S-1} \sum\_{k=0}^{m-1} \mathbb{E}\left[\left\|\nabla G\left(\mathbf{u}\_{s,k}\right)\right\|^{2}\right] \\ &\leq \frac{\mathbb{E}\left[G\left(\tilde{\mathbf{u}}^{0}\right) - G\left(\tilde{\mathbf{u}}^{S}\right)\right]}{Sm\tau\_{m}} \leq \frac{\mathbb{E}\left[G\left(\tilde{\mathbf{u}}^{0}\right) - G\left(\mathbf{u}^{\*}\right)\right]}{Sm\tau\_{m}}. \end{split}$$

In the above, we used the definition of $\mathbf{u}\_a$ and $G\left(\tilde{\mathbf{u}}^S\right) \ge G\left(\mathbf{u}^\*\right)$.
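The constants in Theorem 5 can be computed by running the recursion for $c\_k$ backwards from $c\_m = 0$. The Python sketch below checks $\mathcal{T}\_k > 0$ for every $k$, returns $\tau\_m = \min\_k \mathcal{T}\_k$, and evaluates the bound $\frac{\mathbb{E}[G(\tilde{\mathbf{u}}^0) - G(\mathbf{u}^\*)]}{Sm\tau\_m}$ for a given optimality gap `gap`; the function name and the parameter values in the example are illustrative.

```python
# Constants of Lemma 4 / Theorem 5 (illustrative parameter values only).


def theorem5_bound(eta, beta, L, m, S, gap):
    """Return (tau_m, gap / (S m tau_m)), where gap = E[G(u~^0) - G(u*)]."""
    c_next = 0.0                                    # c_m = 0
    taus = []
    for _ in range(m):                              # k = m-1, m-2, ..., 0
        T_k = eta - c_next * eta / beta - eta**2 * L - 2 * c_next * eta**2
        if T_k <= 0:
            raise ValueError("T_k <= 0: choose a different eta or beta")
        taus.append(T_k)
        # c_k = c_{k+1} (1 + eta beta + 2 eta^2 L^2) + eta^2 L^3
        c_next = c_next * (1 + eta * beta + 2 * eta**2 * L**2) + eta**2 * L**3
    tau_m = min(taus)
    return tau_m, gap / (S * m * tau_m)


tau, bound = theorem5_bound(eta=0.1, beta=1.0, L=1.0, m=2, S=10, gap=1.0)
# tau = 0.0888 (up to floating-point rounding)
```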

**Remark 7** Theorem 5 shows that Algorithm 3 converges at rate $O(1/(Sm))$ under non-strongly convex conditions, whereas the corresponding SGD-based ILC converges at rate $O(1/\sqrt{Sm})$ (Johnson & Zhang, 2013). Moreover, the theorem states that the convergence of Algorithm 3 depends only on the step size η and not on the choice of the number of iterations $m$. Since the system information is unknown, η must be tuned experimentally.

**Remark 8** Theorem 5 indicates that $G(u\_k)$ can approach the optimal value $G(u^\*) = \frac{1}{2p}\left\|y\_d - J u^\*\right\|^2$, i.e., $u\_k$ can converge to the optimal input $u^\*$. Similarly, Assumption 2 can be replaced by $\lim\_{k\to\infty} F(u\_k) = F(u^\*)$ without affecting the convergence of Algorithms 1 and 2. In particular, if $F(u^\*) = 0$, then $\lim\_{k\to\infty} F(u\_k) = 0$.

**Remark 9** System (21) degenerates to a SISO system when both the input and the output are one-dimensional. In this case, the theorem indicates that when system (2) satisfies Assumption 4 (Assumption 1 need not be satisfied), Algorithms 1 and 2 still converge if the "snapshot" is updated as in Option II. However, the convergence rate becomes $O(1/(Sm))$ when the objective function is not strongly convex.

# *4.4 Numerical Simulation*

We consider MIMO systems with 21 inputs and 21 outputs, randomly generated by the MATLAB drss function (Aarnoudse & Oomen, 2021), and set the time length to $N = 42$. The desired trajectory $y\_d$ equals 0.025 in each dimension. Model-free ILC based on the full gradient (GD), the stochastic gradient (SGD), and the stochastic variance reduced gradient (SVRG) are compared in Fig. 5. For each algorithm, the step size is tuned over multiple experiments and the best value is used.

As shown in Fig. 5a, the GD- and SVRG-based ILC converge similarly for deterministic systems, and both are faster than the SGD-based ILC algorithm.

For randomly generated MIMO systems with input-output dimensions of 21 × 21 and *N* = 42, we add Gaussian white noise to the system.

Figure 5b shows that, when the system is noisy, both GD- and SGD-based ILC converge worse than SVRG-based ILC, while SVRG-based ILC still maintains excellent convergence.

# **5 Conclusions**

This chapter explores ILC based on the SVRG method. First, Sect. 2 gives the basic framework of SVRG-based ILC and proves that the algorithm converges at a rate of $O(\alpha^k)$ under smoothness and strong convexity. Second, Sect. 3 designs an SVRG-based ILC algorithm for random error data dropouts and proves that it converges linearly. Finally, Sect. 4 constructs a model-free SVRG-based ILC for MIMO systems by improving an existing model-free algorithm, and proves that its convergence rate is $O(1/k)$ under smoothness and convexity. Compared with GD- and SGD-based ILC, the numerical simulations in Sects. 3 and 4 show that SVRG-based ILC converges faster in the random error dropout and model-free settings, respectively.

It should be noted that the SVRG-based ILC framework given in this chapter applies not only to random error dropouts and model-free problems but also, by properly decomposing the control objectives, to other problems in which error or system information is missing. Future research includes comparing the advantages and disadvantages of this framework with those of the stochastic approximation (SA) method, extending it to other information-deficient problems, and developing algorithms with faster convergence based on it.

**Acknowledgements** This work was supported by the National Natural Science Foundation of China (62173333), Beijing Natural Science Foundation (Z210002), and Research Fund of Renmin University of China (2021030187).

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **A Generalization of NTRUEncrypt**

# **Zheng Zhiyong, Liu Fengxia, Huang Wenlin, Xu Jie, and Tian Kun**

**Abstract** The main purpose of this chapter is to give a more general construction of NTRU based on ideal matrices and $q$-ary lattice theory. To motivate our construction, we first discuss a more general form of the ordinary cyclic code, namely the φ-cyclic code, which first appeared in (Lopez-Permouth et al., 2009; Shi et al., 2020). We then give a more general NTRUEncrypt by replacing the finite field with the real number field ℝ.

**Keywords** φ-cyclic code · Ideal matrices · Convolutional modular lattice · NTRU

# **1** *φ***-Cyclic Code**

Let $F\_q$ be a finite field with $q$ elements, where $q$ is a power of a prime, and let $F\_q[x]$ be the polynomial ring over $F\_q$ in the variable $x$. Let $F\_q^n$ be the $n$-dimensional linear space over $F\_q$, and let $a = (a\_0, a\_1, \dots, a\_{n-1}) \in F\_q^n$ be a fixed vector with $a\_0 \neq 0$. The polynomial associated with $a$ is

$$\phi(\mathbf{x}) = \phi\_a(\mathbf{x}) = \mathbf{x}^n - a\_{n-1}\mathbf{x}^{n-1} - \dots - a\_1\mathbf{x} - a\_0 \in F\_q[\mathbf{x}], a\_0 \neq 0. \tag{1}$$

Z. Zhiyong · L. Fengxia · H. Wenlin · X. Jie · T. Kun (B) Engineering Research Center of Ministry of Education for Financial Computing and Digital Engineering, Renmin University of China, Beijing 100872, China e-mail: tkun19891208@ruc.edu.cn

Z. Zhiyong e-mail: zhengzy@ruc.edu.cn

L. Fengxia e-mail: liu\_fx@ruc.edu.cn

H. Wenlin e-mail: wenlin@ruc.edu.cn

X. Jie e-mail: xujie0665@ruc.edu.cn


<sup>©</sup> The Author(s) 2023 Z. Zheng (ed.), *Proceedings of the Second International Forum on Financial Mathematics and Financial Technology*, Financial Mathematics and Fintech, https://doi.org/10.1007/978-981-99-2366-3\_6

Let $\langle \phi(x) \rangle$ be the principal ideal generated by $\phi(x)$ in $F\_q[x]$. There is a one-to-one correspondence between $F\_q^n$ and the quotient ring $R = F\_q[x]/\langle \phi(x) \rangle$, given by

$$c = (c\_0, c\_1, \dots, c\_{n-1}) \in F\_q^n \longleftrightarrow c(x) = c\_0 + c\_1 x + \dots + c\_{n-1} x^{n-1} \in R. \tag{2}$$

In fact, this correspondence is also an isomorphism of abelian groups. One may extend it to subsets of $F\_q^n$ and $R$ by

$$C \subset F\_q^n \longleftrightarrow C(x) = \{c(x) \mid c \in C\} \subset R.\tag{3}$$

If $C \subset F\_q^n$ is a linear subspace of dimension $k$, then $C$ is called a linear code in coding theory and, as usual, is written $C = [n, k]$. Each vector $c = (c\_0, c\_1, \dots, c\_{n-1}) \in C$ is called a codeword of length $n$. Obviously, $C = [n, 0]$ and $C = [n, n]$ are two trivial codes. Another, almost trivial, example is the constant code, given by

$$C = \{ (b, b, \ldots, b) | b \in F\_q \}, \text{ and } C = [n, 1].$$

According to the given polynomial $\phi(x) = \phi\_a(x)$, we define a transformation $\tau\_\phi$ on $F\_q^n$ by

$$\begin{split} \tau\_{\phi}(c) &= \tau\_{\phi}((c\_0, c\_1, \dots, c\_{n-1})) \\ &= (a\_0 c\_{n-1}, c\_0 + a\_1 c\_{n-1}, c\_1 + a\_2 c\_{n-1}, \dots, c\_{n-2} + a\_{n-1} c\_{n-1}). \end{split} \tag{4}$$

It is easily seen that $\tau\_\phi : F\_q^n \to F\_q^n$ is a linear transformation.
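As a quick sanity check of (4), the following Python sketch implements $\tau\_\phi$ over a prime field $F\_p$ (an assumption; the text allows any prime power $q$) and compares it, for every vector in $F\_3^4$, with the coefficients of $x\,c(x) \bmod \phi(x)$ computed by ordinary polynomial long division. This is the correspondence used in the proof of Theorem 1. The helper names and the vector $a = (2, 0, 1, 1)$ are hypothetical.

```python
import itertools


def tau_phi(c, a, p):
    """tau_phi(c) from (4): (a0*c_{n-1}, c0 + a1*c_{n-1}, ..., c_{n-2} + a_{n-1}*c_{n-1}) mod p."""
    last = c[-1]
    return [((c[i - 1] if i > 0 else 0) + a[i] * last) % p for i in range(len(c))]


def phi_from_a(a, p):
    """Low-to-high coefficients of phi(x) = x^n - a_{n-1}x^{n-1} - ... - a_0 over F_p."""
    return [(-ai) % p for ai in a] + [1]


def poly_mod(num, phi, p):
    """Remainder of num(x) modulo the monic phi(x); coefficient lists are low-to-high."""
    r = [v % p for v in num]
    n = len(phi) - 1
    for d in range(len(r) - 1, n - 1, -1):
        coef = r[d]
        if coef:
            # Subtract coef * x^(d-n) * phi(x) to clear the x^d term.
            for j in range(n + 1):
                r[d - n + j] = (r[d - n + j] - coef * phi[j]) % p
    return (r + [0] * n)[:n]


p, a = 3, [2, 0, 1, 1]
phi = phi_from_a(a, p)
# tau_phi(c) should equal the coefficient vector of x * c(x) mod phi(x) for every c.
matches = all(
    tau_phi(list(c), a, p) == poly_mod([0] + list(c), phi, p)
    for c in itertools.product(range(p), repeat=len(a))
)
print(matches)
```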

**Definition 1** Let $C \subset F\_q^n$ be a linear code. It is called a φ-cyclic code if

$$
\forall c \in C \Rightarrow \tau\_{\phi}(c) \in C.\tag{5}
$$

In other words, a linear code $C$ is a φ-cyclic code if and only if $C$ is closed under the linear transformation $\tau\_\phi$. Clearly, if $a = (1, 0, \dots, 0)$, so that $\phi\_a(x) = x^n - 1$, then the φ-cyclic codes are precisely the ordinary cyclic codes (see Lopez-Permouth et al. (2009)).

**Remark 1** The φ-cyclic code given here is in fact a polycyclic code, which first appeared in (Lopez-Permouth et al., 2009; Shi et al., 2020); our main concern is its application to the McEliece and Niederreiter cryptosystems. We first show that there is a one-to-one correspondence between φ-cyclic codes in $F\_q^n$ and ideals in $R = F\_q[x]/\langle \phi(x) \rangle$.

**Theorem 1** *Let* $C \subset F\_q^n$ *be a subset. Then* $C$ *is a* φ*-cyclic code if and only if* $C(x)$ *is an ideal of* $R$*.*

*Proof* Using column notation for vectors in $F\_q^n$, the linear transformation $\tau\_\phi$ may be written as

$$\tau\_{\phi} \begin{pmatrix} c\_0 \\ c\_1 \\ \vdots \\ c\_{n-1} \end{pmatrix} = \begin{pmatrix} a\_0 c\_{n-1} \\ c\_0 + a\_1 c\_{n-1} \\ \vdots \\ c\_{n-2} + a\_{n-1} c\_{n-1} \end{pmatrix}, \quad \forall c = \begin{pmatrix} c\_0 \\ c\_1 \\ \vdots \\ c\_{n-1} \end{pmatrix} \in F\_q^n.$$

Let $T\_\phi$ be the $n \times n$ square matrix over $F\_q$

$$T\_{\phi} = \begin{pmatrix} 0 & \cdots & 0 & a\_0 \\ & & & a\_1 \\ & I\_{n-1} & & \vdots \\ & & & a\_{n-1} \end{pmatrix} \in F\_q^{n \times n},\tag{6}$$

where $I\_{n-1}$ is the $(n-1) \times (n-1)$ identity matrix. The matrix expression of $\tau\_\phi$ is as follows:

$$\tau\_{\phi} \begin{pmatrix} c\_0 \\ c\_1 \\ \vdots \\ c\_{n-1} \end{pmatrix} = T\_{\phi} \begin{pmatrix} c\_0 \\ c\_1 \\ \vdots \\ c\_{n-1} \end{pmatrix} = \begin{pmatrix} a\_0 c\_{n-1} \\ c\_0 + a\_1 c\_{n-1} \\ \vdots \\ c\_{n-2} + a\_{n-1} c\_{n-1} \end{pmatrix} . \tag{7}$$

Suppose $C \subset F\_q^n$ and $C(x)$ is an ideal of $R$. Then $C$ is clearly a linear code in $F\_q^n$. To prove that $C$ is a φ-cyclic code, note that for any $c(x) \in C(x)$ the polynomial $x c(x)$, reduced modulo $\phi(x)$, corresponds to the vector $\tau\_\phi(c)$, because $x^n \equiv a\_0 + a\_1 x + \dots + a\_{n-1}x^{n-1} \pmod{\phi(x)}$. Hence, if $c(x) \in C(x)$, then

$$x c(x) \in C(x) \Leftrightarrow \tau\_{\phi}(c) \in C \Leftrightarrow T\_{\phi}c \in C. \tag{8}$$

Therefore, if *C*(*x*) is an ideal of *R*, then we have immediately that *C* is a φ-cyclic code of *F<sup>n</sup> q* .

Conversely, if $C \subset F\_q^n$ is a φ-cyclic code, then for all $k \geqslant 1$ we have

$$
\forall c \in C \Rightarrow T\_{\phi}^{k} c \in C, \quad k \geqslant 1.
$$

It follows that

$$\forall c(x) \in C(x) \Rightarrow x^k c(x) \in C(x), \quad 0 \leqslant k \leqslant n-1.$$

Since $C(x)$ is also closed under addition and under multiplication by elements of $F\_q$, it is closed under multiplication by every element of $R$, which implies that $C(x)$ is an ideal of $R$. This completes the proof of Theorem 1.

By Theorem 1, to find a φ-cyclic code it is enough to find an ideal of $R$. There are two trivial ideals, $C(x) = 0$ and $C(x) = R$; the corresponding φ-cyclic codes are $C = [n, 0]$ and $C = F\_q^n$, respectively, and are called trivial φ-cyclic codes. To find non-trivial φ-cyclic codes, we use the homomorphism theorems, a standard technique in algebra. Let $\pi$ be the natural homomorphism from $F\_q[x]$ to its quotient ring $R = F\_q[x]/\langle \phi(x) \rangle$, with $\ker\pi = \langle \phi(x) \rangle$:

$$<\phi(\mathbf{x})> \subset N \subset F\_q[\mathbf{x}] \xrightarrow{\pi} R = F\_q[\mathbf{x}]/<\phi(\mathbf{x})>,\tag{9}$$

where $N$ is an ideal of $F\_q[x]$ containing $\ker\pi = \langle \phi(x) \rangle$. Since $F\_q[x]$ is a principal ideal domain, $N = \langle g(x) \rangle$ is a principal ideal generated by a monic polynomial $g(x) \in F\_q[x]$. It is easy to see that

$$\langle \phi(x) \rangle \subset \langle g(x) \rangle \Leftrightarrow g(x) \mid \phi(x).$$

It follows that all ideals $N$ satisfying (9) are given by

$$\{<\mathbf{g}(\mathbf{x})>\mid\mathbf{g}(\mathbf{x})\in F\_q[\mathbf{x}] \text{ is monic and } \mathbf{g}(\mathbf{x})|\phi(\mathbf{x})\}.$$

We write $\langle g(x) \rangle \bmod \phi(x)$ for the image of $\langle g(x) \rangle$ under $\pi$. It is easy to check that

$$<\mathbf{g}(\mathbf{x})>\mod \phi(\mathbf{x}) = \{h(\mathbf{x})\mathbf{g}(\mathbf{x}) \mid h(\mathbf{x}) \in F\_q[\mathbf{x}] \text{ and } \deg h(\mathbf{x}) + \deg \mathbf{g}(\mathbf{x}) < n\},\tag{10}$$

more precisely, the right-hand side of (10) is a set of representatives of $\langle g(x) \rangle \bmod \phi(x)$. By the homomorphism theorem in ring theory, all ideals of $R$ are given by

$$\{\langle g(x) \rangle \bmod \phi(x) \mid g(x) \in F\_q[x] \text{ is monic and } g(x) \mid \phi(x)\}.\tag{11}$$

Let $d$ be the number of monic divisors of $\phi(x)$ in $F\_q[x]$. It follows immediately that

**Corollary 1** *The number of* φ*-cyclic codes in* $F\_q^n$ *is* $d$*.*
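Corollary 1 reduces counting φ-cyclic codes to counting monic divisors of $\phi(x)$. A brute-force Python sketch over a small prime field $F\_p$ (an assumption made so that arithmetic is plain integers mod $p$) enumerates all monic polynomials of degree at most $n$ and keeps those that divide $\phi(x)$. For example, over $F\_2$ we have $x^3 - 1 = (x+1)(x^2+x+1)$, which gives $d = 4$. The function names are hypothetical.

```python
import itertools


def poly_rem(num, den, p):
    """Remainder of num(x) modulo a monic den(x) over F_p (low-to-high coefficient lists)."""
    r = [v % p for v in num]
    k = len(den) - 1
    for d in range(len(r) - 1, k - 1, -1):
        coef = r[d]
        if coef:
            for j in range(k + 1):
                r[d - k + j] = (r[d - k + j] - coef * den[j]) % p
    return r[:k]


def monic_divisors(phi, p):
    """All monic g(x) over F_p with g(x) | phi(x); each one generates a phi-cyclic code."""
    n = len(phi) - 1
    divisors = []
    for deg in range(n + 1):
        for low in itertools.product(range(p), repeat=deg):
            g = list(low) + [1]
            if not any(poly_rem(phi, g, p)):
                divisors.append(g)
    return divisors


phi_cyclic = [1, 0, 0, 1]  # x^3 - 1 = x^3 + 1 over F_2
print(len(monic_divisors(phi_cyclic, 2)))
```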

To compare φ-cyclic codes with ordinary cyclic codes, we consider a simple example.

**Example 1** The constant code $C$ is always a cyclic code, because $1 + x + \dots + x^{n-1} \mid x^n - 1$, and its generator polynomial is $1 + x + \dots + x^{n-1}$. However, the constant code $C$ in $F\_q^n$ is not always a φ-cyclic code: it is a φ-cyclic code if and only if $1 + x + \dots + x^{n-1} \mid \phi(x)$, and an equivalent condition for $1 + x + \dots + x^{n-1} \mid \phi(x)$ is

$$a\_{n-1} = a\_{n-2} = \dots = a\_1 = b, \text{ and } a\_0 = 1 + b, \text{ for some } b \in F\_q.$$
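The condition in Example 1 can be confirmed directly from Definition 1: $\tau\_\phi(b, \dots, b) = b\,(a\_0, 1 + a\_1, \dots, 1 + a\_{n-1})$, which is constant exactly when the stated equalities hold. The Python sketch below (over a prime field $F\_p$, an assumption) checks closure under $\tau\_\phi$ by brute force and compares it with the condition for every $a \in F\_3^3$ with $a\_0 \neq 0$. The helper names are hypothetical.

```python
import itertools


def tau_phi(c, a, p):
    """tau_phi(c) from (4) over F_p."""
    last = c[-1]
    return [((c[i - 1] if i > 0 else 0) + a[i] * last) % p for i in range(len(c))]


def constant_code_is_phi_cyclic(a, p):
    """True iff tau_phi maps every constant word (b, ..., b) to a constant word."""
    n = len(a)
    return all(len(set(tau_phi([b] * n, a, p))) == 1 for b in range(p))


def example1_condition(a, p):
    """a_1 = ... = a_{n-1} = b and a_0 = 1 + b (mod p), as in Example 1 (needs n >= 2)."""
    b = a[1] % p
    return all(x % p == b for x in a[1:]) and a[0] % p == (1 + b) % p


# Compare both tests for every a in F_3^3 with a_0 != 0.
agree = all(
    constant_code_is_phi_cyclic(list(a), 3) == example1_condition(list(a), 3)
    for a in itertools.product(range(3), repeat=3)
    if a[0] != 0
)
print(agree)
```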

**Definition 2** Let $C$ be a φ-cyclic code with $C(x) = \langle g(x) \rangle \bmod \phi(x)$, where $g(x)$ is monic and $g(x) \mid \phi(x)$. We call $g(x)$ the generator polynomial of $C$.

**Lemma 1** *Let* $g(x) = g\_0 + g\_1 x + \dots + g\_{n-k-1}x^{n-k-1} + x^{n-k}$ *be the generator polynomial of a* φ*-cyclic code* $C$*, where* $1 \leqslant k \leqslant n-1$ *and* $g(x) \mid \phi(x)$*. Then* $C = [n, k]$*, and a generator matrix for* $C$ *is the following block matrix:*

$$G = \begin{pmatrix} g \\ \tau\_{\phi}(g) \\ \tau\_{\phi}^{2}(g) \\ \vdots \\ \tau\_{\phi}^{k-1}(g) \end{pmatrix}\_{k \times n},\tag{12}$$

*where* $g = (g\_0, g\_1, \dots, g\_{n-k-1}, 1, 0, \dots, 0) \in C$ *is the codeword corresponding to* $g(x)$*, and* $\tau\_\phi^i(g) = \tau\_\phi^{i-1}(\tau\_\phi(g))$ *for* $1 \leqslant i \leqslant n-1$*.*

*Proof* By assumption, $C(x) = \langle g(x) \rangle \bmod \phi(x)$, so $\{g, \tau\_\phi(g), \dots, \tau\_\phi^{k-1}(g)\} \subset C$; we prove that this set is a basis of $C$. First, these vectors are linearly independent. Indeed, suppose that

$$\sum\_{i=0}^{k-1} b\_i \tau\_\phi^i(\mathbf{g}) = 0,\text{ for some } b\_i \in F\_q,\tag{13}$$

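Since $\tau\_\phi^i(g)$ corresponds to $x^i g(x)$ and $\deg\left(x^i g(x)\right) \leqslant n-1$ for $i \leqslant k-1$, no reduction modulo $\phi(x)$ occurs, and the corresponding polynomial is zero in $F\_q[x]$, namely

$$\left(\sum\_{i=0}^{k-1} b\_i x^i\right) g(x) = 0.$$

Since $g(x) \neq 0$ and $F\_q[x]$ is an integral domain, it follows that

$$\sum\_{i=0}^{k-1} b\_i x^i = 0 \Rightarrow b\_i = 0 \text{ for all } i, 0 \leqslant i \leqslant k - 1.$$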

Next, let $c \in C$, so that $c(x) \in C(x)$. By (10), there is a polynomial $b(x) = b\_0 + b\_1 x + \dots + b\_{k-1}x^{k-1}$ with $\deg b(x) < k$ such that

$$c(x) = b(x) g(x) = \left(\sum\_{i=0}^{k-1} b\_i x^i\right) g(x).$$

Thus, the codeword corresponding to $c(x)$ is

$$c = \sum\_{i=0}^{k-1} b\_i \tau\_\phi^i(\mathbf{g}).$$

This shows that $\{g, \tau\_\phi(g), \dots, \tau\_\phi^{k-1}(g)\}$ is a basis of $C$, so $C = [n, k]$, and a generator matrix for $C$ is

$$G = \begin{pmatrix} g \\ \tau\_{\phi}(g) \\ \tau\_{\phi}^2(g) \\ \vdots \\ \tau\_{\phi}^{k-1}(g) \end{pmatrix}\_{k \times n}.$$

This proves Lemma 1.
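Lemma 1 can be checked on a small case. Over $F\_2$, take $\phi(x) = (x+1)^3 = x^3 + x^2 + x + 1$, so $a = (1, 1, 1)$, and the generator polynomial $g(x) = x + 1$, so $k = 2$. The Python sketch below builds the rows $g, \tau\_\phi(g)$ of (12), enumerates their span, and confirms that the span has $2^k$ elements and is closed under $\tau\_\phi$. The prime field and the helper names are assumptions for illustration.

```python
import itertools


def tau_phi(c, a, p):
    """tau_phi(c) from (4) over F_p."""
    last = c[-1]
    return [((c[i - 1] if i > 0 else 0) + a[i] * last) % p for i in range(len(c))]


def generator_matrix(g_poly, a, p):
    """Rows g, tau(g), ..., tau^{k-1}(g) of (12); g_poly is monic, low-to-high."""
    n = len(a)
    k = n - (len(g_poly) - 1)
    row = list(g_poly) + [0] * (n - len(g_poly))
    rows = []
    for _ in range(k):
        rows.append(row)
        row = tau_phi(row, a, p)
    return rows


def span(rows, p):
    """All F_p-linear combinations of the given rows, as tuples."""
    n = len(rows[0])
    return {
        tuple(sum(b * r[j] for b, r in zip(coeffs, rows)) % p for j in range(n))
        for coeffs in itertools.product(range(p), repeat=len(rows))
    }


p, a = 2, [1, 1, 1]  # phi(x) = x^3 + x^2 + x + 1 = (x + 1)^3 over F_2
G = generator_matrix([1, 1], a, p)
code = span(G, p)
closed = all(tuple(tau_phi(list(w), a, p)) in code for w in code)
print(G, len(code), closed)
```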

To describe a parity check matrix for a φ-cyclic code, for any $c = (c\_0, c\_1, \dots, c\_{n-1}) \in F\_q^n$ we write

$$\overline{c} = (c\_{n-1}, c\_{n-2}, \dots, c\_1, c\_0) \in F\_q^n.$$

**Lemma 2** *Suppose* $C$ *is a* φ*-cyclic code with generator polynomial* $g(x)$*, where* $g(x) \mid \phi(x)$ *and* $\deg g(x) = n-k$*. Let* $h(x)g(x) = \phi(x)$*, where* $h(x) = h\_0 + h\_1 x + \dots + h\_{k-1}x^{k-1} + x^k$*. Then a parity check matrix for* $C$ *is*

$$H = \begin{pmatrix} \overline{h} \\ \tau\_{\phi}(\overline{h}) \\ \vdots \\ \tau\_{\phi}^{n-k-1}(\overline{h}) \end{pmatrix}\_{(n-k)\times n}.\tag{14}$$

*Proof* Since $h(x)g(x) = \phi(x)$, we have $h(x)g(x) = 0$ in $R = F\_q[x]/\langle \phi(x) \rangle$, and thus

$$g\_0 h\_i + g\_1 h\_{i-1} + \dots + g\_{n-k} h\_{i-n+k} = 0, \forall 0 \le i \le n-1.$$

It follows that $G H' = 0$, where $G$ is the generator matrix for $C$ given by (12) and $H'$ is the transpose of $H$. Therefore, $H$ is a parity check matrix for $C$.

A polynomial is called separable if it has no multiple roots in its splitting field. The following lemma shows that, when $\phi(x)$ is separable, every nonzero ideal of $R$ contains an identity element.

**Lemma 3** *Suppose* $\phi(x)$ *is a separable polynomial over* $F\_q$*, and* $C(x) = \langle g(x) \rangle \bmod \phi(x)$ *is an ideal of* $R$ *with* $\deg g(x) \leqslant n-1$*. Then there exists an element* $d(x) \in C(x)$ *such that*

$$c(\mathbf{x})d(\mathbf{x}) = c(\mathbf{x}), \text{ for all } c(\mathbf{x}) \in C(\mathbf{x}).$$

*Proof* Let *h*(*x*)*g*(*x*) = φ(*x*). Since φ(*x*) is a separable polynomial, then gcd(*g*(*x*), *h*(*x*)) = 1, and there are two polynomials *a*(*x*) and *b*(*x*) in *Fq* [*x*] such that

$$a(\mathbf{x})\mathbf{g}(\mathbf{x}) + b(\mathbf{x})h(\mathbf{x}) = 1.$$

Let

$$d(\mathbf{x}) = a(\mathbf{x})g(\mathbf{x}) = 1 - b(\mathbf{x})h(\mathbf{x}) \in C(\mathbf{x}).$$

If $c(x) \in C(x)$, then by (10) we may write $c(x) = g(x)g\_1(x)$. Since $h(x)g(x) = \phi(x) \equiv 0 \pmod{\phi(x)}$, it follows that

$$\begin{split} c(x)d(x) &\equiv a(x)g(x)g(x)g\_1(x) \equiv (1 - b(x)h(x))g(x)g\_1(x) \\ &\equiv g(x)g\_1(x) \equiv c(x) \pmod{\phi(x)}. \end{split}$$

Thus, we have *c*(*x*)*d*(*x*) = *c*(*x*) in *R*.
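The identity element of Lemma 3 is computable with the extended Euclidean algorithm. The Python sketch below (over a prime field $F\_p$, an assumption; all helper names are hypothetical) finds $a(x), b(x)$ with $a(x)g(x) + b(x)h(x) = 1$ and returns $d(x) = a(x)g(x) \bmod \phi(x)$. For $\phi(x) = x^3 + 1 = (x+1)(x^2+x+1)$ over $F\_2$ with $g(x) = x + 1$ it yields $d(x) = x + x^2$.

```python
def trim(f):
    """Drop zero highest-degree coefficients; [] is the zero polynomial."""
    f = list(f)
    while f and f[-1] == 0:
        f.pop()
    return f


def poly_mul(f, g, p):
    if not f or not g:
        return []
    out = [0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] = (out[i + j] + fi * gj) % p
    return trim(out)


def poly_sub(f, g, p):
    n = max(len(f), len(g))
    f = list(f) + [0] * (n - len(f))
    g = list(g) + [0] * (n - len(g))
    return trim([(x - y) % p for x, y in zip(f, g)])


def poly_divmod(f, g, p):
    """Quotient and remainder of f by a nonzero g over F_p (p prime); low-to-high lists."""
    f, g = trim([x % p for x in f]), trim([x % p for x in g])
    inv_lead = pow(g[-1], p - 2, p)  # Fermat inverse of the leading coefficient
    q = [0] * max(len(f) - len(g) + 1, 1)
    r = f
    while len(r) >= len(g):
        shift = len(r) - len(g)
        coef = r[-1] * inv_lead % p
        q[shift] = coef
        for j, gj in enumerate(g):
            r[shift + j] = (r[shift + j] - coef * gj) % p
        r = trim(r)
    return trim(q), r


def poly_ext_gcd(f, g, p):
    """Return (d, s, t) with s*f + t*g = d, where d is the monic gcd of f and g."""
    r0, r1, s0, s1, t0, t1 = trim(f), trim(g), [1], [], [], [1]
    while r1:
        q, r = poly_divmod(r0, r1, p)
        r0, r1 = r1, r
        s0, s1 = s1, poly_sub(s0, poly_mul(q, s1, p), p)
        t0, t1 = t1, poly_sub(t0, poly_mul(q, t1, p), p)
    inv = pow(r0[-1], p - 2, p)
    return tuple(trim([c * inv % p for c in h]) for h in (r0, s0, t0))


def idempotent(g, h, p):
    """d(x) = a(x) g(x) mod phi(x), where a g + b h = 1 and phi = g h (Lemma 3)."""
    phi = poly_mul(g, h, p)
    gcd, a_poly, _ = poly_ext_gcd(g, h, p)
    if gcd != [1]:
        raise ValueError("g and h must be coprime (phi separable)")
    return poly_divmod(poly_mul(a_poly, g, p), phi, p)[1], phi


d, phi = idempotent([1, 1], [1, 1, 1], 2)  # phi(x) = x^3 + 1 over F_2
print(d)
```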

Next, we discuss maximal φ-cyclic codes. Let $C(x) = \langle g(x) \rangle \bmod \phi(x)$, where $g(x)$ is an irreducible polynomial in $F\_q[x]$. We call the corresponding φ-cyclic code $C$ a maximal φ-cyclic code, because $\langle g(x) \rangle$ is a maximal ideal in $F\_q[x]$.

**Lemma 4** *Let* $C$ *be a maximal* φ*-cyclic code with generator polynomial* $g(x)$*, and let* $\beta$ *be a root of* $g(x)$ *in some extension of* $F\_q$*. Then*

$$C(\mathbf{x}) = \{ a(\mathbf{x}) \mid a(\mathbf{x}) \in R \text{ and } a(\boldsymbol{\beta}) = \mathbf{0} \}. \tag{15}$$

*Proof* If $a(x) \in C(x)$, then $g(x) \mid a(x)$ by (10), so $a(\beta) = 0$ immediately. Conversely, if $a(x) \in R$ and $a(\beta) = 0$, then, since the monic irreducible polynomial $g(x)$ is the minimal polynomial of $\beta$ over $F\_q$, we have $g(x) \mid a(x)$, and (15) follows at once.

An important application of maximal φ-cyclic codes is the construction of error-correcting codes, from which we may obtain a modified McEliece-Niederreiter cryptosystem. To do this, let $1 \leqslant m < \sqrt{n}$, and let $F\_{q^m}$ be an extension field of $F\_q$ of degree $m$. Suppose $F\_{q^m} = F\_q(\theta)$, where $\theta$ is a primitive element of $F\_{q^m}$ and $F\_q(\theta)$ is the simple extension generated by $\theta$ over $F\_q$. Let $g(x) \in F\_q[x]$ be the minimal polynomial of $\theta$; then $g(x)$ is an irreducible polynomial of degree $m$ in $F\_q[x]$. It is well known that $F\_{q^m}$ is a Galois extension of $F\_q$, so all roots of $g(x)$ lie in $F\_{q^m}$. Let $\beta\_1, \beta\_2, \dots, \beta\_m$ be the roots of $g(x)$; the Vandermonde matrix $V(\beta\_1, \beta\_2, \dots, \beta\_m)$ is defined by

$$H = V(\beta\_1, \beta\_2, \dots, \beta\_m) = \begin{pmatrix} 1 & \beta\_1 & \beta\_1^2 & \cdots & \beta\_1^{n-1} \\ 1 & \beta\_2 & \beta\_2^2 & \cdots & \beta\_2^{n-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & \beta\_m & \beta\_m^2 & \cdots & \beta\_m^{n-1} \end{pmatrix}\_{m \times n},\tag{16}$$

where $\beta\_1 = \theta$ and each $\beta\_i \in F\_{q^m}$ is regarded as a vector in $F\_q^m$. For an arbitrary monic polynomial $h(x) \in F\_q[x]$ with $\deg h(x) = n-m$, let $\phi(x) = h(x)g(x)$, and let $C$ be the maximal φ-cyclic code generated by $g(x)$. It is easy to verify that

$$c \in C \Leftrightarrow cH' = 0.$$

Therefore, $H$ is a parity check matrix for $C$. If the primitive element $\theta$ is chosen so that any $d-1$ columns of $H$ are linearly independent, then the minimum distance of $C$ is at least $d$, and $C$ is a $t$-error-correcting code, where $t = \lfloor (d-1)/2 \rfloor$.

Public key cryptosystems based on algebraic coding theory were created by Lyubashevsky and Micciancio (2006) and Micciancio and Regev (2009), and a suitable $t$-error-correcting code plays a key role in their construction. The error-correcting code $C$ should satisfy the following requirements:


Our results supply a different way to choose an error-correcting code: one selects an arbitrary irreducible polynomial $g(x) \in F\_q[x]$ of degree $m$ and the roots of $g(x)$, rather than an irreducible factor of $x^n - 1$ and roots of unity, as in ordinary BCH codes and Goppa codes.

In fact, for any positive integer $m$, there is at least one irreducible polynomial $g(x) \in F\_q[x]$ of degree $m$. Let $N\_q(m)$ be the number of monic irreducible polynomials of degree $m$ in $F\_q[x]$; then (see Theorem 3.25 of Lidl and Niederreiter (1983))

$$N\_q(m) = \frac{1}{m} \sum\_{d|m} \mu\left(\frac{m}{d}\right) q^d = \frac{1}{m} \sum\_{d|m} \mu(d) q^{\frac{m}{d}},$$

where $\mu(d)$ is the Möbius function.
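The counting formula is easy to evaluate. The short Python sketch below computes $\mu$ by trial division and then $N\_q(m)$; the function names are hypothetical. For $q = 2$ it reproduces the familiar counts 2, 1, 2, 3, 6, 9 for degrees 1 to 6, so suitable irreducible $g(x)$ are plentiful even over the smallest field.

```python
def mobius(n):
    """Moebius function mu(n) by trial division."""
    result, k = 1, 2
    while k * k <= n:
        if n % k == 0:
            n //= k
            if n % k == 0:
                return 0  # a squared prime factor divides n
            result = -result
        k += 1
    return -result if n > 1 else result


def num_irreducible(q, m):
    """N_q(m) = (1/m) * sum over d | m of mu(m/d) * q^d."""
    total = sum(mobius(m // d) * q ** d for d in range(1, m + 1) if m % d == 0)
    return total // m


print([num_irreducible(2, m) for m in range(1, 7)])
```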

Having selected two monic irreducible polynomials $g(x)$ and $h(x)$ with $\deg g(x) = m$ and $\deg h(x) = n-m$, let $\phi(x) = g(x)h(x)$. Then one obtains a φ-cyclic code $C$ generated by $g(x)$ or by $h(x)$, which is more convenient and more flexible than the ordinary methods.

# **2 A Generalization of NTRUEncrypt**

The public key cryptosystem NTRU, proposed in 1996 by Hoffstein, Pipher, and Silverman, is the fastest known lattice-based encryption scheme. Although its description relies on arithmetic over the polynomial quotient ring $Z[x]/\langle x^n - 1 \rangle$, it was easily observed that it can be expressed as a lattice-based cryptosystem (see IEEE Computer Society (2000)). For background material, we refer to (Coppersmith & Shamir, 1997; Hoffstein et al., 1998, 2017; Lint, 1999; McEliece, 1978; Micciancio, 2001). Our strategy in this section is to replace $Z[x]/\langle x^n - 1 \rangle$ by the more general polynomial ring $Z[x]/\langle \phi(x) \rangle$ and thereby obtain a generalization of NTRUEncrypt, where $\phi(x)$ is a monic polynomial of degree $n$ with integer coefficients.

In this section, we denote φ(*x*) and *R* by

$$\phi(\mathbf{x}) = \mathbf{x}^n - a\_{n-1}\mathbf{x}^{n-1} - \dots - a\_1\mathbf{x} - a\_0 \in Z[\mathbf{x}],$$

$$R = \mathbf{Z}[\mathbf{x}]/<\phi(\mathbf{x})>, a\_0 \neq 0. \tag{17}$$

Let $H\_\phi \in Z^{n \times n}$ be the square matrix given by

$$H = H\_{\phi} = \begin{pmatrix} 0 & \cdots & 0 & a\_0 \\ & & & a\_1 \\ & I\_{n-1} & & \vdots \\ & & & a\_{n-1} \end{pmatrix}\_{n \times n}, \tag{18}$$

where $I\_{n-1}$ is the $(n-1) \times (n-1)$ identity matrix. Clearly, $\phi(x)$ is the characteristic polynomial of $H$, and $H$ defines a linear transformation $\mathbb{R}^n \to \mathbb{R}^n$ by $x \mapsto Hx$, where $\mathbb{R}$ is the real number field and $x$ is a column vector in $\mathbb{R}^n$. We extend this transformation to $\mathbb{R}^{2n}$ and denote the extension by $\sigma$:

$$
\sigma \begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \begin{pmatrix} H\alpha \\ H\beta \end{pmatrix}, \text{ where } \begin{pmatrix} \alpha \\ \beta \end{pmatrix} \in \mathbb{R}^{2n}. \tag{19}
$$

Of course, $\sigma$ is again a linear transformation $\mathbb{R}^{2n} \to \mathbb{R}^{2n}$.
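A small NumPy sketch makes (18) and (19) concrete: it builds $H\_\phi$ from the coefficient vector $a = (a\_0, \dots, a\_{n-1})$, checks numerically that its characteristic polynomial is $\phi(x)$, and applies $\sigma$ to a vector in $\mathbb{R}^{2n}$. The example $a = (2, -1, 3)$, i.e. $\phi(x) = x^3 - 3x^2 + x - 2$, and the function names are hypothetical.

```python
import numpy as np


def companion_H(a):
    """H_phi from (18): a_0 top right, I_{n-1} below row 0, (a_1, ..., a_{n-1}) in the last column."""
    n = len(a)
    H = np.zeros((n, n))
    H[1:, : n - 1] = np.eye(n - 1)
    H[:, n - 1] = a
    return H


def sigma(v, H):
    """sigma from (19): apply H separately to the two halves of v in R^{2n}."""
    n = H.shape[0]
    return np.concatenate([H @ v[:n], H @ v[n:]])


a = [2, -1, 3]                           # hypothetical phi(x) = x^3 - 3x^2 + x - 2
H = companion_H(a)
char_poly = np.poly(H)                   # det(xI - H), highest-degree coefficient first
expected = [1] + [-c for c in reversed(a)]
print(np.allclose(char_poly, expected))
```

`np.poly` works through the eigenvalues in floating point, hence the tolerance-based comparison.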

According to Micciancio (2001), a $q$-ary lattice is a lattice $L$ such that $qZ^n \subset L \subset Z^n$, where $q$ is a positive integer.

**Definition 3** A $q$-ary lattice $L$ is called a convolutional modular lattice if $L$ has even dimension $2n$ and satisfies

$$\forall \begin{pmatrix} \alpha \\ \beta \end{pmatrix} \in L \Rightarrow \sigma \begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \begin{pmatrix} H\alpha \\ H\beta \end{pmatrix} \in L. \tag{20}$$

In other words, a convolutional modular lattice is a *q*-ary lattice in even dimension and is closed under the linear transformation σ.

Recall that the secret key $\begin{pmatrix} f \\ g \end{pmatrix}$ of NTRU is a pair of polynomials of degree at most $n-1$, so we may regard $f$ and $g$ as column vectors in $Z^n$. To obtain a convolutional modular lattice containing $\begin{pmatrix} f \\ g \end{pmatrix}$, we need the help of ideal matrices. The ideal matrix generated by a vector $f$ is defined by

$$H^\*(f) = H^\*\_\phi(f) = [f, Hf, H^2f, \dots, H^{n-1}f]\_{n \times n},\tag{21}$$

which is the block matrix whose columns are $H^k f$ ($0 \leqslant k \leqslant n-1$). It is easily seen that $H^\*(f)$ generalizes the classical circulant matrix (see Davis (1994)). In fact, let $\phi(x) = x^n - 1$ and $f(x) = f\_0 + f\_1 x + \dots + f\_{n-1}x^{n-1} \in Z[x]$; then the ideal matrix $H^\*\_\phi(f)$ generated by $f$ is given by


$$H^\*(f) = H^\*\_\phi(f) = \begin{pmatrix} f\_0 & f\_{n-1} & \cdots & f\_1 \\ f\_1 & f\_0 & \cdots & f\_2 \\ \vdots & \vdots & & \vdots \\ f\_{n-1} & f\_{n-2} & \cdots & f\_0 \end{pmatrix}, \quad \phi(x) = x^n - 1,$$

which is known as a circulant matrix. Ideal matrices and ideal lattices also play an important role in Ajtai's construction of collision-resistant hash functions; for related material we refer to (Ajtai, 1996; Ajtai & Dwork, 1997; Lint, 1999; Niederreiter, 1986; Plantard & Schneider, 2013; Pradhan et al., 2019).
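The circulant special case can be confirmed numerically. The NumPy sketch below builds $H^\*(f) = [f, Hf, \dots, H^{n-1}f]$ from (21) for $\phi(x) = x^5 - 1$, i.e. $a = (1, 0, 0, 0, 0)$, and compares it with the classical circulant whose $(i, j)$ entry is $f\_{(i-j) \bmod n}$. The example vector $f$ and the function names are hypothetical.

```python
import numpy as np


def companion_H(a):
    """Integer H_phi from (18)."""
    n = len(a)
    H = np.zeros((n, n), dtype=np.int64)
    H[1:, : n - 1] = np.eye(n - 1, dtype=np.int64)
    H[:, n - 1] = a
    return H


def ideal_matrix(f, a):
    """H*(f) = [f, Hf, H^2 f, ..., H^{n-1} f] from (21)."""
    H = companion_H(a)
    cols = [np.array(f, dtype=np.int64)]
    for _ in range(len(f) - 1):
        cols.append(H @ cols[-1])
    return np.column_stack(cols)


def circulant(f):
    """Classical circulant matrix: entry (i, j) is f[(i - j) mod n]."""
    n = len(f)
    return np.array([[f[(i - j) % n] for j in range(n)] for i in range(n)], dtype=np.int64)


f = [3, 1, 4, 1, 5]
a_cyclic = [1, 0, 0, 0, 0]  # phi(x) = x^5 - 1
print(np.array_equal(ideal_matrix(f, a_cyclic), circulant(f)))
```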

We first establish some basic properties of the ideal matrix $H^\*(f)$; most of them are known when $H^\*(f)$ is a circulant matrix.

**Lemma 5** *Suppose* $H$ *and* $H^\*(f)$ *are given by (18) and (21), respectively. Then for any* $f \in \mathbb{R}^n$*, we have*

$$H \cdot H^\*(f) = H^\*(f) \cdot H, \quad \forall f \in \mathbb{R}^n.$$

*Proof* Since $\phi(x) = x^n - a\_{n-1}x^{n-1} - \dots - a\_1 x - a\_0$ is the characteristic polynomial of $H$, the Cayley-Hamilton theorem gives

$$H^n = a\_0 I\_n + a\_1 H + \dots + a\_{n-1} H^{n-1}.\tag{22}$$

Let

$$b = \begin{pmatrix} a\_1 \\ a\_2 \\ \vdots \\ a\_{n-1} \end{pmatrix}, \text{ and } H = \begin{pmatrix} 0 & a\_0 \\ I\_{n-1} & b \end{pmatrix}.$$

By (21), we have

$$H^\*(f)H = [f, Hf, \dots, H^{n-1}f] \begin{pmatrix} 0 & a\_0 \\ I\_{n-1} & b \end{pmatrix}$$

$$= [Hf, H^2f, \dots, H^{n-1}f, a\_0f + a\_1Hf + \dots + a\_{n-1}H^{n-1}f]$$

$$= [Hf, H^2f, \dots, H^{n-1}f, H^nf]$$

$$= H[f, Hf, \dots, H^{n-1}f] = H \cdot H^\*(f).$$

The lemma follows.

**Lemma 6** *For any* $f = \begin{pmatrix} f\_0 \\ f\_1 \\ \vdots \\ f\_{n-1} \end{pmatrix} \in \mathbb{R}^n$*, we have*

$$H^\*(f) = f\_0 I\_n + f\_1 H + \dots + f\_{n-1} H^{n-1}.\tag{23}$$

*Proof* We prove this by induction on $n$. The case $n = 1$ is trivial. Suppose the statement holds for $n$, and consider the case $n + 1$. To this end, we write $H = H\_n$, and let $e\_1, e\_2, \dots, e\_n$ be the standard unit column vectors in $\mathbb{R}^n$, namely

$$e\_1 = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \quad e\_2 = \begin{pmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix}, \quad \dots, \quad e\_n = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix},$$

and

$$H\_{n+1} = \begin{pmatrix} 0 & A\_0 \\ e\_1 & H\_n \end{pmatrix},$$

where $A\_0 = (0, 0, \dots, a\_0) \in \mathbb{R}^n$ is a row vector. For any $k$ with $1 \leqslant k \leqslant n-1$, it is easy to check that

$$H\_n e\_k = e\_{k+1}, \ H\_n^k e\_1 = e\_{k+1} \text{ and } H\_{n+1}^k = \begin{pmatrix} 0 & A\_0 H\_n^{k-1} \\ e\_k & H\_n^k \end{pmatrix}.$$

Let $f = \begin{pmatrix} f\_0 \\ f\_1 \\ \vdots \\ f\_n \end{pmatrix} \in \mathbb{R}^{n+1}$, and write

$$f' = \begin{pmatrix} f\_1 \\ f\_2 \\ \vdots \\ f\_n \end{pmatrix} \in \mathbb{R}^n, \quad \text{so that} \quad f = \begin{pmatrix} f\_0 \\ f' \end{pmatrix}.$$

By the induction hypothesis, we have

$$H\_n^\*(f') = [f', H\_n f', \dots, H\_n^{n-1} f'] = f\_1 I\_n + f\_2 H\_n + \dots + f\_n H\_n^{n-1}.$$

It follows that

$$H\_{n+1}^\*(f) = \left[ \begin{pmatrix} f\_0 \\ f' \end{pmatrix}, H\_{n+1} \begin{pmatrix} f\_0 \\ f' \end{pmatrix}, \dots, H\_{n+1}^n \begin{pmatrix} f\_0 \\ f' \end{pmatrix} \right]$$

$$= f\_0 I\_{n+1} + f\_1 H\_{n+1} + \dots + f\_n H\_{n+1}^n.$$

This completes the proof of Lemma 6.
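Lemmas 5 and 6 can be spot-checked in exact integer arithmetic. The NumPy sketch below takes a hypothetical $\phi(x) = x^4 - 5x^3 + 2x - 3$, i.e. $a = (3, -2, 0, 5)$, and a hypothetical $f$, and verifies that $H^\*(f) = f\_0 I + f\_1 H + \dots + f\_{n-1}H^{n-1}$ as in (23) and that $H$ commutes with $H^\*(f)$.

```python
import numpy as np


def companion_H(a):
    """Integer H_phi from (18)."""
    n = len(a)
    H = np.zeros((n, n), dtype=np.int64)
    H[1:, : n - 1] = np.eye(n - 1, dtype=np.int64)
    H[:, n - 1] = a
    return H


def ideal_matrix(f, a):
    """H*(f) = [f, Hf, ..., H^{n-1} f] from (21)."""
    H = companion_H(a)
    cols = [np.array(f, dtype=np.int64)]
    for _ in range(len(f) - 1):
        cols.append(H @ cols[-1])
    return np.column_stack(cols)


def poly_in_H(f, a):
    """f_0 I + f_1 H + ... + f_{n-1} H^{n-1}, the right-hand side of (23)."""
    H = companion_H(a)
    n = len(a)
    out = np.zeros((n, n), dtype=np.int64)
    P = np.eye(n, dtype=np.int64)  # current power H^k
    for fk in f:
        out += fk * P
        P = P @ H
    return out


a = [3, -2, 0, 5]  # hypothetical phi(x) = x^4 - 5x^3 + 2x - 3
f = [1, -1, 2, 7]
H = companion_H(a)
Hf = ideal_matrix(f, a)
lemma6 = np.array_equal(Hf, poly_in_H(f, a))
lemma5 = np.array_equal(H @ Hf, Hf @ H)
print(lemma6, lemma5)
```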

From now on, we suppose that $\phi(x) \in Z[x]$ is a separable polynomial, so that its complex roots $w\_1, w\_2, \dots, w\_n$ are pairwise distinct. The Vandermonde matrix $V\_\phi$ generated by $\{w\_1, w\_2, \dots, w\_n\}$ is

$$V\_{\phi} = \begin{pmatrix} 1 & w\_1 & \cdots & w\_1^{n-1} \\ 1 & w\_2 & \cdots & w\_2^{n-1} \\ \vdots & \vdots & & \vdots \\ 1 & w\_n & \cdots & w\_n^{n-1} \end{pmatrix}, \qquad \text{and} \quad \det(V\_{\phi}) \neq 0.$$

**Lemma 7** *Let* $f(x) = f\_0 + f\_1 x + \dots + f\_{n-1}x^{n-1} \in \mathbb{R}[x]$*. Then*

$$H^\*(f) = V\_\phi^{-1} \operatorname{diag} \left\{ f(\mathbf{w}\_1), f(\mathbf{w}\_2), \dots, f(\mathbf{w}\_n) \right\} V\_\phi,\tag{24}$$

*where diag* { *f* (*w*1), *f* (*w*2), ..., *f* (*wn*)} *is the diagonal matrix.*

*Proof* By Theorem 3.2.5 of Davis (1994), for *H*, we have

$$H = V\_{\phi}^{-1} \text{diag} \left\{ w\_1, w\_2, \dots, w\_n \right\} V\_{\phi}. \tag{25}$$

By Lemma 6, $H^\*(f) = f(H)$, and it follows that

$$H^\*(f) = V\_\phi^{-1} \text{diag}\left\{ f(\mathbf{w}\_1), f(\mathbf{w}\_2), \dots, f(\mathbf{w}\_n) \right\} V\_\phi.$$
 
$$\Box$$
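Lemma 7 and part (iii) of Theorem 2 below can be checked numerically. In the NumPy sketch below, the diagonalizing matrix has $j$-th row $(1, w\_j, \dots, w\_j^{n-1})$ (`np.vander(w, increasing=True)`); with this orientation each row is a left eigenvector of $H$, because $(1, w\_j, \dots, w\_j^{n-1})H = w\_j(1, w\_j, \dots, w\_j^{n-1})$ when $\phi(w\_j) = 0$. The separable example $\phi(x) = x^3 - 3x^2 + x - 2$ and the vector $f$ are hypothetical.

```python
import numpy as np


def companion_H(a):
    """Floating-point H_phi from (18)."""
    n = len(a)
    H = np.zeros((n, n))
    H[1:, : n - 1] = np.eye(n - 1)
    H[:, n - 1] = a
    return H


def ideal_matrix(f, a):
    """H*(f) = [f, Hf, ..., H^{n-1} f] from (21)."""
    H = companion_H(a)
    cols = [np.array(f, dtype=float)]
    for _ in range(len(f) - 1):
        cols.append(H @ cols[-1])
    return np.column_stack(cols)


a = [2, -1, 3]                                   # hypothetical separable phi(x) = x^3 - 3x^2 + x - 2
f = [1, 4, -2]                                   # f(x) = 1 + 4x - 2x^2
w = np.roots([1] + [-c for c in reversed(a)])    # complex roots of phi
V = np.vander(w, increasing=True)                # row j = (1, w_j, ..., w_j^{n-1})
f_at_w = np.polyval(f[::-1], w)                  # f(w_1), ..., f(w_n)
rhs = np.linalg.inv(V) @ np.diag(f_at_w) @ V
lhs = ideal_matrix(f, a)
print(np.allclose(lhs, rhs), np.isclose(np.linalg.det(lhs), np.prod(f_at_w)))
```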

Now, we summarize some basic properties for ideal matrix as follows.

**Theorem 2** *Let* $f \in \mathbb{R}^n$ *and* $g \in \mathbb{R}^n$ *be two column vectors, and let* $H^\*(f)$ *be the ideal matrix generated by* $f$*. Then we have the following:*

*(i)* $H^\*(f)H^\*(g) = H^\*(g)H^\*(f)$*.*

*(ii)* $H^\*(f)H^\*(g) = H^\*(H^\*(f)g)$*.*

*(iii)* $\det(H^\*(f)) = \prod\_{i=1}^{n} f(w\_i)$*.*

*(iv)* $H^\*(f)$ *is invertible if and only if* $\phi(x)$ *and* $f(x)$ *are coprime, i.e.,* $\gcd(\phi(x), f(x)) = 1$*.*

*Proof* (i) and (ii) follow immediately from Lemma 6, and (iii) and (iv) follow from Lemma 7. Here we only give an equivalent form of (ii). Let

$$f \ast \mathbf{g} = H^\*(f)\mathbf{g}.\tag{26}$$

Then by (ii) we have

$$H^\*(f \ast \mathbf{g}) = H^\*(f)H^\*(\mathbf{g}).\tag{27}$$
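The product $f \ast g = H^\*(f)g$ in (26) is the coefficient vector of $f(x)g(x) \bmod \phi(x)$, since multiplying by $H$ corresponds to multiplying by $x$ modulo $\phi(x)$. The NumPy sketch below checks this against `numpy.polydiv`, together with (i) of Theorem 2 and (27), for a hypothetical $\phi(x) = x^4 - 2x^3 - x - 1$ and hypothetical vectors $f, g$.

```python
import numpy as np


def companion_H(a):
    """Floating-point H_phi from (18)."""
    n = len(a)
    H = np.zeros((n, n))
    H[1:, : n - 1] = np.eye(n - 1)
    H[:, n - 1] = a
    return H


def ideal_matrix(f, a):
    """H*(f) = [f, Hf, ..., H^{n-1} f] from (21)."""
    H = companion_H(a)
    cols = [np.array(f, dtype=float)]
    for _ in range(len(f) - 1):
        cols.append(H @ cols[-1])
    return np.column_stack(cols)


def polymulmod(f, g, a):
    """Low-to-high coefficients of f(x) g(x) mod phi(x), via numpy.polydiv."""
    n = len(a)
    prod = np.polymul(f[::-1], g[::-1])             # highest degree first
    phi = [1] + [-c for c in reversed(a)]           # phi(x), highest degree first
    _, rem = np.polydiv(prod, phi)
    rem = np.round(rem[::-1]).astype(np.int64)      # back to low-to-high
    return np.concatenate([rem, np.zeros(n - len(rem), dtype=np.int64)])


a = [1, 1, 0, 2]        # hypothetical phi(x) = x^4 - 2x^3 - x - 1
f = [1, 2, 0, -1]
g = [3, 0, 1, 1]
Hf, Hg = ideal_matrix(f, a), ideal_matrix(g, a)
conv = Hf @ np.array(g)  # f * g = H*(f) g, as in (26)
print(np.allclose(conv, polymulmod(f, g, a)),
      np.allclose(Hf @ Hg, Hg @ Hf),
      np.allclose(ideal_matrix(conv, a), Hf @ Hg))
```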

To construct a convolutional modular lattice containing the vector $\begin{pmatrix} f \\ g \end{pmatrix} \in Z^{2n}$, let $(H^\*(f))'$ be the transpose of $H^\*(f)$, and define

$$A = [(H^\*(f))', (H^\*(g))'] = \begin{pmatrix} f' & g' \\ f'H' & g'H' \\ f'(H')^2 & g'(H')^2 \\ \vdots & \vdots \\ f'(H')^{n-1} & g'(H')^{n-1} \end{pmatrix}\_{n \times 2n},\tag{28}$$

$$A' = \begin{pmatrix} H^\*(f) \\ H^\*(g) \end{pmatrix} = \begin{pmatrix} f & Hf & \cdots & H^{n-1} f \\ g & Hg & \cdots & H^{n-1} g \end{pmatrix}\_{2n \times n}.\tag{29}$$

We regard $A$ and $A'$ as matrices over $Z\_q$, i.e., $A \in Z\_q^{n \times 2n}$ and $A' \in Z\_q^{2n \times n}$. The $q$-ary lattice $\Lambda\_q(A)$ is defined by (see Micciancio (2001))

$$\Lambda\_q(A) = \{ y \in Z^{2n} \mid \text{there exists } x \in Z^n \text{ such that } y \equiv A' x \ (\text{mod } q) \}. \tag{30}$$

With the above notation, we have the following.

**Theorem 3** *For any column vectors $f \in Z^n$ and $g \in Z^n$, $\wedge\_q(A)$ is a convolutional modular lattice, and $\begin{pmatrix} f \\ g \end{pmatrix} \in \wedge\_q(A)$.*

*Proof* It is known that $\wedge\_q(A)$ is a $q$-ary lattice, i.e.

$$q\,\mathbf{Z}^{2n} \subset \wedge\_q(A) \subset \mathbf{Z}^{2n}.$$

It remains to prove that $\wedge\_q(A)$ is invariant under the linear transformation $\sigma$ given by (2.4). If $\mathbf{y} \in \wedge\_q(A)$, then $\mathbf{y} \equiv A'\mathbf{x}$ (mod $q$) for some $\mathbf{x} \in Z^n$, and by Lemma 2 we have

$$\sigma(\mathbf{y}) \equiv \begin{pmatrix} HH^\*(f)\mathbf{x} \\ HH^\*(\mathbf{g})\mathbf{x} \end{pmatrix} = \begin{pmatrix} H^\*(f)H\mathbf{x} \\ H^\*(\mathbf{g})H\mathbf{x} \end{pmatrix} = A' H\mathbf{x} \ (\text{mod } q).$$

This means that $\sigma(\mathbf{y}) \in \wedge\_q(A)$ whenever $\mathbf{y} \in \wedge\_q(A)$. Let

$$e = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} \in Z^n \Rightarrow H^\*(f)e = f, \text{ and } H^\*(g)e = g.$$

Then $A'e = \begin{pmatrix} f \\ g \end{pmatrix}$, so $\begin{pmatrix} f \\ g \end{pmatrix} \in \wedge\_q(A)$, and Theorem 3 follows.
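The two facts used in this proof, that $\sigma$ maps $\wedge\_q(A)$ into itself and that $A'e$ recovers the stacked vector of $f$ and $g$, can be illustrated numerically. Since the definition (2.4) of $\sigma$ is not reproduced here, the sketch below assumes, as the computation above suggests, that $\sigma$ applies $H$ to each half of $\mathbf{y} = (\mathbf{y}\_1', \mathbf{y}\_2')' \in Z^{2n}$. It also assumes the companion form of $H$, takes $\phi(x) = x^4 - 1$ and $q = 17$ as example values, and uses hypothetical helper names.

```python
def rotation_matrix(phi):
    """Companion-form H for phi(x) = x^n - phi[n-1] x^(n-1) - ... - phi[0]."""
    n = len(phi)
    H = [[0] * n for _ in range(n)]
    for i in range(1, n):
        H[i][i - 1] = 1
    for i in range(n):
        H[i][n - 1] = phi[i]
    return H


def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def ideal_matrix(H, f):
    """H*(f) = [f, Hf, ..., H^(n-1) f], returned row by row."""
    cols, v = [], list(f)
    for _ in range(len(f)):
        cols.append(v)
        v = matvec(H, v)
    return [[col[i] for col in cols] for i in range(len(f))]


def sigma(H, y):
    """Assumed action of sigma on Z^(2n): apply H to each half of y."""
    n = len(H)
    return matvec(H, y[:n]) + matvec(H, y[n:])


def mod(v, q):
    return [a % q for a in v]


q = 17
H = rotation_matrix([1, 0, 0, 0])               # phi(x) = x^4 - 1
f, g = [1, 3, 0, -1], [2, 0, -1, 1]
A_t = ideal_matrix(H, f) + ideal_matrix(H, g)   # A' = (H*(f); H*(g)), a 2n x n matrix

x = [5, -2, 7, 1]
y = mod(matvec(A_t, x), q)                      # a point of the q-ary lattice
lhs = mod(sigma(H, y), q)                       # sigma(y) reduced mod q
rhs = mod(matvec(A_t, matvec(H, x)), q)         # A'(Hx) reduced mod q
print(lhs == rhs)
print(matvec(A_t, [1, 0, 0, 0]) == f + g)       # A'e equals f stacked on g
```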

Since $\wedge\_q(A) \subset Z^{2n}$, it has a unique basis $N$ in Hermite normal form, which is an upper triangular matrix given by

$$N = \begin{pmatrix} I\_n & H^\*(h) \\ 0 & qI\_n \end{pmatrix}, \text{ where } h \equiv (H^\*(f))^{-1} g \ (\text{mod } q). \tag{31}$$

Next we consider the parameter system of NTRU. To choose the parameters, let $d\_f$ be a positive integer, and let $\{p, 0, -p\}^n \subset Z^n$ denote the subset of vectors whose entries lie in $\{p, 0, -p\}$, with exactly $d\_f + 1$ positive entries and $d\_f$ negative entries, the remaining $n - 2d\_f - 1$ entries being zero. We impose the following conditions on the parameters:


$$f(\mathbf{x}) - 1 \in \{p, 0, -p\}^n, \quad \mathbf{g} \in \{p, 0, -p\}^n.$$

(iii) *H*∗( *f* ) is invertible modulo *q*.

(iv) $d\_f < \dfrac{q/2 - 1}{4p} - \dfrac{1}{2}$.

Under the above conditions, by Lemma 2, we have

$$H^\*(f) \equiv I\_n(\text{mod } p), \text{ and } H^\*(g) \equiv 0(\text{mod } p). \tag{32}$$

Now, we state a generalization of NTRU as follows.


$$
\begin{pmatrix} c \\ 0 \end{pmatrix} = N \begin{pmatrix} m \\ r \end{pmatrix} \equiv \begin{pmatrix} m + H^\*(h)r \\ 0 \end{pmatrix} (\text{mod } q), \tag{33}
$$

where $h$ is given by (31). Then the $n$-dimensional vector $c$

$$c \equiv m + H^\*(h)r(\text{mod } q)$$

is the ciphertext.

• Decryption. Suppose that the entries of the $n$-dimensional vector $c$ lie in the interval $[-\frac{q}{2}, \frac{q}{2}]$. The ciphertext $c$ is decrypted by multiplying it by the secret matrix $H^\*(f)$ modulo $q$, which gives

$$H^\*(f)c \equiv H^\*(f)m + H^\*(f)H^\*(h)r \equiv H^\*(f)m + H^\*(g)r (\text{mod } q). \tag{34}$$

Here we use identity (ii) of Theorem 2, namely

$$H^\*(f)H^\*(\mathbf{g}) = H^\*(H^\*(f)\mathbf{g}),$$

together with $H^\*(f)h \equiv g$ (mod $q$), which follows from (31).

If condition (iv) above is satisfied, it is easily seen that the coordinates of the vector $H^\*(f)m + H^\*(g)r$ are all bounded by $\frac{q}{2}$ in absolute value; with high probability this remains true even for larger values of $d\_f$. The decryption is completed by reducing (34) modulo $p$, which by (32) gives

$$H^\*(f)m + H^\*(g)r \equiv I\_n m = m \ (\text{mod } p).$$

Thus one recovers the plaintext $m$ from the ciphertext $c$.
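The key generation (31), encryption (33) and decryption (34) steps can be exercised end to end on toy parameters. The sketch below is only an illustration under our own assumptions, not a secure or reference implementation. It takes $\phi(x) = x^7 - 1$ (the classical NTRU ring), $n = 7$, $p = 3$, a prime $q = 257$ and $d\_f = 1$. It reads the condition on $f$ as saying that $f - e$ lies in $\{p, 0, -p\}^n$, and it draws $m$ and $r$ with entries in $\{-1, 0, 1\}$, since the excerpt does not fix how $m$ and $r$ are chosen. All helper names are hypothetical. With these choices every coordinate of $H^\*(f)m + H^\*(g)r$ has absolute value at most $1 + 2p(2d\_f + 1) = 19 < q/2$, so the final reduction modulo $p$ recovers $m$ exactly.

```python
import random


def rotation_matrix(phi):
    """Companion-form H for phi(x) = x^n - phi[n-1] x^(n-1) - ... - phi[0]."""
    n = len(phi)
    H = [[0] * n for _ in range(n)]
    for i in range(1, n):
        H[i][i - 1] = 1
    for i in range(n):
        H[i][n - 1] = phi[i]
    return H


def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def ideal_matrix(H, f):
    """H*(f) = [f, Hf, ..., H^(n-1) f], returned row by row."""
    cols, v = [], list(f)
    for _ in range(len(f)):
        cols.append(v)
        v = matvec(H, v)
    return [[col[i] for col in cols] for i in range(len(f))]


def solve_mod(M, b, q):
    """Solve M x = b (mod q) for prime q by Gauss-Jordan elimination; None if singular."""
    n = len(M)
    A = [[M[i][j] % q for j in range(n)] + [b[i] % q] for i in range(n)]
    for col in range(n):
        piv = next((r for r in range(col, n) if A[r][col]), None)
        if piv is None:
            return None
        A[col], A[piv] = A[piv], A[col]
        inv = pow(A[col][col], -1, q)           # modular inverse (Python 3.8+)
        A[col] = [a * inv % q for a in A[col]]
        for r in range(n):
            if r != col and A[r][col]:
                c = A[r][col]
                A[r] = [(x - c * y) % q for x, y in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]


def centered(v, m):
    """Centered representatives of the entries of v modulo m."""
    return [(a + m // 2) % m - m // 2 for a in v]


def ternary(n, plus, minus, rng):
    """Random vector with `plus` entries +1, `minus` entries -1 and zeros elsewhere."""
    v = [1] * plus + [-1] * minus + [0] * (n - plus - minus)
    rng.shuffle(v)
    return v


def keygen(H, p, q, d_f, rng):
    n = len(H)
    while True:
        f = [int(i == 0) + p * a for i, a in enumerate(ternary(n, d_f + 1, d_f, rng))]
        g = [p * a for a in ternary(n, d_f + 1, d_f, rng)]
        h = solve_mod(ideal_matrix(H, f), g, q)   # h = (H*(f))^(-1) g mod q, cf. (31)
        if h is not None:                         # condition (iii): H*(f) invertible mod q
            return f, g, h


def encrypt(H, h, m, r, q):
    return [(a + b) % q for a, b in zip(m, matvec(ideal_matrix(H, h), r))]   # cf. (33)


def decrypt(H, f, c, p, q):
    a = centered(matvec(ideal_matrix(H, f), c), q)   # H*(f)m + H*(g)r exactly, cf. (34)
    return centered(a, p)                            # reduce modulo p


rng = random.Random(2023)
n, p, q, d_f = 7, 3, 257, 1
H = rotation_matrix([1] + [0] * (n - 1))   # phi(x) = x^7 - 1
f, g, h = keygen(H, p, q, d_f, rng)
m = [rng.choice((-1, 0, 1)) for _ in range(n)]
r = [rng.choice((-1, 0, 1)) for _ in range(n)]
c = encrypt(H, h, m, r, q)
print(m, decrypt(H, f, c, p, q))
```

The matrix form is used only because it mirrors (31) to (34); for larger $n$ one would normally work with polynomial arithmetic modulo $\phi(x)$ instead of $n \times n$ matrices.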

# **References**


Lint, J. H. V. (1999) *Introduction to coding theory* (Vol. 86). Springer. GTM.


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Cyclic Lattices, Ideal Lattices, and Bounds for the Smoothing Parameter**

**Zheng Zhiyong, Liu Fengxia, Lu Yunfan, and Tian Kun**

**Abstract** Cyclic lattices and ideal lattices were introduced by Micciancio (2002) and by Lyubashevsky and Micciancio (2006), respectively; they play an important role in Ajtai's construction of a collision-resistant hash function (see Ajtai (1996), Ajtai and Dwork (1997)) and in Gentry's construction of fully homomorphic encryption (see Gentry (2009)). Let $R = \mathbb{Z}[x]/\langle \phi(x) \rangle$ be a quotient ring of the ring of polynomials with integer coefficients. Lyubashevsky and Micciancio regarded an ideal lattice as the counterpart of an ideal of $R$, but they neither explained how to extend this definition to the whole Euclidean space $\mathbb{R}^n$ nor exhibited the relationship between cyclic lattices and ideal lattices. In this chapter, we regard cyclic lattices and ideal lattices as the counterparts of finitely generated $R$-modules, which allows us to show that ideal lattices are in fact a special subclass of cyclic lattices, namely cyclic integer lattices. Indeed, there is a one-to-one correspondence between cyclic lattices in $\mathbb{R}^n$ and finitely generated $R$-modules (see Theorem 4). On the other hand, since $R$ is a Noetherian ring, each ideal of $R$ is a finitely generated $R$-module, so it is natural and reasonable to regard ideal lattices as a special subclass of cyclic lattices (see Corollary 7). It is worth noting that we use a more general rotation matrix here, so our definitions and results on cyclic lattices and ideal lattices take more general forms. As an application, we provide a cyclic lattice with an explicit and countable upper bound for the smoothing parameter (see Theorem 5). Whether the shortest vector problem on cyclic lattices is NP-hard is an open problem (see Micciancio (2002)); our results may be viewed as substantial progress in this direction.

Z. Zhiyong · L. Fengxia (B) · L. Yunfan · T. Kun

Engineering Research Center of Ministry of Education for Financial Computing and Digital Engineering, Renmin University of China, Beijing 100872, China e-mail: liu\_fx@ruc.edu.cn

Z. Zhiyong e-mail: zhengzy@ruc.edu.cn

L. Yunfan e-mail: luyunfan@ruc.edu.cn


Z. Zheng (ed.), *Proceedings of the Second International Forum on Financial Mathematics and Financial Technology*, Financial Mathematics and Fintech, https://doi.org/10.1007/978-981-99-2366-3\_7

**Keywords** Cyclic lattice · Ideal lattice · Finitely generated *R*-module · Smoothing parameter

# **1 Discrete Subgroup in** $\mathbb{R}^n$

Let $\mathbb{R}$ be the field of real numbers, $\mathbb{Z}$ the ring of integers, and $\mathbb{R}^n$ the Euclidean space, i.e. the $n$-dimensional linear space over $\mathbb{R}$ with the Euclidean norm $|x|$ given by

$$|\mathbf{x}| = \left(\sum\_{i=1}^{n} x\_i^2\right)^{\frac{1}{2}}, \quad \text{where } x' = (x\_1, x\_2, \dots, x\_n) \in \mathbb{R}^n.$$

We use column vector notation for $\mathbb{R}^n$ throughout this chapter, and $x' = (x\_1, x\_2, \dots, x\_n)$ is the transpose of $x$, which is called a row vector of $\mathbb{R}^n$.

**Definition 1** Let $L \subset \mathbb{R}^n$ be a non-trivial additive subgroup. $L$ is called a discrete subgroup if there is a positive real number $\lambda > 0$ such that

$$\min\_{\mathbf{x}\in L, \mathbf{x}\neq 0} |\mathbf{x}| \geqslant \lambda > 0.\tag{1}$$

As usual, the ball with center $x\_0$ and radius $\delta$ is defined by

$$b(\mathbf{x}\_0, \delta) = \{ \mathbf{x} \in \mathbb{R}^n \mid |\mathbf{x} - \mathbf{x}\_0| \leqslant \delta \}.$$

If $L$ is a discrete subgroup of $\mathbb{R}^n$, then every ball $b(0, \delta)$ contains only finitely many vectors of $L$, so we can always find a vector $\alpha \in L$ such that

$$|\alpha| = \min\_{\mathbf{x} \in L, \mathbf{x} \neq \mathbf{0}} |\mathbf{x}| = \lambda > 0, \quad \alpha \in L. \tag{2}$$

$\alpha$ is called a shortest vector of $L$, and $\lambda$ is called the minimum distance of $L$.

Let $B = [\beta\_1, \beta\_2, \dots, \beta\_m] \in \mathbb{R}^{n\times m}$ be an $n \times m$ matrix with $\text{rank}(B) = m \leqslant n$, which means that $\beta\_1, \beta\_2, \dots, \beta\_m$ are $m$ linearly independent vectors in $\mathbb{R}^n$. The lattice $L(B)$ generated by $B$ is defined by

$$L(B) = \left\{ \sum\_{i=1}^{m} x\_i \beta\_i \;\middle|\; x\_i \in \mathbb{Z} \right\} = \{Bx \mid x \in \mathbb{Z}^m\},\tag{3}$$

which consists of all linear combinations of $\beta\_1, \beta\_2, \dots, \beta\_m$ with coefficients in $\mathbb{Z}$. If $m = n$, $L(B)$ is called a full-rank lattice.

Cyclic Lattices, Ideal Lattices, and Bounds … 131

It is well known that a discrete subgroup $L$ of $\mathbb{R}^n$ is precisely a lattice $L(B)$. We first give a detailed proof here, making use of the theory of simultaneous Diophantine approximation over the real field $\mathbb{R}$ (see Cassels (1971) and Cassels (1963)).

**Lemma 1** *Let $L \subset \mathbb{R}^n$ be a discrete subgroup and let $\alpha\_1, \alpha\_2, \dots, \alpha\_m \in L$ be $m$ vectors of $L$. Then $\alpha\_1, \alpha\_2, \dots, \alpha\_m$ are linearly independent over $\mathbb{R}$ if and only if they are linearly independent over $\mathbb{Z}$.*

*Proof* If $\alpha\_1, \alpha\_2, \dots, \alpha\_m$ are linearly independent over $\mathbb{R}$, they are trivially linearly independent over $\mathbb{Z}$. Conversely, suppose that $\alpha\_1, \alpha\_2, \dots, \alpha\_m$ are linearly independent over $\mathbb{Z}$, and consider an arbitrary linear combination over $\mathbb{R}$. Let

$$a\_1\alpha\_1 + a\_2\alpha\_2 + \dots + a\_m\alpha\_m = 0, \quad a\_i \in \mathbb{R}.\tag{4}$$

We must prove that (4) forces $a\_1 = a\_2 = \cdots = a\_m = 0$, which means that $\alpha\_1, \alpha\_2, \dots, \alpha\_m$ are linearly independent over $\mathbb{R}$.

By Minkowski's Third Theorem (see Theorem VII of Cassels (1963)), for any sufficiently large $N > 1$ there are a positive integer $q \geqslant 1$ and integers $p\_1, p\_2, \dots, p\_m \in \mathbb{Z}$ such that

$$\max\_{1 \leqslant i \leqslant m} |qa\_i - p\_i| < N^{-\frac{1}{m}}, \text{ and } 1 \leqslant q \leqslant N. \tag{5}$$

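Since $a\_1\alpha\_1 + a\_2\alpha\_2 + \dots + a\_m\alpha\_m = 0$ by (4), we have

$$|p\_1\alpha\_1 + p\_2\alpha\_2 + \dots + p\_m\alpha\_m| = \Big|\sum\_{i=1}^{m} (p\_i - qa\_i)\alpha\_i\Big| \leqslant \sum\_{i=1}^{m} |qa\_i - p\_i|\,|\alpha\_i| \leqslant mN^{-\frac{1}{m}} \max\_{1 \leqslant i \leqslant m} |\alpha\_i|. \tag{6}$$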

Let $\lambda$ be the minimum distance of $L$, and let $\varepsilon > 0$ be any positive real number. We select $N$ such that

$$N > \max\left\{ \left(\frac{m}{\varepsilon}\right)^m, \ \left(\frac{m}{\lambda}\right)^m \max\_{1 \leqslant i \leqslant m} |\alpha\_i|^m \right\}.$$

It follows that $mN^{-\frac{1}{m}} < \varepsilon$ and

$$mN^{-\frac{1}{m}}\max\_{1\leqslant i\leqslant m}|\alpha\_i| < \lambda.$$

By (6) we have

$$|p\_1\alpha\_1 + p\_2\alpha\_2 + \dots + p\_m\alpha\_m| < \lambda.$$

Since $p\_1\alpha\_1 + p\_2\alpha\_2 + \cdots + p\_m\alpha\_m \in L$ and its length is less than the minimum distance $\lambda$, we have $p\_1\alpha\_1 + p\_2\alpha\_2 + \cdots + p\_m\alpha\_m = 0$, and hence $p\_1 = p\_2 = \cdots = p\_m = 0$ because $\alpha\_1, \alpha\_2, \dots, \alpha\_m$ are linearly independent over $\mathbb{Z}$. By (5) we then have $|a\_i| \leqslant q|a\_i| < N^{-\frac{1}{m}} < \frac{\varepsilon}{m}$ for all $1 \leqslant i \leqslant m$. Since $\varepsilon > 0$ is arbitrary, we must have $a\_1 = a\_2 = \cdots = a\_m = 0$. This completes the proof of the lemma.
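Inequality (5) is easy to observe numerically. The brute-force sketch below (our own illustration; the function name is hypothetical) searches $1 \leqslant q \leqslant N$ for the best simultaneous approximation of $m = 2$ real numbers. For $N = 100 = 10^2$ the pigeonhole form of Dirichlet's theorem already guarantees some $q \leqslant N$ with $\max\_i |qa\_i - p\_i| < N^{-1/2} = 0.1$; floating-point arithmetic is adequate at this scale.

```python
import math


def simultaneous_approx(a, N):
    """Return (q, err, p) with 1 <= q <= N minimising err = max_i |q*a_i - p_i|,
    where p_i is the integer nearest to q*a_i (first minimiser on ties)."""
    best = None
    for q in range(1, N + 1):
        p = [round(q * ai) for ai in a]
        err = max(abs(q * ai - pi) for ai, pi in zip(a, p))
        if best is None or err < best[1]:
            best = (q, err, p)
    return best


a = (math.sqrt(2), math.sqrt(3))   # m = 2 irrational numbers (example choice)
N = 100
q, err, p = simultaneous_approx(a, N)
print(q, p, err, N ** (-1 / len(a)))
```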

Suppose that $B \in \mathbb{R}^{n\times m}$ is an $n \times m$ matrix with $\text{rank}(B) = m$, and let $B'$ be the transpose of $B$. It is easy to verify that

$$\text{rank}(B'B) = \text{rank}(B) = m \Rightarrow \det(B'B) \neq 0,$$

which implies that $B'B$ is an invertible $m \times m$ square matrix. Since $B'B$ is a positive definite symmetric matrix, there is an orthogonal matrix $P \in \mathbb{R}^{m\times m}$ such that

$$P'B'BP = \text{diag}\{\delta\_1, \delta\_2, \dots, \delta\_m\},\tag{7}$$

where $\delta\_i > 0$ are the characteristic values (eigenvalues) of $B'B$, and $\text{diag}\{\delta\_1, \delta\_2, \dots, \delta\_m\}$ is the $m \times m$ diagonal matrix.

**Lemma 2** *Suppose that $B \in \mathbb{R}^{n\times m}$ with $\text{rank}(B) = m$, let $\delta\_1, \delta\_2, \dots, \delta\_m$ be the $m$ characteristic values of $B'B$, and let $\lambda(L(B))$ be the minimum distance of the lattice $L(B)$. Then we have*

$$\lambda(L(B)) = \min\_{\mathbf{x} \in \mathbb{Z}^m, \ \mathbf{x} \neq \mathbf{0}} |B\mathbf{x}| \geqslant \sqrt{\delta},\tag{8}$$

*where $\delta = \min\{\delta\_1, \delta\_2, \dots, \delta\_m\}$.*

*Proof* Let $A = B'B$. By (7), there exists an orthogonal matrix $P \in \mathbb{R}^{m\times m}$ such that

$$P'AP = \text{diag}\{\delta\_1, \delta\_2, \dots, \delta\_m\}.$$

If $x \in \mathbb{Z}^m$ and $x \neq 0$, we have

$$\begin{aligned} \left| Bx \right|^2 &= x^\prime A x = x^\prime P(P^\prime A P) P^\prime x \\\\ &= (P^\prime x)^\prime \text{diag}\{\delta\_1, \delta\_2, \dots, \delta\_m\} P^\prime x \\\\ &\ge \delta \left| P^\prime x \right|^2 = \delta \left| x \right|^2. \end{aligned}$$

Since $x \in \mathbb{Z}^m$ and $x \neq 0$, we have $|x|^2 \geqslant 1$, so $|Bx| \geqslant \sqrt{\delta}|x| \geqslant \sqrt{\delta}$ for every such $x$. It follows that

$$\min\_{\mathbf{x}\in\mathbb{Z}^m,\ \mathbf{x}\neq \mathbf{0}} |B\mathbf{x}| \geqslant \sqrt{\delta}.$$

This proves Lemma 2.
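The bound (8) can be checked on a small example that is not of full rank. In the sketch below (an illustration with an arbitrarily chosen $B \in \mathbb{R}^{3\times 2}$; helper names are hypothetical), $\delta$ is the smaller eigenvalue of the $2 \times 2$ matrix $B'B$, computed in closed form, and $|B\mathbf{x}|$ is enumerated over integer vectors $\mathbf{x} \neq \mathbf{0}$ in a finite box. The box minimum is only an upper estimate of $\lambda(L(B))$, but Lemma 2 says every enumerated length is at least $\sqrt{\delta}$.

```python
import itertools
import math


def gram(cols):
    """B'B for B given by its list of columns."""
    return [[sum(a * b for a, b in zip(u, v)) for v in cols] for u in cols]


def min_eigenvalue_2x2(G):
    """Smaller eigenvalue of a symmetric 2 x 2 matrix."""
    a, b, c = G[0][0], G[0][1], G[1][1]
    return (a + c) / 2 - math.hypot((a - c) / 2, b)


beta = [(1.0, 1.0, 0.0), (0.5, -1.0, 1.0)]   # columns of B: n = 3, m = 2 (example)
delta = min_eigenvalue_2x2(gram(beta))

# lengths |Bx| for all nonzero x in the box [-10, 10]^2
lengths = [
    math.sqrt(sum((x1 * u + x2 * v) ** 2 for u, v in zip(*beta)))
    for x1, x2 in itertools.product(range(-10, 11), repeat=2)
    if (x1, x2) != (0, 0)
]
print(min(lengths), math.sqrt(delta))
```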

Another application of Lemma 2 is a countable upper bound for the smoothing parameter (see Theorem 5). Combining Lemmas 1 and 2, we obtain the following assertion.

**Theorem 1** *Let $L \subset \mathbb{R}^n$ be a subset. Then $L$ is a discrete subgroup if and only if there is an $n \times m$ matrix $B \in \mathbb{R}^{n\times m}$ with $\text{rank}(B) = m$ such that*


$$L = L(B) = \{ Bx \mid x \in \mathbb{Z}^m \}. \tag{9}$$

*Proof* If $L \subset \mathbb{R}^n$ is a discrete subgroup, then $L$ is a free $\mathbb{Z}$-module. By Lemma 1, we have $\text{rank}\_{\mathbb{Z}}(L) = m \leqslant n$. Let $\beta\_1, \beta\_2, \dots, \beta\_m$ be a $\mathbb{Z}$-basis of $L$; then

$$L = \left\{ \sum\_{i=1}^{m} a\_i \beta\_i \mid a\_i \in \mathbb{Z} \right\}.$$

Writing $B = [\beta\_1, \beta\_2, \dots, \beta\_m]\_{n\times m}$, the rank of the matrix $B$ is $m$ by Lemma 1, and

$$L = \{ Bx \mid x \in \mathbb{Z}^m \} = L(B).$$

Conversely, let $L(B)$ be an arbitrary lattice generated by $B$. Obviously, $L(B)$ is an additive subgroup of $\mathbb{R}^n$, and by Lemma 2 it is also a discrete subgroup. This proves Theorem 1.

**Corollary 1** *Let $L \subset \mathbb{R}^n$ be a lattice and $G \subset L$ an additive subgroup of $L$. Then $G$ is a lattice of $\mathbb{R}^n$.*

**Corollary 2** *Let $L \subset \mathbb{Z}^n$ be an additive subgroup. Then $L$ is a lattice of $\mathbb{R}^n$. Such lattices are called integer lattices.*

By Theorem 1, a lattice $L(B)$ is the same thing as a discrete subgroup of $\mathbb{R}^n$. Suppose that $L = L(B)$ is a lattice with generator matrix $B \in \mathbb{R}^{n\times m}$ and $\text{rank}(B) = m$; we write $\text{rank}(L) = \text{rank}(B)$ and

$$d(L) = \sqrt{\det(B^\prime B)}.\tag{10}$$

In particular, if $\text{rank}(L) = n$, i.e. $L$ is a full-rank lattice, then $d(L) = |\det(B)|$ as usual. A sublattice $N$ of $L$ is a discrete additive subgroup of $L$; the quotient group is written $L/N$, and the cardinality of $L/N$ is denoted by $|L/N|$.

**Lemma 3** *Let $L \subset \mathbb{R}^n$ be a lattice and $N \subset L$ a sublattice. If $\text{rank}(N) = \text{rank}(L)$, then the quotient group $L/N$ is a finite group.*

*Proof* Let $\text{rank}(L) = m$ and $L = L(B)$, where $B \in \mathbb{R}^{n\times m}$ with $\text{rank}(B) = m$. We define a mapping $\sigma$ from $L$ to $\mathbb{Z}^m$ by $\sigma(Bx) = x$. Clearly, $\sigma$ is an additive group isomorphism, $\sigma(N) \subset \mathbb{Z}^m$ is a full-rank lattice of $\mathbb{Z}^m$ (since $\text{rank}(N) = m$), and $L/N \cong \mathbb{Z}^m/\sigma(N)$. It is a well-known result that

$$|\mathbb{Z}^m/\sigma(N)| = d(\sigma(N)).$$

It follows that

$$|L/N| = |\mathbb{Z}^m/\sigma(N)| = d(\sigma(N)).$$

Lemma 3 follows.
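For $m = 2$ the identity $|\mathbb{Z}^m/\sigma(N)| = d(\sigma(N))$ used above can be checked by counting. Taking $\sigma(N) = M\mathbb{Z}^2$ for a nonsingular integer matrix $M$ (example values are our own; the helper name is hypothetical), the integer points of the half-open parallelepiped spanned by the columns of $M$ form a complete set of coset representatives, so their number should equal $|\det M| = d(\sigma(N))$. A minimal exact-arithmetic sketch:

```python
from fractions import Fraction
from itertools import product


def coset_count(M):
    """Count Z^2 points in the half-open parallelepiped {M t : t in [0, 1)^2}
    for a nonsingular integer 2 x 2 matrix M; also return |det M|."""
    (a, b), (c, d) = M
    det = a * d - b * c
    corners = [(0, 0), (a, c), (b, d), (a + b, c + d)]   # images of the unit square's corners
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    count = 0
    for x, y in product(range(min(xs), max(xs) + 1), range(min(ys), max(ys) + 1)):
        t1 = Fraction(d * x - b * y, det)    # t = M^(-1) (x, y)', computed exactly
        t2 = Fraction(-c * x + a * y, det)
        if 0 <= t1 < 1 and 0 <= t2 < 1:
            count += 1
    return count, abs(det)


for M in [((3, 1), (1, 4)), ((2, 0), (0, 5)), ((1, 2), (3, -1))]:
    print(M, coset_count(M))
```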

Suppose that $L\_1 \subset \mathbb{R}^n$ and $L\_2 \subset \mathbb{R}^n$ are two lattices of $\mathbb{R}^n$. We define $L\_1 + L\_2 = \{a + b \mid a \in L\_1, b \in L\_2\}$. Obviously, $L\_1 + L\_2$ is an additive subgroup of $\mathbb{R}^n$, but in general it need not be a lattice of $\mathbb{R}^n$.

**Lemma 4** *Let $L\_1 \subset \mathbb{R}^n$ and $L\_2 \subset \mathbb{R}^n$ be two lattices of $\mathbb{R}^n$. If $\text{rank}(L\_1 \cap L\_2) = \text{rank}(L\_1)$ or $\text{rank}(L\_1 \cap L\_2) = \text{rank}(L\_2)$, then $L\_1 + L\_2$ is again a lattice of $\mathbb{R}^n$.*

*Proof* By Theorem 1, to prove that $L\_1 + L\_2$ is a lattice of $\mathbb{R}^n$ it suffices to prove that $L\_1 + L\_2$ is a discrete subgroup of $\mathbb{R}^n$. Suppose that $\text{rank}(L\_1 \cap L\_2) = \text{rank}(L\_1)$. For any $x \in L\_1$, we define a distance function $\rho(x)$ by

$$\rho(\mathbf{x}) = \inf \{ |\mathbf{x} - \mathbf{y}| \, \Big| \, \mathbf{y} \neq \mathbf{x}, \, \mathbf{y} \in L\_2 \}.$$

Since every ball $b(x, \delta)$ with center $x$ and radius $\delta$ contains only finitely many vectors of $L\_2$, we have

$$\rho(\mathbf{x}) = \min \{ |\mathbf{x} - \mathbf{y}| \, \middle| \, \mathbf{y} \neq \mathbf{x}, \, \mathbf{y} \in L\_2 \} = \lambda\_x > 0. \tag{11}$$

On the other hand, if $x\_1, x\_2 \in L\_1$ and $x\_1 - x\_2 \in L\_2$, then there is $y\_0 \in L\_2$ such that $x\_1 = x\_2 + y\_0$, and we have $\rho(x\_1) = \rho(x\_2)$. This means that $\rho(x)$ is well defined on the quotient group $(L\_1 + L\_2)/L\_2$. By the second isomorphism theorem for groups, we have

$$(L\_1 + L\_2)/L\_2 \cong L\_1/(L\_1 \cap L\_2).$$

By Lemma 3, it follows that

$$|(L\_1 + L\_2)/L\_2| = |L\_1/(L\_1 \cap L\_2)| < \infty.$$

In other words, $(L\_1 + L\_2)/L\_2$ is a finite group. Let $x\_1, x\_2, \dots, x\_k \in L\_1$ be representatives of the cosets of $(L\_1 + L\_2)/L\_2$; then we have

$$\min\_{\mathbf{x}\in L\_1, \mathbf{y}\in L\_2, \mathbf{x}\neq \mathbf{y}} |\mathbf{x} - \mathbf{y}| = \min\_{1 \le i \le k} \rho(\mathbf{x}\_i) \ge \min\{\lambda\_{\mathbf{x}\_1}, \lambda\_{\mathbf{x}\_2}, \dots, \lambda\_{\mathbf{x}\_k}\} > 0.$$

Therefore, $L\_1 + L\_2$ is a discrete subgroup of $\mathbb{R}^n$, and thus it is a lattice of $\mathbb{R}^n$ by Theorem 1.

**Remark 1** The condition $\text{rank}(L\_1 \cap L\_2) = \text{rank}(L\_1)$ or $\text{rank}(L\_1 \cap L\_2) = \text{rank}(L\_2)$ in Lemma 4 cannot simply be dropped. As a counterexample, consider the real line $\mathbb{R}$ with $L\_1 = \mathbb{Z}$ and $L\_2 = \sqrt{2}\mathbb{Z}$, so that $L\_1 \cap L\_2 = \{0\}$. Then $L\_1 + L\_2 = \{n + \sqrt{2}m \mid n \in \mathbb{Z}, m \in \mathbb{Z}\}$ is dense in $\mathbb{R}$ by Dirichlet's Theorem (see Theorem I of Cassels (1963)), so $L\_1 + L\_2$ is not a discrete subgroup of $\mathbb{R}$ and hence not a lattice in $\mathbb{R}$.
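The failure of discreteness in Remark 1 is visible numerically: nonzero elements $n + \sqrt{2}m$ of $L\_1 + L\_2$ come arbitrarily close to $0$, so no $\lambda > 0$ as in (1) can exist. The sketch below (our own illustration; the function name is hypothetical) records the smallest value of $|n + \sqrt{2}m|$ for $1 \leqslant m \leqslant M$; the minimisers come from convergents of $\sqrt{2}$ such as $7/5$, $99/70$ and $1393/985$.

```python
import math


def closest_to_zero(M):
    """min |n + sqrt(2) m| over 1 <= m <= M, with n the best integer for each m."""
    best = (math.inf, 0, 0)
    for m in range(1, M + 1):
        n = -round(math.sqrt(2) * m)
        best = min(best, (abs(n + math.sqrt(2) * m), n, m))
    return best


for M in (10, 100, 1000):
    print(M, closest_to_zero(M))
```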

As a direct consequence, we have the following generalized form of Lemma 4.

**Corollary 3** *Let $L\_1, L\_2, \dots, L\_m$ be $m$ lattices of $\mathbb{R}^n$ such that*

$$\operatorname{rank}(L\_1 \cap L\_2 \cap \dots \cap L\_m) = \operatorname{rank}(L\_j) \text{ for some } 1 \leqslant j \leqslant m.$$

*Then $L\_1 + L\_2 + \cdots + L\_m$ is a lattice of $\mathbb{R}^n$.*

*Proof* Without loss of generality, we assume that

$$\text{rank}(L\_1 \cap L\_2 \cap \dots \cap L\_m) = \text{rank}(L\_m).$$

Let $L' = L\_1 + L\_2 + \cdots + L\_{m-1}$; then

$$(L' + L\_m)/L' \cong L\_m/(L' \cap L\_m).$$

Since $L\_1 \cap L\_2 \cap \cdots \cap L\_m \subset L' \cap L\_m \subset L\_m$, we have $\text{rank}(L' \cap L\_m) = \text{rank}(L\_m)$, and by Lemma 4, $L' + L\_m = L\_1 + L\_2 + \cdots + L\_m$ is a lattice of $\mathbb{R}^n$. This proves the corollary.

# **2 Ideal Matrices**

Let $\mathbb{R}[x]$ and $\mathbb{Z}[x]$ be the polynomial rings over $\mathbb{R}$ and $\mathbb{Z}$ in the variable $x$, respectively. Suppose that

$$\phi(\mathbf{x}) = \mathbf{x}^n - \phi\_{n-1}\mathbf{x}^{n-1} - \dots - \phi\_1\mathbf{x} - \phi\_0 \in \mathbb{Z}[\mathbf{x}], \ \phi\_0 \neq 0,\tag{12}$$

is a polynomial with integer coefficients that has no multiple roots in the complex field $\mathbb{C}$. Let $w\_1, w\_2, \dots, w\_n$ be the $n$ distinct roots of $\phi(x)$ in $\mathbb{C}$; the Vandermonde matrix $V\_\phi$ is defined by

$$V\_{\phi} = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ w\_1 & w\_2 & \cdots & w\_n \\ \vdots & \vdots & & \vdots \\ w\_1^{n-1} & w\_2^{n-1} & \cdots & w\_n^{n-1} \end{pmatrix}, \quad \text{and} \quad \det(V\_{\phi}) \neq 0. \tag{13}$$

According to the given polynomial φ(*x*), we define a rotation matrix *H* = *H*<sup>φ</sup> by

$$H = H\_{\phi} = \begin{pmatrix} \begin{matrix} 0 & \cdots & 0 \end{matrix} & \phi\_0 \\ I\_{n-1} & \begin{matrix} \phi\_1 \\ \vdots \\ \phi\_{n-1} \end{matrix} \end{pmatrix}\_{n \times n} \in \mathbb{Z}^{n \times n},\tag{14}$$

where *In*−<sup>1</sup> is the (*n* − 1) × (*n* − 1) unit matrix. Obviously, the characteristic polynomial of *H* is just φ(*x*).

We use column notation for vectors in $\mathbb{R}^n$. For any $f = (f\_0, f\_1, \dots, f\_{n-1})' \in \mathbb{R}^n$, the ideal matrix generated by the vector $f$ is defined by

$$H^\*(f) = [f, Hf, H^2f, \dots, H^{n-1}f]\_{n \times n} \in \mathbb{R}^{n \times n},\tag{15}$$

which is the block matrix whose columns are $H^k f$ $(0 \leqslant k \leqslant n - 1)$. Sometimes $f$ is called an input vector. It is easily seen that $H^\*(f)$ is a more general form of the classical circulant matrix (see Davis (1994)) and of the $r$-circulant matrix (see Shi (2018), Yasin and Taskara (2013)). In fact, if $\phi(x) = x^n - 1$, then $H^\*(f)$ is the ordinary circulant matrix generated by $f$; if $\phi(x) = x^n - r$, then $H^\*(f)$ is the $r$-circulant matrix.
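The special cases just mentioned can be verified directly. The sketch below (helper names hypothetical) builds $H\_\phi$ as in (14) and $H^\*(f)$ as in (15), and compares them, for $\phi(x) = x^n - r$, with the matrix whose $(i, j)$ entry is $f\_{i-j}$ on and below the diagonal and $r f\_{n+i-j}$ above it. This is one common convention for $r$-circulant matrices, and $r = 1$ gives the ordinary circulant matrix.

```python
def rotation_matrix(phi):
    """H_phi of (14) for phi(x) = x^n - phi[n-1] x^(n-1) - ... - phi[0]."""
    n = len(phi)
    H = [[0] * n for _ in range(n)]
    for i in range(1, n):
        H[i][i - 1] = 1          # identity block I_{n-1}
    for i in range(n):
        H[i][n - 1] = phi[i]     # last column (phi_0, ..., phi_{n-1})'
    return H


def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def ideal_matrix(H, f):
    """H*(f) = [f, Hf, ..., H^(n-1) f] of (15), returned row by row."""
    cols, v = [], list(f)
    for _ in range(len(f)):
        cols.append(v)
        v = matvec(H, v)
    return [[col[i] for col in cols] for i in range(len(f))]


def r_circulant(f, r):
    """Matrix with (i, j) entry f[i - j] if i >= j and r * f[n + i - j] otherwise."""
    n = len(f)
    return [[f[i - j] if i >= j else r * f[n + i - j] for j in range(n)] for i in range(n)]


f = [2, -1, 0, 5]
for r in (1, 3):
    phi = [r] + [0] * (len(f) - 1)     # phi(x) = x^n - r
    print(r, ideal_matrix(rotation_matrix(phi), f) == r_circulant(f, r))
```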

By (15), it follows immediately that

$$H^\*(f+\mathbf{g}) = H^\*(f) + H^\*(\mathbf{g}), \text{ and } H^\*(\lambda f) = \lambda H^\*(f), \,\forall \lambda \in \mathbb{R}.\tag{16}$$

Moreover, $H^\*(f) = 0$ is the zero matrix if and only if $f = 0$ is the zero vector (the first column of $H^\*(f)$ is $f$), so by (16) one has $H^\*(f) = H^\*(g)$ if and only if $f = g$. Let $M^\*$ be the set of all ideal matrices, namely

$$M^\* = \{ H^\*(f) \mid f \in \mathbb{R}^n \}.\tag{17}$$

We may regard $H^\*$ as a mapping from $\mathbb{R}^n$ to $M^\*$, which is a one-to-one correspondence.

In Zheng et al. (2023), we have shown some basic properties of ideal matrices; most of them can be summarized in the following theorem.

**Theorem 2** *Suppose that $\phi(x) \in \mathbb{Z}[x]$ is a fixed polynomial with no multiple roots in $\mathbb{C}$. Then for any two column vectors $f$ and $g$ in $\mathbb{R}^n$, we have*

*(i) $H^\*(f) = f\_0 I\_n + f\_1 H + \cdots + f\_{n-1} H^{n-1}$;*


*where $V\_\phi$ is the Vandermonde matrix given by (13), $w\_i$ $(1 \leqslant i \leqslant n)$ are all the roots of $\phi(x)$ in $\mathbb{C}$, and $\text{diag}\{f(w\_1), f(w\_2), \dots, f(w\_n)\}$ is the diagonal matrix.*

*Proof* See Theorem 2 of Zheng et al. (2023).

Let $e\_1, e\_2, \dots, e\_n$ be the standard unit vectors of $\mathbb{R}^n$, that is


$$e\_1 = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, e\_2 = \begin{pmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix}, \dots, e\_n = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix}.$$

It is easy to verify that

$$H^\*(e\_1) = I\_n, \text{ and } H^\*(e\_k) = H^{k-1}, \ 1 \leqslant k \leqslant n. \tag{18}$$

This means that the unit matrix $I\_n$ and the rotation matrices $H^k$ $(1 \leqslant k \leqslant n - 1)$ are all ideal matrices.

Let $\phi(x)\mathbb{R}[x]$ and $\phi(x)\mathbb{Z}[x]$ be the principal ideals generated by $\phi(x)$ in $\mathbb{R}[x]$ and $\mathbb{Z}[x]$, respectively. We denote the quotient rings $R$ and $\overline{R}$ by

$$R = \mathbb{Z}[x]/\phi(x)\mathbb{Z}[x], \text{ and } \overline{R} = \mathbb{R}[x]/\phi(x)\mathbb{R}[x].\tag{19}$$

There is a one-to-one correspondence between $\overline{R}$ and $\mathbb{R}^n$ given by

$$f(x) = f\_0 + f\_1 x + \dots + f\_{n-1} x^{n-1} \in \overline{R} \xrightarrow{\ t\ } f = \begin{pmatrix} f\_0 \\ f\_1 \\ \vdots \\ f\_{n-1} \end{pmatrix} \in \mathbb{R}^n.$$

We denote this correspondence by *t*, that is

$$t(f(x)) = f \text{ and } t^{-1}(f) = f(x), \quad \forall f(x) \in \overline{R}, \ f \in \mathbb{R}^n. \tag{20}$$

If we restrict $t$ to the quotient ring $R$, it gives a one-to-one correspondence between $R$ and $\mathbb{Z}^n$. We first show that $t$ is also a ring isomorphism.

**Definition 2** For any two column vectors $f$ and $g$ in $\mathbb{R}^n$, we define the $\phi$-convolutional product $f \ast g$ by $f \ast g = H^\*(f)g$.

By Theorem 2, it is easy to see that

$$f \ast \mathbf{g} = \mathbf{g} \ast f,\text{ and } H^\*(f \ast \mathbf{g}) = H^\*(f)H^\*(\mathbf{g}). \tag{21}$$

**Lemma 5** *For any two polynomials f* (*x*) *and g*(*x*) *in R, we have*

$$t(f(x)g(x)) = H^\*(f)g = f \ast g.$$

*Proof* Let $g(x) = g\_0 + g\_1 x + \cdots + g\_{n-1} x^{n-1} \in R$. Since $x^n \equiv \phi\_{n-1}x^{n-1} + \cdots + \phi\_1 x + \phi\_0$ (mod $\phi(x)$), we have

$$x g(x) = \phi\_0 g\_{n-1} + (g\_0 + \phi\_1 g\_{n-1})x + \dots + (g\_{n-2} + \phi\_{n-1} g\_{n-1})x^{n-1}.$$

It follows that

$$t(x g(x)) = H t(g(x)) = H g.\tag{22}$$

Hence, by induction on $k$, for any $0 \leqslant k \leqslant n - 1$ we have

$$t(\mathbf{x}^k \mathbf{g}(\mathbf{x})) = H^k t(\mathbf{g}(\mathbf{x})) = H^k \mathbf{g},\ 0 \le k \le n - 1. \tag{23}$$

Let $f(x) = f\_0 + f\_1 x + \cdots + f\_{n-1} x^{n-1} \in R$. By (i) of Theorem 2, we have

$$t(f(\mathbf{x})\mathbf{g}(\mathbf{x})) = \sum\_{i=0}^{n-1} f\_i t(\mathbf{x}^i \mathbf{g}(\mathbf{x})) = \sum\_{i=0}^{n-1} f\_i H^i \mathbf{g} = H^\*(f)\mathbf{g}.$$

The lemma follows.
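Lemma 5 says that multiplying polynomials modulo $\phi(x)$ and then applying $t$ is the same as multiplying the coefficient vector $g$ by $H^\*(f)$. A small pure-Python check (example polynomial and helper names are our own) reduces $f(x)g(x)$ modulo $\phi(x)$ with the rule $x^n = \phi\_{n-1}x^{n-1} + \cdots + \phi\_0$ and compares the result with $H^\*(f)g$:

```python
def rotation_matrix(phi):
    """H_phi of (14) for phi(x) = x^n - phi[n-1] x^(n-1) - ... - phi[0]."""
    n = len(phi)
    H = [[0] * n for _ in range(n)]
    for i in range(1, n):
        H[i][i - 1] = 1
    for i in range(n):
        H[i][n - 1] = phi[i]
    return H


def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def ideal_matrix(H, f):
    """H*(f) = [f, Hf, ..., H^(n-1) f], returned row by row."""
    cols, v = [], list(f)
    for _ in range(len(f)):
        cols.append(v)
        v = matvec(H, v)
    return [[col[i] for col in cols] for i in range(len(f))]


def poly_mul_mod(f, g, phi):
    """Coefficients of f(x) g(x) mod phi(x), lowest degree first (deg f, deg g < n)."""
    n = len(phi)
    prod = [0] * (2 * n - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            prod[i + j] += a * b
    # eliminate x^k, k >= n, from the top down using x^n = phi[n-1] x^(n-1) + ... + phi[0]
    for k in range(2 * n - 2, n - 1, -1):
        c, prod[k] = prod[k], 0
        for i in range(n):
            prod[k - n + i] += c * phi[i]
    return prod[:n]


phi = [3, -1, 0, 2]              # phi(x) = x^4 - 2x^3 + x - 3 (example)
f, g = [1, 2, 0, -1], [0, 1, -3, 2]
H = rotation_matrix(phi)
print(poly_mul_mod(f, g, phi) == matvec(ideal_matrix(H, f), g))   # Lemma 5
```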

**Theorem 3** *Under the $\phi$-convolutional product, $\mathbb{R}^n$ is a commutative ring with identity element $e\_1$, and $\mathbb{Z}^n \subset \mathbb{R}^n$ is a subring. Moreover, we have the following ring isomorphisms:*

$$\overline{R} \cong \mathbb{R}^n \cong M^\*, \text{ and } R \cong \mathbb{Z}^n \cong M\_{\mathbb{Z}}^\*,$$

*where $M^\*$ is the set of all ideal matrices given by (17), and $M\_{\mathbb{Z}}^\*$ is the set of all integer ideal matrices.*

*Proof* Let *f* (*x*) ∈ *R* and *g*(*x*) ∈ *R*, then

$$t(f(\mathbf{x}) + \mathbf{g}(\mathbf{x})) = f + \mathbf{g} = t(f(\mathbf{x})) + t(\mathbf{g}(\mathbf{x})),$$

and

$$t(f(\mathbf{x})\mathbf{g}(\mathbf{x})) = H^\*(f)\mathbf{g} = f \ast \mathbf{g} = t(f(\mathbf{x})) \ast t(\mathbf{g}(\mathbf{x})).$$

This means that $t$ is a ring isomorphism. Since $f \ast g = g \ast f$ and $e\_1 \ast g = H^\*(e\_1)g = I\_n g = g$, $\mathbb{R}^n$ is a commutative ring with $e\_1$ as the identity element. Since $H^\*(f)$ is an integer matrix if and only if $f \in \mathbb{Z}^n$ is an integer vector, the isomorphisms of subrings follow immediately.

By property (v) of Theorem 2, $H^\*(f)$ is invertible whenever $(f(x), \phi(x)) = 1$ in $\mathbb{R}[x]$. We now show that the inverse of an ideal matrix is again an ideal matrix.

**Lemma 6** *Let $f(x) \in \overline{R}$ with $(f(x), \phi(x)) = 1$ in $\mathbb{R}[x]$. Then*

$$(H^\*(f))^{-1} = H^\*(u),$$

*where $u(x) \in \overline{R}$ is the unique polynomial such that $u(x) f(x) \equiv 1$ (mod $\phi(x)$).*

*Proof* By Lemma 5, we have *u* ∗ *f* = *e*1, it follows that

$$H^\*(u)H^\*(f) = H^\*(e\_1) = I\_n.$$

Thus we have $(H^\*(f))^{-1} = H^\*(u)$. It is worth noting that if $H^\*(f)$ is an invertible integer matrix, then $(H^\*(f))^{-1}$ is not an integer matrix in general.
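Lemma 6 and the remark after it can be illustrated with exact rational arithmetic. Because $H^\*(f)u = f \ast u$, the vector $u$ of Lemma 6 is the solution of $H^\*(f)u = e\_1$. For the example $\phi(x) = x^3 - 2$ and $f(x) = 1 + x$ (our own choice), $H^\*(f)$ is an integer matrix with determinant $3$ and $u(x) = \frac{1}{3}(1 - x + x^2)$, so the inverse $H^\*(u)$ is not an integer matrix. Helper names are hypothetical.

```python
from fractions import Fraction


def rotation_matrix(phi):
    """H_phi of (14) for phi(x) = x^n - phi[n-1] x^(n-1) - ... - phi[0]."""
    n = len(phi)
    H = [[0] * n for _ in range(n)]
    for i in range(1, n):
        H[i][i - 1] = 1
    for i in range(n):
        H[i][n - 1] = phi[i]
    return H


def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def ideal_matrix(H, f):
    """H*(f) = [f, Hf, ..., H^(n-1) f], returned row by row."""
    cols, v = [], list(f)
    for _ in range(len(f)):
        cols.append(v)
        v = matvec(H, v)
    return [[col[i] for col in cols] for i in range(len(f))]


def solve(M, b):
    """Solve M x = b exactly over Q by Gauss-Jordan elimination."""
    n = len(M)
    A = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(M, b)]
    for col in range(n):
        piv = next((r for r in range(col, n) if A[r][col] != 0), None)
        if piv is None:
            raise ValueError("matrix is singular")
        A[col], A[piv] = A[piv], A[col]
        pivot = A[col][col]
        A[col] = [v / pivot for v in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [v - factor * w for v, w in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]


H = rotation_matrix([2, 0, 0])          # phi(x) = x^3 - 2
f = [1, 1, 0]                           # f(x) = 1 + x, coprime to phi(x)
Hf = ideal_matrix(H, f)
u = solve(Hf, [1, 0, 0])                # H*(f) u = e_1, i.e. u = f^(-1) mod phi
print(u)
print(matmul(ideal_matrix(H, u), Hf))   # expected to be the identity matrix I_3
```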

The following lemma is sometimes useful, especially when we consider integer matrices.

**Lemma 7** *Let $f(x) \in \mathbb{Z}[x]$ with $(f(x), \phi(x)) = 1$ in $\mathbb{Z}[x]$. Then $(f(x), \phi(x)) = 1$ in $\mathbb{R}[x]$.*

*Proof* Let $\mathbb{Q}$ be the rational number field. Since $(f(x), \phi(x)) = 1$ in $\mathbb{Z}[x]$, we have $(f(x), \phi(x)) = 1$ in $\mathbb{Q}[x]$. Since $\mathbb{Q}[x]$ is a principal ideal domain, there are two polynomials $a(x)$ and $b(x)$ in $\mathbb{Q}[x]$ such that

$$a(\mathbf{x})f(\mathbf{x}) + b(\mathbf{x})\phi(\mathbf{x}) = 1.$$

Since $a(x), b(x) \in \mathbb{Q}[x] \subset \mathbb{R}[x]$, this means that $(f(x), \phi(x)) = 1$ in $\mathbb{R}[x]$.
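The polynomials $a(x)$ and $b(x)$ in this proof can be computed explicitly by the extended Euclidean algorithm in $\mathbb{Q}[x]$. The exact-arithmetic sketch below (helper names hypothetical; coefficients listed from the constant term up) returns the monic gcd $d(x)$ together with $a(x), b(x)$ such that $a(x)f(x) + b(x)\phi(x) = d(x)$. For $f(x) = 1 + x$ and $\phi(x) = x^3 - 2$ it yields $d(x) = 1$, $a(x) = \frac{1}{3}(1 - x + x^2)$ and $b(x) = -\frac{1}{3}$; here $a(x)$ is exactly the inverse $u(x)$ of $f(x)$ modulo $\phi(x)$ appearing in Lemma 6.

```python
from fractions import Fraction


def trim(p):
    """Drop trailing zero coefficients in place (coefficients lowest degree first)."""
    while p and p[-1] == 0:
        p.pop()
    return p


def sub(p, q):
    n = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) - (q[i] if i < len(q) else 0) for i in range(n)])


def mul(p, q):
    if not p or not q:
        return []
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(q):
            r[i + j] += x * y
    return trim(r)


def divmod_poly(a, b):
    """Quotient and remainder of a by b in Q[x]; b must have a nonzero leading coefficient."""
    a = trim([Fraction(c) for c in a])
    quot = [Fraction(0)] * max(len(a) - len(b) + 1, 1)
    while len(a) >= len(b):
        shift, c = len(a) - len(b), a[-1] / b[-1]
        quot[shift] = c
        for i, bc in enumerate(b):
            a[shift + i] -= c * bc
        trim(a)                      # the leading coefficient is now exactly 0
    return trim(quot), a


def ext_gcd(f, g):
    """Return (d, a, b) with a*f + b*g = d, where d is the monic gcd of f and g in Q[x]."""
    r0, r1 = trim([Fraction(c) for c in f]), trim([Fraction(c) for c in g])
    a0, a1, b0, b1 = [Fraction(1)], [], [], [Fraction(1)]
    while r1:                        # invariant: a_i*f + b_i*g = r_i
        quot, rem = divmod_poly(r0, r1)
        r0, r1 = r1, rem
        a0, a1 = a1, sub(a0, mul(quot, a1))
        b0, b1 = b1, sub(b0, mul(quot, b1))
    lc = r0[-1]
    return [c / lc for c in r0], [c / lc for c in a0], [c / lc for c in b0]


f = [1, 1]                           # f(x) = 1 + x
phi = [-2, 0, 0, 1]                  # phi(x) = x^3 - 2
d, a, b = ext_gcd(f, phi)
print(d, a, b)
```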

# **3 Cyclic Lattices and Ideal Lattices**

As is well known, cyclic codes play a central role in algebraic coding theory (see Chap. 6 of Lint (1999)). In Zheng et al. (2023), we extended ordinary cyclic codes to more general forms, namely $\phi$-cyclic codes. To obtain an analogous concept of $\phi$-cyclic codes in $\mathbb{R}^n$, we note that every rotation matrix $H$ defines a linear transformation of $\mathbb{R}^n$ by $x \mapsto Hx$.

**Definition 3** A linear subspace $C \subset \mathbb{R}^n$ is called a $\phi$-cyclic subspace if $\alpha \in C \Rightarrow H\alpha \in C$. A lattice $L \subset \mathbb{R}^n$ is called a $\phi$-cyclic lattice if $\alpha \in L \Rightarrow H\alpha \in L$.

In other words, a φ-cyclic subspace *C* is a linear subspace of ℝ*<sup>n</sup>* that is closed under the linear transformation *H*, and a φ-cyclic lattice *L* is a lattice of ℝ*<sup>n</sup>* that is closed under *H*. If φ(*x*) = *x<sup>n</sup>* − 1, then *H* is the classical circulant matrix, and the corresponding cyclic lattices first appeared in Micciancio (2002), although their further properties were not discussed there. To obtain an explicit algebraic construction of φ-cyclic lattices, we first show that there is a one-to-one correspondence between the φ-cyclic subspaces of ℝ*<sup>n</sup>* and the ideals of *R̄*.

**Lemma 8** *Let t be the correspondence between R̄ and* ℝ*<sup>n</sup> given by (2.9); then a subset C* ⊂ ℝ*<sup>n</sup> is a φ-cyclic subspace of* ℝ*<sup>n</sup> if and only if t*<sup>−1</sup>(*C*) ⊂ *R̄ is an ideal.*

*Proof* We extend the correspondence *t* to subsets of *R̄* and ℝ*<sup>n</sup>* by

$$C(x) \subset \overline{R} \xrightarrow{t} C = \{c \mid c(x) \in C(x)\} \subset \mathbb{R}^n. \tag{24}$$

Let *C*(*x*) ⊂ *R̄* be an ideal; it is clear that *C* = *t*(*C*(*x*)) is a linear subspace of ℝ*<sup>n</sup>*. To prove that *C* is a φ-cyclic subspace, we note that if *c*(*x*) ∈ *C*(*x*), then by (2.11)

$$x\,c(x) \in C(x) \Leftrightarrow H\,t(c(x)) = Hc \in C.$$

Therefore, if *C*(*x*) is an ideal of *R̄*, then *t*(*C*(*x*)) = *C* is a φ-cyclic subspace of ℝ*<sup>n</sup>*. Conversely, if *C* ⊂ ℝ*<sup>n</sup>* is a φ-cyclic subspace, then for any *k* ⩾ 1 we have *H<sup>k</sup>c* ∈ *C* whenever *c* ∈ *C*, which implies

$$\forall\, c(x) \in C(x) \Rightarrow x^k c(x) \in C(x), \quad 0 \leqslant k \leqslant n-1.$$

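Since *C*(*x*) = *t*<sup>−1</sup>(*C*) is also a real linear subspace, it is closed under multiplication by every *a*(*x*) = ∑ *a<sub>k</sub>x<sup>k</sup>* ∈ *R̄*, which means that *C*(*x*) is an ideal of *R̄*. This completes the proof.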
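Both directions of the proof of Lemma 8 rest on (2.11): multiplying *c*(*x*) by *x* in *R̄* corresponds to applying *H* to *c*. The rotation matrix itself is defined in (2.3), outside this section, so the sketch below assumes it is the companion-type matrix with *He<sub>i</sub>* = *e*<sub>*i*+1</sub> for *i* < *n* and last column −(φ<sub>0</sub>, ..., φ<sub>*n*−1</sub>)′; this assumption reproduces the matrices *H* shown in Examples 2 and 3. Vectors are NumPy arrays, φ is passed as its coefficient list (lowest degree first, monic), and the function names are hypothetical.

```python
import numpy as np

def rotation_matrix(phi):
    """Assumed form of H_phi for monic phi with coefficients [phi_0, ..., phi_{n-1}, 1].

    H e_i = e_{i+1} for i < n, and the last column is -(phi_0, ..., phi_{n-1})'.
    """
    phi = np.asarray(phi, dtype=float)
    n = len(phi) - 1
    H = np.zeros((n, n))
    H[1:, :-1] = np.eye(n - 1)   # ones on the sub-diagonal
    H[:, -1] = -phi[:-1]         # last column built from phi
    return H

def mul_x_mod(c, phi):
    """Coefficient vector of x * c(x) mod phi(x), where c = (c_0, ..., c_{n-1})."""
    phi = np.asarray(phi, dtype=float)
    shifted = np.concatenate(([0.0], np.asarray(c, dtype=float)))  # x * c(x), degree <= n
    return shifted[:-1] - shifted[-1] * phi[:-1]                   # subtract c_{n-1} * phi(x)
```

For φ(*x*) = *x*<sup>3</sup> − 1, `rotation_matrix([-1, 0, 0, 1])` returns the matrix *H* displayed in Example 2, and for any φ the vectors `rotation_matrix(phi) @ c` and `mul_x_mod(c, phi)` agree, which is the numerical content of (2.11).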

By the above lemma, to find a φ-cyclic subspace in ℝ*<sup>n</sup>*, it suffices to find an ideal of *R̄*. There are two trivial ideals, *C*(*x*) = 0 and *C*(*x*) = *R̄*, and the corresponding φ-cyclic subspaces are *C* = 0 and *C* = ℝ*<sup>n</sup>*. To find non-trivial φ-cyclic subspaces, we make use of the homomorphism theorems, a standard technique in algebra. Let π be the natural homomorphism from ℝ[*x*] to *R̄*, so that ker π = φ(*x*)ℝ[*x*], which we write as ⟨φ(*x*)⟩. Let *N* be an ideal of ℝ[*x*] satisfying

$$\langle \phi(x) \rangle \subset N \subset \mathbb{R}[x] \xrightarrow{\pi} \overline{R} = \mathbb{R}[x]/\langle \phi(x) \rangle. \tag{25}$$

Since ℝ[*x*] is a principal ideal domain, *N* = ⟨*g*(*x*)⟩ is a principal ideal generated by a monic polynomial *g*(*x*) ∈ ℝ[*x*]. It is easy to see that

$$\langle \phi(x) \rangle \subset \langle g(x) \rangle \Leftrightarrow g(x) \mid \phi(x) \text{ in } \mathbb{R}[x].$$

It follows that all ideals *N* satisfying (25) are given by

$$\{\langle g(x) \rangle \mid g(x) \in \mathbb{R}[x] \text{ is monic and } g(x) \mid \phi(x)\}.$$

We denote by ⟨*g*(*x*)⟩ mod φ(*x*) the image of ⟨*g*(*x*)⟩ under π, i.e.

$$\langle g(x) \rangle \bmod \phi(x) = \pi(\langle g(x) \rangle).$$

It is easy to check

$$\langle g(x) \rangle \bmod \phi(x) = \{a(x)g(x) \mid a(x) \in \mathbb{R}[x] \text{ and } \deg a(x) + \deg g(x) < n\},\tag{26}$$

More precisely, the right-hand side of (26) is a set of representatives of ⟨*g*(*x*)⟩ mod φ(*x*). By the homomorphism theorem in ring theory, all ideals of *R̄* are given by

$$\{\langle g(x) \rangle \bmod \phi(x) \mid g(x) \in \mathbb{R}[x] \text{ is monic and } g(x) \mid \phi(x)\}.\tag{27}$$

Let *d* be the number of monic divisors of φ(*x*) in ℝ[*x*]; then we have the following.

**Corollary 4** *The number of φ-cyclic subspaces of* ℝ*<sup>n</sup> is d.*
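Corollary 4 reduces counting φ-cyclic subspaces to counting monic divisors of φ(*x*) in ℝ[*x*]. A sketch under the standing assumption (used in the proof of Theorem 4) that φ(*x*) has no multiple roots: over ℝ, φ(*x*) splits into one linear factor per real root and one quadratic factor per pair of complex-conjugate roots, and a monic divisor is any product of a subset of these factors, so *d* = 2<sup>*r*</sup> with *r* the number of irreducible factors. The root tolerance and the function name are our own choices; note that `numpy.roots` expects coefficients with the highest degree first.

```python
import numpy as np

def count_real_monic_divisors(phi, tol=1e-9):
    """Number d of monic divisors of phi in R[x], assuming phi has no multiple roots.

    phi is given highest degree first, as numpy.roots expects.
    """
    roots = np.roots(phi)
    n_real = int(np.sum(np.abs(roots.imag) <= tol))  # one linear factor per real root
    n_pairs = (len(roots) - n_real) // 2              # one quadratic factor per conjugate pair
    return 2 ** (n_real + n_pairs)
```

For example, *x*<sup>3</sup> − 1 = (*x* − 1)(*x*<sup>2</sup> + *x* + 1) gives *d* = 4, so there are exactly four φ-cyclic subspaces of ℝ<sup>3</sup> for this φ.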

Next, we discuss φ-cyclic lattices, which are the geometric analogue of cyclic codes, just as the φ-cyclic subspaces of ℝ*<sup>n</sup>* may be regarded as their algebraic analogue. Let the quotient rings *R* and *R̄* be given by (2.8). An *R*-module is an abelian group ∧ together with an operation λα ∈ ∧ for all λ ∈ *R* and α ∈ ∧, satisfying 1 · α = α and (λ<sub>1</sub>λ<sub>2</sub>)α = λ<sub>1</sub>(λ<sub>2</sub>α). It is easy to see that *R̄* is an *R*-module; if ∧ ⊂ *R̄* and ∧ is an *R*-module, then ∧ is called an *R*-submodule of *R̄*. All *R*-modules discussed here are *R*-submodules of *R̄*. On the other hand, if *I* ⊂ *R*, then *I* is an ideal of *R* if and only if *I* is an *R*-module. For α ∈ *R̄*, the cyclic *R*-module generated by α is defined by

$$R\alpha = \{\lambda\alpha \mid \lambda \in R\}.\tag{28}$$

If there are finitely many polynomials α<sub>1</sub>, α<sub>2</sub>, ..., α<sub>*k*</sub> in *R̄* such that ∧ = *R*α<sub>1</sub> + *R*α<sub>2</sub> + ··· + *R*α<sub>*k*</sub>, then ∧ is called a finitely generated *R*-module, which is an *R*-submodule of *R̄*.

Now let *L* ⊂ ℝ*<sup>n</sup>* be a φ-cyclic lattice and *g* ∈ ℝ*<sup>n</sup>*, let *H*∗(*g*) be the ideal matrix generated by the vector *g*, and let *L*(*H*∗(*g*)) be the lattice generated by *H*∗(*g*). It is easy to show that every *L*(*H*∗(*g*)) is a φ-cyclic lattice and that

$$L(H^\*(\mathbf{g})) \subset L,\text{ whenever } \mathbf{g} \in L,\tag{29}$$

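Indeed, if *g* ∈ *L*, then *H<sup>k</sup>g* ∈ *L* for every *k* ⩾ 0 because *L* is closed under *H*, so every column of *H*∗(*g*) = [*g*, *Hg*, ..., *H*<sup>*n*−1</sup>*g*] lies in *L*. Since *g* is itself the first column of *H*∗(*g*), this shows that *L*(*H*∗(*g*)) is the smallest φ-cyclic lattice containing the vector *g*. Therefore, we call *L*(*H*∗(*g*)) a minimal φ-cyclic lattice in ℝ*<sup>n</sup>*.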

**Lemma 9** *There is a one-to-one correspondence between the minimal φ-cyclic lattices in* ℝ*<sup>n</sup> and the cyclic R-submodules of R̄, namely,*

$$t(Rg(x)) = L(H^\*(g)), \text{ for all } g(x) \in \overline{R}$$

*and*

$$t^{-1}(L(H^\*(g))) = Rg(x), \text{ for all } g \in \mathbb{R}^n.$$

*Proof* Let *b*(*x*) ∈ *R*; by Lemma 5, we have

$$t(b(\mathbf{x})\mathbf{g}(\mathbf{x})) = H^\*(b)\mathbf{g} = H^\*(\mathbf{g})b \in L(H^\*(\mathbf{g})),$$

and hence *t*(*Rg*(*x*)) ⊂ *L*(*H*∗(*g*)). Conversely, if α ∈ *L*(*H*∗(*g*)), then α = *H*∗(*g*)*b* for some integer vector *b*; by Lemma 5 again, *b*(*x*)*g*(*x*) ∈ *Rg*(*x*) and *t*(*b*(*x*)*g*(*x*)) = α. This implies that *L*(*H*∗(*g*)) ⊂ *t*(*Rg*(*x*)), and

$$t(Rg(x)) = L(H^\*(g)).$$

The lemma follows immediately.

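Suppose that *L* = *L*(β<sub>1</sub>, β<sub>2</sub>, ..., β<sub>*m*</sub>) is an arbitrary φ-cyclic lattice, where *B* = [β<sub>1</sub>, β<sub>2</sub>, ..., β<sub>*m*</sub>]<sub>*n*×*m*</sub> is the generator matrix of *L*. Then *L* may be expressed as the sum of finitely many minimal φ-cyclic lattices; in fact, since each β<sub>*i*</sub> is the first column of *H*∗(β<sub>*i*</sub>) and *L*(*H*∗(β<sub>*i*</sub>)) ⊂ *L* by (29), we have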

$$L = L(H^\*(\beta\_1)) + L(H^\*(\beta\_2)) + \dots + L(H^\*(\beta\_m)).\tag{30}$$

To state and prove our main results, we first define prime spots in ℝ*<sup>n</sup>*.

**Definition 4** Let *g* ∈ ℝ*<sup>n</sup>* and *g*(*x*) = *t*<sup>−1</sup>(*g*) ∈ *R̄*. If (*g*(*x*), φ(*x*)) = 1 in ℝ[*x*], we call *g* a prime spot of ℝ*<sup>n</sup>*.

By (v) of Theorem 2, *g* ∈ ℝ*<sup>n</sup>* is a prime spot if and only if *H*∗(*g*) is an invertible matrix; thus the minimal φ-cyclic lattice *L*(*H*∗(*g*)) generated by a prime spot is a full-rank lattice.
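The ideal matrix and the prime-spot criterion can be checked numerically. The sketch below builds *H*∗(*g*) = [*g*, *Hg*, ..., *H*<sup>*n*−1</sup>*g*] (the column form used in the proof of Lemma 13), checks the identity *t*(*f*(*x*)*g*(*x*)) = *H*∗(*f*)*g* of Lemma 5 against direct polynomial multiplication modulo φ(*x*), and tests whether *g* is a prime spot through det *H*∗(*g*) ≠ 0. As before, the companion-type form of *H* is our assumption about (2.3), φ is given lowest degree first, and the determinant tolerance and function names are hypothetical.

```python
import numpy as np

def rotation_matrix(phi):
    """Assumed H_phi: H e_i = e_{i+1} (i < n), last column -(phi_0, ..., phi_{n-1})'."""
    phi = np.asarray(phi, dtype=float)
    n = len(phi) - 1
    H = np.zeros((n, n))
    H[1:, :-1] = np.eye(n - 1)
    H[:, -1] = -phi[:-1]
    return H

def ideal_matrix(g, phi):
    """H*(g) = [g, Hg, ..., H^{n-1} g]."""
    H = rotation_matrix(phi)
    cols = [np.asarray(g, dtype=float)]
    for _ in range(len(cols[0]) - 1):
        cols.append(H @ cols[-1])
    return np.column_stack(cols)

def polymulmod(f, g, phi):
    """Coefficients of f(x) g(x) mod phi(x), lowest degree first, phi monic."""
    prod = np.convolve(np.asarray(f, dtype=float), np.asarray(g, dtype=float))
    phi = np.asarray(phi, dtype=float)
    n = len(phi) - 1
    for k in range(len(prod) - 1, n - 1, -1):   # eliminate x^k for k >= n
        prod[k - n:k + 1] -= prod[k] * phi
    return prod[:n]

def is_prime_spot(g, phi, tol=1e-9):
    """g is a prime spot iff H*(g) is invertible ((v) of Theorem 2)."""
    return abs(np.linalg.det(ideal_matrix(g, phi))) > tol
```

For φ(*x*) = *x*<sup>3</sup> − 1, the vector *g* = (0, 0, 1)′ of Example 2 is a prime spot, whereas (1, 1, 1)′ is not, since 1 + *x* + *x*<sup>2</sup> divides *x*<sup>3</sup> − 1.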

**Lemma 10** *Let g and f be two prime spots of* ℝ*<sup>n</sup>; then L*(*H*∗(*g*)) + *L*(*H*∗(*f*)) *is a full-rank φ-cyclic lattice.*

*Proof* According to Lemma 4, it is sufficient to show that

$$\text{rank}\{L(H^\*(\mathbf{g})) \cap L(H^\*(f))\} = \text{rank}\{L(H^\*(\mathbf{g}))\} = n. \tag{31}$$

In fact, we prove more generally that

$$L(H^\*(g) \cdot H^\*(f)) \subset L(H^\*(g)) \cap L(H^\*(f)).\tag{32}$$

Since *H*∗(*g*) · *H*∗(*f*) is an invertible matrix, rank *L*(*H*∗(*g*) · *H*∗(*f*)) = *n*, and (31) follows immediately from (32).

To prove (32), we note that

$$L(H^\*(\mathbf{g}) \cdot H^\*(f)) = L(H^\*(\mathbf{g} \* f)).$$

It follows that

$$t^{-1}\left(L(H^\*(g)\cdot H^\*(f))\right) = R\,g(x)f(x).$$

Cyclic Lattices, Ideal Lattices, and Bounds … 143

It is easy to see that

$$
R\,g(x)f(x) \subset R\,g(x) \cap R\,f(x)\,.
$$

Therefore, we have

$$L(H^\*(g) \cdot H^\*(f)) = t(R\,g(x)f(x)) \subset L(H^\*(g)) \cap L(H^\*(f)).$$

This completes the proof of Lemma 10.
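The proof of Lemma 10 uses the identity *H*∗(*g*) · *H*∗(*f*) = *H*∗(*g* ∗ *f*), where *g* ∗ *f* is the φ-convolutional product. A small numerical sketch of this identity, again assuming the companion-type rotation matrix for (2.3) and taking *g* ∗ *f* = *H*∗(*g*)*f*, i.e. the coefficient vector of *g*(*x*)*f*(*x*) mod φ(*x*) by Lemma 5. It also illustrates that ideal matrices for the same φ commute; the function names are hypothetical.

```python
import numpy as np

def ideal_matrix(g, phi):
    """H*(g) = [g, Hg, ..., H^{n-1} g] with the assumed companion-type H_phi."""
    phi = np.asarray(phi, dtype=float)
    n = len(phi) - 1
    H = np.zeros((n, n))
    H[1:, :-1] = np.eye(n - 1)
    H[:, -1] = -phi[:-1]
    cols = [np.asarray(g, dtype=float)]
    for _ in range(n - 1):
        cols.append(H @ cols[-1])
    return np.column_stack(cols)

def phi_convolution(g, f, phi):
    """phi-convolutional product g * f = H*(g) f, i.e. t(g(x) f(x) mod phi(x))."""
    return ideal_matrix(g, phi) @ np.asarray(f, dtype=float)
```

Because the columns of *H*∗(*g*) · *H*∗(*f*) are then integer combinations of the columns of *H*∗(*g*) only when *H*∗(*f*) is an integer matrix, the inclusion (32) is not a purely matrix-level fact in general; the proof above goes through the module *Rg*(*x*)*f*(*x*) instead.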

It is worth noting that (32) holds in general and does not require the prime-spot condition.

**Corollary 5** *Let* β<sub>1</sub>, β<sub>2</sub>, ..., β<sub>*m*</sub> *be arbitrary m vectors in* ℝ*<sup>n</sup>; then we have*

$$L(H^\*(\beta\_1)H^\*(\beta\_2)\cdots H^\*(\beta\_m)) \subset L(H^\*(\beta\_1)) \cap L(H^\*(\beta\_2)) \cap \cdots \cap L(H^\*(\beta\_m)).\tag{33}$$

*Proof* If β<sub>1</sub>, β<sub>2</sub>, ..., β<sub>*m*</sub> are integer vectors, then (33) is trivial. For the general case, we write

$$L(H^\*(\beta\_1)\cdot H^\*(\beta\_2)\cdot \cdots H^\*(\beta\_m)) = L(H^\*(\beta\_1 \* \beta\_2 \* \cdots \* \beta\_m)),$$

where β<sub>1</sub> ∗ β<sub>2</sub> ∗ ··· ∗ β<sub>*m*</sub> is the φ-convolutional product; then

$$t^{-1}\left(L(H^\*(\beta\_1)\cdots H^\*(\beta\_m))\right) = R\beta\_1(\mathbf{x})\beta\_2(\mathbf{x})\cdots\beta\_m(\mathbf{x}).$$

Since

$$R\beta\_1(\mathbf{x})\beta\_2(\mathbf{x})\cdots\beta\_m(\mathbf{x}) \subset R\beta\_1(\mathbf{x}) \cap R\beta\_2(\mathbf{x}) \cap \cdots \cap R\beta\_m(\mathbf{x}),$$

it follows that

$$L(H^\*(\beta\_1)H^\*(\beta\_2)\cdots H^\*(\beta\_m)) \subset L(H^\*(\beta\_1)) \cap L(H^\*(\beta\_2)) \cap \cdots \cap L(H^\*(\beta\_m)).$$

This proves the corollary.

By Lemma 10, we also have the following assertion.

**Corollary 6** *Let* β<sub>1</sub>, β<sub>2</sub>, ..., β<sub>*m*</sub> *be m prime spots of* ℝ*<sup>n</sup>; then L*(*H*∗(β<sub>1</sub>)) + *L*(*H*∗(β<sub>2</sub>)) + ··· + *L*(*H*∗(β<sub>*m*</sub>)) *is a full-rank φ-cyclic lattice.*

*Proof* It follows immediately from Corollary 3.

Our main result in this chapter is the following one-to-one correspondence between φ-cyclic lattices in ℝ*<sup>n</sup>* and finitely generated *R*-modules in *R̄*.

**Theorem 4** *Let* ∧ = *R*α<sub>1</sub>(*x*) + *R*α<sub>2</sub>(*x*) + ··· + *R*α<sub>*m*</sub>(*x*) *be a finitely generated R-module in R̄; then t*(∧) *is a φ-cyclic lattice in* ℝ*<sup>n</sup>. Conversely, if L* ⊂ ℝ*<sup>n</sup> is a φ-cyclic lattice, then t*<sup>−1</sup>(*L*) *is a finitely generated R-module in R̄, and this correspondence is one-to-one.*

*Proof* If ∧ is a finitely generated *R*-module, then by Lemma 9 we have

$$\begin{aligned} t(\wedge) &= t(R\alpha\_1(\boldsymbol{x}) + \dots + R\alpha\_m(\boldsymbol{x})) = L(H^\*(\alpha\_1)) \\ &+ L(H^\*(\alpha\_2)) + \dots + L(H^\*(\alpha\_m)). \end{aligned}$$

The main difficulty is to show that *t*(∧) is a lattice of ℝ*<sup>n</sup>*; for this we embed *t*(∧) into a full-rank lattice. To do so, let (α<sub>*i*</sub>(*x*), φ(*x*)) = *d<sub>i</sub>*(*x*), *d<sub>i</sub>*(*x*) ∈ ℤ[*x*], and β<sub>*i*</sub>(*x*) = α<sub>*i*</sub>(*x*)/*d<sub>i</sub>*(*x*), 1 ⩽ *i* ⩽ *m*. Since φ(*x*) has no multiple roots by assumption, (β<sub>*i*</sub>(*x*), φ(*x*)) = 1 in ℝ[*x*]. In other words, each *t*(β<sub>*i*</sub>(*x*)) = β<sub>*i*</sub> is a prime spot. It is easy to verify that *R*α<sub>*i*</sub>(*x*) ⊂ *R*β<sub>*i*</sub>(*x*) (1 ⩽ *i* ⩽ *m*), and thus

$$t(\wedge) \subset L(H^\*(\beta\_1)) + L(H^\*(\beta\_2)) + \dots + L(H^\*(\beta\_m)).$$

By Corollaries 6 and 1, *t*(∧) is a φ-cyclic lattice. Conversely, if *L* ⊂ ℝ*<sup>n</sup>* is a φ-cyclic lattice and *L* = *L*(β<sub>1</sub>, β<sub>2</sub>, ..., β<sub>*m*</sub>), then by Lemma 9 and the decomposition of *L* into the minimal φ-cyclic lattices *L*(*H*∗(β<sub>*i*</sub>)), we have

$$t^{-1}(L) = R\beta\_1(\mathbf{x}) + R\beta\_2(\mathbf{x}) + \dots + R\beta\_m(\mathbf{x}),$$

which is a finitely generated *R*-module in *R̄*. This completes the proof of Theorem 4.

As mentioned in the abstract, since *R* is a Noetherian ring, *I* ⊂ *R* is an ideal if and only if *I* is a finitely generated *R*-module. On the other hand, if *I* ⊂ *R* is an ideal, then *t*(*I*) is a subgroup of ℤ*<sup>n</sup>* and hence discrete, so *t*(*I*) is a lattice. We therefore define the following.

**Definition 5** Let *I* ⊂ *R* be an ideal, *t*(*I*) is called the φ-ideal lattice.

Ideal lattices first appeared in Lyubashevsky and Micciancio (2006) (see Definition 3.1 there). As a direct consequence of Theorem 4, we have the following.

**Corollary 7** *Let L* ⊂ ℝ*<sup>n</sup> be a subset; then L is a φ-cyclic lattice if and only if*

$$L = L(H^\*(\beta\_1)) + L(H^\*(\beta\_2)) + \dots + L(H^\*(\beta\_m)),$$

*where* β<sub>*i*</sub> ∈ ℝ*<sup>n</sup> and m* ⩽ *n. Furthermore, L is a φ-ideal lattice if and only if every* β<sub>*i*</sub> ∈ ℤ*<sup>n</sup>,* 1 ⩽ *i* ⩽ *m.*

**Corollary 8** *Suppose that φ(x) is an irreducible polynomial in* ℤ[*x*]*; then any nonzero ideal I of R defines a full-rank φ-ideal lattice t*(*I*) ⊂ ℤ*<sup>n</sup>.*

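*Proof* Let *I* ⊂ *R* be a nonzero ideal; then *I* = *R*α<sub>1</sub>(*x*) + *R*α<sub>2</sub>(*x*) + ··· + *R*α<sub>*m*</sub>(*x*) with nonzero α<sub>*i*</sub>(*x*) ∈ *R*. Since φ(*x*) is irreducible and deg α<sub>*i*</sub>(*x*) < *n*, we have (α<sub>*i*</sub>(*x*), φ(*x*)) = 1 in ℤ[*x*], and hence in ℝ[*x*] by Lemma 7. It follows that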

$$t(I) = L(H^\*(\alpha\_1)) + L(H^\*(\alpha\_2)) + \dots + L(H^\*(\alpha\_m)).$$

Since each α*<sup>i</sup>* is a prime spot, we have rank(*t*(*I*)) = *n* by Corollary 6, and the corollary follows at once.

In the sense of Definition 3.1 of Lyubashevsky and Micciancio (2006), we have proved that every ideal of *R* corresponds to a φ-ideal lattice, which is simply a φ-cyclic integer lattice under the more general rotation matrix *H* = *H*<sub>φ</sub>. Cyclic lattices and ideal lattices were introduced in Micciancio (2002) and Lyubashevsky and Micciancio (2006), respectively, to improve the space complexity of lattice-based cryptosystems. Ideal lattices allow a lattice to be represented using only two polynomials. Using such lattices, classical lattice-based cryptosystems can reduce their space complexity from *O*(*n*<sup>2</sup>) to *O*(*n*). Ideal lattices also allow computations to be accelerated by exploiting the polynomial structure. Micciancio's original structured matrices are ordinary circulant matrices and admit an interpretation in terms of arithmetic in the polynomial ring ℤ[*x*]/⟨*x<sup>n</sup>* − 1⟩. Lyubashevsky and Micciancio (2006) later suggested changing the ring to ℤ[*x*]/⟨φ(*x*)⟩ with φ(*x*) irreducible over ℤ. Our results here suggest changing the ring to ℤ[*x*]/⟨φ(*x*)⟩ for any polynomial φ(*x*). There are many works subsequent to Micciancio (2002) and Lyubashevsky and Micciancio (2006), such as (Feige & Micciancio, 2004; Micciancio & Regev, 2009; Peikert, 2016; Plantard & Schneider, 2013; Pradhan et al., 2019; Stehle & Steinfeld, 2011).

**Example 1** It is interesting to find examples of φ-cyclic lattices in an algebraic number field *K*. Let ℚ be the rational number field. Without loss of generality, an algebraic number field *K* of degree *n* can be written as *K* = ℚ(*w*), where *w* = *w<sub>i</sub>* is a root of φ(*x*). If ℚ(*w<sub>i</sub>*) ⊂ ℝ for all 1 ⩽ *i* ⩽ *n*, then *K* is called a totally real algebraic number field. Let *O<sub>K</sub>* be the ring of algebraic integers of *K*, and let *I* ⊂ *O<sub>K</sub>* be a nonzero ideal. Then there is an integral basis {α<sub>1</sub>, α<sub>2</sub>, ..., α<sub>*n*</sub>} ⊂ *I* such that

$$I = \mathbb{Z}\alpha\_1 + \mathbb{Z}\alpha\_2 + \dots + \mathbb{Z}\alpha\_n.$$

We may therefore regard every ideal of *O<sub>K</sub>* as a lattice in ℚ*<sup>n</sup>*, and our assertion is that every nonzero ideal of *O<sub>K</sub>* corresponds to a full-rank φ-cyclic lattice in ℚ*<sup>n</sup>*. To see this, let

$$\mathbb{Q}[w] = \left\{ \sum\_{i=0}^{n-1} a\_i w^i \mid a\_i \in \mathbb{Q} \right\}.$$

It is known that *K* = ℚ[*w*]; thus every α ∈ *K* corresponds to a vector ᾱ ∈ ℚ*<sup>n</sup>* by

$$\alpha = \sum\_{i=0}^{n-1} a\_i w^i \xrightarrow{\tau} \overline{\alpha} = \begin{pmatrix} a\_0 \\ a\_1 \\ \vdots \\ a\_{n-1} \end{pmatrix} \in \mathbb{Q}^n.$$

If *I* ⊂ *O<sub>K</sub>* is an ideal of *O<sub>K</sub>* and *I* = ℤα<sub>1</sub> + ℤα<sub>2</sub> + ··· + ℤα<sub>*n*</sub>, let *B* = [ᾱ<sub>1</sub>, ᾱ<sub>2</sub>, ..., ᾱ<sub>*n*</sub>] ∈ ℚ<sup>*n*×*n*</sup>, which is a full-rank matrix. Then τ(*I*) = *L*(*B*) is a full-rank lattice. It remains to show that τ(*I*) is a φ-cyclic lattice, i.e. that α ∈ *I* implies *H*ᾱ ∈ τ(*I*). Suppose that α ∈ *I*; then *w*α ∈ *I*. It is easy to verify that τ(*w*) = *e*<sub>2</sub> (see (2.7)), so multiplication by *w* corresponds to multiplication by *x*, and

$$
\tau(w\alpha) = H\,\tau(\alpha) = H\overline{\alpha} \in \tau(I).
$$

This means that τ(*I*) is a full-rank φ-cyclic lattice in ℚ*<sup>n</sup>*.

# **4 Smoothing Parameter**

As an application of the algebraic structure of φ-cyclic lattices, we give an explicit upper bound on the smoothing parameter of φ-cyclic lattices. First, we introduce some basic notation.

The Gaussian function ρ<sub>*s*,*c*</sub>(*x*) on ℝ*<sup>n</sup>* is given by

$$
\rho\_{s,c}(\mathbf{x}) = e^{-\pi|\mathbf{x}-\mathbf{c}|^2/s^2},\tag{34}
$$

where *x* ∈ ℝ*<sup>n</sup>*, *c* ∈ ℝ*<sup>n</sup>*, and *s* > 0 is a real parameter. ρ<sub>*s*,*c*</sub>(*x*) is called the Gaussian function centered at *c* with parameter *s*. It is easy to see that

$$\int\_{\mathbb{R}^n} \rho\_{s,c}(x) dx = s^n.$$

Thus, we may define a probability density function *D*<sub>*s*,*c*</sub>(*x*) by

$$D\_{\mathbf{s},\mathbf{c}}(\mathbf{x}) = \rho\_{\mathbf{s},\mathbf{c}}(\mathbf{x}) / \int\_{\mathbb{R}^n} \rho\_{\mathbf{s},\mathbf{c}}(\mathbf{x}) d\mathbf{x} = \rho\_{\mathbf{s},\mathbf{c}}(\mathbf{x}) / s^n. \tag{35}$$

Suppose that *L* ⊂ ℝ*<sup>n</sup>* is a lattice, and let

$$D\_{\mathbf{s},\mathbf{c}}(L) = \sum\_{\mathbf{x} \in L} D\_{\mathbf{s},\mathbf{c}}(\mathbf{x}), \ \rho\_{\mathbf{s},\mathbf{c}}(L) = \sum\_{\mathbf{x} \in L} \rho\_{\mathbf{s},\mathbf{c}}(\mathbf{x}). \tag{36}$$

The discrete Gaussian distribution over *L* is the probability distribution *D*<sub>*L*,*s*,*c*</sub> over *L* given by


$$D\_{L,s,c}(\mathbf{x}) = \frac{D\_{s,c}(\mathbf{x})}{D\_{s,c}(L)} = \frac{\rho\_{s,c}(\mathbf{x})}{\rho\_{s,c}(L)}.\tag{37}$$

If *c* = 0 is the zero vector of ℝ*<sup>n</sup>*, we write ρ<sub>*s*,0</sub>(*x*) = ρ<sub>*s*</sub>(*x*), ρ<sub>*s*,0</sub>(*L*) = ρ<sub>*s*</sub>(*L*), *D*<sub>*s*,0</sub>(*x*) = *D*<sub>*s*</sub>(*x*), and *D*<sub>*s*,0</sub>(*L*) = *D*<sub>*s*</sub>(*L*). Suppose that *L* is a full-rank lattice and *L*<sup>∗</sup> is its dual lattice. We define the smoothing parameter η<sub>ε</sub>(*L*) of *L* to be the smallest *s* such that ρ<sub>1/*s*</sub>(*L*<sup>∗</sup>) ⩽ 1 + ε; more precisely,

$$\eta\_{\varepsilon}(L) = \min\{s : s > 0 \text{ and } \rho\_{1/s}(L^\*) \leqslant 1 + \varepsilon\},\tag{38}$$

where ε > 0. Notice that ρ<sub>1/*s*</sub>(*L*<sup>∗</sup>) is a continuous and strictly decreasing function of *s*; thus the smoothing parameter η<sub>ε</sub>(*L*) is a continuous and strictly decreasing function of ε.
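Definition (38) can be evaluated numerically for the simplest lattice. A minimal sketch for *L* = ℤ<sup>*n*</sup>, whose dual lattice is again ℤ<sup>*n*</sup>, so that ρ<sub>1/*s*</sub>(*L*<sup>∗</sup>) factors into a product of one-dimensional theta sums. Because ρ<sub>1/*s*</sub>(*L*<sup>∗</sup>) is continuous and strictly decreasing in *s*, bisection locates the smallest admissible *s*. The truncation `kmax`, the bracketing interval and the function names are our own choices, not part of the chapter.

```python
import math

def rho_dual_Zn(s, n, kmax=50):
    """rho_{1/s}((Z^n)^*) = (sum_{k in Z} exp(-pi s^2 k^2))^n, truncated at |k| <= kmax."""
    one_dim = 1.0 + 2.0 * sum(math.exp(-math.pi * s * s * k * k) for k in range(1, kmax + 1))
    return one_dim ** n

def smoothing_parameter_Zn(eps, n, lo=1e-3, hi=100.0, iters=200):
    """Smallest s with rho_{1/s}(Z^n) <= 1 + eps, found by bisection on s."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if rho_dual_Zn(mid, n) <= 1 + eps:
            hi = mid   # mid is admissible: the minimum is at most mid
        else:
            lo = mid   # mid is too small
    return hi
```

Since λ<sub>1</sub>(ℤ<sup>*n*</sup>) = 1, Lemma 11 gives η<sub>2<sup>−*n*</sup></sub>(ℤ<sup>*n*</sup>) ⩽ √*n*; for *n* = 3 the bisection returns a value of about 1.12, well below √3.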

Let *L* = *L*(β<sub>1</sub>, β<sub>2</sub>, ..., β<sub>*n*</sub>) ⊂ ℝ*<sup>n</sup>* be a full-rank lattice with basis β<sub>1</sub>, β<sub>2</sub>, ..., β<sub>*n*</sub>; its fundamental region *P*(*L*) is given by

$$P(L) = \left\{ \sum\_{i=1}^{n} a\_i \beta\_i | 0 \le a\_i < 1, \ 1 \le i \le n \right\}.\tag{39}$$

Suppose that *X* and *Y* are two discrete random variables on ℝ*<sup>n</sup>*; the statistical distance between *X* and *Y* over *L* is defined by

$$\Delta(X,Y) = \frac{1}{2} \sum\_{a \in L} |P\{X = a\} - P\{Y = a\}|.\tag{40}$$

If *X* and *Y* are continuous random variables with probability density functions *T*<sub>1</sub> and *T*<sub>2</sub>, respectively, then Δ(*X*, *Y*) is defined by

$$\Delta(X,Y) = \frac{1}{2} \int\_{\mathbb{R}^n} |T\_1(z) - T\_2(z)| \, \mathrm{d}z. \tag{41}$$
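Definition (40) translates directly into code for finitely supported distributions. A minimal sketch, assuming each distribution is given as a dictionary from lattice points (for instance integer tuples) to probabilities; the function name is hypothetical.

```python
def statistical_distance(p, q):
    """Delta(X, Y) = 1/2 * sum_a |P{X = a} - P{Y = a}| as in (40).

    p and q map points of L (any hashable keys) to probabilities; points missing
    from a dictionary are treated as having probability 0.
    """
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(a, 0.0) - q.get(a, 0.0)) for a in support)
```

The distance is 0 for identical distributions and 1 for distributions with disjoint supports; for example, the uniform distribution on {0, 1} and the distribution (3/4, 1/4) are at distance 1/4.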

The smoothing parameter was introduced by Micciancio and Regev (2007) and plays an important role in the statistical properties of lattices. An important property of the smoothing parameter is that, for any lattice *L* = *L*(*B*) and any *s* > 0, the statistical distance between *D*<sub>*s*</sub> mod *L* and the uniform distribution over the fundamental region *P*(*L*) is at most (1/2)(ρ<sub>1/*s*</sub>(*L*(*B*)<sup>∗</sup>) − 1). In particular, for any ε > 0 and any *s* ⩾ η<sub>ε</sub>(*L*(*B*)), the statistical distance is at most ε/2, namely

$$
\Delta \Big( D\_{s,c} \bmod L, \ U(P(L)) \Big) \leqslant \frac{\varepsilon}{2}. \tag{42}
$$

**Lemma 11** *Let L* <sup>⊂</sup> <sup>R</sup>*<sup>n</sup> be a full-rank lattice, we have*

$$
\eta\_{2^{-n}}(L) \leqslant \sqrt{n}/\lambda\_1(L^\*),\tag{43}
$$

*where L*<sup>∗</sup> *is the dual lattice of L, and* λ1(*L*∗) *is the minimum distance of L*∗*.*

*Proof* See Lemma 3.2 of Micciancio and Regev (2007), or Banaszczyk (1993).

**Lemma 12** *Suppose that L*<sub>1</sub> *and L*<sub>2</sub> *are two full-rank lattices in* ℝ*<sup>n</sup> with L*<sub>1</sub> ⊂ *L*<sub>2</sub>*; then for any* ε > 0*, we have*

$$
\eta\_{\varepsilon}(L\_2) \leqslant \eta\_{\varepsilon}(L\_1). \tag{44}
$$

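*Proof* Let η<sub>ε</sub>(*L*<sub>1</sub>) = *s*; we show that η<sub>ε</sub>(*L*<sub>2</sub>) ⩽ *s*. By (38) and the continuity of ρ<sub>1/*s*</sub>(*L*<sub>1</sub><sup>∗</sup>) in *s*, we have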

$$\rho\_{1/s}(L\_1^\*) = \sum\_{x \in L\_1^\*} e^{-\pi s^2 |x|^2} = 1 + \varepsilon.$$

It is easy to check that *L*<sub>2</sub><sup>∗</sup> ⊂ *L*<sub>1</sub><sup>∗</sup>, and it follows that

$$1 + \varepsilon = \sum\_{\mathbf{x} \in L\_1^\*} e^{-\pi s^2 |\mathbf{x}|^2} \geqslant \sum\_{\mathbf{x} \in L\_2^\*} e^{-\pi s^2 |\mathbf{x}|^2},$$

which implies

$$
\rho\_{1/s}(L\_2^\*) \leqslant 1 + \varepsilon,
$$

and hence η<sub>ε</sub>(*L*<sub>2</sub>) ⩽ *s* = η<sub>ε</sub>(*L*<sub>1</sub>), which proves Lemma 12.

According to (2.4), the ideal matrix *H*∗(*f*) with input vector *f* ∈ ℝ*<sup>n</sup>* is just the ordinary circulant matrix when φ(*x*) = *x<sup>n</sup>* − 1. The next lemma shows that the transpose of a circulant matrix is still a circulant matrix. For any

$$g = \begin{pmatrix} g\_0 \\ g\_1 \\ \vdots \\ g\_{n-1} \end{pmatrix} \in \mathbb{R}^n, \quad \text{we denote} \quad \overline{g} = \begin{pmatrix} g\_{n-1} \\ g\_{n-2} \\ \vdots \\ g\_0 \end{pmatrix},$$

which is called the conjugation of *g*.

**Lemma 13** *Let* φ(*x*) = *x<sup>n</sup>* − 1*; then for any g* = (*g*<sub>0</sub>, *g*<sub>1</sub>, ..., *g*<sub>*n*−1</sub>)′ ∈ ℝ*<sup>n</sup>, we have*

$$(H^\*(\mathbf{g}))' = H^\*(H\overline{\mathbf{g}}).\tag{45}$$

*Proof* Since φ(*x*) = *x<sup>n</sup>* − 1, *H* = *H*<sub>φ</sub> (see (2.3)) is an orthogonal matrix, and we have *H*<sup>−1</sup> = *H*<sup>*n*−1</sup> = *H*′. We write *H*<sub>1</sub> = *H*′ = *H*<sup>−1</sup>. The following identity is easy to verify:


$$H^\*(g) = \begin{pmatrix} \overline{g}' H\_1 \\ \overline{g}' H\_1^2 \\ \vdots \\ \overline{g}' H\_1^n \end{pmatrix}.$$

It follows that

$$(H^\*(g))' = [H\overline{g}, H(H\overline{g}), \dots, H^{n-1}(H\overline{g})] = H^\*(H\overline{g}),$$

and we have the lemma.
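Lemma 13 is easy to confirm numerically. For φ(*x*) = *x<sup>n</sup>* − 1 the rotation matrix *H* is the cyclic shift shown in Examples 2 and 3, so *H<sup>k</sup>g* is `np.roll(g, k)` and *H*∗(*g*) has these vectors as its columns. The sketch below relies on that observation; the function names are hypothetical.

```python
import numpy as np

def circulant_ideal_matrix(g):
    """H*(g) = [g, Hg, ..., H^{n-1} g] for phi(x) = x^n - 1 (H = cyclic shift)."""
    g = np.asarray(g, dtype=float)
    return np.column_stack([np.roll(g, k) for k in range(len(g))])

def conjugation(g):
    """g-bar = (g_{n-1}, ..., g_0)': the entries of g in reverse order."""
    return np.asarray(g, dtype=float)[::-1]
```

With these helpers, (45) reads `circulant_ideal_matrix(g).T == circulant_ideal_matrix(np.roll(conjugation(g), 1))` up to rounding, where `np.roll(conjugation(g), 1)` is *H*ḡ.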

**Lemma 14** *Suppose that g* ∈ ℝ*<sup>n</sup> and the circulant matrix H*∗(*g*) *is invertible. Let A* = (*H*∗(*g*))′*H*∗(*g*)*; then all characteristic values of A are given by*

$$\{ |g(\theta\_1)|^2, |g(\theta\_2)|^2, \dots, |g(\theta\_n)|^2 \},$$

*where* θ<sub>*i*</sub><sup>*n*</sup> = 1 (1 ⩽ *i* ⩽ *n*) *are the n-th roots of unity.*

*Proof* By Lemma 13 and (ii) of Theorem 2, we have

$$A = H^\*(H\overline{g})H^\*(g) = H^\*(H^\*(H\overline{g})g) = H^\*(g''),$$

where *g*″ = *H*∗(*H*ḡ)*g*. Let *g*″(*x*) = *t*<sup>−1</sup>(*g*″) be the polynomial corresponding to *g*″. By (iii) of Theorem 2, all characteristic values of *A* are given by

$$\{g''(\theta\_1), g''(\theta\_2), \dots, g''(\theta\_n)\}, \quad \theta\_i^n = 1, \ 1 \leqslant i \leqslant n. \tag{46}$$

Let *g* = (*g*<sub>0</sub>, *g*<sub>1</sub>, ..., *g*<sub>*n*−1</sub>)′ ∈ ℝ*<sup>n</sup>*. It is easy to see that

$$g''(x) = \sum\_{i=0}^{n-1} g\_i^2 + \left(\sum\_{i=0}^{n-1} g\_i g\_{i-1}\right) x + \dots + \left(\sum\_{i=0}^{n-1} g\_i g\_{i-(n-1)}\right) x^{n-1} \equiv g(x)\,g(x^{n-1}) \pmod{x^n - 1},$$

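where indices are taken modulo *n*, i.e. *g*<sub>−*i*</sub> = *g*<sub>*n*−*i*</sub> for 1 ⩽ *i* ⩽ *n* − 1. Since the coefficients *g<sub>i</sub>* are real and θ<sub>*i*</sub><sup>*n*−1</sup> = θ<sub>*i*</sub><sup>−1</sup> = θ̄<sub>*i*</sub>, we obtain *g*″(θ<sub>*i*</sub>) = *g*(θ<sub>*i*</sub>)*g*(θ̄<sub>*i*</sub>) = |*g*(θ<sub>*i*</sub>)|<sup>2</sup>, and the lemma follows at once.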
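A numerical sanity check of Lemma 14, under the same cyclic-shift description of *H* for φ(*x*) = *x<sup>n</sup>* − 1: the eigenvalues of *A* = (*H*∗(*g*))′*H*∗(*g*) should coincide with |*g*(θ)|<sup>2</sup> over the *n*-th roots of unity θ. The function names are hypothetical; `numpy.polyval` expects the highest-degree coefficient first, hence the reversal.

```python
import numpy as np

def circulant_ideal_matrix(g):
    """H*(g) for phi(x) = x^n - 1: columns g, Hg, ..., H^{n-1} g with H the cyclic shift."""
    g = np.asarray(g, dtype=float)
    return np.column_stack([np.roll(g, k) for k in range(len(g))])

def gram_eigenvalues(g):
    """Sorted eigenvalues of A = H*(g)' H*(g); A is symmetric, so eigvalsh applies."""
    M = circulant_ideal_matrix(g)
    return np.sort(np.linalg.eigvalsh(M.T @ M))

def abs_g_squared_at_roots_of_unity(g):
    """Sorted values |g(theta)|^2 over the n-th roots of unity theta."""
    n = len(g)
    thetas = np.exp(2j * np.pi * np.arange(n) / n)
    values = np.polyval(np.asarray(g, dtype=float)[::-1], thetas)  # highest degree first
    return np.sort(np.abs(values) ** 2)
```

For any invertible circulant *H*∗(*g*), `gram_eigenvalues(g)` and `abs_g_squared_at_roots_of_unity(g)` should agree up to rounding.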

By Definition 4, if *g* ∈ ℝ*<sup>n</sup>* is a prime spot, then there is a unique polynomial *u*(*x*) ∈ *R̄* such that *u*(*x*)*g*(*x*) ≡ 1 (mod φ(*x*)). Writing *u* = *t*(*u*(*x*)) and ū for its conjugation, we define a new vector *T<sub>g</sub>* and its corresponding polynomial *T<sub>g</sub>*(*x*) by

$$T\_g = H\overline{u}, \quad T\_g(x) = t^{-1}(H\overline{u}). \tag{47}$$

If *g* ∈ ℤ*<sup>n</sup>* is an integer vector, then *u*(*x*) can be obtained by the extended Euclidean algorithm in ℚ[*x*], so *T<sub>g</sub>* ∈ ℚ*<sup>n</sup>* is a rational vector and *T<sub>g</sub>*(*x*) ∈ ℚ[*x*]; in general, *T<sub>g</sub>* is not an integer vector. Our main result on the smoothing parameter is the following theorem.

**Theorem 5** *Let* φ(*x*) = *x<sup>n</sup>* − 1 *and let L* ⊂ ℝ*<sup>n</sup> be a full-rank φ-cyclic lattice; then for any prime spot g* ∈ *L, we have*

$$\eta\_{2^{-n}}(L) \leqslant \sqrt{n} \left( \min \{ |T\_g(\theta\_1)|, |T\_g(\theta\_2)|, \dots, |T\_g(\theta\_n)| \} \right)^{-1},\tag{48}$$

*where* θ<sub>*i*</sub><sup>*n*</sup> = 1*,* 1 ⩽ *i* ⩽ *n, and T<sub>g</sub>*(*x*) *is given by (47).*

*Proof* Let *g* ∈ *L* be a prime spot. By (29) and Lemma 12, we have

$$L(H^\*(g)) \subset L \Rightarrow \eta\_\varepsilon(L) \leqslant \eta\_\varepsilon(L(H^\*(g))), \ \forall \varepsilon > 0. \tag{49}$$

To estimate the smoothing parameter of *L*(*H*∗(*g*)), note that its dual lattice is given by

$$L(H^\*(g))^\* = L((H^\*(u))') = L(H^\*(H\overline{u})) = L(H^\*(T\_g)),$$

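where *u*(*x*) ∈ *R̄* satisfies *u*(*x*)*g*(*x*) ≡ 1 (mod *x<sup>n</sup>* − 1), so that (*H*∗(*g*))<sup>−1</sup> = *H*∗(*u*); the second equality is Lemma 13, and *T<sub>g</sub>* is given by (47). Let *A* = (*H*∗(*T<sub>g</sub>*))′*H*∗(*T<sub>g</sub>*); by Lemma 14, all characteristic values of *A* are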

$$\{ |T\_g(\theta\_1)|^2, |T\_g(\theta\_2)|^2, \dots, |T\_g(\theta\_n)|^2 \}.$$

By Lemma 2, the minimum distance λ1(*L*(*H*∗(*g*))∗) is bounded by

$$\lambda\_1(L(H^\*(g))^\*) \geqslant \min\{|T\_g(\theta\_1)|, |T\_g(\theta\_2)|, \dots, |T\_g(\theta\_n)|\}.\tag{50}$$

Now, Theorem 5 follows from Lemma 11 immediately.
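The quantities in Theorem 5 can be computed directly. A sketch for φ(*x*) = *x<sup>n</sup>* − 1, again using that *H* is the cyclic shift (`np.roll(v, 1)`): by Lemma 5, *H*∗(*g*)*u* = *t*(*g*(*x*)*u*(*x*)) = *e*<sub>1</sub>, so *u* solves a linear system; then *T<sub>g</sub>* = *H*ū as in (47), and the right-hand side of (48) is √*n* divided by the smallest |*T<sub>g</sub>*(θ)|. Solving the linear system (rather than running the extended Euclidean algorithm) is our choice, and the function names are hypothetical.

```python
import numpy as np

def circulant_ideal_matrix(g):
    """H*(g) for phi(x) = x^n - 1: columns g, Hg, ..., H^{n-1} g with H the cyclic shift."""
    g = np.asarray(g, dtype=float)
    return np.column_stack([np.roll(g, k) for k in range(len(g))])

def T_vector(g):
    """T_g = H u-bar from (47), where u(x) g(x) = 1 mod x^n - 1."""
    n = len(g)
    e1 = np.zeros(n)
    e1[0] = 1.0
    u = np.linalg.solve(circulant_ideal_matrix(g), e1)  # raises LinAlgError if g is not a prime spot
    u_bar = u[::-1]                                     # conjugation of u
    return np.roll(u_bar, 1)                            # apply H (cyclic shift)

def theorem5_bound(g):
    """Right-hand side of (48): sqrt(n) / min_i |T_g(theta_i)|."""
    n = len(g)
    T = T_vector(g)
    thetas = np.exp(2j * np.pi * np.arange(n) / n)
    values = np.abs(np.polyval(T[::-1], thetas))        # polyval wants highest degree first
    return np.sqrt(n) / values.min()
```

For the prime spot *g* = (0, 0, 1)′ of Example 2, `T_vector` returns (0, 0, 1), i.e. *T<sub>g</sub>*(*x*) = *x*<sup>2</sup>, and `theorem5_bound` returns √3. A useful cross-check is that `circulant_ideal_matrix(T_vector(g))` equals the transpose of the inverse of *H*∗(*g*), which is the dual-lattice identity used in the proof.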

Let *L* = *L*(*B*) be a full-rank lattice with *B* = [β<sub>1</sub>, β<sub>2</sub>, ..., β<sub>*n*</sub>]. We denote by *B*<sup>∗</sup> = [β<sub>1</sub><sup>∗</sup>, β<sub>2</sub><sup>∗</sup>, ..., β<sub>*n*</sub><sup>∗</sup>] the Gram-Schmidt orthogonalization of the ordered basis *B*. It is well known that

$$\lambda\_1(L) \geqslant |B^\*| = \min\_{1 \leqslant i \leqslant n} |\beta\_i^\*|,$$

which, by Lemma 11, yields the upper bound

$$
\eta\_{2^{-n}}(L) \leqslant \sqrt{n} |B\_0^\*|^{-1},\tag{51}
$$

where *B*<sub>0</sub><sup>∗</sup> is the Gram-Schmidt orthogonalization of a basis *B*<sub>0</sub> of the dual lattice *L*<sup>∗</sup> of *L*.
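The bound (51) is straightforward to evaluate. A sketch, assuming the dual basis is taken as *B*<sub>0</sub> = (*B*<sup>−1</sup>)′ (a standard choice for a square basis matrix *B*) and that |*B*<sub>0</sub><sup>∗</sup>| means the smallest Gram-Schmidt norm, as in the display above; the function names are hypothetical.

```python
import numpy as np

def gram_schmidt_min_norm(B):
    """min_i |beta_i^*| for the Gram-Schmidt vectors of the columns of B, in order."""
    B = np.asarray(B, dtype=float)
    ortho = []
    for b in B.T:
        v = b.copy()
        for q in ortho:
            v -= (v @ q) / (q @ q) * q   # remove the component along an earlier beta_j^*
        ortho.append(v)
    return min(np.linalg.norm(v) for v in ortho)

def gs_bound(B):
    """Right-hand side of (51): sqrt(n) / |B_0^*| with the dual basis B_0 = (B^{-1})'."""
    B = np.asarray(B, dtype=float)
    dual_basis = np.linalg.inv(B).T
    return np.sqrt(B.shape[0]) / gram_schmidt_min_norm(dual_basis)
```

For the upper-triangular all-ones bases *B* used in Examples 2 and 3, `gram_schmidt_min_norm(np.linalg.inv(B).T)` gives √3/3 and 1/2, respectively, matching the values of |*B*<sub>0</sub><sup>∗</sup>| stated there.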

For φ-cyclic lattices *L*, the upper bound (48) was always better than (51) in our numerical tests; we give two examples here.

**Example 2** Let *n* = 3 and φ(*x*) = *x*<sup>3</sup> − 1; the rotation matrix *H* is

$$H = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}.$$

We select a φ-cyclic lattice *L* = *L*(*B*), where

$$B = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix}.$$

Since *L* = ℤ<sup>3</sup>, *L* is a φ-cyclic lattice. It is easy to check that

$$|B\_0^\*| = \min\_{1 \le i \le 3} |\beta\_i^\*| = \frac{\sqrt{3}}{3}.$$

On the other hand, we randomly choose a prime spot *g* = (0, 0, 1)′ ∈ *L*, so that *g*(*x*) = *x*<sup>2</sup>. Since *x g*(*x*) ≡ 1 (mod *x*<sup>3</sup> − 1), we have *T<sub>g</sub>*(*x*) = *x*<sup>2</sup>; it follows that |*T<sub>g</sub>*(θ<sub>1</sub>)| = |*T<sub>g</sub>*(θ<sub>2</sub>)| = |*T<sub>g</sub>*(θ<sub>3</sub>)| = 1, and

$$\left(\min\_{1 \le i \le 3} |T\_g(\theta\_i)|\right)^{-1} = 1 \leqslant |B\_0^\*|^{-1} = \sqrt{3}.$$

**Example 3** Let *n* = 4 and φ(*x*) = *x*<sup>4</sup> − 1; the rotation matrix *H* is

$$H = \begin{pmatrix} 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}.$$

We select a φ-cyclic lattice *L* = *L*(*B*), where

$$B = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 1 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$

Since *L* = ℤ<sup>4</sup>, *L* is a φ-cyclic lattice. It is easy to check that

$$|B\_0^\*| = \min\_{1 \le i \le 4} |\beta\_i^\*| = \frac{1}{2}.$$

On the other hand, we randomly choose a prime spot *g* = (−2, 1, 0, 0)′ ∈ *L*, so that *g*(*x*) = *x* − 2. Since ((*x*<sup>3</sup> − *x*<sup>2</sup> − 2*x* − 5)/7) *g*(*x*) ≡ 1 (mod *x*<sup>4</sup> − 1), we have *T<sub>g</sub>*(*x*) = (−2*x*<sup>3</sup> − *x*<sup>2</sup> + *x* − 5)/7; it follows that |*T<sub>g</sub>*(θ<sub>1</sub>)| = 1, |*T<sub>g</sub>*(θ<sub>2</sub>)| = |*T<sub>g</sub>*(θ<sub>3</sub>)| = |*T<sub>g</sub>*(θ<sub>4</sub>)| = 5/7, and

$$\left(\min\_{1 \le i \le 4} |T\_g(\theta\_i)|\right)^{-1} = \frac{7}{5} \leqslant |B\_0^\*|^{-1} = 2.$$

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **On the LWE Cryptosystem with More General Disturbance**

**Zheng Zhiyong and Tian Kun**

**Abstract** The main purpose of this chapter is to extend the analysis of the probability of decryption error of the cryptosystem based on the learning with errors (LWE) problem to more general disturbances. In the first section, we introduce the LWE cryptosystem, its applications, and some previous research results. We then give a more precise estimate of the probability of decryption error under independent, identically distributed Gaussian disturbances and under arbitrary independent, identically distributed disturbances. These upper bounds can be made close to 0 with suitable parameters, so the probability of decryption error of the cryptosystem can be made sufficiently small. This verifies our core result that the LWE-based cryptosystem can have high security.

**Keywords** Learning with errors problem · Decryption error · Probability · General disturbance

Z. Zhiyong · T. Kun (B)

© The Author(s) 2023

Engineering Research Center of Ministry of Education for Financial Computing and Digital Engineering, Renmin University of China, Beijing 100872, China e-mail: tkun19891208@ruc.edu.cn

Z. Zhiyong e-mail: zhengzy@ruc.edu.cn

Z. Zheng (ed.), *Proceedings of the Second International Forum on Financial Mathematics and Financial Technology*, Financial Mathematics and Fintech, https://doi.org/10.1007/978-981-99-2366-3\_8

# **1 Introduction**

In this section, we describe a cryptosystem based on the learning with errors (LWE) problem (Micciancio & Regev, 2009; Regev, 2005). We first introduce the LWE problem. Let $p$ be a prime, let $m$ and $n$ be positive integers, and consider the following system of approximate equations:

$$\begin{cases} \langle s, a_1 \rangle \approx_\chi v_1 \pmod{p}, \\ \langle s, a_2 \rangle \approx_\chi v_2 \pmod{p}, \\ \qquad\qquad \vdots \\ \langle s, a_m \rangle \approx_\chi v_m \pmod{p}. \end{cases}$$

Here $s \in \mathbb{Z}_p^n$, the vectors $a_1, a_2, \ldots, a_m$ are chosen independently and uniformly from $\mathbb{Z}_p^n$, $v_1, v_2, \ldots, v_m \in \mathbb{Z}_p$, and $\langle s, a_i \rangle$ is the inner product of $s$ and $a_i$. The errors in these equations are generated from a probability distribution $\chi : \mathbb{Z}_p \to \mathbb{R}^+$ on $\mathbb{Z}_p$; that is, for each equation we have $v_i = \langle s, a_i \rangle + e_i$, where $e_i \in \mathbb{Z}_p$ is chosen independently according to $\chi$. The problem of finding $s \in \mathbb{Z}_p^n$ from such equations is called $\mathrm{LWE}_{p,\chi}$. There is also an equivalent description of the LWE problem. The input is a pair $(A, v)$, where $A \in \mathbb{Z}_p^{m \times n}$ is chosen uniformly and $v$ is generated in one of two ways: either $v$ is chosen uniformly from $\mathbb{Z}_p^m$, or $v = As + e$ for a uniformly chosen $s \in \mathbb{Z}_p^n$ and a vector $e \in \mathbb{Z}_p^m$ chosen according to $\chi^m$. The goal is to distinguish between these two cases with non-negligible advantage. The LWE problem is also equivalent to a decoding problem in $q$-ary lattices (Regev, 2005).
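To make the matrix form of LWE concrete, the following minimal Python sketch generates an instance $(A, v)$ with $v = As + e \pmod p$. The error distribution $\chi$ is not fixed by the text; here it is assumed, purely for illustration, to be a Gaussian of width `noise_width` rounded to an integer and reduced mod $p$. The function name `lwe_instance` is hypothetical.

```python
import random

def lwe_instance(n, m, p, noise_width, seed=None):
    """Return (A, v, s, e) with v = A s + e (mod p).

    Assumption: chi is a rounded Gaussian of standard deviation
    `noise_width`, reduced modulo p (an illustrative choice only).
    """
    rng = random.Random(seed)
    s = [rng.randrange(p) for _ in range(n)]                        # secret, uniform in Z_p^n
    A = [[rng.randrange(p) for _ in range(n)] for _ in range(m)]    # rows a_1..a_m, uniform
    e = [round(rng.gauss(0.0, noise_width)) % p for _ in range(m)]  # errors drawn from chi
    v = [(sum(aij * sj for aij, sj in zip(row, s)) + ei) % p for row, ei in zip(A, e)]
    return A, v, s, e

if __name__ == "__main__":
    A, v, s, e = lwe_instance(n=4, m=8, p=97, noise_width=1.0, seed=1)
    # With the secret s in hand, v_i - <s, a_i> = e_i (mod p) recovers every error.
    print([(vi - sum(a * b for a, b in zip(row, s))) % 97 for row, vi in zip(A, v)] == e)  # True
```

Without $s$, deciding whether $v$ came from this process or was drawn uniformly is exactly the decision version described above.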

The short integer solution (SIS) problem was first introduced in the seminal work of Ajtai (1996) and has served as the foundation for one-way and collision-resistant hash functions, identification schemes, digital signatures, and other "minicrypt" primitives. A very important work of Regev from 2005 introduced the LWE problem, which is the "encryption-enabling" analogue of the SIS problem (Regev, 2009). In fact, the two problems are very similar and can meaningfully be seen as duals of each other.

The LWE problem is very robust and can be viewed as an extension of a well-known problem in learning theory. It remains hard even if the attacker learns extra information about the secret and the errors. Regev gave a worst-case hardness theorem for LWE (Regev, 2009), and the best-known algorithms for LWE run in time exponential in $n$ (Ajtai et al., 2001; Blum et al., 2003; Kumar & Sivakumar, 2001). The theorem is proved by giving a quantum polynomial-time reduction that uses an oracle for LWE to solve $\mathrm{GapSVP}_\gamma$ and $\mathrm{SIVP}_\gamma$ in the worst case, thereby transforming any algorithm that solves LWE into a quantum algorithm for these lattice problems. The quantum nature of the reduction is meaningful, since no known quantum algorithms for $\mathrm{GapSVP}_\gamma$ and $\mathrm{SIVP}_\gamma$ significantly outperform classical ones beyond generic quantum speedups. A classical reduction, which gives further confidence in the hardness of LWE, was given by Peikert (2009). Regev also gave a public-key cryptosystem whose semantic security can provably be based on the LWE problem, and hence on the conjectured quantum hardness of $\mathrm{GapSVP}_\gamma$ and $\mathrm{SIVP}_\gamma$ for $\gamma = O(n^{3/2})$ (Regev, 2009). The LWE problem is closely related to decoding problems in coding theory (Ajtai, 2005; Ajtai & Dwork, 1997; Alekhnovich, 2003; Asokan et al., 2007; Ding, 2004; Kawachi et al., 2007; Peikert, 2007; Peikert et al., 2008; Regev, 2004; Signing et al., 2022). Regev's cryptosystem is secure against passive eavesdroppers as long as the LWE problem is hard.

Another application of LWE is fully homomorphic encryption (FHE) (Rivest et al., 1978). The earliest FHE constructions were based on average-case assumptions about ideal lattices (Gentry, 2009; Dijk et al., 2010). Later, Brakerski and Vaikuntanathan gave the second generation of FHE constructions, which are based on the LWE problem (Brakerski & Vaikuntanathan, 2011a, b). In 2013, Gentry, Sahai, and Waters proposed an LWE-based FHE scheme with some unique and advantageous properties: for example, homomorphic multiplication does not require a key-switching step, and the scheme can be made identity-based. This yields unbounded FHE based on LWE with only an inverse-polynomial $n^{-O(1)}$ error rate (Gentry et al., 1999).

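Now we introduce the following efficient lattice-based cryptosystem, which has strong theoretical security (Micciancio & Regev, 2009).

**Private key:** choose $S \in \mathbb{Z}_q^{n \times l}$ uniformly at random.

**Public key:** choose $A \in \mathbb{Z}_q^{m \times n}$ uniformly at random and $E \in \mathbb{Z}_q^{m \times l}$ with each entry chosen independently according to $\psi_\alpha$; the public key is $(A, P = AS + E)$.

**Encryption:** given $v \in \mathbb{Z}_t^l$ and the public key $(A, P)$, choose $a \in \{-r, -r+1, \ldots, r\}^m$ uniformly at random and output the ciphertext $(u, c) = (A^T a, P^T a + f(v))$.

**Decryption:** given $(u, c)$ and the private key $S$, output $f^{-1}(c - S^T u)$.
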
Here $m, n, l, t, q, r$ are positive integers and $\alpha > 0$. $\psi_\alpha$ is the distribution on $\mathbb{Z}_q$ obtained by sampling a normal variable with mean 0 and standard deviation $\alpha q/\sqrt{2\pi}$, rounding the result to the nearest integer, and reducing it modulo $q$. The map $f : \mathbb{Z}_t^l \to \mathbb{Z}_q^l$ multiplies each coordinate by $q/t$ and rounds to the nearest integer, and $f^{-1}$ is the "inverse" mapping that multiplies each coordinate by $t/q$ and rounds to the nearest integer; precise definitions of $f$ and $f^{-1}$ are given in the next section. The probability of decryption error per letter for this cryptosystem is approximately estimated in (Micciancio & Regev, 2009) as

$$\text{error probability per letter} \approx 2\left(1 - \Phi\left(\frac{1}{2t\alpha} \sqrt{\frac{6\pi}{mr(r+1)}}\right)\right),\tag{1}$$

where $\Phi$ is the cumulative distribution function of the standard normal distribution, i.e. $\Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} e^{-s^2/2}\,\mathrm{d}s$. Here we give the following more precise upper bound:

$$\text{error probability} \leqslant 2l \left( 1 - \Phi \left( \frac{q - t}{2\alpha tq} \sqrt{\frac{6\pi}{mr(r+1)}} \right) \right). \tag{2}$$

This upper bound can be made close to 0 by choosing $\alpha$ small enough, which means that the probability of decryption error of the cryptosystem can be made sufficiently small. However, this estimate relies on Gaussian disturbance. We therefore also estimate the probability of decryption error for the LWE-based cryptosystem with more general disturbance. By the central limit theorem (Riauba, 1975), a general disturbance can be approximated by a Gaussian one, and we obtain the following estimate, which is more general than that in (Micciancio & Regev, 2009).

$$\text{error probability} \leqslant 2l \left( 1 - \Phi \left( \frac{q - t}{2\beta t} \sqrt{\frac{3}{mr(r+1)}} \right) \right) + l\delta. \tag{3}$$

Here $\beta$ is the standard deviation of the disturbance distribution, and $\delta > 0$ is an arbitrary real number, for which $m$ must be chosen large enough (see Theorem 2).
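The three estimates (1), (2), and (3) can be evaluated numerically. The sketch below is a minimal standard-library illustration; the function names (`estimate_1`, `bound_2`, `bound_3`) and the sample parameters are hypothetical choices, and $1 - \Phi(z)$ is computed with `math.erfc` to avoid cancellation when $z$ is large.

```python
import math

def upper_tail(z):
    # 1 - Phi(z) for the standard normal, via erfc for numerical stability
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def estimate_1(t, alpha, m, r):
    # approximation (1): error probability per letter
    return 2 * upper_tail(math.sqrt(6 * math.pi / (m * r * (r + 1))) / (2 * t * alpha))

def bound_2(t, q, l, alpha, m, r):
    # bound (2): Gaussian disturbance with standard deviation alpha*q/sqrt(2*pi)
    z = (q - t) / (2 * alpha * t * q) * math.sqrt(6 * math.pi / (m * r * (r + 1)))
    return 2 * l * upper_tail(z)

def bound_3(t, q, l, beta, m, r, delta):
    # bound (3): general disturbance with standard deviation beta, plus the slack l*delta
    z = (q - t) / (2 * beta * t) * math.sqrt(3 / (m * r * (r + 1)))
    return 2 * l * upper_tail(z) + l * delta

if __name__ == "__main__":
    t, q, l, alpha, m, r = 2, 4093, 1, 0.05, 100, 1  # illustrative values only
    beta = alpha * q / math.sqrt(2 * math.pi)
    print(estimate_1(t, alpha, m, r), bound_2(t, q, l, alpha, m, r), bound_3(t, q, l, beta, m, r, 0.0))
```

With $l = 1$ and finite $q$, the factor $(q-t)/q$ makes (2) slightly larger than (1), and (2) approaches (1) as $q$ grows. With $\beta = \alpha q/\sqrt{2\pi}$ and $\delta = 0$, the right-hand side of (3) coincides with (2).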

# *1.1 Innovation and Contribution*

Our work estimates the probability of decryption error under Gaussian disturbances and proves that it can be made sufficiently small. Our most salient contribution is to show that, for general disturbances, the probability of decryption error can also be made small enough. This indicates the high security and reliability of the LWE-based cryptosystem; in other words, the cryptosystem is sufficiently secure against passive eavesdroppers and could be applied in many kinds of encryption processes.

# **2 Methodology**

# *2.1 Preliminary Property*

**Definition 1** For every $x \in \mathbb{R}$, let $[x]$ be the integer closest to $x$; if the fractional part of $x$ is $\frac{1}{2}$, then $[x]$ is defined to be $x - \frac{1}{2}$. Clearly, $-\frac{1}{2} < x - [x] \leqslant \frac{1}{2}$ for all $x \in \mathbb{R}$.
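Definition 1 fixes a specific tie-breaking rule. A minimal Python sketch, assuming exact rational inputs via `fractions.Fraction`, computes $[x]$ as $\lceil x - \frac{1}{2} \rceil$. This returns the nearest integer and sends a tie $k + \frac{1}{2}$ down to $k$. The helper name `nearest` is hypothetical. Python's built-in `round` uses round-half-to-even, so it disagrees with $[x]$ at some ties: `round(Fraction(7, 2))` is 4, whereas $[7/2] = 3$.

```python
import math
from fractions import Fraction

def nearest(x):
    """[x] of Definition 1: the integer closest to x, where a tie
    (fractional part exactly 1/2) is resolved downward to x - 1/2."""
    return math.ceil(Fraction(x) - Fraction(1, 2))

if __name__ == "__main__":
    print(nearest(Fraction(5, 2)), nearest(Fraction(7, 2)), nearest(Fraction(-5, 2)), nearest(Fraction(12, 5)))
    # prints: 2 3 -3 2   (the built-in round gives 2 4 -2 2 for the same inputs)
    grid = [Fraction(k, 8) for k in range(-40, 41)]
    # the property -1/2 < x - [x] <= 1/2 stated in Definition 1
    print(all(-Fraction(1, 2) < x - nearest(x) <= Fraction(1, 2) for x in grid))  # True
```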

**Lemma 1** *Let $t$ and $q$ be positive integers with $t \leqslant q$. For every $a \in \mathbb{Z}_t$, let $f(a) = \left[\frac{q}{t}a\right] \in \mathbb{Z}_q$, and for every $b \in \mathbb{Z}_q$, let $f^{-1}(b) = \left[\frac{t}{q}b\right] \in \mathbb{Z}_t$. Then $f^{-1}(f(a)) = a$ holds for every $a \in \mathbb{Z}_t$.*

**Remark 1** If $a_1 \equiv a_2 \pmod{t}$, then $f(a_1) \equiv f(a_2) \pmod{q}$, so $f$ is well defined on $\mathbb{Z}_t$.

**Proof of Lemma 1** (1) If *t* = *q*, then we have *f* (*a*) = [*a*] = *a* and

$$f^{-1}(f(a)) = f^{-1}(a) = [a] = a, \ \forall a \in \mathbb{Z}\_t.$$

(2) If $t < q$, then $\frac{q}{2t} > \frac{1}{2}$. By Definition 1, we know that

$$
\frac{q}{t}a - \frac{1}{2} \leqslant \left[\frac{q}{t}a\right] < \frac{q}{t}a + \frac{1}{2}.
$$

It follows that

$$\frac{q}{t}a - \frac{q}{2t} < \frac{q}{t}a - \frac{1}{2} \leqslant \left[\frac{q}{t}a\right] < \frac{q}{t}a + \frac{1}{2} < \frac{q}{t}a + \frac{q}{2t}.$$

So we can get

$$
\frac{q}{t}a - \frac{q}{2t} < \left[\frac{q}{t}a\right] < \frac{q}{t}a + \frac{q}{2t}.
$$

This is equivalent to

$$a - \frac{1}{2} < \frac{t}{q} \left[\frac{q}{t} a\right] < a + \frac{1}{2},$$

and

$$-\frac{1}{2} < \frac{t}{q} \left[\frac{q}{t}a\right] - a < \frac{1}{2}.$$

Thus,

$$\left[\frac{t}{q}\left[\frac{q}{t}a\right] - a\right] = 0, \text{ and } \left[\frac{t}{q}\left[\frac{q}{t}a\right]\right] = a.$$

This means that

$$f^{-1}(f(a)) = a, \,\forall a \in \mathbb{Z}_t.$$
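Lemma 1 and Remark 1 can be checked exhaustively for small moduli. The sketch below assumes that elements of $\mathbb{Z}_t$ and $\mathbb{Z}_q$ are represented by their least non-negative residues. It implements $f$ and $f^{-1}$ with exact rational arithmetic and confirms $f^{-1}(f(a)) = a$ for all $t \leqslant q < 40$. This is a sanity check, not a substitute for the proof, and all function names are hypothetical.

```python
import math
from fractions import Fraction

def nearest(x):
    # [x] of Definition 1 (ties rounded down)
    return math.ceil(Fraction(x) - Fraction(1, 2))

def f(a, t, q):
    # f: Z_t -> Z_q, a -> [q a / t] mod q
    return nearest(Fraction(q * a, t)) % q

def f_inv(b, t, q):
    # f^{-1}: Z_q -> Z_t, b -> [t b / q] mod t
    return nearest(Fraction(t * b, q)) % t

def lemma1_holds(t, q):
    """Check f^{-1}(f(a)) = a for every a in Z_t (Lemma 1 assumes t <= q)."""
    return all(f_inv(f(a, t, q), t, q) == a for a in range(t))

def remark1_holds(t, q):
    """Check Remark 1: a and a + t have the same image under f in Z_q."""
    return all(f(a + t, t, q) == f(a, t, q) for a in range(t))

if __name__ == "__main__":
    pairs = [(t, q) for q in range(1, 40) for t in range(1, q + 1)]
    print(all(lemma1_holds(t, q) and remark1_holds(t, q) for t, q in pairs))  # True
    print(lemma1_holds(5, 2))  # False: the hypothesis t <= q matters
```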

**Lemma 2** *Let $t$ and $q$ be positive integers with $t > q$, and define $f$ and $f^{-1}$ as in Lemma 1. If $a$ is chosen uniformly from $\mathbb{Z}_t$, then*

$$P\{f^{-1}(f(a)) \neq a\} = 1 - \frac{q}{t}.$$

**Proof of Lemma 2** Since $t > q$, Lemma 1 applied with the roles of $t$ and $q$ exchanged gives

$$
\left[\frac{q}{t}\left[\frac{t}{q}b\right]\right] = b, \ \forall b \in \mathbb{Z}\_q.
$$

This is equivalent to

$$f\left(\left[\frac{t}{q}b\right]\right) = b, \,\,\forall b \in \mathbb{Z}\_q.$$

So we get

$$f^{-1}\left(f\left(\left[\frac{t}{q}b\right]\right)\right) = f^{-1}(b) = \left[\frac{t}{q}b\right], \ \forall b \in \mathbb{Z}\_q.$$

Here $0, \left[\frac{t}{q}\right], \left[\frac{2t}{q}\right], \ldots, \left[\frac{(q-1)t}{q}\right]$ are pairwise distinct elements of $\mathbb{Z}_t$, because $\frac{t}{q} > 1$. Next we prove that the number of $a \in \mathbb{Z}_t$ satisfying $f^{-1}(f(a)) = a$ is at most $q$. Let $\mathcal{A}$ be the set of all $a \in \mathbb{Z}_t$ satisfying $f^{-1}(f(a)) = a$. For any $a_1, a_2 \in \mathcal{A}$ with $a_1 \neq a_2$ in $\mathbb{Z}_t$, we have $f(a_1) \not\equiv f(a_2) \pmod{q}$, i.e. $f(a_1) \neq f(a_2)$ in $\mathbb{Z}_q$, since otherwise $a_1 = f^{-1}(f(a_1)) = f^{-1}(f(a_2)) = a_2$. Hence $f$ is injective on $\mathcal{A}$, and $\mathcal{A}$ has at most $q$ elements.

Combining these facts, $0, \left[\frac{t}{q}\right], \left[\frac{2t}{q}\right], \ldots, \left[\frac{(q-1)t}{q}\right]$ are exactly the elements $a \in \mathbb{Z}_t$ with $f^{-1}(f(a)) = a$. Since $a$ is chosen uniformly from $\mathbb{Z}_t$,

$$P\{f^{-1}(f(a)) \neq a\} = 1 - \frac{q}{t}.$$
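Lemma 2 can also be checked by exhaustive counting. The sketch below represents residues by their least non-negative representatives and uses exact rational arithmetic. For every $t > q$ in a small range, it verifies that the fixed points of $f^{-1} \circ f$ are exactly $\left[\frac{t}{q}b\right]$ for $b \in \mathbb{Z}_q$, and that the error probability is $1 - q/t$. Function names are hypothetical.

```python
import math
from fractions import Fraction

def nearest(x):
    # [x] of Definition 1 (ties rounded down)
    return math.ceil(Fraction(x) - Fraction(1, 2))

def f(a, t, q):
    # f: Z_t -> Z_q
    return nearest(Fraction(q * a, t)) % q

def f_inv(b, t, q):
    # f^{-1}: Z_q -> Z_t
    return nearest(Fraction(t * b, q)) % t

def fixed_points(t, q):
    """Elements a of Z_t with f^{-1}(f(a)) = a."""
    return {a for a in range(t) if f_inv(f(a, t, q), t, q) == a}

def error_probability(t, q):
    """P{f^{-1}(f(a)) != a} for a uniform on Z_t, by exhaustive counting."""
    return 1 - Fraction(len(fixed_points(t, q)), t)

if __name__ == "__main__":
    print(error_probability(5, 2), sorted(fixed_points(5, 2)))  # 3/5 [0, 2]
```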

**Corollary 1** *Let $t$, $q$, and $l$ be positive integers. For every $a = (a_1, a_2, \ldots, a_l) \in \mathbb{Z}_t^l$, let $f(a) = \left(\left[\frac{q}{t}a_1\right], \left[\frac{q}{t}a_2\right], \ldots, \left[\frac{q}{t}a_l\right]\right) \in \mathbb{Z}_q^l$, and for every $b = (b_1, b_2, \ldots, b_l) \in \mathbb{Z}_q^l$, let $f^{-1}(b) = \left(\left[\frac{t}{q}b_1\right], \left[\frac{t}{q}b_2\right], \ldots, \left[\frac{t}{q}b_l\right]\right) \in \mathbb{Z}_t^l$. If $a$ is chosen uniformly from $\mathbb{Z}_t^l$ and $a_1, a_2, \ldots, a_l$ are independent, then*

$$P\{f^{-1}(f(a)) \neq a\} = \max\left\{0, 1 - \left(\frac{q}{t}\right)^l\right\}.$$

**Proof of Corollary 1** If $t \leqslant q$, then by Lemma 1 we have

$$f^{-1}(f(a\_i)) = a\_i, \ \forall a\_i \in \mathbb{Z}\_t, \ \forall 1 \leqslant i \leqslant l.$$

So

$$f^{-1}(f(a)) = a, \ \forall a \in \mathbb{Z}\_t^l.$$

$$P\{f^{-1}(f(a)) \neq a\} = 0 = \max\left\{0, 1 - \left(\frac{q}{t}\right)^l\right\}.$$

If *t* > *q*, from Lemma 2, we have

$$P\{f^{-1}(f(a\_i)) = a\_i\} = \frac{q}{t},\ a\_i \in \mathbb{Z}\_t,\ \forall 1 \leqslant i \leqslant l.$$

Since *a*1, *a*2,..., *al* are independent, therefore,

$$P\{f^{-1}(f(a)) = a\} = \left(\frac{q}{t}\right)^l, \ a \in \mathbb{Z}\_t^l.$$

$$P\{f^{-1}(f(a)) \neq a\} = 1 - \left(\frac{q}{t}\right)^l = \max\left\{0, 1 - \left(\frac{q}{t}\right)^l\right\}.$$
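For small $t$, $q$, and $l$, Corollary 1 can be confirmed by enumerating all of $\mathbb{Z}_t^l$. The sketch applies $f$ and $f^{-1}$ coordinatewise with exact arithmetic and compares the observed error rate with $\max\{0, 1 - (q/t)^l\}$. The ranges and function names are illustrative assumptions.

```python
import math
from fractions import Fraction
from itertools import product

def nearest(x):
    # [x] of Definition 1 (ties rounded down)
    return math.ceil(Fraction(x) - Fraction(1, 2))

def f_vec(a, t, q):
    # coordinatewise f: Z_t^l -> Z_q^l
    return tuple(nearest(Fraction(q * x, t)) % q for x in a)

def f_inv_vec(b, t, q):
    # coordinatewise f^{-1}: Z_q^l -> Z_t^l
    return tuple(nearest(Fraction(t * y, q)) % t for y in b)

def vector_error_probability(t, q, l):
    """Exact P{f^{-1}(f(a)) != a} for a uniform on Z_t^l, by enumeration."""
    vectors = list(product(range(t), repeat=l))
    bad = sum(1 for a in vectors if f_inv_vec(f_vec(a, t, q), t, q) != a)
    return Fraction(bad, len(vectors))

def corollary1(t, q, l):
    # closed form max{0, 1 - (q/t)^l}
    return max(Fraction(0), 1 - Fraction(q, t) ** l)

if __name__ == "__main__":
    print(vector_error_probability(5, 2, 2), corollary1(5, 2, 2))  # 21/25 21/25
```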

# *2.2 Probability of Decryption Error Based on Gaussian Disturbance*

Now we can calculate the probability of decryption error for the LWE-based cryptosystem. As described in the first section, let $S$ be the private key and $(A, P)$, with $P = AS + E$, be the public key. We choose $v \in \mathbb{Z}_t^l$ from the message space, encrypt $v$, and then decrypt it. The ciphertext is $(u, c) = (A^T a, P^T a + f(v))$, and the decryption result is

$$\begin{aligned} f^{-1}(c - S^T u) &= f^{-1}(P^T a + f(v) - S^T u) \\ &= f^{-1}((AS + E)^T a + f(v) - S^T A^T a) \\ &= f^{-1}(E^T a + f(v)). \end{aligned}$$
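The key generation, encryption, and decryption steps behind this derivation can be sketched in Python as a toy illustration. This is not a reference implementation. The parameters are far too small to be secure. The dimensions $S \in \mathbb{Z}_q^{n\times l}$, $A \in \mathbb{Z}_q^{m\times n}$, and $E \in \mathbb{Z}_q^{m\times l}$ are inferred from $P = AS + E$, $u = A^T a$, and $c = P^T a + f(v)$. All function names are hypothetical. The script checks numerically that $c - S^T u \equiv E^T a + f(v) \pmod q$ and that decryption returns $v$.

```python
import math
import random
from fractions import Fraction

def nearest(x):
    # [x] of Definition 1: nearest integer, ties rounded down
    return math.ceil(Fraction(x) - Fraction(1, 2))

def f(vec, t, q):
    # encode Z_t^l -> Z_q^l coordinatewise: a -> [q a / t] mod q
    return [nearest(Fraction(q * x, t)) % q for x in vec]

def f_inv(vec, t, q):
    # decode Z_q^l -> Z_t^l coordinatewise: b -> [t b / q] mod t
    return [nearest(Fraction(t * x, q)) % t for x in vec]

def keygen(n, m, l, q, alpha, rng):
    sd = alpha * q / math.sqrt(2 * math.pi)                  # psi_alpha standard deviation
    S = [[rng.randrange(q) for _ in range(l)] for _ in range(n)]
    A = [[rng.randrange(q) for _ in range(n)] for _ in range(m)]
    E = [[round(rng.gauss(0.0, sd)) % q for _ in range(l)] for _ in range(m)]  # entries from psi_alpha
    P = [[(sum(A[i][k] * S[k][j] for k in range(n)) + E[i][j]) % q
          for j in range(l)] for i in range(m)]               # P = AS + E
    return S, A, P, E

def encrypt(A, P, v, a, t, q):
    # ciphertext (u, c) = (A^T a, P^T a + f(v)) mod q
    m, n, l = len(A), len(A[0]), len(P[0])
    u = [sum(A[i][k] * a[i] for i in range(m)) % q for k in range(n)]
    fv = f(v, t, q)
    c = [(sum(P[i][j] * a[i] for i in range(m)) + fv[j]) % q for j in range(l)]
    return u, c

def decrypt(S, u, c, t, q):
    # f^{-1}(c - S^T u)
    n, l = len(S), len(S[0])
    return f_inv([(c[j] - sum(S[k][j] * u[k] for k in range(n))) % q for j in range(l)], t, q)

if __name__ == "__main__":
    rng = random.Random(2023)
    t, q, n, m, l, r, alpha = 2, 4093, 8, 40, 4, 1, 0.001     # toy sizes, not secure
    S, A, P, E = keygen(n, m, l, q, alpha, rng)
    v = [1, 0, 1, 1]
    a = [rng.randint(-r, r) for _ in range(m)]                 # a uniform in {-r..r}^m
    u, c = encrypt(A, P, v, a, t, q)
    noisy = [(sum(E[i][j] * a[i] for i in range(m)) + f(v, t, q)[j]) % q for j in range(l)]
    print(noisy == [(c[j] - sum(S[k][j] * u[k] for k in range(n))) % q for j in range(l)])  # True
    print(decrypt(S, u, c, t, q) == v)  # True
```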

The decryption result $f^{-1}(E^T a + f(v))$ lies in $\mathbb{Z}_t^l$, and a decryption error occurs if $f^{-1}(E^T a + f(v)) \neq v$. Since the parameters are chosen to guarantee the security and efficiency of the cryptosystem, we assume $q > t$ and obtain the following theorem.

**Theorem 1** *Let $t, q, l, m, r$ be positive integers with $q > t$, let $v \in \mathbb{Z}_t^l$, and let $f$ be as defined in the previous section. Let $E_{m\times l}$ be a Gaussian disturbance matrix whose entries are chosen independently from the Gaussian distribution with mean 0 and standard deviation $\alpha q/\sqrt{2\pi}$, and let $a \in \{-r, -r+1, \ldots, r\}^m$ be chosen uniformly at random. Then the probability of decryption error satisfies the following inequality.*

$$P\{f^{-1}(E^T a + f(v)) \neq v\} \leqslant 2l \left(1 - \Phi\left(\frac{q - t}{2\alpha tq} \sqrt{\frac{6\pi}{mr(r+1)}}\right)\right).$$

*Here $\Phi$ is the cumulative distribution function of the standard normal distribution, i.e. $\Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} e^{-s^2/2}\,\mathrm{d}s$.*

**Proof of Theorem 1** To compute the probability of decryption error, we first consider a single letter, i.e. the probability that $f^{-1}(E_i^T a + f(v_i)) \neq v_i$. Here $v_i$ is the $i$th coordinate of $v$, $E_{m\times l} = (E_1, E_2, \ldots, E_l)$, and $f^{-1}(E_i^T a + f(v_i))$ is the $i$th coordinate of $f^{-1}(E^T a + f(v))$. Since $q > t$, Lemma 1 gives $f^{-1}(f(v_i)) = v_i$ for every $v_i \in \mathbb{Z}_t$. By Definition 1 with $x = \frac{q}{t}v_i$, we have

$$-\frac{1}{2} < \frac{q}{t}v_i - \left[\frac{q}{t}v_i\right] \leqslant \frac{1}{2},$$

and hence

$$-\frac{t}{2q} \leqslant \frac{t}{q}\left[\frac{q}{t}v_i\right] - v_i < \frac{t}{2q}.$$

So if $\left|\frac{t}{q}E_i^T a\right| < \frac{1}{2} - \frac{t}{2q}$, we get

$$\left|\frac{t}{q}E_i^T a + \frac{t}{q}\left[\frac{q}{t}v_i\right] - v_i\right| < \frac{1}{2} - \frac{t}{2q} + \frac{t}{2q} = \frac{1}{2},$$

which implies

$$\left[\frac{t}{q}E_i^T a + \frac{t}{q}\left[\frac{q}{t}v_i\right] - v_i\right] = 0, \quad \text{i.e.} \quad \left[\frac{t}{q}E_i^T a + \frac{t}{q}\left[\frac{q}{t}v_i\right]\right] = v_i,$$

that is,

$$f^{-1}(E_i^T a + f(v_i)) = v_i.$$

Hence, if $\left|\frac{t}{q}E_i^T a\right| < \frac{1}{2} - \frac{t}{2q}$, then $f^{-1}(E_i^T a + f(v_i)) = v_i$. Contrapositively, if $f^{-1}(E_i^T a + f(v_i)) \neq v_i$, i.e. a decryption error occurs in the $i$th letter, then $\left|\frac{t}{q}E_i^T a\right| \geqslant \frac{1}{2} - \frac{t}{2q}$. So the probability of decryption error in one letter is at most the probability that $\left|\frac{t}{q}E_i^T a\right| \geqslant \frac{1}{2} - \frac{t}{2q}$, i.e.

$$P\{f^{-1}(E_i^T a + f(v_i)) \neq v_i\} \leqslant P\left\{ \left| \frac{t}{q} E_i^T a \right| \geqslant \frac{1}{2} - \frac{t}{2q} \right\}.$$

Next we estimate the probability that $\left|\frac{t}{q}E_i^T a\right| \geqslant \frac{1}{2} - \frac{t}{2q}$. Each coordinate of $E_i$ is chosen independently from the Gaussian distribution with mean 0 and standard deviation $\alpha q/\sqrt{2\pi}$, and a linear combination of independent Gaussian variables is again Gaussian, so for each fixed $a$, $E_i^T a$ is a Gaussian variable. Here $a = (a_1, a_2, \ldots, a_m)$, and each $a_i$ is chosen from $\{-r, -r+1, \ldots, r\}$ uniformly at random, so

$$E(a\_i) = \frac{-r + (-r+1) + \dots + r}{2r+1} = 0.$$

$$\text{Var}(a\_i) = \frac{(-r)^2 + (-r+1)^2 + \dots + r^2}{2r+1} = \frac{r(r+1)}{3}.$$

$$E(E\_i^T a) = 0.$$

$$\text{Var}(E_i^T a) = \left(\frac{\alpha q}{\sqrt{2\pi}}\right)^2 \cdot \frac{r(r+1)}{3}\, m = \frac{\alpha^2 q^2 m r (r+1)}{6\pi}.$$

Therefore, we treat $E_i^T a$ as a normal variable with mean 0 and standard deviation $\alpha q\sqrt{mr(r+1)}/\sqrt{6\pi}$. We have

$$\begin{aligned} P\left\{\left|\frac{t}{q}E_i^T a\right| \geqslant \frac{1}{2} - \frac{t}{2q}\right\} &= P\left\{\left|E_i^T a\right| \geqslant \frac{q-t}{2t}\right\} \\ &= P\left\{\left|E_i^T a\right| \Big/ \left(\alpha q\sqrt{\frac{mr(r+1)}{6\pi}}\right) \geqslant \frac{q-t}{2t} \Big/ \left(\alpha q\sqrt{\frac{mr(r+1)}{6\pi}}\right)\right\} \\ &= P\left\{\left|E_i^T a\right| \Big/ \left(\alpha q\sqrt{\frac{mr(r+1)}{6\pi}}\right) \geqslant \frac{q-t}{2\alpha tq}\sqrt{\frac{6\pi}{mr(r+1)}}\right\} \\ &= 2\left(1 - \Phi\left(\frac{q-t}{2\alpha tq}\sqrt{\frac{6\pi}{mr(r+1)}}\right)\right). \end{aligned}$$

Applying this bound to each of the $l$ letters and using the union bound, we get the following inequality for the probability of decryption error of the LWE-based cryptosystem:

$$\begin{aligned} P\{f^{-1}(E^T a + f(v)) \neq v\} &\leqslant l\,P\{f^{-1}(E_i^T a + f(v_i)) \neq v_i\} \\ &\leqslant l\,P\left\{\left|\frac{t}{q}E_i^T a\right| \geqslant \frac{1}{2} - \frac{t}{2q}\right\} \\ &= 2l\left(1 - \Phi\left(\frac{q-t}{2\alpha tq}\sqrt{\frac{6\pi}{mr(r+1)}}\right)\right). \end{aligned}$$

This upper bound is more precise than (1), and it tends to 0 as $\alpha \to 0$. This means that the probability of decryption error of the LWE-based cryptosystem can be made very small by an appropriate choice of parameters.
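The proof treats $E_i^T a$ as a normal variable with variance $\alpha^2 q^2 m r(r+1)/(6\pi)$. The sketch below uses hypothetical names and illustrative parameters. It checks the exact value $\text{Var}(a_i) = r(r+1)/3$, then runs a small Monte Carlo simulation of the per-letter error frequency and prints it next to the right-hand side of Theorem 1 with $l = 1$. For fixed $m$, $E_i^T a$ is Gaussian only conditionally on $a$, so the simulation illustrates the trend in $\alpha$ rather than testing the bound itself.

```python
import math
import random
from fractions import Fraction

def nearest(x):
    # [x] of Definition 1 (ties rounded down), computed exactly with Fraction
    return math.ceil(Fraction(x) - Fraction(1, 2))

def var_a(r):
    # exact Var(a_i) for a_i uniform on {-r, ..., r}; the mean is 0
    return Fraction(sum(x * x for x in range(-r, r + 1)), 2 * r + 1)

def upper_tail(z):
    # 1 - Phi(z), via erfc to keep precision for large z
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def bound_2(t, q, l, alpha, m, r):
    # right-hand side of Theorem 1
    z = (q - t) / (2 * alpha * t * q) * math.sqrt(6 * math.pi / (m * r * (r + 1)))
    return 2 * l * upper_tail(z)

def simulate_letter_error(t, q, alpha, m, r, trials, seed=0):
    """Monte Carlo frequency of f^{-1}(E_i^T a + f(v_i)) != v_i, with real-valued
    Gaussian entries of E_i (sd alpha*q/sqrt(2*pi)) and a uniform on {-r..r}^m."""
    rng = random.Random(seed)
    sd = alpha * q / math.sqrt(2 * math.pi)
    errors = 0
    for _ in range(trials):
        v = rng.randrange(t)
        fv = nearest(Fraction(q * v, t)) % q                                      # f(v_i)
        noise = sum(rng.gauss(0.0, sd) * rng.randint(-r, r) for _ in range(m))   # E_i^T a
        y = (noise + fv) % q                                                       # reduce mod q
        if nearest(t * y / q) % t != v:                                            # f^{-1} fails
            errors += 1
    return errors / trials

if __name__ == "__main__":
    print(var_a(3))  # 4, i.e. r(r+1)/3 with r = 3
    t, q, m, r = 2, 97, 20, 1  # illustrative values only
    for alpha in (0.005, 0.05, 0.1):
        print(alpha, simulate_letter_error(t, q, alpha, m, r, 2000), bound_2(t, q, 1, alpha, m, r))
```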

# *2.3 Probability of Decryption Error for General Disturbance*

In this section, we estimate the probability of decryption error for the LWE-based cryptosystem when the entries of the noise matrix $E = (E_{ij})_{m\times l}$ are chosen independently from a common, not necessarily Gaussian, distribution.

**Theorem 2** *Let $t, q, l, r$ be positive integers with $q > t$, and let $m$ be a positive integer to be determined. Let $v \in \mathbb{Z}_t^l$, and let $f$ be as defined in Section 2.1. Let $E_{m\times l}$ be a general disturbance matrix whose entries are chosen independently from a common distribution with mean 0 and standard deviation $\beta$, and let $a \in \{-r, -r+1, \ldots, r\}^m$ be chosen uniformly at random. Then for any $\delta > 0$ there is a positive integer $m$ such that the probability of decryption error satisfies the following inequality.*

$$P\{f^{-1}(E^T a + f(v)) \neq v\} \leqslant 2l \left(1 - \Phi\left(\frac{q - t}{2\beta t} \sqrt{\frac{3}{mr(r + 1)}}\right)\right) + l\delta.$$

*Here $\Phi$ is the cumulative distribution function of the standard normal distribution, i.e. $\Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} e^{-s^2/2}\,\mathrm{d}s$.*

**Proof of Theorem 2** As in the proof of Theorem 1, we need to estimate the probability that $\left|\frac{t}{q}E_i^T a\right| \geqslant \frac{1}{2} - \frac{t}{2q}$. The entries of $E_i$ are independent and identically distributed, and $E_i$ is independent of $a$, so $E_i^T a$ is a sum of $m$ independent, identically distributed terms with mean 0. By the central limit theorem (Riauba, 1975), $E_i^T a$ is approximately normal with mean 0 and standard deviation $d = \sqrt{m\,\text{Var}(E_{ji})\,\text{Var}(a_j)} = \beta\sqrt{\frac{mr(r+1)}{3}}$. Thus, for any sufficiently small $\delta > 0$, there is a positive integer $m$ such that

$$\begin{aligned} P\left\{\left|\frac{t}{q}E_i^T a\right| \geqslant \frac{1}{2} - \frac{t}{2q}\right\} &= P\left\{\left|E_i^T a\right| \geqslant \frac{q-t}{2t}\right\} \\ &= P\left\{\left|E_i^T a\right| \Big/ \left(\beta\sqrt{\frac{mr(r+1)}{3}}\right) \geqslant \frac{q-t}{2t} \Big/ \left(\beta\sqrt{\frac{mr(r+1)}{3}}\right)\right\} \\ &= P\left\{\left|E_i^T a\right| \Big/ \left(\beta\sqrt{\frac{mr(r+1)}{3}}\right) \geqslant \frac{q-t}{2\beta t}\sqrt{\frac{3}{mr(r+1)}}\right\} \\ &= 2\left(1 - \Phi\left(\frac{q-t}{2\beta t}\sqrt{\frac{3}{mr(r+1)}}\right)\right) + \varepsilon. \end{aligned}$$

Here $|\varepsilon| \leqslant \delta$. Then we get the following inequality for the probability of decryption error of the LWE-based cryptosystem with general disturbance:

$$\begin{aligned} P\{f^{-1}(E^T a + f(v)) \neq v\} &\leqslant l\,P\{f^{-1}(E_i^T a + f(v_i)) \neq v_i\} \\ &\leqslant l\,P\left\{\left|\frac{t}{q}E_i^T a\right| \geqslant \frac{1}{2} - \frac{t}{2q}\right\} \\ &= 2l\left(1 - \Phi\left(\frac{q-t}{2\beta t}\sqrt{\frac{3}{mr(r+1)}}\right)\right) + l\varepsilon \\ &\leqslant 2l\left(1 - \Phi\left(\frac{q-t}{2\beta t}\sqrt{\frac{3}{mr(r+1)}}\right)\right) + l\delta. \end{aligned}$$

This bound can also be made close to 0 by choosing $\beta\sqrt{m}$ and $\delta$ small enough. Therefore, the probability of decryption error of the LWE-based cryptosystem with general disturbance can be made very small, which leads to high security.
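Theorem 2 allows any disturbance with mean 0 and standard deviation $\beta$. As a rough illustration, the sketch below makes one hypothetical choice: entries of $E$ uniform on $[-\sqrt{3}\beta, \sqrt{3}\beta]$, which has standard deviation $\beta$. It simulates the per-letter error frequency and evaluates the Gaussian term of (3). The slack $l\delta$ is omitted because $\delta$ depends on how quickly the central limit theorem takes effect for the chosen $m$. All names and parameters are illustrative only.

```python
import math
import random
from fractions import Fraction

def nearest(x):
    # [x] of Definition 1 (ties rounded down)
    return math.ceil(Fraction(x) - Fraction(1, 2))

def upper_tail(z):
    # 1 - Phi(z)
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def gaussian_part_of_bound_3(t, q, l, beta, m, r):
    # the term 2l(1 - Phi(...)) of (3), without the CLT slack l*delta
    z = (q - t) / (2 * beta * t) * math.sqrt(3 / (m * r * (r + 1)))
    return 2 * l * upper_tail(z)

def simulate_uniform_noise(t, q, beta, m, r, trials, seed=0):
    """Per-letter error frequency when each entry of E_i is uniform on
    [-sqrt(3)*beta, sqrt(3)*beta] (mean 0, standard deviation beta)."""
    rng = random.Random(seed)
    w = math.sqrt(3.0) * beta
    errors = 0
    for _ in range(trials):
        v = rng.randrange(t)
        fv = nearest(Fraction(q * v, t)) % q                                   # f(v_i)
        noise = sum(rng.uniform(-w, w) * rng.randint(-r, r) for _ in range(m)) # E_i^T a
        if nearest(t * ((noise + fv) % q) / q) % t != v:                        # f^{-1} fails
            errors += 1
    return errors / trials

if __name__ == "__main__":
    t, q, m, r = 2, 97, 50, 1  # illustrative values only
    for beta in (0.2, 1.0, 5.0):
        print(beta, simulate_uniform_noise(t, q, beta, m, r, 2000), gaussian_part_of_bound_3(t, q, 1, beta, m, r))
```

With $\beta = 0.2$ and $m = 50$, the noise satisfies $|E_i^T a| \leqslant 50\sqrt{3}\cdot 0.2 \approx 17.3 < (q-t)/(2t) = 23.75$, so no decryption error can occur at all.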

**Example 1** Let $t = 2$, $q = 5$, $l = 1$, $m = 1$, $r = 1$, $\delta = 10^{-3}$, and let $v \in \mathbb{Z}_2$ be chosen uniformly at random. Let the disturbance $E$ be a random variable with distribution $\psi_\beta$ such that $P\{E = k\} = \frac{\beta^{|k|}}{2\cdot|k|!}e^{-\beta}$ for every nonzero integer $k$ and $P\{E = 0\} = e^{-\beta}$, with parameter $\beta = 10^{-3}$. Let $a \in \{-1, 0, 1\}$ be chosen uniformly at random. Then the probability of decryption error is

$$\begin{aligned} P\{f^{-1}(Ea+f(v)) \neq v\} &= P\left\{\left[\frac{2}{5}\left(Ea + \left[\frac{5}{2}v\right]\right)\right] \neq v\right\} \\ &= \frac{1}{2}P\left\{\left[\frac{2}{5}Ea\right] \neq 0\right\} + \frac{1}{2}P\left\{\left[\frac{2}{5}(Ea+2)\right] \neq 1\right\} \\ &\leqslant \frac{1}{2}P\{E \neq 0\} + \frac{1}{2}P\{E \neq 0\} \\ &= 1 - P\{E = 0\} = 1 - e^{-0.001} < 10^{-3}. \end{aligned}$$

On the other hand,

$$2l\left(1-\Phi\left(\frac{q-t}{2\beta t}\sqrt{\frac{3}{mr(r+1)}}\right)\right)+l\delta>10^{-3}.$$

So it follows that

$$P\{f^{-1}(Ea+f(v)) \neq v\} < 2l \left(1 - \Phi\left(\frac{q-t}{2\beta t} \sqrt{\frac{3}{mr(r+1)}}\right)\right) + l\delta.$$

The inequality in Theorem 2 holds.
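Because Example 1 involves only $v \in \mathbb{Z}_2$, $a \in \{-1, 0, 1\}$, and a rapidly decaying distribution of $E$, its error probability can also be computed exactly. The sketch assumes $P\{E = k\} = P\{E = -k\} = \frac{\beta^{|k|}}{2\,|k|!}e^{-\beta}$ for $k \neq 0$ and $P\{E = 0\} = e^{-\beta}$. It truncates the support at $|k| \leqslant 30$, which discards only a negligible tail. Function names are hypothetical.

```python
import math
from fractions import Fraction

def nearest(x):
    # [x] of Definition 1 (ties rounded down)
    return math.ceil(Fraction(x) - Fraction(1, 2))

def error_probability(pmf, t=2, q=5, r=1):
    """Exact P{f^{-1}(E a + f(v)) != v} for l = m = 1, with v uniform on Z_t,
    a uniform on {-r, ..., r}, and E distributed according to pmf (k -> P{E = k})."""
    total = 0.0
    weight = 1.0 / (t * (2 * r + 1))              # probability of each pair (v, a)
    for v in range(t):
        fv = nearest(Fraction(q * v, t)) % q      # f(v)
        for a in range(-r, r + 1):
            for k, pk in pmf.items():
                b = (k * a + fv) % q
                if nearest(Fraction(t * b, q)) % t != v:  # f^{-1}(b) != v
                    total += weight * pk
    return total

def example1_pmf(beta, kmax=30):
    # P{E = 0} = e^{-beta}; P{E = +-k} = beta^k / (2 k!) e^{-beta} for 1 <= k <= kmax
    pmf = {0: math.exp(-beta)}
    for k in range(1, kmax + 1):
        pmf[k] = pmf[-k] = beta ** k / (2 * math.factorial(k)) * math.exp(-beta)
    return pmf

if __name__ == "__main__":
    beta = 1e-3
    print(error_probability(example1_pmf(beta)), 1 - math.exp(-beta))
```

The result is about $1.67\times10^{-4}$. The dominant error event is $v = 1$ with $Ea = -1$, of probability $\beta e^{-\beta}/6$, which is well below $1 - e^{-\beta} < 10^{-3}$.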

**Example 2** Let $t = 2$, $q = 5$, $l = 1$, $m = 1$, $r = 1$, $\delta = 10^{-4}$, and let $v \in \mathbb{Z}_2$ be chosen uniformly at random. Let the disturbance $E$ be a Laplace random variable with parameter $\lambda = 0.05$ and probability density function $\frac{1}{2\lambda}e^{-|x|/\lambda}$, rounded to the nearest integer. Let $a \in \{-1, 0, 1\}$ be chosen uniformly at random. As in Example 1, the probability of decryption error is

$$\begin{aligned} P\{f^{-1}(Ea+f(v)) \neq v\} &= P\left\{ \left[ \frac{2}{5} \left( Ea + \left[ \frac{5}{2} v \right] \right) \right] \neq v \right\} \\ &\leqslant 1 - P\{E = 0\} = 1 - \int_{-\frac{1}{2}}^{\frac{1}{2}} \frac{1}{2\lambda} e^{-\frac{|x|}{\lambda}}\,\mathrm{d}x = e^{-10} < 10^{-4}. \end{aligned}$$

On the other hand,

$$2l\left(1-\Phi\left(\frac{q-t}{2\beta t}\sqrt{\frac{3}{mr(r+1)}}\right)\right)+l\delta>10^{-4}.$$

It follows that

$$P\{f^{-1}(Ea+f(v)) \neq v\} < 2l\left(1-\Phi\left(\frac{q-t}{2\beta t}\sqrt{\frac{3}{mr(r+1)}}\right)\right)+l\delta.$$

The inequality in Theorem 2 holds.
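Example 2 can be computed exactly in the same way once the distribution of the rounded Laplace variable is written out. For $k \geqslant 1$, $P\{E = \pm k\} = \frac{1}{2}\left(e^{-(k - 1/2)/\lambda} - e^{-(k + 1/2)/\lambda}\right)$, and $P\{E = 0\} = 1 - e^{-1/(2\lambda)}$. The sketch below truncates at $|k| \leqslant 40$, where the remaining mass underflows to zero. Function names are hypothetical.

```python
import math
from fractions import Fraction

def nearest(x):
    # [x] of Definition 1 (ties rounded down)
    return math.ceil(Fraction(x) - Fraction(1, 2))

def error_probability(pmf, t=2, q=5, r=1):
    """Exact P{f^{-1}(E a + f(v)) != v} for l = m = 1, with v uniform on Z_t,
    a uniform on {-r, ..., r}, and E distributed according to pmf (k -> P{E = k})."""
    total = 0.0
    weight = 1.0 / (t * (2 * r + 1))              # probability of each pair (v, a)
    for v in range(t):
        fv = nearest(Fraction(q * v, t)) % q      # f(v)
        for a in range(-r, r + 1):
            for k, pk in pmf.items():
                b = (k * a + fv) % q
                if nearest(Fraction(t * b, q)) % t != v:  # f^{-1}(b) != v
                    total += weight * pk
    return total

def rounded_laplace_pmf(lam, kmax=40):
    # E = X rounded to the nearest integer, X with density exp(-|x|/lam) / (2 lam)
    pmf = {0: 1 - math.exp(-0.5 / lam)}
    for k in range(1, kmax + 1):
        pmf[k] = pmf[-k] = 0.5 * (math.exp(-(k - 0.5) / lam) - math.exp(-(k + 0.5) / lam))
    return pmf

if __name__ == "__main__":
    p = error_probability(rounded_laplace_pmf(0.05))
    print(p, math.exp(-10))
```

The result is about $e^{-10}/6 \approx 7.6\times10^{-6}$. It is again dominated by $v = 1$ with $Ea = -1$, and it lies below the bound $e^{-10}$ derived above.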

# **3 Results and Conclusions**

In this work, we first introduce the LWE problem and the LWE-based cryptosystem. We then give a more precise estimate of the probability of decryption error under independent, identically distributed Gaussian disturbances. The main significance of our work is that, using the central limit theorem, we also estimate the probability of decryption error for general independent, identically distributed disturbances. These upper bounds can be made close to 0 with suitable parameters, so the probability of decryption error of the cryptosystem can be made sufficiently small. This confirms that the LWE-based cryptosystem can have high security.

# **4 Discussions**

# *4.1 Future Work*

Although we have reached the objective of this work, many interesting problems in this research area remain for future study. We will next focus on cryptosystems based on fully homomorphic encryption (FHE), which is an application of LWE (Brakerski & Vaikuntanathan, 2011a, b; Dijk et al., 2010; Gentry, 2009; Gentry et al., 1999). Fully homomorphic encryption was known to have abundant applications in cryptography, but for three decades no plausibly secure scheme was known until 2009. To date, FHE has gone through more than three generations. The third-generation FHE scheme based on the LWE problem has been shown to have some unique and advantageous properties (Gentry et al., 1999), and some of its techniques can still be improved and deserve in-depth study.

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **On the High Dimensional RSA Algorithm—A Public Key Cryptosystem Based on Lattice and Algebraic Number Theory**

**Zheng Zhiyong, Liu Fengxia, and Chen Man**

**Abstract** The best-known public key cryptosystem was introduced in 1978 by Rivest et al. (1978) and is now called the RSA public key cryptosystem in their honor. Later, a few authors gave a simple extension of RSA over algebraic number fields (see Takagi and Naito (2015), Uematsu et al. (1985, 1986)), but they require the ring of algebraic integers to be a Euclidean ring, which is much stronger than the class number one condition. In this chapter, we introduce a high dimensional form of RSA by making use of the ring of algebraic integers of an algebraic number field and lattice theory. We give an attainable algorithm (see Algorithm 1), which is significant from both the theoretical and the practical point of view. Our main purpose in this chapter is to show that high dimensional RSA is indeed a lattice-based public key cryptosystem, which could be considered a new member of the family of post-quantum cryptography (see Peikert (2014), Pradhan et al. (2019)). In addition, we give a matrix expression for any algebraic number field (see Theorem 2), which is a new result even in classical algebraic number theory.

**Keywords** RSA · The ring of algebraic integers · Ideal matrix · Ideal lattice · HNF basis


Z. Zhiyong · L. Fengxia (B) · C. Man

Engineering Research Center of Ministry of Education for Financial Computing and Digital Engineering, Renmin University of China, Beijing 100872, China e-mail: liufengxia91@126.com

Z. Zheng (ed.), *Proceedings of the Second International Forum on Financial Mathematics and Financial Technology*, Financial Mathematics and Fintech, https://doi.org/10.1007/978-981-99-2366-3\_9

# **1 Introduction**

Let $\mathbb{Q}$, $\mathbb{R}$, $\mathbb{C}$ be the fields of rational, real, and complex numbers, respectively, and let $\mathbb{Z}$ be the ring of integers. Let $E \subset \mathbb{C}$ be an algebraic number field of degree $n$, and let $R \subset E$ be the ring of algebraic integers of $E$. Suppose that $A \subset R$ is a non-zero ideal (all ideals in this chapter are non-zero). Then the factor ring $R/A$ is finite; we denote by $N(A)$ the number of elements of $R/A$, called the norm of $A$, and by $\varphi(A)$ the number of invertible elements of $R/A$, called the Euler totient function of $A$. For any $\alpha \in R$, the principal ideal generated by $\alpha$ is denoted by $\alpha R$, and $\alpha$ is an invertible element of $R/A$ if and only if $(\alpha R, A) = 1$. It is known (see Theorem 1.19 of Narkiewicz (2004)) that

$$\varphi(A) = N(A) \prod\_{P|A} (1 - \frac{1}{N(P)}) \tag{1}$$

where the product is extended over all prime ideals *P* dividing *A*. Moreover, if α ∈ *R* and (α*R*, *A*) = 1, then

$$\alpha^{\varphi(A)} \equiv 1 (\text{mod } A). \tag{2}$$

To generalize RSA to an arbitrary algebraic number field $E$, we first prove the following assertion.

**Theorem 1** *Let* $P\_1$ *and* $P\_2$ *be two distinct prime ideals of* $R$ *and* $A = P\_1P\_2$. *Then for any* $\alpha \in R$ *and any integer* $k \ge 0$, *we have*

$$
\alpha^{k\varphi(A)+1} \equiv \alpha (\text{mod } A). \tag{3}
$$

*Proof* Let $\alpha \in R$. If $(\alpha R, A) = 1$, then (3) follows directly from (2). If $(\alpha R, A) = A$, then $\alpha R \subset A$, so $\alpha \in A$ and (3) is trivial. Thus we only need to consider the cases $(\alpha R, A) = P\_1$ and $(\alpha R, A) = P\_2$. If $(\alpha R, A) = P\_1$, then $\alpha \in P\_1$ and $(\alpha R, P\_2) = 1$, so by (2) we have

$$\alpha^{\varphi(P\_2)} \equiv 1 (\text{mod } P\_2).$$

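Since $\varphi(A) = \varphi(P\_1)\varphi(P\_2)$ by (1), $\varphi(A)$ is a multiple of $\varphi(P\_2)$, and it follows that

$$\alpha^{k\varphi(A)} \equiv 1 (\text{mod } P\_2), \quad \forall k \in \mathbb{Z}, \ k \ge 0.$$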

Therefore, there exists an element $\beta \in P\_2$ such that

$$
\alpha^{k\varphi(A)} = 1 + \beta.
$$

We thus have

$$
\alpha^{k\varphi(A)+1} = \alpha + \alpha\beta, \quad \text{and} \quad \alpha^{k\varphi(A)+1} \equiv \alpha (\text{mod } A),
$$

since $\alpha \in P\_1$ and $\beta \in P\_2$ give $\alpha\beta \in P\_1P\_2 = A$. The same argument gives (3) when $(\alpha R, A) = P\_2$.
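The proof can be checked numerically on a small case. The Python sketch below is our own illustration (the helper names are hypothetical): it works in $R = \mathbb{Z}[i]$, the ring of integers of $\mathbb{Q}(i)$, with the distinct prime ideals $P\_1 = (1+2i)$ of norm 5 and $P\_2 = (3)$ of norm 9, so that $\varphi(A) = (5-1)(9-1) = 32$ by (1). It tests (3) for every $\alpha = a + bi$ with $0 \le a, b < 15$, a range that meets every coset of $R/A$ by Lemma 1 below, including the $\alpha$ that are not coprime to $A$.

```python
def mul(z, w):
    """Product of Gaussian integers z = (a, b), meaning a + bi, and w."""
    return (z[0] * w[0] - z[1] * w[1], z[0] * w[1] + z[1] * w[0])


def power(z, e):
    result = (1, 0)
    for _ in range(e):
        result = mul(result, z)
    return result


def congruent(z, w, g):
    """True if z = w (mod gR), i.e. g divides z - w in Z[i]."""
    x, y = z[0] - w[0], z[1] - w[1]
    c, d = g
    norm = c * c + d * d
    # (x + yi)/(c + di) = (x + yi)(c - di)/norm must lie in Z[i]
    return (x * c + y * d) % norm == 0 and (y * c - x * d) % norm == 0


P1, P2 = (1, 2), (3, 0)     # prime ideals (1 + 2i) and (3), of norms 5 and 9
A = mul(P1, P2)             # A = P1 P2 is principal, generated by 3 + 6i
PHI_A = (5 - 1) * (9 - 1)   # phi(A) = 32 by (1)


def theorem1_holds(k_max=3):
    """Check alpha^(k phi(A) + 1) = alpha (mod A) for 0 <= a, b < 15, 0 <= k <= k_max."""
    for a in range(15):
        for b in range(15):
            alpha = (a, b)
            for k in range(k_max + 1):
                if not congruent(power(alpha, k * PHI_A + 1), alpha, A):
                    return False
    return True


print(theorem1_holds())   # True
```

Since $A$ is principal here, congruence modulo $A$ reduces to divisibility by its generator; for non-principal ideals one would need the lattice description developed later in the chapter.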

#### **Table 1** RSA in the ring of algebraic integers



By Theorem 1, the classical RSA extends easily to an algebraic number field as in Table 1 (see also Takagi and Naito (2015), where, however, no proof of (3) is given).

Obviously, if $n = 1$, the above algorithm is the ordinary RSA. However, it is still difficult to find prime ideals in $R$ and to construct a set $S$ of coset representatives of $R/A$. Takagi and Naito (2015) assumed that the ring $R$ is Euclidean, so that $S$ can be constructed by the Euclidean algorithm in $R$; the simplest way is to select prime elements $\alpha$ of $R$, so that the principal ideals $\alpha R$ are prime ideals. In Algorithm 1, we construct a set of coset representatives for the factor ring $R/A$ precisely by lattice theory. Here we first give an approximate construction of such a set.

If $P \subset R$ is a prime ideal, then $P \cap \mathbb{Z} = p\mathbb{Z}$, where $p \in \mathbb{Z}$ is a rational prime. Since $R/P$ is a finite field containing $\mathbb{Z}/p\mathbb{Z}$, we have $N(P) = p^f$, where $f$ ($1 \le f \le n$) is called the degree of $P$. Write $pR = P\_1^{e\_1} P\_2^{e\_2} \cdots P\_g^{e\_g}$, where $P = P\_1$ and the $P\_i$ are distinct prime ideals; $e\_i$ is called the ramification index of $P\_i$, and $f\_i$ denotes the degree of $P\_i$. The ramification indices and degrees satisfy a remarkable relation (see Theorem 3 on page 181 of Ireland and Rosen (1990)):

$$\sum\_{i=1}^{g} e\_i f\_i = n. \tag{4}$$

Let $\{\alpha\_1, \alpha\_2, \dots, \alpha\_n\} \subset R$ be an integral basis for $E/\mathbb{Q}$ and $A = P\_1P\_2$. Suppose that $P\_1 \cap \mathbb{Z} = p\mathbb{Z}$ and $P\_2 \cap \mathbb{Z} = q\mathbb{Z}$, where $p$ and $q$ are two distinct rational primes; then $A \cap \mathbb{Z} = pq\mathbb{Z}$.

**Lemma 1** *Let*

$$S\_1 = \left\{ \sum\_{i=1}^n a\_i \alpha\_i \mid 0 \le a\_i < pq, \ a\_i \in \mathbb{Z}, \ 1 \le i \le n \right\}.\tag{5}$$

*Then* $S\_1$ *contains a complete set of coset representatives of* $R/A$. *Moreover, if the degrees of* $P\_1$ *and* $P\_2$ *are both* $n$, *then* $S\_1$ *is precisely a set of coset representatives of* $R/A$.

*Proof* Since $A = P\_1P\_2$, $P\_1 \cap \mathbb{Z} = p\mathbb{Z}$, and $P\_2 \cap \mathbb{Z} = q\mathbb{Z}$, we have $pq \in P\_1 \cap P\_2 = A$, hence $pqR \subset A$ and the natural map $R/pqR \to R/A$ is onto. To prove the first assertion, it is therefore enough to show that $S\_1$ is a set of coset representatives of $R/pqR$. Since $\{\alpha\_1, \alpha\_2, \dots, \alpha\_n\}$ is an integral basis, we have

$$R = \mathbb{Z}\alpha\_1 + \mathbb{Z}\alpha\_2 + \dots + \mathbb{Z}\alpha\_n.$$

Suppose that $\alpha = \sum\_{i=1}^{n} m\_i\alpha\_i \in R$, and write $m\_i = a\_i pq + r\_i$, where $0 \le r\_i < pq$. Clearly

$$\alpha \equiv \sum\_{i=1}^{n} r\_i \alpha\_i (\text{mod } pqR).$$

Thus every coset of $pqR$ contains an element of $S\_1$. If $\sum\_{i=1}^{n} r\_i\alpha\_i$ and $\sum\_{i=1}^{n} r'\_i\alpha\_i$ are in $S\_1$ and lie in the same coset mod $pqR$, then

$$\sum\_{i=1}^{n} \left(r\_i - r\_i'\right) \alpha\_i \equiv 0 (\text{mod } pqR).$$

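Since $pqR = pq\mathbb{Z}\alpha\_1 + \dots + pq\mathbb{Z}\alpha\_n$ and the $\alpha\_i$ are linearly independent, it follows that

$$r\_i \equiv r'\_i (\text{mod } pq) \quad \text{and hence} \quad r\_i = r'\_i, \quad 1 \le i \le n,$$

because $0 \le r\_i, r'\_i < pq$.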

Next, suppose that the degrees of $P\_1$ and $P\_2$ are both $n$. Then $N(P\_1) = p^n$ and $N(P\_2) = q^n$, so (4) forces $g = 1$ and $e\_1 = 1$ for both primes; thus $P\_1 = pR$, $P\_2 = qR$, and $A = pqR$. The second assertion follows immediately.

If one replaces $S$ by $S\_1$ in Table 1, then the probability of successful decryption is

$$N(A)/(p^n q^n) = p^{f\_1 - n} q^{f\_2 - n},\tag{6}$$

where $f\_1$ and $f\_2$ are the degrees of $P\_1$ and $P\_2$, respectively.
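As a worked example of our own, take $E = \mathbb{Q}(i)$, so that $n = 2$ and $R = \mathbb{Z}[i]$. The prime ideal $P\_1 = (1+2i)$ lies over $p = 5$ and has degree $f\_1 = 1$, while $P\_2 = (3)$ lies over $q = 3$ and has degree $f\_2 = 2$. Then $N(A) = 5 \cdot 9 = 45$, and (6) gives

$$\frac{N(A)}{p^n q^n} = \frac{45}{5^2 \cdot 3^2} = \frac{1}{5} = 5^{1-2}\, 3^{2-2}.$$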

We note that $f\_1 = f\_2 = n$ if and only if $P\_1 = pR$ and $P\_2 = qR$; in this special case, the scheme admits an explicit numerical description. It is easy to see that

$$
\varphi(A) = \varphi(pR)\varphi(qR) = \left(p^n - 1\right)\left(q^n - 1\right).
$$

By Theorem 1 applied to $A = pqR$, and since $pqR \cap \mathbb{Z} = pq\mathbb{Z}$, for any $a \in \mathbb{Z}$ we have

On the High Dimensional RSA Algorithm … 173

$$a^{k(p^n-1)(q^n-1)+1} \equiv a (\text{mod } pq), \quad k \in \mathbb{Z}, \ k \ge 0. \tag{7}$$

Since $S\_1$ is a set of coset representatives of $R/A$, we may regard each $\alpha = \sum\_{i=1}^{n} a\_i\alpha\_i \in S\_1$ as a vector $(a\_1, a\_2, \dots, a\_n) \in \mathbb{Z}\_{pq}^n$. Let $m = pq$, and choose integers $1 \le e < (p^n - 1)(q^n - 1)$ and $1 \le d < (p^n - 1)(q^n - 1)$ such that

$$ed \equiv 1 (\text{mod } (p^n - 1)(q^n - 1)).$$

Then, for every input message $\alpha = (a\_1, a\_2, \dots, a\_n)$, we use the public key $(m, e)$ and the private key $(p, q, d)$ to encrypt and decrypt each $a\_i$ in turn. These are essentially the algorithms given by Takagi and Naito (2015), and we regard them as a simple repetition of RSA.
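A minimal Python sketch of this componentwise scheme follows. The toy parameters $p = 5$, $q = 7$, $n = 2$, $e = 5$ and the function names are our own illustrative choices (far too small for any real use), and `pow(e, -1, order)` requires Python 3.8 or later. That decryption recovers every residue, including those not coprime to $m$, is exactly congruence (7).

```python
from math import gcd


def keygen(p, q, n, e):
    """Keys for the componentwise scheme; p, q distinct primes, n the field degree."""
    order = (p**n - 1) * (q**n - 1)
    if gcd(e, order) != 1:
        raise ValueError("e must be invertible modulo (p^n - 1)(q^n - 1)")
    d = pow(e, -1, order)          # e*d = 1 mod (p^n - 1)(q^n - 1); Python 3.8+
    return (p * q, e), (p, q, d)   # public key (m, e), private key (p, q, d)


def encrypt(message, public_key):
    m, e = public_key
    return [pow(a, e, m) for a in message]   # each coordinate a_i separately


def decrypt(cipher, private_key):
    p, q, d = private_key
    m = p * q
    return [pow(c, d, m) for c in cipher]


public, private = keygen(5, 7, n=2, e=5)
message = [10, 34]   # alpha = (a_1, a_2) in Z_35^2; note 10 is not coprime to 35
assert decrypt(encrypt(message, public), private) == message
```

The only difference from textbook RSA is the exponent modulus $(p^n - 1)(q^n - 1)$ in place of $(p-1)(q-1)$; since the former is a multiple of the latter, each coordinate is in effect an ordinary RSA operation.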

The main purpose of this chapter is to show that the high dimensional form of the RSA algorithm is, in general, a lattice-based cryptosystem. To do this, we first establish a relationship between an algebraic number field $E$ and the rational space $\mathbb{Q}^n$. Let $\mathbb{R}^n$ be the Euclidean space, i.e., the linear space over $\mathbb{R}$ equipped with the Euclidean norm $|x|$,

$$|\mathbf{x}| = \left(\sum\_{i=1}^{n} \mathbf{x}\_i^2\right)^{\frac{1}{2}}, \quad \text{where} \ \mathbf{x}' = (\mathbf{x}\_1, \mathbf{x}\_2, \dots, \mathbf{x}\_n) \in \mathbb{R}^n. \tag{8}$$

We use column notation for vectors in $\mathbb{R}^n$; $x'$ denotes the transpose of $x$, which is a row vector. We regard $\mathbb{Q}^n$ as a subset of $\mathbb{R}^n$ (it is a $\mathbb{Q}$-linear subspace).

Without loss of generality, an algebraic number field *E* of degree *n* may be expressed as *E* = *Q*(θ ), where θ is an algebraic integer of degree *n* and *Q*(θ ) is the field generated by θ over *Q*. Let φ(*x*) be the minimal polynomial of θ,

$$\phi(\mathbf{x}) = \mathbf{x}^n - \phi\_{n-1}\mathbf{x}^{n-1} - \dots - \phi\_1 \mathbf{x} - \phi\_0 \in \mathbb{Z}[\mathbf{x}],\tag{9}$$

where all $\phi\_i \in \mathbb{Z}$. It is known that

$$E = \mathcal{Q}[\theta] = \left\{ \sum\_{i=0}^{n-1} a\_i \theta^i \mid a\_i \in \mathcal{Q} \right\}. \tag{10}$$

We define a one-to-one correspondence $\tau$ between $E$ and $\mathbb{Q}^n$ by

$$\alpha = \sum\_{i=0}^{n-1} a\_i \theta^i \in E \stackrel{\tau}{\longrightarrow} \overline{\alpha} = \begin{pmatrix} a\_0 \\ a\_1 \\ \vdots \\ a\_{n-1} \end{pmatrix} \in \mathcal{Q}^n \tag{11}$$

and write $\tau(\alpha) = \overline{\alpha}$ or $\alpha \stackrel{\tau}{\longrightarrow} \overline{\alpha}$. In fact, $\tau$ is an isomorphism of $\mathbb{Q}$-linear spaces (in particular, of additive groups) from $E$ to $\mathbb{Q}^n$, since $\tau(\alpha + \beta) = \tau(\alpha) + \tau(\beta)$ and $\tau(a\alpha) = a\tau(\alpha)$ for all $\alpha, \beta \in E$ and $a \in \mathbb{Q}$.

As usual, the trace and norm mappings from *E* to *Q* are denoted by

$$\text{tr}(\alpha) = \text{tr}\_{E/\mathbb{Q}}(\alpha), \quad \text{and} \quad N(\alpha) = N\_{E/\mathbb{Q}}(\alpha).$$

It is known (see the corollary on page 58 of Narkiewicz (2004)) that

$$N(\alpha R) = |N(\alpha)|, \quad \forall \alpha \in R. \tag{12}$$

A full-rank lattice $L$ is a discrete additive subgroup of $\mathbb{R}^n$ that spans $\mathbb{R}^n$; equivalently (see Micciancio and Regev (2009), Zheng et al. (2023)),

$$L = L(B) = \left\{ Bx \mid x \in \mathbb{Z}^n \right\},\tag{13}$$

where $B = [\beta\_1, \beta\_2, \dots, \beta\_n]\_{n \times n} \in \mathbb{R}^{n \times n}$ is an invertible $n \times n$ matrix, called a generator matrix of $L$. If $L \subset \mathbb{Q}^n$, we call $L$ a rational lattice; if $L \subset \mathbb{Z}^n$, we call $L$ an integer lattice. Every ideal of $R$ corresponds to a rational lattice, as follows.

**Lemma 2** *Let* $A \subset R$ *be an ideal with* $A \ne 0$. *Then* $\tau(A)$ *is a rational lattice.*

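*Proof* Let $\{\beta\_1, \beta\_2, \dots, \beta\_n\} \subset A$ be a $\mathbb{Z}$-basis of $A$ (it exists because a non-zero ideal of $R$ is a free $\mathbb{Z}$-module of rank $n$, and it is also a basis of $E/\mathbb{Q}$). Then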

$$A = \mathbb{Z}\beta\_1 + \mathbb{Z}\beta\_2 + \dots + \mathbb{Z}\beta\_n.$$

It follows that

$$
\tau(A) = \mathbb{Z}\overline{\beta}\_1 + \mathbb{Z}\overline{\beta}\_2 + \dots + \mathbb{Z}\overline{\beta}\_n,
$$

where $\overline{\beta}\_i = \tau(\beta\_i) \in \mathbb{Q}^n$. Let $B = [\overline{\beta}\_1, \overline{\beta}\_2, \dots, \overline{\beta}\_n]$. Since $\{\beta\_1, \beta\_2, \dots, \beta\_n\}$ is linearly independent over $\mathbb{Q}$, $B$ is an invertible matrix, and we have

$$\tau(A) = L(B) = \{ Bx \mid x \in \mathbb{Z}^n \}.$$

The lemma follows at once.

Let $L \subset \mathbb{Q}^n$ be a rational lattice. If $L = \tau(A)$ for some ideal $A$ of the ring of algebraic integers of a suitable algebraic number field $E$, we call $L$ an ideal lattice. Ideal lattices were first introduced by Lyubashevsky and Micciancio (2006) in the case of integer lattices; here we generalize this notion to rational lattices. For a more detailed discussion of ideal lattices, we refer to Zheng et al. (2023).

To give an attainable algorithm for high dimensional RSA, we require the following NC-property for the algebraic number field E.

$$\text{NC-property:} \qquad E = \mathcal{Q}(\theta) \quad \text{and} \quad \mathcal{R} = \mathbb{Z}[\theta], \tag{14}$$

where

$$\mathbb{Z}[\theta] = \left\{ \sum\_{i=0}^{n-1} a\_i \theta^i \mid a\_i \in \mathbb{Z}, \ 0 \le i \le n-1 \right\}.\tag{15}$$

Some well-known algebraic number fields satisfy the NC-property; we list a few in Table 2.

**Table 2** Algebraic number fields with NC-property

• Cyclotomic fields: $E = \mathbb{Q}(\xi\_n)$, where $\xi\_n = e^{2\pi i/n}$ is a primitive $n$-th root of unity

• Totally real algebraic number fields (see Proposition 2.16 of Washington (1982)): $E = \mathbb{Q}(\xi\_n + \xi\_n^{-1})$, the maximal real subfield of $\mathbb{Q}(\xi\_n)$, so that $E \subset \mathbb{R}$

# **2 Ideal Matrices**

Suppose that $\theta$ is an algebraic integer of degree $n$ and $\phi(x) = x^n - \phi\_{n-1}x^{n-1} - \dots - \phi\_1 x - \phi\_0 \in \mathbb{Z}[x]$ is the minimal polynomial of $\theta$; thus $\phi(x)$ is irreducible. Let $\theta = \theta\_0, \theta\_1, \dots, \theta\_{n-1}$ be the $n$ distinct roots of $\phi(x)$. The Vandermonde matrix of $\phi(x)$ is defined by

$$V = V\_{\phi} = \begin{bmatrix} \theta\_j^i \end{bmatrix}\_{0 \le i, j \le n-1}, \text{ and } \ \Delta = \det(V\_{\phi}) \ne 0. \tag{16}$$

Associated with $\phi(x)$ is the rotation matrix, also called the adjoint or companion matrix (see page 116 of Manin and Panchishkin (2005)),

$$H = H\_{\phi} = \left(\begin{array}{c|c} 0 \; \cdots \; 0 & \phi\_0 \\ \hline I\_{n-1} & \begin{matrix} \phi\_1 \\ \vdots \\ \phi\_{n-1} \end{matrix} \end{array}\right) \in \mathbb{Z}^{n \times n},\tag{17}$$

where $I\_{n-1}$ is the identity matrix of order $n - 1$.
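The block structure of (17) is easy to build programmatically. In the Python sketch below (an illustration under our own conventions), $\phi(x)$ is passed as the list `[phi_0, ..., phi_{n-1}]` of the coefficients appearing in (9), and matrices are stored as lists of rows.

```python
def rotation_matrix(phi):
    """Rotation matrix H_phi of (17), returned as a list of rows.

    phi = [phi_0, ..., phi_{n-1}] are the coefficients in
    phi(x) = x^n - phi_{n-1} x^{n-1} - ... - phi_1 x - phi_0, as in (9).
    """
    n = len(phi)
    H = [[0] * n for _ in range(n)]
    for i in range(1, n):
        H[i][i - 1] = 1        # the identity block I_{n-1} under the first row
    for i in range(n):
        H[i][n - 1] = phi[i]   # the last column is (phi_0, ..., phi_{n-1})'
    return H


# phi(x) = x^3 - 1 gives the cyclic shift matrix of Remark 1
print(rotation_matrix([1, 0, 0]))   # [[0, 0, 1], [1, 0, 0], [0, 1, 0]]
```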

**Definition 1** An ideal matrix $H^\*(\overline{f})$ generated by the input vector $\overline{f} \in \mathbb{R}^n$ is defined by

$$H^\*(\overline{f}) = \left[ \, \overline{f}, H\overline{f}, \dots, H^{n-1}\overline{f} \right]\_{n \times n} \in \mathbb{R}^{n \times n} \tag{18}$$

and the sets of all ideal matrices are denoted by

$$M^\*\_{\mathbb{R}} = \left\{ H^\*(\overline{f}) \mid \overline{f} \in \mathbb{R}^n \right\} \quad \text{and} \quad M^\*\_{\mathcal{Q}} = \left\{ H^\*(\overline{f}) \mid \overline{f} \in \mathcal{Q}^n \right\}. \tag{19}$$

**Definition 2** For any two vectors $\overline{f}$ and $\overline{g}$ in $\mathbb{R}^n$, the $\phi$-conventional product is defined by

$$\overline{f} \otimes \overline{g} = H^\*(\overline{f})\overline{g} \tag{20}$$

and the $m$-fold product is denoted by

$$\overline{f}^{\otimes m} = \underbrace{\overline{f} \otimes \overline{f} \otimes \dots \otimes \overline{f}}\_{m}, \quad m \in \mathbb{Z}, \ m \ge 1. \tag{21}$$
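Definitions 1 and 2 translate directly into code. The following Python sketch (hypothetical helper names, matrices stored as lists of rows, $\phi$ passed as `[phi_0, ..., phi_{n-1}]` as in (9)) builds $H^\*(\overline{f})$ column by column and evaluates the $\phi$-conventional product. In the Gaussian integers, where $\phi(x) = x^2 + 1$, i.e. $\phi\_0 = -1$ and $\phi\_1 = 0$, it reproduces ordinary multiplication of $a + bi$.

```python
def rotation_matrix(phi):
    # H_phi of (17); phi = [phi_0, ..., phi_{n-1}] as in (9)
    n = len(phi)
    H = [[0] * n for _ in range(n)]
    for i in range(1, n):
        H[i][i - 1] = 1
    for i in range(n):
        H[i][n - 1] = phi[i]
    return H


def mat_vec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


def ideal_matrix(f, phi):
    """H*(f) = [f, Hf, ..., H^{n-1} f] of (18), returned as a list of rows."""
    H = rotation_matrix(phi)
    columns, v = [], list(f)
    for _ in range(len(f)):
        columns.append(v)
        v = mat_vec(H, v)
    return [list(row) for row in zip(*columns)]   # columns -> rows


def phi_product(f, g, phi):
    """The phi-conventional product (20): f (x) g = H*(f) g."""
    return mat_vec(ideal_matrix(f, phi), g)


def phi_power(f, m, phi):
    """The m-fold product (21), m >= 1."""
    result = list(f)
    for _ in range(m - 1):
        result = phi_product(result, f, phi)
    return result


gauss = [-1, 0]   # phi(x) = x^2 + 1
print(ideal_matrix([1, 2], gauss))          # [[1, -2], [2, 1]]
print(phi_product([1, 2], [1, -2], gauss))  # [5, 0], i.e. (1 + 2i)(1 - 2i) = 5
```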

**Remark 1** If $\phi(x) = x^n - 1$, then $H\_\phi$ is the basic circulant (cyclic shift) matrix and every $H^\*(\overline{f})$ is a classical circulant matrix (see Davis (1994)). The conventional product with circulant matrices was first proposed by Hoffstein et al. (1998), and it plays a key role in their cryptosystem. In Zheng et al. (2023), we generalized this definition to more general rotation matrices.

By (18), $H^\*(\overline{f})$ is the zero matrix if and only if $\overline{f} = 0$ (the first column of $H^\*(\overline{f})$ is $\overline{f}$), and $H^\*(\overline{f} + \overline{g}) = H^\*(\overline{f}) + H^\*(\overline{g})$; hence $H^\*(\overline{f}) = H^\*(\overline{g})$ if and only if $\overline{f} = \overline{g}$. Thus we may regard $H^\*: \mathbb{R}^n \to M^\*\_{\mathbb{R}}$ as a one-to-one correspondence, which is also a homomorphism of abelian groups.

The main aim of this section is to show that $\mathbb{Q}^n$ is a field under the $\phi$-conventional product and that $M^\*\_{\mathbb{Q}}$ is also a field under ordinary matrix addition and multiplication, both being isomorphic to the algebraic number field $E = \mathbb{Q}(\theta)$. To do this, we require some basic properties of ideal matrices.

Let $\overline{e}\_1, \overline{e}\_2, \dots, \overline{e}\_n$ be the unit vectors of $\mathbb{R}^n$, namely

$$
\overline{e}\_1 = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \overline{e}\_2 = \begin{pmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix}, \cdots, \quad \overline{e}\_n = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix}. \tag{22}
$$

**Lemma 3** *Let* τ *be defined by (11), then we have*

$$\begin{cases} \tau \left( \theta^k \right) = \overline{e}\_{k+1}, & 0 \le k \le n-1 \\ H^\* \left( \overline{e}\_k \right) = H^{k-1}, & 1 \le k \le n \end{cases} \tag{23}$$

*Proof* $\tau(\theta^k) = \overline{e}\_{k+1}$ follows directly from the definition of $\tau$. We use induction to prove $H^\*(\overline{e}\_k) = H^{k-1}$. Since $H^j\overline{e}\_1 = \overline{e}\_{j+1}$ for $0 \le j \le n-1$, we have $H^\*(\overline{e}\_1) = I\_n$, the identity matrix of order $n$. Suppose that $H^\*(\overline{e}\_{k-1}) = H^{k-2}$ for some $k \ge 2$. Noting that $\overline{e}\_k = H\overline{e}\_{k-1}$, it follows that


$$\begin{aligned} H^\*\left(\overline{e}\_k\right) &= \left[ H\overline{e}\_{k-1}, H^2\overline{e}\_{k-1}, \dots, H^n\overline{e}\_{k-1} \right] \\ &= H\left[ \overline{e}\_{k-1}, H\overline{e}\_{k-1}, \dots, H^{n-1}\overline{e}\_{k-1} \right] \\ &= HH^\*(\overline{e}\_{k-1}) = HH^{k-2} = H^{k-1} . \end{aligned}$$

The lemma follows immediately.

Since $\phi(x)$ is the characteristic polynomial of $H$, by the Cayley-Hamilton theorem we have

$$\phi(H) = 0,\text{ or } H^n = \phi\_0 + \phi\_1 H + \dots + \phi\_{n-1} H^{n-1}.\tag{24}$$
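Relation (24) can be spot-checked numerically. The Python sketch below (our illustration; $\phi$ is again passed as `[phi_0, ..., phi_{n-1}]`) computes the residual $H^n - (\phi\_0 I\_n + \phi\_1 H + \dots + \phi\_{n-1}H^{n-1})$, which should be the zero matrix for every monic $\phi(x)$.

```python
def rotation_matrix(phi):
    # H_phi of (17); phi = [phi_0, ..., phi_{n-1}] as in (9)
    n = len(phi)
    H = [[0] * n for _ in range(n)]
    for i in range(1, n):
        H[i][i - 1] = 1
    for i in range(n):
        H[i][n - 1] = phi[i]
    return H


def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def cayley_hamilton_residual(phi):
    """H^n - (phi_0 I + phi_1 H + ... + phi_{n-1} H^{n-1}); zero by (24)."""
    n = len(phi)
    H = rotation_matrix(phi)
    powers = [[[int(i == j) for j in range(n)] for i in range(n)]]   # H^0 = I_n
    for _ in range(n):
        powers.append(mat_mul(powers[-1], H))
    return [[powers[n][i][j] - sum(phi[k] * powers[k][i][j] for k in range(n))
             for j in range(n)] for i in range(n)]


print(cayley_hamilton_residual([1, 1, 0]))   # phi(x) = x^3 - x - 1: [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
```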

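Therefore, all powers $H^k$ ($k \ge 0$) of the rotation matrix are ideal matrices: for $0 \le k \le n-1$ this is Lemma 3, and for $k \ge n$, repeated use of (24) writes $H^k$ as a linear combination of $I\_n, H, \dots, H^{n-1}$, which is an ideal matrix because $H^\*$ is linear. In particular, the identity matrix $I\_n = H^\*(\overline{e}\_1)$ is an ideal matrix.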

Let $\mathbb{R}[x]$ be the polynomial ring and $\mathbb{R}[x]/\langle\phi(x)\rangle$ the quotient ring, where $\langle\phi(x)\rangle$ is the principal ideal generated by $\phi(x)$ in $\mathbb{R}[x]$. We establish a one-to-one correspondence $t$ between $\mathbb{R}^n$ and $\mathbb{R}[x]/\langle\phi(x)\rangle$ by

$$\overline{f} = \begin{pmatrix} f\_0 \\ f\_1 \\ \vdots \\ f\_{n-1} \end{pmatrix} \in \mathbb{R}^n \stackrel{t}{\longrightarrow} f(x) = f\_0 + f\_1 x + \dots + f\_{n-1} x^{n-1} \in \mathbb{R}[x] / \langle \phi(x) \rangle \tag{25}$$

and write $t(\overline{f}) = f(x)$, or $t^{-1}(f(x)) = \overline{f}$.

**Lemma 4** *For any* $\overline{f} \in \mathbb{R}^n$, *the ideal matrix* $H^\*(\overline{f})$ *is given by*

$$H^\*(\overline{f}) = f(H) = f\_0 I\_n + f\_1 H + \dots + f\_{n-1} H^{n-1}.\tag{26}$$

*Moreover, if* $F(x) \in \mathbb{R}[x]$ *and* $F(x) \equiv f(x) (\text{mod } \phi(x))$, *then* $f(H) = F(H)$.

*Proof* Writing $\overline{f} = f\_0\overline{e}\_1 + f\_1\overline{e}\_2 + \dots + f\_{n-1}\overline{e}\_n$, by Lemma 3 and the linearity of $H^\*$ we have

$$\begin{aligned} H^\*(\overline{f}) &= f\_0 H^\*(\overline{e}\_1) + f\_1 H^\*(\overline{e}\_2) + \dots + f\_{n-1} H^\*(\overline{e}\_n), \\ &= f\_0 I\_n + f\_1 H + \dots + f\_{n-1} H^{n-1} = f(H). \end{aligned}$$

Suppose that $F(x) \equiv f(x) (\text{mod } \phi(x))$, say $F(x) = f(x) + h(x)\phi(x)$ with $h(x) \in \mathbb{R}[x]$. Then by (24), $F(H) = f(H) + h(H)\phi(H) = f(H)$.

**Lemma 5** *Let* $\overline{f}$ *and* $\overline{g}$ *be two vectors in* $\mathbb{R}^n$, *and let* $f(x)$, $g(x)$ *be the corresponding polynomials. Then*

$$t(\overline{f}\otimes\overline{g}) \equiv f(x)g(x) (\text{mod } \phi(x)).\tag{27}$$

*Proof* Since $t$ is a bijection, it suffices to show that

178 Z. Zhiyong et al.

$$t^{-1}(f(\mathbf{x})\mathbf{g}(\mathbf{x})) = \overline{f} \otimes \overline{\mathbf{g}}.\tag{28}$$

Let $g(x) = g\_0 + g\_1 x + \dots + g\_{n-1}x^{n-1} \in \mathbb{R}[x]/\langle\phi(x)\rangle$. Since $x^n \equiv \phi\_0 + \phi\_1 x + \dots + \phi\_{n-1}x^{n-1} (\text{mod } \phi(x))$, we have

$$\begin{aligned} xg(x) &= g\_0 x + \dots + g\_{n-1} x^n \\ &\equiv g\_{n-1} \phi\_0 + (g\_0 + \phi\_1 g\_{n-1}) x + \dots + (g\_{n-2} + \phi\_{n-1} g\_{n-1}) x^{n-1} (\text{mod } \phi(x)). \end{aligned}$$

It follows that

$$t^{-1}(xg(x)) = Ht^{-1}(g(x)) = H\overline{g}.$$

More generally, we have

$$t^{-1}\left(\mathbf{x}^k \mathbf{g}(\mathbf{x})\right) = H^k t^{-1}(\mathbf{g}(\mathbf{x})) = H^k \overline{\mathbf{g}}, \quad 0 \le k \le n - 1. \tag{29}$$

Let $f(x) = f\_0 + f\_1x + \dots + f\_{n-1}x^{n-1}$; then, by Lemma 4,

$$t^{-1}(f(x)g(x)) = \sum\_{k=0}^{n-1} f\_k t^{-1}\left(x^k g(x)\right) = \sum\_{k=0}^{n-1} f\_k H^k \overline{g} = H^\*(\overline{f}) \overline{g} = \overline{f} \otimes \overline{g}.$$

The lemma follows immediately.
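Lemma 5 says that the $\phi$-conventional product is polynomial multiplication modulo $\phi(x)$. The Python sketch below (hypothetical helper names) compares $H^\*(\overline{f})\overline{g}$ with a direct computation of $f(x)g(x) \bmod \phi(x)$ on random integer inputs; the argument above does not use irreducibility, so random monic $\phi(x)$ are allowed.

```python
import random


def rotation_matrix(phi):
    # H_phi of (17); phi = [phi_0, ..., phi_{n-1}] as in (9)
    n = len(phi)
    H = [[0] * n for _ in range(n)]
    for i in range(1, n):
        H[i][i - 1] = 1
    for i in range(n):
        H[i][n - 1] = phi[i]
    return H


def mat_vec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


def phi_product(f, g, phi):
    """H*(f) g, using that the j-th column of H*(f) is H^j f."""
    H = rotation_matrix(phi)
    result, column = [0] * len(f), list(f)
    for gj in g:
        result = [r + gj * c for r, c in zip(result, column)]
        column = mat_vec(H, column)
    return result


def poly_mul_mod(f, g, phi):
    """Coefficients of f(x) g(x) mod phi(x), lowest degree first."""
    n = len(phi)
    prod = [0] * (2 * n - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            prod[i + j] += a * b
    # eliminate x^k for k >= n via x^n = phi_0 + phi_1 x + ... + phi_{n-1} x^{n-1}
    for k in range(2 * n - 2, n - 1, -1):
        c, prod[k] = prod[k], 0
        for i in range(n):
            prod[k - n + i] += c * phi[i]
    return prod[:n]


def check_lemma5(trials=200, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(1, 6)
        phi = [rng.randint(-5, 5) for _ in range(n)]
        f = [rng.randint(-9, 9) for _ in range(n)]
        g = [rng.randint(-9, 9) for _ in range(n)]
        if phi_product(f, g, phi) != poly_mul_mod(f, g, phi):
            return False
    return True


print(check_lemma5())   # True
```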

**Lemma 6** *For any two vectors* $\overline{f} = (f\_0, f\_1, \dots, f\_{n-1})' \in \mathbb{R}^n$ *and* $\overline{g} = (g\_0, g\_1, \dots, g\_{n-1})' \in \mathbb{R}^n$, *ideal matrices have the following properties:*

*(i)* $H^\*(\overline{f})H^\*(\overline{g}) = H^\*(\overline{g})H^\*(\overline{f})$;

*(ii)* $H^\*(\overline{f})H^\*(\overline{g}) = H^\*\left(H^\*(\overline{f})\overline{g}\right)$;

*(iii)* $H^\*(\overline{f}) = V\_\phi^{-1} \operatorname{diag}\{f(\theta\_0), f(\theta\_1), \dots, f(\theta\_{n-1})\} V\_\phi$;

*(iv)* $\det H^\*(\overline{f}) = \prod\_{i=0}^{n-1} f(\theta\_i)$;

*(v)* *if* $\overline{f} \in \mathbb{Q}^n$ *and* $\overline{f} \ne 0$, *then* $H^\*(\overline{f})$ *is an invertible matrix and*

$$\left(H^\*(\overline{f})\right)^{-1} = H^\*(\overline{u}),$$

*where* $u(x) \in \mathbb{Q}[x]$ *is the unique polynomial of degree less than* $n$ *such that* $u(x)f(x) \equiv 1 (\text{mod } \phi(x))$ *in* $\mathbb{Q}[x]$, *and* $\overline{u} = t^{-1}(u(x))$.

*Proof* By Lemma 4, we have

$$H^\*(\overline{f})H^\*(\overline{g}) = f(H)g(H) = g(H)f(H) = H^\*(\overline{g})H^\*(\overline{f}).$$

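To prove (ii), write $H^\*(\overline{f})\overline{g} = \overline{f} \otimes \overline{g}$. Since $t(\overline{f} \otimes \overline{g}) \equiv f(x)g(x) (\text{mod } \phi(x))$ by Lemma 5, Lemma 4 gives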


$$H^\*\left(H^\*(\overline{f})\overline{g}\right) = H^\*(\overline{f}\otimes \overline{g}) = f(H)g(H) = H^\*(\overline{f})\cdot H^\*(\overline{g}).$$

By Theorem 3.5 of Davis (1994), we have

$$H = V\_{\phi}^{-1} \operatorname{diag} \{ \theta\_0, \theta\_1, \dots, \theta\_{n-1} \} \, V\_{\phi}. \tag{30}$$

It follows that

$$H^\*(\overline{f}) = f(H) = V\_{\phi}^{-1} \operatorname{diag} \left\{ f \left( \theta\_0 \right), f \left( \theta\_1 \right), \dots, f \left( \theta\_{n-1} \right) \right\} V\_{\phi}.$$

Since diag { *f* (θ0), *f* (θ1), ··· , *f* (θ*<sup>n</sup>*−<sup>1</sup>)} is a diagonal matrix, we have

$$\det\left(H^\*(\overline{f})\right) = \det\left(\text{diag}\left\{f\left(\theta\_0\right), f\left(\theta\_1\right), \dots, f\left(\theta\_{n-1}\right)\right\}\right) = \prod\_{i=0}^{n-1} f\left(\theta\_i\right).$$

To show the last assertion, note that $\overline{f} \in \mathbb{Q}^n$ and $\overline{f} \ne 0$, so $f(x)$ is a non-zero polynomial of degree less than $n$; since $\phi(x)$ is irreducible, we have $(f(x), \phi(x)) = 1$ in $\mathbb{Q}[x]$. Hence there are $u(x) \in \mathbb{Q}[x]$ and $v(x) \in \mathbb{Q}[x]$ such that

$$
u(x)f(x) + v(x)\phi(x) = 1.
$$

Replacing $u(x)$ by its remainder modulo $\phi(x)$, we may assume $\deg u(x) < n$. By (28), and noting that $t^{-1}(1) = \overline{e}\_1 \in \mathbb{R}^n$, we have $\overline{u} \otimes \overline{f} = \overline{e}\_1$. It follows from (ii) that

$$H^\*(\overline{u}) \cdot H^\*(\overline{f}) = H^\*(\overline{e}\_1) = I\_n.$$

This completes the proof of Lemma 6.
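Part (v) is constructive: the extended Euclidean algorithm in $\mathbb{Q}[x]$ yields $u(x)$. The Python sketch below, using exact `fractions.Fraction` arithmetic, is our illustration rather than the authors' code; polynomials are coefficient lists with the lowest degree first, and $\phi$ is passed as `[phi_0, ..., phi_{n-1}]` as in (9). By (ii) and Lemma 5, checking $u(x)f(x) \equiv 1 (\text{mod } \phi(x))$ is the same as checking $H^\*(\overline{u})H^\*(\overline{f}) = I\_n$.

```python
from fractions import Fraction


def trim(a):
    """Drop zero leading coefficients (lists are lowest degree first)."""
    a = list(a)
    while len(a) > 1 and a[-1] == 0:
        a.pop()
    return a


def poly_mul(a, b):
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return trim(out)


def poly_sub(a, b):
    size = max(len(a), len(b))
    a = list(a) + [Fraction(0)] * (size - len(a))
    b = list(b) + [Fraction(0)] * (size - len(b))
    return trim([x - y for x, y in zip(a, b)])


def poly_divmod(a, b):
    """Long division a = q b + r in Q[x], with deg r < deg b."""
    a, b = trim(Fraction(c) for c in a), trim(Fraction(c) for c in b)
    q = [Fraction(0)] * max(1, len(a) - len(b) + 1)
    r = a
    while len(r) >= len(b) and any(r):
        shift = len(r) - len(b)
        coef = r[-1] / b[-1]
        q[shift] = coef
        r = trim([x - coef * b[i - shift] if i >= shift else x for i, x in enumerate(r)])
    return trim(q), r


def inverse_mod_phi(f, phi):
    """u with u(x) f(x) = 1 (mod phi(x)) in Q[x]; invariant r_i = s_i f (mod phi)."""
    n = len(phi)
    phi_poly = [Fraction(-c) for c in phi] + [Fraction(1)]   # phi(x) from (9)
    r0, r1 = phi_poly, trim(Fraction(c) for c in f)
    s0, s1 = [Fraction(0)], [Fraction(1)]
    while any(r1):
        q, r = poly_divmod(r0, r1)
        r0, r1 = r1, r
        s0, s1 = s1, poly_sub(s0, poly_mul(q, s1))
    if len(r0) != 1:
        raise ValueError("f(x) and phi(x) are not coprime")
    _, u = poly_divmod([c / r0[0] for c in s0], phi_poly)   # normalize, deg u < n
    return u + [Fraction(0)] * (n - len(u))


# (1 + sqrt 2)^(-1) = -1 + sqrt 2 in Q(sqrt 2), where phi(x) = x^2 - 2
print(inverse_mod_phi([1, 1], [2, 0]))
```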

Next, we turn to the algebraic number field $E = \mathbb{Q}(\theta)$ and recall that $\tau$ is a one-to-one correspondence between $E$ and $\mathbb{Q}^n$.

**Lemma 7** *For any two elements* α *and* β *in E, we have*

$$
\tau(\alpha\beta) = \tau(\alpha) \otimes \tau(\beta) = \overline{\alpha} \otimes \overline{\beta}.\tag{31}
$$

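*Proof* Let $\beta = \beta\_0 + \beta\_1\theta + \dots + \beta\_{n-1}\theta^{n-1}$, where $\beta\_i \in \mathbb{Q}$. Since $\phi(\theta) = 0$, i.e., $\theta^n = \phi\_0 + \phi\_1\theta + \dots + \phi\_{n-1}\theta^{n-1}$, it is easily seen that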

$$
\theta\beta = \phi\_0 \beta\_{n-1} + \left(\beta\_0 + \phi\_1 \beta\_{n-1}\right) \theta + \dots + \left(\beta\_{n-2} + \phi\_{n-1} \beta\_{n-1}\right) \theta^{n-1},
$$

thus we have $\tau(\theta\beta) = H\tau(\beta) = H\overline{\beta}$, and by induction

$$\tau\left(\theta^{k}\beta\right) = H^{k}\tau(\beta) = H^{k}\overline{\beta}, \quad 0 \le k \le n-1. \tag{32}$$

Let $\alpha = \alpha\_0 + \alpha\_1\theta + \dots + \alpha\_{n-1}\theta^{n-1}$. By (32) and Lemma 4, we have

$$\tau(\alpha \beta) = \sum\_{k=0}^{n-1} \alpha\_k \tau\left(\theta^k \beta\right) = \sum\_{k=0}^{n-1} \alpha\_k H^k \overline{\beta} = H^\*(\overline{\alpha}) \overline{\beta} = \overline{\alpha} \otimes \overline{\beta}.$$

The lemma follows immediately.

Let $A = (a\_{ij})\_{n \times n}$ be a square matrix; as usual, its trace is $\operatorname{Tr}(A) = \sum\_{i=1}^{n} a\_{ii}$. The main result of this section is the following theorem.

**Theorem 2** *Let* $E = \mathbb{Q}(\theta)$ *be an algebraic number field of degree* $n$, *and let* $\phi(x) \in \mathbb{Z}[x]$ *be the minimal polynomial of* $\theta$. *Then the linear space* $\mathbb{Q}^n$ *is a field under the* $\phi$*-conventional product, and the set* $M^\*\_{\mathbb{Q}}$ *of ideal matrices generated by rational vectors is also a field under ordinary matrix addition and multiplication. Both are isomorphic to* $E$, *namely*

$$E \cong \mathbb{Q}^n \cong \mathbb{M}\_{\mathbb{Q}}^\*. \tag{33}$$

*Moreover, let* α ∈ *E,* tr(α) *and N*(α) *be the trace and norm of* α*, then we have*

$$\text{tr}(\alpha) = \text{Tr}\left(H^\*(\overline{\alpha})\right), \quad \text{and} \quad N(\alpha) = \text{det}\left(H^\*(\overline{\alpha})\right). \tag{34}$$

*Proof* Let $\tau: E \to \mathbb{Q}^n$ be given by (11). By Lemma 7, clearly

$$
\tau(\alpha + \beta) = \tau(\alpha) + \tau(\beta), \quad \text{and} \quad \tau(\alpha \beta) = \tau(\alpha) \otimes \tau(\beta).
$$

Thus $\mathbb{Q}^n$ is a field under the $\phi$-conventional product and $E \cong \mathbb{Q}^n$. By (18) and property (ii) of Lemma 6, we have

$$H^\*(\overline{\alpha} + \overline{\beta}) = H^\*(\overline{\alpha}) + H^\*(\overline{\beta}) \quad \text{and} \quad H^\*\left(\overline{\alpha} \otimes \overline{\beta}\right) = H^\*(\overline{\alpha})H^\*(\overline{\beta}),$$

thus, since $H^\*$ is one-to-one, $M^\*\_{\mathbb{Q}}$ is also a field and $E \cong \mathbb{Q}^n \cong M^\*\_{\mathbb{Q}}$.

The main difficulty is to prove (34). We observe that $\theta$ induces a $\mathbb{Q}$-linear transformation $\alpha \mapsto \theta\alpha$ of $E$, and the matrix of this transformation with respect to the basis $1, \theta, \theta^2, \dots, \theta^{n-1}$ is exactly $H$, namely

$$\theta\left(1,\theta,\theta^2,\cdots,\theta^{n-1}\right) = \left(1,\theta,\theta^2,\cdots,\theta^{n-1}\right)H.$$

By the definition of the trace, and since multiplication by $\theta^k$ has matrix $H^k$, we have

$$\operatorname{tr}(\theta^k) = \operatorname{Tr}(H^k), \quad 0 \le k \le n - 1.$$

Let $\alpha = \alpha\_0 + \alpha\_1\theta + \dots + \alpha\_{n-1}\theta^{n-1} \in E$. By Lemma 4, it follows that

$$\operatorname{tr}(\alpha) = \sum\_{k=0}^{n-1} \alpha\_k \operatorname{tr} \left( \theta^k \right) = \sum\_{k=0}^{n-1} \alpha\_k \operatorname{Tr} \left( H^k \right) = \operatorname{Tr} \left( \sum\_{k=0}^{n-1} \alpha\_k H^k \right) = \operatorname{Tr} \left( H^\*(\overline{\alpha}) \right).$$


To prove the assertion on the norm, let $\alpha^{(i)}$ ($0 \le i \le n - 1$) be the $n$ conjugates of $\alpha$ in the smallest normal extension of $\mathbb{Q}$ containing $E$, where $\alpha^{(0)} = \alpha = \alpha\_0 + \alpha\_1\theta + \dots + \alpha\_{n-1}\theta^{n-1}$. It is easily seen that

$$\alpha^{(i)} = \sum\_{k=0}^{n-1} \alpha\_k \theta\_i^k, \text{ where } \theta\_0 = \theta \text{ and } 0 \le i \le n-1.$$

By property (iv) of Lemma 6, with $\alpha(x) = \alpha\_0 + \alpha\_1 x + \dots + \alpha\_{n-1}x^{n-1}$, we have

$$N(\alpha) = \prod\_{i=0}^{n-1} \alpha^{(i)} = \prod\_{i=0}^{n-1} \alpha \left(\theta\_i\right) = \det\left(H^\*(\overline{\alpha})\right).$$

This completes the proof of Theorem 2.
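Formula (34) gives a purely matrix-based way to compute traces and norms. The Python sketch below (hypothetical helper names, exact rational determinant by Gaussian elimination) illustrates it on $\mathbb{Q}(\sqrt{2})$, where $\phi(x) = x^2 - 2$, on $\mathbb{Q}(i)$, and on the cubic field defined by $\phi(x) = x^3 - x - 1$.

```python
from fractions import Fraction


def rotation_matrix(phi):
    # H_phi of (17); phi = [phi_0, ..., phi_{n-1}] as in (9)
    n = len(phi)
    H = [[0] * n for _ in range(n)]
    for i in range(1, n):
        H[i][i - 1] = 1
    for i in range(n):
        H[i][n - 1] = phi[i]
    return H


def ideal_matrix(f, phi):
    """H*(f) = [f, Hf, ..., H^{n-1} f], returned as a list of rows."""
    H = rotation_matrix(phi)
    columns, v = [], list(f)
    for _ in range(len(f)):
        columns.append(v)
        v = [sum(H[i][j] * v[j] for j in range(len(v))) for i in range(len(v))]
    return [list(row) for row in zip(*columns)]


def det(M):
    """Exact determinant by Gaussian elimination over the rationals."""
    A = [[Fraction(x) for x in row] for row in M]
    n, result = len(A), Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if A[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            A[c], A[pivot] = A[pivot], A[c]
            result = -result
        result *= A[c][c]
        for r in range(c + 1, n):
            factor = A[r][c] / A[c][c]
            for k in range(c, n):
                A[r][k] -= factor * A[c][k]
    return result


def trace_and_norm(alpha, phi):
    """(tr(alpha), N(alpha)) as (Tr H*(alpha), det H*(alpha)), following (34)."""
    M = ideal_matrix(alpha, phi)
    return sum(M[i][i] for i in range(len(M))), det(M)


print(trace_and_norm([3, 2], [2, 0]))   # alpha = 3 + 2*sqrt(2) in Q(sqrt 2)
```

For example, $\alpha = 3 + 2\sqrt{2}$ gives $H^\*(\overline{\alpha}) = \begin{pmatrix} 3 & 4 \\ 2 & 3 \end{pmatrix}$, whose trace $6$ and determinant $1$ agree with $\operatorname{tr}(\alpha) = 6$ and $N(\alpha) = 9 - 8 = 1$.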

Cyclic lattices in $\mathbb{R}^n$ were introduced by Micciancio (2007) (see also Zheng et al. (2023)); they play an important role in Ajtai's construction of collision-resistant hash functions (see Ajtai and Dwork (1997)). Here a lattice $L$ is called cyclic if $HL \subseteq L$; for $\phi(x) = x^n - 1$ this is closure under cyclic rotation of coordinates. As an application, we show that every ideal in an algebraic number field corresponds to a cyclic lattice:

**Corollary 1** *Let* $A \subset R$ *be an ideal with* $A \ne 0$. *Then* $\tau(A) \subset \mathbb{Q}^n$ *is a cyclic lattice.*

*Proof* Suppose that $\alpha \in A$. Since $\theta \in R$, we have $\theta\alpha \in A$. By (31) and Lemma 3, we have

$$
\tau(\theta \alpha) = H\overline{\alpha} \in \tau(A).
$$

Thus τ (*A*) is a cyclic lattice.

# **3 High Dimensional RSA**

In this section, we give an attainable algorithm for the high dimensional RSA by making use of lattice theory; this algorithm is significant from both the theoretical and the practical point of view. Suppose that the algebraic number field $E$ satisfies the NC-property. Then $R = \mathbb{Z}[\theta]$ is the ring of algebraic integers of $E$, and the restriction of $\tau$ gives a ring isomorphism from $R$ onto $\mathbb{Z}^n$ (equipped with the $\phi$-conventional product). Let $\mathbb{Z}[x]$ be the ring of polynomials with integer coefficients and $(\phi(x))$ the principal ideal generated by $\phi(x)$ in $\mathbb{Z}[x]$; it is easy to see that $R \cong \mathbb{Z}[x]/(\phi(x))$. Let $M^\*\_{\mathbb{Z}}$ be the set of ideal matrices generated by integral vectors, i.e.,

$$M\_{\mathbb{Z}}^{\*} = \left\{ H^{\*}(\overline{f}) \mid \overline{f} \in \mathbb{Z}^{n} \right\}. \tag{35}$$

Then the following four rings are isomorphic to each other:

$$\mathbb{Z}[\mathbf{x}]/(\phi(\mathbf{x})) \cong R \cong \mathbb{Z}^n \cong M\_\mathbb{Z}^\*. \tag{36}$$

For any polynomial $\alpha(x) = \alpha\_0 + \alpha\_1 x + \dots + \alpha\_{n-1}x^{n-1} \in \mathbb{Z}[x]/(\phi(x))$, the corresponding algebraic integer is $\alpha = \alpha\_0 + \alpha\_1\theta + \dots + \alpha\_{n-1}\theta^{n-1} \in R$, and we write this chain of isomorphisms as

$$
\alpha(x) \longrightarrow \alpha \stackrel{\tau}{\longrightarrow} \overline{\alpha} \stackrel{H^\*}{\longrightarrow} H^\*(\overline{\alpha}). \tag{37}
$$

A $\phi$-ideal lattice is an integer lattice that corresponds to an ideal of $\mathbb{Z}[x]/(\phi(x))$. It was first introduced by Lyubashevsky and Micciancio (see also Zheng et al. (2023)), and it also plays a key role in Gentry's construction of fully homomorphic encryption (see Gentry (2009)); Fluckiger and Suarez (2006) extended this definition to totally real number fields.

**Lemma 8** *Let* $E$ *be an algebraic number field with the NC-property, and let* $R = \mathbb{Z}[\theta]$ *be the ring of algebraic integers of* $E$. *Then there is a one-to-one correspondence between the ideals of* $R$ *and the* $\phi$*-ideal lattices. Moreover, if* $\alpha \in R$, *then we have*

$$\tau(\alpha R) = L\left(H^\*(\overline{\alpha})\right). \tag{38}$$

*In general, suppose that* $A \subset R$ *is an ideal with* $A \ne 0$. *Then there exist two elements* $\alpha$ *and* $\beta$ *in* $A$ *such that*

$$\tau(A) = L\left(H^\*(\overline{\alpha})\right) + L\left(H^\*(\overline{\beta})\right). \tag{39}$$

*Proof* Since there is a one-to-one correspondence between the $\phi$-ideal lattices and the ideals of $\mathbb{Z}[x]/(\phi(x))$ (see the Corollary of Zheng et al. (2023)), the first assertion follows immediately from (36). Let $\alpha \in R$; then $\alpha R = \{\alpha x \mid x \in R\}$, and by Lemma 7 we have

$$\pi(\alpha \mathbf{x}) = H^\*(\overline{\alpha})\overline{\mathbf{x}}, \quad \text{where } \overline{\mathbf{x}} = \begin{pmatrix} x\_0 \\ x\_1 \\ \vdots \\ x\_{n-1} \end{pmatrix} \in \mathbb{Z}^n.$$

It follows that

$$\pi(\alpha R) = \left\{ H^\*(\overline{\alpha}) \overline{x} \mid \overline{x} \in \mathbb{Z}^n \right\} = L\left( H^\*(\overline{\alpha}) \right).$$

To prove (39), recall that every ideal of *R* is generated by at most two elements (see Corollary 5 on page 11 of Narkiewicz (2004)), namely *A* = α*R* + β*R*; hence

$$
\pi(A) = \pi(\alpha R) + \pi(\beta R) = L\left(H^\*(\overline{\alpha})\right) + L\left(H^\*(\overline{\beta})\right).
$$
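For a quadratic field, the correspondence in Lemma 8 can be checked by hand. The sketch below assumes *E* = ℚ(√*d*) with *d* ≡ 2, 3 (mod 4), so that *R* = ℤ[√*d*], θ = √*d* and φ(*x*) = *x*<sup>2</sup> − *d*. For α = *a* + *b*√*d*, the columns of *H*<sup>∗</sup>(ᾱ) are taken to be ᾱ = (*a*, *b*) and τ(θα) = (*bd*, *a*), so that det *H*<sup>∗</sup>(ᾱ) = *a*<sup>2</sup> − *db*<sup>2</sup> = *N*(α). An element γ then lies in α*R* exactly when *H*<sup>∗</sup>(ᾱ)<sup>−1</sup>γ̄ is an integer vector, i.e. when γ̄ ∈ *L*(*H*<sup>∗</sup>(ᾱ)). The function names are illustrative only.

```python
def ideal_matrix_quadratic(alpha, d):
    """Assumed H*(alpha) for alpha = a + b*sqrt(d), stored row by row."""
    a, b = alpha
    return [[a, b * d],
            [b, a]]


def in_principal_ideal(alpha, gamma, d):
    """True iff gamma = c + e*sqrt(d) lies in alpha*R, i.e. gamma-bar is in L(H*(alpha))."""
    (h00, h01), (h10, h11) = ideal_matrix_quadratic(alpha, d)
    det = h00 * h11 - h01 * h10           # equals N(alpha) = a^2 - d b^2
    if det == 0:
        raise ValueError("alpha must be nonzero")
    c, e = gamma
    # adj(H*) applied to gamma-bar; H*^{-1} gamma-bar is integral iff det divides both entries
    u = h11 * c - h01 * e
    v = -h10 * c + h00 * e
    return u % det == 0 and v % det == 0


# d = 2 and alpha = 1 + 2 sqrt 2, whose norm is 1 - 8 = -7
print(in_principal_ideal((1, 2), (6, 5), 2))  # True: 6 + 5 sqrt 2 = alpha * (2 + sqrt 2)
print(in_principal_ideal((1, 2), (1, 0), 2))  # False: 1 is not a multiple of alpha
```

The index of *L*(*H*<sup>∗</sup>(ᾱ)) in ℤ<sup>2</sup> is |det *H*<sup>∗</sup>(ᾱ)| = |*N*(α)|, which is the quantity used again in the proof of Lemma 9.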

To introduce an attainable algorithm for high dimensional RSA, we require some basic results from lattice theory. Let *L* = *L*(*B*) ⊂ ℝ<sup>*n*</sup> be a full-rank lattice; the determinant of *L* is defined by

$$d(L) = |\det(B)|.\tag{40}$$

Suppose that the generating matrix is *B* = [*b*<sub>1</sub>, *b*<sub>2</sub>, ···, *b*<sub>*n*</sub>], where the *b*<sub>*i*</sub> ∈ ℝ<sup>*n*</sup> are the column vectors of *B*. Since *b*<sub>1</sub>, *b*<sub>2</sub>, ···, *b*<sub>*n*</sub> is a basis of ℝ<sup>*n*</sup>, let *B*<sup>∗</sup> = [*b*<sub>1</sub><sup>∗</sup>, *b*<sub>2</sub><sup>∗</sup>, ···, *b*<sub>*n*</sub><sup>∗</sup>] be the corresponding orthogonal basis, where *b*<sub>1</sub><sup>∗</sup> = *b*<sub>1</sub> and each *b*<sub>*i*</sub><sup>∗</sup> is obtained in order by the Gram–Schmidt orthogonalization process.
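The Gram–Schmidt process referred to above can be written with exact rational arithmetic, which avoids rounding errors when checking statements such as Proposition 1. This minimal sketch takes the basis as a list of column vectors and returns *b*<sub>1</sub><sup>∗</sup>, ..., *b*<sub>*n*</sub><sup>∗</sup> without normalization; the function name is our own.

```python
from fractions import Fraction


def gram_schmidt(columns):
    """Unnormalized Gram-Schmidt: b*_1 = b_1, b*_i = b_i - sum_{j<i} mu_ij b*_j."""
    ortho = []
    for b in columns:
        v = [Fraction(x) for x in b]
        for u in ortho:
            # mu_ij = <b_i, b*_j> / <b*_j, b*_j>
            mu = sum(Fraction(x) * y for x, y in zip(b, u)) / sum(y * y for y in u)
            v = [vi - mu * ui for vi, ui in zip(v, u)]
        ortho.append(v)
    return ortho


# Columns of an upper triangular HNF basis with diagonal 3, 5, 7.
hnf_basis = [[3, 0, 0], [1, 5, 0], [2, 4, 7]]
print([[int(x) for x in col] for col in gram_schmidt(hnf_basis)])
# [[3, 0, 0], [0, 5, 0], [0, 0, 7]]
```

For this HNF basis the orthogonal basis is diagonal, as Proposition 1 states, and the product of the diagonal entries, 105, equals |det *B*|.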

A basis *B* is in Hermite Normal Form (HNF) if it is upper triangular, all elements on the diagonal are strictly positive, and every other element *b*<sub>*ij*</sub> satisfies 0 ≤ *b*<sub>*ij*</sub> < *b*<sub>*ii*</sub>. Every integer lattice *L* = *L*(*B*) has a unique basis in Hermite Normal Form, denoted by HNF(*L*) (see Theorem 2.4.3 of Cohen (1993)). Moreover, given any basis *B* of *L*, HNF(*L*) can be computed efficiently from *B* (see Cohen (1993), Micciancio (2001)).
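For small examples, HNF(*L*) can be computed with unimodular column operations driven by the extended Euclidean algorithm. The sketch below is a plain textbook-style procedure, not the optimized algorithms of Cohen (1993) or Micciancio (2001): it takes a full-rank square integer basis as a list of columns, zeroes the entries to the left of the diagonal row by row from the bottom up, makes each diagonal entry positive, and reduces the entries to the right of each diagonal element modulo that element. Intermediate entries can grow, so it is meant only for small inputs; the function names are our own.

```python
def egcd(a, b):
    """Return (g, x, y) with x*a + y*b == g and |g| == gcd(a, b)."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y


def hnf(columns):
    """Upper triangular column HNF of a full-rank square integer basis.

    columns[j] is the j-th basis vector; entry b_ij of the text is result[j][i].
    """
    cols = [list(c) for c in columns]
    n = len(cols)
    for i in range(n - 1, -1, -1):
        # Zero row i in columns 0..i-1; the gcd accumulates in column i.
        for j in range(i):
            a, c = cols[i][i], cols[j][i]
            if c == 0:
                continue
            g, x, y = egcd(a, c)
            ag, cg = a // g, c // g
            col_i, col_j = cols[i], cols[j]
            # The transform [[x, -cg], [y, ag]] has determinant 1: the lattice is unchanged.
            cols[i] = [x * u + y * v for u, v in zip(col_i, col_j)]
            cols[j] = [ag * v - cg * u for u, v in zip(col_i, col_j)]
        if cols[i][i] == 0:
            raise ValueError("basis is not full rank")
        if cols[i][i] < 0:
            cols[i] = [-u for u in cols[i]]
        # Reduce row i to the right of the diagonal into [0, b_ii).
        for j in range(i + 1, n):
            q = cols[j][i] // cols[i][i]
            cols[j] = [v - q * u for u, v in zip(cols[i], cols[j])]
    return cols


# A unimodular change of the basis with columns (3,0,0), (1,5,0), (2,4,7).
print(hnf([[4, 5, 0], [3, 9, 7], [2, 4, 7]]))  # [[3, 0, 0], [1, 5, 0], [2, 4, 7]]
```

By uniqueness of the HNF, any basis of the same lattice yields the same output, which is why HNF(*L*) can safely be published as a canonical description of *L*.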

**Proposition 1** *Let L* = *L*(*B*), *where B* = (*b*<sub>*ij*</sub>)<sub>*n*×*n*</sub> *is the basis in HNF. Then the corresponding orthogonal basis B*<sup>∗</sup> *is a diagonal matrix, namely*

$$B^\* = \text{diag}\left\{ b\_{11}, b\_{22}, \dots, b\_{nn} \right\}.\tag{41}$$

*Moreover, we have*

$$d(L) = \prod\_{i=1}^{n} b\_{ii}. \tag{42}$$

*Proof* See Micciancio (2001).

**Definition 3** Let *L* = *L*(*B*) ⊂ ℝ<sup>*n*</sup> be a full-rank lattice and *B*<sup>∗</sup> = [*b*<sub>1</sub><sup>∗</sup>, *b*<sub>2</sub><sup>∗</sup>, ···, *b*<sub>*n*</sub><sup>∗</sup>] the corresponding orthogonal basis. The orthogonal parallelepiped *F*(*B*<sup>∗</sup>) is defined by

$$F(B^\*) = \left\{ \sum\_{i=1}^n x\_i \overline{b}\_i^\* \mid 0 \le x\_i < 1 \text{ and } x\_i \in \mathbb{R} \right\}.\tag{43}$$

**Proposition 2** *Let L* = *L*(*B*) ⊂ ℤ<sup>*n*</sup> *be an integer lattice, let B* = HNF(*L*) *be its basis in* HNF *with corresponding orthogonal basis B*<sup>∗</sup> = diag{*b*<sub>11</sub>, *b*<sub>22</sub>, ···, *b*<sub>*nn*</sub>}*, and let F*(*B*<sup>∗</sup>) *be the orthogonal parallelepiped given by (43). Then S is a set of coset representatives for the quotient group* ℤ<sup>*n*</sup>/*L, where*

$$S = F\left(B^\*\right) \cap \mathbb{Z}^n = \left\{ x = (x\_1, x\_2, \dots, x\_n) \mid x\_i \in \mathbb{Z} \ \text{and} \ 0 \le x\_i < b\_{ii} \ \text{for all } i \right\}.$$

*Proof* See Sect. 4.1 of Micciancio (2001).
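Proposition 2 also gives a simple way to pick the representative in *S* of a coset *x* + *L*: subtract integer multiples of the HNF columns, from the last coordinate to the first. The sketch below assumes the HNF basis is already available as a list of columns (upper triangular with positive diagonal); because column *i* vanishes below row *i*, fixing coordinate *i* never disturbs the coordinates already reduced. The function name is our own.

```python
def coset_representative(hnf_columns, v):
    """Return the unique w in S with w - v in L(B), where B is in HNF (list of columns)."""
    w = list(v)
    for i in range(len(hnf_columns) - 1, -1, -1):
        col = hnf_columns[i]
        q = w[i] // col[i]                    # floor division puts w[i] in [0, b_ii)
        w = [wk - q * bk for wk, bk in zip(w, col)]
    return w


B = [[3, 0, 0], [1, 5, 0], [2, 4, 7]]         # HNF columns, d(L) = 3 * 5 * 7 = 105
print(coset_representative(B, [10, 20, 30]))  # [2, 4, 2]
print(coset_representative(B, [17, 10, 30]))  # [2, 4, 2], differs from the above by 3*b_1 - 2*b_2
```

The set of all representatives has exactly *b*<sub>11</sub>*b*<sub>22</sub>···*b*<sub>*nn*</sub> = *d*(*L*) elements, in agreement with (42).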

Now we return to the algebraic numbers field *E* = ℚ[θ] with the NC-property. Let α, β ∈ *R* be two algebraic integers. By Lemma 8, the principal ideal α*R* corresponds to the minimal φ-ideal lattice *L*(*H*<sup>∗</sup>(ᾱ)). Thus *A* = (α*R*)(β*R*) = αβ*R* corresponds to *L*(*H*<sup>∗</sup>(ᾱ ⊗ β̄)).

**Definition 4** For given α, β ∈ *R* with τ(α) = ᾱ and τ(β) = β̄, we define the lattice *L*<sub>α,β</sub> by

$$L\_{\alpha,\beta} = L\left(H^\*(\overline{\alpha} \otimes \overline{\beta})\right). \tag{44}$$

The HNF basis of *L*α,β is denoted by *B*α,β and the corresponding orthogonal basis is denoted by

$$B^\*\_{\alpha,\beta} = \text{diag}\left\{ b\_1, b\_2, \dots, b\_n \right\},\tag{45}$$

where *b*<sub>*i*</sub> ∈ ℤ and *b*<sub>*i*</sub> ≥ 1. The set of integer points of the corresponding orthogonal parallelepiped is

$$S\_{\alpha,\beta} = \left\{ (\mathbf{x}\_1, \mathbf{x}\_2, \dots, \mathbf{x}\_n) \in \mathbb{Z}^n \mid \mathbf{x}\_i \in \mathbb{Z} \quad \text{and} \quad 0 \le \mathbf{x}\_i < b\_i \right\}. \tag{46}$$

**Lemma 9** *Let* α, β ∈ *R and A* = αβ*R. Then S*<sub>α,β</sub> *given by (46) corresponds to a set of coset representatives of the factor ring R*/*A in the algebraic numbers field E with the NC-property.*

*Proof* By Proposition 1, it is easy to see that

$$\left| S\_{\alpha,\beta} \right| = \prod\_{i=1}^{n} b\_i = \left| \det \left( H^\*(\overline{\alpha} \otimes \overline{\beta}) \right) \right| = \left| \det \left( H^\*(\overline{\alpha}) \right) \right| \cdot \left| \det \left( H^\*(\overline{\beta}) \right) \right| = d \left( L\_{\alpha,\beta} \right).$$

By Theorem 2 and (12), we have

$$N(A) = |N(\alpha \beta)| = |N(\alpha)| \cdot |N(\beta)| = \left| \det \left( H^\*(\overline{\alpha}) \right) \right| \cdot \left| \det \left( H^\*(\overline{\beta}) \right) \right| = d \left( L\_{\alpha, \beta} \right).$$

It follows that *N*(*A*) = |*S*<sub>α,β</sub>|. Since *E* satisfies the NC-property, every γ ∈ *R* has γ̄ = τ(γ) ∈ ℤ<sup>*n*</sup>; hence, for γ, δ ∈ *R*, we have γ ≡ δ (mod *A*) in *R* if and only if

$$
\overline{\gamma} \equiv \overline{\delta} \pmod{L\_{\alpha, \beta}}.
$$

The lemma follows from Proposition 2 immediately.

The main result of this subsection is the following theorem.

**Theorem 3** *Let E be an algebraic numbers field of degree n with the NC-property, let* α, β ∈ *R be two distinct prime elements, let A* = αβ*R, and let L*<sub>α,β</sub> *be the lattice given by (44). Then for any* ā ∈ ℤ<sup>*n*</sup> *and any integer k* ≥ 0*, we have*

$$\overline{a}^{\otimes (k\varphi(\alpha,\beta)+1)} \equiv \overline{a} \pmod{L\_{\alpha,\beta}}, \tag{47}$$

*where*

$$\varphi(\alpha, \beta) = \left( \left| \det \left( H^\*(\overline{\alpha}) \right) \right| - 1 \right) \left( \left| \det \left( H^\*(\overline{\beta}) \right) \right| - 1 \right). \tag{48}$$

*Proof* Since *E* satisfies the NC-property and ā ∈ ℤ<sup>*n*</sup>, we have *a* = τ<sup>−1</sup>(ā) ∈ *R*. By Theorem 1, we have

$$a^{k\varphi(A)+1} \equiv a \pmod{A}.$$

It is easy to see that

$$\begin{aligned} \varphi(A) = \varphi(\alpha R)\varphi(\beta R) &= (N(\alpha R) - 1)(N(\beta R) - 1) \\ &= (|N(\alpha)| - 1)(|N(\beta)| - 1) \\ &= \left(\left| \det \left( H^\*(\overline{\alpha}) \right) \right| - 1\right) \left( \left| \det \left( H^\*(\overline{\beta}) \right) \right| - 1 \right) \\ &= \varphi(\alpha, \beta). \end{aligned}$$

By Lemma 8, we have

$$\pi(A) = \pi(\alpha\beta R) = L\left(H^\*(\overline{\alpha}\otimes \overline{\beta})\right) = L\_{\alpha,\beta} \quad \text{and} \quad \tau\left(a^{k\varphi(\alpha,\beta)+1}\right) = \overline{a}^{\otimes (k\varphi(\alpha,\beta)+1)}.$$

Therefore, (47) follows immediately.

According to the above theorem, we may describe an attainable algorithm for high dimensional RSA as follows (Table 3).

#### **Table 3** Algorithm I

#### **Algorithm 9.1: RSA in the Algebraic Numbers Field**

*n* ≥ 1 is a positive integer, *E*/ℚ is an algebraic numbers field of degree *n* with the NC-property, *R* ⊂ *E* is the ring of algebraic integers of *E*, α, β ∈ *R* are two distinct prime elements of *R*, *A* = αβ*R* is a principal ideal of *R*, *H*<sup>∗</sup>(ᾱ ⊗ β̄) is the ideal matrix corresponding to *A*, *L*<sub>α,β</sub> = *L*(*H*<sup>∗</sup>(ᾱ ⊗ β̄)) is the lattice generated by *H*<sup>∗</sup>(ᾱ ⊗ β̄), *B*<sub>α,β</sub> = HNF(*L*<sub>α,β</sub>) is the basis of *L*<sub>α,β</sub> in HNF, and *B*<sup>∗</sup><sub>α,β</sub> = diag{*b*<sub>1</sub>, *b*<sub>2</sub>, ···, *b*<sub>*n*</sub>} is the corresponding orthogonal basis.

• **Parameters**: ϕ(α, β) = (|det(*H*<sup>∗</sup>(ᾱ))| − 1)(|det(*H*<sup>∗</sup>(β̄))| − 1) and *S*<sub>α,β</sub> = {x̄ = (*x*<sub>1</sub>, *x*<sub>2</sub>, ···, *x*<sub>*n*</sub>) ∈ ℤ<sup>*n*</sup> | 0 ≤ *x*<sub>*i*</sub> < *b*<sub>*i*</sub>}. Choose 1 ≤ *e* < ϕ(α, β) and 1 ≤ *d* < ϕ(α, β) such that *ed* ≡ 1 (mod ϕ(α, β)).

• **Public keys**: the rotation matrix *H*, the lattice *L*(*B*<sub>α,β</sub>) = *L*<sub>α,β</sub>, and the positive integer *e*.

• **Private keys**: the ideal matrices *H*<sup>∗</sup>(ᾱ) and *H*<sup>∗</sup>(β̄), the basis *H*<sup>∗</sup>(ᾱ ⊗ β̄) of *L*<sub>α,β</sub>, and the positive integer *d*.

• **Encryption**: for any input message ā ∈ *S*<sub>α,β</sub>, the ciphertext c̄ is given by c̄ ≡ ā<sup>⊗*e*</sup> (mod *L*<sub>α,β</sub>).

• **Decryption**: c̄<sup>⊗*d*</sup> ≡ ā<sup>⊗*de*</sup> ≡ ā<sup>⊗(*k*ϕ(α,β)+1)</sup> ≡ ā (mod *L*<sub>α,β</sub>); the plaintext ā is recovered from c̄ as the representative of c̄<sup>⊗*d*</sup> in *S*<sub>α,β</sub>.
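The following toy sketch runs Algorithm I in the special case α = *p*, β = *q*, where *p* and *q* are rational primes that remain prime in *R* (this must be checked separately, e.g. by (49) for quadratic fields). We assume that *H*<sup>∗</sup>(ᾱ ⊗ β̄) is then *pq* times the identity, as in Example 1, so that reduction modulo *L*<sub>α,β</sub> is coordinate-wise reduction modulo *pq*, *S*<sub>α,β</sub> = {0, ..., *pq* − 1}<sup>*n*</sup>, and ϕ(α, β) = (*p*<sup>*n*</sup> − 1)(*q*<sup>*n*</sup> − 1). The product ⊗ is read as multiplication in ℤ[*x*]/(φ(*x*)) with φ monic. The parameters are far too small to be secure and the function names are hypothetical; for general α, β one would reduce modulo the HNF basis instead.

```python
from math import gcd


def polymulmod(a, b, phi, modulus):
    """a(x) * b(x) mod (phi(x), modulus); phi monic, coefficients low-to-high."""
    n = len(phi) - 1
    prod = [0] * (2 * n - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            prod[i + j] += ai * bj
    # Eliminate degrees 2n-2 .. n using x^n = -(phi[0] + ... + phi[n-1] x^(n-1)).
    for k in range(2 * n - 2, n - 1, -1):
        c = prod[k]
        if c:
            for i in range(n + 1):
                prod[k - n + i] -= c * phi[i]
    return [x % modulus for x in prod[:n]]


def otimes_power(a, k, phi, modulus):
    """The k-th (x)-power of a, reduced coordinate-wise modulo `modulus` (square-and-multiply)."""
    n = len(phi) - 1
    result = [1 % modulus] + [0] * (n - 1)
    base = [x % modulus for x in a]
    while k:
        if k & 1:
            result = polymulmod(result, base, phi, modulus)
        base = polymulmod(base, base, phi, modulus)
        k >>= 1
    return result


def keygen(p, q, phi, e):
    """Return (e, d) with e*d = 1 mod (p^n - 1)(q^n - 1)."""
    n = len(phi) - 1
    phi_ab = (p**n - 1) * (q**n - 1)
    if gcd(e, phi_ab) != 1:
        raise ValueError("e must be coprime to phi(alpha, beta)")
    return e, pow(e, -1, phi_ab)      # modular inverse, Python 3.8+


def encrypt(a, e, phi, p, q):
    return otimes_power(a, e, phi, p * q)


def decrypt(c, d, phi, p, q):
    return otimes_power(c, d, phi, p * q)


# E = Q(sqrt 2), phi(x) = x^2 - 2; 11 and 13 stay prime in Z[sqrt 2].
phi, p, q = [-2, 0, 1], 11, 13
e, d = keygen(p, q, phi, 11)
message = [42, 100]                   # an element of S = {0, ..., 142}^2
cipher = encrypt(message, e, phi, p, q)
print(decrypt(cipher, d, phi, p, q))  # [42, 100]
```

Decryption works for every message in *S*<sub>α,β</sub>, including those divisible by *p* or *q*, because *R*/*pR* and *R*/*qR* are finite fields of orders *p*<sup>*n*</sup> and *q*<sup>*n*</sup>.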

**Remark 2** If the class number *h*<sub>*E*</sub> = 1, in other words, if *R* is a UFD, then the prime elements of *R* are exactly its irreducible elements, so one can obtain a prime element α from any α(*x*) ∈ ℤ[*x*]/(φ(*x*)) whose corresponding element α is irreducible in *R*.

# **4 Security and Example**

The classical RSA public-key cryptosystem is nowadays used in a wide variety of applications, ranging from web browsers to smart cards. Since its initial publication in 1978, many researchers have searched for vulnerabilities in the system, and some clever attacks have been found (see Bonech (2002), Coppersmith (2001)). However, none of the known attacks is devastating, and the ordinary RSA system is still considered secure.

The security of high dimensional RSA rests essentially on the difficulty of factoring an element of the ring of algebraic integers *R* into a product of distinct prime elements. Factoring in *R* is much more complicated than factoring a positive integer, and no efficient method is known to date; thus we consider high dimensional RSA to be almost absolutely secure.

To see the size of the private keys, note that det(*H*<sup>∗</sup>(ᾱ)) = *N*(α), which may be extremely large. For example, if α = *p* ∈ ℤ and β = *q* ∈ ℤ are prime numbers, then

$$\det\left(H^\*(\overline{\alpha})\right) = N(\alpha) = p^n, \quad \det\left(H^\*(\overline{\beta})\right) = q^n$$

and

$$
\varphi(\alpha, \beta) = \left(p^n - 1\right)\left(q^n - 1\right),
$$

which is much larger than *pq*, the size of the public modulus of the classical RSA cryptosystem.

Lattice-based cryptography has been studied intensively over the past two decades. The GGH cryptosystem proposed by Goldreich et al. (1997) is perhaps the most intuitive encryption scheme based on lattices. Its public key is a "bad" basis of a lattice *L*, and Micciancio (2001) proposed using the Hermite Normal Form *B* = HNF(*L*) as the public basis. The private key of GGH is an exceptionally good basis of *L*. The security of GGH relies on the assumption that it is difficult to find such a special basis of *L* from a known basis of *L*. In this sense, we regard high dimensional RSA as at least as secure as the GGH/HNF cryptosystem.

Another number-theoretic cryptosystem based on lattices is NTRUEncrypt. The public-key cryptosystem NTRU, proposed in 1996 by Hoffstein et al. (1998), is the fastest known lattice-based encryption scheme. Although its description relies on arithmetic over the polynomial quotient ring ℤ[*x*]/(*x*<sup>*n*</sup> − 1), it was readily observed that it can be expressed as a lattice-based cryptosystem. NTRU uses a *q*-ary convolutional modular lattice (see Micciancio and Regev (2009), Zheng (2022)); its public key is also the HNF basis of *L*, and its private key is a special basis of *L* containing two secret polynomials *f*(*x*) and *g*(*x*). We therefore regard breaking our Algorithm I as at least as hard as breaking NTRUEncrypt.

Unfortunately, neither GGH nor NTRU is supported by a proof of security showing that breaking the cryptosystem is at least as hard as solving some underlying lattice problem; they are primarily practical proposals aimed at offering a concrete alternative to RSA or other number-theoretic cryptosystems (see page 166 of Micciancio and Regev (2009)). The significance of this chapter, however, is to show that the real alternative to RSA is the high dimensional RSA presented here rather than GGH or NTRU.

**Example 1** Finally, we give an example to show how high dimensional RSA works in a quadratic field. Let *E* = ℚ(√*d*), where *d* ∈ ℤ is a square-free integer with *d* ≡ 2 or 3 (mod 4); then *E* satisfies the NC-property. Let δ<sub>*E*</sub> be the discriminant of *E*; it is known that δ<sub>*E*</sub> = 4*d* (see Proposition 13.1.2 of Ireland and Rosen (1990)). Let *p* ∈ ℤ be an odd prime satisfying the following condition:

$$p \nmid 4d, \quad \text{and} \quad x^2 \equiv d(\text{mod } p) \text{ is not solvable in } \mathbb{Z}. \tag{49}$$

By Proposition 13.1.3 of Ireland and Rosen (1990), *p* is then a prime element of *R*.

Following Algorithm I, we select two large distinct primes *p* and *q*, both satisfying (49). Let α = *p* and β = *q*; then

$$
\overline{\alpha} = \begin{pmatrix} p \\ 0 \end{pmatrix}, \; \overline{\beta} = \begin{pmatrix} q \\ 0 \end{pmatrix}, \; H^\*(\overline{\alpha}) = \begin{pmatrix} p & 0 \\ 0 & p \end{pmatrix}, \; \text{and} \; H^\*(\overline{\beta}) = \begin{pmatrix} q & 0 \\ 0 & q \end{pmatrix}.
$$

It follows that

$$H^\*(\overline{\alpha} \otimes \overline{\beta}) = H^\*(\overline{\alpha})H^\*(\overline{\beta}) = \begin{pmatrix} pq & 0 \\ 0 & pq \end{pmatrix}, \quad L\_{\alpha,\beta} = L\left(H^\*(\overline{\alpha} \otimes \overline{\beta})\right) \tag{50}$$

and

$$S\_{\alpha,\beta} = \left\{ \mathbf{x} = \begin{pmatrix} \mathbf{x}\_1 \\ \mathbf{x}\_2 \end{pmatrix} \in \mathbb{Z}^2 \mid \mathbf{0} \le \mathbf{x}\_1, \mathbf{x}\_2 < pq \right\}. \tag{51}$$

It is easy to see that

$$
\varphi(\alpha, \beta) = (p^2 - 1)(q^2 - 1). \tag{52}
$$
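To pick *p* and *q* in this example, condition (49) can be tested with Euler's criterion: for an odd prime *p* ∤ *d*, the congruence *x*<sup>2</sup> ≡ *d* (mod *p*) is unsolvable exactly when *d*<sup>(*p*−1)/2</sup> ≡ −1 (mod *p*). The sketch below uses trial-division primality, so it is only for small numbers, and its names are our own; it checks (49) and returns the modulus *pq* that bounds *S*<sub>α,β</sub> in (51) together with ϕ(α, β) from (52).

```python
def is_prime(n):
    """Trial division; adequate only for small illustrative numbers."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def satisfies_49(p, d):
    """Condition (49): p odd prime, p does not divide 4d, d a non-residue mod p."""
    if p == 2 or not is_prime(p) or (4 * d) % p == 0:
        return False
    return pow(d % p, (p - 1) // 2, p) == p - 1   # Euler's criterion


def example1_parameters(p, q, d):
    """Modulus pq bounding S in (51) and phi(alpha, beta) from (52)."""
    if p == q or not (satisfies_49(p, d) and satisfies_49(q, d)):
        raise ValueError("p and q must be distinct primes satisfying (49)")
    return p * q, (p * p - 1) * (q * q - 1)


print([p for p in range(50) if satisfies_49(p, 2)])  # [3, 5, 11, 13, 19, 29, 37, 43]
print(example1_parameters(11, 13, 2))                 # (143, 20160)
```

For *d* = 2 the primes found are exactly those congruent to ±3 modulo 8, as the law of quadratic reciprocity predicts.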

In this special case, the two-dimensional RSA may be described as follows (Table 4).

The case of cyclotomic fields can be treated similarly. Let *n* = ϕ(*m*) for some positive integer *m*, ξ<sub>*m*</sub> = *e*<sup>2π*i*/*m*</sup>, *E* = ℚ(ξ<sub>*m*</sub>), and let *R* ⊂ *E* be the ring of algebraic integers of *E*. Suppose that *p* ∈ ℤ is a rational prime; then *p* is a prime element of *R* if and only if (see Theorem 2 on page 196 of Ireland and Rosen (1990))

$$p \nmid m \quad \text{and} \quad \varphi(m) \text{ is the least positive integer } f \text{ with } p^{f} \equiv 1 \pmod{m}. \tag{53}$$

If *p*, *q* ∈ ℤ are two distinct primes satisfying (53), we obtain the lattice *L*(*H*<sup>∗</sup>(p̄ ⊗ q̄)) and an attainable algorithm in ℚ(ξ<sub>*m*</sub>).
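The inertness criterion for cyclotomic fields can be tested directly. Since *p*<sup>ϕ(*m*)</sup> ≡ 1 (mod *m*) holds for every prime *p* ∤ *m* by Euler's theorem, the condition that matters in the cited theorem of Ireland and Rosen is that ϕ(*m*) be the *least* such exponent, i.e. that the multiplicative order of *p* modulo *m* equal ϕ(*m*). The small sketch below checks this by brute force, so it is intended only for small *m*; its names are our own.

```python
from math import gcd


def euler_phi(m):
    return sum(1 for k in range(1, m + 1) if gcd(k, m) == 1)


def multiplicative_order(p, m):
    """Least f >= 1 with p^f = 1 (mod m); requires gcd(p, m) = 1 and m >= 2."""
    f, x = 1, p % m
    while x != 1:
        x = x * p % m
        f += 1
    return f


def stays_prime_in_cyclotomic(p, m):
    """True iff the rational prime p is a prime element of the ring of integers of Q(xi_m)."""
    if m < 2:
        raise ValueError("m must be at least 2")
    if m % p == 0:
        return False
    return multiplicative_order(p, m) == euler_phi(m)


print([p for p in (2, 3, 5, 7, 11, 13) if stays_prime_in_cyclotomic(p, 5)])  # [2, 3, 7, 13]
```

When (ℤ/*m*ℤ)<sup>×</sup> is not cyclic, for instance *m* = 8, no rational prime satisfies the criterion, so such *m* cannot be used in this construction.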

#### **Table 4** RSA in a quadratic field

#### **RSA in A Quadratic Field**


# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Central Bank Digital Currency Cross-Border Payment Model Based on Blockchain Technology**

# **Mao Hanyu**

**Abstract** Since the turn of the twenty-first century, the growth of the globalized economy and trade has accelerated, and the cross-border payment system, an essential component of the international financial infrastructure, has played a significant role in the global economy and trade. However, traditional cross-border payments present risks and challenges, such as expensive processing fees, limited payment efficiency, information asymmetry in the trade process, and reliance on a highly centralized cross-border payment system. Based on consortium blockchain technology, and taking Polkadot's Parachain, Relay chain, and cross-chain technologies as references, this chapter designs a scalable, high-efficiency, high-security, and privacy-protecting central bank digital currency cross-border payment model. The use of hash digest technology and CoinJoin technology to prevent transactions from being traced, and thus to protect privacy, is analyzed. The issuance of a multi-country central bank digital currency, or of a stablecoin anchored to a basket of fiat currencies, is discussed as the currency in circulation in the model. Finally, the development trend of central bank digital currency cross-border payments is summarized and forecast.

**Keywords** Payment model · Cross-border · Blockchain technology · CBDC

# **1 Introduction**

Since 2020, global digital transformation has been developing rapidly and the era of central bank digital currency (CBDC) is accelerating, with China's digital currency, the e-CNY, leading the world. The launch of the e-CNY not only promotes the healthy development of China's digital economy but also benefits and speeds up the internationalization of the RMB. At the same time, CBDC has especially significant advantages in cross-border payments: it can effectively address the long processing times, high costs, low efficiency, and low transparency of current cross-border payments. In addition, building a cross-border payment network based on CBDC will be a key to breaking the monopoly position of the US dollar and reshaping the global cross-border payment system. CBDC cross-border payments will therefore inject new vitality into the rapid growth of our economy and play a pivotal role in establishing a fair and equitable international monetary settlement system.

M. Hanyu (B)

School of Mathematics, Renmin University of China, Beijing, China e-mail: 2020103607@ruc.edu.cn

<sup>©</sup> The Author(s) 2023

Z. Zheng (ed.), *Proceedings of the Second International Forum on Financial Mathematics and Financial Technology*, Financial Mathematics and Fintech, https://doi.org/10.1007/978-981-99-2366-3\_10

With the rapid development of CBDC, CBDC cross-border payments are becoming a research hotspot in central bank digital currency research. According to a survey by the BIS, more than 50% of central banks consider cross-border payments one of the crucial reasons for accelerating the development of CBDC. Traditional cross-border payments suffer from high fees, low efficiency, information asymmetry in the cross-border trade process, and the potential financial risk of a highly centralized cross-border payment system. A CBDC cross-border payment system, characterized by high payment efficiency, low cost, and high transparency, is conducive not only to reducing existing cross-border trade friction and breaking up the centralized cross-border payment system, but also to curbing competitive currency devaluation, currency wars, and other harmful behavior between countries, promoting the peaceful development of financial markets, and laying a healthy market foundation for a moderately centralized cross-border payment system for international trade (Yang, 2020). Consequently, many central banks and international organizations have begun to explore the application of CBDC to cross-border payments. On February 26, 2022, the United States, together with the European Union, the United Kingdom, and Canada, issued a joint statement announcing that selected Russian banks would be removed from the Society for Worldwide Interbank Financial Telecommunication (SWIFT) international payment messaging system. This undoubtedly accelerated research by many countries into bypassing SWIFT for cross-border transactions.

Research on cross-border payments with central bank digital currency is still at an initial stage. There is a lack of in-depth research on scalable, high-efficiency cross-border payment models, which leaves their development without the necessary theoretical foundation and essential technical support. It is therefore important to design a scalable, high-efficiency CBDC cross-border payment model based on blockchain.

CBDC cross-border payments based on currencies issued by central banks have become a significant trend. In this chapter, we use Polkadot's Parachain, Relay Chain, and cross-chain technologies as references and design a scalable, high-efficiency, highly secure, and privacy-preserving CBDC cross-border payment model based on a consortium chain.

# **2 CBDC Cross-Border Payment Development Current Situation**

CBDC cross-border payments can be made in two ways: first, retail CBDCs in a given jurisdiction are made available to people both inside and outside the jurisdiction, with no coordination between central banks; second, central banks work together to establish access and settlement arrangements between different retail or wholesale CBDCs (Wan & Wu, 2022). CBDC cross-border payments can be divided into four quadrants: "same system and same currency", "same system and different currency", "same currency and different system", and "different currency and different system". Among them, "same system and different currency" and "different currency and different system" are the most typical scenarios for cross-border payments and will be a key research focus in the future.

At this stage, CBDC cross-border payment research uses the following three models to achieve cross-border and cross-currency interoperability: enhancing the compatibility of CBDC systems; linking multiple CBDC systems; and integrating multiple CBDCs in a single multi-CBDC (mCBDC) system (Auer et al., 2021). Models linking multiple CBDC systems include the Stella project of the European Central Bank and the Bank of Japan (2019), the Jasper-Ubin project of the Bank of Canada (BOC) and the Monetary Authority of Singapore (MAS) (2019), and the Jura project for cross-border payments between the Banque de France and the Swiss National Bank. Integrating multiple CBDCs in a single mCBDC system mainly comprises the Aber project of the UAE and the Central Bank of Saudi Arabia (2020); Dunbar, a joint project of the Monetary Authority of Singapore and the BIS (2022); and the Inthanon-LionRock project of the Bank of Thailand and the Hong Kong Monetary Authority (2020). In 2021, with the addition of the Digital Currency Institute of the People's Bank of China and the Central Bank of the United Arab Emirates, the project entered its third phase and was renamed the mCBDC Bridge (mBridge) Project (Inthanon-LionRock to mBridge-Building a multi CBDC platform for international payments, 2021).

The CBDC projects Jasper, Ubin, and Stella have recently completed their experiments. All of these projects follow the line "from wholesale payments to voucher payments to cross-border payments" (Yao, 2021). Enabling cross-border payments is thus the ultimate goal and an essential part of the CBDC research route. Moreover, the experimental results of these representative wholesale CBDC projects show that current technology and design solutions can support Real-Time Gross Settlement (RTGS) in terms of efficiency and can also realize a Liquidity Saving Mechanism (LSM) in terms of functionality (Huang, 2022). These projects also show that cross-chain technology is a crucial issue for CBDC cross-border payments. Although CBDC cross-border payments have become a research hotspot both at home and abroad, most existing research focuses on the economic and legal aspects of CBDC cross-border payments, and the proposed CBDC cross-border payment models have not been sufficiently investigated at the technical level.

At the technical level, most research by central banks and international organizations has focused on linking multiple CBDC systems and on integrating multiple CBDCs in a single mCBDC system. However, most projects are still at the experimental stage, involve only a few countries, and lack scalability in practice. At the same time, the cross-chain technology used to link multiple CBDC systems, the hash time-locked contract (HTLC), is limited in its application scenarios: for *N* participants, transaction channels on the order of *N*<sup>2</sup> must be established, so the number of channels grows quadratically as *N* increases. The scalability of HTLC is therefore deficient, and it may not be suitable for large-scale economies. While integrating multiple CBDCs in a single mCBDC system avoids complex HTLCs and improves payment efficiency, the establishment of privacy groups inevitably introduces multi-ledger-style behaviors and constraints that hinder transaction atomicity. Future research should therefore focus on cross-chain technology and on effective privacy protection mechanisms that achieve transaction atomicity and improve transaction efficiency while ensuring transaction privacy.
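To make the HTLC mechanism and its scaling concrete, here is a minimal, hypothetical Python model of one hash time-locked payment leg: the funds go to the receiver if the secret whose SHA-256 digest was fixed in advance is revealed before a timeout, and back to the sender afterwards. In an atomic cross-ledger swap, two such legs share the same hash lock, so revealing the secret on one ledger lets the counterparty claim on the other. The helper `pairwise_channels` counts the *N*(*N* − 1)/2 bilateral links needed when *N* ledgers are connected pairwise, which is our reading of the quadratic growth described above. This is an illustration only, not the contract design of any of the projects cited.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class HTLC:
    """One hypothetical hash time-locked payment leg (illustration only)."""
    sender: str
    receiver: str
    amount: int
    hashlock: bytes                  # SHA-256 digest of the secret
    timeout: int                     # block height from which a refund is allowed
    settled_to: Optional[str] = None

    def claim(self, preimage: bytes, height: int) -> bool:
        """Receiver claims by revealing the secret before the timeout."""
        if (self.settled_to is None and height < self.timeout
                and hashlib.sha256(preimage).digest() == self.hashlock):
            self.settled_to = self.receiver
            return True
        return False

    def refund(self, height: int) -> bool:
        """Sender recovers the funds once the timeout has passed unclaimed."""
        if self.settled_to is None and height >= self.timeout:
            self.settled_to = self.sender
            return True
        return False


def pairwise_channels(n: int) -> int:
    """Bilateral links needed to connect n ledgers directly: n(n-1)/2."""
    return n * (n - 1) // 2


secret = b"example secret"
lock = hashlib.sha256(secret).digest()
leg_x = HTLC("A", "B", 100, lock, timeout=20)    # A pays B on ledger X
leg_y = HTLC("B", "A", 700, lock, timeout=10)    # B pays A on ledger Y (shorter timeout)
print(leg_y.claim(secret, height=5), leg_x.claim(secret, height=6))  # True True
print(pairwise_channels(10), pairwise_channels(100))                 # 45 4950
```

A hub such as a relay chain needs only one link per ledger, which is the scalability argument the model in this chapter builds on.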

# **3 Polkadot Technology Overview**

Polkadot is a scalable, heterogeneous multi-chain technology that provides a general cross-chain protocol: any blockchain system compatible with Polkadot's cross-chain protocol can interconnect with the others (Polkadot, 2016). Polkadot is envisioned as a new form of blockchain, a "network of blockchains", and as one of the critical infrastructures of the future Web 3.0. As shown in Fig. 1, Polkadot consists of Parachains, a Relay chain, and bridges. It uses Parachains to meet the needs of different applications and the Relay chain to manage consensus security and data interaction in a unified way, which addresses the scalability and isolation problems of current blockchain technology.

# *3.1 Relay Chain and Parachain Technology*

A Parachain is a member blockchain of Polkadot that collects and processes transactions and transmits them to the Relay chain. Each participating Parachain has a high degree of autonomy and flexibility and can be designed for a specific scenario as long as it follows the protocols set by Polkadot. The Relay chain is the core of the Polkadot network: it is responsible for maintaining the security of the whole network, coordinating consensus among the Parachains, and forwarding cross-chain transactions between them. The Relay chain reaches consensus with an asynchronous Byzantine fault-tolerant algorithm.

To maintain the Relay chain, the Polkadot network defines four roles: Nominator, Validator, Collator, and Fisherman. Nominators are token (DOT) holders with the authority to vote for Validators. Validator nodes have the highest authority in the network and can create blocks for the whole network. They are elected by Nominator votes and, after staking a sufficient deposit in the system, can validate and pack blocks. Validators that perform their duties are rewarded for generating blocks; those that do not are punished by having some or all of their deposit deducted. Collators are nodes that collect information from a Parachain and package it for submission to the Validators. They submit candidate blocks to the Validators and help them create valid blocks, for which they are rewarded with fees, so Collators try to collect as much information as possible in order to earn more fees. A Fisherman is a relatively independent node that is only responsible for monitoring the system for illegal activities and reporting them, for which it receives a substantial one-time reward. A deposit is also required to become a Fisherman, mainly to prevent Sybil attacks that waste the Validators' computing time and resources.

# *3.2 Polkadot Cross-Chain Technology*

Cross-chain communication is the most critical part of Polkadot, as shown in Fig. 2. Because the Relay chain guarantees the security of the whole system, a transaction conducted on one Parachain can be transferred to another Parachain through the Relay chain. As a result, cross-chain transactions on Polkadot are simpler and more efficient than those of other cross-chain methods. Specifically, each Parachain maintains an egress queue and an ingress queue of transactions, and the queues use Merkle trees to ensure data authenticity. When Parachain A initiates a cross-chain transaction to Parachain B, the transaction is pushed to Parachain A's egress queue. The Relay chain then transfers the transactions in Parachain A's egress queue to Parachain B's ingress queue, and Parachain B processes the transactions in its ingress queue itself (Yuan & Wang, 2019).

**Fig. 2** Schematic diagram of cross-chain transactions on Polkadot (figure from the Polkadot White Paper, Polkadot (2016))
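The egress/ingress mechanism described above can be sketched as follows. This is a simplified, hypothetical model and not Polkadot's actual message-passing implementation: each Parachain keeps one egress queue per destination and commits to it with a Merkle root (a SHA-256 binary tree that duplicates the last node on odd levels, which is our own choice), and the relay step moves the messages into the destination's ingress queue only if they still match the committed root.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(items):
    """SHA-256 Merkle root of a list of byte strings (odd levels duplicate the last node)."""
    if not items:
        return _h(b"")
    level = [_h(x) for x in items]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


class Parachain:
    def __init__(self, name):
        self.name = name
        self.egress = {}      # destination name -> list of outgoing messages
        self.ingress = []     # messages delivered by the relay chain

    def send(self, dest, msg: bytes):
        self.egress.setdefault(dest, []).append(msg)

    def egress_root(self, dest):
        return merkle_root(self.egress.get(dest, []))


def relay(src, dst, committed_root):
    """Relay-chain step: move src's egress queue for dst into dst's ingress queue."""
    msgs = src.egress.get(dst.name, [])
    if merkle_root(msgs) != committed_root:
        raise ValueError("egress queue does not match the committed Merkle root")
    dst.ingress.extend(msgs)
    src.egress[dst.name] = []


a, b = Parachain("A"), Parachain("B")
a.send("B", b"transfer 10")
a.send("B", b"transfer 20")
relay(a, b, a.egress_root("B"))
print(b.ingress)  # [b'transfer 10', b'transfer 20']
```

Because only the root needs to be agreed on, a receiving Parachain can detect any message altered in transit without trusting the sender's storage.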

# **4 CBDC Cross-Border Payment Model**

This section will introduce a CBDC cross-border payment model based on consortium blockchain technology, referencing Polkadot's Parachain, Relay Chain, and cross-chain technologies. Figure 3 depicts a scalable, high-efficiency, high-security, and privacy-protecting CBDC cross-border payment model.

# *4.1 Design of Parachain*

In this model, every country is a Parachain, and each Parachain is a consortium blockchain. A consortium blockchain is a permissioned blockchain: only a number of internally designated nodes can upload, record, and read data. These nodes act as bookkeeping nodes and collectively decide to generate blocks. Using a consortium blockchain can significantly improve operational efficiency and reduce network latency while ensuring the privacy of each transaction's data. Each country's Parachain can therefore take the form of a consortium blockchain, which serves both to improve efficiency and to protect privacy.

Since each Parachain has a high degree of autonomy and flexibility, each country's Parachain can be designed independently according to its own national conditions. The consensus mechanism of each country's Parachain can therefore be chosen according to that country's circumstances, for example one of the mainstream consensus mechanisms for consortium blockchains such as Raft or PBFT. Each country's Parachain can be divided into several nodes with different authorities according to the actual situation of cross-border payment in that country, including the central bank, trusted financial payment institutions, and regulatory agencies. Accordingly, three roles are established in the network of this model: Validators, Collators, and Supervisors.

Collators collect information on the Parachain, submit candidate blocks to the Validators, and assist the group of Validators in creating valid blocks. They also have the authority to vote for the Validators, so the Collators can be the commercial banks and trusted financial payment institutions of each country. Validators are the nodes with the authority to generate blocks and have the highest authority in the system. Validator nodes are elected by the votes of the Collators and are responsible for validating and packaging blocks. Each country's central bank or a specialized agency can fill this critical role. Supervisors are the nodes responsible for regulating illegal activities in the system, so this role can be filled by each country's regulators.

Taking China as an example, the Collator nodes can be served by six state-owned commercial banks, namely the Industrial and Commercial Bank of China (ICBC), Agricultural Bank of China (ABC), Bank of China (BOC), China Construction Bank (CCB), Bank of Communications (BCM), and Postal Savings Bank of China (PSBC); trusted commercial banks and third-party payment institutions can be added in the future. The Validators are China's central bank, the People's Bank of China, and its affiliated institutions. The Supervisors are mainly the Ministry of Commerce, the China Banking and Insurance Regulatory Commission, the National Audit Office, and other regulatory authorities.
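The division of labor among the three roles can be illustrated with a minimal Python sketch. The permission table, the action names (such as `produce_block`), the candidate labels `PBC-A`, `PBC-B`, `PBC-C`, and the simple plurality vote with an alphabetical tie-break are all hypothetical; the model itself does not prescribe a particular voting rule.

```python
from collections import Counter
from enum import Enum


class Role(Enum):
    VALIDATOR = "validator"
    COLLATOR = "collator"
    SUPERVISOR = "supervisor"


# Hypothetical permission table: which actions each role may perform.
PERMISSIONS = {
    Role.COLLATOR: {"collect_tx", "submit_candidate_block", "vote"},
    Role.VALIDATOR: {"validate_block", "package_block", "produce_block"},
    Role.SUPERVISOR: {"audit", "flag_illegal_activity"},
}


def can(role: Role, action: str) -> bool:
    """Return True if nodes holding `role` may perform `action`."""
    return action in PERMISSIONS[role]


def elect_validators(ballots: dict, seats: int) -> list:
    """Tally Collator ballots (collator -> candidate) and return the top `seats` candidates.

    Ties are broken alphabetically so that the result is deterministic.
    """
    tally = Counter(ballots.values())
    ranked = sorted(tally.items(), key=lambda kv: (-kv[1], kv[0]))
    return [candidate for candidate, _ in ranked[:seats]]


# Hypothetical ballots cast by the six Chinese Collators in one round.
ballots = {"ICBC": "PBC-A", "ABC": "PBC-A", "BOC": "PBC-B",
           "CCB": "PBC-B", "BCM": "PBC-A", "PSBC": "PBC-C"}
print(elect_validators(ballots, 2))  # ['PBC-A', 'PBC-B']
```

Re-running `elect_validators` on fresh ballots each round mirrors the re-election cycle described for the Relay chain.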

# *4.2 Design of Relay chain*

In this model, the Relay chain is also a consortium blockchain. It contains the protocols of all Parachains, recognizes the transaction format of each country's Parachain, and is responsible for coordinating consensus and forwarding cross-chain transactions between Parachains. The Validator nodes elected by each country's Collators also join the Relay chain, where they package transactions and generate blocks.

Specifically, the Collators of each country's Parachain first elect the Validators in charge of their Parachain, and these Validators join the Relay chain. The Collators then collect transactions into blocks, each accompanied by a non-interactive zero-knowledge proof that the parent block of the new (child) block is valid, and hand the blocks to the Validators in charge of their country's Parachain. The Validators from every country involved in a cross-border transaction form a Validator group, validate the blocks in the order in which the Collators sent them, and reach consensus on the Parachain blocks at that height. Once the Validators of each involved Parachain have confirmed that their Parachain has accepted the transaction, the Validator group routes the message to the Relay chain and generates the Relay chain block. In the next round, the Collators of each Parachain vote again to elect new Validators, and the cycle repeats.
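The round described above can be sketched in Python as follows. The sketch abstracts each country's validation (including the zero-knowledge proof check, which is not implemented here) into a caller-supplied `validate` function; the name `relay_round` and the dictionary layout of the Relay chain block are illustrative assumptions.

```python
from typing import Callable, Optional


def relay_round(candidates: list, involved: set,
                validate: Callable[[str, str], bool]) -> Optional[dict]:
    """Run one Relay chain round for a single cross-border transaction.

    candidates: (country, block_id) pairs in the order the Collators sent them.
    involved:   countries party to the transaction; only their Validators take part.
    validate:   hypothetical per-country check of a candidate block.
    Returns a Relay chain block if every involved country confirmed, else None.
    """
    confirmed = set()
    accepted = []
    for country, block_id in candidates:
        if country not in involved:
            continue  # Parachains not party to the transaction are not consulted
        if validate(country, block_id):
            confirmed.add(country)
            accepted.append(block_id)
    if confirmed == involved:
        return {"parties": sorted(confirmed), "parachain_blocks": accepted}
    return None


result = relay_round([("CN", "b1"), ("RU", "b2"), ("BR", "b3")],
                     involved={"CN", "RU"},
                     validate=lambda country, block: True)
print(result)  # {'parties': ['CN', 'RU'], 'parachain_blocks': ['b1', 'b2']}
```

If any involved country fails to confirm, no Relay chain block is produced for that round, and Validators of uninvolved countries never inspect the blocks.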

# *4.3 Cross-Chain Transaction*

The model's cross-chain transactions work essentially as in Polkadot. Each country's Parachain has an egress transaction queue and an ingress transaction queue (a Parachain with a large transaction volume can have several of each). The Relay chain moves transactions from the egress queue of the source Parachain to the ingress queue of the destination Parachain.

The egress transaction queue is a list of bins carrying routing information, each holding a linked structure of outgoing messages. Validators of different Parachains can exchange Merkle tree proofs, so that a block of one Parachain can be proven to correspond to the egress queue of another Parachain, guaranteeing data authenticity. If a Parachain's ingress queue exceeds the block-processing threshold, the queue is marked as full on the Relay chain and receives no new messages until it has been emptied. The Merkle tree is also used to prove that the Collators' operations recorded in Parachain blocks are trustworthy.
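A Merkle proof lets a Validator check that a message belongs to another Parachain's egress queue using only the queue's root hash and a logarithmic number of sibling hashes. The minimal Python sketch below uses SHA-256 from the standard library. The odd-level rule (duplicating the last node, as Bitcoin does) and the function names are assumptions; Polkadot uses its own trie structure and hash function.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_levels(leaves):
    """All tree levels, from hashed leaves up to the root (leaves must be non-empty)."""
    level = [h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]  # duplicate the last node on odd-sized levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def merkle_root(leaves) -> bytes:
    return merkle_levels(leaves)[-1][0]


def merkle_proof(leaves, index):
    """Sibling hashes from leaf `index` up to the root, each tagged with its side."""
    proof = []
    for level in merkle_levels(leaves)[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))  # (hash, sibling is on the left)
        index //= 2
    return proof


def verify(leaf: bytes, proof, root: bytes) -> bool:
    node = h(leaf)
    for sibling, is_left in proof:
        node = h(sibling + node) if is_left else h(node + sibling)
    return node == root


egress = [b"CN->RU:tx1", b"CN->RU:tx2", b"CN->BR:tx3"]
root = merkle_root(egress)
print(verify(egress[1], merkle_proof(egress, 1), root))  # True
```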

For example, a cross-border transaction between China and Russia proceeds as follows. When the Chinese Parachain initiates a cross-chain transaction to the Russian Parachain, the transaction is first pushed to the Chinese Parachain's egress queue. The Relay chain then transfers it from the Chinese egress queue to the Russian ingress queue, and the Russian Parachain finally processes the transaction from its ingress queue. This design helps guarantee the security of cross-chain transactions and can significantly improve their efficiency.
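The China-Russia flow above can be sketched as two queues per Parachain and a Relay chain that moves messages between them. This is a minimal Python illustration, not Polkadot's actual cross-chain messaging implementation; the class names, the `ingress_limit` threshold, and the rule that held messages stay in the sender's egress queue are assumptions.

```python
from collections import deque


class Parachain:
    def __init__(self, name: str, ingress_limit: int):
        self.name = name
        self.egress = deque()   # (destination, payload) waiting to leave
        self.ingress = deque()  # (source, payload) waiting to be processed
        self.ingress_limit = ingress_limit  # hypothetical block-processing threshold

    def send(self, dest: str, payload: str) -> None:
        self.egress.append((dest, payload))

    def process_ingress(self) -> list:
        """Process (here: simply drain) every message in the ingress queue."""
        processed = list(self.ingress)
        self.ingress.clear()
        return processed


class RelayChain:
    def __init__(self, parachains):
        self.chains = {c.name: c for c in parachains}
        self.full = set()  # Parachains whose ingress queue is marked full

    def route(self) -> None:
        """Move messages from each egress queue to the destination ingress queue."""
        for chain in self.chains.values():
            held = deque()
            while chain.egress:
                dest_name, payload = chain.egress.popleft()
                dest = self.chains[dest_name]
                if dest_name in self.full and not dest.ingress:
                    self.full.discard(dest_name)  # queue emptied: accept messages again
                if dest_name in self.full or len(dest.ingress) >= dest.ingress_limit:
                    self.full.add(dest_name)
                    held.append((dest_name, payload))  # stays in the source egress queue
                else:
                    dest.ingress.append((chain.name, payload))
            chain.egress = held


cn, ru = Parachain("CN", ingress_limit=2), Parachain("RU", ingress_limit=2)
relay = RelayChain([cn, ru])
for tx in ("tx1", "tx2", "tx3"):
    cn.send("RU", tx)
relay.route()
print(list(ru.ingress))  # [('CN', 'tx1'), ('CN', 'tx2')]
```

Here `tx3` waits in the Chinese egress queue until the Russian Parachain has emptied its ingress queue and the Relay chain routes again.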

# *4.4 Privacy Protection*

On each Parachain, the blockchain ledger exploits the one-way (irreversible) nature of hash algorithms by recording hash digests in place of sensitive transaction information. In addition, CoinJoin is used to mix transactions and sever the link between their input and output addresses, so that the origin and destination of a transaction cannot be traced.
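A minimal sketch of the two privacy techniques, using only the Python standard library. The salt, the JSON encoding of records, and the simple shuffle-based mixing are illustrative assumptions that the chapter does not specify; a production CoinJoin also involves a coordination protocol and signatures from every participant.

```python
import hashlib
import json
import random


def digest_record(tx: dict, salt: bytes) -> str:
    """Return a salted SHA-256 digest to store on the ledger instead of the raw fields."""
    payload = json.dumps(tx, sort_keys=True).encode()
    return hashlib.sha256(salt + payload).hexdigest()


def coinjoin(inputs: list, outputs: list, rng: random.Random) -> dict:
    """Merge several payers' (address, amount) inputs and outputs into one transaction.

    Inputs are sorted and outputs shuffled, so list positions no longer reveal which
    input paid which output. Real CoinJoin also uses equal output amounts so that
    amounts cannot be matched either.
    """
    if sum(a for _, a in inputs) != sum(a for _, a in outputs):
        raise ValueError("inputs and outputs must balance")
    shuffled = list(outputs)
    rng.shuffle(shuffled)
    return {"inputs": sorted(inputs), "outputs": shuffled}


# Hypothetical record and addresses.
tx = {"payer": "CN-bank-001", "payee": "RU-bank-042", "amount": 1000}
print(digest_record(tx, salt=b"per-ledger-secret"))
joined = coinjoin([("a1", 50), ("b1", 50)], [("x9", 50), ("y9", 50)], random.Random(7))
```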

On the Relay chain, a block can be generated once the countries involved in a cross-border transaction have confirmed it; countries that are not involved take no part. Because the transaction details do not need to be validated by nodes of other countries, those countries cannot learn them, which effectively protects the privacy of each country's cross-border transaction information.

# **5 CBDC Cross-Border Payment Model Architecture**

The cross-border payment model is divided into four layers, as shown in Fig. 4: the application layer, the architecture layer, the blockchain layer, and the digital currency issuance layer.

Application layer: This user-facing layer provides services such as user identity authentication and system access. Identities are verified through an authentication center to ensure that each trader's identity is valid, and users who pass authentication can access the system.

Architecture layer: This layer consists of the Parachains designed by each country and the Relay chain that forwards cross-chain transactions, as shown in Fig. 3. Both the Parachains and the Relay chain are consortium blockchains, which are well suited to practical applications. Three trusted roles, Validators, Collators, and Supervisors, are established in the network to help the whole system work effectively.

Blockchain layer: This layer consists of the core technical components of blockchain, such as peer-to-peer networks, smart contracts, timestamps, and consensus mechanisms. Distributed ledger technology addresses the storage, transfer, and querying of transaction information in cross-border payments. The consensus mechanism ensures agreement among validators on transactions and ledgers (Zhu, 2021). Digital signatures and timestamps resolve the double-payment problem, and smart contracts can automate accounting reconciliation and error handling in cross-border payments, ensuring that transactions are trusted and reliable. A smart contract automatically identifies the actual conditions, executes accordingly, and matches each situation that arises to the relevant processing rules. In this way, all information is recorded synchronously for all parties and cannot be tampered with by any single party. This also effectively prevents the loss of information due to technical failures (Huang & Luo, 2021).
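The double-payment check can be illustrated with a toy append-only ledger. This is a sketch under stated assumptions: HMAC-SHA256 stands in for a real digital signature (which would use an asymmetric scheme), entries are hash-chained, and each payment identifier may be recorded only once. The names `Ledger` and `sign` are hypothetical.

```python
import hashlib
import hmac
import time


def sign(key: bytes, payment_id: str, payer: str, payee: str, amount: int) -> str:
    """Stand-in 'signature': HMAC-SHA256 over the payment fields (symmetric, for illustration)."""
    message = f"{payment_id}|{payer}|{payee}|{amount}".encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


class Ledger:
    """Append-only, hash-chained ledger that rejects forged or repeated payments."""

    def __init__(self):
        self.entries = []
        self.seen_ids = set()

    def append(self, payment_id, payer, payee, amount, key, signature, timestamp=None):
        expected = sign(key, payment_id, payer, payee, amount)
        if not hmac.compare_digest(expected, signature):
            raise ValueError("invalid signature")
        if payment_id in self.seen_ids:
            raise ValueError("double payment rejected")
        entry = {
            "id": payment_id, "payer": payer, "payee": payee, "amount": amount,
            "ts": time.time() if timestamp is None else timestamp,  # records when it was booked
            "prev": self.entries[-1]["hash"] if self.entries else "0" * 64,
        }
        # Hash the entry (including the previous hash) so any later edit breaks the chain.
        entry["hash"] = hashlib.sha256(repr(sorted(entry.items())).encode()).hexdigest()
        self.entries.append(entry)
        self.seen_ids.add(payment_id)
        return entry
```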

Digital currency issuance layer: This layer covers the issuance, redemption, management, and maintenance of the digital currency used in the model. Since building a CBDC-based cross-border payment system essentially means establishing a regional economic association, regional economic cooperation is a prerequisite for CBDC to be recognized in cross-border payments. Therefore, the model considers issuing either a multi-country CBDC or a stablecoin anchored to a basket of legal currencies as its circulating currency. This digital currency is used only for cross-border payment clearing between the Parachains of individual countries; financial institutions on the Parachains cannot freely exchange it or use it outside this cross-border payment model. Its intrinsic value and purchasing power can be determined by a core basket of each party's traded goods weighted by historical transaction volumes (or by other criteria). In this way, the model can bypass US dollar settlement and circumvent the constraints of dollar-anchored foreign exchange reserves without challenging the monetary sovereignty of national central banks (Huang & Luo, 2021).
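To illustrate how a basket-anchored currency could be valued, the sketch below follows the general approach of the IMF's SDR: fix the amount of each currency in one basket unit at launch, then revalue the unit at current exchange rates. The trade-volume weighting, the currency codes, the reference unit, and all numbers are hypothetical; the chapter does not fix a weighting formula.

```python
def basket_weights(trade_volumes: dict) -> dict:
    """Weight each currency by its share of historical trade volume (hypothetical inputs)."""
    total = sum(trade_volumes.values())
    return {c: v / total for c, v in trade_volumes.items()}


def fix_basket_amounts(weights: dict, price_in_ref: dict, unit_value: float = 1.0) -> dict:
    """At launch, fix how many units of each currency one basket unit contains (SDR-style)."""
    return {c: w * unit_value / price_in_ref[c] for c, w in weights.items()}


def basket_value(amounts: dict, price_in_ref: dict) -> float:
    """Value of one basket unit in the reference unit at current exchange rates."""
    return sum(amount * price_in_ref[c] for c, amount in amounts.items())


weights = basket_weights({"CNY": 60.0, "RUB": 40.0})       # shares 0.6 and 0.4
launch_prices = {"CNY": 0.2, "RUB": 0.01}                   # hypothetical rates
amounts = fix_basket_amounts(weights, launch_prices)         # about 3 CNY and 40 RUB
later = basket_value(amounts, {"CNY": 0.2, "RUB": 0.005})   # RUB halves in value
print(round(later, 3))  # 0.8
```

Because each currency's weight is bounded by its trade share, a large move in one currency shifts the unit's value only partially.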

# **6 Summary and Prospect**

Drawing on Polkadot's Parachain, Relay chain, and cross-chain technologies and building on consortium blockchain technology, this chapter designs a scalable, efficient, secure, and privacy-preserving CBDC cross-border payment model. Privacy protection is achieved through hash digests and through CoinJoin, which obfuscates the input and output addresses of transactions. The chapter also discusses building a free-floating legal digital currency system that bypasses US dollar settlement, in which a multi-country CBDC or a stablecoin anchored to a basket of legal currencies is issued as the model's circulating currency, as a contribution to the study of CBDC cross-border payments.

As CBDC cross-border payment technology improves and cross-chain technology matures, a region-centered, polycentric pattern is expected to emerge: a new pattern of economic development in which "different currencies in the same system" are used within a region and "different currencies in different systems" are used between regions. CBDC cross-border payment has become an international research hotspot. Although some countries are already experimenting with it, the security and scalability of its cross-chain mechanisms still need time to be proven. Follow-up research can focus on two levels. Technically, the focus and main difficulty of CBDC cross-border payments lie in cross-chain technology, so more secure and efficient cross-chain technology is a hotspot for future research; new consensus algorithms suited to blockchain cross-chain operation could also greatly improve the efficiency of CBDC cross-border payments. In terms of regulation, supervision of CBDC cross-border payments must be strengthened: it is essential not only to identify the regulatory authority for CBDC cross-border payments but also to improve the legal study of them. In addition, the regulatory sandbox model could be adapted into a "Chinese regulatory sandbox" to balance risk and innovation.

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **LLE Based K-Nearest Neighbor Smoothing for scRNA-Seq Data Imputation**

**Yifan Feng, Yutong Ai, and Hao Jiang**

**Abstract** The single-cell RNA sequencing (scRNA-seq) technique allows gene expression to be measured at the single-cell level, but scRNA-seq data often contain missing values, a large proportion of which are caused by technical defects that fail to detect gene expression; these are called dropout events. Dropouts pose a great challenge for scRNA-seq data analysis. In this chapter, we introduce LLE-KNN-smoothing, a method based on KNN-smoothing, to impute dropout values in scRNA-seq data, and show that LLE-KNN-smoothing greatly improves the recovery of gene expression in cells and performs better than state-of-the-art imputation methods on a number of scRNA-seq data sets.

**Keywords** LLE · scRNA-seq · Dropout issue

# **1 Introduction**

Single-cell RNA sequencing (scRNA-seq), first reported by Tang et al. (2009), is a high-throughput technology for sequencing the transcriptome at the single-cell level, revealing the heterogeneity between cells. The technology plays a significant role in many fields, such as developmental biology and microbiology, and has gained a lot of attention in life science research (Kelsey et al., 2017; Stubbington et al., 2019).

The advent of scRNA-seq technology has greatly helped reveal hidden biological functions. However, scRNA-seq data are noisy and incomplete, containing a large number of zero values. Zero values caused by failures of signal detection are called dropouts (Liu & Trapnell, 2016). A dropout event results from a failure to amplify the original RNA transcript, and the resulting noise may disrupt underlying biological signals and hinder downstream analysis. Hence, distinguishing true biological zeros from false zeros in scRNA-seq data is a major challenge.

Y. Feng · Y. Ai · H. Jiang (B)

School of Mathematics, Renmin University of China, No. 59 Zhongguancun Avenue, Haidian District, Beijing, China e-mail: jiangh@ruc.edu.cn

Y. Feng e-mail: fengyifannn@163.com

Y. Ai e-mail: aiyutong1996@qq.com

© The Author(s) 2023

Z. Zheng (ed.), *Proceedings of the Second International Forum on Financial Mathematics and Financial Technology*, Financial Mathematics and Fintech, https://doi.org/10.1007/978-981-99-2366-3\_11

A great number of imputation methods have been proposed to handle missing values in bulk RNA-seq data (Moorthy et al., 2019). For example, Kim et al. proposed a local least squares imputation method called LLSimpute (Kim et al., 2004), which uses least squares optimization to represent a missing gene as a linear combination of similar genes. Aittokallio (2010) proposed a method based on fuzzy clustering and gene ontology to estimate missing values in microarray data. However, these methods may not be directly applicable to scRNA-seq data, which are much sparser than bulk RNA-seq data.

In designing imputation methods for scRNA-seq data, some researchers interpret the observed data through probability distribution models. Typical models assume that scRNA-seq data follow a Poisson or negative binomial distribution. Ziegenhain (2017) and various other studies show that, in the absence of real expression differences, the mean-variance relationship of genes or proteins closely follows a Poisson distribution (Grün et al., 2014). The randomness of single-cell sequencing technology leads to an excess of zero values in single-cell data, and many studies include zero inflation to explain them (Fan et al., 2016; Parekh et al., 2017; Pierson and Yau, 2015; Risso et al., 2017).

MAGIC (Dijk et al., 2017) is a graph-based imputation method built on a Markov affinity matrix. For a given cell, MAGIC first finds its most similar cells and aggregates their gene expression, thereby estimating the expression affected by dropout events and other noise sources. However, because scRNA-seq data are sparse, the nearest neighbors in the original data may not be the most biologically similar cells, which may introduce new bias and eliminate meaningful biological properties. KNN-smoothing (Wagner et al., 2017) performs imputation by identifying the k nearest neighbors of each cell and updating its expression with their aggregated expression. DrImpute (Gong et al., 2018) is also a smoothing method, designed on the basis of consensus clustering of scRNA-seq data (Kiselev et al., 2017). It uses Spearman and Pearson correlation coefficients to compute distance matrices between cells and applies K-means to cluster the distance matrices over a range of expected cluster numbers. These methods represent the class of smoothing-based imputation methods.

Model-based methods constitute a large proportion of imputation methods for scRNA-seq data. scImpute (Wei et al., 2018) uses a mixture model to distinguish dropout zeros from true zeros. However, scImpute assumes that each gene has a single dropout rate, whereas the dropout rate of a gene has been shown to depend on many factors, such as cell type and RNA-seq protocol (Kharchenko et al., 2014), so the choice of dropout rate may need further study. SAVER (Mo et al., 2017) assumes that the original data follow a Poisson distribution, builds a prediction model for each gene from the observed gene counts (UMIs), and then uses a weighted average of the observed count and the predicted value to recover the true expression of each gene in each cell. netNMF-sc (Elyanow et al., 2020) combines network-regularized nonnegative matrix factorization with a zero-inflation process for the transcript count matrix. VIPER (Chen and Zhou, 2018) is based on a nonnegative sparse regression model that imputes a cell by actively selecting a sparse set of local neighborhood cells. In addition, VIPER models dropout probability in a cell-type-specific and gene-specific manner and infers all model parameters from the data with an efficient quadratic programming algorithm.

In recent years, deep-learning-based imputation methods have gained a lot of attention. AutoImpute (Talwar et al., 2018) applies a deep autoencoder to the sparse gene expression matrix. DCA (deep count autoencoder) (Eraslan et al., 2019) is based on a negative binomial noise model; it learns gene-specific distribution parameters without supervision by minimizing the reconstruction error and can be applied to data sets of millions of cells. DeepImpute (deep neural network imputation) (Arisdakessian et al., 2018) imputes genes by constructing multiple sub-neural networks. It uses dropout layers and loss functions to learn the distribution of the data, builds a predictive model, and imputes only the missing values.

Ensemble methods aim to fully integrate the advantages of existing methods. EnImpute (Zhang, 2019) combines the results of eight imputation methods (ALRA, DCA, DrImpute, MAGIC, SAVER, scImpute, scRMD, and Seurat) and takes a trimmed mean to obtain robust results. SHARP (Wan et al., 2020) is based on ensemble random projection (RP) and can handle up to 10 million cells.

Among the above methods, smoothing-based methods impute missing values from the expression of similar cells and therefore rely heavily on the distance measure used to define similarity. Model-based methods can better distinguish true zeros from dropouts, but their results depend largely on model assumptions, which may limit their generalization ability. Deep learning methods are highly scalable and can process larger data sets, but they require long training times and consume more memory than other methods. In this chapter, we propose LLE-KNN-smoothing, which combines locally linear embedding (LLE) (Zhou, 2016) with KNN-smoothing for single-cell data imputation. On real data, LLE's ability to capture global nonlinearity while preserving the local manifold structure helps restore the data better. Compared with other methods, we believe that LLE-KNN-smoothing achieves better results.

### **Algorithm 1** K-nearest neighbor smoothing for UMI-filtered scRNA-Seq data

#### **Input:**

*p*, the number of genes.

*n*, the number of cells.

*X*, a *p* × *n* matrix.

*k*, the number of neighbors to use for smoothing.

*d*, the number of principal components to use for determining neighbors.

#### **Output:**

*S*, a *p* × *n* smoothed matrix.

```
procedure KNN-SMOOTH(p, n, X, k, d)
  S = COPY(X)
  steps = CEIL(LOG2(k + 1))
  for t = 1 to steps do
    M = MEDIAN-NORMALIZE(S)
    F = FREEMAN-TUKEY-TRANSFORM(M)
    Y = LEADING-PC-SCORES(F, d)
    D = PAIRWISE-DISTANCE(Y)
    A = ARGSORT-ROWS(D)
    k_step = MIN(2^t - 1, k)
    for j = 1 to n do               // reset S before aggregating
      for i = 1 to p do
        S_ij = 0
      end for
    end for
    for j = 1 to n do               // sum raw counts of cell j and its k_step nearest neighbors
      for v = 1 to k_step + 1 do
        u = A_jv
        for i = 1 to p do
          S_ij = S_ij + X_iu
        end for
      end for
    end for
  end for
  return S
end procedure
```
# **2 Materials and Methods**

# *2.1 The K-Nearest Neighbor Smoothing Algorithm*

The k-nearest neighbor smoothing (KNN-smoothing) algorithm performs imputation by aggregating information from similar cells, following the k-nearest neighbor (KNN) idea. The algorithm is formalized in Algorithm 1. Here, $X\_{ij}$ denotes the expression of the $i$th gene in the $j$th cell of *X*. COPY(*X*) returns an independent memory copy of *X*. MEDIAN-NORMALIZE(*X*) returns a new matrix of the same dimensions as *X*, in which the values in each column have been scaled by a constant so that the column sum equals the median column sum of *X*. FREEMAN-TUKEY-TRANSFORM(*X*) returns a new matrix of the same shape as *X*, in which every value has been Freeman-Tukey transformed (FTT) (Freeman and Tukey, 1950), i.e., $f(x) = \sqrt{x} + \sqrt{x + 1}$. LEADING-PC-SCORES(*X*, *d*) returns the principal component scores of the observations in *X* (contained in the columns) for the first *d* principal components. PAIRWISE-DISTANCE(*X*) computes the pairwise distance matrix *D* from *X*, where $D\_{ij}$ is the Euclidean distance between the $i$th and $j$th columns of *X*. For a matrix *D* with *n* columns, ARGSORT-ROWS(*D*) returns a matrix of indices *A* that sorts *D* row-wise, i.e., $D\_{jA\_{j1}} \le D\_{jA\_{j2}} \le \dots \le D\_{jA\_{jn}}$ for all $j$.
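The subroutines and the smoothing loop of Algorithm 1 can be sketched in pure Python as follows (genes in rows, cells in columns). The embedding step is passed in as a function `embed`, so the same loop can use PCA scores (Algorithm 1) or LLE coordinates (Algorithm 2); the identity embedding in the example is only a stand-in that keeps the sketch dependency-free and is not part of either algorithm. For realistic data sizes, a vectorized array implementation would be preferable.

```python
import math


def median_normalize(X):
    """Scale each column so that its sum equals the median column sum."""
    p, n = len(X), len(X[0])
    sums = [sum(X[i][j] for i in range(p)) for j in range(n)]
    s = sorted(sums)
    med = s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2
    return [[X[i][j] * med / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(p)]


def freeman_tukey(X):
    """Freeman-Tukey transform f(x) = sqrt(x) + sqrt(x + 1), element-wise."""
    return [[math.sqrt(x) + math.sqrt(x + 1) for x in row] for row in X]


def pairwise_distance(Y):
    """Euclidean distance matrix between the columns (cells) of Y."""
    n = len(Y[0])
    cols = [[row[j] for row in Y] for j in range(n)]
    return [[math.dist(cols[a], cols[b]) for b in range(n)] for a in range(n)]


def argsort_rows(D):
    """Row j lists column indices sorted by increasing distance from cell j."""
    return [sorted(range(len(row)), key=lambda v, r=row: (r[v], v)) for row in D]


def knn_smooth(X, k, embed):
    """Smooth the p x n count matrix X with k neighbors; embed maps p x n to d x n."""
    p, n = len(X), len(X[0])
    S = [row[:] for row in X]
    steps = math.ceil(math.log2(k + 1))
    for t in range(1, steps + 1):
        F = freeman_tukey(median_normalize(S))
        A = argsort_rows(pairwise_distance(embed(F)))
        k_step = min(2 ** t - 1, k)
        # New S: raw counts of each cell plus its k_step nearest neighbors
        # (A[j][0] is normally cell j itself, at distance 0).
        S = [[sum(X[i][u] for u in A[j][:k_step + 1]) for j in range(n)]
             for i in range(p)]
    return S


X = [[10, 12, 0],
     [0, 1, 9]]
print(knn_smooth(X, k=1, embed=lambda F: F))  # [[22, 22, 12], [1, 1, 10]]
```

Each pass doubles (roughly) the neighborhood size, so cells are smoothed gradually rather than averaged over all *k* neighbors at once.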

# *2.2 Locally Linear Embedding*

The k-nearest neighbor smoothing algorithm depends heavily on how distances are evaluated, so LEADING-PC-SCORES (*X*, *d*) is a critical step. Because PCA is a linear embedding method that may neglect the nonlinear intrinsic structure of scRNA-seq data, we propose an LLE-based method for the low-dimensional projection of scRNA-seq data.

LLE is a dimensionality reduction method based on the concept of a topological manifold. It assumes that each sample point and its neighbors in the high-dimensional space lie approximately on a local hyperplane, so each sample point can be reconstructed as a linear combination of its neighbors. Because LLE only uses the k-nearest neighbor information of each point, it is computationally efficient. Let $X = (\mathbf{x}\_1, \mathbf{x}\_2, \dots, \mathbf{x}\_N) \in \mathbb{R}^{D\times N}$; each data point $\mathbf{x}\_i \in \mathbb{R}^{D\times 1}$ can be represented as a linear combination of its $k$ nearest neighbors:

$$\mathbf{x}\_i = \sum\_{j=1}^k w\_{ji}\mathbf{x}\_{ji} \tag{1}$$

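where $\mathbf{w}\_i \in \mathbb{R}^{k\times 1}$, $w\_{ji}$ is the $j$th element of $\mathbf{w}\_i$, and $\mathbf{x}\_{ji}$ is the $j$th nearest neighbor of $\mathbf{x}\_i$ (in (2), $x\_{1i}, \dots, x\_{Di}$ denote the coordinates of $\mathbf{x}\_i$), i.e.,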

$$\mathbf{w}\_{i} = \begin{bmatrix} w\_{1i} \\ w\_{2i} \\ \vdots \\ w\_{ki} \end{bmatrix} \quad \mathbf{x}\_{i} = \begin{bmatrix} x\_{1i} \\ x\_{2i} \\ \vdots \\ x\_{Di} \end{bmatrix} \tag{2}$$

The weights are obtained by minimizing the following loss function, subject to $\sum\_{j=1}^{k} w\_{ji} = 1$ for each $i$:

$$\arg\min\_{\mathbf{w}} \sum\_{i=1}^{N} \left\| \mathbf{x}\_i - \sum\_{j=1}^{k} w\_{ji} \mathbf{x}\_{ji} \right\|^2 \tag{3}$$

Solving this problem yields the weight matrix

$$\mathbf{w} = [w\_1, w\_2, \dots, w\_N] \tag{4}$$

where $\mathbf{w}\_i \in \mathbb{R}^{k\times 1}$ is the weight vector of the $i$th data point ($i = 1, 2, \dots, N$), so that $\mathbf{w} \in \mathbb{R}^{k\times N}$.

After reducing the original data from $D$ dimensions to $d$ dimensions, $\mathbf{x}\_i \to \mathbf{y}\_i$, each reduced point can still be expressed as a linear combination of its $k$ nearest neighbors with unchanged combination coefficients, so the loss function can be written as:

$$\arg\min\_{\mathbf{y}} \sum\_{i=1}^{N} \left\| \mathbf{y}\_i - \sum\_{j=1}^{k} w\_{ji} \mathbf{y}\_{ji} \right\|^2 \tag{5}$$

where the columns of $Y$ are the data points in the low-dimensional space obtained after dimensionality reduction:

$$Y = [\mathbf{y}\_1, \mathbf{y}\_2, \dots, \mathbf{y}\_N] \tag{6}$$

Using the constraint $\sum\_{j=1}^{k} w\_{ji} = 1$, we can rewrite the optimization objective as follows:

$$\begin{aligned} \Phi(w) &= \sum\_{i=1}^{N} \left\| x\_i - \sum\_{j=1}^{k} w\_{ji} x\_{ji} \right\|^2 \\ &= \sum\_{i=1}^{N} \left\| \sum\_{j=1}^{k} \left( x\_i - x\_{ji} \right) w\_{ji} \right\|^2 \\ &= \sum\_{i=1}^{N} \left\| \left( X\_i - N\_i \right) w\_i \right\|^2 \\ &= \sum\_{i=1}^{N} w\_i^T \left( X\_i - N\_i \right)^T \left( X\_i - N\_i \right) w\_i \end{aligned} \tag{7}$$

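where $X\_i = [\mathbf{x}\_i, \dots, \mathbf{x}\_i] \in \mathbb{R}^{D\times k}$ repeats $\mathbf{x}\_i$ $k$ times and $N\_i = [\mathbf{x}\_{1i}, \dots, \mathbf{x}\_{ki}] \in \mathbb{R}^{D\times k}$ collects its $k$ nearest neighbors. Defining the $k \times k$ local covariance matrix $S\_i$ as follows, we have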

$$S\_i = \left(X\_i - N\_i\right)^T \left(X\_i - N\_i\right)$$

$$\Phi(\boldsymbol{w}) = \sum\_{i=1}^N w\_i^T S\_i w\_i \tag{8}$$

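Introducing a Lagrange multiplier $\lambda$ for the constraint $w\_i^T \mathbf{1}\_k = 1$, each $w\_i$ can be optimized separately with the Lagrangian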

$$L\left(w\_i\right) = w\_i^T S\_i w\_i + \lambda \left(w\_i^T \mathbf{1}\_k - 1\right) \tag{9}$$

Setting the derivative to zero gives the optimal solution:


$$\frac{\partial L(w\_i)}{\partial w\_i} = 2S\_i w\_i + \lambda 1\_k = 0 \tag{10}$$

$$w\_i = \frac{S\_i^{-1} \mathbf{1}\_k}{\mathbf{1}\_k^T S\_i^{-1} \mathbf{1}\_k} \tag{11}$$

where $\mathbf{1}\_k$ is the $k \times 1$ column vector of ones and the local covariance matrix $S\_i$ is $k \times k$. Equation (10) gives $w\_i = -\frac{\lambda}{2} S\_i^{-1} \mathbf{1}\_k$, and normalizing so that $w\_i^T \mathbf{1}\_k = 1$ yields (11): the denominator $\mathbf{1}\_k^T S\_i^{-1} \mathbf{1}\_k$ is the sum of all elements of $S\_i^{-1}$, and the numerator $S\_i^{-1} \mathbf{1}\_k$ is the column vector of row sums of $S\_i^{-1}$.
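Equation (11) can be checked numerically with the pure-Python sketch below. Adding a small multiple of the trace to the diagonal of `S_i` (the `reg` parameter) is a common safeguard when `S_i` is singular, for example when $k > D$; this regularization is our assumption and is not part of the derivation above.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for cc in range(c, n + 1):
                M[r][cc] -= f * M[c][cc]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][cc] * x[cc] for cc in range(r + 1, n))) / M[r][r]
    return x


def lle_weights(x, neighbors, reg=1e-3):
    """Reconstruction weights of point x from its k neighbors, following Eq. (11)."""
    k = len(neighbors)
    Z = [[xi - ni for xi, ni in zip(x, nb)] for nb in neighbors]  # row j: x_i - x_ji
    S = [[sum(a * b for a, b in zip(Z[r], Z[c])) for c in range(k)] for r in range(k)]
    trace = sum(S[j][j] for j in range(k))
    for j in range(k):
        S[j][j] += reg * trace if trace > 0 else reg  # regularization (assumption)
    z = solve(S, [1.0] * k)  # S_i^{-1} 1_k: row sums of the (regularized) inverse
    total = sum(z)           # 1_k^T S_i^{-1} 1_k: sum of all entries of the inverse
    return [v / total for v in z]


w = lle_weights([1.0, 0.0], [[0.0, 0.0], [2.0, 0.0]])
print(w)  # approximately [0.5, 0.5]
```

Here the midpoint of two neighbors is reconstructed with equal weights, and the weights sum to one as required by the constraint.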

Finally, the optimization problem for the low dimensional embedding becomes

$$\arg\min\_{Y} \Psi(Y) = \sum\_{i=1}^{N} \left\| \mathbf{y}\_i - \sum\_{j=1}^{k} w\_{ji} \mathbf{y}\_{ji} \right\|^2 \tag{12}$$

$$\text{s.t.} \sum\_{i=1}^{N} \mathbf{y}\_i = \mathbf{0}, \sum\_{i=1}^{N} \mathbf{y}\_i \mathbf{y}\_i^T = NI\_{d \times d} \tag{13}$$

where

$$Y = [\mathbf{y}\_1, \mathbf{y}\_2, \dots, \mathbf{y}\_N] \in \mathbb{R}^{d \times N} \tag{14}$$

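Let $W \in \mathbb{R}^{N\times N}$ be the weight matrix whose entry $W\_{im}$ equals $w\_{ji}$ when $\mathbf{x}\_m$ is the $j$th nearest neighbor of $\mathbf{x}\_i$ and 0 otherwise, and let $M$ denote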

$$M = \left(I - W\right)^{T} \left(I - W\right) \tag{15}$$

The optimization problem can be rewritten as:

$$\arg\min\_{Y} \operatorname{tr}(YMY^T), \quad \text{s.t. } Y\mathbf{1}\_N = \mathbf{0},\ YY^T = NI\_{d \times d} \tag{16}$$

Hence the rows of $Y$ (the columns of $Y^T$) are eigenvectors of $M$. The smallest eigenvalue of $M$ is 0, with the constant vector as its eigenvector; this trivial solution is discarded, and we take the eigenvectors corresponding to the $d$ smallest nonzero eigenvalues of $M$.
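The step from (12) to (16) can be made explicit. Assume (as our reading of the convention behind (15)) that row $i$ of $W$ holds the weights $w\_{ji}$ of point $i$, placed in the columns of its neighbors, and let $\mathbf{e}\_i$ be the $i$th standard basis vector. Then

$$\sum\_{j=1}^{k} w\_{ji}\mathbf{y}\_{ji} = \sum\_{m=1}^{N} W\_{im}\mathbf{y}\_m = Y W^T \mathbf{e}\_i$$

$$\Psi(Y) = \sum\_{i=1}^{N} \left\| Y (I - W^T) \mathbf{e}\_i \right\|^2 = \left\| Y (I - W)^T \right\|\_F^2 = \operatorname{tr}\left(Y (I - W)^T (I - W) Y^T\right) = \operatorname{tr}(YMY^T)$$

Because each row of $W$ sums to 1,

$$(I - W)\mathbf{1}\_N = \mathbf{1}\_N - W\mathbf{1}\_N = \mathbf{0} \quad\Rightarrow\quad M\mathbf{1}\_N = \mathbf{0}$$

so the constant vector is an eigenvector of $M$ with eigenvalue 0. It corresponds to a degenerate embedding in which all points coincide and is ruled out by the centering constraint $\sum\_{i} \mathbf{y}\_i = \mathbf{0}$ in (13); since $M$ is symmetric, its remaining eigenvectors are orthogonal to $\mathbf{1}\_N$ and satisfy that constraint automatically.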

# **3 Results**

# *3.1 Availability of Data*

The scRNA-seq data sets are available from the Gene Expression Omnibus (GEO) database. We use three data sets for method evaluation: Brain (Darmanis et al., 2015), Zeisel, and Klein. Zeisel and Klein can be downloaded from GEO with accession numbers GSE60361 and GSE65525, respectively (Table 1).

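### **Algorithm 2** Locally linear embedding based k-nearest neighbor smoothing for UMI-filtered scRNA-Seq data

#### **Input:**

*p*, the number of genes.

*n*, the number of cells.

*X*, a *p* × *n* matrix.

*k*, the number of neighbors to use for smoothing.

*d*, the dimension of the LLE embedding used for determining neighbors.

#### **Output:**

*S*, a *p* × *n* smoothed matrix.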

```
procedure LLE-SMOOTHING(p, n, X, k, d)
  S = COPY(X)
  steps = CEIL(LOG2(k + 1))
  for t = 1 to steps do
    M = MEDIAN-NORMALIZE(S)          // a new p × n matrix
    F = FREEMAN-TUKEY-TRANSFORM(M)   // a new p × n matrix
    Y = LLE(F, d)                    // a new d × n matrix
    D = PAIRWISE-DISTANCE(Y)         // a new n × n matrix
    A = ARGSORT-ROWS(D)              // a new n × n matrix
    k_step = MIN(2^t - 1, k)
    for j = 1 to n do                // reset S before aggregating
      for i = 1 to p do
        S_ij = 0
      end for
    end for
    for j = 1 to n do                // sum raw counts of cell j and its k_step nearest neighbors
      for v = 1 to k_step + 1 do
        u = A_jv
        for i = 1 to p do
          S_ij = S_ij + X_iu
        end for
      end for
    end for
  end for
  return S
end procedure
```
# *3.2 Data Processing and Visualization*

The input of our method is a count matrix *X* whose rows represent genes and whose columns represent cells. After logarithmic transformation and FTT following the procedure of Algorithm 2, *X* is mapped to a *d*-dimensional space by LLE. The Euclidean distances between cells in this space form an *n* × *n* distance matrix, from which each cell's nearest neighbors are determined, and the data are then smoothed step by step as the number of neighbors grows from 1 to *k*. We use t-distributed stochastic neighbor embedding (t-SNE) to visualize the data.


**Table 1** Summary of data sets used for imputation

# *3.3 Performance Evaluation*

For evaluation, we cluster the imputed data with SC3 to assess the imputation effect. The adjusted Rand index (ARI) measures the agreement between the original cell labels of each data set and the cluster labels produced by SC3. Compared with the other imputation methods, LLE-KNN-smoothing gives the best ARI on all three data sets (Table 2). Regarding the number of nearest neighbors *k*, the ARI of LLE-KNN-smoothing is relatively high on all data sets, and as *k* varies, its clustering accuracy changes slowly and remains stable (Figs. 1, 2 and 3).
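For reference, the ARI can be computed from the contingency table of the two labelings. The pure-Python sketch below follows the standard Hubert-Arabie adjusted Rand index; in practice a library routine (for example scikit-learn's `adjusted_rand_score`) would typically be used, and the SC3 clustering step itself (an R/Bioconductor package) is not shown.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred) -> float:
    """Hubert-Arabie adjusted Rand index between two flat clusterings (n >= 2)."""
    n = len(labels_true)
    pair_counts = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    true_pairs = sum(comb(c, 2) for c in Counter(labels_true).values())
    pred_pairs = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = true_pairs * pred_pairs / comb(n, 2)  # expected index under random labels
    max_index = (true_pairs + pred_pairs) / 2
    if max_index == expected:  # degenerate case, e.g. both labelings are a single cluster
        return 1.0
    return (pair_counts - expected) / (max_index - expected)


print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))          # 1.0
print(round(adjusted_rand_index([0, 0, 1, 2], [0, 0, 1, 1]), 4))  # 0.5714
```

The ARI is invariant to relabeling of clusters, which is why the first call returns 1.0 even though the label names are swapped.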

We use t-SNE visualization to compare the strengths and weaknesses of the methods on the different data sets. LLE-KNN-smoothing outperforms the other methods (Table 2), separating different classes more clearly while keeping cells of the same class more compact (Fig. 4 shows the result on data set Brain).

**Fig. 1** ARI of the four imputation methods for different values of *k* on data set Brain

**Fig. 2** ARI of the four imputation methods for different values of *k* on data set Zeisel

**Fig. 3** ARI of the four imputation methods for different values of *k* on data set Klein


**Table 2** ARI of different imputation methods using SC3 clustering results (*k* = 32)

# **4 Conclusions**

In this chapter, we have used several data sets to demonstrate that LLE-KNN-smoothing performs better than other methods. In future work, we will study how to select the parameters *k* and *d* and explore other manifold learning methods. We will also investigate the effect of smoothing on differential expression analysis, gene set enrichment analysis, trajectory inference, etc., where we anticipate that the LLE-KNN-smoothing algorithm will also perform well.

**Fig. 4** t-SNE visualization of the reduced dimensions for the five imputation methods on data set Brain. **a** Raw data. **b**–**f** Data after KNN-smoothing, KNN-smoothing (KPCA), KNN-smoothing (UMAP), LLE-KNN-smoothing, and MAGIC, respectively

**Acknowledgements** This work was supported by the National Natural Science Foundation of China (Grant no. 11901575).

#### **Conflict of interest**

The authors declare there is no conflict of interest.

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **The Application of Time Series Analysis in the Fiscal Budget Variance of China**

**Guanhua Chen and Xinqi Gong**

**Abstract** Irregular behaviors during budget planning and execution are reflected in the difference between budgeted and actual figures, known as the budget variance. Because both processes are led by the Government of China (hereinafter GOC), the budget variance is widely used to evaluate the fiscal system. This chapter collects State General Public Budget data from 2000 to 2018 and analyzes their influence on budget variance. Budget variance is then forecast by modeling budget execution and the budget variance rate separately. Descriptive analysis and the AIC (Akaike Information Criterion) are used to choose candidate models, and the RMSE (Root Mean Square Error) on test data is used to select the final model. The forecast shows that the extent of budget variance will be further controlled in 2011 and 2012; this chapter explains the result with fiscal theories to strengthen its credibility and offers several policy recommendations for Chinese budget reform.

**Keywords** Budget variance · Time series analysis · Fiscal science

# **1 Introduction**

Budgeting is an important administrative process that reveals both the scope and direction of government action and the effectiveness of the monitoring of government activities by the National People's Congress (hereinafter NPC) and the private sector (Chen, 2000). Since the tax-sharing reform of 1994, China has accomplished extensive and profound reforms of the budgeting process. However, there is still room for improvement. The "Decision of the State Council on Deepening the Reform of the Budget Management System", issued in 2014, summarized the remaining problems as follows: "the budgeting is not scientific enough, the budget system needs more supervision, the scale of financial carry-over and balance funds is large, and the budget data needs more transparency", among others. Irregular behaviors in budget planning and execution are reflected in the budget variance. One characteristic of a scientific, transparent, and standardized budget system is that final-account revenue and expenditure are consistent with those planned in the budget.

G. Chen · X. Gong (✉)

Institute for Mathematical Sciences, Renmin University of China, No. 59 Zhongguancun Avenue, Haidian District, Beijing, China e-mail: xinqigong@ruc.edu.cn

G. Chen e-mail: 13917773472@163.com

X. Gong

Beijing Advanced Innovation Center for Structural Biology, Tsinghua University, Beijing, China

© The Author(s) 2023 Z. Zheng (ed.), *Proceedings of the Second International Forum on Financial Mathematics and Financial Technology*, Financial Mathematics and Fintech, https://doi.org/10.1007/978-981-99-2366-3\_12

Since the beginning of the 21st century, the budget variance of the Chinese government has first risen and then fallen. From 2000 to 2011, budget variance generally expanded year by year (Sun & Wu, 2012; Wang, 2009), reaching 15.8% over-collection and 9% over-spending in 2011. From 2012 to 2018, it was significantly controlled. Over the whole period from 2000 to 2018, the average budget variance rate was 6.39% for revenue and 4.29% for expenditure. Figures from the United Kingdom and the United States are instructive for further reform: the average budget deviation of public-sector current revenue in the United Kingdom from 2001 to 2003 was −2.8% (Wang, 2009), and in the United States the average deviation of expenditure from the fiscal budget was 2.1% in 2007 (Cui, n.d.). Compared with these countries, China still has a long way to go in reducing budget variance.

The budget is inherently uncertain, so a certain degree of variance is acceptable. However, large variance can cause economic and institutional problems. For instance, excessive revenue increases the burden on taxpayers and dampens market vitality, while excessive under-spending leads to inadequate government provision of public goods (e.g., infrastructure), affecting social stability and people's livelihoods. Institutionally, excessive budget variance implies weak monitoring of the government. It is therefore important for the Chinese government to accelerate budget reform and to establish a modern fiscal system that promotes the modernization of national governance capacity.

Budget variance is of considerable research value: its historical data can reveal the achievements of reform in China's fiscal system, and forecasts of this indicator can show whether there is still room for improvement and provide a reference for the direction of further reform. The main work of this chapter has three parts. First, we analyze China's budget variance, clarifying through descriptive statistics the links between fiscal deviations and the domestic and international environment. Then we build a series of time series models and select the best one to predict the variance over the next two years. Finally, we draw on fiscal theory to explain the prediction and offer policy advice.

# **2 A General View on Budget Data**

# *2.1 Introduction to Concept and Data Source*

The State General Public Budget is one of the four major components of China's fiscal system and can be divided into revenue and expenditure. Revenue mainly includes tax income and non-tax income, such as confiscation income, as well as funds transferred from the central government to local governments. Expenditure is aimed at improving people's livelihoods and maintaining national security and includes spending on national defense, education, and infrastructure. The budgeting process can be divided into two stages: budget planning and budget execution. The former is the state's annual fiscal revenue and expenditure plan, examined and approved through legal procedures; it stipulates the sources of national revenue and the purposes of spending, reflecting the scope and direction of government activities. The latter is the annual implementation of the budget plan, reflecting actual economic activity at the state level.

The budget variance rate is calculated as:

$$\text{Budget Variance Rate} = \frac{\text{Budget Execution} - \text{Budget Planning}}{\text{Budget Planning}} \tag{1}$$

The chapter uses State General Public Budget revenue and expenditure data from 2000 to 2018 (see Table 1), obtained from the China Financial Yearbook.

# *2.2 Descriptive Analysis of Budget Variance*

As seen in Fig. 1, budget deviation was common throughout 2000 to 2018. In most years the budget was over-collected and over-spent; under-collection occurred only in 2015 and under-spending only in 2014. Over-collection rates peaked in 2007 (16.5%), 2011 (15.8%), and 2010 (12.4%), while over-expenditure rates peaked in 2011 (9%), 2001 (8.9%), and 2007 (7%). The average revenue budget variance rate from 2000 to 2018 is 6.4%, higher than the 4.3% for expenditure. The revenue and expenditure variance rates move together, with a correlation coefficient of 0.77, indicating a strong positive correlation.
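Eq. 1 and the summary statistics quoted above (averages and a correlation coefficient) can be computed with a few lines of Python. The sketch below is a minimal illustration, not the authors' code: the function names are hypothetical, and the planned and executed figures in the demonstration are made-up placeholders, because the Table 1 values are not reproduced here.

```python
"""Budget variance rate (Eq. 1), its average, and the revenue-expenditure correlation.

All figures in the demo are hypothetical placeholders, NOT the chapter's Table 1 data.
"""
from math import sqrt


def variance_rate(execution, planned):
    """Eq. 1: (execution - planned) / planned, returned as a fraction."""
    if planned == 0:
        raise ValueError("planned amount must be non-zero")
    return (execution - planned) / planned


def mean(xs):
    return sum(xs) / len(xs)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical (planned, executed) pairs for four years.
    revenue = [(100.0, 110.0), (120.0, 126.0), (150.0, 147.0), (160.0, 168.0)]
    expenditure = [(105.0, 111.0), (125.0, 128.0), (155.0, 155.0), (170.0, 175.0)]
    rev_rates = [variance_rate(e, p) for p, e in revenue]
    exp_rates = [variance_rate(e, p) for p, e in expenditure]
    print("mean revenue rate:", round(mean(rev_rates), 4))
    print("mean expenditure rate:", round(mean(exp_rates), 4))
    print("correlation:", round(pearson(rev_rates, exp_rates), 3))
```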

The trend of budget variance is influenced both by domestic fiscal policy and by the global economic environment. Owing to China's two successive proactive fiscal policies from 1998 to 2004, the revenue budget variance rate stayed around 7% and eventually peaked in 2007. In 2008 and 2009, the negative impact of the U.S. financial crisis caused budget variance to fall back. The new round of fiscal expansion in 2010 and 2011 helped China and the world recover from the crisis, but it also led to high budget variance (Chen & Lv, 2019). In the following years, China's economy entered the "new normal": budget variance began to decline as GDP growth slowed, with under-spending appearing in 2014 and under-collection in 2015. Since 2014, the Ministry of Finance has promoted structural tax and fee cuts while moderately expanding the fiscal deficit. Consistent with this policy, the expenditure budget variance rate exceeded the revenue budget variance rate for the first time in 2015 and remained higher in the following years, growing slowly.

**Table 1** State General Public Budget execution from 2000 to 2018 (100 million yuan; %). Over-collection and over-expenditure are common from 2000 to 2018: over-collection rates peak in 2007 (16.5%), 2011 (15.8%), and 2010 (12.4%), while over-expenditure rates peak in 2011 (9%), 2001 (8.9%), and 2007 (7%)

Budget variance was generally controlled after 2015, mainly because the new budget law came into effect that year and deepened the fiscal reform. The reduction in budget variance is the latest achievement in establishing a modern fiscal system in China and indicates that the budgeting process is becoming more scientific. The fact that the average expenditure budget variance is smaller than the revenue budget variance suggests that budget review is more stringent for expenditure than for revenue. The lack of systematic auditing and monitoring of over-collected funds leads to their less frugal use; over-collection therefore partially explains over-expenditure (Focus on budget deviations, 2008).

**Fig. 1** State General Public Budget variance rate from 2000 to 2018 (%). The revenue and expenditure variance rates peaked in 2007 and 2011 and were significantly controlled after 2014

# *2.3 Descriptive Analysis of Budget Execution*

As seen in Fig. 2, budget execution follows a quarterly cycle. Revenue peaks in the second quarter and falls back in the rest of the year, whereas the largest share of annual spending occurs in the fourth quarter. From 2000 to 2018, average revenue in the four quarters was 20,789.60, 23,961.65, 19,442.53, and 19,476.90, while average expenditure was 17,888.10, 23,212.16, 21,828.40, and 30,602.28 (all in 100 million yuan). The year-end surge in spending is attributed to insufficiently scientific budgeting and weak monitoring of over-collected funds. The difference between revenue and expenditure is usually positive in the first half of the year and negative in the second half.
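The quarterly pattern can be checked directly from the averages quoted above. The sketch below takes those eight averages as given and computes the revenue-minus-expenditure gap for each quarter and the fourth quarter's share of annual spending; the helper `quarter_means` only shows how such averages would be obtained from a raw quarterly series and, like the other function names, is a hypothetical illustration rather than the authors' code.

```python
"""Quarterly pattern of budget execution from the averages quoted in Sect. 2.3."""

# Average revenue and expenditure for Q1..Q4 over 2000-2018, as quoted in the text
# (same unit as Table 1).
REVENUE = [20789.60, 23961.65, 19442.53, 19476.90]
EXPENDITURE = [17888.10, 23212.16, 21828.40, 30602.28]


def quarter_means(series):
    """Average each quarter across years for a flat Q1, Q2, Q3, Q4, Q1, ... series."""
    if not series or len(series) % 4:
        raise ValueError("series must contain whole years of four quarters")
    years = len(series) // 4
    return [sum(series[q::4]) / years for q in range(4)]


def quarterly_gap(revenue, expenditure):
    """Revenue minus expenditure, quarter by quarter."""
    return [round(r - e, 2) for r, e in zip(revenue, expenditure)]


def shares(values):
    """Share of each quarter in the annual total."""
    total = sum(values)
    return [v / total for v in values]


if __name__ == "__main__":
    print("gap by quarter:", quarterly_gap(REVENUE, EXPENDITURE))
    print("Q4 share of spending:", round(shares(EXPENDITURE)[3], 3))
```

The gap is positive in the first two quarters and negative in the last two, matching the description above.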

# **3 Overview of Time Series Analysis Techniques**

A time series is an ordered sequence of random variables. The most common time series are discrete stochastic processes observed at successive, equally spaced time points. Time series analysis describes the intrinsic structure of a series, and the goal of modeling is usually prediction. In brief, time series forecasting is the art of predicting the future by understanding the past.

**Fig. 2** The State General Public Budget execution from 2000 to 2018 (quarterly; 100 million yuan). Revenue and expenditure grow over time, with seasonal fluctuations on a four-quarter cycle. The difference between revenue and expenditure shows a slight surplus in the first half of the year, which is offset by a sharp increase in expenditure in the last quarter

# *3.1 Decomposition of Time Series*

A time series can be decomposed into a long-term trend *T* (movement over a long period), seasonal variation *S* (regular variation due to seasonal changes), cyclical variation *C* (longer and more irregular cyclical movements), and irregular variation *L* (changes caused by many contingent factors). The series *Y* can be expressed as a function of these four components, *Y* = *F*(*T*, *S*, *C*, *L*), for example the additive model (*Y* = *T* + *S* + *C* + *L*) or the multiplicative model (*Y* = *T* × *S* × *C* × *L*). This chapter focuses on modeling *T*, *S*, and *L*.
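As a rough sketch of the additive model, the classical decomposition below estimates *T* with a centered 2×4 moving average, averages the detrended values quarter by quarter to obtain *S*, and treats the remainder as *L*. This textbook moving-average method is assumed here for illustration; the chapter does not state which decomposition procedure it used, and the function names are hypothetical. The cyclical component *C* is not separated and stays inside the trend estimate.

```python
"""Classical additive decomposition of a quarterly series: Y = T + S + L."""


def centered_ma4(y):
    """2x4 centered moving average; None where the window is incomplete."""
    trend = [None] * len(y)
    for t in range(2, len(y) - 2):
        trend[t] = (0.5 * y[t - 2] + y[t - 1] + y[t] + y[t + 1] + 0.5 * y[t + 2]) / 4
    return trend


def decompose_quarterly(y):
    """Return (trend, seasonal, irregular); index 0 is the first quarter of the series.

    The cyclical component C is not separated: the moving average keeps it
    inside the trend estimate.
    """
    if len(y) < 8:
        raise ValueError("need at least two years of quarterly data")
    trend = centered_ma4(y)
    # Average the detrended values quarter by quarter.
    buckets = [[] for _ in range(4)]
    for t, (obs, tr) in enumerate(zip(y, trend)):
        if tr is not None:
            buckets[t % 4].append(obs - tr)
    raw = [sum(b) / len(b) for b in buckets]
    center = sum(raw) / 4
    seasonal_index = [s - center for s in raw]  # indices sum to zero
    seasonal = [seasonal_index[t % 4] for t in range(len(y))]
    irregular = [None if tr is None else obs - tr - s
                 for obs, tr, s in zip(y, trend, seasonal)]
    return trend, seasonal, irregular


if __name__ == "__main__":
    # Synthetic series: linear trend plus a fixed quarterly pattern (illustrative only).
    pattern = [3.0, -1.0, 2.0, -4.0]
    y = [10.0 + 2.0 * t + pattern[t % 4] for t in range(24)]
    _, seasonal, _ = decompose_quarterly(y)
    print("seasonal indices:", [round(s, 3) for s in seasonal[:4]])
```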

# *3.2 ARIMA(p, d, q)*

In the *AR*(*p*) model, the value at time *t* is a linear combination of an intercept, the observations of the past *p* periods, and a normally distributed random error.

$$X\_t = \phi\_0 + \sum\_{i=1}^{p} \phi\_i X\_{t-i} + \varepsilon\_t. \tag{2}$$


Here *X<sub>t</sub>* is the value at time *t*, ε<sub>*t*</sub> is a normally distributed random error at time *t* with variance σ<sup>2</sup> that is independent of the past values *X<sub>t−1</sub>*, *X<sub>t−2</sub>*, …, and φ<sub>*i*</sub> is the weight of lag *i*. A necessary and sufficient condition for *AR*(*p*) to be stationary is that all roots of the characteristic equation 1 − φ<sub>1</sub>*z* − ⋯ − φ<sub>*p*</sub>*z*<sup>*p*</sup> = 0 lie outside the unit circle. In particular, the *AR*(1) model is a Markov process: *X<sub>t</sub>* depends on the past only through *X<sub>t−1</sub>*, and the process is stationary if and only if |φ<sub>1</sub>| < 1. Its properties under stationarity are shown below; the autocorrelation of *AR*(1) decays geometrically, so it tails off rather than cutting off.

$$E(X\_t) = \frac{\phi\_0}{1 - \phi\_1}.\tag{3}$$

$$\text{Var}(X\_t) = \frac{\sigma^2}{1 - \phi\_1^2}. \tag{4}$$

$$\gamma\_k = \begin{cases} \frac{\sigma^2}{1 - \phi\_1^{2}} & k = 0 \\ \phi\_1 \gamma\_{k-1} & k > 0 \end{cases} . \tag{5}$$

$$
\rho\_k = \begin{cases} 1 & k = 0 \\ \phi\_1 \rho\_{k-1} & k > 0 \end{cases} . \tag{6}
$$
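A quick Monte Carlo check makes Eqs. (3) to (6) concrete. The sketch below simulates a stationary *AR*(1) process with arbitrarily chosen parameters (φ<sub>0</sub> = 1, φ<sub>1</sub> = 0.6, σ = 1, all assumptions for illustration) and compares the sample mean, variance, and autocorrelations with the theoretical values; the helper names are hypothetical.

```python
"""Monte Carlo check of the stationary AR(1) moments in Eqs. (3)-(6)."""
import random


def simulate_ar1(phi0, phi1, sigma, n, burn_in=500, seed=0):
    """Simulate X_t = phi0 + phi1 * X_{t-1} + eps_t with eps_t ~ N(0, sigma^2)."""
    if abs(phi1) >= 1:
        raise ValueError("|phi1| < 1 is required for stationarity")
    rng = random.Random(seed)
    x = phi0 / (1 - phi1)  # start at the stationary mean, Eq. (3)
    path = []
    for t in range(n + burn_in):
        x = phi0 + phi1 * x + rng.gauss(0.0, sigma)
        if t >= burn_in:  # discard the burn-in period
            path.append(x)
    return path


def sample_acf(x, max_lag):
    """Sample autocorrelations rho_0, ..., rho_max_lag."""
    n = len(x)
    m = sum(x) / n
    d = [v - m for v in x]
    c0 = sum(v * v for v in d) / n
    return [sum(d[t] * d[t + k] for t in range(n - k)) / n / c0
            for k in range(max_lag + 1)]


if __name__ == "__main__":
    phi0, phi1, sigma = 1.0, 0.6, 1.0
    x = simulate_ar1(phi0, phi1, sigma, n=50_000)
    m = sum(x) / len(x)
    v = sum((a - m) ** 2 for a in x) / len(x)
    print("mean", round(m, 3), "theory", phi0 / (1 - phi1))           # Eq. (3)
    print("var ", round(v, 3), "theory", sigma**2 / (1 - phi1**2))    # Eq. (4)
    for k, r in enumerate(sample_acf(x, 3)):
        print(f"rho_{k}: sample {r:.3f}, theory {phi1**k:.3f}")       # Eq. (6)
```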

In the *MA*(*q*) model, the value at time *t* is a linear combination of an intercept, the random errors of the past *q* periods, and the current normally distributed random error.

$$X\_t = \theta\_0 + \sum\_{i=1}^{q} \theta\_i \varepsilon\_{t-i} + \varepsilon\_t. \tag{7}$$

Here θ<sub>*i*</sub> is the weight of lag *i*. A finite-order *MA* process is stationary without any restriction on its parameters. The properties of *MA*(1) are shown below; its autocorrelation cuts off after lag 1, and more generally the autocorrelation of an *MA*(*q*) model cuts off after lag *q*.

$$E(X\_t) = \theta\_0. \tag{8}$$

$$\text{Var}(X\_t) = \sigma^2 (1 + \theta\_1^{2}). \tag{9}$$

$$\gamma\_k = \begin{cases} \sigma^2 (1 + \theta\_1^{2}) & k = 0 \\ \sigma^2 \theta\_1 & k = 1 \\ 0 & k > 1 \end{cases} \tag{10}$$

$$\rho\_k = \begin{cases} 1 & k = 0\\ \frac{\theta\_1}{1 + \theta\_1^{2}} & k = 1\\ 0 & k > 1 \end{cases} \tag{11}$$
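The cut-off property can be verified without simulation. For an *MA*(*q*) process the autocovariance is γ<sub>*k*</sub> = σ<sup>2</sup> Σ<sub>*j*</sub> ψ<sub>*j*</sub>ψ<sub>*j*+*k*</sub> with ψ = (1, θ<sub>1</sub>, …, θ<sub>*q*</sub>), which the hypothetical helper below evaluates; for *q* = 1 it reproduces Eqs. (10) and (11). The parameter values in the demonstration are arbitrary.

```python
"""Theoretical ACF of a finite MA(q) process, showing the cut-off after lag q."""


def ma_acf(thetas, max_lag):
    """ACF of X_t = theta0 + eps_t + sum_{i=1..q} thetas[i-1] * eps_{t-i}.

    gamma_k / sigma^2 = sum_j psi_j * psi_{j+k} with psi = (1, theta_1, ..., theta_q),
    which is zero once k > q because no pair of weights overlaps.
    """
    psi = [1.0] + [float(t) for t in thetas]

    def gamma(k):
        return sum(psi[j] * psi[j + k] for j in range(max(0, len(psi) - k)))

    g0 = gamma(0)
    return [gamma(k) / g0 for k in range(max_lag + 1)]


if __name__ == "__main__":
    theta1 = 0.5
    print(ma_acf([theta1], 4))           # non-zero only at lags 0 and 1
    print(theta1 / (1 + theta1**2))      # Eq. (11) for k = 1
```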

The *ARMA*(*p*, *q*) model combines *AR*(*p*) with *MA*(*q*) and can be written as:

$$X\_t = \phi\_0 + \sum\_{i=1}^p \phi\_i X\_{t-i} + \varepsilon\_t + \sum\_{i=1}^q \theta\_i \varepsilon\_{t-i} \,. \tag{12}$$

*ARMA*(1, 1) likewise requires |φ<sub>1</sub>| < 1 to be stationary; the properties of the stationary *ARMA*(1, 1) are shown below. Its autocorrelation function, like that of *AR*(1), and its partial autocorrelation function, like that of *MA*(1), both tail off; from lag 2 onward the autocorrelation decays geometrically at rate φ<sub>1</sub>.

$$E(X\_t) = \frac{\phi\_0}{1 - \phi\_1}.\tag{13}$$

$$\text{Var}(X\_t) = \sigma^2 \frac{1 + \theta\_1^{2} + 2\phi\_1 \theta\_1}{1 - \phi\_1^{2}}.\tag{14}$$

$$\gamma\_k = \begin{cases} \sigma^2 \frac{1 + \theta\_1^{2} + 2\phi\_1 \theta\_1}{1 - \phi\_1^{2}} & k = 0 \\ \sigma^2 \frac{(\phi\_1 + \theta\_1)(1 + \phi\_1 \theta\_1)}{1 - \phi\_1^{2}} & k = 1 \\ \phi\_1 \gamma\_{k-1} & k > 1 \end{cases} \tag{15}$$

$$\rho\_k = \begin{cases} 1 & k = 0\\ \frac{(\phi\_1 + \theta\_1)(1 + \phi\_1 \theta\_1)}{1 + \theta\_1^2 + 2\phi\_1 \theta\_1} & k = 1\\ \phi\_1 \rho\_{k-1} & k > 1 \end{cases} \tag{16}$$
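Eq. (16) can be cross-checked against the moving-average representation of *ARMA*(1, 1), *X<sub>t</sub>* − μ = Σ<sub>*j*≥0</sub> ψ<sub>*j*</sub>ε<sub>*t*−*j*</sub> with ψ<sub>0</sub> = 1 and ψ<sub>*j*</sub> = φ<sub>1</sub><sup>*j*−1</sup>(φ<sub>1</sub> + θ<sub>1</sub>). The sketch below computes the autocorrelations both ways, truncating the infinite sum (an approximation that is negligible for |φ<sub>1</sub>| well below 1); the function names, the truncation length, and the parameter values are assumptions for illustration.

```python
"""Check Eq. (16) for ARMA(1,1) against its MA(infinity) representation."""


def arma11_acf_closed_form(phi, theta, max_lag):
    """rho_k from Eq. (16)."""
    if abs(phi) >= 1:
        raise ValueError("|phi| < 1 is required for stationarity")
    rho = [1.0, (phi + theta) * (1 + phi * theta) / (1 + theta**2 + 2 * phi * theta)]
    while len(rho) <= max_lag:
        rho.append(phi * rho[-1])  # geometric decay for k > 1
    return rho[: max_lag + 1]


def arma11_acf_psi(phi, theta, max_lag, n_terms=2000):
    """rho_k from truncated psi-weights psi_0 = 1, psi_j = phi**(j-1) * (phi + theta)."""
    psi = [1.0] + [phi ** (j - 1) * (phi + theta) for j in range(1, n_terms)]
    gamma = [sum(psi[j] * psi[j + k] for j in range(n_terms - k))
             for k in range(max_lag + 1)]
    return [g / gamma[0] for g in gamma]


if __name__ == "__main__":
    phi1, theta1 = 0.5, 0.3  # arbitrary illustrative parameters
    pairs = zip(arma11_acf_closed_form(phi1, theta1, 5), arma11_acf_psi(phi1, theta1, 5))
    for k, (a, b) in enumerate(pairs):
        print(f"rho_{k}: Eq. (16) {a:.6f}  psi-weights {b:.6f}")
```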

The *ARIMA*(*p*, *d*, *q*) model adds differencing to *ARMA*(*p*, *q*): the time series is first made stationary by differencing it *d* times, and an *ARMA*(*p*, *q*) model is then fitted to the differenced series. Using *B* to denote the lag (backshift) operator, *ARIMA*(*p*, *d*, *q*) can be defined as:

$$W\_t = (1 - B)^d X\_t. \tag{17}$$

$$W\_t = \phi\_0 + \sum\_{i=1}^p \phi\_i W\_{t-i} + \varepsilon\_t + \sum\_{i=1}^q \theta\_i \varepsilon\_{t-i}.\tag{18}$$

# *3.3 SARIMA(p, d, q)(P, D, Q)<sup>s</sup>*

The *SARIMA* model adds seasonal terms to *ARIMA*. Using *B* to denote the lag operator and *B<sup>s</sup>* the seasonal lag operator of order *s*, *SARIMA*(*p*, *d*, *q*)(*P*, *D*, *Q*)<sup>*s*</sup> can be defined as:

$$Y\_t = (1 - B^s)^D (1 - B)^d X\_t. \tag{19}$$

$$Y\_t = \phi\_0 + \sum\_{i=1}^p \phi\_i Y\_{t-i} + \sum\_{i=1}^P \Phi\_i Y\_{t-si} + \varepsilon\_t + \sum\_{i=1}^q \theta\_i \varepsilon\_{t-i} + \sum\_{i=1}^{Q} \Theta\_i \varepsilon\_{t-si}.\tag{20}$$
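The differencing step of Eq. (19) is easy to sketch. The hypothetical helpers below apply (1 − *B*)<sup>*d*</sup> and (1 − *B*<sup>*s*</sup>)<sup>*D*</sup> to a list. On a synthetic series with a linear trend and a fixed quarterly pattern (an assumption for illustration only), a single seasonal difference at lag 4 leaves a constant, and one further ordinary difference removes it.

```python
"""Apply the SARIMA differencing filter Y_t = (1 - B^s)^D (1 - B)^d X_t (Eq. 19)."""


def difference(x, lag=1):
    """One application of (1 - B^lag): x_t - x_{t-lag}; the first `lag` values are lost."""
    return [x[t] - x[t - lag] for t in range(lag, len(x))]


def sarima_difference(x, d=1, D=1, s=4):
    """Ordinary differences d times, then seasonal differences at lag s, D times."""
    for _ in range(d):
        x = difference(x, 1)
    for _ in range(D):
        x = difference(x, s)
    return x


if __name__ == "__main__":
    pattern = [3.0, -1.0, 2.0, -4.0]
    x = [10.0 + 2.0 * t + pattern[t % 4] for t in range(16)]
    print(sarima_difference(x, d=0, D=1))  # constant 8.0 = 4 * slope
    print(sarima_difference(x, d=1, D=1))  # all zeros
```

Because (1 − *B*) and (1 − *B*<sup>*s*</sup>) commute, the order in which the two filters are applied does not change the result.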

# **4 Modeling of Budget Variance**

# *4.1 Prediction of Budget Execution*

### **4.1.1 Exploratory Data Analysis**

Fiscal revenue and expenditure increased steadily between 2000 and 2018, with small volatility. In Figs. 3 and 4 the autocorrelation decays slowly, so a unit root non-stationary model can be applied.

The augmented Dickey–Fuller (ADF) test detects a unit root. After first-order differencing the autocorrelation still decays slowly, and the autocorrelation plot suggests a seasonal component with a period of four quarters for both fiscal revenue and expenditure. This seasonality is plausible because revenue and expenditure items tend to resemble those of the same quarter in the previous year; for example, some expenditure items in education, defense, and public facilities are fixed every year. A seasonal difference at lag 4 is therefore also applied to the data.
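The unit root check can be sketched as the plain Dickey–Fuller regression Δ*y<sub>t</sub>* = α + γ*y<sub>t−1</sub>* + *e<sub>t</sub>*, rejecting a unit root when the *t*-statistic of γ falls below the critical value. This is a simplification: the augmented test used in the chapter also adds lagged differences to the regression, and in practice one would call a library routine such as `adfuller` in `statsmodels.tsa.stattools`. The −2.86 threshold is the standard asymptotic 5% critical value for a regression with a constant; the function names and the simulated series are hypothetical.

```python
"""Plain Dickey-Fuller regression (an ADF test with zero augmentation lags)."""
import random

DF_CRIT_5PCT_CONSTANT = -2.86  # asymptotic 5% critical value, regression with constant


def dickey_fuller_t(y):
    """t-statistic of gamma in  dy_t = alpha + gamma * y_{t-1} + e_t."""
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]
    ylag = y[:-1]
    n = len(dy)
    if n < 10:
        raise ValueError("series too short")
    mx, my = sum(ylag) / n, sum(dy) / n
    sxx = sum((a - mx) ** 2 for a in ylag)
    gamma = sum((a - mx) * (b - my) for a, b in zip(ylag, dy)) / sxx  # OLS slope
    alpha = my - gamma * mx
    resid = [b - alpha - gamma * a for a, b in zip(ylag, dy)]
    s2 = sum(e * e for e in resid) / (n - 2)  # residual variance, 2 estimated params
    return gamma / (s2 / sxx) ** 0.5


def has_unit_root(y):
    """True when the unit-root null cannot be rejected at the 5% level."""
    return dickey_fuller_t(y) > DF_CRIT_5PCT_CONSTANT


if __name__ == "__main__":
    rng = random.Random(42)
    walk, ar = [0.0], [0.0]
    for _ in range(499):
        walk.append(walk[-1] + rng.gauss(0, 1))   # random walk: has a unit root
        ar.append(0.3 * ar[-1] + rng.gauss(0, 1))  # stationary AR(1)
    print("random walk t =", round(dickey_fuller_t(walk), 2))
    print("AR(0.3)     t =", round(dickey_fuller_t(ar), 2))
```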

**Fig. 3** The autocorrelation plot of revenue budget execution. The autocorrelation tails off, while the partial autocorrelation cuts off after lag 3

**Fig. 4** The autocorrelation plot of expenditure budget execution. The autocorrelation tails off, while the partial autocorrelation cuts off after lag 5

After the lag-4 seasonal difference, the series are stationary but not white noise, which meets the prerequisites for modeling. The data from 2000 to 2016 are used as the training set and the data from 2017 and 2018 as the test set.

### **4.1.2 Unit Root Non-stationary Model**

The seasonal period is set to 4, and an *AR* order of 3 is chosen based on the AIC. In the resulting *SARIMA*(3, 1, 0)(0, 1, 0)<sup>4</sup> model all parameters are statistically significant, with an AIC of 1082.82. For the *MA* model, the autocorrelation plot indicates a lag order of three, so *SARIMA*(0, 1, 3)(0, 1, 0)<sup>4</sup> is fitted: only the coefficient *ma*<sub>1</sub> is not statistically significant, and the AIC is 1079.42. Fixing *ma*<sub>1</sub> at 0 raises the AIC to 1080.63, so the unrestricted *MA* model is preferred.

Fitting *SARIMA*(3, 1, 3)(0, 1, 0)<sup>4</sup> gives an AIC of 1083.23, with coefficients *ar*<sub>1</sub>, *ar*<sub>2</sub>, *ar*<sub>3</sub>, *ma*<sub>1</sub>, and *ma*<sub>3</sub> not statistically significant. Fixing *ar*<sub>1</sub> at 0 gives an AIC of 1081.28, with *ar*<sub>2</sub>, *ma*<sub>1</sub>, and *ma*<sub>3</sub> still not significant. Further fixing *ma*<sub>1</sub> at 0 lowers the AIC to 1079.38, and all remaining coefficients are statistically significant.

Comparing all models by AIC, *SARIMA*(3, 1, 3)(0, 1, 0)<sup>4</sup> with *ar*<sub>1</sub> = 0 and *ma*<sub>1</sub> = 0 is the optimal unit root non-stationary model for revenue execution.

For expenditure budget execution the seasonal period is also set to 4. Starting with the *AR* model, the order is set to 6 according to the AIC; all parameters of *SARIMA*(6, 1, 0)(0, 1, 0)<sup>4</sup> are statistically significant, and the AIC is 1132.89. For the *MA* model, *SARIMA*(0, 1, 5)(0, 1, 0)<sup>4</sup> has an AIC of 1136.75, with coefficients *ma*<sub>2</sub>, *ma*<sub>3</sub>, *ma*<sub>4</sub>, and *ma*<sub>5</sub> not statistically significant. Fixing these four coefficients at 0 leaves all remaining parameters significant and lowers the AIC to 1131.5.

For the mixed model, *SARIMA*(6, 1, 5)(0, 1, 0)<sup>4</sup> has an AIC of 1133.54, with coefficients *ar*<sub>1</sub>, *ar*<sub>3</sub>, *ar*<sub>5</sub>, *ma*<sub>2</sub>, and *ma*<sub>3</sub> not statistically significant. After fixing *ar*<sub>5</sub>, *ma*<sub>1</sub>, *ma*<sub>3</sub>, and *ma*<sub>4</sub> at zero, the AIC falls to 1129.38 and all remaining coefficients are statistically significant. Comparing the AIC of all the models above, *SARIMA*(6, 1, 5)(0, 1, 0)<sup>4</sup> with *ar*<sub>5</sub> = 0, *ma*<sub>1</sub> = 0, *ma*<sub>3</sub> = 0, and *ma*<sub>4</sub> = 0 is the optimal unit root non-stationary model for expenditure execution.
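The AIC-based selection in this subsection amounts to taking the minimum over the candidates. The sketch below simply tabulates the AIC values reported above and picks the smallest. The `aic` helper shows the definition AIC = 2*k* − 2 ln *L* for a model with *k* parameters and maximized likelihood *L*; the log-likelihoods themselves would come from whatever fitting library is used, which the chapter does not name. The dictionary labels and function names are hypothetical.

```python
"""Pick the lowest-AIC candidate among the unit root models reported in Sect. 4.1.2."""


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


# AIC values as reported in the text.
REVENUE_AIC = {
    "SARIMA(3,1,0)(0,1,0)4": 1082.82,
    "SARIMA(0,1,3)(0,1,0)4": 1079.42,
    "SARIMA(0,1,3)(0,1,0)4 ma1=0": 1080.63,
    "SARIMA(3,1,3)(0,1,0)4": 1083.23,
    "SARIMA(3,1,3)(0,1,0)4 ar1=0": 1081.28,
    "SARIMA(3,1,3)(0,1,0)4 ar1=0 ma1=0": 1079.38,
}
EXPENDITURE_AIC = {
    "SARIMA(6,1,0)(0,1,0)4": 1132.89,
    "SARIMA(0,1,5)(0,1,0)4": 1136.75,
    "SARIMA(0,1,5)(0,1,0)4 ma2=ma3=ma4=ma5=0": 1131.5,
    "SARIMA(6,1,5)(0,1,0)4": 1133.54,
    "SARIMA(6,1,5)(0,1,0)4 ar5=ma1=ma3=ma4=0": 1129.38,
}


def best_by_aic(candidates):
    """Label of the candidate with the smallest AIC."""
    return min(candidates, key=candidates.get)


if __name__ == "__main__":
    for label, table in (("revenue", REVENUE_AIC), ("expenditure", EXPENDITURE_AIC)):
        name = best_by_aic(table)
        print(f"{label}: {name} (AIC {table[name]})")
```

The revenue winner beats the runner-up by only 0.04; a common rule of thumb treats AIC differences below about 2 as weak evidence, which is one reason to confirm the choice on held-out data, as Sect. 4.1.4 does.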

### **4.1.3 Fixed Trend Model**

Since the data have an increasing trend, a fixed trend model is another candidate. To allow for a nonlinear trend, revenue is regressed on time; both the linear and the quadratic time terms are statistically significant, with an R-squared of 0.95. Fig. 5 overlays the estimated trend on the original series and shows that the fitted values are consistent with the observed ones; the fitting residuals are shown in Fig. 6.
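A fixed quadratic trend of this kind can be fitted by ordinary least squares. The sketch below solves the normal equations for *y<sub>t</sub>* = *b*<sub>0</sub> + *b*<sub>1</sub>*t* + *b*<sub>2</sub>*t*<sup>2</sup> and reports R-squared. Indexing time as *t* = 1, 2, … and the helper names are assumptions; a statistics package would normally also report the significance tests mentioned in the text.

```python
"""OLS fit of a quadratic time trend y_t = b0 + b1*t + b2*t^2 and its R-squared."""


def solve_linear(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_quadratic_trend(y):
    """Return ([b0, b1, b2], r_squared) using t = 1, ..., len(y)."""
    rows = [[1.0, float(t), float(t * t)] for t in range(1, len(y) + 1)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(3)]
    b = solve_linear(xtx, xty)  # normal equations (X'X) b = X'y
    fitted = [b[0] + b[1] * r[1] + b[2] * r[2] for r in rows]
    mean = sum(y) / len(y)
    ss_res = sum((v - f) ** 2 for v, f in zip(y, fitted))
    ss_tot = sum((v - mean) ** 2 for v in y)
    return b, 1 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical series: quadratic trend plus a small alternating wiggle.
    y = [5.0 + 2.0 * t + 0.3 * t * t + (0.5 if t % 2 else -0.5) for t in range(1, 41)]
    coefficients, r2 = fit_quadratic_trend(y)
    print([round(c, 3) for c in coefficients], round(r2, 4))
```

The residuals `y - fitted` are the series that the text then models with *AR* and *SARIMA* terms.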

**Fig. 5** Fitted trend of revenue budget execution (100 million yuan). The fitted curve is broadly consistent with the data

**Fig. 6** Fitting residuals of revenue (100 million yuan). The residual series is clearly non-stationary

**Fig. 7** Fitted trend of expenditure budget execution (100 million yuan). The fitted curve is broadly consistent with the data

After the fixed trend is removed, the residual series is clearly non-stationary, and its autocorrelation plot suggests that an *AR* model can be applied. Based on the AIC, *AR*(5) is chosen, with an AIC of 1173.7. Because the data show strong seasonal correlation, a lag-4 seasonal difference is applied and, according to the autocorrelation plot, *SARIMA*(1, 0, 0)(0, 1, 0)<sup>4</sup> is fitted, with an AIC of 1092.85.

Regressing expenditure on time, the linear and quadratic coefficients are statistically significant, and the R-squared is about 0.9. Figures 7 and 8 show that the fitted values are consistent with the data and that the residuals meet the modeling requirements.

An *AR*(4) model is fitted to the residuals, with an AIC of 1223.14. Accounting for seasonality with a lag-4 seasonal difference yields the optimal model *SARIMA*(0, 0, 0)(0, 1, 0)<sup>4</sup>, with an AIC of 1142.3.

### **4.1.4 Model Comparison and Testing**

The optimal unit root non-stationary model and the optimal fixed trend model, both selected by AIC, are compared on the test data to make the final choice (see Table 2). For both revenue and expenditure, the *SARIMA* models, which include seasonal terms, outperform the non-seasonal *AR* and *MA* models. For revenue budget execution, the fixed trend model outperforms the unit root non-stationary model on the test data, although its AIC is somewhat higher. For expenditure budget execution, the unit root non-stationary model outperforms the fixed trend model on both the training and the test data. In conclusion, *SARIMA*(1, 0, 0)(0, 1, 0)<sup>4</sup> with a fixed trend is the optimal model for predicting revenue execution, and *SARIMA*(6, 1, 5)(0, 1, 0)<sup>4</sup> with *ar*<sub>5</sub> = 0, *ma*<sub>1</sub> = 0, *ma*<sub>3</sub> = 0, and *ma*<sub>4</sub> = 0 is the optimal model for predicting expenditure execution. For both models, the prediction residuals meet the white noise requirement.

**Fig. 8** Fitting residuals of expenditure (100 million yuan). The residual series is clearly non-stationary

**Table 2** Comparison of revenue and expenditure forecasting models
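The final comparison uses the RMSE on the 2017 to 2018 test quarters. A minimal sketch of the chronological split and the RMSE criterion follows; the tuple layout `(year, quarter, value)`, the function names, and the demonstration data and forecasts are hypothetical.

```python
"""Chronological train/test split (train: up to 2016, test: 2017 onward) and test RMSE."""
from math import sqrt


def split_by_year(observations, first_test_year=2017):
    """observations: time-ordered (year, quarter, value) tuples; no shuffling."""
    train = [v for year, _, v in observations if year < first_test_year]
    test = [v for year, _, v in observations if year >= first_test_year]
    return train, test


def rmse(actual, forecast):
    """Root mean square error of a forecast against the observed values."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("need non-empty sequences of equal length")
    return sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


if __name__ == "__main__":
    # Hypothetical quarterly observations for 2000-2018 and two hypothetical forecasts.
    obs = [(y, q, 100.0 + 4 * (y - 2000) + q) for y in range(2000, 2019) for q in range(1, 5)]
    train, test = split_by_year(obs)
    print(len(train), "training quarters,", len(test), "test quarters")
    forecasts = {"model A": [v + 1.0 for v in test], "model B": [v - 2.0 for v in test]}
    best = min(forecasts, key=lambda name: rmse(test, forecasts[name]))
    print("lower test RMSE:", best)
```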

### **4.1.5 Forecast and Policy Advice**

The forecast shows that revenue and expenditure execution will continue to grow: State General Public Budget revenue will reach 246,765.0 and 267,782.89 in 2021 and 2022, while expenditure will reach 258,529.00 and 273,031.33 (all in 100 million yuan; see Table 3). Figure 9 shows that expenditure will still exceed revenue in the next two years, but the gap between them will narrow significantly: according to the model, the difference between revenue and expenditure will be −11,763.96 in 2021 and −5,248.44 in 2022 (100 million yuan).

The excess of expenditure over revenue results from China's proactive fiscal policy in recent years. In the 2020 Government Work Report, Premier Li Keqiang stated that "the current international situation is more unstable and uncertain, the world economic situation is complex and severe; the domestic economy is not yet a solid foundation for recovery, consumer spending is still constrained, investment growth is not strong enough, small and medium-sized enterprises and individual entrepreneurs have more difficulties, and the pressure on stable employment is greater. In this situation, the government still needs to give a hand". The proactive policy is therefore likely to continue as a stimulus to market vitality.

On the other hand, the narrowing fiscal deficit suggests that China's fiscal sustainability will improve. Although China's fiscal deficit ratio has remained low compared with Japan, the U.S., and other developed countries, China still needs to be wary of a rapid expansion of government debt: once the government becomes debt-dependent, debt can climb rapidly through a "snowball" effect, i.e., "issuing new debt to pay off old debt". In addition, the Chinese government faces a hidden debt problem whose size has not been accurately measured but is generally considered large (Liu & Huang, 2008).

**Table 3** Forecast of the state general public budget revenue and expenditure

**Fig. 9** Forecast of revenue and expenditure budget execution from 2021 to 2022 (quarterly; 100 million yuan). In the next two years, expenditure execution will still exceed revenue, while the difference between revenue and expenditure will continue to narrow

All in all, the future trend of fiscal revenue and expenditure reflects the trade-off between controlling fiscal deficit and restoring the vitality of the market economy, which is a reflection of the "no sharp turn" fiscal policy.

By 2020, China had implemented a massive tax and fee reduction policy. Finance Minister Liu Kun stated at a meeting of the NPC on March 5, 2021: "During the 13th Five-Year Plan period, tax cut and fee reduction is unprecedented, reaching a total of 7.6 trillion yuan, thus effectively promoting the development of market players and the real economy." However, tax and fee cuts also make it harder to balance fiscal revenue and expenditure. It is therefore suggested that the finance department adjust its policies appropriately and optimize the tax structure, raising some taxes and cutting others rather than making large across-the-board cuts. At the same time, existing tax policies should be implemented as "precise policies", that is, cuts should be targeted at the most distressed sectors and at small and medium-sized enterprises. On the expenditure side, the strategy of reducing spending should be maintained, especially spending on official overseas travel, official vehicle purchase and operation, and official receptions.

# *4.2 Prediction of Budget Variance*

Building on Eq. 1, the chapter predicts budget variance with Eq. 21 below. Writing *E* for budget execution, *P* for budget planning, and *r* for the budget variance rate, Eq. 1 gives *P* = *E*/(1 + *r*), so the variance *E* − *P* equals *E* · *r*/(1 + *r*). The factors affecting budget variance can be divided into those that affect the level of budget execution and those that affect the budget variance rate; modeling the two separately can better capture the serial correlation in the data. Budget execution was modeled in Sect. 4.1; this section models the budget variance rate.

$$\text{Budget Variance} = \text{Budget Execution} \times \frac{\text{Budget Variance Rate}}{1 + \text{Budget Variance Rate}}. \tag{21}$$
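As a sketch, the function below evaluates Eq. 21 with the rate given in percent. Plugging in the execution forecasts of Sect. 4.1.5 and the variance rate forecasts of Sect. 4.2.4, as printed in the text, should reproduce the variance figures reported in Sect. 4.2.4 up to rounding; the function name is hypothetical.

```python
"""Eq. 21: budget variance = execution * rate / (1 + rate)."""


def budget_variance(execution, rate_percent):
    """Absolute variance implied by an execution figure and a variance rate in percent."""
    r = rate_percent / 100.0
    if r <= -1:
        raise ValueError("a variance rate of -100% or below is not meaningful")
    return execution * r / (1 + r)


# (year, item): (forecast execution, same unit as Table 3; forecast variance rate in %)
FORECASTS = {
    (2021, "revenue"): (246765.0, 1.0377),
    (2022, "revenue"): (267782.89, 0.4684),
    (2021, "expenditure"): (258529.00, 3.685),
    (2022, "expenditure"): (273031.33, 3.8845),
}

if __name__ == "__main__":
    for (year, item), (execution, pct) in FORECASTS.items():
        print(year, item, round(budget_variance(execution, pct), 2))
```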

### **4.2.1 Exploratory Data Analysis**

Over-collection and over-spending are common between 2000 and 2018, and Fig. 10 shows that the budget variance rates of revenue and expenditure are both non-stationary. After first-order differencing the data pass the stationarity test, so a unit root non-stationary model can be applied.

### **4.2.2 Unit Root Non-stationary Model**

For the revenue budget variance rate, an *MA* model is applied first, with the lag order set to 2 according to the autocorrelation plot. *ARIMA*(0, 1, 2) has an AIC of 110.63, and the parameter *ma*<sub>1</sub> is not statistically significant. Setting it to zero lowers the AIC to 109.6, improving on the previous model.

For the *AR* model, the autocorrelation plot suggests a lag order of 2. *ARIMA*(2, 1, 0) has an AIC of 108.19, and the parameter *ar*<sub>1</sub> is not statistically significant. Setting *ar*<sub>1</sub> to zero raises the AIC to 108.63, so the unrestricted model is kept.

*ARIMA*(2, 0, 2) has an AIC of 116.64. Setting the insignificant parameters *ma*<sub>1</sub> and *ma*<sub>2</sub> to zero reduces it to an *AR* model, so *ARMA* is not included as a candidate model.

An *MA* model is applied to the first-differenced expenditure budget variance rate, with the lag order set to 2 according to the autocorrelation plot. *ARIMA*(0, 1, 2) has an AIC of 90.89, and both *ma*<sub>1</sub> and *ma*<sub>2</sub> are statistically significant; they are retained because removing them increases the AIC.

Based on the autocorrelation plot, *ARIMA*(2, 1, 0) is fitted, with an AIC of 92.92 and *ar*<sub>1</sub> not statistically significant. Setting *ar*<sub>1</sub> to zero lowers the AIC to 92.37.

For the *ARMA* model, *ARIMA*(2, 0, 2) has an AIC of 93.83; setting *ar*<sub>1</sub> and *ma*<sub>2</sub> to zero lowers the AIC to 90.41.

**Fig. 10** State General Public Budget variance rate (%)


**Table 4** Comparison of models for the budget variance rate

### **4.2.3 Model Comparison and Testing**

The optimal models are selected from those above (see Table 4): *ARIMA*(2, 1, 0) for the revenue budget variance rate and *ARIMA*(2, 0, 2) with *ar*<sub>1</sub> = 0 and *ma*<sub>2</sub> = 0 for the expenditure budget variance rate. The residuals of both models pass the white noise test.
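White noise checks of this kind are commonly done with the Ljung–Box statistic *Q* = *n*(*n* + 2) Σ<sub>*k*=1</sub><sup>*m*</sup> *r<sub>k</sub>*<sup>2</sup>/(*n* − *k*), compared with a χ<sup>2</sup> critical value. The chapter does not say which white noise test it used, so the sketch below is an assumption; the helper names are hypothetical, and the tabulated values are the standard upper 5% points of the χ<sup>2</sup> distribution for 1 to 10 degrees of freedom. Library implementations such as `acorr_ljungbox` in `statsmodels.stats.diagnostic` also exist.

```python
"""Ljung-Box portmanteau statistic for checking residuals against white noise."""

# Upper 5% points of the chi-square distribution, keyed by degrees of freedom.
CHI2_95 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070,
           6: 12.592, 7: 14.067, 8: 15.507, 9: 16.919, 10: 18.307}


def ljung_box_q(residuals, lags):
    """Q = n(n+2) * sum_{k=1..lags} r_k^2 / (n-k), r_k = sample autocorrelation."""
    n = len(residuals)
    if not 0 < lags < n:
        raise ValueError("need 0 < lags < len(residuals)")
    m = sum(residuals) / n
    d = [e - m for e in residuals]
    c0 = sum(v * v for v in d)
    q = 0.0
    for k in range(1, lags + 1):
        r_k = sum(d[t] * d[t + k] for t in range(n - k)) / c0
        q += r_k * r_k / (n - k)
    return n * (n + 2) * q


def looks_like_white_noise(residuals, lags, n_params=0):
    """Compare Q with the chi-square 5% point on lags - n_params degrees of freedom."""
    df = lags - n_params
    if df not in CHI2_95:
        raise ValueError("degrees of freedom outside the tabulated range 1..10")
    return ljung_box_q(residuals, lags) < CHI2_95[df]


if __name__ == "__main__":
    alternating = [1.0, -1.0] * 50  # strongly autocorrelated, clearly not white noise
    print(round(ljung_box_q(alternating, 10), 1), looks_like_white_noise(alternating, 10))
```

When applied to *ARIMA* residuals, the degrees of freedom are usually reduced by the number of estimated *AR* and *MA* coefficients, e.g. `n_params=2` for *ARIMA*(2, 1, 0).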

### **4.2.4 Forecast and Policy Advice**

The revenue budget variance rate is forecast to be 1.0377% in 2021 and 0.4684% in 2022, and the expenditure budget variance rate 3.685% and 3.8845%. Figure 11 shows that the budget variance rates in the next two years will be further controlled compared with previous years, while over-collection, over-spending, and the tendency of the expenditure variance rate to exceed the revenue variance rate will persist. Substituting the forecasts into Eq. 21, the revenue variance in 2021 and 2022 will be 2,534.38 and 1,248.44, and the expenditure variance 9,188.20 and 10,209.32 (100 million yuan). Compared with 2017 and 2018, the absolute budget variance remains roughly unchanged, which, given the continuous growth of the national economy, indicates that it is under control. The forecast results can be explained from four perspectives:

The first factor is the economy. Economic factors affect both the budget variance rate and the level of budget execution: the faster the economy develops, the faster budget execution grows, but the uncertainty of the budgeting process also increases, leading to larger variance. In the 2010s, China shifted its development priority from "speed" to "quality", began to consciously control its economic growth rate, and launched supply-side reform. These macro factors have made China's growth prospects easier to estimate and will make budget deviations much easier to control in the future. On the revenue side, the Chinese government commonly sets revenue budgets by adding a few percentage points to the current year's GDP growth rate (Sun & Wu, 2012). Since governments at all levels have always tended to leave room when budgeting (Wang, 2009), over-collection will persist in the long run. On the expenditure side, a proactive fiscal policy involves more spending items, making variance harder to control, which partly explains why expenditure variance is expected to exceed revenue variance.

**Fig. 11** The forecast budget variance rate (annual; %). The revenue and expenditure variance rates will be further controlled in 2021 and 2022, and the expenditure rate will be higher than the revenue rate

The second factor is the soft constraint on budget execution (Chen & Lv, 2019). This factor mainly affects the budget variance rate: the stricter the constraint, the lower the rate. Before 2007, over-collected funds were neither placed under the supervision of the NPC nor excluded from the next year's budget, and this inadequate regulation became an incentive for over-collection (Ma, 2009). Similarly, the lack of supervision of under-spent funds encourages expenditure variance to increase. To address the soft constraint problem, the GOC has made unremitting efforts since 2007. In 2007, the central government established the Central Budget Stabilization and Adjustment Fund (CBSAF) to hold over-collected funds and use them to cover shortfalls in years of under-collection. The Budget Law of the People's Republic of China (2014 Revision) (hereinafter the new budget law), adopted in 2014 and in force from 2015, tightened control over the budgeting process by increasing budget transparency and establishing a mechanism for balancing the budget across years. The second revision of the Regulations on the Implementation of the Budget Law in 2018 proposed improving the budget performance management system and building a comprehensive framework for budget performance management. These measures aim to harden the soft budget constraint and will help restrain budget variance in the long run.

The third factor is the fiscal management system, which also mainly affects the level of the budget variance rate. Because local governments rely on transfer payments from the central government, they are motivated to seek more financial aid than they need, which encourages the phenomenon of "fighting more than spending" (securing more funds than are actually used) and other opportunistic behavior (Chen & Lv, 2019). Furthermore, local governments tend to increase their spending, especially in the last quarter, to fulfill budget tasks, which leads to irregular use of funds and other risks. These two behaviors contributed to the variance in fiscal expenditure in the past. However, the second revision of the Regulations on the Implementation of the Budget Law in 2018 aims to improve the transfer payment process: it clarifies the fiscal relationship between levels of government by defining the types and scope of transfer payments, improves the evaluation of special transfer payments, and regulates the transfer of funds more strictly. These actions have significantly curbed the fiscal expenditure variance, which is reflected in the forecast.

The fourth factor is external supervision, which likewise mainly affects the level of the budget variance rate. In China, the NPC is charged with overseeing the budget (Chen & Lv, 2019), while society at large also plays a supervisory role. Before the budget law was revised, the NPC's supervision was weak and lacked professionalism, and it was difficult for society to exercise effective oversight because the government disclosed too little information. In 2014, the new budget law strengthened the NPC's supervision and audit power over the use of budget funds and required more detailed budget reports. In 2018, the revised Regulations on the Implementation of the Budget Law required further transparency, including more disclosure of government debt, agency operating expenses, government procurement, and earmarked fiscal funds. They also explicitly stipulated that special transfer payments be disclosed by region and project, and expenditures by item. These measures strengthen the supervisory function of the NPC and society, reflect the principle of the people's ownership, and will continue to curb the budget variance rate.

All in all, the forecast decrease in the budget variance rate over the next two years results from years of continuous reform of China's fiscal budget system. It is one of the most important achievements of the GOC's effort to modernize national governance capacity.

Based on the modeling results and analysis, further control of the budget variance could focus on the following aspects. The first is to develop more scientific budgeting methods, for example by adopting mathematical models (e.g., time series techniques and uncertainty theory) and big data techniques (e.g., deep learning) to establish a more predictive budget planning system. The second is to strengthen the management of budget performance evaluation, for example by implementing medium-term financial budgeting management and gradually integrating it into the existing budget performance evaluation. The third is to harden the soft constraints on budget planning and execution, for example by using legislation to eliminate the misuse of over-collected and under-spent funds. The fourth is to further improve the budget law: clarify the fiscal relationship between the central and local governments, control the scale of transfer payments (especially special transfer payments), and improve the efficiency of capital flows. Finally, the GOC should strengthen supervision over budget planning and execution to create a more transparent budget system; to this end, the NPC's powers must be reinforced and more detailed fiscal data need to be published.

# **5 Conclusion**

This chapter presents a descriptive analysis of China's budget data from 2000 to 2018, showing that the historical level of budget variance was affected by the global economy and by China's macroeconomic policies. The budgetary management measures and fiscal reforms implemented by the Chinese government since 2014 have been effective, as reflected in the well-controlled budget variance of recent years. The chapter uses a unit root non-stationary model and a fixed trend model to model the budget execution and budget variance rate data, and calculates the budget variance for the next two years from the forecasts of the two models. According to the forecasts, the budget variance in 2021 and 2022 will be further controlled, and this positive trend results from a combination of economic, soft-constraint, institutional, and regulatory factors.
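The two model families named above can be illustrated with a short Python sketch. It is not the chapter's fitted specification. It assumes that the budget variance rate is the percentage deviation of the final account from the budget, (actual - budgeted) / budgeted × 100, that the fixed trend model is a straight line fitted by ordinary least squares on a time index, and that the unit root model is a random walk with drift. The function names and the input numbers are hypothetical and serve only to show the mechanics of a two-step-ahead forecast.

```python
# Minimal sketch, not the chapter's fitted models. Assumptions (hypothetical):
# - variance rate = (actual - budgeted) / budgeted * 100
# - "fixed trend" = straight line fitted by ordinary least squares on t = 0, 1, ...
# - "unit root"   = random walk with drift
from typing import List, Sequence


def variance_rate(actual: Sequence[float], budgeted: Sequence[float]) -> List[float]:
    """Percentage deviation of each actual outcome from its budgeted amount."""
    return [(a - b) / b * 100.0 for a, b in zip(actual, budgeted)]


def fixed_trend_forecast(y: Sequence[float], horizon: int) -> List[float]:
    """Fit y_t = a + b * t by OLS and extrapolate `horizon` steps past the sample."""
    n = len(y)
    if n < 2:
        raise ValueError("need at least two observations")
    t_mean = (n - 1) / 2.0
    y_mean = sum(y) / n
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    # The last observed period is t = n - 1, so forecasts use t = n, n + 1, ...
    return [intercept + slope * (n - 1 + h) for h in range(1, horizon + 1)]


def random_walk_drift_forecast(y: Sequence[float], horizon: int) -> List[float]:
    """Forecast y_{T+h} = y_T + h * drift, where drift is the mean first difference."""
    if len(y) < 2:
        raise ValueError("need at least two observations")
    diffs = [y[i] - y[i - 1] for i in range(1, len(y))]
    drift = sum(diffs) / len(diffs)
    return [y[-1] + h * drift for h in range(1, horizon + 1)]


if __name__ == "__main__":
    # Hypothetical variance rates (%), for illustration only.
    rates = [10.0, 5.0, 2.0, 1.0]
    print(fixed_trend_forecast(rates, 2))        # [-3.0, -6.0]
    print(random_walk_drift_forecast(rates, 2))  # [-2.0, -5.0]
```

On the same hypothetical series the two models disagree: the trend model extrapolates the fitted line, whereas the unit root model starts from the last observation and adds the average annual change. Choosing between them in practice would rest on unit root tests on the actual series, which this sketch does not perform.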

There is still much room for improvement. The forecasts for 2021 and 2022 may contain errors (Zhao & Wu, 2013), mainly because the impact of the epidemic on China's economy is not taken into account. The link between fiscal theory and the modeling results could also be strengthened. Other researchers could extend the analysis, for example by forecasting the budget variance of a province or municipality and drawing conclusions that reflect local characteristics (Lin & Ma, 2013). As China's fiscal data become increasingly abundant, research on this topic will become easier over time. Finally, other time series methods could be used for the tasks performed in this chapter, such as time series multiplicative models (Jiang & Cheng, 2018), and more advanced tools such as uncertainty theory (Liu & Peng, 2005), data mining techniques (Qin, 2018), and Markov chains (Hou et al., 2010; Li & Yi, 1997) could be incorporated into the modeling, which may lead to more accurate conclusions.

# **References**

Chen, G. (2000). *Fiscal science*. Renmin University of China Press.


Focus on budget deviations. (2008). *Foreign-Related Taxation*, (1), 5–6.

Lin, M. H., & Ma, J. (2013). Study on budgetary supervision of local people's congresses in China. *Chinese Social Science*, (6), 78–103.

Liu, B. D., & Peng, J. (2005). *Uncertainty theory tutorials*. Tsinghua University Press.


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.