**Understanding Complex Systems**

# Federico Beffa

# Weakly Nonlinear Systems With Applications in Communications Systems

# **Springer Complexity**

Springer Complexity is an interdisciplinary program publishing the best research and academic-level teaching on both fundamental and applied aspects of complex systems—cutting across all traditional disciplines of the natural and life sciences, engineering, economics, medicine, neuroscience, social and computer science.

Complex Systems are systems that comprise many interacting parts with the ability to generate a new quality of macroscopic collective behavior the manifestations of which are the spontaneous formation of distinctive temporal, spatial or functional structures. Models of such systems can be successfully mapped onto quite diverse "real-life" situations like the climate, the coherent emission of light from lasers, chemical reaction-diffusion systems, biological cellular networks, the dynamics of stock markets and of the internet, earthquake statistics and prediction, freeway traffic, the human brain, or the formation of opinions in social systems, to name just some of the popular applications.

Although their scope and methodologies overlap somewhat, one can distinguish the following main concepts and tools: self-organization, nonlinear dynamics, synergetics, turbulence, dynamical systems, catastrophes, instabilities, stochastic processes, chaos, graphs and networks, cellular automata, adaptive systems, genetic algorithms and computational intelligence.

The three major book publication platforms of the Springer Complexity program are the monograph series "Understanding Complex Systems" focusing on the various applications of complexity, the "Springer Series in Synergetics", which is devoted to the quantitative theoretical and methodological foundations, and the "SpringerBriefs in Complexity" which are concise and topical working reports, case studies, surveys, essays and lecture notes of relevance to the field. In addition to the books in these three core series, the program also incorporates individual titles ranging from textbooks to major reference works.

Indexed by SCOPUS, INSPEC, zbMATH, SCImago.

#### **Series Editors**

- Henry D. I. Abarbanel, Institute for Nonlinear Science, University of California, San Diego, La Jolla, CA, USA
- Dan Braha, New England Complex Systems Institute, University of Massachusetts, Dartmouth, USA
- Péter Érdi, Center for Complex Systems Studies, Kalamazoo College, Kalamazoo, USA; Hungarian Academy of Sciences, Budapest, Hungary
- Karl J. Friston, Institute of Cognitive Neuroscience, University College London, London, UK
- Sten Grillner, Department of Neuroscience, Karolinska Institutet, Stockholm, Sweden
- Hermann Haken, Center of Synergetics, University of Stuttgart, Stuttgart, Germany
- Viktor Jirsa, Centre National de la Recherche Scientifique (CNRS), Université de la Méditerranée, Marseille, France
- Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
- Kunihiko Kaneko, Research Center for Complex Systems Biology, The University of Tokyo, Tokyo, Japan
- Markus Kirkilionis, Mathematics Institute and Centre for Complex Systems, University of Warwick, Coventry, UK
- Jürgen Kurths, Nonlinear Dynamics Group, University of Potsdam, Potsdam, Germany
- Ronaldo Menezes, Department of Computer Science, University of Exeter, UK
- Andrzej Nowak, Department of Psychology, Warsaw University, Warszawa, Poland
- Hassan Qudrat-Ullah, School of Administrative Studies, York University, Toronto, Canada
- Linda Reichl, Center for Complex Quantum Systems, University of Texas, Austin, USA
- Peter Schuster, Theoretical Chemistry and Structural Biology, University of Vienna, Vienna, Austria
- Frank Schweitzer, System Design, ETH Zürich, Zürich, Switzerland
- Didier Sornette, Institute of Risk Analysis, Prediction and Management, Southern University of Science and Technology, Shenzhen, China
- Stefan Thurner, Section for Science of Complex Systems, Medical University of Vienna, Vienna, Austria

**Founding Editor: Scott Kelso**


Federico Beffa, Federico Beffa Engineering, Alto Malcantone, Switzerland

ISSN 1860-0832 ISSN 1860-0840 (electronic)
Understanding Complex Systems
ISBN 978-3-031-40680-5 ISBN 978-3-031-40681-2 (eBook)
https://doi.org/10.1007/978-3-031-40681-2

© The Editor(s) (if applicable) and The Author(s) 2024. This book is an open access publication.

**Open Access** This book is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.

Paper in this product is recyclable.

*There is nothing more practical than a good theory.*

# **Foreword**

The field of circuit design for communications is a rapidly evolving area of research, with new technologies and applications emerging all the time. One of the challenges in this field is the need to accurately model and analyze the behavior of nonlinear systems. This book offers an approach to this problem that is both more rigorous and more intuitive than previously published treatments.

The book is organized into two main parts: signals and systems. The first part introduces the theory of distributions and covers topics such as basic properties of distributions, convolution, Fourier and Laplace transforms, and summable distributions. The second part focuses on the application of distributions in the study of convolution equations and their solutions.

In contrast to the few existing treatments, the approach taken here highlights the algebraic structure underlying weakly nonlinear systems and is based on distributions rather than functions. The use of distributions leads naturally to the convolution algebras of Linear Time-Invariant (LTI) systems, and the algebras suitable for weakly nonlinear systems emerge as simple extensions to higher order distributions, without having to resort to ad hoc operators. The main advantages of the approach include a new justification for the validity of the Volterra series and a much-simplified notation, free of multiple integrals. The net result is a conceptual simplification and the ability to solve the associated nonlinear differential equations in a purely algebraic way.

Throughout the book, the author provides clear explanations of the key concepts and techniques, with numerous examples drawn from the area of circuits for wireless communications. As well as being of practical use to practitioners, these should help the reader gain a deeper understanding of the material covered in the earlier part of the book. Of particular interest to those interested in modern high-frequency circuit design, the analysis of the classic cascode amplifier (common gate) shows the origin of nonlinear phenomena, such as intermodulation, that can be accurately quantified from very simple models of the underlying transistors without having to resort to simulation. A similar analysis of local feedback in common source amplifiers (degeneration) reveals similarly rich behavior. As well as being theoretically interesting, this provides practical methods for the optimal design of such circuits.

This book is primarily intended for graduate students in engineering, who are interested in the theory of nonlinear systems and its application. However, it is likely that researchers in mathematics and physics will also find the material useful.

I'm sure that this book will serve as a valuable resource for anyone working with nonlinear dynamical systems who wishes to learn more about this interesting and relevant topic.

Surrey, England April 2023

Jon Strange

# **Preface**

When I started working in industry I was immediately confronted with systems whose performance was invariably limited by noise and nonlinearities. For noise, there is a well-developed theory that can be used to guide the design and suggest ways to improve the system. For distortion, I was not aware of good theories, and the investigations and developments were done entirely by numerical simulations. It's not that I had no interest in nonlinear systems. Rather, despite the fact that I graduated from a good university, the only courses offered on nonlinear systems were in the field of control theory and almost entirely devoted to stability questions. No course taught widely applicable methods to calculate the response of nonlinear systems.

While working on trying to improve the linearity of a system, I read a paper making use of nonlinear transfer functions. Although I had heard the name "Volterra Series", I did not know what it was, nor did anyone I knew. Stimulated by that paper I looked for a book, but didn't find any in print. So, I looked in the second-hand market and found an out-of-print book mentioned in the paper. The book was very focused on practical applications, and didn't say much about the broader theory. In any case, having learnt how to use nonlinear transfer functions, I started using them, and they immediately illuminated the reasons for some effects that I saw in simulations but didn't fully grasp. Since then, I have kept using that method very fruitfully. I also used nonlinear transfer functions in design reviews to try to pass on to colleagues the intuition that they gave me about nonlinear effects.

After some time, some colleagues started to see the usefulness, and I was asked to prepare an internal tutorial on the subject. In preparing it, I realized how limited my theoretical understanding of the subject was and developed a desire for a much deeper understanding: I was hooked. I continued studying on my own through other (old) books, reports, and papers. While all treatments that I saw introduced ad hoc operators, I tried to develop a formulation using standard ones. When I realized that pairing convolution with the tensor product would do, it became self-evident that the Volterra series could be seen as a generalization of the Taylor series. Convolution took front and center, but to make the mathematics solid, I had to resort to Schwartz's distributions. Differential equations became convolution equations and the Volterra series became a generalized formal power series.

This book is a summary of my investigations in which I tried to develop the theory in a new and, I believe, simpler form. I included many examples. Some of them are short and serve to illustrate some points just discussed. Others are (condensed versions of) real-world applications where I try to illustrate the power of the theory.

The book was written primarily for engineers. As is common in electrical engineering, I use the symbol *j* to denote the imaginary unit of complex numbers. This is to avoid confusing it with currents which are commonly denoted by *i*. All plots in the book were generated with the open-source CAS system maxima [1] which I also used to check many calculations and to compute all numerical solutions of differential equations. All numerical simulations of nonlinear networks were performed with the open-source circuit simulator Xyce [2].

I would like to take the opportunity to thank some people who, in a direct or indirect way, have contributed to this book. First of all, I would like to thank my wife Alessandra for her constant encouragement and support during this project. I would also like to thank the many colleagues at Analog Devices and Mediatek who shared their insights with me and stimulated me to go deeper. In particular, I would like to thank Jon Strange who, among other things, put a lot of trust in me and involved me in many stimulating and future-looking projects. They were the stimulus that ultimately led me to write this book.

Alto Malcantone, Switzerland February 2023

Federico Beffa

# **Contents**


#### **Part I Signals**



#### **Part II Systems**




# **Chapter 1 Introduction**

Nonlinear systems are everywhere, yet most engineering curricula devote very little time, if any, to them. The reason is twofold: first of all, there is no general theory describing arbitrary nonlinear systems. Second, the theory of linear systems is effective in facilitating the design of many real-world systems. In fact, for sufficiently small input signals, the behaviour of most nonlinear systems can be approximated by a linear model. As a consequence the majority of engineered systems are designed based on linear system theory and their usability is limited in one way or another by the deviation of the real system from the assumed linear behaviour. This book is intended to give engineers a powerful tool to model, understand and reduce the impact of mild deviations from linear behaviour and thereby design better systems.

This chapter tries to develop some intuition for what we call weakly-nonlinear systems. It also tries to give an idea of the theory to which this book is devoted and of its applicability. The chapter is not meant to introduce any concept in a precise way; the exposition is rather informal. A proper systematic development of the theory will start with the next chapter.

# **1.1 Nonlinear Phenomena**

The range of phenomena exhibited by nonlinear systems is much richer than that of linear systems. To understand the applicability of the presented theory it's useful to have an idea of the main phenomena that may appear. In the following we give a bird's-eye view of them with qualitative descriptions.

# *1.1.1 Multiple Equilibrium Points*

Most dynamical systems can be described by a system of differential equations that can be written in the form

$$\frac{\mathbf{d}}{\mathbf{d}t}u = f(u, x)$$

**Fig. 1.1** Pendulum in Earth's gravitational field

with $u \in \mathbb{R}^n$ the state of the system and $x \in \mathbb{R}^m$ the driving or input signal. For simplicity, in this chapter we limit ourselves to *autonomous* systems. These are systems described by the simpler equation

$$\frac{\mathbf{d}}{\mathbf{d}t}u = f(u)\,. \tag{1.1}$$

In the case of first and second order systems one can obtain a good qualitative understanding by examining the *phase portrait* of the system. This is a graphical representation of a family of state trajectories $t \mapsto u(t)$ for various initial conditions $u_0$ in the plane spanned by the components $u_1$ and $u_2$ of $u$, which in this context is called the *state* or *phase space*. Note that the phase portrait can be sketched without having to solve the equation, by considering the vector field defined by $f$ in the state plane. With it, it's easy to estimate the trajectory of the state $u$ for every initial condition $u_0$.

Of special interest are the zeros of the vector function *f* . That's because in those states the derivative with respect to time of the state vector *u* vanishes. In other words, the zeros of *f* are the equilibrium points of the system. *A nonlinear function f in general has several equilibrium points* and that's a first fundamental difference from linear systems which always have only one equilibrium point.

If $f$ is well-behaved,<sup>1</sup> for every initial state $u_0$, the system (1.1) has a unique solution. This means that the trajectories in the phase plane do not intersect. Therefore, the trajectories can only begin or end at equilibrium points, at infinity or on limit cycles (see below).

As an example consider the ideal friction-less pendulum shown in Fig. 1.1 and described by the differential equation

$$\frac{\mathbf{d}^2}{\mathbf{d}t^2}\phi + \omega_0^2 \sin(\phi) = 0, \qquad \omega_0 = \sqrt{\frac{\mathrm{g}}{l}}$$

<sup>1</sup> We will make this statement precise in a later chapter.


**Fig. 1.2** Phase portrait of an ideal pendulum with $\omega_0 = 1$

with φ the angle from the vertical, *g* the gravitational acceleration and *l* the length of the arm. The equation can be rewritten as

$$\frac{\mathbf{d}}{\mathbf{d}t} \begin{pmatrix} u_1 \\ u_2 \end{pmatrix} = \begin{pmatrix} u_2 \\ -\omega_0^2 \sin(u_1) \end{pmatrix}$$

where we have set $u_1 = \phi$, $u_2 = \mathbf{d}\phi/\mathbf{d}t$. The phase portrait of this system is clearly periodic along the $u_1$ axis. We can therefore limit the study to the range $u_1 \in [-\pi, \pi)$.<sup>2</sup> In this range the system has two equilibrium points: $u_{0a} = (0, 0)$ and $u_{0b} = (\pi, 0)$.

The phase portrait of this system is depicted in Fig. 1.2 with the equilibrium points shown as black dots. The dashed lines connect the two equilibrium points and separate the phase plane into two distinct regions in which the system has different behaviour. The boundary between the two regions (the surface constituted by the dashed lines) is called the *separatrix*. Trajectories surrounding the equilibrium point $u_{0a}$ are closed curves that represent oscillations. The trajectories above and below the separatrix represent the pendulum perpetually rotating around the pivot.
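These qualitative claims (two equilibria, closed orbits around the origin) are easy to check numerically. The following sketch is illustrative, not from the text: it integrates the state equation with a classical RK4 stepper, using $\omega_0 = 1$ as in Fig. 1.2.

```python
import math

W0 = 1.0  # omega_0 = 1, the illustrative value used in Fig. 1.2

def f(u1, u2):
    """Right-hand side of the pendulum state equation."""
    return u2, -W0 * W0 * math.sin(u1)

def rk4_step(u1, u2, dt):
    """One classical Runge-Kutta 4 step."""
    k1a, k1b = f(u1, u2)
    k2a, k2b = f(u1 + dt/2*k1a, u2 + dt/2*k1b)
    k3a, k3b = f(u1 + dt/2*k2a, u2 + dt/2*k2b)
    k4a, k4b = f(u1 + dt*k3a, u2 + dt*k3b)
    return (u1 + dt/6*(k1a + 2*k2a + 2*k3a + k4a),
            u2 + dt/6*(k1b + 2*k2b + 2*k3b + k4b))

# f vanishes (up to roundoff) at the two equilibrium points.
assert abs(f(0.0, 0.0)[1]) < 1e-12
assert abs(f(math.pi, 0.0)[1]) < 1e-12

# A small oscillation around u_0a follows a closed curve: after one
# (linearized) period T = 2*pi/W0 the state returns near its start.
u1, u2, dt = 0.05, 0.0, 1e-3
for _ in range(int(2 * math.pi / W0 / dt)):
    u1, u2 = rk4_step(u1, u2, dt)
print(u1, u2)  # close to the starting point (0.05, 0)
```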

This shows a second fundamental difference from linear systems. *Nonlinear systems can exhibit different behaviour and characteristics in different regions of the phase space.*

<sup>2</sup> Given that physically the angles $\phi$ and $\phi + 2\pi$ describe the same location, we should think of the phase space as a cylinder rather than a plane.

**Fig. 1.3 a** Electrical *RLC* oscillator with nonlinear feedback **b** Voltage-controlled current-source characteristic

# *1.1.2 Limit Cycles*

A further phenomenon present in some nonlinear systems that doesn't exist in linear ones is that of the *limit cycles*. These are periodic solutions of the equations at specific signal levels. As a simple example, consider the oscillator shown in Fig. 1.3a. It consists of a passive *RLC* resonator and a nonlinear saturating voltage-controlled current source (VCCS) with characteristic

$$i(v) = I\_0 \tanh\left(\frac{v}{V\_s}\right)$$

and plotted in Fig. 1.3b. The system is described by

$$
\frac{\mathbf{d}^2}{\mathbf{d}t^2}v + \frac{\omega_0}{q}\left[1 - G_m(v)R\right]\frac{\mathbf{d}}{\mathbf{d}t}v + \omega_0^2 v = 0\,. \tag{1.2}
$$

with

$$\omega_0 = \frac{1}{\sqrt{LC}}\qquad\qquad\qquad q = \frac{R}{\omega_0 L}$$

and the nonlinear transconductance

$$G_m(v) = \frac{\mathbf{d}}{\mathbf{d}v}i(v) = \frac{I_0}{V_s}\operatorname{sech}^2\!\left(\frac{v}{V_s}\right).$$

If the maximum of $|v(t)|$ over a full period $T = 2\pi/\omega_0$ remains small compared to $V_s$, then the value of $\operatorname{sech}^2(v(t)/V_s)$ remains very nearly 1 over a full cycle. Therefore, under this assumption, if $G_m(0)R > 1$ the coefficient of the first order derivative of $v$ in (1.2) is negative and the $(0, 0)$ equilibrium point of the equation is unstable. In contrast, if the maximum of $|v(t)|$ over a period is much larger than $V_s$, then the value of $\operatorname{sech}^2(v(t)/V_s)$ approaches 0 for most of the period. Hence, in this regime of operation the system is governed by an equation corresponding to the one of a damped oscillator. Between these two extreme cases there is a periodic trajectory, a limit cycle. On this trajectory the energy dissipated during one cycle by the resistor $R$ is perfectly balanced by the energy injected into the resonator by the controlled source. This behaviour of the system is clearly discernible in the phase portrait shown in Fig. 1.4, in which we chose the current flowing through the inductor (downwards) $i_L$ and $v$ as state variables.
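The convergence onto the limit cycle can be reproduced with a small numerical experiment. The parameter values below ($\omega_0 = q = 1$, $G_m(0)R = 2$) are illustrative assumptions, not taken from the text: a trajectory started well inside the cycle and one started well outside it settle to the same steady amplitude.

```python
import math

# Illustrative parameters: omega_0 = 1, q = 1, and Gm(0)*R = I0*R/Vs = 2 > 1,
# so the origin is unstable and a limit cycle exists.
W0, Q, R, I0, VS = 1.0, 1.0, 2.0, 1.0, 1.0

def f(v, w):
    """State equation of (1.2) with v' = w."""
    gm = (I0 / VS) / math.cosh(v / VS) ** 2
    return w, -(W0 / Q) * (1.0 - gm * R) * w - W0 * W0 * v

def steady_amplitude(v0, t_end=80.0, dt=1e-3):
    """Integrate with RK4; return max|v| over the last 10 time units."""
    v, w = v0, 0.0
    peak, t_tail = 0.0, t_end - 10.0
    for i in range(int(t_end / dt)):
        k1v, k1w = f(v, w)
        k2v, k2w = f(v + dt/2*k1v, w + dt/2*k1w)
        k3v, k3w = f(v + dt/2*k2v, w + dt/2*k2w)
        k4v, k4w = f(v + dt*k3v, w + dt*k3w)
        v += dt/6 * (k1v + 2*k2v + 2*k3v + k4v)
        w += dt/6 * (k1w + 2*k2w + 2*k3w + k4w)
        if i * dt >= t_tail:
            peak = max(peak, abs(v))
    return peak

a_inside = steady_amplitude(0.01)   # start well inside the limit cycle
a_outside = steady_amplitude(5.0)   # start well outside it
print(a_inside, a_outside)          # both settle to the same amplitude
```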

This example has a stable limit cycle, but there are systems with unstable limit cycles: any infinitesimally small deviation from the perfectly periodic trajectory leads to a trajectory diverging from the limit cycle. Limit cycles can also be stable on one side and unstable on the other one.

# *1.1.3 Bifurcations*

All practical systems depend upon some parameters. For example the oscillator of the previous section depends on the value of the resistor *R*, and it is interesting to study how the value of that parameter affects the behaviour of the system. In particular the number and type of equilibrium points of a system may depend on the value of some parameter. This is in fact the case for our oscillator: For *Gm*(0)*R* < 1 the system has a single stable equilibrium point, while for *Gm*(0)*R* > 1 that equilibrium point becomes unstable and a limit cycle makes its appearance. Parameter values at which the character of the system behaviour changes are called *critical* or *bifurcation points.*

As a second example, consider a system described by the differential equation

$$\frac{\mathbf{d}^2}{\mathbf{d}t^2}u = \lambda u - u^3.$$

The system can be interpreted as having a potential energy

$$U_{\lambda}(u) = \frac{u^4}{4} - \lambda \frac{u^2}{2}.$$

For λ < 0 the potential energy has a single minimum at $u = 0$, while for λ > 0 it has two minima as shown in Fig. 1.5a. In the latter case $u = 0$ is an unstable equilibrium point and two new stable equilibrium points at $\pm\sqrt{\lambda}$ appear. If we draw the equilibrium points of the system as a function of λ we obtain the so-called *pitchfork* shown in Fig. 1.5b.
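The pitchfork is easy to verify numerically from the potential: the equilibria are the real roots of $U_\lambda'(u) = u^3 - \lambda u$, and the stable ones are the minima of $U_\lambda$, i.e. those with $U_\lambda''(u) = 3u^2 - \lambda > 0$. A small illustrative sketch (the helper names are ours):

```python
import numpy as np

def equilibria(lam):
    """Real roots of U'(u) = u**3 - lam*u = 0."""
    roots = np.roots([1.0, 0.0, -lam, 0.0])
    return sorted(r.real for r in roots if abs(r.imag) < 1e-9)

def is_stable(u, lam):
    """Stable equilibria are minima of U: U''(u) = 3*u**2 - lam > 0."""
    return 3 * u**2 - lam > 0

print(equilibria(-1.0))  # a single equilibrium at 0, and it is stable
print(equilibria(1.0))   # the pitchfork: 0 (unstable), +/- sqrt(lam) (stable)
```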


**Fig. 1.6** Driven pendulum in Earth's gravitational field

# *1.1.4 Chaos*

Consider the driven pendulum shown in Fig. 1.6. It is similar to the one of Sect. 1.1.1, with the difference that now the pivot moves in time in the vertical direction as described by the function $y_p$. This movement introduces a driving term in the differential equation, which then becomes

$$\frac{\mathbf{d}^2}{\mathbf{d}t^2}\phi + \omega_0^2 \sin\phi = -\frac{\sin\phi}{l}\frac{\mathbf{d}^2}{\mathbf{d}t^2}y_p(t).$$

Let's assume that the drive is periodic: $y_p(t) = A \cos(\omega t)$. Figure 1.7 shows the time evolution of φ for two almost identical initial conditions. The upper curve was computed with the pendulum starting at an angle of $\phi(0) = 1$ rad with $\frac{\mathbf{d}\phi}{\mathbf{d}t}(0) = 0$ rad/s. The lower curve was computed with the almost identical initial conditions $\phi(0) = 1 + 10^{-10}/l$ rad and $\frac{\mathbf{d}\phi}{\mathbf{d}t}(0) = 0$ rad/s. The lower curve was thus started with a displacement corresponding to approximately an atom diameter from the upper one. The evolution of the two is initially very similar. However, after some time they become completely different and uncorrelated. This extreme sensitivity to initial conditions is the characteristic defining *chaotic systems* and makes long term predictions essentially impossible. In those systems the initial difference between adjacent trajectories grows on average exponentially [3].

In this simple case the phenomenon is intuitively understandable. When the pendulum reaches a position very close to the vertical, an infinitesimal difference in velocity can determine if it makes a full turn or if it goes back.

Note that if the initial oscillation is sufficiently small, the force exerted by the vertical drive is almost orthogonal to the direction in which the mass is free to move. For this reason, small oscillations are not pushed to large swings and do not show chaotic behaviour. There are therefore regions of the phase space exhibiting chaotic behaviour and regions not exhibiting it. The extent of these regions depends of course on the amplitude *A* of the drive. Small values of *A* lead to large areas in which the system behaves predictably and only small areas displaying chaotic behaviour.
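The sensitivity can be reproduced with a sketch of the numerical experiment behind Fig. 1.7. The drive parameters match the figure, but the initial separation of $10^{-8}$ rad (instead of the atom-scale one) and the RK4 stepper are illustrative simplifications of ours:

```python
import math

# Parameters as in Fig. 1.7: omega_0 = 1 rad/s, omega = 2*omega_0,
# l = 9.8 m, A = l/10 = 0.98 m.
W0, W, L_ARM, A = 1.0, 2.0, 9.8, 0.98

def accel(t, phi):
    """phi'' = -W0^2 sin(phi) - (sin(phi)/l) * d^2/dt^2 [A cos(W t)]."""
    ypdd = -A * W * W * math.cos(W * t)
    return -W0 * W0 * math.sin(phi) - math.sin(phi) / L_ARM * ypdd

def integrate(phi0, t_end=200.0, dt=1e-3):
    """RK4 integration of the driven pendulum; returns phi(t_end)."""
    phi, w, t = phi0, 0.0, 0.0
    for _ in range(int(t_end / dt)):
        k1p, k1w = w, accel(t, phi)
        k2p, k2w = w + dt/2*k1w, accel(t + dt/2, phi + dt/2*k1p)
        k3p, k3w = w + dt/2*k2w, accel(t + dt/2, phi + dt/2*k2p)
        k4p, k4w = w + dt*k3w, accel(t + dt, phi + dt*k3p)
        phi += dt/6 * (k1p + 2*k2p + 2*k3p + k4p)
        w += dt/6 * (k1w + 2*k2w + 2*k3w + k4w)
        t += dt
    return phi

phi_a = integrate(1.0)
phi_b = integrate(1.0 + 1e-8)  # displaced by 1e-8 rad
print(abs(phi_a - phi_b))      # far larger than the initial separation
```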

**Fig. 1.7** Time evolution of the driven pendulum with $\omega_0 = 1$ rad/s, $\omega = 2\omega_0$, $g = 9.8$ m/s², $l = 9.8$ m, $A = l/10$. The upper curve was computed with initial conditions $\phi(0) = 1$ rad, $\frac{\mathbf{d}\phi}{\mathbf{d}t}(0) = 0$ rad/s, the lower one with initial conditions $\phi(0) = 1 + 10^{-10}/l$ rad, $\frac{\mathbf{d}\phi}{\mathbf{d}t}(0) = 0$ rad/s

# **1.2 Weakly-Nonlinear Systems**

Chaos, bifurcations and other phenomena of nonlinear systems are fascinating, important and sometimes fundamental to the problem at hand. However, the vast majority of engineered systems operate around stable equilibrium points by design. From an engineering point of view, a quantitative theory to study the behaviour of nonlinear systems in the proximity of stable equilibrium points is therefore very important.

Inspection of the presented phase portraits suggests that in the neighbourhood of equilibrium points the behaviour of nonlinear systems is not too different from that of linear systems, the deviation increasing with increasing distance of the state from the equilibrium points. In fact this statement can be made more precise. Consider a time invariant, single-input single-output (SISO) system with input *x* whose state dynamics are governed by the system of first order differential equations

$$\frac{\mathbf{d}}{\mathbf{d}t}u(t) = f\left(u(t), x(t)\right), \qquad f: \mathbb{R}^n \times \mathbb{R} \to \mathbb{R}^n$$

and its output *y* by the algebraic equation

$$y(t) = g(u(t), x(t)) \,, \qquad g : \mathbb{R}^n \times \mathbb{R} \to \mathbb{R} \,.$$

If around the equilibrium point $u_0 = 0$<sup>3</sup> and $x = 0$ the functions $f$ and $g$ are differentiable, then, using a Taylor expansion, the system behaviour can be approximated by the linear equations

$$\frac{\mathbf{d}}{\mathbf{d}t}u(t) \approx A u(t) + B x(t) \,, \qquad A \in \mathbb{R}^{n \times n}, \quad B \in \mathbb{R}^{n \times 1}$$

and

$$y(t) \approx C u(t) + D x(t) \,, \qquad C \in \mathbb{R}^{1 \times n} \,, \quad D \in \mathbb{R} \,.$$

The response of the system to the input signal can then be expressed by a convolution integral between the impulse response *h* of the system and the input signal

$$y(t) = h(\tau) \ast x(t) \,. \tag{1.3}$$

Note that in this chapter by stable equilibrium point we mean one for which all eigenvalues of the linearized state equation have negative real part.
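The linearization is also easy to carry out numerically. The sketch below is an illustrative example of ours (a damped pendulum with an input torque, not a system from the text): the Jacobians $A$ and $B$ are estimated by central differences at the equilibrium, and the eigenvalues of $A$ confirm its stability.

```python
import numpy as np

def f(u, x):
    """Illustrative example: a damped pendulum with input torque x."""
    u1, u2 = u
    return np.array([u2, -np.sin(u1) - 0.2 * u2 + x])

# Estimate the Jacobians A = df/du and B = df/dx at u = 0, x = 0
# by central differences.
eps, n = 1e-6, 2
A = np.zeros((n, n))
for j in range(n):
    e = np.zeros(n)
    e[j] = eps
    A[:, j] = (f(e, 0.0) - f(-e, 0.0)) / (2 * eps)
B = (f(np.zeros(n), eps) - f(np.zeros(n), -eps)) / (2 * eps)

print(A)                     # approximately [[0, 1], [-1, -0.2]]
print(np.linalg.eigvals(A))  # real parts negative: stable equilibrium
```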

Linear systems theory is very useful. However, for many practical applications this idealisation is too crude and doesn't capture effects that limit the usability of a vast array of systems. The theory presented in this book enables one to solve the system equations when $f$ and $g$ are approximated by a higher order polynomial or even by a power series. The theory is therefore able to give a more faithful description of the behaviour of many real systems. In particular, it allows probing into effects outside the reach of linear systems theory.

Consider first a *memory-less* system, that is, a system whose output *y*(*t*) at time *t* depends only on the value of its input signal *x*(*t*) at time *t* and not on any of its past (or future) values. Such a system can be represented by a function *g* mapping for every value of *t* the value *x*(*t*) to *y*(*t*)

$$y(t) = g(x(t)) \,.$$

Let's assume that for a zero input signal the output is zero and that *g* can be expanded in a Taylor series. Then we can write

$$y(t) = \sum_{k=1}^{\infty} g_k x^k(t) \,.$$
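A memory-less nonlinearity truncated to third order already shows the hallmark effect exploited throughout this book: a single input tone generates harmonics. The coefficients and frequencies below are illustrative choices of ours:

```python
import numpy as np

# Illustrative coefficients of a memory-less nonlinearity truncated
# to third order: y = g1*x + g2*x^2 + g3*x^3.
g1, g2, g3 = 1.0, 0.1, 0.05

fs, f0, n = 1000, 10, 1000            # 1 s of data, 1 Hz bin spacing
t = np.arange(n) / fs
x = np.cos(2 * np.pi * f0 * t)        # a single input tone at 10 Hz
y = g1 * x + g2 * x**2 + g3 * x**3

Y = np.abs(np.fft.rfft(y)) / n        # bin k sits at k Hz
# The output spectrum contains DC, f0, 2*f0 and 3*f0 components, with
# amplitudes fixed by the trigonometric identities for cos^2 and cos^3.
print(Y[0], Y[f0], Y[2 * f0], Y[3 * f0])
```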

Using the Dirac δ distribution this expression can be written in the different form

$$y(t) = \sum_{k=1}^{\infty} g_k \, \delta(\tau_1, \dots, \tau_k) \ast x^{\otimes k}(t) \,.$$

<sup>3</sup> By a change of variables it's always possible to move the equilibrium point of interest to the origin.

That's because, under assumptions to be made precise later, the δ distribution is the unit of the convolution product

$$
\delta(\tau) \ast x(t) = x(t) \,.
$$

The response of a *linear* memory-less system can therefore be written as

$$y(t) = g_1 \delta(\tau) \ast x(t)$$

which shows a striking similarity with (1.3), the response of a linear dynamical system with impulse response *h*. In fact $g_1\delta$ is the impulse response of a linear memory-less system, and it vanishes everywhere except at the origin, as expected.

From these considerations it's natural to hypothesise that the response of a class of nonlinear systems, around a stable equilibrium point, can be represented by a series of the form

$$y(t) = \sum_{k=1}^{\infty} h_k(\tau_1, \dots, \tau_k) \ast x^{\otimes k}(t) \,.$$

This is in fact true, and it is the *Volterra series* representation of the system, with $h_k$ its $k$th order impulse response. This representation is valid only for sufficiently small input signals not pushing the state of the system beyond a separatrix. This limitation of the Volterra series should not be surprising. In fact power series, which are a subset of the Volterra series, in general also have a finite convergence radius.
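The convergence-radius analogy can be made concrete with the VCCS characteristic of Sect. 1.1.2: the Taylor polynomial of tanh (here truncated to 7th order) is excellent for small arguments but useless beyond the convergence radius $\pi/2$ of the series. A small illustrative check:

```python
import math

# Taylor polynomial of tanh about 0 to 7th order; the coefficients
# 1, -1/3, 2/15, -17/315 are the standard series terms.
def tanh_poly(v):
    return v - v**3 / 3 + 2 * v**5 / 15 - 17 * v**7 / 315

small, large = 0.2, 3.0
print(abs(math.tanh(small) - tanh_poly(small)))  # tiny: inside the radius
print(abs(math.tanh(large) - tanh_poly(large)))  # huge: |v| > pi/2
```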

The similarity between power series and the Volterra series doesn't end here. By introducing suitable definitions, we can represent the cascade of nonlinear systems represented by their respective Volterra series in a similar way as the composition of power series.

We call systems that can be represented by a Volterra series *weakly-nonlinear* systems. A feature of weakly-nonlinear systems shared with linear ones is the fact that the differential equations describing the system have to be solved only once to obtain the impulse responses. The response of the system to a large set of different input signals can then be computed directly from them. The impulse responses therefore completely characterise weakly-nonlinear systems. As with linear systems, weakly-nonlinear ones have a frequency domain representation in terms of nonlinear transfer functions.

The theory can be extended to cover time-varying systems. In this case the impulse responses (or nonlinear transfer functions) have an explicit dependence on time

$$h_k(t, \tau_1, \dots, \tau_k) \,.$$

# **1.3 Distributions**

The Dirac δ distribution plays a key role in highlighting the relationship between power and Volterra series. An ad-hoc use of the δ distribution however easily leads to problems.

Consider for example the *Heaviside unit step* function (or *unit step* function)

$$\mathbf{1}\_{+}(t) := \begin{cases} 0 & t < 0 \\ 1 & t \ge 0 \end{cases} \tag{1.4}$$

and the theorem stating that the Laplace transform of the derivative of a function *f* continuous for *t* > 0 is

$$sF(s) - f(0+)$$

with *F* the Laplace transform of *f* and *f*(0+) the right-hand limit of the function at 0. A careless application of this theorem to 1<sub>+</sub> gives

$$\mathcal{L}\left\{\frac{\mathbf{d}}{\mathbf{d}t}\mathbf{1}\_{+}\right\} = s\frac{1}{s} - 1 = 0$$

where we have used the fact that *L*{1<sub>+</sub>} = 1/*s*. However, we will show that the derivative of 1<sub>+</sub> is the δ impulse, whose Laplace transform is 1. The error lies in the fact that δ is not a function, but rather a Schwartz distribution, or *distribution* for short. The above theorem, in the stated form, is therefore not applicable.

Distributions are the proper setting for studying linear and weakly-nonlinear systems. In this setting the convolution product comes to play a central role. In fact, distributions allow defining *convolution algebras* with δ playing the role of the unit. The Laplace transform then not only maps convolution products into multiplications, but it also maps the unit of the convolution algebra into the unit of multiplication. In addition, in this setting, the derivative of a distribution *f* can be represented as the convolution of the distribution with the derivative of the unit

$$\frac{\mathbf{d}}{\mathbf{d}t}f = \frac{\mathbf{d}}{\mathbf{d}t}\delta \ast f \,.$$

Differential equations can therefore be transformed into convolution equations to obtain a complete time-domain mirror image of the Laplace domain algebraic equations. Distributions enable a uniform representation in terms of convolution products of ubiquitous and embarrassingly simple linear systems such as inductors, which a theory based on functions is unable to do

$$v = L \frac{\mathbf{d}}{\mathbf{d}t} \delta \; \* \; i \; .$$

Here we see the current *i* as the input and the voltage v as the output of the system.
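This convolutional description can be illustrated numerically. In the sketch below (our own construction; the inductance value, sampling step and test current are arbitrary) the derivative of δ is approximated by the derivative of a narrow Gaussian, and the discrete convolution with the current reproduces v = L di/dt away from the boundaries.

```python
import numpy as np

# Sampled derivative-of-Gaussian kernel approximating (d/dt) delta.
dt = 1e-4
sig = 1e-3
tk = np.arange(-5 * sig, 5 * sig + dt / 2, dt)   # symmetric grid, odd length
gauss = np.exp(-tk**2 / (2 * sig**2))
gauss /= gauss.sum() * dt                        # unit area: a delta approximation
dgauss = -(tk / sig**2) * gauss                  # approximates delta'

L = 2e-3                                         # inductance (arbitrary value)
t = np.arange(0.0, 1.0, dt)
i = np.sin(2 * np.pi * 10 * t)                   # inductor current

# v = L (delta' * i): the discrete convolution approximates the integral.
v = L * np.convolve(i, dgauss, mode="same") * dt
v_exact = L * 2 * np.pi * 10 * np.cos(2 * np.pi * 10 * t)

core = slice(1000, -1000)                        # ignore boundary effects
err = np.max(np.abs(v[core] - v_exact[core])) / np.max(np.abs(v_exact))
assert err < 0.02
```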

While we have been implicitly assuming causal systems and signals vanishing for *t* < 0, there are other convolution algebras. One of them is the convolution algebra of periodic distributions intimately related to the Fourier series, where the δ distribution plays a central role as well.

# **1.4 Numerical Simulations**

Learning a theory requires some investment of time. The question is: is it worth it in a world full of computers, where numerical methods able to solve most nonlinear equations are readily available? In our view the answer is a resounding yes. Numerical simulations and theory are not in competition; rather, they complement each other.

The theory is able to reveal the origin of the various effects at play and to clarify how each parameter affects the performance of a system. However, to do so we must use relatively simple models and therefore, most of the time, obtain approximate results.

Simulations on the other hand can be used to obtain accurate answers taking into account all details. However, the results are presented as tables of numbers (or curves) valid only for a specific set of values of the parameters. We can of course run many simulations and sweep parameters, but complex simulations are not fast and this poses practical limits. In addition, inferring the relationship between parameters and a specific effect from simulation results alone is often challenging. We can say that a good theory-based engineering model is like a (slightly distorted) picture, while numerical simulations are like dots of a halftone image. A good model is worth thousands of simulations.

Most of the time the difficulty in engineering problems lies in finding the simplest model able to correctly characterise the effects of interest. During the phase of model development, numerical simulators can be extremely useful by using them as ideal laboratories in which to validate hypotheses. In these virtual laboratories it's easy to change the laws of physics and suppress or decouple phenomena in a way that's impossible in the real world. Experiments conducted in these virtual laboratories can therefore be an invaluable guide in the development of a model. Once a good model has been found, it will rapidly guide the development of the system. Numerical simulations will then serve further for final tuning and verification.

# **1.5 Historical Notes**

Around 1887 Vito Volterra developed the concept of *functionals* as an extension of functions of multiple variables to ones with an uncountably infinite number of variables [4, 5]. Let *f*(*x*<sub>1</sub>,..., *x<sub>k</sub>*) be a real valued function of the *k* real variables *x*<sub>1</sub> to *x<sub>k</sub>*. The latter can be interpreted as the values of a function *x* evaluated at the discrete points 1 to *k*. He therefore conceived a functional as a function of another function *x* defined over a continuous finite interval. He then proposed the series now bearing his name as an extension of the Taylor series from functions to functionals.

In 1910 the mathematician M. Fréchet published a more detailed analysis of the conditions under which a functional can be expanded into a Volterra series [6]. This work is regarded by most as the foundation demonstrating the validity of Volterra's series expansion.

Volterra was well known and was invited to present his works in several countries, including the United States. During the Second World War there was great pressure to develop anti-aircraft systems, and N. Wiener of the Massachusetts Institute of Technology (MIT) found that by using Volterra's series he could analyse the response of a nonlinear device to noise [7]. His report was initially restricted. Its release after the war sparked interest in the engineering community at MIT and elsewhere. Several studies followed from the 50s to the 70s applying Volterra's series to nonlinear engineering problems, with [8–10] among the most significant ones. Wiener himself remained interested in the subject and developed his own variant of the theory based on Brownian motion, leading to what's now called the *Wiener series* [11]. At the beginning of the 80s some books summarised the Volterra and Wiener theories [12, 13]. At around the same time, with the rise of desktop computers, engineering efforts increasingly embraced numerical methods.

During the first decades of the 20th century there were two mathematical methods used by engineers and physicists that kept mathematicians occupied. The first was to find a solid mathematical justification for the *operational calculus* popularised by O. Heaviside. The second was the search for a solid mathematical interpretation for the δ distribution and its derivatives extensively used by P. A. M. Dirac in his landmark treatise on quantum mechanics which first appeared in 1930 [14].

The former was solved in two ways:


The second was solved by L. Schwartz by introducing new mathematical objects called *distributions* [16]. These are a special class of functionals with particularly attractive properties, such as the fact that they are indefinitely differentiable. In addition, differentiation of distributions is a continuous operation, making series always differentiable term by term. With distributions Schwartz not only put the δ "function" and its derivatives on solid ground, but he also introduced convolution algebras and unified the two justifications of the operational calculus.

A deep understanding of distributions requires familiarity with advanced concepts of topological vector spaces [16, 17], which is probably why they are rarely introduced to engineers. However, the elementary part of the theory can be developed without resorting to particularly deep mathematical concepts and is of great practical value in physics and engineering problems. The aim of this book is to introduce distributions to engineers and use them to view the Volterra series and, more generally, weakly-nonlinear systems from a point of view different from the traditional one. The advantages are, among others, a conceptual simplification, a simpler notation freeing expressions from multiple integrals, and an exposition of the theory of weakly-nonlinear systems as a natural extension of the linear one.

**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Part I Signals**

# **Chapter 2 Distributions**

We investigate the mathematical description of signals that are commonly used in the analysis of technical problems. From a mathematical point of view it would be convenient to limit the signals of interest to the set of continuous functions. However, this has several disadvantages. For example, suppose we are interested in the transient response of, say, a series *RC* low-pass filter (LPF) to an input step voltage. If we describe the input signal with a continuous function, then the details of the calculations depend on the chosen description of the input signal rise transient. This however tends to mask the fact that, if the LPF time constant is much larger than the input signal rise-time, the output response is essentially independent of the input signal transient shape. For this reason, in these situations, it is much more convenient to use an *idealized* input unit step function such as the Heaviside unit step function

$$\mathbf{1}\_+(t) = \begin{cases} 0 & t < 0 \\ 1 & t \ge 0 \end{cases}$$

which is not continuous at *t* = 0.
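The insensitivity to the rise transient claimed above can be illustrated with a short numerical experiment (our own sketch; the parameter values are arbitrary, with the rise time chosen much shorter than the time constant). A forward-Euler integration of RC · dy/dt + y = x gives nearly identical responses for the ideal step and for a ramp-smoothed step.

```python
import numpy as np

def rc_lowpass(x, dt, rc):
    """Forward-Euler integration of RC * dy/dt + y = x, y(0) = 0."""
    y = np.zeros_like(x)
    for n in range(1, len(x)):
        y[n] = y[n - 1] + dt * (x[n - 1] - y[n - 1]) / rc
    return y

dt, rc, t_rise = 1e-4, 1.0, 0.01        # rise time << time constant
t = np.arange(0.0, 5.0, dt)
step = np.where(t >= 0, 1.0, 0.0)       # ideal unit step 1_+
ramp = np.clip(t / t_rise, 0.0, 1.0)    # linear-ramp rise of duration t_rise

y_step = rc_lowpass(step, dt, rc)
y_ramp = rc_lowpass(ramp, dt, rc)

# The two responses differ by roughly t_rise / (2 RC): negligible here.
assert np.max(np.abs(y_step - y_ramp)) < 0.01
```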

Consider further the LPF example. If we write the differential equation using the current as unknown, then we need the derivative of the driving signal. However, the derivative of the function 1<sub>+</sub> does not exist at *t* = 0 and is zero at every other point. We are therefore led to introduce the so-called Dirac impulse δ, which however is not even a function, but rather a *generalized function* or *distribution*.

It follows from these considerations that a correct description of commonly used signals belongs to the theory of *distributions*. Distributions have many useful properties. The key one being that they can be differentiated any number of times. The main contributor to the development of this theory was Schwartz [16].

# **2.1 Test Functions**

The key idea in the theory of distributions, is not to direct attention to the value of a function at every point of its domain, but instead to "measure" the behavior of a function when acting on a class of particularly well-behaved functions. In this section we introduce one such class of functions, the class of *test functions*.

Let *k* = (*k*1,..., *kn*) be an *n*-tuple of non-negative integers called a *multi-index.* The *differential operator* of order |*k*| is defined by

$$D^k := D\_1^{k\_1} \cdot \cdots \cdot D\_n^{k\_n}, \quad D\_i := \frac{\partial}{\partial \tau\_i} \tag{2.1}$$

with

$$|k| := k\_1 + \dots + k\_n \tag{2.2}$$

the length of the multi-index *k* and τ ∈ R<sup>*n*</sup>. For functions of a single variable we also use the following shorter notation for the *k*th order derivative

$$f^{(k)} := \frac{\mathbf{d}^k f}{\mathbf{d}\tau^k} \tag{2.3}$$

where in this case *k* is of course a single non-negative integer.

Given an open set *U* ⊂ R<sup>*n*</sup>, the set of all *k*-times continuously differentiable functions *f* : *U* → C is denoted by C<sup>*k*</sup>(*U*) or simply C<sup>*k*</sup>.

**Definition 2.1** (*Test function*) A function φ : R<sup>*n*</sup> → C is called a *test function* if it is indefinitely differentiable and has compact support, that is, if φ ∈ C<sup>∞</sup> and φ(τ<sub>1</sub>,...,τ<sub>*n*</sub>) = 0 outside a compact set *K*. The vector space of all such functions is denoted by D.

To define a continuity criterion for distributions we need a topology on D, which can be encoded in the form of a *convergence principle*.

**Definition 2.2** (*Convergence of test functions*) A sequence of functions φ*<sub>m</sub>* ∈ D, *m* ∈ N, is said to converge to φ ∈ D, in symbols

$$
\phi\_m \xrightarrow[\mathcal{D}]{} \phi \quad \text{or} \quad \lim\_{m \to \infty} \phi\_m = \phi \; , 
$$

if the following two conditions are met:

1. There is a compact set *K* containing the supports of all the φ*<sub>m</sub>* as well as the support of φ.
2. For every multi-index *k* the derivatives *D<sup>k</sup>*φ*<sub>m</sub>* converge uniformly to *D<sup>k</sup>*φ.

These conditions ensure that the limiting function (i) has compact support and (ii) is indefinitely differentiable; in other words, that the limiting function is also a test function.

#### **Example 2.1: Test function**

Consider the following functions (see Fig. 2.1)

$$\beta\_{\nu}(t) := \begin{cases} \frac{\nu}{B} \mathbf{e}^{\frac{-1}{1-(\nu t)^2}} & \text{for } |\nu t| < 1 \\ 0 & \text{for } |\nu t| \ge 1 \end{cases} \tag{2.4}$$

$$B := \int\_{-1}^{1} \mathbf{e}^{\frac{-1}{1-t^2}} \,\mathrm{d}t$$

For each value of ν > 0 and for |ν*t*| < 1 the function βν is the composition of a rational function with no singularities and the exponential function. Since the latter two functions are indefinitely differentiable and the composition of indefinitely differentiable functions is also indefinitely differentiable, it follows that, in this range, βν is indefinitely differentiable.

To establish that βν is a test function we have further to show that

$$\lim\_{|\nu t| \uparrow 1} D^k \beta\_{\nu}(t) = 0 \tag{2.5}$$

for all values of *k*. This can be done by induction: assume that the *k*th order derivative is the product of β<sub>ν</sub> and a polynomial in the two variables τ<sub>1</sub> = 1/(1 − ν*t*) and τ<sub>2</sub> = 1/(1 + ν*t*)

$$D^k \beta\_{\nu}(t) = p\_k \left( \frac{1}{1 - \nu t}, \frac{1}{1 + \nu t} \right) \beta\_{\nu}(t) \,. \tag{2.6}$$

This is clearly the case for *k* = 0. We show that this is then true for *k* + 1:


$$\begin{split} D^{k+1}\beta\_{\nu}(t) &= \left[ -\frac{\nu\,(D\_{2}p\_{k})\!\left(\frac{1}{1-\nu t},\frac{1}{1+\nu t}\right)}{(\nu t+1)^{2}} + \frac{\nu\, p\_{k}\!\left(\frac{1}{1-\nu t},\frac{1}{1+\nu t}\right)}{2\,(\nu t+1)^{2}} \right. \\ &\quad \left. + \frac{\nu\,(D\_{1}p\_{k})\!\left(\frac{1}{1-\nu t},\frac{1}{1+\nu t}\right)}{(\nu t-1)^{2}} - \frac{\nu\, p\_{k}\!\left(\frac{1}{1-\nu t},\frac{1}{1+\nu t}\right)}{2\,(\nu t-1)^{2}} \right] \beta\_{\nu}(t) \\ &=: p\_{k+1}\!\left(\frac{1}{1-\nu t},\frac{1}{1+\nu t}\right)\beta\_{\nu}(t) \,. \end{split} \tag{2.7}$$

If we express the limit as ν*t* tends to 1 in terms of τ<sub>1</sub>, we see that it is the limit of the product of a polynomial and a decaying exponential, which converges to 0

$$\lim\_{\nu t \uparrow 1} D^k \beta\_{\nu}(t) = \lim\_{\tau\_1 \to \infty} p\_k\!\left(\tau\_1, \frac{\tau\_1}{2\tau\_1 - 1}\right) \frac{\nu}{B}\, \mathbf{e}^{-\frac{\tau\_1^2}{2\tau\_1 - 1}} = 0 \,. \tag{2.8}$$

Similarly, the limit toward −1 can be expressed in terms of τ<sub>2</sub> with the same result. Hence β<sub>ν</sub> ∈ C<sup>∞</sup>.

While β<sub>ν</sub> is a test function for each value of ν, the sequence (β*<sub>m</sub>*), *m* ∈ N, doesn't converge in D. For *t* ≠ 0, as *m* → ∞, the value of β*<sub>m</sub>*(*t*) converges toward zero, while the value of the functions at *t* = 0 grows without bound. The limiting function is therefore not continuous.

The sequence (β<sub>1/*m*</sub>) also doesn't converge in D. As *m* → ∞ the support of the functions grows without bound. It is therefore not possible to find a compact set *K* containing the support of all members of the sequence as well as that of the limiting function.

An example of a converging sequence is (β*<sub>m</sub>*/*m*<sup>2</sup>), which converges toward the zero function.
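These properties are easy to check numerically. In the Python sketch below (our own discretisation) the normalisation constant B is computed by quadrature; the assertions verify the unit area of β<sub>ν</sub> and the behaviour of the sequences (β*<sub>m</sub>*) and (β*<sub>m</sub>*/*m*²) discussed above.

```python
import numpy as np

# normalisation constant B of Eq. (2.4), computed by numerical quadrature
_tb = np.linspace(-1 + 1e-9, 1 - 1e-9, 200001)
B = np.sum(np.exp(-1.0 / (1.0 - _tb**2))) * (_tb[1] - _tb[0])

def beta(nu, t):
    """The bump function beta_nu of Eq. (2.4)."""
    t = np.asarray(t, dtype=float)
    out = np.zeros_like(t)
    m = np.abs(nu * t) < 1                  # support |nu t| < 1
    out[m] = (nu / B) * np.exp(-1.0 / (1.0 - (nu * t[m])**2))
    return out

dt = 1e-5
t = np.arange(-1.5, 1.5, dt)
for nu in (1.0, 2.0, 5.0):
    assert abs(beta(nu, t).sum() * dt - 1.0) < 1e-3   # unit area for every nu
# sup of beta_m grows with m, so (beta_m) cannot converge in D ...
assert beta(10.0, t).max() > beta(1.0, t).max()
# ... while beta_m / m**2 tends uniformly to the zero function
assert beta(100.0, t).max() / 100.0**2 < 1e-2
```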

#### **Example 2.2: Regularisation**

Consider an impulse of finite duration (see Fig. 2.2a)

$$\mathbf{1}\_k(t) := \mathbf{1}\_+(t) - \mathbf{1}\_+(t - k) = \begin{cases} 1 & 0 \le t < k \\ 0 & \text{otherwise} \end{cases}$$

This function is clearly not continuous at *t* = 0 and *t* = *k*. These jump discontinuities can be removed by convolving 1*<sup>k</sup>* with the function βν of the previous example

$$\mathbf{1}\_{k} \ast \beta\_{\nu}(t) = \int\_{-\infty}^{\infty} \mathbf{1}\_{k}(\tau)\, \beta\_{\nu}(t - \tau)\, d\tau = \int\_{0}^{k} \beta\_{\nu}(t - \tau)\, d\tau \,. \tag{2.9}$$

We say that the function so obtained is the *regularised* of 1<sub>*k*</sub> by β<sub>ν</sub> (see Fig. 2.2a).

**Fig. 2.2** **a** Regularised of the discontinuous function 1<sub>2</sub>. **b** Construction of the regularised of 1<sub>2</sub>

Observe that 1<sub>*k*</sub> ∗ β<sub>ν</sub> is just a definite integral of β<sub>ν</sub> and is therefore indefinitely differentiable. If the support of β<sub>ν</sub> lies completely within the integration range, then, given the chosen normalization constant for β<sub>ν</sub>, the value of 1<sub>*k*</sub> ∗ β<sub>ν</sub> is 1. If the support of β<sub>ν</sub> doesn't intersect the integration range, then the value of 1<sub>*k*</sub> ∗ β<sub>ν</sub> is 0. For the remaining values of the independent variable *t*, 0 < 1<sub>*k*</sub> ∗ β<sub>ν</sub>(*t*) < 1 (see Fig. 2.2b)

$$\mathbf{1}\_k \ast \beta\_\nu(t) = \begin{cases} 1 & 1/\nu \le t \le k - 1/\nu \\ 0 & t \le -1/\nu \text{ or } t \ge k + 1/\nu \\ > 0 \text{ and } < 1 & \text{otherwise.} \end{cases} \tag{2.10}$$

We have thus established that 1<sub>*k*</sub> ∗ β<sub>ν</sub> ∈ D.
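A discrete version of this regularisation (our own sketch, with ν = 4 and k = 2) reproduces the values stated in (2.10): a plateau of exactly 1, zero outside the enlarged support, and intermediate values in between.

```python
import numpy as np

dt = 1e-3
nu, k = 4.0, 2.0

# bump beta_nu sampled on a symmetric grid, numerically normalised to unit area
tb = np.arange(-1.0 / nu, 1.0 / nu + dt / 2, dt)
bump = np.zeros_like(tb)
m = np.abs(nu * tb) < 1
bump[m] = np.exp(-1.0 / (1.0 - (nu * tb[m])**2))
bump /= bump.sum() * dt

t = np.arange(-1.0, 3.0, dt)
ind = ((t >= 0) & (t < k)).astype(float)         # the pulse 1_k

reg = np.convolve(ind, bump, mode="same") * dt   # the regularised 1_k * beta_nu

# Eq. (2.10): equal to 1 on [1/nu, k - 1/nu], 0 outside [-1/nu, k + 1/nu]
plateau = (t >= 1 / nu + 2 * dt) & (t <= k - 1 / nu - 2 * dt)
outside = (t <= -1 / nu - 2 * dt) | (t >= k + 1 / nu + 2 * dt)
assert np.allclose(reg[plateau], 1.0, atol=1e-6)
assert np.allclose(reg[outside], 0.0, atol=1e-6)
assert np.all((reg > -1e-9) & (reg < 1 + 1e-9))  # values stay within [0, 1]
```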

From this example we see that for any open interval *U* and any closed interval *K* ⊂ *U* we can construct a real valued test function φ with 0 ≤ φ(*t*) ≤ 1, a value of 1 within *K* and a value of 0 outside of *U*. This is a useful property that we will exploit later.

A similar construction can be made for test functions of more than one variable. For later reference we define a real valued test function with values between 0 and 1 that we call α such that

$$\alpha: \mathbb{R}^n \to [0, 1], \quad \tau \mapsto \begin{cases} 1 & |\tau| \le 1 \\ 0 & |\tau| \ge 2 \,. \end{cases} \tag{2.11}$$

# **2.2 Distributions**

A key aspect of the theory of distributions is the fact that it makes continuous functions differentiable any number of times. To see how this goes, remember from calculus that by partial integration we can transfer the operation of differentiation from one function to another one. Thus, if we pair the function of interest *f* with a function φ differentiable everywhere, then we can relate the derivative of *f* with a well-defined expression

$$\int\_{-\infty}^{\infty} Df(\tau) \, \phi(\tau) \, d\tau = f(\tau) \, \phi(\tau)|\_{-\infty}^{\infty} - \int\_{-\infty}^{\infty} f(\tau) \, D\phi(\tau) \, d\tau \, . \tag{2.12}$$

To make the expression independent of the limits of integration, the first term on the right-hand side should disappear. This can be achieved, for example, by choosing a function φ with compact support. In addition, to be able to assign a meaning to the derivative of any order, the function φ should be indefinitely differentiable. Note that these are precisely the properties of *test functions*.

An additional requirement is that of the assignment being unique. For example, the right-hand side expression should be identically zero only if *D f* = 0 (almost everywhere). Suppose that *f* has compact support. If the support of *D*φ doesn't overlap with the one of *f* then the right-hand expression is also zero and the assignment is not unique. To avoid this situation we are forced to pair the function *f* with *every* test function φ ∈ D.

A distribution is a generalization of these ideas and is defined as follows.

**Definition 2.3** (*Distribution*) A *distribution* is defined as a *linear, continuous* function on the set of test functions

$$T: \mathcal{D}(\mathbb{R}^n) \to \mathbb{C}, \quad \phi \mapsto \langle T, \phi \rangle \tag{2.13}$$

This means that a distribution *T* has the following properties:

1. ⟨*T*, φ<sub>1</sub> + φ<sub>2</sub>⟩ = ⟨*T*, φ<sub>1</sub>⟩ + ⟨*T*, φ<sub>2</sub>⟩ for all φ<sub>1</sub>, φ<sub>2</sub> ∈ D.
2. ⟨*T*, *c*φ⟩ = *c*⟨*T*, φ⟩ for every constant *c* ∈ C and every φ ∈ D.
3. From φ*<sub>k</sub>* −→ φ in D it follows that ⟨*T*, φ*<sub>k</sub>*⟩ → ⟨*T*, φ⟩ (continuity).

Since distributions are linear by definition, the condition of continuity can be expressed in a slightly different, but equivalent way:

3′. From φ*<sub>k</sub>* −→ 0 in D it follows that ⟨*T*, φ*<sub>k</sub>*⟩ → 0.

Two distributions *T*<sub>1</sub> and *T*<sub>2</sub> are equal if ⟨*T*<sub>1</sub>, φ⟩ = ⟨*T*<sub>2</sub>, φ⟩ for every test function φ ∈ D. A distribution is called *real* if it evaluates to a real number when applied to any real valued test function.

The set of all distributions forms a vector space denoted by D′, where addition of two distributions *T*<sub>1</sub> and *T*<sub>2</sub> and multiplication with a complex constant *c* are defined by

$$\begin{aligned} \langle T\_1 + T\_2, \phi \rangle &:= \langle T\_1, \phi \rangle + \langle T\_2, \phi \rangle \\ \langle cT, \phi \rangle &:= c \langle T, \phi \rangle = \langle T, c \, \phi \rangle. \end{aligned}$$

A mapping assigning a number to every element of a vector space is called a *functional*. Distributions are therefore functionals on test functions.

#### **Example 2.3: Functions as distributions**

Consider a continuous function *f* ∈ C(R<sup>*n*</sup>). We can associate with it a distribution *T<sub>f</sub>* by the procedure outlined at the beginning of the section

$$
\langle T\_f, \phi \rangle = \int\_{\mathbb{R}^n} f(\boldsymbol{\tau}) \, \phi(\boldsymbol{\tau}) \, d^n \boldsymbol{\tau} \,. \tag{2.14}
$$

Linearity is clear from the properties of integrals. To see that it is continuous, consider a sequence of test functions converging to zero, φ*<sub>m</sub>* −→ 0 in D. Then

$$\left| \int\_{\mathbb{R}^n} f(\tau)\, \phi\_m(\tau)\, d^n\tau \right| \le \sup\_{t \in K} |\phi\_m(t)| \int\_K |f(\tau)|\, d^n\tau \longrightarrow 0$$

with *K* a compact set including the support of all φ*m*.

Consider now two continuous functions *f*<sub>1</sub> and *f*<sub>2</sub>. If ⟨*T*<sub>*f*<sub>1</sub></sub>, φ⟩ = ⟨*T*<sub>*f*<sub>2</sub></sub>, φ⟩ for every φ ∈ D, then, by the properties of integrals of continuous functions, it follows that *f*<sub>1</sub> = *f*<sub>2</sub>. We thus have an injective mapping from continuous functions to distributions. We can therefore *identify* continuous functions with their corresponding distributions and write ⟨*f*, φ⟩ instead of ⟨*T<sub>f</sub>*, φ⟩.

The theory of distributions requires the use of Lebesgue integrals as opposed to Riemann ones, as Lebesgue's integration theory is more powerful and allows integrating a broader set of functions. Of course, when both integrals do exist, they coincide. A key concept in the Lebesgue theory of integration is that of the *measure*. For our purposes we can think of the Lebesgue measure as a volume and a set of *zero measure* in R*<sup>n</sup>* as a (sufficiently regular) subspace of dimension *k* < *n*. A point on the real line R, a line on a plane and a surface in R<sup>3</sup> are all examples of sets of zero measure. The union of a *denumerable* family of sets of zero measure is itself a set of zero measure. Therefore, the set of rational numbers on the real line R has zero measure. Two locally integrable functions differing only on a set of zero measure are said to be equal *almost everywhere*.

#### **Example 2.4: Locally integrable functions**

Consider a *locally integrable* function *f* ∈ L<sup>1</sup><sub>loc</sub>(R<sup>*n*</sup>), that is, a function that is Lebesgue integrable over every compact set *K* ⊂ R<sup>*n*</sup>. As in the previous example we can associate it with a distribution through the integral (2.14). In this case however the mapping is not injective. Any two locally integrable functions *f*<sub>1</sub> and *f*<sub>2</sub> differing only on a set of measure zero produce the same value ⟨*f*<sub>1</sub>, φ⟩ = ⟨*f*<sub>2</sub>, φ⟩ for every φ ∈ D. That means that they map to the same distribution.

In physical and engineering applications the values of a function on a set of zero measure are often unimportant. It is therefore natural to consider the *equivalence class* of all functions differing at most on a set of zero measure. In this way we obtain again an injective mapping, but now from the equivalence classes of locally integrable functions equal almost everywhere to distributions, and we can again identify without ambiguity the former with the latter. To avoid overloading the notation we write a representative for the equivalence class, that is, we write ⟨*f*, φ⟩ where *f* is a representative.

All distributions that can be represented by locally integrable functions through (2.14) are called *regular distributions*. Not all distributions are regular, however, and those that aren't are called *singular distributions*. Nonetheless, regular distributions are *dense* in D′. That is, in a similar way as real numbers arise by a limiting process from rational ones, any distribution can be represented as a limit of regular distributions, where the convergence of distributions is defined as follows.

**Definition 2.4** (*Convergence of distributions*) A sequence of distributions (*T<sub>m</sub>*)<sub>*m*∈N</sub> is said to converge to the distribution *T* if the sequence of numbers ⟨*T<sub>m</sub>*, φ⟩ converges to the number ⟨*T*, φ⟩ for every φ ∈ D. In symbols

$$T\_m \xrightarrow[\mathcal{D}']{} T \quad \text{or} \quad \lim\_{m \to \infty} T\_m = T$$


if

$$
\langle T\_m, \phi \rangle \quad \longrightarrow \ \langle T, \phi \rangle \quad \text{for every } \phi \in \mathcal{D} \text{ .}\
$$

It is not obvious that the limit *T* is in fact a distribution, that is, linear and continuous. However, this is indeed the case; the space D′ is thus closed under convergence. A proof can be found in [18].

This definition is based on a discrete parameter *m* traversing the natural numbers. If the parameter traverses a continuous set of values the situation is similar and can be reduced to the discrete case. Consider the family of distributions *T*<sub>ν</sub> depending on the continuous parameter ν ∈ R. For each value of ν and each test function φ, the functional evaluates to a number ⟨*T*<sub>ν</sub>, φ⟩. For each test function φ the family *T*<sub>ν</sub> therefore defines a function of ν

$$
\zeta(\nu) = \langle T\_{\nu}, \phi \rangle \,.
$$

Let us define a sequence (ν*<sub>m</sub>*)<sub>*m*∈N</sub> of values converging toward infinity. If for every such sequence and every test function

$$\lim\_{m \to \infty} \zeta(\nu\_m) = \lim\_{m \to \infty} \langle T\_{\nu\_m}, \phi \rangle = \langle T, \phi \rangle \,,$$

then

$$\lim\_{\nu \to \infty} \langle T\_{\nu}, \phi \rangle = \langle T, \phi \rangle.$$

Similarly for a continuous parameter converging toward a finite limit η.

#### **Example 2.5: Dirac delta distribution**

Consider the functions β*<sup>m</sup>* of Example 2.1. The regular distributions associated with these functions form a sequence converging to a singular distribution. We have

$$\begin{aligned} \lim\_{m \to \infty} \langle \beta\_m, \phi \rangle \\ = \lim\_{m \to \infty} \int\_{-1/m}^{1/m} \beta\_m(\tau) \, \phi(\tau) \, d\tau \\ = \lim\_{m \to \infty} \left\{ \int\_{-1/m}^{1/m} \beta\_m(\tau) \, \phi(0) \, d\tau + \int\_{-1/m}^{1/m} \beta\_m(\tau) \, [\phi(\tau) - \phi(0)] \, d\tau \right\} \end{aligned}$$

Since test functions are continuous and differentiable we can use the mean value theorem to express φ as

$$
\phi(\tau) = \phi(0) + D\phi(\lambda)\,\tau
$$

for some λ ∈ (0,τ). With this we can see that the second term converges to zero


$$\begin{aligned} \int\_{-1/m}^{1/m} \beta\_m(\tau) \left[ \phi(\tau) - \phi(0) \right] d\tau &\leq \int\_{-1/m}^{1/m} |\beta\_m(\tau)| \left| \phi(\tau) - \phi(0) \right| d\tau \\ &\leq \sup\_{\lambda \in (-\frac{1}{m}, \frac{1}{m})} \frac{|D\phi(\lambda)|}{m} \int\_{-1/m}^{1/m} |\beta\_m(\tau)| \, d\tau \\ &\longrightarrow 0 \end{aligned}$$

We therefore obtain

$$\begin{aligned} \lim\_{m \to \infty} \langle \beta\_m, \phi \rangle &= \phi(0) \lim\_{m \to \infty} \int\_{-1/m}^{1/m} \beta\_m(\tau) \, d\tau \\ &= \phi(0) \lim\_{m \to \infty} 1 \\ &= \phi(0). \end{aligned}$$

The sequence β*<sub>m</sub>* thus converges to the Dirac delta distribution δ, which is defined by

$$
\langle \delta, \phi \rangle := \phi(0). \tag{2.15}
$$

Besides the sequence β*<sub>m</sub>*, there are many other sequences of regular distributions converging to δ. For example, with the same procedure used above, it is simple to show that the sequence defined by the following functions also converges to δ

$$f\_m(t) = \begin{cases} m/2 & |t| \le 1/m \\ 0 & |t| > 1/m \end{cases}$$

Note that the notation used in many technical texts to define the Dirac delta distribution is not mathematically correct and has only a symbolic value

$$\int\_{-\infty}^{\infty} \delta(\tau) \,\phi(\tau) \,d\tau = \phi(0).$$

This notation implies the existence of a *function* with a value of zero everywhere except at τ = 0, where its value is infinite. However, the Lebesgue integral of such a function is zero, since a single point of the real line has zero measure. The notation is nevertheless useful, as it helps in remembering several properties that we will encounter shortly.
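The convergence ⟨f*<sub>m</sub>*, φ⟩ → φ(0) is easy to reproduce numerically. Taking φ = cos as a smooth stand-in for a test function near the origin, the pairing has the closed form ⟨f*<sub>m</sub>*, cos⟩ = *m* sin(1/*m*), which tends to cos(0) = 1; the midpoint-rule sketch below (the function name `pair_fm` is ours) matches it.

```python
import numpy as np

def pair_fm(phi, m, n=100001):
    """Numerically evaluate <f_m, phi>, with f_m = m/2 on [-1/m, 1/m], else 0."""
    h = (2.0 / m) / n
    tau = -1.0 / m + (np.arange(n) + 0.5) * h   # midpoint rule on the support
    return (m / 2.0) * np.sum(phi(tau)) * h

for m in (10, 100, 1000):
    # closed form of the pairing for phi = cos: <f_m, cos> = m * sin(1/m)
    assert abs(pair_fm(np.cos, m) - m * np.sin(1.0 / m)) < 1e-9

# as m grows, the pairing tends to phi(0) = 1
assert abs(pair_fm(np.cos, 1000) - 1.0) < 1e-6
```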

#### **Example 2.6: Cauchy principal value**

The function *f* (τ ) = 1/τ is not locally integrable. For this reason we can't associate with it a regular distribution through (2.14). A way around this is to use the *Cauchy principal value* of the integral to define the following singular distribution


$$\begin{split} \left\langle \text{pv} \, \frac{1}{\tau}, \phi \right\rangle &:= \text{pv} \int\_{-\infty}^{\infty} \frac{\phi(\tau)}{\tau} \, d\tau \\ &= \lim\_{\epsilon \downarrow 0} \left\{ \int\_{-\infty}^{-\epsilon} \frac{\phi(\tau)}{\tau} \, d\tau + \int\_{\epsilon}^{\infty} \frac{\phi(\tau)}{\tau} \, d\tau \right\} \end{split} \tag{2.16}$$

Integrating by parts the first integral we obtain

$$\begin{aligned} \int\_{-\infty}^{-\epsilon} \frac{\phi(\tau)}{\tau} \, d\tau &= \left. (\phi(\tau) \ln|\tau|) \right|\_{-\infty}^{-\epsilon} - \int\_{-\infty}^{-\epsilon} \ln|\tau| \, D\phi(\tau) \, d\tau \\ &= \phi(-\epsilon) \ln|\epsilon| - \int\_{-\infty}^{-\epsilon} \ln|\tau| \, D\phi(\tau) \, d\tau \end{aligned}$$

and similarly for the second integral

$$\int\_{\epsilon}^{\infty} \frac{\phi(\tau)}{\tau}\, d\tau = -\phi(\epsilon) \ln(\epsilon) - \int\_{\epsilon}^{\infty} \ln(\tau)\, D\phi(\tau)\, d\tau \,.$$

We see that the first terms of both integrals diverge as ε goes to 0. However, using the mean value theorem, we note that there are values λ<sub>1</sub> ∈ (0, ε) and λ<sub>2</sub> ∈ (−ε, 0) such that

$$\begin{aligned} \phi(\epsilon) &= \phi(0) + \epsilon \, D\phi(\lambda\_1) \\ \phi(-\epsilon) &= \phi(0) - \epsilon \, D\phi(\lambda\_2) \end{aligned}$$

With *M* = −(*D*φ(λ<sub>1</sub>) + *D*φ(λ<sub>2</sub>)), the sum of the diverging parts therefore vanishes in the limit

$$\lim\_{\epsilon \downarrow 0} M \,\epsilon \,\, \ln|\epsilon| = 0$$

and we finally obtain

$$\langle \text{pv}\,\frac{1}{\tau}, \phi \rangle = -\int\_{-\infty}^{\infty} \ln|\tau| \, D\phi(\tau) \, d\tau \, .$$

This last integral is well defined since ln |τ| is locally integrable and therefore defines a regular distribution. We will meet this distribution again in the context of the Fourier transform of distributions.
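The identity can be checked numerically by pairing both sides with a concrete bump test function. The sketch below is illustrative only; it assumes SciPy, whose `quad` routine computes Cauchy principal values through the `weight='cauchy'` option.

```python
import numpy as np
from scipy.integrate import quad

def phi(t):
    # smooth bump supported on (-0.7, 1.3): a shifted copy of exp(-1/(1-u^2))
    u = t - 0.3
    return np.exp(-1.0 / (1.0 - u**2)) if abs(u) < 1 else 0.0

def dphi(t):
    # derivative of the bump, also supported on (-0.7, 1.3)
    u = t - 0.3
    return phi(t) * (-2.0 * u / (1.0 - u**2) ** 2) if abs(u) < 1 else 0.0

# <pv 1/tau, phi>: principal value of the integral of phi(tau)/tau
lhs, _ = quad(phi, -0.7, 1.3, weight='cauchy', wvar=0.0)

# -<ln|tau|, D phi>: ln|tau| is locally integrable, split at its singularity
left, _ = quad(lambda t: np.log(abs(t)) * dphi(t), -0.7, 0.0, limit=200)
right, _ = quad(lambda t: np.log(abs(t)) * dphi(t), 0.0, 1.3, limit=200)
rhs = -(left + right)

print(lhs, rhs)  # both pairings give the same number
```

The bump is deliberately not centered at the singularity, so that neither side vanishes by symmetry.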

# **2.3 Basic Properties**

There are some useful operations that we can perform on locally integrable functions that can be carried over to distributions. A common operation is to shift a function *f* by an amount τ to obtain *t* → *f* (*t* − τ ). If we apply the change of variable λ = *t* − τ to the regular distribution associated with the shifted function we obtain

$$\int\_{-\infty}^{\infty} f(t - \tau) \,\phi(t) \,dt = \int\_{-\infty}^{\infty} f(\lambda) \,\phi(\lambda + \tau) \,d\lambda.$$

By generalizing this result we define the operation of *shifting a distribution* by

$$
\langle T(t - \tau), \phi(t) \rangle := \langle T(t), \phi(t + \tau) \rangle. \tag{2.17}
$$

With this definition we can for example denote a Dirac pulse at time τ by δ(*t* − τ )

$$
\langle \delta(t - \tau), \phi(t) \rangle := \phi(\tau).
$$

#### **Notation**

Note that a distribution *T* isn't a function of the variable *t*. In spite of this, it is useful to write *T*(*t*) to indicate the symbol used for the independent variable of the test function (this will be useful when we introduce operations such as the convolution) and as a convenient notation for operations such as shifting. In no way is this meant to imply the existence of a function or that the distribution is regular.

Another useful operation is multiplication of the independent variable of a function by a constant *a*. By generalizing what happens with regular distributions, we define *multiplication of the independent variable by a constant a* for any distribution in $\mathcal{D}'(\mathbb{R}^n)$ by

$$\langle T(a\,t), \phi(t)\rangle := \left\langle T(t), \frac{1}{|a|^n} \phi\left(\frac{t}{a}\right)\right\rangle. \tag{2.18}$$
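For a regular distribution this definition simply restates the change of variables λ = *a t* in the defining integral, with the sign flip of the integration limits for *a* < 0 producing the factor 1/|*a*|. A small numerical sketch, assuming SciPy; the function *f* and the constant *a* are arbitrary choices for illustration:

```python
import numpy as np
from scipy.integrate import quad

def phi(t):
    # smooth bump test function supported on (-1, 1)
    return np.exp(-1.0 / (1.0 - t**2)) if abs(t) < 1 else 0.0

f = lambda t: np.exp(-t**2) * (1.0 + t)   # any locally integrable function
a = -2.5                                  # scaling constant, sign included

# <T(a t), phi(t)> computed directly from the scaled function
lhs, _ = quad(lambda t: f(a * t) * phi(t), -1, 1)

# <T(t), phi(t/a)/|a|> as prescribed by (2.18) with n = 1
rhs, _ = quad(lambda t: f(t) * phi(t / a) / abs(a), -abs(a), abs(a))

print(lhs, rhs)
```

The second integral runs over (−|*a*|, |*a*|), the support of *t* ↦ φ(*t*/*a*).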

This operation is closely related to the concepts of even and odd distributions.

**Definition 2.5** (*Even and odd distributions*) An *even* distribution *T* is defined as a distribution for which, for every test function φ

$$
\langle T(t), \phi(-t) \rangle = \langle T(t), \phi(t) \rangle. \tag{2.19}
$$

Similarly, an *odd* distribution satisfies

$$
\langle T(t), \phi(-t) \rangle = -\langle T(t), \phi(t) \rangle \tag{2.20}
$$

for every test function φ.

A further useful operation is *multiplication of a distribution with an indefinitely differentiable function* γ . First note that multiplication of a test function φ with an indefinitely differentiable function results in another test function. For this reason we can again generalize the behavior of regular distributions and define

$$\langle \gamma\,T, \phi \rangle := \langle T, \gamma\,\phi \rangle \,. \tag{2.21}$$

# **2.4 Differentiation of Distributions**

At the beginning of Sect. 2.2 we mentioned that one of the distinguishing features of distributions is the fact that they can be differentiated any number of times. We also argued that, for regular distributions, partial integration leads to an expression which can be considered as the definition of the derivative of regular distributions

$$
\begin{split}
\langle f^{(1)}, \phi \rangle &= \int\_{-\infty}^{\infty} f^{(1)}(\tau) \, \phi(\tau) \, d\tau \\&= -\int\_{-\infty}^{\infty} f(\tau) \, \phi^{(1)}(\tau) \, d\tau \\&= \langle f, -\phi^{(1)} \rangle. \end{split}
\tag{2.22}
$$

In fact, this definition can be extended to singular distributions and to distributions of several variables, that is to arbitrary distributions.

**Definition 2.6** The first order partial derivative of a distribution *<sup>T</sup>* on <sup>D</sup>(R*<sup>n</sup>*) is defined by

$$\langle D\_i T, \phi \rangle := \langle T, -D\_i \phi \rangle \quad i = 1, \ldots, n \ . \tag{2.23}$$

Since the derivative of a test function *Di*φ is still a test function, it follows that the derivative of a distribution is always a distribution and that distributions can be differentiated an arbitrary number of times.

With *k* an *n*-tuple of non-negative integers, the derivative of order |*k*| follows from the above definition

$$
\langle D^k T, \phi \rangle = (-1)^{|k|} \langle T, D^k \phi \rangle \,. \tag{2.24}
$$

The order of differentiation is irrelevant since test functions have continuous partial derivatives of all orders and hence

$$\langle D\_i D\_j T, \phi \rangle = \langle T, D\_j D\_i \phi \rangle = \langle T, D\_i D\_j \phi \rangle = \langle D\_j D\_i T, \phi \rangle \,.$$

The rule for the differentiation of the product of a distribution *T* and an indefinitely differentiable function γ is the same as the rule of differentiation for the product of two functions

$$
\begin{aligned}
\langle D\_i(\gamma T), \phi \rangle &= -\langle \gamma T, D\_i \phi \rangle = -\langle T, \gamma \, D\_i \phi \rangle \\&= -\langle T, D\_i(\gamma \phi) \rangle + \langle T, (D\_i \gamma) \, \phi \rangle \\&= \langle D\_i T, \gamma \phi \rangle + \langle (D\_i \gamma) \, T, \phi \rangle \\&= \langle \gamma \, D\_i T, \phi \rangle + \langle (D\_i \gamma) \, T, \phi \rangle
\end{aligned}
$$

or

$$D\_i(\gamma\,T) = \gamma\,D\_i T + \left(D\_i\gamma\right) T\,.\tag{2.25}$$

Two important properties of distributional differentiation follow immediately from the definition. The first is that differentiation is a linear operation: given two distributions *T*<sup>1</sup> and *T*<sup>2</sup> and two numbers *c*<sup>1</sup> and *c*<sup>2</sup>

$$D^k(c\_1\,T\_1 + c\_2\,T\_2) = c\_1\,D^kT\_1 + c\_2\,D^kT\_2\,. \tag{2.26}$$

The second is continuity: given a sequence of distributions $(T\_m)\_{m\in\mathbb{N}}$ converging toward a distribution *T*, the sequence of corresponding partial derivatives $(D^k T\_m)\_{m\in\mathbb{N}}$ converges to $D^k T$

$$\begin{split} \lim\_{m \to \infty} \langle D^k T\_m, \phi \rangle &= \lim\_{m \to \infty} (-1)^{|k|} \langle T\_m, D^k \phi \rangle \\ &= (-1)^{|k|} \langle T, D^k \phi \rangle \\ &= \langle D^k T, \phi \rangle . \end{split} \tag{2.27}$$

In other words, *the operations of limit-taking and differentiation can always be exchanged*. In particular this means that if a sequence of partial sums $S\_m = \sum\_{i=0}^{m-1} T\_i$ converges to a series $S = \sum\_{i=0}^{\infty} T\_i$, then the *series can be differentiated term by term*.

#### **Notation**

To distinguish a regular distribution defined by the usual derivative of a function *f* from the derivative in the sense of distributions of the regular distribution defined by *f*, we are always going to denote the former by $T\_{f^{(k)}}$ or $T\_{D^k f}$. The latter will be denoted interchangeably by $f^{(k)}$, $D^k f$, $T\_f^{(k)}$ or $D^k T\_f$.

#### **Example 2.7: Derivative of** *δ*

The first order derivative of the Dirac delta distribution is

$$\langle \delta^{(1)}, \phi \rangle = -\langle \delta, \phi^{(1)} \rangle = -\phi^{(1)}(0) \,.$$

The *k*th order one is

$$\langle \delta^{(k)}, \phi \rangle = (-1)^k \phi^{(k)}(0) \,.$$

This example shows that, in general, to calculate the value $\langle T, \phi \rangle$ of a distribution *T* applied to a test function φ, it's not enough to know the values of φ over supp(*T*). We need to know the values of φ over a *neighborhood* of supp(*T*).

#### **Example 2.8: Derivative of $1\_+$**

The derivative of the Heaviside unit step $1\_+$ as a function is zero everywhere but at $t = 0$ (a set of zero measure), where it is undefined. The *function* $1\_+^{(1)}$ is therefore locally integrable, and we can define the regular distribution $T\_{1\_+^{(1)}}$ which evaluates to zero for every test function φ.

Differently from this, the derivative of $1\_+$ as a distribution is defined everywhere and, applying the definition, we find

$$\langle 1\_+^{(1)}, \phi \rangle = -\langle 1\_+, \phi^{(1)} \rangle = -\int\_0^\infty \phi^{(1)}(\tau) \, d\tau = -\phi(\tau)\big|\_0^\infty = \phi(0) = \langle \delta, \phi \rangle \ ,$$

that is

$$1\_+^{(1)} = \delta \,.$$
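The pairing $-\int_0^\infty \phi^{(1)}\,d\tau$ can be evaluated numerically and indeed returns φ(0); the derivative of a smoothed step behaves the same way, concentrating into a nascent delta. A sketch assuming SciPy; the smoothing width `eps` is an arbitrary choice:

```python
import numpy as np
from scipy.integrate import quad

def phi(t):
    # smooth bump test function supported on (-1, 1)
    return np.exp(-1.0 / (1.0 - t**2)) if abs(t) < 1 else 0.0

def dphi(t):
    # derivative of the bump
    return phi(t) * (-2.0 * t / (1.0 - t**2) ** 2) if abs(t) < 1 else 0.0

# <1_+^{(1)}, phi> = -<1_+, phi^{(1)}> = -integral_0^infty phi'(tau) dtau
lhs = -quad(dphi, 0.0, 1.0)[0]           # phi vanishes beyond tau = 1
print(lhs, phi(0.0))                     # equals phi(0) = <delta, phi>

# the derivative of a smoothed step 0.5*(1 + tanh(t/eps)) acts the same way
eps = 0.01
step_deriv = lambda t: 0.5 / (eps * np.cosh(t / eps) ** 2)
smoothed = quad(lambda t: step_deriv(t) * phi(t), -1, 1, points=[0.0], limit=200)[0]
print(smoothed)                          # approaches phi(0) as eps -> 0
```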

#### **Example 2.9: Function versus distributional derivative**

Consider a function *<sup>f</sup>* : <sup>R</sup> <sup>→</sup> <sup>C</sup> continuously differentiable *<sup>k</sup>* times everywhere but at *t* = 0, where it has a discontinuity such that both limits

$$\lim\_{t \downarrow 0} f^{(i)}(t) \quad \text{and} \quad \lim\_{t \uparrow 0} f^{(i)}(t)$$

exist for all *i* ≤ *k*. Let's denote the difference between these limits by

$$\alpha\_i = \lim\_{t \downarrow 0} f^{(i)}(t) - \lim\_{t \uparrow 0} f^{(i)}(t) \,.$$

Then we may represent the function *f* as

$$f(t) = f\_{c,0}(t) + \alpha\_0 \mathbf{1}\_+(t)$$

with $f\_{c,0}$ a continuous function. It is easy to see that for $t \neq 0$, $f\_{c,0}^{(1)}(t) = f^{(1)}(t)$. Thus, using the results of Example 2.8, the first order derivative of *f* is


$$T\_f^{(1)} = T\_{f^{(1)}} + \alpha\_0 \delta \,.$$

To compute the second-order derivative we can use the same procedure. We decompose the function *f* (1) into a continuous function *fc*,<sup>1</sup> and a step

$$f^{(1)}(t) = f\_{c,1}(t) + \alpha\_1 1\_+(t) \,.$$

Differentiating term by term we therefore obtain

$$\begin{aligned} T\_f^{(2)} &= T\_{f^{(1)}}^{(1)} + \alpha\_0 \delta^{(1)} \\ &= T\_{f\_{c,1}}^{(1)} + \alpha\_1 T\_{1\_+}^{(1)} + \alpha\_0 \delta^{(1)} \\ &= T\_{f\_{c,1}^{(1)}} + \alpha\_1 \delta + \alpha\_0 \delta^{(1)} \\ &= T\_{f^{(2)}} + \alpha\_1 \delta + \alpha\_0 \delta^{(1)} \end{aligned}$$

The *k*th order derivative can be obtained by iterating this procedure

$$T\_f^{(k)} = T\_{f^{(k)}} + \alpha\_0 \delta^{(k-1)} + \alpha\_1 \delta^{(k-2)} + \dots + \alpha\_{k-1} \delta \ . \tag{2.28}$$
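Formula (2.28) can be spot-checked numerically for *k* = 1. Take the illustrative choice $f(t) = \cos(t)\,1\_+(t)$, which jumps by $\alpha\_0 = 1$ at the origin; pairing $T\_f^{(1)}$ with a bump test function must then give $\langle T\_{f^{(1)}}, \phi \rangle + \alpha\_0\,\phi(0)$. A sketch assuming SciPy:

```python
import numpy as np
from scipy.integrate import quad

def phi(t):
    # smooth bump test function supported on (-1, 1)
    return np.exp(-1.0 / (1.0 - t**2)) if abs(t) < 1 else 0.0

def dphi(t):
    # derivative of the bump
    return phi(t) * (-2.0 * t / (1.0 - t**2) ** 2) if abs(t) < 1 else 0.0

# f(t) = cos(t) * 1_+(t): jump alpha_0 = 1 at t = 0, f'(t) = -sin(t) for t > 0

# distributional derivative: <T_f^{(1)}, phi> = <T_f, -phi^{(1)}>
lhs = -quad(lambda t: np.cos(t) * dphi(t), 0.0, 1.0)[0]

# jump formula (2.28) with k = 1: <T_{f^{(1)}}, phi> + alpha_0 * phi(0)
rhs = quad(lambda t: -np.sin(t) * phi(t), 0.0, 1.0)[0] + 1.0 * phi(0.0)

print(lhs, rhs)
```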

#### **Example 2.10: Logarithm derivative**

In Example 2.6 we showed that the Cauchy principal value of 1/τ is a distribution

$$\langle \text{pv}\,\frac{1}{\tau}, \phi \rangle = -\int\_{-\infty}^{\infty} \ln|\tau| \, D\phi(\tau) \,d\tau.$$

We now recognize this result as saying

$$\text{pv}\,\frac{1}{\tau} = D\,\ln|\tau|\,.$$

#### **Example 2.11: Limit to ∞ of trigonometric functions**

Consider the following parameterized distribution

$$f\_{\omega}(t) = -\frac{\cos \omega t}{\omega} \qquad \omega > 0.$$

As ω tends to infinity, it converges to the zero distribution:


$$\begin{aligned} \lim\_{\omega \to \infty} |\langle f\_{\omega}, \phi \rangle| &= \lim\_{\omega \to \infty} \left| \int\_{\text{supp}(\phi)} -\frac{\cos \omega t}{\omega} \, \phi(t) \, dt \right| \\ &\le \lim\_{\omega \to \infty} \int\_{\text{supp}(\phi)} \frac{|\phi(t)|}{\omega} \, dt \\ &\le \lim\_{\omega \to \infty} \frac{\sup |\phi(t)|}{\omega} \, K \\ &= 0 \end{aligned}$$

with *K* the length of supp(φ). Since distributions can always be differentiated, and for distributions arising from continuous functions the derivative as a distribution coincides with the derivative as a function, we have the following result

$$\begin{aligned} \lim\_{\omega \to \infty} \langle \sin \omega t, \phi \rangle &= \lim\_{\omega \to \infty} \langle -D \frac{\cos \omega t}{\omega}, \phi \rangle \\ &= \lim\_{\omega \to \infty} \langle \frac{\cos \omega t}{\omega}, D\phi \rangle \\ &= 0 \end{aligned}$$

or

$$\lim\_{\omega \to \infty} \sin \omega t = 0.\tag{2.29}$$

Similarly one obtains

$$\lim\_{\omega \to \infty} \cos \omega t = 0 \quad \text{and} \tag{2.30}$$

$$\lim\_{\omega \to \infty} e^{\imath \omega t} = 0.\tag{2.31}$$

Note that these limits do not exist for the corresponding functions.
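The distributional limit (2.29) is a Riemann-Lebesgue-type statement: paired with any fixed test function, the oscillations of sin ωt average out. A numerical sketch assuming SciPy, whose `quad` handles oscillatory integrands via the `weight='sin'` option:

```python
import numpy as np
from scipy.integrate import quad

def phi(t):
    # smooth bump test function supported on (-0.7, 1.3), deliberately not even
    u = t - 0.3
    return np.exp(-1.0 / (1.0 - u**2)) if abs(u) < 1 else 0.0

def pairing(omega):
    # <sin(omega t), phi> = integral of phi(t) sin(omega t) dt
    val, _ = quad(phi, -0.7, 1.3, weight='sin', wvar=omega)
    return val

for omega in (4.0, 40.0, 400.0):
    print(omega, pairing(omega))   # the pairing shrinks as omega grows
```

Because φ is smooth, the decay is in fact faster than any power of 1/ω.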

# **2.5 Distributions with Compact Support**

The property of multiplication of a distribution with an indefinitely differentiable function suggests another interesting generalization. Let's start again with a regular distribution *f* . Then, if we write out explicitly the integral of the distribution γ *f*

$$\langle \gamma\,f, \phi\rangle = \int\_{-\infty}^{\infty} f(\tau) \,\gamma(\tau) \,\phi(\tau) \,d\tau$$

we see that in principle we could group the functions differently and write $\langle \phi\,f, \gamma \rangle$. This is however not a distribution in $\mathcal{D}'$ since γ is not a test function, as its support is not compact. In spite of this, the number $\langle \phi\,f, \gamma \rangle$ is the same as $\langle \gamma\,f, \phi \rangle$ for every test function φ and every function γ ∈ C<sup>∞</sup>. A moment's reflection reveals that what makes these two expressions have the same value for every γ is the fact that, as a function, φ*f* has *compact support*.

To generalize this observation to arbitrary distributions we must first define what the support of a distribution *T* is.

A distribution is said to *vanish* on an open set $U \subset \mathbb{R}^n$ if $\langle T, \phi \rangle = 0$ for all test functions φ with supp(φ) ⊂ *U*, where supp(φ) is the support of the test function φ.

**Definition 2.7** (*Support of a distribution*) The *support of a distribution T* is the complement of the largest open set *U* on which the distribution vanishes and is denoted by supp(*T* ).

The set of all distributions with compact support is denoted by $\mathcal{E}'$ and forms a vector subspace of $\mathcal{D}'$, that is $\mathcal{E}' \subset \mathcal{D}'$.

#### **Example 2.12: Support of** *δ*

The value of the Dirac delta distribution δ applied to any test function φ with supp(φ) ⊂ *U* = (−∞, 0) ∪ (0, ∞) is zero. That is, δ vanishes on *U*. Its support is supp(δ) = ℝ \ *U* = {0} and is therefore compact.

With the notion of the support of a distribution we can generalize our observation that $\langle \phi\,f, \gamma \rangle = \langle \gamma\,f, \phi \rangle$ by saying that distributions with compact support $T \in \mathcal{E}'$ can be extended to *continuous, linear* functionals *L* on indefinitely differentiable functions with arbitrary support. In this context the vector space of all indefinitely differentiable functions is denoted by $\mathcal{E}$ and, to give a meaning to the continuity of the functionals, it is equipped with the following convergence criterion.

**Definition 2.8** (*Convergence in* $\mathcal{E}$) A sequence $(\gamma\_m)\_{m\in\mathbb{N}}$ in $\mathcal{E}$ is said to converge to γ if for every compact subset *K* of $\mathbb{R}^n$ and every *n*-tuple *k*, the sequence of functions $D^k\gamma\_m$ converges uniformly to $D^k\gamma$

$$\sup\_{\tau \in K} \left| D^k \gamma\_m - D^k \gamma \right| \to 0, \qquad m \to \infty.$$

Assume that the support of *T* is the compact set *K* and let α be a test function equal to 1 in a neighborhood *U* of *K*. Then for every function γ ∈ E and for every point τ ∈ *U*, α(τ ) γ (τ ) = γ (τ ). Therefore, there is a functional *L* such that

$$
\langle L, \gamma \rangle = \langle T, \alpha\,\gamma \rangle \qquad \gamma \in \mathcal{E}.\tag{2.32}
$$

That it is independent of the choice of α is easily verified: suppose that α<sub>1</sub> and α<sub>2</sub> are two test functions equal to 1 in a neighborhood of *K*. Then in the smallest of these neighborhoods $\alpha\_1 - \alpha\_2 = 0$ and $\langle T, \alpha\_1\,\gamma \rangle = \langle T, \alpha\_2\,\gamma \rangle$.

The functional thus defined is *unique* since, for every sequence of test functions $\alpha\_m$ equal to 1 for $|\tau| < m$, we have: on the one hand, by continuity of *L*

$$\lim\_{m \to \infty} \langle L, \alpha\_m \, \gamma \rangle = \langle L, \gamma \rangle\,,$$

and on the other hand, for sufficiently large *m*

$$
\langle T, \alpha\_m \, \gamma \rangle = \langle L, \gamma \rangle \,.
$$

Therefore every distribution with compact support *T* defines a unique continuous, linear functional *L* on $\mathcal{E}$.

The converse is also true: every continuous, linear functional *L* restricted to $\mathcal{D} \subset \mathcal{E}$ defines a distribution *T* with compact support. For, if this were not the case and the support of *T* were not compact, then we could find a sequence of test functions $\phi\_m \in \mathcal{D}$ with support in the complement of $|\tau| < m$, such that $\langle T, \phi\_m \rangle = 1$ for all *m*. However, since in $\mathcal{E}$ $\lim\_{m\to\infty} \phi\_m = 0$, by continuity of *L*

$$\lim\_{m \to \infty} \langle L, \phi\_m \rangle = 0 \,.$$

Therefore, if $\langle L, \phi \rangle = \langle T, \phi \rangle$ for all $\phi \in \mathcal{D}$, then the support of *T* must be compact.

There are other vector subspaces of $\mathcal{D}'$ whose elements can be extended to larger sets of functions than $\mathcal{D}$. We will encounter another one in the context of the Fourier transform. The set of test functions $\mathcal{D}$ is the common set on which all distributions are defined.

# *2.5.1 Single-Point Support*

We now investigate distributions satisfying the following equation

$$t^k \, T = 0\,,\tag{2.33}$$

that is, distributions for which, for a fixed *k* ≥ 1 and every test function φ

$$\langle t^k \, T, \phi \rangle = \langle T, t^k \phi \rangle = 0 \,.$$

For simplicity we limit ourselves to the one dimensional case.

First observe that on the open set *U* = (−∞, 0) ∪ (0, ∞) the function $t \mapsto t^k$ doesn't assume the value zero. For this reason, to satisfy the equation, *T* must vanish on *U*, or, stated in other words, the support of *T* must be the origin: supp(*T*) = {0}.

Since the support of *T* is compact (a single point) and, for any test function φ, the value of $\langle T, \phi \rangle$ is determined by the values of φ in a neighborhood (however small) of supp(*T*), we can expand φ using Taylor's formula with remainder [19]. For our purposes it is convenient to express the remainder in integral form, which can be obtained by integrating by parts multiple times

$$\begin{split} \phi(t) &= \phi(0) + \int\_{0}^{t} \phi^{(1)}(\tau) \, d\tau \\ &= \phi(0) - (t - \tau)\phi^{(1)}(\tau) \Big|\_{0}^{t} + \int\_{0}^{t} (t - \tau)\phi^{(2)}(\tau) \, d\tau \\ &= \phi(0) + t \left. \phi^{(1)}(0) - \frac{(t - \tau)^{2}}{2} \phi^{(2)}(\tau) \right|\_{0}^{t} + \int\_{0}^{t} \frac{(t - \tau)^{2}}{2} \phi^{(3)}(\tau) \, d\tau \\ &= \cdots \\ &= \sum\_{m=0}^{k-1} \frac{\phi^{(m)}(0)}{m!} \, t^{m} + \int\_{0}^{t} \frac{(t - \tau)^{k-1}}{(k-1)!} \phi^{(k)}(\tau) \, d\tau. \end{split}$$

By performing the substitution τ = *t*λ the remainder can be transformed into the following form

$$t^k \int\_0^1 \frac{(1-\lambda)^{k-1}}{(k-1)!} \phi^{(k)}(t\,\lambda) \,d\lambda = \frac{t^k}{(k-1)!} \,\psi(t)$$

which makes it apparent that it is proportional to the product of $t^k$ and an indefinitely differentiable function $\psi \in \mathcal{E}$. Note that, differently from φ, no addend has compact support. This poses no problem since *T*, having itself compact support, can be extended uniquely to a distribution on $\mathcal{E}$ (see Sect. 2.5).

With this expansion we can express the value of $\langle T, \phi \rangle$ as a finite sum. Taking into account (2.33)

$$\begin{aligned} \langle T, \phi \rangle &= \sum\_{m=0}^{k-1} \frac{\phi^{(m)}(0)}{m!} \langle T, t^m \rangle + \frac{1}{(k-1)!} \langle T, t^k \, \psi(t) \rangle \\ &= \sum\_{m=0}^{k-1} \frac{\phi^{(m)}(0)}{m!} \langle T, t^m \rangle \\ &= \sum\_{m=0}^{k-1} c\_m \langle \delta^{(m)}, \phi \rangle \end{aligned}$$

or

$$T = \sum\_{m=0}^{k-1} c\_m \delta^{(m)} \tag{2.34}$$

with

$$c\_m = (-1)^m \frac{\langle T, t^m \rangle}{m!} \,.$$

We have therefore established that the homogeneous equation

$$t^k \, T = 0$$

has an infinity of non-trivial solutions, each being a weighted sum of the Dirac delta distribution δ and its derivatives up to order *k* − 1. In addition, this shows that δ and its derivatives are the only distributions whose support consists of a single point.

#### **Example 2.13: Solutions of** $t\,T = 1$

We want to find all solutions of the equation

$$t \, T = 1 \,.$$

If *T* were a function, the equation would have no solution at *t* = 0 and would equal 1/*t* at all other points. From this we guess that the solution as a distribution could be *T* = pv 1/*t*. Indeed, this distribution satisfies the equation

$$
\begin{aligned}
\left\langle t \operatorname{pv} \frac{1}{t}, \phi \right\rangle &= \left\langle \operatorname{pv} \frac{1}{t}, t \,\phi \right\rangle \\ &= \lim\_{\epsilon \downarrow 0} \int\_{|t| \ge \epsilon} \frac{1}{t} \, t \,\phi(t) \, dt \\ &= \int\_{-\infty}^{\infty} \phi(t) \, dt \\ &= \left\langle 1, \phi \right\rangle.
\end{aligned}
$$

However, this is not the only solution, as the homogeneous equation has non-trivial solutions given by (2.34) with *k* = 1. The equation is therefore satisfied by all distributions of the form

$$T = \text{pv}\frac{1}{t} + c\,\delta(t)$$

with *c* an arbitrary constant.
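The computation above can be repeated numerically: pairing *t T* with a bump φ for *T* = pv 1/*t* + *c*δ returns the integral of φ, the delta term contributing nothing since (*t*φ)(0) = 0. A sketch assuming SciPy's principal-value integration (`weight='cauchy'`):

```python
import numpy as np
from scipy.integrate import quad

def phi(t):
    # smooth bump test function supported on (-0.7, 1.3)
    u = t - 0.3
    return np.exp(-1.0 / (1.0 - u**2)) if abs(u) < 1 else 0.0

c = 7.0   # arbitrary constant in T = pv 1/t + c*delta

# <t T, phi> = <pv 1/t, t phi> + c * (t phi)(0); the delta term vanishes
lhs = quad(lambda t: t * phi(t), -0.7, 1.3, weight='cauchy', wvar=0.0)[0] \
      + c * (0.0 * phi(0.0))

# <1, phi> = integral of phi
rhs = quad(phi, -0.7, 1.3)[0]

print(lhs, rhs)
```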



The properties of distributions discussed in this chapter and some that will be discussed in the following ones are summarised in Table 2.1.

**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Chapter 3 Convolution of Distributions**

The convolution product plays a central role in the description of linear and weakly nonlinear systems. In this chapter we develop its theory based on distributions. In addition, the chapter introduces the tensor product, which will also come to play an important role in the description of weakly nonlinear systems.

# **3.1 Tensor Product**

The general notion of convolution of distributions is defined in terms of the *tensor product*. We therefore start by defining this product and, as before, we start by considering regular distributions.

The tensor product is a bilinear operation that can be used to generate a vector space out of other vector spaces. If *f* is a function on R*<sup>m</sup>* and *g* a function on R*<sup>n</sup>*, then the tensor product of *f* and *g* is defined as the function

$$f \otimes g : \mathbb{R}^{m+n} \to \mathbb{C}, \quad (\tau, \lambda) \mapsto f(\tau)\,g(\lambda)\,.$$

The tensor product of two locally integrable functions is itself locally integrable. Therefore, if we now assume *f* and *g* to be locally integrable, we can try to build the tensor product of the regular distributions *Tf* and *Tg* based on the tensor product *f* ⊗ *g*. If we use as test function the tensor product of two suitable test functions ξ and ψ then we obtain

$$\langle T\_f \otimes T\_g, \xi \otimes \psi \rangle = \langle T\_f, \xi \rangle \langle T\_g, \psi \rangle$$

which is well-defined. However, for an arbitrary test function $\phi \in \mathcal{D}(\mathbb{R}^{m+n})$ it is not immediately apparent that the result is a distribution. Taking *m* = *n* = 1 for simplicity, we have


$$\begin{aligned} \langle T\_f \otimes T\_g, \phi \rangle &= \int\_{-\infty}^\infty \int\_{-\infty}^\infty f(\tau) \, g(\lambda) \, \phi(\tau, \lambda) \, d\lambda \, d\tau \\ &= \int\_{-\infty}^\infty f(\tau) \int\_{-\infty}^\infty g(\lambda) \, \phi(\tau, \lambda) \, d\lambda \, d\tau \,. \end{aligned}$$

The inner integral evaluates to a number for every value of the variable τ , that is, it is a complex valued function of τ that we call ζ (τ ). Furthermore, the variable τ only appears as an argument of the test function φ. Therefore ζ must have compact support. In addition, when computing its derivative, differentiation can be moved under the integral and ζ is therefore indefinitely differentiable. In other words ζ is a test function. We therefore have

$$\langle T\_f \otimes T\_g, \phi\rangle = \langle T\_f, \langle T\_g, \phi\rangle\rangle = \langle T\_g, \langle T\_f, \phi\rangle\rangle$$

where the last equality comes from the fact that we could reverse the order of integration without changing the result. This last property is referred to as *Fubini's theorem.*
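For regular distributions the order-exchange is ordinary Fubini, easy to test numerically with a non-separable test function. A sketch assuming SciPy; the choices of *f*, *g* and φ are arbitrary illustrations:

```python
import numpy as np
from scipy.integrate import quad

def bump(t):
    # smooth bump supported on (-1, 1)
    return np.exp(-1.0 / (1.0 - t**2)) if abs(t) < 1 else 0.0

f = lambda tau: np.exp(-tau**2)              # locally integrable
g = lambda lam: np.cos(lam)                  # locally integrable
phi2 = lambda tau, lam: bump(tau) * bump(lam) * (1.0 + tau * lam)  # non-separable

# <T_f, <T_g, phi>>: integrate over lambda first, then over tau
inner_g = lambda tau: quad(lambda lam: g(lam) * phi2(tau, lam), -1, 1)[0]
order1 = quad(lambda tau: f(tau) * inner_g(tau), -1, 1)[0]

# <T_g, <T_f, phi>>: integrate over tau first, then over lambda
inner_f = lambda lam: quad(lambda tau: f(tau) * phi2(tau, lam), -1, 1)[0]
order2 = quad(lambda lam: g(lam) * inner_f(lam), -1, 1)[0]

print(order1, order2)
```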

The above arguments can be generalized to arbitrary distributions. That the inner functional is a function of τ and that it has compact support is clear. The fact that it can be differentiated follows from the continuity and linearity of distributions

$$\begin{split} D\,\zeta(\tau) &= \lim\_{\epsilon \to 0} \frac{\zeta(\tau + \epsilon) - \zeta(\tau)}{\epsilon} \\ &= \lim\_{\epsilon \to 0} \frac{\langle T(\lambda), \phi(\tau + \epsilon, \lambda) \rangle - \langle T(\lambda), \phi(\tau, \lambda) \rangle}{\epsilon} \\ &= \lim\_{\epsilon \to 0} \left\langle T(\lambda), \frac{\phi(\tau + \epsilon, \lambda) - \phi(\tau, \lambda)}{\epsilon} \right\rangle \\ &= \langle T(\lambda), D\_1\phi(\tau, \lambda) \rangle . \end{split} \tag{3.1}$$

With this we see that ζ can be differentiated an arbitrary number of times and is thus a test function. We therefore obtain the following general definition for the tensor product of distributions.

**Definition 3.1** (*Tensor product*) Given two distributions $S \in \mathcal{D}'(\mathbb{R}^m)$ and $T \in \mathcal{D}'(\mathbb{R}^n)$, the tensor product $S \otimes T$ is the distribution in $\mathcal{D}'(\mathbb{R}^{m+n})$ defined by

$$\langle \mathcal{S} \otimes T, \phi \rangle := \langle \mathcal{S}, \langle T, \phi \rangle \rangle = \langle T, \langle \mathcal{S}, \phi \rangle \rangle \,. \tag{3.2}$$

It's easy to see that the tensor product of distributions is bilinear

$$\begin{aligned} (S+T)\otimes U &= S\otimes U + T\otimes U\\ S\otimes (T+U) &= S\otimes T + S\otimes U \end{aligned} \tag{3.3}$$

and associative

$$(S \otimes T) \otimes U = S \otimes (T \otimes U) \,. \tag{3.4}$$

As a useful abbreviation of notation we define the tensor power by

$$\begin{aligned} T^{\otimes k} &:= \underbrace{T \otimes \dots \otimes T}\_{k \text{ times}}, \qquad k > 0\\ T^{\otimes 0} &:= 1 \in \mathbb{C} \text{ .} \end{aligned} \tag{3.5}$$

#### **Example 3.1: Higher Dimensional Dirac Pulse**

The tensor product of two Dirac pulses is

$$
\begin{aligned}
\langle \delta \otimes \delta(\tau, \lambda), \phi(\tau, \lambda) \rangle &= \langle \delta(\tau), \langle \delta(\lambda), \phi(\tau, \lambda) \rangle \rangle = \langle \delta(\tau), \phi(\tau, 0) \rangle \\
&= \phi(0, 0) \\
&=: \langle \delta(\tau, \lambda), \phi(\tau, \lambda) \rangle \ .
\end{aligned}
$$

# **3.2 Convolution of Distributions**

We now come to the main objective of this section: the convolution of distributions. Remember that the convolution of integrable functions *f*, *g* ∈ *L*<sup>1</sup> is defined as follows

$$f \ast g\,(t) := \int\_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau \,.$$

To obtain a distribution we may write

$$\begin{aligned} \langle f \ast g, \phi \rangle &= \int\_{-\infty}^{\infty} \int\_{-\infty}^{\infty} f(\tau) \, g(t - \tau) \, d\tau \, \phi(t) \, dt \\ &= \int\_{-\infty}^{\infty} \int\_{-\infty}^{\infty} f(\tau) \, g(\lambda) \, \phi(\lambda + \tau) \, d\tau \, d\lambda \end{aligned}$$

which can be represented as the following tensor product

$$\langle f \ast g, \phi \rangle = \langle f(\tau) \otimes g(\lambda), \phi(\lambda + \tau) \rangle \,.$$

However, while indefinitely differentiable, the function ψ(τ, λ) = φ(λ + τ ) is not a test function because its support is not compact. In fact, ψ(τ, λ) assumes the same

value φ(*t*) for every point on the diagonal line *t* = λ + τ of the (τ, λ)-plane (see Fig. 3.1). In spite of this, on account of our assumption that *f* and *g* are integrable functions (and not merely locally integrable), the above integral is well-defined. We therefore conclude that, similarly to the case of functions, *the convolution of distributions only exists for a subset of distributions with additional characteristics.*

**Definition 3.2** (*Convolution*) Given two distributions *S* and *T* in $\mathcal{D}'(\mathbb{R}^n)$, if for every test function $\phi \in \mathcal{D}(\mathbb{R}^n)$ the tensor product $S \otimes T$ can be extended to functions of the form ψ(τ, λ) = φ(τ + λ), then the convolution product *S* ∗ *T* is defined by

$$\langle S \ast T, \phi \rangle := \langle S(\tau) \otimes T(\lambda), \phi(\tau + \lambda) \rangle \tag{3.6}$$

and is commutative

$$S \* T = T \* S \,. \tag{3.7}$$

A *sufficient condition for the existence of the convolution* is as follows: if the intersection of the support of *S* ⊗ *T* , that is supp(*S* ⊗ *T* ) = supp(*S*) × supp(*T* ) and the support of ψ(τ, λ) = φ(τ + λ) is bounded, then *S* ∗ *T* is well defined. In other words, if for τ ∈ supp(*S*) and λ ∈ supp(*T* ) the sum τ + λ can only remain bounded if both τ and λ remain bounded, then the convolution product *S* ∗ *T* is well defined.

Note that this condition is sufficient but *not* necessary, as shown for instance by the introductory example with integrable functions $f, g \in L^1$. In fact the convolution *f* ∗ *g* of integrable functions does always exist and is itself an integrable function

$$\begin{aligned} |\langle f \ast g, \phi \rangle| &= \left| \int\_{-\infty}^{\infty} \int\_{-\infty}^{\infty} f(\tau) \, g(\lambda) \, \phi(\lambda + \tau) \, d\tau \, d\lambda \right| \\ &\leq \sup |\phi| \int\_{-\infty}^{\infty} |f(\tau)| \, d\tau \int\_{-\infty}^{\infty} |g(\lambda)| \, d\lambda \,. \end{aligned}$$

#### **Example 3.2: One Sided Distributions**

A subset of the real line $U \subset \mathbb{R}$ is said to be *bounded on the left* if there is a real constant *b* such that *U* ⊂ (*b*, ∞). Similarly, a subset *U* is called *bounded on the right* if there is a constant *b* such that *U* ⊂ (−∞, *b*).

Distributions whose support is bounded on the left (right) are called *right-sided (left-sided) distributions*. The set of all such distributions forms a vector space denoted by $\mathcal{D}'\_R$ ($\mathcal{D}'\_L$). Of particular interest for our purposes are right-sided distributions *T* with supp(*T*) ⊂ [0, ∞). We denote the space of all such distributions by $\mathcal{D}'\_+$.

Figure 3.2 shows the support of *S*(τ) ⊗ *T*(λ) and of ψ(τ, λ) = φ(τ + λ) for two distributions *S* and *T* in $\mathcal{D}'\_+$. It is clear that, for any test function φ, their overlap is always bounded. Therefore, the convolution of right-sided or of left-sided distributions is always well defined. Not so the convolution of a left-sided distribution with a right-sided one.

#### **Example 3.3: Convolution with** *δ*

Let *T* be any distribution in $\mathcal{D}'(\mathbb{R}^n)$ and δ the *n*-dimensional Dirac pulse (see Example 3.1), then

$$\begin{aligned} \langle T \ast \delta, \phi \rangle &= \langle T(\tau) \otimes \delta(\lambda), \phi(\tau + \lambda) \rangle \\ &= \langle T(\tau), \langle \delta(\lambda), \phi(\tau + \lambda) \rangle \rangle \\ &= \langle T, \phi \rangle \end{aligned}$$

or

$$T \* \delta = T \,. \tag{3.8}$$

Thus, δ is a *unit* of convolution.
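The identity *T* ∗ δ = *T* can be visualized numerically by convolving a function with a nascent delta, here a normalized narrow Gaussian of width `eps` (an arbitrary choice): the output reproduces the input up to an error of order eps². A sketch using NumPy's discrete convolution:

```python
import numpy as np

# sample a smooth function on a uniform grid
t = np.linspace(-5.0, 5.0, 2001)
dt = t[1] - t[0]
f = np.exp(-t**2)

# nascent delta: narrow Gaussian, normalized so that its integral is 1
eps = 0.05
s = np.arange(-0.5, 0.5 + dt / 2, dt)        # odd-length, centered support
kernel = np.exp(-s**2 / (2.0 * eps**2))
kernel /= kernel.sum() * dt

# discrete approximation of (f * delta_eps)(t)
conv = np.convolve(f, kernel, mode='same') * dt

err = np.max(np.abs(conv - f))
print(err)   # small: convolving with a nascent delta nearly reproduces f
```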

Similarly, for any |*k*|th order derivative of the Dirac pulse

$$\begin{aligned} \langle T \ast D^k \delta, \phi \rangle &= \langle T(\tau) \otimes D^k \delta(\lambda), \phi(\tau + \lambda) \rangle \\ &= \langle T(\tau), \langle D^k \delta(\lambda), \phi(\tau + \lambda) \rangle \rangle \\ &= \langle T(\tau), \langle \delta(\lambda), (-1)^{|k|} D^k\_\lambda \phi(\tau + \lambda) \rangle \rangle \\ &= \langle T(\tau), \langle \delta(\lambda), (-1)^{|k|} D^k\_\tau \phi(\tau + \lambda) \rangle \rangle \\ &= \langle T(\tau), (-1)^{|k|} D^k \phi(\tau) \rangle \\ &= \langle D^k T(\tau), \phi(\tau) \rangle \end{aligned}$$

or

$$T \ast D^k \delta = D^k T \tag{3.9}$$

where *D<sup>k</sup><sub>λ</sub>* and *D<sup>k</sup><sub>τ</sub>* denote differentiation with respect to the variables λ and τ, respectively; and we made use of the fact that *D<sup>k</sup><sub>λ</sub>*φ(τ + λ) = *D<sup>k</sup><sub>τ</sub>*φ(τ + λ).
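Property (3.9) also has a simple discrete analogue for *k* = 1: a pair of opposite spikes approximating the dipole *D*δ acts as a central-difference, i.e. differentiation, kernel. A numerical sketch (assuming NumPy; grid and signal are arbitrary choices):

```python
import numpy as np

# Sketch of T * (D delta) = D T: two opposite spikes of weight +-1/(2h)
# approximate the dipole D delta; convolving with them differentiates.
h = 0.01
t = np.arange(-4, 4, h)
f = np.sin(t)

ddelta = np.zeros_like(t)
mid = len(t) // 2                      # grid point t = 0
ddelta[mid - 1] = 1 / (2 * h**2)       # spike height = weight / h
ddelta[mid + 1] = -1 / (2 * h**2)

df = np.convolve(f, ddelta, mode="same") * h
deriv_error = np.max(np.abs(df - np.cos(t))[10:-10])  # ignore edge artifacts
print(deriv_error)
```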

In our notation we use *T*(*t*) to indicate a distribution associated with test functions whose independent variable is denoted by the symbol *t*. With this notation it seems natural to write *S*(*t*) ∗ *T*(*t*) for the convolution of two distributions. However, for shifted distributions this leads to confusion, as *S*(*t* − *a*) ∗ *T*(*t* − *a*) does not represent the distribution *S* ∗ *T* shifted by *a*. To give a precise meaning to such expressions we introduce the *shifting operator* defined by

$$
\langle \mathfrak{t}\_a T, \phi(t) \rangle := \langle T(t), \phi(t+a) \rangle \,. \tag{3.10}
$$

With it we fix the following notation

$$(S \ast T)(t - a) := \mathfrak{t}_a(S \ast T) \tag{3.11}$$

$$S(t-a) \ast T(t-b) := (\mathfrak{t}_a S) \ast (\mathfrak{t}_b T) \,. \tag{3.12}$$

The convolution product has several useful properties. The first one that we want to discuss is *distributivity*. If all appearing convolutions are well defined, then

$$
\begin{aligned}
\langle (S+T) \* U, \phi \rangle &= \langle (S(\tau) + T(\tau)) \otimes U(\lambda), \phi(\tau + \lambda) \rangle \\ &= \langle S(\tau) + T(\tau), \langle U(\lambda), \phi(\tau + \lambda) \rangle \rangle \\ &= \langle S(\tau), \langle U(\lambda), \phi(\tau + \lambda) \rangle \rangle + \\ &\quad \langle T(\tau), \langle U(\lambda), \phi(\tau + \lambda) \rangle \rangle \\ &= \langle S \* U, \phi \rangle + \langle T \* U, \phi \rangle \\ &= \langle S \* U + T \* U, \phi \rangle \\
\end{aligned}
$$

and similarly

$$S \ast (T+U) = S \ast T + S \ast U \,. \tag{3.14}$$

Further, *differentiation* of a convolution product is equivalent to differentiation of one of the factors

$$\begin{aligned} \langle D\_i(S \ast T), \phi \rangle &= -\langle S \ast T, D\_i \phi \rangle \\ &= -\langle S(\tau) \otimes T(\lambda), D\_{\tau, i} \phi(\tau + \lambda) \rangle \\ &= -\langle S(\tau), \langle T(\lambda), D\_{\tau, i} \phi(\tau + \lambda) \rangle \rangle \\ &= \langle D\_i S(\tau), \langle T(\lambda), \phi(\tau + \lambda) \rangle \rangle \\ &= \langle (D\_i S) \ast T, \phi \rangle \end{aligned}$$

where *D*<sub>τ,*i*</sub> is the partial differential operator with respect to the *i*th component of the variable τ. Since *D*<sub>τ,*i*</sub>φ(τ + λ) = *D*<sub>λ,*i*</sub>φ(τ + λ), differentiation can also be moved to the second factor so that

$$D_i(S \ast T) = (D_i S) \ast T = S \ast (D_i T) \,. \tag{3.15}$$

In a similar way one shows that the operation of *shifting* a convolution product can also be moved to one of the factors

$$(S \ast T)(\tau - a) = S(\tau - a) \ast T(\tau) = S(\tau) \ast T(\tau - a) \,. \tag{3.16}$$

#### **Example 3.4: Convolution with** *δ*

Consider two Dirac pulses and an arbitrary distribution *T* in D′(ℝ*<sup>n</sup>*). By the shifting property of convolution we have

$$
\delta(t - a) \ast \delta(t - b) = \delta(t - a - b) \tag{3.17}
$$

$$T(t) \ast \delta(t - a) = T(t - a) \,. \tag{3.18}$$
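The shifting identities (3.17) and (3.18) can be checked in the same discrete picture (grid, signal and shift *a* are arbitrary choices; NumPy assumed):

```python
import numpy as np

# Sketch of T(t) * delta(t - a) = T(t - a): convolution with a discrete
# spike placed at t = a shifts the signal by a.
h = 0.01
t = np.arange(-5, 5, h)
f = np.exp(-t**2)                        # T

a = 1.0
spike = np.zeros_like(t)
spike[np.argmin(np.abs(t - a))] = 1 / h  # delta(t - a) on the grid

shifted = np.convolve(f, spike, mode="same") * h
shift_error = np.max(np.abs(shifted - np.exp(-(t - a)**2)))
print(shift_error)
```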

The convolution of a distribution *T* with an indefinitely differentiable function γ is an indefinitely differentiable function. For, keeping in mind that φ has compact support and that, as a distribution, it can therefore be uniquely extended to functions in E, we have

$$
\begin{aligned}
\langle T \ast \boldsymbol{\gamma}, \phi \rangle &= \langle T(\boldsymbol{\tau}), \langle \boldsymbol{\gamma}(\boldsymbol{\lambda}), \phi(\boldsymbol{\tau} + \boldsymbol{\lambda}) \rangle \rangle \\&= \langle T(\boldsymbol{\tau}), \langle \boldsymbol{\gamma}(\boldsymbol{\lambda} - \boldsymbol{\tau}), \phi(\boldsymbol{\lambda}) \rangle \rangle \\&= \langle T(\boldsymbol{\tau}), \langle \phi(\boldsymbol{\lambda}), \boldsymbol{\gamma}(\boldsymbol{\lambda} - \boldsymbol{\tau}) \rangle \rangle \\&= \langle \phi(\boldsymbol{\lambda}), \langle T(\boldsymbol{\tau}), \boldsymbol{\gamma}(\boldsymbol{\lambda} - \boldsymbol{\tau}) \rangle \rangle \,.
\end{aligned}
$$

Then, by arguments similar to the ones that led to the definition of the tensor product, one deduces that the inner pairing λ ↦ ⟨*T*(τ), γ(λ − τ)⟩ is an indefinitely differentiable function that we call ζ. We can therefore proceed further

$$\begin{aligned} \langle T \ast \gamma, \phi \rangle &= \langle \phi(\lambda), \zeta(\lambda) \rangle \\ &= \langle \zeta, \phi \rangle \end{aligned}$$

and obtain as claimed that *T* ∗ γ = ζ .

The convolution product is a *continuous* operation in the following sense. If *T* is a fixed distribution, (*S<sub>m</sub>*)<sub>*m*∈ℕ</sub> a sequence of distributions converging in D′ to *S* and all involved convolutions are well defined, then

$$\begin{aligned} \lim\_{m \to \infty} \langle S\_m \ast T, \phi \rangle &= \lim\_{m \to \infty} \langle S\_m(\tau), \langle T(\lambda), \phi(\tau + \lambda) \rangle \rangle \\ &= \langle S(\tau), \langle T(\lambda), \phi(\tau + \lambda) \rangle \rangle \\ &= \langle S \ast T, \phi \rangle \end{aligned}$$

or

$$\lim\_{m \to \infty} S\_m \ast T = S \ast T \,. \tag{3.19}$$

In particular we saw in Example 2.5 that δ can be represented as the limit of a sequence of test functions β*<sup>m</sup>* and in Example 3.3 that δ is a unit of convolution. With continuity of convolutions we therefore deduce that *each distribution is the limit of a sequence of indefinitely differentiable functions* of the form *T* ∗ φ with φ a test function. We saw an instance of this in Example 2.2.
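This approximation can be made concrete numerically: mollifying the (non-smooth) Heaviside step with ever narrower bump functions yields indefinitely differentiable functions that agree with the step away from the jump. A sketch assuming NumPy, with arbitrarily chosen grid and widths:

```python
import numpy as np

# Mollifying the Heaviside step with bump functions beta_eps gives smooth
# functions converging to the step: an instance of T * beta_m -> T.
h = 0.001
t = np.arange(-2, 2, h)
step = (t >= 0).astype(float)            # Heaviside as a regular distribution

def bump(eps):
    """Compactly supported mollifier with unit area, support (-eps, eps)."""
    out = np.zeros_like(t)
    inside = np.abs(t) < eps
    out[inside] = np.exp(-1.0 / (1.0 - (t[inside] / eps) ** 2))
    return out / (out.sum() * h)

errors = []
for eps in (0.5, 0.1, 0.02):
    smooth = np.convolve(step, bump(eps), mode="same") * h
    region = (np.abs(t) > eps + 2 * h) & (np.abs(t) < 1.0)
    errors.append(np.max(np.abs(smooth - step)[region]))
print(errors)     # agreement away from the jump, for every eps
```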

The last property that we want to discuss in this section is *associativity*. In general the convolution of three or more distributions is not associative as is easily verified with simple examples.

#### **Example 3.5: Convolution May Not Be Associative**

Let's denote by 1 and 0 the constant functions equal to one and zero, respectively, and by 𝟙₊ the Heaviside step function, so that *D*1 = 0 and *D*𝟙₊ = δ. Then

$$\begin{aligned} 1 \ast (\delta^{(1)} \ast \mathbb{1}_+) &= 1 \ast \delta = 1 \\ (1 \ast \delta^{(1)}) \ast \mathbb{1}_+ &= 0 \ast \mathbb{1}_+ = 0 \\ (\delta^{(1)} \ast \mathbb{1}_+) \ast 1 &= \delta \ast 1 = 1 \\ \delta^{(1)} \ast (\mathbb{1}_+ \ast 1) &= \text{undefined} \,. \end{aligned}$$

We can *guarantee associativity* by imposing a restriction similar to the one for the existence of the convolution of two distributions. Let's write the convolution of three distributions in terms of the tensor product

$$\langle S \ast T \ast U, \phi \rangle = \langle S(\tau) \otimes T(\lambda) \otimes U(\kappa), \phi(\tau + \lambda + \kappa) \rangle \,. \tag{3.20}$$

If the intersection of the support of *S*(τ) ⊗ *T*(λ) ⊗ *U*(κ) with the support of φ(τ + λ + κ) is bounded then, by the properties of the tensor product, the convolution is guaranteed to be associative. It's easily verified that the following is a *sufficient condition*: if all but possibly one of the distributions have compact support, then the convolution product is associative.

#### **Example 3.6: One-Sided Distributions**

Consider three distributions *S*(τ), *T*(λ) and *U*(κ) in D′<sub>+</sub>. Then τ, λ and κ are ≥ 0 on the supports. If the value of τ + λ + κ is bounded, then there is a constant *c* for which τ + λ + κ < *c*. It follows that τ is bounded by τ < *c* − (λ + κ), and similarly for the other variables. The convolution of distributions in D′<sub>+</sub> is therefore always associative. The same is true for distributions in D′<sub>R</sub> and D′<sub>L</sub>.

In addition, it's easily seen that D′<sub>+</sub> is closed under convolution. That is, the convolution of distributions in D′<sub>+</sub> results in another distribution in D′<sub>+</sub>.

The discussed properties of the convolution product are summarized in Table 3.1.

# **3.3 Approximation of Distributions**

In this section we show how the convolution product can be used to obtain approximations of arbitrary distributions.

We saw that if *T* is a distribution in D′ and φ is a test function in D, then the convolution product *T* ∗ φ is an indefinitely differentiable function. Its support is not necessarily bounded. However, let α be the test function defined by (2.11) and set α<sub>*m*</sub>(τ) = α(τ/*m*). Then for every *m* ∈ ℕ the product α<sub>*m*</sub> · (*T* ∗ φ) is an indefinitely differentiable function with compact support and hence a test function.


**Table 3.1** Properties of the convolution product

Let (β<sub>*m*</sub>) be a sequence of test functions converging to the δ distribution. Then, by the continuity of convolution, we see that in D′

$$\lim\_{m \to \infty} \alpha\_m \cdot (T \* \beta\_m) = T \,. \tag{3.21}$$

This shows that *every distribution in* D′ *is the limit of a sequence of test functions in* D. In other words, D is a dense sub-vector space of D′. Every distribution can thus be approximated to arbitrary accuracy by a test function in D.

Next we construct another dense sub-vector space of D′. For simplicity we only treat the one-dimensional case and for brevity we write κ<sub>*m*</sub> for α<sub>*m*</sub> · (*T* ∗ β<sub>*m*</sub>). As we just discussed, κ<sub>*m*</sub> is a test function for every *m* ∈ ℕ. Let φ be another arbitrary test function. Then, for every *m*, we can find constants *a* and *b* such that the interval [*a*, *b*] includes both the support of κ<sub>*m*</sub> and that of φ. If we construct the finite sum of δ distributions weighted by κ<sub>*m*</sub>

$$S\_{n,m} = \frac{b-a}{n} \sum\_{j=1}^{n} \kappa\_m (a + j\frac{b-a}{n}) \delta(t - a - j\frac{b-a}{n})$$

and apply it to φ we obtain

$$
\langle S_{n,m}, \phi \rangle = \frac{b-a}{n} \sum_{j=1}^{n} \kappa_m \Big(a + j\frac{b-a}{n}\Big) \, \phi\Big(a + j\frac{b-a}{n}\Big) \,.
$$

In the limit as *n* tends to infinity we obtain

$$\lim\_{n \to \infty} \langle \mathcal{S}\_{n,m}, \phi \rangle = \int\_{a}^{b} \kappa\_{m}(\tau) \phi(\tau) \, d\tau \, .$$

By the choice of the interval [*a*, *b*] we can extend it to the whole of ℝ without changing the value of the integral. Hence, by letting *m* tend to infinity we finally obtain

$$\lim_{m \to \infty} \int_{a}^{b} \kappa_{m}(\tau) \, \phi(\tau) \, d\tau = \lim_{m \to \infty} \langle \kappa_{m}, \phi \rangle = \langle T, \phi \rangle \,.$$

We thus see that every distribution *T* ∈ D′ is the limit of a finite sum of weighted Dirac pulses *S<sub>n</sub>* := *S<sub>n,n</sub>*. That is, *finite sums of weighted* δ *distributions form a dense sub-vector space of* D′.

Note that a regular spacing between the δ distributions is not necessary and was chosen purely for convenience. In general any distribution can be approximated by a finite sum of the following form

$$T\_n = \sum\_{j=1}^n a\_{n,j} \, \delta(t - \tau\_{n,j}) \tag{3.22}$$

with *a<sub>n,j</sub>* ∈ ℂ and τ<sub>*n*,*j*</sub> ∈ ℝ.
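The convergence of the weighted Dirac sums can be observed numerically. In the sketch below (assuming NumPy) κ and φ are arbitrary rapidly decaying stand-ins, and the closed form ⟨κ, φ⟩ = ∫ e^(−2t²) cos t dt = √(π/2) e^(−1/8) serves as reference:

```python
import numpy as np

# A finite sum of weighted Dirac pulses applied to a test function is a
# weighted sum of point values; it converges to <kappa, phi> as n grows.
kappa = lambda t: np.exp(-t**2)            # plays the role of kappa_m
phi = lambda t: np.exp(-t**2) * np.cos(t)  # a test function (numerically)
a, b = -8.0, 8.0                           # covers both "supports"

def delta_sum(n):
    """<S_n, phi> for n equally spaced pulses weighted by kappa, as in (3.22)."""
    j = np.arange(1, n + 1)
    tj = a + j * (b - a) / n
    return (b - a) / n * np.sum(kappa(tj) * phi(tj))

exact = np.sqrt(np.pi / 2) * np.exp(-1 / 8)   # closed form of <kappa, phi>
errs = [abs(delta_sum(n) - exact) for n in (10, 100, 10000)]
print(errs)    # decreasing
```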

# **3.4 Convolution of Periodic Distributions**

In this section we investigate periodic distributions and their convolution. One way to define periodic distributions is in direct analogy with periodic functions.

**Definition 3.3** (*Periodic distribution I*) A periodic distribution *T* is a distribution for which there exists a positive number T such that for all test functions φ

$$
\langle T(t), \phi(t) \rangle = \langle T(t + \mathcal{T}), \phi(t) \rangle \,. \tag{3.23}
$$

The smallest such number T is called the *fundamental period* of the distribution.

Periodic distributions have unbounded support. For this reason the convolution of two periodic distributions as defined by (3.6) does not exist. By exploiting their periodicity it is however possible to find an alternative definition for periodic distributions that allows for a well defined convolution product.

Consider a regular distribution arising from a T-periodic function *f*. By exploiting its periodicity we find that

$$\begin{aligned} \langle f, \phi \rangle &= \int_{-\infty}^{\infty} f(t) \, \phi(t) \, dt \\ &= \sum_{m=-\infty}^{\infty} \int_{a + m\mathcal{T}}^{a + (m+1)\mathcal{T}} f(t) \, \phi(t) \, dt \\ &= \int_{a}^{a + \mathcal{T}} f(t) \sum_{m=-\infty}^{\infty} \phi(t - m\mathcal{T}) \, dt \\ &= \int_{a}^{a + \mathcal{T}} f(t) \, \Phi(t) \, dt \end{aligned}$$

with *a* a constant,

$$\Phi(t) = \sum_{m=-\infty}^{\infty} \phi(t - m\mathcal{T}) \tag{3.24}$$

and where the exchange of summation and integration is justified by the fact that for every value of *t* the sum is finite. The function Φ is T-periodic and indefinitely differentiable.
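The periodization (3.24) is easy to check numerically: summing shifted copies of a compactly supported bump gives a function invariant under a shift by T. A sketch assuming NumPy (the bump and the period are arbitrary choices):

```python
import numpy as np

# Sketch of (3.24): shifted copies of a compactly supported test function
# phi sum to a T-periodic indefinitely differentiable function Phi.
def phi(t):
    out = np.zeros_like(t)
    inside = np.abs(t) < 1
    out[inside] = np.exp(-1.0 / (1.0 - t[inside] ** 2))
    return out

T = 0.7                                    # chosen period of the summation
def Phi(t, terms=40):                      # the truncation is exact here:
    return sum(phi(t - m * T) for m in range(-terms, terms + 1))

t = np.linspace(-0.35, 0.35, 1001)
periodicity_gap = np.max(np.abs(Phi(t) - Phi(t + T)))
print(periodicity_gap)                     # ≈ 0: Phi(t) = Phi(t + T)
```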

By introducing the identity

$$f(t) \equiv f^\circ([t]) \qquad (t \in \mathbb{R}) \tag{3.25}$$

with [*t*] the equivalence class of real numbers modulo T, we effectively and uniquely define a function *f*◦. By writing [*t*] as T/(2π)[ϕ] and noting that [ϕ] is an equivalence class modulo 2π, we can think of *f*◦ as a function defined on a circle 𝕋 of radius T/(2π) centered at the origin of a plane, with [ϕ] the polar angle. With this interpretation, the equivalence class [*t*] is seen to represent the distance along the arc of the circle 𝕋 from the reference point [0]. In the following, to simplify notation, we are going to write a representative in place of its equivalence class.

Conversely, given a function *f*◦, the identity (3.25) uniquely defines a periodic function *f* (see Fig. 3.3). The last integral above is therefore identical to the integral of *f*◦Φ◦ on the circle 𝕋

$$\int\_{a}^{a+\mathcal{T}} f(t)\,\Phi(t) \,dt = \int\_{\mathbb{T}} f^{\circ}(p)\,\Phi^{\circ}(p) \,dp.$$

We have thus obtained that to every regular periodic distribution *f* there corresponds a continuous linear functional *f*◦ on the indefinitely differentiable functions Φ◦ on the circle 𝕋. The set of all the latter functions is denoted by D(𝕋). This space is isomorphic to the vector sub-space of E consisting of all indefinitely differentiable T-periodic functions, from which it inherits the following definition of convergence.

**Definition 3.4** (*Convergence in* D(𝕋)) A sequence of functions Φ◦<sub>*m*</sub> ∈ D(𝕋) is said to converge to Φ◦ ∈ D(𝕋) if, for every natural number *k*, the functions *D<sup>k</sup>*Φ◦<sub>*m*</sub> converge uniformly to *D<sup>k</sup>*Φ◦.

In Sect. 3.2 we saw that every distribution *T* can be generated as the limit of a sequence of regular distributions *f<sub>m</sub>*. Applying this result

$$\langle T, \phi \rangle = \lim_{m \to \infty} \langle f_m, \phi \rangle = \lim_{m \to \infty} \int_{\mathbb{T}} f_m^{\circ}(p) \, \Phi^{\circ}(p) \, dp = \langle T^{\circ}, \Phi^{\circ} \rangle$$

we see that not only regular, but every periodic distribution can be equivalently represented by a continuous, linear functional on D(𝕋).

To see the converse, that is, that every continuous, linear functional on D(𝕋) represents a distribution in D′(ℝ), we have to show that every indefinitely differentiable T-periodic function can be generated from some test function φ as in (3.24). To this end we introduce the so-called *unitary functions*. These are test functions ξ for which there is a number T such that

$$\sum\_{m=-\infty}^{\infty} \xi(t - m\mathcal{T}) = 1.\tag{3.26}$$

Note that, here again, the sum is finite for every bounded range of *t*. We can find several such functions. The following example satisfies (3.26) for T = 1 (see Fig. 3.4)

$$\xi_1(t) = \begin{cases} \displaystyle \int_{|t|}^{1} \mathbf{e}^{\frac{-1}{\tau(1-\tau)}} \, d\tau \bigg/ \int_{0}^{1} \mathbf{e}^{\frac{-1}{\tau(1-\tau)}} \, d\tau & |t| < 1\\ 0 & |t| \ge 1 \end{cases}$$

With it we can construct unitary functions for arbitrary periods T by ξ<sub>T</sub>(*t*) = ξ<sub>1</sub>(*t*/T).
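Property (3.26) of ξ₁ can be verified numerically by evaluating both integrals with a simple midpoint rule (a sketch assuming NumPy; the sample points are arbitrary):

```python
import numpy as np

# Check that the shifted copies of the unitary function xi_1 (T = 1)
# sum to one. The weight w is symmetric about tau = 1/2, which is what
# makes xi_1(t) + xi_1(t - 1) = 1 on [0, 1].
w = lambda tau: np.exp(-1.0 / (tau * (1.0 - tau)))

n = 20000
tau = (np.arange(n) + 0.5) / n            # midpoints of (0, 1)
h = 1.0 / n
total = w(tau).sum() * h                  # denominator integral of xi_1

def xi1(t):
    t = abs(t)
    if t >= 1.0:
        return 0.0
    return w(tau[tau >= t]).sum() * h / total

sums = [sum(xi1(t - m) for m in range(-3, 4))
        for t in (-0.8, -0.3, 0.0, 0.4, 0.9)]
print(sums)                               # each ≈ 1
```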

Now, given any indefinitely differentiable T-periodic function Φ and a unitary function ξ<sub>T</sub> we have

$$\Phi(t) = \Phi(t) \sum\_{m = -\infty}^{\infty} \xi\_{\mathcal{T}}(t - m\mathcal{T})$$

$$= \sum\_{m = -\infty}^{\infty} \xi\_{\mathcal{T}}(t - m\mathcal{T})\,\Phi(t - m\mathcal{T}).\tag{3.27}$$

Since ξ<sub>T</sub> is a test function, so is ξ<sub>T</sub>Φ. We have thus established that every indefinitely differentiable periodic function can be generated as the sum (3.24) of shifted copies of a test function. We conclude that periodic distributions are in one-to-one correspondence with continuous, linear functionals on D(𝕋). It is therefore natural to call these functionals distributions on 𝕋. They form a vector space that is denoted by D′(𝕋).

We now have a second way to *define* periodic distributions.

**Definition 3.5** (*Periodic distribution II*) A periodic distribution *T* is defined by

$$\langle T, \phi \rangle := \langle T^{\circ}, \Phi^{\circ} \rangle \tag{3.28}$$

with *T*◦ a distribution in D′(𝕋) and where φ and Φ◦ are related by Eqs. (3.24) and (3.25).

This definition is compatible with the first one since replacing φ(*t*) by φ(*t* + T) doesn't change Φ◦.

# **Example 3.7: Dirac comb** *δ*<sup>T</sup>

Consider the distribution in D′(𝕋) defined by a Dirac pulse δ◦ with support consisting of the point at arc-length *p* = 0. Its value on a test function in D(𝕋) is

$$
\langle \delta^\circ(p), \Phi^\circ(p) \rangle = \Phi^\circ(0) \,.
$$

The corresponding periodic distribution in D′(ℝ) is

$$\delta\_{\mathcal{T}}(t) := \sum\_{m = -\infty}^{\infty} \delta(t + m\mathcal{T})$$

and evaluates to the same value as δ◦

$$\begin{aligned} \Big\langle \sum_{m=-\infty}^{\infty} \delta(t + m\mathcal{T}), \phi(t) \Big\rangle &= \sum_{m=-\infty}^{\infty} \big\langle \delta(t + m\mathcal{T}), \xi_{\mathcal{T}}(t) \, \Phi(t) \big\rangle \\ &= \sum_{m=-\infty}^{\infty} \xi_{\mathcal{T}}(-m\mathcal{T}) \, \Phi(-m\mathcal{T}) \\ &= \Phi^{\circ}(0) \sum_{m=-\infty}^{\infty} \xi_{\mathcal{T}}(-m\mathcal{T}) \\ &= \Phi^{\circ}(0) \,. \end{aligned}$$

Since the circle 𝕋 is compact, the support of distributions in D′(𝕋) is bounded. With the second definition, the convolution of periodic distributions is therefore always well defined and associative

$$
\begin{aligned}
\langle S \ast T, \phi \rangle &= \langle S(\tau) \otimes T(\lambda), \phi(\tau + \lambda) \rangle \\&= \langle S^{\circ}(\tau), \langle T^{\circ}(\lambda), \Phi^{\circ}(\tau + \lambda) \rangle \rangle \\&= \langle S^{\circ} \ast T^{\circ}, \Phi^{\circ} \rangle.
\end{aligned}
$$

In addition, it's easily verified, by replacing φ(*t*) with φ(*t* + T), that the resulting distribution is also T-periodic. In other words, D′(𝕋) is closed under convolution.

**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Chapter 4 Fourier Transform of Distributions**

The Fourier transform is a major tool in the analysis of signals and systems. We will see that its extension to distributions will make the derivation of many results simpler and more direct than when working with functions.

# **4.1 Test Functions of Fast Descent**

Consider a Lebesgue integrable function *f* . Its Fourier transform is defined by

$$\mathcal{F}\{f\}(\omega) := \hat{f}(\omega) := \int\_{-\infty}^{\infty} f(t) \, \mathbf{e}^{-j\omega t} \, dt \tag{4.1}$$

which is a continuous function of ω. If *f* is such that ˆ*f* is also integrable, then

$$f(t) = \mathcal{F}^{-1}\{\hat{f}\}(t) := \frac{1}{2\pi} \int_{-\infty}^{\infty} \hat{f}(\omega) \, \mathbf{e}^{\jmath\omega t} \, d\omega \tag{4.2}$$

almost everywhere, with F<sup>−1</sup>{*f̂*} the inverse Fourier transform. F<sup>−1</sup>{*f̂*} may differ from *f* at the points where *f* is not continuous.

The Fourier transform of the regular distribution *f* is thus

$$\begin{aligned} \left< \hat{f}(\omega), \phi(\omega) \right> &= \int\_{-\infty}^{\infty} \int\_{-\infty}^{\infty} f(t) \, \mathbf{e}^{-j\omega t} \, dt \, \phi(\omega) \, d\omega \\ &= \int\_{-\infty}^{\infty} f(t) \int\_{-\infty}^{\infty} \phi(\omega) \, \mathbf{e}^{-j\omega t} \, d\omega \, dt \end{aligned}$$

© The Author(s) 2024

F. Beffa, *Weakly Nonlinear Systems*, Understanding Complex Systems, https://doi.org/10.1007/978-3-031-40681-2\_4



$$=\int\_{-\infty}^{\infty} f(t)\hat{\phi}(t) \,dt\,. \tag{4.3}$$

The last integral looks like a distribution in a form suitable to be generalized to arbitrary distributions. However, the support of φ̂ is not compact. This is a manifestation of the uncertainty principle of the Fourier transform and can readily be seen from

$$\hat{\phi}(\omega) = \int_{-\infty}^{\infty} \phi(t) \, \mathbf{e}^{-\jmath\omega t} \, dt = \int_{a}^{b} \sum_{m=0}^{\infty} \frac{(-\jmath\omega)^{m}}{m!} \, t^{m} \, \phi(t) \, dt$$

$$= \sum_{m=0}^{\infty} \frac{(-\jmath\omega)^{m}}{m!} \int_{a}^{b} \phi(t) \, t^{m} \, dt \tag{4.4}$$

with *a* and *b* constants such that the interval [*a*, *b*] includes the support of φ.

To obtain a definition of the Fourier transform suitable for arbitrary distributions we have to replace the space of test functions D with a space closed under Fourier transformation. Suitable characteristics for the functions in this space can be inferred from the above expression for φ̂. First, given the uncertainty principle, the space has to be extended to functions of unbounded support (and therefore the last step, the exchange of summation and integration, may not be valid). Then, if all summands have to remain finite, the limits lim<sub>*t*→±∞</sub> φ(*t*)*t<sup>m</sup>* have to converge to zero for all values of *m*. Finally, to preserve arbitrary differentiability, the above characteristics must be satisfied by all derivatives of φ. These are the characteristics of the so-called *Schwartz space* S, of which we give the general definition.

**Definition 4.1** (*Schwartz space* S(ℝ*<sup>n</sup>*)) The Schwartz space S(ℝ*<sup>n</sup>*) is the vector space of indefinitely differentiable functions φ : ℝ*<sup>n</sup>* → ℂ that, together with all their derivatives, decrease more rapidly than any power of 1/|τ| as |τ| → ∞. That is, for any *n*-tuples *m*, *k* ∈ ℕ*<sup>n</sup>* and τ ∈ ℝ*<sup>n</sup>*

$$\lim_{|\tau| \to \infty} |\tau^m D^k \phi(\tau)| = 0 \,. \tag{4.5}$$

Functions φ in the Schwartz space are called test functions of *rapid descent,* or *Schwartz functions.*

To see that the Fourier transform of a function φ ∈ S(ℝ) is indeed another function in the same space, consider the *k*th derivative of φ̂. By integrating by parts we find

$$\begin{aligned} D^k \hat{\phi}(\omega) &= \int_{-\infty}^{\infty} (-\jmath t)^k \, \phi(t) \, \mathbf{e}^{-\jmath\omega t} \, dt \\ &= \frac{1}{\jmath\omega} \int_{-\infty}^{\infty} \mathbf{e}^{-\jmath\omega t} D \left[ (-\jmath t)^k \, \phi(t) \right] dt \end{aligned}$$

and by iterating *m* times

$$\begin{aligned} \left| (\jmath\omega)^m D^k \hat{\phi}(\omega) \right| &= \left| \int_{-\infty}^{\infty} \mathbf{e}^{-\jmath\omega t} D^m \left[ (-\jmath t)^k \, \phi(t) \right] dt \right| \\ &\leq \int_{-\infty}^{\infty} \left| D^m \left[ t^k \, \phi(t) \right] \right| dt \, . \end{aligned}$$

Since this is valid for arbitrary *k* and *m*, it shows that φ̂ is in fact a function in the Schwartz space. In addition, given that the Fourier transform and its inverse are almost symmetric, a similar calculation shows that the inverse Fourier transform of a Schwartz function φ̂ is a function φ ∈ S. That is, *the Fourier transform is a bijection from the space* S *onto itself.*

#### **Example 4.1: Gauss Function**

An important example of a function of rapid descent is the Gauss function

$$\phi(t) = \frac{1}{\sqrt{2\pi}\sigma} \mathbf{e}^{-t^2/(2\sigma^2)}\ .$$

It's widely known that its Fourier transform is

$$
\hat{\phi}(\omega) = e^{-\omega^2 \sigma^2 / 2}.
$$
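This transform pair can be checked by direct quadrature of (4.1). A numerical sketch assuming NumPy (σ and the evaluation points are arbitrary choices):

```python
import numpy as np

# Check F{phi} for the normalized Gaussian against e^{-omega^2 sigma^2 / 2}.
sigma = 0.8
phi = lambda t: np.exp(-t**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

t = np.linspace(-12.0, 12.0, 48001)
dt = t[1] - t[0]

def ft(omega):
    """phi_hat(omega) by direct quadrature of the defining integral (4.1)."""
    return np.sum(phi(t) * np.exp(-1j * omega * t)) * dt

gauss_errs = [abs(ft(w) - np.exp(-w**2 * sigma**2 / 2)) for w in (0.0, 1.0, 2.5)]
print(gauss_errs)    # all ≈ 0
```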

One of the defining characteristics of distributions is their continuity. To talk about continuity we introduce a convergence principle (topology) similar to the ones we defined for D and E.

**Definition 4.2** (*Convergence in* S(ℝ*<sup>n</sup>*)) A sequence of functions φ<sub>*m*</sub> ∈ S(ℝ*<sup>n</sup>*) is said to converge in S(ℝ*<sup>n</sup>*) to a function φ ∈ S(ℝ*<sup>n</sup>*) if, for each pair of *n*-tuples *k*, *p* ∈ ℕ*<sup>n</sup>*, the sequence |τ|*<sup>p</sup> D<sup>k</sup>*φ<sub>*m*</sub>(τ) converges uniformly to |τ|*<sup>p</sup> D<sup>k</sup>*φ(τ), that is, if

$$\lim_{m \to \infty} |\tau|^p \, D^k \phi_m(\tau) = |\tau|^p \, D^k \phi(\tau) \,.$$

# **4.2 Fourier Transform of Tempered Distributions**

For (4.3) to be an expression suitable for the definition of the Fourier transform of an arbitrary distribution, we must verify its linearity and continuity. The former is clear. To show the latter we have to verify that, if a sequence of test functions φ<sub>*m*</sub> ∈ S converges to zero, so does the sequence of their Fourier transforms φ̂<sub>*m*</sub>. That this is the case is shown by the following upper bound

$$\begin{split} |\hat{\phi}_{m}(\omega)| &= \left| \int_{-\infty}^{\infty} \phi_{m}(t) \, \mathbf{e}^{-\jmath\omega t} \, dt \right| \leq \int_{|t|<1} |\phi_{m}(t)| \, dt + \int_{|t| \geq 1} |\phi_{m}(t)| \, dt \\ &\leq 2 \sup_{|t|<1} |\phi_{m}(t)| + \int_{|t| \geq 1} \left| \frac{\phi_{m}(t) \, t^{2}}{t^{2}} \right| dt \\ &\leq 2 \sup_{|t|<1} |\phi_{m}(t)| + \sup_{|t| \geq 1} |\phi_{m}(t) \, t^{2}| \int_{|t| \geq 1} \frac{1}{t^{2}} \, dt \\ &= 2 \sup_{|t|<1} |\phi_{m}(t)| + 2 \sup_{|t| \geq 1} |\phi_{m}(t) \, t^{2}| \,. \end{split}$$

We have a good candidate for the definition of the Fourier transform for an arbitrary distribution. However, since the space S is larger than D, the Fourier transform can only be defined for the following subset of distributions.

**Definition 4.3** (*Tempered distributions*) *Tempered distributions* (also called distributions of *slow growth*) are distributions that can be extended to continuous, linear functionals on the Schwartz space S.

The space of all continuous, linear functionals on S is denoted by S′ and, since the Schwartz space satisfies D ⊂ S ⊂ E, we have the inclusions E′ ⊂ S′ ⊂ D′. Consequently, from Sect. 2.5, we conclude that, if a distribution *T* ∈ D′ can be extended to a continuous, linear functional on S, then this extension is unique (and the other way around). S′ can therefore be *identified* with the space of tempered distributions.

#### **Example 4.2: Slowly Increasing Function**

Consider a locally integrable function *f* satisfying

$$|f(t)| \le C|t|^m \qquad \text{as} \quad |t| \to \infty$$

for some constant *C* and some natural number *m*. Then, *f* is a tempered distribution, since

$$\begin{aligned} |\langle f, \phi \rangle| &\leq \int\_{|t|<1} |f(t)\,\phi(t)| \,dt + \int\_{|t|\geq 1} |f(t)\,\phi(t)| \,dt \\ &\leq \sup\_{|t|<1} |\phi(t)| \int\_{|t|<1} |f(t)| \,dt + \sup\_{|t|\geq 1} \left( |t^{m+2}| \,|\phi(t)| \right) \int\_{|t|\geq 1} \frac{C}{|t|^2} \,dt \end{aligned}$$

is bounded for every φ ∈ S.

#### **Example 4.3: Distributions in** E′

Distributions with bounded support are defined on all indefinitely differentiable functions, independently of their asymptotic behavior as *t* → ±∞. For this reason the Fourier transform of distributions in E′ is always well-defined.

#### **Example 4.4: Multiplication with Polynomial**

If *T* is a tempered distribution and *p* a polynomial, then *p T* is a tempered distribution. *p T* is in fact defined as

$$
\langle p \, T, \phi \rangle = \langle T, \, p \, \phi \rangle
$$

and it's easy to see that *p* φ ∈ S.

We can finally define the Fourier transform for tempered distributions.

**Definition 4.4** (*Fourier transform on* S ) The Fourier transform of a tempered distribution *T* and its inverse, are defined by

$$
\langle \mathcal{F}\{T\}, \phi \rangle := \langle T, \mathcal{F}\{\phi\} \rangle \tag{4.6}
$$

$$\left< \mathcal{F}^{-1} \{ T \} , \phi \right> := \left< T , \mathcal{F}^{-1} \{ \phi \} \right> \tag{4.7}$$

for every function φ ∈ S.

Clearly, the Fourier transform of a tempered distribution is a tempered distribution. Note that, given the properties of Schwartz functions, for a tempered distribution it's always the case that

$$\mathcal{F}^{-1}\{\mathcal{F}\{T\}\} = \mathcal{F}\left\{\mathcal{F}^{-1}\{T\}\right\} = T \,.$$

In addition the Fourier transform and its inverse satisfy the following *symmetry* relation

$$\begin{aligned} \left< \mathcal{F} \left\{ T \right\}, \phi \right> &= \left< T, \mathcal{F} \left\{ \phi \right\} \right> = \left< T(\omega), \int\_{-\infty}^{\infty} \phi \left( t \right) \mathbf{e}^{-j\omega t} dt \right> \\ &= \left< T(\omega), 2\pi \mathcal{F}^{-1} \{ \phi \} (-\omega) \right> \\ &= \left< 2\pi \, \mathcal{F}^{-1} \{ T(-\omega) \}, \phi \right> . \end{aligned}$$

If in this expression we replace *T* by its Fourier transform and denote it by *T*ˆ, then this symmetry relation can also be expressed as

$$\mathcal{F}\left\{\hat{T}(t)\right\} = 2\pi \, T(-\omega) \,. \tag{4.8}$$

As with functions, we will often use the convention of denoting by *T*ˆ the Fourier transform of a tempered distribution *T* .

#### **Example 4.5: Fourier Transform and** *δ*

The Fourier transform of the delta distribution δ is

$$\left<\hat{\delta},\phi\right> = \left<\delta,\hat{\phi}\right> = \hat{\phi}(0) = \int\_{-\infty}^{\infty} \phi(t)dt = \left<1,\phi\right>$$

or

$$
\hat{\delta} = 1 \,.
$$

Conversely, the Fourier transform of the constant function 1 is

$$\left\langle \hat{1}, \phi \right\rangle = \left\langle 1, \hat{\phi} \right\rangle = \int\_{-\infty}^{\infty} \hat{\phi}(\omega) \, d\omega = 2\pi \left\langle \delta, \mathcal{F}^{-1} \{\hat{\phi}\} \right\rangle = 2\pi \left\langle \delta, \phi \right\rangle$$

or

$$\hat{1} = 2\pi \, \delta \,.$$

This expression is often found in the technical literature symbolically written as

$$\delta(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \mathbf{e}^{-\jmath\omega t} \, d\omega = \frac{1}{2\pi} \int_{-\infty}^{\infty} \mathbf{e}^{\jmath\omega t} \, d\omega \,.$$

The Fourier transform of the derivative of δ is

$$\left\langle \mathcal{F}\{D\delta\}, \phi \right\rangle = \left\langle D\delta, \hat{\phi} \right\rangle = -\left\langle \delta, D\hat{\phi} \right\rangle = -\left\langle \hat{\delta}, -\jmath\omega \, \phi \right\rangle = \left\langle \jmath\omega \, \hat{\delta}, \phi \right\rangle = \left\langle \jmath\omega, \phi \right\rangle$$

and, by iterating this procedure, for the higher order derivatives we find

$$\mathcal{F}\left\{D^k\delta\right\} = (\jmath\omega)^k \,.$$
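The pair 1̂ = 2πδ obtained above can also be observed numerically: pairing the constant 1 with φ̂, i.e. integrating φ̂ over ω, returns 2πφ(0). A sketch assuming NumPy, with an arbitrarily chosen rapidly decaying φ:

```python
import numpy as np

# <F{1}, phi> = <1, phi_hat> = integral of phi_hat = 2 pi phi(0).
phi = lambda t: np.exp(-t**2) * np.cos(t)     # phi(0) = 1

t = np.linspace(-8.0, 8.0, 4001)
dt = t[1] - t[0]
omega = np.linspace(-30.0, 30.0, 1501)
dw = omega[1] - omega[0]

# phi_hat by quadrature of (4.1), then integrated over omega
phi_hat = np.array([np.sum(phi(t) * np.exp(-1j * w * t)) * dt for w in omega])
pairing = np.sum(phi_hat).real * dw           # ≈ 2 pi phi(0) = 2 pi
pairing_err = abs(pairing - 2 * np.pi)
print(pairing, pairing_err)
```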

#### **Example 4.6: Complex Tones**

The Fourier transform of a complex tone is

$$\begin{aligned} \left\langle \mathcal{F}\left\{ \mathbf{e}^{\jmath\omega_c t} \right\}, \phi \right\rangle &= \left\langle \mathbf{e}^{\jmath\omega_c t}, \hat{\phi} \right\rangle = \int_{-\infty}^{\infty} \mathbf{e}^{\jmath\omega_c t} \, \hat{\phi}(t) \, dt = 2\pi \, \phi(\omega_c) \\ &= 2\pi \, \langle \delta(\omega - \omega_c), \phi \rangle \end{aligned}$$

or

$$\mathcal{F}\left\{\mathbf{e}^{j\omega\_c t}\right\} = 2\pi \,\delta(\omega - \omega\_c)\,.$$

Similarly, the Fourier transform of a shifted Dirac pulse is found to be

$$\mathcal{F}\left\{\delta(t-\tau\_0)\right\} = \mathbf{e}^{-j\omega\tau\_0}\,.$$

#### **Example 4.7: Dirac comb**

An equally spaced sequence of Dirac pulses is a tempered distribution called a Dirac comb with period T

$$\delta\_{\mathcal{T}}(t) := \sum\_{m = -\infty}^{\infty} \delta(t - m\mathcal{T})\,.$$

The linearity and continuity of distributions permit us to calculate its Fourier transform term by term and, using the previous results, we obtain

$$\mathcal{F}\left\{\delta\_{\mathcal{T}}\right\} = \sum\_{m = -\infty}^{\infty} \mathbf{e}^{-j\omega m\mathcal{T}}\,.$$

This distribution is formally the limit

$$\lim\_{K, M \to \infty} \left< s\_{P,K}(\omega) + s\_{N,M}(\omega) - 1, \ \phi(\omega) \right>$$

with

$$s\_{P,K}(\omega) := \sum\_{m=0}^{K-1} \mathbf{e}^{j\omega m \mathcal{T}}$$

62 4 Fourier Transform of Distributions

$$s\_{N,M}(\omega) := \sum\_{m=0}^{M-1} \mathbf{e}^{-j\omega m \mathcal{T}}\,.$$

For values of ω ≠ *k* 2π/T, *k* ∈ Z, the partial sums can be represented by

$$\begin{split} s\_{P,K}(\omega) &= \frac{1 - \mathbf{e}^{j\omega K \mathcal{T}}}{1 - \mathbf{e}^{j\omega \mathcal{T}}} = \frac{1}{1 - \mathbf{e}^{j\omega \mathcal{T}}} - \frac{\mathbf{e}^{j\omega K \mathcal{T}}}{1 - \mathbf{e}^{j\omega \mathcal{T}}} \\ s\_{N,M}(\omega) &= \frac{1 - \mathbf{e}^{-j\omega M \mathcal{T}}}{1 - \mathbf{e}^{-j\omega \mathcal{T}}} = \frac{1}{1 - \mathbf{e}^{-j\omega \mathcal{T}}} - \frac{\mathbf{e}^{-j\omega M \mathcal{T}}}{1 - \mathbf{e}^{-j\omega \mathcal{T}}}. \end{split}$$

The sum of the first terms is easily seen to equal 1 and, by the results of Example 2.11, the limits of the second terms vanish. The support of F {δ<sub>T</sub>} therefore consists of the set of points ω = *k* 2π/T, *k* ∈ Z. Consequently, when applied to any test function φ ∈ S, its value must be a weighted sum of the values of the test function at these points. Since replacing φ(ω) by φ(ω + 2π/T) doesn't change the result, we can also deduce that the weighting factor must be the same for all terms. We thus have

$$\left< \mathcal{F} \left\{ \delta\_{\mathcal{T}} \right\}, \phi \right> = \sum\_{m = -\infty}^{\infty} C\, \phi \left( m \omega\_c \right) = C \left< \delta\_{\omega\_c}, \phi \right>$$

with *C* a constant and ω*<sup>c</sup>* = 2π/T. The value of the constant can be found by inserting any Schwartz function. A convenient choice is the one of Example 4.1 with σ = √(2π)/T. With it, on the one hand we have

$$\left< \mathcal{F} \left\{ \delta\_{\mathcal{T}} \right\}, \phi \right> = C \left< \delta\_{\omega\_c}, \frac{\mathcal{T}}{2\pi} \mathbf{e}^{-(\omega\mathcal{T})^2/(4\pi)} \right> = C\, \frac{\mathcal{T}}{2\pi} \sum\_{m=-\infty}^{\infty} \mathbf{e}^{-m^2 \pi}$$

and on the other hand

$$\left< \mathcal{F} \left\{ \delta\_{\mathcal{T}} \right\}, \phi \right> = \left< \delta\_{\mathcal{T}}, \hat{\phi} \right> = \left< \delta\_{\mathcal{T}}, \mathbf{e}^{-(t/T)^2 \pi} \right> = \sum\_{m = -\infty}^{\infty} \mathbf{e}^{-m^2 \pi}$$

so that *C* = 2π/*T* . We have thus established the following important result

$$
\mathcal{F}\left\{\delta\_{\mathcal{T}}\right\} = \omega\_c \,\delta\_{\omega\_c}\,.\tag{4.9}
$$

The Fourier transforms of the δ and related distributions are summarized in Table 4.1.
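Equation (4.9), applied to a test function, is equivalent to the Poisson summation formula, and this form lends itself to a quick numerical check. The sketch below (illustrative only; the comb period and the Gaussian test function φ(t) = e^(−t²/2), with φ̂(ω) = √(2π) e^(−ω²/2), are assumed choices) compares ⟨δ<sub>T</sub>, φ̂⟩ with ⟨ω<sub>c</sub> δ<sub>ω<sub>c</sub></sub>, φ⟩:

```python
import numpy as np

# Check <F{delta_T}, phi> = <omega_c * delta_{omega_c}, phi> (Eq. 4.9) with
# the Gaussian test function phi(t) = exp(-t^2/2).
T = 1.5                                   # arbitrary comb period
w_c = 2 * np.pi / T
m = np.arange(-50, 51)                    # truncation; the Gaussian tails are negligible

phi = lambda u: np.exp(-u**2 / 2)
phi_hat = lambda u: np.sqrt(2 * np.pi) * np.exp(-u**2 / 2)

lhs = np.sum(phi_hat(m * T))              # <delta_T, phi_hat> = <F{delta_T}, phi>
rhs = w_c * np.sum(phi(m * w_c))          # <omega_c * delta_{omega_c}, phi>
assert abs(lhs - rhs) < 1e-9
```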

A useful property of the Fourier transform is that it *preserves parity.* This means that the Fourier transform of an odd tempered distribution *T* is odd


$$
\begin{aligned}
\left< \mathcal{F} \left\{ T \right\}, \phi(-t) \right> &= \left< T, \mathcal{F} \left\{ \phi(-t) \right\} \right> = \left< T, \int\_{\mathbb{R}} \phi(-t) \, \mathbf{e}^{-j\omega t} dt \right> \\ &= \left< T, \int\_{\mathbb{R}} \phi(t) \, \mathbf{e}^{j\omega t} dt \right> \\ &= \left< T, \hat{\phi}(-\omega) \right> = - \left< T, \hat{\phi}(\omega) \right> \\ &= - \left< \mathcal{F} \left\{ T \right\}, \phi(t) \right> \end{aligned}
$$

and, similarly, the Fourier transform of an even tempered distribution is even.

We conclude this section with an important property of the Fourier transform of real distributions. Let *T* be a real distribution, φ a real valued Schwartz function, and denote complex conjugation by an overbar. Then

$$\begin{aligned} \left\langle \overline{\hat{T}}, \phi \right\rangle &= \overline{\left\langle \hat{T}, \phi \right\rangle} = \overline{\left\langle T, \hat{\phi} \right\rangle} = \left\langle T, \overline{\hat{\phi}} \right\rangle \\ &= \left\langle T, \hat{\phi}(-\omega) \right\rangle = \left\langle \hat{T}(-\omega), \phi \right\rangle \end{aligned}$$

or

$$\overline{\hat{T}}(\omega) = \hat{T}(-\omega) \tag{4.10}$$

as for real functions.
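The discrete counterpart of (4.10) — the DFT *X* of a real sequence satisfies X̄[*k*] = X[−*k* mod *N*] — can be observed directly with the FFT (a sketch on random real data, purely illustrative):

```python
import numpy as np

# Discrete counterpart of (4.10): for real x, conj(X[k]) = X[-k mod N].
N = 64
rng = np.random.default_rng(0)
x = rng.standard_normal(N)                # a real "distribution" (signal)
X = np.fft.fft(x)
assert np.allclose(np.conj(X), X[(-np.arange(N)) % N])
```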

# **4.3 Distributions with Bounded Support**

The Fourier transform of distributions with bounded support can be expressed in a simpler, useful form that we explore in this section. To this end, consider first the convolution between a distribution of bounded support *T* and the regular constant distribution 1


$$\begin{aligned} \langle 1\*T, \phi \rangle &= \left\langle T(\tau), \langle 1(\lambda), \phi(\tau + \lambda) \rangle \right\rangle = \left\langle T(\tau), \int\_{\mathbb{R}} \phi(\tau + \lambda) \, d\lambda \right\rangle \\ &= \left\langle 1(\lambda), \langle T(\tau), \phi(\tau + \lambda) \rangle \right\rangle = \int\_{\mathbb{R}} \langle T(\tau), \phi(\tau + \lambda) \rangle \, d\lambda \end{aligned}$$

and note the two equivalent integral representations. With these equalities we can then proceed to represent the Fourier transform of *T* by

$$\begin{aligned} \left\langle \hat{T}, \phi \right\rangle &= \left\langle T(t), \int\_{\mathbb{R}} \phi(\omega) \mathbf{e}^{-j\omega t} \, d\omega \right\rangle \\ &= \int\_{\mathbb{R}} \left\langle T(t), \phi(\omega) \mathbf{e}^{-j\omega t} \right\rangle d\omega \\ &= \int\_{\mathbb{R}} \left\langle T(t), \mathbf{e}^{-j\omega t} \right\rangle \phi(\omega) \, d\omega \\ &= \left\langle \left\langle T(t), \mathbf{e}^{-j\omega t} \right\rangle, \phi(\omega) \right\rangle \end{aligned}$$

with

$$
\hat{T}(\omega) = \left< T(t), \mathbf{e}^{-j\omega t} \right>\tag{4.11}
$$

an indefinitely differentiable *function* of slow growth. In a similar way we obtain

$$\mathcal{F}^{-1}\{T\} = \frac{1}{2\pi} \left\langle T(\omega), \mathbf{e}^{j\omega t} \right\rangle. \tag{4.12}$$
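As a small illustration of (4.11), for a distribution of bounded support made of finitely many shifted δ's the formula yields a smooth function of ω; e.g. for the assumed toy case T = δ(t − 1) + δ(t + 1) it gives T̂(ω) = e^(−jω) + e^(jω) = 2 cos ω:

```python
import numpy as np

# Eq. (4.11) for T = delta(t-1) + delta(t+1), a distribution with bounded
# support: T_hat(w) = <T(t), e^{-jwt}> = e^{-jw} + e^{jw} = 2 cos(w).
T_hat = lambda w: np.exp(-1j * w) + np.exp(1j * w)
w = np.linspace(-10.0, 10.0, 1001)
assert np.allclose(T_hat(w), 2 * np.cos(w))
```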

# **4.4 Fourier Transform and Convolution**

Consider a tempered distribution *S* and a distribution with bounded support *T* . Their convolution is well-defined: for every test function φ ∈ S

$$\langle S \ast T, \phi \rangle = \langle S(\tau), \langle T(\lambda), \phi(\tau + \lambda) \rangle \rangle = \langle T(\lambda), \langle S(\tau), \phi(\tau + \lambda) \rangle \rangle \ .$$

In the first case,

$$\langle T(\lambda), \phi(\tau + \lambda) \rangle$$

is a function ζ(τ) ∈ S and the outer functional is therefore well-defined. In the second case,

$$
\langle S(\tau), \phi(\tau + \lambda) \rangle
$$

is an indefinitely differentiable function γ(λ) ∈ E. Consequently, the outer functional is again well-defined. Equality of the two expressions is guaranteed by the uniqueness of the extension of the distributions *T* and *S* to E and S respectively (and the other way around).

We now show that the Fourier transform of the convolution of *S* and *T* is the product of their Fourier transforms *S*ˆ and *T*ˆ:

$$\begin{aligned} \left< \mathcal{F} \left\{ S \ast T \right\}, \phi \right> &= \left< S \ast T, \mathcal{F} \left\{ \phi \right\} \right> = \left< S(\omega), \left< T(\lambda), \mathcal{F} \left\{ \phi \right\}(\omega + \lambda) \right> \right> \\ &= \left< S(\omega), \left< T(\lambda), \int\_{\mathbb{R}} \phi(t) \mathbf{e}^{-j \left( \omega + \lambda \right)t} \, dt \right> \right> \\ &= \left< S(\omega), \int\_{\mathbb{R}} \phi(t) \left< T(\lambda), \mathbf{e}^{-j \lambda t} \right> \mathbf{e}^{-j \omega t} \, dt \right> \\ &= \left< \mathcal{F} \left\{ S \right\}, \phi(t) \left< T(\lambda), \mathbf{e}^{-j \lambda t} \right> \right> \\ &= \left< \mathcal{F} \left\{ S \right\} \left< T(\lambda), \mathbf{e}^{-j \lambda t} \right>, \phi(t) \right> \\ &= \left< \hat{S} \, \hat{T}, \phi \right> \end{aligned}$$

or

$$
\mathcal{F}\left\{S \ast T\right\} = \hat{S}\,\hat{T}\,.\tag{4.13}
$$

The product is well-defined since *T*ˆ is an indefinitely differentiable function of slow growth. A similar result is readily obtained for the inverse Fourier transform

$$
\mathcal{F}^{-1}\{S \ast T\} = 2\pi \,\mathcal{F}^{-1}\{S\}\, \mathcal{F}^{-1}\{T\}\,.\tag{4.14}
$$

These are central results and arguably the most important properties of the Fourier transformation. It can be shown that this relation is valid in other cases as well, for example for locally integrable functions which are slowly increasing [20].

With these properties, the previously obtained Fourier transforms of the Dirac δ distribution, and the properties of the convolution product, we immediately obtain the properties listed in Table 4.2. In particular, it is noteworthy that *the Dirac* δ *distribution acting as a unit with respect to the convolution product is reflected in the fact that its Fourier transform is 1.*
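The discrete analogue of (4.13) — the DFT of a (zero-padded) linear convolution equals the product of the DFTs — can be verified in a few lines (a sketch with arbitrary sample sequences):

```python
import numpy as np

# Discrete analogue of (4.13): zero-pad so linear convolution equals circular
# convolution; then the DFT of the convolution is the product of the DFTs.
s = np.array([1.0, 2.0, 3.0])             # arbitrary sample sequences
t = np.array([0.5, -1.0, 4.0, 2.0])
n = len(s) + len(t) - 1
lhs = np.fft.fft(np.convolve(s, t), n)    # F{s * t}
rhs = np.fft.fft(s, n) * np.fft.fft(t, n) # S_hat * T_hat
assert np.allclose(lhs, rhs)
```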

#### **Example 4.8: Fourier Transform of** pv 1/*t*

In Example 2.13 we saw that the equation

$$t\,T = 1$$

has solutions

$$T = \text{pv } \frac{1}{t} + C\delta.$$

with *C* a constant. By noting that pv 1/*t* is odd, while δ is even, we can find the Fourier transform of the former. First observe that

$$\mathcal{F}\left\{(-jt)\,T\right\} = D\hat{T}$$

and hence, by transforming both sides of the equation we have that

$$j\,D\hat{T} = 2\pi \,\delta \,.$$

Since the Fourier transform preserves parity, we have to look for an odd solution of this equation, and we find

$$\hat{T}(\omega) = -j\pi \operatorname{sign}(\omega) \qquad \text{or} \qquad \mathcal{F}\left\{ \operatorname{pv} \frac{j}{\pi t} \right\}(\omega) = \operatorname{sign}(\omega)\,.$$

With this result and the symmetry of the Fourier transform (Eq. (4.8)) we also find

$$\mathcal{F}\left\{\text{sign}\right\}(\omega) = 2\pi \,\text{pv}\, \frac{j}{\pi(-\omega)} = \text{pv}\, \frac{2}{j\omega} \,.$$

#### **Example 4.9: Fourier transform of 1<sub>+</sub>**

The Heaviside step function 1<sub>+</sub> can be written as

$$\mathbf{1}\_{+}(t) = \frac{1}{2} \left[ 1 + \text{sign}(t) \right].$$

Its Fourier transform is therefore

$$\mathcal{F}\left\{\mathbf{1}\_{+}(t)\right\} = \frac{1}{2} \left[2\pi\,\delta(\omega) + \text{pv}\,\frac{2}{j\omega}\right] = \pi\,\delta + \text{pv}\,\frac{1}{j\omega}\,.$$

From the symmetry of the Fourier transform we also obtain

$$\mathcal{F}\left\{\pi\,\delta+\text{pv}\,\frac{1}{jt}\right\} = 2\pi\,\mathbf{1}\_{+}(-\omega)$$

or

$$\mathcal{F}\left\{\frac{1}{2}\delta + \text{pv}\,\frac{j}{2\pi t}\right\} = \mathbf{1}\_{+}(\omega)\,.$$

# **4.5 Periodic Distributions**

In this section we investigate the Fourier transform of periodic distributions. Consider first a regular distribution arising from a locally integrable periodic function *f* with period T. If we introduce the function *f*<sub>⊓</sub>

$$f\_{\sqcap}(t) = \begin{cases} f(t) \ a \le t < a + \mathcal{T} \\ 0 & \text{otherwise} \end{cases}$$

with *a* a constant, then *f* can be expressed as a convolution product

$$f(t) = f\_{\sqcap}(t) \* \delta\_{\mathcal{T}} \,. \tag{4.15}$$

The Fourier transform of *f* can therefore be written as the product of the transforms of *f*<sub>⊓</sub> and δ<sub>T</sub>, which is well-defined since *f*<sub>⊓</sub> has compact support

$$\mathcal{F}\{f\} = \left\langle f\_{\sqcap}, \mathbf{e}^{-j\omega t} \right\rangle \omega\_c\, \delta\_{\omega\_c} = \frac{2\pi}{\mathcal{T}} \sum\_{m = -\infty}^{\infty} \left\langle f\_{\sqcap}, \mathbf{e}^{-jm\omega\_c t} \right\rangle \delta(\omega - m\omega\_c)\,.$$

From this we see that the Fourier transform of *f* consists of a train of equally spaced Dirac pulses, each weighted by a numerical coefficient, and that this set of weighting numbers fully characterizes it.

If we now represent *f* as the inverse Fourier transform of F { *f* } and make use of the results of Example 4.6, we obtain a trigonometric series

$$f(t) = \frac{2\pi}{\mathcal{T}} \sum\_{m = -\infty}^{\infty} \left\langle f\_{\sqcap}, \mathbf{e}^{-jm\omega\_c t} \right\rangle \mathcal{F}^{-1} \{\delta(\omega - m\omega\_c)\}$$

$$= \frac{1}{\mathcal{T}} \sum\_{m = -\infty}^{\infty} \left\langle f\_{\sqcap}, \mathbf{e}^{-jm\omega\_c t} \right\rangle \mathbf{e}^{jm\omega\_c t}\tag{4.16}$$

called the Fourier series of *f* . The coefficients are values obtained by evaluating *f*<sub>⊓</sub> on indefinitely differentiable periodic functions which are members of D(T), and the values are identical to the ones obtained by evaluating the distribution *f* ◦ ∈ D′(T) corresponding to *f* (see Sect. 3.4) on the same functions. Consequently, the above trigonometric series is both a representation of a periodic distribution in D′(R) as well as that of a distribution in D′(T).

These arguments can be extended to general periodic distributions without any difficulty so that we have the following general definition of the Fourier series of a periodic distribution.

**Definition 4.5** (*Fourier Series*) The *Fourier series* of a distribution *T* ◦ ∈ D′(T), or a periodic distribution *T* ∈ D′(R), is the trigonometric series

$$\sum\_{m=-\infty}^{\infty} c\_m \mathbf{e}^{jm\omega\_c p} \tag{4.17}$$

with coefficients

$$c\_m = \frac{1}{\mathcal{T}} \left< T^{\circ}, \mathbf{e}^{-jm\omega\_c p} \right> . \tag{4.18}$$

The coefficients are called the Fourier coefficients of the series.
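Definition 4.5 can be illustrated numerically: approximating (4.18) by quadrature over one period for a smooth T-periodic function and summing the truncated series (4.17) reproduces the function. A sketch (the choice f(t) = e^(cos ω<sub>c</sub>t), whose coefficients decay very rapidly, is an assumption made for the illustration):

```python
import numpy as np

# Fourier coefficients (4.18) by quadrature for f(t) = exp(cos(w_c t)),
# then reconstruction via the truncated series (4.17).
T = 2.0                                   # arbitrary period
w_c = 2 * np.pi / T
tq = np.linspace(0.0, T, 4096, endpoint=False)
f = np.exp(np.cos(w_c * tq))
M = 15                                    # truncation order
m = np.arange(-M, M + 1)
c = np.array([np.mean(f * np.exp(-1j * k * w_c * tq)) for k in m])  # (4.18)
series = (c[None, :] * np.exp(1j * m[None, :] * w_c * tq[:, None])).sum(axis=1)
assert np.max(np.abs(series - f)) < 1e-10
```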

The Fourier series is the only trigonometric series that converges to the distribution *T* ◦ in D′(T). In fact, if for any Φ ∈ D(T) the series

$$\sum\_{m = -\infty}^{\infty} d\_m \left< \mathbf{e}^{jm\omega\_c p}, \Phi \right>$$

does converge, then by putting Φ = e<sup>−j*m*ω<sub>c</sub>*p*</sup> and using the orthogonality of trigonometric functions we find that

$$\left< T^{\circ}, \mathbf{e}^{-jm\omega\_c p} \right> = \mathcal{T} d\_m$$

which shows that the coefficients *dm* correspond to the Fourier coefficients of *T* ◦.

As with every distribution, the Fourier series of a distribution can be differentiated term by term. Therefore, if we designate by *c<sub>m</sub>*(*T* ◦) the *m*th Fourier coefficient of the distribution *T* ◦, we have that

$$c\_m(D^k T^\circ) = (jm\omega\_c)^k\, c\_m(T^\circ) \,. \tag{4.19}$$

A natural question to ask is: How do we know if a certain trigonometric series converges to a periodic distribution? To answer this question first note that the series of numbers

$$\sum\_{m=1}^{\infty} \frac{1}{m^2}$$


is absolutely convergent. Therefore, if the magnitudes of the coefficients |*c<sub>m</sub>*| are, as *m* → ∞, bounded above by *C*/|*m*|², with *C* a constant, then the series converges to a continuous function *f* and hence to a distribution. But distributions can always be differentiated term by term an arbitrary number of times. Using (4.19) we therefore conclude that, if the magnitudes of the coefficients of the series, as *m* → ∞, are bounded by *C*|*m*|*<sup>k</sup>* for some number *k* ≥ 0 and a constant *C*, then the series converges to a distribution.

We derived the Fourier series starting from the Fourier transform and its property that converts convolution into a product. We therefore expect a similar property for the Fourier series. Consider the convolution of two distributions *S*◦ and *T* ◦ with the same period T. The Fourier coefficients of the resulting series are

$$\begin{split} c\_m(S^{\circ} \ast T^{\circ}) &= \frac{1}{\mathcal{T}} \left< S^{\circ} \ast T^{\circ}, \mathbf{e}^{-jm\omega\_c t} \right> \\ &= \frac{1}{\mathcal{T}} \left< S^{\circ}(t) \otimes T^{\circ}(\lambda), \mathbf{e}^{-jm\omega\_c (t + \lambda)} \right> \\ &= \frac{1}{\mathcal{T}} \left< S^{\circ}(t), \mathbf{e}^{-jm\omega\_c t} \right> \left< T^{\circ}(\lambda), \mathbf{e}^{-jm\omega\_c \lambda} \right> \\ &= \mathcal{T} \, c\_m(S^{\circ}) \, c\_m(T^{\circ}) \,. \end{split} \tag{4.20}$$

Consequently the Fourier series of the convolution of *S*◦ and *T* ◦ is

$$S^{\circ} \ast T^{\circ} = \mathcal{T} \sum\_{m = -\infty}^{\infty} c\_m(S^{\circ}) \, c\_m(T^{\circ}) \mathbf{e}^{jm\omega\_c t} \tag{4.21}$$

and, indeed we see that the Fourier series representation of periodic distributions transforms convolutions into products.
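A discrete analogue of (4.20): for *N*-periodic sequences with Fourier coefficients *c<sub>m</sub>* = X[*m*]/*N* (*X* the DFT), circular convolution over one period multiplies the coefficients by *N*, in direct correspondence with the factor T. A sketch on random data (illustrative only):

```python
import numpy as np

# Discrete analogue of (4.20): circular convolution of N-periodic sequences
# multiplies the coefficients c_m = X[m]/N by N (the role of the period T).
N = 16
rng = np.random.default_rng(2)
a, b = rng.standard_normal(N), rng.standard_normal(N)
# circular convolution: conv[n] = sum_k a[k] * b[(n-k) mod N]
conv = np.array([np.sum(a * np.roll(b[::-1], n + 1)) for n in range(N)])
ca, cb = np.fft.fft(a) / N, np.fft.fft(b) / N
cc = np.fft.fft(conv) / N
assert np.allclose(cc, N * ca * cb)
```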

#### **Example 4.10: Fourier series of** *δ*<sub>T</sub>

The *m*th Fourier coefficient of the Dirac comb δ<sup>T</sup> is

$$c\_m(\delta\_{\mathcal{T}}) = \frac{1}{\mathcal{T}} \left< \delta^\circ, \mathbf{e}^{-jm\omega\_c t} \right> = \frac{1}{\mathcal{T}}$$

with ω*<sup>c</sup>* = 2π/T. Hence, its Fourier series is

$$\delta\_{\mathcal{T}} = \sum\_{m = -\infty}^{\infty} \frac{1}{\mathcal{T}} \mathbf{e}^{jm\omega\_c t}\,.$$

If we now compute the convolution of δ<sub>T</sub> with another T-periodic distribution *T* with Fourier coefficients *c<sub>m</sub>*(*T*), from (4.21) we see that, as expected, δ<sub>T</sub> acts as a unit

$$c\_m(T \* \delta\_{\mathcal{T}}) = \mathcal{T}\,c\_m(\delta\_{\mathcal{T}})\,c\_m(T) = c\_m(T)\,.$$

A periodic distribution can be represented as a convolution product between the Dirac comb δ<sup>T</sup> and a distribution different from the one of (4.15). For example, with ξ<sup>T</sup> any unitary function we have

$$\begin{aligned} T &= T \sum\_{m = -\infty}^{\infty} \xi\_{\mathcal{T}}(t - m\mathcal{T}) \\ &= \sum\_{m = -\infty}^{\infty} T(t - m\mathcal{T}) \, \xi\_{\mathcal{T}}(t - m\mathcal{T}) \\ &=: \sum\_{m = -\infty}^{\infty} S(t - m\mathcal{T}) = S\*\delta\_{\mathcal{T}} \end{aligned} \tag{4.22}$$

which defines a distribution *S* whose support is bounded and larger than a single period of *T* . Using this representation we can express the Fourier coefficients and the Fourier transform of *T* in terms of those of *S* as

$$\hat{T} = \omega\_c \, \hat{S} \, \delta\_{\omega\_c} = \frac{2\pi}{\mathcal{T}} \sum\_{m = -\infty}^{\infty} \hat{S}(m\omega\_c) \, \delta(\omega - m\omega\_c) \tag{4.23}$$

$$c\_m(T) = \frac{1}{\mathcal{T}} \, \hat{S}(m\omega\_c)\,. \tag{4.24}$$

For this reason, if in some calculation we obtain the Fourier transform of a signal in this form, with *S*ˆ the transform of a known non-periodic distribution, then we can immediately write *T* in terms of *S* as in (4.22).

We close this section with a property that is the counterpart of (4.10) for the Fourier coefficients of a real periodic distribution

$$
\overline{c}\_m = c\_{-m} \,. \tag{4.25}
$$

# **4.6 Extension to Several Variables**

The Fourier transform can be extended to functions of several variables by transforming each variable individually. That is, if *f* is an integrable function on R*<sup>n</sup>*, then we can apply the one-dimensional Fourier transform to each variable individually, keeping the other ones constant. After performing this operation with respect to each variable in turn, we obtain the following expression, which defines the *n*-dimensional Fourier transform


$$\begin{aligned} \hat{f}(\omega\_1, \ldots, \omega\_n) &:= \mathcal{F}\left\{ f \right\}(\omega\_1, \ldots, \omega\_n) \\ &:= \int\_{-\infty}^{\infty} \cdots \int\_{-\infty}^{\infty} f(\tau\_1, \ldots, \tau\_n) \, \mathbf{e}^{-j(\omega\_1 \tau\_1 + \cdots + \omega\_n \tau\_n)} \, d\tau\_1 \cdots d\tau\_n \,. \end{aligned}$$

To shorten the notation we will write

$$\hat{f}(\boldsymbol{\omega}) = \mathcal{F}\left\{f\right\}(\boldsymbol{\omega}) = \int\_{\mathbb{R}^n} f(\boldsymbol{\tau}) \, \mathbf{e}^{-j(\boldsymbol{\omega}, \boldsymbol{\tau})} \, d^n \boldsymbol{\tau} \tag{4.26}$$

with τ,ω <sup>∈</sup> <sup>R</sup>*<sup>n</sup>* and

$$(\boldsymbol{\omega}, \boldsymbol{\tau}) := \sum\_{m=1}^{n} \omega\_m \tau\_m \,.$$

The *n*-dimensional inverse Fourier transform can be derived with the same procedure, and we obtain the following definition

$$\mathcal{F}^{-1}\{f\}(\boldsymbol{\tau}) := \frac{1}{(2\pi)^n} \int\_{\mathbb{R}^n} f(\boldsymbol{\omega}) \,\mathbf{e}^{j(\boldsymbol{\omega},\boldsymbol{\tau})} \,d^n \boldsymbol{\omega} \,. \tag{4.27}$$

With these definitions it's easy to see that our definition of Fourier transform for tempered distributions remains valid for *n* > 1 as well. All properties carry over in similar form. For example, looking back at the derivation of the symmetry relation given by (4.8), we see that in the *n*-dimensional case it becomes

$$\mathcal{F}\left\{\hat{T}(\tau)\right\} = (2\pi)^n \, T(-\omega) \,. \tag{4.28}$$

The only difference from the one dimensional case is the fact that the factor of 2π becomes (2π )*<sup>n</sup>*. This happens to all properties involving factors of 2π.

The most important convolution property (4.13) remains unchanged, as can easily be verified by inspecting the derivation for the one-dimensional case.
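The definition by iterated one-dimensional transforms has a direct discrete counterpart: a two-dimensional FFT coincides with applying the one-dimensional FFT along each axis in turn (a sketch on random data, purely illustrative):

```python
import numpy as np

# The n-dimensional transform is the 1-D transform applied to each variable
# in turn; discretely, fft2 equals fft along axis 0 followed by fft along axis 1.
rng = np.random.default_rng(1)
f = rng.standard_normal((8, 8))
step = np.fft.fft(np.fft.fft(f, axis=0), axis=1)
assert np.allclose(step, np.fft.fft2(f))
```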

Before proceeding, it's convenient to extend the multi-index notation that up to now we only used in conjunction with the differential operator. Let *a* be an *n*-tuple in <sup>C</sup>*<sup>n</sup>* and *<sup>k</sup>* a multi-index that we allow to include negative numbers (*k*1,..., *kn*) <sup>∈</sup> <sup>Z</sup>*<sup>n</sup>*. Then we can define

$$\begin{array}{ll} a^k := a\_1^{k\_1} \cdots a\_n^{k\_n} & \text{(exponentiation)} \\ k \, a := (k\_1 a\_1, \ldots, k\_n a\_n) & \text{(direct product)} \\ \displaystyle\sum\_{k=l}^{u} f\_k := \sum\_{k\_1=l\_1}^{u\_1} \cdots \sum\_{k\_n=l\_n}^{u\_n} f\_{k\_1, \ldots, k\_n} & \text{(summation)} \end{array}$$

with *f<sub>k</sub>* some function parameterized by the multi-index *k*, and *l*, *u* lower resp. upper multi-indices. If in a summation we write an integer instead of *l* or *u*, we mean the multi-index equal to that number in every position.

We can introduce a multi-index notation for the factorial as well. However, this only makes sense for tuples of natural numbers *<sup>k</sup>* <sup>∈</sup> <sup>N</sup>*<sup>n</sup>*

$$k! := \prod\_{i=1}^{n} (k\_i)!.$$

#### **Example 4.11: Fourier transform of** *δ*

In Example 3.1 we saw that the δ distribution in D (R*<sup>n</sup>*) is the tensor product of one dimensional <sup>δ</sup>'s. Hence, with τ,λ <sup>∈</sup> <sup>R</sup>*<sup>n</sup>* and the results of Example 4.5 the Fourier transform of the *n*-dimensional shifted δ becomes

$$\mathcal{F}\left\{\delta(\boldsymbol{\tau}-\boldsymbol{\lambda})\right\} = \mathbf{e}^{-j(\boldsymbol{\omega},\boldsymbol{\lambda})}$$

and its partial derivatives

$$\mathcal{F}\left\{D\_i\delta(\boldsymbol{\tau})\right\} = j\omega\_i \qquad i = 1, \ldots, n\ .$$

Using the multi-index notation, the higher order partial derivatives can be conveniently expressed as

$$\mathcal{F}\left\{D^k\delta(\boldsymbol{\tau})\right\} = (j\boldsymbol{\omega})^k\,.$$

Using the *n*-dimensional symmetry relation given by (4.28) we also immediately find

$$\mathcal{F}\left\{\mathbf{e}^{j(\boldsymbol{\omega}\_c,\boldsymbol{\tau})}\right\} = (2\pi)^{n}\,\delta(\boldsymbol{\omega}-\boldsymbol{\omega}\_c)$$

and

$$\mathcal{F}\left\{(-j\boldsymbol{\tau})^{k}\right\} = (2\pi)^{n}\,D^{k}\delta(\boldsymbol{\omega})\,.$$

As in the one-dimensional case, the other properties of the *n*-dimensional Fourier transform are immediate consequences of the convolution property and the convolution and transform of the δ distribution.

A periodic function on R*<sup>n</sup>* is a function that is periodic in each independent variable individually, that is, such that there are positive numbers T*<sup>i</sup>* for *i* = 1,..., *n*, called the period of the *i*th independent variable, so that

$$f(\tau\_1, \ldots, \tau\_i + \mathcal{T}\_i, \ldots, \tau\_n) = f(\tau\_1, \ldots, \tau\_i, \ldots, \tau\_n)\,.$$

This extension of the concept of a periodic function to higher dimensions permits us to widen the definition of periodic distributions (3.23) to test functions of higher dimensions D(R*<sup>n</sup>*) in a straightforward way

$$
\begin{split} & \langle T(\tau\_1, \ldots, \tau\_i + \mathcal{T}\_i, \ldots, \tau\_n), \phi(\tau\_1, \ldots, \tau\_i, \ldots, \tau\_n) \rangle \\ & = \langle T(\tau\_1, \ldots, \tau\_i, \ldots, \tau\_n), \phi(\tau\_1, \ldots, \tau\_i, \ldots, \tau\_n) \rangle \ . \end{split}$$

From this follows without any difficulty an extension of the second Definition 3.5 as well.

The *n*-dimensional Fourier series of a periodic distribution *T* ∈ D′(R*<sup>n</sup>*) is

$$\sum\_{k=-\infty}^{\infty} c\_k(T) \,\mathbf{e}^{j(k\boldsymbol{\omega}\_c, \boldsymbol{\tau})}$$

with *k* an *n*-dimensional multi-index, ω*<sup>c</sup>* the *n*-tuple (2π/T1,..., 2π/T*n*) and *ck* (*T* ) the Fourier coefficients

$$c\_k(T) = \frac{1}{\mathcal{T}\_1 \cdots \mathcal{T}\_n} \left< T^{\circ}, \mathbf{e}^{-j(k\boldsymbol{\omega}\_c, \boldsymbol{\tau})} \right> .$$

**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Chapter 5 Laplace Transform of Distributions**

# **5.1 Definition**

The classic Laplace transform is closely related to the Fourier one and has similar properties. In a way it can be seen as a modification of the latter in such a way that it can handle exponentially growing functions. To achieve this, the transformable functions are required to have support bounded on the left.1 Concretely, the classic one-dimensional Laplace transform is defined on so-called *original functions f* : <sup>R</sup> <sup>→</sup> <sup>C</sup> with the following properties:


The greatest lower bound σ *<sup>f</sup>* satisfying the last property is called the *abscissa of convergence* of *f* .

The Laplace transform of an original function *f* is a function of the complex variable *<sup>s</sup>* := <sup>σ</sup> <sup>+</sup> jω (σ, ω <sup>∈</sup> <sup>R</sup>) defined by

$$F(s) := \mathcal{L}\{f\}(s) := \int\_0^\infty f(t)e^{-st}dt \qquad \mathfrak{R}\{s\} > \sigma\_f\,. \tag{5.1}$$

We adopt the common convention of denoting the Laplace transform of a function *f* with the same letter, but capitalized. That is, for this example *F* = L { *f* }.

We want to extend this definition to distributions with support contained in [0,∞), that is to right-sided distributions *T* ∈ D′<sub>+</sub>. To this end note that the integrand in the above definition is the product of an original function *f* with an indefinitely differentiable function with unbounded support. If we multiply the latter by any indefinitely differentiable function γ (*t*) with support bounded on the left and equal to 1 on a neighborhood of [0,∞), then we obtain the product of two functions

<sup>1</sup> There are left-sided and two-sided Laplace transforms as well, but we are not going to discuss them.

F. Beffa, *Weakly Nonlinear Systems*, Understanding Complex Systems, https://doi.org/10.1007/978-3-031-40681-2\_5

with support bounded on the left without changing the value of the integral. Then, since *f* is an original function, for any σ<sub>0</sub> > σ*<sub>f</sub>* the product *f*(*t*)e<sup>−σ<sub>0</sub>*t*</sup> is a (regular) tempered distribution and γ(*t*)e<sup>−(*s*−σ<sub>0</sub>)*t*</sup>, for ℜ{*s*} > σ<sub>0</sub>, a test function of fast descent. We have thus obtained a way to define the Laplace transform for a restricted class of distributions.

**Definition 5.1** (*Laplace transformable*) A distribution *T* ∈ D′<sub>+</sub> is said to be *Laplace transformable* if there exists a constant <sup>σ</sup><sup>0</sup> <sup>∈</sup> <sup>R</sup> such that

$$T(t) \text{ e }^{-\sigma\_0 t}$$

is a distribution in S′(R). The greatest lower bound σ*<sub>T</sub>* of such constants is called the *abscissa of convergence* of *T* .

**Definition 5.2** (*Laplace transform*) The Laplace transform of a Laplace transformable distribution *T* is defined by

$$\mathcal{L}\{T\} := \langle T(t) \,\mathrm{e}^{-\sigma\_0 t}, \gamma(t)\mathrm{e}^{-(s-\sigma\_0)t} \rangle \quad \text{for} \quad \Re\{s\} > \sigma\_0 > \sigma\_T$$

with γ any indefinitely differentiable function with support bounded on the left and equal to 1 on a neighborhood of the support of *T* . It is commonly abbreviated by

$$\langle T(t), \mathbf{e}^{-st} \rangle \quad \text{for} \quad \Re\{s\} > \sigma\_T\,.$$

The right half-plane ℜ{*s*} > σ*<sub>T</sub>* is called the *region of convergence* (ROC).

If *T* is Laplace transformable, its transform is a well-defined number for any value of the complex parameter *s* with ℜ{*s*} > σ*<sub>T</sub>* . In other words, it is a function of *s*. Since *s* only appears as a parameter of the test function of fast descent γ(*t*)e<sup>−(*s*−σ<sub>0</sub>)*t*</sup>, with the continuity and linearity of distributions it is easy to see that

$$D\_{s}\langle T(t), \mathbf{e}^{-st}\rangle = \langle T(t), -t\mathbf{e}^{-st}\rangle = \langle -tT(t), \mathbf{e}^{-st}\rangle\,.$$

In addition, since e<sup>−*st*</sup> is an entire analytic function, *in the right half-plane* ℜ{*s*} > σ*<sub>T</sub>*, L {*T* } *is a holomorphic function.* This is an important result as it allows us to use many results from complex analysis.

Higher order derivatives are obtained by iterating the above result so that we have

$$D\_x^k \mathcal{L}\left\{T\right\}(\mathbf{s}) = \mathcal{L}\left\{(-t)^k T\right\}(\mathbf{s})\,. \tag{5.2}$$

Note that the abscissa of convergence of the derivatives is the same as the one of *T* .

#### **Example 5.1: Laplace Transform of** *δ*

The Laplace transform of δ, of δ(*t* − *a*) and of *D<sup>k</sup>* δ are

$$\begin{aligned} \langle \delta, \mathbf{e}^{-st} \rangle &= 1 \\ \langle \delta(t - a), \mathbf{e}^{-st} \rangle &= \mathbf{e}^{-sa} \\ \langle D^k \delta, \mathbf{e}^{-st} \rangle &= \langle \delta, (-1)^k D^k \mathbf{e}^{-st} \rangle = s^k \ . \end{aligned}$$

In all cases the region of convergence is the entire complex plane C.

#### **Example 5.2: Laplace Transform of** 1<sub>+</sub>(*t*) *t*<sup>*k*</sup>/*k*! e<sup>*at*</sup>

Let *a* be a complex number. The Laplace transform of the regular distribution 1+(*t*) e*at* is

$$\langle \mathbf{e}^{at}, \mathbf{e}^{-st} \rangle = \int\_0^\infty \mathbf{e}^{-(s-a)t} \, dt = \frac{1}{s-a}$$

with abscissa of convergence ℜ{*a*}.

From this and (5.2), the Laplace transform of

$$\mathbf{1}\_+(t) \, \frac{t^k}{k!} \, \mathbf{e}^{at}$$

is readily found to be

$$\mathcal{L}\left\{\mathbf{1}_{+}(t)\,\frac{t^{k}}{k!}\,\mathbf{e}^{at}\right\}=\frac{(-1)^{k}}{k!}\,D_s^{k}\,\frac{1}{s-a}=\frac{1}{(s-a)^{k+1}}\,.$$
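Both routes to this transform can be verified symbolically. A SymPy sketch (with *a* taken negative real so the defining integral converges without side conditions) compares the direct integral with the derivative formula (5.2):

```python
import sympy as sp

t, s = sp.symbols('t s', positive=True)
a = sp.Symbol('a', negative=True)   # keeps the integral convergent for positive s
k = 2

# Transform from the defining integral ...
F = sp.integrate(t**k/sp.factorial(k)*sp.exp((a - s)*t), (t, 0, sp.oo))

# ... and from (5.2): (-1)^k/k! * D_s^k (1/(s - a))
G = (-1)**k/sp.factorial(k)*sp.diff(1/(s - a), s, k)

print(sp.simplify(F - 1/(s - a)**(k + 1)))  # 0
print(sp.simplify(G - 1/(s - a)**(k + 1)))  # 0
```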

# **5.2 Properties**

The Laplace transform is a linear operation: given two Laplace transformable distributions *S* and *T* with abscissae of convergence σ<sub>*S*</sub> resp. σ<sub>*T*</sub>, the transform of their weighted sum is

$$
\mathcal{L}\left\{c_1 S + c_2 T\right\} = c_1 \mathcal{L}\left\{S\right\} + c_2 \mathcal{L}\left\{T\right\} \quad \text{for} \quad \Re\{s\} > \max(\sigma_S, \sigma_T)\,.
$$

Let *a* be a complex number. Then, from the definition, the Laplace transform of e<sup>−*at*</sup>*T*(*t*) is

$$\mathcal{L}\left\{\mathbf{e}^{-at}T\right\}(\mathbf{s}) = \langle \mathbf{e}^{-at}T(t), \mathbf{e}^{-st}\rangle = \langle T(t), \mathbf{e}^{-(s+a)t}\rangle = \mathcal{L}\left\{T\right\}(\mathbf{s}+a)$$

with region of convergence ℜ{*s*} > σ<sub>*T*</sub> − ℜ{*a*}.

We saw in Example 3.6 that the convolution of distributions in D′<sub>+</sub> is always well-defined. Therefore, the convolution of two Laplace transformable distributions *S* and *T* is well-defined. Then, for ℜ{*s*} > σ<sub>0</sub> = max(σ<sub>*S*</sub>, σ<sub>*T*</sub>), the transform of their convolution product is by definition

$$\begin{aligned} \mathcal{L}\left\{S*T\right\}(s) &= \langle (S*T)(t)\,\mathbf{e}^{-\sigma_0 t}, \gamma(t)\,\mathbf{e}^{-(s-\sigma_0)t}\rangle \\ &= \langle S(t)\,\mathbf{e}^{-\sigma_0 t}\otimes T(\lambda)\,\mathbf{e}^{-\sigma_0\lambda}, \gamma(t+\lambda)\,\mathbf{e}^{-(s-\sigma_0)(t+\lambda)}\rangle\,.\end{aligned}$$

By noting that, over a neighborhood of the support of *S* ⊗ *T*, γ(*t* + λ) = 1 = γ(*t*)γ(λ), we can proceed further and obtain

$$\begin{split} \mathcal{L}\{S*T\}(s) &= \langle S(t)\,\mathbf{e}^{-\sigma_0 t}\otimes T(\lambda)\,\mathbf{e}^{-\sigma_0\lambda}, \gamma(t)\,\mathbf{e}^{-(s-\sigma_0)t}\,\gamma(\lambda)\,\mathbf{e}^{-(s-\sigma_0)\lambda}\rangle \\ &= \langle S(t)\,\mathbf{e}^{-\sigma_0 t}, \langle T(\lambda)\,\mathbf{e}^{-\sigma_0\lambda}, \gamma(\lambda)\,\mathbf{e}^{-(s-\sigma_0)\lambda}\rangle\,\gamma(t)\,\mathbf{e}^{-(s-\sigma_0)t}\rangle \\ &= \langle S(t)\,\mathbf{e}^{-\sigma_0 t}, \gamma(t)\,\mathbf{e}^{-(s-\sigma_0)t}\rangle\,\langle T(\lambda)\,\mathbf{e}^{-\sigma_0\lambda}, \gamma(\lambda)\,\mathbf{e}^{-(s-\sigma_0)\lambda}\rangle \\ &= \mathcal{L}\{S\}(s)\,\mathcal{L}\{T\}(s) \end{split}$$

which is well-defined since, in the specified ROC, the Laplace transforms of *S* and *T* are holomorphic functions. We thus see that, like the Fourier transform, the Laplace transform converts convolutions into products

$$
\mathcal{L}\left\{S*T\right\} = \mathcal{L}\left\{S\right\}\mathcal{L}\left\{T\right\} \quad \text{for} \quad \Re\{s\} > \max(\sigma_S, \sigma_T)\,. \tag{5.3}
$$

A key advantage over the Fourier transform is that here the multiplication is between functions that are holomorphic in the specified open right-half plane.
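As a concrete check of (5.3), a SymPy sketch with two simple causal originals compares the transform of their convolution with the product of their transforms:

```python
import sympy as sp

t, u, s = sp.symbols('t u s', positive=True)

# Two simple causal originals (supported on t >= 0)
f = sp.exp(-t)
g = sp.exp(-2*t)

# Convolution of causal functions: (f * g)(t) = int_0^t f(u) g(t - u) du
conv = sp.integrate(f.subs(t, u)*g.subs(t, t - u), (u, 0, t))

L = lambda h: sp.laplace_transform(h, t, s, noconds=True)
print(sp.simplify(L(conv) - L(f)*L(g)))  # 0
```

Here `conv` evaluates to e<sup>−*t*</sup> − e<sup>−2*t*</sup>, whose transform 1/((s+1)(s+2)) is exactly the product of the individual transforms.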

In a similar way as we did for the Fourier transform, we can use this property to derive several additional properties in a straightforward way. Specifically, using the properties of the convolution product and the Laplace transform of the Dirac δ and related distributions (Example 5.1), we immediately obtain the properties in Table 5.1 that we have not yet discussed.

#### **Example 5.3: Convolution of exp's**

Let *k* and *l* be natural numbers and *a* a complex constant. We want to calculate the following convolution product

$$\mathbf{1}_{+}(t)\,\frac{t^{k}}{k!}\,\mathbf{e}^{at} * \mathbf{1}_{+}(t)\,\frac{t^{l}}{l!}\,\mathbf{e}^{at}\,.$$

**Table 5.1** Properties of the Laplace transformation

From the convolution property of the Laplace transform and the results of Example 5.2, the transform of the above convolution product is

$$\frac{1}{(s-a)^{k+1}}\,\frac{1}{(s-a)^{l+1}} = \frac{1}{(s-a)^{k+l+2}}\,.$$

With it, we find the desired result as

$$\mathbf{1}_{+}(t)\,\frac{t^{k+l+1}}{(k+l+1)!}\,\mathbf{e}^{at}\,.$$
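The result can be verified by evaluating the convolution integral directly: the exponentials combine as e<sup>*au*</sup>e<sup>*a*(*t*−*u*)</sup> = e<sup>*at*</sup>, leaving a Beta-type polynomial integral. A SymPy sketch for specific *k*, *l*:

```python
import sympy as sp

t, u = sp.symbols('t u', positive=True)
a = sp.Symbol('a')
k, l = 2, 3

# The exponentials in the convolution integral combine to e^{at}, so
# (t^k/k! e^{at}) * (t^l/l! e^{at}) = e^{at}/(k! l!) * int_0^t u^k (t-u)^l du
conv = sp.exp(a*t)/(sp.factorial(k)*sp.factorial(l)) \
    * sp.integrate(u**k*(t - u)**l, (u, 0, t))

expected = t**(k + l + 1)/sp.factorial(k + l + 1)*sp.exp(a*t)
print(sp.simplify(conv - expected))  # 0
```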

# **5.3 Inverse Laplace Transform**

The Laplace transform isn't only similar to the Fourier transform, it can also be formally related to it. Consider first an original function *f*. By writing its Laplace transform as

$$\int_0^\infty f(t)\, \mathbf{e}^{-\sigma t}\, \mathbf{e}^{-j\omega t} \, dt$$

we see that, for every value of σ > σ<sub>*f*</sub>, it can be interpreted as the Fourier transform of the function *f*(*t*)e<sup>−σ*t*</sup>.

This relation between the two transforms can be extended to distributions. Consider a Laplace transformable distribution *T*. We have established that, for ℜ{*s*} > σ<sub>*T*</sub>, its Laplace transform is a holomorphic function of *s*. In addition, by definition, for every value σ<sub>0</sub> > σ<sub>*T*</sub>, *T*(*t*) e<sup>−σ<sub>0</sub>*t*</sup> is a distribution of slow growth and, for σ > σ<sub>0</sub>, γ(*t*)e<sup>−(σ−σ<sub>0</sub>)*t*</sup>e<sup>−jω*t*</sup> is a test function of fast descent for every value of ω. We conclude that the Laplace transform of *T*, considered as a function of ω for fixed σ, must be a regular distribution of slow growth. Hence, the following integral is well-defined

$$\int_{\mathbb{R}} \langle T(t)\,\mathbf{e}^{-\sigma_0 t}, \gamma(t)\,\mathbf{e}^{-(\sigma-\sigma_0)t}\,\mathbf{e}^{-j\omega t}\rangle\, \phi(\omega) \, d\omega$$

for any φ(ω) ∈ S. This integral can be recognized as the tensor product **1**(ω) ⊗ *T*(*t*) e<sup>−σ<sub>0</sub>*t*</sup> and, using Fubini's theorem, it can be rearranged to become


$$\langle T(t)\,\mathbf{e}^{-\sigma_0 t}, \gamma(t)\,\mathbf{e}^{-(\sigma-\sigma_0)t} \int_{\mathbb{R}} \mathbf{e}^{-j\omega t}\,\phi(\omega)\,d\omega\rangle = \langle T(t)\,\mathbf{e}^{-\sigma t}, \hat{\phi}(t)\rangle = \langle \mathcal{F}\{T(t)\,\mathbf{e}^{-\sigma t}\}, \phi(\omega)\rangle\,.$$

We thus obtain the claimed relation between Laplace and Fourier transforms

$$\mathcal{L}\left\{T\right\} = \mathcal{F}\{\mathbf{e}^{-\sigma t}T\} \quad \text{for} \quad \sigma > \sigma\_T \tag{5.4}$$

which gives us a formal way to invert the Laplace transform.

A first consequence of this relation is that, given the Laplace transform of a distribution *T* *with abscissa of convergence* σ<sub>*T*</sub> < 0, we can immediately find its Fourier transform by setting *s* = jω

$$
\hat{T}(\omega) = \mathcal{L}\left\{T\right\}(j\omega)\,.
$$

Another important consequence of (5.4) is that, if L{*T*} = 0 on a vertical line with ℜ{*s*} > σ<sub>*T*</sub>, then *T* = 0. In fact, with the above result we have that

$$0 = \langle \mathcal{L}\left\{ T \right\}, \phi \rangle = \langle \mathcal{F}\{ \mathbf{e}^{-\sigma t} T \}, \phi \rangle = \langle \mathbf{e}^{-\sigma t} T, \hat{\phi} \rangle$$

from which we conclude that e<sup>−σ*t*</sup>*T*, and hence *T*, must vanish. In addition, with *T* = *S* − *U* this implies that, if L{*S*} = L{*U*} on a vertical line of the region of convergence, then *S* = *U*. In other words, if a function, holomorphic in an open right-half plane, is the Laplace transform of a distribution in D′<sub>+</sub>, then it is the transform of a *unique* distribution.

The next logical question to ask is: which holomorphic functions are transforms of a distribution? To answer this question, consider first a holomorphic function *F* bounded by

$$|F(\mathbf{s})| \le \frac{C}{|\mathbf{s}|^2} \quad \text{for} \quad \Re\{\mathbf{s}\} \ge \sigma\_0 > 0$$

with *C* a constant. Then

$$\int_{-\infty}^{\infty} \left| F(\sigma_0 + j\omega)\, \mathbf{e}^{j\omega t} \right| \, d\omega \le \int_{-\infty}^{\infty} \frac{C}{\sigma_0^2 + \omega^2} \, d\omega < \infty$$

and the inverse Fourier integral

$$\frac{1}{2\pi} \int\_{-\infty}^{\infty} F(\sigma\_0 + j\omega) \mathbf{e}^{j\omega t} \, d\omega$$

exists and defines a continuous function that we may write as e<sup>−σ<sub>0</sub>*t*</sup> *f*(*t*). The function *f* thus defined is therefore


**Fig. 5.1** Integration path for *t* < 0

$$\begin{split} f(t) &= \frac{\mathbf{e}^{\sigma_0 t}}{2\pi} \int_{-\infty}^{\infty} F(\sigma_0 + j\omega)\, \mathbf{e}^{j\omega t} \, d\omega \\ &= \frac{1}{2\pi} \int_{-\infty}^{\infty} F(\sigma_0 + j\omega)\, \mathbf{e}^{(\sigma_0 + j\omega)t} \, d\omega \\ &= \frac{1}{2\pi j} \int_{\sigma_0 - j\infty}^{\sigma_0 + j\infty} F(s)\, \mathbf{e}^{st} \, ds \end{split} \tag{5.5}$$

and corresponds to the integral of a holomorphic function along the vertical line defined by ℜ{*s*} = σ<sub>0</sub>. If we write the variable *s* in its polar representation *s* = *R*e<sup>jϕ</sup>, it's easy to verify that in the right-half plane and for *t* < 0

$$|F(s)\,\mathbf{e}^{st}| \le \frac{C}{R^2}\,.$$

Therefore, if we close the integration path of the above integral by first making the line finite, then closing it along the half-circle shown in Fig. 5.1 and then taking the limit *R* → ∞, the value of the integral remains unchanged. In fact

$$\lim_{R\to\infty} \left| \frac{1}{2\pi j} \int_{\Gamma_2} F(s)\, \mathbf{e}^{st} \, ds \right| \le \lim_{R\to\infty} \frac{C}{2\pi R^2}\, \pi R = 0\,.$$

Having closed the integration path we can now use Cauchy's theorem and conclude that for *t* < 0, *f* (*t*) = 0.

Cauchy's theorem can also be used to show that the value of the integral is the same along any vertical line with ℜ{*s*} > 0. To show this, we integrate along two vertical segments and close the path with horizontal ones. Since the contribution of the horizontal paths vanishes as we extend the length of the vertical ones toward infinity, we conclude that the value of the integral along the two vertical lines must be the same.

We have thus established that the function *f* doesn't depend on the value of σ<sub>0</sub>, is continuous and vanishes for *t* < 0. These characteristics make *f* an original function and hence a Laplace transformable distribution in D′<sub>+</sub>. Furthermore, from the definition of *f* and (5.4), we see that *F* is its Laplace transform.
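The inversion integral (5.5) can also be evaluated numerically. A sketch (assuming NumPy) for *F*(*s*) = 1/(*s*+1)², whose original is 1<sub>+</sub>(*t*) *t* e<sup>−*t*</sup> (Example 5.2 with *a* = −1, *k* = 1):

```python
import numpy as np

# Numerical sketch of the Bromwich integral (5.5) for F(s) = 1/(s+1)^2.
sigma0 = 1.0                 # any abscissa to the right of the singularity s = -1
t = 2.0

omega = np.linspace(-2000.0, 2000.0, 800001)
s = sigma0 + 1j*omega
integrand = np.exp(s*t)/(s + 1.0)**2

# (1/(2*pi*j)) * int F(s) e^{st} ds with ds = j d(omega): plain Riemann sum
domega = omega[1] - omega[0]
f_num = (integrand.sum()*domega).real/(2*np.pi)

print(f_num, t*np.exp(-t))   # both approximately 0.2707
```

Repeating the computation with a different `sigma0` gives the same value, as the path-independence argument above predicts.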

Now consider the more general function *G*(*s*) = *s<sup>k</sup> F*(*s* − σ<sub>*G*</sub>) with σ<sub>*G*</sub> ∈ R, *k* ∈ N and *F* as before. From the properties of the Laplace transform, we know that it is the transform of the distribution *g* = *D<sup>k</sup>*(e<sup>σ<sub>*G*</sub>*t*</sup> *f*), which is also clearly in D′<sub>+</sub>. We therefore conclude that every function *G* that, for some σ<sub>*G*</sub> ∈ R, is holomorphic in the open right-half plane ℜ{*s*} > σ<sub>*G*</sub> and is bounded above by a polynomial *P*

$$|G(s)| \le P(|s|) \qquad \Re\{s\} > \sigma_G \tag{5.6}$$

is the Laplace transform of a distribution in D′<sub>+</sub>.

Without going into the details, we also mention that the converse is true: every Laplace transformable distribution *T* ∈ D′<sub>+</sub> is a derivative of some regular distribution associated with a continuous original function [16].

With the transforms of Examples 5.1 and 5.2 we can find the inverse Laplace transform of any rational function of *s* by partial fraction expansion. Note also that (5.5) corresponds to the classic inverse Laplace transform for functions.
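The partial fraction route can be sketched with SymPy (an added illustration, using a hypothetical rational transform):

```python
import sympy as sp

t, s = sp.symbols('t s', positive=True)

# A rational transform, holomorphic in the right-half plane Re{s} > -1
F = (s + 3)/((s + 1)**2*(s + 2))

# Partial fractions reduce F to terms of the form 1/(s - a)^{k+1} ...
parts = sp.apart(F, s)
print(parts)

# ... each of which is the transform of 1_+(t) t^k/k! e^{at} (Example 5.2)
f = 2*t*sp.exp(-t) - sp.exp(-t) + sp.exp(-2*t)
print(sp.simplify(sp.laplace_transform(f, t, s, noconds=True) - F))  # 0
```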

#### **Example 5.4: Laplace versus Fourier Transform of 1<sub>+</sub>**

In this example we calculate the Laplace transform of the Heaviside step function 1<sub>+</sub>. While it's easy to obtain it directly from the definition, we calculate it from our previous results *D*1<sub>+</sub> = δ (Example 2.8) and L{*D*δ} = *s* (Example 5.1). This allows a comparison with the methods used in Examples 4.8 and 4.9 to obtain its Fourier transform.

By using

$$D\mathbf{1}\_{+} = D\boldsymbol{\delta} \* \mathbf{1}\_{+}$$

and the convolution property of the Laplace transform, we obtain the following equation for the Laplace transform of 1<sup>+</sup>

$$s\,\mathcal{L}\{\mathbf{1}_{+}\} = 1\,.$$

Then, since all Laplace transforms are holomorphic functions in an open right-half plane and only the zero distribution has zero as its transform, we can conclude that

$$
\mathcal{L}\left\{\mathbf{1}_{+}\right\} = \frac{1}{s} \qquad \Re\{s\} > 0\,.
$$


As an extra step, we want to obtain the Fourier transform of 1<sub>+</sub> from its Laplace transform. The abscissa of convergence of L{1<sub>+</sub>} is not smaller than zero. Therefore, we can't obtain it by simply setting *s* = jω. However, in cases like this, where the abscissa of convergence is zero, given the continuity of distributions, it's still possible to obtain the Fourier transform as a limit, so that

$$\langle \mathcal{F}\{\mathbf{1}_{+}\}, \phi\rangle = \lim_{\Re\{s\}\downarrow 0} \frac{1}{j}\int_{\Gamma} \frac{\phi(\Im\{s\})}{s}\, ds$$

with Γ a vertical line in the ROC. The limit is only problematic around the origin, where we can integrate along a small half-circle of radius ε

$$\begin{split} &\lim_{\Re\{s\}\downarrow 0} \frac{1}{j} \int_{\Gamma} \frac{\phi(\Im\{s\})}{s} \, ds \\ &= \lim_{\epsilon \downarrow 0} \frac{1}{j} \left\{ \int_{|\omega| > \epsilon} \frac{\phi(\omega)}{j\omega}\, j \, d\omega + \int_{-\pi/2}^{\pi/2} \frac{\phi(\epsilon \sin(\varphi))}{\epsilon\, \mathbf{e}^{j\varphi}}\, j\epsilon\, \mathbf{e}^{j\varphi} \, d\varphi \right\} \\ &= \lim_{\epsilon \downarrow 0} \int_{|\omega| > \epsilon} \frac{\phi(\omega)}{j\omega} \, d\omega + \int_{-\pi/2}^{\pi/2} \phi(0) \, d\varphi \\ &= \left\langle \operatorname{pv} \frac{1}{j\omega} + \pi\delta, \phi \right\rangle. \end{split}$$

#### **Example 5.5: Exploiting Continuity**

To simplify the notation, we assume that all appearing functions (more correctly, regular distributions) vanish for *t* < 0. For example, we write *t<sup>k</sup>* for 1<sub>+</sub>(*t*)*t<sup>k</sup>*.

From the results of Example 5.2, by setting *a* = 0, we can note a dualism between positive and negative powers

$$
\mathcal{L}\left\{\frac{t^k}{k!}\right\} = \frac{1}{s^{k+1}} \qquad \Re\{s\} > 0\,.
$$

If we sum the first *N* powers of *t* we obtain

$$\mathcal{L}\left\{\sum_{k=0}^{N-1} \frac{t^k}{k!} \right\} = \sum_{k=0}^{N-1} \frac{1}{s^{k+1}} = \frac{1}{s} \sum_{k=0}^{N-1} \frac{1}{s^k}\,.$$

Using the continuity of distributions we can let *N* tend to infinity. The original distribution converges to the exponential function, while the transform becomes a geometric series (plus a factor)


$$\mathcal{L}\left\{\mathbf{e}^{t}\right\} = \frac{1}{s} \sum_{k=0}^{\infty} \frac{1}{s^k}$$

that converges for |*s*| > 1. This means that there is a right-half plane ℜ{*s*} > 1 where the series converges and can be summed to obtain the expected result

$$
\mathcal{L}\left\{\mathbf{e}^{t}\right\} = \frac{1}{s}\, \frac{1}{1 - 1/s} = \frac{1}{s - 1} \qquad \Re\{s\} > 1\,.
$$

Note that the last expression can also be expressed as a geometric series of positive powers of *s*

$$\frac{1}{s-1} = -\sum_{k=0}^{\infty} s^k\,.$$

However, this series only converges if |*s*| < 1. Consequently, there is no right-half plane where the series is holomorphic and hence it isn't the Laplace transform of a distribution.
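The finite-sum identity behind this limit argument is easy to confirm with SymPy, which also shows that the partial geometric sum matches the transform of the truncated exponential exactly:

```python
import sympy as sp

t, s = sp.symbols('t s', positive=True)
N = 5

# Laplace transform of the first N powers of t ...
partial = sum(t**k/sp.factorial(k) for k in range(N))
F = sp.laplace_transform(partial, t, s, noconds=True)

# ... equals the partial geometric sum (1/s) * sum_{k<N} s^{-k}
target = sum(s**-(k + 1) for k in range(N))
print(sp.simplify(F - target))                         # 0
print(sp.simplify(target - (1 - s**-N)/(s - 1)))       # 0, closed form for s != 1
```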

# **5.4 Extension to Several Variables**

The Laplace transform can be extended to functions of several variables in the same way as we did for the Fourier transform. Let τ be an *n*-tuple in R*<sup>n</sup>*. A multi-variable original function *f* : R*<sup>n</sup>* → C is a function that is an original function with respect to each variable independently.


The classic Laplace transform is then defined as

$$\mathcal{L}\left\{f\right\}(s) = \int_{\mathbb{R}_+^n} f(\tau)\, \mathbf{e}^{-(s,\tau)} \, d^n\tau \tag{5.7}$$

with *s* ∈ C*<sup>n</sup>* and R*<sup>n</sup>*<sub>+</sub> the *n*-dimensional Cartesian product of the half-line [0, ∞).

With this definition we see that the definition of the Laplace transform for distributions extends to higher dimensions essentially without modification. By interpreting *k* as a multi-index and all variables as *n*-dimensional ones, all properties of Table 5.1 remain valid. The only exception is the classic inverse Laplace transform integral (5.5). As we have seen, this integral is based on the inverse Fourier transform and the factor 2πj has therefore to be replaced by (2πj)*<sup>n</sup>*.
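For a separable original function the iterated integrals in (5.7) factor into one-dimensional transforms. A small SymPy sketch in two variables:

```python
import sympy as sp

t1, t2, s1, s2 = sp.symbols('t1 t2 s1 s2', positive=True)

# Original function of two variables, supported on the quadrant t1, t2 >= 0
f = sp.exp(-t1 - 2*t2)

# (5.7) as iterated one-dimensional transforms
F = sp.integrate(
    sp.integrate(f*sp.exp(-s1*t1 - s2*t2), (t1, 0, sp.oo)),
    (t2, 0, sp.oo))

print(sp.simplify(F))  # 1/((s1 + 1)*(s2 + 2))
```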

**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Chapter 6 Summable Distributions**

In this chapter we study a class of distributions, summable distributions, that can be extended to act on smooth bounded functions. These distributions have properties that are well suited to describe some classes of systems. However, the material is more technical than the rest of the book and can be skipped without loss of continuity.

# **6.1 Definition and Canonical Extension**

One can define summable distributions in several equivalent ways [16]. The following one is the most suitable for our purposes.

**Definition 6.1** (*Summable distributions*) A summable distribution *T* is a distribution that can be represented as a finite sum of derivatives (in the sense of distributions) of functions *f<sub>k</sub>* ∈ *L*<sup>1</sup>

$$T = \sum\_{|k| \le m} D^k f\_k$$

with *<sup>k</sup>* an *<sup>n</sup>*-tuple in <sup>N</sup>*<sup>n</sup>* and *<sup>m</sup>* <sup>∈</sup> <sup>N</sup>.

We denote the vector space of summable distributions by D′<sub>*L*<sup>1</sup></sub>.
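As a simple one-dimensional illustration (an added example, using the jump formula for the distributional derivative of a piecewise smooth function), the Dirac distribution is summable:

```latex
% f(t) = 1_+(t) e^{-t} is in L^1 and has distributional derivative
% D f = \delta - f  (jump of height 1 at the origin, classical derivative elsewhere),
% so delta is a finite sum of derivatives of L^1 functions:
\delta = D f + f , \qquad f(t) = \mathbf{1}_+(t)\, \mathbf{e}^{-t} \in L^1 .
```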

An important property of summable distributions is the fact that they can be extended to continuous linear functionals on B, the set of indefinitely differentiable functions that, together with all their derivatives, are bounded

$$\mathcal{B} := \left\{ \phi \in C^{\infty} \mid D^k \phi \text{ is bounded}, \ k \in \mathbb{N}^n \right\}\,.$$

As usual, to talk about continuity, we need to define a convergence criterion (topology).

**Definition 6.2** (*Convergence in* B) A sequence of functions (η*j*) in B converges to zero if the sequence as well as all sequences of the derivatives (*D<sup>k</sup>*η*j*) converge uniformly to zero as *j* tends to infinity. That is, given the norms

$$p_m(\eta) := \sum_{|k| \le m} \sup_{\tau \in \mathbb{R}^n} |D^k \eta(\tau)|,$$

the sequence (η*j*) converges to zero if, as *j* tends to infinity, the sequence of numbers (*pm*(η*j*)) converges to zero for all *<sup>m</sup>* <sup>∈</sup> <sup>N</sup>.

The application of a summable distribution *T* to a function η ∈ B is well defined since

$$\left| \left\langle D^{k}f_{k}, \eta \right\rangle \right| \leq \sup_{\tau \in \mathbb{R}^{n}} \left| D^{k}\eta(\tau) \right| \int_{\mathbb{R}^{n}} \left| f_{k}(\tau) \right| d^{n}\tau < \infty\,.$$

This shows that it's possible to extend a summable distribution to B. However, the extension is in general not unique. The reason is that the set of test functions D is not dense in B. That is to say, it's not possible to approximate to arbitrary accuracy every function in B with functions from D.

#### **Example 6.1: Constant Function**

Consider the constant function **1** : *t* → 1, the function α ∈ D defined by (2.11), and functions α*<sub>j</sub>* ∈ D defined by α*<sub>j</sub>*(*t*) = α(*t*/*j*), *j* ∈ N. The product α*<sub>j</sub>***1** is clearly a test function satisfying α*<sub>j</sub>*(2*j*) = 0 for all values of *j*. From this we see that

$$p\_0(\alpha\_j \mathbf{1} - \mathbf{1}) = \sup\_{t \in \mathbb{R}} |\alpha\_j(t)\mathbf{1}(t) - \mathbf{1}(t)| = 1$$

no matter how large *j* is.
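This failure of uniform approximation is easy to see numerically. The following sketch (assuming NumPy, with a standard smooth cutoff standing in for the α of (2.11)) evaluates *p*<sub>0</sub>(α*<sub>j</sub>***1** − **1**) on a grid:

```python
import numpy as np

def h(x):
    # h(x) = exp(-1/x) for x > 0, 0 otherwise: the standard smooth-step ingredient
    return np.where(x > 0, np.exp(-1.0/np.maximum(x, 1e-300)), 0.0)

def alpha(t):
    # Smooth cutoff: 1 for |t| <= 1, 0 for |t| >= 2 (a stand-in for (2.11))
    y = np.clip(2.0 - np.abs(t), 0.0, 1.0)
    return h(y)/(h(y) + h(1.0 - y))

# sup |alpha_j(t)*1 - 1| stays equal to 1 no matter how large j is,
# because alpha_j vanishes for |t| >= 2j
t = np.linspace(-100.0, 100.0, 200001)
for j in [1, 5, 20]:
    print(j, np.max(np.abs(alpha(t/j) - 1.0)))  # always 1.0
```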

#### **Example 6.2: Average Functional**

Consider the functions **1**, α*<sup>j</sup>* from Example 6.1 and the following functional *L* on B

$$L(\eta) := \lim\_{C \to \infty} \frac{1}{C} \int\_{-C/2}^{C/2} \eta(\tau) \, d\tau \, .$$

It is easily seen that *L* is linear and continuous. Its value on the constant function **1** is 1 while its value on the test functions α*j***1** is zero for all values of *j*

$$|L(\alpha_j \mathbf{1})| \le \lim_{C \to \infty} \frac{1}{C} \int_{-2j}^{2j} |\alpha_j(\tau)| \, d\tau \le \lim_{C \to \infty} \frac{4j}{C} = 0\,.$$

The functional *L* is therefore a valid extension to B of the zero distribution as is the zero functional on B.

We can define a unique, canonical extension of a summable distribution *T* by requiring an additional condition on the extension [19]. A suitable condition can be obtained from the properties of Lebesgue integrals. Consider again the test functions α*<sup>j</sup>* ∈ D from Example 6.1 and a function η ∈ B. If we apply *T* to α*j*η we obtain

$$\begin{aligned} \left< T, \alpha\_j \eta \right> &= \sum\_{|k| \le m} (-1)^{|k|} \int\_{\mathbb{R}^n} f\_k(\tau) \, D^k(\alpha\_j(\tau) \eta(\tau)) \, d^n \tau \\ &= \sum\_{|k| \le m} (-1)^{|k|} \left( \int\_{|\tau| \le j} f\_k(\tau) \, D^k \eta(\tau) \, d^n \tau + \int\_{|\tau| > j} f\_k(\tau) \, D^k(\alpha\_j(\tau) \eta(\tau)) \, d^n \tau \right) . \end{aligned}$$

The integral of an *L*<sup>1</sup> function can be approximated up to an arbitrary ε > 0 by an integral over a suitably chosen compact subset *K* of R*<sup>n</sup>*. Therefore, we can find a large enough *N* such that for *j* > *N*

$$\langle T, \alpha_j \eta \rangle = \epsilon + \sum_{|k| \le m} (-1)^{|k|} \int_{|\tau| \le j} f_k(\tau)\, D^k \eta(\tau) \, d^n\tau \,.$$

Thus, in the limit as *j* tends to infinity we obtain a well-defined continuous linear functional on B.

The important observation from this derivation is the fact that, to find an extension to B of a summable distribution, it is not necessary to require uniform convergence on the whole of R*<sup>n</sup>*. An extension can be obtained by requiring uniform convergence on every compact subset *<sup>K</sup>* <sup>⊂</sup> <sup>R</sup>*<sup>n</sup>*. More precisely, by requiring the convergence criterion that we defined for the space E. From this observation we define the following property.

**Definition 6.3** (*Bounded convergence property*) A continuous linear functional on B has the *bounded convergence property* if, given any sequence (η*j*) of functions η*<sup>j</sup>* ∈ B with *pm*(η*j*) < ∞ for all m, and converging to zero in the space E as *j* → ∞, then

$$\langle T, \eta\_j \rangle \to 0, \qquad j \to \infty.$$

The sequence (α*j*η) does converge to η in E. Hence, by continuity, there is a unique extension of *T* to B with the bounded convergence property

$$\lim\_{j \to \infty} \left< T, \alpha\_j \eta \right> = \left< T, \eta \right> \,. \tag{6.1}$$

In particular this shows that this extension does not depend on the particular representation of *T* in terms of derivatives of integrable functions.

The converse is also true. The restriction to D of any continuous linear functional on B with the bounded convergence property defines a unique summable distribution. Thus, there is a one to one correspondence between summable distributions and continuous linear functionals on B with the bounded convergence property.

**Definition 6.4** (*Canonical extension*) The *canonical* extension to B of a summable distribution is the unique extension to a continuous linear functional on B with the bounded convergence property.

In the following, whenever we use the extension of a summable distribution it will always be assumed to be the canonical one.

While our previous definition of differentiation carries over to summable distributions without problems, this is not the case for multiplication. In general, the product of a bounded function η ∈ B with an unbounded one γ ∈ E is not bounded and therefore not in B. In contrast, the product of two bounded functions η, ζ ∈ B is always in B. Therefore, *for summable distributions T* ∈ D′<sub>*L*<sup>1</sup></sub> *multiplication has to be restricted to functions in* B

$$
\langle \eta T, \zeta \rangle = \langle T, \eta \zeta \rangle \ .
$$

# **6.2 Convolution of Summable Distributions**

In Sect. 3.2 we defined the convolution product between two distributions *S*, *T* ∈ D′ by

$$\langle S * T, \phi \rangle = \langle S(\tau) \otimes T(\lambda), \phi(\tau + \lambda) \rangle$$

and saw that in general, if the support of both *S* and *T* is unbounded, it may not exist. In this section we show that if *S* and *T* are summable, then their convolution product is well-defined despite the fact that their support is unbounded.

Consider the application of a summable distribution *T* to a function τ → η(λ + τ ) ∈ B with λ a parameter. Following the same arguments as in Sect. 3.1, given the linearity and continuity of *T* , we deduce that it is a continuous and indefinitely differentiable function ζ belonging to B

$$
\zeta(\lambda) = \langle T(\tau), \eta(\lambda + \tau) \rangle\,.
$$

For this reason the convolution of two summable distributions *S* and *T* is always well-defined

$$\langle S * T, \eta \rangle = \langle S(\lambda), \langle T(\tau), \eta(\lambda + \tau) \rangle \rangle = \langle S, \zeta \rangle$$

and commutative.

Next we investigate the convolution of a summable distribution *T* with a function in B. Consider the application of *T* to the function τ → η(λ − τ) ∈ B parameterised by λ. As we just saw, it is a function that we call again ζ and that is clearly locally integrable. Hence, it defines a distribution in D′ and with φ ∈ D we can write

$$
\begin{aligned}
\begin{aligned}
\langle\langle T(\tau),\eta(\lambda-\tau)\rangle,\phi(\lambda)\rangle&=\langle\zeta,\phi\rangle=\langle\phi,\zeta\rangle=\langle\phi(\lambda),\langle T(\tau),\eta(\lambda-\tau)\rangle\rangle\\&=\langle\phi(\lambda)\otimes T(\tau),\eta(\lambda-\tau)\rangle\\&=\left\langle T(\tau),\int_{\mathbb{R}^n}\phi(\lambda)\eta(\lambda-\tau)\,d^n\lambda\right\rangle\\&=\left\langle T(\tau),\int_{\mathbb{R}^n}\phi(\xi+\tau)\eta(\xi)\,d^n\xi\right\rangle\\&=\left\langle T(\tau)\otimes\eta(\xi),\phi(\xi+\tau)\right\rangle=\left\langle T*\eta,\phi\right\rangle.
\end{aligned}
\end{aligned}
$$

or

$$
\langle T(\tau), \eta(\lambda - \tau) \rangle = (T \ast \eta)(\lambda) \tag{6.2}
$$

This shows that a summable distribution can be regularised by a function in B and that the resulting regularisation is also a function in B.
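Equation (6.2) can be checked symbolically in a simple case. A SymPy sketch (an added illustration) regularises *T* = *D*δ, which is summable, with η = sin ∈ B:

```python
import sympy as sp

lam, tau = sp.symbols('lambda tau', real=True)

# (T * eta)(lambda) = <T(tau), eta(lambda - tau)> for T = D delta, eta = sin
eta = sp.sin(lam - tau)
res = sp.integrate(sp.DiracDelta(tau, 1)*eta, (tau, -sp.oo, sp.oo))

print(res)  # cos(lambda): D delta regularises sin to its derivative
```

The result cos(λ) = *D*η(λ) is again a function in B, as claimed.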

# **6.3 Fourier Transform of Summable Distributions**

The functions τ → e<sup>−j(ω,τ)</sup> with ω ∈ R*<sup>n</sup>* belong to B. For this reason the Fourier transform of a summable distribution *T* can be expressed in a simple way. Let φ ∈ D, then

$$\begin{aligned} \langle \mathcal{F}\{T\}, \phi \rangle &= \langle T(\tau), \mathcal{F}\{\phi\}(\tau) \rangle = \left\langle T(\tau), \int_{\mathbb{R}^n} \phi(\omega)\, \mathbf{e}^{-j(\tau, \omega)} \, d^n\omega \right\rangle \\ &= \left\langle T(\tau), \left\langle \phi(\omega), \mathbf{e}^{-j(\tau, \omega)} \right\rangle \right\rangle = \left\langle T(\tau) \otimes \phi(\omega), \mathbf{e}^{-j(\tau, \omega)} \right\rangle \\ &= \left\langle \phi(\omega), \left\langle T(\tau), \mathbf{e}^{-j(\tau, \omega)} \right\rangle \right\rangle = \left\langle \left\langle T(\tau), \mathbf{e}^{-j(\tau, \omega)} \right\rangle, \phi(\omega) \right\rangle \end{aligned}$$

or

$$\mathcal{F}\{T\}(\omega) = \left\langle T(\tau), \mathbf{e}^{-j(\tau,\omega)} \right\rangle. \tag{6.3}$$

F{*T*} is thus a continuous function. Moreover, it has at most polynomial growth for, by representing *T* as a sum of derivatives of integrable functions and using the properties of the Fourier transform, for some *m* ∈ N we have

$$\begin{aligned} |\mathcal{F}\{T\}(\omega)| &= \left| \sum_{|k| \le m} (j\omega)^k\, \mathcal{F}\{f_k\}(\omega) \right| \\ &\le \sum_{|k| \le m} |\omega|^{|k|} \int_{\mathbb{R}^n} |f_k(\tau)| \, d^n\tau \le C(1+|\omega|)^m \end{aligned}$$

with *C* a constant. Thus, *the Fourier transform of a summable distribution is a function of slow growth*.

The converse is not in general true, but we can find a class of functions for which it is. This is the set O*<sup>M</sup>* , the set of functions of slow growth that are indefinitely differentiable.

To see that this is the case, consider the Fourier transform *T̂* of some tempered distribution *T* and assume that *T̂* ∈ O*<sub>M</sub>*. If φ ∈ D ⊂ S, then its Fourier transform φ̂ as well as φ̂*T̂* are in S. Therefore we see that

$$
\phi \* T = \mathcal{F}^{-1} \{ \hat{\phi} \hat{T} \} \in \mathcal{S} \subset L^1
$$

is a summable distribution and we can apply it to a function η ∈ B to obtain

$$\langle \phi \ast T, \eta \rangle = \langle \phi(\lambda) \otimes T(\tau), \eta(\lambda + \tau) \rangle = \langle T(\tau), \langle \phi(\lambda), \eta(\lambda + \tau) \rangle \rangle \ .$$

Since φ ∈ D and η ∈ B are arbitrary and φ(λ), η(λ + τ ) ∈ B we deduce that *T* is a summable distribution.

We conclude this section by showing that the property of the Fourier transform of transforming convolution products into ordinary products remains valid for arbitrary summable distributions. Let *S*, *T* be summable distributions; then, using (6.3) and the property of the exponential function e<sup>−j(τ+λ,ω)</sup> = e<sup>−j(τ,ω)</sup> e<sup>−j(λ,ω)</sup>, one readily obtains

$$
\mathcal{F}\{\mathbf{S}\*T\} = \mathcal{F}\{\mathbf{S}\}\mathcal{F}\{T\}.\tag{6.4}
$$

The product is well defined as F {*S*} and F {*T* } are both functions.

**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Part II Systems**

# **Chapter 7 Convolution Equations**

The objective of this chapter is to show that the solution of ordinary differential equations, if based on distributions as opposed to functions, can be obtained by (mostly) algebraic methods. These methods are rigorous forms of the so-called Heaviside's operational or symbolic calculus. The close relationship to the integral transforms that convert convolution into the ordinary multiplication is also shown.

# **• ! Notation**

With this chapter we stop using uppercase letters such as *T* to denote distributions. Instead, we start using lowercase letters such as the ones typically used to denote functions, for example *f* . We also adopt the convention of denoting the Laplace transform of a distribution, say *f* , with the same letter, but changed to uppercase, e.g., *F* = L{ *f* }. When we need to distinguish between the ordinary and the distributional differential operator, we will in general denote the former by d/d*t* and continue to denote the latter by *D*.

# **7.1 Convolution Algebra**

An *algebra* A is a vector space together with an associative product such that the multiplication of any two vectors produces another vector in A and such that for any constants *a*, *b* and any vectors *f*, *g*, *h* ∈ A the following distributivity laws are valid

$$(af + b\mathbf{g}) \odot h = a(f \odot h) + b(\mathbf{g} \odot h) \tag{7.1}$$

$$f \odot (ag + bh) = a(f \odot g) + b(f \odot h). \tag{7.2}$$

The convolution product seems like an adequate product to make an algebra out of distributions. Unfortunately, as we saw, the convolution product is not defined for arbitrary distributions. The solution is to restrict the set of distributions to a vector subspace of D on which the convolution is well-defined.

**Definition 7.1** (*Convolution algebra*) A convolution algebra A is a vector subspace of D with the following properties:

1. the convolution product of any two elements of A is well-defined and belongs to A ;
2. the convolution product of elements of A is associative and commutative;
3. the unit element δ of the convolution product belongs to A .

A convolution algebra is thus an algebra with a unit and for which the product is always commutative. We also note that the triple (A , +, ∗) forms a commutative *ring*.

We have already met three examples of convolution algebras: (i) the set of right-sided distributions D +, (ii) the set of periodic distributions and (iii) the set of distributions with compact support E .

# **7.2 Convolution Equations**

In this section we study convolution equations. We will see that they provide a framework for studying a broad class of systems that is the time-domain counterpart of one based on the Laplace transform.

A *convolution equation* is an equation of the form

$$
\mathbf{g} \* \mathbf{y} = \mathbf{x} \tag{7.3}
$$

with *g* and *x* given distributions and *y* a distribution to be determined. In this section we assume *g*, *x* and *y* to be elements of a convolution algebra A . Suppose that *g* has an *inverse* in A , that is, there is an element denoted by *<sup>g</sup>*∗−<sup>1</sup> <sup>∈</sup> <sup>A</sup> such that

$$\mathbf{g} \ast \mathbf{g}^{\ast -1} = \mathbf{g}^{\ast -1} \ast \mathbf{g} = \delta \,.$$

Then *g*∗−<sup>1</sup> ∗ *x* is a solution of the equation for any *x*, since

$$\mathbf{y} = \mathbf{g}^{\*-1} \* \mathbf{g} \* \mathbf{y} = \mathbf{g}^{\*-1} \* \mathbf{x} \; .$$

Note that if there is an inverse *g*∗−<sup>1</sup> then it must be unique, since if *g*<sub>1</sub><sup>∗−1</sup> is another inverse we have

$$\mathbf{g} \ast (\mathbf{g}^{\ast -1} - \mathbf{g}\_1^{\ast -1}) = (\mathbf{g} \ast \mathbf{g}^{\ast -1}) - (\mathbf{g} \ast \mathbf{g}\_1^{\ast -1}) = \delta - \delta = 0$$

and convolving both sides with *g*∗−<sup>1</sup> gives *g*∗−<sup>1</sup> − *g*<sub>1</sub><sup>∗−1</sup> = 0.

Conversely, suppose that (7.3) has a solution for any right-hand side *x*. Then it has a solution for *x* = δ and this solution is by definition the inverse of *g*. Consequently, we can say that, if *g* has an inverse in A , then the equation has a unique solution for any right-hand side *x* and the solution is

$$\mathbf{y} = \mathbf{g}^{\*-1} \* \boldsymbol{x} \,. \tag{7.4}$$

Therefore, knowledge of *g*∗−<sup>1</sup> permits finding the solution of (7.3) for any right-hand side *x*. For this reason *g*∗−<sup>1</sup> is called the *elementary* or *fundamental solution* of the convolution equation.

Note that if *g* has an inverse *g*∗−1, but it's not an element of the convolution algebra A , then the expression *g*∗−<sup>1</sup> ∗ *x* may not exist and *g*∗−<sup>1</sup> ∗ *g* ∗ *y* may not be associative (see Example 3.5). Hence, (7.4) cannot be proved to be equivalent to (7.3).

Suppose that *g*<sup>1</sup> and *g*<sup>2</sup> are two elements of the convolution algebra A having inverses *g*<sub>1</sub><sup>∗−1</sup> and *g*<sub>2</sub><sup>∗−1</sup>, respectively. Then their convolution product *g*<sup>1</sup> ∗ *g*<sup>2</sup> has an inverse as well and it is given by

$$(\mathbf{g}\_1 \* \mathbf{g}\_2)^{\*-1} = \mathbf{g}\_1^{\*-1} \* \mathbf{g}\_2^{\*-1} \tag{7.5}$$

for

$$\begin{aligned} (\mathfrak{g}\_1 \ast \mathfrak{g}\_2)^{\ast -1} \ast (\mathfrak{g}\_1 \ast \mathfrak{g}\_2) &= \delta \\ &= \mathfrak{g}\_1 \ast \mathfrak{g}\_1^{\ast -1} \ast \mathfrak{g}\_2 \ast \mathfrak{g}\_2^{\ast -1} \\ &= (\mathfrak{g}\_1^{\ast -1} \ast \mathfrak{g}\_2^{\ast -1}) \ast (\mathfrak{g}\_1 \ast \mathfrak{g}\_2) .\end{aligned}$$

From this we see that, if in (7.3) *g* can be represented as the convolution product of *m* invertible elements *gi*,*i* = 1,..., *m*, then the solution of the equation can be expressed as the convolution product of their inverses

$$\mathbf{y} = \mathbf{g}\_1^{\ast-1} \ast \cdots \ast \mathbf{g}\_m^{\ast-1} \ast x \,. \tag{7.6}$$

In every algebra with a unit, one can perform a partial fraction expansion and every convolution algebra has a unit by definition. Therefore, every convolution product of inverses can be represented as a sum of inverses.

#### **Example 7.1: Partial Fraction Expansion**

Consider the following convolution product

$$(D\delta + a\delta)^{\*-1} \* (D\delta - b\delta)^{\*-1}$$

with *a* and *b* different constants. Its partial fraction expansion has the form

$$c\_a(D\delta + a\delta)^{\*-1} + c\_b(D\delta - b\delta)^{\*-1}$$

with *ca* and *cb* constants to be determined. If we take the convolution of both expressions with

$$(D\delta + a\delta) \* (D\delta - b\delta)$$

we obtain the following equation

$$
\delta = c\_a (D\delta - b\delta) + c\_b (D\delta + a\delta) \,.
$$

Equating the coefficients of δ and *D*δ we obtain two equations for *ca* and *cb* whose solution is

$$c\_b = -c\_a = \frac{1}{a+b} \,.$$
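The expansion can be cross-checked in the Laplace domain, where (anticipating the correspondence developed in Sect. 7.3) (*D*δ + *a*δ)∗−<sup>1</sup> and (*D*δ − *b*δ)∗−<sup>1</sup> become 1/(*s* + *a*) and 1/(*s* − *b*). A small SymPy sketch:

```python
import sympy as sp

# Laplace-domain cross-check of Example 7.1, using the correspondence
# (D delta + a delta)^{*-1} <-> 1/(s + a) developed later in the chapter.
s, a, b = sp.symbols('s a b')
expr = 1 / ((s + a) * (s - b))

c_a = -1 / (a + b)   # coefficient of 1/(s + a)
c_b = 1 / (a + b)    # coefficient of 1/(s - b)
recombined = c_a / (s + a) + c_b / (s - b)

diff = sp.simplify(expr - recombined)
```

`diff` simplifies to zero, confirming the coefficients found by comparing δ and *D*δ terms.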

A (convolution) algebra is said to be free from *zero divisors* if

$$\mathbf{g}\_1 \ast \mathbf{g}\_2 = 0$$

implies that either *g*<sup>1</sup> = 0 or *g*<sup>2</sup> = 0. In this case the algebra is called an *integral domain* and a convolution equation with common factors on both sides of the equation can be simplified. For example, assuming *f* = 0, the equation

$$f \ast g \ast y = f \ast x$$

can be simplified to

*g* ∗ *y* = *x* .

In fact, the original equation can be written as

$$f \ast (g \ast y - x) = 0$$

and since *f* is different from zero, we can deduce the simplified form.

We will see that the convolution algebra of right-sided distributions D <sup>+</sup> is an integral domain. The algebra of periodic distributions D (T) is not.

# **7.3 Initial Value Problems**

In this section we want to apply the results of the previous section to study initial value problems. In particular let *L* denote the linear differential operator with constant coefficients of order *m*

$$L = D^m + a\_{m-1}D^{m-1} + \cdots + a\_1D + a\_0$$

where for convenience we have set *am* = 1. We are interested in the solution of the differential equation

$$L\mathbf{y}(t) = \mathbf{x}(t) \tag{7.7}$$

for *t* ≥ 0 with initial conditions

$$(D^k \mathbf{y})(0) = \mathbf{y}\_k \qquad k = 0, \ldots, m-1 \tag{7.8}$$

with *x* and *y* functions and differentiation intended in the usual sense of differentiation of functions.

As a first step in translating this problem into the language of distributions, we note that the convolution algebra D <sup>+</sup> is well suited for the study of initial value problems. Every element of the algebra can be thought of as being in a zero state for *t* < 0 and representing some excitation or state evolution for *t* ≥ 0. The functions *x* and *y* can be associated with distributions of D <sup>+</sup> by extending them to negative values of *t* where we assign them the value of zero. To make this explicit it's usual to show them multiplied by the unit step 1+.

The second step is to perform differentiation in the sense of distributions. With the results of Example 2.9, for the first derivative of 1<sup>+</sup> *y* we have

$$D(\mathbb{1}\_+(t)\mathbf{y}(t)) = \mathbb{1}\_+(t)D\mathbf{y}(t) + \mathbf{y}\_0\delta$$

and similarly, for the higher order derivatives

$$D^2(\mathbb{1}\_+(t)\mathbf{y}(t)) = \mathbb{1}\_+(t)D^2\mathbf{y}(t) + \mathbf{y}\_0 D\delta + \mathbf{y}\_1\delta$$

$$\cdots$$

$$D^m(\mathbb{1}\_+(t)\mathbf{y}(t)) = \mathbb{1}\_+(t)D^m\mathbf{y}(t) + \mathbf{y}\_0 D^{m-1}\delta + \cdots + \mathbf{y}\_{m-1}\delta \,.$$

Note that in all these expressions the first term on the right-hand side is the conventional derivative of the function *y* (multiplied by 1+). Putting these results in the differential equation we obtain an equivalent equation for the distribution 1<sup>+</sup> *y*

$$\begin{aligned} L(\mathfrak{l}\_+ \mathfrak{y}) &= \mathfrak{l}\_+ L \mathfrak{y} + \sum\_{k=0}^{m-1} \sigma\_k D^k \delta \\ &= \mathfrak{l}\_+ \mathfrak{x} + \sum\_{k=0}^{m-1} \sigma\_k D^k \delta \end{aligned}$$

with

$$\begin{aligned} \sigma\_k &= a\_{1+k}\mathbf{y}\_0 + a\_{2+k}\mathbf{y}\_1 + \dots + \mathbf{y}\_{m-k-1} \\ &= \sum\_{i=0}^{m-1-k} a\_{i+1+k}\mathbf{y}\_i \,, \qquad k = 0, \dots, m-1 \end{aligned} \tag{7.9}$$

and *am* = 1.
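The coefficients (7.9) can be cross-checked against the familiar Laplace transform rule for derivatives, which encodes the same initial-condition terms. A SymPy sketch for the illustrative choice *m* = 3 (the value of *m* is an assumption of this sketch):

```python
import sympy as sp

# Check of Eq. (7.9) against the Laplace transform rule
# L{D^k y} = s^k Y - sum_{i=0}^{k-1} y_i s^{k-1-i},
# which produces the same initial-condition polynomial.
m = 3
s = sp.Symbol('s')
a = [sp.Symbol(f'a{i}') for i in range(m)] + [sp.Integer(1)]  # a_m = 1
y0 = [sp.Symbol(f'y{i}') for i in range(m)]                   # y_i = (D^i y)(0)

# Initial-condition polynomial collected from transforming each a_k D^k y term.
laplace_terms = sp.expand(sum(a[k] * sum(y0[i] * s**(k - 1 - i) for i in range(k))
                              for k in range(1, m + 1)))

# The same polynomial built from sigma_k as defined in (7.9).
sigma = [sum(a[i + 1 + k] * y0[i] for i in range(m - k)) for k in range(m)]
sigma_poly = sp.expand(sum(sigma[k] * s**k for k in range(m)))
```

The two polynomials coincide identically in the *a<sub>k</sub>* and *y<sub>k</sub>*.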

The last step required to translate the initial value problem into a convolution equation is to use the fact that the *k*th order derivative of a distribution can be expressed as the convolution product with *D<sup>k</sup>* δ so that

$$L(\mathbb{1}\_+\mathbf{y}) = L\delta \ast \mathbb{1}\_+\mathbf{y} \,.$$

The initial value problem defined by (7.7) and (7.8) is therefore equivalent to the following convolution equation of distributions

$$L\delta \* \mathbb{1}\_{+}\mathbb{y} = \mathbb{1}\_{+}\mathrm{x} + \sum\_{k=0}^{m-1} \sigma\_{k} D^{k} \delta \,. \tag{7.10}$$

With the results of the previous section, if the distribution *L*δ has an inverse in D + (the elementary solution of the equation), the solution of the equation for arbitrary right-hand side 1+*x* and initial conditions is given by

$$\mathfrak{1}\_{+}\mathfrak{y} = (L\delta)^{\*-1} \ast \mathfrak{1}\_{+}\mathfrak{x} + \sum\_{k=0}^{m-1} \sigma\_{k} D^{k} \left[ (L\delta)^{\*-1} \right] \,. \tag{7.11}$$

It's worth highlighting two important points. The first one is the fact that the differential equation (7.7) is not a full description of the problem. To fully specify the problem it has to be accompanied by the initial conditions expressed by Eq. (7.8). Differently from this, the convolution Eq. (7.10) is a *full* description of the problem.

The second point that we want to highlight is the fact that (7.11) is a *global* solution of the problem, that is, the solution is specified for all times. Differently from this, the classical solution of the original initial value problem is a function only valid for *t* ≥ 0.

Next we show that the inverse (*L*δ)∗−<sup>1</sup> exists. To this end note that if we insert it in (7.10) and set *x* = 0 as well as σ<sup>0</sup> = 1 and σ*<sup>k</sup>* = 0, *k* = 1,..., *m* − 1 we obtain the equation *defining* the inverse

$$L\delta \ast (L\delta)^{\ast -1} = \delta \text{ .}$$

The inverse of *L*δ is thus the distribution 1+*e* with *e* the function which is the solution of the homogeneous equation

$$Le(t) = 0$$

with initial conditions

$$(D^{m-1}e)(0) = 1 \quad \text{and} \quad (D^k e)(0) = 0, \quad k = 0, \ldots, m-2.$$

#### **Example 7.2: Fundamental Solution**

Consider the differential operator

$$L = D + a.$$

The solution of the homogeneous differential equation *Le*(*t*) = 0 with initial condition *e*(0) = 1 is

$$e(t) = e^{-at} \,.$$

The inverse of *L*δ in the convolution algebra D <sup>+</sup> is therefore

$$(L\delta)^{\ast-1} = (D\delta + a\delta)^{\ast-1} = \mathbb{1}\_+(t)e^{-at} \,.$$

This is easily verified by inserting it into the convolution equation for the operator *L*

$$\begin{split} L\delta \* (L\delta)^{\*-1} &= (D\delta + a\delta) \* \mathbb{1}\_{+}(t)e^{-at} = D(\mathbb{1}\_{+}(t)e^{-at}) + a\mathbb{1}\_{+}(t)e^{-at} \\ &= -a\mathbb{1}\_{+}(t)e^{-at} + \delta + a\mathbb{1}\_{+}(t)e^{-at} = \delta \ . \end{split}$$

In a similar way we find

$$(D\delta + a\delta)^{\ast-m} = \mathbb{1}\_+(t)\frac{t^{m-1}}{(m-1)!}e^{-at}$$

with *m* a positive natural number.
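The last formula can be checked in the Laplace domain, where (*D*δ + *a*δ)∗−*<sup>m</sup>* corresponds to 1/(*s* + *a*)*<sup>m</sup>*. A SymPy sketch for the illustrative choice *m* = 4 (the positivity assumptions merely keep the transform conditions simple):

```python
import sympy as sp

# Laplace-domain check that (D delta + a delta)^{*-m} corresponds to
# 1_+(t) t^{m-1}/(m-1)! e^{-at}: its transform must be 1/(s + a)^m.
# m = 4 is an illustrative assumption.
t, s = sp.symbols('t s', positive=True)
a = sp.Symbol('a', positive=True)
m = 4

candidate = t**(m - 1) / sp.factorial(m - 1) * sp.exp(-a * t)
transform = sp.laplace_transform(candidate, t, s, noconds=True)

diff = sp.simplify(transform - 1 / (s + a)**m)
```

`diff` simplifies to zero, as expected.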

Let's focus for a moment on the distribution *L*δ and observe that it looks like a polynomial *P* with *D*δ playing the role of the independent variable

$$L\delta = D^m \delta + a\_{m-1} D^{m-1} \delta + \cdots + a\_1 D \delta + a\_0 \delta \,.$$

Any polynomial can be represented as a product of factors

$$P(z) = (z - z\_1)(z - z\_2) \cdots (z - z\_m)$$

with *zj* the zeros that may or may not be distinct. From this and remembering that

$$D^k \delta \* D^i \delta = D^{k+i} \delta$$

we deduce that the distribution *L*δ can be factored in a similar way. If we denote by *f* <sup>∗</sup>*<sup>k</sup>* the convolution product of *k* ≥ 0 distributions equal to *f* with *f* <sup>∗</sup><sup>0</sup> = δ and group common factors, then *L*δ can be represented as

$$L\delta = (D\delta - z\_1\delta)^{\*l\_1} \ast (D\delta - z\_2\delta)^{\*l\_2} \ast \cdots \ast (D\delta - z\_n\delta)^{\*l\_n}$$

with *lj* the multiplicity of the *j*th factor. The inverse (*L*δ)∗−<sup>1</sup> can then also be factored

$$(L\delta)^{\*-1} = (D\delta - z\_1\delta)^{\*-l\_1} \ast (D\delta - z\_2\delta)^{\*-l\_2} \ast \cdots \ast (D\delta - z\_n\delta)^{\*-l\_n}$$

*f* ∗−*<sup>k</sup>* denoting the inverse of *f* <sup>∗</sup>*<sup>k</sup>* . With this factorization the elementary solution can either be directly expressed as a convolution product

$$\mathbb{1}\_+(t)\,e(t) = \mathbb{1}\_+(t)\frac{t^{l\_1-1}}{(l\_1-1)!}e^{z\_1 t} \ast \cdots \ast \mathbb{1}\_+(t)\frac{t^{l\_n-1}}{(l\_n-1)!}e^{z\_n t}$$

or, by first performing a partial fraction expansion, can be expressed as a sum of convolution-free known distributions.

To show the relation to the Laplace method, we Laplace transform Eq. (7.10). The Laplace transform of the distribution *L*δ becomes a true polynomial in the variable *s* and the convolution product becomes the conventional multiplication so that the convolution equation becomes an algebraic equation

$$\begin{aligned} P(\mathbf{s})\,Y(\mathbf{s}) &= X(\mathbf{s}) + \sum\_{k=0}^{m-1} \sigma\_k \mathbf{s}^k \\ P(\mathbf{s}) &= \mathbf{s}^m + a\_{m-1}\mathbf{s}^{m-1} + \cdots + a\_1\mathbf{s} + a\_0 \\ &= (\mathbf{s} - z\_1)^{l\_1} (\mathbf{s} - z\_2)^{l\_2} \cdots (\mathbf{s} - z\_n)^{l\_n} \,. \end{aligned}$$

The Laplace transform of the inverse (*L*δ)∗−<sup>1</sup> is the reciprocal of *P*(*s*) and corresponds to the Laplace transform of the elementary solution *e*

$$E(\mathbf{s}) = \frac{1}{P(\mathbf{s})}.$$

With it the solution of the convolution equation can be written as

$$Y(s) = E(s)X(s) + E(s) \sum\_{k=0}^{m-1} \sigma\_k s^k \,.$$

The solution *y* of the original equation is then found by inverse Laplace transforming *Y* . In most cases this is most conveniently accomplished by partial fraction expansion.
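The whole procedure can be carried out mechanically. The following SymPy sketch solves the illustrative initial value problem *y*'' + 3*y*' + 2*y* = 0, *y*(0) = 1, (*Dy*)(0) = 0 (the coefficients are assumptions chosen for this sketch, not taken from the text) and compares the result with the classical solution:

```python
import sympy as sp

# Algebraic (Laplace-domain) solution of an illustrative initial value problem.
t = sp.Symbol('t', positive=True)
s = sp.Symbol('s')

# P(s) Y(s) = sigma_1 s + sigma_0 with sigma_1 = y_0 = 1 and
# sigma_0 = a_1 y_0 + y_1 = 3 (Eq. (7.9) with m = 2, X = 0).
P = s**2 + 3 * s + 2
Y = (s + 3) / P

# Partial fraction expansion followed by termwise inversion.
y = sp.simplify(sp.inverse_laplace_transform(sp.apart(Y, s), s, t))

# The classical solution of the same initial value problem, for comparison.
f = sp.Function('f')
ode_sol = sp.dsolve(f(t).diff(t, 2) + 3 * f(t).diff(t) + 2 * f(t), f(t),
                    ics={f(0): 1, f(t).diff(t).subs(t, 0): 0}).rhs
```

Both routes give *y*(*t*) = 2e<sup>−t</sup> − e<sup>−2t</sup> for *t* > 0.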

This shows the parallelism between convolution equations in D <sup>+</sup> on one side and the Laplace transform method on the other one. In particular the distribution *D*δ is the time-domain counterpart of the variable *s*, the convolution product the counterpart of the ordinary multiplication and δ the one of the multiplicative unit element 1.

#### **Example 7.3**

Consider the differential equation

$$\left[D^2 + (a - b)D - ab\right] \mathbf{y}(t) = \mathbf{x}(t)$$

with initial conditions (*Dy*)(0) = *y*(0) = 0 and assume that *a* and *b* are different constants. The corresponding convolution equation

$$(D\delta + a\delta) \* (D\delta - b\delta) \* \mathcal{y} = \mathcal{x}$$

has as elementary solution the convolution product

$$e = (D\delta + a\delta)^{\*-1} \* (D\delta - b\delta)^{\*-1}$$

with partial fraction expansion (see Example 7.1)

$$e = \frac{1}{a+b} \left[ - (D\delta + a\delta)^{\*-1} + (D\delta - b\delta)^{\*-1} \right].$$

The inverse elements appearing in *e* were calculated in Example 7.2. Using those results we can express the elementary solution of the equation as

$$e(t) = \frac{1}{a+b} \left[ -\mathbb{1}\_+(t) \, e^{-at} + \mathbb{1}\_+(t) \, e^{bt} \right].$$

If we Laplace transform the equation, the procedure is completely parallel. The Laplace transformed of the elementary solution is

$$E(s) = \frac{1}{a+b} \left[ \frac{-1}{s+a} + \frac{1}{s-b} \right]$$

and by inversion we obtain the same distribution *e*.

We have seen that the initial value problem described by (7.7) and (7.8) can equivalently be described by the convolution equation (7.10). While the differential equation only has a meaning if *x* is a continuous function with isolated jump discontinuities, the convolution equation remains well-defined if 1+*x* is replaced by any distribution in D <sup>+</sup>. In particular, we can consider more general convolution equations of the form

$$L\delta \* \mathbf{y} = N\delta \* x + \sum\_{k=0}^{m-1} \sigma\_k D^k \delta$$


with

$$N = b\_n D^n + b\_{n-1} D^{n-1} + \cdots + b\_0 \,.$$

*x* any distribution in D <sup>+</sup> and where it's understood that the solution *y* must belong to the convolution algebra of distributions in D <sup>+</sup>. As before, the solution of the equation is found by convolving with the convolutional inverse of *L*δ

$$\mathbf{y} = (L\delta)^{\ast-1} \ast N\delta \ast x + \sum\_{k=0}^{m-1} \sigma\_k (L\delta)^{\ast-1} \ast D^k \delta \,.$$

We want to establish whether it's possible to replace the second summand on the right-hand side, representing the initial conditions, by a suitably selected input signal composed of a weighted sum of a Dirac pulse and its derivatives, such that, in the complement of *t* = 0, the solution *y* remains unchanged. To this end it's convenient to consider the Laplace transform of *y*

$$Y(\mathbf{s}) = \frac{Z(\mathbf{s})}{P(\mathbf{s})}X(\mathbf{s}) + \frac{\sum\_{k=0}^{m-1} \sigma\_k \mathbf{s}^k}{P(\mathbf{s})}$$

with *Z* = L{*N*δ} a polynomial of degree *n* and the other symbols having the same meaning as before. The Laplace transform of the sought-for input signal is a polynomial

$$X(\mathbf{s}) = x\_q \mathbf{s}^q + \dots + x\_0$$

and it must be selected in such a way as to satisfy the equality

$$\frac{Z(\mathbf{s})}{P(\mathbf{s})}X(\mathbf{s}) = \frac{\sum\_{k=0}^{m-1} \sigma\_k \mathbf{s}^k}{P(\mathbf{s})} + W(\mathbf{s})$$

with *W*(*s*) another polynomial. This polynomial corresponds also to a weighted sum of Dirac pulses and its derivatives, and hence only changes *y* at *t* = 0, which we allow.

The conditions for the existence of such an input signal *X*(*s*) can be determined with the help of the division theorem of polynomials. It states that, given polynomials *Q*(*s*) and *P*(*s*) ≠ 0, there are unique polynomials *R*(*s*) and *W*(*s*) satisfying

$$\mathcal{Q}(\mathbf{s}) = P(\mathbf{s})W(\mathbf{s}) + R(\mathbf{s})$$

with the degree of *R*(*s*) being lower than the one of *P*(*s*) [21]. From this theorem we deduce that, provided *Z*(*s*) and *P*(*s*) are *relatively prime*, that is, assuming that they have no common factors, we can select *X*(*s*) so that the remainder of the division of *Z*(*s*)*X*(*s*) by *P*(*s*) corresponds to $\sum\_{k=0}^{m-1} \sigma\_k s^k$. To achieve this we need *m* degrees of freedom, one for each σ*<sup>k</sup>*. In other words, the input polynomial *X*(*s*) must have degree *m* − 1. Then we can choose the coefficients of *X*(*s*) in such a way as to obtain the desired values for the remainder of the division.

If *Z*(*s*) and *P*(*s*) have a common factor *K*(*s*) then

$$\begin{aligned} \frac{Z(s)X(s)}{P(s)} &= \frac{K(s)Z'(s)X(s)}{K(s)P'(s)} = \frac{K(s)}{K(s)} \left(\frac{R(s)}{P'(s)} + W(s)\right) \\ &= \frac{K(s)R(s)}{P(s)} + W(s) \end{aligned}$$

and we see that the remainder of the division, *K*(*s*)*R*(*s*), has a constrained form that can't be made to match $\sum\_{k=0}^{m-1} \sigma\_k s^k$ for arbitrary σ*<sup>k</sup>*.

We have therefore established that, in a convolution equation derived from an initial value problem, the terms representing the initial conditions can be replaced by a distribution *x* composed of a weighted sum of a Dirac pulse and its derivatives if and only if *Z*(*s*) and *P*(*s*) have no common factors. If we perform this substitution, in the complement of *t* = 0, the solution of the equation *y* remains unchanged.

#### **Example 7.4: Replacing Initial Conditions**

Consider the initial value problem

$$\begin{aligned} \left(D^2 + a\_1 D + a\_0\right) \mathbf{y} &= \left(b\_1 D + b\_0\right) \mathbf{x}, \\ (D\mathbf{y})(0) &= \mathbf{y}\_1, \qquad \mathbf{y}(0) = \mathbf{y}\_0 \end{aligned}$$

The corresponding convolution equation is

$$(D^2\delta + a\_1 D\delta + a\_0 \delta) \ast \mathbf{y} = (b\_1 D\delta + b\_0\delta) \ast x + \mathbf{y}\_0 D\delta + (a\_1 \mathbf{y}\_0 + \mathbf{y}\_1)\delta \ .$$

Our objective is to replace the initial conditions by an input signal composed of a Dirac pulse and its derivatives so that in the complement of *t* = 0 the solution *y* of the convolution equation with this input signal is identical to the solution of the equation with initial conditions and no input signal.

Expressed in the Laplace domain the problem is thus to find the coefficients of the polynomial

$$X(s) = x\_1 s + x\_0$$

such that

$$\frac{Z(\mathbf{s})X(\mathbf{s})}{P(\mathbf{s})} = \frac{R(\mathbf{s})}{P(\mathbf{s})} + W(\mathbf{s})$$

with

$$Z(\mathbf{s}) = b\_1 \mathbf{s} + b\_0, \qquad P(\mathbf{s}) = \mathbf{s}^2 + a\_1 \mathbf{s} + a\_0, \qquad R(\mathbf{s}) = \mathbf{y}\_0 \mathbf{s} + a\_1 \mathbf{y}\_0 + \mathbf{y}\_1$$

and *W*(*s*) an arbitrary polynomial of degree lower than 2. By performing the polynomial division of the left-hand side of the equation we obtain

$$\frac{s(-a\_1 b\_1 x\_1 + b\_0 x\_1 + b\_1 x\_0) - a\_0 b\_1 x\_1 + b\_0 x\_0}{s^2 + a\_1 s + a\_0} + b\_1 x\_1 \,.$$

Thus *W*(*s*) = *b*<sub>1</sub>*x*<sub>1</sub> and, by comparing coefficients of this expression with the right-hand side of the equation, the coefficients of *X*(*s*) are found to be

$$\begin{aligned} x\_0 &= -\frac{(a\_1 b\_1 - b\_0)\mathbf{y}\_1 + \left[(a\_1^2 - a\_0)b\_1 - a\_1 b\_0\right]\mathbf{y}\_0}{a\_0 b\_1^2 - a\_1 b\_0 b\_1 + b\_0^2}, \\ x\_1 &= -\frac{b\_1 \mathbf{y}\_1 + (a\_1 b\_1 - b\_0)\mathbf{y}\_0}{a\_0 b\_1^2 - a\_1 b\_0 b\_1 + b\_0^2} \,. \end{aligned}$$

This solution is well-defined except when the denominator, which is the same for both *x*<sup>1</sup> and *x*0, becomes zero. This happens when

$$a\_1 = \frac{a\_0 b\_1}{b\_0} + \frac{b\_0}{b\_1}.$$

In this case the polynomial *Z*(*s*) becomes a factor of *P*(*s*)

$$s^2 + \left(\frac{a\_0 b\_1}{b\_0} + \frac{b\_0}{b\_1}\right)s + a\_0 = (b\_1 s + b\_0)\left(\frac{1}{b\_1}s + \frac{a\_0}{b\_0}\right)$$

in accordance with our general treatment of the problem.
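The coefficient matching of this example can be reproduced mechanically: impose that the remainder of *Z*(*s*)*X*(*s*) on division by *P*(*s*) equals *R*(*s*) and solve the resulting linear system. A SymPy sketch (the symbol names mirror the example):

```python
import sympy as sp

# Reproduce the coefficient matching of Example 7.4: require that the
# remainder of Z(s) X(s) on division by P(s) equals R(s) and solve the
# resulting linear system for the coefficients of X(s).
s = sp.Symbol('s')
a0, a1, b0, b1, y0, y1, x0, x1 = sp.symbols('a0 a1 b0 b1 y0 y1 x0 x1')

Z = b1 * s + b0
P = s**2 + a1 * s + a0
R = y0 * s + a1 * y0 + y1
X = x1 * s + x0

remainder = sp.rem(Z * X, P, s)

# Match coefficients of the remainder with those of R(s).
equations = sp.Poly(remainder - R, s).all_coeffs()
sol = sp.solve(equations, [x0, x1], dict=True)[0]
```

The solution `sol` exists precisely when the common denominator *a*<sub>0</sub>*b*<sub>1</sub><sup>2</sup> − *a*<sub>1</sub>*b*<sub>0</sub>*b*<sub>1</sub> + *b*<sub>0</sub><sup>2</sup> is non-zero, i.e. when *Z*(*s*) is not a factor of *P*(*s*).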

Before concluding this section we show the important fact mentioned before that *the convolution algebra* D <sup>+</sup> *has no zero divisors*. To see this, consider a test function φ that is real-valued and positive everywhere on its support, for example βν from Example 2.1. We call such a function a positive test function. In Chap. 3 we saw that every distribution can be represented as the limit of a sequence of indefinitely differentiable functions. Let (*gm*) and (*ym*) be such sequences converging to *g* and *y* respectively and, for simplicity, assume that all functions are real-valued. Then, for every *m* there exists an open interval *U* contained in the support of *gm* such that, for every positive test function ζ with support in *U*, the value ⟨*gm*, ζ⟩ always has the same sign, for example positive

$$
\langle \mathbf{g}\_m, \zeta \rangle > 0 \dots
$$

We can make a similar construction for *yi* as well. In addition, we can introduce a parameter λ such that

$$
\lambda \mapsto \langle \mathbf{y}\_i(\tau), \phi(\tau + \lambda) \rangle,
$$

is a positive (or negative) test function of λ with support in *U*. Then, assuming again a positive sign,

$$\langle \mathcal{g}\_m \* \mathcal{y}\_i, \phi \rangle = \langle \mathcal{g}\_m(\lambda), \langle \mathcal{y}\_i(\tau), \phi(\tau + \lambda) \rangle \rangle$$

must be positive for every *m* and *i* and, by the continuity of distributions and of the convolution, so must be the limit. Consequently, if *g* ∗ *y* vanishes on every test function then either *g* or *y* must be the zero distribution.

# **7.4 Integro-Differential Equations**

Some initial value problems are naturally formulated as ordinary integro-differential equations

$$\begin{aligned} &D^m \mathbf{y}(t) + a\_{m-1} D^{m-1} \mathbf{y}(t) + \dots + a\_1 D \mathbf{y}(t) + a\_0 \mathbf{y}(t) \\ &+ a\_{-1} \int\_0^t \mathbf{y}(\tau\_1) \, d\tau\_1 + \dots + a\_{-n} \int\_0^t \dots \int\_0^{\tau\_{n-1}} \mathbf{y}(\tau\_n) \, d\tau\_n \dots \, d\tau\_1 \\ &= \mathbf{x}(t) \end{aligned}$$

with initial conditions

$$(D^k \mathbf{y})(0) = \mathbf{y}\_k \quad \quad k = 0, \ldots, m-1 \,. \tag{7.12}$$

We still need initial conditions, but this time only *m* of them as the remaining information is included in the integrals.

These problems can be converted into convolution equations in the convolution algebra D <sup>+</sup> in a similar way as we discussed before. The new terms are the ones that are expressed as integrals and these can be written as convolution products

$$\int\_0^t \mathbf{y}(\tau\_1) \, d\tau\_1 = \mathbb{1}\_+(t) \ast \mathbb{1}\_+(t)\mathbf{y}(t)$$

$$\cdots$$

$$\int\_0^t \cdots \int\_0^{\tau\_{n-1}} \mathbf{y}(\tau\_n) \, d\tau\_n \cdots \, d\tau\_1 = \mathbb{1}\_+^{\ast n}(t) \ast \mathbb{1}\_+(t)\mathbf{y}(t) \,.$$

The corresponding convolution equation is therefore

$$\begin{aligned} &\left(D^m \delta + a\_{m-1} D^{m-1} \delta + \cdots + a\_1 D \delta + a\_0 \delta \right. \\ &\quad \left. + a\_{-1} \mathbb{1}\_+ + \cdots + a\_{-n} \mathbb{1}\_+^{\ast n}\right) \ast \mathbb{1}\_+(t) \mathbf{y}(t) \\ &= \mathbb{1}\_+(t) \mathbf{x}(t) + \sum\_{k=0}^{m-1} \sigma\_k D^k \delta \end{aligned}$$

with σ*<sup>k</sup>* , *k* = 0,..., *m* − 1 as defined in (7.9).

As we have seen, the convolution algebra D <sup>+</sup> is an integral domain. For this reason we can convolve both sides of the equation with a non-zero distribution without changing the solution. If we choose *D<sup>n</sup>*δ as the distribution and make use of the fact that 1+(*t*) is the inverse of *D*δ

$$D\delta \* \mathbf{1}\_+ = \delta$$

the equation becomes

$$\begin{aligned} &\left(D^{m+n}\delta + a\_{m-1}D^{m-1+n}\delta + \cdots + a\_{-n}\delta\right) \ast \mathbb{1}\_+(t)\mathbf{y}(t) \\ &= D^n\delta \ast \mathbb{1}\_+(t)\mathbf{x}(t) + \sum\_{k=0}^{m-1} \sigma\_k D^{k+n}\delta \,. \end{aligned}$$

This is the type of convolution equation that we discussed in Sect. 7.3 and is solved by the same method. The solution of integro-differential equations thus requires no new technique.
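As a concrete illustration, consider the integro-differential equation *Dy* + ∫<sub>0</sub><sup>t</sup> *y*(τ) dτ = 0 with *y*(0) = 1 (an example chosen for this sketch, not taken from the text). Convolving with *D*δ gives (*D*<sup>2</sup>δ + δ) ∗ 1<sub>+</sub>*y* = *D*δ, i.e. (*s*<sup>2</sup> + 1)*Y* = *s* in the Laplace domain. A SymPy sketch:

```python
import sympy as sp

# Illustrative integro-differential equation:
#   Dy + int_0^t y(tau) dtau = 0,  y(0) = 1.
# Convolving with D delta yields (s**2 + 1) Y = s in the Laplace domain.
t = sp.Symbol('t', positive=True)
s = sp.Symbol('s')

Y = s / (s**2 + 1)
y = sp.inverse_laplace_transform(Y, s, t)

# Residual of the original equation: Dy plus the running integral of y.
residual = sp.simplify(sp.diff(y, t) + sp.integrate(y, (t, 0, t)))
```

The residual vanishes, confirming that the solution of the transformed equation solves the original one.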

The procedure of transforming the convolution equation that we just discussed is similar to the standard procedure used to convert an integro-differential equation into a differential equation by differentiating the equation. The key difference is that, while the former handles initial conditions automatically, the latter method requires extraction of additional conditions from the original equation.

# **7.5 Periodic Solutions**

One is often interested in periodic solutions of differential equations. These solutions are most conveniently found with the help of the convolution algebra of periodic distributions.

Consider again the convolution equation obtained from the differential operator *L* of Sect. 7.3 where now the unit element of the algebra is the Dirac comb δ<sup>T</sup>

$$L\delta\_{\mathcal{T}} \* \mathcal{y} = \mathcal{x}\,.$$

In Sect. 4.5 we established two important properties of the Fourier series: the *k*th Fourier coefficient of a convolution product of two periodic distributions equals T times the product of their *k*th Fourier coefficients, and differentiation multiplies the *k*th Fourier coefficient by j*k*ω*<sup>c</sup>*. With the second property, the Fourier coefficients of *L*δ<sup>T</sup> are

$$c\_k(L\delta\_\mathcal{T}) = \left[ (jk\omega\_c)^m + a\_{m-1} (jk\omega\_c)^{m-1} + \cdots + a\_1 (jk\omega\_c) + a\_0 \right] \frac{1}{\mathcal{T}} = P(jk\omega\_c)\, \frac{1}{\mathcal{T}} \,.$$

By representing both *x* and *y* by their respective Fourier series and using these two properties, we can transform the above convolution equation into algebraic equations for the Fourier coefficients. Let's denote by *ck* the *k*th Fourier coefficient of *x* and by *dk* the one of *y*. Then the equation becomes

$$(jk\omega\_c - z\_1)^{l\_1} (jk\omega\_c - z\_2)^{l\_2} \cdots (jk\omega\_c - z\_n)^{l\_n}\, d\_k = c\_k$$

where, as before, we have expressed the polynomial *P* by its zero factors. To solve the equation we have to distinguish three cases:

1. If some zero *zj* equals j*k*ω*<sup>c</sup>* for some *k* and the corresponding coefficient *ck* is different from zero, then the equation for *dk* has no solution and the convolution equation admits no periodic solution.

2. If some zero *zj* equals j*k*ω*<sup>c</sup>* and *ck* = 0, then *dk* can be chosen arbitrarily and the solution is not unique. Indeed,

$$L\delta\_{\mathcal{T}} \ast e^{jk\omega\_c t} = 0 \,,$$

which means that the convolution algebra of periodic distributions has zero divisors.

3. If no zero *zj* equals j*k*ω*<sup>c</sup>* for any value of *k* then the equation has the unique solution given by the Fourier series with coefficients

$$d\_k = \frac{c\_k}{P(jk\omega\_c)}\,.$$

#### **Example 7.5: Cont. of Example 7.3**

We look for a periodic solution of the convolution equation of Example 7.3

$$(D\delta\_{\mathcal{T}} + a\delta\_{\mathcal{T}}) \ast (D\delta\_{\mathcal{T}} - b\delta\_{\mathcal{T}}) \ast \mathbf{y} = x$$

assuming that the real parts of *a* and *b* are both positive. In particular, we are interested in the elementary solution *e* of the equation. By setting *x* = δ<sup>T</sup> and expanding it by its Fourier series we obtain the following equation for the *k*th Fourier coefficient of *e*

$$e\_k = \frac{1}{\mathcal{T}} \cdot \frac{1}{(jk\omega\_c + a)(jk\omega\_c - b)} \,.$$

By performing a partial fraction expansion and with the help of (4.24), we recognize them as the coefficients of the Fourier series of the distribution

$$e(t) = \mathbf{g}(t) \* \delta\_{\mathcal{T}}$$

with

$$\mathbf{g}(t) = \frac{-1}{a+b} \left[ \mathbb{1}\_+(t) \, e^{-at} + \mathbb{1}\_+(-t) \, e^{bt} \right].$$

In fact the Fourier transform of *g* is

$$\begin{split} \hat{\mathcal{g}}(\omega) &= \frac{1}{a+b} \left[ \frac{-1}{j\omega+a} + \frac{1}{j\omega-b} \right] \\ &= \frac{1}{(j\omega+a)(j\omega-b)} .\end{split}$$

Note that *g* is a distribution of slow growth. The elementary solution of the equation in the algebra of periodic distributions is therefore the sum of periodically shifted tempered solutions of the differential equation.

Suppose now that we are interested in the solution for *x*(*t*) = *Ae*jω*<sup>c</sup> <sup>t</sup>* . The only Fourier coefficient of *x* different from zero is *c*<sup>1</sup> = *A*. The Fourier coefficients of *y* are then also all zero except for

$$d\_1 = \mathcal{T} \; c\_1 \; e\_1 = A \; \hat{\mathbf{g}}(\omega\_c) \;.$$

In this case the solution *y* of the equation is therefore

$$\mathbf{y}(t) = A \,\hat{\mathbf{g}}(\omega\_c) \, e^{j\omega\_c t} \,.$$
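This solution can be checked by direct substitution into the differential operator of Example 7.3. A SymPy sketch, in which `w` stands for ω*<sup>c</sup>*:

```python
import sympy as sp

# Direct substitution check of the periodic solution: with
# g_hat(w) = 1/((jw + a)(jw - b)), y(t) = A g_hat(w) e^{jwt} solves
# (D^2 + (a - b) D - a b) y = A e^{jwt}.
t, a, b, w, A = sp.symbols('t a b w A')
j = sp.I

g_hat = 1 / ((j * w + a) * (j * w - b))
y = A * g_hat * sp.exp(j * w * t)
x = A * sp.exp(j * w * t)

Ly = sp.diff(y, t, 2) + (a - b) * sp.diff(y, t) - a * b * y
residual = sp.simplify(Ly - x)
```

The residual vanishes identically in *a*, *b*, *w* and *A*.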

# **7.6 General Convolution Equations**

# *7.6.1 General Solutions*

In this section we consider generic convolution equations of the form

$$\mathbf{g} \ast \mathbf{y} = \mathbf{x}$$

with *g*, *y* and *x* generic distributions in D . Here the situation is different from when working in a convolution algebra. First, the convolution between *g* and *y* may not exist. To guarantee its existence for arbitrary *y*, *g* must have compact support. This includes many important cases, for example, all linear differential operators with constant coefficients.

Second, *g* may not have an inverse. For example, if *g* ∈ D we have seen in Sect. 3.2 that *g* ∗ *y* ∈ E and so can't equal δ for any *y* ∈ D′. If an inverse does exist, then the equation has an elementary solution, but it only serves to find solutions for *x* having compact support; otherwise the last convolution in

$$\mathbf{y} = \mathbf{g}^{\*-1} \ast \mathbf{g} \ast \mathbf{y} = \mathbf{g}^{\*-1} \ast \mathbf{x}$$

may not make sense.

Further, the homogeneous equation

$$\mathbf{g} \ast \mathbf{y} = 0$$

may have solutions different from *y* = 0. For this reason there may be an infinity of elementary solutions, any two of them differing by a solution of the homogeneous equation.

Despite these facts, general convolution equations have many practical applications.

#### **Example 7.6: Electrostatics**

Let ρ denote the electric charge density and *u* the electrostatic potential, both functions of the position in space. In empty space the two quantities are related by Poisson's equation

$$
\Delta u(x) = -\frac{\rho(x)}{\epsilon\_0}
$$

with Δ the Laplace operator, *x* ∈ R<sup>3</sup> the vector specifying position and ε<sub>0</sub> the permittivity of free space. This equation can be written as a convolution equation

$$
\Delta \delta \* u = -\frac{\rho(\mathbf{x})}{\epsilon\_0} \, .
$$

One can show that the inverse of Δδ is

$$-\frac{1}{4\pi|\mathbf{x}|}\ .$$

If the charge density ρ is distributed over a finite region of R<sup>3</sup> then the generated potential is

$$u(\mathbf{x}) = \frac{1}{4\pi\epsilon_0|\mathbf{x}|} \ast \rho(\mathbf{x})\,.$$

The homogeneous equation has solutions different from the trivial one: the so-called harmonic functions.
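For a charge density made of point charges, the convolution above reduces to the familiar superposition sum. A small sketch (the charge values and positions are made up for illustration):

```python
import numpy as np

eps0 = 8.8541878128e-12  # vacuum permittivity [F/m]

def potential(x, charges, positions):
    """Electrostatic potential at x from point charges, by superposition.

    This is the convolution u = (1/(4 pi eps0 |x|)) * rho evaluated for
    rho equal to a finite sum of Dirac impulses.
    """
    x = np.asarray(x, dtype=float)
    u = 0.0
    for q, p in zip(charges, positions):
        u += q / (4 * np.pi * eps0 * np.linalg.norm(x - np.asarray(p)))
    return u

# A dipole: equal and opposite charges. On the mid-plane the two
# contributions cancel exactly.
q = 1e-9
u_mid = potential([0.0, 1.0, 0.0], [q, -q], [[1.0, 0, 0], [-1.0, 0, 0]])
assert abs(u_mid) < 1e-12
```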

# *7.6.2 Tempered Solutions*

If *x* is tempered and one is interested in tempered solutions of the equation, then the convolution equation makes sense not only for *g* of compact support, but for the larger class of distributions of rapid descent O′<sub>C</sub> [16]. This case is particularly important because one can then use the Fourier transform, which may make it easier to find a solution.

In the following we briefly consider the one-dimensional case where *g* = *L*δ, with *L* a linear differential operator with constant coefficients, so that the convolution equation has the form

$$L\delta \ast \mathcal{y} = \mathcal{x}.$$

In this case there always is at least an elementary solution. If we Fourier transform both sides of the equation we find the equivalent equation

$$P\,\hat{y} = \hat{x}$$

with *P* a polynomial (and thus in O*<sup>M</sup>* ).

If *P* has no zeros, then the only solution of the homogeneous equation is the trivial one and the inverse of *P* is a function of slow growth 1/*P* ∈ O*<sup>M</sup>* . The only elementary solution of the equation is therefore the summable distribution

$$e = \mathcal{F}^{-1}\{\frac{1}{P}\}\ .$$

If *P* has a zero at ω<sub>p</sub> then the homogeneous equation has nontrivial solutions. In particular, we saw in Sect. 2.5.1 that if the multiplicity of the zero is *k* then the sums

$$\sum\_{m=0}^{k-1} c\_m \, D^m \delta(\omega - \omega\_p)$$

with *c*<sub>m</sub> constants, are all solutions of the Fourier transformed homogeneous equation *P* ŷ = 0. The solutions of the original homogeneous equation are found by inverse Fourier transformation to be

$$\sum_{m=0}^{k-1} \frac{c_m}{2\pi} \left(-jt\right)^m e^{j\omega_p t}\ .$$

The equation has therefore an infinity of elementary solutions. In addition, since 1/*P* ∉ O<sub>M</sub>, the solutions are not summable distributions.

Note that the equation may have non-tempered solutions that are not captured by Fourier transform techniques.

#### **Example 7.7: Cont. of Example 7.3**

We look for a tempered solution of the convolution equation of Example 7.3

$$(D\delta + a\delta) \* (D\delta - b\delta) \* \mathbf{y} = \mathbf{x}$$

assuming that the real parts of *a* and *b* are both positive. A tempered elementary solution is easily found by solving the Fourier transformed equation

$$
\hat{e}(\omega) = \frac{1}{P(\omega)} = \frac{1}{(j\omega + a)(j\omega - b)}\ .
$$

and determining its inverse

$$e(t) = \frac{-1}{a+b} \left[ \mathbb{1}_{+}(t) \, e^{-at} + \mathbb{1}_{+}(-t) \, e^{bt} \right].$$

Note that despite the similarity between *e*ˆ(ω) and *E*(*s*) of Example 7.3 the tempered elementary solution is different from the solution found in the convolution algebra D +.

Since *P*(ω) has no zeros, *e* is the only tempered elementary solution of the equation. Other solutions, obtained by adding any linear combination of the solutions of the homogeneous equation (*e*<sup>−*at*</sup> and *e*<sup>*bt*</sup>), grow exponentially as *t* tends either to ∞ or to −∞ and are therefore not tempered distributions.
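One can check by numerical quadrature that the two-sided function *e*(*t*) above indeed has Fourier transform 1/*P*(ω). A sketch with the assumed values *a* = *b* = 1 (chosen for illustration only):

```python
import numpy as np

a, b = 1.0, 1.0   # assumed values with positive real parts

def e(t):
    """Tempered elementary solution from the example (two-sided, decaying)."""
    t = np.asarray(t, dtype=float)
    return (-1.0 / (a + b)) * (np.where(t >= 0, np.exp(-a * t), 0.0)
                               + np.where(t < 0, np.exp(b * t), 0.0))

def e_hat_numeric(w, L=40.0, n=400001):
    """Composite-trapezoid approximation of the Fourier integral of e."""
    t = np.linspace(-L, L, n)
    f = e(t) * np.exp(-1j * w * t)
    dt = t[1] - t[0]
    return dt * (f.sum() - 0.5 * (f[0] + f[-1]))

# Compare with 1/P(w) = 1/((jw + a)(jw - b)) at a few frequencies.
for w in (0.0, 1.0, 3.0):
    exact = 1.0 / ((1j * w + a) * (1j * w - b))
    assert abs(e_hat_numeric(w) - exact) < 1e-4
```

Since *e* decays exponentially in both directions, truncating the integral to [−40, 40] introduces a negligible error.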

# **7.7 Systems of Convolution Equations**

One often has to solve a set of *n* simultaneous equations in *n* unknown distributions *y*1,..., *yn*

$$\begin{aligned} g\_{11} \* y\_1 + g\_{12} \* y\_2 + \cdots + g\_{1n} \* y\_n &= x\_1 \\ g\_{21} \* y\_1 + g\_{22} \* y\_2 + \cdots + g\_{2n} \* y\_n &= x\_2 \\ \vdots \\ g\_{n1} \* y\_1 + g\_{n2} \* y\_2 + \cdots + g\_{nn} \* y\_n &= x\_n \end{aligned}$$

with *g*<sub>jm</sub> coefficient distributions, *x*<sub>1</sub>,..., *x*<sub>n</sub> right-hand side distributions and where all distributions belong to a distribution algebra A. This system of equations can conveniently be written in matrix form

$$G \* Y = X \tag{7.13}$$

with *G* the *n* × *n* matrix with elements *g*<sub>jm</sub> and *Y*, *X* *vector valued distributions* (column matrices) with elements *y*<sub>j</sub> and *x*<sub>j</sub> respectively. The space of vector valued distributions is denoted by D′(R<sup>m</sup>, C<sup>n</sup>) and application of a test function φ ∈ D(R<sup>m</sup>) to a vector *X* is defined as the application of φ to each component individually

$$
\langle X, \phi \rangle := \begin{bmatrix} \langle \mathbf{x}\_1, \phi \rangle \\ \vdots \\ \langle \mathbf{x}\_n, \phi \rangle \end{bmatrix}.
$$

The determinant of the matrix *G* is defined as usual, with the convolution product replacing the standard product. It is a distribution belonging to the convolution algebra A. For example, the determinant of a 2 × 2 matrix *G* is

$$\det \begin{bmatrix} g_{11} & g_{12} \\ g_{21} & g_{22} \end{bmatrix} = g_{11} \ast g_{22} - g_{21} \ast g_{12}\ .$$

Suppose that the matrix *G* has an inverse *G*∗−<sup>1</sup>

$$G \ast G^{\ast -1} = \delta I$$

where δ *I* is the identity matrix with the unit of A on the diagonal and 0 everywhere else. If we compute the determinant of both sides of this equation we obtain

$$\det(G \ast G^{\ast -1}) = \det(G) \ast \det(G^{\ast -1}) = \det(\delta I) = \delta$$

from which we deduce that, if the matrix *G* has an inverse then det(*G*) has an inverse in A . Conversely, if det(*G*) has an inverse, then we can compute the inverse of *G* by

$$G^{\*-1} = \det(G)^{\*-1} \* \tilde{G}^T$$

with *G̃* the matrix of cofactors and *G̃*<sup>T</sup> its transpose.

We conclude that (7.13) has a solution for arbitrary right-hand side *X* if and only if det(*G*) has an inverse in A . The solution is given by

$$Y = G^{\*-1} \* X \,. \tag{7.14}$$

One shows in a similar way as for a single equation (see Sect. 7.2) that *G*∗−<sup>1</sup> and hence the solution of the equation is unique.
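The cofactor formula can be tried out in a discrete stand-in for the algebra A: causal sequences of length N under truncated convolution, whose unit is δ = [1, 0, 0, ...]. All sequence values below are made up for illustration:

```python
import numpy as np

N = 32

def conv(f, g):
    """Convolution truncated to the first N terms (quotient algebra)."""
    return np.convolve(f, g)[:N]

def conv_inv(g):
    """Truncated convolution inverse of g, assuming g[0] != 0."""
    inv = np.zeros(N)
    inv[0] = 1.0 / g[0]
    for n in range(1, N):
        inv[n] = -np.dot(g[1:n + 1], inv[n - 1::-1]) / g[0]
    return inv

def seq(*coeffs):
    s = np.zeros(N)
    s[:len(coeffs)] = coeffs
    return s

delta = seq(1.0)
g11, g12 = seq(1.0, 0.5), seq(0.0, 0.25)
g21, g22 = seq(0.0, 0.1), seq(1.0, -0.3)

# det(G) = g11 * g22 - g21 * g12 (convolution products) has an inverse ...
det = conv(g11, g22) - conv(g21, g12)
det_inv = conv_inv(det)
assert np.allclose(conv(det, det_inv), delta)

# ... so G^{*-1} = det(G)^{*-1} * (transposed matrix of cofactors):
h11, h12 = conv(det_inv, g22), conv(det_inv, -g12)
h21, h22 = conv(det_inv, -g21), conv(det_inv, g11)

# Check G * G^{*-1} = delta I entrywise.
assert np.allclose(conv(g11, h11) + conv(g12, h21), delta)
assert np.allclose(conv(g21, h12) + conv(g22, h22), delta)
assert np.allclose(conv(g11, h12) + conv(g12, h22), 0.0)
assert np.allclose(conv(g21, h11) + conv(g22, h21), 0.0)
```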

**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Chapter 8 Linear Time Invariant Systems**

We assume the reader to have familiarity with linear time-invariant (LTI) systems. In this chapter we merely summarise the main results of this theory. We are going to call the quantities that are considered the input, the output and some characterization of the system *signals*. This should evoke a meaningful interpretation in most of the systems that we are going to discuss. Mathematically they are distributions.

# **8.1 Basic Definitions**

The meaning of time-invariant is very intuitive: suppose that we apply the input signal *x*(*t*) to a system represented by an operator H and observe the signal

$$\mathbf{y}(t) = \mathcal{H}[\mathbf{x}(t)]$$

as its output (Fig. 8.1). The system is said to be *time-invariant* if by applying the delayed input signal *x*(*t* − τ ) we observe the same output signal as before, except for a delay in time by an amount τ , that is, if

$$\mathbf{y}(t-\tau) = \mathcal{H}[\mathbf{x}(t-\tau)]\,. \tag{8.1}$$

The concept of linearity is subtler. A defining property of a linear system is the validity of the *superposition principle*: if *y*1(*t*) is the response of the system to the input *x*1(*t*) and *y*2(*t*) the one to *x*2(*t*), then the response to a linear combination of these inputs is

$$\begin{split} \mathbf{y}(t) &= \mathcal{H}[c\_1 \mathbf{x}\_1(t) + c\_2 \mathbf{x}\_2(t)] = c\_1 \mathcal{H}[\mathbf{x}\_1(t)] + c\_2 \mathcal{H}[\mathbf{x}\_2(t)] \\ &= c\_1 \mathbf{y}\_1(t) + c\_2 \mathbf{y}\_2(t) \end{split} \tag{8.2}$$

with *c*<sup>1</sup> and *c*<sup>2</sup> constants. However, if we limit the definition of a linear system to this property, then we admit pathological systems as the following one.

**Fig. 8.1** Representation of a single-input single-output LTI system H

#### **Example 8.1: A Discontinuous System [22]**

Consider a system accepting as input a piece-wise continuous function with at most a finite number of isolated jump discontinuities. The system response consists of the sum of the input signal's jumps from −∞ to the present time *t*.

The system satisfies (8.2). However, the behaviour is rather peculiar. If we apply, say, a rectangular input then the output is also rectangular. But, if we approximate to any degree of accuracy the rectangular input with a continuous function, then the output is always zero.

To exclude systems with such a bizarre behavior, we require linear systems to be *continuous*: if as *<sup>m</sup>* <sup>∈</sup> <sup>N</sup> tends to <sup>∞</sup> the sequence of input signals *xm*(*t*) converges (in the sense of distributions) to the signal *x*(*t*), then the system response *ym*(*t*) corresponding to input *xm*(*t*) converges to the response *y*(*t*) corresponding to *x*(*t*).

Suppose that we apply an impulse δ(*t*) to the input of the system H and observe the signal *h*(*t*) at its output. Then, by linearity, if we apply a finite number of pulses the output must be

$$\mathcal{H}[\sum\_{j=1}^{n} a\_j \,\delta(t - \tau\_j)] = \sum\_{j=1}^{n} a\_j \, h(t - \tau\_j) = h(t) \* \sum\_{j=1}^{n} a\_j \, \delta(t - \tau\_j) \, .$$

In Sect. 3.3 we saw that every distribution can be represented as the limit of a finite series of Dirac impulses. From this and the linearity of convolution (Eq. (3.19)) we obtain that, in the limit as *n* tends to infinity, if the input converges to the signal *x*(*t*) the output of the system converges to

$$\mathbf{y}(t) = h(t) \* \mathbf{x}(t) \,.$$

We therefore define

**Definition 8.1** (*LTI System*) A single-input, single-output (SISO), linear time-invariant (LTI) system is a system that, when driven by an input signal *x*(*t*), produces the output

$$\mathbf{y}(t) = h(t) \* \mathbf{x}(t) \tag{8.3}$$

with *h*(*t*) the *impulse response* of the system.
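A minimal discrete-time sketch of this definition (the 3-tap moving-average impulse response and the input are illustrative assumptions):

```python
import numpy as np

# An LTI system is convolution with its impulse response h, cf. (8.3).
h = np.array([1/3, 1/3, 1/3])          # assumed impulse response
x = np.array([0.0, 1.0, 2.0, 1.0, 0.0])

y = np.convolve(h, x)

# Time invariance: delaying the input by one sample delays the output.
x_delayed = np.concatenate(([0.0], x))
y_delayed = np.convolve(h, x_delayed)
assert np.allclose(y_delayed[1:], y)

# Linearity: superposition of scaled inputs, cf. (8.2).
assert np.allclose(np.convolve(h, 2 * x + 3 * x), 2 * y + 3 * y)
```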

A system is called *real* if, when driven by a real distribution, its response is a real distribution. In other words, if its impulse response is a real distribution.

While we have been talking about signals depending on time, we can abstract from that and talk about signals depending on a generic *n* dimensional independent variable <sup>λ</sup> <sup>∈</sup> <sup>R</sup>*<sup>n</sup>*. In this case, instead of time-invariance, it makes more sense to adapt (8.1) to

$$y(\lambda - \tau) = \mathcal{H}[x(\lambda - \tau)]$$

and talk about *translation invariance*. A single-input single-output, linear translation-invariant system is then still described by a convolution product similar to (8.3) where, however, the independent variable *t* is replaced by the abstract *n*-dimensional variable λ. We are going to call a system of this type an LTI system as well.

# **8.2 Causality**

Assume for simplicity that *h* and *x* are integrable functions of time. The response of a system characterized by *h* when driven by the input *x* can then be written in integral form

$$y(t) = \int_{-\infty}^{\infty} h(\tau) \, x(t - \tau) \, d\tau = \int_{-\infty}^{\infty} h(t - \tau) \, x(\tau) \, d\tau\ .$$

Suppose now that the input vanishes for *t* < 0. Then from

$$y(t) = \int\_0^\infty h(t - \tau) \, x(\tau) \, d\tau$$

we see that in general the system may produce a nonzero response *y*(*t*) for *t* < 0, that is, before the input signal *x*(*t*) has been applied.

If a system is *causal*, that is, if its output at time *t*<sup>0</sup> can only depend on values of the input signal at times *t* ≤ *t*0, then its impulse response *h*(*t*) must vanish for *t* < 0. In other words *h* must be a right-sided distribution in D +.

Note that in our interpretation of signals as functions of time, non-causal systems are not physically implementable and appear to be meaningless. However, non-causal systems are sometimes useful in theoretical studies. In addition, in many situations the theory of LTI systems can be applied to systems where the quantities of interest (the input and output) are not functions of time (see Example 7.6).

# **8.3 Stability**

An important aspect of a system is its stability. Let *x*(*t*) be a *bounded function*, that is, satisfying

$$\|x\|_{\infty} := \sup_{t \in \mathbb{R}} |x(t)| < \infty.$$

The response of a system characterized by the impulse response *h*(*t*) to such an input signal is

$$\mathbf{y}(t) = h(t) \* \mathbf{x}(t) \; .$$

The output *y*(*t*) is well-defined if

$$\langle h \ast x, \phi \rangle < \infty$$

for every test function φ ∈ D and for every sequence (φ*m*) converging to zero in D

$$\lim_{m \to \infty} \langle h \ast x, \phi_m \rangle = 0\ .$$

In this case we say that the system is *bounded-input bounded-output (BIBO) stable*.

For a system to be BIBO stable

$$\langle h(t) \ast x(t), \phi(t) \rangle = \langle h(t), \int_{\mathbb{R}} x(\tau) \, \phi(t + \tau) \, d\tau \rangle$$

must have a meaning. Observe that the inner integral is an indefinitely differentiable bounded function. For the convolution to have a meaning the impulse response of the system must therefore be extensible to a continuous linear form on B. As we saw in Sect. 6.1 this is only the case if *h* is a summable distribution. Thus, *for a system to be BIBO stable, its impulse response must be a summable distribution*.
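A discrete analogue makes the role of summability concrete: for an absolutely summable impulse response, the output magnitude is bounded by ‖*h*‖<sub>1</sub>‖*x*‖<sub>∞</sub>. A sketch with an assumed exponentially decaying *h* and a random bounded input:

```python
import numpy as np

rng = np.random.default_rng(1)
h = 0.8 ** np.arange(50)          # absolutely summable impulse response
x = rng.uniform(-1.0, 1.0, 200)   # bounded input, ||x||_inf <= 1

y = np.convolve(h, x)

# BIBO bound: ||y||_inf <= ||h||_1 ||x||_inf (small slack for rounding).
assert np.max(np.abs(y)) <= np.sum(np.abs(h)) * np.max(np.abs(x)) + 1e-12
```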

We mention without going into details that the definition of a BIBO stable system can be extended to input signals that are so-called bounded distributions and usually denoted by B or D *<sup>L</sup>*<sup>∞</sup> [16].

The series connection, or cascade of two stable systems results in a stable system. This is so because the convolution of summable distributions is always well-defined and is itself a summable distribution. In addition, for linear systems the order of the connection is irrelevant as, if *h <sup>A</sup>* and *hB* are the impulse responses of the two systems

$$h_A \ast h_B = h_B \ast h_A\ .$$

# **8.4 Transfer Function**

# *8.4.1 Stable Systems*

If a system is stable then its impulse response *h* can be Fourier transformed and the transformed *h*ˆ is a continuous function of slow growth called the *frequency response* of the system. If the input signal *x* is also a summable distribution then it can also be Fourier transformed and the Fourier transform of the output signal can be represented by the product

$$
\hat{y}(\omega) = \hat{h}(\omega)\hat{x}(\omega)\,. \tag{8.4}
$$

If the input signal *x* is T-periodic, then the system can be analysed in the convolution algebra of periodic distributions. To do so the impulse response *h* is converted into a periodic distribution by convolving it with the unit of the convolution algebra of periodic distributions δ<sub>T</sub>

$$h_{\mathcal{T}} := h \ast \delta_{\mathcal{T}}\ .$$

Provided that *h*<sup>T</sup> is well-defined, which for stable systems is always the case, then the output of the system can be represented by

$$y = h_{\mathcal{T}} \ast x\ .$$

Note that while the convolution used to define *h*<sup>T</sup> is the convolution in D (R), the latter is the convolution in D (T). As discussed in Sect. 7.5, the equation is most conveniently solved with the help of the Fourier series. If we denote by *cm* (*y*), *cm*(*h*<sup>T</sup> ) and *cm*(*x*) the *m*th Fourier coefficient of *y*, *h*<sup>T</sup> and *x* respectively, then the equation is solved if

$$c\_m(\mathbf{y}) = \mathcal{T}c\_m(h\_{\mathcal{T}})c\_m(\mathbf{x})$$

for every *<sup>m</sup>* <sup>∈</sup> <sup>Z</sup>. From (4.24) we know that

$$c_m(h_{\mathcal{T}}) = \frac{\hat{h}(m\omega_c)}{\mathcal{T}}$$

with ω*<sup>c</sup>* = 2π/T. Therefore, by knowing the Fourier transform of the impulse response we can immediately obtain the Fourier coefficients of the output signal by

$$c_m(y) = \hat{h}(m\omega_c)\,c_m(x)\,. \tag{8.5}$$

In particular, if the input is the complex tone e<sup>jω<sub>c</sub>t</sup>, the output is also a complex tone at the exact same frequency

$$y(t) = \hat{h}(\omega_c)\, e^{j\omega_c t}\ .$$

If the input of the system is the sum of two (or more) periodic signals *xA* and *xB* with incommensurate frequencies ω*<sup>A</sup>* and ω*B*, that is, if the ratio of the two frequencies ω*A*/ω*<sup>B</sup>* is an irrational number, then the input signal is not periodic, but *almost periodic*. Due to the linearity and continuity of the system, the response can still be calculated by the above technique for each input separately and the result combined

$$y(t) = \sum_{m=-\infty}^{\infty} \hat{h}(m\omega_A)\, c_m(x_A)\, e^{jm\omega_A t} + \hat{h}(m\omega_B)\, c_m(x_B)\, e^{jm\omega_B t}\ .$$
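The same picture can be sketched in discrete time: a complex tone through an LTI system is scaled by the frequency response at the tone's frequency, and no new frequencies appear. The FIR impulse response and tone frequency below are illustrative assumptions:

```python
import numpy as np

h = np.array([0.5, 0.3, 0.2])      # assumed FIR impulse response
w = 0.7                            # tone frequency (rad/sample)
n = np.arange(100)
x = np.exp(1j * w * n)             # complex tone input

y = np.convolve(h, x)[:len(n)]
# Frequency response: H(w) = sum_k h[k] exp(-j w k)
H = np.sum(h * np.exp(-1j * w * np.arange(len(h))))

# After the transient (first len(h)-1 samples) the output equals H * x.
assert np.allclose(y[len(h) - 1:], H * x[len(h) - 1:])
```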

# *8.4.2 Causal Systems*

If the system is causal, that is, if its impulse response *h* is a distribution in D <sup>+</sup>, and one is interested in the system response for right-sided input signals *x* ∈ D <sup>+</sup>, then the system response *y* can be calculated in the convolution algebra D <sup>+</sup>. In particular, if *h* and *x* are Laplace transformable then the Laplace transformed of the output signal can be calculated by

$$Y(\mathbf{s}) = H(\mathbf{s})X(\mathbf{s})\,. \tag{8.6}$$

The Laplace transformed *H*(*s*) of the impulse response *h* is called the system *transfer function*.

If the system is BIBO stable, then the ROC of *H*(*s*) includes the imaginary axis *s* = *j*ω. In this case the Fourier transformed of *h* is immediately obtained from the transfer function by

$$
\hat{H}(\omega) = H(j\omega) \,. \tag{8.7}
$$

Note that if the system is not BIBO stable then this relation is not valid even if the Fourier transform of *h* does exist. See Example 5.4 for a simple example where the system corresponds to an ideal integrator.

In the following we are going to denote distributions belonging to D <sup>+</sup> ∩ D *<sup>L</sup>*<sup>1</sup> by D *<sup>L</sup>*1+.

# **8.5 Rational Transfer Functions**

Consider a causal system described by a rational transfer function

$$H(s) = \frac{N(s)}{P(s)} = \frac{b\_n s^n + b\_{n-1} s^{n-1} + \dots + b\_0}{s^m + a\_{m-1} s^{m-1} + \dots + a\_0}.$$

Given the Laplace transform *X*(*s*) of the input signal *x*, the Laplace transformed of the output is

$$Y(s) = \frac{N(s)}{P(s)}\, X(s)\ .$$

If we multiply both sides of this equation by *P*(*s*) we obtain

$$P(\mathbf{s})Y(\mathbf{s}) = N(\mathbf{s})X(\mathbf{s}).$$

and by inverse Laplace transforming the equation we obtain the convolution equation

$$\begin{aligned} &\left(D^m\delta + a_{m-1}D^{m-1}\delta + \cdots + a_0\delta\right) \ast y \\ &= \left(b_n D^n\delta + b_{n-1}D^{n-1}\delta + \cdots + b_0\delta\right) \ast x \end{aligned}$$

With the results of Sect. 7.3 we see that this equation corresponds to the initial value problem described by the linear differential equation with constant coefficients

$$L\mathbf{y}(t) = \mathbf{x}\_a(t)$$

with

$$L = D^m + a\_{m-1}D^{m-1} + \dots + a\_0,$$

$$x_a(t) = (b_n D^n + b_{n-1} D^{n-1} + \dots + b_0)\,x(t).$$

and zero initial conditions

$$(D^k y)(0) = 0, \quad k = 0, \dots, m-1\ .$$

For this reason *y*(*t*) = *h*(*t*) ∗ *x*(*t*) is called the *zero state response* of the system.

It is obvious that the procedure can be reversed. We have therefore established a one-to-one correspondence between systems described by a rational transfer function and systems described by a linear differential equation with constant coefficients and zero initial conditions.

If the transfer function *H* of the system is *minimal*, that is, if its numerator and its denominator are relatively prime polynomials, then, in the complement of *t* = 0, it is possible to recreate the same output that would be produced by solving the corresponding initial value problem with *non-zero initial conditions*. This is achieved by driving the system with an input signal consisting of a weighted sum of a Dirac pulse and its derivatives

$$\mathbf{x} = \mathbf{x}\_{m-1} D^{m-1} \boldsymbol{\delta} + \dots + \boldsymbol{x}\_0 \boldsymbol{\delta}$$

and by suitably selecting the weighting coefficients *x*0,..., *xm*−<sup>1</sup> as described in Sect. 7.3 (see Example 7.4). Such a system is said to have *order m* and to be *observable* and *controllable* (see Sect. 8.6).

If *H*(*s*) is a proper rational transfer function, that is, if *n* < *m*, then it can be expanded into a sum of partial fractions of the form

$$\frac{c\_{jk\_j}}{(s-p\_j)^{k\_j}}, \qquad k\_j = 1, \ldots, l\_j$$

with *p*<sub>j</sub> the *j*th zero of *P*(*s*), *l*<sub>j</sub> its multiplicity and *c*<sub>jk<sub>j</sub></sub> constants. From Example 7.2 and the properties of the Laplace transform we therefore see that the impulse response *h* is the sum of products of polynomials and exponential functions. In particular, we see that the system is stable if the real parts of the poles of *H*(*s*) are negative

$$
\Re\{p_j\} < 0\ .
$$

If *n* is not smaller than *m* then *H*(*s*) can be decomposed into the sum of a polynomial and a proper rational function. The impulse response *h* is then the sum of the above polynomial-exponential functions and a weighted sum of Dirac impulses and its derivatives.
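The pole criterion is easy to apply numerically. A sketch for an assumed transfer function *H*(*s*) = (*s* + 2)/(*s*<sup>2</sup> + 3*s* + 2) (not from the text):

```python
import numpy as np

# Assumed rational transfer function H(s) = N(s)/P(s).
num = [1.0, 2.0]        # N(s) = s + 2
den = [1.0, 3.0, 2.0]   # P(s) = s^2 + 3 s + 2

# Poles are the zeros of the denominator; stability needs Re{p} < 0.
poles = np.roots(den)
assert np.all(poles.real < 0)       # poles at -1 and -2: stable

# The impulse response is a sum of exponentials exp(p t); each one decays.
t = 5.0
assert all(abs(np.exp(p * t)) < 1 for p in poles)
```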

# **8.6 System State**

In this section we review the concept of the state of a system. To this end consider the initial value problem described by the system of *n* differential equations

$$\frac{d}{dt}u = Au + x, \qquad u(0) = u_0 \in \mathbb{C}^n$$

with *A* ∈ C<sup>n×n</sup> an *n* × *n* matrix and *u* and *x* *n*-dimensional vectors of complex valued functions of time. As before we can translate this initial value problem into the language of distributions by replacing the (conventional) derivative with the distributional one and work in the convolution algebra of right sided distributions

$$Du = Au + u_0\delta + x\ .$$

If we rearrange the equation and convolve each term with *I*𝟙<sub>+</sub> we obtain the equivalent equation

$$(I\delta - A\mathfrak{1}\_{+}) \* \mathfrak{u} = I\mathfrak{1}\_{+} \* (\mathfrak{u}\_{0}\delta + \mathfrak{x})\,. \tag{8.8}$$

This form shows that the equation can be solved by left convolving both sides of the equation with the inverse of (*I* δ − *A*1+). Observing the analogy with the geometric series, provided it converges, the latter can be represented by the following series, where the standard product of the geometric series has been replaced by the convolution product

$$(I\delta - A\mathbb{1}_+)^{\ast -1} = I\delta + A\mathbb{1}_+ + (A\mathbb{1}_+)^{\ast 2} + \cdots\ .$$

The iterated convolutions are easily evaluated

$$(A\mathbb{1}_+)^{\ast n} = A^n\, \mathbb{1}_+^{\ast n} = A^n\, \frac{t^{n-1}}{(n-1)!}\,\mathbb{1}_+$$

and using the identity

$$\mathbb{1}_{+}^{\ast n} = \mathbb{1}_{+}^{\ast n} \ast \delta = \mathbb{1}_{+}^{\ast n} \ast \mathbb{1}_{+} \ast D\delta = \mathbb{1}_{+}^{\ast (n+1)} \ast D\delta$$

we obtain

$$(I\delta - A\mathbb{1}_{+})^{\ast -1} = I\delta + \sum_{n=1}^{\infty} A^{n} \frac{t^{n-1}}{(n-1)!}\,\mathbb{1}_{+} = \left[\sum_{n=0}^{\infty} A^{n} \frac{t^{n}}{n!}\,\mathbb{1}_{+}\right] \ast D\delta\ .$$

The last series can be expressed with the help of the *exponential matrix* defined by

$$\mathbf{e}^{At} := \sum\_{n=0}^{\infty} A^n \frac{t^n}{n!} \tag{8.9}$$

which converges for every value of *t*

$$(I\delta - A\mathbb{1}_{+})^{\ast -1} = \mathbb{1}_{+}\, \mathbf{e}^{At} \ast D\delta\,. \tag{8.10}$$

Having established the convergence of the series, using the linearity and continuity of convolution one readily sees that indeed it defines the desired inverse

$$(I\delta - A\mathbb{1}_+) \ast [I\delta + A\mathbb{1}_+ + (A\mathbb{1}_+)^{\ast 2} + \cdots] = I\delta\ .$$

The solution of the equation is therefore given by

$$u = \mathbb{1}_{+}\, \mathbf{e}^{At} \ast I(D\delta \ast \mathbb{1}_{+}) \ast (u_{0}\delta + x) = \mathbb{1}_{+}\, \mathbf{e}^{At} u_{0} + \mathbb{1}_{+}\, \mathbf{e}^{At} \ast x\,. \tag{8.11}$$

The exponential matrix has several useful properties that are immediately verified using its defining series

$$\begin{aligned} \mathbf{e}^{At}\mathbf{e}^{A\tau} &= \mathbf{e}^{A(t+\tau)} & \mathbf{e}^{A0} &= I\\ (\mathbf{e}^{At})^{-1} &= \mathbf{e}^{-At} & D\mathbf{e}^{At} &= A\mathbf{e}^{At} = \mathbf{e}^{At}A \end{aligned}$$

Note however that in general

$$\mathbf{e}^A \mathbf{e}^B \neq \mathbf{e}^{A+B}.$$

This is only valid if *A* and *B* commute, that is *AB* = *B A*.
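The defining series (8.9) and these properties can be checked numerically. A sketch with assumed small matrices (the series is truncated, which is adequate here because ‖*At*‖ is small):

```python
import numpy as np

def expm_series(A, t, terms=60):
    """Exponential matrix e^{At} by truncating the defining series (8.9)."""
    n = A.shape[0]
    out = np.eye(n)
    term = np.eye(n)
    for k in range(1, terms):
        term = term @ A * (t / k)     # accumulates A^k t^k / k!
        out = out + term
    return out

A = np.array([[0.0, 1.0], [-2.0, -3.0]])
t, dt = 0.5, 1e-6

# Semigroup property e^{At} e^{At} = e^{A 2t}.
assert np.allclose(expm_series(A, t) @ expm_series(A, t), expm_series(A, 2 * t))

# D e^{At} = A e^{At}, checked by a finite difference.
dEdt = (expm_series(A, t + dt) - expm_series(A, t)) / dt
assert np.allclose(dEdt, A @ expm_series(A, t), atol=1e-4)

# e^A e^B = e^{A+B} fails in general when A and B do not commute.
B = np.array([[0.0, 0.0], [1.0, 0.0]])
assert not np.allclose(expm_series(A, 1.0) @ expm_series(B, 1.0),
                       expm_series(A + B, 1.0))
```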

Consider now the *state space representation* of a SISO LTI system

$$Du = Au + u_0 \delta + Bx, \qquad A \in \mathbb{C}^{n \times n}, \; B \in \mathbb{C}^{n \times 1} \tag{8.12}$$

$$y = Cu + Dx, \qquad C \in \mathbb{C}^{1 \times n}, \; D \in \mathbb{C} \tag{8.13}$$

where now *x* represents the input signal of the system and *y* its output. The vector *u* is called the *state* of the system and (8.11) shows that its value *u*<sub>0</sub> at a given point in time *t*<sub>0</sub> is the minimum amount of information that, together with the input signal at times *t* ≥ *t*<sub>0</sub>, allows determining the system behaviour at all future times *t* > *t*<sub>0</sub>. In other words, the system state *u*<sub>0</sub> at time *t*<sub>0</sub> summarises the effect on the system of all past values of the input signal and of previous states.

# *8.6.1 Controllability*

It's interesting to ask if it's possible to design the input signal in such a way that the system can be set in an arbitrary state *u*<sub>0</sub> in finite time. That is, can we design the input signal such that for *t* > *t*<sub>0</sub> the state vector equals *u*(*t*) = e<sup>*At*</sup>*u*<sub>0</sub>?

The problem is most easily analysed using impulsive inputs, starting from the zero state. From the above results we know that the system state dependence on the input signal *x* is given by

$$
u = \mathbb{1}_+\, \mathbf{e}^{At} B \ast x\ .
$$

Suppose that for an *n* dimensional system we use an input signal consisting of a weighted sum of a Dirac impulse and its derivatives up to order *n* − 1

$$x = x_0\delta + x_1 D\delta + \cdots + x_{n-1}D^{n-1}\delta\ .$$

Since the system is linear, we can analyse the contribution of each term individually

$$\begin{aligned} \mathbb{1}_{+}\mathbf{e}^{At} B \ast x_{0}\delta &= \mathbb{1}_{+}\mathbf{e}^{At} B x_{0} \\ \mathbb{1}_{+}\mathbf{e}^{At} B \ast x_{1} D\delta &= D(\mathbb{1}_{+}\mathbf{e}^{At} B x_{1}) = \mathbb{1}_{+}\mathbf{e}^{At} A B x_{1} + \delta B x_{1} \\ &\cdots \\ \mathbb{1}_{+}\mathbf{e}^{At} B \ast x_{n-1} D^{n-1}\delta &= D^{n-1}(\mathbb{1}_{+}\mathbf{e}^{At} B x_{n-1}) = \mathbb{1}_{+}\mathbf{e}^{At} A^{n-1} B x_{n-1} + \cdots \end{aligned}$$

The terms replaced by dots on the last line consist of a weighted sum of a Dirac impulse and its derivatives, which are zero for *t* > 0. Putting all terms together we obtain for *t* > 0

$$\mathbb{1}_{+}\mathbf{e}^{At} B \ast x = \mathbb{1}_{+}\mathbf{e}^{At} \left[ B \;\; AB \;\; \cdots \;\; A^{n-1}B \right] \begin{bmatrix} x_{0} \\ x_{1} \\ \vdots \\ x_{n-1} \end{bmatrix}$$

From this we conclude that we can use a suitably designed input signal *x* to mimic the effect of an arbitrary initial state *u*<sup>0</sup> if and only if the matrix

$$\mathcal{C} := \left[ B \ A B \ \dots \ A^{n-1} B \right] \tag{8.14}$$

is invertible, in which case the weighting factors are

$$\begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_{n-1} \end{bmatrix} = \mathcal{C}^{-1} u_0\ .$$

The matrix C is called the *controllability matrix*.
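Building (8.14) and checking its invertibility is a one-liner in practice. A sketch for an assumed two-dimensional system (matrices chosen for illustration):

```python
import numpy as np

A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])

# Controllability matrix (8.14): [B  AB].
C = np.hstack([B, A @ B])
assert np.linalg.matrix_rank(C) == 2    # full rank: the system is controllable

# Weights x_0, x_1 of the impulsive input mimicking an initial state u_0.
u0 = np.array([1.0, -1.0])
x = np.linalg.solve(C, u0)
assert np.allclose(C @ x, u0)
```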

While the state of a system plays an important theoretical and conceptual role, in practice, when dealing with controllable systems we can always start from the zero state and drive the system into any desired state. Things are completely different for non-controllable systems. As discussed in Sect. 8.6.3, these are systems possessing sub-systems that are not influenced by the input signal. In those systems the initial state may play an important role.

# *8.6.2 Observability*

Another interesting question is whether it's possible to reconstruct the initial state of a system at time *t*<sub>0</sub> from the observation of its output at times *t* > *t*<sub>0</sub>, assuming that *A*, *B*, *C*, *D* and the input signal *x* are known. From linearity and knowledge of the input signal we can assume *x* to be zero. (Alternatively we could compute the part of the output signal due to the input signal, the zero state response of the system, and subtract it from the observed output.) The question is then if we can calculate *u*<sub>0</sub> from the observation of

$$\mathbf{y} = \mathbf{C}\mathbf{1}\_{+}\mathbf{e}^{At}u\_{0}.$$

Suppose that the system is *n* dimensional. Then if we compute the first *n* − 1 derivatives of the output signal we obtain

$$D\mathbf{y} = C\mathbf{1}\_{+}\mathbf{e}^{At}Au\_0 + C\delta u\_0$$

$$\cdots$$

$$D^{n-1}\mathbf{y} = C\mathbf{1}\_{+}\mathbf{e}^{At}A^{n-1}u\_0 + \cdots$$

where in the last equation we have represented by dots a weighted sum of a Dirac pulse and its derivatives as before. Thus, the observation of the output signal and of its first *n* − 1 derivatives at times *t* > 0 allows setting up the following system of equations

$$\lim\_{t \to 0^+} \begin{bmatrix} \mathbf{y}(t) \\ D\mathbf{y}(t) \\ \vdots \\ D^{n-1}\mathbf{y}(t) \end{bmatrix} = \begin{bmatrix} C \\ CA \\ \vdots \\ CA^{n-1} \end{bmatrix} u\_0\,.$$

This system of equations can only be solved for *u*<sub>0</sub> if the matrix

$$\mathcal{O} := \begin{bmatrix} C \\ CA \\ \vdots \\ CA^{n-1} \end{bmatrix} \tag{8.15}$$

is not singular. The matrix $\mathcal{O}$ is called the *observability matrix*.
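The rank tests behind (8.14) and (8.15) are straightforward to carry out numerically. The following sketch (using NumPy; the two-state system matrices are hypothetical choices made for illustration) builds both matrices and checks them for full rank:

```python
import numpy as np

def controllability_matrix(A, B):
    """Stack [B, AB, ..., A^{n-1}B] column-wise, as in (8.14)."""
    n = A.shape[0]
    cols = [B]
    for _ in range(n - 1):
        cols.append(A @ cols[-1])
    return np.hstack(cols)

def observability_matrix(A, C):
    """Stack [C; CA; ...; CA^{n-1}] row-wise, as in (8.15)."""
    n = A.shape[0]
    rows = [C]
    for _ in range(n - 1):
        rows.append(rows[-1] @ A)
    return np.vstack(rows)

# Hypothetical two-state example: a Jordan block with eigenvalue -1.
A = np.array([[-1.0, 1.0],
              [0.0, -1.0]])
B = np.array([[0.0],
              [1.0]])        # input enters only the last state variable
C = np.array([[1.0, 0.0]])   # output taps only the first state variable

Cmat = controllability_matrix(A, B)
Omat = observability_matrix(A, C)
# Full rank <=> controllable / observable.
print(np.linalg.matrix_rank(Cmat), np.linalg.matrix_rank(Omat))
```

A rank below *n* flags the corresponding defect: for instance, swapping in B = [1, 0]ᵀ makes the pair above uncontrollable, in line with the Jordan-form discussion of Sect. 8.6.3.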

# *8.6.3 Jordan Normal Form*

The simplest way to understand the structure of a system that is either not controllable or not observable is to consider the system in Jordan normal form.

Consider a system in the state space representation

$$\begin{aligned} Du &= Au + Bx, & \quad A \in \mathbb{C}^{n \times n}, \; B \in \mathbb{C}^{n \times 1}, \\ \mathbf{y} &= Cu \quad \quad \quad \quad C \in \mathbb{C}^{1 \times n}. \end{aligned}$$

In linear algebra it is shown that, by choosing a suitable basis, every linear operator can be represented by a matrix of the following block form, called the *Jordan normal form*

$$A = \begin{bmatrix} J\_1 & & & 0 \\ & J\_2 & & \\ & & \ddots & \\ & & & J\_r \end{bmatrix}$$

**Fig. 8.2** Jordan normal form representation of a system

with

$$J\_i = \begin{bmatrix} \lambda\_i & 1 & & 0 \\ & \lambda\_i & 1 \\ & & \ddots & \ddots \\ 0 & & & \lambda\_i \end{bmatrix}$$

the *elementary Jordan matrix*. The diagonal elements of *J<sub>i</sub>* all correspond to the *i*th eigenvalue λ<sub>*i*</sub> of *A*. If *n<sub>i</sub>* denotes the algebraic multiplicity of the eigenvalue λ<sub>*i*</sub> and ν<sub>*i*</sub> its geometric multiplicity, then there are ν<sub>*i*</sub> Jordan blocks corresponding to the eigenvalue λ<sub>*i*</sub>. Thus, the total number of Jordan blocks equals the number of independent eigenvectors of *A*. The Jordan normal form of a linear operator is unique up to permutations of the blocks.

A matrix for which the geometric multiplicity equals the algebraic multiplicity for each eigenvalue is called *semisimple*. In this case each block *J<sub>i</sub>* is a 1 × 1 matrix and the Jordan normal form reduces to a diagonal form.

A system in Jordan normal form can be interpreted as the parallel connection of independent sub-systems, each represented by a Jordan block *J<sub>i</sub>*. Figure 8.2 shows the block diagram for a system with a simple eigenvalue λ<sub>0</sub> and a double eigenvalue λ<sub>1</sub> with ν<sub>1</sub> = 1. From the figure it's easy to see that if *b*<sub>0</sub> = 0 then the state variable *u*<sub>0</sub> can't be excited by the input signal *x*. The same is true for *u*<sub>2</sub> if *b*<sub>2</sub> = 0. In either case the system is not controllable. One can check that these are the two conditions under which the determinant of the matrix $\mathcal{C}$ vanishes.

In a similar way the figure shows that if *c*<sub>0</sub> = 0 there is no path from *u*<sub>0</sub> to the output of the system, and if *c*<sub>1</sub> = 0 there is no path from *u*<sub>1</sub>. These are the two cases in which the system is not observable, and they correspond to the two conditions under which the determinant of the matrix $\mathcal{O}$ vanishes.

From these considerations we conclude that a non-observable system includes a sub-system whose output does not reach the global system output as schematically depicted in Fig. 8.3b. A non-controllable system includes a sub-system that is not reached by the input signal as schematically depicted in Fig. 8.3a.

#### **Example 8.2: Jordan Block**

Consider the system described by the following state-space representation

$$\begin{aligned} Du &= Au + Bx \\ \mathbf{y} &= Cu \end{aligned}$$

with

$$A = \begin{bmatrix} \alpha\_{3dB} & 1\\ 0 & \alpha\_{3dB} \end{bmatrix}, \qquad \qquad B = \begin{bmatrix} b\_0\\ b\_1 \end{bmatrix}, \qquad \qquad C = \begin{bmatrix} c\_0 \ c\_1 \end{bmatrix}.$$

We want to compute an explicit expression for the exponential matrix e<sup>*tA*</sup>, allowing us to compute the response of the system to an arbitrary input signal *x*.

The matrix

$$A = \begin{bmatrix} \alpha\_{3dB} & 1 \\ 0 & \alpha\_{3dB} \end{bmatrix}$$

is an elementary Jordan matrix and can't be transformed in a diagonal matrix by a similarity transformation. In fact, as can be seen from the characteristic polynomial

$$\det(A - \lambda I) = (\alpha\_{3dB} - \lambda)^2,$$

the matrix has a single eigenvalue λ = α<sub>3dB</sub> with an algebraic multiplicity of 2, and the eigenspace belonging to this eigenvalue has dimension 1

$$(A - \alpha\_{3dB} I)v = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} v = 0 \qquad \Longrightarrow \qquad v = a \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad a \in \mathbb{C}\,.$$

The matrix *A* can however be written as the sum of a diagonal matrix *Ad* and a particularly simple matrix *Ac*

$$A = A\_d + A\_c = \begin{bmatrix} \alpha\_{3dB} & 0 \\ 0 & \alpha\_{3dB} \end{bmatrix} + \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}.$$

Observe that the matrices *Ad* and *Ac* do commute. For this reason we can use the following property of the exponential matrix

$$\mathbf{e}^{t(A\_d+A\_c)} = \mathbf{e}^{tA\_d}\mathbf{e}^{tA\_c}\,.$$

Since *Ad* is diagonal, the first exponential matrix e*t Ad* is easily calculated to be

$$\mathbf{e}^{tA\_d} = \mathbf{e}^{\alpha\_{3dB}t}I\,.$$

The second exponential matrix e*t Ac* is easily calculated from the series defining the exponential matrix by noting that the square of the matrix *Ac* vanishes

$$\mathbf{e}^{tA\_c} = I + tA\_c\,.$$

Putting these results together we obtain

$$\mathbf{e}^{tA} = \mathbf{e}^{\alpha\_{3dB}t} \left( \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + \begin{bmatrix} 0 & t \\ 0 & 0 \end{bmatrix} \right) = \mathbf{e}^{\alpha\_{3dB}t} \begin{bmatrix} 1 & t \\ 0 & 1 \end{bmatrix}.$$

The above method can be used to calculate the exponential of any elementary Jordan matrix, with the only modification that for an *n* × *n* matrix *A* it is the *n*th power of the matrix *A<sub>c</sub>* that vanishes.
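The computation generalizes readily; the following NumPy sketch (with λ and *t* chosen arbitrarily) exploits the fact that the nilpotent part *A<sub>c</sub>* makes the exponential series terminate after *n* terms:

```python
import numpy as np

def expm_jordan(lam, n, t):
    """exp(tJ) for an n x n elementary Jordan block J = lam*I + A_c.
    Since A_c is nilpotent (A_c^n = 0), exp(t A_c) is a finite sum and
    exp(tJ) = exp(lam*t) * exp(t A_c)."""
    A_c = np.diag(np.ones(n - 1), k=1)   # ones on the superdiagonal
    E = np.zeros((n, n))
    term = np.eye(n)
    for j in range(n):                   # series terminates: n terms suffice
        E += term
        term = term @ (t * A_c) / (j + 1)
    return np.exp(lam * t) * E

t, lam = 0.7, -2.0
# the 2 x 2 case reproduces e^{lam t} [[1, t], [0, 1]] derived above
print(np.allclose(expm_jordan(lam, 2, t),
                  np.exp(lam * t) * np.array([[1.0, t], [0.0, 1.0]])))
```

For *n* = 3 the same routine yields the expected upper-triangular factor with entries 1, *t* and *t*²/2.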

In the following we are always going to assume that the systems under consideration are controllable and observable.

**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Chapter 9 Weakly Nonlinear Time Invariant Systems**

# **9.1 Introduction**

As outlined in Chap. 1, the behavior of nonlinear systems is substantially richer than that of linear systems. To deal with them there is a set of techniques, each best suited to analyse particular aspects or particular classes of nonlinear systems. We target systems that are stable about an equilibrium point and that depend continuously on the input signal.

Before analysing this class of systems in more detail, we give a short overview, mostly by way of examples, of systems described by nonlinear ordinary differential equations of the form

$$D\mathbf{y} = f(t, \mathbf{y}), \qquad f: I \times X \to \mathbb{R}^n \tag{9.1a}$$

with $I \subset \mathbb{R}$, $X \subset \mathbb{R}^n$ and initial conditions

$$\mathbf{y}(\mathbf{0}) = \mathbf{y}\_0 \in X. \tag{9.1b}$$

We limit ourselves to the aspects that are helpful in better framing the concept of weakly nonlinear systems.

A first important difference compared to systems described by linear differential equations with constant coefficients is that a solution may not exist for all *t* > 0 or may not be unique.

#### **Example 9.1: IVP with many Solutions**

Consider the following initial value problem (IVP)

$$D\mathbf{y} = \sqrt{|\mathbf{y}|}\,, \qquad \mathbf{y}(0) = \mathbf{y}\_0\,.$$


If *y*<sub>0</sub> > 0 then the equation can be solved by the method of separation of variables, and we obtain the unique solution

$$\mathbf{y}(t) = \frac{1}{4}\left(t + 2\sqrt{\mathbf{y}\_0}\right)^2, \qquad t \ge 0\,.$$

If *y*<sub>0</sub> = 0 then *y*(*t*) = 0 is a solution. However, it is not the only one. For any constant *c* > 0 the function

$$y\_c(t) = \frac{\mathbf{1}\_+(t-c)}{4}(t-c)^2, \qquad t \ge 0$$

is also a solution as one easily verifies by inserting it in the equation.

For *y*<sub>0</sub> < 0 we can again use the method of separation of variables to find the solution

$$\mathbf{y}(t) = -\frac{1}{4}\left(2\sqrt{|\mathbf{y}\_0|} - t\right)^2.$$

However, because at *y* = 0 the function $1/\sqrt{|y|}$ is not continuous (not even defined), this solution is only valid as long as *y*(*t*) < 0. When *y*(*t*) reaches zero the equation can again be satisfied by multiple solutions

$$\mathbf{y}\_c(t) = \begin{cases} -\frac{1}{4}(2\sqrt{|\mathbf{y}\_0|} - t)^2 & t \in [0, 2\sqrt{|\mathbf{y}\_0|}), \\ 0 & t \in [2\sqrt{|\mathbf{y}\_0|}, c), \\ \frac{1}{4}(t - c)^2 & t \in [c, \infty). \end{cases}$$

Therefore, for some initial conditions the equation has uncountably many solutions (Fig. 9.1).
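A quick numerical check makes the non-uniqueness tangible. The sketch below (a NumPy illustration; the grid spacing and the sample values of *c* are arbitrary choices) verifies that each member of the family *y<sub>c</sub>* with *y*<sub>0</sub> = 0 satisfies the equation away from the kink at *t* = *c*:

```python
import numpy as np

def y_c(t, c):
    """One of the uncountably many solutions of Dy = sqrt(|y|), y(0) = 0:
    stays at zero until t = c, then grows as (t - c)^2 / 4."""
    return np.where(t >= c, (t - c) ** 2 / 4.0, 0.0)

t = np.linspace(0.0, 10.0, 100001)
for c in (1.0, 3.0, 7.5):
    y = y_c(t, c)
    dy = np.gradient(y, t)               # numerical derivative
    resid = np.abs(dy - np.sqrt(np.abs(y)))
    # away from the kink at t = c the ODE holds to grid accuracy
    print(resid[np.abs(t - c) > 0.01].max() < 1e-3)
```

Each loop iteration reports a residual below the tolerance, i.e. every *y<sub>c</sub>* is a genuine solution of the same initial value problem.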


From the above example we see that continuity of *f* is not enough to guarantee the existence of a unique solution of the initial value problem (9.1a). To guarantee uniqueness of a solution, the function *f*(*t*, *y*) must be more regular with respect to *y*.

Let $I \subset \mathbb{R}$ and $X \subset \mathbb{R}^n$. A function $f \in C(I \times X, \mathbb{R}^n)$ is called *locally Lipschitz continuous* in *y* if every point (*t*<sub>0</sub>, *y*<sub>0</sub>) ∈ *I* × *X* has a neighborhood *U* × *V* such that, for some constant *M* > 0

$$\|f(t, \mathbf{y}) - f(t, \mathbf{x})\| \le M \|\mathbf{y} - \mathbf{x}\|\,, \qquad t \in U, \quad \mathbf{x}, \mathbf{y} \in V\,.$$

If the function *f*(*t*, *y*) in (9.1a) is continuous in *t* and locally Lipschitz continuous in *y*, then the Picard–Lindelöf theorem guarantees the existence and uniqueness of the solution of the initial value problem (9.1a) [23].

If the function *f* doesn't depend explicitly on time, then the system is time invariant and the system equation becomes

$$D\mathbf{y} = f(\mathbf{y})\,, \quad f: X \to \mathbb{R}^n, \quad X \subset \mathbb{R}^n. \tag{9.2}$$

A solution of the equation for which *Dy* = 0 is called an *equilibrium point* of the system. When investigating the stability of an equilibrium point *y<sub>e</sub>* one can always assume it to be at the origin. In fact, by the change of variable *u* = *y* − *y<sub>e</sub>* one can always transform the system differential equation into one whose equilibrium point of interest is *u<sub>e</sub>* = 0

$$Du = D(u + \mathbf{y}\_e) = f(u + \mathbf{y}\_e) =: g(u)\,.$$

An equilibrium point is *stable* if for each *c* > 0 one can find an ε > 0 such that

$$\|\mathbf{y}(t\_0)\| < \epsilon \quad \implies \quad \|\mathbf{y}(t)\| < c\,, \qquad t \ge t\_0\,.$$

It is *asymptotically stable* if it is stable and in addition ε can be chosen such that

$$\|\mathbf{y}(t\_0)\| < \epsilon \quad \Longrightarrow \quad \lim\_{t \to \infty} \|\mathbf{y}(t)\| = 0 \,.$$

The set of all points *y*(*t*<sub>0</sub>) such that *y*(*t*) converges to zero as *t* tends to infinity is called the *domain of attraction* of the equilibrium point. If an equilibrium point is not stable it is called *unstable*.

As already highlighted in Chap. 1, an important difference of time invariant nonlinear systems compared to LTI ones is the possibility of the existence of *multiple isolated equilibrium points.*

#### **Example 9.2**

Consider the system described by the following differential equation

$$Dy = -ay + cy^2$$

with *a* and *c* positive constants. From

$$0 = -ay + cy^2 = cy(y - a/c)$$

we see that the system has two equilibrium points:

$$\mathbf{y}(t) = 0 \qquad \text{and} \qquad \mathbf{y}(t) = a/c \; .$$

We are interested in the dynamics of the system starting from the initial condition *y*(0) = *y*<sub>0</sub>, assuming that *y*<sub>0</sub> doesn't coincide with an equilibrium point. Since the function *f*(*y*) = −*ay* + *cy*<sup>2</sup> is locally Lipschitz continuous, there is a unique solution and this solution doesn't intersect the equilibrium points. The initial value problem can therefore be solved by separating the variables and integrating

$$\int\_{\mathbf{y}\_0}^{\mathbf{y}} \frac{d\mathbf{y}}{c\mathbf{y}(\mathbf{y} - a/c)} = \int\_0^t dt\,.$$

The solution is found to be

$$\mathbf{y}(t) = \mathbf{y}\_0 \frac{\mathbf{e}^{-at}}{1 - \mathbf{y}\_0 \frac{c}{a} \left(1 - \mathbf{e}^{-at}\right)}\,.$$

If *y*<sub>0</sub> is negative or 0 < *y*<sub>0</sub>*c*/*a* < 1 the solution converges toward zero, which is therefore an asymptotically stable equilibrium point (see Fig. 9.2). If *y*<sub>0</sub>*c*/*a* > 1 the solution diverges and reaches infinity in the finite time

$$t\_{\infty} = \frac{1}{a} \ln \left( \frac{1}{1 - \frac{a}{\mathbf{y}\_0 c}} \right).$$

From the above example we see that a nonlinear system can have multiple equilibrium points, some of which can be stable and some unstable. For a system to remain stable around a stable equilibrium point the initial condition may have to remain within a limited region around that point. Also, solutions starting near unstable equilibrium points can diverge faster than exponentially and reach infinity in finite time (finite escape time).
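These claims are easy to verify numerically. The sketch below (NumPy; the values of *a*, *c* and the initial conditions are arbitrary choices) compares the closed-form solution of Example 9.2 with a direct Heun integration, and checks that the denominator of the solution indeed vanishes at the escape time *t*<sub>∞</sub>:

```python
import numpy as np

a, c = 1.0, 2.0     # hypothetical positive constants; unstable point at a/c = 0.5

def y_exact(t, y0):
    """Closed-form solution of Dy = -a y + c y^2, y(0) = y0."""
    return y0 * np.exp(-a * t) / (1.0 - y0 * (c / a) * (1.0 - np.exp(-a * t)))

def y_heun(t_end, y0, dt=1e-4):
    """Heun (improved Euler) integration of the same ODE."""
    f = lambda y: -a * y + c * y * y
    y = y0
    for _ in range(int(t_end / dt)):
        k = f(y)
        y += dt / 2.0 * (k + f(y + dt * k))
    return y

# below the unstable point the solution decays toward zero ...
print(abs(y_heun(5.0, 0.3) - y_exact(5.0, 0.3)) < 1e-6)
# ... above it, the denominator vanishes at the finite escape time
y0 = 1.0
t_inf = np.log(1.0 / (1.0 - a / (y0 * c))) / a
print(abs(1.0 - y0 * (c / a) * (1.0 - np.exp(-a * t_inf))) < 1e-12)
```

Both checks pass: the analytic formula tracks the integrated trajectory in the stable regime, and for *y*<sub>0</sub>*c*/*a* > 1 the blow-up time matches the expression above.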

One of the most useful tools in the study of the stability of equilibrium points is Lyapunov stability theory [24]. In particular, Lyapunov's linearization (or indirect) method states that

• If the linear approximation of the system about an equilibrium point is asymptotically stable then, in a neighborhood *U* of the equilibrium point, the (nonlinear) system is asymptotically stable. The largest such neighborhood *U* is the domain of attraction of the equilibrium point.

• If the linear approximation of the system about an equilibrium point is unstable, then the (nonlinear) system is unstable.

If the linear approximation of the system is neither asymptotically stable nor unstable then this method is inconclusive and one must turn to other methods, for example, Lyapunov's direct method [24].

#### **Example 9.3**

Consider the initial value problem described by the differential equation

$$D\mathbf{y} = c\mathbf{y}^3$$

with *c* a constant; and the initial condition

$$\mathbf{y}(0) = \mathbf{y}\_0\,.$$

The only equilibrium point of the equation is the zero solution *y<sub>e</sub>*(*t*) = 0. As is immediately seen, the linearized equation is stable, but not asymptotically stable, about the equilibrium point.

The nonlinear equation can be solved by the method of separation of variables

$$\int\_{\mathbf{y}\_0}^{\mathbf{y}} \frac{d\mathbf{y}}{\mathbf{y}^3} = c \int\_0^t dt\,.$$

Performing the integrations and solving for *y* we find

$$\mathbf{y}(t) = \frac{\mathbf{y}\_0}{\sqrt{1 - 2c\mathbf{y}\_0^2 t}}\,.$$

If *c* > 0 the solution diverges and reaches infinity at

$$t\_{\infty} = \frac{1}{2c\mathbf{y}\_0^2}\,.$$

If *c* < 0 the equilibrium point is asymptotically stable for any value of the initial condition *y*<sub>0</sub> (see Fig. 9.3).
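The solution obtained from the integration, *y*(*t*) = *y*<sub>0</sub>/√(1 − 2*cy*<sub>0</sub>²*t*), can be checked numerically. The following NumPy sketch (with arbitrarily chosen constants) verifies both behaviors, the slow algebraic decay for *c* < 0 and the validity of the formula up to the escape time for *c* > 0:

```python
import numpy as np

def y_exact(t, y0, c):
    """Solution of Dy = c y^3, y(0) = y0 (valid while 1 - 2 c y0^2 t > 0)."""
    return y0 / np.sqrt(1.0 - 2.0 * c * y0 ** 2 * t)

# c < 0: algebraic decay ~ 1/sqrt(t) toward the equilibrium y = 0
print(abs(y_exact(50.0, 2.0, -1.0)) < 0.15)

# c > 0: the formula satisfies the ODE up to the escape time 1/(2 c y0^2)
c, y0 = 0.5, 2.0
t_inf = 1.0 / (2.0 * c * y0 ** 2)
tt = np.linspace(0.0, 0.9 * t_inf, 10001)
y = y_exact(tt, y0, c)
resid = np.gradient(y, tt) - c * y ** 3      # residual of Dy = c y^3
print(np.max(np.abs(resid[1:-1] / (c * y[1:-1] ** 3))) < 1e-2)
```

Note the contrast with Example 9.2: for *c* < 0 the decay is only algebraic, reflecting the fact that the linearization is not asymptotically stable.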

Contrary to what the above examples may suggest, most nonlinear differential equations can't be solved analytically. We are therefore interested in methods to find approximate solutions around asymptotically stable equilibrium points, in the spirit of a perturbation theory. Weakly nonlinear systems are a class of systems for which such a method exists; the solution is obtained in the form of a functional series.

Informally, weakly nonlinear systems can be described as systems operated around an asymptotically stable equilibrium point and whose response depends continuously on the input signal *x*. They include systems described by a differential equation of the form

$$D\mathbf{y} = \mathbf{C}\mathbf{x} + f(\mathbf{y})\,,$$

$$f: Y \to \mathbb{R}^n, \quad \mathbf{C}: X \to \mathbb{R}^n, \quad X \subset \mathbb{R}, \quad Y \subset \mathbb{R}^m$$

with *C* a linear function and *f* a function that, within the excursion range of interest of *y*, can be approximated to any desired accuracy by a Taylor expansion. Note that polynomials are locally Lipschitz continuous. For this reason weakly nonlinear systems are well-behaved and produce a well-defined and unique output response.

# **9.2 Graded Algebra of Test Functions**

In the previous section we illustrated some aspects of weakly nonlinear systems based on examples of systems described by nonlinear differential equations. We now look for a description based on distributions. We'll see that this allows reducing the problem of solving some classes of nonlinear differential equations to an essentially algebraic problem. However, before discussing systems, we need some preparation that we provide in this and the next section.

Let *V<sub>k</sub>*, $k \in \mathbb{N}$ be vector spaces over $\mathbb{C}$ such that $V\_k \cap V\_j = \{0\}$ for $k \neq j$. The *direct sum*

$$V := \bigoplus\_{k=0}^{\infty} V\_k := \bigoplus\_{k \ge 0} V\_k \tag{9.3}$$

is the vector space whose elements are the sequences (*x<sub>k</sub>*) in $\prod\_{k=0}^{\infty} V\_k$ with $x\_k \in V\_k$ and $x\_k = 0$ for all but finitely many *k*. That is, it is the set of all finite sequences with $x\_k \in V\_k$. The vector space structure of *V* is defined by the following addition and multiplication by scalars

$$(\mathbf{x}\_k) + c(\mathbf{y}\_k) := (\mathbf{x}\_k + c\mathbf{y}\_k), \qquad (\mathbf{x}\_k), (\mathbf{y}\_k) \in V, \quad c \in \mathbb{C}. \tag{9.4}$$

Each *V<sub>k</sub>* is evidently a vector subspace of *V*.

If furthermore *V* is provided with a multiplication

$$V \times V \to V, \qquad (x, y) \mapsto x \odot y$$

such that it forms an algebra and in addition

$$V\_k \odot V\_j \subset V\_{k+j}, \qquad k, j \in \mathbb{N}$$

then it is called a *graded algebra*.

Let $V\_k = \mathcal{D}(\mathbb{R}^k)$ be the vector space of test functions on $\mathbb{R}^k$, with $V\_0 = \mathbb{C}$. Then

$$\mathcal{D}\_{\oplus} := \bigoplus\_{k \geq 0} \mathcal{D}(\mathbb{R}^k)$$

with the tensor product as multiplication

$$\phi \otimes \psi \left(\tau\_1, \dots, \tau\_k, \tau\_{k+1}, \dots, \tau\_{k+j}\right) := \phi(\tau\_1, \dots, \tau\_k) \psi \left(\tau\_{k+1}, \dots, \tau\_{k+j}\right)$$

is a graded algebra that we call the *graded algebra of test functions*. We write elements of $\mathcal{D}\_\oplus$ as sums, with indices denoting the grade of the element


$$\phi = \sum\_{j=0}^{N} \phi\_j \,, \qquad \phi\_j \in \mathcal{D}(\mathbb{R}^j) \,, \quad N \in \mathbb{N} \,.$$

In the graded algebra of test functions we define the following convergence criterion. A sequence (φ<sub>*m*</sub>), $\phi\_m \in \mathcal{D}\_\oplus$ with

$$\phi\_m = \sum\_{j=0}^{N\_m} \phi\_{j,m} \,, \qquad \phi\_{j,m} \in \mathcal{D}(\mathbb{R}^j),$$

converges to zero if

1. There exist compact sets $K\_j \subset \mathbb{R}^j$, *j* = 1, ..., *N* with $N = \max\_{m \in \mathbb{N}} N\_m$ such that for each *j* and *m*

$$\text{supp}(\phi\_{j,m}) \subset K\_j\,.$$

2. For every *j* > 0 and every *j*-tuple $k \in \mathbb{N}^j$ the sequence $(D^k \phi\_{j,m})\_{m \in \mathbb{N}}$ converges uniformly to zero. For *j* = 0 the sequence of numbers $(\phi\_{0,m})\_{m \in \mathbb{N}}$ converges to zero.

# **9.3 Direct Product of Distributions**

The *direct product V* of vector spaces *V<sub>k</sub>* over $\mathbb{C}$ is the vector space whose elements are the sequences (*x<sub>k</sub>*) with $x\_k \in V\_k$, $k \in \mathbb{N}$. The vector space structure is defined as for the direct sum by (9.4). It is denoted by

$$V := \prod\_{k \ge 0} V\_k := \prod\_{k=0}^{\infty} V\_k \,. \tag{9.5}$$

The key difference from the direct sum is that, in a direct product, the sequence does not have to be finite.

Let $V\_k = \mathcal{D}'(\mathbb{R}^k)$, with $V\_0 = \mathbb{C}$. Then the direct product

$$\mathcal{D}'\_{\oplus} := \prod\_{k \ge 0} \mathcal{D}'(\mathbb{R}^k)$$

is the set of linear continuous functionals on $\mathcal{D}\_\oplus$ defined by

$$h: \mathcal{D}\_{\oplus} \to \mathbb{C}, \qquad \phi \mapsto \langle h, \phi \rangle := \sum\_{j=0}^{\infty} \langle h\_j, \phi\_j \rangle \tag{9.6}$$

with

$$\phi = \sum\_{j=0}^{\infty} \phi\_j \,, \qquad h = \sum\_{j=0}^{\infty} h\_j \,, \qquad \phi\_j \in \mathcal{D}(\mathbb{R}^j), \quad h\_j \in \mathcal{D}'(\mathbb{R}^j) \,.$$

Since φ only has a finite number of terms different from zero, ⟨*h*, φ⟩ is well-defined. As $\mathcal{D}'(\mathbb{R}^k) \cap \mathcal{D}'(\mathbb{R}^j) = \{0\}$ for $k \neq j$, here and in the following we denote elements of $\mathcal{D}'\_\oplus$ by sums, in a similar way as we do for elements of $\mathcal{D}\_\oplus$.

Continuity in $\mathcal{D}'\_\oplus$ is defined by the convergence that we defined for $\mathcal{D}\_\oplus$ and follows from the continuity of distributions. Since $\mathcal{D}'\_\oplus$ is a vector space, it's enough to verify continuity at the origin. Let $h \in \mathcal{D}'\_\oplus$ and $\phi \in \mathcal{D}\_\oplus$; then there exists an $N \in \mathbb{N}$ such that

$$|\langle h, \phi \rangle| \le \sum\_{j=0}^{N} |\langle h\_j, \phi\_j \rangle| \le (N+1) \sup\_{j \in \{0, \dots, N\}} |\langle h\_j, \phi\_j \rangle|$$

and according to our definition of convergence, when φ converges to zero, so does $\sup\_j |\langle h\_j, \phi\_j \rangle|$ and hence ⟨*h*, φ⟩.

In Sect. 3.1 we introduced the tensor product of distributions and saw that it is well-defined between any pair of distributions. With it we can define a product *g* · *h* between elements *g* and *h* of $\mathcal{D}'\_\oplus$. Its *k*th component is defined by

$$(gh)\_k := (g \cdot h)\_k := \sum\_{j=0}^k g\_j \otimes h\_{k-j} \,, \qquad k \in \mathbb{N} \tag{9.7}$$

with *g<sub>j</sub>* and *h<sub>j</sub>* the *j*th components of *g* and *h* respectively. With this product $(\mathcal{D}'\_\oplus, +, \cdot)$ becomes an algebra. As is common practice, we will often denote *g* · *h* simply by *gh*. Being based on an associative operation (the tensor product), the product that we just defined is associative.

Note the close similarity between the algebra of formal power series and the one that we have defined for $\mathcal{D}'\_\oplus$. In both cases addition is defined component-wise and the product has the form of a convolution.
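The analogy can be made concrete with plain numbers standing in for the graded components. The following sketch (a deliberate simplification: ordinary multiplication replaces the tensor product) implements the convolution structure of (9.7) and recovers the Cauchy product of power-series coefficients:

```python
def graded_product(g, h):
    """Graded 'convolution' product (g . h)_k = sum_j g_j (x) h_{k-j}.
    Grade-j components are represented by numbers and the tensor product
    by ordinary multiplication, which turns the product of (9.7) into
    the Cauchy product of formal power series."""
    out = [0] * (len(g) + len(h) - 1)
    for j, gj in enumerate(g):
        for i, hi in enumerate(h):
            out[j + i] += gj * hi
    return out

# (1 + x)(1 + 2x + x^2) = 1 + 3x + 3x^2 + x^3
print(graded_product([1, 1], [1, 2, 1]))
```

The grade of each output component is the sum of the grades of its factors, exactly as required of a graded algebra.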

# **9.4 Symmetric Distributions**

Let $\mathbf{S}\_k$ denote the set of all permutations of {1, ..., *k*}. A distribution $h\_k \in \mathcal{D}'(\mathbb{R}^k)$ is *symmetric* if

$$\langle h\_k, \phi(\tau\_{\sigma(1)}, \dots, \tau\_{\sigma(k)})\rangle = \langle h\_k, \phi(\tau\_1, \dots, \tau\_k)\rangle \tag{9.8}$$

for all permutations $\sigma \in \mathbf{S}\_k$ and every $\phi \in \mathcal{D}(\mathbb{R}^k)$. Symmetric distributions are fully characterized by symmetric test functions, for


$$\langle h\_k, \phi(\tau\_1, \dots, \tau\_k) \rangle = \left\langle h\_k, \frac{1}{k!} \sum\_{\sigma \in \mathbf{S}\_k} \phi(\tau\_{\sigma(1)}, \dots, \tau\_{\sigma(k)}) \right\rangle$$

and the sum of test functions on the right-hand side is a symmetric test function. The sum of symmetric distributions is a symmetric distribution. Therefore, they form a vector subspace of distributions that we denote by $\mathcal{D}'\_{\text{sym}}(\mathbb{R}^k)$. Similarly, we denote the vector subspace of all symmetric test functions on $\mathbb{R}^k$ by $\mathcal{D}\_{\text{sym}}(\mathbb{R}^k)$, the direct sum of symmetric test functions by $\mathcal{D}\_{\oplus,\text{sym}}$ and the direct product of symmetric distributions by $\mathcal{D}'\_{\oplus,\text{sym}}$.

A symmetric distribution can be constructed from an arbitrary distribution $f \in \mathcal{D}'(\mathbb{R}^k)$ by averaging over all permutations of the independent variables

$$\left[ f \right]\_{\text{sym}} := \frac{1}{k!} \sum\_{\sigma \in \mathbf{S}\_k} f(\tau\_{\sigma(1)}, \dots, \tau\_{\sigma(k)}) \tag{9.9}$$

with

$$\left\langle f(\tau\_{\sigma(1)}, \dots, \tau\_{\sigma(k)}), \phi(\tau\_1, \dots, \tau\_k) \right\rangle := \left\langle f(\tau\_1, \dots, \tau\_k), \phi(\tau\_{\sigma(1)}, \dots, \tau\_{\sigma(k)}) \right\rangle\,.$$

Such an operation is called *symmetrisation*.

The tensor product is a bi-linear operation. Therefore, the power of an element of $\mathcal{D}'\_\oplus$ composed of a finite number of distributions $f\_j \in \mathcal{D}'(\mathbb{R}^{n\_j})$, $n\_j \geq 1$, *j* = 1, ..., *m*, *m* ≥ 2 can be expressed as a sum of tensor products

$$\left(\sum\_{j=1}^{m} f\_j\right)^k = \sum\_{j\_1=1}^{m} \cdots \sum\_{j\_k=1}^{m} f\_{j\_1} \otimes \cdots \otimes f\_{j\_k}, \quad k \in \mathbb{N}$$

with the sum ranging over all possible combinations of the indices *j*<sub>1</sub>, ..., *j<sub>k</sub>*. If the distributions *f*<sub>1</sub>, ..., *f<sub>m</sub>* are symmetric then one can reorder the indices *j*<sub>1</sub>, ..., *j<sub>k</sub>* by any permutation σ without changing the value of the sum. Hence, the tensor products on the right-hand side can be replaced by symmetrized products

$$\sum\_{j\_1=1}^m \cdots \sum\_{j\_k=1}^m f\_{j\_1} \otimes \cdots \otimes f\_{j\_k} = \sum\_{j\_1=1}^m \cdots \sum\_{j\_k=1}^m \left[ f\_{j\_1} \otimes \cdots \otimes f\_{j\_k} \right]\_{\text{sym}}.$$

The tensor product of symmetric distributions inside the symmetrisation operator acts as a commutative operation. For this reason the sum includes summands that are equal and, by grouping them, we obtain an expression similar to the multinomial formula [21]

$$\left(\sum\_{j=1}^{m} f\_j\right)^k = \sum\_{|\alpha|=k} \frac{k!}{\alpha!} \left[f^{\otimes \alpha}\right]\_{\text{sym}}, \qquad f = (f\_1, \dots, f\_m) \tag{9.10}$$

with α an *m*-tuple in $\mathbb{N}^m$,

$$f^{\otimes \alpha} := f\_1^{\otimes \alpha\_1} \otimes \dots \otimes f\_m^{\otimes \alpha\_m} \tag{9.11}$$

and where we made use of the multi-index notation introduced in Sect. 4.6.

In general the product that we defined on $\mathcal{D}'\_\oplus$ applied to two elements of $\mathcal{D}'\_{\oplus,\text{sym}}$ does not result in an element of $\mathcal{D}'\_{\oplus,\text{sym}}$. This can be remedied by symmetrizing the product

$$(gh)\_k := (g \cdot h)\_k := \sum\_{j=0}^k \left[ g\_j \otimes h\_{k-j} \right]\_{\text{sym}}, \qquad g, h \in \mathcal{D}'\_{\oplus,\text{sym}}.\tag{9.12}$$

Unless explicitly stated otherwise, when working in $\mathcal{D}'\_{\oplus,\text{sym}}$ we will always assume the use of this symmetrized product.

The last property of symmetric distributions that we want to mention is the fact that, in a convolution algebra, the inverse of a symmetric distribution is symmetric, for

$$\begin{aligned} \delta(\tau\_1, \tau\_2) &= f(\tau\_1, \tau\_2) \ast f^{\ast -1}(\tau\_1, \tau\_2) \\ &= f(\tau\_2, \tau\_1) \ast f^{\ast -1}(\tau\_1, \tau\_2) \\ &= f(\tau\_1, \tau\_2) \ast f^{\ast -1}(\tau\_2, \tau\_1)\,. \end{aligned}$$

# **9.5 Weakly Nonlinear Systems**

We are looking for a representation, in the spirit of a perturbation theory, of a class of nonlinear systems including the ones described by differential equations of the form

$$L\mathbf{y} = \mathbf{x} + \sum\_{k=2}^{K} c\_k \mathbf{y}^k \tag{9.13}$$

with $x \in \mathcal{D}'(\mathbb{R})$ a given input signal, *L* a linear differential operator with constant coefficients

$$L = D^m + a\_{m-1}D^{m-1} + \cdots + a\_1D + a\_0$$

and where we assume that the linearized system is stable.

In Chap. 7 we saw that, in the language of distributions, a linear differential equation with constant coefficients becomes a convolution equation. If we want to apply the results obtained for convolution equations, we need to give a meaning to the nonlinear terms appearing in the above equation.

In general, it's not possible to define a multiplication valid for arbitrary distributions. Therefore, the terms $\mathbf{y}^k$, *k* > 1 can't be assumed to belong to $\mathcal{D}'(\mathbb{R})$. To work around this problem we can assume *y* to belong to a direct product of distributions, *y* = (*y*<sub>0</sub>, *y*<sub>1</sub>, *y*<sub>2</sub>, ...), and use the product defined on that space. Since the product between functions with values in $\mathbb{C}$ is commutative, $f \cdot g = g \cdot f$, we require *y* to belong to the direct product of symmetric distributions $\mathcal{D}'\_{\oplus,\text{sym}}$. Then, if *y*<sub>1</sub> is the solution of the linearized equation, its powers become tensor powers

$$(\mathbf{y}\_1)^k = \mathbf{y}\_1^{\otimes k}\,.$$

If *y*<sub>1</sub> is a regular distribution, that is, a locally integrable *function*, then we can recover the meaning of the powers in the differential equation by evaluating $\mathbf{y}\_1^{\otimes k}$ on the diagonal

$$\mathbf{y}\_1^{\otimes k}(t, \dots, t) = \mathbf{y}\_1^k(t)\,.$$

The same remains true if we replace *y*<sub>1</sub> by a sum of distributions.

To complete the interpretation of the differential equation in the language of distributions, it remains to clarify the effect of the one-dimensional differential operator *D* appearing in (9.13) on the components $\mathbf{y}\_k \in \mathcal{D}'\_{\text{sym}}(\mathbb{R}^k)$ of *y*. To this end, suppose *y<sub>k</sub>* to be a regular distribution. Then it is a locally integrable function

$$\mathbf{y}\_{k}: \tau \mapsto \mathbf{y}\_{k}(\tau\_{1}, \dots, \tau\_{k})\,, \qquad \tau \in \mathbb{R}^{k}$$

and we can associate with it a function of the single variable *t* by defining an operation that we call "evaluating on the diagonal"

$$\text{ev}\_{\mathrm{d}}(\mathbf{y}\_{k}) := t \mapsto \mathbf{y}\_{k}(t, \dots, t)\,, \qquad t \in \mathbb{R}\,.$$

If we assume this function to be differentiable, then the derivative with respect to *t* is well-defined

$$D\,\text{ev}\_{\mathrm{d}}(\mathbf{y}\_k)(t) = D\_1\mathbf{y}\_k(t, \dots, t) + \dots + D\_k\mathbf{y}\_k(t, \dots, t)$$

and, as a distribution, can be represented by

$$D\mathbf{y}\_k := \left(\sum\_{j=0}^{k-1} \delta^{\otimes j} \otimes D\delta \otimes \delta^{\otimes k - 1 - j}\right) \* \mathbf{y}\_k \,. \tag{9.14}$$

This last expression is symmetric and is valid for arbitrary distributions. Therefore, we can take it as the definition of the effect of the differential operator *D* on distributions $\mathbf{y}\_k \in \mathcal{D}'\_{\text{sym}}(\mathbb{R}^k)$. For $\mathbf{y} \in \mathcal{D}'\_{\oplus,\text{sym}}$ and any $\phi \in \mathcal{D}\_\oplus$, ⟨*y*, φ⟩ only has a finite number of terms different from zero. For this reason the effect of *D* on *y* can be defined as acting on each component individually.

For $\mathbf{y} \in \mathcal{D}'\_{\oplus,\text{sym}}$ to be a solution of (9.13) in a convolution algebra, the equation must be satisfied by each component *y<sub>k</sub>* of *y* individually. If *y* is to be compatible with our assumption that the system is described around the zero equilibrium point, then the 0th component *y*<sub>0</sub> must always be zero

$$\mathbf{y}\_0 = \mathbf{0}.$$

In analogy with the theory of formal power series, we call distributions $\mathbf{y} \in \mathcal{D}'\_\oplus$ with *y*<sub>0</sub> = 0 *nonunits* [25].

For *k* = 1 the only terms belonging to $\mathcal{D}'(\mathbb{R})$ appearing in the equation are *y*<sub>1</sub> and *x*. Hence, *y*<sub>1</sub> is the solution of the linearized equation and, as discussed in Sect. 8.1, can be represented by

$$y\_1 = h\_1 \* x \,.$$

For *k* = 2 we have

$$L\delta \* y\_2 = c\_2 \, y\_1^{\otimes 2} \,, \qquad \delta \in \mathcal{D}'(\mathbb{R}^2)$$

and we see that, for the computation of *y*2, the tensor power of *y*<sup>1</sup> plays the role of an input signal applied to a linear system. Assuming that *L*δ has an inverse, we obtain

$$y\_2 = c\_2 \left( L\delta \right)^{\*-1} \* y\_1^{\otimes 2} \,.$$

The above expression can be further manipulated by noting that

$$\begin{aligned} &\left\langle \left( a(\tau\_1) \otimes b(\tau\_2) \right) \* \left( f(\tau\_1) \otimes g(\tau\_2) \right), \phi(\tau\_1, \tau\_2) \right\rangle \\ &= \left\langle \left( a(\tau\_1) \otimes b(\tau\_2) \right) \otimes \left( f(\lambda\_1) \otimes g(\lambda\_2) \right), \phi\left( \tau\_1 + \lambda\_1, \tau\_2 + \lambda\_2 \right) \right\rangle \\ &= \left\langle \left( a(\tau\_1) \* f(\tau\_1) \right) \otimes \left( b(\tau\_2) \* g(\tau\_2) \right), \phi\left( \tau\_1, \tau\_2 \right) \right\rangle \end{aligned}$$

or

$$(a \otimes b) \* (f \otimes g) = (a \* f) \otimes (b \* g)\,. \tag{9.15}$$

With this expression and the solution found for *y*<sup>1</sup> we can express *y*<sup>2</sup> as

$$y\_2 = h\_2 \* x^{\otimes 2}, \qquad h\_2 := c\_2 \, (L\delta)^{\*-1} \* h\_1^{\otimes 2}$$

where raising to a tensor power is assumed to have higher priority than convolution.

From this it is not difficult to see that every component *yk* can be expressed as the convolution of a distribution *hk* specific to the problem and the input signal *x* raised to the tensor power of *k*

$$y\_k = h\_k \* x^{\otimes k} \,.$$

We are therefore led to define a *weakly nonlinear* (or *analytic*) time-invariant (WNTI) system as a system H whose behavior around the zero equilibrium point can be described by an element $h$ of $\mathcal{D}'^{\,\oplus,\mathrm{sym}}$ such that, when driven by the input signal $x$, its output is given by

$$y = h[x] := \sum\_{k=1}^{\infty} h\_k \* x^{\otimes k}, \qquad h\_k \in \mathcal{A}^k, \quad x \in \mathcal{A}^1 \tag{9.16}$$

with $\mathcal{A}^1$ a convolution algebra in $\mathcal{D}'(\mathbb{R})$ and $\mathcal{A}^k$ a convolution algebra in $\mathcal{D}'^{\,\mathrm{sym}}(\mathbb{R}^k)$ compatible with $\mathcal{A}^1$ and the tensor product. This means that, if $x \in \mathcal{A}^1$, then $x^{\otimes k}$ must be an element of $\mathcal{A}^k$. We denote such a set of convolution algebras by $\mathcal{A}^{\oplus,\mathrm{sym}}$. The distribution $h\_k$ is called the *kth order impulse response* (or *kernel*) of the system. A block diagram representation of a weakly nonlinear system is shown in Fig. 9.4. Note that, if the input signal is multiplied by a constant $c \in \mathbb{C}$, then $y\_k$ is scaled by a factor of $c^k$

$$y\_k = h\_k \* (c\, x)^{\otimes k} = c^k \left( h\_k \* x^{\otimes k} \right) \,.$$

The interpretation of the output of our definition of a weakly nonlinear system requires some comment as it doesn't always represent a quantity that can be interpreted as a signal depending on time. Under the assumption that all involved distributions belong to a convolution algebra, then one can distinguish the following cases

• If the impulse responses $h\_k$ as well as the input signal $x$ are regular distributions and the convolutions $h\_k \* x^{\otimes k}$ are well-defined (see Sect. 3.2), then all output components $y\_k$ are locally integrable functions. In this case we can evaluate the $y\_k$ on the diagonal


$$\operatorname{ev}\_{\mathrm{d}}(y\_k)(t) = \operatorname{ev}\_{\mathrm{d}}(h\_k \* x^{\otimes k})(t) = \int\_{-\infty}^{\infty} \cdots \int\_{-\infty}^{\infty} h\_k(\tau\_1, \dots, \tau\_k)\, x(t - \tau\_1) \cdots x(t - \tau\_k)\, d\tau\_1 \cdots d\tau\_k \tag{9.17}$$

and obtain an interpretation for the *yk* as signals of time.

If the input signal is scaled by the constant $c$, then, at each time $t$, the output $\operatorname{ev}\_{\mathrm{d}}(y)(t)$ is seen to be a power series in $c$

$$\operatorname{ev}\_{\mathrm{d}}(y)(t) = \sum\_{k=1}^{\infty} c^k \operatorname{ev}\_{\mathrm{d}}(h\_k \* x^{\otimes k})(t) \,.$$

If this series has a convergence radius greater than zero at all times, then $\operatorname{ev}\_{\mathrm{d}}(y)$ represents a well-defined function of time and we have a clear procedure for interpreting the output of the system.
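To make (9.17) concrete, the sketch below (our own illustration, not from the text) approximates the diagonal evaluation by Riemann sums. To keep the $k$-dimensional integral cheap we assume a separable causal kernel $h\_k(\tau\_1, \dots, \tau\_k) = g(\tau\_1) \cdots g(\tau\_k)$, for which the integral collapses to the $k$th power of a one-dimensional integral; the kernel $g$ and the unit-step input $x$ are hypothetical choices.

```python
import numpy as np

def ev_d_separable(g, x, t, k, tau_max=30.0, n=30000):
    """Riemann-sum approximation of ev_d(h_k * x^(x)k)(t) in (9.17) for a
    separable causal kernel h_k(tau_1,...,tau_k) = g(tau_1)*...*g(tau_k):
    the k-fold integral collapses to (integral g(tau) x(t - tau) dtau)**k."""
    tau = np.linspace(0.0, tau_max, n)      # causal kernel support: tau >= 0
    dtau = tau[1] - tau[0]
    inner = np.sum(g(tau) * x(t - tau)) * dtau   # 1-D convolution at time t
    return inner ** k

# Hypothetical example data: g(tau) = exp(-tau) 1_+(tau), unit-step input
g = lambda tau: np.exp(-tau)
x = lambda t: (t >= 0).astype(float)

t = 2.0
y1 = ev_d_separable(g, x, t, k=1)   # closed form: 1 - exp(-t)
y2 = ev_d_separable(g, x, t, k=2)   # closed form: (1 - exp(-t))**2
```

With these choices the closed forms are $y\_1(t) = 1 - \mathrm{e}^{-t}$ and $y\_2(t) = (1 - \mathrm{e}^{-t})^2$, which the Riemann sums reproduce to about three digits.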


#### **Example 9.4: Polynomial System**

In this example we consider a class of systems whose impulse responses are not regular.

Suppose that the output of a system H is represented by a nonlinear function *f* of the input signal *x* and that the function *f* can be adequately approximated by a Taylor polynomial around the origin

$$\mathbf{y} = f(\mathbf{x}) \approx \sum\_{k=1}^{K} \frac{f^{(k)}(\mathbf{0})}{k!} \mathbf{x}^{k}, \qquad f(\mathbf{0}) = \mathbf{0}, \quad K > 1. \tag{9.18}$$

It is readily seen that such a system can be represented by the impulse responses

$$h\_k = \frac{f^{(k)}(0)}{k!} \delta^{\otimes k}, \qquad k = 1, \ldots, K \ .$$

The response of the system to the input signal *x* as represented by these impulse responses is

$$y = h[x] = \sum\_{k=1}^{K} \frac{f^{(k)}(0)}{k!} \delta^{\otimes k} \* x^{\otimes k} \,.$$

If the input signal is not a regular distribution, for example if it is a Dirac pulse, then neither the initial representation (9.18) nor the evaluation on the diagonal $\operatorname{ev}\_{\mathrm{d}}(h[\delta])$ has a meaning. In spite of this, the impulse responses and their outputs $y\_k$ are mathematically well-defined.

If the class of input signals is restricted to regular distributions then the output obtained from the representation in terms of impulse responses by evaluating on the diagonal evd(*h*[*x*]) agrees with the original one.

If *f* is analytic, then it can be represented by a power series (*K* → ∞). In this case the output of the system is only well defined if the magnitude of the input signal |*x*(*t*)| remains smaller than the convergence radius of the series at all times.
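A minimal numerical sketch of this example (the function $f = \tanh$, truncated at fifth order, is our own choice): for regular inputs, the output of the memory-less kernels $h\_k = f^{(k)}(0)/k!\; \delta^{\otimes k}$, evaluated on the diagonal, is simply the Taylor polynomial applied to the current input sample.

```python
import math

# Taylor coefficients f^(k)(0)/k! of f = tanh around 0 (odd orders only):
# tanh x = x - x^3/3 + 2 x^5/15 - ...
coeffs = {1: 1.0, 3: -1.0 / 3.0, 5: 2.0 / 15.0}

def polynomial_system(x_t):
    """ev_d of the memory-less kernels h_k = c_k delta^(x)k applied to x:
    the output depends only on the current input sample x(t)."""
    return sum(ck * x_t**k for k, ck in coeffs.items())

# For small |x(t)| the truncated polynomial tracks f(x(t)) closely
for v in (0.0, 0.1, -0.2, 0.3):
    assert abs(polynomial_system(v) - math.tanh(v)) < 1e-3
```

For $|x(t)|$ approaching the convergence radius of the series, more terms would be needed, in line with the remark above.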

Let *hk* be the *k*th order impulse responses of the weakly nonlinear system H and *x* its input signal. In Sect. 3.3 we saw that an arbitrary distribution can be approximated to any desired accuracy by a finite sum of Dirac pulses. Hence, *x* can be approximated by

$$x \approx \sum\_{j=1}^{M} a\_j \, \delta(t - \lambda\_j) \,, \qquad a\_j \in \mathbb{C}, \quad \lambda\_j \in \mathbb{R}$$

and the output of *hk* by

$$\begin{split} y\_{k} &\approx \sum\_{j\_{1}=1}^{M} \cdots \sum\_{j\_{k}=1}^{M} h\_{k} \* a\_{j\_{1}} \delta(\tau\_{1} - \lambda\_{j\_{1}}) \otimes \cdots \otimes a\_{j\_{k}} \delta(\tau\_{k} - \lambda\_{j\_{k}}) \\ &= \sum\_{j\_{1}=1}^{M} \cdots \sum\_{j\_{k}=1}^{M} a\_{j\_{1}} \cdots a\_{j\_{k}}\, h\_{k}(\tau\_{1} - \lambda\_{j\_{1}}, \dots, \tau\_{k} - \lambda\_{j\_{k}}) \,. \end{split}$$

This expression suggests the interpretation of $h\_k$ as that portion of the system defining how the response depends on combinations of $k$ points in time of the input signal.

In addition, if we compare the expression representing the output at time *t* of the (causal) impulse response *hk*

$$\text{ev}\_{\mathsf{d}}(h\_k \ast x^{\otimes k})(t)$$

with the one of a polynomial system (see Example 9.4)

$$\operatorname{ev}\_{\mathrm{d}}(c\_k \delta^{\otimes k} \* x^{\otimes k})(t) = c\_k \, x^k(t)$$

we see that the output at time $t$ of the latter depends only on the $k$th power of the current value of the input signal. In contrast, the output at time $t$ of the former depends on all combinations of products of $k$ (past) values of the input signal. The impulse responses $h\_k$ can thus be interpreted as the memory of the system. The given representation of weakly nonlinear systems can be seen as a generalization of the Taylor approximation method for memory-less systems to systems with memory. It is called the Volterra functional series in honor of V. Volterra, who first proposed it [5].

# **9.6 Nonlinear Transfer Functions**

All impulse responses $h\_k$ of a causal weakly nonlinear system must vanish if any argument $\tau\_j$ is less than zero. This is most easily seen by considering the case where the impulse responses as well as the input signal $x$ are regular distributions, for then

$$\operatorname{ev}\_{\mathrm{d}}(y\_k)(t) = \int\_{-\infty}^{\infty} \cdots \int\_{-\infty}^{\infty} h\_k(t-\tau\_1, \dots, t-\tau\_k)\, x(\tau\_1) \cdots x(\tau\_k)\, d\tau\_1 \cdots d\tau\_k \,.$$

As every distribution is the limit of smooth functions, this must then be true for arbitrary distributions. The impulse responses of all orders of causal systems are therefore right-sided distributions.

The Laplace transform of the $k$th order impulse response $h\_k$ is called the *nonlinear transfer function of order k*

$$H\_k(\mathbf{s}\_1, \dots, \mathbf{s}\_k) = \langle h\_k(\tau\_1, \dots, \tau\_k), \mathbf{e}^{-s\_1\tau\_1 - \dots - s\_k\tau\_k} \rangle \,. \tag{9.19}$$

Due to the symmetry of *hk* , it is a symmetric function of the variables *s*<sup>1</sup> to *sk*

$$H\_k(\mathbf{s}\_1, \dots, \mathbf{s}\_k) = H\_k(\mathbf{s}\_{\sigma(1)}, \dots, \mathbf{s}\_{\sigma(k)}) \,, \quad \sigma \in \mathbf{S}\_k \,. \tag{9.20}$$

As the Laplace transform converts convolution products into ordinary products, the Laplace transform of *yk* = *hk* ∗ *x*⊗*<sup>k</sup>* is

$$Y\_k(s\_1, \dots, s\_k) = H\_k(s\_1, \dots, s\_k)\, X(s\_1) \cdots X(s\_k) \,.$$

Just as with LTI systems, the many useful properties of the Laplace transform make it a very valuable tool for solving the convolution equations describing weakly nonlinear systems. In particular, on top of converting convolution products into ordinary multiplications, the Laplace transforms of distributions are, in their region of convergence, holomorphic functions.

Consider a system described by a differential equation with constant coefficients of the type considered before

$$\begin{aligned} Ly &= Nx + \sum\_{j=2}^{J} c\_{j} y^{j} \\ L &= D^{n} + a\_{n-1}D^{n-1} + \dots + a\_{0} \\ N &= b\_{m}D^{m} + b\_{m-1}D^{m-1} + \dots + b\_{0} \end{aligned} \tag{9.21}$$

The part of the corresponding convolution equation relevant for the calculation of *yk* , *k* > 1, is

$$(L\delta^{\otimes k}) \* y\_k = \sum\_{j=2}^k c\_j(y\_1 + \dots + y\_{k-1})^j \,.$$

As the Laplace transform of *D*δ⊗*<sup>k</sup>* is

$$\mathcal{L}\{D\delta^{\otimes k}\}(s\_1, \dots, s\_k) = s\_1 + \dots + s\_k$$

(see (9.14)), the Laplace transform of *L*δ⊗*<sup>k</sup>* is a polynomial in *s*<sup>1</sup> +···+ *sk*

$$P(s\_1 + \dots + s\_k) = (s\_1 + \dots + s\_k)^n + a\_{n-1}(s\_1 + \dots + s\_k)^{n-1} + \dots + a\_0 \,.$$

Note that the coefficients of this polynomial are the same for all $k$, including $k = 1$; the only difference between the various values of $k$ lies in the argument. If we factor the polynomial, we see that the denominator of $H\_k$ adds to the denominators of the lower order transfer functions $H\_j$, $j = 1, \dots, k-1$, terms of the form

$$(\mathbf{s}\_1 + \dots + \mathbf{s}\_k - p\_j)^{l\_j}$$

with *pj* the *j*th pole and *lj* its multiplicity. If we assume *Hk* to be a proper rational function, then its partial fraction expansion will include terms of the form

$$\frac{F(\mathbf{s}\_1, \dots, \mathbf{s}\_{k-1})}{(\mathbf{s}\_1 + \dots + \mathbf{s}\_k - p\_j)^{l\_j}}$$

and similar ones in which some of the variables $s\_1, \dots, s\_{k-1}$ may be missing. If, in calculating the inverse Laplace transform, we start by inverse transforming with respect to $s\_k$, we obtain the expression

$$F(s\_1, \dots, s\_{k-1})\; \tau\_k^{\,l\_j - 1}\, \mathrm{e}^{[p\_j - (s\_1 + \dots + s\_{k-1})]\tau\_k}\, 1\_+(\tau\_k) \,.$$

By using the shifting property of the Laplace transform and denoting by *f* the inverse transform of *F*, the complete inverse transform of the above expression is

$$f(\tau\_1 - \tau\_k, \dots, \tau\_{k-1} - \tau\_k)\; \tau\_k^{\,l\_j - 1}\, \mathrm{e}^{p\_j \tau\_k}\, 1\_+(\tau\_k) \,.$$

If $H\_k$ is not a proper rational function, then it can be decomposed into a polynomial and a proper rational function. The inverse Laplace transform of the polynomial part results in Dirac pulses and their derivatives.

This shows that, if the system under consideration can be described by a differential equation with constant coefficients of the indicated type, then, similarly to the first order impulse response $h\_1$, the higher order impulse responses are sums of Dirac pulses, their derivatives and products of polynomials and exponential functions in the variables $\tau\_1, \dots, \tau\_k$. In addition, it also shows that, if the linear transfer function $H\_1(s\_1)$ has all its poles in the left half of the complex plane, then not only does the regular part of $h\_1$ (that is, discarding the Dirac pulses and their derivatives) decay exponentially as its argument tends to infinity, but so do the regular parts of all higher order impulse responses $h\_k$. In particular, we see that all impulse responses are summable distributions

$$h\_k \in \mathcal{D}'\_{L^1+}(\mathbb{R}^k) \,, \qquad k = 1, 2, \dots$$

In the following, unless explicitly stated otherwise, we are always going to assume the systems to be of this type.

#### **Example 9.5**

We revisit Example 9.2 and find an approximate solution of the initial value problem

$$Dy = -ay + cy^2, \qquad y(0) = y\_0, \qquad a, c > 0$$

valid around its zero equilibrium point.

As we saw, in translating an initial value problem into the language of distributions, the initial conditions become part of the equation, which in this case becomes

$$(D+a)\mathbf{y} = \mathbf{y}\_0 \boldsymbol{\delta} + c\mathbf{y}^2.$$

We can think of this as an equation describing a system driven by the input signal $x = y\_0\delta$. The solution $y$ of the equation is an element of $\mathcal{D}'^{\,\oplus,\mathrm{sym}}$ and has the form

$$y = \sum\_{k=1}^{\infty} h\_k \* x^{\otimes k} \,.$$

The system is therefore fully characterized once we find the impulse responses $h\_k$. The solution of the original problem is then found by multiplying each impulse response $h\_k$ by $y\_0^k$

$$y\_k = h\_k \, y\_0^k \,.$$

To find the impulse responses we apply the input signal $x = \delta$ and insert $y = h$ into the equation. The equation is solved if it is satisfied by each component $h\_k$ of $h$ individually. The component $h\_k$ can be computed from the equation and the impulse responses of lower order $h\_j$, $j = 1, \dots, k-1$.

To find $h\_1$ we retain only the terms of the equation belonging to $\mathcal{D}'(\mathbb{R})$

$$(D\delta + a\delta) \* h\_1 = \delta \ .$$

If we Laplace transform the equation we obtain

$$(s\_1 + a)H\_1(s\_1) = 1$$

from which we immediately obtain the first order transfer function

$$H\_1(s\_1) = \frac{1}{s\_1 + a}$$

and, by inverse Laplace transformation, the first order impulse response

$$h\_1(\tau\_1) = 1\_+(\tau\_1)\, \mathrm{e}^{-a\tau\_1} \,.$$

The second order impulse response $h\_2$ is found by retaining in the equation only terms belonging to $\mathcal{D}'(\mathbb{R}^2)$

$$(D+a)\delta^{\otimes 2} \* h\_2 = c \, h\_1^{\otimes 2} \, .$$

From the Laplace transformed equation

$$(\mathbf{s}\_1 + \mathbf{s}\_2 + a)H\_2(\mathbf{s}\_1, \mathbf{s}\_2) = c \, H\_1(\mathbf{s}\_1)H\_1(\mathbf{s}\_2)$$

we immediately obtain the second order nonlinear transfer function

$$H\_2(s\_1, s\_2) = \frac{c \ H\_1(s\_1) H\_1(s\_2)}{s\_1 + s\_2 + a} \ .$$

Note that it's often convenient to write higher-order transfer functions in terms of the first-order one. In this example

$$H\_2(\mathbf{s}\_1, \mathbf{s}\_2) = c \, H\_1(\mathbf{s}\_1 + \mathbf{s}\_2) H\_1(\mathbf{s}\_1) H\_1(\mathbf{s}\_2) \, .$$
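This identity is easily confirmed with a computer algebra system; the short check below (our addition, assuming SymPy is available) compares the two forms of $H\_2$ symbolically.

```python
import sympy as sp

s1, s2, a, c = sp.symbols('s1 s2 a c', positive=True)
H1 = lambda s: 1 / (s + a)   # first order transfer function of the example

H2_direct = c * H1(s1) * H1(s2) / (s1 + s2 + a)   # from the Laplace-domain equation
H2_factored = c * H1(s1 + s2) * H1(s1) * H1(s2)   # rewritten in terms of H1

# The two expressions agree identically
assert sp.simplify(H2_direct - H2_factored) == 0
```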

To obtain the second order impulse response we can inverse Laplace transform, first with respect to one Laplace variable, then with respect to the other one, and finally by symmetrizing the result. We first inverse transform with respect to *s*<sup>2</sup> the expression

$$H\_1(s\_1 + s\_2)H\_1(s\_2) = \frac{1}{[s\_2 + (s\_1 + a)](s\_2 + a)} \,.$$

Assuming $s\_1 \neq 0$<sup>1</sup> and expanding in partial fractions we find

<sup>1</sup> The obtained expression is a continuous function of *<sup>s</sup>*<sup>1</sup> which we extend by continuity to *<sup>s</sup>*<sup>1</sup> <sup>=</sup> 0.

$$\frac{1}{s\_1} \left( 1 - \mathrm{e}^{-s\_1 \tau\_2} \right) \mathrm{e}^{-a \tau\_2}\, 1\_+(\tau\_2) \,.$$

We then combine this expression with the other factors of *H*<sup>2</sup>

$$\frac{c}{(s\_1+a)s\_1} \left(1 - \mathrm{e}^{-s\_1\tau\_2}\right) \mathrm{e}^{-a\tau\_2}\, 1\_+(\tau\_2)$$

and inverse transform with respect to *s*1. This can be done by expanding in partial fractions the first factor

$$\mathcal{L}^{-1}\left\{\frac{c}{(s\_1+a)s\_1}\right\}(\tau\_1) = \frac{c}{a}\left(1-\mathrm{e}^{-a\tau\_1}\right)1\_+(\tau\_1)$$

and by using the shifting property of the Laplace transform to find

$$\frac{c}{a}\left[\left(1-\mathrm{e}^{-a\tau\_1}\right)1\_+(\tau\_1)-\left(1-\mathrm{e}^{-a(\tau\_1-\tau\_2)}\right)1\_+(\tau\_1-\tau\_2)\right]\mathrm{e}^{-a\tau\_2}\, 1\_+(\tau\_2) \,.$$

Note that this expression is not symmetric and that if we had first inverse transformed with respect to *s*<sup>1</sup> and then to *s*2, we would have obtained an expression with τ<sup>1</sup> and τ<sup>2</sup> exchanged.

The second-order impulse response is obtained from the above expression by symmetrisation

$$h\_2(\tau\_1, \tau\_2) = \left[ \frac{c}{a} \left[ \left( 1 - \mathrm{e}^{-a\tau\_1} \right) - \left( 1 - \mathrm{e}^{-a(\tau\_1 - \tau\_2)} \right) 1\_+(\tau\_1 - \tau\_2) \right] \mathrm{e}^{-a\tau\_2} \right]\_{\mathrm{sym}}$$

where we have suppressed the explicit Heaviside step functions with the understanding that the expression is zero if $\tau\_1 < 0$ or $\tau\_2 < 0$. As $h\_2$ is a regular distribution, it can be evaluated on the diagonal and we obtain

$$\operatorname{ev}\_{\mathsf{d}}(h\_2)(t) = \frac{c}{a} \left( \mathsf{e}^{-at} - \mathsf{e}^{-2at} \right) \ . $$

The third order impulse response $h\_3$ is found by retaining only elements belonging to $\mathcal{D}'(\mathbb{R}^3)$ in the equation. As a first step we write

$$(D+a)\delta^{\otimes 3} \* h\_3 = c \, (h\_1 + h\_2)^2$$

for no other term can produce distributions belonging to $\mathcal{D}'(\mathbb{R}^3)$. The right hand side can be expanded with the help of (9.10) and, retaining only the terms of interest, we obtain

$$(D+a)\delta^{\otimes 3} \* h\_3 = 2c \, [h\_1 \otimes h\_2]\_{\mathrm{sym}} \,.$$

The Laplace transformed equation is

$$(s\_1 + s\_2 + s\_3 + a)H\_3(s\_1, s\_2, s\_3) = 2c \left[ H\_1(s\_1)H\_2(s\_2, s\_3) \right]\_{\mathrm{sym}}$$

and with it the third order nonlinear transfer function is readily obtained

$$H\_3(s\_1, s\_2, s\_3) = 2c \, H\_1(s\_1 + s\_2 + s\_3) \left[ H\_1(s\_1) H\_2(s\_2, s\_3) \right]\_{\mathrm{sym}} \,.$$

By expressing *H*<sup>2</sup> in terms of *H*<sup>1</sup> we can write *H*<sup>3</sup> in terms of *H*<sup>1</sup> alone

$$\begin{aligned} H\_3(s\_1, s\_2, s\_3) = \frac{2}{3}c^2\, &H\_1(s\_1 + s\_2 + s\_3)H\_1(s\_1)H\_1(s\_2)H\_1(s\_3) \\ &\cdot \left[H\_1(s\_1 + s\_2) + H\_1(s\_1 + s\_3) + H\_1(s\_2 + s\_3)\right] . \end{aligned}$$
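As a sanity check (our addition, assuming SymPy is available), the symmetrization can be carried out explicitly by averaging over all permutations of $(s\_1, s\_2, s\_3)$ and comparing with the closed form above.

```python
import itertools
import sympy as sp

s1, s2, s3, a, c = sp.symbols('s1 s2 s3 a c', positive=True)
H1 = lambda s: 1 / (s + a)
H2 = lambda u, v: c * H1(u + v) * H1(u) * H1(v)   # from the example

# Symmetrize H1(s1) H2(s2, s3): average over all permutations of (s1, s2, s3)
perms = list(itertools.permutations((s1, s2, s3)))
sym = sp.Rational(1, len(perms)) * sum(H1(p[0]) * H2(p[1], p[2]) for p in perms)

H3_def = 2 * c * H1(s1 + s2 + s3) * sym
H3_closed = sp.Rational(2, 3) * c**2 * H1(s1 + s2 + s3) \
    * H1(s1) * H1(s2) * H1(s3) \
    * (H1(s1 + s2) + H1(s1 + s3) + H1(s2 + s3))

assert sp.simplify(H3_def - H3_closed) == 0
```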

The computation of the third order impulse response proceeds along the same lines as the computation of *h*2. After some algebraic manipulations and exploiting the properties of the Laplace transform we obtain a rather long expression whose evaluation on the diagonal is

$$\operatorname{ev}\_{\mathrm{d}}(h\_3)(t) = \left(\frac{c}{a}\right)^2 \left(\mathrm{e}^{-at} - 2\mathrm{e}^{-2at} + \mathrm{e}^{-3at}\right) \,.$$

At this point it is interesting to compare the first three components of the approximate solution computed here with the exact solution calculated in Example 9.2, which we reproduce here for convenience

$$y(t) = y\_0 \frac{\mathrm{e}^{-at}}{1 - y\_0 \frac{c}{a} (1 - \mathrm{e}^{-at})} \,.$$

If |*y*0*c*/*a*| < 1 the exact solution can be expanded in a geometric power series

$$\begin{split} y(t) &= y\_0 \mathrm{e}^{-at} \sum\_{j=0}^{\infty} \left[ \frac{y\_0 c}{a} (1 - \mathrm{e}^{-at}) \right]^j \\ &= y\_0 \mathrm{e}^{-at} + y\_0^2 \frac{c}{a} \left( \mathrm{e}^{-at} - \mathrm{e}^{-2at} \right) + y\_0^3 \left( \frac{c}{a} \right)^2 \left( \mathrm{e}^{-at} - 2\mathrm{e}^{-2at} + \mathrm{e}^{-3at} \right) + \cdots \\ &= \operatorname{ev}\_{\mathrm{d}}(h\_1 y\_0 + h\_2 y\_0^2 + h\_3 y\_0^3)(t) + \cdots \end{split}$$

and see that the lowest order terms correspond to the calculated response components *y*1, *y*<sup>2</sup> and *y*3. Note also that the convergence radius of the power series derived from the exact solution corresponds to the radius of the largest open ball, centered at the origin and contained in the domain of attraction of the equilibrium point

$$\mathbb{B}(0, a/c) := \left\{ y\_0 \in \mathbb{R} \;\middle|\; |y\_0| < \frac{a}{c} \right\} .$$

Figure 9.5 compares the exact solution of the initial value problem with the approximation given by evd(*y*<sup>1</sup> + *y*<sup>2</sup> + *y*3) for *a* = 1, *c* = 1/2, *y*<sup>0</sup> = 1.
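The comparison of Fig. 9.5 is straightforward to reproduce numerically; the sketch below (ours) uses the closed forms for the diagonal evaluations of $y\_1$, $y\_2$, $y\_3$ derived above and the exact solution of Example 9.2.

```python
import numpy as np

a, c, y0 = 1.0, 0.5, 1.0
t = np.linspace(0.0, 10.0, 1001)

# Exact solution of Dy = -a y + c y^2, y(0) = y0 (Example 9.2)
exact = y0 * np.exp(-a * t) / (1 - y0 * (c / a) * (1 - np.exp(-a * t)))

# Diagonal evaluations of the first three response components
y1 = y0 * np.exp(-a * t)
y2 = y0**2 * (c / a) * (np.exp(-a * t) - np.exp(-2 * a * t))
y3 = y0**3 * (c / a)**2 * (np.exp(-a * t) - 2 * np.exp(-2 * a * t) + np.exp(-3 * a * t))
approx = y1 + y2 + y3

# Truncation error stays small because |y0 c / a| = 1/2 < 1
max_err = float(np.max(np.abs(exact - approx)))
```

Both curves start at $y(0) = y\_0 = 1$ and the maximum deviation of the third-order approximation stays small over the whole time axis.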

While for this particular example it was easier to compute the exact solution than to calculate the approximation, the latter allows us to obtain the output of the system described by the differential equation

$$Dy + ay = x + cy^2 \tag{9.22}$$

for any input signal $x \in \mathcal{D}'\_+(\mathbb{R})$ maintaining the system within the region of attraction of the equilibrium point

$$\operatorname{ev}\_{\mathrm{d}}(y)(t) \approx \operatorname{ev}\_{\mathrm{d}}(y\_1 + y\_2 + y\_3)(t)$$

with

$$\begin{aligned} y\_1(t) &= \int\_0^t h\_1(t-\tau\_1) x(\tau\_1)\, d\tau\_1 \\ \operatorname{ev}\_{\mathrm{d}}(y\_2)(t) &= \int\_0^t \int\_0^t h\_2(t-\tau\_1, t-\tau\_2) x(\tau\_1) x(\tau\_2)\, d\tau\_1 d\tau\_2 \\ \operatorname{ev}\_{\mathrm{d}}(y\_3)(t) &= \int\_0^t \int\_0^t \int\_0^t h\_3(t-\tau\_1, t-\tau\_2, t-\tau\_3) x(\tau\_1) x(\tau\_2) x(\tau\_3)\, d\tau\_1 d\tau\_2 d\tau\_3 \,. \end{aligned}$$

Here, and in many problems, this amounts to limiting the magnitude of the input signal to sufficiently small values. Figure 9.6 shows the approximate solution for a sinusoidal input $x(t) = 1\_+(t)\sin(t)$ and compares it to the solution obtained by numerical integration of the differential equation for $a = 1$, $c = 1/2$.
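The reference curve of such a comparison comes from numerical integration; a minimal fixed-step RK4 integrator for (9.22) might look as follows (our own sketch, not the author's code). It is validated against the exact unforced solution with $y(0) = y\_0$, which corresponds to the input $x = y\_0\delta$.

```python
import math

def rk4_response(x, a=1.0, c=0.5, y_init=0.0, t_end=10.0, dt=1e-3):
    """Integrate D y = -a y + c y^2 + x(t), y(0) = y_init, with classic
    fixed-step fourth-order Runge-Kutta; returns the sampled solution."""
    f = lambda t, y: -a * y + c * y**2 + x(t)
    t, y, ys = 0.0, y_init, [y_init]
    for _ in range(int(round(t_end / dt))):
        k1 = f(t, y)
        k2 = f(t + dt / 2, y + dt * k1 / 2)
        k3 = f(t + dt / 2, y + dt * k2 / 2)
        k4 = f(t + dt, y + dt * k3)
        y += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += dt
        ys.append(y)
    return ys

# Validation: with x = 0 and y(0) = y0 (the input x = y0 delta) the
# integrator must reproduce the exact solution of Example 9.2
a, c, y0, T = 1.0, 0.5, 1.0, 5.0
num = rk4_response(lambda t: 0.0, a, c, y_init=y0, t_end=T)[-1]
ref = y0 * math.exp(-a * T) / (1 - y0 * (c / a) * (1 - math.exp(-a * T)))

# Forced response as in Fig. 9.6: x(t) = 1_+(t) sin(t), y(0) = 0
ys = rk4_response(lambda t: math.sin(t), a, c)
```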

This example shows how, by representing the solution of a nonlinear differential equation describing a weakly nonlinear system by a sequence of distributions $y \in \mathcal{D}'^{\,\oplus,\mathrm{sym}}$, we have reduced the problem of solving a nonlinear differential equation to an essentially algebraic one. While some expressions are rather long, they can be manipulated quite easily by modern computer algebra systems (CAS).

#### **Example 9.6**

We revisit Example 9.3 and try to find an approximate solution in $\mathcal{D}'^{\,\oplus,\mathrm{sym}}$ of the initial value problem

$$D\mathbf{y} = c\mathbf{y}^3, \qquad \mathbf{y}(0) = \mathbf{y}\_0, \qquad c < 0$$

valid around its zero equilibrium point. Note that the linearized equation is stable, but not asymptotically stable.

As before we calculate the impulse responses by setting $y\_0 = 1$. The solution for an arbitrary $y\_0$ is then found by multiplying the $k$th order impulse response $h\_k$ by $y\_0^k$.

The first order impulse response *h*<sup>1</sup> is found by writing the convolution equation corresponding to the above initial value problem and retaining only terms of first order

$$D\delta \* h\_1 = \delta \ .$$

By Laplace transforming the equation, the first order transfer function *H*1(*s*1) is found to be

$$H\_1(s\_1) = \frac{1}{s\_1} \,.$$

From it, the first order impulse response is

$$h\_1(\tau\_1) = 1\_+(\tau\_1) \,.$$

The equation doesn't have second order nonlinearities. Therefore the second order impulse response and the second order transfer function are both zero

$$h\_2(\tau\_1, \tau\_2) = 0 \,, \qquad H\_2(s\_1, s\_2) = 0 \,.$$

The third order impulse response is found by retaining all third order terms in the convolution equation

$$D\delta \ast h\_3 = c \, h\_1^{\otimes 3} \, .$$

By Laplace transforming the equation we find for the third order transfer function

$$H\_3(s\_1, s\_2, s\_3) = \frac{c}{(s\_1 + s\_2 + s\_3)\, s\_1 s\_2 s\_3} \,.$$

From this, the third order impulse response is obtained by inverse Laplace transforming with respect to one variable at a time and by symmetrizing the result

$$\begin{split} h\_{3}(\tau\_{1}, \tau\_{2}, \tau\_{3}) &= c \Big[ \tau\_{3} \mathsf{1}\_{+}(\tau\_{3}) + (\tau\_{2} - \tau\_{3}) \mathsf{1}\_{+}(\tau\_{3} - \tau\_{2}) \\ &\quad + \mathsf{1}\_{+}(\tau\_{2} - \tau\_{1}) \Big[ (\tau\_{1} - \tau\_{3}) \mathsf{1}\_{+}(\tau\_{3} - \tau\_{1}) + (\tau\_{3} - \tau\_{2}) \mathsf{1}\_{+}(\tau\_{3} - \tau\_{2}) \Big] \Big]\_{\text{sym}}. \end{split}$$

From the above results we could conclude that, to third order, the approximate solution of the initial value problem is

$$\operatorname{ev}\_{\mathrm{d}}(y)(t) = y\_0 1\_+(t) + c y\_0^3\, 1\_+(t)\, t + \cdots$$

This is however only valid for sufficiently small values of *t*. The reason is best seen by comparing the above expression with the exact solution of the initial value problem that we obtained in Example 9.3 and that we repeat here for convenience

$$y(t) = \frac{y\_0}{\sqrt{1 - 2c y\_0^2 t}} \,.$$

The Taylor expansion around zero of the function

$$x \mapsto \frac{1}{\sqrt{1-x}}$$

is

$$1 + \frac{1}{2}\mathbf{x} + \frac{1\cdot 3}{2\cdot 4}\mathbf{x}^2 + \frac{1\cdot 3\cdot 5}{2\cdot 4\cdot 6}\mathbf{x}^3 + \frac{1\cdot 3\cdot 5\cdot 7}{2\cdot 4\cdot 6\cdot 8}\mathbf{x}^4 + \dotsb$$

and has a convergence radius of 1. Therefore, as long as $|2c y\_0^2 t| < 1$, the exact solution can be represented by the power series

$$y(t) = y\_0 \left[ 1 + c y\_0^2 t + \frac{3}{2} (c y\_0^2 t)^2 + \frac{5}{2} (c y\_0^2 t)^3 + \frac{35}{8} (c y\_0^2 t)^4 + \dotsb \right]$$

whose first two terms coincide with $y\_0 h\_1(t)$ and $\operatorname{ev}\_{\mathrm{d}}(y\_0^3 h\_3)(t)$ respectively. However, as $t$ increases, the higher order terms become more and more important and, when $|2c y\_0^2 t| = 1$, the Taylor expansion stops being a valid representation of the exact solution of the initial value problem.
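The breakdown of the truncated series as $|2c y\_0^2 t| \to 1$ can be observed numerically; the sketch below (ours) compares the exact solution with the first terms of the power series for the hypothetical values $c = -1$, $y\_0 = 1$.

```python
import math

c, y0 = -1.0, 1.0   # c < 0 as in the example

def exact(t):
    # Exact solution y(t) = y0 / sqrt(1 - 2 c y0^2 t) of Example 9.3
    return y0 / math.sqrt(1 - 2 * c * y0**2 * t)

def series(t, terms=5):
    # y0 [1 + u + (3/2) u^2 + (5/2) u^3 + (35/8) u^4 + ...],  u = c y0^2 t
    u = c * y0**2 * t
    coeffs = (1.0, 1.0, 1.5, 2.5, 35.0 / 8.0)
    return y0 * sum(ck * u**k for k, ck in enumerate(coeffs[:terms]))

err_small_t = abs(series(0.05) - exact(0.05))   # excellent near t = 0
err_large_t = abs(series(0.45) - exact(0.45))   # degrades as |2 c y0^2 t| -> 1
```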

The last example shows that, in general, the solution of a nonlinear differential equation in terms of an element of $\mathcal{D}'^{\,\oplus,\mathrm{sym}}$ is only meaningful around an equilibrium point for which *the linearized equation is asymptotically stable*. The reason is that, if this is not the case, the response of the system to any part of the input signal can persist indefinitely in time without ever decreasing to negligible levels. Since this is true for the response of any order, the output $\operatorname{ev}\_{\mathrm{d}}(y)$ cannot in general be represented by a power series. We can say that *systems that are representable by a Volterra series are those whose output does not depend on the too distant past.*

In the case in which the linearized system is asymptotically stable, all impulse responses are summable distributions. Their Fourier transforms are therefore continuous functions that can be obtained from the nonlinear transfer functions $H\_k$ by

$$\hat{h}\_k(\omega\_1, \dots, \omega\_k) = H\_k(j\omega\_1, \dots, j\omega\_k) \,.$$

As the nonlinear transfer functions are rational functions, the Fourier transforms $\hat{h}\_k$ are indefinitely differentiable and of polynomial growth, so they belong to $\mathcal{O}\_M$.

# **9.7 Periodic Input Signals**

In this section we investigate the response of weakly nonlinear systems to periodic input signals. Given a periodic input signal *x*, every tensor power *x*⊗*<sup>k</sup>* is evidently also a (higher dimensional) periodic distribution. Therefore, every component *yk* of the system response *y* can be calculated in the convolution algebra of periodic distributions and represented by a Fourier series.

Let *x* be a T-periodic input signal with Fourier coefficients

$$c\_m(x) = \frac{1}{T} \langle x, \mathrm{e}^{-jm\frac{2\pi}{T} t} \rangle, \qquad m \in \mathbb{Z} .$$

Further, let *<sup>m</sup>* <sup>=</sup> (*m*1,..., *mk* ) <sup>∈</sup> <sup>Z</sup>*<sup>k</sup>* be a multi-index and <sup>ω</sup>*<sup>c</sup>* <sup>=</sup> <sup>2</sup>π/T, then the Fourier coefficients of the *k*th tensor power of *x* are

$$\begin{split} c\_{m}(x^{\otimes k}) &= \frac{1}{T^{k}} \langle x^{\otimes k}, \mathrm{e}^{-j\omega\_{c}(m, \tau)} \rangle \\ &= \frac{1}{T} \langle x, \mathrm{e}^{-jm\_{1}\omega\_{c}\tau\_{1}} \rangle \cdots \frac{1}{T} \langle x, \mathrm{e}^{-jm\_{k}\omega\_{c}\tau\_{k}} \rangle \\ &= c\_{m\_{1}}(x) \cdots c\_{m\_{k}}(x) \,. \end{split}$$

With this expression and a straightforward generalization of Eqs. (4.21) and (4.24) to higher dimensional distributions, the Fourier coefficients of *yk* are readily seen to be

$$c\_m(y\_k) = \hat{h}\_k(m\_1\omega\_c, \dots, m\_k\omega\_c) \; c\_{m\_1}(x) \cdots c\_{m\_k}(x) \tag{9.23}$$

with $\hat{h}\_k$ the Fourier transform of the $k$th order impulse response of the system.
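As a small consistency check of (9.23) (our own construction): take the memory-less squarer $h\_2 = \delta^{\otimes 2}$, for which $\hat{h}\_2 \equiv 1$, and the input $x(t) = \cos(\omega\_c t)$ with $c\_{\pm 1}(x) = 1/2$. Summing the coefficients of all index pairs $(m\_1, m\_2)$ reproduces the Fourier coefficients of $x(t)^2 = 1/2 + \cos(2\omega\_c t)/2$.

```python
import numpy as np

# Input x(t) = cos(w_c t): nonzero Fourier coefficients c_{+1} = c_{-1} = 1/2
cx = {1: 0.5, -1: 0.5}

# (9.23) with hat{h}_2 = 1 (memory-less squarer): c_m(y_2) = c_{m1}(x) c_{m2}(x).
# On the diagonal, the pair (m1, m2) contributes a tone at (m1 + m2) w_c.
cy = {}
for m1, c1 in cx.items():
    for m2, c2 in cx.items():
        cy[m1 + m2] = cy.get(m1 + m2, 0.0) + c1 * c2

# Direct check: Fourier coefficients of cos(t)^2 over one period (w_c = 1)
t = np.linspace(0.0, 2 * np.pi, 4096, endpoint=False)
y = np.cos(t) ** 2
direct = {m: np.mean(y * np.exp(-1j * m * t)) for m in (-2, 0, 2)}

for m in (-2, 0, 2):
    assert abs(cy[m] - direct[m]) < 1e-9
```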

# **9.8 Multi-tone Input Signals**

In some applications, for example in the study of interference and distortion in communication systems, one is often interested in the response of a system to input signals consisting of sinusoidal tones. If the frequencies of the tones are commensurate, that is, if their ratios are rational numbers, then one can find a common period and the input signal is periodic. The system response can thus be obtained by using the results of the previous section. However, for multi-tone input signals the results are often more directly interpretable by using a different indexing scheme for the tones composing the output components *yk* [13].

# *9.8.1 General Case*

Let's consider a system driven by an input consisting of *N* complex tones

$$x(t) = \sum\_{n=1}^{N} A\_n x\_n(t), \qquad x\_n(t) := \mathrm{e}^{j\omega\_n t}, \qquad A\_n := |A\_n| \mathrm{e}^{j\varphi\_n}$$

initially assumed to have commensurate angular frequencies $\omega_1, \dots, \omega_N$. Our objective is to calculate the system response of order *k*

$$y_k = h_k * x^{\otimes k}\,.$$

Consider first the tensor power $x^{\otimes k}$. It can be expanded with the help of (9.10)

$$\begin{aligned} x^{\otimes k} &= \Bigl(\sum_{n=1}^N A_n x_n\Bigr)^{\otimes k} \\ &= \sum_{|m|=k} \frac{k!}{m!}\, A_1^{m_1} \cdots A_N^{m_N} \cdot \bigl[x_1^{\otimes m_1} \otimes \cdots \otimes x_N^{\otimes m_N}\bigr]_{\text{sym}} \end{aligned}$$

with *m* the multi-index $m = (m_1, \dots, m_N)$ whose elements range from 0 to *k*. Observe that this expression is the Fourier series representation of $x^{\otimes k}$. With it and (9.23) the Fourier series representation of $y_k$ is thus found to be

$$\begin{aligned} y_k &= \sum_{|m|=k} \frac{k!}{m!}\, A_1^{m_1} \cdots A_N^{m_N}\, \hat{h}_{k,m} \cdot \bigl[x_1^{\otimes m_1} \otimes \cdots \otimes x_N^{\otimes m_N}\bigr]_{\text{sym}} \\ \hat{h}_{k,m} &:= \hat{h}_k(\underbrace{\omega_1, \dots, \omega_1}_{m_1}, \dots, \underbrace{\omega_N, \dots, \omega_N}_{m_N}) \end{aligned} \tag{9.24}$$

with $\hat{h}_k$ the Fourier transform of the impulse response of order *k*. As this sum is finite and composed only of indefinitely differentiable functions, it is itself an indefinitely differentiable function that can be evaluated on the diagonal

$$y_k(t) := \mathrm{ev_d}(y_k)(t) = \sum_{|m|=k} y_{k,m}(t) \tag{9.25}$$

$$y_{k,m}(t) := \frac{k!}{m!}\, A_1^{m_1} \cdots A_N^{m_N}\, \hat{h}_{k,m}\, \mathrm{e}^{j\omega_m t} \tag{9.26}$$

$$\omega_m := \sum_{n=1}^{N} m_n \omega_n = m_1\omega_1 + \cdots + m_N\omega_N\,. \tag{9.27}$$

The *k*th order response of the system is therefore a sum composed of

$$\frac{(N-1+k)!}{(N-1)!k!} \tag{9.28}$$

complex tones, each one uniquely determined by a specific multi-index *m*. In this context the multi-index *m* is also called a *frequency mix* and |*m*| its *order*.
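The count (9.28) can be checked by brute-force enumeration of the multi-indexes with $|m| = k$; a small Python sketch (the $N$, $k$ values are arbitrary examples):

```python
from itertools import product
from math import factorial

def n_mixes(N, k):
    # number of frequency mixes of order k for N complex tones, Eq. (9.28)
    return factorial(N - 1 + k) // (factorial(N - 1) * factorial(k))

def mixes(N, k):
    # brute-force enumeration of all multi-indexes m >= 0 with |m| = k
    return [m for m in product(range(k + 1), repeat=N) if sum(m) == k]

for N, k in [(2, 3), (3, 4), (4, 5)]:
    assert len(mixes(N, k)) == n_mixes(N, k)

# Real case, Eq. (9.29): N sinusoids correspond to 2N complex tones.
print([n_mixes(2 * 2, k) for k in (1, 2, 3)])  # → [4, 10, 20]
```

The last line reproduces the counts quoted later for the two-tone example: 4, 10 and 20 mixes of order one, two and three.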

These results show several important properties of weakly nonlinear systems.


At the beginning of this section we assumed the input frequencies to be commensurate. If this is not the case, then the input signal is not periodic but *almost periodic*. For such signals one can still define a Fourier series [16, Sect. VI.9] and the obtained results remain valid.

# *9.8.2 Real Case*

In this section we specialize the above results to the case of a real system driven by an input consisting of *N* sinusoidal signals

$$x(t) = \sum_{n=1}^{N} |A_n| \cos(\omega_n t + \varphi_n)$$

and where we assume $\omega_1, \dots, \omega_N > 0$. To re-use previous results it's convenient to represent the input signal in terms of complex exponentials and to use separate indexes for positive and negative angular frequencies

$$\begin{aligned} x(t) &= \frac{1}{2} \sum_{n=1}^{N} \bigl(A_n x_n(t) + A_{-n} x_{-n}(t)\bigr), \qquad x_n(t) := \mathrm{e}^{j\omega_n t} \\ A_n &:= |A_n|\,\mathrm{e}^{j\varphi_n}, \qquad A_{-n} := \overline{A_n} = |A_n|\,\mathrm{e}^{-j\varphi_n}, \qquad \omega_{-n} := -\omega_n\,. \end{aligned}$$

The quantity $A_n$ is called the *phasor* of the sinusoidal signal

$$|A_n| \cos(\omega_n t + \varphi_n)\,.$$

With this notation and using the multi-index $m = (m_{-N}, \dots, m_{-1}, m_1, \dots, m_N)$ the output component $y_k$ is easily calculated with the help of (9.24)–(9.27)

$$\begin{aligned} y_k(t) &= \sum_{|m|=k} y_{k,m}(t) \\ y_{k,m}(t) &= \frac{1}{2^k} \frac{k!}{m!}\, A_{-N}^{m_{-N}} \cdots A_{-1}^{m_{-1}} A_1^{m_1} \cdots A_N^{m_N}\, \hat{h}_{k,m}\, \mathrm{e}^{j\omega_m t} \\ \hat{h}_{k,m} &= \hat{h}_k(\underbrace{\omega_{-N}, \dots, \omega_{-N}}_{m_{-N}}, \dots, \underbrace{\omega_N, \dots, \omega_N}_{m_N}) \end{aligned}$$


$$\omega_m = \sum_{\substack{n=-N \\ n\neq 0}}^{N} m_n \omega_n = (m_1 - m_{-1})\omega_1 + \cdots + (m_N - m_{-N})\omega_N\,.$$

To *N* sinusoidal input tones there correspond 2*N* complex tones. Therefore, in the real case, the sum is composed of

$$\frac{(2N-1+k)!}{(2N-1)!k!} \tag{9.29}$$

frequency mixes.

In the real case there is some extra structure that we can exploit. Consider a specific frequency mix $m = (m_{-N}, \dots, m_{-1}, m_1, \dots, m_N)$. From the above expression, it's apparent that the multi-index

$$\mathrm{rv}(m) := (m_N, \dots, m_1, m_{-1}, \dots, m_{-N}) \tag{9.30}$$

obtained from *m* by reversing the order of the entries also appears in the Fourier series of $y_k$. If $m \neq \mathrm{rv}(m)$ then from $k!/\mathrm{rv}(m)! = k!/m!$, $\omega_{\mathrm{rv}(m)} = -\omega_m$, $A_{\mathrm{rv}(m)} = \overline{A_m}$ and $\hat{h}_{k,\mathrm{rv}(m)} = \overline{\hat{h}_{k,m}}$ we deduce that the sum of $y_{k,m}(t)$ and $y_{k,\mathrm{rv}(m)}(t)$ is a sinusoidal signal

$$\begin{aligned} y^c_{k,m}(t) &:= y_{k,m}(t) + y_{k,\mathrm{rv}(m)}(t) = 2\Re\{y_{k,m}(t)\} \\ &= \frac{1}{2^{k-1}} \frac{k!}{m!}\, |A_1|^{m_1 + m_{-1}} \cdots |A_N|^{m_N + m_{-N}}\, |\hat{h}_{k,m}| \\ &\quad \cdot \cos(\omega_m t + \varphi_m + \psi_{k,m}) \end{aligned} \tag{9.31}$$

with

$$\hat{h}_{k,m} = |\hat{h}_{k,m}|\,\mathrm{e}^{j\psi_{k,m}} \tag{9.32}$$

$$\varphi\_m = \sum\_{\substack{n=-N \\ n\neq 0}}^N m\_n \varphi\_n = (m\_1 - m\_{-1})\varphi\_1 + \dots + (m\_N - m\_{-N})\varphi\_N \,. \tag{9.33}$$

If $m = \mathrm{rv}(m)$ then the multi-index $\mathrm{rv}(m)$ is not distinct from *m* and the Fourier series component described by $\mathrm{rv}(m)$ coincides with the one described by *m*. In this case $\omega_m = 0$ and, as the system is assumed to be real, $\hat{h}_{k,m}$ must be real. The response $y_{k,m}$ therefore becomes

$$y_{k,m}(t) = \frac{1}{2^k} \frac{k!}{m!}\, |A_1|^{2m_1} \cdots |A_N|^{2m_N}\, \hat{h}_{k,m}\,, \qquad m = \mathrm{rv}(m)\,. \tag{9.34}$$

Note that *m* and $\mathrm{rv}(m)$ can only be equal for even values of *k*. Also, note that there can be multi-indexes *m* resulting in $\omega_m = 0$ for which $m = \mathrm{rv}(m)$ doesn't hold.
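These symmetry properties are easy to verify numerically. The sketch below, with example integer frequencies $\omega_1 = 10$, $\omega_2 = 11$, enumerates the real-case mixes for $N = 2$, checks $\omega_{\mathrm{rv}(m)} = -\omega_m$, and confirms that self-reversed mixes occur only for even $k$:

```python
from itertools import product

w1, w2 = 10, 11                      # example angular frequencies

def rv(m):
    # Eq. (9.30): reverse the entries of m = (m_-2, m_-1, m_1, m_2)
    return tuple(reversed(m))

def wm(m):
    # Eq. (9.27) for the real case: (m_1 - m_-1) w1 + (m_2 - m_-2) w2
    m_m2, m_m1, m_p1, m_p2 = m
    return (m_p1 - m_m1) * w1 + (m_p2 - m_m2) * w2

for k in (1, 2, 3, 4):
    mixes = [m for m in product(range(k + 1), repeat=4) if sum(m) == k]
    assert all(wm(rv(m)) == -wm(m) for m in mixes)
    self_reversed = [m for m in mixes if m == rv(m)]
    # m = rv(m) is only possible for even orders k
    assert bool(self_reversed) == (k % 2 == 0)
```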

#### **Example 9.7**

Consider again the system described by the differential equation

$$Dy + ay = x + cy^2, \qquad a, c > 0$$

that we analysed in Example 9.5. Here we are interested in the steady state response of the system when driven by the input signal

$$\mathbf{x}(t) = |A|\sin(\omega\_1 t) = |A|\cos(\omega\_1 t - \pi/2).$$

In our previous analysis of this system we calculated the first three nonlinear transfer functions *H*1, *H*<sup>2</sup> and *H*3. Using those results, the output components *y*1, *y*<sup>2</sup> and *y*<sup>3</sup> are immediately obtained from (9.31) and (9.34) without having to calculate any inverse Laplace transform.

Concretely, as the input signal consists of a single sinusoidal tone, the frequency mixes are composed of two entries $m = (m_{-1}, m_1)$. The output of first order $y_1$ is obtained from the above equations by setting *k* = 1 and by summing over all multi-indexes satisfying the constraint $|m| = m_{-1} + m_1 = 1$. There are only two such multi-indexes: (0, 1) and $\mathrm{rv}((0, 1)) = (1, 0)$. The first order output of the system is therefore given by

$$y_1(t) = \Re\bigl\{H_1(j\omega_1)\, A\, \mathrm{e}^{j\omega_1 t}\bigr\},$$

with $A = |A|\,\mathrm{e}^{-j\pi/2}$.

The second order response of the system *y*<sup>2</sup> is obtained by setting *k* = 2 and summing over all multi-indexes under the constraint |*m*| = 2. There are three of them: (2, 0), (0, 2) and (1, 1). The first one is the reverse of the second one. Therefore, the contribution of these two is obtained from (9.31)

$$y^c_{2,(0,2)}(t) = \frac{1}{2} \Re\bigl\{H_2(j\omega_1, j\omega_1)\, A^2\, \mathrm{e}^{j2\omega_1 t}\bigr\}\,.$$

Since the remaining multi-index is equal to its reverse (1, 1) = rv((1, 1)), its contribution is the constant given by (9.34)

$$y_{2,(1,1)} = \frac{1}{2}\,|A|^2\, H_2(-j\omega_1, j\omega_1)\,.$$

The response of second order is thus

$$y_2(t) = y^c_{2,(0,2)}(t) + y_{2,(1,1)}\,.$$

The third order response of the system $y_3$ is obtained by setting *k* = 3 and summing over all multi-indexes for which |*m*| = 3. There are four of them: (3, 0), (2, 1), (1, 2) and (0, 3). Two of them are the reverses of the other two. For this reason the third order response of the system $y_3$ is given by

$$y_3(t) = y^c_{3,(0,3)}(t) + y^c_{3,(1,2)}(t)$$

with

$$\begin{aligned} y^c_{3,(0,3)}(t) &= \frac{1}{4} \Re\bigl\{H_3(j\omega_1, j\omega_1, j\omega_1)\, A^3\, \mathrm{e}^{j3\omega_1 t}\bigr\} \\ y^c_{3,(1,2)}(t) &= \frac{3}{4} \Re\bigl\{H_3(-j\omega_1, j\omega_1, j\omega_1)\, |A|^2 A\, \mathrm{e}^{j\omega_1 t}\bigr\}\,. \end{aligned}$$
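These steady-state predictions can be checked against a direct numerical integration of the differential equation. The sketch below assumes the transfer functions derived in Example 9.5 for this system, $H_1(s) = 1/(s + a)$ and $H_2(s_1, s_2) = c\,H_1(s_1)H_1(s_2)H_1(s_1 + s_2)$, and compares the simulated DC offset and second harmonic with (9.34) and (9.31); the parameter values are arbitrary examples:

```python
import numpy as np

a, c = 1.0, 0.1                     # example system parameters, a, c > 0
A, w1 = 0.1, 2.0                    # input amplitude |A| and frequency

def f(t, y):
    # D y + a y = x + c y^2 with x(t) = |A| sin(w1 t)
    return -a * y + A * np.sin(w1 * t) + c * y**2

# integrate with classical RK4 through the transient into steady state
dt, t_end = 5e-4, 30.0 + 10 * 2 * np.pi / w1
n = int(round(t_end / dt))
t, y = 0.0, 0.0
ts = np.empty(n); ys = np.empty(n)
for i in range(n):
    k1 = f(t, y)
    k2 = f(t + dt / 2, y + dt / 2 * k1)
    k3 = f(t + dt / 2, y + dt / 2 * k2)
    k4 = f(t + dt, y + dt * k3)
    y += dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    t += dt
    ts[i] = t; ys[i] = y

# Fourier coefficients over the last ten periods of the steady state
sel = ts >= 30.0
tt, yy = ts[sel], ys[sel]
coeff = lambda nh: np.sum(yy * np.exp(-1j * nh * w1 * tt)) / len(tt)

# transfer functions assumed from Example 9.5
H1 = lambda s: 1.0 / (s + a)
H2 = lambda s1, s2: c * H1(s1) * H1(s2) * H1(s1 + s2)

dc_pred = 0.5 * A**2 * H2(-1j * w1, 1j * w1).real    # Eq. (9.34)
h2_pred = 0.5 * A**2 * abs(H2(1j * w1, 1j * w1))     # Eq. (9.31)
dc_meas = coeff(0).real
h2_meas = 2 * abs(coeff(2))
print(abs(dc_meas / dc_pred - 1), abs(h2_meas / h2_pred - 1))  # both small
```

For these small input amplitudes the neglected fourth and higher order contributions are orders of magnitude below the second order ones, so simulation and prediction agree to within a few percent.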

#### **Example 9.8: Two Tones Input**

Suppose that we would like to implement a causal real LTI system. However, due to unavoidable limitations of physical components, the implementation behaves as a real weakly nonlinear system characterized by the nonlinear transfer functions $H_k$ (see Fig. 9.4). We are interested in its output when driven by an input signal consisting of two sinusoidal tones

$$x(t) = |A_1| \cos(\omega_1 t + \varphi_1) + |A_2| \cos(\omega_2 t + \varphi_2)\,.$$

We think of the two tones as closely spaced in frequency and denote the difference of their angular frequencies by $\Delta\omega = \omega_2 - \omega_1$.

As the input is composed of two sinusoidal signals, the frequency mixes have four entries $m = (m_{-2}, m_{-1}, m_1, m_2)$. From (9.29) we calculate that there are 4, 10 and 20 frequency mixes of order one, two and three, respectively. They are listed in Table 9.1.

The first order output $y_1$ is the output that would be produced by a perfectly linear system. All other tones are undesired. In particular, while tones relatively distant in frequency from $\omega_1$ and $\omega_2$ are relatively easily suppressed with filters, tones close to them are much more difficult to filter out. The tones closest in frequency to $\omega_1$ and $\omega_2$ listed in Table 9.1 are the tones associated with the frequency mixes (1, 0, 2, 0), (0, 1, 0, 2) and their reverses

$$y^c_{3,(1,0,2,0)}(t) = \frac{3}{4} \Re\bigl\{\overline{A_2} A_1^2\, H_3(-j\omega_2, j\omega_1, j\omega_1)\, \mathrm{e}^{j(\omega_1 - \Delta\omega)t}\bigr\}$$

and

$$y^c_{3,(0,1,0,2)}(t) = \frac{3}{4} \Re\bigl\{\overline{A_1} A_2^2\, H_3(-j\omega_1, j\omega_2, j\omega_2)\, \mathrm{e}^{j(\omega_2 + \Delta\omega)t}\bigr\}$$

both produced by nonlinearities of third order.

**Table 9.1** Frequency mixes generated by the first, second and third order nonlinearities of a weakly nonlinear system driven by two sinusoidal tones

There are 56 frequency mixes of fifth order. Among them we can easily identify frequency mixes producing tones at every frequency generated by third order nonlinearities, in particular at $\omega_1 - \Delta\omega = 2\omega_1 - \omega_2$. To see this, start with a frequency mix *m* producing the frequency of interest and add the same number $l > 0$ to $m_n$ and $m_{-n}$ for any *n* ranging from 1 to *N* (the number of input sinusoidal tones, here 2)

$$m' = (m_{-N}, \dots, m_{-n} + l, \dots, m_n + l, \dots, m_N)\,.$$

Then the order of the new frequency mix $m'$ is 2*l* higher than the one of *m* and the angular frequencies $\omega_{m'}$ and $\omega_m$ associated with the two frequency mixes are identical (see (9.27)).
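This construction can be confirmed by enumerating all fifth order mixes falling at $2\omega_1 - \omega_2$; a quick sketch with example integer frequencies $\omega_1 = 10$, $\omega_2 = 11$:

```python
from itertools import product

w1, w2 = 10, 11                     # example commensurate frequencies
target = 2 * w1 - w2                # = w1 - dw

hits = []
for m in product(range(6), repeat=4):        # m = (m_-2, m_-1, m_1, m_2)
    if sum(m) != 5:
        continue
    m_m2, m_m1, m_p1, m_p2 = m
    if (m_p1 - m_m1) * w1 + (m_p2 - m_m2) * w2 == target:
        hits.append(m)

print(hits)  # → [(1, 1, 3, 0), (2, 0, 2, 1)]
```

Exactly the two fifth order mixes discussed next appear; their reverses land at $-(2\omega_1 - \omega_2)$.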

Using this construction starting from (1, 0, 2, 0), we see that the fifth order mixes (2, 0, 2, 1), (1, 1, 3, 0) and their reverses produce tones at $\omega_1 - \Delta\omega$

$$\begin{aligned} y^c_{5,(2,0,2,1)}(t) &= \frac{15}{8} \Re\bigl\{\overline{A_2}^2 A_1^2 A_2\, H_5(-j\omega_2, -j\omega_2, j\omega_1, j\omega_1, j\omega_2)\, \mathrm{e}^{j(\omega_1 - \Delta\omega)t}\bigr\} \\ y^c_{5,(1,1,3,0)}(t) &= \frac{5}{4} \Re\bigl\{\overline{A_2}\,\overline{A_1} A_1^3\, H_5(-j\omega_2, -j\omega_1, j\omega_1, j\omega_1, j\omega_1)\, \mathrm{e}^{j(\omega_1 - \Delta\omega)t}\bigr\}\,. \end{aligned}$$

The total response of the system at the frequency $\omega_1 - \Delta\omega$ is therefore a possibly infinite sum composed of the above mixes and higher order ones

$$y^c_{3,(1,0,2,0)} + y^c_{5,(2,0,2,1)} + y^c_{5,(1,1,3,0)} + \dotsb$$

This sum can be represented graphically by drawing the phasor of each summand as a vector in the complex plane and summing them by vector addition. Figure 9.7 shows the phasor diagram for the above sum under the assumption that summands of order higher than fifth can be neglected.

Observe that summands of different order depend differently on the amplitude of the input signals $|A_1|$ and $|A_2|$. For small input amplitudes the third order summand is usually the dominant one. As the amplitude of the input tones grows, higher order summands become first significant and then dominant. This means that both the magnitude and the phase of the output tone change with the amplitude of the input signals. At some level of the input tones there may even be a canceling effect where the output tone becomes very small.
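This amplitude-dependent rotation of the total phasor is easy to illustrate. In the sketch below the values chosen for $H_3$ and $H_5$ at the mix frequencies are purely hypothetical, picked only so that the fifth order summands oppose the third order one at large amplitudes:

```python
import numpy as np

# Hypothetical transfer-function values at the mix frequencies; the
# numbers are illustrative only, not derived from a real system.
H3  = 1.0 * np.exp(1j * 0.3)        # H_3(-jw2, jw1, jw1)
H5a = 0.8 * np.exp(-1j * 2.6)       # H_5 for the mix (2, 0, 2, 1)
H5b = 0.5 * np.exp(-1j * 2.8)       # H_5 for the mix (1, 1, 3, 0)

def tone(A):
    # phasor of the output tone at w1 - dw for |A1| = |A2| = A, zero phases
    p3 = 3 / 4 * A**3 * H3
    p5 = 15 / 8 * A**5 * H5a + 5 / 4 * A**5 * H5b
    return p3 + p5

for A in (0.1, 0.5, 1.0):
    p = tone(A)
    print(f"A = {A}: magnitude {abs(p):.3e}, phase {np.angle(p):+.2f} rad")
```

At small $A$ the phase sits near that of the third order summand; at large $A$ the fifth order summands dominate and the phasor has rotated by well over a radian, and the magnitude no longer scales as $A^3$.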

Among the 56 frequency mixes of fifth order there are several generating tones at new frequencies. In particular, the closest in frequency to $\omega_1$ and $\omega_2$ (not generated by lower order mixes) are at $\omega_1 - 2\Delta\omega$ and $\omega_2 + 2\Delta\omega$. Similarly, higher odd order frequency mixes introduce tones at new frequencies spaced by $\Delta\omega$ from the previous ones. Figure 9.8 illustrates a typical spectrum of the output signal. For simplicity of representation the figure only shows lines generated by fifth or lower order nonlinearities.

**Fig. 9.8** Positive part of a typical magnitude output spectrum of a weakly nonlinear system driven by two sinusoidal input tones. The number $q$ above each spectral line indicates the lowest order nonlinearity generating the line. The same line is also generated by every nonlinearity of order $q + 2l$, $l \in \mathbb{N}$. Only lines generated by fifth or lower order nonlinearities are shown

**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Chapter 10 Composition of Weakly Nonlinear Time Invariant Systems**

# **10.1 Cascade of Noninteracting Systems**

When building large systems, it's common to construct them by combining smaller subsystems. To gain the ability to investigate such systems, in this section we study the fundamental operation of *cascading* two systems, that is, of connecting the output of one system to the input of another one. In our treatment we are going to assume that this connection doesn't change the behavior of the involved systems. This is not always the case. Therefore, before applying what follows, we must carefully ponder this aspect.

Consider the cascade of the weakly nonlinear systems G and H as shown in Fig. 10.1. Both systems are characterised by their respective impulse responses $g_k$ and $h_k$ that we assume to belong to a convolution algebra $\mathcal{A}'_{\oplus,\text{sym}}$. We are looking for an expression to represent

$$\mathbf{y} = (h \circ \mathbf{g})[\mathbf{x}] := h[\mathbf{g}[\mathbf{x}]]\,,$$

the composition of H after G that we denote by H ◦ G.

Let's first consider the system G. Its output *z* when driven by the one dimensional distribution $x_1$ is

$$z = \sum_{k=1}^{\infty} g_k * x_1^{\otimes k}\,.$$

If instead of representing the input signal by a one dimensional distribution $x_1$, we represent it by a sequence $x = (0, x_1, 0, \dots) \in \mathcal{A}'_{\oplus,\text{sym}}$ with all its components but $x_1$ equal to zero and use the product that we defined on $\mathcal{D}'_{\oplus,\text{sym}}$, then we can express *z* in the equivalent form

$$z = \sum_{k=1}^{\infty} g_k * x^k\,.$$


**Table 10.1** Lowest order impulse responses of the composite system *h* ◦ *g* in terms of the impulse responses of *h* and *g*

$$\begin{aligned} (h \circ g)\_1 &= h\_1 \ast g\_1 \\\\ (h \circ g)\_2 &= h\_1 \ast g\_2 + h\_2 \ast g\_1^{\otimes 2} \\\\ (h \circ g)\_3 &= h\_1 \ast g\_3 + 2 \, h\_2 \ast \left[ g\_1 \otimes g\_2 \right]\_{\text{sym}} + h\_3 \ast g\_1^{\otimes 3} \\\\ (h \circ g)\_4 &= h\_1 \ast g\_4 + h\_2 \ast \left[ g\_2^{\otimes 2} \right]\_{\text{sym}} + 2 \, h\_2 \ast \left[ g\_1 \otimes g\_3 \right]\_{\text{sym}} \\ &+ 3 \, h\_3 \ast \left[ g\_1^{\otimes 2} \otimes g\_2 \right]\_{\text{sym}} + h\_4 \ast g\_1^{\otimes 4} \\\\ (h \circ g)\_5 &= h\_1 \ast g\_5 + 2 h\_2 \ast \left[ g\_1 \otimes g\_4 \right]\_{\text{sym}} + 2 h\_2 \ast \left[ g\_2 \otimes g\_3 \right]\_{\text{sym}} \\ &+ 3 h\_3 \ast \left[ g\_1^{\otimes 2} \otimes g\_3 \right]\_{\text{sym}} + 3 h\_3 \ast \left[ g\_1 \otimes g\_2^{\otimes 2} \right]\_{\text{sym}} \\ &+ 4 h\_4 \ast \left[ g\_1^{\otimes 3} \otimes g\_2 \right]\_{\text{sym}} + h\_5 \ast g\_1^{\otimes 5} \end{aligned}$$

The obtained expression is even more reminiscent of a power series than the original one and, more importantly, it is more amenable to generalisation. In fact, if we assume that this expression remains valid for arbitrary input signals belonging to $\mathcal{A}'_{\oplus,\text{sym}}$ then the same expression can be used to describe the output of H in terms of *z*

$$y = \sum_{k=1}^{\infty} h_k * z^k\,.$$

We can then define the *composition* of weakly nonlinear systems by

$$\begin{aligned} (h \circ g)[x_1] &:= \sum_{k=1}^{\infty} (h \circ g)_k * x_1^{\otimes k} := \sum_{k=1}^{\infty} h_k * z^k \\ &= h_1 * (g_1 * x_1 + g_2 * x_1^{\otimes 2} + \cdots) \\ &\quad + h_2 * (g_1 * x_1 + g_2 * x_1^{\otimes 2} + \cdots)^2 \\ &\quad + h_3 * (g_1 * x_1 + g_2 * x_1^{\otimes 2} + \cdots)^3 \\ &\quad + \cdots \end{aligned} \tag{10.1}$$

with $(h \circ g)_k$ denoting the *k*th order impulse response of the overall system and consisting of all terms of dimension *k*. Note that, for every value of *k*, there are only a finite number of them, as the lowest tensor power of $x_1$ appearing in $z^n$ is the *n*th one and thus

$$(z^n)_k = 0 \qquad \text{for} \quad n > k\,.$$

The first five components are listed in Table 10.1 for easy reference. Note, here as well, the analogy with power series and their composition [25].
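In the memoryless case all convolutions and tensor products reduce to ordinary products, and the composition reduces to composition of ordinary power series; the coefficients in Table 10.1 can then be checked numerically. A small sketch with arbitrary example coefficients:

```python
# scalar (memoryless) surrogate: z = sum_k g[k] x^k, y = sum_n h[n] z^n
def compose(h, g, K):
    z = [0.0] * (K + 1)
    for k, gk in g.items():
        z[k] = gk
    out = [0.0] * (K + 1)
    zn = [1.0] + [0.0] * K              # z^0
    for n in range(1, K + 1):
        new = [0.0] * (K + 1)           # zn * z, truncated at order K
        for i, ci in enumerate(zn):
            if ci:
                for j in range(1, K + 1 - i):
                    new[i + j] += ci * z[j]
        zn = new
        for k in range(K + 1):
            out[k] += h.get(n, 0.0) * zn[k]
    return out

g = {1: 2.0, 2: 0.3, 3: -0.5, 4: 0.7, 5: 1.1}
h = {1: 1.5, 2: -0.2, 3: 0.4, 4: 0.6, 5: -0.9}
c = compose(h, g, 5)

# fifth row of Table 10.1 with tensor products replaced by products
g1, g2, g3, g4, g5 = (g[k] for k in range(1, 6))
h1, h2, h3, h4, h5 = (h[k] for k in range(1, 6))
expect = (h1*g5 + 2*h2*g1*g4 + 2*h2*g2*g3
          + 3*h3*g1**2*g3 + 3*h3*g1*g2**2
          + 4*h4*g1**3*g2 + h5*g1**5)
assert abs(c[5] - expect) < 1e-9
```

The same check passes for every row of the table, which is exactly the analogy with power-series composition mentioned above.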

The above definition by itself is not complete as the convolution between distributions of different dimensions is not defined. To complete the definition we thus have to give a meaning to all undefined convolutions appearing in the expression for $(h \circ g)$.

**Table 10.2** Convolutions between impulse responses of different order appearing in the composition of weakly nonlinear systems and their definition. They are grouped by the resulting order, from second to fifth. To simplify the notation the symmetrization operation is not explicitly shown

Let's consider the convolutions appearing in $(h \circ g)_k$. The undefined ones are the ones involving $h_l$ with *l* < *k*. The first thing to note is that, for every *l*, all of them are convolution products between $h_l$ and a distribution that is the tensor product of *l* distributions. In addition, by definition, the sum of the dimensions of these *l* distributions must be *k*. The convolution products that we have to define therefore all have the form

$$h_l * \bigl[g_1^{\otimes \alpha_1} \otimes \dots \otimes g_{k-l+1}^{\otimes \alpha_{k-l+1}}\bigr]_{\text{sym}}, \qquad \alpha_i \in \mathbb{N} \tag{10.2}$$

with

$$\sum\_{i=1}^{k-l+1} i \,\alpha\_i = k, \qquad \sum\_{i=1}^{k-l+1} \alpha\_i = l. \tag{10.3}$$

The simplest case is the one for *l* = 1

$$h_1 * g_k$$

which represents a nonlinearity of order *k*, $g_k$, followed by a linear system $h_1$. To find a suitable general definition for this convolution we let ourselves be guided by regular distributions belonging to $L^1$, evaluated on the diagonal. Setting *k* = 2 for simplicity we have

$$\begin{aligned} \mathrm{ev_d}(z_2)(t) &= \mathrm{ev_d}(g_2 * x_1^{\otimes 2})(t) \\ &= \int_0^\infty \int_0^\infty g_2(\lambda_1, \lambda_2)\, x_1(t - \lambda_1)\, x_1(t - \lambda_2)\, d\lambda_1 d\lambda_2\,. \end{aligned}$$

Using this expression as the input of *h*<sup>1</sup> we obtain

$$\begin{aligned} \mathrm{ev_d}(y_2)(t) &= \int_0^\infty h_1(\tau_1)\, \mathrm{ev_d}(z_2)(t - \tau_1)\, d\tau_1 \\ &= \int_0^\infty h_1(\tau_1) \int_0^\infty \int_0^\infty g_2(\lambda_1, \lambda_2)\, x_1(t - \lambda_1 - \tau_1)\, x_1(t - \lambda_2 - \tau_1)\, d\lambda_1 d\lambda_2 d\tau_1 \\ &= \int_0^\infty \int_0^\infty \Bigl[\int_0^\infty h_1(\tau_1)\, g_2(\lambda_1 - \tau_1, \lambda_2 - \tau_1)\, d\tau_1\Bigr] x_1(t - \lambda_1)\, x_1(t - \lambda_2)\, d\lambda_1 d\lambda_2\,. \end{aligned}$$

Note that the innermost integral in the last expression is a convolution integral between *h*<sup>1</sup> and *g*2. It can be generalised to arbitrary distributions by building the tensor product of *h*1(τ1) with δ(τ<sup>2</sup> − τ1), the Dirac delta distribution in τ<sup>2</sup> parameterised (shifted) by τ<sup>1</sup>

$$\begin{aligned} &\bigl\langle \bigl(h_1(\tau_1) \otimes \delta(\tau_2 - \tau_1)\bigr) * g_2(\tau_1, \tau_2),\; \phi(\tau_1, \tau_2) \bigr\rangle \\ &= \bigl\langle h_1(\tau_1) \otimes \delta(\tau_2 - \tau_1) \otimes g_2(\lambda_1, \lambda_2),\; \phi(\tau_1 + \lambda_1, \tau_2 + \lambda_2) \bigr\rangle \\ &= \bigl\langle h_1(\tau_1) \otimes g_2(\lambda_1, \lambda_2),\; \langle \delta(\tau_2 - \tau_1),\; \phi(\tau_1 + \lambda_1, \tau_2 + \lambda_2) \rangle \bigr\rangle \\ &= \bigl\langle h_1(\tau_1) \otimes g_2(\lambda_1, \lambda_2),\; \phi(\tau_1 + \lambda_1, \tau_1 + \lambda_2) \bigr\rangle\,. \end{aligned}$$

The above derivation generalises without any difficulty to the convolution between *h*<sup>1</sup> and the impulse response of order *k* of G. Taking into account that impulse responses have to be symmetric, we thus define the convolution between *h*<sup>1</sup> and *gk* by

$$(h_1 * g_k)(\tau_1, \dots, \tau_k) := \bigl[h_1(\tau_1) \otimes \delta(\tau_2 - \tau_1, \dots, \tau_k - \tau_1)\bigr]_{\text{sym}} * g_k(\tau_1, \dots, \tau_k)\,.$$

In other words, to convolve *h*<sup>1</sup> with a distribution of dimension *k* we promote *h*<sup>1</sup> to a distribution of *k* dimensions by building the indicated tensor product and use the standard definition of convolution.

The Laplace transform of $h_1 * g_k$ has a very simple representation and leads to an easy interpretation. With

$$\begin{aligned} &\bigl\langle h_1(\tau_1) \otimes \delta(\tau_2 - \tau_1, \dots, \tau_k - \tau_1),\; \mathrm{e}^{-s_1\tau_1 - \dots - s_k\tau_k} \bigr\rangle \\ &= \bigl\langle h_1(\tau_1),\; \langle \delta(\tau_2 - \tau_1, \dots, \tau_k - \tau_1),\; \mathrm{e}^{-s_1\tau_1 - \dots - s_k\tau_k} \rangle \bigr\rangle \\ &= \bigl\langle h_1(\tau_1),\; \mathrm{e}^{-(s_1 + \dots + s_k)\tau_1} \bigr\rangle \end{aligned}$$

we find

$$\mathcal{L}\{h_1 * g_k\}(s_1, \dots, s_k) = H_1(s_1 + \dots + s_k)\, G_k(s_1, \dots, s_k)\,.$$

Therefore, if the input signal *x*<sup>1</sup> consists of *N* tones, the nonlinear system component *gk* generates new tones at frequencies that are linear combinations of *k* of the input frequencies at a time (see (9.25)). The linear system *h*<sup>1</sup> following it simply filters these newly generated tones as prescribed by its transfer function *H*1, in accordance with expectation.

Consider next the simplest remaining undefined convolution

$$h_2 * \bigl[g_1 \otimes g_2\bigr]_{\text{sym}}\,.$$

As for the previous case, we look for a way to promote *h*<sup>2</sup> to a distribution of dimension *k* = 3 so that we can use the standard definition of convolution. We do so by working with multi-tone input signals as this leads to easier interpretations.

Let *g*<sup>1</sup> ⊗ *g*<sup>2</sup> be driven by 3 unit tones

$$x_1(t) = \mathrm{e}^{j\omega_1 t} + \mathrm{e}^{j\omega_2 t} + \mathrm{e}^{j\omega_3 t},$$

then its output is

$$\begin{aligned} &[g_1 \otimes g_2]_{\text{sym}} * x_1^{\otimes 3} \\ &= \sum_{n_1=1}^3 \sum_{n_2=1}^3 \sum_{n_3=1}^3 G_1(j\omega_{n_1})\, G_2(j\omega_{n_2}, j\omega_{n_3})\, \mathrm{e}^{j(\omega_{n_1}\tau_1 + \omega_{n_2}\tau_2 + \omega_{n_3}\tau_3)} \\ &= \sum_{n_1=1}^3 \sum_{n_2=1}^3 \sum_{n_3=1}^3 G_1(j\omega_{n_1})\, \mathrm{e}^{j\omega_{n_1}\tau_1}\, G_2(j\omega_{n_2}, j\omega_{n_3})\, \mathrm{e}^{j(\omega_{n_2}\tau_2 + \omega_{n_3}\tau_3)} \end{aligned}$$

with $G_1(s_1)G_2(s_2, s_3)$ the Laplace transform of $g_1 \otimes g_2$. This expression suggests that $g_1 \otimes g_2$ can be interpreted as the parallel combination of a linear system and a second order one. For each term of the sum, the tone at $\omega_{n_1}$ passes through the linear system $g_1$ while the other two pass through $g_2$. The output $\mathrm{ev_d}(g_1 \otimes g_2)(t)$ can thus be considered as consisting of a sum of pairs of tones, one at $\omega_{n_1}$ and the other at $\omega_{n_2} + \omega_{n_3}$. This sum of tone pairs constitutes the input of $h_2$, which processes them and, for each tone pair, generates the signal

$$H_2(j\omega_{n_1}, j\omega_{n_2} + j\omega_{n_3})\, G_1(j\omega_{n_1})\, G_2(j\omega_{n_2}, j\omega_{n_3})\, \mathrm{e}^{j(\omega_{n_1} + \omega_{n_2} + \omega_{n_3})t}\,.$$

Given these considerations, we define the convolution between *h*<sup>2</sup> and [*g*<sup>1</sup> ⊗ *g*2]sym by

$$h_2 * \bigl[g_1 \otimes g_2\bigr]_{\text{sym}} := \bigl[\bigl(h_2(\tau_1, \tau_2) \otimes \delta(\tau_3 - \tau_2)\bigr) * \bigl(g_1(\tau_1) \otimes g_2(\tau_2, \tau_3)\bigr)\bigr]_{\text{sym}}\,.$$

Its Laplace transform is

$$\mathcal{L}\{h_2 * [g_1 \otimes g_2]_{\text{sym}}\}(s_1, s_2, s_3) = \bigl[H_2(s_1, s_2 + s_3)\, G_1(s_1)\, G_2(s_2, s_3)\bigr]_{\text{sym}}\,.$$

The above considerations can be extended to the general case (10.2). The tensor product of *l* distributions

$$\mathbf{g}\_1^{\otimes \alpha\_1} \otimes \dots \otimes \mathbf{g}\_{k-l+1}^{\otimes \alpha\_{k-l+1}}, \qquad \sum\_{i=1}^{k-l+1} \alpha\_i = l$$

can be thought of as a set of *l* parallel subsystems of order lower than *k*. The constraints (10.3) make sure that, with *k* input tones, its output can be made to consist of *l* tones at linear combinations of the original input frequencies. These can then be passed as input to $h_l$.

The intended meaning of the generalised convolution expressed by (10.2) can thus be captured by promoting $h_l$ to a *k* dimensional distribution obtained by building the tensor product of $h_l$ and $k - l$ appropriately shifted δ distributions constructed as follows.

• The first independent variable of each of the *l* distributions

$$g_{m_1}(\tau_1, \dots, \tau_{m_1}) \otimes \dots \otimes g_{m_j}(\tau_{n+1}, \dots, \tau_{n+m_j}) \otimes \dots \otimes g_{m_l}(\tau_{k-m_l+1}, \dots, \tau_k)$$

form the list of independent variables of $h_l$

$$h_l(\tau_1, \dots, \tau_{n+1}, \dots, \tau_{k-m_l+1})\,.$$

• For each additional variable of $g_{m_j}$, $m_j > 1$, we tensor-multiply $h_l$ by a Dirac distribution in this same variable, shifted by the first one

$$\delta(\tau_{n+2} - \tau_{n+1}) \otimes \dots \otimes \delta(\tau_{n+m_j} - \tau_{n+1})\,.$$

• The resulting *k* dimensional distribution has finally to be symmetrized.
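In the Laplace domain this recipe amounts to grouping the *k* variables according to the dimensions of the *g* factors, feeding each group's sum to $H_l$, and symmetrizing. A generic sketch (the function name and interface are ours, not the book's):

```python
from itertools import permutations

def composite_laplace(Hl, Gs, s):
    """Laplace transform of h_l * [g_{m_1} (x) ... (x) g_{m_l}]_sym.
    Gs is a list of (dimension, transfer function) pairs; s holds the
    k Laplace variables. Symmetrization = average over permutations."""
    assert sum(d for d, _ in Gs) == len(s)
    def term(p):
        args, prod, i = [], 1.0, 0
        for d, G in Gs:
            blk = p[i:i + d]; i += d
            args.append(sum(blk))       # H_l sees the sum of each block
            prod *= G(*blk)
        return prod * Hl(*args)
    perms = list(permutations(s))
    return sum(term(p) for p in perms) / len(perms)

# consistency check against the explicit formula for h_2 * [g_1 (x) g_2]_sym
G1 = lambda s1: 1 / (s1 + 1)
G2 = lambda s1, s2: 1 / ((s1 + 1) * (s2 + 1) * (s1 + s2 + 1))
H2 = lambda s1, s2: 1 / ((s1 + 2) * (s2 + 2))

s = (0.3 + 1.0j, 0.5 - 2.0j, 0.7 + 0.4j)
lhs = composite_laplace(H2, [(1, G1), (2, G2)], s)
rhs = sum(H2(p[0], p[1] + p[2]) * G1(p[0]) * G2(p[1], p[2])
          for p in permutations(s)) / 6
assert abs(lhs - rhs) < 1e-12
```

The transfer functions used in the check are arbitrary rational examples; the same helper reproduces every entry pattern of Table 10.3.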

**Table 10.3** Convolutions between impulse responses of different order appearing in the composition of weakly nonlinear systems and their Laplace transforms. They are grouped by the resulting order, from second to fifth. To simplify the notation the symmetrization operation is not explicitly shown


A few convolution examples are given in Table 10.2. The Laplace transforms of these examples are tabulated in Table 10.3. With this definition we have completed the description of how to compose weakly nonlinear systems.

#### **Example 10.1: Third-Order Nonlinearity**

The third order nonlinearity of H ◦ G is generated in three distinct ways: First, by the nonlinearity of third order of H applied to the output of the linear part of G

$$H_3(s_1, s_2, s_3)\, G_1(s_1)\, G_1(s_2)\, G_1(s_3)\, X_1(s_1)\, X_1(s_2)\, X_1(s_3)\,,$$

second, by the nonlinearity of third order of G passing through the linear part of H

$$H_1(s_1 + s_2 + s_3)\, G_3(s_1, s_2, s_3)\, X_1(s_1)\, X_1(s_2)\, X_1(s_3)$$

and third, by the second order nonlinearity of H applied to the output of first and second order of G

$$2\,\bigl[H_2(s_1, s_2 + s_3)\, G_1(s_1)\, G_2(s_2, s_3)\bigr]_{\text{sym}}\, X_1(s_1)\, X_1(s_2)\, X_1(s_3)\,.$$

These mechanisms are represented graphically in Fig. 10.2. In particular one should note that, even if neither G nor H shows nonlinearities of third order, the combined system H ◦ G in general still has an impulse response of third order different from zero.

**Fig. 10.2** Graphical representation of the third order nonlinearity generated by the composition H ◦ G. Each path has to be understood as symmetrised
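The third mechanism is worth checking numerically: even with two subsystems that each have only first and second order responses, the cascade exhibits third order distortion. In the sketch below the transfer functions are hypothetical examples (first-order lowpass sections with weak square-law terms), chosen only for illustration:

```python
from itertools import permutations

# Hypothetical subsystems with only first and second order responses,
# so that G3 = H3 = 0
G1 = lambda s: 1 / (s + 1)
G2 = lambda s1, s2: 0.10 * G1(s1) * G1(s2) * G1(s1 + s2)
H1 = lambda s: 1 / (s + 2)
H2 = lambda s1, s2: 0.05 * H1(s1) * H1(s2) * H1(s1 + s2)

def cascade_H3(s1, s2, s3):
    # only the mixed term 2 [H2(s1, s2+s3) G1(s1) G2(s2, s3)]_sym survives
    # in the third order response of H o G (Example 10.1)
    terms = (2 * H2(a, b + c) * G1(a) * G2(b, c)
             for a, b, c in permutations((s1, s2, s3)))
    return sum(terms) / 6

val = cascade_H3(1j * 1.0, 1j * 1.1, -1j * 1.0)
print(abs(val))   # nonzero: the cascade shows third order distortion
```

Evaluating at, e.g., $(j\omega_1, j\omega_2, -j\omega_1)$ gives a nonzero value, confirming that third order intermodulation products appear even though neither subsystem has a third order nonlinearity of its own.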

#### **Example 10.2: Memory-Less Systems**

Consider the convolution (10.2) with $h_l$ a Dirac distribution of dimension *l* < *k*

$$\delta^{\otimes l} * \bigl[g_1^{\otimes \alpha_1} \otimes \dots \otimes g_{k-l+1}^{\otimes \alpha_{k-l+1}}\bigr]_{\text{sym}}\,.$$

By definition, the lower dimensional distribution $\delta^{\otimes l}$ is promoted to a distribution of dimension $k$ by building the tensor product with shifted Dirac distributions as explained. For simplicity, we denote the promoted distribution by $h\_k$. Application of the convolution to a test function $\phi \in \mathcal{D}(\mathbb{R}^k)$ is defined by

$$\begin{aligned} \left\langle h\_k(\tau) \otimes \left[ g\_1^{\otimes \alpha\_1} \otimes \dots \otimes g\_{k-l+1}^{\otimes \alpha\_{k-l+1}}(\lambda) \right]\_{\text{sym}}, \phi(\tau + \lambda) \right\rangle \\ = \left\langle \left[ g\_1^{\otimes \alpha\_1} \otimes \dots \otimes g\_{k-l+1}^{\otimes \alpha\_{k-l+1}}(\lambda) \right]\_{\text{sym}}, \left\langle h\_k(\tau), \phi(\tau + \lambda) \right\rangle \right\rangle \end{aligned}$$

with $\tau, \lambda \in \mathbb{R}^k$. The inner distribution is easily evaluated

$$\left\langle h\_k(\tau), \phi(\tau + \lambda) \right\rangle = \phi(\lambda)$$

and from this we conclude that for any *l* ≤ *k*

$$\delta^{\otimes l} \ast \left[ g\_1^{\otimes \alpha\_1} \otimes \dots \otimes g\_{k-l+1}^{\otimes \alpha\_{k-l+1}} \right]\_{\text{sym}} = \left[ g\_1^{\otimes \alpha\_1} \otimes \dots \otimes g\_{k-l+1}^{\otimes \alpha\_{k-l+1}} \right]\_{\text{sym}}\,.$$

With this result we see that the response of a memoryless weakly-nonlinear system can be written in the following equivalent forms

$$\mathbf{y} = \sum\_{k=1}^{\infty} c\_k \mathbf{x}^k = \sum\_{k=1}^{\infty} c\_k \delta^{\otimes l\_k} \ast \mathbf{x}^k, \qquad l\_k \le k \,. \tag{10.4}$$

In general we will use $l\_k = k$ so that, if the input signal is a one dimensional distribution $x\_1$, we do not need to use the extended definition of convolution.

Our definition of convolution between distributions of different dimensions and our definition of the one dimensional differential operator operating on higher dimensional distributions (9.14) are compatible. In fact the former is a generalization of the latter. Consider the differential operator acting on the $k$ dimensional Dirac distribution $\delta^{\otimes k}$. Application to a test function $\phi \in \mathcal{D}(\mathbb{R}^k)$ results in

$$\begin{aligned} \left< D\delta^{\otimes k}, \phi \right> &= \left< \sum\_{j=1}^{k} D\_j \delta^{\otimes k}, \phi \right> = -\left< \delta^{\otimes k}, \sum\_{j=1}^{k} D\_j \phi \right> \\ &= -\sum\_{j=1}^{k} D\_j \phi \left( 0, \dots, 0 \right). \end{aligned}$$

If we now consider $\phi$ as a function of the variable $\tau\_1$ only and $D\_{\tau\_1}$ the total differential operator, then we can write

$$\begin{aligned} -\sum\_{j=1}^{k} D\_j \phi(0, \dots, 0) &= -\left< \delta(\tau\_1), D\_{\tau\_1} \phi(\tau\_1, \dots, \tau\_1) \right> \\ &= \left< D\_{\tau\_1} \delta(\tau\_1), \phi(\tau\_1, \dots, \tau\_1) \right> \\ &= \left< (D\delta) \* \delta^{\otimes k}, \phi \right> \end{aligned}$$

which shows that our definition of the differential operator acting on a higher dimensional distribution is equal to the convolution of the one dimensional distribution *D*δ promoted by our definition of convolution to a *k* dimensional distribution

$$D\delta^{\otimes k} = (D\delta) \* \delta^{\otimes k} . \tag{10.5}$$

This is also apparent from the Laplace transforms, which in both cases are equal to

$$s\_1 + \cdots + s\_k\,.$$

The differential operator and the extended definition of convolution do satisfy (3.15). We show this by way of an example. Consider the convolution between $h\_2$ and $g\_1 \otimes g\_2$. Suppose further that $h\_2$ is the derivative in the sense of (9.14) of another distribution $w\_2$

$$h\_2 = Dw\_2 \, .$$

Applying the convolution product to a test function $\phi \in \mathcal{D}(\mathbb{R}^3)$ and using the extended definition of convolution we obtain

$$
\begin{split}
\langle h\_2 \ast (g\_1 \otimes g\_2), \phi \rangle &= \langle Dw\_2 \ast (g\_1 \otimes g\_2), \phi \rangle \\
&= \langle \left\{ [Dw\_2(\tau\_1, \tau\_2)] \otimes \delta(\tau\_3 - \tau\_2) \right\} \otimes [g\_1(\lambda\_1) \otimes g\_2(\lambda\_2, \lambda\_3)], \\
&\qquad \phi(\tau\_1 + \lambda\_1, \tau\_2 + \lambda\_2, \tau\_3 + \lambda\_3) \rangle \\
&= \langle [Dw\_2(\tau\_1, \tau\_2)] \otimes [g\_1(\lambda\_1) \otimes g\_2(\lambda\_2, \lambda\_3)], \\
&\qquad \phi(\tau\_1 + \lambda\_1, \tau\_2 + \lambda\_2, \tau\_2 + \lambda\_3) \rangle \, .
\end{split}
$$

Further, using the definition of differentiation and noting that

$$D\_{\tau\_2} \phi(\tau\_1 + \lambda\_1, \tau\_2 + \lambda\_2, \tau\_2 + \lambda\_3) = (D\_{\lambda\_2} + D\_{\lambda\_3}) \phi(\tau\_1 + \lambda\_1, \tau\_2 + \lambda\_2, \tau\_2 + \lambda\_3)$$

we obtain

$$
\begin{aligned}
-\langle [w\_2(\tau\_1, \tau\_2)] \otimes [g\_1(\lambda\_1) \otimes g\_2(\lambda\_2, \lambda\_3)], \\
(D\_{\tau\_1} + D\_{\tau\_2})\phi(\tau\_1 + \lambda\_1, \tau\_2 + \lambda\_2, \tau\_2 + \lambda\_3)\rangle \\
= -\langle [w\_2(\tau\_1, \tau\_2)] \otimes [g\_1(\lambda\_1) \otimes g\_2(\lambda\_2, \lambda\_3)], \\
(D\_{\lambda\_1} + D\_{\lambda\_2} + D\_{\lambda\_3})\phi(\tau\_1 + \lambda\_1, \tau\_2 + \lambda\_2, \tau\_2 + \lambda\_3)\rangle \\
= \langle w\_2(\tau\_1, \tau\_2) \otimes D[g\_1(\lambda\_1) \otimes g\_2(\lambda\_2, \lambda\_3)], \\
\phi(\tau\_1 + \lambda\_1, \tau\_2 + \lambda\_2, \tau\_2 + \lambda\_3)\rangle
\end{aligned}
$$

or, summarising

$$(Dw\_2)\*(\mathcal{g}\_1 \otimes \mathcal{g}\_2) = w\_2\*D(\mathcal{g}\_1 \otimes \mathcal{g}\_2)\,. \tag{10.6}$$

# **10.2 Feedback**

A powerful technique used in the design of all sorts of systems is feedback. In control systems design it is used to stabilise and adjust the dynamics of a system to achieve a desired behaviour. It's also used to reduce the sensitivity of systems to poorly controlled parameters. Here we are interested in describing the nonlinearities of a system employing feedback in terms of those of its constituent subsystems.

Consider the system shown in Fig. 10.3, composed of a forward subsystem G and a feedback subsystem H. The input of G is the difference between the input signal *x* and the signal *z*, obtained by sensing the output *y* and processing it with H. The system is described by the following equations

**Fig. 10.3** Weakly nonlinear system with feedback

$$e = x - z\,, \qquad z = (h \diamond g)[e]\,, \qquad y = g[e]\,.$$

Our objective is to obtain the impulse responses of the system based on the ones of G and H. We denote the overall system by W and its impulse response of order *k* by w*<sup>k</sup>* .

We start by computing the linear impulse response. The composition of linear systems is obtained by convolving their first order impulse responses. We can therefore write the equation

$$e\_1 = \delta - z\_1 = \delta - h\_1 \* g\_1 \* e\_1$$

and, solving for *e*1, we obtain

$$e\_1 = (\delta + h\_1 \ast g\_1)^{\ast-1}\,.$$

With *e*<sup>1</sup> the calculation of the linear impulse response is immediate

$$w\_1 = g\_1 \ast e\_1 = (\delta + h\_1 \ast g\_1)^{\ast-1} \ast g\_1\,.$$

Its Laplace transform is a classical result of linear system theory

$$W\_1(s) = \frac{G\_1(s)}{1 + H\_1(s)G\_1(s)}\,.$$

If in the frequency range of interest the magnitude of the linear loop gain is large, $|H\_1(j\omega)G\_1(j\omega)| \gg 1$, then the linear response of the system is almost exclusively determined by the feedback network

$$W\_1(j\omega) \approx \frac{1}{H\_1(j\omega)}\,.$$

For completeness, we give the Laplace transform of *e*<sup>1</sup> as well

$$E\_1(s\_1) = \frac{1}{1 + H\_1(s\_1)G\_1(s\_1)}\,.$$

With it the first order transfer function can be written as

$$W\_1(s\_1) = G\_1(s\_1) E\_1(s\_1)\,.$$
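These linear relations are straightforward to check with a CAS. The following sketch uses sympy with a hypothetical forward path $G_1(s) = 10/(s+1)$ and a frequency-flat feedback $H_1(s) = 1/2$ (these choices are ours, not from the text):

```python
# CAS sketch (illustrative values): verify that W1 = G1*E1 equals the
# classical closed-loop result G1/(1 + H1*G1).
import sympy as sp

s = sp.symbols('s')
G1 = 10 / (s + 1)          # hypothetical forward linear transfer function
H1 = sp.Rational(1, 2)     # hypothetical flat feedback transfer function

E1 = 1 / (1 + H1 * G1)     # linear error transfer function
W1 = sp.simplify(G1 * E1)  # W1 = G1*E1

assert sp.simplify(W1 - G1 / (1 + H1 * G1)) == 0
print(W1)                  # the closed-loop pole moved from -1 to -6
```

Note how feedback shifts the pole: the open-loop pole at $-1$ moves to $-6$ as the loop gain flattens and speeds up the response.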

Next we compute the impulse response of second order *W*2. Using the generalised response of weakly nonlinear systems (10.1), the second order component of the output signal *y* and of the feedback signal *z* are given by

$$\begin{aligned} y\_2 &= g\_2 \ast e\_1^{\otimes 2} + g\_1 \ast e\_2 \\ z\_2 &= h\_2 \ast y\_1^{\otimes 2} + h\_1 \ast y\_2\,. \end{aligned}$$

Note that, since we used a Dirac impulse as input, the output components *y*<sup>2</sup> and *y*<sup>1</sup> correspond to the impulse responses w<sup>2</sup> and w<sup>1</sup> respectively. By substituting the first equation into the second, using the previous result for w<sup>1</sup> and taking into account the fact that the input signal is a one dimensional distribution, we obtain an equation in *e*2

$$z\_2 = -e\_2 = h\_2 \ast (g\_1 \ast e\_1)^{\otimes 2} + h\_1 \ast (g\_2 \ast e\_1^{\otimes 2} + g\_1 \ast e\_2)\,,$$

whose solution is

$$e\_2 = - (\delta^{\otimes 2} + h\_1 \ast g\_1)^{\ast -1} \ast (h\_2 \ast w\_1^{\otimes 2} + h\_1 \ast g\_2 \ast e\_1^{\otimes 2}) \, .$$

With *e*<sup>2</sup> and the previous results for *e*<sup>1</sup> and w<sup>1</sup> the second order impulse response is thus given by

$$w\_2 = g\_2 \ast e\_1^{\otimes 2} + g\_1 \ast e\_2\,.$$

Its Laplace transform is

$$W\_2(\mathbf{s}\_1, \mathbf{s}\_2) = G\_2(\mathbf{s}\_1, \mathbf{s}\_2)E\_1(\mathbf{s}\_1)E\_1(\mathbf{s}\_2) + G\_1(\mathbf{s}\_1 + \mathbf{s}\_2)E\_2(\mathbf{s}\_1, \mathbf{s}\_2)$$

with

$$\begin{aligned} E\_2(\mathbf{s}\_1, \mathbf{s}\_2) &= \\ &- \frac{H\_2(\mathbf{s}\_1, \mathbf{s}\_2) W\_1(\mathbf{s}\_1) W\_1(\mathbf{s}\_2) + H\_1(\mathbf{s}\_1 + \mathbf{s}\_2) G\_2(\mathbf{s}\_1, \mathbf{s}\_2) E\_1(\mathbf{s}\_1) E\_1(\mathbf{s}\_2)}{1 + H\_1(\mathbf{s}\_1 + \mathbf{s}\_2) G\_1(\mathbf{s}\_1 + \mathbf{s}\_2)}. \end{aligned}$$

Combining these expressions and using previous results, we can write *W*<sup>2</sup> in the following form

$$\begin{aligned} W\_2(s\_1, s\_2) &= \Big[ E\_1(s\_1 + s\_2) G\_2(s\_1, s\_2) \\ &\quad - W\_1(s\_1 + s\_2) H\_2(s\_1, s\_2) G\_1(s\_1) G\_1(s\_2) \Big] E\_1(s\_1) E\_1(s\_2) \quad (10.7) \end{aligned}$$

**Fig. 10.4** Signal flow graph of a weakly nonlinear system with feedback

which is easily interpretable with the help of the signal flow graph (SFG) shown in Fig. 10.4 (see Appendix A).

The first term is composed of the transmission of the input signal (we think of it as composed of *k* tones) through the linear system to node *E*, the input of the nonlinear subsystem G. This part of the signal flow is represented by the factor $E\_1(s\_1)E\_1(s\_2)$. The second order nonlinearity of G then generates a new tone as determined by $G\_2(s\_1, s\_2)$. This newly generated tone is represented in the SFG by a source node because it is different from the input ones. The propagation of the new tone to the output of the system is accounted for by the last factor, $E\_1(s\_1 + s\_2)$.

The second summand in (10.7) has a similar interpretation. The input signal first propagates through the linear system to the input of the other nonlinear subsystem H. This part of the signal flow is represented by *G*1(*s*1)*E*1(*s*1)*G*1(*s*2)*E*1(*s*2) = *W*1(*s*1)*W*1(*s*2). The second order nonlinearity of H then generates a new tone as determined and accounted for by the *H*2(*s*1,*s*2) factor. Finally, the new tone propagates to the output of the system, contributing the last factor, −*W*1(*s*<sup>1</sup> + *s*2).
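The factored form (10.7) can be checked against the unfactored expression $W_2 = G_2E_1E_1 + G_1E_2$ with a CAS, as the text later suggests for higher orders. A sketch in sympy, with hypothetical kernels of our own choosing (a one-pole $G_1$, flat $H_1$, constant second-order kernels):

```python
# CAS sketch: confirm that the factored form of W2 in (10.7) agrees with
# W2 = G2*E1*E1 + G1*E2.  The kernels below are hypothetical examples.
import sympy as sp

s1, s2, A, B, c2, d2 = sp.symbols('s1 s2 A B c2 d2')

G1 = lambda s: A / (s + 1)      # assumed linear forward transfer function
H1 = lambda s: B                # assumed flat linear feedback
G2 = c2                         # assumed constant 2nd-order kernels
H2 = d2

E1 = lambda s: 1 / (1 + H1(s) * G1(s))
W1 = lambda s: G1(s) * E1(s)

E2 = -(H2 * W1(s1) * W1(s2)
       + H1(s1 + s2) * G2 * E1(s1) * E1(s2)) / (1 + H1(s1 + s2) * G1(s1 + s2))

W2_direct = G2 * E1(s1) * E1(s2) + G1(s1 + s2) * E2
W2_factored = (E1(s1 + s2) * G2
               - W1(s1 + s2) * H2 * G1(s1) * G1(s2)) * E1(s1) * E1(s2)

assert sp.simplify(W2_direct - W2_factored) == 0
```

The cancellation rests on the identity $1 - W_1(\sigma)H_1(\sigma) = E_1(\sigma)$, so the check succeeds for any rational choice of the kernels.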

We now proceed with the calculation of the third order impulse response of the system. The procedure is similar to the one used for the computation of the second order one. From

$$\begin{aligned} \mathbf{y}\_3 &= \mathbf{g}\_3 \ast e\_1^{\otimes 3} + 2\mathbf{g}\_2 \ast [e\_1 \otimes e\_2]\_{\text{sym}} + \mathbf{g}\_1 \ast e\_3 \\ \mathbf{z}\_3 &= h\_3 \ast \mathbf{y}\_1^{\otimes 3} + 2h\_2 \ast [\mathbf{y}\_1 \otimes \mathbf{y}\_2]\_{\text{sym}} + h\_1 \ast \mathbf{y}\_3 \end{aligned}$$

and the previous results we obtain an equation for *e*<sup>3</sup>

$$\begin{aligned} z\_3 = -e\_3 &= h\_3 \ast (g\_1 \ast e\_1)^{\otimes 3} \\ &+ 2h\_2 \ast \left[ (g\_1 \ast e\_1) \otimes (g\_2 \ast e\_1^{\otimes 2} + g\_1 \ast e\_2) \right]\_{\text{sym}} \\ &+ h\_1 \ast (g\_3 \ast e\_1^{\otimes 3} + 2g\_2 \ast [e\_1 \otimes e\_2]\_{\text{sym}} + g\_1 \ast e\_3) \end{aligned}$$

whose solution is

$$\begin{aligned} e\_3 &= - (\delta^{\otimes 3} + h\_1 \ast g\_1)^{\ast -1} \ast \left[ h\_3 \ast (g\_1 \ast e\_1)^{\otimes 3} \\ &+ 2h\_2 \ast \left[ (g\_1 \ast e\_1) \otimes (g\_2 \ast e\_1^{\otimes 2} + g\_1 \ast e\_2) \right]\_{\text{sym}} \\ &+ h\_1 \ast (g\_3 \ast e\_1^{\otimes 3} + 2g\_2 \ast (e\_1 \otimes e\_2)\_{\text{sym}}) \right]. \end{aligned}$$

The third order impulse response is obtained by inserting this expression for *e*<sup>3</sup> and the previous ones for *e*<sup>1</sup> and *e*<sup>2</sup> into

$$w\_3 = \mathbf{g}\_3 \ast e\_1^{\otimes 3} + 2\mathbf{g}\_2 \ast [e\_1 \otimes e\_2]\_{\text{sym}} + \mathbf{g}\_1 \ast e\_3 \dots$$

As we find the expressions more easily interpretable, we perform this calculation in the Laplace domain. The Laplace transforms of the last expressions for $w\_3$ and $e\_3$ are

$$\begin{aligned} W\_3(s\_1, s\_2, s\_3) &= G\_3(s\_1, s\_2, s\_3) E\_1(s\_1) E\_1(s\_2) E\_1(s\_3) \\ &+ 2 \left[ G\_2(s\_1, s\_2 + s\_3) E\_1(s\_1) E\_2(s\_2, s\_3) \right]\_{\text{sym}} \\ &+ G\_1(s\_1 + s\_2 + s\_3) E\_3(s\_1, s\_2, s\_3) \end{aligned}$$

and

$$\begin{aligned} E\_3(s\_1, s\_2, s\_3) &= \frac{-1}{1 + H\_1(s\_1 + s\_2 + s\_3)G\_1(s\_1 + s\_2 + s\_3)} \\ &\quad \Big\{ H\_3(s\_1, s\_2, s\_3)W\_1(s\_1)W\_1(s\_2)W\_1(s\_3) \\ &\quad + 2 \Big[ H\_2(s\_1, s\_2 + s\_3)W\_1(s\_1) \big[ G\_2(s\_2, s\_3)E\_1(s\_2)E\_1(s\_3) \\ &\qquad + G\_1(s\_2 + s\_3)E\_2(s\_2, s\_3) \big] \Big]\_{\text{sym}} \\ &\quad + H\_1(s\_1 + s\_2 + s\_3) \Big[ G\_3(s\_1, s\_2, s\_3)E\_1(s\_1)E\_1(s\_2)E\_1(s\_3) \\ &\qquad + 2 \big[ G\_2(s\_1, s\_2 + s\_3)E\_1(s\_1)E\_2(s\_2, s\_3) \big]\_{\text{sym}} \Big] \Big\} \end{aligned}$$

respectively. Combining these and previous results we can express *W*<sup>3</sup> as follows

$$\begin{aligned} W\_3(s\_1, s\_2, s\_3) &= E\_1(s\_1 + s\_2 + s\_3) G\_3(s\_1, s\_2, s\_3) E\_1(s\_1) E\_1(s\_2) E\_1(s\_3) \\ &- W\_1(s\_1 + s\_2 + s\_3) H\_3(s\_1, s\_2, s\_3) W\_1(s\_1) W\_1(s\_2) W\_1(s\_3) \\ &+ 2 W\_1(s\_1 + s\_2 + s\_3) H\_2(s\_1, s\_2 + s\_3) W\_1(s\_1) \\ &\quad \cdot \Big[ W\_1(s\_2 + s\_3) H\_2(s\_2, s\_3) W\_1(s\_2) W\_1(s\_3) \\ &\qquad - E\_1(s\_2 + s\_3) G\_2(s\_2, s\_3) E\_1(s\_2) E\_1(s\_3) \Big]\_{\text{sym}} \\ &- 2 E\_1(s\_1 + s\_2 + s\_3) G\_2(s\_1, s\_2 + s\_3) E\_1(s\_1) \\ &\quad \cdot \Big[ H\_1(s\_2 + s\_3) E\_1(s\_2 + s\_3) G\_2(s\_2, s\_3) E\_1(s\_2) E\_1(s\_3) \\ &\qquad + E\_1(s\_2 + s\_3) H\_2(s\_2, s\_3) W\_1(s\_2) W\_1(s\_3) \Big]\_{\text{sym}}\,. \tag{10.8} \end{aligned}$$

While this expression is rather long, it can be readily interpreted with the help of the SFG of Fig. 10.4. The first term is composed by the factor *E*1(*s*1)*E*1(*s*2)*E*1(*s*3) representing the input signal propagating through the linear part of the system to the input of G. The third order nonlinearity of G then generates a new tone as witnessed by *G*3(*s*1,*s*2,*s*3). Finally, the newly generated tone propagates through the linear part of the system to the output, *E*1(*s*<sup>1</sup> + *s*<sup>2</sup> + *s*3).

The second term has a similar structure and represents the contribution to the third order nonlinearity of W by the third order nonlinearity of H.

The next summand represents the mixing of the second order nonlinear component of H with the input signal in the second order nonlinearity of H (again). Specifically, thinking of the input signal as composed of three tones, the factors $W\_1(s\_1)$, $W\_1(s\_2)$ and $W\_1(s\_3)$ represent the input tones propagating through the linear part of the system to the input of H. There, the second and third tones pass through the second order nonlinearity of H generating a new second order tone, $H\_2(s\_2, s\_3)$. This second order tone then propagates through the linear part of the system to the input of H, $-W\_1(s\_2 + s\_3)$. There the second order tone and the first input tone pass through the second order distortion of H together and generate a new third order tone as witnessed by $2H\_2(s\_1, s\_2 + s\_3)$. Finally, the third order tone propagates through the linear part of the system to the output, $-W\_1(s\_1 + s\_2 + s\_3)$.

The remaining summands all have a similar structure and interpretation to the one just described. They describe the first input tone mixing with a second order tone. The difference between them lies in which subsystem generates the second order tone and which one mixes the first tone with it.

Higher order impulse responses and nonlinear transfer functions of W can be obtained in a similar way. While the expressions become long, they can easily be computed with the help of computer algebra system (CAS) programs and, referring to the SFG in Fig. 10.4, can be interpreted without difficulty.

From the first three nonlinear transfer functions of the feedback based system W we can draw the following conclusions.


• The nonlinear terms generated exclusively by the forward subsystem G are suppressed by making the magnitude of the loop gain large. That's because their contributions to the transfer function of order *k* are proportional to

$$E\_1(\mathbf{s}\_1)\cdots E\_1(\mathbf{s}\_k)E\_1(\mathbf{s}\_1+\cdots+\mathbf{s}\_k)$$

and, as the loop gain is made large, *E*<sup>1</sup> becomes small.

• The nonlinear terms generated exclusively by the feedback subsystem H are not suppressed by making the magnitude of the loop gain |*H*1(*s*)*G*1(*s*)| large. That's because none of these terms are proportional to the linear component of the error signal *E*1. Instead, they are all proportional to

$$W\_1(s\_1) \cdots W\_1(s\_k)\, W\_1(s\_1 + \cdots + s\_k)$$

which doesn't necessarily become small as the loop gain is made large.

• Nonlinear terms generated by combinations of nonlinearities of G as well as of H include factors in *E*<sup>1</sup> and therefore do experience some level of suppression at large loop gains.
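These conclusions are easy to illustrate numerically. The sketch below takes all kernels as frequency-flat scalars with hypothetical values of our own, and evaluates the two second-order contributions of (10.7) at a modest and at a large loop gain:

```python
# Numeric sketch: with memory-less (frequency-flat) kernels, compare the
# G-generated and H-generated parts of W2 at low and high loop gain.
def w2_terms(A, b=0.5, g2=1.0, h2=1.0):
    """G-generated and H-generated 2nd-order terms for flat kernels
    G1 = A, H1 = b, G2 = g2, H2 = h2 (hypothetical values)."""
    E1 = 1.0 / (1.0 + A * b)      # linear error transfer
    W1 = A * E1                   # closed-loop linear gain
    return g2 * E1**3, -h2 * W1**3

g_lo, h_lo = w2_terms(10.0)       # modest loop gain
g_hi, h_hi = w2_terms(1000.0)     # large loop gain

print(g_lo, g_hi)   # G-generated distortion collapses with loop gain
print(h_lo, h_hi)   # H-generated distortion tends to -h2/b**3, unaffected
```

As the forward gain grows from 10 to 1000, the G-generated term shrinks by roughly the cube of the loop-gain increase, while the H-generated term saturates near $-h_2/b^3$.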

#### **Example 10.3: Linear Feedback**

As a special case we consider a system with linear feedback. This means that all transfer functions of H are zero, except for *H*1. In this case the second and third order nonlinear transfer functions of the system are

$$\begin{aligned} W\_2(\mathbf{s}\_1, \mathbf{s}\_2) &= \frac{G\_2(\mathbf{s}\_1, \mathbf{s}\_2) E\_1(\mathbf{s}\_1) E\_1(\mathbf{s}\_2)}{1 + H\_1(\mathbf{s}\_1 + \mathbf{s}\_2) G\_1(\mathbf{s}\_1 + \mathbf{s}\_2)} \\ &= E\_1(\mathbf{s}\_1 + \mathbf{s}\_2) G\_2(\mathbf{s}\_1, \mathbf{s}\_2) E\_1(\mathbf{s}\_1) E\_1(\mathbf{s}\_2) \end{aligned}$$

and

$$\begin{aligned} W\_3(s\_1, s\_2, s\_3) &= E\_1(s\_1 + s\_2 + s\_3) \Big[ G\_3(s\_1, s\_2, s\_3) \\ &\quad - 2 \big[ G\_2(s\_1, s\_2 + s\_3) H\_1(s\_2 + s\_3) E\_1(s\_2 + s\_3) G\_2(s\_2, s\_3) \big]\_{\text{sym}} \Big] \\ &\qquad \cdot E\_1(s\_1) E\_1(s\_2) E\_1(s\_3) \end{aligned}$$

respectively. Both of them are proportional to

$$E\_1(\mathbf{s}\_1) \cdots E\_1(\mathbf{s}\_k) E\_1(\mathbf{s}\_1 + \cdots + \mathbf{s}\_k)$$

and can therefore be suppressed by making the loop gain large.

#### **Example 10.4**

We revisit Example 9.5. Here however we replace the initial condition $y\_0\delta$ by a generic input signal *x* so that the system equation becomes

$$(D\delta + a\delta) \* \mathcal{y} = \mathbf{x} + c\mathbf{y}^2.$$

Using (10.1) we can rewrite the equation in the following form

$$\mathbf{y} = (D\delta + a\delta)^{\*-1} \* (\mathbf{x} + c\mathbf{y}^2),$$

which can be interpreted as describing a linear system with nonlinear feedback. The problem can therefore be recast as the problem of finding the nonlinear transfer functions of a system W constituted by the forward subsystem G with linear transfer function

$$G\_1(s\_1) = \frac{1}{s\_1 + a}$$

and a feedback subsystem H described by the second order nonlinear transfer function

$$H\_2(s\_1, s\_2) = -c$$

as shown in Fig. 10.5. Note that we have assumed negative feedback for consistency with our general treatment. This last expression is obtained by specialising the general expression $cy^2$ to an input signal having only a one dimensional component $y\_1$

$$c\mathbf{y}\_1^2 = c\mathbf{y}\_1^{\otimes 2} = c\boldsymbol{\delta}^{\otimes 2} \* \mathbf{y}\_1^{\otimes 2}.$$

The obtained expression clearly describes a system whose only impulse response differing from zero is the second order one $h\_2 = c\delta^{\otimes 2}$ (see also Example 10.2).

In this formulation of the problem the solution is found by inserting the above expressions for the transfer functions of the subsystems into Eqs. (10.7) and (10.8). The obtained expressions obviously agree with the ones obtained in Example 9.5 by calculation from the convolution equation.
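The second-order transfer function can be evaluated from (10.7) mechanically. A sympy sketch (our own) with $G_1 = 1/(s+a)$, $H_1 = 0$, $G_2 = 0$ and $H_2 = -c$:

```python
# CAS sketch: evaluate (10.7) for Example 10.4.  With H1 = 0 there is no
# linear feedback, so E1 = 1 and W1 = G1; the negative feedback sign is
# absorbed into H2 = -c.
import sympy as sp

s1, s2, a, c = sp.symbols('s1 s2 a c')

G1 = lambda s: 1 / (s + a)
E1 = lambda s: sp.Integer(1)    # no linear feedback
W1 = lambda s: G1(s)
G2, H2 = 0, -c

W2 = sp.simplify((E1(s1 + s2) * G2
                  - W1(s1 + s2) * H2 * G1(s1) * G1(s2)) * E1(s1) * E1(s2))
print(W2)
```

The result simplifies to $c/\big((s_1+s_2+a)(s_1+a)(s_2+a)\big)$: the product of the two first-order input propagations, the kernel $c$, and the linear propagation of the new tone to the output.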

# **10.3 Linearisation**

Many systems are designed based on the theory of linear systems, and deviations from linear behavior in practical implementations are undesired. For this reason one often tries to minimise the responses of order higher than one. In this section we investigate the possibility of suppressing higher order responses by preceding the system in question H with another system G, or by following it with a system K.

We call a system K designed to suppress all nonlinear transfer functions of K ◦ H up to order *k* a *post-lineariser of order k* and a system K suppressing all responses of K ◦ H of order higher than one a *post-lineariser*. Similarly, we call a system G designed to suppress all nonlinear transfer functions of H ◦ G up to order *k* a *pre-lineariser of order k* and a system G suppressing all responses of H ◦ G of order higher than one a *pre-lineariser* or *pre-distorter*.

We first investigate post-linearisers. The first requirement is that the system K should not change the linear response of H. This is only the case if the linear impulse response of K is a Dirac impulse

$$k\_1 = \delta\,.$$

Next, we look for a condition to suppress the response of second order. Referring to Table 10.2 we see that the second order response of K ◦ H disappears if

$$(k \diamond h)\_2 = k\_1 \ast h\_2 + k\_2 \ast h\_1^{\otimes 2} = 0\,.$$

Therefore, if $h\_1$ has an inverse, we can make $(k \diamond h)\_2$ disappear by choosing

$$k\_2 = -h\_2 \* (h\_1^{\otimes 2})^{\*-1}.\tag{10.9}$$

In the Laplace domain this is

$$K\_2(\mathbf{s}\_1, \mathbf{s}\_2) = -\frac{H\_2(\mathbf{s}\_1, \mathbf{s}\_2)}{H\_1(\mathbf{s}\_1)H\_1(\mathbf{s}\_2)}.\tag{10.10}$$

Next we look for a condition to suppress, in addition to $(k \diamond h)\_2$, also $(k \diamond h)\_3$. Referring again to Table 10.2 we find the following condition

$$(k \diamond h)\_3 = k\_1 \ast h\_3 + 2 \, k\_2 \ast [h\_1 \otimes h\_2]\_{\text{sym}} + k\_3 \ast h\_1^{\otimes 3} = 0 \,\, .$$

As for the second order, this equation can be solved for $k\_3$ only if $h\_1$ has an inverse, in which case, using the previously obtained values for $k\_1$ and $k\_2$, we find

$$k\_3 = \left(-h\_3 + 2h\_2 \ast (h\_1^{\otimes 2})^{\ast -1} \ast [h\_1 \otimes h\_2]\_{\text{sym}}\right) \ast (h\_1^{\otimes 3})^{\ast -1} \tag{10.11}$$

with Laplace transform

$$K\_3(\mathbf{s}\_1, \mathbf{s}\_2, \mathbf{s}\_3) = \frac{-H\_3(\mathbf{s}\_1, \mathbf{s}\_2, \mathbf{s}\_3) + 2\left[\frac{H\_2(\mathbf{s}\_1, \mathbf{s}\_2 + \mathbf{s}\_3)}{H\_1(\mathbf{s}\_2 + \mathbf{s}\_3)}H\_2(\mathbf{s}\_2, \mathbf{s}\_3)\right]\_{\text{sym}}}{H\_1(\mathbf{s}\_1)H\_1(\mathbf{s}\_2)H\_1(\mathbf{s}\_3)}.\tag{10.12}$$

This procedure can be extended to find the transfer functions of K up to order *j* such that they cancel the nonlinear responses of K ◦ H up to the *j*th order. The condition for the existence of $k\_j$ is always the same: the existence of the inverse of $h\_1$. This is so because in each equation $(k \diamond h)\_j = 0$, $k\_j$ appears convolved with $h\_1^{\otimes j}$. If we let *j* tend to infinity we obtain a post-lineariser suppressing all nonlinear responses of H.
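The cancellation can be verified mechanically with a CAS. The sketch below uses hypothetical kernels of our own choosing ($H_2$ is taken symmetric, as required) and checks that (10.10) and (10.12) null the composed responses of second and third order, with the symmetrisation implemented explicitly as an average over permutations:

```python
# CAS sketch: check that K2 (10.10) and K3 (10.12) cancel the 2nd and
# 3rd order Laplace-domain responses of K composed with H (k1 = delta).
import itertools
import sympy as sp

s1, s2, s3, b3 = sp.symbols('s1 s2 s3 b3')

def sym(f, *s):
    """Symmetrise f by averaging over all permutations of its arguments."""
    perms = list(itertools.permutations(s))
    return sum(f(*p) for p in perms) / len(perms)

H1 = lambda s: 1 / (s + 1)          # hypothetical, minimum-phase
H2 = lambda a, b: 1 / (a + b + 2)   # hypothetical symmetric 2nd-order kernel
H3 = lambda a, b, c: b3             # hypothetical constant 3rd-order kernel

K2 = lambda a, b: -H2(a, b) / (H1(a) * H1(b))                     # (10.10)
K3 = lambda a, b, c: ((-H3(a, b, c)
                       + 2 * sym(lambda x, y, z:
                                 H2(x, y + z) / H1(y + z) * H2(y, z),
                                 a, b, c))
                      / (H1(a) * H1(b) * H1(c)))                  # (10.12)

kh2 = H2(s1, s2) + K2(s1, s2) * H1(s1) * H1(s2)
kh3 = (H3(s1, s2, s3)
       + 2 * sym(lambda x, y, z: K2(x, y + z) * H1(x) * H2(y, z),
                 s1, s2, s3)
       + K3(s1, s2, s3) * H1(s1) * H1(s2) * H1(s3))

assert sp.simplify(kh2) == 0 and sp.simplify(kh3) == 0
```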

The impulse responses of a pre-lineariser G can be obtained following a similar procedure. To preserve the response of H, its linear response must be a Dirac impulse as for a post-lineariser

$$\mathbf{g}\_1 = \boldsymbol{\delta}.$$

The second order response of H ◦ G disappears if

$$g\_2 = -h\_1^{\*-1} \* h\_2 \tag{10.13}$$

or, expressed in the Laplace domain, if

$$G\_2(s\_1, s\_2) = -\frac{H\_2(s\_1, s\_2)}{H\_1(s\_1 + s\_2)}.\tag{10.14}$$

The third order response of H ◦ G disappears if

$$g\_3 = h\_1^{\ast-1} \ast \left( -h\_3 + 2h\_2 \ast \left[ \delta \otimes (h\_1^{\ast-1} \ast h\_2) \right]\_{\text{sym}} \right) \tag{10.15}$$

whose Laplace transform is

$$G\_3(s\_1, s\_2, s\_3) = \frac{-H\_3(s\_1, s\_2, s\_3) + 2\left[H\_2(s\_1, s\_2 + s\_3)\frac{H\_2(s\_2, s\_3)}{H\_1(s\_2 + s\_3)}\right]\_{\text{sym}}}{H\_1(s\_1 + s\_2 + s\_3)}\tag{10.16}$$

and so on. Again, the prerequisite for the existence of these solutions is the existence of the inverse of *h*1. Note also that in general the transfer functions of a pre-lineariser are different from the ones of a post-lineariser.
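The same kind of CAS check works for the pre-lineariser. With the same hypothetical kernels as before (our own choices), (10.14) and (10.16) null the second and third order responses of H ◦ G:

```python
# CAS sketch: verify that G2 (10.14) and G3 (10.16) cancel the 2nd and
# 3rd order Laplace-domain responses of H composed with G (g1 = delta).
import itertools
import sympy as sp

s1, s2, s3, b3 = sp.symbols('s1 s2 s3 b3')

def sym(f, *s):
    """Symmetrise f by averaging over all permutations of its arguments."""
    perms = list(itertools.permutations(s))
    return sum(f(*p) for p in perms) / len(perms)

H1 = lambda s: 1 / (s + 1)          # hypothetical kernels, as before
H2 = lambda a, b: 1 / (a + b + 2)
H3 = lambda a, b, c: b3

G2 = lambda a, b: -H2(a, b) / H1(a + b)                           # (10.14)
G3 = lambda a, b, c: ((-H3(a, b, c)
                       + 2 * sym(lambda x, y, z:
                                 H2(x, y + z) * H2(y, z) / H1(y + z),
                                 a, b, c))
                      / H1(a + b + c))                            # (10.16)

hg2 = H1(s1 + s2) * G2(s1, s2) + H2(s1, s2)
hg3 = (H1(s1 + s2 + s3) * G3(s1, s2, s3)
       + 2 * sym(lambda x, y, z: H2(x, y + z) * G2(y, z), s1, s2, s3)
       + H3(s1, s2, s3))

assert sp.simplify(hg2) == 0 and sp.simplify(hg3) == 0
```

Note where the $H_1$ factors sit: the pre-lineariser divides by $H_1$ evaluated at sums of frequencies, the post-lineariser by products of $H_1$ at single frequencies, which is why the two are in general different.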

In summary, we can state that a *weakly nonlinear system can be linearised with a pre- or a post-lineariser only if its linear transfer function has a stable inverse* in the convolution algebra of interest. In the convolution algebra of right sided distributions this means the existence of a causal and stable inverse.

A generic linear system may not have an inverse. For example, if the linear impulse response $h\_1$ is a right-sided, infinitely differentiable function then $h\_1 \ast w$ is an infinitely differentiable function independently of the choice of $w$. This means that $h\_1 \ast w = \delta$ has no solution and hence $h\_1$ has no inverse.

A class of systems of special interest to us is the class of causal systems whose transfer functions are rational functions

$$H\_1(s) = \frac{N(s)}{P(s)}\,\,.$$

For this class of systems *H*1(*s*) is stable and has a causal stable inverse if all poles *and zeros* of *H*1(*s*) are in the left-half of the complex plane.
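For rational transfer functions this condition is easy to test numerically. A small sketch of our own (polynomial coefficients in numpy's highest-degree-first convention; the example functions are hypothetical):

```python
# Sketch: test whether a rational H1(s) = N(s)/P(s) admits a causal,
# stable inverse by checking that all poles AND zeros lie in the open
# left half-plane.  Coefficient lists are highest degree first.
import numpy as np

def has_stable_causal_inverse(num, den):
    roots = np.concatenate([np.roots(num), np.roots(den)])
    return bool(np.all(roots.real < 0))

# (s + 1)/(s**2 + 3*s + 2): poles and zeros all in the LHP
print(has_stable_causal_inverse([1, 1], [1, 3, 2]))    # True
# (s - 1)/(s**2 + 3*s + 2): zero in the RHP, no stable causal inverse
print(has_stable_causal_inverse([1, -1], [1, 3, 2]))   # False
```

The second example is a non-minimum-phase system: it is itself stable, but its inverse would have a pole at $s = 1$ and so cannot serve as part of a lineariser.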

#### **Example 10.5: Memory-less System Linearisation**

In this example we consider a third order memory-less system H with impulse responses

$$h\_1 = a\_1 \delta \qquad h\_2 = 0 \qquad h\_3 = -a\_3 \delta^{\otimes 3}\,.$$

We would like to find a pre-lineariser G suppressing the responses of third order.

The linear impulse response of the system has an inverse

$$a\_1 \delta \* \frac{1}{a\_1} \delta = \delta .$$

Therefore it can be linearised using the results of this section. As $h\_2 = 0$, the second-order impulse response of the pre-lineariser must also vanish

$$g\_2 = 0\,.$$

The third order impulse response of the pre-lineariser is obtained by applying (10.15) and we find

$$g\_3 = \frac{1}{a\_1} \delta \ast a\_3 \delta^{\otimes 3} = \frac{a\_3}{a\_1} \delta^{\otimes 3}\,.$$

Note that while the pre-lineariser suppresses responses of third order, it does introduce responses of higher order

$$\begin{split} h \diamond g &= a\_1 \delta \ast \Big(\delta + \frac{a\_3}{a\_1} \delta^{\otimes 3}\Big) - a\_3 \delta^{\otimes 3} \ast \Big(\delta + \frac{a\_3}{a\_1} \delta^{\otimes 3}\Big)^{\otimes 3} \\ &= a\_1 \delta - 3 \frac{a\_3^2}{a\_1} \delta^{\otimes 5} - 3 \frac{a\_3^3}{a\_1^2} \delta^{\otimes 7} - \frac{a\_3^4}{a\_1^3} \delta^{\otimes 9}\,. \end{split}$$

It's easy to see that to suppress the nonlinear responses up to order *k* the pre-lineariser must be of order *k*. To suppress them all a full pre-lineariser is needed.
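For a memory-less system the composition above is just a polynomial substitution, so it can be double-checked symbolically. A sympy sketch of our own:

```python
# Sketch: expand h(g(x)) for the memory-less system of Example 10.5 and
# confirm that the 3rd-order term cancels while orders 5, 7, 9 appear.
import sympy as sp

x, a1, a3 = sp.symbols('x a1 a3')

h = lambda u: a1 * u - a3 * u**3        # the system H
g = lambda u: u + a3 / a1 * u**3        # 3rd-order pre-lineariser

y = sp.expand(h(g(x)))

assert y.coeff(x, 3) == 0               # 3rd order suppressed...
assert y.coeff(x, 5) == -3 * a3**2 / a1 # ...at the price of a 5th order term
print(y)
```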

# **10.4 System Manipulations**

In this section we highlight some properties of weakly nonlinear systems that allow us to manipulate a weakly nonlinear system composed of sub-systems in such a way as to obtain different interconnections of the sub-systems without changing the behavior of the overall system.

The first property that we discuss is the associativity of addition, which comes from the fact that $\mathcal{D}'\_{\oplus,\text{sym}}$ is a vector space. Thus if *f*, *g* and *h* are three weakly nonlinear systems driven by the same input signal *x*, the way in which the outputs are summed is irrelevant

$$\left(f[\mathbf{x}] + g[\mathbf{x}]\right) + h[\mathbf{x}] = f[\mathbf{x}] + \left(g[\mathbf{x}] + h[\mathbf{x}]\right) = f[\mathbf{x}] + g[\mathbf{x}] + h[\mathbf{x}]\,.$$

The same is true for the product of the output signals

$$\left(f[\mathbf{x}] \cdot g[\mathbf{x}]\right) \cdot h[\mathbf{x}] = f[\mathbf{x}] \cdot \left(g[\mathbf{x}] \cdot h[\mathbf{x}]\right) = f[\mathbf{x}] \cdot g[\mathbf{x}] \cdot h[\mathbf{x}]\,.$$

This is the case because the product that we defined on $\mathcal{D}'\_{\oplus,\text{sym}}$ is defined in terms of the tensor product and the latter is associative.

A second important property is commutativity. Addition is always commutative, therefore the order in which signals appear as inputs to adders is irrelevant

$$f[\mathbf{x}] + g[\mathbf{x}] = g[\mathbf{x}] + f[\mathbf{x}]\,.$$

While the tensor product is not commutative, the *symmetrised* tensor product is, and with it the product in $\mathcal{D}'\_{\oplus,\text{sym}}$

$$f[\mathbf{x}] \cdot \mathbf{g}[\mathbf{x}] = \mathbf{g}[\mathbf{x}] \cdot f[\mathbf{x}] \, .$$

Thus the order in which signals appear as inputs to multipliers is irrelevant as well. In fact, because it's cumbersome to draw symmetrised block diagrams, *we will generally draw unsymmetrised block diagrams and, if not stated explicitly, imply symmetrisation.*

A further equivalence of block diagrams comes from the distributivity of the product over addition

$$\begin{aligned} (f[\mathbf{x}] + g[\mathbf{x}]) \cdot h[\mathbf{x}] &= f[\mathbf{x}] \cdot h[\mathbf{x}] + g[\mathbf{x}] \cdot h[\mathbf{x}], \\ f[\mathbf{x}] \cdot (g[\mathbf{x}] + h[\mathbf{x}]) &= f[\mathbf{x}] \cdot g[\mathbf{x}] + f[\mathbf{x}] \cdot h[\mathbf{x}] \,. \end{aligned}$$

This property originates from the multi-linearity of the tensor product. A block diagram representation of the first equality is shown in Fig. 10.6.

Another equivalence is given by the equation

$$(g \diamond f)[\mathbf{x}] + (h \diamond f)[\mathbf{x}] = ((g + h) \diamond f)[\mathbf{x}]\,.$$

To prove the validity of this equation we prove its validity for terms of each order individually. To simplify the expressions, let's denote the sum of all *l*th tensor products resulting in a distribution of order *k* by

$$f\_k^{(l)} := \sum\_{\substack{|\alpha|=l\\ \kappa \cdot \alpha = k}} \left[ f^{\otimes \alpha} \right]\_{\text{sym}}$$

with $\alpha$ a multi-index in $\mathbb{N}^k$ and $\kappa = (1, 2, \dots, k)$. With this notation the *k*th order impulse responses of the summands on the left-hand side can be written as

$$(g \circ f)\_k = \sum\_{l=1}^k g\_l \ast f\_k^{(l)}, \qquad (h \circ f)\_k = \sum\_{l=1}^k h\_l \ast f\_k^{(l)}\ .$$

The two can be combined using the distributivity of convolution (3.13) to obtain

$$\sum\_{l=1}^{k} (\mathbf{g}\_l + h\_l) \ast f\_k^{(l)}$$

which is the *k*th order impulse response of the expression on the right-hand side.

The last useful property in manipulating block diagrams is the *right distributivity of composition*

$$(g \diamond f)[\mathbf{x}] \cdot (h \diamond f)[\mathbf{x}] = ((g \cdot h) \diamond f)[\mathbf{x}]\,.$$

We prove again this equality by proving its validity for terms of each order individually. The impulse response of order *k* on the left-hand side is

$$\sum\_{i+j=k} \left[ \left( \sum\_{l=1}^{i} g\_l \ast f\_i^{(l)} \right) \otimes \left( \sum\_{m=1}^{j} h\_m \ast f\_j^{(m)} \right) \right]\_{\text{sym}}.$$

Hence, dropping symmetrisation operators for simplicity of notation

**Fig. 10.6** Distributivity of WNTI systems

**Fig. 10.7** Right distributivity of composition of WNTI systems. The empty circle represents either a sum or a product

$$\begin{aligned} \sum\_{i+j=k} \sum\_{l+m \le k} (g\_l \ast f\_i^{(l)}) \otimes (h\_m \ast f\_j^{(m)}) &= \sum\_{i+j=k} \sum\_{l+m \le k} (g\_l \otimes h\_m) \ast (f\_i^{(l)} \otimes f\_j^{(m)}) \\ &= \sum\_{s=1}^k \sum\_{l+m=s} (g\_l \otimes h\_m) \ast f\_k^{(s)} \end{aligned}$$

which corresponds to the *k*th order impulse response of the right-hand side of the equation. A block diagram representation of the property is shown in Fig. 10.7.
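In the memoryless special case the impulse responses reduce to polynomial coefficients, and the convolutions and tensor products collapse to ordinary products, so the right distributivity of composition can be checked with plain polynomial algebra. A minimal numerical sketch (the coefficients and the truncation order are illustrative choices, not taken from the text):

```python
import numpy as np

K = 6  # truncation order of the weakly nonlinear systems

def trunc(p):
    """Keep polynomial coefficients up to order K (index k = order k)."""
    return np.asarray(p[:K + 1])

def pmul(p, q):
    """Product of two polynomials, truncated to order K."""
    return trunc(np.convolve(p, q))

def pcomp(outer, inner):
    """Composition outer(inner(x)), truncated to order K."""
    result = np.zeros(K + 1)
    power = np.zeros(K + 1)
    power[0] = 1.0  # inner**0
    for c in outer:
        result += c * power
        power = pmul(power, inner)
    return result

# Memoryless weakly nonlinear systems: no constant term (index 0 is 0).
f = np.array([0, 1.0, 0.3, -0.1, 0, 0, 0])
g = np.array([0, 2.0, -0.5, 0.2, 0, 0, 0])
h = np.array([0, 0.7, 0.1, 0.05, 0, 0, 0])

lhs = pmul(pcomp(g, f), pcomp(h, f))   # (g ∘ f) · (h ∘ f)
rhs = pcomp(pmul(g, h), f)             # (g · h) ∘ f
assert np.allclose(lhs, rhs)
```

Both sides agree coefficient by coefficient up to the truncation order, as the order-by-order proof above predicts.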

# **10.5 Structure**

A review of our development of the theory of weakly nonlinear systems up to this point reveals that weakly nonlinear systems arise out of stable linear systems and multipliers. In particular, multipliers are the only means by which we can combine linear systems to produce systems of higher order.<sup>1</sup> In this section we investigate the overall structure of systems constructed this way.

Let's start by considering the most generic impulse response of second order that can be constructed out of a single multiplier and linear systems *h<sub>A</sub>*, *h<sub>B</sub>* and *h<sub>C</sub>*

$$h\_2(\tau\_1, \tau\_2) = [h\_C \ast (h\_A \otimes h\_B)]\_{\text{sym}}\,.$$

The block diagram of a system whose only impulse response is *h*<sub>2</sub> is shown in Fig. 10.8a. We call a system whose only impulse response is *h<sub>i</sub>* a *monomial system of order i*.

In Sect. 3.3 we showed that every distribution can be approximated to arbitrary accuracy by a set of weighted Dirac impulses. We can thus approximate the linear system *h<sub>C</sub>* by

$$h\_C(\tau) \approx \sum\_{n=0}^{N} c\_n \delta(\tau - \lambda\_n), \qquad c\_n \in \mathbb{C}, \quad \lambda\_n \in [0, \infty).$$

<sup>1</sup> In our formalism multiplication is represented by the tensor product. It is only at the end, when the output signal of interest is "evaluated on the diagonal" with the operator evd() that the tensor product collapses to a multiplication.

**Fig. 10.8 a** Block diagram of the most generic monomial system of second order constructed with a single multiplier and linear systems *h<sub>A</sub>*, *h<sub>B</sub>* and *h<sub>C</sub>*. **b** Approximation of the system in Fig. 10.8a

where we assume the system to be causal. Using this approximation in *h*<sub>2</sub> we obtain

$$h\_2(\tau\_1, \tau\_2) \approx \sum\_{n=0}^{N} \left[ c\_n \delta(\tau\_1 - \lambda\_n) \ast \left( h\_A(\tau\_1) \otimes h\_B(\tau\_2) \right) \right]\_{\text{sym}}.$$

The shifting property of convolution (3.16) extends to convolutions between distributions of different dimensions in a similar way as the differentiation rule (10.6). In particular, for the one-dimensional distribution *f*<sub>1</sub> and the *i*-dimensional one *g<sub>i</sub>* we have

$$f\_1(\tau\_1 - \lambda) \* g\_i(\tau\_1, \dots, \tau\_i) = f\_1(\tau\_1) \* g\_i(\tau\_1 - \lambda, \dots, \tau\_i - \lambda).$$

Using this property the response of the system can be expressed as

$$\begin{split} y\_2(\tau\_1, \tau\_2) &= h\_2(\tau\_1, \tau\_2) \ast \left( \mathbf{x}(\tau\_1) \otimes \mathbf{x}(\tau\_2) \right) \\ &\approx \sum\_{n=0}^N c\_n \left[ h\_A(\tau\_1) \otimes h\_B(\tau\_2) \right]\_{\text{sym}} \ast \left( \mathbf{x}(\tau\_1 - \lambda\_n) \otimes \mathbf{x}(\tau\_2 - \lambda\_n) \right). \end{split}$$

This shows that all delays required to approximate *h<sub>C</sub>* to any desired accuracy can be moved to delays of the input signal as illustrated in Fig. 10.8b.

If we use a similar approximation for *h<sub>A</sub>* and *h<sub>B</sub>* we obtain

**Fig. 10.9** Conceptual structure of a WNTI system

$$y\_2(\tau\_1, \tau\_2) \approx \sum\_{n\_c=0}^{N\_c} \sum\_{n\_b=0}^{N\_b} \sum\_{n\_a=0}^{N\_a} c\_{n\_c} \Big[ a\_{n\_a} b\_{n\_b}\, \mathbf{x}(\tau\_1 - \lambda(n\_a + n\_c)) \otimes \mathbf{x}(\tau\_2 - \lambda(n\_b + n\_c)) \Big]\_{\text{sym}}$$

where we have assumed the use of equal and uniform delays for all sub-systems.

Monomial systems of higher order can be constructed in a similar way by combining linear systems and more multipliers. If we approximate all linear sub-systems as we did above for the second-order monomial system, it's easy to see that all delays can be moved to the input of the system. A system of order *K* is the sum of monomial sub-systems of order up to *K*. Therefore, *weakly nonlinear systems of finite order can be represented as composed of two sections*: an input *tapped delay line* sub-system that represents the memory of the system and a *memoryless* sub-system composed of adders and multipliers as illustrated in Fig. 10.9.

An estimate for the maximum delay necessary to faithfully represent a given system of order *K* can be obtained from the sampling theorem (see Example 12.5): If the maximum frequency component of the input signal is *f*max, then the highest frequency at the output of the system is *K f*max and the delay must be bounded by

$$
\lambda < \frac{1}{2 \, K f\_{\text{max}}}.
$$

The number of taps depends on the amount of memory of the linear sub-systems to be approximated.
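The equivalence between the two arrangements of Fig. 10.8 can be illustrated numerically in discrete time. The sketch below (random FIR coefficients, chosen only for illustration) computes the output of a second-order monomial system once as a cascade of linear filters and a multiplier, and once with all memory moved to delayed copies of the input feeding a memoryless sum of products:

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative FIR approximations of the linear sub-systems h_A, h_B, h_C.
hA, hB, hC = rng.normal(size=3), rng.normal(size=4), rng.normal(size=3)
x = rng.normal(size=50)  # arbitrary input signal

# Cascade form (Fig. 10.8a): linear branches hA, hB, a multiplier, then hC.
branch = np.convolve(hA, x)[:len(x)] * np.convolve(hB, x)[:len(x)]
y_cascade = np.convolve(hC, branch)[:len(x)]

# Tapped-delay-line form (Fig. 10.9): all memory as delayed input copies,
# followed by a memoryless combination of products.
def delayed(x, n):
    return np.concatenate([np.zeros(n), x[:len(x) - n]])

y_taps = np.zeros(len(x))
for nc, c in enumerate(hC):
    for na, a in enumerate(hA):
        for nb, b in enumerate(hB):
            y_taps += c * a * b * delayed(x, na + nc) * delayed(x, nb + nc)

assert np.allclose(y_cascade, y_taps)
```

Expanding the cascade output shows term by term why the two forms agree: each product of taps carries the combined delay *n<sub>a</sub>* + *n<sub>c</sub>* on one input copy and *n<sub>b</sub>* + *n<sub>c</sub>* on the other, exactly as in the triple sum above.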

The system structure represented in Fig. 10.9 is not the most economical one. A comparison between Fig. 10.8a and b reveals that if one moves all the system memory to the input of the system then one needs a larger number of multipliers than by distributing the memory across sub-systems. This is entirely analogous to the trade-off in the implementation of discrete time filters as finite-impulse response (FIR) versus infinite-impulse response (IIR) filters.

**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Chapter 11 Weakly Nonlinear Time Invariant Circuits**

The aim of this chapter is to show the utility of the theory that we developed. This is done by applying it to the analysis of nonlinear effects, that is, of deviations from linear behaviour, in analog circuits. The vast majority of analog circuits are limited by noise on the bottom end of their dynamic range and by nonlinear effects on the upper end. While the analysis of noise is well understood by practising engineers, the analysis of nonlinear effects is much less so, and their minimisation poses great practical challenges. The applications presented in this chapter are therefore of practical utility.

The components serving as the building blocks of analog circuits can be represented by linear elements and controlled sources representing nonlinear behaviour. The total response of the circuit can be calculated from a hierarchy of electrical networks with the familiar small-signal linear network forming its core. The hierarchy of networks is constituted by the linear core driven by sources of increasing order. This can be seen as a specialisation to electrical networks of the signal-flow graph method that we saw in Sect. 10.2.

Analog electrical circuits are operated around a stable equilibrium point called the (quiescent) *operating point* of the circuit. The dynamic variables of interest in the theory of weakly nonlinear systems are the ones describing the deviation from the operating point (see Sect. 9.1). We call such variables *small-signal* (or *incremental*) variables. In the following, to distinguish the incremental part of a quantity from the total quantity, we will adopt the notational conventions summarised in Table 11.1.

In Sects. 11.2 and 11.3 of this chapter we develop equivalent circuits for electronic components allowing us to model arbitrary weakly nonlinear analog circuits. In the remaining sections we study concrete circuits used in many types of systems and in particular in communication systems. Before that, in the following section we review a few standard metrics used to characterise the nonlinear behaviour of weakly nonlinear analog circuits.


**Table 11.1** Definition of symbols used for various quantities

# **11.1 Metrics for Nonlinear Effects**

It's common to distinguish between two classes of nonlinear effects. The first is characterised using input signals of large magnitude, with the compression characteristic as the archetypal example. The second is characterised using small signals, with intermodulation as the archetypal example. In the following we analyse these and related effects.

# *11.1.1 Gain Compression and Expansion*

Gain compression and gain expansion refer to the change in the gain experienced by a signal passing through a weakly nonlinear system as the amplitude of the input signal changes. At sufficiently large input signal levels all electronic circuits exhibit saturation. However, at the onset of deviation of gain from the small signal value, we may observe a gradual gain reduction, referred to as gain compression; or some gain increase, referred to as gain expansion (see Fig. 11.1). Which of these effects occurs and at which signal level depends on the nonlinear characteristics of the system.

Consider a weakly nonlinear system H driven by a sinusoidal signal

$$\mathbf{x}(t) = |A\_i|\cos(\omega\_1 t + \varphi\_1) = \Re\left\{A\_i e^{\jmath\omega\_1 t}\right\}.$$

As discussed in Sect. 9.8.2 its output is composed of tones at ω<sub>1</sub> and at integer multiples of it, the harmonics. Let's denote by *y*<sub>ω<sub>1</sub></sub> the sum of all the terms at ω<sub>1</sub>

$$\begin{split} y\_{\omega\_1}(t) &:= |A\_o| \cos(\omega\_1 t + \psi\_1) = \Re\left\{ A\_o e^{\jmath\omega\_1 t} \right\} \\ &= y\_{1,(0,1)}^{c}(t) + y\_{3,(1,2)}^{c}(t) + y\_{5,(2,3)}^{c}(t) + \cdots \\ &= \Re\left\{ \left[ \hat{h}\_{1,(0,1)} + \frac{3}{4} |A\_i|^2\, \hat{h}\_{3,(1,2)} + \frac{5}{8} |A\_i|^4\, \hat{h}\_{5,(2,3)} + \cdots \right] A\_i e^{\jmath\omega\_1 t} \right\}. \end{split}$$

From this expression we see that *y*<sub>ω<sub>1</sub></sub> is proportional to the input signal. Therefore, in a similar way as we do with LTI systems, we can consider the ratio of the output signal phasor to that of the input signal and obtain a sort of frequency response. However, differently from the frequency response of linear systems, the obtained ratio is a function of the input signal amplitude and is called the *describing function*

$$K(|A\_i|, \alpha\_1) := \frac{A\_o}{A\_i} = \hat{h}\_{1,(0,1)} + \frac{3}{4} |A\_i|^2 \hat{h}\_{3,(1,2)} + \frac{5}{8} |A\_i|^4 \hat{h}\_{5,(2,3)} + \dotsb \ . \tag{11.1}$$

Its magnitude is called the *gain* of the system

$$G(\left|A\_i\right|, \omega\_1) := \left|K\left(\left|A\_i\right|, \omega\_1\right)\right| = \frac{\left|A\_o\right|}{\left|A\_i\right|}.$$

At sufficiently small input signal levels, at the onset of nonlinear behaviour, the third order nonlinearity usually dominates and the describing function can be approximated by

$$K(\left|A\_i\right|,\omega\_1) \approx \hat{h}\_{1,(0,1)} \cdot \left(1 + \frac{3}{4} \left|A\_i\right|^2 \frac{\hat{h}\_{3,(1,2)}}{\hat{h}\_{1,(0,1)}}\right). \tag{11.2}$$

Note that we have factored the linear frequency response to obtain an explicit factor representing the deviation of the system's behaviour from the one of a perfectly linear system. This factor can be visualised in the complex plane as the sum of the vector

$$\Xi = \Xi\_r + \jmath\,\Xi\_i := \frac{3}{4} \left| A\_i \right|^2 \frac{\hat{h}\_{3,(1,2)}}{\hat{h}\_{1,(0,1)}}; \qquad \Xi\_r, \Xi\_i \in \mathbb{R}$$

and the unit vector 1 (see Fig. 11.2). If the angle of Ξ is around 0° then the two vectors point approximately in the same direction. Therefore, as the amplitude of the input signal grows, the magnitude of the output signal grows faster than linearly and the system exhibits gain expansion. If the angle of Ξ is around 180° then the two vectors point approximately in opposite directions and the system exhibits gain compression.

If the angle is around ±90° then the vectors are approximately perpendicular and the gain of the system is less sensitive to variations of the input signal (terms of order higher than third will become important). However, in this case it is the *angle* of the output signal that is sensitive to changes in the input signal magnitude. Such a system is said to exhibit *amplitude-modulation (AM) to phase-modulation (PM) conversion*.

Let's have a closer look at the gain of the system. The ratio of the system gain to the gain the system would have if it were perfectly linear is called the gain compression/expansion ratio and denoted by GCER

$$\text{GCER} := \frac{G(\left| A\_i \right|, \omega\_1)}{\left| \hat{h}\_{1,(0,1)} \right|}. \tag{11.3}$$

Using (11.2), at the onset of deviation from linear behaviour, it is given by

$$\begin{split} \text{GCER} & \approx \sqrt{(1+\Xi\_r)^2 + \Xi\_i^2} \\ &= (1+\Xi\_r)\sqrt{1+\frac{\Xi\_i^2}{(1+\Xi\_r)^2}}\,. \end{split}$$

If we expand the square root in a Taylor series

$$\text{GCER} \approx (1 + \Xi\_r)\left(1 + \frac{1}{2} \frac{\Xi\_i^2}{(1 + \Xi\_r)^2} + \dotsb\right),$$

we see that, to first order, the GCER can be estimated by

$$\text{GCER} \approx 1 + \frac{3}{4} \Re \left\{ \frac{\hat{h}\_{3,(1,2)}}{\hat{h}\_{1,(0,1)}} \right\} \left| A\_i \right|^2 \,. \tag{11.4}$$

Given our small signal assumption, this expression should only be used to estimate gain compression or expansion up to ca. 1 dB.

A standard linearity metric used to test analog circuits is the 1 dB *compression point* which is the signal magnitude causing the system gain to decrease by 1 dB. Equation (11.4) allows estimating the magnitude of the input signal producing a given gain compression or expansion

$$|A\_i| \approx \sqrt{\frac{4}{3} \left| \frac{(\text{GCER} - 1)}{\Re\left\{ \frac{\hat{h}\_{3,(1,2)}}{\hat{h}\_{1,(0,1)}} \right\}} \right|}. \tag{11.5}$$

If ℜ{ĥ<sub>3,(1,2)</sub>/ĥ<sub>1,(0,1)</sub>} is negative the 1 dB compression point can thus be estimated by

$$A\_{\text{1dB}} := \frac{0.381}{\sqrt{\left| \Re \left\{ \frac{\hat{h}\_{3,(1,2)}}{\hat{h}\_{1,(0,1)}} \right\} \right|}}\,. \tag{11.6}$$
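As a sanity check of (11.6), the short sketch below evaluates the describing-function gain (11.2) at the estimated compression point and confirms a gain drop of about 1 dB; the pair of coefficients is an illustrative assumption, not taken from the text:

```python
import numpy as np

# Illustrative describing-function coefficients (assumed values):
h1 = 1.0          # linear response ĥ₁,(0,1)
h3 = -0.1 + 0.0j  # third-order response ĥ₃,(1,2), compressive (negative real part)

ratio = (h3 / h1).real
A_1dB = 0.381 / np.sqrt(abs(ratio))        # estimate (11.6)

# Gain from the approximate describing function (11.2) at that amplitude.
gain_dB = 20 * np.log10(abs(1 + 0.75 * A_1dB**2 * (h3 / h1)))
assert abs(gain_dB + 1.0) < 0.01           # ≈ -1 dB gain compression
```

The constant 0.381 is just √((4/3)(1 − 10<sup>−1/20</sup>)), i.e. (11.5) evaluated at GCER = −1 dB.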

For small input signals the phase change can also be calculated from the ratio *K*(|*A<sub>i</sub>*|, ω<sub>1</sub>)/ĥ<sub>1,(0,1)</sub>

$$
\Delta \psi\_1 = \arctan \frac{\Xi\_i}{1 + \Xi\_r} \approx \frac{\Xi\_i}{1 + \Xi\_r} \approx \Xi\_i\,.
$$

From this we can estimate the input signal magnitude producing a phase change of Δψ<sub>1</sub> radians by

$$|A\_i| \approx \sqrt{\frac{4}{3} \left| \frac{\Delta \psi\_1}{\Im \left\{ \frac{\hat{h}\_{3,(1,2)}}{\hat{h}\_{1,(0,1)}} \right\}} \right|}\,. \tag{11.7}$$

# *11.1.2 Intermodulation*

In Example 9.8 we analysed the response of a weakly nonlinear system to a two-tone input signal and found that it is composed of several tones at various frequencies. In the context of communication systems and analog circuit design all signal tones at a frequency that is not a multiple of one of the input frequencies are referred to as *intermodulation products*. An intermodulation product is said to be of order *k* and denoted by IM*k* if *k* is the lowest order nonlinearity able to produce it (see Fig. 9.8). For example, given input tones ω<sub>1</sub> and ω<sub>2</sub>, the tones at 2ω<sub>1</sub> − ω<sub>2</sub> and 2ω<sub>2</sub> − ω<sub>1</sub> are intermodulation products of third order (IM3); the ones at 3ω<sub>1</sub> − 2ω<sub>2</sub> and 3ω<sub>2</sub> − 2ω<sub>1</sub> of fifth order (IM5).

As an example showing the importance of controlling and limiting the strength of intermodulation products, consider a communication receiver designed for a specific service. Most communication services divide the allocated frequency band into equally spaced channels. Suppose that we are interested in receiving a signal transmitted by a distant transmitter on channel *j*. Suppose further that the receiver also receives relatively strong interfering signals on channels *j* + *m* and *j* + 2*m* destined to other users. If the receiver is not sufficiently linear then the two interfering signals will produce intermodulation products degrading and possibly completely masking the wanted signal. While the modulation of the involved signals plays a role, due to its simplicity, communication receivers are also invariably benchmarked and tested with tones as shown in Fig. 11.3.

**Fig. 11.3** Interfering signals causing the IM3 product to mask the wanted signal

Consider the two-tone input signal

$$\mathbf{x}(t) = |A\_1|\cos(\omega\_1 t + \varphi\_1) + |A\_2|\cos(\omega\_2 t + \varphi\_2) = \Re\{A\_1 e^{\jmath\omega\_1 t} + A\_2 e^{\jmath\omega\_2 t}\}$$

where we assume ω<sub>2</sub> > ω<sub>1</sub> > 0. The intermodulation product of order *k* characterised by the frequency mix *m* is

$$y\_{k,m}^{c}(t) = \frac{1}{2^{k-1}} \frac{k!}{m!} \Re\left\{ A\_1^{m\_1} \overline{A\_1}^{m\_{-1}} A\_2^{m\_2} \overline{A\_2}^{m\_{-2}}\, \hat{h}\_{k,m} e^{\jmath\omega\_m t} \right\}\,.$$

At relatively low input signal levels the strongest intermodulation products are the second and the third order ones with amplitudes

$$\begin{aligned} A\_{\text{IM2L}} &:= \left| y\_{2,(0,1,0,1)}^{c}(t) \right| = \left| A\_1 \right| \left| A\_2 \right| \left| \hat{h}\_{2,(0,1,0,1)} \right| \\ A\_{\text{IM2H}} &:= \left| y\_{2,(0,0,1,1)}^{c}(t) \right| = \left| A\_1 \right| \left| A\_2 \right| \left| \hat{h}\_{2,(0,0,1,1)} \right| \\ A\_{\text{IM3L}} &:= \left| y\_{3,(0,2,0,1)}^{c}(t) \right| = \frac{3}{4} \left| A\_1 \right|^2 \left| A\_2 \right| \left| \hat{h}\_{3,(0,2,0,1)} \right| \\ A\_{\text{IM3H}} &:= \left| y\_{3,(0,1,0,2)}^{c}(t) \right| = \frac{3}{4} \left| A\_1 \right| \left| A\_2 \right|^2 \left| \hat{h}\_{3,(0,1,0,2)} \right|. \end{aligned}$$

These expressions show that the IM2 products are proportional to the amplitudes of each of the two input tones while the IM3 products are proportional to the square of the magnitude of the closest tone and proportional to the magnitude of the more distant one (see Fig. 11.3).

The standard intermodulation test is performed with two tones of equal amplitude

$$|A\_1| = |A\_2| = A\ .$$

In this case the magnitude of the IM product of order *k* is proportional to *A<sup>k</sup>* (remember that |*m*| = *k*)

$$\frac{1}{2^{k-1}} \frac{k!}{m!} A^k \left| \hat{h}\_{k,m} \right| \ . $$

Thus knowing the IM*k* product level at one value of *A* is enough to compute its value at a different value of *A*. This is of course only true at sufficiently small input signals, when the contributions to the IM*k* product of nonlinearities of order higher than *k* can be neglected. Instead of specifying the IM*k* at a specific value of *A* it is common practice to specify the *intermodulation intercept point* of order *k* (IP*k*). This is the level, extrapolated from sufficiently small values of *A*, at which the IM*k* reaches the same magnitude as the (linear) output of the system at ω<sub>*m*</sub> when driven by a single tone of magnitude *A* and frequency ω<sub>*m*</sub> (see Fig. 11.4). The *k*th order intercept point is thus defined by the equation

$$\frac{1}{2^{k-1}} \frac{k!}{m!} A^k \left| \hat{h}\_{k,m} \right| = A \left| \hat{h}\_1(\omega\_m) \right|\,.$$

Solving for the amplitude we find

$$A\_{\text{IIP}k} := \sqrt[k-1]{\frac{2^{k-1}m!}{k!} \left| \frac{\hat{h}\_1(\omega\_m)}{\hat{h}\_{k,m}} \right|}\,. \tag{11.8}$$

This quantity is also called the *input referred IPk* and denoted by IIP*k*. Sometimes it is more convenient to refer this quantity to the output of the circuit in which case it is called the *output referred IPk* and denoted by OIP*k*. Its value is found by multiplying the IIP*k* by the linear gain at ω<sub>*m*</sub>


$$A\_{\text{OIP}k} := \left| \hat{h}\_1(\omega\_m) \right| A\_{\text{IIP}k} \,. \tag{11.9}$$

The second and third order intercept points are the most important ones and can be estimated by

$$A\_{\text{IIP}2} = \left| \frac{\hat{h}\_1(\omega\_m)}{\hat{h}\_{2,m}} \right| \tag{11.10}$$

$$A\_{\text{IIP}3} = \sqrt{\frac{4}{3} \left| \frac{\hat{h}\_1(\omega\_m)}{\hat{h}\_{3,m}} \right|}\,. \tag{11.11}$$

Expressed in decibels the IP*k* assumes a particularly simple form. To that end, let's first rewrite the output referred IP*k* as

$$A\_{\text{OIP}k} = \left| \hat{h}\_{1}(\omega\_{m}) \right| \sqrt[k-1]{\frac{2^{k-1}m!}{k!} \left| \frac{\hat{h}\_{1}(\omega\_{m})}{\hat{h}\_{k,m}} \right|} = \sqrt[k-1]{\frac{\left( A \left| \hat{h}\_{1}(\omega\_{m}) \right| \right)^{k}}{A^{k} \frac{k!}{2^{k-1}m!} \left| \hat{h}\_{k,m} \right|}}.$$

Then note that

$$\left(A \left| \hat{h}\_1(\omega\_m) \right| \right)^2$$

is the output power of the fundamental tone normalised to a load of 1/2 Ω. Similarly,

$$\left(A^k \frac{k!}{2^{k-1}m!} \left| \hat{h}\_{k,m} \right| \right)^2$$

is the one of the IM*k* product. Thus, if for a fixed and sufficiently small value of *A* we denote by *Po* the output power of the fundamental expressed in dB relative to some reference power and by *P*IM*<sup>k</sup>* the one of the IM*k* product relative to the same reference level, then we can express the OIP*k* by

$$\text{OIPk} = P\_o + \frac{P\_o - P\_{\text{IMk}}}{k - 1} \,. \tag{11.12}$$

Similarly, by denoting the normalised power of an input tone by *Pt* , the IIP*k* can be expressed by

$$\text{IIP}k = P\_t + \frac{P\_o - P\_{\text{IMk}}}{k - 1} \,. \tag{11.13}$$

These relationships are easily checked geometrically for the IP2 and IP3 in Fig. 11.4.
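The consistency of the extrapolation formula (11.13) with the closed-form expression (11.11) can also be verified numerically for a memoryless cubic nonlinearity; the coefficients and test amplitude below are illustrative assumptions:

```python
import numpy as np

# Memoryless toy system (assumed coefficients): y = h1*x + h3*x**3.
h1, h3 = 10.0, -0.2
A = 0.01                                      # small two-tone test amplitude

P_t   = 20 * np.log10(A)                      # input tone power (dB, normalised)
P_o   = 20 * np.log10(A * abs(h1))            # linear output tone power
P_IM3 = 20 * np.log10(0.75 * A**3 * abs(h3))  # IM3 product, (3/4) A³ |ĥ₃|

iip3_dB     = P_t + (P_o - P_IM3) / 2                       # formula (11.13), k = 3
iip3_direct = 20 * np.log10(np.sqrt(4 / 3 * abs(h1 / h3)))  # formula (11.11)
assert abs(iip3_dB - iip3_direct) < 1e-9
```

The *A*-dependence cancels exactly, which is why the intercept point can be extrapolated from a single small-signal measurement.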

In *memoryless* weakly nonlinear systems for which, for every *k*, ĥ<sub>*k*,*m*</sub> is a real number *c<sub>k</sub>* independent of *m*, the IP3 and the input signal level producing a gain compression/expansion of GCER are both proportional to (see (11.5))

$$
\sqrt{\left|\frac{c\_1}{c\_3}\right|}\,.
$$

Therefore, in this type of system, these two quantities are proportional to each other

$$20\log\left(\frac{A\_{\text{1dB}}}{A\_{\text{IIP3}}}\right) = 20\log\sqrt{|\text{GCER}-1|}\text{ .}$$

For a memory-less system exhibiting gain compression, the difference between the IP3 and the 1 dB compression point is

$$20\log\left(\frac{A\_{\text{1dB}}}{A\_{\text{IIP3}}}\right) = 20\log\sqrt{1 - 10^{-1/20}} \approx -9.6\text{ dB}\,.$$
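The −9.6 dB figure follows directly from the relation above and can be reproduced in one line:

```python
import numpy as np

# Memoryless compressive system: A_1dB sits a fixed distance below A_IIP3.
gcer_1dB = 10 ** (-1 / 20)                        # gain down by exactly 1 dB
offset = 20 * np.log10(np.sqrt(abs(gcer_1dB - 1)))
assert abs(offset + 9.6) < 0.1                    # ≈ -9.6 dB below the IIP3
```

This rule of thumb only holds for memoryless systems; with memory the ratio ĥ<sub>3,*m*</sub>/ĥ<sub>1</sub> generally differs between the compression and intermodulation measurements.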

# *11.1.3 Desensitisation*

The response of an LTI system to a signal is unaffected by the presence of a second signal. As long as we have a way of distinguishing the two signals, for example by separating them in frequency, we can ignore the presence of the second one. This is not the case in nonlinear systems where the response to one signal is affected by the presence of other ones. The effect is again most easily illustrated using a two-tone input signal.

Let H be a weakly nonlinear system driven by the two-tone input signal

$$\mathbf{x}(t) = |A\_1|\cos(\omega\_1 t + \varphi\_1) + |A\_2|\cos(\omega\_2 t + \varphi\_2) = \Re\{A\_1 e^{\jmath\omega\_1 t} + A\_2 e^{\jmath\omega\_2 t}\}\,.$$

The first tone represents the signal of interest, while the second one is an undesired signal that is referred to as a *blocking signal* or a *jammer*. As discussed, the response of the system is composed of several tones at various frequencies, among which several at ω<sub>1</sub>. As in Sect. 11.1.1, we denote by *y*<sub>ω<sub>1</sub></sub> the sum of all terms at ω<sub>1</sub>

$$\begin{split} y\_{\omega\_1}(t) &:= |A\_o| \cos(\omega\_1 t + \psi\_1) = \Re\left\{ A\_o e^{\jmath\omega\_1 t} \right\} \\ &= y\_{1,(0,0,1,0)}^{c}(t) + y\_{3,(0,1,2,0)}^{c}(t) + y\_{3,(1,0,1,1)}^{c}(t) + \cdots \\ &= \Re\left\{ \left[ \hat{h}\_{1,(0,0,1,0)} + \frac{3}{4} |A\_1|^2\, \hat{h}\_{3,(0,1,2,0)} + \frac{3}{2} |A\_2|^2\, \hat{h}\_{3,(1,0,1,1)} + \cdots \right] A\_1 e^{\jmath\omega\_1 t} \right\} \end{split}$$

and find again an expression that is proportional to the phasor of the first input tone. At relatively small input signal levels the contributions of order higher than third can usually be neglected. In addition, we assume that the magnitude of the blocking signal is much larger than the one of the desired signal

$$|A\_1| \ll |A\_2|\,.$$

Under these assumptions *y*<sub>ω<sub>1</sub></sub> can be simplified to

$$y\_{\omega\_1}(t) \approx \Re \left\{ \left[ \hat{h}\_{1,(0,0,1,0)} + \frac{3}{2} \left| A\_2 \right|^2 \hat{h}\_{3,(1,0,1,1)} \right] A\_1 e^{\jmath\omega\_1 t} \right\}\,.$$

Following a procedure similar to the one that we used to analyse gain compression and expansion, we build the ratio of the output phasor to the one of the first tone

$$XM(\left|A\_2\right|, \omega\_1) := \frac{A\_o}{A\_1} = \hat{h}\_{1,(0,0,1,0)} \cdot \left(1 + \frac{3}{2} \left|A\_2\right|^2 \frac{\hat{h}\_{3,(1,0,1,1)}}{\hat{h}\_{1,(0,0,1,0)}}\right)$$

to obtain a sort of frequency response. Similarly to the approximation of the describing function (11.2), it is the product of the linear frequency response of the system and a factor that characterises the deviation from linear behaviour. Differently from the describing function, however, this second factor depends on the amplitude of the *second* tone, the blocking signal.

The ratio *XM*(|*A*<sub>2</sub>|, ω<sub>1</sub>)/ĥ<sub>1,(0,0,1,0)</sub> can again be visualised in the complex plane as the sum of the unit vector and the vector

$$\frac{3}{2}|A\_2|^2 \frac{\hat{h}\_{3,(1,0,1,1)}}{\hat{h}\_{1,(0,0,1,0)}}\,.$$

If the angle of the latter is close to 180° then the second tone will induce a reduction in the gain experienced by the first one. If the angle is close to 0° it will induce a gain expansion and, if the angle is close to ±90° it will induce mostly a change in the phase of the first tone. The change in gain can be characterised by the magnitude of the above ratio, the *desensitisation ratio*

$$DR := \left| \frac{XM(|A\_2|, \omega\_1)}{\hat{h}\_{1,(0,0,1,0)}} \right| = \left| 1 + \frac{3}{2} |A\_2|^2 \frac{\hat{h}\_{3,(1,0,1,1)}}{\hat{h}\_{1,(0,0,1,0)}} \right| \tag{11.14}$$

and to second order in |*A*2| can be estimated by

$$DR \approx 1 + \frac{3}{2} \left| A\_2 \right|^2 \Re \left\{ \frac{\hat{h}\_{3,(1,0,1,1)}}{\hat{h}\_{1,(0,0,1,0)}} \right\}. \tag{11.15}$$

From this expression we can estimate the magnitude of the blocker causing a certain wanted signal gain change


$$|A\_2| \approx \sqrt{\frac{2}{3} \left| \frac{(DR-1)}{\Re\left\{\frac{\hat{h}\_{3,(1,0,1,1)}}{\hat{h}\_{1,(0,0,1,0)}}\right\}} \right|}\,. \tag{11.16}$$

If ℜ{ĥ<sub>3,(1,0,1,1)</sub>/ĥ<sub>1,(0,0,1,0)</sub>} is negative, a desensitisation of 1 dB is produced by a blocker at the 1 dB *blocking level*

$$A\_{\text{B1dB}} := \frac{0.269}{\sqrt{\left| \Re \left\{ \frac{\hat{h}\_{3,(1,0,1,1)}}{\hat{h}\_{1,(0,0,1,0)}} \right\} \right|}}\,. \tag{11.17}$$

The change in phase of the first tone caused by the presence of the blocker can also be estimated from *XM*(|*A*<sub>2</sub>|, ω<sub>1</sub>)/ĥ<sub>1,(0,0,1,0)</sub>. To first order a phase change of Δψ<sub>1</sub> radians is produced by a blocker of magnitude

$$|A\_2| \approx \sqrt{\frac{2}{3} \left| \frac{\Delta \psi\_1}{\Im \left\{ \frac{\hat{h}\_{\lambda,(1,0,1,1)}}{\hat{h}\_{1,(0,0,1,0)}} \right\}} \right|}. \tag{11.18}$$

Note that if the blocker is modulated, then the modulation will be transferred from it to the wanted signal. For example, if the blocker is amplitude modulated (AM) and the angle of ĥ<sub>3,(1,0,1,1)</sub>/ĥ<sub>1,(0,0,1,0)</sub> is close to either 180° or 0° then the gain experienced by the wanted signal is modulated and, as a result, its output amplitude will also be modulated. If the angle of ĥ<sub>3,(1,0,1,1)</sub>/ĥ<sub>1,(0,0,1,0)</sub> is close to ±90° then an amplitude modulation of the blocker will produce a phase modulation of the wanted signal. This effect of transferring the modulation of one signal to another one is called *cross-modulation*.
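A quick numerical check of the blocking level (11.17) against the desensitisation ratio (11.14); the kernel ratio is an illustrative assumption:

```python
import numpy as np

# Assumed real part of the cross-modulation kernel ratio
# Re{ ĥ₃,(1,0,1,1) / ĥ₁,(0,0,1,0) }, with the compressive (negative) sign.
r = -0.05

A_B1dB = 0.269 / np.sqrt(abs(r))                 # blocking level, (11.17)

# Desensitisation ratio (11.14) at that blocker amplitude: ≈ -1 dB.
dr_dB = 20 * np.log10(abs(1 + 1.5 * A_B1dB**2 * r))
assert abs(dr_dB + 1.0) < 0.05
```

The constant 0.269 is √((2/3)(1 − 10<sup>−1/20</sup>)), i.e. (11.16) evaluated at DR = −1 dB; it is smaller than the 0.381 of (11.6) because the blocker term carries the factor 3/2 instead of 3/4.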

# **11.2 Nonlinear Two-Terminal Elements**

In this section we investigate two-terminal electrical components that can be characterised by two quantities *x<sub>E</sub>* and *y<sub>E</sub>*, related by an equation of the form

$$f(\mathbf{x}\_E, \mathbf{y}\_E) = 0$$

with *f* a function called the element *x*-*y* characteristic (see Fig. 11.5). If the equation can be solved for *y<sub>E</sub>* as a function of *x<sub>E</sub>*, *y<sub>E</sub>* = f̃(*x<sub>E</sub>*), then the element is called an *x*-controlled device. Similarly, if it can be expressed as a function of *y<sub>E</sub>*, *x<sub>E</sub>* = f̃(*y<sub>E</sub>*), then it is called a *y*-controlled device.

**Fig. 11.5** Characteristic of an *x*-controlled two-terminal element

The devices that interest us are the ones that, in a region of interest around a quiescent operating point (*X<sub>E</sub>*, *Y<sub>E</sub>*), are either *x*- or *y*-controlled and whose function f̃ can be approximated to any desired accuracy by a power series

$$\mathbf{y}\_e = \sum\_{k=1}^{\infty} \tilde{f}\_k\, \mathbf{x}\_e^k \qquad (x\text{-controlled}),$$

or

$$\mathbf{x}\_e = \sum\_{k=1}^{\infty} \tilde{f}\_k\, \mathbf{y}\_e^k \qquad (y\text{-controlled})$$

with

$$\mathbf{y}\_e = \mathbf{y}\_E - Y\_E, \qquad \mathbf{x}\_e = \mathbf{x}\_E - X\_E.$$

# *11.2.1 Nonlinear Resistors*

A nonlinear resistor is a device characterised by the current *i<sub>R</sub>* flowing through it, the voltage *v<sub>R</sub>* across its terminals and by an *i*-*v* characteristic *f<sub>R</sub>*(*i<sub>R</sub>*, *v<sub>R</sub>*) = 0. In the following we are going to represent a nonlinear resistor by the symbol shown in Fig. 11.6. A current-controlled resistor can be characterised by a function

$$v\_{\mathcal{R}} = r(i\_{\mathcal{R}})$$

which, by assumption, around the operating point (*I<sub>R</sub>*, *V<sub>R</sub>*), can be approximated by a power series

$$v = \sum\_{k=1}^{\infty} r\_k i^k, \qquad v = v\_r = v\_R - V\_R \ , \quad i = i\_r = i\_R - I\_R \ .$$

If we consider the current *i* and the voltage *v* as signals, or, more precisely, elements of D<sup>⊕,sym</sup>, then a nonlinear resistor can be regarded as a weakly nonlinear system and the components of *v* can be expressed in terms of the ones of *i* using (10.1) and Table 10.1

$$\begin{aligned} v\_1 &= r\_1 i\_1 \\ v\_2 &= r\_1 i\_2 + r\_2 i\_1^{\otimes 2} \\ v\_3 &= r\_1 i\_3 + 2r\_2 \left[ i\_1 \otimes i\_2 \right]\_{\text{sym}} + r\_3 i\_1^{\otimes 3} \\ &\dots \end{aligned} \tag{11.19}$$

From this representation we observe that each voltage component *v<sub>k</sub>* is determined (i) by a term proportional to the *k*th current component *i<sub>k</sub>* and (ii) by other terms proportional to current components of order lower than *k*. In an electric network, the former can be represented by a linear resistor of value *r*<sub>1</sub>, the latter by a voltage source ṽ<sub>*R*,*k*</sub> whose value is determined by the current components *i<sub>n</sub>*, *n* = 1,..., *k* − 1 (see Fig. 11.7a)

$$v\_k = r\_1 i\_k + \tilde{v}\_{R,k}(i\_1, \dots, i\_{k-1})\,.$$

The various current and voltage components can therefore be calculated using a hierarchy of *linear* networks. First, we find the linear current *i*<sub>1</sub> using linearised components and the sources representing the system input. Once *i*<sub>1</sub> is found, ṽ<sub>*R*,2</sub> can be determined. With it we can draw the second order network. It is obtained from the linearised network by removing the system input sources (since they are of first order), by adding the second order source ṽ<sub>*R*,2</sub> and, if applicable, the ones of other nonlinear components. With this network we compute *i*<sub>2</sub>. Having found the first two components of *i*, the third order source ṽ<sub>*R*,3</sub> can be calculated. We then proceed to draw the third order network which is again composed of the linearised network with the addition of independent sources of third order only. With it, we find *i*<sub>3</sub> and so on.

If the nonlinear resistor is voltage controlled, then its characteristic around the operating point can be described by a power series where the role of the independent variable is played by the voltage v

$$i = \sum\_{k=1}^{\infty} g\_k v^k \, .$$


**Fig. 11.7 a** Weakly nonlinear resistor current-controlled equivalent model **b** Weakly nonlinear resistor voltage-controlled equivalent model

Proceeding as for the case of a current controlled nonlinear resistor, but with the roles of the signals *i* and v exchanged, we can express the first few components of the current *i* in terms of those of the voltage

$$\begin{aligned} i\_1 &= g\_1 v\_1 \\ i\_2 &= g\_1 v\_2 + g\_2 v\_1^{\otimes 2} \\ i\_3 &= g\_1 v\_3 + 2g\_2 \left[ v\_1 \otimes v\_2 \right]\_{\text{sym}} + g\_3 v\_1^{\otimes 3} \\ &\dots \end{aligned} \tag{11.20}$$

As before, each current component *ik* is the sum of a term linear in v*<sup>k</sup>* and other terms only depending on components of v of order lower than *k*

$$i\_k = g\_1 v\_k + i\_{\mathcal{R},k}(v\_1, \dots, v\_{k-1}) \, .$$

From this representation we deduce the equivalent circuit shown in Fig. 11.7b.

If a nonlinear resistor is voltage as well as current controlled, then we can choose the most convenient representation for the problem at hand. If one representation is known, the other can be obtained by power series inversion. For example, if we know the voltage-controlled representation, the current-controlled one is obtained by inserting the expression for the components given by (11.20) into (11.19) and by choosing the coefficients *rk* so that the equations are satisfied. Specifically, *r*1 is found by solving

$$i\_1 = g\_1 v\_1 = g\_1 r\_1 i\_1$$

which gives

$$r\_1 = \frac{1}{g\_1}.$$

*r*<sup>2</sup> is obtained by solving

$$i\_2 = g\_1 v\_2 + g\_2 v\_1^{\otimes 2} = g\_1 (r\_1 i\_2 + r\_2 i\_1^{\otimes 2}) + g\_2 r\_1^2 i\_1^{\otimes 2} \, .$$

Using the previously obtained value for *r*1, the equation is satisfied if

$$g\_1 r\_2 + g\_2 r\_1^2 = 0$$

or

$$r\_2 = -\frac{g\_2}{g\_1^3} \, .$$

*r*<sup>3</sup> is found in a similar way to be

$$r\_3 = \frac{2g\_2^2 - g\_1 g\_3}{g\_1^5} \, .$$

Higher order coefficients are easily calculated using the same procedure.
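The inversion is easy to check symbolically. The sketch below (a sympy check, with generic symbols) substitutes the current-controlled ansatz into the voltage-controlled series and matches coefficients order by order, reproducing the formulas for *r*1, *r*2 and *r*3:

```python
import sympy as sp

# Symbolic check of the power series inversion up to third order.
g1, g2, g3, r1, r2, r3, i = sp.symbols('g1 g2 g3 r1 r2 r3 i')

v = r1*i + r2*i**2 + r3*i**3                 # current-controlled ansatz
rhs = sp.expand(g1*v + g2*v**2 + g3*v**3)    # voltage-controlled series

# Matching i = rhs order by order gives one equation per power of i.
poly = sp.Poly(rhs, i)
eqs = [sp.Eq(poly.coeff_monomial(i**k), 1 if k == 1 else 0) for k in (1, 2, 3)]
sol = sp.solve(eqs, (r1, r2, r3), dict=True)[0]

print(sp.simplify(sol[r1] - 1/g1))                      # 0: r1 = 1/g1
print(sp.simplify(sol[r2] + g2/g1**3))                  # 0: r2 = -g2/g1^3
print(sp.simplify(sol[r3] - (2*g2**2 - g1*g3)/g1**5))   # 0: r3 as above
```

The same script extends to higher orders by enlarging the ansatz and the list of matched powers.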

# *11.2.2 Nonlinear Capacitors*

A nonlinear capacitor is a two-terminal device whose voltage v*C* across the terminals and the charge *qC* stored in it are related by a *q*-v characteristic *fC*(*qC*, v*C*) = 0. In the following, we are going to represent a nonlinear capacitor by the symbol shown in Fig. 11.6. A voltage controlled capacitor is a capacitor whose charge is a function of the voltage *qC* = *f̃C*(v*C*). Since the electric current is the time derivative of the electric charge, if the voltage v*C* is a differentiable function of time, the capacitor current is related to the voltage across its terminals by

$$i\_C = \frac{\mathrm{d}\tilde{f}\_C(v\_C)}{\mathrm{d}v\_C} \frac{\mathrm{d}v\_C}{\mathrm{d}t} \, .$$

The slope of the *q*-v characteristic is called the *small signal* (or *incremental*) *capacitance* of the nonlinear capacitor

$$C(v\_C) := \frac{\mathrm{d}\tilde{f}\_C(v\_C)}{\mathrm{d}v\_C} \, .$$

As before, we assume it to be expandable in a power series around the operating point (*QC*, *VC*)

$$c(v) := C(v + V\_C) = \sum\_{k=0}^{\infty} c\_{k+1} v^k, \qquad v = v\_C - V\_C \, .$$

Using this expression in the equation for the current, we can express the latter as the following power series


$$i\_C = \sum\_{k=0}^{\infty} c\_{k+1} v^k \frac{\mathrm{d}v}{\mathrm{d}t} = \sum\_{k=1}^{\infty} \frac{c\_k}{k} \frac{\mathrm{d}}{\mathrm{d}t} v^k \, .$$

This last expression can be extended to currents and voltages represented by elements of *D*<sup>⊕,sym</sup>, so that we take it as defining the relationship between the current and the voltage of a voltage-controlled weakly-nonlinear capacitor

$$i = i\_C = \sum\_{k=1}^{\infty} \frac{c\_k}{k} Dv^k. \tag{11.21}$$

The first few components of the current expressed in terms of the components of the voltage are given by

$$\begin{aligned} i\_1 &= c\_1 D v\_1 \\ i\_2 &= c\_1 D v\_2 + \frac{c\_2}{2} D v\_1^{\otimes 2} \\ i\_3 &= c\_1 D v\_3 + c\_2 D \left[ v\_1 \otimes v\_2 \right]\_{\text{sym}} + \frac{c\_3}{3} D v\_1^{\otimes 3} \\ &\dots \end{aligned} \tag{11.22}$$

Each current component *ik* is the sum of a term linear in *D*v*<sup>k</sup>* and others that only depend on the voltage components of order lower than *k*. In an electric network the *k*th component of the current can therefore be represented by a linear capacitor of value *c*<sup>1</sup> and a current source (see Fig. 11.8b)

$$i\_k = c\_1 D v\_k + i\_{C,k}(v\_1, \dots, v\_{k-1}) \ .$$

The various components are calculated with the same hierarchy of networks that we described for nonlinear resistors.

An initial charge *q*<sup>0</sup> on the capacitor can be represented as usual by a current pulse *q*0δ applied across the capacitor in the linear convolution equation.

Note that the linearised current-voltage characteristic of a capacitor is not by itself an asymptotically stable differential equation. The nonlinear transfer function formalism is therefore only applicable when the nonlinear capacitor is embedded in a network whose linear approximation is asymptotically stable.

A charge controlled nonlinear capacitor is a capacitor whose voltage is a function of the charge v*<sup>C</sup>* = ς (*qC*). Expanding this function around the operating point (*QC*, *VC*) we obtain

$$v = \sum\_{k=1}^{\infty} \varsigma\_k q^k, \qquad v = v\_C - V\_C, \quad q = q\_C - Q\_C.$$

The electric charge is the integral of the current. In the convolution algebra of right sided distributions this can be expressed by the convolution product between current and the Heaviside step function

$$q(t) = \int\_0^t i(\tau) \, d\tau = \mathbb{1}\_+(t) \* i(t) \,.$$

Substituting this equation in the preceding series we obtain a relation between current and voltage

$$v = \sum\_{k=1}^{\infty} \varsigma\_k (\mathbb{1}\_+ \ast i)^k \, .$$

The first few voltage components expressed as a function of the current components are given by

$$\begin{aligned} v\_1 &= \varsigma\_1 \mathbb{1}\_+ \ast i\_1 \\ v\_2 &= \varsigma\_1 \mathbb{1}\_+ \ast i\_2 + \varsigma\_2 (\mathbb{1}\_+ \ast i\_1)^{\otimes 2} \\ v\_3 &= \varsigma\_1 \mathbb{1}\_+ \ast i\_3 + 2\varsigma\_2 \Big[ (\mathbb{1}\_+ \ast i\_1) \otimes (\mathbb{1}\_+ \ast i\_2) \Big]\_{\text{sym}} + \varsigma\_3 (\mathbb{1}\_+ \ast i\_1)^{\otimes 3} \\ &\dots \end{aligned}$$

As in the previous cases, we see that each voltage component v*k* is composed of a term linear in the current *ik* and other terms that only depend on current components of order lower than *k*

$$v\_k = \varsigma\_1 \mathbb{1}\_+ \ast i\_k + \tilde{v}\_{C,k}(i\_1, \dots, i\_{k-1}) \, .$$

This expression can be represented in an electric network by the equivalent circuit shown in Fig. 11.8a.

If a capacitor is voltage controlled as well as charge controlled, then one can use either representation, and one can be converted into the other. The following

**Fig. 11.8 a** Weakly nonlinear capacitor charge-controlled equivalent model **b** Weakly nonlinear capacitor voltage-controlled equivalent model

equations give the first three coefficients of the charge controlled representation expressed in terms of those of the voltage controlled one

$$\begin{aligned} \varsigma\_1 &= \frac{1}{c\_1} \\ \varsigma\_2 &= -\frac{c\_2}{2c\_1^3} \\ \varsigma\_3 &= \frac{c\_2^2}{2c\_1^5} - \frac{c\_3}{3c\_1^4} \, . \end{aligned}$$

They were obtained by the same inversion procedure that we used to relate the two representations of nonlinear resistors.
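A quick numeric sanity check of these coefficients (with arbitrary, hypothetical values for the *ck*): building the charge from the voltage-controlled series and feeding it through the inverted series should return the original incremental voltage up to fourth order terms.

```python
# Numeric round-trip check of the capacitor series inversion (arbitrary values).
c1, c2, c3 = 1.0, 0.5, 0.2

s1 = 1 / c1
s2 = -c2 / (2 * c1**3)
s3 = c2**2 / (2 * c1**5) - c3 / (3 * c1**4)

v = 1e-3                                    # small incremental voltage
q = c1*v + (c2/2)*v**2 + (c3/3)*v**3        # charge as a series in v
v_back = s1*q + s2*q**2 + s3*q**3           # inverted (charge controlled) series

print(abs(v_back - v))                       # tiny fourth order residual
```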

# *11.2.3 Nonlinear Inductors*

A nonlinear inductor is a two-terminal device whose current *iL* and magnetic flux φ*L* are related by the φ-*i* characteristic *fL*(φ*L*, *iL*) = 0. In the following we are going to represent a nonlinear inductor by the symbol shown in Fig. 11.6. A current controlled inductor is an inductor whose flux is a function of the current φ*L* = *f̃L*(*iL*). The voltage across the terminals of an inductor is the time derivative of the flux. Thus, if the current is a differentiable function of time, the voltage is

$$v\_L = \frac{\mathrm{d}\tilde{f}\_L(i\_L)}{\mathrm{d}i\_L} \frac{\mathrm{d}i\_L}{\mathrm{d}t} \, .$$

The slope of the φ-*i* characteristic is called the *small signal* (or *incremental*) *inductance* of the inductor

$$L(i\_L) := \frac{\mathrm{d}\tilde{f}\_L(i\_L)}{\mathrm{d}i\_L} \tag{11.23}$$

that we assume, around the quiescent operating point (Φ*L*, *IL*), to be expandable in a power series

$$l(i) := L(i + I\_L) = \sum\_{k=0}^{\infty} l\_{k+1} i^k, \qquad i = i\_L - I\_L \, .$$

It is apparent that inductors and capacitors are "duals" of each other, with the roles of current and voltage exchanged. We can therefore adapt the previous results and define the voltage and current relationship of a current controlled weakly nonlinear inductor by

$$v = v\_L = \sum\_{k=1}^{\infty} \frac{l\_k}{k} D i^k. \tag{11.24}$$

**Fig. 11.9 a** Weakly nonlinear inductor current-controlled equivalent model **b** Weakly nonlinear inductor flux-controlled equivalent model

The first few components of the voltage expressed in terms of the components of the current are given by

$$\begin{aligned} v\_1 &= l\_1 D i\_1\\ v\_2 &= l\_1 D i\_2 + \frac{l\_2}{2} D i\_1^{\otimes 2} \\ v\_3 &= l\_1 D i\_3 + l\_2 D \left[ i\_1 \otimes i\_2 \right]\_{\text{sym}} + \frac{l\_3}{3} D i\_1^{\otimes 3} \\ &\dots \end{aligned} \tag{11.25}$$

The component *k* has the form

$$v\_k = l\_1 D i\_k + \tilde{v}\_{L,k}(i\_1, \dots, i\_{k-1}) \, ,$$

from which we read the equivalent circuit shown in Fig. 11.9a.

Similarly, the flux controlled representation of a weakly nonlinear inductor is

$$i = \sum\_{k=1}^{\infty} \varrho\_k (\mathbb{1}\_+ \ast v)^k,$$

with the first few current components expressed as a function of the voltage components given by

$$\begin{aligned} i\_1 &= \varrho\_1 \mathbb{1}\_+ \ast v\_1 \\ i\_2 &= \varrho\_1 \mathbb{1}\_+ \ast v\_2 + \varrho\_2 (\mathbb{1}\_+ \ast v\_1)^{\otimes 2} \\ i\_3 &= \varrho\_1 \mathbb{1}\_+ \ast v\_3 + 2\varrho\_2 \left[ (\mathbb{1}\_+ \ast v\_1) \otimes (\mathbb{1}\_+ \ast v\_2) \right]\_{\text{sym}} + \varrho\_3 (\mathbb{1}\_+ \ast v\_1)^{\otimes 3} \\ &\dots \end{aligned}$$

**Fig. 11.10** Current source representation in the Laplace domain of nonlinearities of weakly nonlinear elements

**Fig. 11.11** Voltage source representation in the Laplace domain of nonlinearities of weakly nonlinear elements

The component *k* has the form

$$i\_k = \varrho\_1 \mathbb{1}\_+ \ast v\_k + i\_{L,k}(v\_1, \dots, v\_{k-1}) \, ,$$

which leads to the equivalent circuit shown in Fig. 11.9b.

As for nonlinear capacitors, the nonlinear impulse responses formalism can only be applied to circuits including nonlinear inductors when they are part of networks whose linear approximation is asymptotically stable.

# **11.3 Nonlinear Multi-port Elements**

Weakly nonlinear multi-port and multi-terminal elements can be represented by two-terminal elements and controlled sources. Therefore, in this section we introduce weakly nonlinear controlled sources. With them, we will have at our disposal all the circuit elements necessary to model arbitrary weakly nonlinear electronic components.

A *controlled source* is a two terminal element whose voltage v*CS* or current *iCS* is controlled by a control voltage v*X* or current *iX*, a quantity in another part of the electric network of which it is part. There are four types of controlled sources: the voltage-controlled voltage source (VCVS), the voltage-controlled current source (VCCS), the current-controlled voltage source (CCVS) and the current-controlled current source (CCCS).


As before, we assume that, around a quiescent operating point, the characterising function can be approximated to any desired accuracy by a power series. The incremental quantities can then be represented by elements of *D*<sup>⊕,sym</sup>. For example, we assume that a VCCS can be represented by

$$i = \sum\_{k=1}^{\infty} g\_{mk} v^k, \qquad i = i\_{CS} - I\_{CS}, \quad v = v\_X - V\_X.$$

The first three components of the current expressed in terms of the components of the voltage can be derived in the same way as we did for a voltage-controlled weakly-nonlinear resistor and are

$$\begin{aligned} i\_1 &= g\_{m1} v\_1 \\ i\_2 &= g\_{m1} v\_2 + g\_{m2} v\_1^{\otimes 2} \\ i\_3 &= g\_{m1} v\_3 + 2g\_{m2} \left[ v\_1 \otimes v\_2 \right]\_{\text{sym}} + g\_{m3} v\_1^{\otimes 3} \\ &\dots \end{aligned} \tag{11.26}$$

Note that each current component *ik* is the sum of a term linear in the *k*th component of the incremental control voltage v*<sup>k</sup>* and other terms that only depend on components of the voltage v of order lower than *k*

$$i\_k = g\_{m1} v\_k + i\_{CS,k}(v\_1, \dots, v\_{k-1}) \, .$$

In an electric network a VCCS can thus be represented by a *linear* VCCS and independent current sources that only depend on control voltage components of order lower than *k*. As for the two terminal weakly-nonlinear elements considered in Sect. 11.2, the linear term of a VCCS plays a special role. For this reason the quantity *gm*1 has been given a name: it is called the *transconductance* of the source. A two-port representation of a VCCS is shown in Fig. 11.12.

The situation is entirely analogous for the other types of controlled sources. The coefficient of the linear term of a CCVS, *rm*1, is called the *transresistance*; the one of a CCCS, α1, is called the *current transfer ratio*; and the one of a VCVS, μ1, the *voltage transfer ratio*.

# **11.4 Low-Pass Filter with Nonlinear Capacitor**

In this section we investigate the low-pass filter (LPF) shown in Fig. 11.13a. When implemented in an integrated circuit technology, a considerable fraction of the circuit area is often occupied by the capacitor. Given that the price of integrated circuits is determined to a large extent by occupied area, to reduce the cost of the circuit, it is desirable to use a capacitor type with a high capacitance per unit area. The highest capacitance per unit area available in CMOS technologies is offered by MOS capacitors which however have a rather nonlinear characteristic. For this reason we investigate the effects introduced in the circuit by the use of a nonlinear capacitor. In particular, we are interested in the upper linearity limit set by the nonlinear capacitor and therefore assume the operational amplifier (OpAmp) to be ideal.

# *11.4.1 Nonlinear Transfer Functions*

Under the assumption of an ideal OpAmp, the circuit of Fig. 11.13a can be represented by the small-signal equivalent circuit shown in Fig. 11.13b. Since MOS capacitors are voltage controlled, we represent the nonlinear capacitor as a voltage controlled device. Then, using Kirchhoff's current law (KCL) the system equation is

**Fig. 11.13 a** Active *RC* low-pass filter circuit **b** Active *RC* low-pass filter with ideal OpAmp model

$$\frac{v\_o}{R} + c\_1 D v\_o = -i\_s - \sum\_{k=2}^{\infty} \frac{c\_k}{k} D v\_o^k. \tag{11.27}$$

To highlight that the nonlinear part of the capacitor characteristic acts as a source, we have moved that part of the characteristic to the right-hand side of the equation together with the source *is* and collected all linear terms on the left-hand side.

We solve the equation in the Laplace domain using the equivalent circuits that we developed in Sects. 11.2 and 11.3. The first order output voltage component *Vo*,1 is obtained by replacing the nonlinear capacitor with an ideal capacitor of value *c*1, the value of the nonlinear capacitor at the operating point

$$\frac{V\_{o,1}(s\_1)}{R} + c\_1 s\_1 V\_{o,1}(s\_1) = -I\_s(s\_1) \, .$$

Using a Dirac impulse as input signal, the first order transfer function is found to be

$$H\_1(s\_1) = \left. V\_{o,1} \right|\_{I\_s(s\_1) = 1} = \frac{-R}{1 + \frac{s\_1}{\omega\_{3\text{dB}}}}$$

with

$$\omega\_{3\text{dB}} := \frac{1}{Rc\_1}$$

the 3 dB cut-off frequency of the filter.
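As a concrete check, the short script below (using the component values R = 1 kΩ and *c*1 = 31.58 pF quoted later in this section) evaluates *H*1 and its cut-off:

```python
import numpy as np

# First order transfer function of the LPF; component values from this section.
R, c1 = 1e3, 31.58e-12        # 1 kOhm, 31.58 pF
w3dB = 1 / (R * c1)           # 3 dB cut-off in rad/s

def H1(s):
    return -R / (1 + s / w3dB)

print(w3dB / (2 * np.pi) / 1e6)        # cut-off in MHz, ~5
print(abs(H1(1j * w3dB)) / R)          # 1/sqrt(2): 3 dB down at the cut-off
```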

Having found the first order output component of the voltage, we can calculate the equivalent source Ĩ*c*,2(*s*1, *s*2) representing the second order nonlinearity of the capacitor (see Fig. 11.10). With a Dirac pulse as the first order input we find

$$\tilde{I}\_{c,2}(s\_1, s\_2) = \frac{c\_2}{2} (s\_1 + s\_2) H\_1(s\_1) H\_1(s\_2) \, .$$

**Fig. 11.14** Second order equivalent circuit of the LPF

The second order transfer function is found with the help of the second order equivalent circuit. It is obtained from the first order one by removing the current source *Is*(*s*1), which is of first order, and by inserting the source ˜*Ic*,<sup>2</sup>(*s*1,*s*2) representing the second order nonlinearity of the capacitor. The second order equivalent circuit is shown in Fig. 11.14. Note how the current generated by second (and higher) order nonlinearity is injected into the input node of the filter. For this reason, as discussed in Sect. 10.2, feedback is unable to suppress it. Since the variables in this network are of second order, we have to use the definition of the derivative for second order distributions (Eq. (9.14)). The second order transfer function is thus found to be

$$\begin{aligned} H\_2(s\_1, s\_2) &= \frac{-R}{1 + (s\_1 + s\_2)Rc\_1} \tilde{I}\_{c,2}(s\_1, s\_2) \\ &= \frac{c\_2}{2} (s\_1 + s\_2) H\_1(s\_1 + s\_2) H\_1(s\_1) H\_1(s\_2) \, . \end{aligned}$$

With the first two components of the output voltage we can compute the equivalent source representing the capacitor nonlinearity of third order (see Fig. 11.10)

$$\tilde{I}\_{c,3}(s\_1, s\_2, s\_3) = (s\_1 + s\_2 + s\_3) \Big\{ c\_2 \left[ H\_1(s\_1) H\_2(s\_2, s\_3) \right]\_{\text{sym}} + \frac{c\_3}{3} H\_1(s\_1) H\_1(s\_2) H\_1(s\_3) \Big\}$$

and with it the equivalent circuit of third order. Using the definition of the derivative for third order distributions, the third order transfer function is found to be

$$\begin{split} H\_{3}(s\_{1},s\_{2},s\_{3}) &= \frac{-R}{1+(s\_{1}+s\_{2}+s\_{3})Rc\_{1}} \tilde{I}\_{c,3}(s\_{1},s\_{2},s\_{3}) \\ &= (s\_{1}+s\_{2}+s\_{3})H\_{1}(s\_{1}+s\_{2}+s\_{3}) \\ &\quad \cdot \left\{ \frac{c\_{2}^{2}}{2} \left[ H\_{1}(s\_{1})H\_{1}(s\_{2})H\_{1}(s\_{3})(s\_{2}+s\_{3})H\_{1}(s\_{2}+s\_{3}) \right]\_{\text{sym}} \right. \\ &\left. + \frac{c\_{3}}{3}H\_{1}(s\_{1})H\_{1}(s\_{2})H\_{1}(s\_{3}) \right\}. \end{split}$$

# *11.4.2 Second Order Intermodulation*

Having found the first three transfer functions of the filter, we can evaluate the impact of the nonlinearities in concrete situations. As a first situation, suppose that there is a strong modulated signal in the stop-band of the filter. If the even order nonlinearities generate strong IM products masking the wanted signal in the pass-band, then the filter is of little use. To have a first indication of the strength of this effect, we only consider the nonlinearity of second order and calculate the IP2. We model the modulated signal in the stop-band with two tones at ω<sup>1</sup> and ω2. We further assume ω<sup>2</sup> > ω<sup>1</sup> and

$$\Delta\omega := \omega\_2 - \omega\_1 < \omega\_{3\text{dB}}$$

so that one of the IM2 products falls in the pass-band of the filter. The IM2 of interest is characterized by the frequency mix *m* = (0, 1, 0, 1) and thus by the frequency response

$$\begin{split} H\_2(-j\omega\_1, j\omega\_2) &= \frac{c\_2}{2} j \, \Delta \omega H\_1(j\, \Delta \omega) H\_1(-j\omega\_1) H\_1(j\omega\_2) \\ &\approx -\frac{c\_2}{2} j \, \Delta \omega R \frac{R \, \omega\_{3\text{dB}}}{-j\omega\_1} \frac{R \, \omega\_{3\text{dB}}}{j\omega\_2} \approx -\frac{c\_2}{2} j \, \Delta \omega \frac{R^3 \omega\_{3\text{dB}}^2}{\omega\_1^2} \, . \end{split}$$

With it the IIP2 and OIP2 are obtained from (11.10)

$$I\_{\rm IIP2} \approx \left| \frac{2}{c\_2 \Delta \omega R^2} \right| \left( \frac{\omega\_1}{\omega\_{3\text{dB}}} \right)^2 \tag{11.28}$$

$$V\_{\rm OIP2} \approx \left| \frac{2}{c\_2 \Delta \omega R} \right| \left( \frac{\omega\_1}{\omega\_{3\text{dB}}} \right)^2. \tag{11.29}$$

We have denoted the two intercept points by *I*IIP2 and *V*OIP2 to make it clear that the first characterizes the magnitude of the input current while the second characterizes the magnitude of the output voltage.

These expressions reveal that the deeper the blocker lies in the stop band, the lower the IM2. This makes intuitive sense: the lower the capacitor impedance, the smaller the voltage generated across the nonlinear capacitor by the interfering signal. Since we have assumed a voltage-controlled capacitor, a small voltage produces small intermodulation products. The above expressions also reveal that the IM2 is not homogeneous across the pass-band, but is stronger when Δω approaches the 3 dB cut-off frequency of the filter.
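The quality of the stop-band approximation behind (11.28) and (11.29) can be probed numerically. The sketch below uses the tone frequencies of the later example and a hypothetical value for *c*2 (only the ratio of exact to approximate magnitude matters, so the choice of *c*2 cancels):

```python
import numpy as np

# Compare |H2| with its stop-band approximation (c2 is a hypothetical value).
R, c1 = 1e3, 31.58e-12
c2 = 0.5 * c1                  # assumed second order coefficient (per volt)
w3dB = 1 / (R * c1)

def H1(s):
    return -R / (1 + s / w3dB)

def H2(s1, s2):
    return (c2 / 2) * (s1 + s2) * H1(s1 + s2) * H1(s1) * H1(s2)

w1, w2 = 2 * np.pi * 48.75e6, 2 * np.pi * 51.25e6
dw = w2 - w1

exact = H2(-1j * w1, 1j * w2)
approx = -(c2 / 2) * 1j * dw * R**3 * w3dB**2 / w1**2

# The ratio is within ~20 % of 1 for these numbers; the residual comes mainly
# from H1(j*dw) ~ -R, which is rough here because dw is comparable to w3dB.
print(abs(exact) / abs(approx))
```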

The IP2 can also be expressed in a slightly different form. If we replace one occurrence of ω3dB by 1/(*c*1*R*) in the above expression we obtain

$$V\_{\rm OIP2} \approx 2 \left| \frac{c\_1}{c\_2} \right| \frac{\omega\_1^2}{\Delta \omega \, \omega\_{3\text{dB}}} \,. \tag{11.30}$$

**Fig. 11.15 a** Typical characteristic of an n-type accumulation-mode MOS varactor with a channel length of 0.2µm in a 40 nm CMOS technology **b** Small-signal model coefficients of an n-type accumulation-mode MOS varactor with a channel length of 0.2µm in a 40 nm CMOS technology

This form highlights the value of the OIP2 as a function of the ratio of the linear capacitor coefficient to the coefficient of the second order nonlinearity.

Figure 11.15a shows the typical characteristic of an n-type accumulation-mode MOS varactor [26] with a channel length of 0.2 µm in a 40 nm CMOS technology. Figure 11.15b shows the small-signal model coefficients normalized to the linear

**Fig. 11.16** Simulated IM2 of the LPF with the capacitor having the characteristic shown in Fig. 11.15a and driven by two tones of equal magnitude at 48.75 and 51.25MHz respectively

capacitance *c*<sup>1</sup> as a function of the voltage across the capacitor. If we use such a capacitor biased at 0 V to implement a LPF with a cut-off frequency of 5MHz to suppress a signal at 50MHz modeled as two tones at 48.75 and 51.25MHz respectively, (11.30) predicts that the filter will have an OIP2 of

$$\text{OIP2} \approx \text{44 dBV.}$$

For comparison we simulated the LPF IP2 numerically. To obtain a cut-off frequency of 5 MHz we used a resistor of 1 kΩ and a nominal capacitance of 31.58 pF. The results of the simulation are shown in Fig. 11.16. The value of the IP2 agrees very well with the predicted value. The IM2 starts to deviate from the ideal slope of 2 at a level of the input tones of ca. −55 dBA. This means that at that level the contribution of higher order nonlinearities to the IM2 becomes important. A −55 dBA tone at 50 MHz passing through a linear LPF with a transfer function equal to *H*1(jω) and the above component values produces an output tone with a magnitude of approximately

$$\sqrt{2}\, 10^{-55/20}\, R\, \frac{\omega\_{3\text{dB}}}{\omega} \approx 251 \ \mathrm{mV} \, .$$

The capacitor characteristic in Fig. 11.15a shows that a linear *c*-v approximation is only reasonably accurate up to this value. We thus see that a rough estimate of the range of validity of the approximation can be obtained by overlapping the approximation with the real characteristic. At larger positive and negative voltage levels the capacitor characteristic flattens out, and we can speculate that this is the reason for the slower increase of the IM2 at large signal levels.
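The back-of-the-envelope number above is easy to reproduce (using the nominal ω3dB = 2π · 5 MHz):

```python
import numpy as np

# Output amplitude of a -55 dBA tone at 50 MHz after the linear filter H1,
# using the nominal 5 MHz cut-off and R = 1 kOhm.
R = 1e3
w3dB = 2 * np.pi * 5e6
w = 2 * np.pi * 50e6

i_pk = np.sqrt(2) * 10**(-55 / 20)     # tone amplitude implied by -55 dBA
v_out = i_pk * R * w3dB / w            # |H1(jw)| ~ R*w3dB/w in the stop band
print(v_out * 1e3)                     # ~251 mV
```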

**Fig. 11.17 a** LPF with back-to-back capacitors ideal OpAmp model **b** LPF with back-to-back capacitors 2nd order equivalent circuit

**Fig. 11.18** Characteristic of two back-to-back n-type accumulation-mode MOS varactor each with the characteristic shown in Fig. 11.15a

For many applications, such as in communication receivers, this IP2 is insufficient. One way to improve it is to use two equal nonlinear capacitors connected back-to-back as shown in Fig. 11.17a, each providing half of the required capacitance. In this way, when v*o* increases, the capacitance of one capacitor increases, while that of the other capacitor decreases.

One way to analyze this circuit is to consider the combination of the two capacitors as a single nonlinear capacitor with the effective characteristic shown in Fig. 11.18. Figure 11.19 shows that, with identical devices, *c*<sup>2</sup> is identically zero. Hence, the IM2 is completely suppressed.

Another way to analyze the circuit is to consider each capacitor individually. If the linear transfer function has to remain the same as the one of the original circuit with a single capacitor, then we must have *cp*,<sup>1</sup> + *cn*,<sup>1</sup> = *c*1. The second order network is

**Fig. 11.19** Small-signal model coefficients of two back-to-back n-type accumulation-mode MOS varactor

therefore composed of the same linear components as before, but now it includes two sources, each representing the second-order nonlinearity of one of the two capacitors (see Fig. 11.17b). The source of *cp* has the same reference direction as the one of the original circuit and has a value of

$$\tilde{I}\_{c\_p,2}(s\_1, s\_2) = \frac{c\_{p,2}}{2} (s\_1 + s\_2) V\_{o,1}(s\_1) V\_{o,1}(s\_2) \, .$$

The one of *cn* has the opposite reference direction and a value of

$$\begin{split} \tilde{I}\_{c\_n,2}(s\_1, s\_2) &= \frac{c\_{n,2}}{2} (s\_1 + s\_2) \left( -V\_{o,1}(s\_1) \right) \left( -V\_{o,1}(s\_2) \right) \\ &= \frac{c\_{n,2}}{2} (s\_1 + s\_2) V\_{o,1}(s\_1) V\_{o,1}(s\_2) \ . \end{split}$$

As the two negative signs coming from v*n* = −v*o* cancel, the two currents flow in opposite directions and, if *cn*,2 = *cp*,2, they cancel each other.

Note that this cancelling effect of even order responses is quite general. Given an arbitrary even order frequency response *ĥk*,*m*, the response of (even) order *k* to *N* input tones with phasors *A*1, ..., *AN* is

$$y\_{k,m}^c(t) = \Re\left\{ \frac{1}{2^{k-1}} \frac{k!}{m!} A\_{-N}^{m\_{-N}} \cdots A\_{-1}^{m\_{-1}} A\_1^{m\_1} \cdots A\_N^{m\_N} \hat{h}\_{k,m} \mathrm{e}^{\mathrm{j}\omega\_m t} \right\}.$$

If the sign of all input tones is reversed, every phasor is multiplied by e<sup>jπ</sup>. As *k* is assumed to be even and |*m*| = *k*, these factors multiply to 1

$$\left( \mathrm{e}^{\mathrm{j}\pi} \right)^k = 1, \qquad k \text{ even}.$$

For this reason the response remains unchanged, but with opposite reference direction. Therefore, *all even order harmonics and intermodulation products will be suppressed.*
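The cancellation argument is easy to verify with a toy polynomial nonlinearity (hypothetical coefficients): driving two identical elements with v and −v and subtracting the resulting currents removes every even order term while leaving the odd ones untouched.

```python
import numpy as np

# Even order cancellation with a back-to-back pair (toy polynomial model).
def i_nl(v, c=(1.0, 0.3, 0.1)):
    # hypothetical characteristic i = c0*v + c1*v**2 + c2*v**3
    return c[0]*v + c[1]*v**2 + c[2]*v**3

t = np.linspace(0.0, 1e-6, 1000)
v = 0.1*np.cos(2*np.pi*48.75e6*t) + 0.1*np.cos(2*np.pi*51.25e6*t)

# Each device carries half the linear coefficient; the currents of the two
# devices have opposite reference directions, so they subtract.
pair = 0.5*i_nl(v) - 0.5*i_nl(-v)

# Only the odd order terms survive: pair = 1.0*v + 0.1*v**3 exactly.
print(np.max(np.abs(pair - (1.0*v + 0.1*v**3))))   # ~0, up to float rounding
```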

In reality there are two limitations to the amount of cancellation that is practically achievable. The first one is due to the fact that small unavoidable manufacturing imperfections make nominally identical devices slightly different. This effect is called *mismatch*. For this reason the coefficients of *cp* will be slightly different from those of *cn*. Let us represent the small variations due to mismatch in the following way

$$\begin{aligned} c\_{p,1} &= c\_{\text{nom},1} + \Delta c\_{p,1} & \quad & c\_{n,1} = c\_{\text{nom},1} + \Delta c\_{n,1} \\ c\_{p,2} &= c\_{\text{nom},2} + \Delta c\_{p,2} & \quad & c\_{n,2} = c\_{\text{nom},2} + \Delta c\_{n,2} \end{aligned}$$

with

$$\begin{aligned} c\_{\text{nom},1} &= \frac{c\_1}{2} & \Delta c\_{p,1}, \Delta c\_{n,1} &\ll c\_{\text{nom},1} \\ c\_{\text{nom},2} &= \frac{c\_2}{2} & \Delta c\_{p,2}, \Delta c\_{n,2} &\ll c\_{\text{nom},2} \, . \end{aligned}$$

Then, the two current sources Ĩ*cp*,2(*s*1, *s*2) and Ĩ*cn*,2(*s*1, *s*2) can be represented by a single source with the same reference direction as the former and a value of

$$\tilde{I}\_{c\_p,2}(s\_1, s\_2) - \tilde{I}\_{c\_n,2}(s\_1, s\_2) = \frac{\Delta c\_{p,2} - \Delta c\_{n,2}}{2} (s\_1 + s\_2) V\_{o,1}(s\_1) V\_{o,1}(s\_2) \, .$$

The resulting network is similar to the one of the original circuit, the only difference being that the coefficient *c*<sup>2</sup> is replaced by *cp*,<sup>2</sup> − *cn*,2. The OIP2 is therefore

$$\begin{split} V\_{\text{OIP2,B2B}} & \approx 2 \left| \frac{c\_1 + \Delta c\_{p,1} + \Delta c\_{n,1}}{\Delta c\_{p,2} - \Delta c\_{n,2}} \right| \frac{\omega\_1^2}{\Delta \omega \, \omega\_{3\text{dB}}} \\ & \approx 2 \left| \frac{c\_2}{\Delta c\_{p,2} - \Delta c\_{n,2}} \right| \left| \frac{c\_1}{c\_2} \right| \frac{\omega\_1^2}{\Delta \omega \, \omega\_{3\text{dB}}} \\ & = \left| \frac{c\_2}{\Delta c\_{p,2} - \Delta c\_{n,2}} \right| V\_{\text{OIP2}} \, . \end{split} \tag{11.31}$$

Compared to the original circuit the IP2 has been improved by the mismatch limited factor 

$$\left| \frac{c\_2}{\Delta c\_{p,2} - \Delta c\_{n,2}} \right| \, .$$

**Fig. 11.20** Simulated IM2 of the LPF with the capacitor having the characteristic shown in Fig. 11.15a and driven by two tones of equal magnitude at 48.75 and 51.25MHz respectively

Figure 11.20 shows the results of simulations with two identical nonlinear capacitors, and for the case where Δ*cp*,2/*c*nom,2 = −Δ*cn*,2/*c*nom,2 = 0.01. In the latter case we observe the expected improvement of

$$20\log\left(\frac{c\_2}{0.02\, c\_2/2}\right) = 20\log\left(\frac{1}{0.01}\right) = 40\ \text{dB} \, .$$

In the former case, at input signal levels up to −65 dBA the value of the IM2 is limited by numerical noise. At larger input signal levels the simulation result is an artifact of the limited accuracy of the numerical algorithm used.

The second practical limitation is constituted by the fact that the terminals of real components are often coupled to other nodes of the circuit. This coupling can be modeled with parasitic components connected to the terminals. The parasitic components of the positive terminal are often different from the ones of the negative terminal. In addition, parasitic components are often nonlinear.

We conclude this section by noting that if the two tones are in the pass-band of the filter the OIP2 is

$$V\_{\rm OIP2} \approx \left| \frac{2}{c\_2 \Delta \omega R} \right| = 2 \left| \frac{c\_1}{c\_2} \right| \frac{\omega\_{3\text{dB}}}{\Delta \omega} \, . \tag{11.32}$$

# *11.4.3 Third Order Intermodulation*

In this section we investigate the situation where there are two interfering signals, one at ω1 and a second one close to twice this frequency, ω2 = 2ω1 − Δω, so that the lower side-band IM3 falls in the pass-band of the filter

$$2\omega\_1 - \omega\_2 = \Delta\omega < \omega\_{\text{3dB}} \,, \qquad \omega\_1, \omega\_2 > \omega\_{\text{3dB}} > 0 \,.$$

To characterize this situation we compute the IP3.

The IM3 of interest is obtained from the third order transfer function

$$\begin{aligned} H\_3(s\_1, s\_2, s\_3) &= (s\_1 + s\_2 + s\_3)H\_1(s\_1 + s\_2 + s\_3) \\ &\quad \cdot \left\{ \frac{c\_2^2}{2} \left[ H\_1(s\_1)H\_1(s\_2)H\_1(s\_3)(s\_2 + s\_3)H\_1(s\_2 + s\_3) \right]\_{\text{sym}} + \frac{c\_3}{3}H\_1(s\_1)H\_1(s\_2)H\_1(s\_3) \right\} \end{aligned}$$

evaluated at the frequency mix $m = (1, 0, 2, 0)$. Setting $s\_1 = j\omega\_1$, $s\_2 = j\omega\_1$ and $s\_3 = -j\omega\_2$, the term enclosed in the symmetrization operator becomes

$$\begin{aligned} &H\_1(j\omega\_1)H\_1(j\omega\_1)H\_1(-j\omega\_2) \\ &\cdot \frac{1}{6} \Big[ 2j \left( 2\omega\_1 \right) H\_1(j2\omega\_1) + 4j \left( -\omega\_1 + \Delta\omega \right) H\_1(j \left( -\omega\_1 + \Delta\omega \right)) \Big] .\end{aligned}$$

If we assume $|\omega\_1 - \Delta\omega| > \omega\_{\text{3dB}}$ and use the approximation

$$j\omega H\_1(j\omega) \approx j\omega \frac{-R}{j\omega c\_1 R} = \frac{-1}{c\_1}$$

we can simplify it to

$$H\_1(j\omega\_1)H\_1(j\omega\_1)H\_1(-j\omega\_2)\frac{-1}{c\_1} \,.$$

Using these results we obtain

 $H\_3(j\omega\_1, j\omega\_1, -j\omega\_2)$ 
$$\approx j\Delta\omega H\_1(j\Delta\omega)H\_1(j\omega\_1)H\_1(j\omega\_1)H\_1(-j\omega\_2)\left[-\frac{c\_2^2}{2c\_1} + \frac{c\_3}{3}\right]$$

$$\approx j\Delta\omega(-R)\left(\frac{-1}{j\omega\_1c\_1}\right)^2\left(\frac{-1}{-2j\omega\_1c\_1}\right)\left[-\frac{c\_2^2}{2c\_1} + \frac{c\_3}{3}\right]$$

$$= \frac{\Delta\omega R}{(\omega\_1c\_1)^3}\left[\frac{c\_3}{6} - \frac{c\_2^2}{4c\_1}\right].$$

The IIP3 and OIP3 are obtained by inserting this result in (11.11)

$$I\_{\rm IIP3} \approx \sqrt{\frac{4}{3} \left| \frac{(\omega\_1 c\_1)^3}{\Delta \omega \left[ \frac{c\_3}{6} - \frac{c\_2^2}{4c\_1} \right]} \right|} \tag{11.33}$$

$$V\_{\rm OIP3} \approx \sqrt{\frac{4}{3} \frac{\omega\_1^3}{\Delta \omega \,\omega\_{\rm 3dB}^2} \left| \frac{1}{\frac{c\_3}{6c\_1} - \frac{1}{4} \left(\frac{c\_2}{c\_1}\right)^2} \right|} \tag{11.34}$$

These expressions reveal that the IP3 depends not only on the third order coefficient $c\_3$, but also on the second order one $c\_2$. The reason is that second order intermodulation products are fed back to the input of the nonlinear component, where, in combination with the fundamental tones, they pass again through the second order nonlinearity. This is the effect that was discussed in Sect. 10.2 with the help of the signal-flow graph of Fig. 10.4 and the reason for $c\_2$ being squared. The expression for the OIP3 highlights the fact that it is the ratio of the coefficients $c\_2$ and $c\_3$ to the linear capacitance $c\_1$ that matters. The expressions also reveal that the IM3 components generated by the second order and third order nonlinearities have either the same or opposite phase and that, if

$$\frac{c\_3}{c\_1} = \frac{3}{2} \left(\frac{c\_2}{c\_1}\right)^2,\tag{11.35}$$

the two cancel each other.
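The cancellation can be checked directly on the IM3 bracket of (11.34); a small sketch with an illustrative (hypothetical) ratio $c\_2/c\_1$:

```python
# IM3 bracket from (11.34): c3/(6 c1) - (1/4)(c2/c1)^2.
# It vanishes exactly when c3/c1 = (3/2)(c2/c1)^2, i.e. when (11.35) holds.
c2_over_c1 = 0.5                       # illustrative value only
c3_over_c1 = 1.5 * c2_over_c1**2       # canceling condition (11.35)

bracket = c3_over_c1 / 6 - 0.25 * c2_over_c1**2
print(bracket)   # 0.0
```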

In the previous section we discussed the fact that using equal nonlinear capacitors connected back-to-back eliminates even order components from the response of the system. This is not the case for odd order nonlinearities. To see this, we can draw the third order equivalent network of the filter with back-to-back capacitors. The equivalent sources representing the third order nonlinearities of *cp* and *cn* are

$$\tilde{I}\_{c\_p,3}(s\_1,s\_2,s\_3) = \frac{c\_{p,3}}{3}(s\_1 + s\_2 + s\_3) V\_{o,1}(s\_1) V\_{o,1}(s\_2) V\_{o,1}(s\_3)$$

and

$$\tilde{I}\_{c\_n,3}(s\_1,s\_2,s\_3) = -\frac{c\_{n,3}}{3}(s\_1 + s\_2 + s\_3) V\_{o,1}(s\_1) V\_{o,1}(s\_2) V\_{o,1}(s\_3)$$

respectively, where we have used the fact that $V\_{o,2}(s\_2,s\_3)$ is zero. For equal capacitors $c\_{p,3} = c\_{n,3} = c\_3/2$; therefore, since the sources have opposite reference directions, they combine to form a single source equivalent to the one of a single nonlinear capacitor with $c\_2 = 0$.

As an example, we consider again a filter with a cut-off frequency of 5 MHz implemented with a nonlinear MOS capacitor having the characteristic shown in Fig. 11.15a and biased at 0 V. At this bias point the ratios $c\_2/c\_1$ and $c\_3/c\_1$ are 1.73 and –0.94 respectively. If the filter is driven by a tone at 15 MHz and a second one at 27.5 MHz, (11.34) predicts an OIP3 of

$$\text{OIP3} \approx 16.0 \text{ dBV} .$$

Note that in this example it is the second order nonlinearity that dominates the IM3 as 

$$\left|\frac{c\_3}{6c\_1}\right| \approx 0.16 < \frac{1}{4} \left(\frac{c\_2}{c\_1}\right)^2 \approx 0.75\text{ .}$$

Thus, using back-to-back capacitors improves the OIP3 to

$$\text{OIP3}\_{\text{B2B}} \approx 23.6 \text{ dBV} .$$
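The 7.6 dB step from 16.0 to 23.6 dBV follows from the IM3 bracket of (11.34) alone; a short numeric check using the coefficient ratios quoted above:

```python
import math

c2_over_c1 = 1.73    # Fig. 11.15a capacitor, biased at 0 V
c3_over_c1 = -0.94

term3 = c3_over_c1 / 6          # third order contribution to the bracket
term2 = 0.25 * c2_over_c1**2    # second order contribution

print(f"|c3/(6 c1)|   = {abs(term3):.2f}")   # 0.16
print(f"(c2/c1)^2 / 4 = {term2:.2f}")        # 0.75

# Back-to-back capacitors suppress term2; since the OIP3 scales as
# 1/sqrt(|bracket|), the improvement in dB is
improvement_db = 10 * math.log10(abs(term3 - term2) / abs(term3))
print(f"improvement: {improvement_db:.1f} dB")   # 7.6 dB
```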

For comparison, we simulated the filter with the full nonlinear capacitor characteristic of Fig. 11.15a. The obtained IM3 as a function of the magnitude of the input tones is shown in Fig. 11.21. The figure also shows the IM3 obtained using back-to-back capacitors. In both cases the obtained IP3 is in good agreement with the above calculations. The IM3 starts to depart from a straight line with a slope of three at a level of the input tones of ca. –67 dBA. This corresponds to an output fundamental magnitude of ca. 0.2 V for the tone at $\omega\_1$ and is close to the level at which the polynomial approximation starts to deviate significantly from the real characteristic of the capacitor.

Further, we verified the occurrence of cancellation between the IM3 produced by the third order nonlinearity and the one produced by the second order one. Figure 11.22 shows the magnitude of the IM3 as a function of the bias voltage of the capacitor. The curve shows a clear notch at a bias voltage of ca. –0.19 V, the bias voltage at

**Fig. 11.21** Simulated IP3 of the LPF with the capacitor having the characteristic shown in Fig. 11.15a and with two equal back-to-back (B2B) capacitors. The two input tones were of equal magnitude at 15 and 27.5MHz respectively

**Fig. 11.22** Simulated IM3 of the LPF with the capacitor having the characteristic shown in Fig. 11.15a as a function of the capacitor bias voltage *VO* . The filter was driven by two equal tones of magnitude 0.1 at 15 and 27.5MHz respectively

which the coefficient ratios $c\_2/c\_1$ and $c\_3/c\_1$ satisfy the canceling condition expressed by (11.35). This notch disappears at large signal levels, where contributions to the IM3 from higher order nonlinearities become important. The curve also suggests that, to obtain the best linearity, one should use a large bias voltage bringing the MOS capacitor into strong inversion, where its capacitance becomes almost constant.

Before concluding this section, we investigate the case in which the two tones are in the pass-band of the filter. In this case the term in the symmetrization operator in $H\_3(j\omega\_1, j\omega\_1, -j\omega\_2)$ is

$$\begin{aligned} &H\_1(j\omega\_1)H\_1(j\omega\_1)H\_1(-j\omega\_2) \\ &\cdot \frac{1}{6} \Big[ 2j(2\omega\_1)H\_1(j2\omega\_1) + 4j(-\omega\_1 + \Delta\omega)(-R) \Big] \\ &= H\_1(j\omega\_1)H\_1(j\omega\_1)H\_1(-j\omega\_2) \cdot \frac{2}{3}j \Big[ \omega\_1 H\_1(j2\omega\_1) + (\omega\_1 - \Delta\omega)R \Big]. \end{aligned}$$

If we further assume that 2ω<sup>1</sup> also falls in the pass-band of the filter it simplifies to

$$-H\_1(j\omega\_1)H\_1(j\omega\_1)H\_1(-j\omega\_2)\cdot \frac{2}{3}j\,\Delta\omega\,R \,.$$

The third order nonlinear transfer function evaluated at the frequency mix *m* = (1, 0, 2, 0) is therefore

**Fig. 11.23** Simulated IP3 of the LPF with the capacitor having the characteristic shown in Fig. 11.15a and with two equal back-to-back (B2B) capacitors. The two input tones were of equal magnitude at 1 and 1.1MHz respectively

$$\begin{split} H\_3(j\omega\_1, j\omega\_1, -j\omega\_2) &\approx j\,\Delta\omega R^4 \Big[\frac{c\_3}{3} - j\frac{c\_2^2}{3}\Delta\omega R\Big] \\ &= j\,\frac{\Delta\omega}{\omega\_{3\text{dB}}}\frac{R^3}{3} \Big[\frac{c\_3}{c\_1} - j\frac{c\_2^2}{c\_1}\Delta\omega R\Big]. \end{split}$$

With this, the OIP3 is

$$V\_{\rm OIP3,IB} \approx \sqrt{\frac{4\,\omega\_{\rm 3dB}}{\Delta \omega} \frac{1}{\left| \frac{c\_3}{c\_1} - j \left( \frac{c\_2}{c\_1} \right)^2 \frac{\Delta \omega}{\omega\_{\rm 3dB}} \right|}} \tag{11.36}$$

The results of a simulation with one tone at 1 MHz and the second one at 1.1 MHz are shown in Fig. 11.23. They are again in good agreement with the OIP3 estimated with the help of the above equation, which gives 10.1 and 10.7 dBV for a single capacitor and for back-to-back capacitors respectively.

# *11.4.4 Large Signal Effects*

In this section we evaluate gain compression and amplitude-modulation to phase-modulation (AM2PM) conversion due to the nonlinear capacitor. The onset of both of these effects is governed by the third order transfer function evaluated at the frequency mix $m = (0, 1, 2, 0)$ relative to the linear transfer function at the fundamental

$$\begin{split} \frac{H\_3(j\omega\_1, j\omega\_1, -j\omega\_1)}{H\_1(j\omega\_1)} &\approx -j\omega\_1 R^3 \Big[\frac{c\_3}{3} - j\frac{c\_2^2}{3}\omega\_1 R\Big] \\ &= -\frac{\omega\_1}{\omega\_{3\text{dB}}} \frac{R^2}{3} \Big[\left(\frac{c\_2}{c\_1}\right)^2 \frac{\omega\_1}{\omega\_{3\text{dB}}} + j\frac{c\_3}{c\_1}\Big]. \end{split}$$

where we have assumed $2\omega\_1 < \omega\_{\text{3dB}}$. The phase of this expression determines the presence of gain compression or expansion and of AM2PM.

As a concrete example, we consider again a low-pass filter with a cut-off frequency of 5 MHz, $R = 1\,\text{k}\Omega$, the nonlinear capacitor with the characteristic shown in Fig. 11.15a and driven by a sinusoidal tone at 1 MHz. In this case the term in the square bracket in the above expression, multiplied by minus one, evaluates to $-0.6 + j0.94$. As the real part is negative, we expect some gain compression. However, the imaginary part has a larger magnitude, which implies that AM2PM should be somewhat more pronounced. If we use (11.7) to estimate the amplitude of the input tone producing a phase change of 1°, we obtain a value of –67.3 dBA, which corresponds to an output swing of 0.61 V. A look at Fig. 11.15a shows that at these levels a second order approximation of the capacitor characteristic is a very poor approximation of the real characteristic. For this reason we can't expect this estimate to be accurate.
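The complex bracket quoted above follows directly from the coefficient ratios; a sketch assuming the same bias point and $\omega\_1/\omega\_{\text{3dB}} = 1/5$:

```python
# Square bracket of H3/H1 at the fundamental, times -1 (as in the text):
# -[(c2/c1)^2 (w1/w3dB) + j c3/c1], with the Fig. 11.15a ratios at 0 V.
c2_over_c1 = 1.73
c3_over_c1 = -0.94
w1_over_w3db = 1.0 / 5.0   # 1 MHz tone, 5 MHz cut-off

z = -(c2_over_c1**2 * w1_over_w3db + 1j * c3_over_c1)
print(f"{z:.2f}")   # -0.60+0.94j

# Negative real part -> (mild) gain compression; the larger imaginary
# part means AM2PM is somewhat more pronounced.
assert z.real < 0 and abs(z.imag) > abs(z.real)
```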

A believable prediction can be made for levels where the approximation is good. For example, a phase change of 0.1° is predicted to happen at an input signal level of –77.3 dBA, which corresponds to an output swing of 0.193 V. Similarly, (11.5) predicts a 10 mdB gain compression at an input level of –77.2 dBA. These levels compare quite favorably with the values obtained by a numerical simulation and shown in Fig. 11.24. The simulation shows that these effects remain very small up to the large output swing of 1 V RMS, which is close to the reliability limit supported by these devices.

# **11.5 Class-AC Common-Source Stage**

In this section we analyse the common-source stage shown in Fig. 11.25a for use as an RF amplifier. In particular, we are interested in the distortion introduced by the nonlinear $i$–$v$ characteristic of the transistor and in the influence of the choice of the gate bias voltage $V\_G$ on distortion. For simplicity, in this section we neglect the $C\_{gd}$ capacitance. We will consider circuits with some form of local feedback in a later section.

The following is a simple large-signal MOSFET model presented in many textbooks [27, 28]

$$i\_D = \begin{cases} 0 & v\_{GS} - V\_T \le 0 \\ K' \frac{W}{L} \left( v\_{GS} - V\_T - \frac{v\_{DS}}{2} \right) v\_{DS} (1 + \lambda v\_{DS}) & 0 < v\_{DS} \le v\_{GS} - V\_T \\ \frac{K'}{2} \frac{W}{L} (v\_{GS} - V\_T)^2 (1 + \lambda v\_{DS}) & 0 \le v\_{GS} - V\_T \le v\_{DS} \end{cases}$$

**Fig. 11.24** Simulated AM2PM and gain compression of the LPF with a nonlinear capacitor having the characteristic shown in Fig. 11.15a and driven by a tone at 1MHz

**Fig. 11.25 a** Common-source amplifier AC schematic **b** Common-source amplifier small-signal model

The second equation describes the so-called *linear region* of the characteristic. This is the region where the *overdrive voltage* $v\_{GS} - V\_T$ is sufficiently large to cause a conductive surface-charge *channel* in the active area at the surface between the source and the drain of the transistor, and $v\_{DS}$ is sufficiently small that the channel extends all the way from the source to the drain terminal. In this region the transistor behaves essentially as a nonlinear resistor.

The third equation describes the *saturation region* of the characteristic and is the one of interest for implementing amplifiers and most other analogue circuits. In this region $v\_{GS} - V\_T$ is sufficiently large to cause the formation of a conductive channel. However, $v\_{DS}$ is larger than the *saturation voltage*, which means that the channel is present close to the source side of the transistor, but doesn't extend all the way to the drain terminal. In this region the current through the transistor $i\_D$ is almost independent of the drain voltage and the transistor behaves to a good approximation as a voltage-controlled current source with $v\_{GS}$ the control voltage. The parameter $\lambda$ takes into account the fact that the length of the channel does depend on the drain

**Fig. 11.26 a** FinFET input side characteristic. $L = 22$ nm, $n\_{\text{fin}} = 10$, $n\_f = 16$, $m = 1$, $W\_{\text{eff}} = n\_{\text{fin}} n\_f m \cdot 71$ nm. **b** FinFET output side characteristic, with the same $L$, $n\_{\text{fin}}$, $n\_f$, $m$ and $W\_{\text{eff}}$ as in **a**

voltage and makes $i\_D$ a weak function of the drain voltage [28]. In this simple model the saturation voltage is equal to the overdrive voltage $v\_{GS} - V\_T$.

The characteristic of real transistors depends on many effects not captured by this simple model. To enable the design of analogue circuits, very accurate transistor models have been developed and made available in circuit simulators. Unfortunately, most of those models depend on several dozens to hundreds of parameters, making them unsuitable for analytical estimates. Figure 11.26a shows the characteristic of a FinFET with a channel length $L = 22$ nm as given by the CMG-BSIM model [29] with parameters from [30]. It plots $\sqrt{i\_D}$ as a function of $v\_G$ with the source connected to ground and $v\_D$ at a fixed potential of 0.4 V. Between 0.35 and 0.65 V the deviation of the characteristic from the straight line predicted by the above simple model is quite small. Figure 11.26b shows $i\_D$ as a function of $v\_D$ for a fixed gate voltage of 0.5 V. Here as well, the simple model gives a fairly good approximation over an extended range of the characteristic. The figures show the values of $K'$, $V\_T$ and $\lambda$ obtained by fitting the model to the curves.

Using the simple model the current *iD* can be split in two parts

$$i\_D = i\_{D,a} + i\_{D,b}$$

with

$$\begin{aligned} i\_{D,a} &= \frac{K'}{2} \frac{W}{L} (v\_{GS} - V\_T)^2 \\ i\_{D,b} &= g\_o(i\_{D,a})\, v\_{DS} = i\_{D,a} \lambda v\_{DS} \,. \end{aligned}$$

The current $i\_{D,a}$ can be interpreted as the output of an ideal voltage-controlled current source, while the current $i\_{D,b}$ can be interpreted as the current due to a nonlinear load resistance. Since an ideal current source is not affected by its load, from an analysis point of view it is convenient to analyse the two parts separately and combine the effects with the results of Sect. 10.1. For this reason we lump the components to the right of line *A* in Fig. 11.25b into a nonlinear load. In this section we focus on $i\_{D,a}$. A common nonlinear load will be considered in the next section. Similarly, for analysis purposes, the nonlinear $C\_{gs}$ capacitance can be considered part of the driving circuit. In the case of a resistive source we can reuse the results of the previous section with minor modifications. Often, however, the distortion introduced by $C\_{gs}$ is small compared to the one introduced by the $i$–$v$ characteristic. In the following we will simply write $i\_D$ for $i\_{D,a}$.

While the above model can be used to obtain a relatively good approximation of the transconductance *gm* of the transistor, it doesn't provide a good estimate of higher order distortion terms. Therefore, to analyse distortion we approximate the transistor characteristic around the operating point by a third order polynomial

$$i\_d = g\_m v\_{gs} + g\_2 v\_{gs}^2 + g\_3 v\_{gs}^3$$

and extract the coefficients from simulation. Figure 11.27 compares first, second and third order polynomial approximations to the full characteristic at a bias level of $V\_G = 0.5$ V and $V\_D = 0.4$ V. At this bias level a third order polynomial provides a good approximation up to a signal level of about 150 mV. Figure 11.28 shows the three coefficients $g\_m$, $g\_2$ and $g\_3$ as a function of the gate bias voltage $V\_G$ simulated using CMG-BSIM models. While the simple model predicts a vanishing third order coefficient $g\_3$, the figure shows that it disappears only at a single gate bias point. For small gate bias voltages the $g\_3$ coefficient is positive, while for large values it's

**Fig. 11.27** Polynomial approximations of the transistor characteristic around *VG* = 0.5 V. *VD* = 0.4 V, same transistor size as in Fig. 11.26a

**Fig. 11.28** First three coefficients of a polynomial approximation of the transistor characteristic as a function of the gate bias point. *VD* = 0.4 V, same transistor size as in Fig. 11.26a

**Fig. 11.29** Second and third order coefficients of a polynomial approximation of the Class-AC stage characteristic as a function of the gate bias deviation from the nominal gate bias point *VG* for *VD* = 0.4 V

negative. We may try to minimize third order distortion by biasing the transistor at the bias point at which $g\_3$ is zero. However, this strategy doesn't lead to a robust design. In fact, mismatch between the transistor and the bias devices introduces a statistical Gaussian bias error with a typical standard deviation of order [31]

$$\sigma\_{V\_T} \approx \frac{A\_{V\_T}}{\sqrt{WL}}$$

where *L* and *W* are the length and the width of the active channel respectively and $A\_{V\_T}$ is a technology-dependent matching coefficient of the order of a few mV·µm. A more fruitful approach is to use two transistors connected in parallel, but biased at different bias levels: one at the minimum of $g\_3$ and the second at its maximum. The relative size of the two transistors is chosen in such a way as to make the sum of the $g\_3$s cancel. In this way the deviation of the bias point of each transistor due to mismatch has a smaller impact on the value of $g\_3$. The resulting effective $g\_3$ of the transistor pair, a so-called Class-AC stage, is shown in Fig. 11.29.
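The sizing of the pair can be sketched numerically; the $g\_3$ values below are hypothetical placeholders (not read from Fig. 11.28), only the cancellation mechanics matters:

```python
# Hypothetical third order coefficients of the two devices at their
# respective bias points (placeholder values, not from Fig. 11.28).
g3_A = -110e-3   # Class-A device, biased where g3 < 0 (A/V^2)
g3_C = +55e-3    # Class-C device, biased where g3 > 0 (A/V^2)

# Scale the Class-C device so the summed g3 of the pair cancels:
# g3_A + w * g3_C = 0.
w = -g3_A / g3_C
print(f"relative Class-C size: {w:.1f}")             # 2.0
print(f"residual g3: {g3_A + w * g3_C:.1e} A/V^2")   # ~0
```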

The IIP3 of the stage can be estimated from (11.11). For a single transistor biased at $V\_G = 0.46$ V we read from Fig. 11.28 $g\_m \approx 15$ mS and $g\_3 \approx -110$ mA/V², giving an IIP3 of ca. –10.4 dBV. From Fig. 11.29 we see that a Class-AC stage reduces $g\_3$ by ca. a factor of 10, while leaving $g\_m$ almost unchanged. From this data we estimate that the IIP3 should be ca. 10 dB higher, or approximately –0.4 dBV. The results obtained by numerical simulation with the full transistor models are shown in Fig. 11.30. To suppress the effect of the nonlinear output conductance $g\_o$, the drain was held at 0.4 V using an ideal voltage source. The circuit was driven by a voltage source with a resistance of 50 Ω generating two tones of equal amplitude at $f\_1 = 1.01$ GHz and $f\_2 = 1.02$ GHz. Note that the simulation does include the effect of a slightly

**Fig. 11.30** Simulated IM3 of a Class-AC stage compared to the one of a simple common-source stage consisting of the Class-A device only. Class-A device: $L = 22$ nm, $n\_{\text{fin}} = 10$, $n\_f = 16$, $m = 1$, biased at $V\_G = 0.453$ V. Class-C device: $L = 22$ nm, $n\_{\text{fin}} = 10$, $n\_f = 8$, $m = 1$, biased at $V\_G = 0.342$ V. $V\_D = 0.4$ V. $|v\_t|$ is the magnitude of each of the two input tones

nonlinear $C\_{gs}$ as well as the one of $C\_{gd}$. The results are in good agreement with our estimates up to a level of about –25 dBV (≈ 80 mV) per tone, which translates into a peak input voltage of 160 mV. This is in line with expectations, as beyond this level the third order approximation of the characteristic starts to break down, as noted earlier.
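The –10.4 dBV figure can be reproduced if (11.11) reduces to the standard memoryless two-tone relation $A\_{\text{IIP3}} = \sqrt{(4/3)\,|g\_m/g\_3|}$ for the peak tone amplitude, with dBV levels referred to RMS; this is an assumption, since (11.11) itself is not reproduced here:

```python
import math

g_m = 15e-3    # transconductance read from Fig. 11.28 (S)
g_3 = -110e-3  # third order coefficient (A/V^2)

# Peak two-tone IIP3 amplitude of a memoryless cubic nonlinearity
# (assumed form of (11.11)): A = sqrt((4/3)|g_m/g_3|).
a_peak = math.sqrt(4.0 / 3.0 * abs(g_m / g_3))

# Quote the level in dBV, RMS referred (assumption).
iip3_dbv = 20 * math.log10(a_peak / math.sqrt(2))
print(f"single device IIP3 ~ {iip3_dbv:.1f} dBV")   # -10.4 dBV

# A Class-AC stage reducing g_3 tenfold at constant g_m gains 10 dB.
print(f"Class-AC IIP3 ~ {iip3_dbv + 10:.1f} dBV")   # -0.4 dBV
```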

The Class-AC stage reduces *g*3, but doesn't reduce *g*2. Therefore, if the second order transfer function of the preceding or following stage is also large, then the combined system will still produce third order distortion. If we call the first subsystem G and the second one H the combined third order impulse response is in fact (see Table 10.1)

$$(h \diamond \mathbf{g})\_3 = h\_1 \ast \mathbf{g}\_3 + 2 \, h\_2 \ast \left[ \mathbf{g}\_1 \otimes \mathbf{g}\_2 \right]\_{\text{sym}} + h\_3 \ast \mathbf{g}\_1^{\otimes 3}$$

which doesn't disappear even if $g\_3$ and $h\_3$ are both zero. One approach to reduce $g\_2$ (on top of $g\_3$) is to use a complementary structure comprising an nMOS Class-AC stage and a pMOS one, as sketched in Fig. 11.31. Here we use common-gate stages (see the next section) to reduce the effects of $C\_{gd}$ and combine the currents through a transformer. For good results one needs a large coupling between the primary and the secondary of the transformer. In a monolithic implementation this is best achieved using equal coils stacked one on top of the other. We can also directly connect the drains of the two stages. In this case the bias currents of the two stages must coincide and a means of controlling the DC drain voltage is necessary.

**Fig. 11.31** Complementary Class-AC stage suitable for RF applications

# **11.6 Common-Gate Stage**

In this section we investigate the linearity properties of the common-gate stage shown in Fig. 11.32a. We first consider the case in which the stage is driven by a source with internal resistance *Rs* and then specialise to the case in which the stage is used to form a *Cascode*. A basic variant of the Cascode stage suitable for use at RF frequencies is the combination of a common-source stage followed by a common-gate one. The combination of the two stages behaves as an improved common-source stage with much reduced *Cgd* and output conductance *go* [27]. In this section we will show that, under suitable conditions that we will work out, the addition of a common-gate stage does not degrade distortion either. Due to these very desirable benefits the Cascode stage is a widely used configuration.

**Fig. 11.32 a** Common-gate stage AC schematic **b** Common-gate stage Small-signal model

Consider the small-signal model shown in Fig. 11.32b. The input voltage v*<sup>i</sup>* corresponds to the source voltage. The input current is the current entering into the source terminal. The part of the input current that doesn't flow through *Csg* is labeled *ic* and represents the current that flows through the transistor active channel to the drain. The current leaving the drain terminal must therefore have the same value. This is represented by the output side current-controlled current source with unit gain. For simplicity, we neglect the distortion introduced by the nonlinear capacitance *Csg* as well as the one introduced by the drain capacitance that in the figure was lumped together with the load *ZL* . As before we characterise the linearity of the circuit by calculating the nonlinear terms present in the output current *ic*.

# *11.6.1 Nonlinear Transfer Functions*

According to the model presented in Sect. 11.5 (with $\lambda = 0$), the static characteristic of the transistor in saturation is given by

$$i\_D = \frac{\beta}{2} v\_{OD}^2$$

with $v\_{OD} = v\_{GS} - V\_T$ the overdrive voltage and $\beta = K'W/L$. In the present situation it is more convenient to express the input voltage as a function of the current, which is easily achieved by inverting the equation. If we further separate the DC bias terms from the small signal quantities we obtain

$$v\_{gs} = \sqrt{\frac{2(I\_D + i\_d)}{\beta}} - V\_{OD}$$

which we approximate by a third order Taylor polynomial around the operating point

$$v\_{gs} \approx V\_{OD} \left[ \frac{1}{2} \frac{i\_d}{I\_D} - \frac{1}{8} \left( \frac{i\_d}{I\_D} \right)^2 + \frac{1}{16} \left( \frac{i\_d}{I\_D} \right)^3 \right].$$

Using the relations $v\_i = -v\_{gs}$ and $i\_c = -i\_d$ we find that the input characteristic corresponds to the one of a nonlinear resistor

$$v\_i = r\_1 i\_c + r\_2 i\_c^2 + r\_3 i\_c^3 \tag{11.37}$$

with

$$r\_1 = \frac{1}{g\_m} = \frac{V\_{OD}}{2I\_D}, \qquad r\_2 = \frac{V\_{OD}}{8I\_D^2}, \qquad r\_3 = \frac{V\_{OD}}{16I\_D^3}.\tag{11.38}$$

Note that while the original expression giving $i\_D$ as a function of $v\_{OD}$ doesn't include any third order term, the inverted expression gives a well-defined term of third order.

As a result, the latter is less sensitive to modeling inaccuracies than the former. We will therefore use the above values as estimates for $r\_i$, $i = 1, \dots, 3$.
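The inverted coefficients (11.38) can be cross-checked numerically against the exact square-law inverse; the bias values below are purely illustrative:

```python
import math

# Illustrative bias values (hypothetical, not from the text).
beta = 2e-3                        # K'W/L in A/V^2
I_D = 1e-4                         # bias current in A
V_OD = math.sqrt(2 * I_D / beta)   # overdrive voltage, ~0.316 V

# Coefficients (11.38) of v_i = r1*i_c + r2*i_c^2 + r3*i_c^3.
r1 = V_OD / (2 * I_D)
r2 = V_OD / (8 * I_D**2)
r3 = V_OD / (16 * I_D**3)

# Exact inverted characteristic: v_i = -v_gs with i_c = -i_d.
def v_i_exact(i_c):
    return -(math.sqrt(2 * (I_D - i_c) / beta) - V_OD)

# The cubic model should track the exact curve closely for i_c << I_D.
for i_c in (1e-6, 5e-6, 1e-5):
    model = r1 * i_c + r2 * i_c**2 + r3 * i_c**3
    assert abs(model - v_i_exact(i_c)) < 1e-3 * v_i_exact(i_c)
```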

Using the above third order polynomial to model the source-gate nonlinear characteristic, we obtain the equivalent circuit shown in Fig. 11.33, with $V\_2$ and $V\_3$ the second and third order equivalent nonlinear sources as given in Table 11.11.

The first order transfer function is calculated by discarding the contribution of all sources of order different from one. This amounts to calculating the contribution due to the input source and using a Dirac impulse as input signal. Working in the Laplace domain, Kirchhoff's voltage law gives

$$R\_s(I\_{c,1} + sC\_{sg}r\_1I\_{c,1}) + r\_1I\_{c,1} = 1\ .$$

Solving for the first order component of *Ic* we find

$$H\_{c,1}(s) = H\_1(s) = \frac{1}{R\_s + r\_1 + sC\_{\rm sg}r\_1R\_s} = \frac{1}{R\_s + r\_1} \frac{1}{1 + \frac{s}{\omega\_0}}$$

with $\omega\_0 = (R\_s + r\_1)/(C\_{\rm sg}r\_1R\_s)$.

With $I\_{c,1}$ and referring to Table 11.11, we can now compute the equivalent source of second order $V\_2 = r\_2 I\_{c,1}(s\_1)I\_{c,1}(s\_2)$. The second order transfer function is the response to this source, which is easily calculated to be

$$H\_{c,2}(s\_1, s\_2) = H\_2(s\_1, s\_2) = -\frac{1 + (s\_1 + s\_2)C\_{\rm sg}R\_s}{R\_s + r\_1 + (s\_1 + s\_2)C\_{\rm sg}r\_1R\_s} r\_2 I\_{c,1}(s\_1)I\_{c,1}(s\_2)$$

or, expressed in terms of *H*<sup>1</sup>

$$H\_2(s\_1, s\_2) = -r\_2\left[1 + (s\_1 + s\_2)C\_{\rm sg}R\_s\right]H\_1(s\_1 + s\_2)H\_1(s\_1)H\_1(s\_2)\,.$$

With $I\_{c,2}$ we can compute the equivalent source of third order

$$V\_3 = 2r\_2 \left[ I\_{c,1}(s\_1) I\_{c,2}(s\_2, s\_3) \right]\_{\text{sym}} + r\_3 I\_{c,1}(s\_1) I\_{c,1}(s\_2) I\_{c,1}(s\_3)\,.$$

The third order transfer function is the response to *V*<sup>3</sup> which is calculated in a similar way as *H*<sup>2</sup>

$$H\_3(s\_1, s\_2, s\_3) = -\left[1 + (s\_1 + s\_2 + s\_3)C\_{\rm sg}R\_s\right]H\_1(s\_1 + s\_2 + s\_3)\,V\_3(s\_1, s\_2, s\_3)\,.$$

# *11.6.2 Cascode*

We now specialise to the case of a Cascode. Since the transfer functions are found by analysing a sequence of linear networks, we can use the Thévenin-Norton theorem [32] to transform the source into the parallel connection of an ideal current source and the internal resistor $R\_s$, as shown in Fig. 11.34. The resistor $R\_s$ now corresponds to the reciprocal of the output conductance $g\_o$ of the driving common-source stage. The latter is usually much larger than $r\_1$, so it has little effect on the operation of the circuit. For this reason, and to obtain easier-to-interpret expressions, we calculate the transfer functions in the limit as $R\_s$ tends to infinity. Under this assumption, and using the results of the previous section, the first, second and third order transfer functions from the ideal source $I\_s$ to the output current $I\_c$ are

$$H\_{c1}(s) := \lim\_{R\_s \to \infty} H\_1(s)R\_s = \frac{1}{1 + \frac{s}{\omega\_0}}\,,\tag{11.39}$$

$$\begin{split} H\_{c2}(s\_1, s\_2) &:= \lim\_{R\_s \to \infty} H\_2(s\_1, s\_2) R\_s^2 \\ &= -r\_2(s\_1 + s\_2) C\_{\rm sg} H\_{c1}(s\_1) H\_{c1}(s\_2) H\_{c1}(s\_1 + s\_2) \end{split} \tag{11.40}$$

and

$$\begin{split} H\_{c3}(s\_1, s\_2, s\_3) &:= \lim\_{R\_s \to \infty} H\_3(s\_1, s\_2, s\_3) R\_s^3 \\ &= (s\_1 + s\_2 + s\_3) C\_{\rm sg} \left\{ 2r\_2^2 \left[ H\_{c1}(s\_1 + s\_2)(s\_1 + s\_2) C\_{\rm sg} \right]\_{\rm sym} - r\_3 \right\} \\ &\quad \cdot H\_{c1}(s\_1) H\_{c1}(s\_2) H\_{c1}(s\_3) H\_{c1}(s\_1 + s\_2 + s\_3) \end{split} \tag{11.41}$$

**Fig. 11.34** Equivalent circuit for the calculation of the nonlinear transfer functions of the Cascode circuit

respectively, where now $\omega\_0 = 1/(C\_{\rm sg} r\_1)$. Note that the symmetrization in $H\_{c3}$ is taken over all three Laplace variables $s\_1$, $s\_2$ and $s\_3$

$$\begin{aligned} \left[H\_{c1}(s\_1+s\_2)(s\_1+s\_2)C\_{sg}\right]\_{\text{sym}} = \frac{1}{3} \Big\{ &H\_{c1}(s\_1+s\_2)(s\_1+s\_2)C\_{sg} \\ + \, &H\_{c1}(s\_1+s\_3)(s\_1+s\_3)C\_{sg} \\ + \, &H\_{c1}(s\_2+s\_3)(s\_2+s\_3)C\_{sg} \Big\} . \end{aligned}$$

Consider now the classic two-tone third order intermodulation test with one tone at $\omega\_1$ and the second one at $\omega\_2 = \omega\_1 + \Delta\omega$. In particular, consider the IM3 tone characterised by $m = (1, 0, 2, 0)$. Assuming $|\Delta\omega| \ll |\omega\_1|$, the above symmetrised expression can be approximated by

$$\left[H\_{c1}(s\_1+s\_2)(s\_1+s\_2)C\_{sg}\right]\_{\text{sym}} \approx \frac{2}{3}\, \frac{j\omega\_1 C\_{sg}}{1 + j\frac{2\omega\_1}{\omega\_0}}$$

and, with it, the third order transfer function by

$$\begin{split} &H\_{c3}(j\omega\_1, j\omega\_1, -j\omega\_2) \\ &\approx j\omega\_1 C\_{\rm sg} \left\{ \frac{4}{3} r\_2^2\, j\omega\_1 C\_{\rm sg} H\_{c1}(2j\omega\_1) - r\_3 \right\} H\_{c1}(j\omega\_1) H\_{c1}(j\omega\_1) H\_{c1}(-j\omega\_2) H\_{c1}(j\omega\_1) \,. \end{split}$$

If $\omega\_1 \le \omega\_0/5$, the value of $|H\_{c1}(j\omega\_1)|$ can be approximated by 1 with an error of less than 2%, and the magnitude of $H\_{c3}$ becomes very nearly

$$\omega\_1 C\_{\rm sg} \left| \frac{4}{3} r\_2^2\, j\omega\_1 C\_{\rm sg} H\_{c1}(2j\omega\_1) - r\_3 \right| .$$

Using (11.38) for the coefficients of the nonlinear characteristic of the transistor we thus obtain

$$|H\_{c3}(j\omega\_1, j\omega\_1, -j\omega\_2)| \approx \frac{1}{8I\_D^2} \frac{\omega\_1 \mathcal{C}\_{\rm sg}}{g\_m} \left| \frac{2}{3} \frac{j\omega\_1 \mathcal{C}\_{\rm sg}}{g\_m} H\_{c1}(2j\omega\_1) - 1 \right|. \tag{11.42}$$

The magnitude of the IM3 tone normalised to the DC current *ID* is therefore

$$\left|\frac{I\_{c3,m}}{I\_D}\right| \approx \frac{3}{32} \frac{\omega\_1 C\_{sg}}{g\_m} \left|\frac{2}{3} \frac{j\omega\_1 C\_{sg}}{g\_m} H\_{c1}(2j\omega\_1) - 1\right| \left|\frac{I\_s}{I\_D}\right|^3.$$

From this expression we can read off several interesting aspects. First, both the second- and the third-order nonlinearities of the transistor characteristic contribute to the IM3 tone. This is visible from the appearance of *r*<sub>3</sub> as well as *r*<sub>2</sub> in the expression for *H*<sub>c3</sub>. The contribution of second-order nonlinearities to a third-order intermodulation product is due to the presence of (local) feedback. This can be appreciated graphically by looking at Fig. 11.34. The second-order source *V*<sub>2</sub> creates a current that circulates again through the input of the circuit. Therefore, the generated second-order tones pass again through the second-order distortion, where they can mix with the input tones to produce frequency mixes of third order.

The contribution to the IM3 tone from second-order distortion is approximately orthogonal (in phase) to the one from third-order distortion. Therefore, it is not possible to size the transistor in such a way as to make the two cancel each other, not even at a specific frequency.

The IM3 is largely dominated by *r*<sub>3</sub> up to very high frequencies, and for ω<sub>1</sub> up to ca. ω<sub>0</sub>/10 it is proportional to ω<sub>1</sub>. The quantity *g<sub>m</sub>*/*C<sub>sg</sub>* corresponds (neglecting *C<sub>gd</sub>*) to the angular frequency at which a common-source stage has unity current gain. It is called the *transit frequency* and denoted by

$$\omega\_T = \frac{g\_m}{C\_{sg}}\,. \tag{11.43}$$

It is one of the key parameters used to characterise the high-frequency capabilities of transistors. With it the magnitude of the IM3 for ω<sub>1</sub> up to ca. ω<sub>T</sub>/10 can be approximated by

$$\left|\frac{I\_{c3,m}}{I\_D}\right| \approx \frac{3}{32} \frac{\omega\_1}{\omega\_T} \left|\frac{I\_s}{I\_D}\right|^3.$$

This shows that for low distortion one needs fast transistors. Looking again at Fig. 11.34 we can appreciate that in the limit as ω<sub>1</sub>/ω<sub>T</sub> tends to zero (which means that *C<sub>sg</sub>* tends to zero) the nonlinear sources become floating and cannot generate any frequency-mix current (remember that we also assume *R<sub>s</sub>* → ∞).
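The low-frequency behaviour described above can be checked numerically. The sketch below evaluates the normalised IM3 magnitude of Eq. (11.42); the device values (*g<sub>m</sub>* = 10 mS, *C<sub>sg</sub>* = 100 fF) are illustrative placeholders, and *H<sub>c1</sub>* is modelled as a one-pole response with ω<sub>0</sub> = ω<sub>T</sub>, an assumption made here only to obtain a concrete closed form.

```python
import numpy as np

# Illustrative device values (not from the text).
gm, Csg = 10e-3, 100e-15
wT = gm / Csg                       # transit frequency, Eq. (11.43)

def im3_norm(w1, Is_over_ID=0.1):
    """Normalised IM3 magnitude |I_c3,m / I_D| following Eq. (11.42)."""
    w0 = wT                                        # assumed pole location
    Hc1_2w1 = 1.0 / (1.0 + 2j * w1 / w0)           # Hc1(2j*w1), one-pole model
    return (3/32) * (w1 * Csg / gm) \
           * abs((2/3) * 1j * w1 * Csg / gm * Hc1_2w1 - 1) * Is_over_ID**3

# Well below wT the bracket is ~1 and the IM3 grows linearly with w1/wT:
for f in (1e8, 2e8, 4e8):
    print(f"f1 = {f:.0e} Hz -> |Ic3/ID| = {im3_norm(2*np.pi*f):.3e}")
```

Doubling ω<sub>1</sub> in this regime doubles the IM3, as the ω<sub>1</sub>/ω<sub>T</sub> proportionality predicts.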

In general, distortion introduced by the input (common-source) stage of the Cascode configuration generates second-order frequency mixes. These can mix with the fundamental tones in the second-order distortion of the output (common-gate) stage to produce other IM3 components. However, since |*H*<sub>c2</sub>(j2ω<sub>1</sub>, −jω<sub>2</sub>)| is also proportional to ω<sub>1</sub>/ω<sub>T</sub>, this does not substantially change the situation.

For simplicity in our discussion we assumed *Rs* → ∞. From the gained insight we can appreciate that at low frequencies it is a finite value of *Rs* which will limit IM3 and, the lower *Rs*, the higher the IM3. In general however, the common-gate stage of a Cascode is not the stage limiting low frequency linearity.

# **11.7 Degenerated Common-Source Stage**

In this section we investigate the effect of local feedback on distortion and show that introducing feedback may lead to degraded linearity. As a concrete example we analyse the degenerated common-source amplifier depicted in Fig. 11.35a. The impedance *Z<sub>e</sub>* is called the *degeneration impedance*. Its presence reduces the gate-source voltage across the transistor by an amount proportional to the output current. In

**Fig. 11.35 a** Degenerated common-source stage AC schematic **b** Degenerated common-source stage small-signal model

other words it introduces feedback around the transistor. The impedance *Zs* represents a generic driving impedance.

One way to analyse the circuit is to model the transistor as a nonlinear voltage-controlled current source characterised by a third (or higher) order polynomial

$$i\_d = \mathbf{g}\_m \boldsymbol{v}\_{\mathbf{g}s} + \mathbf{g}\_2 \boldsymbol{v}\_{\mathbf{g}s}^2 + \mathbf{g}\_3 \boldsymbol{v}\_{\mathbf{g}s}^3$$

and solve Kirchhoff's equations for v<sub>gs</sub>. Having found the voltage components v<sub>gs,1</sub>, ..., v<sub>gs,k</sub> up to some order of interest *k*, one then finds the output current components *i*<sub>o,1</sub>, ..., *i*<sub>o,k</sub> by use of the polynomial approximating the transistor characteristic. Instead of using this method we show how the use of a *nullor* allows the problem to be solved in a more direct way, by permitting an equation for the output current *i<sub>o</sub>* to be obtained directly.

Nullators and Norators are pathological network elements. A *Nullator* is a two-terminal element represented by the symbol shown in Fig. 11.36a and characterised by the *two* equations

$$V = 0\,, \qquad I = 0\,.$$

A *Norator* is a two-terminal element represented by the symbol shown in Fig. 11.36b whose current and voltage are arbitrary and completely determined by the surrounding network. In other words, it is characterised by *zero* equations. For a linear network to have a well-defined solution a Nullator must therefore always appear alongside a Norator. Such a pair is called a *Nullor* and can be used to model several elements such as controlled sources, OpAmps and transistors. In particular, we can use it to represent the inverted series (see Sect. 11.6.2)

$$v\_{gs} = r\_1 i\_d + r\_2 i\_d^2 + r\_3 i\_d^3$$

of the transistor characteristic. A nullor based small-signal model of the degenerated common-source stage using this transistor characteristic representation is shown in Fig. 11.37. Note that the transistor characteristic is represented by a nonlinear resistor.

**Fig. 11.36 a** Symbol of the Nullator, reminiscent of the digit 0 **b** Symbol of a Norator, reminiscent of the infinity symbol ∞

# *11.7.1 Nonlinear Transfer Functions*

From the model in Fig. 11.37 and Kirchhoff's laws we obtain the following system of equations relating the output current *Io* to the input signal *Vs*

$$\begin{aligned} V\_s &= \left(Z\_s + \frac{1}{sC\_{gs}}\right)I\_s + Z\_e(I\_s + I\_o)\,, \\ V\_{gs} &= \frac{1}{sC\_{gs}}I\_s\,, \\ V\_{gs} &= r\_1 I\_o + r\_2 I\_o^2 + r\_3 I\_o^3\,. \end{aligned}$$

After eliminating *Vgs* and *Is* we obtain the single equation

$$\begin{split} V\_s &= \left[ r\_1 + Z\_e + (Z\_s + Z\_e) r\_1 C\_{gs} s \right] I\_o \\ &+ \left[ 1 + (Z\_s + Z\_e) C\_{gs} s \right] (r\_2 I\_o^2 + r\_3 I\_o^3)\,. \end{split} \tag{11.44}$$

The first order transfer function is obtained by applying a Dirac impulse as input and discarding all terms of order higher than one in the equation. This is equivalent

**Fig. 11.37** Nullor based small-signal model of a degenerated common-source stage

to removing the nonlinear sources from the equivalent circuit. Using the relation *r*<sup>1</sup> = 1/*gm* we obtain

$$H\_1(s) = \frac{g\_m}{L(s)} = \frac{g\_m}{1 + g\_m Z\_e + sC\_{gs}(Z\_e + Z\_s)}\,. \tag{11.45}$$

To compute the second order nonlinear transfer function we first insert the first order solution into the nonlinear terms and retain only second order ones. Alternatively we can use Fig. 11.11 to read the value of the second order nonlinear source for a nonlinear resistor. In both cases, after adjusting the representation of the differential operator by replacing the variable *s* by *s*<sup>1</sup> + *s*2, we obtain

$$\begin{aligned} 0 &= \left[r\_1 + Z\_e + (Z\_s + Z\_e)r\_1 C\_{gs}(s\_1 + s\_2)\right] H\_2(s\_1, s\_2) \\ &+ \left[1 + (Z\_s + Z\_e)C\_{gs}(s\_1 + s\_2)\right] r\_2 H\_1(s\_1) H\_1(s\_2)\,. \end{aligned}$$

Note that for brevity we did not explicitly write the arguments of the impedances; they are of course to be evaluated at *s*<sub>1</sub> + *s*<sub>2</sub>. The second order nonlinear transfer function is thus

$$H\_2(s\_1, s\_2) = -r\_2 H\_1(s\_1) H\_1(s\_2) H\_1(s\_1 + s\_2) \left[ 1 + (Z\_s + Z\_e) C\_{gs}(s\_1 + s\_2) \right]. \tag{11.46}$$

To find the third order nonlinear transfer function we proceed in a similar way and obtain

$$\begin{split} H\_3(s\_1, s\_2, s\_3) = -\left\{ 2r\_2 \left[ H\_1(s\_1)H\_2(s\_2, s\_3) \right]\_{\text{sym}} + r\_3 H\_1(s\_1)H\_1(s\_2)H\_1(s\_3) \right\} \\ \times H\_1(s\_1 + s\_2 + s\_3) \left[ 1 + (Z\_s + Z\_e)C\_{gs}(s\_1 + s\_2 + s\_3) \right]. \end{split} \tag{11.47}$$
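Equations (11.45)–(11.47) compose mechanically and can be turned into a small numerical sketch. The component values and inverse-series coefficients below are illustrative placeholders, not values from the text; the point is only to show how the transfer functions nest and to check two simple DC limits.

```python
import numpy as np

# Illustrative small-signal values and toy inverse-series coefficients.
gm, Cgs = 5e-3, 50e-15
r1, r2, r3 = 1/gm, -2.0/gm**3 * 1e-3, 4.0/gm**3 * 1e-6

def N(s, Zs, Ze):
    return 1 + (Zs(s) + Ze(s)) * s * Cgs

def H1(s, Zs, Ze):
    # Eq. (11.45)
    return gm / (1 + gm*Ze(s) + s*Cgs*(Ze(s) + Zs(s)))

def H2(s1, s2, Zs, Ze):
    # Eq. (11.46)
    s = s1 + s2
    return -r2 * H1(s1, Zs, Ze) * H1(s2, Zs, Ze) * H1(s, Zs, Ze) * N(s, Zs, Ze)

def H3(s1, s2, s3, Zs, Ze):
    # Eq. (11.47); the symmetrisation averages over the argument assignments
    sym = (H1(s1, Zs, Ze)*H2(s2, s3, Zs, Ze) + H1(s2, Zs, Ze)*H2(s1, s3, Zs, Ze)
           + H1(s3, Zs, Ze)*H2(s1, s2, Zs, Ze)) / 3
    s = s1 + s2 + s3
    return -(2*r2*sym + r3*H1(s1, Zs, Ze)*H1(s2, Zs, Ze)*H1(s3, Zs, Ze)) \
           * H1(s, Zs, Ze) * N(s, Zs, Ze)

Rs, Re = 50.0, 20.0
Zs = lambda s: Rs          # resistive driving impedance
Ze = lambda s: Re          # resistive degeneration
print(abs(H1(2j*np.pi*1e9, Zs, Ze)))
```

At DC, *H*<sub>1</sub>(0) reduces to *g<sub>m</sub>*/(1 + *g<sub>m</sub>R<sub>e</sub>*) and *H*<sub>2</sub>(0, 0) to −*r*<sub>2</sub>*H*<sub>1</sub>(0)³, which makes for quick sanity checks.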

# *11.7.2 Resistive Degeneration*

We now specialise to the case of resistive degeneration *Z<sub>e</sub>* = *R<sub>e</sub>* and a resistive driving impedance *Z<sub>s</sub>* = *R<sub>s</sub>*, and calculate the third-order intermodulation products when the circuit is driven by two tones of equal amplitude at frequencies ω<sub>1</sub> and ω<sub>1</sub> + Δω respectively. As usual we assume Δω ≪ ω<sub>1</sub>.

As a first step, to calculate *H*<sup>3</sup> for the mix (1, 0, 2, 0) we evaluate

$$\begin{aligned} & \left[ H\_1(j\omega\_1) H\_2(j\omega\_1, -j(\omega\_1 + \Delta\omega)) \right]\_{\text{sym}} \\ & \approx -r\_2 \frac{g\_m^4}{3 L(j\omega\_1)^2 L(-j\omega\_1)} \left[ 2 \frac{N(-j\Delta\omega)}{L(-j\Delta\omega)} + \frac{N(2j\omega\_1)}{L(2j\omega\_1)} \right] \end{aligned}$$

with

$$N(s) := 1 + (Z\_s + Z\_e)sC\_{gs}\,.$$

Inserting this expression into *H*<sup>3</sup> we obtain

$$\begin{split} &H\_{3}(j\omega\_{1},j\omega\_{1},-j(\omega\_{1}+\Delta\omega)) \\ &\approx \frac{g\_{m}^{4}N(j\omega\_{1})}{L(j\omega\_{1})^{3}L(-j\omega\_{1})} \left\{\frac{2}{3}r\_{2}^{2}g\_{m}\left[2\frac{N(-j\Delta\omega)}{L(-j\Delta\omega)}+\frac{N(2j\omega\_{1})}{L(2j\omega\_{1})}\right]-r\_{3}\right\}. \end{split}$$

Since in Sect. 11.5 we characterised the transistor in terms of *g<sub>m</sub>*, *g*<sub>2</sub> and *g*<sub>3</sub>, we express the coefficients *r*<sub>2</sub> and *r*<sub>3</sub> in terms of them using the results of Sect. 11.2.1

$$r\_2 = -\frac{g\_2}{g\_m^3}\,, \qquad r\_3 = \frac{2g\_2^2 - g\_m g\_3}{g\_m^5}\,.$$

Substituting these expressions leads finally to

$$\begin{split} H\_3(j\,\omega\_1, j\,\omega\_1, -j\,(\omega\_1 + \Delta\omega)) \\ \approx & \frac{N(j\,\omega\_1)}{L(j\,\omega\_1)^3 L(-j\,\omega\_1)} \Bigg\{ 2\frac{g\_2^2}{g\_m} \Bigg[ \frac{2}{3} \frac{N(-j\,\Delta\omega)}{L(-j\,\Delta\omega)} + \frac{1}{3} \frac{N(2\omega\_1)}{L(2\omega\_1)} - 1 \Bigg] + g\_3 \Bigg\}. \end{split} \tag{11.48}$$

We can now discuss the effect on linearity of a small amount of feedback introduced by a small resistor *R<sub>e</sub>*. First note that, as expected, for *Z<sub>e</sub>* = 0 the term in square brackets vanishes, making the IM3 depend only on *g*<sub>3</sub>. As *R<sub>e</sub>* is increased the contribution of *g*<sub>2</sub> increases, and at low to moderate frequencies there is some possibility of cancellation between the contributions due to *g*<sub>3</sub> and *g*<sub>2</sub>. As *R<sub>e</sub>* increases beyond the cancellation point, the second-order contribution starts to dominate. At high frequencies only imperfect cancellation is possible due to the shift in phase of the *g*<sub>2</sub> contribution.

Figure 11.38b shows the low to moderate frequency IIP3 of the Class-AC stage from Sect. 11.5. It shows that cancelling occurs for very small amounts of feedback and, as is typical for cancelling effects, the performance is very sensitive to small variations in component values. Due to the large value of *g*2, a small to moderate amount of feedback with *gm Re* in the range of 0.03–2.5 leads to an actual degradation in IIP3. Note that small values of *Ze* may be introduced unintentionally by parasitic effects due to the interconnections between components.
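The cancellation point can be located numerically. At low frequencies, where N/L → 1/(1 + *g<sub>m</sub>R<sub>e</sub>*), the braced factor of Eq. (11.48) reduces to −2(*g*<sub>2</sub>²/*g<sub>m</sub>*)·*g<sub>m</sub>R<sub>e</sub>*/(1 + *g<sub>m</sub>R<sub>e</sub>*) + *g*<sub>3</sub>. The sketch below sweeps *R<sub>e</sub>* with illustrative coefficients (not the Class-AC values behind Fig. 11.38) and compares the numerical minimum with the analytic cancellation point.

```python
import numpy as np

# Illustrative transistor coefficients (not from the text).
gm, g2, g3 = 5e-3, 4e-3, 2e-3

def third_order_factor(Re):
    # low-frequency limit of the braced factor in Eq. (11.48)
    return -2 * (g2**2 / gm) * (gm * Re) / (1 + gm * Re) + g3

Re = np.linspace(0.0, 2000.0, 200001)       # sweep with 0.01-ohm steps
Re_min = Re[np.argmin(np.abs(third_order_factor(Re)))]

# Analytic cancellation: 2*g2**2*Re/(1 + gm*Re) = g3
Re_star = g3 / (2 * g2**2 - gm * g3)
print(Re_min, Re_star)
```

With these toy numbers the cancellation falls at *g<sub>m</sub>R<sub>e</sub>* ≈ 0.45, inside the 0.03–2.5 range where the text notes that feedback actually degrades the IIP3.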

A linearity improvement can be obtained by using a large amount of feedback, *g<sub>m</sub>R<sub>e</sub>* ≫ 1. To simplify calculations, let's assume ω<sub>1</sub>*C<sub>gs</sub>*(*R<sub>s</sub>* + *R<sub>e</sub>*) ≪ 1; then

$$H\_3(j\omega\_1, j\omega\_1, -j(\omega\_1 + \Delta\omega)) \approx \frac{1}{L(j\omega\_1)^3 L(-j\omega\_1)} \left\{-2\frac{\text{g}\_2^2}{\text{g}\_m} + \text{g}\_3\right\}$$

and

$$H\_1(j\omega\_1) \approx \frac{1}{R\_e}\,.$$

Using (11.11) to calculate the IIP3 shows that under these conditions the latter does in fact increase with increasing *Re*

$$\text{IIP3} \approx \sqrt{\frac{4(g\_m R\_e)^3}{3\left|\frac{g\_3}{g\_m} - 2\left(\frac{g\_2}{g\_m}\right)^2\right|}}\,, \qquad g\_m R\_e \gg 1\,. \tag{11.49}$$

The reason for the improvement is a substantially reduced amplitude of the voltage *V<sub>gs</sub>* controlling the nonlinear sources compared to the circuit input signal *V<sub>s</sub>*. Linearity thus comes at the expense of a much reduced signal transconductance, which for RF circuits is often not acceptable.
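Equation (11.49) is easy to evaluate numerically. The sketch below (illustrative coefficients, not values from the text) confirms the (*g<sub>m</sub>R<sub>e</sub>*)<sup>3/2</sup> scaling: quadrupling *R<sub>e</sub>* multiplies the IIP3 by 8.

```python
import numpy as np

# Eq. (11.49), valid for gm*Re >> 1; illustrative device coefficients.
gm, g2, g3 = 5e-3, 4e-3, 2e-3

def iip3(Re):
    return np.sqrt(4 * (gm * Re)**3 / (3 * abs(g3/gm - 2*(g2/gm)**2)))

# IIP3 scales as Re**1.5, so a 4x increase in Re gives a factor of 8:
print(iip3(1000.0) / iip3(250.0))
```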

# *11.7.3 Inductive Degeneration*

A second type of degeneration widely used at RF frequencies is the inductive one. This type of degeneration is often used in the input stage of low-noise RF amplifiers (LNAs), a basic small-signal model of which is shown in Fig. 11.39.

An important characteristic of RF amplifiers is the input impedance *Zi* . In many situations it is required to be real and equal to the source impedance *Rs*, or some standard value. From our model a simple calculation shows that *Zi* is given by

$$Z\_i = R\_i + jX\_i = g\_m \frac{L\_e}{C\_{gs}} + s(L\_s + L\_e) + \frac{1}{sC\_{gs}}\,.$$

A degeneration inductor *Le* thus allows a real part to be introduced to the input impedance without using resistors. Avoiding resistors at the input of LNAs is necessary to avoid limiting the achievable sensitivity. The reactive part of the impedance can be cancelled over some frequency band by resonating it, in our example using the inductor *Ls*. The input network thus consists of a series resonator tuned at the center frequency of the band of interest.

In this section we analyse the linearity characteristics of this stage and in particular its IP3. The nonlinear transfer functions are readily obtained from our previous results by setting *Ze* = *s Le* and *Zs* = *Rs* + *s Ls*. Doing so, the first order transfer function becomes

$$H\_{\rm l}(s) = \frac{g\_m}{1 + s(g\_m L\_e + C\_{gs} R\_s) + s^2 C\_{gs} (L\_e + L\_s)} \,.$$

Note that *gm Le* = *Cgs Ri* . We can therefore write the denominator in the standard form

$$H\_1(s) = \frac{g\_m}{1 + \frac{s}{\omega\_0}\frac{1}{q\_t} + \left(\frac{s}{\omega\_0}\right)^2} \tag{11.50}$$

with

$$\begin{aligned} \omega\_0^2 &= \frac{1}{C\_{gs}(L\_e + L\_s)}\,, & q\_i &= \frac{1}{R\_i \omega\_0 C\_{gs}}\,,\\ \frac{1}{q\_t} &= \frac{1}{q\_i} + \frac{1}{q\_s}\,, & q\_s &= \frac{1}{R\_s \omega\_0 C\_{gs}}\,. \end{aligned}$$

The same parameters can also be used to put *N*(*s*) in standard form

$$N(s) = 1 + \frac{s}{\omega\_0} \frac{1}{q\_s} + \left(\frac{s}{\omega\_0}\right)^2\,.$$

The value of *H*<sub>3</sub> relevant for the two-tone IP3 test can then be obtained by substituting these expressions in (11.48). The resonance frequency of the input resonator is evidently set to the frequency of the input signal, ω<sub>0</sub> = ω<sub>1</sub>, so that

$$N(j\omega\_1) = \frac{j}{q\_s}\,, \qquad L(j\omega\_1) = \frac{j}{q\_t}\,, \qquad H\_1(j\omega\_1) = -j q\_t g\_m\,,$$

and

$$H\_3(j\omega\_1, j\omega\_1, -j(\omega\_1 + \Delta\omega)) \approx \frac{-jq\_t^4}{q\_s} \left\{ 2\frac{g\_2^2}{g\_m} \left[ \frac{2}{3} + \frac{1}{3} \frac{N(2j\omega\_1)}{L(2j\omega\_1)} - 1 \right] + g\_3 \right\}.$$

With these results we can compute the IIP3 using Eq. (11.11) as before

$$\text{IIP3} \approx \frac{2}{q\_t} \sqrt{\frac{q\_s}{3q\_t} \frac{1}{\left| 2\left(\frac{g\_2}{g\_m}\right)^2 \left[ \frac{2}{3} + \frac{1}{3} \frac{N(2j\omega\_1)}{L(2j\omega\_1)} - 1 \right] + \frac{g\_3}{g\_m}\right|}}\,. \tag{11.51}$$

In the common case in which the input resistance *R<sub>i</sub>* is equal to the source impedance, *q<sub>s</sub>*/*q<sub>t</sub>* = 2. The IP3 of the circuit is thus approximately inversely proportional to the quality factor of the input resonance. This is due to the fact that at resonance the magnitude of the voltage across the reactive components is roughly *q<sub>t</sub>* times the one across the resistive part. In other words, the voltage *V<sub>gs</sub>* controlling the nonlinear sources is amplified by a factor of ca. *q<sub>t</sub>* compared to the input signal *V<sub>s</sub>*. This very same characteristic is also the reason for the good noise performance of the circuit: the input network provides some voltage gain before the first noisy device.
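The inverse dependence on the quality factor can be checked directly from Eq. (11.51). In the sketch below (*g<sub>m</sub>* and *g*<sub>3</sub> are illustrative; *q<sub>s</sub>* = 2*q<sub>t</sub>* models the matched-input case) *g*<sub>2</sub> is set to zero so that only the 1/*q<sub>t</sub>* scaling of the IP3 is visible.

```python
import numpy as np

def NL_ratio(qt, qs):
    # N(2j*w1)/L(2j*w1) at resonance, cf. the expression in the text
    return (-3 + 2j/qs) / (-3 + 2j/qt)

def iip3(qt, gm, g2, g3, qs=None):
    # Eq. (11.51); qs = 2*qt corresponds to Ri matched to Rs.
    qs = 2*qt if qs is None else qs
    bracket = 2*(g2/gm)**2 * (2/3 + NL_ratio(qt, qs)/3 - 1) + g3/gm
    return (2/qt) * np.sqrt(qs/(3*qt) / abs(bracket))

gm, g3 = 5e-3, 2e-3
# With g2 = 0 the IP3 scales exactly as 1/qt:
print(iip3(2.0, gm, 0.0, g3), iip3(4.0, gm, 0.0, g3))
```

With *g*<sub>2</sub> restored, the bracket no longer cancels at small *q<sub>t</sub>*, reproducing the second-order contribution discussed next.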

The quality factor of the network also influences the relative contributions of *g*<sup>2</sup> and *g*<sup>3</sup> to distortion through the term

$$\frac{N(2j\omega\_1)}{L(2j\omega\_1)} = \frac{-3 + j\frac{2}{q\_s}}{-3 + j\frac{2}{q\_t}}\,.$$

For large quality factors, *q<sub>t</sub>*, *q<sub>s</sub>* ≫ 1, the ratio approaches 1, which makes the IP3 essentially independent of *g*<sub>2</sub>. For small quality factor values the contribution due to *g*<sub>2</sub> is not negligible, especially if *g*<sub>3</sub>/*g<sub>m</sub>* is small compared to (*g*<sub>2</sub>/*g<sub>m</sub>*)<sup>2</sup>, as is the case with Class-AC stages.

In practical implementations the component values are affected by manufacturing variations. For this reason and to avoid the need for tuning, the quality factor *qt* is most often chosen to have a value smaller than 5.

# **11.8 Pseudo-Differential Circuits**

The analog signal path of many RF and mixed-signal integrated circuits is *differential*. This means that the signal of interest is transmitted on two equal lines carrying the same signal, but with opposite polarities. The main objective is to make the system insensitive to noise affecting both lines equally. This can be, for example, noise due to the activity of digital circuits propagating through the common substrate of the IC. A *differential circuit* is one that is designed to process the difference between the two input terminals sensing the two lines carrying the signal while rejecting the common component. Formally, if v<sub>i</sub><sup>+</sup> and v<sub>i</sub><sup>−</sup> are the two input voltages (relative to ground), the *differential-mode* voltage is defined as

$$v\_d := v\_i^+ - v\_i^-$$

and the *common-mode* voltage as

$$v\_c := \frac{v\_i^+ + v\_i^-}{2}\,.$$

Using this representation the two input voltages can be written as

$$v\_i^+ = v\_c + \frac{v\_d}{2} \,, \qquad \qquad \qquad v\_i^- = v\_c - \frac{v\_d}{2} \,.$$

The prototypical differential circuit is the *differential-pair* shown in Fig. 11.40. In the ideal form drawn, the output currents are always *i*<sub>o</sub><sup>+</sup> = *i*<sub>o</sub><sup>−</sup> = *I*<sub>0</sub>/2 as long as v<sub>i</sub><sup>+</sup> = v<sub>i</sub><sup>−</sup>. Any common-mode signal component is thus fully rejected.

Differential circuits do also have disadvantages. A real current source is implemented with transistors and requires a certain voltage across its terminals to work

#### **Fig. 11.40** Differential pair

**Fig. 11.41** Pseudodifferential transconductance

properly. This reduces the headroom left for signal processing and, in modern processes operating at supply voltages below 1.0 V, poses severe challenges. In addition, a current source does not only generate a DC current, but it also generates noise, reducing the sensitivity of the circuit to small signals.

*Pseudo-differential* circuits are a class of circuits that alleviate some of these problems while retaining some of the benefits of differential circuits. They are composed of two identical single-ended sub-circuits, each connected to one of the two lines carrying the differential signal. An example pseudo-differential transconductance is shown in Fig. 11.41.

In pseudo-differential circuits the input common-mode signal component is not rejected but, if the circuit is sufficiently linear, it appears as a common-mode signal at the output and remains separable from the wanted differential signal, which appears at the output in differential form. The objective of this section is to quantify the conversion between common-mode and differential-mode in weakly nonlinear circuits.

We first show that weakly nonlinear circuits driven by a purely differential input signal produce a mixture of differential- and common-mode output signals. Let's denote the input signals by *x*<sup>+</sup>, *x*<sup>−</sup>, the output signals by *y*<sup>+</sup>, *y*<sup>−</sup>, the corresponding common- and differential-mode components by the same letter with index *c* and *d* respectively, and the nonlinear transfer function of order *k* of the single-ended subsystems by *h<sub>k</sub>*. By assumption the input signal is purely differential

$$x^{+} = \frac{x\_d}{2}\,, \qquad x^{-} = -\frac{x\_d}{2}\,.$$

The outputs of order *k* are therefore

$$\mathbf{y}\_k^+ = \frac{1}{2^k} h\_k \ast \mathbf{x}\_d^{\otimes k}, \qquad \qquad \qquad \mathbf{y}\_k^- = \frac{(-1)^k}{2^k} h\_k \ast \mathbf{x}\_d^{\otimes k}$$

from which we conclude that for *k even* the output is a common-mode signal, while for *k odd* it is differential.
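The even/odd splitting can be verified with a one-line numerical experiment: drive two identical memoryless nonlinearities (a stand-in for the circuit halves; the polynomial coefficients are arbitrary) with a purely differential signal and decompose the outputs into differential and common-mode parts.

```python
import numpy as np

# Two identical memoryless halves with illustrative coefficients.
def half(v):
    return 1.0*v + 0.3*v**2 + 0.1*v**3

t = np.linspace(0.0, 1.0, 1000, endpoint=False)
xd = np.cos(2*np.pi*5*t)            # purely differential input
yp, ym = half(xd/2), half(-xd/2)

y_diff = yp - ym                    # odd-order products end up here
y_comm = (yp + ym) / 2              # even-order products end up here

# The common-mode output contains exactly the even part of the polynomial:
expected_comm = 0.3 * (xd/2)**2
print(np.max(np.abs(y_comm - expected_comm)))
```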

Let's now consider the response of a weakly nonlinear circuit to a mixture of differential- and common-mode signals

$$x^+ = x\_c + \frac{x\_d}{2}\,, \qquad x^- = x\_c - \frac{x\_d}{2}\,.$$

Let's first consider the second-order response of the two circuit halves. The positive and negative outputs are

$$\begin{aligned} \mathbf{y}\_2^+ &= h\_2 \ast \left( \mathbf{x}\_c + \frac{\mathbf{x}\_d}{2} \right)^{\otimes 2} \\ &= h\_2 \ast \mathbf{x}\_c^{\otimes 2} + \frac{1}{4} h\_2 \ast \mathbf{x}\_d^{\otimes 2} + h\_2 \ast [\mathbf{x}\_d \otimes \mathbf{x}\_c]\_{\text{sym}} \end{aligned}$$

and

$$\begin{aligned} \mathbf{y}\_2^- &= h\_2 \ast \left( \mathbf{x}\_c - \frac{\mathbf{x}\_d}{2} \right)^{\otimes 2} \\ &= h\_2 \ast \mathbf{x}\_c^{\otimes 2} + \frac{1}{4} h\_2 \ast \mathbf{x}\_d^{\otimes 2} - h\_2 \ast [\mathbf{x}\_d \otimes \mathbf{x}\_c]\_{\text{sym}} \end{aligned}$$

respectively. The second-order differential output signal component is therefore

$$\mathbf{y}\_{d,2} = 2h\_2 \ast [\mathbf{x}\_d \otimes \mathbf{x}\_c]\_{\text{sym}}$$

which includes the common-mode input signal. A similar calculation for the third order component gives

$$y\_{d,3} = h\_3 \ast \left(\frac{x\_d^{\otimes 3}}{4} + 3\left[x\_d \otimes x\_c^{\otimes 2}\right]\_{\text{sym}}\right),$$

which also includes a term depending on the input common-mode. One can generalise the calculations and show that *the differential- and common-mode input components are mixed by nonlinearities of all orders.*
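The mixing terms derived above can be checked with the same memoryless stand-in: adding a common-mode tone to the drive produces, in the differential output, exactly the 2*g*<sub>2</sub>*x<sub>d</sub>x<sub>c</sub>* second-order component (coefficients below are illustrative).

```python
import numpy as np

# Memoryless sketch of mode mixing: with a common-mode tone present,
# second-order distortion converts it into a differential output component.
g1, g2, g3 = 1.0, 0.3, 0.1
half = lambda v: g1*v + g2*v**2 + g3*v**3

t = np.linspace(0.0, 1.0, 2000, endpoint=False)
xd = np.cos(2*np.pi*5*t)            # differential tone
xc = 0.2*np.cos(2*np.pi*50*t)       # common-mode disturbance

y_d = half(xc + xd/2) - half(xc - xd/2)

# The differential output is g1*xd + 2*g2*xd*xc + (odd third-order terms):
resid = y_d - (g1*xd + 2*g2*xd*xc
               + g3*((xc + xd/2)**3 - (xc - xd/2)**3))
print(np.max(np.abs(resid)))
```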

Consider now the cascade of two pseudo-differential weakly nonlinear circuits driven by a purely differential signal. If the two subsystems are optimised independently to maximise IP3 without paying attention to even order distortion components, then, when the two subsystems are put together, one may obtain a lower than expected total IP3. This is because the first stage produces second (and higher even) order mixes as common-mode signals which are also fed as input to the second subsystem. The second (and higher order) distortion components of the latter will then mix differential- and common-mode to produce differential output signal components at the IM3 frequencies.

**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Chapter 12 Linear Time-Varying Systems**

# **12.1 Linear Time-Varying Systems**

In this chapter we consider linear time-varying (LTV) systems. These are systems whose behaviour depends on the particular moment in time at which they are used. The change with time may arise for example due to the sensitivity of system components to environmental changes. Examples of systems suffering from this type of sensitivity include wireless communication systems in which the communication channel between the transmitter- and the receiver-antennas is highly dependent on the environment in between and around the antennas. The variation in time may also be imposed intentionally by design to achieve functions that can't be realised with LTI-systems. This is the case for example in communication mixers whose function is to shift in frequency the spectrum of a signal.

In this section we introduce a definition of linear time-varying systems valid under the assumption that all signals are regular distributions. A generalisation will be given in Sect. 12.3. The assumption of linearity means that the *superposition principle* must hold. In addition, as for LTI-systems, we require that LTV-systems depend *continuously* on the input signal. We therefore define

**Definition 12.1** (*LTV-system*) A single-input, single-output, linear time-varying system is a system that when driven by the input signal *x* produces a response *y* that can be expressed by

$$\mathbf{y}(t) = h(t,\xi) \*\_{t} \mathbf{x}(t) := \int\_{-\infty}^{\infty} h(t,\xi) \mathbf{x}(t-\xi) d\xi \,. \tag{12.1}$$

*h*(*t*,ξ) is the *time-varying impulse response* of the system.

The meaning of the variable ξ is best illustrated by anticipating somewhat the results of Sect. 12.3 and applying as input signal a Dirac impulse at time *t*<sub>0</sub>

$$\mathbf{y}(t) = h(t, \boldsymbol{\xi}) \*\_{t} \boldsymbol{\delta}(t - t\_{0}) = h(t, t - t\_{0}) \;.$$

© The Author(s) 2024


F. Beffa, *Weakly Nonlinear Systems*, Understanding Complex Systems, https://doi.org/10.1007/978-3-031-40681-2\_12

Thus ξ represents the time elapsed since the application of the input impulse.

For *causal* systems the output must vanish before the input is applied. This implies that the impulse response must vanish for negative values of ξ

$$h(t, \xi) = 0 \,, \qquad \xi \, < 0 \,. \tag{12.2}$$

Therefore, the response of a causal system driven by a regular distribution *x* ∈ D<sub>+</sub> is given by

$$\mathbf{y}(t) = \int\_0^t h(t,\xi)\mathbf{x}(t-\xi)\,d\xi = \int\_0^t h(t,t-\xi)\mathbf{x}(\xi)\,d\xi\,.$$
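As a quick numerical illustration, the causal response integral can be evaluated by quadrature. The impulse response below, h(t, ξ) = (1 + 0.5 sin t)e<sup>−ξ</sup> for ξ ≥ 0, is an illustrative choice (not from the text) with the closed-form step response a(t)(1 − e<sup>−t</sup>).

```python
import numpy as np

# A causal time-varying impulse response: first-order decay in xi with a
# time-modulated gain a(t).
a = lambda t: 1.0 + 0.5 * np.sin(t)
h = lambda t, xi: np.where(xi >= 0, a(t) * np.exp(-xi), 0.0)

def response(t, x, n=20001):
    # y(t) = integral_0^t h(t, t - xi) x(xi) dxi, trapezoidal rule
    xi = np.linspace(0.0, t, n)
    vals = h(t, t - xi) * x(xi)
    dxi = xi[1] - xi[0]
    return dxi * (vals.sum() - 0.5 * (vals[0] + vals[-1]))

x = lambda t: np.ones_like(t)       # unit step input (support t >= 0)
t0 = 2.0
y = response(t0, x)
print(y)                            # closed form: a(t0) * (1 - exp(-t0))
```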

# **12.2 Linear Ordinary Differential Equations**

An important class of LTV-systems is the one of systems described by differential equations with variable coefficients of the form

$$L\left(t, \frac{\mathbf{d}}{\mathbf{d}t}\right)\mathbf{y}(t) = N\left(t, \frac{\mathbf{d}}{\mathbf{d}t}\right)\mathbf{x}(t)$$

with

$$\begin{aligned} L\left(t, \frac{\mathbf{d}}{\mathbf{d}t}\right) &= \frac{\mathbf{d}^m}{\mathbf{d}t^m} + a\_{m-1}(t) \frac{\mathbf{d}^{m-1}}{\mathbf{d}t^{m-1}} + \dots + a\_0(t), \\ N\left(t, \frac{\mathbf{d}}{\mathbf{d}t}\right) &= b\_n(t) \frac{\mathbf{d}^n}{\mathbf{d}t^n} + b\_{n-1}(t) \frac{\mathbf{d}^{n-1}}{\mathbf{d}t^{n-1}} + \dots + b\_0(t) \end{aligned}$$

time-dependent differential operators. It is easy to verify that every such system with *n* < *m* can be put in a state-space representation with time-dependent matrices

$$\frac{\mathbf{d}}{\mathbf{d}t}\mathbf{u} = A(t)\mathbf{u} + B(t)\mathbf{x} \qquad A(.) \in C(\mathbb{R}, \mathbb{C}^{n \times n}), \ B(.) \in C(\mathbb{R}, \mathbb{C}^{n \times 1}) \tag{12.3}$$

$$\mathbf{y} = C(t)\mathbf{u} + D(t)\mathbf{x} \qquad C(.) \in C(\mathbb{R}, \mathbb{C}^{1 \times n}),\ D(.) \in C(\mathbb{R}, \mathbb{C}) \tag{12.4}$$

with *u* the system state. Given an input signal *x*, the system response *y* is fully determined if one can find a state *u* satisfying the first equation and suitable initial conditions. The study of the dynamics of the system can therefore be reduced to the study of a system of *n* differential equations of first order.
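A minimal sketch of simulating the time-varying state-space equations (12.3)–(12.4) with a fixed-step RK4 integrator. The scalar example A(t) = −(1 + t) is chosen because it has the closed-form solution u(t) = u(0)e<sup>−(t + t²/2)</sup>; all values are illustrative.

```python
import numpy as np

def simulate(A, B, x, u0, t_end, dt=1e-3):
    """Integrate du/dt = A(t) u + B(t) x(t) with classical RK4."""
    n = int(round(t_end / dt))
    u = np.asarray(u0, float)
    f = lambda t, u: A(t) @ u + B(t) * x(t)
    for i in range(n):
        t = i * dt
        k1 = f(t, u)
        k2 = f(t + dt/2, u + dt/2*k1)
        k3 = f(t + dt/2, u + dt/2*k2)
        k4 = f(t + dt, u + dt*k3)
        u = u + dt/6 * (k1 + 2*k2 + 2*k3 + k4)
    return u

A = lambda t: np.array([[-(1.0 + t)]])   # time-varying 1x1 "matrix"
B = lambda t: np.array([0.0])
x = lambda t: 0.0                        # zero input: free response only
u_end = simulate(A, B, x, [1.0], 1.0)
print(u_end[0])                          # analytic: exp(-(t + t**2/2)) at t = 1
```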

# *12.2.1 Fundamental Solution*

Consider the initial value problem described by the system of *n* differential equations

$$\frac{\mathbf{d}}{\mathbf{d}t}\mathbf{y} = A(t)\mathbf{y} \tag{12.5}$$

and initial conditions

$$\mathbf{y}(0) = \mathbf{y}\_0 \in \mathbb{C}^n \tag{12.6}$$

with *A*(.) an *n* × *n* matrix of complex-valued functions of time *a<sub>ij</sub>*(.). If the functions forming *A*(.) are bounded and continuous, then the right-hand side of the equation is Lipschitz continuous and, as discussed in Sect. 9.1, the equation has a unique solution. By choosing the initial value equal to the unit vector *e<sub>j</sub>* ∈ ℂ<sup>n</sup> pointing in direction *j*, for *j* = 1, ..., *n*, we can thus obtain *n* independent solutions *y<sub>j</sub>* of the equation. The matrix formed by the column vectors *y<sub>j</sub>*

$$Y(t) := \begin{bmatrix} \mathbf{y}\_1(t), \dots, \mathbf{y}\_n(t) \end{bmatrix} \tag{12.7}$$

is called *principal fundamental matrix* of the system and satisfies the matrix equation

$$\frac{\mathrm{d}}{\mathrm{d}t}Y = A(t)Y, \qquad Y(0) = I\,. \tag{12.8}$$

Knowing *Y* , the solution of the initial value problem is thus given by

$$\mathbf{y}(t) = Y(t)\mathbf{y}\_0\,, \qquad t \ge 0\,.$$

In addition, since the columns of *Y* are independent at all times, det(*Y*(*t*)) ≠ 0 at all times. The inverse of *Y*, *Y*<sup>−1</sup>, is thus well-defined, as is the *evolution operator* (also called *state transition matrix*)

$$U(t,\tau) := Y(t)Y^{-1}(\tau)\,. \tag{12.9}$$

Note that the evolution operator satisfies

$$\frac{\mathbf{d}}{\mathbf{d}t}U(t,\tau) = \left(\frac{\mathbf{d}}{\mathbf{d}t}Y(t)\right)Y^{-1}(\tau) = A(t)Y(t)Y^{-1}(\tau) = A(t)U(t,\tau)$$

and

$$U(\tau, \tau) = I$$

and is thus the principal fundamental matrix of the system *at time* τ . From (12.9) we also immediately obtain

$$U(t, \lambda)U(\lambda, \tau) = U(t, \tau)$$

and

$$U(\tau, t) = \left[U(t, \tau)\right]^{-1}\,.$$
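These properties of the evolution operator are easy to verify numerically: integrate the matrix equation (12.8) for *Y* on a 2×2 example (the particular *A*(*t*) below is an arbitrary illustration), form U(t, τ) = Y(t)Y<sup>−1</sup>(τ), and check the composition rule.

```python
import numpy as np

# A 2x2 time-varying system matrix (illustrative choice).
A = lambda t: np.array([[0.0, 1.0], [-1.0 - 0.5*np.sin(t), -0.1]])

def Y(t, dt=1e-3):
    """Principal fundamental matrix: dY/dt = A(t) Y, Y(0) = I, via RK4."""
    n = int(round(t / dt))
    M = np.eye(2)
    for i in range(n):
        s = i * dt
        k1 = A(s) @ M
        k2 = A(s + dt/2) @ (M + dt/2*k1)
        k3 = A(s + dt/2) @ (M + dt/2*k2)
        k4 = A(s + dt) @ (M + dt*k3)
        M = M + dt/6 * (k1 + 2*k2 + 2*k3 + k4)
    return M

U = lambda t, tau: Y(t) @ np.linalg.inv(Y(tau))   # evolution operator (12.9)

# Composition property U(t, lam) U(lam, tau) = U(t, tau):
err = np.max(np.abs(U(2.0, 1.0) @ U(1.0, 0.5) - U(2.0, 0.5)))
print(err)
```

The composition rule holds to machine precision because it is an algebraic identity of the products Y(t)Y<sup>−1</sup>(τ), independent of the integration error in Y itself.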

The initial value problem described by (12.5) and initial conditions *y*(*t*0) = *y*<sup>0</sup> can be translated in the language of distributions by extending the functions by zero for *t* < *t*<sup>0</sup> and by replacing the differential operator by the distributional one

$$D\mathbf{y} = A(t)\mathbf{y} + \mathbf{y}\_0 \delta(t - t\_0) \tag{12.10}$$

as usual. Unlike the case where *A*(.) is constant, this equation cannot be written as a convolution equation. For this reason, and since for arbitrary distributions multiplication is only well-defined with smooth functions, for the equation to be well-defined the functions *a<sub>ij</sub>*(.) must belong to E. This may seem like a very serious limitation, but remember that any distribution can be approximated to arbitrary accuracy by such a function (see Sect. 3.3). In this case the *fundamental (or elementary) solution* of the equation relative to time τ is defined as the solution of the matrix equation

$$LE\_{\tau} = I\delta(t - \tau) \tag{12.11}$$

with *L* the differential operator

$$L := L(t, D) := D - A(t) \,.$$

If *U*(*t*,τ) is the evolution operator of the original differential equation (12.5) then

$$D\left[1\_{+}(t-\tau)U(t,\tau)\right] = \delta(t-\tau)I + 1\_{+}(t-\tau)A(t)U(t,\tau)$$

shows that

$$E\_{\tau}(t) = 1\_{+}(t-\tau)U(t,\tau) \tag{12.12}$$

is the fundamental solution relative to τ of the above distributional equation and

$$\mathbf{y}(t) = E\_{\mathbf{t}\_0}(t)\mathbf{y}\_0 \tag{12.13}$$

is the solution of (12.10).

# *12.2.2 Formal Solution*

We now look for an explicit formal solution in D<sub>+</sub>(ℝ, ℂ<sup>n</sup>) of the equation

$$D\mathbf{y} = A(t)\mathbf{y} + \mathbf{y}\_0 \boldsymbol{\delta} + \mathbf{x} \tag{12.14}$$

with *A*(.) a matrix of functions in E as before. As a first step we rewrite the equation as an integral equation. To do this we write *Dy* as *D*δ ∗ *y* and convolve both sides of the equation with 1<sup>+</sup> to obtain

$$\mathbf{y} - \mathbf{1}\_+ \ast (A(t)\mathbf{y}) = \mathbf{y}\_0 \mathbf{1}\_+ + \mathbf{1}\_+ \ast \mathbf{x}\,.$$

Thus, if *x* is a bounded regular distribution then the equation can be written as

$$\mathbf{y}(t) - \int\_{0}^{t} A(\tau)\mathbf{y}(\tau)d\tau = \mathbf{y}\_{0}\mathbf{1}\_{+}(t) + \int\_{0}^{t} \mathbf{x}(\tau)d\tau \,. \tag{12.15}$$

Instead of solving this equation directly, we consider a more general integral equation and then specialise to this case.

A *Volterra integral equation of the second kind* is an equation of the form

$$\mathbf{y}(t) = \int\_{0}^{t} k(t, \tau) \mathbf{y}(\tau) \, d\tau + \mathbf{x}(t) \,, \qquad t \ge 0 \tag{12.16}$$

with *x* a given regular distribution in $\mathcal{D}'\_+(\mathbb{R}, \mathbb{C}^n)$, *k* an *n* × *n* matrix of continuous functions [*k**ij*], *i*, *j* = 1,..., *n* and *y* the required unknown in $\mathcal{D}'\_+(\mathbb{R}, \mathbb{C}^n)$. This equation can be solved by an algebraic method based on a group [33, 34].

**Definition 12.2** (*Group*) A group is a pair (G, •) consisting of a non-empty set of objects G and a binary operation •, usually called the group multiplication, satisfying the following properties:

1. the operation is associative: (*g*1 • *g*2) • *g*3 = *g*1 • (*g*2 • *g*3) for all *g*1, *g*2, *g*3 ∈ G;
2. there is a unit element *e* ∈ G with *e* • *g* = *g* • *e* = *g* for every *g* ∈ G;
3. every element *g* ∈ G has an inverse *g*−1 ∈ G satisfying

$$\text{g} \bullet \text{g}^{-1} = \text{g}^{-1} \bullet \text{g} = e.$$

Note that the unit element is unique: if *e*′ is a second unit, then *e*′ = *e*′ • *e* = *e* shows that it must be equal to the first one. A group G *acts (from the left) on* a non-empty set *X* if there is a function

$$
\mathcal{G} \times X \to X, \qquad (g, x) \mapsto g \cdot x
$$

such that the following hold:

1. *e* · *x* = *x* for every *x* ∈ *X*;
2. (*g*1 • *g*2) · *x* = *g*1 · (*g*2 · *x*) for all *g*1, *g*2 ∈ G and every *x* ∈ *X*.

In our case the group elements are maps of the form *I* + *k*, with *k* an *n* × *n* matrix of continuous functions as in (12.16), acting on the set of locally bounded, locally integrable functions *x* in $\mathcal{D}'\_+(\mathbb{R}, \mathbb{C}^n)$. That *x* is *locally bounded* means that it is bounded on every finite interval. We define the operation of *I* + *k* on *x* by

$$(I+k)\cdot \mathbf{x} := \mathbf{x}(t) + \int\_0^t k(t,\tau)\mathbf{x}(\tau)\,\mathrm{d}\tau \,.$$

The resulting function is again a locally bounded, locally integrable function in $\mathcal{D}'\_+(\mathbb{R}, \mathbb{C}^n)$, like *x*, and the elements *I* + *k* can be made to form a group. A suitable group multiplication can be found by writing

$$\begin{aligned} (I+k\_1) \cdot [(I+k\_2) \cdot \mathbf{x}] &= \mathbf{x}(t) + \int\_0^t k\_2(t,\tau) \mathbf{x}(\tau) \, \mathrm{d}\tau \\ &+ \int\_0^t k\_1(t,\tau) \mathbf{x}(\tau) \, \mathrm{d}\tau + \int\_0^t k\_1(t,\tau\_1) \int\_0^{\tau\_1} k\_2(\tau\_1,\tau\_2) \mathbf{x}(\tau\_2) \, \mathrm{d}\tau\_2 \, \mathrm{d}\tau\_1 \end{aligned}$$

and noting that

$$\int\_0^t k\_1(t, \tau\_1) \int\_0^{\tau\_1} k\_2(\tau\_1, \tau\_2) \mathbf{x}(\tau\_2) \, \mathrm{d}\tau\_2 \, \mathrm{d}\tau\_1 = \int\_0^t \int\_{\tau\_2}^{t} k\_1(t, \tau\_1) k\_2(\tau\_1, \tau\_2) \, \mathrm{d}\tau\_1 \, \mathbf{x}(\tau\_2) \, \mathrm{d}\tau\_2 \,.$$

Since the inner integral on the right-hand side results in a matrix of continuous functions, we can define the group multiplication by

$$(I + k\_1) \bullet (I + k\_2) := I + k\_1 + k\_2 + k\_1 \star k\_2$$

with

$$k\_1 \star k\_2(t, \tau) := \int\_{\tau}^{t} k\_1(t, \lambda) k\_2(\lambda, \tau) \, d\lambda \,. \tag{12.17}$$

For convenience we also put

$$k\star x(t) := k\star x(t,0) := \int\_0^t k(t,\tau)x(\tau)d\tau\tag{12.18}$$

so that we can write

$$(I+k)\cdot \mathbf{x} = \mathbf{x} + k \star \mathbf{x} \,.$$

The unit of the group is readily seen to be *I*.
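On a uniform time grid the ⋆ product (12.17) can be approximated by a product of lower-triangular matrices, which makes the group structure easy to experiment with numerically. The following is a minimal sketch, assuming NumPy; the grid, the constant test kernels and the rectangle-rule quadrature are illustrative choices, not part of the text:

```python
import numpy as np

def star(k1, k2, ts):
    """Approximate (k1 ⋆ k2)(t, τ) = ∫_τ^t k1(t, λ) k2(λ, τ) dλ on a grid.

    k1, k2: callables of (t, τ); ts: uniform grid of times.  Returns the
    matrix [(k1 ⋆ k2)(t_i, t_j)], lower triangular like the kernels."""
    dt = ts[1] - ts[0]
    K1 = np.tril([[k1(t, u) for u in ts] for t in ts])
    K2 = np.tril([[k2(t, u) for u in ts] for t in ts])
    # The λ-integral becomes a product of triangular matrices times the
    # quadrature weight dt (rectangle rule).
    return K1 @ K2 * dt

# For k1 = k2 = 1 the exact ⋆ product is (k1 ⋆ k2)(t, τ) = t - τ.
ts = np.linspace(0.0, 1.0, 2001)
K = star(lambda t, u: 1.0, lambda t, u: 1.0, ts)
print(abs(K[-1, 0] - 1.0) < 5e-3, abs(K[1000, 0] - 0.5) < 5e-3)
```

The triangular structure of the discretised kernels mirrors the support condition τ ≤ *t* of the continuous kernels.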

It remains to show that every element *I* + *k* of the group has an inverse (*I* + *k*)−1. From the similarity with the geometric series we infer that the inverse is given by

$$(I+k)^{-1} := I + \sum\_{n=1}^{\infty} (-1)^{n} k^{\star n} \tag{12.19}$$

and show that this series converges in every interval 0 ≤ τ ≤ *t* ≤ *T* . By definition, for every locally bounded function *x* = (*x*1,..., *xn*) and every finite interval 0 ≤ *t* ≤ *T* we can find an upper bound given by

$$p\_T(\mathbf{x}) := \max\_{1 \le i \le n} \left\{ \sup\_{0 \le t \le T} |\mathbf{x}\_i(t)| \right\}$$

so that, given the linearity of *k*,

$$p\_T(k \star \mathbf{x}) \le p\_T(k) \; p\_T(\mathbf{x}) \; T$$

with

$$p\_T(k) := \max\_i \left\{ \sum\_{j=1}^n \sup\_{0 \le \tau \le t \le T} |k\_{ij}(t, \tau)| \right\}.$$

Thus

$$p\_T(k \star k) \le p\_T(k)^2 \, T$$

and by induction

$$p\_T(k^{\star n}) \le p\_T(k)^n \frac{T^{n-1}}{(n-1)!} \,.$$

This upper bound is the *n*th term of a convergent series and implies the convergence of (12.19) for every value of *T*. Having established convergence, one immediately verifies that indeed

$$(I+k)\bullet(I-k+k^{\star 2}\mp\cdots)=I$$

and

$$(I - k + k^{\star 2} \mp \cdots) \bullet (I + k) = I \,.$$

With this group the Volterra equation (12.16) can be written as

(*I* − *k*) · *y* = *x* .

and is solved by multiplying on the left with (*I* − *k*)−1

$$\mathbf{y}(t) = \mathbf{x}(t) + w \star \mathbf{x}(t) \tag{12.20}$$

with

$$w := \sum\_{n=1}^{\infty} k^{\star n}. \tag{12.21}$$

The matrix function w is called the *resolvent kernel* of the equation.
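The resolvent series (12.21) also suggests a direct numerical method: discretise the kernel and sum the terms *k*^⋆*n* ⋆ *x*. Below is a rough sketch, assuming NumPy; the trapezoidal quadrature and the scalar test case *k* ≡ 1, *x* ≡ 1 (whose exact solution is *y*(*t*) = e^*t*) are illustrative choices:

```python
import numpy as np

def volterra_neumann(k, x, ts, terms=30):
    """Solve y(t) = ∫_0^t k(t,τ) y(τ) dτ + x(t) by summing the series
    y = x + k⋆x + k⋆k⋆x + ...  (trapezoidal quadrature)."""
    dt = ts[1] - ts[0]
    K = np.tril(np.array([[k(t, u) for u in ts] for t in ts]))
    W = K * dt
    W[:, 0] *= 0.5                                   # trapezoid: halve the
    W[np.arange(len(ts)), np.arange(len(ts))] *= 0.5 # endpoints of each row
    y = x.copy()
    term = x.copy()
    for _ in range(terms):
        term = W @ term          # next term k^{⋆n} ⋆ x of the series
        y = y + term
    return y

# k ≡ 1, x ≡ 1 gives the series 1 + t + t²/2! + ... = e^t
ts = np.linspace(0.0, 1.0, 1001)
y = volterra_neumann(lambda t, u: 1.0, np.ones_like(ts), ts)
print(abs(y[-1] - np.e) < 1e-3)
```

The factorial decay established by the bound on *p**T*(*k*^⋆*n*) is what makes truncating the series after a few tens of terms harmless.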

The group can't be extended to a ring or an algebra with the natural addition, as these would include the elements *k*. These elements pose two problems. First, their inverses are not necessarily functions. For example, the inverse of (*t* − τ)*m*−1/(*m* − 1)! is *Dm*δ, and for singular distributions multiplication is only defined with functions in $\mathcal{E}$. Second, such a ring includes zero divisors. From now on we will generally drop the symbols • of group multiplication and · of group action, as is commonly done with multiplication symbols.

We now come back to the special case of (12.15) for which

$$k(t, \tau) = A(\tau)\,.$$

The solution is given by (12.20) with

$$\begin{split} k^{\star n} \star \left( \mathbf{y}\_0 \mathbf{1}\_+(t) + \int\_0^t \mathbf{x}(\tau) \, \mathrm{d}\tau \right) &= \int\_0^t \int\_0^{\tau\_1} \cdots \int\_0^{\tau\_{n-1}} A(\tau\_1) \cdots A(\tau\_n) \, \mathrm{d}\tau\_n \cdots \mathrm{d}\tau\_1 \, \mathbf{y}\_0 \\ &+ \int\_0^t \int\_0^{\tau\_1} \cdots \int\_0^{\tau\_{n-1}} A(\tau\_1) \cdots A(\tau\_n) \int\_0^{\tau\_n} \mathbf{x}(\lambda) \, \mathrm{d}\lambda \, \mathrm{d}\tau\_n \cdots \mathrm{d}\tau\_1 \,. \end{split}$$

These expressions can be written more compactly by introducing the notion of a *time-ordered product* of operators. We define *T* {*A*1(τ1)··· *An*(τ*n*)} as the product with factors arranged from left to right in order of decreasing times. For example

$$T\{A\_1(\tau\_1)A\_2(\tau\_2)\} = \begin{cases} A\_1(\tau\_1)A\_2(\tau\_2) & \tau\_1 \ge \tau\_2\\ A\_2(\tau\_2)A\_1(\tau\_1) & \tau\_1 < \tau\_2 \end{cases}.$$

With this meta-operator we can now write

$$T\left\{\left(\int\_0^t A(\tau)\,\mathrm{d}\tau\right)^{2}\right\}=\int\_0^t\int\_0^t T\{A(\tau\_{1})A(\tau\_{2})\}\,\mathrm{d}\tau\_{2}\,\mathrm{d}\tau\_{1}$$

$$\begin{split} &= \int\limits\_{0}^{t} \int\_{0}^{\tau\_{1}} A(\tau\_{1}) A(\tau\_{2}) \, \mathrm{d}\tau\_{2} \, \mathrm{d}\tau\_{1} + \int\limits\_{0}^{t} \int\limits\_{\tau\_{1}}^{t} A(\tau\_{2}) A(\tau\_{1}) \, \mathrm{d}\tau\_{2} \, \mathrm{d}\tau\_{1} \\ &= \int\limits\_{0}^{t} \int\limits\_{0}^{\tau\_{1}} A(\tau\_{1}) A(\tau\_{2}) \, \mathrm{d}\tau\_{2} \, \mathrm{d}\tau\_{1} + \int\limits\_{0}^{t} \int\limits\_{0}^{\tau\_{2}} A(\tau\_{2}) A(\tau\_{1}) \, \mathrm{d}\tau\_{1} \, \mathrm{d}\tau\_{2} \\ &= 2 \int\limits\_{0}^{t} \int\limits\_{0}^{\tau\_{1}} A(\tau\_{1}) A(\tau\_{2}) \, \mathrm{d}\tau\_{2} \, \mathrm{d}\tau\_{1} \, . \end{split}$$

and more generally

$$T\left\{\left(\int\_0^t A(\tau)\,\mathrm{d}\tau\right)^{n}\right\} = n! \int\_{0}^{t} \cdots \int\_{0}^{\tau\_{n-1}} A(\tau\_{1}) \cdots A(\tau\_{n}) \,\mathrm{d}\tau\_{n} \cdots \,\mathrm{d}\tau\_{1} \tag{12.22}$$

because there are *n*! possible orderings of the n times τ1,...,τ*n*. Using these expressions we have

$$k^{\star n} \star \mathbf{y}\_0\mathbf{1}\_+(t) = \frac{1}{n!} T \left\{ \left( \int\_0^t A(\tau) \, \mathrm{d}\tau \right)^n \right\} \mathbf{y}\_0 \tag{12.23}$$

and

$$\begin{split} k^{\star n} \star \int\_0^t \mathbf{x}(\tau) \, \mathrm{d}\tau &= \int\_0^t \int\_0^{\lambda\_1} \cdots \int\_0^{\lambda\_{n-1}} A(\lambda\_1) \cdots A(\lambda\_n) \int\_0^{\lambda\_n} \mathbf{x}(\tau) \, \mathrm{d}\tau \, \mathrm{d}\lambda\_n \cdots \mathrm{d}\lambda\_1 \\ &= \int\_0^t \int\_\tau^t \cdots \int\_\tau^{\lambda\_{n-1}} A(\lambda\_1) \cdots A(\lambda\_n) \, \mathrm{d}\lambda\_n \cdots \mathrm{d}\lambda\_1 \, \mathbf{x}(\tau) \, \mathrm{d}\tau \\ &= \frac{1}{n!} T\left\{ \left( \int\_\tau^t A(\lambda) \, \mathrm{d}\lambda \right)^{n} \right\} \star \mathbf{x} \,. \end{split} \tag{12.24}$$

The solution of (12.15) can thus be written in the simple form

$$\mathbf{y}(t) = E\_0(t)\mathbf{y}\_0 + E\_\tau(t)\star\mathbf{x}(t) \tag{12.25}$$

with

$$E\_{\tau}(t) = \mathbf{1}\_{+}(t-\tau)\,T\left\{\mathbf{e}^{\int\_{\tau}^{t}A(\lambda)\,\mathrm{d}\lambda}\right\}\tag{12.26}$$

the fundamental solution of the equation relative to τ and where we have made explicit the fact that for *t* < τ it is zero.
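Numerically, the time-ordered exponential in (12.26) can be approximated by a product of short-time propagators e^{*A*(τ*k*)Δτ} with later times applied on the left. The sketch below assumes NumPy; the midpoint sampling, the step count and the non-commuting test matrix *A*(*t*) = [[0, 1], [−*t*, 0]] (i.e. *y*″ + *t* *y* = 0) are illustrative choices:

```python
import numpy as np

def matexp(M, order=20):
    """Truncated Taylor series for the matrix exponential (small ||M||)."""
    E = np.eye(len(M))
    term = np.eye(len(M))
    for n in range(1, order + 1):
        term = term @ M / n
        E = E + term
    return E

def time_ordered_exp(A, t, n=2000):
    """Approximate T{e^{∫_0^t A(λ)dλ}} by an ordered product of short-time
    propagators, later times applied on the left (midpoint sampling)."""
    dt = t / n
    U = np.eye(len(A(0.0)))
    for k in range(n):
        U = matexp(A((k + 0.5) * dt) * dt) @ U
    return U

# Non-commuting test case: A(t) = [[0, 1], [-t, 0]], i.e. y'' + t y = 0.
A = lambda tau: np.array([[0.0, 1.0], [-tau, 0.0]])
U = time_ordered_exp(A, 1.0)
# Power series of the solution with y(0)=1, y'(0)=0: 1 - t³/6 + t⁶/180 - ...
print(abs(U[0, 0] - (1 - 1/6 + 1/180 - 1/12960)) < 1e-4)
```

Left-multiplying by each new factor is exactly the time ordering prescribed by the meta-operator *T*: factors with larger times end up further to the left.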

In the special case in which *A*(.) commutes with $\int\_\tau^t A(\lambda)\,\mathrm{d}\lambda$ the time-ordering operator has no effect and the solution of the equation is a direct generalisation of the solution obtained using the method of separation of variables for the scalar equation

$$E\_{\tau}(t) = \mathbf{1}\_{+}(t-\tau)\mathbf{e}^{\int\_{\tau}^{t}A(\lambda)\,\mathrm{d}\lambda}\,.$$

In particular this is the case if *A*(.) is constant, in which case the fundamental solution becomes

$$E\_{\tau}(t) = \mathbf{1}\_{+}(t-\tau)\mathbf{e}^{A(t-\tau)}, \qquad A \in \mathbb{C}^{n \times n}$$

and the expression for the solution *y* becomes a convolution identical to (8.11).

For this particular case it is interesting to observe that, for a small time increment Δ*t*, the evolution from an initial state *y*0 can be approximated (to first order) by

$$\mathbf{y} (\Delta t) \approx (I + A \Delta t) \cdot \mathbf{y}\_0$$

so that, by iteration

$$\mathbf{y}(n\Delta t) \approx (I + A\Delta t)^{\bullet n} \cdot \mathbf{y}\_0 \,.$$

Now if we set Δ*t* = *t*/*n* we obtain, in the limit as *n* tends to infinity,

$$\lim\_{n \to \infty} \left( I + A \frac{t}{n} \right)^{\bullet n} = \mathbf{e}^{At} \ . $$
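This limit is easy to observe numerically. A minimal sketch, assuming NumPy; the rotation-generating test matrix and the reference Taylor-series exponential are illustrative choices:

```python
import numpy as np

def euler_power(A, t, n):
    """(I + A t/n)^n: n repeated first-order (forward Euler) steps."""
    M = np.eye(len(A)) + A * (t / n)
    P = np.eye(len(A))
    for _ in range(n):
        P = P @ M
    return P

def expm_taylor(M, order=30):
    """Reference e^M via a truncated Taylor series (small ||M||)."""
    E = np.eye(len(M))
    term = np.eye(len(M))
    for k in range(1, order + 1):
        term = term @ M / k
        E = E + term
    return E

A = np.array([[0.0, 1.0], [-1.0, 0.0]])   # e^{At} is a rotation by angle t
errs = [np.abs(euler_power(A, 1.0, n) - expm_taylor(A)).max()
        for n in (10, 100, 1000)]
print(errs[0] > errs[1] > errs[2])   # the approximation improves with n
```

The error of the first-order product shrinks roughly in proportion to 1/*n*, consistent with the limit above.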

The fundamental solution of (12.14) given by (12.26) can also be interpreted as a matrix function of the two variables *t* and τ

$$W(t,\tau) := \mathbf{1}\_+(t-\tau)\,T\left\{\mathbf{e}^{\int\_{\tau}^{t}A(\lambda)\,\mathrm{d}\lambda}\right\}\,.$$

As every element of the matrix is locally integrable, it is also a regular distribution that can be applied to test functions φ ∈ $\mathcal{D}(\mathbb{R}^2)$. In particular, we can choose test functions of the form ψ(*t*)*x**j*(τ) with ψ, *x**j* ∈ $\mathcal{D}(\mathbb{R})$, *j* = 1,..., *n*, in which case we obtain

$$\int\_{-\infty}^{\infty} \int\_{-\infty}^{\infty} W(t,\tau)\psi(t)\mathbf{x}(\tau)\,\mathrm{d}t\,\mathrm{d}\tau = \int\_{-\infty}^{\infty} \int\_{-\infty}^{\infty} W(t,\tau)\psi(t)\,\mathrm{d}t\,\mathbf{x}(\tau)\,\mathrm{d}\tau$$

with *x* = (*x*1,..., *xn*). The inner integral on the right-hand side evaluates to a matrix of indefinitely differentiable functions in $\mathcal{E}$ [16, 35]. For this reason, and remembering that every distribution *f* is the limit of a sequence of indefinitely differentiable functions (for example *fm* = *f* ∗ β*m* with β*m* the test functions of Example 2.4), we can extend *W* ⋆ *x* by continuity to operate on vector-valued distributions in $\mathcal{E}'(\mathbb{R}, \mathbb{C}^n)$ by defining it as the distribution satisfying the system of equations

$$\langle (W \star x)\_i, \psi \rangle = \sum\_{j=1}^n \left\langle x\_j, \int\_{-\infty}^{\infty} w\_{ij}(t, \tau) \psi(t) \, dt \right\rangle, \qquad i = 1, \ldots, n. \tag{12.27}$$

The thus extended linear map *W*⋆ is a distribution-valued continuous function

$$W\star : \mathcal{E}'(\mathbb{R}, \mathbb{C}^n) \to \mathcal{D}'(\mathbb{R}, \mathbb{C}^n)\,.$$

With this definition we obtain for example that the solution of the equation with an input signal

$$\mathbf{x} = \mathbf{y}\_0 \boldsymbol{\delta}(t - t\_0), \qquad \mathbf{y}\_0 \in \mathbb{C}^n$$

is

$$\langle \left( W \star \mathbf{y}\_0 \delta(t - t\_0) \right)\_i, \psi \rangle = \sum\_{j=1}^n y\_{0,j} \int\_{-\infty}^{\infty} w\_{ij}(t, t\_0) \psi(t) \, \mathrm{d}t$$

or

$$W \star \mathbf{y}\_0 \delta(t - t\_0) = W(t, t\_0)\,\mathbf{y}\_0 \,.$$

This shows that the matrix *W* plays a role similar to that of the fundamental solution *E*τ(*t*) and is called the (two-sided) *fundamental kernel* (or *elementary kernel*) of the differential operator *D* − *A*(*t*). It also shows that, as with LTI systems, the initial conditions can be absorbed in the input vector signal *x*.

#### **Example 12.1: Oscillator with Increasing Resonance [33]**

Consider an ideal oscillator with a resonance frequency increasing with the square root of time

$$D^2 \mathbf{y} + \omega\_0^2\, t\, \mathbf{y} = \mathbf{x} \tag{12.28}$$

to which we apply an input signal

$$\mathbf{x} = y\_0\, D\delta + y\_1\, \delta$$

corresponding to the initial conditions *y*(0) = *y*0 and *y*′(0) = *y*1. The equation can be rewritten in state-space form by defining the state

$$u = \begin{bmatrix} y \\ D\mathbf{y} \end{bmatrix}$$

to obtain

$$Du = A(t)u + B\delta, \qquad \mathbf{y} = Cu$$

with

$$A(t) = \begin{bmatrix} 0 & 1 \\ -\omega\_0^2 t & 0 \end{bmatrix}, \qquad B = \begin{bmatrix} y\_0 \\ y\_1 \end{bmatrix}, \qquad C = \begin{bmatrix} 1 & 0 \end{bmatrix} \,.$$

In essence we need to calculate *W*(*t*, 0). Using (12.23) and remembering (12.22) we have, for *n* even

$$A(\tau)^{\star n} \star \mathbf{1}\_+(t) = \begin{bmatrix} \frac{(-1)^{n/2}\, t^{3n/2}\omega\_0^n}{\prod\_{k=1}^n a\_k} & 0\\ 0 & \frac{(-1)^{n/2}\, t^{3n/2}\omega\_0^n}{\prod\_{k=1}^{n-1} b\_k} \end{bmatrix}, \qquad n \text{ even}$$

and for *n* odd

$$A(\tau)^{\star n} \star \mathbf{1}\_{+}(t) = \begin{bmatrix} 0 & \frac{(-1)^{(n-1)/2}\, t^{3(n-1)/2 + 1} \omega\_0^{n-1}}{\prod\_{k=1}^{n-1} b\_k} \\ \frac{(-1)^{(n+1)/2}\, t^{3(n+1)/2 - 1} \omega\_0^{n+1}}{\prod\_{k=1}^{n} a\_k} & 0 \end{bmatrix}, \qquad n \text{ odd}$$

where (*ak* )*<sup>k</sup>*≥<sup>1</sup> and (*bk* )*<sup>k</sup>*≥<sup>1</sup> are the following sequences of integers

$$\begin{aligned} (a\_k)\_{k \ge 1} &:= (2, 3, 5, 6, 8, 9, 11, 12, \dots) \\ (b\_k)\_{k \ge 1} &:= (3, 4, 6, 7, 9, 10, 12, 13, \dots) \end{aligned}$$

The fundamental kernel at (*t*, 0) is thus

$$\begin{split} W(t,0) &= I + \sum\_{n=1}^{\infty} A(\tau)^{\star n} \star \mathbf{1}\_{+}(t) \\ &= \begin{bmatrix} 1 - \frac{\omega\_0^2 t^3}{6} + \frac{\omega\_0^4 t^6}{180} \mp \dotsb & t - \frac{\omega\_0^2 t^4}{12} \pm \dotsb \\ -\frac{\omega\_0^2 t^2}{2} + \frac{\omega\_0^4 t^5}{30} \mp \dotsb & 1 - \frac{\omega\_0^2 t^3}{3} + \frac{\omega\_0^4 t^6}{72} \mp \dotsb \end{bmatrix}. \end{split}$$

The series can be recognised as linear combinations of the Airy functions Ai and Bi and of their derivatives Ai′ and Bi′

$$W(t,0) = \begin{bmatrix} w\_0(t) & w\_1(t) \end{bmatrix}$$

with

$$w\_0(t) = \frac{3^{1/6} \Gamma(2/3)}{2} \begin{bmatrix} \sqrt{3}\,\mathrm{Ai}(-t\omega\_0^{2/3}) + \mathrm{Bi}(-t\omega\_0^{2/3}) \\ -\omega\_0^{2/3} \left(\sqrt{3}\,\mathrm{Ai}'(-t\omega\_0^{2/3}) + \mathrm{Bi}'(-t\omega\_0^{2/3})\right) \end{bmatrix}$$

and

$$w\_1(t) = \frac{\Gamma(1/3)}{2 \cdot 3^{2/3}} \begin{bmatrix} \frac{3\,\mathrm{Ai}(-t\omega\_0^{2/3}) - \sqrt{3}\,\mathrm{Bi}(-t\omega\_0^{2/3})}{\omega\_0^{2/3}}\\ -3\,\mathrm{Ai}'(-t\omega\_0^{2/3}) + \sqrt{3}\,\mathrm{Bi}'(-t\omega\_0^{2/3}) \end{bmatrix}.$$

The signal of interest *y* is thus given by

$$\mathbf{y}(t) = CW(t, 0)B\,.$$

Specifically, for *y*<sup>0</sup> = 1 and *y*<sup>1</sup> = 0 (see Fig. 12.1)

$$\mathbf{y}(t) = \frac{3^{1/6}\Gamma(2/3)}{2} \Big( \sqrt{3}\,\mathrm{Ai}(-t\omega\_0^{2/3}) + \mathrm{Bi}(-t\omega\_0^{2/3}) \Big).$$
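The series entries of *W*(*t*, 0) can be checked independently of the Airy-function identities. Inserting a power series into *y*″ + ω₀²*t* *y* = 0 gives a three-term recurrence for the coefficients; a small sketch using exact rational arithmetic (ω₀ normalised to 1, an assumption made here purely for convenience):

```python
from fractions import Fraction

def series_coeffs(c0, c1, n_terms):
    """Power-series coefficients of y'' + t y = 0 (ω0 normalised to 1).
    Inserting y = Σ c_n t^n gives c_2 = 0 and the recurrence
    c_{n+2} = -c_{n-1} / ((n+2)(n+1)) for n ≥ 1."""
    c = [Fraction(0)] * n_terms
    c[0], c[1] = Fraction(c0), Fraction(c1)
    for n in range(1, n_terms - 2):
        c[n + 2] = -c[n - 1] / ((n + 2) * (n + 1))
    return c

# y(0)=1, y'(0)=0 reproduces the (1,1) entry of W(t,0): 1 - t³/6 + t⁶/180 ∓ ...
c = series_coeffs(1, 0, 10)
print(c[3] == Fraction(-1, 6), c[6] == Fraction(1, 180))
# y(0)=0, y'(0)=1 reproduces the (1,2) entry: t - t⁴/12 ± ...
d = series_coeffs(0, 1, 10)
print(d[4] == Fraction(-1, 12))
```

The two initial-condition choices reproduce the first column pair of the matrix series, confirming the products of the *a**k* and *b**k* sequences.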

The full fundamental kernel *W*(*t*,τ) can be obtained using Eqs. (12.9) and (12.12) and computing the inverse of *W*(*t*, 0)

$$W(t,\tau) = \mathbf{1}\_+(t-\tau)W(t,0)[W(\tau,0)]^{-1}.$$

# *12.2.3 Perturbation Theory*

The solution of (12.14) presented above is of great theoretical value. However, when it comes to solving practical problems it is in general very difficult to find a closed form for the fundamental kernel *W*(*t*,τ). In many situations the problem at hand looks similar to a solvable problem, but with additional terms. If those terms are small in comparison to the ones of the solvable problem then one can obtain a good approximation to the solution of the problem by the following perturbative method.

Suppose that the matrix *A*(.) can be split into two parts: one that leads to a solvable problem, which we denote by *A*0(.), and one with relatively small elements, the perturbation term, that makes the equation unsolvable, which we denote by *A*˜(.)

$$D\mathbf{y} = [A\_0(t) + \tilde{A}(t)]\mathbf{y} + \mathbf{x} \ .$$

Let *W*0(*t*,τ) be the fundamental kernel of the solvable part of the equation, *Y*0(.) its principal fundamental matrix, that is, the solution of the matrix equation

$$DY\_0(t) = A\_0(t)Y\_0(t)\,, \qquad Y\_0(0) = I$$

and let us express *y* in terms of a new vector *y*˜ defined by

$$\mathbf{y} = Y\_0(t)\tilde{\mathbf{y}}\,.$$

Then the equation becomes

$$[DY\_0(t)]\tilde{\mathbf{y}} + Y\_0(t)D\tilde{\mathbf{y}} = [A\_0(t) + \tilde{A}(t)]Y\_0(t)\tilde{\mathbf{y}} + \mathbf{x}$$

which reduces to

$$D\tilde{\mathbf{y}} = \mathcal{Q}(t)\tilde{\mathbf{y}} + Y\_0^{-1}(t)x$$

with

$$Q(t) := Y\_0^{-1}(t)\tilde{A}(t)Y\_0(t)\,.$$

This equation has the same form as the original one. Its solution is therefore given by

$$\tilde{\mathbf{y}}(t) = \mathbf{1}\_{+}(t-\tau)\,T\left\{\mathbf{e}^{\int\_{\tau}^{t} Q(\lambda) \, \mathrm{d}\lambda}\right\} \star \left[Y\_0^{-1}(t)\mathbf{x}(t)\right].$$

The advantage that we gain is the fact that, if the elements of *A*˜(.) are small, then the series expansion of this solution converges very quickly. Differently from this, to obtain a good approximation using the series of the original formulation of the problem requires a large number of terms (compare with Example 12.1).

If *x* is composed of regular distributions then the first terms of the solution of the equation are given by

$$\begin{aligned} \mathbf{y}(t) &= Y\_0(t) \int\_0^t Y\_0^{-1}(\lambda) \mathbf{x}(\lambda) \, \mathrm{d}\lambda \\ &+ Y\_0(t) \int\_0^t \int\_0^\tau Q(\tau) Y\_0^{-1}(\lambda) \mathbf{x}(\lambda) \, \mathrm{d}\lambda \, \mathrm{d}\tau + \cdots \\ &= \int\_0^t W\_0(t, \lambda) \mathbf{x}(\lambda) \, \mathrm{d}\lambda + \int\_0^t \int\_0^\tau W\_0(t, \tau) \tilde{A}(\tau) W\_0(\tau, \lambda) \mathbf{x}(\lambda) \, \mathrm{d}\lambda \, \mathrm{d}\tau + \cdots \end{aligned}$$

The first term that we denote by *y*<sup>0</sup> is the solution of the unperturbed equation. In general, it is given by

$$\mathbf{y}\_0 = W\_0 \star \mathbf{x}\,.$$

The next term is the first order perturbation term that we denote by *y*1. Note that it can be expressed as the action of the unperturbed system on an input signal *x*<sup>1</sup> constructed by multiplying *y*<sup>0</sup> by the perturbation *A*˜

$$\mathbf{y}\_1 = W\_0 \star \mathbf{x}\_1, \qquad \mathbf{x}\_1(t) = \tilde{A}(t)\mathbf{y}\_0(t) \,.$$

Similarly, the *n*th order perturbation term can be represented as the action of the unperturbed system on an input signal obtained by multiplying the perturbation term of order *n* − 1 by *A*˜

$$\mathbf{y}\_n = W\_0 \star \mathbf{x}\_n, \qquad \mathbf{x}\_n(t) = \tilde{A}(t)\mathbf{y}\_{n-1}(t) \,.$$

The output of the system

$$\mathbf{y}(t) = \sum\_{n=0}^{\infty} \mathbf{y}\_n(t)$$

can thus be calculated iteratively, starting from the response of the unperturbed system, where each successive term is the result of multiplying the output of the previous term by *A*˜ and feeding it back as input to the unperturbed system. This is reminiscent of a feedback system, with the unperturbed system playing the role of the forward path and *A*˜ that of the feedback path.
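The feedback iteration is easy to sketch numerically. The example below (my own scalar test case, not the book's) takes *A*0 = 0, so that *W*0(*t*, τ) = 1₊(*t* − τ) and each order is just an integration of the previous order multiplied by the perturbation *ã*(*t*); the exact solution of *Dy* = *ã*(*t*)*y* + δ is then e^{∫₀ᵗ *ã*}:

```python
import numpy as np

def perturbation_series(a_tilde, ts, n_orders=12):
    """Perturbation solution of Dy = ã(t) y + δ with unperturbed part
    A0 = 0, so that W0(t, τ) = 1_+(t - τ).  Each order feeds the previous
    one back through the unperturbed system:
        y_n = W0 ⋆ x_n,   x_n(t) = ã(t) y_{n-1}(t).
    """
    dt = ts[1] - ts[0]
    y = np.ones_like(ts)            # y_0 = 1_+(t), the unperturbed response
    y_prev = y.copy()
    for _ in range(n_orders):
        x_n = a_tilde(ts) * y_prev
        y_prev = np.cumsum(x_n) * dt    # W0 ⋆ x_n = ∫_0^t x_n (rectangle rule)
        y = y + y_prev
    return y

# ã(t) = 0.5 cos t: the exact solution is e^{0.5 sin t}
ts = np.linspace(0.0, 2.0, 4001)
y = perturbation_series(lambda t: 0.5 * np.cos(t), ts)
print(np.abs(y - np.exp(0.5 * np.sin(ts))).max() < 1e-2)
```

With a small perturbation the series converges after a handful of feedback passes, in line with the remark above about fast convergence.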

# *12.2.4 Non-smooth Coefficients*

For several applications the requirement of differential operators with indefinitely differentiable coefficients is too restrictive. In those situations it's useful to work in the subspace of $\mathcal{D}'$ constituted by distributions that are *m* times differentiable, denoted by $\mathcal{D}'^m$. These distributions are said to be of *order m* and are the continuous linear functionals on the set $\mathcal{D}^m$ of *m* times continuously differentiable functions with compact support. Convergence is defined in a similar way as for distributions in $\mathcal{D}'$.

Given a distribution *f* ∈ $\mathcal{D}'^m$, the product of *f* with an *m* times continuously differentiable function *g* is well-defined

$$
\langle fg, \phi \rangle = \langle f, g\phi \rangle
$$

since if φ ∈ $\mathcal{D}^m$ then *g*φ is also in $\mathcal{D}^m$. Note that we can exchange the roles of *f* and *g* and still obtain a well-defined multiplication. Thus, if *f* is an *m* times continuously differentiable function, it can be multiplied by a distribution of order *m*.

#### **Example 12.2: Dirac Distribution**

The Dirac distribution δ belongs to $\mathcal{D}'^0$ and its multiplication with continuous functions is well-defined as long as one restricts considerations to $\mathcal{D}'^0$.

# **12.3 Impulse Response Generalisation**

In the previous section we saw that differential equations describing LTV systems aren't convolution equations. In spite of this, we found that the solution of the equation can be written with the help of the ⋆ operator acting on a (matrix) function characterising the system (the fundamental kernel) and the input vector *x*. In particular, for a system described by the state-space representation (12.3)–(12.4) with *D*(*t*) = 0, the input-output characteristic is given by

$$\begin{split} \mathbf{y}(t) &= C(t)\mathbf{1}\_{+}(t-\tau)T\{\mathbf{e}^{\int\_{\tau}^{t}A(\lambda)\,\mathrm{d}\lambda}\} \star B(t)\mathbf{x}(t) \\ &= C(t)\mathbf{1}\_{+}(t-\tau)T\{\mathbf{e}^{\int\_{\tau}^{t}A(\lambda)\,\mathrm{d}\lambda}\}B(\tau)\star\mathbf{x}(t) \,. \end{split}$$

This expression highlights how in LTV systems the ⋆ operator is the natural operator taking the place of convolution in LTI systems. However, because the Fourier and Laplace transforms convert convolutions into products, and because for many purposes the frequency domain characteristics of a system are more interesting than the time domain ones, in engineering circles it is common to express the input-output characteristic of LTV systems in terms of a convolution-like operator, as we did in Sect. 12.1. This is easily done by the change of variable

$$\xi = t - \tau$$

and by defining the time-varying impulse response *h*(*t*,ξ) as a function of the variables *t* and ξ

$$\mathbf{y}(t) = \int\_{0}^{t} h(t, t - \xi) \, \mathbf{x}(\xi) \, d\xi.$$

$$h(t, \xi) = C(t) \mathbf{1}\_{+}(\xi) T \left\{ \mathbf{e}^{\int\_{t-\xi}^{t} A(\lambda) \, \mathrm{d}\lambda} \right\} B(t - \xi)$$

where we have assumed *x* to be a regular distribution in $\mathcal{D}'\_+$. Note that, while the above integral looks very similar to a convolution, it differs from the convolution that we defined in Sect. 3.2. A generalisation of the above convolution-like operation for LTV systems is obtained by adapting (12.27) and defining it as the distribution satisfying the following equality

$$\langle h \ast\_t x, \phi \rangle = \langle x(\xi), \langle h(t, t - \xi), \phi(t) \rangle \rangle \tag{12.29}$$

where we have generalised the inner integral of (12.27) to the application of a parameterised distribution to the test function φ. Since this operation shares several properties with convolution, the operator ∗*t* is called the *convolution product for time-varying systems*. In the technical literature it is most often called convolution and denoted by the same symbol as the one used for convolution. In the following we will also often simply call it convolution, but we maintain the use of the symbol ∗*t* to make it clear that it is not the operation defined by (3.6).

In the special case in which the time-varying impulse response is the product of an indefinitely differentiable function *f* and a distribution *g*

$$h(t, \xi) = f(t)g(\xi)\,,$$

the convolution product for time-varying systems can be expressed in terms of a proper convolution by

$$
\langle h \ast\_t x, \phi \rangle = \langle g \ast x, f\phi \rangle \,.
$$

In the previous section we discussed the fact that, for systems described by a differential equation, the application ⟨*h*(*t*, *t* − ξ), φ(*t*)⟩ appearing on the right-hand side of (12.29), regarded as a function of the parameter ξ, is a function belonging to $\mathcal{E}$. For this reason, for the equation to have a meaning, *x* must be restricted to distributions in $\mathcal{E}'$. However, if we define a function γ ∈ $\mathcal{E}$ bounded from the left with γ(*t*) = 1 in a neighbourhood of [0, ∞) and assume *h* to be such that

$$\xi \mapsto \gamma(\xi) \langle h(t, t - \xi), \phi(t) \rangle$$

is a Schwartz function for every φ ∈ S, then (12.29) remains valid for right-sided tempered distributions

$$
x \in \mathcal{S}' \cap \mathcal{D}'\_+ \,.
$$

Note the similarity with the definition of the Laplace transform and the fact that, as for the Laplace transform, the value of the distribution does not depend on the choice of γ. For this reason, and as is commonly done for the Laplace transform, we will generally not write γ explicitly.

Before concluding this section we note some properties of the operator ∗*<sup>t</sup>* . The first is that it is associative

$$(h\_B \ast\_t h\_A) \ast\_t x = h\_B \ast\_t (h\_A \ast\_t x)$$

with *hA* and *hB* the time-varying impulse responses of two systems. This is a direct consequence of the fact that ∗*t* is related to ⋆ by a simple variable transformation and of the definition of the latter (see Eqs. (12.17) and (12.18)).

A second important property, or rather the lack of it, is that ∗*t* is not commutative. Therefore, differently from LTI systems, the order in which LTV systems are cascaded is important. As an example, consider the cascade of a low-pass filter with a 3 dB cut-off frequency of ω3dB and the frequency-shifting system of Example 12.4 with ω0 ≫ ω3dB. Suppose that the system is driven by a signal with a frequency falling in the pass-band of the LPF. Then, if the signal passes first through the LPF and then into the frequency-shifting system, the output will have a large magnitude. Differently from this, if the input signal first passes through the frequency-translating system, then the signal at the input of the LPF will lie in the stop-band of the latter and will appear much attenuated at its output.
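A discrete-time caricature of this experiment makes the non-commutativity tangible. The sketch below assumes NumPy; the sample rate, cut-off frequency, tone frequency and shift amount are all arbitrary illustrative choices:

```python
import numpy as np

fs = 1000.0                                 # sample rate in Hz (arbitrary)
n = np.arange(5000)
x = np.exp(2j * np.pi * 5.0 * n / fs)       # 5 Hz tone, inside the pass-band

def lpf(sig, fc=20.0):
    """First-order IIR low-pass filter with cut-off fc (Hz)."""
    a = np.exp(-2 * np.pi * fc / fs)
    y = np.zeros_like(sig)
    acc = 0.0
    for i, s in enumerate(sig):
        acc = a * acc + (1 - a) * s
        y[i] = acc
    return y

shift = np.exp(2j * np.pi * 200.0 * n / fs)   # shift by 200 Hz >> fc

y_lpf_first = lpf(x) * shift        # filter first, then frequency-shift
y_shift_first = lpf(x * shift)      # frequency-shift first, then filter
p1 = np.mean(np.abs(y_lpf_first) ** 2)
p2 = np.mean(np.abs(y_shift_first) ** 2)
print(p1 > 10 * p2)   # the two cascade orders give very different powers
```

In the first order the tone passes the filter almost unattenuated; in the second it is first shifted into the stop-band and then strongly attenuated, exactly the behaviour described above.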

# **12.4 Time-Varying Frequency Response**

# *12.4.1 Definition*

Consider a system described by the time-varying impulse response *h*(*t*,ξ). Under the assumption that the input signal *x* is a right-sided tempered distribution the system response *y* can be written as

$$\begin{aligned} \langle \mathbf{y}(t), \phi(t) \rangle &= \left\langle \mathcal{F}^{-1} \{ \mathcal{F} \{ \mathbf{x} \} \}, \int\_{-\infty}^{\infty} h(t, t - \xi) \phi(t) \, \mathrm{d}t \right\rangle \\ &= \left\langle \hat{\mathbf{x}}(\omega), \frac{1}{2\pi} \int\_{-\infty}^{\infty} \int\_{-\infty}^{\infty} h(t, t - \xi) \phi(t) \, \mathrm{d}t \, \mathbf{e}^{\mathrm{j}\omega\xi} \, \mathrm{d}\xi \right\rangle \\ &= \left\langle \hat{\mathbf{x}}(\omega), \frac{1}{2\pi} \int\_{-\infty}^{\infty} \int\_{-\infty}^{\infty} h(t, t - \xi) \mathbf{e}^{-\mathrm{j}\omega(t - \xi)} \, \mathrm{d}\xi \, \mathbf{e}^{\mathrm{j}\omega t} \, \phi(t) \, \mathrm{d}t \right\rangle \\ &= \left\langle \hat{\mathbf{x}}(\omega), \frac{1}{2\pi} \int\_{-\infty}^{\infty} \hat{h}(t, \omega) \, \mathbf{e}^{\mathrm{j}\omega t} \, \phi(t) \, \mathrm{d}t \right\rangle \\ &= \left\langle \frac{1}{2\pi} \hat{h}(t, \omega) \, \mathbf{e}^{\mathrm{j}\omega t} \star \hat{\mathbf{x}}(t), \, \phi(t) \right\rangle \end{aligned}$$

or

$$\mathbf{y}(t) = \frac{1}{2\pi}\hat{h}(t,\omega)\,\mathbf{e}^{\mathrm{j}\omega t}\,\star\,\hat{\mathbf{x}}(t)\tag{12.30}$$

where *h*ˆ(*t*, ω) is the Fourier transform with respect to ξ of *h*(*t*,ξ) and is called the *time-varying frequency response* of the system

$$\hat{h}(t,\omega) := \int\_{-\infty}^{\infty} h(t,\xi) \mathbf{e}^{-j\omega \xi} \, d\xi \,. \tag{12.31}$$

In particular, for regular distributions we have

$$\mathbf{y}(t) = \frac{1}{2\pi} \int\_{-\infty}^{\infty} \hat{h}(t, \omega) \hat{\mathbf{x}}(\omega) \,\mathbf{e}^{\mathrm{j}\omega t} \,\mathrm{d}\omega \,.$$

It's easy to check that for real systems the time-varying frequency response at −ω is equal to the complex conjugate of its value at ω

$$
\hat{h}(t, -\omega) = \overline{\hat{h}}(t, \omega)
$$

for each value of *t*.

To obtain a physical interpretation of *h*ˆ(*t*, ω) we apply a complex tone $\mathbf{e}^{\mathrm{j}\omega\_0 t}$ as input signal. This is allowed because periodic distributions are isomorphic to distributions with compact support (see Sect. 3.4). With this input signal the output of the system is found with the help of (12.30) to be

$$\mathbf{y}(t) = \hat{h}(t, \omega\_0)\, \mathbf{e}^{\mathrm{j}\omega\_0 t}$$

which suggests interpreting the time-varying frequency response *h*ˆ(*t*, ω0) as the complex envelope at ω0 of the output signal (see Fig. 12.2).
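This tone-response interpretation is easy to verify for a separable impulse response. The sketch below uses the illustrative choice *h*(*t*, ξ) = *f*(*t*) 1₊(ξ) e^{−ξ} (not an example from the text), for which *h*ˆ(*t*, ω) = *f*(*t*)/(1 + jω):

```python
import numpy as np

# Separable illustrative impulse response (an assumption for this sketch):
#   h(t, ξ) = f(t) 1_+(ξ) e^{-ξ}   ⇒   ĥ(t, ω) = f(t) / (1 + jω)
f = lambda t: 1.0 + 0.5 * np.sin(t)
w0 = 3.0
t = 2.0                                  # observation instant

# Direct output sample: y(t) = ∫_0^∞ h(t, ξ) e^{jω0 (t - ξ)} dξ
xi = np.linspace(0.0, 40.0, 40001)
dxi = xi[1] - xi[0]
g = f(t) * np.exp(-xi) * np.exp(1j * w0 * (t - xi))
y_direct = (g.sum() - 0.5 * (g[0] + g[-1])) * dxi   # trapezoidal rule

# Via the time-varying frequency response: y(t) = ĥ(t, ω0) e^{jω0 t}
y_freq = f(t) / (1 + 1j * w0) * np.exp(1j * w0 * t)
print(abs(y_direct - y_freq) < 1e-4)
```

The numerically integrated output sample matches *h*ˆ(*t*, ω0)e^{jω0*t*}: the tone is reproduced with a time-varying complex envelope.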

If the output signal *y* is a tempered distribution it can be Fourier transformed. A useful expression relating the spectrum of *y* and the one of the input signal *x* can be obtained by expressing *y* with the help of (12.30)

**Fig. 12.2** Illustrative representation of the response of a real LTV system to an input tone

$$\begin{aligned} \langle \hat{\mathbf{y}}, \phi \rangle &= \left\langle \mathcal{F} \Big\{ \frac{1}{2\pi} \hat{h}(t, \omega) \, \mathbf{e}^{\mathrm{j}\omega t} \star \hat{\mathbf{x}} \Big\}, \phi(w) \right\rangle \\ &= \left\langle \hat{\mathbf{x}}(\omega), \frac{1}{2\pi} \int\_{-\infty}^{\infty} \hat{h}(t, \omega) \, \mathbf{e}^{\mathrm{j}\omega t} \, \hat{\phi}(t) \, \mathrm{d}t \right\rangle \\ &= \left\langle \hat{\mathbf{x}}(\omega), \frac{1}{2\pi} \int\_{-\infty}^{\infty} \int\_{-\infty}^{\infty} \hat{h}(t, \omega) \, \mathbf{e}^{-\mathrm{j}(w - \omega)t} \, \mathrm{d}t \, \phi(w) \, \mathrm{d}w \right\rangle \\ &= \left\langle \hat{\mathbf{x}}(\omega), \frac{1}{2\pi} \int\_{-\infty}^{\infty} \hat{\hat{h}}(w - \omega, \omega) \, \phi(w) \, \mathrm{d}w \right\rangle \\ &= \left\langle \frac{1}{2\pi} \hat{\hat{h}}(w - \omega, \omega) \star \hat{\mathbf{x}}(w), \phi(w) \right\rangle \end{aligned}$$

or

$$
\hat{\mathbf{y}}(w) = \frac{1}{2\pi} \hat{\hat{h}}(w - \omega, \omega) \star \hat{\mathbf{x}}(w) \tag{12.32}
$$

with

$$\hat{\hat{h}}(w,\omega) := \int\_{-\infty}^{\infty} \hat{h}(t,\omega) \,\mathrm{e}^{-jwt} \,dt\,. \tag{12.33}$$

The function ˆ*h*ˆ is the two-dimensional Fourier transform of the time-varying impulse response *h* and, in the context of communication systems, is called the *Doppler-spread function*. Equation (12.32) shows that the input and output spectra of an LTV system are related by a convolution-like operation. In particular, for regular distributions they are related by the following integral

$$
\hat{y}(w) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \hat{\hat{h}}(w - \omega, \omega)\,\hat{x}(\omega)\,d\omega\,.
$$

For tempered distributions the time-varying impulse response $h$, the time-varying frequency response $\hat{h}$ and the Doppler-spread function $\hat{\hat{h}}$ are isomorphic to each other. For this reason an LTV system with a tempered time-varying impulse response can be described by any of these functions.

#### **Example 12.3**

In this example we investigate the relationship between an LTI system driven by a right-sided input tone and an LTV system activated at *t* = 0 s and driven by a (two-sided) tone; the two produce equal output signals.

Consider an LTI system described by the differential equation

$$Dy + ay = x\,, \qquad a > 0$$

to which we apply the signal $x(t) = \mathbb{1}_+(t)\,\mathrm{e}^{j\omega t} \in \mathcal{D}'_+$. The response of the system can be calculated with the help of the Laplace transform. The transfer function of the system and the Laplace transform of the input signal are

$$H(s) = \frac{1}{s+a}\,, \qquad \Re\{s\} > -a$$

and

$$X(s) = \frac{1}{s - j\omega}\,, \qquad \Re\{s\} > 0$$

respectively. The system response is thus found by inverse Laplace transforming

$$Y(s) = H(s)X(s) = \frac{1}{(s + a)(s - j\omega)}\,, \qquad \Re\{s\} > 0$$

which gives

$$y(t) = \frac{\mathrm{e}^{-at}}{a + j\omega}\Big(\mathrm{e}^{(a+j\omega)t} - 1\Big)\,.$$

We now re-interpret the system as a time-variable one consisting of the above LTI system and an ideal switch at its input. For *t* < 0 the input is disconnected from the system (switch open) which therefore produces the constant output signal *y*(*t*) = 0. At *t* = 0 the input signal is connected to the input of the LTI system by closing the switch. The full system is therefore described by the differential equation

$$Dy + ay = \mathbb{1}_+(t)\,x\,.$$

The input signal is now the complex tone $x(t) = \mathrm{e}^{j\omega t}$.

To obtain the system response we first compute the time evolution operator $U(t,\tau)$, which is the solution of

$$Dy + ay = \delta(t - \mathfrak{r})\,, \qquad t \ge \mathfrak{r} > 0\,\}$$

and given by

$$U(t, \tau) = \mathbb{1}_+(t)\,\mathbb{1}_+(t-\tau)\,\mathrm{e}^{-a(t-\tau)}\,.$$

With it the response of the system to the input $x(t) = \mathrm{e}^{j\omega t}$ is calculated to be

$$\begin{aligned} y(t) &= U(t,\tau) \star x(t) = \int_0^t \mathrm{e}^{-a(t-\tau)}\,\mathrm{e}^{j\omega\tau}\,d\tau \\ &= \frac{\mathrm{e}^{-at}}{a+j\omega}\Big(\mathrm{e}^{(a+j\omega)t} - 1\Big) \end{aligned}$$

which of course agrees with the calculation through the Laplace transform. However, with the new interpretation we see that the system possesses a time-varying frequency response $\hat{h}(t,\omega)$. The easiest way to calculate it is through the relation $y(t) = \hat{h}(t,\omega)\,\mathrm{e}^{j\omega t}$, and we obtain

$$
\hat{h}(t,\omega) = \frac{\mathbf{e}^{-(a+j\omega)t}}{a+j\omega} \Big(\mathbf{e}^{(a+j\omega)t} - 1\Big).
$$

This shows the relationship between $\hat{h}(t,\omega)$ and the LTI frequency response $H(j\omega) = 1/(a + j\omega)$. Unlike the latter, $\hat{h}(t,\omega)$ includes the full information about the variation in time of the system; in this particular example, the information about when the switch is closed.
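The agreement between the two calculations can also be checked numerically. The following sketch (all parameter values are assumed for illustration) integrates $Dy + ay = \mathbb{1}_+(t)\,\mathrm{e}^{j\omega t}$ from $y(0) = 0$ with a simple RK4 scheme and compares the result with the closed-form response obtained above.

```python
import numpy as np

# Assumed parameters for illustration
a, w = 0.7, 2.0
dt, T = 1e-4, 5.0

def f(t, y):
    # right-hand side of D y + a y = e^{j w t} for t >= 0
    return -a * y + np.exp(1j * w * t)

# RK4 integration of the switched system starting from y(0) = 0
y, t = 0.0 + 0.0j, 0.0
for _ in range(int(round(T / dt))):
    k1 = f(t, y)
    k2 = f(t + dt / 2, y + dt / 2 * k1)
    k3 = f(t + dt / 2, y + dt / 2 * k2)
    k4 = f(t + dt, y + dt * k3)
    y += dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    t += dt

# closed-form response from the Laplace-transform calculation
y_exact = np.exp(-a * T) / (a + 1j * w) * (np.exp((a + 1j * w) * T) - 1)
print(abs(y - y_exact))  # close to zero
```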

#### **Example 12.4: Frequency Translation**

Consider a system described by the doppler-spread function

$$
\hat{\hat{h}}(w,\omega) = 2\pi\delta(w - w_0)\,.
$$

According to (12.32) the spectrum of the output signal is given by

$$
\hat{y}(w) = \delta(w - w_0 - \omega) \star \hat{x}(\omega) = \hat{x}(w - w_0)\,.
$$

Therefore, the effect of the system described by the above Doppler-spread function is to shift the spectrum of the input signal in frequency by $w_0$. Such a device is referred to as a *mixer*.

The time-varying frequency response and the time-varying impulse response corresponding to this Doppler-spread function are easily calculated to be

$$
\hat{h}(t,\omega) = \mathrm{e}^{jw_0 t}
$$

**Fig. 12.3** Block diagram of a frequency-translating LTV system

**Fig. 12.4** Ideal sample and hold

and

$$h(t, \xi) = \mathrm{e}^{jw_0 t}\,\delta(\xi)$$

respectively. If we apply a complex tone $\mathrm{e}^{j\omega_0 t}$ as input signal we can calculate the output signal from the former and (12.30) as

$$y(t) = \frac{1}{2\pi}\,\mathrm{e}^{j(w_0+\omega)t} \star 2\pi\delta(\omega - \omega_0) = \mathrm{e}^{j(w_0+\omega_0)t}$$

or from the latter and (12.29) as

$$y(t) = \mathrm{e}^{jw_0 t}\,\delta(\xi) *_t \mathrm{e}^{j\omega_0 t} = \mathrm{e}^{j(w_0+\omega_0)t}\,.$$

In both cases the angular frequency of the input tone is shifted by $w_0$ as expected.

The time-varying impulse response shows clearly that the system is memoryless, that is, the value of the output signal at time *t* only depends on the input signal at time *t*. The effect of the system is simply to multiply the input signal by the complex tone $\mathrm{e}^{jw_0 t}$ as illustrated in Fig. 12.3.
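The frequency-shifting action of the mixer is easy to observe numerically. The sketch below (sampling rate and frequencies are assumed for illustration) multiplies a sampled complex tone by a complex exponential and checks with the FFT that the spectral peak moves by the shift frequency.

```python
import numpy as np

# Assumed parameters: N samples at fs Hz, i.e. one FFT bin per hertz
N, fs = 1024, 1024.0
t = np.arange(N) / fs
f_in, f_shift = 50.0, 30.0                  # input tone and translation (Hz)

x = np.exp(2j * np.pi * f_in * t)           # input tone
y = np.exp(2j * np.pi * f_shift * t) * x    # memoryless LTV (mixer) output

peak = int(np.argmax(np.abs(np.fft.fft(y))))
print(peak)  # 80, i.e. f_in + f_shift
```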

#### **Example 12.5: Sample and Hold**

In this example we consider an ideal sample and hold: the output of the system is constructed by sampling the input signal *x* at regular intervals $\mathcal{T}$ and holding the value of each sample constant for the duration of a period $\mathcal{T}$. Sample and hold blocks are used, for example, at the input of analog-to-digital converters (ADCs) to give the converter enough time to compare the value of a sample with one or more reference signal levels. The operation of a sample and hold is illustrated in Fig. 12.4.

The ideal sample and hold is characterised by the following time-varying impulse response

$$h(t,\xi) = \delta_{\mathcal{T}}(t-\xi)\,\mathbb{1}_{\mathcal{T}}(\xi) = \sum_{n=-\infty}^{\infty} \delta(t-\xi-n\mathcal{T})\,\mathbb{1}_{\mathcal{T}}(\xi)$$

with

$$\mathbb{1}_{\mathcal{T}}(\xi) = \begin{cases} 1 & 0 \le \xi < \mathcal{T} \\ 0 & \text{otherwise}\,. \end{cases}$$

Note that in this case (12.29) doesn't make sense as *h* is a singular distribution and in the right-hand side expression *x* is not applied to a smooth function. To give a meaning to

$$y(t) = h(t,\xi) *_t x(t)$$

we have to restrict the input signal $x$ to belong to $\mathcal{E}$. Then we can write

$$\begin{aligned} \langle y, \phi \rangle &= \langle h(t,\xi) *_t x(t), \phi(t)\rangle \\ &= \sum_{n=-\infty}^{\infty}\left\langle x(\xi)\,\delta(\xi - n\mathcal{T}), \int_{\xi}^{\xi+\mathcal{T}} \phi(t)\,dt \right\rangle \\ &= \sum_{n=-\infty}^{\infty} x(n\mathcal{T})\,\langle \mathbb{1}_{\mathcal{T}}(t - n\mathcal{T}), \phi(t)\rangle \end{aligned}$$

or

$$y(t) = \sum_{n=-\infty}^{\infty} x(n\mathcal{T})\,\mathbb{1}_{\mathcal{T}}(t - n\mathcal{T})$$

and we obtain the desired system response. The system response can also be written as a (proper) convolution

$$y(t) = \big[\mathcal{T}\delta_{\mathcal{T}}(t)\,x(t)\big] * \frac{1}{\mathcal{T}}\mathbb{1}_{\mathcal{T}}(t)\,.$$

From this expression, assuming *x* to be Fourier transformable, it's easy to compute the output spectrum. From (4.14) we read that the Fourier transform of $\mathcal{T}\delta_{\mathcal{T}}\,x$ is the convolution of the transforms of the factors divided by $2\pi$

$$\mathcal{F}\{\mathcal{T}\delta_{\mathcal{T}}\,x\} = \delta_{\omega_s} * \hat{x}$$

with ω*<sup>s</sup>* the sampling angular frequency 2π/T. Thus, the output spectrum is

$$\hat{y}(\omega) = \big[\delta_{\omega_s} * \hat{x}(\omega)\big]\,\frac{1}{\mathcal{T}}\hat{\mathbb{1}}_{\mathcal{T}}(\omega)$$

with

$$\hat{\mathbb{1}}_{\mathcal{T}}(\omega) = \mathcal{T}\,\frac{\sin\big(\pi\frac{\omega}{\omega_s}\big)}{\pi\frac{\omega}{\omega_s}}\,\mathrm{e}^{-j\omega\frac{\mathcal{T}}{2}}\,.$$

This expression shows clearly the effects of sampling and of holding in the frequency domain. The operation of sampling is represented by the factor in square brackets. Its effect is to produce an infinite number of copies of the spectrum of the input signal shifted by multiples of ω*<sup>s</sup>*

$$\delta_{\omega_s} * \hat{x}(\omega) = \sum_{n=-\infty}^{\infty} \hat{x}(\omega - n\omega_s)\,.$$

If the original signal has to be recovered from the samples then one must avoid (or reduce to negligible levels) overlapping between the copies. This amounts to saying that the power of the input signal residing outside the frequency range $(-\omega_s/2, \omega_s/2)$ must be negligible. Or, in other words, the sampling frequency must be at least twice the frequency of the highest component of the input signal spectrum carrying a non-negligible amount of power. This is the statement of the famous *sampling theorem*. If this condition is satisfied then the input signal can be recovered with the help of a low-pass filter eliminating the copies with $n \neq 0$. When the copies of the input signal do overlap one says that sampling causes *aliasing*. Note that, if the spectrum of the input signal $x$ only occupies a small fraction of the frequency range $(-\omega_s/2, \omega_s/2)$, then one may find a sampling frequency lower than $\omega_s$ not causing aliasing.

The holding operation acts as an LTI filter introducing a delay of $\mathcal{T}/2$. The filter has a low-pass characteristic with notches at multiples of $\omega_s$. The effects of sampling and of holding on the spectrum of a signal are illustrated in Fig. 12.5.
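The aliasing mechanism can be illustrated with a few lines of code. In the following sketch (the rates are assumed for illustration) a real tone at 70 Hz is sampled at 100 Hz; since 70 Hz lies above half the sampling frequency, the sampled sequence is indistinguishable from a 30 Hz tone.

```python
import numpy as np

# Assumed parameters: sampling rate fs, tone frequency f0 > fs/2
fs, f0, N = 100.0, 70.0, 1000
n = np.arange(N)
samples = np.cos(2 * np.pi * f0 * n / fs)   # x(nT) with T = 1/fs

# the spectrum of the samples peaks at the alias frequency |f0 - fs| = 30 Hz
spec = np.abs(np.fft.rfft(samples))
f_axis = np.fft.rfftfreq(N, d=1.0 / fs)
print(f_axis[int(np.argmax(spec))])  # 30.0, not 70.0
```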

The need to restrict *x* to being an indefinitely differentiable function may seem like an excess of rigor. Note however that if *x* is not continuous at the sample instants $n\mathcal{T}$ then the problem is not "merely" a mathematical one: any physical implementation will fail to work properly. This is so because if the input signal varies very rapidly compared to the actual speed of the physical sampling switch, then the value of the sample will be affected by many implementation details and in particular by noise. The result is a system producing unpredictable sample values.

From a mathematical point of view one may enlarge the type of allowed input signals to the class of continuous functions. Then the system response is mathematically well-defined, but it's not a distribution anymore. In fact, the value of a Dirac impulse is defined as the value of the test function at zero. If we multiply the test function with a continuous function, the value is still well-defined. However, we can't expect to be able to compute the derivatives of the output signal. Compare also with Sect. 12.2.4.

We started this section by performing a calculation leading to the definition of the time-varying frequency response of a system and to a relation expressing the output of the system in terms of it. If we assume Laplace transformable, right-sided signals and redo a similar calculation, replacing the Fourier transform by the Laplace one, we obtain the *time-varying transfer function* of the system

$$H(t,s) = \int\_0^\infty h(t,\xi) \mathbf{e}^{-s\xi} \,d\xi \qquad \Re\{s\} > \sigma\,. \tag{12.34}$$

With it the output of the system is given by

$$\mathbf{y}(t) = \frac{1}{2\pi j} H(t, \mathbf{s}) \mathbf{e}^{\mathbf{s}t} \star X(\mathbf{s}) \,. \tag{12.35}$$

# *12.4.2 Differential Equation*

Consider again a linear time-varying system whose state *u* is described by the system of differential equations

$$Du = A(t)u + B(t)x$$

and assume that it is driven by a complex tone

$$x(t) = \mathrm{e}^{j\omega t}\,.$$

From (12.30) we know that the components of the state *u* can be represented by

$$
u_i(t) = \hat{u}_i(t, \omega)\,\mathrm{e}^{j\omega t}\,, \qquad i = 1, \ldots, n\,.
$$

Inserting this representation for *u* and the complex tone for *x* in the equation we obtain

$$D\big[\hat{u}(t,\omega)\,\mathrm{e}^{j\omega t}\big] = A(t)\hat{u}(t,\omega)\,\mathrm{e}^{j\omega t} + B(t)\,\mathrm{e}^{j\omega t}\,.$$

The left-hand side can be written as

$$D\big[\hat{u}(t,\omega)\,\mathrm{e}^{j\omega t}\big] = \mathrm{e}^{j\omega t}(j\omega + D)\,\hat{u}(t,\omega)$$

so that we obtain an equation for $\hat{u}(t,\omega)$

$$(j\omega + D)\,\hat{u}(t,\omega) = A(t)\hat{u}(t,\omega) + B(t)\,. \tag{12.36}$$

With $\hat{u}(t,\omega)$ we can directly obtain the time-varying frequency response of the system without having to first compute the fundamental kernel

$$
\hat{h}(t,\omega) = C(t)\hat{u}(t,\omega) + D(t)\,.
$$

In particular, if the system is described by a (possibly) higher-order differential equation

$$L(t,D)\mathbf{y} = N(t,D)\mathbf{x}$$

with

$$\begin{aligned} L(t,D) &= D^m + a\_{m-1}(t)D^{m-1} + \dots + a\_0(t), \\ N(t,D) &= b\_n(t)D^n + b\_{n-1}(t)D^{n-1} + \dots + b\_0(t) \end{aligned}$$

we can directly obtain an equation for the time-varying frequency response of the system by replacing the differential operator *D* in *L* by the operator jω + *D* and in *N* by jω [36]

$$L(t, j\omega + D)\hat{h}(t, \omega) = N(t, j\omega) \,. \tag{12.37}$$

Note that this formulation in terms of distributions and distributional derivatives takes care of the initial conditions automatically. If one works with functions and the standard derivative then the initial conditions for the problem are obtained from

$$y(t) = \hat{h}(t,\omega)\,\mathrm{e}^{j\omega t}\,.$$

#### **Example 12.6**

Consider a system that is switched off up to time *t* = 0 (*y*(*t*) = 0, *t* < 0) at which point it is turned on and is then described by the differential equation

$$Dy + ty = x\,.$$

We are interested in the time-varying frequency response of the system. We compute it in three different ways.

First we compute it via the time evolution operator *U*. For *t* ≥ τ > 0 it is found by solving the differential equation

$$Dy + ty = \delta(t - \tau)\,.$$

As can be verified by inserting it into the equation, it is given by

$$U(t, \tau) = \mathrm{e}^{-t^2/2 + \tau^2/2}\,.$$

To obtain the time-varying frequency response we apply the input $x(t) = \mathrm{e}^{j\omega t}$ and obtain

$$y(t) = \int_0^t U(t,\tau)\,x(\tau)\,d\tau = \mathrm{e}^{-t^2/2}\int_0^t \mathrm{e}^{\tau^2/2 + j\omega\tau}\,d\tau\,.$$

From this and

$$y(t) = \hat{h}(t,\omega)\,\mathrm{e}^{j\omega t}$$

we deduce that

$$\hat{h}(t,\omega) = \mathrm{e}^{-t^2/2 - j\omega t}\int_0^t \mathrm{e}^{\tau^2/2 + j\omega\tau}\,d\tau\,.$$

The time-varying frequency response of the system can also be obtained by Fourier transforming the time-varying impulse response. The latter is obtained from the time evolution operator using the variable substitution ξ = *t* − τ

$$h(t,\xi) = \mathbb{1}_+(\xi)\,\mathbb{1}_+(t-\xi)\,\mathrm{e}^{-t^2/2 + (t-\xi)^2/2}$$

where we made explicit that for *t* < 0 the response of the system vanishes. The time-varying frequency response is thus

$$\begin{split} \hat{h}(t,\omega) &= \int_{-\infty}^{\infty} h(t,\xi)\,\mathrm{e}^{-j\omega\xi}\,d\xi = \mathrm{e}^{-t^2/2}\int_0^t \mathrm{e}^{(t-\xi)^2/2}\,\mathrm{e}^{-j\omega\xi}\,d\xi \\ &= \mathrm{e}^{-t^2/2}\int_0^t \mathrm{e}^{\tau^2/2}\,\mathrm{e}^{-j\omega(t-\tau)}\,d\tau = \mathrm{e}^{-t^2/2 - j\omega t}\int_0^t \mathrm{e}^{\tau^2/2 + j\omega\tau}\,d\tau \end{split}$$

which matches the one obtained with the previous method.

A third method to compute the time-varying frequency response is by solving the corresponding differential equation

$$(D + j\omega)\hat{h} + t\hat{h} = \mathbb{1}_+(t)\,.$$

The solution is

$$\hat{h}(t,\omega) = \mathbb{1}_+(t)\,\mathrm{e}^{-t^2/2 - j\omega t}\int_0^t \mathrm{e}^{\tau^2/2 + j\omega\tau}\,d\tau\,,$$

as is verified by inserting it in the equation, and where we made explicit that for *t* < 0 it vanishes.
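The closed-form $\hat{h}(t,\omega)$ can also be cross-checked numerically. The sketch below (parameters assumed for illustration) integrates $Dy + ty = \mathrm{e}^{j\omega t}$ from $y(0) = 0$ with RK4 and compares the result with $\hat{h}(t,\omega)\,\mathrm{e}^{j\omega t}$, evaluating the integral appearing in $\hat{h}$ by the trapezoidal rule.

```python
import numpy as np

# Assumed parameters for illustration
w, T, dt = 1.5, 2.0, 1e-4

def f(t, y):
    # right-hand side of D y + t y = e^{j w t}
    return -t * y + np.exp(1j * w * t)

# RK4 integration from y(0) = 0
y, t = 0.0 + 0.0j, 0.0
for _ in range(int(round(T / dt))):
    k1 = f(t, y)
    k2 = f(t + dt / 2, y + dt / 2 * k1)
    k3 = f(t + dt / 2, y + dt / 2 * k2)
    k4 = f(t + dt, y + dt * k3)
    y += dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    t += dt

# h_hat(T, w) = e^{-T^2/2 - j w T} * integral_0^T e^{tau^2/2 + j w tau} d tau
tau = np.linspace(0.0, T, 20001)
g = np.exp(tau**2 / 2 + 1j * w * tau)
integral = np.sum(0.5 * (g[1:] + g[:-1])) * (tau[1] - tau[0])  # trapezoid
h_hat = np.exp(-T**2 / 2 - 1j * w * T) * integral

err = abs(y - h_hat * np.exp(1j * w * T))
print(err)  # close to zero
```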

# **12.5 Linear Periodically Time-Varying Systems**

# *12.5.1 Floquet Theory*

In this section we consider in more detail linear periodically time-varying (LPTV) systems. In particular, we study systems that can be described by a state-space representation with matrices $A(\cdot)$, $B(\cdot)$, $C(\cdot)$ and $D(\cdot)$ having periodic smooth functions as elements. These include systems described by differential equations with periodic, indefinitely differentiable coefficients.

Consider the differential equation

$$
\dot{\mathbf{y}} = A(t)\mathbf{y} + B(t)\mathbf{x} \tag{12.38}
$$

with *A*(.) an *n* × *n*-matrix and *B*(.) an *n* × 1 one, both with T-periodic indefinitely differentiable elements and where, for brevity, we denote by *y*˙ the (distributional) derivative of *y* and similarly for other quantities. Let further *Y* (.) be the principal fundamental matrix of the equation and

$$U(t, \tau) = Y(t)Y^{-1}(\tau)$$

the evolution operator. From the periodicity of *A*(.) we obtain

$$\begin{aligned} \dot{U}(t+\mathcal{T}, \mathcal{T}) &= \dot{Y}(t+\mathcal{T})Y^{-1}(\mathcal{T}) \\ &= A(t+\mathcal{T})Y(t+\mathcal{T})Y^{-1}(\mathcal{T}) \\ &= A(t)U(t+\mathcal{T}, \mathcal{T}) \end{aligned}$$

from which, with *U*(*t*, *t*) = *I* and the uniqueness of the solution of the equation we deduce

$$U(t+\mathcal{T}, \mathcal{T}) = U(t, 0)$$

and

$$Y(t+\mathcal{T}) = Y(t)Y(\mathcal{T})\,.$$

Let us now introduce

$$P(t) = Y(t)\,\mathrm{e}^{-tF}$$

with $F \in \mathbb{C}^{n \times n}$ a constant matrix chosen such that $\mathrm{e}^{\mathcal{T}F} = Y(\mathcal{T})$; such a matrix exists because $Y(\mathcal{T})$ is invertible. With this choice $P$ is $\mathcal{T}$-periodic, since $P(t+\mathcal{T}) = Y(t)Y(\mathcal{T})\,\mathrm{e}^{-\mathcal{T}F}\mathrm{e}^{-tF} = P(t)$. Define the variable transformation

$$\mathbf{y}(t) = P(t)\mathbf{z}(t)\,.$$

In terms of *z* the equation becomes

$$
\dot{P}(t)z + P(t)\dot{z} = A(t)P(t)z + B(t)x
$$

or

$$\dot{z} = P^{-1}(t)\big[A(t)P(t) - \dot{P}(t)\big]z + P^{-1}(t)B(t)x\,.$$

To simplify this equation we calculate the derivative of *P*

$$\begin{aligned} \dot{P}(t) &= \dot{Y}(t) \mathbf{e}^{-tF} - Y(t) \mathbf{e}^{-tF} F \\ &= A(t)Y(t) \mathbf{e}^{-tF} - Y(t) \mathbf{e}^{-tF} F \\ &= A(t)P(t) - P(t)F \end{aligned}$$

Using this result in the previous expression we finally obtain

$$\dot{z} = Fz + P^{-1}(t)B(t)x\,.$$

This equation is similar to the original one, but with the important difference that the periodically time-varying matrix *A*(.) of the original equation has been replaced by a *constant* matrix *F*. This shows that the evolution operator of any system of differential equations with *A*(.) a T-periodic smooth matrix can be represented in the form

$$U(t,\tau) = P(t)\,\mathrm{e}^{(t-\tau)F}P^{-1}(\tau)\,. \tag{12.39}$$

This is called the *Floquet representation* of the evolution operator.

Let $y_0 \in \mathbb{C}^n$. From the analysis of LTI systems we know that $\mathrm{e}^{tF}y_0$ is a linear combination of functions of the form

$$p\_i(t)\mathbf{e}^{\lambda\_i t}$$

with $\lambda_i$ an eigenvalue of $F$ and $p_i$ a polynomial of degree lower than the algebraic multiplicity of $\lambda_i$. The Floquet representation tells us that the solution of (12.38) is a linear combination of functions of the form

$$\tilde{p}_i(t)\,\mathrm{e}^{\lambda_i t}$$

where the $\tilde{p}_i$ are again polynomials, but now with $\mathcal{T}$-periodic smooth coefficients.

#### **Example 12.7**

In this example we look for the solution of the equation

$$Dy = A(t)\mathbf{y} + x$$

with

$$A(t) = \begin{bmatrix} \omega\_{3dB} + \Delta\omega \cos\omega\_m t & 1\\ 0 & \omega\_{3dB} + \Delta\omega \cos\omega\_m t \end{bmatrix}.$$

In particular we are interested in the evolution operator of the equation as it allows us to calculate the solution for an arbitrary input *x*.

First observe that *A*(.) can be written as a sum of two matrices

$$A(t) = \begin{bmatrix} \omega_{3dB} & 1 \\ 0 & \omega_{3dB} \end{bmatrix} + \begin{bmatrix} \Delta\omega\cos\omega_m t & 0 \\ 0 & \Delta\omega\cos\omega_m t \end{bmatrix},$$

the first of which is constant, and we denote it by *F*. To find the principal fundamental matrix we make the ansatz

$$Y(t) = P(t)\,\mathrm{e}^{tF}\,, \qquad P(t) = p(t)I$$

with *p* an indefinitely differentiable periodic function with period 2π/ω*m*. Inserting this ansatz in the equation we find

$$\begin{aligned} DY &= D\big[p(t)I\,\mathrm{e}^{tF}\big] \\ &= \dot{p}(t)I\,\mathrm{e}^{tF} + p(t)F\,\mathrm{e}^{tF} \\ &= \Big[F + \frac{\dot{p}}{p}I\Big]Y(t)\,. \end{aligned}$$

From this expression we see that the ansatz satisfies the equation if

$$\frac{\dot{p}}{p} = \Delta\omega\cos\omega_m t\,.$$

The function *p* can be calculated from this equation and the condition *Y*(0) = *I* using the method of separation of variables, from which we obtain

$$p(t) = \mathrm{e}^{\frac{\Delta\omega}{\omega_m}\sin\omega_m t}\,.$$

The principal fundamental matrix is thus

$$Y(t) = \mathrm{e}^{\frac{\Delta\omega}{\omega_m}\sin\omega_m t}\,\mathrm{e}^{tF}\,.$$

With *Y* and using the results of Example 8.2 for $\mathrm{e}^{tF}$, the evolution operator is found to be

$$U(t,\tau) = \frac{\mathrm{e}^{\frac{\Delta\omega}{\omega_m}\sin\omega_m t}}{\mathrm{e}^{\frac{\Delta\omega}{\omega_m}\sin\omega_m \tau}}\,\mathrm{e}^{\omega_{3dB}(t-\tau)}\begin{bmatrix} 1 & t-\tau \\ 0 & 1 \end{bmatrix}.$$
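The closed-form fundamental matrix can be verified numerically. The sketch below (with assumed parameter values; the diagonal entry is taken negative only to keep the solution bounded) integrates $DY = A(t)Y$ from $Y(0) = I$ and compares the result with $Y(t) = \mathrm{e}^{\frac{\Delta\omega}{\omega_m}\sin\omega_m t}\,\mathrm{e}^{tF}$, using $\mathrm{e}^{tF} = \mathrm{e}^{\omega_{3dB}t}\begin{bmatrix}1 & t\\ 0 & 1\end{bmatrix}$.

```python
import numpy as np

# Assumed parameters for illustration
w3, dw, wm = -1.0, 0.4, 3.0   # omega_3dB, Delta omega, omega_m
dt, T = 1e-4, 2.0

def A(t):
    d = w3 + dw * np.cos(wm * t)
    return np.array([[d, 1.0], [0.0, d]])

# RK4 integration of DY = A(t) Y with Y(0) = I
Y, t = np.eye(2), 0.0
for _ in range(int(round(T / dt))):
    k1 = A(t) @ Y
    k2 = A(t + dt / 2) @ (Y + dt / 2 * k1)
    k3 = A(t + dt / 2) @ (Y + dt / 2 * k2)
    k4 = A(t + dt) @ (Y + dt * k3)
    Y = Y + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    t += dt

# closed form: Y(t) = exp((dw/wm) sin(wm t)) e^{w3 t} [[1, t], [0, 1]]
Y_exact = np.exp(dw / wm * np.sin(wm * T)) * np.exp(w3 * T) \
          * np.array([[1.0, T], [0.0, 1.0]])
print(np.max(np.abs(Y - Y_exact)))  # close to zero
```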

# *12.5.2 Time-Varying Frequency Response*

Consider a SISO linear periodically time-varying system described by the state-space representation

$$Du = A(t)u + B(t)x \tag{12.40}$$

$$y = C(t)u + D(t)x \tag{12.41}$$

with *A*(.), *B*(.),*C*(.) and *D*(.) indefinitely differentiable T-periodic matrix functions. Thanks to linearity we can analyse the response of the system for *D*(*t*) = 0 and add the contribution of *D*(*t*)*x* at the end.

In the previous section we established that the evolution operator of (12.40) can be expressed in the form

$$U(t,\tau) = P(t) \mathbf{e}^{(t-\tau)F} P^{-1}(\tau)$$

with *P*(*t*) an invertible, indefinitely differentiable T-periodic matrix function and *F* a constant matrix. Using this representation for the response of the system we obtain

$$\mathbf{y}(t) = \mathbf{1}\_{+}(t-\tau)C(t)P(t)\mathbf{e}^{(t-\tau)F}P^{-1}(\tau)B(\tau)\star\mathbf{x}(t)$$

or, in terms of the time-varying impulse response

$$\mathbf{y}(t) = h\_C \ast\_t \mathbf{x}(t)$$

with

$$h_C(t,\xi) = \mathbb{1}_+(\xi)\,C(t)P(t)\,\mathrm{e}^{\xi F}P^{-1}(t-\xi)B(t-\xi)\,.$$

If we now add the contribution to the output from *D*(*t*)*x* we finally find

$$\mathbf{y}(t) = h \ast\_t \mathbf{x}(t)$$

with

$$h(t,\xi) = h_C(t,\xi) + D(t)\,\delta(\xi)\,.$$

The first term $h_C$ is a regular distribution growing at most exponentially with respect to $\xi$, while the second has bounded support. The impulse response $h$ is therefore Laplace transformable with respect to $\xi$. This implies that the system possesses a time-varying transfer function $H(t,s)$. $H(t,s)$ is a function of the variables $t$ and $s$, and the above expression makes it clear that it is periodic in $t$. Therefore, with respect to $t$, we can expand it in a Fourier series

$$H(t,s) = \sum_{n=-\infty}^{\infty} H_n(s)\,\mathrm{e}^{jn\omega_{\mathcal{T}} t}$$

with $\omega_{\mathcal{T}} = 2\pi/\mathcal{T}$ and $H_n(s)$ functions of the variable $s$ alone.

The last expression shows that LPTV systems can be regarded as the parallel connection of LTI subsystems with transfer functions $H_n$ whose outputs are shifted in frequency by $n\omega_{\mathcal{T}}$ (see Fig. 12.6 and Example 12.4). This is best seen by applying a complex tone to a stable system. Thus, assume that all the eigenvalues of $F$ have a negative real part; then the time-varying frequency response $\hat{h}(t,\omega)$ also exists and is a regular distribution that can be identified with a function of the variables $t$ and $\omega$. Proceeding as above we can write it as

$$\hat{h}(t,\omega) = \sum_{n=-\infty}^{\infty} \hat{h}_n(\omega)\,\mathrm{e}^{jn\omega_{\mathcal{T}} t}$$

with $\hat{h}_n(\omega) = H_n(j\omega)$. If we now apply a complex input tone $\mathrm{e}^{j\omega_0 t}$ to the system and use (12.30) to calculate the system response we obtain

$$y(t) = \sum_{n=-\infty}^{\infty} \hat{h}_n(\omega_0)\,\mathrm{e}^{j(n\omega_{\mathcal{T}} + \omega_0)t}\,.$$

The output is thus seen to consist of a sum of tones at $\omega_0 + n\omega_{\mathcal{T}}$, $n \in \mathbb{Z}$, each one weighted by $\hat{h}_n(\omega_0)$. It is readily seen that for a real system the following relation must hold

$$
\hat{h}_{-n}(-\omega) = \overline{\hat{h}_n(\omega)}\,.
$$
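This symmetry can be checked on the simplest real LPTV system, a memoryless periodic gain $y(t) = c(t)x(t)$ (a toy example assumed here, not taken from the text). In that case $\hat{h}(t,\omega) = c(t)$, so $\hat{h}_n(\omega)$ reduces to the $n$th Fourier coefficient $c_n$ of $c$, and the relation becomes the familiar symmetry $c_{-n} = \overline{c_n}$ of a real periodic signal.

```python
import numpy as np

# Toy real LPTV system (assumed): a memoryless 2*pi-periodic gain c(t)
N = 64
t = 2 * np.pi * np.arange(N) / N
c = 1.0 + 0.5 * np.cos(t) + 0.3 * np.sin(2 * t)   # real periodic gain
cn = np.fft.fft(c) / N                            # Fourier coefficients c_n

# h_hat_{-n}(-w) = conj(h_hat_n(w)) here reduces to c_{-n} = conj(c_n)
for n in (1, 2, 3):
    assert abs(cn[-n] - np.conj(cn[n])) < 1e-12
print("symmetry c_{-n} = conj(c_n) verified")
```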

#### **Example 12.8: LPTV LPF**

In this example we examine a series *RC* low-pass filter (LPF) where, to reduce the physical area occupied by the circuit, the series resistor is implemented with a MOSFET. While this will produce some distortion, here we are interested in what happens if the gate bias voltage is disturbed by a periodic signal (see Fig. 12.7). This could happen, for example, if in a mixed-signal system (with both analog and digital signals present) a line distributing the system clock runs in proximity to the gate bias line and the two are not properly isolated. The system is described by the following differential equation

$$\big[D + \omega_{3dB}(t)\big]y = \omega_{3dB}(t)\,x(t)\,, \qquad \omega_{3dB}(t) = \frac{1}{R(t)C}$$

with $y$ the voltage across the capacitor, $x$ the source voltage and $R(\cdot)$ a periodic function. Given the periodicity of $R(\cdot)$, $\omega_{3dB}(\cdot)$ is also a periodic function with the same period, which we assume to be smooth. $\omega_{3dB}(\cdot)$ can therefore be expanded in a

**Fig. 12.7** *RC* low-pass filter with a PTV resistor

Fourier series that, for simplicity of analysis, we assume to be given by

$$
\omega_{3dB}(t) = \omega_0 + \Delta\omega\cos(\omega_m t)\,, \qquad \omega_0, \Delta\omega, \omega_m > 0
$$

with $\Delta\omega \ll \omega_0$. We are interested in characterising the frequency response of the filter.

The equation describing the system separates into a differential equation with constant coefficients and a small perturbation term

$$(D + \omega_0)y + \Delta\omega\cos(\omega_m t)\,y = \omega_{3dB}(t)\,x(t)\,.$$

We can therefore solve the problem using the perturbation theory that we developed in Sect. 12.2.3. In addition, instead of solving for the time-varying impulse response and obtaining the time-varying frequency response by Fourier transformation, it is convenient to solve the equation for the latter directly. Proceeding as in Sect. 12.4.2 we obtain

$$(D + j\omega + \omega_0)\,\hat{h}(t,\omega) + \Delta\omega\cos(\omega_m t)\,\hat{h}(t,\omega) = \omega_{3dB}(t)$$

and we can identify $-\Delta\omega\cos(\omega_m t)$ with the perturbation term $\tilde{A}$ and $-(j\omega + \omega_0)$ with the matrix $A_0$ of the unperturbed system.

We start by computing the time-varying frequency response of the unperturbed system, which we denote by $\hat{h}_0(t,\omega)$ and which has to satisfy

$$(D + j\omega + \omega_0)\,\hat{h}_0(t,\omega) = \omega_0 + \frac{\Delta\omega}{2}\big(\mathrm{e}^{j\omega_m t} + \mathrm{e}^{-j\omega_m t}\big)$$

where we have represented $\cos\omega_m t$ in terms of complex tones. Note that the variation of $R(\cdot)$ results in additional input tones to an otherwise time-invariant system. The solution of the equation is readily calculated to be

$$\hat{h}_0(t,\omega) = H(\omega) + \frac{\Delta\omega}{2\omega_0}\big[H(\omega+\omega_m)\,\mathrm{e}^{j\omega_m t} + H(\omega-\omega_m)\,\mathrm{e}^{-j\omega_m t}\big]$$

with

$$H(\omega) = \frac{1}{1 + j\frac{\omega}{\omega_0}}$$

the frequency response of the *RC* filter without disturbances (that is, for $\omega_{3dB}(t) = \omega_0$). $H(\omega)/\omega_0$ plays the role of the fundamental kernel $W_0$ of Sect. 12.2.3. However, because $A_0$ is time invariant we can work in the convolution algebra of periodic distributions and, instead of the fundamental kernel, the system can be characterised by the fundamental solution of the equation. In this example the $k$th Fourier coefficient of the fundamental solution of the equation is given by (see Example 7.5)


$$e_k = \frac{H(\omega + k\omega_m)}{\mathcal{T}\omega_0}\,, \qquad \mathcal{T} = 2\pi/\omega_m\,.$$

We now calculate the first order perturbation term. The first step consists in calculating the new "input signal" produced by the perturbation $\tilde{A}$

$$x_1(t) = \tilde{A}(t)\,\hat{h}_0(t,\omega) = -\frac{\Delta\omega}{2}\big(\mathrm{e}^{j\omega_m t} + \mathrm{e}^{-j\omega_m t}\big)\,\hat{h}_0(t,\omega)\,.$$

The first order perturbation term of the frequency response $\hat{h}_1(t,\omega)$ is then obtained by applying this signal to the unperturbed system

$$(D + j\omega + \omega_0)\,\hat{h}_1(t,\omega) = -\frac{\Delta\omega}{2}\big(\mathrm{e}^{j\omega_m t} + \mathrm{e}^{-j\omega_m t}\big)\,\hat{h}_0(t,\omega)\,.$$

The solution of the equation is given by

$$\begin{split} \hat{h}_1(t,\omega) &= -\frac{\Delta\omega}{2\omega_0}H(\omega)\big[H(\omega+\omega_m)\,\mathrm{e}^{j\omega_m t} + H(\omega-\omega_m)\,\mathrm{e}^{-j\omega_m t}\big] \\ &\quad - \Big(\frac{\Delta\omega}{2\omega_0}\Big)^2\Big[H(\omega)\big[H(\omega+\omega_m) + H(\omega-\omega_m)\big] \\ &\qquad + H(\omega+\omega_m)H(\omega+2\omega_m)\,\mathrm{e}^{j2\omega_m t} \\ &\qquad + H(\omega-\omega_m)H(\omega-2\omega_m)\,\mathrm{e}^{-j2\omega_m t}\Big]. \end{split}$$

Note that both $\hat{h}_0$ and $\hat{h}_1$ include terms of order $\Delta\omega$. Since $\tilde{A}$ is proportional to $\Delta\omega$ and all terms of $\hat{h}_1$ are proportional to powers of this quantity, no higher perturbation term will include a contribution of order $\Delta\omega$. The first two terms $\hat{h}_0$ and $\hat{h}_1$ are therefore enough to establish the effects of the perturbation to order $\Delta\omega$. To obtain an estimate to second order in $\Delta\omega$ we would need to calculate $\hat{h}_2$ as well.

With these results the first order response of the system when driven by a tone at ω is given by

$$y(t) = \big[\hat{h}_0(t,\omega) + \hat{h}_1(t,\omega)\big]\,\mathrm{e}^{j\omega t}\,.$$

It comprises tones at $\omega + n\omega_m$, $n = -2, -1, 0, 1, 2$. It's not difficult to see that if we calculated higher order terms we would obtain similar tones for larger values of $|n|$ and, in the limit including all perturbation terms, for all $n \in \mathbb{Z}$.

Let's consider more closely the component at $\omega + \omega_m$

$$\begin{split} y_1(t) &= \frac{\Delta\omega}{2\omega_0}H(\omega+\omega_m)\big[1 - H(\omega)\big]\,\mathrm{e}^{j(\omega_m+\omega)t} \\ &= \frac{\Delta\omega}{2\omega_0}H(\omega+\omega_m)\,\frac{j\frac{\omega}{\omega_0}}{1 + j\frac{\omega}{\omega_0}}\,\mathrm{e}^{j(\omega_m+\omega)t} \end{split}$$

and assume that $\omega_m \gg \omega_0$. If the filter is part of a transmitter and used to suppress noise outside the channel allocated to the user or service, then a $2\pi/\omega_m$-periodic perturbation is seen to create spurious emissions that can fall in frequency ranges reserved for other users or services and violate the maximum allowed emission levels. From the above expression we note that a wide nominal filter bandwidth $\omega_0$ helps in reducing the emission level caused by the perturbation. This can be interpreted intuitively as follows. If the input signal frequency is much smaller than the 3 dB cut-off frequency of the filter, then it produces a very small current flowing through the filter components and, in the limit of zero current, the output signal doesn't depend on the value of the filter components.

If the input tone is well above the nominal 3 dB cut-off frequency of the filter, $|\omega| \gg \omega_0$, then $|H(\omega)| \approx 1$ in magnitude no longer holds; rather $|H(\omega)| \ll 1$, and the output tone at $\omega + \omega_m$ can be approximated by

$$\mathbf{y}\_1(t) \approx \frac{\Delta\omega}{2\omega\_0} H(\omega + \omega\_m)\, \mathbf{e}^{\jmath(\omega\_m + \omega)t}\,.$$

If the frequency of the input tone is such that |ω + ω*m*| < ω₀ then the tone falls in a spurious pass band of the filter and, for |ω + ω*m*| ≪ ω₀, it can be approximated by

$$\mathbf{y}\_1(t) \approx \frac{\Delta\omega}{2\omega\_0}\, \mathbf{e}^{\jmath(\omega\_m + \omega)t}\,.$$

If the filter is part of a communication receiver responsible for suppressing interfering signals (the *channel filter*), then we see that 2π/ω*m*-periodic perturbations introduce spurious responses in the stop band of the filter at multiples of ω*m* that down-convert interfering signals in band, possibly masking the wanted signal. The amplitude of the dominant spurious response is proportional to the perturbation magnitude Δω relative to the nominal 3 dB cut-off frequency ω₀ of the filter.
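A quick numeric check of the sideband level formula above; the first-order low-pass H(ω) = 1/(1 + jω/ω₀) matches the expression used in the text, while the numeric values of ω₀, Δω, ω and ω*m* are illustrative assumptions.

```python
import numpy as np

# Sideband at omega + omega_m created by a periodic perturbation of a
# first-order low-pass filter; numeric values are illustrative only.
omega0 = 1.0        # nominal 3 dB cut-off frequency
domega = 0.1        # perturbation magnitude (Delta omega)
omega = -1000.0     # input tone frequency, |omega| >> omega0
omega_m = 1000.05   # perturbation frequency, |omega + omega_m| << omega0

H = lambda w: 1.0 / (1.0 + 1j * w / omega0)   # first-order low-pass

# Exact first-order sideband amplitude at omega + omega_m
y1_exact = domega / (2 * omega0) * H(omega + omega_m) * (1 - H(omega))

# Approximation valid when |omega| >> omega0 and |omega + omega_m| << omega0
y1_approx = domega / (2 * omega0)

print(abs(y1_exact), y1_approx)
```

The exact and approximate levels agree to better than one percent for these frequencies, illustrating that the dominant spurious response is set by Δω/(2ω₀).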

#### **Example 12.9: Quadrature (De-)Modulator**

Consider the frequency translating system of Example 12.4 with time-varying impulse response

$$h\_{\text{mod}}(t,\xi) = \mathbf{e}^{\jmath\omega\_0 t}\delta(\xi)\,.$$

It is a complex system in the sense that if we apply a real valued input signal its response is complex valued. In this example we show that the system can be implemented using two real sub-systems.

Let's decompose the input signal into its real and imaginary parts

$$\mathbf{x}(t) = r(t) + jq(t).$$

The system response is given by

$$\mathbf{y}(t) = h\_{\text{mod}}(t,\xi) \*\_t \mathbf{x}(t) = [r(t) + \jmath q(t)]\mathbf{e}^{\jmath\omega\_0 t}$$

and can be written as

$$[r(t) + \jmath q(t)]\cos \omega\_0 t - [q(t) - \jmath r(t)]\sin \omega\_0 t\,.$$

In this form the system response is seen to be the sum of the responses of two real systems driven by correlated signals (see Fig. 12.8). By linearity, if the two systems are driven by the real part only of the input signals, that is by *r* and *q* respectively, then the response of the system is

$$\mathbf{y}(t) = \Re\{ [r(t) + \jmath q(t)] \mathbf{e}^{\jmath\omega\_0 t} \}\,.$$

The combination of the two real systems is called a *quadrature modulator*. Each of the two real subsystems is called a *mixer* and effectively multiplies the input signal with a second real valued signal *l* called the local oscillator (LO) signal. A mixer can therefore be considered a system having two input ports.
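The equivalence between the complex description and the two-mixer implementation can be sketched numerically; the baseband signals and frequencies below are illustrative choices.

```python
import numpy as np

# The quadrature modulator output r(t)cos(w0 t) - q(t)sin(w0 t) equals the
# real part of (r + jq)e^{j w0 t}; signals here are illustrative.
t = np.linspace(0.0, 1.0, 1000)
w0 = 2 * np.pi * 10.0
r = np.cos(2 * np.pi * t)        # in-phase baseband signal
q = np.sin(6 * np.pi * t)        # quadrature baseband signal

y_mod = r * np.cos(w0 * t) - q * np.sin(w0 * t)      # two mixers + adder
y_ref = np.real((r + 1j * q) * np.exp(1j * w0 * t))  # complex description
```

The two computations agree sample by sample, as required by the identity derived above.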

Consider now a system that shifts the spectrum of the input signal in the opposite direction

$$h\_{\text{demod}}(t,\xi) = \mathbf{e}^{-\jmath\omega\_0 t} \delta(\xi)\,.$$

We would like to find a real system implementation that when driven by the signal

$$[r(t) + \jmath q(t)]\mathbf{e}^{\jmath\omega\_0 t}$$

allows us to recover the signals used at the input of the quadrature modulator used to generate it. Such a system is readily found by observing that

$$[r(t) + \jmath q(t)]\mathbf{e}^{\jmath\omega\_0 t}\cos \omega\_0 t = [r(t) + \jmath q(t)]\frac{1}{2}\big[\mathbf{e}^{\jmath 2\omega\_0 t} + 1\big]$$

and similarly

$$[r(t) + \jmath q(t)]\mathbf{e}^{\jmath\omega\_0 t}(-1)\sin \omega\_0 t = [r(t) + \jmath q(t)]\frac{\jmath}{2}\big[\mathbf{e}^{\jmath 2\omega\_0 t} - 1\big]\,.$$

Thus, if the signals *r* and *q* are band-limited to frequencies smaller than ω₀, the original signals can be recovered (up to a fixed scaling factor) by use of two mixers driven by quadrature (orthogonal) local oscillator signals followed by low-pass filters (see Fig. 12.9). Such a system is called a *quadrature demodulator*. By linearity, if the system is driven by the real signal

$$\Re\{[r(t) + \jmath q(t)]\mathbf{e}^{\jmath\omega\_0 t}\}$$

**Fig. 12.8** Quadrature modulator

**Fig. 12.9** Quadrature demodulator

the two output signals are the real parts of what we found above, that is *r*/2 and *q*/2 respectively.
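A minimal numeric sketch of the demodulator acting on a real modulated signal; averaging over whole LO periods stands in for the ideal low-pass filter, and the constant r, q and the sampling grid are illustrative assumptions.

```python
import numpy as np

# Quadrature demodulator sketch: mix with cos and -sin, then low-pass.
# Averaging over an integer number of LO periods acts as an ideal
# low-pass filter for these tones; r, q constant for simplicity.
N = 64                              # samples per LO period
P = 8                               # number of LO periods
theta = 2 * np.pi * np.arange(N * P) / N   # w0 * t on the sampling grid
r, q = 0.7, -0.3

x = r * np.cos(theta) - q * np.sin(theta)  # real modulated signal

i_out = np.mean(x * np.cos(theta))         # in-phase branch   -> r/2
q_out = np.mean(x * (-np.sin(theta)))      # quadrature branch -> q/2
print(i_out, q_out)
```

The two branches return *r*/2 and *q*/2, the fixed scaling factor mentioned above.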

#### **Example 12.10: Harmonic-Reject Mixer**

We saw in Example 12.9 that a mixer is a system multiplying the input signal with a T-periodic signal called the local oscillator signal

$$h(t, \xi) = l(t)\delta(\xi)\,.$$

**Fig. 12.11** Generic N-path receiver

**Fig. 12.12** Quadrature N-path demodulator

In practical implementations, to minimise the signal-to-noise degradation caused by the circuit, the local oscillator signal is not a pure sinusoid. Instead, it is most often designed to approach a rectangular waveform as depicted in Fig. 12.10b. Being periodic, the signal *l* can be represented by a Fourier series

$$l(t) = \sum\_{n=-\infty}^{\infty} a\_n \mathbf{e}^{\jmath n\omega\_{\mathrm{T}} t}, \qquad \omega\_{\mathrm{T}} = 2\pi/\mathrm{T}$$

with

$$a\_n = \begin{cases} \frac{\tau}{\mathrm{T}} & n = 0\\ \frac{1}{\pi n} \sin\big(n\pi\frac{\tau}{\mathrm{T}}\big) & n \neq 0 \end{cases}$$

for the waveform in Fig. 12.10a and

$$a\_n = \begin{cases} 0 & n \text{ even} \\ \frac{2}{\pi n} \sin\big(n\pi\frac{\tau}{\mathrm{T}}\big) & n \text{ odd} \end{cases}$$

for the one in Fig. 12.10b. Therefore, a mixer driven by an input tone

$$\mathbf{x}(t) = \mathbf{e}^{\jmath(n\omega\_{\mathrm{T}} + \omega\_1)t} \tag{12.42}$$

produces an output tone at ω₁ for every value of *n* for which *a_n* ≠ 0. When the mixer is part of a receiver designed to down-convert a signal at ω_T + ω₁ to ω₁ for further processing and detection, the spurious responses (*n* ≠ 1) are undesired as they could cause an interfering signal to overlap in frequency with the desired signal and prevent reception of the latter. The spurious responses are most often suppressed by preceding the mixer with a suitable filter. However, in some situations such a filter is undesired. In the following we present a method to suppress the dominant spurious responses of a mixer without the need for filters and still using rectangular waveforms as local oscillator signals.
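The harmonic content of a rectangular LO can be checked numerically; the sketch below assumes a ±1 waveform with 50% duty cycle (τ/T = 1/2), for which the even coefficients vanish and |a*n*| = 2/(π|*n*|) for odd *n*.

```python
import numpy as np

# Fourier coefficients of a sampled +/-1 rectangular LO (50% duty cycle).
# Even harmonics vanish by half-wave symmetry; odd ones fall off as 1/n.
N = 4096
l = np.where(np.arange(N) < N // 2, 1.0, -1.0)   # one LO period, sampled
a = np.fft.fft(l) / N                            # coefficients a_n

a2, a3 = a[2], a[3]
print(abs(a2), abs(a3), 2 / (3 * np.pi))
```

The second harmonic is zero to machine precision, while |a₃| matches 2/(3π), the level responsible for the third-harmonic spurious response of the mixer.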

Note that, while the idealised local oscillator waveforms shown in Figs. 12.10a and 12.10b are discontinuous, their Fourier series representations truncated at an arbitrarily high value of |*n*| are indefinitely differentiable functions. Suitably truncated Fourier series are adequate representations of practical signals and do not cause any mathematical difficulty.

Consider the generic *N*-path receiver shown in Fig. 12.11. It is composed of *N* subsystems that are equal apart from the fact that the local oscillator signal of path *k*, *k* = 0,..., *N* − 1 is delayed by T*k*/*N* with respect to path 0. The blocks preceding the output signals *y_k* represent LTI subsystems with impulse response *h*. Let the input signal be as in (12.42). Then, due to the tone at −*n*ω_T in the Fourier series of *l*, the *k*th output signal includes a tone at ω₁ given by

$$\begin{split} \mathbf{y}\_{k,-n}(t) &= h(t) \ast [a\_{-n} \mathbf{e}^{-\jmath n\omega\_{\mathrm{T}}\left(t - \mathrm{T}\frac{k}{N}\right)} \delta(\xi) \ast\_t \mathbf{e}^{\jmath\left(n\omega\_{\mathrm{T}} + \omega\_1\right)t}] \\ &= [a\_{-n} H(\jmath\omega\_{1})\, \mathbf{e}^{\jmath\omega\_1 t}]\, \mathbf{e}^{\jmath nk\frac{2\pi}{N}} \\ &= \mathbf{y}\_{0,-n}(t)\, \mathbf{e}^{\jmath nk\frac{2\pi}{N}} \end{split}$$

with *H* the Laplace transform of *h*. This shows that the output components of interest (at ω₁) are the product of the signal *y*₀,−*n* and the constants e^{jn2πk/N}, *k* = 0,..., *N* − 1. By exploiting the properties of trigonometric functions we can form weighted sums of the outputs *y_k* such that the resulting tone at ω₁ vanishes for some values of *n*


$$z\_{-n}(t) = \sum\_{k=0}^{N-1} w\_k \mathbf{y}\_{k,-n}(t) = \mathbf{y}\_{0,-n}(t) \sum\_{k=0}^{N-1} w\_k \mathbf{e}^{\jmath nk\frac{2\pi}{N}}\,.$$

Note that the sum on the right-hand side corresponds to a discrete Fourier transform of the weighting coefficients. For example, by choosing

$$w\_k = \cos\left(\frac{2\pi}{N}k\right) \tag{12.43}$$

we obtain

$$\begin{aligned} z\_{-n}(t) &= \mathbf{y}\_{0,-n}(t) \sum\_{k=0}^{N-1} \cos\left(\frac{2\pi}{N}k\right) \mathbf{e}^{\jmath nk\frac{2\pi}{N}} \\ &= \frac{\mathbf{y}\_{0,-n}(t)}{2} \sum\_{k=0}^{N-1} \mathbf{e}^{\jmath\frac{2\pi}{N}(n+1)k} + \mathbf{e}^{\jmath\frac{2\pi}{N}(n-1)k}\,. \end{aligned}$$

The sums are geometric series that evaluate to

$$\sum\_{k=0}^{N-1} \mathbf{e}^{\jmath\frac{2\pi}{N}(n \pm 1)k} = \begin{cases} \frac{1 - \mathbf{e}^{\jmath 2\pi(n \pm 1)}}{1 - \mathbf{e}^{\jmath\frac{2\pi}{N}(n \pm 1)}} = 0 & n \pm 1 \neq Nm, m \in \mathbb{Z} \\ N & \text{otherwise} \end{cases}$$

and therefore the signal *z*−*<sup>n</sup>* is

$$z\_{-n}(t) = \begin{cases} 0 & n \neq Nm \pm 1\\ \frac{N}{2}\, \mathbf{y}\_{0,-n}(t) & \text{otherwise.} \end{cases}$$

For example, for *N* = 8 all harmonics below the 15th except for the 7th and the 9th are suppressed. A mixer with no spurious responses at some odd harmonics is called a *harmonic-reject mixer.*
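The suppression pattern can be verified directly from the weighted sum; the value *N* = 8 is an illustrative choice.

```python
import numpy as np

# Harmonic-reject sum S(n) = sum_k cos(2 pi k/N) e^{j n k 2 pi/N}.
# It vanishes unless n = N m +/- 1, where it equals N/2.
N = 8
k = np.arange(N)
w = np.cos(2 * np.pi * k / N)          # weighting factors (12.43)

def S(n):
    return np.sum(w * np.exp(1j * n * k * 2 * np.pi / N))

# Harmonics below the 20th that survive the weighted sum
passed = {n for n in range(20) if abs(S(n)) > 1e-9}
print(passed)
```

For *N* = 8 only *n* = 1, 7, 9, 15, 17 survive, each with the factor *N*/2 = 4, matching the statement above that all harmonics below the 15th except the 7th and 9th are suppressed.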

The weighting factors of (12.43) are not the only possible choice. For example, any rotation of the indexes w\_{(*k*+*m*) mod *N*} produces a similar result with the addition of a phase factor to the output signal. For *N* even we can thus construct a full quadrature demodulator by building two weighted sums, one with weighting factors as given by (12.43) and the other with factors rotated by *N*/2 (w\_{*k*+*N*/2}; see Fig. 12.12). The case with *N* = 4 corresponds to the classical situation with differential output signals. Further choices of weighting factors allow isolating responses at values of *n* different from 1.

While we discussed summing the signals after the LTI systems characterised by *h*, the same results apply if the signals are summed right after the mixers. The rejection obtained in practice is limited by mismatch between the paths. The place where the summation is implemented plays a role in this respect.

If we reverse the direction of the signals in the system of Fig. 12.11 we obtain an *N*-path transmitter. This is a generalisation of the classic case with *N* = 4, the 4 input signals being differential versions of the *r* and *q* modulator input signals. As with the receiver, a larger value of *N* allows suppressing spurious emissions at harmonics of the local oscillator signal without the use of filters.

**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Chapter 13 Weakly Nonlinear Time-Varying Systems**

The theory of linear time-varying systems can be extended to weakly nonlinear time-varying (WNTV) systems in a similar way as we did for linear time-invariant systems. In this chapter we first define WNTV systems mathematically and highlight some important differences from the theory of WNTI systems. We then discuss weakly nonlinear periodically time-varying (WNPTV) systems. This type of system generates a characteristic spectrum that is relatively easy to describe and is relevant, for example, in the study and design of communication systems.

# **13.1 Weakly Nonlinear Time-Varying Systems**

# *13.1.1 Definition*

A *Weakly-nonlinear time-varying system* is defined as a system whose response to the input signal *x* can be described by

$$\mathbf{y}(t) = \sum\_{k=1}^{\infty} w\_k(t, \tau\_1, \dots, \tau\_k) \star \mathbf{x}^{\otimes k}(\tau\_1, \dots, \tau\_k) \,. \tag{13.1}$$

The operator ⋆ is the extension of the operator introduced in Sect. 12.2.2 to higher dimensions. For causal systems described by regular distributions and driven by a right-sided input *x* it is defined by

$$\begin{aligned} w\_k(t, \tau\_1, \dots, \tau\_k) \star \mathbf{x}^{\otimes k}(\tau\_1, \dots, \tau\_k) &:= \\ \int\_0^t \cdots \int\_0^t w\_k(t, \tau\_1, \dots, \tau\_k) \mathbf{x}(\tau\_1) \cdots \mathbf{x}(\tau\_k) d\tau\_1 \cdots d\tau\_k \,. \end{aligned} \tag{13.2}$$

© The Author(s) 2024 F. Beffa, *Weakly Nonlinear Systems*, Understanding Complex Systems, https://doi.org/10.1007/978-3-031-40681-2\_13


w\_*k* is the *k*th order *fundamental kernel* of the system. As for WNTI systems, to guarantee uniqueness, we require it to be symmetric in the variables τ₁,...,τ*k*.

Generalizations valid for a wider class of input signals can be done in the same way as was done for LTV systems. Note that w\_*k* ⋆ *x*^⊗*k* is a distribution of the single variable *t* and not a higher dimensional distribution as for WNTI systems. The reason for this is explained next.

Consider a WNTV system described by a differential equation of the form

$$L(t,D)\mathbf{y} = N(t,D)\mathbf{x} + c\_2(t)\mathbf{y}^2 + c\_3(t)\mathbf{y}^3 + \cdots$$

with

$$\begin{aligned} L(t, D) &= D^m + a\_{m-1}(t)D^{m-1} + \dots + a\_0(t) \\ N(t, D) &= b\_n(t)D^n + b\_{n-1}(t)D^{n-1} + \dots + b\_0(t) \end{aligned}$$

and where all coefficients *ai*, *bi* and *ci* are indefinitely differentiable functions. The equation can be solved iteratively as in the case of WNTI systems. We first solve the linear part of the equation. The solution *y*<sup>1</sup> is then used in the nonlinear terms to compute "nonlinear sources" of second order. With them we solve the part of the equation consisting of terms of second order only, a linear equation, and so on.
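The iterative scheme can be sketched numerically on the simplest equation of this form, D*y* + *a* *y* = *x* + *y*², here with a constant coefficient *a* and a small step input; all numeric values are illustrative assumptions.

```python
import numpy as np

# Iterative solution of D y + a y = x + y^2 with a constant and a small
# step input x(t) = eps; all numbers are illustrative assumptions.
a, eps, T = 1.0, 0.05, 1.0
h = 1e-4
t = np.arange(0.0, T + h / 2, h)

# Reference: full nonlinear solution at t = T by RK4
def f(y):
    return -a * y + eps + y * y

y = 0.0
for _ in range(len(t) - 1):
    k1 = f(y); k2 = f(y + h/2*k1); k3 = f(y + h/2*k2); k4 = f(y + h*k3)
    y += h / 6 * (k1 + 2*k2 + 2*k3 + k4)

# First order: response of the linear part to the step input
y1 = eps / a * (1.0 - np.exp(-a * t))

# Second order: the same linear system driven by the "nonlinear source"
# y1^2, evaluated at t = T by a rectangle sum
y2_T = h * np.sum(np.exp(-a * (T - t[:-1])) * y1[:-1] ** 2)

err1 = abs(y - y1[-1])             # error of the first order estimate
err2 = abs(y - (y1[-1] + y2_T))    # adding y2 reduces it to O(eps^3)
print(err1, err2)
```

Adding the second order term shrinks the residual error by roughly a factor ε, as expected from the perturbation structure of the expansion.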

There is an important difference compared to the case of WNTI systems: in the case of WNTI systems, to get around the lack of a general multiplication between arbitrary distributions, we made use of a direct product of distributions and introduced a multiplication based on the tensor product. Here the same method doesn't work as the coefficients of the differential equation are functions of the single time variable *t* and it is unclear how to adapt them for use with higher order distributions. For this reason here the responses of all orders *y_k* are distributions of the single variable *t*. To solve the equation we must therefore assume the existence of all appearing multiplications and powers *y*^*k*, *k* = 2, 3,... .

If we consider *t* as a fixed parameter then the multiplication between components of *y* acts as a tensor-product-like operation. Consider the product between *y_k* and *y_l*

$$\begin{split} y\_k(t)y\_l(t) &= \int\_0^t \cdots \int\_0^t w\_k(t, \tau\_1, \dots, \tau\_k)\, x^{\otimes k}(\tau\_1, \dots, \tau\_k)\, d\tau\_1 \cdots d\tau\_k \\ &\quad \cdot \int\_0^t \cdots \int\_0^t w\_l(t, \tau\_1, \dots, \tau\_l)\, x^{\otimes l}(\tau\_1, \dots, \tau\_l)\, d\tau\_1 \cdots d\tau\_l \\ &= \int\_0^t \cdots \int\_0^t w\_k(t, \tau\_1, \dots, \tau\_k)\, w\_l(t, \tau\_{k+1}, \dots, \tau\_{k+l}) \\ &\qquad \cdot x^{\otimes k+l}(\tau\_1, \dots, \tau\_{k+l})\, d\tau\_1 \cdots d\tau\_{k+l}\,. \end{split} \tag{13.3}$$

The result has the form of a response of order *k* + *l* which can be interpreted as a "nonlinear source" generated by nonlinearities and lower order responses as desired.

To solve the equation we must be able to solve it for each order independently and verify that the solution has the desired form. This is (in principle) simple as all equations are linear. The solution of the equation consisting of terms of order *k* is given by

$$\mathbf{y}\_k(t) = \int\_0^t \int\_0^\tau \cdots \int\_0^\tau v(t,\tau)\, z\_k(\tau, \tau\_1, \dots, \tau\_k)\, \mathbf{x}^{\otimes k}(\tau\_1, \dots, \tau\_k)\, d\tau\_1 \cdots d\tau\_k\, d\tau$$

with z\_*k* ⋆ *x*^⊗*k* the nonlinear source and v the fundamental kernel of the equation.

To show that this expression can be transformed into the desired form, consider the integral

$$\int\_0^t \int\_0^\tau \int\_0^\tau f(\tau, \tau\_1, \tau\_2)\, d\tau\_2\, d\tau\_1\, d\tau\,.$$

As a first step we exchange the order of integration between τ and τ<sup>1</sup> and obtain

$$\int\_0^t \int\_{\tau\_1}^t \int\_0^\tau f(\tau, \tau\_1, \tau\_2)\, d\tau\_2\, d\tau\, d\tau\_1\,.$$

We then perform a second exchange between τ<sup>2</sup> and τ (refer to Fig. 13.1) which results in

$$\int\_0^t \int\_0^t \int\_{\max(\tau\_1, \tau\_2)}^t f(\tau, \tau\_1, \tau\_2)\, d\tau\, d\tau\_2\, d\tau\_1\,.$$

If the integral involved more integrations between 0 and τ, we could repeat the last step several times, giving

$$\int\_0^t \cdots \int\_0^t \int\_{\max(\tau\_1, \dots, \tau\_k)}^t f(\tau, \tau\_1, \dots, \tau\_k)\, d\tau\, d\tau\_k \cdots d\tau\_1\,. \tag{13.4}$$

Using this result we can transform the above expression for *yk* (*t*) into

$$\begin{aligned} \mathbf{y}\_k(t) = \int\_0^t \cdots \int\_0^t \int\_{\max(\tau\_1, \dots, \tau\_k)}^t v(t, \tau)\, z\_k(\tau, \tau\_1, \dots, \tau\_k)\, d\tau\; \mathbf{x}^{\otimes k}(\tau\_1, \dots, \tau\_k)\, d\tau\_1 \cdots d\tau\_k \end{aligned}$$

which has the desired form w\_*k* ⋆ *x*^⊗*k*.
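The exchange of integration order can be checked numerically in the three-variable case; the integrand f(τ, τ₁, τ₂) = e^{−τ}(τ₁ + τ₂), the value t = 1, and the grid size are illustrative choices.

```python
import numpy as np

# Both orderings of the triple integral must agree. For this f the inner
# double integral of the original form evaluates to tau^3 analytically.
t_end = 1.0
n = 2000
g = (np.arange(n) + 0.5) / n * t_end   # midpoint grid on [0, t]
h = t_end / n

# Original order: tau outermost, tau1 and tau2 from 0 to tau
direct = h * np.sum(np.exp(-g) * g ** 3)

# Exchanged order (13.4): tau runs from max(tau1, tau2) to t
t1, t2 = np.meshgrid(g, g)
inner = np.exp(-np.maximum(t1, t2)) - np.exp(-t_end)  # integral of e^{-tau}
swapped = h * h * np.sum((t1 + t2) * inner)
print(direct, swapped)
```

Both orderings agree with each other and with the closed-form value 6 − 16/e of this particular integral.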

# *13.1.2 Time-Varying Nonlinear Impulse Responses*

As for LTV systems, the response of WNTV systems can also be expressed in terms of the *time-varying nonlinear impulse responses*

$$h\_k(t, \xi\_1, \dots, \xi\_k) := w\_k(t, t - \xi\_1, \dots, t - \xi\_k) \tag{13.5}$$

and the convolution operator ∗*<sup>t</sup>* for time varying systems

$$\begin{split} h\_k(t, \xi\_1, \dots, \xi\_k) \*\_t \mathbf{x}^{\otimes k}(\xi\_1, \dots, \xi\_k) &:= \\ \int\_0^t \cdots \int\_0^t h\_k(t, \xi\_1, \dots, \xi\_k)\, \mathbf{x}(t - \xi\_1) \cdots \mathbf{x}(t - \xi\_k)\, d\xi\_1 \cdots d\xi\_k\,. \end{split} \tag{13.6}$$

#### **Example 13.1**

Consider a WNTV system described by the following differential equation

$$D\mathbf{y} + a(t)\mathbf{y} = \mathbf{x} + \mathbf{y}^2\,.$$

We are interested in the second order fundamental kernel of the system.

The fundamental kernel of the linearized equation is given by (12.26) which, taking into account the commutativity of the product of scalar functions simplifies to

$$w\_1(t, \tau\_1) = \mathbf{e}^{-\int\_{\tau\_1}^t a(\lambda)d\lambda}.$$

With it the linear response of the system is

$$y\_1(t) = w\_1(t, \tau\_1) \star x(\tau\_1)\,.$$

Given *y*<sup>1</sup> we can compute the "nonlinear source" of second order

$$\mathbf{y}\_1^2(t) = \int\_0^t \int\_0^t w\_1(t,\tau\_1)\, w\_1(t,\tau\_2)\, \mathbf{x}(\tau\_1) \mathbf{x}(\tau\_2)\, d\tau\_1 d\tau\_2\,.$$

With it we can then solve the equation consisting of terms of second order only

$$(D + a(t))\mathbf{y}\_2 = \mathbf{y}\_1^2\,.$$

The fundamental kernel of this equation is the same as the one of the first order equation. The second order response of the system is therefore

$$\begin{aligned} \mathbf{y}\_2(t) &= \int\_0^t w\_1(t,\tau)\, \mathbf{y}\_1^2(\tau)\, d\tau \\ &= \int\_0^t \int\_0^t \int\_{\max(\tau\_1,\tau\_2)}^t w\_1(t,\tau)\, w\_1(\tau,\tau\_1)\, w\_1(\tau,\tau\_2)\, d\tau\; \mathbf{x}(\tau\_1) \mathbf{x}(\tau\_2)\, d\tau\_1 d\tau\_2\,. \end{aligned}$$

The second order fundamental kernel of the system can be found by comparing this expression with *y*₂ = w₂(*t*, τ₁, τ₂) ⋆ *x*^⊗²(τ₁, τ₂) giving

$$w\_2(t, \tau\_1, \tau\_2) = \int\_{\max(\tau\_1, \tau\_2)}^t \mathbf{e}^{-\int\_{\tau}^t a(\lambda) d\lambda - \int\_{\tau\_1}^{\tau} a(\lambda) d\lambda - \int\_{\tau\_2}^{\tau} a(\lambda) d\lambda}\, d\tau\,.$$

As a check we verify that in the special case in which *a*(*t*) is constant we obtain the same result as in Example 9.5. Evaluating the integrals gives

$$w\_2(t, \tau\_1, \tau\_2) = \frac{1}{a} \Big(\mathbf{e}^{-a[t - \min(\tau\_1, \tau\_2)]} - \mathbf{e}^{-a(2t - \tau\_1 - \tau\_2)}\Big)$$

and, after the variable substitutions ξ*i* = *t* − τ*i*, *i* = 1, 2, we indeed obtain an expression equivalent to *h*₂ in Example 9.5.
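The closed form for constant *a* can also be checked against a direct numeric evaluation of the τ integral; the values of *a*, *t*, τ₁, τ₂ and the grid are illustrative choices.

```python
import numpy as np

# Numeric check of w2 for constant a: midpoint quadrature of the tau
# integral of w1(t,tau) w1(tau,tau1) w1(tau,tau2) vs. the closed form.
a, t, tau1, tau2 = 2.0, 1.0, 0.3, 0.5

lo = max(tau1, tau2)
n = 100000
tau = lo + (np.arange(n) + 0.5) / n * (t - lo)   # midpoint grid
h = (t - lo) / n
integrand = np.exp(-a * (t - tau) - a * (tau - tau1) - a * (tau - tau2))
w2_num = h * np.sum(integrand)

w2_closed = (np.exp(-a * (t - min(tau1, tau2)))
             - np.exp(-a * (2 * t - tau1 - tau2))) / a
print(w2_num, w2_closed)
```

The quadrature reproduces the closed form, confirming the evaluation of the integral above.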

# *13.1.3 Time-Varying Nonlinear Frequency Responses*

Weakly-nonlinear time-varying systems can equivalently be characterised by *time-varying nonlinear frequency responses*. The *k*th order one is defined as the Fourier transform with respect to ξ₁,...,ξ*k* of the impulse response *h_k*(*t*, ξ₁,...,ξ*k*). For regular distributions

$$\hat{h}\_k(t, \omega\_1, \dots, \omega\_k) := \int\_{-\infty}^{\infty} \cdots \int\_{-\infty}^{\infty} h\_k(t, \xi\_1, \dots, \xi\_k)\, \mathbf{e}^{-\jmath\langle\omega, \xi\rangle}\, \mathbf{d}^k \xi \tag{13.7}$$

with ω, ξ <sup>∈</sup> <sup>R</sup>*<sup>k</sup>* .

The response of order *k* of a system can be calculated by

$$\mathbf{y}\_k(t) = \frac{1}{(2\pi)^k} \hat{h}\_k(t, \omega\_1, \dots, \omega\_k)\, \mathbf{e}^{\jmath(\omega\_1 + \dots + \omega\_k)t} \star \hat{\mathbf{x}}^{\otimes k}(\omega\_1, \dots, \omega\_k)\,. \tag{13.8}$$

The derivation is entirely analogous to the one dimensional case carried out in Sect. 12.4.1.

# **13.2 Weakly Nonlinear Periodically Time-Varying Systems**

Weakly nonlinear periodically time-varying (WNPTV) systems are weakly nonlinear systems whose characteristics vary periodically in time. In other words, their fundamental kernels, impulse responses and frequency responses are periodic functions of time and can therefore be expanded in Fourier series. For example, the *k*th order frequency response of a T-periodic system can be represented by the series

$$\hat{h}\_k(t, \omega\_1, \dots, \omega\_k) = \sum\_{n=-\infty}^{\infty} \hat{h}\_{k,n}(\omega\_1, \dots, \omega\_k)\, \mathbf{e}^{\jmath n\omega\_{\mathrm{T}} t}$$

**Fig. 13.2** Generic representation of a WNPTV system

with ω_T = 2π/T. This representation highlights the fact that such systems can be represented by a parallel connection of a countable set of weakly nonlinear *time-invariant* networks whose outputs are shifted in frequency by a multiple of ω_T (see Fig. 13.2). Practical applications where this representation is particularly useful include the analysis and design of communication systems.

In the rest of this section we focus on the special case in which weakly nonlinear periodically time-varying systems are driven by a set of tones. This will reveal a spectrum characteristic of this type of systems.

# *13.2.1 Discrete Convolution*

Before turning to actually calculating the response of WNPTV systems driven by a set of tones, it's convenient to introduce some notation that will simplify many expressions.

A series $\sum\_{n=-\infty}^{\infty} a\_n$ is *absolutely convergent* if the sum of the absolute values of the terms converges

$$\sum\_{n=-\infty}^{\infty} |a\_n| < \infty.$$

In this case the value of the series doesn't depend on the order of the elements. The product of two absolutely convergent series $\sum\_{n=-\infty}^{\infty} a\_n$ and $\sum\_{n=-\infty}^{\infty} b\_n$ is also absolutely convergent


$$\Big|\sum\_{n=-\infty}^{\infty} a\_n \sum\_{n=-\infty}^{\infty} b\_n\Big| \le \sum\_{n=-\infty}^{\infty} |a\_n| \sum\_{n=-\infty}^{\infty} |b\_n| < \infty$$

and can be expressed as

$$\sum\_{n=-\infty}^{\infty} a\_n \sum\_{n=-\infty}^{\infty} b\_n = \sum\_{n=-\infty}^{\infty} \left( \sum\_{q=-\infty}^{\infty} a\_q b\_{n-q} \right).$$

The inner sum in the last expression is called *discrete convolution* (or *Cauchy product*). For convenience, we are going to denote it by

$$(a\_\cdot \*\_d b\_\cdot)\_n := \sum\_{q = -\infty}^\infty a\_q b\_{n-q} \tag{13.9}$$

The discrete convolution is associative and commutative

$$\begin{aligned} \left( (a\_\cdot \ast\_d b\_\cdot) \ast\_d c\_\cdot \right)\_n &= \left( a\_\cdot \ast\_d (b\_\cdot \ast\_d c\_\cdot) \right)\_n \\ (a\_\cdot \ast\_d b\_\cdot)\_n &= (b\_\cdot \ast\_d a\_\cdot)\_n \end{aligned}$$

and has a unit element, the *Kronecker delta*

$$\delta\_n = \begin{cases} 1 & n = 0 \\ 0 & n \neq 0 \end{cases} \tag{13.10}$$
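For finitely supported sequences the discrete convolution and its unit element can be sketched directly; the dictionary representation and the sample sequences below are illustrative choices.

```python
# Discrete convolution of finitely supported sequences, stored as
# dicts {index: value}; an illustrative helper, not from the text.
def dconv(a, b):
    out = {}
    for qa, va in a.items():
        for qb, vb in b.items():
            out[qa + qb] = out.get(qa + qb, 0) + va * vb
    return out

delta = {0: 1}                       # Kronecker delta, the unit element
a = {-1: 2.0, 0: 1.0, 3: -0.5}
b = {1: 4.0, 2: 0.25}

print(dconv(a, delta) == a)          # unit element property
print(dconv(a, b) == dconv(b, a))    # commutativity
```

Both checks print `True`, illustrating the unit-element and commutativity properties stated above.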

# *13.2.2 Product of Fourier Series*

In the following we use the convention introduced in Sect. 4.5 of denoting the *k*th Fourier coefficient of a distribution *f* by *ck* ( *f* ).

It is well known that if *t* → *f* (*t*) is a continuous T-periodic function, its Fourier series is absolutely convergent for all values of*t* [23]. If *f* and *g* are two such functions then their product is well-defined and continuous. In addition, the Fourier coefficients of the product can be expressed in terms of the coefficients of the individual series

$$\sum\_{n=-\infty}^{\infty} c\_n(f)\, \mathbf{e}^{\jmath n\omega\_{\mathrm{T}} t} \sum\_{n=-\infty}^{\infty} c\_n(g)\, \mathbf{e}^{\jmath n\omega\_{\mathrm{T}} t} = \sum\_{n=-\infty}^{\infty} \left( \sum\_{q=-\infty}^{\infty} c\_q(f) c\_{n-q}(g) \right) \mathbf{e}^{\jmath n\omega\_{\mathrm{T}} t}\,.$$

The coefficients of the product are evidently the convolution product of the coefficients of the two series

$$\left(c\_{\cdot}(f)\*\_{d}c\_{\cdot}(\mathbf{g})\right)\_{n} = \sum\_{q=-\infty}^{\infty} c\_{q}(f)c\_{n-q}(\mathbf{g})\,. \tag{13.11}$$

Let now *f* and *g* be two T-periodic *distributions*. Let us further introduce the sequences (*f_k*) and (*g_k*) defined by

$$f\_k = f \ast \beta\_k \qquad \text{and} \qquad \mathbf{g}\_k = \mathbf{g} \ast \beta\_k$$

with (β*<sup>k</sup>* ) a sequence of functions in D converging to δ (for example the sequence of Example 2.5). Then ( *fk* ) and (*gk* ) are sequences of indefinitely differentiable functions converging as distributions to *f* and *g* respectively. The Fourier series of each member of each sequence is thus absolutely convergent.

If the product *f g* exists, then it defines a T-periodic distribution which must coincide with the limit of the sequence

$$f g = \lim\_{k \to \infty} f\_k\, g\_k\,.$$

The Fourier series of each member of the sequence can be written as

$$f\_k\, g\_k = \sum\_{n=-\infty}^{\infty} \left( c\_\cdot(f\_k) \ast\_d c\_\cdot(g\_k) \right)\_n \mathbf{e}^{\jmath n\omega\_{\mathrm{T}} t}\,.$$

Therefore, from the assumption of convergence and the uniqueness of the Fourier series representation of periodic distributions we conclude that the Fourier coefficients of *f g* must be

$$c\_n(f\ g) = \left(c\_\cdot(f) \ast\_d c\_\cdot(g)\right)\_n := \lim\_{k \to \infty} \left(c\_\cdot(f\_k) \ast\_d c\_\cdot(g\_k)\right)\_n.$$

#### **Example 13.2**

Consider the regular T-periodic distribution shown in Fig. 13.3 that we denote by *f* and whose Fourier coefficients are

$$c\_n(f) = \begin{cases} 0 & n \text{ even} \\ \frac{2}{\pi n}(-1)^{\frac{n-1}{2}} & n \text{ odd.} \end{cases}$$

From the graph it's apparent that the product of *f* with itself is well-defined and produces the regular distribution with constant value 1. The Fourier coefficients are evidently all zero apart from the zeroth one whose value is one *c*0( *f f* ) = 1. We show that, despite the fact that the Fourier series of *f* is not absolutely convergent, *c*.( *f* ) ∗*<sup>d</sup> c*.( *f* ) produces the right answer.

**Fig. 13.3** Square regular T-periodic distribution

First note that for *n* odd either *cq* ( *f* ) or *cn*−*<sup>q</sup>* ( *f* ) is zero for every value of *q*. Hence,

$$\left(c\_\cdot(f) \*\_d c\_\cdot(f)\right)\_n = 0 \qquad n \text{ odd}\,.$$

For *n* even the convolution product is

$$\begin{aligned} \left(c\_\cdot(f) \*\_d c\_\cdot(f)\right)\_n &= \sum\_{k=-\infty}^{\infty} \frac{2}{\pi(2k+1)}(-1)^{k} \frac{2}{\pi(n-(2k+1))}(-1)^{\frac{n-2(k+1)}{2}} \\ &= (-1)^{\frac{n}{2}-1} \left(\frac{2}{\pi}\right)^{2} \sum\_{k=-\infty}^{\infty} \frac{1}{(2k+1)(n-(2k+1))}\,. \end{aligned}$$

For the particular case *n* = 0 the summation in the last expression can be written as

$$\sum\_{k=-\infty}^{\infty} \frac{-1}{(2k+1)^2} = -2\sum\_{k=0}^{\infty} \frac{1}{(2k+1)^2} = -\frac{\pi^2}{4}\,.$$

The zeroth coefficient is therefore

$$\left(c\_\cdot(f) \*\_d c\_\cdot(f)\right)\_0 = \left(\frac{2}{\pi}\right)^2 \frac{\pi^2}{4} = 1\,.$$

To evaluate the Fourier coefficients for *n* ≠ 0 it's convenient to rewrite the summation as

$$\sum\_{k=-\infty}^{\infty} \frac{1}{(2k+1)(n-(2k+1))} = \sum\_{k=-\infty}^{\infty} \frac{1/n}{2k+1} + \frac{1/n}{n-(2k+1)}\,.$$

In this form it's apparent that for each value of *n* all terms cancel in pairs (the *k*th with the (*n*/2 + *k*)th), thus giving

$$\left(c\_\cdot(f) \ast\_d c\_\cdot(f)\right)\_n = 0 \qquad n \neq 0 \text{ even.}$$
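The cancellations of Example 13.2 can be checked with a truncated convolution; the cutoff Q below is an illustrative choice, and the truncation error decays like 1/Q.

```python
import numpy as np

# Truncated self-convolution of the square-wave Fourier coefficients
# c_n = (2/(pi n))(-1)^{(n-1)/2} (n odd): it approaches the Kronecker delta.
Q = 4001
q = np.arange(-Q, Q + 1, 2)               # odd indices only (Q odd)
c = 2 / (np.pi * q) * (-1.0) ** ((q - 1) // 2)

def conv_at(n):
    # (c. *_d c.)_n restricted to the truncated index range; n must be
    # even, since for odd n one of the two factors always vanishes
    mask = np.abs(n - q) <= Q
    cq2 = 2 / (np.pi * (n - q[mask])) * (-1.0) ** (((n - q[mask]) - 1) // 2)
    return np.sum(c[mask] * cq2)

print(conv_at(0), conv_at(2))
```

The zeroth coefficient approaches 1 and the others approach 0, reproducing the constant distribution *f f* = 1 despite the series not being absolutely convergent.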

# *13.2.3 Response to Multi-tones*

Consider a weakly nonlinear periodically time-varying system described by the differential equation

$$L(t,D)\mathbf{y} = N(t,D)\mathbf{x} + c\_2(t)\mathbf{y}^2 + c\_3(t)\mathbf{y}^3 + \cdots$$

with

$$L(t, D) = D^m + a\_{m-1}(t)D^{m-1} + \dots + a\_0(t)$$

$$N(t, D) = b\_n(t)D^n + b\_{n-1}(t)D^{n-1} + \dots + b\_0(t)$$

and where all coefficients *ai*, *bi* and *ci* are smooth T-periodic functions. We assume that the system is driven by *N* complex tones

$$\mathbf{x}(t) = A\_1 \mathbf{e}^{\jmath\omega\_1 t} + \dots + A\_N \mathbf{e}^{\jmath\omega\_N t}$$

with *A*1,..., *AN* the phasors of the tones.

In Sect. 12.4.2 we saw that the solution of the linear part of the equation is given by

$$\mathbf{y}\_1(t) = \sum\_{n=1}^{N} A\_n \hat{h}\_1(t, \omega\_n)\, \mathbf{e}^{\jmath\omega\_n t}$$

with *ĥ*₁ the (first order) time-varying frequency response of the system. We also saw (Sect. 12.5.2) that *t* → *ĥ*₁(*t*, ω₁) is a T-periodic function. Expanding it in a Fourier series, *y*₁ can be written as

$$\mathbf{y}\_1(t) = \sum\_{n=1}^{N} A\_n \mathbf{e}^{\jmath\omega\_n t} \sum\_{q=-\infty}^{\infty} \hat{h}\_{1,q}(\omega\_n)\, \mathbf{e}^{\jmath q\omega\_{\mathrm{T}} t}\,.$$

*y*<sup>1</sup> is therefore a sum of tones at *q*ω<sup>T</sup> + ω*n*.

We now solve the nonlinear equation by adding terms to *y*₁ in a similar way as we did for weakly nonlinear time invariant systems in Sect. 9.5. As explained in Sect. 13.1, here we must assume the existence of the powers *y*₁^*k*, *k* = 2, 3,... and of the others that will appear below.

For the sake of solving the equation let's assume that the frequencies ω<sup>T</sup> , ω1,..., ω*<sup>N</sup>* are all incommensurate. Under this assumption, the only power resulting in terms proportional to *Aj Al*ej (ω*j*+ω*l*)*<sup>t</sup>* ; *j*,*l* = 1,..., *N* is the second order one

$$\begin{aligned} c\_2(t)\mathbf{y}\_1^2(t) &= \sum\_{|m|=2} \frac{2!}{m!} A\_1^{m\_1} \cdots A\_N^{m\_N} \mathbf{e}^{\jmath\omega\_m t} \\ &\quad \cdot \sum\_{q=-\infty}^{\infty} \left( c\_\cdot(c\_2) \ast\_d \hat{h}\_{1,\cdot}(\omega\_1)^{\ast\_d m\_1} \ast\_d \cdots \ast\_d \hat{h}\_{1,\cdot}(\omega\_N)^{\ast\_d m\_N} \right)\_q \mathbf{e}^{\jmath q\omega\_{\mathrm{T}} t} \end{aligned}$$

with *m* the multi-index *m* = (*m*1,..., *mN* ) whose elements range from 0 to *k* (=2) and ω*<sup>m</sup>* as defined in (9.27) and repeated here for convenience

$$\omega\_m = \sum\_{n=1}^N m\_n \omega\_n = m\_1\omega\_1 + \dots + m\_N\omega\_N\,.$$

Similarly to the time invariant case we can assume that the solution of the nonlinear differential equation includes a term of second order *y*₂ proportional to *A_j A_l* e^{j(ω_j+ω_l)t}; *j*, *l* = 1,..., *N*. *y*₂ can be found by retaining only those terms in the equation that are proportional to *A_j A_l* e^{j(ω_j+ω_l)t}. The resulting equation is linear with *c*₂(*t*)*y*₁²(*t*) playing the role of a source composed of tones. Exploiting linearity we can solve the equation for a single tone at *q*ω_T + ω₁ + ω₂ and combine the results at the end

$$L(t,D)\,\hat{g}_{2,q}(t,\omega_1,\omega_2)\,\mathrm{e}^{j(q\omega_T+\omega_1+\omega_2)t} = \mathrm{e}^{j(q\omega_T+\omega_1+\omega_2)t}\,.$$

$t \mapsto \hat{g}_{2,q}(t, \omega_1, \omega_2)$ is also $T$-periodic and can be expanded in a Fourier series

$$\hat{g}_{2,q}(t,\omega_1,\omega_2)\,\mathrm{e}^{j(q\omega_T+\omega_1+\omega_2)t} = \sum_{q_2=-\infty}^{\infty} c_{q_2}(\hat{g}_{2,q})\,\mathrm{e}^{j\left((q+q_2)\omega_T+\omega_1+\omega_2\right)t}\,.$$

With it, the second order term $y_2$ is given by

$$\begin{split} y_2(t) &= \sum_{|m|=2} \frac{2!}{m!}\, A_1^{m_1} \cdots A_N^{m_N}\, \mathrm{e}^{j\omega_m t} \\ &\quad\cdot \sum_{q_1=-\infty}^{\infty} \left( c_\cdot(c_2) \ast_d \hat{h}_{1,\cdot}(\omega_1)^{\ast_d m_1} \ast_d \cdots \ast_d \hat{h}_{1,\cdot}(\omega_N)^{\ast_d m_N} \right)_{q_1} \mathrm{e}^{jq_1\omega_T t} \\ &\quad\cdot \sum_{q_2=-\infty}^{\infty} c_{q_2}(\hat{g}_{2,q_1})\, \mathrm{e}^{jq_2\omega_T t} \end{split}$$

which, with the change of variable $l = q_1 + q_2$, can be rewritten as

$$y_2(t) = \sum_{|m|=2} \frac{2!}{m!}\, A_1^{m_1} \cdots A_N^{m_N} \sum_{l=-\infty}^{\infty} \hat{h}_{2,m,l}\, \mathrm{e}^{j(l\omega_T + \omega_m)t}$$

with

$$\hat{h}_{2,m,l} := \sum_{q_1=-\infty}^{\infty} \left( c_\cdot(c_2) \ast_d \hat{h}_{1,\cdot}(\omega_1)^{\ast_d m_1} \ast_d \cdots \ast_d \hat{h}_{1,\cdot}(\omega_N)^{\ast_d m_N} \right)_{q_1} c_{l-q_1}(\hat{g}_{2,q_1})\,.$$

The second order response of the system thus consists of tones at all possible sums of two of the input tone frequencies at a time, around each of the harmonics of the fundamental frequency of the system.

The higher order responses can be calculated in a similar manner. The *k*th order response has the form

$$y_k(t) = \sum_{|m|=k} \frac{k!}{m!}\, A_1^{m_1} \cdots A_N^{m_N} \sum_{l=-\infty}^{\infty} \hat{h}_{k,m,l}\, \mathrm{e}^{j(l\omega_T + \omega_m)t} \tag{13.12}$$

and is composed of tones at all possible sums of $k$ input tone frequencies at a time, around each of the harmonics of the system fundamental frequency. A comparison of the typical two tone responses of LTI-, WNTI-, LPTV- and WNPTV-systems is shown in Fig. 13.4.
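The bookkeeping of output frequencies in this description can be illustrated with a short sketch (the function name and the numeric frequencies are illustrative, not from the text): it enumerates the multi-indices with $|m| = 2$ and lists the second order output frequencies $l\omega_T + \omega_m$ produced by two input tones.

```python
from itertools import product

def second_order_freqs(omega_in, omega_T, l_range=range(-1, 2)):
    """Enumerate the second order output frequencies l*omega_T + omega_m,
    where omega_m is the sum of two input tone frequencies at a time
    (multi-indices m with |m| = 2)."""
    N = len(omega_in)
    freqs = set()
    for m in product(range(3), repeat=N):   # each m_n ranges over 0..2
        if sum(m) != 2:                     # keep |m| = 2 only
            continue
        omega_m = sum(mn * wn for mn, wn in zip(m, omega_in))
        for l in l_range:
            freqs.add(l * omega_T + omega_m)
    return freqs

# two input tones at 3 and 5 rad/s around a system frequency of 100 rad/s
print(sorted(second_order_freqs([3.0, 5.0], 100.0)))
```

For two tones the multi-indices are $(2,0)$, $(1,1)$ and $(0,2)$, so the sums $2\omega_1$, $\omega_1+\omega_2$ and $2\omega_2$ appear around each retained harmonic.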

**Fig. 13.4** Comparison of typical two (real) tones spectral response of LTI-, LPTV-, WNTI- and WNPTV-systems

Note that the factor

$$\sum_{l=-\infty}^{\infty} \hat{h}_{k,m,l}\, \mathrm{e}^{jl\omega_T t}$$

appearing in the $k$th order response $y_k$ is the Fourier series of the time-varying $k$th order nonlinear frequency response of the system $\hat{h}_k(t, \omega_1, \dots, \omega_k)$. It is related to $\hat{h}_{k,m,l}$ by

$$\hat{h}_{k,m,l} = c_l\Big(\hat{h}_k(t, \underbrace{\omega_1, \dots, \omega_1}_{m_1}, \dots, \underbrace{\omega_N, \dots, \omega_N}_{m_N})\Big), \qquad |m| = k\,.$$

**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Chapter 14 Periodically Switched Circuits**

This chapter is devoted to illustrating applications of the theory of weakly nonlinear time-varying systems with practical examples. After introducing the class of electrical networks called switched circuits, we analyse some instantiations in detail. We first study a practical implementation of a quadrature modulator, including its distortion and other aspects of fundamental importance in applications. As another example of the usefulness of time-varying circuits, we illustrate how they allow implementing highly selective filters that are otherwise unfeasible in small integrated form.

# **14.1 Switched Circuits**

An important class of circuits that finds many applications is that of *switched circuits*. These are circuits whose only time-varying components are switches. Despite the fact that switches are time-varying resistors, this class of systems can be analysed as a sequence of time invariant circuits, each one valid over an interval during which all switches remain in the same state.

Let $O$ denote an open interval of $\mathbb{R}^n$. The space $\mathcal{D}_O$ of test functions $\phi$ with support contained in $O$ is a vector subspace of $\mathcal{D}$. A distribution on $O$ is a distribution defined on $\mathcal{D}_O$. The vector space of distributions defined on $O$ is denoted by $\mathcal{D}'_O$.

Consider a linear switched circuit including at least one capacitor or inductor. Without loss of generality we can assume the network to be driven by a single independent source $x$ (superposition principle). Let $t_i$, $i \in \mathbb{N}$ denote the times at which any of the ideal switches changes state and $O_i$ the open intervals $(t_i, t_{i+1})$. In any of these intervals the network is described by a system of first order linear differential equations

$$Du = A\_i u + B\_i x$$

with $u$ the state of the network, which we can represent by the voltages across the capacitors and the currents through the inductors. Suppose that the network is in the zero state and that we apply a Dirac impulse at time $\tau$ with $t_i < \tau < t_{i+1}$. Then, for $\tau < t < t_{i+1}$ the state $u$ evolves as a continuous function. At time $t_{i+1}$ some of the switches change state. If we exclude circuits including closed loops composed exclusively of ideal inductors, ideal voltage sources and (closed) ideal switches, then at this time one of the following happens


$$v_1(t_{i+1}^+)\,[C_1+C_2] = v_2(t_{i+1}^+)\,[C_1+C_2] = v_1(t_{i+1}^-)\,C_1 + v_2(t_{i+1}^-)\,C_2\,.$$


These conditions specify initial conditions for the interval $O_{i+1}$ that, together with the differential equation, allow us to calculate the evolution of the state $u$ of the network in that interval. The same arguments apply to all subsequent switching times. The state $u$ is therefore fully determined and can be extended to a distribution on the whole of $\mathbb{R}$. The fundamental kernel $W$ is therefore well defined for $\tau \in O_i$, $i \in \mathbb{N}$.
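As a numerical sanity check of the charge-redistribution condition above, the following sketch connects two capacitors through an ideal switch (the component values are illustrative): the total charge is conserved, and both capacitors end up at the same voltage.

```python
# Two capacitors charged to different voltages are connected by an ideal
# switch at t_{i+1}; the node charge is conserved while the voltages jump.
C1, C2 = 1e-9, 3e-9          # farads (illustrative values)
v1, v2 = 2.0, 0.4            # volts just before the switch closes

q_before = C1 * v1 + C2 * v2          # total charge on the node
v_after = q_before / (C1 + C2)        # common voltage after closing

# the condition in the text:
#   v1(+)[C1+C2] = v2(+)[C1+C2] = v1(-)C1 + v2(-)C2
assert abs(v_after * (C1 + C2) - q_before) < 1e-18
print(v_after)   # 0.8 V
```

Note that the capacitor voltages are discontinuous at the switching instant, which is precisely why the initial conditions for the next interval must be computed from charge conservation rather than continuity.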

The only problematic cases are when τ coincides with one of the switching times. To work around this problem we limit the set of allowed input signals to the set of regular bounded distributions. Then, assuming further *x* to be right-sided, the state of the circuit can be represented by the integral

$$u(t) = \int_0^t W(t,\tau)\,B(\tau)\,x(\tau)\,\mathrm{d}\tau\,.$$

Since the values of $\tau$ at which $W$ is not defined form a set of zero measure, the output is well-defined at all times.

The above discussion shows that for switched circuits the computation of the time-varying impulse response is a straightforward process. In addition, since the impulse response in each interval $O_i$ corresponds to that of an LTI system, the computation of the time-varying frequency response by Fourier transformation of $h(t,\xi)$ does not pose any problem.

*Linear periodically switched circuits* are linear switched circuits in which the operation of the switches is periodic. For these circuits the time-varying impulse response $h(t,\xi)$ and the time-varying frequency response $\hat{h}(t,\omega)$ are periodic in time.

In the following we illustrate the use of this technique to analyse idealised versions of some practical periodically switched circuits used in communication receivers and transmitters.

# **14.2 Voltage-Mode Quadrature Modulator**

In this section we analyse an implementation of the quadrature modulator of Example 12.9 suitable for realisation in a CMOS technology and shown in Fig. 14.1. The input signals $v_I$ and $v_Q$ (called $r$ and $q$ in Example 12.9) are applied differentially. We assume the LO signals to be non-overlapping and to have *very fast edges*. In fact, we model the gate signals as rectangular waveforms, high 25% of the time and with a relative delay among them of $T/4$. We assume further that when the corresponding LO signal is high each MOSFET can be modeled as a resistor of value $r_{ON}$, while when the LO signal is low it can be modeled as an open circuit (infinite resistance). We are interested in the signal $v_A$ at the input of the amplifier following the switching transistors and assume that the input impedance of the latter can be adequately modeled as a capacitor.

**Fig. 14.2** Voltage-mode quadrature modulator model

Under these assumptions the circuit is *linear*. Hence, we can analyse the contribution to the output of each input signal independently. Figure 14.2 shows the model used to analyse the contribution of signal $v_I^+$, where we have combined $r_{ON}$ with the source resistance (assumed equal at all inputs)

$$r = r_{ON} + R_S$$

and where we assume the switches to be closed when the corresponding LO control signal $l_i$, $i = 0, \dots, 3$ is high and open when low. The control signals are defined by

$$l_i(t) := l(t - i\,T/4)\,, \qquad i = 0, \dots, 3$$

with $l$ the signal introduced in Example 12.10 and shown in Fig. 12.10a with $\tau = T/4$.

# *14.2.1 Single Input Response*

In this subsection we analyse the response to $v_I^+$. To compute its contribution to the output, we apply an input impulse at time $\tau$. If the impulse is applied when the input switch is open, then its contribution is zero. If it's applied when the switch is closed, $\tau \in (-T/8, T/8) \bmod T$, its contribution is described by the differential equation

$$(1 + rCD)\,v_A = \delta(t - \tau)\,.$$

Note that the capacitor $C$ is connected in parallel with a resistor of value $r$ at all times! The fundamental kernel is therefore

$$W(t,\tau) = \omega_{3dB}\, \mathrm{e}^{-\omega_{3dB}(t-\tau)}\, 1_+(t-\tau)\, l_0(\tau), \qquad \omega_{3dB} = \frac{1}{rC}$$

and can be interpreted as the impulse response of an LTI system whose input signal is $v_I^+(t)\,l_0(t)$.

The time-varying impulse response of the system can be obtained from the above fundamental kernel by applying the variable transformation $\xi = t - \tau$

$$h(t,\xi) = \omega_{3dB}\, \mathrm{e}^{-\omega_{3dB}\xi}\, 1_+(\xi)\, l_0(t-\xi)\,.$$

We compute the response of the system to a complex tone through the time-varying frequency response. The latter is obtained by Fourier transforming $h(t,\xi)$ with respect to $\xi$

$$\hat{h}(t,\omega) = \int_{-\infty}^{\infty} h(t,\xi)\, \mathrm{e}^{-j\omega\xi}\, \mathrm{d}\xi\,.$$

Using the Fourier series of $l_0$

$$l_0(t) = \sum_{n=-\infty}^{\infty} a_n\, \mathrm{e}^{jn\omega_T t}, \qquad a_n = a_{-n} = \begin{cases} \frac{1}{4} & n = 0 \\ \frac{1}{\pi n} \sin\left(n\frac{\pi}{4}\right) & n > 0 \end{cases}$$

where $\omega_T = 2\pi/T$, we have

$$\begin{split} \hat{h}(t,\omega) &= \omega_{3dB} \int_0^\infty \mathrm{e}^{-(\omega_{3dB} + j\omega)\xi} \sum_{n=-\infty}^\infty a_n\, \mathrm{e}^{jn\omega_T(t-\xi)}\, \mathrm{d}\xi \\ &= \omega_{3dB} \sum_{n=-\infty}^\infty a_n \int_0^\infty \mathrm{e}^{-[\omega_{3dB} + j(\omega + n\omega_T)]\xi}\, \mathrm{d}\xi\; \mathrm{e}^{jn\omega_T t} \\ &= \sum_{n=-\infty}^\infty \hat{h}_n(\omega)\, \mathrm{e}^{jn\omega_T t} \end{split} \tag{14.1}$$

with

$$\begin{split} \hat{h}_n(\omega) &= \omega_{3dB}\, a_n \int_0^\infty \mathrm{e}^{-[\omega_{3dB} + j(\omega + n\omega_T)]\xi}\, \mathrm{d}\xi \\ &= \frac{a_n}{1 + j\frac{\omega + n\omega_T}{\omega_{3dB}}}\,. \end{split} \tag{14.2}$$

The response of the system to a complex tone of angular frequency ω is thus

$$v_A(t) = \sum_{n=-\infty}^{\infty} \hat{h}_n(\omega)\, \mathrm{e}^{j(\omega + n\omega_T)t} \tag{14.3}$$

and, as remarked above, is seen to be equal to the response of an LTI system with transfer function


$$H(s) = \frac{1}{1 + \frac{s}{\omega_{3dB}}} \tag{14.4}$$

to the input

$$v_I^+(t)\,l_0(t) = \sum_{n=-\infty}^{\infty} a_n\, \mathrm{e}^{j(\omega + n\omega_T)t}\,.$$
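As a cross-check of (14.2) and (14.3), the following sketch integrates the time-domain equation $Dv_A = \omega_{3dB}\,(l_0(t)\,v_I^+(t) - v_A)$ directly and compares the steady state with the truncated series. All parameter values (tone frequency, $\omega_{3dB}/\omega_T$ ratio, step size, truncation) are illustrative choices, not from the text.

```python
import numpy as np

T = 1.0
wT = 2 * np.pi / T
w3dB = 2 * wT          # illustrative corner frequency
w = 0.1 * wT           # input tone frequency

def l0(t):
    # 25% duty-cycle gate, high for t in (-T/8, T/8) mod T
    return 1.0 if abs(((t + T / 2) % T) - T / 2) < T / 8 else 0.0

def a(n):
    n = abs(n)
    return 0.25 if n == 0 else np.sin(n * np.pi / 4) / (np.pi * n)

def h(n):                               # Eq. (14.2)
    return a(n) / (1 + 1j * (w + n * wT) / w3dB)

def vA_series(t, N=2000):               # Eq. (14.3), truncated
    return sum(h(n) * np.exp(1j * (w + n * wT) * t)
               for n in range(-N, N + 1))

# brute-force RK4 integration of  D v_A = w3dB * (l0(t) v_in(t) - v_A)
def f(t, v):
    return w3dB * (l0(t) * np.exp(1j * w * t) - v)

dt = T / 4000
v, t = 0.0 + 0.0j, 0.0
for _ in range(40 * 4000):              # 40 periods, transient dies out
    k1 = f(t, v)
    k2 = f(t + dt / 2, v + dt / 2 * k1)
    k3 = f(t + dt / 2, v + dt / 2 * k2)
    k4 = f(t + dt, v + dt * k3)
    v += dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    t += dt

print(abs(v - vA_series(t)))            # small residual
```

The residual is dominated by the series truncation and the time stepping across the switching discontinuities, so it is small but not machine precision.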

# *14.2.2 Input Current and Switched Resistor*

Before combining the outputs from the four input signals $v_I^+$, $v_I^-$, $v_Q^+$ and $v_Q^-$, we compute the input current drawn from the input $v_I^+$ when the other sources are disabled. When the input switch is closed, the input current is equal to the current flowing into the capacitor, while when the switch is open, the current is zero

$$i_S(t) = l_0(t)\,i_C(t), \qquad i_C(t) = C\,Dv_A(t)\,.$$

The current is therefore given by

$$\begin{split} i_S(t) &= \left[\sum_{m=-\infty}^{\infty} a_m\, \mathrm{e}^{jm\omega_T t}\right] \left[\sum_{n=-\infty}^{\infty} \frac{a_n}{r}\, \frac{j\frac{\omega + n\omega_T}{\omega_{3dB}}}{1 + j\frac{\omega + n\omega_T}{\omega_{3dB}}}\, \mathrm{e}^{j(\omega + n\omega_T)t}\right] \\ &= \sum_{m=-\infty}^{\infty} \sum_{n=-\infty}^{\infty} \frac{a_m a_n}{r}\, \frac{j\frac{\omega + n\omega_T}{\omega_{3dB}}}{1 + j\frac{\omega + n\omega_T}{\omega_{3dB}}}\, \mathrm{e}^{j[\omega + (n+m)\omega_T]t} \\ &= \sum_{k=-\infty}^{\infty} y_k(\omega)\, \mathrm{e}^{j(\omega + k\omega_T)t} \end{split} \tag{14.5}$$

with

$$y_k(\omega) := \sum_{n=-\infty}^{\infty} a_{k-n} a_n\, \frac{j(\omega + n\omega_T)C}{1 + j\frac{\omega + n\omega_T}{\omega_{3dB}}} \tag{14.6}$$

where in the last step we made the substitution *k* = *m* + *n*.

Let's consider more closely the term for $k = 0$. Substituting the expression for $a_n$ we obtain

$$y_0(\omega) = j\omega \frac{C}{16} + \sum_{n \neq 0} \frac{\sin^2(n\frac{\pi}{4})}{(\pi n)^2}\, \frac{j(\omega + n\omega_T)C}{1 + j\frac{\omega + n\omega_T}{\omega_{3dB}}}\,.$$

To find an approximate value for this series it's useful to separate real and imaginary parts

$$y_0(\omega) = g_0(\omega) + j\,b_0(\omega)\,.$$

We start by simplifying the imaginary part

$$b_0(\omega) = \omega \frac{C}{16} + \sum_{n \neq 0} \frac{\sin^2(n\frac{\pi}{4})}{(\pi n)^2}\, \frac{(\omega + n\omega_T)C}{1 + \left(\frac{\omega + n\omega_T}{\omega_{3dB}}\right)^2}$$

The first thing to note is that the terms proportional to $n\omega_T$ with $n > 0$ cancel with the ones for $n < 0$, so that we obtain

$$b_0(\omega) = \omega C \left[ \frac{1}{16} + \sum_{n \neq 0} \frac{\sin^2(n\frac{\pi}{4})}{(\pi n)^2}\, \frac{1}{1 + \left(\frac{\omega + n\omega_T}{\omega_{3dB}}\right)^2} \right]. \tag{14.7}$$

All terms in the square bracket are positive. If we assume $|\omega| \ll \omega_T < \omega_{3dB}$, the terms decrease as $1/n^2$ for $n < \omega_{3dB}/\omega_T$ and as $1/n^4$ for larger values of $n$. We can therefore bound the series by

$$\sum_{n\neq 0} \frac{\sin^2(n\frac{\pi}{4})}{(\pi n)^2}\, \frac{1}{1 + \left(\frac{\omega + n\omega_T}{\omega_{3dB}}\right)^2} < \sum_{n\neq 0} \frac{1}{(\pi n)^2}\,.$$

Using the known result

$$\sum\_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6}$$

we thus obtain the upper bound

$$\frac{b_0(\omega)}{\omega C} < \frac{1}{16} + 2\sum_{n=1}^{\infty} \frac{1}{(\pi n)^2} = \frac{19}{48} \approx 0.40\,.$$

This bound is tighter for large values of $\omega_{3dB}/\omega_T$.

To obtain a value closer to the actual value of the series, we note that for $N \gg 1$

$$\sum_{n=1}^{N} \sin^2\left(n\frac{\pi}{4}\right) \approx \frac{N}{2}\,. \tag{14.8}$$

Instead of bounding $\sin^2(n\pi/4)$ by 1, we approximate its value by $1/2$ independently of $n$ to obtain

$$\frac{b_0(\omega)}{\omega C} \approx \frac{1}{16} + \frac{1}{6} = \frac{11}{48} \approx 0.23\,.$$

Figure 14.3 shows the normalized value of $b_0$ as a function of $\omega_{3dB}/\omega_T$ computed from (14.7). We see that, while the argument used to obtain the above approximate value is questionable, for large values of $\omega_{3dB}/\omega_T$ the approximation is remarkably close to the real value.
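A direct numerical evaluation of the series in (14.7) supports this. The sketch below takes $\omega \to 0$ and an illustrative ratio $\omega_{3dB}/\omega_T = 10$ (both choices are ours, not from the text) and compares the result with the bound 19/48 and the approximate value 11/48.

```python
import numpy as np

def b0_norm(ratio, nmax=200_000):
    """b0/(omega*C) from (14.7) in the limit omega -> 0,
    with ratio = w3dB/wT; the series is truncated at nmax."""
    n = np.arange(1, nmax + 1, dtype=float)
    terms = (np.sin(n * np.pi / 4) ** 2 / (np.pi * n) ** 2
             / (1 + (n / ratio) ** 2))
    return 1 / 16 + 2 * terms.sum()   # factor 2: n and -n terms are equal

val = b0_norm(10.0)
print(val, 11 / 48, 19 / 48)
```

Already for $\omega_{3dB}/\omega_T = 10$ the computed value sits within a few percent of 11/48 and well below the bound 19/48.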

We next turn to the real part of $y_0(\omega)$

$$g_0(\omega) = \frac{1}{r} \sum_{n \neq 0} \frac{\sin^2(n\frac{\pi}{4})}{(\pi n)^2}\, \frac{\left(\frac{\omega + n\omega_T}{\omega_{3dB}}\right)^2}{1 + \left(\frac{\omega + n\omega_T}{\omega_{3dB}}\right)^2} \tag{14.9}$$

If we assume $|\omega| \ll \omega_T < \omega_{3dB}$, then to a good approximation we have

$$g_0(\omega) \approx \frac{2}{r} \sum_{n=1}^{\infty} \frac{\sin^2(n\frac{\pi}{4})}{\pi^2}\, \frac{\left(\frac{\omega_T}{\omega_{3dB}}\right)^2}{1 + \left(\frac{n\omega_T}{\omega_{3dB}}\right)^2}$$

If $\omega_T/\omega_{3dB} \ll 1$, then the quadratic term in the denominator can be neglected in a large number of terms, up to approximately $N = \omega_{3dB}/\omega_T$. The first $N$ terms of the series contribute the largest part of its total value. Therefore, referring again to the approximation (14.8), we approximate again $\sin^2(n\pi/4)$ by $1/2$. Instead of neglecting the terms for $n > N$, we approximate the series by the integral

$$\frac{1}{\pi^2 r}\, \frac{\omega_T}{\omega_{3dB}} \int_0^\infty \frac{1}{1+x^2}\, \mathrm{d}x$$

with

$$n\frac{\omega_T}{\omega_{3dB}} \to x, \qquad \frac{\omega_T}{\omega_{3dB}} \to \mathrm{d}x\,.$$

This integral is easily solved and we finally obtain

$$g_0(\omega) \approx \frac{1}{r\pi^2}\, \frac{\omega_T}{\omega_{3dB}}\, \frac{\pi}{2} = \frac{C}{T}\,.$$

Figure 14.3 shows the normalized value of $g_0$ as a function of $\omega_{3dB}/\omega_T$ computed from Eq. (14.9). For $\omega_{3dB}/\omega_T > 3$ it's in very good agreement with the given approximation.
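The approximation $g_0 \approx C/T$ can also be checked by summing (14.9) directly. The sketch below uses normalized units $r = C = 1$ and an illustrative ratio $\omega_{3dB}/\omega_T = 10$ (these numeric choices are ours, not from the text).

```python
import numpy as np

r, C = 1.0, 1.0                # normalized units (illustrative)
w3dB = 1.0 / (r * C)
wT = w3dB / 10.0               # illustrative ratio w3dB/wT = 10
T = 2 * np.pi / wT

# series (14.9) for omega -> 0; the n and -n terms are equal, hence
# the factor of 2 in front of the one-sided sum
n = np.arange(1, 200_001, dtype=float)
x = n * wT / w3dB
g0 = (2 / r) * np.sum(np.sin(n * np.pi / 4) ** 2 / (np.pi * n) ** 2
                      * x ** 2 / (1 + x ** 2))
print(g0, C / T)
```

The directly summed series agrees with $C/T$ to within a fraction of a percent at this ratio.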

Figure 14.4 shows the normalized current $i_S(t)$ for a cosinusoidal input with $\omega = 0.1\,\omega_T$ and $\omega_{3dB} = 2\,\omega_T$. The curve consists of peaks coinciding with the closing instants of switch 0, each followed by an exponential decay with a time constant of approximately $1/\omega_{3dB}$ and a sudden jump to zero at the instants where switch 0 is opened. (The oscillations around the instants where switch 0 changes state are due to the Gibbs phenomenon of the Fourier series.) As the time constant is shortened by reducing the value of $r$, the curve converges to a series of Dirac pulses at the closing instants of switch 0. If we shift the closing instants of switch 0 to multiples of $T$, we can express this behavior by

$$\lim_{\substack{r \to 0 \\ \omega \to 0}} i_S\left(t - \frac{T}{8}\right) = \sum_{k=-\infty}^{\infty} \frac{C}{T}\, \mathrm{e}^{jk\omega_T t}\,.$$

The discrete spectrum of $i_S(t - T/8)$ for $\omega_{3dB} = 3\,\omega_T$, $20\,\omega_T$ and $\omega = 0.1\,\omega_T$ is shown in Fig. 14.5. The figure shows that, as the value of $\omega_{3dB}/\omega_T$ is increased, an increasing number of coefficients $y_k(\omega)$ tend to approach the value $C/T$, as expected. At a value of $k \approx \omega_{3dB}/\omega_T$ the real and imaginary parts have roughly the same value, and for larger values of $k$ the magnitude of $y_k(\omega)$ decreases.

# *14.2.3 Full Response*

We now go back to the voltage across the capacitor $v_A$ and calculate the combined response to all four signals $v_I^+$, $v_I^-$, $v_Q^+$ and $v_Q^-$. To distinguish the four responses we will add a subscript equal to the one of the corresponding LO signal. Thus, in the following we will denote the response to the signal $v_I^+$ given in (14.3) by $v_{A,0}$. The contribution of $v_I^-$ differs from the one of $v_I^+$ by (i) a shift by $T/2$ in the LO waveform and (ii) a reversal of sign of the input signal. Its contribution to $v_A$ is therefore

$$\begin{split} v_{A,2}(t) &= -\mathrm{e}^{j\omega t} \sum_{n=-\infty}^{\infty} \hat{h}_n(\omega)\, \mathrm{e}^{jn\omega_T(t - T/2)} \\ &= \sum_{n=-\infty}^{\infty} (-1)^{n+1} \hat{h}_n(\omega)\, \mathrm{e}^{j(\omega + n\omega_T)t}\,. \end{split}$$

Note that the even harmonics have opposite sign compared to the ones of $v_{A,0}$, while the odd ones have the same sign. Therefore the combined response of $v_I^+$ and $v_I^-$ consists of odd harmonics only

$$v_{A,0}(t) + v_{A,2}(t) = 2 \sum_{n \text{ odd}} \hat{h}_n(\omega)\, \mathrm{e}^{j(\omega + n\omega_T)t}\,.$$

The response to the signal $v_Q^+$ differs from the one to $v_I^+$ by (i) a shift by $-T/4$ of the LO signal and (ii) a shift of $T/4$ in the input signal

$$v_Q^+ = -j\,\mathrm{e}^{j\omega t}\,.$$

Its contribution to $v_A$ is therefore

$$\begin{split} v_{A,3}(t) &= -j\,\mathrm{e}^{j\omega t} \sum_{n=-\infty}^{\infty} \hat{h}_n(\omega)\, \mathrm{e}^{jn\omega_T(t + T/4)} \\ &= -\sum_{n=-\infty}^{\infty} j^{n+1} \hat{h}_n(\omega)\, \mathrm{e}^{j(\omega + n\omega_T)t}\,. \end{split}$$

Similarly, the response to the signal $v_Q^-$ differs from the one to $v_I^+$ by (i) a shift by $T/4$ of the LO signal and (ii) a shift of $-T/4$ in the input signal

$$\begin{split} v_{A,1}(t) &= j\,\mathrm{e}^{j\omega t} \sum_{n=-\infty}^{\infty} \hat{h}_n(\omega)\, \mathrm{e}^{jn\omega_T(t - T/4)} \\ &= -\sum_{n=-\infty}^{\infty} (-j)^{n+1} \hat{h}_n(\omega)\, \mathrm{e}^{j(\omega + n\omega_T)t}\,. \end{split}$$

Again we note that the odd harmonics of $v_{A,3}$ and $v_{A,1}$ have the same sign, while the even ones have opposite sign. The combined response of these two signals is therefore also composed of odd harmonics only

$$\begin{split} v_{A,3}(t) + v_{A,1}(t) &= -2 \sum_{n \text{ odd}} j^{n+1} \hat{h}_n(\omega)\, \mathrm{e}^{j(\omega + n\omega_T)t} \\ &= -2 \sum_{n \text{ odd}} (-1)^{(n+1)/2} \hat{h}_n(\omega)\, \mathrm{e}^{j(\omega + n\omega_T)t}\,. \end{split}$$

We now combine the two partial sums $v_{A,0} + v_{A,2}$ and $v_{A,3} + v_{A,1}$. The terms for $n = 1 + 4m$, $m \in \mathbb{Z}$ have the same sign, while the terms at $n = -1 + 4m$ have opposite sign. The total sum is therefore

$$\begin{split} v_A(t) &= v_{A,0}(t) + v_{A,2}(t) + v_{A,3}(t) + v_{A,1}(t) \\ &= 4 \sum_{m=-\infty}^{\infty} \hat{h}_{1+4m}(\omega)\, \mathrm{e}^{j[\omega + (1+4m)\omega_T]t}\,. \end{split}$$

Note again that the response of the system is equal to the one of an LTI system with the transfer function given by (14.4) and driven by the input signal

$$x(t) = v_I^+(t)\,l_0(t) + v_I^-(t)\,l_2(t) + v_Q^+(t)\,l_3(t) + v_Q^-(t)\,l_1(t)\,.$$

Using four signal paths (two differential), this quadrature modulator cancels three out of every four spurious emission tones. It's a transmitter implementation of the harmonic-reject mixer presented in Example 12.10, where we showed that suppressing more harmonics requires a larger number of signal paths.
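The cancellation pattern can be verified directly. The four path contributions scale $\hat h_n$ by $1$, $(-1)^{n+1}$, $-j^{n+1}$ and $-(-j)^{n+1}$ respectively, and their sum vanishes except at $n = 1 + 4m$ (the helper name below is ours):

```python
def path_factor(n):
    """Sum of the four path weights multiplying h_n in v_A."""
    return 1 + (-1) ** (n + 1) - 1j ** (n + 1) - (-1j) ** (n + 1)

# factor is 4 for n = 1 + 4m and 0 for all other harmonics
for n in range(-5, 6):
    print(n, path_factor(n))
```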

The above derivation of the output signal highlights the fact that the suppression of harmonics relies on the exact cancelling of strong tones. We will investigate some imperfections limiting the amount of cancelling achievable in practical implementations in later sections. Before turning to that question, we investigate the effect called carrier leakage.

# *14.2.4 Carrier Leakage*

Carrier leakage refers to the presence of a tone at $\pm\omega_T$ in the output spectrum of the modulator. In transmitters, it is one of the undesired tones close to the signal of interest (or, depending on the architecture, indeed overlapping with the modulated wanted signal) that can't be easily filtered. It is caused by the presence of small DC offset voltages at the inputs of the modulator. These offset voltages are the result of mismatch in the driving circuits and are therefore Gaussian random variables.

We denote the DC offset random variables by $X_k$, $k = 0, \dots, 3$, where the index matches the one of the corresponding switch. They form the following signal at the input of the equivalent LTI system

$$\begin{split} &X_0 l_0(t) + X_1 l_1(t) + X_2 l_2(t) + X_3 l_3(t) \\ &= \sum_{n=-\infty}^{\infty} \left( X_0 + X_1 \mathrm{e}^{-j\frac{\pi}{2}n} + X_2 \mathrm{e}^{-j\pi n} + X_3 \mathrm{e}^{-j\frac{3\pi}{2}n} \right) a_n\, \mathrm{e}^{jn\omega_T t} \\ &= \sum_{n=-\infty}^{\infty} \left( X_0 + X_1 (-j)^n + X_2 (-1)^n + X_3\, j^n \right) a_n\, \mathrm{e}^{jn\omega_T t}\,. \end{split} \tag{14.10}$$

Since usually the most problematic tone is the one at $\pm\omega_T$, we only consider the terms for $n = -1, 1$. The term for $n = 1$ is

$$[X_0 - X_2 + j(X_3 - X_1)]\, a_1\, \mathrm{e}^{j\omega_T t}$$

and the one for $n = -1$ is its complex conjugate. The sum of the two terms gives

$$2a_1 \Big[ X_c \cos(\omega_T t) - X_s \sin(\omega_T t) \Big], \qquad X_c = X_0 - X_2, \quad X_s = X_3 - X_1\,.$$

Linear combinations of independent Gaussian random variables are Gaussian. Therefore, if we assume $X_k$, $k = 0, \dots, 3$ to be independent of each other, $X_c$ and $X_s$ are independent Gaussian random variables as well. We denote the standard deviation of $X_c$ and $X_s$ by $\sigma_X$. Their joint probability density function (PDF) is

$$p_{X_c,X_s}(x_c,x_s) = p_{X_c}(x_c)\,p_{X_s}(x_s) = \frac{1}{2\pi\sigma_X^2}\, \mathrm{e}^{-\frac{x_c^2 + x_s^2}{2\sigma_X^2}}\,.$$

It is now convenient to pass to polar random variables. Specifically, using the relation

$$\cos(\omega t + \phi) = \cos(\phi)\cos(\omega t) - \sin(\phi)\sin(\omega t)$$

the sum of the input terms for *n* = 1 and -1 can be rewritten as

$$2a_1 X_r \cos(\omega_T t + X_\phi)$$

with the new polar random variables

$$X_r = \sqrt{X_c^2 + X_s^2}\,, \qquad X_\phi = \arctan\frac{X_s}{X_c}\,.$$

Given that the probability density in terms of $X_c$, $X_s$ must agree with the one in terms of $X_r$, $X_\phi$, we must have

$$p_{X_c,X_s}(x_c,x_s)\, \mathrm{d}x_c\, \mathrm{d}x_s = p_{X_r,X_\phi}(x_r,x_\phi)\, \mathrm{d}x_r\, \mathrm{d}x_\phi\,.$$

From this equation and $\mathrm{d}x_c\,\mathrm{d}x_s = x_r\,\mathrm{d}x_r\,\mathrm{d}x_\phi$ we therefore deduce

$$p_{X_r,X_\phi}(x_r,x_\phi) = \frac{x_r}{2\pi\sigma_X^2}\, \mathrm{e}^{-\frac{x_r^2}{2\sigma_X^2}}\,.$$

This joint probability density function is easily factored

$$p_{X_r,X_\phi}(x_r,x_\phi) = p_{X_r}(x_r)\,p_{X_\phi}(x_\phi)$$

which implies that $X_r$ and $X_\phi$ are independent random variables with the following probability density functions

$$p_{X_r}(x_r) = \frac{x_r}{\sigma_X^2}\, \mathrm{e}^{-\frac{x_r^2}{2\sigma_X^2}}\,, \qquad x_r \ge 0 \tag{14.11}$$

$$p\_{X\_{\phi}}(\mathbf{x}\_{\phi}) = \begin{cases} \frac{1}{2\pi} & 0 \le \mathbf{x}\_{\phi} < 2\pi \\ 0 & \text{otherwise} \end{cases} \tag{14.12}$$

The phase random variable $X_\phi$ is uniformly distributed over the full circle. The distribution of the variable $X_r$ is called the *Rayleigh distribution*. Its PDF and complementary cumulative distribution function $1 - F_{X_r}(x_r)$ are plotted in Fig. 14.6. The PDF assumes its maximum at $x_r = \sigma_X$. The expected value and variance of $X_r$ are


$$\mathrm{E}[X_r] = \int_0^\infty x_r\, p_{X_r}(x_r)\, \mathrm{d}x_r = \sqrt{\frac{\pi}{2}}\, \sigma_X$$

and

$$\mathrm{Var}(X_r) = \mathrm{E}\big[(X_r - \mathrm{E}[X_r])^2\big] = \frac{4 - \pi}{2}\, \sigma_X^2$$

respectively.
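These Rayleigh statistics are easy to confirm by Monte Carlo simulation ($\sigma_X = 1$ is an arbitrary normalization; the sample count and seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma_X = 1.0
n = 1_000_000
Xc = rng.normal(0.0, sigma_X, n)     # independent Gaussian offsets
Xs = rng.normal(0.0, sigma_X, n)
Xr = np.hypot(Xc, Xs)                # magnitude is Rayleigh distributed

print(Xr.mean(), np.sqrt(np.pi / 2) * sigma_X)   # ~1.2533
print(Xr.var(), (4 - np.pi) / 2 * sigma_X**2)    # ~0.4292
```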

The carrier leakage of the modulator is therefore given by

$$X_r\, \frac{\sqrt{2}}{\pi}\, \Re\big\{H(j\omega_T)\, \mathrm{e}^{j(\omega_T t + X_\phi)}\big\}$$

with the phase uniformly distributed over the full circle. The magnitude of the tone is Rayleigh distributed and, if $\omega_T \ll \omega_{3dB}$, has an expected value of

$$\frac{\sigma_X}{\sqrt{\pi}}\,.$$

From Fig. 14.6 we read that, under the same assumption, 0.1% of the modulators have a carrier leakage magnitude exceeding

$$3.7\, \frac{\sqrt{2}}{\pi}\, \sigma_X \approx 1.67\, \sigma_X\,.$$
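The 0.1% point follows directly from the Rayleigh complementary CDF, $1 - F_{X_r}(x_r) = \mathrm{e}^{-x_r^2/(2\sigma_X^2)}$; a minimal sketch ($\sigma_X = 1$ as normalization):

```python
import math

# solve exp(-x^2 / (2 sigma^2)) = 1e-3 for the 0.1% exceedance point
sigma_X = 1.0
x = sigma_X * math.sqrt(2 * math.log(1000))
print(x)                            # ~3.717 (the 3.7 read from Fig. 14.6)
print(x * math.sqrt(2) / math.pi)   # leakage magnitude, ~1.67 * sigma_X
```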

# *14.2.5 Image-Rejection*

In this subsection we come back to the finite cancelling of harmonics in practical implementations. We assume again the common case of a transmitter up-converting the input signal to $\omega + \omega_T$ with $|\omega| \ll \omega_T \ll \omega_{3dB}$.

In previous calculations we assumed perfectly balanced signals $v_I^- = -v_I^+$, $v_Q^- = -v_Q^+$, equal amplitudes for all signals, a phase difference between $v_I^+$ and $v_Q^+$ of exactly $\pi/2$ and delays between the LO signals of exactly $T/4$. If we now introduce small differences in the amplitudes

$$v_I^+(t) = (A_I + \Delta A_I/2)\, \mathrm{e}^{j\omega t}, \qquad v_I^-(t) = -(A_I - \Delta A_I/2)\, \mathrm{e}^{j\omega t}$$

the even harmonics of $v_I^+(t)\,l_0(t) + v_I^-(t)\,l_2(t)$ do not cancel perfectly anymore

$$v_I^+(t)\,l_0(t) + v_I^-(t)\,l_2(t) = \sum_{n \text{ odd}} 2A_I a_n\, \mathrm{e}^{j(n\omega_T + \omega)t} + \sum_{n \text{ even}} \Delta A_I a_n\, \mathrm{e}^{j(n\omega_T + \omega)t}$$

and similarly for the signal $v_Q^+(t)\,l_3(t) + v_Q^-(t)\,l_1(t)$

$$\begin{split} v_Q^+(t)\,l_3(t) + v_Q^-(t)\,l_1(t) &= -\sum_{n \text{ odd}} j^{n+1}\, 2A_Q a_n\, \mathrm{e}^{j[(n\omega_T + \omega)t - n\Delta\phi]} \\ &\quad -\sum_{n \text{ even}} j^{n+1}\, \Delta A_Q a_n\, \mathrm{e}^{j[(n\omega_T + \omega)t - n\Delta\phi]} \end{split}$$

where in addition we have added a small delay error of $\Delta\tau$ in $l_3$ and $l_1$ and set $\Delta\phi = 2\pi\Delta\tau/T$. If we now sum these partial sums, the complete cancelling that was happening for three harmonics out of four becomes a partial cancelling. In particular, this partial cancelling causes the appearance of a tone at $|\omega_T - \omega|$ (for $n = -1$) which is difficult to filter as, in a single-sided representation, it appears very close to the wanted signal at $\omega_T + \omega$. The tone at $|\omega_T - \omega|$ is called the *image* of the wanted signal. The ratio of the magnitude of the image to the one of the signal is called the *image-reject ratio* (IRR) and is given by

$$IRR = \left| \frac{A\_I - A\_Q \mathbf{e}^{j\Delta\phi}}{A\_I + A\_Q \mathbf{e}^{-j\Delta\phi}} \right| = \left| \frac{A\_I \mathbf{e}^{-j\Delta\phi/2} - A\_Q \mathbf{e}^{j\Delta\phi/2}}{A\_I \mathbf{e}^{j\Delta\phi/2} + A\_Q \mathbf{e}^{-j\Delta\phi/2}} \right|$$

$$= \sqrt{\frac{(A\_I - A\_Q)^2 \cos^2(\Delta\phi/2) + (A\_I + A\_Q)^2 \sin^2(\Delta\phi/2)}{(A\_I + A\_Q)^2 \cos^2(\Delta\phi/2) + (A\_I - A\_Q)^2 \sin^2(\Delta\phi/2)}}$$

$$\approx \sqrt{\left(\frac{A\_I - A\_Q}{A\_I + A\_Q}\right)^2 + \tan^2(\Delta\phi/2)}\tag{14.13}$$

where in the last step we neglected the term $(A_I - A_Q)^2 \sin^2(\Delta\phi/2)$ in the denominator, which is of second order in the errors. Note that part of the phase error could well come from the input signal. This is the IRR of the effective signal at the input of the LTI system $H(s)$. It is plotted in Fig. 14.7.
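As a quick numerical sanity check of (14.13), the following sketch compares the exact IRR with the approximation; the 1 % amplitude imbalance and 1° phase error are arbitrary illustrative values, and all function names are ours:

```python
import numpy as np

def irr_exact(A_I, A_Q, dphi):
    """Exact image-reject ratio |A_I - A_Q e^{j dphi}| / |A_I + A_Q e^{-j dphi}|."""
    return abs(A_I - A_Q * np.exp(1j * dphi)) / abs(A_I + A_Q * np.exp(-1j * dphi))

def irr_approx(A_I, A_Q, dphi):
    """First-order approximation in the style of (14.13)."""
    return np.sqrt(((A_I - A_Q) / (A_I + A_Q)) ** 2 + np.tan(dphi / 2) ** 2)

# 1 % amplitude imbalance and 1 degree of phase error
exact = irr_exact(1.0, 0.99, np.radians(1.0))
approx = irr_approx(1.0, 0.99, np.radians(1.0))
```

For these small errors the two expressions agree to well below 1 %, and the resulting IRR is about −40 dB.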

# *14.2.6 Effect of Mismatch*

In this subsection we investigate the effect of mismatch which, as we will see, is another phenomenon limiting the amount of harmonic cancelling achievable in practical implementations.

Due to mismatch, each of the four transistors with which the modulator is implemented (see Figs. 14.1 and 14.2) presents a slightly different $r_{ON}$ resistance, and similarly for the four source resistors. Therefore the value of the resistance connected to the capacitor is not constant, but time-varying

$$r(t) = \sum_{k=0}^{3} r_k\, l_k(t)\,, \qquad r_k \in \mathbb{R},\ k = 0, \ldots, 3\,.$$

The differential equation describing the system therefore becomes

$$[D + \omega_{3dB}(t)]v_A = \omega_{3dB}(t)\,x(t)\,, \qquad \omega_{3dB}(t) := \frac{1}{r(t)C}$$

$$x(t) := v_I^+(t)\,l_0(t) + v_I^-(t)\,l_2(t) + v_Q^+(t)\,l_3(t) + v_Q^-(t)\,l_1(t)$$

where we denoted the sum of the input signals by *x*. Since the variation of the resistance from the nominal value is small, we can solve the equation using the perturbation method and proceed as in Example 12.8 (Fig. 14.8).

As a first step we develop $\omega_{3dB}(t)$ in a Fourier series and decompose it into two parts

$$
\omega\_{3dB}(t) = \omega\_c(t) + \omega\_s(t) \tag{14.14}
$$

$$\omega_c(t) = \frac{l_0(t)}{r_0 C} + \frac{l_2(t)}{r_2 C} = \omega_{c,0} + X_c \sum_{n=1}^{\infty} w_n \cos(n\omega_\mathcal{T} t) \tag{14.15}$$

$$\omega_s(t) = \frac{l_1(t)}{r_1 C} + \frac{l_3(t)}{r_3 C} = \omega_{s,0} - X_s \sum_{n=1}^{\infty} w_n \sin(n\omega_\mathcal{T} t) \tag{14.16}$$

with

$$w_n = \frac{4}{\pi n} \sin\left(n\frac{\pi}{4}\right), \qquad n > 0\,.$$

$\omega_c(t)$ corresponds to the curve of Fig. 12.10b with τ = $\mathcal{T}/4$, scaled by $X_c$, plus a constant term. $\omega_s(t)$ is constructed similarly, but with the curve of Fig. 12.10b shifted by $-\mathcal{T}/4$. We use the symbols $X_c$ and $X_s$ to denote independent Gaussian random variables as in Sect. 14.2.4, but they are not related to the quantities of that section. We again denote their standard deviation by $\sigma_X$.

The sum of the constant terms (which are also random variables) is the average frequency

$$\omega_0 = \omega_{c,0} + \omega_{s,0}\,.$$

The variable part of $\omega_{3dB}(t)$ can be written as

$$\sum_{n=1}^{\infty} w_n \left[X_c \cos(n\omega_\mathcal{T} t) - X_s \sin(n\omega_\mathcal{T} t)\right]$$

Proceeding as in the analysis of carrier leakage we can express it in terms of the polar random variables $X_r$ and $X_\phi$

$$\sum_{n=1}^{\infty} w_n X_r \cos(n\omega_\mathcal{T} t + X_\phi)\,.$$

Using this form for $\omega_{3dB}(t)$ the differential equation can be written as

$$(D + \omega_0)v_A = \omega_{3dB}(t)\,x(t) - \sum_{n=1}^{\infty} w_n X_r \cos(n\omega_\mathcal{T} t + X_\phi)\,v_A(t)\,.$$

We solved this equation to first order in $X_r$ for one input tone and one cosine term in Example 12.8. Referring to that example for details, we conclude that mismatch produces tones at all frequencies $n\omega_\mathcal{T} + \omega$, $n \in \mathbb{Z}$. The amplitude of these tones is proportional to the random variable $X_r$, which is Rayleigh distributed. The phase is uniformly distributed over the full circle.
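The Cartesian-to-polar step can be verified with a small Monte Carlo experiment; assuming independent zero-mean Gaussian $X_c$, $X_s$ as in the text (sample size and seed are arbitrary choices of ours):

```python
import numpy as np

# With independent Gaussian X_c, X_s, the identity
#   X_c cos(th) - X_s sin(th) = X_r cos(th + X_phi)
# holds with X_r = sqrt(X_c^2 + X_s^2) (Rayleigh distributed)
# and X_phi = atan2(X_s, X_c) (uniform over the full circle).
rng = np.random.default_rng(0)
sigma_X = 1.0
Xc = rng.normal(0.0, sigma_X, 100_000)
Xs = rng.normal(0.0, sigma_X, 100_000)

Xr = np.hypot(Xc, Xs)
Xphi = np.arctan2(Xs, Xc)

th = 0.3  # arbitrary phase, standing in for n*w_T*t
lhs = Xc * np.cos(th) - Xs * np.sin(th)
rhs = Xr * np.cos(th + Xphi)

# Rayleigh mean is sigma*sqrt(pi/2); the phase average over the circle is 0
mean_radius_err = abs(Xr.mean() - sigma_X * np.sqrt(np.pi / 2))
mean_phase = Xphi.mean()
```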

While in this subsection we focused on mismatch, the same method can be used to analyse other effects causing variations in the resistance such as overlapping LO signals.

# *14.2.7 Second-Order Distortion*

In this subsection we analyse the second-order distortion introduced by the nonlinear characteristic of the MOSFETs. As discussed, if we neglect mismatch the circuit can be modeled as a time-invariant system driven by the input signal

$$x(t) = v_I^+(t)\,l_0(t) + v_I^-(t)\,l_2(t) + v_Q^+(t)\,l_3(t) + v_Q^-(t)\,l_1(t)\,. \tag{14.17}$$

To simplify the calculations we discard the source resistors $R_S$ and consider the situation shown in Fig. 14.9a. We assume the transistor remains in the so-called linear region of its characteristic, which is described by

$$i_D = \beta(v_G - V_T)(v_D - v_S) - \frac{\beta}{2}\left(v_D^2 - v_S^2\right).$$

In our model the gate voltage is assumed to be constant at a sufficiently high level $V_G$, in which case the characteristic can be modeled by the linear resistor $r$ that we

**Fig. 14.9 a** Equivalent WNTI schematic of the quadrature modulator of Fig. 14.1 **b** Equivalent WNTI circuit of the quadrature modulator of Fig. 14.1

used before and two nonlinear VCCS that we combine into a single one controlled by the two voltages $v_D$ and $v_S$

$$i_D = g_1(v_D - v_S) + g_2\left(v_D^2 - v_S^2\right)$$

with

$$g_1 = \frac{1}{r} = \beta(V_G - V_T) \quad \text{and} \quad g_2 = -\frac{\beta}{2}$$

and represented in Fig. 14.9b. Using this transistor model the differential equation describing the system is

$$(1 + rCD)v_A = x + \frac{g_2}{g_1}\left(x^2 - v_A^2\right).$$

The first-order response of the system is described by the transfer function $H$ that we calculated before, which we repeat here for convenience and to which we add an index representing the order as usual

$$H_1(s_1) = \frac{1}{1 + \frac{s_1}{\omega_{3dB}}}\,.$$

We compute the higher-order responses by Laplace transforming the differential equation and retaining only terms of the relevant order. To obtain the transfer functions directly we use a Dirac pulse as input. The Laplace transform of the second-order part of the differential equation is

$$[1 + rC(s_1 + s_2)]H_2(s_1, s_2) = \frac{g_2}{g_1}\left[1 - H_1(s_1)H_1(s_2)\right].$$

The second-order transfer function therefore is

$$H_2(s_1, s_2) = \frac{g_2}{g_1}\, H_1(s_1 + s_2)\left[1 - H_1(s_1)H_1(s_2)\right]. \tag{14.18}$$

Consider the case in which the modulator is driven by two real baseband tones at $\omega_1$ and $\omega_2$ on each input ($v_I^+$, $v_I^-$, $v_Q^+$, $v_Q^-$). The tones of interest in the effective input signal $x$ of the WNTI model are at $\pm(\omega_\mathcal{T} \pm \omega_i)$, $i = 1, 2$. Under the assumption that $|\omega_i| \ll \omega_\mathcal{T} \ll \omega_{3dB}$ we can approximate $H_1$ at these frequencies by

$$H_1(j\omega) \approx 1 - j\frac{\omega}{\omega_{3dB}}\,.$$

Using this approximation in $H_2$ we find

$$H_2(j\omega_1, j\omega_2) \approx \frac{g_2}{g_1}\,\frac{1 - (1 - j\omega_1/\omega_{3dB})(1 - j\omega_2/\omega_{3dB})}{1 + j\frac{\omega_1 + \omega_2}{\omega_{3dB}}}$$

$$\approx \frac{-1}{2(V_G - V_T)}\,\frac{j\frac{\omega_1 + \omega_2}{\omega_{3dB}}}{1 + j\frac{\omega_1 + \omega_2}{\omega_{3dB}}}\,, \tag{14.19}$$

where in the last step we neglected the small quantity $\omega_1\omega_2/\omega_{3dB}^2$. This expression shows that, to reduce the principal second-order distortion components under the given assumptions, it is more effective to choose a voltage $V_G - V_T$ as large as possible than merely to reduce $r$ by using a wider transistor.
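The approximation can be checked numerically against the full kernel (14.18); in the sketch below the values of $\omega_{3dB}$, $V_G - V_T$ and the tone frequencies are illustrative assumptions of ours:

```python
import numpy as np

w3db = 5e10          # assumed 1/(rC), e.g. r = 20 ohm and C = 1 pF
Vov = 0.5            # assumed overdrive voltage V_G - V_T
g2_over_g1 = -1.0 / (2.0 * Vov)   # g2/g1 = (-beta/2) / (beta*(V_G - V_T))

def H1(s):
    return 1.0 / (1.0 + s / w3db)

def H2(s1, s2):
    """Full second-order transfer function (14.18)."""
    return g2_over_g1 * H1(s1 + s2) * (1.0 - H1(s1) * H1(s2))

def H2_approx(w1, w2):
    """Low-frequency approximation (14.19)."""
    u = (w1 + w2) / w3db
    return -1.0 / (2.0 * Vov) * 1j * u / (1.0 + 1j * u)

# two tones well below w3db: the approximation should track the full kernel
w1 = w2 = 2 * np.pi * 1e8
rel_err = abs(H2(1j * w1, 1j * w2) - H2_approx(w1, w2)) / abs(H2_approx(w1, w2))
```

For tones an order of magnitude below $\omega_{3dB}$ the two expressions agree to within a few percent; the discrepancy grows with $\omega_1\omega_2/\omega_{3dB}^2$, as the neglected term suggests.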

# *14.2.8 Third-Order Distortion*

We next compute the third-order transfer function. The third-order part of the Laplace transformed differential equation is

$$[1 + rC(s\_1 + s\_2 + s\_3)]H\_3(s\_1, s\_2, s\_3) = -2\frac{g\_2}{g\_1} \left[H\_1(s\_1)H\_2(s\_2, s\_3)\right]\_{\text{sym}}.$$

From it we immediately obtain

$$H\_3(s\_1, s\_2, s\_3) = -2\frac{g\_2}{g\_1}H\_1(s\_1 + s\_2 + s\_3) \left[ H\_1(s\_1)H\_2(s\_2, s\_3) \right]\_{\text{sym}}.\tag{14.20}$$

To gain some insight from this expression we assume again two real input tones with $|\omega_i| \ll \omega_\mathcal{T} \ll \omega_{3dB}$. Under these assumptions we can expand $H_3$ in a first-order Taylor polynomial and obtain

$$\begin{split} H_3(j\omega_1, j\omega_2, j\omega_3) &\approx -j\frac{4}{3}\left(\frac{g_2}{g_1}\right)^2 \frac{\omega_1 + \omega_2 + \omega_3}{\omega_{3dB}} \\ &= -\frac{j(\omega_1 + \omega_2 + \omega_3)C}{3\beta(V_G - V_T)^3}\,. \end{split} \tag{14.21}$$

As for second-order distortion, we find that it is more effective to choose a large $V_G - V_T$ than to increase the width of the transistor. Using this expression we can estimate the IP3 of the modulator as

$$A_{\mathrm{IP3}} \approx \left|\frac{g_1}{g_2}\right|\sqrt{\frac{\omega_{3dB}}{\omega_\mathcal{T}}} = 2(V_G - V_T)\sqrt{\frac{\omega_{3dB}}{\omega_\mathcal{T}}}\,. \tag{14.22}$$

The value of this approximation is compared with the value calculated from the full $H_3$ (14.20) as a function of $\omega_{3dB}/\omega_\mathcal{T}$ in Fig. 14.10 for $\omega_1 = \omega_2 = (1 + 0.1)\omega_\mathcal{T}$ and $\omega_3 = -(1 + 0.2)\omega_\mathcal{T}$. The approximation gives a reasonable value from $\omega_{3dB}/\omega_\mathcal{T} \gtrsim 2$. For values of $\omega_{3dB}/\omega_\mathcal{T} < 1$ the IP3 is seen to rise. This is however related to the fact that the wanted signals also experience substantial attenuation compared with the case of a large ratio $\omega_{3dB}/\omega_\mathcal{T}$.
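Estimate (14.22) is a one-liner to evaluate; the element values below are the ones used later in the simulation example (switch resistance, capacitance and overdrive are assumptions stated there):

```python
import numpy as np

Vov = 0.5                 # assumed V_G - V_T [V]
r_on, C = 20.0, 1e-12     # assumed switch resistance [ohm] and capacitance [F]
w3db = 1.0 / (r_on * C)
wT = 2 * np.pi * 1e9      # 1 GHz LO

A_ip3 = 2 * Vov * np.sqrt(w3db / wT)   # IP3 estimate (14.22), in volts
```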

It is important to realize that the effective input signal $x$ of the model includes many tones that produce many intermodulation products. In particular, the tones at $3\omega_\mathcal{T} - \omega_i$, $i = 1, 2$ together with the main tones at $\omega_\mathcal{T} + \omega_i$ produce third-order intermodulation products that fall close to the wanted signal and are difficult to suppress

$$(3\omega_\mathcal{T} - \omega_i) - 2(\omega_\mathcal{T} + \omega_i) = \omega_\mathcal{T} - 3\omega_i\,.$$

These tones are called third-order *counter intermodulation products* (CIM3). We saw in previous paragraphs that many practical imperfections introduce tones at the second harmonics of the input signals, $2(\omega_\mathcal{T} \pm \omega_i)$. In this case second-order distortion also produces tones around the signal of interest. In particular the combination

**Fig. 14.11** Single-sided output spectrum of the modulator simulated with accurate transistor models

$$2(\omega_\mathcal{T} - \omega_i) - (\omega_\mathcal{T} + \omega_i) = \omega_\mathcal{T} - 3\omega_i$$

results in a tone at the CIM3 frequency, as does the second-order distortion between $3\omega_\mathcal{T} - \omega_i$ and $2(\omega_\mathcal{T} + \omega_i)$

$$(3\omega_\mathcal{T} - \omega_i) - 2(\omega_\mathcal{T} + \omega_i) = \omega_\mathcal{T} - 3\omega_i\,.$$

Depending on the details of the design, these second order distortion components may contribute significantly to the overall CIM3 level of the modulator.

Figure 14.11 shows part of the output spectrum magnitude obtained by numerical simulation of the modulator with accurate transistor models, a load capacitance of 1 pF, an LO frequency of 1 GHz, and two input tones given by

$$\begin{aligned} v_I^+(t) &= -v_I^-(t) = A\cos(\omega_1 t) + A\cos(\omega_2 t) \\ v_Q^+(t) &= -v_Q^-(t) = A\sin(\omega_1 t) + A\sin(\omega_2 t) \end{aligned}$$

with

$$A = 0.15\,\text{V}, \quad \omega_1 = \frac{\omega_\mathcal{T}}{8}, \quad \omega_2 = \omega_1 + \frac{\omega_\mathcal{T}}{64}\,.$$

We used 22 nm FinFETs modelled with BSIM-CMG compact models [29] with technology parameters from [30]. The transistors were sized to have an $r_{ON}$ of 20 Ω at $V_G - V_T = 0.5$ V. Since the threshold voltage of the transistors is 0.311 V, the LO voltage high level was chosen to be 0.811 V, while the low level was set to −0.189 V. To avoid overlapping, the duration of the high pulses was reduced slightly from the nominal value of $\mathcal{T}/4$ to produce a cross point between successive LO signals ca.

**Fig. 14.12** LO signal waveforms used in the simulation of the modulator

0.1 V below $V_T$. The rise- and fall-times were set to 0.125 ns, giving an LO transient slope $K$ of 8 GV/s. The LO signals used in the simulation are shown in Fig. 14.12.

The spectral component levels obtained by simulation compare favorably with our analysis. The expected level of the main tones is

$$A\frac{4}{\sqrt{2}\pi} \approx 0.135\,\text{V}$$

and is very close to the simulated one of 0.131 V. Our choice of parameters is such that $\omega_{3dB}/\omega_\mathcal{T} \approx 7.9$. We can therefore estimate the expected second- and third-order intermodulation products from the approximate transfer functions given by (14.19) and (14.21) respectively. The expected level of the second-order tone at $2(\omega_\mathcal{T} - \omega_1)$ is estimated to be

$$\left|\left(A\frac{4}{\sqrt{2}\pi}\right)^2 \frac{1}{2}\, H_2\big(j(\omega_\mathcal{T} - \omega_1), j(\omega_\mathcal{T} - \omega_1)\big)\right| \approx 2.29\,\text{mV}\,.$$

The expected IM3 level at $\omega_\mathcal{T} + 2\omega_1 - \omega_2$ is

$$\left|\left(A\frac{4}{\sqrt{2}\pi}\right)^3 \frac{3}{4}\, H_3\big(j(\omega_\mathcal{T} + \omega_1), j(\omega_\mathcal{T} + \omega_1), -j(\omega_\mathcal{T} + \omega_2)\big)\right| \approx 0.23\,\text{mV}\,.$$

The simulated values are 5.44 and 0.35 mV respectively, reasonably close to the predicted values.
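The basic arithmetic behind these estimates is easy to reproduce; the sketch below recomputes the expected main-tone level and the $\omega_{3dB}/\omega_\mathcal{T}$ ratio from the circuit values given above:

```python
import numpy as np

A = 0.15                      # baseband tone amplitude [V]
r_on, C = 20.0, 1e-12         # switch resistance and load capacitance
w3db = 1.0 / (r_on * C)
wT = 2 * np.pi * 1e9          # 1 GHz LO

main_tone = A * 4 / (np.sqrt(2) * np.pi)   # expected level of the wanted tones
ratio = w3db / wT                          # should be ~7.9
```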

In our simplified analysis we assumed zero LO signal rise- and fall-times. It is interesting to investigate how fast the LO transients have to be before the intermodulation products start to deviate significantly from the predicted values. We investigated this question by simulation.

**Fig. 14.13** Simulated IM3 versus LO transient slope *K*

The IM3 level is plotted as a function of the LO transient slope $K$ in Fig. 14.13. For all values of $K$ the crossing point was kept constant. Note that the LO waveforms corresponding to the lowest $K$ values are essentially triangular with no flat high-level region. This simulation suggests that the analysis gives reasonable IM3 estimates for values of $f_\mathcal{T}/K \lesssim 0.14$.

# **14.3 Sampling Mixer**

A communication receiver should ideally be able to detect a single signal on a channel of a frequency band allocated to the service of interest and completely suppress all other signals. Due to limitations in the selectivity of filters and the difficulty of implementing tuneable filters, this can only be achieved approximately. Virtually all receivers are composed of a fixed, highly selective filter (the preselection filter), typically implemented with surface- (SAW) or bulk-acoustic-wave (BAW) technologies, suppressing all signals outside the band of interest. This filter is followed by some signal amplification and by a shift of the signal of interest to a lower fixed frequency with the help of a mixer. At this lower frequency another fixed filter (the channel filter) with a bandwidth corresponding to the bandwidth of a channel separates the wanted signal from signals on adjacent channels. The channel of interest is selected by shifting the input spectrum in frequency in such a way that the desired signal falls in the passband of the channel filter. This is done by appropriately choosing the frequency of the so-called local oscillator (LO) driving the LO port of the mixer.

The sampling mixer analysed hereafter is an attempt to remove the preselection filter and implement a tuneable filter capable of selecting a single channel using only components available in a standard CMOS technology. This is driven by the desire for miniaturisation and cost reduction. While this type of circuit has its own drawbacks, it is a nice example showing some capabilities of time-varying systems that can't be matched by LTI ones.

# *14.3.1 Time-Varying Impulse Response*

Consider the highly idealised sampling mixer shown in Fig. 14.14. The input signal is represented by the voltage source $V_s$ and the nodes labelled $V_0, \ldots, V_{N-1}$ represent output signals. The ideal switches $S_n$, $n = 0, \ldots, N-1$ are driven by the $\mathcal{T}$-periodic clock signals $\phi_n$. A switch is closed when the corresponding clock signal is high and open when low. We assume that the clock signals are non-overlapping. Since no reactive component is present on the source side of the switches, the output signals can be analysed independently of each other. In the following we assume that each clock phase has the same duration, so that

$$t_n = n\frac{\mathcal{T}}{N}, \qquad n = 0, \ldots, N-1.$$

In this case it's enough to compute the time-varying impulse response of one output signal only. The others are then obtained by simple translations in time. We will therefore compute the time-varying impulse response corresponding to the output $V_0$, which in the following we will denote by $y$. Similarly, we will denote the source signal by $x$.

The circuit, having two phases, is described by two differential equations. For $0 < t < t_1$, when the switch is closed, it is described by

$$Dy + \omega_0 y = \omega_0 x, \qquad \omega_0 := \frac{1}{RC}$$

while for $t_1 < t < \mathcal{T}$, when the switch is open, by

$$Dy = 0\,.$$

We start by computing the fundamental kernel of the system $W(t, \tau)$, which is the solution of the differential equation when driven by a Dirac impulse occurring at time τ. Since the circuit varies periodically in time, it's enough to compute it for $0 < \tau < \mathcal{T}$.

For $0 < \tau < t_1$ the output is zero up to time τ, at which point it jumps to $1/RC$ and starts to decay exponentially as in an LTI system. At time $t_1$ the switch is opened, leaving the output capacitor floating. The output voltage therefore remains constant up to time $\mathcal{T}$. At time $\mathcal{T}$, since there is a resistor between the capacitor and the source, the output simply starts to decrease exponentially again with the same time constant $RC$ as during the first part of the response. Continuing this process we obtain (see Fig. 14.15)

$$W(t,\tau) = \begin{cases} 0 & t < \tau \\ \omega_0\,\mathbf{e}^{-\omega_0(t-\tau)} & \tau < t < t_1 \\ \omega_0 B A^{k-1}\,\mathbf{e}^{-\omega_0(t-k\mathcal{T})} & k\mathcal{T} < t < k\mathcal{T} + t_1,\ k \ge 1 \\ \omega_0 B A^k & t_1 + k\mathcal{T} < t < (k+1)\mathcal{T},\ k \ge 0 \end{cases}$$

with

$$A := \mathbf{e}^{-\omega_0 t_1}, \qquad B := \mathbf{e}^{-\omega_0(t_1 - \tau)}\,.$$

For $t_1 < \tau < \mathcal{T}$, given that the output is disconnected from the input, the output remains zero

$$W(t, \tau) = 0\,.$$
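The piecewise expression for $W(t,\tau)$ can be checked against a brute-force integration of the switched circuit; the element values, period and impulse instant below are arbitrary assumptions of ours:

```python
import numpy as np

R, C = 1e3, 1e-9           # assumed element values
w0 = 1.0 / (R * C)
T = 4e-6                   # clock period
t1 = T / 4                 # switch closed for 0 < t mod T < t1
tau = t1 / 4               # impulse applied during the closed phase

A = np.exp(-w0 * t1)
B = np.exp(-w0 * (t1 - tau))

def W(t):
    """Closed-form fundamental kernel for 0 < tau < t1."""
    if t < tau:
        return 0.0
    if t < t1:
        return w0 * np.exp(-w0 * (t - tau))
    k = int(t // T)
    if k >= 1 and t < k * T + t1:
        return w0 * B * A ** (k - 1) * np.exp(-w0 * (t - k * T))
    return w0 * B * A ** int((t - t1) // T)   # hold phase

# brute force: the impulse sets y(tau+) = w0; y then decays only while the
# switch is closed and is frozen while it is open
dt = 1e-10
n_steps = int(round((3 * T - tau) / dt))
steps_T, steps_t1, step0 = int(round(T / dt)), int(round(t1 / dt)), int(round(tau / dt))
y = np.empty(n_steps + 1)
y[0] = w0
for i in range(1, n_steps + 1):
    closed = ((step0 + i - 1) % steps_T) < steps_t1
    y[i] = y[i - 1] * (1.0 - w0 * dt) if closed else y[i - 1]

checks = [t1 / 2, (t1 + T) / 2, T + t1 / 2, 2 * T + T / 2]
max_rel_err = max(abs(y[int(round((tc - tau) / dt))] - W(tc)) / W(tc) for tc in checks)
```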

The time-varying impulse response can be derived from the fundamental kernel with the help of the variable substitution $\xi = t - \tau$ and by keeping in mind that the value of the impulse response $h(t, \xi)$ is the value of the output at time $t$ assuming that a Dirac impulse was applied ξ seconds in the past. As the impulse response is periodic, it's enough to compute its value over the first period. For $0 < t < t_1$ it is given by (see Fig. 14.16a)

$$h(t,\xi) = \begin{cases} \omega_0\,\mathbf{e}^{-\omega_0\xi} & 0 < \xi < t \\ \omega_0 A^k\,\mathbf{e}^{-\omega_0(\xi - k\mathcal{T})} & t - t_1 + k\mathcal{T} < \xi < t + k\mathcal{T},\ k \ge 1 \\ 0 & \text{otherwise} \end{cases}$$

and for $t_1 < t < \mathcal{T}$ by

$$h(t,\xi) = \begin{cases} \omega_0 A^k\,\mathbf{e}^{-\omega_0[\xi - (k\mathcal{T} + t - t_1)]} & t - t_1 + k\mathcal{T} < \xi < t + k\mathcal{T},\ k \ge 0 \\ 0 & \text{otherwise} \end{cases}$$

# *14.3.2 Time-Varying Transfer Function*

While the circuit is fully characterised by the above time-varying impulse response, its filtering characteristics are best understood by analysing its time-varying frequency response. This will allow us to easily obtain the output signal when the circuit is driven by a tone.

We compute the time-varying transfer function $\hat{h}(t, \omega)$ by Fourier transforming $h(t, \xi)$. For $0 < t < t_1$ we have

$$\hat{h}(t,\omega) = \int_0^t \omega_0\,\mathbf{e}^{-\omega_0\xi}\,\mathbf{e}^{-j\omega\xi}\,\mathrm{d}\xi + \sum_{k=1}^{\infty}\int_{t-t_1+k\mathcal{T}}^{t+k\mathcal{T}} \omega_0 A^k\,\mathbf{e}^{-\omega_0(\xi-k\mathcal{T})}\,\mathbf{e}^{-j\omega\xi}\,\mathrm{d}\xi\,.$$

The terms in the summation on the right are powers of the summation variable $k$ multiplied by a constant, reminiscent of a geometric series with a missing first term. As a first step we therefore add the missing term by adjusting the limits of the first integral

$$\hat{h}(t,\omega) = -\int_{t-t_1}^{0} \omega_0\,\mathbf{e}^{-(\omega_0+j\omega)\xi}\,\mathrm{d}\xi + \sum_{k=0}^{\infty}\omega_0 A^k\,\mathbf{e}^{\omega_0 k\mathcal{T}}\int_{t-t_1+k\mathcal{T}}^{t+k\mathcal{T}}\mathbf{e}^{-(\omega_0+j\omega)\xi}\,\mathrm{d}\xi\,.$$

Evaluating the integrals and simplifying we find

$$\frac{1 - \mathbf{e}^{-(\omega_0+j\omega)(t-t_1)}}{1 + j\frac{\omega}{\omega_0}} + \mathbf{e}^{-(\omega_0+j\omega)t}\,\frac{\mathbf{e}^{(\omega_0+j\omega)t_1} - 1}{1 + j\frac{\omega}{\omega_0}}\sum_{k=0}^{\infty} A^k\,\mathbf{e}^{-j\omega k\mathcal{T}}\,.$$

Performing the summation of the geometric series we finally obtain

$$\hat{h}(t,\omega) = \frac{1 - \mathbf{e}^{-(\omega_0+j\omega)(t-t_1)}}{1 + j\frac{\omega}{\omega_0}} + \frac{\left(\mathbf{e}^{(\omega_0+j\omega)t_1} - 1\right)\mathbf{e}^{-(\omega_0+j\omega)t}}{\left(1 + j\frac{\omega}{\omega_0}\right)\left(1 - \mathbf{e}^{-j\omega\mathcal{T}}\,\mathbf{e}^{-\omega_0 t_1}\right)}\,. \tag{14.23}$$

For $t_1 < t < \mathcal{T}$ the time-varying frequency response is given by

$$\hat{h}(t,\omega) = \sum_{k=0}^{\infty}\int_{t-t_1+k\mathcal{T}}^{t+k\mathcal{T}} \omega_0 A^k\,\mathbf{e}^{-\omega_0[\xi-(k\mathcal{T}+t-t_1)]}\,\mathbf{e}^{-j\omega\xi}\,\mathrm{d}\xi\,.$$

This is again a geometric series and proceeding as above we obtain

$$\hat{h}(t,\omega) = \frac{\left(\mathbf{e}^{j\omega t_1} - \mathbf{e}^{-\omega_0 t_1}\right)\mathbf{e}^{-j\omega t}}{\left(1 + j\frac{\omega}{\omega_0}\right)\left(1 - \mathbf{e}^{-j\omega\mathcal{T}}\,\mathbf{e}^{-\omega_0 t_1}\right)}\,. \tag{14.24}$$
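Both closed forms can be verified independently by marching the switched circuit to steady state under a complex-tone drive $x(t) = \mathbf{e}^{j\omega t}$, since in steady state $y(t) = \hat{h}(t,\omega)\,\mathbf{e}^{j\omega t}$. The sketch below uses exact propagation of the first-order ODE over each closed phase (element values and probe frequency are arbitrary assumptions):

```python
import numpy as np

R, C = 1e3, 1e-9
w0 = 1.0 / (R * C)
T = 4e-6
t1 = T / 4
w = 1.3 * 2 * np.pi / T       # probe tone, slightly above the clock frequency
p = w0 + 1j * w

def h_hat(t):
    """Closed forms (14.23) (0 < t < t1) and (14.24) (t1 < t < T)."""
    D = (1 + 1j * w / w0) * (1 - np.exp(-1j * w * T) * np.exp(-w0 * t1))
    if t < t1:
        return (1 - np.exp(-p * (t - t1))) / (1 + 1j * w / w0) \
             + (np.exp(p * t1) - 1) * np.exp(-p * t) / D
    return (np.exp(1j * w * t1) - np.exp(-w0 * t1)) * np.exp(-1j * w * t) / D

def propagate(y, t0, dtc):
    """Exact solution of Dy + w0*y = w0*exp(jwt) over [t0, t0 + dtc]."""
    yp0 = w0 / p * np.exp(1j * w * t0)
    yp1 = w0 / p * np.exp(1j * w * (t0 + dtc))
    return (y - yp0) * np.exp(-w0 * dtc) + yp1

# switch closed during (kT, kT + t1); output frozen during the hold phase
y = 0.0 + 0.0j
for k in range(60):           # 60 periods: transient decays as exp(-60)
    y = propagate(y, k * T, t1)

ta, tb = t1 / 2, (t1 + T) / 2                 # one instant in each phase
ya = propagate(y, 60 * T, ta)                 # inside the closed phase
yb = propagate(y, 60 * T, t1)                 # value held afterwards
err_a = abs(ya * np.exp(-1j * w * (60 * T + ta)) - h_hat(ta))
err_b = abs(yb * np.exp(-1j * w * (60 * T + tb)) - h_hat(tb))
```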

# *14.3.3 Selectivity*

With $\hat{h}(t, \omega)$ the output of the circuit when driven by $x(t) = \cos(\omega t)$ is immediately obtained

$$y(t) = \Re\{\hat{h}(t,\omega)\,\mathbf{e}^{j\omega t}\}\,.$$

The output is shown in Fig. 14.17 for two values of the input frequency and $N = 4$. During the time intervals $t_1 + k\mathcal{T} < t < (k+1)\mathcal{T}$, $k \in \mathbb{Z}$, the output is constant and assumes the value

$$y(t) = \Re\{\hat{h}(t,\omega)\,\mathbf{e}^{j\omega t}\} = \Re\left\{\frac{\left(\mathbf{e}^{j\omega t_1} - \mathbf{e}^{-\omega_0 t_1}\right)\mathbf{e}^{j\omega k\mathcal{T}}}{\left(1 + j\frac{\omega}{\omega_0}\right)\left(1 - \mathbf{e}^{-j\omega\mathcal{T}}\,\mathbf{e}^{-\omega_0 t_1}\right)}\right\}$$

where we used the periodicity in time of $\hat{h}(t, \omega)$ and the previously computed expression valid for $t_1 < t < \mathcal{T}$

$$
\hat{h}(t,\omega) = \hat{h}(t - k\mathcal{T}, \omega) \,.
$$

These values are the output sample values of the sampling mixer. Let's denote them by $y[k]$ and set $\omega = n\omega_s + \Delta\omega$, $n \in \mathbb{Z}$, with $\omega_s = 2\pi/\mathcal{T}$ and $|\Delta\omega| < \omega_s/2$. Then the above expression becomes

$$\begin{split} y[k] &= \Re\left\{h_{\text{eff}}(n\omega_s + \Delta\omega)\,\mathbf{e}^{j\Delta\omega k\mathcal{T}}\right\} \\ &:= \Re\left\{\frac{\mathbf{e}^{j(n\omega_s + \Delta\omega)t_1} - \mathbf{e}^{-\omega_0 t_1}}{\left(1 + j\frac{n\omega_s + \Delta\omega}{\omega_0}\right)\left(1 - \mathbf{e}^{-j\Delta\omega\mathcal{T}}\,\mathbf{e}^{-\omega_0 t_1}\right)}\right\}. \end{split} \tag{14.25}$$

These are the samples of a sinusoid with angular frequency $\Delta\omega$ and amplitude

$$\left|\frac{\mathbf{e}^{j(n\omega_s + \Delta\omega)t_1} - \mathbf{e}^{-\omega_0 t_1}}{\left(1 + j\frac{n\omega_s + \Delta\omega}{\omega_0}\right)\left(1 - \mathbf{e}^{-j\Delta\omega\mathcal{T}}\,\mathbf{e}^{-\omega_0 t_1}\right)}\right|\,.$$

The fact that the output samples correspond to samples of a sinusoid with a frequency independent of $n$ is a manifestation of the *aliasing* inherent in every sampling process. The interesting aspect of the sampling mixer is that only samples of tones with a frequency very close to $n\omega_s$ have a significant amplitude, while those of signals with frequencies at a distance larger than approximately $\omega_0/N$ from $n\omega_s$ are attenuated. This effect is due to the factor

$$1 - \mathbf{e}^{-j\Delta\omega\mathcal{T}}\,\mathbf{e}^{-\omega_0 t_1} = 1 - \mathbf{e}^{-j2\pi\frac{\Delta\omega}{\omega_s}}\,\mathbf{e}^{-\frac{2\pi}{N}\frac{\omega_0}{\omega_s}}$$

in the denominator of the above expression. For $\omega_0 \ll N\omega_s$ the last exponential on the right is only slightly smaller than 1. Therefore, for $|\Delta\omega| < \omega_0/N$ this factor becomes small, thereby boosting the value of the samples around those frequencies (see Fig. 14.18). From this we conclude that the sampling mixer not only behaves as a sample and hold, but also acts as a highly selective (large quality factor) filter

around the sampling frequency and its harmonics. The achievable selectivity is much higher than that of LTI *RLC* filters integrable in a standard CMOS technology.
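The selectivity is easy to see by evaluating the sample amplitude of (14.25) at and away from a harmonic of the sampling frequency; the values of $N$, $\mathcal{T}$ and $\omega_0$ below are illustrative assumptions chosen to satisfy $\omega_0 \ll N\omega_s$:

```python
import numpy as np

N = 8
T = 1e-6
ws = 2 * np.pi / T
t1 = T / N
w0 = 0.01 * N * ws        # w0 << N*ws: high selectivity
n = 1                     # look around the first harmonic of ws

def sample_amp(dw):
    """Amplitude of the output samples for a tone at n*ws + dw (cf. 14.25)."""
    wt = n * ws + dw
    num = np.exp(1j * wt * t1) - np.exp(-w0 * t1)
    den = (1 + 1j * wt / w0) * (1 - np.exp(-1j * dw * T) * np.exp(-w0 * t1))
    return abs(num / den)

in_band = sample_amp(0.0)
off_band = sample_amp(0.2 * ws)   # tone 20 % of ws away from the harmonic
```

With these numbers the in-band sample amplitude is close to unity while the off-band tone is attenuated by more than an order of magnitude, illustrating the large effective quality factor.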

# *14.3.4 Even Harmonic Response Suppression*

The response around the harmonics is generally undesired and can be suppressed by using weighted sums of the $N$ outputs as discussed in Example 12.10. In the following we assume $N$ even and investigate the possibility of suppressing the responses at even harmonics by making use of the sample values on the capacitors, as opposed to the full waveforms. Consider the sample value on capacitor $i = N/2$

$$V_{N/2}(t) = \Re\{\hat{h}(t - \mathcal{T}/2, \omega)\,\mathbf{e}^{j\omega t}\}\,.$$

Denoting the sample value held by the capacitor during the interval $t_1 + t_1 N/2 + k\mathcal{T} < t < \mathcal{T} + t_1 N/2 + k\mathcal{T}$ by $y_{N/2}[k]$ and using the periodicity in time of $\hat{h}(t, \omega)$ as before we obtain

$$\begin{split} y_{N/2}[k] &= \Re\left\{\frac{\left(\mathbf{e}^{j\omega t_1} - \mathbf{e}^{-\omega_0 t_1}\right)\mathbf{e}^{-j\omega(t - k\mathcal{T} - \mathcal{T}/2)}}{\left(1 + j\frac{\omega}{\omega_0}\right)\left(1 - \mathbf{e}^{-j\omega\mathcal{T}}\,\mathbf{e}^{-\omega_0 t_1}\right)}\,\mathbf{e}^{j\omega t}\right\} \\ &= \Re\left\{\frac{\mathbf{e}^{j\omega t_1} - \mathbf{e}^{-\omega_0 t_1}}{\left(1 + j\frac{\omega}{\omega_0}\right)\left(1 - \mathbf{e}^{-j\omega\mathcal{T}}\,\mathbf{e}^{-\omega_0 t_1}\right)}\,\mathbf{e}^{j\omega(k\mathcal{T} + \mathcal{T}/2)}\right\} \\ &= \Re\left\{h_{\text{eff}}(\omega)\,\mathbf{e}^{j\omega k\mathcal{T}}\,\mathbf{e}^{j\omega\mathcal{T}/2}\right\} \end{split}$$

If the frequency of the signal is close to the $n$th harmonic of the sampling frequency, $\omega = n\omega_s + \Delta\omega$, $|\Delta\omega| \ll \omega_s$, the last expression becomes

$$y_{N/2}[k] = \Re\left\{h_{\text{eff}}(n\omega_s + \Delta\omega)\,\mathbf{e}^{j\Delta\omega k\mathcal{T}}\,(-1)^n\,\mathbf{e}^{j\pi\frac{\Delta\omega}{\omega_s}}\right\}.$$

The difference between the sample values on capacitors $C_0$ and $C_{N/2}$ is thus given by

$$y[k] - y_{N/2}[k] = \Re\left\{h_{\text{eff}}(n\omega_s + \Delta\omega)\,\mathbf{e}^{j\Delta\omega k\mathcal{T}}\left[1 - (-1)^n\,\mathbf{e}^{j\pi\frac{\Delta\omega}{\omega_s}}\right]\right\}.$$

Under the assumption $|\Delta\omega| \ll \omega_s$ the rightmost exponential is close to 1, so that

$$y[k] - y_{N/2}[k] \approx \begin{cases} 0 & n\ \text{even} \\ 2y[k] & n\ \text{odd.} \end{cases}$$

For $N = 4$ the four output samples can be combined in pairs forming signals corresponding to the in-phase ($V_I[k] = V_0[k] - V_2[k]$) and quadrature ($V_Q[k] = V_1[k] - V_3[k]$) outputs of a quadrature mixer.
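The cancellation behaviour can be confirmed by evaluating the two sample sequences from (14.25) and the shifted-capacitor expression above; the circuit values and the small offset $\Delta\omega$ are arbitrary assumptions:

```python
import numpy as np

N = 4
T = 1e-6
ws = 2 * np.pi / T
t1 = T / N
w0 = 0.02 * N * ws
dw = 1e-4 * ws            # small offset from the harmonic n*ws

def h_eff(n):
    wt = n * ws + dw
    return (np.exp(1j * wt * t1) - np.exp(-w0 * t1)) / \
           ((1 + 1j * wt / w0) * (1 - np.exp(-1j * dw * T) * np.exp(-w0 * t1)))

def sample_pair(n, k):
    """y[k] on C0 and y_{N/2}[k] on C_{N/2} for a tone at n*ws + dw."""
    c = h_eff(n) * np.exp(1j * dw * k * T)
    y0 = c.real
    yN2 = (c * (-1) ** n * np.exp(1j * np.pi * dw / ws)).real
    return y0, yN2

y0_e, yN2_e = sample_pair(2, 5)   # even harmonic: difference cancels
y0_o, yN2_o = sample_pair(1, 5)   # odd harmonic: difference doubles
```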

Before concluding this section we note that a loss of charge from the capacitor during the hold phase results in a lower boost of the samples around the frequencies $n\omega_s$, $n \in \mathbb{Z}$. A loss of charge could be caused, for example, by a finite load resistance or by a switched-capacitor circuit following the sampling mixer. The reduction in the magnitude of the samples comes from the fact that the value of $A$ appearing in the definition of $h(t, \xi)$ becomes smaller, and as a consequence the boosting factor

$$1 - \mathbf{e}^{-j\Delta\omega\mathcal{T}} A$$

in the denominator of $\hat{h}(t, \omega)$ will not become as small as calculated above.

# **14.4** *N***-Path Filters**

The block diagram of a general *N*-*path filter* is shown in Fig. 14.19; it can be thought of as the cascade of an *N*-path receiver and an *N*-path transmitter (compare with Example 12.10). Among other things, *N*-path filters permit the implementation of transfer functions that, under suitable assumptions, mimic those of LTI networks that are difficult or impossible to manufacture with *RLC* elements, due to the limited range of practically implementable values or due to limitations in their quality factor. The time-varying transfer function of a general *N*-path filter can be analysed using the same methods used for the *N*-path receiver of Example 12.10. Here, instead of the general case, we analyse a concrete implementation that shows some useful applications.

In the following we analyse the simple case in which the $N$ LTI subsystems are simple shunt capacitors and where the periodic input and output functions are equal switching functions. Under these conditions, when the switches of path $k$ are closed, the upper plate of the $k$th capacitor is simultaneously connected to the input and to the output, and we obtain the circuit shown in Fig. 14.14 where now the output is the node labeled $V_f$.

# *14.4.1 Time-Varying Frequency Response*

During clock phase $k$, $0 \le k \le N-1$, during which the switch $S_k$ is closed, the voltage $V_f$ is equal to the voltage $V_k$. For this reason we can express $V_f$ in terms of the time-varying frequency response that we obtained in the previous section. In particular, when the input is a complex tone $\mathbf{e}^{j\omega t}$ the output is given by

$$V_f(t) = \mathbf{e}^{j\omega t}\sum_{k=0}^{N-1}\hat{h}_{sm}(t - kt_1, \omega)\,1_{t_1}\big((t - kt_1) \bmod \mathcal{T}\big),$$

with

$$1\_{t\_1}(t) = \begin{cases} 1 & 0 < t < t\_1 \\ 0 & \text{otherwise} \end{cases}$$

and where we denoted the time-varying frequency response of the sampling mixer by $\hat{h}_{sm}$ to avoid confusion with that of the whole $N$-path filter, which we'll denote by $\hat{h}$. For $0 < t < \mathcal{T}$, using (14.23) the frequency response of the filter is thus given by

$$\begin{split} \hat{h}(t,\omega) &= \frac{1}{1+j\frac{\omega}{\omega_0}} + \left[\frac{-\mathbf{e}^{(\omega_0+j\omega)t_1}}{1+j\frac{\omega}{\omega_0}} + \frac{\mathbf{e}^{(\omega_0+j\omega)t_1} - 1}{\left(1+j\frac{\omega}{\omega_0}\right)\left(1-\mathbf{e}^{-j\omega\mathcal{T}}\,\mathbf{e}^{-\omega_0 t_1}\right)}\right] \\ &\quad\times\sum_{k=0}^{N-1}\mathbf{e}^{-(\omega_0+j\omega)(t-kt_1)}\,1_{t_1}(t-kt_1)\,. \end{split}$$

As the time-varying frequency response is T-periodic, we can expand it in a Fourier series. The *n*th Fourier coefficient of the summation on the right is

$$\begin{split} a_n &= \frac{1}{\mathcal{T}} \int_0^{\mathcal{T}} \sum_{k=0}^{N-1} \mathrm{e}^{-(\omega_0 + j\omega)(t - kt_1)}\, 1_{t_1}(t - kt_1)\, \mathrm{e}^{-jn\omega_s t}\, \mathrm{d}t \\ &= \frac{1}{\mathcal{T}} \sum_{k=0}^{N-1} \mathrm{e}^{(\omega_0 + j\omega)kt_1} \int_{kt_1}^{(k+1)t_1} \mathrm{e}^{-[\omega_0 + j(\omega + n\omega_s)]t}\, \mathrm{d}t \\ &= \frac{1}{\mathcal{T}}\, \frac{1 - \mathrm{e}^{-[\omega_0 + j(\omega + n\omega_s)]t_1}}{\omega_0 + j(\omega + n\omega_s)} \sum_{k=0}^{N-1} \mathrm{e}^{-jn\omega_s kt_1}. \end{split}$$

The last summation is zero unless *n* is a multiple of *N*, in which case it evaluates to *N*:

$$a_n = \begin{cases} \dfrac{N}{\mathcal{T}}\, \dfrac{1 - \mathrm{e}^{-[\omega_0 + j(\omega + n\omega_s)]t_1}}{\omega_0 + j(\omega + n\omega_s)} & n = Nm,\ m \in \mathbb{Z},\\[1ex] 0 & \text{otherwise.} \end{cases}$$
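This selection of the multiples of *N* can be checked numerically. The short sketch below (our own notation) uses ω*s*t1 = 2π/*N*, so the summand reduces to e<sup>−j2πnk/N</sup>:

```python
import numpy as np

# Phase sum over the N paths: exp(-j*n*omega_s*k*t1) = exp(-2j*pi*n*k/N).
N = 4

def phase_sum(n):
    return sum(np.exp(-2j * np.pi * n * k / N) for k in range(N))

print(round(abs(phase_sum(4)), 6))  # 4.0 -> n a multiple of N gives N
print(round(abs(phase_sum(3)), 6))  # 0.0 -> all other n cancel
```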

The time-varying frequency response of the filter is therefore given by

$$\hat{h}(t,\omega) = \sum_{n = Nm,\; m \in \mathbb{Z}} h_n(\omega)\, \mathrm{e}^{jn\omega_s t} \tag{14.26}$$

with

$$h_n(\omega) = \frac{1}{1 + j\frac{\omega}{\omega_0}} + \left[ \frac{-\mathrm{e}^{(\omega_0 + j\omega)t_1}}{1 + j\frac{\omega}{\omega_0}} + \frac{\mathrm{e}^{(\omega_0 + j\omega)t_1} - 1}{\left(1 + j\frac{\omega}{\omega_0}\right)\left(1 - \mathrm{e}^{-j\omega\mathcal{T}}\,\mathrm{e}^{-\omega_0 t_1}\right)} \right] a_n$$

or, after some simplification

$$h_n(\omega) = \frac{1}{1 + j\frac{\omega}{\omega_0}} \left[ 1 + \frac{1}{t_1\omega_0}\, \frac{\left(\mathrm{e}^{-j\omega(\mathcal{T} - t_1)} - 1\right) \left(1 - \mathrm{e}^{-[\omega_0 + j(\omega + n\omega_s)]t_1}\right)}{\left(1 - \mathrm{e}^{-j\omega\mathcal{T}}\,\mathrm{e}^{-\omega_0 t_1}\right)\left(1 + j\frac{\omega + n\omega_s}{\omega_0}\right)} \right]. \tag{14.27}$$

The last term of *hn*(ω) includes the same factor that we discussed in the analysis of the sampling mixer, which is responsible, under the condition ω0*t*1 ≪ 1, for boosting the response of the circuit at frequencies ω = *k*ω*s* + Δω, *k* ∈ ℤ, |Δω| < ω0/*N*. Therefore, for ω0*t*1 ≪ 1 the transfer function *h*0(ω) represents a highly selective band-pass filter with pass bands centered at the harmonics *k*ω*s* of the switching frequency (see Fig. 14.20).

The transfer functions *hn*(ω), *n* ≠ 0, are also highly selective, but they additionally introduce a shift in frequency. In particular, an input tone at ω − *n*ω*s* passing through *hn*(ω) results in an output tone at ω, which overlaps with the response of *h*0(ω) to an input tone at ω. Therefore, if this *N*-path filter is used in a receiver to suppress interfering signals, then, although it possesses undesired pass-bands, the closest interfering frequency whose output overlaps with the wanted signal lies *N*ω*s* away. The magnitude of a few transfer functions producing an output tone at ω is shown in Fig. 14.20.
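These properties can be illustrated by evaluating (14.27) numerically. The sketch below (variable names are ours) computes *h*0(ω) for *N* = 4 and ω0T = 0.5, the values also used in the next subsection:

```python
import numpy as np

# h_n(omega) of the N-path filter, Eq. (14.27); our own variable names.
N = 4
T = 1.0                 # clock period (normalised)
t1 = T / N              # on-time of each path
w0 = 0.5 / T            # omega_0 chosen so that omega_0 * T = 0.5
ws = 2 * np.pi / T      # switching frequency omega_s

def h(n, w):
    lam = w0 + 1j * (w + n * ws)
    num = (np.exp(-1j * w * (T - t1)) - 1) * (1 - np.exp(-lam * t1))
    den = (1 - np.exp(-1j * w * T) * np.exp(-w0 * t1)) * (1 + 1j * (w + n * ws) / w0)
    return (1 + num / (t1 * w0 * den)) / (1 + 1j * w / w0)

K = (np.sin(np.pi / N) / (np.pi / N)) ** 2
print(round(abs(h(0, ws)), 2))   # 0.81 -> pass-band peak at omega_s, close to K
print(abs(h(0, 1.1 * ws)))       # much smaller: strong attenuation off the peak
```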

# *14.4.2 Selectivity*

By applying some approximations to *h*<sup>0</sup> valid in the vicinity of ω*<sup>s</sup>* we can obtain a transfer function that can be implemented with fixed *RLC* components. This will allow us to quantify the selectivity of the filter in terms of standard metrics.

As a first step we set again ω = ω*s* + Δω and use Δω ≪ ω*s* to make the following approximation

$$\begin{split} \frac{1}{1 + j\frac{\omega_s + \Delta\omega}{\omega_0}}\, \frac{1}{t_1\omega_0} \left( \mathrm{e}^{-j\omega(\mathcal{T} - t_1)} - 1 \right) &\approx \frac{1}{j\omega_s t_1}\, \mathrm{e}^{j\omega_s t_1/2} \left( \mathrm{e}^{j\omega_s t_1/2} - \mathrm{e}^{-j\omega_s t_1/2} \right) \\ &= \mathrm{e}^{j\pi/N}\, \frac{\sin(\pi/N)}{\pi/N}\,. \end{split}$$

Similarly, using in addition ω0 ≪ ω*s*,

$$\begin{split} \frac{1 - \mathrm{e}^{-[\omega_0 + j(\omega_s + \Delta\omega)]t_1}}{\left(1 - \mathrm{e}^{-j(\omega_s + \Delta\omega)\mathcal{T}}\,\mathrm{e}^{-\omega_0 t_1}\right)\left(1 + j\frac{\omega_s + \Delta\omega}{\omega_0}\right)} &\approx \mathrm{e}^{-j\pi/N}\, \frac{2j\sin(\pi/N)}{\left(j\Delta\omega\mathcal{T} + \omega_0 t_1\right) j\frac{\omega_s}{\omega_0}} \\ &\approx \mathrm{e}^{-j\pi/N}\, \frac{\sin(\pi/N)}{\left(1 + jN\frac{\Delta\omega}{\omega_0}\right)\pi/N}\,. \end{split}$$

Finally, using these approximations and noting that the first summand in *h*0(ω) is small compared to the second we obtain

$$h\_0(\omega) \approx \left(\frac{\sin(\pi/N)}{\pi/N}\right)^2 \frac{1}{1 + j\,N\frac{\Delta\omega}{\omega\_0}}.\tag{14.28}$$

The impedance of a parallel LTI *RLC* resonator with a resonance frequency of ω*<sup>s</sup>* is given by

$$Z_r(\omega) = \frac{j\frac{\omega}{\omega_s}\frac{R_r}{q}}{\left(j\frac{\omega}{\omega_s}\right)^2 + j\frac{\omega}{\omega_s}\frac{1}{q} + 1}$$

**Fig. 14.21** Model for the *N*-path filter *h*0(ω) valid around ω*s*

with *q* the quality factor of the resonator and *Rr* the impedance at resonance. Around the resonance frequency it can be approximated by

$$Z_r(\omega_s + \Delta\omega) \approx \frac{R_r}{1 + j\,2q\frac{\Delta\omega}{\omega_s}}\,.$$

The transfer function to *Vf* around the resonance frequency of the circuit shown in Fig. 14.21 is therefore given by

$$\frac{R_r}{R + R_r} \cdot \frac{1}{1 + j\,2q\frac{R}{R + R_r}\frac{\Delta\omega}{\omega_s}}\,, \qquad \omega_s = \frac{1}{\sqrt{L_r C_r}}\,, \qquad q = \frac{R_r}{\omega_s L_r}$$

and has the same form as the approximation of *h*0(ω) given in (14.28). The two are equal if

$$\begin{cases} \dfrac{R_r}{R + R_r} = K \\[1ex] 2q\,\dfrac{R}{R + R_r}\,\dfrac{1}{\omega_s} = \dfrac{N}{\omega_0} \end{cases}$$

with

$$K := \left(\frac{\sin(\pi/N)}{\pi/N}\right)^2.$$

This shows that around ω*s*, *h*0(ω) can be modelled as a parallel resonator with resonance frequency ω*s* and characterised by

$$R_r = R\,\frac{K}{1 - K}\,, \qquad q = \frac{N\pi}{(1 - K)\,\omega_0\mathcal{T}}\,.$$

The transfer function of this model is compared with the exact *h*0(ω) in Fig. 14.22. For *N* = 4 and ω0T = 0.5 the quality factor has a value close to 133. For comparison, the highest resonance quality factor implementable at RF frequencies with inductors and capacitors available in standard CMOS technologies is in the range of 20, with typical values substantially lower than this.
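As a quick numerical check of the quoted value (our own code, using the expressions for *K* and *q* above):

```python
import numpy as np

# Equivalent resonator of h_0 for N = 4 and omega_0*T = 0.5, the values used in the text.
N = 4
w0T = 0.5
K = (np.sin(np.pi / N) / (np.pi / N)) ** 2  # in-band gain factor
q = N * np.pi / ((1 - K) * w0T)             # quality factor of the model
Rr_over_R = K / (1 - K)                     # R_r in units of the source resistance R
print(round(K, 3), round(q))  # 0.811 133
```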

**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Appendix A Signal-Flow Graphs**

Signal-flow graphs (SFGs) are a graphical way to represent systems. While this is true of block diagrams too, the conventions used with the former make them simpler and, for many purposes, more powerful than the latter. SFGs are sometimes used with nonlinear systems, but are most useful for manipulating and transforming linear systems. In this appendix we first review some of the most useful transformations applicable to SFGs describing *linear* systems, so-called linear signal-flow graphs. We then review their more limited use with general nonlinear systems. Finally, we show how all the rules valid for linear SFGs can be extended to weakly-nonlinear time-invariant systems.

# **A.1 Linear Signal-Flow Graphs**

# *A.1.1 Construction Rules*

*Signal-flow graphs* are directed graphs. Each *node* represents a variable, in our case a signal. Signals flow along the *branches* (or *edges*) of the graph in the indicated direction. Each branch is labelled by the *transmission factor* of the branch. A signal flowing along a branch is composed with the branch transmission factor: in the time domain the composition is effected by the convolution product, while in the Laplace or frequency domain by standard multiplication. The transmission factors in the time domain representation of a system are impulse responses, while in the Laplace representation they are transfer functions.

All signals entering a node through branches are *summed* at that node. In other words, the value of the variable represented by a node is the sum of the entering signals. The value of the variable represented by any node is transmitted to all branches leaving the node. Nodes without incoming branches are *source* nodes. Nodes with only incoming branches are *sink* nodes.

Using the above rules SFGs can be used to represent systems of linear algebraic and convolution equations. The usefulness of SFGs comes from the fact that their graphical nature helps clarify the relation between variables and the existence of feedback loops. For example, the system of equations

$$\begin{aligned} \mathbf{x}\_1 &= a\mathbf{x}\_0 + d\mathbf{x}\_2 + e\mathbf{x}\_2\\ \mathbf{x}\_2 &= b\mathbf{x}\_1\\ \mathbf{x}\_3 &= c\mathbf{x}\_2 \end{aligned} \tag{A.1}$$

can be represented by the SFG shown in Fig.A.1.

Here and in the following, for simplicity of notation, we will use single letters to represent transmission factors and suppress all product symbols. Before discussing further aspects of SFGs it's convenient to introduce some specific terminology [37].

*Intermediate node*: A node with incoming and outgoing branches.

*Path*: A connection from a starting node to an end node by a continuous, unidirectional succession of branches all of which are traversed along the branch direction.

*Open path*: Any path not touching a node more than once.

*Loop*: A path starting and ending at the same node. All other nodes are touched at most once.

*Self-loop*: A loop touching a single node.

*Non-touching loops*: Loops without common nodes.

*Path-/Loop-gain*: The product of the transmission factors of the branches forming the path resp. the loop.

# *A.1.2 Reduction Rules*

Many of the algebraic manipulations performed in solving linear equations can be translated in *reduction rules* for SFGs [37]. These are graphical rules to transform a

given SFG in an equivalent, but simpler form. Hereafter we list the most useful rules together with the algebraic equations proving the equivalence.

1. *Parallel transformation*:

$$x_1 = ax_0 + bx_0 \qquad\Longrightarrow\qquad x_1 = (a + b)x_0$$

2. *Cascade transformation*:

$$x_0 \xrightarrow{\;a\;} x_1 \xrightarrow{\;b\;} x_2 \qquad\Longrightarrow\qquad x_0 \xrightarrow{\;ab\;} x_2$$

$$x_2 = bx_1,\quad x_1 = ax_0 \qquad\Longrightarrow\qquad x_2 = abx_0$$

3. *Star to mesh transformation*:

This rule can also be applied in the case in which a transmission factor is zero and in the case in which some nodes coincide.

4. *Node elimination*:

$$\begin{aligned} \mathbf{x}\_0 &= c\mathbf{x}\_1 & \mathbf{x}\_0 &= c a \mathbf{x}\_0 + c d \mathbf{x}\_2\\ \mathbf{x}\_2 &= b \mathbf{x}\_1 & \mathbf{x}\_2 &= b a \mathbf{x}\_0 + b d \mathbf{x}\_2\\ \mathbf{x}\_1 &= a \mathbf{x}\_0 + d \mathbf{x}\_2 \end{aligned}$$

This is a special case of the star to mesh rule.

5. *Shifting start point*:

6. *Shifting end point*:

Note that this transformation changes the variable *x*1 into a new one, *x*′1, different from *x*1. In spite of this, the total transmission factor from *x*0 to *x*2 remains unaffected.

7. *Path inversion:* A branch can only be inverted if it starts at a *source* node (no incoming branches) as the branch with transmission factor *b* in the illustration.

Note that application of this rule moves a source of the graph. Consecutive application of the rule allows inverting a path from a source to an arbitrary node.

#### 8. *Self-loop elimination*:

In the first few transformation rules only summations and multiplication of transmission factors occur. Therefore, if the initial transmission factors correspond to transfer functions (or impulse responses) of stable systems, so do the ones of the transformed graph. This is not necessarily the case with later rules entailing divisions.
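As a small numerical illustration of rules 4 and 8 (arbitrary branch gains, our own example), eliminating the intermediate node and then the resulting self-loop reproduces the direct algebraic solution:

```python
# Graph: x1 = a*x0 + d*x2, x2 = b*x1; illustration values for the gains.
a, b, d = 0.5, 0.8, 0.3
x0 = 1.0

# Direct algebraic solution for x2.
x2_direct = b * a * x0 / (1 - b * d)

# Node elimination (rule 4): x2 = (b*a)*x0 + (b*d)*x2, i.e. a self-loop b*d at x2.
# Self-loop elimination (rule 8): divide the incoming transmission by (1 - b*d).
x2_reduced = (b * a) / (1 - b * d) * x0
print(round(x2_direct, 6), round(x2_reduced, 6))  # 0.526316 0.526316
```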

# *A.1.3 Mason's Rule*

One of the strengths of SFGs is the fact that *Mason's rule* allows writing the transfer function (or impulse response) from a source node to another node by inspection [38]. Before stating the general rule we first focus on graphs with a single open path from a source node *xs* to another node *xj* and where the path touches every loop in the graph (that is, it has at least one node in common with every loop). For this case the transmission is given by

$$T\_{sj} = \frac{P\_{sj}}{\Delta}$$

with *Psj* the gain of the open path and Δ the graph *determinant* (or system determinant). If we denote the loop gain of loop *i* by *Li*, the graph determinant is defined by

$$\Delta := 1 - \sum L\_i + \sum L\_i L\_j - \sum L\_i L\_j L\_k + \cdots,$$

the first summation being over all loops, the second over all pairs of non-touching loops, the third over all triplets of non-touching loops, etc.

As an example consider the SFG depicted in Fig.A.2. From *x*<sup>0</sup> to *x*<sup>4</sup> there is a single path touching all three loops. The transmission *T*<sup>04</sup> is

**Fig. A.2** Single path multi-loop SFG

$$\begin{aligned} T\_{04} &= \frac{P\_{04}}{1 - (L\_1 + L\_2 + L\_3) + L\_1 L\_2} \\ &= \frac{abcd}{1 - (be + dg + bcf) + bedg} \end{aligned}$$
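The same transmission can be checked numerically by solving the node equations directly. The topology below is inferred from the loop gains *be*, *dg* and *bcf* (we assume feedback branches *e*: *x*2 → *x*1, *f*: *x*3 → *x*1 and *g*: *x*4 → *x*3); the numeric gains are arbitrary illustration values:

```python
import numpy as np

# Forward path x0 -a-> x1 -b-> x2 -c-> x3 -d-> x4 with feedback branches e, f, g.
a, b, c, d = 0.5, 0.8, 0.7, 0.6
e, f, g = 0.2, 0.3, 0.1

# Mason's rule: loops L1 = b*e, L2 = d*g, L3 = b*c*f; only L1 and L2 are non-touching.
L1, L2, L3 = b * e, d * g, b * c * f
T_mason = (a * b * c * d) / (1 - (L1 + L2 + L3) + L1 * L2)

# Direct solution of the node equations with x0 = 1:
#   x1 = a + e*x2 + f*x3,  x2 = b*x1,  x3 = c*x2 + g*x4,  x4 = d*x3
M = np.array([[1.0, -e, -f, 0.0],
              [-b, 1.0, 0.0, 0.0],
              [0.0, -c, 1.0, -g],
              [0.0, 0.0, -d, 1.0]])
x1, x2, x3, x4 = np.linalg.solve(M, np.array([a, 0.0, 0.0, 0.0]))
print(np.isclose(T_mason, x4))  # True
```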

Consider now the case of two open paths from a source node to another node. Since the graph represents a linear system, the output is the sum of the two contributions

$$T = \frac{P\_1}{\Delta\_1} + \frac{P\_2}{\Delta\_2}$$

where the individual contributions are calculated as for the single path case, with *Pj* the gain of path *j* and Δ*j* the determinant associated with path *j*, that is, calculated discarding the loops not touched by path *j*. This expression can be rewritten as

$$T = \frac{P_1 \Delta_2 + P_2 \Delta_1}{\Delta}\,, \qquad \Delta = \Delta_1 \Delta_2.$$

The denominator of this expression is the determinant of the full SFG and each path transmission *Pj* is multiplied by the determinant of that part of the graph with no nodes in common with the path. Generalising this expression to more paths we obtain *Mason's rule*

$$T = \frac{\sum\_{j} P\_{j} \Delta\_{j}}{\Delta} \tag{A.2}$$

where here, differently from above, Δ*j* denotes the *co-factor* of path *j*, defined as the determinant of that part of the graph which doesn't have any node in common with path *j*. A formal proof can be found in [39].

# **A.2 Nonlinear Systems**

**Fig. A.3** Example signal-flow graph representing Eq. (A.3)

The use of signal-flow graphs with nonlinear systems is more limited. One application (historically the first) is in determining the smallest number of implicit equations in a system of nonlinear equations [38]. In this context a branch simply represents a dependence of the variable at the node the branch points to on the variable at the node where the branch originates. For example, the set of nonlinear equations

$$\begin{aligned} \mathbf{x}\_1 &= f\_1(\mathbf{x}\_0, \mathbf{x}\_2) \\ \mathbf{x}\_2 &= f\_2(\mathbf{x}\_0, \mathbf{x}\_1, \mathbf{x}\_2) \end{aligned} \tag{A.3}$$

is represented as in Fig. A.3. In this setting there is no concept of branch transmission and parallel branches between nodes make no sense. In spite of this, one can adapt those reduction rules based only on variable substitutions. For example, in the graph above we can remove *x*1 by a simple substitution, obtaining a single implicit equation

$$x_2 = f_2(x_0, f_1(x_0, x_2), x_2) = f(x_0, x_2).$$

A second way in which signal-flow graphs are used in conjunction with nonlinear systems consists in retaining most construction rules of linear SFGs, but allowing the use of nonlinear functions instead of transmission factors. This approach is popular, for example, in the study of neural networks [40]. As an example, Fig. A.4 shows the signal-flow graph of Rosenblatt's perceptron. In this model the input branches possess transmission factors labelled w*j*, but the branch connecting to the output *y* represents the application of the nonlinear function ϕ(·) to the signal v. The function ϕ(·) is called the activation function and in this context has a monotonic, limiting character.

This method is useful to analyse graphs without feedback loops. There is no equivalent of Mason's rule applicable in the presence of feedback loops.

**Fig. A.4** Signal-flow graph of a Perceptron

# **A.3 Weakly-Nonlinear Systems**

The equations describing weakly-nonlinear time-invariant systems can be solved in an iterative way as described in Chap. 9. The component of order *k* of the Volterra series representation of a signal is calculated by solving a *linear* system of equations with products and powers of terms of order lower than *k* acting as sources. The signal-flow graph of a weakly-nonlinear system can thus be constructed as a linear SFG representing the linear part of the equations and where products and powers of signals are added as *source nodes.* In this way, all features of linear SFGs can be used, including Mason's rule.

One starts by calculating the first order response due to the external source (which is of order one) as with linear systems. With it one can then compute the value of the sources of second order. More generally, with the responses of order up to *k* − 1 one can compute the sources of order *k*. The latter drive a linear system. The transfer function of order *k* can often be written by inspection using Mason's rule as follows


The procedure works in the time domain as well. One has simply to use impulse responses and the convolution product, and adapt the second step.

The following example illustrates the use of SFGs to analyse WNTI systems. In Sect. 10.2 we used SFGs to illuminate the effect of distortion within a feedback loop.

#### **Example A.1: Driven Pendulum**

In this example we analyse the pendulum shown in Fig.A.5. A weight of mass *m* at the end of a massless rod is suspended from a pivot in Earth's gravitational field. We model the weight as a point mass whose position is specified by the angle φ. We assume the presence of some viscous fluid causing a drag proportional to the velocity of the weight. A motor drives the pendulum exerting a periodic torque *M*. The equation governing the dynamics of the system can be obtained from Newton's second law

$$D^2 \phi + \frac{b}{lm} D\phi + \frac{g}{l} \sin \phi = \frac{1}{l^2 m} K \cos(\omega t)$$

with *M*(*t*) = *K* cos(ω*t*).

For convenience, we re-express the coefficients of the equation in terms of the standard parameters for systems of second order and normalise the amplitude of the torque

$$\omega_0 = \sqrt{\frac{g}{l}}\,; \qquad q = \sqrt{gl}\,\frac{m}{b}\,; \qquad A = \frac{K}{mlg}\,.$$

**Fig. A.5** Driven pendulum in Earth's gravitational field

Further, using the generalised velocity α = *D*φ, expanding the sin function in its Taylor series

$$\sin \phi = \sum\_{n=1,3,5,\dots} \frac{(-1)^{(n-1)/2}}{n!} \phi^n$$

and defining the input signal *x* as

$$x(t) = A\cos(\omega t)\,,$$

the equation can be written as a system of two convolution equations relating α and φ

$$\begin{aligned} \alpha &= D\delta * \phi \\ \phi &= -\left(\frac{1}{\omega_0^2} D\delta + \frac{1}{\omega_0 q}\delta\right) * \alpha + x - \sum_{n=3,5,\dots} \frac{(-1)^{(n-1)/2}}{n!} \phi^n. \end{aligned} \tag{A.4}$$

A signal-flow graph representing these equations is shown in Fig.A.6.

We are interested in the steady-state oscillation when the pendulum is driven with a frequency equal to ω0. For this reason it's convenient to work with transfer functions. The various transfer functions can be determined by inspection from the SFG using Mason's rule, using the transform of each transmission factor and applying the transform of an impulse as input. The first order transfer function is

$$H_1(s_1) = \frac{1}{\frac{s_1^2}{\omega_0^2} + \frac{1}{q}\frac{s_1}{\omega_0} + 1}\,.$$

The linear part of the response is thus

$$\phi_1(t) = \Re\left\{ H_1(j\omega_0) A \mathrm{e}^{j\omega_0 t} \right\} = Aq \sin(\omega_0 t).$$

From the signal-flow graph it's immediately apparent that the transmission from the source representing the nonlinear terms to the node φ is the same as the one from *x* to φ. Using the fact that the response of a linear system to a source of order *k* is obtained by multiplying the Laplace transform of the source by the linear transfer function of the system and by replacing in the latter *s*<sup>1</sup> with the sum *s*<sup>1</sup> +···+ *sk* (see Sect. 10.1), the transfer function of order *k* from the "nonlinear" source of order *k* is therefore

$$H_1(s_1 + \cdots + s_k).$$

There is no second order power in the source terms. Hence, the second order transfer function *H*<sup>2</sup> is identically zero. The third order transfer function is obtained by substituting *H*<sup>1</sup> + *H*<sup>2</sup> in the "nonlinear source" terms φ*<sup>n</sup>* and retaining only terms of third order. This gives

$$\frac{1}{6}H\_1(\mathbf{s}\_1)H\_1(\mathbf{s}\_2)H\_1(\mathbf{s}\_3)\,,$$

hence

$$H_3(s_1, s_2, s_3) = \frac{1}{6} H_1(s_1) H_1(s_2) H_1(s_3)\, H_1(s_1 + s_2 + s_3).$$

Note that *H*<sup>3</sup> is already symmetric.

The contribution of third order to the steady-state oscillation at ω0 is given by the frequency mix *m* = (1, 2)

$$\begin{split} \phi_{3,m}(t) &= \frac{A^3}{24}\, \frac{3!}{2!}\, \Re\left\{ H_1(-j\omega_0) H_1(j\omega_0) H_1(j\omega_0) H_1(j\omega_0)\, \mathrm{e}^{j\omega_0 t} \right\} \\ &= -A^3 \frac{q^4}{8} \cos(\omega_0 t)\,. \end{split}$$
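The coefficient of the cosine is easy to check numerically (our own code; ω0 normalised to 1 and *q* = 7, the value used in the comparison at the end of this example):

```python
from math import factorial

# Third-order mix m = (1, 2): coefficient of e^{j*w0*t} in phi_3.
q, A, w0 = 7.0, 0.1, 1.0

def H1(s):
    return 1 / (s**2 / w0**2 + s / (q * w0) + 1)

jw = 1j * w0
coeff = (A**3 / 24) * (factorial(3) / factorial(2)) * H1(-jw) * H1(jw) ** 3
print(round(coeff.real, 6))        # -0.300125
print(round(-A**3 * q**4 / 8, 6))  # -0.300125 -> matches -A^3 q^4 / 8
```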

The fourth order transfer function is identically zero. The fifth order is obtained by inserting *H*<sup>1</sup> +···+ *H*<sup>4</sup> into the "nonlinear source" terms φ*<sup>n</sup>* and retaining only terms of fifth order, giving

$$\frac{1}{6}\frac{3!}{2!} \left[ H_1^{\otimes 2} \otimes H_3 \right]_{\text{sym}} - \frac{1}{5!}\, H_1^{\otimes 5}\,.$$

The fifth order transfer function is then found by multiplying it by *H*1(*s*<sup>1</sup> +···+ *s*5) giving

$$\begin{split} H_5(s_1, s_2, s_3, s_4, s_5) = \bigg\{ &\frac{1}{2} \left[ H_1(s_1) H_1(s_2) H_3(s_3, s_4, s_5) \right]_{\text{sym}} \\ &- \frac{1}{5!} H_1(s_1) H_1(s_2) H_1(s_3) H_1(s_4) H_1(s_5) \bigg\}\, H_1(s_1 + s_2 + s_3 + s_4 + s_5). \end{split}$$

The fifth order component at ω<sup>0</sup> produced by the frequency mix *m* = (2, 3) is

$$\phi_{5,m}(t) = A^5\, \frac{1}{2^4}\, \frac{5!}{2!\,3!}\, \Re\left\{ H_5(-j\omega_0, -j\omega_0, j\omega_0, j\omega_0, j\omega_0)\, \mathrm{e}^{j\omega_0 t} \right\}.$$

There are in total 10 ways to fill the slots of $\left[H_1^{\otimes 2} \otimes H_3\right]_{\text{sym}}$ with permutations of the tuple (−jω0, −jω0, jω0, jω0, jω0). One possibility is to put the negative frequencies in the first two slots, giving

$$H_1(-j\omega_0) H_1(-j\omega_0) H_3(j\omega_0, j\omega_0, j\omega_0) = \frac{jq^5}{6\left(8 - \frac{3j}{q}\right)}\,.$$

There are 3 cases in which both *H*1s are applied to positive frequencies

$$H_1(j\omega_0) H_1(j\omega_0) H_3(j\omega_0, -j\omega_0, -j\omega_0) = \frac{q^6}{6}\,,$$

and 6 in which the two *H*1s appear one with a positive frequency and the other with a negative frequency

$$H_1(-j\omega_0) H_1(j\omega_0) H_3(j\omega_0, j\omega_0, -j\omega_0) = -\frac{q^6}{6}\,.$$

Summing all terms we obtain

$$H_5(-j\omega_0, -j\omega_0, j\omega_0, j\omega_0, j\omega_0) = \left[ \frac{1}{10}\left( \frac{jq^5}{12\left(8 - \frac{3j}{q}\right)} - \frac{q^6}{4} \right) + j\frac{q^5}{120} \right](-jq)\,.$$
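This closed form can be verified against a brute-force symmetrisation over all argument permutations (our own code; ω0 normalised to 1 and *q* = 7):

```python
import numpy as np
from itertools import permutations
from math import factorial

q, w0 = 7.0, 1.0

def H1(s):
    return 1 / (s**2 / w0**2 + s / (q * w0) + 1)

def H3(s1, s2, s3):
    return H1(s1) * H1(s2) * H1(s3) * H1(s1 + s2 + s3) / 6

def H5(args):
    # [H1 (x) H1 (x) H3]_sym: average over all permutations of the five arguments
    sym = np.mean([H1(p[0]) * H1(p[1]) * H3(p[2], p[3], p[4])
                   for p in permutations(args)])
    prod = np.prod([H1(s) for s in args])
    return (0.5 * sym - prod / factorial(5)) * H1(sum(args))

jw = 1j * w0
closed = ((1j * q**5 / (12 * (8 - 3j / q)) - q**6 / 4) / 10 + 1j * q**5 / 120) * (-1j * q)
print(np.isclose(H5((-jw, -jw, jw, jw, jw)), closed))  # True
```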

The first, third and fifth order approximations that we have found are compared to a numerical solution of the differential equation for *q* = 7 and *A* = 0.1 in Fig. A.7. Even though we didn't take into account the harmonics produced by the nonlinear terms, the fifth order approximation at ω0 gives a fairly accurate approximation up to a swing of ca. 40°. A linear model predicts that the pendulum dynamics settles in such a way that the peak of the applied torque occurs when the pendulum is in the vertical position (φ = 0°). The more accurate model shows that at moderate swing levels this is not the case: the peak torque occurs before the pendulum passes through the vertical position. It's also interesting to note that the main phase-correcting term is φ3,*m*, whose phasor is perpendicular to the one of the linear term. However, being at 90°, φ3,*m* is unable to produce a good amplitude correction. For that we need to consider at least one more term.

# **References**


© The Editor(s) (if applicable) and The Author(s) 2024

F. Beffa, *Weakly Nonlinear Systems*, Understanding Complex Systems, https://doi.org/10.1007/978-3-031-40681-2


# **Index**

#### **Symbols**

evd(), 144 rv(), 162 []sym, 142

#### **A**

Airy functions, 268 Algebra, 95 convolution, 11–13, 96 graded, 139 Aliasing, 281 Almost everywhere, 24 Almost periodic, 122, 161 α, 22 Amplitude-modulation, 198

#### **B**

*B*, 87 Blocking level, 205 Bounded, 87 locally, 262

#### **C**

Capacitance, 209 Cauchy principal value, 26 Cauchy product, 308 Chaos, 7 CIM3, 335 *<sup>C</sup><sup>k</sup>* , <sup>18</sup> Convolution algebra, 11–13, 96 associativity, 46 continuity, 46 differentiation, 45

discrete, 308 distributivity, 44 for time-varying systems, 273 generalised, 174 product, 10, 11, 42 shift, 45 unit, 44 Convolution equation, 96 elementary solution, 97 fundamental solution, 97 Cross-modulation, 205

#### **D**

*D*, 18, 48 *D*′, 23 *D*′(T), 52 *D*′<sub>+</sub>, 43 *D*′<sub>*L*</sub>, 43 *D*′<sub>*L*1</sub>, 87 *D*′<sub>*L*1+</sub>, 122 *D*′<sub>⊕</sub>, 140 *D*′<sub>⊕,sym</sub>(R<sup>*k*</sup>), 142 *D*′<sub>*R*</sub>, 43 *D*(T), 50 Degeneration impedance, 244 δ, 9, 11, 13, 26 derivative, 30 Fourier transform, 60 Dense, 24 Describing function, 197 Desensitisation ratio, 204 Device characteristic, 205 *x*-controlled, 205 *y*-controlled, 205


Differential circuit, 252 Differential equation elementary solution, 260 fundamental solution, 260 Differential operator, 18 Dirac comb, 53 , 61 delta, *see* δ impulse, *see* δ Paul A. M., 13 Direct product, 140 Direct sum, 139 Distribution, 11 , 22 approximation, 48 , 49 canonical extension, 90 convergence, 24 Dirac impulse, *see* δ even, 28 Laplace transformable, 76 left-sided, 43 multiplication, 29 odd, 28 partial derivative, 29 periodic, 49 , 52 real, 23 regular, 24 regularised, 20 , 91 right-sided, 43 scaling of independent, 28 shifting, 28 singular, 24 slow growth, 58 summable, 87 support, 34 symmetric, 141 tempered, 58 vanish, 34 vector valued, 113 with compact support, 34 *D O* , 315 *D* - *O* , 315 *D* ⊕ , 139 Doppler-spread function, 276 DR, 204 δ*<sup>T</sup>* , 53, 61

#### **E**

*E*, 34 *E*-, 34 Equilibrium point, 2 , 135 asymptotically stable, 135 domain of attraction, 135

stable, 8 , 135 unstable, 135 Evaluating on the diagonal, 144 Exponential matrix, 125 properties, 125

#### **F**

Floquet representation, 287 Formal power series, 141 Fourier series, 68 coefficients, 68 *n*-dimensional, 73 unit, 69 Fourier transform, 55 , 59 inverse, 55 *n*-dimensional, 70 properties, 65 symmetry, 59 uncertainty principle, 56 Fréchet, M., 13 Frequency mix, 160 order, 160 Frequency response time-varying, 275 time-varying nonlinear, 306 Fubini's theorem, 40 Function bounded, 87 , 120 Gauss, 57 original, 75 periodic, 73 rapid descent, 56 Schwartz, 56 slow increase, 58 Functional, 12 , 23 bounded convergence property, 89 Fundamental kernel, 267 Fundamental period, 49

#### **G**

Gain, 197 compression, 196 expansion, 196 1 dB compression point, 198 Group, 261

#### **H**

Heaviside Oliver, 13 unit step, 11

#### **I**

IIP*k*, 201 Image-reject ratio, 329 IM*k*, 199 Impulse response, 9 kth order, 146 time-varying, 257 time-varying nonlinear, 304 Inductance, 212 Integrable locally, 24 Integral domain, 98 Intercept point, 201 Intermodulation intercept point, 201 product, 199 products, counter, 335 Inverse, 96 IP*k*, 201

#### **J**

Jammer, 203 Jordan normal form, 128

#### **K**

Kernel, 146 Kronecker delta, 308

#### **L**

Laplace transform, 76 abscissa of convergence, 75, 76 inverse, 82 *n*-dimensional, 84 properties, 78 region of convergence, 76 Lebesgue integral, 24 measure, 24 Limit cycles, 4 Lineariser post-, 186 pre-, 186 Lipschitz continuous locally, 135

#### **M**

Mason's rule, 358 Matrix controllability, 127 exponential, 125

observability, 128 principal fundamental, 259 semi simple, 129 state transition, 259 Measure Lebesgue, 24 zero, 24 Mikusinski, J., 13 Mismatch, 225 Mixer, 278 harmonic-reject, 298 image response, 329 image-reject ratio, 329 sampling, 339 MOSFET linear region, 233 overdrive voltage, 233 saturation region, 233 saturation voltage, 233 Multi-index, 18 direct product, 71 exponentiation, 71 factorial, 72 length, 18 summation, 72

#### **N**

Nonunit, 145 Norator, 245 *N*-path filter, 347 Nullator, 245 Nullor, 245 Numerical simulations, 12

#### **O**

OIP*k*, 201 *O<sup>M</sup>* , 92 Operational calculus, 13 Operating point, 195, 206 Operator differential, 18 evolution, 259 shift, 44 time-ordered product, 264 Original functions, 75

#### **P**

Parity, 62 Period fundamental, 49 Permutation, 141

Phase-modulation, 198 Phase portrait, 2 Phase space, 2 Phasor, 161 Pitchfork, 6 Point bifurcation, 5 critical, 5 equilibrium, *see* Equilibrium point Polynomial relatively prime, 104 Principal value, 26 Product convolution, 10, 11, 42 tensor, 39, 40

#### **Q**

Quadrature demodulator, 294 Quadrature modulator, 294

**R** Rayleigh distribution, 327 Resolvent kernel, 264

#### **S**

*S*′, 58 *S*, 56 Sample and hold, 279 Sampling, 281 Schwartz distributions, *see* Distribution functions, 56 Laurent, 13 space, 56 Separatrix, 3 Series absolutely convergent, 307 Fourier, 68 Volterra, 10, 149 Wiener, 13 Signal, 117 blocking, 203 common-mode, 252 differential-mode, 252 Signal-flow graph, 353 co-factor, 358 determinant, 357 linear, 353 Mason's rule, 358 transmission factor, 353 *S<sub>k</sub>*, 141

Small-signal, 195 capacitance, 209 inductance, 212 Source controlled, 216 Stability bounded-input bounded-output, 120 Lyapunov, 136 Stage Cascode, 239, 242 class-AC, 237 common-gate, 239 common-source, 232 differential-pair, 252 Pseudo-differential, 253 State, 2 State space representation, 126 Superposition principle, 117 Support, 34 compact, 18 Switched circuits, 315 Symmetrisation, 142 System autonomous, 2 cascade, 169 causal, 119, 149 chaotic, 7 composition, 170 continuous, 118 controllable, 124 frequency response, 121 impulse response, 9, 118 *k*th order fundamental kernel, 302 linear time-invariant, 118 memory, 149 memory-less, 9, 149, 176, 203 monomial, 191 nonlinear, 1 observable, 124 order, 124, 193 real, 118 state, 2, 126 state space representation, 126 time-invariant, 117 time-varying impulse response, 257 time-varying nonlinear frequency responses, 306 time-varying nonlinear impulse responses, 304 time-varying transfer function, 289 transfer function, 122 translation invariance, 119 weakly-nonlinear, 10, 138, 146


weakly-nonlinear time-varying, 301 zero state response, 123

#### **T**

*T* , 49 Tempered distribution, 58 Tensor product, 40 Test functions, 18 convergence, 18 graded algebra, 139 topology, 18 Theorem Fubini, 40 Picard-Lindelöf's, 135 sampling, 281 Thévenin-Norton, 242 Transconductance, 217 Transfer function, 122 minimal, 123 nonlinear, 149 time-varying, 282 Transfer ratio current, 217 voltage, 217 Transit frequency, 244 Transresistance, 217

**U** Unitary function, 51 Unit step, 11 derivative, 31

#### **V**

Vanish, 34 Volterra integral equation of the second kind, 261 series, 10, 149 Vito, 12, 149

**W** Wiener, N., 13

#### **X** XM, 204

**Z** Zero divisors, 98