
Spencer J. Sherwin · David Moxey · Joaquim Peiró · Peter E. Vincent · Christoph Schwab *Editors*

# **Spectral and High Order Methods for Partial Differential Equations ICOSAHOM 2018**

**Editorial Board** T. J. Barth · M. Griebel · D. E. Keyes · R. M. Nieminen · D. Roose · T. Schlick

## **Lecture Notes in Computational Science and Engineering**

### Editors:

Timothy J. Barth · Michael Griebel · David E. Keyes · Risto M. Nieminen · Dirk Roose · Tamar Schlick

More information about this series at http://www.springer.com/series/3527

Spencer J. Sherwin • David Moxey • Joaquim Peiró • Peter E. Vincent • Christoph Schwab Editors

# Spectral and High Order Methods for Partial Differential Equations ICOSAHOM 2018

Selected Papers from the ICOSAHOM Conference, London, UK, July 9–13, 2018

*Editors* Spencer J. Sherwin Department of Aeronautics Imperial College London, UK

Joaquim Peiró Department of Aeronautics Imperial College London, UK

Christoph Schwab Department of Mathematics ETH Zürich Zürich, Switzerland

David Moxey College of Engineering, Mathematics & Physical Sciences University of Exeter Exeter, UK

Peter E. Vincent Department of Aeronautics Imperial College London, UK

ISSN 1439-7358 ISSN 2197-7100 (electronic) Lecture Notes in Computational Science and Engineering ISBN 978-3-030-39646-6 ISBN 978-3-030-39647-3 (eBook) https://doi.org/10.1007/978-3-030-39647-3

Mathematics Subject Classification: 65M70, 65N35, 65N30, 74S25, 76M10, 76M22, 78M10, 78M22

This book is an open access publication.

© The Editor(s) (if applicable) and The Author(s) 2020

**Open Access** This book is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence and indicate if changes were made.

The images or other third party material in this book are included in the book's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Cover illustration: A 7th order accurate simulation of free stream turbulence passing over a turbine blade simulated using the Nektar++ package, courtesy of Andrea Cassinelli

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

### **Preface**

This volume presents selected papers from the twelfth International Conference on Spectral and High-Order Methods (ICOSAHOM'18) that was held in London, United Kingdom, during the week of July 9–13th, 2018. These selected papers were refereed by members of the scientific committee of ICOSAHOM, as well as by other leading scientists.

The first ICOSAHOM conference was held in Como, Italy, in 1989 and marked the beginning of an international conference series in Montpellier, France (1992); Houston, TX, USA (1995); Tel Aviv, Israel (1998); Uppsala, Sweden (2001); Providence, RI, USA (2004); Beijing, China (2007); Trondheim, Norway (2009); Gammarth, Tunisia (2012); Salt Lake City, USA (2014); and Rio de Janeiro, Brazil (2016).

ICOSAHOM has established itself as the main meeting place for researchers with interests in the theoretical, applied, and computational aspects of high-order methods for the numerical solution of partial differential equations.

With over 360 attendees, ICOSAHOM '18 was the largest edition of the conference series to date. The program featured eight plenary lectures by internationally renowned researchers, alongside 40 minisymposia (comprising around 300 presentations) dedicated to specialized topics in high-order methods, and approximately 90 further contributed talks.

The content of these proceedings is organized as follows. First, contributions from the invited speakers are included. The remainder of the volume consists of refereed selected papers highlighting the broad spectrum of topics presented at ICOSAHOM '18.

The success of ICOSAHOM '18 was ensured through the generous contributions and financial support of our sponsors: the Air Force Office of Scientific Research (AFOSR); the Platform for Research in Simulation Methods (PRISM) platform grant, funded by the Engineering and Physical Sciences Research Council (EPSRC); Rolls-Royce Ltd.; and, finally, the Department of Aeronautics at Imperial College London.

We would like to give special thanks to our local organizing committee for their efforts in organizing and promoting the event. In particular, we thank Mr. Andrea Cassinelli for his organizational efforts leading up to the conference, as well as the administrative staff of the Department of Aeronautics at Imperial College London for their help in coordinating the logistics of the event. We also thank the many student helpers, whose advice, help, and support to the delegates contributed to the smooth running of the event.

London, UK Spencer J. Sherwin
Exeter, UK David Moxey
London, UK Joaquim Peiró
London, UK Peter E. Vincent
Zürich, Switzerland Christoph Schwab

### **Contents**

#### **Part I Invited Papers**









## **Part I Invited Papers**

## **Stability of Wall Boundary Condition Procedures for Discontinuous Galerkin Spectral Element Approximations of the Compressible Euler Equations**

**Florian J. Hindenlang, Gregor J. Gassner, and David A. Kopriva**

#### **1 Introduction**

The ingredients for a reliable numerical method for the approximation of partial differential equations, e.g. one that will not blow up, include stable inter-element and physical boundary condition implementations. The recognition that the discontinuous Galerkin spectral element method (DGSEM) with Gauss–Lobatto quadratures satisfies the summation-by-parts (SBP) property [4, 7] has made it possible to analyze these schemes and to connect them with penalty collocation and SBP finite difference schemes. For instance, in [5], we showed that a split form approximation of the compressible Navier–Stokes equations is both linearly and entropy stable provided that the boundary conditions are properly imposed.

The importance of stable boundary condition procedures for hyperbolic equations has long been studied, especially in relation to finite difference methods, e.g. [3, 9, 10]. Only recently have they been studied for discontinuous Galerkin approximations. In [12], the authors showed that the reflection approach is stable when using an entropy conserving flux and an additional entropy stable dissipation

F. J. Hindenlang
Max Planck Institute for Plasma Physics, Garching, Germany
e-mail: florian.hindenlang@ipp.mpg.de

G. J. Gassner (✉)
Department for Mathematics and Computer Science, Center for Data and Simulation Science, University of Cologne, Cologne, Germany
e-mail: ggassner@math.uni-koeln.de

D. A. Kopriva
Florida State University, Tallahassee, FL, USA
San Diego State University, San Diego, CA, USA
e-mail: kopriva@math.fsu.edu

© The Author(s) 2020

S. J. Sherwin et al. (eds.), *Spectral and High Order Methods for Partial Differential Equations ICOSAHOM 2018*, Lecture Notes in Computational Science and Engineering 134, https://doi.org/10.1007/978-3-030-39647-3\_1

term (EC-ES). In [2], the authors show that the reflection condition is stable if the numerical flux is either the Godunov or HLL flux.

In this paper, we analyze both the linear and entropy stability of two commonly used wall boundary condition procedures for the DGSEM applied to the compressible Euler equations. In both cases, the wall boundary condition is implemented through a numerical flux: either a special wall numerical flux that incorporates the boundary condition directly, or a fictitious external state supplied to a Riemann solver approximation. We show how to construct special wall numerical fluxes that are stable, and study the behavior of the approximations. In particular, we show that the use of Riemann solvers at the boundaries introduces numerical dissipation in an amount that depends on the size of the normal Mach number at the wall.

#### **2 The Compressible Euler Equations and the Wall Boundary Condition**

We write the Euler equations as

$$\mathbf{u}_t + \sum_{i=1}^{3} \frac{\partial \mathbf{f}_i}{\partial x_i} = \mathbf{0}. \tag{1}$$

The state vector contains the conservative variables

$$\mathbf{u} = \begin{bmatrix} \varrho & \varrho \vec{v} & E \end{bmatrix}^T = \begin{bmatrix} \varrho & \varrho v\_1 & \varrho v\_2 & \varrho v\_3 & E \end{bmatrix}^T. \tag{2}$$

In standard form, the components of the advective fluxes are

$$\mathbf{f}\_{1} = \begin{bmatrix} \varrho v\_{1} \\ \varrho v\_{1}^{2} + p \\ \varrho v\_{1} v\_{2} \\ \varrho v\_{1} v\_{3} \\ (E+p)v\_{1} \end{bmatrix} \quad \mathbf{f}\_{2} = \begin{bmatrix} \varrho v\_{2} \\ \varrho v\_{2} \, v\_{1} \\ \varrho v\_{2}^{2} + p \\ \varrho v\_{2} \, v\_{3} \\ (E+p)v\_{2} \end{bmatrix} \quad \mathbf{f}\_{3} = \begin{bmatrix} \varrho v\_{3} \\ \varrho v\_{3} \, v\_{1} \\ \varrho v\_{3} \, v\_{2} \\ \varrho v\_{3}^{2} + p \\ (E+p)v\_{3} \end{bmatrix}, \tag{3}$$

Here, $\varrho$, $\vec{v} = (v_1, v_2, v_3)^T$, $p$, and $E$ are the mass density, fluid velocity, pressure, and total energy. We close the system with the ideal gas assumption, which relates the total energy and pressure

$$p = (\gamma - 1)\left( E - \frac{1}{2}\varrho\, \|\vec{v}\|^2 \right), \tag{4}$$

where *γ* denotes the adiabatic coefficient. For a compact notation that simplifies the analysis, we define *block vectors* (with the double arrow)

$$\overleftrightarrow{\mathbf{f}} = \begin{bmatrix} \mathbf{f}\_1 & \mathbf{f}\_2 & \mathbf{f}\_3 \end{bmatrix}^T,\tag{5}$$

so that the system of equations can be written in the compact form

$$\mathbf{u}_t + \vec{\nabla}_x \cdot \overleftrightarrow{\mathbf{f}} = \mathbf{0}. \tag{6}$$

The linear Euler equations are derived by linearizing about a constant mean state $(\bar{\varrho}, \bar{v}_1, \bar{v}_2, \bar{v}_3, \bar{p})$. We follow [11] for the symmetrization of the linearized equations, with the constants

$$a = \sqrt{\frac{\gamma - 1}{\gamma}}\,\bar{c}, \quad b = \frac{\bar{c}}{\sqrt{\gamma}}, \quad \bar{c} = \sqrt{\frac{\gamma\,\bar{p}}{\bar{\varrho}}}, \tag{7}$$

where *c*¯ is the sound speed of the constant mean state. The state variables become

$$\mathbf{u} = \begin{bmatrix} \varrho' & v\_1 & v\_2 & v\_3 & p' \end{bmatrix}^T,\tag{8}$$

where $\vec{v}$ is the velocity perturbation from the mean state, and we introduce

$$\varrho' = b\,\frac{\tilde{\varrho}}{\bar{\varrho}}, \quad p' = \frac{1}{\bar{\varrho}\,a}\,\tilde{p} - \frac{1}{\sqrt{\gamma - 1}}\,\varrho', \tag{9}$$

which depend on the density and pressure perturbations $\tilde{\varrho}$, $\tilde{p}$. The flux vectors are

$$\mathbf{f}_i = \underline{\mathbf{A}}_i\,\mathbf{u}, \quad \overleftrightarrow{\mathbf{f}} = \overleftrightarrow{\underline{\mathbf{A}}}\,\mathbf{u} = \left( \underline{\mathbf{A}}_1\hat{x} + \underline{\mathbf{A}}_2\hat{y} + \underline{\mathbf{A}}_3\hat{z} \right)\mathbf{u}, \tag{10}$$

where [11]

$$
\begin{aligned}
\mathbf{A}\_{1} = \begin{bmatrix}
\bar{v}\_{1} & b & 0 & 0 & 0 \\ b & \bar{v}\_{1} & 0 & 0 & a \\ 0 & 0 & \bar{v}\_{1} & 0 & 0 \\ 0 & 0 & 0 & \bar{v}\_{1} & 0 \\ 0 & a & 0 & 0 & \bar{v}\_{1}
\end{bmatrix}, \quad \mathbf{A}\_{2} = \begin{bmatrix}
\bar{v}\_{2} & 0 & b & 0 & 0 \\ 0 & \bar{v}\_{2} & 0 & 0 & 0 \\ b & 0 & \bar{v}\_{2} & 0 & a \\ 0 & 0 & 0 & \bar{v}\_{2} & 0 \\ 0 & 0 & a & 0 & \bar{v}\_{2}
\end{bmatrix}, \quad \mathbf{A}\_{3} = \begin{bmatrix}
\bar{v}\_{3} & 0 & 0 & b & 0 \\ 0 & \bar{v}\_{3} & 0 & 0 & 0 \\ 0 & 0 & \bar{v}\_{3} & 0 & 0 \\ b & 0 & 0 & \bar{v}\_{3} & a \\ 0 & 0 & 0 & a & \bar{v}\_{3}
\end{bmatrix}, \tag{11}
$$

are constant symmetric matrices.
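As a quick sanity check of the constants (7) and the matrices (11), the following numpy sketch (illustrative values, not part of the paper) builds the coefficient matrices for a mean state at rest and verifies that $\overleftrightarrow{\underline{\mathbf{A}}}\cdot\vec{n}$ is symmetric with wave speeds $\{0, 0, 0, \pm\bar{c}\}$, a consequence of $a^2 + b^2 = \bar{c}^2$.

```python
# Sanity check (illustrative, not the authors' code): build the symmetrized
# coefficient matrices of Eq. (11) for a mean state at rest and verify that
# A·n is symmetric with eigenvalues {0, 0, 0, +c_bar, -c_bar}.
import numpy as np

gamma = 1.4
rho_bar, p_bar = 1.0, 1.0
c_bar = np.sqrt(gamma * p_bar / rho_bar)       # mean sound speed, Eq. (7)
a = np.sqrt((gamma - 1.0) / gamma) * c_bar     # Eq. (7)
b = c_bar / np.sqrt(gamma)                     # Eq. (7)
v_bar = np.zeros(3)                            # mean flow at rest (v_bar·n = 0)

def coeff_matrix(i):
    """Coefficient matrix A_i of Eq. (11) (0-based direction index i)."""
    A = v_bar[i] * np.eye(5)
    A[0, 1 + i] = A[1 + i, 0] = b              # density <-> velocity coupling
    A[4, 1 + i] = A[1 + i, 4] = a              # pressure <-> velocity coupling
    return A

A = [coeff_matrix(i) for i in range(3)]
n = np.array([1.0, 2.0, 2.0]) / 3.0            # a unit normal vector
A_n = sum(n[i] * A[i] for i in range(3))

assert np.allclose(A_n, A_n.T)                 # symmetry of A·n
lam = np.sort(np.linalg.eigvalsh(A_n))
assert np.allclose(lam, [-c_bar, 0.0, 0.0, 0.0, c_bar])  # a^2 + b^2 = c_bar^2
```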

The linear equations have the property that the $L^2$ norm of the solution over a domain $\Omega$ is bounded in terms of the boundary data on $\partial\Omega$ only. Let

$$\langle \mathbf{v}, \mathbf{w} \rangle = \int_{\Omega} \mathbf{v}^T \mathbf{w}\, dx\, dy\, dz, \quad \left\langle \overleftrightarrow{\mathbf{f}}, \overleftrightarrow{\mathbf{g}} \right\rangle = \int_{\Omega} \sum_{i=1}^{3} \mathbf{f}_i^T \mathbf{g}_i\, dx\, dy\, dz \tag{12}$$

represent the $L^2$ inner products of two state vectors $\mathbf{v}$ and $\mathbf{w}$ and of two block vectors $\overleftrightarrow{\mathbf{f}}$ and $\overleftrightarrow{\mathbf{g}}$, respectively. Since the coefficient matrices are constant, the product rule and the symmetry of the $\underline{\mathbf{A}}_i$ imply

$$\left\langle \vec{\nabla}_x \cdot \overleftrightarrow{\mathbf{f}}, \mathbf{u} \right\rangle = \left\langle \vec{\nabla}_x \cdot \left( \overleftrightarrow{\underline{\mathbf{A}}}\,\mathbf{u} \right), \mathbf{u} \right\rangle = \left\langle \vec{\nabla}_x \mathbf{u}, \overleftrightarrow{\mathbf{f}} \right\rangle. \tag{13}$$

Then it follows from Gauss' law (integration by parts) that

$$\left\langle \vec{\nabla}_x \cdot \overleftrightarrow{\mathbf{f}}, \mathbf{u} \right\rangle = \frac{1}{2} \int_{\partial\Omega} \mathbf{u}^T\, \overleftrightarrow{\mathbf{f}} \cdot \vec{n}\, dS, \tag{14}$$

where $\vec{n}$ is the outward normal to the surface of $\Omega$. The norm of the solution therefore satisfies

$$\frac{d}{dt}\|\mathbf{u}\|^2 = -\int_{\partial\Omega} \mathbf{u}^T\, \overleftrightarrow{\mathbf{f}} \cdot \vec{n}\, dS. \tag{15}$$

Replacing the boundary terms by boundary conditions leads to a bound on the solution in terms of the boundary data. The argument of the boundary integral on the right of (15) is

$$\mathbf{u}^T \overleftrightarrow{\mathbf{f}} \cdot \vec{n} = \mathbf{u}^T \left( \overleftrightarrow{\underline{\mathbf{A}}} \cdot \vec{n} \right) \mathbf{u} = 2\left( \varrho' b + a p' \right) v_n + (\vec{\bar{v}} \cdot \vec{n})\left( (\varrho')^2 + |\vec{v}|^2 + (p')^2 \right), \tag{16}$$

where $v_n$ is the wall normal velocity, $v_n = \vec{v} \cdot \vec{n}$. Note that here the mean flow must be chosen such that the normal mean velocity vanishes at the wall boundary, $\vec{\bar{v}} \cdot \vec{n} = 0$, so that the boundary condition makes physical sense.

Therefore, with the no penetration wall condition $v_n = 0$ applied,

$$\frac{d}{dt}||\mathbf{u}||^2 = 0,\tag{17}$$

and the (energy) norm of the solution is bounded for all time by its initial value.

The nonlinear equations, on the other hand, satisfy a bound on the entropy that depends only on the boundary data. For what follows, we assume that the solution is smooth so that we don't have to consider entropy generated at shock waves. We introduce the entropy density (scaled with *(γ* − 1*)* for convenience) as

$$s(\mathbf{u}) = -\frac{\varrho\,\varsigma}{\gamma - 1}, \tag{18}$$

where $\varsigma = \ln(p) - \gamma \ln(\varrho)$ is the physical entropy. (The minus sign is conventional in the theory of hyperbolic conservation laws to ensure a decreasing entropy function.) The entropy flux for the Euler equations is

$$\vec{f}^{\,\varsigma}(\mathbf{u}) = \vec{v}\, s = -\frac{\varrho\,\varsigma\,\vec{v}}{\gamma - 1}. \tag{19}$$

Finally the entropy variables are

$$\mathbf{w} = \frac{\partial s(\mathbf{u})}{\partial \mathbf{u}} = \begin{bmatrix} \frac{\gamma - \varsigma}{\gamma - 1} - \beta \|\vec{v}\|^2 \\ 2\beta \vec{v} \\ -2\beta \end{bmatrix}, \quad \beta = \frac{\varrho}{2p}. \tag{20}$$

The entropy pair contracts the solution and fluxes, meaning that it satisfies the relations

$$\mathbf{w}^T \mathbf{u}_t = \left(\frac{\partial s}{\partial \mathbf{u}}\right)^T \mathbf{u}_t = s_t(\mathbf{u}), \quad \mathbf{w}^T\, \vec{\nabla}_x \cdot \overleftrightarrow{\mathbf{f}} = \vec{\nabla}_x \cdot \vec{f}^{\,\varsigma}. \tag{21}$$
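The closed form (20) can be checked against a finite-difference gradient of the entropy density (18). The following Python sketch (illustrative, with an arbitrarily chosen admissible state, not the authors' code) does exactly that.

```python
# Hedged check (illustrative): compare the closed-form entropy variables of
# Eq. (20) against a finite-difference gradient of the entropy density
# s(u) = -rho*varsigma/(gamma-1), varsigma = ln p - gamma ln rho, Eq. (18).
import numpy as np

gamma = 1.4

def primitives(u):
    rho = u[0]
    v = u[1:4] / rho
    p = (gamma - 1.0) * (u[4] - 0.5 * rho * (v @ v))  # ideal gas law, Eq. (4)
    return rho, v, p

def entropy(u):
    rho, v, p = primitives(u)
    return -rho * (np.log(p) - gamma * np.log(rho)) / (gamma - 1.0)  # Eq. (18)

def entropy_variables(u):
    rho, v, p = primitives(u)
    varsigma = np.log(p) - gamma * np.log(rho)
    beta = rho / (2.0 * p)
    return np.concatenate((                            # Eq. (20)
        [(gamma - varsigma) / (gamma - 1.0) - beta * (v @ v)],
        2.0 * beta * v,
        [-2.0 * beta]))

u = np.array([1.2, 0.3, -0.1, 0.2, 2.5])               # an admissible state (p > 0)
h = 1e-6
grad = np.array([(entropy(u + h * e) - entropy(u - h * e)) / (2.0 * h)
                 for e in np.eye(5)])                   # central differences
assert np.allclose(grad, entropy_variables(u), atol=1e-6)
```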

When we multiply (6) with the entropy variables and integrate over the domain,

$$\left\langle \mathbf{w}(\mathbf{u}), \mathbf{u}_t \right\rangle + \left\langle \mathbf{w}(\mathbf{u}), \vec{\nabla}_x \cdot \overleftrightarrow{\mathbf{f}} \right\rangle = 0. \tag{22}$$

Next we use the properties of the entropy pair to contract (22) and use integration by parts to get

$$\left\langle s_t(\mathbf{u}), 1 \right\rangle = -\left\langle \vec{\nabla}_x \cdot \vec{f}^{\,\varsigma}, 1 \right\rangle = -\int_{\partial\Omega} \left( \vec{f}^{\,\varsigma} \cdot \vec{n} \right) dS, \tag{23}$$

showing that, in the continuous case, the total entropy in the domain can only change via the boundary conditions.

In the case of a zero-mass flux boundary condition, with $v_n = \vec{v} \cdot \vec{n} = 0$, the entropy is not changed by the slip-wall boundary condition, since

$$-\vec{f}^{\,\varsigma} \cdot \vec{n} = \frac{\varrho\,\varsigma}{\gamma - 1}\, v_n = 0. \tag{24}$$

#### **3 Stability Bounds for the DGSEM**

The DGSEM is described in detail in [5] and elsewhere [1, 6]; we only briefly summarize the approximation here. The domain $\Omega$ is subdivided into non-overlapping, conforming, hexahedral elements. Each element is mapped to the reference element $E = [-1, 1]^3$. Associated with the transformation from the reference element are a set of contravariant coordinate vectors, $\vec{a}^i$, and the transformation Jacobian, $\mathcal{J}$. Equation (6) transforms to another conservation law on the reference element,

$$\mathcal{J}\mathbf{u}_t + \vec{\nabla}_\xi \cdot \overleftrightarrow{\tilde{\mathbf{f}}} = 0, \tag{25}$$

where $\overleftrightarrow{\tilde{\mathbf{f}}}$ is the contravariant flux vector with components $\tilde{\mathbf{f}}^i = \mathcal{J}\vec{a}^i \cdot \overleftrightarrow{\mathbf{f}}$.

The approximation of (25) proceeds as follows: A weak form is created by taking the inner product of the equation with a test function. The Gauss law is applied to the divergence term to separate the boundary from the interior contributions. The resulting weak form is then approximated: The solution vector is approximated by a polynomial of degree $N$ interpolated at the Legendre–Gauss–Lobatto points. In the following, we represent the true continuous solutions by lower case letters. Upper case letters denote their polynomial approximations, except for the density, where the approximation is denoted by $\rho$. The volume fluxes are replaced by two-point numerical fluxes; in the linear case, the two-point fluxes are immediately relatable to a split form of the equations. Integrals are replaced by Legendre–Gauss–Lobatto quadratures. Finally, the boundary fluxes are replaced by a numerical flux. See [5] and [8] for details.
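The SBP property underlying this construction can be made concrete at the lowest nontrivial order. For $N = 2$ on Legendre–Gauss–Lobatto points, the sketch below (an illustration under standard LGL data, not taken from [5] or [8]) verifies $\underline{Q} + \underline{Q}^T = \underline{B}$ and the resulting discrete integration-by-parts rule.

```python
# Degree N = 2 LGL illustration of the summation-by-parts property
# Q + Q^T = B = diag(-1, 0, 1), where Q = M D.
import numpy as np

nodes = np.array([-1.0, 0.0, 1.0])             # LGL points for N = 2
w = np.array([1.0/3.0, 4.0/3.0, 1.0/3.0])      # LGL quadrature weights
D = np.array([[-1.5,  2.0, -0.5],              # nodal differentiation matrix
              [-0.5,  0.0,  0.5],
              [ 0.5, -2.0,  1.5]])
M = np.diag(w)                                 # diagonal (lumped) mass matrix
Q = M @ D

B = np.diag([-1.0, 0.0, 1.0])
assert np.allclose(Q + Q.T, B)                 # SBP: discrete integration by parts

# Consequence: for any nodal vectors u, f the discrete "Gauss law" holds,
#   u^T M (D f) + (D u)^T M f = u_N f_N - u_0 f_0.
rng = np.random.default_rng(0)
u, f = rng.random(3), rng.random(3)
lhs = u @ Q @ f + (D @ u) @ M @ f
assert np.isclose(lhs, u[2] * f[2] - u[0] * f[0])
```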

The result is an approximation that is energy stable for the linearized equations if at every quadrature point along a physical boundary the numerical flux $\tilde{\mathbf{F}}^*$ satisfies the bound [5]

$$\mathbf{U}^T \left\{ \tilde{\mathbf{F}}^* - \frac{1}{2} \overleftrightarrow{\tilde{\mathbf{F}}} \cdot \hat{n} \right\} \ge 0, \tag{26}$$

where $\overleftrightarrow{\tilde{\mathbf{F}}}$ is the polynomial interpolation of the contravariant flux from the interior, $\hat{n}$ is the reference space outward normal direction, and $\mathbf{U}$ is the approximation of the state vector. Since the contravariant fluxes are proportional to the normal fluxes [6], we can change the condition (26) to

$$B_L \equiv \mathbf{U}^T \left\{ \mathbf{F}^* - \frac{1}{2} \overleftrightarrow{\mathbf{F}} \cdot \vec{n} \right\} \ge 0. \tag{27}$$

For entropy stability of the nonlinear equations, the boundary stability condition shown in [5] is proportional to

$$B_{NL} \equiv \mathbf{W}^T \left( \mathbf{F}^* - \overleftrightarrow{\mathbf{F}} \cdot \vec{n} \right) + \vec{F}^{\,\varsigma} \cdot \vec{n} \ge 0, \tag{28}$$

where $\vec{F}^{\,\varsigma}$ is the polynomial interpolation of the entropy flux, $\vec{f}^{\,\varsigma}$, and $\mathbf{W}$ is the interpolation of the entropy variables.

#### *3.1 Linear Stability of Wall Boundary Condition Approximations*

To find linearly stable implementations of the wall condition $v_n = 0$, one needs only find a numerical flux that satisfies it and the condition (27). For the linear equations, the approximation of the state vector is $\mathbf{U} = \begin{bmatrix} \rho' & \vec{V} & P' \end{bmatrix}^T$ and the normal contravariant flux is proportional to

$$\overleftrightarrow{\mathbf{F}} \cdot \vec{n} = \overleftrightarrow{\underline{\mathbf{A}}} \cdot \vec{n}\; \mathbf{U} = \begin{bmatrix} b V_n & n_1 Q & n_2 Q & n_3 Q & a V_n \end{bmatrix}^T, \tag{29}$$

where $V_n$ is the approximation of the normal velocity at the wall computed from the interior, $Q = b\rho' + aP'$, and $(n_1, n_2, n_3)$ are the three components of the physical space normal vector, $\vec{n}$. The numerical flux can be expressed as

$$\mathbf{F}^\* = \underline{\vec{\mathbf{A}}} \cdot \vec{n} \, \mathbf{U}^\* = \begin{bmatrix} bV\_n^\* & n\_1 \, Q^\* & n\_2 \, Q^\* & n\_3 \, Q^\* & aV\_n^\* \end{bmatrix}^T. \tag{30}$$

It then remains only to find $Q^*$ so that (27) is satisfied when the normal wall condition $V_n^* = 0$ is applied. When we substitute the fluxes (29) and (30) into (27),

$$B\_L = \frac{1}{2} \left\{ \mathcal{Q} \left( 2V\_n^\* - V\_n \right) + V\_n \left( 2\mathcal{Q}^\* - \mathcal{Q} \right) \right\} = \frac{1}{2} \left\{ 2\mathcal{Q}V\_n^\* + 2V\_n \left( \mathcal{Q}^\* - \mathcal{Q} \right) \right\} \tag{31}$$

Substituting the wall boundary condition $V_n^* = 0$ yields the condition on $Q^*$ for stability

$$V\_n \left( \mathcal{Q}^\* - \mathcal{Q} \right) \ge 0. \tag{32}$$

Neutral stability is thus ensured if $\rho^*$ and $P^*$ are computed from the interior, i.e. $\rho^* = \rho'$, $P^* = P'$, so that $Q^* = Q$.

In practice, the boundary condition is often implemented instead by supplying an external state to a Riemann solver, designed so that the resulting numerical boundary flux weakly imposes the physical boundary condition. The exact upwind ($\varepsilon = 1$) normal Riemann flux and the central flux ($\varepsilon = 0$) for the linear system of equations are

$$\mathbf{F}^*\left(\mathbf{U}, \mathbf{U}^{\text{ext}}\right) = \frac{1}{2} \left\{ \overleftrightarrow{\mathbf{F}}\left(\mathbf{U}\right) \cdot \vec{n} + \overleftrightarrow{\mathbf{F}}\left(\mathbf{U}^{\text{ext}}\right) \cdot \vec{n} \right\} - \frac{\varepsilon}{2} \left| \underline{\mathbf{A}}_n \right| \left(\mathbf{U}^{\text{ext}} - \mathbf{U}\right), \tag{33}$$

where $\underline{\mathbf{A}}_n \equiv \overleftrightarrow{\underline{\mathbf{A}}} \cdot \vec{n}$ is the normal coefficient matrix. The external state is set by using the interior values of the density and pressure and by negating the normal velocity,

$$\mathbf{U}^{\text{ext}} = \begin{bmatrix} \rho' & \vec{V} - 2V_n\vec{n} & P' \end{bmatrix}^T. \tag{34}$$

For *ε* = 0, using the central (averaged) numerical flux, the interior flux contribution cancels and condition (27) reduces to

$$\begin{aligned} B_{L,0} &= \frac{1}{2}\mathbf{U}^T \underline{\mathbf{A}}_n \mathbf{U}^{\text{ext}} = \frac{1}{2}\begin{bmatrix} \rho' & \vec{V}^T & P' \end{bmatrix} \begin{bmatrix} 0 & n_1 b & n_2 b & n_3 b & 0 \\ n_1 b & 0 & 0 & 0 & n_1 a \\ n_2 b & 0 & 0 & 0 & n_2 a \\ n_3 b & 0 & 0 & 0 & n_3 a \\ 0 & n_1 a & n_2 a & n_3 a & 0 \end{bmatrix} \begin{bmatrix} \rho' \\ \vec{V} - 2V_n\vec{n} \\ P' \end{bmatrix} \\ &= \frac{1}{2}\left\{ Q\left(-\vec{V}\cdot\vec{n}\right) + \left(\vec{V}\cdot\vec{n}\right) Q \right\} = 0, \end{aligned} \tag{35}$$

which is neutrally stable, having no additional stabilizing dissipation. We note again that the mean state for the linearization is chosen such that the normal mean velocity is zero, resulting in the zeros on the diagonal of $\underline{\mathbf{A}}_n$.
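This cancellation is easy to reproduce numerically. The sketch below (illustrative, mean flow at rest, not the authors' code) verifies $\mathbf{U}^T \underline{\mathbf{A}}_n \mathbf{U}^{\text{ext}} = 0$ for a random interior state and normal.

```python
# Numerical check of Eq. (35) (illustrative sketch): with the reflection
# external state of Eq. (34), the central flux adds no dissipation,
# i.e. U^T A_n U_ext = 0 for any interior state U and unit normal n.
import numpy as np

gamma = 1.4
c_bar = np.sqrt(gamma)                         # mean state rho_bar = p_bar = 1
a = np.sqrt((gamma - 1.0) / gamma) * c_bar
b = c_bar / np.sqrt(gamma)

rng = np.random.default_rng(0)
n = rng.normal(size=3)
n /= np.linalg.norm(n)

# normal coefficient matrix A_n for a mean flow with v_bar·n = 0
A_n = np.zeros((5, 5))
A_n[0, 1:4] = A_n[1:4, 0] = b * n
A_n[4, 1:4] = A_n[1:4, 4] = a * n

U = rng.normal(size=5)                         # interior state (rho', V, P')
V = U[1:4]
U_ext = U.copy()
U_ext[1:4] = V - 2.0 * (V @ n) * n             # mirrored normal velocity, Eq. (34)

assert np.isclose(U @ A_n @ U_ext, 0.0, atol=1e-12)
```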

Substituting the exact upwind flux where *ε* = 1 into (27) and rearranging,

$$B_{L,1} = -\mathbf{U}^T \left| \underline{\mathbf{A}}_n^- \right| \mathbf{U}^{\text{ext}} + \frac{1}{2} \mathbf{U}^T \left| \underline{\mathbf{A}}_n \right| \mathbf{U}, \tag{36}$$

where $\underline{\mathbf{A}}_n^- = \frac{1}{2}\left( \underline{\mathbf{A}}_n - \left|\underline{\mathbf{A}}_n\right| \right)$ is negative semidefinite. The second term is nonnegative, depends only on the interior state, and adds stabilizing dissipation. From the matrix absolute value, the dissipation term is

$$\mathbf{U}^T \left| \underline{\mathbf{A}}_n \right| \mathbf{U} = \frac{1}{\bar{c}}\, Q^2 + \bar{c}^3\, \mathrm{Ma}_n^2, \tag{37}$$

where $\mathrm{Ma}_n = V_n/\bar{c}$ is the normal Mach number. Stability depends, then, on the value of the first term, which is where the boundary conditions are incorporated through the external state $\mathbf{U}^{\text{ext}}$ written in (34). Then

$$\mathbf{U}^T \left| \underline{\mathbf{A}}_n^- \right| \mathbf{U}^{\text{ext}} = \frac{1}{2\bar{c}}\, Q^2 - \frac{\bar{c}^3}{2}\, \mathrm{Ma}_n^2. \tag{38}$$

Therefore, using the upwind numerical flux, (36) becomes

$$B_{L,1} = \bar{c}^3\, \mathrm{Ma}_n^2 \ge 0, \tag{39}$$

as required. The amount of dissipation depends on how far the interior computed normal velocity deviates from zero.
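The identity (39) can likewise be verified numerically by forming $\left|\underline{\mathbf{A}}_n\right|$ through an eigendecomposition. The following sketch (illustrative, mean state $\bar{\varrho} = \bar{p} = 1$ and mean flow at rest, not the authors' code) evaluates (36) directly.

```python
# Check of Eq. (39) (illustrative sketch): with the exact upwind flux and the
# reflection state of Eq. (34), the boundary term reduces to
# B_{L,1} = c_bar^3 * Ma_n^2, with |A_n| formed by eigendecomposition.
import numpy as np

gamma = 1.4
c_bar = np.sqrt(gamma)                          # mean state rho_bar = p_bar = 1
a = np.sqrt((gamma - 1.0) / gamma) * c_bar
b = c_bar / np.sqrt(gamma)

rng = np.random.default_rng(1)
n = rng.normal(size=3)
n /= np.linalg.norm(n)

A_n = np.zeros((5, 5))                          # v_bar·n = 0 assumed
A_n[0, 1:4] = A_n[1:4, 0] = b * n
A_n[4, 1:4] = A_n[1:4, 4] = a * n

lam, R = np.linalg.eigh(A_n)
abs_A_n = R @ np.diag(np.abs(lam)) @ R.T        # matrix absolute value |A_n|
A_n_minus = 0.5 * (A_n - abs_A_n)               # negative semidefinite part

U = rng.normal(size=5)
V_n = U[1:4] @ n
U_ext = U.copy()
U_ext[1:4] = U[1:4] - 2.0 * V_n * n             # reflection state, Eq. (34)

# Eq. (36): B_{L,1} = -U^T |A_n^-| U_ext + (1/2) U^T |A_n| U, |A_n^-| = -A_n^-
B_L1 = U @ A_n_minus @ U_ext + 0.5 * U @ abs_A_n @ U

Ma_n = V_n / c_bar
assert np.isclose(B_L1, c_bar**3 * Ma_n**2)     # Eq. (39)
```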

The combination of the reflective state and the local Lax-Friedrichs flux is also linearly stable. In that case the exact matrix absolute value is replaced by a diagonal matrix, $\left|\underline{\mathbf{A}}_n\right| \approx |\lambda|_{\max}\,\underline{\mathbf{I}}$. The jump term is added to the central (averaged) flux, so

$$B_{L,LF} = -\frac{|\lambda|_{\max}}{2}\, \mathbf{U}^T \left(\mathbf{U}^{\text{ext}} - \mathbf{U}\right) = \bar{c}^2\, |\lambda|_{\max}\, \mathrm{Ma}_n^2 \ge 0. \tag{40}$$

Finally, a dissipative version of the direct numerical flux (30) can be formed by looking at the reflective state approach. For instance, the equivalent to using the Lax-Friedrichs flux is to choose $\rho^* = \rho'$ and

$$P^* = P' + \frac{\bar{c}}{a}\, |\lambda|_{\max}\, \mathrm{Ma}_n. \tag{41}$$

Then $Q^* = Q + \bar{c}\, |\lambda|_{\max}\, \mathrm{Ma}_n$ and

$$V_n \left( Q^* - Q \right) = \bar{c}^2\, |\lambda|_{\max}\, \mathrm{Ma}_n^2 \ge 0. \tag{42}$$

A similar, though more complicated, modified $P^*$ can be made to be equivalent to the exact upwind flux.

#### *3.2 Entropy Stability of Wall Boundary Condition Approximations*

As in the linear approximation, the wall boundary condition can be imposed for the nonlinear equations either by directly specifying the numerical flux or by computing it through a Riemann solver using a reflection external state that enforces the normal wall condition implicitly. Note that in this section, the discrete variables $(\rho, \vec{V}, P)$ describe the full nonlinear state.

For the nonlinear equations, we construct the numerical flux for a slip-wall as

$$\left(\stackrel{\leftrightarrow}{\mathbf{F}} \cdot \vec{n}\right)^{\*} = \begin{bmatrix} 0\\P^{\*}\vec{n} \\ 0 \end{bmatrix} \tag{43}$$

where we imposed $V_n = 0$, leading to a flux with no mass or energy transfer, and we introduced a wall pressure $P^*$, whose value will be chosen to ensure consistency and stability.

After some manipulations, the discrete entropy stability condition (28) becomes

$$\begin{aligned} &-\rho V_n \left(\frac{\gamma - \varsigma}{\gamma - 1} - \beta \|\vec{V}\|^2\right) + 2\beta V_n \left(P^* - P - \rho \|\vec{V}\|^2\right) + 2\beta V_n \left(E + P\right) - \frac{\rho\,\varsigma\,V_n}{\gamma - 1} \\ &\quad= -\rho V_n \left(\frac{\gamma}{\gamma - 1} - \beta \|\vec{V}\|^2\right) + 2\beta V_n \left(P^* + E - \rho \|\vec{V}\|^2\right) \\ &\quad= \frac{\rho V_n}{P} \left(-\frac{\gamma}{\gamma - 1}\, P + \frac{1}{2}\rho \|\vec{V}\|^2 + P^* + \frac{P}{\gamma - 1} - \frac{1}{2}\rho \|\vec{V}\|^2\right) \\ &\quad= \rho V_n \left(\frac{P^*}{P} - 1\right) \ge 0. \end{aligned} \tag{44}$$

Therefore, if we choose $P^* = P$ to be the internal pressure, the boundary flux does not contribute to the total entropy, independent of the inner normal velocity $V_n$. A value of $P^*$ that leads to a dissipative boundary condition can be found either through the exact solution of the Riemann problem at the boundary, or through the use of an external state and an approximate Riemann solver.

#### **3.2.1 Exact Solution of the Riemann Problem**

In [14] a symmetric 1D Riemann problem is solved exactly, following Toro [13], to get the wall pressure $P^*$, accounting for the fact that $V_n$ never vanishes discretely and therefore the wall pressure should differ from the interior pressure. The exact solution of the 1D Riemann problem reads

$$\left(\frac{P^*}{P}\right)_{\mathrm{RP}} = \begin{cases} 1 + \gamma \mathrm{Ma}_n \left( \frac{\gamma + 1}{4} \mathrm{Ma}_n + \sqrt{\left(\frac{\gamma + 1}{4} \mathrm{Ma}_n\right)^2 + 1} \right) > 1 & \text{for}\quad V_n > 0, \\[2mm] \left( 1 + \frac{\gamma - 1}{2} \mathrm{Ma}_n \right)^{\frac{2\gamma}{\gamma - 1}} \le 1 & \text{for}\quad V_n \le 0, \end{cases} \tag{45}$$

with the normal Mach number $\mathrm{Ma}_n = \frac{V_n}{c}$ and the sound speed $c = \sqrt{\frac{\gamma P}{\rho}}$.

As shown by Toro [13], the solution for the rarefaction has a limiting vacuum solution for $\mathrm{Ma}_n \le -2(\gamma - 1)^{-1}$. We will restrict our analysis to normal Mach numbers yielding strictly positive pressure solutions only ($\mathrm{Ma}_n > -5$ for $\gamma = \frac{7}{5}$).

It is easy to see that using $P^*$ from (45), the entropy inequality (44) is still satisfied for $|V_n| \neq 0$, and the added entropy scales with the discrete value of $V_n$ at the boundary. Hence, for $h \to 0$, the discrete boundary condition converges to its physical counterpart, since $V_n \to 0$. The choice of $P^*$ from (45) appears to stabilize under-resolved simulations, which can now be explained by the fact that the boundary flux always adds entropy for $|V_n| \neq 0$.
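The sign structure of (45), and hence the entropy condition (44), can be checked directly. The short sketch below (illustrative values, not the authors' code) evaluates the wall pressure ratio over a range of normal Mach numbers.

```python
# Evaluation of the exact Riemann wall pressure, Eq. (45) (illustrative
# sketch): P*/P > 1 for inflow (V_n > 0), P*/P <= 1 for outflow, and the
# entropy production rho*V_n*(P*/P - 1) of Eq. (44) is never negative.
import numpy as np

gamma = 1.4

def wall_pressure_ratio(Ma_n):
    """Exact-Riemann-problem wall pressure ratio (P*/P)_RP of Eq. (45)."""
    if Ma_n > 0.0:                              # shock (compression) branch
        q = 0.25 * (gamma + 1.0) * Ma_n
        return 1.0 + gamma * Ma_n * (q + np.sqrt(q * q + 1.0))
    # rarefaction branch; vacuum limit at Ma_n = -2/(gamma - 1)
    return (1.0 + 0.5 * (gamma - 1.0) * Ma_n) ** (2.0 * gamma / (gamma - 1.0))

for Ma_n in (-0.5, -0.1, 0.0, 0.1, 0.5):
    ratio = wall_pressure_ratio(Ma_n)
    assert (ratio >= 1.0) if Ma_n >= 0.0 else (ratio <= 1.0)
    assert Ma_n * (ratio - 1.0) >= 0.0          # entropy condition, Eq. (44)
```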

#### **3.2.2 Using Approximate Riemann Solvers for the Boundary Flux**

A well known strategy in finite volume methods is to mirror only the velocity of the internal state and solve an approximate Riemann problem to get the boundary flux, mainly because of its simpler implementation, since an approximate Riemann solver is already available and used for the fluxes between the elements. For DG methods, see also, for example, [2] and [12], where reflection conditions are proved to be entropy stable.

The mirror state is set so that the mass and energy fluxes are zero. Let the inner state be labeled $L$ and the outer $R$; then the inner and outer states that satisfy the mirror condition are

$$\mathbf{U}^L = \begin{bmatrix} \rho & \rho\vec{V} & E \end{bmatrix}^T, \quad \mathbf{U}^R = \begin{bmatrix} \rho & \rho(\vec{V} - 2V_n\vec{n}) & E \end{bmatrix}^T \tag{46}$$

We show below under what conditions on the normal velocity $V_n$ the reflection condition is entropy stable for the Lax-Friedrichs, HLL and HLLC, Roe, and EC-ES fluxes.

#### Lax-Friedrichs Flux

We start with the simplest approximate Riemann solver, the Lax-Friedrichs or Rusanov flux, which reads as

$$\left(\vec{\mathbf{F}} \cdot \vec{n}\right)\_{\rm LF}^{\*} = \frac{1}{2}\vec{n} \cdot \left(\vec{\mathbf{F}}(\mathbf{U}^{L}) + \vec{\mathbf{F}}(\mathbf{U}^{R})\right) - \frac{|\lambda|\_{\rm max}}{2}(\mathbf{U}^{R} - \mathbf{U}^{L}).\tag{47}$$

Inserting the states from (46), we get

$$\left(\vec{\mathbf{F}}\cdot\vec{n}\right)\_{\rm LF}^{\*} = \begin{bmatrix} 0\\ \left(\rho V\_n^2 + P\right)\vec{n} \\ 0 \end{bmatrix} - \frac{\lambda\_{\rm max}}{2} \begin{bmatrix} 0\\ -2\rho V\_n \vec{n} \\ 0 \end{bmatrix} = \begin{bmatrix} 0\\ \left(\rho V\_n^2 + \rho V\_n \lambda\_{\rm max} + P\right)\vec{n} \\ 0 \end{bmatrix}. \tag{48}$$

The maximum wave speed is normally approximated from the largest left-going and right-going wave speeds,

$$\lambda\_{\rm max} = \max\left(|V\_n^L| + c^L,\; |V\_n^R| + c^R\right) = |V\_n| + c, \quad \text{since} \quad c^L = c^R = c, \quad V\_n = V\_n^L = -V\_n^R, \tag{49}$$

and thus gives a definition of *P*∗

$$\left(\frac{P^\*}{P}\right)\_{\rm LF} = 1 + \gamma \text{Ma}\_n \left(\text{Ma}\_n + |\text{Ma}\_n| + 1\right) = \begin{cases} 1 + \gamma \text{Ma}\_n (2\text{Ma}\_n + 1) > 1 & \text{for} \quad V\_n > 0 \\ 1 + \gamma \text{Ma}\_n \le 1 & \text{for} \quad V\_n \le 0 \end{cases}, \tag{50}$$

which shows that the Lax-Friedrichs flux satisfies the entropy inequality (44).
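The case distinction in (50) is easy to check numerically. A minimal Python sketch (the function name is ours, not from the paper):

```python
# Wall pressure ratio P*/P for the Lax-Friedrichs boundary flux, Eq. (50):
# P*/P = 1 + gamma * Ma_n * (Ma_n + |Ma_n| + 1).

def p_ratio_lf(ma_n, gamma=1.4):
    """Ratio of numerical wall pressure to internal pressure for the LF flux."""
    return 1.0 + gamma * ma_n * (ma_n + abs(ma_n) + 1.0)

# Impinging flow (Ma_n > 0): pressure is raised above P, entropy is added.
assert p_ratio_lf(0.3) > 1.0
# Receding flow (Ma_n <= 0): the |Ma_n| term cancels Ma_n, so P*/P = 1 + gamma*Ma_n <= 1.
assert p_ratio_lf(-0.3) == 1.0 + 1.4 * (-0.3)
# The boundary condition is exactly satisfied at Ma_n = 0: no entropy is produced.
assert p_ratio_lf(0.0) == 1.0
```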

#### HLL and HLLC Flux

The HLL flux [13] is written as

$$\left(\vec{\mathbf{F}} \cdot \vec{n}\right)\_{\text{HLL}}^{\*} = \frac{1}{S^{R} - S^{L}} \left(\vec{n} \cdot \left(S^{R}\vec{\mathbf{F}}(\mathbf{U}^{L}) - S^{L}\vec{\mathbf{F}}(\mathbf{U}^{R})\right) + S^{L}S^{R}\left(\mathbf{U}^{R} - \mathbf{U}^{L}\right)\right) . \tag{51}$$

The left-going and right-going wave speeds are $S^L = V\_n^L - c^L = -V\_n^R - c^R = -S^R$, and the HLL flux reduces to

$$\left(\vec{\mathbf{F}} \cdot \vec{n}\right)\_{\text{HLL}}^{\*} = \frac{1}{2}\vec{n} \cdot \left(\vec{\mathbf{F}}(\mathbf{U}^{L}) + \vec{\mathbf{F}}(\mathbf{U}^{R})\right) - \frac{S^{R}}{2}\left(\mathbf{U}^{R} - \mathbf{U}^{L}\right). \tag{52}$$

If we chose $S^R$ to be the maximum wave speed, the HLL flux would reduce to the Lax-Friedrichs flux. However, with $S^R = V\_n^R + c^R = -V\_n + c$, an even simpler relation for *P*<sup>∗</sup> is found, which also satisfies the entropy inequality

$$\left(\frac{P^\*}{P}\right)\_{\text{HLL}} = 1 + \gamma \text{Ma}\_n \begin{cases} > 1 & \text{for} \quad V\_n > 0 \\ \le 1 & \text{for} \quad V\_n \le 0 \end{cases}. \tag{53}$$

For the HLLC flux [13], one can show that since the Riemann problem is symmetric, the approximate wave speed of the contact discontinuity is $\lambda^\* = 0$ and, choosing $S^R = -V\_n + c$, HLLC reduces to the HLL flux:

$$\left(\frac{P^\*}{P}\right)\_{\text{HLLC}} = 1 + \gamma \text{Ma}\_n = \left(\frac{P^\*}{P}\right)\_{\text{HLL}}. \tag{54}$$

#### Roe Flux

For the original Roe method without entropy fix [13], the mean values are

$$
\tilde{V}\_n = \frac{\sqrt{\rho^L} V\_n^L + \sqrt{\rho^R} V\_n^R}{\sqrt{\rho^L} + \sqrt{\rho^R}} = 0, \quad \tilde{V}\_{l\_1} = V\_{l\_1}, \quad \tilde{V}\_{l\_2} = V\_{l\_2}, \quad \tilde{c} = c\sqrt{1 + \frac{(\gamma - 1)}{2} \text{Ma}\_n^2}. \tag{55}
$$

After some manipulations,

$$
\begin{aligned}
\left(\vec{\mathbf{F}} \cdot \vec{n}\right)^{\*}\_{\text{Roe}} &= \left(\vec{\mathbf{F}} \cdot \vec{n}\right) + \tilde{\lambda}\_1 \tilde{\alpha}\_1 \tilde{\mathbf{K}}^1 = \left(\vec{\mathbf{F}} \cdot \vec{n}\right) + (-\tilde{c}) \frac{\rho V\_n}{\tilde{c}} \begin{bmatrix} 1 \\ -\tilde{c} \\ V\_{l\_1} \\ V\_{l\_2} \\ \frac{1}{\rho} (\rho E + P) \end{bmatrix} \\ &= \begin{bmatrix} 0 \\ \left(\rho V\_n^2 + \rho V\_n \tilde{c} + P\right) \vec{n} \\ 0 \end{bmatrix}, \end{aligned} \tag{56}
$$

with $\tilde{\lambda}\_1 = \tilde{V}\_n - \tilde{c} = -\tilde{c}$, $\tilde{\alpha}\_1 = \rho V\_n/\tilde{c}$ and $\tilde{\mathbf{K}}^1$ from [13]. This leads again to a definition of *P*<sup>∗</sup>

$$\left(\frac{P^\*}{P}\right)\_{\text{Roe}} = 1 + \gamma \text{Ma}\_n \left(\text{Ma}\_n + \sqrt{1 + \frac{(\gamma - 1)}{2} \text{Ma}\_n^2}\right),\tag{57}$$

which fulfills the entropy inequality as long as

$$\text{Ma}\_n \ge -\sqrt{\frac{2}{3-\gamma}}, \quad \text{which for } \gamma = \tfrac{7}{5} \text{ gives } \text{Ma}\_n \gtrsim -1.12. \tag{58}$$

Thus, the Roe flux is entropy stable for shocks, but not for supersonic rarefactions.
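The threshold (58) and the loss of entropy stability for supersonic rarefactions can be verified directly. A Python sketch (helper names are ours), using the sign condition Ma<sub>*n*</sub>(*P*<sup>∗</sup>/*P* − 1) ≥ 0 implied by (62):

```python
import math

def p_ratio_roe(ma_n, gamma=1.4):
    """Wall pressure ratio P*/P for the Roe boundary flux, Eq. (57)."""
    c_ratio = math.sqrt(1.0 + 0.5 * (gamma - 1.0) * ma_n ** 2)  # c_tilde / c, Eq. (55)
    return 1.0 + gamma * ma_n * (ma_n + c_ratio)

def entropy_added(ma_n, gamma=1.4):
    """Sign condition of the entropy contribution, cf. Eq. (62)."""
    return ma_n * (p_ratio_roe(ma_n, gamma) - 1.0) >= 0.0

gamma = 7.0 / 5.0
ma_crit = -math.sqrt(2.0 / (3.0 - gamma))   # threshold of Eq. (58)
assert abs(ma_crit + 1.118) < 1e-3          # about -1.12 for gamma = 7/5

assert entropy_added(2.0)        # impinging flow: entropy is added
assert entropy_added(-1.0)       # subsonic rarefaction: still stable
assert not entropy_added(-1.5)   # supersonic rarefaction below the threshold
```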

#### EC-ES Fluxes

We can also apply an entropy conservative (EC) flux that is used for interior element interfaces and add an entropy stable dissipation term (ES) to compute the boundary flux via the mirrored states (46). This is exactly the strategy proposed in Parsani et al. [12] to get the boundary flux. Such an EC-ES flux is presented in Winters et al. [15]

$$\left(\vec{\mathbf{F}} \cdot \vec{n}\right)\_{\rm ES}^{\*} = \vec{\mathbf{F}}\_{\rm EC}\left(\mathbf{U}\_{L}, \mathbf{U}\_{R}\right) \cdot \vec{n} - \frac{1}{2} \underline{\mathbf{D}}\, \underline{\mathbf{H}}\left(\mathbf{W}^{R} - \mathbf{W}^{L}\right), \tag{59}$$

where **D** is a dissipation matrix and the matrix **H**, which relates the entropy variables **W** to the conservative variables **U**, is carefully derived from the left and right states. Details are given in [15], where two approaches for the dissipation are distinguished. One is a Lax-Friedrichs-type dissipation, scaling with the maximum eigenvalue *λ*<sub>max</sub> = |*V<sub>n</sub>*| + *c* (referred to as 'EC-LF'). The other is a Roe-type dissipation computed via the eigenstructure of the matrix (**D H**) (referred to as 'EC-Roe').

If we carefully insert the two mirrored boundary states into (59), we again get an equation for the modified pressure

$$\left(\frac{P^\*}{P}\right)\_{\text{EC-LF}} = 1 + \gamma \text{Ma}\_n \left(|\text{Ma}\_n| + 1\right) \tag{60}$$

for the Lax-Friedrichs-type dissipation and

$$\left(\frac{P^\*}{P}\right)\_{\text{EC-Roe}} = 1 + \gamma \text{Ma}\_n \tag{61}$$

for the Roe-type dissipation. Both approaches lead to an entropy stable boundary flux when using a mirrored state. Note that the modified pressure of the EC-Roe flux (61) exactly matches the one of the HLL flux (53).

#### **4 Discussion**

In the previous section we have shown conditions under which a specified wall flux is stable. In the linear analysis, the central numerical flux adds no dissipation and is neutrally stable. In the nonlinear analysis, entropy is not generated if the numerical wall pressure is equal to the internal pressure, *P*<sup>∗</sup> = *P* . For upwinded approximations, the amount of energy or entropy dissipation depends on the normal Mach number. Since the boundary condition is only imposed weakly through the numerical flux, the normal Mach number will not be exactly zero except in the convergence limit. In fact, flow computations (especially steady state ones) are usually initiated with an impulsive start, where the initial state is a uniform flow, and the normal Mach number is not zero. This has proved over time to be very robust in practice. The analysis above gives an explanation why.

In the linear analysis the dissipation due to imposing the boundary condition is proportional to the square of the normal Mach number. With an impulsive start initialization, this dissipation will be large. As the flow develops and the boundary condition is better enforced, the dissipation reduces, going away only as the approximate solution converges.

**Fig. 1** Entropy contribution Δ*s* (62) produced by the wall boundary flux. RP refers to the exact Riemann problem (45), LF to (50), EC-LF to (60), HLL to (53) and Roe to (57). Plotted over the normal Mach number range |Ma*n*| ≤ 5 on the top and restricted to |Ma*n*| ≤ 1 on the bottom

A similar effect is observed for the use of the different approximate Riemann solvers in the nonlinear analysis. In Fig. 1, we compare the entropy contribution

$$
\Delta s = (\rho c) \text{Ma}\_n \left( \frac{P^\*}{P} - 1 \right) \tag{62}
$$

for the different wall boundary fluxes, over a range of normal Mach numbers for *(ρc)* = 1 and *γ* = 7*/*5. When the boundary condition is exactly fulfilled (Ma*<sup>n</sup>* = 0), the entropy contribution is zero. For low normal Mach numbers, all fluxes have the same behavior. Compared to the exact Riemann problem (RP), the Lax-Friedrichs flux and the EC-LF flux always produce more entropy whereas the HLL flux produces less entropy for impinging velocities Ma*<sup>n</sup> >* 0. The results of HLLC and EC-Roe fluxes are not plotted, as they coincide with the HLL flux. As shown in the analysis, the Roe flux produces a negative entropy change for supersonic rarefactions, implying that it is not suitable for all flow configurations.
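The qualitative orderings in Fig. 1 can be reproduced from the closed forms (50), (53), (57) and (60). A minimal Python sketch with (*ρc*) = 1 and *γ* = 7/5 (function names are ours; the exact Riemann problem curve from (45) is omitted, since (45) lies outside this excerpt):

```python
import math

GAMMA = 7.0 / 5.0

# Wall pressure ratios P*/P from Eqs. (50), (60), (53) and (57).
def lf(ma):     return 1.0 + GAMMA * ma * (ma + abs(ma) + 1.0)
def ec_lf(ma):  return 1.0 + GAMMA * ma * (abs(ma) + 1.0)
def hll(ma):    return 1.0 + GAMMA * ma
def roe(ma):    return 1.0 + GAMMA * ma * (ma + math.sqrt(1.0 + 0.5 * (GAMMA - 1.0) * ma ** 2))

def delta_s(p_ratio, ma, rho_c=1.0):
    """Entropy contribution of the wall flux, Eq. (62)."""
    return rho_c * ma * (p_ratio(ma) - 1.0)

# All fluxes add entropy at a moderate impinging Mach number ...
for flux in (lf, ec_lf, hll, roe):
    assert delta_s(flux, 0.5) > 0.0
# ... and the contribution vanishes when the boundary condition holds exactly.
for flux in (lf, ec_lf, hll, roe):
    assert delta_s(flux, 0.0) == 0.0
# Only the Roe flux turns negative for a supersonic rarefaction (cf. Eq. (58)).
assert delta_s(roe, -2.0) < 0.0
assert all(delta_s(f, -2.0) >= 0.0 for f in (lf, ec_lf, hll))
```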

**Acknowledgements** This work was supported by a grant from the Simons Foundation (#426393, David Kopriva). Gregor J. Gassner has been supported by the European Research Council (ERC) under the European Union's Eighth Framework Programme Horizon 2020 with the research project *Extreme*, ERC grant agreement no. 714487. Florian Hindenlang thanks Eric Sonnendrücker and the Max-Planck Institute for Plasma Physics in Garching for their constant support.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **On the Order Reduction of Entropy Stable DGSEM for the Compressible Euler Equations**

**Florian J. Hindenlang and Gregor J. Gassner**

### **1 Introduction**

The discontinuous Galerkin spectral element collocation method (DGSEM) with either Legendre-Gauss or Legendre-Gauss-Lobatto (LGL) nodes (see e.g. [14]) is among the most efficient variants in the class of element based high order methods, such as e.g. discontinuous Galerkin, flux reconstruction, or summation-by-parts (SBP) finite differences. In particular, the LGL variant, starting in [9], turned out to be similar to an SBP finite difference approximation with the simultaneous approximation term technique (SAT). This relationship allowed the construction of conservative skew-symmetric approximations, e.g. [9, 10, 21], and later enabled DGSEM-LGL approximations that are discretely entropy stable, e.g. [1, 3, 6, 8, 13, 17, 19, 20], and/or kinetic energy preserving [12]. These novel variants of nodal split form DG methods feature drastically increased non-linear robustness towards aliasing induced instabilities and favourable properties regarding the simulation of unresolved turbulence, e.g. [7, 23].

In addition to the very robust dissipative entropy stable versions, it is also possible to construct virtually dissipation-free variants by choosing appropriate element interface numerical fluxes. These entropy conserving variants all show an odd-even behavior when experimentally testing the order of convergence, e.g. [9, 21], where the observed convergence order for even polynomial degrees *N* is *N* and for odd *N* is *N* + 1. Lately, a discussion emerged in the community, with interesting debates during the recent ICOSAHOM conference in London, where researchers reported non-optimal convergence behavior of the entropy stable DGSEM-LGL even with dissipative numerical surface fluxes, e.g. [6].

F. J. Hindenlang
Max Planck Institute for Plasma Physics, Garching, Germany e-mail: florian.hindenlang@ipp.mpg.de

G. J. Gassner (✉)
Department for Mathematics and Computer Science, Center for Data and Simulation Science, University of Cologne, Cologne, Germany e-mail: ggassner@math.uni-koeln.de

© The Author(s) 2020
S. J. Sherwin et al. (eds.), *Spectral and High Order Methods for Partial Differential Equations ICOSAHOM 2018*, Lecture Notes in Computational Science and Engineering 134, https://doi.org/10.1007/978-3-030-39647-3\_2

This paper contributes to this discussion and presents results of an experimental convergence order study for the compressible Euler equations with (1) the standard DGSEM with either Gauss or LGL nodes, and (2) the entropy stable DGSEM with LGL nodes. For these nodal schemes, we test the convergence order with different numerical surface fluxes and report the results depending on the Mach number of the test case. The remainder of the paper is organized as follows: in the next section we describe the numerical model for our numerical experiments, in Sect. 3 we present our observed experimental convergence orders for different configurations, and we draw our conclusions in Sect. 4.

#### **2 Numerical Model**

We consider the compressible Euler equations defined in the domain *Ω* ⊂ R<sup>3</sup>

$$\mathbf{u}\_t + \sum\_{i=1}^3 \frac{\partial \mathbf{f}\_i}{\partial x\_i} = \mathbf{0}.\tag{1}$$

The state vector contains the conservative variables and the advective flux components are

$$\mathbf{u} = \begin{bmatrix} \varrho \\ \varrho\vec{v} \\ E \end{bmatrix} = \begin{bmatrix} \varrho \\ \varrho v\_1 \\ \varrho v\_2 \\ \varrho v\_3 \\ E \end{bmatrix}, \mathbf{f}\_1 = \begin{bmatrix} \varrho v\_1 \\ \varrho v\_1^2 + p \\ \varrho v\_1 v\_2 \\ \varrho v\_1 v\_3 \\ (E+p)v\_1 \end{bmatrix}, \mathbf{f}\_2 = \begin{bmatrix} \varrho v\_2 \\ \varrho v\_2 v\_1 \\ \varrho v\_2^2 + p \\ \varrho v\_2 v\_3 \\ (E+p)v\_2 \end{bmatrix}, \mathbf{f}\_3 = \begin{bmatrix} \varrho v\_3 \\ \varrho v\_3 v\_1 \\ \varrho v\_3 v\_2 \\ \varrho v\_3^2 + p \\ (E+p)v\_3 \end{bmatrix}. \tag{2}$$

Here, *ϱ*, *v⃗* = (*v*<sub>1</sub>, *v*<sub>2</sub>, *v*<sub>3</sub>)<sup>T</sup>, *p*, *E* are the mass density, fluid velocities, pressure and total energy. We close the system with the ideal gas assumption, which relates the total energy and pressure

$$p = (\gamma - 1) \left( E - \frac{1}{2} \varrho \left\| \vec{v} \right\|^2 \right), \tag{3}$$

where *γ* denotes the adiabatic coefficient.
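For concreteness, the closure (3) and the flux components (2) translate directly into code. A small Python sketch with our own helper names (not from the paper):

```python
def pressure(u, gamma=1.4):
    """Pressure from the conservative state u = (rho, rho*v1, rho*v2, rho*v3, E), Eq. (3)."""
    rho, m1, m2, m3, E = u
    kinetic = 0.5 * (m1 ** 2 + m2 ** 2 + m3 ** 2) / rho   # 0.5 * rho * |v|^2
    return (gamma - 1.0) * (E - kinetic)

def flux_x1(u, gamma=1.4):
    """Advective flux f1 of Eq. (2)."""
    rho, m1, m2, m3, E = u
    v1 = m1 / rho
    p = pressure(u, gamma)
    return (m1, m1 * v1 + p, m2 * v1, m3 * v1, (E + p) * v1)

# A fluid at rest: all of E is internal energy, p = (gamma - 1) * E,
# and the only non-zero flux entry is the pressure in the momentum row.
p0 = pressure((1.0, 0.0, 0.0, 0.0, 2.5))
assert abs(p0 - 1.0) < 1e-12
f = flux_x1((1.0, 0.0, 0.0, 0.0, 2.5))
assert f[0] == 0.0 and abs(f[1] - p0) < 1e-15 and f[4] == 0.0
```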

For our discretization, we subdivide the domain into non-overlapping hexahedral elements. For each element, we define a transfinite mapping to a unit reference space and use this mapping to transform Eq. (1) from physical to reference space. A weak form is created by taking the inner product of the transformed equation with a test function. We use integration-by-parts for the flux term and approximate the resulting weak form as follows: the conservative variables are approximated by a polynomial of degree *N* in reference space, interpolated at the Gauss or LGL nodes. The volume fluxes are replaced by a standard interpolation of the non-linear flux function at the same Gauss/LGL nodes (standard DGSEM-Gauss or DGSEM-LGL), see e.g. [14]. For the LGL variant, we are also able to introduce the split form volume integral based on entropy conserving and kinetic energy preserving numerical volume fluxes (Split-DGSEM), e.g. [12] and [22], resulting in either the entropy conserving or entropy stable DGSEM variants, depending on the choice of numerical surface flux.

#### **3 Convergence Results**

In this section, we compare the convergence of the standard DGSEM and the entropy conservative and entropy stable discretization for different choices of the numerical flux and polynomial degrees *N* = 2*,* 3*,* 4*,* 5.

We choose the test case of a two-dimensional density wave, with a constant pressure and transported with a constant velocity, which was proposed for one-dimensional convergence tests in [4]. The density evolves as

$$\varrho(x\_1, x\_2, t) = 1 + 0.1 \sin \left( \pi \left( (x\_1 - v\_1 t) + (x\_2 - v\_2 t) \right) \right) \tag{4}$$

with a prescribed velocity (*v*<sub>1</sub>, *v*<sub>2</sub>). The pressure is chosen as *p* = 1*/γ* with *γ* = 1.4, so that the sound speed ranges between *c* = 0.95 ... 1.05. Thus, by changing the velocity, we change the Mach number of the flow Ma = |*v⃗*|*/c*. Three Mach numbers are chosen: Ma ≈ 0.2 with (*v*<sub>1</sub>, *v*<sub>2</sub>) = (0.1, 0.15), Ma ≈ 1.0 with (*v*<sub>1</sub>, *v*<sub>2</sub>) = (0.7, 0.65) and Ma ≈ 3.5 with (*v*<sub>1</sub>, *v*<sub>2</sub>) = (2.5, 2.4). The experimental order of convergence (EOC) is computed with the *L*<sup>2</sup> error of the density at *t* = 1.
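The quoted sound-speed range and Mach numbers follow from *p* = 1*/γ*, since then *c* = √(*γp/ϱ*) = 1/√*ϱ*. A quick Python check (names ours):

```python
import math

GAMMA = 1.4
p = 1.0 / GAMMA                      # constant pressure of the test case

def density(x1, x2, t, v=(0.1, 0.15)):
    """Exact density of the transported wave, Eq. (4)."""
    return 1.0 + 0.1 * math.sin(math.pi * ((x1 - v[0] * t) + (x2 - v[1] * t)))

def sound_speed(rho):
    """c = sqrt(gamma * p / rho) = 1 / sqrt(rho) for p = 1/gamma."""
    return math.sqrt(GAMMA * p / rho)

# The density stays in [0.9, 1.1], so c ranges over roughly [0.95, 1.05].
assert abs(sound_speed(1.1) - 0.95) < 0.01 and abs(sound_speed(0.9) - 1.05) < 0.01
# Mach numbers of the three configurations (|v| / c with c close to 1):
for v, ma_target in [((0.1, 0.15), 0.2), ((0.7, 0.65), 1.0), ((2.5, 2.4), 3.5)]:
    ma = math.hypot(*v)
    assert abs(ma - ma_target) < 0.1 * ma_target + 0.05
```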

The convergence study is performed with the open source, three-dimensional curvilinear split-form DG framework FLUXO (www.github.com/project-fluxo). As the test case is two-dimensional, we use fully periodic Cartesian meshes of the domain [−1, 1]<sup>3</sup> with an equal number of elements in the x- and y-directions and always one element in the z-direction. Note that *h*<sub>0</sub> in the convergence tables refers to the coarsest mesh level, which is 4<sup>2</sup> elements for *N* = 2, 3 (*h*<sub>0</sub> = 1/2) and 2<sup>2</sup> elements for *N* = 4, 5 (*h*<sub>0</sub> = 1).
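Between two consecutive mesh levels, the EOC follows the usual two-level formula EOC = log(*e*<sub>coarse</sub>/*e*<sub>fine</sub>)/log(*h*<sub>coarse</sub>/*h*<sub>fine</sub>). A minimal Python helper; the error values below are hypothetical, for illustration only:

```python
import math

def eoc(err_coarse, err_fine, h_coarse, h_fine):
    """Experimental order of convergence between two consecutive mesh levels."""
    return math.log(err_coarse / err_fine) / math.log(h_coarse / h_fine)

# For a scheme of full order N + 1 = 4, halving h should divide the error by ~2^4 = 16.
h0 = 0.5                       # coarsest level for N = 3 (4^2 elements, cf. the text)
errs = [1.0e-3, 6.25e-5]       # hypothetical L2 errors on levels h0 and h0/2
assert abs(eoc(errs[0], errs[1], h0, h0 / 2.0) - 4.0) < 1e-12
```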

All simulation results are obtained with an explicit five stage, fourth order accurate low storage Runge–Kutta scheme [2], where a stable time step is computed according to the adjustable coefficient *CFL* ∈ (0, 1], the local maximum wave speed, and the relative grid size, e.g. [11]. We made sure that the time integrator did not influence the spatial convergence order by adjusting the CFL number accordingly.

#### *3.1 Standard DGSEM*

The convergence of the standard DGSEM with Gauss-Legendre nodes (DGSEM-Gauss) and with Legendre-Gauss-Lobatto nodes (DGSEM-LGL) is shown in Tables 1 and 2, for the three Mach numbers and two choices of the numerical flux, namely the HLL (Harten, Lax, van Leer) flux and the Roe flux. The results of the LLF (local Lax-Friedrichs) flux and the HLLC flux (HLL variant with three waves, C for 'contact' wave) are reported in the Appendix, as the HLL results are similar to LLF, and HLLC behaves exactly the same as Roe, see Tables 4 and 5. Details on the properties and the implementation of the LLF, HLL, HLLC, and Roe fluxes are found in the book of Toro [18] and the references therein.

For the HLL flux and the low Mach number Ma = 0*.*2, we observe an odd-even behavior with an *order reduction* for even polynomial degrees *N* = 2*,* 4. Also for Ma = 1*.*0, the convergence for even degrees is slightly affected, whereas for the high Mach number, all fluxes converge with full order. Comparing the *L*<sup>2</sup> errors of the finest mesh for HLL and Roe for the low Mach number, HLL is less accurate for *N* = 2*,* 4 and more accurate for *N* = 3*,* 5.

All numerical fluxes are approximate Riemann solvers, but the LLF and HLL only use the maximum wave speeds, whereas the HLLC and Roe also take the contact wave into account, and therefore keep the full order of the scheme for all Mach numbers for this test case.

#### *3.2 Entropy Conservative and Entropy Stable DGSEM*

Now, we investigate the order reduction of the entropy conservative and entropy stable discretizations. Here, the standard DGSEM volume integral is replaced by the split-form formulation (Split-DGSEM) using a two-point entropy conservative and kinetic energy preserving flux (ECKEP). If we choose the ECKEP flux at the surface, we get an entropy-conserving scheme. For entropy stability, we can use the LLF or HLL flux directly at the surface, or use the ECKEP flux and add a dissipation term, which must still satisfy the entropy inequality condition. In Winters et al. [22], such dissipation terms are carefully derived, using either only the maximum wave speed (LLF-type) or incorporating all waves (Roe-type), which we will refer to as ECKEP-LLF and ECKEP-Roe fluxes.

**Table 1** Experimental order of convergence of *L*<sup>2</sup> error to the exact density (4), using DGSEM-Gauss with HLL and Roe. Full order is marked with (*N* + 1) and an order reduction is highlighted

**Table 2** Experimental order of convergence of *L*<sup>2</sup> error to the exact density (4), using DGSEM-LGL with HLL and Roe. Full order is marked with (*N* + 1) and an order reduction is highlighted

**Table 3** Experimental order of convergence of *L*<sup>2</sup> error to the exact density (4), using the split-form DGSEM-LGL with ECKEP, HLL and ECKEP-Roe fluxes. Full order is marked with (*N* + 1) and an order reduction is highlighted

In Table 3, we summarize the convergence of the dissipation-free ECKEP flux, the HLL flux and the ECKEP-Roe flux. The results for the LLF and ECKEP-LLF fluxes are found in the Appendix in Table 6, as they have the same convergence and error levels as the HLL flux. As expected, the dissipation-free surface flux (ECKEP) produces an order reduction for all Mach numbers for *N* = 3, 5, and for *N* = 2 full order is not kept in the last refinement step.

If we simply use the HLL flux, we have an entropy stable scheme, but an order reduction for *N* = 2*,* 4 can be observed for the low Mach number flow, analogously to the standard DGSEM-LGL scheme. Interestingly, the odd-even behavior switches between entropy conserving and entropy stable fluxes.

The ECKEP-Roe entropy stable flux accounts for all waves of the Riemann problem and adjusts the dissipation for each wave accordingly, which gives full order convergence for all Mach numbers.

#### **4 Conclusions**

In this work, we compare the convergence of the standard DGSEM Gauss and Gauss-Lobatto schemes with the entropy conservative (EC) and entropy stable (ES) DGSEM schemes for the Euler equations, as there have been findings of order reduction for EC and ES schemes. We choose a simple density transport test case on a periodic domain and investigate the influence of the Mach number of the transport velocity.

The EC scheme is dissipation free and an order reduction is observed in the convergence study presented here, confirming many similar observations found in the literature. We also confirm that the ES scheme can have an order reduction for low Mach numbers, but only if the entropy stable numerical flux relies on simple approximate Riemann solvers such as local Lax-Friedrichs or HLL. If all waves are accounted for in the dissipation term of the entropy stable flux as presented in [22], the full order is observed for all Mach numbers. In addition, we reproduce the same behavior for the standard DGSEM Gauss and Gauss-Lobatto schemes, where the LLF and HLL fluxes suffer from order reduction at low Mach number, and the HLLC and Roe fluxes have full order for all Mach numbers.

We want to emphasize that the present convergence study should be seen merely as an observation, confirming that the numerical flux can have a strong influence on the convergence order for both the standard DGSEM and the entropy stable DGSEM. We also stress that in our tests the order reduction is related to the form of the dissipation term in the numerical surface flux and not to insufficient integration precision of the LGL quadrature.

Based on the observations presented in this work, a possible explanation for the loss of convergence for the density transport at low Mach numbers when using LLF and HLL fluxes is the form of dissipation from the approximate Riemann solver. In the case of the density transport, the exact solution follows the characteristic with velocity *v⃗*. However, the approximate Riemann solvers LLF and HLL consider only two waves with maximum velocity ∼(|*v⃗*| + *c*) and do not consider the contact wave with velocity *v⃗*. Thus, the contact wave is dissipated proportionally to (|*v⃗*| + *c*) and not to |*v⃗*|. For low Mach numbers, where *c* > |*v⃗*|, this causes over-upwinding. Over-upwinding was discussed in [5, 15]. It is not intuitive at first, but over-upwinding (over-penalization) can lead to a reduction of the in-built dissipation of the DG scheme, giving wave-propagation characteristics similar to a continuous Galerkin method [16]. This loss of in-built dissipation could be an explanation for the even-odd behavior we observed. However, it is still unclear why symmetric numerical surface fluxes with no in-built dissipation, e.g. the EC flux, lead to an odd-even behavior in the convergence order, and why numerical surface fluxes with over-upwinding, i.e. reduced dissipation due to over-penalization, cause the opposite even-odd behavior. What supports the explanation is the recovery of full convergence order for LLF and HLL when the difference in wave speed becomes smaller for higher Mach numbers, i.e. no over-upwinding. In contrast to LLF and HLL, the HLLC and Roe solvers specifically take the contact wave into account and adjust the dissipation accordingly, and thus avoid strong over-upwinding by construction. In our tests, we always observe full convergence order for all Mach numbers for HLLC and Roe.
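The over-upwinding argument can be made concrete: LLF/HLL penalize the contact wave with a speed scaling ∼(|*v⃗*| + *c*) instead of |*v⃗*|, so the ratio of the two scalings, expressed through the Mach number, is (Ma + 1)/Ma. An illustrative Python sketch (the helper name is ours):

```python
def dissipation_ratio(ma):
    """Ratio of LLF/HLL contact-wave dissipation, ~(|v| + c), to a
    contact-aware one, ~|v|, written in terms of Ma = |v| / c."""
    return (ma + 1.0) / ma

# Over-upwinding at low Mach: the contact wave is penalized ~6x too strongly.
assert dissipation_ratio(0.2) == (0.2 + 1.0) / 0.2
assert dissipation_ratio(0.2) > 5.0
# At high Mach the two wave-speed scalings nearly coincide.
assert dissipation_ratio(3.5) < 1.3
```

This matches the observed recovery of full order at Ma ≈ 3.5 and the reduction at Ma ≈ 0.2.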

Lastly, we note that a convergence study using a manufactured solution technique can be misleading, as full convergence order is found independent of the choice of numerical flux. Hence, the introduction of a source term to balance the prescribed solution masks possible deficiencies of the surface fluxes, showing the limit of the manufactured solution technique in this context. In the Appendix, the convergence results of a manufactured solution are reported.

**Acknowledgements** Gregor Gassner thanks the European Research Council for funding through the ERC Starting Grant "An Exascale aware and Un-crashable Space-Time-Adaptive Discontinuous Spectral Element Solver for Non-Linear Conservation Laws" (Extreme), ERC grant agreement no. 714487. Florian Hindenlang thanks Eric Sonnendrücker and the Max-Planck Institute for Plasma Physics in Garching for their constant support. We would also like to thank all participants of the ICOSAHOM 2018 for the valuable discussions on the topic of entropy stable schemes, which motivated this work.

#### **Appendix**

#### *Additional Convergence Results*

In this section, we present additional convergence results for the density wave test case: the DGSEM-Gauss and DGSEM-LGL with LLF and HLLC fluxes in Tables 4 and 5, and the entropy stable schemes with LLF and ECKEP-LLF fluxes in Table 6. The LLF-type fluxes behave like the HLL flux, and the HLLC flux behaves like the Roe-type fluxes presented in Table 3.



**Table 4** Experimental order of convergence of *L*<sup>2</sup> error to the exact density (4), using DGSEM-Gauss with LLF and HLLC. Full order is marked with (*N* + 1) and an order reduction is highlighted

**Table 5** Experimental order of convergence of *L*<sup>2</sup> error to the exact density (4), using DGSEM-GL with LLF and HLLC. Full order is marked with (*N* + 1) and an order reduction is highlighted

**Table 6** Experimental order of convergence of *L*<sup>2</sup> error to the exact density (4), using the entropy stable schemes with LLF and ECKEP-LLF fluxes. Full order is marked with (*N* + 1) and an order reduction is highlighted

#### *Manufactured Solution with Source Term*

Here, we run a convergence test with the method of manufactured solutions. To do so, we assume a two-dimensional solution of the form

$$\mathbf{u} = \begin{bmatrix} \varrho & \varrho v\_1 & \varrho v\_2 & \varrho v\_3 & E \end{bmatrix}^T = \begin{bmatrix} g & g & g & 0 & g^2 \end{bmatrix}^T, \quad \text{with } g = g(x\_1, x\_2, t) = 0.5 \sin(2\pi(x\_1 + x\_2 - t)) + 2. \tag{5}$$

Note that the average Mach number in the domain is Ma = 0.8. Inserting (5) into the Euler equations, and using the fact that the spatial and time derivatives satisfy *g*′ = *∂*<sub>*x*<sub>1</sub></sub>*g* = *∂*<sub>*x*<sub>2</sub></sub>*g* = −*∂*<sub>*t*</sub>*g*, we get an additional residual

$$\mathbf{u}\_t + \sum\_{i=1}^{3} \frac{\partial \mathbf{f}\_i}{\partial x\_i} = \begin{bmatrix} g'\\ (2-\gamma)\,g' + 2(\gamma - 1)\,g g'\\ (2-\gamma)\,g' + 2(\gamma - 1)\,g g'\\ 0\\ (2-2\gamma)\,g' + 2(2\gamma - 1)\,g g' \end{bmatrix}. \tag{6}$$

To solve the inhomogeneous problem, we subtract the residual from the approximate solution in each Runge–Kutta step. Moreover, we run the test case up to the final time *t* = 1.0.
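As a cross-check of the residual obtained by inserting (5) into the Euler equations (the closed form below is our own derivation, with *γ* = 1.4), one can compare against central finite differences of **u**<sub>*t*</sub> + *∂***f**<sub>1</sub>/*∂x*<sub>1</sub> + *∂***f**<sub>2</sub>/*∂x*<sub>2</sub>. A Python sketch:

```python
import math

GAMMA = 1.4

def g(x1, x2, t):
    # Eq. (5): g = 0.5 * sin(2*pi*(x1 + x2 - t)) + 2
    return 0.5 * math.sin(2.0 * math.pi * (x1 + x2 - t)) + 2.0

def state(x1, x2, t):
    gg = g(x1, x2, t)
    return [gg, gg, gg, 0.0, gg * gg]          # u = (g, g, g, 0, g^2)^T

def pressure(u):
    rho, m1, m2, m3, E = u
    return (GAMMA - 1.0) * (E - 0.5 * (m1 * m1 + m2 * m2 + m3 * m3) / rho)

def flux_x(u):
    rho, m1, m2, m3, E = u
    p, v1 = pressure(u), m1 / rho
    return [m1, m1 * v1 + p, m2 * v1, m3 * v1, (E + p) * v1]

def flux_y(u):
    rho, m1, m2, m3, E = u
    p, v2 = pressure(u), m2 / rho
    return [m2, m1 * v2, m2 * v2 + p, m3 * v2, (E + p) * v2]

def residual_fd(x1, x2, t, eps=1e-6):
    """Central finite differences of u_t + df1/dx1 + df2/dx2."""
    d = lambda fp, fm: [(a - b) / (2.0 * eps) for a, b in zip(fp, fm)]
    ut = d(state(x1, x2, t + eps), state(x1, x2, t - eps))
    f1 = d(flux_x(state(x1 + eps, x2, t)), flux_x(state(x1 - eps, x2, t)))
    f2 = d(flux_y(state(x1, x2 + eps, t)), flux_y(state(x1, x2 - eps, t)))
    return [a + b + c for a, b, c in zip(ut, f1, f2)]

# Closed-form residual from our derivation:
# [g', (2-gamma)g' + 2(gamma-1)g g', (same), 0, (2-2*gamma)g' + 2(2*gamma-1)g g'].
x1, x2, t = 0.13, 0.41, 0.27
gg = g(x1, x2, t)
gp = math.pi * math.cos(2.0 * math.pi * (x1 + x2 - t))   # g' = dg/dx1
exact = [gp,
         (2.0 - GAMMA) * gp + 2.0 * (GAMMA - 1.0) * gg * gp,
         (2.0 - GAMMA) * gp + 2.0 * (GAMMA - 1.0) * gg * gp,
         0.0,
         (2.0 - 2.0 * GAMMA) * gp + 2.0 * (2.0 * GAMMA - 1.0) * gg * gp]
assert all(abs(a - b) < 1e-5 for a, b in zip(residual_fd(x1, x2, t), exact))
```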

In the convergence results for the standard DGSEM Gauss and Gauss-Lobatto, we see that the LLF flux still leads to an order reduction for *N* = 2*,* 4, whereas full order is found for the HLL, HLLC and Roe fluxes, see Tables 7 and 8.

In Table 9, the entropy conservative scheme again shows an order reduction for *N* = 3, 5, as does the LLF-type dissipation for *N* = 2, 4; for this test case, all other entropy stable schemes exhibit full order.


**Table 7** Experimental order of convergence of *L*<sup>2</sup> error of density for the manufactured solution (5), using DGSEM-Gauss with LLF, HLL, HLLC and Roe. Full order is marked with (*N* + 1) and an order reduction is highlighted

#### **References**



## **A Review of Regular Decompositions of Vector Fields: Continuous, Discrete, and Structure-Preserving**

**Ralf Hiptmair and Clemens Pechstein**

### **1 Introduction**

For a bounded Lipschitz domain *Ω* ⊂ R<sup>3</sup>, recall the classical **L**<sup>2</sup>-orthogonal *Helmholtz decompositions*

$$\mathbf{L}^2(\Omega) = \nabla H\_0^1(\Omega) \oplus \mathbf{H}(\mathrm{div}\, 0, \Omega) = \nabla H^1(\Omega) \oplus \mathbf{H}\_0(\mathrm{div}\, 0, \Omega),$$

see, e.g., [9, Ch. XI, Sect. I]. They can be used to derive decompositions of (subspaces of) **H***(***curl***,Ω)*:

$$\mathbf{H}\_0(\mathbf{curl}, \Omega) = \nabla H\_0^1(\Omega) \oplus \mathbf{X}\_N(\Omega), \quad \mathbf{X}\_N(\Omega) := \mathbf{H}\_0(\mathbf{curl}, \Omega) \cap \mathbf{H}(\mathrm{div}\, 0, \Omega),$$

$$\mathbf{H}(\mathbf{curl},\Omega) = \nabla H^1(\Omega) \oplus \mathbf{X}\_T(\Omega), \quad \mathbf{X}\_T(\Omega) := \mathbf{H}(\mathbf{curl},\Omega) \cap \mathbf{H}\_0(\mathrm{div}\, 0, \Omega).$$

If the domain *Ω* is convex then the respective complementary space, **X***<sup>N</sup> (Ω)* or **X***<sup>T</sup> (Ω)*, is continuously embedded in the space **H**1*(Ω)* of vector fields with Cartesian components in *H*1*(Ω)*, cf. [1]. Then one can, for instance, write any **u** ∈ **H***(***curl***,Ω)* as

$$
\mathbf{u} = \nabla \ p + \mathbf{z},
\tag{1}
$$

R. Hiptmair (✉)
Seminar for Applied Mathematics, ETH Zürich, Zürich, Switzerland e-mail: hiptmair@sam.math.ethz.ch

C. Pechstein
Dassault Systèmes, Darmstadt, Germany e-mail: clemens.pechstein@3ds.com

© The Author(s) 2020
S. J. Sherwin et al. (eds.), *Spectral and High Order Methods for Partial Differential Equations ICOSAHOM 2018*, Lecture Notes in Computational Science and Engineering 134, https://doi.org/10.1007/978-3-030-39647-3\_3

with *p* ∈ *H*<sup>1</sup>(Ω) and **z** ∈ **H**<sup>1</sup>(Ω). Since $\|\nabla p\|\_{\mathbf{L}^2(\Omega)} \le \|\mathbf{u}\|\_{\mathbf{L}^2(\Omega)}$, one obtains (using the continuous embedding) the stability property<sup>1</sup>

$$\|\nabla p\|\_{\mathbf{L}^2(\varOmega)} + \|\mathbf{z}\|\_{\mathbf{H}^1(\varOmega)} \le C \|\mathbf{u}\|\_{\mathbf{H}(\mathbf{curl},\varOmega)}.\tag{2}$$

A similar decomposition can be found for **u** ∈ **H**0*(***curl***,Ω)*.

Generally, a decomposition of the form (1) with the stability property (2) is called a *regular decomposition*, even if **L**2-orthogonality does not hold. It turns out that (1)–(2) can be achieved even when *Ω* is non-convex, in particular on non-smooth domains, or when *Ω* or its boundary have non-trivial topology; only the **L**2-orthogonality has to be sacrificed, cf. [20].

Noting that $\nabla H^1(\Omega)$ is contained in the kernel of the **curl** operator and that, under mild smoothness assumptions on the domain, the whole kernel is spanned by $\nabla H^1(\Omega)$ plus a finite-dimensional *co-homology space* [15, Sect. 4], one can achieve a second decomposition,

$$
\mathbf{u} = \mathbf{h} + \mathbf{z},
\tag{3}
$$

with $\mathbf{h} \in \ker(\mathbf{curl}|\_{\mathbf{H}(\mathbf{curl},\Omega)})$ and $\mathbf{z} \in \mathbf{H}^1(\Omega)$, where

$$\|\mathbf{h}\|\_{\mathbf{L}^2(\varOmega)} \le C \left\|\mathbf{u}\right\|\_{\mathbf{L}^2(\varOmega)}, \qquad \|\mathbf{z}\|\_{\mathbf{H}^1(\varOmega)} \le C \left\|\mathbf{curl}\,\mathbf{u}\right\|\_{\mathbf{L}^2(\varOmega)}.\tag{4}$$

The second stability estimate states that if **u** is already in the kernel of the **curl** operator, then **z** is zero. Hence, (i) the operator mapping **u** to **h** is a *projection* onto the kernel space, and (ii) the complement operator maps **u** to the component **z** of higher regularity $\mathbf{H}^1(\Omega)$. For trivial topology of *Ω* and *∂Ω*, the two decompositions (1)–(2) and (3)–(4) coincide.
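The projection claim can be verified in one step from (3)–(4); the operator symbol $\mathsf{P}$ is introduced here only for this argument. Writing $\mathsf{P}\mathbf{u} := \mathbf{h}$ for the linear map defined by the decomposition, the second bound in (4) gives, for any $\mathbf{u}$ with $\mathbf{curl}\,\mathbf{u} = 0$,

$$\|\mathbf{z}\|\_{\mathbf{H}^1(\varOmega)} \le C\,\|\mathbf{curl}\,\mathbf{u}\|\_{\mathbf{L}^2(\varOmega)} = 0 \quad\Longrightarrow\quad \mathsf{P}\mathbf{u} = \mathbf{u}.$$

In particular, $\mathsf{P}\mathbf{h} = \mathbf{h}$ for every curl-free $\mathbf{h}$, hence $\mathsf{P}^2 = \mathsf{P}$.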

As a few among many more [17, Sect. 1.5], we would like to highlight two important applications of these regular decompositions.


<sup>1</sup>Here and below *C* stands for a positive "generic constant" that may depend only on *Ω*, unless specified otherwise.

In both applications, it is desirable to obtain the decompositions for minimal smoothness of the domain, e.g., Lipschitz domains, which are not necessarily convex. Moreover, it is also desirable to go beyond decompositions of the entire space **H***(***curl***,Ω)* and extend them to subspaces for which the appropriate trace vanishes on a "Dirichlet part" *ΓD* of the boundary. In this case traces of the two summands should also vanish on *ΓD*.

In the present paper, we provide regular decompositions of both types for subspaces of **H***(***curl***,Ω)* (in Sect. 3) and **H***(*div*,Ω)* (in Sect. 4) comprising functions with vanishing trace on a part *ΓD* of the boundary *∂Ω* for Lipschitz domains *Ω* of *arbitrary topology*. In particular, *Ω* is allowed to have handles, and *∂Ω* and *ΓD* may have several connected components. The Dirichlet boundary *ΓD* must satisfy a certain smoothness assumption that we shall introduce in Sect. 2. In addition to the stability estimates (2) and (4), we show that the decompositions are stable even in **L**2*(Ω)*.

In the final part of the manuscript, in Sect. 5, we establish regular decompositions of spaces of Whitney forms, which are lowest-order conforming finite element subspaces of **H***(***curl***,Ω)* and **H***(*div*,Ω)*, respectively, built upon simplicial triangulations of *Ω*.

This note is based on [17] and is an abridged version of [18]. Please refer to this latter preprint for complete proofs of the results quoted below.

#### **2 Preliminaries**

Since subtle geometric arguments will play a major role for parts of the theory, we start with a precise characterization of the geometric setting: Let $\Omega \subset \mathbb{R}^3$ be an open, bounded, connected Lipschitz domain.<sup>2</sup> We write $\mathrm{d}(\Omega)$ for its diameter. Its boundary $\Gamma := \partial\Omega$ is partitioned according to $\Gamma = \Gamma\_D \cup \Sigma \cup \Gamma\_N$, with relatively open sets $\Gamma\_D$ and $\Gamma\_N$. We assume that this provides a *piecewise $C^1$ dissection* of $\partial\Omega$ in the sense of [12, Definition 2.2]. Loosely speaking, this means that $\Sigma$ is the union of closed curves that are piecewise $C^1$.

Under the above assumptions on *Ω* and *ΓD*, [12, Lemma 4.4] guarantees the existence of an open Lipschitz neighborhood *ΩΓ* ("Lipschitz collar") of *Γ* and of a "bulge" *ΥD* ⊂ *ΩΓ* \*Ω*. We recall the properties of bulge domains from [12, Sect. 2, Thm. 2.3], also stated in [17, Thm. 2.2]:

**Theorem 1 (Bulge-Augmented Domain)** *There exists a Lipschitz domain $\Upsilon\_D \subset \mathbb{R}^3 \setminus \overline{\Omega}$ such that $\overline{\Upsilon}\_D \cap \overline{\Omega} = \overline{\Gamma}\_D$, $\Omega\_e := \Upsilon\_D \cup \Gamma\_D \cup \Omega$ is Lipschitz, $\mathrm{d}(\Omega\_e) \le 2\,\mathrm{d}(\Omega)$, and $\overline{\Upsilon}\_D \subset \Omega\_\Gamma$. Moreover, each connected component $\Gamma\_{D,k}$ of $\Gamma\_D$ corresponds to a connected component $\Upsilon\_{D,k}$ of $\Upsilon\_D$, and these have positive distance from each other.*

<sup>2</sup>Strongly Lipschitz, in the sense that the boundary is locally the graph of a Lipschitz continuous function.

Let

$$\begin{aligned} H^{1}\_{\Gamma\_{D}}(\varOmega) &:= \{ u \in H^{1}(\varOmega) \colon (\gamma u)\_{|\Gamma\_{D}} = 0 \}, \\ \mathbf{H}\_{\Gamma\_{D}}(\mathbf{curl}, \varOmega) &:= \{ \mathbf{u} \in \mathbf{H}(\mathbf{curl}, \varOmega) \colon (\gamma\_{\tau}\mathbf{u})\_{|\Gamma\_{D}} = 0 \}, \\ \mathbf{H}\_{\Gamma\_{D}}(\mathrm{div}, \varOmega) &:= \{ \mathbf{u} \in \mathbf{H}(\mathrm{div}, \varOmega) \colon (\gamma\_{n}\mathbf{u})\_{|\Gamma\_{D}} = 0 \}, \end{aligned}$$

denote the standard Sobolev spaces where the distributional gradient, curl, or divergence is in $L^2$ and where the pointwise trace $\gamma u$, the tangential trace $\gamma\_\tau \mathbf{u}$, or the normal trace $\gamma\_n \mathbf{u}$, respectively, vanishes on the Dirichlet boundary $\Gamma\_D$, see e.g. [3, 6, 26]. These spaces are linked via the *de Rham complex*,

$$\mathcal{K}\_{\Gamma\_{D}}(\varOmega) \stackrel{\mathrm{id}}{\longrightarrow} H^{1}\_{\Gamma\_{D}}(\varOmega) \stackrel{\nabla}{\longrightarrow} \mathbf{H}\_{\Gamma\_{D}}(\mathbf{curl},\varOmega) \stackrel{\mathbf{curl}}{\longrightarrow} \mathbf{H}\_{\Gamma\_{D}}(\mathrm{div},\varOmega) \stackrel{\mathrm{div}}{\longrightarrow} L^{2}(\varOmega), \tag{5}$$

where

$$\mathcal{K}\_{\Gamma\_D}(\varOmega) := \{ v \in H\_{\Gamma\_D}^1(\varOmega) \colon v = \text{const} \} = \begin{cases} \text{span}\{1\}, & \text{if } \Gamma\_D = \emptyset, \\ \{0\}, & \text{otherwise}. \end{cases}$$

The range of each operator in (5) lies in the kernel space of the succeeding one, cf. [3, Lemma 2.2]. We define

$$\begin{aligned} \mathbf{H}\_{\Gamma\_{D}}(\mathbf{curl}\,0,\varOmega) &:= \{ \mathbf{v} \in \mathbf{H}\_{\Gamma\_{D}}(\mathbf{curl},\varOmega) \colon \mathbf{curl}\,\mathbf{v} = 0 \}, \\ \mathbf{H}\_{\Gamma\_{D}}(\mathrm{div}\,0,\varOmega) &:= \{ \mathbf{v} \in \mathbf{H}\_{\Gamma\_{D}}(\mathrm{div},\varOmega) \colon \mathrm{div}\,\mathbf{v} = 0 \}. \end{aligned} \tag{6}$$

Barring topological obstructions these kernels can be represented through potentials: Let *β*1*(Ω)* denote the *first Betti number* of *Ω* (the number of "handles") and *β*2*(Ω)* the *second Betti number* (the number of connected components of *∂Ω* minus one). By the very definition of the Betti numbers as dimensions of co-homology spaces we have

$$\beta\_1(\varOmega) = 0 \quad \Longrightarrow \quad \mathbf{H}(\mathbf{curl}\,0, \varOmega) = \nabla H^1(\varOmega), \tag{7}$$

$$\beta\_2(\varOmega) = 0 \quad \Longrightarrow \quad \mathbf{H}(\mathrm{div}\,0, \varOmega) = \mathbf{curl}\, \mathbf{H}(\mathbf{curl}, \varOmega), \tag{8}$$

cf. [26]. We call *Ω* topologically trivial if *β*1*(Ω)* = *β*2*(Ω)* = 0.
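The kernel relations underlying (5)–(8) rest on the pointwise identities $\mathbf{curl} \circ \nabla = 0$ and $\mathrm{div} \circ \mathbf{curl} = 0$. A quick symbolic sanity check with SymPy; the scalar and vector fields below are arbitrary smooth examples chosen only for illustration:

```python
# Verify the complex property of (5) symbolically: the range of each operator
# lies in the kernel of the next one, i.e. curl(grad p) = 0 and div(curl u) = 0.
import sympy as sp

x, y, z = sp.symbols('x y z')

def grad(p):
    return [sp.diff(p, v) for v in (x, y, z)]

def curl(u):
    return [sp.diff(u[2], y) - sp.diff(u[1], z),
            sp.diff(u[0], z) - sp.diff(u[2], x),
            sp.diff(u[1], x) - sp.diff(u[0], y)]

def div(u):
    return sp.diff(u[0], x) + sp.diff(u[1], y) + sp.diff(u[2], z)

p = sp.sin(x) * sp.exp(y) * z**2             # arbitrary smooth scalar potential
u = [x*y*z, sp.cos(x) + y**2, sp.exp(z)*x]   # arbitrary smooth vector field

print([sp.simplify(c) for c in curl(grad(p))])  # [0, 0, 0]
print(sp.simplify(div(curl(u))))                # 0
```

Both outputs vanish identically, which is exactly the statement that $\nabla H^1(\Omega)$ lies in the kernel of **curl** and $\mathbf{curl}\,\mathbf{H}(\mathbf{curl},\Omega)$ in the kernel of div.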

#### **3 Regular Decompositions and Potentials Related to H***(***curl***)*

Throughout we rely on the properties of *Ω* and *ΓD* as introduced in Sect. 2 and use the notations from Theorem 1. We write *C* for positive "generic constants" and say that a constant "depends only on the shape of *Ω* and *ΓD*", if it depends on the geometric setting alone, but is invariant with respect to similarity transformations. To achieve this the diameter of *Ω* will have to enter the estimates; we denote it by d*(Ω)*.

#### *3.1 Gradient-Based Regular Decomposition of* **H***(***curl***)*

The following theorem is essentially [17, Thm. 2.1].

**Theorem 2 (Gradient-Based Regular Decomposition of H***(***curl***)***)** *Let $(\Omega,\Gamma\_D)$ satisfy the assumptions of Sect. 2. Then for each $\mathbf{u} \in \mathbf{H}\_{\Gamma\_D}(\mathbf{curl},\Omega)$ there exist $\mathbf{z} \in \mathbf{H}^1\_{\Gamma\_D}(\Omega)$ and $p \in H^1\_{\Gamma\_D}(\Omega)$, depending linearly on $\mathbf{u}$, such that*

*(i)* $\mathbf{u} = \mathbf{z} + \nabla p$,

*(ii)* $\|\mathbf{z}\|\_{0,\Omega} + \|\nabla p\|\_{0,\Omega} \le C\,\|\mathbf{u}\|\_{0,\Omega}$,

*(iii)* $\|\nabla\mathbf{z}\|\_{0,\Omega} + \frac{1}{\mathrm{d}(\Omega)}\|\mathbf{z}\|\_{0,\Omega} \le C\left(\|\mathbf{curl}\,\mathbf{u}\|\_{0,\Omega} + \frac{1}{\mathrm{d}(\Omega)}\|\mathbf{u}\|\_{0,\Omega}\right)$,

*with constants depending only on the shape of Ω and ΓD, but not on* d*(Ω).*

*Remark 1* An early decomposition of a subspace of **H***(***curl***,Ω)* ∩ **H***(*div*,Ω)* into a regular part in $\mathbf{H}^1(\Omega)$ and a singular part in $\nabla H^1(\Omega)$ can be found in [4] and in [5, Proposition 5.1]; see also [7, Sect. 3] and references therein. Theorem 2 was proved in [14, Lemma 2.4] for the case $\Gamma\_D = \partial\Omega$ and without the $\mathbf{L}^2$ stability estimate, following [5, Proposition 5.1]. Pasciak and Zhao [28, Lemma 2.2] provided a version for simply connected *Ω* and the case $\Gamma\_D = \partial\Omega$ *with* pure $\mathbf{L}^2$-stability, but *p* is only constant on each connected component of *∂Ω* (see also Theorem 5 and Remark 3). This result was refined in [24, Thm. 3.1]. For the case $\Gamma\_D = \emptyset$, [14, Lemma 2.4] gives a similar decomposition, but $\nabla p$ must be replaced by an element of $\mathbf{H}(\mathbf{curl}\,0,\Omega)$ in general. Finally, Theorem 2 without the pure $\mathbf{L}^2$-stability was proved in [20, Thm. 5.2].<sup>3</sup>

*Remark 2* The constant *C* in Theorem 2 depends mainly on the stability constants of key extension operators. If the bulge $\Upsilon\_D$ has multiple components $\Upsilon\_{D,k}$, the final estimate will depend on the relative distances between $\Upsilon\_{D,k}$ and $\Upsilon\_{D,\ell}$, $k \ne \ell$, and on the ratios $\mathrm{d}(\Upsilon\_{D,k})/\mathrm{d}(\Omega)$.

<sup>3</sup>This reference contains a typo which is easily identified when inspecting the proof: In general, **z** cannot be estimated in terms of $\|\mathbf{curl}\,\mathbf{u}\|\_{0,\Omega}$ alone, but one must use the full **H***(***curl***)* norm.

*Remark 3* If $\Gamma\_D = \partial\Omega$, one obtains only $p \in H^1(\Omega)$ being constant on each connected component of $\Gamma\_D$, but the improved bound

$$\|\nabla \mathbf{z}\|\_{0,\Omega} + \mathrm{d}(\varOmega)^{-1} \|\mathbf{z}\|\_{0,\Omega} \le C \|\mathbf{curl}\,\mathbf{u}\|\_{0,\Omega}\,.$$

Results on regular decompositions in this special case can be found in [24, 28].

#### *3.2 Regular Potentials for Some Divergence-Free Functions*

Let the domain *Ω* and the Dirichlet boundary part *ΓD* be as introduced in Sect. 2 and let *Γi*, *i* = 0*,...,β*2*(Ω)*, denote the connected components of *∂Ω*, where *β*2*(Ω)* is the second Betti number of *Ω*.

We define the space<sup>4</sup>

$$\mathbf{H}\_{\Gamma\_{D}}(\mathrm{div}\,00,\varOmega) := \left\{ \mathbf{q} \in \mathbf{H}\_{\Gamma\_{D}}(\mathrm{div}\,0,\varOmega) \colon \langle \gamma\_{n}\mathbf{q}, 1 \rangle\_{\varGamma\_{i}} = 0, \ i = 0,\ldots,\beta\_{2}(\varOmega) \right\}. \tag{9}$$

Above, $\gamma\_n$ denotes the normal trace operator, and the duality pairing is that between $H^{-1/2}(\Gamma\_i)$ and $H^{1/2}(\Gamma\_i)$. If $\Gamma\_D = \emptyset$, we simply drop the subscript $\Gamma\_D$. Obviously,

$$\mathbf{H}\_{\Gamma\_D}(\text{div}\,00,\,\Omega) \subset \mathbf{H}(\text{div}\,00,\,\Omega)\,.$$

The next result identifies the above space as the range of the curl operator.

**Theorem 3 (Regular Potential of** Range*(***curl***)***)** *Let (Ω , ΓD) be as in Sect. 2 and assume in addition that each connected component ΥD,k of the bulge has vanishing first Betti number, β*1*(ΥD,k)* = 0*. Then*

$$\mathbf{H}\_{\Gamma\_{D}}(\mathrm{div}\,00,\varOmega) = \mathbf{curl}\,\mathbf{H}\_{\Gamma\_{D}}(\mathbf{curl},\varOmega) = \mathbf{curl}\,\mathbf{H}^{1}\_{\Gamma\_{D}}(\varOmega)\,,$$

*and for each* $\mathbf{q} \in \mathbf{H}\_{\Gamma\_D}(\mathrm{div}\,00,\Omega)$ *there exists* $\boldsymbol{\psi} \in \mathbf{H}^1\_{\Gamma\_D}(\Omega)$ *depending linearly on* **q** *such that*

$$\mathbf{curl}\,\boldsymbol{\psi} = \mathbf{q} \quad \text{and} \quad \|\nabla\boldsymbol{\psi}\|\_{0,\Omega} + \frac{1}{\mathrm{d}(\varOmega)} \|\boldsymbol{\psi}\|\_{0,\Omega} \le C\,\|\mathbf{q}\|\_{0,\Omega}\,,$$

*where C depends only on the shape of Ω and ΓD, but not on* d*(Ω).*

<sup>4</sup>Alternatively we can define **H***ΓD (*div 00*,Ω)* as the functions in **H***ΓD (*div 0*,Ω)* orthogonal to the *harmonic Dirichlet fields* **H***(*div 0*,Ω)* ∩ **H**0*(***curl** 0*,Ω)*.

*Remark 4* For the case that *ΓD* = ∅, we reproduce the classical result

$$\mathbf{H}(\mathrm{div}\,00,\varOmega) = \mathbf{curl}\,\mathbf{H}(\mathbf{curl},\varOmega) = \mathbf{curl}\,\mathbf{H}^{1}(\varOmega),$$

see [11, Thm. 3.4]. In that case, Step 4 of the proof can be left out and $\boldsymbol{\psi} = \mathbf{w}\_1$, so that $\mathrm{div}\,\boldsymbol{\psi} = 0$ in *Ω*. This property, however, is lost in the general case.

#### *3.3 Rotation-Bounded Regular Decomposition of* **H***(***curl***)*

We can now formulate another *new* variety of regular decompositions, for which the **H**1-component will vanish for curl-free fields.

**Theorem 4 (Rotation-Bounded Regular Decomposition of H***(***curl***)* **(I))** *Let $(\Omega,\Gamma\_D)$ be as in Sect. 2 and assume, in addition, that each connected component $\Upsilon\_{D,k}$ of the bulge has vanishing first Betti number, $\beta\_1(\Upsilon\_{D,k}) = 0$. Then, for each $\mathbf{u} \in \mathbf{H}\_{\Gamma\_D}(\mathbf{curl},\Omega)$ there exist $\mathbf{z} \in \mathbf{H}^1\_{\Gamma\_D}(\Omega)$ and a curl-free vector field $\mathbf{h} \in \mathbf{H}\_{\Gamma\_D}(\mathbf{curl}\,0,\Omega)$, depending linearly on $\mathbf{u}$, such that*

$$
\mathbf{u} = \mathbf{z} + \mathbf{h},
$$

$$\|\mathbf{h}\|\_{0,\Omega} \le \|\mathbf{u}\|\_{0,\Omega} + C\,\mathrm{d}(\varOmega)\,\|\mathbf{curl}\,\mathbf{u}\|\_{0,\Omega}\,,$$

$$\|\nabla \mathbf{z}\|\_{0,\Omega} + \frac{1}{\mathrm{d}(\varOmega)} \|\mathbf{z}\|\_{0,\Omega} \le C \|\mathbf{curl}\,\mathbf{u}\|\_{0,\Omega},$$

*where C depends only on the shape of Ω and ΓD, but not on* d*(Ω).*

*Remark 5* The constant *C* in Theorem 4 depends essentially on the stability constants of the divergence-free extension operator $E^{\mathrm{div},0}\_{\Omega\_e}$ and the (adapted) Stein extension operator $E^{\nabla,\mathrm{Stein}}\_{\Upsilon\_D}$.

Another *stronger* version of the rotation-bounded regular decomposition of **H***(***curl***)* gets rid of the assumptions on the topology of the Dirichlet boundary and has improved stability properties (though with less explicit constants).

**Theorem 5 (Rotation-Bounded Regular Decomposition of H***(***curl***)* **(II))** *Let $(\Omega,\Gamma\_D)$ be as in Sect. 2. Then for each $\mathbf{u} \in \mathbf{H}\_{\Gamma\_D}(\mathbf{curl},\Omega)$ there exist $\mathbf{z} \in \mathbf{H}^1\_{\Gamma\_D}(\Omega)$ and a curl-free $\mathbf{h} \in \mathbf{H}\_{\Gamma\_D}(\mathbf{curl}\,0,\Omega)$ depending linearly on $\mathbf{u}$ such that*

$$\begin{aligned} \mathbf{u} &= \mathbf{z} + \mathbf{h} \,, \\ \|\mathbf{z}\|\_{0,\varOmega} + \|\mathbf{h}\|\_{0,\varOmega} &\leq C \, \|\mathbf{u}\|\_{0,\varOmega} \,, \\ \|\nabla \mathbf{z}\|\_{0,\varOmega} + \mathrm{d}(\varOmega)^{-1} \|\mathbf{z}\|\_{0,\varOmega} &\leq C \, \|\mathbf{curl}\,\mathbf{u}\|\_{0,\varOmega} \,, \end{aligned}$$

*where C depends only on the shape of Ω and ΓD, but not on* d*(Ω).*

*Remark 6* For the case $\Gamma\_D = \partial\Omega$ the result of the theorem is already proved by Remark 3, since we obtain $\mathbf{u} = \mathbf{z} + \nabla p$ with $\nabla p \in \nabla H^1\_{0,\mathrm{const}}(\Omega) = \mathbf{H}\_0(\mathbf{curl}\,0,\Omega)$.

*Remark 7* We would like to emphasize that both in Theorems 2 and 5, the domain *Ω* may be non-convex, non-smooth, and may have non-trivial topology: It may have handles and its boundary may have multiple components. Also the Dirichlet boundary *ΓD* may have multiple components, each of which with non-trivial topology. Moreover, we have the pure **L**2*(Ω)*-stability in both theorems. In this sense, the results of Theorems 2 and 5 are superior to those found, e.g., in [7, Thm 3.4], [19] or the more recent ones in [8, Thm. 2.3], [22].

*Remark 8* If *Ω* has vanishing first Betti number, $\beta\_1(\Omega) = 0$, then $\mathbf{H}\_{\Gamma\_D}(\mathbf{curl}\,0,\Omega) = \nabla H^1\_{\Gamma\_D,\mathrm{const}}(\Omega)$. Hence, we can split each $\mathbf{u} \in \mathbf{H}\_{\Gamma\_D}(\mathbf{curl},\Omega)$ into $\mathbf{z} \in \mathbf{H}^1\_{\Gamma\_D}(\Omega)$ and $\nabla p$ with $p \in H^1(\Omega)$ being constant on each connected component of $\Gamma\_D$. If $\Gamma\_D$ is connected, then $p \in H^1\_{\Gamma\_D}(\Omega)$. Summarizing, if *Ω* has no handles and if $\Gamma\_D$ is connected, then we have the combined features of Theorems 2 and 5.

Finally, we mention that the regular decomposition theorems spawn projection operators that play a fundamental role in the analysis of weak formulations of Maxwell's equations in frequency domain [14, Sect. 5].

**Corollary 1** *Let $(\Omega,\Gamma\_D)$ be as in Sect. 2. Then there exist continuous projection operators $\mathsf{R}\colon \mathbf{H}\_{\Gamma\_D}(\mathbf{curl},\Omega) \to \mathbf{H}^1\_{\Gamma\_D}(\Omega)$ and $\mathsf{N}\colon \mathbf{H}\_{\Gamma\_D}(\mathbf{curl},\Omega) \to \mathbf{H}\_{\Gamma\_D}(\mathbf{curl}\,0,\Omega)$ such that* $\mathsf{R} + \mathsf{N} = \mathrm{id}$ *and*

$$\|\mathsf{R}\mathbf{v}\|\_{\mathbf{H}^1(\varOmega)} + \|\mathsf{N}\mathbf{v}\|\_{\mathbf{L}^2(\varOmega)} \le C \,\|\mathbf{v}\|\_{\mathbf{H}(\mathbf{curl},\varOmega)} \qquad \forall \mathbf{v} \in \mathbf{H}\_{\Gamma\_D}(\mathbf{curl},\varOmega),$$

*where C is a constant independent of* **v***. Moreover,* F: **H***ΓD (***curl***,Ω)* → **H***ΓD (***curl***,Ω) defined by* F**v** := R**v** − N**v** *is an isomorphism.*
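A short calculation, using only the stated properties $\mathsf{R}^2 = \mathsf{R}$ and $\mathsf{R} + \mathsf{N} = \mathrm{id}$, shows why $\mathsf{F}$ is an isomorphism: the two projections annihilate each other,

$$\mathsf{R}\mathsf{N} = \mathsf{R}(\mathrm{id} - \mathsf{R}) = \mathsf{R} - \mathsf{R}^2 = 0, \qquad \mathsf{N}\mathsf{R} = (\mathrm{id} - \mathsf{R})\mathsf{R} = 0,$$

and therefore

$$\mathsf{F}^2 = (\mathsf{R} - \mathsf{N})^2 = \mathsf{R}^2 + \mathsf{N}^2 = \mathsf{R} + \mathsf{N} = \mathrm{id},$$

so $\mathsf{F}$ is even an involution and, in particular, bijective with $\mathsf{F}^{-1} = \mathsf{F}$.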

*Remark 9* The **L**2-estimates from Theorem 4 then show that the corresponding operator R can be extended to a continuous operator mapping from **L**2*(Ω)* to **L**2*(Ω)*.

#### **4 Regular Decompositions and Potentials Related to H***(***div***)*

The developments of this section are largely parallel to those of Sect. 3 with some new aspects concerning extensions and topological considerations.

#### *4.1 Rotation-Based Regular Decomposition of* **H***(***div***)*

The following theorem is the **H***(*div*)*-counterpart of Theorem 2.

**Theorem 6 (Rotation-Based Regular Decomposition of H***(*div*)***)** *Let $(\Omega,\Gamma\_D)$ satisfy the assumptions made in Sect. 2. Then for each $\mathbf{v} \in \mathbf{H}\_{\Gamma\_D}(\mathrm{div},\Omega)$ there exist $\mathbf{z} \in \mathbf{H}^1\_{\Gamma\_D}(\Omega)$ and $\mathbf{q} \in \mathbf{H}^1\_{\Gamma\_D}(\Omega)$ depending linearly on $\mathbf{v}$ such that*

$$\begin{aligned} \mathbf{v} &= \mathbf{z} + \mathbf{curl}\,\mathbf{q}, \\ \|\mathbf{z}\|\_{0,\varOmega} + \|\mathbf{curl}\,\mathbf{q}\|\_{0,\varOmega} + \frac{1}{\mathrm{d}(\varOmega)} \|\mathbf{q}\|\_{0,\varOmega} &\leq C \|\mathbf{v}\|\_{0,\varOmega} \, , \\ \|\nabla \mathbf{z}\|\_{0,\varOmega} + \frac{1}{\mathrm{d}(\varOmega)} \|\mathbf{z}\|\_{0,\varOmega} + \frac{1}{\mathrm{d}(\varOmega)} \|\nabla \mathbf{q}\|\_{0,\varOmega} &\leq C \left( \|\mathrm{div}\,\mathbf{v}\|\_{0,\varOmega} + \frac{1}{\mathrm{d}(\varOmega)} \|\mathbf{v}\|\_{0,\varOmega} \right) \,, \end{aligned}$$

*with constant C depending only on the shape of Ω and ΓD, but not on* d*(Ω).*

#### *4.2 Regular Potential with Prescribed Divergence*

The next result carries Theorem 3 over to **H***(*div*)*.

**Theorem 7 (Regular Potentials for the Image Space of** div**)** *Let (Ω , ΓD) be as in Sect. 2 and, in addition, assume that each connected component ΥD,k of the bulge has a connected boundary, i.e., β*2*(ΥD,k)* = 0*. Then*

$$L^2(\varOmega) = \mathrm{div}\,\mathbf{H}\_{\Gamma\_D}(\mathrm{div},\varOmega) = \mathrm{div}\,\mathbf{H}^{1}\_{\Gamma\_D}(\varOmega).$$

*Moreover, for each $v \in L^2(\Omega)$ there exists $\mathbf{q} \in \mathbf{H}^1\_{\Gamma\_D}(\Omega)$ depending linearly on v such that, with a constant C depending on Ω and ΓD but not on* d*(Ω),*

$$\mathrm{div}\,\mathbf{q} = v \quad \text{and} \quad \|\nabla\mathbf{q}\|\_{0,\Omega} + \frac{1}{\mathrm{d}(\varOmega)} \|\mathbf{q}\|\_{0,\Omega} \le C \,\|v\|\_{0,\Omega}\,.$$

#### *4.3 Divergence-Bounded Regular Decompositions of* **H***(***div***)*

We can now formulate other variants of regular decompositions of **H***(*div*)* in analogy to what we did in Sect. 3.3.

**Theorem 8 (Divergence-Bounded Regular Decomposition of H***(*div*)* **(I))** *Let $(\Omega,\Gamma\_D)$ be as in Sect. 2. In addition, assume that each connected component $\Upsilon\_{D,k}$ of the bulge has a connected boundary, i.e., $\beta\_2(\Upsilon\_{D,k}) = 0$. Then, for each $\mathbf{v} \in \mathbf{H}\_{\Gamma\_D}(\mathrm{div},\Omega)$ there exist $\mathbf{z} \in \mathbf{H}^1\_{\Gamma\_D}(\Omega)$ and a divergence-free vector field $\mathbf{h} \in \mathbf{H}\_{\Gamma\_D}(\mathrm{div}\,0,\Omega)$, depending linearly on $\mathbf{v}$, such that*

$$\mathbf{v} = \mathbf{z} + \mathbf{h},\tag{10}$$

$$\|\mathbf{h}\|\_{0,\Omega} \le \|\mathbf{v}\|\_{0,\Omega} + C\,\mathrm{d}(\varOmega)\,\|\mathrm{div}\,\mathbf{v}\|\_{0,\Omega} \,,\tag{11}$$

$$\|\nabla \mathbf{z}\|\_{0,\Omega} + \frac{1}{\mathrm{d}(\Omega)} \|\mathbf{z}\|\_{0,\Omega} \le C \|\mathrm{div}\,\mathbf{v}\|\_{0,\Omega} \,, \tag{12}$$

*where C depends only on the shape of Ω and ΓD, but not on* d*(Ω).*

The last variant of regular decomposition of **H***(*div*)* dispenses with the assumptions on the topology of the Dirichlet boundary and has better stability properties than the splitting from Theorem 8 (though with less explicit constants).

**Theorem 9 (Divergence-Bounded Regular Decomposition of H***(*div*)* **(II))** *Let $(\Omega,\Gamma\_D)$ be as in Sect. 2. Then, for each $\mathbf{v} \in \mathbf{H}\_{\Gamma\_D}(\mathrm{div},\Omega)$ there exist $\mathbf{z} \in \mathbf{H}^1\_{\Gamma\_D}(\Omega)$ and a divergence-free vector field $\mathbf{h} \in \mathbf{H}\_{\Gamma\_D}(\mathrm{div}\,0,\Omega)$ depending linearly on $\mathbf{v}$ such that*

$$\mathbf{v} = \mathbf{z} + \mathbf{h},\tag{13}$$

$$\|\mathbf{z}\|\_{0,\Omega} + \|\mathbf{h}\|\_{0,\Omega} \le C\,\|\mathbf{v}\|\_{0,\Omega} \,, \tag{14}$$

$$\|\nabla \mathbf{z}\|\_{0,\Omega} + \frac{1}{\mathbf{d}(\Omega)} \|\mathbf{z}\|\_{0,\Omega} \le C \parallel \text{div}\, \mathbf{v} \|\_{0,\Omega},\tag{15}$$

*where C depends only on the shape of Ω and ΓD, but not on* d*(Ω).*

#### **5 Discrete Counterparts of the Regular Decompositions**

The discrete setting to which we want to extend the concept of regular decompositions is provided by finite element exterior calculus (FEEC, [2]) which introduces finite element subspaces of **H***(***curl***)* and **H***(*div*)* as special instances of spaces of discrete differential forms. In this section we confine ourselves to the lowest-order case of piecewise linear finite element functions.

Throughout, we assume that $(\Omega,\Gamma\_D)$ is as in Sect. 2, and, additionally, that *Ω* is a polyhedron and that $\partial\Gamma\_D$ consists of straight line segments. All considerations take for granted a shape-regular family of meshes $\{\mathcal{T}\_h\}\_h$ of *Ω*, consisting of tetrahedral elements and resolving $\Gamma\_D$ in the sense that $\Gamma\_D$ is a union of faces of some of the tetrahedra.

The following finite element spaces will be relevant:

$$\begin{aligned} \mathcal{W}^0\_{h,\Gamma\_D}(\varOmega) &\subset H^1\_{\Gamma\_D}(\varOmega) && \text{continuous, piecewise linear (Lagrange) functions},\\ \mathcal{W}^1\_{h,\Gamma\_D}(\varOmega) &\subset \mathbf{H}\_{\Gamma\_D}(\mathbf{curl},\varOmega) && \text{lowest-order N\'ed\'elec edge elements},\\ \mathcal{W}^2\_{h,\Gamma\_D}(\varOmega) &\subset \mathbf{H}\_{\Gamma\_D}(\mathrm{div},\varOmega) && \text{lowest-order Raviart-Thomas face elements},\\ \mathcal{W}^3\_{h}(\varOmega) &\subset L^2(\varOmega) && \text{piecewise constant functions}. \end{aligned}$$

Functions in $\mathcal{W}^\ell\_{h,\Gamma\_D}(\Omega)$, $\ell = 1, 2, 3$, are so-called Whitney forms, lowest-order discrete differential forms of the first family as introduced in [13] and [2, Sect. 5].
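For orientation, the Whitney edge function attached to the edge from vertex *i* to vertex *j* of a tetrahedron with barycentric coordinates $\lambda\_k$ is $\mathbf{w}\_{ij} = \lambda\_i \nabla\lambda\_j - \lambda\_j \nabla\lambda\_i$. The inclusion $\nabla\mathcal{W}^0\_{h,\Gamma\_D}(\Omega) \subset \mathcal{W}^1\_{h,\Gamma\_D}(\Omega)$, the first nontrivial segment of the discrete de Rham complex, rests on the elementary identity $\nabla\lambda\_j = \sum\_i \mathbf{w}\_{ij}$; the following SymPy sketch verifies it on the reference tetrahedron:

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
lam = [1 - x - y - z, x, y, z]   # barycentric coordinates on the reference tet

def grad(p):
    return sp.Matrix([sp.diff(p, v) for v in (x, y, z)])

# Whitney edge function for the edge from vertex i to vertex j
def w(i, j):
    return lam[i] * grad(lam[j]) - lam[j] * grad(lam[i])

# Check grad(lam_j) = sum_i w(i, j); this uses sum_i lam_i = 1
# and hence sum_i grad(lam_i) = 0.
for j in range(4):
    s = sum((w(i, j) for i in range(4)), sp.zeros(3, 1))
    assert sp.simplify(s - grad(lam[j])) == sp.zeros(3, 1)
print("gradient of each hat function is a combination of Whitney edge functions")
```

Since the gradient of every nodal basis function is a sum of edge basis functions, the gradient of any function in $\mathcal{W}^0\_{h,\Gamma\_D}(\Omega)$ is indeed an edge element function.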

#### *5.1 Discrete Regular Decompositions for Edge Elements*

Commuting projectors, also known as co-chain projectors, are the linchpin of FEEC theory [2, Sect. 7], and it is no different with our developments. Thus, let

$$\mathcal{R}^{0}\_{h,\Gamma\_{D}} \colon H^{1}\_{\Gamma\_{D}}(\varOmega) \to \mathcal{W}^{0}\_{h,\Gamma\_{D}}(\varOmega) \quad\text{and}\quad \mathbf{R}^{1}\_{h,\Gamma\_{D}} \colon \mathbf{H}\_{\Gamma\_{D}}(\mathbf{curl},\varOmega) \to \mathcal{W}^{1}\_{h,\Gamma\_{D}}(\varOmega)$$

denote the *continuous*, *boundary-aware* cochain projectors from [17, Sect. 3.2.6], which extend the pioneering work [10] by Falk and Winther. These two linear operators are projectors onto their ranges; they fulfill the commuting property

$$\nabla(\mathcal{R}\_{h,\Gamma\_{D}}^{0}\varphi) = \mathbf{R}\_{h,\Gamma\_{D}}^{1}(\nabla\varphi) \qquad \forall \varphi \in H\_{\Gamma\_{D}}^{1}(\varOmega)\,,\tag{16}$$

and local stability estimates.
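A commuting property like (16) can be made tangible at the level of degrees of freedom for the *canonical* nodal and edge interpolants (a simpler pair than the boundary-aware cochain projectors above): the edge DOF $\int\_e \nabla\varphi \cdot \mathbf{t}\,\mathrm{d}s$ of a gradient field equals the difference of the vertex DOFs of $\varphi$, by the fundamental theorem of calculus. A small numerical sketch on the reference tetrahedron; the test function `phi` is an arbitrary smooth example:

```python
import numpy as np

phi = lambda p: np.sin(p[0]) + p[1] * p[2] ** 2          # arbitrary smooth function
grad_phi = lambda p: np.array([np.cos(p[0]), p[2] ** 2, 2 * p[1] * p[2]])

verts = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.], [0., 0., 1.]])
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]

def edge_dof(f, a, b, n=400):
    # Composite midpoint-rule approximation of the tangential edge integral of f
    t = np.linspace(0.0, 1.0, n, endpoint=False) + 0.5 / n
    pts = a[None, :] + t[:, None] * (b - a)[None, :]
    return np.mean([f(p) @ (b - a) for p in pts])

for (i, j) in edges:
    lhs = edge_dof(grad_phi, verts[i], verts[j])   # edge DOF of grad(phi)
    rhs = phi(verts[j]) - phi(verts[i])            # difference of nodal DOFs
    assert abs(lhs - rhs) < 1e-5
print("edge DOFs of a gradient equal differences of vertex DOFs")
```

In other words, edge interpolation of $\nabla\varphi$ and taking the gradient of the nodal interpolant of $\varphi$ produce the same edge element function, which is exactly the structure that (16) preserves for the more sophisticated boundary-aware projectors.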

**Theorem 10 ([17, Thm. 1.2])** *For each $\mathbf{v}\_h \in \mathcal{W}^1\_{h,\Gamma\_D}(\Omega)$ there exist a continuous and piecewise linear vector field $\mathbf{z}\_h \in \mathcal{W}^0\_{h,\Gamma\_D}(\Omega)$, a continuous and piecewise linear scalar function $p\_h \in \mathcal{W}^0\_{h,\Gamma\_D}(\Omega)$, and a remainder $\widetilde{\mathbf{v}}\_h \in \mathcal{W}^1\_{h,\Gamma\_D}(\Omega)$, all depending linearly on $\mathbf{v}\_h$, providing the discrete regular decomposition*

$$\mathbf{v}\_h = \mathbf{R}\_{h, \Gamma\_D}^1 \mathbf{z}\_h + \widetilde{\mathbf{v}}\_h + \nabla p\_h$$

*and satisfying the stability estimates*

$$\|\mathbf{z}\_h\|\_{0,\Omega} + \|\nabla p\_h\|\_{0,\Omega} + \|\widetilde{\mathbf{v}}\_h\|\_{0,\Omega} \le C \left\|\mathbf{v}\_h\right\|\_{0,\Omega},\tag{17}$$

$$\|\nabla \mathbf{z}\_h\|\_{0,\Omega} + \|h^{-1}\widetilde{\mathbf{v}}\_h\|\_{0,\Omega} \le C \left( \|\mathbf{curl}\,\mathbf{v}\_h\|\_{0,\Omega} + \frac{1}{\mathrm{d}(\varOmega)} \|\mathbf{v}\_h\|\_{0,\Omega} \right),\qquad(18)$$

*where C is a generic constant that depends only on the shape of $(\Omega,\Gamma\_D)$ and on the shape-regularity constant of $\mathcal{T}\_h$, but not on* d*(Ω). Above, $h^{-1}$ is the piecewise constant function that is equal to $h\_T^{-1}$ on every element T.*

Obviously, this is a discrete counterpart of the regular decomposition of **H***(***curl***)* from Theorem 2. The following theorem appears to be new; it corresponds to the rotation-bounded regular decomposition of Theorem 5. For the sake of brevity, define the discrete nullspace of the **curl** operator

$$\mathcal{N}\_h^1 := \{ \mathbf{v}\_h \in \mathcal{W}\_{h, \Gamma\_D}^1(\varOmega) \colon \mathbf{curl}\,\mathbf{v}\_h = 0 \} \,. \tag{19}$$

If *Ω* and *ΓD* have simple topology, $\mathcal{N}^1\_h = \nabla\mathcal{W}^0\_{h,\Gamma\_D}(\Omega)$, but if the first Betti number of *Ω* is non-zero, or if *ΓD* has multiple components, then a finite-dimensional cohomology space has to be added [2, Sect. 5.6].

**Theorem 11 (Rotation-Bounded Discrete Regular Decomposition for Edge Elements)** *For each $\mathbf{v}\_h \in \mathcal{W}^1\_{h,\Gamma\_D}(\Omega)$ there exist a continuous and piecewise linear vector field $\mathbf{z}\_h \in \mathcal{W}^0\_{h,\Gamma\_D}(\Omega)$, a* curl-free *edge element function $\mathbf{h}\_h \in \mathcal{N}^1\_h$, and a remainder $\widetilde{\mathbf{v}}\_h \in \mathcal{W}^1\_{h,\Gamma\_D}(\Omega)$, all depending linearly on $\mathbf{v}\_h$, providing the discrete regular decomposition*

$$\mathbf{v}\_h = \mathbf{R}\_{h, \Gamma\_D}^1 \mathbf{z}\_h + \widetilde{\mathbf{v}}\_h + \mathbf{h}\_h$$

*and satisfying the stability bounds*

$$\|\mathbf{z}\_h\|\_{0,\varOmega} + \|\widetilde{\mathbf{v}}\_h\|\_{0,\varOmega} + \|\mathbf{h}\_h\|\_{0,\varOmega} \leq C \left\|\mathbf{v}\_h\right\|\_{0,\varOmega}, \qquad \|\nabla\mathbf{z}\_h\|\_{0,\varOmega} + \|h^{-1}\widetilde{\mathbf{v}}\_h\|\_{0,\varOmega} \leq C \left\|\mathbf{curl}\,\mathbf{v}\_h\right\|\_{0,\varOmega},$$

*where C is a uniform constant that depends only on the shape of $(\Omega,\Gamma\_D)$ and on the shape-regularity constant of $\mathcal{T}\_h$, but not on* d*(Ω).*

We stress that the statements of Theorems 10 and 11 do not hinge on *any* assumptions on the topological properties of *Ω* and *ΓD*.

#### *5.2 Discrete Regular Decompositions for Face Elements*

For face elements, the construction of a boundary-aware co-chain projection operator

$$\mathbf{R}^2\_{h,\Gamma\_D} \colon \mathbf{H}\_{\Gamma\_D}(\mathrm{div}, \varOmega) \to \mathcal{W}^2\_{h,\Gamma\_D}(\varOmega)$$

that commutes with $\mathbf{R}^1\_{h,\Gamma\_D}$ and the **curl**-operator has not yet been accomplished. Fortunately, in the case $\Gamma\_D = \emptyset$, this operator is available from [10]. Thus, in the following, we treat only the case $\Gamma\_D = \emptyset$ and just omit the subscript $\Gamma\_D$. Then, from [10] we can borrow a linear operator $\mathbf{R}^2\_h \colon \mathbf{H}(\mathrm{div},\Omega) \to \mathcal{W}^2\_h(\Omega)$ such that

$$\mathbf{curl}\,\mathbf{R}\_h^{1}\mathbf{u} = \mathbf{R}\_h^2 \,\mathbf{curl}\,\mathbf{u} \qquad \forall \mathbf{u} \in \mathbf{H}(\mathbf{curl}, \varOmega) \,. \tag{20}$$

The next result takes Theorem 6 to the discrete setting.

**Theorem 12 (Discrete Regular Decomposition of $\mathcal{W}^2\_h(\Omega)$)** *For each vector field $\mathbf{v}\_h$ in the lowest-order Raviart-Thomas space $\mathcal{W}^2\_h(\Omega)$ there exist a continuous and piecewise linear vector field $\mathbf{z}\_h \in \mathcal{W}^0\_h(\Omega)$, a vector field $\mathbf{q}\_h$ in the lowest-order Nédélec space $\mathcal{W}^1\_h(\Omega)$, and a remainder $\widetilde{\mathbf{v}}\_h \in \mathcal{W}^2\_h(\Omega)$, all depending linearly on $\mathbf{v}\_h$, providing the discrete regular decomposition*

$$\mathbf{v}\_h = \mathbf{R}\_h^2 \mathbf{z}\_h + \widetilde{\mathbf{v}}\_h + \mathbf{curl}\,\mathbf{q}\_h$$

*and the stability estimates*

$$\|\mathbf{z}\_h\|\_{0,\varOmega} + \|\mathbf{curl}\,\mathbf{q}\_h\|\_{0,\varOmega} + \frac{1}{\mathrm{d}(\varOmega)}\|\mathbf{q}\_h\|\_{0,\varOmega} + \|\widetilde{\mathbf{v}}\_h\|\_{0,\varOmega} \leq C\,\|\mathbf{v}\_h\|\_{0,\varOmega}\,,$$

$$\|\nabla\mathbf{z}\_h\|\_{0,\varOmega} + \|h^{-1}\widetilde{\mathbf{v}}\_h\|\_{0,\varOmega} \leq C \left( \|\mathrm{div}\,\mathbf{v}\_h\|\_{0,\varOmega} + \frac{1}{\mathrm{d}(\varOmega)}\|\mathbf{v}\_h\|\_{0,\varOmega} \right).$$

*The constant C depends only on the shape of Ω and on the shape-regularity of $\mathcal{T}\_h$, but not on* d*(Ω).*

Finally, we present a counterpart to the divergence-bounded regular decomposition of Theorem 9. For convenience we introduce the space of divergence-free face element functions

$$\mathcal{N}\_h^2 := \{ \mathbf{q}\_h \in \mathcal{W}\_h^2(\varOmega) \colon \mathrm{div}\,\mathbf{q}\_h = 0 \}. \tag{21}$$

**Theorem 13 (Divergence-Bounded Discrete Regular Decomposition of $\mathcal{W}^2\_h(\Omega)$)** *For each vector field $\mathbf{v}\_h$ in the lowest-order Raviart-Thomas space $\mathcal{W}^2\_h(\Omega)$ there exist a continuous and piecewise linear vector field $\mathbf{z}\_h \in \mathcal{W}^0\_h(\Omega)$, an element $\mathbf{h}\_h$ in the discrete divergence-free subspace $\mathcal{N}^2\_h$, and a remainder $\widetilde{\mathbf{v}}\_h \in \mathcal{W}^2\_h(\Omega)$, all depending linearly on $\mathbf{v}\_h$, providing the discrete regular decomposition*

$$\mathbf{v}_h = \mathbf{R}^2_h \mathbf{z}_h + \widetilde{\mathbf{v}}_h + \mathbf{h}_h$$

*and the stability estimates*

$$\left.\begin{aligned} \|\mathbf{z}_h\|_{0,\Omega} \\ \|\widetilde{\mathbf{v}}_h\|_{0,\Omega} \end{aligned}\right\} \leq C\,\|\mathbf{v}_h\|_{0,\Omega}, \qquad \left.\begin{aligned} \|\nabla\mathbf{z}_h\|_{0,\Omega} \\ \|h^{-1}\,\widetilde{\mathbf{v}}_h\|_{0,\Omega} \end{aligned}\right\} \leq C\,\|\operatorname{div}\mathbf{v}_h\|_{0,\Omega}.$$

*The constants C depend only on the shape of $\Omega$ and on the shape-regularity of $\mathcal{T}_h(\Omega)$, but not on $d(\Omega)$.*

*Remark 10* The result of Theorem 13 can be viewed as an improvement of the decompositions in [25] which are elaborated for the case of essential boundary conditions on *∂Ω*.

**Corollary 2** *If the second Betti number of $\Omega$ vanishes, that is, if $\partial\Omega$ is connected, then* $\mathbf{h}_h$ *in Theorem 13 can be chosen as* $\mathbf{h}_h = \mathbf{curl}\,\mathbf{q}_h$ *with* $\mathbf{q}_h \in \mathcal{W}^1_h(\Omega)$ *such that*

$$\mathbf{v}_h = \mathbf{R}^2_h \mathbf{z}_h + \widetilde{\mathbf{v}}_h + \mathbf{curl}\,\mathbf{q}_h,$$

*with the bounds*

$$\left.\begin{aligned} \|\mathbf{z}_h\|_{0,\Omega} \\ \|\widetilde{\mathbf{v}}_h\|_{0,\Omega} \\ \|\mathbf{curl}\,\mathbf{q}_h\|_{0,\Omega} \\ d(\Omega)^{-1}\,\|\mathbf{q}_h\|_{0,\Omega} \end{aligned}\right\} \leq C\,\|\mathbf{v}_h\|_{0,\Omega}, \qquad \left.\begin{aligned} \|\nabla\mathbf{z}_h\|_{0,\Omega} \\ \|h^{-1}\,\widetilde{\mathbf{v}}_h\|_{0,\Omega} \end{aligned}\right\} \leq C\,\|\operatorname{div}\mathbf{v}_h\|_{0,\Omega}.$$

*Remark 11* The result of Corollary 2 is an improvement of [19, Lemma 5.2], which assumes a domain $\Omega$ smooth enough to allow $H^2$-regularity of the Laplace problem (the 2-regular case; for details see [19, Sect. 3]). This lemma is used in [27] in a domain decomposition framework, where convex subdomains are assumed. With our improved version, this assumption can be weakened considerably.

**Acknowledgements** The second author would like to thank Dirk Pauly (Essen) for enlightening discussions.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Model Reduction by Separation of Variables: A Comparison Between Hierarchical Model Reduction and Proper Generalized Decomposition**

**Simona Perotto, Michele Giuliano Carlino, and Francesco Ballarin**

### **1 Introduction**

This paper is meant as a first attempt to compare two procedures which share the idea of exploiting separation of variables to perform model reduction, albeit with different purposes. Proper Generalized Decomposition (PGD) is essentially employed as a powerful tool to deal with parametric problems in several fields of application [3, 14, 23]. Parametrized models characterize multi-query contexts, such as parameter optimization, statistical analysis or inverse problems. Here, the computation of the solution for many different parameters demands, in general, a huge computational effort, and this justifies the development of model reduction techniques.

For this purpose, projection-based techniques, such as Proper Orthogonal Decomposition (POD) or Reduced Basis methods, are widely used in the literature [11]. The idea is to project the discrete operators onto a reduced space so that the problem can be solved rapidly in the lower dimensional space. PGD adopts a completely different way to deal with parameters. Here, parameters are considered

S. Perotto

MOX - Modeling and Scientific Computing, Dipartimento di Matematica, Politecnico di Milano, Milano, Italy

e-mail: simona.perotto@polimi.it

M. G. Carlino Inria Bordeaux Sud-Ouest and Institut de Mathématiques de Bordeaux, University of Bordeaux, Talence, France e-mail: michele-giuliano.carlino@inria.fr

F. Ballarin MathLab, Mathematics Area, SISSA, Trieste, Italy e-mail: francesco.ballarin@sissa.it

© The Author(s) 2020

S. J. Sherwin et al. (eds.), *Spectral and High Order Methods for Partial Differential Equations ICOSAHOM 2018*, Lecture Notes in Computational Science and Engineering 134, https://doi.org/10.1007/978-3-030-39647-3\_4

as new independent variables of the problem, together with the standard space-time ones [5]. Although the dimensionality of the problem is inevitably increased, PGD transforms the computation of the solution for new values of the parameters into a plain evaluation of the reduced solution, with striking computational advantages.

Hierarchical Model (HiMod) reduction has been proposed to improve one-dimensional (1D) partial differential equation (PDE) solvers for problems defined in domains with a geometrically dominant direction, like slabs or pipes [6, 20]. The main applicative field of interest is hemodynamics, in particular the modeling of blood flow in patient-specific geometries. Purely 1D hemodynamic models completely drop the transverse dynamics, which, however, may be locally important (e.g., in the presence of a stenosis or an aneurysm). HiMod aims at providing a numerical tool to incorporate the transverse components of the 3D solution into a conceptually 1D solver. To do this, the driving idea is to discretize the main and transverse dynamics in different ways. The latter are generally of secondary importance and can be described by a few degrees of freedom using a spectral approximation, in combination, for instance, with a finite element (FE) discretization of the mainstream.

The parametric version of HiMod (namely, HiPOD) is a more recent proposal [4, 13]. On the other hand, PGD is not so widely employed in a non-parametric setting, despite this being its original formulation [12]. Nevertheless, for the sake of comparison, in this paper we consider the non-parametric as well as the parametric versions of both the HiMod and PGD approaches. The goal is to begin a preliminary comparative analysis of the two methodologies, to highlight their respective weaknesses and strengths. The main limitation of PGD remains its inability to deal with non-Cartesian geometries without losing the computational benefits arising from the separability of the spatial coordinates. HiMod turns out to be more flexible from a geometric viewpoint. On the other hand, PGD turns out to be extremely effective for parametric problems, thanks to the explicit expression of the PGD solution in terms of the parameters, while HiPOD can be classified as a projection-based method with all the associated drawbacks. In the longer term, the ultimate goal is to merge HiMod with PGD, to emphasize the good features and mitigate the intrinsic limits of the two methods taken alone.

#### **2 The HiMod Approach**

Hierarchical Model reduction proved to be an efficient and reliable method to deal with phenomena characterized by dominant dynamics [10]. In general, the computational domain itself exhibits an intrinsic directionality. We assume $\Omega \subset \mathbb{R}^d$ ($d = 2, 3$) to coincide with a $d$-dimensional fiber bundle, $\Omega = \bigcup_{x \in \Omega_{1D}} \{x\} \times \gamma_x$, where $\Omega_{1D} \subset \mathbb{R}$ denotes the supporting fiber aligned with the main stream, while $\gamma_x \subset \mathbb{R}^{d-1}$ is the transverse fiber at $x \in \Omega_{1D}$, parallel to the transverse dynamics. For the sake of simplicity, we identify $\Omega_{1D}$ with a straight segment, $(x_0, x_1)$. We refer to [15, 21] for the case where $\Omega_{1D}$ is curvilinear. From a computational viewpoint, the idea is to exploit a map, $\Psi \colon \Omega \to \widehat{\Omega}$, transforming the physical domain, $\Omega$, into a reference domain, $\widehat{\Omega}$, and to make explicit computations in $\widehat{\Omega}$ only. Typically, $\widehat{\Omega}$ coincides with a rectangle in 2D, and with a cylinder with circular section in 3D. To define $\Psi$, for each $x \in \Omega_{1D}$, we introduce the map, $\psi_x \colon \gamma_x \to \widehat{\gamma}_{d-1}$, from fiber $\gamma_x$ to the reference transverse fiber, $\widehat{\gamma}_{d-1}$, so that the reference domain coincides with $\widehat{\Omega} = \bigcup_{x \in \Omega_{1D}} \{x\} \times \widehat{\gamma}_{d-1}$. The supporting fiber is preserved by map $\Psi$, which modifies the lateral boundaries only.

We consider now the (full) problem to be reduced. Due to the comparative purposes of the paper, we focus on a scalar elliptic equation, and, in particular, on the associated weak formulation,

$$\text{find } u \in V : a(u, v) = F(v) \quad \forall v \in V,\tag{1}$$

where $V \subseteq H^1(\Omega)$, $a(\cdot,\cdot) \colon V \times V \to \mathbb{R}$ is a continuous and coercive bilinear form, and $F(\cdot) \colon V \to \mathbb{R}$ is a continuous linear functional. To provide the HiMod formulation for problem (1), we introduce the hierarchical reduced space

$$V_m = \left\{ v_m(x, \mathbf{y}) = \sum_{k=1}^m \widetilde{v}_k(x)\,\varphi_k(\psi_x(\mathbf{y})),\ \text{with } \widetilde{v}_k \in V^h_{1D},\ x \in \Omega_{1D},\ \mathbf{y} \in \gamma_x \right\} \tag{2}$$

for a modal index $m \in \mathbb{N}^+$, where $V^h_{1D} \subseteq H^1(\Omega_{1D})$ is a discrete space of dimension $N_h$ associated with a partition $\mathcal{T}_h$ of $\Omega_{1D}$, while $\{\varphi_k\}_{k=1}^m$ denotes a modal basis of functions orthogonal with respect to the $L^2(\widehat{\gamma}_{d-1})$-scalar product. Index $m$ sets the hierarchical level of the HiMod space, being $V_m \subset V_{m+1}$ for any $m$. Concerning $V^h_{1D}$, we adopt here a standard FE space, although any discrete space can be employed (see, e.g., [21], where an isogeometric discretization is used). Functions in $V^h_{1D}$ have to include the boundary conditions on $\{x_0\} \times \gamma_{x_0}$ and $\{x_1\} \times \gamma_{x_1}$; analogously, the modal functions have to take into account the boundary data along the horizontal sides. In Sect. 4 further comments are provided about the selection of the modal basis and of the modal index $m$. The HiMod formulation for problem (1) thus reads

$$\text{find } u\_m^{\text{HiMod}} \in V\_m : a(u\_m^{\text{HiMod}}, v\_m) = F(v\_m) \quad \forall v\_m \in V\_m. \tag{3}$$

To ensure the well-posedness of formulation (3) and the convergence of the HiMod approximation, $u_m^{\text{HiMod}}$, to the full solution, $u$, we endow the HiMod space with a conformity and a spectral approximability hypothesis, and we introduce a standard density assumption on the discrete space $V^h_{1D}$ (see [20] for all the details).

The HiMod solution can be fully characterized by introducing a basis, $\{\theta_l\}_{l=1}^{N_h}$, for the space $V^h_{1D}$. Actually, each modal coefficient, $\widetilde{u}_k$, of $u_m^{\text{HiMod}}$ can be expanded in terms of such a basis, so that we obtain the modal representation

$$u_m^{\text{HiMod}}(x, \mathbf{y}) = \sum_{k=1}^m \sum_{l=1}^{N_h} \widetilde{u}_{k,l}\,\theta_l(x)\,\varphi_k(\psi_x(\mathbf{y})). \tag{4}$$
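Representation (4) is a sum of products of 1D factors, so it can be evaluated on a tensor grid without ever forming a 2D mesh. The following sketch (our own illustration, with hypothetical coefficients $\widetilde{u}_{k,l}$, an identity map $\psi_x$, P1 hat functions for $\{\theta_l\}$, and sine modes for $\{\varphi_k\}$) shows the evaluation as two matrix products:

```python
import numpy as np

# Evaluate the separated HiMod expansion (4) on a tensor grid.
# Assumptions: identity map psi_x, P1 hat functions theta_l on a uniform
# grid of Nh interior nodes, orthonormal sine modes phi_k; the coefficients
# U[k, l] = u~_{k,l} are random placeholders.
m, Nh = 3, 8
x = np.linspace(0.0, 1.0, 50)     # evaluation points along Omega_1D
yhat = np.linspace(0.0, 1.0, 40)  # evaluation points on the transverse fiber

U = np.random.rand(m, Nh)         # hypothetical modal coefficients u~_{k,l}

nodes = np.linspace(0.0, 1.0, Nh + 2)[1:-1]
h = nodes[1] - nodes[0]
# Theta[i, l] = theta_l(x_i): P1 "hat" functions
Theta = np.maximum(0.0, 1.0 - np.abs(x[:, None] - nodes[None, :]) / h)
# Phi[j, k] = phi_{k+1}(yhat_j): orthonormal sine modes sqrt(2) sin(k pi y)
Phi = np.sqrt(2.0) * np.sin(np.pi * np.arange(1, m + 1)[None, :] * yhat[:, None])

# u(x_i, yhat_j) = sum_k sum_l U[k, l] theta_l(x_i) phi_k(yhat_j)
u = Theta @ U.T @ Phi.T           # shape (50, 40)
```

The double sum collapses into two dense matrix products, which is where the $mN_h$-coefficient representation pays off computationally.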

The actual unknowns of problem (3) become the $mN_h$ coefficients $\{\widetilde{u}_{k,l}\}_{k=1,l=1}^{m,N_h}$. With reference to the Poisson problem, $-\Delta u = f$, completed with full homogeneous Dirichlet boundary data, the corresponding HiMod formulation, after exploiting (4) in (3) and picking $v_m(x, \mathbf{y}) = \theta_i(x)\,\varphi_j(\psi_x(\mathbf{y}))$ with $i = 1, \ldots, N_h$ and $j = 1, \ldots, m$, reduces to the system of $mN_h$ 1D equations in the $mN_h$ unknowns $\{\widetilde{u}_{k,l}\}_{k=1,l=1}^{m,N_h}$,

$$\sum_{k=1}^m \sum_{l=1}^{N_h} \widetilde{u}_{k,l} \left[ \int_{\Omega_{1D}} \left( \hat{r}^{1,1}_{jk}(x)\,\frac{d\theta_l}{dx}(x)\,\frac{d\theta_i}{dx}(x) + \hat{r}^{1,0}_{jk}(x)\,\frac{d\theta_l}{dx}(x)\,\theta_i(x) + \hat{r}^{0,1}_{jk}(x)\,\theta_l(x)\,\frac{d\theta_i}{dx}(x) + \hat{r}^{0,0}_{jk}(x)\,\theta_l(x)\,\theta_i(x) \right) dx \right] = \int_{\Omega_{1D}} \hat{f}_j(x)\,\theta_i(x)\,dx,$$

where $\hat{r}^{a,b}_{jk}(x) = \int_{\widehat{\gamma}_{d-1}} r^{a,b}_{jk}(x, \hat{\mathbf{y}})\,|J|\,d\hat{\mathbf{y}}$ with $a, b = 0, 1$, $J = \det \mathcal{D}_2^{-1}(x, \psi_x^{-1}(\hat{\mathbf{y}}))$ with $\mathcal{D}_2 = \mathcal{D}_2(x, \psi_x^{-1}(\hat{\mathbf{y}})) = \nabla_{\mathbf{y}}\,\psi_x$,

$$\begin{aligned} r^{0,0}_{jk}(x, \hat{\mathbf{y}}) &= \varphi'_k(\hat{\mathbf{y}})\,\varphi'_j(\hat{\mathbf{y}})\left(\mathcal{D}_1^2 + \mathcal{D}_2^2\right), & r^{0,1}_{jk}(x, \hat{\mathbf{y}}) &= \varphi'_k(\hat{\mathbf{y}})\,\varphi_j(\hat{\mathbf{y}})\,\mathcal{D}_1, \\ r^{1,0}_{jk}(x, \hat{\mathbf{y}}) &= \varphi_k(\hat{\mathbf{y}})\,\varphi'_j(\hat{\mathbf{y}})\,\mathcal{D}_1, & r^{1,1}_{jk}(x, \hat{\mathbf{y}}) &= \varphi_k(\hat{\mathbf{y}})\,\varphi_j(\hat{\mathbf{y}}), \end{aligned}$$

with $\mathcal{D}_1 = \mathcal{D}_1(x, \psi_x^{-1}(\hat{\mathbf{y}})) = \partial\psi_x/\partial x$, and $\hat{f}_j(x) = \int_{\widehat{\gamma}_{d-1}} f(x, \psi_x^{-1}(\hat{\mathbf{y}}))\,\varphi_j(\hat{\mathbf{y}})\,|J|\,d\hat{\mathbf{y}}$. The information associated with the transverse dynamics is lumped into the coefficients $\{\hat{r}^{a,b}_{jk}\}$, so that the HiMod system is solved on the supporting fiber, $\Omega_{1D}$. Collecting the HiMod unknowns, by mode, in the vector $\mathbf{u}_m^{\text{HiMod}} \in \mathbb{R}^{mN_h}$, such that

$$\mathbf{u}_m^{\text{HiMod}} = \left[\,\widetilde{u}_{1,1}, \widetilde{u}_{1,2}, \ldots, \widetilde{u}_{1,N_h}, \widetilde{u}_{2,1}, \ldots, \widetilde{u}_{m,1}, \ldots, \widetilde{u}_{m,N_h}\right]^T, \tag{5}$$

we can rewrite the HiMod system in the compact form

$$A\_m^{\text{HiMod}} \mathbf{u}\_m^{\text{HiMod}} = \mathbf{f}\_m^{\text{HiMod}},\tag{6}$$

where $A_m^{\text{HiMod}} \in \mathbb{R}^{mN_h \times mN_h}$ and $\mathbf{f}_m^{\text{HiMod}} \in \mathbb{R}^{mN_h}$ are the HiMod stiffness matrix and right-hand side, respectively, with $[\mathbf{f}_m^{\text{HiMod}}]_{ji} = \int_{\Omega_{1D}} \hat{f}_j(x)\,\theta_i(x)\,dx$ and $[A_m^{\text{HiMod}}]_{ji,kl} = \sum_{a,b=0}^{1} \int_{\Omega_{1D}} \hat{r}^{a,b}_{jk}(x)\,\frac{d^a\theta_l}{dx^a}(x)\,\frac{d^b\theta_i}{dx^b}(x)\,dx$. According to (5), for each modal index $j$, between 1 and $m$, the nodal index, $i$, takes the values $1, \ldots, N_h$. Thus, HiMod reduction leads to solving a system of order $mN_h$, independently of the dimension of the full problem (1).
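To make the block structure of (6) concrete, here is a minimal sketch (our own illustrative example, not code from the paper) assembling $A_m^{\text{HiMod}}$ for the Poisson problem on the unit square with the identity map and orthonormal sine modes $\varphi_k(\hat{y}) = \sqrt{2}\sin(k\pi\hat{y})$: then $\mathcal{D}_1 = 0$, the cross coefficients $\hat{r}^{1,0}_{jk}$ and $\hat{r}^{0,1}_{jk}$ vanish, and $\hat{r}^{1,1}_{jk} = \delta_{jk}$, $\hat{r}^{0,0}_{jk} = (k\pi)^2\delta_{jk}$, so the matrix is block diagonal:

```python
import numpy as np

# Illustrative assembly of the HiMod matrix in (6); assumptions: Poisson on
# the unit square, identity map Psi (so D_1 = 0 and the cross coefficients
# r^{1,0}, r^{0,1} vanish) and orthonormal sine modes, for which
# r^{1,1}_{jk} = delta_{jk} and r^{0,0}_{jk} = (k*pi)^2 delta_{jk}.

def p1_matrices(Nh):
    """1D P1 stiffness K and mass M on (0,1), homogeneous Dirichlet BCs."""
    h = 1.0 / (Nh + 1)
    K = (2 * np.eye(Nh) - np.eye(Nh, k=1) - np.eye(Nh, k=-1)) / h
    M = h / 6.0 * (4 * np.eye(Nh) + np.eye(Nh, k=1) + np.eye(Nh, k=-1))
    return K, M

def himod_matrix(m, Nh):
    """Assemble A_m^HiMod, ordered by mode as in (5); here only the
    diagonal modal blocks K + (k*pi)^2 M survive."""
    K, M = p1_matrices(Nh)
    A = np.zeros((m * Nh, m * Nh))
    for k in range(1, m + 1):
        s = (k - 1) * Nh
        A[s:s + Nh, s:s + Nh] = K + (k * np.pi) ** 2 * M
    return A

A = himod_matrix(m=3, Nh=10)  # system of order m * Nh = 30
```

For a non-trivial map $\Psi$ the off-diagonal modal blocks no longer vanish, but the assembly pattern (one $N_h \times N_h$ block per pair of modes) stays the same.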

#### **3 The PGD Approach**

To perform PGD, we have to introduce on problem (1) a separability hypothesis with respect to both the spatial variables and the data [5, 22]. Thus, domain $\Omega \subset \mathbb{R}^d$ coincides with the rectangle $\Omega_x \times \Omega_y$ if $d = 2$, and with the parallelepiped $\Omega_x \times \Omega_y \times \Omega_z$ (total separability) or the cylinder $\Omega_x \times \Omega_{\mathbf{y}}$ (partial separability) if $d = 3$, for $\Omega_x, \Omega_y, \Omega_z \subset \mathbb{R}$ and $\Omega_{\mathbf{y}} \subset \mathbb{R}^2$, being $\mathbf{y} = (y, z)$. In the following, we focus on partial separability, since it is more suited to match HiMod reduction with PGD. Analogously, we assume that the generic problem datum, $d = d(x, y, z)$, can be written as $d = d^x(x)\,d^{\mathbf{y}}(\mathbf{y})$. The separability is inherited by the PGD space

$$W_m = \left\{ w_m(x, \mathbf{y}) = \sum_{k=1}^m w^x_k(x)\,w^{\mathbf{y}}_k(\mathbf{y}),\ \text{with } w^x_k \in W^x_h,\ w^{\mathbf{y}}_k \in W^{\mathbf{y}}_h,\ x \in \Omega_x,\ \mathbf{y} \in \Omega_{\mathbf{y}} \right\}, \tag{7}$$

where $W^x_h \subseteq H^1(\Omega_x)$ and $W^{\mathbf{y}}_h \subseteq H^1(\Omega_{\mathbf{y}}; \mathbb{R}^{d-1})$ are discrete spaces, with $\dim(W^x_h) = N^x_h$ and $\dim(W^{\mathbf{y}}_h) = N^{\mathbf{y}}_h$, associated with partitions, $\mathcal{T}^x_h$ and $\mathcal{T}^{\mathbf{y}}_h$, of $\Omega_x$ and $\Omega_{\mathbf{y}}$, respectively. In general, $W^x_h$ and $W^{\mathbf{y}}_h$ are FE spaces, although, a priori, any discretization can be adopted. It turns out that $W_m$ is a tensor function space, being $W_m = W^x_h \otimes W^{\mathbf{y}}_h \subseteq H^1(\Omega_x) \otimes H^1(\Omega_{\mathbf{y}}; \mathbb{R}^{d-1})$.

Index *m* plays the same role as in the HiMod reduction, setting the level of detail for the reduced solution (see Sect. 4 for possible criteria to choose *m*). PGD exploits the hierarchical structure in *Wm* to build the generic function *wm* ∈ *Wm*. In particular, *wm* is computed as

$$w_m(x, \mathbf{y}) = w^x_m(x)\,w^{\mathbf{y}}_m(\mathbf{y}) + \sum_{k=1}^{m-1} w^x_k(x)\,w^{\mathbf{y}}_k(\mathbf{y}), \tag{8}$$

where $w^x_k$ and $w^{\mathbf{y}}_k$ are assumed known for $k = 1, \ldots, m-1$, so that the enrichment functions, $w^x_m$ and $w^{\mathbf{y}}_m$, become the actual unknowns. To provide the PGD formulation for the Poisson problem considered in Sect. 2, we exploit representation (8) for the PGD approximation, $u_m^{\text{PGD}}$, and we pick the test function as $X(x)Y(\mathbf{y})$, with $X \in W^x_h$ and $Y \in W^{\mathbf{y}}_h$. The coupling between the unknowns, $u^x_m$ and $u^{\mathbf{y}}_m$, leads to a nonlinear problem, which is tackled by means of the *Alternating Direction Strategy* (ADS) [5]. The idea is to look for $u^x_m$ and $u^{\mathbf{y}}_m$ separately, via a fixed point procedure. We introduce an auxiliary index to keep track of the ADS iterations, so that at the $p$-th ADS iteration we compute $u^{x,p}_m$ and $u^{\mathbf{y},p}_m$ starting from the previous approximations, $u^{x,g}_m$ and $u^{\mathbf{y},g}_m$ for $g = 1, \ldots, p-1$, following a two-step procedure. First, we compute $u^{x,p}_m$ by identifying $u^{\mathbf{y}}_m$ with $u^{\mathbf{y},p-1}_m$, and by selecting $Y(\mathbf{y}) = u^{\mathbf{y},p-1}_m$ in the test function. This yields, for any $X \in W^x_h$,

$$\begin{aligned} &\int_{\Omega_x} \big(u^{x,p}_m\big)' X'\,dx \int_{\Omega_{\mathbf{y}}} \big[u^{\mathbf{y},p-1}_m\big]^2\,d\mathbf{y} + \int_{\Omega_x} u^{x,p}_m X\,dx \int_{\Omega_{\mathbf{y}}} \big[\big(u^{\mathbf{y},p-1}_m\big)'\big]^2\,d\mathbf{y} \\ &\quad = \int_{\Omega_x} f^x X\,dx \int_{\Omega_{\mathbf{y}}} f^{\mathbf{y}}\,u^{\mathbf{y},p-1}_m\,d\mathbf{y} - \sum_{k=1}^{m-1} \int_{\Omega_x} \big(u^x_k\big)' X'\,dx \int_{\Omega_{\mathbf{y}}} u^{\mathbf{y}}_k\,u^{\mathbf{y},p-1}_m\,d\mathbf{y} \\ &\qquad - \sum_{k=1}^{m-1} \int_{\Omega_x} u^x_k X\,dx \int_{\Omega_{\mathbf{y}}} \big(u^{\mathbf{y}}_k\big)' \big(u^{\mathbf{y},p-1}_m\big)'\,d\mathbf{y}, \end{aligned} \tag{9}$$

where the separability of $f$ is exploited (the dependence on the independent variables, $x$ and $\mathbf{y}$, is omitted to simplify the notation). Subsequently, we compute $u^{\mathbf{y},p}_m$, after setting $u^x_m$ to $u^{x,p}_m$ and choosing $X = u^{x,p}_m$ in the test function, so that we obtain, for any $Y \in W^{\mathbf{y}}_h$,

$$\begin{aligned} &\int_{\Omega_x} \big[\big(u^{x,p}_m\big)'\big]^2\,dx \int_{\Omega_{\mathbf{y}}} u^{\mathbf{y},p}_m Y\,d\mathbf{y} + \int_{\Omega_x} \big[u^{x,p}_m\big]^2\,dx \int_{\Omega_{\mathbf{y}}} \big(u^{\mathbf{y},p}_m\big)' Y'\,d\mathbf{y} \\ &\quad = \int_{\Omega_x} f^x u^{x,p}_m\,dx \int_{\Omega_{\mathbf{y}}} f^{\mathbf{y}} Y\,d\mathbf{y} - \sum_{k=1}^{m-1} \int_{\Omega_x} \big(u^x_k\big)' \big(u^{x,p}_m\big)'\,dx \int_{\Omega_{\mathbf{y}}} u^{\mathbf{y}}_k Y\,d\mathbf{y} \\ &\qquad - \sum_{k=1}^{m-1} \int_{\Omega_x} u^x_k\,u^{x,p}_m\,dx \int_{\Omega_{\mathbf{y}}} \big(u^{\mathbf{y}}_k\big)' Y'\,d\mathbf{y}. \end{aligned} \tag{10}$$

The algebraic counterpart of (9) and (10) is obtained by introducing bases, $\mathcal{B}^x = \{\theta^x_\alpha\}_{\alpha=1}^{N^x_h}$ and $\mathcal{B}^{\mathbf{y}} = \{\theta^{\mathbf{y}}_\beta\}_{\beta=1}^{N^{\mathbf{y}}_h}$, for the spaces $W^x_h$ and $W^{\mathbf{y}}_h$, respectively, so that $u^q_j(q) = \sum_{i=1}^{N^q_h} \widetilde{u}^q_{ji}\,\theta^q_i(q)$ and $u^{q,s}_m(q) = \sum_{i=1}^{N^q_h} \widetilde{u}^{q,s}_{mi}\,\theta^q_i(q)$, with $q = x, \mathbf{y}$, $s = p, p-1$, $j = 1, \ldots, m-1$, and, likewise, $X(x) = \sum_{\alpha=1}^{N^x_h} \widetilde{x}_\alpha\,\theta^x_\alpha(x)$ and $Y(\mathbf{y}) = \sum_{\beta=1}^{N^{\mathbf{y}}_h} \widetilde{y}_\beta\,\theta^{\mathbf{y}}_\beta(\mathbf{y})$. Thanks to these expansions and to the arbitrariness of $X$ and $Y$, we can rewrite (9) and (10) as

$$\begin{aligned} \Big\{\big[\big(\mathbf{u}^{\mathbf{y},p-1}_m\big)^T M^{\mathbf{y}}\,\mathbf{u}^{\mathbf{y},p-1}_m\big] K^x &+ \big[\big(\mathbf{u}^{\mathbf{y},p-1}_m\big)^T K^{\mathbf{y}}\,\mathbf{u}^{\mathbf{y},p-1}_m\big] M^x\Big\}\,\mathbf{u}^{x,p}_m = \big[\big(\mathbf{u}^{\mathbf{y},p-1}_m\big)^T \mathbf{f}^{\mathbf{y}}\big]\,\mathbf{f}^x \\ &- \sum_{k=1}^{m-1}\Big\{\big[\big(\mathbf{u}^{\mathbf{y},p-1}_m\big)^T M^{\mathbf{y}}\,\mathbf{u}^{\mathbf{y}}_k\big] K^x + \big[\big(\mathbf{u}^{\mathbf{y},p-1}_m\big)^T K^{\mathbf{y}}\,\mathbf{u}^{\mathbf{y}}_k\big] M^x\Big\}\,\mathbf{u}^x_k, \end{aligned} \tag{11}$$

and

$$\begin{aligned} \Big\{\big[\big(\mathbf{u}^{x,p}_m\big)^T K^x\,\mathbf{u}^{x,p}_m\big] M^{\mathbf{y}} &+ \big[\big(\mathbf{u}^{x,p}_m\big)^T M^x\,\mathbf{u}^{x,p}_m\big] K^{\mathbf{y}}\Big\}\,\mathbf{u}^{\mathbf{y},p}_m = \big[\big(\mathbf{u}^{x,p}_m\big)^T \mathbf{f}^x\big]\,\mathbf{f}^{\mathbf{y}} \\ &- \sum_{k=1}^{m-1}\Big\{\big[\big(\mathbf{u}^{x,p}_m\big)^T K^x\,\mathbf{u}^x_k\big] M^{\mathbf{y}} + \big[\big(\mathbf{u}^{x,p}_m\big)^T M^x\,\mathbf{u}^x_k\big] K^{\mathbf{y}}\Big\}\,\mathbf{u}^{\mathbf{y}}_k, \end{aligned} \tag{12}$$

respectively, where the vectors $\mathbf{u}^q_j, \mathbf{u}^{q,s}_m \in \mathbb{R}^{N^q_h}$ collect the PGD coefficients, being $[\mathbf{u}^q_j]_i = \widetilde{u}^q_{ji}$ and $[\mathbf{u}^{q,s}_m]_i = \widetilde{u}^{q,s}_{mi}$ for $i = 1, \ldots, N^q_h$; $K^x, M^x \in \mathbb{R}^{N^x_h \times N^x_h}$ and $K^{\mathbf{y}}, M^{\mathbf{y}} \in \mathbb{R}^{N^{\mathbf{y}}_h \times N^{\mathbf{y}}_h}$ are the stiffness and mass matrices associated with the $x$- and $\mathbf{y}$-variables, with $K^x_{\alpha l} = \int_{\Omega_x} (\theta^x_\alpha)'(\theta^x_l)'\,dx$, $K^{\mathbf{y}}_{\beta s} = \int_{\Omega_{\mathbf{y}}} (\theta^{\mathbf{y}}_\beta)'(\theta^{\mathbf{y}}_s)'\,d\mathbf{y}$, $M^x_{\alpha l} = \int_{\Omega_x} \theta^x_\alpha\,\theta^x_l\,dx$, $M^{\mathbf{y}}_{\beta s} = \int_{\Omega_{\mathbf{y}}} \theta^{\mathbf{y}}_\beta\,\theta^{\mathbf{y}}_s\,d\mathbf{y}$; and $\mathbf{f}^x \in \mathbb{R}^{N^x_h}$, $\mathbf{f}^{\mathbf{y}} \in \mathbb{R}^{N^{\mathbf{y}}_h}$, with $\mathbf{f}^x_l = \int_{\Omega_x} f^x\,\theta^x_l\,dx$, $\mathbf{f}^{\mathbf{y}}_s = \int_{\Omega_{\mathbf{y}}} f^{\mathbf{y}}\,\theta^{\mathbf{y}}_s\,d\mathbf{y}$, for $\alpha, l = 1, \ldots, N^x_h$ and $\beta, s = 1, \ldots, N^{\mathbf{y}}_h$. Systems (11) and (12) are solved at each ADS iteration, so that the computational effort characterizing PGD is the one associated with the solution of two systems of order $N^x_h$ and $N^{\mathbf{y}}_h$, respectively, per ADS iteration.
When a certain stopping criterion is met (see the next section for more details), the ADS procedure yields the vectors $\mathbf{u}^x_m$ and $\mathbf{u}^{\mathbf{y}}_m$, which identify the enrichment functions $u^x_m$ and $u^{\mathbf{y}}_m$.
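The two-step ADS fixed point can be sketched end to end. The following is a minimal illustration under assumed choices (P1 elements on uniform 1D grids and the separated datum $f = \sin(\pi x)\sin(\pi y)$, for which the exact solution $u = f/(2\pi^2)$ is itself separated), not the authors' implementation:

```python
import numpy as np

# Sketch of the two-step ADS fixed point, i.e. systems (11)-(12), for
# -Laplace(u) = f on the unit square with the illustrative separated datum
# f(x, y) = sin(pi x) sin(pi y); P1 elements on uniform 1D grids.

def p1_matrices(N):
    """1D P1 stiffness K and mass M on (0,1), homogeneous Dirichlet BCs."""
    h = 1.0 / (N + 1)
    K = (2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)) / h
    M = h / 6.0 * (4 * np.eye(N) + np.eye(N, k=1) + np.eye(N, k=-1))
    return K, M

def ads_step(Kx, Mx, Ky, My, fx, fy, modes_x, modes_y, n_iter=20):
    """Fixed-point computation of the m-th enrichment pair (u^x_m, u^y_m)."""
    uy = np.ones(My.shape[0])  # initial guess for u^{y,0}_m
    for _ in range(n_iter):
        # system (11): solve for u^{x,p}_m with u^{y,p-1}_m frozen
        A = (uy @ My @ uy) * Kx + (uy @ Ky @ uy) * Mx
        b = (uy @ fy) * fx
        for ukx, uky in zip(modes_x, modes_y):
            b -= ((uy @ My @ uky) * Kx + (uy @ Ky @ uky) * Mx) @ ukx
        ux = np.linalg.solve(A, b)
        # system (12): solve for u^{y,p}_m with u^{x,p}_m frozen
        A = (ux @ Kx @ ux) * My + (ux @ Mx @ ux) * Ky
        b = (ux @ fx) * fy
        for ukx, uky in zip(modes_x, modes_y):
            b -= ((ux @ Kx @ ukx) * My + (ux @ Mx @ ukx) * Ky) @ uky
        uy = np.linalg.solve(A, b)
    return ux, uy

Nx, Ny = 30, 30
Kx, Mx = p1_matrices(Nx)
Ky, My = p1_matrices(Ny)
x = np.linspace(0, 1, Nx + 2)[1:-1]  # interior nodes
y = np.linspace(0, 1, Ny + 2)[1:-1]
fx = Mx @ np.sin(np.pi * x)  # load vector of f^x(x) = sin(pi x)
fy = My @ np.sin(np.pi * y)  # load vector of f^y(y) = sin(pi y)

# first enrichment; further modes would repeat ads_step with modes_x,
# modes_y filled, until a stopping criterion such as (13) is met
modes_x, modes_y = [], []
ux, uy = ads_step(Kx, Mx, Ky, My, fx, fy, modes_x, modes_y)
modes_x.append(ux)
modes_y.append(uy)

# for this datum the exact solution is itself a single separated term,
# so the first PGD mode captures it up to discretization accuracy
u1 = np.outer(ux, uy)
exact = np.outer(np.sin(np.pi * x), np.sin(np.pi * y)) / (2 * np.pi ** 2)
```

Because the chosen datum is (discretely) a generalized eigenfunction of the 1D stiffness/mass pairs, the first mode already reproduces the solution, and later enrichments would only correct discretization-level residuals.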

#### **4 HiMod Reduction Versus PGD**

Both HiMod reduction and PGD exploit the separation of variables and, according to [5], belong to the a priori approaches, since they do not rely on any previously computed solution of the problem at hand. Nevertheless, we can easily itemize features which distinguish the two techniques. The most relevant ones concern the geometry of $\Omega$, the selection of the transverse basis and of the modal index, and the numerical implementation of the two procedures. The pros and cons of the two methods are highlighted below.

#### *4.1 Domain Geometry*

HiMod reduction and PGD impose precise hypotheses on the geometry of the computational domain.

According to the HiMod approach, $\Omega$ is expected to coincide with a fiber bundle and to be mapped into the reference domain, $\widehat{\Omega}$, by a sufficiently regular transformation. Actually, map $\Psi$ is assumed differentiable, while map $\psi_x$ is required to be a $C^1$-diffeomorphism, for all $x \in \Omega_{1D}$ [20]. These hypotheses introduce some constraints, in particular on the lateral boundary of $\Omega$, which, e.g., cannot exhibit kinks. Additionally, geometries of interest in many applications, such as bifurcations or, more generally, networks, are ruled out by the demands on $\psi_x$ and $\Psi$. An approach based on the domain decomposition technique is currently under investigation as a viable way to deal with such geometries. The isogeometric version of HiMod (i.e., the HIgaMod approach) will play a crucial role in view of HiMod simulations for blood flow modeling in patient-specific geometries [21].

The constraints introduced by PGD on the geometry of $\Omega$ are more restrictive. The separability hypothesis essentially restricts the method to Cartesian domains. This considerably reduces the applicability of PGD to practical contexts. Some techniques are available in the literature to overcome this issue. For instance, in [9] a generic domain is embedded into a Cartesian geometry, while in [7] the authors introduce a parametrization map for quadrilateral domains.

Overall, HiMod reduction exhibits a higher geometric flexibility than PGD in its straightforward formulation. As discussed in Sect. 5, this limitation can be removed when considering a parametric setting.

#### *4.2 Modeling of the Transverse Dynamics*

In the HiMod expansion, the $\mathbf{y}$-components, $\varphi_k(\psi_x(\mathbf{y}))$, are selected before starting the model reduction. This choice, although coherent with an a priori approach, introduces a constraint on the dynamics that can be described, so that hints about the solution trend along the transverse direction can be helpful to select a representative modal basis. In the original proposal of the HiMod procedure, sinusoidal functions are employed according to a Fourier expansion [6, 20]. This turns out to be a reasonable choice when Dirichlet boundary conditions are assigned on the lateral surface, $\Gamma_{\text{lat}} = \bigcup_{x \in \Omega_{1D}} \{x\} \times \partial\gamma_x$, of $\Omega$. Legendre polynomials, properly modified to include the homogeneous Dirichlet data and orthonormalized, are employed in [20] as an alternative to a trigonometric expansion. Nevertheless, Legendre polynomials require high-order quadrature rules to accurately compute the coefficients $\{\hat{r}^{a,b}_{jk}\}$.

In [1], the concept of an educated modal basis is introduced to impose generic boundary conditions on $\Gamma_{\text{lat}}$. The idea is to solve an auxiliary Sturm-Liouville eigenvalue problem on the transverse reference fiber, $\widehat{\gamma}_{d-1}$, to build a basis which automatically includes the boundary values on $\Gamma_{\text{lat}}$. The eigenfunctions of the Sturm-Liouville problem provide the modal basis. A first attempt to generalize the educated-HiMod reduction to three-dimensional (3D) cylindrical geometries is performed in [10], where the Navier-Stokes equations are hierarchically reduced to model blood flow in pipes. This generalization is far from straightforward due to the employment of polar coordinates. To overcome this issue, we are currently investigating the HIgaMod approach [21], which allows us to define the transverse basis as the Cartesian product of 1D modal functions, independently of the considered geometry.

Additionally, we remark that any modal basis can be precomputed on the transverse reference fiber before performing the HiMod reduction, thanks to the use of the map $\psi_x$. This considerably simplifies computations.

When applying PGD, the $\mathbf{y}$-components are unknown, as are the ones associated with $\mathbf{x}$. This leads to the nonlinear problems (9)–(10), thus losing any advantage related to a precomputation of the HiMod modal basis. On the other hand, PGD does not constrain the transverse dynamics to follow a prescribed (e.g., sinusoidal) analytical shape, as the HiMod procedure does. The educated-HiMod reduction clearly falls outside this comparison, since its modal basis strictly depends on the problem at hand.

Finally, we observe that HiMod modes are orthonormal with respect to the $L^2(\hat\gamma_{d-1})$-norm. This property is not ensured by PGD.

Concerning the selection of the modal index *m* in (2) and (7), as a first attempt, both HiMod reduction and PGD resort to a trial-and-error approach, so that the modal index is gradually increased until a check on the accuracy of the reduced solution is satisfied. For instance, in [6, 20] a qualitative investigation of the contour plot of the HiMod approximation drives the choice of *m*. Concerning PGD, the check on the relative enrichment

$$\frac{\|u_m^{\mathbf{x}}\,u_m^{\mathbf{y}}\|_{L^2(\Omega)}}{\|u_1^{\mathbf{x}}\,u_1^{\mathbf{y}}\|_{L^2(\Omega)}} \le \mathrm{TOL}_{\mathrm{E}},\tag{13}$$

is usually employed, with $\mathrm{TOL}_{\mathrm{E}}$ a user-defined tolerance [5]. An automatic selection of the index $m$ can yield a significant improvement. In [17, 19], an adaptive procedure is proposed for HiMod, based on an a posteriori modeling error analysis. In particular, the estimator in [17] is derived in a goal-oriented setting to control a quantity of interest, and exploits the hierarchical structure (i.e., the inclusion $V_m \subset V_{m+d}$, $\forall m, d \in \mathbb{N}^+$) typical of a HiMod reduction. A similar modeling error analysis is performed in [2] for PGD, although no adaptive algorithm is set there to automatically pick the reduced model. Paper [19] generalizes the a posteriori analysis in [17] to an unsteady setting, providing the tools to automatically select $m$ together with the partition $\mathcal{T}_h$ along $\Omega_{1D}$ and the time step.
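The trial-and-error enrichment with the relative check (13) can be sketched as follows. The target field, the grid, and the alternating least-squares update are toy stand-ins, not the actual PGD systems (9)–(10):

```python
import numpy as np

# Toy enrichment loop driven by check (13): rank-one terms u_k^x (u_k^y)^T are
# added until the newest mode is small relative to the first one. The target
# field and the alternating least-squares update are illustrative stand-ins.

def rank_one_step(R, n_sweeps=30):
    """Fit a rank-one term u^x (u^y)^T to the residual R by alternating updates."""
    ux = np.ones(R.shape[0])
    for _ in range(n_sweeps):
        uy = R.T @ ux / (ux @ ux)   # fix u^x, update u^y
        ux = R @ uy / (uy @ uy)     # fix u^y, update u^x
    return ux, uy

x = np.linspace(0.0, 5.0, 60)
y = np.linspace(0.0, 1.0, 30)
X, Y = np.meshgrid(x, y, indexing="ij")
F = 1.0 / (1.0 + X + 2.0 * Y)       # toy target with fast "modal" decay

TOL_E = 1e-3
modes, R = [], F.copy()
while True:
    ux, uy = rank_one_step(R)
    modes.append((ux, uy))
    R = R - np.outer(ux, uy)        # residual after the new enrichment
    ratio = (np.linalg.norm(np.outer(ux, uy))
             / np.linalg.norm(np.outer(*modes[0])))   # check (13)
    if ratio <= TOL_E:
        break
m = len(modes)
```

The final `m` plays the role of the modal index fixed by the enrichment test.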

Finally, HiMod allows the modal index to be tuned along the domain $\Omega$, according to the local complexity of the transverse dynamics. In particular, $m$ can be varied in different areas of $\Omega$ or, in the presence of very localized dynamics, in correspondence with specific nodes of the partition $\mathcal{T}_h$. We refer to these two variants as *piecewise* and *pointwise* HiMod reduction, in contrast to a *uniform* approach, where the same number of modes is adopted everywhere [16, 18]. This flexibility in the choice of $m$ is currently not available for PGD. Adaptive strategies to select the modal index are available for all three variants of the HiMod procedure [17, 19].

#### *4.3 Computational Aspects*

From a computational viewpoint, HiMod reduction and PGD lead to completely different procedures. Indeed, for a fixed value of $m$, applying HiMod requires the solution of the single system (6) of order $mN_h$, whereas PGD demands the repeated solution of the systems (11)–(12) of order $N_h^{\mathbf{x}}$ and $N_h^{\mathbf{y}}$, respectively, because of the fixed point and the enrichment algorithms. Thus, the direct solution of a single system, in general of larger order, is replaced by the iterative solution of several smaller systems. This heterogeneity makes a computational comparison between PGD and HiMod not so meaningful. We verify the reliability of the HiMod and PGD procedures on a common test case, by choosing in (1) $V = H^1_0(\Omega)$ with $\Omega = (0,5)\times(0,1)$, $a(u,v) = \int_\Omega \big(\mu\nabla u\cdot\nabla v + \mathbf{b}\cdot\nabla u\, v\big)\, d\Omega$ for $\mu = 0.24$, $\mathbf{b} = [-5,0]^T$, and $F(v) = \int_\Omega f v\, d\Omega$ with $f(x,y) = 50\big[\exp\big(-((x-2.85)/0.075)^2 - ((y-0.5)/0.075)^2\big) + \exp\big(-((x-3.75)/0.075)^2 - ((y-0.5)/0.075)^2\big)\big]$. For both methods, we uniformly subdivide $\Omega_{1D}$ into 285 subintervals. We set the PGD discretization along $y$ as well as the PGD and the HiMod index $m$ in order

**Fig. 1** Qualitative comparison between the HiMod (left) and PGD (right) approximations

to ensure the same accuracy, TOL, of the reduced approximations with respect to a reference FE solution, computed on a $2500\times500$ structured mesh. In particular, for $\mathrm{TOL} = 8\cdot10^{-3}$, we have to subdivide the interval $(0,1)$ into 20 uniform subintervals, and to set $m$ to 6 and to 9 in the PGD and the HiMod discretizations, respectively. Sinusoidal functions are chosen for the HiMod modal basis. The ADS iterations are controlled in terms of the relative increment, as

$$\frac{\|u_m^{\mathbf{x},p}\,u_m^{\mathbf{y},p} - u_m^{\mathbf{x},p-1}\,u_m^{\mathbf{y},p-1}\|_{L^2(\Omega)}}{\|u_m^{\mathbf{x},p}\,u_m^{\mathbf{y},p}\|_{L^2(\Omega)}} \le \mathrm{TOL}_{\mathrm{FP}},\tag{14}$$

with $\mathrm{TOL}_{\mathrm{FP}} = 10^{-2}$. Figure 1 shows the reduced approximations (which are fully comparable with the FE one, here omitted). The contour plots are very similar. The coarse PGD $y$-discretization justifies the slight roughness of the PGD contour lines.
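The structure of one enrichment step, with the fixed point loop stopped by (14), can be sketched for a plain Poisson analogue of this test case. The grids, tolerances, and the restriction to a single mode ($m=1$, so the sums over previous modes in (11)–(12) vanish) are illustrative assumptions:

```python
import numpy as np

# Sketch of one PGD enrichment step for -Delta u = 1 on a rectangle, with the
# alternating-direction (fixed point) loop stopped by test (14). Linear FE
# matrices on uniform 1D grids, homogeneous Dirichlet data; illustrative only.

def fe_matrices(n, L):
    """1D linear-FE stiffness and mass matrices on n interior nodes of (0, L)."""
    h = L / (n + 1)
    K = (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h
    M = h * (4 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)) / 6
    return K, M

nx, ny = 80, 40
Kx, Mx = fe_matrices(nx, 5.0)
Ky, My = fe_matrices(ny, 1.0)
fx, fy = Mx @ np.ones(nx), My @ np.ones(ny)   # separable load f = f^x f^y = 1

TOL_FP = 1e-8
ux, uy = np.ones(nx), np.ones(ny)
for p in range(500):
    ux_old, uy_old = ux, uy
    # x-step, cf. (11): [(uy' My uy) Kx + (uy' Ky uy) Mx] ux = (uy' fy) fx
    ux = np.linalg.solve((uy @ My @ uy) * Kx + (uy @ Ky @ uy) * Mx,
                         (uy @ fy) * fx)
    # y-step, cf. (12): [(ux' Kx ux) My + (ux' Mx ux) Ky] uy = (ux' fx) fy
    uy = np.linalg.solve((ux @ Kx @ ux) * My + (ux @ Mx @ ux) * Ky,
                         (ux @ fx) * fy)
    incr = (np.linalg.norm(np.outer(ux, uy) - np.outer(ux_old, uy_old))
            / np.linalg.norm(np.outer(ux, uy)))          # test (14)
    if incr <= TOL_FP:
        break
```

Each pass solves two small 1D systems instead of one 2D system, which is exactly the trade-off discussed above.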

Another distinguishing feature between HiMod and PGD is the domain discretization. Indeed, HiMod requires only the partition $\mathcal{T}_h$ along $\Omega_{1D}$, independently of the dimension of $\Omega$. No discretization is needed in the $\mathbf{y}$-direction, although we have to carefully select the quadrature nodes to compute the coefficients $\{\hat r^{\,a,b}_{jk}\}$. This task becomes particularly challenging when dealing with polar coordinates [10]. With PGD, to benefit from the computational advantages associated with a 1D discretization, we are obliged to assume the full separability of $\Omega$; indeed, a partial separability demands a 1D partition for $x$ and a two-dimensional partition for $\mathbf{y}$. As explained in Sect. 5, non-Cartesian domains require a 3D discretization of $\Omega$.

Finally, we analyze the interplay between the enrichment and the ADS iterations in the PGD reduction. We investigate the possible relationship between $\mathrm{TOL}_{\mathrm{FP}}$ in (14) and $\mathrm{TOL}_{\mathrm{E}}$ in (13), to verify whether a small tolerance for the fixed point iteration improves the accuracy of the PGD approximation, thus reducing the number of enrichment steps. To this end, we adopt the same test case used above. Table 1 gathers the number of ADS iterations, $\#\mathrm{IT}_{\mathrm{FP}}$, the number, $m$, of enrichment steps, and the CPU time<sup>1</sup> (in seconds) demanded by the PGD procedure, for two different values of $\mathrm{TOL}_{\mathrm{E}}$ and three different choices of $\mathrm{TOL}_{\mathrm{FP}}$. In particular, in column $\#\mathrm{IT}_{\mathrm{FP}}$ we specify the number of ADS iterations required by each enrichment step. As expected, there exists a link between the two tolerances: when a higher accuracy constrains the fixed point iteration, a smaller number of enrichment steps is needed to ensure the accuracy $\mathrm{TOL}_{\mathrm{E}}$.

<sup>1</sup>The computations have been run on an Intel Core i5 Dual-Core 2.7 GHz MacBook with 8 GB of RAM.


**Table 1** Quantitative analysis for PGD in terms of fixed point iterations and enrichment steps

#### **5 HiMod Reduction and PGD for Parametrized Problems**

The actual potential of PGD becomes more evident when considering a parametric setting, i.e., when problem (1) is replaced by the formulation

$$\text{find } u(\mu) \in V : a(u(\mu), v; \mu) = F(v; \mu) \quad \forall v \in V,\tag{15}$$

with *μ* a parameter, which may represent any data of the problem, e.g., the coefficients of the considered PDE, the source term, a boundary value or the domain geometry.

The technique adopted by PGD to deal with the parametric dependence in (15) is very effective. The parameter $\mu$ is considered as an additional independent variable which varies in a domain $\Omega_\mu$ [5]. Thus, the PGD space (7) changes into the new one

$$\begin{aligned} W\_m^\mu &= \left\{ \left. w\_m(\mathbf{x}, \mathbf{y}, \boldsymbol{\mu}) = \sum\_{k=1}^m w\_k^\mathbf{x}(\mathbf{x}) w\_k^\mathbf{y}(\mathbf{y}) w\_k^\boldsymbol{\mu}(\boldsymbol{\mu}), \text{ with} \\\\ w\_k^\mathbf{x} &\in W\_h^\mathbf{x}, \; w\_k^\mathbf{y} \in W\_h^\mathbf{y}, \; w\_k^\boldsymbol{\mu} \in W\_h^\mu, \; \mathbf{x} \in \Omega\_\mathbf{x}, \; \mathbf{y} \in \Omega\_\mathbf{y}, \; \boldsymbol{\mu} \in \Omega\_\mu \right\}, \end{aligned} \tag{16}$$

with $W_h^{\mu}$ a discretization of the space $L^2(\Omega_\mu;\mathbb{R}^Q)$, $Q$ being the length of the vector $\boldsymbol{\mu}$. Generalizing the enrichment paradigm in (8), at the $m$-th step of the PGD approach applied to problem (15) we have to compute three unknown functions, $u_m^{\mathbf{x}}$, $u_m^{\mathbf{y}}$ and $u_m^{\mu}$, by picking the test function as $X(\mathbf{x})Y(\mathbf{y})Z(\boldsymbol{\mu})$, with $X \in W_h^{\mathbf{x}}$, $Y \in W_h^{\mathbf{y}}$, $Z \in W_h^{\mu}$. The functions $u_m^{\mathbf{x}}$, $u_m^{\mathbf{y}}$, $u_m^{\mu}$ are computed by ADS, which now coincides with a three-step procedure. Thus, with reference to the Poisson problem $-\nabla\cdot(\mu\nabla u) = f$, completed with fully homogeneous Dirichlet boundary conditions and for $\boldsymbol{\mu} \equiv \mu$, we first compute $u_m^{\mathbf{x},p}$ by identifying $u_m^{\mathbf{y}}$ and $u_m^{\mu}$ with the previous approximations, $u_m^{\mathbf{y},p-1}$ and $u_m^{\mu,p-1}$, respectively, and by selecting $Y(\mathbf{y})Z(\mu) = u_m^{\mathbf{y},p-1}u_m^{\mu,p-1}$ in the test function. This leads to a linear system which generalizes (11), namely

$$\begin{split}
&\left[\big(\mathbf{u}_m^{\mu,p-1}\big)^T M^{\mu}\,\mathbf{u}_m^{\mu,p-1}\right]
\left\{\left[\big(\mathbf{u}_m^{\mathbf{y},p-1}\big)^T M^{\mathbf{y}}\,\mathbf{u}_m^{\mathbf{y},p-1}\right] K^{\mathbf{x}} + \left[\big(\mathbf{u}_m^{\mathbf{y},p-1}\big)^T K^{\mathbf{y}}\,\mathbf{u}_m^{\mathbf{y},p-1}\right] M^{\mathbf{x}}\right\}\mathbf{u}_m^{\mathbf{x},p} \\
&= \left[\big(\mathbf{u}_m^{\mathbf{y},p-1}\big)^T \mathbf{f}^{\mathbf{y}}\right]\left[\big(\mathbf{u}_m^{\mu,p-1}\big)^T \mathbf{f}^{\mu}\right]\mathbf{f}^{\mathbf{x}}
- \sum_{k=1}^{m-1}\left[\big(\mathbf{u}_m^{\mu,p-1}\big)^T M^{\mu}\,\mathbf{u}_k^{\mu}\right] \\
&\quad\ \left\{\left[\big(\mathbf{u}_m^{\mathbf{y},p-1}\big)^T M^{\mathbf{y}}\,\mathbf{u}_k^{\mathbf{y}}\right] K^{\mathbf{x}} + \left[\big(\mathbf{u}_m^{\mathbf{y},p-1}\big)^T K^{\mathbf{y}}\,\mathbf{u}_k^{\mathbf{y}}\right] M^{\mathbf{x}}\right\}\mathbf{u}_k^{\mathbf{x}},
\end{split}\tag{17}$$

where $M^{\mu} \in \mathbb{R}^{N_h^{\mu}\times N_h^{\mu}}$ is the mass matrix associated with the parameter $\mu$, with $M^{\mu}_{ij} = \int_{\Omega_\mu} \mu\,\theta_i^{\mu}\theta_j^{\mu}\,d\mu$ for $i,j = 1,\ldots,N_h^{\mu}$ and $\mathcal{B}^{\mu} = \{\theta_\gamma^{\mu}\}_{\gamma=1}^{N_h^{\mu}}$ a basis for the space $W_h^{\mu}$; where $\mathbf{f}^{\mu} \in \mathbb{R}^{N_h^{\mu}}$, with $\mathbf{f}^{\mu}_l = \int_{\Omega_\mu} f^{\mu}\theta_l^{\mu}\,d\mu$ for $l = 1,\ldots,N_h^{\mu}$, after assuming the separability $f = f^{\mathbf{x}}f^{\mathbf{y}}f^{\mu}$ for the source term $f$; and where we employ the same notation as in (11)–(12) to denote the vectors $\mathbf{u}_w^{\mu}$, $\mathbf{u}_m^{\mu,s}$, with $w = 1,\ldots,m-1$, $s = p, p-1$, collecting the PGD coefficients associated with the basis $\mathcal{B}^{\mu}$. Analogously, $u_m^{\mathbf{y},p}$ is computed by solving the generalization of the linear system (12) given by

$$\begin{split}
&\left[\big(\mathbf{u}_m^{\mu,p-1}\big)^T M^{\mu}\,\mathbf{u}_m^{\mu,p-1}\right]
\left\{\left[\big(\mathbf{u}_m^{\mathbf{x},p}\big)^T K^{\mathbf{x}}\,\mathbf{u}_m^{\mathbf{x},p}\right] M^{\mathbf{y}} + \left[\big(\mathbf{u}_m^{\mathbf{x},p}\big)^T M^{\mathbf{x}}\,\mathbf{u}_m^{\mathbf{x},p}\right] K^{\mathbf{y}}\right\}\mathbf{u}_m^{\mathbf{y},p} \\
&= \left[\big(\mathbf{u}_m^{\mathbf{x},p}\big)^T \mathbf{f}^{\mathbf{x}}\right]\left[\big(\mathbf{u}_m^{\mu,p-1}\big)^T \mathbf{f}^{\mu}\right]\mathbf{f}^{\mathbf{y}}
- \sum_{k=1}^{m-1}\left[\big(\mathbf{u}_m^{\mu,p-1}\big)^T M^{\mu}\,\mathbf{u}_k^{\mu}\right] \\
&\quad\ \left\{\left[\big(\mathbf{u}_m^{\mathbf{x},p}\big)^T K^{\mathbf{x}}\,\mathbf{u}_k^{\mathbf{x}}\right] M^{\mathbf{y}} + \left[\big(\mathbf{u}_m^{\mathbf{x},p}\big)^T M^{\mathbf{x}}\,\mathbf{u}_k^{\mathbf{x}}\right] K^{\mathbf{y}}\right\}\mathbf{u}_k^{\mathbf{y}},
\end{split}$$

after setting $u_m^{\mathbf{x}} = u_m^{\mathbf{x},p}$, $u_m^{\mu} = u_m^{\mu,p-1}$ and $X(\mathbf{x})Z(\mu) = u_m^{\mathbf{x},p}u_m^{\mu,p-1}$ for the PGD test function. Finally, we have the additional linear system used to compute $u_m^{\mu,p}$,

$$\begin{split}
&\left\{\left[\big(\mathbf{u}_m^{\mathbf{x},p}\big)^T K^{\mathbf{x}}\,\mathbf{u}_m^{\mathbf{x},p}\right]\left[\big(\mathbf{u}_m^{\mathbf{y},p}\big)^T M^{\mathbf{y}}\,\mathbf{u}_m^{\mathbf{y},p}\right] + \left[\big(\mathbf{u}_m^{\mathbf{x},p}\big)^T M^{\mathbf{x}}\,\mathbf{u}_m^{\mathbf{x},p}\right]\left[\big(\mathbf{u}_m^{\mathbf{y},p}\big)^T K^{\mathbf{y}}\,\mathbf{u}_m^{\mathbf{y},p}\right]\right\} M^{\mu}\mathbf{u}_m^{\mu,p} \\
&= \left[\big(\mathbf{u}_m^{\mathbf{x},p}\big)^T \mathbf{f}^{\mathbf{x}}\right]\left[\big(\mathbf{u}_m^{\mathbf{y},p}\big)^T \mathbf{f}^{\mathbf{y}}\right]\mathbf{f}^{\mu}
- \sum_{k=1}^{m-1}\left\{\left[\big(\mathbf{u}_m^{\mathbf{x},p}\big)^T K^{\mathbf{x}}\,\mathbf{u}_k^{\mathbf{x}}\right]\left[\big(\mathbf{u}_m^{\mathbf{y},p}\big)^T M^{\mathbf{y}}\,\mathbf{u}_k^{\mathbf{y}}\right]\right. \\
&\quad\ \left. + \left[\big(\mathbf{u}_m^{\mathbf{x},p}\big)^T M^{\mathbf{x}}\,\mathbf{u}_k^{\mathbf{x}}\right]\left[\big(\mathbf{u}_m^{\mathbf{y},p}\big)^T K^{\mathbf{y}}\,\mathbf{u}_k^{\mathbf{y}}\right]\right\} M^{\mu}\mathbf{u}_k^{\mu},
\end{split}$$

obtained for $u_m^{\mathbf{x}} = u_m^{\mathbf{x},p}$, $u_m^{\mathbf{y}} = u_m^{\mathbf{y},p}$ and by selecting $X(\mathbf{x})Y(\mathbf{y}) = u_m^{\mathbf{x},p}u_m^{\mathbf{y},p}$ for the test function. From a computational viewpoint, at each ADS iteration we now have to solve three linear systems, of order $N_h^{\mathbf{x}}$, $N_h^{\mathbf{y}}$ and $N_h^{\mu}$, respectively.
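The three-step sweep can be sketched as follows for the single-mode case $m=1$ (so the sums over previous modes drop out). The lumped piecewise-constant basis in $\mu$, which makes $M^{\mu}$ diagonal, and all grid sizes are illustrative assumptions, not the discretization of the paper:

```python
import numpy as np

# Sketch of the three-step ADS sweep for -div(mu grad u) = f with mu treated
# as an extra coordinate, mirroring (17) and its y- and mu-counterparts for a
# single mode (m = 1). Illustrative lumped P0 basis in mu: M^mu is diagonal
# with entries mu_k * w.

def fe_matrices(n, L):
    h = L / (n + 1)
    K = (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h
    M = h * (4 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)) / 6
    return K, M

nx, ny, nmu = 60, 30, 50
Kx, Mx = fe_matrices(nx, 3.0)
Ky, My = fe_matrices(ny, 1.0)
mu_nodes = np.linspace(1.0, 5.0, nmu)
w = 4.0 / nmu
Mmu = np.diag(mu_nodes * w)        # mu-weighted mass matrix M^mu
fx, fy = Mx @ np.ones(nx), My @ np.ones(ny)
fmu = w * np.ones(nmu)             # f^mu = 1 against the unweighted basis

ux, uy, umu = np.ones(nx), np.ones(ny), np.ones(nmu)
for p in range(200):
    prod_old = np.einsum("i,j,k->ijk", ux, uy, umu)
    s = umu @ Mmu @ umu            # scalar (u^mu)' M^mu u^mu
    # x-step, cf. (17)
    ux = np.linalg.solve(s * ((uy @ My @ uy) * Kx + (uy @ Ky @ uy) * Mx),
                         (uy @ fy) * (umu @ fmu) * fx)
    # y-step: same structure with the roles of x and y exchanged
    uy = np.linalg.solve(s * ((ux @ Kx @ ux) * My + (ux @ Mx @ ux) * Ky),
                         (ux @ fx) * (umu @ fmu) * fy)
    # mu-step: a system in M^mu alone
    c = (ux @ Kx @ ux) * (uy @ My @ uy) + (ux @ Mx @ ux) * (uy @ Ky @ uy)
    umu = np.linalg.solve(c * Mmu, (ux @ fx) * (uy @ fy) * fmu)
    prod = np.einsum("i,j,k->ijk", ux, uy, umu)
    incr = np.linalg.norm(prod - prod_old) / np.linalg.norm(prod)
    if incr <= 1e-8:
        break
```

The computed $\mu$-factor decays like $1/\mu$, consistent with the solution of $-\nabla\cdot(\mu\nabla u)=f$ scaling inversely with the diffusivity.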

We investigate the reliability of PGD on problem (15), for $V = H^1_{\Gamma_{in}\cup\Gamma_{up}\cup\Gamma_{down}}(\Omega)$ with $\Omega = (0,3)\times(0,1)$, $\Gamma_{in} = \{0\}\times(0,1)$, $\Gamma_{up} = (0,3)\times\{1\}$, $\Gamma_{down} = (0,3)\times\{0\}$, $a(u,v) = \int_\Omega \big(\mu\nabla u\cdot\nabla v + \mathbf{b}\cdot\nabla u\, v\big)\, d\Omega$ with $\mathbf{b} = [2.5,0]^T$ and $\mu$ the parameter, to be varied in $\Omega_\mu = [1,5]$, and $F(v) = \int_\Omega f v\, d\Omega$ with $f = 1$. The problem is completed with mixed boundary conditions, namely a homogeneous Dirichlet datum on $\Gamma_{up}\cup\Gamma_{down}$, the non-homogeneous Dirichlet condition $u = u_{in}$, with $u_{in} = y(1-y)$, on $\Gamma_{in}$, and a homogeneous Neumann value on $\Gamma_{out} = \{3\}\times(0,1)$. We apply the PGD reduction for $m = 2$, and we uniformly subdivide $\Omega_x$, $\Omega_y$, $\Omega_\mu$, with $N_h^x =$

**Fig. 2** Qualitative comparison between the reference (left) and the PGD (right) solutions, for *μ* = 1 (top) and *μ* = 2*.*5 (bottom)

150, $N_h^y = 50$, $N_h^\mu = 500$. The tolerance in (14) is set to $10^{-2}$. Figure 2 compares the PGD approximation for $\mu = 1$ and $\mu = 2.5$ with a reference full solution coinciding with a linear FE approximation computed on a $300\times100$ structured mesh. The qualitative matching between the corresponding solutions is significant. From a quantitative viewpoint, the $L^2(\Omega)$-norm of the relative error associated with the PGD approximation does not vary significantly when increasing $m$, whereas a slight error reduction is detected when increasing $\mu$.

The parametric counterpart of the HiMod reduction, known as HiPOD, merges HiMod with POD [4, 13]. HiPOD pursues a different goal than PGD. Indeed, for a new value, $\mu^*$, of the parameter, PGD provides an approximation of the full solution $u(\mu^*)$, while HiPOD approximates the HiMod solution associated with $\mu^*$. The offline/online paradigm of POD is followed by HiPOD as well. The peculiarity is that the offline step is now performed in the HiMod setting, to contain the computational burden typical of this stage and to rely on the good properties of HiMod in terms of the reliability-versus-accuracy balance. Thus, we choose $P$ different values, $\mu = \mu_i$ with $i = 1,\ldots,P$, of the parameter $\mu$, and we collect the HiMod approximations of the corresponding problems (15) into the response matrix $\mathcal{S} = \big[\mathbf{u}_m^{\mathrm{HiMod}}(\mu_1), \mathbf{u}_m^{\mathrm{HiMod}}(\mu_2), \ldots, \mathbf{u}_m^{\mathrm{HiMod}}(\mu_P)\big] \in \mathbb{R}^{mN_h\times P}$, according to representation (5). Successively, we define the null-average matrix

$$\mathcal{V} = \mathcal{S} - \frac{1}{P} \sum\_{l=1}^{P} \left[ \mathbf{u}\_m^{\text{HiMod}}(\mu\_l), \mathbf{u}\_m^{\text{HiMod}}(\mu\_l), \dots, \mathbf{u}\_m^{\text{HiMod}}(\mu\_l) \right] \in \mathbb{R}^{mN\_h \times P},$$

and we apply the Singular Value Decomposition (SVD) to $\mathcal{V}$, so that $\mathcal{V} = \Phi\Sigma\Psi^T$, where $\Phi \in \mathbb{R}^{(mN_h)\times(mN_h)}$ and $\Psi \in \mathbb{R}^{P\times P}$ are the unitary matrices of the left and right singular vectors of $\mathcal{V}$, respectively, while $\Sigma = \mathrm{diag}(\sigma_1,\ldots,\sigma_\rho) \in \mathbb{R}^{(mN_h)\times P}$ denotes the pseudo-diagonal matrix of the singular values of $\mathcal{V}$, with $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_\rho \ge 0$ and $\rho = \min(mN_h, P)$ [8]. The POD basis is identified by the first $l$ left singular vectors, $\boldsymbol{\varphi}_i$, of $\mathcal{V}$, so that the reduced POD space is $V_{\mathrm{POD}}^l = \mathrm{span}\{\boldsymbol{\varphi}_1,\ldots,\boldsymbol{\varphi}_l\}$, with $\dim(V_{\mathrm{POD}}^l) = l$ and $l \ll mN_h$. In the numerical assessment below, the value $l$ coincides with the smallest integer such that $\sigma_l^2 < \varepsilon$, with $\varepsilon$ a prescribed tolerance.

The online phase of HiPOD approximates the HiMod solution to problem (15) for a new value, $\mu^*$, of the parameter by exploiting the POD basis instead of solving system (6). This is performed via a projection step. After assembling the HiMod stiffness matrix and right-hand side, $A_m^{\mathrm{HiMod}}(\mu^*)$ and $\mathbf{f}_m^{\mathrm{HiMod}}(\mu^*)$, associated with the new value of the parameter, we solve the POD system of order $l$

$$A\_{\rm POD}(\mu^\*)\mathbf{u}\_{\rm POD}(\mu^\*) = \mathbf{f}\_{\rm POD}(\mu^\*),\tag{18}$$

where $A_{\mathrm{POD}}(\mu^*) = (\Phi_{\mathrm{POD}}^l)^T A_m^{\mathrm{HiMod}}(\mu^*)\,\Phi_{\mathrm{POD}}^l$ and $\mathbf{f}_{\mathrm{POD}}(\mu^*) = (\Phi_{\mathrm{POD}}^l)^T \mathbf{f}_m^{\mathrm{HiMod}}(\mu^*)$ denote the POD stiffness matrix and right-hand side, respectively, with $\Phi_{\mathrm{POD}}^l = [\boldsymbol{\varphi}_1,\ldots,\boldsymbol{\varphi}_l] \in \mathbb{R}^{(mN_h)\times l}$ the matrix collecting the POD basis vectors. The HiMod solution is thus approximated by the vector $\Phi_{\mathrm{POD}}^l\mathbf{u}_{\mathrm{POD}}(\mu^*) \in \mathbb{R}^{mN_h}$, i.e., after solving a system of order $l$ instead of $mN_h$. Overall, HiPOD requires the solution of $P$ linear systems of order $mN_h$ during the offline phase, in addition to a system of order $l$ in the online phase.

To check the performance of HiPOD, we adopt the test case used above for PGD, for the same values of the parameter, $\mu^* = 1$ and $\mu^* = 2.5$. The reference solution is the corresponding HiMod approximation computed by using $m = 15$ sinusoidal functions in the $y$-direction, and a linear FE discretization along the mainstream based on a uniform subdivision of $\Omega_{1D}$ into 50 subintervals. The same HiMod discretization is adopted to build the response matrix. Concerning the HiPOD approximation, we pick $P = 100$ by uniformly sampling the interval $[1,5]$, and we select $\varepsilon = 2.5\cdot10^{-15}$. This choice sets the dimension of the POD space to $l = 8$, so that we have to solve a system of order 8 instead of 750. The contour plots in Fig. 3 qualitatively compare the HiMod solution with the HiPOD approximation for $l = 1$. The correspondence between the two approximations is good even though a single POD mode is employed (in such a case, system (18)

**Fig. 3** Contour plots: comparison between the reference HiMod solution (left) and the HiPOD approximation with *l* = 1 (right), for *μ*<sup>∗</sup> = 1 (top) and *μ*∗ = 2*.*5 (bottom). Table: relative error between HiMod and HiPOD solutions with respect to the *L*2*()*-norm

reduces to a scalar equation). We do not provide the HiPOD approximations for $l = 8$ since they qualitatively coincide with the corresponding HiMod solutions. The left panels can be additionally compared with the FE solutions in Fig. 2 to verify the reliability of the HiMod procedure. Finally, the table in Fig. 3 gathers the $L^2(\Omega)$-norm of the relative error between the HiMod and HiPOD solutions, for four different POD bases and for three choices of the viscosity (1, 2.5, and the average over a sampling of 30 random values of $\mu$). The error monotonically decreases for larger and larger values of $l$, independently of the choice of $\mu$. If we compare the values for $\mu = 1$ and for $\mu = 2.5$ (one of the endpoints and the midpoint of the sampling interval, respectively), we notice a higher accuracy (of about one order of magnitude) for the latter choice. This is rather standard in projection-based reduced order modeling [11]. Concerning the computational saving in terms of CPU time, the HiPOD method requires on average $O(10^{-3})$ s, to be compared with the $O(10)$ s demanded by HiMod, resulting in a speedup of $10^4$.

Although PGD and HiPOD are not directly comparable due to the different purposes they pursue, we highlight the main pros and cons of the two methods. The explicit dependence of the approximation on the parameters makes PGD an ideal tool to efficiently deal with parametric problems: for any new parameter, a direct evaluation yields the corresponding PGD approximation. On the other hand, HiPOD suffers from the drawbacks typical of projection-based methods. The main bottleneck is the assembly of the HiMod arrays involved in $A_{\mathrm{POD}}(\mu^*)$ and $\mathbf{f}_{\mathrm{POD}}(\mu^*)$.

When PGD is applied to parametric problems, we recover the possibility of dealing with any geometric domain. In such a case, a partial separability is applied to the problem, so that the spatial independent variables are kept together whereas the parameters are separated. This approach clearly loses the computational advantages due to space separability. By contrast, HiPOD inherits the geometric flexibility of the HiMod reduction, without giving up the spatial dimensional reduction of the problem.

**Acknowledgements** The authors thank Yves Antonio Brandes Costa Barbosa for his support in the HiMod simulations. This work has been partially funded by GNCS-INdAM 2018 project on "Tecniche di Riduzione di Modello per le Applicazioni Mediche". F. Ballarin also acknowledges the support by European Union Funding for Research and Innovation, Horizon 2020 Program, in the framework of European Research Council Executive Agency: H2020 ERC Consolidator Grant 2015 AROMA-CFD project 681447 "Advanced Reduced Order Methods with Applications in Computational Fluid Dynamics" (P. I. G. Rozza).

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **Recurrence Relations for a Family of Orthogonal Polynomials on a Triangle**

**Sheehan Olver, Alex Townsend, and Geoffrey M. Vasil**

#### **1 Introduction**

In 1975, Koornwinder described a general procedure for constructing multivariate orthogonal polynomials from univariate ones [4, §3.7.2]. The procedure allows for the construction of seven classes of bivariate orthogonal polynomials from Jacobi polynomials, some of which were previously known [9]. In this paper, we consider a four-parameter variant of Koornwinder's class IV polynomials (the four-parameter variant was not constructed by Koornwinder) defined as [2]

$$\begin{split} P\_{n,k}^{(a,b,c,d)}(\mathbf{x}, \mathbf{y}) &= P\_{n-k}^{(2k+b+c+d+1,a)} (2\mathbf{x} - 1) (1 - \mathbf{x})^k P\_k^{(c,b)} \left( -1 + \frac{2\mathbf{y}}{1 - \mathbf{x}} \right) \\ &= \tilde{P}\_{n-k}^{(2k+b+c+d+1,a)} (\mathbf{x}) (1 - \mathbf{x})^k \tilde{P}\_k^{(c,b)} \left( \frac{\mathbf{y}}{1 - \mathbf{x}} \right), \end{split} \tag{1}$$

where $a, b, c > -1$, $n$ and $k$ are integers such that $n \ge k \ge 0$, $P_k^{(a,b)}(x)$ is the Jacobi polynomial of degree $k$ [7, Table 18.3.1], and $\tilde P_k^{(a,b)}$ is the Jacobi polynomial of degree $k$ shifted to have support on $(0,1)$. Koornwinder's construction derives the polynomials with $d = 0$, which we denote by $P_{n,k}^{(a,b,c)}$. The polynomials in (1)

S. Olver (✉)
Department of Mathematics, Imperial College, London, UK
e-mail: s.olver@imperial.ac.uk

A. Townsend
Department of Mathematics, Cornell University, Ithaca, NY, USA
e-mail: townsend@cornell.edu

G. M. Vasil
School of Mathematics and Statistics, University of Sydney, Sydney, NSW, Australia
e-mail: geoffrey.vasil@sydney.edu.au

S. J. Sherwin et al. (eds.), *Spectral and High Order Methods for Partial Differential Equations ICOSAHOM 2018*, Lecture Notes in Computational Science and Engineering 134, https://doi.org/10.1007/978-3-030-39647-3\_5

are orthogonal on the right-angled triangle $\{(x,y) : 0 < x < 1,\ 0 < y < 1-x\}$ with respect to the weight function $w^{a,b,c,d}(x,y) = x^a y^b (1-x-y)^c (1-x)^d$.
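This orthogonality is easy to check numerically. The following sketch evaluates (1) with SciPy's Jacobi routines and integrates over the triangle via the substitution $y = (1-x)t$; the integer parameter values are an illustrative choice that makes tensor Gauss-Legendre quadrature exact:

```python
import numpy as np
from scipy.special import eval_jacobi

# Numerical check of (1): evaluate members of P_{n,k}^{(a,b,c,d)} on the
# triangle and verify their mutual orthogonality under the weight
# w^{a,b,c,d}(x, y) = x^a y^b (1-x-y)^c (1-x)^d, after mapping the triangle
# to the unit square via y = (1-x) t.

def P_triangle(n, k, a, b, c, d, x, y):
    return (eval_jacobi(n - k, 2 * k + b + c + d + 1, a, 2 * x - 1)
            * (1 - x) ** k
            * eval_jacobi(k, c, b, -1 + 2 * y / (1 - x)))

a, b, c, d = 1.0, 0.0, 2.0, 1.0            # integer parameters: quadrature exact
xg, wg = np.polynomial.legendre.leggauss(40)
xq, wq = 0.5 * (xg + 1), 0.5 * wg          # Gauss rule moved to (0, 1)
X, T = np.meshgrid(xq, xq, indexing="ij")
Y = (1 - X) * T                            # y = (1-x) t, Jacobian (1-x)
W2 = np.outer(wq, wq) * (1 - X)
weight = X**a * Y**b * (1 - X - Y)**c * (1 - X)**d

def inner(p, q):
    return np.sum(W2 * weight * p * q)

P10 = P_triangle(1, 0, a, b, c, d, X, Y)
P11 = P_triangle(1, 1, a, b, c, d, X, Y)
P21 = P_triangle(2, 1, a, b, c, d, X, Y)
off = abs(inner(P10, P11)) + abs(inner(P10, P21)) + abs(inner(P11, P21))
```

The pairwise inner products vanish to machine precision, while the norms are nonzero, as the weighted orthogonality on the triangle predicts.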

The basis $P_{n,k}^{(a,b,c)}$ has been used extensively by the spectral element community; see the overview in [3]. The recurrence relations we derive can be employed to reduce partial differential operators to sparse matrices, enabling the efficient solution of linear partial differential equations defined on triangles, which will be the topic of a future paper. This is analogous to the ultraspherical spectral method for solving ordinary differential equations on bounded intervals [8]. A similar idea using a hierarchy of Zernike polynomials, which are bivariate orthogonal polynomials on the unit disk, is used in [13] to develop a sparse spectral method for solving partial differential equations defined on the disk. On the disk, polar coordinates allow radially symmetric partial differential operators to be reduced to ordinary differential operators acting on Jacobi polynomials [13]. This simplification does not translate to non-radially symmetric partial differential operators on the disk, nor to partial differential operators on the triangle.

Several of the formulae in this paper have already been derived by directly employing recurrence relations satisfied by Jacobi polynomials [14]. Our approach via ladder operators is a more systematic study that derives previously unreported recurrence relations for $P_{n,k}^{(a,b,c)}$. We also hope to use ladder operators to derive sparse recurrence relations for multivariate orthogonal polynomials built from Jacobi polynomials on higher-dimensional simplices.

Throughout this paper, the recurrence relations hold for choices of the parameters $n$, $k$, $a$, $b$, $c$, and $d$ that make the Jacobi polynomials well-defined. Moreover, we take $P_{-1}^{(a,b)}(x) = 0$. Also, note that orthogonal polynomials remain orthogonal after an affine transformation, so the recurrence relations in this paper for (1) on a right-angled triangle extend to any triangle, including triangles with the corners permuted.

The paper is structured as follows. In the next section, we give 12 ladder operators for Jacobi polynomials and use them to derive sparse recurrence relations for $P_n^{(a,b)}$. In Sect. 3 we give 24 ladder operators for (1) and write down the corresponding sparse recurrence relations for $P_{n,k}^{(a,b,c,d)}$. In Sect. 4, we use the ladder operators to derive a collection of sparse recurrence relations for differentiation, conversion, and multiplication that are satisfied by $P_{n,k}^{(a,b,c)}$. Section 5 applies these sparse recurrence relations to the efficient calculation of Laplacians of functions on the triangle.

#### **2 Ladder Operators for Jacobi Polynomials**

We give 12 ordinary differential operators that increment or decrement the parameters and degree of Jacobi polynomials by zero or one. Each ladder operator maps $P_n^{(a,b)}(x)$ to $P_{\tilde n}^{(\tilde a,\tilde b)}(x)$, where $|\tilde n - n| \le 1$, $|\tilde a - a| \le 1$, and $|\tilde b - b| \le 1$.

**Definition 1** The following operators are ladder operators for Jacobi polynomials:

$$\begin{aligned}
\mathcal{L}_1 u &= \frac{du}{dx} & \mathcal{L}_1^\dagger u &= ((1+x)a - (1-x)b)u - (1-x^2)\frac{du}{dx} \\
\mathcal{L}_2 u &= (a+b+n+1)u + (1+x)\frac{du}{dx} & \mathcal{L}_2^\dagger u &= (2a + (1-x)n)u - (1-x^2)\frac{du}{dx} \\
\mathcal{L}_3 u &= (a+b+n+1)u - (1-x)\frac{du}{dx} & \mathcal{L}_3^\dagger u &= (2b + (1+x)n)u + (1-x^2)\frac{du}{dx} \\
\mathcal{L}_4 u &= ((1+x)a - (1-x)(b+n+1))u - (1-x^2)\frac{du}{dx} & \mathcal{L}_4^\dagger u &= (1+x)\frac{du}{dx} - nu \\
\mathcal{L}_5 u &= ((1+x)(a+n+1) - (1-x)b)u - (1-x^2)\frac{du}{dx} & \mathcal{L}_5^\dagger u &= (1-x)\frac{du}{dx} + nu \\
\mathcal{L}_6 u &= bu + (1+x)\frac{du}{dx} & \mathcal{L}_6^\dagger u &= au - (1-x)\frac{du}{dx}
\end{aligned}$$

The notation for the ladder operators is chosen so that $\mathcal{L}_s^\dagger \mathcal{L}_s P_n^{(a,b)}$ and $\mathcal{L}_s \mathcal{L}_s^\dagger P_n^{(a,b)}$ are scalar multiples of $P_n^{(a,b)}$ for $1 \leq s \leq 6$. These ladder operators are carefully constructed to give rise to sparse recurrence relations for Jacobi polynomials.

**Lemma 1** *The ladder operators give sparse recurrence relations for Jacobi polynomials:*

$$\begin{aligned} \mathcal{L}\_1 P\_n^{(a,b)} &= \frac{1}{2} (n+a+b+1) P\_{n-1}^{(a+1,b+1)} & \mathcal{L}\_1^\dagger P\_n^{(a,b)} &= 2(n+1) P\_{n+1}^{(a-1,b-1)} \\ \mathcal{L}\_2 P\_n^{(a,b)} &= (n+a+b+1) P\_n^{(a+1,b)} & \mathcal{L}\_2^\dagger P\_n^{(a,b)} &= 2(n+a) P\_n^{(a-1,b)} \\ \mathcal{L}\_3 P\_n^{(a,b)} &= (n+a+b+1) P\_n^{(a,b+1)} & \mathcal{L}\_3^\dagger P\_n^{(a,b)} &= 2(n+b) P\_n^{(a,b-1)} \\ \mathcal{L}\_4 P\_n^{(a,b)} &= 2(n+1) P\_{n+1}^{(a-1,b)} & \mathcal{L}\_4^\dagger P\_n^{(a,b)} &= (n+b) P\_{n-1}^{(a+1,b)} \\ \mathcal{L}\_5 P\_n^{(a,b)} &= 2(n+1) P\_{n+1}^{(a,b-1)} & \mathcal{L}\_5^\dagger P\_n^{(a,b)} &= (n+a) P\_{n-1}^{(a,b+1)} \\ \mathcal{L}\_6 P\_n^{(a,b)} &= (n+b) P\_n^{(a+1,b-1)} & \mathcal{L}\_6^\dagger P\_n^{(a,b)} &= (n+a) P\_n^{(a-1,b+1)} \end{aligned}$$

*Proof* The relationship for $\mathcal{L}_1$ is a formula for the derivative of $P_n^{(a,b)}(x)$ [7, 18.9.15] and the relationship for $\mathcal{L}_1^\dagger$ is equivalent to [7, 18.9.16]. Six more follow from expressing the left- and right-hand sides in terms of ${}_2F_1$ functions using [7, 18.5.7] and the reflection formula $P_n^{(a,b)}(x) = (-1)^n P_n^{(b,a)}(-x)$: $\mathcal{L}_4^\dagger$ and $\mathcal{L}_5^\dagger$ are equivalent to [7, 15.5.3], $\mathcal{L}_6^\dagger$ is equivalent to [7, 15.5.4], $\mathcal{L}_4$ and $\mathcal{L}_5$ are equivalent to [7, 15.5.5], and $\mathcal{L}_6$ is equivalent to [7, 15.5.6].

**Fig. 1** Illustration of the 12 ladder operators for Jacobi polynomials in Definition 1

The relationship for $\mathcal{L}_2$ follows from combining [7, 18.9.5] and [7, 18.9.6]. The relationship for $\mathcal{L}_2^\dagger$ follows by writing

$$\mathcal{L}\_2^\dagger = \mathcal{L}\_1^\dagger + (n+a+b)(1-x)$$

and then using [7, 18.9.5] and [7, 18.9.6]. Finally, $\mathcal{L}_3$ and $\mathcal{L}_3^\dagger$ follow just as $\mathcal{L}_2$ and $\mathcal{L}_2^\dagger$, using the reflection formula.
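As a quick sanity check, two of the relations in Lemma 1 can be verified numerically. The sketch below is an illustration only (not part of the paper's toolchain): it assumes SciPy's `eval_jacobi(n, a, b, x)` for $P_n^{(a,b)}(x)$ and uses the derivative formula [7, 18.9.15].

```python
# Numerical spot-check of the L2 and L6 relations in Lemma 1.
# Assumes SciPy's eval_jacobi(n, a, b, x) = P_n^{(a,b)}(x).
import numpy as np
from scipy.special import eval_jacobi

def djacobi(n, a, b, x):
    # [7, 18.9.15]: d/dx P_n^{(a,b)}(x) = (n+a+b+1)/2 * P_{n-1}^{(a+1,b+1)}(x)
    return 0.5 * (n + a + b + 1) * eval_jacobi(n - 1, a + 1, b + 1, x)

n, a, b = 4, 0.5, 1.5
x = np.linspace(-0.9, 0.9, 7)

# L2 u = (a+b+n+1)u + (1+x)u'  should equal  (n+a+b+1) P_n^{(a+1,b)}
L2 = (a + b + n + 1) * eval_jacobi(n, a, b, x) + (1 + x) * djacobi(n, a, b, x)
err2 = np.max(np.abs(L2 - (n + a + b + 1) * eval_jacobi(n, a + 1, b, x)))

# L6 u = b u + (1+x)u'  should equal  (n+b) P_n^{(a+1,b-1)}
L6 = b * eval_jacobi(n, a, b, x) + (1 + x) * djacobi(n, a, b, x)
err6 = np.max(np.abs(L6 - (n + b) * eval_jacobi(n, a + 1, b - 1, x)))
print(err2, err6)  # both at machine-precision level
```

The same pattern checks any of the 12 relations once the derivative is available in closed form.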

*Remark 1* We note that the first-order differential operators occurring in Lemma 1 together form an action of the Lie algebra $\mathfrak{sl}(4)$ [1, 5, 6].

Figure 1 illustrates the ladder operators and how they increment or decrement the parameters associated to a Jacobi polynomial.

The ladder operators can be easily adapted to the shifted Jacobi polynomials, denoted by $\tilde{P}_k^{(a,b)}$, which are supported on $(0,1)$, with $x$ in place of $1+x$, and no factors of 2. The corresponding recurrence relations for $\tilde{P}_k^{(a,b)}$ are the same as in Lemma 1, except the multiplicative factors of $\frac{1}{2}$ and 2 are replaced by 1.

#### **3 Ladder Operators for** $P_{n,k}^{(a,b,c,d)}$

The 12 ladder operators for the Jacobi polynomials in Sect. 2 allow us to derive 24 ladder operators for $P_{n,k}^{(a,b,c,d)}$. The ladder operators are carefully defined so that they map $P_{n,k}^{(a,b,c,d)}$ to a scalar multiple of $P_{\tilde{n},\tilde{k}}^{(\tilde{a},\tilde{b},\tilde{c},\tilde{d})}$, where the new parameters $\tilde{n}$, $\tilde{k}$, $\tilde{a}$, $\tilde{b}$, $\tilde{c}$, and $\tilde{d}$ are $n$, $k$, $a$, $b$, $c$, and $d$, respectively, incremented or decremented by 0 or 1.

To highlight the symmetries of the right-angled triangle and make the recurrences more convenient to write down, we define

$$z := 1 - x - y \qquad \text{and} \qquad \frac{\partial}{\partial z} := \frac{\partial}{\partial y} - \frac{\partial}{\partial x},$$

as in [14]. Now, the variables *x*, *y*, and *z* have the convenient property that any affine transformation that maps the triangle onto itself has the effect of exchanging the roles of *x*, *y*, and *z*.

**Definition 2** The following operators are ladder operators for $P_{n,k}^{(a,b,c,d)}$. The first set of 12 are:

$$\begin{aligned}
\mathcal{M}_{0,1} u &= \frac{\partial u}{\partial y} & \mathcal{M}_{0,1}^\dagger u &= (yc - zb)u - yz\frac{\partial u}{\partial y} \\
\mathcal{M}_{0,2} u &= (k+b+c+1)u + y\frac{\partial u}{\partial y} & \mathcal{M}_{0,2}^\dagger u &= \left(c + k - \frac{yk}{1-x}\right)u - \frac{yz}{1-x}\frac{\partial u}{\partial y} \\
\mathcal{M}_{0,3} u &= (k+b+c+1)u - z\frac{\partial u}{\partial y} & \mathcal{M}_{0,3}^\dagger u &= \left(b + \frac{yk}{1-x}\right)u + \frac{yz}{1-x}\frac{\partial u}{\partial y} \\
\mathcal{M}_{0,4} u &= (yc - z(b+k+1))u - yz\frac{\partial u}{\partial y} & \mathcal{M}_{0,4}^\dagger u &= -\frac{k}{1-x}u + \frac{y}{1-x}\frac{\partial u}{\partial y} \\
\mathcal{M}_{0,5} u &= (y(c+k+1) - zb)u - yz\frac{\partial u}{\partial y} & \mathcal{M}_{0,5}^\dagger u &= \frac{k}{1-x}u + \frac{z}{1-x}\frac{\partial u}{\partial y} \\
\mathcal{M}_{0,6} u &= cu - z\frac{\partial u}{\partial y} & \mathcal{M}_{0,6}^\dagger u &= bu + y\frac{\partial u}{\partial y}.
\end{aligned}$$

The second set of 12 are:

$$\begin{aligned}
\mathcal{M}_{1,0} u &= \frac{k}{1-x}u + \frac{\partial u}{\partial x} - \frac{y}{1-x}\frac{\partial u}{\partial y} \\
\mathcal{M}_{1,0}^\dagger u &= (x(k+a+b+c+d+1) - a)u - x(1-x)\frac{\partial u}{\partial x} + xy\frac{\partial u}{\partial y} \\
\mathcal{M}_{2,0} u &= (n+k+a+b+c+d+2)u + \frac{xk}{1-x}u + x\frac{\partial u}{\partial x} - \frac{xy}{1-x}\frac{\partial u}{\partial y} \\
\mathcal{M}_{2,0}^\dagger u &= (n+k+b+c+d+1 - xn)u - x(1-x)\frac{\partial u}{\partial x} + xy\frac{\partial u}{\partial y} \\
\mathcal{M}_{3,0} u &= (n+a+b+c+d+2)u - (1-x)\frac{\partial u}{\partial x} + y\frac{\partial u}{\partial y} \\
\mathcal{M}_{3,0}^\dagger u &= (a + xn)u + x(1-x)\frac{\partial u}{\partial x} - xy\frac{\partial u}{\partial y} \\
\mathcal{M}_{4,0} u &= (x(n+a+b+c+d+2) - a - n + k - 1)u - x(1-x)\frac{\partial u}{\partial x} + xy\frac{\partial u}{\partial y} \\
\mathcal{M}_{4,0}^\dagger u &= \frac{k}{1-x}u - nu + x\frac{\partial u}{\partial x} - \frac{xy}{1-x}\frac{\partial u}{\partial y} \\
\mathcal{M}_{5,0} u &= nu + (1-x)\frac{\partial u}{\partial x} - y\frac{\partial u}{\partial y} \\
\mathcal{M}_{5,0}^\dagger u &= (x(n+a+b+c+d+2) - a)u - x(1-x)\frac{\partial u}{\partial x} + xy\frac{\partial u}{\partial y} \\
\mathcal{M}_{6,0} u &= au + \frac{xk}{1-x}u + x\frac{\partial u}{\partial x} - \frac{xy}{1-x}\frac{\partial u}{\partial y} \\
\mathcal{M}_{6,0}^\dagger u &= (k+b+c+d+1)u - (1-x)\frac{\partial u}{\partial x} + y\frac{\partial u}{\partial y}.
\end{aligned}$$

The notation for the ladder operators is chosen so that the recurrence relations in Theorem 1 are derived for $\mathcal{M}_{s,0}$ (resp. $\mathcal{M}_{0,s}$) by applying $\mathcal{L}_s$ or $\mathcal{L}_s^\dagger$ to the first (resp. second) Jacobi polynomial in $P_{n,k}^{(a,b,c,d)}(x,y)$ for $1 \leq s \leq 6$. Moreover, we know that $\mathcal{M}_{s,0}^\dagger \mathcal{M}_{s,0} P_{n,k}^{(a,b,c,d)}$, $\mathcal{M}_{0,s}^\dagger \mathcal{M}_{0,s} P_{n,k}^{(a,b,c,d)}$, $\mathcal{M}_{s,0} \mathcal{M}_{s,0}^\dagger P_{n,k}^{(a,b,c,d)}$, and $\mathcal{M}_{0,s} \mathcal{M}_{0,s}^\dagger P_{n,k}^{(a,b,c,d)}$ are scalar multiples of $P_{n,k}^{(a,b,c,d)}$ for $1 \leq s \leq 6$.

The ladder operators in Definition 2 correspond to 24 sparse recurrence relations for $P_{n,k}^{(a,b,c,d)}$. To derive these recurrences, we first express the partial derivatives of $P_{n,k}^{(a,b,c,d)}$ as derivatives of shifted Jacobi polynomials.

**Proposition 1** *The following relationships hold:*

$$
\tilde{P}_{n-k}^{(2k+b+c+d+1,a)}(x)(1-x)^k \left[\tilde{P}_k^{(c,b)}\right]'\left(\frac{y}{1-x}\right) = (1-x)\frac{\partial}{\partial y}P_{n,k}^{(a,b,c,d)}(x,y),\tag{2}
$$

$$\left[\tilde{P}_{n-k}^{(2k+b+c+d+1,a)}\right]'(x)(1-x)^{k+1}\tilde{P}_k^{(c,b)}\left(\frac{y}{1-x}\right) = \left(k + (1-x)\frac{\partial}{\partial x} - y\frac{\partial}{\partial y}\right)P_{n,k}^{(a,b,c,d)}(x,y). \tag{3}$$

*Proof* The first relationship is immediate. The second relationship follows from the chain rule:

$$\begin{aligned} (1-x)\frac{\partial}{\partial x}\left(f(x)(1-x)^k g\left(\frac{y}{1-x}\right)\right) &= f'(x)(1-x)^{k+1}g\left(\frac{y}{1-x}\right) \\ &\quad - kf(x)(1-x)^k g\left(\frac{y}{1-x}\right) + yf(x)(1-x)^{k-1}g'\left(\frac{y}{1-x}\right) \end{aligned}$$

and an application of (2) to simplify the last term.

The 24 sparse recurrence relations for $P_{n,k}^{(a,b,c,d)}$ are given in the following theorem.

**Theorem 1** *Let $t = a + b + c + d$. The first set of 12 are:*

$$\begin{aligned}
\mathcal{M}_{0,1} P_{n,k}^{(a,b,c,d)} &= (k+b+c+1)P_{n-1,k-1}^{(a,b+1,c+1,d)} & \mathcal{M}_{0,1}^\dagger P_{n,k}^{(a,b,c,d)} &= (k+1)P_{n+1,k+1}^{(a,b-1,c-1,d)} \\
\mathcal{M}_{0,2} P_{n,k}^{(a,b,c,d)} &= (k+b+c+1)P_{n,k}^{(a,b,c+1,d-1)} & \mathcal{M}_{0,2}^\dagger P_{n,k}^{(a,b,c,d)} &= (k+c)P_{n,k}^{(a,b,c-1,d+1)} \\
\mathcal{M}_{0,3} P_{n,k}^{(a,b,c,d)} &= (k+b+c+1)P_{n,k}^{(a,b+1,c,d-1)} & \mathcal{M}_{0,3}^\dagger P_{n,k}^{(a,b,c,d)} &= (k+b)P_{n,k}^{(a,b-1,c,d+1)} \\
\mathcal{M}_{0,4} P_{n,k}^{(a,b,c,d)} &= (k+1)P_{n+1,k+1}^{(a,b,c-1,d-1)} & \mathcal{M}_{0,4}^\dagger P_{n,k}^{(a,b,c,d)} &= (k+b)P_{n-1,k-1}^{(a,b,c+1,d+1)} \\
\mathcal{M}_{0,5} P_{n,k}^{(a,b,c,d)} &= (k+1)P_{n+1,k+1}^{(a,b-1,c,d-1)} & \mathcal{M}_{0,5}^\dagger P_{n,k}^{(a,b,c,d)} &= (k+c)P_{n-1,k-1}^{(a,b+1,c,d+1)} \\
\mathcal{M}_{0,6} P_{n,k}^{(a,b,c,d)} &= (k+c)P_{n,k}^{(a,b+1,c-1,d)} & \mathcal{M}_{0,6}^\dagger P_{n,k}^{(a,b,c,d)} &= (k+b)P_{n,k}^{(a,b-1,c+1,d)}.
\end{aligned}$$

*The second set of 12 are:*

$$\begin{aligned}
\mathcal{M}_{1,0} P_{n,k}^{(a,b,c,d)} &= (n+k+t+2)P_{n-1,k}^{(a+1,b,c,d+1)} & \mathcal{M}_{1,0}^\dagger P_{n,k}^{(a,b,c,d)} &= (n-k+1)P_{n+1,k}^{(a-1,b,c,d-1)} \\
\mathcal{M}_{2,0} P_{n,k}^{(a,b,c,d)} &= (n+k+t+2)P_{n,k}^{(a,b,c,d+1)} & \mathcal{M}_{2,0}^\dagger P_{n,k}^{(a,b,c,d)} &= (n+k+t-a+1)P_{n,k}^{(a,b,c,d-1)} \\
\mathcal{M}_{3,0} P_{n,k}^{(a,b,c,d)} &= (n+k+t+2)P_{n,k}^{(a+1,b,c,d)} & \mathcal{M}_{3,0}^\dagger P_{n,k}^{(a,b,c,d)} &= (n-k+a)P_{n,k}^{(a-1,b,c,d)} \\
\mathcal{M}_{4,0} P_{n,k}^{(a,b,c,d)} &= (n-k+1)P_{n+1,k}^{(a,b,c,d-1)} & \mathcal{M}_{4,0}^\dagger P_{n,k}^{(a,b,c,d)} &= (n-k+a)P_{n-1,k}^{(a,b,c,d+1)} \\
\mathcal{M}_{5,0} P_{n,k}^{(a,b,c,d)} &= (n+k+t-a+1)P_{n-1,k}^{(a+1,b,c,d)} & \mathcal{M}_{5,0}^\dagger P_{n,k}^{(a,b,c,d)} &= (n-k+1)P_{n+1,k}^{(a-1,b,c,d)} \\
\mathcal{M}_{6,0} P_{n,k}^{(a,b,c,d)} &= (n-k+a)P_{n,k}^{(a-1,b,c,d+1)} & \mathcal{M}_{6,0}^\dagger P_{n,k}^{(a,b,c,d)} &= (n+k+t-a+1)P_{n,k}^{(a+1,b,c,d-1)}.
\end{aligned}$$

*Proof* We present the proof of $\mathcal{M}_{0,1} P_{n,k}^{(a,b,c,d)} = (k+b+c+1)P_{n-1,k-1}^{(a,b+1,c+1,d)}$. By the definition of $P_{n,k}^{(a,b,c,d)}$ in (1), the chain rule, and the relationship in (2), we have

$$\begin{split} \frac{\partial}{\partial y} P_{n,k}^{(a,b,c,d)}(x,y) &= \tilde{P}_{n-k}^{(2k+b+c+d+1,a)}(x)(1-x)^{k-1}\left[\tilde{P}_k^{(c,b)}\right]'\left(\frac{y}{1-x}\right) \\ &= (k+c+b+1)\tilde{P}_{n-k}^{(2k+b+c+d+1,a)}(x)(1-x)^{k-1}\tilde{P}_{k-1}^{(c+1,b+1)}\left(\frac{y}{1-x}\right), \end{split} \tag{4}$$

where the last equality comes from applying $\mathcal{L}_1$ in Definition 1. The final expression in (4) is equivalent to $(k+b+c+1)P_{n-1,k-1}^{(a,b+1,c+1,d)}$. The manipulations for the remaining recurrence relations are similar, except with different choices of the operators $\mathcal{L}_s$ or $\mathcal{L}_s^\dagger$ and combinations of (2) and (3).
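To make the ladder relations concrete, here is a finite-difference spot-check of the $\mathcal{M}_{0,1}$ relation in Theorem 1. It is an illustrative sketch: it assumes SciPy's `eval_jacobi` and the convention $\tilde{P}_n^{(a,b)}(x) = P_n^{(a,b)}(2x-1)$ for the shifted polynomials, and the helper name `Ptri` is ours, not the paper's.

```python
# Finite-difference check of M_{0,1} P = dP/dy = (k+b+c+1) P_{n-1,k-1}^{(a,b+1,c+1,d)}.
import numpy as np
from scipy.special import eval_jacobi

def Ptri(n, k, a, b, c, d, x, y):
    # P_{n,k}^{(a,b,c,d)}(x,y) from (1), with shifted Jacobi P~(x) = P(2x-1)
    w = y / (1.0 - x)
    return (eval_jacobi(n - k, 2 * k + b + c + d + 1, a, 2 * x - 1)
            * (1.0 - x) ** k * eval_jacobi(k, c, b, 2 * w - 1))

n, k, a, b, c, d = 4, 2, 1, 1, 2, 1
x, y, h = 0.3, 0.2, 1e-6
dPdy = (Ptri(n, k, a, b, c, d, x, y + h) - Ptri(n, k, a, b, c, d, x, y - h)) / (2 * h)
rhs = (k + b + c + 1) * Ptri(n - 1, k - 1, a, b + 1, c + 1, d, x, y)
print(abs(dPdy - rhs))  # limited only by the finite-difference step
```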

#### **4 Sparse Recurrence Relations for** $P_{n,k}^{(a,b,c)}$

We can combine the ladder operators in Sect. 3 to derive sparse recurrence relations between $P_{n,k}^{(a,b,c)}$ polynomials with different parameters and their partial derivatives. These recurrence relations are analogous to many of the sparse recurrence relations for Jacobi polynomials [7, §18.9].

#### *4.1 Differentiation*

The partial derivatives of $P_{n,k}^{(a,b,c)}$ can be written in terms of Jacobi polynomials on the triangle with incremented parameters, which is analogous to a recurrence relation for the derivative of a Jacobi polynomial [7, 18.9.15]. A similar recurrence for $P_{n,k}^{(a,b,c)}$ can be found in [14, Prop. 4.6, 4.7, & 4.8].

**Corollary 1** *The following recurrence relations hold:*

$$\begin{aligned} (2k+b+c+1)\frac{\partial}{\partial x}P\_{n,k}^{(a,b,c)} &= (n+k+a+b+c+2)(k+b+c+1)P\_{n-1,k}^{(a+1,b,c+1)} \\ &+ (k+b)(n+k+b+c+1)P\_{n-1,k-1}^{(a+1,b,c+1)}, \end{aligned} \tag{5}$$

$$\frac{\partial}{\partial y}P\_{n,k}^{(a,b,c)} = (k+b+c+1)P\_{n-1,k-1}^{(a,b+1,c+1)},\tag{6}$$

$$\begin{split} (2k+b+c+1)\frac{\partial}{\partial z}P\_{n,k}^{(a,b,c)} &= -(n+k+a+b+c+2)(k+b+c+1)P\_{n-1,k}^{(a+1,b+1,c)} \\ &+ (k+c)(n+k+b+c+1)P\_{n-1,k-1}^{(a+1,b+1,c)} .\end{split} \tag{7}$$

*Proof* The recurrence (5) follows from the fact that $(\mathcal{M}_{1,0}\mathcal{M}_{0,2} + \mathcal{M}_{0,4}^\dagger\mathcal{M}_{6,0}^\dagger)u = (2k+b+c+1)\frac{\partial u}{\partial x}$ when $d = 0$. The relationship (6) is equivalent to the relation given by $\mathcal{M}_{0,1}$ in Theorem 1 when $d = 0$. Finally, (7) follows from the fact that $(\mathcal{M}_{1,0}\mathcal{M}_{0,3} - \mathcal{M}_{0,5}^\dagger\mathcal{M}_{6,0}^\dagger)u = -(2k+b+c+1)\frac{\partial u}{\partial z}$.
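A finite-difference check of (5), using the same assumed `eval_jacobi`-based evaluation of $P_{n,k}^{(a,b,c)} = P_{n,k}^{(a,b,c,0)}$ as before (illustrative only):

```python
# Check (5): (2k+b+c+1) dP/dx = (n+k+a+b+c+2)(k+b+c+1) P_{n-1,k}^{(a+1,b,c+1)}
#                             + (k+b)(n+k+b+c+1) P_{n-1,k-1}^{(a+1,b,c+1)}.
import numpy as np
from scipy.special import eval_jacobi

def Ptri(n, k, a, b, c, x, y):
    # P_{n,k}^{(a,b,c)}(x,y), i.e. d = 0 in (1), with P~(x) = P(2x-1)
    w = y / (1.0 - x)
    return (eval_jacobi(n - k, 2 * k + b + c + 1, a, 2 * x - 1)
            * (1.0 - x) ** k * eval_jacobi(k, c, b, 2 * w - 1))

n, k, a, b, c = 4, 2, 1, 1, 1
x, y, h = 0.3, 0.2, 1e-6
lhs = (2 * k + b + c + 1) * (Ptri(n, k, a, b, c, x + h, y)
                             - Ptri(n, k, a, b, c, x - h, y)) / (2 * h)
rhs = ((n + k + a + b + c + 2) * (k + b + c + 1) * Ptri(n - 1, k, a + 1, b, c + 1, x, y)
       + (k + b) * (n + k + b + c + 1) * Ptri(n - 1, k - 1, a + 1, b, c + 1, x, y))
print(abs(lhs - rhs))
```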

The derivatives of weighted versions of $P_{n,k}^{(a,b,c)}$ also satisfy sparse recurrence relations, which are analogous to an expression for the derivative of a weighted Jacobi polynomial [7, 18.9.16].

**Corollary 2** *The following recurrence relations hold:*

$$\begin{aligned}
-(2k+b+c+1)\frac{\partial}{\partial x}\left(x^a y^b z^c P_{n,k}^{(a,b,c)}\right) &= x^{a-1} y^b z^{c-1}\Big((k+c)(n-k+1)P_{n+1,k}^{(a-1,b,c-1)} \\
&\qquad + (k+1)(n-k+a)P_{n+1,k+1}^{(a-1,b,c-1)}\Big), \\
\frac{\partial}{\partial y}\left(x^a y^b z^c P_{n,k}^{(a,b,c)}\right) &= -(k+1)\,x^a y^{b-1} z^{c-1} P_{n+1,k+1}^{(a,b-1,c-1)}, \\
(2k+b+c+1)\frac{\partial}{\partial z}\left(x^a y^b z^c P_{n,k}^{(a,b,c)}\right) &= x^{a-1} y^{b-1} z^c\Big((k+b)(n-k+1)P_{n+1,k}^{(a-1,b-1,c)} \\
&\qquad - (k+1)(n-k+a)P_{n+1,k+1}^{(a-1,b-1,c)}\Big).
\end{aligned}$$

*Proof* The first recurrence follows from

$$\begin{aligned} (\mathcal{M}_{0,2}^\dagger \mathcal{M}_{1,0}^\dagger + \mathcal{M}_{0,4}\mathcal{M}_{6,0})u &= (2k+b+c+1)\left(cx - az - xz\frac{\partial}{\partial x}\right)u \\ &= -(2k+b+c+1)\,x^{1-a}z^{1-c}\frac{\partial}{\partial x}\left(x^a z^c u\right). \end{aligned}$$

The second recurrence holds since

$$\mathcal{M}_{0,1}^\dagger u = \left(cy - bz - yz\frac{\partial}{\partial y}\right)u = -y^{1-b}z^{1-c}\frac{\partial}{\partial y}\left(y^b z^c u\right).$$

The third recurrence is derived from the fact that

$$\begin{split} (\mathcal{M}_{0,3}^\dagger\mathcal{M}_{1,0}^\dagger - \mathcal{M}_{0,5}\mathcal{M}_{6,0})u &= (2k+b+c+1)\left(bx - ay + xy\left(\frac{\partial}{\partial y} - \frac{\partial}{\partial x}\right)\right)u \\ &= (2k+b+c+1)\,x^{1-a}y^{1-b}z^{-c}\frac{\partial}{\partial z}\left(x^a y^b z^c u\right). \end{split}$$

#### *4.2 Conversion*

Recurrence relations for conversion allow us to express $P_{n,k}^{(a,b,c)}$ in terms of Jacobi polynomials on the triangle with different parameters. Here, we give the recurrence relations that increment the parameters, which are analogues of [7, 18.9.3]. Similar relations can be found in [14, Prop. 4.4].

**Corollary 3** *The following recurrence relations hold:*

$$\begin{split}
(2n+a+b+c+2)P_{n,k}^{(a,b,c)} &= (n+k+a+b+c+2)P_{n,k}^{(a+1,b,c)} \\
&\quad + (n+k+b+c+1)P_{n-1,k}^{(a+1,b,c)},
\end{split} \tag{8}$$

$$\begin{split}
(2n+a+b+c+2)(2k+b+c+1)P_{n,k}^{(a,b,c)} &= (n+k+a+b+c+2)(k+b+c+1)P_{n,k}^{(a,b+1,c)} \\
&\quad - (n-k+a)(k+b+c+1)P_{n-1,k}^{(a,b+1,c)} \\
&\quad + (k+c)(n+k+b+c+1)P_{n-1,k-1}^{(a,b+1,c)} \\
&\quad - (k+c)(n-k+1)P_{n,k-1}^{(a,b+1,c)},
\end{split} \tag{9}$$

$$\begin{split}
(2n+a+b+c+2)(2k+b+c+1)P_{n,k}^{(a,b,c)} &= (n+k+a+b+c+2)(k+b+c+1)P_{n,k}^{(a,b,c+1)} \\
&\quad - (n-k+a)(k+b+c+1)P_{n-1,k}^{(a,b,c+1)} \\
&\quad - (k+b)(n+k+b+c+1)P_{n-1,k-1}^{(a,b,c+1)} \\
&\quad + (k+b)(n-k+1)P_{n,k-1}^{(a,b,c+1)}.
\end{split} \tag{10}$$

*Proof* The recurrence relation in (8) follows from the fact that $(\mathcal{M}_{3,0} + \mathcal{M}_{5,0})u = (2n+a+b+c+2)u$ when $d = 0$. Since $(\mathcal{M}_{2,0} - \mathcal{M}_{4,0}^\dagger)u = (2n+a+b+c+d+2)u$ and $(\mathcal{M}_{2,0}^\dagger - \mathcal{M}_{4,0})u = (2n+a+b+c+d+2)(1-x)u$, we obtain

$$(\mathcal{M}_{0,2}\mathcal{M}_{2,0} - \mathcal{M}_{0,2}\mathcal{M}_{4,0}^\dagger - \mathcal{M}_{0,4}^\dagger\mathcal{M}_{2,0}^\dagger + \mathcal{M}_{0,4}^\dagger\mathcal{M}_{4,0})u = (2k+b+c+1)(2n+a+b+c+d+2)u.$$

The recurrence relation (10) immediately follows. Similarly, (9) holds since

$$(\mathcal{M}_{0,3}\mathcal{M}_{2,0} - \mathcal{M}_{0,3}\mathcal{M}_{4,0}^\dagger + \mathcal{M}_{0,5}^\dagger\mathcal{M}_{2,0}^\dagger - \mathcal{M}_{0,5}^\dagger\mathcal{M}_{4,0})u = (2k+b+c+1)(2n+a+b+c+d+2)u.$$
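The conversion relation (8) involves no derivatives, so it can be checked by direct evaluation, again under the assumed `eval_jacobi`-based sketch of $P_{n,k}^{(a,b,c)}$ (the helper `Ptri` is illustrative):

```python
# Check (8): (2n+a+b+c+2) P_{n,k}^{(a,b,c)} = (n+k+a+b+c+2) P_{n,k}^{(a+1,b,c)}
#                                           + (n+k+b+c+1) P_{n-1,k}^{(a+1,b,c)}.
import numpy as np
from scipy.special import eval_jacobi

def Ptri(n, k, a, b, c, x, y):
    # P_{n,k}^{(a,b,c)}(x,y), i.e. d = 0 in (1), with P~(x) = P(2x-1)
    w = y / (1.0 - x)
    return (eval_jacobi(n - k, 2 * k + b + c + 1, a, 2 * x - 1)
            * (1.0 - x) ** k * eval_jacobi(k, c, b, 2 * w - 1))

n, k, a, b, c = 3, 1, 1, 1, 1
x, y = 0.3, 0.25
lhs = (2 * n + a + b + c + 2) * Ptri(n, k, a, b, c, x, y)
rhs = ((n + k + a + b + c + 2) * Ptri(n, k, a + 1, b, c, x, y)
       + (n + k + b + c + 1) * Ptri(n - 1, k, a + 1, b, c, x, y))
print(abs(lhs - rhs))  # machine-precision agreement
```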

#### *4.3 Multiplication*

Recurrence relations for multiplication allow one to express $xP_{n,k}^{(a,b,c)}$, $yP_{n,k}^{(a,b,c)}$, and $zP_{n,k}^{(a,b,c)}$ in terms of a sum of Jacobi polynomials on the triangle with potentially different parameters. The recurrences in Corollary 4 are analogous to the recurrence relations for $P_n^{(a,b)}$ found in [7, 18.9.6].

**Corollary 4** *The following recurrence relations hold:*

$$(2n+a+b+c+2)\,x P_{n,k}^{(a,b,c)} = (n-k+a)P_{n,k}^{(a-1,b,c)} + (n-k+1)P_{n+1,k}^{(a-1,b,c)},\tag{11}$$

$$\begin{aligned} (2k+b+c+1)(2n+a+b+c+2)\,y P_{n,k}^{(a,b,c)} &= (k+b)(n+k+b+c+1)P_{n,k}^{(a,b-1,c)} \\ &\quad - (k+1)(n-k+a)P_{n,k+1}^{(a,b-1,c)} - (k+b)(n-k+1)P_{n+1,k}^{(a,b-1,c)} \\ &\quad + (k+1)(n+k+a+b+c+2)P_{n+1,k+1}^{(a,b-1,c)}, \end{aligned} \tag{12}$$

$$\begin{aligned} (2k+b+c+1)(2n+a+b+c+2)\,z P_{n,k}^{(a,b,c)} &= (k+c)(n+k+b+c+1)P_{n,k}^{(a,b,c-1)} \\ &\quad + (k+1)(n-k+a)P_{n,k+1}^{(a,b,c-1)} - (k+c)(n-k+1)P_{n+1,k}^{(a,b,c-1)} \\ &\quad - (k+1)(n+k+a+b+c+2)P_{n+1,k+1}^{(a,b,c-1)}. \end{aligned} \tag{13}$$

*Proof* The recurrence relation in (11) follows from the fact that $(\mathcal{M}_{3,0}^\dagger + \mathcal{M}_{5,0}^\dagger)u = (2n+a+b+c+d+2)xu$. Since

$$(\mathcal{M}_{0,3}^\dagger\mathcal{M}_{2,0}^\dagger - \mathcal{M}_{0,3}^\dagger\mathcal{M}_{4,0} + \mathcal{M}_{0,5}\mathcal{M}_{2,0} - \mathcal{M}_{0,5}\mathcal{M}_{4,0}^\dagger)u = (2k+b+c+1)(2n+a+b+c+d+2)yu$$

holds, we find that (12) is satisfied. Finally, (13) follows from

$$(\mathcal{M}_{0,2}^\dagger\mathcal{M}_{2,0}^\dagger - \mathcal{M}_{0,2}^\dagger\mathcal{M}_{4,0} + \mathcal{M}_{0,4}\mathcal{M}_{4,0}^\dagger - \mathcal{M}_{0,4}\mathcal{M}_{2,0})u = (2k+b+c+1)(2n+a+b+c+d+2)zu.$$
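Like the conversion relations, the multiplication relation (11) can be checked by direct evaluation under the same assumed `eval_jacobi`-based sketch:

```python
# Check (11): (2n+a+b+c+2) x P_{n,k}^{(a,b,c)} = (n-k+a) P_{n,k}^{(a-1,b,c)}
#                                              + (n-k+1) P_{n+1,k}^{(a-1,b,c)}.
import numpy as np
from scipy.special import eval_jacobi

def Ptri(n, k, a, b, c, x, y):
    # P_{n,k}^{(a,b,c)}(x,y), i.e. d = 0 in (1), with P~(x) = P(2x-1)
    w = y / (1.0 - x)
    return (eval_jacobi(n - k, 2 * k + b + c + 1, a, 2 * x - 1)
            * (1.0 - x) ** k * eval_jacobi(k, c, b, 2 * w - 1))

n, k, a, b, c = 3, 1, 1, 1, 1
x, y = 0.4, 0.2
lhs = (2 * n + a + b + c + 2) * x * Ptri(n, k, a, b, c, x, y)
rhs = ((n - k + a) * Ptri(n, k, a - 1, b, c, x, y)
       + (n - k + 1) * Ptri(n + 1, k, a - 1, b, c, x, y))
print(abs(lhs - rhs))  # machine-precision agreement
```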

Combining the recurrence relations in Corollaries 3 and 4, we can derive expressions for $xP_{n,k}^{(a,b,c)}$, $yP_{n,k}^{(a,b,c)}$, and $zP_{n,k}^{(a,b,c)}$ in terms of a sum of Jacobi polynomials on the triangle with parameters $(a,b,c)$. These are analogous to the three-term recurrence relation for Jacobi polynomials [7, 18.9.2]. Since these recurrence relations are long, we refer the reader to [2, pp. 80–81].

#### *4.4 Differential Eigenvalue Problems*

The polynomials $P_{n,k}^{(a,b,c)}$ are eigenfunctions of second-order differential operators (see [2, (5.3.4)] and [14, Prop. 4.11]), and the ladder operators in Sect. 3 make it easy to derive this fact.

**Theorem 2** *The polynomial $P_{n,k}^{(a,b,c)}$ satisfies two second-order differential eigenproblems:*

$$zy\frac{\partial^2}{\partial y^2}P_{n,k}^{(a,b,c)} + ((1+b)(1-x) - (2+b+c)y)\frac{\partial}{\partial y}P_{n,k}^{(a,b,c)} = -k(k+b+c+1)P_{n,k}^{(a,b,c)}$$

*and*

$$\begin{split} &x(1-x)\frac{\partial^2}{\partial x^2}P_{n,k}^{(a,b,c)} - 2xy\frac{\partial^2}{\partial x\,\partial y}P_{n,k}^{(a,b,c)} + y(1-y)\frac{\partial^2}{\partial y^2}P_{n,k}^{(a,b,c)} \\ &\quad + (a+1-(a+b+c+3)x)\frac{\partial}{\partial x}P_{n,k}^{(a,b,c)} + (b+1-(a+b+c+3)y)\frac{\partial}{\partial y}P_{n,k}^{(a,b,c)} \\ &= -n(n+a+b+c+2)P_{n,k}^{(a,b,c)}(x,y). \end{split}$$

*Proof* The first equation follows from

$$\mathcal{M}_{0,1}^\dagger\mathcal{M}_{0,1} P_{n,k}^{(a,b,c,d)}(x,y) = k(k+b+c+1)P_{n,k}^{(a,b,c,d)}(x,y)$$

and the second from

$$\left[\frac{\mathcal{M}_{0,1}\mathcal{M}_{0,1}^\dagger - k(k+b+c+1)}{1-x} + \mathcal{M}_{3,0}\mathcal{M}_{3,0}^\dagger + \mathcal{M}_{4,0}\mathcal{M}_{4,0}^\dagger\right]P_{n,k}^{(a,b,c,d)}(x,y) = (1+a-k+n)(3+t+2n)P_{n,k}^{(a,b,c,d)}(x,y).$$

#### **5 Application: Calculating Laplacians**

We can use the recurrence relationships in this paper to calculate partial derivatives too. Slevinsky's fast triangle transform [11] (which builds on his fast spherical harmonic transform [10]), as implemented in the FastTransforms multithreaded C code [12], gives an efficient and stable routine for calculating the expansion coefficients on the triangle in $O(d^2 \log^2 d)$ operations, where $d$ is the polynomial degree. The partial derivative recurrences (see Corollary 1) show us how to calculate the expansion coefficients of $\frac{\partial p}{\partial x}$ and $\frac{\partial p}{\partial y}$ with coefficients associated to the parameters $(a+1,b,c+1)$ and $(a,b+1,c+1)$, respectively. Moreover, Corollary 3 informs us how to convert from expansions with parameters $(a,b,c)$ to $(a+1,b,c)$, $(a,b+1,c)$, and $(a,b,c+1)$. We can combine these various recurrence relations to compute

**Fig. 2** Top: Error when evaluating the Laplacian of $f(x,y) = \cos(nxy/40)$ at $(x,y) = (0.1, 0.2)$ by expanding in degree $N = (n+1)(n+2)/2$ Jacobi polynomials on the triangle with $(a,b,c) = (0,0,0)$ and using the recurrences. Bottom: Execution times of (1) the fast transform, (2) constructing the recurrences as 8 banded-block-banded matrices, and (3) applying the matrices

the coefficients of the Laplacian in the basis $P_{n,k}^{(2,2,2)}$ in an optimal complexity of $O(N) = O(d^2)$ operations, where $d$ is the polynomial degree and $N = d(d+1)/2$ is the total number of degrees of freedom.

For example, the Laplacian of $f(x,y) = \cos(nxy/40)$ can be computed by first approximating $f$ on the unit right-angled triangle to within machine precision by a polynomial, and then employing various recurrence relationships to calculate its Laplacian (Fig. 2). To do this efficiently, we store the recurrence relations as banded-block-banded matrices to take advantage of fast banded matrix-vector multiplication.
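To illustrate the coefficient-level sparsity, the sketch below applies the differentiation recurrence (6) to a small expansion and checks the result against a finite difference. The names (`Ptri`, `diff_y_coeffs`) and the dictionary representation are our own illustrative assumptions, not the banded-block-banded matrices used for Fig. 2.

```python
# Apply (6) at the coefficient level: each coefficient at (n,k) in the (a,b,c)
# basis moves to (n-1,k-1) in the (a,b+1,c+1) basis, scaled by (k+b+c+1).
import numpy as np
from scipy.special import eval_jacobi

def Ptri(n, k, a, b, c, x, y):
    # P_{n,k}^{(a,b,c)}(x,y), i.e. d = 0 in (1), with P~(x) = P(2x-1)
    w = y / (1.0 - x)
    return (eval_jacobi(n - k, 2 * k + b + c + 1, a, 2 * x - 1)
            * (1.0 - x) ** k * eval_jacobi(k, c, b, 2 * w - 1))

def diff_y_coeffs(coeffs, b, c):
    return {(n - 1, k - 1): (k + b + c + 1) * v
            for (n, k), v in coeffs.items() if k >= 1}

a, b, c = 0, 0, 0
coeffs = {(2, 1): 1.0, (3, 2): -0.5}            # f = P_{2,1} - 0.5 P_{3,2}
dcoeffs = diff_y_coeffs(coeffs, b, c)           # df/dy in the (0,1,1) basis

x, y, h = 0.3, 0.25, 1e-6
f = lambda X, Y: sum(v * Ptri(n, k, a, b, c, X, Y) for (n, k), v in coeffs.items())
fd = (f(x, y + h) - f(x, y - h)) / (2 * h)
ex = sum(v * Ptri(n, k, a, b + 1, c + 1, x, y) for (n, k), v in dcoeffs.items())
print(abs(fd - ex))  # limited only by the finite-difference step
```

The production approach stores such maps as banded matrices so they apply in $O(N)$ operations.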

#### **6 Conclusion**

We introduce ladder operators for systematically deriving sparse recurrence relations for differentiation, conversion, and multiplication of Jacobi and orthogonal polynomials on the triangle. We use these recurrences to efficiently apply partial differential operators, in particular for calculating Laplacians. The importance of these relationships is that they allow general linear partial differential operators with polynomial coefficients to be represented as sparse operators acting on orthogonal polynomial expansions. This application will be the topic of a subsequent paper.

#### **References**



## **Part II Contributed Papers**

## **Greedy Kernel Methods for Center Manifold Approximation**

**Bernard Haasdonk, Boumediene Hamzi, Gabriele Santin, and Dominik Wittwar**

### **1 Introduction**

Center manifold theory plays an important role in the study of the stability of dynamical systems when the equilibrium point is not hyperbolic. It isolates the complicated asymptotic behavior by locating the center manifold, which is an invariant manifold tangent to the subspace spanned by the eigenvectors corresponding to eigenvalues on the imaginary axis. The dynamics of the original system is then essentially determined by the restriction of the dynamics to the center manifold, since the local dynamic behavior "transverse" to this invariant manifold is relatively simple, corresponding to the flows in the local stable (and unstable) manifolds. In practice, one does not compute the center manifold and its dynamics exactly, since this requires solving a quasilinear partial differential equation that is not easily solvable. In most cases of interest, an approximation of degree two or three of the solution is sufficient. The reduced dynamics on the center manifold can then be determined, its stability can be studied, and conclusions about the stability of the original system can be drawn [1, 3, 4, 6, 8].

In this article, we use greedy kernel methods to construct a data-based approximation of the center manifold. The present work is a preliminary study that is intended to introduce our concept and algorithm, and to test it on some examples.

B. Haasdonk · G. Santin · D. Wittwar
Inst. of Applied Analysis and Numerical Simulation, University of Stuttgart, Stuttgart, Germany
e-mail: bernard.haasdonk@mathematik.uni-stuttgart.de; gabriele.santin@mathematik.uni-stuttgart.de; dominik.wittwar@mathematik.uni-stuttgart.de

B. Hamzi (✉)
Department of Mathematics, Imperial College London, London, UK

S. J. Sherwin et al. (eds.), *Spectral and High Order Methods for Partial Differential Equations ICOSAHOM 2018*, Lecture Notes in Computational Science and Engineering 134, https://doi.org/10.1007/978-3-030-39647-3\_6

#### **2 Background**

We consider a large dimensional dynamical system

$$
\dot{\mathbf{x}} = f(\mathbf{x}), \quad \mathbf{x} \in D,\tag{1}
$$

where $f : D \to \mathbb{R}^n$ is a continuously differentiable function over the domain $D \subset \mathbb{R}^n$ such that $0 \in D$. We are interested in the study of the behavior of the system around an equilibrium point $x \in D$, i.e., $f(x) = 0$, possibly analyzing a smaller dimensional system.

Without loss of generality, we may assume that the equilibrium is $x = 0$, and, letting $L = \frac{\partial f}{\partial x}\big|_{x=0}$, we can rewrite (1) as

$$
\dot{\mathfrak{x}} = f(\mathfrak{x}) = L\mathfrak{x} + N(\mathfrak{x}),
$$

with a suitable nonlinear component $N$, and denote by $\sigma_{\mathbb{R}}(L)$ the set of real parts of the eigenvalues of $L$. A classical result relates the stability of the equilibrium to the spectrum of $L$: in particular, it is known that if all the eigenvalues of $L$ have negative real parts, i.e., $\sigma_{\mathbb{R}}(L) \subset \mathbb{R}_{<0}$, then the origin is asymptotically stable, and if $L$ has some eigenvalues with positive real parts, then the origin is unstable. If instead $\sigma_{\mathbb{R}}(L) \subset \mathbb{R}_{\leq 0}$, the linearization fails to determine the stability properties of the origin, and the analysis of this situation requires additional tools.
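As a small illustration of this trichotomy (our own example, not from the text), the classification can be read off numerically from the real parts of the eigenvalues of $L$:

```python
# Classify an equilibrium from sigma_R(L), the real parts of the eigenvalues of L.
import numpy as np

def classify(L, tol=1e-10):
    sigma_R = np.real(np.linalg.eigvals(L))
    if np.all(sigma_R < -tol):
        return "asymptotically stable"
    if np.any(sigma_R > tol):
        return "unstable"
    return "inconclusive: center manifold analysis needed"

# eigenvalues +-i (zero real part) and -2: the linearization is inconclusive
L = np.array([[0.0, 1.0, 0.0],
              [-1.0, 0.0, 0.0],
              [0.0, 0.0, -2.0]])
print(classify(L))
```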

In this case, we can first use a linear change of coordinates to separate the zero and the negative eigenvalues, i.e., we can rewrite (1) as

$$
\dot{\mathbf{x}} = L\_1 \mathbf{x} + N\_1(\mathbf{x}, \mathbf{y})
$$

$$
\dot{\mathbf{y}} = L\_2 \mathbf{y} + N\_2(\mathbf{x}, \mathbf{y})
\tag{2}
$$

where $L_1 \in \mathbb{R}^{d\times d}$ is such that $\sigma_{\mathbb{R}}(L_1) = \{0\}$ and $L_2 \in \mathbb{R}^{m\times m}$ with $m := n - d$ is such that $\sigma_{\mathbb{R}}(L_2) \subset \mathbb{R}_{<0}$. The nonlinear functions $N_1 : \mathbb{R}^d \times \mathbb{R}^m \to \mathbb{R}^d$ and $N_2 : \mathbb{R}^d \times \mathbb{R}^m \to \mathbb{R}^m$ are continuously differentiable. Intuitively, we expect the stability of the equilibrium to depend only on the nonlinear term $N_1(x,y)$. This intuition turns out to be correct, and indeed it can be properly formalized by means of the center manifold theorem.

We start by recalling a sufficient condition for the existence of a center manifold.

**Theorem 1 ([1])** *If N*<sup>1</sup> *and N*<sup>2</sup> *are twice continuously differentiable and are such that*

$$N\_i(0,0) = 0, \ \frac{\partial N\_i}{\partial \mathbf{x}}(0,0) = 0, \ \frac{\partial N\_i}{\partial \mathbf{y}}(0,0) = 0, \quad i = 1,2,$$

*and if the eigenvalues of* $L\_1$ *have zero real parts, and all the eigenvalues of* $L\_2$ *have negative real parts, then there exists a neighbourhood* $\Omega \subset \mathbb{R}^d$ *of the origin* $0 \in \mathbb{R}^d$ *and a center manifold* $h : \Omega \to \mathbb{R}^m$ *for* (2)*, i.e.,* $\mathbf{y} = h(\mathbf{x})$ *is an invariant manifold for* (2)*,*<sup>1</sup> *h is smooth, and*

$$h(0) = 0, \ \ Dh(0) = 0.\tag{3}$$

Under the assumptions of this theorem, using (2) we deduce that *h* satisfies the PDE

$$L\_2 h(\mathbf{x}) + N\_2(\mathbf{x}, h(\mathbf{x})) = Dh(\mathbf{x}) \left( L\_1 \mathbf{x} + N\_1(\mathbf{x}, h(\mathbf{x})) \right),\tag{4}$$

and the following center manifold theorem ensures that this PDE has smooth solutions. Moreover, it also allows one to deduce the stability of the origin of the full order system (2) from the stability of the origin of a reduced order system called the *center dynamics*.

**Theorem 2 (Center Manifold Theorem [1])** *The equilibrium x* = 0*, y* = 0 *of the original dynamics is locally asymptotically stable (resp. unstable) if and only if the equilibrium x* = 0 *of the center dynamics (the dynamics on the center manifold)*

$$
\dot{\mathbf{x}} = L\_1 \mathbf{x} + N\_1(\mathbf{x}, h(\mathbf{x})), \tag{5}
$$

*is locally asymptotically stable (resp. unstable).*

In particular, this result guarantees that, after solving the PDE (4), the problem of analyzing the stability properties of the system (2) reduces to analyzing the nonlinear stability of the lower dimensional system (5). This second problem is of smaller dimension, and thus, given knowledge of *h*, the approach is attractive for obtaining information on the system (1) via a reduced model.

Moreover, we remark that an exact knowledge of *h* is not required for this purpose, i.e., it is sufficient to have an approximate solution of the PDE (4). Indeed, it is frequently sufficient to compute only the low degree terms of the Taylor series expansion of *h* around $\mathbf{x} = 0$, i.e., if $(\cdot)^{[k]}$ is the degree *k* part of the Taylor series of *h*, the approximation

$$h(\mathbf{x}) \approx h^{[1]}\mathbf{x} + h^{[2]}(\mathbf{x}) + h^{[3]}(\mathbf{x}) + \dots + h^{[d-1]}(\mathbf{x})\tag{6}$$

is sufficient to obtain an approximation of the dynamics of order $\varepsilon^d$ as $\|\mathbf{x}\| \leq \varepsilon$. The approximation (6) can be obtained by coefficient comparison, thus rewriting

<sup>1</sup>A differentiable manifold *M* is said to be invariant under the flow of a vector field *X* if for $\mathbf{x} \in M$, $F\_t(\mathbf{x}) \in M$ for small $t > 0$, where $F\_t(\mathbf{x})$ is the flow of *X*.

the PDE (4) as a set of algebraic equations as

$$\begin{aligned} L\_2 h^{[1]} &= h^{[1]} L\_1 \\ L\_2 h^{[2]}(\mathbf{x}) + N\_2^{[2]}(\mathbf{x}, h^{[1]}\mathbf{x}) &= \frac{\partial h^{[2]}}{\partial \mathbf{x}}(\mathbf{x}) \left( L\_1 \mathbf{x} + N\_1^{[2]}(\mathbf{x}, h^{[1]}\mathbf{x}) \right) \\ L\_2 h^{[3]}(\mathbf{x}) + \left( N\_2(\mathbf{x}, h^{[2]}(\mathbf{x})) \right)^{[3]} &= \frac{\partial h^{[3]}}{\partial \mathbf{x}}(\mathbf{x}) \left( L\_1 \mathbf{x} + \left( N\_1(\mathbf{x}, h^{[2]}(\mathbf{x})) \right)^{[3]} \right) \\ &\;\;\vdots \end{aligned}$$

We remark that this methodology is valid for parameterized dynamical systems and is used to study the stability of dynamical systems with bifurcations.
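The coefficient comparison above can be carried out symbolically. As an illustrative sketch (not part of the algorithm proposed below), consider the scalar system $\dot{x} = xy$, $\dot{y} = -y + x^2$, which reappears as Example 1 in Sect. 4.1: inserting the ansatz $h(x) = c_2 x^2 + c_4 x^4$ into the PDE (4) with $L_1 = 0$, $L_2 = -1$ and matching coefficients degree by degree gives $c_2 = 1$, $c_4 = -2$, i.e., $h(x) = x^2 - 2x^4 + O(x^6)$:

```python
import sympy as sp

x, c2, c4 = sp.symbols('x c2 c4')
h = c2 * x**2 + c4 * x**4          # ansatz; h(0) = h'(0) = 0 by construction

# PDE (4) for xdot = x*y, ydot = -y + x^2 on the manifold y = h(x):
#   L2 h + N2(x, h) = h'(x) * (L1 x + N1(x, h)),  with L1 = 0, L2 = -1
residual = sp.expand(-h + x**2 - sp.diff(h, x) * (x * h))

# match coefficients of the low degrees
eqs = [residual.coeff(x, k) for k in (2, 4)]
sol = sp.solve(eqs, (c2, c4), dict=True)
print(sol)   # [{c2: 1, c4: -2}]
```

The same coefficient-matching loop extends to higher degrees by collecting further powers of *x* in the residual.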

Nevertheless, even this approximate knowledge of *h* can be difficult to obtain in practice for a general ODE. To overcome this limitation, and since an approximate knowledge of the manifold is sufficient, our goal in this paper is to find a data-based approximation of the center manifold. This approximation is based solely on the knowledge of the splitting (2) and on the numerical computation of a set of trajectories of the system, and it provides an approximation of *h* which can be used to study the stability of the system.

#### **3 Kernel Approximation**

We want to build a surrogate model $s\_h : \Omega \to \mathbb{R}^m$ which approximates the center manifold *h* on a suitable set $\Omega \subset \mathbb{R}^d$, in the sense that $s\_h(\mathbf{x}) \approx h(\mathbf{x})$ for all $\mathbf{x} \in \Omega$. This model is constructed in a data-based way, i.e., we assume some knowledge of the map *h* on a finite set of input parameters, or training data. In practice, such values are computed from high-fidelity numerical approximations, which will be discussed in detail in the following.

The surrogate is based on kernel approximation, which allows the use of scattered data, i.e., we do not require any grid structure on the set of training data. Moreover, since the unknown function *h* is vector-valued, we employ here matrix-valued kernels. Details on kernel-based approximation can be found e.g. in [9], and the extension to the vectorial case is detailed e.g. in [5, 10]. We recall here only that a positive definite matrix-valued kernel on $\Omega$ is a function $K : \Omega \times \Omega \to \mathbb{R}^{m\times m}$ such that $K(\mathbf{x}, \mathbf{y}) = K(\mathbf{y}, \mathbf{x})^T$ for all $\mathbf{x}, \mathbf{y} \in \Omega$, and $[K(\mathbf{x}\_i, \mathbf{x}\_j)]\_{i,j=1}^N \in \mathbb{R}^{mN\times mN}$ is positive semidefinite for any set $\{\mathbf{x}\_1, \dots, \mathbf{x}\_N\} \subset \Omega$ of pairwise distinct points, for all $N \in \mathbb{N}$. Associated to a positive definite kernel there is a unique Hilbert space $\mathcal{H}$ of functions $\Omega \to \mathbb{R}^m$, named native space, where the kernel is reproducing, meaning that $K(\cdot, \mathbf{x})\alpha$ is the Riesz representer of the directional point evaluation $\delta\_{\mathbf{x}}^{\alpha}(f) := \alpha^T f(\mathbf{x})$, for all $\alpha \in \mathbb{R}^m$, $\mathbf{x} \in \Omega$.

We consider here a twice continuously differentiable matrix-valued kernel *K* on *Ω*, and we use a specific functional formulation for our approximation and a specific cost function, in order to construct a surrogate that is well suited for the particular approximation task.

In detail, the approximant takes the form

$$s\_h(\mathbf{x}) = \sum\_{i=1}^{n\_1} K(\mathbf{x}, \mathbf{x}\_i^{(1)}) \alpha\_i + \sum\_{j=1}^{n\_2} \sum\_{i=1}^{m} \partial\_i^{(2)} K(\mathbf{x}, \mathbf{x}\_j^{(2)}) \beta\_{i,j},$$

with centers $\mathbf{x}\_i^{(1)} \in X^{(1)} = \{\mathbf{x}\_1^{(1)}, \dots, \mathbf{x}\_{n\_1}^{(1)}\}$, $\mathbf{x}\_j^{(2)} \in X^{(2)} = \{\mathbf{x}\_1^{(2)}, \dots, \mathbf{x}\_{n\_2}^{(2)}\}$ and coefficient vectors $\alpha\_i, \beta\_{i,j} \in \mathbb{R}^m$. Here the superscript in $\partial\_i^{(2)}$ denotes that the derivative is taken with respect to the *i*-th entry of the second kernel argument.

Subsequently, we assume that a sufficient amount of data $X\_{N^*} = \{\mathbf{x}\_1, \dots, \mathbf{x}\_{N^*}\}$ and $Y\_{N^*} = \{\mathbf{y}\_1, \dots, \mathbf{y}\_{N^*}\}$ is available, generated for example by running a numerical scheme to compute discrete trajectories for different initial values $(\mathbf{x}\_0, \mathbf{y}\_0)$. For this step, we need to assume that the variable splitting (2) is known in advance. Note that this is not a severe restriction, as for a general ODE (1) the required state transformation can be determined by an eigenvalue decomposition of *L*.
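This eigenvalue decomposition can be sketched as follows for a small hypothetical linearization with two center directions and one stable direction (the tolerance `1e-10` used to classify zero real parts is an arbitrary numerical choice):

```python
import numpy as np

# hypothetical linearization: a rotation block (eigenvalues +/- i, zero real
# part) coupled with one decaying direction (eigenvalue -1)
L = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, -1.0]])

w, V = np.linalg.eig(L)
center = np.abs(w.real) < 1e-10      # eigenvalues with zero real part
print(sorted(w[center].imag), w[~center].real)
```

The eigenvectors in the columns of `V` corresponding to `center` span the candidate center subspace and define the linear change of coordinates leading to the form (2).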

Observe that we do not know whether a data pair $(\mathbf{x}\_i, \mathbf{y}\_i)$ lies on the center manifold, i.e., whether $\mathbf{y}\_i = h(\mathbf{x}\_i)$ holds. We only know that the data converges asymptotically to the center manifold as $\mathbf{x}\_i \to 0$. Thus, an interpolation-based surrogate which merely interpolates the data on a given subset $X \subset X\_{N^*}$ seems ill-suited for our purposes. Instead we consider another set of conditions to define the approximant. First, we still require the conditions in (3) to be satisfied by our approximation. Moreover, for the given subsets $X = \{\mathbf{x}\_1, \dots, \mathbf{x}\_N\}$ and $Y = \{\mathbf{y}\_1, \dots, \mathbf{y}\_N\}$, we compute our approximant by minimizing the following functional $J : \mathcal{H} \to \mathbb{R}$ under the constraints $s(0) = 0$, $Ds(0) = 0$:

$$J(s) := \|s\|\_{\mathcal{H}}^2 + \sum\_{i=1}^N (s(\mathbf{x}\_i) - \mathbf{y}\_i)^T \omega\_i (s(\mathbf{x}\_i) - \mathbf{y}\_i). \tag{7}$$

Here $\omega\_i \in \mathbb{R}^{m\times m}$ is a positive definite weight matrix. It can be shown that (7) has a unique minimizer $s\_h$ (see [11]). In particular, $s\_h$ and its derivative $Ds\_h$ have the form

$$s\_h(\mathbf{x}) = \sum\_{i=1}^{N+1} K(\mathbf{x}, \mathbf{x}\_i) \alpha\_i + \sum\_{i=1}^{m} \partial\_i^{(2)} K(\mathbf{x}, 0) \beta\_i,\tag{8}$$

$$Ds\_h(\mathbf{x}) = \sum\_{i=1}^{N+1} D^{(1)} K(\mathbf{x}, \mathbf{x}\_i) \alpha\_i + \sum\_{i=1}^{m} D^{(1)} \partial\_i^{(2)} K(\mathbf{x}, 0) \beta\_i,$$

where we set $\mathbf{x}\_{N+1} := 0$. The coefficient vectors $\alpha\_i, \beta\_i$ can be computed by solving the system

$$
\begin{pmatrix} A+W & B \\ B^T & C \end{pmatrix} \begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \begin{pmatrix} Y \\ Z \end{pmatrix}, \tag{9}
$$

with

$$\begin{split} A &:= \left( K(\mathbf{x}\_i, \mathbf{x}\_j) \right)\_{i,j} \in \mathbb{R}^{m(N+1)\times m(N+1)}, \\ W &:= \operatorname{diag}\left( \omega\_1^{-1}, \dots, \omega\_N^{-1}, 0 \right) \in \mathbb{R}^{m(N+1)\times m(N+1)}, \\ B &:= \left( \partial\_j^{(2)} K(\mathbf{x}\_i, 0) \right)\_{i,j} \in \mathbb{R}^{m(N+1)\times m^2}, \\ C &:= \left( \partial\_i^{(1)} \partial\_j^{(2)} K(0, 0) \right)\_{i,j} \in \mathbb{R}^{m^2\times m^2}, \\ Y &:= \left( \mathbf{y}\_1^T, \dots, \mathbf{y}\_N^T, 0 \right)^T \in \mathbb{R}^{m(N+1)}, \\ Z &:= 0 \in \mathbb{R}^{m^2}. \end{split}$$

The weight matrices $\omega\_i$ can either be chosen manually, or a regularizing function $r : \Omega \to \mathbb{R}^{m\times m}$ can be prescribed such that $\omega\_i = r(\mathbf{x}\_i)$ is symmetric and positive definite. In our numerical examples in Sect. 4 we chose a constant regularization function, i.e.

$$\omega\_i = r(\mathbf{x}\_i) = \lambda I\_m$$

for some *λ >* 0. However, one might consider a more general approach, where the weight increases as the data tends to the origin, i.e., $\omega\_i \succeq \omega\_j$ if $\|\mathbf{x}\_i\| \leq \|\mathbf{x}\_j\|$.
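In the simplest scalar, unconstrained setting (dropping the conditions $s(0) = 0$, $Ds(0) = 0$ and taking a scalar Gaussian kernel), the minimizer of a functional of the form (7) with constant weights reduces to standard kernel ridge regression; the following is a minimal sketch under these simplifying assumptions, using noisy samples of the hypothetical target $h(x) = x^2 - 2x^4$:

```python
import numpy as np

# scalar analogue of (7): minimize ||s||_H^2 + lam * sum (s(x_i) - y_i)^2,
# whose minimizer solves (A + lam^{-1} I) alpha = y
def fit(x, y, lam, eps=1.0):
    A = np.exp(-eps * (x[:, None] - x[None, :])**2)   # Gaussian kernel matrix
    alpha = np.linalg.solve(A + np.eye(len(x)) / lam, y)
    return lambda t: np.exp(-eps * (t[:, None] - x[None, :])**2) @ alpha

rng = np.random.default_rng(0)
x = np.linspace(-0.1, 0.1, 40)
y = x**2 - 2 * x**4 + 1e-4 * rng.standard_normal(40)  # noisy manifold samples
s = fit(x, y, lam=1e6)
print(np.max(np.abs(s(x) - y)))                       # small data misfit
```

Large `lam` corresponds to small weights $\omega\_i^{-1}$ in the block *W* of (9), i.e., near-interpolation; small `lam` smooths the noisy data more aggressively.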

#### *3.1 Greedy Approximation*

If the technique of the previous section is used as is, the surrogate (8) is given by an expansion with $N^*$ terms, where $N^*$ is the number of points in the training set. Therefore, the model evaluation might not be efficient enough if the model is built from too large a dataset. Furthermore, the computation of the coefficients in (8) requires the solution of the linear system (9), whose size again scales with the size of the training set, and which can be severely ill-conditioned for poorly placed points.

To mitigate both problems, we employ an algorithm that aims at selecting small subsets *XN* , *YN* of points such that the surrogate computed with these sets is a sufficiently good approximation of the one which uses the full sets. The algorithm selects the points in a greedy way, i.e., one point at a time is selected and added to the current training set. In this way, it is possible to identify a good set without the need to solve a nearly infeasible combinatorial problem.

The selection is performed using the *P*-greedy method of [2] applied to the kernel *K*, so that the set of points is selected before the computation of the surrogate. The number of points, and therefore the expansion size and evaluation speed, depends on a prescribed target accuracy $\varepsilon\_{tol} > 0$. For details on the method implementation and its convergence properties we refer to [7].
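A minimal scalar sketch of the *P*-greedy selection via the standard power-function update (an illustration of the selection rule, not the implementation of [7]):

```python
import numpy as np

def p_greedy(X, kernel, tol, max_pts=100):
    """P-greedy: repeatedly pick the candidate point where the kernel power
    function is largest, until it drops below the prescribed tolerance."""
    n = X.shape[0]
    selected = []
    power2 = np.array([kernel(X[i], X[i]) for i in range(n)])  # squared power
    V = np.zeros((n, 0))                                       # Newton basis on X
    while len(selected) < max_pts:
        j = int(np.argmax(power2))
        if power2[j] < tol**2:
            break
        # new Newton basis function, orthogonal to the previous ones
        u = np.array([kernel(X[i], X[j]) for i in range(n)]) - V @ V[j]
        u = u / np.sqrt(power2[j])
        V = np.hstack([V, u[:, None]])
        power2 = power2 - u**2                                 # power update
        selected.append(j)
    return selected

gauss = lambda x, y: float(np.exp(-np.sum((x - y)**2) / 2))
X = np.linspace(-0.1, 0.1, 201).reshape(-1, 1)
idx = p_greedy(X, gauss, tol=1e-4)
print(len(idx))
```

Since the power function is kernel-dependent but data-independent, the centers are indeed fixed before any surrogate coefficients are computed, exactly as described above.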

#### **4 Numerical Examples**

We test now our method on three different examples. In each of them, we specify the setting and the parameters used to build the surrogate and visualize our approximation to the center manifold. Additionally, we compute the pointwise residual

$$r(\mathbf{x}) = Ds\_h(\mathbf{x}) \left( L\_1 \mathbf{x} + N\_1(\mathbf{x}, s\_h(\mathbf{x})) \right) - \left( L\_2 s\_h(\mathbf{x}) + N\_2(\mathbf{x}, s\_h(\mathbf{x})) \right),$$

which measures how well the surrogate $s\_h$ satisfies the PDE (4).

In all three examples, the greedy algorithm is used to select a suitable subset of the points, and in all cases the procedure is stopped with a prescribed $\varepsilon\_{tol}$. In the first two examples we set $\varepsilon\_{tol} := 10^{-15}$, while $\varepsilon\_{tol} := 10^{-10}$ is used in the last one.

#### *4.1 Example 1*

We consider the 2-dimensional system

$$\begin{aligned} \dot{\mathbf{x}} &= L\_1 \mathbf{x} + N\_1(\mathbf{x}, \mathbf{y}) = \mathbf{0} + \mathbf{x} \mathbf{y} \\ \dot{\mathbf{y}} &= L\_2 \mathbf{y} + N\_2(\mathbf{x}, \mathbf{y}) = -\mathbf{y} + \mathbf{x}^2. \end{aligned} \tag{10}$$

We generate the training data by solving (10) with an implicit Euler scheme for initial time $t\_0 = 0$, final time $T = 1000$ and with the time step $\Delta t = 0.1$. We initiate the numerical procedure with initial values $(x\_0, y\_0) \in \{\pm 0.8\} \times \{\pm 0.8\}$ and store the resulting data pairs in *X* and *Y* after discarding all data whose *x*-values are not contained in the neighborhood $[-0.1, 0.1]$, which results in $N^* = 38{,}248$ data pairs.
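A sketch of this data generation for (10), with a single initial condition and a shorter horizon so that the trajectory stays bounded; the fixed-point iteration used to solve the implicit step is our own choice and is not specified in the text:

```python
import numpy as np

def f(x, y):
    # right-hand side of system (10): x' = x*y, y' = -y + x^2
    return x * y, -y + x**2

def implicit_euler(x0, y0, dt, T, fp_iters=50):
    # implicit Euler; each step solves (x, y) = (x_n, y_n) + dt * f(x, y)
    # by fixed-point iteration (contractive here since dt * |f'| is small)
    xs, ys = [x0], [y0]
    for _ in range(int(round(T / dt))):
        xn, yn = xs[-1], ys[-1]
        x, y = xn, yn
        for _ in range(fp_iters):
            fx, fy = f(x, y)
            x, y = xn + dt * fx, yn + dt * fy
        xs.append(x)
        ys.append(y)
    return np.array(xs), np.array(ys)

# one trajectory; y is quickly attracted to the center manifold y = h(x)
xs, ys = implicit_euler(0.05, -0.8, dt=0.1, T=100.0)
print(xs[-1], ys[-1] - xs[-1]**2)
```

As in the paper, the stored pairs would then be filtered to keep only the samples with *x*-values inside the neighborhood of interest, e.g. `mask = np.abs(xs) <= 0.1`.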

We run the greedy algorithm for the kernels $k\_1(x, y) := (1 + xy/2)^4$ and $k\_2(x, y) = e^{-(x-y)^2/2}$. This results in the sets $X\_1$ and $X\_2$, which contain 14 and 6 points, respectively. The corresponding approximations $s\_1$ and $s\_2$ for the constant regularization function $r \equiv 10^{-10}$ are plotted in Fig. 1 over the domain $[-0.1, 0.1]$. The pointwise residual is depicted in Fig. 2.
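As a cross-check of the residual values, the coefficient comparison of Sect. 2 applied to (10) gives the truncated expansion $h(x) = x^2 - 2x^4 + O(x^6)$; evaluating the pointwise residual of this truncation on $[-0.1, 0.1]$ shows that it is of size $O(x^6)$:

```python
import numpy as np

# truncated Taylor manifold for (10) and its derivative
x = np.linspace(-0.1, 0.1, 201)
s = x**2 - 2 * x**4
ds = 2 * x - 8 * x**3

# pointwise residual r(x) = Ds(x)(L1 x + N1(x, s)) - (L2 s + N2(x, s))
# for (10): L1 = 0, N1 = x*y, L2 = -1, N2 = x^2
r = ds * (x * s) - (-s + x**2)
print(np.max(np.abs(r)))   # about 1.2e-5, dominated by the x^6 term
```

Analytically, $r(x) = -12x^6 + 16x^8$ here, so the residual magnitude of a degree-4 truncation is comparable to the residual levels a good surrogate should reach on this domain.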

#### *4.2 Example 2*

We consider the 2-dimensional system

$$\begin{aligned} \dot{\mathbf{x}} &= L\_1 \mathbf{x} + N\_1(\mathbf{x}, \mathbf{y}) = \mathbf{0} - \mathbf{x} \mathbf{y} \\ \dot{\mathbf{y}} &= L\_2 \mathbf{y} + N\_2(\mathbf{x}, \mathbf{y}) = -\mathbf{y} + \mathbf{x}^2 - 2\mathbf{y}^2. \end{aligned} \tag{11}$$

The training data is generated in the same way as in Example 1. We again use the kernels $k\_1$ and $k\_2$. The greedy algorithm gives sets $X\_1$ and $X\_2$ of size 12 and 6, respectively. The evaluation of the approximations $s\_1$ and $s\_2$ over the neighborhood $[-0.1, 0.1]$ can be seen in Fig. 3, while the respective pointwise residuals are plotted in Fig. 4.

#### *4.3 Example 3*

We consider the *(*2 + 1*)*-dimensional system

$$\begin{aligned} \dot{\mathbf{x}} &= L\_1 \mathbf{x} + N\_1(\mathbf{x}, \mathbf{y}) = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} \mathbf{x}\_1 \\ \mathbf{x}\_2 \end{pmatrix} + \begin{pmatrix} \mathbf{x}\_1 \mathbf{y} \\ \mathbf{x}\_2 \mathbf{y} \end{pmatrix} \\ \dot{\mathbf{y}} &= L\_2 \mathbf{y} + N\_2(\mathbf{x}, \mathbf{y}) = -\mathbf{y} - \mathbf{x}\_1^2 - \mathbf{x}\_2^2 + \mathbf{y}^2. \end{aligned} \tag{12}$$

We generate the training data in a similar fashion as before. We again use the implicit Euler scheme with start time $t\_0 = 0$, final time $T = 1000$ and with time step $\Delta t = 0.1$. The Euler method is performed for initial data $(\mathbf{x}\_0, y\_0) \in \{\pm 0.8\}^3$ and the resulting trajectories are stored in *X* and *Y*, where only data with $\mathbf{x} \in [-0.1, 0.1]^2$ was considered; this leads to $N^* = 78{,}796$ data pairs. We use the kernels $k\_1(\mathbf{x}, \mathbf{y}) = (1 + \mathbf{x}^T \mathbf{y}/2)^4$ and $k\_2(\mathbf{x}, \mathbf{y}) = e^{-\|\mathbf{x}-\mathbf{y}\|\_2^2/2}$, and the greedy-selected sets have size 21 (for $k\_1$) and 25 (for $k\_2$), respectively. The approximations $s\_1$, $s\_2$ and their corresponding residuals $r\_1$ and $r\_2$ are computed over the domain $[-0.1, 0.1]^2$; the results can be seen in Fig. 5.

**Fig. 5** Approximations $s\_1$ and $s\_2$ of the center manifold and corresponding residuals $r\_1$ and $r\_2$

We remark that in all three experiments both kernels give comparable results in terms of error magnitude, and both provide a good approximation of the manifold.

#### **5 Conclusions**

In this paper we introduced a novel algorithm to approximate the center manifold of a given ODE using a data-based surrogate.

This algorithm computes an approximation of the manifold from a set of numerical trajectories with different initial data. It is based on kernel methods, which allow the use of the scattered data generated by these simulations as training points. Moreover, an application-specific ansatz and cost function have been employed in order to enforce suitable properties on the surrogate.

Several numerical experiments suggested that the present method can reach a significant accuracy, and that it has the potential to be used as an effective model reduction technique. It seems promising to apply this approach to high dimensional systems, as the approximation technique can be straightforwardly extended and is less prone to the curse of dimensionality than grid-based approximation techniques. An interesting extension would consist of determining the decomposition (2) in a data-based fashion by suitable processing of the trajectory data.

**Acknowledgements** The first, third, and fourth authors would like to thank the German Research Foundation (DFG) for support within the Cluster of Excellence in Simulation Technology (EXC 310/2) at the University of Stuttgart. The second author thanks the European Commission for financial support received through Marie Curie Fellowships.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **An Adaptive Error Inhibiting Block One-Step Method for Ordinary Differential Equations**

**Jiaxi Gu and Jae-Hun Jung**

#### **1 Introduction**

General linear methods have been extensively studied for solving ODEs. Among the large family of general linear methods, the diagonally implicit multistage integration methods (DIMSIMs) of [1] are special cases which exhibit considerable potential for efficient implementation, providing a global error of the same order as the local truncation error. In [2], it was demonstrated that finite difference methods for PDEs can be constructed such that their convergence rates, or the order of their global errors, are higher than the order of the truncation errors. Following this idea, Ditkowski and Gottlieb devised the error inhibiting strategy in [3]: by inhibiting the lowest order term in the truncation error from accumulating over time, they showed that the global error of the scheme is one order higher than the local truncation error. The form of the error inhibiting scheme is inspired by the work of [7], where a block of *s* new step values is obtained at each step. The key idea of this method is to construct a coefficient matrix whose null space contains the leading term of the local truncation error.

In this work, we further improve the original error inhibiting method by introducing an additional free parameter used in the radial basis function (RBF) approximations. The main idea of the proposed method is to adopt the free parameter in the reconstruction of the error inhibiting method and to control it for further possible error cancellations. This results in a higher order of convergence than the original method. One advantage is that the proposed method does not need any additional conditions, so it is efficient to implement.

J. Gu
Department of Mathematics, University at Buffalo, The State University of New York, Buffalo, NY, USA
e-mail: jiaxigu@buffalo.edu

J.-H. Jung
Department of Mathematics, University at Buffalo, The State University of New York, Buffalo, NY, USA
Department of Data Science, Ajou University, Suwon, South Korea
e-mail: jaehun@buffalo.edu

© The Author(s) 2020
S. J. Sherwin et al. (eds.), *Spectral and High Order Methods for Partial Differential Equations ICOSAHOM 2018*, Lecture Notes in Computational Science and Engineering 134, https://doi.org/10.1007/978-3-030-39647-3\_7

The next section will review the explicit error inhibiting block one-step method. In Sect. 3, we will explain the RBF interpolation. In Sect. 4, we show how the new method can be derived, followed by Sect. 5, where numerical results are provided verifying that the convergence rate of the proposed method is increased by one order. A brief conclusion and an outline of our future research are presented in Sect. 6.

#### **2 Error Inhibiting Block One-Step Method**

Consider the initial value problem for the first-order ODE below

$$\begin{aligned} u'(t) &= f(t, u(t)), \; t > a \\ u(a) &= u\_a \end{aligned} \tag{1}$$

where we assume that *f (t, u)* is uniformly Lipschitz continuous in *u* and continuous in *t*. We choose a value *h* for the step size and set $t\_n = a + nh$, a discrete sequence in the time domain. Denote the numerical approximation of the solution $u(t\_n)$ by $v\_n$.

Define the solution vector *Un* by

$$U\_n = \left[ u\_{n + \frac{s-1}{s}}, \cdots, u\_{n + \frac{1}{s}}, u\_n \right]^T,$$

where $u\_{n+\frac{j}{s}} = u(t\_n + jh/s)$ is the exact solution at $t = t\_n + \frac{jh}{s}$ for $j = 0, \cdots, s-1$. The corresponding approximation vector $V\_n$ is defined as

$$V\_n = \left[ v\_{n + \frac{s-1}{s}}, \cdots, v\_{n + \frac{1}{s}}, v\_n \right]^T.$$

In [3], the scheme is formulated as

$$V\_{n+1} = \mathcal{Q}V\_n \tag{2}$$

where the operator *Q* is represented by the following

$$\mathcal{Q} = A + hBf$$

and $A, B \in \mathbb{R}^{s\times s}$. There are 4 sufficient conditions imposed on the matrices *A* and *B* in order for the scheme to be error inhibiting. In particular, *A* must have the eigenvalue 1 with the corresponding eigenvector

$$[1, \dots, 1]^T,$$

and the operator *Q* must damp the local truncation error $\tau\_n$ by a factor of order *h*, i.e.,

$$\|\mathcal{Q}\tau\_n\| \leqslant O(h) \cdot \|\tau\_n\|.$$

This is accomplished by requiring that the leading order term of the local truncation error is in the eigenspace of *A* associated with the zero eigenvalue.

We derive the matrices *A* and *B* with symbolic computation. As an example of the derivation of the error inhibiting method, we consider the construction of the scheme with *s* = 2. The solution vector is then

$$U\_n = \left[ u\_{n+1/2}, u\_n \right]^T,$$

and the corresponding approximation vector is given by

$$V\_n = \left[ v\_{n+1/2}, v\_n \right]^T.$$

In order to satisfy those conditions listed above we first select

$$A = \begin{bmatrix} 1 - \upsilon & \upsilon \\ 1 - \upsilon & \upsilon \end{bmatrix},\tag{3}$$

which can be diagonalized as

$$A = \begin{bmatrix} 1 - \upsilon & \upsilon \\ 1 - \upsilon & \upsilon \end{bmatrix} = \begin{bmatrix} \upsilon - 1 & \upsilon \\ \upsilon - 1 & \upsilon - 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} -1 & \frac{\upsilon}{\upsilon - 1} \\ 1 & -1 \end{bmatrix}.\tag{4}$$

Then conditions 1, 2 and 3 are satisfied. Further suppose that

$$B = \begin{bmatrix} b\_{11} & b\_{12} \\ b\_{21} & b\_{22} \end{bmatrix}. \tag{5}$$

Then

$$V\_{n+1} = \begin{bmatrix} 1 - \upsilon & \upsilon \\ 1 - \upsilon & \upsilon \end{bmatrix} V\_n + h \begin{bmatrix} b\_{11} & b\_{12} \\ b\_{21} & b\_{22} \end{bmatrix} \begin{bmatrix} f\_{n+1/2} \\ f\_n \end{bmatrix} \tag{6}$$

where *fn*+1*/*<sup>2</sup> = *f (tn*+1*/*2*, vn*+1*/*2*)* and *fn* = *f (tn, vn)*. The components of *Vn*+<sup>1</sup> are

$$v\_{n+3/2} = (1 - \upsilon)v\_{n+1/2} + \upsilon v\_n + h(b\_{11}f\_{n+1/2} + b\_{12}f\_n),$$

$$v\_{n+1} = (1 - \upsilon)v\_{n+1/2} + \upsilon v\_n + h(b\_{21}f\_{n+1/2} + b\_{22}f\_n).$$

We write each difference equation in the form of error normalized by the step size and then insert the exact solutions to the ODE into the difference equation. Expanding *un*+3*/*2*, un*+<sup>1</sup> and *un*+1*/*<sup>2</sup> around *t* = *tn* in Taylor series gives the local truncation error

$$\tau\_n = \left[ \tau\_{n+1/2}, \tau\_n \right]^T,$$

where

$$\begin{split} \tau\_{n+1/2} &= \frac{1}{2}(2 - 2b\_{11} - 2b\_{12} + \upsilon)u\_n' + \frac{1}{8}(8 - 4b\_{11} + \upsilon)u\_n'' h \\ &\quad + \frac{1}{48}(26 - 6b\_{11} + \upsilon)u\_n^{(3)} h^2 + O(h^3), \end{split} \tag{7}$$

$$\begin{split} \tau\_n &= \frac{1}{2}(1 - 2b\_{21} - 2b\_{22} + \upsilon)u\_n' + \frac{1}{8}(3 - 4b\_{21} + \upsilon)u\_n'' h \\ &\quad + \frac{1}{48}(7 - 6b\_{21} + \upsilon)u\_n^{(3)} h^2 + O(h^3). \end{split} \tag{8}$$

Setting the coefficients of the constant term and of the *h* term in (7) and (8) to zero, and equating the ratio of the coefficients of the $h^2$ terms in (7) and (8) to $\frac{\upsilon}{\upsilon - 1}$, condition 4 is satisfied.
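These cancellations, and the required ratio $\frac{\upsilon}{\upsilon - 1} = 7$ of the $h^2$ coefficients, can be verified in exact arithmetic for the resulting scheme (given below as (9), with $\upsilon = 7/6$):

```python
import sympy as sp

# scheme (9): A = (1/6)[[-1, 7], [-1, 7]]  =>  upsilon = 7/6,
#             B = (1/24)[[55, -17], [25, 1]]
ups = sp.Rational(7, 6)
b11, b12, b21, b22 = (sp.Rational(n, 24) for n in (55, -17, 25, 1))

# coefficients of u', u''h, u'''h^2 in (7) and (8)
c0_half = sp.Rational(1, 2) * (2 - 2*b11 - 2*b12 + ups)
c1_half = sp.Rational(1, 8) * (8 - 4*b11 + ups)
c2_half = sp.Rational(1, 48) * (26 - 6*b11 + ups)
c0 = sp.Rational(1, 2) * (1 - 2*b21 - 2*b22 + ups)
c1 = sp.Rational(1, 8) * (3 - 4*b21 + ups)
c2 = sp.Rational(1, 48) * (7 - 6*b21 + ups)

print(c0_half, c1_half, c0, c1)    # 0 0 0 0
print(c2_half, c2, c2_half / c2)   # 161/576 23/576 7
```

The surviving $h^2$ coefficients $\frac{161}{576} = 7 \cdot \frac{23}{576}$ and $\frac{23}{576}$ are exactly the entries of the leading truncation error (10).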

Finally we have the desired scheme as in [3]

$$V\_{n+1} = \frac{1}{6} \begin{bmatrix} -1 & 7\\ -1 & 7 \end{bmatrix} V\_n + \frac{h}{24} \begin{bmatrix} 55 & -17\\ 25 & 1 \end{bmatrix} \begin{bmatrix} f\_{n+1/2} \\ f\_n \end{bmatrix},\tag{9}$$

and correspondingly the local truncation error is second order, as expected:

$$
\tau\_n = \frac{23}{576} \begin{bmatrix} 7 \\ 1 \end{bmatrix} u\_n^{(3)} h^2 + O(h^3). \tag{10}
$$
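A quick numerical experiment on the linear test problem *u′* = −*u* (our own choice, with the block started from the exact half-step value, an idealized initialization) confirms the error inhibiting property: the observed global order of the scheme (9) is close to 3, one above the order of the local truncation error:

```python
import numpy as np

def eis_solve(f, v0, v_half, a, b, h):
    # march the error inhibiting scheme (9) for the block V_n = [v_{n+1/2}, v_n]^T
    A = np.array([[-1.0, 7.0], [-1.0, 7.0]]) / 6.0
    B = np.array([[55.0, -17.0], [25.0, 1.0]]) / 24.0
    V = np.array([v_half, v0])
    t = a
    for _ in range(int(round((b - a) / h))):
        F = np.array([f(t + h / 2, V[0]), f(t, V[1])])
        V = A @ V + h * (B @ F)
        t += h
    return V[1]                    # v_N approximates u(b)

f = lambda t, u: -u                # test problem u' = -u, u(0) = 1
errs = []
for h in (0.1, 0.05):
    v = eis_solve(f, 1.0, np.exp(-h / 2), 0.0, 1.0, h)
    errs.append(abs(v - np.exp(-1.0)))
p = np.log2(errs[0] / errs[1])
print(errs, p)                     # observed order close to 3
```

The mechanism is visible in (10): the truncation error is proportional to $[7, 1]^T$, and $A [7, 1]^T = 0$, so the leading error term is annihilated at every step instead of accumulating.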

#### **3 RBF Interpolation**

Now we briefly explain the RBF interpolation in one dimension. Suppose that for a domain $\Omega \subset \mathbb{R}$, a data set $\{(x\_i, u\_i)\}\_{i=0}^N$ is given, where $u\_i$ is the value of the unknown function *u(x)* at $x = x\_i \in \Omega$. We use the RBFs $\phi\_i : \Omega \to \mathbb{R}$ defined by $\phi\_i(x) = \phi(|x - x\_i|, \epsilon\_i)$, where $|x - x\_i|$ is the distance between *x* and $x\_i$ and $\epsilon\_i$ is a free parameter. The reconstruction of a function, *u(x)*, is then made by a linear combination of RBFs

$$I\_N^R u(x) = \sum\_{i=0}^N \lambda\_i \phi(|x - x\_i|, \epsilon\_i), \tag{11}$$

where $\lambda\_i$ are the expansion coefficients to be determined. Using the interpolation conditions $I\_N^R u(x\_i) = u\_i$, $i = 0, \cdots, N$, we can find the expansion coefficients $\lambda\_i$ by solving the linear system

$$
\begin{bmatrix}
\phi(|x\_0 - x\_0|, \epsilon\_0) & \phi(|x\_0 - x\_1|, \epsilon\_1) & \cdots & \phi(|x\_0 - x\_N|, \epsilon\_N) \\
\phi(|x\_1 - x\_0|, \epsilon\_0) & \phi(|x\_1 - x\_1|, \epsilon\_1) & \cdots & \phi(|x\_1 - x\_N|, \epsilon\_N) \\
\vdots & \vdots & & \vdots \\
\phi(|x\_N - x\_0|, \epsilon\_0) & \phi(|x\_N - x\_1|, \epsilon\_1) & \cdots & \phi(|x\_N - x\_N|, \epsilon\_N)
\end{bmatrix}
\begin{bmatrix}
\lambda\_0 \\
\lambda\_1 \\
\vdots \\
\lambda\_N
\end{bmatrix} = \begin{bmatrix}
u\_0 \\
u\_1 \\
\vdots \\
u\_N
\end{bmatrix}.
\tag{12}
$$

If we choose the multiquadric RBF with all the free parameters equal, then the interpolation matrix, *A*, becomes a symmetric matrix with all diagonal entries 1,

$$A = \begin{bmatrix} 1 & \sqrt{1 + \epsilon^2 (\mathbf{x}\_0 - \mathbf{x}\_1)^2} & \cdots & \sqrt{1 + \epsilon^2 (\mathbf{x}\_0 - \mathbf{x}\_N)^2} \\ \sqrt{1 + \epsilon^2 (\mathbf{x}\_1 - \mathbf{x}\_0)^2} & 1 & \cdots & \sqrt{1 + \epsilon^2 (\mathbf{x}\_1 - \mathbf{x}\_N)^2} \\ \vdots & \vdots & & \vdots \\ \sqrt{1 + \epsilon^2 (\mathbf{x}\_N - \mathbf{x}\_0)^2} & \sqrt{1 + \epsilon^2 (\mathbf{x}\_N - \mathbf{x}\_1)^2} & \cdots & 1 \end{bmatrix} \tag{13}$$

Consider the case of three equally spaced nodes *x*0*, x*1*, x*<sup>2</sup> with *x*<sup>0</sup> *< x*<sup>1</sup> *< x*2. Let *h* be the grid spacing. Then the linear system becomes

$$
\begin{bmatrix}
1 & \sqrt{1+\epsilon^2 h^2} & \sqrt{1+4\epsilon^2 h^2} \\
\sqrt{1+\epsilon^2 h^2} & 1 & \sqrt{1+\epsilon^2 h^2} \\
\sqrt{1+4\epsilon^2 h^2} & \sqrt{1+\epsilon^2 h^2} & 1
\end{bmatrix} \cdot \begin{bmatrix}
\lambda\_0 \\
\lambda\_1 \\
\lambda\_2
\end{bmatrix} = \begin{bmatrix}
u\_0 \\
u\_1 \\
u\_2
\end{bmatrix}.
\tag{14}$$

By the closed-form expression for the RBF interpolant in [4],

$$I\_2^R u(x) = \sum\_{i=0}^2 \frac{u\_i}{\det(A)} \det(A\_i(x)), \tag{15}$$

where $A\_i(x)$, a 3 × 3 matrix, is obtained by replacing the *i*th row of *A* with the row vector

$$\left[\sqrt{1+\epsilon^2(\mathbf{x}-\mathbf{x\_0})^2} \quad \sqrt{1+\epsilon^2(\mathbf{x}-\mathbf{x\_1})^2} \quad \sqrt{1+\epsilon^2(\mathbf{x}-\mathbf{x\_2})^2}\right].$$

Differentiating the interpolant, we obtain the first-order derivative

$$\frac{d}{dx} I\_2^R u(x) = \sum\_{i=0}^2 \frac{u\_i}{\det(A)} \cdot \frac{d}{dx} \det(A\_i(x)).\tag{16}$$

We then estimate the derivative of *u* at *x* = *x*<sup>1</sup> as we do in polynomial interpolation for the central difference formula:

$$\frac{d}{dx}I\_2^R u(x\_1) = \frac{\sqrt{1 + 4\epsilon^2 h^2} + 1}{4h\sqrt{1 + \epsilon^2 h^2}}(u\_2 - u\_0). \tag{17}$$

By employing the Taylor expansion of the quotient on the right-hand side of (17), we have

$$\frac{d}{dx} I\_2^R u(x\_1) = \left[ \frac{1}{2h} + \epsilon^2 \frac{h}{4} + O(h^3) \right] (u\_2 - u\_0). \tag{18}$$
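The identity (17), and its expansion (18), can be checked numerically against the interpolant obtained by solving the system (14) directly; the data $u = \sin(3x)$ is an arbitrary test function chosen for this sketch:

```python
import numpy as np

def mq(r, eps):
    # multiquadric RBF phi(r) = sqrt(1 + (eps * r)^2)
    return np.sqrt(1.0 + (eps * r)**2)

eps, h = 0.5, 0.1
x = np.array([0.0, h, 2 * h])
u = np.sin(3.0 * x)                       # arbitrary test data

# solve the 3x3 system (14) for the expansion coefficients
A = mq(x[:, None] - x[None, :], eps)
lam = np.linalg.solve(A, u)

# derivative of the MQ interpolant at the middle node x_1:
# d/dx phi = eps^2 (x - x_i) / sqrt(1 + eps^2 (x - x_i)^2); middle term vanishes
d_interp = eps**2 * h / np.sqrt(1 + (eps * h)**2) * (lam[0] - lam[2])

# closed-form expression (17)
d_formula = (np.sqrt(1 + 4 * eps**2 * h**2) + 1) \
    / (4 * h * np.sqrt(1 + eps**2 * h**2)) * (u[2] - u[0])
print(d_interp, d_formula)                # the two values agree
```

For small *h* both values are also close to the expansion $(\frac{1}{2h} + \frac{\epsilon^2 h}{4})(u_2 - u_0)$ of (18), with the discrepancy at the stated $O(h^3)$ level.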

The main feature of the RBF method is that it contains a free parameter, $\epsilon$, which we can exploit to further inhibit the errors. In the following section, we will show that using the parameter $\epsilon$ coupled with $h^p$ terms, where $p \geq 2$, we can increase the order of the local truncation error and further raise the order of the global error by adopting the error inhibiting scheme.

#### **4 Construction of the Adaptive Error Inhibiting Scheme**

Following the main feature of the RBF method explained in the preceding section, we try to establish a similar explicit block one-step scheme that provides a higher order of convergence by adding one more block of the free parameters $\epsilon\_1$ and $\epsilon\_2$ coupled with an $h^p$ term. With $p = 3$, we have

$$V\_{n+1} = \begin{bmatrix} 1 - \upsilon & \upsilon \\ 1 - \upsilon & \upsilon \end{bmatrix} V\_n + h \begin{bmatrix} b\_{11} & b\_{12} \\ b\_{21} & b\_{22} \end{bmatrix} \begin{bmatrix} f\_{n+1/2} \\ f\_n \end{bmatrix} + h^3 \begin{bmatrix} 0 & \epsilon\_1 \\ 0 & \epsilon\_2 \end{bmatrix} \begin{bmatrix} f\_{n+1/2} \\ f\_n \end{bmatrix}. \tag{19}$$

We measure the one-step error normalized by the step size as in Sect. 2. Expanding $u\_{n+3/2}$, $u\_{n+1}$ and $u\_{n+1/2}$ around $t = t\_n$ in a Taylor series again yields the local truncation error

$$\boldsymbol{\tau}\_n = \begin{bmatrix} \tau\_{n+1/2}, \tau\_n \end{bmatrix}^T,$$

where

$$\begin{aligned} \tau\_{n+1/2} &= \frac{1}{2} (2 - 2b\_{11} - 2b\_{12} + \upsilon) u\_n' + \frac{1}{8} (8 - 4b\_{11} + \upsilon) u\_n'' h + \\ &\left( -\epsilon\_1 u\_n' + \frac{1}{48} (26 - 6b\_{11} + \upsilon) u\_n^{(3)} \right) h^2 + \frac{1}{384} (80 - 8b\_{11} + \upsilon) u\_n^{(4)} h^3 + O(h^4), \end{aligned} \tag{20}$$

$$\begin{split} \tau\_{n} &= \frac{1}{2}(1 - 2b\_{21} - 2b\_{22} + \upsilon)u\_{n}' + \frac{1}{8}(3 - 4b\_{21} + \upsilon)u\_{n}'' h + \\ &\left(-\epsilon\_{2}u\_{n}' + \frac{1}{48}(7 - 6b\_{21} + \upsilon)u\_{n}^{(3)}\right)h^{2} + \frac{1}{384}(15 - 8b\_{21} + \upsilon)u\_{n}^{(4)}h^{3} + O(h^{4}). \end{split} \tag{21}$$

Annihilating the first two terms in (20) and (21), and equating the ratio of the coefficients of the $h^3$ terms in (20) and (21) to $\frac{\upsilon}{\upsilon - 1}$, we obtain the scheme

$$V\_{n+1} = \frac{1}{7} \begin{bmatrix} -1 & 8 \\ -1 & 8 \end{bmatrix} V\_n + \frac{h}{28} \begin{bmatrix} 64 & -20 \\ 29 & 1 \end{bmatrix} \begin{bmatrix} f\_{n+1/2} \\ f\_n \end{bmatrix} + h^3 \begin{bmatrix} 0 & \epsilon\_1 \\ 0 & \epsilon\_2 \end{bmatrix} \begin{bmatrix} f\_{n+1/2} \\ f\_n \end{bmatrix}. \tag{22}$$

We can easily check that the scheme (22) satisfies the four conditions in Sect. 2. Further annihilating the coefficients of the $h^2$ terms, we get the optimal values of $\epsilon\_1$ and $\epsilon\_2$:

$$
\epsilon\_1 = \frac{47 u\_n^{(3)}}{168 u\_n'},
\tag{23}
$$

$$
\epsilon\_2 = \frac{9 u\_n^{(3)}}{224 u\_n'}. \tag{24}
$$

Our new scheme has the truncation error

$$
\boldsymbol{\tau}\_n = \frac{55}{2688} \begin{bmatrix} 8 \\ 1 \end{bmatrix} u\_n^{(4)} h^3 + O(h^4). \tag{25}
$$

Note that in our new scheme, we need the value of $u\_n^{(3)}$ at each step. This higher-order derivative could be computed by differentiating the function $f$ on the right-hand side of (1) twice. However, we choose to estimate the third-order derivative instead. For $u\_n'$, we use the given condition from (1), i.e. $u'(t) = f(t, u(t))$. For the third-order derivative $u\_n^{(3)}$, we employ the second-order central difference formula for $f(t, u(t))$ at $t = t\_n$:

$$
u\_n^{(3)} = f''(t\_n, u\_n) \approx \frac{4(f\_{n+1/2} - 2f\_n + f\_{n-1/2})}{h^2}, \tag{26}
$$

where $f\_{n+1/2}$, $f\_n$ and $f\_{n-1/2}$ are already available. For this computation, no additional conditions are necessary. The truncation error is still third-order accurate, $O(h^3)$, as in (25), so by the error inhibiting strategy we end up with a global error that is $O(h^4)$, as confirmed in the following section.

We conclude this section with a comparison of three methods. For the DIMSIM of type 3,

$$
\begin{bmatrix} v\_{n+2} \\ v\_{n+1} \end{bmatrix} = \frac{1}{4} \begin{bmatrix} 7 & -3 \\ 7 & -3 \end{bmatrix} \begin{bmatrix} v\_{n+1} \\ v\_n \end{bmatrix} + \frac{h}{8} \begin{bmatrix} 9 & -7 \\ -3 & -3 \end{bmatrix} \begin{bmatrix} f\_{n+1} \\ f\_n \end{bmatrix},
$$

the two steps $v\_n$ and $v\_{n+1}$ are employed to update the step $v\_{n+1}$ and obtain the step $v\_{n+2}$.

For the error inhibiting scheme,

$$
\begin{bmatrix} v\_{n+3/2} \\ v\_{n+1} \end{bmatrix} = \frac{1}{6} \begin{bmatrix} -1 & 7 \\ -1 & 7 \end{bmatrix} \begin{bmatrix} v\_{n+1/2} \\ v\_n \end{bmatrix} + \frac{h}{24} \begin{bmatrix} 55 & -17 \\ 25 & 1 \end{bmatrix} \begin{bmatrix} f\_{n+1/2} \\ f\_n \end{bmatrix},
$$

the two steps $v\_n$ and $v\_{n+1/2}$ are involved to generate the next two steps $v\_{n+1}$ and $v\_{n+3/2}$. For our method (if we utilize (26) and substitute (23), (24) for $\epsilon\_1$ and $\epsilon\_2$, respectively, in (22) to avoid the zero denominator),

$$
\begin{bmatrix} v\_{n+3/2} \\ v\_{n+1} \\ v\_{n+1/2} \end{bmatrix} = \frac{1}{7} \begin{bmatrix} -1 & 8 & 0 \\ -1 & 8 & 0 \\ 7 & 0 & 0 \end{bmatrix} \begin{bmatrix} v\_{n+1/2} \\ v\_n \\ v\_{n-1/2} \end{bmatrix} + \frac{h}{168} \begin{bmatrix} 572 & -496 & 188 \\ 201 & -48 & 27 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} f\_{n+1/2} \\ f\_n \\ f\_{n-1/2} \end{bmatrix},
$$

we use the previous three steps $v\_{n-1/2}$, $v\_n$ and $v\_{n+1/2}$ to evolve the next two steps $v\_{n+1}$ and $v\_{n+3/2}$. In [5] the stability analysis has been carried out for adaptive radial basis function methods for IVPs, and it has been shown that some adaptive methods have a better stability condition than the original ones. However, it seems that the adaptive error inhibiting method is more computationally expensive than the original one when the approximation (26) is used.
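As a sanity check on the coefficients above, the combined map can be implemented in a few lines. The sketch below is illustrative, not the authors' code; in particular, the startup values $v\_{-1/2}$, $v\_0$, $v\_{1/2}$ are seeded from the exact solution, a choice the paper does not prescribe. Applied to the test problem $u' = -u^2$ of Sect. 5, halving $h$ should reduce the global error by roughly $2^4 = 16$:

```python
import numpy as np

# Combined adaptive error-inhibiting map: the state holds
# [v_{n+1/2}, v_n, v_{n-1/2}] at half-step spacing h/2.
A = np.array([[-1.0, 8.0, 0.0],
              [-1.0, 8.0, 0.0],
              [ 7.0, 0.0, 0.0]]) / 7.0
B = np.array([[572.0, -496.0, 188.0],
              [201.0,  -48.0,  27.0],
              [  0.0,    0.0,   0.0]]) / 168.0

def solve(f, u_exact, h, T):
    """Advance to time T; returns the approximation to u(T)."""
    # startup assumption: seed v_{-1/2}, v_0, v_{1/2} from the exact solution
    state = np.array([u_exact(h/2), u_exact(0.0), u_exact(-h/2)])
    for _ in range(round(T/h)):
        state = A @ state + h * (B @ f(state))
    return state[1]            # middle entry is v_n at t = n*h

f = lambda v: -v**2            # test problem (27), autonomous
u = lambda t: 1.0/(1.0 + t)    # exact solution
e1 = abs(solve(f, u, 1/20, 1.0) - u(1.0))
e2 = abs(solve(f, u, 1/40, 1.0) - u(1.0))
print(e1/e2)                   # ratio near 2^4 if the global error is 4th order
```

The third rows of the matrices simply shift $v\_{n+1/2}$ into the next state, so no extra function evaluations are needed beyond $f\_{n+1/2}$, $f\_n$ and $f\_{n-1/2}$.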

#### **5 Numerical Results**

We start with the nonlinear first-order differential equation used in [3]

$$\begin{aligned} u' &= -u^2, \ t \geqslant 0 \\ u(0) &= 1. \end{aligned} \tag{27}$$

The exact solution of the example is $u(t) = 1/(t+1)$. The left panel of Fig. 1 shows the global errors at time $t = 1$ versus $N$, the number of steps, in logarithmic scale for the type-3 DIMSIM (blue), the original error inhibiting scheme (red) and our proposed method (green). As seen in the figure, our proposed method is the most accurate of the three and converges at 4th order. Table 1 shows the convergence with $N$ for (27). The type-3 DIMSIM yields 2nd order accuracy, the original error inhibiting scheme 3rd order accuracy and our proposed method 4th order accuracy.

Next we consider the following problem used in [6], whose solution changes rapidly on $[-2, 2]$:

$$\begin{aligned} u' &= -4t^3 u^2, \ t \geqslant -10\\ u(-10) &= 1/10001. \end{aligned} \tag{28}$$

**Fig. 1** Global error versus $N$ in logarithmic scale. Left: (27). Right: (28). Blue: DIMSIM (DIMSIM3), 2nd order. Red: error inhibiting scheme (EIS), 3rd order. Green: our proposed method (EIS with $h^3$), 4th order


**Table 1** Global error and order of convergence for $u' = -u^2$ with $u(0) = 1$

The exact solution is $u(t) = 1/(t^4 + 1)$. The right panel of Fig. 1 shows the global errors at $t = 0$ versus $N$ in logarithmic scale for the type-3 DIMSIM (blue), the original error inhibiting method (red) and our proposed method (green). We verify again that our proposed method is indeed the most accurate and yields the highest order of convergence. Table 2 shows the convergence with $N$ for (28). Although the type-3 DIMSIM does not reveal 2nd order accuracy at first, it eventually exhibits the expected order. The original error inhibiting scheme is 3rd order accurate and our proposed method 4th order accurate.

#### **6 Conclusions**

In this note, we modified and improved the original error inhibiting block one-step method proposed in [3] by introducing a free parameter. By exploiting the parameter, the local truncation error is further reduced, resulting in a higher order of the global error. It is numerically demonstrated that, with the proposed method, the local truncation error is of 3rd order and the global error of 4th order. As mentioned in Sect. 4, in future research we will investigate the stability of the error inhibiting method and of our proposed method, as well as relaxing the fourth constraint in the error inhibiting method.


**Table 2** Global error and order of convergence for $u' = -4t^3u^2$ with $u(-10) = 1/10001$

**Acknowledgements** The authors thank Adi Ditkowski for introducing the error inhibiting method to us and communicating with us on the subject. The research is partially supported by Ajou University.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **Hermite Methods in Time**

**Rujie Gu and Thomas Hagstrom**

#### **1 Introduction**

Over the past decade a number of works have appeared which exploit the unique properties of Hermite-Birkhoff interpolation in space to construct arbitrary-order discretization methods for hyperbolic [1, 2, 4–7, 10, 14, 16–19] as well as Schrödinger [3] equations. The precise form of the interpolant in a single cell, which here we write in one dimension labelled *t*, is

$$u(t) \approx \mathcal{J}u(t) \in \Pi^{2m+1}, \quad t \in (t\_{j-1}, t\_j), \tag{1}$$

$$\frac{d^k}{dt^k} \mathcal{J}u(t\_\ell) = \frac{d^k u}{dt^k}(t\_\ell);\ k = 0, \dots, m, \quad \ell = j - 1, j,\tag{2}$$

where $\Pi^{2m+1}$ denotes the polynomials of degree $2m + 1$. (In higher dimensions one uses a tensor-product cell interpolant based on vertex data consisting of mixed derivatives of order through $m$ in each Cartesian coordinate.)

In contrast, there has been little work on analogous methods for time discretization. A recent exception is the manuscript by Liu et al. [15]. They develop methods for second-order semilinear hyperbolic equations using interpolants of the form (1)–(2) combined with a reformulation of the evolution problem using exact solutions of the linear part. They demonstrate excellent long-time performance.

The outline of the paper is as follows. In Sect. 2 we list a few properties of piecewise Hermite-Birkhoff interpolation. In Sect. 3 we construct the time-stepping

R. Gu · T. Hagstrom (-)

Southern Methodist University, Dallas, TX, USA e-mail: rujieg@smu.edu; thagstrom@smu.edu

<sup>©</sup> The Author(s) 2020

S. J. Sherwin et al. (eds.), *Spectral and High Order Methods for Partial Differential Equations ICOSAHOM 2018*, Lecture Notes in Computational Science and Engineering 134, https://doi.org/10.1007/978-3-030-39647-3\_8

schemes and establish some basic results, with a few numerical experiments described in Sect. 4.

#### **2 Basic Properties of Hermite-Birkhoff Interpolation**

Hermite interpolants have a number of interesting properties which make them very attractive for the solution of differential equations; see, e.g., [2]. Here we will mainly use the simplest. Precisely, for $t \in (t\_{j-1}, t\_j)$, the Peano representation of the local error can be easily derived by noting that $e = u - \mathcal{J}u$ solves the two-point boundary value problem

$$\frac{d^{2m+2}e}{dt^{2m+2}} = \frac{d^{2m+2}u}{dt^{2m+2}}, \ \ \frac{d^ke}{dt^k} = 0, \ \ t = t\_{j-1}, t\_j, \ \ k = 0, \dots, m. \tag{3}$$

Thus

$$e(t) = \int\_{t\_{j-1}}^{t\_j} K\_j(t, s) \frac{d^{2m+2}u}{dt^{2m+2}}(s)\, ds, \tag{4}$$

where the kernel $K\_j$ is the Green's function for (3). Simple scaling arguments combined with the transformation $t = t\_{j-1} + zh\_j$ then show that $e = O(h\_j^{2m+2})$, where $h\_j = t\_j - t\_{j-1}$ is the time step. A fundamental feature of piecewise Hermite interpolation is the following orthogonality property. For **any** functions $v(t)$, $w(t)$

$$\int\_{t\_{j-1}}^{t\_j} \frac{d^{m+1} \mathcal{J}v}{dt^{m+1}}(t) \cdot \frac{d^{m+1} \left(w - \mathcal{J}w\right)}{dt^{m+1}}(t)\, dt = 0, \tag{5}$$

which in particular implies that interpolation reduces the $H^{m+1}$ seminorm.

#### **3 Time-Stepping Methods**

We begin by considering the initial value problem for a first-order system of ordinary differential equations:

$$\frac{du}{dt} = f(u, t), \quad u(t\_0) = u\_0, \quad u(t) \in \mathbb{R}^d. \tag{6}$$

Given a discrete time sequence $t\_j > t\_{j-1}$, $j = 1, \dots, N$, with time steps $h\_j = t\_j - t\_{j-1}$, we write down the Picard integral formulation of the time evolution over a single step

$$u(t\_j) = u(t\_{j-1}) + \int\_{t\_{j-1}}^{t\_j} f(u(s), s)ds. \tag{7}$$

The construction of our time integration formula proceeds in three steps. We denote by *vj* the approximation to *u(tj )*.

1. Given $v\_{j-1}$ and assuming for the moment that $v\_j$ is known, use the differential equation to compute $m$ scaled derivatives of local solutions, $V\_\ell(t)$, satisfying $V\_\ell(t\_\ell) = v\_\ell$, $\ell = j, j-1$. Setting

$$F\_{\ell}(t) = f(V\_{\ell}(t), t), \tag{8}$$

these are recursively defined by the formula

$$\frac{d^k V\_\ell}{dt^k}(t\_\ell) = \frac{d^{k-1} F\_\ell}{dt^{k-1}}(t\_\ell), \quad k = 1, \ldots, m. \tag{9}$$

2. Construct the Hermite-Birkhoff interpolant of this data; that is the polynomial *Pj*−1*/*2*(t*; *vj*−<sup>1</sup>*, vj )* of degree 2*m* + 1, satisfying

$$\frac{d^k P\_{j-1/2}}{dt^k}(t\_\ell; v\_{j-1}, v\_j) = \frac{d^k V\_\ell}{dt^k}(t\_\ell); \quad \ell = j - 1, j, \quad k = 0, \ldots, m. \tag{10}$$

3. Approximate (7) by replacing *u(t)* by *v* and replacing the integral by a *q* + 1 point quadrature rule with *f* evaluated at the Hermite interpolant:

$$v\_j = v\_{j-1} + h\_j \sum\_{k=0}^{q} w\_k f(P\_{j-1/2}(t\_{j,k}; v\_{j-1}, v\_j)). \tag{11}$$

4. Solve (11) for *vj* . Note that this is a system of *d* nonlinear equations for any *m*; that is, unlike standard implicit Runge–Kutta methods, the size of the nonlinear system is independent of the order.

We remark that we have not studied in detail the unique solvability of (11) in the stiff case. In our numerical experiments we used the solution at the current time step as an initial approximation for Newton iterations and simply accepted the solution to which the iterates converged.

To emphasize the ideas we write down some specific examples of methods with *m* = 1 and *m* = 2 making the simplifying assumption of autonomy; that is *f* = *f (u)*. The derivation of methods of arbitrary order is straightforward and the formulas can be trivially obtained using software capable of symbolic computations. To apply them at higher order one must evaluate higher derivatives of *f* , which is also possible using automatic differentiation tools [11].

*Example (m* = 1*)* Set

$$\tau = \frac{t - t\_{j-1}}{h\_j}.$$

Now the interpolant *Pj*−1*/*2*(t*; *vj*−<sup>1</sup>*, vj )* is given by:

$$P\_{j-1/2}(t; v\_{j-1}, v\_j) = \sum\_{k=0}^{3} a\_k \tau^k, \tag{12}$$

where

$$a\_0 = v\_{j-1}, \quad a\_1 = h\_j f(v\_{j-1}),$$

$$a\_2 = 3\left(v\_j - v\_{j-1}\right) - h\_j \left(2f(v\_{j-1}) + f(v\_j)\right), \tag{13}$$

$$a\_3 = -2\left(v\_j - v\_{j-1}\right) + h\_j \left(f(v\_{j-1}) + f(v\_j)\right).$$

We next introduce a quadrature rule which is exact for polynomials of degree 3. Possible choices include the 2-point Gauss-Legendre (14) rules, or the 3-point Gauss-Radau (15) or Gauss-Lobatto rules. Note that by using two different rules we obtain a possible error indicator. Here are the two different methods used below. Note that the methods are identical if *f* is linear.

$$v\_j = v\_{j-1} + \frac{h\_j}{2} \left( f\left(P\_{j-1/2}(\alpha\_-; v\_{j-1}, v\_j)\right) + f\left(P\_{j-1/2}(\alpha\_+; v\_{j-1}, v\_j)\right) \right), \tag{14}$$

$$\begin{split} v\_{j} &= v\_{j-1} + \frac{h\_{j}}{36} \left( \beta\_{+} f\left( P\_{j-1/2}(\gamma\_{-}; v\_{j-1}, v\_{j}) \right) + \beta\_{-} f\left( P\_{j-1/2}(\gamma\_{+}; v\_{j-1}, v\_{j}) \right) \right. \\ &\left. + 4f\left( v\_{j} \right) \right), \end{split} \tag{15}$$

$$\alpha\_{\pm} = \frac{1}{2}\left( 1 \pm \frac{1}{\sqrt{3}} \right), \quad \gamma\_{\pm} = \frac{4 \pm \sqrt{6}}{10}, \quad \beta\_{\pm} = 16 \pm \sqrt{6}. \tag{16}$$

A time step is executed by solving the nonlinear system, (14) or (15), for $v\_j$.

*Example (m* = 2*)* Now we also need the second time derivative of *u*,

$$\frac{d^2u}{dt^2} = \frac{d}{dt}f(u) = J(u)f(u),\tag{17}$$

where *J (u)* is the Jacobian derivative. The Hermite interpolant can now be written:

$$P\_{j-1/2}(t; v\_{j-1}, v\_j) = \sum\_{k=0}^{5} a\_k \tau^k, \tag{18}$$

where

$$a\_0 = v\_{j-1}, \quad a\_1 = h\_j f(v\_{j-1}), \quad a\_2 = \frac{h\_j^2}{2} J(v\_{j-1}) f(v\_{j-1}), \tag{19}$$

$$a\_3 = 10\left(v\_j - v\_{j-1}\right) - h\_j\left(6f(v\_{j-1}) + 4f(v\_j)\right) + \frac{h\_j^2}{2}\left(-3J(v\_{j-1})f(v\_{j-1}) + J(v\_j)f(v\_j)\right),$$

$$a\_4 = -15\left(v\_j - v\_{j-1}\right) + h\_j\left(8f(v\_{j-1}) + 7f(v\_j)\right) + \frac{h\_j^2}{2}\left(3J(v\_{j-1})f(v\_{j-1}) - 2J(v\_j)f(v\_j)\right),$$

$$a\_5 = 6\left(v\_j - v\_{j-1}\right) - 3h\_j \left(f(v\_{j-1}) + f(v\_j)\right) + \frac{h\_j^2}{2}\left(-J(v\_{j-1})f(v\_{j-1}) + J(v\_j)f(v\_j)\right).$$

Again we can now use, for example, the 3-point Gauss-Legendre or 4-point Gauss-Radau quadrature rules to produce the equation we must solve for *vj* .
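To make the $m = 1$ construction concrete, here is a minimal scalar implementation of the Gauss-Legendre variant (14); the Euler predictor and the finite-difference Newton slope are illustrative choices, not part of the method itself. On $u' = -u^2$, $u(0) = 1$, halving the step should reduce the error by about $2^{2m+2} = 16$:

```python
import numpy as np

def hermite_gl_step(v0, f, h):
    # one step of the implicit m = 1 Hermite method with the
    # 2-point Gauss-Legendre rule (14); scalar problems only
    a_minus = 0.5*(1 - 1/np.sqrt(3))   # quadrature nodes in tau on [0, 1]
    a_plus  = 0.5*(1 + 1/np.sqrt(3))

    def P(tau, v1):
        # cubic Hermite-Birkhoff interpolant (12)-(13)
        a0, a1 = v0, h*f(v0)
        a2 = 3*(v1 - v0) - h*(2*f(v0) + f(v1))
        a3 = -2*(v1 - v0) + h*(f(v0) + f(v1))
        return a0 + tau*(a1 + tau*(a2 + tau*a3))

    def G(v1):                         # residual of (14)
        return v1 - v0 - 0.5*h*(f(P(a_minus, v1)) + f(P(a_plus, v1)))

    v1 = v0 + h*f(v0)                  # explicit Euler predictor
    for _ in range(50):                # Newton with a finite-difference slope
        g = G(v1)
        dg = (G(v1 + 1e-7) - g)/1e-7
        step = g/dg
        v1 -= step
        if abs(step) < 1e-13:
            break
    return v1

def integrate(f, u0, h, n):
    v = u0
    for _ in range(n):
        v = hermite_gl_step(v, f, h)
    return v

f = lambda u: -u*u                     # u' = -u^2, u(0) = 1, exact u = 1/(1+t)
e1 = abs(integrate(f, 1.0, 1/16, 16) - 0.5)
e2 = abs(integrate(f, 1.0, 1/32, 32) - 0.5)
print(e1/e2)                           # near 2^4 = 16: order 2m+2 = 4
```

Note that each step solves a single scalar nonlinear equation, regardless of the order of the method, as emphasized in step 4 above.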

#### *3.1 Stability and Consistency*

The consistency of the method is a straightforward consequence of its construction, and its linear stability properties can also be established.

**Theorem 1** *Assume that the quadrature rule has positive weights and is exact for polynomials of degree* 2*m* + 1*. Then the implicit Hermite method is A-stable and accurate of order* 2*m* + 2*.*

*Proof* Assume that $f$ is smooth and that $u(t) \in C^{2m+2}(0,T)$. Using (4), standard estimates for quadrature errors, and the Picard formula (7), we find for the truncation error

$$\tau\_j = \frac{u(t\_j) - u(t\_{j-1})}{h\_j} - \sum\_k w\_k f(P\_{j-1/2}(t\_{j,k}; u(t\_{j-1}), u(t\_j))), \tag{20}$$

$$|\tau\_j| = \frac{1}{h\_j} \left| \int\_{t\_{j-1}}^{t\_j} f(u(s))ds - h\_j \sum\_k w\_k f(P\_{j-1/2}(t\_{j,k}; u(t\_{j-1}), u(t\_j))) \right| $$

$$\begin{aligned} & \leq \frac{1}{h\_j} \left| \int\_{t\_{j-1}}^{t\_j} f(u(s)) ds - h\_j \sum\_k w\_k f(u(t\_{j,k})) \right| \\ & \quad + \left| \sum\_k w\_k \left( f(u(t\_{j,k})) - f(P\_{j-1/2}(t\_{j,k}; u(t\_{j-1}), u(t\_j))) \right) \right| \\ & \leq C h\_j^{2m+2} . \end{aligned} \tag{21}$$

Now consider the Dahlquist test problem, $f(u) = \lambda u$. In this case all quadrature rules which are exact for the Hermite interpolant produce the same method. As interpolation is linear, the coefficients of the interpolant are linear combinations of $h\_j^k \lambda^k v\_{j-1}$ and $h\_j^k \lambda^k v\_j$, $k = 0, \dots, m$. The Picard integral then increases the powers of $h\_j \lambda$ by one, so that the implicit system (11) can be rearranged to:

$$\mathcal{Q}\_{+}(h\_{j}\lambda)v\_{j} = \mathcal{Q}\_{-}(h\_{j}\lambda)v\_{j-1} \implies v\_{j} = \frac{\mathcal{Q}\_{-}(h\_{j}\lambda)}{\mathcal{Q}\_{+}(h\_{j}\lambda)}v\_{j-1},\tag{22}$$

where *Q*±*(hj λ)* are polynomials of degree *m* + 1. Consistency implies

$$e^{h\_j \lambda} = \frac{\mathcal{Q}\_-(h\_j \lambda)}{\mathcal{Q}\_+(h\_j \lambda)} + O\left(\left(h\_j \lambda\right)^{2m+3}\right). \tag{23}$$

The only rational function of the given degree with this accuracy is the diagonal Padé approximant. We thus conclude that our methods are A-stable [12]. $\square$
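For $m = 1$ the stability function in (22) is the (2,2) diagonal Padé approximant of $e^z$, which can be spot-checked numerically (the sample points below are arbitrary):

```python
import numpy as np

def R(z):
    # (2,2) diagonal Pade approximant of exp(z): the m = 1 stability function
    return (1 + z/2 + z**2/12) / (1 - z/2 + z**2/12)

# |R| = 1 on the imaginary axis, |R| < 1 in the open left half-plane,
# and R matches exp(z) to O(z^5) near the origin
ys = np.linspace(-50, 50, 101)
print(np.max(np.abs(np.abs(R(1j*ys)) - 1)))   # roundoff-level deviation from 1
print(abs(R(-1.0)))                            # strictly less than 1
```

On the imaginary axis the numerator and denominator are complex conjugates, so $|R(iy)| = 1$ exactly; this is the characteristic behavior of diagonal Padé approximants underlying the A-stability conclusion.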

#### **4 Numerical Experiments**

Our first experiments treat standard problems from the ODE literature and are restricted to the fourth and sixth order methods described above with either Gauss-Legendre or Gauss-Radau quadrature. Our practical implementations employ the classical Aitken algorithm adapted to Hermite interpolation to directly evaluate $P\_{j-1/2}(t\_{j,k}; v\_{j-1}, v\_j)$ and solve (11) using Newton's method with the Jacobian of the implicit system approximated by finite differences. For adaptive computations we form a local error indicator, $\rho\_j$, from the discrepancy between the two quadrature rules.


We then adjust the time step by the simple rule

$$h\_{j+1} = \left(\frac{\text{tol}}{\rho\_j}\right)^{1/(2m+3)} h\_j,\tag{24}$$

while also imposing a minimum time step.
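The rule (24), together with the minimum-step safeguard, amounts to a one-line controller; the floor value below is an arbitrary illustration:

```python
def next_step(h, rho, tol, m, h_min=1e-10):
    # naive controller (24): drive the local indicator rho toward tol
    # with exponent 1/(2m+3), floored at a minimum step h_min
    return max(h_min, (tol/rho)**(1.0/(2*m + 3)) * h)

print(next_step(0.1, 1e-8, 1e-8, 2))   # indicator on target: step unchanged
```

When the indicator exceeds the tolerance the step shrinks, and when it is well below the tolerance the step grows, in both cases by the mild $1/(2m+3)$ power.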

Our final experiment examines the use of the method for evolving spectral discretizations of initial-boundary value problems for the Schrödinger equation.

#### *4.1 Arenstorf Orbit*

We first consider the problem of computing a periodic solution of the restricted three-body problem, which we reformulate as a first-order system of four variables:

$$\begin{aligned} \frac{d^2 y\_1}{dt^2} &= y\_1 + 2 \frac{dy\_2}{dt} - (1 - \mu) \frac{y\_1 + \mu}{\left( (y\_1 + \mu)^2 + y\_2^2 \right)^{3/2}} - \mu \frac{y\_1 - (1 - \mu)}{\left( (y\_1 - (1 - \mu))^2 + y\_2^2 \right)^{3/2}}, \\\\ \frac{d^2 y\_2}{dt^2} &= y\_2 - 2 \frac{dy\_1}{dt} - (1 - \mu) \frac{y\_2}{\left( (y\_1 + \mu)^2 + y\_2^2 \right)^{3/2}} - \mu \frac{y\_2}{\left( (y\_1 - (1 - \mu))^2 + y\_2^2 \right)^{3/2}}, \\\\ \mu &= 0.012277471, \quad y\_1(0) = .994, \quad \frac{d y\_1}{dt}(0) = y\_2(0) = 0, \\\\ \frac{d y\_2}{dt}(0) &= -2.01585106379082\ldots, \; T = 17.06521656015796\ldots \end{aligned}$$

(For graphs of the solution see [13, Ch. II].)

We note that this problem is not considered to be stiff. The main difficulty is a need for very small time steps when the orbits approach the singularities of *f* . However, we use it to verify convergence at the design order when (woefully inefficient) uniform time steps are employed and to test the utility of our naive time step adaptivity algorithm.

Results for fixed (small) time steps are displayed in Table 1. We observe that convergence is at design order and that the results for the two quadrature formulas are comparable, though the fourth order Radau method is somewhat more accurate than Gauss-Legendre, with roles reversed at sixth order. The sixth order methods are more accurate with larger time steps. The error is simply $\sqrt{(y\_1(T) - y\_1(0))^2 + (y\_2(T) - y\_2(0))^2}$.

Results for adaptive computations with *m* = 2 are shown in Table 2. Obviously, the adaptive methods lead to a very significant reduction in the number of time steps; an accuracy of 10−<sup>7</sup> is achieved with 264 steps of the adaptive method



**Fig. 1** Solution and time step history for the van der Pol oscillator with tolerance 10−<sup>8</sup>

while 35,000 uniform steps are required. Due to the sensitivity of the problem, the global error is much larger than the error tolerance, but is reduced in proportion to it.

#### *4.2 Van der Pol Oscillator*

Our second example is the van der Pol oscillator problem, which again we rewrite as a first order system:

$$
\frac{d^2 y}{dt^2} = \epsilon^{-1} \left( (1 - y^2) \frac{dy}{dt} - y \right),
\tag{25}
$$

$$
\epsilon = 10^{-6}, \quad y(0) = 2, \ \frac{dy}{dt}(0) = 0.
$$

We solve up to $T = 11$ using the adaptive method with $m = 2$. We plot the solution and the time step histories for a tolerance of $10^{-8}$ in Fig. 1. Note that very small steps are needed to resolve the fast transitions, while the problem is quite stiff in the regions where $y$ is nearly constant. Plots for the other tolerances tested, $10^{-6}$ and $10^{-10}$, are similar, though the number of time steps required varies.

**Fig. 2** Left: Relative errors for NLS with various time steps and *m* = 3. Right: Relative errors for NLS with *h* = *.*01 and varying *m*

#### *4.3 Schrödinger Equation*

Lastly, we apply the method to evolve a Fourier pseudospectral discretization of the nonlinear Schrödinger equation. Precisely we consider the real problem

$$\frac{\partial v}{\partial t} = -\frac{\partial^2 w}{\partial x^2} - \left(v^2 + w^2\right)w, \quad \frac{\partial w}{\partial t} = \frac{\partial^2 v}{\partial x^2} + \left(v^2 + w^2\right)v,\tag{26}$$

for $x \in (-8, 8)$, $t \in (0, 3)$ with periodic boundary conditions $\partial v/\partial x = \partial w/\partial x = 0$ at $x = \pm 8$. We approximate the periodization of the exact solitary wave solution

$$v(x,t) = \sqrt{50}\cos\left(rx - st\right) \operatorname{sech}(5(x - ct)),$$

$$w(x,t) = \sqrt{50}\sin\left(rx - st\right) \operatorname{sech}(5(x - ct)), \tag{27}$$

with $c = 2\pi$, $r = \pi$, $s = \pi^2 - 25$. We note that the amplitude of the solitary wave is reduced by about 17 digits at a distance of 8 from its peak, so that the interaction with periodic copies is negligible over the simulation time. We use 512 Fourier modes in the computation of the derivatives, and experiments show that this is sufficient to represent the solitary wave to machine precision. The implicit system was solved using Newton iterations at each time step. In Fig. 2 we present results for $m = 3$ (8th order) with varying time step, and for $m$ varying from 1 to 5 (order 4 through 12) with $h = 10^{-2}$. In both cases we observe rapid convergence. We also tabulate the errors at the final time and calculate the convergence rates for $m = 3$ in Table 3. The results are clearly consistent with the design order.

**Table 3** Relative errors for the Fourier pseudospectral discretization of the NLS (26) with solitary wave solution (27)
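The Fourier pseudospectral second derivative used in this discretization can be sketched as follows; the grid size and test mode below are illustrative, not the 512-mode setup of the experiment:

```python
import numpy as np

def fourier_d2(u, L):
    # spectral second derivative of a periodic sample of u on an interval of length L
    N = len(u)
    k = 2*np.pi*np.fft.fftfreq(N, d=L/N)   # angular wavenumbers 2*pi*m/L
    return np.fft.ifft(-(k**2)*np.fft.fft(u)).real

N, L = 64, 16.0                             # the paper uses (-8, 8), i.e. L = 16
x = -L/2 + L*np.arange(N)/N
u = np.sin(2*np.pi*x/L)                     # single Fourier mode as a test
err = np.max(np.abs(fourier_d2(u, L) + (2*np.pi/L)**2*u))
print(err)                                  # roundoff-level for a resolved mode
```

For a function resolved by the grid, such as the sech profile in (27) with enough modes, the differentiation error is at the level of machine precision, which is what makes the temporal error visible in Fig. 2.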


#### **5 Conclusions and Future Work**

In conclusion, we have demonstrated that Hermite-Birkhoff interpolation can be used to develop singly-implicit A-stable timestepping methods of arbitrary order. A number of possible generalizations and improvements to the method are possible. These include


**Acknowledgements** This work was supported in part by NSF Grant DMS-1418871. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **HPS Accelerated Spectral Solvers for Time Dependent Problems: Part II, Numerical Experiments**

**Tracy Babb, Per-Gunnar Martinsson, and Daniel Appelö**

#### **1 Introduction**

This chapter describes a highly computationally efficient solver for equations of the form

$$
\kappa \frac{\partial u}{\partial t} = \mathcal{A}u(\mathbf{x}, t) + h(u, \mathbf{x}, t), \ \mathbf{x} \in \Omega, \ t > 0, \tag{1}
$$

with initial data $u(\mathbf{x}, 0) = u\_0(\mathbf{x})$. Here $\mathcal{A}$ is an elliptic operator acting on a fixed domain $\Omega$ and $h$ collects lower-order, possibly nonlinear, terms. We take $\kappa$ to be real or imaginary, allowing for parabolic and Schrödinger type equations. We desire the benefits that can be gained from an implicit solver, such as L-stability and stiff accuracy, which means that the computational bottleneck will be the solution of a sequence of elliptic equations set on $\Omega$. In situations where the elliptic equation to be solved is the same in each time-step, it is highly advantageous to use a *direct* (as opposed to *iterative*) solver. In a direct solver, an approximate solution operator to the elliptic equation is built once. The cost to build it is typically higher than the cost required for a single elliptic solve using an iterative method such as multigrid, but the upside is that after it has been built, each subsequent solve is very fast. In this chapter, we argue that a particularly efficient direct solver to use in this context is a method obtained by combining a multidomain spectral collocation discretization (a

T. Babb · D. Appelö (-)

University of Colorado, Boulder, CO, USA e-mail: tracy.babb@colorado.edu; daniel.appelo@colorado.edu

© The Author(s) 2020

P.-G. Martinsson University of Texas, Austin, TX, USA e-mail: pgm@ices.utexas.edu

S. J. Sherwin et al. (eds.), *Spectral and High Order Methods for Partial Differential Equations ICOSAHOM 2018*, Lecture Notes in Computational Science and Engineering 134, https://doi.org/10.1007/978-3-030-39647-3\_9

so-called "patching method", see e.g. Ch. 5.13 in [3]) with a nested dissection type solver. It has recently been demonstrated [1, 7, 12] that this combined scheme, which we refer to as a "Hierarchical Poincaré–Steklov (HPS)" solver, can be used with very high local discretization orders (up to *p* = 20 or higher) without jeopardizing either speed or stability, as compared to lower order methods.

In this chapter, we investigate the stability and accuracy that is obtained when combining high-order time-stepping schemes with the HPS method for solving elliptic equations. We restrict attention to relatively simple geometries (mostly rectangles). The method can without substantial difficulty be generalized to domains that can naturally be expressed as a union of rectangles, possibly mapped via curvilinear smooth parameter maps.

A longer version of this chapter with additional details is available at [2]. Also note that the conclusions are deferred to Part II of this paper (same issue).

#### **2 The Hierarchical Poincaré–Steklov Method**

In this section, we describe a computationally efficient and highly accurate technique for solving an elliptic PDE of the form

$$\begin{aligned} [Au](\mathbf{x}) &= g(\mathbf{x}), \qquad \mathbf{x} \in \Omega, \\ u(\mathbf{x}) &= f(\mathbf{x}), \qquad \mathbf{x} \in \Gamma, \end{aligned} \tag{2}$$

where *Ω* is a domain with boundary *Γ* , and where *A* is a variable coefficient elliptic differential operator

$$[Au](\mathbf{x}) = -c\_{11}(\mathbf{x})[\partial\_1^2 u](\mathbf{x}) - 2c\_{12}(\mathbf{x})[\partial\_1 \partial\_2 u](\mathbf{x}) - c\_{22}(\mathbf{x})[\partial\_2^2 u](\mathbf{x})$$

$$+ c\_1(\mathbf{x})[\partial\_1 u](\mathbf{x}) + c\_2(\mathbf{x})[\partial\_2 u](\mathbf{x}) + c(\mathbf{x})u(\mathbf{x})$$

with smooth coefficients. In the present context, (2) represents an elliptic solve that is required in an implicit time-discretization technique for a parabolic PDE, as discussed in Sect. 1. For simplicity, let us temporarily suppose that the domain *Ω* is rectangular; the extension to more general domains is discussed in Remark 1.

Our ambition here is merely to provide a high level description of the method; for implementation details, we refer to [1, 2, 7–9, 12, 13].

#### *2.1 Discretization*

We split the domain $\Omega$ into $n\_1 \times n\_2$ boxes, each of size $h \times h$. Then on each box, we place a $p \times p$ tensor product grid of Chebyshev nodes, as shown in Fig. 1. We use collocation to discretize the PDE (2). With $\{x\_i\}\_{i=1}^{N}$ denoting the collocation points,

**Fig. 1** The domain *Ω* is split into *n*<sup>1</sup> × *n*<sup>2</sup> squares, each of size *h* × *h*. In the figure, *n*<sup>1</sup> = 3 and *n*<sup>2</sup> = 2. Then on each box, a *p* × *p* tensor product grid of Chebyshev nodes is placed, shown for *p* = 7. At red nodes, the PDE (2) is enforced via collocation of the spectral differentiation matrix. At the blue nodes, we enforce continuity of the normal fluxes. Observe that the corner nodes (gray) are excluded from consideration

the vector u that represents our approximation to the solution *u* of (2) is given simply by u*(i)* ≈ *u(xi).* We then discretize (2) as follows:


Since we exclude the corner nodes from consideration, the total number of nodes in the grid equals $N = (p-2)\,p\,n_1 n_2 + n_1 + n_2 \approx p^2 n_1 n_2$. The discretization procedure described then results in an $N \times N$ matrix $\mathsf{A}$. For a node $i$, the value of $\mathsf{A}(i,:)\,\mathsf{u}$ depends on what type of node $i$ is:

$$\mathsf{A}(i,:)\,\mathsf{u} \approx \begin{cases} [Au](x_i) & \text{for any interior (red) node,} \\ 0 & \text{for any edge node (blue) not on } \Gamma, \\ [\partial u/\partial n](x_i) & \text{for any edge node (blue) on } \Gamma. \end{cases}$$

This matrix A can be used to solve BVPs with a variety of different boundary conditions, including Dirichlet, Neumann, Robin, and periodic [12].
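To make the leaf discretization concrete, the following sketch (our own illustration, not the authors' implementation) builds the standard Chebyshev differentiation matrix and collocates the constant-coefficient model operator $A = -\partial_1^2 - \partial_2^2$ on a single $p \times p$ leaf; the corner-node exclusion and the interface conditions described above are omitted.

```python
import numpy as np

def cheb(p):
    """Chebyshev differentiation matrix and nodes on [-1, 1] (Trefethen's recipe)."""
    x = np.cos(np.pi * np.arange(p) / (p - 1))
    c = np.array([2.0] + [1.0] * (p - 2) + [2.0]) * (-1.0) ** np.arange(p)
    dX = x[:, None] - x[None, :]
    D = np.outer(c, 1.0 / c) / (dX + np.eye(p))   # off-diagonal entries
    D -= np.diag(D.sum(axis=1))                   # diagonal via negative row sums
    return D, x

p = 7
D, x = cheb(p)
D2 = D @ D                                        # 1D second-derivative matrix
I = np.eye(p)
# Collocation of A = -d^2/dx^2 - d^2/dy^2 on the p x p tensor grid
A = -np.kron(D2, I) - np.kron(I, D2)
X, Y = np.meshgrid(x, x, indexing="ij")
u = (X**2 + Y**2).ravel()                         # quadratic test function
residual = A @ u                                  # equals -4 at every node, since
                                                  # spectral differentiation is exact
                                                  # on polynomials of degree < p
```

The same construction with variable coefficients $c_{11}, c_{12}, \dots$ amounts to scaling and adding the corresponding Kronecker products.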

In many situations, a simple uniform mesh of the type shown in Fig. 1 is not optimal, since the regularity in the solution may vary greatly, due to corner singularities, localized loads, etc. The HPS method can easily be adapted to handle local refinement. The essential difficulty that arises is that when boxes of different sizes are joined, the collocation nodes along the joint boundary will not align. It is demonstrated in [1, 5] that this difficulty can stably and efficiently be handled by incorporating local interpolation operators.

#### *2.2 A Hierarchical Direct Solver*

A key observation in previous work on the HPS method is that the sparse linear system that results from the discretization technique described in Sect. 2.1 is particularly well suited for direct solvers, such as the well-known multifrontal solvers that compute an LU-factorization of a sparse matrix. The key is to minimize fill-in by using a so-called nested dissection ordering [4, 6]. Such direct solvers are very powerful in a situation where a sequence of linear systems with the same coefficient matrix needs to be solved, since each solve is very fast once the coefficient matrix has been factorized. This is precisely the environment under consideration here. The particular advantage of combining the multidomain spectral collocation discretization described in Sect. 2.1 with such a direct solver is that the time required for factorizing the matrix is *independent* of the local discretization order. As we will see in the numerical experiments, this enables us to attain both very high accuracy and very high computational efficiency.
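The factor-once/solve-many pattern can be sketched with a generic sparse direct solver; here SciPy's SuperLU wrapper stands in for a multifrontal HPS factorization, and the matrix is an assumed 1D finite-difference stand-in, not the HPS system itself.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n, dt = 200, 1e-3
h = 1.0 / (n + 1)
# Stand-in for (I - dt*L): identity minus a scaled 1D difference Laplacian
Lap = sp.diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(n, n)) / h**2
A = (sp.identity(n) - dt * Lap).tocsc()

lu = spla.splu(A)            # factor once; SuperLU picks a fill-reducing ordering
u = np.sin(np.pi * h * np.arange(1, n + 1))
for _ in range(100):         # every "time step" reuses the same factorization
    u = lu.solve(u)          # each solve is a cheap pair of triangular solves
```

The expensive factorization is amortized over all subsequent time steps, which is exactly the setting of the implicit schemes below.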

*Remark 1 (General Domains)* For simplicity we restrict attention to rectangular domains in this chapter. The extension to domains that can be mapped to a union of rectangles via smooth coordinate maps is relatively straightforward, since the method can handle variable coefficient operators [12, Sec. 6.4]. Some care must be exercised since singularities may arise at intersections of parameter maps, which may require local refinement to maintain high accuracy.

The direct solver described exactly mimics the classical nested dissection method, and has the same asymptotic complexity of $O(N^{1.5})$ for the "build" (or "factorization") stage, and then $O(N \log N)$ cost for solving a system once the coefficient matrix has been factorized. Storage requirements are also $O(N \log N)$. A more precise analysis of the complexity that takes into account the dependence on the order $p$ of the local discretization shows [1] that $T_{\text{build}} \sim N p^4 + N^{1.5}$ and $T_{\text{solve}} \sim N p^2 + N \log N$.

#### **3 Time-Stepping Methods**

For high-order time-stepping of (1), we use the so-called Explicit, Singly Diagonally Implicit Runge–Kutta (ESDIRK) methods. These methods have a Butcher table with a constant diagonal $\gamma$ and are of the form

$$\begin{array}{c|cccccc}
0 & 0 & & & & & \\
2\gamma & \gamma & \gamma & & & & \\
c_3 & a_{3,1} & a_{3,2} & \gamma & & & \\
\vdots & \vdots & \vdots & & \ddots & & \\
c_{s-1} & a_{s-1,1} & a_{s-1,2} & a_{s-1,3} & \cdots & \gamma & \\
1 & b_1 & b_2 & b_3 & \cdots & b_{s-1} & \gamma \\ \hline
 & b_1 & b_2 & b_3 & \cdots & b_{s-1} & \gamma
\end{array}$$

ESDIRK methods offer the advantages of stiff accuracy and L-stability. They are particularly attractive when used in conjunction with direct solvers, since the elliptic solve required in each stage involves the same coefficient matrix $(I - h\gamma L)$, where $h$ is the time-step.

In general we split the right hand side of (1) into a stiff part, $F^{[1]}$, that will be treated implicitly using ESDIRK methods, and a part, $F^{[2]}$, that will be treated explicitly (with a Butcher table denoted $\hat{c}$, $\hat{A}$, and $\hat{b}$). Precisely, we will use the Additive Runge–Kutta (ARK) methods by Carpenter and Kennedy [11], of orders 3, 4 and 5.

We may choose to formulate the Runge–Kutta method in terms of either solving for slopes or solving for stage solutions. We denote these the $k_i$ formulation and the $u_i$ formulation, respectively. When solving for slopes, the stage computation is

$$k_i^n = F^{[1]}\Big(t_n + c_i \Delta t,\; u^n + \Delta t \sum_{j=1}^s a_{ij} k_j^n + \Delta t \sum_{j=1}^s \hat{a}_{ij} l_j^n\Big), \quad i = 1, \dots, s,\tag{3}$$

$$l_i^n = F^{[2]}\Big(t_n + c_i \Delta t,\; u^n + \Delta t \sum_{j=1}^s a_{ij} k_j^n + \Delta t \sum_{j=1}^s \hat{a}_{ij} l_j^n\Big), \quad i = 1, \dots, s. \tag{4}$$

Note that the explicit nature of (4) is encoded in the fact that the elements on the diagonal and above in *A*ˆ are zero. Once the slopes have been computed the solution at the next time-step is assembled as

$$
u^{n+1} = u^n + \Delta t \sum_{j=1}^s b_j k_j^n + \Delta t \sum_{j=1}^s \hat{b}_j l_j^n. \tag{5}
$$

If the method is instead formulated in terms of solving for the stage solutions the implicit solves take the form

$$u_i^n = u^n + \Delta t \sum_{j=1}^s \left( a_{ij}\, F^{[1]}(t_n + c_j \Delta t, u_j^n) + \hat{a}_{ij}\, F^{[2]}(t_n + c_j \Delta t, u_j^n) \right),$$

and the explicit update for $u^{n+1}$ is given by

$$u^{n+1} = u^n + \Delta t \sum_{j=1}^s b_j \left(F^{[1]}(t_n + c_j \Delta t, u_j^n) + F^{[2]}(t_n + c_j \Delta t, u_j^n)\right).$$

The two formulations are algebraically equivalent but offer different advantages. For example, when working with the slopes we do not observe (see experiments presented in the second part of this paper) any order reduction due to time-dependent boundary conditions (see e.g. the analysis by Rosales et al. [14]). On the other hand, and as discussed in some detail below, when solving for the slopes the HPS framework requires an additional step to enforce continuity.

We note that it is generally preferred to solve for the slopes when implementing implicit Runge–Kutta methods, particularly when solving very stiff problems where the influence of roundoff (or solver tolerance) errors can be magnified by the Lipschitz constant when solving for the stages directly.
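As a concrete illustration of the slope formulation (3)+(5) for a purely implicit linear problem $u' = Lu$, here is a minimal sketch. It uses the well-known second-order, L-stable, stiffly accurate ESDIRK table with $\gamma = 1 - \sqrt{2}/2$ (TR-BDF2 in ESDIRK form) rather than the ARK3/4/5 tables of Carpenter and Kennedy, and dense linear algebra in place of the HPS direct solver; all names are our own.

```python
import numpy as np

# Second-order, L-stable, stiffly accurate ESDIRK table (TR-BDF2 in ESDIRK form).
g = 1.0 - np.sqrt(2.0) / 2.0                       # constant diagonal gamma
A_tab = np.array([[0.0,             0.0,             0.0],
                  [g,               g,               0.0],
                  [np.sqrt(2.0)/4,  np.sqrt(2.0)/4,  g  ]])
b_tab = A_tab[-1].copy()                           # stiffly accurate: b_j = a_{s,j}

def esdirk_step(Lop, u, dt):
    """One step of the slope (k_i) formulation for u' = Lop @ u."""
    s = len(b_tab)
    k = np.zeros((s, len(u)))
    M = np.eye(len(u)) - dt * g * Lop              # same matrix in every implicit stage
    k[0] = Lop @ u                                 # first stage is explicit
    for i in range(1, s):
        rhs = Lop @ (u + dt * sum(A_tab[i, j] * k[j] for j in range(i)))
        k[i] = np.linalg.solve(M, rhs)             # (I - dt*g*Lop) k_i = rhs
    return u + dt * sum(b_tab[j] * k[j] for j in range(s))

# Example: u' = -u on [0, 1]; u[0] approximates exp(-1) to second order
Lop = np.array([[-1.0]])
u, dt = np.array([1.0]), 1.0 / 50
for _ in range(50):
    u = esdirk_step(Lop, u, dt)
```

In the HPS setting the dense solve against $M$ is replaced by one application of the precomputed factorization of $(I - \Delta t\,\gamma L)$.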

*Remark 2* The HPS method for elliptic solves was previously used in [10], which considered a linear hyperbolic equation

$$\frac{\partial u}{\partial t} = \mathcal{L}u(\mathbf{x}, t), \qquad \mathbf{x} \in \Omega,\ t > 0,$$

where $\mathcal{L}$ is a skew-Hermitian operator. The evolution of the numerical solution can be performed by approximating the propagator $\exp(\tau \mathcal{L}) : L^2(\Omega) \to L^2(\Omega)$ via a rational approximation

$$\exp(\tau \mathcal{L}) \approx \sum_{m=-M}^{M} b_m (\tau \mathcal{L} - \alpha_m)^{-1}.$$

If application of $(\tau \mathcal{L} - \alpha_m)^{-1}$ to the current solution can be reduced to the solution of an elliptic-type PDE, it is straightforward to apply the HPS scheme to each term in the approximation. A drawback with this approach is that multiple operators must be formed; it is also slightly more convenient to time-step non-linear equations using the Runge–Kutta methods we use here.

There are two modifications to the HPS algorithm that are necessitated by the use of ARK time integrators; we discuss these in the next two subsections.

#### *3.1 Neumann Data Correction in the Slope Formulation*

In the HPS algorithm the PDE is enforced on interior nodes and continuity of the normal derivative is enforced on the leaf boundary. Now, due to the structure of the update formula (5), if at some time $u^n$ has an error component in the null space of the operator that is used to solve for a slope $k_i$, then this will remain throughout the solution process. Although this does not affect the stability of the method, it may result in loss of relative accuracy as the solution evolves. As a concrete example, consider the heat equation

$$
u_t = u_{xx}, \qquad x \in [0, 2],\ t > 0, \tag{6}
$$

with the initial data $u(x, 0) = 1 - |x - 1|$, and with homogeneous Dirichlet boundary conditions. We discretize this on two leaves, which we denote by $\alpha$ and $\beta$.

Now in the $k_i$ formulation, we solve several PDEs for the $k_i$ values and update the solution as

$$
u^{n+1} = u^n + \Delta t \sum_{j=1}^s b_j k_j^n.
$$

Here, even though the individual slopes have continuous derivatives, the kink in $u^n$ will be propagated to $u^{n+1}$. In this particular example we would end up with the incorrect steady-state solution $u(x, t) = 1 - |x - 1|$.

Fortunately, this can easily be mitigated by adding a consistent penalization of the jump in the derivative of the solution during the merging of two leaves (for details see Section 4 in [1]). That is, if we denote the jump by $[\![\cdot]\!]$, we replace the condition $0 = [\![\mathsf{T}\mathsf{k} + \mathsf{h}^k]\!]$, where $\mathsf{T}\mathsf{k}$ is the derivative from the homogeneous part and $\mathsf{h}^k$ is the derivative from the particular solution (of the slope), by the condition $[\![\mathsf{T}\mathsf{k} + \mathsf{h}^k - \Delta t^{-1}\mathsf{h}^u]\!] = 0$. In comparison to [1] we get the slightly modified merge formula

$$\mathsf{k}_{i,3} = \left(\mathsf{T}_{3,3}^{\alpha} - \mathsf{T}_{3,3}^{\beta}\right)^{-1} \left(\mathsf{T}_{3,2}^{\beta}\,\mathsf{k}_{i,2} - \mathsf{T}_{3,1}^{\alpha}\,\mathsf{k}_{i,1} + \mathsf{h}_{3}^{k,\beta} - \mathsf{h}_{3}^{k,\alpha} - \frac{1}{\Delta t}\big(\mathsf{h}_{3}^{u,\alpha} - \mathsf{h}_{3}^{u,\beta}\big)\right),$$

along with the modified equation for the fluxes of the particular solution on the parent box

$$
\begin{split}
\begin{bmatrix} \mathsf{v}_1 \\ \mathsf{v}_2 \end{bmatrix}
&= \left( \begin{bmatrix} \mathsf{T}_{1,1}^{\alpha} & \mathbf{0} \\ \mathbf{0} & \mathsf{T}_{2,2}^{\beta} \end{bmatrix} + \begin{bmatrix} \mathsf{T}_{1,3}^{\alpha} \\ \mathsf{T}_{2,3}^{\beta} \end{bmatrix} \left(\mathsf{T}_{3,3}^{\alpha} - \mathsf{T}_{3,3}^{\beta}\right)^{-1} \left[ -\mathsf{T}_{3,1}^{\alpha} \mid \mathsf{T}_{3,2}^{\beta} \right] \right) \begin{bmatrix} \mathsf{k}_{i,1} \\ \mathsf{k}_{i,2} \end{bmatrix} \\
&\quad + \begin{bmatrix} \mathsf{h}_{1}^{k,\alpha} \\ \mathsf{h}_{2}^{k,\beta} \end{bmatrix} + \begin{bmatrix} \mathsf{T}_{1,3}^{\alpha} \\ \mathsf{T}_{2,3}^{\beta} \end{bmatrix} \left(\mathsf{T}_{3,3}^{\alpha} - \mathsf{T}_{3,3}^{\beta}\right)^{-1} \left( \mathsf{h}_{3}^{k,\beta} - \mathsf{h}_{3}^{k,\alpha} - \frac{1}{\Delta t}\big(\mathsf{h}_{3}^{u,\alpha} - \mathsf{h}_{3}^{u,\beta}\big) \right).
\end{split}
$$

Due to space limitations we must refer to [1] for a detailed discussion of these equations. Briefly, $\mathsf{h}^{k,\alpha}$ and $\mathsf{h}^{k,\beta}$ above denote the spectral derivative on each child's boundary for the particular solution to the PDE for $k_i$, and are already present in [1]. However, $\mathsf{h}^{u,\alpha}$ and $\mathsf{h}^{u,\beta}$, which denote the spectral derivative of $u^n$ on the boundary from each child box, are new additions.

The above initial data is of course extreme, but we note that the problem persists for any non-polynomial initial data, with the size of the (stationary) error depending on the resolution of the simulation. We further note that the described penalization removes this problem without affecting the accuracy or performance of the overall algorithm.

*Remark 3* Although for a linear constant-coefficient PDE it may be possible to project the initial data so that interior affine functions do not cause the difficulty above, for greater generality we have chosen to enforce the extra penalization throughout the time evolution.

*Remark 4* When utilizing the $u_i$ formulation in a purely implicit problem we do not encounter the difficulty described above. This is because we enforce continuity of the derivative in $u_s^n$ when solving

$$(I - \Delta t\, \gamma \mathcal{L})\, u_s^n = u^n + \Delta t\, \mathcal{L}\Big( \sum_{j=1}^{s-1} a_{sj} u_j^n \Big) + \Delta t \sum_{j=1}^{s-1} a_{sj}\, g\left(\mathbf{x}, t_n + c_j \Delta t\right),$$

followed by the update $u^{n+1} = u_s^n$.

#### *3.2 Enforcing Continuity in the Explicit Stage*

The second modification is to the first explicit stage in the $k_i$ formulation. For a problem with no forcing, this stage is simply

$$k_1^n = \mathcal{L}(u^n).$$

When, for example, $\mathcal{L}$ is the Laplacian, we must evaluate it on all nodes on the interior of the physical domain. This includes the nodes on the boundary between two leaves, where the spectral approximation to the Laplacian can differ depending on which leaf's values are used. The seemingly obvious choice, replacing the Laplacian on the leaf boundary by the average, leads to instability. However, stability can be restored if we enforce $k_1^n = \mathcal{L}(u^n)$ on the interior of each leaf and continuity of the derivative across each leaf boundary. Algorithmically, this is straightforward, as these are the same conditions that are enforced in the regular HPS algorithm, except that in this case we simply have an identity equation for $k_1$ on the interior nodes instead of a full PDE.

Although it is convenient to enforce continuity of the derivative using the regular HPS algorithm, it can be done more efficiently by forming a separate system of equations involving only data on the leaf boundary nodes. In a single dimension, on a discretization with $n$ leaves, this reduces the work associated with enforcing continuity of the derivative across leaf boundary nodes from solving $n \times (p-1) - 1$ equations for $n \times (p-1) - 1$ unknowns to solving a tridiagonal system of $n-1$ equations for $n-1$ unknowns.

In two dimensions the system is slightly different, but if we have $n \times n$ leaves with $p \times p$ Chebyshev nodes on each leaf, then eliminating the explicit equations for the interior nodes reduces the system to $(p-2) \times 2n$ independent tridiagonal systems of $n-1$ equations with $n-1$ unknowns, for a total of $(p-2) \times 2n \times (n-1)$ equations with $(p-2) \times 2n \times (n-1)$ unknowns.
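The savings are easy to see with a generic banded solver. The system below is an assumed diagonally dominant stand-in for the leaf-boundary continuity system (the actual entries come from the leaf operators in [1]):

```python
import numpy as np
from scipy.linalg import solve_banded

n = 1000
m = n - 1                          # n - 1 equations for n - 1 unknowns
diag = 4.0 * np.ones(m)            # assumed diagonally dominant stand-in system
off = -np.ones(m - 1)
ab = np.zeros((3, m))              # banded storage: rows are super/main/sub diagonal
ab[0, 1:] = off
ab[1, :] = diag
ab[2, :-1] = off
rhs = np.random.default_rng(0).standard_normal(m)
x = solve_banded((1, 1), ab, rhs)  # O(n) banded LU, vs O(n^3) for a dense solve
```

In 2D one would simply loop this $O(n)$ solve over the $(p-2) \times 2n$ independent tridiagonal systems.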

When the $u_i$ formulation is used for a fully implicit problem, the intermediate stage values still require us to evaluate $\mathcal{L}u^n$, but this quantity only enters through the body load in the intermediate stage PDEs. The explicit first stage in this formulation is simply $u_1^n = u^n$. Furthermore, while we must calculate

$$
u^{n+1} = u^n + \Delta t\, \mathcal{L}\Big( \sum_{j=1}^s a_{sj} u_j^n \Big),
$$

this is equivalent to $u_s^n$ since $b_j = a_{sj}$, and we simply take $u^{n+1} = u_s^n$.

When both explicit and implicit terms are present, we proceed differently. Now, the values of $u_i^n$ look almost identical to the implicit case and we still avoid the problem of an explicit "solve" in $u_1^n$, but we also have

$$u^{n+1} = u^n + \Delta t \sum_{j=1}^s b_j \left(F^{[1]}(t_n + c_j \Delta t, u_j^n) + F^{[2]}(t_n + c_j \Delta t, u_j^n)\right).$$

The ESDIRK method has the property that $b_j = a_{sj}$, but for the explicit Runge–Kutta method we have $b_j \neq \hat{a}_{sj}$. When the explicit operator $F^{[2]}$ does not contain partial derivatives, we need not enforce continuity of the derivative and can simply reformulate the method as

$$
u^{n+1} = u_s^n + \Delta t \sum_{j=1}^s (a_{sj} - \hat{a}_{sj})\, F^{[2]}(t_n + c_j \Delta t, u_j^n).
$$

#### **4 Boundary Conditions**

The above description for Runge–Kutta methods does not address how to impose boundary conditions for a system of ODEs resulting from a discretization of a PDE. In particular, the different formulations incorporate boundary conditions in slightly different ways.

In this work we consider Dirichlet, Neumann, and periodic boundary conditions. For periodic boundary conditions, the intermediate stage boundary conditions are enforced to be periodic for both formulations. As the $k_i$ stage values are approximations to the time derivative of $u$, the imposed Dirichlet boundary conditions for $\mathbf{x} \in \Gamma$ are $k_i^n = u_t(\mathbf{x}, t_n + c_i \Delta t)$. When solving for $u_i$, one may attempt to enforce boundary conditions using $u_i = u(\mathbf{x}, t_n + c_i \Delta t)$, $\mathbf{x} \in \Gamma$. However, as demonstrated in part two of this series and discussed in detail in [14], this results in order reduction for time-dependent boundary conditions.

In the HPS algorithm, Neumann or Robin boundary conditions are mapped to Dirichlet boundary conditions using the linear Dirichlet to Neumann operator as discussed for example in [1].

#### **References**



### **On the Use of Hermite Functions for the Vlasov–Poisson System**

**Lorella Fatone, Daniele Funaro, and Gianmarco Manzini**

#### **1 Introduction**

A semi-Lagrangian spectral method has been proposed in [8] for the numerical approximation of the nonrelativistic Vlasov–Poisson equations, which describe the dynamics of a collisionless plasma of charged particles, coupled under the effect of their own electric field. We assume for simplicity that the development of the plasma is only due to electrons. Moreover, we just treat the case of a 1D-1V distribution function, defined in a phase space consisting of the two one-dimensional independent variables $x$ (space) and $v$ (velocity). The approximation introduced in [8] has been initially developed and tested on Fourier-Fourier periodic discretizations, for both variables in the phase space. In the successive paper [9], the approximation in the variable $v$ has been approached with the help of Hermite functions, i.e., Hermite polynomials multiplied by the Gaussian weight $\exp(-v^2)$.

Semi-Lagrangian methods for plasma physics calculations were originally proposed in [5, 18] and more recently in [6, 15, 16]. By this approach, at different times, the solution is approximated at the nodes of a Cartesian grid covering the space-velocity domain. The solution at each space-velocity node is traced back along the characteristic curve originating backward from that node. In [8] a high-order Taylor expansion of the characteristic curves is used to trace back the solution in time, which is then approximated by spectral interpolation. Such a method guarantees the conservation of the main physical quantities (charge, mass, and momentum).

L. Fatone
Dipartimento di Matematica, Università degli Studi di Camerino, Camerino, Italy
e-mail: lorella.fatone@unicam.it

D. Funaro
Dipartimento di Scienze Chimiche e Geologiche, Università degli Studi di Modena e Reggio Emilia, Modena, Italy
e-mail: daniele.funaro@unimore.it

G. Manzini
Group T-5, Applied Mathematics and Plasma Physics, Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM, USA
e-mail: gmanzini@lanl.gov

S. J. Sherwin et al. (eds.), *Spectral and High Order Methods for Partial Differential Equations ICOSAHOM 2018*, Lecture Notes in Computational Science and Engineering 134, https://doi.org/10.1007/978-3-030-39647-3_10

The first attempt in using Hermite polynomials to solve the Vlasov equation dates back to the work [10], where the Hermite basis is used in the velocity variable to describe a plasma in a physical state near the thermodynamic equilibrium. Within this approach, exact discrete conservation laws can be constructed [7, 13, 14, 20, 21]. The weight function of the Hermite basis can be generalized by introducing a parameter $\alpha$ in such a way that it becomes $\exp(-\alpha^2 v^2)$. A proper choice of this parameter can significantly improve the convergence [2, 3, 19]. This fact was also confirmed in earlier works on plasma physics based on Hermite spectral methods (see [11, 17] and more recently [4]).

The paper is organized as follows. In Sect. 2, we present the continuous model, i.e., the 1D-1V Vlasov equation. In Sect. 3, we introduce the spectral approximation in the phase space. In Sect. 4, we present the semi-Lagrangian schemes based on an approximation of the characteristic curves coupled with a second-order backward differentiation formula (BDF). In Sect. 5, we numerically assess the performance of the method for a standard test case, and we show how the solution's behavior can be affected by the choice of a certain parameter $\beta$, acting on the location of the Hermite weight function.

#### **2 The Continuous Model**

We deal with the 1D-1V Vlasov equation defined in the domain $\Omega = \Omega_x \times \mathbb{R}$, with $\Omega_x \subseteq \mathbb{R}$. The unknown $f = f(t, x, v)$ denotes the probability of finding negatively charged particles at the location $x$ with velocity $v$. It is the solution of the problem

$$\frac{\partial f}{\partial t} + v \frac{\partial f}{\partial x} - E(t, x) \frac{\partial f}{\partial v} = 0, \qquad t \in (0, T], \ x \in \Omega\_{\text{x}}, \ v \in \mathbb{R}. \tag{1}$$

At time $t = 0$ we have the initial distribution $f(0, x, v) = \bar{f}(x, v)$. The problem is nonlinear, since the electric field $E$ is coupled with $f$. Indeed, we set

$$\frac{\partial E}{\partial \boldsymbol{x}}(t, \boldsymbol{x}) = 1 - \rho(t, \boldsymbol{x}) = 1 - \int\_{\mathbb{R}} f(t, \boldsymbol{x}, \boldsymbol{v}) d\boldsymbol{v},\tag{2}$$

where *ρ* denotes the electron charge density. System (1)–(2) in the unknowns *f* and *E* is a simplification of the Vlasov–Poisson equations in two or three dimensional space domains. Uniqueness of the solution is ensured by imposing that

$$\int\_{\Omega\_x} E(t, x)dx = 0,\qquad \text{which implies that}\qquad \int\_{\Omega\_x} \rho(t, x)dx = |\Omega\_x|,\qquad \text{(3)}$$

where $|\Omega_x|$ is the size of $\Omega_x$. We assume periodic boundary conditions in the variable $x$ and a suitable exponential decay at infinity for the variable $v$. After integration and by using the boundary constraints, we obtain the conservation of mass

$$\frac{d}{dt} \int\_{\Omega} f(t, x, v) \, dx \, dv = 0. \tag{4}$$

When *f* and *E* are smooth enough, for a sufficiently small *δ >* 0, the local system of characteristics associated with (1) is given by the curves *(X(τ ), V (τ ))* solving

$$\frac{dX}{d\tau} = -V(\tau), \qquad \frac{dV}{d\tau} = E(\tau, X(\tau)), \qquad \tau \in ]t - \delta, t + \delta[, \tag{5}$$

with the condition that *(X(t), V (t))* = *(x, v)* when *τ* = *t*. With this setting we have in mind that for *τ >* 0 we proceed backward. Under suitable regularity assumptions, there exists a unique solution of the Vlasov–Poisson problem (1)–(2) which is formally obtained by propagating the initial condition along the characteristic curves described by (5), i.e. we have

$$f(t, x, v) = \bar{f}(X(0), V(0)),\tag{6}$$

where we recall that $\bar{f}$ is the initial datum. By using the first-order approximation

$$X(\tau) = x - v(\tau - t), \qquad V(\tau) = v + E(t, x)(\tau - t), \tag{7}$$

the Vlasov equation is satisfied up to an error decaying as |*τ* − *t*|, for *τ* tending to *t*.

#### **3 Phase-Space Discretization**

We briefly recall the construction of the approximation method proposed in [8]. At each point of a given grid, the new value of the discrete solution is set to be equal to the value obtained by going backward, by a suitably small amount, along the local characteristic lines. The algorithm follows from a Taylor expansion of arbitrary order, where the derivatives in the variables $x$ and $v$ are carried out with spectral accuracy. In particular, for the variable $x$ we consider the domain $\Omega_x = [0, 2\pi[$. Given the positive integer $N$, we have the equispaced nodes $x_i = 2\pi i/N$, $i = 0, 1, \dots, N-1$. Regarding the direction $v$, when $M$ is a given positive integer, the nodes $v_j$, $j = 0, 1, \dots, M-1$, are the zeros of $H_M$, which is the Hermite polynomial of degree $M$.

We introduce the polynomial Lagrangian basis functions for the $x$ and $v$ variables, that are characterized by $B_i^{(N)}(x_n) = \delta_{in}$ and $B_j^{(M)}(v_m) = \delta_{jm}$, where $\delta_{ij}$ is the usual Kronecker symbol. We recall that Hermite functions are obtained from Hermite polynomials after multiplication by the weight $\omega(v) = e^{-v^2}$. We also define the discrete spaces

$$\mathbf{X}_N = \operatorname{span}\big\{ B_i^{(N)} \big\}_{i=0,1,\dots,N-1}, \qquad \mathbf{Y}_{N,M} = \operatorname{span}\big\{ B_i^{(N)} B_j^{(M)} \omega \big\}_{\substack{i=0,1,\dots,N-1 \\ j=0,1,\dots,M-1}}. \tag{8}$$

Any function *fN,M* that belongs to **Y***N,M* can be represented as

$$f_{N,M}(x, v) = \sum_{i=0}^{N-1} \sum_{j=0}^{M-1} c_{ij}\, B_i^{(N)}(x)\, B_j^{(M)}(v)\, \omega(v), \tag{9}$$

where the coefficients of such an expansion are given by $c_{ij} = f_{N,M}(x_i, v_j)$.

In the following, the matrices $d_{ni}^{(N,s)}$ and $d_{mj}^{(M,s)}$ denote the $s$-th derivative of $B_i^{(N)}$ evaluated at the point $x_n$ and of $\big(B_j^{(M)} \omega\big)$ evaluated at the point $v_m$:

$$d_{ni}^{(N,s)} = \frac{d^s B_i^{(N)}}{dx^s}(x_n) \qquad \text{and} \qquad d_{mj}^{(M,s)} = \frac{d^s \big(B_j^{(M)} \omega\big)}{dv^s}(v_m). \tag{10}$$

As a special case, we set $d_{ni}^{(N,0)} = \delta_{ni}$ and $d_{mj}^{(M,0)} = \delta_{mj}$.
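The velocity nodes $v_j$ and the associated Gauss–Hermite weights are standard objects. A short sketch (our own illustration, not the authors' code) using NumPy verifies the discrete orthogonality $\sum_m w_m H_i(v_m) H_j(v_m) = \sqrt{\pi}\, 2^i i!\, \delta_{ij}$, which is exact here because the integrand has degree at most $2M-1$:

```python
import math
import numpy as np
from numpy.polynomial.hermite import hermgauss, hermval

M = 16
v, w = hermgauss(M)            # zeros of H_M and Gauss-Hermite weights

def H(n, x):
    """Physicists' Hermite polynomial H_n evaluated at x."""
    c = np.zeros(n + 1)
    c[n] = 1.0
    return hermval(x, c)

# Discrete orthogonality of the Hermite basis under the weight omega(v) = e^{-v^2}
G = np.array([[np.sum(w * H(i, v) * H(j, v)) for j in range(4)] for i in range(4)])
expected = np.diag([math.sqrt(math.pi) * 2.0**i * math.factorial(i) for i in range(4)])
```

The same nodes $v_j$ serve both as collocation points for the Lagrangian basis $B_j^{(M)}$ and as quadrature points for the velocity integrals below.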

Now, let us assume that the one-dimensional function $E_N \in \mathbf{X}_N$ is known. Given $\Delta t > 0$, by taking $\tau - t = \Delta t$ in formula (7), we define the new set of points $\tilde{x}_{nm} = x_n - v_m\,\Delta t$ and $\tilde{v}_{nm} = v_m + E_N(x_n)\,\Delta t$. To evaluate a function $f_{N,M} \in \mathbf{Y}_{N,M}$ at the new points $(\tilde{x}_{nm}, \tilde{v}_{nm})$ through the coefficients $c_{ij}$, we use a Taylor expansion in time. By omitting the terms in $\Delta t$ of order higher than one, we get

$$B_i^{(N)}(\tilde{x}_{nm})\,\big(B_j^{(M)} \omega\big)(\tilde{v}_{nm}) \approx \delta_{in}\,\delta_{jm}\,\omega(v_m) - v_m\,\Delta t\,\delta_{jm}\,d_{ni}^{(N,1)}\,\omega(v_m) + E_N(x_n)\,\Delta t\,\delta_{in}\,d_{mj}^{(M,1)}. \tag{11}$$

By substituting (11) in (9), we obtain the approximation

$$\begin{split} f_{N,M}(\tilde{x}_{nm}, \tilde{v}_{nm}) &= \sum_{i=0}^{N-1} \sum_{j=0}^{M-1} c_{ij}\, B_i^{(N)}(\tilde{x}_{nm})\, B_j^{(M)}(\tilde{v}_{nm})\, \omega(\tilde{v}_{nm}) \\ &\approx c_{nm}\,\omega(v_m) - v_m\,\omega(v_m)\, \Delta t \sum_{i=0}^{N-1} d_{ni}^{(N,1)} c_{im} + E_N(x_n)\, \Delta t \sum_{j=0}^{M-1} d_{mj}^{(M,1)} c_{nj}, \end{split} \tag{12}$$

which is the main building block for more advanced schemes.

#### **4 Discretization of the Vlasov Equation**

Given the time instants $t^k = k\,\Delta t = k\,T/K$ for any integer $k = 0, 1, \dots, K$, we consider the approximation of the unknowns $f$ and $E$ of problem (1)–(2), given by

$$\left(f_{N,M}^{(k)}(x, v),\, E_N^{(k)}(x)\right) \cong \left(f(t^k, x, v),\, E(t^k, x)\right), \qquad x \in \Omega_x,\ v \in \mathbb{R}, \tag{13}$$

where the function $f_{N,M}^{(k)}$ belongs to $\mathbf{Y}_{N,M}$ and the function $E_N^{(k)}$ belongs to $\mathbf{X}_N$. Concerning the density function, we define

$$\rho\_N^{(k)}(\mathbf{x}) = \int\_{\Omega\_v} f\_{N,M}^{(k)}(\mathbf{x}, v) \, dv \, \simeq \rho(t^k, \mathbf{x}). \tag{14}$$

Hence, at any time step $k$, we express $f_{N,M}^{(k)}$ in the following way:

$$f_{N,M}^{(k)}(x, v) = \sum_{i=0}^{N-1} \sum_{j=0}^{M-1} c_{ij}^{(k)}\, B_i^{(N)}(x)\, B_j^{(M)}(v)\, \omega(v), \tag{15}$$

where $c_{ij}^{(k)} = f_{N,M}^{(k)}(x_i, v_j)$. At time $t = 0$, we use the initial condition $c_{ij}^{(0)} = f(0, x_i, v_j) = \bar{f}(x_i, v_j)$.

Suppose that $E_N^{(k)}$ is given at step $k$. According to [8], we write

$$E\_N^{(k)}(\mathbf{x}) = -\sum\_{n=1}^{N/2} \frac{1}{n} \left[ \hat{a}\_n^{(k)} \sin(n\mathbf{x}) - \hat{b}\_n^{(k)} \cos(n\mathbf{x}) \right],\tag{16}$$

where the discrete Fourier coefficients $\hat{a}_n^{(k)}$ and $\hat{b}_n^{(k)}$, $n = 1, 2, \dots, N/2$, are suitably related to those of $\rho_N^{(k)}$.

By taking $\tau - t = \Delta t$ in (7), we define $\tilde{x}_{nm} = x_n - v_m\,\Delta t$ and $\tilde{v}_{nm} = v_m + E_N^{(k)}(x_n)\,\Delta t$. The distribution function $f$ is expected to remain constant along the characteristics. The most straightforward discretization method is obtained by advancing the coefficients according to the approximation

$$f\_{N,M}^{(k+1)}(\mathbf{x}\_n, \upsilon\_m) \approx f\_{N,M}^{(k)}(\tilde{\mathbf{x}}\_{nm}, \tilde{\upsilon}\_{nm}).\tag{17}$$

This states that the value of $f_{N,M}^{(k+1)}$, at the grid points and time step $(k+1)\Delta t$, is assumed to correspond to the previous value at time $k\Delta t$, recovered by going backwards along the characteristics. To compute $\tilde{v}_{nm}$, we should use $E_N^{(k+1)}(x_n)$ instead of $E_N^{(k)}(x_n)$. However, the distance between these two quantities is of the order of $\Delta t$, so that the replacement has no practical effect on the accuracy of first-order methods. Between each step $k$ and the successive one, we need to update the electric field. This can be done by using the Gaussian quadrature formula in (14), so obtaining

$$\rho\_N^{(k)}(x\_l) = \sum\_{j=0}^{M-1} \frac{1}{\omega(v\_j)} f\_{N,M}^{(k)}(x\_l, v\_j) \, w\_j = \sum\_{j=0}^{M-1} \frac{1}{\omega(v\_j)} c\_{lj}^{(k)} \, w\_j,\tag{18}$$

where $w\_j$, for $j = 0, \dots, M-1$, are the quadrature weights. Afterwards, in order to compute the new point values $E\_N^{(k+1)}(x\_n)$ of the electric field, it is necessary to integrate $\rho\_N^{(k)}$. By using approximation (12) in (17), we end up with the first-order explicit scheme of Euler type:

$$c\_{nm}^{(k+1)} = c\_{nm}^{(k)} + \Delta t \, \Phi\_{nm}^{(k)},\tag{19}$$

where

$$\Phi\_{nm}^{(k)} = -v\_m \sum\_{l=0}^{N-1} d\_{nl}^{(N,1)} c\_{lm}^{(k)} + E\_N^{(k)}(x\_n) \sum\_{j=0}^{M-1} d\_{mj}^{(M,1)} c\_{nj}^{(k)} \frac{1}{\omega(v\_m)}.\tag{20}$$

The parameter $\Delta t$ must satisfy a suitable CFL condition, which is obtained by requiring that the point $(\tilde{x}\_{nm}, \tilde{v}\_{nm})$ falls inside the box $]x\_{n-1}, x\_{n+1}[ \times ]v\_{m-1}, v\_{m+1}[$. A straightforward way to increase the time accuracy is to use a multistep discretization scheme such as the second-order accurate two-step BDF scheme. We have

$$f\_{N,M}^{(k+1)}(x\_n, v\_m) \approx \frac{4}{3} f\_{N,M}^{(k)}(\tilde{x}\_{nm}, \tilde{v}\_{nm}) - \frac{1}{3} f\_{N,M}^{(k-1)}(\tilde{\tilde{x}}\_{nm}, \tilde{\tilde{v}}\_{nm}),\tag{21}$$

where $(\tilde{x}\_{nm}, \tilde{v}\_{nm})$ is the point obtained from $(x\_n, v\_m)$ by going back one step $\Delta t$ along the characteristic lines. Similarly, the point $(\tilde{\tilde{x}}\_{nm}, \tilde{\tilde{v}}\_{nm})$ is obtained by going two steps back along the characteristic lines, i.e., by using $2\Delta t$ instead of $\Delta t$ when computing $\tilde{x}\_{nm}$ and $\tilde{v}\_{nm}$. Although a BDF scheme is commonly presented as an implicit technique, in our context ($f$ constant along the characteristics) it takes the form of an explicit method. In terms of the coefficients, we end up with the scheme

$$c\_{nm}^{(k+1)} = \frac{4}{3} \left( c\_{nm}^{(k)} + \Delta t \, \Phi\_{nm}^{(k)} \right) - \frac{1}{3} \left( c\_{nm}^{(k-1)} + 2 \Delta t \, \Phi\_{nm}^{(k-1)} \right)$$

$$= \frac{4}{3} c\_{nm}^{(k)} - \frac{1}{3} c\_{nm}^{(k-1)} + \frac{2}{3} \Delta t \left[ -v\_m \sum\_{l=0}^{N-1} d\_{nl}^{(N,1)} \bigl(2c\_{lm}^{(k)} - c\_{lm}^{(k-1)}\bigr) + E\_N^{(k)} (x\_n) \sum\_{j=0}^{M-1} d\_{mj}^{(M,1)} \bigl(2c\_{nj}^{(k)} - c\_{nj}^{(k-1)}\bigr) \frac{1}{\omega(v\_m)} \right]. \tag{22}$$

From theoretical considerations and the experiments in [8], it turns out that the above method is actually second-order accurate in $\Delta t$. Higher-order schemes can be obtained following similar principles. All the above schemes guarantee mass conservation (see (4) for the continuous case), which is a crucial physical property.
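A minimal sketch of one step of the Euler-type update (19)-(20) follows. It assumes the standard trigonometric differentiation matrix for the periodic $x$ direction, while `Dv` stands in for the velocity differentiation matrix $d^{(M,1)}$, whose exact entries depend on the Hermite basis and are not reproduced here; the names are illustrative only.

```python
import numpy as np

def fourier_diff_matrix(N):
    """First-derivative spectral differentiation matrix for N (even)
    equispaced points on [0, 2*pi)."""
    x = 2.0 * np.pi * np.arange(N) / N
    D = np.zeros((N, N))
    for n in range(N):
        for l in range(N):
            if n != l:
                D[n, l] = 0.5 * (-1.0) ** (n + l) / np.tan(0.5 * (x[n] - x[l]))
    return D

def euler_step(c, v, Dx, Dv, E, omega_v, dt):
    """One step of the explicit scheme (19)-(20): c has shape (N, M),
    Dx and Dv play the roles of d^(N,1) and d^(M,1), E holds E_N(x_n),
    omega_v holds omega(v_m)."""
    Phi = -v[None, :] * (Dx @ c) + (E[:, None] * (c @ Dv.T)) / omega_v[None, :]
    return c + dt * Phi
```

As a sanity check, the Fourier matrix differentiates a constant to zero, so with $E = 0$ a spatially uniform coefficient array is left unchanged by the step.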

For practical purposes, it is advisable to make the change of variable $f(t, x, v) = p(t, x, v) \exp(-v^2)$ in the Vlasov equation, so obtaining

$$\frac{\partial p}{\partial t} + v \frac{\partial p}{\partial x} - E(t, x)\left[\frac{\partial p}{\partial v} - 2vp\right] = 0, \quad t \in (0, T],\ x \in \Omega\_x,\ v \in \mathbb{R}.\tag{23}$$

At time step $k$, the function $p(t\_k, x, v)$ is approximated by a function $p\_{N,M}^{(k)}(x, v)$ in such a way that $p\_{N,M}^{(k)} e^{-v^2}$ belongs to the finite-dimensional space $\mathbf{Y}\_{N,M}$.

A generalization consists in introducing a real parameter $\alpha$ and assuming that the weight function is $\omega(v) = \exp(-\alpha^2 v^2)$. The approximation scheme can be easily adjusted by modifying the nodes and weights of the Gaussian formula through multiplication by suitable constants. The implementation effort is practically the same, but, as observed in [9], the results are quite sensitive to the variation of $\alpha$.
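The rescaling amounts to the substitution $u = \alpha v$: if $(u\_j, w\_j)$ is a standard Gauss-Hermite rule for the weight $e^{-u^2}$, then $(u\_j/\alpha, w\_j/\alpha)$ integrates against $e^{-\alpha^2 v^2}$. A minimal sketch:

```python
import numpy as np

def scaled_hermite_rule(M, alpha):
    """Gauss-Hermite nodes/weights rescaled for the weight exp(-alpha^2 v^2).

    With u = alpha * v, integrals against exp(-alpha^2 v^2) reduce to the
    standard Hermite weight exp(-u^2), so nodes and weights are simply
    divided by alpha.
    """
    u, w = np.polynomial.hermite.hermgauss(M)
    return u / alpha, w / alpha
```

For example, with $\alpha = 2$ the weights sum to $\int e^{-4v^2}\,dv = \sqrt{\pi}/2$.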

#### **5 Numerical Experiments**

The numerical scheme proposed here is validated on the standard two-stream instability benchmark test. We consider the Vlasov–Poisson problem (1)–(2) where we set $\Omega\_x = [0, 4\pi[$, $\Omega\_v = [-5, 5]$. The initial solution is given by

$$\bar{f}(x, v) = \frac{1}{2a\sqrt{2\pi}} \left[ G\_R(v) + G\_L(v) \right] \bigl(1 + \epsilon \cos(\kappa x)\bigr),\tag{24}$$

where $G\_R(v) = e^{-\alpha^2(v-\beta)^2}$ and $G\_L(v) = e^{-\alpha^2(v+\beta)^2}$ are two Gaussians centered symmetrically at the points $v = \pm\beta$. The parameters for (24) are: $a = 1/\sqrt{8}$, $\epsilon = 10^{-3}$, $\kappa = 0.5$, $\alpha = \bar\alpha = 2$, $\beta = \bar\beta = 1$.

In all the experiments that follow, we integrate up to time $T = 30$ using the second-order BDF scheme with a suitably small time step, in order to guarantee stability and good accuracy. In this way we can concentrate our attention on the spectral approximation in the variables $x$ and $v$. A study of the convergence rate in time of the proposed numerical scheme can be found in [8]. First of all, in Fig. 1 we show the solution at time $T = 30$ recovered by the Fourier–Fourier method, choosing $N = 2^5$, $M = 2^6$ and time step $\Delta t = 0.00125$. This will be the reference figure for the subsequent comparisons. Besides, we show the corresponding time evolution of $|\hat{a}\_1^{(k)}|$, the first Fourier mode of the electric field $E\_N^{(k)}$ in (16). The behavior of this last quantity is predicted by theoretical

**Fig. 1** Two-stream instability test: approximated distribution function at time $T = 30$ obtained by using the Fourier–Fourier method with $N = 2^5$, $M = 2^6$, $\Delta t = 0.00125$, and the corresponding time evolution of the first Fourier mode of the electric field $E\_N^{(k)}$, i.e. $|\hat{a}\_1^{(k)}|$ in (16)

considerations, and the slope of the "segment" starting at $T = 15$ agrees with expectations [1, Chapter 5].

As done in [9], we perform a series of experiments using fewer degrees of freedom than those actually necessary to resolve the equation accurately. In practice, we set $N = M = 2^4$. In this way, we can for instance detect what happens when varying the parameters $\alpha$ and $\beta$. Of course, if we increase the number of degrees of freedom, the numerical solution improves and cannot be distinguished from the reference one shown in Fig. 1. The purpose in [9] was to check what happens when varying the parameter $\alpha$ in the Hermite weight $\exp(-\alpha^2 v^2)$. The conclusions are that the approximate solution is very sensitive to the choice of $\alpha$ and that there are values of $\alpha$ that perform better than others. In general these values belong to a neighbourhood of $\alpha = 1$. Moreover, in [9], we noted that keeping $\alpha$ constantly equal to the value that best fits the initial datum (i.e. $\alpha = \bar\alpha = 2$ for (24)) may create instability as time increases. For these reasons, since a practical algorithm able to vary $\alpha$ dynamically during the computations is not yet available, in the numerical experiments that follow we fix $\alpha = 1$, while varying $\beta$.

Due to the particular initial condition, we adopt a two-species decomposition of the Vlasov equation, where the distribution function is given by the sum of two electron distribution functions, i.e., $f = f\_R + f\_L$. These refer to the two initial electron populations, so that $f\_R = p\_R G\_R$ and $f\_L = p\_L G\_L$, where $p\_L$ and $p\_R$ are given polynomials. We consider the two systems of electrons described by the distribution functions $f\_L$ and $f\_R$ at the initial time as distinct plasma species that maintain their identity throughout the whole numerical simulation. Therefore, we can split the Vlasov equation into two equations that are still of Vlasov type and are solvable independently, although they are coupled through the same electric field, which depends on the total charge density. This amounts to approximating two independent equations of the same type as that given

**Fig. 2** Two-stream instability test: approximated distribution function at time $T = 30$ obtained by using the Fourier–Hermite method with $N = M = 2^4$, $\Delta t = 0.01$, $\alpha = 1$ (left panel) and the corresponding time evolution of the first Fourier mode of the electric field $E\_N^{(k)}$, i.e. $|\hat{a}\_1^{(k)}|$ in (16) (right panel) when $\beta = 0.5$ (top), $\beta = 1$ (center) and $\beta = 1.5$ (bottom)

in (23), respectively shifted by ±*β*, i.e.

$$\frac{\partial p\_R}{\partial t} + (v - \beta) \frac{\partial p\_R}{\partial x} - E(t, x) \left[ \frac{\partial p\_R}{\partial v} - 2\alpha^2 (v - \beta) p\_R \right] = 0,\tag{25}$$

$$\frac{\partial p\_L}{\partial t} + (v + \beta) \frac{\partial p\_L}{\partial x} - E(t, x) \left[ \frac{\partial p\_L}{\partial v} - 2\alpha^2 (v + \beta) p\_L \right] = 0. \tag{26}$$

The two unknowns are then coupled through the density function as in (2).

The plots of Fig. 2 show the numerical distribution function at time $T = 30$ obtained by using the Fourier–Hermite method with $N = M = 2^4$, $\Delta t = 0.01$, $\alpha = 1$ and different values of the parameter $\beta$ (i.e. $\beta = 0.5$, $\beta = 1$ and $\beta = 1.5$), together with the corresponding time evolution of the (log of the) first Fourier mode of the electric field $E\_N^{(k)}$, i.e. $|\hat{a}\_1^{(k)}|$ in (16).

The distribution functions presented in the left column of Fig. 2 are visibly and significantly different depending on $\beta$, while the first Fourier mode of the electric field shown in the right column seems to be less affected. These differences confirm that the choice of the Hermite weight functions $\omega(v) = \exp(-\alpha^2(v \pm \beta)^2)$ is a crucial aspect of the method (see also [11, 12, 17, 22]). This conclusion is heuristic; unfortunately, there is not enough space for a deeper quantitative analysis in these pages. The question nevertheless deserves further investigation. Moreover, it would be advisable to develop appropriate algorithms allowing for the automatic adjustment of both parameters $\alpha$ and $\beta$ during the time-advancing procedure, in order to optimize the performance.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **HPS Accelerated Spectral Solvers for Time Dependent Problems: Part I, Algorithms**

**Tracy Babb, Per-Gunnar Martinsson, and Daniel Appelö**

### **1 Introduction**

This chapter, part two in a two-part series, describes a sequence of numerical experiments demonstrating the performance of a highly computationally efficient solver for equations of the form

$$
\kappa \frac{\partial u}{\partial t} = \mathcal{A}u(\mathbf{x}, t) + g(u, \mathbf{x}, t), \quad \mathbf{x} \in \Omega,\ t > 0,\tag{1}
$$

with initial data $u(\mathbf{x}, 0) = u\_0(\mathbf{x})$. Here $\mathcal{A}$ is an elliptic operator acting on a fixed domain $\Omega$ and $g$ represents lower-order, possibly nonlinear, terms. We take $\kappa$ to be real or imaginary, allowing for parabolic and Schrödinger type equations.

The "Hierarchical Poincaré–Steklov (HPS)" solver has already been demonstrated to be a highly competitive spectrally accurate solver for elliptic problems [1, 4, 7] and has also been used together with a class of exponential integrators [5] to evolve solutions to hyperbolic differential equations. As just mentioned, the focus here is on differential equations of the form (1) whose discretization leads to stiff systems of ODEs that can beneficially be advanced in time using Explicit, Singly Diagonally Implicit Runge–Kutta (ESDIRK) methods. ESDIRK methods offer the advantages of stiff accuracy and L-stability and are well suited for the HPS algorithm as they only require a single matrix factorization. They are also easily combined with

T. Babb · D. Appelö (✉)

University of Colorado, Boulder, CO, USA

e-mail: tracy.babb@colorado.edu; daniel.appelo@colorado.edu

P.-G. Martinsson University of Texas, Austin, TX, USA e-mail: pgm@ices.utexas.edu

S. J. Sherwin et al. (eds.), *Spectral and High Order Methods for Partial Differential Equations ICOSAHOM 2018*, Lecture Notes in Computational Science and Engineering 134, https://doi.org/10.1007/978-3-030-39647-3\_11

explicit Runge–Kutta methods, leading to so-called Additive Runge–Kutta (ARK) methods [6].

To this end we investigate the stability and accuracy obtained when combining high-order time-stepping schemes with the HPS method for solving elliptic equations. We restrict attention to relatively simple geometries (rectangles) but note that the method can without difficulty be generalized to domains that can be expressed as a union of rectangles, possibly mapped via smooth curvilinear parameter maps.

The rest of this chapter is organized as follows. In Sect. 2 we present results illustrating that the order reduction phenomena for DIRK methods observed in [8] can be circumvented when formulating the time stepping in terms of slopes (with boundary conditions differentiated in time) rather than formulating it in terms of stage solutions. In Sect. 3 we present numerical results for the Schrödinger equation in two dimensions and in Sect. 4 we present numerical results for a nonlinear problem, the viscous Burgers' equation in two dimensions. Finally, in Sect. 5 we summarize and conclude. For a longer description of the method we refer to the first part of this paper and to [2].

#### **2 Time Dependent Boundary Conditions**

This section discusses time-dependent boundary conditions within the two different Runge–Kutta formulations. In particular, we investigate the order reduction that has been documented in [8] for implicit Runge–Kutta methods and earlier in [3] for explicit Runge–Kutta methods.

In this first experiment, introduced in [8], we solve the heat equation in one dimension

$$u\_t = u\_{xx} + f(t), \qquad x \in [0, 2], \ t > 0. \tag{2}$$

We set the initial data, Dirichlet boundary conditions and the forcing $f(t)$ so that the exact solution is $u(x, t) = \cos(t)$. This example is designed to eliminate the effect of the spatial discretization, the solution being constant in space, and allows the study of possible order reduction near the boundaries.

We use the HPS scheme in space with 32 leaves and $p = 32$ Chebyshev nodes per leaf. We apply the third, fourth, and fifth order ESDIRK methods from [6]. We consider solving for the intermediate solutions, or as we refer to it below "the $u\_i$ formulation", with the boundary condition enforced as $u\_i^n = \cos(t\_n + c\_i\Delta t)$. We also consider solving for the stages, which we refer to as "the $k\_i$ formulation", with boundary conditions imposed as $k\_i^n = -\sin(t\_n + c\_i\Delta t)$.

Order reduction for time-dependent boundary conditions has been studied both in the context of explicit Runge–Kutta methods in e.g. [3] and more recently for implicit Runge–Kutta methods in [8]. In [8] the authors report observed orders of accuracy equal to two (for the solution $u$) for DIRK methods of order 2, 3, and 4 for

**Fig. 1** The error in solving (2). Results are for a third order ESDIRK. (**a**) Displays the single step error which converges with fourth order of accuracy. (**b**) Displays the global error at *t* = 1 converging at third order. Both errors converge at one order higher than what is expected from the analysis in [8]

**Fig. 2** The error in solving (2). Results are for a fifth order ESDIRK. (**a**) Displays the single step error which converges with fourth order of accuracy. (**b**) Displays the global error at *t* = 1 converging at third order. Both errors converge at one order higher than what is expected from the analysis in [8] but still lower than expected

the problem (2) discretized with a finite difference method on a fine grid (the spatial errors are zero) using the *ui* formulation.

Figures 1 and 2 show the error for the third and fifth order ESDIRK methods, respectively, as a function of *x* for a single step and at the final time *t* = 1. Figure 3 shows the maximum error for the third, fourth, and fifth order methods as a function of time step *Δt* after a single step and at the final time *t* = 1.

In general, for a method of order $p$ we expect that the single step error decreases as $\Delta t^{p+1}$ while the global error decreases as $\Delta t^p$. However, with time dependent

**Fig. 3** The maximum error (here denoted $l^\infty$) in solving (2) for the third, fourth, and fifth order ESDIRK methods for a sequence of decreasing time steps. (**a**, **c**) are errors after one time step and (**b**, **d**) are the errors at time $t = 1$. The top row is for the $u\_i$ formulation and the bottom row is for the $k\_i$ formulation. Note that the $k\_i$ formulation is free of order reduction

boundary conditions implemented as $u\_i^n = \cos(t\_n + c\_i\Delta t)$, the results in [8] indicate that the rate of convergence will not exceed two for the single step or global error.

The results for the third order method ($p = 3$) displayed in Fig. 1 show that the single step error decreases as $\Delta t^{p+1}$ while the global error decreases as $\Delta t^p$, which is better than the results documented in [8]. However, we still see that a boundary layer appears to be forming, but it is of the same order as the error away from the boundary. The results for the fifth order method ($p = 5$) displayed in Fig. 2 show that the single step error decreases as $\Delta t^4$ while the global error decreases as $\Delta t^3$, which is still better than the results documented in [8]. However, the boundary layer causes order reduction from $\Delta t^{p+1}$ for the single step error and $\Delta t^p$ for the global error. We note that our observations differ from those in [8], but this can possibly be attributed to the use of an ESDIRK method rather than a DIRK method.

We repeat the experiment but now use the $k\_i$ formulation for the Runge–Kutta methods, enforcing the boundary condition $k\_i^n = -\sin(t\_n + c\_i\Delta t)$. The intuition here is that $k\_i^n$ is an approximation to $u\_t$ at time $t\_n + c\_i\Delta t$, so we use the value of $u\_t$ for the boundary condition of $k\_i^n$. Intuitively, we expect that reducing the index of the system of differential algebraic equations in the $u\_i$ formulation by differentiating the boundary conditions can restore the design order of accuracy.
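The $k\_i$ (slope) formulation is easy to illustrate on a scalar stiff model problem of Prothero–Robinson type; the sketch below is an analogue of our own choosing, not the PDE of the text, and it uses TR-BDF2 viewed as a three-stage, second-order, L-stable ESDIRK method rather than a tableau from [6].

```python
import numpy as np

# TR-BDF2 as a 3-stage, second-order, L-stable ESDIRK (gamma = 1 - sqrt(2)/2)
g = 1.0 - np.sqrt(2.0) / 2.0
A = np.array([[0.0, 0.0, 0.0],
              [g,   g,   0.0],
              [np.sqrt(2.0) / 4.0, np.sqrt(2.0) / 4.0, g]])
b, c = A[-1], A.sum(axis=1)   # stiffly accurate: b equals the last row

def step(u, t, h, lam):
    """One ESDIRK step in the k_i (slope) formulation for the stiff model
    u' = lam*(u - cos t) - sin t, whose exact solution is u = cos t.
    Since f is linear in u, each diagonal stage solve is done in closed form."""
    k = np.zeros(3)
    for i in range(3):
        ti = t + c[i] * h
        ui = u + h * sum(A[i, j] * k[j] for j in range(i))   # explicit part
        k[i] = (lam * (ui - np.cos(ti)) - np.sin(ti)) / (1.0 - h * lam * A[i, i])
    return u + h * sum(b[j] * k[j] for j in range(3))
```

Integrating from $u(0) = 1$ with $\lambda = -1000$ and $h = 0.01$ tracks $\cos t$ closely; here the boundary-condition analogue is that the stage values $k\_i$ approximate $u\_t$ at the abscissae $t\_n + c\_i h$.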

In the previous examples the Runge–Kutta method introduced an error on the interior while the solution on the boundary was exact. If the error on the boundary is of the same order of magnitude as the error on the interior then the error in $u\_{xx}$ is of the correct order, but when the value of $u$ is exact on the boundary it introduces a larger error in $u\_{xx}$. In the $k\_i$ formulation, for each intermediate stage we find $u\_{xx} = 0$ and then $k\_i^n = -\sin(t\_n + c\_i\Delta t)$ on the interior and on the boundary. So at a fixed time the solution is constant in $x$ and a boundary layer does not form. Additionally, the error is constant in $x$ at any fixed time, and for a method of order $p$ we obtain the expected behavior where the single step error decreases as $\Delta t^{p+1}$ and the global error decreases as $\Delta t^p$.

Figure 3 shows the maximum error for the third, fourth, and fifth order methods as a function of the time step $\Delta t$ after a single step and at the final time $t = 1$. The results show that the methods behave exactly as we expect. The single step error behaves as $\Delta t^{p+1}$ for the third and fifth order methods and as $\Delta t^{p+2}$ for the fourth order method. The fourth order method gives sixth order error in a single step because the exact solution is $u(x, t) = \cos(t)$, which has every other derivative equal to zero at $t = 0$, and for a single step we start at $t = 0$. The global error behaves as $\Delta t^p$ for each method.

#### **3 Schrödinger Equation**

Next we consider the Schrödinger equation for *u* = *u(x, y, t)*

$$\begin{aligned} i\hbar u\_t &= -\frac{\hbar^2}{2M}\Delta u + V(x, y)u, \quad t > 0, \ (x, y) \in [x\_l, x\_r] \times [y\_b, y\_t], \\ u(x, y, 0) &= u\_0(x, y). \end{aligned} \tag{3}$$

Here we nondimensionalize in a way equivalent to setting $M = 1$, $\hbar = 1$ in the above equation. We choose the potential to be the harmonic potential

$$V(x, y) = \frac{1}{2} \left(x^2 + y^2\right).$$

This leads to an exact solution

$$u(x, y, t) = A e^{-it} e^{-\frac{x^2 + y^2}{2}},\tag{4}$$

where we set $A = 1/\sqrt{\pi}$ and solve until $t = 2\pi$ on the domain $(x, y) \in [-8, 8]^2$.
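As a quick verification (not spelled out in the original text), one can check that (4) solves (3) in the nondimensionalized form $M = \hbar = 1$. Writing $r^2 = x^2 + y^2$, direct differentiation of $u = A e^{-it} e^{-r^2/2}$ gives $u\_t = -iu$ and $\Delta u = (r^2 - 2)\,u$, so that

$$i u\_t = u, \qquad -\frac{1}{2}\Delta u + \frac{1}{2} r^2 u = -\frac{1}{2}(r^2 - 2)\,u + \frac{1}{2} r^2 u = u,$$

and the two sides of the equation agree for all $t$.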

**Fig. 4** Error in the Schrödinger equation as a function of leaf size. The exact solution is given in Eq. (4)


The computational domain is subdivided into $n\_x \times n\_y$ panels with $p \times p$ points on each panel. To begin, we study the order of accuracy with respect to leaf size. To eliminate the effect of time-stepping errors we scale $\Delta t = h^{p/q\_{\mathrm{RK}}}$, where $q\_{\mathrm{RK}}$ is the order of the Runge–Kutta method. In Fig. 4 we display the errors as a function of the leaf size for $p = 4, 6, 8, 10, 12, 16$ and for the third and fifth order Runge–Kutta methods ($q\_{\mathrm{RK}} = 3, 5$). The rates of convergence for all three Runge–Kutta methods are summarized in Table 1. As can be seen from the table, $p = 4$ appears to converge at second order, while for higher $p$ we generally observe a rate of convergence approaching $p$.

In this problem the efficiency of the method is limited by the order of the Runge–Kutta methods. However, as our methods are unconditionally stable we may enhance the efficiency by using Richardson extrapolation to achieve a highly accurate solution in time. We solve the same problem, but now fix $p = 12$ and take $5 \cdot 2^n$ time steps, with $n = 0, 1, \dots, 5$. For the third order ESDIRK method we use $60 \times 60$ leaf boxes, for the fourth order $90 \times 90$, and for the fifth order $120 \times 120$. Table 2 shows that we can easily achieve much higher accuracy by using Richardson extrapolation.
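Richardson extrapolation in time is the standard step-halving construction: if $u\_h$ and $u\_{h/2}$ are produced by a method of order $q$, the combination $(2^q u\_{h/2} - u\_h)/(2^q - 1)$ cancels the leading error term. A generic sketch, not the authors' implementation:

```python
def richardson(u_h, u_h2, q):
    """Richardson extrapolation: combine approximations computed with step
    sizes h and h/2 from a method of order q to cancel the leading-order
    error term. u_h and u_h2 may be scalars or arrays."""
    return (2.0 ** q * u_h2 - u_h) / (2.0 ** q - 1.0)
```

For instance, with the first-order approximations $(1 + 1/n)^n \to e$ computed at $n = 50$ and $n = 100$, the extrapolated value is far closer to $e$ than either input.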

Finally, we solve a problem without an analytic solution. In this problem the initial data

$$u(x, y, 0) = \sin(x)\sin(y)\, e^{-(x^2 + y^2)},$$


**Table 2** Estimated errors at the final time after Richardson extrapolation

The notation $d(-p)$ means $d \cdot 10^{-p}$

**Table 3** Errors computed against a *p* and *h* refined solution


The errors are maximum errors at the final time $t = 4$. The notation $d(-p)$ means $d \cdot 10^{-p}$

interacts with the weak and slightly non-symmetric potential

$$V(x, y) = 1 - e^{-(x + 0.9y)^4},$$

allowing the solution to reach the boundary, where we impose homogeneous Dirichlet conditions.

We evolve the solution until time $t = 4$ using $p = 8$ and $10$, and $2, 4, 8, 16$ and $32$ leaf boxes in each direction of a domain of size $12 \times 12$. The errors computed against a reference solution with $p = 12$ and $32$ leaf boxes can be found in Table 3.

In Fig. 5 we display snapshots of the magnitude of the solution at the initial time *t* = 0, the intermediate times *t* ≈ 1*.*07, *t* ≈ 1*.*68 and at the final time *t* = 4*.*0.

#### **4 Burgers' Equation in Two Dimensions**

As a first step towards a full-blown flow solver we solve Burgers' equation in two dimensions using the additive Runge–Kutta methods described in the first part of this paper. Precisely, we solve the system

$$\mathbf{u}\_t + \mathbf{u} \cdot \nabla \mathbf{u} = \varepsilon \Delta \mathbf{u}, \quad \mathbf{x} \in \left[ -\pi, \pi \right]^2, \ t > 0,\tag{5}$$

where $\mathbf{u} = [u(x, y, t), v(x, y, t)]^T$ is the vector containing the velocities in the $x$ and $y$ directions.

The first problem we solve uses the initial condition $\mathbf{u} = 5[-y, x]^T \exp(-3r^2)$ and no-slip boundary conditions on all sides.

**Fig. 5** Snapshots of the magnitude of the solution at the initial time (**a**) *t* = 0, the intermediate times (**b**) *t* ≈ 1*.*07, (**c**) *t* ≈ 1*.*68 and at the final time (**d**) *t* = 4*.*0

We solve the problem using $24 \times 24$ leaves, $p = 24$, $\varepsilon = 0.005$, and the fifth order ARK method found in [6]. We use a time step of $k = 1/80$ and solve until time $t\_{\max} = 5$. The low viscosity combined with the initial condition produces a rotating flow resembling a vortex that steepens up over time.

In Fig. 6 we can see the velocities at times $t = 0.5$ and $t = 1$. The fluid rotates and expands outwards and eventually forms a shock-like transition. This creates a sharp flow region with large gradients, resulting in a flow that may be difficult to resolve with a low order accurate method. These sharp gradients can be seen in the two vorticity plots in Fig. 6 along with the speed and vorticity plots in Fig. 7.

In our second experiment we consider a cross stream of orthogonal flows. We use an initial condition of

$$\mathbf{u} = \left[ 8y \, e^{-(2y)^{8}}, \; -8x \, e^{-(2x)^{8}} \right]^{T},\tag{6}$$

and time independent boundary conditions that are compatible with the initial data.


**Fig. 6** Velocities and vorticity for the rotating shock problem at times $t = 0.5$ and $t = 1$





**Fig. 7** The plots in the first column correspond to the rotating shock problem. The second and third columns show the velocity in the $x$ direction and the dilatation at times $t = 0.06$ and $t = 0.15$ for the cross flow problem

This initial horizontal velocity drops to zero quickly as we approach $|y| = 0.5$. For $|y| < 0.5$ the exponential term approaches $\exp(0)$ and the velocity behaves like $u = 8y$. The flow has changed slightly by $t = 0.06$, but we can see in Fig. 7 that the flow is moving to the right for $y > 0$ and to the left for $y < 0$, and all significant behavior is in $|y| < 0.5$. A plot of the velocity $v$ would show similar behavior. We also use $24 \times 24$ leaves, $p = 24$, $\varepsilon = 0.025$, $k = 1/200$, and $t\_{\max} = 0.75$. We show plots of the horizontal velocity $u$ and the dilatation at times $t = 0.06$ and $t = 0.15$. We only show plots before time $t = 0.15$, when the fluid is hardest to resolve, and we observe that after $t = 0.15$ the cross streams begin to dissipate. This problem contains sharp interfaces inside $\mathbf{x} \in [-0.5, 0.5]^2$.

#### **5 Conclusion**

In this two-part series we have demonstrated that the spectrally accurate Hierarchical Poincaré–Steklov solver can easily be extended to handle time dependent PDE problems with a parabolic principal part by using ESDIRK methods. We have outlined the advantages of the two possible ways to formulate implicit Runge–Kutta methods within the HPS scheme and demonstrated the capabilities on both linear and non-linear examples.

There are many avenues for future work, for example:


#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **High-Order Finite Element Methods for Interface Problems: Theory and Implementations**

**Yuanming Xiao, Fangman Zhai, Linbo Zhang, and Weiying Zheng**

### **1 Introduction**

Interface problems, which involve partial differential equations with discontinuous coefficients across certain interfaces, are often encountered in fluid dynamics, electromagnetics and materials science. Because of the low global regularity of the solution and the irregular geometry of the interface, standard numerical methods which are efficient for smooth solutions usually suffer a loss of accuracy across the interface.

For an arbitrarily shaped interface, it is known that an optimal or nearly optimal convergence rate can be recovered if body-fitted finite element meshes are used, see e.g. [6, 8, 20, 29]. Here, by "body-fitted meshes" we mean that an element of the underlying mesh is required to intersect the interface only through its boundary (Fig. 1). Unfortunately, when the geometry is complex, this usually leads to a nontrivial interface meshing problem. Therefore, numerous modified finite difference methods based only on simple Cartesian grids have been proposed in the literature. We refer to the immersed boundary method [24], the immersed interface method [17, 18], the ghost fluid method [21], and the references therein. In the

Y. Xiao (✉)
Department of Mathematics, Nanjing University, Nanjing, China
e-mail: xym@nju.edu.cn

F. Zhai
Department of Applied Mathematics, Nanjing Forestry University, Nanjing, China

L. Zhang · W. Zheng
State Key Laboratory of Scientific and Engineering Computing, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China

School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, China
e-mail: zlb@lsec.cc.ac.cn; zwy@lsec.cc.ac.cn

S. J. Sherwin et al. (eds.), *Spectral and High Order Methods for Partial Differential Equations ICOSAHOM 2018*, Lecture Notes in Computational Science and Engineering 134, https://doi.org/10.1007/978-3-030-39647-3\_12

finite element setting, we refer to the work on the immersed finite element method [7, 11, 19], the multiscale finite element method [9], and the penalty finite element method [1].

In the past decade, a combination of the extended finite element method (XFEM) with the Nitsche scheme has become a popular discretization approach. As a first attempt, an *unfitted finite element method* was proposed in [13], which can be viewed as a linear and consistent modification of [1]. This approach has motivated a number of works, e.g., the unfitted finite element methods [4, 5, 12], the ghost penalty method [2, 3], and the unfitted discontinuous Galerkin methods [22]. Although significant progress in the error analysis of some methods has been made, the development of high-order accurate unfitted FEMs with rigorous error analysis remains challenging. We refer to the work of [14–16, 22, 27, 28], which claim high order approximations. In [22], an *hp-unfitted discontinuous Galerkin method* for Problem (1) was considered, and optimal *h*-convergence for arbitrary *p* was shown for the two-dimensional case in the energy norm and in the $L^2$-norm. With an extra flux penalty term applied on the interface, [27] gave better *hp* a priori error estimates in both two and three dimensions. In [15, 16], an isoparametric finite element method with a high order geometric approximation of level set domains was presented. The analysis reveals optimal order error bounds with respect to *h* for the geometry approximation and for the finite element approximation. In [14, 28], various issues related to unfitted methods were addressed, including the dependence of error estimates on the diffusion coefficients, the condition number of the discrete system, and the choice of stabilization parameters.

The Nitsche-XFEM can be interpreted as applying interior penalty (IP) methods on the interface, and our method falls into this category. The major step in our variant is an appropriate choice of the mesh- and geometry-dependent weights in the average (see (6)), which leads to trace and inverse inequalities for possibly degenerate sub-elements (see (9)). We note that in our approach the penalization is applied only to the jump of the solution values across the interface (compare with the bilinear form in [27]). The optimal *h*-convergence rate for arbitrarily high-order discretizations in the energy and *L*2-norms is proved regardless of the dimension. We refer to [14–16] for similar estimates with respect to *h* and to [27] for a refined version with respect to both *h* and *p*.

Efficient implementation of this method is then discussed in two respects. We first consider an optimal multigrid solver for the generated linear system. We use the continuous FE space as a "background" subspace, with some smoothing operations added near the interface, to formulate a nested geometric multigrid method. We prove the optimality of this special multiplicative multigrid method, meaning that it converges uniformly with respect to the mesh size, independently of the location of the interface relative to the meshes. Since assembling the stiffness matrix requires integration over curved surfaces and volumes, we also implement a robust and arbitrarily high-order numerical quadrature algorithm that transforms surface and volume integrals into multiple 1-D integrals. The code for the algorithm is freely available in the open-source finite element toolbox Parallel Hierarchical Grid (PHG) [26]. We also refer to [23, 25] for different approaches to computing integrals on curved sub-elements and their curved boundaries.

The layout of this paper is as follows. In Sect. 2 we introduce the XFE spaces and reformulate the interface problem (1) in DG schemes. The *H*1- and *L*2-error estimates of both schemes, which attain the optimal order of convergence with respect to the mesh size *h*, are given. In Sect. 3, we give an optimal multigrid method for the aforementioned DG-XFE schemes. Numerical examples in both two and three dimensions are reported in Sect. 4 to illustrate the high accuracy of the algorithm.

#### **2 XFE and DG Schemes for Interface Problems**

We consider the following elliptic interface problem for $u$. Let $\Omega = \Omega_1 \cup \Gamma \cup \Omega_2$ be a bounded and convex polygonal or polyhedral domain in $\mathbb{R}^d$, $d = 2$ or $3$, where $\Omega_1$ and $\Omega_2$ are two subdomains of $\Omega$ separated by a $C^2$-smooth interface $\Gamma$ (see Fig. 2 for an illustration of a unit square that contains a circle as an interface),

$$\begin{cases} -\nabla \cdot (\alpha(\mathbf{x}) \nabla u) = f, & \text{in } \Omega_1 \cup \Omega_2, \\ [\alpha(\mathbf{x}) \nabla u] = g_N, & \text{on } \Gamma, \\ [u] = \mathbf{g}_D, & \text{on } \Gamma, \\ u = 0, & \text{on } \partial\Omega. \end{cases} \tag{1}$$

Here $\alpha(\mathbf{x}) = \alpha_i$ in $\Omega_i$, $i = 1, 2$, is a piecewise constant function on the partition $\Omega_1 \cup \Omega_2$.

Denote by $\{\mathcal{T}_h\}$ a family of conforming, quasi-uniform, and regular partitions of $\Omega$ into triangles/parallelograms ($d = 2$) or tetrahedra/parallelepipeds ($d = 3$). Since every $K$ is of regular shape, there is a constant $\gamma_0$ such that

$$h_K^d \le \gamma_0 |K|, \quad \forall K \in \mathcal{T}_h. \tag{2}$$

We define the set of all elements intersected by $\Gamma$ as $\mathcal{T}_h^\Gamma = \{K \in \mathcal{T}_h : |K \cap \Gamma| \neq 0\}$. Each $\mathcal{T}_h^\Gamma$ induces a partition of the interface $\Gamma$, which we denote by $\mathcal{E}_h^\Gamma = \{e_K : e_K = K \cap \Gamma,\ K \in \mathcal{T}_h^\Gamma\}$.

**Fig. 2** Domain $\Omega = \Omega_1 \cup \Gamma \cup \Omega_2$ with an unfitted mesh

For any $K \in \mathcal{T}_h^\Gamma$, let $K_i = K \cap \Omega_i$ denote the part of $K$ in $\Omega_i$ and let $\mathbf{n}_i$ be the unit outward normal vector on $\partial K_i$, $i = 1, 2$. Since $\Gamma$ is of class $C^2$, it is easy to prove (cf. [6, 31]) that each interface segment/patch $e_K$ is contained in a strip of width $\delta$ and satisfies

$$\delta \le \gamma_1 h_K^2 \quad \text{and} \quad |\mathbf{n}_i(\mathbf{x}) - \mathbf{n}_i(\mathbf{y})| \le \gamma_2 h_K, \quad \forall \mathbf{x}, \mathbf{y} \in e_K. \tag{3}$$

We define the weighted average $\{\cdot\}$ and the jump $[\cdot]$ on $e \in \mathcal{E}_h^\Gamma$ by

$$\{v\} = \kappa_1 v_1 + \kappa_2 v_2, \qquad [v] = v_1 \mathbf{n}_1 + v_2 \mathbf{n}_2, \tag{4}$$

$$\{\mathbf{q}\} = \kappa_1 \mathbf{q}_1 + \kappa_2 \mathbf{q}_2, \qquad [\mathbf{q}] = \mathbf{q}_1 \cdot \mathbf{n}_1 + \mathbf{q}_2 \cdot \mathbf{n}_2. \tag{5}$$

For the stability analysis of our schemes, we define $(\kappa_1, \kappa_2)$ on each element as follows:

$$\kappa_i = \begin{cases} 0, & \text{if } \frac{|K_i|}{|K|} < c_0 h_K, \\ 1, & \text{if } \frac{|K_i|}{|K|} > 1 - c_0 h_K, \\ \frac{|K_i|}{|K|}, & \text{otherwise.} \end{cases} \tag{6}$$

Clearly, $0 \le \kappa_i \le 1$ and $\kappa_1 + \kappa_2 = 1$, so that $\{\cdot\}$ is a convex combination along $\Gamma$. Roughly speaking, we adopt the weight $\kappa_i = |K_i|/|K|$ suggested in [13] for general sub-elements, and we set $\kappa_i = 0$ when $|K_i| < c\, h_K^{d+1}$. Here the user-defined constant satisfies $c_0 \ge 2\gamma_0\gamma_1$, where $\gamma_0$ and $\gamma_1$ are the constants defined in (2) and (3), respectively. The dependence of $c_0$ on these generic constants is elaborated in Lemma 1.
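The case distinction in (6) is simple enough to state in code. The sketch below is a hypothetical helper (the function name, arguments and the choice $c_0 = 0.5$ are illustrative assumptions, not taken from the paper's implementation); it returns $(\kappa_1, \kappa_2)$ for a cut element from the sub-element volumes $|K_1|$, $|K_2|$ and the element diameter $h_K$:

```python
def interface_weights(vol1, vol2, h, c0):
    """Weights (kappa_1, kappa_2) of Eq. (6) for a cut element with
    sub-element volumes vol1 = |K_1|, vol2 = |K_2| and diameter h.
    Tiny sub-elements get weight 0 so that the average {.} never
    involves a nearly degenerate sub-element."""
    vol = vol1 + vol2          # |K| = |K_1| + |K_2|
    k1 = vol1 / vol            # generic weight |K_1| / |K|
    if k1 < c0 * h:            # K_1 is a sliver: drop it entirely
        k1 = 0.0
    elif k1 > 1.0 - c0 * h:    # K_2 is a sliver: use K_1 only
        k1 = 1.0
    return k1, 1.0 - k1        # kappa_1 + kappa_2 = 1 by construction
```

For a balanced cut the generic ratio is kept, while a sliver cut collapses the average onto the large sub-element, which is exactly what the trace inequality (9) needs.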

Let $\chi_i$ be the characteristic function of $\Omega_i$, $i = 1, 2$. Given a mesh $\mathcal{T}_h$, let $V_h$ be the continuous piecewise polynomial function space of degree $p \ge 1$ on the mesh. Let $V_h^0 := V_h \cap H_0^1(\Omega)$, $V_h^1 := V_h^0 \cdot \chi_1$ and $V_h^2 := V_h^0 \cdot \chi_2$. We define the XFE space as $V_h^\Gamma = V_h^1 + V_h^2$.

Then, the DG-XFE method for the interface problem reads: find $u_h \in V_h^\Gamma$ such that

$$B_h(u_h, v_h) = F_h(v_h), \quad \forall v_h \in V_h^\Gamma, \tag{7}$$

where

$$\begin{split} B_h(w, v) :=& \int_{\Omega_1 \cup \Omega_2} \alpha(\mathbf{x}) \nabla w \cdot \nabla v - \int_{\Gamma} \{\alpha(\mathbf{x}) \nabla w\} \cdot [v] \\ & - \beta \int_{\Gamma} [w] \cdot \{\alpha(\mathbf{x}) \nabla v\} + \sum_{K \in \mathcal{T}_h^{\Gamma}} \frac{\eta_\beta}{h_K} \int_{K \cap \Gamma} [w] \cdot [v], \end{split}$$

$$\begin{split} F_h(v) :=& \int_{\Omega} f\, v + \int_{\Gamma} g_N (\kappa_1 v_2 + \kappa_2 v_1) \\ & - \beta \int_{\Gamma} \mathbf{g}_D \cdot \{\alpha(\mathbf{x}) \nabla v\} + \sum_{K \in \mathcal{T}_h^{\Gamma}} \frac{\eta_\beta}{h_K} \int_{K \cap \Gamma} \mathbf{g}_D \cdot [v]. \end{split}$$

For $\eta_\beta$ sufficiently large, the norm corresponding to the bilinear form $B_h(\cdot,\cdot)$ is uniformly equivalent to $\|\cdot\|_{B_h}$, which is defined by

$$\|v\|_{B_h}^2 = |v|_{1,\Omega_1 \cup \Omega_2}^2 + \sum_{K \in \mathcal{T}_h^{\Gamma}} \eta_\beta\, h_K^{-1} \|[v]\|_{L^2(e_K)}^2 + \sum_{K \in \mathcal{T}_h^{\Gamma}} \eta_\beta^{-1} h_K \|\{\alpha(\mathbf{x}) \nabla v\}\|_{L^2(e_K)}^2. \tag{8}$$

The crucial ingredient in establishing this equivalence, and also the stability of the bilinear forms, is control of the weighted normal derivatives, which is stated as a trace and inverse inequality in Lemma 1.

**Lemma 1 ([27, 28])** *Let $\gamma_0$ and $\gamma_1$ be the constants defined in* (2) *and* (3)*, respectively. If we choose $c_0 \ge 2\gamma_0\gamma_1$ in the definition* (6) *of $\kappa$, there exists a positive constant $h_0$ such that for all $h \in (0, h_0]$ and any interface segment/patch $e_K = K \cap \Gamma \in \mathcal{E}_h^\Gamma$, the following estimates hold on both sub-elements of $K$:*

$$\|\kappa_i^{1/2} v_i\|_{L^2(e_K)} \le \frac{C}{h_K^{1/2}} \|v_i\|_{L^2(K_i)}, \quad v_i \in \mathcal{P}_p(K_i), \ i = 1, 2. \tag{9}$$

The coercivity and boundedness of $B_h(\cdot,\cdot)$ in the norm $\|\cdot\|_{B_h}$ are then a direct consequence of the Cauchy–Schwarz inequality.

**Lemma 2** *Let $V = H^2(\Omega_1 \cup \Omega_2)$ and $V(h) = V_h^\Gamma + V$; we have*

$$B_h(w, v) \le C_b \|w\|_{B_h} \|v\|_{B_h}, \quad \forall w, v \in V(h), \tag{10}$$

*and*

$$B_h(v, v) \ge C_s \|v\|_{B_h}^2, \quad \forall v \in V_h^\Gamma, \tag{11}$$

*provided the penalty parameter $\eta_\beta$ is chosen sufficiently large.*

The XFE space has optimal approximation quality for piecewise smooth functions in $H^p(\Omega_1 \cup \Omega_2)$. The following theorem is proved in [28] as an analogue of Céa's lemma.

**Theorem 1** *Assume that the interface $\Gamma$ is $C^2$-smooth and that the solution of the elliptic interface problem* (1) *satisfies $u \in H^s(\Omega_1 \cup \Omega_2)$, where $s \ge 2$ is an integer. Let $\mu = \min\{p + 1, s\}$. The following error estimates hold for any $h \in (0, h_0]$: if $\eta_\beta$ is chosen sufficiently large (see* (11)*) and $u_h$ is the solution to the first scheme of* (7)*, then*

$$\|u - u_h\|_{B_h} \lesssim h^{\mu - 1} \|u\|_{H^s(\Omega_1 \cup \Omega_2)}, \quad \forall\, 0 < h \le h_0. \tag{12}$$

*The hidden constants in the above estimate depend on the angle condition of the mesh $\mathcal{T}_h$, the degree of the polynomials, the parameters in the scheme, and $\alpha(\mathbf{x})$, but are independent of the location of the interface relative to the mesh. Here, the constant $h_0$ is from Lemma 1.*

#### **3 An Optimal Multigrid Method for (7)**

In this section, we propose a two-level geometric multigrid solver for the finite element problem (7). It is well known that an element $K$ with a "small" cut (i.e. $|K \cap \Omega_i|/|K| \ll 1$) has an adverse effect on the conditioning of the resulting stiffness matrices (see e.g. [3]). Our approach is based on the general theory of the successive subspace correction (SSC) method for solving, on a linear vector space $\tilde V = \sum_{i=0}^{J} V_i$ with inner product $(\cdot,\cdot)$, the equation $(Au, v) = (f, v)$, where $A : \tilde V \to \tilde V$ is a symmetric positive definite operator.

We apply SSC in the relatively simple case of two subspaces (i.e. $J = 2$), that is, $\tilde V = V_h^\Gamma = V_1 \oplus V_2$, with $V_1 = V_h^0$ and $V_2 = V_h^{\Gamma,0}$, where $V_h^{\Gamma,0} \subset V_h^\Gamma$ is the span of the nodal basis functions associated with the near-interface nodes $\mathcal{N}_h^\Gamma := \{x_j : |\operatorname{supp}(\psi_j) \cap \Gamma| \neq 0\}$. With a slight abuse of notation, the DG-XFE scheme induces a symmetric positive definite operator $B_h$ for $\beta = 1$. Let $\tilde B_h$ and $\tilde B_h^\Gamma$ be the restrictions of $B_h$ to $V_h^0$ and $V_h^{\Gamma,0}$, respectively, and let $\tilde R_h : V_h^0 \to V_h^0$ be an approximate inverse of $\tilde B_h$. We then obtain the two-level successive subspace correction method of Algorithm 1. A similar idea was employed in a special linear case in [32] and analyzed within the framework of [30, 33].



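Algorithm 1 itself is not reproduced in this excerpt. To convey the structure of the multiplicative (successive) subspace-correction sweep it instantiates, the following sketch applies block corrections to a small SPD system, with index blocks standing in for the subspaces $V_1$ and $V_2$; all names are hypothetical and this is an illustration of the generic SSC idea, not the paper's implementation.

```python
def solve_dense(M, b):
    """Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(M, b)]   # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            fct = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= fct * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x

def ssc_sweep(A, f, u, blocks):
    """One multiplicative subspace-correction sweep: for each block of
    unknowns (a 'subspace'), solve the residual equation restricted to
    the block and add the correction, using the freshly updated u."""
    n = len(f)
    for idx in blocks:
        r = [f[i] - sum(A[i][j] * u[j] for j in range(n)) for i in idx]
        c = solve_dense([[A[i][j] for j in idx] for i in idx], r)
        for k, i in enumerate(idx):
            u[i] += c[k]
    return u

# 1-D Laplacian as a stand-in SPD operator; two disjoint blocks
A = [[2.0, -1.0, 0.0, 0.0], [-1.0, 2.0, -1.0, 0.0],
     [0.0, -1.0, 2.0, -1.0], [0.0, 0.0, -1.0, 2.0]]
f = [1.0, 1.0, 1.0, 1.0]
u = [0.0] * 4
for _ in range(100):
    u = ssc_sweep(A, f, u, [[0, 1], [2, 3]])
```

In the method above, the role of the first block is played by the background solve on $V_h^0$ (realised by the V-cycle iterator $\tilde R_h$), and the role of the second by the smoothing on the near-interface space $V_h^{\Gamma,0}$.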
Obviously, Algorithm 1 defines an iterative method for solving $B_h u_h = f_h$. Let $A_h$ denote the iterator of the method; the error-contraction property is summarized in the following theorem.

**Theorem 2 ([28])** *Assume that $\|I - \tilde R_h \tilde B_h\|_{\tilde B_h} \le \rho < 1$. Then Algorithm 1 is uniformly convergent with respect to the mesh size, with*

$$\|I - A_h B_h\|_{B_h}^2 \le \frac{\Lambda}{1 - \rho^2 + \Lambda},$$

*where $\Lambda$ is a constant independent of $h$.*

When $\mathcal{T}_h$ is a shape-regular grid with a multilevel geometric structure, a geometric multigrid process can be implemented on $V_h^0$, and the approximate inverse $\tilde R_h$ of $\tilde B_h$ can be chosen as the iterator of a $V$-cycle multigrid method.

#### **4 Numerical Tests**

In this section, we present some initial results to demonstrate the high-order accuracy and robustness of our method. A 2-D example was implemented in MATLAB. The numerical experiment for a 3-D case was carried out in the open source finite element toolbox PHG [26].

#### *4.1 High-Order Numerical Quadratures on "cut" Elements*

Assembling the local stiffness matrix and the corresponding right-hand side for $K \in \mathcal{T}_h^\Gamma$ requires integration over irregularly shaped manifolds:

$$I = \int_{K \cap \Omega_i} u(\mathbf{x})\, d\mathbf{x} \quad \text{and} \quad I = \int_{K \cap \Gamma} u(\mathbf{x})\, d\Gamma, \tag{13}$$

where $\Gamma$ is defined by the zero level set of a piecewise smooth function.

Our implementation of (13) relies on a general-purpose and arbitrarily high-order numerical quadrature algorithm proposed in [10]. The basic idea is to choose a local coordinate system with three orthogonal directions, decompose the integrals in (13) into multiple 1-D integrals along these directions, and use 1-D Gaussian quadratures to compute them. For the 1-D Gaussian quadratures to work, the local coordinate system must be chosen suitably, according to the properties of $K$ and $\Gamma$, to prevent essential singularities from appearing in the 1-D integrands; the integration intervals are divided into subintervals at the non-essential singularities of the integrands. We note that the algorithm only requires finding roots of univariate nonlinear functions in given intervals and evaluating the integrand, the level set function, and the gradient of the level set function at given points. It achieves arbitrarily high order by increasing the orders of the Gaussian quadratures, and it needs no extra a priori knowledge of the integrand or the level set function.
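To convey the flavour of this dimension-reduction idea (a simplified 2-D sketch under our own assumptions, not the PHG implementation of [10]), the code below integrates a function over the sub-region $\{\varphi < 0\}$ of a rectangular cell: an outer composite quadrature runs in $x$, and each inner line integral in $y$ is split at the roots of $\varphi(x,\cdot)$ before applying a 1-D Gauss rule.

```python
# 5-point Gauss-Legendre rule on [-1, 1]
GL = [(-0.9061798459386640, 0.2369268850561891),
      (-0.5384693101056831, 0.4786286704993665),
      ( 0.0,                0.5688888888888889),
      ( 0.5384693101056831, 0.4786286704993665),
      ( 0.9061798459386640, 0.2369268850561891)]

def gauss(f, a, b):
    """5-point Gauss-Legendre quadrature of f over [a, b]."""
    m, h = 0.5 * (a + b), 0.5 * (b - a)
    return h * sum(w * f(m + h * x) for x, w in GL)

def roots_1d(g, a, b, n=64):
    """Roots of g on [a, b], located by a sign scan plus bisection."""
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    found = []
    for lo, hi in zip(xs, xs[1:]):
        if g(lo) * g(hi) < 0.0:
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                if g(lo) * g(mid) <= 0.0:
                    hi = mid
                else:
                    lo = mid
            found.append(0.5 * (lo + hi))
    return found

def integrate_cut_cell(f, phi, ax, bx, ay, by, nx=50):
    """Approximate the integral of f over {phi < 0} in [ax,bx] x [ay,by]:
    an outer composite quadrature in x; each inner line [ay,by] is split
    at the roots of phi(x, .) and f is integrated only on the
    sub-intervals where phi < 0."""
    def line(x):
        cuts = [ay] + roots_1d(lambda y: phi(x, y), ay, by) + [by]
        total = 0.0
        for lo, hi in zip(cuts, cuts[1:]):
            if phi(x, 0.5 * (lo + hi)) < 0.0:   # sub-interval inside
                total += gauss(lambda y: f(x, y), lo, hi)
        return total
    hx = (bx - ax) / nx
    return sum(gauss(line, ax + i * hx, ax + (i + 1) * hx) for i in range(nx))

# area of the disk (x-0.5)^2 + (y-0.5)^2 < 1/7 inside the unit square
phi = lambda x, y: (x - 0.5) ** 2 + (y - 0.5) ** 2 - 1.0 / 7.0
area = integrate_cut_cell(lambda x, y: 1.0, phi, 0.0, 1.0, 0.0, 1.0)
```

The exact area is $\pi/7$; the remaining error here comes from the square-root behaviour of the chord length in $x$, which is precisely the kind of singularity the direction-selection step of [10] is designed to avoid.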

This algorithm has been implemented in the files src/quad-interface.c and include/phg/quad-interface.h of PHG [26]. Extensive *h*- and *p*-convergence tests were performed in [10] and are included in the sample code test/quad\_test2.c.

#### *4.2 2-D Numerical Examples*

Let the domain $\Omega$ be the unit square $(0, 1)^2$ and the interface $\Gamma$ be the zero level set of the function $\varphi(\mathbf{x}) = (x_1 - 0.5)^2 + (x_2 - 0.5)^2 - 1/7$. The subdomain $\Omega_1$ is characterized by $\varphi(\mathbf{x}) < 0$ and $\Omega_2$ by $\varphi(\mathbf{x}) > 0$. The domain is partitioned into grids of squares of uniform size $h$. The exact solution is chosen as

$$u(x_1, x_2) = \begin{cases} \exp(x_1 x_2)/\alpha_1, & (x_1, x_2) \in \Omega_1, \\ \sin(\pi x_1)\sin(\pi x_2)/\alpha_2, & (x_1, x_2) \in \Omega_2. \end{cases}$$

The right-hand side can be computed accordingly.
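Computing that right-hand side amounts to differentiating the exact solution twice in each subdomain: $f = -(x_1^2 + x_2^2)\exp(x_1 x_2)$ in $\Omega_1$ and $f = 2\pi^2 \sin(\pi x_1)\sin(\pi x_2)$ in $\Omega_2$ (note the $\alpha_i$ cancel). The sketch below verifies these formulas with a finite-difference Laplacian; the coefficient values $\alpha_1 = 1$, $\alpha_2 = 100$ are illustrative assumptions, since the text above does not fix them.

```python
import math

ALPHA1, ALPHA2 = 1.0, 100.0   # illustrative values, not from the text

def u1(x, y):   # exact solution in Omega_1 (phi < 0)
    return math.exp(x * y) / ALPHA1

def u2(x, y):   # exact solution in Omega_2 (phi > 0)
    return math.sin(math.pi * x) * math.sin(math.pi * y) / ALPHA2

def f1(x, y):   # f = -alpha_1 * Laplace(u1) = -(x^2 + y^2) exp(x y)
    return -(x * x + y * y) * math.exp(x * y)

def f2(x, y):   # f = -alpha_2 * Laplace(u2) = 2 pi^2 sin(pi x) sin(pi y)
    return 2.0 * math.pi ** 2 * math.sin(math.pi * x) * math.sin(math.pi * y)

def fd_laplacian(u, x, y, h=1e-4):
    """Second-order central-difference Laplacian (sanity check only)."""
    return (u(x + h, y) + u(x - h, y) + u(x, y + h) + u(x, y - h)
            - 4.0 * u(x, y)) / (h * h)
```

Since $f$ does not depend on the $\alpha_i$, the same right-hand side serves for any contrast in the coefficients.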

We implement Algorithm 1 with a $V$-cycle geometric multigrid on the unfitted grid $\mathcal{T}_h$ playing the role of the coarse-grid corrector. In each pre- and post-smoothing stage of the $V$-cycle iterator, we perform two Gauss–Seidel sweeps. The numerical results are recorded in Table 1. In these examples, the initial guess is $\mathbf{0}$ and the stopping criterion is

$$\|f_h - B_h u_h^{(k)}\|_\infty \,/\, \|f_h - B_h u_h^{(0)}\|_\infty < 10^{-10}.$$

From Table 1, we can see that the multigrid method converges uniformly with respect to the mesh size, which confirms our theoretical results.


**Table 1** Numerical performance of Algorithm 1 (2-D example)

#### *4.3 3-D Numerical Examples*

The settings of this numerical experiment are as follows: the domain is $\Omega = (0, 1)^3$, and the interface consists of two touching spheres of radius $0.1$ centered at $(0.4, 0.5, 0.5)$ and $(0.6, 0.5, 0.5)$. The exact solution is given by

$$u(x_1, x_2, x_3) = \begin{cases} \exp(x_1 + x_2 + x_3), & (x_1, x_2, x_3) \in \Omega_1, \\ \sin(x_1)\sin(x_2)\sin(x_3), & (x_1, x_2, x_3) \in \Omega_2. \end{cases}$$

The discontinuous coefficient function is defined by $\alpha_1 = 1$ and $\alpha_2 = 100$.

A convergence study is performed on a series of meshes generated by uniform refinements of an initial mesh consisting of 6 congruent tetrahedra. Relative errors and convergence rates of the numerical solutions for $P_p$ elements, $p = 1, 2, 3, 4$, are listed in Table 2, with quadrature order $q = 2p + 3$. The convergence rates are optimal for both the $H^1(\Omega)$-errors (order $p$) and the $L^2(\Omega)$-errors (order $p + 1$). For the


**Table 2** Errors and convergence orders of the numerical solutions (3-D example)

time being, however, the design of a multigrid solver for the 3-D case is still ongoing. The computations for $P_1$, $P_2$ and $P_3$ elements were done using 64-bit double precision and the linear systems were solved using MUMPS; for the $P_4$ element, to eliminate the influence of round-off errors, the computations were done using 80-bit extended double precision and the linear systems were solved with GMRES, preconditioned by MUMPS in double precision. The performance of Algorithm 1 in 3-D will be reported in future work.

**Acknowledgements** The authors would like to acknowledge the funding support of this research by the National Key Research and Development Program of China under grant number 2017YFC0209804, the National Natural Science Foundation of China under grant number 11101208, and the National Center for Mathematics and Interdisciplinary Sciences of the Chinese Academy of Sciences. The authors are grateful to Jinchao Xu and Haijun Wu for fruitful discussions and suggestions, and to Dr. Huaqing Liu for his valuable help in preparing the numerical examples.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Stabilised Hybrid Discontinuous Galerkin Methods for the Stokes Problem with Non-standard Boundary Conditions**

**Gabriel R. Barrenechea, Michał Bosy, and Victorita Dolean**

### **1 Introduction**

This paper is concerned with the discretisation of the Stokes problem with non-standard boundary conditions. In [1], a hybrid discontinuous Galerkin (hdG) method was proposed and analysed for this problem; the finite element method used was a combination of BDM elements of order *k* for the velocity and discontinuous elements of order *k* − 1 for the pressure. In this paper we increase the order of the pressure space to *k*, while keeping the order of the velocity space fixed at *k*. Since this pair does not satisfy the inf-sup condition, a stabilisation term needs to be added.

The stabilisation term referred to above can be built using a variety of approaches, but, roughly speaking, the stabilisation can be residual or non-residual. In [8] the authors added to the formulation a mesh-dependent term penalising the gradient of the pressure. Later, in [14] this method was restricted and reinterpreted

G. R. Barrenechea
Department of Mathematics and Statistics, University of Strathclyde, Glasgow, UK
e-mail: gabriel.barrenechea@strath.ac.uk

M. Bosy
Department of Mathematics and Statistics, University of Strathclyde, Glasgow, UK
Dipartimento di Matematica "F. Casorati", Università degli Studi di Pavia, Pavia, Italy
e-mail: michal.bosy@unipv.it

V. Dolean (✉)
Department of Mathematics and Statistics, University of Strathclyde, Glasgow, UK
University Côte d'Azur, CNRS, LJAD, Nice Cedex, France
e-mail: work@victoritadolean.com

© The Author(s) 2020

S. J. Sherwin et al. (eds.), *Spectral and High Order Methods for Partial Differential Equations ICOSAHOM 2018*, Lecture Notes in Computational Science and Engineering 134, https://doi.org/10.1007/978-3-030-39647-3\_13

as a Petrov–Galerkin scheme leading to the first consistent stabilised method, and further developments were presented in the works [7] and [13]. For a review of different residual stabilised finite element methods for the Stokes problem, see the review paper [2].

Now, due to their nature, residual methods introduce unphysical couplings into the formulation and modify all the entries of the stiffness matrix. Hence, non-residual methods, where only a positive semi-definite term penalising the pressure is added, have also been proposed. Examples of this type of method are the pressure gradient projection [9] and the local pressure gradient stabilisation [3]. The methods just mentioned typically require two nested meshes; to avoid this complication, the local pressure gradient stabilisation has also been presented on a single mesh in [12]. Additionally, methods that use fluctuations of the pressure gradient are not effective when the finite element space for the pressure is the piecewise constant space. The usual way to overcome this is to add pressure jumps to the formulation, as has been done, e.g., in [16]. These have been shown to be very effective, but they somewhat tamper with the data structure of the code. To avoid this, the authors in [10] present an approach based on polynomial-pressure-projection. This method works for low polynomial orders, as shown in [4], and preserves the symmetry of the original equation.

In the light of the discussion in the previous paragraphs, in this work we propose a stabilised hdG method for the Stokes problem with non-standard boundary conditions. The method is reminiscent of the Dohrmann–Bochev method from [10], but uses the same velocity space as the hdG method from [1].

#### *1.1 Notations and Model Problem*

Let $\Omega$ be an open polygonal domain in $\mathbb{R}^2$ with Lipschitz boundary $\Gamma := \partial\Omega$. We use boldface font for tensor or vector variables, e.g. $\mathbf{u}$ is a velocity vector field, and italic font for scalar variables, e.g. $p$ denotes the pressure. We define the stress tensor $\boldsymbol{\sigma} := \nu \nabla\mathbf{u} - p\mathbf{I}$ (where $\nu > 0$ is the fluid viscosity and $\mathbf{I}$ is the identity matrix) and the flux $\boldsymbol{\sigma}_n := \boldsymbol{\sigma}\,\mathbf{n}$. In addition, we denote normal and tangential components as follows: $u_n := \mathbf{u}\cdot\mathbf{n}$, $u_t := \mathbf{u}\cdot\mathbf{t}$, $\sigma_{nn} := \boldsymbol{\sigma}_n\cdot\mathbf{n}$, where $\mathbf{n}$ is the outward unit normal vector to the boundary $\Gamma$ and $\mathbf{t}$ is a vector tangential to $\Gamma$ such that $\mathbf{n}\cdot\mathbf{t} = 0$.

For *<sup>D</sup>* <sup>⊂</sup> , we use the standard *<sup>L</sup>*2*(D)* space with the following norm

$$\|f\|_D^2 := \int_D f^2 \, d\mathbf{x} \quad \text{for all } f \in L^2(D).$$

Let us define, for $m \in \mathbb{N}$, the following Sobolev spaces

$$H^m(D) := \left\{ v \in L^2(D) \,:\, \partial^{\alpha} v \in L^2(D) \ \forall\, |\alpha| \le m \right\},$$

$$H(\mathrm{div}, D) := \left\{ \mathbf{v} \in [L^2(D)]^2 \,:\, \nabla \cdot \mathbf{v} \in L^2(D) \right\},$$

where, for $\alpha = (\alpha_1, \alpha_2) \in \mathbb{N}^2$, $|\alpha| = \alpha_1 + \alpha_2$ and $\partial^{\alpha} = \frac{\partial^{|\alpha|}}{\partial x_1^{\alpha_1} \partial x_2^{\alpha_2}}$. In addition, we will use the standard semi-norm and norm for the Sobolev space $H^m(D)$:

$$|f|_{H^m(D)}^2 := \sum_{|\alpha| = m} \|\partial^{\alpha} f\|_D^2, \qquad \|f\|_{H^m(D)}^2 := \sum_{k=0}^m |f|_{H^k(D)}^2 \quad \forall f \in H^m(D).$$

In this work, we consider the two-dimensional Stokes problem with tangential-velocity and normal-flux (TVNF) boundary conditions

$$\begin{cases} -\nu \Delta \mathbf{u} + \nabla p = \mathbf{f} & \text{in } \Omega, \\ \nabla \cdot \mathbf{u} = 0 & \text{in } \Omega, \\ \sigma_{nn} = g & \text{on } \Gamma, \\ u_t = 0 & \text{on } \Gamma, \end{cases} \tag{1}$$

where *<sup>u</sup>* : ¯ <sup>→</sup> <sup>R</sup><sup>2</sup> is the unknown velocity field, *<sup>p</sup>* : ¯ <sup>→</sup> <sup>R</sup> the pressure, *ν >* <sup>0</sup> the viscosity, which is considered to be constant, and *<sup>f</sup>* ∈ [*L*2*()*] 2, *<sup>g</sup>* <sup>∈</sup> *<sup>L</sup>*2*()* are given functions. The restriction to homogeneous Dirichlet conditions on *ut* is made only to simplify the presentation.

Let $\{\mathcal{T}_h\}_{h>0}$ be a regular family of triangulations of $\bar\Omega$ made of triangles. For each triangulation $\mathcal{T}_h$, $\mathcal{E}_h$ denotes the set of its edges. In addition, for each element $K \in \mathcal{T}_h$, $h_K := \operatorname{diam}(K)$, and we denote $h := \max_{K \in \mathcal{T}_h} h_K$. We define the following Sobolev spaces on the triangulation $\mathcal{T}_h$ and on the set of all edges $\mathcal{E}_h$

$$\begin{aligned} L^2(\mathcal{E}_h) &:= \left\{ v \,:\, v|_E \in L^2(E) \ \forall E \in \mathcal{E}_h \right\}, \\ H^m(\mathcal{T}_h) &:= \left\{ v \in L^2(\Omega) \,:\, v|_K \in H^m(K) \ \forall K \in \mathcal{T}_h \right\} \text{ for } m \in \mathbb{N}, \end{aligned}$$

with the corresponding broken norms.

Now we will introduce the finite element spaces that discretise the above spaces. Let *k* ≥ 1. We start by introducing the velocity and pressure spaces. To discretise the velocity *u* we use the Brezzi–Douglas–Marini space (see [5, Section 2.3.1]) of order *k* ≥ 1 defined by

$$BDM_h^k := \left\{ \mathbf{v}_h \in H(\mathrm{div}, \Omega) \,:\, \mathbf{v}_h|_K \in \left[\mathbb{P}_k(K)\right]^2 \ \forall K \in \mathcal{T}_h \right\}.$$

Associated to this space, we introduce the BDM projection $\Pi_k : [H^1(\Omega)]^2 \to BDM_h^k$ defined in [5, Section 2.5]. The pressure is discretised using the following space

$$Q_h^k := \left\{ q_h \in L^2(\Omega) \,:\, q_h|_K \in \mathbb{P}_k(K) \ \forall K \in \mathcal{T}_h \right\}.$$

Associated to this space we define, for each $K \in \mathcal{T}_h$, the local $L^2(K)$-projection $\Pi_K^k : L^2(K) \to \mathbb{P}_k(K)$ as follows. For every $w \in L^2(K)$, $\Pi_K^k(w)$ is the unique element of $\mathbb{P}_k(K)$ satisfying $\int_K \Pi_K^k(w)\, v_h \, d\mathbf{x} = \int_K w\, v_h \, d\mathbf{x}$ for all $v_h \in \mathbb{P}_k(K)$, and we define the global projection $\Pi^k$ by $\Pi^k|_K = \Pi_K^k$ for all $K \in \mathcal{T}_h$.

The last ingredient needed in the method described below is a finite element space for a family of Lagrange multipliers associated with the edges of the triangulation. These multipliers, denoted by $\tilde u$, are meant to approximate the tangential trace of the velocity $\mathbf{u}$ on the edges of the triangulation. For this, and in order to propose a discretisation with fewer degrees of freedom, we discretise the Lagrange multiplier $\tilde u$ using the space

$$M_{h,0}^{k-1} := \left\{ \tilde v_h \in L^2(\mathcal{E}_h) \,:\, \tilde v_h|_E \in \mathbb{P}_{k-1}(E) \ \forall E \in \mathcal{E}_h,\ \tilde v_h = 0 \text{ on } \Gamma \right\}.$$

Furthermore, we introduce for all $E \in \mathcal{E}_h$ the $L^2(E)$-projection $\Pi_E^{k-1} : L^2(E) \to \mathbb{P}_{k-1}(E)$, defined as follows. For every $\tilde w \in L^2(E)$, $\Pi_E^{k-1}(\tilde w)$ is the unique element of $\mathbb{P}_{k-1}(E)$ satisfying $\int_E \Pi_E^{k-1}(\tilde w)\, \tilde v_h \, ds = \int_E \tilde w\, \tilde v_h \, ds$ for all $\tilde v_h \in \mathbb{P}_{k-1}(E)$, and we define $\Pi^{k-1} : L^2(\mathcal{E}_h) \to M_h^{k-1}$ by $\Pi^{k-1}|_E := \Pi_E^{k-1}$ for all $E \in \mathcal{E}_h$.
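On a single edge the defining condition above is just a small mass-matrix system. The sketch below (a hypothetical illustration in a monomial basis, not the paper's implementation) computes the $L^2$-projection onto $\mathbb{P}_{k-1}$ on an edge parametrised by $[a, b]$, for a polynomial input given by its monomial coefficients, so that all integrals are exact:

```python
def mono_int(a, b, p):
    """Exact integral of t^p over the edge [a, b]."""
    return (b ** (p + 1) - a ** (p + 1)) / (p + 1)

def solve_dense(M, b):
    """Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(M, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            fct = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= fct * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x

def l2_project_edge(w, a, b, k):
    """Monomial coefficients of the L2-projection onto P_{k-1} on the
    edge E = [a, b] of a polynomial w (monomial coefficients), obtained
    from the normal equations with the exact mass matrix."""
    M = [[mono_int(a, b, i + j) for j in range(k)] for i in range(k)]
    rhs = [sum(c * mono_int(a, b, i + m) for m, c in enumerate(w))
           for i in range(k)]
    return solve_dense(M, rhs)

# projection of w(t) = t^2 onto P_1 on [0, 1]: the result is t - 1/6
coeffs = l2_project_edge([0.0, 0.0, 1.0], 0.0, 1.0, 2)
```

Testing against the closed-form answer $t - 1/6$ (from the normal equations $\int_0^1 t^2 = c_0 + c_1/2$, $\int_0^1 t^3 = c_0/2 + c_1/3$) confirms the construction.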

#### **2 The Stabilised Method**

Our approach is to write the discrete problem with the same polynomial degree for the velocity and pressure spaces. In other words, denoting $V_h := BDM_h^k \times M_{h,0}^{k-1}$, we want to use the space $V_h \times Q_h^k$ instead of $V_h \times Q_h^{k-1}$ as was done in [1]. To do this we need a proper stabilisation term, because this choice of spaces does not guarantee inf-sup stability.

As the first ingredient in the definition of the stabilised method for (1), we use the same bilinear forms as in [1], that is,

$$\begin{aligned} a\left((\mathbf{w}_h, \tilde w_h), (\mathbf{v}_h, \tilde v_h)\right) :=& \sum_{K \in \mathcal{T}_h} \Bigg( \int_K \nu\, \nabla \mathbf{w}_h : \nabla \mathbf{v}_h \, d\mathbf{x} \\ & - \int_{\partial K} \nu\, (\partial_{\mathbf{n}} \mathbf{w}_h)_t \left( (\mathbf{v}_h)_t - \tilde v_h \right) ds + \varepsilon \int_{\partial K} \nu \left( (\mathbf{w}_h)_t - \tilde w_h \right) (\partial_{\mathbf{n}} \mathbf{v}_h)_t \, ds \\ & + \nu \frac{\tau}{h_K} \int_{\partial K} \Pi^{k-1}\left( (\mathbf{w}_h)_t - \tilde w_h \right) \Pi^{k-1}\left( (\mathbf{v}_h)_t - \tilde v_h \right) ds \Bigg), \\ b\left((\mathbf{v}_h, \tilde v_h), q_h\right) :=& -\sum_{K \in \mathcal{T}_h} \int_K q_h \nabla \cdot \mathbf{v}_h \, d\mathbf{x}, \end{aligned}$$

where *ε* ∈ {−1*,* 1} and *τ >* 0 is a stabilisation parameter. In addition, to compensate for the non-inf-sup stability of the finite element spaces we have chosen, we introduce the bilinear form

$$s\left(p\_h, q\_h\right) := \frac{1}{\nu} \int\_{\Omega} \left(p\_h - \Psi^{k-1} p\_h\right) \left(q\_h - \Psi^{k-1} q\_h\right) \,d\mathbf{x}\,.$$
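The stabilising form $s(\cdot,\cdot)$ acts element by element through the local $L^2$ projection $\Psi^{k-1}$. As a rough one-element illustration (our own sketch, not the authors' code), the following computes the contribution to $s(p_h, q_h)$ on the reference interval $[-1,1]$, using the fact that truncating a Legendre expansion is exactly the $L^2$ projection onto lower-degree polynomials; the function names are ours.

```python
import numpy as np
from numpy.polynomial import legendre as L

def l2_project(coeffs, k):
    """L2-project a Legendre series on [-1, 1] onto P_{k-1} by truncating
    the expansion (valid because Legendre polynomials are L2-orthogonal)."""
    proj = np.zeros_like(coeffs)
    proj[:k] = coeffs[:k]
    return proj

def s_element(p_coeffs, q_coeffs, k, nu):
    """One-element contribution to s(p, q): (1/nu) times the integral over
    [-1, 1] of (p - Psi p)(q - Psi q), with Psi the projection onto P_{k-1}."""
    dp = p_coeffs - l2_project(p_coeffs, k)
    dq = q_coeffs - l2_project(q_coeffs, k)
    prod = L.legmul(dp, dq)   # product of the residuals as a Legendre series
    anti = L.legint(prod)     # antiderivative, exact for polynomials
    return (L.legval(1.0, anti) - L.legval(-1.0, anti)) / nu
```

For $k = 1$ only the mean value survives the projection, so $s$ penalises exactly the non-constant part of the discrete pressure on each element.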

With these ingredients we can now present the finite element method analysed in this work: *Find $(\boldsymbol{u}_h, \tilde{\boldsymbol{u}}_h, p_h) \in \boldsymbol{V}_h \times Q^k_h$ such that for all $(\boldsymbol{v}_h, \tilde{\boldsymbol{v}}_h, q_h) \in \boldsymbol{V}_h \times Q^k_h$*

$$A\left((\boldsymbol{u}_h, \tilde{\boldsymbol{u}}_h, p_h), (\boldsymbol{v}_h, \tilde{\boldsymbol{v}}_h, q_h)\right) = \int_\Omega \boldsymbol{f} \cdot \boldsymbol{v}_h \, d\boldsymbol{x} + \int_\Gamma g\, (\boldsymbol{v}_h)_n \, ds, \tag{2}$$

where

$$\begin{aligned} A\left((\boldsymbol{u}_h, \tilde{\boldsymbol{u}}_h, p_h), (\boldsymbol{v}_h, \tilde{\boldsymbol{v}}_h, q_h)\right) :=&\; a\left((\boldsymbol{u}_h, \tilde{\boldsymbol{u}}_h), (\boldsymbol{v}_h, \tilde{\boldsymbol{v}}_h)\right) + b\left((\boldsymbol{v}_h, \tilde{\boldsymbol{v}}_h), p_h\right) \\ &+ b\left((\boldsymbol{u}_h, \tilde{\boldsymbol{u}}_h), q_h\right) - s\left(p_h, q_h\right). \end{aligned}$$

#### *2.1 Well-Posedness of the Discrete Problem*

Let us consider the following norm on $\boldsymbol{V}_h$ (see [1, Lemma 3.2] for a proof that this is indeed a norm on $\boldsymbol{V}_h$):

$$|||(\boldsymbol{w}_h, \tilde{\boldsymbol{w}}_h)|||^2 := \nu \sum_{K \in \mathcal{T}_h} \left( |\boldsymbol{w}_h|^2_{H^1(K)} + h_K \left\| \partial_{\boldsymbol{n}} \boldsymbol{w}_h \right\|^2_{\partial K} + \frac{\tau}{h_K} \left\| \Phi^{k-1}\left( (\boldsymbol{w}_h)_t - \tilde{\boldsymbol{w}}_h \right) \right\|^2_{\partial K} \right).$$

The first step towards proving the stability of Method (2) is the following weak inf-sup condition for *b*.

**Lemma 1** *There exist constants $C_1, C_2 > 0$, independent of $h_K$ and $\nu$, such that*

$$\sup\_{\left(\boldsymbol{v}\_{h},\tilde{\boldsymbol{v}}\_{h}\right)\in\boldsymbol{V}\_{h}}\frac{b\left(\left(\boldsymbol{v}\_{h},\tilde{\boldsymbol{v}}\_{h}\right),\boldsymbol{q}\_{h}\right)}{|||\left(\boldsymbol{v}\_{h},\tilde{\boldsymbol{v}}\_{h}\right)|||}\geq C\_{1}\left\|\boldsymbol{q}\_{h}\right\|\_{\Omega}-C\_{2}\left\|\boldsymbol{q}\_{h}-\boldsymbol{\Psi}^{k-1}\boldsymbol{q}\_{h}\right\|\_{\Omega}\quad\forall\boldsymbol{q}\_{h}\in\mathcal{Q}\_{h}^{k}.\tag{3}$$

*Proof* We consider an arbitrary $q_h \in Q^k_h$. Let $\tilde{\Omega}$ be a convex, open, Lipschitz set such that $\Omega \subset \tilde{\Omega}$, and let us consider the following extension

$$
\hat{q}_h := \begin{cases} q_h & \text{in } \Omega \\ 0 & \text{in } \tilde{\Omega} \setminus \Omega \end{cases}.
$$

Let now *φ* be the unique weak solution of the problem

$$\begin{cases} -\Delta \phi = \hat{q}_h & \text{in } \tilde{\Omega} \\ \phantom{-\Delta} \phi = 0 & \text{on } \partial\tilde{\Omega} \end{cases}.$$

Since $\tilde{\Omega}$ is convex, $\phi \in H^2(\tilde{\Omega})$. Then $\boldsymbol{w} := \nabla\phi|_\Omega$ belongs to $[H^1(\Omega)]^2$, and for $\tilde{w} := \boldsymbol{w}_t$,

$$b\left(\left(\mathfrak{w},\tilde{w}\right),q\_{h}\right) = \|q\_{h}\|\_{\Omega}^{2} \quad \forall q\_{h} \in \mathcal{Q}\_{h}^{k}.\tag{4}$$

In addition, applying standard regularity results, see [5, Section 1.2], we get

$$\|\boldsymbol{w}\|_{H^1(\Omega)} \le \|\nabla\phi\|_{H^1(\tilde{\Omega})} \le c_1 \|q_h\|_\Omega. \tag{5}$$

In [1, Lemma 3.5] it is shown that there exists a Fortin operator $\Pi \colon [H^1(\Omega)]^2 \to \boldsymbol{V}_h$ satisfying the following conditions: for all $\boldsymbol{v} \in [H^1(\Omega)]^2$,

$$b\left((\boldsymbol{v}, \tilde{\boldsymbol{v}}), q_h\right) = b\left(\Pi(\boldsymbol{v}), q_h\right) \quad \forall\, q_h \in Q^{k-1}_h, \tag{6}$$

$$|||\Pi(\boldsymbol{v})||| \leq C\sqrt{\nu}\, \|\boldsymbol{v}\|_{H^1(\Omega)}. \tag{7}$$

Let $(\boldsymbol{w}_h, \tilde{\boldsymbol{w}}_h) := \Pi(\boldsymbol{w})$; then, thanks to (6), (4) and the continuity of $b$ (see [1, Lemma 3.3]),

$$\begin{aligned} b\left(\left(\boldsymbol{w}\_{h},\tilde{\boldsymbol{w}}\_{h}\right),q\_{h}\right) &= b\left(\left(\boldsymbol{w},\tilde{\boldsymbol{w}}\right),q\_{h}\right) - b\left(\left(\boldsymbol{w}-\boldsymbol{w}\_{h},\tilde{\boldsymbol{w}}-\tilde{\boldsymbol{w}}\_{h}\right),q\_{h}-\boldsymbol{\Psi}^{k-1}q\_{h}\right) \\ &\geq \|q\_{h}\|\_{\Omega}^{2} - c\_{2}\sqrt{\sum\_{K\in\mathcal{T}\_{h}}\left|\boldsymbol{w}\_{h}-\boldsymbol{w}\right|\_{H^{1}(K)}^{2}}\left\|q\_{h}-\boldsymbol{\Psi}^{k-1}q\_{h}\right\|\_{\Omega}.\end{aligned}$$

Using the approximation properties of the BDM interpolation operator (see [5, Proposition 2.5.1]) and (5),

$$\begin{aligned} b\left((\boldsymbol{w}_h, \tilde{\boldsymbol{w}}_h), q_h\right) &\geq \left( \frac{1}{c_1} \|q_h\|_\Omega - c_2 c_3 \left\| q_h - \Psi^{k-1} q_h \right\|_\Omega \right) \|\boldsymbol{w}\|_{H^1(\Omega)} \\ &\geq \left( C_1 \|q_h\|_\Omega - C_2 \left\| q_h - \Psi^{k-1} q_h \right\|_\Omega \right) |||(\boldsymbol{w}_h, \tilde{\boldsymbol{w}}_h)|||, \end{aligned}$$

where, in the last estimate, we have used the stability (7) of the Fortin operator $\Pi$ in the $|||\cdot|||$ norm. This proves the result with $C_1 = \frac{1}{C\sqrt{\nu}\, c_1}$ and $C_2 = \frac{c_2 c_3}{C\sqrt{\nu}}$. $\square$

Before showing an inf-sup condition, we prove the continuity of the bilinear form $A$.

**Lemma 2** *There exists a constant $C > 0$ such that, for all $(\boldsymbol{w}_h, \tilde{\boldsymbol{w}}_h), (\boldsymbol{v}_h, \tilde{\boldsymbol{v}}_h) \in \boldsymbol{V}_h$ and $r_h, q_h \in Q^k_h$, we have*

$$\left| A\left((\boldsymbol{w}_h, \tilde{\boldsymbol{w}}_h, r_h), (\boldsymbol{v}_h, \tilde{\boldsymbol{v}}_h, q_h)\right) \right| \leq C\, |||(\boldsymbol{w}_h, \tilde{\boldsymbol{w}}_h, r_h)|||_h\; |||(\boldsymbol{v}_h, \tilde{\boldsymbol{v}}_h, q_h)|||_h. \tag{8}$$

*Proof* We use the continuity of the bilinear forms (see [1, Lemma 3.3]) and the fact that the projection $\Psi^{k-1}$ is a bounded operator. $\square$

The final step towards stability is proving the inf-sup condition for the bilinear form $A$.

**Lemma 3** *There exists $\beta > 0$, independent of $h_K$, such that for all $(\boldsymbol{w}_h, \tilde{\boldsymbol{w}}_h, r_h) \in \boldsymbol{V}_h \times Q^k_h$ the following holds*

$$\sup\_{\left(\boldsymbol{v}\_{\mathsf{h}},\tilde{\boldsymbol{v}}\_{\mathsf{h}},q\_{\mathsf{h}}\right)\in\mathsf{V}\_{\mathsf{h}}\times\mathcal{Q}\_{\mathsf{h}}^{k}}\frac{A\left(\left(\boldsymbol{w}\_{\mathsf{h}},\tilde{\boldsymbol{w}}\_{\mathsf{h}},\boldsymbol{r}\_{\mathsf{h}}\right),\left(\boldsymbol{v}\_{\mathsf{h}},\tilde{\boldsymbol{v}}\_{\mathsf{h}},q\_{\mathsf{h}}\right)\right)}{|||\left(\boldsymbol{v}\_{\mathsf{h}},\tilde{\boldsymbol{v}}\_{\mathsf{h}},q\_{\mathsf{h}}\right)|||\_{h}}\geq\beta|||\left(\boldsymbol{w}\_{\mathsf{h}},\tilde{\boldsymbol{w}}\_{\mathsf{h}},\boldsymbol{r}\_{\mathsf{h}}\right)|||\_{h}.\tag{9}$$

*As a consequence, Problem* (2) *is well-posed.*

*Proof* Let $(\boldsymbol{w}_h, \tilde{\boldsymbol{w}}_h, r_h) \in \boldsymbol{V}_h \times Q^k_h$. The idea of the proof is to construct an appropriate $(\boldsymbol{v}_h, \tilde{\boldsymbol{v}}_h, q_h)$ such that

$$A\left((\boldsymbol{w}_h, \tilde{\boldsymbol{w}}_h, r_h), (\boldsymbol{v}_h, \tilde{\boldsymbol{v}}_h, q_h)\right) \geq c\, |||(\boldsymbol{w}_h, \tilde{\boldsymbol{w}}_h, r_h)|||_h\; |||(\boldsymbol{v}_h, \tilde{\boldsymbol{v}}_h, q_h)|||_h.$$

To achieve this we use the coercivity of $a$ (see [1, Lemma 3.4]), the continuity of $a$ (see [1, Lemma 3.3]) and Lemma 2. For details see [6]. $\square$

#### *2.2 Error Analysis*

In this section we present the error estimates for the method. The addition of the stabilising bilinear form $s(\cdot,\cdot)$ introduces a consistency error. However, according to [4], this should not be viewed as a serious flaw, as this consistency error can be bounded in an optimal way. The following result is the first step towards that goal.

**Lemma 4** *Let $(\boldsymbol{u}, p) \in \left[ H^1(\Omega) \cap H^2(\mathcal{T}_h) \right]^2 \times L^2(\Omega)$ be the solution of problem* (1) *and $\tilde{\boldsymbol{u}} = \boldsymbol{u}_t$ on all edges of $\mathcal{E}_h$. If $(\boldsymbol{u}_h, \tilde{\boldsymbol{u}}_h, p_h) \in \boldsymbol{V}_h \times Q^k_h$ solves* (2)*, then for all $(\boldsymbol{v}_h, \tilde{\boldsymbol{v}}_h, q_h) \in \boldsymbol{V}_h \times Q^k_h$ the following holds*

$$A\left(\left(\mathfrak{u}-\mathfrak{u}\_{h},\widetilde{\mathfrak{u}}-\widetilde{\mathfrak{u}}\_{h},p-p\_{h}\right),\left(\mathfrak{v}\_{h},\widetilde{v}\_{h},q\_{h}\right)\right)=s\left(p,q\_{h}\right).\tag{10}$$

Next, we introduce the following norm

$$|||(\boldsymbol{u}, \tilde{\boldsymbol{u}}, p)|||_h := |||(\boldsymbol{u}, \tilde{\boldsymbol{u}})||| + \frac{1}{\sqrt{\nu}} \|p\|_\Omega, \tag{11}$$

and prove the following variant of Céa's lemma [11, Lemma 2.28] for this stabilised Stokes problem.

**Lemma 5** *Let $(\boldsymbol{u}, p) \in \left[ H^1(\Omega) \cap H^2(\mathcal{T}_h) \right]^2 \times L^2(\Omega)$ be the solution of problem* (1) *and $\tilde{\boldsymbol{u}} = \boldsymbol{u}_t$ on all edges of $\mathcal{E}_h$. If $(\boldsymbol{u}_h, \tilde{\boldsymbol{u}}_h, p_h) \in \boldsymbol{V}_h \times Q^k_h$ solves* (2)*, then there exists $C > 0$, independent of $h$ and $\nu$, such that*

$$\begin{aligned} |||(\boldsymbol{u} - \boldsymbol{u}_h, \tilde{\boldsymbol{u}} - \tilde{\boldsymbol{u}}_h, p - p_h)|||_h \leq\; & C \inf_{(\boldsymbol{v}_h, \tilde{\boldsymbol{v}}_h, q_h) \in \boldsymbol{V}_h \times Q^k_h} |||(\boldsymbol{u} - \boldsymbol{v}_h, \tilde{\boldsymbol{u}} - \tilde{\boldsymbol{v}}_h, p - q_h)|||_h \\ & + \frac{C}{\sqrt{\nu}} \left\| p - \Psi^{k-1} p \right\|_\Omega. \end{aligned} \tag{12}$$

*Proof* It is a combination of Lemmas 1, 2 and 3. For details see [6]. $\square$

**Lemma 6** *Let $(\boldsymbol{u}, p) \in \left[ H^1(\Omega) \cap H^2(\mathcal{T}_h) \right]^2 \times L^2(\Omega)$ be the solution of problem* (1) *and $\tilde{\boldsymbol{u}} = \boldsymbol{u}_t$ on all edges of $\mathcal{E}_h$. If $(\boldsymbol{u}_h, \tilde{\boldsymbol{u}}_h, p_h) \in \boldsymbol{V}_h \times Q^k_h$ solves* (2)*, then there exists $C > 0$, independent of $h$ and $\nu$, such that*

$$|||(\boldsymbol{u} - \boldsymbol{u}_h, \tilde{\boldsymbol{u}} - \tilde{\boldsymbol{u}}_h, p - p_h)|||_h \leq C h^k \left( \sqrt{\nu}\, \|\boldsymbol{u}\|_{H^{k+1}(\mathcal{T}_h)} + \frac{1}{\sqrt{\nu}} \|p\|_{H^k(\mathcal{T}_h)} \right).$$

*Proof* It is a combination of [1, Lemma 3.8] and Lemma 5 with the local $L^2$ projection approximation result [11, Theorem 1.103]. $\square$

#### **3 Numerical Experiments**

The computational domain is the unit square $\Omega = (0,1)^2$. We present the results for $k = 1$; that is, the discrete space is $\boldsymbol{BDM}^1_h \times M^0_{h,0} \times Q^1_h$. We test both the symmetric method ($\varepsilon = -1$) and the non-symmetric method ($\varepsilon = 1$). We have followed the recommendation given in [15, Section 2.5.2] and taken $\tau = 6$. We choose the right-hand side $\boldsymbol{f}$ and the boundary condition $g$ such that the exact solution is given by

$$\boldsymbol{u} = \operatorname{curl}\left[ \left(1 - \cos((1-x)^2)\right) \sin(x^2) \sin(y^2) \left(1 - \cos((1-y)^2)\right) \right], \qquad p = \tan(xy).$$

In Fig. 1a and b we depict the errors for the symmetric and non-symmetric cases, respectively. We can see that they not only validate the theory from Sect. 2.2, but also exhibit the optimal $h^2$ convergence rate for $\boldsymbol{u} - \boldsymbol{u}_h$. Furthermore, we observe an increased order of convergence for $p - p_h$. In fact, the error seems to decrease like $O(h^{3/2})$, rather than the $O(h)$ predicted by the theory.

To stress the last point made in the previous paragraph, in Table 1 we compare the $L^2$ error of the pressure ($\|p - p_h\|$) for the hdG method introduced in [1] and the stabilised hdG method from Sect. 2. The columns $p_h \in Q^0_h$ are associated with the hdG method and the columns $p_h \in Q^1_h$ with the stabilised one. There, we confirm that the pressure

**Fig. 1** Convergence of the stabilised method with $k = 1$. (**a**) Symmetric bilinear form ($\varepsilon = -1$). (**b**) Non-symmetric bilinear form ($\varepsilon = 1$)


**Table 1** Comparison of the error of the pressure ||*p* − *ph*||

error for the stabilised version is much smaller than the one for the inf-sup stable case, in addition to having an increased order of convergence.

#### **4 Conclusion**

In this work we have applied the idea introduced in [10] to stabilise the hdG method proposed in [1] for the Stokes problem with TVNF boundary conditions. The method adds a simple, symmetric term to the formulation, and allows us to use a higher-order pressure space, which, in turn, improves the pressure convergence (although a proof of this fact is, in general, not available). This approach was also applied to NVTF boundary conditions (see [6]) and can be used for other discontinuous Galerkin methods that deal with Stokes or nearly incompressible elasticity problems.

Future tests using higher-order discretisations are needed to assess whether this approach also increases the convergence rate for the pressure in those cases; such numerical experiments with higher polynomial degrees for discontinuous Galerkin methods are of interest for further research.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **RBF Based CWENO Method**

**Jan S. Hesthaven, Fabian Mönkeberg, and Sara Zaninelli**

### **1 Introduction**

A broad range of physical phenomena can be described by hyperbolic conservation laws of the form

$$\begin{aligned} u_t + f(u)_x &= 0, \quad (x, t) \in \mathbb{R} \times \mathbb{R}_+, \\ u(\cdot, 0) &= u_0, \end{aligned} \tag{1}$$

with the conserved variables $u \colon \mathbb{R} \times \mathbb{R}_+ \to \mathbb{R}^N$ and the flux function $f \colon \mathbb{R}^N \to \mathbb{R}^N$. The nonlinear behavior of $f$ can lead to complex solutions, most notably shocks. It is well known that high-order methods give good results for smooth data, but for discontinuous data spurious oscillations are introduced. A popular class of methods to solve (1) is the finite volume method, which is based on a discretization in space $\ldots < x_{i-1/2} < x_{i+1/2} < \ldots$ and the average values $\bar{u}_i$ over the cells $C_i = [x_{i-1/2}, x_{i+1/2}]$. It is defined by the semi-discrete scheme

$$\frac{\mathrm{d}\bar{u}_i}{\mathrm{d}t} = -\frac{F_{i+1/2} - F_{i-1/2}}{\Delta x}, \tag{2}$$

where the numerical flux term $F_{i+1/2}$ depends on the values $\{\bar{u}_{i-k}, \ldots, \bar{u}_{i+p-k}\}$ with $0 \leq k \leq p-1$. For more details we refer the reader to [15, 20, 22].
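For concreteness, the semi-discrete scheme (2) can be sketched in a few lines. This is our own illustrative code (not from the paper), using a piecewise-constant reconstruction, periodic boundaries and the Lax–Friedrichs flux that the numerical experiments of Sect. 5 rely on; all names are ours.

```python
import numpy as np

def lax_friedrichs(f, lam):
    """Monotone Lax-Friedrichs flux for a scalar flux f, with lam >= |f'(u)|."""
    return lambda ul, ur: 0.5 * (f(ul) + f(ur)) - 0.5 * lam * (ur - ul)

def fv_rhs(ubar, flux, dx):
    """Right-hand side of (2): d ubar_i/dt = -(F_{i+1/2} - F_{i-1/2}) / dx.
    Interface i+1/2 sees cell i on the left and cell i+1 on the right
    (first-order reconstruction, periodic boundary conditions)."""
    F = flux(ubar, np.roll(ubar, -1))   # F[i] approximates F_{i+1/2}
    return -(F - np.roll(F, 1)) / dx    # F_{i-1/2} is F[i-1]
```

With $f(u) = u$ and $\lambda = 1$ the Lax–Friedrichs flux reduces to simple upwinding, which makes the scheme easy to check by hand.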

The class of essentially nonoscillatory (ENO) methods, introduced by Harten et al. [14], reduces spurious oscillations to a minimum. They are based on a monotone numerical flux function *F (u, v)* and high-order accurate reconstruction *si(x)* for

J. S. Hesthaven · F. Mönkeberg (-) · S. Zaninelli

École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland e-mail: jan.hesthaven@epfl.ch; fabian.monkeberg@epfl.ch; sara.zaninelli@epfl.ch

S. J. Sherwin et al. (eds.), *Spectral and High Order Methods for Partial Differential Equations ICOSAHOM 2018*, Lecture Notes in Computational Science and Engineering 134, https://doi.org/10.1007/978-3-030-39647-3\_14

each cell $i$. The central idea is to choose the least oscillating interpolation function $s_i$ and define the numerical flux $F_{i+1/2} = F(u^+_{i+1/2}, u^-_{i+1/2})$, with $u^+_{i+1/2}$ and $u^-_{i+1/2}$ being the evaluations of $s_{i+1}$ and $s_i$ at the interface $x_{i+1/2}$. Based on the ENO method, Jiang and Shu [19] introduced the weighted ENO (WENO) method, which considers different interpolation polynomials, based on different stencils, and combines them in a nonoscillatory manner to maximize the attainable accuracy. Further results on ENO and WENO methods can be found in [10, 11, 16].

#### **2 CWENO**

The CWENO method is based on the WENO method and was introduced by Levy et al. [23] as a third order method. Further analysis and generalization to higher orders on general grids can be found in [6, 7].

Let us consider the standard semi-discrete formulation (2) with a monotone flux function $F(u,v)$. The goal is to construct a reconstruction $P_{rec,i}$ for each cell $C_i$ based on the stencil $\{C_{i-k}, \ldots, C_{i+k}\}$ for $k \in \mathbb{N}$. In smooth regions the algorithm should choose a polynomial of degree $2k$ which interpolates the central stencil $\bar{u}_{i-k}, \ldots, \bar{u}_{i+k}$ in the mean value sense. In case of a non-smooth solution it chooses a polynomial of degree $k$ on one stencil $\{C_{i-k+l}, \ldots, C_{i+l}\}$ that avoids the discontinuity. Given the reconstruction, the high-order numerical flux is $F_{i+1/2} = F(P_{rec,i+1}(x_{i+1/2}), P_{rec,i}(x_{i+1/2}))$.

Specifically, let us consider $P_{opt}$ as the polynomial of degree $2k$ that interpolates all data in the $(2k+1)$-cell stencil, and the polynomials $P_l$ of degree $k$ that interpolate the data on the stencils $\{C_{i-k+l-1}, \ldots, C_{i+l-1}\}$ for $l = 1, \ldots, k+1$. Furthermore, the reconstruction depends on the choice of the positive real coefficients $d_0, \ldots, d_{k+1} \in [0,1]$ such that $\sum_{l=0}^{k+1} d_l = 1$, $d_0 \neq 0$. Then, the reconstruction polynomial of degree $2k$ is

$$P_{rec}(x) = \sum_{l=0}^{k+1} \omega_l P_l(x), \tag{3}$$

with

$$P\_0(\mathbf{x}) = \frac{1}{d\_0} \left( P\_{opt}(\mathbf{x}) - \sum\_{l=1}^{k+1} d\_l P\_l(\mathbf{x}) \right), \tag{4}$$

and the nonlinear coefficients *ωl* that are defined as

$$\omega_l = \frac{\alpha_l}{\sum_{m=0}^{k+1} \alpha_m}, \qquad \alpha_l = \frac{d_l}{(I[P_l] + \bar{\varepsilon})^t}, \tag{5}$$

where $I[P_l]$ indicates the smoothness of $P_l$, $1 \gg \bar{\varepsilon} > 0$ and $t \geq 2$. A classical indicator of smoothness in the cell $C$ for a polynomial is the Jiang–Shu indicator [19]

$$I[P] = \sum\_{l>0} \text{diam}(C)^{2l-1} \int\_{C} \left(\frac{\mathbf{d}^{l}}{\mathbf{d}x^{l}} P(\mathbf{x})\right)^{2} \text{d}x. \tag{6}$$
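As an illustration (our own sketch, not the authors' code), the indicator (6) can be evaluated exactly for a polynomial on a cell $[a,b]$, since every term is the integral of a polynomial:

```python
import numpy as np

def jiang_shu(coeffs, a, b):
    """Jiang-Shu indicator (6) for a polynomial P on the cell C = [a, b]:
    sum over l >= 1 of diam(C)^(2l-1) * int_C (P^(l)(x))^2 dx.
    `coeffs` are power-basis coefficients, highest degree first (np.polyval)."""
    h = b - a
    deg = len(coeffs) - 1
    total = 0.0
    d = np.asarray(coeffs, dtype=float)
    for l in range(1, deg + 1):
        d = np.polyder(d)                        # l-th derivative of P
        anti = np.polyint(np.polymul(d, d))      # antiderivative of (P^(l))^2
        total += h ** (2 * l - 1) * (np.polyval(anti, b) - np.polyval(anti, a))
    return total
```

For example, $P(x) = x$ on $[0,1]$ gives $I[P] = 1$, while $P(x) = x^2$ on $[0,1]$ gives $4/3 + 4 = 16/3$, reflecting its larger curvature.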

The choice of $\bar{\varepsilon}$ is important: if it is too small, it might affect the order of convergence; on the other hand, if it is too big, spurious oscillations may occur. Cravero et al. [7] show that the choice $\bar{\varepsilon} = \hat{\varepsilon} h^p$ for $p = 1, 2$ leads to the maximal order of convergence. As proposed in [7], we define the coefficients $d_j$ via the temporary weights

$$
\hat{d}\_j = \hat{d}\_{k+2-j} = j, \qquad 1 \le j \le \frac{k+2}{2}, \tag{7}
$$

and we choose $d_0 \in (0,1)$ for the high-order polynomial. This gives a possible choice for the coefficients

$$d_l = \frac{\hat{d}_l}{\sum_{j>0} \hat{d}_j} (1 - d_0). \tag{8}$$
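A small sketch (ours, with hypothetical function names) of how the linear coefficients (7)–(8) and the nonlinear weights (5) can be computed:

```python
import numpy as np

def linear_weights(k, d0):
    """Coefficients d_0, ..., d_{k+1} from (7)-(8): temporary weights
    dhat_j = dhat_{k+2-j} = j for 1 <= j <= (k+2)/2 (i.e. min(j, k+2-j)),
    rescaled so the full set sums to 1 with prescribed d_0 in (0, 1)."""
    dhat = np.array([0.0] + [min(j, k + 2 - j) for j in range(1, k + 2)])
    d = dhat / dhat.sum() * (1.0 - d0)
    d[0] = d0
    return d

def nonlinear_weights(d, indicators, eps_bar, t=2):
    """Nonlinear weights (5): alpha_l = d_l / (I[P_l] + eps_bar)^t, normalised."""
    alpha = d / (np.asarray(indicators, dtype=float) + eps_bar) ** t
    return alpha / alpha.sum()
```

For smooth data all indicators are comparable and the nonlinear weights stay close to the $d_l$; near a discontinuity the weight of the offending stencil collapses, which is the mechanism that suppresses oscillations.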

The main difference with respect to the classical WENO method is that in the smooth case we are not constructing $P_{opt}$ out of the polynomials $P_l$, but we build it independently by solving an additional system of equations. This approach has the advantage that it is easier to generalize to general grids in high dimensions, while maintaining high-order accuracy.

#### **3 Radial Basis Functions**

An alternative to classical polynomial interpolation is interpolation with radial basis functions (RBFs). RBFs were proposed in the seminal work by Hardy [13]. They have been successfully applied in scattered data interpolation [4, 9, 17, 24, 27] and as a basis for a generalized finite difference method (RBF-FD) [5, 12]. Their advantage is their flexibility in high dimensions and the possibility to reduce the risk of ill-conditioned point constellations. Their disadvantage is the ill-conditioning of the interpolation matrix for small grid sizes [8, 21, 26].

The RBF interpolation is based on a basis $B$, obtained from a univariate continuous function $\phi \colon [0,\infty) \to \mathbb{R}$, composed with the Euclidean norm centered at the data points

$$\phi(\boldsymbol{x} - \boldsymbol{x}_j) := \phi(\varepsilon \|\boldsymbol{x} - \boldsymbol{x}_j\|), \tag{9}$$

**Table 1** Commonly used RBFs with $\nu > 0$, $k \in \mathbb{N}$ and $\varepsilon > 0$


with the shape parameter $\varepsilon$. Some common RBFs can be found in Table 1. Thus, for given scattered data points $X = (\boldsymbol{x}_1, \ldots, \boldsymbol{x}_n)^T$ with $\boldsymbol{x}_j \in \mathbb{R}^d$ and corresponding values $f_1, \ldots, f_n \in \mathbb{R}$ we look for

$$s(\mathbf{x}) = \sum\_{j=1}^{n} a\_j \phi(\mathbf{x} - \mathbf{x}\_j) + p(\mathbf{x}),\tag{10}$$

with a polynomial $p \in \Pi_{m-1}(\mathbb{R}^d)$, $m \in \mathbb{N}$, the interpolation conditions $s(\boldsymbol{x}_j) = f_j$, and the additional constraints

$$\sum\_{j=1}^{n} a\_j q(\mathbf{x}\_j) = 0, \qquad \text{for all } q \in \Pi\_{m-1}(\mathbb{R}^d), \tag{11}$$

with the coefficients $a_j \in \mathbb{R}$ for all $j = 1, \ldots, n$.
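The pointwise system (10)–(11) can be sketched directly. This is our own illustration (not the paper's code), in 1D with the multiquadric RBF from Table 1 and linear polynomial augmentation ($m = 2$); the saddle-point matrix couples the RBF coefficients $a$ with the polynomial coefficients via the constraints (11).

```python
import numpy as np

def rbf_interpolant(x, f, eps=1.0):
    """Pointwise RBF interpolation (10)-(11) in 1D with the multiquadric
    phi(r) = sqrt(1 + (eps r)^2) and linear polynomial augmentation (m = 2).
    Solves the saddle-point system [A P; P^T 0] [a; c] = [f; 0]."""
    n = len(x)
    r = np.abs(x[:, None] - x[None, :])
    A = np.sqrt(1.0 + (eps * r) ** 2)        # phi(||x_i - x_j||)
    P = np.column_stack([np.ones(n), x])     # basis of Pi_1(R): {1, x}
    M = np.block([[A, P], [P.T, np.zeros((2, 2))]])
    sol = np.linalg.solve(M, np.concatenate([f, np.zeros(2)]))
    a, c = sol[:n], sol[n:]

    def s(y):
        y = np.atleast_1d(np.asarray(y, dtype=float))
        ry = np.abs(y[:, None] - x[None, :])
        return np.sqrt(1.0 + (eps * ry) ** 2) @ a + c[0] + c[1] * y
    return s
```

Because of the constraints (11), the augmented interpolant reproduces linear functions exactly, which is one reason for adding the polynomial part.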

The same concept can be applied in the case of cell-averages. We seek functions

$$s(\boldsymbol{x}) = \sum_{j=1}^{n} a_j \lambda^{\boldsymbol{\xi}}_{C_j} \phi(\boldsymbol{x} - \boldsymbol{\xi}) + p(\boldsymbol{x}), \qquad p \in \Pi_{m-1}(\mathbb{R}^d), \tag{12}$$

such that

$$\lambda_{C_j} s = \bar{u}_j, \qquad \text{for all } j = 1, \ldots, n, \tag{13a}$$

$$\sum_{j=1}^{n} a_j \lambda_{C_j}(q) = 0, \qquad \text{for all } q \in \Pi_{m-1}(\mathbb{R}^d), \tag{13b}$$

with the averaging operator $\lambda_C f = \frac{1}{|C|} \int_C f(\boldsymbol{x}) \, \mathrm{d}\boldsymbol{x}$ (the superscript $\boldsymbol{\xi}$ in (12) indicates that the average is taken with respect to $\boldsymbol{\xi}$). A well-known problem with RBFs is the high condition number of the interpolation matrix for small grid sizes or small shape parameters [8, 21, 26]. This problem can be resolved by using the vector-valued rational approximation method [28].

#### **4 RBF-CWENO**

Methods combining RBFs and essentially nonoscillatory methods have been proposed, e.g. RBFs with ENO [18, 25] and RBFs with WENO [1–3]. The advantage of the CWENO method over the WENO method is its flexibility on general grids and its independence from the construction of a high-order interpolation function out of lower-order ones. This facilitates the use of the whole grid in smooth regions and is important for non-polynomial interpolation functions, which cannot be combined into a higher-order function.

We propose the RBF-CWENO method, which works like the classical CWENO method with the reconstruction function (3) and the weights (5), but uses RBFs instead of polynomials as interpolation functions. Since the problem of ill-conditioning can be solved by using the vector-valued rational approximation method [28], the main challenge for RBF methods is the choice of the smoothness indicator. For polyharmonic splines, Aboiyar et al. [1] use the semi-norm of the Beppo Levi space, and Bigoni et al. [3] use a modified version of the Jiang–Shu indicator (6).

#### *4.1 Smoothness Indicator*

The smoothness indicator is the heart of essentially nonoscillatory methods. We consider one based on the indicator introduced by Bigoni and Hesthaven [3],

$$\begin{aligned} I_i[s] &= \sum_{l=1}^{g+1} \Delta x_i^{2l-1} \int_{C_i} \left( \frac{\mathrm{d}^l p(x)}{\mathrm{d}x^l} \right)^2 \mathrm{d}x \\ &\quad + \Delta x_i^{2g+1} \int_{C_i} \left( \frac{\mathrm{d}^{g+1}}{\mathrm{d}x^{g+1}} \Big[ \sum_{j=1}^{g+1} a_j \lambda^{\xi}_{C_j} \phi\left( \|x - \xi\| \right) \Big] \right)^2 \mathrm{d}x, \end{aligned} \tag{14}$$

where the first part is the sum over the derivatives of the polynomial part and the second term involves the highest derivative of the RBF part. The original Jiang–Shu indicator applied to (12) would include the lower derivatives of the RBF part plus all mixed terms, but we find this to be less efficient. For simplicity, the integrals can be approximated with a simple midpoint rule.

We face again the problem of ill-conditioning when recovering the coefficients $a_i$. Numerical examples indicate that small shape parameters improve the accuracy, but they do not affect the choice of the stencil made by this smoothness indicator. Thus, for the indicator we use a bigger shape parameter $\varepsilon_R$, which is smaller than the smallest distance to a singularity,

$$\varepsilon\_R = 0.95(\max\_{i,j \le N} \|\mathbf{x}\_i - \mathbf{x}\_j\|)^{-1},\tag{15}$$

which ensures the solvability of the system of equations [28].

#### **5 Numerical Results**

We now discuss the numerical results of the RBF-CWENO method and compare it with the RBF-WENO method [3] and the classical ENO method [14]. All methods use the Lax–Friedrichs numerical flux, and integration in time is done using the SSPRK-5 method [15] with time step $dt = CFL \cdot \Delta x / \lambda_{max}$, where $\lambda_{max}$ is the maximal eigenvalue of $\nabla_u F$. Furthermore, we use the vector-valued rational approximation approach [28] to circumvent ill-conditioning of the interpolation matrix, and a shape parameter $\varepsilon = 0.1$. For the nonlinear weights (5) we choose $\bar{\varepsilon} = \hat{\varepsilon} h^2$ with $\hat{\varepsilon} = 0.1$.
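The paper integrates in time with the fifth-order SSPRK-5 scheme of [15]; as a simpler stand-in with the same strong-stability property, the classical third-order SSP scheme of Shu and Osher can be sketched as follows (our own illustration, not the scheme actually used for the results below):

```python
import numpy as np

def ssp_rk3_step(u, rhs, dt):
    """One step of the third-order SSP Runge-Kutta scheme of Shu and Osher,
    written as a convex combination of forward Euler stages. (The paper uses
    a fifth-order SSP scheme; this is a simpler stand-in for illustration.)"""
    u1 = u + dt * rhs(u)
    u2 = 0.75 * u + 0.25 * (u1 + dt * rhs(u1))
    return u / 3.0 + (2.0 / 3.0) * (u2 + dt * rhs(u2))
```

Because every stage is a convex combination of forward Euler steps, any total-variation bound satisfied by forward Euler under a CFL restriction carries over to the full step.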

#### *5.1 Linear Advection Equation*

Let us consider the linear advection equation

$$u_t + a u_x = 0, \qquad x \in [0,1], \tag{16}$$

with wave speed $a = 1$, initial condition $u_0(x) = \sin(2\pi x)$ and periodic boundary conditions [22]. Note that for $k = 3$ we expect the order of convergence to be 7; therefore we use the reduced time step $dt = CFL \cdot \Delta x^{7/5} / \lambda_{max}$ to recover the right order of convergence. The correct order of convergence of the RBF-CWENO method is shown in Table 2, and it appears to be more accurate than the RBF-WENO method.
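The observed orders reported in tables such as Table 2 are obtained from errors on successive grid refinements; a minimal helper (ours, not from the paper) is:

```python
import numpy as np

def convergence_rates(errors, hs):
    """Observed order between successive refinements:
    rate_i = log(e_i / e_{i+1}) / log(h_i / h_{i+1})."""
    e = np.asarray(errors, dtype=float)
    h = np.asarray(hs, dtype=float)
    return np.log(e[:-1] / e[1:]) / np.log(h[:-1] / h[1:])
```

For a $p$-th order method the computed rates should approach $p$ as the grid is refined.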

#### *5.2 Burgers' Equation*

Considering Burgers' equation

$$u_t + \frac{1}{2}(u^2)_x = 0, \qquad x \in [0,1], \tag{17}$$


**Table 2** Convergence rates of RBF-CWENO using multiquadratics for the linear advection equation at time *t* = 0*.*05

We use shape parameter *ε* = 0*.*1, CFL = 0*.*01

**Fig. 1** Burgers' equation at $t = 0.3$ with $u_0 = \sin(2\pi x)$, solved by using the RBF-CWENO method with MQ interpolants of order $k = 3$


we analyze its robustness with respect to discontinuities. In Fig. 1 we report the results obtained with $CFL = 0.5$ at $t = 0.3$. We observe no oscillations around the discontinuity at $x = 0.5$ and, as expected, increasing accuracy for an increasing number of elements.

#### *5.3 Euler Equations*

The one-dimensional Euler equations express conservation of mass, momentum and the total energy. They can be described by the density *ρ*, the mass flow *m*, the energy per unit volume *E* and the pressure *p* through

$$
\begin{pmatrix} \rho \\ m \\ E \end{pmatrix}\_t + \begin{pmatrix} m \\ \frac{m^2}{\rho} + p \\ \frac{m}{\rho}(E+p) \end{pmatrix}\_\chi = 0,\tag{18}
$$

with $p = R\rho T = (\gamma - 1)\left(E - \frac{1}{2}\frac{m^2}{\rho}\right)$ for an ideal gas with ratio of specific heats $\gamma = 1.4$ [15]. For $k = 3$ we need to change the nonlinear weights (5) by using $\bar{\varepsilon} = \hat{\varepsilon} h^2$ with $\hat{\varepsilon} = 10^{-6}$ to avoid oscillations.
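As a small concrete check of (18) and the pressure relation (our own sketch), the flux in conserved variables $U = (\rho, m, E)$ reads:

```python
import numpy as np

def euler_flux(U, gamma=1.4):
    """Flux of the 1D Euler equations (18) for conserved variables
    U = (rho, m, E), with p = (gamma - 1) * (E - m^2 / (2 rho))."""
    rho, m, E = U
    p = (gamma - 1.0) * (E - 0.5 * m ** 2 / rho)
    return np.array([m, m ** 2 / rho + p, m / rho * (E + p)])
```

At a state of rest ($m = 0$) the only nonzero flux component is the pressure in the momentum equation, which is a convenient sanity check.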

#### **5.3.1 Sod's Shock Tube Problem**

Sod's shock tube problem describes two gases in $[0,1]$ with different densities, given by the initial conditions

$$(\rho\_0, m\_0, p\_0) = \begin{cases} (1, 0, 1) & \text{if } x < 0.5\\ (0.125, 0, 0.1) & \text{if } x \ge 0.5 \end{cases} \tag{19}$$

This results in a rarefaction wave followed by a contact discontinuity and a shock, which separate the domain into four regions with constant states. The RBF-CWENO method resolves it well, see Fig. 2. For $k = 3$, we observe minor

**Fig. 2** Results for the Sod shock tube problem at *t* = 0*.*2 solved by using RBF-CWENO with MQ interpolants of order *k* = 2*,* 3 on characteristic variables (left: *k* = 2, right: *k* = 3)

**Fig. 3** Results for the Euler shock entropy problem at *t* = 1*.*8 solved by using RBF-CWENO with MQ interpolants of order *k* = 2 on characteristic variables (Left) and a comparison with WENO, ENO2 and ENO5 for *N* = 256 cells (Right)

oscillations, but their amplitude decreases for increasing number of elements. Furthermore, we observe the increasing accuracy for *k* = 3 compared to *k* = 2.

#### **5.3.2 Shu–Osher Shock-Entropy Wave Interaction Problem**

The Shu–Osher problem describes the interaction of a discontinuity with a low-frequency wave, which introduces high-frequency waves. Its initial conditions are

$$(\rho\_0, m\_0, p\_0) = \begin{cases} (3.857143, 2.629369, 10.33333) & \text{if } x < -4\\ (1 + 0.2\sin(5x), 0, 1) & \text{if } x \ge -4 \end{cases} \tag{20}$$

In Fig. 3 (left), we observe increasing accuracy for an increasing number of elements for $k = 2$. On the right we see its good approximation behaviour compared to the existing methods ENO2, ENO5 and the corresponding WENO. In particular, we observe that the performance of RBF-CWENO ($k = 2$) is comparable to ENO5 and superior to WENO ($k = 2$).

#### **6 Conclusion**

In this work, we introduce the RBF-CWENO method, which relies on the CWENO method [23] and the use of radial basis functions for the interpolation. We develop a smoothness indicator that is based on RBFs but works similarly to the one for polynomials. Furthermore, we address the choice of the weight $1 \gg \bar{\varepsilon} > 0$. For $\bar{\varepsilon} = \hat{\varepsilon} h^2$ with $\hat{\varepsilon} = 0.1$ we obtain the right order of convergence, but for the 7th-order method ($k = 3$) we choose $\hat{\varepsilon} = 10^{-6}$ to reduce spurious oscillations for the Euler equations.

Moreover, we point out that the choice of the linear weight $d\_0$ can influence the result: if it is too close to 1, the reconstruction almost coincides with $P\_{\text{opt}}$, which can lead to spurious oscillations in the case of discontinuous solutions. We present multiple numerical examples to show the robustness of the method.

We can conclude that the RBF-CWENO method performs comparably to the existing RBF-WENO and ENO methods in one dimension. The advantage of RBFs becomes clearer on unstructured grids in higher dimensions, where polynomial reconstruction is complex.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Discrete Equivalence of Adjoint Neumann–Dirichlet div-grad and grad-div Equations in Curvilinear 3D Domains**

**Yi Zhang, Varun Jain, Artur Palha, and Marc Gerritsma**

### **1 Introduction**

In $\mathbb{R}^d$, given a bounded domain $\Omega$ with Lipschitz boundary $\partial\Omega$ and $\hat{\sigma}\_n \in H^{-1/2}(\partial\Omega) = \text{tr}\, H(\text{div}, \Omega)$, $\omega \in H^1(\Omega)$ solves the Neumann problem,

$$\begin{cases} \dfrac{\partial \omega}{\partial n} = \hat{\sigma}\_{n} & \text{on } \partial \Omega \\\\ -\text{div}\,(\text{grad } \omega) + \omega = 0 & \text{in } \Omega \end{cases}, \tag{1}$$

if and only if $\sigma \in H(\text{div}, \Omega)$, which solves the Dirichlet problem,

$$\begin{cases} \sigma \cdot \mathfrak{n} = \hat{\sigma}\_{\mathfrak{n}} & \text{on } \partial \Omega \\\\ -\text{grad}\,(\text{div } \sigma) + \sigma = 0 & \text{in } \Omega \end{cases}, \tag{2}$$

satisfies $\sigma = \text{grad}\, \omega$ [3]. This is obvious at the continuous level. The question is whether we can find a set of finite dimensional function spaces such that $\sigma^h = \text{grad}\, \omega^h$ holds if $\omega^h$ and $\sigma^h$ solve the discrete Neumann and Dirichlet problems, respectively. The answer is yes.

Y. Zhang (✉) · V. Jain · M. Gerritsma
Delft University of Technology, Delft, The Netherlands
e-mail: y.zhang-14@tudelft.nl; v.jain@tudelft.nl; m.i.gerritsma@tudelft.nl

A. Palha
Eindhoven University of Technology, Eindhoven, The Netherlands
e-mail: a.palha@tue.nl

© The Author(s) 2020
S. J. Sherwin et al. (eds.), *Spectral and High Order Methods for Partial Differential Equations ICOSAHOM 2018*, Lecture Notes in Computational Science and Engineering 134, https://doi.org/10.1007/978-3-030-39647-3\_15

Throughout this paper, we restrict ourselves to $\mathbb{R}^3$. We will first construct the primal polynomial spaces and their algebraic dual representations, and then use them to discretize problems (1) and (2) such that the identity $\sigma^h = \text{grad}\, \omega^h$ holds at the discrete level in any curvilinear domain for any polynomial approximation degree. This work extends [7, 9], where similar dual Neumann–Dirichlet problems are considered, to three-dimensional space. These primal spaces and their algebraic dual representations are well suited to the so-called mimetic or structure-preserving discretizations [1, 4, 8, 11, 12]. Together with their trace spaces, they can be used for hybrid finite element methods, which first decompose the domain into discontinuous elements and then connect them with Lagrange multipliers living in the trace spaces [2, 13, 14].

The outline of this paper is as follows: In Sect. 2, we introduce the construction of polynomial spaces and their algebraic dual representations. The discrete formulations of the Neumann–Dirichlet problems and the proof of their equivalence at the discrete level follow in Sect. 3. A 3-dimensional numerical test case is then presented in Sect. 4. Finally, conclusions are drawn in Sect. 5.

#### **2 Function Spaces**

#### *2.1 Primal Polynomial Spaces*

Let $-1 = \xi^i\_0 < \xi^i\_1 < \cdots < \xi^i\_{I^i} = 1$, $i = 1, 2, 3$, be three partitionings of $[-1, 1]$. The associated Lagrange polynomials are

$$h\_j(\boldsymbol{\xi}^i) = \prod\_{m=0, m \neq j}^{I^i} \frac{\boldsymbol{\xi}^i - \boldsymbol{\xi}\_m^i}{\boldsymbol{\xi}\_j^i - \boldsymbol{\xi}\_m^i}, \ j = 0, 1, \dots, I^i.$$

They are polynomials of degree $I^i$ which satisfy the Kronecker delta property, $h\_j(\xi^i\_k) = \delta\_{jk}$. The associated edge functions can be derived as [6],

$$e\_j(\xi^i) = -\sum\_{k=0}^{j-1} \frac{\mathrm{d}h\_k(\xi^i)}{\mathrm{d}\xi^i}, \ j = 1, 2, \cdots, I^i,$$

which are polynomials of degree $I^i - 1$. Edge functions also satisfy the Kronecker delta property, but in the integral sense,

$$\int\_{\xi\_{k-1}^{i}}^{\xi\_k^{i}} e\_j(\xi^{i}) \, \mathrm{d}\xi^{i} = \delta\_{jk}.$$
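Both Kronecker properties can be checked numerically. The following Python sketch (our illustration, with hypothetical helper names, not code from the paper) builds the Lagrange basis and the edge functions on an arbitrary partitioning of $[-1, 1]$:

```python
import numpy as np

def lagrange_polys(nodes):
    """Lagrange basis polynomials h_j on 1D nodes, as np.poly1d objects."""
    polys = []
    for j, xj in enumerate(nodes):
        others = np.delete(nodes, j)
        # monic polynomial with roots at the other nodes, normalised at xj
        polys.append(np.poly1d(np.poly(others)) / np.prod(xj - others))
    return polys

def edge_polys(nodes):
    """Edge functions e_j = -sum_{k<j} dh_k/dxi, of degree len(nodes)-2."""
    hs = lagrange_polys(nodes)
    return [-sum((hs[k].deriv() for k in range(j)), np.poly1d([0.0]))
            for j in range(1, len(nodes))]

nodes = np.array([-1.0, -0.5, 0.3, 1.0])     # any partitioning of [-1, 1]
hs, es = lagrange_polys(nodes), edge_polys(nodes)

# nodal Kronecker property: h_j(xi_k) = delta_jk
H = np.array([[hj(x) for x in nodes] for hj in hs])
# integral Kronecker property: integral of e_j over [xi_{k-1}, xi_k] = delta_jk
E = np.array([[np.polyint(ej)(nodes[k]) - np.polyint(ej)(nodes[k - 1])
               for k in range(1, len(nodes))] for ej in es])
```

Here `H` and `E` come out as identity matrices up to round-off, reproducing the nodal and integral Kronecker delta properties above.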

Consider a reference domain $\Omega\_{\text{ref}}|\_{\xi^1,\xi^2,\xi^3} := [-1, 1]^3$. With the tensor product, we can construct the finite dimensional scalar function space $\mathcal{P}^{I^1,I^2,I^3}$ spanned by the polynomial basis functions

$$\left\{ h\_i(\xi^1) h\_j(\xi^2) h\_k(\xi^3) \right\},$$

and the vector-valued function space $\mathcal{L}^{I^1,I^2,I^3}$ spanned by the polynomial basis functions

$$\left\{ e\_l(\xi^1) h\_j(\xi^2) h\_k(\xi^3), \ h\_l(\xi^1) e\_j(\xi^2) h\_k(\xi^3), \ h\_l(\xi^1) h\_j(\xi^2) e\_k(\xi^3) \right\}.$$

Let $\omega^h \in \mathcal{P}^{I^1,I^2,I^3}$ be

$$
\omega^h = \sum\_{i=0}^{I^1} \sum\_{j=0}^{I^2} \sum\_{k=0}^{I^3} w\_{i,j,k} h\_i(\xi^1) h\_j(\xi^2) h\_k(\xi^3). \tag{3}
$$

Due to the way the edge functions are constructed, we can easily derive $\rho^h = \text{grad}\, \omega^h \in \mathcal{L}^{I^1,I^2,I^3}$,

$$
\rho^h = \text{grad } \omega^h = (\rho\_1, \,\rho\_2, \,\rho\_3)^\mathsf{T},
$$

where [6],

$$\rho\_1 = \sum\_{i=1}^{I^1} \sum\_{j=0}^{I^2} \sum\_{k=0}^{I^3} \left( w\_{i,j,k} - w\_{i-1,j,k} \right) e\_i(\xi^1) h\_j(\xi^2) h\_k(\xi^3),$$

$$\rho\_2 = \sum\_{i=0}^{I^1} \sum\_{j=1}^{I^2} \sum\_{k=0}^{I^3} \left( w\_{i,j,k} - w\_{i,j-1,k} \right) h\_i(\xi^1) e\_j(\xi^2) h\_k(\xi^3),$$

$$\rho\_3 = \sum\_{i=0}^{I^1} \sum\_{j=0}^{I^2} \sum\_{k=1}^{I^3} \left( w\_{i,j,k} - w\_{i,j,k-1} \right) h\_i(\xi^1) h\_j(\xi^2) e\_k(\xi^3).$$

Let $\underline{\omega}$, $\underline{\rho}$ be the vectors of expansion coefficients of $\omega^h$, $\rho^h$. We can obtain

$$
\underline{\rho} = \mathbb{E}\,\underline{\omega},\tag{4}
$$

where $\mathbb{E}$ is called the incidence matrix. The incidence matrix is very sparse, with only $\pm 1$ as non-zero entries. If we squeeze, stretch or distort the domain, the polynomial basis functions change, but the incidence matrix remains the same: it depends only on the topology of the mesh and the numbering of the degrees of freedom. Moreover, it is exact; it introduces no extra error. All these features make it an excellent discrete counterpart of the grad operator. Examples of incidence matrices can be found in [8, 10–12].
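A minimal 1D analogue (our sketch, not the paper's code) makes these properties concrete: the incidence matrix maps nodal degrees of freedom to the differences $w\_i - w\_{i-1}$, contains only $\pm 1$, and never references node positions.

```python
import numpy as np

def incidence_1d(n_nodes):
    """1D incidence matrix: maps nodal dofs w to edge dofs (w_i - w_{i-1}).
    Purely topological: only +/-1 entries, independent of node positions."""
    E = np.zeros((n_nodes - 1, n_nodes))
    for i in range(n_nodes - 1):
        E[i, i], E[i, i + 1] = -1.0, 1.0
    return E

E = incidence_1d(5)
w = np.array([0.0, 1.0, 3.0, 6.0, 10.0])   # nodal degrees of freedom
rho = E @ w                                # edge dofs of grad(omega^h)
```

In 3D, the incidence matrix for the gradient is assembled from such 1D blocks and identity matrices via Kronecker products, one block per coordinate direction.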

For a comprehensive explanation of these polynomial basis functions, we refer to [6]. In isogeometric analysis, tensor-product B-splines with similar properties have been developed, see, for example [5]. For tetrahedral elements, an analogue development can be found in [15].

From (3), we can derive the trace of $\omega^h$, for example, on the back boundary of $\Omega\_{\text{ref}}$, $\Gamma\_{\mathsf{b}} = \left\{\xi^1 = -1,\ \xi^2, \xi^3 \in [-1, 1]\right\}$,

$$\text{tr}\_{\mathbf{b}}\,\omega^{h} = \sum\_{j=0}^{I^{2}}\sum\_{k=0}^{I^{3}}w\_{0,j,k}h\_{0}(-1)h\_{j}(\xi^{2})h\_{k}(\xi^{3})\,.$$

Let $\underline{\omega}\_{\mathsf{b}}$ be the vector of expansion coefficients of $\text{tr}\_{\mathsf{b}}\, \omega^h$. Clearly, there exists a linear operator $\mathbb{N}\_{\mathsf{b}}$ such that

$$
\underline{\omega}\_{\mathsf{b}} = \mathbb{N}\_{\mathsf{b}} \; \underline{\omega}.
$$

The same process can be applied to the other boundaries. If we collect the traces of $\omega^h$ on all boundaries and combine their vectors of expansion coefficients and the corresponding linear operators, we eventually obtain

$$
\underline{\omega}\_{\partial} = \mathbb{N}\, \underline{\omega},
$$

where the matrix $\mathbb{N}$, like $\mathbb{E}$, is sparse and depends only on the topology of the mesh and the numbering of the degrees of freedom. Furthermore, it contains only 1 as non-zero entries. An example of $\mathbb{N}$ can be found in [7]. Now, we can conclude that the trace space, $P^{I^1,I^2,I^3} = \text{tr}\, \mathcal{P}^{I^1,I^2,I^3}$, is given as

$$P^{I^1, I^2, I^3} := P^{I^2, I^3}\_{-1} \cup P^{I^2, I^3}\_1 \cup P^{I^1, I^3}\_{-1} \cup P^{I^1, I^3}\_1 \cup P^{I^1, I^2}\_{-1} \cup P^{I^1, I^2}\_1,$$

where $P^{I^2,I^3}\_{-1}$ is the space spanned by $\left\{h\_0(-1)h\_j(\xi^2)h\_k(\xi^3)\right\}$, $P^{I^2,I^3}\_{1}$ is the space spanned by $\left\{h\_{I^1}(1)h\_j(\xi^2)h\_k(\xi^3)\right\}$, and so on. Notice that the polynomial basis functions in $\left\{h\_0(-1)h\_j(\xi^2)h\_k(\xi^3)\right\}$ are exactly the same as those in $\left\{h\_{I^1}(1)h\_j(\xi^2)h\_k(\xi^3)\right\}$ because $h\_0(-1) = h\_{I^1}(1) = 1$. But we still distinguish them because they represent basis functions on different boundaries.

#### *2.2 Algebraic Dual Polynomial Spaces*

We first consider the space $\mathcal{P}^{I^1,I^2,I^3}$. Let $\mathbb{M}\_{\mathcal{P}}$ be its symmetric mass matrix, whose entries are given by

$$\left(\mathbb{M}\_{\mathcal{P}}\right)\_{i+j(I^1+1)+k(I^1+1)(I^2+1),\; l+m(I^1+1)+n(I^1+1)(I^2+1)} := \iiint\_{\Omega\_{\text{ref}}} h\_i(\xi^1)h\_j(\xi^2)h\_k(\xi^3)\, h\_l(\xi^1)h\_m(\xi^2)h\_n(\xi^3) \,\mathrm{d}\xi^1 \,\mathrm{d}\xi^2 \,\mathrm{d}\xi^3.$$

The associated algebraic dual polynomial representations, or simply dual polynomials, are linear combinations of the polynomial basis functions, or simply primal polynomials, defined in the previous section,

$$\begin{aligned} \left[ \widehat{h\_{0,0,0}}(\xi^1, \xi^2, \xi^3), \dots, \widehat{h\_{I^1,I^2,I^3}}(\xi^1, \xi^2, \xi^3) \right] \\ := \left[ h\_0(\xi^1) h\_0(\xi^2) h\_0(\xi^3), \dots, \, \, h\_{I^1}(\xi^1) h\_{I^2}(\xi^2) h\_{I^3}(\xi^3) \right] \mathbb{M}\_{\mathcal{P}}^{-1} . \end{aligned}$$

These dual polynomials are always well defined: since the primal polynomials are linearly independent, the mass matrix $\mathbb{M}\_{\mathcal{P}}$ is symmetric positive definite and therefore invertible. Let the finite dimensional space spanned by $\left\{\widehat{h\_{i,j,k}}(\xi^1, \xi^2, \xi^3)\right\}$ be denoted by $\widetilde{\mathcal{P}}^{I^1,I^2,I^3}$. We say $\widetilde{\mathcal{P}}^{I^1,I^2,I^3}$ is the algebraic dual space of the primal space $\mathcal{P}^{I^1,I^2,I^3}$. Note that $\mathcal{P}^{I^1,I^2,I^3}$ and $\widetilde{\mathcal{P}}^{I^1,I^2,I^3}$ actually represent the same space; the change of basis functions only leads to a different representation. Therefore, we also call the algebraic dual space a dual representation. Let $\widetilde{\mathbb{M}}\_{\mathcal{P}}$ be the mass matrix of $\widetilde{\mathcal{P}}^{I^1,I^2,I^3}$; we can easily see that

$$
\tilde{\mathbb{M}}\_{\mathcal{P}} \mathbb{M}\_{\mathcal{P}} = I,\tag{5}
$$

where $I$ is the identity matrix. Similarly, we can derive the algebraic dual space $\widetilde{\mathcal{L}}^{I^1,I^2,I^3}$ of the primal space $\mathcal{L}^{I^1,I^2,I^3}$. Let $\widetilde{\mathbb{M}}\_{\mathcal{L}}$ and $\mathbb{M}\_{\mathcal{L}}$ be their mass matrices; then

$$
\widetilde{\mathbb{M}}\_{\mathcal{L}} \mathbb{M}\_{\mathcal{L}} = I. \tag{6}
$$
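Identities (5) and (6) are easy to verify numerically. The 1D Python sketch below (our construction; the node set and quadrature are arbitrary choices) builds the primal mass matrix and the dual basis values, and checks both $\widetilde{\mathbb{M}}\,\mathbb{M} = I$ and the biorthogonality $\int \widehat{h}\_i\, h\_j = \delta\_{ij}$:

```python
import numpy as np

nodes = np.array([-1.0, 0.0, 1.0])              # primal Lagrange nodes
xq, wq = np.polynomial.legendre.leggauss(8)     # exact for degree <= 15

def lagrange_vals(x):
    """Values of all Lagrange basis polynomials h_j at the points x."""
    V = np.ones((len(nodes), len(x)))
    for j, xj in enumerate(nodes):
        for m, xm in enumerate(nodes):
            if m != j:
                V[j] *= (x - xm) / (xj - xm)
    return V

H = lagrange_vals(xq)                 # (n_basis, n_quad) primal values
M = (H * wq) @ H.T                    # primal mass matrix

# dual basis: [h~_0 ... h~_N] = [h_0 ... h_N] M^{-1}, so its values are
Ht = np.linalg.solve(M, H)            # M is SPD, hence invertible
Mt = (Ht * wq) @ Ht.T                 # dual mass matrix, equals M^{-1}
```

Since the dual mass matrix is $\mathbb{M}^{-1}\mathbb{M}\mathbb{M}^{-1} = \mathbb{M}^{-1}$, the product `Mt @ M` is the identity, which is exactly (5).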

If $\rho^h \in \mathcal{L}^{I^1,I^2,I^3}$, then $\sigma^h$, whose vector of expansion coefficients $\underline{\sigma}$ satisfies

$$
\underline{\sigma} = \mathbb{M}\_{\mathcal{L}}\, \underline{\rho},
\tag{7}
$$

will be the representation of $\rho^h$ in the algebraic dual space $\widetilde{\mathcal{L}}^{I^1,I^2,I^3}$.

To explain how the algebraic dual space of the trace space $P^{I^1,I^2,I^3}$ is derived, we take $P^{I^2,I^3}\_{-1}$ as an example. We already know that $P^{I^2,I^3}\_{-1}$ is the space spanned by the primal polynomials $\left\{h\_0(-1)h\_j(\xi^2)h\_k(\xi^3)\right\}$. With these primal polynomials, we can compute its mass matrix, denoted by $\mathbb{M}\_{\mathsf{b}}$. The dual polynomials are then computed as

$$\begin{aligned} \left[ \widehat{h\_{0,0,0}}(-1, \xi^2, \xi^3), \dots, \widehat{h\_{0,I^2,I^3}}(-1, \xi^2, \xi^3) \right] \\ = \left[ h\_0(-1)h\_0(\xi^2)h\_0(\xi^3), \dots, \ h\_0(-1)h\_{I^2}(\xi^2)h\_{I^3}(\xi^3) \right] \mathbb{M}\_{\mathsf{b}}^{-1} . \end{aligned}$$

The algebraic dual space $\widetilde{P}^{I^2,I^3}\_{-1}$ is spanned by the dual polynomials $\left\{\widehat{h\_{0,j,k}}(-1, \xi^2, \xi^3)\right\}$. The algebraic dual space of the trace space $P^{I^1,I^2,I^3}$ can eventually be written as

$$
\widetilde{P}^{I^1,I^2,I^3} = \widetilde{P}^{I^2,I^3}\_{-1} \cup \widetilde{P}^{I^2,I^3}\_{1} \cup \widetilde{P}^{I^1,I^3}\_{-1} \cup \widetilde{P}^{I^1,I^3}\_{1} \cup \widetilde{P}^{I^1,I^2}\_{-1} \cup \widetilde{P}^{I^1,I^2}\_{1}.
$$

The divergence of $\sigma^h \in \widetilde{\mathcal{L}}^{I^1,I^2,I^3}$ can be evaluated with the help of the boundary value $\hat{\sigma}^h \in \widetilde{P}^{I^1,I^2,I^3}$. With vector proxies, it can be written as

$$
\underline{\text{div}\, \sigma^h} = \mathbb{N}^\mathsf{T}\, \underline{\hat{\sigma}}^h - \mathbb{E}^\mathsf{T}\, \underline{\sigma}^h.\tag{8}
$$

A detailed introduction of algebraic dual polynomial spaces is given in [9].

#### *2.3 Function Spaces in Curvilinear Domains*

So far, all polynomial spaces are defined only in the reference domain $\Omega\_{\text{ref}}|\_{\xi^1,\xi^2,\xi^3} = [-1, 1]^3$. Consider an arbitrary domain $\Omega$ and a $C^1$ diffeomorphism $\Phi: \Omega\_{\text{ref}}|\_{\xi^1,\xi^2,\xi^3} \to \Omega|\_{x\_1,x\_2,x\_3}$. In $\Omega$, the primal polynomials change, and therefore the mass matrices change as well. But the process of constructing dual polynomials does not change, and, as mentioned before, the metric-independent incidence matrix $\mathbb{E}$ and the matrix $\mathbb{N}$ remain the same. Converting polynomials in the Cartesian domain into those in curvilinear domains follows the general coordinate transformation process; see, for example, [16].

From now on, the notations introduced in this section refer not only to the reference domain $\Omega\_{\text{ref}}$ but also to the physical domain $\Omega$.

#### **3 Weak Formulations**

#### *3.1 Discrete Neumann Problem*

With integration by parts, we can derive the weak formulation of the Neumann problem (1), written as: For given $\hat{\sigma} \in H^{-1/2}(\partial\Omega)$, find $\omega \in H^1(\Omega)$ such that

$$\left(\text{grad}\,\boldsymbol{\omega},\,\,\text{grad}\,\tilde{\boldsymbol{\omega}}\right)\_{L^{2}} + (\boldsymbol{\omega},\,\,\tilde{\boldsymbol{\omega}})\_{L^{2}} = \left<\text{tr}\,\,\tilde{\boldsymbol{\omega}},\,\,\hat{\boldsymbol{\sigma}}\right>,\,\,\,\forall\tilde{\boldsymbol{\omega}}\in H^{1}(\Omega).\tag{9}$$

Note that on the right-hand side, we use $\langle \cdot, \cdot \rangle$ to represent the duality pairing between $\text{tr}\, \tilde{\omega} \in H^{1/2}(\partial\Omega)$ and $\hat{\sigma} \in H^{-1/2}(\partial\Omega)$. We use the finite dimensional space $\mathcal{P}^{I^1,I^2,I^3}$ to approximate the space $H^1(\Omega)$ and the algebraic dual trace space $\widetilde{P}^{I^1,I^2,I^3}$ to approximate the space $H^{-1/2}(\partial\Omega)$. Then we obtain

$$\left(\text{grad}\, \omega^{h},\, \text{grad}\, \tilde{\omega}^{h}\right)\_{L^{2}} = \underline{\tilde{\omega}}^{h,\mathsf{T}}\, \mathbb{E}^{\mathsf{T}}\mathbb{M}\_{\mathcal{L}}\mathbb{E}\, \underline{\omega}^{h},$$

$$\left(\omega^{h},\, \tilde{\omega}^{h}\right)\_{L^{2}} = \underline{\tilde{\omega}}^{h,\mathsf{T}}\, \mathbb{M}\_{\mathcal{P}}\, \underline{\omega}^{h},$$

and

$$\int\_{\partial\Omega} \text{tr}\, \tilde{\omega}^h\, \hat{\sigma}^h \,\mathrm{d}\Gamma = \underline{\tilde{\omega}}^{h,\mathsf{T}}\, \mathbb{N}^{\mathsf{T}}\, \underline{\hat{\sigma}}^h,$$

which eventually leads to the discrete formulation of (9),

$$\mathbb{E}^{\mathsf{T}}\mathbb{M}\_{\mathcal{L}}\mathbb{E}\,\underline{\omega}^{h} + \mathbb{M}\_{\mathcal{P}}\,\underline{\omega}^{h} = \mathbb{N}^{\mathsf{T}}\,\underline{\hat{\sigma}}^{h}.\tag{10}$$

#### *3.2 Discrete Dirichlet Problem*

For the Dirichlet problem (2), the weak formulation is given as: For given $\hat{\sigma} \in H^{-1/2}(\partial\Omega)$, find $\sigma \in H(\text{div}, \Omega)$, $\text{tr}\, \sigma = \hat{\sigma}$, such that

$$(\operatorname{div}\sigma,\,\operatorname{div}\bar{\sigma})\_{L^2} + (\sigma,\,\bar{\sigma})\_{L^2} = 0,\quad \forall \bar{\sigma} \in H\_0(\operatorname{div},\Omega). \tag{11}$$

We use the algebraic dual space $\widetilde{\mathcal{L}}^{I^1,I^2,I^3}$ to approximate $H(\text{div}, \Omega)$. With $\hat{\sigma}^h \in \widetilde{P}^{I^1,I^2,I^3}$ given and (8), we obtain

$$\left(\text{div}\, \sigma^{h},\, \text{div}\, \bar{\sigma}^{h}\right)\_{L^{2}} = -\underline{\bar{\sigma}}^{h,\mathsf{T}}\, \mathbb{E}\widetilde{\mathbb{M}}\_{\mathcal{P}} \left(\mathbb{N}^{\mathsf{T}}\, \underline{\hat{\sigma}}^{h} - \mathbb{E}^{\mathsf{T}}\, \underline{\sigma}^{h}\right),$$

and

$$\left(\sigma^{h},\, \bar{\sigma}^{h}\right)\_{L^{2}} = \underline{\bar{\sigma}}^{h,\mathsf{T}}\, \widetilde{\mathbb{M}}\_{\mathcal{L}}\, \underline{\sigma}^{h}.$$

Therefore, the discrete formulation of (11) is written as

$$\mathbb{E}\widetilde{\mathbb{M}}\_{\mathcal{P}}\mathbb{E}^{\mathsf{T}}\,\underline{\sigma}^{h} + \widetilde{\mathbb{M}}\_{\mathcal{L}}\,\underline{\sigma}^{h} = \mathbb{E}\widetilde{\mathbb{M}}\_{\mathcal{P}}\mathbb{N}^{\mathsf{T}}\,\underline{\hat{\sigma}}^{h}.\tag{12}$$

#### *3.3 Equivalence Between Discrete Formulations*

Now it is time to check whether the equivalence between (1) and (2) holds at the discrete level, i.e., whether $\omega^h$ solves (10) if and only if $\sigma^h = \text{grad}\, \omega^h$ solves (12).

From (4) and (7), we know that $\underline{\sigma}^h$,

$$
\underline{\sigma}^h = \mathbb{M}\_{\mathcal{L}}\, \mathbb{E}\, \underline{\omega}^h,\tag{13}
$$

is the vector representation of $\text{grad}\, \omega^h$ in the dual space. If we insert (13) into (12), we obtain

$$\mathbb{E}\widetilde{\mathbb{M}}\_{\mathcal{P}}\mathbb{E}^{\mathsf{T}}\mathbb{M}\_{\mathcal{L}}\mathbb{E}\,\underline{\omega}^{h} + \widetilde{\mathbb{M}}\_{\mathcal{L}}\mathbb{M}\_{\mathcal{L}}\mathbb{E}\,\underline{\omega}^{h} = \mathbb{E}\widetilde{\mathbb{M}}\_{\mathcal{P}}\mathbb{N}^{\mathsf{T}}\,\underline{\hat{\sigma}}^{h}.\tag{14}$$

From (10), we know that

$$\mathbb{E}^{\mathsf{T}}\mathbb{M}\_{\mathcal{L}}\mathbb{E}\,\underline{\omega}^{h} = -\mathbb{M}\_{\mathcal{P}}\,\underline{\omega}^{h} + \mathbb{N}^{\mathsf{T}}\,\underline{\hat{\sigma}}^{h}.\tag{15}$$

By inserting (15) into (14), we get

$$\mathbb{E}\widetilde{\mathbb{M}}\_{\mathcal{P}}\left(-\mathbb{M}\_{\mathcal{P}}\,\underline{\omega}^{h} + \mathbb{N}^{\mathsf{T}}\,\underline{\hat{\sigma}}^{h}\right) + \widetilde{\mathbb{M}}\_{\mathcal{L}}\mathbb{M}\_{\mathcal{L}}\mathbb{E}\,\underline{\omega}^{h} = \mathbb{E}\widetilde{\mathbb{M}}\_{\mathcal{P}}\mathbb{N}^{\mathsf{T}}\,\underline{\hat{\sigma}}^{h}.\tag{16}$$

From (5) and (6), we know that (16) holds, which proves the equivalence.

If the equivalence holds, the relation $\left\|\omega^h\right\|\_{H^1(\Omega)} = \left\|\sigma^h\right\|\_{H(\text{div},\Omega)}$ should also be satisfied. To prove this, we compute

$$\begin{aligned} \left\| \sigma^h \right\|^2\_{H(\text{div},\Omega)} &\overset{(8)}{=} \underline{\sigma}^{h,\mathsf{T}}\, \widetilde{\mathbb{M}}\_{\mathcal{L}}\, \underline{\sigma}^{h} + \left(\mathbb{N}^{\mathsf{T}} \underline{\hat{\sigma}}^{h} - \mathbb{E}^{\mathsf{T}} \underline{\sigma}^{h}\right)^{\mathsf{T}} \widetilde{\mathbb{M}}\_{\mathcal{P}} \left(\mathbb{N}^{\mathsf{T}} \underline{\hat{\sigma}}^{h} - \mathbb{E}^{\mathsf{T}} \underline{\sigma}^{h}\right) \\ &\overset{(13)}{=} \left(\mathbb{M}\_{\mathcal{L}}\mathbb{E}\, \underline{\omega}^{h}\right)^{\mathsf{T}} \widetilde{\mathbb{M}}\_{\mathcal{L}} \left(\mathbb{M}\_{\mathcal{L}}\mathbb{E}\, \underline{\omega}^{h}\right) + \left(\mathbb{N}^{\mathsf{T}} \underline{\hat{\sigma}}^{h} - \mathbb{E}^{\mathsf{T}} \mathbb{M}\_{\mathcal{L}}\mathbb{E}\, \underline{\omega}^{h}\right)^{\mathsf{T}} \widetilde{\mathbb{M}}\_{\mathcal{P}} \left(\mathbb{N}^{\mathsf{T}} \underline{\hat{\sigma}}^{h} - \mathbb{E}^{\mathsf{T}} \mathbb{M}\_{\mathcal{L}}\mathbb{E}\, \underline{\omega}^{h}\right) \\ &\overset{(10)}{=} \underline{\omega}^{h,\mathsf{T}}\, \mathbb{E}^{\mathsf{T}}\mathbb{M}\_{\mathcal{L}}\mathbb{E}\, \underline{\omega}^{h} + \underline{\omega}^{h,\mathsf{T}}\, \mathbb{M}\_{\mathcal{P}}\widetilde{\mathbb{M}}\_{\mathcal{P}}\mathbb{M}\_{\mathcal{P}}\, \underline{\omega}^{h} = \left\| \omega^h \right\|^2\_{H^1(\Omega)}, \end{aligned}$$

where we repeatedly use (5) and (6) and the fact that the mass matrices are symmetric.
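The chain (13)–(16) uses only the identities (5) and (6), so it can be checked with arbitrary invertible stand-ins for the mass matrices. The Python sketch below (random matrices of our choosing, not the actual spectral element operators) solves the discrete Neumann problem (10), forms the dual-space coefficients of the gradient via (13), and confirms that they satisfy the discrete Dirichlet problem (12):

```python
import numpy as np

rng = np.random.default_rng(0)
n_p, n_l, n_b = 6, 9, 4                 # nodal, edge and trace dof counts

def rand_spd(n):
    """Random symmetric positive definite stand-in for a mass matrix."""
    A = rng.standard_normal((n, n))
    return A @ A.T + n * np.eye(n)

Mp, Ml = rand_spd(n_p), rand_spd(n_l)
E = rng.integers(-1, 2, (n_l, n_p)).astype(float)   # incidence-like matrix
N = rng.integers(0, 2, (n_b, n_p)).astype(float)    # trace-like matrix
sig_hat = rng.standard_normal(n_b)

# discrete Neumann problem (10): E^T M_L E w + M_P w = N^T sig_hat
w = np.linalg.solve(E.T @ Ml @ E + Mp, N.T @ sig_hat)

# dual representation of grad omega^h, Eq. (13)
sig = Ml @ E @ w

# residual of the discrete Dirichlet problem (12), with the dual mass
# matrices realised as inverses, i.e. identities (5) and (6)
Mtp, Mtl = np.linalg.inv(Mp), np.linalg.inv(Ml)
res = E @ Mtp @ E.T @ sig + Mtl @ sig - E @ Mtp @ N.T @ sig_hat
```

The residual vanishes to machine precision for any invertible matrices, mirroring the fact that the proof above is purely algebraic.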

#### **4 Numerical Test**

Consider the mapping that maps the Cartesian reference domain $\Omega\_{\text{ref}}|\_{\xi^1,\xi^2,\xi^3} := [-1, 1]^3$ onto the physical domain $\Omega|\_{x\_1,x\_2,x\_3} = [0, 1]^3$ by

$$x^{i} = \frac{1}{2} + \frac{1}{2} \left( \xi^{i} + c \prod\_{j} \sin(\pi \xi^{j}) \right), \ i = 1, 2, 3.$$

When the deformation coefficient $c = 0$, the domain is Cartesian. Otherwise, the domain is curvilinear, meaning that a curvilinear coordinate system parametrizes $\Omega$. Examples of such curvilinear domains in $\mathbb{R}^2$ are shown in Fig. 1.
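The mapping is straightforward to evaluate; a small Python sketch follows (the function name is ours). Because $\sin(\pi \xi^j)$ vanishes whenever any $\xi^j = \pm 1$, the deformation term is zero on the boundary of the cube, so the faces are mapped affinely for every $c$.

```python
import numpy as np

def mapping(xi, c):
    """Map a point xi in the reference cube [-1,1]^3 to [0,1]^3:
    x^i = 1/2 + (xi^i + c * prod_j sin(pi * xi^j)) / 2."""
    xi = np.asarray(xi, dtype=float)
    bump = c * np.prod(np.sin(np.pi * xi))   # vanishes on the boundary
    return 0.5 + 0.5 * (xi + bump)

x = mapping([0.5, -0.25, 0.0], c=0.0)   # c = 0: purely affine map
```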

A manufactured solution of the Neumann problem, (1), is

$$\omega\_{\text{exact}} = e^{x^1} + e^{x^2} + e^{x^3}.$$

Clearly, $\sigma\_{\text{exact}} = \text{grad}\, \omega\_{\text{exact}} = \left(e^{x^1}, e^{x^2}, e^{x^3}\right)^{\mathsf{T}}$ solves the Dirichlet problem (2).

In domains with different deformation coefficients $c$, with the boundary condition $\hat{\sigma} = \text{tr}\, \sigma\_{\text{exact}}$ imposed, we solve the discrete formulations (10) and (12) using Gauss–Lobatto–Legendre (GLL) polynomial spaces of degree $I^1 = I^2 = I^3 = N$.

The $L^2$-error of $\sigma^h - \text{grad}\, \omega^h$ is shown in Fig. 2 (left), where we can see that the relation $\sigma^h = \text{grad}\, \omega^h$ is preserved up to machine precision. As the polynomial degree grows, the error increases slowly because round-off error accumulates while the number of degrees of freedom grows significantly.

In Table 1, the $H^1$-norm of $\omega^h$ and the $H(\text{div})$-norm of $\sigma^h$ are presented. It is shown that the relation $\left\|\omega^h\right\|\_{H^1(\Omega)} = \left\|\sigma^h\right\|\_{H(\text{div},\Omega)}$ holds for all polynomial degrees, irrespective of whether we use the Cartesian domain, $c = 0$, or

**Fig. 1** Curvilinear domains for $c = 0.15$ (Left) and $c = 0.3$ (Right) in $\mathbb{R}^2$. The gray lines illustrate the coordinate lines

**Fig. 2** The $L^2$-error of $\sigma^h - \text{grad}\, \omega^h$ (Left) and the *p*-convergence of the $H^1$-error of $\omega^h$ (Right) for $N = 2, 4, \dots, 20$ and $c = 0, 0.15, 0.3$

**Table 1** The $H^1$-norm of $\omega^h$ and the $H(\text{div})$-norm of $\sigma^h$ for polynomial degree $N = 2, 4, \dots, 20$ and deformation coefficient $c = 0, 0.15, 0.3$


curvilinear domains, $c = 0.15, 0.3$. It is also seen that the results always converge to the analytical value $\left\|\omega\_{\text{exact}}\right\|\_{H^1} = \left\|\sigma\_{\text{exact}}\right\|\_{H(\text{div})} = 6.0730653668$. The *p*-convergence of the $H^1$-error of $\omega^h$, and therefore also of the $H(\text{div})$-error of $\sigma^h$, is shown in Fig. 2 (right), which demonstrates the exponential convergence of the method.

#### **5 Conclusions**

By constructing and using primal polynomial spaces and their algebraic dual representations, both in the domain and on the boundary, we successfully preserve the equivalence of the div-grad Neumann problem and the grad-div Dirichlet problem at the discrete level in three-dimensional curvilinear domains. This suggests the further application of these spaces in structure-preserving and hybrid methods.

#### **References**



## **A Conservative Hybrid Method for Darcy Flow**

**Varun Jain, Joël Fisser, Artur Palha, and Marc Gerritsma**

### **1 Introduction**

Hybrid formulations [1, 3, 10] are classical domain decomposition methods which reduce the problem of solving one global system to many small local systems. The local systems can then be efficiently solved independently of each other in parallel.

In this work we present a hybrid mimetic spectral element formulation to solve Darcy flow. We follow [8], which renders the constraints on the divergence of the mass flux, the pressure gradient, and the inter-element continuity metric-free. The resulting system is extremely sparse and shows a reduced growth in condition number compared to a non-hybrid system.

This document is structured as follows: In Sect. 2 we define the weak formulation for Darcy flow. The basis functions are introduced in Sect. 3. The evaluation of weighted inner products and duality pairings is discussed in Sect. 4. In Sect. 5 we discuss the formulation of the discrete algebraic system. In Sect. 6 we present results for a test case taken from [7].

V. Jain (✉) · J. Fisser · A. Palha · M. Gerritsma

Faculty of Aerospace Engineering, TU Delft, Delft, The Netherlands e-mail: V.Jain@tudelft.nl; A.PalhaDaSilvaClerigo@tudelft.nl; M.I.Gerritsma@tudelft.nl

S. J. Sherwin et al. (eds.), *Spectral and High Order Methods for Partial Differential Equations ICOSAHOM 2018*, Lecture Notes in Computational Science and Engineering 134, https://doi.org/10.1007/978-3-030-39647-3\_16

#### **2 Darcy Flow Formulation**

For $\Omega \subset \mathbb{R}^d$, where $d$ is the dimension of the domain, the governing equations for Darcy flow are given by

$$\begin{cases} \mathfrak{u} + \mathbb{A}\, \nabla p = 0 \\\\ \nabla \cdot \mathfrak{u} = f \end{cases} \ \text{in } \Omega \quad \text{and} \quad \begin{cases} p = \hat{p} & \text{on } \Gamma\_D \\\\ \mathfrak{u} \cdot \mathfrak{n} = \hat{\mathfrak{u}}\_n & \text{on } \Gamma\_N \end{cases}, \qquad \partial \Omega = \Gamma\_D \cup \Gamma\_N,$$

where $\mathfrak{u}$ is the velocity, $p$ the pressure, $f$ the prescribed right-hand side, $\mathbb{A}$ a $d \times d$ symmetric positive definite matrix, and $\hat{p}$ and $\hat{\mathfrak{u}}\_n$ the prescribed pressure and flux boundary conditions, respectively.

#### *2.1 Notations*

For $f, g \in L^2(\Omega)$, $\left(f, g\right)\_{\Omega}$ denotes the usual $L^2$-inner product.

For vector-valued functions in *L*<sup>2</sup> we define the weighted inner product by,

$$(\mathfrak{u}, \mathfrak{v})\_{\mathbb{A}^{-1}, \Omega} = \int\_{\Omega} \left( \mathfrak{u}, \mathbb{A}^{-1} \mathfrak{v} \right) d\Omega \,, \tag{1}$$

where *(*· *,*·*)* denotes the pointwise inner product.

The duality pairing, denoted by $\langle \cdot, \cdot \rangle$, is the outcome of a linear functional on $L^2(\Omega)$ acting on elements of $L^2(\Omega)$.

Let $\Omega\_K$ be a disjoint partitioning of $\Omega$ with a total of $K$ elements, and let $K\_i$ be any element in $\Omega\_K$, $K\_i \in \Omega\_K$. We define the following broken Sobolev spaces [2], $H\left(\text{div}; \Omega\_K\right) = \bigoplus\_i H\left(\text{div}; K\_i\right)$ and $H^{1/2}\left(\partial \Omega\_K\right) = \bigoplus\_i H^{1/2}\left(\partial K\_i\right)$.

#### *2.2 Weak Formulation*

The Lagrange functional for Darcy flow is defined as,

$$\begin{split} \mathcal{L}\left(\mathfrak{u}, p, \lambda; f\right) &= \frac{1}{2} \int\_{\Omega\_{K}} \mathfrak{u}^{\mathsf{T}} \mathbb{A}^{-1} \mathfrak{u} \,\mathrm{d}\Omega\_{K} - \int\_{\Omega\_{K}} p \left(\nabla \cdot \mathfrak{u} - f\right) \mathrm{d}\Omega\_{K} \\ &\quad + \int\_{\partial\Omega\_{K} \backslash \Gamma\_{D}} \lambda \left(\mathfrak{u} \cdot \mathfrak{n}\right) \mathrm{d}\Gamma + \int\_{\Gamma\_{D}} \hat{p} \left(\mathfrak{u} \cdot \mathfrak{n}\right) \mathrm{d}\Gamma - \int\_{\Gamma\_{N}} \lambda\, \hat{\mathfrak{u}}\_{\mathfrak{n}} \,\mathrm{d}\Gamma. \end{split}$$

The variational problem is then given by: For given $f \in L^2(\Omega\_K)$, $\hat{p} \in H^{1/2}(\Gamma\_D)$ and $\hat{\mathfrak{u}}\_n \in H^{-1/2}(\Gamma\_N)$, find $\mathfrak{u} \in H(\text{div}; \Omega\_K)$, $p \in L^2(\Omega\_K)$, $\lambda \in H^{\frac{1}{2}}(\partial \Omega\_K)$, such that

$$\begin{cases}
(\boldsymbol{v}, \boldsymbol{u})\_{\mathbb{A}^{-1}, \Omega\_K} - \left\langle \nabla \cdot \boldsymbol{v}, p \right\rangle\_{\Omega\_K} + \left\langle \boldsymbol{v} \cdot \boldsymbol{n}, \lambda \right\rangle\_{\partial \Omega\_K \backslash \Gamma\_D} = - \left\langle \boldsymbol{v} \cdot \boldsymbol{n}, \hat{p} \right\rangle\_{\Gamma\_D} & \forall \boldsymbol{v} \in H(\operatorname{div}; \Omega\_K) \\
- \left\langle q, \nabla \cdot \boldsymbol{u} \right\rangle\_{\Omega\_K} = - \left\langle q, f \right\rangle\_{\Omega\_K} & \forall q \in L^2(\Omega\_K) \\
\left\langle \mu, \boldsymbol{u} \cdot \boldsymbol{n} \right\rangle\_{\partial \Omega\_K \backslash \Gamma\_D} = \left\langle \mu, \hat{u}\_n \right\rangle\_{\Gamma\_N} & \forall \mu \in H^{\frac{1}{2}}(\partial \Omega\_K)
\end{cases} \tag{2}$$

#### **3 Basis Functions**

#### *3.1 Primal and Dual Nodal Degrees of Freedom*

Let *ξj* , *j* = 0*,* 1*,...,N*, be the *N* + 1 Gauss–Lobatto–Legendre (GLL) points in *I* = [−1*,* 1]. The Lagrange polynomials *hi(ξ )* of degree *N* through the points *ξj* are given by,

$$h\_i\left(\xi\right) = \frac{\left(\xi^2 - 1\right)L'\_N\left(\xi\right)}{N\left(N + 1\right)L\_N\left(\xi\_i\right)\left(\xi - \xi\_i\right)},$$

form the 1D primal nodal polynomials which satisfy, *hi(ξj )* = *δij* .

Let *a<sup>h</sup>* and *b<sup>h</sup>* be two polynomials expanded in terms of *hi(ξ )*. The *L*<sup>2</sup>-inner product is then given by,

$$\left(a^{h}, b^{h}\right)\_{I} = \mathbf{a}^{T} \mathbb{M}^{(0)} \mathbf{b} \,, \quad \text{where} \quad \mathbb{M}^{(0)}\_{i,j} = \int\_{-1}^{1} h\_{i}(\xi)\, h\_{j}(\xi) \,\mathrm{d}\xi \,,$$

and **a** = [a<sup>0</sup> a<sup>1</sup> *...* a*<sup>N</sup>* ]*<sup>T</sup>* and **b** = [b<sup>0</sup> b<sup>1</sup> *...* b*<sup>N</sup>* ]*<sup>T</sup>* are the nodal degrees of freedom. We define the algebraic *dual* degrees of freedom, $\widetilde{\mathbf{a}}$, such that the duality pairing is simply the vector dot product between primal and dual degrees of freedom,

$$\left\langle a^h, b^h \right\rangle\_I = \widetilde{\mathbf{a}}^T \mathbf{b} := \mathbf{a}^T \mathbb{M}^{(0)} \mathbf{b} \quad \Rightarrow \quad \widetilde{\mathbf{a}} = \mathbb{M}^{(0)} \mathbf{a} \,.$$

Thus, the dual degrees of freedom are linear functionals of primal degrees of freedom.
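As an illustration (our own sketch, not code from the paper), the GLL points, the nodal Lagrange basis, the mass matrix M<sup>(0)</sup> and the dual degrees of freedom can be constructed with numpy as follows:

```python
import numpy as np
from numpy.polynomial import Polynomial, legendre

N = 4
# GLL points: the endpoints +/-1 together with the roots of L_N'
LN = legendre.Legendre.basis(N)
xi = np.concatenate(([-1.0], np.sort(LN.deriv().roots()), [1.0]))

def h(i):
    """Lagrange polynomial of degree N with h_i(xi_j) = delta_ij."""
    p = Polynomial.fromroots(np.delete(xi, i))
    return p / p(xi[i])

# nodal mass matrix M^(0), with the polynomial products integrated exactly
M0 = np.zeros((N + 1, N + 1))
for i in range(N + 1):
    for j in range(N + 1):
        q = (h(i) * h(j)).integ()
        M0[i, j] = q(1.0) - q(-1.0)

a = np.random.rand(N + 1)   # primal nodal degrees of freedom
a_dual = M0 @ a             # algebraic dual degrees of freedom
```

With this construction, the duality pairing ⟨a<sup>h</sup>, b<sup>h</sup>⟩ reduces to the plain dot product `a_dual @ b`, which is the point of the algebraic dual basis.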

#### *3.2 Primal and Dual Edge Degrees of Freedom*

The edge polynomials, for the *N* edges *(ξj*−<sup>1</sup>*, ξj )* between the *N* + 1 GLL points, of polynomial degree *N* − 1, are defined as [4],

$$e\_j(\xi) = -\sum\_{k=0}^{j-1} \frac{\mathrm{d}h\_k}{\mathrm{d}\xi}(\xi) \,, \quad \text{such that} \quad \int\_{\xi\_{j-1}}^{\xi\_j} e\_i(\xi) \,\mathrm{d}\xi = \delta\_{ij} \,.$$

Let *p<sup>h</sup>* and *q<sup>h</sup>* be two polynomials expanded in edge basis functions. The inner product in *L*<sup>2</sup> space is given by,

$$\left(p^h, q^h\right)\_I = \mathbf{p}^T \mathbb{M}^{(1)} \mathbf{q} \,, \quad \text{where} \quad \mathbb{M}^{(1)}\_{i,j} = \int\_{-1}^1 e\_i(\xi)\, e\_j(\xi) \,\mathrm{d}\xi \,,$$

and **p** = [p<sup>1</sup> p<sup>2</sup> *...* p*<sup>N</sup>* ]*<sup>T</sup>* and **q** = [q<sup>1</sup> q<sup>2</sup> *...* q*<sup>N</sup>* ]*<sup>T</sup>* are the edge degrees of freedom. As before, we define the *dual* degrees of freedom such that,

$$\left\langle p^h, q^h \right\rangle\_I = \widetilde{\mathbf{p}}^T \mathbf{q} := \mathbf{p}^T \mathbb{M}^{(1)} \mathbf{q} \quad \Rightarrow \quad \widetilde{\mathbf{p}} = \mathbb{M}^{(1)} \mathbf{p} \,.$$

A similar construction can be used for dual degrees of freedom in higher dimensions. For construction of the dual degrees of freedom in 2D see [8] and for 3D see [9].
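The unit-integral property of the edge basis is easy to check numerically. Below is a small numpy sketch (our own illustration, not code from the paper), reusing the GLL construction of Sect. 3.1:

```python
import numpy as np
from numpy.polynomial import Polynomial, legendre

N = 4
LN = legendre.Legendre.basis(N)
xi = np.concatenate(([-1.0], np.sort(LN.deriv().roots()), [1.0]))  # GLL points

def h(i):
    """Lagrange polynomial through the GLL points."""
    p = Polynomial.fromroots(np.delete(xi, i))
    return p / p(xi[i])

def e(j):
    """Edge polynomial e_j = -sum_{k<j} h_k' (degree N-1), j = 1..N."""
    s = Polynomial([0.0])
    for k in range(j):
        s = s - h(k).deriv()
    return s

# integrals of e_j over the edges [xi_{i-1}, xi_i]; should be the identity
Q = np.zeros((N, N))
for i in range(1, N + 1):
    for j in range(1, N + 1):
        q = e(j).integ()
        Q[i - 1, j - 1] = q(xi[i]) - q(xi[i - 1])
```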

#### *3.3 Differentiation of Nodal Polynomial Representation*

Let *a<sup>h</sup>(ξ )* be expanded in Lagrange polynomials; then

$$\frac{d}{d\xi}a^h\left(\xi\right) = \frac{d}{d\xi}\sum\_{l=0}^{N} \mathbf{a}\_l h\_l\left(\xi\right) = \sum\_{l=1}^{N} \left(\mathbf{a}\_l - \mathbf{a}\_{l-1}\right) e\_l\left(\xi\right) \,. \tag{3}$$

Therefore, taking the derivative of a polynomial involves two steps: first, take the difference of the degrees of freedom; second, change the basis from nodal to edge [4].
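Identity (3) can be verified numerically; the sketch below (illustrative only, self-contained) compares both sides at a few sample points:

```python
import numpy as np
from numpy.polynomial import Polynomial, legendre

N = 4
LN = legendre.Legendre.basis(N)
xi = np.concatenate(([-1.0], np.sort(LN.deriv().roots()), [1.0]))  # GLL points

def h(i):
    p = Polynomial.fromroots(np.delete(xi, i))
    return p / p(xi[i])

def e(j):
    s = Polynomial([0.0])
    for k in range(j):
        s = s - h(k).deriv()
    return s

a = np.random.rand(N + 1)   # nodal degrees of freedom
da = a[1:] - a[:-1]         # differences a_l - a_{l-1}, l = 1..N

x = np.linspace(-1.0, 1.0, 11)
lhs = sum(a[l] * h(l).deriv()(x) for l in range(N + 1))    # d/dxi of nodal expansion
rhs = sum(da[l - 1] * e(l)(x) for l in range(1, N + 1))    # edge expansion as in (3)
```

The difference step `a[1:] - a[:-1]` is metric-free; all metric information sits in the basis functions.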

#### **4 Discrete Inner Product and Duality Pairing**

For 2D domains, the higher-dimensional primal bases are constructed using tensor products of the 1D bases.

For the weak formulation (2) we expand the velocity *u<sup>h</sup>* in primal edge basis as,

$$\boldsymbol{u}^{h}\left(\xi,\eta\right) = \sum\_{i=0}^{N} \sum\_{j=1}^{N} u\_{x\,i,j}\, h\_{i}\left(\xi\right) e\_{j}\left(\eta\right) \hat{\boldsymbol{x}} + \sum\_{i=1}^{N} \sum\_{j=0}^{N} u\_{y\,i,j}\, e\_{i}\left(\xi\right) h\_{j}\left(\eta\right) \hat{\boldsymbol{y}} \,, \tag{4}$$

where u*x i,j* denotes the flux, ∫ *u* · *n*, over the vertical edges and u*y i,j* the flux over the horizontal edges, see Fig. 1.

#### *4.1 Weighted Inner Product*

Using (1) and the expansions in (4), the weighted inner product is evaluated as,

$$\left(\boldsymbol{v}^{h},\boldsymbol{u}^{h}\right)\_{\mathbb{A}^{-1},\Omega\_{K}} = \sum\_{K\_{i}} \mathbf{v}\_{K\_{i}}^{T} \, \mathbb{M}\_{\mathbb{A}^{-1},K\_{i}}^{(1)} \, \mathbf{u}\_{K\_{i}} \,,$$

**Fig. 1** Discretized domain for *K* = 3 × 3, *N* = 3. The blue dots represent the pressure boundary condition *p*ˆ, and the blue edges represent the velocity boundary condition *u*ˆ *<sup>n</sup>*

where **u***Ki* are the degrees of freedom in element *Ki*, and

$$\mathbb{M}^{(1)}\_{\mathbb{A}^{-1}, K\_i} = \int\_{K\_i} \begin{pmatrix} h\_i(\xi)\, e\_j(\eta) \\ e\_i(\xi)\, h\_j(\eta) \end{pmatrix}^T \mathbb{A}^{-1}\left(\xi, \eta\right) \begin{pmatrix} h\_i(\xi)\, e\_j(\eta) \\ e\_i(\xi)\, h\_j(\eta) \end{pmatrix} \mathrm{d}K\_i \,.$$

For mapping of elements please refer to [6].

#### *4.2 Divergence of Velocity*

The divergence of velocity, ∇ · *u<sup>h</sup>*, is evaluated using (3), now in 2D,

$$\begin{split} \nabla \cdot \boldsymbol{u}^{h} &= \frac{\partial}{\partial x} \sum\_{i=0}^{N} \sum\_{j=1}^{N} u\_{x\,i,j}\, h\_{i}(\xi) e\_{j}(\eta) + \frac{\partial}{\partial y} \sum\_{i=1}^{N} \sum\_{j=0}^{N} u\_{y\,i,j}\, e\_{i}(\xi) h\_{j}(\eta) \\ &= \sum\_{i,j=1}^{N} \left( u\_{x\,i,j} - u\_{x\,i-1,j} + u\_{y\,i,j} - u\_{y\,i,j-1} \right) e\_{i}(\xi)\, e\_{j}(\eta) \,. \end{split} \tag{5}$$

For pressure we use dual degrees of freedom. Therefore, the weak constraint on the divergence of velocity is a duality pairing, evaluated as,

$$\left\langle q^h, \nabla \cdot \boldsymbol{u}^h \right\rangle\_{\Omega\_K} = \sum\_{K\_i} \widetilde{\mathbf{q}}\_{K\_i}^T \, \mathbb{E}^{2,1} \, \mathbf{u}\_{K\_i} \,,$$

where E<sup>2,1</sup> represents the discrete divergence operator. It is an incidence matrix that is metric-free and topological, and it is the same for every element in Ω*K*. For an extensive discussion of incidence matrices see, for instance, [6]. For an element of degree *N* = 3,

with the 1D incidence matrix

$$\mathbb{E}^{1,0} = \begin{bmatrix} -1 & 1 & 0 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 0 & -1 & 1 \end{bmatrix},$$

equation (5) gives, for a lexicographic ordering of the degrees of freedom,

$$\mathbb{E}^{2,1} = \begin{bmatrix} \mathbb{E}^{1,0} \otimes \mathbb{I}\_3 & \mathbb{I}\_3 \otimes \mathbb{E}^{1,0} \end{bmatrix},$$

a 9 × 24 matrix in which every row contains two entries +1 and two entries −1, one pair for the *x*-fluxes and one pair for the *y*-fluxes of the corresponding cell.
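For general *N*, the discrete divergence can be assembled from Kronecker products of the 1D incidence matrix. The sketch below is our own illustration (it assumes a lexicographic ordering of the degrees of freedom, which may differ from the ordering used in the paper's implementation):

```python
import numpy as np

N = 3
# 1D incidence matrix E^{1,0}: (E10 a)_l = a_l - a_{l-1}, size N x (N+1)
E10 = np.eye(N + 1)[1:] - np.eye(N + 1)[:-1]

# discrete divergence E^{2,1}: x-differences on the x-fluxes,
# y-differences on the y-fluxes, cf. eq. (5)
E21 = np.hstack([np.kron(E10, np.eye(N)), np.kron(np.eye(N), E10)])
```

Because the entries are only 0 and ±1, applying `E21` involves no metric information; all geometry enters through the mass matrices.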

#### *4.3 Connectivity Matrix*

The connectivity matrix ensures continuity of the velocity flux across the elements. *λ* is the interface variable defined between the elements, shown as red dots in Fig. 1. It acts as a Lagrange multiplier that imposes the continuity constraint given by,

$$\left\langle \mu^{h}, \boldsymbol{u}^{h} \cdot \boldsymbol{n} \right\rangle\_{\partial \Omega\_{K} \backslash \Gamma\_{D}} = \sum\_{K\_i} \widetilde{\boldsymbol{\mu}}\_{K\_{i}}^{T}\, \mathbb{N}\, \mathbf{u}\_{K\_{i}} = \widetilde{\boldsymbol{\mu}}^{T}\, \mathbb{E}\_{\mathbb{N}}\, \mathbf{u} \,,$$

where N is the discrete trace operator. It is a sparse matrix whose entries are 1, −1 and 0 only. For the construction of N please refer to [5]. E<sub>N</sub> is the assembly of N over all elements. For *K* = 2 × 2, *N* = 2, E<sub>N</sub> is shown in (6). The matrix size of E<sub>N</sub> is 8 × 64, but it has only 16 non-zero entries. It is an extremely sparse matrix that is metric-free, and the location of its ±1 entries depends only on the connection between the elements.

For *K* = 2 × 2, *N* = 2, E<sub>N</sub> is the 8 × 64 matrix in which each row corresponds to one of the 8 interface edges and contains a single +1 in the column of the flux degree of freedom of that edge in one neighbouring element and a single −1 in the column of the corresponding degree of freedom in the other neighbouring element; all remaining entries are zero. (6)

#### **5 Discrete Formulation**

Using the weighted inner product and duality pairings discussed in Sect. 4, we can write the discrete form of the weak formulation (2) as,

$$
\begin{bmatrix}
\mathbb{B} & \mathbb{E}\_{\mathbb{N}}^{T} \\
\mathbb{E}\_{\mathbb{N}} & 0
\end{bmatrix}
\begin{bmatrix}
\mathbf{X} \\
\boldsymbol{\lambda}
\end{bmatrix} =
\begin{bmatrix}
\mathbf{F} \\
\mathbf{0}
\end{bmatrix},
\tag{7}
$$

where B is an invertible block-diagonal matrix given by,

$$
\mathbb{B} = \begin{bmatrix}
\mathbb{M}\_{\mathbb{A}^{-1}, K\_1}^{(1)} & \left(\mathbb{E}^{2,1}\right)^{T} & & & \\
\mathbb{E}^{2,1} & 0 & & & \\
& & \ddots & & \\
& & & \mathbb{M}\_{\mathbb{A}^{-1}, K\_K}^{(1)} & \left(\mathbb{E}^{2,1}\right)^{T} \\
& & & \mathbb{E}^{2,1} & 0
\end{bmatrix}, \tag{8}
$$

E<sub>N</sub> is as given in (6), $\mathbf{X} = \bigcup\_i \begin{bmatrix} \mathbf{u} \\ \mathbf{p} \end{bmatrix}\_{K\_i}$, and $\mathbf{F} = \bigcup\_i \begin{bmatrix} \hat{\mathbf{p}} \\ \mathbf{f} \end{bmatrix}\_{K\_i}$, where **f** are the expansion coefficients of $f^h(x, y) = \sum\_{i,j}^{N} \mathsf{f}\_{ij}\, e\_i(x)\, e\_j(y)$.

In (8), the mass matrix M<sup>(1)</sup> <sub>A−1,Ki</sub> is the only dense matrix and also the only matrix that changes with the local element *Ki*. E<sub>N</sub> is a sparse incidence matrix for the global system, and E<sup>2,1</sup> is a sparse incidence matrix for the local systems that is the same for every element.

Using the Schur complement method, the global system (7) can be reduced to a system for *λ* [1],

$$\boldsymbol{\lambda} = \left(\mathbb{E}\_{\mathbb{N}}\, \mathbb{B}^{-1}\, \mathbb{E}\_{\mathbb{N}}^{T}\right)^{-1} \left(\mathbb{E}\_{\mathbb{N}}\, \mathbb{B}^{-1}\, \mathbf{F}\right) . \tag{9}$$

To evaluate *λ* in (9) we need B<sup>−1</sup>, which can be computed efficiently by inverting each block of B separately; this step is trivially parallel. Once *λ* is determined, the solution in each element *Ki* can be evaluated independently of the others.
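The structure of (7)–(9) can be illustrated on a toy system. In the sketch below (our own illustration, not the authors' code), random symmetric positive-definite blocks stand in for the element blocks of B, and a random dense matrix stands in for the sparse trace operator E<sub>N</sub>:

```python
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(0)
K, n_loc, m = 4, 5, 3   # elements, unknowns per element, interface unknowns

# toy invertible (SPD) blocks standing in for the element blocks of B
blocks = []
for _ in range(K):
    G = rng.standard_normal((n_loc, n_loc))
    blocks.append(G @ G.T + np.eye(n_loc))
B = block_diag(*blocks)
EN = rng.standard_normal((m, K * n_loc))   # stand-in for the trace operator
F = rng.standard_normal(K * n_loc)

# B^{-1} applied block by block -- embarrassingly parallel over elements
Binv = block_diag(*[np.linalg.inv(blk) for blk in blocks])

# Schur-complement system (9) for the interface unknowns lambda
lam = np.linalg.solve(EN @ Binv @ EN.T, EN @ Binv @ F)

# local recovery, element by element
X = Binv @ (F - EN.T @ lam)
```

The Schur-complement system has only `m` unknowns, while the full saddle-point system has `K * n_loc + m`; this is the size reduction quantified in Tables 1 and 2.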

The system (9) solves for the interface degrees of freedom between the elements and is always smaller than the full global system. For a comparison of the size of the *λ* system with the full system, see Table 1 (for 2D) and Table 2 (for 3D). On the left of Tables 1 and 2 we see that, for constant *K*, as the order of the polynomial basis increases, the size of the *λ* system grows more slowly than the size of the full system. Thus, hybrid formulations are beneficial for high-order methods, where the local degrees of freedom of an element far outnumber the interface degrees of freedom.


**Table 1** For 2D

Left: Number of total unknowns as a function of *N*, for *K* = 3 × 3. Right: Number of total unknowns as a function of the number of elements *K*, for *N* = 3



**Table 2** For 3D

Left: Number of total unknowns as a function of *N*, for *K* = 3 × 3 × 3. Right: Number of total unknowns as a function of the number of elements *K*, for *N* = 3

On the right of Tables 1 and 2 we see that, for constant *N*, the *λ* system is smaller than the full system, although the ratio of the sizes of the *λ* and full systems does not change significantly.

#### **6 Results**

In this section we present the results for a test problem from [7], obtained by solving system (7). The domain of the test problem is Ω = [0*,* 1]<sup>2</sup>. The right-hand-side term is defined as,

$$f\_{ex} = \nabla \cdot \left(-\mathbb{A} \nabla p\_{ex}\right) \,, \quad \text{where}$$

$$\mathbb{A} = \frac{1}{x^2 + y^2 + a} \begin{pmatrix} 10^{-3} x^2 + y^2 + a & \left(10^{-3} - 1\right) xy \\ \left(10^{-3} - 1\right) xy & x^2 + 10^{-3} y^2 + a \end{pmatrix}; \quad a = 0.1 \,, \qquad p\_{ex} = \sin\left(2\pi x\right) \sin\left(2\pi y\right) \,,$$

and Dirichlet boundary conditions are imposed along the entire boundary, Γ*D* = *∂*Ω and Γ*N* = ∅. We solve this problem on an orthogonal and a curved mesh, see Fig. 2.
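A manufactured right-hand side of this kind is conveniently generated symbolically. The sketch below is our own illustration of the test-case definitions above, using sympy:

```python
import sympy as sp

x, y = sp.symbols('x y')
a = sp.Rational(1, 10)        # a = 0.1
eps = sp.Rational(1, 1000)    # the 10^{-3} anisotropy factor

# anisotropic permeability tensor A of the test problem
A = sp.Matrix([[eps * x**2 + y**2 + a, (eps - 1) * x * y],
               [(eps - 1) * x * y,     x**2 + eps * y**2 + a]]) / (x**2 + y**2 + a)

# exact pressure
p = sp.sin(2 * sp.pi * x) * sp.sin(2 * sp.pi * y)

# manufactured right-hand side f_ex = div(-A grad p_ex)
grad_p = sp.Matrix([sp.diff(p, x), sp.diff(p, y)])
u = -A * grad_p
f = sp.simplify(sp.diff(u[0], x) + sp.diff(u[1], y))
```

The resulting expression for `f` can then be sampled (e.g. via `sp.lambdify`) to build the discrete right-hand-side coefficients.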

The same problem was earlier addressed in [6], but for a method with continuous elements and *primal* basis functions only. For the configuration *K* = 3 × 3, *N* = 6, we compare the sparsity structure of the two approaches in Fig. 3. On the left we see


**Fig. 2** Mesh configuration: *K* = 3 × 3, *N* = 6. Left: orthogonal. Right: curved

**Fig. 3** Sparsity plots *K* = 3×3, *N* = 6. Left: hybrid elements method. Right: continuous element method

the hybrid formulation, and on the right the continuous elements formulation [6]. The number of non-zero entries in the hybrid formulation, 66*,*384, is almost half that of the continuous element formulation, 117*,*504. Here, the sparsity is due to the use of algebraic dual degrees of freedom and not to the hybridization of the scheme.

In Fig. 4, on the left we compare the growth in condition number of the *λ* system (9) with that of the full continuous element system, for *N* = 7 on the curved mesh, with increasing number of elements *K*. We observe similar growth rates for the hybrid and continuous formulations; however, the condition number for the continuous elements formulation is almost O(10<sup>2</sup>) higher. On the right we see the growth in condition number with increasing polynomial degree for *K* = 9 × 9 on the curved mesh. A reduced growth rate in condition number is observed for the hybrid formulation. Thus, hybrid formulations are beneficial for high-order methods.

**Fig. 4** Growth in condition number for hybrid elements in dark line, and continuous elements in dotted line. Left: *h*-refinement; Right: *N*-refinement. 'c' refers to the curved mesh

**Fig. 5** *L*2-error in divergence of velocity: Left: *h*-refinement; Right: *N*-refinement. 'o' refers to the orthogonal mesh and 'c' to the curved mesh

In Fig. 5 we show the *L*<sup>2</sup>-error of ∇ · *u<sup>h</sup>* − *f <sup>h</sup>*, on the left as a function of element size, *h* = 1*/* <sup>√</sup>*K*, and on the right as a function of the polynomial degree of the basis functions. In both cases the maximum error observed is of O(10<sup>−12</sup>).

In Fig. 6, the top two figures show the error in the *H(*div; Ω*)* norm for the velocity, and the bottom two figures show the error in the *L*<sup>2</sup>*(Ω)* norm for the pressure. On the left we have *h*-convergence plots, and on the right *N*-convergence plots. In all the figures, for the same number of elements, *K*, and polynomial degree, *N*, the error is higher on the curved mesh.

On the left we see that the error decreases with the element size. The observed rate of convergence is *N*, which is optimal, for both the curved and the orthogonal meshes. On the right we see exponential convergence of the error with increasing polynomial degree of the basis for both meshes.

**Fig. 6** Top row: error in *H(*div; Ω*)* norm for velocity; Bottom row: *L*2-error in pressure. Left: *h*-refinement; Right: *N*-refinement. 'o' refers to the orthogonal mesh and 'c' to the curved mesh

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **High-Order Mesh Generation Based on Optimal Affine Combinations of Nodal Positions**

**Mike Stees and Suzanne M. Shontz**

#### **1 Introduction**

The advantage of high-order numerical methods for solving partial differential equations is their higher accuracy compared to low-order numerical methods. A major hurdle in the use of these methods in the presence of complex geometries is the absence of robust high-order mesh generation methods [23]. In other words, these methods need a high-order mesh that accurately captures the features of the geometry to achieve their full potential [1, 10].

The typical approach for generating high-order meshes is to transform a coarse linear mesh [2–6, 9, 11, 12, 14, 16, 17, 19–22, 24]. At a high level, these transformations usually consist of the following three steps: (1) the low-order mesh is enriched with additional nodes; (2) the new nodes that lie along the boundary of the mesh are moved to the true boundary; (3) the interior nodes are moved based on the boundary deformation. The main challenge of these methods arises from step (2). In particular, the curving of the elements along the boundary can result in invalid mesh elements. With that in mind, these high-order mesh generation methods use different approaches in step (3) in an effort to obtain a valid high-order mesh. Methods for transforming the linear mesh usually fall into two groups. The first

M. Stees (✉)

Department of Electrical Engineering and Computer Science, Information and Telecommunication Technology Center, University of Kansas, Lawrence, KS, USA e-mail: mstees@ku.edu

S. M. Shontz

Department of Electrical Engineering and Computer Science, Bioengineering Program, Information and Telecommunication Technology Center, University of Kansas, Lawrence, KS, USA e-mail: shontz@ku.edu

S. J. Sherwin et al. (eds.), *Spectral and High Order Methods for Partial Differential Equations ICOSAHOM 2018*, Lecture Notes in Computational Science and Engineering 134, https://doi.org/10.1007/978-3-030-39647-3\_17

group of methods transform the mesh based on the solution to a partial differential equation [3, 12, 14, 24]. The second group of methods are based on optimization of an objective function [2, 4, 5, 9, 16–21].

In this paper, we describe an optimization-based approach for generating high-order meshes based on affine combinations of nodal positions. The remainder of this paper is organized as follows. In Sect. 2, we present our new method for high-order mesh generation. In Sect. 3, we demonstrate the performance of our proposed method on several aerospace engineering geometries. Finally, in Sect. 4, we offer some concluding remarks and possible directions for our future work.

#### **2 High-Order Mesh Generation Based on Affine Combinations of Nodal Positions**

In this section, we present our optimization-based method for high-order mesh generation. Our proposed method uses affine combinations of nodal positions to determine the movement of the interior nodes after deforming the boundary. Our method consists of three steps. First, for each interior node in the high-order straight-sided mesh, an optimization problem is solved to calculate a set of weights that relates the interior node to its neighbors. Second, the boundary nodes are moved to the true boundary. Third, the new positions of the interior nodes are calculated by solving a linear system of equations using the weights and the new boundary positions. In spirit, this method is similar to the weight-based method that we proposed in [19], with two major differences. The first difference is that we propose an affine combination of nodal positions in this work, as opposed to a convex combination. This change allows us to remove the inequality constraint and log-barrier term, leaving only the equality constraints. We also propose an alternative objective function that, when combined with the equality constraints, allows us to solve the optimization problem directly via a QR factorization. This change results in reduced computational complexity and faster execution time.

To frame our discussion of the method, we introduce the following notation for the 2D formulation of the problem; the 3D formulation is similar. Let the *x*- and *y*-coordinates of the *i*th interior node be represented as *(xi, yi)*. In addition, define the *x*- and *y*-coordinates of the vertices adjacent to node *i* as {*(xj , yj )* : *j* ∈ *Ni*}, where *Ni* is the set of neighbors of node *i*. For each interior node *i*, this information can be represented as the following linear system, where *wij* are the weights:

$$\sum\_{j \in N\_i} w\_{ij} x\_j = x\_i$$

$$\sum\_{j \in N\_i} w\_{ij} y\_j = y\_i \,,$$

where

$$N\_i = \{\text{high-order nodes of the patch to which } i \text{ belongs}\}.$$

There are several potential choices for the local neighboring set based on use of the low-order nodes, high-order nodes, or both. We include only the highorder nodes as neighbors, as only the high-order boundary nodes move during the boundary deformations. Using either the low-order nodes or both the low- and high-order nodes would dampen the effect the boundary deformation has on the interior nodes, which might lead to tangling near the boundary. Including additional nodes as neighbors would also result in a less sparse matrix when solving (7). Another important consideration is that while the weight calculation is based on only the local neighbors, the position of an interior node is indirectly affected by the deformation of all the interior nodes through the solution of (7).

Adding the additional constraint that the weights sum to one results in the following linear system *Aw* = *b* for finding an affine combination of the x- and y-coordinates of the vertices adjacent to node *i*:

$$
\begin{bmatrix}
\boldsymbol{x}\_{1} \ \boldsymbol{x}\_{2} \ \dots \ \boldsymbol{x}\_{n} \\
\boldsymbol{y}\_{1} \ \boldsymbol{y}\_{2} \ \dots \ \boldsymbol{y}\_{n} \\
\boldsymbol{1} \ \boldsymbol{1} \ \dots \ \boldsymbol{1} \\
\end{bmatrix}
\begin{bmatrix}
\boldsymbol{w}\_{i1} \\
\boldsymbol{w}\_{i2} \\
\vdots \\
\boldsymbol{w}\_{in}
\end{bmatrix} = \begin{bmatrix}
\boldsymbol{x}\_{i} \\
\boldsymbol{y}\_{i} \\
\boldsymbol{1}
\end{bmatrix},
$$

where *n* = |*Ni*|. Based on the set of neighbors, this linear system will in general be underdetermined (i.e., *A* is *m* × *n* with *m<n*). If we assume that *A* has full rank, we can find one particular solution to our problem by requiring that *w* has the smallest norm of any solution. This results in the following optimization problem:

$$\min\_{w} ||w||\_2^2 \tag{1}$$

$$\text{subject to } A\,w = b.\tag{2}$$

From the Karush–Kuhn–Tucker (KKT) theory [13], we know that the following conditions must hold for a solution *(w*∗*, λ*∗*)* to our problem to be optimal:

$$\nabla\_w \mathcal{L}(w^\*, \lambda^\*) = 0 \tag{3}$$

$$Aw^\* - b = 0\tag{4}$$

$$
\lambda^\*(Aw^\*-b) = 0.\tag{5}
$$

The Lagrangian of our problem is given by:

$$
\mathcal{L}(w, \lambda) = w^T w - \lambda^T (Aw - b),
$$

where *λ* are the Lagrange multipliers.

Using (3)–(5), we can derive the solution pair *(w*∗*, λ*∗*)* as follows:

$$
\nabla\_w \mathcal{L}(w, \lambda) = 2w - A^T \lambda.
$$

$$
\nabla\_w \mathcal{L}(w^\*, \lambda^\*) = 0 \Rightarrow w^\* = \frac{1}{2} A^T \lambda^\*.
$$

$$
A w^\* - b = 0 \Rightarrow A(\frac{1}{2} A^T \lambda^\*) - b = 0.
$$

$$
\lambda^\* = 2(A A^T)^{-1} b
$$

$$
w^\* = \frac{1}{2} A^T \lambda^\* = \frac{1}{2} A^T 2(A A^T)^{-1} b = A^T (A A^T)^{-1} b.
$$
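The closed-form minimum-norm solution is easy to check numerically. The sketch below is our own illustration with random stand-in coordinates (two coordinate rows plus the affine row of ones); it also confirms agreement with the minimum-norm solution returned by `lstsq`:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 8   # underdetermined: 3 constraints, 8 neighbour weights

# rows: x-coordinates, y-coordinates, and the affine constraint (weights sum to 1)
A = np.vstack([rng.standard_normal((2, n)), np.ones((1, n))])
b = np.array([A[0].mean(), A[1].mean(), 1.0])   # target: centroid of the neighbours

# closed-form minimum-norm solution w* = A^T (A A^T)^{-1} b
w = A.T @ np.linalg.solve(A @ A.T, b)
```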

Although we have verified that *(w*∗*, λ*∗*)* is a stationary point, we cannot yet claim that it is a minimum. To do so, we must investigate ∇<sup>2</sup>*w*L*(w*∗*, λ*∗*)*:

$$
\nabla\_w^2 \mathcal{L}(w, \lambda) = 2I\_{|N\_i| \times |N\_i|}
$$

$$
\nabla\_w^2 \mathcal{L}(w^\*, \lambda^\*) = 2I\_{|N\_i| \times |N\_i|}.
$$

From the second-order sufficient conditions, if *w*∗ satisfies (3)–(5) and the following condition is satisfied:

$$z^T \nabla\_w^2 \mathcal{L}(w^\*, \lambda^\*) z > 0, \text{for all } z \in C(w^\*, \lambda^\*), z \neq 0,\tag{6}$$

where *C(w*∗*, λ*∗*)* = {*z* | ∇*wc(w*∗*)<sup>T</sup> z* = 0} is the critical cone and *c(w)* = *Aw* − *b*, then our solution is a minimum. Since ∇<sup>2</sup>*w*L*(w*∗*, λ*∗*)* is symmetric positive definite, the inequality in (6) is satisfied for any choice of *z*. Thus we can conclude that our solution *w*∗ is a minimum of (1)–(2).

Now that we have established that *w*∗ is our solution, we discuss calculating it via a reduced QR factorization. Suppose that *A<sup>T</sup>* = *QR*, where *Q* is *n* × *m*, *R* is *m* × *m* upper triangular, and *Q<sup>T</sup>Q* = *I<sub>m×m</sub>*. Substituting the QR factorization of *A<sup>T</sup>* into *w*∗, we get the following:

$$\begin{aligned} w^\* = A^+ b = A^T (A A^T)^{-1} b &= \mathcal{Q} R (R^T \mathcal{Q}^T \mathcal{Q} R)^{-1} b &= \mathcal{Q} R (R^T R)^{-1} b \\ &= \mathcal{Q} R R^{-1} R^{-T} b = \mathcal{Q} R^{-T} b. \end{aligned}$$

Rearranging this into linear system form, we have:

$$\mathcal{R}^T \mathcal{Q}^T w^\* = b.$$

If we let *t* = *Q<sup>T</sup>w*∗, then *R<sup>T</sup>t* = *b* and *w*∗ = *Qt*. Thus calculating *w*∗ involves a QR decomposition of *A<sup>T</sup>*, solving the lower triangular system *R<sup>T</sup>t* = *b* by forward substitution, and calculating the matrix-vector product *Qt*.
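This procedure can be sketched in numpy as follows (our illustration with random stand-in data; `np.linalg.solve` stands in for a dedicated forward-substitution routine):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 3, 8
A = np.vstack([rng.standard_normal((2, n)), np.ones((1, n))])
b = np.array([0.3, -0.2, 1.0])

# reduced QR of A^T: A^T = Q R with Q (n x m), R (m x m) upper triangular
Q, R = np.linalg.qr(A.T)        # 'reduced' mode is the default

t = np.linalg.solve(R.T, b)     # solve the lower triangular system R^T t = b
w = Q @ t                       # w* = Q t
```

This avoids forming *AA<sup>T</sup>* explicitly, which is also favourable for numerical conditioning.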

After calculating the weights, a boundary deformation is applied. The final step is to obtain the new locations of the interior nodes [*x*ˆ*I , y*ˆ*I* ] by solving the following global linear system:

$$A\_I[\hat{\mathbf{x}}\_I, \hat{\mathbf{y}}\_I] = -A\_B[\hat{\mathbf{x}}\_B, \hat{\mathbf{y}}\_B],\tag{7}$$

where *x*ˆ*<sup>B</sup>* and *y*ˆ*<sup>B</sup>* are the new x- and y-coordinates for the boundary nodes, and *AI* and *AB* contain the weights for the interior nodes and boundary nodes, respectively. In this global linear system, each row of the weight matrix corresponds to an interior node with nonzero entries for the node's neighbors and zero entries for the remainder of the row. The resulting global weight matrix is very sparse with irregular structure. In an effort to shift the nonzero entries closer to the diagonal, we apply the sparse reverse Cuthill–McKee ordering provided in Matlab. In Fig. 1, we show the matrix sparsity plots for the natural node ordering and the updated node ordering for the first two examples in Sect. 3. After applying the matrix reordering, the linear system is solved using a sparse LU factorization.
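The pipeline above has a useful exactness property: because the weights sum to one, any *affine* deformation of the boundary is reproduced exactly by the interior nodes. The toy demonstration below (our own sketch; dense numpy for brevity, whereas the paper's implementation uses a sparse LU factorization with reverse Cuthill–McKee reordering) makes every interior node a neighbour of all other nodes:

```python
import numpy as np

rng = np.random.default_rng(3)
nb, ni = 8, 3                           # boundary and interior nodes
XB = rng.standard_normal((nb, 2))       # boundary coordinates
XI = rng.standard_normal((ni, 2))       # interior coordinates
X = np.vstack([XB, XI])

def min_norm_weights(i, nbrs):
    """Minimum-norm affine weights of node i w.r.t. its neighbours."""
    A = np.vstack([X[nbrs].T, np.ones(len(nbrs))])
    b = np.array([X[i, 0], X[i, 1], 1.0])
    return A.T @ np.linalg.solve(A @ A.T, b)

# weight matrix: one row per interior node (toy neighbourhood = all other nodes)
W = np.zeros((ni, nb + ni))
for k in range(ni):
    i = nb + k
    nbrs = [j for j in range(nb + ni) if j != i]
    W[k, nbrs] = min_norm_weights(i, nbrs)

# split the constraints into A_I xhat_I = -A_B xhat_B, as in eq. (7)
AI = W[:, nb:] - np.eye(ni)
AB = W[:, :nb]

# deform the boundary with an affine map
T = np.array([[1.2, 0.3], [-0.1, 0.9]])
c = np.array([0.5, -0.2])
XB_new = XB @ T.T + c
XI_new = np.linalg.solve(AI, -AB @ XB_new)   # new interior positions
```

Under this affine map the solved interior positions coincide with the affinely mapped originals, which is the sense in which the weights "remember" the straight-sided configuration.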

#### **3 Numerical Experiments**

In this section, we demonstrate the results of applying our method to generate several high-order meshes. We use Gmsh [7, 8, 15, 21] to generate the initial straight-sided high-order meshes. Our method then uses this mesh to calculate the weights (step 1). Next, we curve the boundary nodes (step 2) using Gmsh. The positions of the interior nodes in the resulting curved high-order mesh are then updated (step 3) by our method. For each example, we show the mesh which results from our method (with high-order nodes visible), the mesh element distortion for the curved boundary mesh generated using Gmsh, and the distortion for the mesh resulting from our method. When reporting the mesh distortion, we list the minimum distortion, maximum distortion, average distortion computed over all elements (referred to as Avg1 in figures), and average distortion computed over curved elements (referred to as Avg2 in figures). The ideal distortion value is 1, indicating that the element is straight. We also list the execution times needed for steps 1 and 3 of our method (excluding I/O) in Table 1. The code was run using Matlab R2018a, and the wall-clock execution times were measured on a machine with 16 GB of RAM and a Ryzen 7 1700 CPU. All mesh visualizations and distortion evaluations were done using Gmsh. Our first example is a third-order mesh of a square region around

**Fig. 1** Sparsity plots for the first two examples in Sect. 3: (**a** and **c**) the nonzero entries using the original node ordering; (**b** and **d**) the nonzero entries after applying the sparse reverse Cuthill– McKee ordering

a NACA0012 airfoil. In Fig. 2a, b, we show the mesh resulting from our method and a table of the mesh quality values as measured by the distortion metric. In this example, our method increased the minimum distortion from 0.744 to 0.799, while causing only minor changes in the average distortion.

In our second example, we extrude the NACA0012 airfoil and create a third-order mesh of the resulting region. In Fig. 3a, b, we show the mesh resulting from our method and a table of the mesh quality values. For this example, our method improved the minimum distortion by 0.125, increasing it from 0.317 to 0.442.


**Table 1** The number of elements and the wall clock times for steps 1 and 3 of our method (excluding I/O) for each example

**Fig. 2** NACA0012 airfoil example: (**a**) the mesh resulting from our method and (**b**) the mesh quality as measured by the element distortion metric




Our third and final example is a second-order mesh of an Airbus A319 aircraft. Unlike our previous examples, this geometry resulted in tangled elements after curving the boundary. Although our method still increased the minimum quality, it was not able to untangle the mesh. To address this, we applied the high-order

**Fig. 3** Extruded NACA0012 airfoil example: (**a**) the mesh resulting from our method and (**b**) the mesh quality as measured by the element distortion metric


regularization scheme available in Gmsh as a post-processing step. Aside from changing the target Jacobian range to 0.3–2 and fixing the boundary nodes, all other parameters were left at their default values. The untangling for the original mesh took 14.14 s, while untangling the mesh resulting from our method required only 1.64 s. In Fig. 4a–c, we show the surfaces of the mesh resulting from our method, a view of a cut through the interior volume mesh, and a table of the mesh quality values. In Fig. 4c, we list the distortion for the mesh after curving the boundary, the distortion for the mesh resulting from our method, and the distortions of both meshes after applying the regularization scheme in Gmsh.

Aside from the third test case, all of these examples were relatively straightforward. In each case, our method increased the minimum distortion when compared to only curving the nodes along the boundary. While additional testing is necessary to confirm this, our results for the third example seem to indicate that our method could be used to reduce the severity of the mesh tangling and thus simplify the work for an untangling method during post-processing.

**Fig. 4** Airbus A319 example: (**a**) surfaces of the mesh resulting from our method; (**b**) a view of a cut through the interior volume mesh, and (**c**) the quality of the mesh with only boundary curving, the quality of the mesh resulting from our method, and the quality of both meshes after applying the regularization scheme available in Gmsh

#### **4 Concluding Remarks and Future Work**

We have presented a new optimization-based method for generating high-order meshes. Our examples have shown that the proposed method, based on affine combinations of nodal positions, tends to improve the quality of the most distorted elements while causing only minor changes to the least distorted elements. Although our approach is optimization-based, we have demonstrated that the optimization problem can be solved directly using a QR factorization, as opposed to the typical iterative optimization approach. This change lowers the computational complexity and reduces the execution time.
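The direct solve via QR can be illustrated on a generic linear least-squares problem. The following Python sketch uses a random placeholder matrix and right-hand side (not the paper's actual affine-combination system) to show how the minimizer is obtained from one QR factorization and one triangular solve, rather than from an iterative optimizer:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 200, 50                      # more residual equations than unknowns
A = rng.standard_normal((m, n))     # hypothetical least-squares matrix
b = rng.standard_normal(m)

# Minimize ||A x - b||_2 directly: factor A = Q R, then solve R x = Q^T b.
Q, R = np.linalg.qr(A)              # reduced QR: Q is (m, n), R is (n, n) upper triangular
x = np.linalg.solve(R, Q.T @ b)

# Same minimizer as a generic least-squares routine:
x_ref = np.linalg.lstsq(A, b, rcond=None)[0]
```

For a full-rank overdetermined system, the QR route gives the unique least-squares solution in a single direct pass.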

As part of our future work, we will consider other definitions for the set of neighbors of an interior node. We will also investigate other aspects of the linear system including other node reordering schemes and solvers. Finally, we will apply the untangling method that we proposed in [20] after extending it to 3D.

**Acknowledgements** The work of the first author was funded in part by the Madison and Lila Self Graduate Fellowship and NSF CCF grant 1717894. The work of the second author was supported in part by NSF grants CCF 1717894 and OAC 1808553. The authors wish to thank the anonymous referee for his or her careful reading of the paper and for the helpful suggestions which strengthened it.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **Sparse Spectral-Element Methods for the Helically Reduced Einstein Equations**

**Stephen R. Lau**

#### **1 Introduction**

To model the inspiral and merger of binary objects (black holes or neutron stars), many researchers have been solving the Einstein equations numerically. Such simulation involves both the construction of gravitational initial data at time $t_0$ and its subsequent evolution to a final time $t_F \gg t_0$. Interpretation of experimental detections of gravitational waves relies on numerical simulation. Moreover, detection of weak signals is facilitated by statistical techniques alongside "template banks" of numerically generated signals. We consider a nonstandard problem, solution of the Einstein equations reduced by helical symmetry, as described by Beetle, Bromley, Hernández, and Price (BBHP) [1, 2]. Heuristically, helical reduction is a data+evolution synthesis. Although solutions to the BBHP equations are ultimately unphysical, they may approximate the early phase of inspiral and serve as reduced-order models. Moreover, they may provide excellent "trial data" (the starting point for the construction of $t = t_0$ initial data). Finally, they would address bewitching mathematical issues concerning exact helical symmetry in general relativity.

We consider a spectral element approach [3–8] for solving the BBHP equations. Although the equations involve a mixed-type operator $L$, we solve them via relaxation using a Broyden–Krylov approach. The computational domain which surrounds the compact objects is split into 11+ subdomains (blocks, spherical shells, and cylindrical shells, with classical spectral expansions thereon). To rapidly solve the linear systems arising in our scheme, we have developed sparse modal methods based on the application of spectral integration matrices, extending ideas originally described in the 1990s by Coutsias, Hagstrom, Hesthaven, and Torres.

S. R. Lau
Mathematics and Statistics, University of New Mexico, Albuquerque, NM, USA
e-mail: lau@math.unm.edu

© The Author(s) 2020

S. J. Sherwin et al. (eds.), *Spectral and High Order Methods for Partial Differential Equations ICOSAHOM 2018*, Lecture Notes in Computational Science and Engineering 134, https://doi.org/10.1007/978-3-030-39647-3_18

We use preconditioned GMRES to solve these systems, with standard domain decomposition methods. In addition, we have developed fast methods for inversion of subdomain approximations of *L*, either via modal-based preconditioning or direct schemes.
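As a generic illustration of preconditioned GMRES (this is not the authors' code; the tridiagonal matrix is a stand-in for the sparse subdomain systems), the following Python/SciPy sketch supplies a simple diagonal (Jacobi) preconditioner through a `LinearOperator`:

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import gmres, LinearOperator

n = 500
# stand-in sparse system: a diagonally dominant tridiagonal matrix
A = diags([-1.0, 2.2, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

# Jacobi preconditioner M ~ A^{-1}: multiply by the inverse of the diagonal
dinv = 1.0 / A.diagonal()
M = LinearOperator((n, n), matvec=lambda v: dinv * v)

x, info = gmres(A, b, M=M)   # info == 0 signals convergence
```

In practice the preconditioner would encode the fast subdomain solves described in the text rather than a plain diagonal scaling.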

#### **2 Background**

This section first describes the helically reduced wave equation (HRWE), a model problem for the helically reduced Einstein equations; the HRWE description serves to fix ideas.

#### *2.1 Helically Reduced Wave Equation*

The wave equation is $\Box\psi = 0$, where $\Box = -\partial_t^2 + \Delta_{\mathbf{x}}$. Assume that $\psi$ rotates rigidly with rate $\Omega$. With the $z$-axis as the rotation axis, $\psi$ then depends on time $t$ only through $\phi = \varphi - \Omega t$, where $\varphi$ is the azimuthal angle. Via the replacement $\partial_t \to -\Omega\,\partial_\phi$, the equation $\Box\psi = 0$ becomes the HRWE $L\psi = 0$, where $L = \Delta_{\tilde{\mathbf{x}}} - \Omega^2\big(\tilde{x}\,\partial_{\tilde{y}} - \tilde{y}\,\partial_{\tilde{x}}\big)^2$ in terms of the co-rotating coordinates $(\tilde{t}, \tilde{x}, \tilde{y}, \tilde{z}) = (t,\; x\cos\Omega t + y\sin\Omega t,\; y\cos\Omega t - x\sin\Omega t,\; z)$.
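The reduction can be checked symbolically. In this SymPy sketch (the polynomial test profile `F` is an arbitrary choice), a field that depends on time only through the co-rotating coordinates is verified to satisfy $\Box\psi = L\psi$ at a sample point:

```python
import sympy as sp

t, x, y, z, Om = sp.symbols('t x y z Omega', real=True)
X, Y = sp.symbols('X Y', real=True)        # stand-ins for x-tilde, y-tilde

# co-rotating coordinates
xt = x*sp.cos(Om*t) + y*sp.sin(Om*t)
yt = y*sp.cos(Om*t) - x*sp.sin(Om*t)

# rigidly rotating field: psi depends on t only through (xt, yt, z)
F = X**3*Y + X*z**2 - Y**2                 # arbitrary test profile
psi = F.subs({X: xt, Y: yt})

# wave operator applied in the inertial frame
box = (-sp.diff(psi, t, 2) + sp.diff(psi, x, 2)
       + sp.diff(psi, y, 2) + sp.diff(psi, z, 2))

# HRWE operator L = Laplacian - Omega^2 (X dY - Y dX)^2, then co-rotating substitution
rot = lambda G: X*sp.diff(G, Y) - Y*sp.diff(G, X)
LF = (sp.diff(F, X, 2) + sp.diff(F, Y, 2) + sp.diff(F, z, 2)
      - Om**2*rot(rot(F))).subs({X: xt, Y: yt})

# the identity box(psi) = L(psi) holds for all arguments; spot-check numerically
num = (box - LF).subs({t: 0.3, x: 0.7, y: -0.2, z: 0.5, Om: 1.1}).evalf()
```

The key facts used are $\partial_t\tilde{x} = \Omega\tilde{y}$, $\partial_t\tilde{y} = -\Omega\tilde{x}$, and the rotational invariance of the transverse Laplacian.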

We adopt a "2-center domain" $D$, a 3d ball with two smaller 3d balls excised from it; see Fig. 1. Its boundary $\partial D = \partial S_I^- \cup \partial S_{II}^- \cup \partial S_{\text{out}}^+$ is the union of two inner spheres (the $-$'s) and one outer sphere (the $+$). We consider the mixed-type problem

$$L\psi = 0,\qquad \psi = f^- \text{ on } \partial S_I^- \cup \partial S_{II}^-, \qquad \left(\partial_r - \Omega\,\partial_\phi + r^{-1}\right)\psi = f^+ \text{ on } \partial S_{\text{out}}^+,\tag{1}$$

**Fig. 1** 2-center domain decomposition. The left panel depicts the *z*-cross section of the inner subdomains with *S*out suppressed. The right panel depicts all subdomains, although for visualization the outer radius for *S*out has been chosen rather small. In this work the blocks *B*<sup>2</sup> and *B*<sup>4</sup> are absent. (**a**) Inner domain decomposition. (**b**) Double cross section

with inner Dirichlet conditions and an outer radiation boundary condition on $\psi$. For simplicity, here we have put a simple Sommerfeld condition on $\partial S_{\text{out}}^+$; in practice, $f^+$ is a nonlocal function of $\psi$ enforcing an exact Dirichlet-to-Neumann map [4].

A class of solutions to (1) stems from Liénard–Wiechert potentials. Indeed, consider $L\psi = -16\pi\gamma M\,\delta(\tilde{\mathbf{x}} - \tilde{\mathbf{x}}_p)$. The "particle" location $\tilde{\mathbf{x}}_p = (\pm a, 0, 0)$ is the center of either $\partial S_{II}^-$ or $\partial S_I^-$; whence the equation is homogeneous on $D$. The mass $M$ and relativistic factor $\gamma = (1 - v^2)^{-1/2}$ are constants, with $v = a\Omega < 1$ so that the particle moves subluminally in the $(t, x, y, z)$ frame. The retarded solution to this problem is

$$\psi(\tilde{x}, \tilde{y}, z) = \psi(\rho\cos\phi,\, \rho\sin\phi,\, z) = \frac{4\gamma M}{\lambda \mp v\rho\sin(\phi + \Omega\lambda)} =: \gamma^2\,\frac{4M}{\mathcal{R}}.\tag{2}$$

Evaluation of this expression involves a numerical component: solution of the fixed-point equation $\lambda = \left[z^2 + \rho^2 + a^2 \mp 2a\rho\cos(\phi + \Omega\lambda)\right]^{1/2}$ for (the retarded time) $\lambda$.
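For subluminal motion ($a\Omega < 1$) this map is contractive and plain fixed-point iteration converges quickly. A minimal Python sketch (the parameter values are illustrative, not taken from the paper):

```python
import numpy as np

def retarded_time(rho, phi, z, a, Omega, sign=-1.0, tol=1e-14, max_iter=200):
    """Solve lam = sqrt(z^2 + rho^2 + a^2 -/+ 2*a*rho*cos(phi + Omega*lam))
    by fixed-point iteration; `sign` selects the branch of the -/+ sign."""
    lam = np.sqrt(z**2 + rho**2 + a**2)          # Omega = 0 starting guess
    for _ in range(max_iter):
        lam_new = np.sqrt(z**2 + rho**2 + a**2
                          + sign*2.0*a*rho*np.cos(phi + Omega*lam))
        if abs(lam_new - lam) < tol:
            return lam_new
        lam = lam_new
    raise RuntimeError("fixed-point iteration did not converge")

# illustrative evaluation point, with a = 1 and a subluminal rate Omega = 0.075
lam = retarded_time(rho=2.0, phi=0.3, z=0.5, a=1.0, Omega=0.075)
```

With $\lambda$ in hand, $\psi$ follows directly from (2).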

#### *2.2 Helically Reduced Einstein Equations*

We consider the vacuum Einstein equations in Landau–Lifshitz form. Write the (densitized contravariant) metric tensor as $\mathfrak{g}^{\mu\nu} = \eta^{\mu\nu} - h^{\mu\nu}$, where $\eta^{\mu\nu} = \mathrm{diag}(-1, 1, 1, 1)$ is the flat metric and $h^{\mu\nu}$ is the *metric perturbation*. Assume the *harmonic gauge condition* $\partial_\nu h^{\mu\nu} = 0$. Then the vacuum Einstein equations can be expressed as

$$
\Box h^{\mu\nu} = S^{\mu\nu\kappa\beta}_{\quad\;\,\tau\phi\gamma\alpha}(\mathfrak{g})\,\frac{\partial h^{\tau\phi}}{\partial x^{\kappa}}\,\frac{\partial h^{\gamma\alpha}}{\partial x^{\beta}} + h^{\alpha\beta}\,\frac{\partial^2 h^{\mu\nu}}{\partial x^{\alpha}\partial x^{\beta}},\tag{3}
$$

where $S^{\mu\nu\kappa\beta}_{\quad\;\,\tau\phi\gamma\alpha}(\mathfrak{g})$ depends on $\mathfrak{g}^{\mu\nu}$ and its inverse $\mathfrak{g}_{\mu\nu}$ (but not on derivatives of either). Einstein's equations are then a constrained system of 10 nonlinear wave equations.

The BBHP reduction of the Einstein equations is similar to the one outlined for the wave equation. Technically, it assumes the existence of a *Killing vector field*, but we give only a brief and heuristic description of their approach. The challenge is that the perturbations $h^{\mu\nu}$ themselves are not "helical scalars"; helical reduction is therefore not tantamount to the replacement $\Box h^{\mu\nu} \to L h^{\mu\nu}$. However, BBHP have introduced helical scalars $\psi^A$ through which the reduction can be carried out. These are

$$\begin{cases} \psi^{(nn)} = h^{tt}, & \psi^{(n0)} = \sqrt{2}\,h^{tz}, \qquad \psi^{(00)} = \sqrt{\tfrac{1}{3}}\left(h^{xx} + h^{yy} + h^{zz}\right) \\[6pt] \psi^{(20)} = -\sqrt{\tfrac{1}{6}}\left(h^{xx} + h^{yy} - 2h^{zz}\right), & \psi^{(n1)} = e^{\mathrm{i}\Omega t}\left(-h^{tx} + \mathrm{i}h^{ty}\right) \\[6pt] \psi^{(21)} = e^{\mathrm{i}\Omega t}\left(-h^{xz} + \mathrm{i}h^{yz}\right), & \psi^{(22)} = e^{2\mathrm{i}\Omega t}\left[\tfrac{1}{2}\left(h^{xx} - h^{yy}\right) - \mathrm{i}h^{xy}\right]. \end{cases}\tag{4}$$

These 4 real and 3 complex quantities contain the 10 degrees of freedom in the $h^{\mu\nu}$. We express this transformation as $\psi^A = e^{\mathrm{i}\mu(A)\Omega t}M^A_{\ \mu\nu}h^{\mu\nu}$, where $A$ runs over the (tensor-spherical-harmonic) labels, and $\mu(A)$ is 0, 1, or 2. Contraction of $M^A_{\ \mu\nu}$ on (3), with subsequent helical reduction based on the action of $\Box$ on $\psi^A$, yields

$$\begin{aligned} &L\psi^A - 2\mathrm{i}\mu(A)\Omega^2\partial_{\phi}\psi^A + \mu^2(A)\Omega^2\psi^A = \\ &\qquad h^{\alpha\beta}\psi^A_{,\alpha\beta} - 2\mathrm{i}\mu(A)\Omega\, h^{t\beta}\psi^A_{,\beta} - \mu^2(A)\Omega^2 h^{tt}\psi^A + M^A_{\ \mu\nu}S^{\mu\nu\kappa\beta}_{\quad\;\,\tau\phi\gamma\alpha}(\mathfrak{g})\,h^{\tau\phi}_{\ \ ,\kappa}h^{\gamma\alpha}_{\ \ ,\beta}.\end{aligned}\tag{5}$$

Here *L* is the operator appearing in the HRWE. Similar to the boundary conditions appearing in (1), the boundary conditions we adopt for (5) are

$$\psi^A = (f^A)^- \text{ on } \partial S_I^- \cup \partial S_{II}^-, \qquad \left(\partial_r - \Omega\,\partial_\phi + r^{-1}\right)\!\left[e^{\mathrm{i}\mu(A)\phi}\psi^A\right] = (f^A)^+ \text{ on } \partial S_{\text{out}}^+.\tag{6}$$

Again for simplicity, here we have a Sommerfeld condition on $\partial S_{\text{out}}^+$, but in practice we use a nonlocal outgoing condition based on an exact Dirichlet-to-Neumann map.

Price has written down the analog of (2) for linearized gravity, i.e. (5) with the right-hand side of the equation set to zero. This solution may be viewed as an exact solution to $\Box h^{\mu\nu} = -16\pi T^{\mu\nu}$, where the stress-energy tensor $T^{\mu\nu}$ corresponds to a massive point particle in an eternal circular orbit (as discussed, such a point source is excised from our domain $D$). Price's solution is analogous to the electromagnetic solution given by G. A. Schott, and it is given by the leading $1/\mathcal{R}$ terms in the appendix expressions (12). Unfortunately, $\partial_\nu h^{\mu\nu} \neq 0$ for this solution.

#### **3 Sparse Spectral Element Methods**

This section summarizes our numerical methods. It is necessarily impressionistic, as even an incomplete presentation of details would take too much space.

#### *3.1 Overview*

We split $D$ into subdomains, here with the minimal configuration of 11 subdomains: blocks $B_{1,3,5}$; cylinders $C_{1,2,3,4,5}$; an inner shell $S_I$ around "particle" $I$; an inner shell $S_{II}$ around "particle" $II$; and an outer shell $S_{\text{out}}$. This corresponds to a "binary black hole" (BBH) domain with two excised inner balls. For a "binary neutron star" (BNS) domain, we further split both $S_I$ and $S_{II}$ into two overlapping concentric spheres (a stellar surface then resides in each overlap [8]), and fill in the excised regions with two extra blocks $B_{2,4}$. Figure 1 depicts a BNS domain. Pioneered by Pfeiffer et al. [9], such decompositions are used in the EllipticSolver of SpEC [10].

The unknowns in our approach are the modal expansion coefficients associated with subdomain expansions in terms of classical (Chebyshev, Fourier, and spherical-harmonic) basis functions. As described below, we make extensive use of integration matrices to achieve sparse representations of the relevant operators. Before presenting details, we first address how we handle the nonlinearities in (5). Let $\widetilde{\psi}^A$ (the vector of unknowns) be a concatenation of the modal coefficients from all 11 subdomains. Then, as sketched below, upon approximation (5) becomes

$$
\mathcal{B}\mathcal{L}\,\widetilde{\psi}^A = \mathcal{B}\,\widetilde{\mathfrak{g}}^A,\tag{7}
$$

where $\mathcal{B}\mathcal{L}$ approximates the operator on the left-hand side of (5), with $\mathcal{B}$ representing the action of "integration preconditioning" (see below) on each subdomain. For linearized gravity, with the right-hand side of (5) set to zero, the vector $\widetilde{\mathfrak{g}}^A$ is zero, save for select entries related to the inner Dirichlet values $(f^A)^-$ of $\psi^A$ on $\partial S_I^- \cup \partial S_{II}^-$. For the full Einstein equations, $\widetilde{\mathfrak{g}}^A$ depends on $\widetilde{\psi}^A$, and its evaluation relies on spectral analysis/synthesis (forward/backward transform) and numerical differentiation on each subdomain. We then view

$$
\widetilde{\psi}^A_k = (\mathcal{B}\mathcal{L})^{-1}\mathcal{B}\,\widetilde{\mathfrak{g}}^A\!\left(\widetilde{\psi}^B_{k-1}\right)\tag{8}
$$

as a fixed-point equation, accelerating its convergence with the Broyden algorithm.
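Broyden acceleration of a fixed-point iteration can be sketched generically. The following Python toy (not the authors' solver; the test problem $x = \cos x$ is illustrative) applies Broyden's "good" rank-one update to $F(x) = G(x) - x$:

```python
import numpy as np

def broyden_fixed_point(G, x0, tol=1e-12, max_iter=100):
    """Solve x = G(x) via Broyden's 'good' method applied to F(x) = G(x) - x."""
    x = np.asarray(x0, dtype=float)
    F = G(x) - x
    J = -np.eye(x.size)              # initial Jacobian guess for F
    for _ in range(max_iter):
        dx = np.linalg.solve(J, -F)  # quasi-Newton step
        x_new = x + dx
        F_new = G(x_new) - x_new
        if np.linalg.norm(F_new) < tol:
            return x_new
        # rank-1 ("good Broyden") update of the Jacobian approximation
        J += np.outer(F_new - F - J @ dx, dx) / (dx @ dx)
        x, F = x_new, F_new
    raise RuntimeError("Broyden iteration did not converge")

# toy fixed-point problem: x = cos(x), component-wise
x_star = broyden_fixed_point(np.cos, np.array([0.5, 1.0]))
```

Each iteration costs one evaluation of $G$ plus a small linear solve, with no explicit Jacobian of $G$ required.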

This approach relies on approximation and inversion of the operator appearing on the left-hand side of (5). Reference [4] is a detailed account of the case $\mu(A) = 0$, i.e. the HRWE. For $\mu(A) = 1$ or $2$, the operator mixes the $U^A$ and $V^A$ in $\psi^A = U^A + \mathrm{i}V^A$. We have not yet described our treatment of this scenario, but note that it relies on Schur-complement techniques. Here we describe only the $\mu(A) = 0$ case.

#### *3.2 Integration Preconditioning and Other Key Aspects*

We use "integration preconditioning" [11] in order to achieve sparse linear systems. Especially for the cylinders and inner shells, the details are formidable. We convey the basic ideas with the Laplacian $\Delta$, rather than $L$, on a cube (suppressing tildes on the co-rotating coordinates). Let $u(x, y, z) \approx \sum_{i=0}^{N_x}\sum_{j=0}^{N_y}\sum_{k=0}^{N_z} \widetilde{u}_{ijk}\,T_i(x)T_j(y)T_k(z)$ obey $\Delta u = g$ on $\mathcal{C} = [-1, 1]^3$. An approximation of the Poisson equation is

$$\left(D_x^2 \otimes I_y \otimes I_z + I_x \otimes D_y^2 \otimes I_z + I_x \otimes I_y \otimes D_z^2\right)\widetilde{\mathbf{u}} = \widetilde{\mathbf{g}},\tag{9}$$

with $\widetilde{\mathbf{u}}$ the vector of the $\widetilde{u}_{ijk}$ in an appropriate ordering, and $D^2$ representing double differentiation in the modal Chebyshev basis. Let $B^2_{[2]}$ represent double integration in the modal basis, where the subscript $[2]$ indicates that the first two rows have all zero entries. To (9) we apply the "preconditioner" $\mathcal{B} = B^2_{x[2]} \otimes B^2_{y[2]} \otimes B^2_{z[2]}$, thereby reaching

$$\left(I_{x[2]} \otimes B^2_{y[2]} \otimes B^2_{z[2]} + B^2_{x[2]} \otimes I_{y[2]} \otimes B^2_{z[2]} + B^2_{x[2]} \otimes B^2_{y[2]} \otimes I_{z[2]}\right)\widetilde{\mathbf{u}} = \mathcal{B}\,\widetilde{\mathbf{g}}.\tag{10}$$

The system (10) is sparse, and the number of *empty rows* (all 0's) equals the number of *tau-conditions* to be enforced: auxiliary equations enforcing, say, $u|_{\partial\mathcal{C}} = f$.
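The one-dimensional identity underlying this construction, $B^2_{[2]}D^2 = I_{[2]}$ (the identity with its first two rows zeroed), can be checked directly. In this Python sketch (using NumPy's Chebyshev utilities; the degree `N` is arbitrary), we build the modal double-differentiation and double-integration matrices column by column and verify the identity and the bandedness of $B^2_{[2]}$:

```python
import numpy as np
from numpy.polynomial import chebyshev as C

N = 12                                  # highest Chebyshev degree (arbitrary)

def op_matrix(op, n):
    """Matrix of a linear operator acting on modal Chebyshev coefficients."""
    M = np.zeros((n, n))
    for j in range(n):
        e = np.zeros(n); e[j] = 1.0
        c = op(e)
        M[:min(n, len(c)), j] = c[:n]
    return M

D2 = op_matrix(lambda c: C.chebder(c, 2), N + 1)   # double differentiation (dense, triangular)
B2 = op_matrix(lambda c: C.chebint(c, 2), N + 1)   # double integration (banded)
B2[:2, :] = 0.0                                     # zero the first two rows: B^2_{[2]}

E = B2 @ D2   # identity with first two empty rows, which host the tau-conditions
```

The two empty rows of `E` are exactly where boundary (tau) conditions are inserted, while `B2` has bandwidth 2, in contrast to the dense upper-triangular `D2`.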

"Integration preconditioning" of (9) results in the sparse system (10), but the issue of condition number is subtle. Indeed, passage from (9) to (10) arguably *worsens conditioning*. For this reason, integration preconditioning has been viewed as bad for PDE; we therefore call the technique *integration sparsification*. Conditioning issues are then surmounted either by further *genuine* preconditioning on top of the sparsification or by fast direct solves. Our use of 2-center domains with sparse modal-tau methods features: (a) sparse representation of $L$ on subdomains; (b) "gluing" of conforming and overlapping subdomains; (c) modal-based preconditioning of subdomain solves; (d) "fast" direct solves on blocks and $S_{\text{out}}$; (e) standard global preconditioning; (f) low-rank treatment of stellar surfaces; (g) nonlocal domain reduction.

#### *3.3 Current Complexity Estimates*

We aim to solve the linear systems (say approximating the HRWE or the Helmholtz equation posed on the 2-center domain $D$) arising in our problems at *demonstrably* sub-quadratic complexity; indeed, we believe that an $O(\mathsf{N}^{5/3})$ complexity is achievable. This is the complexity associated with matrix-vector multiplication; despite our sparse representations, the "gluing" of overlapping subdomains in our decomposition of $D$ prevents realization of a linear-complexity matrix-vector product.

To document progress towards our goal, we summarize our current complexity estimates associated with solution of the HRWE on each subdomain type. These solves serve as part of our preconditioner for the global GMRES solution of the HRWE on $D$. For this discussion, let $\mathsf{N}$ represent the total number of modes on a given subdomain; e.g. $\mathsf{N} = (N_x + 1)(N_y + 1)(N_z + 1)$ for the block considered above.

For $S_{\text{out}}$, let $\mathsf{M}$ be the matrix which represents $r^2 L$ (the $r^2$ factor here is explained in [4]) and includes inserted tau-vectors to enforce boundary conditions. Ignoring tau-vectors, $\mathsf{M}$ is block diagonal in the spectral space of spherical harmonics indexed by $(\ell, m)$. View its elements as $\mathsf{M}_{\ell mk,\,\ell'm'k'}$, where $\ell mk$ is a "clumped index" and $k, k'$ are Chebyshev indices. Then, apart from tau-vectors, $\mathsf{M}_{\ell mk,\,\ell'm'k'} = 0$ unless $\ell = \ell'$ and $m = m'$. Moreover, each $(N_r + 1) \times (N_r + 1)$ block $\mathsf{M}_{\ell mk,\,\ell mk'}$ is itself sparse and banded. While these desirable structures are somewhat spoiled by the tau-conditions, through the use of the Woodbury formula and band solvers, for $S_{\text{out}}$ we are able to directly invert $\mathsf{M}$ (i.e. solve the HRWE on $S_{\text{out}}$) at $O(\mathsf{N})$ cost.
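The combination of a band solver with the Woodbury formula can be illustrated generically. In this Python/SciPy sketch (the tridiagonal matrix and random tau rows are placeholders, not the actual modal blocks), a banded system spoiled by a low-rank correction $UV^{\mathsf{T}}$ is solved using only banded solves plus one small dense solve:

```python
import numpy as np
from scipy.linalg import solve_banded

rng = np.random.default_rng(0)
n, p = 40, 2                       # system size, rank of the tau correction

# banded (tridiagonal) matrix A, stored both densely and in banded form
main = 2.0 + rng.random(n)
off = -1.0 + 0.1*rng.random(n - 1)
A = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
ab = np.zeros((3, n)); ab[0, 1:] = off; ab[1] = main; ab[2, :-1] = off

# low-rank correction U V^T modeling dense inserted tau rows (rows 0..p-1)
U = np.zeros((n, p)); U[:p, :] = np.eye(p)
V = rng.random((n, p))
b = rng.random(n)

band_solve = lambda rhs: solve_banded((1, 1), ab, rhs)

# Woodbury: (A + U V^T)^{-1} b = A^{-1}b - A^{-1}U (I + V^T A^{-1}U)^{-1} V^T A^{-1}b
Ainv_b, Ainv_U = band_solve(b), band_solve(U)
x = Ainv_b - Ainv_U @ np.linalg.solve(np.eye(p) + V.T @ Ainv_U, V.T @ Ainv_b)
```

All $O(n)$ work happens inside the band solver; the dense solve involves only a $p \times p$ capacitance matrix, so the overall cost stays linear in $n$ for fixed rank $p$.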

For the inner shells, $S_I$ and $S_{II}$, our representation $\mathsf{M}$ of $r^2 L$ is only block banded. Indeed, the centers of these shells lie off the rotation axis, and $r^2 L$ mixes spherical-harmonic modes. Its spectral representation [4] is remarkably complicated, and relies on identities for spherical harmonics found in the treatise [12]. We solve the HRWE on inner shells via preconditioned GMRES, with a modal block-Jacobi preconditioner defined by inversion of the diagonal blocks $\mathsf{M}_{\ell mk,\,\ell mk'}$. Apart from tau-conditions, these blocks are again sparse and banded, and therefore amenable to the fast methods described above. Construction and reuse of this block-Jacobi preconditioner therefore has $O(\mathsf{N})$ cost. Moreover, we have empirically observed (see [4]) that such preconditioning yields low and essentially *resolution-independent* iteration counts. While more analysis is needed, from a practical standpoint solution of the HRWE on an inner shell has an $O(\mathsf{N})$ cost.

The situation on blocks is worse. Part of our global preconditioner for solving the HRWE on $D$ involves inversion of the Poisson problem on blocks as an approximation to the HRWE (and inversion of the Helmholtz equation when the spin index $\mu(A) \neq 0$). This works extremely well, likely because $\Omega \ll 1$ and the blocks are close to the rotation center. In any case, we solve the Poisson/Helmholtz problem on a block via a direct approach [13] based on a rank-augmenting generalization of the Woodbury formula. This direct method is empirically well-conditioned and low-memory. Moreover, it has an $O(\mathsf{N}^2)$ set-up cost with a small constant, followed by an $O(\mathsf{N}^{4/3})$ cost for subsequent solves. If possible, we hope to reduce the set-up cost to an $O(\mathsf{N}^{5/3})$ complexity.

The situation on cylinders is worst of all, although to date we have not focused much attention on these subdomains. We solve the HRWE on cylinders (or the collection of cylinders) via GMRES, with modal-based preconditioners that empirically yield resolution-independent iteration counts. Application of the preconditioner currently involves an $O(\mathsf{N}^{7/3})$ set-up cost, followed by an $O(\mathsf{N}^{5/3})$ cost for subsequent solves. Here, we believe improvement is possible.

#### **4 Numerical Tests**

Our decomposition of $D$ is from Table IV of [4], except here $\partial S_{\text{out}}$ has $r_{\text{out}} = 15$. For that table, $\partial S_I^-$ has radius $r_{I,\min} = 0.4$ and center $(\tilde{x}_I, 0, 0) = (-0.9, 0, 0)$, and $\partial S_{II}^-$ has radius $r_{II,\min} = 0.3$ and center $(\tilde{x}_{II}, 0, 0) = (1.0, 0, 0)$. The coordinates $\widetilde{X}, \widetilde{Y}, \widetilde{Z}$ in [4] are $\tilde{y}, \tilde{z}, \tilde{x}$ here. The subdomain truncations (numbers of modes) adopted here are nearly the same as those in [4]; however, here we only record the total number of modes over all subdomains. Unless otherwise stated, $\Omega = 0.075$, $M_I = 0.05$, and $M_{II} = 0.1$.

#### *4.1 Comparison with Exact Solutions*

Our first test uses PriceLG, the aforementioned solution for linearized gravity due to Price. Here the solution is a superposition of two point sources with the above masses. Each source point's contribution to the helical scalars is defined by the leading $1/\mathcal{R}$ term in (12), and these expressions seed inner Dirichlet conditions on $\partial S_I^- \cup \partial S_{II}^-$. The outer boundary conditions are nonlocal conditions which are exact for this solution. Table 1 lists errors for $\psi^{(nn)}$; errors for the other scalars are similar. These are relative $L^2$-errors (against the exact solution) computed on both $B_3$ and $S_{\text{out}}$ via interpolation onto uniform reference grids. Since the problem is linear, each line of the table corresponds to a single GMRES solve performed in parallel on 10 nodes. For the middle line of the table, about 0.4% of the matrix entries are nonzero. The last line of the table has $\|\partial_\nu h^{\mu\nu}\|_{\text{rms}} = $ (3.9420e-12, 1.7310e-03, 8.9752e-05, 9.0076e-16), with the rms calculation taken over the (relatively coarse) dual-nodal subdomain grids. The first and last components of $\partial_\nu h^{\mu\nu}$ vanish for the exact PriceLG solution.

Our next test is SchwarzH, the Schwarzschild metric in harmonic coordinates:

$$h^{tt} = -1 + r^{-2}(r+M)^3/(r-M), \qquad h^{jk} = r^{-2}M^2\nu^j\nu^k. \tag{11}$$

Here the radius $r$ and the direction cosines $\nu = (\sin\theta\cos\varphi,\, \sin\theta\sin\varphi,\, \cos\theta)$ are chosen relative to a point $(x_0, y_0, z_0) = $ (-0.9+1.37e-3, -1.6854e-4, 2.9985e-3) which is off-center but close to the center of $\partial S_I^-$. The mass is $M = 0.05$, and for this choice the horizon of the black hole lies inside of (but is not concentric with) $\partial S_I^-$. For this test $\Omega = 0$, and the exact solution (11) seeds inner Dirichlet boundary conditions on both $\partial S_I^-$ and $\partial S_{II}^-$. On $\partial S_{\text{out}}^+$, rather than radiation conditions, we adopt an inhomogeneous Neumann condition based on the exact solution. Table 2 lists


**Table 1** PriceLG solution

Here relative $L^2$ errors are listed only for $\psi^{(nn)}$. The lowest-resolution run has a zero initial iterate; afterwards, the initial iterate stems from the previous solution. Respectively, $t_{\text{GMRES}}$ and $i_{\text{GMRES}}$ are the tolerance and iteration count for the GMRES solve


**Table 2** SchwarzH solution

As for the PriceLG test, only errors for $\psi^{(nn)}$ are listed


**Table 3** Full GR with the same boundary conditions as PriceLG

errors with the same meanings as before. Now $\|\partial_\nu h^{\mu\nu}\|_{\text{rms}}$ for each $\mu$ is comparable to the corresponding table errors; i.e. the gauge constraint converges to zero. Table 2 also lists the number $i_{\text{BROY}}$ of iterations performed by the Broyden solver to achieve the tolerance $t_{\text{BROY}}$. Each Broyden iteration itself involves a linear solve via GMRES. Each GMRES tolerance is $t_{\text{BROY}}/10$, the same as the corresponding line in Table 1.
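The SchwarzH exact solution (11) is simple to evaluate. A minimal Python sketch (the helper name and far-field test point are illustrative):

```python
import numpy as np

def schwarzh(x, y, z, M=0.05, center=(0.0, 0.0, 0.0)):
    """Evaluate h^{tt} and h^{jk} of Eq. (11) about a given center."""
    d = np.array([x, y, z], dtype=float) - np.asarray(center, dtype=float)
    r = np.linalg.norm(d)
    nu = d / r                                  # direction cosines
    htt = -1.0 + (r + M)**3 / (r**2 * (r - M))
    hjk = (M**2 / r**2) * np.outer(nu, nu)
    return htt, hjk

# far from the hole the perturbation decays: h^{tt} ~ 4M/r
htt, hjk = schwarzh(100.0, 0.0, 0.0)
```

Expanding $(r+M)^3/(r^2(r-M))$ for $M/r \ll 1$ gives $1 + 4M/r + O(M^2/r^2)$, so $h^{tt} \to 4M/r$ in the far field, consistent with the leading $1/\mathcal{R}$ behavior elsewhere in the paper.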

#### *4.2 Gauge Constraint Tests*

Our next two tests explore to what extent the gauge constraint $\partial_\nu h^{\mu\nu} = 0$ is satisfied for the Einstein problem (5) and (6) in a binary scenario. For the first test we redo the PriceLG test, except now with the Einstein equations. The inner and outer boundary conditions are exactly as before. Table 3 again lists errors for $\psi^{(nn)}$ with the same meanings as before. Errors are computed against the finest-resolution numerical solution. The table also lists the $L^2$-norm of the nonlinear residual. The last line of the table has $\|\partial_\nu h^{\mu\nu}\|_{\text{rms}} = $ (3.2031e-03, 5.1309e-03, 4.5881e-03, 4.6711e-03). That is, the gauge constraint does not converge to zero. Since the harmonic gauge is not satisfied, we cannot view these as solutions to the Einstein equations!

Presumably, the violation of the harmonic gauge in the preceding example stems from the fact that $\partial_\nu h^{\mu\nu} \neq 0$ for the Price solution used to fix the inner boundary conditions. Beetle, Bromley, Hernández, and Price have given a refined set of inner boundary conditions, based on the near-field asymptotics of a moving Schwarzschild black hole and meant to improve on the point-particle boundary conditions. The appendix lists (our understanding of) these conditions. With the hope that these refined boundary conditions would result in lower gauge errors, we have performed the previous test with them. Convergence of the numerical solution is similar, but $\|\partial_\nu h^{\mu\nu}\|_{\text{rms}}$ has comparable size and still does not converge to zero.

#### **5 Conclusions and Acknowledgments**

Our tentative conclusion for helically symmetric BBH models is that violation of the gauge constraint stems from imperfect inner boundary conditions. We have also found a persistent gauge error in our BNS models, although it is several orders of magnitude smaller in that context. The BNS model, with stars in place of excised regions, involves no inner boundary conditions. We believe that in this context it is the outer boundary conditions which give rise to the constraint violation. Likely, at both the inner and outer boundaries some of the helical scalars need to be fixed using the gauge constraint itself (similar to the "constraint-preserving boundary conditions" used in evolution codes). The author is grateful for correspondence with Richard H. Price, and for assistance from UNM's Center for Advanced Research Computing.

#### **Appendix: Beetle, Bromley, Hernández, and Price Inner Boundary Conditions**

This appendix lists expansions for the helical scalars which somewhat generalize the ones in [1]. We need two expansions, one for a "particle" at $(-a_I, 0, 0)$, and one at $(a_{II}, 0, 0)$. Each has its own mass $M_{I,II}$. The top choice of $\pm$ or $\mp$ refers to $II$ and the bottom to $I$. For both $a_I$ and $a_{II}$, we define $v = a\Omega$, $\gamma = (1 - v^2)^{-1/2}$, $\mathcal{R} = \gamma\lambda \mp v\gamma\rho\sin(\phi + \Omega\lambda)$, $K^R = -\rho\cos\phi \pm a\cos(\Omega\lambda) \pm v\gamma\mathcal{R}\sin(\Omega\lambda)$, and $K^I = \rho\sin\phi \pm a\sin(\Omega\lambda) \mp v\gamma\mathcal{R}\cos(\Omega\lambda)$. For $\lambda$ see after (2). Assuming $M/\mathcal{R} \ll 1$, we have

$$
\psi^{(nn)} \sim \gamma^2\left(\frac{4M}{\mathcal{R}} + \frac{7M^2}{\mathcal{R}^2}\right) + \frac{M^2(\lambda - \gamma\mathcal{R})^2}{\mathcal{R}^4} \tag{12a}
$$

$$
\psi^{(n0)} \sim 0 \cdot \left(\frac{4M}{\mathcal{R}} + \frac{7M^2}{\mathcal{R}^2}\right) + \sqrt{2}\,\frac{M^2(\lambda - \gamma\mathcal{R})\,z}{\mathcal{R}^4} \tag{12b}
$$

$$\psi^{(00)} \sim \frac{v^2\gamma^2}{\sqrt{3}}\left(\frac{4M}{\mathcal{R}} + \frac{7M^2}{\mathcal{R}^2}\right) + \frac{M^2\left[\lambda^2 - 2\gamma\lambda\mathcal{R} + (2 + v^2\gamma^2)\mathcal{R}^2\right]}{\sqrt{3}\,\mathcal{R}^4} \tag{12c}$$

$$\psi^{(20)} \sim -\frac{v^2\gamma^2}{\sqrt{6}}\left(\frac{4M}{\mathcal{R}} + \frac{7M^2}{\mathcal{R}^2}\right) - \frac{M^2\left[\lambda^2 - 3z^2 - 2\gamma\lambda\mathcal{R} + (2 + v^2\gamma^2)\mathcal{R}^2\right]}{\sqrt{6}\,\mathcal{R}^4} \tag{12d}$$

$$\psi^{(n1)} \sim \pm\mathrm{i}v\gamma^2 e^{\mathrm{i}\Omega\lambda}\left(\frac{4M}{\mathcal{R}} + \frac{7M^2}{\mathcal{R}^2}\right) + \frac{M^2(\lambda - \gamma\mathcal{R})(K^R + \mathrm{i}K^I)}{\mathcal{R}^4} \tag{12e}$$

$$
\psi^{(21)} \sim 0 \cdot \left(\frac{4M}{\mathcal{R}} + \frac{7M^2}{\mathcal{R}^2}\right) + \frac{M^2 z\,(K^R + \mathrm{i}K^I)}{\mathcal{R}^4} \tag{12f}
$$

$$\psi^{(22)} \sim -\frac{1}{2}v^2\gamma^2 e^{2\mathrm{i}\Omega\lambda}\left(\frac{4M}{\mathcal{R}} + \frac{7M^2}{\mathcal{R}^2}\right) + \frac{M^2}{2\mathcal{R}^4}\left[(K^R)^2 - (K^I)^2 + 2\mathrm{i}K^R K^I\right].\tag{12g}$$

Worried about a possible sign discrepancy with the results in [1], we have also considered (12a–g) with the sign of all correction terms (those with $\mathcal{R}^4$ in the denominator) flipped.

#### **References**



## **Spectral Analysis of Isogeometric Discretizations of 2D Curl-Div Problems with General Geometry**

**Mariarosa Mazza, Carla Manni, and Hendrik Speleers**

### **1 Introduction**

In this paper we focus on isogeometric Galerkin discretizations of the weighted curl-div operator

$$\mathcal{L}_{\alpha,\beta}\mathbf{u} := \alpha\, \nabla \times \nabla \times \mathbf{u} - \beta\, \nabla \nabla \cdot \mathbf{u}, \quad \alpha, \beta > 0. \tag{1}$$

This parameter-dependent operator appears in several problems, including the Stokes equation and the Maxwell equations [2]. Moreover, containing a weighting of the curl and div operators, it captures the essential features of the so-called Alfvén-like operator [14], which is of interest in magnetohydrodynamics [15]. We note that $\mathcal{L}_{\alpha,\beta}$ can be seen as a weighted Laplacian for vector fields (equivalently, a Hodge Laplacian for 1-forms). Indeed, when $\alpha = \beta = 1$, it is equal to the standard (negative) vector Laplace operator, i.e.,

$$\nabla \times \nabla \times \mathbf{u} - \nabla \nabla \cdot \mathbf{u} = -\nabla^2 \mathbf{u}.$$

We assume that (1) is defined on a sufficiently smooth domain $\Omega \subset \mathbb{R}^2$ that can be described through a geometry map $G : [0,1]^2 \to \Omega$, and we consider homogeneous

M. Mazza (✉)
Max Planck Institute for Plasma Physics, Garching, Germany
e-mail: mariarosa.mazza@ipp.mpg.de

C. Manni · H. Speleers
Department of Mathematics, University of Rome Tor Vergata, Rome, Italy
e-mail: manni@mat.uniroma2.it; speleers@mat.uniroma2.it

© The Author(s) 2020
S. J. Sherwin et al. (eds.), *Spectral and High Order Methods for Partial Differential Equations ICOSAHOM 2018*, Lecture Notes in Computational Science and Engineering 134, https://doi.org/10.1007/978-3-030-39647-3\_19

Dirichlet (no-slip) boundary conditions, i.e., $\mathbf{u} = \mathbf{0}$ on $\partial\Omega$. This leads us to the following variational formulation:

$$(\mathcal{L}_{\alpha,\beta}\mathbf{u}, \mathbf{v}) = \alpha(\nabla \times \mathbf{u}, \nabla \times \mathbf{v}) + \beta(\nabla \cdot \mathbf{u}, \nabla \cdot \mathbf{v}), \quad \mathbf{u}, \mathbf{v} \in \left(H_0^1(\Omega)\right)^2. \tag{2}$$

We refer the reader to [3, 15] for a discussion about well-posedness.

To find an approximate solution of the problem $\mathcal{L}_{\alpha,\beta}\mathbf{u} = \mathbf{f}$, with the stated boundary conditions, we consider the variational formulation (2) in a finite-dimensional vector space $\mathbb{V}_h \subset \left(H_0^1(\Omega)\right)^2$, i.e.,

$$(\mathcal{L}_{\alpha,\beta}\mathbf{u}_h, \mathbf{v}_h) = \alpha(\nabla \times \mathbf{u}_h, \nabla \times \mathbf{v}_h) + \beta(\nabla \cdot \mathbf{u}_h, \nabla \cdot \mathbf{v}_h), \quad \mathbf{u}_h, \mathbf{v}_h \in \mathbb{V}_h. \tag{3}$$

We focus on isogeometric analysis (IgA) as the discretization technique, where the approximation space $\mathbb{V}_h$ is chosen to be composed of vector fields whose components are linear combinations of tensor-product B-splines mapped according to $G$.

The discretization (3) leads to solving linear systems, which turn out to be severely ill-conditioned and require ad hoc fast solvers for a proper treatment [4, 6, 15]. This requires a deep understanding of the spectral properties of the related matrices. They depend on many factors: the problem parameters *α*, *β*, the basic curl and div operators, the mesh-size, the degree of the B-spline approximation, and the map *G* used to describe the geometry of the computational domain.

In this paper we provide a spectral study of these matrices using the theory of (multilevel block) Toeplitz [13, 17, 19] and generalized locally Toeplitz [10–12] sequences. More precisely, we show that such matrices admit a spectral distribution which can be described in terms of a so-called *spectral symbol*. We determine this spectral symbol and we reveal its dependence on the characteristic parameters of the problem listed above. The spectral analysis presented in this paper extends the results of [15] to the case of non-trivial geometry and relies on the spectral theory developed for isogeometric discretizations of elliptic problems in [7, 8]. We also refer the reader to [16] for a spectral analysis of the curl-curl operator.

The remainder of the paper is organized as follows. In Sect. 2 we introduce notations and definitions relevant for our spectral analysis, and we recall the basics of B-splines. In Sect. 3 we detail the IgA discretization matrices and we perform a spectral analysis of them. We numerically illustrate those results in Sect. 4. Finally, we conclude the paper in Sect. 5.

#### **2 Preliminaries**

In this section we collect some preliminary tools on spectral analysis and IgA discretizations. In particular, we recall the formal definition of spectral distribution for a general matrix-sequence and the definition of (cardinal) B-splines.

#### *2.1 Spectral Distribution*

Throughout the paper, we follow the standard convention for operations with multi-indices (see e.g. [9, 18]). Given a multi-index $\mathbf{n} := (n_1, \ldots, n_d) \in \mathbb{N}^d$, we say $\mathbf{n} \to \infty$ if $n_i \to \infty$ for $i = 1, \ldots, d$. Let $C_0(\mathbb{C})$ be the set of continuous functions $F : \mathbb{C} \to \mathbb{C}$ with compact support.

**Definition 1** Let $f : D \to \mathbb{C}^{s \times s}$ be a measurable matrix-valued function, defined on a measurable set $D \subset \mathbb{R}^q$ with $q \ge 1$ and $0 < \mu_q(D) < \infty$, where $\mu_q$ is the Lebesgue measure. Let $\{A_{\mathbf{n}}\}_{\mathbf{n}}$ be a matrix-sequence with $\dim(A_{\mathbf{n}}) =: d_{\mathbf{n}}$ and $d_{\mathbf{n}} \to \infty$ as $\mathbf{n} \to \infty$. Then, $\{A_{\mathbf{n}}\}_{\mathbf{n}}$ is distributed like the pair $(f, D)$ in the sense of the eigenvalues, denoted by $\{A_{\mathbf{n}}\}_{\mathbf{n}} \sim_\lambda (f, D)$, if the following limit relation holds for all $F \in C_0(\mathbb{C})$:

$$\lim_{\mathbf{n} \to \infty} \frac{1}{d_{\mathbf{n}}} \sum_{j=1}^{d_{\mathbf{n}}} F(\lambda_j(A_{\mathbf{n}})) = \frac{1}{\mu_q(D)} \int_D \frac{1}{s} \sum_{l=1}^{s} F(\lambda_l(f(\mathbf{t})))\, d\mathbf{t}, \tag{4}$$

where $\lambda_j(A_{\mathbf{n}})$, $j = 1, \ldots, d_{\mathbf{n}}$, are the eigenvalues of $A_{\mathbf{n}}$ and $\lambda_l(f)$, $l = 1, \ldots, s$, are the eigenvalues of $f$. We say that $f$ is the (spectral) symbol of the matrix-sequence $\{A_{\mathbf{n}}\}_{\mathbf{n}}$.

If $f$ is smooth enough and the matrix-size of $A_{\mathbf{n}}$ is sufficiently large, then the limit relation (4) has the following informal meaning: a first set of $d_{\mathbf{n}}/s$ eigenvalues of $A_{\mathbf{n}}$ is approximated by a sampling of $\lambda_1(f)$ on a uniform equispaced grid of the domain $D$, a second set of $d_{\mathbf{n}}/s$ eigenvalues is approximated by a sampling of $\lambda_2(f)$ on the same grid, and so on, up to a few outliers.
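As a concrete illustration of this sampling interpretation (a sketch not taken from the paper; the matrix and symbol are the classical one-dimensional model case with $s = 1$), consider the Toeplitz matrix $T_n = \mathrm{tridiag}(-1, 2, -1)$, whose symbol is $f(\theta) = 2 - 2\cos\theta$ on $D = [-\pi, \pi]$:

```python
import numpy as np

n = 200
# 1D discrete Laplacian: Toeplitz matrix tridiag(-1, 2, -1),
# whose spectral symbol is f(theta) = 2 - 2*cos(theta).
T = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
eigs = np.sort(np.linalg.eigvalsh(T))

# Uniform sampling of the symbol; f is even, so [0, pi] suffices.
theta = np.linspace(0.0, np.pi, n)
samples = np.sort(2.0 - 2.0 * np.cos(theta))

# Sorted eigenvalues match the sorted symbol samples up to an o(1) error.
print(np.max(np.abs(eigs - samples)))  # small, shrinking as n grows
```

Here every eigenvalue of $T_n$ is of the form $2 - 2\cos(k\pi/(n+1))$, so the sorted spectrum is literally a near-uniform sampling of the symbol; for the block-valued symbols of this paper the same picture holds branch by branch.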

In general, understanding whether a matrix-sequence admits a symbol and how to compute it is not an easy task. On the other hand, any "reasonable" approximation of partial differential equations by local methods leads to matrix-sequences that are in the so-called generalized locally Toeplitz (GLT) algebra, and so admit a symbol [10–12]. The IgA discretization of our curl-div problem (3) fits in this frame.

#### *2.2 B-Splines*

For *p* ≥ 0 and *n* ≥ 1, consider the uniform knot sequence

$$\xi_1 = \cdots = \xi_{p+1} := 0 < \xi_{p+2} < \cdots < \xi_{p+n} < 1 =: \xi_{p+n+1} = \cdots = \xi_{2p+n+1},$$

where $\xi_{i+p+1} := \frac{i}{n}$, $i = 0, \ldots, n$. This knot sequence allows us to define $n + p$ B-splines of degree $p$. Let $\chi_I$ denote the characteristic function of the interval $I$.
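As a small sanity check (a sketch, not part of the original text), the open uniform knot vector above can be generated programmatically and its length checked against the stated count of $n + p$ B-splines:

```python
def open_uniform_knots(p, n):
    """Knots xi_1, ..., xi_{2p+n+1} of degree p on n uniform intervals of [0, 1]."""
    interior = [i / n for i in range(1, n)]            # xi_{p+2}, ..., xi_{p+n}
    return [0.0] * (p + 1) + interior + [1.0] * (p + 1)

p, n = 3, 8
knots = open_uniform_knots(p, n)
assert len(knots) == 2 * p + n + 1      # 2p + n + 1 knots in total
assert len(knots) - p - 1 == n + p      # number of degree-p B-splines
```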

**Definition 2** The B-splines of degree $p$ over a uniform mesh of $[0,1]$, consisting of $n$ intervals, are denoted by $N_i^p : [0,1] \to \mathbb{R}$, $i = 1, \ldots, n+p$, and defined recursively as follows: for $1 \le i \le n + 2p$,

$$N_i^0(x) := \chi_{[\xi_i, \xi_{i+1})}(x);$$

for 1 ≤ *k* ≤ *p* and 1 ≤ *i* ≤ *n* + 2*p* − *k*,

$$N_i^k(x) := \frac{x - \xi_i}{\xi_{i+k} - \xi_i} N_i^{k-1}(x) + \frac{\xi_{i+k+1} - x}{\xi_{i+k+1} - \xi_{i+1}} N_{i+1}^{k-1}(x),$$

where a fraction with zero denominator is assumed to be zero.

It is well known (see e.g. [1]) that the B-splines $N_i^p$, $i = 1, \ldots, n+p$, form a basis, and

$$N_i^p(0) = N_i^p(1) = 0, \quad i = 2, \ldots, n+p-1. \tag{5}$$

The central B-splines $N_i^p$, $i = p+1, \ldots, n$, are uniformly shifted and scaled versions of a single shape function, the so-called cardinal B-spline $\phi_p : \mathbb{R} \to \mathbb{R}$,

$$\phi\_0(t) := \chi\_{[0,1)}(t), \quad \phi\_p(t) := \frac{t}{p}\phi\_{p-1}(t) + \frac{p+1-t}{p}\phi\_{p-1}(t-1), \quad p \ge 1.$$

More precisely, we have

$$N\_i^{p}(\mathbf{x}) = \phi\_p(n\mathbf{x} - i + p + 1), \quad i = p + 1, \ldots, n.$$

The cardinal B-spline *φp* is a *Cp*−<sup>1</sup> function which is locally supported on the interval [0*, p* + 1].
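The two-term recurrence for $\phi_p$ translates directly into code. The following sketch (not part of the original text) evaluates the cardinal B-spline and illustrates its compact support and the partition-of-unity property of its integer shifts:

```python
def phi(p, t):
    """Cardinal B-spline of degree p, supported on [0, p+1]."""
    if t < 0.0 or t >= p + 1:
        return 0.0
    if p == 0:
        return 1.0  # characteristic function of [0, 1)
    # two-term recurrence from the text
    return (t / p) * phi(p - 1, t) + ((p + 1 - t) / p) * phi(p - 1, t - 1)

print(phi(2, 1.5))                               # 0.75, the peak of the quadratic B-spline
print(sum(phi(3, 0.6 + k) for k in range(4)))    # ~1.0: integer shifts sum to one
```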

Finally, we recall the definition of tensor-product B-splines.

**Definition 3** The tensor-product B-splines of bi-degree $\mathbf{p} := (p_1, p_2)$ over a uniform mesh of $[0,1]^2$, consisting of $\mathbf{n} := (n_1, n_2)$ intervals in each direction, are denoted by $N_{\mathbf{i}}^{\mathbf{p}} : [0,1]^2 \to \mathbb{R}$, $\mathbf{i} = \mathbf{1}, \ldots, \mathbf{n}+\mathbf{p}$, and defined as

$$N_{\mathbf{i}}^{\mathbf{p}} := N_{i_1}^{p_1} \otimes N_{i_2}^{p_2},$$

where $\mathbf{1} := (1, 1)$ and $\mathbf{i} := (i_1, i_2) \in \mathbb{N}^2$.

We define the tensor-product spline space $\mathbb{S}_{\mathbf{n}}^{\mathbf{p}}$ as

$$\mathbb{S}_{\mathbf{n}}^{\mathbf{p}} := \operatorname{span}\left\{ N_{\mathbf{i}}^{\mathbf{p}} : \mathbf{i} = \mathbf{2}, \ldots, \mathbf{n} + \mathbf{p} - \mathbf{1} \right\}. \tag{6}$$

Note that all the elements of this space vanish at the boundary of $[0,1]^2$; see (5). Hence, the space incorporates homogeneous Dirichlet boundary conditions.

#### **3 Spectral Analysis of Isogeometric Discretizations in 2D**

Suppose that the physical domain $\Omega$ can be described by a global geometry map $G := [G_1, G_2]^T$, $G : \widehat{\Omega} \to \Omega$, which is invertible in the parametric domain $\widehat{\Omega} := [0,1]^2$ and satisfies $G(\partial\widehat{\Omega}) = \partial\Omega$. Let

$$\mathbb{V}_h = \operatorname{span}\left\{ \phi_{i_1,i_2}^{p,1},\ \phi_{j_1,j_2}^{p,2} : i_l, j_l = 2, \ldots, n+p-1;\ l = 1, 2 \right\}, \tag{7}$$

where

$$
\phi\_{i\_1,i\_2}^{p,1} := \begin{bmatrix} \varphi\_{i\_1,i\_2} \\ 0 \end{bmatrix}, \quad \phi\_{j\_1,j\_2}^{p,2} := \begin{bmatrix} 0 \\ \varphi\_{j\_1,j\_2} \end{bmatrix},
$$

and for *kl* ∈ {*il, jl*}, *l* = 1*,* 2,

$$
\varphi\_{k\_1,k\_2}(\mathbf{x}\_1,\mathbf{x}\_2) := \hat{\varphi}\_{k\_1,k\_2}(\mathbf{G}^{-1}(\mathbf{x}\_1,\mathbf{x}\_2)) = \hat{\varphi}\_{k\_1,k\_2}(\hat{\mathbf{x}}\_1,\hat{\mathbf{x}}\_2), \quad (\mathbf{x}\_1,\mathbf{x}\_2) = \mathbf{G}(\hat{\mathbf{x}}\_1,\hat{\mathbf{x}}\_2).
$$

Then, we set $\hat{\varphi}_{k_1,k_2} = N_{k_1}^p \otimes N_{k_2}^p$, i.e., the tensor-product B-splines in (6). For simplicity of notation, we have taken $n_1 = n_2 = n$ and $p_1 = p_2 = p$. Also note that

$$\begin{split} \nabla \varphi\_{k\_{1},k\_{2}} &= (J\_{G})^{-T} \nabla (N\_{k\_{1}}^{p} \otimes N\_{k\_{2}}^{p}) \\ &= \frac{1}{\det(J\_{G})} \begin{bmatrix} \frac{\partial G\_{2}}{\partial \hat{\boldsymbol{x}}\_{2}} (N\_{k\_{1}}^{p})' \otimes N\_{k\_{2}}^{p} - \frac{\partial G\_{2}}{\partial \hat{\boldsymbol{x}}\_{1}} N\_{k\_{1}}^{p} \otimes (N\_{k\_{2}}^{p})' \\ -\frac{\partial G\_{1}}{\partial \hat{\boldsymbol{x}}\_{2}} (N\_{k\_{1}}^{p})' \otimes N\_{k\_{2}}^{p} + \frac{\partial G\_{1}}{\partial \hat{\boldsymbol{x}}\_{1}} N\_{k\_{1}}^{p} \otimes (N\_{k\_{2}}^{p})' \end{bmatrix}. \end{split}$$

where

$$J_G := \begin{bmatrix} \frac{\partial G_1}{\partial \hat{x}_1} & \frac{\partial G_1}{\partial \hat{x}_2} \\ \frac{\partial G_2}{\partial \hat{x}_1} & \frac{\partial G_2}{\partial \hat{x}_2} \end{bmatrix}.$$

In the following, we start by discussing the coefficient matrices arising from the IgA discretization of a generalized Poisson problem. Then, we construct the coefficient matrices related to the IgA discretization of our curl-div problem (3) using (7), and we perform a spectral analysis.

#### *3.1 Matrices Related to a Generalized Poisson Problem*

Let us focus on the following bivariate generalized Poisson operator:

$$
\mathcal{L}_K u := -\nabla \cdot (K \nabla u), \tag{8}
$$

where $K : \Omega \to \mathbb{R}^{2 \times 2}$, and consider homogeneous Dirichlet boundary conditions, i.e., $u = 0$ on $\partial\Omega$. From [8] we know that the Galerkin discretization of (8) using one component of the space (7) leads to the coefficient matrix $A_{\mathbf{n},G}^{p,K}$ defined by

$$\left[A_{\mathbf{n},G}^{p,K}\right]_{\mathbf{i},\mathbf{j}} := \int_{\widehat{\Omega}} \left[\nabla (N_{j_1+1}^p \otimes N_{j_2+1}^p)\right]^T K_G\, \nabla (N_{i_1+1}^p \otimes N_{i_2+1}^p)\, |\det(J_G)|,$$

where

$$K\_{\mathbf{G}} := (J\_{\mathbf{G}})^{-1} K(\mathbf{G}) (J\_{\mathbf{G}})^{-T}.$$

It has been proved in [8] that such matrices admit a spectral distribution according to Definition 1. To this end, let us define

$$H\_p := \begin{bmatrix} \mathfrak{s}\_p \otimes \mathfrak{m}\_p & \mathfrak{a}\_p \otimes \mathfrak{a}\_p \\ \mathfrak{a}\_p \otimes \mathfrak{a}\_p & \mathfrak{m}\_p \otimes \mathfrak{s}\_p \end{bmatrix},$$

with

$$\begin{aligned} \mathfrak{m}_p(\theta) &:= \phi_{2p+1}(p+1) + 2 \sum_{k=1}^p \phi_{2p+1}(p+1-k) \cos(k\theta), \\ \mathfrak{a}_p(\theta) &:= -2 \sum_{k=1}^p \phi'_{2p+1}(p+1-k) \sin(k\theta), \\ \mathfrak{s}_p(\theta) &:= -\phi''_{2p+1}(p+1) - 2 \sum_{k=1}^p \phi''_{2p+1}(p+1-k) \cos(k\theta). \end{aligned}$$
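For $p = 1$ these trigonometric polynomials reduce to the classical symbols of linear finite elements, $\mathfrak{m}_1(\theta) = (2+\cos\theta)/3$ and $\mathfrak{s}_1(\theta) = 2 - 2\cos\theta$. The following sketch (not from the paper) checks this numerically; it uses the standard B-spline derivative identity $\phi_p''(t) = \phi_{p-2}(t) - 2\phi_{p-2}(t-1) + \phi_{p-2}(t-2)$, a textbook fact rather than part of this chapter:

```python
import math

def phi(p, t):
    # cardinal B-spline of degree p, supported on [0, p+1]
    if t < 0.0 or t >= p + 1:
        return 0.0
    if p == 0:
        return 1.0
    return (t / p) * phi(p - 1, t) + ((p + 1 - t) / p) * phi(p - 1, t - 1)

def d2phi(p, t):
    # second derivative via phi_p''(t) = phi_{p-2}(t) - 2 phi_{p-2}(t-1) + phi_{p-2}(t-2)
    return phi(p - 2, t) - 2.0 * phi(p - 2, t - 1) + phi(p - 2, t - 2)

def m1(theta):  # mass symbol for p = 1 (built from phi_3)
    return phi(3, 2) + 2.0 * phi(3, 1) * math.cos(theta)

def s1(theta):  # stiffness symbol for p = 1
    return -d2phi(3, 2) - 2.0 * d2phi(3, 1) * math.cos(theta)

theta = 1.2
assert abs(m1(theta) - (2 + math.cos(theta)) / 3) < 1e-12
assert abs(s1(theta) - (2 - 2 * math.cos(theta))) < 1e-12
```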

**Theorem 1** *Let $G$ be a regular geometry map, i.e., $G \in C^1([0,1]^2)$ and $\det(J_G) \ne 0$ on $[0,1]^2$, and let $K$ be a symmetric matrix. Then, the matrix-sequence $\{A_{\mathbf{n},G}^{p,K}\}_{\mathbf{n}}$ with $\mathbf{n} = (n, n)$ is distributed, in the sense of the eigenvalues, like the function*

$$f_G^{p,K}(\hat{\mathbf{x}}, \boldsymbol{\theta}) := \begin{bmatrix} 1 & 1 \end{bmatrix} \left(\left|\det(J_G(\hat{\mathbf{x}}))\right| K_G(\hat{\mathbf{x}}) \circ H_p(\boldsymbol{\theta})\right) \begin{bmatrix} 1 & 1 \end{bmatrix}^T, \tag{9}$$

*where $\hat{\mathbf{x}} \in [0,1]^2$, $\boldsymbol{\theta} \in [-\pi,\pi]^2$, and $\circ$ is the Hadamard (entrywise) matrix product.*

We refer the reader to [8, 9] for a detailed discussion about the symbol (9).

#### *3.2 Matrices Related to Our Curl-Div Problem*

We can reformulate (1) in 2D as

$$\mathcal{L}_{\alpha,\beta}\mathbf{u} = \alpha \begin{bmatrix} \frac{\partial^2 u_2}{\partial x_1 \partial x_2} - \frac{\partial^2 u_1}{\partial x_2^2} \\ \frac{\partial^2 u_1}{\partial x_1 \partial x_2} - \frac{\partial^2 u_2}{\partial x_1^2} \end{bmatrix} - \beta \begin{bmatrix} \frac{\partial^2 u_1}{\partial x_1^2} + \frac{\partial^2 u_2}{\partial x_1 \partial x_2} \\ \frac{\partial^2 u_1}{\partial x_1 \partial x_2} + \frac{\partial^2 u_2}{\partial x_2^2} \end{bmatrix}, \tag{10}$$

where $\mathbf{u}(x_1, x_2) := [u_1(x_1, x_2),\, u_2(x_1, x_2)]^T$. When discretizing the weak form (3) using the space (7) we arrive at the $2 \times 2$ block matrix

$$
A_{\mathbf{n},G}^{p,\alpha,\beta} := \alpha \begin{bmatrix}
A_{\mathbf{n},11}^{p,\mathrm{curl}} & A_{\mathbf{n},12}^{p,\mathrm{curl}} \\
A_{\mathbf{n},21}^{p,\mathrm{curl}} & A_{\mathbf{n},22}^{p,\mathrm{curl}}
\end{bmatrix} + \beta \begin{bmatrix}
A_{\mathbf{n},11}^{p,\mathrm{div}} & A_{\mathbf{n},12}^{p,\mathrm{div}} \\
A_{\mathbf{n},21}^{p,\mathrm{div}} & A_{\mathbf{n},22}^{p,\mathrm{div}}
\end{bmatrix}.
$$

The blocks related to the curl-curl term $(\nabla \times \mathbf{u}_h, \nabla \times \mathbf{v}_h)$ are given by

$$\begin{aligned}
\left[A_{\mathbf{n},11}^{p,\mathrm{curl}}\right]_{\mathbf{i},\mathbf{j}} &= \int_{\widehat{\Omega}} \left[-\tfrac{\partial G_1}{\partial \hat{x}_2} (N_{j_1+1}^p)' \otimes N_{j_2+1}^p + \tfrac{\partial G_1}{\partial \hat{x}_1} N_{j_1+1}^p \otimes (N_{j_2+1}^p)'\right] \left[-\tfrac{\partial G_1}{\partial \hat{x}_2} (N_{i_1+1}^p)' \otimes N_{i_2+1}^p + \tfrac{\partial G_1}{\partial \hat{x}_1} N_{i_1+1}^p \otimes (N_{i_2+1}^p)'\right] \frac{1}{|\det(J_G)|}, \\
\left[A_{\mathbf{n},12}^{p,\mathrm{curl}}\right]_{\mathbf{i},\mathbf{j}} &= -\int_{\widehat{\Omega}} \left[\tfrac{\partial G_2}{\partial \hat{x}_2} (N_{j_1+1}^p)' \otimes N_{j_2+1}^p - \tfrac{\partial G_2}{\partial \hat{x}_1} N_{j_1+1}^p \otimes (N_{j_2+1}^p)'\right] \left[-\tfrac{\partial G_1}{\partial \hat{x}_2} (N_{i_1+1}^p)' \otimes N_{i_2+1}^p + \tfrac{\partial G_1}{\partial \hat{x}_1} N_{i_1+1}^p \otimes (N_{i_2+1}^p)'\right] \frac{1}{|\det(J_G)|}, \\
\left[A_{\mathbf{n},22}^{p,\mathrm{curl}}\right]_{\mathbf{i},\mathbf{j}} &= \int_{\widehat{\Omega}} \left[\tfrac{\partial G_2}{\partial \hat{x}_2} (N_{j_1+1}^p)' \otimes N_{j_2+1}^p - \tfrac{\partial G_2}{\partial \hat{x}_1} N_{j_1+1}^p \otimes (N_{j_2+1}^p)'\right] \left[\tfrac{\partial G_2}{\partial \hat{x}_2} (N_{i_1+1}^p)' \otimes N_{i_2+1}^p - \tfrac{\partial G_2}{\partial \hat{x}_1} N_{i_1+1}^p \otimes (N_{i_2+1}^p)'\right] \frac{1}{|\det(J_G)|},
\end{aligned}$$

and $A_{\mathbf{n},21}^{p,\mathrm{curl}} = A_{\mathbf{n},12}^{p,\mathrm{curl}}$. Note that all those blocks are symmetric matrices. Similarly, the blocks related to the div-div term $(\nabla \cdot \mathbf{u}_h, \nabla \cdot \mathbf{v}_h)$ are given by (see also (10))

$$A_{\mathbf{n},11}^{p,\mathrm{div}} = A_{\mathbf{n},22}^{p,\mathrm{curl}}, \quad A_{\mathbf{n},12}^{p,\mathrm{div}} = A_{\mathbf{n},21}^{p,\mathrm{div}} = -A_{\mathbf{n},12}^{p,\mathrm{curl}}, \quad A_{\mathbf{n},22}^{p,\mathrm{div}} = A_{\mathbf{n},11}^{p,\mathrm{curl}}.$$

In the next subsection we compute the symbol of the matrix-sequence $\{A_{\mathbf{n},G}^{p,\alpha,\beta}\}_{\mathbf{n}}$.

#### *3.3 Spectral Symbol of the Curl-Div Matrices $A_{\mathbf{n},G}^{p,\alpha,\beta}$*

We are now ready for the main contribution of the paper: we show that the matrix-sequence $\{A_{\mathbf{n},G}^{p,\alpha,\beta}\}_{\mathbf{n}}$ admits a spectral distribution according to Definition 1. This extends the symbol computation in [15] to the case of non-trivial geometry.

**Theorem 2** *Let $G$ be a regular geometry map, i.e., $G \in C^1([0,1]^2)$ and $\det(J_G) \ne 0$ on $[0,1]^2$. Then, the matrix-sequence $\{A_{\mathbf{n},G}^{p,\alpha,\beta}\}_{\mathbf{n}}$ with $\mathbf{n} = (n, n)$ is distributed, in the sense of the eigenvalues, like the $2 \times 2$ matrix-valued function*

$$f\_{G}^{p,\alpha,\beta}(\hat{\mathfrak{x}},\theta) := \alpha f\_{G}^{p,\text{curl}}(\hat{\mathfrak{x}},\theta) + \beta f\_{G}^{p,\text{div}}(\hat{\mathfrak{x}},\theta),\tag{11}$$

*where $\hat{\mathbf{x}} \in [0,1]^2$, $\boldsymbol{\theta} \in [-\pi,\pi]^2$, and*

$$\begin{aligned} f_G^{p,\mathrm{curl}}(\hat{\mathbf{x}}, \boldsymbol{\theta}) &:= \frac{1}{|\det(J_G(\hat{\mathbf{x}}))|}\, J_G(\hat{\mathbf{x}})\, P\, H_p(\boldsymbol{\theta})\, P^T (J_G(\hat{\mathbf{x}}))^T, \quad P := \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}, \\ f_G^{p,\mathrm{div}}(\hat{\mathbf{x}}, \boldsymbol{\theta}) &:= |\det(J_G(\hat{\mathbf{x}}))|\, (J_G(\hat{\mathbf{x}}))^{-T}\, H_p(\boldsymbol{\theta})\, (J_G(\hat{\mathbf{x}}))^{-1}. \end{aligned}$$

*Proof* From (10) it follows that the block $A_{\mathbf{n},11}^{p,\mathrm{curl}}$ corresponds to the isogeometric discretization of $-\frac{\partial^2 u_1}{\partial x_2^2}$. By means of a direct computation we can verify that Theorem 1, with

$$K = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix},$$

ensures that the matrix-sequence $\{A_{\mathbf{n},11}^{p,\mathrm{curl}}\}_{\mathbf{n}}$ is distributed in the sense of the eigenvalues like the entry $(1,1)$ of the matrix $f_G^{p,\mathrm{curl}}$. The same argument (using a suitable matrix $K$) can also be applied to the remaining blocks. Then, it can be checked that all the considered blocks satisfy the hypotheses of [10, Theorem 5], which implies that $A_{\mathbf{n},G}^{p,\alpha,\beta}$ is similar, via a proper permutation matrix, to a matrix $T_{\mathbf{n},G}^{p,\alpha,\beta}$ such that the matrix-sequence $\{T_{\mathbf{n},G}^{p,\alpha,\beta}\}_{\mathbf{n}}$ has its symbol given by (11). □

In the context of IgA, the geometry map *G* is expressed in terms of the same B-spline basis as used for the discretization space. However, as can be seen from the proof, the spectral result in the above theorem holds for any (smooth enough) geometry map.

Finally, we remark that the $p$-dependence of the symbol in (11) is completely captured by the matrix $H_p(\boldsymbol{\theta})$. As described in Sect. 3.1, this matrix also appears in the symbol expression of a generalized Poisson problem; its properties have been discussed in [5, 8].
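A quick consistency check connects (11) back to the remark in the introduction that $\mathcal{L}_{1,1}$ is the (negative) vector Laplacian. For the identity geometry ($J_G = I$, $\det J_G = 1$) and $\alpha = \beta = 1$, a direct computation gives $f^{p,\mathrm{curl}} + f^{p,\mathrm{div}} = P H_p P^T + H_p = (\mathfrak{s}_p \otimes \mathfrak{m}_p + \mathfrak{m}_p \otimes \mathfrak{s}_p)\, I$, i.e., the symbol of the scalar Laplacian times the identity. The sketch below (not from the paper) verifies this for $p = 1$, using the values $\mathfrak{m}_1(\theta) = (2+\cos\theta)/3$, $\mathfrak{s}_1(\theta) = 2-2\cos\theta$, $\mathfrak{a}_1(\theta) = -\sin\theta$, which we derived from the definitions above rather than took from the chapter:

```python
import numpy as np

def m1(t): return (2.0 + np.cos(t)) / 3.0   # mass symbol, p = 1
def s1(t): return 2.0 - 2.0 * np.cos(t)     # stiffness symbol, p = 1
def a1(t): return -np.sin(t)                # mixed symbol, p = 1

def H(t1, t2):
    # H_p(theta) for p = 1; (f ⊗ g)(theta1, theta2) = f(theta1) g(theta2)
    return np.array([[s1(t1) * m1(t2), a1(t1) * a1(t2)],
                     [a1(t1) * a1(t2), m1(t1) * s1(t2)]])

P = np.array([[0.0, 1.0], [-1.0, 0.0]])

rng = np.random.default_rng(0)
for t1, t2 in rng.uniform(-np.pi, np.pi, size=(5, 2)):
    f_total = P @ H(t1, t2) @ P.T + H(t1, t2)            # alpha = beta = 1, J_G = I
    f_laplace = (s1(t1) * m1(t2) + m1(t1) * s1(t2)) * np.eye(2)
    assert np.allclose(f_total, f_laplace)
```

The off-diagonal entries cancel exactly because $H_p$ is symmetric, mirroring the cancellation of the mixed derivatives in (10) when $\alpha = \beta$.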

#### **4 Numerical Example**

In this section we numerically illustrate the spectral results obtained in Sect. 3.3, using the same test problem as in [15, Sect. 5]. More precisely, we consider (3) defined on a quarter of an annulus,

$$\Omega = \left\{(x_1, x_2) \in \mathbb{R}^2 : r^2 < x_1^2 + x_2^2 < R^2,\ x_1 > 0,\ x_2 > 0\right\}, \quad r = 1, \quad R = 4,$$

with

$$G(\hat{x}_1, \hat{x}_2) = \begin{cases} x_1 = \left[r + \hat{x}_1(R - r)\right] \cos\left(\frac{\pi}{2}\hat{x}_2\right) \\ x_2 = \left[r + \hat{x}_1(R - r)\right] \sin\left(\frac{\pi}{2}\hat{x}_2\right) \end{cases} \quad (\hat{x}_1, \hat{x}_2) \in [0, 1]^2.$$
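For this map the regularity assumption of Theorem 2 is easy to verify by direct differentiation: $\det J_G = \frac{\pi}{2}(R - r)\left[r + \hat{x}_1(R - r)\right] > 0$ on $[0,1]^2$. The short check below (a sketch, not part of the original text) confirms the closed form numerically:

```python
import math

r, R = 1.0, 4.0

def detJG(x1h, x2h):
    """det of the Jacobian of the quarter-annulus map, by direct differentiation."""
    rho = r + x1h * (R - r)      # radial coordinate
    a = 0.5 * math.pi * x2h      # angular coordinate
    # J_G = [[(R-r) cos a, -rho (pi/2) sin a],
    #        [(R-r) sin a,  rho (pi/2) cos a]]
    return (R - r) * math.cos(a) * rho * 0.5 * math.pi * math.cos(a) \
        + rho * 0.5 * math.pi * math.sin(a) * (R - r) * math.sin(a)

for x1h in (0.0, 0.5, 1.0):
    for x2h in (0.0, 0.5, 1.0):
        rho = r + x1h * (R - r)
        assert abs(detJG(x1h, x2h) - 0.5 * math.pi * (R - r) * rho) < 1e-12
        assert detJG(x1h, x2h) > 0.0
```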

Let us fix $\mathbf{n} := (n, n) \in \mathbb{N}^2$, $\mathbf{p} := (p, p) \in \mathbb{N}^2$, and $\mathbf{m} := (m, m) \in \mathbb{N}^2$ such that $\mathbf{m} = \mathbf{n} + \mathbf{p} - \mathbf{2}$. We start by defining two equispaced grids on $[0,1]^2$ and $[0,\pi]^2$:

$$\mathbf{x}_{\mathbf{j}} := \frac{\mathbf{j}}{m - 1}, \quad \boldsymbol{\theta}_{\mathbf{k}} := \frac{\mathbf{k}\pi}{m - 1}, \quad \mathbf{j}, \mathbf{k} = \mathbf{0}, \ldots, \mathbf{m} - \mathbf{1}.$$

Then, we denote by $\Lambda_i$ the set of all evaluations of $\lambda_i(f_G^{p,\alpha,\beta})$ on the grid $\{(\mathbf{x}_{\mathbf{j}}, \boldsymbol{\theta}_{\mathbf{k}}) : \mathbf{j}, \mathbf{k} = \mathbf{0}, \ldots, \mathbf{m} - \mathbf{1}\}$ for a fixed $i \in \{1, 2\}$. Note that it suffices to consider only $[0, \pi]^2$ because the symbol (11) is symmetric on $[-\pi, \pi]^2$, and hence so are its eigenvalue functions.

In Fig. 1 we numerically check relation (11) by comparing the eigenvalues of $A_{\mathbf{n},G}^{p,\alpha,\beta}$ with the values collected in $\Lambda = \{\Lambda_1, \Lambda_2\}$, ordered in ascending way, for $\alpha = 1$ and $\beta = 0.1$. We observe that, in complete agreement with the theory, the considered sampling of $\lambda_i(f_G^{p,\alpha,\beta})$, $i = 1, 2$, describes quite accurately the behavior of the eigenvalues of $A_{\mathbf{n},G}^{p,\alpha,\beta}$, even for relatively small matrix-sizes, up to a few outliers.

#### **5 Conclusions**

We have analyzed the spectral properties of matrix-sequences arising from isogeometric Galerkin methods for weighted curl-div operators on general planar domains, considering a non-trivial geometry map. More precisely, we have shown that an (asymptotic) spectral distribution exists and is compactly described by a $2 \times 2$ spectral symbol. In other words, the eigenvalues of the matrices we are dealing with can be approximated accurately by a uniform sampling of the two eigenvalue functions of the $2 \times 2$ symbol matrix. The symbol depends on the characteristic parameters of the problem and on the geometry of the physical domain. Its formal structure nicely mimics the structure of the differential problem. The numerical results show a very good matching between the true eigenvalues and the estimates provided by the symbol, already for relatively small matrix-sizes.

**Fig. 1** Comparison of the eigenvalues of $A_{\mathbf{n},G}^{p,\alpha,\beta}$ (open circles) with $\Lambda = \{\Lambda_1, \Lambda_2\}$ collecting uniform samples of $\lambda_i(f_G^{p,\alpha,\beta})$, $i = 1, 2$ (asterisks), ordered in ascending way, varying both $n$ and $p$, and fixing $\alpha = 1$ and $\beta = 0.1$. (**a**) $p = 3$, $n = 15$. (**b**) $p = 3$, $n = 35$. (**c**) $p = 4$, $n = 14$. (**d**) $p = 4$, $n = 34$. (**e**) $p = 5$, $n = 13$. (**f**) $p = 5$, $n = 33$

The convergence of iterative solvers for linear systems strongly depends on the spectral behavior of the corresponding coefficient matrices. Since the symbol gives a precise description of the spectrum of the curl-div matrix $A_{\mathbf{n},G}^{p,\alpha,\beta}$, it could be helpful in the design of good preconditioners that lead to better performance than current solution strategies, like the one in [15, Sect. 5].

**Acknowledgements** This work was partially supported by the INdAM research group GNCS, by the MIUR-DAAD Joint Mobility 2017 Programme through the project "ATOMA", and by the MIUR Excellence Department Project awarded to the Department of Mathematics, University of Rome Tor Vergata (CUP E83C18000100006).

#### **References**



## **Performance of Preconditioners for Large-Scale Simulations Using Nek5000**

**N. Offermans, A. Peplinski, O. Marin, E. Merzari, and P. Schlatter**

### **1 Introduction**

The preconditioning of elliptic problems, characterized by the propagation of information at infinite speed over the domain, is a numerically challenging task. We study the case of the Poisson equation arising from the numerical solution of the incompressible Navier–Stokes equations by operator splitting. We consider Nek5000, a code based on the spectral element method, as our framework. The current preconditioning strategy is based on an additive Schwarz method, which combines a domain decomposition method [5] and a so-called coarse grid problem [10]. The first step consists of directly solving local overlapping Poisson problems and is easily parallelizable. The second step corresponds to a Poisson-like problem over the whole domain and is hard to scale because of its relatively low number of degrees of freedom and the bottleneck induced by global communication.

A scalable solver for the coarse grid problem is critical to ensure strong scaling of the code. Existing strategies include a direct solution method similar to a Cholesky decomposition, called *XX<sup>T</sup>* [14], and an algebraic multigrid (AMG) solver [6]. While the first choice works well for relatively small problems (typically *<*100*,*000 spectral elements on *<*10*,*000 cores), the second option is preferred for large scale simulations. The current AMG solver, which we will denote as the *in-house AMG*, is fast and scales well [11]. It has been shown that the use of AMG can speed up

N. Offermans (✉) · A. Peplinski · P. Schlatter
Linné FLOW Centre and Swedish e-Science Research Centre (SeRC), KTH Mechanics, Stockholm, Sweden
e-mail: nof@mech.kth.se; adam@mech.kth.se; pschlatt@mech.kth.se

O. Marin · E. Merzari
Argonne National Laboratory, Lemont, IL, USA
e-mail: oanam@mcs.anl.gov; emerzari@anl.gov

S. J. Sherwin et al. (eds.), *Spectral and High Order Methods for Partial Differential Equations ICOSAHOM 2018*, Lecture Notes in Computational Science and Engineering 134, https://doi.org/10.1007/978-3-030-39647-3\_20

large-scale simulations by up to 10%. In addition, the *XX<sup>T</sup>* solver has been designed for optimal performance when the number of cores is a power of 2, whereas the AMG solver is insensitive to this parameter.

However, the AMG solver requires a setup phase, performed once for each mesh, by an external and serial code. Besides inducing an unwanted overhead, it also limits the use of the *in-house AMG* solver in the framework of mesh refinement, which is the main motivation for this work. Therefore, we propose to replace the *in-house AMG* by *BoomerAMG*, a parallel AMG solver for arbitrary unstructured grids from the *hypre* library for linear algebra [1, 4, 8]. *BoomerAMG* offers a number of parallel algorithms for the coarsening, interpolation and smoothing steps of the AMG setup, to accommodate various types of problems, meshes and architectures. The *BoomerAMG* solver will be tested in terms of scalability and time to solution.

Scaling tests for the *BoomerAMG* solver have been performed up to 4096 cores by Baker et al. [1]. Matrices arising from the finite element and finite difference discretizations of 2D and 3D scalar diffusion problems were considered. The authors used HMIS coarsening and extended+i interpolation and showed that *l*1-scaled Jacobi, *l*1-scaled Gauss–Seidel and Chebyshev smoothers are good choices for such problems.

Weak scaling up to 125,000 cores has been presented in Ref. [2], where *BoomerAMG* was used as a preconditioner for a conjugate gradient solver. The test case considered is that of a 3D Laplace operator. The parameters for the AMG solver were again HMIS coarsening, extended+i interpolation and symmetric hybrid Gauss–Seidel for the smoother. Aggressive coarsening with multipass interpolation was used on the finest grid, while the problem on the coarsest level was solved by Gaussian elimination. The authors show the impact of additional parameters such as the use of 64-bit integers or the use of a hybrid parallel strategy with OpenMP and MPI.

In the present work, we use the *BoomerAMG* from *hypre* to precondition a GMRES solver for the pressure equation arising from the spectral element discretization of the Navier–Stokes equations. We study strong scaling up to 131,072 cores on two different supercomputers: Mira, based on the IBM Blue Gene/Q architecture, and Hazel Hen, a Cray XC40 system. The first test case considered is the flow around a NACA4412 airfoil, which we use to identify a set of best parameters for the *BoomerAMG* solver. A second test case is employed for a strong scaling study: the turbulent flow in wire-wrapped pin bundles [3, 13].

The paper is organized as follows. In Sect. 2, we introduce the discretization method and describe the preconditioning strategy and the *hypre* library. In Sect. 3, we study which set of parameters gives the fastest time to solution for the problem at hand. Using those parameters, we perform a strong scaling study for the flow in wire-wrapped pin bundles in Sect. 4. We finish with conclusions and outlook in Sect. 5.

#### **2 Problem Description**

Considering an operator splitting strategy to solve the Navier–Stokes equations, the consistent pressure *p* is the solution to a Laplace problem of the form $\nabla^2 p = r$. In Nek5000, this equation is discretized using the spectral element method [7] and it has been shown that it is well preconditioned by an overlapping additive Schwarz method. The preconditioner combines local problems $\mathbf{R}_k^T \mathbf{A}_k^{-1} \mathbf{R}_k$ and a coarse grid problem $\mathbf{R}_0^T \mathbf{A}_0^{-1} \mathbf{R}_0$ and is expressed as

$$\mathbf{M}^{-1} = \mathbf{R}\_0^\mathsf{T} \mathbf{A}\_0^{-1} \mathbf{R}\_0 + \sum\_{k=1}^K \mathbf{R}\_k^\mathsf{T} \mathbf{A}\_k^{-1} \mathbf{R}\_k \ .$$

where $\mathbf{R}_0$ and $\mathbf{R}_k$ are restriction operators, $\mathbf{A}_k$ are local stiffness matrices and *K* is the total number of spectral elements. The matrix $\mathbf{A}_0$ corresponds to a Laplace operator defined on the element vertices only. Because of its low number of degrees of freedom and global extent, the scalability of the coarse grid problem is mostly limited by communication and latency. We note that the term "coarse grid" here refers to the fact that $\mathbf{A}_0$ is defined on the vertices of the spectral elements only. The problem is therefore "coarse" in comparison to the solution fields, which are expanded on the Gauss–Lobatto–Legendre points inside each spectral element (typically order 10 quadrature points in each direction). When talking about the different levels arising from the coarsening phase of the AMG setup, we will use the term "coarse level" to avoid confusion.
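To make the structure of the preconditioner concrete, the following toy sketch (ours, not the Nek5000 implementation; sizes and overlaps are arbitrary) applies $\mathbf{M}^{-1}$ to a small 1D Poisson matrix inside a preconditioned conjugate gradient loop: overlapping index blocks play the role of the local problems $\mathbf{A}_k$, and a Galerkin operator on every fourth point plays the role of $\mathbf{A}_0$.

```python
import numpy as np

n = 16
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)  # 1D Poisson stiffness

# Overlapping local blocks (the A_k) and a coarse space (the A_0) built by
# Galerkin projection onto every fourth grid point; sizes are illustrative.
blocks = [np.arange(0, 6), np.arange(4, 12), np.arange(10, 16)]
coarse = np.arange(0, n, 4)
R0 = np.eye(n)[coarse]               # injection restriction to coarse points
A0 = R0 @ A @ R0.T                   # coarse grid operator

def apply_precond(r):
    """M^{-1} r = R0^T A0^{-1} R0 r + sum_k R_k^T A_k^{-1} R_k r."""
    z = R0.T @ np.linalg.solve(A0, R0 @ r)
    for idx in blocks:
        z[idx] += np.linalg.solve(A[np.ix_(idx, idx)], r[idx])
    return z

# Preconditioned conjugate gradient using M^{-1}.
b = np.ones(n)
x = np.zeros(n)
r = b - A @ x
z = apply_precond(r)
p = z.copy()
for _ in range(n):
    Ap = A @ p
    alpha = (r @ z) / (p @ Ap)
    x += alpha * p
    r_new = r - alpha * Ap
    if np.linalg.norm(r_new) < 1e-10:
        break
    z_new = apply_precond(r_new)
    beta = (r_new @ z_new) / (r @ z)
    p = z_new + beta * p
    r, z = r_new, z_new

print(np.linalg.norm(b - A @ x) / np.linalg.norm(b) < 1e-8)
```

In Nek5000 the local solves are fast tensor-product solves and the coarse problem is handled by the AMG discussed below; plain dense solves are used here only to keep the sketch self-contained.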

As mentioned before, the solver of choice for large problems is currently an *in-house AMG*, developed specifically for Nek5000 [6], whose main drawback is a setup phase performed by an external and serial code. The default option for the setup step of the *in-house AMG* is a Matlab code, which uses Ostrowski coarsening with norm bound, a diagonal Chebyshev smoother, applied on the second branch of the V-cycle only, and an energy-minimizing interpolation, all described in Ref. [9]. Other properties of the AMG include no smoothing on the finest level, a number of smoothing steps predefined for each level during the setup phase and a coarsest level made of one variable only. The good scalability of the *in-house AMG* is due to the fact that it automatically chooses, at run time, the fastest communication strategy at each level of the coarsening process, among three options: a pairwise exchange, a crystal router method or an allreduce operation. Previous work has shown that, when far from the strong scaling limit, the total time spent in the pressure solver is typically 85–90% of the total computational time, including the time spent in the coarse grid solver, which amounts to about 5–10% of that [11].

As an intermediate step in a previous work [12], the coarsening and interpolation steps from this Matlab code were transferred to *BoomerAMG*, while the smoother and the solver were left unchanged. This "hybrid" serial setup was shown to significantly reduce the setup time without affecting the speed and scalability of the *in-house AMG* solver.

In the present work, we completely replace the *in-house AMG* by *BoomerAMG*, which allows for the whole AMG problem (setup + solver) to be performed online and in parallel. The existing code requires only limited modifications. Local contributions to the coarse grid operator are built on each process and then handed over to *BoomerAMG*, which takes care of assembling the global operator and of communication. If the operator possesses a nullspace, the solution is normalized such that the mean of the solution entries is 0. Apart from that, the critical aspect of switching to *BoomerAMG* is the choice of parameters for the setup and for the solver that match the performance of the *in-house AMG*.

#### **3 Optimal Parameter Selection**

The choice of parameters for the *BoomerAMG* solver is made by testing a set of parameters on a medium-sized test case and looking for the optimal combination. We consider the turbulent flow around a NACA 4412 airfoil at *Re<sub>c</sub>* = 400,000 [13] on a mesh made of 253,980 elements, with polynomial order 11, and we run the simulation for 30 timesteps. The best set of parameters is defined as the one which minimizes the time to solution for the pressure equation. This is achieved by balancing two competing aspects: the accuracy of the coarse grid solution and the total number of iterations of the GMRES solver used for the pressure equation. Since the AMG is used as a preconditioner, a high level of accuracy is not paramount. Yet, it should be sufficient to ensure efficient preconditioning. Based on results obtained with the *in-house AMG*, the initial error on the coarse grid problem should be reduced by approximately one order of magnitude. While the *in-house AMG* is designed to ensure that a given reduction in the error is attained at minimal cost, the *BoomerAMG* is designed to ensure that the maximum reduction in the error occurs at a given cost. Therefore, the best choice of parameters for the *BoomerAMG* is case dependent; here we optimize this choice for large 3D simulations, where the use of AMG is most relevant.

All tests are run on 4096 processors on the Blue Gene Mira at the Argonne National Laboratory. We test a total of 96 combinations of the following parameters:


We assign a letter and a number to each option which will be used to identify the method when comparing the results.


**Table 1** Five best timings for the *BoomerAMG* with corresponding parameters as compared to the *in-house AMG*, using 3 V-cycles for the *BoomerAMG*. Total number of iterations for the pressure solver and timings are reported for 30 timesteps

**Table 2** Five best timings for the *BoomerAMG* with corresponding parameters as compared to the *in-house AMG*, using 1 V-cycle for the *BoomerAMG*. Total number of iterations for the pressure solver and timings are reported for 30 timesteps

In all cases, we set a relative tolerance of 0.1 on the solution of the coarse grid problem and a maximum of 3 V-cycles for the AMG. Moreover, the problem on the coarsest level is solved by Gaussian elimination. Total timings and numbers of pressure iterations for 30 timesteps are presented in Table 1 for the *in-house AMG* and the five fastest combinations of parameters for the *BoomerAMG*. We assign a letter to each run for later comparison. The time spent in the pressure solver is reported for process 0 and the time spent in the coarse grid solver (CGS) is the maximum value over all processes for a single run. Since an allocation on Mira is always made of cores that are physically contiguous and isolated from the rest of the network, the timings suffer from little noise and uncertainty. We see that the number of pressure iterations is on par with the *in-house AMG*, but the time spent in the coarse grid solver is more than three times as high. As a result, the time spent in the pressure solver is between 8 and 13% higher.

To accelerate the *BoomerAMG* solver, we set the maximum number of V-cycles to 1, instead of 3, and perform another test using the optimal parameters. This should significantly accelerate the solution of the coarse grid problem but reduce its accuracy, therefore increasing the number of pressure iterations. The corresponding results are presented in Table 2. A surprising result comes from run d, where the number of pressure iterations has actually decreased. Since this might be the sign of an unstable solver, we discard this choice of parameters. The second best set of parameters corresponds to HMIS coarsening, extended+i interpolation, *l*1-Gauss–Seidel forward solve on the down cycle + backward solve on the up cycle for the smoother and an AMG strength threshold of 0.5, which we use for the rest of our simulations.

We also experimented with more aggressive non-Galerkin coarsening to change the communication pattern on the largest AMG levels and reduce communication time. However, this caused a drop in accuracy for the coarse grid solver and an increase in the number of pressure iterations, which led to slower overall timings.

We note that the time to set up the coarse grid problem, which includes building the matrix $\mathbf{A}_0$ and performing the *BoomerAMG* setup, is negligible. In the present configuration, it amounts to less than a second, whereas a single timestep takes about 20 s. Since the setup is performed once at the beginning of the simulation, we do not discuss the matter further. Furthermore, it is orders of magnitude faster than the serial versions of the setup [12].

Further analysis of the results shows that the use of classical Ruge-Stueben, Falgout, CGC or CGC-E coarsening significantly slows down the solver. Moreover, another valid choice for the smoother could have been *l*1-scaled hybrid symmetric Gauss–Seidel, whose speed is on par with our choice. The other relaxation methods, Chebyshev and *l*1-scaled Jacobi, are consistently slower. Finally, the choice of the interpolation method does not have a significant impact and both methods give very similar results.

#### **4 Scaling Results**

As is often the case with numerical simulations, we look for the fastest path to solution for a given problem. Therefore, a strong scaling study, where the total amount of work is fixed and the number of processes is increased, is a relevant measure of the efficiency of the *BoomerAMG*. This is opposed to a weak scaling analysis, where the amount of work per process is kept constant; such an analysis is not carried out here.

We consider the turbulent flow inside a reactor assembly made of 61 wire-wrapped pins, a configuration appearing in a nuclear reactor core [3]. The mesh consists of 1,650,240 elements, uses polynomial order 7 and has a complex, fully three-dimensional topology, making it a relevant test case for evaluating preconditioning strategies. The initial velocity field is turbulent and we run the simulation for 10 timesteps. Two series of tests were conducted on two supercomputers: Mira and Hazel Hen. The number of compute nodes considered is 512, 1024, 2048, 4096 and 8192 on the former machine and 256, 512, 1024, 2048, 4096 on the latter. On both computers, the number of MPI processes per node is equal to the number of available compute cores, i.e. 16 on Mira and 24 on Hazel Hen. We use our previously defined optimal parameters for the setup of the *BoomerAMG* and we also include non-Galerkin coarsening. A drop tolerance of 0.05 for sparsification is set as default on all levels, with the exception of the five finest levels, which have respective drop tolerances of 0.0, 0.01, 0.02, 0.03 and 0.04. This choice of parameters is motivated by the fact that, unlike in the wing case, it reduces the time for the coarse grid solver by about 25%, as tests on 6,144 cores on Hazel Hen have shown, without impacting the number of pressure iterations.
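The level-dependent drop tolerances above can be captured in a small helper; the function name is ours, and only the numerical values come from the text.

```python
def non_galerkin_drop_tol(level):
    """Sparsification drop tolerance for a given AMG level (0 = finest).

    The five finest levels use 0.0, 0.01, 0.02, 0.03 and 0.04; all
    coarser levels fall back to the default of 0.05.
    """
    finest = [0.0, 0.01, 0.02, 0.03, 0.04]
    return finest[level] if level < len(finest) else 0.05

print([non_galerkin_drop_tol(l) for l in range(7)])
# [0.0, 0.01, 0.02, 0.03, 0.04, 0.05, 0.05]
```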

First, let us mention that the setup time for the *BoomerAMG* solver is once again negligible in comparison to the time for the entire simulation, requiring less than 3 s on any number of cores on either machine. It is also significantly lower than the time for reading the data of the *in-house AMG*, a serial process, which takes about 80–90 s. Therefore, it does not represent a bottleneck and we do not investigate timings for the setup phase in detail.

Next, we present the strong scaling results, based on a single run per core count. The reported value for the time spent in the pressure solver is the timing from core 0. The time spent in the coarse grid solver is measured on each processor and we consider the maximum value among all processes. The average time per timestep for the pressure solver on Mira is shown in Fig. 1, left plot. Unlike what was observed for the wing simulation, the choice of AMG solver does not impact the number of pressure iterations. The *in-house AMG* is slightly faster than the *BoomerAMG* on all core counts and it seems to scale marginally better. Since all other timings are the same, the reason is a faster coarse grid solver, as can be seen in Fig. 1, right plot. The *in-house AMG* achieves better performance because it optimizes the communication process independently on each level of the AMG. On the coarsest levels, it is able to take advantage of the fast allreduce operation offered in hardware by the network of the Blue Gene architecture. On the finest levels, it picks the faster of a crystal-router strategy and a pairwise exchange.

**Fig. 1** Rod-bundle test case on Mira, with 1,650,240 elements, leading to 12 or 13 elements per core for the largest core count. AMG parameters: HMIS coarsening, extended+i interpolation, *l*1-Gauss–Seidel forward solve on the down cycle + backward solve on the up cycle for the smoother and an AMG strength threshold of 0.5. Left: time spent in the pressure solver; value for process 0, averaged over the total number of timesteps. Right: time spent in the coarse grid solver; maximum value over all processes, normalized by the total number of timesteps

**Fig. 2** Rod-bundle test case on Hazel Hen, with 1,650,240 elements, leading to 16 or 17 elements per core for the largest core count. AMG parameters: HMIS coarsening, extended+i interpolation, *l*1-Gauss–Seidel forward solve on the down cycle + backward solve on the up cycle for the smoother and an AMG strength threshold of 0.5. Left: time spent in the pressure solver; value for process 0, averaged over the total number of timesteps. Right: time spent in the coarse grid solver; maximum value over all processes, normalized by the total number of timesteps

On 131,072 cores, the actual speed-up for the coarse grid solver is 3.11 and the parallel efficiency is 0.194 for the *in-house AMG*, when the timings on 8,192 cores are used as references. These quantities are respectively 2.44 and 0.153 for the *BoomerAMG*. As mentioned before, the network on Mira is characterized by little noise and the uncertainty on the timings is low. Yet, these numbers are based on a single run and are only indications.
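These figures follow from the standard strong scaling definitions. The sketch below recomputes the parallel efficiencies from the reported speed-ups and core counts (the speed-ups themselves are taken as input, since the raw timings are not restated here).

```python
def parallel_efficiency(speedup, cores, ref_cores):
    """Strong scaling efficiency: achieved speed-up over the ideal speed-up."""
    return speedup / (cores / ref_cores)

# Coarse grid solver on Mira, reference 8,192 cores, measured 131,072 cores
# (a 16x increase), speed-ups as reported in the text.
print(parallel_efficiency(3.11, 131072, 8192))  # in-house AMG, ~0.194
print(parallel_efficiency(2.44, 131072, 8192))  # BoomerAMG,    ~0.153
```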

The same timings on Hazel Hen are shown in Fig. 2. Unfortunately, the node allocation on this machine can be scattered and the interconnect noise, which is shared with the rest of the computer, can be high and unpredictable. Therefore, a thorough analysis of the scaling results is not possible from a single run and the data in Fig. 2 are only indicative. Nevertheless, the runs for each AMG solver on the same number of cores are obtained using the same node allocation. This makes a comparison between the two solvers on a given core count somewhat meaningful. In contrast to the results on Mira, the *BoomerAMG* is slightly faster than the *in-house AMG* on most core counts; the exception is on 6144 cores, where the timings are almost equal. Furthermore, it is quite clear that the coarse grid solver on Hazel Hen does not scale at all. Indeed, the time spent in the coarse grid solver is almost constant from the lowest number of cores considered.

Based on the available data, we see that beyond 24,576 cores the time spent in the coarse grid solver accounts for roughly half of the time spent in the pressure solver. At that point, there are about 67 elements and 35,000 grid points per core, which is consistent with the strong scaling limit on a similar computer as identified in Ref. [11].
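The per-core load quoted here can be checked directly; the sketch assumes $(N+1)^3$ Gauss–Lobatto–Legendre points per element at order *N* = 7, counted without deduplicating points shared between elements.

```python
# Assumed: each element carries (order + 1)^3 GLL points, shared points
# counted once per element.
elements, order, cores = 1_650_240, 7, 24_576

elems_per_core = elements / cores
points_per_core = elems_per_core * (order + 1) ** 3
print(round(elems_per_core))   # 67
print(round(points_per_core))  # 34380, i.e. roughly the 35,000 quoted above
```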

#### **5 Conclusions**

We used the *BoomerAMG* solver from the *hypre* library for linear algebra to solve a global coarse problem that is part of the preconditioner for the pressure equation arising when time-integrating the Navier–Stokes equations. The set of parameters for the *BoomerAMG* setup that leads to the lowest solver time for the pressure equation is HMIS coarsening, extended+i interpolation, *l*1-Gauss–Seidel forward solve on the down cycle + backward solve on the up cycle for the smoother and an AMG strength threshold of 0.5. We also used non-Galerkin coarsening, with more aggressive drop tolerances on the coarser levels, to speed up the solver. This new method replaces an existing AMG solver, which is fast and scales well but requires a setup phase done externally in serial. Strong scaling was assessed for both AMG solvers on a real large-scale test case on two supercomputers: an IBM Blue Gene/Q (Mira) and a Cray XC40 (Hazel Hen). On Mira, the *in-house AMG* leads to a faster pressure solver than the *BoomerAMG* on all core counts. The maximum difference is about 10% on 131,072 cores. This is because the *in-house AMG* is able to take advantage of the fast hardware allreduce operation on this machine at the coarsest levels of the AMG solver; in that sense the present result was expected. On Hazel Hen, however, the *BoomerAMG* is consistently faster than the *in-house AMG* and we observe the strong scaling limit to be reached at about 24,576 cores. Overall, the *BoomerAMG* is a valid alternative to the *in-house AMG*: both methods are close in terms of performance and the *BoomerAMG* has the advantage of being set up online and in parallel. In particular on modern architectures, the *BoomerAMG* is even faster than the *in-house AMG*, with obvious advantages in the setup phase. All the codes developed in this work are available from the GitHub repository at https://github.com/nicooff/nek5000/tree/amg\_hypre\_c.

Future work will extend the use of *BoomerAMG* to mesh refinement, where an online and parallel AMG setup phase is a requirement.

**Acknowledgements** We would like to thank Aleks Obabko for his advice and for sharing the wire-wrapped pin bundle case with us. Financial support by the H2020 EU Project "ExaFLOW: Enabling Exascale Fluid Dynamics Simulation" (grant reference 671571), and the Knut and Alice Wallenberg Foundation is gratefully acknowledged. This research used resources of the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357. Additional computer time was provided by ExaFLOW at HLRS Stuttgart, and by resources provided by the Swedish National Infrastructure for Computing (SNIC) at PDC Stockholm.

#### **References**


Y., Saied, F. (eds.) High-Performance Scientific Computing: Algorithms and Applications, pp. 261–279. Springer, London (2012)


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **Two Decades Old Entropy Stable Method for the Euler Equations Revisited**

**Björn Sjögreen and H. C. Yee**

#### **1 Introduction, Objectives and Preliminaries**

The two decades old high order central differencing via entropy splitting and summation-by-parts (SBP) difference closure of Olsson and Oliger, Gerritsen and Olsson, and Yee et al. [2, 7, 25] is revisited. The entropy splitting is a form of skew-symmetric splitting in terms of the physical entropy of the nonlinear Euler flux derivatives. Central differencing applied to the entropy splitting form of the Euler flux derivatives together with SBP difference operators will, hereafter, be referred to as **entropy split schemes**.

The objective is to prove for the first time, in the recent definition of entropy stability based on the *L*2-energy-like norm estimate, that entropy splitting for central schemes with SBP operators is entropy stable. The proof consists of replacing the spatial derivatives by summation-by-parts (SBP) difference operators in the entropy split form of the equations using the physical entropy of the Euler equations. The numerical boundary closure follows directly from the SBP operator; no additional numerical boundary procedure is required. In contrast, Tadmor-type entropy conserving schemes [18] using mathematical entropies do not naturally come with a numerical boundary closure; a generalized SBP operator has to be developed [8]. Standard high order spatial central differencing as well as high order central DRP (dispersion relation preserving) spatial differencing is part of the entropy stable methodology. An entropy split scheme satisfies the *L*2-energy norm estimate readily, without an added numerical dissipation term, for smooth flows. For flows containing discontinuities, the Yee et al. nonlinear filter approach [10–12, 14, 15, 22–25] is employed at isolated computed locations after each full time step of the entropy split method, to suppress spurious oscillations while maintaining accuracy on the remaining flow field. Since the nonlinear filter step is executed as an Euler time discretization at isolated locations after the completion of a full time step of the entropy stable central scheme, entropy conservation/stability is valid almost everywhere. The efficiency and performance of the entropy stable split schemes using the physical entropies are compared with the Tadmor-type entropy conservative method [18] using mathematical entropies for long time integration of 2D smooth flows and a 3D direct numerical simulation (DNS) of turbulence with shocklets. It is found that Tadmor-type entropy conservative methods require roughly twice the CPU time of the entropy stable split schemes using the same order of the central scheme. Comparisons among the three skew-symmetric splittings (entropy splitting [19, 20, 25], Ducros et al. splitting [1] and the Kennedy and Gruber splitting [5]) on their nonlinear stability and accuracy performance without added numerical dissipation for smooth flows are included. See [16] for additional details and comparison.

B. Sjögreen
Multid Analyses AB, Gothenburg, Sweden
e-mail: bjorn.sjogreen@multid.se

H. C. Yee (✉)
NASA Ames Research Center, Mountain View, CA, USA
e-mail: Helen.M.Yee@nas.nasa.gov

© The Author(s) 2020

S. J. Sherwin et al. (eds.), *Spectral and High Order Methods for Partial Differential Equations ICOSAHOM 2018*, Lecture Notes in Computational Science and Engineering 134, https://doi.org/10.1007/978-3-030-39647-3\_21

*Remarks* It is noted that the Hughes et al. formulation [4], which uses Harten's idea [3] but solves the flow equations in nonconservative form in terms of the entropy variables, is completely different from the entropy split schemes. The entropy split scheme solves the entropy splitting form of the Euler flux derivatives, consisting of a one-parameter family of conservative and non-conservative portions in terms of the entropy variables. If the parameter satisfies the energy estimate, entropy stability is immediate. The entropy split scheme has been generalized from a perfect gas to a thermally perfect gas and to gas flows consisting of linear combinations of perfect gases [21, 25]. In addition, these high order schemes have been formulated on time-varying deforming curvilinear grids with free-stream preservation [17, 21].

#### **2 Entropy Splitting of the Euler Flux Derivatives**

We consider the 3D equations of inviscid compressible gas dynamics

$$\mathbf{q}_t + \mathbf{f}_x + \mathbf{g}_y + \mathbf{h}_z = \mathbf{0}$$

with conserved variables $\mathbf{q} = (\rho\ \rho u\ \rho v\ \rho w\ e)^T$ and fluxes in an arbitrary direction $\mathbf{k} = (k_1\ k_2\ k_3)^T$ with $|\mathbf{k}|^2 = 1$, and

$$\hat{\mathbf{f}} = k_1\mathbf{f} + k_2\mathbf{g} + k_3\mathbf{h} = \left(\rho\hat{u}\;\; \rho u\hat{u} + k_1 p\;\; \rho v\hat{u} + k_2 p\;\; \rho w\hat{u} + k_3 p\;\; \hat{u}(e+p)\right)^T,\tag{1}$$

where $\hat{u} = k_1 u + k_2 v + k_3 w$. The total energy is related to the pressure *p* by the ideal gas law, $e = \frac{p}{\gamma-1} + \frac{1}{2}\rho|\mathbf{u}|^2$, where $\gamma > 1$ is a given constant, and $|\mathbf{u}|^2 = u^2 + v^2 + w^2$.

An entropy is a convex function $E(\mathbf{q})$ of the conserved variables that allows an additional conservation law,

$$E_t + F_x + G_y + H_z = 0,\tag{2}$$

when the solution is smooth. The entropy fluxes in the *x*-, *y*-, and *z*-directions are denoted by *F*, *G*, and *H*, respectively. The entropy variables are defined by $\mathbf{v} = \nabla_{\mathbf{q}} E$ (the notation $E_{\mathbf{q}}$ for the gradient will sometimes be used). The convexity of *E* ensures that these are well-defined. The entropy conservation law (2) follows if the relation $\mathbf{v}^T \frac{\partial \mathbf{f}}{\partial \mathbf{q}} = \nabla_{\mathbf{q}} F$ holds for the *x*-direction fluxes, and similarly for the *y*- and *z*-directions. Moreover, the entropy variables symmetrize the equations; $\partial\mathbf{f}/\partial\mathbf{v}$ is a symmetric matrix.

Harten [3] considered the class of entropies

$$E = -\frac{\gamma + \alpha}{\gamma - 1} \rho (p \rho^{-\gamma})^{\frac{1}{\alpha + \gamma}},\tag{3}$$

where *α* is a parameter. To ensure that *E* is convex, i.e., that the Hessian matrix $E_{\mathbf{q},\mathbf{q}}$ is positive definite, *α* is required to satisfy $\alpha > 0$ or $\alpha < -\gamma$. The full range for *α* was given in [25], while [3] only considered $\alpha > 0$, and [2] used only the special case $\alpha = 1 - 2\gamma$ from the range $\alpha < -\gamma$. The corresponding entropy flux in the direction $\mathbf{k} = (k_1\ k_2\ k_3)^T$ is

$$F = \hat{u}E.$$
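Since the pair (*E*, *F*) must satisfy the compatibility relation $\mathbf{v}^T \partial\mathbf{f}/\partial\mathbf{q} = \nabla_{\mathbf{q}} F$ stated above, the entropy (3) together with $F = \hat{u}E$ can be spot-checked numerically. The sketch below does this for the *x*-direction with central finite differences; the state vector and the values γ = 1.4, α = 1 are our own sample choices.

```python
import numpy as np

GAMMA, ALPHA = 1.4, 1.0  # assumed sample values; alpha = 1 lies in alpha > 0

def pressure(q):
    rho, mu, mv, mw, e = q
    return (GAMMA - 1.0) * (e - 0.5 * (mu**2 + mv**2 + mw**2) / rho)

def entropy(q):  # Harten entropy family, Eq. (3)
    rho = q[0]
    s = pressure(q) * rho**(-GAMMA)
    return -(GAMMA + ALPHA) / (GAMMA - 1.0) * rho * s**(1.0 / (ALPHA + GAMMA))

def flux(q):  # x-direction Euler flux, k = (1, 0, 0)
    rho, mu, mv, mw, e = q
    u, p = mu / rho, pressure(q)
    return np.array([mu, mu * u + p, mv * u, mw * u, u * (e + p)])

def entropy_flux(q):  # F = u E in the x-direction
    return (q[1] / q[0]) * entropy(q)

def grad(f, q, h=1e-6):  # central-difference gradient with respect to q
    g = np.zeros(q.size)
    for i in range(q.size):
        dq = np.zeros(q.size)
        dq[i] = h
        g[i] = (f(q + dq) - f(q - dq)) / (2.0 * h)
    return g

q = np.array([1.2, 0.5, -0.3, 0.1, 2.5])  # arbitrary state with p > 0
v = grad(entropy, q)                       # entropy variables v = E_q
J = np.array([grad(lambda s: flux(s)[i], q) for i in range(5)])  # df/dq
print(np.allclose(v @ J, grad(entropy_flux, q), atol=1e-6))
```

The same check passes for any admissible α; it only verifies consistency of the stated formulas, not the convexity of *E*.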

The entropy variables $\mathbf{v} = E_{\mathbf{q}}$ are straightforwardly found to be

$$\mathbf{v} = \frac{\rho}{p} s^{\frac{1}{\alpha+\gamma}} \left( -\frac{\alpha}{\gamma - 1} \frac{p}{\rho} - \frac{1}{2} |\mathbf{u}|^2 \;\; u \;\; v \;\; w \;\; -1 \right)^T,\tag{4}$$

where *s* denotes $p\rho^{-\gamma}$. The conserved variables are homogeneous functions of the entropy variables (4),

$$\mathbf{q}(\theta \mathbf{v}) = \theta^{\beta} \mathbf{q}(\mathbf{v}),\tag{5}$$

where $\beta = (\alpha + \gamma)/(1 - \gamma)$. From (5) it follows that

$$\mathbf{q}\_{\mathbf{V}}\mathbf{v} = \beta\mathbf{q} \tag{6}$$

$$\hat{\mathbf{f}}_{\mathbf{v}} \mathbf{v} = \beta \hat{\mathbf{f}}.\tag{7}$$

See [3, 16] for the proof. The range of *α* where $E_{\mathbf{q},\mathbf{q}}$ is positive definite translates to *β* satisfying $\beta < -\frac{\gamma}{\gamma-1}$ or $\beta > 0$.
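A one-line sketch of how (6) follows from (5) (Euler's relation for homogeneous functions; the full argument, including (7), is in [3, 16]): differentiate (5) with respect to θ and set θ = 1.

```latex
% Differentiating q(\theta v) = \theta^{\beta} q(v) with respect to \theta:
\frac{d}{d\theta}\,\mathbf{q}(\theta\mathbf{v})
  = \mathbf{q}_{\mathbf{v}}(\theta\mathbf{v})\,\mathbf{v}
  = \beta\,\theta^{\beta-1}\,\mathbf{q}(\mathbf{v}),
\qquad \text{and setting } \theta = 1:\quad
\mathbf{q}_{\mathbf{v}}\,\mathbf{v} = \beta\,\mathbf{q}.
```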

Entropy splitting of the Euler flux derivative in the *x*-direction (with the *y*- and *z*-directions suppressed) [2, 25] is written as a weighted sum of a conservative part, $\mathbf{f}_x$, and a non-conservative part, $\mathbf{f}_{\mathbf{v}}\mathbf{v}_x$, as

$$\mathbf{f}_x = \frac{\beta}{\beta + 1} \mathbf{f}_x + \frac{1}{\beta + 1} \mathbf{f}_{\mathbf{v}} \mathbf{v}_x.$$

Replacing $\mathbf{f}_x$ by this split flux derivative gives

$$\mathbf{q}_t + \frac{\beta}{\beta + 1} \mathbf{f}_x + \frac{1}{\beta + 1} \mathbf{f}_{\mathbf{v}} \mathbf{v}_x = \mathbf{0}. \tag{8}$$

The entropy splitting weights the non-conservative portion of the flux derivative by $\frac{1}{1+\beta}$. This means that the range $\beta > 0$ corresponds to a weight that is less than 1, whereas negative *β* leads, unphysically, to a weight that is greater than 1. The global entropy conservation can be recast as an *L*2-like estimate: the entropy time derivative can be rewritten as

$$\frac{d}{dt}E(\mathbf{q}) = \frac{1}{\beta + 1} \frac{d}{dt} (\mathbf{v}^T (E\_{\mathbf{q}, \mathbf{q}})^{-1} \mathbf{v})$$

by using the homogeneity (5). Due to a page limit, see [16] for further discussion. Note that it is necessary to bound the eigenvalues of $E_{\mathbf{q},\mathbf{q}}^{-1}$ in order to make the *L*2-like norm a valid estimate.
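To make the admissible ranges concrete, the short check below (γ = 1.4 is our assumed sample value; the helper name is ours) evaluates $\beta = (\alpha+\gamma)/(1-\gamma)$ and illustrates that the convexity range α > 0 or α < −γ maps onto β < −γ/(γ−1) or β > 0. The special case α = 1 − 2γ used in [2] gives β = 1, i.e. equal weights 1/2 for the conservative and non-conservative portions.

```python
GAMMA = 1.4  # assumed sample ratio of specific heats

def beta(alpha, gamma=GAMMA):
    """beta = (alpha + gamma) / (1 - gamma), as defined after Eq. (5)."""
    return (alpha + gamma) / (1.0 - gamma)

print(beta(1.0))    # alpha > 0       ->  beta ~ -6.0 < -gamma/(gamma-1) = -3.5
print(beta(-3.0))   # alpha < -gamma  ->  beta ~ 4.0 > 0
b = beta(1.0 - 2.0 * GAMMA)   # special case alpha = 1 - 2*gamma used in [2]
print(b, 1.0 / (1.0 + b))     # beta ~ 1, so both split weights are ~ 1/2
```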

#### **3 Semi-Discrete Entropy Split Discretization of the Euler Equations**

Consider the 1D compressible gas dynamics equations discretized on a domain $a < x < b$ by a uniform grid $x_j = (j-1)\Delta x + a$, $j = 1,\ldots,N$, with grid spacing $\Delta x = (b-a)/(N-1)$. Define the semi-discrete entropy split approximation

$$\frac{d}{dt}\mathbf{q}\_j + \frac{\beta}{\beta + 1}D\mathbf{f}\_j + \frac{1}{\beta + 1}(\mathbf{f}\_\mathbf{v})\_j D\mathbf{v}\_j = 0, \quad j = 1, \ldots, N,\tag{9}$$

where *D* is an SBP difference operator. By *entropy split scheme* we will always mean the entropy split form of Eq. (8) discretized in space by a summation-by-parts finite difference operator. The flux Jacobian matrix with respect to the entropy variables, $\mathbf{f}_{\mathbf{v}}$, is symmetric. The SBP scalar product is denoted by

$$(\mathbf{u}, \mathbf{v})_h = \Delta x \sum_{j=1}^N \omega_j \mathbf{u}_j^T \mathbf{v}_j,$$

where $\omega_j > 0$ are weights that differ from 1 only at a few points near the boundaries. The operator *D* satisfies the SBP property

$$(D\mathbf{u}, \mathbf{v})_h = -(\mathbf{u}, D\mathbf{v})_h - \mathbf{u}_1^T\mathbf{v}_1 + \mathbf{u}_N^T\mathbf{v}_N,\tag{10}$$

but is otherwise arbitrary. In the most common case *D* is a standard SBP centered difference operator, but other operators are possible.
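A minimal concrete instance (not the high order operators used in the paper) is the classical second-order SBP operator: central differences in the interior, one-sided differences at the two boundary points, with weights ω = (1/2, 1, …, 1, 1/2). The sketch below verifies property (10) for random grid functions with a fixed seed.

```python
import numpy as np

N, dx = 12, 0.1
rng = np.random.default_rng(0)

# Second-order SBP first-derivative operator: one-sided differences at the
# boundary points, central differences in the interior.
D = np.zeros((N, N))
D[0, :2] = [-1.0, 1.0]
D[-1, -2:] = [-1.0, 1.0]
for j in range(1, N - 1):
    D[j, j - 1], D[j, j + 1] = -0.5, 0.5
D /= dx

w = np.ones(N)
w[0] = w[-1] = 0.5   # SBP quadrature weights omega_j

def sbp_product(u, v):
    """Discrete scalar product (u, v)_h = dx * sum_j omega_j u_j v_j."""
    return dx * np.sum(w * u * v)

u, v = rng.standard_normal(N), rng.standard_normal(N)
lhs = sbp_product(D @ u, v) + sbp_product(u, D @ v)
rhs = -u[0] * v[0] + u[-1] * v[-1]
print(np.isclose(lhs, rhs))  # property (10) holds up to roundoff
```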

A zero velocity boundary condition, $u_1 = 0$, $u_N = 0$, is enforced, corresponding to wall boundaries. Thanks to the SBP property of the difference approximation, the derivation of entropy conservation for the continuous problem can be carried over to the discretization.

**Theorem 1** *The approximation (9) together with the boundary conditions $u_1 = 0$ and $u_N = 0$ conserves the global entropy in the sense that* $\frac{d}{dt}\sum_{j=1}^{N} \omega_j E_j = 0$.

A method is entropy dissipative, or "entropy stable", if the computed solution satisfies (2) as an inequality.

*Proof* Denote

$$\mathbf{r} = -\frac{\beta}{\beta + 1} D\mathbf{f} - \frac{1}{\beta + 1} (\mathbf{f}_\mathbf{v}) D\mathbf{v}.$$

The scheme (9) can be written

$$\frac{d}{dt}\mathbf{q}_j = P\mathbf{r}_j,\tag{11}$$

where the projection $P$ sets $u_1 = 0$ and $u_N = 0$. Because $P^2 = P$, applying $P$ to both sides of (11) gives

$$\frac{d}{dt}P\mathbf{q} = \frac{d}{dt}\mathbf{q},$$

i.e., that $P\mathbf{q} = \mathbf{q}$ if the initial data satisfy the boundary conditions. For the entropy

$$\frac{d}{dt}E = (\mathbf{v}, \mathbf{q}_t)_h = (\mathbf{v}, P\mathbf{r})_h = (\mathbf{v}, \mathbf{r})_h - (\mathbf{v}, (I - P)\mathbf{r})_h = (\mathbf{v}, \mathbf{r})_h - (P\mathbf{v}, (I - P)\mathbf{r})_h = (\mathbf{v}, \mathbf{r})_h,\tag{12}$$

where we use that $P\mathbf{v} = \mathbf{v}$, which holds because the second component of $\mathbf{v}$ is zero when the $x$-velocity, $u$, is zero, together with the orthogonality $(P\mathbf{v}, (I - P)\mathbf{r})_h = 0$. The entropy equation is now of the same form as for the continuous problem, and replacing integration by parts with summation by parts gives

$$\frac{d}{dt}E(\mathbf{q}) = -F_N + F_1.$$

Entropy conservation follows by observing that $F = uE$, so that the boundary conditions imply that $F_1 = F_N = 0$.

If the boundary conditions are periodic, no SBP modification of the difference operator is needed; entropy conservation is proved with periodic boundary conditions by direct application of the same technique as above. Since only time derivatives are used in the proof, the result carries over directly to the semi-discrete approximation. Hence, the $L^2$-like estimate

$$\frac{d}{dt} \sum_{j=1}^{N} \omega_j \mathbf{v}_j^T (E_{\mathbf{q},\mathbf{q}})_j^{-1} \mathbf{v}_j = 0$$

is obtained for the approximation (9). It can be shown that Tadmor-type discretizations using the Harten entropy and high order central spatial differencing are also entropy conservative; see Sjögreen and Yee [16] for the proof.

#### **4 Numerical Experiments**

More extensive numerical experiments are reported in the extended version of this paper [16]. Previous studies using SBP boundary closures for non-periodic boundary conditions can be found in [25]. Here selected summary results are presented.

#### **Test Case 1: 2D Compressible Euler Simulation of Smooth Flow: Isentropic Vortex Convection**

The compressible Euler equations in two space dimensions are solved with initial data

$$\rho(x, y) = \left(1 - \frac{(\gamma - 1)\beta^2}{8\gamma\pi^2}\, e^{1 - r^2}\right)^{\frac{1}{\gamma - 1}}\tag{13}$$

$$u(x, y) = u_{\infty} - \frac{\beta(y - y_0)}{2\pi}\, e^{(1 - r^2)/2} \tag{14}$$

$$v(x, y) = v_{\infty} + \frac{\beta(x - x_0)}{2\pi}\, e^{(1 - r^2)/2} \tag{15}$$

$$p(x, y) = \rho(x, y)^{\gamma},\tag{16}$$

where $r^2 = (x - x_0)^2 + (y - y_0)^2$, $\beta = 5$, $\gamma = 1.4$, $u_\infty = 1$, and $v_\infty = 0$. The exact solution is the initial data translated, $\mathbf{u}(x, y, t) = \mathbf{u}_0(x - u_\infty t, y - v_\infty t)$.

The computational domain is $0 \leq x \leq 18$, $0 \leq y \leq 18$ with periodic boundary conditions. The center of the vortex is chosen to be $(x_0, y_0) = (9, 9)$. The problem is solved in time with the classical fourth-order accurate explicit Runge–Kutta method to time $t = 72$, which corresponds to four traversals of the domain by the vortex.
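For reference, the initial data (13)–(16) and the translated exact solution are simple to evaluate. The following Python sketch (function and variable names are our own) uses the parameter values quoted in the text:

```python
import numpy as np

# Isentropic vortex initial data (13)-(16); parameters from the text.
beta, gamma, u_inf, v_inf = 5.0, 1.4, 1.0, 0.0
x0, y0 = 9.0, 9.0

def vortex(x, y, t=0.0):
    # The exact solution is the initial data translated with the free stream.
    xs, ys = x - u_inf * t, y - v_inf * t
    r2 = (xs - x0)**2 + (ys - y0)**2
    rho = (1.0 - (gamma - 1.0) * beta**2 / (8.0 * gamma * np.pi**2)
           * np.exp(1.0 - r2))**(1.0 / (gamma - 1.0))
    u = u_inf - beta * (ys - y0) / (2.0 * np.pi) * np.exp((1.0 - r2) / 2.0)
    v = v_inf + beta * (xs - x0) / (2.0 * np.pi) * np.exp((1.0 - r2) / 2.0)
    p = rho**gamma
    return rho, u, v, p

rho, u, v, p = vortex(9.0, 9.0)       # vortex centre
assert np.isclose(p, rho**gamma)      # flow is isentropic everywhere
assert abs(vortex(0.0, 0.0)[0] - 1.0) < 1e-6  # free-stream density far away
```

Errors against this exact solution give the maximum and $L^2$ norms plotted in Fig. 1.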

Comparisons of high order classical central split schemes with high order DRP schemes under grid refinement are reported in [13]. Due to space limitations, only one grid with the maximum and $L^2$ error norms relative to the exact solution is shown in Fig. 1. Here C08-DS denotes eighth-order central differencing applied to the Ducros et al. splitting form of the Euler flux derivatives. The corresponding eighth-order entropy splitting, entropy conservative method, and Kennedy–Gruber splitting are indicated by "C08-ES", "C08-EC" and "C08-KGS". If the computed solutions by "C08-DS", "C08-ES", "C08-EC" and "C08-KGS" are nonlinearly filtered by the dissipative portion of WENO7 (seventh-order weighted essentially non-oscillatory spatial method) with an adaptive flow sensor, they are indicated by C08-DS+WENO7FI, C08-ES+WENO7FI, C08-EC+WENO7FI, and C08-KGS+WENO7FI [14, 15, 22–25]. For the smooth flow without any turbulent structure, $\beta = 1$ for the entropy split scheme. The $\beta$ parameter studies are reported in [9, 16, 25]. In general, for compressible shock-free turbulence and turbulence with

**Fig. 1** Inviscid 2D compressible vortex convection with 100² grid points: comparison of the maximum-norm of the error vs. time for C08-DS, C08-ES, C08-EC, and C08-KGS (top left), and C08-DS+WENO7FI, C08-ES+WENO7FI, C08-EC+WENO7FI, and C08-KGS+WENO7FI (top right). Bottom left and bottom right are the corresponding $L^2$-norm of the error vs. time

shocklets, $\beta$ lies somewhere in the range $1.5 < \beta < 2.5$. In general, the optimal $\beta$ is problem dependent. A general conclusion is that $\beta$ should be neither very large nor below 1.

Other high resolution dissipative shock-capturing methods are also candidates for the nonlinear filter approach, as are other optimal WENO or ENO methods. However, with good control of the numerical dissipation away from discontinuities, there is no need to use the more complicated and more CPU-intensive shock-capturing methods. The non-split C08 scheme without any added numerical dissipation diverges shortly after the start of the time evolution. Results obtained with WENO5 or WENO7 are very diffusive, with large maximum and $L^2$ errors. For this smooth, long-time integration flow, entropy splitting is the most accurate method.

#### **Test Case 2: 3D Isotropic Turbulence with Eddy Shocklets**

The second numerical test problem computes decaying compressible isotropic turbulence with eddy shocklets. For high enough turbulent Mach numbers, weak shocks (shocklets) develop from the turbulent motion. Here the initial turbulent Mach number is 0.6. The Navier–Stokes equations are solved using $\gamma = 1.4$. The computational domain is a cube with side length $2\pi$ and periodic boundary conditions in all three directions. The initial datum is a random divergence-free velocity field, $u_{i,0}$, $i = 1, 2, 3$, that satisfies

$$\frac{3}{2} u_{rms,0}^2 = \frac{1}{2}\langle u_{i,0}\, u_{i,0} \rangle = \int_0^\infty E(k) \, dk$$

with energy spectrum

$$E(k) \sim k^4 e^{-2(k/k_0)^2}.$$

The computations were made with $u_{rms,0} = 1$ and $k_0 = 4$. The angular brackets denote averaging over the entire computational domain. The density and pressure fields are initially constant. The Taylor-scale Reynolds number, $Re_{\lambda,0}$, is 100. See [6] for definitions of the quantities and more details about the setup of the problem. The simulation is run to the final time 4.

Figure 2 shows the comparison of two splitting methods (DS and KGS), ES (entropy splitting and entropy stable) and EC (entropy conservative) using the same nonlinear filter. The time evolution of the domain-averaged kinetic energy (upper left), enstrophy (upper right), temperature variance (lower left), and dilatation (lower right) are compared. All four forms of the nonlinear filter method provide similar resolution. All four schemes without the nonlinear filter are stable but not as accurate as the nonlinear filter versions. Overall, DS splitting is slightly less CPU intensive than ES. KGS skew-symmetric splitting is more CPU intensive than DS and ES. The EC method is around two times more expensive than DS. In addition, as the order of these methods increases, the gain in efficiency (CPU) by entropy split schemes increases.

**Fig. 2** 3D isotropic turbulence problem with 64³ grid points. Comparison of two splitting methods (DS and KGS), ES (entropy splitting and entropy stable) and EC (entropy conservative) using the same nonlinear filter. Evolution of kinetic energy (upper left), enstrophy (upper right), temperature variance (lower left), and dilatation (lower right). A DNS computed on 256³ grid points and filtered down to 64³ resolution is considered the reference solution

Although entropy split methods are not in conservation form, they are entropy conservative, and Sect. 4 showed that they perform well on problems with shocklets. Extensions of the entropy split scheme to other equations of state (non-perfect gas) and to the MHD equations can be found in the original 2000 Yee et al. [25] paper. The entropies (3) can be used to construct entropy conserving schemes in conservation form. See [16] for the derivation.

#### **References**



### **A Mimetic Spectral Element Method for Free Surface Flows**

**L. Nielsen and B. Gervang**

#### **1 Introduction**

In the last decades, CFD simulations of free surface flows have become a key tool in the engineering analysis and design of marine structures. To obtain valid estimates of the environmental loads in ship-wave hydrodynamics and on offshore wind turbines, wave energy converters, and offshore production systems, the CFD tools need to account for non-linear wave-wave and wave-body interaction. Traditionally, free surface flows have been simulated using lower-order methods; recently, however, spectral element methods have been used [2]. In contrast to earlier work, in the present article we simulate 2D free surface waves using a mimetic spectral element method. This ensures that the invariants of the system (mass, momentum, and energy) are conserved throughout the simulation.

The governing equations for incompressible, Newtonian fluids are the Navier–Stokes equations. Free surface waves can be assumed to be governed by an inviscid and irrotational fluid flow. Assuming first the fluid to be inviscid, we arrive at the Euler equations,

$$
\rho \left[ \frac{\partial \mathbf{u}}{\partial t} + \mathbf{u} \cdot \nabla \mathbf{u} \right] = -\nabla p + \rho \mathbf{g},
$$

together with the continuity equation

$$\nabla \cdot \mathbf{u} = 0.$$

L. Nielsen · B. Gervang (✉)
Department of Engineering, Aarhus University, Aarhus, Denmark
e-mail: bge@ase.au.dk

© The Author(s) 2020

S. J. Sherwin et al. (eds.), *Spectral and High Order Methods for Partial Differential Equations ICOSAHOM 2018*, Lecture Notes in Computational Science and Engineering 134, https://doi.org/10.1007/978-3-030-39647-3\_22

Using the vector identity $\frac{1}{2}\nabla(\mathbf{u}\cdot\mathbf{u}) = (\mathbf{u}\cdot\nabla)\mathbf{u} + \mathbf{u}\times(\nabla\times\mathbf{u})$ and using that the fluid is irrotational ($\nabla\times\mathbf{u} = 0$), we can rewrite the momentum and continuity equations,

$$
\rho \frac{\partial \nabla \phi}{\partial t} = -\frac{\rho}{2} \nabla |\nabla \phi|^2 - \nabla p - \rho \mathbf{g}, \tag{1}
$$

$$
\nabla^2 \phi = 0,\tag{2}
$$

where $\phi$ is a scalar velocity potential defined by $\mathbf{u} = \nabla\phi$, $\phi = \phi(x, z, t)$. We can now rewrite the momentum equation as,

$$\nabla \left[ \rho \frac{\partial \phi}{\partial t} + \frac{\rho}{2} |\nabla \phi|^2 + p + \rho gz \right] = 0,$$

which we can integrate in space to obtain the time-dependent Bernoulli equation,

$$
\rho \frac{\partial \phi}{\partial t} + \frac{\rho}{2} |\nabla \phi|^2 + p + \rho gz = C(t),
$$

where $C(t)$ is an arbitrary function of integration. We assign $C(t) = 0$ by recalling that $\phi$ and $\phi + \int C(t)\,dt$ yield exactly the same flow. Redefining $\phi$, retaining the symbol, as $\phi := \phi + \int C(t)\,dt$, we obtain the time-dependent Bernoulli equation for the problem as,

$$
\rho \frac{\partial \phi}{\partial t} + \frac{\rho}{2} |\nabla \phi|^2 + p + \rho gz = 0. \tag{3}
$$

The governing equations for inviscid and irrotational flows of an incompressible fluid are stated through (2) and (3), where the unknowns are the velocity potential, $\phi$, and the pressure, $p$. Equations (2) and (3), together with proper boundary conditions, constitute a well-posed problem. The velocity potential, $\phi$, can be solved from the Laplace equation and then substituted into the Bernoulli equation to obtain the pressure field.

#### *1.1 Boundary Conditions*

The physical domain is shown in Fig. 1, where the notation is also illustrated. The fluid domain $\Omega \subset \mathbb{R}^d$, $d = 2$, is a bounded, connected domain with piecewise smooth bathymetry $\Gamma^b \subset \mathbb{R}^{d-1}$. The time domain is taken as $T : t \geq 0$. The unknowns of the problem are the velocity potential and the free surface elevation $\eta(x, t) : \Gamma^{FS} \times T \longrightarrow \mathbb{R}$. The pressure can thereafter be determined through (3).

**Fig. 1** Illustration of the physical domain with notation of the relevant quantities shown

**Fig. 2** Computational domain

The unsteady kinematic and dynamic free surface boundary conditions are given by Zakharov [8],

$$
\partial_t \eta = -\partial_x \eta\, \partial_x \tilde{\phi} + \tilde{\nu}\, (1 + \partial_x \eta\, \partial_x \eta) \quad \in \quad \Gamma^{FS} \times T,\tag{4}
$$

$$
\partial_t \tilde{\phi} = -g\eta - \frac{1}{2} \left( (\partial_x \tilde{\phi})^2 - \tilde{\nu}^2 (1 + \partial_x \eta\, \partial_x \eta) \right) \quad \in \quad \Gamma^{FS} \times T,\tag{5}
$$

where $\tilde{\cdot}$ signifies functions defined only on the free surface. The vertical component of the velocity, $\tilde{\nu} = \partial_z \phi|_{z=\eta}$, is calculated by solving the Laplace problem (2) together with the Zakharov boundary conditions (4) and (5) on the free surface. On the bottom we have the no-penetration condition,

$$
\partial_z \phi + \partial_x h\, \partial_x \phi = 0, \quad \text{for } z = -h(x) \text{ on } \Gamma^b.\tag{6}
$$

On the inlet and outlet boundaries, $\Gamma \setminus (\Gamma^{FS} \cup \Gamma^b)$, the gradient of the velocity potential is specified. The computational domain is shown in Fig. 2.
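To make the free surface update concrete, the right-hand sides of (4) and (5) can be evaluated on a periodic surface grid. The Python sketch below is illustrative only: it uses FFT-based differentiation rather than the paper's mimetic spectral element discretization, assumes $g = 9.81$, and the function names are our own:

```python
import numpy as np

g = 9.81  # gravitational acceleration (assumed value)

def ddx(f, L):
    # Spectral derivative d/dx of a periodic sample on a domain of length L.
    k = 2.0 * np.pi * np.fft.fftfreq(f.size, d=L / f.size)
    return np.real(np.fft.ifft(1j * k * np.fft.fft(f)))

def zakharov_rhs(eta, phi_s, nu, L):
    # eta: free surface elevation, phi_s: surface potential (tilde phi),
    # nu: vertical surface velocity (tilde nu), all sampled on the grid.
    ex, px = ddx(eta, L), ddx(phi_s, L)
    deta = -ex * px + nu * (1.0 + ex * ex)                         # Eq. (4)
    dphi = -g * eta - 0.5 * (px * px - nu * nu * (1.0 + ex * ex))  # Eq. (5)
    return deta, dphi

# Sample evaluation on a 64-point periodic grid of length 2*pi.
x = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
deta, dphi = zakharov_rhs(0.1 * np.sin(x), np.cos(x), np.zeros_like(x), 2.0 * np.pi)
```

In the actual method, $\tilde{\nu}$ comes from the Laplace solve (2) at each step rather than being prescribed.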

#### **2 Discretization of Governing Equations**

The developed method adopts elements from differential geometry. The unknowns of our system are described by means of *differential forms*. In a three-dimensional setting we make use of four types of sub-manifolds: points, curves, surfaces, and volumes, both as inner and outer oriented objects; see the example in Fig. 3. The mimetic spectral element method uses an approach similar to the Galerkin approach of the finite element method, where the numerical residual is weighted by an arbitrary weight function. In contrast to the traditional finite element method, the weight functions are taken from the dual space of the function space used for the unknowns.

**Fig. 3** Three-dimensional dual de Rham complex showing the four types of sub-manifolds and their different orientations

#### *2.1 Basis Functions*

For the polynomial representation we use Lagrange polynomials $l_i(\xi)$ and edge polynomials $e_i(\xi)$, see [5]. The Lagrange polynomials are based on a Gauss-Lobatto-Legendre (GLL) point distribution for the nodal values. The Lagrange polynomials and edge polynomials satisfy the properties,

$$l_i(\xi_j) = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j, \end{cases} \qquad \int_{L_j} e_i(\xi) = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j, \end{cases}$$

and the edge polynomials are explicitly given in terms of the nodal Lagrange basis functions $l_i(\xi)$ as

$$e_i(\xi) = -\sum_{k=1}^{i-1} dl_k(\xi),\tag{7}$$

where $dl_k(\xi)$ is the exterior derivative applied to the 0-form $l_k(\xi)$. This definition of the edge polynomial also implies, see [4] and [5],

$$dl_i = e_i - e_{i+1}\,. \tag{8}$$
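Properties (7) and the edge-polynomial duality can be checked numerically. The sketch below (our own construction, for a degree-3 GLL basis) builds the Lagrange and edge polynomials and verifies that the integral of $e_i$ over the $j$-th sub-interval equals $\delta_{ij}$:

```python
import numpy as np
from numpy.polynomial import polynomial as P

# GLL nodes for a degree-3 basis: roots of (1 - x^2) P3'(x).
xi = np.array([-1.0, -1.0 / np.sqrt(5.0), 1.0 / np.sqrt(5.0), 1.0])
N = len(xi) - 1

def lagrange_coeffs(i):
    # Coefficients (ascending powers) of l_i with l_i(xi_j) = delta_ij.
    c = np.array([1.0])
    for j in range(N + 1):
        if j != i:
            c = P.polymul(c, np.array([-xi[j], 1.0]) / (xi[i] - xi[j]))
    return c

# Edge polynomials e_i = -sum_{k<i} l_k', i = 1..N  (Eq. (7)).
edge = [-sum(P.polyder(lagrange_coeffs(k)) for k in range(i))
        for i in range(1, N + 1)]

# Duality: integral of e_i over [xi_{j-1}, xi_j] equals delta_ij.
for i, c in enumerate(edge, start=1):
    anti = P.polyint(c)
    for j in range(1, N + 1):
        val = P.polyval(xi[j], anti) - P.polyval(xi[j - 1], anti)
        assert np.isclose(val, 1.0 if i == j else 0.0)
```

The check follows directly from (7) and the Kronecker property of the Lagrange basis.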

#### *2.2 Mimetic Discretization in 2D*

If we let the 0-form $\phi^{(0)} \in \Lambda^0(M)$ be expanded as

$$\phi_h^{(0)} = \sum_{i,j=0}^{N} \phi_{i,j}\, l_i(\xi)\, l_j(\eta),\tag{9}$$

then we can write $\phi_h^{(0)}$ as a matrix-vector product

$$\phi_h^{(0)} = \left[\mathbf{L} \otimes \mathbf{L}\right] \boldsymbol{\Phi} = \mathbf{M}^{(0)} \cdot \boldsymbol{\Phi},\tag{10}$$

where $\mathbf{L}_{i,j} = l_i(\xi_j)$ and the $\xi_j$ are the GLL points.
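As an illustration of (10) (not taken from the paper), the tensor-product evaluation can be carried out with a Kronecker product; here a degree-2 basis reproduces the bilinear function $\phi(\xi, \eta) = \xi + \eta$ exactly at a set of evaluation points:

```python
import numpy as np

def lagrange_matrix(nodes, x):
    # A[a, i] = l_i(x_a): the i-th Lagrange basis on `nodes` at points x.
    A = np.ones((x.size, nodes.size))
    for i in range(nodes.size):
        for j in range(nodes.size):
            if j != i:
                A[:, i] *= (x - nodes[j]) / (nodes[i] - nodes[j])
    return A

xi = np.array([-1.0, 0.0, 1.0])          # GLL nodes for N = 2
xe = np.linspace(-1.0, 1.0, 5)           # evaluation points
phi = xi[:, None] + xi[None, :]          # nodal values of phi(x, y) = x + y
A = lagrange_matrix(xi, xe)
vals = np.kron(A, A) @ phi.ravel()       # (L (x) L) Phi, cf. Eq. (10)
exact = (xe[:, None] + xe[None, :]).ravel()
assert np.allclose(vals, exact)
```

The same Kronecker structure underlies $\mathbf{M}^{(1)}$ and $\mathbf{M}^{(2)}$ below, with $\mathbf{E}$ replacing $\mathbf{L}$ in the edge directions.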

If we let the 1-form $u^{(1)} \in \Lambda^1(M)$ be defined as

$$
u^{(1)} = u^{\xi}\, d\xi + u^{\eta}\, d\eta,\tag{11}
$$

we can expand *u<sup>ξ</sup>* and *u<sup>η</sup>* using edge polynomials as,

$$u_h^{\xi} = \sum_{i=1}^{N} \sum_{j=0}^{N} u_{i,j}^{\xi}\, e_i(\xi)\, l_j(\eta),\tag{12}$$

$$u_h^{\eta} = \sum_{i=0}^{N} \sum_{j=1}^{N} u_{i,j}^{\eta}\, l_i(\xi)\, e_j(\eta). \tag{13}$$

The discrete 1-form $u_h^{(1)}$ can also be written as a matrix-vector product, where $\mathbf{u}$ is evaluated in the GLL points,

$$
u_h^{(1)} = \begin{bmatrix} \left[\mathbf{L} \otimes \mathbf{E}\right] & \mathbf{0} \\ \mathbf{0} & \left[\mathbf{E} \otimes \mathbf{L}\right] \end{bmatrix} \cdot \begin{bmatrix} \mathbf{u}^{\xi} \\ \mathbf{u}^{\eta} \end{bmatrix} = \mathbf{M}^{(1)} \cdot \mathbf{u},\tag{14}$$

where $\mathbf{E}_{i,j} = e_i(\xi_j)$.

The 2-form $p^{(2)} \in \Lambda^2(M)$ is expanded using only edge polynomials,

$$p_h^{(2)} = \sum_{i,j=1}^{N} p_{i,j}\, e_i(\xi)\, e_j(\eta) \Rightarrow \left[\mathbf{E} \otimes \mathbf{E}\right] \cdot \mathbf{p} = \mathbf{M}^{(2)} \cdot \mathbf{p}.\tag{15}$$

The Laplace equation can be reformulated using a mixed formulation, see [1], where the equilibrium equation and the constitutive relationship are separated into two equations.

$$
\nabla \phi = \mathbf{u}, \quad \nabla \cdot \mathbf{u} = 0. \tag{16}
$$

Writing (16) using differential geometry for a 3-D geometry we obtain,

$$d\phi^{(0)} = u^{(1)},\tag{17}$$

$$dq^{(2)} = 0^{(3)},\tag{18}$$

$$q^{(2)} = \star u^{(1)},\tag{19}$$

where we have utilized the Hodge star operator. The Hodge star operator is a map which maps $p$-forms onto $(n-p)$-forms, where $n$ is the dimension of the domain, $\Omega$. Given a $p$-form, $\lambda^{(p)}$, the Hodge star maps as follows:

$$
\star\lambda^{(p)}(\Omega^n) = \tilde{\lambda}^{(n-p)}(\Omega^n),
\tag{20}
$$

where $\tilde{\cdot}$ denotes the change of orientation of the new form. The Hodge star also provides the coupling between the outer oriented domain and the inner oriented dual space, as seen in Fig. 3.

In 2-D, using differential geometry, equations (16) take the form,

$$d\phi^{(0)} = u^{(1)}, \quad \star u^{(1)} = \tilde{q}^{(1)}, \quad d\tilde{q}^{(1)} = \tilde{0}^{(2)}.\tag{21}$$

When the exterior derivative is applied to the balance equation of (21) we obtain, see [5]

$$d\tilde{q}_h^{(1)} = \sum_{i,j=1}^{N} (q_{i,j}^{\xi} - q_{i-1,j}^{\xi} + q_{i,j}^{\eta} - q_{i,j-1}^{\eta})\, e_i(\xi)\, e_j(\eta),\tag{22}$$

where we have utilized (8). The balance equation, the last equation in (21), is equated to a zero-valued 2-form. Expanding it yields,

$$\sum_{i,j=1}^{N} f_{i,j}\, e_i(\xi)\, e_j(\eta) = \sum_{i,j=1}^{N} (q_{i,j}^{\xi} - q_{i-1,j}^{\xi} + q_{i,j}^{\eta} - q_{i,j-1}^{\eta})\, e_i(\xi)\, e_j(\eta), \tag{23}$$

where $f_{i,j} = 0$. The basis can then be cancelled and we can rewrite (23) as,

$$\mathbf{f} = \mathbb{E}^{(2,1)}\mathbf{q},\tag{24}$$

where $\mathbb{E}^{(2,1)}$ is an incidence matrix, consisting only of 0, 1 and −1. This matrix relates the fluxes of $\mathbf{q}$ to the volume integral of the balance equation, see Fig. 4.
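A minimal sketch of such an incidence matrix, with our own indexing conventions for the edge degrees of freedom, is:

```python
import numpy as np

def incidence_E21(N):
    # Incidence matrix mapping edge dofs (q_xi, q_eta) to the N*N face
    # dofs of one element; every entry is 0, +1 or -1, cf. Eqs. (22)-(24).
    n_xi = (N + 1) * N             # q_xi dofs: i = 0..N, j = 1..N
    n_eta = N * (N + 1)            # q_eta dofs: i = 1..N, j = 0..N
    E = np.zeros((N * N, n_xi + n_eta), dtype=int)
    fx = lambda i, j: i * N + (j - 1)                 # column of q_xi_{i,j}
    fe = lambda i, j: n_xi + (i - 1) * (N + 1) + j    # column of q_eta_{i,j}
    for i in range(1, N + 1):
        for j in range(1, N + 1):
            row = (i - 1) * N + (j - 1)
            E[row, fx(i, j)] += 1      # q_xi_{i,j}
            E[row, fx(i - 1, j)] -= 1  # - q_xi_{i-1,j}
            E[row, fe(i, j)] += 1      # q_eta_{i,j}
            E[row, fe(i, j - 1)] -= 1  # - q_eta_{i,j-1}
    return E

E = incidence_E21(3)
assert set(np.unique(E)) <= {-1, 0, 1}             # metric-free entries
assert all(np.sum(np.abs(E), axis=1) == 4)         # four signed edges per face
```

Because the matrix is purely topological, it is metric free and independent of the element geometry.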

The first step in developing the discrete system is the formulation of the weak form, where we make use of duality pairing between an arbitrary $k$-form, $\alpha^{(k)}$, and an arbitrary $(n-k)$-form, $\tilde{\beta}^{(n-k)}$. The duality pairing is defined as,

$$\left\langle \alpha^{(k)}, \tilde{\beta}^{(n-k)} \right\rangle_{\Omega^n} = \int_{\Omega^n} \alpha^{(k)} \wedge \tilde{\beta}^{(n-k)}.\tag{25}$$

The $(n-k)$-form, $\tilde{\beta}^{(n-k)}$, takes the role of the weight function of traditional finite element analysis; it lives in the dual space and carries the opposite orientation. The result of duality pairing can also be represented as a matrix-vector product,

$$\boldsymbol{\beta}^T \cdot \tilde{\mathbf{M}}^{(n-k),T} \cdot \mathbf{W} \cdot \mathbf{J} \cdot \mathbf{M}^{(k)} \cdot \boldsymbol{\alpha} = \boldsymbol{\beta}^T \cdot \mathbb{M}^{(k)} \cdot \boldsymbol{\alpha},\tag{26}$$

where $\mathbf{W}$ contains the Gauss weights and $\mathbf{J}$ is the Jacobian matrix. $\mathbb{M}^{(k)}$ is a mass matrix of the corresponding discretized $k$- and $(n-k)$-form pairing, and $\tilde{(\,)}$ denotes a matrix of opposite orientation.

Using the generalized Stokes theorem [3] and applying integration by parts to the balance equation (the last equation of (21)), we obtain

$$\int_{\Omega} d\tilde{q}^{(1)} \wedge \alpha^{(0)} = \int_{\Omega} d\left(\tilde{q}^{(1)} \wedge \alpha^{(0)}\right) - \int_{\Omega} \tilde{q}^{(1)} \wedge d\alpha^{(0)} \tag{27}$$

$$=\int_{\partial\Omega} \left(\tilde{q}^{(1)} \wedge \alpha^{(0)}\right) - \int_{\Omega} \tilde{q}^{(1)} \wedge d\alpha^{(0)}.\tag{28}$$

Using duality pairing, an inner product projection for the term with the Hodge star operator, the expansions in (9)–(15), and appropriate boundary conditions, we can set up the matrix system for the discrete Laplace operator as shown in (29).

$$
\begin{bmatrix}
\mathbf{0} & \mathbf{0} & \mathbb{E}^{(1,0),T}\tilde{\mathbb{M}}^{(1)} \\
\mathbf{0} & \mathbb{M}^{(1)} & \tilde{\mathbb{M}}^{(1)} \\
\mathbb{M}^{(1)}\mathbb{E}^{(1,0)} & \mathbb{M}^{(1)} & \mathbf{0}
\end{bmatrix} \cdot \begin{bmatrix}
\boldsymbol{\phi} \\
\mathbf{u} \\
\tilde{\mathbf{q}}
\end{bmatrix} = \begin{bmatrix}
\mathbf{0} \\
\mathbf{0} \\
\mathbf{0}
\end{bmatrix} .\tag{29}
$$

Using the forward Euler scheme for the temporal term and pairing it with an arbitrary 0-form, $\tilde{\alpha}^{(0)}$, we can rewrite the Bernoulli equation as,

$$\left\langle \rho^{(2)} \wedge \frac{\phi_n^{(0)} - \phi_{n-1}^{(0)}}{\Delta t} + \frac{1}{2}\, \rho^{(2)} \wedge \left( i_{d\phi_{n-1}^{(0)}}\, d\phi_n^{(0)} \right) + p^{(2)},\ \tilde{\alpha}^{(0)} \right\rangle_{\Omega} = \left\langle -\rho^{(2)} \wedge h^{(0)} g,\ \tilde{\alpha}^{(0)} \right\rangle_{\Omega}. \tag{30}$$

The density is considered a 2-form, which leaves (30) Hodge invariant. The interior product *i* is defined in [7]. The discrete version of (30) takes the form,

$$\begin{bmatrix} \frac{\rho}{\Delta t}\mathbb{M}^{(2)} + \frac{\rho}{2}\mathbb{M}^{(2)}\mathbb{M}_{i}^{(1,1)} & \mathbb{M}^{(2)} \end{bmatrix} \cdot \begin{bmatrix} \boldsymbol{\phi}_n \\ \mathbf{p} \end{bmatrix} = -\rho g\, \mathbb{M}^{(2)}\mathbf{h} + \frac{\rho}{\Delta t}\mathbb{M}^{(2)}\boldsymbol{\phi}_{n-1}.\tag{31}$$

$\mathbb{M}_i^{(1,1)}$ is derived from the interior product of the two 1-forms in the convective term; it contains information about $\phi$ from the previous time step and consequently has to be updated at each new time step.

The simulation is initialized by first solving the Laplace equation with the prescribed boundary conditions. The initial velocity potential on the free surface is set to $\tilde{\phi}(x, t = 0) = x$, and the free surface height is set to $\eta(x, t = 0) = 0$. At the following time steps, the Zakharov free surface equations are solved to obtain new values of $\tilde{\phi}$ as well as the free surface elevation, $\eta$.

#### **3 Numerical Results**

The method is first applied to a stationary problem without a free surface. The geometry, sketched in Fig. 5, contains a cylinder in the middle of a square. On the horizontal walls of the square and on the cylinder wall the no-penetration condition is applied. On the left vertical boundary a fully developed velocity profile is specified, and on the right vertical boundary a constant velocity potential is defined. The velocity potential $\phi$ and streamlines are shown in the middle section of Fig. 5. In the right part of Fig. 5 the pressure field is shown. Figure 6 shows that we obtain spectral convergence for both unknowns.

Furthermore, the balance equation $\nabla \cdot \mathbf{u} = 0$ (conservation of mass) is satisfied both globally and point-wise, independent of the polynomial order, as shown in Fig. 7.

Next we apply the method to a temporal free surface problem where we have included a bump on the bottom boundary. The Zakharov free surface equations are applied on the top horizontal boundary. In Fig. 8 the pressure field and the free surface are plotted at $t = 1, 100, 200$.

**Fig. 5** Left: multi-element mesh of the cylinder problem with corresponding boundary conditions. Middle: computed velocity potential $\phi$ in black with corresponding streamlines in red. Right: computed pressure field from the Bernoulli equation

**Fig. 6** The two unknowns of the system, the velocity potential, $\phi$, and the pressure field, $P$, are shown to exhibit spectral convergence

**Fig. 7** The mass balance equation of (16) ($\nabla \cdot \mathbf{u} = 0$) is satisfied both globally and locally for any order of the expansion polynomial

**Fig. 8** Time progression of the pressure field, $P$, at time steps $t = 1$, 100, and 200 is shown to the left. To the right, the height of the free surface wave $\eta$ is shown (a scaling factor of 10 is used)

#### **4 Discussion and Conclusion**

Using an isoparametric, multi-element formulation, the solution of the discretized Laplace equation shows spectral convergence. In addition, we observe that mass is conserved both globally and locally.

In (31), the discretized Bernoulli equation was kept Hodge invariant, leaving the equation metric free. This suggests that the fundamental invariant of the equation is conserved. The Bernoulli equation conserves the total energy of the system. However, in Fig. 9 it is observed that a small amount of energy is gained and lost in a periodic manner. It is also observed that the mean energy is constant. It was possible to time integrate over very long periods without noticeable degradation of the data, and we conclude that energy is conserved over long time periods even though fluctuations are observed over short time periods. In the future we plan to combine the mimetic spatial discretization used in the present work with a mimetic time integration scheme, such as the one used in [6].

**Fig. 9** Potential and kinetic energy is summed for the entire system at every time step and plotted against time

#### **References**



### **Spectral/hp Methodology Study for iLES-SVV on an Ahmed Body**

**Filipe F. Buscariolo, Spencer J. Sherwin, Gustavo R. S. Assi, and Julio R. Meneghini**

This work focuses on a correlation study between computational and physical models of an Ahmed body with a slant angle of 25°, which generates complex flow behaviour over the slant and back, with two vortices generated from the sides combined with separation on the slant. The physical results are from a wind tunnel test performed by Strachan et al. [12] with a moving ground and a Reynolds number of 1.7 million based on the length of the body.

CFD simulations were performed using Nektar++, an open-source, high-order spectral/hp element solver whose methodology combines mesh refinement (h) with increasing polynomial order (p) for higher-fidelity modelling. It employs an implicit large eddy simulation turbulence model with spectral vanishing viscosity (iLES-SVV), which acts as a filter on the high frequencies. The same physical test conditions and tunnel test section as in the reference experiments were considered, over a total time of 4 convective lengths, at the same Reynolds number of 1.7 million.

Considering the drag coefficient values for the fully developed cases at 5th and 6th polynomial order, the maximum difference observed relative to the experimental results was 16%; however, the simulation does not include the upper support used in the experimental setup. Comparing the spectral/hp element

F. F. Buscariolo (-)

Imperial College London, NDF-USP, London, UK e-mail: f.fabian-buscariolo16@imperial.ac.uk

S. J. Sherwin Imperial College London, London, UK e-mail: s.sherwin@imperial.ac.uk

G. R. S. Assi · J. R. Meneghini NDF-USP, São Paulo, Brazil e-mail: g.assi@usp.br; jmeneg@usp.br

© The Author(s) 2020

S. J. Sherwin et al. (eds.), *Spectral and High Order Methods for Partial Differential Equations ICOSAHOM 2018*, Lecture Notes in Computational Science and Engineering 134, https://doi.org/10.1007/978-3-030-39647-3\_23


LES-SVV case from the literature, the agreement with the experimental drag coefficient has been improved, reducing the gap from 45% to 16%. For the lift coefficient, the maximum difference between simulation and experiment is only 3%. There is also good agreement between the LDA measurements at the end of the body and the results from the simulation. A more intense vortex core is observed in the simulation than in the experimental data, which might well be explained by the upper support used to fix the Ahmed body in the experimental test, which weakens the vortices.

The methodology shows promising results against the open literature once an appropriate validation study has been undertaken. Despite the relatively coarse resolution adopted, the results are encouraging. Having identified an appropriate resolution, we will next consider other slant angles to see how well these correlate with the experimental studies.

#### **1 Introduction**

Among all automotive bluff bodies in the literature, the most studied one is the Ahmed body. It was first proposed by Ahmed et al. [2], based on previous work by Morel [7], who was the first to study the behavior of slanted bluff bodies. The Ahmed body was designed to have a shape similar to road vehicles and to generate their main flow features, such as stagnation and separation points. The main dimensions of the Ahmed body are highlighted in Fig. 1.

Based on the results found by Ahmed et al. [2] on the variation of the slant inclination angle, Huminic and Huminic [4] state that three different flow configurations are found. From 0 to 12.5◦, the airflow over the angled surface remains fully attached before separating from the model when it reaches the vertical surface of the back end; the flow from the angled section and the side walls produces a pair of counter-rotating vortices, which continue downstream. From 12.5 to 30◦, the flow over the angled section becomes highly complex: two counter-rotating lateral vortices of increased size are shed from the sides of the angled section, affecting the flow over the whole back end and causing a three-dimensional wake. These vortices are also responsible for maintaining attached flow over the angled surface up to an angle of 30◦. From 30◦ and above, the flow is fully separated; there remains, though, a weak tendency of the flow to turn around the side edge of the model, a result of the relative separation positions of the flow over the model top and over the backlight side edges.

Due to limitations of the wind tunnel and resources, Ahmed performed only force measurements on the bluff body during his experiments. To better understand the flow phenomena around an Ahmed body, Lienhart and Becker [6] performed a study using Laser Doppler Anemometry (LDA), Hot-Wire Anemometry (HWA) and static pressure measurements to investigate the flow and turbulence structure around the Ahmed body model for two slant angles: 25 and 35◦. The main scope was to supply a detailed data set acquired under

**Fig. 1** Ahmed Body schematic drawing considering its main dimensions and 3D visualization

well-defined boundary conditions, similar to Ahmed's first test, which considered a Reynolds number of 4.29 million based on the body length and a static floor, to be used as reference data for numerical simulations.

Aiming to reproduce the real highway conditions of a vehicle, Strachan et al. [12] performed an Ahmed body wind tunnel test with moving road conditions, recording both the aerodynamic forces and the flow characteristics by time-averaged LDA. The flow conditions differed slightly from those of Ahmed's first test: the flow velocity was reduced to 25 m/s, resulting in a Reynolds number of Re = 1.7 million based on the body length, and the supports on the ground were replaced by a fixing system at the top of the tunnel because of the rolling road simulation.

The Ahmed body stands as one of the most used validation cases for CFD codes employed in automotive applications. Simulations employing a Reynolds-Averaged Navier–Stokes (RANS) methodology are able to predict the drag coefficient with good accuracy, even for cases with complex flow topology such as the 25◦ slant angle, with a correlation factor of around 95% against experimental results; however, the flow physics does not agree, with the flow features usually under-estimated. Attempts with more refined methodologies such as Detached Eddy Simulation (DES) and Large Eddy Simulation (LES) provide better correlation with experiments in terms of flow structures, but the aerodynamic quantities lose accuracy.

A trend that arose to improve the confidence level of CFD simulations is the use of high-order, or high-fidelity, methods such as the spectral/hp element method [5]. According to Xu et al. [13], the spectral/hp element method combines the advantages of the spectral element method, in terms of accuracy and rapid convergence, with those of the classical h-version finite element method, which allows complex geometries to be captured effectively. It also provides an attractive higher-precision approximation for solving partial differential equations.

One software package that employs the spectral/hp element methodology is Nektar++ [9]. Nektar++ is a cross-platform spectral/hp element framework which aims to make high-order finite element methods accessible to the broader community. This is achieved by providing a structured hierarchy of C++ components, encapsulating the complexities of these methods, which can be readily applied to a range of application areas, as stated by Cantwell et al. [3]. It allows highly complex solution strategies such as implicit LES (iLES) using a Spectral Vanishing Viscosity (SVV) technique to stabilize the solution.

The latest achievements in high-fidelity turbulence modelling around an Ahmed body with a 25◦ slant angle are summarized in the compilation work of Serre et al. [11], a comparative analysis of recent simulations conducted in the framework of a French–German collaboration on LES of complex flows at a Reynolds number of 768,000. It compares results obtained with different eddy-resolving modelling approaches: two LES on body-fitted curvilinear grids, namely LES with Smagorinsky model and wall function (LES-NWM) and wall-resolving LES with dynamic Smagorinsky model (LES-NWR); a stabilized spectral method known as iLES-SVV, similar to the one used in the present work and the basis of the Nektar++ code; and a DES-SST approach on an unstructured grid, with element counts ranging from 18.5 to 40 million. The flow-field results show good agreement with the measurements of Lienhart and Becker [6], with a gap in the drag coefficient of 17% for the best case and 45% for the iLES-SVV case.

#### **2 Objectives**

The main objective of this work is to evaluate the aerodynamic behaviour, in terms of the drag and lift coefficients, of an Ahmed body with a 25◦ slant angle using the spectral/hp element method, as shown in Fig. 2. To achieve this, we first present a mesh study evaluating two different size refinements, referred to as h-refinements; for each of those, we employ three high-order surface mesh orders to improve the curvature representation. As the spectral/hp element method also allows the solution to be improved by increasing the polynomial order, and consequently the number of degrees of freedom, we additionally evaluate three high polynomial orders for each mesh case, giving a total of eighteen load cases.

**Fig. 2** Representation of the Ahmed Body with slant angle of 25◦

All load cases employ a moving ground condition and a Reynolds number of 1.7 million based on the length of the body. Since the same conditions are considered, results are compared with the experiments of Strachan et al. [12].

#### **3 Spectral/hp iLES-SVV**

In this work, Nektar++ is used to run an implicit LES simulation using the spectral/hp method. In this method, the domain is first divided into non-overlapping elements, offering geometric flexibility and allowing for local refinement. Simulations were performed using the incompressible Navier–Stokes solver employing a velocity correction scheme, combined with a Continuous Galerkin (CG) projection. More details are presented by Cantwell et al. [3].

The mathematics behind Nektar++ considers the numerical solution of partial differential equations (PDEs) on a domain $\Omega$, which may be geometrically complex, for some solution $u$. Practically, $\Omega$ takes the form of a $d$-dimensional finite element mesh consisting of elements $K_i$, embedded in a space of dimension $d_c$ such that $d \le d_c \le 3$, with $\Omega = \bigcup_i K_i$ and $K_i \cap K_j$ either an empty set or an interface of dimension $\bar{d} < d$. The PDE problem is then solved in the weak sense, requiring that $u|_{K_i}$ be smooth with at least a first-order derivative; it is therefore required that $u|_{K_i}$ lie in the Sobolev space $W^{1,2}(K_i)$, equivalent to $H^1(K_i)$, according to Adams [1]. For a continuous discretisation, we impose $C^0$ continuity along element interfaces.

We assume the solution can be represented as $u^\delta(\mathbf{x}) = \sum_n \hat{u}_n \Phi_n(\mathbf{x})$, a weighted sum of $N$ trial functions $\Phi_n(\mathbf{x})$ defined on $\Omega$, and the problem becomes that of finding the coefficients $\hat{u}_n$. The approximation $u^\delta$ does not directly give unique choices for the coefficients $\hat{u}_n$; to achieve this, a restriction is placed on the residual so that its $L^2$ inner product with respect to the test functions $\Psi_n(\mathbf{x})$ is zero. For a Galerkin projection the test functions are chosen to be the same as the trial functions, that is, $\Psi_n = \Phi_n$. As outlined previously, to construct the global basis $\Phi_n$ we first consider the contributions from each element in the domain. Each $K_i$ is mapped from a standard reference region $K_{\mathrm{st}} \subset [-1,1]^d$ by a parametric mapping $\chi^e \colon K_{\mathrm{st}} \to K_i$ given by $\mathbf{x} = \chi^e(\boldsymbol{\xi})$, where $K_{\mathrm{st}}$ is one of the supported region shapes and $\boldsymbol{\xi}$ are $d$-dimensional coordinates representing positions in the reference element, distinguishing them from $\mathbf{x}$, the $d$-dimensional coordinates in Cartesian space.
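The Galerkin projection described above can be illustrated in one dimension. The following is a hypothetical sketch (not the Nektar++ implementation; the function name and the choice of a Legendre basis are illustrative): the expansion coefficients are found by forcing the residual to be $L^2$-orthogonal to every test function, with test and trial functions identical.

```python
import numpy as np

# L2 (Galerkin) projection of f onto span{P_0, ..., P_N} on [-1, 1]:
# the test functions equal the trial functions, and the residual is
# forced to be L2-orthogonal to each of them.
def galerkin_project(f, N, quad_pts=64):
    x, w = np.polynomial.legendre.leggauss(quad_pts)
    # Vandermonde matrix of the Legendre trial functions at quadrature points
    V = np.stack([np.polynomial.legendre.Legendre.basis(n)(x)
                  for n in range(N + 1)], axis=1)
    M = V.T @ (w[:, None] * V)      # mass matrix (diagonal for Legendre)
    b = V.T @ (w * f(x))            # load vector (f, Phi_n)
    u_hat = np.linalg.solve(M, b)   # expansion coefficients
    u_delta = lambda xx: sum(c * np.polynomial.legendre.Legendre.basis(n)(xx)
                             for n, c in enumerate(u_hat))
    return u_hat, u_delta

# A degree-2 target is reproduced exactly by an N = 2 expansion:
# 3x^2 - 1 = 2 P_2(x), so u_hat is approximately [0, 0, 2].
u_hat, u_delta = galerkin_project(lambda x: 3 * x**2 - 1, N=2)
```

Because the target lies inside the trial space, the projection reproduces it to round-off; for general targets the projection is the best $L^2$ approximation in the span of the basis.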

The next step is to construct a local polynomial basis on each reference element with which to represent solutions. For 3D regions, a tensorial basis may be used, where the polynomial space is constructed as the tensor-product of one-dimensional bases on segments, quadrilaterals or hexahedral regions.

Spectral/hp element discretisations generally lead to approximations that have low dissipation and low dispersion per degree of freedom when compared to lower-order methods. As stated by Xu et al. [13], in solving advection-diffusion equations and nonlinear partial differential equations such as advection-dominated flows at marginal resolutions, oscillations appear that may render the computation unstable. Artificial viscosity has been used in many discretisation methods to suppress the wiggles associated with high wavenumbers, and has been broadly and effectively used in simulations with the Fourier method. A related concept is the so-called SVV, which was originally proposed as a second-order diffusion operator for spectral Fourier methods. SVV has been explicitly regarded as a turbulence model for implementing iLES, under the assumption that the action of the subgrid scales on the resolved scales is equivalent to a strictly dissipative action, as stated by Sagaut [10], even though SVV is not explicitly designed as a subgrid-scale model. An example of a 1-D SVV kernel is:

$$D\_f = \begin{cases} 0, & p \le P\_{cut} \\ \exp\left(-\dfrac{(p - P)^2}{(p - P\_{cut})^2}\right), & p > P\_{cut} \end{cases} \tag{1}$$

where $P$ is the total number of modes employed and $P_{cut}$ is the cutoff polynomial order. SVV with the kernel function $D_f$ can be regarded as a low-pass filter: the SVV dissipation added to the high mode numbers of the spectral element discretisation does indeed yield dissipation at the globally high wavenumber scales of the solution.
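The kernel of Eq. (1) can be sketched directly (an illustrative implementation; the function name and the sample values of $P$ and $P_{cut}$ are assumptions, not taken from the solver):

```python
import numpy as np

def svv_kernel(p, P, P_cut):
    """1-D SVV kernel of Eq. (1): zero below the cutoff mode, then a
    smoothly rising dissipation with D_f -> 1 as p -> P."""
    if p <= P_cut:
        return 0.0
    return float(np.exp(-((p - P) ** 2) / ((p - P_cut) ** 2)))

# Illustrative values: total modes P = 8, cutoff P_cut = 4.
P, P_cut = 8, 4
profile = [svv_kernel(p, P, P_cut) for p in range(P + 1)]
```

The profile confirms the low-pass behaviour: modes up to $P_{cut}$ are untouched, while the highest resolved mode receives the full dissipation weight.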

For this work, we employed the novel CG-SVV scheme with DG kernel proposed by Moura et al. [8], in which the dissipation curves of CG of order *p* are matched to those of DG of order *p* − 2, eliminating the non-smooth dissipation characteristics arising from CG at high Reynolds numbers.

#### **4 Simulation Methodology**

We first define the coordinate system with X the streamwise direction, Y the vertical direction and Z the spanwise direction. The Ahmed body length of 1.044 m is defined as 1 AL. The virtual wind tunnel dimensions are 2.74 × 1.66 m for the test section, with a total domain length of 4 AL, similar to the study of Strachan et al. [12]. The back of the Ahmed body model is placed at X = 0, the inlet at X = −2 AL and the outlet at X = 2 AL. A schematic setup is shown in Fig. 3.

In terms of boundary conditions, the velocity was normalized to 1 in order to match the Reynolds number stated previously and set as the inlet boundary condition. The outlet was set as a high-order pressure outlet condition, and the floor was given the same velocity as the free stream to reproduce the moving floor effect. The Ahmed body wall is set as a no-slip condition, while the top and outer side walls are set as symmetry conditions. The total simulated time is 7 convective time units based on AL, meaning the flow is able to cross the whole domain.
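With the free-stream velocity and body length both normalized to 1, matching the target Reynolds number fixes the non-dimensional viscosity supplied to the incompressible solver. A small sketch of this bookkeeping (the variable names are illustrative; the 4 AL domain length is taken from the text above):

```python
# Non-dimensional setup: free-stream velocity and body length are 1,
# so the viscosity follows from the target Reynolds number.
U_inf, AL, Re = 1.0, 1.0, 1.7e6
nu = U_inf * AL / Re            # non-dimensional kinematic viscosity

# 7 convective time units (AL / U_inf) versus a 4 AL domain length:
# the free stream traverses the full domain 1.75 times.
crossings = 7.0 * (U_inf * AL) / 4.0
```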

This study evaluates two mesh configurations with different h-refinements, referred to as the Original and Refined meshes, and for each of those, three high-order surface mesh settings: 4th, 5th and 6th order, generating six different meshes. All mesh files were generated by NekMesh, the Nektar++ high-order mesh generator. In both the Original and Refined mesh cases, two refinement zones were generated. The first, defined as the Ahmed Body Refinement, ranges from 0.3 AL before the beginning of the geometry to 0.3 AL after the end of the body, for a total length of 1.6 AL. The second, defined as the Wake Refinement region, intersects the first refinement from 0.3 AL before the end of the body to 1.3 AL after the end of the body, in order to fully capture the flow phenomena in the separation region, with the same total length of 1.6 AL, as illustrated in Fig. 4.

The Original mesh has a total number of elements for the half model of around 95,000. For the Refined mesh, the boundary layer setup was the same and the dimensions

**Fig. 3** Schematic representation of the boundary condition on the Ahmed Body simulation

**Fig. 4** Plane Z = 0 representation of mesh refinement regions. Ahmed Body Refinement region highlighted in yellow and Wake Refinement region highlighted in black

**Fig. 5** Plane Z =0 mesh refinement comparison between two h-refinement cases. (**a**) Original mesh. (**b**) Refined mesh

were kept the same in terms of sizing of the refinement regions, with additional refinement in the Refined mesh setup giving a total of 310,000 elements. Details of both meshes are shown in Fig. 5.

Most commercial CFD codes employ low-order methods, and the highest polynomial interpolation order usually seen for the solution is 3rd. The mesh then plays the major role in complex simulations such as LES, leading to an elevated number of elements to reach a reliable result. To exploit the flexibility of the spectral/hp element method, we propose solutions with polynomial orders higher than 3rd within the previous mesh refinement studies, since higher-order polynomials increase the degrees of freedom and the resolution of the mesh. For the Nektar++ implicit LES simulations using the incompressible Navier–Stokes solver, we evaluated three different polynomial expansions, of 4th, 5th and 6th order, referred to here as P4, P5 and P6. In summary, 18 load cases were evaluated on an HPC system with 432 CPUs per case.
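To make the degree-of-freedom growth concrete, a rough estimate can be sketched: a 3-D tensor-product element of order $P$ carries about $(P+1)^3$ local modes. This is an illustrative back-of-the-envelope count (the function name is ours, and the estimate ignores the sharing of interface modes under CG assembly, so it over-counts the assembled global system):

```python
# Rough local degree-of-freedom count per mesh / polynomial-order pair.
# Each 3-D tensor-product element carries ~(P + 1)**3 modes; CG assembly
# shares interface modes, so the true global count is somewhat smaller.
def approx_dofs(n_elements, P):
    return n_elements * (P + 1) ** 3

# Element counts from the text: Original ~95,000, Refined ~310,000.
dofs = {(mesh, P): approx_dofs(n, P)
        for mesh, n in [("Original", 95_000), ("Refined", 310_000)]
        for P in (4, 5, 6)}
```

The estimate shows why raising $P$ on a fixed, relatively coarse mesh is an attractive alternative to pure h-refinement: going from P4 to P6 multiplies the local resolution by $(7/5)^3 \approx 2.7$ without regenerating the mesh.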

#### **5 Results**

#### *5.1 Drag Coefficient Comparison Results*

The drag coefficients for the 18 cases evaluated, comprising 9 from the Original mesh with 95,000 elements at fourth, fifth and sixth polynomial order and 9 from the Refined mesh with 310,000 elements at the same polynomial orders, are shown in Fig. 6 together with the maximum RMS and the experimental results.

From Fig. 6 it is possible to observe that, for the drag coefficient, the P4 polynomial expansion on both meshes gives mean drag results around 35% higher than the experimental value. For the P5 cases, again on both the Original and Refined meshes, the error is reduced to 5%; however, the results change from over-predicted to under-predicted when the mesh is refined further. The P6 cases present the same trend on both meshes, highlighting their consistency, although the mean error relative to the experiments increases to 16%.

#### *5.2 Lift Coefficient Comparison Results*

Similar to the drag coefficient graph, Fig. 7 shows the lift coefficient for all the evaluated cases, considering the Original and Refined meshes and fourth, fifth and sixth polynomial order expansions. The maximum RMS is also plotted for all cases and compared with the experimental results of Strachan et al. [12].

**Fig. 6** Drag coefficient for the 18 evaluated test cases. On the left, average values for the Original mesh, considering fourth, fifth and sixth polynomial expansions (P4, P5 and P6). On the right, average values for the Refined mesh, considering the same polynomial expansions

**Fig. 7** Lift coefficient for the 18 evaluated test cases. On the left, average values for the Original mesh, considering fourth, fifth and sixth polynomial expansions (P4, P5 and P6). On the right, average values for the Refined mesh, considering the same polynomial expansions

Analyzing Fig. 7, we observe that the h-refinement from the Original to the Refined mesh leads to results closer to the experimental values when adopting P4 as the polynomial expansion basis. For both the P5 and P6 expansions, the lift coefficient results present good agreement with the experimental data, with a maximum mean error of 5%.

#### *5.3 Flow Structure Comparison*

In terms of the polynomial order expansions for the solution, combined with the 6th-order surface mesh, the results presented focus on the Refined mesh case, since it improved the correlation with the experimental results for the P4 expansion and kept a similar trend for P5 and consistent results for P6 in terms of drag and lift coefficient prediction. An initial comparison of flow structures is presented via the Q-criterion iso-surface at a value of 350, coloured by pressure, comparing the Refined mesh case for the P4, P5 and P6 polynomial expansions in Fig. 8.

**Fig. 8** Iso-Surface of Q-Criterion = 350 colored by pressure on the Ahmed Body with slant angle of 25◦, considering Refined mesh and 6th order surface mesh for fourth, fifth and sixth polynomial expansions (P4, P5 and P6)

**Fig. 9** Contour of Lambda 2 on the plane x/L = 0 on the back of the Ahmed Body with slant angle of 25◦, considering Refined mesh and 6th order surface mesh for fourth, fifth and sixth polynomial expansions (P4, P5 and P6)

From Fig. 8 it is possible to see that P4 is unable to resolve the vortex on the side of the slant, which also explains the difference in both drag and lift coefficients compared to the experimental results. The P5 results show the side vortex clearly defined, and P6 is additionally able to capture the lower vortex, detailed in the lower image. This vortex is not present in previous studies of the Ahmed body, but it is important for understanding the behaviour with a moving floor. Figure 9 shows a contour of Lambda 2 on the plane x/L = 0, at the back of the Ahmed body, to illustrate the lower vortex in detail.

We next focus only on the Refined mesh with 6th-order surface mesh for P5 and P6, since these were able to predict both the lower and top vortices. Due to the nature of the moving-ground wind tunnel used by Strachan et al. [12], the model had to be fixed at the top by a stem; its contribution can be removed from the drag coefficient calculations, but it might change the flow topology over the slant, as stated by the authors themselves.

Comparing, on the plane x/L = 0.076, the measurements of the streamwise velocity U normalized by the free-stream velocity from Lienhart and Becker [6], obtained with a static floor and without the upper stem, against the results of Strachan et al. [12], it is possible to notice intensity changes in the normalized U velocity, which the latter authors attribute to the upper support.

As the simulations do not include the upper support, but do include the moving ground, the expected result is a top portion similar to the measurements of Lienhart and Becker [6] and a lower part correlated with the measurements of Strachan et al. [12]; both P5 and P6 show good agreement in terms of normalized U velocity, as shown in Fig. 10.

A similar comparison is presented in Fig. 11 for the vortex intensity on the slant, on the plane x/L = 0 at the back of the Ahmed body, for the vertical velocity V normalized by the free-stream velocity. Here the simulations correlate closely with the study of Lienhart and Becker [6], due to the absence of the stem support; in this case the higher polynomial order expansion P6 is able to capture more scales than P5 in the core of the main vortex, highlighting the resolution gained by the high-order simulations.

**Fig. 10** Contour of U velocity normalized by free-stream velocity on the plane x/L = 0.076 of the Ahmed body with slant angle of 25◦, comparing LDA measurements of Lienhart and Becker [6] with static floor and without the stem (left), results of Strachan et al. [12] with moving floor and stem support (middle left), Refined mesh with 6th-order surface and P5 (middle right) and Refined mesh with 6th-order surface and P6 (right)

**Fig. 11** Contour of V velocity normalized by free-stream velocity on the plane x/L = 0 of the Ahmed body with slant angle of 25◦, comparing LDA measurements of Lienhart and Becker [6] with static floor and without the stem (left), results of Strachan et al. [12] with moving floor and stem support (middle left), Refined mesh with 6th-order surface and P5 (middle right) and Refined mesh with 6th-order surface and P6 (right)

#### **6 Conclusions**

With the advances in CFD codes, confidence levels and computational power, aerodynamic simulation is applied in almost every automotive company. The reason is simple: reduced development cost and time, an enormous advantage in a competitive market.

High-fidelity simulations are becoming a reality for complex industrial cases, improving resolution and delivering results within a reliable response time, as presented for the Ahmed body in this work.

In the meshing definition study, the surface mesh order seems not to influence the results in terms of aerodynamic quantities, presenting a similar trend for the same polynomial order, since the Ahmed body geometry has curved surfaces only on its front portion.

Still on the mesh definition, as the h-refinement increases from the Original to the Refined mesh, the drag coefficient values for P4 and P6 remain unchanged, while the P5 values switch from over-prediction to under-prediction. We conclude that consistency is shown for the P4 and P6 cases, with P6 presenting the most reliable results, with a maximum deviation of 16%. For the lift coefficient, the P4 results improved as the h-refinement increased, while P5 and P6 kept similar values; the best agreement was found for the Refined mesh with 6th-order surface mesh and the P6 polynomial expansion.

The flow structure results focus only on the Refined mesh with 6th-order surface mesh, where the main expected features were captured by the P5 and P6 cases. These two simulation cases confirmed that the lower portion behaves similarly to the moving-ground test of Strachan et al. [12], while the top portion correlates closely with the experiments of Lienhart and Becker [6], as the simulations allow the body to be fixed without the upper support used in the experiment. This fact might also explain the differences between the simulation results and the literature experiments, as the simulation allows idealized configurations.

For all simulation cases, half of the body is simulated and a symmetry plane is set on the middle portion. From Fig. 12, which shows the normalized U velocity along a line at coordinate y/L = 0.15 on the plane x/L = 0, we observe that

**Fig. 12** Normalized U velocity distribution over a line at coordinate y/L = 0.15 on the plane x/L = 0 at the back of the Ahmed body with slant angle of 25◦, comparing simulation results for P5 (red) and P6 (orange) with LDA measurements of Lienhart and Becker [6] (dark green) and Strachan et al. [12] (light green)

the simulation agrees well with the experimental results, with a small distortion as it approaches the symmetry plane.

**Acknowledgements** The authors acknowledge the HPC facilities at Imperial College London and those available through the UK Turbulence Consortium.

#### **References**



### **A High-Order Discontinuous Galerkin Solver for Multiphase Flows**

**Juan Manzanero, Carlos Redondo, Gonzalo Rubio, Esteban Ferrer, Eusebio Valero, Susana Gómez-Álvarez, and Ángel Rivero-Jiménez**

#### **1 Introduction**

Multiphase flow is not a canonical problem, and therefore different models can be found in the literature. The Volume of Fluid (VOF) model [9] is amongst the simplest: it defines a single set of momentum equations shared by all phases, whilst the volume fraction (the fraction of a particular infinitesimal control volume occupied by each phase) is tracked throughout the domain following an advection equation. Phase-field methods [11] conserve the simplicity of VOF whilst increasing the physical meaning of the evolution equation of the fluids present in the simulation: the volume fraction is replaced by a phase-field parameter, which identifies each phase. In this work, the Cahn–Hilliard equation [4] is chosen to model the evolution of the phase-field parameter.

The introduced model is discretised in space using a high-order discontinuous Galerkin method. These methods have been gaining popularity for the discretisation of conservation laws, such as the Navier–Stokes equations [5–7, 13, 16, 22, 26]. Specifically, we use a Discontinuous Galerkin Spectral Element Method (DGSEM) [2] that allows the generation of provably stable schemes [8]. These schemes provide enhanced robustness when compared to classical high-order methods [17–20]. As far as the temporal discretisation is concerned, we use an efficient implicit-

J. Manzanero (-) · C. Redondo · G. Rubio · E. Ferrer · E. Valero

ETSIAE-UPM – School of Aeronautics, Universidad Politécnica de Madrid, Madrid, Spain

Center for Computational Simulation, Universidad Politécnica de Madrid, Boadilla del Monte, Madrid, Spain

e-mail: juan.manzanero@upm.es

S. Gómez-Álvarez · Á. Rivero-Jiménez Repsol Technology Lab, Móstoles, Madrid, Spain


explicit approach that permits maintaining the time step restriction of a typical one-phase Navier–Stokes solver. It should be noted that similar approaches to model multiphase flows have been proposed in the past; see for example [29], where an algorithm to model N immiscible incompressible fluids with high-order methods is described. However, to the authors' knowledge, this is the first implementation using the DGSEM.

The rest of the paper is organised as follows: in Sect. 2 the governing equations of the model are described. In Sect. 3 the numerical techniques to discretise the described model are introduced. Finally, in Sect. 4 the results of two validation test cases are shown.

#### **2 Governing Equations**

In this work we model multiphase flows with a phase-field approach. The flow field is modelled by means of the incompressible Navier–Stokes equations. The evolution of each of the fluids is modelled with the Cahn–Hilliard equation, which defines a phase-field variable $\phi \in [-1, 1]$ that identifies spatial coordinates occupied by fluid 1 ($\phi = -1$), fluid 2 ($\phi = 1$), or an interface ($\phi \in (-1, 1)$). The value of the thermodynamic properties of the fluids at each spatial coordinate can be computed as:

$$\rho(\phi) = \rho\_1 \left(\frac{1-\phi}{2}\right) + \rho\_2 \left(\frac{1+\phi}{2}\right), \quad \eta(\phi) = \eta\_1 \left(\frac{1-\phi}{2}\right) + \eta\_2 \left(\frac{1+\phi}{2}\right), \tag{1}$$

where $\rho_i$ is the density of fluid $i$ and $\eta_i$ its dynamic viscosity. The complete system is built by considering first the momentum equation,

$$\frac{\partial \left(\rho \mathbf{v}\right)}{\partial t} + \nabla \cdot \left(\rho \mathbf{v} \mathbf{v}\right) = -\nabla p + \frac{1}{Re} \nabla \cdot \left(\eta \left(\nabla \mathbf{v} + \nabla \mathbf{v}^{T}\right)\right) + \frac{3}{\sqrt{2}\,\varepsilon\, Re\, Ca} \mu \nabla \phi + \frac{1}{Fr^{2}} \rho \mathbf{e}\_{\mathbf{g}},\tag{2}$$

with velocity $\mathbf{v}$, static pressure $p$, Reynolds number $Re = \rho_1 u_0 L / \eta_1$ (where $u_0$ is a reference velocity and $L$ a reference length), Capillary number $Ca = \eta_1 u_0 / \sigma$ (where $\sigma$ represents the surface tension), Froude number $Fr = u_0 / \sqrt{gL}$ (where $g$ is the gravitational acceleration), and $\mathbf{e}_g$ the gravity direction. Second, an artificial compressibility method [25] is used to couple the divergence-free condition,

$$\frac{\partial p}{\partial t} + \frac{\rho\_0}{\rho\_1} \frac{1}{M\_0^2} \nabla \cdot \mathbf{v} = 0,\tag{3}$$

where $\rho_0 = \max(\rho_1, \rho_2)$ is a reference density and $M_0$ is the artificial compressibility Mach number. Third, the Cahn–Hilliard equation for the phase field,

$$\frac{\partial \phi}{\partial t} + \nabla \cdot (\phi \mathbf{v}) = M \nabla^2 \mu, \quad \mu = -\phi + \phi^3 - \varepsilon^2 \nabla^2 \phi,\tag{4}$$

with *M* the mobility, and *ε* the interface width, the two free parameters of the model. In (2) and (4), *μ* represents the chemical potential. Moreover, this equation is designed to minimize the free-energy functional [4], F,

$$\mathcal{F}(\phi, \nabla \phi) = \int\_{\Omega} \left( \frac{1}{4} \left( 1 - \phi \right)^2 \left( 1 + \phi \right)^2 + \frac{1}{2} \varepsilon^2 |\nabla \phi|^2 \right) d\mathbf{x}. \tag{5}$$
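As a quick numerical check (an illustrative sketch, not part of the authors' solver; the function name and grid parameters are assumptions), the chemical potential of Eq. (4) vanishes for the pure phases $\phi = \pm 1$, which are the minima of the double-well term in (5):

```python
import numpy as np

def chemical_potential(phi, dx, eps):
    """mu = -phi + phi**3 - eps**2 * laplacian(phi), as in Eq. (4),
    using a periodic second-order finite-difference Laplacian."""
    lap = (np.roll(phi, -1) - 2.0 * phi + np.roll(phi, 1)) / dx**2
    return -phi + phi**3 - eps**2 * lap

# Uniform pure phases: mu must vanish identically.
mu_fluid2 = chemical_potential(np.ones(32), dx=0.1, eps=0.05)
mu_fluid1 = chemical_potential(-np.ones(32), dx=0.1, eps=0.05)
```

A non-zero $\mu$ therefore only arises near interfaces, where the flux $M\nabla^2\mu$ in (4) drives the phase field toward the equilibrium profile.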

Note that the set of Eqs. (2)–(4) is written in non-dimensional form, where the thermodynamic variables of fluid 1 are taken as reference values, e.g.,

$$\rho(\phi) = \left(\frac{1-\phi}{2}\right) + \frac{\rho\_2}{\rho\_1} \left(\frac{1+\phi}{2}\right), \quad \eta(\phi) = \left(\frac{1-\phi}{2}\right) + \frac{\eta\_2}{\eta\_1} \left(\frac{1+\phi}{2}\right). \tag{6}$$
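The linear blending of Eqs. (1) and (6) can be sketched as follows (an illustrative helper; the function name and the sample property values are assumptions, not data from the paper):

```python
def blend(phi, f1, f2):
    """Linear phase-field blending of Eq. (1): returns the mixture
    property for phi in [-1, 1] (phi = -1 -> fluid 1, phi = +1 -> fluid 2)."""
    return f1 * (1.0 - phi) / 2.0 + f2 * (1.0 + phi) / 2.0

# Illustrative water/air-like values (not from the paper):
rho1, rho2 = 1000.0, 1.2
rho_interface = blend(0.0, rho1, rho2)   # midpoint of the interface
```

The pure phases recover the constituent properties exactly, while the interface region interpolates linearly between them; dividing through by the fluid-1 values gives the non-dimensional form (6).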

The set (2)–(4) can be written as an advection-diffusion system:

$$\frac{\partial u}{\partial t} + \nabla \cdot \mathbf{F}(u) = \nabla \cdot \mathbf{F}\_v(u, \mathbf{g}) + S(u, \mathbf{g}), \tag{7}$$

where *u* = *(φ, ρ***v***, p)* is the state vector, **g** = *(***g***φ,* **g***v,* **g***μ)* = *(*∇*φ,* ∇**v***,* ∇*μ)* is the gradients vector, **F***(u)* and **F***v(u,* **g***)* are the inviscid and viscous fluxes respectively, and *S(u,* **g***)* is a source term,

$$\mathbf{F}(u) = \begin{bmatrix} \phi \mathbf{v} \\ \rho \mathbf{v} \mathbf{v} + p \boldsymbol{I}\_3 \\ \frac{\rho\_0}{\rho\_1} \frac{1}{M\_0^2} \mathbf{v} \end{bmatrix}, \quad \mathbf{F}\_v(u, \mathbf{g}) = \begin{bmatrix} M \mathbf{g}\_{\mu} \\ \frac{1}{Re}\, \eta \left( \mathbf{g}\_v + \mathbf{g}\_v^T \right) \\ 0 \end{bmatrix}, \quad S(u, \mathbf{g}) = \begin{bmatrix} 0 \\ \frac{3}{\sqrt{2}\, \varepsilon\, Re\, Ca} \mu \mathbf{g}\_{\phi} + \frac{1}{Fr^2} \rho \mathbf{e}\_{g} \\ 0 \end{bmatrix}. \tag{8}$$

#### **3 Numerical Methods**

The numerical implementation of (2)–(4) is performed using a high-order discontinuous Galerkin scheme for the spatial discretisation (DGSEM variant) and an implicit-explicit Euler scheme for the time discretisation.

#### *3.1 Spatial Discretisation Using a Nodal Discontinuous Galerkin Scheme (DGSEM)*

Discontinuous Galerkin (DG) schemes (see [15]) are constructed by tessellating the domain in non-overlapping elements, where the solution is approximated using polynomials of an arbitrary order, *N*. In this particular implementation, we use a nodal variant of the DG method, and we restrict ourselves to hexahedral elements.

In each element we approximate the solution using polynomials written in a set of local spatial coordinates *ξ* = (*ξ*, *η*, *ζ*) ∈ [−1, 1]<sup>3</sup>, which are related to the physical space by a transfinite mapping,

$$\mathbf{x} = (x, y, z) = \mathbf{X}(\boldsymbol{\xi}) = \mathbf{X}\left(\xi, \eta, \zeta\right). \tag{9}$$

Using the local coordinates, we write the solution using tensor product Lagrange polynomials,

$$\left.u(\mathbf{x})\right|\_E \approx U(\boldsymbol{\xi}) = \sum\_{i,j,k=0}^{N} U^{ijk}(t) l\_i(\boldsymbol{\xi}) l\_j(\boldsymbol{\eta}) l\_k(\boldsymbol{\zeta}),\tag{10}$$

where the time-dependent coefficients *U*<sup>ijk</sup>(*t*) are the nodal values of the solution *U*, and *l<sub>j</sub>*(*ξ*) are the Lagrange polynomials based on a set of Gauss points {*ξ<sub>j</sub>*}<sub>j=0</sub><sup>N</sup>. To handle curvilinear geometries, we use a mapping **X** that transforms between local and physical spaces. With this mapping, we can construct the covariant **a**<sub>i</sub> and contravariant **a**<sup>i</sup> bases, their associated Jacobian *J*, and the metrics matrix <i>M</i>:

$$\mathbf{a}\_{i} = \frac{\partial \mathbf{X}(\boldsymbol{\xi})}{\partial \xi\_{i}}, \quad \mathbf{a}^{i} = \nabla \xi\_{i} = \frac{1}{J}\, \mathbf{a}\_{j} \times \mathbf{a}\_{k}, \quad J = \mathbf{a}\_{i} \cdot \left(\mathbf{a}\_{j} \times \mathbf{a}\_{k}\right), \quad \mathcal{M} = [J \mathbf{a}^{\xi}, J \mathbf{a}^{\eta}, J \mathbf{a}^{\zeta}], \tag{11}$$
with (*i*, *j*, *k*) cyclic.
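As a sanity check on (11), the covariant basis and the Jacobian can be approximated numerically for any given mapping. The affine box mapping below is a hypothetical example, not a geometry from the paper:

```python
import numpy as np

def covariant_basis_and_jacobian(X, xi, h=1e-6):
    """Finite-difference approximation of the covariant basis a_i = dX/dxi_i
    and the Jacobian J = a_1 . (a_2 x a_3), cf. Eq. (11). `X` maps local
    coordinates (xi, eta, zeta) in [-1,1]^3 to physical space."""
    xi = np.asarray(xi, dtype=float)
    a = []
    for i in range(3):
        e = np.zeros(3)
        e[i] = h
        a.append((X(xi + e) - X(xi - e)) / (2.0 * h))  # central differences
    a1, a2, a3 = a
    J = np.dot(a1, np.cross(a2, a3))
    return a, J

# Hypothetical affine mapping of [-1,1]^3 onto a 2x3x4 box: J = 1.0*1.5*2.0 = 3.
box = lambda xi: np.array([1.0 * xi[0], 1.5 * xi[1], 2.0 * xi[2]])
_, J = covariant_basis_and_jacobian(box, [0.2, -0.1, 0.4])
```

Central differences are exact for an affine mapping, so the recovered Jacobian matches the analytic value to rounding error.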

Following [14], we transform the system of Eqs. (7) to local coordinates,

$$
\frac{\partial}{\partial t} \begin{Bmatrix} J\phi \\ J\rho \mathbf{v} \\ Jp \end{Bmatrix} + \nabla\_{\xi} \cdot \begin{Bmatrix} \mathcal{M}^{T} \mathbf{v} \phi \\ \mathcal{M}^{T} \rho \mathbf{v}\mathbf{v} + \mathcal{M}^{T} p \boldsymbol{I}\_{3} \\ \mathcal{M}^{T} \frac{1}{M\_{0}^{2}} \mathbf{v} \end{Bmatrix} = \nabla\_{\xi} \cdot \begin{Bmatrix} M \mathcal{M}^{T} \mathbf{g}\_{\mu} \\ \frac{1}{Re} \mathcal{M}^{T} \left( \eta \left( \mathbf{g}\_{\mathbf{v}} + \mathbf{g}\_{\mathbf{v}}^{T} \right) \right) \\ 0 \end{Bmatrix} + J \begin{Bmatrix} 0 \\ \frac{1}{Fr^{2}} \rho \mathbf{e}\_{g} + \frac{3}{\sqrt{2}\, Re\, Ca\, \varepsilon} \mu \mathbf{g}\_{\phi} \\ 0 \end{Bmatrix}, \tag{12}
$$

with gradients,

$$J\mathbf{g}\_{\mathbf{v}} = \mathcal{M}\nabla\_{\xi}\mathbf{v},\ \ J\mathbf{g}\_{\phi} = \mathcal{M}\nabla\_{\xi}\phi,\ \ J\mathbf{g}\_{\mu} = \mathcal{M}\nabla\_{\xi}\mu,\tag{13}$$

and the chemical potential definition,

$$J\mu = -J\phi + J\phi^3 - \varepsilon^2 \nabla\_\xi \cdot \left(\mathcal{M}^T \mathbf{g}\_\phi\right). \tag{14}$$

We obtain the DG scheme by replacing the continuous solution with its polynomial approximation (10), then multiplying (12), written in the compact form (7), by a polynomial test function *ϑ* (of the same order *N* as the solution), and integrating the result over one element *E* = [−1, 1]<sup>3</sup>,

$$\int\_{E} J \vartheta \frac{\partial U}{\partial t} + \int\_{E} \vartheta \nabla\_{\xi} \cdot \mathbf{F}(U) = \int\_{E} \vartheta \nabla\_{\xi} \cdot \mathbf{F}\_{v}(U, \mathbf{G}) + \int\_{E} J \vartheta S(U, \mathbf{G}). \tag{15}$$

Next, we integrate by parts the terms containing divergences, which yields surface integrals. Since the solution is discontinuous at the inter-element faces, we replace the surface flux by a *numerical flux*, **F**<sup>⋆</sup>,

$$
\int\_{E} J \vartheta \frac{\partial U}{\partial t} + \int\_{\partial E} \vartheta \mathbf{F}^{\star} \cdot \hat{n}\, dS - \int\_{E} \nabla\_{\xi} \vartheta \cdot \mathbf{F} = \int\_{\partial E} \vartheta \mathbf{F}\_{v}^{\star} \cdot \hat{n}\, dS - \int\_{E} \nabla\_{\xi} \vartheta \cdot \mathbf{F}\_{v} + \int\_{E} J \vartheta S, \tag{16}
$$

where *∂E* represents the six faces of the element *E*. For the inviscid numerical flux **F**<sup>⋆</sup>, we use the exact Riemann solver derived in [1], whilst for the viscous numerical flux we use the Symmetric Interior Penalty (SIP) method [27], with the penalty parameter value derived in [24] and recently discussed for the DGSEM in [21]. In (16), *n̂* is the outward surface normal vector in local coordinates. To obtain the evolution equations for each nodal degree of freedom *U*<sup>ijk</sup>, we let *ϑ* = *l<sub>i</sub>*(*ξ*)*l<sub>j</sub>*(*η*)*l<sub>k</sub>*(*ζ*), and compute the integrals using the Gauss quadrature points (and weights {*w<sub>i</sub>*}) associated with the interpolation points (which are exact for polynomials of degree 2*N* + 1),

$$
\begin{split}
J^{ijk} \frac{dU^{ijk}}{dt}
&+ \frac{\mathbf{F}^{\star}(\xi, \eta\_j, \zeta\_k)}{w\_i} l\_i(\xi) \Big|\_{\xi=-1}^{\xi=1}
+ \frac{\mathbf{F}^{\star}(\xi\_i, \eta, \zeta\_k)}{w\_j} l\_j(\eta) \Big|\_{\eta=-1}^{\eta=1}
+ \frac{\mathbf{F}^{\star}(\xi\_i, \eta\_j, \zeta)}{w\_k} l\_k(\zeta) \Big|\_{\zeta=-1}^{\zeta=1} \\
&- \sum\_{m=0}^{N} \left( \frac{w\_m}{w\_i} D\_{mi} F\_x^{mjk} + \frac{w\_m}{w\_j} D\_{mj} F\_y^{imk} + \frac{w\_m}{w\_k} D\_{mk} F\_z^{ijm} \right) \\
=\ & \frac{\mathbf{F}\_v^{\star}(\xi, \eta\_j, \zeta\_k)}{w\_i} l\_i(\xi) \Big|\_{\xi=-1}^{\xi=1}
+ \frac{\mathbf{F}\_v^{\star}(\xi\_i, \eta, \zeta\_k)}{w\_j} l\_j(\eta) \Big|\_{\eta=-1}^{\eta=1}
+ \frac{\mathbf{F}\_v^{\star}(\xi\_i, \eta\_j, \zeta)}{w\_k} l\_k(\zeta) \Big|\_{\zeta=-1}^{\zeta=1} \\
&- \sum\_{m=0}^{N} \left( \frac{w\_m}{w\_i} D\_{mi} F\_{v,x}^{mjk} + \frac{w\_m}{w\_j} D\_{mj} F\_{v,y}^{imk} + \frac{w\_m}{w\_k} D\_{mk} F\_{v,z}^{ijm} \right)
+ J^{ijk} S^{ijk},
\end{split} \tag{17}
$$

where *F*<sup>ijk</sup> = *F*(*U*<sup>ijk</sup>) and *F*<sub>v</sub><sup>ijk</sup> = *F<sub>v</sub>*(*U*<sup>ijk</sup>, **G**<sup>ijk</sup>), with **G**<sup>ijk</sup> the nodal values of the gradient **G**. The derivative matrix *D<sub>ij</sub>* is defined as *D<sub>ij</sub>* = *l*′<sub>j</sub>(*ξ<sub>i</sub>*). To compute the gradient **G**, we use the weak formulation of (13),

$$\int\_{E} J\, \boldsymbol{\tau} \cdot \mathbf{G} = \int\_{\partial E} U^{\star} \left( \mathcal{M}^{T} \boldsymbol{\tau} \right) \cdot \hat{n}\, dS - \int\_{E} U \nabla\_{\xi} \cdot \left( \mathcal{M}^{T} \boldsymbol{\tau} \right), \tag{18}$$

where **τ** is an arbitrary vector-valued test function (from the space of polynomials of order *N*). Since we use the SIP method, we use solution averages to couple inter-element fluxes, *U*<sup>⋆</sup> = {{*U*}}. All the integrals involved in (18) are computed discretely, similarly to those in (16), i.e.,

$$
\begin{split}
J^{ijk} G\_d^{ijk} =\ & \frac{U^{\star}(\xi, \eta\_j, \zeta\_k)\, Ja\_d^{\xi}(\xi, \eta\_j, \zeta\_k)}{w\_i} l\_i(\xi) \Big|\_{\xi=-1}^{\xi=1}
+ \frac{U^{\star}(\xi\_i, \eta, \zeta\_k)\, Ja\_d^{\eta}(\xi\_i, \eta, \zeta\_k)}{w\_j} l\_j(\eta) \Big|\_{\eta=-1}^{\eta=1} \\
&+ \frac{U^{\star}(\xi\_i, \eta\_j, \zeta)\, Ja\_d^{\zeta}(\xi\_i, \eta\_j, \zeta)}{w\_k} l\_k(\zeta) \Big|\_{\zeta=-1}^{\zeta=1} \\
&- \sum\_{m=0}^{N} \left( \frac{w\_m}{w\_i} Ja\_d^{\xi,ijk} D\_{mi} U^{mjk} + \frac{w\_m}{w\_j} Ja\_d^{\eta,ijk} D\_{mj} U^{imk} + \frac{w\_m}{w\_k} Ja\_d^{\zeta,ijk} D\_{mk} U^{ijm} \right).
\end{split} \tag{19}
$$

The gradient nodal values *G<sub>d</sub>*<sup>ijk</sup> are introduced in the viscous fluxes *F<sub>v</sub>*(*U*<sup>ijk</sup>, **G**<sup>ijk</sup>) of (17), hence completing the discretisation of (16). Note that one needs to compute **g**<sub>φ</sub> before computing *μ* and its gradient **g**<sub>μ</sub>.
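The nodal machinery above rests on the Gauss quadrature points and the derivative matrix *D<sub>ij</sub>* = *l*′<sub>j</sub>(*ξ<sub>i</sub>*). The sketch below shows one standard way to build both using barycentric weights; it is illustrative, not the authors' implementation:

```python
import numpy as np

def gauss_nodes_and_derivative_matrix(N):
    """Gauss-Legendre points {xi_j}, j = 0..N, their quadrature weights, and
    the polynomial derivative matrix D_ij = l'_j(xi_i), assembled from the
    barycentric weights lambda_j = 1 / prod_{m != j} (xi_j - xi_m)."""
    xi, w = np.polynomial.legendre.leggauss(N + 1)
    lam = 1.0 / np.array(
        [np.prod(xi[j] - np.delete(xi, j)) for j in range(N + 1)]
    )
    D = np.zeros((N + 1, N + 1))
    for i in range(N + 1):
        for j in range(N + 1):
            if i != j:
                D[i, j] = (lam[j] / lam[i]) / (xi[i] - xi[j])
    np.fill_diagonal(D, -D.sum(axis=1))  # rows of D annihilate constants
    return xi, w, D

# D differentiates polynomials of degree <= N exactly, e.g. (xi^2)' = 2 xi.
xi, w, D = gauss_nodes_and_derivative_matrix(4)
```

The negative-row-sum trick for the diagonal enforces that D applied to a constant vector returns zero, which the off-diagonal barycentric formula alone does not guarantee in floating point.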

#### *3.2 Time Integration Using IMplicit–EXplicit (IMEX) and Runge–Kutta Schemes*

The time integration of (17) is performed with a combination of forward and backward Euler and explicit Runge–Kutta schemes. On the one hand, the Navier–Stokes equations are integrated by means of a third-order explicit Runge–Kutta (RK3) scheme [28]. On the other hand, the Cahn–Hilliard equation is integrated with a combination of explicit RK3 for the phase-field advection, forward Euler for the chemical free energy, and backward Euler for the interfacial energy,

$$\frac{\phi^{n+1} - \phi^n}{\Delta t} + \nabla \cdot \left(\mathbf{v}\phi\right)^{RK3} = \nabla^2 \left(-\phi^n + \left(\phi^n\right)^3 - \varepsilon^2 \nabla^2 \phi^{n+1}\right). \tag{20}$$

The reason behind this choice is that the numerical stiffness of the bi-Laplacian operator (∇<sup>4</sup>*φ*) prevents the use of a fully explicit method, as it restricts the time step Δ*t* to impractically small values. We treat only the interfacial energy implicitly, since it yields a constant Jacobian matrix, represented by *J*<sup>∇²</sup>. In particular, the linear system to solve is,

$$\left[J^{\nabla^2} + \frac{I}{\Delta t}\right] \phi^{n+1} = \frac{\phi^n}{\Delta t} - \nabla \cdot \left(\mathbf{v}\phi\right)^{RK3} + \nabla^2 \left(-\phi^n + \left(\phi^n\right)^3\right). \tag{21}$$

The Jacobian matrix is computed numerically (see [3]) and an LU factorisation is performed only at the first time step. In each following iteration, the RHS of (21) is computed and the linear system is solved by means of forward and backward substitutions. Both the LU factorisation and the forward and backward substitutions are performed with the library MKL-PARDISO [23].
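To make the structure of (21) concrete, the sketch below applies the same splitting to a 1D periodic finite-difference Cahn–Hilliard problem: the stiff *ε*²∇⁴ term goes into a constant matrix that is LU-factorised once (here with SciPy's `splu` standing in for MKL-PARDISO), and each subsequent step costs only a sparse product plus back-substitutions. The grid size, *ε* and Δ*t* are illustrative, and the advective term is dropped (**v** = 0):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

n, L_dom, eps, dt = 128, 2 * np.pi, 0.1, 1e-3
h = L_dom / n
# Periodic second-difference Laplacian; Lap @ Lap approximates the bi-Laplacian.
Lap = sp.diags(
    [-2.0 * np.ones(n), np.ones(n - 1), np.ones(n - 1), [1.0], [1.0]],
    [0, 1, -1, n - 1, -(n - 1)],
).tocsc() / h**2
# Constant implicit operator of (21): I/dt plus the interfacial-energy term.
A = (sp.identity(n) / dt + eps**2 * (Lap @ Lap)).tocsc()
lu = splu(A)  # factorise once, reuse every step

rng = np.random.default_rng(0)
phi = 0.1 * rng.standard_normal(n)   # small random mixture, as in spinodal decomposition
mass0 = phi.sum()
for _ in range(50):
    # Explicit chemical free energy on the RHS; velocity set to zero here.
    rhs = phi / dt + Lap @ (-phi + phi**3)
    phi = lu.solve(rhs)
```

Because every explicit term sits inside a (discrete) divergence, the total mass `phi.sum()` is conserved by the scheme up to solver round-off.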

#### **4 Validation**

The proposed methodology is tested with two test cases. First, the validity of the discontinuous Galerkin discretisation of the Cahn–Hilliard equation is tested with a benchmark spinodal decomposition problem [12]. Second, the validity of the coupled Cahn–Hilliard/Navier–Stokes system is tested with a two-dimensional rising bubble test [10].

#### *4.1 Spinodal Decomposition*

This test problem considers an initial mixture of two fluids. These fluids are immiscible, therefore they tend to separate to minimise their free energy (5). As stated before, the geometry, initial condition and fluid parameters are taken from [12]. In particular, the initial condition for this benchmark problem is:

$$\begin{split} \phi(x, y) = -0.05 \Big[ &\cos\left(0.105 x\right) \cos\left(0.11 y\right) + \left[\cos\left(0.13 x\right) \cos\left(0.087 y\right)\right]^2 \\ &+ \cos\left(0.025 x - 0.15 y\right) \cos\left(0.07 x - 0.02 y\right) \Big]. \end{split} \tag{22}$$

The physical domain is a "T" shape with a total height of 120 units, a total width of 100 units, and horizontal and vertical section widths of 20 units (Fig. 1). No-flux boundary conditions are applied at the boundaries. Following [12], the mobility is set to *M* = 10, whilst the interface width is set to *ε* = 3.16. The physical domain is discretised with an unstructured mesh of 326 elements and a polynomial order of *N* = 4. For the time discretisation, we use a time step Δ*t* = 10<sup>−3</sup>.
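The initial condition (22) translates directly into code. Evaluated over the domain extents quoted above, the perturbation stays well inside the admissible range *φ* ∈ [−1, 1]:

```python
import numpy as np

def phi0(x, y):
    """Initial condition (22) for the spinodal decomposition benchmark [12]."""
    return -0.05 * (
        np.cos(0.105 * x) * np.cos(0.11 * y)
        + (np.cos(0.13 * x) * np.cos(0.087 * y)) ** 2
        + np.cos(0.025 * x - 0.15 * y) * np.cos(0.07 * x - 0.02 * y)
    )

# Sample on a grid covering the 100 x 120 bounding box of the "T" domain.
vals = phi0(np.linspace(0.0, 100.0, 64)[:, None],
            np.linspace(0.0, 120.0, 64)[None, :])
```

Each of the three terms is bounded by 1 in magnitude, so |*φ*₀| ≤ 0.15 everywhere.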

Figure 1 shows qualitatively how the different phases separate, whilst Fig. 2 shows quantitatively the evolution of the total free energy with time. In Fig. 2

**Fig. 1** "T" domain for the spinodal decomposition. Initial condition (left figure) and evolution with time (the right figure is the steady-state solution)

**Fig. 2** Evolution of total free energy (5) with time

the results of this work are compared with those obtained in [12], validating the proposed method.

#### *4.2 Rising Bubble*

This test case considers a bubble of light fluid submerged in a heavy fluid, both subjected to a gravitational field. Following [10], the initial configuration (see Fig. 3) consists of a bubble of radius *r* = 0.25 centred at [0.5, 0.5] in a [1 × 2] domain. A no-slip boundary condition is used at the top and the bottom of the domain, whilst a free-slip condition is enforced at the vertical walls. Following [10], the Reynolds number is set to *Re* = 35, whilst *σ* and *ε* are set to 24.5 and 0.03125 respectively (this gives an Eötvös number *Eo* = 10), and both density and viscosity ratios are set to *ρ*<sub>1</sub>/*ρ*<sub>2</sub> = *μ*<sub>1</sub>/*μ*<sub>2</sub> = 10. The gravitational acceleration is *g* = 0.98. The problem is discretised with 16 × 32 elements with a polynomial order of *N* = 4, and a time step Δ*t* = 4 · 10<sup>−6</sup>.

**Fig. 3** Initial condition of the rising bubble test problem

**Fig. 4** Evolution of the center of mass of the bubble with time

This test case is quantitatively compared with the results of [10] in Fig. 4, with satisfactory results. It should be mentioned that the benchmark results of [10] were obtained with a sharp-interface model, which may explain the small disagreement in the evolution of the center of mass shown in Fig. 4.

#### **5 Conclusions**

A method to model incompressible two-phase flows is introduced. The model solves the incompressible Navier–Stokes equations coupled with the Cahn–Hilliard equation to track the evolution of the different fluids. The model is discretised in space using a discontinuous Galerkin spectral element method (DGSEM), whilst an efficient implicit-explicit approach is used to advance in time. The validity of the model is shown with two test cases. A spinodal decomposition benchmark problem is solved to validate the Cahn–Hilliard solver, whilst a rising-bubble test problem is solved to validate the coupled Cahn–Hilliard/Navier–Stokes system. Both test cases show good agreement with the literature, demonstrating the accuracy and robustness of the proposed method.

**Acknowledgements** This work has been partially supported by REPSOL under the research grant P180021090. This work has been partially supported by Ministerio de Economía y Competitividad under the research grant TRA2015-67679-C2-2-R and under the research grant EUIN2017-88294 (Gonzalo Rubio). The authors acknowledge the computer resources and technical assistance provided by the Centro de Supercomputación y Visualización de Madrid (CeSViMa).

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **High-Order Propagation of Jet Noise on a Tetrahedral Mesh Using Large Eddy Simulation Sources**

**M. A. Moratilla-Vega, V. Saini, H. Xia, and G. J. Page**

#### **1 Introduction**

Due to the rapid expansion of the commercial aviation industry, authorities have been tightening the legislation for aircraft noise. For instance, the European Commission has set a 65% reduction goal of overall aircraft noise from the year 2000 to 2050 [1]. The noise generated by the jet exhaust is one of the main contributors to the overall aircraft noise, especially during take-off [2]. Moreover, in new generation ultra-high by-pass ratio turbofan engines the increased interaction between the engine jet and the high-lift devices can potentially affect the noise field [3]. Thus, our overall aim is to develop and investigate an accurate and efficient method for the prediction of far-field jet noise in installed jet configurations.

Rapid growth in computing power during the last decades has enabled the use of scale-resolving numerical simulations for jet noise research at a lower cost than most experimental campaigns. Conventionally, 2nd-order numerical schemes combined with surface integral techniques, particularly the Ffowcs Williams-Hawkings (FW-H) method [4], have been widely adopted for predicting the far-field noise, due to their simplicity and low cost. However, defining the envelope surface used in the FW-H method is not always trivial in complex configurations [5], for example, installed jets on aircraft wings. Also, the results may be overly sensitive to the size, shape and location of these surfaces. Directly resolving the Navier-Stokes (NS) equations for sufficiently accurate far-field jet noise results is prohibitively expensive [6]. LES using finite volume 2nd-order accurate schemes has proven to be reliable and robust for solving jets' near field, but large numerical dispersion and dissipation errors make such schemes less suitable for the propagation of the

M. A. Moratilla-Vega · V. Saini (-) · H. Xia · G. J. Page

Rolls-Royce University Technology Centre, Loughborough University, Loughborough, UK e-mail: m.a.moratilla-vega@lboro.ac.uk; v.saini@lboro.ac.uk; g.j.page@lboro.ac.uk

<sup>©</sup> The Author(s) 2020

S. J. Sherwin et al. (eds.), *Spectral and High Order Methods for Partial Differential Equations ICOSAHOM 2018*, Lecture Notes in Computational Science and Engineering 134, https://doi.org/10.1007/978-3-030-39647-3\_25

sound waves to the far field. High-order methods provide more accurate propagation due to their reduced numerical error but are insufficiently robust for simulating complex jet flows. Therefore, we have used a coupled approach in which a finite volume LES solver is used to obtain the acoustic sources, which are then transferred to a high-order acoustic solver that propagates noise to the far-field.

The spectral/*hp* DG method [7] is capable of providing high-order accuracy and handling mixed mesh elements types such as tetrahedra and hexahedra, thus providing a potential solution to geometrically complex acoustic problems. The solver based on this approach is *AcousticSolver* of the Nektar++ framework [8, 9]. The LES code HYDRA and acoustic code *AcousticSolver* have been coupled and validated using hexahedral elements [10, 11]. A similar coupling strategy has been used previously for jet noise [12] and combustion noise on tetrahedral grids [9].

In this paper, our focus is on two aspects: (1) estimates of mesh design for the high-order solver using a canonical two-dimensional (2D) case and (2) comparison of three-dimensional (3D) turbulent isolated jet-noise results on a tetrahedral grid and a comparable hexahedral grid using the coupling approach. From the perspective of our near future work, the tetrahedral grid results provide motivation and parameters for the set-up of the coupled methodology for jet-flap interactions.

#### **2 Numerical Methods and Solvers**

In this section, the details of the high-order spectral/*hp* DG solver employed to solve the APE equations are provided followed by a brief description of the LES code that solves the filtered compressible NS equations. Finally, the coupling of the two is briefly mentioned.

#### *2.1 APE Solver*

*Equations for Propagation* The acoustic perturbation equations (APE) solved here are the ones proposed by Ewert and Schröder [6] in the APE-4 form. These equations describe the transport of acoustic fluctuations in a linearized form, where the source terms can be non-linear, and can be written as:

$$
\partial\_t p' + \overline{c}^2 \nabla \cdot \left(\overline{\rho} \mathbf{u}' + \overline{\mathbf{u}} \frac{p'}{\overline{c}^2}\right) = \overline{c}^2 q\_c,\tag{1}
$$

$$
\partial\_t \mathbf{u}' + \nabla \left( \overline{\mathbf{u}} \cdot \mathbf{u}' \right) + \nabla \left( \frac{p'}{\overline{\rho}} \right) = \mathbf{q}\_m,\tag{2}
$$

where *p*′ and **u**′ are the acoustic pressure and acoustic velocity vector respectively, and *c̄* is the mean speed of sound. The time-averaged quantities are denoted by the over-bar and acoustic fluctuations are primed. The left-hand side of (1) and (2) represents the advection of waves in the mean flow. The right-hand side describes different sources that may be present in a generic aeroacoustic problem.

Finally, the source terms *q<sub>c</sub>* and **q**<sub>m</sub> are defined as:

$$q\_c = -\nabla \cdot \left(\rho' \mathbf{u}'\right)' + \frac{\overline{\rho}}{c\_p} \frac{Ds'}{Dt},\tag{3}$$

$$\mathbf{q}\_m = -\left(\boldsymbol{\omega} \times \mathbf{u}\right)' + T'\nabla\overline{s} - s'\nabla\overline{T} - \left(\frac{\nabla\left(\mathbf{u}'\right)^2}{2}\right)' + \left(\frac{\nabla \cdot \underline{\boldsymbol{\tau}}}{\rho}\right)'.\tag{4}$$

These terms are classified into four categories:


In this paper, only the Lamb vector **L** is considered as a source term because it is the dominant contributor for isothermal applications with strong vortical motions (shear layers and wakes), as demonstrated in [12, 13].

*Numerical Solver* The solver used for the above APE equations is called *AcousticSolver*, which is part of the open-source Nektar++ framework [8]. The solver employs a high-order, spectral/*hp* element method with a DG formulation [7]. In short (for details see [9]), the present DG method works as follows:


The scheme used here to solve the Riemann problem is a local Lax-Friedrichs scheme as defined in [9]. The temporal discretisation is performed using a 4th-order Runge-Kutta scheme. A numerical sponge layer [14] is set up using source terms to dampen the outgoing acoustic waves smoothly, thus minimising reflections from the boundaries of the domain.
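A minimal sketch of such a sponge term is a damping source −σ(x)·u′ whose coefficient ramps up smoothly across the layer. The cubic ramp and the value of `sigma_max` below are illustrative choices, not the profile of [14]:

```python
import numpy as np

def sponge_source(u_prime, x, x_start, x_end, sigma_max=2.0):
    """Sponge-layer source term: damping -sigma(x) * u' ramped smoothly
    from 0 at x_start to sigma_max at x_end (and held beyond). A smooth
    ramp avoids the spurious reflections a sharp damping onset would cause."""
    s = np.clip((x - x_start) / (x_end - x_start), 0.0, 1.0)
    sigma = sigma_max * s**3
    return -sigma * u_prime
```

Added to the right-hand side of the APE system inside the layer, this term drives the perturbation variables towards zero before they reach the outer boundary.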

#### *2.2 LES Solver*

The LES is performed using the in-house code of Rolls-Royce plc., HYDRA [15], which solves the Favre-filtered unsteady compressible Navier-Stokes equations [10]. It is a density-based, spatially 2nd-order accurate finite volume cell-vertex code used for propulsion and turbomachinery applications. More details on the set-up of the spatial scheme used can be found in [10]. For the temporal discretisation, a 2nd-order, four-stage Runge-Kutta explicit algorithm is employed. The size of the time step is chosen to keep the Courant number less than unity. The code is capable of handling arbitrary mesh topologies, which is beneficial for complex geometries. The sub-grid scale model is the *σ*-model [16] with model constant *C<sub>σ</sub>* = 1.35 [17].

#### *2.3 Coupling of Solvers*

The 3D data from the LES mesh is transferred and interpolated onto the APE mesh in real time. The interpolation is necessary because the two solvers have different meshes, designed specifically to capture the flow and the acoustics respectively. The transfer-interpolation process takes place in parallel. This is achieved using an MPI-based coupling strategy with the open-source library CWIPI [18]. More details on the coupling mechanism are provided in Lackhove et al. [9] and Moratilla-Vega et al. [10]. Note that larger time steps can be used for *AcousticSolver* since it is not required to resolve the small flow structures.

#### **3 Test Cases**

Two cases are presented here. First, a canonical noise propagation case with a well-defined vortex-pair source is run on *AcousticSolver* alone, and a study of the numerical error under changes of mesh and polynomial expansion order (P) is performed. The second case uses the mesh parameters from the first to propagate noise generated by an isolated jet in an LES simulation. This case provides validation of the coupling for a 3D turbulent jet noise case on a fully tetrahedral mesh. The results are then compared to the ones obtained with the FW-H technique.

#### *3.1 Spinning Vortex Pair*

The case is an acoustic wave propagation problem in two dimensions where the source is mathematically well-defined, as in the original work on APE [6]. The case is run with the standalone *AcousticSolver*. The source is in the form of two point vortices at a distance *r*<sub>0</sub> from the origin, rotating with a circulation *Γ*. An analytical solution of the induced acoustic field was found by Müller and Obermeier [19] as:

$$
\tilde{p}' = \frac{\rho\_{\infty} \Gamma^4}{64\pi^3 r\_0^4 c\_{\infty}^2} H\_2^{(2)}(kr), \tag{5}
$$

where *H*<sub>2</sub><sup>(2)</sup> is the Hankel function of second order and second kind; the rotation period is defined as *T* = 8*π*<sup>2</sup>*r*<sub>0</sub><sup>2</sup>/*Γ*, the angular velocity as *ω* = *Γ*/4*πr*<sub>0</sub><sup>2</sup>, and the Mach number as *M<sub>r</sub>* = *Γ*/4*πr*<sub>0</sub>*c*<sub>∞</sub>. The real part of Eq. (5) gives the pressure fluctuations. Ewert and Schröder [6] found the source term based on the Lamb vector that reproduces the acoustic field for this case as:

$$\mathbf{q}\_m = -\frac{\Gamma^2 \mathbf{e}\_r(t)}{8\pi^2 \sigma^2 r\_0} \sum\_{i=1}^2 (-1)^i \exp\left(-\frac{|\mathbf{r} + (-1)^i \mathbf{r}\_0(t)|^2}{2\sigma^2}\right), \quad \sigma \approx r\_0,\tag{6}$$

where **r** = (*x*, *y*)<sup>T</sup>, **r**<sub>0</sub> = *r*<sub>0</sub>**e**<sub>r</sub>, **e**<sub>r</sub> = (cos *θ*, sin *θ*)<sup>T</sup> and *θ* = *ωt*.

The computational domain considered is circular and extends to 250*r*<sub>0</sub>. The source parameters are set as in [6], i.e. *Γ*/(*c*<sub>∞</sub>*r*<sub>0</sub>) = 1.6 and *M<sub>r</sub>* = 0.1273. Simulations are run until the pressure fluctuations reach *r* = 200*r*<sub>0</sub> in order to minimise the boundary effects. All the elemental meshes consist of triangles and use a modified hierarchical Jacobi polynomial basis [8].
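For reference, the radial factor of the analytical solution (5) can be evaluated directly with SciPy's Hankel function. The wavenumber is taken as *k* = 2*ω*/*c*<sub>∞</sub> (the vortex pair radiates as a rotating quadrupole at twice the rotation frequency); this is an assumption spelled out here, since the text leaves *k* implicit:

```python
import numpy as np
from scipy.special import hankel2

def vortex_pair_pressure_envelope(r, gamma, r0, rho_inf=1.0, c_inf=1.0):
    """Radial factor of the Mueller-Obermeier solution (5) for the spinning
    vortex pair. The full time-dependent field multiplies this by a rotating
    phase, of which the real part is taken. Assumption: k = 2*omega/c_inf,
    i.e. the quadrupole radiates at twice the rotation rate."""
    omega = gamma / (4.0 * np.pi * r0**2)
    k = 2.0 * omega / c_inf
    amp = rho_inf * gamma**4 / (64.0 * np.pi**3 * r0**4 * c_inf**2)
    return amp * hankel2(2, k * r)

# Paper's non-dimensional source strength: Gamma / (c_inf * r0) = 1.6.
p = vortex_pair_pressure_envelope(np.array([50.0, 200.0]), gamma=1.6, r0=1.0)
```

As expected for a 2D wave field, the amplitude decays roughly like 1/√*r* far from the source.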

First, a reference simulation with polynomial order 4 (P4) is run on a fine uniform mesh. The resulting acoustic pressure field is plotted along a diagonal line in Fig. 1a. The result from this simulation matches its analytical counterpart well. Further comparisons are made with respect to this well-resolved P4 numerical

**Fig. 1** (**a**) Pressure field along *x* = *y* line, (**b**) solution points, and contours of the source term in the source region and the pressure field in the propagation region


**Table 1** Details of the test cases run

*Nr* is the elements in the radial direction

**Fig. 2** Acoustic pressure and relative error comparison of the test cases. (**a**) Pressure along *x*=*y* line. (**b**) Relative error (as moving average)

result, henceforth called the "P4 reference". For the test cases, elemental meshes are coarsened radially and the polynomial order is elevated in the propagation region, such that the solution points-per-wavelength (referred to as "ppw") distribution is similar in the radial direction. The radial growth rate is kept at ∼1.023 with a geometric distribution in all cases. In the source region, the mesh is kept the same, with a P1 expansion, for all cases. This allows a smooth transition of the solution-point distribution when crossing from one region to the other. A sample P2 mesh and contours are shown in Fig. 1b. Table 1 summarises the test runs.

Figure 2a compares the pressure fields of the different test cases with the P4 reference. As expected, the P1 simulation shows a considerable reduction in amplitude. P2 and P4 preserve this quantity more accurately. For simplicity, we unify the dissipation and dispersion error by calculating the overall relative error as a moving average (M.A.) over bins of ∼30*r*<sub>0</sub>. This is plotted in Fig. 2b. For the P1 and P2 simulations, a 2% error limit is reached around 50*r*<sub>0</sub> (ppw ∼ 9) and 85*r*<sub>0</sub> (ppw ∼ 6.4) respectively. The P4 simulation remains below this limit under the present conditions (note ppw ∼ 5 at 150*r*<sub>0</sub>). The values of ppw for different P agree with those suggested in [20] and provide an estimate for mesh design at different polynomial orders.
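The ppw bookkeeping used above reduces to a one-line estimate. The convention below, counting (P + 1) collocation points per element in each direction, is our reading of the text rather than a formula quoted from it:

```python
def points_per_wavelength(wavelength, element_size, p_order):
    """Solution points per acoustic wavelength for a DG discretisation,
    assuming (P + 1) collocation points per element in each direction."""
    return wavelength / element_size * (p_order + 1)

# With a P4 expansion, one element per wavelength already gives ppw = 5,
# the resolution found above to keep the relative error low.
ppw = points_per_wavelength(wavelength=1.0, element_size=1.0, p_order=4)
```

Inverting this relation gives the largest admissible element size for a target frequency, which is how such an estimate feeds into mesh design.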

Note that for the given ppw, mesh expansion rate and Riemann solver, we did not observe reflections of the acoustic signals at the inter-element boundaries. A caveat of the present study is that, for brevity, we compute only the total numerical error; dissipation and dispersion errors could be studied separately, as done in [20] for a one-dimensional advection problem.

#### *3.2 3D Turbulent Isolated Jet Noise*

As a step forward towards noise prediction of installed jets, an isolated jet is simulated using a tetrahedral mesh for *AcousticSolver* to verify the capabilities of the present methodology for complex 3D cases. Note that the coupling is already validated on a cylinder in cross flow and a cylinder-airfoil interaction case in [11].

*Jet Flow* The LES performed is described here briefly, since it is detailed in [10]. An isothermal turbulent jet issuing from a circular cross-section nozzle at Mach 0.9 and Reynolds number *Re* = 10,000 (based on the jet bulk velocity *U<sub>j</sub>* and jet diameter *D<sub>j</sub>*) is considered. Following Shur et al. [21], the present LES domain is cylindrical in shape and extends over *x/D<sub>j</sub>* = [−5, 100] and *r/D<sub>j</sub>* = [0, 50]. The mesh has 190 × 75 × 49 nodes in the axial, radial and azimuthal directions respectively. It is refined in the shear-layer development area and coarsened towards the outer boundaries. Figure 3a shows a central cross-section of the LES mesh. The inlet boundary condition is a total pressure profile.

*Jet Acoustics* The acoustics domain is cubical to facilitate control on mesh growth. It extends as [−5*,* 40]*Dj* in streamwise direction and [−25*,* 25]*Dj* in transverse directions. Noise propagation on two different grids is compared: fully hexahedral ("hexa") and fully tetrahedral ("tetra"). The former mesh consists of 107 × 69 × 69 elements in the streamwise and transverse directions respectively [10]. The tetra grid is generated to give a similar distribution as the hexa mesh in the vicinity of the

**Fig. 3** Cross-section view through the centre of the jet nozzle. (**a**) LES mesh elements. (**b**) *AcousticSolver* meshes

jet, providing 300,000 elements in total. Figure 3b shows the two meshes, where it is seen that the nominal element size in the tetra mesh is slightly larger away from the jet nozzle. Results on the hexa grid (P4) are available from [10], and calculations are performed on the tetra grid in this study. The expansion type utilised is a P4 modified Jacobi basis [8]. A numerical sponge layer [14] of thickness 3*D<sub>j</sub>* is applied at the outer boundaries to avoid reflections of the outgoing waves. The time step is a factor of 3 larger than that of the compressible LES. In line with Sect. 3.1 and [20], a value of ppw ∼ 5 is chosen to accurately resolve frequencies up to a Strouhal number *St* = 0.9.

It has already been demonstrated in [10] that the LES flow quantities are in acceptable agreement with the high-order LES study of Shur et al. [21]. The noise propagation is calculated using the FW-H method [4] in addition to the present coupled approach. The nominal cut-off *St* for the integral surface defined is ∼0.3, based on the 22 ppw criterion [6]. Figure 4 shows a visual comparison between the acoustic pressure field computed by LES alone and by the coupled LES-APE (on two meshes). Figure 4a, b qualitatively show that the coupled LES-APE has retained more acoustic content (especially at higher frequencies) due to lower numerical error. This difference is more pronounced in the direction perpendicular to the jet centre-line. Qualitatively comparable results are obtained on the tetra mesh, as depicted in Fig. 4c.

Figure 5 shows a quantitative comparison in terms of power-spectral-density (PSD) at two observer locations at a distance of 120*D<sub>j</sub>*. The PSD for FW-H is calculated over the surface indicated by the dashed line in Fig. 4a (details in [10]). Comparison is done with the LES of Shur et al. [21] and the experiment of Tanna [22] (*Re* = 10<sup>6</sup>, Mach = 0.9). As previously observed in Fig. 4, the difference between FW-H and the present coupled approach is significant at higher frequencies. For the 30° location, the tetra mesh results match the hexa results well. At the 90°

**Fig. 4** Acoustic pressure field at the same time instant (in grayscale [−30*,* 30] Pa). (**a**) LES. (**b**) LES-APE (hexa). (**c**) LES-APE (tetra)

**Fig. 5** PSD at 120*Dj* at two observer locations with respect to the jet centre-line. (**a**) 30◦. (**b**) 90◦

location, there is an improvement in cut-off *St* from 0.3 to 0.8. A small discrepancy is seen at 90◦ for *St >* 0*.*8 (close to the cut-off *St* = 0*.*9). This may be improved by using a finer mesh in the far-field. Overall, the APE results improve on the present FW-H prediction in the high-frequency range. Moreover, the results from the tetra mesh are comparable to those from the hexa mesh. This implies that the present methodology using tetra grids can be extended to more complex cases (such as installed jets).

#### **4 Conclusions**

A spectral/*hp* code, *AcousticSolver* (within the Nektar++ framework), has been employed for acoustic wave propagation. The favourable properties of this solver are its high-order accuracy and its capability to handle unstructured mesh elements. A study on a canonical test case with an analytical solution provided estimates for designing the mesh for the jet application. For a polynomial expansion of order P4, 5 solution points-per-wavelength is found to provide a low overall error. This value is close to the one reported in a related study [20]. These estimates were used to design a tetrahedral mesh for the prediction of noise from an isolated jet (*Re* = 10<sup>4</sup>, Mach = 0*.*9). The noise sources are calculated by a 2nd-order accurate finite volume LES solver and interpolated onto the *AcousticSolver* mesh on-the-fly for noise propagation. The noise results thus obtained offer an improvement over the traditional FW-H method due to high-order accuracy. The power-spectral-density (PSD) results of the noise signal at two different locations relative to the jet nozzle show that the PSDs obtained on the tetrahedral mesh agree with those obtained on a slightly finer hexahedral mesh. Further improvements may be achieved by refining the former mesh in the radial direction. These results are encouraging for noise prediction in more complex, industrially relevant geometries such as installed jets.

**Acknowledgements** The authors would like to thank EPSRC for the UK supercomputing facility ARCHER via the UK Turbulence Consortium (EP/L000261/1). This work was partially funded under the embedded CSE programme of the ARCHER UK National Supercomputing Service. The first author would like to acknowledge K. Lackhove for the useful discussions on the use of CWIPI and Nektar++ and the support given by HPC at Loughborough University.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Dynamical Degree Adaptivity for DG-LES Models**

**M. Tugnoli, A. Abbà, and L. Bonaventura**

### **1 Introduction**

Discontinuous Galerkin spatial discretizations of compressible flows allow local degree adaptation (*p*-adaptation, for short) to be performed in a very straightforward way and almost without computational overhead, as shown e.g. in [6]. Dynamical adaptation has also been applied successfully to inviscid geophysical flows in [11, 12]. All the previous works, however, relied on a refinement criterion which essentially estimates the *L*<sup>2</sup>-norm approximation error. In [10], we argued that such a criterion may not be optimal for LES and proposed a different, physically based criterion that was shown to be more effective in a number of numerical experiments. The goal of this work, which summarizes some of the results presented in [9], is to extend the above approach to dynamical adaptation and to test the new criterion in a dynamically adaptive framework.

#### **2 The DG-LES Approach and Its Numerical Implementation**

The DG-LES model for compressible flows employed in this work, based on a Local Discontinuous Galerkin (LDG) discretization of the viscous terms [3], is fully described in [1], to which we refer for all the details on the model

M. Tugnoli · A. Abbà
Dipartimento di Ingegneria Aerospaziale, Politecnico di Milano, Milano, Italy
e-mail: matteo.tugnoli@polimi.it; antonella.abba@polimi.it

L. Bonaventura (✉)
MOX – Modelling and Scientific Computing, Dipartimento di Matematica, Politecnico di Milano, Milano, Italy
e-mail: luca.bonaventura@polimi.it

S. J. Sherwin et al. (eds.), *Spectral and High Order Methods for Partial Differential Equations ICOSAHOM 2018*, Lecture Notes in Computational Science and Engineering 134, https://doi.org/10.1007/978-3-030-39647-3\_26

equations and numerical discretization approach. Here, only a short description of the discretization elements necessary to introduce dynamical adaptivity is reported. On the computational domain $\Omega \subset \mathbb{R}^3$ a tessellation $\mathcal{T}\_h$ is defined, composed of non-overlapping simplicial elements. A discontinuous finite element space $\mathcal{V}\_h$ is defined as

$$\mathcal{V}\_h = \left\{ v\_h \in L^{2}(\Omega) : v\_h|\_K \in \mathbb{P}^{q\_K}(K), \ \forall K \in \mathcal{T}\_h \right\}, \tag{1}$$

where $\mathbb{P}^{q\_K}(K)$ denotes the space of polynomial functions of total degree $q\_K$. The degree can vary arbitrarily from element to element, and the definition of a suitable way to assign such polynomial degree will be discussed in the following. The numerical approximation of the generic variable $a$ can be expressed as

$$a\_h|\_K = \sum\_{l=0}^{n\_{\phi}(K)} a^{(l)} \phi\_l^{K}, \tag{2}$$

where $\phi\_l^K$ are the basis functions on element $K$, $a^{(l)}$ are the modal coefficients of the basis functions and $n\_\phi(K) + 1$ is the number of basis functions required to span the polynomial space $\mathbb{P}^{q\_K}(K)$ of degree $q\_K$, given in $\mathbb{R}^3$ by:

$$n\_{\phi}(K) = \frac{1}{6}(q\_K + 1)(q\_K + 2)(q\_K + 3) - 1 \tag{3}$$

It is worth noting that the expression in (2) can be rewritten, thanks to the hierarchical nature of the basis, as

$$a\_h|\_K = \sum\_{p=0}^{q\_K} \sum\_{l \in d\_p} a^{(l)} \phi\_l^K,\tag{4}$$

where $d\_0 = \{0\}$ and $d\_p = \left\{ l \in 1 \dots n\_\phi(K) \mid \phi\_l \in \mathbb{P}^{p}(K) \setminus \mathbb{P}^{p-1}(K) \right\}$ is the set of indices of the basis functions of degree $p$. A more or less accurate approximation can thus be obtained by increasing or decreasing the limit $q\_K$ of the sum over $p$. It is also worth noticing that the basis normalization implies that the first coefficient of the polynomial expansion, $a^{(0)}$, coincides with the mean value of $a\_h|\_K$ over $K$.
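The dimension count (3) and the hierarchical truncation behind (4) can be sketched in a few lines. This is an illustrative sketch with names of our choosing, not code from the solver described below:

```python
# Illustrative sketch: modal dimension of P^{q_K} in 3D, eq. (3), and
# p-adaptation by truncating or zero-padding a hierarchical coefficient vector.

def n_phi(q):
    """Highest coefficient index n_phi(K) of eq. (3) for degree q in 3D."""
    return (q + 1) * (q + 2) * (q + 3) // 6 - 1

def change_degree(coeffs, q_new):
    """Lowering the degree discards the higher modes; raising it pads zeros."""
    n_new = n_phi(q_new) + 1
    if n_new <= len(coeffs):
        return coeffs[:n_new]
    return coeffs + [0.0] * (n_new - len(coeffs))

# n_phi + 1 basis functions for q = 2, 3, 4: 10, 20, 35.
print([n_phi(q) + 1 for q in (2, 3, 4)])  # -> [10, 20, 35]
```

Because the basis is hierarchical, the truncated coefficients are exactly the modal coefficients of the lower-degree approximation, so no projection needs to be computed when the degree changes.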

In the present DG-LES approach, as discussed extensively in [1], the LES filtering operators are built directly into the DG discretization, in a spirit similar to the VMS approach [4]. Considering $\Pi\_{\mathcal{V}} : L^2(\Omega) \to \mathcal{V}$, the $L^2$ projector over the subspace $\mathcal{V} \subset L^2(\Omega)$, defined by

$$\int\_{\Omega} \Pi\_{\mathcal{V}} u \, v \, d\mathbf{x} = \int\_{\Omega} u \, v \, d\mathbf{x}, \qquad \forall u \in L^{2}(\Omega), \ v \in \mathcal{V},$$

it is possible to define the LES filter $\overline{(\cdot)}$ as the projection onto the finite-dimensional solution subspace $\mathcal{V}\_h$ in the following way:

$$\overline{a} = \Pi\_{\mathcal{V}\_h} a. \tag{5}$$

The application of the main LES filtering is purely formal, since it coincides with the discretization of the equations. In this way, simply discretizing the equations leads to solving them for the filtered quantities.

Another parameter to be defined is the filter characteristic dimension, $\overline{\Delta}$, employed in the definition of all eddy-viscosity based subgrid models. The filter size is constant over each element, since the projection is performed elementwise. While more refined definitions can be employed, see e.g. [2], the simple definition

$$\overline{\Delta}(K) = \sqrt[3]{\frac{Vol(K)}{n\_{\phi}(K) + 1}}\tag{6}$$

was employed with success. For the time discretization, the five-stage, fourth-order Strong Stability Preserving Runge–Kutta method proposed in [8] is employed. The numerical implementation of the approach sketched above is built into the solver dg-comp using the finite element toolkit FEMilaro [7].
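For concreteness, the element-wise filter width (6) can be evaluated directly from the element volume and the modal dimension (3); this is a minimal sketch with illustrative names, not the dg-comp implementation:

```python
# Minimal sketch of the element-wise filter width of eq. (6).

def n_phi(q):
    return (q + 1) * (q + 2) * (q + 3) // 6 - 1   # eq. (3)

def filter_width(vol_K, q_K):
    return (vol_K / (n_phi(q_K) + 1)) ** (1.0 / 3.0)   # eq. (6)

# A unit-volume element at degree 4 carries 35 modes, so the width is (1/35)^(1/3).
print(filter_width(1.0, 4))
```

Note that the width automatically shrinks when the local polynomial degree is raised, so the subgrid model sees a finer effective filter in *p*-refined elements.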

A first attempt to introduce static *p*-adaptivity in a DG-LES framework has been presented in [10]. In order to overcome the limitations of classical error estimations in LES, a novel indicator based on the classical structure function

$$D\_{lj} = \left\langle \left[ u\_l(\mathbf{x} + \mathbf{r}, t) - u\_l(\mathbf{x}, t) \right] \left[ u\_j(\mathbf{x} + \mathbf{r}, t) - u\_j(\mathbf{x}, t) \right] \right\rangle \tag{7}$$

was proposed. Large values of the structure function computed inside an element denote a poorly correlated velocity field and the need for higher resolution, while a low structure function value denotes a highly correlated velocity field, which indicates a well resolved turbulent region or laminar conditions and hence the possibility to employ a lower resolution. However, most subgrid models (and in particular the Smagorinsky model) perform adequately in a regime of homogeneous isotropic turbulence, provided the filter cut-off length lies inside the inertial range. In such conditions excessive refinement is therefore unnecessary, and one can let the subgrid scale model account for the turbulent dissipation. For this reason, the contribution due to homogeneous isotropic turbulence is removed from the structure function (7). This contribution, as discussed in detail in [10], can be written as

$$D\_{lj}^{iso}(\mathbf{r},t) = D\_{NN}(r,t)\delta\_{lj} + \left(D\_{LL}(r,t) - D\_{NN}(r,t)\right)\frac{r\_l r\_j}{r^2} \tag{8}$$

where $r = |\mathbf{r}|$ and $D\_{LL}$, $D\_{NN}$ are the longitudinal and transverse structure functions, respectively. Once $\mathbf{r}$ is known, only $D\_{LL}$ and $D\_{NN}$ need to be determined. The error indicator is then computed as:


$$Ind\_{SF}(K) = \sqrt{Q(K)} = \sqrt{\sum\_{lj} \left[ D\_{lj}(K) - D\_{lj}^{iso}(K) \right]^2}. \tag{9}$$
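The chain (7)–(9) can be sketched numerically. This is a hedged sketch: extracting $D\_{LL}$ and $D\_{NN}$ from the structure-function tensor itself is one possible choice, our assumption rather than necessarily the procedure of [10]:

```python
import numpy as np

# Hedged sketch of eqs. (7)-(9): subtract the homogeneous-isotropic part (8)
# from the structure-function tensor and take the Frobenius norm as in (9).

def indicator_sf(D, r_vec):
    r = np.linalg.norm(r_vec)
    e = r_vec / r
    D_LL = e @ D @ e                      # longitudinal structure function
    D_NN = 0.5 * (np.trace(D) - D_LL)     # transverse: trace = D_LL + 2 D_NN
    D_iso = D_NN * np.eye(3) + (D_LL - D_NN) * np.outer(e, e)   # eq. (8)
    return np.sqrt(np.sum((D - D_iso) ** 2))                    # eq. (9)

# A tensor built exactly from the isotropic form (8) yields a ~zero indicator,
# i.e. no refinement is requested in homogeneous isotropic conditions.
r_vec = np.array([1.0, 2.0, 2.0])
e = r_vec / np.linalg.norm(r_vec)
D_hit = 0.3 * np.eye(3) + (0.7 - 0.3) * np.outer(e, e)
print(indicator_sf(D_hit, r_vec))
```

A tensor with an anisotropic excess on top of the isotropic part yields a strictly positive indicator, which is exactly what triggers local refinement.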

The static adaptivity procedure presented in [10] is able to produce accurate results with a significant reduction in computational cost. For the simulation of transient phenomena, however, a dynamic adaptivity approach must be applied. The goal of this work, which summarizes results presented in [9], is to extend the above approach to dynamical adaptation, which was successfully employed in the inviscid case in [11, 12].

In those papers, special time discretization approaches were employed that allow the use of very long time steps, and the adaptation process was performed at each time step. In the dynamically adaptive simulations presented here, instead, which are carried out with a relatively small time step, the structure function indicator $Ind\_{SF}(K)$ is computed every $n\_i(K)$ time steps and the average of $s\_i(K)$ subsequent values of this quantity is computed. Then, every $n\_i(K) \times s\_i(K)$ time steps, based on the resulting indicator value in each element, the polynomial degree is either left unchanged or updated along with the solution representation. Since the solution is expressed in terms of a hierarchical basis (4), when lowering the polynomial degree the contribution associated with the removed modes is simply discarded, while when raising the polynomial degree the contribution of the newly added modes is set to zero, to be populated when the integrals over the element and its faces couple the old modes with the newly introduced ones.

Notice that, in the present implementation, no dynamic load balancing has been implemented for parallel runs. This means that, during parallel execution, the dynamic change of the number of degrees of freedom could potentially lead to imbalances between the loads of different processors. At the moment the balancing is performed using a static polynomial degree distribution. While this avoids excessive imbalance, it is certainly not the optimal approach, and more effective load balancing techniques will have to be investigated in the future.

#### **3 Dynamical Adaptivity Experiments**

The proposed dynamic adaptation criterion has been tested in the simulation of an isolated vortex superimposed on a uniform horizontal flow [5]. This simple test has been chosen for the preliminary study reported here, in anticipation of the more complex tests already discussed in [9], in which the same isolated vortex impinges on an obstacle. The DG-LES approach described in [1] was applied, as in [10], with a standard Smagorinsky model for the subgrid stresses. A coarser and a finer mesh have been employed, both based on fully unstructured tetrahedra of constant characteristic length equal to $l\_h = 1$ and $l\_h = 0.5$, respectively. The indicator (9) is computed every $n\_i(K) = 2$ time steps and $s\_i(K) = 10$ subsequent values are averaged, in order to adapt the resolution every 20 time steps. A sensitivity analysis of the results with respect to these parameters has not yet been carried out and will be the focus of future study. As in [10], two threshold values $\theta\_1, \theta\_2$ are used to determine *p*-refinement and *p*-derefinement. More specifically, the cells with indicator values smaller than $\theta\_1$ are assigned polynomial degree 2, those with indicator values larger than $\theta\_2$ are assigned polynomial degree 4, while the others are assigned polynomial degree 3. The threshold values employed are $\theta\_1 = 1 \times 10^{-4}$, $\theta\_2 = 1 \times 10^{-2}$. Following [10], these values were chosen so as to achieve on average a total number of degrees of freedom slightly smaller than that required by a uniform degree simulation with $p = 3$. The dynamic adaptation procedure is able to effectively increase the polynomial degree around the vortex and follow it as it is advected downstream, leaving all the elements with no vortex activity at the lowest resolution.
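The element-wise update rule can be sketched as follows. The threshold names `theta1`/`theta2` are ours and stand for the two threshold values quoted in the text (the printed symbols were not legible in the source):

```python
# Sketch of the element-wise degree update applied every n_i x s_i steps.

theta1, theta2 = 1.0e-4, 1.0e-2   # the two threshold values quoted in the text

def assign_degree(indicator_samples):
    """Average the s_i stored indicator samples, then map to a degree."""
    ind = sum(indicator_samples) / len(indicator_samples)
    if ind < theta1:
        return 2       # well-resolved or laminar region: lowest degree
    if ind > theta2:
        return 4       # poorly correlated velocity field: highest degree
    return 3

print([assign_degree([v]) for v in (1.0e-5, 1.0e-3, 1.0e-1)])  # -> [2, 3, 4]
```

Averaging over several indicator samples before adapting smooths out instantaneous fluctuations, so elements are not toggled between degrees at every step.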
A map of the polynomial degrees in the domain during the advection of the vortex is shown in Fig. 1.

The profiles of velocity magnitude recorded over time, along the path of the vortex, at different distances from the vortex starting point, employing the coarsest mesh, are presented in Fig. 2. Simulations at different uniform polynomial degrees are compared with the adaptive results. It can be observed that, even at the highest uniform resolution of degree 4, the velocity profile is distorted during the advection, due to the very limited grid resolution. However, the vortex does not diffuse and dissipate excessively, as opposed to the low resolution uniform

**Fig. 1** Polynomial degree values following the advected vortices on the (**a**) coarse and (**b**) fine mesh; green color corresponds to polynomial degree 3, red color corresponds to polynomial degree 4

**Fig. 2** Profiles of velocity magnitude recorded during time in the vortex path centreline at different distances from vortex starting point, comparison of uniform degree simulations and dynamic adaptive one on the coarse mesh

degree 2 simulation, in which the vortex is quickly dissipated. The behaviour of the adaptive simulation generally lies midway between the uniform degree 4 and the uniform degree 3 results.

The comparison with the uniform high degree simulations can be observed more easily in Fig. 3, which shows the difference of the velocity magnitude profiles with respect to the uniform degree 4 results, still for the coarse mesh case. In the locations nearer to the starting position of the vortex, the adaptive simulation appears close to the degree 4 solution when the first part of the vortex is passing, while a slight difference appears in the second part of the vortex, which is however always within the error of the uniform degree 3 simulation. In the locations farther from the initial starting point of the vortex, which sense the vortex passage after a longer advection time, the adaptive simulation is always very close to the uniform degree 4 solution. It has to be noted that the average number of degrees of freedom of the adaptive simulation is 41,488, which remains almost constant throughout the simulation. This is 10.8% more than the 37,430 degrees of freedom needed for the uniform degree 2 solution, 44.6% less than the 74,860 degrees of freedom of the uniform degree 3 simulation, which is always outperformed by the adaptive one, and 68.3% less than the uniform degree 4 simulation.
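The reported savings can be cross-checked from the modal dimension (3). The element count is not stated in the text; it is inferred here from the degree-2 figure (37,430 DoF / 10 modes per element = 3,743 elements):

```python
# Cross-check of the reported coarse-mesh DoF figures using eq. (3).

def modes(q):
    return (q + 1) * (q + 2) * (q + 3) // 6      # n_phi(K) + 1

n_elem = 37430 // modes(2)                       # inferred: 3,743 elements
dof = {q: n_elem * modes(q) for q in (2, 3, 4)}
adaptive = 41488
print(dof)                                       # {2: 37430, 3: 74860, 4: 131005}
print(round(100 * (adaptive / dof[2] - 1), 1))   # more than uniform degree 2
print(round(100 * (1 - adaptive / dof[3]), 1))   # less than uniform degree 3
print(round(100 * (1 - adaptive / dof[4]), 1))   # less than uniform degree 4
```

The computed percentages (10.8%, 44.6%, 68.3%) reproduce the figures quoted in the text, confirming that the counts are mutually consistent.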

To correctly assess the effects of adaptivity in the case of the refined mesh, we study the difference of the various results with respect to the uniform degree 4 one, presented in Fig. 4. The differences are generally very small, even for the

**Fig. 3** Difference of velocity magnitude with respect to the most refined simulation at uniform degree 4, recorded during time in the vortex path centreline at different distances from vortex starting point, on coarse mesh

**Fig. 4** Difference of velocity magnitude with respect to the most refined simulation at uniform degree 4, recorded during time in the vortex path centreline at different distances from vortex starting point, on fine mesh

lowest resolution; however, it is possible to note that the adaptive results are always comparable to the uniform degree 3 results, and in many points better. Nonetheless, the improvement provided by the adaptivity is more limited than in the coarse case, mainly because the mesh by itself is sufficient to resolve the vortex. In this case the average number of degrees of freedom of the adaptive case is 170,470, which is 5.7% more than the 161,320 degrees of freedom of the uniform degree 2 case, 47.2% less than the uniform degree 3 case and 70.0% less than the uniform degree 4 case. The differences in vorticity profiles between the simulation at uniform degree 4 and the lower resolution simulations are also presented in Fig. 5 for the coarse resolution and in Fig. 6 for the finer resolution. By comparing the results at the two different resolutions it is possible to note, also for the vorticity, that at the finer resolution the large scale phenomenon is correctly represented by almost all polynomial degrees, with minimal vorticity dissipation, while at the coarser resolution only the higher polynomial degree, as well as the adaptive simulation, avoids an excessive dissipation of vorticity.

At the coarser resolution, the difference of the adaptive simulation with respect to the uniform degree 4 one is smaller than the differences between the other uniform degree simulations (Fig. 5), showing that with the adaptation it is also possible to obtain a better resolution of the vorticity profiles. The same is true at the finer resolution (Fig. 6). In the dynamically adaptive simulations, spurious acoustic waves seem to be produced by the dynamical adaptation process, see

**Fig. 5** Difference of vorticity magnitude with respect to the most refined simulation at uniform degree 4, recorded during time in the vortex path centreline at different distances from vortex starting point, on coarse mesh

**Fig. 6** Difference of vorticity with respect to the most refined simulation at uniform degree 4, recorded during time in the vortex path centreline at different distances from vortex starting point, on fine mesh

**Fig. 7** Pressure time derivative in the adaptive simulation of vortex advection on (**a**) coarse mesh, (**b**) finer mesh, at time *T* = 4; in both plots, the represented quantity takes values in the interval [−0*.*1*,* 0*.*1]

Fig. 7. These spurious disturbances were not observed in the dynamically adaptive tests presented in [11, 12], which employed an implicit time discretization, thus strongly damping these high frequency solution components. However, as can be seen by inspecting the time series of the pressure values (not reported here due to the limited space available), these disturbances decrease rapidly in amplitude on the finer mesh and do not seem to propagate through the domain, but rather follow the advected vortex. This spurious feature warrants further investigation of the dynamical adaptation approach if a correct approximation of acoustic waves is desired.

#### **4 Conclusions**

The novel degree adaptation criterion for LES in adaptive DG frameworks proposed in [10], and tested so far only in statically adaptive simulations, has now also been employed in dynamically adaptive simulations. Numerical results for the benchmark case of the advection of an isolated vortex have been presented. These results are meant as a preliminary step towards the study of more complex configurations in which the same isolated vortex impinges on an obstacle. The presented results show that the proposed criterion is also effective in the dynamical case. With a coarse basic mesh resolution the effects of *p*-adaptivity are significant, leading to results close to those obtained with the maximum resolution allowed by the polynomial basis; when the mesh resolution is already sufficient to represent the vortex even with the lowest polynomial degrees, the adaptivity still leads to accurate results, with an even larger reduction of the number of degrees of freedom with respect to the non-adaptive solutions. In a subsequent work, the results obtained in [9] for the case of the isolated vortex impinging on an obstacle will be presented, along with other applications to fully three-dimensional turbulent flows.

#### **References**



## **A Novel Eighth-Order Diffusive Scheme for Unstructured Polyhedral Grids Using the Weighted Least-Squares Method**

**Duarte M. S. Albuquerque, Artur G. R. Vasconcelos, and Jose C. F. Pereira**

#### **1 Introduction**

The numerical solution of transport phenomena in complex geometrical domains is a subject of continuous development regarding three characteristics: accuracy, robustness and efficiency. The geometrical complexity can be handled with different grid topologies, and an understanding of their issues is relevant for industrial applications. High-order computation is a demanding subject, motivated by a potential reduction of computational cost for complex computational fluid dynamics (CFD) problems.

High-order accurate methods for unstructured grids have historically focused on hyperbolic equations, see e.g. Lê et al. [1]. Barth and Frederickson [2] developed a high-order Finite Volume Method (FVM) for the resolution of the Euler equations, using a quadratic polynomial. The coupling of the Euler system with viscous terms, which requires diffusive schemes, was achieved by Ollivier-Gooch et al. [3].

In recent years, high-order methods have been applied to the resolution of parabolic and elliptic problems on unstructured grids, see e.g. Boularas et al. [4]. The range of possible applications varies from Poisson problems, see Batty [5], to heat transfer problems, see e.g. Chantasiriwan [6], diffusion equations with variable coefficients, see Zhai [7], or discontinuous coefficients, see e.g. Clain et al. [8].

Several polynomial reconstruction techniques applied to FVM can be highlighted: the fourth-order methods of Ollivier-Gooch et al. [9], Cueto-Felgueroso

D. M. S. Albuquerque (✉) · A. G. R. Vasconcelos · J. C. F. Pereira

LAETA, IDMEC, IST, Universidade de Lisboa, Lisboa, Portugal

e-mail: DuarteAlbuquerque@tecnico.ulisboa.pt; Artur.Vasconcelos@tecnico.ulisboa.pt; jcfpereira@tecnico.ulisboa.pt

S. J. Sherwin et al. (eds.), *Spectral and High Order Methods for Partial Differential Equations ICOSAHOM 2018*, Lecture Notes in Computational Science and Engineering 134, https://doi.org/10.1007/978-3-030-39647-3\_27

et al. [10], and Nogueira et al. [11]; sixth-order results have also been reported by Clain et al. [12]. The objective of this work is to extend the weighted least-squares (WLS) method to very high-order schemes and polyhedral unstructured grids.

Regarding other applications of the weighted least-squares technique, Magalhaes et al. [13] and Albuquerque et al. [14] have developed, respectively, relative and absolute error estimators for second-order finite volume schemes on unstructured grids. Martins et al. [15, 16] created a third-order interpolation method with a divergence-free constraint for immersed boundary applications, for Cartesian and unstructured polyhedral grids, respectively.

The remainder of the manuscript is divided into four sections: in Sect. 2 the implemented method for two dimensions is briefly described; in Sect. 3 the verification of the implemented schemes, with Cartesian and perturbed grids, is carried out; Sect. 4 shows the results for a case with irregular polyhedral and triangular grids and proposes a novel method to treat the Neumann boundary conditions; Sect. 5 concludes the manuscript with a summary of the principal achievements of this work.

#### **2 Elliptical Operator for Unstructured Grids with the Least Squares Technique**

In this work the Poisson equation is solved, defined by:

$$\nabla \cdot \nabla \phi = \varphi\_{\phi}, \tag{1}$$

where $\phi$ is the transported variable and $\varphi\_\phi$ is the source term, which is required when manufactured analytical solutions are used and is equal to the Laplacian of the prescribed solution. After applying the classic Finite Volume method to the Poisson equation, the following equation is obtained:

$$\sum\_{f \in F(P)} \sum\_{g \in G(f)} w\_{G\_g} \nabla \phi\_g \cdot \mathbf{S}\_{\mathbf{f}} = \int\_{CV} \varphi\_{\phi} \, dV, \tag{2}$$

where $F(P)$ is the set of faces of cell $P$, $G(f)$ is the set of Gauss points of the face $f$, $\mathbf{S}\_{\mathbf{f}}$ is the face normal vector and $w\_{G\_g}$ is the weight of the Gauss–Legendre quadrature. The key ingredient of this method is the calculation of the face gradient $\nabla \phi\_g$ at each Gauss point, which is explained in the next subsection.

#### *2.1 Polynomial Reconstructions*

To obtain the gradients values at the integration points, a reconstruction of the unknown primitive variable is performed at the face centroid, using a polynomial expansion.

**Table 1** Number of terms of the Taylor expansion required for a *p*th-order polynomial in two-dimensional (2D) cases


The number of terms of the polynomial has to take into account the required order of the scheme, and the polynomial has the following form:

$$\begin{aligned} \phi\_f^R(x, y) &= C\_1 + C\_2 (x - x\_f) + C\_3 (y - y\_f) + \\ &+ C\_4 (x - x\_f)^2 + C\_5 (x - x\_f)(y - y\_f) + C\_6 (y - y\_f)^2 + \cdots \ . \end{aligned} \tag{3}$$

Expression (3) can be written in a more compact, vectorial form as:

$$\phi\_f^R(\mathbf{x}) = \mathbf{d}\_f(\mathbf{x}) \, \mathbf{c}\_f, \tag{4}$$

where the subscript $f$ indicates that the reconstruction is made at the face $f$, $\mathbf{d}\_f(\mathbf{x}) = \left[ 1, \, x - x\_f, \, y - y\_f, \, (x - x\_f)^2, \, (x - x\_f)(y - y\_f), \, (y - y\_f)^2, \, \cdots \right]$, $\mathbf{x}\_f = (x\_f, y\_f)$ is the face centroid coordinate vector, $\mathbf{x} = (x, y)$ is the coordinate vector of a point used in the reconstruction and $\mathbf{c}\_f = \left[ C\_1, C\_2, C\_3, C\_4, C\_5, C\_6, \cdots \right]^T$ is the vector of reconstruction constants.

Table 1 lists the number of terms of the expansion for each polynomial used in this work.

The order of accuracy of the numerical scheme is $p + 1$; consequently, the linear reconstruction is second order accurate, the cubic reconstruction is fourth order accurate, the fifth-degree polynomial is sixth order accurate and, finally, the seventh-degree polynomial is eighth order accurate. The numerical schemes are denoted FLS(*p* + 1) according to the global order of the implemented method. For each order, a minimum number of Gauss points is required to maintain the respective quadrature order.
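The bookkeeping behind Table 1 follows from standard combinatorics: a degree-*p* Taylor expansion in 2D has $(p+1)(p+2)/2$ terms. A short sketch (illustrative names) for the degrees used in this work:

```python
# A degree-p Taylor expansion in 2D has (p + 1)(p + 2)/2 terms; the resulting
# scheme is of order p + 1, named FLS(p + 1) in the text.

def n_terms_2d(p):
    return (p + 1) * (p + 2) // 2

for p in (1, 3, 5, 7):
    print(f"p = {p}: {n_terms_2d(p):2d} terms -> FLS{p + 1}")
```

For the odd degrees used here this gives 3, 10, 21 and 36 terms for FLS2, FLS4, FLS6 and FLS8, respectively.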

#### *2.2 General Approach*

The Weighted Least Squares (WLS) method is a technique used to solve overdetermined problems, where there are more independent equations than unknowns.

Equation (4) results in a system of linear equations, which has the form:

$$\mathbf{D}\_f \, \mathbf{c}\_f = \boldsymbol{\phi}\_s, \tag{5}$$

**Fig. 1** Examples of different vertex neighbours order from the red face

where $\mathbf{D}\_f$ is the stacking of $\mathbf{d}\_f(\mathbf{x})$ for every point of the reconstruction, resulting in a matrix with $n\_s \times n\_{coefs}$ entries; $\mathbf{c}\_f$ is a column vector with $n\_{coefs}$ entries; $\boldsymbol{\phi}\_s$ is a column vector with $n\_s$ entries; $n\_{coefs}$ is the number of constants of the $p$th-degree polynomial; and $n\_s$ is the size of the computational stencil, which is the set of computational values and points used in the reconstruction, made of cell neighbours of the face. Since $n\_s > n\_{coefs}$, the problem is overdetermined and the WLS technique is used to minimize the weighted residual of the problem.

To solve this problem, specific stencils must be used for each scheme order. This is done by using vertex neighbours, according to the experience of the authors in a previous work [14]. Each successive scheme order requires a larger stencil to respect the $n\_s > n\_{coefs}$ condition. Figure 1 shows examples of these stencils for a regular polyhedral and a triangular grid. Essentially, each successive level of vertex neighbours (from 1 to 4) is used for the scheme with the corresponding even order of accuracy. For example, the second order scheme only needs the first level of vertex neighbours of the face marked in red.
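The face reconstruction (3)–(5) can be sketched with a weighted least-squares solve. This is a hedged sketch for $p = 2$ in 2D; the inverse-distance weights are an illustrative choice, not necessarily the weighting of the implemented schemes:

```python
import numpy as np

# Hedged sketch of the WLS face reconstruction for p = 2 in 2D.

def wls_reconstruct(pts, vals, x_f):
    dx, dy = (pts - x_f).T
    # Rows of D_f are d_f(x) = [1, dx, dy, dx^2, dx*dy, dy^2] for each point.
    D = np.column_stack([np.ones_like(dx), dx, dy, dx**2, dx * dy, dy**2])
    w = 1.0 / np.maximum(np.hypot(dx, dy), 1.0e-12)      # illustrative weights
    sw = np.sqrt(w)
    c, *_ = np.linalg.lstsq(D * sw[:, None], vals * sw, rcond=None)
    return c                                             # c_f of eq. (5)

# An overdetermined stencil (9 points > 6 coefficients) recovers a quadratic
# field exactly: coefficients ~ [1, 2, -1, 0.5, 1, -3].
rng = np.random.default_rng(0)
pts = rng.uniform(-1.0, 1.0, size=(9, 2))
f = lambda x, y: 1 + 2 * x - y + 0.5 * x**2 + x * y - 3 * y**2
c_f = wls_reconstruct(pts, f(pts[:, 0], pts[:, 1]), np.zeros(2))
print(np.round(c_f, 6))
```

Because the sampled field lies in the span of the reconstruction basis, the weighted residual vanishes and the exact polynomial coefficients are recovered regardless of the particular weighting.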

Other details of the construction of the global matrix $A\_{ij}$ are described in the work of Vasconcelos et al. [17]. Each line of the global matrix $A\_{ij}$ corresponds to the diffusive discretization of the cell $i$ and has to consider the diffusive flux integral for each face of the cell. This flux integral is computed from the polynomial reconstruction centred on the respective face, described previously. Finally, the high-order diffusive fluxes can be written in the following matrix form:

$$\mathbf{A}\_{ij}\boldsymbol{\phi}\_{j} = \sum\_{f \in \mathcal{F}(i)} \left( \sum\_{g \in \mathcal{G}(f)} w\_{Gg}\, \mathbf{t}\_{fj} \left( \mathbf{x}\_{g} \right) \boldsymbol{\phi}\_{j} \right) \cdot \mathbf{S}\_{f} \,, \tag{6}$$

where *F(i)* is the set of faces of cell *i*, *G(f)* is the set of Gauss–Legendre points of face *f*, *w*<sub>Gg</sub> is the weight and **x**<sub>g</sub> are the coordinates of each Gauss–Legendre point *g*, and **t**<sub>fj</sub> is the contribution of cell *j* to the diffusive flux of face *f* obtained from the reconstructed polynomial. The set of cells *j* is defined by the stencil used in the polynomial reconstruction at each face *f*; globally, each row *i* receives contributions from all cells in the union of the stencils of all faces of cell *i*.
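The double sum of Eq. (6) can be read as the following accumulation loop (a schematic of ours; `t_f`, the stencil list and the dense matrix are placeholders standing in for the paper's quantities):

```python
# Schematic assembly of one face's contribution to row i of the global
# matrix, following the double sum of Eq. (6): Gauss-Legendre points on
# the outside, stencil cells on the inside.
import numpy as np

def accumulate_face_flux(A, i, stencil_cells, t_f, gauss_x, gauss_w, S_f):
    """A: dense global matrix; t_f(x, j) returns the flux-gradient
    contribution of stencil cell j evaluated at point x (a 2-vector here);
    S_f: outward face-area vector."""
    for xg, wg in zip(gauss_x, gauss_w):
        for j in stencil_cells:
            A[i, j] += wg * np.dot(t_f(xg, j), S_f)
```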

#### **3 Order Convergence Verification with Cartesian and Perturbed Non-uniform Grids**

A numerical test is performed on non-uniform grids with an imposed displacement. The perturbation is applied by randomly moving the grid lines of the Cartesian counterpart within a range between zero and a percentage (*γ*) of the grid size, in either the positive or the negative direction. The cells of the grids are always squares, and a reference grid without any perturbation, i.e. a Cartesian one, is used. For this case, the following analytical solution was used and solved in a 1 × 1 square domain:
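The perturbation procedure can be sketched as follows (our reading of the description above; parameter names and values are illustrative). Each interior grid line of an *N* × *N* Cartesian grid is shifted by a uniform random amount of at most *γh*, in either direction:

```python
# Sketch of the randomly perturbed grids: interior lines of a Cartesian
# grid on [0,1]^2 are shifted by up to gamma*h; the boundary stays fixed.
import numpy as np

def perturbed_grid(N, gamma, seed=0):
    rng = np.random.default_rng(seed)
    h = 1.0 / N
    x = np.linspace(0.0, 1.0, N + 1)
    y = np.linspace(0.0, 1.0, N + 1)
    # perturb interior lines only, in [-gamma*h, +gamma*h]
    x[1:-1] += rng.uniform(-gamma * h, gamma * h, N - 1)
    y[1:-1] += rng.uniform(-gamma * h, gamma * h, N - 1)
    return x, y
```

Since each line moves by at most *γh*, the line ordering is preserved for any *γ* < 50%.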

$$\phi\left(x,y\right) = \exp\left(-\frac{\left(x - 0.5\right)^2 + \left(y - 0.5\right)^2}{0.0175}\right),\tag{7}$$

Table 2 lists the error ratios, *r*, between a grid with an imposed perturbation and a regular one, showing the ratios of both the mean and maximum errors on the finest grid under study. Particularly for the FLS6 and FLS8 schemes, the error can be one order of magnitude greater than that obtained with the Cartesian grid. It is also shown that an imposed perturbation of up to 20% incurs only a small numerical error penalty.

Figure 2 shows the convergence curves obtained, but only for the FLS4 and FLS8 schemes. It is possible to observe that the theoretical convergence orders are achieved for every perturbed grid.

Figure 3 shows the error distribution for the FLS8 scheme with a Cartesian grid and a perturbed grid with *γ* = 30%. It is shown that the error distribution is severely changed by the perturbation imposed on the grid.


**Table 2** Ratio of mean and maximum error norms for all schemes between grids with an imposed perturbation and a Cartesian one with 25,600 cells

**Fig. 2** Convergence curves for the imposed perturbed grids with FLS4 and FLS8

**Fig. 3** Error distribution for the FLS8 scheme and two grids with 25,600 cells: one without any perturbation (left) and one with an imposed perturbation (right)

#### **4 Results for Several Grid Types and with Neumann Boundary Conditions**

To verify the applicability of the proposed schemes to other grid types and to Neumann boundary conditions, a numerical verification was performed with an analytical solution in the square domain [0, 1]². A Neumann boundary condition was imposed at the vertical faces and a Dirichlet boundary condition at the remaining ones.

Two different approaches were used to impose the Neumann boundary conditions. The first is the classic one, which consists in simply differentiating the row of the least-squares matrix that represents the boundary face. It will be referred to as the general case (GC) approach.

The second approach, which is novel to the authors' knowledge, consists in multiplying each row of the **D**<sub>f</sub> matrix that refers to a Neumann boundary face, *b*, by the respective face area vector, **S**<sub>b</sub>. That row is written as ∇**d**<sub>f</sub>(**x**<sub>b</sub>) · **S**<sub>b</sub> instead of ∇**d**<sub>f</sub>(**x**<sub>b</sub>) · **n**<sub>b</sub>, and the corresponding entry of the vector **φ**<sub>s</sub> is given by ∇*φ*<sub>b</sub> · **S**<sub>b</sub> instead of ∇*φ*<sub>b</sub> · **n**<sub>b</sub>. Consequently, the problem takes the following form:

$$
\begin{bmatrix}
1 & \left(x\_1 - x\_f\right) & \left(y\_1 - y\_f\right) & \left(x\_1 - x\_f\right)^2 & \cdots \\
1 & \left(x\_2 - x\_f\right) & \left(y\_2 - y\_f\right) & \left(x\_2 - x\_f\right)^2 & \cdots \\
\vdots & \vdots & \vdots & \vdots & \ddots \\
\hline
0 & S\_{bx} & S\_{by} & 2\left(x\_b - x\_f\right)S\_{bx} & \cdots \\
\vdots & \vdots & \vdots & \vdots & \ddots
\end{bmatrix}
\begin{bmatrix}
c\_1 \\
c\_2 \\
\vdots \\
c\_{n\_{coefs}}
\end{bmatrix} = \begin{bmatrix}
\phi\_1 \\
\phi\_2 \\
\vdots \\
\hline
\nabla \phi\_b \cdot \mathbf{S}\_b \\
\vdots
\end{bmatrix},\tag{8}
$$

where the horizontal line in the matrix separates the contribution of the stencil cells and Dirichlet faces from the contribution of the Neumann faces of the stencil under consideration.

The goal of this operation is to ensure that the vector **φ**<sub>s</sub> and every row of the least-squares matrix **D**<sub>f</sub> have the same physical dimensions, something that does not happen with the classic approach. This approach will be designated as the dimensional correction (DCN) for the Neumann boundary condition.
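For the quadratic monomial basis used as an illustration earlier, the DCN Neumann row can be sketched as follows (our illustration; the basis and the names are assumptions): the gradient of each basis function is contracted with the face-area vector **S**<sub>b</sub> rather than with the unit normal, so the row carries the same units as the value rows.

```python
# Illustrative DCN row for a Neumann boundary face b: gradient of the
# quadratic monomial basis [1, dx, dy, dx^2, dx*dy, dy^2] at x_b,
# contracted with the face-area vector S_b (instead of the unit normal).
import numpy as np

def neumann_row_dcn(x_b, x_f, S_b):
    dx, dy = x_b[0] - x_f[0], x_b[1] - x_f[1]
    grad = np.array([[0.0, 0.0],      # d/dx, d/dy of 1
                     [1.0, 0.0],      # of dx
                     [0.0, 1.0],      # of dy
                     [2 * dx, 0.0],   # of dx^2
                     [dy, dx],        # of dx*dy
                     [0.0, 2 * dy]])  # of dy^2
    return grad @ S_b                 # row [0, S_bx, S_by, 2(x_b-x_f)S_bx, ...]
```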

Numerical tests were performed for two grid types: irregular polyhedral and triangular grids. The analytical solution is given by:

$$\phi\left(x,y\right) = \sin\left(3\pi x\right)\sin\left(3\pi y\right),\tag{9}$$

where on the Neumann boundaries the face flux is ∇*φ*<sub>b</sub> · **S**<sub>b</sub> = 0.

Figure 4 shows the convergence curves for both approaches applied to all schemes with the irregular polyhedral (left) and triangular (right) grids. The solid lines represent the DCN approach and the dotted ones the classic GC approach. The results point out that the theoretical convergence order is always

**Fig. 4** Convergence curves of the mean error for mixed boundary conditions with irregular polyhedral and triangular grids for all schemes. The dotted lines are the convergence curves for the GC approach and the solid ones represent the convergence curves with the DCN approach


**Table 3** Comparison between the two approaches for a problem with an imposed Neumann BC for all schemes applied to the irregular polyhedral grids

**Table 4** Comparison between the two approaches for a problem with an imposed non-null Neumann BC for all schemes applied to the triangular grids


achieved for both grids, and indicate that the DCN approach improves the schemes' performance, most evidently for the FLS2 and FLS8 schemes, especially the latter. The behaviour on the finest grids is also more stable with the DCN approach.

Table 3 lists the comparison of the two approaches used for the Neumann BC on the irregular polyhedral grid. The comparison is made through the ratio between both approaches using the mean and maximum error norms, with *r* computed by:

$$r\_i = \frac{\|e\|\_i^{GC}}{\|e\|\_i^{DCN}},\tag{10}$$

where *i* denotes the error norm used in the calculation.

The results show that the biggest decrease in error occurs for the maximum error. For the FLS8 scheme the error can be reduced by up to 21 times, since the new method avoids the truncation error issue present in the GC approach and shown in Fig. 4.

Table 4 lists the comparison between the two approaches used for the Neumann BC with the triangular grids. The results allow us to conclude that the major decrease of the numerical error again occurs for the maximum error, which is reduced by almost one order of magnitude for the second-order scheme and halved with the fourth-order scheme. For the sixth-order scheme the maximum error with the new approach is slightly worse, by almost 10%, than with the general approach; however, in terms of mean error the gain is evident, since the mean error is halved with the DCN approach. For the eighth-order scheme the error is reduced by about three orders of magnitude, since the truncation error issue is avoided.

#### **5 Conclusions**

Verification tests have been performed for a new high-order scheme based on the weighted least-squares technique and the Finite Volume method. The convergence curves have shown excellent behaviour, indicating that the theoretical order is achieved for all cases under study. Moreover, the new reconstruction method is not very sensitive to the perturbations imposed on the grid or to the topology of the cells.

Additionally, the results validated the novel proposed approach for treating the Neumann boundary conditions, which improves the quality of the solution. These results are as expected, since, when using this approach, the dimensions within the WLS problem matrices are identical to each other.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **An Explicit Mapped Tent Pitching Scheme for Maxwell Equations**

**Jay Gopalakrishnan, Matthias Hochsteger, Joachim Schöberl, and Christoph Wintersteiger**

### **1 Introduction**

Electromagnetic waves propagate at the speed of light. Thus, the field at a certain point in space and time depends only on field values within a dependency cone. A tent pitching method introduces a special "causal" spacetime mesh that respects this finite speed of propagation. It is not limited to Maxwell equations, but can be applied to general hyperbolic equations. A tent pitching method requires a numerical scheme to discretize the equation on that mesh. Discontinuous Galerkin (DG) methods are of particular interest since they offer a systematic avenue to build high order methods. For a given initial condition at the bottom of a tent, the discrete equations may be solved within each individual tent, up to the tent top. The computed solution at the tent top provides initial conditions for the tents that follow later in time. This method is highly parallel, since many tents can be solved independently. Methods using such tent-pitched meshes may be traced back to [5, 7]. More recent works [1, 6, 8] develop Spacetime DG (SDG) methods within tents by formulating local variational problems, for which linear systems are set up and solved. Although these systems are local, the matrix size can grow rapidly with the polynomial order, especially in four-dimensional spacetime tents. In this context

J. Gopalakrishnan (✉)
Fariborz Maseeh Department of Mathematics & Statistics, Portland State University, Portland, OR, USA
e-mail: gjay@pdx.edu

M. Hochsteger · J. Schöberl · C. Wintersteiger
Institute for Analysis and Scientific Computing, Technische Universität Wien, Wien, Austria
e-mail: matthias.hochsteger@tuwien.ac.at; joachim.schoeberl@tuwien.ac.at; christoph.wintersteiger@tuwien.ac.at


it is natural to ask if one can develop explicit schemes (which usually perform well under low memory bandwidth) that take advantage of tents.

A key ingredient to answer this question was presented in [2], where Mapped Tent Pitching (MTP) schemes were introduced. The MTP discretization, which proceeds by mapping tents to a spacetime cylinder, allows one to evolve the solution either implicitly or explicitly within tents. The memory requirements of the explicit MTP scheme are limited to what is needed for storing the spatial mesh, the solution coefficients at one time step, and the topology of the tents.

In this work, we show that notwithstanding the above-mentioned advantages of the explicit MTP scheme, one may lose higher order convergence if a naive time stepping strategy (involving a standard explicit Runge-Kutta scheme) is used. We then develop a new Taylor time-stepping scheme for the local problems within tents. Despite its simplicity, our numerical experiments show that it delivers the optimal order of convergence.

#### **2 Mesh Generation by Tent Pitching**

We start with a conforming spatial mesh consisting of elements 𝒯 = {*T*} and vertices 𝒱 = {*V*}. We progress in time by defining a sequence of advancing fronts *τ*<sub>i</sub>. A front *τ*<sub>i</sub> is given as a standard nodal finite element function on this mesh, defined by storing the current time for every vertex of the mesh. We move from *τ*<sub>i</sub> to the next front *τ*<sub>i+1</sub> by moving one vertex forward in time, while keeping all other vertices fixed. We call the spacetime domain between *τ*<sub>i</sub> and *τ*<sub>i+1</sub> a tent. In Fig. 1, the red domain is the tent between *τ*<sub>i</sub> and *τ*<sub>i+1</sub>.

Its projection to the spatial domain is exactly the vertex patch *ωV* around *V* of the original mesh. The data to be stored for one tent are the bottom and top-times of the central vertex, plus the times for all neighboring vertices.

Note that although the algorithm is described sequentially, it is highly parallel. Vertices with graph-distance of at least two can be moved forward independently. For example, in Fig. 1, all blue tents can be built and processed in parallel.

The distance for advancing a vertex is limited by the speed of light, a constraint often referred to in the literature as the *causality condition.* Under this condition, the Maxwell problem inside the tent is solvable using the initial conditions at the tent bottom. Thus, the top boundary is an outgoing boundary and no boundary conditions are needed there.
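In one space dimension the advancing-front loop can be condensed into a few lines (a toy version of ours: the greedy pick-the-lowest-vertex strategy and the safety factor on the causality constraint are illustrative choices, not the algorithm of any particular library):

```python
# Toy 1-D tent pitching: repeatedly advance the lowest vertex of the
# front tau as far as the causality condition allows relative to its
# neighbours (slope limited by safety/c), up to the final time.
import numpy as np

def pitch_tents(x, t_final, c=1.0, safety=0.5):
    tau = np.zeros_like(x)            # advancing front, one time per vertex
    tents = []                        # records (vertex, t_bottom, t_top)
    while tau.min() < t_final:
        v = int(np.argmin(tau))
        top = t_final
        for nb in (v - 1, v + 1):     # limit the slope to each neighbour
            if 0 <= nb < len(x):
                top = min(top, tau[nb] + safety * abs(x[nb] - x[v]) / c)
        tents.append((v, tau[v], top))
        tau[v] = top
    return tents
```

Vertices at graph-distance two or more could be advanced concurrently; the sequential loop is only for clarity.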

**Fig. 1** Tent pitched spacetime mesh for a one-dimensional spatial mesh

Note that the spatial mesh in Fig. 1 is refined towards the right boundary, which leads to smaller tent heights at the right boundary. Hence, smaller time steps in locally refined regions are a very natural feature of tent pitching methods.

#### **3 The MTP Discretization**

Now, we consider the discretization method for one tent domain *K* = {(*x*, *t*) : *x* ∈ *ω*<sub>V</sub>, *ϕ*<sub>b</sub>(*x*) ≤ *t* ≤ *ϕ*<sub>t</sub>(*x*)}, where *ω*<sub>V</sub> is the union of elements containing the vertex *V*, and *ϕ*<sub>b</sub> and *ϕ*<sub>t</sub> are the bottom and top fronts, respectively, restricted to *ω*<sub>V</sub>. Our aim is to numerically solve the Maxwell system on *K*, namely

$$
\partial\_t \varepsilon E = \nabla \times H \,, \qquad \partial\_t \mu H = -\nabla \times E \,, \tag{1}
$$

where boundary values for both fields are given at the tent bottom and ∇=∇*<sup>x</sup>* denotes the spatial gradient.

The approach of MTP schemes is to map the tent domain to a spacetime cylinder *ω*<sub>V</sub> × (0, 1) and solve the transformed equation there. The transformation from the cylinder to the tent is denoted by *Φ* : *ω*<sub>V</sub> × (0, 1) → *K* and is defined by *Φ*(*x*, *t̂*) = (*x*, *ϕ*(*x*, *t̂*)), where

$$
\varphi(x, \hat{t}) = (1 - \hat{t})\varphi\_b(x) + \hat{t}\varphi\_t(x)\ .
$$

It is similar to the Duffy transformation mapping a square to a triangle (see Fig. 2).

With the notation

$$\text{skew}\,E = \begin{pmatrix} 0 & E\_z & -E\_y \\ -E\_z & 0 & E\_x \\ E\_y & -E\_x & 0 \end{pmatrix},$$

we can rephrase the curl operator as ∇ × *E* = div skew *E*, where the divergence of the matrix function is taken row-wise. To simplify notation further, we define *u* : *K* → ℝ<sup>6</sup> by *u* = (*E*, *H*), and set *g* : *K* → ℝ<sup>6</sup> and *f* : *K* → ℝ<sup>6×3</sup> by

$$g(u) = \begin{bmatrix} \varepsilon E \\ \mu H \end{bmatrix}, \qquad f(u) = \begin{bmatrix} -\text{skew}\, H \\ \text{skew}\, E \end{bmatrix}. \tag{2}$$
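The identity ∇ × *E* = div skew *E* behind these definitions rests on the fact that the skew matrix acts on a vector as (skew *E*)*v* = *v* × *E*; a quick numerical spot-check (ours, purely illustrative):

```python
# Check that the skew matrix of the text reproduces the cross product:
# (skew E) v == v x E, which is what makes curl E = div skew E row-wise.
import numpy as np

def skew(E):
    Ex, Ey, Ez = E
    return np.array([[0.0,  Ez, -Ey],
                     [-Ez, 0.0,  Ex],
                     [Ey,  -Ex, 0.0]])
```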

Then (1) may be rewritten as the conservation law *∂*<sub>t</sub>*g*(*u*) + div<sub>x</sub> *f*(*u*) = 0. Furthermore, we define *F*(*u*) ∈ ℝ<sup>6×4</sup> as

$$F(u) = \begin{bmatrix} f(u) & g(u) \end{bmatrix} = \begin{bmatrix} -\text{skew}\, H & \varepsilon E \\ \text{skew}\, E & \mu H \end{bmatrix},$$

**Fig. 2** Tent mapped from a tensor product domain

which allows us to write Maxwell's system (1) as the spacetime conservation law

$$\operatorname{div}\_{x,t} F(u) = 0 \; . \tag{3}$$

For each row of *F*, the spacetime divergence div*x,t* sums the spatial divergence of the first three components with the time-derivative of the last component.

Now, we apply the Piola transformation to pull back *F* from the tent *K* to the cylinder using the mapping *Φ*. The derivative of *Φ* and its transposed inverse are

$$\boldsymbol{\Phi}^{\prime} = \begin{bmatrix} I & 0 \\ \nabla \varphi^{T} & \delta \end{bmatrix} \qquad \text{and} \qquad (\boldsymbol{\Phi}^{\prime})^{-T} = \begin{bmatrix} I & -\delta^{-1} \nabla \varphi \\ 0 & \delta^{-1} \end{bmatrix}.$$

The Piola transform of *F* is *F̂*(*û*) = 𝒫{*F*} = (det *Φ′*)(*F* ∘ *Φ*)(*Φ′*)<sup>−T</sup> with *û* = *u* ∘ *Φ*. Since the Piola transform provides an algebraic transformation of the divergence, Eq. (3) is simply transformed to div<sub>x,t̂</sub> *F̂*(*û*) = 0 on the spacetime cylinder. Then, inserting the Jacobian of *Φ* leads us to the transformed equation

$$\partial\_{\hat{t}}\left(g(\hat{u}) - f(\hat{u})\nabla\varphi\right) + \operatorname{div}\_{x}\left(\delta f(\hat{u})\right) = 0\,,\tag{4}$$

where *δ*(*x*) = *ϕ*<sub>t</sub>(*x*) − *ϕ*<sub>b</sub>(*x*) is the local height of the tent. Note that ∇*ϕ* is an affine-linear function in quasi-time *t̂*. Equation (4) describes the evolution of *û* along quasi-time from *t̂* = 0 to *t̂* = 1. Details of the calculations are given in [2].

The next step is the space discretization of (4) by a standard discontinuous Galerkin method. Let *V*<sub>h</sub> ⊂ [*L*<sup>2</sup>]<sup>6</sup> be the DG finite element space of degree *p* on 𝒯. On each tent we search for *û* : [0, 1] → *V*<sub>h</sub> such that

$$\int\_{\omega\_V} \partial\_{\hat{t}} \left[g(\hat{u}) - f(\hat{u}) \nabla \varphi\right] v\_h - \sum\_{T \subset \omega\_V} \int\_T \delta f(\hat{u}) \nabla v\_h + \sum\_{F \subset \omega\_V} \int\_F \delta f\_n(\hat{u}^+, \hat{u}^-)\, [\![v\_h]\!] = 0$$

holds for all *v*<sub>h</sub> ∈ *V*<sub>h</sub> and all *t̂* ∈ [0, 1]. Only the restriction of *V*<sub>h</sub> to the patch *ω*<sub>V</sub> is used in this equation. The numerical flux *f*<sub>n</sub>(*û*<sup>+</sup>, *û*<sup>−</sup>) depends on the positive trace lim<sub>s→0⁺</sub> *û*(*x* + *sn*) and the negative trace lim<sub>s→0⁺</sub> *û*(*x* − *sn*), where *n* is a unit normal vector of arbitrary orientation to the face. The jump is defined as usual by ⟦*û*⟧ := *û*<sup>+</sup> − *û*<sup>−</sup> and the mean value by {*û*} := ½(*û*<sup>+</sup> + *û*<sup>−</sup>). One example is the upwind flux [3, p. 434]

$$f\_n(\hat{u}^+, \hat{u}^-) = \begin{bmatrix} \{\hat{H}\} \times n + \tfrac{1}{2}[\![\hat{E}\_t]\!] \\ -\{\hat{E}\} \times n + \tfrac{1}{2}[\![\hat{H}\_t]\!] \end{bmatrix},$$

with the tangential components *Ê*<sub>t</sub> = −(*Ê* × *n*) × *n* and *Ĥ*<sub>t</sub> = −(*Ĥ* × *n*) × *n* of *Ê* = *E* ∘ *Φ* and *Ĥ* = *H* ∘ *Φ*. Note that the local tent height *δ* enters the boundary integrals as a multiplicative factor. At the outer boundary of the vertex patch we have *δ* = 0, so the facet integrals on the outer boundary disappear. For the above semidiscrete system, initial values for the tent problem are given finite element functions at the tent bottom. The finite element solution on the tent top provides the initial conditions for the next tent. Therefore, no projection of initial values is needed when propagating from one tent to the next.

After the semi-discretization, as usual, we are left to solve a system of *N* = dim *V*<sub>h</sub>(*ω*<sub>V</sub>) ordinary differential equations for *U* : [0, 1] → ℝ<sup>N</sup>,

$$\frac{d}{d\hat{t}} \left[MU\right](\hat{t}) - AU(\hat{t}) = 0 \,, \qquad \hat{t} \in (0, 1) \,, \tag{5}$$

given *U*(0). The non-standard feature of (5) is that *M* is an affine-linear function of the quasi-time *t̂* (since our mapping enters the mass matrix *M* through ∇*ϕ*). The matrix *A* is independent of *t̂*. A straightforward approach is to substitute *Y* = *MU* and solve

$$\frac{d}{d\hat{t}}Y - AM^{-1}Y = 0\ ,$$

instead of (5). Although first-order convergence was observed with this strategy, further numerical studies showed a reduced order of convergence if the stage order of the Runge–Kutta (RK) method is not high enough—see Fig. 3 (right). While the implicit MTP schemes discussed in [2] do not show this problem, the issue remains critical for explicit schemes. Thus, we propose a new type of explicit time-stepping for the time discretization, discussed next.

#### **4 Structure-Aware Taylor Time-Stepping**

Returning to the ordinary differential equation (5) and continuing to make the substitution *Y* = *MU*, we now reconsider the previous equation as the following differential-algebraic system:

$$\frac{d}{d\hat{t}}Y = AU\ , \qquad Y = MU\ . \tag{6}$$

We begin by subdividing the interval (0, 1) into *m* ∈ ℕ smaller intervals of size 1/*m*, defined by (*t̂*<sub>i</sub>, *t̂*<sub>i+1</sub>) = (*i*/*m*, (*i*+1)/*m*), for *i* ∈ ℕ and 0 ≤ *i* ≤ *m* − 1. Recall that *A* is independent of the quasi-time *t̂*, and *M* is an affine function of *t̂*, i.e.,

$$M(\hat{t}) = M\_i + (\hat{t} - \hat{t}\_i)M', \qquad \hat{t} \in (\hat{t}\_i, \hat{t}\_{i+1})\,,$$

where *M*<sub>i</sub> = *M*(*t̂*<sub>i</sub>) and the derivative *M′* is a constant matrix. We want to design a time-stepping scheme that is aware of this structure.

Consider the approximations to *Y*, *U* on (*t̂*<sub>i</sub>, *t̂*<sub>i+1</sub>) in the form of Taylor polynomials *Y*<sub>i</sub>, *U*<sub>i</sub> of degree *q* and *q* − 1, respectively, defined by

$$Y\_i(\hat{t}) = \sum\_{n=0}^{q} \frac{(\hat{t} - \hat{t}\_i)^n}{n!} Y\_{i,n}\,, \qquad U\_i(\hat{t}) = \sum\_{n=0}^{q-1} \frac{(\hat{t} - \hat{t}\_i)^n}{n!} U\_{i,n} \;, \qquad \hat{t} \in (\hat{t}\_i, \hat{t}\_{i+1}) \tag{7}$$

where *Y*<sub>i,n</sub> = *Y*<sub>i</sub><sup>(n)</sup>(*t̂*<sub>i</sub>) and *U*<sub>i,n</sub> = *U*<sub>i</sub><sup>(n)</sup>(*t̂*<sub>i</sub>). To find these derivatives, we differentiate both equations of (6) *n* times to get

$$\begin{aligned} Y^{(n+1)}(\hat{t}) &= AU^{(n)}(\hat{t}) \; , \quad &n \ge 0 \; , \\ Y^{(n)}(\hat{t}) &= M(\hat{t})U^{(n)}(\hat{t}) + nM'U^{(n-1)}(\hat{t}) \; , \quad &n \ge 1 \; . \end{aligned}$$

For the second equation we used Leibniz' formula $(fg)^{(n)} = \sum\_{i=0}^{n} \binom{n}{i} f^{(i)}g^{(n-i)}$, and the fact that *M* is affine-linear. Evaluating these equations for the Taylor polynomials *Y*<sub>i</sub>, *U*<sub>i</sub> at *t̂* = *t̂*<sub>i</sub>, we obtain a recursive formula for *Y*<sub>i,n</sub> and *U*<sub>i,n</sub> in terms of *U*<sub>i,n−1</sub>, namely

$$\begin{aligned} Y\_{i,n} &= AU\_{i,n-1} \;, \quad &1 \le n \le q \;,\\ M\_i U\_{i,n} &= Y\_{i,n} - nM' U\_{i,n-1} \;, \quad &1 \le n \le q-1 \;, \end{aligned} \tag{8}$$

for all 0 ≤ *i* ≤ *m* − 1. Given *Y*<sub>0,0</sub> = *Y*(*t̂*<sub>0</sub>) and *M*<sub>0</sub>*U*<sub>0,0</sub> = *Y*<sub>0,0</sub>, applying (8) with *i* = 0 gives the approximate functions *Y*<sub>0</sub>(*t̂*), *U*<sub>0</sub>(*t̂*) in the first subinterval (*t̂*<sub>0</sub>, *t̂*<sub>1</sub>). The recursive formulas are initiated for later subintervals at *n* = 0 by

$$Y\_{i,0} = Y\_{i-1}(\hat{t}\_i)\,, \qquad M\_i U\_{i,0} = Y\_{i,0} \;, \qquad 1 \le i \le m-1 \,. \tag{9}$$

After the final subinterval, we get *Y*<sub>m−1</sub>(*t̂*<sub>m</sub>), our approximation to *Y*(1). We shall refer to the new time-stepping scheme generated by (8) as the *q*-stage *SAT (structure-aware Taylor) time-stepping.*
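The recursion (7)–(9) can be condensed into a short routine. The sketch below is ours, with an artificial scalar test problem (*M*(*t̂*) = 1 + *t̂*, *A* = 1/2), not the paper's NGSolve implementation; it advances *Y* = *MU* over *m* subintervals with a *q*-stage SAT step:

```python
# q-stage SAT time-stepping for d/dt^ [M(t^)U] = AU on (0,1), with
# M affine in t^: M(t^) = M_i + (t^ - t^_i) M'. Implements Eqs. (7)-(9).
from math import factorial
import numpy as np

def sat_step(A, M_i, Mp, Y0, q, dt):
    """One subinterval: build the Taylor coefficients of Eq. (8) and
    evaluate the degree-q polynomial Y_i at the subinterval end."""
    U = [np.linalg.solve(M_i, Y0)]    # M_i U_{i,0} = Y_{i,0}
    Y = [Y0]
    for n in range(1, q + 1):
        Y.append(A @ U[n - 1])        # Y_{i,n} = A U_{i,n-1}
        if n <= q - 1:                # M_i U_{i,n} = Y_{i,n} - n M' U_{i,n-1}
            U.append(np.linalg.solve(M_i, Y[n] - n * Mp @ U[n - 1]))
    return sum(Y[n] * dt**n / factorial(n) for n in range(q + 1))

def sat_solve(A, M0, Mp, U0, q, m):
    """March over m subintervals (Eq. (9)) and recover U(1) from Y = MU."""
    dt = 1.0 / m
    M_i = M0.copy()
    Y = M_i @ U0
    for _ in range(m):
        Y = sat_step(A, M_i, Mp, Y, q, dt)
        M_i = M_i + dt * Mp           # M_{i+1} = M(t^_{i+1})
    return np.linalg.solve(M_i, Y)
```

For the scalar test *M*(*t̂*) = 1 + *t̂* and *A* = *a*, the exact solution is *U*(1) = 2<sup>a−1</sup> *U*(0), which the scheme reproduces to *O*(*m*<sup>−q</sup>).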

Note that *Y*<sub>m−1</sub>(*t̂*<sub>m</sub>) is our approximation to *Y* = *MU* at the top of the tent. This value is then passed to the next tent in time. The time dependence of *M* arises from the time dependence of ∇*ϕ*. This gradient is continuous along spacetime lines of constant spatial coordinates. Therefore, when passing from one element of a tent to the same element within the next tent in time, *Y* is continuous (since the solution *U* is continuous). Of course, on flat fronts ∇*ϕ* = ∇*τ* = 0, so there *M* is just a diagonal matrix containing the material parameters.

To briefly remark on the expected convergence rate of the *q*-stage SAT time-stepping, recall that due to the mapping of the MTP method we solve for *û* = *u* ∘ *Φ*, which satisfies ∂<sup>n</sup><sub>t̂</sub> *û* = *δ*<sup>n</sup>(∂<sup>n</sup><sub>t</sub> *u*) ∘ *Φ*. The causality condition implies that *δ* → 0 as the mesh size *h* → 0. Thus we may expect the *n*th temporal derivative of *û*, and correspondingly *U*<sup>(n)</sup>, to go to zero at the rate *O*(*h*<sup>n</sup>). By using a *q*-stage SAT time-stepping, we approximate the first *q* − 1 terms of the exact Taylor expansion of *U*. Thus we expect the convergence rate to be *O*(*h*<sup>q</sup>), the size of the remainder term involving *U*<sup>(q)</sup>. The next section provides numerical evidence for this.

Before concluding this section, we should note that in (8) and (9) we tacitly assumed that *M*<sub>i</sub> is invertible. Let us show that this is indeed the case whenever the causality condition (see Sect. 2) |∇*ϕ*| < √(*εμ*) is fulfilled. At any quasi-time *t̂*, given a *ŵ* = (*ŵ*<sub>E</sub>, *ŵ*<sub>H</sub>) ∈ *V*<sub>h</sub> whose coefficient vector in the basis expansion is *W* ∈ ℝ<sup>N</sup>, consider the equation *M*(*t̂*)*U* = *W* for the coefficient vector *U* of *û* ∈ *V*<sub>h</sub>. This equation, in variational form, is

$$\int\_{\omega\_V} \left[ g(\hat{u}) - f(\hat{u}) \nabla \varphi \right] \cdot \hat{v} = \int\_{\omega\_V} \left( \hat{w}\_{E}, \hat{w}\_{H} \right) \cdot \hat{v}, \qquad \text{for all } \hat{v} \in V\_{h}. \tag{10}$$

Let *a*(*û*, *v̂*) denote the left-hand side of (10). To prove solvability of (10), it suffices to prove that *a*(·,·) is a coercive bilinear form on [*L*<sup>2</sup>]<sup>6</sup> for any *t̂*. Inserting *g*(*û*) = [*εÊ*, *μĤ*]<sup>T</sup> and *f*(*û*) = [−skew *Ĥ*, skew *Ê*]<sup>T</sup> into *a*(*û*, *û*) gives

$$\begin{split} a(\hat{u}, \hat{u}) &= \int\_{\omega\_{V}} (\varepsilon \hat{E} - \hat{H} \times \nabla \varphi) \cdot \hat{E} + (\mu \hat{H} + \hat{E} \times \nabla \varphi) \cdot \hat{H} \\ &= \int\_{\omega\_{V}} \varepsilon \hat{E} \cdot \hat{E} + \mu \hat{H} \cdot \hat{H} + 2(\hat{E} \times \nabla \varphi) \cdot \hat{H} \\ &\geq \int\_{\omega\_{V}} \varepsilon \hat{E} \cdot \hat{E} + \mu \hat{H} \cdot \hat{H} - 2 \frac{|\nabla \varphi|}{\sqrt{\varepsilon \mu}} \sqrt{\varepsilon} |\hat{E}| \sqrt{\mu} \left| \hat{H} \right| \,, \end{split}$$

where we used the Cauchy-Schwarz inequality and inserted <sup>√</sup>*<sup>ε</sup>* and <sup>√</sup>*<sup>μ</sup>* to achieve the desired scaling. By applying Young's inequality and |∇*ϕ*<sup>|</sup> *<sup>&</sup>lt;* <sup>√</sup>*εμ*,

$$\begin{split} a(\hat{u},\hat{u}) &\geq \int\_{\omega\_V} \varepsilon \hat{E} \cdot \hat{E} + \mu \hat{H} \cdot \hat{H} - \frac{|\nabla\varphi|}{\sqrt{\varepsilon\mu}} (\varepsilon\hat{E} \cdot \hat{E} + \mu \hat{H} \cdot \hat{H}) \\ &= \int\_{\omega\_V} \left( 1 - \frac{|\nabla\varphi|}{\sqrt{\varepsilon\mu}} \right) (\varepsilon\hat{E} \cdot \hat{E} + \mu \hat{H} \cdot \hat{H}) \geq C \min\left(\varepsilon,\mu\right) \|\hat{u}\|\_{L\_{2}}^{2} \,, \end{split}$$

for some constant *C* > 0. Thus *M*<sub>i</sub> is invertible and the SAT time-stepping is well defined on all tents respecting the causality condition.

One may exploit the specific structure of the Maxwell problem to avoid the assembly and inversion of the matrices *M*<sub>i</sub> (as we have done in our implementation). In fact, instead of (10), we can explicitly solve the corresponding exact undiscretized equation obtained by replacing *V*<sub>h</sub> by [*L*<sup>2</sup>]<sup>6</sup> in (10). The solution *û* = (*Ê*, *Ĥ*) in closed form reads

$$
\hat{E} = \frac{1}{\varepsilon\mu - |\nabla\varphi|^2} \left( I - \frac{1}{\varepsilon\mu} \nabla\varphi \nabla\varphi^T \right) \left( \mu \hat{w}\_E + \hat{w}\_H \times \nabla\varphi \right) \,,
$$

$$
\hat{H} = \frac{1}{\varepsilon\mu - |\nabla\varphi|^2} \left( I - \frac{1}{\varepsilon\mu} \nabla\varphi \nabla\varphi^T \right) \left( \varepsilon\hat{w}\_H - \hat{w}\_E \times \nabla\varphi \right) \,.
$$

We then perform a projection of these into *V*<sub>h</sub> to obtain the coefficients *U*(*t̂*<sub>i</sub>). For uncurved elements, this just involves the inversion of a diagonal mass matrix. For the small number of curved elements, we use a highly optimized algorithm which uses an approximation instead of the exact inverse mass matrix.
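As a pointwise sanity check (ours, purely illustrative), the closed-form fields must satisfy *εÊ* − *Ĥ* × ∇*ϕ* = *ŵ*<sub>E</sub> and *μĤ* + *Ê* × ∇*ϕ* = *ŵ*<sub>H</sub>, the strong form of *M*(*t̂*)*U* = *W*, for any data and any gradient with |∇*ϕ*|² < *εμ*:

```python
# Pointwise closed-form inverse from the text: given (w_E, w_H) and
# g = grad(phi) with |g|^2 < eps*mu, return (E, H) solving
#   eps*E - H x g = w_E   and   mu*H + E x g = w_H.
import numpy as np

def solve_pointwise(wE, wH, g, eps, mu):
    P = (np.eye(3) - np.outer(g, g) / (eps * mu)) / (eps * mu - g @ g)
    E = P @ (mu * wE + np.cross(wH, g))
    H = P @ (eps * wH - np.cross(wE, g))
    return E, H
```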

#### **5 Numerical Results**

The MTP discretization in combination with the SAT time-stepping on tents is implemented within the Netgen/NGSolve finite element library. In this section numerical results concerning accuracy as well as performance are reported.

#### *5.1 Convergence Studies in Two Space Dimensions*

We consider the model problem in two space dimensions

$$
\partial\_t \varepsilon E\_z = \partial\_x H\_y - \partial\_y H\_x \,, \qquad \partial\_t \mu H\_x = -\partial\_y E\_z \,, \qquad \partial\_t \mu H\_y = \partial\_x E\_z \,,
$$

on the spacetime cube [0, *π*]<sup>2</sup> × [0, √2 *π*]. The parameters are set to *ε* = *μ* = 1 such that the speed of light is *c* = 1. Initial and boundary values are set such that the exact solution is given by

$$E\_z = \sin(x)\sin(y)\cos\left(\sqrt{2}\,t\right)\,,$$

$$H\_x = -\frac{1}{\sqrt{2}}\sin(x)\cos(y)\sin\left(\sqrt{2}\,t\right)\,,$$

$$H\_y = \frac{1}{\sqrt{2}}\cos(x)\sin(y)\sin\left(\sqrt{2}\,t\right)\,.$$

Based on a spatial mesh with mesh size *h*, we generate a tent pitched mesh such that the maximal slope |∇*ϕ*| is bounded by (2*c*)<sup>−1</sup> and apply a discontinuous

**Fig. 3** Spatial *L*<sup>2</sup> error of all field components over degrees of freedom (dof) for the *(p*+1*)*-stage SAT time-stepping (left) and the classical Runge-Kutta (right)

Galerkin method in space using polynomials of order *p*, with 1 ≤ *p* ≤ 4. On each cylinder we perform a *(p*+1*)*-stage SAT time-stepping with *m* = 2*p* intervals. The spatial *L*<sup>2</sup> error of all field components at the final time is reported in the left plot of Fig. 3. We observe that the error goes to zero at the optimal rate of $O(h^{p+1})$ until we are close to machine precision.
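Such observed rates can be extracted from successive (mesh size, error) pairs; a minimal helper of ours, not tied to the implementation above:

```python
import numpy as np

def observed_orders(h, err):
    """Observed convergence rates between successive refinements:
    rate_k = log(e_k / e_{k-1}) / log(h_k / h_{k-1})."""
    h, err = np.asarray(h, float), np.asarray(err, float)
    return np.log(err[1:] / err[:-1]) / np.log(h[1:] / h[:-1])
```

For errors behaving like $O(h^{p+1})$, the entries approach $p+1$.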

In contrast, the right plot in Fig. 3 illustrates the previously mentioned loss of convergence rates when the classical Runge-Kutta method is used. The convergence rates stagnate at first order no matter what *p* is used. A similar behavior was also observed for other explicit Runge-Kutta methods.

#### *5.2 Large Scale Problem in Three Space Dimensions*

As a second example we present a simulation on a domain similar to the resonator shown in [4]. The geometry is given as body of revolution of smooth B-spline curves. The mesh consisting of 489,593 curved tetrahedral elements is shown in Fig. 4. Due to higher curvature the mesh is refined along the inner roundings, where the ratio of the largest to the smallest element is approximately 5:1. We used a Gaussian peak (located at the axis of revolution and the position of the fifth inner rounding) for the electric field as initial data. The explicit MTP scheme with SAT time-stepping then computed the solution at *t* = 260 using time slabs of height 1, with each slab composed of *N*tents = 149*,*072 tents. On each tent we used a *(p*+1*)*-stage SAT time-stepping with *m* = 2*p* intervals, where *p* denotes the spatial polynomial order. With the spatial degrees of freedom *N*dof*,i* of the *i*th tent and the

**Fig. 4** Tetrahedral mesh with 489 k curved elements, ratio of the largest to the smallest element of approximately 5:1 and the *Hy* component of solution at *t* = 260 calculated with spatial polynomial order *p* = 3

**Table 1** Number of degrees of freedom and simulation times for spatial polynomial orders *p* = 2*,* 3


This data was generated using a shared memory server with 4 E7-8867 CPUs with 16 cores each

number of stages *q* = *p* + 1, we obtain the total spacetime degrees of freedom per time slab

$$\sum_{i=1}^{N_{\text{tents}}} N_{\text{dof},i}\, m\, q = \left( \sum_{i=1}^{N_{\text{tents}}} N_{\text{dof},i} \right) 2\, p(p+1)\,.$$
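For illustration, the count can be evaluated directly; the spatial dof total below is a made-up placeholder, not the figure from Table 1.

```python
# Spacetime dof per time slab: (sum_i N_dof_i) * 2p(p+1), per the formula above.
p = 3                          # spatial polynomial order
m, q = 2 * p, p + 1            # sub-intervals and SAT stages per tent
spatial_dof_sum = 2.0e7        # hypothetical sum of N_dof_i over all tents
spacetime_dof = spatial_dof_sum * m * q   # = spatial_dof_sum * 2p(p+1) = 4.8e8
```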

The corresponding numbers of degrees of freedom and the simulation times are shown in Table 1. In [4] a similar problem is solved using a discontinuous Galerkin method with quadratic elements, combined with a polynomial Krylov subspace method in time. Using 96 cores, it took them 7:10 h to reach the final time. Our simulation with polynomial order *p* = 3, which has a comparable number of unknowns, took 3:33 h on 64 cores. This significant speed-up illustrates the capability of the new method. The *Hy* component of the obtained solution at *t* = 260, using third order polynomials in space, is shown in Fig. 4.

**Acknowledgement** This work was supported in part by the National Science Foundation grant DMS-1912779.

#### **References**

1. Abedi, R., Petracovici, B., Haber, R.B.: A spacetime discontinuous Galerkin method for elastodynamics with element-wise momentum balance. Comput. Methods Appl. Mech. Eng. **195**, 3247–3273 (2006)



## **Viscous Diffusion Effects in the Eigenanalysis of (Hybridisable) DG Methods**

**Rodrigo C. Moura, Pablo Fernandez, Gianmarco Mengaldo, and Spencer J. Sherwin**

### **1 Introduction**

When numerically solving partial differential equations, numerical errors are likely to impact not only solution accuracy, but also the stability/robustness of the computation. This is particularly the case in eddy-resolving approaches to turbulent flows, such as large-eddy simulation (LES) and direct numerical simulation (DNS). Also, in the so-called implicit LES / under-resolved DNS strategies [1], where numerical error (specifically dissipation) provides small-scale regularisation in lieu of a turbulence model, understanding the nature of numerical errors is crucial. These typically appear in the form of dispersion and diffusion errors, where the former distorts the solution, while the latter is responsible for its damping. A useful framework for the assessment of such numerical errors is the eigensolution analysis technique [2, 3].

R. C. Moura (✉)
Instituto Tecnológico de Aeronáutica, São José dos Campos, Brazil
e-mail: moura@ita.br

P. Fernandez
Massachusetts Institute of Technology, Cambridge, MA, USA
e-mail: pablof@mit.edu

G. Mengaldo
California Institute of Technology, Pasadena, CA, USA
e-mail: mengaldo@caltech.edu

S. J. Sherwin
Imperial College, London, UK
e-mail: s.sherwin@imperial.ac.uk

S. J. Sherwin et al. (eds.), *Spectral and High Order Methods for Partial Differential Equations ICOSAHOM 2018*, Lecture Notes in Computational Science and Engineering 134, https://doi.org/10.1007/978-3-030-39647-3\_29

We present the first eigenanalysis of hybridisable discontinuous Galerkin (HDG) methods. This is also one of the first studies to consider viscous diffusion effects in the eigenanalysis of discontinuous SEM (spectral element methods), as it addresses the advection-diffusion equation in one dimension. Focus is given to the temporal analysis approach [2, 5], which is suited for problems with periodic boundary conditions. The spatial analysis [3, 4], suited for inflow-outflow problems, will be considered in subsequent studies. Here, we offer preliminary results on (*i*) the effects of the Péclet number (a cell-based Reynolds number), and (*ii*) the interplay between upwind (numerical) dissipation and viscous (physical) diffusion. We highlight how these results improve our understanding and practice of implicit LES / under-resolved DNS approaches.

We note that, although a non-modal eigenanalysis strategy better suited for turbulence computations has been recently proposed [6], the present work focuses on more fundamental aspects and therefore follows the classical eigenanalysis. Finally, the results presented here are representative of a broader class of discontinuous SEM, given the well-established connections within this class; see e.g. [7].

This paper is organized as follows. Section 2 introduces the HDG discretisation as applied to the linear advection-diffusion equation in one dimension. Section 3 details the temporal eigenanalysis framework and presents our preliminary results. Finally, in Sect. 4, our conclusions are summarised and future research topics are outlined.

#### **2 HDG Discretisation**

In one dimension, the linear advection-diffusion equation is given by

$$\frac{\partial u}{\partial t} + a \frac{\partial u}{\partial x} = \mu \frac{\partial^2 u}{\partial x^2} \,, \tag{1}$$

where the advection velocity *a* and the viscosity *μ* are positive constants. This equation can be written in conservation form through the flux function *f (u, g)* = *au* − *μg*, as the system

$$
\frac{\partial u}{\partial t} + \frac{\partial f}{\partial x} = 0 \,, \tag{2}
$$

$$
g - \frac{\partial u}{\partial x} = 0 \,, \tag{3}
$$

where *g* is the auxiliary gradient variable. The discretisation procedure is similar to that of traditional DG methods.
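Before discretising, it is worth recording the exact dispersion relation of (1): a plane wave $u = \exp[i(\kappa x - \omega t)]$ satisfies (1) precisely when $\omega = a\kappa - i\mu\kappa^2$, so the wave is damped by $\exp(-\mu\kappa^2 t)$. The short sketch below (our own illustration) verifies this by central finite differences.

```python
import numpy as np

# Exact dispersion relation of Eq. (1): omega = a*kappa - i*mu*kappa^2
a, mu, kappa = 1.0, 0.01, 3.0
omega = a * kappa - 1j * mu * kappa**2
u = lambda x, t: np.exp(1j * (kappa * x - omega * t))

# Central finite differences of u at an arbitrary point (x0, t0)
eps, x0, t0 = 1e-5, 0.3, 0.2
ut  = (u(x0, t0 + eps) - u(x0, t0 - eps)) / (2 * eps)
ux  = (u(x0 + eps, t0) - u(x0 - eps, t0)) / (2 * eps)
uxx = (u(x0 + eps, t0) - 2 * u(x0, t0) + u(x0 - eps, t0)) / eps**2
residual = ut + a * ux - mu * uxx   # vanishes up to discretisation error
```

The imaginary part of $\omega$ is the physical damping that the numerical scheme should mimic at well-resolved wavenumbers.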

After the (1D) physical domain is partitioned into non-overlapping elemental regions of size *h*, the numerical solution and its gradient are locally approximated by polynomial expansions in the form

$$u|_{\Omega_e} = \sum_{j=0}^{P} \hat{u}_j(t)\,\phi_j(\xi)\,, \quad g|_{\Omega_e} = \sum_{j=0}^{P} \hat{g}_j(t)\,\phi_j(\xi)\,, \tag{4}$$

where $\phi_j$ are polynomial basis functions of degree up to $P$, defined on the standard domain $\Omega_{st} = [-1, 1]$. A linear mapping is assumed between the physical coordinate $x$ of element $\Omega_e$ and the coordinate $\xi \in \Omega_{st}$.

Multiplying Eqs. (2)–(3) by $\phi_i$, integrating over the element and applying integration by parts leads, respectively, to

$$\frac{h}{2}\int_{\Omega_{st}} \frac{\partial u}{\partial t}\,\phi_i\,d\xi + \left(\widetilde{f}\phi_i\right)_{\ominus}^{\oplus} = \int_{\Omega_{st}} f\,\frac{\partial\phi_i}{\partial\xi}\,d\xi\,, \tag{5}$$

$$\frac{h}{2}\int_{\Omega_{st}} g\,\phi_i\,d\xi + \int_{\Omega_{st}} u\,\frac{\partial\phi_i}{\partial\xi}\,d\xi = \left(\widetilde{u}\,\phi_i\right)_{\ominus}^{\oplus}\,, \tag{6}$$

where $\ominus$ and $\oplus$ denote the left and right boundaries of element $\Omega_e$, in that order. As is typical, the expansions in (4) are inserted into (5)–(6), which are then required to hold for $i = 0,\ldots,P$. Note that the integrals above have been mapped to $\Omega_{st}$ and that interface quantities $\widetilde{u}$ and $\widetilde{f}$ have been introduced. The state average $\widetilde{u}$ is peculiar to HDG in that it represents a uniquely defined interface variable whose value stems indirectly from the enforced continuity of the numerical flux $\widetilde{f}$. This continuity ensures local conservation for HDG methods, regardless of the chosen flux formula.

For the advection-diffusion problem at hand, the interface fluxes on either side of a given element (cf. Fig. 1, left diagram) can be taken in the form

$$\widetilde{f}\_{\oplus} = f(\widetilde{\boldsymbol{u}}\_{\oplus}, \boldsymbol{g}\_{\oplus}) - \tau(\widetilde{\boldsymbol{u}}\_{\oplus} - \boldsymbol{u}\_{\oplus})\,,\tag{7}$$

$$\widetilde{f}_{\ominus} = f(\widetilde{u}_{\ominus}, g_{\ominus}) - \tau(u_{\ominus} - \widetilde{u}_{\ominus})\,, \tag{8}$$

**Fig. 1** Notation adopted for the element viewpoint (left) and the interface viewpoint (right)

in which

$$u_{\oplus} = \sum_{j=0}^{P} \hat{u}_j\,\phi_j(+1)\,, \quad g_{\oplus} = \sum_{j=0}^{P} \hat{g}_j\,\phi_j(+1)\,, \tag{9}$$

$$u_{\ominus} = \sum_{j=0}^{P} \hat{u}_j\,\phi_j(-1)\,, \quad g_{\ominus} = \sum_{j=0}^{P} \hat{g}_j\,\phi_j(-1)\,. \tag{10}$$

Also, $\tau = \beta|a| + \sigma$ is a stabilisation constant combining an upwinding parameter $\beta$ and a penalty term $\sigma$ that accounts for the partially diffusive character of the model equation considered. This work, however, assumes $\sigma = 0$ as it focuses on advection-dominated cases, which are typically stable without the penalty term $\sigma$, even within the context of turbulence simulations [8].

Flux formulas (7)–(8) are inspired by Ref. [9]. In the case of pure advection (with $\sigma = 0$), the interface solution variable becomes the simple average $\widetilde{u} = (u_{\oplus}^{L} + u_{\ominus}^{R})/2$ of the adjacent states from the left (*L*) and right (*R*) elements sharing the considered interface. In this case, it is also easy to show that the fluxes in (7)–(8) recover those used in traditional DG methods, whereby HDG exactly reproduces DG. This does not hold, however, when diffusion is taken into account, in which case $\widetilde{u}$ is only implicitly defined from the flux continuity condition enforced at interfaces, $\widetilde{f}_{\oplus}^{L} = \widetilde{f}_{\ominus}^{R}$, namely

$$a\widetilde{u} - \mu g_{\oplus}^{L} - \tau\left(\widetilde{u} - u_{\oplus}^{L}\right) = a\widetilde{u} - \mu g_{\ominus}^{R} - \tau\left(u_{\ominus}^{R} - \widetilde{u}\right), \tag{11}$$

where $g_{\oplus}^{L}$ and $g_{\ominus}^{R}$ depend on values of $\widetilde{u}$ at two other interfaces via (6). The diagram on the right-hand side of Fig. 1 should help clarify the notation adopted.

Using the vectors $\hat{u} = \{\hat{u}_0,\ldots,\hat{u}_P\}^T$ and $\hat{g} = \{\hat{g}_0,\ldots,\hat{g}_P\}^T$, the flux continuity condition (11) becomes

$$\widetilde{u} = \frac{1}{2}\left(\hat{\phi}_{\oplus}^{T}\hat{u}^{L} + \hat{\phi}_{\ominus}^{T}\hat{u}^{R}\right) + \frac{\mu}{2\tau}\left(\hat{\phi}_{\ominus}^{T}\hat{g}^{R} - \hat{\phi}_{\oplus}^{T}\hat{g}^{L}\right)\,, \tag{12}$$

where $\hat{\phi}_{\oplus} = \{\phi_0(+1),\ldots,\phi_P(+1)\}^T$ and $\hat{\phi}_{\ominus} = \{\phi_0(-1),\ldots,\phi_P(-1)\}^T$. Likewise, (6) can be written as

$$\frac{h}{2}M\hat{\mathbf{g}} + D\hat{u} = \hat{\phi}\_{\oplus}\widetilde{u}\_{\oplus} - \hat{\phi}\_{\ominus}\widetilde{u}\_{\ominus} \, , \tag{13}$$

in which matrices *M* and *D* have been introduced, namely

$$M_{i,j} = \int_{\Omega_{st}} \phi_i\,\phi_j\,d\xi\,, \quad D_{i,j} = \int_{\Omega_{st}} \frac{\partial\phi_i}{\partial\xi}\,\phi_j\,d\xi\,. \tag{14}$$
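For concreteness, the matrices in (14) can be assembled by Gauss-Legendre quadrature; the sketch below (names ours) uses an orthonormalised Legendre basis, for which $M = I$, and also returns the trace vectors from (9)–(10). Integration by parts gives the useful identity $D + D^T = \hat{\phi}_{\oplus}\hat{\phi}_{\oplus}^T - \hat{\phi}_{\ominus}\hat{\phi}_{\ominus}^T$.

```python
import numpy as np
from numpy.polynomial.legendre import Legendre

def local_matrices(P):
    """M and D of Eq. (14) for an orthonormalised Legendre basis on [-1, 1],
    plus the trace vectors phi(+1), phi(-1) used in Eqs. (9)-(10)."""
    n = P + 1
    scale = np.sqrt((2 * np.arange(n) + 1) / 2.0)   # orthonormalisation factors
    x, w = np.polynomial.legendre.leggauss(n + 1)   # exact for degree <= 2n + 1
    V  = np.array([scale[j] * Legendre.basis(j)(x) for j in range(n)])
    dV = np.array([scale[j] * Legendre.basis(j).deriv()(x) for j in range(n)])
    M = V  @ np.diag(w) @ V.T                       # mass matrix (identity here)
    D = dV @ np.diag(w) @ V.T                       # derivative matrix of (14)
    phi_p = scale * np.array([Legendre.basis(j)(1.0)  for j in range(n)])
    phi_m = scale * np.array([Legendre.basis(j)(-1.0) for j in range(n)])
    return M, D, phi_p, phi_m
```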

Finally, (5) becomes

$$\frac{h}{2}M\frac{d\hat{u}}{dt} + \hat{\phi}\_{\oplus}\widetilde{f}\_{\oplus} - \hat{\phi}\_{\ominus}\widetilde{f}\_{\ominus} = aD\hat{u} - \mu D\hat{g} \; , \tag{15}$$

with

$$\widetilde{f}_{\oplus} = a\widetilde{u}_{\oplus} - \mu\hat{\phi}_{\oplus}^{T}\hat{g} - \tau(\widetilde{u}_{\oplus} - \hat{\phi}_{\oplus}^{T}\hat{u})\,, \tag{16}$$

$$\widetilde{f}_{\ominus} = a\widetilde{u}_{\ominus} - \mu\hat{\phi}_{\ominus}^{T}\hat{g} - \tau(\hat{\phi}_{\ominus}^{T}\hat{u} - \widetilde{u}_{\ominus})\,. \tag{17}$$

Note that (12) is a scalar equation written from the point of view of a given interface, whereas (13) and (15) are vector equations written from the viewpoint of an arbitrary element of size *h*.

It is now convenient to eliminate $\hat{g}$ and work with the variables $\hat{u}$ and $\widetilde{u}$ alone. This can be done by solving (13) for $\hat{g}$ and substituting the resulting expression into both (12) and (15). The former substitution leads, after some algebra, to

$$\left(\beta + \frac{m_{\oplus}^{\oplus} + m_{\ominus}^{\ominus}}{\text{Pe}}\right)\widetilde{u} - \frac{m_{\ominus}^{\oplus}}{\text{Pe}}\,\widetilde{u}_{\ominus}^{L} - \frac{m_{\oplus}^{\ominus}}{\text{Pe}}\,\widetilde{u}_{\oplus}^{R} = \hat{\phi}_{\oplus}^{T}B_{\oplus}^{L}\hat{u}^{L} + \hat{\phi}_{\ominus}^{T}B_{\ominus}^{R}\hat{u}^{R}\,, \tag{18}$$

where Pe = |*a*|*h/μ* denotes the Péclet number, for which a uniform mesh spacing is assumed. Moreover, four scalar constants '*m*' have been introduced, defined as

$$m_{\oplus}^{\oplus} = \hat{\phi}_{\oplus}^{T} M^{-1}\hat{\phi}_{\oplus}\,, \quad m_{\ominus}^{\ominus} = \hat{\phi}_{\ominus}^{T} M^{-1}\hat{\phi}_{\ominus}\,, \quad m_{\oplus}^{\ominus} = \hat{\phi}_{\ominus}^{T} M^{-1}\hat{\phi}_{\oplus}\,, \quad m_{\ominus}^{\oplus} = \hat{\phi}_{\oplus}^{T} M^{-1}\hat{\phi}_{\ominus}\,. \tag{19}$$

In addition, the following matrices appear in (18)

$$B\_{\oplus}^{L} = \frac{\beta}{2}I + \frac{M^{-1}D}{\text{Pe}} \,, \quad B\_{\ominus}^{R} = \frac{\beta}{2}I - \frac{M^{-1}D}{\text{Pe}} \,. \tag{20}$$

Note that (18) relates the solution vectors $\hat{u}$ of two adjacent elements (*L* and *R*) to the three interface states $\widetilde{u}$ associated with the boundaries of these elements.

The second step consists in substituting $\hat{g}$ from (13) into (15), taking the fluxes (16)–(17) into account. After some more algebra, one arrives at

$$\frac{h}{2a}M\frac{d\hat{u}}{dt} + A\hat{u} = A\_{\oplus}\hat{\phi}\_{\oplus}\tilde{u}\_{\oplus} + A\_{\ominus}\hat{\phi}\_{\ominus}\tilde{u}\_{\ominus},\tag{21}$$

whose matrices now introduced are given by

$$A = \beta\left(\Phi_{\oplus}^{\oplus} + \Phi_{\ominus}^{\ominus}\right) + \left(2\,\text{Pe}^{-1}N - I\right)D\,, \tag{22}$$

$$A\_{\oplus} = \left(\beta - 1\right)I + 2\operatorname{Pe}^{-1}N \,, \quad A\_{\ominus} = \left(\beta + 1\right)I - 2\operatorname{Pe}^{-1}N \, , \tag{23}$$

where

$$\Phi\_{\oplus}^{\oplus} = \hat{\phi}\_{\oplus} \hat{\phi}\_{\oplus}^{T}, \quad \Phi\_{\ominus}^{\ominus} = \hat{\phi}\_{\ominus} \hat{\phi}\_{\ominus}^{T}, \quad N = \left(\Phi\_{\oplus}^{\oplus} - \Phi\_{\ominus}^{\ominus} - D\right) M^{-1} \,. \tag{24}$$

Note that (21) links the solution vector $\hat{u}$ and its time derivative to the two interface variables $\widetilde{u}$ at the boundaries of the considered element.

In the actual context of simulations, (21) would first be solved (analytically) for $\hat{u}$ once an implicit time-stepping scheme is chosen. This is possible since it entails expressing $d\hat{u}/dt$ in terms of $\hat{u}$ at the current as well as previous time levels. The next step would be to insert the resulting expression for $\hat{u}$ into (18), from which a scalar equation whose only unknowns are $\widetilde{u}$ at various interfaces is obtained. This equation is finally used for the assembly of a global system given suitable boundary conditions, which can be solved via direct or iterative techniques. Since the system's solution yields $\widetilde{u}$ for all interfaces, $\hat{u}$ can be obtained locally for each element from the time-discrete version of (21). The reader is referred to [9] for the details of this procedure. In this work, however, as we are interested in the eigenanalysis of HDG, a different strategy is adopted, as outlined next.

#### **3 Temporal Eigenanalysis**

In the eigenanalysis of spectral element methods [2, 5], it is typical to assume wave-like solutions of the form $\hat{u} \propto \exp[i(\kappa x - \omega t)]$, whereby $\hat{u}^L = \hat{u}\exp(-i\kappa h)$ and $\hat{u}^R = \hat{u}\exp(+i\kappa h)$. Here, $\hat{u}$ is the solution vector of a "central" element, whereas $\hat{u}^L$ and $\hat{u}^R$ refer to the solution vectors of the neighbouring elements from the left (*L*) and from the right (*R*), respectively. For the HDG formulation, an additional assumption can be made regarding wave-like behaviour for $\widetilde{u}$. We assume that $\widetilde{u}_{\ominus}^{L} = \widetilde{u}\exp(-i\kappa' h)$ and $\widetilde{u}_{\oplus}^{R} = \widetilde{u}\exp(+i\kappa' h)$, where now $\widetilde{u}$ is the interface variable shared by two adjacent elements, whereas $\widetilde{u}_{\ominus}^{L}$ and $\widetilde{u}_{\oplus}^{R}$ refer to the interface variables at the nearest interfaces from the left/right (*L*/*R*). This second assumption is only natural given the connection between $\hat{u}$ and $\widetilde{u}$. In fact, we show below that $\kappa' = \kappa$, which is not surprising.

We start from (21) assuming wave-like behaviour for *u*ˆ, obtaining

$$\left(-i\frac{\omega h}{2a}M + A\right)\hat{u} = A_{\oplus}\hat{\phi}_{\oplus}\widetilde{u}_{\oplus} + A_{\ominus}\hat{\phi}_{\ominus}\widetilde{u}_{\ominus}\,, \tag{25}$$

which uniquely defines $\hat{u}$ from $\widetilde{u}_{\oplus}$ and $\widetilde{u}_{\ominus}$. If the above is written for another element, say the adjacent element from the right (a translation $x \mapsto x + h$), one has

$$\left(-i\frac{\omega h}{2a}M + A\right)\hat{u}\exp(i\kappa h) = A_{\oplus}\hat{\phi}_{\oplus}\widetilde{u}_{\oplus}\exp(i\kappa' h) + A_{\ominus}\hat{\phi}_{\ominus}\widetilde{u}_{\ominus}\exp(i\kappa' h)\,, \tag{26}$$

which then implies

$$\left(-i\frac{\omega h}{2a}M + A\right)\hat{u}\,\frac{\exp(i\kappa h)}{\exp(i\kappa' h)} = A_{\oplus}\hat{\phi}_{\oplus}\widetilde{u}_{\oplus} + A_{\ominus}\hat{\phi}_{\ominus}\widetilde{u}_{\ominus} = \left(-i\frac{\omega h}{2a}M + A\right)\hat{u}\,, \tag{27}$$

where (25) has been used on the right-hand side. Comparing the left- and right-most expressions above leads to $\exp(i\kappa' h) = \exp(i\kappa h)$, which means $\kappa' h = \kappa h + 2n\pi$ for integer $n$. This phase ambiguity can be sorted out by evaluating the $x$-derivative of (25) at $x + h$, given by

$$i\kappa\left(-i\frac{\omega h}{2a}M + A\right)\hat{u}\exp(i\kappa h) = i\kappa'\left(A_{\oplus}\hat{\phi}_{\oplus}\widetilde{u}_{\oplus} + A_{\ominus}\hat{\phi}_{\ominus}\widetilde{u}_{\ominus}\right)\exp(i\kappa' h)\,, \tag{28}$$

which yields $\kappa' = \kappa$. This last step about the phase is, however, not strictly necessary for the eigenanalysis: only the complex exponential factors appear throughout the relevant equations, hence knowing that $\exp(i\kappa' h) = \exp(i\kappa h)$ is sufficient.

In the remainder of the study, orthonormal Legendre basis functions are assumed, whereby *M* = *I* . We note that numerical dispersion and diffusion eigencurves, which are the focus of the study, do not change depending on the basis functions adopted, provided that exact integrations are used in the spatial discretisation.

In the temporal analysis, an eigenvalue problem is set where, given a real-valued wavenumber $\kappa$, the $(P+1)$ eigenvalues of the relevant eigenmatrix are associated with admissible complex-valued numerical frequencies $\omega = \omega(\kappa)$. The procedure to obtain this eigenvalue problem is described below.

We begin from (18), assuming $\widetilde{u}_{\ominus}^{L} = \widetilde{u}\exp(-i\kappa h)$ and $\widetilde{u}_{\oplus}^{R} = \widetilde{u}\exp(+i\kappa h)$, to find

$$
\widetilde{u} = \left( \hat{\phi}_{\oplus}^{T} B_{\oplus}^{L}\hat{u}^{L} + \hat{\phi}_{\ominus}^{T} B_{\ominus}^{R}\hat{u}^{R} \right) b^{-1}\,, \tag{29}
$$

with scalar *b* = *b(κh*; Pe*,β)* defined as

$$b = \beta + \left[ m_{\oplus}^{\oplus} + m_{\ominus}^{\ominus} - m_{\oplus}^{\ominus}\exp(i\kappa h) - m_{\ominus}^{\oplus}\exp(-i\kappa h)\right]\text{Pe}^{-1}\,. \tag{30}$$

Then, (29) is substituted into (21), relating the solution vector $\hat{u}$ at a given element to the state vectors of its left ($\hat{u}^L$) and right ($\hat{u}^R$) neighbours. From the wave-like behaviour of $\hat{u}$ and the relations $\hat{u}^L = \hat{u}\exp(-i\kappa h)$ and $\hat{u}^R = \hat{u}\exp(+i\kappa h)$, one arrives at

$$-i\varpi h\,\hat{u} = Z\,\hat{u}\,, \tag{31}$$

where $\varpi = \omega/a$ and the matrix $Z = Z(\kappa h;\, \text{Pe}, \beta)$ is given by

$$\begin{split} Z = 2b^{-1}\big[ A_{\oplus}\Phi_{\ominus}^{\oplus}B_{\ominus}^{R}\exp(i\kappa h) + A_{\ominus}\Phi_{\oplus}^{\ominus}B_{\oplus}^{L}\exp(-i\kappa h) + \\ \qquad + A_{\oplus}\Phi_{\oplus}^{\oplus}B_{\oplus}^{L} + A_{\ominus}\Phi_{\ominus}^{\ominus}B_{\ominus}^{R} - A\,b \big], \end{split} \tag{32}$$

in which $\Phi_{\oplus}^{\oplus}$ and $\Phi_{\ominus}^{\ominus}$ are given by (24), whereas

$$
\Phi_{\ominus}^{\oplus} = \hat{\phi}_{\oplus}\hat{\phi}_{\ominus}^{T}\,, \quad \Phi_{\oplus}^{\ominus} = \hat{\phi}_{\ominus}\hat{\phi}_{\oplus}^{T}\,. \tag{33}
$$

In (31), we have the desired eigenvalue problem of size $P+1$, which thus supports this same number of eigenvalues $\lambda_j$. These are related to the (normalised) numerical frequencies $\varpi_j$ via

$$
\varpi\_j h = i\lambda\_j \{ Z(\kappa h) \} \,. \tag{34}
$$

Typically, one of the eigenvalues represents the so-called primary eigenmode, while the remaining ones can be regarded as secondary as they simply replicate the behaviour of the primary mode on shifted wavenumber ranges. This formally allows us to focus on the analysis of the primary eigenmode and on its dispersion and diffusion eigencurves. The reader is referred to [2, 5] for the concepts relevant to the separation of primary and secondary modes adopted in this work.
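Equations (19)–(34) condense into a short routine; the sketch below is our own condensation, assuming the orthonormal Legendre basis ($M = I$) and writing `invPe` for $\text{Pe}^{-1}$ so that `invPe = 0` recovers pure advection. For $P = 0$, $\beta = 1$ and pure advection it reduces to the classical first-order upwind result $\varpi h = i(e^{-i\kappa h} - 1)$.

```python
import numpy as np
from numpy.polynomial.legendre import Legendre

def hdg_temporal_eigs(P, kh, beta=1.0, invPe=0.0):
    """Normalised frequencies varpi_j * h of the 1D HDG scheme, Eqs. (19)-(34),
    for an orthonormal Legendre basis (M = I); invPe = 1/Pe (0 = pure advection)."""
    n = P + 1
    scale = np.sqrt((2 * np.arange(n) + 1) / 2.0)
    # Trace vectors phi_hat(+1), phi_hat(-1)
    pp = scale * np.array([Legendre.basis(j)(1.0) for j in range(n)])
    pm = scale * np.array([Legendre.basis(j)(-1.0) for j in range(n)])
    # D_ij = int phi_i' phi_j dxi by exact Gauss-Legendre quadrature
    x, w = np.polynomial.legendre.leggauss(n + 1)
    V  = np.array([scale[j] * Legendre.basis(j)(x) for j in range(n)])
    dV = np.array([scale[j] * Legendre.basis(j).deriv()(x) for j in range(n)])
    D = dV @ np.diag(w) @ V.T
    I = np.eye(n)
    # Eqs. (24) and (33): outer products and N
    Ppp, Pmm = np.outer(pp, pp), np.outer(pm, pm)
    Ppm, Pmp = np.outer(pm, pp), np.outer(pp, pm)   # Phi_+^- and Phi_-^+
    N = Ppp - Pmm - D
    # Eqs. (20), (22), (23)
    Bp = beta / 2 * I + invPe * D
    Bm = beta / 2 * I - invPe * D
    A  = beta * (Ppp + Pmm) + (2 * invPe * N - I) @ D
    Ap = (beta - 1) * I + 2 * invPe * N
    Am = (beta + 1) * I - 2 * invPe * N
    # Eqs. (19), (30): scalar b (with M = I, the four m's are dot products)
    b = beta + (pp @ pp + pm @ pm
                - (pm @ pp) * np.exp(1j * kh)
                - (pp @ pm) * np.exp(-1j * kh)) * invPe
    # Eq. (32)
    Z = 2.0 / b * (Ap @ Pmp @ Bm * np.exp(1j * kh)
                   + Am @ Ppm @ Bp * np.exp(-1j * kh)
                   + Ap @ Ppp @ Bp + Am @ Pmm @ Bm - A * b)
    return 1j * np.linalg.eigvals(Z)    # Eq. (34): varpi_j h = i lambda_j
```

The primary mode at a given $\kappa h$ can then be picked as the eigenvalue closest to $\kappa h$, its imaginary part giving the numerical diffusion discussed next.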

Once the primary mode is identified, the scheme's numerical diffusion behaviour can be assessed in wavenumber space through the imaginary part of $\varpi_* h$, where the asterisk subscript denotes the primary mode from (34). Note that numerical diffusion is especially relevant to turbulence computations as it impacts not only accuracy, but also stability. Note also that the eigencurves are entirely defined by the polynomial order $P$, the upwinding parameter $\beta$ and, in case viscosity is present, the normalised Péclet number $\text{Pe}^{\star} = |a|\,\bar{h}/\mu$, with $\bar{h} = h/(P+1)$. Standard upwinding is assumed here.

Figure 2 depicts a comparison between HDG's primary dissipation curves for pure advection and for advection-diffusion at $\text{Pe}^{\star} = 100$ for $P = 1, 4$ and $7$. As explained further below, this is about the lowest value of $\text{Pe}^{\star}$ one achieves (domain-wise) in a turbulent flow computation. However, at this $\text{Pe}^{\star}$, viscous effects are still somewhat weak in regular (linear-scale) plots of $\varpi_i\bar{h}$ vs. $\kappa\bar{h}$, where $\varpi_i$ is the absolute value of $\varpi$'s imaginary part. This is especially true for $P \le 4$. Hence, Fig. 2 also shows these plots in log-log scale, highlighting what happens at well-resolved wavenumbers.

The log-log plots in Fig. 2 are revealing. They make clear that HDG's numerical diffusion follows the correct diffusive behaviour up to a certain wavenumber, hereinafter named *κc*, beyond which upwind dissipation overcomes viscous diffusion. The exact diffusive behaviour, as derived from our model problem, is given by

$$
\varpi_i\bar{h} = (\kappa\bar{h})^2/\text{Pe}^{\star} \quad\text{or}\quad \log_{10}(\varpi_i\bar{h}) = 2\log_{10}(\kappa\bar{h}) - \log_{10}(\text{Pe}^{\star})\,, \tag{35}
$$

showing that, as $\text{Pe}^{\star}$ increases, the reference line of exact diffusive behaviour shifts downwards, reducing the value of $\kappa_c\bar{h}$. Also, for a given number of DOFs, i.e. fixed $\bar{h}$, increasing the discretisation order increases $\kappa_c$. This type of analysis reveals how upwind dissipation and viscous diffusion complement each other, allowing also for the estimation of the wavenumber $\kappa_c$ beyond which upwinding dominates. The

**Fig. 2** Normalised numerical diffusion in linear-scale (left) and log-log plots (right) for *P* = 1, *P* = 4 and *P* = 7 (top to bottom), with/without viscosity (dashed/full curve), the former considering $\text{Pe}^{\star} = 100$. The exact diffusive behaviour is shown as a dotted parabola/line (left/right plots)

latter, though important for small-scale regularisation and stability, is not entirely physical in the sense of subgrid-scale modelling. Hence, $\kappa_c$ values could be used as quality criteria for implicit LES / under-resolved DNS approaches based on discontinuous SEM. For transitional flows, where small numerical dissipation is particularly important, this kind of analysis might prove very useful. Although specific estimates would be needed for different schemes, the analysis strategy should be similar.

Finally, it is now explained why $\text{Pe}^{\star} = 100$ is about the lowest Péclet value one may find in a turbulent flow simulation. As candidates for very small $\text{Pe}^{\star}$, one could think of the near-wall region of turbulent boundary layers, given the low velocity and small mesh spacing in typical wall-resolved LES. For the viscous sublayer, where $u^+ < 5$, the streamwise Péclet number can be evaluated using wall quantities:

$$\text{Pe}^{\star} = \frac{u(y)\,\bar{h}}{\nu} = \frac{u^{+}\bar{h}^{+}}{\nu}\,u_{\tau}\,\delta_{\nu} = u^{+}\,\bar{h}^{+}\,, \tag{36}$$

where by definition $\nu = u_\tau\,\delta_\nu$, with $u_\tau$ the friction velocity and $\delta_\nu$ the associated viscous lengthscale. Our argument is then concluded since $50 < \Delta x^+ = \bar{h}^+ < 150$ in typical wall-resolved LES or under-resolved DNS approaches, cf. e.g. [10].
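In code, (36) is a one-line product; the near-wall values below are illustrative assumptions consistent with the ranges just quoted, not measured data.

```python
# Pe* = u+ * hbar+ (Eq. (36)); illustrative viscous-sublayer values:
u_plus = 1.0        # u+ < 5 in the viscous sublayer; O(1) very near the wall
hbar_plus = 100.0   # 50 < hbar+ < 150 for typical wall-resolved LES
pe_star = u_plus * hbar_plus   # ~ 100, the lower bound argued in the text
```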

#### **4 Concluding Remarks**

We presented a preliminary study of the numerical dispersion and diffusion characteristics of HDG methods for linear advection-diffusion problems using the temporal eigenanalysis technique. To the authors' knowledge, this is the first eigenanalysis of HDG methods, and also one of the first such analyses of a discontinuous SEM to consider viscous diffusion effects, cf. also [11].

It was shown that, for the range of Péclet numbers encountered in under-resolved turbulence simulations, upwind (numerical) dissipation dominates viscous (physical) diffusion at the smallest resolved scales. Only at the large scales does the effect of viscous diffusion become significant. The wavenumber beyond which upwind dissipation overcomes viscous diffusion, and its dependence on the polynomial order, can be estimated through eigenanalysis; this can be used as a quality criterion for LES and DNS in general, and for implicit LES / under-resolved DNS in particular.

Future work includes further analysing the interplay between viscous and upwind diffusion, investigating other numerical fluxes (e.g. over-upwinding $\beta > 1$, nearly central fluxes $\beta \approx 0$, non-zero viscous stabilisation $\sigma \neq 0$), and testing eigenanalysis predictions against actual turbulence simulations. Finally, the dispersion-diffusion characteristics of HDG methods for spatially developing simulations could be investigated using spatial eigenanalysis techniques.

**Acknowledgements** Pablo Fernandez would like to acknowledge financial support from the MIT Zakhartchenko Fellowship. Spencer J. Sherwin acknowledges support from EPSRC Platform grant EP/R029423/1.

#### **References**



## **Spectral Galerkin Method for Solving Helmholtz and Laplace Dirichlet Problems on Multiple Open Arcs**

**Carlos Jerez-Hanckes and José Pinto**

#### **1 Introduction**

We seek solutions of the Helmholtz and Laplace equations in the two-dimensional plane after removing a finite collection of open finite curves, also called arcs. This setting arises in areas such as structural and mechanical engineering [2] or biomedical imaging [11], to name a few. Such problems pose the following challenges: (1) *unbounded domains*, which call for boundary integral methods with carefully chosen radiation conditions; (2) *singular behaviours* of solutions near arc endpoints; and (3) a *large number of degrees of freedom* when the wavenumber or the number of arcs increases.

Our approach is to recast the problem as a system of boundary integral equations defined on the arcs, so as to obtain an integral representation of the volume solution. Well-posedness for a single arc was proven in [9], with an extension to the multiple arcs case given in [5]. We will consider numerical approximations of the resulting surface densities based on Galerkin-Bubnov discretizations of the corresponding system of boundary integral equations.

In the present note, we start by briefly introducing a spectral scheme to account for general arcs as well as for a wide wavenumber range. We show that significant reduction in both memory consumption and computational work can be achieved by an *ad hoc* matrix compression algorithm. Moreover, we establish detailed interdependencies between compression parameters and accuracy. Numerical experiments validate our claims and point out further improvements.

C. Jerez-Hanckes
Faculty of Engineering and Sciences, Universidad Adolfo Ibañez, Santiago, Chile
e-mail: carlos.jerez@uai.cl

J. Pinto
School of Engineering, Pontificia Universidad Católica de Chile, Santiago, Chile
e-mail: jspinto@uc.cl

S. J. Sherwin et al. (eds.), *Spectral and High Order Methods for Partial Differential Equations ICOSAHOM 2018*, Lecture Notes in Computational Science and Engineering 134, https://doi.org/10.1007/978-3-030-39647-3\_30

#### **2 Continuous Model Problem**

Let the canonical domain $(-1,1) \times \{0\}$ be denoted by $\widehat{\Gamma}$. We say that $g : \widehat{\Gamma} \to \mathbb{C}$ is $\rho$-analytic if the function $t \mapsto g(t,0)$ can be extended to an analytic function on the Bernstein ellipse of parameter $\rho > 1$ (*cf.* [10, Chapter 8]). We say that $\Gamma \subset \mathbb{R}^2$ is a regular Jordan arc of class $\mathcal{C}^m$, for $m \in \mathbb{N}$, if it is the image of a bijective parametrization $\mathbf{r} = (r_1, r_2)$, $\mathbf{r} : \widehat{\Gamma} \to \Gamma$, whose components are $\mathcal{C}^m(\widehat{\Gamma})$-functions and such that $\|\mathbf{r}'(t)\|_2 > 0$ for all $t \in \widehat{\Gamma}$, where $\|\cdot\|_2$ is the Euclidean norm. Similarly, we define $\rho$-analytic arcs as those whose components are $\rho$-analytic. Throughout, we will assume that any regular Jordan arc $\Gamma$ admits an extension $\widetilde{\Gamma}$ which is closed and keeps the same regularity.
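As a quick numerical sanity check of the regularity condition $\|\mathbf{r}'(t)\|_2 > 0$, the following sketch samples the derivative of a sine-shaped arc of the family used in the experiments of Fig. 1; the specific parameter values are illustrative:

```python
import numpy as np

# Regularity check for an arc r(t) = (a t, c sin(b t) + d) on (-1, 1):
# the arc is regular if the speed ||r'(t)||_2 stays strictly positive.
# Parameter values below are illustrative picks from the ranges of Fig. 1.
a, b, c, d = 0.2, 0.1, 1.5, 10.0

t = np.linspace(-1.0, 1.0, 2001)
rp = np.stack([a * np.ones_like(t), c * b * np.cos(b * t)], axis=1)  # r'(t)
speed = np.linalg.norm(rp, axis=1)

print(speed.min())  # strictly positive, so the parametrization is regular
```

Since the first component of $\mathbf{r}'$ is the constant $a > 0$, the speed is bounded below by $a$ and the arc is regular for any choice in the quoted parameter ranges.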

Consider a finite number $M \in \mathbb{N}$ of arcs of class at least $\mathcal{C}^1$, written $\{\Gamma_i\}_{i=1}^M$, such that their closures are mutually disjoint. Moreover, we assume that there are disjoint domains $\Omega_i$ whose boundaries are given by the extensions, $\partial\Omega_i = \widetilde{\Gamma}_i$, for $i = 1,\dots,M$. Let us define

$$
\Gamma := \bigcup\_{i=1}^M \Gamma\_i \qquad \text{and} \qquad \Omega := \mathbb{R}^2 \backslash \overline{\Gamma}.
$$

We say that $\Gamma$ is of class $\mathcal{C}^m$, $m \in \mathbb{N}$, if each arc $\Gamma_i$ is of class $\mathcal{C}^m$, and analogously for the $\rho$-analytic case. For $i \in \{1,\dots,M\}$, let $\mathbf{r}_i : \widehat{\Gamma} \to \Gamma_i$ and $g_i : \Gamma_i \to \mathbb{C}$. We say that $\mathbf{g} = (g_1,\dots,g_M)$ is of class $\mathcal{C}^m(\Gamma)$ if $g_i \circ \mathbf{r}_i \in \mathcal{C}^m(\widehat{\Gamma})$, for $i \in \{1,\dots,M\}$. A similar definition holds for the analytic case.

Let $G \subseteq \mathbb{R}^d$, $d = 1, 2$, be an open domain. For $s \in \mathbb{R}$, we denote by $H^s(G)$ the standard Sobolev spaces, by $H^s_{loc}(G)$ their locally integrable counterparts [8, Section 2.3], and by $\widetilde{H}^{-s}(G)$ the corresponding dual spaces. The corresponding duality product (when the dual space of $L^2(G)$ is identified with itself) is denoted $\langle \cdot,\cdot \rangle_G$. Finally, $\widetilde{H}^s_0(G)$ refers to mean-zero spaces [5, Section 2.3]. We will also make use of the following Hilbert space in $\mathbb{R}^2$:

$$W(G) := \left\{ U \in \mathcal{D}^\*(G) : \frac{U(\mathbf{x})}{\sqrt{1 + \|\mathbf{x}\|\_2^2 \log(2 + \|\mathbf{x}\|\_2^2)}} \in L^2(G), \nabla U \in L^2(G) \right\},$$

where $\mathcal{D}^*(G)$ is the dual space of $\mathcal{C}^\infty(G) = \bigcap_{n>1} \mathcal{C}^n(G)$. For $s \in \mathbb{R}$ and for the finite union of disjoint open arcs $\Gamma$, we define Cartesian product spaces as

$$\mathbb{H}^s(\Gamma) := H^s(\Gamma\_1) \times H^s(\Gamma\_2) \times \cdots \times H^s(\Gamma\_M).$$

Spaces $\widetilde{\mathbb{H}}^s(\Gamma)$ and $\widetilde{\mathbb{H}}^s_0(\Gamma)$ are defined similarly. Also, $\mathbb{H}^s(\widehat{\Gamma})$ is to be understood as the Cartesian product $\prod_{i=1}^M H^s(\widehat{\Gamma})$. Finally, given an open bounded neighborhood $G_i$ such that $\Gamma_i \subset \partial G_i$, Dirichlet traces are defined as extensions to $H^s(G_i)$, for $s \geq 1/2$, of the following operator (applied to smooth functions):

$$\gamma\_i^{\pm}\mu(\mathbf{y}) := \lim\_{\epsilon \downarrow 0} \mu(\mathbf{y} \pm \epsilon \mathbf{n}\_i(\mathbf{y})),$$

where $\mathbf{n}_i(\mathbf{y})$ is the unit vector with direction $(r'_{i,2}(t), -r'_{i,1}(t))$ and $t$ is such that $\mathbf{r}_i(t) = \mathbf{y}$. For a function $u$ defined in an open neighborhood of $\Gamma_i$ such that $\gamma_i^+ u = \gamma_i^- u$, we denote $\gamma_i u := \gamma_i^{\pm} u$.

**Problem 1 (Volume Problem)** Let $\mathbf{g} \in \mathbb{H}^{\frac{1}{2}}(\Gamma)$ and $\kappa \geq 0$. We seek $U \in H^1_{loc}(\Omega)$ such that

$$-\Delta U - \kappa^2 U = 0 \qquad\qquad\qquad\qquad\text{in }\Omega,\tag{1}$$

$$
\gamma\_i^{\pm} U = g\_i \qquad \qquad \text{for } i = 1, \ldots, M,\tag{2}
$$

$$\text{Condition at infinity}(\kappa). \tag{3}$$

The behavior at infinity (3) depends on $\kappa$ in the following way: if $\kappa > 0$, we employ the classical Sommerfeld radiation condition [8, Section 3.9]. If $\kappa = 0$, we seek solutions $U \in W(\Omega)$. This last condition was discussed in detail in [5, Remarks 3.9, 4.2 and 4.5], with uniqueness proofs for $\kappa \geq 0$ provided in [5, Propositions 3.8 and 3.10].

For $\kappa \geq 0$, the solution $U$ of Problem 1 can be expressed as

$$U(\mathbf{x}) = \sum\_{l=1}^{M} (\mathbf{S} \mathbf{L}\_l[\kappa] \lambda\_l)(\mathbf{x}), \quad \forall \mathbf{x} \in \Omega,\tag{4}$$

where

$$(\mathbf{SL}\_i[\kappa]\lambda\_i)(\mathbf{x}) := \int\_{\Gamma\_i} G\_\kappa(\mathbf{x}, \mathbf{y})\,\lambda\_i(\mathbf{y})\,d\Gamma\_i(\mathbf{y}), \quad \forall \, \mathbf{x} \in \Omega,$$

denotes the single layer potential generated on the curve $\Gamma_i$, with $G_\kappa$ the corresponding fundamental solution, defined as in [8, Section 3.1]. It is direct from (4) that $U$ solves (1)–(2) in $\Omega$ (see [8, Theorem 3.1.1]). Also, it displays the desired behavior at infinity as long as each $\lambda_i$ lies in the right functional space [5, Section 4]. In order to find the surface densities $\lambda_i$, we take Dirichlet traces $\gamma_i^{\pm}$ of the $\mathbf{SL}_j$ and impose the boundary conditions (2). This naturally defines a system of weakly singular boundary integral operators:

$$
\mathcal{L}\_{ij}[\kappa] := \frac{1}{2} \left( \gamma\_i^+ \mathbf{SL}\_j[\kappa] + \gamma\_i^- \mathbf{SL}\_j[\kappa] \right) = \gamma\_i \, \mathbf{SL}\_j[\kappa],
$$

and an equivalent boundary integral equation problem to Problem 1.
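As a minimal numerical sketch of the representation (4), the snippet below evaluates a Laplace ($\kappa = 0$) single layer potential at a point away from the arc. The flat arc $\mathbf{r}(t) = (t, 0)$ and the density are illustrative assumptions; the kernel is the standard 2D Laplace fundamental solution $G_0(\mathbf{x},\mathbf{y}) = -\frac{1}{2\pi}\log\|\mathbf{x}-\mathbf{y}\|_2$:

```python
import numpy as np

# Evaluate (SL[0] lambda)(x) for the Laplace kernel on the flat arc
# r(t) = (t, 0), so ||r'(t)|| = 1 and d Gamma = dt. Away from the arc the
# integrand is smooth, hence plain Gauss-Legendre quadrature converges fast.
def single_layer_laplace(density, x, n=64):
    t, w = np.polynomial.legendre.leggauss(n)    # nodes/weights on (-1, 1)
    y = np.stack([t, np.zeros_like(t)], axis=1)  # quadrature points on the arc
    r = np.linalg.norm(x[None, :] - y, axis=1)   # distances ||x - r(t)||
    kernel = -np.log(r) / (2.0 * np.pi)          # 2D Laplace fundamental solution
    return np.sum(w * kernel * density(t))

# Odd (hence mean-zero) polynomial density, observation point off the arc:
val = single_layer_laplace(lambda t: t, np.array([0.5, 3.0]))
print(val)
```

Doubling `n` leaves the value unchanged to near machine precision, consistent with the smoothness of the kernel away from the arc.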

**Problem 2 (Boundary Integral Problem)** Let $\mathbf{g} \in \mathbb{H}^{\frac{1}{2}}(\Gamma)$. For $\kappa > 0$, we seek $\lambda = (\lambda_1,\dots,\lambda_M) \in \widetilde{\mathbb{H}}^{-\frac{1}{2}}(\Gamma)$ such that

$$
\mathcal{L}[\kappa] \lambda = \mathbf{g},
$$

where $\mathcal{L}[\kappa] : \widetilde{\mathbb{H}}^{-\frac{1}{2}}(\Gamma) \to \mathbb{H}^{\frac{1}{2}}(\Gamma)$ is a matrix operator with entries $\mathcal{L}[\kappa]_{ij} = \mathcal{L}_{ij}[\kappa]$, for $i, j \in \{1,\dots,M\}$. If $\kappa = 0$, we seek $\lambda \in \widetilde{\mathbb{H}}^{-\frac{1}{2}}_0(\Gamma)$, given $\mathbf{g}$ in the dual space of the aforementioned space.

**Theorem 1 (Theorem 4.13 in [5])** *For $\kappa > 0$, Problem 2 has a unique solution $\lambda \in \widetilde{\mathbb{H}}^{-\frac{1}{2}}(\Gamma)$, whereas for $\kappa = 0$ a unique solution exists in the subspace $\widetilde{\mathbb{H}}^{-\frac{1}{2}}_0(\Gamma)$. Also, the following continuity estimate holds*

$$\|\lambda\|\_{\widetilde{\mathbb{H}}^{-\frac{1}{2}}(\Gamma)} \leq C(\Gamma, \kappa) \|\mathbf{g}\|\_{\mathbb{H}^{\frac{1}{2}}(\Gamma)}.$$

#### **3 Spectral Discretization**

We present a family of finite-dimensional subspaces of $\widetilde{\mathbb{H}}^{-\frac{1}{2}}(\Gamma)$ that can be used to approximate the solution of Problem 2 (*cf.* [4, 6]). Let $\mathbb{T}_N(\widehat{\Gamma})$ denote the space spanned by the first-kind Chebyshev polynomials $\{T_n\}_{n=0}^N$ of degree at most $N$ on $\widehat{\Gamma}$, which are orthogonal with respect to the $L^2(-1,1)$ inner product under the weight $w^{-1}$, with $w(t) := \sqrt{1-t^2}$. Now, let us construct elements $p^i_n = T_n \circ \mathbf{r}_i^{-1}$ over each arc $\Gamma_i$, spanning the space $\mathbb{T}_N(\Gamma_i)$. For practical reasons, we define the normalized space:

$$\overline{\mathbb{T}}\_N(\Gamma\_i) := \left\{ \bar{p}\_n^i \in \mathcal{C}(\Gamma\_i) \, : \, \bar{p}\_n^i := \frac{p\_n^i}{\left\| \mathbf{r}\_i' \circ \mathbf{r}\_i^{-1} \right\|\_2}, \quad p\_n^i \in \mathbb{T}\_N(\Gamma\_i) \right\}.$$

We account for edge singularities by multiplying the basis $\{\bar{p}^i_n\}_{n=0}^N$ by a suitable weight:

$$\mathbb{Q}\_N(\Gamma\_i) := \left\{ q\_n^i := w\_i^{-1} \, \bar{p}\_n^i \, : \, \bar{p}\_n^i \in \overline{\mathbb{T}}\_N(\Gamma\_i) \right\},$$

wherein $w_i := w \circ \mathbf{r}_i^{-1}$. The corresponding basis for $\mathbb{Q}_N(\Gamma_i)$ will be denoted $\{q^i_n\}_{n=0}^N$. By Chebyshev orthogonality, we can easily define the mean-zero subspace $\mathbb{Q}_{N,0}(\Gamma_i) := \mathbb{Q}_N(\Gamma_i) \setminus \mathbb{Q}_0(\Gamma_i)$, spanned by $\{q^i_n\}_{n=1}^N$. With these definitions, we set the discretization space for a Galerkin-Bubnov solution of Problem 2 as

$$\mathbb{H}\_N[\kappa] := \begin{cases} \prod\_{i=1}^M \mathbb{Q}\_{N,0}(\Gamma\_i) & \text{for } \kappa = 0, \\ \prod\_{i=1}^M \mathbb{Q}\_N(\Gamma\_i) & \text{for } \kappa > 0. \end{cases}$$
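On the canonical arc (identity parametrization, so the normalization factor equals one), the weighted basis $q_n = w^{-1} T_n$ is orthogonal against the Chebyshev polynomials in $L^2(-1,1)$; this weighted orthogonality underlies the diagonal Galerkin matrix in the canonical Laplace case mentioned in Sect. 4. A small numerical check, using Gauss-Chebyshev quadrature to absorb the $w^{-1}$ weight exactly:

```python
import numpy as np

# Verify that integral of q_n(t) T_l(t) dt = integral of T_n T_l / w dt
# is diagonal: pi for n = l = 0 and pi/2 for n = l >= 1.
nq = 64
k = np.arange(nq)
x = np.cos(np.pi * (k + 0.5) / nq)   # Gauss-Chebyshev nodes
wq = np.pi / nq                      # common quadrature weight (absorbs 1/w)

def T(n, t):                         # first-kind Chebyshev polynomial T_n
    return np.cos(n * np.arccos(t))

G = np.array([[wq * np.sum(T(n, x) * T(l, x)) for l in range(6)]
              for n in range(6)])    # Gram matrix of q_n against T_l
print(np.round(G, 12))
```

The discrete Gauss-Chebyshev orthogonality makes these entries exact (up to rounding) whenever $n + l$ stays below twice the number of nodes.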

**Problem 3 (Linear System)** For $\kappa > 0$, let $N \in \mathbb{N}$ and $\mathbf{g} \in \mathbb{H}^{\frac{1}{2}}(\Gamma)$ be the same as in Problem 2. Then, we seek coefficients $\mathfrak{u} = (\mathfrak{u}_1,\dots,\mathfrak{u}_M) \in \mathbb{C}^{M(N+1)}$ such that

$$
\mathfrak{L}[\kappa] \mathfrak{u} = \mathfrak{g}.
$$

Therein, we have defined the Galerkin matrix $\mathbf{L}[\kappa] \in \mathbb{C}^{M(N+1) \times M(N+1)}$ composed of matrix blocks $\mathsf{L}_{ij}[\kappa] \in \mathbb{C}^{(N+1) \times (N+1)}$ whose entries are

$$(\mathsf{L}\_{ij}[\kappa])\_{lm} = \left\langle \mathcal{L}\_{ij}[\kappa] \, q\_m^j, q\_l^i \right\rangle\_{\Gamma\_i} = \left\langle \widehat{\mathcal{L}}\_{ij}[\kappa] \, w^{-1} T\_m, w^{-1} T\_l \right\rangle\_{\widehat{\Gamma}}.$$

There, $\widehat{\mathcal{L}}_{ij}[\kappa]$ is the weakly singular operator whose kernel is parametrized by $\mathbf{r}_i, \mathbf{r}_j$, and the right-hand side is $\mathfrak{g} = (\mathfrak{g}_1,\dots,\mathfrak{g}_M) \in \mathbb{C}^{M(N+1)}$ with components

$$(\mathfrak{g}\_i)\_l = \left\langle g\_i, \, q\_l^i \right\rangle\_{\Gamma\_i} = \left\langle \widehat{g}\_i, \, w^{-1} \, T\_l \right\rangle\_{\widehat{\Gamma}},$$

where $\widehat{g}_i = g_i \circ \mathbf{r}_i$. The approximation $\lambda_N \in \mathbb{H}_N[\kappa]$ is constructed as

$$(\lambda\_N)\_i = \sum\_{m=0}^N (\mathfrak{u}\_i)\_m \, q\_m^{i} \quad \text{in } \Gamma\_{i}, \quad \text{for all } i \in \{1, \dots, M\}.$$

For $\kappa = 0$ we need $\mathbf{g}$ as in Problem 2; in this case $\mathfrak{u} \in \mathbb{C}^{MN}$ and $\mathbf{L}[0] \in \mathbb{C}^{MN \times MN}$, since the approximation space is $\mathbb{H}_N[0]$. By conformity and density of these spaces in $\widetilde{\mathbb{H}}^{-\frac{1}{2}}(\Gamma)$, one derives the following result:

**Theorem 2 (Theorem 4.23 in [4])** *Let $\kappa \geq 0$, $m \in \mathbb{N}$ with $m > 2$, $\Gamma \in \mathcal{C}^m$, $\mathbf{g} \in \mathcal{C}^m(\Gamma)$, and let $\lambda$ be the unique solution of Problem 2. Then, there exists $N_0 \in \mathbb{N}$ such that for every $N > N_0$ there is a unique solution $\lambda_N \in \mathbb{H}_N[\kappa]$ of Problem 3. Moreover, the following error convergence rates hold*

$$\|\lambda - \lambda\_N\|\_{\widetilde{\mathbb{H}}^{-\frac{1}{2}}(\Gamma)} \le C(\Gamma, \kappa) N^{-m+1}.$$

*Moreover, if $\Gamma$ and* **g** *are $\rho$-analytic with $\rho > 1$, we have the following super-algebraic convergence rates*

$$\left\|\lambda - \lambda\_N\right\|\_{\widetilde{\mathbb{H}}^{-\frac{1}{2}}(\Gamma)} \le C(\Gamma, \kappa) \rho^{-N+2} \sqrt{N},$$

*where C(, κ) is a positive constant, which does not depend on N.*

*Remark 1* Observe that the constants $C(\Gamma, \kappa)$ and $N_0$ depend on the geometry and frequency. To the best of our knowledge, previous convergence results for 2D arcs are somewhat limited. For intervals, the result was established in [6], whereas for more general arcs results are available only for the Laplace case [1]. Super-algebraic convergence rates can be achieved by the method detailed in [3], though their scheme is limited to intervals and to the case of elliptic problems ($N_0 = 0$). More complex cases are still an open problem.

#### **4 Numerical Implementation and Compression Algorithm**

Before fleshing out our proposed compression technique, we explain how $\mathbf{L}[\kappa]$ and $\mathfrak{g}$ of Problem 3 are computed. For the right-hand side, one must compute integrals of the form:

$$\int\_{-1}^{1} \widehat{g}\_i(t) \, w^{-1}(t) \, T\_l(t) \, dt, \quad \forall \, l \in \mathbb{N}\_0,$$

which correspond to the Fourier-Chebyshev coefficients of $\widehat{g}_i(t)$ and can be approximated using the Fast Fourier Transform [10]. Computations of the matrix terms $\mathsf{L}_{ij}[\kappa]$ are split into two groups: (a) *cross-interactions*, where the supports of test and trial functions lie along curves $\Gamma_i$, $\Gamma_j$ with $i \neq j$; and (b) *self-interactions*, where both trial and test functions are defined on the same curve. Since for cross-interactions the integral kernel is smooth, we use the same computational procedure as for the right-hand side.
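A minimal sketch of this coefficient computation, written as the plain $O(n^2)$ cosine sum for clarity (a production code would use a fast DCT/FFT, as noted above):

```python
import numpy as np

def chebyshev_coefficients(g, n):
    """Fourier-Chebyshev coefficients a_l of g on (-1, 1), from samples at
    the Chebyshev nodes. This cosine sum is the slow form of the DCT."""
    k = np.arange(n)
    theta = np.pi * (k + 0.5) / n      # angles of the Chebyshev nodes
    x = np.cos(theta)                  # nodes x_k = cos(theta_k)
    samples = g(x)
    a = np.array([2.0 / n * np.sum(samples * np.cos(l * theta))
                  for l in range(n)])
    a[0] /= 2.0                        # T_0 carries half weight
    return a

# Example: g = T_3, so a_3 should be 1 and all other coefficients ~0.
a = chebyshev_coefficients(lambda t: 4 * t**3 - 3 * t, 16)
print(np.round(a[:5], 12))
```

By the discrete orthogonality of the cosines, the computed coefficients are exact (up to rounding) for polynomial inputs of degree below the number of samples.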

For self-interactions, the kernel function has a singularity that can be characterized as

$$G\_\kappa(\mathbf{r}(t), \mathbf{r}(s)) = -(2\pi)^{-1} \log|t - s| \, J\_0(\kappa \|\mathbf{r}(t) - \mathbf{r}(s)\|\_2) + G\_r(t, s), \quad t \neq s,$$

for $t, s \in \widehat{\Gamma}$, where $J_0$ is the zeroth-order Bessel function of the first kind, and $G_r$ is a regular function. Thus, integration of the regular part is done as in the cross-interaction case, while integrals with the first term as kernel are obtained by convolution, as the integrals involving $\log|t-s|$ are known in closed form (see [6, Remark 4.2]).

Yet, as $\kappa$ increases, larger values of $N$ are required and, thus, the resulting matrix terms need to be compressed. As stated in [10, Chapters 7 and 8], the regularity of a function controls the decay of its Fourier-Chebyshev coefficients. Hence, as the entries of the matrix $\mathbf{L}[\kappa]$ are precisely such coefficients, for a smooth kernel one observes fast decaying terms. This implies that we can select small blocks to approximate the matrix and obtain a sparse approximation by discarding the remaining entries, based on a predetermined tolerance $\epsilon > 0$. Specifically, the kernel function is smooth when we compute cross-interactions. Let the routine Quadrature($l$, $m$) compute the term $(l, m)$ of this interaction matrix using a 2D Gauss-Chebyshev quadrature. Given a tolerance $\epsilon > 0$, we minimize the number of computations needed by performing the following binary search:

#### **Matrix Compression Algorithm**

```
INPUT: Tolerance (Tol), Max level of search (Lmax)
OUTPUT: Number of columns to use (Ncols)
INITIALIZE: Ncols = N, level = 0, a = 0, b = N
While{level < Lmax}
   m = (a+b)/2
   Tleft = m-1
   Tcenter = m
   Tright = m+1
   Vleft = abs(Quadrature(0,Tleft))
   Vcenter = abs(Quadrature(0,Tcenter))
   Vright = abs(Quadrature(0,Tright))
   If{Vright < 0.5*Tol and Vcenter < 0.5*Tol} or {Vleft < 0.5*Tol and Vcenter < 0.5*Tol}
      b = m
   Else
      a = m
   EndIf
   level++
EndWhile
Ncols = b
```
The algorithm returns the minimum number of columns required, $N_{cols}$, by searching in the first row for the minimum index such that the matrix entries' absolute value is lower than $\epsilon$. The binary search is restricted to a depth $L_{\max} \in \mathbb{N}$. The same procedure is used to estimate the number of rows, $N_{rows}$, by executing a binary search in the first column. Once $N_{cols}$ and $N_{rows}$ are selected, we define $N_\epsilon := \max\{N_{rows}, N_{cols}\}$ and compute the block of size $N_\epsilon \times N_\epsilon$ as in the full matrix implementation.
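The pseudocode above translates into the following sketch, where `quadrature(l, m)` is a hypothetical stand-in for the 2D Gauss-Chebyshev routine; here we feed it a toy kernel with geometrically decaying coefficients:

```python
def compress_columns(quadrature, n, tol, lmax):
    """Binary search for the smallest column index whose first-row entries
    fall below the tolerance `tol`; search depth is capped at `lmax`."""
    a, b = 0, n
    for _ in range(lmax):
        m = (a + b) // 2
        # Sample the first row at m-1, m, m+1 to guard against a single
        # accidentally small coefficient.
        v = [abs(quadrature(0, max(t, 0))) for t in (m - 1, m, m + 1)]
        if (v[2] < 0.5 * tol and v[1] < 0.5 * tol) or \
           (v[0] < 0.5 * tol and v[1] < 0.5 * tol):
            b = m   # entries already below tolerance: search left half
        else:
            a = m   # entries still significant: search right half
    return b

# Toy kernel with geometrically decaying coefficients (rho = 2), so the
# cutoff should land near -log(tol) / log(rho).
decay = lambda l, m: 2.0 ** (-(l + m))
ncols = compress_columns(decay, n=256, tol=1e-10, lmax=8)
print(ncols)
```

With a deeper search (`lmax`) the returned cutoff tightens toward the true decay-based threshold, matching the $N/2^{L_{\max}}$ bound below.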

The matrix compression percentage will strongly depend on the regularity of the arcs involved. For *ρ*-analytic arcs, using [10, Theorem 8.1] we can prove the lower bound:

$$N\_{\epsilon} \ge \frac{\log \Upsilon - \log \epsilon}{\log \rho},$$

where *ϒ* is an upper bound for the absolute value of the kernel in the corresponding Bernstein ellipse. However, since compression is done by a binary search, the bound for the compression rate depends on *L*max as

$$N\_{\epsilon} \ge \frac{N}{2^{L\_{\text{max}}}}.$$

Compression of self-interaction blocks does not follow the same ideas. In fact, these blocks can be characterized as two perturbations of the canonical case, $\Gamma = \widehat{\Gamma}$ with $\kappa = 0$, which leads to a diagonal matrix.


In order to reduce memory consumption—though not computational time—we discard the entries of the self-interaction matrices lower than the given tolerance.

As expected, matrix compression induces an extra error, as it perturbs the original linear system solved by $\lambda_N$ in Problem 3. We denote by $\mathbf{L}^\epsilon[\kappa]$ the matrix generated by the compression algorithm with tolerance $\epsilon$, and define the matrix difference $\Delta\mathbf{L}_\epsilon[\kappa] := \mathbf{L}^\epsilon[\kappa] - \mathbf{L}[\kappa]$. We seek to control the perturbed solution $\mathfrak{u}^\epsilon = \mathfrak{u} + \Delta\mathfrak{u}$ of

$$(\mathsf{L}[\kappa] + \Delta \mathsf{L}\_{\epsilon}[\kappa])\mathfrak{u}^{\epsilon} = \mathfrak{g},$$

where $\mathfrak{u}$ and $\mathfrak{g}$ are the same as in Problem 3. In order to bound this error, we will assume that, for every pair of indices $(i, j)$ in the matrix $\mathbf{L}[\kappa]$, we have

$$|(\Delta \mathsf{L}\_{\epsilon}[\kappa])\_{ij}| < \epsilon. \tag{5}$$

**Theorem 3** *Let $N \in \mathbb{N}$ be such that Problem 3 has a unique solution $\lambda_N$. Then, there is a constant $C(\Gamma, \kappa) > 0$, not depending on $N$, such that*

$$\frac{\|\Delta\mathfrak{u}\|\_{2}}{\|\mathfrak{u}\|\_{2}} \leq \left| \frac{N\epsilon}{C(\kappa,\Gamma) - N\epsilon} \right|.$$

*Proof* By [7, Section 1.13.2] we have that

$$\frac{\|\Delta \mathfrak{u}\|\_2}{\|\mathfrak{u}\|\_2} \le \frac{\left\|\Delta \mathsf{L}\_\epsilon[\kappa]\right\|\_2}{\left\|(\mathsf{L}[\kappa])^{-1}\right\|\_2^{-1} - \left\|\Delta \mathsf{L}\_\epsilon[\kappa]\right\|\_2},$$

and thus, we need to estimate $\|\Delta\mathbf{L}_\epsilon[\kappa]\|_2$ and $\|(\mathbf{L}[\kappa])^{-1}\|_2^{-1}$. The bound for the first term is direct from (5) and the matrix norm definitions. By the classical bound for a matrix inverse and the continuity of the associated boundary integral operator, it holds that

$$\left\|(\mathsf{L}[\kappa])^{-1}\right\|\_{2}^{-1} \geq C(\kappa, \Gamma),$$

from where the result follows directly. $\square$

We can also estimate the error introduced by the compression algorithm in terms of the energy norm. In order to do so, define $(\lambda^\epsilon_N)_i := \sum_{m=0}^N (\mathfrak{u}^\epsilon_i)_m q^i_m$ in $\Gamma_i$. By the same arguments as in the above proof, we obtain

$$\left\|\lambda\_N - \lambda\_N^{\epsilon}\right\|\_{\widetilde{\mathbb{H}}^{-\frac{1}{2}}(\Gamma)} \le C\_1(\kappa, \Gamma) \left\|\mathbf{g}\right\|\_{\mathbb{H}^{\frac{1}{2}}(\Gamma)} \frac{\epsilon N^{3/2}}{C\_2(\kappa, \Gamma) - \epsilon N},$$

where $\mathbf{g}$ is the same as in Problem 2, and $C_1(\kappa, \Gamma)$, $C_2(\kappa, \Gamma)$ are two different constants.

*Remark 2* Our compression algorithm produces a faster and less memory-demanding implementation of the spectral Galerkin method at the cost of some accuracy loss, similar to fast multipole or hierarchical matrix methods. Moreover, once the matrix has been compressed, we can implement a fast matrix-vector product.
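A hedged illustration of this trade-off, using a toy matrix with smooth-kernel-like coefficient decay (the first-row cutoff rule and the tolerance are illustrative, not the paper's actual quadrature):

```python
import numpy as np

# Toy cross-interaction matrix: entries decay like rho^{-(l+m)}, as for a
# smooth kernel. After compression only the leading N_eps x N_eps block is
# kept, so the matrix-vector product touches far fewer entries.
rng = np.random.default_rng(0)
n = 200
L = 0.9 ** np.add.outer(np.arange(n), np.arange(n)) * rng.standard_normal((n, n))

tol = 1e-6
n_eps = np.nonzero(np.abs(L[0]) >= tol)[0].max() + 1  # crude first-row cutoff

def compressed_matvec(block, u):
    """Apply only the retained leading block; the discarded tail contributes
    entries below the tolerance and is ignored."""
    k = block.shape[0]
    out = np.zeros(len(u))
    out[:k] = block @ u[:k]
    return out

u = rng.standard_normal(n)
err = np.linalg.norm(L @ u - compressed_matvec(L[:n_eps, :n_eps], u))
print(n_eps, err)
```

The residual of the compressed product stays on the order of the discarded entries, in line with the perturbation bound of Theorem 3.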

#### **5 Numerical Results**

To illustrate the above claims, Fig. 1 presents convergence results for different wavenumbers, $\kappa = 0, 25, 50, 100$, for a configuration of $M = 28$ arcs. As the chosen geometry and excitation are given by analytic functions, Theorem 2 predicts an exponential rate of convergence, as observed numerically.

Table 1 provides matrix compression results for $\kappa = 100$ and for the same geometry of Fig. 1. It presents the percentage of non-zero entries (%NNZ) and the relative errors, as bounded in Theorem 3, as functions of the maximum level of binary search ($L_{\max}$), tolerances ($\epsilon$), and polynomial order per arc (Order). For low orders (Order < 60), relative errors are quite large and, therefore, most of the matrix terms are kept. This is due to an insufficient number of matrix entries to solve the problem with good accuracy (see Fig. 1), rendering compression pointless. On the other hand, once convergence is achieved, the compression error decreases drastically along with the percentage of matrix terms stored.

**Fig. 1** (**a**) Smooth geometry with $M = 28$ open arcs parametrized as $\mathbf{r}_i(t) = (a_i t, c_i \sin(b_i t) + d_i)$, with $a_i \in [0.14, 0.25]$, $b_i \in [0, 0.2]$, $c_i \in [1, 2]$, $d_i \in [0, 20]$, $t \in [-1, 1]$. (**b**) Convergence results for different wavenumbers and a plane-wave excitation along $(1, 1)$. Errors computed against an overkill solution using $N = 660$ per arc

**Table 1** Compression performance for $\kappa = 100$

#### **References**



## **Explicit Polynomial Trefftz-DG Method for Space-Time Elasto-Acoustics**

**H. Barucq, H. Calandra, J. Diaz, and E. Shishenina**

#### **1 Trefftz-DG Formulation for the Elasto-Acoustic Equation**

Trefftz methods are particular finite element methods where the basis and test functions are local solutions of the partial differential equation that governs the problem to be solved. Compared to the existing literature on frequency-domain problems, space-time Trefftz methods are still not widely used. One reason could be that they require space-time meshes [6, 12]. To our knowledge, few references on Trefftz approximations of time-dependent wave equations are available, and they mainly address theoretical properties in the case of acoustics and electromagnetism [4, 8, 10, 11]. They provide convergence and stability studies, and some numerical results are displayed using plane-wave bases in 1D + time dimensions. Numerical results in 2D + time dimensions are proposed in [4] for electromagnetism. There are also some studies devoted to the second-order formulation of the acoustic wave equation approximated in Trefftz spaces by means of Lagrange multipliers [1, 13]. In [3], we proposed a Trefftz-DG formulation for elasto-acoustics. The method required the inversion of a huge sparse matrix. The goal of this paper is to show how to derive a semi-explicit scheme, requiring only the inversion of a block-diagonal matrix on each element of the mesh.

© The Author(s) 2020

H. Barucq · J. Diaz · E. Shishenina (-)

Magique-3D, Inria, E2S UPPA, CNRS, Université de Pau et des Pays de l'Adour, Pau, France e-mail: helene.barucq@inria.fr; julien.diaz@inria.fr; elvira.shishenina@inria.fr

H. Calandra Total SA, CSTJF Total, Pau, France e-mail: henri.calandra@total.com

S. J. Sherwin et al. (eds.), *Spectral and High Order Methods for Partial Differential Equations ICOSAHOM 2018*, Lecture Notes in Computational Science and Engineering 134, https://doi.org/10.1007/978-3-030-39647-3\_31

In this section, following [10] and the framework therein, we propose a formulation of the elasto-acoustic coupling reading as a first-order system. Here and further, the subscripts $F$ and $S$ correspond to the acoustic (fluid) and elastodynamic (solid) domains.

#### *1.1 Elasto-Acoustic Equations*

We introduce a space-time domain $Q \equiv (\Omega_F \cup \Omega_S) \times I$, where $\Omega_F \subset \mathbb{R}^d$ is a bounded Lipschitz domain of dimension $d$ filled with fluid, $\Omega_S \subset \mathbb{R}^d$ is a bounded Lipschitz domain of dimension $d$ filled with solid, and $I \equiv [0, T]$ is the time interval. All medium parameters $c_F \equiv c_F(\mathbf{x})$ and $\rho_F \equiv \rho_F(\mathbf{x})$, standing for the acoustic wave propagation velocity and the fluid density respectively, as well as the inverted stiffness tensor $\underline{\underline{\mathbf{C}}}^{-1}(\mathbf{x}) \equiv \underline{\underline{\mathbf{A}}}(\mathbf{x})$ and the solid density $\rho_S \equiv \rho_S(\mathbf{x})$, are assumed to be piecewise constant and positive. We denote by $\Gamma_{FS} = \partial\Omega_F \cap \partial\Omega_S$ the fluid-solid interface. The elasto-acoustic system of equations is based on the coupling of the first-order acoustic system, written in terms of the velocity $\mathbf{v}_F \equiv \mathbf{v}_F(\mathbf{x}, t)$ and pressure $p \equiv p(\mathbf{x}, t)$ fields:

$$\begin{cases} \frac{1}{c\_F^2 \rho\_F} \frac{\partial p}{\partial t} + \operatorname{div} \mathbf{v}\_F = f & \text{in } \mathcal{Q}\_F, \\\\ \rho\_F \frac{\partial \mathbf{v}\_F}{\partial t} + \nabla p = 0 & \text{in } \mathcal{Q}\_F, \\\\ \mathbf{v}\_F(\cdot, 0) = \mathbf{v}\_{F0}, \ p(\cdot, 0) = p\_0 & \text{in } \Omega\_F, \\\\ \mathbf{v}\_F \cdot \mathbf{n}\_{\Omega\_F} = g\_F & \text{on } \partial \Omega\_F \backslash \Gamma\_{FS} \times I, \end{cases} \tag{1}$$

where $\mathbf{n}_{\Omega_F}$ is the outward normal vector to $\partial\Omega_F$, $f \equiv f(\mathbf{x}, t)$ is the source term, $g_F$ is the boundary datum, and the velocity $\mathbf{v}_{F0}$ and the pressure $p_0$ are the initial data, with the first-order elastodynamic system, written in terms of the velocity $\mathbf{v}_S \equiv \mathbf{v}_S(\mathbf{x}, t)$ and the (symmetric and positive) stress tensor $\underline{\underline{\sigma}} \equiv \underline{\underline{\sigma}}(\mathbf{x}, t)$ fields:

$$\begin{cases} \underline{\underline{\mathbf{A}}} \frac{\partial \underline{\underline{\sigma}}}{\partial t} - \underline{\underline{\varepsilon}}(\mathbf{v}\_S) = 0 & \text{in } \mathcal{Q}\_S, \\ \rho\_S \frac{\partial \mathbf{v}\_S}{\partial t} - \operatorname{div} \underline{\underline{\sigma}} = 0 & \text{in } \mathcal{Q}\_S, \\ \mathbf{v}\_S(\cdot, 0) = \mathbf{v}\_{S0}, \ \underline{\underline{\sigma}}(\cdot, 0) = \underline{\underline{\sigma}}\_0 & \text{in } \Omega\_S, \\ \underline{\underline{\sigma}} \, \mathbf{n}\_{\Omega\_S} = \mathbf{g}\_S & \text{on } \partial \Omega\_S \backslash \Gamma\_{FS} \times I, \end{cases} \tag{2}$$

where $\mathbf{n}_{\Omega_S}$ is the outward normal vector to $\partial\Omega_S$, $\mathbf{g}_S$ is the boundary datum, and the velocity $\mathbf{v}_{S0}$ and the stress tensor $\underline{\underline{\sigma}}_0$ are the initial data. The transmission conditions between the two systems (2) and (1) represent the continuity of the normal components of velocity and stress across $\Gamma_{FS}$:

$$\begin{cases} \mathbf{v}\_F \cdot \mathbf{n}\_{\Gamma\_{FS}} = \mathbf{v}\_S \cdot \mathbf{n}\_{\Gamma\_{FS}} & \text{at } \Gamma\_{FS}, \\\\ - p \, \mathbf{n}\_{\Gamma\_{FS}} = \underline{\underline{\sigma}} \, \mathbf{n}\_{\Gamma\_{FS}} & \text{at } \Gamma\_{FS}. \end{cases} \tag{3}$$

The velocities aligned with the interface and the tangential stress remain unconstrained.

#### *1.2 Space-Time Trefftz-DG Formulation*

**Fig. 1** Example of a 1D + time mesh $\mathcal{T}_h$ covering $Q$. The internal element faces $\mathcal{F}^Q_h$ are represented by a dotted line, the element faces of the fluid-solid interface $\mathcal{F}^{FS}_h$ by a dash-dotted line, the boundary element faces $\mathcal{F}^D_h$ by a thick line, and the initial $\mathcal{F}^0_h$ and final $\mathcal{F}^T_h$ time element faces by a double and a dashed line respectively

We introduce a non-overlapping space-time mesh $\mathcal{T}_h$ on $Q$, composed of space-time Lipschitz elements $K_F \subset \Omega_F \times I$ and $K_S \subset \Omega_S \times I$. We denote by $\mathcal{T}_{Fh}$ (resp. $\mathcal{T}_{Sh}$) the restriction of $\mathcal{T}_h$ to the fluid (resp. solid) domain. Let $\mathbf{n}_{K_F} \equiv (\mathbf{n}^x_{K_F}, n^t_{K_F})$ be the outward-pointing unit normal vector on $\partial K_F$, and $\mathbf{n}_{K_S} \equiv (\mathbf{n}^x_{K_S}, n^t_{K_S})$ the one on $\partial K_S$. We assume that all medium parameters are constant in each $K_F$ and $K_S$. The mesh skeleton $\mathcal{F}_h \equiv \bigcup_{K_{F,S} \in \mathcal{T}_h} \partial K_{F,S}$ can be decomposed into the internal faces $\mathcal{F}^Q_h$, the fluid-solid faces $\mathcal{F}^{FS}_h$, the boundary faces $\mathcal{F}^D_h$, and the initial- and final-time element faces $\mathcal{F}^0_h$ and $\mathcal{F}^T_h$, respectively, as shown in Fig. 1. We introduce the space $V_h(\mathcal{T}_h)$ as the subspace of $L^2(Q)$ defined by $V_h(\mathcal{T}_h) = \{ \phi \in L^2(Q),\ \phi|_{K_{F,S}} \in \mathbb{P}^p(K_{F,S}) \}$. The unknowns $(\mathbf{v}_{Fh}, p_h, \mathbf{v}_{Sh}, \underline{\underline{\sigma}}_h)$ are assumed to lie in $\mathbf{V}_h(\mathcal{T}_h) \equiv V_h(\mathcal{T}_{Fh})^d \times V_h(\mathcal{T}_{Fh}) \times V_h(\mathcal{T}_{Sh})^d \times V_h(\mathcal{T}_{Sh})^{d^2}$. We consider the test functions $\boldsymbol{\omega}_F$, $q$, $\boldsymbol{\omega}_S$, $\underline{\underline{\xi}}$ in $\mathbf{T}(\mathcal{T}_h)$ for $\mathbf{v}_F$, $p$, $\mathbf{v}_S$ and $\underline{\underline{\sigma}}$, respectively, where the Trefftz space $\mathbf{T}(\mathcal{T}_h)$ is defined on the mesh $\mathcal{T}_h$ as follows:

$$\mathbf{T}(\mathcal{T}_h) \equiv \left\{ (\boldsymbol{\omega}_F, q, \boldsymbol{\omega}_S, \underline{\underline{\xi}}) \in \mathbf{V}_h(\mathcal{T}_h) \ \text{s.t.} \ \frac{1}{c_F^2 \rho_F} \frac{\partial q}{\partial t} + \operatorname{div} \boldsymbol{\omega}_F = 0,\ \rho_F \frac{\partial \boldsymbol{\omega}_F}{\partial t} + \nabla q = 0 \quad \forall K_F \in \mathcal{T}_{Fh}, \right.$$

$$\left. \text{and} \quad \underline{\underline{A}}\, \frac{\partial \underline{\underline{\xi}}}{\partial t} - \underline{\underline{\varepsilon}}(\boldsymbol{\omega}_S) = 0,\ \rho_S \frac{\partial \boldsymbol{\omega}_S}{\partial t} - \operatorname{div} \underline{\underline{\xi}} = 0 \quad \forall K_S \in \mathcal{T}_{Sh} \right\}.$$

This space is of Trefftz type since it is the subspace of the regular space $\mathbf{V}_h(\mathcal{T}_h)$ composed of local solutions of the volume governing equations (1) and (2), set in each element $K_F$ and $K_S$ respectively.

As in standard DG methods, the next step towards the variational formulation consists in multiplying the equations of (1) by the test functions $q$ and $\boldsymbol{\omega}_F$ in $\mathbf{T}(\mathcal{T}_h)$, and the equations of (2) by the test functions $\underline{\underline{\xi}}$ and $\boldsymbol{\omega}_S$ in $\mathbf{T}(\mathcal{T}_h)$ respectively; as is standard in space-time DG methods, we then integrate the resulting equations by parts not only in space but also in time:

$$\sum_{K_F \in \mathcal{T}_{Fh}} \int_{\partial K_F} \left[ \frac{1}{c_F^2 \rho_F}\, \check{p}_h\, q\, n^t_{K_F} + q\, \hat{\mathbf{v}}_{Fh} \cdot \mathbf{n}^x_{K_F} + \rho_F\, \check{\mathbf{v}}_{Fh} \cdot \boldsymbol{\omega}_F\, n^t_{K_F} + \hat{p}_h\, \boldsymbol{\omega}_F \cdot \mathbf{n}^x_{K_F} \right] ds$$

$$+ \sum_{K_S \in \mathcal{T}_{Sh}} \int_{\partial K_S} \left[ \underline{\underline{A}}\, \underline{\underline{\check{\sigma}}}_h : \underline{\underline{\xi}}\, n^t_{K_S} + \rho_S\, \check{\mathbf{v}}_{Sh} \cdot \boldsymbol{\omega}_S\, n^t_{K_S} - \underline{\underline{\hat{\sigma}}}_h : (\boldsymbol{\omega}_S \otimes \mathbf{n}^x_{K_S}) - \hat{\mathbf{v}}_{Sh} \cdot \underline{\underline{\xi}}\, \mathbf{n}^x_{K_S} \right] ds = \sum_{K_F \in \mathcal{T}_{Fh}} \int_{K_F} f\, q\, dv. \tag{4}$$

Thanks to the choice of test functions, the left-hand side of the above space-time formulation contains only surface integrals. The numerical fluxes in time $\hat{\mathbf{v}}_{Fh}$, $\hat{p}_h$, $\hat{\mathbf{v}}_{Sh}$, $\underline{\underline{\hat{\sigma}}}_h$ and in space $\check{\mathbf{v}}_{Fh}$, $\check{p}_h$, $\check{\mathbf{v}}_{Sh}$, $\underline{\underline{\check{\sigma}}}_h$ are defined in the standard DG notation [2, 3, 7] as follows:

$$\begin{pmatrix} \hat{\mathbf{v}}_{Fh} \cdot \mathbf{n}^x_{K_F} \\ \hat{p}_h \\ \hat{\mathbf{v}}_{Sh} \\ \underline{\underline{\hat{\sigma}}}_h \mathbf{n}^x_{K_S} \end{pmatrix} \equiv \begin{pmatrix} \mathbf{v}_{Sh} \cdot \mathbf{n}^x_{K_F} + \delta_1 (\underline{\underline{\sigma}}_h \mathbf{n}^x_{K_F} + p_h \mathbf{n}^x_{K_F}) \cdot \mathbf{n}^x_{K_F} \\ p_h + \alpha_1 (\mathbf{v}_{Fh} \cdot \mathbf{n}^x_{K_F} - \mathbf{v}_{Sh} \cdot \mathbf{n}^x_{K_F}) \\ \mathbf{v}_{Sh} - \delta_1 (\underline{\underline{\sigma}}_h \mathbf{n}^x_{K_S} + p_h \mathbf{n}^x_{K_S}) \\ -p_h \mathbf{n}^x_{K_S} + \alpha_1 (\mathbf{v}_{Fh} \cdot \mathbf{n}^x_{K_S} - \mathbf{v}_{Sh} \cdot \mathbf{n}^x_{K_S}) \mathbf{n}^x_{K_S} \end{pmatrix} \quad \text{on } \mathcal{F}^{FS}_h,$$

$$\begin{pmatrix} \hat{\mathbf{v}}_{Fh} \cdot \mathbf{n}^x_{K_F} \\ \hat{p}_h \end{pmatrix} \equiv \begin{pmatrix} g_F \\ p_h + \alpha_1 (\mathbf{v}_{Fh} \cdot \mathbf{n}^x_{K_F} - g_F) \end{pmatrix}, \quad \begin{pmatrix} \hat{\mathbf{v}}_{Sh} \\ \underline{\underline{\hat{\sigma}}}_h \mathbf{n}^x_{K_S} \end{pmatrix} \equiv \begin{pmatrix} \mathbf{v}_{Sh} - \delta_1 (\underline{\underline{\sigma}}_h \mathbf{n}^x_{K_S} - \mathbf{g}_S) \\ \mathbf{g}_S \end{pmatrix} \quad \text{on } \mathcal{F}^D_h,$$

$$\begin{pmatrix} \hat{\mathbf{v}}_{Fh} \\ \hat{p}_h \end{pmatrix} \equiv \begin{pmatrix} \{\!\{\mathbf{v}_{Fh}\}\!\} + \beta_1 [\![p_h]\!]_x \\ \{\!\{p_h\}\!\} + \alpha_1 [\![\mathbf{v}_{Fh}]\!]_x \end{pmatrix}, \quad \begin{pmatrix} \hat{\mathbf{v}}_{Sh} \\ \underline{\underline{\hat{\sigma}}}_h \end{pmatrix} \equiv \begin{pmatrix} \{\!\{\mathbf{v}_{Sh}\}\!\} - \delta_1 [\![\underline{\underline{\sigma}}_h]\!]_x \\ \{\!\{\underline{\underline{\sigma}}_h\}\!\} - \gamma_1 [\![\mathbf{v}_{Sh}]\!]_x \end{pmatrix} \quad \text{on } \mathcal{F}^Q_h,$$

$$\begin{pmatrix} \check{\mathbf{v}}_{Fh} \\ \check{p}_h \end{pmatrix} \equiv \begin{pmatrix} \{\!\{\mathbf{v}_{Fh}\}\!\} + \alpha_2 [\![\mathbf{v}_{Fh}]\!]_t \\ \{\!\{p_h\}\!\} + \beta_2 [\![p_h]\!]_t \end{pmatrix}, \quad \begin{pmatrix} \check{\mathbf{v}}_{Sh} \\ \underline{\underline{\check{\sigma}}}_h \end{pmatrix} \equiv \begin{pmatrix} \{\!\{\mathbf{v}_{Sh}\}\!\} + \gamma_2 [\![\mathbf{v}_{Sh}]\!]_t \\ \{\!\{\underline{\underline{\sigma}}_h\}\!\} + \delta_2 [\![\underline{\underline{\sigma}}_h]\!]_t \end{pmatrix} \quad \text{on } \mathcal{F}^Q_h,$$

$$\begin{pmatrix} \check{\mathbf{v}}_{Fh} \\ \check{p}_h \end{pmatrix} \equiv \begin{pmatrix} \mathbf{v}_{Fh} \\ p_h \end{pmatrix}, \quad \begin{pmatrix} \check{\mathbf{v}}_{Sh} \\ \underline{\underline{\check{\sigma}}}_h \end{pmatrix} \equiv \begin{pmatrix} \mathbf{v}_{Sh} \\ \underline{\underline{\sigma}}_h \end{pmatrix} \quad \text{on } \mathcal{F}^T_h,$$

$$\begin{pmatrix} \check{\mathbf{v}}_{Fh} \\ \check{p}_h \end{pmatrix} \equiv \begin{pmatrix} (\tfrac{1}{2} - \alpha_2)\mathbf{v}_{Fh} + (\tfrac{1}{2} + \alpha_2)\mathbf{v}_{F0} \\ (\tfrac{1}{2} - \beta_2)p_h + (\tfrac{1}{2} + \beta_2)p_0 \end{pmatrix}, \quad \begin{pmatrix} \check{\mathbf{v}}_{Sh} \\ \underline{\underline{\check{\sigma}}}_h \end{pmatrix} \equiv \begin{pmatrix} (\tfrac{1}{2} - \gamma_2)\mathbf{v}_{Sh} + (\tfrac{1}{2} + \gamma_2)\mathbf{v}_{S0} \\ (\tfrac{1}{2} - \delta_2)\underline{\underline{\sigma}}_h + (\tfrac{1}{2} + \delta_2)\underline{\underline{\sigma}}_0 \end{pmatrix} \quad \text{on } \mathcal{F}^0_h.$$

Here, $\alpha_1$, $\alpha_2$, $\beta_1$, $\beta_2$, $\delta_1$, $\delta_2$, $\gamma_1$, and $\gamma_2$ are positive penalty parameters. As in standard DG methods, a suitable choice of these penalty parameters allows one to prove the stability of the overall method. It is shown in [2, 3] that they contribute to the accuracy and convergence of the numerical method. We refer to [2, 3] for more details on the definition of the numerical fluxes.

Summing the contributions (4) over all elements $K_F, K_S \in \mathcal{T}_h$, and introducing the bilinear form $\mathcal{A}_{TDG}(\cdot\,;\cdot)$ and the linear form $\ell_{TDG}(\cdot)$ for the left-hand side and right-hand side expressions respectively, we obtain the Trefftz-DG formulation for the elasto-acoustic problem:

Seek $(\mathbf{v}_{Fh}, p_h, \mathbf{v}_{Sh}, \underline{\underline{\sigma}}_h) \in \mathbf{T}(\mathcal{T}_h)$ such that, for all $(\boldsymbol{\omega}_F, q, \boldsymbol{\omega}_S, \underline{\underline{\xi}}) \in \mathbf{T}(\mathcal{T}_h)$, it holds that:

$$\mathcal{A}_{TDG}((\mathbf{v}_{Fh},\ p_h,\ \mathbf{v}_{Sh},\ \underline{\underline{\sigma}}_h); (\boldsymbol{\omega}_F,\ q,\ \boldsymbol{\omega}_S,\ \underline{\underline{\xi}})) = \ell_{TDG}(\boldsymbol{\omega}_F,\ q,\ \boldsymbol{\omega}_S,\ \underline{\underline{\xi}}). \tag{5}$$

The analysis of the well-posedness of (5) is based on coercivity and continuity estimates for the bilinear and linear forms in mesh-dependent norms [2, 3]. The proof is similar to the one given in [10], where the acoustic wave equation is addressed. In Sect. 2 we describe the algorithm implementing the Trefftz-DG formulation (5) and discuss different analytical and numerical approaches to its optimization.

#### **2 Implementation of the Algorithm**

The numerical implementation of the Trefftz-DG formulation differs from that of standard DG methods, which treat the space and time integration separately. Standard DG space integration has the attractive feature of leading to a block-diagonal mass matrix, and thus allows the use of explicit time integration. The computational cost then depends on a CFL condition which sets the value of the time step as a function of the space step. On the other hand, a naive implementation of Trefftz-DG methods requires performing a space-time integration, which leads to inverting a sparse matrix whose size tends to be huge. It is thus not obvious that a crude implementation of the Trefftz-DG algorithm avoids additional cost compared to standard DG methods.

In this section we describe the key implementation steps of the Trefftz-DG formulation (5) and discuss optimization techniques. The complete algorithm, with more numerical details, can be found in [2, 3].

#### *2.1 Change-Over Between the Time Slabs*

To simplify the presentation, we assume here that we use the same order of approximation on each cell, so that we have $N^f_{dof}$ degrees of freedom on fluid cells and $N^s_{dof}$ degrees of freedom on solid cells. Once we have defined the discrete approximation space, we can solve the problem inside each element $K_F$ and $K_S$, communicating the corresponding values at the boundaries $\partial K_F$ and $\partial K_S$ through the incoming and outgoing fluxes. Thus, the variational problem is represented by an algebraic linear system, with a sparse matrix $M$, of size equal to the total number of elements multiplied by the number of degrees of freedom per element, that is $N^f_{el} \times N^f_{dof} + N^s_{el} \times N^s_{dof}$. Compared to the computational cost of a standard DG implementation, the corresponding Trefftz-DG cost is thus increased, mainly because of the need to invert this large matrix. The most obvious way to reduce the size of the matrix, classically used in most work on space-time Trefftz methods, is to consider time slabs. We restrict ourselves to the case of Cartesian meshes, but this methodology can also be applied to unstructured meshes. An alternative is to use tent-pitched meshes that respect causality; this will be the topic of future work. In order to optimize the execution of the algorithm, we propose to divide the space-time domain $Q$ into $N_t$ elementary time slabs $Q_1, Q_2, \ldots, Q_{N_t}$ and to solve the problem slab by slab, taking the final results computed in the current time slab at time $t$ as initial values for the next slab at time $t + \Delta t$ (see Fig. 2). The size of the matrix inside each time slab is then $N_t$

**Fig. 2** Example of 1D + time mesh $\mathcal{T}_h$ on $Q$ decomposed into $N_t$ time slabs

times smaller than the original one. Moreover, if the medium parameters are fixed in time, and the space discretization is preserved from slab to slab, the matrix can be computed and inverted once and then re-projected onto the next time slabs, thus reducing the global numerical cost.
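The slab-by-slab strategy can be sketched in a few lines. The following is a minimal numpy illustration under strong assumptions: `M` stands in for the Trefftz-DG slab matrix, `C` for the operator feeding the final-time trace of one slab into the next slab's right-hand side, and both are hypothetical random matrices rather than anything assembled from [2, 3]:

```python
import numpy as np

rng = np.random.default_rng(0)
n, Nt = 40, 5          # DoF per slab and number of slabs (illustrative)

# Stand-ins: M plays the role of the slab matrix, C transfers the final
# values of one slab to the initial condition of the next.
M = np.eye(n) + 0.1 * rng.standard_normal((n, n))
C = 0.1 * rng.standard_normal((n, n))

# Medium and mesh are identical in every slab, so M is inverted only once
# and reused for all Nt solves.
M_inv = np.linalg.inv(M)

u0 = rng.standard_normal(n)      # initial data at t = 0
u, history = u0, []
for _ in range(Nt):
    u = M_inv @ (C @ u)          # solve one slab, reusing the factorization
    history.append(u)
```

In practice one would store a sparse factorization rather than an explicit inverse; the point is only that the cost of assembling and factorizing the slab matrix is paid once, not $N_t$ times.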

#### *2.2 Polynomial Basis*

One of the important advantages of Trefftz-type methods is the flexibility in the choice of basis functions, provided they satisfy the Trefftz property locally in each element. To perform the numerical simulations, we have extended the algorithm proposed by Maciag in [9] for computing wave polynomials, i.e. solutions of the second-order transient wave equation, to the first-order acoustic and elastodynamic systems in dimension one and higher. It consists in computing a polynomial basis, defined in the reference element, using Taylor expansions of generating exponential functions which are local solutions of the initial system of equations. An example of a space-time wave polynomial basis for the first-order acoustic wave equation reads as follows (approximation degree $p = 3$, dimension of the physical space $d = 1$):

$$\begin{array}{llll}
\hat{\phi}_1^v = 0 & \hat{\phi}_2^v = 1 & \hat{\phi}_3^v = x & \hat{\phi}_4^v = c_F t \\[2pt]
\hat{\phi}_1^p = -c_F & \hat{\phi}_2^p = 0 & \hat{\phi}_3^p = -c_F^2 t & \hat{\phi}_4^p = -c_F x \\[2pt]
\hat{\phi}_5^v = -\frac{x^2}{2} - \frac{c_F^2 t^2}{2} & \hat{\phi}_6^v = -c_F x t & \hat{\phi}_7^v = -\frac{x^3}{6} - \frac{x c_F^2 t^2}{2} & \hat{\phi}_8^v = -\frac{c_F^3 t^3}{6} - \frac{x^2 c_F t}{2} \\[2pt]
\hat{\phi}_5^p = c_F^2 x t & \hat{\phi}_6^p = c_F\left(\frac{x^2}{2} + \frac{c_F^2 t^2}{2}\right) & \hat{\phi}_7^p = c_F^2\left(\frac{x^2 t}{2} + \frac{c_F^2 t^3}{6}\right) & \hat{\phi}_8^p = c_F\left(\frac{x^3}{6} + \frac{x c_F^2 t^2}{2}\right)
\end{array}$$

This basis contains pairs of polynomial functions $(\hat{\phi}^v_\cdot, \hat{\phi}^p_\cdot)$, corresponding to the velocity and pressure respectively, which are locally defined, satisfy the Trefftz property inside each element of the mesh, and are of degree less than or equal to $p$ ($p = 0, 1, 2, 3$) in order to provide an approximation of order $p$. By construction, the Trefftz basis functions are not attached to the coordinates of degrees of freedom inside the element, contrary to Lagrange polynomials. Even though we compute only surface integrals, we can evaluate the final approximate solution at any point of the element. We refer to [2] for more numerical details as well as for acoustic and elastodynamic basis examples in higher dimensions.
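The Trefftz property of such a basis can be verified mechanically. The sketch below encodes each pair $(\hat{\phi}^v, \hat{\phi}^p)$ as a 2-D array of monomial coefficients in $(x, t)$ and checks that it satisfies the first-order acoustic system $\frac{1}{c_F^2 \rho_F}\partial_t p + \partial_x v = 0$, $\rho_F \partial_t v + \partial_x p = 0$, taking $\rho_F = 1$ for simplicity; the coefficient encoding and the test value of $c_F$ are ours:

```python
import numpy as np

c = 2.0  # sound speed c_F (arbitrary test value); rho_F = 1 assumed

# A polynomial in (x, t) is a 4x4 array a with a[i, j] the coefficient of x^i t^j.
def dx(a):
    out = np.zeros_like(a)
    out[:-1, :] = a[1:, :] * np.arange(1, a.shape[0])[:, None]
    return out

def dt(a):
    out = np.zeros_like(a)
    out[:, :-1] = a[:, 1:] * np.arange(1, a.shape[1])[None, :]
    return out

def poly(entries):
    a = np.zeros((4, 4))
    for (i, j), val in entries.items():
        a[i, j] = val
    return a

basis = [  # (phi_v, phi_p) pairs, transcribed from the table above
    (poly({}),                                 poly({(0, 0): -c})),
    (poly({(0, 0): 1}),                        poly({})),
    (poly({(1, 0): 1}),                        poly({(0, 1): -c**2})),
    (poly({(0, 1): c}),                        poly({(1, 0): -c})),
    (poly({(2, 0): -0.5, (0, 2): -c**2 / 2}),  poly({(1, 1): c**2})),
    (poly({(1, 1): -c}),                       poly({(2, 0): c / 2, (0, 2): c**3 / 2})),
    (poly({(3, 0): -1 / 6, (1, 2): -c**2 / 2}),
     poly({(2, 1): c**2 / 2, (0, 3): c**4 / 6})),
    (poly({(0, 3): -c**3 / 6, (2, 1): -c / 2}),
     poly({(3, 0): c / 6, (1, 2): c**3 / 2})),
]

for v, p in basis:  # residuals of both first-order acoustic equations
    assert np.max(np.abs(dt(p) / c**2 + dx(v))) < 1e-12
    assert np.max(np.abs(dt(v) + dx(p))) < 1e-12
```

The same check extends directly to the elastodynamic system and to higher dimensions, only the residual operators change.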

#### *2.3 Inversion of the Matrix M Inside a Time Slab*

The inversion of the matrix inside a time slab can be explicitly reduced to the inversion of its block-diagonal component, which corresponds to the integration at the bottom and top of the time slab (initial and final time faces $\mathcal{F}^0_h$ and $\mathcal{F}^T_h$), thanks to Taylor expansion formulas. More precisely, let us recall the expression for the bilinear form $\mathcal{A}_{TDG}(\cdot\,;\cdot)$ from Sect. 1.2:

$$\mathcal{A}_{TDG}(\cdot\,;\cdot) \equiv \underbrace{\int_{\mathcal{F}^0_h} + \int_{\mathcal{F}^T_h}}_{\mathcal{A}^{\Omega}_{TDG}} + \underbrace{\int_{\mathcal{F}^Q_h} + \int_{\mathcal{F}^D_h} + \int_{\mathcal{F}^{FS}_h}}_{\mathcal{A}^{I}_{TDG}}.$$

It consists of $\mathcal{A}^{\Omega}_{TDG}(\cdot\,;\cdot)$, which corresponds to the integration over the initial and final time element faces of the time slab, and $\mathcal{A}^{I}_{TDG}(\cdot\,;\cdot)$, which corresponds to the integration over the internal, boundary and fluid-solid element faces. Thus, the matrix $M$ can be represented as the sum of two matrices $M_{\Omega}$ and $M_{I}$, corresponding to $\mathcal{A}^{\Omega}_{TDG}$ and $\mathcal{A}^{I}_{TDG}$ respectively, as follows:

$$M = \Delta\_{\Omega} M\_{\Omega} + \Delta\_{I} M\_{I}.$$

Here, $\Delta_{\Omega} \propto (\Delta x)^d$ represents the area of the local faces in $\mathcal{F}^0_h$ and $\mathcal{F}^T_h$, and $\Delta_I \propto (\Delta x)^{d-1} \Delta t$ represents the area of the local faces in $\mathcal{F}^Q_h$, $\mathcal{F}^D_h$ and $\mathcal{F}^{FS}_h$ respectively. We refer to [2] for more details.

This decomposition is of particular interest since $M_\Omega$ is block-diagonal, each block corresponding to one element. Indeed, we have:

$$\Delta_\Omega M_\Omega + \Delta_I M_I = \left(\Delta_\Omega M_\Omega\right)\left(\mathbf{I} + \frac{\Delta_I}{\Delta_\Omega} M_\Omega^{-1} M_I\right) = \left(\Delta_\Omega M_\Omega\right)\left(\mathbf{I} + \kappa P\right).$$

Here $\mathbf{I}$ is the identity matrix, $\kappa \equiv \frac{\Delta_I}{\Delta_\Omega} \propto \frac{\Delta t}{\Delta x}$, and $P \equiv M_\Omega^{-1} M_I$.

If $\|\kappa P\|$ is sufficiently small, we can apply the Maclaurin formula in order to obtain the polynomial expansion of $M^{-1}$ as follows:

$$M^{-1} \equiv \left(\mathbf{I} + \kappa P\right)^{-1} \left(\Delta_\Omega M_\Omega\right)^{-1} = \left(\sum_{n=0}^{\infty} (-1)^n \kappa^n P^n\right) \left(\Delta_\Omega M_\Omega\right)^{-1}.$$

This representation reduces the inversion of the sparse matrix $M$ to the inversion of its block-diagonal component $M_\Omega$ and multiplications of the inverted block-diagonal $M_\Omega^{-1}$ by the sparse $M_I$. It provides an explicit way of solving the initial linear system approximately. Even though it requires a CFL-type condition related to the value of $\|\kappa P\|$, justifying the approximate solution of the system, it significantly accelerates the execution of the algorithm.
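A toy numpy version of this approximate inversion, with random stand-ins for $M_\Omega$ (block-diagonal) and $M_I$ (sparse) and an illustrative value of $\kappa$ (none taken from [2]), shows the expected behaviour: the truncated-series solution approaches the exact one as terms are added:

```python
import numpy as np

rng = np.random.default_rng(1)
nb, b = 8, 4                    # 8 elements with 4 DoF each (illustrative)
n = nb * b

# Block-diagonal stand-in for M_Omega and a sparse stand-in for M_I.
blocks = [np.eye(b) + 0.2 * rng.standard_normal((b, b)) for _ in range(nb)]
M_Om = np.zeros((n, n))
M_Om_inv = np.zeros((n, n))     # inverted block by block, never globally
for k, B in enumerate(blocks):
    M_Om[k*b:(k+1)*b, k*b:(k+1)*b] = B
    M_Om_inv[k*b:(k+1)*b, k*b:(k+1)*b] = np.linalg.inv(B)
M_I = rng.standard_normal((n, n)) * (rng.random((n, n)) < 0.05)

kappa = 0.01                    # plays the role of Delta_I / Delta_Omega
M = M_Om + kappa * M_I
P = M_Om_inv @ M_I              # P = M_Omega^{-1} M_I

def approx_solve(f, nterms):
    """M^{-1} f via the truncated series sum_n (-kappa P)^n M_Omega^{-1} f."""
    term = M_Om_inv @ f
    u = term.copy()
    for _ in range(1, nterms):
        term = -kappa * (P @ term)
        u = u + term
    return u

f = rng.standard_normal(n)
```

Only per-element blocks are ever inverted; each additional term costs one sparse product with $M_I$ and one block-diagonal application, which is what makes the scheme explicit in practice.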

In Table 1 we compare the numerical accuracy ($L^2$-norm in time and space of the numerical error as a function of the cell size $\Delta x$) of the Trefftz-DG method in a 2D homogeneous acoustic case for both the exact and approximate matrix inversions, as a function of the mesh size and of the number $n$ of terms in the Taylor expansion.


**Table 1** Accuracy ($L^2$-error in space and in time) of the solution when using the approximate inversion with $n = 3, 4, 5$ and the exact inversion

The accelerating factor is the ratio of the computational costs of the two methods for reaching the same accuracy

#### **3 Numerical Tests**

For the numerical validation of the Trefftz-DG method we have considered a 2D medium composed of two homogeneous rectangular layers: an acoustic one and an elastodynamic one. We have set a source term at the fluid-solid interface, and two receivers, one in the acoustic layer and one in the elastodynamic layer. The numerical signals at both receivers have been validated against the analytical solutions computed with the Gar6more code [5]. In Fig. 3 we show the convergence of the numerical velocity as a function of the cell size for different degrees of approximation ($p = 0, 1, 2, 3$), computed at the receivers in (a) the 2D acoustic layer and (b) the 2D elastodynamic layer. In each case, the convergence rate is higher than the corresponding approximation degree. We refer to [2] for more examples.

**Fig. 3** Convergence of the numerical velocity as a function of the cell size $\Delta x$. (**a**) 2D acoustic layer. (**b**) 2D elastodynamic layer

#### **4 Conclusion**

The Trefftz-DG methodology for solving the first-order elasto-acoustic system has demonstrated important advantages, such as the use of degrees of freedom evaluated at the element faces only, flexibility in the choice of basis functions, and unconditional stability. However, in its initial form, it still shows some limitations due to the space-time integration, which leads to a representation of the discrete system by a huge sparse matrix whose straightforward inversion is very expensive, even when using time slabs. We would then be in the situation of using an implicit scheme for solving the forward problem, which risks overloading the iterative process of the corresponding inverse problem when reconstructing very large propagation domains. Fortunately, thanks to the decomposition of the matrix separating the time variables from the space ones, we can benefit from the block-diagonal structure familiar from standard DG formulations and end up with an explicit scheme, which is more convenient from the numerical point of view. The numerical tests performed clearly illustrate the interest of the split version of the discrete problem.

**Acknowledgements** This project is supported by the Inria—Total SA strategic action "Depth Imaging Partnership" (http://dip.inria.fr), and has received funding from the European Union's Horizon 2020 research and innovation program under the Marie Sklodowska-Curie grant agreement Number 777778 (Rise action Mathrocks).

#### **References**



## **An** *hp***-Adaptive Iterative Linearization Discontinuous-Galerkin FEM for Quasilinear Elliptic Boundary Value Problems**

**Paul Houston and Thomas P. Wihler**

#### **1 Introduction**

In this article, we consider the *a posteriori* error analysis, in a natural mesh-dependent energy norm, for a class of interior-penalty *hp*-version discontinuous Galerkin finite element methods (DGFEMs) for the numerical solution of the following quasilinear elliptic boundary value problem:

$$-\nabla \cdot (\mu(\mathbf{x}, |\nabla u|) \nabla u) = f \qquad \text{in } \Omega, \qquad u = 0 \qquad \text{on } \Gamma. \tag{1}$$

Here, $\Omega \subset \mathbb{R}^2$ is a bounded polygon with a Lipschitz continuous boundary $\Gamma$, and $f \in \mathrm{L}^2(\Omega)$, where, for an open set $D \subseteq \Omega$, we signify by $\mathrm{L}^2(D)$ the space of all square integrable functions on $D$. Additionally, we assume that the nonlinearity $\mu$ satisfies the following assumptions: (A1) $\mu \in \mathrm{C}^0(\overline{\Omega} \times [0, \infty))$; (A2) there exist positive constants $m_\mu$, $M_\mu$ such that $m_\mu (t - s) \le \mu(\mathbf{x}, t)t - \mu(\mathbf{x}, s)s \le M_\mu (t - s)$, $t \ge s \ge 0$, $\mathbf{x} \in \overline{\Omega}$. We remark that, if $\mu$ satisfies (A2), there exist constants $\beta \ge \alpha > 0$ such that, for all vectors $\mathbf{v}, \mathbf{w} \in \mathbb{R}^2$ and all $\mathbf{x} \in \Omega$,

$$\begin{aligned} \left| \mu(\mathbf{x}, |\mathbf{v}|)\mathbf{v} - \mu(\mathbf{x}, |\mathbf{w}|)\mathbf{w} \right| &\le \beta |\mathbf{v} - \mathbf{w}|, \\ \alpha |\mathbf{v} - \mathbf{w}|^2 &\le \left( \mu(\mathbf{x}, |\mathbf{v}|)\mathbf{v} - \mu(\mathbf{x}, |\mathbf{w}|)\mathbf{w} \right) \cdot (\mathbf{v} - \mathbf{w}); \end{aligned} \tag{2}$$

P. Houston (✉)
School of Mathematical Sciences, University of Nottingham, Nottingham, UK
e-mail: Paul.Houston@nottingham.ac.uk

T. P. Wihler
Mathematics Institute, University of Bern, Bern, Switzerland
e-mail: wihler@math.unibe.ch

© The Author(s) 2020

S. J. Sherwin et al. (eds.), *Spectral and High Order Methods for Partial Differential Equations ICOSAHOM 2018*, Lecture Notes in Computational Science and Engineering 134, https://doi.org/10.1007/978-3-030-39647-3\_32

see [14, Lemma 2.1]. For ease of notation, in the sequel, we will simply write *μ(s)* instead of *μ(x, s)*, thereby suppressing the explicit dependence of *μ* on *x* ∈ *Ω*.
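Assumption (A2) and the resulting bounds (2) are easy to check numerically for a concrete nonlinearity. The sketch below uses the sample choice $\mu(t) = 2 + 1/(1+t)$ (our example, not from the paper); then $\frac{\mathrm{d}}{\mathrm{d}t}[\mu(t)t] = 2 + (1+t)^{-2} \in (2, 3]$, so (A2) holds with $m_\mu = 2$, $M_\mu = 3$, and (2) holds with $\alpha = 2$, $\beta = 3$:

```python
import numpy as np

def mu(t):
    # Sample nonlinearity (illustrative): d/dt [mu(t) t] lies in (2, 3].
    return 2.0 + 1.0 / (1.0 + t)

alpha, beta = 2.0, 3.0          # candidate constants in (2)
rng = np.random.default_rng(2)
for _ in range(1000):
    v, w = rng.standard_normal(2), rng.standard_normal(2)
    Fv = mu(np.linalg.norm(v)) * v      # F(v) = mu(|v|) v
    Fw = mu(np.linalg.norm(w)) * w
    d = v - w
    # Lipschitz continuity and strong monotonicity of F, cf. (2)
    assert np.linalg.norm(Fv - Fw) <= beta * np.linalg.norm(d) + 1e-9
    assert alpha * (d @ d) <= (Fv - Fw) @ d + 1e-9
```

For this radial field the Jacobian is symmetric with eigenvalues $\mu(t)$ and $\frac{\mathrm{d}}{\mathrm{d}t}[\mu(t)t]$, both in $(2, 3]$, which is why the sampled bounds hold with these constants.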

The weak formulation of (1) is to find $u \in \mathrm{H}^1_0(\Omega)$ such that

$$A(u; u, v) = (f, v)_{\mathrm{L}^2(\Omega)} \qquad \forall v \in \mathrm{H}^1_0(\Omega), \tag{3}$$

where, given $w \in \mathrm{H}^1_0(\Omega)$, we define the bilinear form $A(w; u, v) = \int_\Omega \mu(|\nabla w|) \nabla u \cdot \nabla v \, \mathrm{d}\mathbf{x}$, $u, v \in \mathrm{H}^1_0(\Omega)$, as well as the $\mathrm{L}^2(\Omega)$-inner product $(v, w)_{\mathrm{L}^2(\Omega)} = \int_\Omega v w \, \mathrm{d}\mathbf{x}$, $v, w \in \mathrm{L}^2(\Omega)$. Here, $\mathrm{H}^1_0(\Omega)$ is the standard Sobolev space of first order, with zero trace along $\Gamma$, equipped with the norm $\|v\|_{\mathrm{H}^1_0(\Omega)} = \|\nabla v\|_{\mathrm{L}^2(\Omega)}$, $v \in \mathrm{H}^1_0(\Omega)$. Under the assumptions (A1)-(A2) above, it is elementary to show that the form $A$ is strongly monotone and Lipschitz continuous in the sense that

$$A(u; u, u - v) - A(v; v, u - v) \ge \alpha \|u - v\|^2_{\mathrm{H}^1_0(\Omega)} \qquad \forall u, v \in \mathrm{H}^1_0(\Omega), \tag{4}$$

and

$$|A(u; u, v) - A(w; w, v)| \le \beta \|u - w\|_{\mathrm{H}^1_0(\Omega)} \|v\|_{\mathrm{H}^1_0(\Omega)} \qquad \forall u, v, w \in \mathrm{H}^1_0(\Omega),$$

respectively. From these properties, classical monotone operator theory implies existence and uniqueness of a solution of (3); see, e.g., [17, Theorem 3.3.23].

The exploitation of automatic adaptive *hp*-refinement algorithms has the potential to compute numerical solutions to partial differential equations (PDEs) in a highly efficient manner, often leading to exponential rates of convergence as the underlying finite element space is enriched; see, e.g., [11, 16]. The key tool required to design such strategies is the derivation of *a posteriori* estimates for the Galerkin discretization errors; in recent years such bounds have been extended to the context of linearization and/or linear solver errors, cf. [1, 2, 4, 5, 7, 9]. In the present article we consider the derivation of an *hp*-version *a posteriori* error bound for the DGFEM approximation of the second-order quasilinear elliptic PDE problem stated in (1). To this end, we employ the interior penalty DGFEM proposed in [10], cf. also [12], together with a discrete Kačanov iterative linearization scheme, cf. [6]. Based on the analysis undertaken in [12], together with the use of a suitable reconstruction operator, cf. [13, 15], we derive a fully computable bound for the error, measured in terms of a suitable DGFEM energy norm, which separately accounts for the three main sources of error: discretization, linearization, and linear solver errors. On the basis of this *a posteriori* bound, we design and implement an *hp*-adaptive refinement algorithm which automatically controls each of these error contributions as the underlying finite element space is enriched. Numerical experiments highlighting the practical performance of the proposed adaptive strategy are presented.
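As a concrete illustration of the iterative linearization ingredient, the sketch below applies a Kačanov-type fixed-point iteration to a 1-D analogue of (1), namely $-(\mu(|u'|)u')' = 1$ on $(0,1)$ with homogeneous Dirichlet data, discretized here by conforming P1 elements rather than the paper's *hp*-DGFEM; the nonlinearity $\mu(t) = 2 + 1/(1+t)$ and all discretization choices are our own:

```python
import numpy as np

def mu(t):
    # Sample nonlinearity (ours): non-increasing, satisfies (A1)-(A2).
    return 2.0 + 1.0 / (1.0 + t)

n = 50                          # number of P1 elements on (0, 1)
h = 1.0 / n
f = np.full(n - 1, h)           # load vector for f = 1

def solve_linearized(u):
    """One Kacanov step: freeze mu at the current gradient, solve linearly."""
    grad = np.diff(np.concatenate(([0.0], u, [0.0]))) / h   # element-wise u'
    coef = mu(np.abs(grad)) / h                             # element stiffness factor
    A = np.zeros((n - 1, n - 1))
    for e in range(n):                                      # assemble stiffness
        for i in (e - 1, e):
            for j in (e - 1, e):
                if 0 <= i < n - 1 and 0 <= j < n - 1:
                    A[i, j] += coef[e] * (1.0 if i == j else -1.0)
    return np.linalg.solve(A, f)

u = np.zeros(n - 1)
for _ in range(30):             # Kacanov iteration
    u_new = solve_linearized(u)
    if np.linalg.norm(u_new - u) < 1e-12:
        break
    u = u_new
```

Each step freezes $\mu$ at the previous gradient and solves one linear system; for non-increasing $\mu$ this iteration is known to converge, and in this mild example it stagnates at machine precision within a handful of steps.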

#### **2 Iterative Discontinuous Galerkin Methods**

#### *2.1 Discrete hp-Discontinuous Galerkin Spaces*

Let $\mathcal{T}_h$ be a partition of $\Omega$ into disjoint open and shape-regular elements $\kappa$ such that $\overline{\Omega} = \bigcup_{\kappa \in \mathcal{T}_h} \overline{\kappa}$. We assume that each $\kappa \in \mathcal{T}_h$ is an affine image of a given master element $\mathcal{B}_\kappa$, which is either the open triangle $\{(x, y) : -1 < x < 1,\ -1 < y < -x\}$ or the open square $(-1, 1)^2$ in $\mathbb{R}^2$. By $h_\kappa$ we denote the element diameter of $\kappa \in \mathcal{T}_h$, and $\mathbf{n}_\kappa$ signifies the unit outward normal vector to $\kappa$. We allow $\mathcal{T}_h$ to be *1-irregular*, i.e., each edge of any one element $\kappa \in \mathcal{T}_h$ contains at most one hanging node (which, for simplicity, we assume to be the midpoint of the corresponding edge). In this context, we suppose that $\mathcal{T}_h$ is *regularly reducible* (cf. [18, Section 7.1] and [12]), i.e., there exists a shape-regular conforming (regular) mesh $\widetilde{\mathcal{T}}_h$ (consisting of triangles and parallelograms) such that the closure of each element in $\mathcal{T}_h$ is a union of closures of elements of $\widetilde{\mathcal{T}}_h$, and that there exists a constant $C > 0$, independent of the element sizes, such that for any two elements $\kappa \in \mathcal{T}_h$ and $\widetilde{\kappa} \in \widetilde{\mathcal{T}}_h$ with $\widetilde{\kappa} \subseteq \kappa$ we have $h_\kappa / h_{\widetilde{\kappa}} \le C$. Note that these assumptions imply that $\mathcal{T}_h$ is of *bounded local variation*, i.e., there exists a constant $\rho_1 \ge 1$, independent of the element sizes, such that $\rho_1^{-1} \le h_\kappa / h_{\kappa'} \le \rho_1$ for any pair of elements $\kappa, \kappa' \in \mathcal{T}_h$ which share a common edge $e = \partial\kappa \cap \partial\kappa'$. Moreover, let us consider the set $\mathcal{E}$ of all one-dimensional open edges of all elements $\kappa \in \mathcal{T}_h$. Further, we denote by $\mathcal{E}_I$ the set of all edges $e \in \mathcal{E}$ that are contained in the open domain $\Omega$ (interior edges). Additionally, we introduce $\mathcal{E}_B$ to be the set of boundary edges consisting of all $e \in \mathcal{E}$ that are contained in $\Gamma$.

For any integer $p \in \mathbb{N}_0$, we denote by $\mathcal{P}_p(\kappa)$ the set of polynomials of total degree at most $p$ on $\kappa$. Similarly, when $\kappa$ is a quadrilateral, we also consider $\mathcal{Q}_p(\kappa)$, the set of all tensor-product polynomials on $\kappa$ of degree at most $p$ in each coordinate direction. To each $\kappa \in \mathcal{T}_h$ we assign a polynomial degree $p_\kappa$ (local approximation order). We collect the local polynomial degrees in a vector $\mathbf{p} = \{p_\kappa : \kappa \in \mathcal{T}_h\}$, and then introduce the *hp*-DGFEM space

$$\mathbb{V}_{\mathrm{DG}}(\mathcal{T}_h, \mathbf{p}) = \{ v \in \mathrm{L}^2(\Omega) \, : \, v|_\kappa \in \mathbb{S}_{p_\kappa}(\kappa) \quad \forall \kappa \in \mathcal{T}_h \},$$

with $\mathbb{S}$ being either $\mathcal{P}$ or $\mathcal{Q}$. We shall suppose that the polynomial degree vector $\mathbf{p}$, with $p_\kappa \ge 1$ for each $\kappa \in \mathcal{T}_h$, has *bounded local variation*, i.e., there exists a constant $\rho_2 \ge 1$, independent of the local element sizes and of $\mathbf{p}$, such that, for any pair of neighbouring elements $\kappa, \kappa' \in \mathcal{T}_h$, we have $\rho_2^{-1} \le p_\kappa / p_{\kappa'} \le \rho_2$.

We also define the $\mathrm{L}^2$-projection $\Pi_{\mathcal{T}_h, \mathbf{p}} : \mathrm{L}^2(\Omega) \to \mathbb{V}_{\mathrm{DG}}(\mathcal{T}_h, \mathbf{p})$ by

$$(\Pi_{\mathcal{T}_h, \mathbf{p}} v - v, w)_{\mathrm{L}^2(\Omega)} = 0 \quad \forall w \in \mathbb{V}_{\mathrm{DG}}(\mathcal{T}_h, \mathbf{p}).$$

Evidently, since functions in $\mathbb{V}_{\mathrm{DG}}(\mathcal{T}_h, \mathbf{p})$ need not be continuous, we have that $\Pi_{\kappa, p_\kappa} = \Pi_{\mathcal{T}_h, \mathbf{p}}|_\kappa$, where, for $\kappa \in \mathcal{T}_h$, we let $\Pi_{\kappa, p_\kappa}$ be the $\mathrm{L}^2$-projection onto $\mathbb{S}_{p_\kappa}(\kappa)$.
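Because the space is discontinuous, the global projection decouples into independent local projections, which can be seen in one dimension with a few lines of numpy (the mesh, the per-element degrees, and the use of a Legendre basis are our illustrative choices):

```python
import numpy as np

f = np.sin                                     # function to project
edges = np.linspace(0.0, 1.0, 5)               # mesh T_h with 4 elements
degrees = [1, 2, 2, 3]                         # degree vector p = {p_kappa}
xq, wq = np.polynomial.legendre.leggauss(8)    # Gauss rule on [-1, 1]

def local_projection(a, b, p):
    """Legendre coefficients of the L2(a,b)-projection of f onto degree p."""
    fx = f(0.5 * (a + b) + 0.5 * (b - a) * xq)
    # (f, L_k) / (L_k, L_k) on the reference element; int L_k^2 = 2/(2k+1)
    return np.array([
        (2 * k + 1) / 2.0
        * np.sum(wq * fx * np.polynomial.legendre.Legendre.basis(k)(xq))
        for k in range(p + 1)
    ])

projections = [local_projection(edges[e], edges[e + 1], degrees[e])
               for e in range(4)]

# Defining property, element by element: (Pi_kappa f - f, L_k) = 0, k <= p_kappa.
for e in range(4):
    fx = f(0.5 * (edges[e] + edges[e + 1]) + 0.5 * np.diff(edges)[e] * xq)
    pf = np.polynomial.legendre.Legendre(projections[e])(xq)
    for k in range(degrees[e] + 1):
        Lk = np.polynomial.legendre.Legendre.basis(k)(xq)
        assert abs(np.sum(wq * (pf - fx) * Lk)) < 1e-10
```

No global linear system is needed: each element's coefficients are computed from local integrals only, exactly mirroring $\Pi_{\kappa, p_\kappa} = \Pi_{\mathcal{T}_h, \mathbf{p}}|_\kappa$.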

#### *2.2 Nonlinear hp-DGFEM Formulation*

Let $\kappa$ and $\kappa'$ be two adjacent elements of $\mathcal{T}_h$, and $\mathbf{x}$ an arbitrary point on the interior edge $e \in \mathcal{E}_I$ given by $e = (\partial\kappa \cap \partial\kappa')^\circ$. Furthermore, let $v$ and $\mathbf{q}$ be scalar- and vector-valued functions, respectively, that are sufficiently smooth inside each element $\kappa, \kappa'$. Then, the averages of $v$ and $\mathbf{q}$ at $\mathbf{x} \in e$ are given by $\langle\!\langle v \rangle\!\rangle = \frac{1}{2}(v|_\kappa + v|_{\kappa'})$ and $\langle\!\langle \mathbf{q} \rangle\!\rangle = \frac{1}{2}(\mathbf{q}|_\kappa + \mathbf{q}|_{\kappa'})$, respectively. Similarly, the jumps of $v$ and $\mathbf{q}$ at $\mathbf{x} \in e$ are given by $[\![v]\!] = v|_\kappa \mathbf{n}_\kappa + v|_{\kappa'} \mathbf{n}_{\kappa'}$ and $[\![\mathbf{q}]\!] = \mathbf{q}|_\kappa \cdot \mathbf{n}_\kappa + \mathbf{q}|_{\kappa'} \cdot \mathbf{n}_{\kappa'}$, respectively. On a boundary edge $e \in \mathcal{E}_B$, we set $\langle\!\langle v \rangle\!\rangle = v$, $\langle\!\langle \mathbf{q} \rangle\!\rangle = \mathbf{q}$ and $[\![v]\!] = v\mathbf{n}$, with $\mathbf{n}$ denoting the unit outward normal vector on the boundary $\Gamma$.

Furthermore, we introduce the edge functions $\mathsf{h}, \mathsf{p} \in \mathrm{L}^\infty(\mathcal{E})$, which, for an edge $e \in \mathcal{E}$, are given by $\mathsf{h}|_e := h_e$ and $\mathsf{p}|_e := p|_e$, with $h_e$ denoting the length of $e$. In addition, we define the discontinuity-penalisation function $\sigma \in \mathrm{L}^\infty(\mathcal{E})$ given by $\sigma = \gamma \mathsf{p}^2 \mathsf{h}^{-1}$, where $\gamma \ge 1$ is a (sufficiently large) constant. Then, we equip the DGFEM space $\mathbb{V}_{\mathrm{DG}}(\mathcal{T}_h, \mathbf{p})$ with the DGFEM norm $\|v\|^2_{\mathrm{DG}} := \|\nabla_{\mathcal{T}_h} v\|^2_{\mathrm{L}^2(\Omega)} + \int_{\mathcal{E}} \sigma |[\![v]\!]|^2 \, \mathrm{d}s$, $v \in \mathbb{V}_{\mathrm{DG}}(\mathcal{T}_h, \mathbf{p})$, where $\nabla_{\mathcal{T}_h}$ is the element-wise gradient operator.
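For a concrete 1-D analogue (where edges reduce to points, so the edge integral becomes a point evaluation), the DG norm of a discontinuous piecewise-linear function can be computed directly; the mesh, the sample coefficients, and the averaging of the mesh size across an interface are all our illustrative choices:

```python
import numpy as np

gamma = 10.0                                    # penalty constant gamma >= 1
edges = np.array([0.0, 0.25, 0.5, 0.75, 1.0])   # 4 elements
p = np.array([1, 1, 1, 1])                      # local degrees p_kappa
a = np.array([1.0, -2.0, 0.5, 3.0])             # v|_kappa(x) = a x + b,
b = np.array([0.0, 1.0, -0.2, -2.0])            # deliberately discontinuous
h = np.diff(edges)

def trace(e, x):                                # trace of v from element e
    return a[e] * x + b[e]

grad_part = np.sum(a**2 * h)                    # sum_kappa ||v'||^2_{L2(kappa)}

jump_part = 0.0
for e in range(3):                              # interior "edges" (nodes)
    xe = edges[e + 1]
    he = 0.5 * (h[e] + h[e + 1])                # local h at the interface (our choice)
    pe = max(p[e], p[e + 1])
    sigma = gamma * pe**2 / he                  # sigma = gamma p^2 h^{-1}
    jump_part += sigma * (trace(e, xe) - trace(e + 1, xe))**2
# boundary edges: [[v]] = v n, i.e. the boundary trace itself
jump_part += gamma * p[0]**2 / h[0] * trace(0, edges[0])**2
jump_part += gamma * p[-1]**2 / h[-1] * trace(3, edges[-1])**2

dg_norm = np.sqrt(grad_part + jump_part)
```

For a continuous function vanishing on the boundary the jump contribution disappears and the DG norm collapses to the broken $\mathrm{H}^1$-seminorm, which is why $\|\cdot\|_{\mathrm{DG}}$ is a genuine norm on the discontinuous space.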

With this notation, following [10], we introduce the interior penalty DGFEM discretization of (3): find $u_{\mathrm{DG}} \in \mathbb{V}_{\mathrm{DG}}(\mathcal{T}_h, \mathbf{p})$ such that

$$A_{\mathrm{DG}}(u_{\mathrm{DG}}; u_{\mathrm{DG}}, v) = (f, v)_{\mathrm{L}^2(\Omega)} \qquad \forall v \in \mathbb{V}_{\mathrm{DG}}(\mathcal{T}_h, \mathbf{p}), \tag{5}$$

where, for given $w \in \mathbb{V}_{\mathrm{DG}}(\mathcal{T}_h, \mathbf{p})$, we define the DGFEM bilinear form

$$\begin{aligned} A_{\mathrm{DG}}(w; u, v) &= \int_\Omega \mu(|\nabla_{\mathcal{T}_h} w|)\,\nabla_{\mathcal{T}_h} u \cdot \nabla_{\mathcal{T}_h} v \,\mathrm{d}x \\ &\quad - \int_{\mathcal{E}} \langle \mu(|\nabla_{\mathcal{T}_h} w|)\,\nabla_{\mathcal{T}_h} u \rangle \cdot [\![v]\!] \,\mathrm{d}s + \theta \int_{\mathcal{E}} \langle \mu(\mathsf{h}^{-1}|[\![w]\!]|)\,\nabla_{\mathcal{T}_h} v \rangle \cdot [\![u]\!] \,\mathrm{d}s \\ &\quad + \int_{\mathcal{E}} \sigma\,[\![u]\!]\cdot[\![v]\!] \,\mathrm{d}s, \qquad u, v \in V_{\mathrm{DG}}(\mathcal{T}_h, \boldsymbol{p}), \end{aligned}$$

where $\theta \in [-1, 1]$ is a method parameter. Referring to [10, Theorem 2.5], provided that $\gamma \geq 1$ is chosen sufficiently large (independently of the local element sizes and of the polynomial degree distribution), the existence and uniqueness of the DGFEM solution $u_{\mathrm{DG}} \in V_{\mathrm{DG}}(\mathcal{T}_h, \boldsymbol{p})$ satisfying (5) is guaranteed.

**Assumption 1** *In the sequel, we suppose that there exists a computable* a posteriori *error estimate of the form* $\|u - u_{\mathrm{DG}}\|_{\mathrm{DG}} \leq \eta(u_{\mathrm{DG}}, f)$*, where* $u \in H^1_0(\Omega)$ *is the solution of* (1)*, and* $u_{\mathrm{DG}}$ *is its* hp*-DGFEM approximation defined in* (5)*.*

*Remark 1* It was proved in [12, Theorem 3.2] that such a bound does indeed exist. More precisely, we have that

$$\|u - u_{\mathrm{DG}}\|_{\mathrm{DG}} \leq C \left( \sum_{\kappa \in \mathcal{T}_h} \eta_\kappa^2 + \mathcal{O}(f, u_{\mathrm{DG}}) \right)^{1/2} =: \eta(u_{\mathrm{DG}}, f), \tag{6}$$

where the local error indicators $\eta_\kappa$, $\kappa \in \mathcal{T}_h$, are defined by

$$\begin{aligned} \eta_\kappa^2 :={}& h_\kappa^2 p_\kappa^{-2} \left\| \Pi_{\mathcal{T}_h, \boldsymbol{p}-1} \big( f + \nabla\cdot(\mu(|\nabla u_{\mathrm{DG}}|)\,\nabla u_{\mathrm{DG}}) \big) \right\|_{L^2(\kappa)}^2 \\ &+ h_\kappa p_\kappa^{-1} \left\| \Pi_{\mathcal{E}, \boldsymbol{p}-1} \big( [\![\mu(|\nabla_{\mathcal{T}_h} u_{\mathrm{DG}}|)\,\nabla_{\mathcal{T}_h} u_{\mathrm{DG}}]\!] \big) \right\|_{0,\,\partial\kappa\setminus\Gamma}^2 + \gamma^2 h_\kappa^{-1} p_\kappa^3 \left\| [\![u_{\mathrm{DG}}]\!] \right\|_{L^2(\partial\kappa)}^2, \end{aligned} \tag{7}$$

and $\mathcal{O}(f, u_{\mathrm{DG}}) := \sum_{\kappa \in \mathcal{T}_h} \mathcal{O}^{(1)}_\kappa + \sum_{e \in \mathcal{E}_I} \mathcal{O}^{(2)}_e$ is a data oscillation term. For $\kappa \in \mathcal{T}_h$ and $e \in \mathcal{E}_I$, we have $\mathcal{O}^{(1)}_\kappa := h_\kappa^2 p_\kappa^{-2} \big\| (\mathrm{I} - \Pi_{\mathcal{T}_h, \boldsymbol{p}-1})|_\kappa \big( f + \nabla\cdot(\mu(|\nabla u_{\mathrm{DG}}|)\,\nabla u_{\mathrm{DG}}) \big) \big\|_{0,\kappa}^2$ and $\mathcal{O}^{(2)}_e := h_e \bar p_e^{-1} \big\| (\mathrm{I} - \Pi_{\mathcal{E}, \boldsymbol{p}-1})|_e \big( [\![\mu(|\nabla_{\mathcal{T}_h} u_{\mathrm{DG}}|)\,\nabla_{\mathcal{T}_h} u_{\mathrm{DG}}]\!] \big) \big\|_{0,e}^2$, where $\mathrm{I}$ denotes a generic identity operator. Here, we write $\boldsymbol{p} - 1 := \{p_\kappa - 1\}_{\kappa \in \mathcal{T}_h}$. Additionally, we denote by $\Pi_{\mathcal{E}, \boldsymbol{p}-1}|_e$ the $L^2$-projection onto $\mathbb{P}_{\bar p_e - 1}(e)$, where we let $\bar p_e = \max\{p_{\kappa^\sharp}, p_{\kappa^\flat}\}$, with $\kappa^\sharp, \kappa^\flat \in \mathcal{T}_h$, $e = \partial\kappa^\sharp \cap \partial\kappa^\flat$. Moreover, $C > 0$ in (6) is a constant that is independent of the local element sizes, the polynomial degree vector $\boldsymbol{p}$, and the parameters $\gamma$ and $\theta$.

#### *2.3 Iterative DGFEM*

In order to provide a practical solution scheme for the nonlinear *hp*-DGFEM system (5), we propose a linearization approach based on a discrete *Kačanov fixed point iteration*, see, e.g., [6]. To this end, we begin by selecting an initial guess $u^0_{\mathrm{DG}} \in V_{\mathrm{DG}}(\mathcal{T}_h, \boldsymbol{p})$. Then, for $n \geq 1$, given $u^{n-1}_{\mathrm{DG}} \in V_{\mathrm{DG}}(\mathcal{T}_h, \boldsymbol{p})$, we solve the *linear hp-DGFEM formulation*, defined by

$$A_{\mathrm{DG}}(u^{n-1}_{\mathrm{DG}}; u^n_{\mathrm{DG}}, v) = (f, v)_{L^2(\Omega)} \qquad \forall v \in V_{\mathrm{DG}}(\mathcal{T}_h, \boldsymbol{p}), \tag{8}$$

for $u^n_{\mathrm{DG}} \in V_{\mathrm{DG}}(\mathcal{T}_h, \boldsymbol{p})$. We emphasize that, in actual computations, the linear system (8) may be solved by an iterative algorithm, thereby generating an approximate numerical solution $\widehat u^n_{\mathrm{DG}} \in V_{\mathrm{DG}}(\mathcal{T}_h, \boldsymbol{p})$, with $\widehat u^n_{\mathrm{DG}} \approx u^n_{\mathrm{DG}}$. This means that, in practice, instead of computing the sequence $\{u^n_{\mathrm{DG}}\}_{n \geq 0}$ obtained from the iteration (8), an inexact sequence $\{\widehat u^n_{\mathrm{DG}}\}_{n \geq 0}$ is generated such that

$$A_{\mathrm{DG}}(\widehat u^{n-1}_{\mathrm{DG}}; \widehat u^n_{\mathrm{DG}}, v) \approx (f, v)_{L^2(\Omega)} \qquad \forall v \in V_{\mathrm{DG}}(\mathcal{T}_h, \boldsymbol{p}). \tag{9}$$
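To make the structure of the Kačanov iteration concrete, the following is a minimal sketch on a one-dimensional finite-difference analogue of the problem: the nonlinear coefficient is frozen at the previous iterate and a linear tridiagonal system is solved at each step. The coefficient, mesh size and stopping rule below are illustrative assumptions; this is not the *hp*-DGFEM formulation itself.

```python
import numpy as np

def mu(t):
    # illustrative nonlinear diffusion coefficient mu(|u'|); this choice
    # mirrors the coefficient used in Example 1 of Sect. 3
    return 2.0 + 1.0 / (1.0 + t)

def kacanov_1d(f, N=64, tol=1e-10, max_iter=100):
    """Kacanov fixed-point iteration for -(mu(|u'|) u')' = f on (0, 1)
    with u(0) = u(1) = 0, on a uniform finite-difference grid."""
    h = 1.0 / N
    x = np.linspace(0.0, 1.0, N + 1)
    u = np.zeros(N + 1)                        # initial guess u^0 = 0
    b = f(x[1:-1]) * h                         # scaled load vector
    for _ in range(max_iter):
        du = np.diff(u) / h                    # cellwise gradient of u^{n-1}
        w = mu(np.abs(du)) / h                 # frozen coefficient per cell
        # assemble the tridiagonal stiffness matrix of the *linear* problem
        A = (np.diag(w[:-1] + w[1:])
             - np.diag(w[1:-1], 1) - np.diag(w[1:-1], -1))
        u_new = np.zeros_like(u)
        u_new[1:-1] = np.linalg.solve(A, b)    # one linear solve per step
        if np.max(np.abs(u_new - u)) < tol:    # crude linearization check
            return u_new
        u = u_new
    return u
```

In the full algorithm the stopping decision is not a simple increment check, but is instead steered by the computable residuals introduced below.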

From a mathematical viewpoint, this (inexact) iterative linearization DGFEM approach gives rise to three different sources of error:

1. *Discretization error*, which is expressed by the residual

$$\rho^n_{\mathrm{DG}} := A_{\mathrm{DG}}(\widehat u^n_{\mathrm{DG}}; \widehat u^n_{\mathrm{DG}}, \cdot) - (f, \cdot)_{L^2(\Omega)}. \tag{10}$$

2. *Linearization error*, which is given in terms of the residual $\psi^n_{\mathrm{DG}} \in V_{\mathrm{DG}}(\mathcal{T}_h, \boldsymbol{p})$:

$$(\psi^n_{\mathrm{DG}}, v)_{L^2(\Omega)} := A_{\mathrm{DG}}(\widehat u^n_{\mathrm{DG}}; \widehat u^n_{\mathrm{DG}}, v) - A_{\mathrm{DG}}(\widehat u^{n-1}_{\mathrm{DG}}; \widehat u^n_{\mathrm{DG}}, v) \qquad \forall v \in V_{\mathrm{DG}}(\mathcal{T}_h, \boldsymbol{p}). \tag{11}$$

We observe that, if (1) is linear, then we immediately obtain $\psi^n_{\mathrm{DG}} = 0$.

3. *Linear solver error*, which is described by a residual $\lambda^n_{\mathrm{DG}} \in V_{\mathrm{DG}}(\mathcal{T}_h, \boldsymbol{p})$:

$$(\lambda^n_{\mathrm{DG}}, v)_{L^2(\Omega)} := A_{\mathrm{DG}}(\widehat u^{n-1}_{\mathrm{DG}}; \widehat u^n_{\mathrm{DG}}, v) - (f, v)_{L^2(\Omega)} \qquad \forall v \in V_{\mathrm{DG}}(\mathcal{T}_h, \boldsymbol{p}). \tag{12}$$

Note that, if (8) is solved exactly, then we have $\widehat u^{n-1}_{\mathrm{DG}} = u^{n-1}_{\mathrm{DG}}$ and $\widehat u^n_{\mathrm{DG}} = u^n_{\mathrm{DG}}$, and it follows that $\lambda^n_{\mathrm{DG}} = 0$.

*Remark 2* Since functions in $V_{\mathrm{DG}}(\mathcal{T}_h, \boldsymbol{p})$ need not be continuous along element interfaces, the linearization and linear solver residuals $\psi^n_{\mathrm{DG}}$ and $\lambda^n_{\mathrm{DG}}$, respectively, can be computed elementwise, i.e., in parallel, and hence at a low computational cost.

The aim of the analysis in the following section is to investigate the above residuals, and then to provide a computable *a posteriori* error estimate for the error $\|u - \widehat u^n_{\mathrm{DG}}\|_{\mathrm{DG}}$ between the solution $u$ of (1) and $\widehat u^n_{\mathrm{DG}} \in V_{\mathrm{DG}}(\mathcal{T}_h, \boldsymbol{p})$.

#### *2.4* **A Posteriori** *Error Estimation*

In order to bound the residual $\rho^n_{\mathrm{DG}}$ in (10), we apply an elliptic reconstruction technique along the lines of the works [13, 15], see also [7]. Specifically, we define an auxiliary function $\widetilde u^n \in H^1_0(\Omega)$ to be the unique solution of the weak formulation

$$A(\widetilde u^n; \widetilde u^n, v) = (f + \psi^n_{\mathrm{DG}} + \lambda^n_{\mathrm{DG}}, v)_{L^2(\Omega)} \qquad \forall v \in H^1_0(\Omega),$$

where $\psi^n_{\mathrm{DG}}$ and $\lambda^n_{\mathrm{DG}}$ are the linearization and linear solver residuals from (11) and (12), respectively. Upon adding (11) and (12), we notice that

$$A_{\mathrm{DG}}(\widehat u^n_{\mathrm{DG}}; \widehat u^n_{\mathrm{DG}}, v) = (f + \psi^n_{\mathrm{DG}} + \lambda^n_{\mathrm{DG}}, v)_{L^2(\Omega)} \qquad \forall v \in V_{\mathrm{DG}}(\mathcal{T}_h, \boldsymbol{p}).$$

In particular, we observe that $\widehat u^n_{\mathrm{DG}}$ is the DGFEM approximation of $\widetilde u^n$ based on employing the (nonlinear) DGFEM scheme defined in (5). Hence, we may exploit the *a posteriori* error estimate in Assumption 1 to infer the computable bound

$$\|\widetilde u^n - \widehat u^n_{\mathrm{DG}}\|_{\mathrm{DG}} \leq \eta(\widehat u^n_{\mathrm{DG}}, f + \psi^n_{\mathrm{DG}} + \lambda^n_{\mathrm{DG}}). \tag{13}$$

We now turn to bounding the elliptic reconstruction error $u - \widetilde u^n \in H^1_0(\Omega)$; to this end, we first observe that $\|u - \widetilde u^n\|_{\mathrm{DG}} = \|u - \widetilde u^n\|_{H^1_0(\Omega)}$. Then, employing the strong monotonicity property (4), and recalling the weak formulation (3), we obtain

$$\begin{aligned} \alpha\, \|u - \widetilde u^n\|^2_{\mathrm{DG}} &\leq A(u; u, u - \widetilde u^n) - A(\widetilde u^n; \widetilde u^n, u - \widetilde u^n) \\ &= -(\psi^n_{\mathrm{DG}}, u - \widetilde u^n)_{L^2(\Omega)} - (\lambda^n_{\mathrm{DG}}, u - \widetilde u^n)_{L^2(\Omega)}. \end{aligned}$$

Employing the Cauchy–Schwarz inequality, together with the Poincaré–Friedrichs inequality, $\|v\|_{L^2(\Omega)} \leq C_{\mathrm{PF}} \|\nabla v\|_{L^2(\Omega)}$ for all $v \in H^1_0(\Omega)$, where $C_{\mathrm{PF}} > 0$ is a constant, we deduce that

$$\|u - \widetilde u^n\|_{\mathrm{DG}} \leq \Psi^n_{\mathrm{DG}} + \Lambda^n_{\mathrm{DG}}, \tag{14}$$

where the linearization and linear solver residuals are given, respectively, by

$$\Psi^n_{\mathrm{DG}} := \frac{C_{\mathrm{PF}}}{\alpha} \left( \sum_{\kappa \in \mathcal{T}_h} \left\| \psi^n_{\mathrm{DG}} \right\|^2_{L^2(\kappa)} \right)^{1/2}, \qquad \Lambda^n_{\mathrm{DG}} := \frac{C_{\mathrm{PF}}}{\alpha} \left( \sum_{\kappa \in \mathcal{T}_h} \left\| \lambda^n_{\mathrm{DG}} \right\|^2_{L^2(\kappa)} \right)^{1/2}.$$

Summarizing the above analysis leads to the following result.

**Theorem 1** *Suppose that Assumption 1 is satisfied. Then, given a sequence of (possibly inexact) DGFEM approximations* $\{\widehat u^n_{\mathrm{DG}}\}_{n \geq 0} \subset V_{\mathrm{DG}}(\mathcal{T}_h, \boldsymbol{p})$*, cf.* (9)*, for* $n \geq 1$*, the following* a posteriori *error bound holds:*

$$\|u - \widehat u^n_{\mathrm{DG}}\|_{\mathrm{DG}} \leq \eta(\widehat u^n_{\mathrm{DG}}, f + \psi^n_{\mathrm{DG}} + \lambda^n_{\mathrm{DG}}) + \Psi^n_{\mathrm{DG}} + \Lambda^n_{\mathrm{DG}}.$$

*Here, $u$ is the analytical solution of* (1)*, $\psi^n_{\mathrm{DG}}$ and $\lambda^n_{\mathrm{DG}}$ are the residuals defined in* (11) *and* (12)*, respectively, and $\alpha > 0$ is the constant occurring in* (2) *and* (4)*.*

*Proof* The result follows immediately upon application of the triangle inequality, i.e., $\|u - \widehat u^n_{\mathrm{DG}}\|_{\mathrm{DG}} \leq \|u - \widetilde u^n\|_{\mathrm{DG}} + \|\widetilde u^n - \widehat u^n_{\mathrm{DG}}\|_{\mathrm{DG}}$, and inserting the bounds (13) and (14).

*Remark 3* We note that the above analysis naturally applies to other finite element schemes, provided that Assumption 1 is satisfied.

#### *2.5 Adaptive Iterative hp-DGFEM Procedure*

In this section we introduce an automatic *hp*-refinement algorithm which ensures that each of the three components of the error, namely discretization, linearization, and linear solver, is controlled in a suitable fashion. To this end, we propose the following strategy, cf. [9].

**Algorithm 1** *Given a (coarse) starting mesh $\mathcal{T}_h$, with an associated (low-order) polynomial degree distribution $\boldsymbol{p}$, and an initial guess $\widehat u^0_{\mathrm{DG}} \in V_{\mathrm{DG}}(\mathcal{T}_h, \boldsymbol{p})$. Set $n \leftarrow 1$.*


$$
\Psi^n_{\mathrm{DG}} + \Lambda^n_{\mathrm{DG}} \leq \Upsilon\, \eta(\widehat u^n_{\mathrm{DG}}, f + \psi^n_{\mathrm{DG}} + \lambda^n_{\mathrm{DG}}) \tag{15}
$$

*holds, for some given parameter $\Upsilon > 0$, then hp-adaptively refine the space $V_{\mathrm{DG}}(\mathcal{T}_h, \boldsymbol{p})$; go back to step (1) with the new mesh $\mathcal{T}_h$ (and based on the previously computed solution $\widehat u^n_{\mathrm{DG}}$ interpolated on the refined mesh).*


In Step 2 of Algorithm 1, if (15) is fulfilled, then the space $V_{\mathrm{DG}}(\mathcal{T}_h, \boldsymbol{p})$ is adaptively *hp*-refined based on first marking elements for refinement according to the size of the local error indicators $\eta_\kappa$, cf. (7). To this end, we exploit the maximal strategy, whereby those elements are marked for refinement which satisfy the condition $\eta_\kappa > \frac{1}{3} \max_{\kappa' \in \mathcal{T}_h} \eta_{\kappa'}$. Secondly, once an element $\kappa \in \mathcal{T}_h$ has been marked for refinement, we undertake either local mesh subdivision or local polynomial enrichment based on employing the *hp*-refinement criterion developed within the article [8]. Finally, when (15) is not fulfilled, rather than determining which source of error, i.e., the (computable) quantities $\Psi^n_{\mathrm{DG}}$ or $\Lambda^n_{\mathrm{DG}}$ from (11) and (12), respectively, is dominant, we choose to always undertake a further linearization step, and hence a further linear solver step is also computed, since this ensures that the most up-to-date approximation $\widehat u^n_{\mathrm{DG}}$ is employed at all times.
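The marking step described above can be sketched as follows; the dictionary of indicators is a hypothetical stand-in for the per-element data, but the threshold rule is exactly the maximal strategy $\eta_\kappa > \frac{1}{3} \max_{\kappa'} \eta_{\kappa'}$.

```python
def mark_elements(eta, frac=1.0 / 3.0):
    """Maximal marking strategy: flag every element whose local error
    indicator exceeds `frac` times the largest indicator."""
    threshold = frac * max(eta.values())
    return {elem for elem, indicator in eta.items() if indicator > threshold}
```

Each marked element is then either subdivided (*h*-refinement) or has its local polynomial degree raised (*p*-refinement), according to the criterion of [8].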

#### **3 Application to Quasilinear Elliptic PDEs**

In this section we present numerical experiments to highlight the performance of the proposed iterative *hp*-refinement procedure outlined in Algorithm 1. To this end, we set the interior penalty constant $\gamma = 10$ and the steering parameter $\Upsilon = 1/4$. The solution of the resulting set of linear equations is computed using an ILU(0)-preconditioned GMRES algorithm.

For the first numerical experiment, we let $\Omega = (0, 1)^2$ and define the nonlinear coefficient as $\mu(|\nabla u|) = 2 + (1 + |\nabla u|)^{-1}$. The right-hand forcing function $f$ is selected so that the analytical solution to (1) is given by $u(x, y) = x(1 - x)\,y(1 - y)(1 - 2y)\,\mathrm{e}^{-20(2x - 1)^2}$. In Fig. 1 we present a comparison of the actual error, measured in terms of the energy norm, versus the square root of the number of degrees of freedom in $V_{\mathrm{DG}}(\mathcal{T}_h, \boldsymbol{p})$. From Fig. 1a we clearly observe exponential convergence of the proposed *hp*-refinement strategy as $V_{\mathrm{DG}}(\mathcal{T}_h, \boldsymbol{p})$ is enriched. Furthermore, in Fig. 1b we plot the individual residual error indicators; for this smooth problem, we notice that the discretization indicator (denoted by $\eta^n$ in the figure) is always dominant, while the linearization and linear solver residuals (denoted by $\Psi^n$ and $\Lambda^n$, respectively) are roughly of the same magnitude.

Secondly, we let $\Omega$ denote the L-shaped domain $(-1, 1)^2 \setminus [0, 1) \times (-1, 0] \subset \mathbb{R}^2$ and select $\mu(|\nabla u|) = 1 + \exp(-|\nabla u|^2)$. Writing $(r, \varphi)$ to denote the system of polar coordinates, we choose the forcing function $f$ and an inhomogeneous boundary condition such that the analytical solution to (1) is $u = r^{2/3} \sin(\tfrac{2}{3}\varphi)$, cf. [3]. In Fig. 2 we now present a comparison of the actual error, measured in terms of the energy norm, versus the third root of the number of degrees of freedom in $V_{\mathrm{DG}}(\mathcal{T}_h, \boldsymbol{p})$; as before, we again attain exponential convergence of the proposed

**Fig. 1** Example 1. (**a**) Comparison of the DGFEM norm of the error and the *a posteriori* bound, with respect to the square root of the number of degrees of freedom; (**b**) individual error estimators

**Fig. 2** Example 2. (**a**) Comparison of the DGFEM norm of the error and the *a posteriori* bound, with respect to the third root of the number of degrees of freedom; (**b**) individual error estimators

*hp*-refinement strategy as $V_{\mathrm{DG}}(\mathcal{T}_h, \boldsymbol{p})$ is adaptively refined, though convergence of the *a posteriori* error estimator is no longer monotonic. Indeed, from Fig. 2b, we observe that once an *hp*-mesh refinement has been undertaken, several linearization/solver steps may be required to ensure that the numerical solution has been computed to a sufficient accuracy before further refinements may be undertaken.

#### **4 Conclusions**

In this article we have derived a computable *hp*-version *a posteriori* error bound for the DGFEM approximation of a second-order quasilinear elliptic PDE problem, whereby a discrete Kačanov iterative linearization scheme is employed. The resulting computable upper bound directly takes into account the discretization error, as well as the errors stemming from the linearization and the underlying linear solver. Numerical experiments highlighting the performance of this bound within an automatic *hp*-refinement algorithm are presented.

**Acknowledgements** TW acknowledges the support of the Swiss National Science Foundation (SNF), Grant No. 200021\_162990.

#### **References**



## **Erosion Wear Evaluation Using Nektar++**

**Manuel F. Mejía, Douglas Serson, Rodrigo C. Moura, Bruno S. Carmo, Jorge Escobar-Vargas, and Andrés González-Mancera**

### **1 Introduction**

Wear is a common phenomenon in many machines and devices; it is characterised by the removal or loss of material. Erosion wear is a particular wear process which occurs when solid particles or droplets, carried by a fluid (liquid or gas), impact on a solid surface [1]. Turbomachinery such as pumps and turbines, and pipe accessories (i.e. tees, elbows, nozzles, valves), are examples of elements affected by erosion wear, which decreases their performance and lifetime. In many industrial sectors, e.g. energy, mining, and oil & gas, massive amounts of resources are used for the maintenance and replacement of affected parts [2–4]. Although this phenomenon has been broadly investigated [5–14], there are still unsolved challenges in establishing the influence of small eddies during the erosion process, leading to modest accuracy levels in the simulation results.

M. F. Mejía (✉)
Universidad de Los Andes, Bogotá, D.C., Colombia

Universidad Central, Bogotá, D.C., Colombia
e-mail: mf.mejia@uniandes.edu.co

D. Serson · B. S. Carmo
Universidade de São Paulo (USP), São Paulo, Brazil

R. C. Moura
Instituto Tecnológico de Aeronáutica, São Paulo, Brazil

J. Escobar-Vargas
Pontificia Universidad Javeriana, Bogotá, D.C., Colombia

A. González-Mancera
Universidad de Los Andes, Bogotá, D.C., Colombia

© The Author(s) 2020 S. J. Sherwin et al. (eds.), *Spectral and High Order Methods for Partial Differential Equations ICOSAHOM 2018*, Lecture Notes in Computational Science and Engineering 134, https://doi.org/10.1007/978-3-030-39647-3\_33

Due to the microscopic nature of erosion, the smallest scales in the flow play a fundamental role in the complete process. One of the aspects which has not been carefully studied in erosion wear modelling is the effect that smaller eddies and secondary flows have on the interaction of the particles with the surface. In general, these secondary flows cannot be represented using linear Reynolds-Averaged Navier–Stokes (RANS) simulations, mainly because the Reynolds stress imbalance is neglected and the secondary flow does not develop. As mentioned by Gross and Fasel [15], predictions of the secondary flow require non-linear Reynolds stress models, full Reynolds-stress models, Large Eddy Simulation (LES) or Direct Numerical Simulation (DNS). Due to their relatively low computational cost, RANS models are often used to predict erosion with CFD in industrial simulations. The inclusion of smaller eddies and secondary flows in the simulation could be a major breakthrough in the modelling of the erosion process. In order to capture the physics related to the small eddies and secondary flows accurately, a numerical technique capable of representing those processes is needed. As emphasised by Jacobs [16], the use of spectral methods could allow increased accuracy in the simulation due to the potential to simulate a wider range of scales. With this in mind, the purpose of this work is to assess the impact of higher resolution methods on the prediction of erosion wear rate and distribution.

#### *1.1 Spectral Methods*

Several numerical techniques are used to solve the Navier–Stokes (NS) equations, among them finite differences, finite volumes and finite elements. Nevertheless, when high accuracy is required, a large number of elements is needed in the modelling, which significantly increases the computational cost [17, 18]. Hence, novel methods are a subject of research, aiming to offer a better balance between accuracy and computational cost.

Among the novel numerical methods considered nowadays are spectral methods, which have been shown to be a powerful tool with a high level of accuracy for solving large problems in computational fluid dynamics (CFD), according to the available literature, especially in the studies developed by Boyd [19], Canuto et al. [20–22], Trefethen [23, 24] and Sherwin [25, 26]. Nektar++ is an open-source software framework designed to support the development of high-performance scalable solvers for partial differential equations using the spectral/*hp* element method [27]. High-order CFD methods have been receiving considerable attention in the past two decades, and traditional CFD software could be replaced by high-order codes in many applications within a few years [28].

#### *1.2 Particles Tracking*

To the best of the authors' knowledge, there is no work that uses high-order methods to evaluate erosion wear rate. This research aims to assess the impact of higher resolution methods on the prediction of erosion wear rate and distribution. It comprises the solution of the fluid flow using an incompressible NS solver with implicit LES modelling, the implementation of a Lagrangian particle tracking model, and the subsequent data processing through traditional erosion rate models, but using the available high-order information. This allows the evaluation of traditional rate models with more spatial resolution and accuracy.

The Lagrangian particle tracking model is based on a one-way coupling approach, the simplest case, in which the interaction between the fluid and each particle is taken into account in one direction only. That means that the particles are moved by the fluid, but the fluid flow is not perturbed by the particles. Moreover, the effects of collisions between particles are also neglected. The one-way coupling model is valid for volume concentrations of particles lower than $10^{-6}$ [29, 30].

Particle motion in a fluid flow can be predicted by solving an evolution equation in time:

$$\frac{d\mathbf{v}\_p}{dt} = F(\mathbf{u}, \rho, \rho\_p, C\_d, \dots) \quad ; \quad \frac{d\mathbf{x}\_p}{dt} = \mathbf{v}\_p \tag{1}$$

where **v***<sup>p</sup>* and **x***<sup>p</sup>* are the particle velocity and position and *F* is a function of the velocity of the fluid **u**, the density of the fluid *ρ* and particle density *ρp*, among others.

To start, it is necessary to obtain the velocity at a certain point from the Eulerian velocity field. This process consists of finding the element containing the particle and interpolating the velocity using the element information. In a higher-order velocity field, the use of linear interpolation is inaccurate and could negate the advantage gained from the use of high-order methods. On the other hand, using high-order interpolation could be computationally expensive. Therefore, special attention to this procedure is required [16, 31, 32].
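The accuracy gap can be illustrated in one dimension by evaluating a degree-4 nodal field on a single reference element with barycentric Lagrange interpolation versus piecewise-linear interpolation. This is a generic sketch (the nodes, field and query point are assumptions for illustration), not Nektar++'s own evaluation routine.

```python
import numpy as np

def barycentric_weights(nodes):
    # weights for barycentric Lagrange interpolation on arbitrary nodes
    w = np.ones(len(nodes))
    for j, xj in enumerate(nodes):
        for k, xk in enumerate(nodes):
            if k != j:
                w[j] /= (xj - xk)
    return w

def interpolate(nodes, values, w, x):
    # second-form barycentric formula; exact at the nodes themselves
    diff = x - nodes
    if np.any(diff == 0.0):
        return values[np.argmax(diff == 0.0)]
    t = w / diff
    return np.dot(t, values) / np.sum(t)

# Gauss-Lobatto nodes for degree 4 on the reference element [-1, 1]
nodes = np.array([-1.0, -np.sqrt(3 / 7), 0.0, np.sqrt(3 / 7), 1.0])
vals = nodes**4 - nodes**2          # a quartic the nodal basis represents exactly
w = barycentric_weights(nodes)

exact = 0.3**4 - 0.3**2             # field value at the particle location x = 0.3
high = interpolate(nodes, vals, w, 0.3)
```

Here the high-order interpolant reproduces the quartic to machine precision, whereas linear interpolation between the same nodes (`np.interp`) incurs an error of order $10^{-2}$ on this element.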

#### *1.3 Erosion Wear Evaluation*

Once the information about the collisions is complete, the erosion wear model is used to predict the pattern of material removed. The general erosion equation, based on the work of Finnie [33–38], can be presented as

$$W = kF\_s V\_p^n f(\theta) \tag{2}$$

where $W$ is the erosion rate, or material removed per collision, $k$ is a wall-material-dependent constant, $F_s$ is the particle geometric factor, $V_p$ and $n$ are the collision velocity and the velocity exponent, and $f(\theta)$ is a function of the collision angle. Several authors define these values for different material configurations and test cases. Three of the most used models, which include experimental results, are the jet impingement test [39–45], elbow erosion [46–49], and the works of Wong et al. [4, 46, 50, 51].
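Equation (2) is simple enough to transcribe directly. In the sketch below, the angle function $f(\theta)$ is passed in as a callable, since its concrete form depends on the chosen material model; the default values of `k`, `f_s` and `n` are illustrative (the exponent $n = 1.73$ is the one quoted later for the Tulsa parameter set).

```python
def erosion_rate(v_p, theta, f_angle, k=1.0, f_s=1.0, n=1.73):
    """General erosion equation (2): W = k * Fs * Vp^n * f(theta).

    `f_angle` is the model-specific impact-angle function; the defaults
    for `k` and `f_s` are placeholders, not calibrated constants.
    """
    return k * f_s * v_p**n * f_angle(theta)
```

For instance, doubling the impact velocity scales the removed material by $2^{1.73} \approx 3.3$, independently of the angle model chosen.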

#### **2 Implementation**

This section describes the implementation of erosion wear in Nektar++. To achieve this objective, it is important to bear in mind that the problem is partitioned into two parts. The first one is the particle tracking, implemented as a filter within the Nektar++ incompressible Navier–Stokes solver. A filter in Nektar++ is a module for calculating a variety of useful quantities from the field variables as the solution evolves in time [27].

The second one is implemented as a FieldConvert module to evaluate the erosion of each collision and generate the fields on the boundary walls. FieldConvert is a utility embedded in Nektar++ with the primary aim of allowing the user to work with the Nektar++ output files; some of the modules within FieldConvert allow the user to postprocess the output data [27].

#### *2.1 Particles Tracking*

The first step was the implementation of an ODE time solver. Several options are available, but taking into account the discrete-time flow fields calculated with the incompressible Navier–Stokes solver, and to avoid the use of temporal interpolation, the selected options were the Adams–Bashforth (AB) and Adams–Moulton (AM) schemes.

The implementation was tested with a benchmark case presented in [31]. In this model, the particle velocity equals the fluid velocity at the particle's position, so the evolution equation reduces to a single equation; Eq. 1 becomes:

$$\frac{d\mathbf{x}\_p}{dt} = \mathbf{u} \tag{3}$$

To solve this system, a time-marching method was implemented, meaning that the future values are evaluated using the present and past values of the variables. Explicit AB and implicit AM methods were implemented using first- to fourth-order integration. The error values obtained using AB and AM at different orders exhibit the expected behaviour of this kind of method.
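A minimal sketch of such a time-marching scheme is given below: explicit second-order Adams–Bashforth applied to the tracer equation (3), with a forward-Euler startup step. The analytic velocity field (rigid-body rotation) is an assumption chosen so that the exact trajectory is known; the actual implementation evaluates **u** from the discrete flow field instead.

```python
import numpy as np

def u_field(x):
    # analytic stand-in for the Eulerian field: rigid-body rotation,
    # whose exact trajectories are circles about the origin
    return np.array([-x[1], x[0]])

def advect_ab2(x0, dt, nsteps):
    """Tracer advection dx/dt = u(x) with 2nd-order Adams-Bashforth;
    the first step is bootstrapped with forward Euler (AB1)."""
    x = np.array(x0, dtype=float)
    f_prev = u_field(x)
    x = x + dt * f_prev                       # AB1 startup step
    for _ in range(nsteps - 1):
        f = u_field(x)
        x = x + dt * (1.5 * f - 0.5 * f_prev) # AB2 update
        f_prev = f
    return x

# one full revolution: the tracer should return close to its start
x_end = advect_ab2([1.0, 0.0], dt=2 * np.pi / 2000, nsteps=2000)
```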

The next step was the implementation of the solid particles. In this case, the momentum equation is evaluated for each particle, resulting in:

$$\frac{d\mathbf{v}\_p}{dt} = F\_d \left(\mathbf{u} - \mathbf{v}\_p\right) + \mathbf{g} \frac{\rho\_p - \rho}{\rho\_p} \quad ; \qquad \frac{d\mathbf{x}\_p}{dt} = \mathbf{v}\_p \tag{4}$$

$$F\_d = \frac{3}{4} \frac{C\_d Re\_p \rho\_p}{\nu d\_p^2} \tag{5}$$

$$Re\_p = \frac{(\mathbf{u} - \mathbf{v}\_p)d\_p}{\nu} \tag{6}$$

$$C\_d = \begin{cases} 24/Re\_p, & Re\_p < 0.5\\ 24/Re\_p \left(1 + 0.15 Re\_p^{0.687}\right), & 0.5 < Re\_p < 1000\\ 0.44, & Re\_p > 1000 \end{cases} \tag{7}$$

where $F_d$ is the drag force term, $Re_p$ is the particle Reynolds number based on the diameter of the particle, $\mathbf{g}$ is the gravitational acceleration, and $C_d$ is the drag coefficient evaluated for each particle.
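The piecewise drag law of Eq. 7 translates directly into code; the only assumption in the transcription below is how the regime boundaries themselves are assigned, since Eq. 7 uses strict inequalities.

```python
def drag_coefficient(re_p):
    """Piecewise drag law of Eq. 7: Stokes drag for creeping flow,
    a Schiller-Naumann-type correction at intermediate Re_p, and a
    constant 0.44 in the Newton regime. The boundary values Re_p = 0.5
    and 1000 are assigned to the adjacent upper branch (an assumption;
    Eq. 7 leaves them unspecified)."""
    if re_p < 0.5:
        return 24.0 / re_p
    if re_p < 1000.0:
        return 24.0 / re_p * (1.0 + 0.15 * re_p**0.687)
    return 0.44
```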

Figure 1 shows a diagram of the evolution equation. Current positions, velocities and forces are evaluated to obtain the future positions (BP, OP) until the next position is located outside of the domain (NP). When this happens, the evolution algorithm stops and a function is used to evaluate the collision point (CP) and the position after the collision (NP′) using the high-order information about the walls.

#### *2.2 Erosion Wear Evaluation*

The erosion rate per collision (Eq. 2) has to be integrated over each element of the eroded surface. Since each particle collision removes more material from the surface, the elemental erosion rate has to take this cumulative effect over the surface into account.

As mentioned before, the set of parameters used in this work is based on experimental data. One of the most widely used parameter sets is the one proposed by the Erosion group of the University of Tulsa [38, 48, 52]. The erosion rate takes the form of Eq. 2, with $F_s = 1$ for sharp (angular), $0.53$ for semi-rounded, and $0.2$ for fully rounded sand particles; $V_p$ is the impact velocity and $n = 1.73$. The angle function has the form:

**Fig. 1** Evolution of particle tracking

All the parameters and empirical constants depend on the material being eroded. For velocity in ft/s, the steel–sand parameters are: $a = 38.4$, $b = 22.7$, $\varphi = 1$, $x = 0.3147$, $y = 0.03609$, $w = 0.2532$ and $z = 0$ [53].

#### **3 Test Case**

To test the new feature in Nektar++, a Backward-Facing Step (BFS) model was developed based on the experimental setup of [30, 32, 54] shown in Fig. 2. In the model developed in this work, the simulations were performed with the addition of gravitational effects in the $-y$ direction. In the original experiments, the air at the inlet is a fully developed turbulent flow ($\bar u = 10.5$ m/s); this is used as the inlet condition and, to complete the model, a zero-pressure condition is imposed at the outlet. The additional boundaries were set as walls. A zero velocity field was set as the initial condition. The particles used have a 70 µm diameter and a density of 8808 kg/m³.

Figure 3 (top) shows a snapshot of the velocity field when the statistically stationary regime is reached (*t* = 8 s); the particles are then released and convected by the flow. Particle trajectories are shown as grey lines in Fig. 3 (bottom). In the same figure, results of the particle collisions with the walls, computed with Eq. 2, are also shown.

From the results presented in Fig. 3, the typical BFS velocity profiles can be recognised. It is important to note the details behind the step: the main flow gives rise to the secondary eddies and defines the limit of the recirculation zone (x/H = 7 from the step) where backflow occurs. Additionally, interesting details appear between each main velocity flow ripple and the walls along the x-direction.

It is noteworthy that particle tracking is evaluated using a steady velocity field; therefore, some irregularities are expected, for instance, in the particle trajectories inside the recirculation zone. The erosion rate depends on the number of collisions at specific points. It is a localised phenomenon that does not occur continuously in the domain, and its distribution shows a strong dependence on the flow dynamics.

**Fig. 2** Geometry of the Backward Facing Step setup. The initial velocity was set to get a *Re* = 18*,*600

**Fig. 3** BFS case results. Top: Velocity field at a statistically stationary condition. Bottom: Distribution of the particles inside the flow (gray lines). The colours in the walls indicate the location of the normalised erosion rate

#### **4 Conclusion and Future Work**

This work presented a method developed to assess the erosion wear rate using a high-order (spectral) element-based technique on a modified test case implemented in Nektar++. The methodology proposed in this study has the potential to increase the accuracy when solving this kind of problem. Future research activities will focus on the quantification of the accuracy improvements and the optimisation of the proposed methodology. Several more cases have to be tested to produce solid conclusions about the implemented methodology, as well as a detailed comparison with experimental test cases.

Although the implemented methodology includes several important simplifications, such as the use of one-way coupling and the small number of forces taken into account, these allowed a quick implementation and fast results. This work can serve as an interesting starting point for implementing this kind of simulation in Nektar++. However, to run more realistic cases, additional research efforts are required to implement two-way and four-way coupling and the effects of other forces on each particle.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **An Inexact Petrov-Galerkin Approximation for Gas Transport in Pipeline Networks**

**Herbert Egger, Thomas Kugler, and Vsevolod Shashkov**

### **1 Introduction**

The flow of gas in a horizontal pipeline of constant cross section is described by [2]

$$A\partial_t \rho + \partial_x m = 0 \tag{1}$$

$$
\partial_t m + \partial_x \left(\frac{m^2}{A\rho} + Ap\right) = -\frac{\lambda}{2D} \frac{|m|}{A\rho} m. \tag{2}
$$

Here *A* and *D* are the cross section and diameter of the pipe, and *λ* is a dimensionless friction parameter. The functions *ρ*, *p*, and *m* describe the density, pressure, and mass flow rate of the gas. Under isothermal flow conditions, one has

$$p = c^2 \rho \tag{3}$$

with constant *c* denoting the speed of sound. In practically relevant scaling regimes, the nonlinear term on the left hand side of (2) is usually neglected, which can be justified by an asymptotic analysis [2, 7]. Using this simplification and Eq. (3) to

H. Egger (-) · T. Kugler

TU Darmstadt, Department of Mathematics, Darmstadt, Germany e-mail: egger@mathematik.tu-darmstadt.de; kugler@mathematik.tu-darmstadt.de

V. Shashkov

TU Darmstadt, GSC Computational Engineering, Darmstadt, Germany e-mail: shashkov@gsc.tu-darmstadt.de

S. J. Sherwin et al. (eds.), *Spectral and High Order Methods for Partial Differential Equations ICOSAHOM 2018*, Lecture Notes in Computational Science and Engineering 134, https://doi.org/10.1007/978-3-030-39647-3\_34

eliminate the density, one arrives at evolution problems of the general form

$$a\partial_t p + \partial_x m = 0 \tag{4}$$

$$b\partial_t m + \partial_x p = -dm \tag{5}$$

where *a* and *b* are positive constants and *d* = *d(p, m)* denotes a state dependent friction coefficient. For our analysis, we will consider *d* = *d(x)* as a function depending only on space which can be justified, e.g., by linearization around a steady state. Corresponding models for the gas flow on pipe networks are obtained by coupling the flow equations for single pipes via algebraic conditions [9, 10]; see below.

The discretization of (4)–(5) and its extension to pipeline networks has been discussed intensively in the literature. In [9], a Galerkin approximation for (1)–(2) with cubic Hermite polynomials is investigated numerically. The discretization of transient gas flow models is also studied in [2, 5, 8]. An entropy stable finite volume method is proposed in [10], and an energy stable mixed finite element approximation is investigated in [3]. Apart from [9], all methods discussed above are of lowest order, and no rigorous convergence analysis is given.

In this paper, we study the discretization of (4)–(5) by a Petrov-Galerkin approach of potentially high order. The resulting scheme is shown to be stable, which allows us to prove order optimal convergence rates. By using an appropriate functional analytic setting, the convergence results can be generalized almost verbatim to pipeline networks. A hybridization strategy will be discussed that facilitates the implementation and allows us to incorporate non-standard coupling conditions. The proposed method formally also allows the treatment of nonlinear models of gas transport and, in principle, high order convergence can be obtained in practically relevant regimes.

#### **2 Notation and Preliminaries**

Let $x_L < x_R$ and denote by $L^p(x_L, x_R)$ and $W^{k,p}(x_L, x_R)$, $k \ge 0$, the standard Lebesgue and Sobolev spaces. The scalar product and norm of $L^2(x_L, x_R)$ are written as $(v, w)$ and $\|v\| = \|v\|_{L^2}$. Other norms will be designated by subscripts. We write $H^k(x_L, x_R) = W^{k,2}(x_L, x_R)$ for the Hilbert spaces and define

$$H_0^1 = \{ v \in H^1(x_L, x_R) : v(x_L) = v(x_R) = 0 \} \quad \text{and} \quad H(\text{div}) = H^1(x_L, x_R)$$

for convenience. The reason for introducing the space $H(\text{div})$ will become clear when considering networks, where the spaces $H^1$ and $H(\text{div})$ have different continuity properties across junctions. By $L^p(0, T; X)$ and $W^{k,p}(0, T; X)$ we denote the Bochner spaces of functions $f : [0, T] \to X$ with values in $X$. The value of $f(t)$ may then itself be a function. In the following, we consider the linear system

$$a\partial_t p(x, t) + \partial_x m(x, t) = f(x, t), \tag{6}$$

$$b\partial_t m(x, t) + \partial_x p(x, t) + d(x) m(x, t) = g(x, t), \tag{7}$$

for $x_L < x < x_R$ and $t > 0$ with homogeneous boundary conditions

$$p(x_L, t) = p(x_R, t) = 0. \tag{8}$$

Inhomogeneous and more general boundary conditions can be considered as well and our analysis applies with minor modifications. We will assume that

(A1) $a$, $b$ are positive constants, and (A2) $d \in L^\infty(x_L, x_R)$ with $0 < \underline{d} \le d(x) \le \overline{d}$ for constants $\underline{d}, \overline{d}$.

For given $f, g \in L^2(0, T; L^2(x_L, x_R))$ and initial values $p(0) \in H_0^1$, $m(0) \in H(\text{div})$, existence of a unique solution follows from semigroup theory. Any smooth solution of problem (6)–(8) also satisfies $p(t) \in H_0^1$, $m(t) \in H(\text{div})$, and

$$(a\partial_t p(t), \widetilde{q}) + (\partial_x m(t), \widetilde{q}) = (f(t), \widetilde{q}) \tag{9}$$

$$(b\partial_t m(t), \widetilde{v}) + (\partial_x p(t), \widetilde{v}) + (dm(t), \widetilde{v}) = (g(t), \widetilde{v}) \tag{10}$$

for all $\widetilde{v}, \widetilde{q} \in L^2(x_L, x_R)$ and all $0 < t < T$. This variational characterization will be the starting point for our discretization approach introduced in the next section.

#### **3 Petrov-Galerkin Approximation**

Let $x_L = x_0 < x_1 < \ldots < x_N = x_R$ be a partition of the interval $[x_L, x_R]$ into elements $T_n = [x_{n-1}, x_n]$. We call $T_h := \{T_n : 1 \le n \le N\}$ the mesh and denote by $h_n = |x_n - x_{n-1}|$ and $h = \max_n h_n$ the local and global mesh size, respectively. Let

$$P_k(T_h) := \{ v \in L^2(x_L, x_R) : v|_T \in P_k(T) \ \forall T \in T_h \} \tag{11}$$

be the space of piecewise polynomials on the mesh *Th*. We fix *k* ≥ 1 and search for approximations for the solutions *p(t)*, *m(t)* of problem (6)–(8) in the spaces

$$Q\_h = P\_k(T\_h) \cap H\_0^1 \quad \text{and} \quad V\_h = P\_k(T\_h) \cap H(\text{div}) \tag{12}$$

of continuous piecewise polynomials with appropriate boundary conditions. As finite dimensional test spaces for the variational problem (9)–(10), we choose

$$
\widetilde{\mathcal{Q}}\_h = P\_{k-1}(T\_h) \quad \text{and} \quad \widetilde{V}\_h = P\_{k-1}(T\_h) \tag{13}
$$

consisting of discontinuous piecewise polynomials of lower order $k - 1$. We denote by $I_h^k : H^1(x_L, x_R) \to P_k(T_h) \cap H^1(x_L, x_R)$ the $H^1$-projection operator, defined by

$$(I_h^k v)(x_n) = v(x_n) \qquad \text{for all } 0 \le n \le N, \tag{14}$$

$$\text{and} \qquad (\partial_x I_h^k v, \widetilde{v}_h) = (\partial_x v, \widetilde{v}_h) \qquad \text{for all } \widetilde{v}_h \in P_{k-1}(T_h), \tag{15}$$

and let $\pi_h^{k-1} : L^2(x_L, x_R) \to P_{k-1}(T_h)$ be the $L^2$-orthogonal projection, satisfying

$$(\pi_h^{k-1} v, \widetilde{v}_h) = (v, \widetilde{v}_h) \qquad \text{for all } \widetilde{v}_h \in P_{k-1}(T_h). \tag{16}$$

Note that both projection operators $I_h^k$ and $\pi_h^{k-1}$ can be defined locally on every element. Moreover, they are mutually related to each other by the *commuting diagram* property

$$
\partial_x I_h^k v = \pi_h^{k-1} \partial_x v \qquad \text{for all } v \in H^1(x_L, x_R). \tag{17}
$$
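For $k = 1$ the property (17) can be checked numerically: the slope of the nodal interpolant on each element must equal the elementwise mean of $v'$, i.e. its $L^2$-projection onto $P_0$. A small sketch (the mesh and the test function are illustrative choices):

```python
import numpy as np
from numpy.polynomial.legendre import leggauss

# Verify (17) for k = 1 on a non-uniform mesh: d/dx of the piecewise-linear
# interpolant I_h^1 v equals the L2 projection of v' onto P_0 (elementwise mean).
nodes = np.array([0.0, 0.3, 0.55, 0.8, 1.0])    # x_0 < x_1 < ... < x_N
v = lambda x: np.sin(3 * x)
dv = lambda x: 3 * np.cos(3 * x)

h = np.diff(nodes)
d_interp = np.diff(v(nodes)) / h                # slope of I_h^1 v per element

xi, wq = leggauss(10)                           # reference quadrature on [-1, 1]
proj = np.array([0.5 * wq @ dv(0.5 * (a + b) + 0.5 * (b - a) * xi)
                 for a, b in zip(nodes[:-1], nodes[1:])])   # (1/h_n) ∫ v' dx

assert np.allclose(d_interp, proj)              # commuting diagram property
```

The agreement is exact up to quadrature error, since by the fundamental theorem of calculus both sides reduce to the same difference quotient on each element.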

For the approximation of problem (6)–(8), we then use the following scheme.

**Problem 1 (Inexact Petrov-Galerkin Method)** Find functions $p_h \in H^1(0, T; Q_h)$, $m_h \in H^1(0, T; V_h)$ with $p_h(0) = I_h^k p(0)$ and $m_h(0) = I_h^k m(0)$, and such that

$$(a\partial_t p_h(t), \widetilde{q}_h) + (\partial_x m_h(t), \widetilde{q}_h) = (f(t), \widetilde{q}_h) \tag{18}$$

$$(b\partial_t m_h(t), \widetilde{v}_h) + (\partial_x p_h(t), \widetilde{v}_h) + (d\pi_h^{k-1} m_h(t), \widetilde{v}_h) = (g(t), \widetilde{v}_h) \tag{19}$$

for all $\widetilde{q}_h \in \widetilde{Q}_h = P_{k-1}(T_h)$ and $\widetilde{v}_h \in \widetilde{V}_h = P_{k-1}(T_h)$, and for all $0 \le t \le T$.

The well-posedness of this problem follows from the results of the next section.

#### **4 Discrete Stability Estimates**

We now derive some discrete stability estimates that yield well-posedness of the semidiscrete method and that allow us to establish error estimates of optimal order.

**Lemma 1** *Let ph, mh denote a solution of Problem 1. Then*

$$\begin{aligned} &a\|\pi\_h^{k-1}p\_h(t)\|^2 + b\|\pi\_h^{k-1}m\_h(t)\|^2 \\ &\le C(T)\left(a\|\pi\_h^{k-1}p\_h(0)\|^2 + b\|\pi\_h^{k-1}m\_h(0)\|^2 + \int\_0^t \frac{1}{a} \|\pi\_h^{k-1}f(s)\|^2 + \frac{1}{b} \|\pi\_h^{k-1}g(s)\|^2 ds\right) \end{aligned}$$

*with constant C(T )* ≤ *CT and C independent of T and the solution.*

*Proof* Let us first note that $(\pi_h^{k-1} q_h, \pi_h^{k-1} q) = (q_h, \pi_h^{k-1} q)$ for all $q \in H^1(x_L, x_R)$. By testing (18)–(19) with $\widetilde{q}_h = \pi_h^{k-1} p_h(t)$ and $\widetilde{v}_h = \pi_h^{k-1} m_h(t)$, we then get

$$\begin{split} &\frac{d}{dt}\left(\frac{a}{2}\|\pi\_{h}^{k-1}p\_{h}(t)\|^{2}+\frac{b}{2}\|\pi\_{h}^{k-1}m\_{h}(t)\|^{2}\right) \\ &=\left(a\partial\_{t}p\_{h}(t),\pi\_{h}^{k-1}p\_{h}(t)\right)+\left(b\partial\_{t}m\_{h}(t),\pi\_{h}^{k-1}m\_{h}(t)\right) \\ &=-\left(\partial\_{x}m\_{h}(t),\pi\_{h}^{k-1}p\_{h}(t)\right)-\left(\partial\_{x}p\_{h}(t),\pi\_{h}^{k-1}m\_{h}(t)\right)-\left(d\pi\_{h}^{k-1}m\_{h}(t),\pi\_{h}^{k-1}m\_{h}(t)\right) \\ &\quad+\left(\pi\_{h}^{k-1}f(t),\pi\_{h}^{k-1}p\_{h}(t)\right)+\left(\pi\_{h}^{k-1}g(t),\pi\_{h}^{k-1}m\_{h}(t)\right). \end{split}$$

By identity (17), integration-by-parts, and the boundary conditions (8), one can verify that $(\partial_x m_h(t), \pi_h^{k-1} p_h(t)) + (\partial_x p_h(t), \pi_h^{k-1} m_h(t)) = 0$. Via the Cauchy-Schwarz and Young inequalities, and using the positivity of $d$, we then obtain the estimate

$$\begin{split} &\frac{d}{dt}\left(\frac{a}{2}\|\pi_{h}^{k-1}p_{h}(t)\|^{2}+\frac{b}{2}\|\pi_{h}^{k-1}m_{h}(t)\|^{2}\right) \\ &=-\left(d\pi_{h}^{k-1}m_{h}(t),\pi_{h}^{k-1}m_{h}(t)\right)+\left(\pi_{h}^{k-1}f(t),\pi_{h}^{k-1}p_{h}(t)\right)+\left(\pi_{h}^{k-1}g(t),\pi_{h}^{k-1}m_{h}(t)\right) \\ &\leq\frac{\alpha}{2}\left(a\|\pi_{h}^{k-1}p_{h}(t)\|^{2}+b\|\pi_{h}^{k-1}m_{h}(t)\|^{2}\right)+\frac{1}{2\alpha}\left(\frac{1}{a}\|\pi_{h}^{k-1}f(t)\|^{2}+\frac{1}{b}\|\pi_{h}^{k-1}g(t)\|^{2}\right). \end{split}$$

The Gronwall lemma and the choice $\alpha = 1/T$ finally yield the assertion. $\square$

Note that the above estimate does not yet give full control over the solution. A repeated application, however, allows us to prove the following stability estimate.

**Lemma 2** *Let ph, mh denote a solution of Problem 1. Then*

$$\begin{aligned} &\|p_h(t)\|^2 + \|m_h(t)\|^2 \\ &\le C'(T) \Big(\|\pi_h^{k-1} p_h(0)\|^2 + \|\pi_h^{k-1} m_h(0)\|^2 + h \|\pi_h^{k-1} \partial_t p_h(0)\|^2 + h \|\pi_h^{k-1} \partial_t m_h(0)\|^2 \\ &\quad + \int_0^t \|\pi_h^{k-1} f(s)\|^2 + \|\pi_h^{k-1} g(s)\|^2 + h \|\pi_h^{k-1} \partial_t f(s)\|^2 + h \|\pi_h^{k-1} \partial_t g(s)\|^2 \, ds \Big) \end{aligned}$$

*for all* $0 \le t \le T$ *with* $C'(T) = C'T$ *and* $C'$ *independent of* $T$ *and of the solution.*

*Proof* As a direct consequence of the Poincaré inequality, one has

$$\|p_h\| \le \|\pi_h^{k-1} p_h\| + h \|\partial_x p_h\| \quad \text{and} \quad \|m_h\| \le \|\pi_h^{k-1} m_h\| + h \|\partial_x m_h\|.$$

The first terms in these estimates are already covered by Lemma 1. From the two Eqs. (18)–(19) with $\widetilde{q}_h = \partial_x m_h(t)$ and $\widetilde{v}_h = \partial_x p_h(t)$, we further deduce that

$$\|\partial_x m_h(t)\|^2 \le (\|\pi_h^{k-1} f(t)\| + a\|\pi_h^{k-1}\partial_t p_h(t)\|)\|\partial_x m_h(t)\| \qquad \text{and}$$

$$\|\partial_x p_h(t)\|^2 \le (\|\pi_h^{k-1} g(t)\| + b\|\pi_h^{k-1}\partial_t m_h(t)\| + \overline{d}\|\pi_h^{k-1} m_h(t)\|)\|\partial_x p_h(t)\|.$$

Bounds for $\pi_h^{k-1}\partial_t p_h(t)$ and $\pi_h^{k-1}\partial_t m_h(t)$ can be obtained by formally differentiating (18)–(19) with respect to time and applying Lemma 1 to the resulting system. A combination of the above estimates then yields the assertion of the lemma. $\square$

*Remark 1* Problem 1 formally amounts to a finite dimensional system of differential algebraic equations. From the stability estimates of Lemma 2 and [6, Theorem 4.12], one can deduce that this system is solvable for any choice of admissible initial values. The semidiscretization is thus well-defined. Further note that the stability constants in Lemmas 1 and 2 are independent of the polynomial degree $k$.

#### **5 Error Estimates**

As usual, we decompose the error according to $\|p - p_h\| \le \|p - I_h^k p\| + \|I_h^k p - p_h\|$ and $\|m - m_h\| \le \|m - I_h^k m\| + \|I_h^k m - m_h\|$ into approximation and discrete error components. The first part can be handled by the following estimates [11]. To simplify notation, we assume in the following that the mesh is quasi-uniform.

**Lemma 3** *Let* $w \in H^{s+1}(T_h)$, $0 \le s \le k$. *Then*

$$\|w - I\_h^k w\| \le C \left(\frac{h}{k}\right)^{s+1} |w|\_{s+1;h} \,. \tag{20}$$

*For any* $w \in L^2(x_L, x_R) \cap H^s(T_h)$, $0 \le s \le k$, *one has*

$$\|w - \pi\_h^{k-1}w\| \le C \left(\frac{h}{k}\right)^s |w|\_{s;h}.\tag{21}$$

*Here* $H^s(T_h) = \{w \in L^2(x_L, x_R) : w|_T \in H^s(T) \ \forall T \in T_h\}$ *is the space of piecewise smooth functions and* $|w|_{s;h} := \big(\sum_{T} \|\partial_x^s w\|_{L^2(T)}^2\big)^{1/2}$ *is the corresponding seminorm. Moreover, the constant* $C$ *in the estimates is independent of* $h$ *and* $k$.

Using Eqs. (9)–(10) and (18)–(19) characterizing the continuous and the discrete solutions, one can see that the discrete error components $\widehat{p}_h(t) := I_h^k p(t) - p_h(t)$ and $\widehat{m}_h(t) := I_h^k m(t) - m_h(t)$ satisfy Eqs. (18)–(19) with initial values $\widehat{p}_h(0) = 0$ and $\widehat{m}_h(0) = 0$, and right hand sides given by

$$\begin{aligned} \widehat{f}(t) &:= a \left( I_h^k \partial_t p(t) - \partial_t p(t) \right) \quad \text{and} \\ \widehat{g}(t) &:= b \left( I_h^k \partial_t m(t) - \partial_t m(t) \right) + d \left( \pi_h^{k-1} I_h^k m(t) - m(t) \right). \end{aligned}$$

By the a-priori estimates of Lemma 2, one then obtains the following result.

**Lemma 4** *Let d* ∈ *P*0*(Th) be piecewise constant. Then for all* 0 ≤ *t* ≤ *T one has*

$$\begin{aligned}
&\|I_h^k p(t) - p_h(t)\|^2 + \|I_h^k m(t) - m_h(t)\|^2 \\
&\le C(T) \Big( h\|I_h^k \partial_t p(0) - \partial_t p(0)\|^2 + h\|I_h^k \partial_t m(0) - \partial_t m(0)\|^2 \\
&\quad + \int_0^t \|I_h^k m(s) - m(s)\|^2 + \|I_h^k \partial_t p(s) - \partial_t p(s)\|^2 + \|I_h^k \partial_t m(s) - \partial_t m(s)\|^2 \\
&\qquad + h\|I_h^k \partial_{tt} p(s) - \partial_{tt} p(s)\|^2 + h\|I_h^k \partial_{tt} m(s) - \partial_{tt} m(s)\|^2 \, ds \Big),
\end{aligned}$$

*with a constant C(T )* = *CT and C independent of h, k, T , and of the solution.*

*Proof* We apply Lemma 2 for $\widehat{p}_h(t) = I_h^k p(t) - p_h(t)$ and $\widehat{m}_h(t) = I_h^k m(t) - m_h(t)$ and then estimate the terms on the right hand side of the result step by step. By definition of the initial values, we have $\widehat{p}_h(0) = \widehat{m}_h(0) = 0$. Moreover,

$$\begin{aligned}
a\,\pi_h^{k-1} \partial_t p_h(0) &= \pi_h^{k-1} f(0) - \partial_x m_h(0) = \pi_h^{k-1} f(0) - \partial_x I_h^k m(0) \\
&= \pi_h^{k-1} f(0) - \pi_h^{k-1} \partial_x m(0) = a\,\pi_h^{k-1} \partial_t p(0),
\end{aligned}$$

where we used the definition of the initial value $m_h(0)$ in the second and (17) in the third step. Thus $\|\pi_h^{k-1} \partial_t \widehat{p}_h(0)\| \le \|I_h^k \partial_t p(0) - \partial_t p(0)\|$, and in a similar manner, one can show $\|\pi_h^{k-1} \partial_t \widehat{m}_h(0)\| \le \|I_h^k \partial_t m(0) - \partial_t m(0)\|$. This explains the first two terms in the estimate of the lemma. The terms under the integral are derived by estimating $\|\pi_h^{k-1} \widehat{f}(t)\|$, $\|\pi_h^{k-1} \widehat{g}(t)\|$ and the derivatives $\|\pi_h^{k-1} \partial_t \widehat{f}(t)\|$, $\|\pi_h^{k-1} \partial_t \widehat{g}(t)\|$ via the triangle inequality, and noting that

$$\pi_h^{k-1}\big(d\,\pi_h^{k-1} I_h^k m(t) - d\,m(t)\big) = d\,\pi_h^{k-1}\big(I_h^k m(t) - m(t)\big),$$

where we used that $d$ is piecewise constant. $\square$

*Remark 2* A similar result can be proven for piecewise smooth $d \in W^{1,\infty}(T_h)$, in which case additional terms of the form $\|d - \pi_h^0 d\| \, \|\pi_h^{k-1} p(t) - p(t)\|$ arise. For $d \in W^{1,\infty}(T_h)$, the product of the two terms again has optimal approximation order.

By combination of the above estimates, we finally obtain the following result.

**Theorem 1** *Let (A1)–(A2) hold and* $d \in W^{1,\infty}(T_h)$. *Furthermore, let* $(p, m)$ *be a sufficiently smooth solution of* (6)–(8). *Then for all* $0 \le t \le T$, *one has*

$$\|p(t) - p_h(t)\| + \|m(t) - m_h(t)\| \le C(p, m, T) \frac{h^{k+1}}{k^k}.$$

For sufficiently smooth solutions, the proposed method thus converges at optimal order in *h* and at almost optimal order in the polynomial degree *k*.

#### **6 Extension to Networks**

We now illustrate that our method and the convergence results of the previous section can be generalized easily to pipe networks. Let $(\mathcal{V}, \mathcal{E})$ denote a directed graph with vertices $v \in \mathcal{V}$ and edges $e \in \mathcal{E}$; see Fig. 1 for illustration. For any edge $e = (v_1, v_2)$, we define $n^e(v_1) = -1$ and $n^e(v_2) = 1$. The matrix $N$ with entries $N_{ij} = n^{e_j}(v_i)$ then is the incidence matrix of the graph. For any vertex $v \in \mathcal{V}$, we define $\mathcal{E}(v) = \{e : e = (v, \cdot) \text{ or } e = (\cdot, v)\}$, and we set $\mathcal{V}_0 = \{v \in \mathcal{V} : |\mathcal{E}(v)| > 1\}$ and $\mathcal{V}_\partial = \{v \in \mathcal{V} : |\mathcal{E}(v)| = 1\}$, which gives a decomposition $\mathcal{V} = \mathcal{V}_0 \cup \mathcal{V}_\partial$ into interior and boundary vertices.
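As an illustration, the incidence matrix and the interior/boundary vertex split can be assembled directly from an edge list. The small graph below is an illustrative example, not the network of Fig. 1:

```python
import numpy as np

# A tiny directed graph: edges e = (start vertex, end vertex).
vertices = ["v1", "v2", "v3", "v4"]
edges = [("v1", "v2"), ("v2", "v3"), ("v2", "v4")]

# Incidence matrix N with N_ij = n^{e_j}(v_i): -1 at the start, +1 at the end.
N = np.zeros((len(vertices), len(edges)), dtype=int)
for j, (v_start, v_end) in enumerate(edges):
    N[vertices.index(v_start), j] = -1   # n^e(v1) = -1
    N[vertices.index(v_end), j] = +1     # n^e(v2) = +1

# |E(v)| is the number of nonzeros in row v; degree 1 means boundary vertex.
degree = np.count_nonzero(N, axis=1)
interior = [v for v, d in zip(vertices, degree) if d > 1]    # V_0
boundary = [v for v, d in zip(vertices, degree) if d == 1]   # V_∂
```

Here the junction `v2` is the only interior vertex, while `v1`, `v3`, `v4` are boundary vertices where conditions like (26) would be imposed.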

To every edge $e$, we associate a positive length $\ell^e$, and we identify $e$ with $[0, \ell^e]$ in the sequel. This allows us to define spaces $L^p(\mathcal{E}) = \{v : v|_e \in L^p(e)\}$ and $H^1(\mathcal{E}) = \{v \in L^p(\mathcal{E}) : v|_e \in H^1(e)\}$ of, respectively, integrable and piecewise smooth functions on the graph. The flow of gas in a pipe network is then described as follows: On every edge $e$ representing a pipe, we require that

$$a^e \partial_t p^e + \partial_x m^e = f^e \tag{22}$$

$$b^e \partial_t m^e + \partial_x p^e + d^e m^e = g^e, \tag{23}$$

where $f^e = f|_e$ denotes the restriction of a function $f \in L^p(\mathcal{E})$ to one edge. The equations for the individual pipes are coupled by algebraic conditions

$$\sum\_{e \in \mathcal{E}(v)} m^e(v) n^e(v) = 0 \qquad \qquad v \in \mathcal{V}\_0 \tag{24}$$

$$p^{e}(v) = p^{e'}(v) \qquad v \in \mathcal{V}_0, \ e, e' \in \mathcal{E}(v) \tag{25}$$

at the pipe junctions, and at the boundary vertices, we assume that

$$p^{e}(v) = 0 \qquad v \in \mathcal{V}_\partial. \tag{26}$$

**Fig. 1** Directed graph $(\mathcal{V}, \mathcal{E})$ modeling the pipe network topology used for numerical tests

Inhomogeneous and other types of boundary conditions can again be incorporated with minor modifications. For the analysis of the problem, we now utilize the spaces

$$H_0^1 := \{ p \in H^1(\mathcal{E}) : (25) \text{ and } (26) \text{ are valid} \} \tag{27}$$

$$H(\text{div}) := \{ m \in H^1(\mathcal{E}) : (24) \text{ is valid} \} \tag{28}$$

which are the natural generalization of those used for the analysis on a single pipe. Any solution $(p, m)$ of (22)–(26) then again satisfies $p(t) \in H_0^1$, $m(t) \in H(\text{div})$, and

$$(a\partial_t p(t), \widetilde{q}) + (\partial_x m(t), \widetilde{q}) = (f(t), \widetilde{q}) \tag{29}$$

$$(b\partial_t m(t), \widetilde{v}) + (\partial_x p(t), \widetilde{v}) + (dm(t), \widetilde{v}) = (g(t), \widetilde{v}) \tag{30}$$

for all $\widetilde{q} \in L^2(\mathcal{E})$, $\widetilde{v} \in L^2(\mathcal{E})$, and all $0 < t < T$. Here $(v, w) = \sum_e (v^e, w^e)_e$ with $(v^e, w^e)_e = \int_e v^e w^e \, dx$ denotes the scalar product on $L^2(\mathcal{E})$.

*Remark 3* Let us note that (29)–(30) has exactly the same form as the variational problem (9)–(10) on a single pipe. The inexact Petrov-Galerkin method and all results derived in the previous sections therefore translate almost verbatim to the network setting; we refer to [4] for details and similar results for a different method, and to Sect. 9 for numerical illustration.

#### **7 Remarks on the Efficient Implementation**

In the discretization of (29)–(30), also compare with (18)–(19), the continuity and boundary conditions (24)–(26) are directly incorporated in the definition of the spaces $Q_h \subset H_0^1$ and $V_h \subset H(\text{div})$. For the implementation, it may be more convenient to use larger spaces $Q_h, V_h \subset H^1(\mathcal{E})$ and to enforce some of the boundary and coupling conditions (24)–(26) explicitly by additional equations. Using the wording of [1], this approach of relaxing continuity conditions might be called *hybridization*. Since the resulting method is algebraically equivalent to the original scheme based on function spaces with incorporated coupling and boundary conditions, all results of the previous sections apply verbatim also to the method obtained after hybridization.

#### **8 Nonlinear Problems**

The formal extension of the Petrov-Galerkin method to nonlinear problems is straightforward. The discrete variational formulation for (1)–(2), for instance, reads

$$(A\partial\_t \rho\_h(t), \widetilde{q}\_h) + (\partial\_x m\_h(t), \widetilde{q}\_h) = 0$$

$$(\partial\_t m\_h(t), \widetilde{v}\_h) + (\partial\_x \left(\frac{m\_h(t)^2}{A\rho\_h(t)} + A\rho\_h(t)\right), \widetilde{v}\_h) = -(\frac{\lambda}{2D} \frac{|m\_h(t)|}{A\rho\_h(t)} \pi\_h^{k-1} m\_h(t), \widetilde{v}\_h).$$

Numerical quadrature can be used in practice to facilitate the handling of the nonlinear terms. We do not give a complete convergence analysis here, but instead, we will demonstrate by numerical tests that for smooth solutions, the convergence results of Theorem 1 remain valid, at least in the practically relevant case of nonlinear friction.

#### **9 Numerical Results**

We now illustrate the theoretical results of Sect. 5 by numerical tests. For our computations, we consider the pipe network depicted in Fig. 1. As a first test case, we consider the linear problem (22)–(25) with inhomogeneous boundary conditions

$$p|\_v(t) = p\_v(t) \qquad v \in \mathcal{V}\_\partial \tag{31}$$

and we set $p_{v_1}(t) = 1$ and $p_{v_6}(t) = 1 + \frac{1}{2}\sin(\pi t)$ in the following. All pipes are chosen of unit length $\ell = 1$ and the model parameters are set to $a \equiv b \equiv d \equiv 1$. The simulation is started from a stationary state for the boundary values at initial time. The results of the computations are summarized in the left column of Table 1. As predicted by our theoretical results, we observe second order convergence.
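The observed order can be read off from such an error table by comparing successive mesh levels: if $e_h \approx C h^r$, then $r \approx \log_2(e_h / e_{h/2})$. A quick sketch (the error values below are illustrative, not the data of Table 1):

```python
import math

# Errors e_h on successively halved meshes (illustrative values only).
errors = [1.6e-2, 4.1e-3, 1.0e-3, 2.6e-4]

# Estimated order of convergence between consecutive levels:
# e_h ~ C h^r  =>  r ~ log2(e_h / e_{h/2}).
rates = [math.log2(e0 / e1) for e0, e1 in zip(errors, errors[1:])]
# each rate should hover around 2 for a second-order convergent scheme
```

The same computation applied to the right column of a table like Table 1 would reveal the drop to first order reported for the quasilinear model.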

We now repeat our numerical tests for the same network but with a semilinear gas flow model resulting from (1)–(3) by dropping the nonlinear term $\partial_x\big(\frac{m^2}{A\rho}\big)$ in

**Table 1** Errors $e_h = \big(a\|p_h(T) - p_{h/2}(T)\|^2 + b\|m_h(T) - m_{h/2}(T)\|^2\big)^{1/2}$ at time $T = 10$ obtained with the Petrov-Galerkin approximation for the network problem with different gas flow models: linear model (left), semilinear model (middle), and quasilinear model (right)


**Fig. 2** Flow rates at boundary vertices *v*<sup>1</sup> and *v*<sup>6</sup> for linear, semilinear, and quasilinear flow models

Eq. (2). The model parameters are chosen as $A = 1$, $c = 1$, and $\lambda/(2D) = 7/2$; the latter was selected such that the average of the resulting mass flow was similar to that of the linear model considered above. The computational results are depicted in the middle column of Table 1. Also for this nonlinear friction model, we observe second order convergence. These results can be explained theoretically in a similar way as those for the linear case by using a perturbation argument. In the right column of Table 1, we display the corresponding results for the quasilinear flow model (1)–(3) with the same parameters as used in the semilinear case. Note that a decrease in the convergence rates to first order is observed here. This is no surprise, since our analysis heavily relied on the anti-symmetry of the spatial derivative terms in (18)–(19), which is no longer valid for the quasilinear model (1)–(2).

In Fig. 2, we display the flow rates $m|_v$ at the boundary vertices $v_1$ and $v_6$ for the three different gas flow models discussed above as a function of time. The results are in reasonable agreement. In summary, the semilinear model seems to yield the best compromise between modelling errors and convergence order.

**Acknowledgements** Support by the German Science Foundation (DFG) via grants TRR 154 project C4, TRR 146 project C3, GSC 233, and Eg-331/1-1 is gratefully acknowledged.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **New Preconditioners for Semi-linear PDE-Constrained Optimal Control in Annular Geometries**

**Lasse Hjuler Christiansen and John Bagterp Jørgensen**

### **1 Introduction**

Large-scale optimization problems that are constrained by partial differential equations (PDEs) play a key role in various fields of science and engineering [2, 10]. As a challenge, the size and complexity of the PDE constraints present severe computational difficulties that often prevent the use of general-purpose black-box optimizers. As a consequence, cost-efficient, specialized solvers become essential [1, 3, 7, 8]. As a contribution in this direction, this paper demonstrates how to extend seminal ideas of Shen [15–17] to construct fast and memory-efficient optimizers for the class of semi-linear PDE-constrained optimization problems with non-linear reaction kinetics

$$\min_{y,\,u \in U_{\mathrm{ad}}} \quad \frac{1}{2}\int_{\Omega}\left(y(\mathbf{x}) - y_d(\mathbf{x})\right)^2 d\mathbf{x} + \frac{\rho}{2}\int_{\Omega}u(\mathbf{x})^2\,d\mathbf{x},\tag{1a}$$

$$\text{s.t.}\qquad -\Delta y + G(y) = u \quad \text{in} \quad \Omega.\tag{1b}$$

The paper focuses on the specific cases of either homogeneous (1) Dirichlet or (2) Neumann boundary conditions, where $\Omega \subset \mathbb{R}^2$ is an annular domain of the type

$$\Omega := \{(x, y) \in \mathbb{R}^2 \mid a \le x^2 + y^2 \le b\},\ 0 < a < b.\tag{2}$$

For a given non-linear reaction term, $G(\cdot)$, and Tikhonov regularization parameter, $\rho > 0$, the control problem (1) aims to determine the optimal state and control

L. H. Christiansen (-) · J. B. Jørgensen

Department of Applied Mathematics and Computer Science & Center for Energy Resources Engineering, Technical University of Denmark (DTU), Kgs. Lyngby, Denmark e-mail: lhch@dtu.dk; jbjo@dtu.dk

S. J. Sherwin et al. (eds.), *Spectral and High Order Methods for Partial Differential Equations ICOSAHOM 2018*, Lecture Notes in Computational Science and Engineering 134, https://doi.org/10.1007/978-3-030-39647-3\_35

variables, $(y^*, u^*)$, that minimize the objective (1a). Here the optimal solution must belong to the set of feasible pairs, $(y, u)$, that satisfy the PDE constraint (1b) and the additional admissibility condition, $u \in U_{\mathrm{ad}}$. To be concrete, this paper focuses on the case of bi-lateral point-wise control constraints

$$U_{\mathrm{ad}} := \{u \in L^2(\Omega) : u_a \le u(\mathbf{x}) \le u_b \text{ a.e. in } \Omega_d\}.\tag{3}$$

Point-wise bounds of the type (3) appear in a number of practical applications, where the control must satisfy, e.g., operational limitations that are not naturally captured by the underlying PDE (1b). In the limiting case, where $u_a := -\infty$ and $u_b := \infty$, the admissible set becomes $U_{\mathrm{ad}} = L^2(\Omega_d)$. This corresponds to the case where the PDE (1b) constitutes the only constraint.

#### *1.1 Main Contributions and Outline*

This paper contributes to a recent series of efforts by the authors that seek to construct fast, iterative solvers for a range of PDE-constrained optimization problems by exploiting the properties of customized spectral bases [4–6]. This series of work aims to introduce a high-order alternative to the widely-used constellation of low-order finite-element methods and Schur-complement preconditioners that currently predominates the literature on PDE control [12–14]. Previous efforts have mainly considered distributed control of elliptic and parabolic non-linear diffusion-reaction systems. The main focus has been on problems in rectangular domains, where PDEs constitute the only constraints. As a natural extension, this paper investigates how to modify the existing methods to account for (1) bound constraints of the type (3) and (2) different geometries. For the sake of brevity, the paper restricts attention to annular domains (2). However, with slight modifications, the approach generalizes to cylindrical geometries of the type

$$\Omega_C := \{(x, y, z) \in \mathbb{R}^3 \mid a \le x^2 + y^2 \le b,\ z \in (0, h)\},\ 0 < a < b.\tag{4}$$

As the main contribution, this work proposes a collection of Poisson-like preconditioners that are customized for efficient solution of the control problems (1) by a semi-smooth Newton (SSN) strategy [9]. Similar to a traditional Newton method, the SSN scheme solves (1) iteratively, finding a locally optimal solution to the non-linear Karush–Kuhn–Tucker (KKT) optimality conditions by solving a sequence of linearized, variable-coefficient subproblems. Direct solution of the subproblems is often time consuming and requires considerable memory allocation. To this end, the new preconditioners are designed to promote efficient solution of the SSN subproblems by appropriate Krylov subspace (KSP) methods. Following seminal ideas of Shen [16], the preconditioners rely on fast direct solvers for constant-coefficient problems that exploit (1) the structure of boundary-adapted spectral bases and (2) the separable nature of annular domains. As the main feature, inversion of the preconditioners decouples into a sequence of independent $2\times2$ systems. This implies that the preconditioners can be applied matrix-free and scale linearly with the problem size. In addition, the independence of the $2\times2$ systems makes the preconditioners amenable to parallelization. To establish proof of concept, a numerical case study solves (1), where $G(\cdot)$ is given by a cubic non-linearity. The results demonstrate computational efficiency and show that the preconditioners respond well to different problem sizes, boundary conditions, point-wise bound constraints and various choices of the regularization parameter, $\rho > 0$.

To establish the necessary background, Sect. 2 outlines how to solve the optimal control problem (1) using the SSN scheme. Further, to motivate the contributions of this paper, the section discusses some of the computational challenges that arise from discretization of the associated linearized subproblems. These challenges naturally lead to the construction of the new Poisson-like preconditioners in Sect. 3. Section 4 presents numerical results, while Sect. 5 draws overall conclusions and addresses future work.

#### **2 Motivation: A Semi-smooth Newton Method**

This paper solves the control problem (1) by a semi-smooth Newton strategy [9]. The SSN scheme seeks to generate a locally optimal solution, *(y, u),* by solving the first-order necessary optimality system

$$-\Delta \bar{y} + G(\bar{y}) - H(p) = 0 \qquad \text{in} \quad \Omega,\tag{5a}$$

$$-\Delta p + G_y(\bar{y})\,p + \varphi_y(\bar{y}) = 0 \quad \text{in} \quad \Omega.\tag{5b}$$

Here the boundary conditions of the original problem (1) are preserved, $G_y$ denotes the Fréchet derivative of $G$ with respect to the state variable, $y$, and the optimal control satisfies $u = H(p) = \max(u_a, \min(\rho^{-1}p(\mathbf{x}), u_b))$. In the special case $U_{\mathrm{ad}} := L^2(\Omega)$, it can be shown that $u = H(p) = \rho^{-1}p$ [18]. In the concrete case of annular domains (2), the system (5) can be recast in polar coordinates. To this end, define the functions

$$Y(t,\theta) := \mathbf{y}(r(t)\cos(\theta), r(t)\sin(\theta)), \; P(t,\theta) := p(r(t)\cos(\theta), r(t)\sin(\theta)), \tag{6}$$

where $r(t) := \frac{b-a}{2}(t+c)$, $t \in [-1, 1]$, $c = \frac{b+a}{b-a}$. The optimality system then reads

$$-\Delta_t Y + \kappa G(Y) - \kappa H(P) = 0 \qquad \text{in} \quad \Omega_R,\tag{7a}$$

$$-\Delta_t P + \kappa G_Y(Y)P + \kappa\varphi_Y(Y) = 0 \quad \text{in} \quad \Omega_R,\tag{7b}$$

where $\Delta_t Y := ((t+c)Y_t)_t + \frac{1}{t+c}Y_{\theta\theta}$, $\kappa = \frac{(t+c)(b-a)^2}{4}$ and $\Omega_R := [-1, 1] \times [0, 2\pi)$. To solve the KKT conditions (7), the SSN scheme considers the system as an operator equation $F(y, p) = 0$ and solves it by generating a recursive sequence of iterates, $x_i := (Y_i, P_i)$, $1 \le i \le k$, where the next iterate, $x_{k+1} := (Y, P)$, is found by solution of the linearized optimality conditions:

$$-\Delta_t Y + C_0(x_k)Y - C_1(x_k)P = f(x_k) \quad \text{in} \quad \Omega,\tag{8a}$$

$$-\Delta_t P + C_0(x_k)P + C_2(x_k)Y = g(x_k) \quad \text{in} \quad \Omega.\tag{8b}$$

Here $C_0(x_k) := \kappa G_Y(Y_k)$, $C_1(x_k) := \kappa H_P(P_k)$, $C_2(x_k) := \kappa\left(G_{YY}(Y_k)P_k + \varphi_{YY}(Y_k)\right)$ and

$$f(x_k) := \kappa\left(G_Y(Y_k)Y_k - G(Y_k) - (H_P(P_k)P_k - H(P_k))\right),\tag{9a}$$

$$g(x_k) := \kappa\left(G_{YY}(Y_k)P_kY_k + \varphi_{YY}(Y_k)Y_k - \varphi_Y(Y_k)\right),\tag{9b}$$

where $H_P$ denotes the generalized Newton derivative of $H$ with respect to the adjoint variable, $P$, i.e.,

$$H_P(P) = \frac{1}{\rho}\begin{cases}1 & \text{if } u_a \le \frac{1}{\rho}P \le u_b,\\ 0 & \text{otherwise.}\end{cases}\tag{10}$$
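The projection $H$ and its generalized Newton derivative (10) amount to a pointwise clamp and an indicator of the inactive set; a minimal sketch (the numerical parameter values are hypothetical):

```python
import numpy as np

def H(p, rho, ua, ub):
    """Optimal control u = H(p): projection of rho^{-1} p onto [ua, ub]."""
    return np.clip(p / rho, ua, ub)

def H_P(p, rho, ua, ub):
    """Generalized Newton derivative, Eq. (10): rho^{-1} on the inactive set, 0 elsewhere."""
    inactive = (ua <= p / rho) & (p / rho <= ub)
    return inactive / rho

p = np.array([-2.0, 0.2, 3.0])
u = H(p, rho=0.5, ua=-1.0, ub=1.0)     # -> [-1.0, 0.4, 1.0]
dU = H_P(p, rho=0.5, ua=-1.0, ub=1.0)  # -> [0.0, 2.0, 0.0]
```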

#### *2.1 Numerical Challenges: Discretization of the SSN Subproblems*

As a numerical challenge, the SSN scheme relies on successive solution of coupled PDEs in the form (8). Upon discretization, this leads to repeated solution of large saddle-point problems. To illustrate the associated difficulties, consider a spectral-Galerkin discretization of the linear subproblems (8). To this end, define the boundary-adapted approximation spaces

$$V_N := \{v \in \mathbb{P}_N : av(\pm1) + bv'(\pm1) = 0\},\quad \mathbb{F}_M := \mathrm{span}\{e^{ik(\cdot)},\ -M/2 \le k \le M/2 - 1\}.\tag{11}$$

Let *<sup>K</sup>* := *<sup>N</sup>* · *<sup>M</sup>* and define *SK* := *VN* <sup>×</sup> <sup>F</sup>*M.* The discrete Galerkin approximation of (8) then seeks to find *Y, P* ∈ *SK* such that

$$\langle (t+c)Y_t, v_t\rangle + \langle (t+c)^{-1}Y_\theta, v_\theta\rangle + \langle C_0Y - C_1P, v\rangle = \langle f, v\rangle \quad \forall v \in S_K,\tag{12a}$$

$$\langle (t+c)P_t, v_t\rangle + \langle (t+c)^{-1}P_\theta, v_\theta\rangle + \langle C_0P + C_2Y, v\rangle = \langle g, v\rangle \quad \forall v \in S_K,\tag{12b}$$

where $\langle v, w\rangle := \int_0^{2\pi}\!\int_{-1}^{1} v\,w\;dt\,d\theta$. To represent the approximate solutions, $Y_{N,M}$ and $P_{N,M}$, consider the truncated series expansions

$$Y_{N,M}(t,\theta) := \sum_{k=-M/2}^{M/2-1}\sum_{m=0}^{N-2}\widehat{y}_{l(k)m}\,\psi_m(t)e^{ik\theta},\quad P_{N,M}(t,\theta) := \sum_{k=-M/2}^{M/2-1}\sum_{m=0}^{N-2}\widehat{p}_{l(k)m}\,\psi_m(t)e^{ik\theta},\tag{13}$$

where $l(k) := k + \frac{M}{2}$. Now, define the $(N-1)\times(N-1)$ matrices associated with the basis $\{\psi_k\}_{k=0}^{N-2}$:

$$a_{lj} = \langle (c+t)\psi_j', \psi_l'\rangle,\qquad A = (a_{lj})_{l,j=0,\dots,N-2},\tag{14}$$

$$b_{lj} = \langle (c+t)^{-1}\psi_j, \psi_l\rangle,\quad B = (b_{lj})_{l,j=0,\dots,N-2}.\tag{15}$$

Note that appropriate choices of the basis functions $\{\psi_k\}_{k=0}^{N-2} \subset V_N$ will be constructed in Sect. 3. Further, let $\Gamma$ and $\Xi$ denote the $M \times M$ diagonal matrices defined by

$$\gamma_{mn} = \langle e^{in(\cdot)}, e^{im(\cdot)}\rangle = 2\pi\delta_{mn},\quad \xi_{mn} = mn\,\langle e^{in(\cdot)}, e^{im(\cdot)}\rangle = 2\pi\,nm\,\delta_{mn},\tag{16}$$

where *δmn* denotes the Kronecker delta. Finally, consider the *(MN* × 1*)* vectors

$$\widehat{y} := (\widehat{y}_0,\dots,\widehat{y}_{M-1}),\quad \widehat{y}_k = \{\widehat{y}_{jk}\}_{j=0}^{N-2},\tag{17}$$

$$\widehat{p} := (\widehat{p}_0,\dots,\widehat{p}_{M-1}),\quad \widehat{p}_k = \{\widehat{p}_{jk}\}_{j=0}^{N-2},\tag{18}$$

$$\widehat{G} := (\widehat{g}_0,\dots,\widehat{g}_{M-1}),\quad \widehat{g}_k = \{\langle g, \psi_j e^{ik(\cdot)}\rangle\}_{j=0}^{N-2},\tag{19}$$

$$\widehat{F} := (\widehat{f}_0,\dots,\widehat{f}_{M-1}),\quad \widehat{f}_k = \{\langle f, \psi_j e^{ik(\cdot)}\rangle\}_{j=0}^{N-2}.\tag{20}$$

The discretized linear subproblem (8) can then be written in matrix form

$$\underbrace{\begin{bmatrix} M_{C_2} & \mathbb{B} + M_{C_0} \\ \mathbb{B} + M_{C_0} & -M_{C_1}\end{bmatrix}}_{\mathcal{A}}\underbrace{\begin{bmatrix}\widehat{y}\\ \widehat{p}\end{bmatrix}}_{x} = \underbrace{\begin{bmatrix}\widehat{G}\\ \widehat{F}\end{bmatrix}}_{b},\tag{21}$$

where $\mathbb{B} = \Gamma \otimes A + \Xi \otimes B$. Here the matrices $M_{C_\ell}$, $\ell = 0, 1, 2$, are defined by the elements

$$(m_{C_\ell})_{ij} = \langle C_\ell\,\psi_k e^{im(\cdot)}, \psi_l e^{in(\cdot)}\rangle,\tag{22}$$

where *i, j* satisfy that

$$i = n(N-1) + (l+1),\quad j = m(N-1) + (k+1),\tag{23a}$$

$$0 \le k, l \le N - 2, \qquad 0 \le n, m \le M - 1. \tag{23b}$$

#### **3 New Poisson-Like Preconditioners**

As a significant challenge to the numerical solution of (7), the SSN scheme relies on repeated solution of saddle-point problems (21) of dimension $2(N-1)M \times 2(N-1)M$. Consequently, direct solution strategies often become computationally intractable. As a cost-efficient alternative, the following introduces new preconditioners that seek to accelerate the solution of the inner SSN subproblems (8) by applying appropriate Krylov subspace methods to the associated preconditioned linear systems

$$P_k^{-1}\mathcal{A}_k x_k = P_k^{-1} b_k.\tag{24}$$

Concretely, this paper proposes approximate constraint preconditioners of the type

$$P\_k = \begin{bmatrix} \widehat{M}\_{C\_2} & \widehat{\mathbb{B}} + \widehat{M}\_{C\_0} \\ \widehat{\mathbb{B}} + \widehat{M}\_{C\_0} & -\widehat{M}\_{C\_1} \end{bmatrix}. \tag{25}$$

Following ideas of traditional Poisson preconditioners, the new preconditioners are constructed by approximating each block of the SSN subproblem (21) by the matrices, $\widehat{\mathbb{B}}$ and $\widehat{M}_{C_\ell}$, $\ell = 0, 1, 2$, that come from a spectral Galerkin discretization of the corresponding constant-coefficient problem that determines $Y, P \in S_K$ such that

$$\langle C_A Y_t, v_t\rangle + \langle C_B Y_\theta, v_\theta\rangle + \langle \overline{C}_0 Y - \overline{C}_1 P, v\rangle = \langle f, v\rangle \quad \forall v \in S_K,\tag{26a}$$

$$\langle C_A P_t, v_t\rangle + \langle C_B P_\theta, v_\theta\rangle + \langle \overline{C}_0 P + \overline{C}_2 Y, v\rangle = \langle g, v\rangle \quad \forall v \in S_K,\tag{26b}$$

where $C_A = c$, $C_B = \frac{c}{c^2-1}$ and $\overline{C}_\ell = \frac{1}{2}\left(\max_{\Omega} C_\ell(x_k) + \min_{\Omega} C_\ell(x_k)\right)$, $\ell = 0, 1, 2$.

To be efficient, the new preconditioners crucially rely on carefully chosen basis functions $\{\psi_k\}_{k=0}^{N-2}$ for the discrete approximation space, $V_N$ (11). To this end, this paper uses Fourier-like (FL) bases that were originally introduced by Shen and Wang in the context of traditional initial-boundary-value problems [17]. As a key property for the construction of the preconditioners, the FL bases lead to diagonal mass and stiffness matrices, i.e.,

$$\mathbb{M}_{lj} = \langle\psi_j, \psi_l\rangle = \lambda_j\,\delta_{jl},\quad \mathbb{S}_{lj} = \langle\partial_t\psi_j, \partial_t\psi_l\rangle = \delta_{jl}.\tag{27}$$

The FL bases can be constructed as part of an offline preprocessing stage in two steps:

1. Let {*Lk(*·*)*} *N <sup>k</sup>*=<sup>0</sup> be the Legendre polynomials. Then there exists a unique set of coefficients {*ak, bk*} *N*−2 *<sup>k</sup>*=<sup>0</sup> such that

$$\phi\_k := c\_k \left( L\_k + a\_k L\_{k+1} + b\_k L\_{k+2} \right) \in V\_{k+2}, \quad c\_k := \left( \sqrt{-b\_k (4k+6)} \right)^{-1}.$$

Furthermore, the mass matrix, $M_A = (\langle\phi_j, \phi_i\rangle)_{ij}$, is penta-diagonal and symmetric positive definite, whereas the stiffness matrix, $S_A = (\langle\partial_x\phi_j, \partial_x\phi_i\rangle)_{ij}$, becomes diagonal [15]. In the concrete cases of Dirichlet and Neumann boundary conditions, the coefficients $\{a_k, b_k\}_{k=0}^{N-2}$ are given, respectively, by

$$a_k = 0,\ b_k = -1 \quad\text{and}\quad a_k = 0,\ b_0 = 1/2,\ b_k = -\frac{k(k+1)}{(k+2)(k+3)}.\tag{28}$$

2. The second step computes the diagonalization $\Lambda = Q^T M_A Q$, where $Q = (q_{ij})$ denotes the matrix of eigenvectors and $\{\lambda_i\}_{i=0}^{N-2}$ are the associated eigenvalues. Using the matrix $Q$, the FL basis can be constructed by the linear combinations:

$$\psi\_k(\mathbf{x}) = \sum\_{j=0}^{N-2} q\_{jk} \phi\_j(\mathbf{x}), \ 0 \le k \le N-2. \tag{29}$$
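For the Dirichlet case ($a_k = 0$, $b_k = -1$), the two steps above can be carried out with a few lines of linear algebra; the sketch below (the resolution $N$ is an arbitrary choice) verifies numerically that the resulting FL basis yields diagonal mass and stiffness matrices as in (27):

```python
import numpy as np
from numpy.polynomial import legendre as leg

N = 8  # arbitrary resolution
x, w = leg.leggauss(2 * N)  # Gauss-Legendre quadrature, exact for the products below

def phi(k):
    # Step 1 (Dirichlet): phi_k = (L_k - L_{k+2}) / sqrt(4k + 6), so phi_k(+-1) = 0
    c = np.zeros(k + 3)
    c[k], c[k + 2] = 1.0, -1.0
    return c / np.sqrt(4 * k + 6)

V = np.array([leg.legval(x, phi(k)) for k in range(N - 1)])               # phi values
dV = np.array([leg.legval(x, leg.legder(phi(k))) for k in range(N - 1)])  # derivatives
MA = (V * w) @ V.T    # penta-diagonal mass matrix
SA = (dV * w) @ dV.T  # stiffness matrix; identity thanks to the normalisation c_k

# Step 2: diagonalize MA and form the Fourier-like basis psi_k = sum_j q_{jk} phi_j
lam, Q = np.linalg.eigh(MA)
Psi, dPsi = Q.T @ V, Q.T @ dV
Mfl = (Psi * w) @ Psi.T    # = diag(lam): diagonal mass matrix
Sfl = (dPsi * w) @ dPsi.T  # = identity: diagonal stiffness matrix
```

Since $Q$ is orthogonal, the transformation preserves the identity stiffness matrix while diagonalizing the mass matrix, which is exactly the property (27) used in Sect. 3.1.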

#### *3.1 Efficient Inversion of the Preconditioners*

As the main feature of the preconditioners, *Pk,* the following describes an efficient inversion procedure that exploits the orthogonal structures of the FL bases (27). To this end, consider the following preconditioning problem that is solved during each iteration of the KSP method:

$$\underbrace{\begin{bmatrix}\widehat{M}_{C_2} & \widehat{\mathbb{B}} + \widehat{M}_{C_0}\\ \widehat{\mathbb{B}} + \widehat{M}_{C_0} & -\widehat{M}_{C_1}\end{bmatrix}}_{P_k}\underbrace{\begin{bmatrix}\widehat{y}^k\\ \widehat{p}^k\end{bmatrix}}_{\widehat{u}} = \underbrace{\begin{bmatrix}\widehat{G}^k\\ \widehat{F}^k\end{bmatrix}}_{\widehat{b}_k}.\tag{30}$$

Note that (30) corresponds to the discrete first-order necessary optimality conditions associated with the constant-coefficient optimal control problem (26). Hence, by definition (22), it follows that

$$\widehat{\mathbb{B}} = C_A\,\Gamma \otimes \mathbb{S} + C_B\,\Xi \otimes \mathbb{M},\qquad \widehat{M}_{C_\ell} = \overline{C}_\ell\,\Gamma \otimes \mathbb{M},\tag{31}$$

where $\mathbb{S}_{ij} = \langle\partial_t\psi_j, \partial_t\psi_i\rangle$ and $\mathbb{M}_{ij} = \langle\psi_j, \psi_i\rangle$. Further, by the orthogonality of the Fourier basis (16), the matrices $\Gamma$ and $\Xi$ are diagonal. Therefore, using the notation,

$$\widehat{y}_l^k = \{\widehat{y}_{lm}^k\}_{m=0}^{N-2},\quad \widehat{p}_l^k = \{\widehat{p}_{lm}^k\}_{m=0}^{N-2},\quad \widehat{G}_l^k = \{\widehat{G}_{lm}^k\}_{m=0}^{N-2},\quad \widehat{F}_l^k = \{\widehat{F}_{lm}^k\}_{m=0}^{N-2},$$

it follows that the preconditioning problem (30) can be written as *M* independent linear systems

$$\begin{bmatrix}2\pi\overline{C}_2\mathbb{M} & \Sigma_l\\ \Sigma_l & -2\pi\overline{C}_1\mathbb{M}\end{bmatrix}\begin{bmatrix}\widehat{y}_l^k\\ \widehat{p}_l^k\end{bmatrix} = \begin{bmatrix}\widehat{G}_l^k\\ \widehat{F}_l^k\end{bmatrix},\quad 0 \le l \le M-1,\tag{32}$$

where $\Sigma_l := C_A\mathbb{S} + (C_B\,k(l)^2 + 2\pi\overline{C}_0)\mathbb{M}$. In addition, the properties of the FL basis, $\{\psi_k\}_{k=0}^{N-2}$, imply that $\mathbb{S}$ and $\mathbb{M}$ are diagonal (29). Hence, the system (32) reduces to $M(N-1)$ independent $2\times2$ linear systems of the form

$$\begin{bmatrix}2\pi\overline{C}_2\lambda_m & \sigma_{lm}\\ \sigma_{lm} & -2\pi\overline{C}_1\lambda_m\end{bmatrix}\begin{bmatrix}\widehat{y}_{lm}^k\\ \widehat{p}_{lm}^k\end{bmatrix} = \begin{bmatrix}\widehat{G}_{lm}^k\\ \widehat{F}_{lm}^k\end{bmatrix},\quad 0 \le l \le M-1,\ 0 \le m \le N-2,\tag{33}$$

where $\sigma_{lm} := C_A + (C_B\,k(l)^2 + 2\pi\overline{C}_0)\lambda_m$. By (33), it follows that the original preconditioning problem (30) decouples into $(N-1)M$ independent $2\times2$ subsystems. As a consequence, the Poisson-like preconditioners (25) scale linearly with the problem size and can be applied matrix-free.
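The decoupled systems (33) can be inverted in closed form with vectorized operations, which is what makes the preconditioner application matrix-free and linear in the problem size; a minimal sketch with hypothetical sizes and coefficient values (not those of the paper's test case):

```python
import numpy as np

# Hypothetical sizes and constant coefficients, for illustration only
M, N = 16, 12
rng = np.random.default_rng(0)
lam = rng.uniform(0.5, 2.0, N - 1)              # diagonal FL mass entries lambda_m
CA, CB, C0, C1, C2 = 1.0, 0.8, 0.3, 0.2, 0.4    # C_A, C_B and averaged C-bar values
k = np.arange(M) - M // 2                        # Fourier wavenumbers k(l)

sigma = CA + (CB * k[:, None] ** 2 + 2 * np.pi * C0) * lam[None, :]   # sigma_{lm}
a = 2 * np.pi * C2 * lam[None, :] * np.ones((M, 1))   # (1,1) entries of each block
d = -2 * np.pi * C1 * lam[None, :] * np.ones((M, 1))  # (2,2) entries of each block

G = rng.standard_normal((M, N - 1))  # right-hand sides G^k_{lm}
F = rng.standard_normal((M, N - 1))  # right-hand sides F^k_{lm}

# Closed-form inverse of each 2x2 block [[a, sigma], [sigma, d]]
det = a * d - sigma ** 2
Y = (d * G - sigma * F) / det
P = (-sigma * G + a * F) / det
```

All $(N-1)M$ subsystems are solved in a handful of array operations, and each block is independent, so the loop-free computation also parallelizes trivially.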

#### **4 Numerical Results**

To investigate the potential of the Poisson-like preconditioners, the following case study solves the control problem (1), where the reaction term is given by the cubic non-linearity $G(y) := y^3$. The corresponding problem serves as a recurring example in the control literature [18]. In this case study, the goal is to track a desired state of the type

$$z_d(r,\theta) = \begin{cases}Z, & (r,\theta) \in [\alpha,\beta] \times \left([0,\pi/2] \cup [\pi,\pi/3]\right)\\ 0, & \text{otherwise,}\end{cases}\tag{34}$$

where $a \le \alpha < \beta \le b$. The following example uses the parameters $Z = 4$, $a = 30$, $\alpha = 40$ and $\beta = b = 60$. The main purpose of the study is to investigate the efficiency and robustness of the preconditioners (25). To this end, the study solves (1) for different choices of (1) problem size, (2) boundary conditions, (3) regularization parameter, and (4) point-wise bound constraints of the type (3).<sup>1</sup> As a benchmark reference, the results are compared to MATLAB's state-of-the-art direct solver. All computations are carried out in [11] on a 2.9 GHz Intel processor. The SSN scheme is said to have converged when the 2-norm difference between successive iterates is below $\eta = 10^{-4}$. The KSP iterations are performed using the MATLAB function GMRES with a tolerance of $10^{-9}$. The direct solver relies on MATLAB's backslash command. Table 1 lists the results, where KSP iter denotes the average number of KSP iterations required for each SSN step. Note also that DOF denotes the number of degrees of freedom for each individual SSN subproblem; the total number of degrees of freedom is therefore DOF$_T$ = #SSN steps $\times$ DOF.

The results reflect some overall tendencies that generalize to other choices of the parameters $Z$, $a$, $\alpha$, $\beta$ and $b$. Firstly, the preconditioners provide significant reductions in CPU time compared to the direct strategy. In particular, the results show that the non-linear control problem with up to DOF$_T$ = 875,000 unknowns can be solved in less than a minute using modest hardware. Secondly, the preconditioners prove robust with respect to the problem size and the choice of boundary conditions. Thirdly, as a drawback, the number of SSN steps and KSP iterations increases as the point-wise bounds become more strict. The authors suspect that these increases are caused by the combination of a decrease in regularity of the solution and an increase in non-linearity of the KKT system (Fig. 1).

<sup>1</sup>By the choices of parameters, the study strives to provide a representative example of the general tendencies of performance and robustness that can be expected from the preconditioners. To allow for more diverse and elaborate experiments, the MATLAB source code of this study has been made publicly available from https://github.com/LHCH-DK/PDE\_Control\_Annular.git.



**Fig. 1** The computed states for (1) Dirichlet boundary conditions, (2) Neumann boundary conditions and (3) the desired state, for $u_a = -35$, $u_b = 35$, $\rho = 10^{-4}$. Note that both solutions approximate the desired state well, despite the bound constraints

#### **5 Conclusions and Outlook**

This paper has proposed new Poisson-like preconditioners for semi-linear PDE-constrained optimization problems with non-linear reaction kinetics and point-wise bound constraints. The preconditioners specifically target problems in annular domains. Inspired by [16], the new preconditioners exploit the orthogonal properties of customized, boundary-adapted spectral bases. This leads to matrix-free preconditioners that scale linearly with the problem size. Numerical results have demonstrated that the preconditioners lead to fast solution of large-scale optimization problems, with significant computational benefits compared to MATLAB's state-of-the-art direct methods. Furthermore, the preconditioners have proven robust with respect to the problem size for both homogeneous Dirichlet and Neumann boundary conditions. As a challenge, numerical experiments indicated that the non-linearity of the problem increases as the point-wise bound constraints become more strict. In turn, this leads to an increase in the number of SSN steps and KSP iterations that are required to reach convergence. A future study seeks to improve this situation by providing the SSN scheme with an educated starting guess that uses a coarse-grid solution to a similar control problem with less restrictive constraints.

#### **References**



## **DIRK Schemes with High Weak Stage Order**

**David I. Ketcheson, Benjamin Seibold, David Shirokoff, and Dong Zhou**

### **1 Introduction**

Runge-Kutta (RK) methods achieve high-order accuracy in time by means of combining approximations to the solution at multiple stages. An *s*-stage RK scheme can be represented via the Butcher tableau

$$\begin{array}{c|c}\mathbf{c} & A\\\hline & \mathbf{b}^T\end{array}\;=\;\begin{array}{c|ccc}c_1 & a_{11} & \cdots & a_{1s}\\ \vdots & \vdots & & \vdots\\ c_s & a_{s1} & \cdots & a_{ss}\\\hline & b_1 & \cdots & b_s\end{array}\,.$$

D. I. Ketcheson
Applied Mathematics and Computational Science, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia e-mail: david.ketcheson@kaust.edu.sa

B. Seibold (-)
Department of Mathematics, Temple University, Philadelphia, PA, USA e-mail: seibold@temple.edu

D. Shirokoff
Department of Mathematical Sciences, New Jersey Institute of Technology, Newark, NJ, USA e-mail: david.g.shirokoff@njit.edu

D. Zhou
Department of Mathematics, California State University Los Angeles, Los Angeles, CA, USA e-mail: dzhou11@calstatela.edu

© The Author(s) 2020

S. J. Sherwin et al. (eds.), *Spectral and High Order Methods for Partial Differential Equations ICOSAHOM 2018*, Lecture Notes in Computational Science and Engineering 134, https://doi.org/10.1007/978-3-030-39647-3\_36

Throughout the paper we assume that $\mathbf{c} = A\mathbf{e}$, where $\mathbf{e}$ is the vector of all ones. The scheme's stability function [12], $R(\zeta) = 1 + \zeta\mathbf{b}^T(I - \zeta A)^{-1}\mathbf{e}$, measures the growth $u^{n+1}/u^n$ per step $\Delta t$ when applying the scheme to the linear model equation $u'(t) = \lambda u$, with $\zeta = \lambda\Delta t$.
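The stability function is straightforward to evaluate from the tableau; a short sketch (backward Euler as the test scheme, for which $R(\zeta) = 1/(1-\zeta)$):

```python
import numpy as np

def stability(A, b, zeta):
    """R(zeta) = 1 + zeta * b^T (I - zeta*A)^{-1} e for an s-stage RK tableau."""
    s = len(b)
    e = np.ones(s)
    return 1.0 + zeta * b @ np.linalg.solve(np.eye(s) - zeta * A, e)

# Backward Euler: A = [[1]], b = [1], hence R(zeta) = 1/(1 - zeta)
A = np.array([[1.0]])
b = np.array([1.0])
print(stability(A, b, -2.0))  # 1/(1 - (-2)) = 1/3
```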

A particular interest lies in the accuracy of the RK scheme for stiff problems, i.e., problems in which a larger time step is chosen than the fastest time scale of the problem's dynamics. A standard stiff model problem [8] is the scalar linear ordinary differential equation (ODE)

$$u' = \lambda(u - \phi(t)) + \phi'(t),\tag{1}$$

with initial condition $u(0) = \phi(0)$ and $\mathrm{Re}\,\lambda \le 0$. The true solution $u(t) = \phi(t)$ evolves on an $O(1)$ time scale. Hence, $\lambda$-values with large negative real part result in stiffness. Considering a family of test problems (parametrized by $\lambda$), one can now establish the scheme's convergence via two different limits: (a) the non-stiff limit $\Delta t \to 0$ and $\zeta \to 0$; and (b) the stiff limit $\Delta t \to 0$ and $\zeta \to -\infty$. A characteristic property of most RK schemes is that, while the non-stiff limit recovers the scheme's order (as given by the order conditions [2, 5]), the error decays at a reduced order in the stiff limit. This phenomenon is called "order reduction" (OR) [1, 3, 7, 10, 11] and it manifests in various ways for more complex problems, including numerical boundary layers [6]. The OR phenomenon can be seen by studying the RK scheme applied to (1). The approximation error at time $t_{n+1}$ reads [12, Chapter IV.15]

$$\epsilon^{n+1} = R(\zeta)\,\epsilon^n + \zeta\,\mathbf{b}^T(I - \zeta A)^{-1}\boldsymbol{\delta}_s^{n+1} + \delta^{n+1},\tag{2}$$

where *R(ζ )* is the growth factor, and

$$\boldsymbol{\delta}_s^{n+1} = \sum_{j\ge2}\frac{\Delta t^j}{(j-1)!}\,\boldsymbol{\tau}^{(j)}\phi^{(j)}(t_n),\quad \delta^{n+1} = \sum_{j\ge1}\frac{\Delta t^j}{(j-1)!}\left(\mathbf{b}^T\mathbf{c}^{j-1} - \frac{1}{j}\right)\phi^{(j)}(t_n),$$

are the truncation errors incurred at the intermediate stages and at the end of the step, respectively. Here, $\phi^{(j)}$ denotes the $j$-th derivative of the solution, and the vectors

$$\boldsymbol{\tau}^{(j)} = A\mathbf{c}^{j-1} - \frac{1}{j}\mathbf{c}^{j},\quad j = 1, 2, \dots$$

we call the *stage order residuals* or *stage order vectors*. The condition $\boldsymbol{\tau}^{(j)} = 0$ for $1 \le j \le \eta$ appears often in the literature and is also referred to as the simplifying assumption $C(\eta)$ [12]. In (2), the step error $\delta^{n+1}$ is of the formal order (in $\Delta t$) of the scheme (due to the order conditions). Moreover, the growth factor carries over (more or less, see [4]) the accuracy from one step to the next. Hence, the critical expression for OR is the term involving the stage error $\boldsymbol{\delta}_s^{n+1}$. Specifically, the asymptotic behavior of the expression

$$\mathbf{g}^{(j)} = \zeta\,\mathbf{b}^T(I - \zeta A)^{-1}\boldsymbol{\tau}^{(j)}\tag{3}$$

matters. In the non-stiff limit ($|\zeta| \ll 1$), a Neumann expansion yields $\zeta(I - \zeta A)^{-1} = \zeta I + \zeta^2A + \zeta^3A^2 + \dots$, leading to expressions $\mathbf{b}^TA^{\ell}\boldsymbol{\tau}^{(j)}$ with $\ell \ge 0$. And in fact the order conditions guarantee that $\mathbf{b}^TA^{\ell}\boldsymbol{\tau}^{(j)} = 0$ for $0 \le \ell + j \le p - 1$ to ensure the formal order of the scheme.

Conversely, in the stiff limit we can treat *ζ*<sup>−1</sup> as the small parameter and expand *ζ*(*I* − *ζA*)<sup>−1</sup> = −*A*<sup>−1</sup>(*I* − *ζ*<sup>−1</sup>*A*<sup>−1</sup>)<sup>−1</sup> = −*A*<sup>−1</sup> − *ζ*<sup>−1</sup>*A*<sup>−2</sup> − *ζ*<sup>−2</sup>*A*<sup>−3</sup> − ..., leading to expressions **b**<sup>*T*</sup>*A*<sup>ℓ</sup>*τ*<sup>(*j*)</sup> with ℓ < 0. The order conditions do *not* imply that these quantities vanish, and in general one may observe a reduced rate of convergence.

A key question is therefore whether additional conditions can be imposed on the RK scheme that recover the scheme's order in the stiff regime. A well-known answer to the question is:

**Definition 1** Let *p̂* denote the order of the quadrature rule of an RK scheme. Let *q̂* denote the largest integer such that *τ*<sup>(*j*)</sup> = 0 for 1 ≤ *j* ≤ *q̂*. The *stage order* of an RK scheme is *q* = min(*p̂*, *q̂*).

Having stage order *q* implies that the error decays at an order of (at least) *q* in the stiff regime (see also [12]). This work focuses particularly on diagonally-implicit Runge-Kutta (DIRK) schemes, for which *A* is lower triangular. A known drawback of DIRK schemes is that they cannot have high stage order:

**Theorem 1** *The stage order of an irreducible DIRK scheme is at most 2. The stage order of a DIRK scheme with non-singular A is at most 1.*

*Proof* Since **c** = *A***e**, we have *τ*<sub>1</sub><sup>(2)</sup> = *a*<sub>11</sub>*c*<sub>1</sub> − ½(*c*<sub>1</sub>)<sup>2</sup> = ½(*a*<sub>11</sub>)<sup>2</sup>. Thus if *A* is non-singular, one has *τ*<sup>(2)</sup> ≠ 0, so *q* ≤ 1. Consider now the case that *a*<sub>11</sub> = *c*<sub>1</sub> = 0, and suppose that the method has stage order 3. The conditions *τ*<sub>2</sub><sup>(2)</sup> = *τ*<sub>2</sub><sup>(3)</sup> = 0 then imply *a*<sub>21</sub> = *a*<sub>22</sub> = *c*<sub>2</sub> = 0, which would render the scheme reducible. Hence, *q* ≤ 2. ⊓⊔
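The first step of the proof can be verified numerically from the definition of the stage order vectors. A small sketch, using the one-stage implicit midpoint rule (*a*<sub>11</sub> = 1/2) as the example; the helper name is ours, not from the paper:

```python
import numpy as np

def stage_order_residuals(A, jmax):
    """tau^(j) = A c^(j-1) - (1/j) c^j for j = 1..jmax, with c = A e."""
    A = np.asarray(A, dtype=float)
    c = A.sum(axis=1)                 # abscissae: c = A e
    return [A @ c**(j - 1) - c**j / j for j in range(1, jmax + 1)]

# One-stage DIRK (implicit midpoint rule, a11 = 1/2):
tau1, tau2 = stage_order_residuals([[0.5]], 2)
# tau^(1) = 0 (consistency), while tau^(2) = a11^2/2 = 1/8 is nonzero,
# so the stage order is 1, as Theorem 1 predicts for non-singular A.
print(tau1[0], tau2[0])  # → 0.0 0.125
```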

Hence, while DIRK schemes possess an implementation-friendly structure (each stage is a backward-Euler-type solve), their potential to avoid OR by means of high stage order is limited. We therefore turn to a weaker condition that, in the context of DIRK schemes, can avoid OR in some situations at higher order.

#### **2 Weak Stage Order**

To avoid order reduction, the expressions *g*<sup>(*j*)</sup> in (3) need to vanish in the stiff limit. In line with [9], we define the following criteria:

**Definition 2 (Weak Stage Order)** An RK scheme has weak stage order (WSO) *q̃* if there is an *A*-invariant subspace that is orthogonal to **b** and that contains the stage order vectors *τ*<sup>(*j*)</sup> for 1 ≤ *j* ≤ *q̃*.

**Theorem 2 (WSO Is the Most General Condition that Ensures** *g*<sup>(*j*)</sup> = 0 **for All** *ζ* > 0**)** *Let coefficients A,* **b** *be given. Then g*<sup>(*j*)</sup> = 0 *for all ζ* > 0 *and* 1 ≤ *j* ≤ *q̃ if and only if the corresponding RK scheme has weak stage order q̃.*

*Proof* Let *C(G)* denote the column space of

$$G := \left[ \boldsymbol{\tau}^{(1)}, A\boldsymbol{\tau}^{(1)}, A^2\boldsymbol{\tau}^{(1)}, \dots, A^{s-1}\boldsymbol{\tau}^{(1)}, \boldsymbol{\tau}^{(2)}, A\boldsymbol{\tau}^{(2)}, \dots, A^{s-1}\boldsymbol{\tau}^{(\tilde{q})} \right].$$

From the Cayley-Hamilton theorem it follows that WSO *q̃* is equivalent to

$$\mathbf{b}^T A^\ell \boldsymbol{\tau}^{(j)} = 0, \qquad 0 \le \ell \le s - 1, \ 1 \le j \le \tilde{q} \; . \tag{4}$$

⇒ Because *C(G)* is *A*-invariant, *C(G)* is invariant under multiplication by (*I* − *ζA*)<sup>−1</sup>, i.e., if **v** ∈ *C(G)* then for any *ζ* > 0, the product (*I* − *ζA*)<sup>−1</sup>**v** ∈ *C(G)*. Since **b** is orthogonal to *C(G)*, we have *g*<sup>(*j*)</sup> = 0 for all 1 ≤ *j* ≤ *q̃*.

⇐ If *g*<sup>(*j*)</sup> = 0, then *ζ*<sup>−1</sup>*g*<sup>(*j*)</sup> = **b**<sup>*T*</sup>(*I* − *ζA*)<sup>−1</sup>*τ*<sup>(*j*)</sup> = 0 for all *ζ* > 0. Differentiating both sides of this equation ℓ times with respect to *ζ*, and taking the limit as *ζ* → 0<sup>+</sup>, yields the conditions in Eq. (4). ⊓⊔
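Condition (4) also gives a direct way to compute the WSO of a given tableau. A hedged sketch (the helper `weak_stage_order` is our own naming, and the implicit midpoint rule serves only as a toy example):

```python
import numpy as np

def weak_stage_order(A, b, jmax=10, tol=1e-12):
    """Largest q such that b^T A^l tau^(j) = 0 for 0<=l<=s-1, 1<=j<=q (Eq. 4)."""
    A, b = np.asarray(A, float), np.asarray(b, float)
    s = len(b)
    c = A.sum(axis=1)
    q = 0
    for j in range(1, jmax + 1):
        tau = A @ c**(j - 1) - c**j / j
        if all(abs(b @ np.linalg.matrix_power(A, l) @ tau) < tol for l in range(s)):
            q = j
        else:
            break
    return q

# Implicit midpoint rule: tau^(2) = 1/8 is not annihilated by b, so WSO = 1.
print(weak_stage_order([[0.5]], [1.0]))  # → 1
```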

**Definition 3 (Weak Stage Order Eigenvector Criterion)** An RK scheme satisfies the WSO eigenvector criterion of order *q̃*<sub>*e*</sub> if for each 1 ≤ *j* ≤ *q̃*<sub>*e*</sub>, there exists *μ*<sub>*j*</sub> such that *Aτ*<sup>(*j*)</sup> = *μ*<sub>*j*</sub>*τ*<sup>(*j*)</sup>, and moreover, **b**<sup>*T*</sup>*τ*<sup>(*j*)</sup> = 0.

The WSO eigenvector criterion of order *q̃*<sub>*e*</sub> implies WSO (of at least) *q̃*<sub>*e*</sub>. For a given scheme, let *p* denote the classical order, *q* the stage order, and *q̃* the weak stage order. Then we have *q̃* ≥ *q* and *p* ≥ *q*. Note however that a method with WSO *q̃* ≥ 1 need not even be consistent; order conditions must be imposed separately.

The WSO eigenvector criterion may serve to avoid OR because it implies that

$$g^{(j)} = \zeta\, \mathbf{b}^{T} (I - \zeta A)^{-1} \boldsymbol{\tau}^{(j)} = \frac{\zeta}{1 - \zeta \mu\_{j}}\, \mathbf{b}^{T} \boldsymbol{\tau}^{(j)}\,,$$

i.e., it allows one to "push" the stage order residuals past the matrix (*I* − *ζA*)<sup>−1</sup>, and then use **b**<sup>*T*</sup>*τ*<sup>(*j*)</sup> = 0. Note that the condition **b**<sup>*T*</sup>*τ*<sup>(*j*)</sup> = 0 required in Definition 3 is automatically satisfied (due to the order conditions) if *p* > *q̃*<sub>*e*</sub> (or *p* ≥ *q̃*<sub>*e*</sub> for stiffly accurate schemes).

It must be stressed that the concept of WSO (both criteria) is based on the linear test equation (1), hence it is not clear to what extent WSO will remedy OR for nonlinear problems or problems with time-dependent coefficients. In Sect. 4 we numerically investigate some nonlinear test problems.

Finally, we present a limitation theorem on the WSO eigenvector criterion.

**Theorem 3** *DIRK schemes with invertible A have q̃*<sub>*e*</sub> ≤ 3*.*

*Proof* Because the *τ*<sup>(*j*)</sup> only depend on *A*, the eigenvector relation in Definition 3 depends only on *A*, not on **b**. With *A* lower triangular, the first *k* components of *τ*<sup>(*j*)</sup> depend only on the upper *k* rows of *A*; the same is true for the eigenvector relation. Hence, for a scheme to have an *A* that allows for the WSO eigenvector criterion of order *q̃*<sub>*e*</sub>, all upper sub-matrices of *A* must admit the same, too. We can therefore study *A* row by row. The first component of *τ*<sup>(*j*)</sup> equals (1 − 1/*j*)*a*<sub>11</sub><sup>*j*</sup>, which is nonzero for *j* > 1. Hence, the first row of the equation *Aτ*<sup>(*j*)</sup> = *μ*<sub>*j*</sub>*τ*<sup>(*j*)</sup> is equivalent to *μ*<sub>*j*</sub> = *a*<sub>11</sub>. With that, we can move to the second row of the equation, which reads

$$\left(1 - \frac{1}{j}\right)a\_{11}^j a\_{21} + (a\_{22} - a\_{11})\left(a\_{11}^{j-1} a\_{21} + (a\_{21} + a\_{22})^{j-1} a\_{22} - \frac{1}{j}(a\_{21} + a\_{22})^j\right) = 0 \,\, . \tag{5}$$

To determine the set of solutions (*a*<sub>11</sub>, *a*<sub>21</sub>, *a*<sub>22</sub>) of (5), we first observe that (5) is homogeneous, i.e., if (*a*<sub>11</sub>, *a*<sub>21</sub>, *a*<sub>22</sub>) solves (5), then (*μa*<sub>11</sub>, *μa*<sub>21</sub>, *μa*<sub>22</sub>) solves (5) as well for any *μ* ∈ ℝ. It therefore suffices to consider the solutions of (5) in the 2D plane of (*a*<sub>11</sub>/*a*<sub>21</sub>, *a*<sub>22</sub>/*a*<sub>21</sub>). Figure 1 shows the resulting solution curves for *j* ∈ {2, 3, 4}.

One class of solutions lies on the straight line of slope 1 passing through (1, 0). Those schemes are *equal-time* methods, i.e., RK schemes that have **c** = *ν***e**, where *ν* ∈ ℝ is a constant. In fact, equal-time schemes satisfy the eigenvector relation for all *j*. However, they are not particularly useful RK methods, because (among other limitations) they are restricted to second order. This follows because the order 1 and 2 conditions require **b**<sup>*T*</sup>**e** = 1 and **b**<sup>*T*</sup>**c** = 1/2. Thus *ν* = 1/2, and **b**<sup>*T*</sup>**c**<sup>2</sup> = *ν*<sup>2</sup> = 1/4, which contradicts the order 3 condition **b**<sup>*T*</sup>**c**<sup>2</sup> = 1/3. Note that the equal-time scenario also covers the points at infinity in Fig. 1, i.e., the schemes with *a*<sub>21</sub> = 0.
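The claim that equal-time schemes satisfy the eigenvector relation for every *j* is easy to verify numerically. A sketch with an arbitrary two-stage equal-time tableau (**c** = ½**e**; the coefficients are illustrative, not taken from this paper):

```python
import numpy as np

# Two-stage equal-time tableau: both abscissae equal nu = 1/2 (c = nu * e).
A = np.array([[0.5,  0.0],
              [0.25, 0.25]])
c = A.sum(axis=1)           # -> [0.5, 0.5]
nu = c[0]

for j in range(2, 6):
    tau = A @ c**(j - 1) - c**j / j
    # tau^(j) is proportional to e, so A tau^(j) = nu tau^(j) for every j.
    assert np.allclose(A @ tau, nu * tau)
print("eigenvector relation holds with mu_j =", nu)
```

The reason is visible in closed form: with **c** = *ν***e** one gets *τ*<sup>(*j*)</sup> = *ν*<sup>*j*</sup>(1 − 1/*j*)**e**, an eigenvector of *A* with eigenvalue *ν*.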

**Fig. 1** Curves of WSO orders 2, 3, and 4 as functions of the re-scaled parameters *a*<sub>11</sub>/*a*<sub>21</sub> and *a*<sub>22</sub>/*a*<sub>21</sub>. Left panel: scale 10; right panel: scale 1. All orders are satisfied along the line of slope 1 going through (1, 0), corresponding to equal-time DIRK schemes. Moreover, there are two further points (other than the origin) where orders 2 and 3 are satisfied. Neither of these two points satisfies order 4

Non-equal-time schemes that satisfy (5) for *j* = 2 and *j* = 3 are the following two points in the (*a*<sub>11</sub>/*a*<sub>21</sub>, *a*<sub>22</sub>/*a*<sub>21</sub>) plane: *P*<sub>1</sub> = (−4 + 3√2, √2 − 1) ≈ (0.2426, 0.4142) and *P*<sub>2</sub> = (−(√2 + 1)(√2 + 2), −(√2 + 1)) ≈ (−8.2426, −2.4142). Neither of these two points satisfies (5) for *j* = 4 (green curve in Fig. 1). Therefore *q̃*<sub>*e*</sub> ≤ 3. ⊓⊔
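The location of *P*<sub>1</sub> can be double-checked by substituting it into (5); a small numerical sketch, using the homogeneity of (5) to set *a*<sub>21</sub> = 1:

```python
import numpy as np

# Substitute P1 = (a11/a21, a22/a21) into Eq. (5), with a21 = 1 by homogeneity.
a11, a21, a22 = -4 + 3*np.sqrt(2), 1.0, np.sqrt(2) - 1

def lhs5(j):
    """Left-hand side of Eq. (5), the second row of the eigenvector relation."""
    c2 = a21 + a22
    return ((1 - 1/j) * a11**j * a21
            + (a22 - a11) * (a11**(j-1)*a21 + c2**(j-1)*a22 - c2**j / j))

assert abs(lhs5(2)) < 1e-12 and abs(lhs5(3)) < 1e-12  # P1 solves (5) for j = 2, 3
assert abs(lhs5(4)) > 1e-3                            # ... but not for j = 4
print("P1 satisfies (5) for j = 2, 3 only")
```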

Among the two sets of solutions found in the proof, *P*<sub>1</sub> implies that *a*<sub>11</sub>, *a*<sub>21</sub>, and *a*<sub>22</sub> all have the same sign, which is a desirable property. In contrast, *P*<sub>2</sub> implies that *a*<sub>21</sub> < 0. Both WSO 3 schemes presented below correspond to the *P*<sub>1</sub> solution.

#### **3 DIRK Schemes with High Weak Stage Order**

Imposing the classical order conditions [2, 5], together with the WSO eigenvector relation (Definition 3), we determine RK schemes by searching the parameter space of DIRK schemes (with all diagonal entries non-zero). A stiffly accurate structure (**b**<sup>*T*</sup> equals the last row of *A*) is imposed, as is A-stability (verified by evaluating the stability function *R(ζ)* along the imaginary axis). Together this implies that the resulting scheme is L-stable; i.e., it ensures that unresolved stiff modes decay [5]. The number of stages is chosen so that the constraints admit solutions. The optimization itself is carried out with MATLAB's Optimization Toolbox, using several of the local optimization algorithms included in the function `fmincon`. An effort was made to minimize the *L*<sup>2</sup> norm of the local truncation error coefficients. However, in multiple cases the solver exhibited poor convergence; so while the schemes below yield reasonable truncation errors, they should not be expected to be optimal. We find an order 3 scheme with WSO 2 (see also [9]),


an order 3 scheme with WSO 3,


and an order 4 scheme with WSO 3,


#### **4 Numerical Results**

In this section we verify the order of accuracy of the schemes above and demonstrate that WSO remedies order reduction for linear problems. We confirm that WSO *p* is required for ODEs, and WSO *p* − 1 is required for PDE IBVPs. In addition, we study the effect of WSO for two nonlinear problems.

#### *4.1 Linear ODE Test Problem*

We consider the linear ODE test problem (1) with the true solution *φ(t)* = sin(*t* + *π*/4), the stiffness parameter *λ* = −10<sup>4</sup>, and the initial condition *u*(0) = sin(*π*/4). The problem is solved using three 3rd order DIRK schemes (with WSO 1, 2, and 3) and two 4th order DIRK schemes (with WSO 1 and 3)<sup>1</sup> up to the final time *T* = 10. The convergence results are shown in Fig. 2. In the stiff regime where |*ζ*| = |*λ*|*Δt* ≫ 1, first order convergence is observed for the WSO 1 schemes as expected, the WSO 2 scheme improves the convergence rate to 2, and the WSO 3 schemes exhibit 3rd order convergence. In addition to yielding better convergence orders in the stiff regime, the schemes with higher WSO also turn out to yield substantially smaller error constants in the non-stiff regime (*Δt* ≪ 1/|*λ*|). For comparison, we also display a DIRK scheme with explicit first stage (EDIRK), that is, *a*<sub>11</sub> = 0, of stage order 2 (see Theorem 1). The left panel of Fig. 2 shows that the WSO 2 scheme exhibits the same convergence behavior as the stage order 2 EDIRK scheme and performs equally well in terms of accuracy.

#### *4.2 Linear PDE Test Problem: Schrödinger Equation*

As a linear PDE test problem, we study the dispersive Schrödinger equation. The method of manufactured solutions is used, i.e., the forcing, the boundary conditions (b.c.) and initial conditions (i.c.) are selected to generate a desired true solution. The

<sup>1</sup>We do not construct an order 4 scheme with WSO 2, as we see no role for such a method.

**Fig. 2** Error convergence for linear ODE test problem (1). Left: 3rd order DIRK schemes with WSO 1 (blue circles), WSO 2 (red triangles), WSO 3 (black squares), and a 3rd order EDIRK scheme with stage order 2 (light red dots). Right: 4th order DIRK schemes with WSO 1 (blue circles) and WSO 3 (red triangles)

**Fig. 3** Error convergence for the Schrödinger equation using 3rd order DIRK schemes with WSO 1 (left) and WSO 3 (middle), and a 4th order DIRK with WSO 3 (right)

spatial approximation is carried out using 4th order centered differences on a fixed spatial grid of 10,000 cells. This renders spatial approximation errors negligible and thus isolates the temporal errors due to DIRK schemes. The errors are measured in the maximum norm in space.

We consider

$$u\_t = \frac{i\omega}{k^2} u\_{xx} \text{ for } (x, t) \in (0, 1) \times (0, 1.2], \quad u = g \text{ on } \{0, 1\} \times (0, 1.2] \text{ }, \tag{6}$$

with the true solution *u(x, t)* = *e*<sup>*i*(*kx*−*ωt*)</sup>, *ω* = 2*π* and *k* = 5. Figure 3 shows the convergence orders of *u*, *u*<sub>*x*</sub> and *u*<sub>*xx*</sub> for 3rd order DIRK schemes with WSO 1 (left), WSO 3 (middle) and a 4th order DIRK scheme with WSO 3 (right). For IBVPs, spatial boundary layers are produced by RK methods, thus limiting the convergence order in *u* to *q̃* + 1, with an additional half an order lost per derivative when *q̃* < *p* [9]. As a result, the 4th order WSO 3 scheme recovers 4th order convergence in *u* and improves the convergence in *u*<sub>*x*</sub> and *u*<sub>*xx*</sub>. When *q̃* = *p*, the full convergence order in *u*, *u*<sub>*x*</sub> and *u*<sub>*xx*</sub> is achieved, as seen in the middle panel of Fig. 3.

**Fig. 4** Error convergence for the viscous Burgers' equation using 3rd order DIRK schemes with WSO 1 (left), WSO 2 (middle) and WSO 3 (right)

#### *4.3 Nonlinear PDE Test Problem: Burgers' Equation*

This example demonstrates that WSO avoids order reduction for certain nonlinear IBVPs as well. We consider the viscous Burgers' equation with pure Neumann b.c.

$$u\_t + uu\_x = \nu u\_{xx} + f \quad \text{for } (x, t) \in (0, 1) \times (0, 1], \quad u\_x = h \quad \text{on } \{0, 1\} \times (0, 1] \tag{7}$$

Here *ν* = 0.1 and *u(x, t)* = cos(2 + 10*t*) sin(0.2 + 20*x*). The nonlinear implicit equations arising at each time step are solved using a standard Newton iteration. The choice of Neumann b.c. distinguishes this example from the one given in [9]. With Neumann b.c., the convergence order in *u* is limited to *q̃* + 1.5 (half an order better than with Dirichlet b.c.). Figure 4 shows that order reduction arises with the stage order 1 scheme, that the WSO 2 scheme recovers 3rd order convergence for *u* and *u*<sub>*x*</sub>, and that the 3rd order WSO 3 scheme yields 3rd order convergence for *u*, *u*<sub>*x*</sub> and *u*<sub>*xx*</sub>.

#### *4.4 Stiff Nonlinear ODE: Van der Pol Oscillator*

This example illustrates that DIRK schemes with high WSO may not remove order reduction for all types of nonlinear problems. Consider the Van der Pol oscillator

$$x' = y \quad \text{and} \quad y' = \mu(1 - x^2)y - x \; , \tag{8}$$

with i.c. (*x*(0), *y*(0)) = (2, 0), stiffness parameter *μ* = 500, and final time *T* = 10. The nonlinear system at each time step is solved via MATLAB's built-in nonlinear system solver. The "exact" solution is computed using explicit RK4 with a time step *Δt* = 10<sup>−6</sup>. In this case, the presented DIRK schemes with high WSO do not improve the convergence rates in the stiff regime and they perform worse than the WSO 1 scheme in terms of accuracy (see Fig. 5). On the other hand, an EDIRK with

**Fig. 5** Error convergence for Van der Pol's equation. Left: 3rd order DIRK schemes with WSO 1 (blue circles), WSO 2 (red triangles) and WSO 3 (black squares). Right: 4th order DIRK schemes with WSO 1 (blue circles) and WSO 3 (red triangles), and a 3rd order EDIRK scheme with stage order 2 (black squares)

stage order 2 improves the rate of convergence in the stiff regime (see right panel in Fig. 5). However, it does so, interestingly, by yielding larger errors for large time steps.

#### **5 Conclusions and Outlook**

This study demonstrates that it is possible to overcome order reduction (OR) for certain classes of problems in the context of DIRK schemes, even though these are limited to low stage order. A specific *weak stage order* (WSO) "eigenvector" criterion has been presented, analyzed, and applied to determine DIRK schemes with WSO up to 3. The numerical results confirm that the schemes avoid OR for linear problems and for some nonlinear problems in which the mechanism for order reduction is linear (i.e., boundary conditions). The key limitation found herein is that the eigenvector criterion cannot go beyond WSO 3 for DIRK schemes. Hence, a key question of future research is how high WSO is admitted by the general criterion in Definition 2. Another important future research task is to devise further DIRK schemes that are truly optimized in terms of truncation error coefficients or other criteria.

**Acknowledgements** This work was supported by the National Science Foundation via grants DMS-1719640 (BS&DZ) and DMS-1719693 (DS); and the Simons Foundation (#359610) (DS).

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Scheme for Evolutionary Navier-Stokes-Fourier System with Temperature Dependent Material Properties Based on Spectral/hp Elements**

**Jan Pech**

#### **1 Introduction**

This work presents a numerical algorithm for the system of the Navier-Stokes equations coupled with the balance of internal energy

$$\rho \left( \frac{\partial \mathbf{v}}{\partial t} + \mathbf{v} \cdot \nabla \mathbf{v} \right) = -\nabla p + \frac{1}{\text{Re}} \nabla \cdot \left[ 2\mu \mathbb{D} + \lambda \left( \nabla \cdot \mathbf{v} \right) \mathbb{I} \right] + \mathbf{f}\_{\mathbf{v}} \tag{1a}$$

$$\frac{\partial \rho}{\partial t} + \nabla \cdot (\rho \mathbf{v}) = m \tag{1b}$$

$$\frac{\partial T}{\partial t} + \mathbf{v} \cdot \nabla T = \frac{1}{\text{Re}\,\text{Pr}} \nabla \cdot (\kappa \nabla T) + f\_T \,, \tag{1c}$$

where **v** = [*u, v, w*]<sup>T</sup> is the velocity vector (by setting *w* = const. = 0 we restrict to a 2D problem), *p* is a variable related to the thermodynamic pressure,<sup>1</sup> *T* denotes the temperature, and 𝔻 = ½(∇**v** + (∇**v**)<sup>T</sup>) is the symmetric part of the rate of strain



© The Author(s) 2020

<sup>1</sup>We call *thermodynamic pressure* the variable acting in the equation of state, e.g. *p* = *ρRT* for an ideal gas. Quantities with physical units (superscript star) are normalized by their far-field values (subscript infinity), e.g. **v** = **v**<sup>∗</sup>/|**v**<sup>∗</sup><sub>∞</sub>|, *T* = *T*<sup>∗</sup>/*T*<sup>∗</sup><sub>∞</sub>, etc. The dimensionless pressure in (1a) is *p* = *p*<sup>∗</sup>/(*ρ*<sup>∗</sup><sub>∞</sub>|**v**<sup>∗</sup><sub>∞</sub>|<sup>2</sup>).

Institute of Thermomechanics of the Czech Academy of Sciences, Praha 8, Czech Republic e-mail: jpech@it.cas.cz

S. J. Sherwin et al. (eds.), *Spectral and High Order Methods for Partial Differential Equations ICOSAHOM 2018*, Lecture Notes in Computational Science and Engineering 134, https://doi.org/10.1007/978-3-030-39647-3\_37

tensor, the constant Re is the Reynolds number and the constant Pr is the Prandtl number (for the sake of simplicity we set Re = Pr = 1 when testing against an exact solution).

The fluid is assumed to be (calorically) perfect,<sup>2</sup> Newtonian,<sup>3</sup> and its heat flux obeys the Fourier law.<sup>4</sup> In system (1), we consider those fluids which become non-homogeneous in variable temperature fields due to the temperature dependence of their material parameters, namely the density *ρ* = *ρ(T)*, dynamic viscosity *μ* = *μ(T)* and thermal conductivity *κ* = *κ(T)*.

Instead of (1a), we solve

$$\rho \left( \frac{\partial \mathbf{v}}{\partial t} + \mathbf{v} \cdot \nabla \mathbf{v} \right) = -\nabla \tilde{p} + \frac{1}{\text{Re}} \nabla \cdot \left[ 2\mu \mathbb{D} - \frac{2}{3} \mu \left( \nabla \cdot \mathbf{v} \right) \mathbb{I} \right] + \mathbf{f}\_{\mathbf{v}},\tag{2}$$

where *p̃* = *p* − *μ*<sub>*b*</sub>∇ · **v** is the *mean* or *mechanical* pressure, while *μ*<sub>*b*</sub> = *λ* + (2/3)*μ* is the *bulk viscosity*. Equation (2) has the same structure as (1a) with *λ* = −(2/3)*μ* (or equivalently *μ*<sub>*b*</sub> = 0, cf. the Stokes hypothesis), but the physical interpretation of the pressure changes.

By solving (2) instead of (1a), we avoid, without loss of generality, the specification of the second viscosity coefficient *λ*.

The forcing terms **f**<sub>**v**</sub>, *f*<sub>*T*</sub> may represent the action of volumetric forces, e.g. gravity or viscous heating, while *m* is set to zero in most realistic situations. When testing our algorithm on a given solution [**v**<sub>*e*</sub>, *p*<sub>*e*</sub>, *T*<sub>*e*</sub>]<sup>T</sup>, we construct the forcing terms such that Eqs. (2), (1b) and (1c) are satisfied.

Our computational scheme is developed for simulations based on the spectral/hp element approximation in the spatial coordinates. We use polynomial approximations of degree 15 in our tests, which effectively eliminates the spatial discretisation error and isolates the error produced directly by the algorithm/discretisation in time. The high order spatial approximation also naturally provides approximations of higher-order derivatives, which is exploited in the scheme.

To the author's knowledge, previous results in the literature are restrictions of (1) that set at least one of the material parameters constant, require the velocity field to be divergence-free, or model a stationary flow; see Table 1.

<sup>2</sup>The internal energy *e* of a *calorically perfect* fluid obeys *e* = *c*<sub>*V*</sub>*T*, where the specific heat at constant volume is independent of temperature (*c*<sub>*V*</sub> = const.).

<sup>3</sup>We use the term Newtonian fluid in a general sense for fluids whose stress tensor depends linearly on the strain rate tensor. However, the viscous part of the stress tensor is not traceless, as is often assumed when a fluid is called Newtonian.

<sup>4</sup>The Fourier law relates the heat flux **q** to the thermal conductivity *κ* and the temperature gradient ∇*T* as **q** = −*κ*∇*T* .



Stationary and non-stationary models are denoted *stat.* and *nonst.*, unspecified variability of a property is denoted *var.*

#### **2 Algorithm**

Our approach is inspired by the velocity-correction scheme with the high order pressure boundary condition (HOPBC) proposed for the incompressible Navier-Stokes equations in [7]. The constant-property scheme [7] is widely used for its efficiency and has already been extended to problems with variable viscosity in [6]. A modification of it was also applied to the incompressible Navier-Stokes-Fourier system with temperature dependent viscosity and thermal conductivity in [10]. The efficiency of the approach comes from the implicit-explicit (IMEX) formulation, which allows decoupling of the system.

The main contribution of the present work, which is a continuation of [10], is the extension to problems with temperature dependent density. In this case the velocity divergence can no longer be neglected in the momentum balance, which is the substantial difference from the previously discussed models and algorithms.

#### *2.1 Decoupled System*

We use an IMEX scheme in which the *backward difference formula* (BDF) of order *Q* approximates the temporal derivative and a consistent extrapolation is applied to the chosen terms, denoted N:

$$\frac{\partial u}{\partial t} = \mathcal{L}(u) + \mathcal{N}(u) \xrightarrow{IMEX} \frac{\gamma u\_{n+1} - \sum\_{q=0}^{Q-1} \alpha\_{q}u\_{n-q}}{\Delta t} = \mathcal{L}\_{n+1} + \sum\_{q=0}^{Q-1} \beta\_{q} \mathcal{N}\_{n-q} \tag{3}$$

In (3), *u* is the sought solution and L denotes the terms solved implicitly, which we require to be constant in time. The subscript *n* + 1 (or an operator in square brackets with a subscript) denotes evaluation at time *t*<sub>*n*+1</sub> = *t*<sub>0</sub> + (*n* + 1)*Δt*, where *Δt* is the discrete time step. The coefficients {*α*<sub>*q*</sub>}<sub>*q*=0</sub><sup>*Q*−1</sup>, {*β*<sub>*q*</sub>}<sub>*q*=0</sub><sup>*Q*−1</sup> and *γ* for particular *Q* can be found, e.g., in [10]. Henceforward, we use '∗' in the superscript to denote extrapolation, N<sup>∗</sup> := Σ<sub>*q*=0</sub><sup>*Q*−1</sup> *β*<sub>*q*</sub>N<sub>*n*−*q*</sub>.

The extrapolated terms are evaluated using data from previous time steps, {N<sub>*n*−*q*</sub>}<sub>*q*=0</sub><sup>*Q*−1</sup> and {*u*<sub>*n*−*q*</sub>}<sub>*q*=0</sub><sup>*Q*−1</sup>, which allows a separate/decoupled solution of the (generalized) Navier-Stokes equations (2)-(1b) and the non-linear energy equation (1c).
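Scheme (3) can be illustrated on a scalar model problem. A hedged sketch with *Q* = 2 (*γ* = 3/2, *α* = (2, −1/2), *β* = (2, −1), the standard second-order BDF/extrapolation coefficients), applied to *u*′ = −*au* + cos *t*; the model problem and all names are illustrative, not from the paper:

```python
import numpy as np

# IMEX-BDF2 sketch of scheme (3): gamma = 3/2, alpha = (2, -1/2), beta = (2, -1).
# Stiff linear part L(u) = -a*u treated implicitly; N(t) = cos(t) extrapolated.
a = 100.0
exact = lambda t: (a*np.cos(t) + np.sin(t)) / (a**2 + 1)   # particular solution

def imex_bdf2(dt, T=1.0):
    n = int(round(T/dt))
    u_prev, u = exact(0.0), exact(dt)      # exact startup for the 2-step scheme
    for k in range(1, n):
        t = k*dt
        n_star = 2*np.cos(t) - np.cos(t - dt)       # beta-extrapolation of N
        # (3/2 u_{n+1} - 2 u_n + 1/2 u_{n-1}) / dt = -a u_{n+1} + N*
        u_prev, u = u, ((2*u - 0.5*u_prev)/dt + n_star) / (1.5/dt + a)
    return abs(u - exact(n*dt))

e1, e2 = imex_bdf2(1e-2), imex_bdf2(5e-3)
print(f"observed order: {np.log2(e1/e2):.2f}")
```

The implicit solve here is a scalar division; in the full scheme it is the elliptic solve, while all nonlinear and variable-coefficient terms enter through the extrapolated right-hand side.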

The solution during one time step may be summarized as the scheme

	- (a) Solve the pressure-Poisson equation for *p̃*<sub>*n*+1</sub>
	- (b) Solve the velocity correction for **v**<sub>*n*+1</sub>
	- (c) Solve the energy equation (1c) for *T*<sub>*n*+1</sub>

#### *2.2 Balance of Momentum and Mass*

The scheme decouples the solution of the Navier-Stokes system (2)-(1b) into the pressure-Poisson equation and an elliptic equation for the velocity. The equation for the pressure is derived as a projection onto the irrotational space by applying the divergence operator to (2)

$$\begin{aligned} \nabla^2 \tilde{p}\_{n+1} &= \frac{\gamma}{\Delta t} \left( \left[ \frac{\partial \rho}{\partial t} \right]^{\*\*} - m\_{n+1} \right) - \frac{1}{\text{Re}} \left[ \nabla \mu \cdot (\nabla \times \nabla \times \mathbf{v}) \right]^{\*} \\ &\quad + \nabla \cdot \left\{ \frac{1}{\Delta t} \rho^{\*} \hat{\mathbf{v}} + \frac{1}{\text{Re}} \left[ \nabla \mu \cdot \left( \nabla \mathbf{v} + (\nabla \mathbf{v})^{\mathsf{T}} \right) \right]^{\*} \right. \\ &\quad \left. + \frac{1}{\text{Re}} \left( -\frac{2}{3} [\nabla \mu]^{\*} [\nabla \cdot \mathbf{v}]\_{n+1} + \frac{4}{3} \mu^{\*} \nabla [\nabla \cdot \mathbf{v}]\_{n+1} \right) + \mathbf{f}\_{n+1} \right\} \,, \end{aligned} \tag{4}$$

where we applied (1b) together with the identities ∇ × ∇ × **v** = ∇(∇ · **v**) − ∇<sup>2</sup>**v** and ∇ · ∇× ≡ 0; *∂***v**/*∂t* was substituted by the BDF, and **v̂** = Σ<sub>*q*=0</sub><sup>*Q*−1</sup> *α*<sub>*q*</sub>**v**<sub>*n*−*q*</sub> − *Δt*[**v** · ∇**v**]<sup>∗</sup>. The temporal derivative of the density, which is extrapolated in (4), is approximated by the *Q*-th order BDF

$$
\left[\frac{\partial\rho}{\partial t}\right]\_n \approx \frac{\gamma\rho\_n - \sum\_{q=0}^{Q-1} \alpha\_q \rho\_{n-1-q}}{\Delta t}.\tag{5}
$$

We denote the extrapolation of this derivative approximation by the superscript '∗∗'. Note that we have to specify the initial value [*∂ρ*/*∂t*]<sub>0</sub>, or both *ρ*<sub>0</sub> and *ρ*<sub>−1</sub>, to initialise the scheme of the lowest order *Q* = 1.
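A quick numerical check of (5) for *Q* = 2 (*γ* = 3/2, *α* = (2, −1/2)); the density history *ρ(t)* = 2 + sin *t* is a made-up smooth example, not from the paper:

```python
import numpy as np

rho = lambda t: 2 + np.sin(t)      # made-up smooth density history

def drho_dt_bdf2(t, dt):
    """Eq. (5) with Q = 2: (3/2 rho_n - 2 rho_{n-1} + 1/2 rho_{n-2}) / dt."""
    return (1.5*rho(t) - 2.0*rho(t - dt) + 0.5*rho(t - 2*dt)) / dt

t = 0.7
e1 = abs(drho_dt_bdf2(t, 1e-2) - np.cos(t))
e2 = abs(drho_dt_bdf2(t, 5e-3) - np.cos(t))
print(e1/e2)   # ratio close to 4: the backward formula is second-order accurate
```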

Our model assumes that the density is entirely determined by the temperature distribution. The divergence of the velocity, whose forward estimate [∇ · **v**]<sub>*n*+1</sub> is required in (4), then follows from (1b)

$$[\nabla \cdot \mathbf{v}]\_{n+1} \approx \frac{1}{\rho^\*} \left\{ m\_{n+1} - \left[ \mathbf{v} \cdot \nabla \rho \right]^\* - \left[ \frac{\partial \rho}{\partial t} \right]^{\*\*} \right\} \,. \tag{6}$$

The forward estimate of velocity divergence is the crucial step in the proposed scheme.

The HOPBC is the natural boundary condition for (4). It is derived as the projection of the momentum equation (2) onto the direction of the normal **n** to the domain boundary *∂Ω*

$$\begin{aligned} \frac{\partial \tilde{p}}{\partial \mathbf{n}} = \mathbf{n} \cdot \Bigg\{ &-\rho^{\*} \left[\frac{\partial \mathbf{v}}{\partial t}\right]^{\*\*} + \left[-\rho\, \mathbf{v} \cdot \nabla \mathbf{v} + \frac{1}{\text{Re}} \left(-\mu \nabla \times \nabla \times \mathbf{v} + \nabla \mu \cdot \left[\nabla \mathbf{v} + (\nabla \mathbf{v})^{\mathsf{T}}\right]\right)\right]^{\*} \\ &+ \frac{1}{\text{Re}} \left(-\frac{2}{3} \left[\nabla \mu\right]^{\*} \left[\nabla \cdot \mathbf{v}\right]\_{n+1} + \frac{4}{3} \mu^{\*} \nabla \left[\nabla \cdot \mathbf{v}\right]\_{n+1}\right) + \mathbf{f}\_{n+1} \Bigg\} \end{aligned} \tag{7}$$

The forward estimate of the velocity divergence again follows from (6). Similarly to (4), we approximate the acceleration term *∂***v**/*∂t* by the BDF of *Q*-th order, whose initialisation requires the value [*∂***v**/*∂t*]<sub>0</sub> or both the values **v**<sub>0</sub> and **v**<sub>−1</sub>. The problem of initialising [*∂***v**/*∂t*]<sup>∗∗</sup> and [*∂ρ*/*∂t*]<sup>∗∗</sup> is circumvented in many realistic simulations, which begin from constant fields.

The solution of (4) gives an estimate/prediction of *p̃*<sub>*n*+1</sub>, and we can then solve (2) as an elliptic problem for **v**<sub>*n*+1</sub>. However, in the case of temperature dependent viscosity and density, the algebraic system derived for operators with variable coefficients has time dependent matrices, whose direct solution is inefficient. To preserve the efficiency of the scheme, we split such operators into a time-independent part, which is solved implicitly using a direct method, and a variable part, which is extrapolated together with the non-linear terms. We introduce the material properties in the form

$$
\mu = \mu(T) = \bar{\mu} + \mu\_i, \quad \kappa = \kappa(T) = \bar{\kappa} + \kappa\_i, \quad \frac{1}{\rho(T)} = \overline{\left(\frac{1}{\rho}\right)} + \left(\frac{1}{\rho}\right)\_i \tag{8}
$$

where $\bar{\mu}$ and $\bar{\kappa}$ are time-independent, while $\mu_i = \mu_i(\mathbf{x},t)$ and $\kappa_i = \kappa_i(\mathbf{x},t)$. The variable density $\rho = \rho(T)$ enters our scheme as an inverse value, cf. (10), so the splitting is done accordingly.

To demonstrate the splitting, we consider the second order operator with variable viscosity and density

$$\frac{1}{\rho(T)}\nabla \cdot \left[\mu(T) \left(\nabla \mathbf{v}\right)^{\mathsf{T}}\right] = \overline{\left(\frac{1}{\rho}\right)} \nabla \cdot \left[\bar{\mu} \left(\nabla \mathbf{v}\right)^{\mathsf{T}}\right] + \left(\frac{1}{\rho}\right)_{i} \nabla \cdot \left[\bar{\mu} \left(\nabla \mathbf{v}\right)^{\mathsf{T}}\right] + \frac{1}{\rho} \nabla \cdot \left[\mu_{i} \left(\nabla \mathbf{v}\right)^{\mathsf{T}}\right]. \tag{9}$$

Only the term with the time-independent operator $\overline{\left(\frac{1}{\rho}\right)} \nabla \cdot \left[\bar{\mu}\, (\nabla \mathbf{v})^{\mathsf{T}}\right]$ is solved implicitly, while we apply extrapolation to the terms containing the variable parameters $\left(\frac{1}{\rho}\right)_i$ and $\mu_i$. This approach is valid for $\bar{\mu} = \bar{\mu}(\mathbf{x})$, $\bar{\kappa} = \bar{\kappa}(\mathbf{x})$, resp. $\overline{\left(\frac{1}{\rho}\right)} = \overline{\left(\frac{1}{\rho}\right)}(\mathbf{x})$, but if $\bar{\mu}$ is constant in space, the constant operator simplifies to $\frac{\bar{\mu}}{\bar{\rho}} \nabla^2 \mathbf{v}$ (resp. $\nabla^2 \mathbf{v}$ if the properties are normalized to $\bar{\mu} = \bar{\kappa} = \bar{\rho} = 1$, which is the case of (1), the balance equations in a form independent of physical units).
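The constant/variable splitting can be sketched on a 1D model problem. The following is an illustrative Python reconstruction under stated assumptions, not the paper's spectral/hp implementation: a diffusion equation with $\mu = \bar{\mu} + \mu_i(x,t)$, where the constant part yields a matrix built (and, standing in for the paper's direct solver, inverted) once, while the hypothetical variable part `mu_i` is treated explicitly.

```python
import numpy as np

# Sketch of the constant/variable operator splitting on the 1D model problem
#   u_t = d/dx( mu(x,t) u_x ),  u(0) = u(1) = 0,
# with mu = mu_bar + mu_i(x,t). The constant part forms a time-independent
# matrix, assembled and inverted ONCE; the variable part is explicit.
N, dt, steps = 64, 1e-4, 200
x = np.linspace(0.0, 1.0, N + 2)[1:-1]      # interior nodes
h = x[1] - x[0]
mu_bar = 1.0

# constant-coefficient Laplacian with homogeneous Dirichlet conditions
L = (np.diag(np.ones(N - 1), -1) - 2.0*np.eye(N) + np.diag(np.ones(N - 1), 1)) / h**2
A_inv = np.linalg.inv(np.eye(N) - dt*mu_bar*L)   # built once, reused every step

def mu_i(t):                                 # made-up variable part of mu
    return 0.1*np.sin(np.pi*x)*np.cos(t)

def var_term(u, t):                          # explicit d/dx( mu_i du/dx )
    return np.gradient(mu_i(t)*np.gradient(u, h), h)

u = np.sin(np.pi*x)                          # initial condition
for n in range(steps):
    u = A_inv @ (u + dt*var_term(u, n*dt))

print(u.max())                               # decays below the initial max of 1
```

The design point is that the expensive solve reuses one factorised (here inverted) operator for all time steps, exactly as argued in the text.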

The final form of the equation for velocity becomes

$$\begin{aligned} \nabla^2 \mathbf{v}_{n+1} - \frac{\gamma}{\Delta t} \frac{\bar{\rho}}{\bar{\mu}} \mathrm{Re}\, \mathbf{v}_{n+1} ={} & \frac{\bar{\rho}}{\bar{\mu}} \left\{ -\mathrm{Re} \frac{\gamma}{\Delta t} \hat{\mathbf{v}} + \frac{1}{\rho^*} \left[ \mathrm{Re} \left(\nabla \tilde{p}_{n+1} - \mathbf{f}_{n+1}\right) - \left[ \nabla \mu \cdot \left[\nabla \mathbf{v} + (\nabla \mathbf{v})^{\mathsf{T}}\right] \right]^* \right.\right. \\ & \left.\left. {}+ \frac{2}{3} [\nabla \mu]^* [\nabla \cdot \mathbf{v}]_{n+1} - \frac{1}{3} \mu^* \nabla [\nabla \cdot \mathbf{v}]_{n+1} \right] \right\} \\ & {}- \left( \nabla [\nabla \cdot \mathbf{v}]_{n+1} - [\nabla \times \nabla \times \mathbf{v}]^* \right) \left[ \frac{\bar{\rho}}{\bar{\mu}} \left( \frac{1}{\rho} \right)_i^* \mu^* + \frac{1}{\bar{\mu}} \mu_i^* \right] \end{aligned} \tag{10}$$

#### *2.3 Balance of Energy*

The energy equation with temperature-dependent thermal conductivity is strongly non-linear. We split the diffusion operator into the time-independent and the variable part, following the technique shown for the velocity correction (10). We set $\bar{\kappa} = 1$ for simplicity and the discretized energy equation (1c) takes the form

$$\nabla^2 T_{n+1} - \frac{\gamma\, \mathrm{Re}\, \mathrm{Pr}}{\Delta t} T_{n+1} = \mathrm{Re}\, \mathrm{Pr} \left( -\frac{\hat{T}}{\Delta t} - f_{T_{n+1}} \right) - \left[ \nabla \cdot \kappa_i \nabla T \right]^*, \tag{11}$$

where $\hat{T} = \sum_{q=0}^{Q-1} \alpha_q T_{n-q} - \Delta t \left[\mathbf{v} \cdot \nabla T\right]^*$. The operator $\nabla^2 - \frac{\gamma\, \mathrm{Re}\, \mathrm{Pr}}{\Delta t}$ is time-independent and allows inversion using a direct method, which results in good performance in computations on long time intervals.
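The BDF approximation of the time derivative used throughout the scheme can be illustrated with the standard first- and second-order coefficient sets (a generic sketch; the paper's $\gamma$ and $\alpha_q$ correspond to these values for $Q = 1, 2$):

```python
import numpy as np

# BDF time discretisation: the derivative at t_{n+1} is approximated by
#   dT/dt ~ (gamma*T_{n+1} - T_hat)/dt,   T_hat = sum_q alpha_q T_{n-q}.
bdf = {1: (1.0, [1.0]),            # backward Euler
       2: (1.5, [2.0, -0.5])}      # second-order BDF

def bdf_derivative(T_hist, dt, Q):
    """T_hist = [T_{n+1}, T_n, T_{n-1}, ...], newest first."""
    gamma, alpha = bdf[Q]
    T_hat = sum(a*T for a, T in zip(alpha, T_hist[1:]))
    return (gamma*T_hist[0] - T_hat)/dt

# truncation check on T(t) = exp(t): the exact derivative at t = 0 is 1
errs = {1: [], 2: []}
for dt in (1e-2, 5e-3):
    hist = [np.exp(0.0), np.exp(-dt), np.exp(-2*dt)]
    for Q in (1, 2):
        errs[Q].append(abs(bdf_derivative(hist, dt, Q) - 1.0))

# halving dt halves the BDF1 error and quarters the BDF2 error
print(errs[1][0]/errs[1][1], errs[2][0]/errs[2][1])   # ~2 and ~4
```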

#### **3 Temporal Convergence on Manufactured Solution and Application**

Our convergence tests are based on the method of manufactured solutions, an alternative to the error estimates of numerical analysis on simplified systems. This approach lacks generality, because we always restrict ourselves to particular data and one representative of the solution space, but we obtain a rough convergence estimate for the unrestricted equation system (1), while also verifying the correctness of the method's implementation.

As an exact solution, we take smooth functions $\mathbf{v}_e : \Omega \times (0,T) \to \mathbb{R}^n$, $\tilde{p}_e : \Omega \times (0,T) \to \mathbb{R}$, $T_e : \Omega \times (0,T) \to \mathbb{R}$

$$
\begin{pmatrix} \mathbf{v}_e \\ \tilde{p}_e \\ T_e \end{pmatrix} = \begin{pmatrix} u_e \\ v_e \\ \tilde{p}_e \\ T_e \end{pmatrix} = \begin{pmatrix} 2\cos(\pi x)\cos(\pi y)\sin(t) \\ \sin(\pi x)\sin(\pi y)\sin(t) \\ 2\sin(\pi x)\sin(\pi y)\cos(t) \\ \sin(x)\sin(y)\cos(t) \end{pmatrix} \tag{12}
$$

and derive the forcing terms $\mathbf{f_v}$, $m$ and $f_T$ such that Eqs. (2), (1b), (1c) are fulfilled (in all cases we set Re = Pr = 1). The divergence of the velocity in (12) is $\nabla \cdot \mathbf{v}_e = -\pi\sin(\pi x)\cos(\pi y)\sin(t)$, variable in both the spatial and temporal coordinates and with an amplitude comparable to the solution itself. We choose a computational domain $\Omega = [0,2]\times[0.5,2.5]$ consisting of two elements, $\Omega = [0,1]\times[0.5,2.5] \cup [1,2]\times[0.5,2.5]$. The extent of $\Omega$ and the form of the exact solution are inspired by [3], where the velocity-correction scheme of [7] was tested on a similar manufactured solution.
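The forcing terms can be derived symbolically. The sketch below uses sympy for a constant-property momentum equation with Re = 1 and reproduces the stated divergence of (12); the variable-property terms of (2) would be appended in the same way.

```python
import sympy as sp

# Manufactured-solution bookkeeping: choose the fields of (12) and derive a
# forcing term so that a (here constant-property, Re = 1) momentum equation
# is satisfied exactly.
x, y, t = sp.symbols('x y t', real=True)
u = 2*sp.cos(sp.pi*x)*sp.cos(sp.pi*y)*sp.sin(t)
v = sp.sin(sp.pi*x)*sp.sin(sp.pi*y)*sp.sin(t)
p = 2*sp.sin(sp.pi*x)*sp.sin(sp.pi*y)*sp.cos(t)

# divergence of the velocity field -- non-zero because of thermal expansion
div_v = sp.simplify(sp.diff(u, x) + sp.diff(v, y))
print(div_v)   # equals -pi*sin(pi*x)*cos(pi*y)*sin(t), as stated in the text

# forcing for the x-momentum equation u_t + (v.grad)u = -p_x + Lap(u) + f_x
f_x = sp.simplify(sp.diff(u, t) + u*sp.diff(u, x) + v*sp.diff(u, y)
                  + sp.diff(p, x) - (sp.diff(u, x, 2) + sp.diff(u, y, 2)))
print(f_x)
```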

The incompressible Navier-Stokes equations define the pressure up to a constant value and only the boundary condition for the velocity is needed. In this sense, we set the Dirichlet boundary condition for the velocity on the whole $\partial\Omega$. However, the pressure-Poisson equation (4) requires a boundary condition as a consequence of the decoupling. We set HOPBC (7) on $\partial\Omega$ and solve the fully Neumann problem, which defines the solution up to a constant; we fix this constant by setting the solution to zero at one of the grid points. The boundary condition for the pressure is an artificial element of the computational scheme and its existence is related to the splitting error. The boundary condition for the energy equation (11) is of Dirichlet type on the whole $\partial\Omega$.

We present the first- and second-order schemes in time in the convergence tests. The technique is applicable to higher-order schemes as well. Multi-step schemes use data from multiple time steps, which complicates their initialisation. We apply the first-order BDF method for the initialisation of the second-order scheme. The first-order scheme needs data from only one backward time step, but the time step must be appropriately shortened.

As mentioned already, the acceleration in HOPBC (7) and the term $\frac{\partial \rho}{\partial t}$ in (4) require an initial value or one additional backward value for proper initialisation, even in the case of the first-order scheme, which contradicts the standard initial conditions for system (1), which require only the initial values. However, setting the correct values for the calculation of the first time step is crucial for the final accuracy of the solution.

Finally, we trace appropriate norms of the difference between the exact and computed solutions on a set of computations with time steps $\Delta t = \tilde{t}/2^n$, $\tilde{t} = 0.2$, $n = 0,\ldots,9$, for $t \in [0,1]$.

We use power laws to approximate the dependence of the material parameters on temperature

$$\mu(T) = (a_m T + 1.0)^{\beta_m}, \quad \kappa(T) = (a_k T + 1.0)^{\beta_k}, \quad \rho(T) = (a_r T + 1.0)^{\beta_r}. \tag{13}$$

The temporal convergence of the above scheme for $a_m = a_k = a_r = 0.1$, $\beta_m = \beta_k = \beta_r = 2$ is shown in Fig. 1.
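The observed temporal order on such a halved-time-step series is usually extracted as the base-2 logarithm of successive error ratios. A minimal sketch with synthetic second-order errors (model data only, not the measurements behind Fig. 1):

```python
import numpy as np

# Observed order of convergence from errors on halved time steps, as in the
# Delta_t = 0.2/2**n series. The errors here are synthetic, exactly
# second-order data for illustration.
dts = 0.2 / 2.0**np.arange(0, 6)
errs = 0.5 * dts**2                       # e ~ C * dt^p with p = 2

orders = np.log2(errs[:-1] / errs[1:])    # p = log2(e_n / e_{n+1})
print(orders)                             # all 2.0 for this model data
```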

A detailed view of the error production, Fig. 2, shows that the dominant error arises at the grid point which was used to fix the unknown constant of the Neumann problem.

The scheme was successfully applied in a 2D simulation of the flow around a heated cylinder and the results were compared with the experimental data of [15], where the dependence of the vortex shedding frequency (Strouhal number St) on the wall temperature of the cylinder, $T_W$, was observed. Figure 3 shows the substantial difference between the results of the model neglecting the thermal expansion [10] and the present one. Figure 4 shows the value range and structure of the velocity divergence in a chosen realistic simulation.


**Fig. 2** Test on the manufactured solution: difference $v_{\mathrm{err}} = v_e - v$ at the final time $t = 1$ for the computation with $Q = 2$, $\Delta t = 0.2/2^7$, cf. Fig. 1. Polynomial approximation of degree 15

**Fig. 3** Frequency of the vortex shedding (Strouhal number St) as a function of the normalized wall temperature $T_W^*$ in the flow around a heated cylinder (Re ≈ 121.2). Comparison of the data from [10] ($\rho = $ const.) "const.", the present scheme with $Q = 1$ "$\rho(T)$", the experimental data of [15] "exp." and the empirical formula of [8] "emp."

**Fig. 4** Computed field of the divergence, $\mathrm{div}(\mathbf{v}) = \nabla \cdot \mathbf{v}$, caused by the thermal expansion in the flow around a heated cylinder (Re = 121.2, $T_W/T_\infty = 1.494$), cf. Fig. 3

#### **4 Conclusion**

The numerical scheme proposed for the Navier-Stokes-Fourier system with variable parameters makes it possible to solve a highly complex mathematical model, with an impact on the understanding of processes connected with heat exchange, transport and energy storage in fluids.

The computational scheme for fluid flows influenced by temperature, as modelled by the system (2), (1b), (1c), was developed and tested. The scheme was primarily constructed for spatial discretisations based on spectral/hp finite elements and the presented results were obtained after implementation in the Nektar++ framework [2] (modified version 3.3).

We did not impose restrictions on the type of functional dependency of the material parameters on temperature. The graph of the error convergence in the $L^\infty$ norm, Fig. 1, results from testing on a manufactured solution and shows good convergence properties of the scheme, which is promising for applications.

The considered model neglects compressibility in the sense of a direct dependence of density on pressure, but the velocity field is not divergence-free, as a consequence of the thermal expansion. A forward estimate of the velocity divergence is needed in the proposed scheme and its successful approximation is one of the main contributions of this work. For these reasons, the scheme is unique among numerical schemes based on finite element approximations in space.

The proposed scheme is an extension of the efficient semi-implicit solver for the incompressible Navier-Stokes system [7] and it is suitable for fast and highly accurate simulations of problems on long time intervals. The present results motivate the implementation of higher-order BDF schemes and the extension of the solver to three spatial coordinates.

The derivation of the scheme includes a number of sub-steps, whose detailed description is beyond the scope of this article and will be published separately, together with an extension of the scheme to the energy equation with variable density and further testing of the performance as a function of various physical parameters in the equations.

The results from the application of the scheme to computations of a physically realistic problem exhibit good agreement with experimental data and will be presented with a detailed description in a separate article.

**Acknowledgements** The author wishes to acknowledge the support of the research programme no. 3 "Efficient energy conversion and storage" of the Strategy AV21 initiative of the Czech Academy of Sciences.

#### **References**

1. Axelsson, O., He, X., Neytcheva, M.: Numerical solution of the time-dependent Navier-Stokes equation for variable density-variable viscosity. Part I. Math. Model. Anal. **20**(2), 232–260 (2015)


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Implicit Large Eddy Simulations for NACA0012 Airfoils Using Compressible and Incompressible Discontinuous Galerkin Solvers**

**Esteban Ferrer, Juan Manzanero, Andres M. Rueda-Ramirez, Gonzalo Rubio, and Eusebio Valero**

### **1 Introduction**

High order Discontinuous Galerkin (DG) methods provide accurate solutions by enabling arbitrarily high polynomial approximations inside each grid element. For high order polynomials, the numerical errors are not distributed along all wavenumbers but localised at high wavenumbers [1–5]. This characteristic of high order methods results in very accurate simulations with low dissipative and dispersive errors. Although this characteristic seems a priori beneficial for well resolved simulations, when computing under-resolved Large Eddy Simulations (LES), it can prove difficult to obtain stable simulations. In implicit (or under-resolved) Large Eddy Simulations (iLES), the smallest resolved eddies are larger than they would be on a finer mesh, leading to numerical under-resolution (i.e. coarse grid or low polynomial order) and aliasing [6]. Various methods have been proposed to stabilise under-resolved computations with aliasing. Among others, split forms or skew-symmetric variants [7, 8], localised interior penalty fluxes [9], over-integration [10–12] or filtering [13] may be incorporated into the solver to stabilize the computations and remove or alleviate the aliasing.

In contrast to low order methods, high order methods do not have enough inherent numerical dissipation in under-resolved simulations to dissipate large flow structures (when compared to the Kolmogorov scales). Therefore, the computation of iLES flows using high order DG solvers requires localised dissipative mechanisms to dissipate flow structures close to the cut-off size. In what follows, we compare two

E. Ferrer (-) · J. Manzanero · A. M. Rueda-Ramirez · G. Rubio · E. Valero ETSIAE-UPM (School of Aeronautics - Universidad Politécnica de Madrid), Madrid, Spain

e-mail: esteban.ferrer@upm.es

CCS-UPM (Centre for Computational Simulation - Universidad Politécnica de Madrid), Madrid, Spain

<sup>©</sup> The Author(s) 2020

S. J. Sherwin et al. (eds.), *Spectral and High Order Methods for Partial Differential Equations ICOSAHOM 2018*, Lecture Notes in Computational Science and Engineering 134, https://doi.org/10.1007/978-3-030-39647-3\_38

dissipative stabilising mechanisms that enable the simulation of turbulent underresolved flows. On the one hand, we use a compressible formulation with an energy conserving split-form and dissipation through Roe fluxes [14]. On the other hand, the incompressible solver uses the viscous discretisation through interior penalty formulation to enhance stability [9]. We challenge both formulations with a NACA0012 airfoil at various angles of attack in turbulent regimes, to explore both accuracy and stability. We compare simulated results to experimental data and simulations using low order methods (Xfoil and Ansys-Fluent).

#### **2 Methodologies**

We first introduce the two different mechanisms used to stabilise both compressible and incompressible high order DG formulations. The explanation included here is brief and aims only at introducing the fundamental concepts and motivating ideas. Further details can be found in the following references by the authors [9, 14].

The 3D Navier-Stokes equations can be written as:

$$
\mathbf{u}_t + \nabla \cdot \mathbf{F}_e = \nabla \cdot \mathbf{F}_v, \tag{1}
$$

where $\mathbf{u}$ is the vector of conservative variables, $\mathbf{u} = (\rho, \rho v_1, \rho v_2, \rho v_3, \rho e)^{\mathsf{T}}$, in compressible solvers. For incompressible solvers $\mathbf{u} = (v_1, v_2, v_3)^{\mathsf{T}}$ and Eq. (1) is complemented with the incompressibility constraint $\nabla \cdot \mathbf{u} = 0$. Details on the definition of the inviscid and viscous fluxes can be found in [9, 14]. To derive discontinuous Galerkin schemes, we consider Eq. (1) for one mesh element $el$, multiply by a locally smooth test function $\phi_j$, for $0 \leq j \leq P$, where $P$ is the polynomial degree, and integrate on $el$:

$$
\int_{el} \mathbf{u}_t \phi_j + \int_{el} \nabla \cdot \mathbf{F}_e \,\phi_j = \int_{el} \nabla \cdot \mathbf{F}_v \,\phi_j. \tag{2}
$$

We can now integrate the inviscid flux integral, $\mathbf{F}_e$, by parts to obtain a local weak form of the equations (one per mesh element):

$$
\int_{el} \mathbf{u}_t \phi_j + \int_{\partial el} \mathbf{F}_e \cdot \mathbf{n} \,\phi_j - \int_{el} \mathbf{F}_e \cdot \nabla \phi_j = \int_{el} \nabla \cdot \mathbf{F}_v \,\phi_j, \tag{3}
$$

where $\mathbf{n}$ is the normal vector at the element boundaries $\partial el$. We replace the discontinuous fluxes at inter-element faces by a numerical inviscid flux, $\mathbf{F}_e^*$, to obtain a weak form of the equations for each element,

$$
\int_{el} \mathbf{u}_t \phi_j + \int_{\partial el} \mathbf{F}_e^* \cdot \mathbf{n} \,\phi_j - \int_{el} \mathbf{F}_e \cdot \nabla \phi_j = \int_{el} \nabla \cdot \mathbf{F}_v \,\phi_j, \tag{4}
$$

where we have omitted the fluxes at external boundaries for simplicity. This set of equations for each element is coupled through the inviscid fluxes $\mathbf{F}_e^*$ and governs the flow behaviour. Note that one can proceed similarly and integrate the viscous terms by parts (see [9, 15]), but here for simplicity we retain the volume integral.

$$
\int_{el} \mathbf{u}_t \phi_j + \int_{\partial el} \underbrace{\mathbf{F}_e^* \cdot \mathbf{n}}_{\text{Riemann solver}} \phi_j - \int_{el} \mathbf{F}_e \cdot \nabla \phi_j = \int_{el} (\underbrace{\nabla \cdot \mathbf{F}_v}_{\text{Viscous term}}) \,\phi_j \tag{5}
$$

The non-linear inviscid and viscous terms whose discretisation can be used to control the dissipation in the numerical scheme are highlighted by the braces.

Riemann solvers are the classic option for including numerical dissipation in DG schemes [16, 17], since they naturally arise when discretising the non-linear terms. A comparison of different fluxes for homogeneous turbulence can be found in [14, 18]. A different option is to modify the viscous terms to enhance their dissipative properties. The latter has been proposed in [9], using an increased penalty parameter (compared to the minimum required to ensure coercivity of the scheme) when discretising the viscous terms with an interior penalty formulation.
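How a Riemann solver injects interface dissipation is easiest to see on the simplest example, the local Lax-Friedrichs (Rusanov) flux for Burgers' equation. This is a generic sketch, not the Roe flux used by the compressible solver in this work:

```python
# Rusanov (local Lax-Friedrichs) flux for Burgers' equation, f(u) = u^2/2.
# The upwind dissipation enters through the |lambda|-weighted jump term;
# with lam = 0 only the non-dissipative central average would remain.
def flux(u):
    return 0.5 * u**2

def rusanov(uL, uR):
    lam = max(abs(uL), abs(uR))                       # local max wave speed
    return 0.5*(flux(uL) + flux(uR)) - 0.5*lam*(uR - uL)

print(rusanov(1.0, 0.0))   # 0.75 = central part 0.25 + jump dissipation 0.5
print(rusanov(1.0, 1.0))   # 0.5: no jump, so the flux reduces to f(u)
```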

#### *2.1 Compressible DGSEM Solver*

The compressible solver uses conservative variables to solve the Navier-Stokes equations. We use a particular nodal variant of DG methods: the Discontinuous Galerkin Spectral Element Method (DGSEM), see for example [19]. In addition, the compressible formulation is modified to be energy preserving [20]. The required split form necessitates Gauss-Lobatto points to cancel out boundary terms using the summation-by-parts simultaneous-approximation-term (SBP-SAT) property. The interested reader is referred to [5, 20–22]. These energy conserving schemes are designed to remain stable and energy conserving and consequently do not necessitate additional localised numerical dissipation. Nonetheless, in this work we introduce dissipation through Roe fluxes to enhance robustness at high Reynolds numbers. Additionally, the viscous terms are discretised using the Bassi-Rebay 1 (BR1) scheme, which is equivalent to the interior penalty formulation when using Gauss-Lobatto points and hexahedral elements [23]. Let us note that this formulation for the viscous fluxes is neutrally stable [24] and adds the minimum dissipation required to achieve a stable scheme, whilst others may introduce some extra dissipation. Other techniques are available to discretise second order derivatives and can be found in the classic review by Arnold et al. [15].
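The SBP property behind the Gauss-Lobatto choice can be checked directly on the three-point Legendre-Gauss-Lobatto set, whose nodes, weights and differentiation matrix are standard:

```python
import numpy as np

# Summation-by-parts check on the 3-point Legendre-Gauss-Lobatto set:
# with M = diag(weights), D the nodal differentiation matrix and
# B = diag(-1, 0, 1), discrete integration by parts requires
#   M @ D + (M @ D).T == B.
nodes = np.array([-1.0, 0.0, 1.0])
M = np.diag([1/3, 4/3, 1/3])                 # LGL quadrature weights
D = np.array([[-1.5,  2.0, -0.5],            # Lagrange derivative matrix
              [-0.5,  0.0,  0.5],
              [ 0.5, -2.0,  1.5]])
B = np.diag([-1.0, 0.0, 1.0])

Q = M @ D
print(np.allclose(Q + Q.T, B))               # True -> SBP holds
print(D @ nodes**2)                          # exact derivative of x^2: [-2, 0, 2]
```

It is this discrete mimicry of integration by parts that lets the split-form boundary terms cancel, as the text states.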

#### *2.2 Incompressible DG-Fourier Solver*

Flow solutions of the incompressible Navier-Stokes equations are obtained from the 3D unsteady high order *h/p* Discontinuous Galerkin-Fourier solver [9, 25–28]. The solver uses a second order stiffly stable approach to discretise the NS equations in time, whilst the spatial discretisation is provided by the discontinuous Galerkin-Symmetric Interior Penalty formulation with modal basis functions in the *x*-*y* plane. Here, *x* represents the streamwise flow direction and *y* the normal direction. The spatial discretisation in the *z*-direction (here defining the spanwise airfoil length) is provided by a purely spectral method that uses Fourier series and allows the computation of spanwise-periodic three-dimensional flows. Since high order methods (e.g. discontinuous Galerkin and Fourier) are unable to provide enough numerical dissipation to enable under-resolved high Reynolds number computations (e.g. as necessary in Large Eddy Simulations), we have adapted the original laminar version of the solver to increase (controllably) the dissipation and enhance the stability in under-resolved simulations [9]. This dissipative formulation has minimal impact on well resolved flow regions and its implicit treatment does not restrict the use of relatively large time steps, thus providing an efficient stabilization mechanism for Large Eddy Simulations. The solver has been widely validated for a variety of flows, including bluff body flows, airfoil and blade aerodynamics, and vertical axis turbines under static and rotating conditions [9, 25–30].
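The spanwise Fourier treatment means every z-derivative reduces to a multiplication by $ik_z$ in mode space. A minimal sketch using the spanwise parameters quoted later in Sect. 3 ($L_z/c = 0.1$, 16 modes):

```python
import numpy as np

# Spanwise Fourier sketch: on a periodic z-direction, d/dz becomes
# multiplication by i*k_z mode by mode, which is why the modes decouple.
Lz, Nz = 0.1, 16                          # spanwise length and mode count
z = np.linspace(0.0, Lz, Nz, endpoint=False)
kz = 2*np.pi*np.fft.fftfreq(Nz, d=Lz/Nz)  # wavenumbers in FFT ordering

w = np.sin(2*np.pi*z/Lz)                  # a spanwise-periodic field
dwdz = np.real(np.fft.ifft(1j*kz*np.fft.fft(w)))
print(np.allclose(dwdz, (2*np.pi/Lz)*np.cos(2*np.pi*z/Lz)))   # True: exact
```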

#### **3 Numerical Results**

This section considers a NACA0012 airfoil at $Re = 1\times10^4$, $Re = 1\times10^5$ and $Re = 1\times10^6$ (based on the airfoil chord $c$) for a range of angles of attack ($AoA$): $0^\circ \leq AoA \leq 10^\circ$. In what follows we compare incompressible and compressible simulations using polynomial orders P = 3 and P = 4. The averaged values have been computed after the development of three-dimensional flow. The compressible solver uses a hexahedral mesh with 18,000 elements, which for P = 3 and 4 results in 1.1 and 2.2 million degrees of freedom, respectively. The incompressible solver uses a mixed tri-quad 2D mesh and is expanded using Fourier modes in the homogeneous third direction (here 16 Fourier modes). Depending on the angle of attack, the resulting meshes include 0.6 to 1 million degrees of freedom. Meshes for the two solvers and for $AoA = 0^\circ$ are depicted in Fig. 1. Finally, all the simulations are computed with both DG solvers and consider a periodic spanwise length of $L_z/c = 0.1$. Note that we have not observed significant differences in the results when increasing the spanwise length. Statistics are accumulated during at least 40 convective time scales (based on the airfoil chord) and starting after the turbulent flow has developed (typically an initial transient of 10 convective time scales).
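The quoted degree-of-freedom counts for the hexahedral mesh follow from elements × (P+1)³ collocation points per element (counted per conserved variable):

```python
# Degrees of freedom (per conserved variable) for the compressible hex mesh:
# n_elements * (P+1)^3 collocation points per hexahedral element.
n_el = 18_000
for P in (3, 4):
    print(P, n_el * (P + 1)**3)   # 1_152_000 and 2_250_000,
                                  # i.e. the quoted ~1.1 and ~2.2 million
```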

**Fig. 1** Meshes for NACA0012 airfoil: (**a**) Hexahedral mesh for compressible solver and (**b**) mixed tri-quad mesh for incompressible solver. Inset figures show high order polynomial mesh for order P=4

**Fig. 2** NACA0012 airfoil at $Re = 1\times10^6$, from left to right: $AoA = 0^\circ$, $5^\circ$ and $10^\circ$. Simulations are obtained using the incompressible DG solver

## *3.1 Re = 1 × 10<sup>6</sup> and Various Angles of Attack*

We start by illustrating the highest Reynolds number case, which is the most challenging in terms of stability and robustness. To illustrate the range of flow behaviour at various $AoA$s, we show in Fig. 2 velocity contours for $AoA = 0^\circ$, $5^\circ$ and $10^\circ$, computed using the incompressible DG solver. It can be seen that at $Re = 1\times10^6$ the flow remains attached for all angles, and that only mild separation is seen near the trailing edge. We will see in the next section that at lower Reynolds numbers this is not necessarily the case.

Figure 3 compares the aerodynamic coefficients with experimental data for various angles of attack and the two solvers. Figure 3a shows the lift coefficient against the $AoA$ and Fig. 3b depicts the lift-drag polar for $Re = 1\times10^6$. We observe very good agreement with experimental data for both solvers.

**Fig. 3** NACA0012 airfoil at $Re = 1\times10^6$: (**a**) lift coefficient vs angle of attack and (**b**) lift-drag polar. Compressible (comp.) and incompressible (incomp.) DG simulations are compared to the experimental data sets of Ladson [31], Gregory and O'Reilly [32], and Abbot and Von Doenhoff [33]

### *3.2 AoA = 5° and Various Reynolds Numbers*

Having shown the overall good performance in terms of aerodynamic quantities at the most challenging Reynolds numbers, we now focus our attention on the angle *AoA* = 5◦ and compare the usability of the solvers to study the NACA0012 boundary layer evolution.

First, we compare the aerodynamic coefficients for $AoA = 5^\circ$ and Reynolds numbers $Re = 1\times10^5$ and $Re = 1\times10^6$, using the incompressible and compressible solvers, both with polynomial orders P = 3 and P = 4, in Table 1. We observe good agreement for the highest polynomial order. Small discrepancies are attributed to the post-processing of statistics and the lack of near-wall resolution when using P = 3, which influences mainly the drag coefficient and particularly the viscous drag. For completeness, we depict the flow evolution within the boundary layer using both solvers in Fig. 4. It can be seen that the detachment near the trailing edge is similar for both solvers. Regarding transition to turbulence (represented by fluctuations in the velocity contours), both solvers capture transition on the suction side. The compressible solver shows a transition location near the maximum thickness


**Table 1** Comparison of lift and drag using the DG compressible and DG incompressible solvers and two polynomial orders P = 3 and P = 4

| Solver | $C_L$ ($Re=1\times10^5$) | $C_D$ ($Re=1\times10^5$) | $C_L$ ($Re=1\times10^6$) | $C_D$ ($Re=1\times10^6$) |
|---|---|---|---|---|
| DG incomp. P = 4 | 0.545 | 0.018 | 0.551 | 0.007 |

**Fig. 4** NACA0012 airfoil at $Re = 1\times10^5$ and $AoA = 5^\circ$ for P = 4: (**a**) compressible DG solver, (**b**) incompressible DG solver

**Fig. 5** NACA0012 airfoil at $AoA = 5^\circ$ for (**a**) $Re = 1\times10^4$, (**b**) $Re = 1\times10^5$ and (**c**) $Re = 1\times10^6$. Velocity magnitude isocontours and unstructured mesh details are included

($x/c \approx 0.4$), whilst the incompressible solver shows transition closer to the leading edge ($x/c \approx 0.2$). We have observed significant variations of the transition location for the compressible solver when varying the polynomial order, which we have not seen in the incompressible solver. Further studies are necessary to completely assess the influence of the discretisation on the transition location for the two solvers.

Second, we explore the pressure coefficient distribution along the airfoil profile when varying the Reynolds number. We only depict results for the incompressible DG solver, since these are very similar to the results provided by the compressible solver. This is not surprising, since the lift coefficients at $Re = 1\times10^5$ and $Re = 1\times10^6$ are very similar for P = 4 at $AoA = 5^\circ$, see Table 1. Figure 5 shows velocity contours for $Re = 1\times10^4$, $Re = 1\times10^5$ and $Re = 1\times10^6$ at $AoA = 5^\circ$. It can be seen that for the lowest Reynolds number, the boundary layer remains laminar until it detaches after the maximum thickness, showing a highly unsteady wake. When the Reynolds number increases, the boundary layer transitions to turbulence before the maximum thickness, as evidenced by the fluctuations and small scales appearing in Fig. 5.

To quantify these results, we depict in Fig. 6 the pressure coefficient distribution (Cp) for the three Reynolds numbers. In the top row, we show the instantaneous Cp against the averaged one for the incompressible DG solver. In the bottom row, we compare the mean Cp distributions against Xfoil [34] (with critical N-factor $N_{cr} = 1$) and Fluent SST (fully turbulent simulation) [35]. At $Re = 1\times10^4$, the top figure shows that the boundary layer detaches before transition occurs and after the maximum thickness, as shown by the velocity contours in Fig. 5. Since the flow detaches, leading to a highly unsteady wake, there is little hope that the averaged Cp captures the actual behaviour of the boundary layer. This is why, in the bottom figure, the mean values obtained using the incompressible DG solver do not agree with the mean Xfoil and Fluent values, which assume steady turbulent flow. At $Re = 1\times10^5$ and $Re = 1\times10^6$, the instantaneous Cp values (top row) show scattering in the data associated with transition. This occurs close to the leading edge on the suction side, whilst it is delayed towards the trailing edge on the pressure side. The bottom row shows that the DG results compare very well to Xfoil when using a critical $N = 1$ (to set the transition point close to the leading edge), whilst Fluent SST (fully turbulent) shows lower Cp values associated with simulating the complete boundary layer as turbulent (no laminar region). These results suggest that DG solvers using iLES approaches (compressible and incompressible) can capture transitional behaviour in boundary layers even when relatively coarse meshes are selected.

#### **4 Conclusions**

In this contribution, we have presented results for turbulent flows over a NACA0012 airfoil. High order discontinuous Galerkin formulations require localised dissipation to remain stable under under-resolved turbulent flow conditions, an approach often referred to as implicit Large Eddy Simulation. Here we have presented a compressible and an incompressible DG formulation (with different stabilising mechanisms) that are able to cope with high Reynolds number flows. Both DG formulations provide aerodynamic coefficients and boundary layer information that compare favorably to experimental data and well established low order solvers. We conclude that the compressible and incompressible formulations included in this work can be very useful in aeronautical applications.

**Acknowledgements** This project has received funding from the European Union Horizon 2020 Research and Innovation Program under the Marie Skłodowska-Curie grant agreement No 675008 for the SSeMID project. The authors acknowledge the computer resources and technical assistance provided by the *Centro de Supercomputación y Visualización de Madrid* (CeSViMa).

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **SAV Method Applied to Fractional Allen-Cahn Equation**

**Xiaolan Zhou, Mejdi Azaiez, and Chuanju Xu**

### **1 Introduction**

The Allen-Cahn equation was originally introduced to describe the motion of antiphase boundaries in crystalline solids [1]. There has been a large body of work on the numerical analysis of Allen-Cahn equations (cf. [2–5] and the references therein). We aim in this paper to use the SAV scheme, recently introduced and analyzed by a number of researchers (see, e.g., [5] and the references therein), to approximate the solution of the fractional version of the Allen-Cahn model. It consists in finding $\phi : \Omega \times (0, T] \to \mathbb{R}$ solution of

$$\begin{cases} \frac{\partial \phi}{\partial t} + \gamma \left( (-\Delta)^s \phi + f(\phi) \right) = 0, \quad \forall (\mathbf{x}, t) \in \Omega \times (0, T], \\\\ \nabla \phi \cdot \mathbf{n} \big|\_{\partial \Omega} = 0, \quad \forall t \in (0, T], \\\\ \phi(t = 0) = \phi\_0(\mathbf{x}), \qquad \forall \mathbf{x} \in \Omega. \end{cases} \tag{1.1}$$

In the above, $\gamma$ is a positive kinetic coefficient, $s \in (0, 1)$, $\Omega \subset \mathbb{R}^d$ is a bounded domain, $\mathbf{n}$ is the outward normal, and $f(\phi) = F'(\phi)$, with $F(\phi) = \frac{1}{4\varepsilon^2}(\phi^2 - 1)^2$ the Ginzburg-Landau double-well potential. The phase field $\phi$

X. Zhou · C. Xu (✉)
School of Mathematical Sciences, Xiamen University, Xiamen, China
e-mail: xlzhou@stu.xmu.edu.cn; cjxu@xmu.edu.cn

M. Azaiez
Bordeaux INP, I2M (UMR CNRS 5295), Université de Bordeaux, Pessac, France
e-mail: azaiez@enscbp.fr

S. J. Sherwin et al. (eds.), *Spectral and High Order Methods for Partial Differential Equations ICOSAHOM 2018*, Lecture Notes in Computational Science and Engineering 134, https://doi.org/10.1007/978-3-030-39647-3\_39

is such that

$$\phi = \begin{cases} 1, & \text{phase 1,} \\ -1, & \text{phase 2,} \end{cases}$$

and $\varepsilon$ represents the thickness of the smooth transition layer connecting the two phases, which is small compared to the characteristic length scale of the system. The homogeneous Neumann boundary condition implies that no mass loss occurs across the boundary walls.

Among the different definitions of fractional Laplacians (see [6, 7] for a quantitative assessment of new numerical methods as well as available state-of-the-art methods for discretizing fractional Laplacian problems), we choose in this paper to focus on the spectral definition. It is defined by

$$(-\Delta)^s u := \sum\_{i \in \mathbb{N}} a\_i \lambda\_i^s e\_i,$$

where $\lambda\_i, e\_i$ are the eigenvalues and eigenfunctions of the Laplace operator $-\Delta$ in $\Omega$ with homogeneous Neumann boundary condition, i.e., they satisfy

$$\begin{cases} -\Delta e\_i = \lambda\_i e\_i, & \mathbf{x} \in \Omega, \\\\ \nabla e\_i \cdot \mathbf{n}|\_{\partial \Omega} = 0. \end{cases}$$

Here, $a\_i$ represents the projection of $u$ on the direction $e\_i$, i.e., $a\_i = (u, e\_i)\_{L^2}$. The spectral fractional Laplacian is nonlocal on the interior for noninteger $s \in (0, 1)$. We see that to compute the inner product $a\_i = (u, e\_i)\_{L^2}$, it suffices for $u$ to be defined on the interior of $\Omega$. No information about $u$ on the exterior $\mathbb{R}^d \setminus \Omega$ is required. Thus, from a conceptual viewpoint, in boundary value problems the spectral fractional Laplacian can admit the same type of boundary conditions as the standard, local Laplacian $-\Delta$. In this paper, we let $\Omega = \,]-1, 1[^2$. Set $u(\mathbf{x}, t) = \sum\_{n=1}^{\infty} \sum\_{m=1}^{\infty} a\_{m,n}(t) e\_{m,n}(\mathbf{x})$, where $e\_{m,n}$ are the orthogonal eigenfunctions of the Laplace operator with homogeneous Neumann boundary conditions and $\lambda\_{m,n}$ are the corresponding eigenvalues. Then we define the spectral fractional Laplacian as

$$(-\Delta)^s u(\mathbf{x}, t) := \sum\_{n=1}^{\infty} \sum\_{m=1}^{\infty} \lambda\_{m,n}^s a\_{m,n}(t) e\_{m,n}(\mathbf{x}), \quad 0 < s < 1, \ \forall u \in H^s(\Omega). \tag{1.2}$$

Here

$$H^{s}(\Omega) := \left\{ u = \sum\_{n=1}^{\infty} \sum\_{m=1}^{\infty} a\_{m,n} e\_{m,n} \in L^2(\Omega) \, : \, |u|\_{s} := \left( \sum\_{n=1}^{\infty} \sum\_{m=1}^{\infty} \lambda\_{m,n}^{s} a\_{m,n}^2 \right)^{1/2} < \infty \right\}.$$
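Definition (1.2) can be made concrete in code: on a uniform grid, the discrete cosine transform diagonalises the Neumann Laplacian, so applying $(-\Delta)^s$ amounts to scaling the cosine coefficients by $\lambda\_{m,n}^s$. The sketch below is an illustrative aid (uniform midpoint grid, `scipy.fft`), not the Legendre collocation method used later in the paper.

```python
# Sketch: apply the spectral fractional Laplacian (1.2) on a uniform grid
# over ]-1,1[^2 via the DCT, whose basis vectors sample the Neumann
# eigenfunctions cos(m*pi*(x+1)/2). The grid choice is illustrative only.
import numpy as np
from scipy.fft import dctn, idctn

def spectral_fractional_laplacian(u, s):
    """Apply (-Delta)^s, 0 < s < 1, to grid samples of u."""
    N = u.shape[0]
    a = dctn(u, type=2, norm="ortho")            # coefficients a_{m,n}
    lam = (np.arange(N) * np.pi / 2.0) ** 2      # eigenvalues (m*pi/2)^2
    lam2d = lam[:, None] + lam[None, :]          # lambda_{m,n}
    return idctn(lam2d ** s * a, type=2, norm="ortho")
```

On the midpoint grid $x\_j = -1 + (2j+1)/N$, a single eigenfunction such as $\cos(\pi(x+1)/2)$ is mapped exactly to $(\pi^2/4)^s$ times itself, which gives a simple correctness check.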

The rest of this paper is organized as follows. In Sect. 2, we briefly present the spectral method by giving some notations and reminders. The fractional Laplace operator and its possible applications are discussed in Sect. 3. To demonstrate the applicability of the approximate fractional Laplacian in real applications, we consider a fractional Allen-Cahn equation (FACE). Based on the scalar auxiliary variable (SAV) approach, we construct an unconditionally energy stable second-order BDF scheme (SAV/BDF2) for FACE. We present numerical results for a test case as well as a benchmark example in Sect. 4.

#### **2 Spatial Discretizations**

We limit here the description of the spectral approximation to the introduction of some notations and reminders (see [8, 9]). For complex domains, a spectral element method can be used [10]. Let $\{(\xi\_i, \rho\_i);\ 0 \le i \le N\}$ denote the set of Gauss-Lobatto-Legendre quadrature nodes and weights associated with polynomials of degree $N$. These quantities are such that on $\Lambda := \,]-1, +1[$

$$\forall \phi \in \mathbb{P}\_{2N-1}(\Lambda), \qquad \int\_{-1}^{+1} \phi(\xi) \, d\xi = \sum\_{j=0}^{N} \phi(\xi\_j) \, \rho\_j,\tag{2.1}$$

where $\mathbb{P}\_N(\Lambda)$ denotes the space of polynomials of degree $\le N$. We recall that the nodes $\xi\_i$ $(0 \le i \le N)$ are the solutions of $(1 - x^2)L'\_N(x) = 0$, where $L\_N$ denotes the Legendre polynomial of degree $N$.

The canonical polynomial interpolation basis $h\_i(x) \in \mathbb{P}\_N(\Lambda)$ built on these nodes is given by the relationships:

$$h\_i(x) = -\frac{1}{N(N+1)} \frac{1}{L\_N(\xi\_i)} \frac{(1-x^2)L\_N'(x)}{(x-\xi\_i)}, \qquad -1 \le x \le +1, \quad 0 \le i \le N,\tag{2.2}$$

with the elementary cardinality property

$$h\_i(\xi\_j) = \delta\_{ij}, \qquad 0 \le i, j \le N,\tag{2.3}$$

where *δij* is Kronecker's delta symbol.

In the sequel, the phase field $\phi$ will be approximated in the space variables by suitable polynomial functions $\phi\_N$ as follows:

$$\phi\_N(\mathbf{x}, t) = \sum\_{i=0}^{N} \sum\_{j=0}^{N} \alpha\_{i,j}(t) h\_i(x) h\_j(y). \tag{2.4}$$

The $L^2$-inner products involved in the calculation will be evaluated using Gauss-Lobatto-Legendre quadrature, which reads: for all continuous functions $\varphi$ and $\psi$ on $\bar{\Omega}$,

$$(\varphi, \psi) \approx (\varphi, \psi)\_N := \sum\_{i=0}^{N} \sum\_{j=0}^{N} \varphi(\xi\_i, \xi\_j)\, \psi(\xi\_i, \xi\_j)\, \rho\_i \rho\_j. \tag{2.5}$$
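For reference, the GLL nodes and weights can be computed numerically; the sketch below uses `numpy.polynomial.legendre`, which is our implementation choice rather than the chapter's, together with the standard weight formula $\rho\_j = 2/\big(N(N+1)L\_N(\xi\_j)^2\big)$. The exactness property (2.1) provides a simple check.

```python
# Sketch: Gauss-Lobatto-Legendre nodes and weights on [-1, 1].
# Interior nodes are the roots of L_N'(x); endpoints are -1 and +1.
import numpy as np
from numpy.polynomial import legendre as leg

def gauss_lobatto_legendre(N):
    cN = np.zeros(N + 1)
    cN[N] = 1.0                                  # Legendre series for L_N
    interior = leg.legroots(leg.legder(cN))      # roots of L_N'
    x = np.concatenate(([-1.0], interior, [1.0]))
    w = 2.0 / (N * (N + 1) * leg.legval(x, cN) ** 2)
    return x, w
```

For $N = 4$ the rule integrates $x^6$ (degree $6 \le 2N - 1$) exactly, in agreement with (2.1).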

#### **3 Scalar Auxiliary Variable (SAV) Approach for FACE**

The SAV approach was introduced in [4, 5] to solve gradient flows. The main purpose of this section is to construct an efficient, unconditionally stable scheme based on this approach for (1.1).

Throughout the paper, we assume there exists a constant $C\_0$ such that $\int\_{\Omega} F(\phi)\,\mathrm{d}x + C\_0 > 0$. We first introduce a *scalar* auxiliary variable

$$r(t) := \sqrt{\int\_{\Omega} F(\phi) \, \mathrm{d}x + C\_0}.$$

Then, we rewrite the phase-field equation (1.1) in an equivalent form as: find $\phi : (0, T] \times \Omega \to \mathbb{R}$ and $r : (0, T] \to \mathbb{R}$, such that

$$\begin{cases} \frac{\partial \phi}{\partial t} = \gamma \mu, & \nabla \phi \cdot \mathbf{n} \big|\_{\partial \Omega} = 0, \\\\ \mu = -( - \Delta )^{s} \phi - \frac{r(t)}{\sqrt{\int\_{\Omega} F(\phi) d\mathbf{x} + C\_{0}}} f(\phi), \\\\ \frac{\mathrm{d}r}{\mathrm{d}t} = \frac{1}{2 \sqrt{\int\_{\Omega} F(\phi) d\mathbf{x} + C\_{0}}} \int\_{\Omega} f(\phi) \frac{\partial \phi}{\partial t} \, d\mathbf{x}. \end{cases} \tag{3.1}$$

**Theorem 3.1** *If $\phi \in L^2\big((0, T]; H^s(\Omega)\big)$, $0 < s < 1$, is the solution of equations* (3.1)*, then we have the following energy dissipation law*

$$\frac{d}{dt}\left(r^2 + \frac{1}{2}|\phi|\_s^2\right) = -\gamma\|\mu\|\_0^2. \tag{3.2}$$

*Proof* Taking the inner product of the first two equations with $\mu$ and $\frac{\partial \phi}{\partial t}$ respectively, and multiplying the third equation by $2r(t)$, then adding them together, we obtain

$$-\gamma \|\mu\|\_{0}^{2} = \frac{d}{dt}(r^{2}) + ((-\Delta)^{s}\phi, \phi\_{t}).\tag{3.3}$$

Writing $\phi(\mathbf{x}, t) = \sum\_{m,n=1}^{\infty} a\_{m,n}(t) e\_{m,n}(\mathbf{x})$ and taking advantage of the orthogonality of $\{e\_{m,n}\}$, we verify

$$((-\Delta)^{s}\phi, \phi\_{t}) = \sum\_{m,n=1}^{\infty} \lambda\_{m,n}^{s} a\_{m,n}(t) a\_{m,n}'(t) = \frac{1}{2} \frac{d}{dt} \left( \sum\_{m,n=1}^{\infty} \lambda\_{m,n}^{s} a\_{m,n}^2(t) \right) = \frac{1}{2} \frac{d}{dt} |\phi|\_{s}^2. \tag{3.4}$$

Then combining (3.3) and (3.4) proves (3.2). ∎

The energy law (3.2) means that the SAV approach (3.1) makes the modified energy

$$H(\phi) = r^2 + \frac{1}{2} |\phi|\_s^2$$

decay in time.

Now we construct a second-order semi-implicit scheme for the system (3.1). Given the initial condition $\phi^0 = \phi\_0$, let $r^0 = \sqrt{\int\_{\Omega} F(\phi\_0)\, d\mathbf{x} + C\_0}$, and find $\phi^{n+1} \in H^s(\Omega)$ and $r^{n+1} \in \mathbb{R}$, $n = 1, \ldots$, such that

$$\frac{3\phi^{n+1} - 4\phi^n + \phi^{n-1}}{2\Delta t} = \gamma\mu^{n+1} \tag{3.5}$$

$$\mu^{n+1} = -(-\Delta)^{s}\phi^{n+1} - \frac{r^{n+1}}{\sqrt{\int\_{\Omega} F\left(\bar{\phi}^{n+1}\right) dx + C\_0}}\, f\left(\bar{\phi}^{n+1}\right)\tag{3.6}$$

$$\frac{3r^{n+1} - 4r^n + r^{n-1}}{2\Delta t} = \frac{1}{2\sqrt{\int\_{\Omega} F\left(\bar{\phi}^{n+1}\right) dx + C\_0}} \int\_{\Omega} f\left(\bar{\phi}^{n+1}\right) \frac{3\phi^{n+1} - 4\phi^n + \phi^{n-1}}{2\Delta t}\, dx \tag{3.7}$$

In the above, $\bar{\phi}^{n+1}$ can be any explicit approximation of $\phi(t\_{n+1})$ with an error of $O(\Delta t^2)$. For instance, we may choose the following one

$$
\bar{\phi}^{n+1} = 2\phi^n - \phi^{n-1}.
$$

**Theorem 3.2** *The scheme* (3.5)*–*(3.7) *is unconditionally stable in the sense that*

$$\frac{1}{\Delta t} \Big[ \tilde{H} \Big[ (\phi^{n+1}, r^{n+1}), (\phi^n, r^n) \Big] - \tilde{H} [(\phi^n, r^n), (\phi^{n-1}, r^{n-1})] \Big] \leq -\gamma \|\mu^{n+1}\|\_{0}^2,\tag{3.8}$$

*with the modified energy*

$$\tilde{H}[(\phi^{n+1}, r^{n+1}), (\phi^n, r^n)] = \frac{1}{4} \left( |\phi^{n+1}|\_s^2 + |2\phi^{n+1} - \phi^n|\_s^2 \right) + \frac{1}{2} \left( (r^{n+1})^2 + (2r^{n+1} - r^n)^2 \right). \tag{3.9}$$

*Proof* The result can be directly deduced by taking the inner product of the first two equations (3.5) and (3.6) with $\mu^{n+1}$ and $\frac{3\phi^{n+1} - 4\phi^n + \phi^{n-1}}{2\Delta t}$ respectively, multiplying the third equation (3.7) by $2r^{n+1}$, and then using the following identity:

$$\begin{aligned} 2(a^{k+1}, 3a^{k+1} - 4a^k + a^{k-1}) &= \|a^{k+1}\|^2 + \|2a^{k+1} - a^k\|^2 + \|a^{k+1} - 2a^k + a^{k-1}\|^2 \\ &\quad - \|a^k\|^2 - \|2a^k - a^{k-1}\|^2. \end{aligned}$$

#### *3.1 Implementation*

Besides its unconditional stability, a most remarkable feature of the above scheme is that it can be solved very efficiently. Indeed, inserting (3.6) and (3.7) into (3.5) and letting $\bar{F}^{n+1} := \int\_{\Omega} F(\bar{\phi}^{n+1})\, d\mathbf{x} + C\_0$, we obtain

$$\left(\frac{3}{2\gamma\Delta t}I + (-\Delta)^{s}\right)\phi^{n+1} + \frac{(f(\bar{\phi}^{n+1}), \phi^{n+1})}{2\bar{F}^{n+1}}f(\bar{\phi}^{n+1}) = g^{n},\tag{3.10}$$

where

$$g^n := \frac{4\phi^n - \phi^{n-1}}{2\gamma\Delta t} - \left(\frac{4r^n - r^{n-1}}{3\sqrt{\bar{F}^{n+1}}} - \frac{\big(f(\bar{\phi}^{n+1}), \frac{4\phi^n - \phi^{n-1}}{3}\big)}{2\bar{F}^{n+1}}\right) f(\bar{\phi}^{n+1}).\tag{3.11}$$

We shall first determine $(f(\bar{\phi}^{n+1}), \phi^{n+1})$ from (3.10). To this end we apply $\left(\frac{3}{2\gamma\Delta t}I + (-\Delta)^s\right)^{-1}$ to (3.10) and take the inner product with $f(\bar{\phi}^{n+1})$ to get

$$(f(\bar{\phi}^{n+1}), \phi^{n+1}) + \frac{(f(\bar{\phi}^{n+1}), \phi^{n+1})}{2\bar{F}^{n+1}} (f(\bar{\phi}^{n+1}), \beta^{n+1}) = (f(\bar{\phi}^{n+1}), \alpha^{n+1}),\tag{3.12}$$

with

$$\alpha^{n+1} = \left(\frac{3}{2\gamma\Delta t}I + (-\Delta)^s\right)^{-1}g^n, \qquad \beta^{n+1} = \left(\frac{3}{2\gamma\Delta t}I + (-\Delta)^s\right)^{-1}f(\bar{\phi}^{n+1}).\tag{3.13}$$

Then, we have

$$(f(\bar{\phi}^{n+1}), \phi^{n+1}) = \frac{(f(\bar{\phi}^{n+1}), \alpha^{n+1})}{1 + \frac{1}{2\bar{F}^{n+1}}(f(\bar{\phi}^{n+1}), \beta^{n+1})}.\tag{3.14}$$

Thus we obtain an expression to compute *φn*+<sup>1</sup> by bringing back (3.14) into (3.10):

$$
\phi^{n+1} = \alpha^{n+1} - \frac{(f(\bar{\phi}^{n+1}), \alpha^{n+1})}{2\bar{\mathcal{F}}^{n+1} + (f(\bar{\phi}^{n+1}), \beta^{n+1})} \beta^{n+1}.\tag{3.15}
$$

Finally we compute *rn*+<sup>1</sup> through

$$r^{n+1} = \frac{4r^n - r^{n-1}}{3} + \frac{(f(\bar{\phi}^{n+1}), \phi^{n+1}) - (f(\bar{\phi}^{n+1}), \frac{4\phi^n - \phi^{n-1}}{3})}{2\sqrt{\bar{\mathcal{F}}^{n+1}}}.$$

We now summarize the algorithm of the Scalar Auxiliary Variable approach/Semi-Implicit Second-Order Scheme (3.5)–(3.7) as follows:

1. Set $\bar{\phi}^{n+1} = 2\phi^n - \phi^{n-1}$, $\bar{F}^{n+1} = \int\_{\Omega} F(\bar{\phi}^{n+1})\, d\mathbf{x} + C\_0$, $\tilde{c}\_0 = \big(f(\bar{\phi}^{n+1}), \frac{4\phi^n - \phi^{n-1}}{3}\big)$, $\tilde{c}\_1 = \frac{4r^n - r^{n-1}}{3\sqrt{\bar{F}^{n+1}}} - \frac{\tilde{c}\_0}{2\bar{F}^{n+1}}$, $g^n = \frac{4\phi^n - \phi^{n-1}}{2\gamma\Delta t} - \tilde{c}\_1 f(\bar{\phi}^{n+1})$;
2. Solve $\frac{3}{2\gamma\Delta t}\beta^{n+1} + (-\Delta)^s \beta^{n+1} = f(\bar{\phi}^{n+1})$;
3. Solve $\frac{3}{2\gamma\Delta t}\alpha^{n+1} + (-\Delta)^s \alpha^{n+1} = g^n$;
4. Compute $\tilde{c}\_2 = (f(\bar{\phi}^{n+1}), \beta^{n+1})$, $\tilde{c}\_3 = (f(\bar{\phi}^{n+1}), \alpha^{n+1})$, $\tilde{c}\_4 = \tilde{c}\_3/(2\bar{F}^{n+1} + \tilde{c}\_2)$;
5. Compute $\phi^{n+1} = \alpha^{n+1} - \tilde{c}\_4 \beta^{n+1}$, $r^{n+1} = \frac{4r^n - r^{n-1}}{3} + \frac{(f(\bar{\phi}^{n+1}), \phi^{n+1}) - \tilde{c}\_0}{2\sqrt{\bar{F}^{n+1}}}$.
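Steps 1–5 can be sketched in code. The version below applies $(-\Delta)^s$ in the DCT (Neumann cosine) eigenbasis, so both solves in steps 2–3 are diagonal; the uniform grid, grid-sum inner product, and parameter values are illustrative assumptions of ours, not the paper's Legendre collocation.

```python
# Sketch of one SAV/BDF2 step (steps 1-5 above), with (-Delta)^s applied
# in the DCT eigenbasis so the shifted solves are diagonal. The grid,
# quadrature, and parameters are illustrative assumptions.
import numpy as np
from scipy.fft import dctn, idctn

def sav_bdf2_step(phi0, phi1, r0, r1, s, gamma, dt, eps, C0, h2):
    """Advance (phi, r); phi0/r0 are level n-1, phi1/r1 level n."""
    F = lambda p: (p ** 2 - 1.0) ** 2 / (4.0 * eps ** 2)   # double well
    f = lambda p: p * (p ** 2 - 1.0) / eps ** 2            # f = F'
    inner = lambda u, v: np.sum(u * v) * h2                # grid quadrature
    N = phi1.shape[0]
    lam = (np.arange(N) * np.pi / 2.0) ** 2                # Neumann eigenvalues
    shift = 3.0 / (2.0 * gamma * dt) + (lam[:, None] + lam[None, :]) ** s

    def solve(rhs):  # (3/(2 gamma dt) I + (-Delta)^s)^{-1} rhs, diagonal
        return idctn(dctn(rhs, type=2, norm="ortho") / shift,
                     type=2, norm="ortho")

    # Step 1: extrapolation and scalar coefficients
    phib = 2.0 * phi1 - phi0
    fb = f(phib)
    Fb = np.sum(F(phib)) * h2 + C0
    c0 = inner(fb, 4.0 * phi1 - phi0) / 3.0
    c1 = (4.0 * r1 - r0) / (3.0 * np.sqrt(Fb)) - c0 / (2.0 * Fb)
    g = (4.0 * phi1 - phi0) / (2.0 * gamma * dt) - c1 * fb
    # Steps 2-3: two diagonal solves
    beta, alpha = solve(fb), solve(g)
    # Steps 4-5: recover phi^{n+1} and r^{n+1}
    c4 = inner(fb, alpha) / (2.0 * Fb + inner(fb, beta))
    phi2 = alpha - c4 * beta
    r2 = (4.0 * r1 - r0) / 3.0 + (inner(fb, phi2) - c0) / (2.0 * np.sqrt(Fb))
    return phi2, r2
```

A pure phase $\phi \equiv 1$ is a stationary state: $f(\bar{\phi}^{n+1}) = 0$, so $\beta^{n+1} = 0$ and the step returns $\phi^{n+1} \equiv 1$ with $r^{n+1} = \sqrt{C\_0}$, which serves as a quick sanity check.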

#### **4 Numerical Results and Discussion**

In this section, we first present a numerical example to illustrate the efficiency of the SAV scheme in terms of stability and accuracy. We then use the proposed scheme to simulate a benchmark problem.

#### *4.1 Test of the Convergence Order*

In order to validate the proposed SAV/BDF2 scheme for the fractional phase-field equation, we consider a fabricated forcing term such that the exact solution to (1.1) is $\phi(\mathbf{x}, t) = \sin(t)\cos(\pi x)\cos(\pi y)$. In this test we set $\gamma = 1$, $\Omega = \,]-1, 1[^2$, and the nonlinear term is given by $f(\phi) = \phi(\phi^2 - 1)$.

In the calculation we use polynomial degree $32 \times 32$ for the spatial discretization, which is large enough that the spatial discretization error is negligible compared to the temporal error. Figure 1 shows the $L^2$-errors at $T = 1.0$ in log-log scale as a function of the time step size for several fractional orders. It is observed from this figure that the convergence rate of the time-stepping scheme is exactly second order, as expected, for all tested values of $s$. It is worth mentioning that no numerical instability was observed for any of the time step sizes used in the calculation, which is consistent with the unconditional stability of the proposed scheme.

#### *4.2 Benchmark Test*

In this subsection, we apply the SAV/BDF2 scheme to the fractional version of a classical benchmark problem (cf. [11]) that we describe below. Our main purpose in this test is to demonstrate the applicability of the constructed method for the FACE. We are particularly interested in numerically investigating the impact of the fractional order on the evolution of the phase interface.

At the initial state, there is a circular phase interface of radius $R\_0 = 100$ in the square domain $]-128, 128[^2$. In other words, the initial condition is given by

$$\phi(\mathbf{x},0) = \begin{cases} 1, & |\mathbf{x}|^2 < 100^2, \\ -1, & |\mathbf{x}|^2 \ge 100^2. \end{cases}$$

Such a circular interface is unstable and the driving force will make it shrink and eventually disappear. It has been shown that in the limit that the radius of the circle is much larger than the interfacial thickness, the velocity and the radius of the moving interface are given (see [1]) by

$$V = \frac{dR}{dt} = -\frac{1}{R}, \quad R(t) = \sqrt{R\_0^2 - 2t}.$$
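This sharp-interface law is easy to check independently of the phase-field solver: integrating $dR/dt = -1/R$ with forward Euler (an illustrative sketch; the step size is our choice) reproduces the closed form $R(t) = \sqrt{R\_0^2 - 2t}$, provided we stop before the circle vanishes at $t = R\_0^2/2$.

```python
# Sketch: verify R(t) = sqrt(R0^2 - 2t) by integrating dR/dt = -1/R.
# Step size and horizon are illustrative choices; T stays below R0^2/2.
import numpy as np

R0, dt, T = 100.0, 0.1, 4000.0
R = R0
for _ in range(int(round(T / dt))):
    R += dt * (-1.0 / R)          # forward Euler step

R_exact = np.sqrt(R0 ** 2 - 2.0 * T)
```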

In the implementation we map the computational domain $]-128, 128[^2$ to $]-1, 1[^2$. Consequently we solve the fractional Allen-Cahn equation (1.1) with the coefficients $\gamma = 1/128^2$ and $\varepsilon = 0.0078$. In the simulation, the space resolution is set to $N = 512$, and the time step size is $\Delta t = 0.1$. The computed radius $R(t)$ for $s = 1$ using the SAV/BDF2 scheme is plotted in Fig. 2. We observe that $R(t)$ decreases monotonically and remains very close to the sharp-interface limit value. This confirms the accuracy of the proposed method, at least in the case $s = 1$.

Next we apply the proposed scheme to investigate the impact of the fractional order on the radius behaviour. In Fig. 3 we present the numerical radius evolution for a number of fractional orders. Specifically, Fig. 4 shows the circle shrinking for fractional orders $s = 1.0, 0.9, 0.8$. It clearly indicates that the radius decay rate slows down as the fractional order decreases. However, for the time being the physical meaning and mathematical explanation of this phenomenon remain unknown. We plan to address this issue in future work.

**Fig. 2** The evolution of radius *R(t)*: comparison of the exact solution and numerical result in the case *s* = 1

**Fig. 3** Evolution of the radius for different fractional order *s*: impact of the order on the radius decay rate

**Fig. 4** Temporal evolution of a circular domain, from left to right, at times $t = 1000, 2000, 3000, 4000, 5000$, for fractional orders $s = 1$ (**a**), $0.9$ (**b**), and $0.8$ (**c**) in the top, middle, and bottom rows, respectively

**Acknowledgements** The work was supported by NSFC/ANR joint program 51661135011 - PHASEFIELD. The second author has received financial support from the French State in the frame of the "Investments for the future" Programme Idex Bordeaux, reference ANR-10-IDEX-03-02. The third author was also supported by NSFC grant 11971408, NNW2018-ZT4A06 project.

#### **References**



## **A First Meshless Approach to Simulation of the Elastic Behaviour of the Diaphragm**

**Nicola Cacciani, Elisabeth Larsson, Alberto Lauro, Marco Meggiolaro, Alessio Scatto, Igor Tominec, and Pierre-Frédéric Villard**

### **1 Introduction**

When intensive care patients are subjected to mechanical ventilation, it forms part of their life support. At the same time, the ventilator causes damage to the muscles that govern normal breathing. Normally, the muscles contract when we inhale, and air is pulled into the lungs. During controlled mechanical ventilation, the ventilator instead pushes the air into the lungs, which then exert a pressure on the muscles. The function of the muscle tissue can deteriorate quite rapidly, leading to Ventilator

N. Cacciani
Division of Basic and Clinical Muscle Biology, Karolinska Institutet, Stockholm, Sweden
e-mail: nicola.cacciani@ki.se

E. Larsson (✉) · I. Tominec
Scientific Computing, Department of Information Technology, Uppsala University, Uppsala, Sweden
e-mail: elisabeth.larsson@it.uu.se; igor.tominec@it.uu.se

A. Lauro
UOC di Radiologia, University-Hospital of Padova, Padova, Italy

M. Meggiolaro
UOC di Anestesia e Rianimazione, ULSS 3 Serenissima, SS. Giovanni e Paolo Hospital, Venice, Italy
e-mail: meggiomarco@libero.it

A. Scatto
UOC di Anestesia e Rianimazione, University-Hospital of Padova, Padova, Italy

P.-F. Villard
Computer Science, Université de Lorraine, CNRS, Inria, LORIA, Nancy, France
e-mail: pierrefrederic.villard@loria.fr

© The Author(s) 2020

S. J. Sherwin et al. (eds.), *Spectral and High Order Methods for Partial Differential Equations ICOSAHOM 2018*, Lecture Notes in Computational Science and Engineering 134, https://doi.org/10.1007/978-3-030-39647-3\_40

Induced Diaphragmatic Dysfunction (VIDD) [2]. Because of this, the rehabilitation process, including the weaning from the ventilator, is more difficult and takes longer.

The Individual Virtual Ventilator (INVIVE) project [5] aims to study the mechanics of respiration through numerical simulation in order to learn more about the onset of VIDD, and the factors that influence its progress in a patient. This work is the first publication from the project and is a pilot study for the numerical techniques that we plan to use.

The diaphragm is the main respiratory muscle. It has not been studied as much in the literature as other muscles, and not with detailed models. However, there are a few studies that use continuum mechanical descriptions of the muscle tissue and simulate its behaviour using FEM [6, 10]. The main drawbacks of the FEM solvers are that they are time-consuming and that meshing of complex geometries can be difficult. We instead propose to use a meshfree RBF-FD method [4] for the numerical simulation. Some of the potential advantages are that meshing can be replaced with scattered node generation, which in some respects is easier and allows for a lot of flexibility; that it is easy to construct high-order accurate approximations that can reduce the computational cost; and that the method is easy to implement and modify, providing flexibility when performing experiments. The objectives of the paper are


The paper is organized as follows: In Sect. 2 we describe the linear elasticity equations in three dimensions. Section 3 briefly introduces the RBF-FD method. The process from medical images to input data for the simulation is described in Sect. 4, which is followed by Sect. 5 on numerical experiments.

#### **2 The Elasticity Equations**

The constitutive relations that describe the real behaviour of muscle tissue are non-linear. The displacement of the diaphragm is large, and should therefore also be modeled by non-linear elasticity equations. For our final simulation tool, we aim to solve the fully non-linear equations. However, for the initial development of meshless numerical methods for the diaphragm simulations, we use a linear elasticity test case.

#### *2.1 The Linearized Equations of Motion*

For the linear test problem, the following simplifying assumptions are made: The relationship between stress and strain is *linear*, the material is *isotropic* and *homogeneous*, and displacements are *small*.

We define the displacement $u(X) = (u\_1(X), u\_2(X), u\_3(X))^T \in \mathbb{R}^3$ of the tissue from the initial configuration $X = (x, y, z)^T \in \mathbb{R}^3$ to a later configuration $X^\* \in \mathbb{R}^3$ as

$$
u(X) = X^\* - X. \tag{1}
$$

The strain-displacement relationship for small displacements, $\|\nabla u\| \ll 1$, has the form

$$\varepsilon = \frac{1}{2} \left[ \nabla u + (\nabla u)^T \right],\tag{2}$$

where the strain $\varepsilon \in \mathbb{R}^{3 \times 3}$ is a tensor. For a linear material, the constitutive relation between the strain and the stress $\sigma \in \mathbb{R}^{3 \times 3}$ is characterized by the Lamé parameters $\lambda$ and $\mu$, leading to

$$
\sigma = 2\mu\varepsilon + \lambda\,\operatorname{tr}(\varepsilon)I. \tag{3}
$$
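Relations (2) and (3) translate directly into code; the following sketch uses illustrative Lamé parameter values, not material data from the paper.

```python
# Sketch: small-strain tensor (2) and linear-elastic stress (3)
# computed from a displacement gradient. Parameter values are illustrative.
import numpy as np

lam, mu = 1.0, 0.5   # illustrative Lame parameters (not from the paper)

def stress(grad_u):
    eps = 0.5 * (grad_u + grad_u.T)                          # eq. (2)
    return 2.0 * mu * eps + lam * np.trace(eps) * np.eye(3)  # eq. (3)
```

For a uniaxial strain $\varepsilon\_{11} = 0.01$, these parameters give $\sigma = \mathrm{diag}(0.02, 0.01, 0.01)$: the $\lambda\,\mathrm{tr}(\varepsilon)I$ term produces transverse stresses even though the strain is purely axial.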

In tissue mechanics, the acceleration is typically small compared with the forces, and can be neglected. The equations of motion (Newton's second law) can then be written as

$$
\nabla \cdot \sigma + f = 0,\tag{4}
$$

where $f \in \mathbb{R}^3$ represents body forces. We assume that (4) holds for all points $X \in \Omega$, where $\Omega$ is the domain of interest, which for our problem is the diaphragm. To close the problem formulation, we also need boundary conditions. The first type is displacement boundary conditions

$$
u = g, \quad X \in \partial\Omega\_D. \tag{5}
$$

These are applied where the geometry is attached, for example where the diaphragm is attached to the ribs and the spine. Traction boundary conditions are given in terms of the stress as

$$
\sigma \cdot n = h, \quad X \in \partial\Omega\_T. \tag{6}
$$

These represent forces applied to the surface of the domain of interest, such as the pressure against the diaphragm from below generated by the abdominal compliance.

#### *2.2 The Lamé-Navier PDE Formulation*

The Lamé-Navier equations give the steady-state equations of motion in terms of the displacement field [13]. This means we are solving a system of three PDEs with three unknowns. We rewrite (4) and (6) in terms of $u$ using relations (2), (3), and the identity $\operatorname{tr}\big(\nabla u + (\nabla u)^T\big) = 2(\nabla \cdot u)$ to get

$$(\lambda + \mu)\nabla(\nabla \cdot u) + \mu \nabla^2 u + f = 0, \quad X \in \Omega \tag{7}$$

$$
u = \mathbf{g}, \quad X \in \partial \Omega\_D \tag{8}
$$

$$\left[\lambda(\nabla \cdot \boldsymbol{u})I + \mu(\nabla \boldsymbol{u} + (\nabla \boldsymbol{u})^T)\right] \cdot \boldsymbol{n} = \boldsymbol{h}, \quad X \in \partial \Omega\_T \tag{9}$$

When we later discretize the system, it is more convenient to work with the operators and the displacement in component form. The two operators in the PDE (7) applied to *u* expand to

$$
\nabla(\nabla \cdot \boldsymbol{u}) = \begin{pmatrix} \nabla\_{\boldsymbol{x}\boldsymbol{x}} \ \nabla\_{\boldsymbol{x}\boldsymbol{y}} \ \nabla\_{\boldsymbol{x}\boldsymbol{z}} \\ \nabla\_{\boldsymbol{x}\boldsymbol{y}} \ \nabla\_{\boldsymbol{y}\boldsymbol{y}} \ \nabla\_{\boldsymbol{y}\boldsymbol{z}} \\ \nabla\_{\boldsymbol{x}\boldsymbol{z}} \ \nabla\_{\boldsymbol{y}\boldsymbol{z}} \ \nabla\_{\boldsymbol{z}\boldsymbol{z}} \end{pmatrix} \begin{pmatrix} \boldsymbol{u}\_{1} \\ \boldsymbol{u}\_{2} \\ \boldsymbol{u}\_{3} \end{pmatrix}, \quad \nabla^{2}\boldsymbol{u} = \begin{pmatrix} \mathcal{L} & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathcal{L} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathcal{L} \end{pmatrix} \begin{pmatrix} \boldsymbol{u}\_{1} \\ \boldsymbol{u}\_{2} \\ \boldsymbol{u}\_{3} \end{pmatrix},
$$

where L = ∇*xx* + ∇*yy* + ∇*zz*. Rewriting the two terms in the traction condition (9) in the same way yields

$$(\nabla \cdot \boldsymbol{u})\boldsymbol{I} \cdot \boldsymbol{n} = \begin{pmatrix} n\_1 \nabla\_{\boldsymbol{x}} \ n\_1 \nabla\_{\boldsymbol{y}} \ n\_1 \nabla\_{\boldsymbol{z}} \\ n\_2 \nabla\_{\boldsymbol{x}} \ n\_2 \nabla\_{\boldsymbol{y}} \ n\_2 \nabla\_{\boldsymbol{z}} \\ n\_3 \nabla\_{\boldsymbol{x}} \ n\_3 \nabla\_{\boldsymbol{y}} \ n\_3 \nabla\_{\boldsymbol{z}} \end{pmatrix} \begin{pmatrix} \boldsymbol{u}\_1 \\ \boldsymbol{u}\_2 \\ \boldsymbol{u}\_3 \end{pmatrix},$$

$$(\nabla \boldsymbol{u} + (\nabla \boldsymbol{u})^T) \cdot \boldsymbol{n} = \begin{pmatrix} \boldsymbol{\mathcal{T}}\_{\boldsymbol{1}\boldsymbol{x}} & n\_2 \nabla\_{\boldsymbol{x}} \ n\_3 \nabla\_{\boldsymbol{x}} \\ n\_1 \nabla\_{\boldsymbol{y}} & \boldsymbol{\mathcal{T}}\_{\boldsymbol{2}\boldsymbol{y}} & n\_3 \nabla\_{\boldsymbol{y}} \\ n\_1 \nabla\_{\boldsymbol{z}} & n\_2 \nabla\_{\boldsymbol{z}} & \boldsymbol{\mathcal{T}}\_{\boldsymbol{3}\boldsymbol{z}} \end{pmatrix} \begin{pmatrix} \boldsymbol{u}\_1 \\ \boldsymbol{u}\_2 \\ \boldsymbol{u}\_3 \end{pmatrix},$$

where T*iq* = *ni*∇*<sup>q</sup>* + *n*1∇*<sup>x</sup>* + *n*2∇*<sup>y</sup>* + *n*3∇*z*.

#### **3 The RBF-FD Numerical Method**

In the RBF-FD method [4], scattered-node stencil approximations are used to represent the differential operators in the PDE and the boundary conditions. Let $X\_1, \ldots, X\_N$ be a global set of node points, and let $u\_{ij} \approx u\_i(X\_j)$. We collect the unknown displacement values in the vectors $U\_i = (u\_{i1}, \ldots, u\_{iN})^T$. When we want to approximate the result of a differential operator $\mathcal{D}$ applied to $u\_i$ at a point $X\_j$, we first find a local neighbourhood $X\_1^{(j)}, \ldots, X\_n^{(j)}$ of $X\_j$, with local unknowns $u\_{ik}^{(j)}$. The stencil approximation then takes the form

$$\mathcal{D}u\_i(X\_j) \approx \sum\_{k=1}^n w\_k u\_{ik}^{(j)}.\tag{10}$$

The weights are computed for each point in the global node set by solving a linear system of size $n \times n$, where the stencil size $n \ll N$. In this work, we consider stencil approximations where RBFs augmented by a polynomial basis are used. The small linear systems then take the form

$$
\begin{pmatrix} A & P \\ P^T & 0 \end{pmatrix} \begin{pmatrix} w \\ \nu \end{pmatrix} = \begin{pmatrix} b \\ c \end{pmatrix}, \tag{11}
$$

where $A(i, k) = \phi\big(\|X\_k^{(j)} - X\_i^{(j)}\|\big)$, with $\phi(r)$ an RBF, for $i, k = 1, \ldots, n$, and where $P(i, k) = p\_k\big(X\_i^{(j)}\big)$, for $i = 1, \ldots, n$ and $k = 1, \ldots, m$. The polynomials $p\_k$ are chosen as the lowest-degree monomial basis of dimension $m$, and $m$ is usually chosen such that a full basis for a certain maximum degree $K$ is obtained. The right-hand side vectors are defined by $b(i) = \mathcal{D}\phi\big(\|X\_j - X\_i^{(j)}\|\big)$ for $i = 1, \ldots, n$, and $c(i) = \mathcal{D}p\_i(X\_j)$ for $i = 1, \ldots, m$. The vector $\nu$ can be seen as a Lagrange multiplier in this problem and is discarded. The stencil approximation is exact for polynomials up to degree $K$, as can be seen from the last block row in the system, and it is also exact for the RBFs centered at the stencil nodes.

A global differentiation matrix $D$ is assembled by inserting the weights corresponding to $X_j$ in the $j$th row of the matrix, in the columns corresponding to the global indices of the nodes $X^{(j)}_k$ in the local neighbourhood ($X_j$ is normally one of the points in the neighbourhood). Then we can compute

$$\left(\mathcal{D}u_i(X_1), \ldots, \mathcal{D}u_i(X_N)\right)^T \approx D U_i. \tag{12}$$
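The assembly loop can be sketched as follows; `weight_fn` is a hypothetical placeholder for a solver of the local system (11), and the brute-force neighbour search and dense matrix are simplifications (a k-d tree and a sparse matrix would be used in practice):

```python
import numpy as np

def assemble_diff_matrix(X, n, weight_fn):
    """Assemble the global N x N differentiation matrix D of (12): row j
    holds the stencil weights for the n nearest neighbours of X[j].
    Illustrative sketch only."""
    N = X.shape[0]
    D = np.zeros((N, N))
    for j in range(N):
        d2 = np.sum((X - X[j])**2, axis=1)
        idx = np.argsort(d2)[:n]          # local neighbourhood of X[j]
        D[j, idx] = weight_fn(X[idx], X[j])
    return D
```

Applying `D` to a vector of nodal values then evaluates the operator at all nodes at once, as in (12).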

When solving the PDE problem (7)–(9), $u$ is replaced with the discrete field variables, and the differential operators are replaced with the corresponding differentiation matrices. The PDE operator is applied at interior node points, and the boundary operators at boundary node points.

In recent work on RBF-FD methods, it has been found that a combination of polyharmonic spline RBFs $\phi(r) = |r|^{2k+1}$, $k > 0$, with polynomials up to degree $K$ has excellent approximation properties [1, 3]. The (asymptotic) convergence rate is governed by the polynomial degree $K$, and oscillations near boundaries, which are common with both pure RBF and pure polynomial approximations, are suppressed as soon as $K$ is large enough. In this work, we use the cubic polyharmonic spline

$$\phi(r) = |r|^{3}. \tag{13}$$

#### **4 The Medical Image Input Data**

The medical research questions are the motivation for the INVIVE project, and it is important that the numerical simulations can emulate what is seen in the medical image data. To start with, we use medical images to extract the real diaphragm geometry. We also use image data to find the displacement of the diaphragm at different times during the respiratory cycle. Later in the project medical image data will also be used for validation of the numerical simulations.

#### *4.1 Medical Image Acquisition*

The type of medical image data that is available to us is thoracic 3-D CT images acquired using a TOSHIBA Aquilion ONE CT scanner. The images were captured at Azienda Ospedaliera di Padova from adult patients that were subjected to the CT scan for medical reasons (the CT scans were not performed only for research). The images were made and are used in anonymous form. The computed 3-D images are associated with two specific times in the breathing cycle or, equivalently, with two different states of lung inflation. The images have a pixel size of $0.927 \times 0.927$ mm$^2$ and a slice thickness of 0.3 mm. They have a resolution of $512 \times 512 \times 1500$ voxels that covers the thoracic and abdominal regions. Examples of image views are shown in Fig. 1.

#### *4.2 Converting Image Data to Mesh-Based Geometry Data*

Automated segmentation methods are currently not able to identify the diaphragm, which is barely visible in the images. Therefore, the diaphragm was manually segmented on a Wacom tablet using a method similar to that described in [14]. The segmentation time is roughly 6 h for one 3-D image. The manual segmentation

**Fig. 1** Manual segmentation of the diaphragm. Red: diaphragm, yellow: lungs, blue: bones

**Fig. 2** Left: The initial 3D-mesh and the decimated mesh with 1000 vertices. Right: The sagittal cut and centers of gravity (green)

method consists in following the organs that are known to surround the diaphragm such as the bottom of the lungs, the top of the liver, and the inside of the ribs. Figure 1 shows the result of the segmentation.

The labeled voxel data is then converted into a mesh with the marching cubes algorithm. The mesh contains around $1.5 \cdot 10^6$ vertices due to the CT scan resolution. The initial mesh is then decimated using Vorpaline [9], a fast and automatic method whose only input is the number of final points comprising the mesh. The initial mesh and a decimated mesh are shown in Fig. 2.

Both when implementing the boundary conditions and for node generation, it is necessary to be able to identify vertices belonging to different parts of the surface of the geometry. Two relevant sections are the upper thoracic surface and the lower abdominal surface. These correspond to two different pressure regions.

To separate the surface components, we employ the following algorithm. First, the whole diaphragm is separated into a left and a right part. We orient the diaphragm such that the parameter $t \in [t_{\min}, t_{\max}]$ describes a position from left to right, we let $V(\cdot)$ denote the volume of a convex region, we let $\mathcal{C}(\cdot)$ denote the convex hull of a node set, and we let $\Omega(t_1, t_2)$ be the part of the diaphragm that falls within that range of $t$. Then we can find the sagittal cut $t_{\mathrm{sep}}$ as the position that maximizes the sum of the left and right volumes

$$t_{\mathrm{sep}} = \underset{t_{\min} \le t \le t_{\max}}{\arg\max}\; V\left(\mathcal{C}\left(\{X_j \mid X_j \in \Omega(t_{\min}, t)\}\right)\right) + V\left(\mathcal{C}\left(\{X_j \mid X_j \in \Omega(t, t_{\max})\}\right)\right).$$

The result is illustrated in the right panel of Fig. 2, where the two centres of gravity $c_L$ and $c_R$, for the left and right parts respectively, are also indicated.

For each surface vertex $X_j \in \Omega_i$, for $i = L, R$, of the diaphragm, a vertex location tag is given by the dot product between the diaphragm vertex normal $n_j$ and the normalized vector $v_j = (X_j - c_i)/\|X_j - c_i\|$ in the direction from the center of gravity to the vertex.

$$\text{tag}(X\_j) = \begin{cases} \text{thorax}, & \text{if } n\_j \cdot v\_j \ge 0 \\ \text{abdomen}, & \text{otherwise} \end{cases} \tag{14}$$

Finally, to avoid artifacts, only the largest connected component of each tagged location is kept, and disconnected parts are reassigned to the other location.
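The tagging rule (14) can be sketched as follows; the vertex positions, normals, and the centre of gravity are assumed given (e.g. from the decimated mesh), and the function names are illustrative:

```python
import numpy as np

def tag_vertices(X, normals, c):
    """Tag surface vertices as thorax or abdomen following (14):
    a vertex is tagged 'thorax' if its normal points away from the
    centre of gravity c of its half of the diaphragm. Sketch only."""
    v = X - c
    v /= np.linalg.norm(v, axis=1, keepdims=True)   # v_j = (X_j - c)/||X_j - c||
    dots = np.einsum('ij,ij->i', normals, v)        # n_j . v_j
    return np.where(dots >= 0, 'thorax', 'abdomen')
```

For a closed convex surface with outward normals and `c` inside, every vertex would be tagged thorax; the interesting cases arise where the thin diaphragm sheet folds.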

#### *4.3 Final Geometry Representation and Node Generation*

Based on the OGr method [11, 12] and a least-squares RBF-partition of unity method [7], the mesh-based geometry is smoothed and parametrized. The details of this process are described in a forthcoming paper [8].

Scattered node sets of different resolutions are generated from the smoothed geometry. A level-set function inside the volume is used for anisotropic node placement, such that the resolution in the direction normal to the surface is higher than along the surface.
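The effect of such anisotropic placement can be illustrated with a toy stand-in: a flat slab with fine spacing across the thickness, instead of the level-set-based placement used for the curved diaphragm geometry:

```python
import numpy as np

def slab_nodes(h_t, h_n, L=1.0, W=1.0, T=0.01):
    """Anisotropic node placement in a thin slab [0,L]x[0,W]x[0,T]:
    spacing h_t along the surface, finer spacing h_n across the
    thickness. A toy illustration only, not the project's method."""
    x = np.linspace(0, L, int(round(L / h_t)) + 1)
    y = np.linspace(0, W, int(round(W / h_t)) + 1)
    z = np.linspace(0, T, int(round(T / h_n)) + 1)
    X, Y, Z = np.meshgrid(x, y, z, indexing='ij')
    return np.column_stack([X.ravel(), Y.ravel(), Z.ravel()])
```

With `h_t = 0.1` and `h_n = 0.005`, the slab of thickness 0.01 gets three node layers across its thin dimension while keeping a coarse tangential spacing.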

#### *4.4 A Test Problem with Real Displacements*

We are still working on the analysis of the images shown in the previous section. Therefore, we use an older data set with somewhat lower resolution for the test case and the numerical experiments. We only have the end-of-inhalation state segmented at this point. To define a realistic displacement function, we have identified nine different landmarks on the diaphragm. There are four insertion points of the diaphragm that we take as immobile. These are the left and right transverse processes of the two lowest thoracic vertebrae, T11 and T12. The five moving landmarks and their displacements are given in Table 1. We augment this information by also requiring the extremal points of the lower edge of the diaphragm to be immobile, and the thickness change from contracted to relaxed state at the two domes to be 66%. We then interpolate the displacements at the augmented landmarks by the $|r|^7$ polyharmonic spline. The initial and displaced states are shown in Fig. 3. As the first test problem, we solve for the interior displacement given that the boundary displacement changes from the relaxed to the contracted state.
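The landmark interpolation can be sketched with a hand-rolled polyharmonic spline interpolant. For brevity the polynomial tail below is linear, although the theory for the $|r|^7$ spline calls for a cubic tail; this is an illustrative simplification:

```python
import numpy as np

def phs7_interpolant(X, f):
    """Interpolate scattered values f at sites X (shape (n, 3)) with the
    |r|^7 polyharmonic spline plus a linear polynomial tail.
    A sketch of the landmark-displacement interpolation; one such
    interpolant would be built per displacement component."""
    n, d = X.shape
    r = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    A = r**7
    P = np.hstack([np.ones((n, 1)), X])            # 1, x, y, z
    M = np.block([[A, P], [P.T, np.zeros((d + 1, d + 1))]])
    coef = np.linalg.solve(M, np.concatenate([f, np.zeros(d + 1)]))

    def s(Y):
        rr = np.linalg.norm(Y[:, None] - X[None, :], axis=2)
        return rr**7 @ coef[:n] + np.hstack([np.ones((len(Y), 1)), Y]) @ coef[n:]
    return s
```

By construction the interpolant reproduces the landmark data exactly, and evaluating it at all surface nodes yields a smooth displacement field between them.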


**Table 1** Displacements of five landmark points

**Fig. 3** Initial node locations (higher, red) and displaced node locations (lower, blue), using the constructed displacement function for a node set with *N* = 8404 nodes

#### **5 Numerical Experiments**

A main concern when solving the linear elasticity problem for the diaphragm is the high aspect ratio of the geometry. The overall size of the diaphragm is around $30 \times 20 \times 15$ cm, while the thickness is just a few mm. In the experiments we want to test how important the resolution in the normal direction is for the results. Our hypothesis is that it needs to be large enough to allow for a stencil with a similar number of nodes in each dimension. That is, we need at least $\sqrt[3]{n}$ nodes in the normal direction. We compare two cases: (i) using uniform node sets with similar distances in the normal and tangential directions, and (ii) using node sets that are refined in the normal direction according to the stencil size. Convergence is tested against a reference solution computed at a higher resolution.

The left part of Fig. 4 shows the convergence of the displacements. The errors are larger for case (i), and no convergence trend is observed for the largest stencil size. The number of points in the normal direction increases gradually as *N* increases. For

**Fig. 4** Left: Convergence of the displacement against the reference solution for uniform nodes (dashed) and nodes refined in the normal direction (solid) for *n* = 50, *K* = 3 (square), *n* = 78, *K* = 4 (circle), and *n* = 120, *K* = 5 (x), where *n* is the stencil size and *K* is the order of the polynomial basis augmenting the polyharmonic spline functions. Right: Convergence of the stresses against the reference solution for *n* = 50. The slopes *p*, with −*p* indicating the order of convergence, are also shown

case (ii), the errors are smaller, and convergence is observed in all cases. When using polyharmonic splines in combination with polynomials, we expect the convergence rate to be of order $h^{K+1}$, where $h$ is a measure of the node spacing and $K$ is the maximum degree of the polynomial terms [3]. However, for case (ii), we get the same rate of convergence for all $K$. One reason can be that the normal refinement is kept constant while the tangential refinement is increased. There may also be issues concerning the smoothness of the node distribution and/or the solution.

In the right part of Fig. 4, we display the convergence of the functions in the stress tensor, computed for the interior nodes for case (ii). The convergence rates are similar to those of the displacement. This is also unexpected, as we would normally expect a derivative of order $\ell$ to converge as $h^{K+1-\ell}$ [3].

In Fig. 5, we show the components of the stress tensor, computed for the interior nodes for case (ii). We can see that the magnitude of the stresses is large at the domes where we enforce compression of the muscle.

#### **6 Conclusions**

We have developed a pipeline for converting CT image data into input data for numerical simulation. The main bottleneck is the manual segmentation of the diaphragm. One thing that will be investigated in future work is whether a mapping from a reference geometry can be used to simplify this step.

When the thin dimension is resolved with enough node points, the RBF-FD approximations converge as the number of nodes increases. The stresses can also be computed with similar accuracy. This shows that it is possible to use this type of discretization, but further work is needed on how to generate smooth non-uniform node sets, and also on the implementation of more advanced test problems.

**Acknowledgements** The INVIVE project is funded by the Swedish Research Council, grant no. 2016-04849.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **An Explicit Hybridizable Discontinuous Galerkin Method for the 3D Time-Domain Maxwell Equations**

**Georges Nehmetallah, Stéphane Lanteri, Stéphane Descombes, and Alexandra Christophe**

### **1 Motivations and Objectives**

The DGTD method is nowadays a very popular numerical method in the computational electromagnetics community. Most existing works are concerned with explicit DGTD methods relying on the use of a single global time step chosen to ensure the stability of the simulation. It is, however, well known that when combined with an explicit time integration method in the presence of an unstructured, locally refined mesh, a high order DGTD method suffers from a severe time step size restriction. An alternative approach that has been considered in [5, 7, 16] is to use a hybrid explicit-implicit (or locally implicit) time integration strategy. Such a strategy relies on a component splitting deduced from a partitioning of the mesh cells into two sets, respectively gathering coarse and fine elements. The computational efficiency of this locally implicit DGTD method depends on the size of the set of fine elements, which directly influences the size of the sparse matrix system to be solved at each time step. Therefore, an approach for reducing the size of the subsystem of globally coupled (i.e. implicit) unknowns is worth considering if one wants to solve very large-scale problems.

A particularly appealing solution in this context is given by the concept of hybridizable discontinuous Galerkin (HDG) methods. The HDG method was first introduced by Cockburn et al. in [4] for a model elliptic problem and has been subsequently developed for a variety of PDE systems in continuum mechanics [13]. The essential ingredients of a HDG method are a local Galerkin projection of the underlying system of PDEs at the element level onto spaces of polynomials

G. Nehmetallah (-) · S. Lanteri · S. Descombes · A. Christophe

Université Côte d'Azur, INRIA, CNRS, LJAD, Sophia-Antipolis Cedex, France e-mail: georges.nehmetallah@inria.fr; stephane.lanteri@inria.fr; stephane.descombes@univ-cotedazur.fr; alexandra.christophe@inria.fr

S. J. Sherwin et al. (eds.), *Spectral and High Order Methods for Partial Differential Equations ICOSAHOM 2018*, Lecture Notes in Computational Science and Engineering 134, https://doi.org/10.1007/978-3-030-39647-3\_41

to parameterize the numerical solution in terms of the numerical trace; a judicious choice of the numerical flux to provide stability and consistency; and a global jump condition that enforces the continuity of the numerical flux to arrive at a global weak formulation in terms of the numerical trace. The HDG methods are fully implicit and high-order accurate and, most importantly, they reduce the globally coupled unknowns to the approximate trace of the solution on element boundaries, thereby leading to a significant reduction in the degrees of freedom. HDG methods for the system of time-harmonic Maxwell equations have been proposed in [9, 10, 14]. So far, we have developed an implicit HDG method for the time-domain Maxwell equations [3]. In view of devising a hybrid explicit-implicit HDG method, a preliminary step is therefore to elaborate on the principles of a fully explicit HDG formulation. Fully explicit HDG methods have recently been studied for the acoustic wave equation by Kronbichler et al. [8] and Stanglmeier et al. [15]. In [15] the authors present a fully explicit HDG method that is high order accurate in both space and time. In this paper we outline the formulation of this explicit HDGTD method and present numerical results, including a preliminary assessment of its superconvergence properties. We adopt a low-storage Runge-Kutta scheme [2] for the time integration of the semi-discrete HDG equations. This work is a first step towards the construction of a hybrid explicit-implicit HDG method for time-domain electromagnetics.
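Low-storage (2N-storage) Runge-Kutta schemes keep only two field-sized registers regardless of the number of stages. As an illustration (the paper uses the scheme of [2]; here we show Williamson's classical three-stage 2N-storage RK3 instead, which is not the authors' choice):

```python
import numpy as np

def lsrk3_step(f, y, t, dt):
    """One step of Williamson's three-stage low-storage (2N) RK3 scheme:
    the stage register k is recycled, so only k and y are stored.
    Illustrative stand-in for the low-storage scheme of [2]."""
    A = [0.0, -5.0 / 9.0, -153.0 / 128.0]
    B = [1.0 / 3.0, 15.0 / 16.0, 8.0 / 15.0]
    C = [0.0, 1.0 / 3.0, 3.0 / 4.0]          # stage times
    k = np.zeros_like(y)
    for a, b, c in zip(A, B, C):
        k = a * k + dt * f(t + c * dt, y)    # overwrite stage register
        y = y + b * k                        # overwrite solution register
    return y

# Integrate y' = -y over [0, 1]; the scheme is third-order accurate.
y, t, dt = np.array([1.0]), 0.0, 0.01
for _ in range(100):
    y = lsrk3_step(lambda tt, u: -u, y, t, dt)
    t += dt
```

In an HDG code, `f` would be the semi-discrete right-hand side obtained from (13) after inverting the block-diagonal element mass matrices.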

#### **2 Problem Statement and Notations**

We consider the system of 3D time-domain Maxwell equations on a bounded polyhedral domain $\Omega \subset \mathbb{R}^3$

$$\begin{cases} \varepsilon \partial_t \mathbf{E} - \mathbf{curl}\,\mathbf{H} = -\mathbf{J}, & \text{in } \Omega \times [0, T], \\ \mu \partial_t \mathbf{H} + \mathbf{curl}\,\mathbf{E} = 0, & \text{in } \Omega \times [0, T], \end{cases} \tag{1}$$

where the symbol $\partial_t$ denotes the time derivative, $\mathbf{J}$ is the current density, $T$ a final time, and $\mathbf{E}(\mathbf{x},t)$ and $\mathbf{H}(\mathbf{x},t)$ are the electric and magnetic fields. The dielectric permittivity $\varepsilon$ and the magnetic permeability $\mu$ vary in space, are time-invariant, and are both positive functions. The boundary of $\Omega$ is defined as $\partial\Omega = \Gamma_m \cup \Gamma_a$ with $\Gamma_m \cap \Gamma_a = \emptyset$. The boundary conditions are chosen as

$$\begin{cases} \mathbf{n} \times \mathbf{E} = 0, \text{ on } \varGamma\_m \times [0, T], \\\\ \mathbf{n} \times \mathbf{E} + \mathbf{n} \times (\mathbf{n} \times \mathbf{H}) = \mathbf{n} \times \mathbf{E}^{\text{inc}} + \mathbf{n} \times (\mathbf{n} \times \mathbf{H}^{\text{inc}}) \\\\ = \mathbf{g}^{\text{inc}}, \text{ on } \varGamma\_a \times [0, T]. \end{cases} \tag{2}$$

Here $\mathbf{n}$ denotes the unit outward normal to $\partial\Omega$ and $(\mathbf{E}^{\text{inc}}, \mathbf{H}^{\text{inc}})$ a given incident field. The first boundary condition is often referred to as a metallic boundary condition and is applied on a perfectly conducting surface. The second relation is an absorbing boundary condition and here takes the form of the Silver-Müller condition. It is applied on a surface corresponding to an artificial truncation of a theoretically unbounded propagation domain. Finally, the system is supplemented with initial conditions: $\mathbf{E}_0(\mathbf{x}) = \mathbf{E}(\mathbf{x}, 0)$ and $\mathbf{H}_0(\mathbf{x}) = \mathbf{H}(\mathbf{x}, 0)$. For the sake of simplicity, we omit the volume source term $\mathbf{J}$ in what follows.

We now introduce the notations and approximation spaces. We first consider a partition $\mathcal{T}_h$ of $\Omega \subset \mathbb{R}^3$ into a set of tetrahedra. Each non-empty intersection of two elements $K^+$ and $K^-$ is called an interface. We denote by $\mathcal{F}^I_h$ the union of all interior interfaces of $\mathcal{T}_h$, by $\mathcal{F}^B_h$ the union of all boundary interfaces of $\mathcal{T}_h$, and $\mathcal{F}_h = \mathcal{F}^I_h \cup \mathcal{F}^B_h$. Note that $\partial\mathcal{T}_h$ represents all the interfaces $\partial K$ for all $K \in \mathcal{T}_h$. As a result, an interior interface shared by two elements appears twice in $\partial\mathcal{T}_h$, unlike in $\mathcal{F}_h$ where it appears once. For an interface $F \in \mathcal{F}^I_h$, $F = K^+ \cap K^-$, let $\mathbf{v}^\pm$ be the traces of $\mathbf{v}$ on $F$ from the interior of $K^\pm$. On this interior face, we define mean values as $\{\mathbf{v}\}_F = (\mathbf{v}^+ + \mathbf{v}^-)/2$ and jumps as $\llbracket\mathbf{v}\rrbracket_F = \mathbf{n}^+ \times \mathbf{v}^+ + \mathbf{n}^- \times \mathbf{v}^-$, where the unit outward normal vector to $K^\pm$ is denoted by $\mathbf{n}^\pm$. For boundary faces these expressions are modified as $\{\mathbf{v}\}_F = \mathbf{v}^+$ and $\llbracket\mathbf{v}\rrbracket_F = \mathbf{n}^+ \times \mathbf{v}^+$, since we assume $\mathbf{v}$ is single-valued on the boundaries. In the following, we introduce the discontinuous finite element spaces and some basic operations on these spaces for later use. Let $\mathbb{P}_{p_K}(K)$ denote the space of polynomial functions of degree at most $p_K$ on the element $K \in \mathcal{T}_h$. The discontinuous finite element space is introduced as

$$\mathbf{V}_h = \left\{ \mathbf{v} \in \left[ L^{2}(\Omega) \right]^{3} \text{ such that } \mathbf{v}|_{K} \in \left[ \mathbb{P}_{p_{K}}(K) \right]^{3}, \quad \forall K \in \mathcal{T}_h \right\}, \tag{3}$$

where *L*2*(Ω)* is the space of square integrable functions on the domain *Ω*. The functions in **V***<sup>h</sup>* are continuous inside each element and discontinuous across the interfaces between elements. In addition, we introduce a traced finite element space

$$\mathbf{M}_h = \left\{ \boldsymbol{\eta} \in \left[ L^{2}(\mathcal{F}_h) \right]^{3} \text{ such that } \boldsymbol{\eta}|_{F} \in \left[ \mathbb{P}_{p_{F}}(F) \right]^{3} \text{ and } (\boldsymbol{\eta} \cdot \mathbf{n})|_{F} = 0, \quad \forall F \in \mathcal{F}_h \right\}. \tag{4}$$

For two vector functions $\mathbf{u}$ and $\mathbf{v}$ in $\left[L^2(D)\right]^3$, we denote $(\mathbf{u}, \mathbf{v})_D = \int_D \mathbf{u} \cdot \mathbf{v} \,\mathrm{d}\mathbf{x}$ provided $D$ is a domain in $\mathbb{R}^3$, and we denote $\langle \mathbf{u}, \mathbf{v} \rangle_F = \int_F \mathbf{u} \cdot \mathbf{v} \,\mathrm{d}s$ if $F$ is a two-dimensional face. Accordingly, for the mesh $\mathcal{T}_h$ we have

$$\begin{aligned} (\cdot,\cdot)_{\mathcal{T}_h} &= \sum_{K \in \mathcal{T}_h} (\cdot,\cdot)_{K}, & \langle \cdot,\cdot \rangle_{\partial\mathcal{T}_h} &= \sum_{K \in \mathcal{T}_h} \langle \cdot,\cdot \rangle_{\partial K}, \\ \langle \cdot,\cdot \rangle_{\mathcal{F}_h} &= \sum_{F \in \mathcal{F}_h} \langle \cdot,\cdot \rangle_{F}, & \langle \cdot,\cdot \rangle_{\Gamma_a} &= \sum_{F \in \mathcal{F}_h \cap \Gamma_a} \langle \cdot,\cdot \rangle_{F}. \end{aligned}$$

We set $\mathbf{v}^t = -\mathbf{n} \times (\mathbf{n} \times \mathbf{v})$ and $\mathbf{v}^n = \mathbf{n}\,(\mathbf{n} \cdot \mathbf{v})$, where $\mathbf{v}^t$ and $\mathbf{v}^n$ are the tangential and normal components of $\mathbf{v}$, such that $\mathbf{v} = \mathbf{v}^t + \mathbf{v}^n$.

#### **3 Principles and Formulation of the HDG Method**

Following the classical DG approach, approximate solutions $(\mathbf{E}_h, \mathbf{H}_h)$, for all $t \in [0, T]$, are sought in the space $\mathbf{V}_h \times \mathbf{V}_h$ satisfying for all $K$ in $\mathcal{T}_h$

$$\begin{cases} \left(\varepsilon \partial\_t \mathbf{E}\_h, \mathbf{v}\right)\_K - \left(\mathbf{curl} \mathbf{H}\_h, \mathbf{v}\right)\_K = 0, \ \forall \mathbf{v} \in \mathbf{V}\_h, \\\\ \left(\mu \partial\_t \mathbf{H}\_h, \mathbf{v}\right)\_K + \left(\mathbf{curl} \mathbf{E}\_h, \mathbf{v}\right)\_K = 0, \ \forall \mathbf{v} \in \mathbf{V}\_h. \end{cases} \tag{5}$$

Applying Green's formula to both equations of (5) introduces boundary terms, which are replaced by numerical traces $\hat{\mathbf{E}}_h$ and $\hat{\mathbf{H}}_h$ in order to ensure the connection between element-wise solutions and the global consistency of the discretization. This leads to the global formulation, for all $t \in [0, T]$,

$$\begin{cases} \left(\varepsilon\partial_t\mathbf{E}_h,\mathbf{v}\right)_K - \left(\mathbf{H}_h,\mathbf{curl}\,\mathbf{v}\right)_K + \left\langle\hat{\mathbf{H}}_h,\mathbf{n}\times\mathbf{v}\right\rangle_{\partial K} = 0,\ \forall\mathbf{v}\in\mathbf{V}_h, \\ \left(\mu\partial_t\mathbf{H}_h,\mathbf{v}\right)_K + \left(\mathbf{E}_h,\mathbf{curl}\,\mathbf{v}\right)_K - \left\langle\hat{\mathbf{E}}_h,\mathbf{n}\times\mathbf{v}\right\rangle_{\partial K} = 0,\ \forall\mathbf{v}\in\mathbf{V}_h. \end{cases} \tag{6}$$

It is straightforward to verify that $\mathbf{n}\times\mathbf{v} = \mathbf{n}\times\mathbf{v}^t$ and $\langle \mathbf{H}, \mathbf{n}\times\mathbf{v} \rangle = -\langle \mathbf{n}\times\mathbf{H}, \mathbf{v} \rangle$. Therefore, using numerical traces defined in terms of the tangential components $\hat{\mathbf{H}}_h^t$ and $\hat{\mathbf{E}}_h^t$, we can rewrite (6) as

$$\begin{cases} \left(\varepsilon\partial_t\mathbf{E}_h,\mathbf{v}\right)_K - \left(\mathbf{H}_h,\mathbf{curl}\,\mathbf{v}\right)_K + \left\langle\hat{\mathbf{H}}_h^t,\mathbf{n}\times\mathbf{v}\right\rangle_{\partial K} = 0,\ \forall\mathbf{v}\in\mathbf{V}_h, \\ \left(\mu\partial_t\mathbf{H}_h,\mathbf{v}\right)_K + \left(\mathbf{E}_h,\mathbf{curl}\,\mathbf{v}\right)_K - \left\langle\hat{\mathbf{E}}_h^t,\mathbf{n}\times\mathbf{v}\right\rangle_{\partial K} = 0,\ \forall\mathbf{v}\in\mathbf{V}_h. \end{cases} \tag{7}$$

The hybrid variable $\boldsymbol{\Lambda}_h$ introduced in the setting of a HDG method [4] is here defined for all the interfaces of $\mathcal{F}_h$ as

$$\boldsymbol{\Lambda}_h := \hat{\mathbf{H}}_h^t, \quad \forall F \in \mathcal{F}_h. \tag{8}$$

We want to determine the fields $\hat{\mathbf{H}}_h^t$ and $\hat{\mathbf{E}}_h^t$ in each element $K$ of $\mathcal{T}_h$ by solving system (7), assuming that $\boldsymbol{\Lambda}_h$ is known on all the faces of an element $K$. We consider a numerical trace $\hat{\mathbf{E}}_h^t$ for all $K$ given by

$$
\hat{\mathbf{E}}_h^t = \mathbf{E}_h^t + \tau_K\, \mathbf{n} \times (\boldsymbol{\Lambda}_h - \mathbf{H}_h^t) \text{ on } \partial K, \tag{9}
$$

where $\tau_K$ is a local stabilization parameter which is assumed to be strictly positive. We recall that $\mathbf{n} \times \mathbf{H}_h^t = \mathbf{n} \times \mathbf{H}_h$. The definitions of the hybrid variable (8) and numerical trace (9) are exactly those adopted in the context of the formulation of HDG methods for the 3D time-harmonic Maxwell equations [10–12, 14].

Following the HDG approach, when the hybrid variable $\boldsymbol{\Lambda}_h$ is known for all the faces of the element $K$, the electromagnetic field can be determined by solving the local system (7) using (8) and (9).

From now on, we denote by $\mathbf{g}^{\text{inc}}$ the $L^2$ projection of $\mathbf{g}^{\text{inc}}$ onto $\mathbf{M}_h$. Summing the contributions of (7) over all the elements and enforcing the continuity of the tangential component of $\hat{\mathbf{E}}_h$, we can formulate the problem: find $(\mathbf{E}_h, \mathbf{H}_h, \boldsymbol{\Lambda}_h) \in \mathbf{V}_h \times \mathbf{V}_h \times \mathbf{M}_h$ such that for all $t \in [0, T]$

$$\begin{aligned} \left(\varepsilon \partial_t \mathbf{E}_h, \mathbf{v}\right)_{\mathcal{T}_h} - \left(\mathbf{H}_h, \mathbf{curl}\,\mathbf{v}\right)_{\mathcal{T}_h} + \left\langle \boldsymbol{\Lambda}_h, \mathbf{n} \times \mathbf{v} \right\rangle_{\partial\mathcal{T}_h} &= 0, \ \forall \mathbf{v} \in \mathbf{V}_h, \\ \left(\mu \partial_t \mathbf{H}_h, \mathbf{v}\right)_{\mathcal{T}_h} + \left(\mathbf{E}_h, \mathbf{curl}\,\mathbf{v}\right)_{\mathcal{T}_h} - \left\langle \hat{\mathbf{E}}_h^t, \mathbf{n} \times \mathbf{v} \right\rangle_{\partial\mathcal{T}_h} &= 0, \ \forall \mathbf{v} \in \mathbf{V}_h, \\ \left\langle \left\llbracket \hat{\mathbf{E}}_h \right\rrbracket, \boldsymbol{\eta} \right\rangle_{\mathcal{F}_h} - \left\langle \boldsymbol{\Lambda}_h, \boldsymbol{\eta} \right\rangle_{\Gamma_a} - \left\langle \mathbf{g}^{\text{inc}}, \boldsymbol{\eta} \right\rangle_{\Gamma_a} &= 0, \ \forall \boldsymbol{\eta} \in \mathbf{M}_h, \end{aligned} \tag{10}$$

where the last equation is called the conservativity condition, with which we ask the tangential component of $\hat{\mathbf{E}}_h$ to be weakly continuous across any interface between two neighboring elements.

We now reformulate the system with numerical fluxes. We can deduce from the third equation of (10) that

$$\boldsymbol{\Lambda}_h = \begin{cases} \dfrac{1}{\tau_{K^+} + \tau_{K^-}} \left( 2\left\{ \tau_K \mathbf{H}_h^t \right\}_F + \left\llbracket \mathbf{E}_h^t \right\rrbracket_F \right), & \text{if } F \in \mathcal{F}_h^I, \\[2mm] \dfrac{1}{\tau_K}\,\mathbf{n} \times \mathbf{E}_h^t + \mathbf{H}_h^t, & \text{if } F \in \mathcal{F}_h \cap \Gamma_m, \\[2mm] \dfrac{1}{\tau_K + 1} \left( \tau_K \mathbf{H}_h^t + \mathbf{n} \times \mathbf{E}_h^t - \mathbf{g}^{\text{inc}} \right), & \text{if } F \in \mathcal{F}_h \cap \Gamma_a. \end{cases} \tag{11}$$
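The interior-face branch of (11) can be sketched pointwise for a single face; the pointwise fields and scalar stabilization parameters below are illustrative assumptions (in the actual method the relation holds for the polynomial traces on the face):

```python
import numpy as np

def hybrid_trace_interior(Ep, Em, Hp, Hm, n_p, tau_p, tau_m):
    """Hybrid variable Lambda on an interior face, following the first
    branch of (11): Lambda = (2 {tau H^t}_F + [[E^t]]_F) / (tau+ + tau-).
    n_p is the unit outward normal of K+; K- has normal -n_p. Sketch only."""
    n_m = -n_p

    def tang(v, n):                      # tangential part: -n x (n x v)
        return -np.cross(n, np.cross(n, v))

    mean_tauH = 0.5 * (tau_p * tang(Hp, n_p) + tau_m * tang(Hm, n_m))
    jump_E = np.cross(n_p, tang(Ep, n_p)) + np.cross(n_m, tang(Em, n_m))
    return (2.0 * mean_tauH + jump_E) / (tau_p + tau_m)
```

A quick sanity check: for fields that are continuous across the face and equal stabilization parameters, the jump of $\mathbf{E}$ vanishes and $\boldsymbol{\Lambda}$ reduces to the tangential component of $\mathbf{H}$, consistent with (8).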

By replacing (11) in (9) we obtain $\hat{\mathbf{E}}_h^t = \hat{\mathbf{E}}_h^{t,+} = \hat{\mathbf{E}}_h^{t,-}$ with

$$
\hat{\mathbf{E}}_h^t = \begin{cases} \dfrac{\tau_{K^+}\tau_{K^-}}{\tau_{K^+} + \tau_{K^-}} \left( 2\left\{ \dfrac{1}{\tau_K}\mathbf{E}_h^t \right\}_F - \left\llbracket \mathbf{H}_h^t \right\rrbracket_F \right), & \text{if } F \in \mathcal{F}_h^I, \\[2mm] 0, & \text{if } F \in \mathcal{F}_h \cap \Gamma_m, \\[2mm] \dfrac{1}{\tau_K + 1} \left( \mathbf{E}_h^t - \tau_K \mathbf{n} \times \mathbf{H}_h^t - \tau_K \mathbf{n} \times \mathbf{g}^{\text{inc}} \right), & \text{if } F \in \mathcal{F}_h \cap \Gamma_a. \end{cases} \tag{12}
$$

Thus, the numerical traces (8) and (9) have been reformulated from the conservativity condition. This means that the conservativity condition is now included in the new formulation of the numerical fluxes and can be omitted from the global system of equations. Hence, the local system (7) takes the form of a classical DG formulation, $\forall\mathbf{v} \in \mathbf{V}_h$

$$\begin{cases} \left(\varepsilon \partial_t \mathbf{E}_h, \mathbf{v}\right)_K - \left(\mathbf{H}_h, \mathbf{curl}\,\mathbf{v}\right)_K + \left\langle \hat{\mathbf{H}}_h^t, \mathbf{n} \times \mathbf{v} \right\rangle_{\partial K} = 0, \\ \left(\mu \partial_t \mathbf{H}_h, \mathbf{v}\right)_K + \left(\mathbf{E}_h, \mathbf{curl}\,\mathbf{v}\right)_K - \left\langle \hat{\mathbf{E}}_h^t, \mathbf{n} \times \mathbf{v} \right\rangle_{\partial K} = 0, \end{cases} \tag{13}$$

where the numerical fluxes are defined by (11) and (12).

*Remark 3* Let $Y_K = \sqrt{\varepsilon_K}/\sqrt{\mu_K}$ be the local admittance associated with cell $K$ and $Z_K = 1/Y_K$ the corresponding local impedance. If we set $\tau_K = Z_K$ in (11) and $1/\tau_K = Y_K$ in (12), the obtained numerical traces coincide with those adopted in the classical upwind flux DGTD method [6].

#### **4 Numerical Results**

In order to validate and study the numerical convergence of the proposed HDG method, we consider the propagation of an eigenmode in a closed cavity ($\Omega$ is the unit cube) with perfectly metallic walls. The frequency of the wave is $f = \sqrt{3}/\sqrt{2}\, c_0$, where $c_0$ is the speed of light in vacuum. The electric permittivity and the magnetic permeability are set to the constant vacuum values. The exact time-domain solution is given in [6].

We start our study by assuming that the penalization parameter $\tau$ is equal to 1. In order to ensure the stability of the method, numerical CFL conditions are determined for each value of the interpolation order $p_K$. In our particular case, $\varepsilon_K$ and $\mu_K$ are constant and equal to 1 for all $K \in \mathcal{T}_h$, so we have verified that, as stated in Remark 3, for $\tau = 1$ the values of the CFL number correspond to those of the classical upwind flux-based DG method. In Table 1 we summarize the maximum $\Delta t$ obtained numerically to ensure the stability of the scheme.

Given these values of $\Delta t_{\max}$, the $L^2$-norm of the error is calculated for a uniform tetrahedral mesh with 3072 elements, constructed from a finite difference grid with $n_x = n_y = n_z = 9$ points, each cell of this grid yielding six tetrahedra. The wave is propagated in the cavity during a physical time $t_{\max}$ corresponding to 8 periods (as shown in Fig. 1). Figure 2 depicts a comparison of


**Table 1** Numerically obtained values of $\Delta t\_{\max}$

the time evolution of the $L^2$-norm of the error between the solution obtained with the HDG method and a classical upwind flux-based DG method for $p\_K = 4$. Optimal convergence of order $p\_K + 1$ is obtained, as shown in Fig. 3.

We now keep the same test case as previously and assess the behavior of the HDG method for various values of the penalization parameter $\tau$. We observe that, for any order of interpolation, for values of the parameter $\tau \neq 1$ and with $\Delta t$ fixed to the values given in Table 1, the electromagnetic energy increases in time. It is therefore necessary to decrease $\Delta t\_{\max}$ for each value of $\tau$ to ensure stability (see Table 2 and Fig. 4). For this example, the optimal cost is obtained for $\tau = 1$ (with the same cost as an upwind flux DG method); any other choice requires more time to complete the simulation. In Fig. 5, we show the time evolution of the $L^2$-error for several values of $\tau$, using the maximal stable time step for the considered parameters. In addition, Table 3 summarizes the numerical results in terms of maximum $L^2$-errors and convergence rates. It appears that the order of convergence is not affected when the stabilization parameter is varied away from 1 (with the associated CFL conditions).


**Table 2** Numerically obtained values of the CFL number as a function of the stabilization parameter *τ* for a P1 interpolation

**Fig. 4** Variation of $\Delta t\_{\max}$ as a function of $\tau$

**Table 3** Maximum $L^2$-errors and convergence orders


#### **5 Local Postprocessing**

We define here, following the ideas of the local postprocessing developed in [1], new approximations of the electric and magnetic fields and expect that both $\mathbf{E}\_h^{n\*}$ and $\mathbf{H}\_h^{n\*}$ converge with order $k+1$ in the $H\_{\mathrm{curl}}(\mathcal{T}\_h)$-norm, whereas $\mathbf{E}\_h^{n}$ and $\mathbf{H}\_h^{n}$ converge with order $k$ in this norm. To postprocess $\mathbf{E}\_h^{n\*}$, we first compute an approximation $(\mathbf{p}\_{1,h}^{n}, \mathbf{p}\_{2,h}^{n}) \in \mathbf{V}(K) \times \mathbf{V}(K)$ to the curl of $\mathbf{E}$, $\mathbf{p}\_1(t^n) = \nabla \times \mathbf{E}(t^n)$, and the curl of $\mathbf{H}$, $\mathbf{p}\_2(t^n) = \nabla \times \mathbf{H}(t^n)$, by locally solving the system below

$$(\mathbf{p}\_{1,h}^n, \mathbf{v})\_K = (\mathbf{E}\_h^n, \nabla \times \mathbf{v})\_K - \langle \hat{\mathbf{E}}\_h^{t,n}, \mathbf{n} \times \mathbf{v} \rangle\_{\partial K} \quad \forall \mathbf{v} \in \mathbf{V}(K)$$

and,

$$(\mathbf{p}\_{2,h}^n, \mathbf{v})\_K = (\mathbf{H}\_h^n, \nabla \times \mathbf{v})\_K - \langle \hat{\mathbf{H}}\_h^{t,n}, \mathbf{n} \times \mathbf{v} \rangle\_{\partial K} \quad \forall \mathbf{v} \in \mathbf{V}(K)$$

We then find $(\mathbf{E}\_h^{n\*}, \mathbf{H}\_h^{n\*}) \in [\mathcal{P}\_{k+1}(K)]^3 \times [\mathcal{P}\_{k+1}(K)]^3$ such that

$$\begin{cases} (\nabla \times \mathbf{E}\_h^{n\*}, \nabla \times \mathbf{W})\_K = (\mathbf{p}\_{h,1}^n, \nabla \times \mathbf{W})\_K, & \forall \mathbf{W} \in [\mathcal{P}\_{k+1}(K)]^3, \\\\ (\mathbf{E}\_h^{n\*}, \nabla Y)\_K = (\mathbf{E}\_h^n, \nabla Y)\_K & \forall Y \in \mathcal{P}\_{k+2}(K) \end{cases}$$

and,

$$\begin{cases} (\nabla \times \mathbf{H}\_h^{n\*}, \nabla \times \mathbf{W})\_K = (\mathbf{p}\_{h,2}^n, \nabla \times \mathbf{W})\_K, & \forall \mathbf{W} \in [\mathcal{P}\_{k+1}(K)]^3, \\\\ (\mathbf{H}\_h^{n\*}, \nabla Y)\_K = (\mathbf{H}\_h^n, \nabla Y)\_K & \forall Y \in \mathcal{P}\_{k+2}(K) \end{cases}$$

It is important to point out that we can compute $\mathbf{E}\_h^{n\*}$ and $\mathbf{H}\_h^{n\*}$ at any time step without advancing in time. Hence, the local postprocessing can be performed whenever higher accuracy is needed at particular time steps. Numerical results given in Table 4 show that a second-order convergence rate is obtained for the postprocessed solution.

#### **6 Conclusion**

In this paper we have presented an explicit HDG method to solve the system of Maxwell equations in 3D. The next step is to couple explicit and implicit HDG methods to treat the case of a locally refined mesh.



#### **References**



## **Entropy Conserving and Kinetic Energy Preserving Numerical Methods for the Euler Equations Using Summation-by-Parts Operators**

**Hendrik Ranocha**

#### **1 Introduction**

Considering the solution of hyperbolic conservation laws, high order methods can be very efficient, providing accurate numerical solutions with relatively low computational effort [21]. In order to make use of this accuracy, stability has to be established. Mimicking estimates obtained on the continuous level via integration-by-parts, summation-by-parts (SBP) operators [22, 37] can be used. In short, SBP operators are discrete derivative operators equipped with a compatible quadrature providing a discrete analogue of the *L*<sup>2</sup> norm. The compatibility of discrete integration and differentiation mimics integration-by-parts on a discrete level. Combined with the weak enforcement of boundary conditions via simultaneous approximation terms (SATs) [1], highly efficient and stable semidiscretisations can be obtained at least for linear problems, see e.g. [6, 14, 39] and references cited therein.

In recent years, there has been an enduring and increasing interest in the basic ideas of SBP operators and their application in various frameworks including finite volume (FV) [25, 26], discontinuous Galerkin (DG) [2, 4, 10, 11, 13, 20, 27, 28, 30], and the recent flux reconstruction/correction procedure via reconstruction framework [15, 16, 42] as described in [31, 32]. While there is only a limited amount of well-posedness theory for nonlinear conservation laws, mimicking properties such as entropy stability semidiscretely has received much interest. Building on the seminal work of Tadmor [40, 41], entropy stability of second order schemes using symmetric numerical fluxes has been investigated, resulting in well-defined properties that numerical fluxes have to satisfy in order to result in entropy conservative

H. Ranocha (-)

TU Braunschweig, Institute Computational Mathematics, Braunschweig, Germany e-mail: h.ranocha@tu-bs.de

<sup>©</sup> The Author(s) 2020

S. J. Sherwin et al. (eds.), *Spectral and High Order Methods for Partial Differential Equations ICOSAHOM 2018*, Lecture Notes in Computational Science and Engineering 134, https://doi.org/10.1007/978-3-030-39647-3\_42

schemes. Decomposing general semidiscretisations into a non-dissipative central part and an additional dissipative part, suitable artificial dissipation or filtering can be added afterwards, cf. [7, 9, 38]. Second order methods based on symmetric numerical fluxes can be extended to high order in a conservative way, cf. [4, 7, 28] and [8, 23, 34–36].

Another property of numerical methods for the Euler equations that has received much interest in the literature concerns the kinetic energy. A structural property of numerical fluxes described by Jameson [18] has been used to construct so-called kinetic energy preserving (KEP) numerical fluxes, inter alia by Chandrashekar [3]. However, schemes using these fluxes do not preserve the kinetic energy as expected in numerical experiments by Gassner et al. [12]. They had to change the discretisation of the pressure to reduce undesired changes of the kinetic energy. However, this resulted in a loss of entropy conservation. Motivated by these results, some analytical insights into this behaviour have been developed in [29, Section 7.4] and will be presented here.

This chapter is structured as follows. At first, some basic results about SBP operators and corresponding semidiscretisations of hyperbolic conservation laws are reviewed in Sect. 2. Afterwards, the Euler equations are considered in Sect. 3. After demonstrating that the property that has been used to characterise numerical fluxes as KEP is not well-defined, the new concept of KEP numerical methods is introduced. Moreover, a numerical flux that is both entropy conservative and kinetic energy preserving in the new sense is developed. Thereafter, results of a numerical experiment comparing entropy conservative numerical fluxes are described in Sect. 4. Finally, a brief summary is given in Sect. 5.

#### **2 Discretisations Using Summation-by-Parts Operators**

Consider the Euler equations in two space dimensions

$$\underbrace{\partial\_{t}\begin{pmatrix}\rho\\\rho v\_{x}\\\rho v\_{y}\\\rho e\end{pmatrix}}\_{=\,u}+\underbrace{\partial\_{x}\begin{pmatrix}\rho v\_{x}\\\rho v\_{x}^{2}+p\\\rho v\_{x}v\_{y}\\(\rho e+p)v\_{x}\end{pmatrix}}\_{=\,f^{x}(u)}+\underbrace{\partial\_{y}\begin{pmatrix}\rho v\_{y}\\\rho v\_{x}v\_{y}\\\rho v\_{y}^{2}+p\\(\rho e+p)v\_{y}\end{pmatrix}}\_{=\,f^{y}(u)}=0,\tag{1}$$

where $\rho$ is the density, $v$ the velocity, $e$ the specific total energy, and $p$ the pressure. For a perfect gas, $p = (\gamma - 1)\left(\rho e - \tfrac{1}{2}\rho v^2\right)$. The usual entropy is $U = -\frac{\rho s}{\gamma - 1}$, where $s = \log p - \gamma \log \rho$ is the specific (physical) entropy.
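In code, the thermodynamic relations above read (a minimal sketch with hypothetical state values):

```python
import math

def primitive_entropy(rho, vx, vy, rho_e, gamma=1.4):
    """Pressure, specific physical entropy s = log(p) - gamma*log(rho), and the
    mathematical entropy U = -rho*s/(gamma - 1) for a perfect gas."""
    p = (gamma - 1.0) * (rho_e - 0.5 * rho * (vx**2 + vy**2))
    s = math.log(p) - gamma * math.log(rho)
    U = -rho * s / (gamma - 1.0)
    return p, s, U

# hypothetical conserved state (rho, rho*vx, rho*vy, rho*e) with rho = 1
p, s, U = primitive_entropy(rho=1.0, vx=0.1, vy=-0.2, rho_e=2.5)
```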

With the entropy fluxes $F^j$ fulfilling $\partial\_u U \cdot \partial\_u f^j = \partial\_u F^j$, smooth solutions of the Euler equations in $d$ space dimensions satisfy $\partial\_t U(u) + \sum\_{j=1}^{d} \partial\_j F^j(u) = 0$, and the entropy inequality

$$
\partial\_t U(u) + \sum\_{j=1}^d \partial\_j F^j(u) \le 0 \tag{2}
$$

is used as an additional admissibility criterion for weak solutions, cf. [5].

In order to discretise (1), the domain $\Omega$ is divided into several non-overlapping sub-domains $\Omega\_l \subseteq \Omega$, and SBP operators are used on each element. SBP operators consist of discrete derivative operators $D\_j$, approximating the partial derivative in direction $j$, and a symmetric and positive definite mass/norm matrix $M$, approximating the $L^2(\Omega\_l)$ scalar product via $u^T M v = \langle u, v \rangle\_M \approx \langle u, v \rangle\_{L^2(\Omega\_l)} = \int\_{\Omega\_l} u\, v$. Moreover, an interpolation operator $R$ approximates the restriction of functions on $\Omega\_l$ to the boundary $\partial \Omega\_l$, and a symmetric and positive definite boundary mass matrix $B$ approximates the $L^2(\partial \Omega\_l)$ scalar product. Representing the multiplication by the $j$-th component of the outer unit normal $\nu$ at $\partial \Omega\_l$ by the diagonal matrix $n\_j$, the SBP property

$$\boldsymbol{M}\boldsymbol{D}\_{\boldsymbol{j}} + \boldsymbol{D}\_{\boldsymbol{j}}^{T}\boldsymbol{M} = \boldsymbol{R}^{T}\boldsymbol{B}\boldsymbol{n}\_{\boldsymbol{j}}\boldsymbol{R} \tag{3}$$

has to be satisfied in order to mimic integration-by-parts discretely via

$$\underbrace{u^{T} M D\_{j} v + u^{T} D\_{j}^{T} M v}\_{\approx \int\_{\Omega\_l} u\,(\partial\_j v) + \int\_{\Omega\_l} (\partial\_j u)\, v} = \underbrace{u^{T} R^{T} B n\_{j} R v}\_{\approx \int\_{\partial\Omega\_l} u\, v\, \nu\_j}.\tag{4}$$
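As a concrete illustration of the SBP property (3) (a minimal sketch, not the operators used in this chapter), the classical second-order SBP finite-difference operator in one dimension satisfies $M D + D^T M = R^T B n R$ with $R$ restricting to the two boundary nodes, $B = I$, and $n = \operatorname{diag}(-1, 1)$:

```python
import numpy as np

def sbp_2nd_order(n):
    """Classical second-order SBP finite-difference operator on [0, 1] with n+1 nodes.
    Returns the derivative matrix D and the diagonal norm (mass) matrix M."""
    dx = 1.0 / n
    M = dx * np.diag([0.5] + [1.0] * (n - 1) + [0.5])
    D = np.zeros((n + 1, n + 1))
    D[0, 0], D[0, 1] = -1.0, 1.0            # one-sided difference at the left boundary
    D[-1, -2], D[-1, -1] = -1.0, 1.0        # one-sided difference at the right boundary
    for i in range(1, n):                   # central differences in the interior
        D[i, i - 1], D[i, i + 1] = -0.5, 0.5
    D /= dx
    return D, M

D, M = sbp_2nd_order(8)
# M D + D^T M equals diag(-1, 0, ..., 0, 1), so the discrete integration-by-parts
# identity (4) holds exactly for arbitrary grid functions u and v.
boundary = M @ D + D.T @ M
```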

The semidiscretisation of (1) is constructed as follows. Each sub-domain $\Omega\_l \subseteq \Omega$ is mapped onto a reference element and all computations are performed there. On each element, the resulting semidiscretisation is of the form

$$
\partial\_t u + \text{VOL} + \text{SURF} = 0,\tag{5}
$$

where the volume terms VOL discretise the flux divergence in the interior of $\Omega\_l$ and the surface terms SURF couple elements or impose boundary conditions. Here, $u$ is the vector of the nodal values of the numerical solution at specified nodes $\xi\_i$ in $\Omega\_l$, and a collocation approach is used. Thus, nonlinear operations are performed pointwise and the discrete fluxes $f^j$ are given by their nodal values $f^j\_i = f^j(u\_i) = f^j(u(\xi\_i))$. As in (nodal) discontinuous Galerkin methods, the surface terms are built using numerical fluxes $f^{\text{num},j}$ in the $j$-th coordinate direction as

$$\text{SURF} = \sum\_{j=1}^{d} M^{-1} R^T B n\_j \left( f^{\text{num},j} - R f^j \right). \tag{6}$$

Finally, the volume terms are constructed using symmetric (two-point) numerical fluxes $f^{\text{vol},j}$ (volume fluxes) that are consistent with $f^j$ as

$$\text{VOL}\_{i} = \sum\_{j=1}^{d} \sum\_{k} 2(D\_{j})\_{i,k}\, f^{\text{vol},j}(u\_{i}, u\_{k}), \tag{7}$$

where $\text{VOL}\_i$ is the volume term at $\xi\_i$ [7]. If the $f^{\text{vol},j}$ are smooth fluxes, the discretisation (7) is of the same order of accuracy as the derivative matrices $D\_j$ [4, 28]. Moreover, if the mass matrix $M$ is diagonal, this approximation can be written in a conservative form [7]. Finally, if the boundary operators $R^T B n\_j R$ are also diagonal and the $f^{\text{vol},j}$ are entropy conservative in the sense of Tadmor [40, 41], the semidiscretisation (5) is entropy conservative/stable across elements if the numerical surface fluxes $f^{\text{num},j}$ are entropy conservative/stable. Moreover, some results on the kinetic energy can be transferred as well [12]. In the following, the focus will lie on the fluxes $f^{\text{vol},j}$.
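The flux-differencing form (7) can be sketched in one dimension as follows. This is a toy setup, not the Euler setting of this chapter: a periodic grid with central differences (for which the boundary terms of (3) vanish and $M = \Delta x\, I$), applied to Burgers' equation $f(u) = u^2/2$ with its entropy conservative two-point flux $f^{\text{vol}}(a, b) = (a^2 + ab + b^2)/6$:

```python
import numpy as np

n = 64
x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
dx = x[1] - x[0]
D = np.zeros((n, n))
for i in range(n):                           # periodic central difference matrix
    D[i, (i + 1) % n] = 0.5 / dx
    D[i, (i - 1) % n] = -0.5 / dx

def f_vol(a, b):
    """Entropy conservative two-point flux for Burgers' equation, f(u) = u^2/2."""
    return (a * a + a * b + b * b) / 6.0

u = np.sin(x)
A, B = np.meshgrid(u, u, indexing="ij")      # A[i, k] = u_i, B[i, k] = u_k
VOL = np.sum(2.0 * D * f_vol(A, B), axis=1)  # VOL_i as in Eq. (7), d = 1

# VOL approximates d/dx (u^2/2) = sin(x) cos(x); with the symmetric entropy
# conservative flux the discrete entropy production u^T M VOL vanishes exactly.
err = np.max(np.abs(VOL - np.sin(x) * np.cos(x)))
entropy_rate = np.dot(u, VOL) * dx
```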

#### **3 Euler Equations and Kinetic Energy**

The kinetic energy $E\_{\text{kin}} = \frac{1}{2}\rho v^2$ fulfils (for sufficiently smooth solutions)

$$
\partial\_t E\_{\rm kin} + \operatorname{div} \left( \frac{1}{2} \rho v^2 v \right) + v \cdot \operatorname{grad} \, p = 0. \tag{8}
$$

Jameson [18] investigated the kinetic energy in a one-dimensional semidiscrete setting using finite volume methods. To simplify the notation, this setup will be used in the following; its extension to multiple dimensions is straightforward. Jameson proposed to mimic (8) semidiscretely by using numerical momentum fluxes of the form $f^{\text{num}}\_{\rho v} = f^{\text{num}}\_{\rho v}(u\_-, u\_+) = \{\!\{v\}\!\} f^{\text{num}}\_{\rho} + p^{\text{num}}$, where $\{\!\{v\}\!\}$ is the arithmetic mean of $v\_-$ and $v\_+$, $f^{\text{num}}\_{\rho}$ is the numerical density flux, and $p^{\text{num}}$ is a consistent numerical approximation of the pressure. Later, this has been used as a kind of "definition" of kinetic energy preserving (KEP) numerical fluxes, e.g. in [3, 12]. However, this is not a well-defined concept, cf. [28, 29]. Indeed, every numerical momentum flux can be written as

$$f\_{\rho v}^{\text{num}} = \{\!\{v\}\!\} f\_{\rho}^{\text{num}} + \underbrace{\left(f\_{\rho v}^{\text{num}} - \{\!\{v\}\!\} f\_{\rho}^{\text{num}}\right)}\_{=:\, p^{\text{num}}?}\,.\tag{9}$$

Since the numerical fluxes are consistent, $p^{\text{num}} := f^{\text{num}}\_{\rho v} - \{\!\{v\}\!\} f^{\text{num}}\_{\rho}$ is a consistent approximation of the pressure. The insufficiency of the condition $f^{\text{num}}\_{\rho v} = \{\!\{v\}\!\} f^{\text{num}}\_{\rho} + p^{\text{num}}$ is in accordance with observations of Gassner et al. [12]. They investigated a Taylor-Green vortex problem and compared several numerical fluxes for the Euler equations. There, numerical fluxes of the form $f^{\text{num}}\_{\rho v} = \{\!\{v\}\!\} f^{\text{num}}\_{\rho} + p^{\text{num}}$ with $p^{\text{num}} \neq \{\!\{p\}\!\}$ resulted in a clear loss of kinetic energy compared to other KEP fluxes using the arithmetic average $p^{\text{num}} = \{\!\{p\}\!\}$ as approximation of the pressure. They observed that "the discretisation of the pressure plays a crucial role for the kinetic energy" and that the choice of the arithmetic average $p^{\text{num}} = \{\!\{p\}\!\}$ "seems to be important for the kinetic energy equation" [12, Section 4.2]. However, they had no (theoretical) explanation for this observation.
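The decomposition (9) can be checked directly in code (a sketch with hypothetical states, using the plain central fluxes as an example): any consistent momentum flux induces a consistent $p^{\text{num}}$, so the split alone does not single out a flux.

```python
def avg(a, b):
    """Arithmetic mean {{.}} of a left and a right value."""
    return 0.5 * (a + b)

def p_num_induced(rho_l, v_l, p_l, rho_r, v_r, p_r):
    """Pressure approximation induced by Eq. (9) for the central fluxes
    f_rho = {{rho v}} and f_rhov = {{rho v^2 + p}} (an illustrative choice)."""
    f_rho = avg(rho_l * v_l, rho_r * v_r)
    f_rhov = avg(rho_l * v_l**2 + p_l, rho_r * v_r**2 + p_r)
    return f_rhov - avg(v_l, v_r) * f_rho

# Consistency: for identical left/right states, p_num reduces to the pressure itself,
p_same = p_num_induced(1.2, 0.3, 1.0, 1.2, 0.3, 1.0)
# but for different states it generally differs from the arithmetic mean {{p}} = 1.5.
p_diff = p_num_induced(1.0, 0.5, 1.0, 2.0, -0.5, 2.0)
```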

#### *3.1 New Approach to Kinetic Energy Preservation*

By a heuristic argument, the balance law (8) may not be suitable in the incompressible limit: Indeed, for smooth solutions, (8) can be rewritten as

$$
\partial\_t E\_{\rm kin} + \operatorname{div} \left( \frac{1}{2} \rho v^2 v + p v \right) - p \operatorname{div} v = 0,\tag{10}
$$

which becomes a conservation law for smooth solutions of the incompressible Euler equations due to $\operatorname{div}(v) = 0$, or an energy inequality similar to the entropy inequality (2). Since the kinetic energy plays a crucial role in the incompressible limit [24], the second form (10) might be considered the "better" one. Thus, a semidiscretisation mimicking this equation might be desirable near the incompressible limit.

**Definition 1** A numerical flux $f^{\text{num}} = (f^{\text{num}}\_{\rho}, f^{\text{num}}\_{\rho v}, f^{\text{num}}\_{\rho e})$ for the Euler equations is called *kinetic energy preserving* (KEP), if the momentum flux can be written as $f^{\text{num}}\_{\rho v} = \{\!\{v\}\!\} f^{\text{num}}\_{\rho} + \{\!\{p\}\!\}$.

Definition 1 results in a well-defined concept of KEP numerical fluxes.

**Theorem 1 (Corollary 7.5 of [29])** *If a kinetic energy preserving numerical flux is used in a semidiscrete FV method, the resulting semidiscrete kinetic energy equation mimics both the conservative and the non-conservative terms of Eq.*(10)*.*

*Proof (Sketch)* Using the chain rule in a one-dimensional finite volume setting, the time derivative of the kinetic energy in cell $i$ becomes

$$\begin{split} \partial\_{t} \left( \frac{1}{2} \rho v^{2} \right)\_{i} &= -\frac{1}{\Delta x\_{i}} \left( \left( \frac{1}{2} \rho v^{2} v + p v \right)^{\text{num}} (u\_{i}, u\_{i+1}) - \left( \frac{1}{2} \rho v^{2} v + p v \right)^{\text{num}} (u\_{i-1}, u\_{i}) \right) \\ &\quad + p\_{i} \frac{\{\!\{v\}\!\}\_{i, i+1} - \{\!\{v\}\!\}\_{i-1, i}}{\Delta x\_{i}}, \end{split}$$

where $\left( \frac{1}{2} \rho v^2 v + p v \right)^{\text{num}}(u\_i, u\_j) = \frac{v\_i v\_j}{2}\, f^{\text{num}}\_{\rho}(u\_i, u\_j) + \frac{p\_i v\_j + p\_j v\_i}{2}$. $\square$

Using the momentum flux $f^{\text{num}}\_{\rho v} = \{\!\{v\}\!\} f^{\text{num}}\_{\rho} + \{\!\{p\}\!\}$ in the volume terms (7) in one dimension, the arithmetic average of the pressure yields the volume term $D p$, i.e. a straightforward discretisation of $\partial\_x p$. Analogous results hold in multiple space dimensions, cf. Sect. 2.

The kinetic energy preserving DG methods presented in [11, 27] use volume terms corresponding to the numerical fluxes $f^{\text{num}}\_{\rho} = \{\!\{\rho v\}\!\}$, $f^{\text{num}}\_{\rho v} = \{\!\{\rho v\}\!\}\{\!\{v\}\!\} + \{\!\{p\}\!\}$, which are kinetic energy preserving in the sense of Definition 1.

#### *3.2 Entropy Conservative and KEP Numerical Fluxes*

Since entropy stability has received much interest and the entropy conservative numerical fluxes of [3, 17] are not KEP in the sense of Definition 1, it is interesting whether both concepts can be fulfilled simultaneously. The logarithmic mean $\{\!\{\rho\}\!\}\_{\log} = [\![\rho]\!] / [\![\log \rho]\!]$ has been proposed by Roe [33] in the context of entropy conservative numerical fluxes and is described in [17]. Many useful entropy conservative numerical density fluxes are of the form $f^{\text{num}}\_{\rho} = \{\!\{\rho\}\!\}\_{\log} \{\!\{v\}\!\}$, e.g. the one presented in [3]. This form seems to be preferable, since positivity preservation of the density can be achieved using local Lax-Friedrichs/Rusanov dissipation operators [28, Section 6.2]. Using this ansatz for $f^{\text{num}}\_{\rho}$ and Definition 1, the following entropy conservative and kinetic energy preserving numerical flux ($f^{\text{num},y}$ analogously) has been constructed in [29, Section 7.4]
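Evaluating the logarithmic mean naively is ill-conditioned for nearly equal arguments; a numerically stable evaluation (a sketch following the series expansion suggested by Ismail and Roe [17]) is:

```python
import math

def ln_mean(a, b):
    """Numerically stable logarithmic mean (a - b) / (log a - log b) for a, b > 0.
    Near a = b, log(zeta) / (2 f) is replaced by its truncated series in u = f^2."""
    zeta = a / b
    f = (zeta - 1.0) / (zeta + 1.0)
    u = f * f
    if u < 1.0e-2:
        F = 1.0 + u / 3.0 + u**2 / 5.0 + u**3 / 7.0
    else:
        F = math.log(zeta) / (2.0 * f)
    return 0.5 * (a + b) / F
```

The series branch avoids the 0/0 form at `a == b`, where the logarithmic mean tends to the common value.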

$$\begin{split} f\_{\rho}^{\text{num},x} &= \{\!\{\rho\}\!\}\_{\log} \{\!\{v\_x\}\!\}, \quad f\_{\rho v\_x}^{\text{num},x} = \{\!\{v\_x\}\!\} f\_{\rho}^{\text{num},x} + \{\!\{p\}\!\}, \quad f\_{\rho v\_y}^{\text{num},x} = \{\!\{v\_y\}\!\} f\_{\rho}^{\text{num},x}, \\\\ f\_{\rho e}^{\text{num},x} &= \left( \{\!\{\rho\}\!\}\_{\log} \left( \{\!\{v\_x\}\!\}^{2} + \{\!\{v\_y\}\!\}^{2} - \frac{\{\!\{v\_x^{2} + v\_y^{2}\}\!\}}{2} \right) + \frac{1}{\gamma - 1} \frac{\{\!\{\rho\}\!\}\_{\log}}{\{\!\{\rho/p\}\!\}\_{\log}} \right) \{\!\{v\_x\}\!\} + \{\!\{p\}\!\} \{\!\{v\_x\}\!\} \\\\ &\quad - \frac{[\![p]\!]\,[\![v\_x]\!]}{4}. \end{split} \tag{11}$$
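A sketch of the flux (11) in the $x$ direction (states given as primitive variables $(\rho, v\_x, v\_y, p)$), together with the entropy variables needed to check Tadmor's entropy conservation condition $[\![w]\!] \cdot f^{\text{num},x} = [\![\rho v\_x]\!]$; the helper names are illustrative, not from the paper:

```python
import numpy as np

def ln_mean(a, b):
    """Stable logarithmic mean (a - b) / (log a - log b)."""
    zeta = a / b
    f = (zeta - 1.0) / (zeta + 1.0)
    u = f * f
    F = 1.0 + u / 3.0 + u**2 / 5.0 + u**3 / 7.0 if u < 1.0e-2 else np.log(zeta) / (2.0 * f)
    return 0.5 * (a + b) / F

def flux_x(left, right, gamma=1.4):
    """Entropy conservative and KEP flux (11) in the x direction."""
    rho_l, vx_l, vy_l, p_l = left
    rho_r, vx_r, vy_r, p_r = right
    vx_a, vy_a, p_a = 0.5 * (vx_l + vx_r), 0.5 * (vy_l + vy_r), 0.5 * (p_l + p_r)
    rho_log = ln_mean(rho_l, rho_r)
    rho_p_log = ln_mean(rho_l / p_l, rho_r / p_r)
    # {{vx}}^2 + {{vy}}^2 - {{vx^2 + vy^2}}/2 simplifies to the product means:
    vel_sq = 0.5 * (vx_l * vx_r + vy_l * vy_r)
    f_rho = rho_log * vx_a
    f_rhovx = vx_a * f_rho + p_a                  # KEP form of Definition 1
    f_rhovy = vy_a * f_rho
    f_rhoe = (f_rho * (vel_sq + 1.0 / ((gamma - 1.0) * rho_p_log))
              + 0.5 * (p_l * vx_r + p_r * vx_l))  # = {{p}}{{vx}} - [[p]][[vx]]/4
    return np.array([f_rho, f_rhovx, f_rhovy, f_rhoe])

def entropy_vars(rho, vx, vy, p, gamma=1.4):
    """Entropy variables w = U'(u) for U = -rho*s/(gamma - 1)."""
    s = np.log(p) - gamma * np.log(rho)
    return np.array([(gamma - s) / (gamma - 1.0) - rho * (vx**2 + vy**2) / (2.0 * p),
                     rho * vx / p, rho * vy / p, -rho / p])
```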

#### **4 Numerical Results**

Since the kinetic energy is an important quantity for the incompressible Euler equations, a Taylor-Green vortex given by

$$\begin{aligned} \rho(t, x, y) &= 1, & v\_x(t, x, y) &= \sin(x)\cos(y), \\ v\_y(t, x, y) &= -\cos(x)\sin(y), & p(t, x, y) &= \frac{100}{\gamma} + \frac{\cos(2x) + \cos(2y)}{4}, \end{aligned} \tag{12}$$

for $(x, y) \in [0, 2\pi]^2$ with periodic boundary conditions is considered, which is a stationary solution of the incompressible Euler equations. Using tensor product Lobatto bases for polynomials of degree $p = 5$ on $N = 16$ elements per coordinate direction, the numerical solutions have been computed in the time interval $t \in [0, 30]$ with the fourth order, ten-stage, strong stability preserving Runge-Kutta method of [19]. The time step has been chosen as $\Delta t = \mathrm{cfl} \cdot \min \frac{\Delta x}{(2p + 1)\,\lambda}$, where $\lambda$ is the greatest absolute value of the eigenvalues of the flux Jacobian and the minimum is taken over all cells and nodes. As in [12], the given numerical fluxes have been used both for the volume terms (7) and as surface fluxes in (6), without additional dissipation.
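The time step selection can be sketched as follows (a minimal sketch with hypothetical nodal arrays; the largest wave speed is approximated by $\lambda = |v| + c$ with sound speed $c = \sqrt{\gamma p / \rho}$):

```python
import numpy as np

def cfl_time_step(rho, vx, vy, pres, dx, p_degree, cfl, gamma=1.4):
    """dt = cfl * min(dx / ((2p + 1) * lambda)) over all nodes."""
    c = np.sqrt(gamma * pres / rho)              # local speed of sound
    lam = np.sqrt(vx**2 + vy**2) + c             # largest local wave speed
    return cfl * np.min(dx / ((2 * p_degree + 1) * lam))

# e.g. the Taylor-Green state (12) sampled along the diagonal y = x at a few nodes:
x = np.linspace(0.0, 2.0 * np.pi, 16, endpoint=False)
rho = np.ones_like(x)
vx, vy = np.sin(x) * np.cos(x), -np.cos(x) * np.sin(x)
pres = 100.0 / 1.4 + (np.cos(2 * x) + np.cos(2 * x)) / 4.0
dt = cfl_time_step(rho, vx, vy, pres, dx=2.0 * np.pi / 16, p_degree=5, cfl=0.9)
```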

The evolution of the entropy $U$ and the kinetic energy $E\_{\text{kin}}$ using a CFL number $\mathrm{cfl} = 0.9$ for the entropy conservative fluxes of Ismail and Roe [17], Chandrashekar [3], and the new flux (11) is visualised in Fig. 1. As can be seen there, the entropy remains approximately constant and the kinetic energy oscillates uniformly until $t \approx 20$. Afterwards, the kinetic energy drops for the fluxes of [3, 17] and there is a relative change of the entropy of order $10^{-5}$. In contrast, there is no visible change for the new flux (11).

The entropy loss for the fluxes of Ismail and Roe [17] and Chandrashekar [3] is caused by the time integration scheme, as can be seen in Fig. 2, where the time step is reduced by an order of magnitude (cfl = 0*.*09). However, the behaviour of the kinetic energy is nearly unchanged.

**Fig. 1** Total entropy and kinetic energy of numerical solutions using different entropy conservative numerical fluxes with cfl = 0*.*9

**Fig. 2** Total entropy and kinetic energy of numerical solutions using different entropy conservative numerical fluxes with cfl = 0*.*09

#### **5 Summary and Discussion**

Using summation-by-parts operators, high order numerical schemes with specific properties can be constructed using symmetric (two-point) numerical fluxes. While several "kinetic energy preserving" methods have been proposed, they have been characterised by a property of the numerical fluxes that is not well-defined. Such numerical fluxes resulted in schemes that did not preserve the kinetic energy as expected [12]. Here, a new approach to kinetic energy preservation inspired by the incompressible Euler equations and developed in [29, Section 7.4] has been described. This results in a well-defined property that numerical fluxes have to satisfy in order to mimic the balance law for the kinetic energy more reliably. Moreover, new entropy conservative numerical fluxes have been developed that are kinetic energy preserving in the new sense.

#### **References**



## **Multiwavelet Troubled-Cell Indication: A Comparison of Utilizing Theory Versus Outlier Detection**

**Mathea J. Vuik**

#### **1 Introduction**

Solutions to nonlinear hyperbolic PDEs develop discontinuities in time. The generation of spurious oscillations in such regions can be prevented by applying a limiter in the troubled zones. In [16, 18], two different multiwavelet troubled-cell indicators were introduced, one based on a parameter, the other using outlier detection. We present this comparison in order to begin to understand in which regime these tools are effective. In this paper, we investigate the effectiveness of a different detection scheme, based on the theoretical detection of troubled cells using multiwavelet approaches. It uses the cancelation property [6] and the theory about thresholding [8]. This technique was originally used for a multiwavelet-based adaptive strategy in combination with the DG method. However, we are specifically interested in its application for troubled-cell indication. In the troubled cells, the moment limiter is applied [11]. We demonstrate the performance of this new indicator and show that it works very well when very fine meshes are used (the asymptotic regime). For coarser meshes, the existing multiwavelet troubled-cell indicators seem to perform better.

The outline of this paper is as follows: in Sect. 2, some background information about the multiwavelet theory is given. The existing multiwavelet troubled-cell indicators, as well as the cancelation property and the derived thresholding technique are described in Sect. 3. Numerical results are shown in Sect. 4, and some concluding remarks are given in Sect. 5.

M. J. Vuik (-)

VORtech, Delft, The Netherlands e-mail: thea.vuik@vortech.nl

<sup>©</sup> The Author(s) 2020

S. J. Sherwin et al. (eds.), *Spectral and High Order Methods for Partial Differential Equations ICOSAHOM 2018*, Lecture Notes in Computational Science and Engineering 134, https://doi.org/10.1007/978-3-030-39647-3\_43

#### **2 Multiwavelets and DG**

In this section, we consider the multiwavelet theory that is used to design the different troubled-cell indicators. For the sake of brevity, we omit discussion of the DG scheme [4, 5] that is used in the computations.

The relation between the DG scheme and multiwavelets was shown in [16]. Any global one-dimensional DG approximation of degree *k* can be written as

$$
u\_h(x) = 2^{-\frac{n}{2}} \sum\_{j=0}^{2^n - 1} \sum\_{\ell=0}^k u\_j^{(\ell)} \phi\_{\ell j}^n(x),
$$

where $\phi\_{\ell j}^n$ are the scaling functions related to the orthonormal Legendre polynomials. The corresponding multiwavelet decomposition is

$$
u\_h(x) = \sum\_{\ell=0}^k s\_{\ell 0}^0 \phi\_\ell(x) + \sum\_{m=0}^{n-1} \sum\_{j=0}^{2^m - 1} \sum\_{\ell=0}^k d\_{\ell j}^m \psi\_{\ell j}^m(x),
$$

where $s\_{\ell 0}^0$ are the scaling-function coefficients belonging to $u\_h$, and $d\_{\ell j}^m$ are the corresponding multiwavelet coefficients [2, 16]. The multiwavelets $\psi$ have been developed by Alpert [1].

#### **3 Utilizing Multiwavelet Coefficients for Troubled-Cell Indication**

In this section, we show different troubled-cell indicators that utilize multiwavelet coefficients. Note that, as the detectors are solely based on the underlying approximation space, the ideas do not need to be modified in order to be applied to other types of model problems than those included in this paper. First, the existing indicators that use either a parameter or the boxplot method are presented. Next, the cancelation property and thresholding technique are used to design a different indication technique.

#### *3.1 Boxplots for Outlier Detection*

In [16, 17], we have shown that the coefficients $d\_{kj}^{n-1}$ are very useful for troubled-cell indication. With this knowledge, we have designed two different troubled-cell indicators. The first indicator is the so-called *parameter-based* multiwavelet troubled-cell indicator [16]. Here, we detect an element as troubled when

$$|d\_{kj}^{n-1}| > C \cdot \max\{|d\_{kj}^{n-1}|, j = 0, \dots, 2^n - 1\}, \ C \in [0, 1]. \tag{1}$$

The value of *C* is a useful tool to prescribe the strictness of the limiter.

Another option is to use *outlier detection* on the multiwavelet coefficients $d\_{kj}^{n-1}$ to detect the troubled cells [18]. Here, Tukey's boxplot method [14] is applied locally to prevent the need for a problem-dependent parameter. The different steps are presented in Algorithm 1.


Outliers are the coefficients in the vector that stray far beyond the others. In order to pick out certain coefficients as outliers, the outer fences, originally defined by Tukey [14], are constructed. The outer fences of a vector are $[Q\_1 - 3(Q\_3 - Q\_1),\, Q\_3 + 3(Q\_3 - Q\_1)]$ (coefficients outside are called *extreme outliers*). The coverage for this whisker length is 99.9998%, such that only 0.0002% of the data in a normally distributed vector is detected as an extreme outlier (asymptotically) [9].

In our computations, we always use local vectors of length 16.
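Both indicators can be sketched as follows (a minimal sketch with a hypothetical coefficient vector of length 16; note that NumPy's default quartile interpolation may differ slightly from Tukey's hinges):

```python
import numpy as np

def parameter_indicator(d, C):
    """Flag cells by Eq. (1): |d_j| > C * max_j |d_j|, with C in [0, 1]."""
    return np.abs(d) > C * np.max(np.abs(d))

def boxplot_indicator(d):
    """Flag extreme outliers outside Tukey's outer fences
    [Q1 - 3*(Q3 - Q1), Q3 + 3*(Q3 - Q1)]."""
    q1, q3 = np.percentile(d, [25, 75])
    iqr = q3 - q1
    return (d < q1 - 3.0 * iqr) | (d > q3 + 3.0 * iqr)

# hypothetical local vector of multiwavelet coefficients with one large entry
d = np.array([0.01, -0.02, 0.015, 0.0, 0.01, -0.01, 0.02, -0.015,
              0.005, -0.005, 0.01, 5.0, -0.01, 0.015, 0.0, 0.01])
flag_param = parameter_indicator(d, C=0.1)
flag_box = boxplot_indicator(d)
```

The boxplot variant needs no user-chosen parameter, which is exactly its advantage over Eq. (1).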

#### *3.2 Cancelation Property*

In this section, the *cancelation property* is stated and proved for the one-dimensional case [6]. Here, we assume that the multiwavelets have $M + 1$ vanishing moments. In our case, we have $M = k$ [1, 15]. If the solution satisfies the continuity requirement $u|\_{I\_j^m} \in C^{M+1}(I\_j^m)$ (where $I\_j^m$ is the $j$-th element in level $m$), then

$$d\_{\ell j}^m \le \frac{1}{(M+1)!} \cdot ||u^{(M+1)}||\_{L^\infty(I\_j^m)} \cdot 2^{(-m+1)(M+3/2)},\tag{2}$$

$m = 0, \ldots, n$, $j = 0, \ldots, 2^m - 1$, $\ell = 0, \ldots, k$.

The proof uses a Taylor expansion of $u$ about the element center $x\_j^m$: there exists a $\xi$ between $x$ and $x\_j^m$ such that

$$u(x) = u(x_j^m) + u'(x_j^m)(x - x_j^m) + \dots + \frac{u^{(M)}(x_j^m)}{M!}(x - x_j^m)^M + \frac{u^{(M+1)}(\xi)}{(M+1)!}(x - x_j^m)^{M+1}.$$

Using that the first *M* + 1 moments of the multiwavelets vanish, we find

$$d_{\ell j}^{m} = \langle u, \psi_{\ell j}^{m} \rangle_{I_{j}^{m}} = \left\langle \frac{u^{(M+1)}(\xi)}{(M+1)!} (x - x_{j}^{m})^{M+1}, \psi_{\ell j}^{m} \right\rangle_{I_{j}^{m}}$$

$$\leq \frac{1}{(M+1)!} \|u^{(M+1)}\|_{L^{\infty}(I_{j}^{m})} \langle (x - x_{j}^{m})^{M+1}, \psi_{\ell j}^{m} \rangle_{I_{j}^{m}}. \tag{3}$$

Next, we use the Cauchy-Schwarz inequality to find

$$\langle (x - x_j^m)^{M+1}, \psi_{\ell j}^m \rangle_{I_j^m} \le \|(x - x_j^m)^{M+1}\|_{L^2(I_j^m)} \cdot \|\psi_{\ell j}^m\|_{L^2(I_j^m)} = \|(x - x_j^m)^{M+1}\|_{L^2(I_j^m)},$$

because the multiwavelets are orthonormal. Using the notation $\Delta x^m$ for the element size in level $m$, we have

$$\|(x - x_j^m)^{M+1}\|_{L^2(I_j^m)} \le (\Delta x^m)^{M+1}\|1\|_{L^2(I_j^m)} = (\Delta x^m)^{M+1} \sqrt{\Delta x^m} = (\Delta x^m)^{M+3/2}.$$

For the domain $[-1, 1]$, we have $\Delta x^m = 2^{-m+1}$. This means that

$$\|(x - x_j^m)^{M+1}\|_{L^2(I_j^m)} \le 2^{(-m+1)(M+3/2)},$$

which proves the cancelation property. Note that this result can be generalized to general grid hierarchies and higher-dimensional problems [6, 10].
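The decay rate in the bound (2) can be checked numerically in the simplest setting (our own illustration, not from the paper): for polynomial degree $k = 0$ the multiwavelet is the $L^2$-normalised Haar wavelet with a single vanishing moment ($M = 0$), so the largest coefficient of a smooth function should shrink by a factor $2^{-3/2}$ per level:

```python
import numpy as np

def haar_coefficients(level):
    """L2-normalised Haar wavelet coefficients of u(x) = sin(x) on [-1,1]
    at the given refinement level, from the exact antiderivative:
    int_a^c sin - int_c^b sin = cos(a) + cos(b) - 2 cos(c)."""
    dx = 2.0 ** (-level + 1)              # element size, Delta x^m = 2^{-m+1}
    a = -1.0 + dx * np.arange(2 ** level)
    b, c = a + dx, a + dx / 2.0           # right endpoint and midpoint
    return (np.cos(a) + np.cos(b) - 2.0 * np.cos(c)) / np.sqrt(dx)

# Ratio of the largest coefficient between consecutive levels:
ratios = [np.abs(haar_coefficients(m + 1)).max()
          / np.abs(haar_coefficients(m)).max() for m in range(6, 10)]
print(ratios)  # all close to 2**-1.5 ~ 0.3536, as the bound (2) predicts
```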

The next section contains a discussion of the thresholding technique for one-dimensional multiwavelet expansions.

#### *3.3 Thresholding of the Multiwavelet Coefficients*

In this section, the thresholding technique for systems of conservation laws in one dimension is explained, which is based on the cancelation property [8]. This technique was originally used for a multiwavelet-based adaptive strategy in combination with the DG method. However, we are specifically interested in its application to troubled-cell indication.

Following [8], the element $I_j^{n-1}$ is detected as troubled if

$$\max_{\substack{\ell=0,\ldots,k\\ r=1,2,3}} \left( \frac{|d_{\ell j}^{n-1}(r)|}{\max\left\{\max_{j=0,\ldots,2^{n}-1} 2^{(n-1)/2} |s_{0j}^{n}(r)|,\ 1\right\}} \right) > \varepsilon_{n-1} \sqrt{2\Delta x}.$$

Here, the value $r$ indexes the conserved quantity in a system of three PDEs. The factor $\sqrt{2\Delta x}$ (with $\Delta x$ the DG mesh width) occurs because of a scaling difference: the multiwavelets in [8] are scaled with respect to the $L^\infty$-norm, whereas an $L^2$-norm scaling is used in this paper. The level-dependent threshold value $\varepsilon_{n-1}$ is chosen as $\varepsilon_{n-1} = \varepsilon/2$. The parameter $\varepsilon$ can be chosen using two different strategies [8]. The first option is the *a priori* strategy, which is based on the balance between discretization errors and perturbation errors of adaptive meshes [10]. If the solution contains discontinuities, the a priori strategy leads to $\varepsilon = C\Delta x^2$. The second option is the *heuristic* approach, which is based on numerous computations for practical applications [8]. This method is more efficient since it is less pessimistic than the a priori strategy. For discontinuous solutions, the heuristic approach uses $\varepsilon = C\Delta x$.

This yields detection of element $I_j^{n-1}$ if

$$\max_{\substack{\ell=0,\ldots,k\\ r=1,2,3}} \left( \frac{|d_{\ell j}^{n-1}(r)|}{\max\left\{\max_{j=0,\ldots,2^{n}-1} 2^{(n-1)/2} |s_{0j}^{n}(r)|,\ 1\right\}} \right) > \frac{1}{\sqrt{2}}\, \Delta x^{\beta+0.5}\, C,$$

where $\beta = 2$ for the a priori strategy and $\beta = 1$ for the heuristic strategy. Note that the multiwavelet coefficients are scaled by the cell average if this value is greater than 1 in absolute value (to prevent division by zero).

The optimal choice of the parameter $C$ depends on the problem, in particular on the strength of the shock compared to the normal amplitude of the solution. The smaller $C$ is, the more elements are detected. In general, the value $C = 1/(b-a)$ should work for the domain $[a, b]$ [8]. If $C$ is chosen too small, then too many cells are detected as troubled. For the adaptive strategy, this is not really problematic since the approximation is usually more accurate on a finer grid. However, for troubled-cell indication, it is important to detect the correct number of elements.

It should be noticed that this indicator is designed for very fine resolutions (since the strategies use asymptotic arguments). For coarse meshes, smaller values of *C* should be used, which are difficult to predict a priori.
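The detection criterion can be sketched as follows (our own code, for a single conserved quantity; the array layout and function name are our choices):

```python
import numpy as np

def threshold_indicator(d, s, dx, C, beta=1):
    """Thresholding indicator sketch: d has shape (k+1, J) with the
    multiwavelet coefficients d^{n-1}_{lj} of one conserved quantity,
    s holds the 2^n scaling coefficients s^n_{0j}.  An element is
    flagged when its largest scaled coefficient exceeds the threshold
    (1/sqrt(2)) * dx**(beta + 0.5) * C
    (beta = 1: heuristic strategy, beta = 2: a priori strategy)."""
    n = int(np.log2(s.size))                      # resolution level
    scale = max(np.max(2.0 ** ((n - 1) / 2.0) * np.abs(s)), 1.0)
    ratio = np.max(np.abs(d), axis=0) / scale     # max over ell per element
    return ratio > dx ** (beta + 0.5) * C / np.sqrt(2.0)

# Four elements (J = 4), polynomial degree 2; element 2 carries a shock.
d = np.full((3, 4), 1e-6)
d[:, 2] = 0.5
s = np.ones(8)                                    # scaling coefficients, n = 3
flags = threshold_indicator(d, s, dx=0.25, C=1.0, beta=1)
print(flags)  # only element 2 exceeds the heuristic threshold
```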

#### *3.4 Generalized Grids*

The algorithm for utilizing Alpert's multiwavelets on a nonuniform grid is given in [7]: the only difference from Alpert's algorithm [1] is that no additional vanishing moments are added. Multiwavelets for one-dimensional irregular meshes have been designed in [12, 13]. Note that this construction is local, which means that the resulting bases depend on the level and the position unless there is an affine mapping from the element to a reference element. This leads to slower computations. On the other hand, the use of such a multiwavelet space makes it possible to decompose the DG approximation into a multiwavelet expansion exactly. The multiwavelet coefficients again become small if the underlying function is smooth and the mesh width between two neighboring elements does not vary too much.

When coupled with a troubled-cell indication variable, it is necessary to include spatial information of the mesh in the algorithm through the element size. Alternatively, one can use a window-based technique [3]: a window is a fixed-length subsequence of the test sequence, which is slid through the domain using a sliding step. These issues and the resulting numerics are discussed further in [15].

#### **4 Numerical Results**

In this section, the different multiwavelet troubled-cell indicators are applied to one-dimensional problems based on the Euler equations of gas dynamics.

The results for the original multiwavelet troubled-cell indicators (both based on a parameter and based on outlier detection) can be seen in Figs. 1 and 2 (polynomial degree 2, 128 elements for Sod's and Lax's shock tubes, and 512 elements for the blast-wave and Shu-Osher problems). The parameter-based technique performs well if a suitable value for the problem-dependent parameter $C$ is chosen. The outlier-detection results are generally better than those of the original troubled-cell indicator with an optimized parameter: both the weak and the strong shock regions are detected, whereas smooth regions are not selected.

It is also possible to use the thresholding technique for multiwavelet coefficients to detect troubled cells. It turns out that this indicator works very well as long as an appropriate value for $C$ is chosen and the mesh is taken fine enough. The results for the different test cases are visualized in Fig. 3 using the heuristic strategy (polynomial degree 2, 1024 elements for all models). Here, we take the value $C = 1/(b-a)$, where $[a, b]$ is the domain on which the test problem is defined. This thresholding technique is very accurate; however, many elements are needed to meet the asymptotic properties of the indicator.

**Fig. 1** Time-history plot of detected troubled cells using the parameter-based multiwavelet troubled-cell indicator, polynomial degree 2. (**a**) Sod's shock tube, $C = 0.1$, 128 elements. (**b**) Lax's shock tube, $C = 0.1$, 128 elements. (**c**) Blast-wave problem, $C = 0.05$, 512 elements. (**d**) Shu-Osher problem, $C = 0.01$, 512 elements

If the number of elements is reduced, then $C$ should be decreased to detect the correct features; in that case, it is difficult to guess the correct value of $C$. Another option is to use the a priori strategy for coarser meshes, see Fig. 4 (polynomial degree 2, 128 elements for Sod's and Lax's shock tubes, and 512 elements for the blast-wave and Shu-Osher problems). If $C = 1/(b-a)$ is used, this approach works well for Sod's and Lax's shock tubes, but too many elements are detected for the blast-wave and Shu-Osher problems. Here too, the value of $C$ should be adapted to find the correct results.

**Fig. 2** Time-history plot of detected troubled cells using the outlier-detection multiwavelet troubled-cell indicator, polynomial degree 2. (**a**) Sod's shock tube, 128 elements. (**b**) Lax's shock tube, 128 elements. (**c**) Blast-wave problem, 512 elements. (**d**) Shu-Osher problem, 512 elements

#### **5 Conclusions and Recommendations**

In this paper, a new troubled-cell indicator was developed, based on the cancelation property for multiwavelets and the derived thresholding technique. Inspection of this technique reveals that it is very useful for designing adaptive meshes [8]. For troubled-cell indication, we found that detection is very accurate as long as a very fine mesh is used. For coarser meshes, it seems more useful to apply a different detection method. Furthermore, the choice of the parameter $C$ is not straightforward.

More research should be done to see in which way the cancelation property for multiwavelet coefficients can be used for the accurate detection of troubled cells. For example, it could be that this property also relates to the severity of the shocks.

**Fig. 3** Thresholding technique with the heuristic approach: time-history plot of detected troubled cells, 1024 elements, polynomial degree 2, $C = 1/(b-a)$, with $[a, b]$ the computational domain. (**a**) Sod's shock tube. (**b**) Lax's shock tube. (**c**) Blast-wave problem. (**d**) Shu-Osher problem

**Fig. 4** Thresholding technique with the a priori approach on coarser meshes: time-history plot of detected troubled cells, polynomial degree 2, $C = 1/(b-a)$, with $[a, b]$ the computational domain. (**a**) Sod's shock tube, 128 elements. (**b**) Lax's shock tube, 128 elements. (**c**) Blast-wave problem, 512 elements. (**d**) Shu-Osher problem, 512 elements

**Acknowledgements** The author gratefully wishes to acknowledge the collaboration with Jennifer Ryan and the useful comments provided by Siegfried Müller that helped to shape this work.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **An Anisotropic p-Adaptation Multigrid Scheme for Discontinuous Galerkin Methods**

**Andrés M. Rueda-Ramírez, Gonzalo Rubio, Esteban Ferrer, and Eusebio Valero**

### **1 Introduction**

In recent decades, high-order discontinuous Galerkin (DG) methods have been gaining increasing popularity for high-accuracy solutions of systems of conservation laws, such as the compressible Euler and Navier-Stokes equations [5, 6, 22]. The lack of a continuity constraint on element interfaces makes DG methods robust for describing advection-dominated problems when an appropriate Riemann solver is selected [5, 12, 22].

Multigrid methods speed up the iterative solution of large systems of equations by means of coarse-grid representations (lower levels). Iterative methods (known as *smoothers* in the multigrid community) are good at quickly eliminating the high frequencies of the error; therefore, when applied to coarse-grid representations, they also reduce the low frequencies of the error. They have been broadly used in the high-order community in recent years in the form of p-multigrid [2, 8] (where levels are constructed using different polynomial orders) and hp-multigrid [14, 21] (where both the order and the size of the elements are changed). Two types of multigrid methods can be found in the literature: linear and nonlinear. In our work, we use the nonlinear multigrid scheme, also known as the Full Approximation Scheme (FAS), since it enables the estimation of the truncation error of coarse representations, as will be shown. The smoother can be either a time-marching scheme (implicit or explicit) or an iterative method applied to the linearized problem.

A. M. Rueda-Ramírez (✉) · G. Rubio · E. Ferrer · E. Valero

ETSIAE-UPM (School of Aeronautics, Universidad Politécnica de Madrid), Plaza Cardenal Cisneros, Madrid, Spain

e-mail: am.rueda@upm.es; g.rubio@upm.es; esteban.ferrer@upm.es; eusebio.valero@upm.es

S. J. Sherwin et al. (eds.), *Spectral and High Order Methods for Partial Differential Equations ICOSAHOM 2018*, Lecture Notes in Computational Science and Engineering 134, https://doi.org/10.1007/978-3-030-39647-3\_44

Because of the allowed discontinuities on element interfaces, DG methods are capable of handling non-conforming meshes with hanging nodes and/or different polynomial orders efficiently [7, 13, 15]. It is possible to take advantage of this feature to accelerate the computations through local adaptation strategies. Local adaptation can be performed by subdividing or merging elements (h-adaptation) or by enriching or reducing the polynomial order in certain elements (p-adaptation). The main idea behind these methodologies is to reduce the number of degrees of freedom (NDOF) while maintaining high accuracy, which translates into shorter computational times and reduced storage requirements. Furthermore, since several 2D and 3D implementations of DG methods use tensor-product basis functions, it is possible to adapt the polynomial order in each coordinate direction independently. In order to identify the localized regions that need increased or decreased accuracy, an error estimator is commonly used.

There are several approaches to estimate the error and drive an adaptation method. In this work, we focus on truncation error estimates, since it has been shown that a reduction of the truncation error controls the numerical accuracy of all functionals [10]; hence, reducing the truncation error necessarily leads to more accurate lift and drag. The *τ*-estimation method [4] is a way to estimate the truncation error locally that has been used to drive mesh adaptation strategies in low-order [9, 20] and high-order methods [10, 17, 18]. The adaptation strategy consists in converging a high-order representation (reference mesh) to a specified global residual and then performing a single error estimation followed by a corresponding mesh adaptation process. Rueda-Ramírez et al. [19] developed a new method for estimating the truncation error of anisotropic representations that is cheaper to evaluate than previous implementations, and showed that it produces very accurate extrapolations of the truncation error, which enables the use of coarser reference meshes.

In this work, we employ the anisotropic truncation error estimator developed in [19] and the anisotropic p-adaptation method detailed in [18] to accelerate the computation of the compressible steady viscous flow past a NACA0012 airfoil at an angle of attack of 5°, $Re_\infty = 200$ based on the airfoil chord, and $M_\infty = 0.2$. These particular settings correspond to a steady laminar flow, but the proposed method can be directly used with any steady solution (e.g. RANS). The paper is organized as follows: In Sect. 2, we briefly describe the methods used in this paper. In Sect. 3, we compare the performance of the proposed methods with traditional strategies for solving the flow past a NACA0012 and show the speed-up advantages for different accuracies. Finally, the conclusions are summarized in Sect. 4.

#### **2 Methods**

#### *2.1 DG Method*

We consider the approximation of systems of conservation laws,

$$
\partial_t \mathbf{q} + \nabla \cdot \mathcal{F} = \mathbf{s}, \tag{1}
$$

where $\mathbf{q}$ is the vector of conserved variables, $\mathcal{F}$ is the flux dyadic tensor, and $\mathbf{s}$ is a source term. The domain $\Omega$ is partitioned into a mesh $\mathcal{T} = \{\Omega^e\}$ consisting of $K$ non-overlapping elements $\Omega^e$. Multiplying Eq. (1) by a test function $\mathbf{v}$ and integrating by parts over each subdomain $\Omega^e$ yields the weak formulation:

$$\int_{\Omega^{e}} \partial_{t} \mathbf{q}\, \mathbf{v}\, \mathrm{d}\Omega^{e} - \int_{\Omega^{e}} \mathcal{F} \cdot \nabla \mathbf{v}\, \mathrm{d}\Omega^{e} + \int_{\partial \Omega^{e}} \mathcal{F} \cdot \mathbf{n}\, \mathbf{v}\, \mathrm{d}\sigma^{e} = \int_{\Omega^{e}} \mathbf{s}\, \mathbf{v}\, \mathrm{d}\Omega^{e}. \tag{2}$$

Let $\mathbf{q}$, $\mathbf{s}$, $\mathcal{F}$ and $\mathbf{v}$ be approximated by piecewise polynomial functions defined in the space of $L^2$ functions: $V^N = \{\mathbf{v}^N \in L^2(\Omega) : \mathbf{v}^N|_{\Omega^e} \in P^N(\Omega^e)\ \forall\, \Omega^e \in \mathcal{T}\}$, where $P^N(\Omega^e)$ is the space of polynomials of degree at most $N$. The functions in $V^N$ can be represented in each element as a linear combination of basis functions $\phi_i^N \in P^N(\Omega^e)$ (e.g. $\mathbf{q}^N|_{\Omega^e} = \sum_i \mathbf{Q}_i^N \phi_i^N$), where the $\phi_i^N$ are usually tensor-product expansions. After some manipulations, the discontinuous Galerkin finite element discretization is obtained:

$$[\underline{\mathbf{M}}]\, \partial_{t} \mathbf{Q}^{N} + \mathbf{F}(\mathbf{Q}^{N}) = [\underline{\mathbf{M}}]\, \mathbf{S}^{N}, \tag{3}$$

where $[\underline{\mathbf{M}}]$ is the mass matrix and $\mathbf{F}$ is a nonlinear operator, which are the assembled global versions of the element-wise mass matrices and nonlinear operators:

$$[\underline{\mathbf{M}}]_{i,j}^{e} = \int_{\Omega^{e}} \phi_{i} \phi_{j}\, \mathrm{d}\Omega^{e}, \tag{4}$$

$$\mathbf{F}^{e}(\mathbf{Q})_{j} = \sum_{i=1}^{\mathrm{NDOF}^{e}} \left[ - \int_{\Omega^{e}} \mathcal{F}_{i}^{e} \cdot \phi_{i} \nabla \phi_{j}\, \mathrm{d}\Omega^{e} \right] + \int_{\partial\Omega^{e}} \mathcal{F}^{*} \left( \mathbf{Q}, \mathbf{Q}^{-}, \mathbf{n} \right) \phi_{j}\, \mathrm{d}\sigma^{e}, \tag{5}$$

where $\mathcal{F}_i^e$ is the $i$-th position of the vector $\mathcal{F}^e$, which contains the value of $\mathcal{F}^e$ for all the degrees of freedom of element $e$. In the rest of this paper, bold uppercase Roman letters and bold Greek letters denote vectors spanning several degrees of freedom, unless otherwise specified.

The numerical flux function $\mathcal{F}^*$ makes it possible to uniquely define the flux at the element interfaces and to weakly prescribe the boundary data as a function of the conserved variables on both sides of the boundary/interface and the normal vector. In the present work, we use the scheme by Roe [16] as the advective Riemann solver and the original scheme by Bassi and Rebay [1] (BR1) as the diffusive Riemann solver.

#### *2.2 Full Approximation Scheme p-Multigrid*

The Full Approximation Scheme (FAS) is a nonlinear version of the multigrid method that is specially suited to solving systems of nonlinear equations [4]. Departing from Eq. (3) and defining the operator $\mathbf{A}(\mathbf{Q}^N) = [\underline{\mathbf{M}}]^{-1}\mathbf{F}(\mathbf{Q}^N)$, the steady-state problem of order $P$ yields

$$\mathbf{A}(\mathbf{Q}^P) = \mathbf{S}^P. \tag{6}$$

After $\beta_1$ sweeps of a smoother, a non-converged solution $\tilde{\mathbf{Q}}^P$ is obtained that has an associated discretization error $\boldsymbol{\epsilon}^P = \mathbf{Q}^P - \tilde{\mathbf{Q}}^P$. The FAS multigrid procedure consists in obtaining an approximation to the discretization error on a coarse grid of order $N$ and projecting it to the original problem of order $P$:

$$\boldsymbol{\epsilon}^{P} = \underline{\mathbf{I}}_{N}^{P} \boldsymbol{\epsilon}^{N} = \underline{\mathbf{I}}_{N}^{P} (\mathbf{Q}^{N} - \underline{\mathbf{I}}_{P}^{N} \tilde{\mathbf{Q}}^{P}), \tag{7}$$

where $\underline{\mathbf{I}}_N^P$ is an $L^2$ projection operator from order $N$ to order $P$, and $\mathbf{Q}^N$ is the solution to the coarse-grid problem:

$$\mathbf{A}^N(\mathbf{Q}^N) = \mathbf{S}^N,\tag{8}$$

where the source term is defined as

$$\mathbf{S}^{N} = \mathbf{A}^{N}(\underline{\mathbf{I}}\_{P}^{N}\tilde{\mathbf{Q}}^{P}) + \underline{\mathbf{I}}\_{P}^{N}\left(\mathbf{S}^{P} - \mathbf{A}^{P}(\tilde{\mathbf{Q}}^{P})\right). \tag{9}$$

In practice, several p-multigrid levels are used in V- or W-cycles. The smoothing steps that are performed when coarsening are called pre-smoothing sweeps, and the ones performed when refining back are called post-smoothing sweeps. Furthermore, $\mathbf{Q}^N$ is not obtained exactly on the coarse grids, but approximated using an iterative method, $\tilde{\mathbf{Q}}^N \to \mathbf{Q}^N$. In this work, we use a third-order low-storage Runge-Kutta scheme (RK3) as the smoother, with V-cycles.
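The FAS update of Eqs. (6)-(9) can be sketched on a model problem. The sketch below is our own construction, not code from the paper: it uses two h-levels of a 1D finite-difference discretization of $-u'' + u^2 = f$ instead of p-levels, a damped Newton-Jacobi smoother in place of RK3, and a Newton solve on the coarse level; the coarse source term and the correction follow Eqs. (9) and (7).

```python
import numpy as np

def A(u, h):
    """Discrete nonlinear operator of -u'' + u^2 = f on (0,1),
    central differences, homogeneous Dirichlet boundaries."""
    up = np.pad(u, 1)
    return (-up[:-2] + 2.0 * up[1:-1] - up[2:]) / h**2 + u**2

def smooth(u, f, h, sweeps):
    """Damped pointwise Newton-Jacobi smoother."""
    for _ in range(sweeps):
        u = u + 0.8 * (f - A(u, h)) / (2.0 / h**2 + 2.0 * u)
    return u

def solve_coarse(u, f, h):
    """Solve the coarse-grid problem A(u) = f with Newton's method
    (plays the role of the coarse-grid solve, Eq. (8))."""
    n = u.size
    T = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
    for _ in range(15):
        u = u + np.linalg.solve(T + np.diag(2.0 * u), f - A(u, h))
    return u

def fas_cycle(u, f, h, nu=5):
    """One two-level FAS cycle (Eqs. (6)-(9), with h- instead of p-levels)."""
    u = smooth(u, f, h, nu)                               # pre-smoothing
    r = f - A(u, h)
    uc = u[1::2]                                          # restrict solution
    rc = 0.25 * (r[0:-2:2] + 2.0 * r[1::2] + r[2::2])     # restrict residual
    fc = A(uc, 2.0 * h) + rc                              # coarse source, Eq. (9)
    ec = solve_coarse(uc.copy(), fc, 2.0 * h) - uc        # coarse error, Eq. (7)
    ep = np.pad(ec, 1)                                    # prolong error linearly
    ef = np.empty_like(u)
    ef[1::2] = ec
    ef[0::2] = 0.5 * (ep[:-1] + ep[1:])
    return smooth(u + ef, f, h, nu)                       # correct + post-smooth

n, h = 63, 1.0 / 64.0
x = h * np.arange(1, n + 1)
f = np.pi**2 * np.sin(np.pi * x) + np.sin(np.pi * x)**2   # manufactured source
u = np.zeros(n)
r0 = np.linalg.norm(f - A(u, h))
for _ in range(12):
    u = fas_cycle(u, f, h)
print(np.linalg.norm(f - A(u, h)) / r0)  # residual reduced by many orders
```

The coarse source term `fc` is exactly the structure of Eq. (9): the coarse operator applied to the restricted solution plus the restricted fine-grid residual.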

#### *2.3 τ -Based p-Adaptation*

In this section we show how to drive an anisotropic p-adaptation procedure using the truncation error, which is estimated in the multigrid procedure.

#### **2.3.1 The Anisotropic** *τ* **-Estimation Method**

The *non-isolated* truncation error of a discretization of order *N* is defined as

$$
\boldsymbol{\tau}^{N} = \mathcal{R}^{N}(\mathbf{I}^{N}\mathbf{q}) - \mathcal{R}(\mathbf{q}), \tag{10}
$$

where $\mathbf{q}$ is the exact solution to the problem, $\mathbf{I}^N$ is a discretizing operator, $\mathcal{R}$ is the continuous partial differentiation operator, and $\mathcal{R}^N$ is the discrete partial differentiation operator. From Eqs. (1) and (3):

$$\mathcal{R}(\mathbf{q}) = \mathbf{s} - \nabla \cdot \mathcal{F},\tag{11}$$

$$\mathcal{R}^N(I^N\mathbf{q}) = [\underline{\mathbf{M}}] \mathbf{S}^N - \mathbf{F}(I^N\mathbf{q}),\tag{12}$$

where $I^N$ is an operator that samples the exact solution at the points that correspond to the degrees of freedom of a representation of order $N$; therefore, Eq. (12) corresponds to the sampled values of $\mathcal{R}^N(\mathbf{I}^N\mathbf{q})$.

Note that in steady cases, $\mathcal{R}(\mathbf{q}) = \mathbf{0}$ holds. Since the exact solution $\mathbf{q}$ is usually not at hand, we utilize the *quasi a-priori* $\tau$-estimation method, which approximates the exact solution with the non-converged solution on a high-order grid, $\mathbf{q} \approx \tilde{\mathbf{q}}^P$, where $N < P$. Therefore, the steady *non-isolated* truncation error estimation yields

$$\boldsymbol{\tau}_P^N = \mathcal{R}^N(\underline{\mathbf{I}}_P^N \tilde{\mathbf{q}}^P) \quad \rightarrow \quad \boldsymbol{\tau}_P^N = \mathcal{R}^N(\underline{\mathbf{I}}_P^N \tilde{\mathbf{Q}}^P) = [\underline{\mathbf{M}}] \mathbf{S}^N - \mathbf{F}(\underline{\mathbf{I}}_P^N \tilde{\mathbf{Q}}^P). \tag{13}$$

On the left side of the arrow is the estimation of the truncation error that lives in the space $V^N$, and on the right side is its sampled form at the points that correspond to the degrees of freedom. In a DG representation, one can also define the *isolated* truncation error $\hat{\boldsymbol{\tau}}$ as

$$\hat{\boldsymbol{\tau}}_P^N = \hat{\mathcal{R}}^N(\underline{\mathbf{I}}_P^N \tilde{\mathbf{Q}}^P) = [\underline{\mathbf{M}}] \mathbf{S}^N - \hat{\mathbf{F}}(\underline{\mathbf{I}}_P^N \tilde{\mathbf{Q}}^P), \tag{14}$$

where $\hat{\mathbf{F}}$ is the assembled version of the *isolated* nonlinear operator, defined element-wise as

$$\hat{\mathbf{F}}^{e}(\mathbf{Q})_{j} = \sum_{i=1}^{\mathrm{NDOF}^{e}} \left[ -\int_{\Omega^{e}} \mathcal{F}_{i}^{e} \cdot \phi_{i} \nabla \phi_{j}\, \mathrm{d}\Omega^{e} \right] + \int_{\partial\Omega^{e}} \mathcal{F} \cdot \mathbf{n}\, \phi_{j}\, \mathrm{d}\sigma^{e}. \tag{15}$$

Note that Eq. (15) is Eq. (5) without substituting $\mathcal{F}$ by the numerical flux $\mathcal{F}^*$. This change eliminates the influence of the neighboring elements and boundaries on the truncation error of each element. We drop the hat notation in the following statements since they are valid for both the *isolated* and *non-isolated* truncation errors.
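As a minimal illustration of the definition (10) (our own sketch, not code from the paper): for a finite-difference discretization of $-u'' = f$ with a known exact solution, inserting the sampled exact solution into the discrete residual gives the truncation error directly, and halving the mesh size reduces it by the expected factor $2^2$ for a second-order scheme:

```python
import numpy as np

def truncation_error(n):
    """Truncation error of a second-order FD discretisation of -u'' = f,
    obtained by inserting the sampled exact solution q into the discrete
    residual (the continuous residual R(q) vanishes in the steady case)."""
    h = 1.0 / (n + 1)
    x = h * np.arange(1, n + 1)
    q = np.sin(np.pi * x)                       # exact solution
    f = np.pi**2 * np.sin(np.pi * x)            # matching source term
    qp = np.pad(q, 1)
    Aq = (-qp[:-2] + 2.0 * qp[1:-1] - qp[2:]) / h**2
    return f - Aq                               # discrete residual at I^h q

t1 = np.abs(truncation_error(127)).max()
t2 = np.abs(truncation_error(255)).max()
print(t1 / t2)  # close to 4: the truncation error is O(h^2)
```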

The *τ* -estimation method can also be used with anisotropic representations, i.e.

$$
\boldsymbol{\tau}_{P_1 P_2}^{N_1 N_2} = \mathcal{R}^{N_1 N_2} (\underline{\mathbf{I}}_{P_1 P_2}^{N_1 N_2} \tilde{\mathbf{q}}^{P_1 P_2}), \tag{16}
$$

where $N_i$ and $P_i$ are the polynomial orders in direction $i$ of the analyzed representation and of the high-order reference solution, respectively, with $N_i < P_i$. Additionally, Rueda-Ramírez et al. [19] showed that the truncation error of an anisotropic representation can be estimated using directional components:

$$
\boldsymbol{\tau}^{N_1 N_2} \approx \boldsymbol{\tau}_1^{N_1 N_2} + \boldsymbol{\tau}_2^{N_1 N_2} \approx \boldsymbol{\tau}_{P_1 P_2}^{N_1 P_2} + \boldsymbol{\tau}_{P_1 P_2}^{P_1 N_2}, \tag{17}
$$

where, for example, the first directional component in discrete form is

$$\boldsymbol{\tau}_1 = \boldsymbol{\tau}_{P_1 P_2}^{N_1 P_2} = [\underline{\mathbf{M}}] \mathbf{S}^{N_1 P_2} - [\underline{\mathbf{M}}] \mathbf{A} (\underline{\mathbf{I}}_{P_1 P_2}^{N_1 P_2} \tilde{\mathbf{Q}}^{P_1 P_2}), \tag{18}$$

and that these directional components decrease exponentially with the polynomial order for smooth solutions. Consequently, it is possible to use a semi-converged solution $\tilde{\mathbf{q}}^{P_1 P_2}$ to estimate $\boldsymbol{\tau}^{N_1 N_2}$ ($N_i < P_i$) and then extrapolate the directional components $\boldsymbol{\tau}_i$ to obtain the values of $\boldsymbol{\tau}^{N_1 N_2}$ for $N_i > P_i$. Figure 1a shows a graphical representation of the truncation error $\boldsymbol{\tau}^{N_1 N_2}$ as estimated with a semi-converged solution of order $P_1 = P_2 = 5$.

#### **2.3.2 The p-Adaptation Multigrid Scheme**

It has been shown that the use of FAS p-multigrid methods speeds up the computation of steady-state and unsteady solutions of the compressible Navier-Stokes equations [2, 8]. In addition, Rueda-Ramírez et al. [18] showed that the truncation error of an anisotropic representation can be obtained inexpensively inside an anisotropic p-multigrid cycle that performs the coarsening in one coordinate direction at a time. In fact, the second term of Eq. (18) is naturally computed in an anisotropic multigrid cycle when obtaining the coarse-grid source term (Eq. (9)).

Therefore, we propose a p-adaptation multigrid scheme that uses the multigrid not only as a solver, but also as an error estimator. Every time the error is estimated, an anisotropic p-multigrid strategy is used to generate a truncation error map for each element, like the one in Fig. 1a. Afterwards, the polynomial orders in the different coordinate directions are selected for

**Fig. 1** (**a**) Truncation error map for a specific element, showing $\log \|\boldsymbol{\tau}_{5,5}^{N_1 N_2}\|_\infty$ as a function of $N_1$ and $N_2$ (the black box shows the limit between the *estimated* and *extrapolated* maps). (**b**) Map of degrees of freedom (the black boxes show the polynomial orders that achieve $\|\boldsymbol{\tau}_{5,5}^{N_1 N_2}\|_\infty < 10^{-5}$)

each element, such that a truncation error threshold $\tau_{max}$ is achieved with the minimum NDOF possible, as illustrated in Fig. 1b. In the simulations shown in this paper, the reference representation $\tilde{\mathbf{q}}^P$ is converged to a residual $\tau_{max}/10$ before the p-adaptation stage, so that the truncation error is accurately estimated down to $\tau_{max}$, as shown to be necessary by Kompenhans et al. [10].
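The order-selection step can be sketched as a small search over the (extrapolated) error map; this is our own sketch, and the map below is synthetic, not data from the paper:

```python
import numpy as np

def select_orders(tau_map, tau_max):
    """Given a per-element map tau_map[N1, N2] of truncation errors,
    return the order pair meeting the threshold with minimum NDOF,
    where NDOF = (N1+1)*(N2+1) for a tensor-product basis."""
    best, best_ndof = None, np.inf
    for n1 in range(tau_map.shape[0]):
        for n2 in range(tau_map.shape[1]):
            ndof = (n1 + 1) * (n2 + 1)
            if tau_map[n1, n2] < tau_max and ndof < best_ndof:
                best, best_ndof = (n1, n2), ndof
    return best

# Synthetic anisotropic map: the error decays faster in direction 1,
# so the selected order is lower in that direction.
N = np.arange(8)
tau = np.exp(-2.0 * N)[:, None] + np.exp(-0.7 * N)[None, :]
orders = select_orders(tau, 1e-2)
print(orders)  # (3, 7): anisotropic orders, NDOF = 32
```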

#### **3 Flow Past a NACA0012 Airfoil**

In this section, we compare the performance of the proposed p-adaptation multigrid scheme with a uniformly adapted p-multigrid method (without local p-adaptation) and a uniformly adapted RK3 method when solving the steady viscous flow past a NACA0012 airfoil at an angle of attack of 5°, $Re_\infty = 200$ ($L_\infty = L_{chord}$) and $M_\infty = 0.2$. These particular settings correspond to a steady laminar flow, but the proposed method can be directly used with any steady solution (e.g. RANS). An unstructured mesh of 2011 quadrilateral elements is employed (Fig. 2).

In the cases where multigrid is employed, the RK3 scheme is used as the iterative method (smoother), so that additional speed-ups are due only to the methods exposed in Sect. 2. As in [18], a residual-based smoothing strategy is performed. The minimum number of smoothing sweeps is $\beta = 200$ for the coarsest multigrid level ($N = 1$) and $\beta = 50$ for any other level. After every $\beta$ pre-smoothing sweeps, the residual in the next (coarser) representation is checked. If $\|\mathbf{R}^N\|_\infty < 1.2\, \|\mathbf{R}^{N-1}\|_\infty$, the pre-smoothing is stopped; otherwise, $\beta$ additional

**Fig. 2** Pressure contours of the flow past a NACA0012 at an angle of attack of 5°

sweeps are performed. Similarly, the norm of the residual after the post-smoothing is forced to be at least as low as it was after the pre-smoothing, $\|\mathbf{R}^N_{post}\|_\infty \le \|\mathbf{R}^N_{pre}\|_\infty$. If that condition is not fulfilled, $\beta$ additional sweeps are taken until it is.

The *isolated* truncation error estimate is used to drive the p-adaptation method since it has been shown to provide better results than the *non-isolated* one [17–19]. The conservative form (Eq. (1)) of the compressible Navier-Stokes equations is discretized using the Discontinuous Galerkin Spectral Element Method (DGSEM) [3, 12], which is a nodal (collocation) version of the DG method that uses Gauss points as both solution and quadrature points, yielding diagonal mass matrices. However, the methods exposed here can be applied to any DG scheme with tensor-product basis functions.
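A quick numerical illustration of this collocation property (our own sketch, not code from the paper): for a nodal Lagrange basis at the $N+1$ Gauss-Legendre points, the exact 1D element mass matrix $M_{ij} = \int \phi_i \phi_j\, \mathrm{d}x$ is already diagonal, because $\phi_i\phi_j$ has degree $2N$ and Gauss quadrature with $N+1$ points is exact up to degree $2N+1$:

```python
import numpy as np

def lagrange_basis(nodes, x):
    """Evaluate the Lagrange cardinal polynomials for the given nodes at
    the points x; returns an array of shape (len(nodes), len(x))."""
    nodes = np.asarray(nodes)
    L = []
    for i, xi in enumerate(nodes):
        others = np.delete(nodes, i)
        L.append(np.prod([(x - xj) / (xi - xj) for xj in others], axis=0))
    return np.array(L)

N = 4
nodes, w = np.polynomial.legendre.leggauss(N + 1)    # Gauss solution points
xq, wq = np.polynomial.legendre.leggauss(2 * N + 2)  # fine quadrature (exact)
phi = lagrange_basis(nodes, xq)
M = (phi * wq) @ phi.T                               # M_ij = int phi_i phi_j dx
print(np.allclose(M, np.diag(w)))  # True: the exact mass matrix is diag(w)
```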

In [18] it was explained that, when using the DGSEM on general 3D curved meshes with p-nonconforming representations, the order of the mapping must satisfy $M \le N/2$ for the numerical representation to be free-stream preserving. For this reason, the use of a *conforming algorithm* was proposed, which forces the polynomial orders to be conforming in the first layer of elements on a curved boundary; this algorithm is necessary to retain the well-known $M \le N$ condition of the DGSEM [11]. In this work, we use the *conforming algorithm* on the airfoil surface since it was shown to produce better results, although its use is not imperative as the considered test case is 2D.

For the uniformly adapted cases, the polynomial order is varied between $N = 2$ and $N = 7$. For the cases with local p-adaptation, a single-stage anisotropic p-adaptation procedure is performed; the minimum polynomial order after adaptation is set to $N_{min} = 1$, and the maximum to $N_{max} = 7$. The relative drag and lift errors of the adapted meshes are assessed by comparing with a reference solution of order $N = 8$:

$$e\_{drag}^{N=8} = \frac{|C\_d - C\_d^{N=8}|}{C\_d^{N=8}}, \qquad e\_{lift}^{N=8} = \frac{|C\_l - C\_l^{N=8}|}{C\_l^{N=8}}.\tag{19}$$

Figure 3 shows a comparison between the errors obtained using the *τ*ˆ-based adaptation procedure and the ones using uniform p-refinement. As can be observed, the number of degrees of freedom is substantially reduced for the same accuracy when using the *τ*ˆ-based p-adaptation. This reduction translates into a reduction of the CPU-times. It is interesting to point out that, as the *isolated* truncation error threshold *τ*ˆ*max* is decreased, the polynomial orders of the mesh tend to the maximum specified polynomial order, *Nmax* = 7. Consequently, the lift and drag coefficients also tend to $C\_l^{N=7}$ and $C\_d^{N=7}$. Using Fig. 3, it is possible to compute a speed-up for different levels of accuracy. Table 1 summarizes the speed-up calculations for the maximum level of accuracy that was achieved for the drag and lift coefficients.

**Fig. 3** Relative error in the drag and lift coefficients for different methods for the flow past the NACA0012 airfoil. The blue lines represent uniform refinement, and the red lines represent the *τ*ˆ-based p-adaptation procedure with *Nmax* = 7. (**a**) Drag error vs. DOFs; (**b**) lift error vs. DOFs; (**c**) drag error vs. CPU-time; (**d**) lift error vs. CPU-time

**Table 1** Computation times and speed-up for the different methods after converging until $\|\mathbf{r}\|\_\infty < 10^{-9}$


Figure 4 shows the distribution of polynomial orders after the single-stage adaptation procedure for a threshold of $\tau\_{max} = 5 \times 10^{-4}$, which has associated errors of $e\_{drag}^{N=8} = 4.10 \times 10^{-5}$ and $e\_{lift}^{N=8} = 7.31 \times 10^{-5}$. As can be observed, the elements that are enriched are mainly the ones in the boundary layer (especially at the leading and trailing edges), and in the zones of the wake where the element size changes significantly.

**Fig. 4** Polynomial order distribution after the anisotropic p-adaptation. $N\_{average} = (N\_1 + N\_2)/2$

#### **4 Conclusions**

In this work, we have applied recently developed error estimators and anisotropic p-adaptation methods in conjunction with multigrid solving strategies for solving the compressible Navier-Stokes equations. In particular, we have shown that the coupling of anisotropic truncation error-based p-adaptation methods with p-multigrid schemes can speed up the computation of steady-state solutions of PDEs. The achieved speed-up depends on the desired accuracy, with the method being most advantageous when high accuracy is required (low errors). In particular, a speed-up of 16.13 with respect to the uniformly adapted representation without multigrid was achieved for the computation of the steady compressible viscous flow past a NACA0012 airfoil at an angle of attack of 5◦.

**Acknowledgements** This project has received funding from the European Union's Horizon 2020 Research and Innovation Program under the Marie Skłodowska-Curie grant agreement No 675008 for the SSeMID project. The authors acknowledge the computer resources and technical assistance provided by the *Centro de Supercomputación y Visualización de Madrid* (CeSViMa).

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **A Spectral Element Reduced Basis Method for Navier–Stokes Equations with Geometric Variations**

**Martin W. Hess, Annalisa Quaini, and Gianluigi Rozza**

### **1 Introduction and Motivation**

Spectral element methods (SEM) use high-order polynomial ansatz functions to solve partial differential equations (PDEs) in all fields of science and engineering, see, e.g., [4–7, 12, 16] and references therein for an overview. Typically, an exponential error decay under p-refinement is observed, which can provide an enhanced accuracy over standard finite element methods at the same computational cost. In the following, we assume that the discretization error is much smaller than the model reduction error, small enough not to interfere with our results. In general, this needs to be established with the use of suitable error estimation and adaptivity techniques.

We consider the flow through a channel with a narrowing of variable height. A reduced order model (ROM), which accurately approximates the high-order solutions for the parameter range of interest (i.e., the different narrowing heights under consideration), is computed from a few high-order SEM solves. Since the parametric variations are affine, a mapping to a reference domain is applied without further interpolation techniques. The focus of this work is to show how to use simulations arising from the SEM solver Nektar++ [3] in a ROM context. In particular, the multilevel static condensation of the high-order solver is not applied; instead, the ROM projection works with the system matrices in local coordinates. See [12] for further details. This is in contrast to our previous work [8], since numerical

M. W. Hess (✉) · G. Rozza
SISSA mathLab, International School for Advanced Studies, Trieste, Italy
e-mail: mhess@sissa.it; gianluigi.rozza@sissa.it

A. Quaini
Department of Mathematics, University of Houston, Houston, TX, USA
e-mail: quaini@math.uh.edu

© The Author(s) 2020

S. J. Sherwin et al. (eds.), *Spectral and High Order Methods for Partial Differential Equations ICOSAHOM 2018*, Lecture Notes in Computational Science and Engineering 134, https://doi.org/10.1007/978-3-030-39647-3\_45

experiments have shown that the multilevel static condensation is inefficient in a ROM context. Additionally, we consider affine geometry variations. With SEM as the discretization method, we use global approximation functions for both the high-order and the reduced-order methods. The ROM techniques described in this paper are implemented in the open-source project ITHACA-SEM.<sup>1</sup>

The outline of the paper is as follows. In Sect. 2, the model problem is defined and the geometric variations are introduced. Section 3 provides details on the spectral element discretization, while Sect. 4 describes the model reduction approach and shows the affine mapping to the reference domain. Numerical results are given in Sect. 5, while Sect. 6 summarizes the work and points out future perspectives.

#### **2 Problem Formulation**

Let $\Omega \subset \mathbb{R}^2$ be the computational domain. Incompressible, viscous fluid motion in the spatial domain $\Omega$ over a time interval $(0, T)$ is governed by the incompressible *Navier-Stokes* equations with vector-valued velocity **u**, scalar-valued pressure *p*, kinematic viscosity *ν* and a body forcing **f**:

$$\frac{\partial \mathbf{u}}{\partial t} + \mathbf{u} \cdot \nabla \mathbf{u} = -\nabla p + \nu \Delta \mathbf{u} + \mathbf{f},\tag{1}$$

$$
\nabla \cdot \mathbf{u} = 0.\tag{2}
$$

Boundary and initial conditions are prescribed as

$$\mathbf{u} = \mathbf{d} \quad \text{on } \Gamma\_D \times (0, T), \tag{3}$$

$$
\nabla \mathbf{u} \cdot \mathbf{n} = \mathbf{g} \quad \text{on } \Gamma\_N \times (0, T), \tag{4}
$$

$$\mathbf{u} = \mathbf{u}\_0 \quad \text{in } \Omega \times \{0\},\tag{5}$$

with **d**, **g** and $\mathbf{u}\_0$ given, and $\partial\Omega = \Gamma\_D \cup \Gamma\_N$, $\Gamma\_D \cap \Gamma\_N = \emptyset$. The *Reynolds* number *Re*, which characterizes the flow [11], depends on *ν*, a characteristic velocity *U*, and a characteristic length *L*:

$$Re = \frac{UL}{\nu}.\tag{6}$$

We are interested in computing the steady states, i.e., solutions where $\partial\mathbf{u}/\partial t$ vanishes. The high-order simulations are obtained through time-advancement, while the ROM solutions are obtained with a fixed-point iteration.

<sup>1</sup>https://github.com/mathLab/ITHACA-SEM.

#### *2.1 Oseen-Iteration*

The *Oseen*-iteration is a secant modulus fixed-point iteration, which in general exhibits a linear rate of convergence [2]. Given a current iterate (or initial condition) $\mathbf{u}^k$, the next iterate $\mathbf{u}^{k+1}$ is found by solving the linear system:

$$\begin{aligned} -\nu \Delta \mathbf{u}^{k+1} + (\mathbf{u}^k \cdot \nabla) \mathbf{u}^{k+1} + \nabla p &= \mathbf{f} \text{ in } \Omega, \\ \nabla \cdot \mathbf{u}^{k+1} &= 0 \text{ in } \Omega, \\ \mathbf{u}^{k+1} &= \mathbf{d} \quad \text{on } \Gamma\_D, \\ \nabla \mathbf{u}^{k+1} \cdot \mathbf{n} &= \mathbf{g} \quad \text{on } \Gamma\_N. \end{aligned}$$

Iterations are typically stopped when the relative difference between iterates falls below a predefined tolerance in a suitable norm, such as the $L^2(\Omega)$ or $H^1\_0(\Omega)$ norm.
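The fixed-point structure of the Oseen iteration can be illustrated on a 1D convection-diffusion analogue, $-\nu u'' + u^k u' = f$ with homogeneous Dirichlet conditions, discretized by central finite differences. This is a sketch of the iteration and stopping criterion only, not the spectral element discretization of the paper; all names and parameters are illustrative.

```python
import numpy as np

def oseen_iteration_1d(nu=1.0, n=50, tol=1e-10, max_iter=100):
    """Oseen-type fixed-point iteration on a 1D analogue:
    -nu*u'' + u_k*u' = f, u(0) = u(1) = 0, linearized about the previous
    iterate and stopped on the relative difference between iterates."""
    h = 1.0 / (n + 1)
    f = np.ones(n)                  # simple body forcing
    u = np.zeros(n)                 # initial iterate
    for _ in range(max_iter):
        # Assemble -nu*u'' + u_k*u' with central differences about u_k = u.
        main = 2 * nu / h**2 * np.ones(n)
        lower = -nu / h**2 - u[1:] / (2 * h)
        upper = -nu / h**2 + u[:-1] / (2 * h)
        A = np.diag(main) + np.diag(lower, -1) + np.diag(upper, 1)
        u_new = np.linalg.solve(A, f)
        # Relative-difference stopping criterion (discrete l2 norm).
        if np.linalg.norm(u_new - u) < tol * max(np.linalg.norm(u_new), 1.0):
            return u_new
        u = u_new
    return u
```

Each step solves a linear system whose convection coefficient is frozen at the previous iterate, exactly mirroring the linearized convective term above.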

#### *2.2 Model Description*

We consider the reference computational domain shown in Fig. 1, which is decomposed into 36 triangular spectral elements. The spectral element expansion uses modal Legendre polynomials of the Koornwinder-Dubiner type of order *p* = 11 for the velocity. Details on the discretization method can be found in Chapter 3.2 of [12]. The pressure *ansatz* space is chosen to be of order *p* − 2 to fulfill the inf-sup stability condition [1, 20]. A parabolic inflow profile is prescribed at the inlet (i.e., *x* = 0) with horizontal velocity component $u\_x(0, y) = y(3 - y)$ for *y* ∈ [0*,* 3]. At the outlet (i.e., *x* = 8) we impose a stress-free boundary condition; everywhere else we prescribe a no-slip condition.

The height of the narrowing in the reference configuration is *μ* = 1, extending from *y* = 1 to *y* = 2; see Fig. 1. The parameter *μ* is considered variable in the interval *μ* ∈ [0*.*1*,* 2*.*9]. The narrowing is shrunk or expanded so as to keep the geometry

**Fig. 1** Reference computational domain for the channel flow, divided into 36 triangles

**Fig. 2** Full order, steady-state solution for *μ* = 1: velocity in x-direction (top) and y-direction (bottom)

**Fig. 3** Full order, steady-state solution for *μ* = 0*.*1: velocity in x-direction (top) and y-direction (bottom)

symmetric about the line *y* = 1*.*5. Figures 2, 3, and 4 show the velocity components close to the steady state for *μ* = 1*,* 0*.*1*,* and 2*.*9, respectively.

The viscosity is kept constant at *ν* = 1. For these simulations, the *Reynolds* number (6) is between 5 and 10, taking the maximum velocity in the narrowing as the characteristic velocity *U* and the height of the narrowing as the characteristic length *L*. For larger *Reynolds* numbers (about 30), a supercritical pitchfork bifurcation occurs, giving rise to the so-called *Coanda* effect [8, 9, 22], which is not the subject of the current study. Our model is similar to the model considered in [17, 18], i.e. an

**Fig. 4** Full order, steady-state solution for *μ* = 2*.*9: velocity in x-direction (top) and y-direction (bottom)

expansion channel with an inflow profile of varying height. However, in [18] the computational domain itself does not change.

#### **3 Spectral Element Full Order Discretization**

The *Navier-Stokes* problem is discretized with the spectral element method. The spectral/hp element software framework used is Nektar++ in version 4.4.0.<sup>2</sup> The discretized system of size $N\_\delta$ to solve at each step of the *Oseen*-iteration for fixed *μ* can be written as

$$
\begin{bmatrix}
A & -D\_{bnd}^T & B \\
-D\_{bnd} & 0 & -D\_{int} \\
\tilde{B}^T & -D\_{int}^T & C
\end{bmatrix}
\begin{bmatrix}
\mathbf{v}\_{bnd} \\
\mathbf{p} \\
\mathbf{v}\_{int}
\end{bmatrix} = \begin{bmatrix}
\mathbf{f}\_{bnd} \\
\mathbf{0} \\
\mathbf{f}\_{int}
\end{bmatrix},
\tag{7}
$$

where $\mathbf{v}\_{bnd}$ and $\mathbf{v}\_{int}$ denote velocity degrees of freedom on the boundary and in the interior of the domain, respectively, while **p** denotes the pressure degrees of freedom. The forcing terms on the boundary and interior are denoted by $\mathbf{f}\_{bnd}$ and $\mathbf{f}\_{int}$, respectively. The matrix *A* assembles the boundary-boundary coupling, *B* the boundary-interior coupling, $\tilde{B}$ the interior-boundary coupling, and *C* assembles the interior-interior coupling of elemental velocity *ansatz* functions. In the case of a *Stokes* system, it holds that $B = \tilde{B}$, but this is not the case for the *Oseen* equation because of the linearized convective term. The matrices $D\_{bnd}$

<sup>2</sup>See www.nektar.info.

and *Dint* assemble the pressure-velocity boundary and pressure-velocity interior contributions, respectively.

The linear system (7) is assembled in local degrees of freedom, resulting in block matrices $A, B, \tilde{B}, C, D\_{bnd}$ and $D\_{int}$, each block corresponding to a spectral element. This allows for an efficient matrix assembly, since each spectral element is independent of the others, but makes the system singular. In order to solve the system, the local degrees of freedom need to be gathered into the global degrees of freedom [12].

The high-order element solver Nektar++ uses multilevel static condensation for the solution of linear systems like (7). Since static condensation introduces intermediate parameter-dependent matrix inversions (such as $C^{-1}$ in this case), several intermediate projection spaces need to be introduced to use model order reduction [8]. This can be avoided by instead projecting the expanded system (7) directly. The internal degrees of freedom do not need to be gathered, since they are the same in local and global coordinates. Only *ansatz* functions extending over multiple spectral elements need to be gathered.

Next, we will take the boundary-boundary coupling across element interfaces into account. Let *M* denote the rectangular matrix which gathers the local boundary degrees of freedom into global boundary degrees of freedom. Multiplication of the first row of (7) by *M<sup>T</sup> M* will then set the boundary-boundary coupling in local degrees of freedom:

$$
\begin{bmatrix}
M^T M A & -M^T M D\_{bnd}^T & M^T M B \\
-D\_{bnd} & 0 & -D\_{int} \\
\tilde{B}^T & -D\_{int}^T & C
\end{bmatrix}
\begin{bmatrix}
\mathbf{v}\_{bnd} \\
\mathbf{p} \\
\mathbf{v}\_{int}
\end{bmatrix} = \begin{bmatrix}
M^T M \mathbf{f}\_{bnd} \\
\mathbf{0} \\
\mathbf{f}\_{int}
\end{bmatrix}.\tag{8}
$$

The action of the matrix in (8) on the degrees of freedom on the Dirichlet boundary is computed and added to the right-hand side. Such degrees of freedom are then removed from (8). The resulting system, of high-order dimension $N\_\delta \times N\_\delta$ and depending on the parameter *μ*, can then be used in a projection-based ROM context [13]:

$$\mathcal{A}(\mu)\mathbf{x}(\mu) = \mathbf{f}.\tag{9}$$
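The elimination of Dirichlet degrees of freedom sketched above (move the action of the operator on known boundary values to the right-hand side, then drop those rows and columns) can be written in a few lines. This is a dense-matrix illustration with hypothetical names, not the block-structured implementation of the solver.

```python
import numpy as np

def lift_dirichlet(A, f, dir_idx, dir_vals):
    """Add the action of A on known Dirichlet DOFs to the right-hand side,
    then remove those rows and columns from the system."""
    free = np.setdiff1d(np.arange(A.shape[0]), dir_idx)
    # Right-hand side correction: subtract A_fd @ u_d from the free rows.
    f_free = f[free] - A[np.ix_(free, dir_idx)] @ dir_vals
    return A[np.ix_(free, free)], f_free, free
```

Solving the reduced system and re-inserting the prescribed values reproduces a solution satisfying the free rows of the original system.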

#### **4 Reduced Order Model**

The reduced order model (ROM) computes accurate approximations to the high-order solutions in the parameter range of interest, while greatly reducing the overall computational time. This is achieved by two ingredients. First, a few high-order solutions are computed and the most significant proper orthogonal decomposition (POD) modes are obtained [13]. These POD modes define the reduced order ansatz space of dimension *N*, in which the system is solved. Second, to reduce the computational time, an offline-online computational procedure is used. See Sect. 4.1.

The POD computes a singular value decomposition of the snapshot solutions and retains the most dominant modes capturing 99*.*99% of the snapshot energy [10]. These modes define the projection matrix $U \in \mathbb{R}^{N\_\delta \times N}$ used to project system (9):

$$U^T \mathcal{A}(\mu) U \mathbf{x}\_N(\mu) = U^T \mathbf{f}. \tag{10}$$

The low order solution $\mathbf{x}\_N(\mu)$ then approximates the high order solution as $\mathbf{x}(\mu) \approx U\mathbf{x}\_N(\mu)$.
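A minimal NumPy sketch of this POD-Galerkin projection follows: build the basis from snapshots via a thin SVD with the 99.99% energy criterion, then solve the projected system (10). Generic dense matrices and illustrative names are used; this is not the ITHACA-SEM API.

```python
import numpy as np

def pod_basis(snapshots, energy=0.9999):
    """POD basis from a snapshot matrix (n_dof x n_snapshots) via thin SVD,
    retaining the most dominant modes up to the given energy fraction."""
    U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
    cumulative = np.cumsum(s**2) / np.sum(s**2)
    N = int(np.searchsorted(cumulative, energy)) + 1
    return U[:, :N]

def rom_solve(A, f, U):
    """Galerkin projection of A x = f onto the POD space: solve the small
    N x N system, then lift back to high-order coordinates."""
    x_N = np.linalg.solve(U.T @ A @ U, U.T @ f)
    return U @ x_N
```

When the right-hand side and solution lie in the span of the snapshots, the ROM reproduces the full-order solve exactly; otherwise it returns the Galerkin-optimal approximation in the POD space.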

#### *4.1 Offline-Online Decomposition*

The offline-online decomposition [10] enables the computational speed-up of the ROM approach in many-query scenarios. It relies on an affine parameter dependency, such that all computations depending on the high-order model size can be moved into a parameter-independent offline phase, while having a fast input-output evaluation online.

In the example under consideration here, the parameter dependency is already affine and a mapping to the reference domain can be established without using an approximation technique such as the empirical interpolation method. Thus, there exists an affine expansion of the system matrix A*(μ)* in the parameter *μ* as

$$\mathcal{A}(\mu) = \sum\_{l=1}^{Q} \Theta\_l(\mu) \mathcal{A}\_l. \tag{11}$$

The coefficients $\Theta\_l(\mu)$ are computed from the mapping $\mathbf{x} = T\_k(\mu)\hat{\mathbf{x}} + \mathbf{g}\_k$, $T\_k \in \mathbb{R}^{2\times 2}$, $\mathbf{g}\_k \in \mathbb{R}^2$, which maps the deformed subdomain $\hat{\Omega}\_k$ to the reference subdomain $\Omega\_k$. See also [19, 21]. Figure 5 shows the reference subdomains $\Omega\_k$ for the problem under consideration.

For each subdomain $\hat{\Omega}\_k$ the elemental basis function evaluations are transformed to the reference domain. For velocity basis functions $\mathbf{u} = (u\_1, u\_2)$, $\mathbf{v} = (v\_1, v\_2)$, $\mathbf{w} = (w\_1, w\_2)$ and each (scalar) pressure basis function *ψ*, we can write the transformations, using the summation convention, as:

$$
\int\_{\hat{\Omega}\_k} \frac{\partial \hat{\mathbf{u}}}{\partial \hat{x}\_l}\, \hat{\nu}\_{lj}\, \frac{\partial \hat{\mathbf{v}}}{\partial \hat{x}\_j}\, d\hat{\Omega}\_k = \int\_{\Omega\_k} \frac{\partial \mathbf{u}}{\partial x\_l}\, \nu\_{lj}\, \frac{\partial \mathbf{v}}{\partial x\_j}\, d\Omega\_k,
$$

$$
\int\_{\hat{\Omega}\_k} \hat{\psi}\, \nabla \cdot \hat{\mathbf{u}}\, d\hat{\Omega}\_k = \int\_{\Omega\_k} \psi\, \chi\_{lj}\, \frac{\partial u\_j}{\partial x\_l}\, d\Omega\_k,
$$

$$
\int\_{\hat{\Omega}\_k} (\hat{\mathbf{u}} \cdot \nabla) \hat{\mathbf{v}} \cdot \hat{\mathbf{w}}\, d\hat{\Omega}\_k = \int\_{\Omega\_k} u\_l\, \pi\_{lj}\, \frac{\partial v\_j}{\partial x\_l}\, \mathbf{w}\, d\Omega\_k,
$$

**Fig. 5** Reference computational domain with subdomains <sup>1</sup> (green), <sup>2</sup> (yellow), <sup>3</sup> (blue), <sup>4</sup> (grey) and <sup>5</sup> (brown)

with

$$\begin{aligned} \nu\_{lj} &= T\_{ll'}\, \hat{\nu}\_{l'j'}\, T\_{jj'}\, \det(T)^{-1}, \\ \chi\_{lj} &= \pi\_{lj} = T\_{lj}\, \det(T)^{-1}. \end{aligned}$$

The subdomain $\Omega\_5$ (see Fig. 5) is kept constant, so that no interpolation of the inflow profile is necessary. To achieve fast reduced order solves, the offline-online decomposition expands the system matrix as in (11) and computes the parameter-independent projections offline, which are stored as small matrices of order *N* × *N*. Since in an *Oseen*-iteration each matrix depends on the previous iterate, the submatrices corresponding to each basis function are preassembled, and the reduced operator is then formed online using the reduced basis coordinate representation of the current iterate. This is the same procedure used for the assembly of the nonlinear term in the *Navier-Stokes* case [13].
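The offline-online split for the affine expansion (11) can be sketched generically: project each parameter-independent term once offline, then assemble and solve the small reduced system online for any new parameter. This is an illustrative skeleton; the paper's operators additionally carry the Oseen dependence on the previous iterate.

```python
import numpy as np

def offline(A_terms, U):
    """Offline phase: project each parameter-independent term A_l once.
    Each projected term is a small N x N matrix."""
    return [U.T @ A_l @ U for A_l in A_terms]

def online(theta, A_r_terms, f_r):
    """Online phase: assemble sum_l theta_l(mu) * A_l^r and solve the
    reduced N x N system; cost is independent of the high-order size."""
    A_r = sum(t * A_l for t, A_l in zip(theta, A_r_terms))
    return np.linalg.solve(A_r, f_r)
```

By linearity, projecting each term and summing online gives the same reduced operator as projecting the assembled full matrix, which is what makes the split exact for affine expansions.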

#### **5 Numerical Results**

The accuracy of the ROM is assessed using 40 snapshots sampled uniformly over the parameter domain [0*.*1*,* 2*.*9] for the POD and 40 randomly chosen parameter locations to test the accuracy. Figure 6 (left) shows the decay of the energy of the POD modes. To reach the typical threshold of 99*.*99% of the POD energy, 9 POD modes are needed as RB ansatz functions. Figure 6 (right) shows the relative $L^2(\Omega)$ approximation error of the reduced order model with respect to the full order model up to 6 digits of accuracy, evaluated at the 40 randomly chosen verification parameter locations. With 9 POD modes the maximum approximation error is less than 0*.*7% and the mean approximation error is less than 0*.*5%.

While the full-order solves were computed with Nektar++, the reduced-order computations were done in ITHACA-SEM with a separate Python code. To assess the computational gain, the time for a fixed-point iteration step using the full-

**Fig. 6** Left: Decay of POD mode energy. Right: Maximum (red) and mean (blue) relative *L*2*()* error for the velocity over increasing reduced basis dimension

order system is compared to the time for a fixed-point iteration step of the ROM with dimension 20, both done in Python. The ROM online phase reduces the computational time by a factor of over 100. The offline time is dominated by computing the snapshots and the trilinear forms used to project the advection terms. See [13] for detailed explanations.

#### **6 Conclusion and Outlook**

We showed that the POD reduced basis technique generates accurate reduced order models for SEM-discretized models under parametric variation of the geometry. The potential of a high-order spectral element method combined with a reduced basis ROM is the subject of current investigations; see also [6]. Since each spectral element comprises a block in the system matrix in local coordinates, a variant of the reduced basis element method (RBEM) [14, 15] could be applied in the future.

**Acknowledgements** This work was supported by European Union Funding for Research and Innovation through the European Research Council (project H2020 ERC CoG 2015 AROMA-CFD project 681447, P.I. Prof. G. Rozza). This work was also partially supported by NSF through grant DMS-1620384 (Prof. A. Quaini).

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Iterative Spectral Mollification and Conjugation for Successive Edge Detection**

**Robert E. Tuzun and Jae-Hun Jung**

#### **1 Introduction**

Detection of edges is a fundamental problem in a variety of applications, including image processing and the numerical solution of differential equations. In applications such as magnetic resonance imaging (MRI), it is required to construct images from Fourier data. Let $\{\hat{f}\_k \,|\, k = 0, \pm 1, \pm 2, \cdots\}$ be the set of Fourier coefficients of $f(x) \in L^2[-\pi, \pi]$ given by

$$
\hat{f}\_k = \frac{1}{2\pi} \int\_{-\pi}^{\pi} f(x)\, e^{-ikx}\, dx,
$$

and let $f\_N$ be the Fourier partial sum $f\_N = \sum\_{k=-N}^{N} \hat{f}\_k e^{ikx}$. When the underlying function is smooth and periodic, the Fourier reconstruction $f\_N$ is spectrally accurate, but when edges are present, the reconstruction is plagued by the Gibbs phenomenon, also known as Gibbs ringing in MRI applications.

Various methods have been proposed to address these issues; these methods consist of edge detection followed by reconstruction. Thus, the determination of edge locations is critical. The Fourier concentration method has emerged over the past

R. E. Tuzun

Department of Mathematics, University at Buffalo, The State University of New York, Buffalo, NY, USA

e-mail: retuzun@buffalo.edu

J.-H. Jung (-)

Department of Mathematics, University at Buffalo, The State University of New York, Buffalo, NY, USA

Department of Data Science, Ajou University, Suwon, South Korea e-mail: jaehun@buffalo.edu

S. J. Sherwin et al. (eds.), *Spectral and High Order Methods for Partial Differential Equations ICOSAHOM 2018*, Lecture Notes in Computational Science and Engineering 134, https://doi.org/10.1007/978-3-030-39647-3\_46

decade as a robust method for edge detection in a variety of circumstances and applications [5, 6]. Essentially, a certain Fourier partial sum converges to the jump function as the number of Fourier coefficients increases and this convergence can be accelerated by what is known as concentration factors (functions). Use of different types of concentration factors tends to impart trade-offs between oscillations near jump discontinuities and significant non-zero concentration away from them [2]. Several methods have been devised to address this issue, as well as to treat special circumstances such as incomplete Fourier data and the presence of noise [1, 4, 13, 15].

Thanks to the convergence of the concentration sum to the jump function, the concentration method detects edges as locations of large concentration. Where the function is smooth, the concentration vanishes as *N* → ∞, just as the jump function does. In practice, the concentration method is designed to detect edges with magnitudes larger than some given threshold, the value of which is problem dependent. The threshold cannot be made arbitrarily small; otherwise, too many false edges are detected. If the magnitude of weak edges is much smaller than that of other edges, those edges are considered insignificant, but in some cases weak edges are more important than strong edges. For example, it was shown that in the segmentation of MRI of the knee, the cartilage is better characterized by weak edges than by strong edges for its separation from the tibia and femur [11].

This note shows that an iterative approach based on successive conjugation and adaptive mollification can detect all edges without any prior threshold. This approach is similar to the iterative method developed in the context of the radial basis function method [3, 9, 10]. The iterative method is as follows: at each iteration step, all previously found edges are smoothed by a local mollification and the corresponding new Fourier coefficients are computed. By applying conjugation and mollification successively, one can distinguish real edges from false edges. This approach is particularly useful and effective for problems where a weak jump can significantly affect the global solution of differential equations, or for images where the interesting structure is represented by weak edges [11].

In Sect. 2, a brief explanation of the Fourier concentration method is given. In Sect. 3, the proposed iterative method is explained, based on the adaptive filtering method; the stopping criterion is also explained, and numerical examples with remarks are given. In Sect. 4, a brief concluding remark is provided.

#### **2 Edge Detection Using Fourier Concentration Method**

Let $[f](x) = f(x^+) - f(x^-)$ denote the jump function of $f(x) \in L^2[-\pi, \pi]$, where the superscripts + and − denote the limits taken from the right and left, respectively. Given a finite set of Fourier coefficients, $\{\hat{f}\_k\}\_{|k| \le N}$, the Fourier concentration method, developed in [5, 6], computes the concentration as a sum of the form

$$S\_N^\sigma[f](x) = i \sum\_{|k| \le N} \text{sgn}(k)\, \hat{f}\_k\, \sigma\left(\frac{|k|}{N}\right) e^{ikx} \tag{1}$$

where the *σ (*·*)* are known as concentration factors and *sgn(k)* is the sign function. Given certain admissibility conditions [6], the sum converges to the jump function:

$$S\_N^\sigma[f](x) = [f](x) + \begin{cases} O\left(\frac{\log N}{N}\right), & d(x) \le \frac{\log N}{N} \\ O\left(\frac{\log N}{(Nd(x))^s}\right), & d(x) \gg N^{-1} \end{cases} \tag{2}$$

where *d(x)* denotes the distance to the nearest edge and *s* depends on the concentration factor. Here we note that Eq. (2) shows that the concentration function $S\_N^\sigma[f](x)$ recovers the jump function of *f(x)* as *N* → ∞, and that the convergence may be slow. Equation (2) also implies that the absolute maximum value of the concentration function $S\_N^\sigma[f](x)$ converges to the maximum jump. Accordingly, we observe that strong jumps are relatively easier to detect than weak jumps. Common types of concentration factors satisfying the admissibility conditions are polynomial concentration factors

$$
\sigma(\eta) = p\eta^p, \quad p \ge 1,\tag{3}
$$

where *p* is a positive integer and *η* = |*k*|*/N*, and exponential concentration factors

$$
\sigma(\eta) = C \eta\, e^{1/(\alpha\eta(\eta-1))},\tag{4}
$$

where *α >* 0 is the order and *C* is a normalization constant. Cutoffs for edge detection, *τ* ∈ *(*0*,* 1], are with respect to the normalized concentration

$$\hat{S}(y) = |S\_N^\sigma[f](y)| \,/\, \max\_{y} \{ |S\_N^\sigma[f](y)| \}$$

and the edge set, *E*, is defined as

$$E = \{ x \,|\, \hat{S}(x) \ge \tau,\; x \in [-\pi, \pi] \}. \tag{5}$$
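Equations (1), (3) and (5) translate almost directly into NumPy. The sketch below is illustrative: the quadrature used for the Fourier coefficients and all parameter choices are assumptions for the example, and normalization constants of the concentration factor are ignored since only peak locations matter for detection.

```python
import numpy as np

def fourier_coeffs(f, N, M=4096):
    """Approximate f_hat_k on [-pi, pi] for |k| <= N via an equispaced
    (trapezoidal/periodic) quadrature rule."""
    x = np.linspace(-np.pi, np.pi, M, endpoint=False)
    k = np.arange(-N, N + 1)
    fhat = (f(x)[None, :] * np.exp(-1j * k[:, None] * x[None, :])).mean(axis=1)
    return fhat, k

def concentration(fhat, k, x, p=1):
    """S_N^sigma[f](x) of Eq. (1) with the polynomial factor sigma = p*eta**p."""
    N = k.max()
    sigma = p * (np.abs(k) / N) ** p
    terms = 1j * np.sign(k) * fhat * sigma
    return (terms[:, None] * np.exp(1j * k[:, None] * x[None, :])).sum(axis=0).real

def edge_set(S, x, tau=0.4):
    """Edge set E of Eq. (5): points where the normalized |S| exceeds tau."""
    S_hat = np.abs(S) / np.abs(S).max()
    return x[S_hat >= tau]
```

For a step function, the concentration peaks at the jump location; weak jumps produce proportionally smaller peaks, which is exactly the thresholding difficulty discussed below.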

Several approaches have been developed for improving the concentration method; we refer readers to [1, 2, 4, 6, 7, 12, 13, 15, 16]. All of these methods essentially utilize the edge map. Figure 1 shows edge detection by the Fourier concentration method for $f\_1(x)$, defined by

$$f\_1(x) = \begin{cases} 3, & \frac{\pi}{4} \le x \le \frac{\pi}{2} \\ -2, & \pi \le x \le \frac{5}{4}\pi \\ m\_w, & \frac{3}{2}\pi \le x \le \frac{7}{4}\pi \\ 0, & \text{otherwise} \end{cases} \tag{6}$$

**Fig. 1** Edge detection for $f\_1(x)$ with *τ* = 0*.*4 and *N* = 128. (**a** and **b**) With the polynomial concentration factor, *p* = 5. (**c** and **d**) With the exponential concentration factor. (**a** and **c**) The normalized concentration $\hat{S}(y)$. (**b** and **d**) The detected edge locations, marked by red cross symbols

where $m\_w = 0.1$; that is, the magnitude of the strongest jump is 30 times that of the weakest. As clearly shown in the figure, it is hard to detect the weak edge by looking at the normalized Fourier concentration $\hat{S}(y)$ ((a) and (c) in Fig. 1), although the weak edges are clearly visible in (b) and (d) of Fig. 1.

#### **3 Iterative Concentration Method**

As clearly seen in Fig. 1, the Fourier concentration method may fail to detect the weak edge when the concentration of the weak edge is too small compared to that of the strong edge. To find all edges, we propose to apply the Fourier concentration method iteratively, using local mollification based on the local adaptive filtering method.

#### *3.1 Local Adaptive Mollification*

The local adaptive mollification is the key step of the iterative algorithm. Consider a compactly supported smooth function $\phi \in C\_0^\infty[0, 2\pi]$, scaled such that

$$
\phi\_{\epsilon}(\mathbf{x}) = \frac{1}{\epsilon} \phi \left( \frac{\mathbf{x}}{\epsilon} \right), \tag{7}
$$

where $\lim\_{\epsilon \to 0^+} \phi\_{\epsilon}(x) = \delta(x)$, with $\delta(x)$ the Dirac delta function, and furthermore $\int\_{-\infty}^{\infty} \phi(x)\, dx = 1$. With these properties, the limit property is given by

$$\lim\_{\epsilon \to 0^+} (\phi\_\epsilon \* f)(x) = f(x),$$

where ∗ denotes convolution. The parameter *ε* is free; it localizes the convolution and is known as the localization factor. If *ε* is fixed to the same value for every *x*, a global smoothing occurs everywhere, in both the nonsmooth and the smooth regions. However, we want to apply the mollification only locally, to minimize the Gibbs oscillations near the jump. To achieve this, we use the two-parameter family of spectral mollifiers introduced by Gottlieb and Tadmor [8]. Consider the convolution of the Fourier partial sum *fN (x)* with the mollifier *φ*. By the definitions of *fN (x)* and *φ* we have

$$(\phi\_{\epsilon} \* f\_{N})(x) = \frac{1}{2\pi} \int\_{0}^{2\pi} \phi\_{\epsilon}(x - y)\, f\_{N}(y)\, dy = \frac{1}{(2\pi)^2} \int\_{0}^{2\pi}\!\!\int\_{0}^{2\pi} \phi\_{\epsilon}(x - y)\, D\_{N}(y - z)\, f(z)\, dz\, dy,$$

where *DN* is the Dirichlet kernel of degree *N*. The idea proposed in [8] is to vary the degree of the Dirichlet kernel with the localization parameter *ε*, so that the two-parameter family of new mollifiers is defined by

$$\phi\_{p,\epsilon}(\mathbf{x}) = \frac{1}{\epsilon} \rho \left(\frac{\mathbf{x}}{\epsilon}\right) D\_p \left(\frac{\mathbf{x}}{\epsilon}\right), \tag{8}$$

where *Dp* is the Dirichlet kernel of degree *p*. Then for all *s*, the error is given by [8]

$$|\phi\_{p,\epsilon} \* f\_N(x) - f(x)| \le C\|\rho^{(s)}\|\_{\infty} \left[ N\left(\frac{1+p}{N\epsilon}\right)^{s+1} + p\left(\frac{2}{p}\right)^s \|f^{(s)}\|\_{L^{\infty}\_{loc}} \right],\tag{9}$$

where ‖ · ‖<sub>*L*<sup>∞</sup><sub>loc</sub></sub> = sup<sub>(*x*−*επ*, *x*+*επ*)</sub> | · |. The first term on the right-hand side of the above inequality is the truncation error and the second term is the regularization error. As we see, the optimization of the error is determined by how the localization parameter *ε* and the degree *p* are balanced. In [8] those parameters were chosen such that *ε* = *d/π*, where *d* is the distance from the current position to the nearest jump. The degree of the Dirichlet kernel is chosen such that spectral convergence is achieved, say *p* ≈ √*N*. We note that a modification of the two-parameter mollifier for enhanced convergence was proposed in [14]; it was designed to reduce the Gibbs oscillations while providing a sharp reconstruction up to the edge. Note also that the adaptive mollifier was used to sharpen the concentration map *Ŝ* in [2].
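Equation (8) can be sketched as follows (our own illustration; the cutoff *ρ* below is a generic *C*<sup>∞</sup> bump, an assumption on our part, and the 1/(2*π*) normalization of the Dirichlet kernel is our choice):

```python
import numpy as np

def dirichlet(x, p):
    """Dirichlet kernel of degree p: sin((p+1/2)x) / (2*pi*sin(x/2)).
    The 1/(2*pi) normalization is our convention."""
    small = np.abs(np.sin(x / 2)) < 1e-14
    num = np.sin((p + 0.5) * x)
    den = 2 * np.pi * np.sin(x / 2)
    # limit value (2p+1)/(2*pi) at the removable singularity x = 0
    return np.where(small, (2 * p + 1) / (2 * np.pi),
                    num / np.where(small, 1.0, den))

def rho(x):
    """C-infinity bump supported on (-1, 1) -- a common generic choice,
    standing in for the cutoff of [8]."""
    out = np.zeros_like(x)
    inside = np.abs(x) < 1
    out[inside] = np.exp(-1.0 / (1.0 - x[inside] ** 2))
    return out

def mollifier(x, p, eps):
    """Two-parameter spectral mollifier of Eq. (8): (1/eps) rho(x/eps) D_p(x/eps)."""
    return (1.0 / eps) * rho(x / eps) * dirichlet(x / eps, p)
```

By construction the mollifier vanishes outside |*x*| < *ε*, which is exactly the localization property the iterative method exploits.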

In our proposed iterative method, once an edge is identified, the edge region is first localized using the value of *ε*, so that *fN* away from the detected edge is not affected by the mollification. This helps preserve the next available edge, if one exists, through the mollification of *fN*. Thus, as in [8], the localization factor is a function of the distance from the edge, *d*, i.e. *ε* = *ε(d)*. We then adaptively mollify *fN*, applying a heavy mollification through *p* to reduce the Gibbs oscillations near the edge. The limit property of *p* is *p* → 0 as *d* → 0 and *p* → ∞ as *d* → 2*π*. In this work, we use local adaptive filtering for the mollification.

#### *3.2 Almost Automatic Stopping of the Iteration*

To see that the proposed method stops almost automatically, consider *f (x)* = *x*, *x* ∈ [−*π*, *π*], with the Fourier coefficients

$$
\hat{f}\_k = (-1)^k \frac{i}{k}, \quad k \neq 0,
$$

and *f̂*0 = 0. There are two edges (*x* = ±*π*), and the conjugation and the local adaptive filtering have the most effect at ±*π*. Therefore, considering the local behavior near ±*π*, we assume a constant order of filtering *p* and of conjugation *q*, with functional forms exp(−*ε*(1 − |*k/N*|)<sup>*p*</sup>) and exp(−*ε*|*k/N*|<sup>*q*</sup>), respectively. Letting *φp,ε* and *C<sup>σ</sup>N* be the corresponding kernels, and letting *S* and *F* denote conjugation and filtering, we have

$$C\_N^\sigma \* (\phi\_{p,\epsilon} \* f\_N) \approx \phi\_{p,\epsilon} \* (C\_N^\sigma \* f\_N)$$

and after some simplification, we have

$$S[F[f\_N]] = C\_N^\sigma \* (\phi\_{p,\epsilon} \* f\_N) \approx \sum\_{k=-N,\, k \neq 0}^{N} \frac{(-1)^{k+1}}{k} \exp\left(-\epsilon \left(1 - \frac{|k|}{N}\right)^p\right) \exp\left(-\epsilon \left|\frac{k}{N}\right|^q\right) e^{ikx}.$$

This has Fourier coefficients

$$\widehat{S[F[f\_N]]}\_k \approx \frac{(-1)^{k+1}}{k} \exp\left(-\epsilon \left(1 - \frac{|k|}{N}\right)^p\right) \exp\left(-\epsilon \left|\frac{k}{N}\right|^q\right).$$

From the sharp localization and heavy filtering near ±*π*, we take *p*, *q* → 0. Setting *y* = |*k/N*| then yields

$$|\widehat{S[F[f\_N]]}\_k| \approx \frac{1}{|k|} \exp\left[-\epsilon \left(y^q + (1 - y)^p\right)\right], \quad k \neq 0,$$

which approaches 0 exponentially. Thus, after all the edges have been found through the iteration, the concentration decays to an exponentially small level. Consequently, if the stopping criterion *η* below is chosen small enough, e.g. *η* ∼ 10<sup>−10</sup>, the stopping of the iteration is guaranteed:

$$|S[F[f\_N]]| \le \eta. \tag{10}$$
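A quick numerical check of this decay (our own sketch; the filter strength *ε* = 64 is an assumed value, and *p* = *q* = 10<sup>−3</sup> models the heavy-filtering regime *p*, *q* → 0):

```python
import numpy as np

N = 128
eps = 64.0    # assumed filter strength (cf. the exponential filter of Fig. 1)
p = q = 1e-3  # heavy localization/filtering: p, q -> 0
eta = 1e-10   # stopping tolerance of Eq. (10)

k = np.arange(1, N + 1)
y = k / N
# |S^[F[f_N]]_k| from the estimate above
coeff = np.exp(-eps * (y ** q + (1 - y) ** p)) / k

# once all edges are removed, the residual concentration is uniformly tiny,
# so criterion (10) is met for any reasonable eta
assert coeff.max() < eta
```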

#### *3.3 Numerical Examples*

We first consider the case where the magnitude of the weak edge in the function *f*1*(x)* of Eq. (6) is very small:

$$m\_w = 0.01.$$

Figure 2 shows how the iterative method finds the edges. The edges are found from left to right (see red arrows). As shown in the figure, the iterative method finds all edges even with *mw* = 0.01. It is interesting to observe that the weakest edges are found in the 3rd and 7th iteration steps, before all the strong edges have been found. We now consider an even smaller value of *mw*:

$$m\_w = 0.001.$$

**Fig. 2** *f*1*(x)* with *mw* = 0.01 and successive edge detection. Left: *Ŝ(y)* after each iteration, from left to right. Right: the actual function and the edges found in each iteration, marked by red cross symbols, with the weak edges circled in green. Note that the weak edges of *f*1*(x)* are almost invisible in the right figure

Figure 3 shows how the iterative method finds all edges in this case, with results similar to those of Fig. 2. As shown in the figure, the method is highly accurate and finds all edges, including the very weak ones.

As an application to the solution of PDEs, namely the shock-density wave interaction problem, we consider finding shocks in the density profile at *t* = 2, with *N* = 300 grid points, computed with the WENO-Z method used in [9]. The left two figures of Fig. 4 show the edges (shocks) found by the Fourier concentration method, while the right figure shows those found by the iterative method. As shown in the figure, the iterative method finds all the physical shocks accurately, while the Fourier concentration method misses some of them.

For two-dimensional examples, we consider a Shepp-Logan image, with a faint box added to provide additional weak edges, and a brain image. To detect edges in two dimensions, edges are detected slicewise in the *x* and *y* directions. The *x* and *y* coordinates range over [−*π*, *π*]. For a (2*Nx* + 1) × (2*Ny* + 1) image, slices of *f (x, y)* are taken at evenly spaced *x* and *y* with spacings Δ*x* = 2*π*/(2*Nx* + 1) and Δ*y* = 2*π*/(2*Ny* + 1), with −*π* included and *π* excluded. Within each slice, Fourier coefficients are computed by partial Fourier expansion and the iterative method is applied to find strong and weak edges. The parameters for the two-dimensional calculations were similar to those for the one-dimensional calculations. An edge with a concentration magnitude at or above a fraction *τ* = 0.1 of the maximum concentration magnitude was considered strong. To detect strong edges, trigonometric concentration factors with *α* = *π* were used. Figure 5 shows the edges found by the proposed method for the Shepp-Logan image. As seen in the figure, the weak edges (the square box with magnitude 0.01) are successfully found by the method.
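The slicewise strategy can be sketched as follows (our own illustration; `detect_edges_1d` is a hypothetical stand-in that thresholds first differences, replacing the full iterative concentration detector within each slice):

```python
import numpy as np

def detect_edges_1d(row, tau=0.1):
    """Placeholder 1-D edge detector: threshold |first difference| at a
    fraction tau of its maximum (stand-in for the iterative method)."""
    jump = np.abs(np.diff(row))
    if jump.max() == 0:
        return np.array([], dtype=int)
    return np.nonzero(jump >= tau * jump.max())[0]

def detect_edges_2d(image, tau=0.1):
    """Slice the image along x and y and collect (i, j) edge locations."""
    edges = set()
    for i, row in enumerate(image):        # slices at fixed y
        for j in detect_edges_1d(row, tau):
            edges.add((i, int(j)))
    for j, col in enumerate(image.T):      # slices at fixed x
        for i in detect_edges_1d(col, tau):
            edges.add((int(i), j))
    return edges

# toy image: a bright square on a dark background
img = np.zeros((32, 32))
img[8:24, 8:24] = 1.0
found = detect_edges_2d(img)
```

Each detected pair (*i*, *j*) marks a jump between neighboring pixels in one of the two slicing directions.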

*Remarks* First, the proposed method is affected by noise, as is the original Fourier concentration method. Consider Eq. (6) and let *f̂k(mw* = 0*)* denote the Fourier coefficients with *mw* = 0. Then *f̂k(mw)* − *f̂k(*0*)* = *mw*/8 for *k* = 0 and *f̂k(mw)* − *f̂k(*0*)* = (*mw*/(2*πk*))(sin(7*kπ*/4) − sin(3*kπ*/2)) for *k* ≠ 0. The weak edge thus shifts *f̂k* by an amount proportional to *mw*. We therefore expect that unless the *SNR* is high enough, |*f̂k(mw* = 0*)* − *f̂k(mw)*| easily becomes smaller than the noise as *mw* decays. Figure 6 shows the concentration with *mw* = 0 (left), with *SNR* = 20 (middle) and with *SNR* = 10 (right). As seen in the figure, the weak edges are

**Fig. 3** *f*1*(x)* with *mw* = 0*.*001 and successive edge detection. Detected edges are marked by red cross symbols with the weak edges circled in green. Note that the edges in *f*1*(x)* are almost invisible in the figure


**Fig. 4** The density *ρ* (*y*-axis) versus *x* (*x*-axis) at *t* = 1 for the shock-density wave interaction, with the shocks found (cross symbols) for different values of *τ* and *p*. Left two figures: the Fourier concentration method. Right: the iterative method

**Fig. 5** Original image (left) and detected edges (right) for the concentration method followed by the iterative method, applied to the Shepp-Logan image with a weak square edge of magnitude 0.01 added

**Fig. 6** Concentration in *y*-axis versus *x*. Left: without noise. Middle: *SNR*=20. Right: *SNR*=10

easily indistinguishable as *SNR* decreases. Since the main objective of this research is finding weak edges, a noise-reduction procedure suitable for the proposed method will be investigated in our future research. Second, the proposed method is designed to find

**Fig. 7** Left: edges computed with finite differences in the physical domain. Right: zoom on 3*π*/2 ≤ *x* ≤ 7*π*/4

the edges from *f̂k*. Figure 7 shows edge detection in the physical domain using forward differences of data generated from the Fourier coefficients (left figure). The right figure shows the plot over 3*π*/2 ≤ *x* ≤ 7*π*/4, where the weak edges exist. As the figures show, the weak edges are still hard to distinguish in the physical domain. By removing the strong edges in the Fourier domain and switching back and forth between the Fourier and physical domains, the proposed method eventually finds the weak edges.

**Summary** The proposed iterative concentration method proceeds as follows: apply the Fourier concentration method and mark the edges whose concentration exceeds the threshold; locally mollify *fN* around the detected edges by the local adaptive filtering of Sect. 3.1; recompute the concentration and repeat until the stopping criterion (10) is met. The procedure eventually stops for any non-zero value of *η* > 0 in Eq. (10).
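The loop can be sketched on sampled data as follows (our own minimal sketch: exact subtraction of the detected jump stands in for the Fourier-domain local adaptive mollification of Sect. 3.1, and is valid only for piecewise-constant samples):

```python
import numpy as np

def iterative_edge_detection(f, eta=1e-10):
    """Sketch of the iterative loop: find the strongest remaining jump,
    record it, remove it, repeat until the residual falls below eta
    (the stopping criterion of Eq. (10))."""
    g = f.astype(float).copy()
    edges = []
    while True:
        jump = np.diff(g)
        j = int(np.argmax(np.abs(jump)))
        if abs(jump[j]) <= eta:          # stopping criterion met
            return sorted(edges)
        edges.append(j)
        g[j + 1:] -= jump[j]             # remove the detected jump and rescan

# strong and weak jumps with a 3000:1 magnitude ratio, as in Sect. 3.3
x = np.linspace(0, 2 * np.pi, 256, endpoint=False)
f = np.where((x > 1) & (x < 2), 3.0, 0.0) \
    + np.where((x > 4) & (x < 5), 1e-3, 0.0)
found = iterative_edge_detection(f)
```

Because each pass removes the dominant jump before rescanning, the weak jumps become the largest remaining feature and are eventually detected, mirroring the behavior seen in Figs. 2 and 3.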


#### **4 Conclusion**

We showed that the iterative application of the Fourier concentration method can detect all edges, which is not the case for the plain method when the weak edges are too small. The proposed method detects weak edges 3000 times weaker than the strongest edge, provided the weak edges are well separated from the stronger edges and the data are noise-free, and it finds all weak edges in a PDE application, namely a WENO calculation of the shock-density wave interaction. The iterative method also stops almost automatically after all the edges have been found. The proposed method is thus accurate and efficient.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Small Trees for High Order Whitney Elements**

**Ana Alonso Rodríguez and Francesca Rapetti**

### **1 Introduction**

We aim at determining in a constructive way, for the high order case, the finite element solutions of grad *φ* = **E**, curl **A** = **B**, div **D** = *ρ*, namely, of the equations linking the electric field **E**, the magnetic induction **B**, and the electric charge density *ρ* to their potentials *φ*, **A** and **D**, respectively. Stating necessary and sufficient conditions ensuring that a function defined on a bounded set *Ω* ⊂ R<sup>3</sup> is the gradient of a scalar potential, the curl of a vector potential or the divergence of a vector field is one of the most classical problems of vector analysis (see for example [3, 6, 8]). We aim at providing an explicit and efficient procedure to construct a finite element solution. For example, div-free fields **W** are implicitly characterized, in terms of the vector **w** of degrees of freedom of **W**, by the algebraic constraint D**w** = **0**, with D the matrix of the div operator between finite element spaces. The same fields, in the case of a domain with connected boundary, are explicitly defined by **w** = R**a**, with no constraint on **a**, where R is the matrix of the curl operator between finite element spaces and **a** collects the degrees of freedom of the vector potential **A**. Similarly, one may wish to compute a vector potential **a** such that R**a** = **b** for a given field **b** verifying D**b** = **0**. As explained in [5], these bases can be constructed with the help of "trees" and "co-trees", which are at the core of this contribution. The case *r* = 0 is largely treated in the literature for different types of topological domains (see for example [2]). In these pages, we develop the

A. Alonso Rodríguez (✉)
Dipartimento di Matematica, Università degli Studi di Trento, Trento, Italy
e-mail: ana.alonso@unitn.it

F. Rapetti
Département de Mathématiques, Université Côte d'Azur, Nice, France
e-mail: francesca.rapetti@univ-cotedazur.fr

<sup>©</sup> The Author(s) 2020

S. J. Sherwin et al. (eds.), *Spectral and High Order Methods for Partial Differential Equations ICOSAHOM 2018*, Lecture Notes in Computational Science and Engineering 134, https://doi.org/10.1007/978-3-030-39647-3\_47

tree and co-tree approaches for *r* > 0, when fields in the high order Whitney spaces are represented by their weights on small simplices [7, 9, 10]. With this choice of degrees of freedom, the tree and co-tree concepts extend straightforwardly from *r* = 0 to *r* > 0.

#### **2 Basic Concepts**

Let *Ω* ⊂ R<sup>3</sup> be a bounded polyhedral domain with Lipschitz boundary *∂Ω*, and let *M* be a simplicial mesh of *Ω̄*. We denote by |*A*| the cardinality of a set *A*. For 0 ≤ *k* ≤ 3, let *Δk(T )* (resp. *Δk(M)*) be the set of *k*-simplices of a mesh tetrahedron *T* (resp. of the mesh *M*). Note that *Δk(M)* = ∪<sub>*T* ∈*M*</sub> *Δk(T )*. If *Δ*0*(M)* = {**v***i*}, *i* = 1, ..., *Nv*, with *Nv* = |*Δ*0*(M)*|, then each *k*-simplex *S* ∈ *Δk(M)* is associated with an increasing map *mS* : {0, ..., *k*} → {1, ..., *Nv*}. This map induces an (inner) orientation on *S* (i.e., a way to run along *S* if *k* = 1, through *S* if *k* = 2, in *S* if *k* = 3).

If we assign to each *S* ∈ *Δk(M)* a real number *cS*, we can define the *k*-chain *c* = Σ<sub>*S*∈*Δk(M)*</sub> *cS* *S*, i.e. a formal weighted sum of *k*-simplices *S* in *M*. One can add *k*-chains, namely *c* + *c̃* = Σ<sub>*S*</sub>(*cS* + *c̃S*) *S*, and multiply a *k*-chain by a scalar *p*, namely *p c* = Σ<sub>*S*</sub>(*p cS*) *S*. The set of all *k*-chains in *M*, here denoted *Ck(M)*, is a vector space, in one-to-one correspondence with the set of real vectors *c* = (*cS*)<sub>*S*∈*Δk(M)*</sub>. Each *k*-simplex *S* ∈ *Δk(M)* can be associated with the elementary *k*-chain *c* with entries *cS* = 1 and *cS̃* = 0 for *S̃* ≠ *S*. In the following we use the same symbol *S* to denote the oriented *k*-simplex and the associated elementary *k*-chain.

The boundary operator *∂* takes a *k*-simplex *S* and returns the sum of all its (*k* − 1)-faces *f* with coefficient 1 or −1, depending on whether or not the orientation of the (*k* − 1)-face *f* matches the orientation induced on *f* by that of the simplex *S*. Since the boundary operator is a linear mapping from *Ck(M)* to *Ck*−1*(M)*, it can be represented by a rather sparse matrix *∂* of dimension |*Δk*−1*(M)*| × |*Δk(M)*| gathering the coefficients 0, −1, or +1. Note that in three dimensions there are three nontrivial boundary operators, acting respectively on edges, triangles and tetrahedra: *∂*1 represented by the matrix G<sup>⊤</sup>, *∂*2 represented by R<sup>⊤</sup>, and *∂*3 represented by D<sup>⊤</sup>. To fully specify *∂*, we need to specify the boundary of each simplex *S*. By definition, we have

$$\partial\_1 e = \sum\_{n \in \Delta\_0(\mathcal{M})} \mathsf{G}\_{e, n}\, n, \qquad \partial\_2 f = \sum\_{e \in \Delta\_1(\mathcal{M})} \mathsf{R}\_{f, e}\, e, \qquad \partial\_3 T = \sum\_{f \in \Delta\_2(\mathcal{M})} \mathsf{D}\_{T, f}\, f,$$

for any *e* ∈ *Δ*1*(M)*, any *f* ∈ *Δ*2*(M)* and any *T* ∈ *Δ*3*(M)*. For *e* = [**v**0*,* **v**1], *f* = [**v**0*,* **v**1*,* **v**2] and *T* = [**v**0*,* **v**1*,* **v**2*,* **v**3], we have, respectively,

$$\begin{split} \partial\_{1}[\mathbf{v}\_{0},\mathbf{v}\_{1}] &= \mathbf{v}\_{0} - \mathbf{v}\_{1}, \qquad \partial\_{2}[\mathbf{v}\_{0},\mathbf{v}\_{1},\mathbf{v}\_{2}] = [\mathbf{v}\_{0},\mathbf{v}\_{1}] - [\mathbf{v}\_{0},\mathbf{v}\_{2}] + [\mathbf{v}\_{1},\mathbf{v}\_{2}],\\ \partial\_{3}[\mathbf{v}\_{0},\mathbf{v}\_{1},\mathbf{v}\_{2},\mathbf{v}\_{3}] &= [\mathbf{v}\_{0},\mathbf{v}\_{1},\mathbf{v}\_{2}] - [\mathbf{v}\_{0},\mathbf{v}\_{1},\mathbf{v}\_{3}] + [\mathbf{v}\_{0},\mathbf{v}\_{2},\mathbf{v}\_{3}] - [\mathbf{v}\_{1},\mathbf{v}\_{2},\mathbf{v}\_{3}]. \end{split}$$

The subscript is dropped when there is no ambiguity, since the operator needed in a particular operation is indicated by the type of the operand (e.g., *∂*3 when *∂* applies to tetrahedra). The notion of boundary extends to *k*-chains by linearity: *∂c* = *∂*(Σ<sub>*S*∈*Δk(M)*</sub> *cS* *S*) = Σ<sub>*S*∈*Δk(M)*</sub> *cS* *∂S*.

We say that a *k*-chain *c* is closed if *∂k c* = 0. Non-trivial closed *k*-chains are called *k*-cycles and constitute the subspace *Zk(M)* = ker(*∂k*; *Ck(M)*). A *k*-chain *c* is a boundary if there exists a (*k* + 1)-chain *γ* such that *c* = *∂k*+1*γ*. The *k*-boundaries constitute the subspace *Bk(M)* = *∂k*+1*Ck*+1*(M)*. From the property *∂∂* = 0 we know that boundaries are cycles, but not all cycles are boundaries, and we have *Bk(M)* ⊂ *Zk(M)*. The quotient space *Hk(M)* = *Zk(M)*/*Bk(M)* is the homology space of order *k* of the mesh *M*, and the Betti number is *bk* = rank *Hk(M)*. The presence of curl-free fields (resp. div-free fields) that are not the gradient of a scalar field (resp. the curl of a vector field) is indicated by *b*1 ≠ 0 (resp. *b*2 ≠ 0). We recall that Betti numbers are topological invariants (i.e., they depend on the domain *Ω* only up to a homeomorphism) and do not depend on the mesh *M* of *Ω̄* used to compute them (see [12] and an application in [11]).

For the high order case, we need some concepts from relative homology. Let *Kk(M)* be subspaces of *Ck(M)* with *∂kKk(M)* ⊂ *Kk*−1*(M)*. We say that *c* ∈ *Ck(M)* is closed [modulo *Kk(M)*] if *∂c* ∈ *Kk(M)*, and that a (*k*−1)-chain *c* bounds [modulo *Kk*−1*(M)*] if there exists a *k*-chain *γ* such that *c* − *∂γ* ∈ *Kk*−1*(M)*. One then speaks of relative homology groups.

A *k*-cochain *w* (over the mesh *M*) is a linear mapping from *Ck(M)* to R. Cochains are discrete analogues of differential forms. For *k* > 0, the exterior derivative of a (*k* − 1)-form *w* is the *k*-form d*w* such that ∫<sub>*s*</sub> d*w* = ∫<sub>*∂s*</sub> *w* for all *s* ∈ *Ck(M)*. With this simple equation, relating the evaluation of d*w* on a simplex *s* to the evaluation of *w* on the boundary of that simplex, the exterior derivative is readily defined. The evaluation of a differential form *w* on an arbitrary chain extends naturally by linearity: ∫<sub>Σ*i* **c***i* *si*</sub> *w* = Σ<sub>*i*</sub> **c***i* ∫<sub>*si*</sub> *w*. Thus

$$\int\_{\sum\_{l} \mathbf{c}\_{l}s\_{l}} \mathrm{d}w = \int\_{\partial(\sum\_{l} \mathbf{c}\_{l}s\_{l})} w = \int\_{\sum\_{l} \mathbf{c}\_{l}\partial s\_{l}} w = \sum\_{l} \mathbf{c}\_{l} \int\_{\partial s\_{l}} w.$$

The operator d is the dual of the boundary operator *∂*. As a corollary of the boundary operator property *∂∂* = 0, we have dd = 0. Since we use arrays of dimension |*Δk(M)*| to represent *k*-cochains, the operator d can be represented by a matrix **d** of dimension |*Δk(M)*| × |*Δk*−1*(M)*|, 1 ≤ *k* ≤ 3. Again, there is one exterior derivative matrix for each simplex dimension. When a metric is introduced on the ambient affine space, the exterior derivative operator d stands for grad, curl, div, according to the value of *k* from 1 to 3, and is represented by G, R, D, respectively, the connectivity matrices of the mesh *M*.
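These identities can be checked on a single tetrahedron. The sketch below (ours, not from the chapter) assembles boundary matrices with coefficient arrays named after G<sub>*e,n*</sub>, R<sub>*f,e*</sub>, D<sub>*T,f*</sub> of the displayed formulas, using the sign conventions of the boundary formulas above, and verifies the matrix form of *∂∂* = 0 (equivalently, dd = 0):

```python
import itertools
import numpy as np

nodes = list(range(4))
edges = list(itertools.combinations(nodes, 2))   # 6 edges [a, b], a < b
faces = list(itertools.combinations(nodes, 3))   # 4 faces [a, b, c]

# boundary of an edge [a, b]: coefficients on the two nodes
G = np.zeros((len(edges), len(nodes)), dtype=int)
for i, (a, b) in enumerate(edges):
    G[i, a], G[i, b] = 1, -1

# boundary of a face [a, b, c] = [a, b] - [a, c] + [b, c], as in the text
R = np.zeros((len(faces), len(edges)), dtype=int)
for i, (a, b, c) in enumerate(faces):
    R[i, edges.index((a, b))] += 1
    R[i, edges.index((a, c))] -= 1
    R[i, edges.index((b, c))] += 1

# boundary of the tetrahedron [0, 1, 2, 3], signs as in the text
D = np.zeros((1, len(faces)), dtype=int)
for sign, f in ((1, (0, 1, 2)), (-1, (0, 1, 3)), (1, (0, 2, 3)), (-1, (1, 2, 3))):
    D[0, faces.index(f)] = sign

# boundary of boundary vanishes: the matrix form of d d = 0
assert not (R @ G).any() and not (D @ R).any()
```

Rows are indexed by the higher-dimensional simplices here, so these arrays are (up to transposition) the connectivity matrices G, R, D discussed above.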

#### **3 Small Simplices, Weights and Potentials**

We introduce the multi-index *α* = (*α*0, ..., *αs*) of *s* + 1 integers *αi* ≥ 0, with weight |*α*| = Σ<sub>*i*=0</sub><sup>*s*</sup> *αi*. The set of multi-indices *α* with *s* + 1 components and weight *r* is denoted *I(s* + 1, *r)*. We denote by **v***i* the (Cartesian) coordinates of the node *ni* in R<sup>3</sup>. Given a multi-index *α* ∈ *I*(4, *r*) and a *k*-subsimplex *S* of *T*, the small simplex {*α*, *S*} is the *k*-simplex that belongs to the small tetrahedron with barycenter at the point of coordinates Σ<sub>*i*=0</sub><sup>3</sup> (1/4 + *αi*) **v**<sub>*σT*(*i*)</sub> / (*r* + 1), and which is parallel and 1/(*r* + 1)-homothetic to the (big) sub-simplex *S* of *T*. The notation {*α*, *S*} was first introduced in [9]. The set of small tetrahedra of order *r* + 1 > 1 can be visualized starting from the principal lattice *Lr*+1*(T )* of the simplex *T* = {*n*<sub>*σT*(0)</sub>, *n*<sub>*σT*(1)</sub>, *n*<sub>*σT*(2)</sub>, *n*<sub>*σT*(3)</sub>}, defined as

$$L\_{r+1}(T) = \left\{ \mathbf{x} \in T \; : \; \lambda\_{\sigma\_T^0(i)}(\mathbf{x}) \in \{0, \frac{1}{r+1}, \frac{2}{r+1}, \dots, \frac{r}{r+1}, 1\}, \ 0 \le i \le 3 \right\},$$

and connecting its points by edges parallel to those of *T* . (See, e.g., Fig. 1.)

We denote by *Λk(Ω)* the space of all smooth differential *k*-forms on *Ω*. The completion of *Λk(Ω)* in the corresponding norm defines the Hilbert space *L*<sup>2</sup>*Λk(Ω)*. Let *P*<sup>−</sup><sub>*r*+1</sub>*Λk(T )* be the space of so-called trimmed polynomial *k*-forms of degree *r* + 1 on *T*, with *r* ≥ 0 (as in [7]), and define

$$\mathcal{P}^-\_{r+1}\Lambda^k(\mathcal{M}) = \{ \omega \in H\Lambda^k(\Omega) \, : \, \omega\_{|T} \in \mathcal{P}^-\_{r+1}\Lambda^k(T), \, T \in \mathcal{M} \}$$

where *H Λk(Ω)* = {*ω* ∈ *L*<sup>2</sup>*Λk(Ω)* : d*ω* ∈ *L*<sup>2</sup>*Λk*+1*(Ω)*} is a Hilbert space (see [4]).

**Definition 1** The weights of a polynomial *k*-form *u* ∈ *P*<sup>−</sup> *<sup>r</sup>*+<sup>1</sup>*Λk(T )*, with 0 <sup>≤</sup> *<sup>k</sup>* <sup>≤</sup> <sup>3</sup> and *r* ≥ 0, are the scalar quantities

$$\int\_{\{\mathfrak{a},S\}} u,\tag{1}$$

on the small simplices {*α, S*} with *α* ∈ *I(*4*,r)* and *S* ∈ *Δk(T )*.

**Fig. 1** From the principal lattice of degree *r*+1 = 3 in a tetrahedron *T* , we define a decomposition of *T* into 10 small tetrahedra, 4 octahedra *O* and 1 reversed tetrahedron. Each face on *∂T* is decomposed into 6 small faces and 3 reversed triangles, in solid red line (Left)

We now list some remarkable properties of the small simplices which are useful in the tree construction.

*Property 1* The weights (1) of a Whitney *k*-form *u* ∈ *P*<sup>−</sup><sub>*r*+1</sub>*Λk(T )* on all the small simplices {*α*, *S*} of *T* are unisolvent, as stated in [7, Proposition 3.14]. The small simplices can thus support the degrees of freedom for fields *u* ∈ *P*<sup>−</sup><sub>*r*+1</sub>*Λk(T )*, with 0 ≤ *k* ≤ 3 and *r* ≥ 0. Since the unisolvence result also holds with *T* replaced by *F* ∈ *Δn*−1*(T )*, the trace Tr<sub>*F*</sub> *u* ∈ *P*<sup>−</sup><sub>*r*+1</sub>*Λk(F )* is uniquely determined by the weights on the small simplices in *F*. It follows that a locally defined *u*, with *u*|*T* ∈ *P*<sup>−</sup><sub>*r*+1</sub>*Λk(T )* and single-valued weights, is in *H Λk(Ω)*. We can thus use the weights on the small simplices {*α*, *S*} as degrees of freedom for fields in the finite element space *P*<sup>−</sup><sub>*r*+1</sub>*Λk(M)*, keeping in mind that their number is greater than the dimension of the space.

*Property 2* The weights given in Definition 1 have a meaning as cochains, and this relates the matrix describing the exterior derivative directly to the matrix of the boundary operator. The key point is Stokes' theorem, ∫<sub>*C*</sub> d*u* = ∫<sub>*∂C*</sub> *u*, where *u* is a (*k* − 1)-form and *C* a *k*-chain. More precisely, if *u* ∈ *P*<sup>−</sup><sub>*r*+1</sub>*Λk(M)* then *z* = d*u* ∈ *P*<sup>−</sup><sub>*r*+1</sub>*Λk*+1*(M)* and

$$\int\_{\{\alpha,S\}} z = \int\_{\{\alpha,S\}} \mathrm{d}u = \int\_{\partial\{\alpha,S\}} u = \sum\_{\{\beta,F\}} \mathsf{B}\_{\{\alpha,S\},\{\beta,F\}} \int\_{\{\beta,F\}} u,$$

where B is the boundary matrix, with as many rows as there are small simplices of dimension *k* and as many columns as there are small simplices of dimension *k* − 1. The small simplices {*α*, *S*} inherit the orientation of the simplex *S*, so the coefficient B<sub>{*α*,*S*},{*β*,*F*}</sub> is equal to the coefficient B<sub>*S*,*F*</sub> of the boundary of the simplex *S* if *β* = *α*. This is straightforward if dim(*F*) > 0, and also holds when dim(*F*) = 0, provided that the small nodes in *T* are labeled {*α*, *n*} according to their position in the small simplices when fragmented (see Fig. 1 in [1]).

*Property 3* The $\binom{r+2}{2}$ small faces generated on each face *F* of *T* pave *F* together with the $\binom{r+1}{2}$ reversed triangles, denoted ∇, contained in *F*. Similarly, the $\binom{r+3}{3}$ small tetrahedra contained in *T* pave *T* together with the $\binom{r+2}{3}$ octahedra, denoted *O*, and the $\binom{r+1}{3}$ reversed tetrahedra, denoted ⊥, contained in *T*, as shown in Fig. 1. Octahedra and reversed tetrahedra are examples of "holes" in *T* (see [9, 10]).
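A quick consistency check of these counts (our own, not from the chapter): by area, each face of *T* is covered by (*r* + 1)² small-triangle areas, and by volume *T* is covered by (*r* + 1)³ small-tetrahedron volumes, each octahedron accounting for four of them:

```python
from math import comb

for r in range(8):
    # area of a face: small triangles + reversed triangles
    assert (r + 1) ** 2 == comb(r + 2, 2) + comb(r + 1, 2)
    # volume of T: small tets + octahedra (4 small-tet volumes each)
    #              + reversed tets
    assert (r + 1) ** 3 == comb(r + 3, 3) + 4 * comb(r + 2, 3) + comb(r + 1, 3)
```

For *r* = 2 (the case of Fig. 1) this gives 27 = 10 + 4·4 + 1 small-tetrahedron volumes and 9 = 6 + 3 small-triangle areas per face.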

*Property 4* Since homology is preserved by homotopy, it is shown in [10, Section 3.4] that the relative homology (i.e., the homology [modulo the holes' boundaries]) of the complex of small simplices is the same as the homology of *M*. This property is fundamental for building the tree for high order potentials when working with small simplices. The homology [modulo the holes' boundaries] can be translated into matrix notation by showing that the boundary matrices associated with the small simplices, "modified" and "completed" (in a sense that we explain in the next section) by the relations of [10, Proposition 3.5], are incidence matrices of a graph. To apply the theory of [10, Section 3.4] in a tetrahedron *T* ∈ *Δ*3*(M)*, we need to introduce, for *r* > 0, two sets *K*<sup>1</sup> and *K*<sup>2</sup> of chains generated by the small simplices that belong to the boundary of some hole in *T*, as follows:


The two sets *K*<sup>1</sup> and *K*<sup>2</sup> satisfy the property *∂K*<sup>2</sup> ⊂ *K*1, decisive to conclude that the relative homology [modulo the holes' boundaries] of the complex of the small simplices is the same as the homology of the original mesh *M* [10].

#### **4 Trees and Graphs**

As stated in [12], a directed graph *G* consists of two sets, *N* of nodes and *A* of arcs, subject to certain incidence relations collected in the all-vertex incidence matrix M<sup>*G*</sup> ∈ Z<sup>|*N*|×|*A*|</sup> as follows:

$$
\mathbb{M}\_{n,a}^{\mathcal{G}} = \begin{cases}
+1 \, , \, \text{if } a \text{ ends in } n, \\
-1 \, , \, \text{if } a \text{ starts in } n, \\
\phantom{+}0 \, , \, \text{if } a \text{ does not contain } n.
\end{cases}
$$

An incidence matrix M of the graph *G* is any sub-matrix of M<sup>*G*</sup> with |*N*| − 1 rows and |*A*| columns. The node corresponding to the row of M<sup>*G*</sup> that is not in M is called the reference node of *G*. A graph *G* is connected if there is a path between any two of its nodes. A tree *T* of a graph *G* is a connected acyclic subgraph of *G*. A spanning tree *Ts* is a tree of *G* visiting all its nodes. Any connected graph *G* admits a spanning tree *Ts*. We now particularize these notions to small simplices. In each tetrahedron *T* of the oriented mesh *M*, we consider the small mesh associated with *Lr*+1*(T )*, composed only of small tetrahedra, for a given *r* uniform over the whole mesh *M*. The union of the small meshes for all *T* ∈ *Δ*3*(M)* is denoted *Mall*.
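Extracting a spanning tree from an all-vertex incidence matrix can be sketched with a union-find pass over the arcs (our own sketch; any graph traversal would do equally well):

```python
import numpy as np

def spanning_tree(M):
    """Greedy spanning-tree extraction from an all-vertex incidence matrix M
    (rows = nodes, columns = arcs, nonzero entries +1/-1).
    Returns the column indices of the tree arcs."""
    n_nodes, n_arcs = M.shape
    parent = list(range(n_nodes))        # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    tree = []
    for a in range(n_arcs):
        ends = np.nonzero(M[:, a])[0]    # the two nodes of arc a
        ra, rb = find(ends[0]), find(ends[1])
        if ra != rb:                     # arc joins two components: keep it
            parent[ra] = rb
            tree.append(a)
    return tree

# incidence matrix of a 4-node cycle; a spanning tree keeps 3 of the 4 arcs
M = np.array([[-1,  0,  0,  1],
              [ 1, -1,  0,  0],
              [ 0,  1, -1,  0],
              [ 0,  0,  1, -1]])
tree = spanning_tree(M)

# dropping the reference row and keeping the tree columns gives a square
# submatrix of maximal rank, as in the text
sub = np.delete(M, 0, axis=0)[:, tree]
assert round(abs(np.linalg.det(sub))) == 1
```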

#### **A (Primal) Small Tree for the Gradient Problem**

For *r* = 0, the graph *G*<sup>1</sup> has *N* = *Δ*0*(M)* and *A* = *Δ*1*(M)*. The boundary matrix G<sup>⊤</sup> is the all-vertex incidence matrix of the graph *G*<sup>1</sup>. Extracting a spanning 1-tree *T*<sup>1</sup><sub>*s*</sub> from *G*<sup>1</sup> is equivalent to finding in G<sup>⊤</sup>, minus one row, a submatrix of maximal rank (see [11] for a suitable and easy way of constructing *T*<sup>1</sup><sub>*s*</sub>). For *r* > 0, we have

**Fig. 2** (Left) The graph *<sup>G</sup>*<sup>1</sup> and a spanning tree in thick line, for *<sup>r</sup>* <sup>=</sup> 1. (Right) A spanning tree for *r* = 2 in a fragmented layout

to consider the new graph *G*<sup>1</sup> with *N* = *Δ*0*(Mall)* and *A* = *Δ*1*(Mall)*. Let G<sup>⊤</sup><sub>all</sub> be the all-vertex incidence matrix of this new graph *G*<sup>1</sup>. Note that G<sup>⊤</sup><sub>all</sub> results from the boundary operator *∂*1 acting on the elementary 1-chains of *Mall*. Extracting a spanning 1-tree *T*<sup>1</sup><sub>*s*</sub> from *G*<sup>1</sup> is equivalent to finding in G<sup>⊤</sup><sub>all</sub>, minus one row, a submatrix of maximal rank. Examples of spanning 1-trees *T*<sup>1</sup><sub>*s*</sub> are shown for *r* + 1 = 2 in the right part of Fig. 2 and for *r* + 1 = 3 in Fig. 5 (fragmented visualization). Note that this construction can be repeated in the two-dimensional case.

#### **A (Dual) Small Tree for the Divergence Problem**

For *r* = 0, the graph *G*2 is built on *M*∗, the so-called dual mesh of *M*, as follows. Let us note that an internal face *F* ∈ *Δ*2*(M)* connects two adjacent tetrahedra *T*1*, T*2 ∈ *Δ*3*(M)*, whereas a boundary face *Fb* ∈ *Δ*2*(M)* connects a tetrahedron *Tb* ∈ *Δ*3*(M)* and the boundary *∂Ω*. We can thus construct the following connected (dual) graph *G*2: the set of nodes, *N* , contains the barycenter of each tetrahedron *T* ∈ *Δ*3*(M)* together with one additional exterior node representing *∂Ω*; the set of arcs, *A*, contains every face *F* ∈ *Δ*2*(M)*. For *r* = 0, the matrix D associated with the boundary operator *∂*3, acting on *C*3*(M)*, is an incidence matrix of the (dual) graph *G*2, whose reference node is the one corresponding to *∂Ω*. Extracting a spanning tree *T*<sup>2</sup><sub>s</sub> from *G*2 is equivalent to finding in D a submatrix of maximal rank.

For *r >* 0, let *R*2 be the set of small faces chosen as follows: one small face for each octahedron *O* contained in *K*2 (see the right side of Fig. 3 for the dashed small face in *R*2 when *r* + 1 = 2). To construct the graph *G*2 for *r >* 0 we need to consider *M*<sup>∗</sup><sub>all</sub>, the dual mesh associated with *Mall*, where the nodes are the small tetrahedra and the arcs are the small faces, apart from the ones in *R*2. To understand this, we can reason

**Fig. 3** (Left) The (dual) graph *G*2<sup>∗</sup> associated with the small mesh *Mall* defined in a tetrahedron *T* for *r* = 1: the black dots are the nodes, and the curved lines the arcs. (Right) The (dual) graph *G*2 obtained from *G*2<sup>∗</sup> by merging the nodes corresponding to the barycenters of *t*0 = {*(*1*,* 0*,* 0*,* 0*), T* } and of *O*, thus eliminating the arc associated with the shaded small face *f*<sup>O</sup><sub>u</sub>

as follows. For *r >* 0, we have one arc connecting two small tetrahedra, say *t*#, *t*◦, when


See an example of the graph *G*2 for *Mall* (here *M* = {*T* }) in the left part of Fig. 3 for *r* + 1 = 2, where the node associated with the octahedron *O* is not a node of the graph, but stands to indicate that the four small tetrahedra are connected to one another by one arc each, because they all have one small face on *∂O*. Naming *tk* the small tetrahedron with a vertex in **v***k*, *k* = 0*, . . . ,* 3, and numbering first the 3 × 4 faces on *tk* ∩ *∂T* , called *f*<sup>k</sup><sub>i</sub> for *i* = 1*,* 2*,* 3, and second those on *∂O* (where *f*<sup>O</sup><sub>u</sub>, *f*<sup>O</sup><sub>l</sub>, *f*<sup>O</sup><sub>d</sub>, *f*<sup>O</sup><sub>r</sub> are the small faces up, left, down, right of *∂O*), we have

$$
\mathbb{D}_{tmp} =
\begin{array}{c|cccccccccccc|cccc}
 & f_1^0 & f_2^0 & f_3^0 & f_1^1 & f_2^1 & f_3^1 & f_1^2 & f_2^2 & f_3^2 & f_1^3 & f_2^3 & f_3^3 & f_u^O & f_l^O & f_d^O & f_r^O \\
\hline
t_0 & 1 & 1 & 1 & & & & & & & & & & -1 & & & \\
t_1 & & & & 1 & 1 & 1 & & & & & & & & -1 & & \\
t_2 & & & & & & & 1 & 1 & 1 & & & & & & -1 & \\
t_3 & & & & & & & & & & 1 & 1 & 1 & & & & -1 \\
O & & & & & & & & & & & & & 1 & 1 & 1 & 1
\end{array}
$$

Since the octahedron *O* is not part of the small mesh *Mall*, we have to imagine that its node collapses with the node of one of its neighbouring small tetrahedra, say *t*0 with a vertex in **v**0, and thus that the corresponding arc (i.e. the small face *f*<sup>O</sup><sub>u</sub> = *∂t*0 ∩ *∂O*, the dashed one in the right part of Fig. 3) is eliminated. From a

**Fig. 4** Example of spanning tree in the (dual) graph *<sup>G</sup>*2, namely a selection of acyclic paths made of arcs, visiting all the nodes of *<sup>G</sup>*<sup>2</sup> (*<sup>r</sup>* <sup>=</sup> 1, Left and *<sup>r</sup>* <sup>=</sup> 2, Right)

matrix point of view, D is obtained by adding the line "*O*" in D<sub>tmp</sub> to the line "*t*0", and eliminating the column *f*<sup>O</sup><sub>u</sub>, namely

$$
\mathbb{D} =
\begin{array}{c|cccccccccccc|ccc}
 & f_1^0 & f_2^0 & f_3^0 & f_1^1 & f_2^1 & f_3^1 & f_1^2 & f_2^2 & f_3^2 & f_1^3 & f_2^3 & f_3^3 & f_l^O & f_d^O & f_r^O \\
\hline
t_0 & \mathbf{1} & 1 & 1 & & & & & & & & & & \mathbf{1} & \mathbf{1} & \mathbf{1} \\
t_1 & & & & 1 & 1 & 1 & & & & & & & \mathbf{-1} & & \\
t_2 & & & & & & & 1 & 1 & 1 & & & & & \mathbf{-1} & \\
t_3 & & & & & & & & & & 1 & 1 & 1 & & & \mathbf{-1}
\end{array}
$$

(in bold font, the submatrix of maximal rank in D for the spanning tree *T*<sup>2</sup><sub>s</sub> illustrated in Fig. 4, left part, for *r* + 1 = 2). To repeat this construction in the two-dimensional case, when *T* is a triangle, we have to consider the mesh *Mall* of small triangles in *T*, where the role of the core octahedra *O* is played by the reversed triangles ∇ ∈ *T* . The set *R*2 is replaced by *R*1, composed of one small edge for each reversed triangle ∇ ∈ *K*1. In two dimensions we do not have reversed tetrahedra, and therefore no reversed triangles ∇⊥.

The construction of the spanning tree in *Mall* can be done by assembling that of the geometrical mesh *M*, namely a spanning tree for the Whitney forms of lower degree (blue lines in Fig. 5 (Right)), together with local contributions, one from each element (green lines in Fig. 5 (Right)). Each local contribution results from a fixed one defined on a *reference* element, which is *mapped* onto the current element (respecting the orientation). In Fig. 5 (Left), we have marked in thick green/red lines the small edges of a spanning tree in the graph *G*1, for *r* = 3, in the reference triangle. The red ones belong to the spanning tree in the reference triangle, but they are in general omitted in the spanning tree of *Mall* (indeed, they appear only if they are covered by the

**Fig. 5** (Left) In thick colored line, the small edges of the graph *<sup>G</sup>*1, for *<sup>r</sup>* <sup>=</sup> 3, that compose a spanning tree in a *reference* triangle. (Right) In thick blue line the contribution of the branches of a spanning tree in a (2D) toy mesh *M* reported on *Mall*. In green, the contribution of the small branches *mapped* from the green ones in the reference triangle. It is not necessary to report the red ones since they are either covered by the blue ones or omitted. The co-tree is in black

blue tree). The small co-tree is in black. A similar construction can be repeated in 3D (both for *k* = 1 and *k* = 2) and it reflects the decomposition given, for instance, in [13] (Sect. 5).

#### **References**


Small Trees for High Order Whitney Elements 597


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Non-conforming Elements in Nek5000: Pressure Preconditioning and Parallel Performance**

**A. Peplinski, N. Offermans, P. F. Fischer, and P. Schlatter**

### **1 Introduction**

One of the most important concerns when numerically solving partial differential equations is finding the optimal grid on which the solution will be computed. Unfortunately, in most cases this is not an easy task, and it cannot be settled in advance without a deep understanding of the studied problem. That is why self-adapting algorithms such as adaptive mesh refinement (AMR) have received much attention in past decades and have become an important part of many packages for the numerical modelling of fluid dynamics, e.g. [9, 18]. The goal of AMR is to control the computational error during the simulation by placing higher resolution grids where they are needed. This makes the numerical modelling more robust, and gives the possibility to increase the accuracy of numerical simulations at minimal computational cost. The drawback, however, is increased solver complexity, which can have negative effects on the parallel performance of the code, in particular related to load balancing.

There are a number of different AMR schemes. In the context of the spectral element method (SEM) [16], in which the discretisation is based on a decomposition of the computational domain into a number of non-overlapping, high-order subdomains called elements, we can distinguish three categories: mesh adaptation can mean adjusting the (local) size of an element (*r*-refinement), changing the polynomial order in a particular element (*p*-refinement), or splitting the element into smaller ones (*h*-refinement). In this work we concentrate on an *h*-refinement framework and its implementation in *Nek5000* [8], which is a highly parallel and efficient SEM solver for the incompressible Navier–Stokes equations. In its established version, *Nek5000* only supports conformal elements at constant polynomial order throughout the domain.

A. Peplinski (✉) · N. Offermans · P. Schlatter
Linné FLOW Centre and Swedish e-Science Research Centre (SeRC), KTH Mechanics, Stockholm, Sweden
e-mail: adam@mech.kth.se; nof@mech.kth.se; pschlatt@mech.kth.se

P. F. Fischer
CS and MechSE Depts., University of Illinois at Urbana–Champaign, Champaign, IL, USA
e-mail: fischerp@illinois.edu

S. J. Sherwin et al. (eds.), *Spectral and High Order Methods for Partial Differential Equations ICOSAHOM 2018*, Lecture Notes in Computational Science and Engineering 134, https://doi.org/10.1007/978-3-030-39647-3\_48

The present work was started within the EU project CRESTA, where the non-conforming solver for the advection–diffusion problem was developed and the basic AMR tasks were implemented using existing external libraries. As *h*-refinement affects the element connectivity, resulting in non-conforming meshes, a special grid manager is required to perform local refinement/coarsening and to build globally consistent meshes. For this task the *p4est* library [1] has been chosen, as it is designed to manipulate domains composed of multiple, non-overlapping logical cubic sub-domains, which can be represented by a recursive tree structure. This library provides element connectivity information for the dual graph, which is later manipulated by *ParMETIS* [10] to produce a new element-to-processor mapping. The final step of grid refinement/coarsening and redistribution is performed within the non-conforming version of *Nek5000*, which utilises the so-called conforming-space/nonconforming-mesh approach based on the previous work of Fischer et al. [7, 11]. As the solver complexity grows, special care has been taken to develop efficient tools that can be used within the AMR framework. A more detailed description of them and the related scaling tests can be found in [17].

The goal of ExaFLOW is to extend the results of CRESTA to the full incompressible Navier–Stokes equations, focusing on proper adaptation of the pressure preconditioners for non-conforming SEM. Defining a robust parallel preconditioning strategy has received much attention in past decades, as the linear sub-problem associated with the divergence-free constraint (pressure-Poisson equation) can become very ill-conditioned. In the context of SEM, two possible approaches based on the additive overlapping Schwarz method [4, 6] and the hybrid Schwarz-multigrid method [5, 12] were proposed and implemented in *Nek5000*, leading to a significant reduction of pressure iterations.

In the present paper, we discuss the modifications necessary to adapt *Nek5000* to the *h*-type AMR framework. The article is organised as follows. A short description of SEM and the pressure preconditioners is given in Sects. 2 and 3. The following Sects. 4 and 5 describe the algorithmic modifications and the parallel performance of the code. Finally, Sect. 6 provides conclusions and future work.

#### **2 SEM Discretisation of the Navier–Stokes Equations**

We briefly review the discretisation of the incompressible Navier–Stokes equations to introduce notation and point out the algorithm parts that require modification. A more in-depth derivation can be found in, e.g., [4]. The temporal discretisation is based on a semi-implicit scheme in which the nonlinear term is treated explicitly and the remaining unsteady Stokes problem is solved implicitly. To avoid spurious pressure modes, our spatial discretisation is based on the P<sub>N</sub> − P<sub>N−2</sub> SEM, where the velocity and pressure spaces are spanned by Lagrangian interpolants on the Gauss–Lobatto–Legendre (GLL) and Gauss–Legendre (GL) quadrature points, respectively. Note that the basis for velocity is continuous across element interfaces, whereas the basis for pressure is not. Assuming **f**<sup>n</sup> incorporates all nonlinear and source terms treated explicitly at time *t*<sup>n</sup>, the matrix form of the Stokes problem after applying the Uzawa decoupling reads:

$$
\begin{bmatrix}
\mathsf{H} & -\frac{\Delta t}{\beta\_0} \mathsf{H} \mathsf{B}^{-1} \mathsf{D}^T \\
0 & \mathsf{E}
\end{bmatrix}
\begin{pmatrix}
\mathsf{u}^n \\
\Delta p
\end{pmatrix} = \begin{pmatrix}
\mathsf{B} \mathsf{f}^n + \mathsf{D}^T p^{n-1} \\
\mathsf{g}
\end{pmatrix},
\tag{1}
$$

where

$$\mathbf{E} = \frac{\Delta t}{\beta\_0} \mathbf{D} \mathbf{B}^{-1} \mathbf{D}^T \tag{2}$$

is the Stokes Schur complement governing the pressure, Δ*p* = *p*<sup>n</sup> − *p*<sup>n−1</sup> is the pressure update, and *g* is the inhomogeneity arising from Gaussian elimination. In these equations, H = (1*/Re*) A + (*β*0*/*Δ*t*) B and D are the discrete Helmholtz and divergence operators, respectively; *β*0, A and B denote a coefficient from the time derivative, a discrete Laplacian (stiffness matrix) and a diagonal mass matrix associated with the velocity mesh. Applying the Uzawa decoupling, we use the inverse mass matrix B<sup>−1</sup> as an approximation of the inverse Helmholtz operator H<sup>−1</sup>, giving rise to a splitting error. Note that for this splitting method the diagonality of the mass matrix B is crucial to avoid a costly matrix inversion.
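A toy numerical check of Eq. (2) may make the structure concrete. The sketch below is our illustration; the problem sizes, the BDF3 value of *β*0 and the random divergence matrix are arbitrary assumptions. It forms E from a diagonal mass matrix, whose inversion is indeed trivial, and verifies that E is SPD:

```python
import numpy as np

rng = np.random.default_rng(0)
dt, beta0 = 1e-2, 11.0 / 6.0             # e.g. the BDF3 leading coefficient

n_u, n_p = 12, 8                         # toy velocity / pressure dof counts
D = rng.standard_normal((n_p, n_u))      # stand-in divergence (full row rank)
B = np.diag(rng.uniform(1.0, 2.0, n_u))  # diagonal (lumped) mass matrix

# consistent Poisson operator (pressure Schur complement), Eq. (2)
E = (dt / beta0) * D @ np.linalg.inv(B) @ D.T

# E is symmetric positive definite whenever D has full row rank
assert np.allclose(E, E.T)
assert np.all(np.linalg.eigvalsh(E) > 0)
```

Since B is diagonal, `np.linalg.inv(B)` costs only a reciprocal per entry, which is precisely why the diagonality noted above matters for the splitting.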

All operators H, A, B and E are symmetric positive definite (SPD) and the corresponding systems can be solved with a preconditioned conjugate gradient (PCG) method. Moreover, E has properties similar to a Poisson operator, and is often referred to as a *consistent Poisson operator*. The systems involving H and E are solved iteratively, with E being the more challenging one. In the next section we present the preconditioning strategy for the pressure equation,

$$\mathsf{E}\Delta p = -\mathsf{D}\mathbf{u}\ . \tag{3}$$

We close this section by briefly presenting the SEM operators. SEM introduces a globally unstructured and locally structured basis by tessellating the domain into *K* non-overlapping subdomains (deformed quadrilaterals), *Ω* = ∪<sup>K</sup><sub>k=1</sub> *Ω*<sup>k</sup>, and representing functions in each subdomain in terms of tensor-product polynomials on a reference subdomain Ω̂ = [−1*,* 1]<sup>d</sup>. In this approach every function or operator is represented by its local counterparts, which in the case of functions takes the form of a sum over the subdomains

$$f(\mathbf{x}) = \sum\_{k=1}^{K} \sum\_{i} f\_i^k h\_i(\mathbf{r}) \ .$$

Here, *f*<sup>k</sup><sub>i</sub> and *h*<sub>i</sub> are the nodal values of the function in *Ω*<sup>k</sup> and the base functions in Ω̂, respectively, with *i* representing the natural ordering of the nodes in Ω̂. Combining the coefficients *f*<sup>k</sup><sub>i</sub>, one can build global *f* and local *f*<sup>L</sup> representations of the function. Each global degree of freedom occurs only once in the global representation, but has multiple copies on the faces, edges and vertices related to *Ω*<sup>k</sup> in the local one. To enforce function continuity, the global-to-local mapping is defined as the matrix–vector product *f*<sup>L</sup> = Q*f*, where Q is a binary operator duplicating the basis coefficients in adjoining subdomains. The action Q<sup>T</sup>*f*<sup>L</sup> sums the multiple contributions to each global degree of freedom from their local values. The assembled global stiffness matrix A takes the form

$$(\nabla f, \nabla g) = \underline{f}^T \mathsf{A}\, \underline{g} = \underline{f}^T \mathsf{Q}^T \mathsf{A}\_L \mathsf{Q}\, \underline{g},$$

where the block diagonal matrix A<sub>L</sub> is the unassembled stiffness matrix, with each diagonal block consisting of the local stiffness matrix A<sup>k</sup><sub>ij</sub> = ∫<sub>*Ω*<sup>k</sup></sub> (*dh*<sub>i</sub>*/dx*)(*dh*<sub>j</sub>*/dx*) *dx*. In practice, the global stiffness matrix is never formed explicitly, and the gather–scatter operator QQ<sup>T</sup> is used instead. This operator contains all the information about element connectivity.
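The action of Q and Q<sup>T</sup> can be illustrated on the smallest non-trivial configuration, two 1D elements sharing one interface node. This is our own toy sketch; the layout of local and global degrees of freedom is an assumption for illustration, not *Nek5000*'s actual ordering:

```python
import numpy as np

N = 4                                   # polynomial order
n_loc, n_glob = 2 * (N + 1), 2 * N + 1  # two 1D elements sharing one node

# Q duplicates the shared global dof into both elements' local vectors
Q = np.zeros((n_loc, n_glob))
for k in range(2):                      # element index
    for i in range(N + 1):              # local node index within element k
        Q[k * (N + 1) + i, k * N + i] = 1.0

f_glob = np.arange(n_glob, dtype=float)
f_loc = Q @ f_glob                      # global-to-local: copy to both elements
assert f_loc[N] == f_loc[N + 1]         # interface value is duplicated

# Q^T sums local contributions back into global dofs; QQ^T is the
# gather-scatter used in direct stiffness summation
summed = Q.T @ f_loc
assert summed[N] == 2 * f_glob[N]       # interface dof is counted twice
```

The last assertion shows why a multiplicity (weight) correction is needed whenever QQ<sup>T</sup> is used to average rather than to assemble, a point that returns in Sect. 4.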

#### **3 Pressure Preconditioner**

An efficient solution of Eq. (3) requires finding an SPD preconditioning matrix M<sup>−1</sup> which can be applied inexpensively and which reduces the condition number of M<sup>−1</sup>E. Preconditioners based on domain decomposition are a natural choice for SEM, as the data is structured within an element but unstructured otherwise.

An overlapping additive Schwarz preconditioner for Eq. (3) was developed in [4], based on a linear finite element discretisation of the Poisson operator. It combines solutions of local Poisson problems in overlapping subdomains, R<sup>T</sup><sub>k</sub> Â<sup>−1</sup><sub>k</sub> R<sub>k</sub>, with a coarse grid problem, R<sup>T</sup><sub>0</sub> Â<sup>−1</sup><sub>0</sub> R<sub>0</sub>, which is solved on a few degrees of freedom only but covers the entire domain,

$$\mathbf{M}^{-1} = \mathbf{R}\_0^T \hat{\mathbf{A}}\_0^{-1} \mathbf{R}\_0 + \sum\_k \mathbf{R}\_k^T \hat{\mathbf{A}}\_k^{-1} \mathbf{R}\_k.$$

For the local problems, the restriction and prolongation operators, R<sub>k</sub> and R<sup>T</sup><sub>k</sub>, are Boolean matrices that transfer data to and from the subdomain, and Â<sub>k</sub> is a local stiffness matrix which can be inverted with, e.g., a fast diagonalisation method. Note that the actions of R<sub>k</sub> and R<sup>T</sup><sub>k</sub> are similar to the gather–scatter operator QQ<sup>T</sup>.

The coarse grid problem corresponds to the Poisson problem solved on the element vertices only, with R<sup>T</sup><sub>0</sub> being the linear operator interpolating the coarse grid solution onto the tensor-product array of GL points. Unlike in [4, 6], Â<sub>0</sub> is defined using local SEM-based Neumann operators, performing the projection of the local stiffness matrices A<sub>k</sub>, evaluated on the GLL quadrature points, onto the set of coarse base functions *b*<sub>i</sub> representing the linear finite element base on the GLL grid. The coarse base functions are defined in Ω̂ as a tensor product of the one-dimensional linear functions. The local contribution to Â<sub>0</sub> is given by *b*<sup>T</sup><sub>i</sub> A<sub>k</sub>*b*<sub>j</sub>, and the full Â<sub>0</sub> is finally assembled by the local-to-global mapping, summing contributions to each global degree of freedom from their local counterparts. Â<sub>0</sub> is one of the few matrices formed explicitly in *Nek5000*.
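As a toy illustration of the assembled preconditioner (our own sketch, not from the paper; piecewise-constant aggregates stand in for the linear coarse space described above, and the fast diagonalisation method is replaced by a dense inverse), the following fragment applies M<sup>−1</sup> to a 1D Laplacian with two overlapping subdomains and verifies that the result stays SPD:

```python
import numpy as np

def additive_schwarz(R_list, A_list, R0, A0):
    """Return a function applying M^-1 = R0^T A0^-1 R0 + sum_k Rk^T Ak^-1 Rk."""
    A0_inv = np.linalg.inv(A0)
    Ak_inv = [np.linalg.inv(Ak) for Ak in A_list]
    def apply(r):
        z = R0.T @ (A0_inv @ (R0 @ r))        # coarse-grid solve
        for Rk, Aki in zip(R_list, Ak_inv):
            z = z + Rk.T @ (Aki @ (Rk @ r))   # overlapping local solves
        return z
    return apply

# 1D Laplacian on 9 interior points, two overlapping subdomains
n = 9
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
R_list = [np.eye(n)[list(ix)] for ix in (range(0, 6), range(3, 9))]
A_list = [R @ A @ R.T for R in R_list]
R0 = np.zeros((3, n))
R0[0, :3] = R0[1, 3:6] = R0[2, 6:] = 1.0      # three coarse aggregates
A0 = R0 @ A @ R0.T
M_inv = additive_schwarz(R_list, A_list, R0, A0)

M = np.column_stack([M_inv(e) for e in np.eye(n)])
assert np.allclose(M, M.T)                    # preconditioner remains SPD,
assert np.all(np.linalg.eigvalsh(M) > 0)      # so PCG is applicable
```

Because every term is R<sup>T</sup>(SPD)<sup>−1</sup>R with Boolean R, the sum is automatically SPD, which is the property the paper is careful to preserve (or knowingly give up) in the non-conforming adaptation of Sect. 4.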

On the other hand, the hybrid Schwarz-multigrid preconditioner is based on the multiplicative Schwarz method, which for the two-level scheme takes the form,

$$\mathbf{M}^{-1} = \mathbf{R}\_0^T \hat{\mathbf{A}}\_0^{-1} \mathbf{R}\_0 \left[ \sum\_k \mathbf{R}\_k^T \hat{\mathbf{A}}\_k^{-1} \mathbf{R}\_k \right],$$

and leads to the following two-level multigrid scheme,

$$\begin{aligned} (i)\ & u^1 = \sum\_k \mathsf{R}\_k^T \hat{\mathsf{A}}\_k^{-1} \mathsf{R}\_k\, g, \\ (ii)\ & r = g - \mathsf{A}u^1, \\ (iii)\ & e = \mathsf{R}\_0^T \hat{\mathsf{A}}\_0^{-1} \mathsf{R}\_0\, r, \\ (iv)\ & u = u^1 + e, \end{aligned}$$

where *g*, *r*, *e* and *u* are the right-hand side, the residual, the coarse-grid error and the solution of the equation A*u* = *g*, respectively. This method can be extended to a general multilevel solver performing a full V-cycle [5, 12]. Notice that by replacing step *(ii)* with *r* = *g* we obtain the additive Schwarz preconditioner.
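The four steps translate almost line by line into code. Below is a minimal sketch (our illustration) with generic `schwarz` and `coarse` callables standing in for the operator sums of the previous formulas:

```python
import numpy as np

def two_level(A, g, schwarz, coarse):
    """Hybrid two-level scheme: Schwarz sweep, residual, coarse correction."""
    u1 = schwarz(g)     # (i)   additive Schwarz sweep, sum_k Rk^T Ak^-1 Rk g
    r = g - A @ u1      # (ii)  residual (set r = g to recover the additive variant)
    e = coarse(r)       # (iii) coarse-grid error, R0^T A0^-1 R0 r
    return u1 + e       # (iv)  corrected solution

# sanity check: with exact solves on both levels the residual vanishes
A = np.array([[2.0, -1.0], [-1.0, 2.0]])
g = np.array([1.0, 0.0])
exact = lambda v: np.linalg.solve(A, v)
u = two_level(A, g, exact, exact)
assert np.allclose(A @ u, g)
```

In practice `schwarz` and `coarse` are of course only approximate solves; the sanity check merely confirms that the step sequence is consistent.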

#### **4 Adaptation for Non-conforming Meshes**

The important advantage of SEM in the context of AMR is its spatial decomposition into elements that can easily be split into smaller ones, and the use of a local representation of the operators, which decouples intra- and inter-element operations. As *h*-type AMR using the conforming-space/nonconforming-mesh approach leaves the approximation spaces unchanged, most of the tensor-product operations evaluated element-by-element are preserved, limiting the changes in the algorithm.

The inter-element operations are mostly performed by the gather–scatter operator QQ<sup>T</sup>, which has to be redefined to include spectral interpolation at the non-conforming faces. Following [7], we consider a non-conforming face shared by one low resolution element (parent) and two (in 3D four) high resolution elements (children). We introduce a local parent-to-child interpolation operator J<sup>cp</sup>, which is a spectral interpolation operator with entries

$$(\mathsf{J}^{cp})\_{ij} = h\_j(\xi\_i^{cp}),$$

where *ξ*<sup>cp</sup><sub>i</sub> represents the mapping of the GLL points from the child face to its parent. This operator is locally applied to give the desired nodal values on the child face, after Q copies data from the parent to the children. Building a block-diagonal matrix J<sub>L</sub> with the local matrices J<sup>cp</sup>, one can redefine the scatter and gather–scatter operators as J<sub>L</sub>Q and J<sub>L</sub>QQ<sup>T</sup>J<sup>T</sup><sub>L</sub>, respectively. For more discussion see Fig. 6 and Sect. 4 in [7].
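A 1D sketch of J<sup>cp</sup> may help (our illustration; Chebyshev–Gauss–Lobatto points stand in for the GLL points to keep the node computation trivial, and the child is assumed to cover the left half of the parent face). The matrix is built entry-wise as *h*<sub>j</sub>(*ξ*<sup>cp</sup><sub>i</sub>) and reproduces polynomials up to the element order exactly:

```python
import numpy as np

def lagrange_basis(nodes, x):
    """Evaluate all Lagrange cardinal functions on `nodes` at points `x`:
    returns L with L[i, j] = h_j(x[i])."""
    L = np.ones((len(x), len(nodes)))
    for j, xj in enumerate(nodes):
        for m, xm in enumerate(nodes):
            if m != j:
                L[:, j] *= (x - xm) / (xj - xm)
    return L

N = 4
nodes = -np.cos(np.pi * np.arange(N + 1) / N)   # CGL stand-in for GLL, in [-1, 1]
xi_cp = 0.5 * (nodes - 1.0)                     # child [-1, 1] -> parent [-1, 0]
J_cp = lagrange_basis(nodes, xi_cp)             # (J^cp)_ij = h_j(xi_i^cp)

# spectral interpolation is exact for polynomials up to degree N
p = lambda x: 3 * x**3 - x + 1
assert np.allclose(J_cp @ p(nodes), p(xi_cp))
```

A production implementation would evaluate the basis on the true GLL nodes and in two dimensions apply J<sup>cp</sup> in tensor-product form along each face direction.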

The next crucial modification is the diagonalisation of the global mass matrix Q<sup>T</sup>B<sub>L</sub>Q (B<sub>L</sub> is a block-diagonal matrix built of the local mass matrices), whose inverse is required in Eqs. (1) and (2). It is non-diagonal due to the fact that the quadrature points in the elements along the non-conforming faces do not coincide. A diagonalisation procedure is given in [7] and consists of building the global vector *b̃*

$$
\underline{\tilde{b}} := \mathsf{B}\, \underline{\hat{e}} = \mathsf{Q}^T \mathsf{J}\_L^T \mathsf{B}\_L\, \underline{\hat{e}}\_L,
$$

and finally setting the lumped mass matrix B̃<sub>ij</sub> = *δ*<sub>ij</sub> *b̃*<sub>i</sub>, where *ê* and *ê*<sub>L</sub> denote the global and local vectors containing all ones.
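The lumping procedure can be sketched in a conforming 1D toy setting (our illustration, with J<sub>L</sub> equal to the identity; in the non-conforming case J<sup>T</sup><sub>L</sub> enters exactly as in the formula above; Simpson weights stand in for the GLL quadrature):

```python
import numpy as np

# two 1D elements of order N = 2 sharing one interface node
N = 2
Q = np.zeros((2 * (N + 1), 2 * N + 1))
for k in range(2):
    for i in range(N + 1):
        Q[k * (N + 1) + i, k * N + i] = 1.0

# local (block-diagonal) mass matrices on [-1, 1]: Simpson weights (1/3, 4/3, 1/3)
B_loc = np.diag([1 / 3, 4 / 3, 1 / 3])
B_L = np.kron(np.eye(2), B_loc)

e_L = np.ones(2 * (N + 1))              # local vector of ones
b_tilde = Q.T @ B_L @ e_L               # diagonal of the lumped mass matrix
B_lumped = np.diag(b_tilde)

# lumping preserves the total mass (the integral of 1 over both elements)
assert np.isclose(b_tilde.sum(), e_L @ B_L @ e_L)
```

The interface entry of `b_tilde` accumulates the quadrature weight from both neighbouring elements, which is exactly the row-sum lumping described above.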

The additive Schwarz preconditioner requires two significant modifications. The first one is related to the assembly of the coarse grid operator Â<sub>0</sub>, which gets more complex for non-conforming meshes. This is due to the fact that the non-conforming mesh introduces hanging vertices located in the middle of faces or edges. These hanging vertices are not global degrees of freedom and cannot be included in Â<sub>0</sub>. To remove them from consideration one has to modify the set of local coarse base functions *b*<sub>i</sub>, which thus depend on the shape of the refined region as well as on the position and orientation of the child face with respect to the parent one. Unlike in the conforming case, where all *b*<sub>i</sub> could be represented by a tensor product of two or three linear functions, the non-conforming mesh requires 5 basic components in two dimensions and 21 in three to assemble all the possible shapes of *b*<sub>i</sub>.

The last missing components are the restriction and prolongation operators, R<sub>k</sub> and R<sup>T</sup><sub>k</sub>, for the local Poisson problem. Taking into account the similarity of these operators to QQ<sup>T</sup>, and following the previous development, we use an operator similar to J<sub>L</sub>QQ<sup>T</sup>J<sup>T</sup><sub>L</sub>, replacing J<sub>L</sub> with the interpolation operator defined on the GL quadrature points. Although this choice seems optimal, as it preserves the properties of the preconditioner and J<sup>T</sup><sub>L</sub> is well defined, our numerical experiments showed a significant increase of pressure iterations in some cases. This was found to be caused by the noise introduced by J<sup>T</sup><sub>L</sub> in the Schwarz operator. To reduce this noise we replaced the transposed interpolation operator with the inverse one, obtaining a significant reduction of iterations. Unfortunately, such a preconditioner is no longer SPD and PCG cannot be used as an iterative solver in this case. The other problem is the definition of J<sup>−1</sup><sub>L</sub>, as J<sup>cp</sup> can be inverted for square matrices only, thus excluding *p*-refinement strategies. To avoid this problem we define a child-to-parent interpolation operator J<sup>pc</sup> with the entries

$$(\mathsf{J}^{pc})\_{ij} = \begin{cases} h\_j(\xi\_i^{pc}) & \text{if } \xi\_i^p \in \partial\Omega^p \cap \partial\Omega^c \\ 0 & \text{otherwise,} \end{cases}$$

where *∂Ω*<sup>p</sup> and *∂Ω*<sup>c</sup> are the parent and child common faces, *ξ*<sup>p</sup><sub>i</sub> is a parent GLL point on the face *∂Ω*<sup>p</sup>, and *ξ*<sup>pc</sup><sub>i</sub> represents the mapping of *ξ*<sup>p</sup><sub>i</sub> to the child face *∂Ω*<sup>c</sup>. This operator is locally applied to give the desired nodal values on the child face, before Q<sup>T</sup> sums data from the children and the parent. Building a block-diagonal matrix J<sup>−1</sup><sub>L</sub> consisting of the local matrices J<sup>pc</sup>, one can redefine the gather–scatter operator J<sub>L</sub>QQ<sup>T</sup>J<sup>−1</sup><sub>L</sub> such that it is appropriate for the pressure preconditioner.

In a similar way we modify the multiplicative Schwarz method, as it shares a number of features with the additive one. In this case we distinguish between Schwarz operators (acting at a single level) and restriction operators (connecting different levels) and apply J<sub>L</sub>QQ<sup>T</sup>J<sup>−1</sup><sub>L</sub> and J<sub>L</sub>QQ<sup>T</sup>J<sup>T</sup><sub>L</sub> to them, respectively. Unlike the additive preconditioner, the hybrid one also requires the redefinition of the diagonal weight matrix that indicates the number of sub-domains sharing a given node, and is used to account for the overlapping regions. Its value is important as it reduces the largest eigenvalue of the MA operator and defines the smoothing properties of the additive Schwarz step (see [4] and the references therein). In the conforming case its definition is straightforward; the non-conforming case is more involved, however, as hanging nodes are not real degrees of freedom. In the current implementation the information about node multiplicity on the non-conforming faces is hidden from the parent element, so the parent element sees only one neighbour instead of two (four in 3D). Although this choice gives a preconditioner that significantly reduces the number of pressure iterations, its performance for the studied cases is slightly worse than that of the additive Schwarz preconditioner. This can be caused by a non-optimal value of the weight matrix, or by the fact that the hybrid preconditioner is superior to the additive one mainly for high-aspect-ratio elements (which are not present in our adaptive simulations).

#### **5 Parallel Performance**

The parallel performance test is based on one of the ExaFLOW flagship calculations, and consists of the turbulent flow around a NACA4412 wing section with 5◦ angle of attack, at a Reynolds number based on inflow velocity *U*<sup>∞</sup> and chord length *c* of *Rec* = 200*,*000. It was previously studied in a series of well-resolved large-eddy simulations conducted with the conforming *Nek5000* version, and discussed in detail in [19]. This flow configuration was chosen to illustrate the significant benefit of using AMR, in particular when it comes to the farfield region in the computational domain, but for this article we will only briefly discuss the

**Fig. 1** (**a**) Volume visualisation of that part of the domain covered by refinement levels higher than one for the turbulent flow around a wing profile. The wing vicinity and wake region are resolved and a colour indicates different refinement levels. (**b**) Strong scaling of the non-conforming *Nek5000* solver for the same case performed on Beskow. The plot shows the time per time step as a function of node number. Each node consists of 32 cores

strong scaling results. We omit here a weak scaling test, as *Nek5000* uses iterative solvers and with the current example we cannot provide meaningful data.

The initial coarse and conforming mesh consisted of 2190 elements with polynomial order *N* = 7 and was advanced for 7.2 time units *c/U*<sup>∞</sup> to drive the refinement process using spectral error indicators [13, 14], allowing for 6 refinement levels. The resulting non-conforming grid was built of 224,272 elements with 76*.*37 × 10<sup>6</sup> degrees of freedom, resolving the wing surface and the wake, Fig. 1a. This final mesh was used to test the parallel performance of the non-conforming solver on the petascale Cray XC40 system Beskow at PDC (Stockholm). This system consists of 2060 nodes with 32 cores per node and 2.438 PFlops peak performance. We compare our results with the scaling tests of the conforming *Nek5000* presented in Offermans et al. [15]. The most relevant test in that article is pipe flow at *Reτ* = 360 (upper-right plot in their Fig. 5), as it is similar in size to the discussed wing case. We should mention here that our goal is not to improve the parallel performance of the conforming code, but rather to retain it despite the work imbalance introduced by an additional operator in the direct stiffness summation of the non-conforming solver.

To be able to compare to the conforming solver, we focus on the time evolution loop only, excluding code initialisation, finalisation, mesh rebuilding within AMR and I/O operations. The result of the strong scaling test is presented in Fig. 1b, showing the time per time step as a function of node count. This plot is almost identical to the reference one in [15]. Both show slight super-linear scaling between 32 and 256 nodes, despite the growing work imbalance for the non-conforming solver. We also reach the strong scaling limit at around 256 nodes, which for the conforming solver on Beskow was estimated to lie between 30*,*000 and 50*,*000 degrees of freedom per core [15]. This shows that the parallel performance of the non-conforming and conforming solvers is almost the same, and demonstrates the efficiency of our implementation.

The maximum number of compute nodes used in the test was not set by the parallel properties of the non-conforming *Nek5000*, but by the quality of the domain partitioning provided by *ParMETIS*. Within ExaFLOW we developed a new grid partitioning scheme for *Nek5000* (not discussed in this paper) that takes into account the core distribution among the nodes, and consists of two steps: inter- and intra-node partitioning. Although this two-level partitioning scheme significantly improves the efficiency of the coarse grid operations for XXT, especially during the setup phase, it relies on the quality of the inter-node partitioning. If the first step gives subdomains with disjoint graphs, the second step cannot be performed. We found that the probability of getting disjoint graphs increases with a decreasing number of elements per node, virtually prohibiting runs with fewer than 1000 elements per node; this limit can, however, differ between simulations. We note that in the standard production use of the solver this limitation is not critical, as according to [15] it is usually close to the strong scaling limit of the conforming *Nek5000*.

#### **6 Conclusions**

Within the ExaFLOW project we developed a fully functional SEM-based *h*-type adaptive mesh refinement (AMR) solver for the incompressible Navier–Stokes equations. This allows much larger flow cases to be run at reduced cost, as the high-resolution grid is placed only in those regions where it is needed. At the same time the simulation quality is improved, as the computational error can be controlled during the run.

We have optimised the pressure preconditioners based on the additive overlapping Schwarz and hybrid Schwarz-multigrid methods for non-conforming meshes. To achieve this we modified the basis functions for the assembly of the coarse-grid operator to remove hanging nodes, and redefined the direct stiffness summation operator to include spectral interpolation at the non-conforming faces and edges. We introduced two operators, $J\_L QQ^T J\_L^T$ and $J\_L QQ^T J\_L^{-1}$, for the different steps in the pressure calculation. The last crucial modification was the diagonalisation of the global mass matrix.

Using real flow cases we show our AMR implementation to be correct and efficient. An important success is the fact that parallel performance of the conforming and non-conforming solvers is very similar, despite the increased complexity of the non-conforming one.

In the future we will investigate other definitions of the weight matrix for the hybrid Schwarz-multigrid method, and test different pressure preconditioners based on the restricted additive Schwarz method [2, 3]. We will also work on the quality of the graph partition, as the two-level partitioning cannot handle disjoint graphs within a node's subdomain.

**Acknowledgements** We would like to thank Niclas Jansson for sharing his expertise on adaptive mesh refinement. The work presented in this publication was supported by a European Commission Horizon 2020 project grant entitled "ExaFLOW: Enabling Exascale Fluid Dynamics Simulation" (grant reference 671571). Computer time was provided by Swedish National Infrastructure for Computing (SNIC) at PDC (KTH Stockholm) and by ExaFLOW at HLRS Stuttgart.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Sparse Approximation of Multivariate Functions from Small Datasets Via Weighted Orthogonal Matching Pursuit**

**Ben Adcock and Simone Brugiapaglia**

### **1 Introduction**

In recent years, a new class of approximation strategies based on compressive sensing (CS) has been shown to substantially lessen the curse of dimensionality in the approximation of multivariate functions from pointwise data, with applications to the uncertainty quantification of partial differential equations with random inputs. Based on random sampling from orthogonal polynomial systems and on weighted $\ell^1$ minimization, these techniques are able to accurately recover a sparse approximation to a function of interest from a small dataset of pointwise samples. In this paper, we show the potential of weighted greedy techniques as an alternative to convex programs based on weighted $\ell^1$ minimization in this context.

The contribution of this paper is twofold. First, we propose a weighted orthogonal matching pursuit (WOMP) algorithm based on a rigorous derivation of the corresponding greedy index selection strategy. Second, we numerically show that WOMP is a promising alternative to convex recovery programs based on weighted $\ell^1$ minimization, thanks to its ability to compute sparse approximations with an accuracy comparable to those computed via weighted $\ell^1$ minimization, but at a considerably lower computational cost when the target sparsity level (and, hence, the number of WOMP iterations) is small enough. It is also worth observing that WOMP computes approximations that are exactly sparse, as opposed to approaches based on weighted $\ell^1$ minimization, which in general provide compressible approximations.

B. Adcock · S. Brugiapaglia (-)

Simon Fraser University, Burnaby, BC, Canada e-mail: ben\_adcock@sfu.ca; simone\_brugiapaglia@sfu.ca

<sup>©</sup> The Author(s) 2020

S. J. Sherwin et al. (eds.), *Spectral and High Order Methods for Partial Differential Equations ICOSAHOM 2018*, Lecture Notes in Computational Science and Engineering 134, https://doi.org/10.1007/978-3-030-39647-3\_49

**Brief Literature Review** Various approaches to multivariate function approximation based on CS, with applications to uncertainty quantification, can be found in [1, 3–6, 11–13, 17]. An overview of greedy methods for sparse recovery in CS and, in particular, of OMP can be found in [7, Chapter 3.2]. For a general review of greedy algorithms, we refer the reader to [15] and references therein. Some numerical experiments on a weighted variant of OMP have been performed in the context of CS methods for uncertainty quantification in [4]. Weighted variants of OMP have also been considered in [10, 16], but there the weighted procedure is tailored to specific signal processing applications and the term "weighted" does not refer to the weighted sparsity setting of [14] employed here. To the authors' knowledge, the weighted variant of OMP considered in this paper is proposed here for the first time.

**Organization of the Paper** In Sect. 2 we describe the setting of sparse multivariate function approximation in orthonormal systems via random sampling and weighted $\ell^1$ minimization. Then, in Sect. 3 we formally derive a strategy for the greedy selection in the weighted sparsity setting and present the WOMP algorithm. Finally, we numerically show the effectiveness of the proposed technique in Sect. 4 and give our conclusions in Sect. 5.

#### **2 Sparse Multivariate Function Approximation**

We start by briefly introducing the framework of sparse multivariate function approximation from pointwise samples and refer the reader to [3] for further details.

Our aim is to approximate a function defined over a high-dimensional domain

$$f: D \to \mathbb{C}, \quad \text{with } D = (-1, 1)^d,$$

where $d \gg 1$, from a dataset of pointwise samples $f(t\_1), \dots, f(t\_m)$. Let $\nu$ be a probability measure on $D$ and let $\{\phi\_j\}\_{j\in\mathbb{N}\_0^d}$ be an orthonormal basis for the Hilbert space $L^2\_\nu(D)$. In this paper, we will consider $\{\phi\_j\}\_{j\in\mathbb{N}\_0^d}$ to be a tensorized family of Legendre or Chebyshev orthogonal polynomials, with $\nu$ being the uniform or the Chebyshev measure on $D$, respectively. Assuming that $f \in L^2\_\nu(D) \cap L^\infty(D)$, we consider the series expansion

$$f = \sum\_{j \in \mathbb{N}\_0^d} x\_j \phi\_j.$$

Then, we choose a finite set of multi-indices $\Lambda \subseteq \mathbb{N}\_0^d$ with $|\Lambda| = N$ and obtain the truncated series expansion

$$f\_{\Lambda} = \sum\_{j \in \Lambda} x\_j \phi\_j.$$

In practice, a convenient choice for $\Lambda$ is the hyperbolic cross of order $s$, i.e.

$$\Lambda := \left\{ j \in \mathbb{N}\_0^d : \prod\_{k=1}^d (j\_k + 1) \le s \right\},$$

due to the moderate growth of $N$ with respect to $d$. Now, assuming we collect $m \ll N$ pointwise samples independently distributed according to $\nu$, namely,

$$f(t\_1), \dots, f(t\_m), \quad \text{with} \quad t\_1, \dots, t\_m \stackrel{\text{i.i.d.}}{\sim} \nu,$$

the approximation problem can be recast as the linear system

$$A\mathbf{x}\_{\Lambda} = \mathbf{y} + e,\tag{1}$$

with $x\_\Lambda = (x\_j)\_{j\in\Lambda} \in \mathbb{C}^N$, and where the sensing matrix $A \in \mathbb{C}^{m\times N}$ and the measurement vector $y \in \mathbb{C}^m$ are defined as

$$A\_{lj} := \frac{1}{\sqrt{m}} \phi\_j(t\_l), \quad y\_l := \frac{1}{\sqrt{m}} f(t\_l), \quad \forall l \in [m], \ \forall j \in [N], \tag{2}$$

with $[k] := \{1, \dots, k\}$ for every $k \in \mathbb{N}$. The vector $e \in \mathbb{C}^m$ accounts for the truncation error introduced by $\Lambda$ and satisfies $\|e\|\_2 \le \eta$, where $\eta > 0$ is an a priori upper bound on the truncation $L^\infty(D)$-error, namely $\|f - f\_\Lambda\|\_{L^\infty(D)} \le \eta$. A sparse approximation to the vector $x\_\Lambda$ can then be computed by means of weighted $\ell^1$ minimization.

Given weights $w \in \mathbb{R}^N$ with $w > 0$ (where the inequality is read componentwise), recall that the weighted $\ell^1$ norm of a vector $z \in \mathbb{C}^N$ is defined as $\|z\|\_{1,w} := \sum\_{j\in[N]} |z\_j| w\_j$. We can compute an approximation $\hat{x}\_\Lambda$ to $x\_\Lambda$ by solving the weighted quadratically-constrained basis pursuit (WQCBP) program

$$\hat{x}\_\Lambda \in \arg\min\_{z \in \mathbb{C}^N} \|z\|\_{1,w}, \quad \text{s.t.} \quad \|Az - y\|\_2 \le \eta, \tag{3}$$

where the weights $w \in \mathbb{R}^N$ are defined as

$$w\_j = \|\phi\_j\|\_{L^\infty(D)}. \tag{4}$$

The effectiveness of this particular choice of $w$ is supported by theoretical results and has been validated numerically (see [1, 3]). The resulting approximation $\hat{f}\_\Lambda$ to $f$ is finally defined as

$$
\hat{f}\_\Lambda := \sum\_{j \in \Lambda} (\hat{\mathfrak{x}}\_\Lambda)\_j \phi\_j.
$$

In this setting, stable and robust recovery guarantees holding with high probability can be shown for the approximation errors $\|f - \hat{f}\_\Lambda\|\_{L^2\_\nu(D)}$ and $\|f - \hat{f}\_\Lambda\|\_{L^\infty\_\nu(D)}$ under a sufficient condition on the number of samples of the form $m \gtrsim s^\gamma \cdot \mathrm{polylog}(s, d)$, with $\gamma = 2$ or $\gamma = \log(3)/\log(2)$ for tensorized Legendre or Chebyshev polynomials, respectively, hence lessening the curse of dimensionality to a substantial extent (see [3] and references therein). We also note in passing that decoders such as the weighted LASSO or the weighted square-root LASSO can be considered as alternatives to (3) for weighted $\ell^1$ minimization (see [2]).
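To make this setup concrete, the construction of $\Lambda$, the weights (4) and the sensing matrix (2) can be sketched as follows. This is an illustrative sketch of ours (not a reference implementation) for the tensorized Legendre case, for which $\|\phi\_j\|\_{L^\infty(D)} = \prod\_{k=1}^d \sqrt{2j\_k + 1}$:

```python
import numpy as np
from numpy.polynomial.legendre import legval

def hyperbolic_cross(d, s):
    """Multi-indices j in N_0^d with prod_k (j_k + 1) <= s."""
    if d == 0:
        return [()]
    return [j + (jk,)
            for j in hyperbolic_cross(d - 1, s)
            for jk in range(s)
            if np.prod([i + 1 for i in j]) * (jk + 1) <= s]

def legendre_weights(Lam):
    """Weights (4): w_j = prod_k sqrt(2 j_k + 1) for orthonormal Legendre."""
    return np.array([np.prod([np.sqrt(2 * jk + 1) for jk in j]) for j in Lam])

def sensing_matrix(T, Lam):
    """A_{lj} = phi_j(t_l) / sqrt(m), with phi_j the tensorized Legendre
    basis, orthonormal w.r.t. the uniform probability measure on (-1,1)^d."""
    m = T.shape[0]
    cols = []
    for j in Lam:
        col = np.ones(m)
        for k, jk in enumerate(j):
            coeff = np.zeros(jk + 1)
            coeff[jk] = 1.0                      # select P_{j_k}
            col *= np.sqrt(2 * jk + 1) * legval(T[:, k], coeff)
        cols.append(col)
    return np.column_stack(cols) / np.sqrt(m)
```

For instance, with $d = 2$ and $s = 4$ one obtains $N = |\Lambda| = 8$; sampling $t\_1, \dots, t\_m$ i.i.d. uniformly on $(-1,1)^2$, the columns of $\sqrt{m}\,A$ are orthonormal in expectation.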

#### **3 Weighted Orthogonal Matching Pursuit**

In this paper, we consider greedy sparse recovery strategies to find sparse approximate solutions to (1), as alternatives to the WQCBP optimization program (3). With this aim, we propose an adaptation of the OMP algorithm to the weighted setting.

Before introducing weighted OMP (WOMP) in Algorithm 1, let us recall the rationale behind the greedy index selection rule of OMP (corresponding to Algorithm 1 with *λ* = 0 and *w* = 1). For a detailed introduction to OMP, we refer the reader to [7, Section 3.2]. Given a support set *S* ⊆ [*N*], OMP solves the least-squares problem

$$\min\_{z \in \mathbb{C}^N} G\_0(z) \text{ s.t. } \text{supp}(z) \subseteq \mathcal{S},$$

where $G\_0(z) := \|y - Az\|\_2^2$. In OMP, the support $S$ is iteratively enlarged by one index at a time. Namely, we consider the update $S \cup \{j\}$, where the index $j \in [N]$ is selected in a greedy fashion. In particular, assuming that $A$ has $\ell^2$-normalized columns, it is possible to show that (see [7, Lemma 3.3])

$$\min\_{t\in\mathbb{C}} G\_0(x + te\_j) = G\_0(x) - \left| (A^\*(y - Ax))\_j \right|^2. \tag{5}$$

This leads to the greedy index selection rule of OMP, which prescribes the selection of an index $j \in [N]$ that maximizes the quantity $|(A^\*(y - Ax))\_j|^2$. We will use this simple intuition to extend OMP to the weighted case by replacing the function $G\_0$ with a suitable function $G\_\lambda$ that takes into account the data-fidelity term and the weighted sparsity prior at the same time.

Let us recall that, given a set of weights $w \in \mathbb{R}^N$ with $w > 0$, the weighted $\ell^0$ norm of a vector $z \in \mathbb{C}^N$ is defined as the quantity (see [14])<sup>1</sup>

$$\|z\|\_{0,w} := \sum\_{j \in \text{supp}(z)} w\_j^2.$$

<sup>1</sup>The term "norm" here is an abuse of language, but we will stick to it due to its popularity.
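Both weighted norms are immediate to compute; as a trivial illustration (helper functions of ours, not from the paper):

```python
import numpy as np

def wl1(z, w):
    """Weighted l1 norm: sum_j |z_j| * w_j."""
    return np.sum(np.abs(z) * w)

def wl0(z, w):
    """Weighted l0 'norm' (see footnote): sum of w_j^2 over supp(z)."""
    return np.sum(w[z != 0] ** 2)
```

For example, $z = (1, 0, -2)$ and $w = (1, 2, 3)$ give $\|z\|\_{1,w} = 7$ and $\|z\|\_{0,w} = 10$.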

Notice that when $w = 1$, then $\|\cdot\|\_{0,w} = \|\cdot\|\_0$ is the standard $\ell^0$ norm. Given $\lambda \ge 0$, we define the function

$$G\_{\lambda}(z) := \|\mathbf{y} - Az\|\_{2}^{2} + \lambda \|z\|\_{0,w}.\tag{6}$$

The tradeoff between the data-fidelity constraint and the weighted sparsity prior is balanced via the choice of the regularization parameter *λ*. Applying the same rationale employed in OMP for the greedy index selection and replacing *G*<sup>0</sup> with *Gλ* leads to Algorithm 1, which corresponds to OMP when *λ* = 0 and *w* = 1.

#### **Algorithm 1** Weighted orthogonal matching pursuit (WOMP)

**Inputs:**

- $A \in \mathbb{C}^{m\times N}$ with $\ell^2$-normalized columns; $y \in \mathbb{C}^m$; weights $w \in \mathbb{R}^N$ with $w > 0$; regularization parameter $\lambda \ge 0$; number of iterations $K$.

#### **Procedure:**

1. Let $\hat{x}\_0 = 0$ and $S\_0 = \emptyset$;
2. For $k = 1, \dots, K$:
	- a. Find $j\_k \in \arg\max\_{j\in[N]} \Delta\_\lambda(\hat{x}\_{k-1}, S\_{k-1}, j)$, with $\Delta\_\lambda$ as in (7);
	- b. Define $S\_k = S\_{k-1} \cup \{j\_k\}$;
	- c. Compute $\hat{x}\_k \in \arg\min\_{v\in\mathbb{C}^N} \|Av - y\|\_2$ s.t. $\operatorname{supp}(v) \subseteq S\_k$.

#### **Output:**

- $\hat{x}\_K \in \mathbb{C}^N$: approximate solution to $Az = y$.

*Remark 1* The $\ell^2$-normalization of the columns of $A$ is a necessary condition to apply Algorithm 1. If $A$ does not satisfy this hypothesis, it suffices to apply WOMP to the normalized system $\tilde{A}z = y$, where $\tilde{A} = AM^{-1}$ and $M$ is the diagonal matrix containing the $\ell^2$ norms of the columns of $A$ on the main diagonal and zeros elsewhere. The approximate solution $\hat{x}\_K$ to $\tilde{A}z = y$ computed via WOMP is then rescaled as $M^{-1}\hat{x}\_K$, which approximately solves $Az = y$.
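The procedure of Algorithm 1 can be sketched as follows. This is a minimal NumPy sketch of ours, assuming the columns of $A$ are already $\ell^2$-normalized as in Remark 1; the early stop when no greedy score is positive is a convenience we add, not part of the algorithm:

```python
import numpy as np

def womp(A, y, w, lam, K):
    """Sketch of Algorithm 1 (WOMP); assumes l2-normalized columns of A
    (see Remark 1 for the general case)."""
    N = A.shape[1]
    x = np.zeros(N, dtype=complex)
    S = []                                     # support S_{k-1}
    for _ in range(K):
        r = y - A @ x                          # residual y - A x_{k-1}
        corr = np.abs(A.conj().T @ r) ** 2     # |(A*(y - Ax))_j|^2
        # Step a: greedy score Delta_lambda(x_{k-1}, S_{k-1}, j) from (7)
        delta = np.zeros(N)
        for j in range(N):
            if j not in S:
                delta[j] = max(corr[j] - lam * w[j] ** 2, 0.0)
            elif x[j] != 0:
                delta[j] = max(lam * w[j] ** 2 - abs(x[j]) ** 2, 0.0)
        jk = int(np.argmax(delta))
        if delta[jk] == 0.0:                   # early stop (added for
            break                              # convenience): no decrease
        # Step b: enlarge the support
        if jk not in S:
            S.append(jk)
        # Step c: least-squares fit restricted to supp(v) within S_k
        sol, *_ = np.linalg.lstsq(A[:, S], y, rcond=None)
        x = np.zeros(N, dtype=complex)
        x[S] = sol
    return x
```

Note that with $\lambda = 0$ and $w = 1$ the score reduces to $|(A^\*(y - Ax))\_j|^2$ and the sketch reduces to plain OMP.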

The following proposition justifies the weighted variant of OMP considered in Algorithm 1 by making the role of the quantity $\Delta\_\lambda(x, S, j)$ defined in (7) transparent: it generalizes relation (5) to the weighted case, under suitable conditions on $A$ and $x$ that are verified at each iteration of Algorithm 1. In order to decrease $G\_\lambda$ as much as possible, at each iteration WOMP selects an index $j$ that maximizes $\Delta\_\lambda(x, S, j)$.

**Proposition 1** *Let $\lambda \ge 0$, $S \subseteq [N]$, $A \in \mathbb{C}^{m\times N}$ with $\ell^2$-normalized columns, and $x \in \mathbb{C}^N$ satisfying*

$$x \in \arg\min\_{z \in \mathbb{C}^N} \|y - Az\|\_2 \ \text{ s.t. } \ \operatorname{supp}(z) \subseteq S.$$

*Then, for every j* ∈ [*N*]*, the following holds:*

$$\min\_{t\in\mathbb{C}} G\_{\lambda}(x + te\_{j}) = G\_{\lambda}(x) - \Delta\_{\lambda}(x, S, j),$$

*where $G\_\lambda$ is defined as in* (6) *and $\Delta\_\lambda : \mathbb{C}^N \times 2^{[N]} \times [N] \to \mathbb{R}$ is defined by*

$$\Delta\_{\lambda}(\mathbf{x}, S, j) := \begin{cases} \max \left\{ |(A^\*(\mathbf{y} - Ax))\_j|^2 - \lambda w\_j^2, \ 0 \right\} & \text{if } j \notin S \\ \max \left\{ \lambda w\_j^2 - |\mathbf{x}\_j|^2, \ 0 \right\} & \text{if } j \in S \text{ and } \mathbf{x}\_j \neq \mathbf{0} \\ 0 & \text{if } j \in S \text{ and } \mathbf{x}\_j = \mathbf{0}. \end{cases} \tag{7}$$

*Proof* Throughout the proof, we will denote the residual as *r* := *y* − *Ax*. Let us first assume *j /*∈ *S*. In this case, we compute

$$\begin{aligned} G\_{\lambda}(x + te\_{j}) &= \|y - A(x + te\_{j})\|\_{2}^{2} + \lambda \|x + te\_{j}\|\_{0,w} \\ &= \|r\|\_{2}^{2} + \underbrace{|t|^{2} - 2\operatorname{Re}(\bar{t}(A^{\*}r)\_{j}) + \lambda(1 - \delta\_{t,0})w\_{j}^{2}}\_{=:h(t)} + \lambda\|x\|\_{0,w}, \end{aligned}$$

where *δx,y* is the Kronecker delta function. In particular, we have

$$h(t) = \begin{cases} 0 & \text{if } t = 0, \\ |t|^{2} - 2\operatorname{Re}(\bar{t}(A^{\*}r)\_{j}) + \lambda w\_{j}^{2} & \text{if } t \in \mathbb{C} \setminus \{0\}. \end{cases}$$

Now, if $(A^{\*}r)\_j = 0$, then $h(t)$ is minimized at $t = 0$ and $\min\_{t\in\mathbb{C}} G\_\lambda(x + te\_j) = G\_\lambda(x)$. On the other hand, if $(A^{\*}r)\_j \neq 0$, by arguing similarly to [7, Lemma 3.3], we see that

$$\min\_{t \in \mathbb{C} \setminus \{0\}} h(t) = -|(A^{\*}r)\_{j}|^{2} + \lambda w\_{j}^{2},$$

where the minimum is attained at some $t \in \mathbb{C}$ with $|t| = |(A^{\*}r)\_{j}| \neq 0$. In summary,

$$\min\_{t \in \mathbb{C}} h(t) = \min \left\{ -|(A^\*r)\_j|^2 + \lambda w\_j^2, 0 \right\} = -\max \left\{ |(A^\*r)\_j|^2 - \lambda w\_j^2, 0 \right\},$$

which concludes the case *j /*∈ *S*.

Now, assume $j \in S$. Since the vector $x\_S = x|\_S \in \mathbb{C}^{|S|}$ is a least-squares solution to $A\_S z = y$, it satisfies $A\_S^{\*}(y - A\_S x\_S) = 0$ and, in particular, $(A^{\*}r)\_j = 0$. (Here, $A\_S \in \mathbb{C}^{m\times|S|}$ denotes the submatrix of $A$ corresponding to the columns in $S$.) Therefore, arguing similarly as before, we have

$$G(x + te\_{j}) = \|r\|\_{2}^{2} + \underbrace{|t|^{2} + \lambda(1 - \delta\_{t,-x\_{j}})w\_{j}^{2}}\_{=:\ell(t)} + \lambda\|x - x\_{j} e\_{j}\|\_{0,w}.$$

Considering only the terms depending on *t*, it is not difficult to see that

$$\min\_{t\in\mathbb{C}} \ell(t) = \min\{|x\_{j}|^{2}, \lambda w\_{j}^{2}\}.$$

As a consequence, for every *j* ∈ *S*, we obtain

$$\begin{aligned} \min\_{t \in \mathbb{C}} G(x + te\_{j}) &= \|r\|\_{2}^{2} + \lambda \|x - x\_{j} e\_{j}\|\_{0,w} + \min\{|x\_{j}|^{2}, \lambda w\_{j}^{2}\} \\ &= G(x) + \min\{|x\_{j}|^{2}, \lambda w\_{j}^{2}\} - \lambda(1 - \delta\_{x\_{j},0})w\_{j}^{2}. \end{aligned}$$

The results above, combined with simple algebraic manipulations, lead to the desired result. ∎

#### **4 Numerical Results**

In this section, we show the effectiveness of WOMP (Algorithm 1) in the sparse multivariate function approximation setting described in Sect. 2. In particular, we choose the weights *w* as in (4). We consider the function

$$f(t) = \ln\left(d + 1 + \sum\_{k=1}^{d} t\_k\right), \quad \text{with } d = 10. \tag{8}$$

We let $\{\phi\_j\}\_{j\in\mathbb{N}\_0^d}$ be the Legendre or Chebyshev basis and $\nu$ the respective orthogonality measure. In Figs. 1 and 2 we show the relative $L^2\_\nu(D)$-error of the approximate solution $\hat{x}\_K$ computed via WOMP as a function of the iteration $K$, for different values of the regularization parameter $\lambda$, when solving the linear system $Az = y$, where $A$ and $y$ are defined by (2) and the $\ell^2$-normalization of the columns of $A$ is taken into account according to Remark 1. We consider $\lambda = 0$ (corresponding to OMP) and $\lambda = 10^{-k}$, with $k = 3, 3.5, 4, 4.5, 5$. Here, $\Lambda$ is the hyperbolic cross of order $s = 10$, corresponding to $N = |\Lambda| = 571$. Moreover, we consider $m = 60$ and $m = 80$. The results are averaged over 25 runs and the $L^2\_\nu(D)$-error is computed with respect to a reference solution approximated via least squares

**Fig. 1** Plot of the mean relative *L*<sup>2</sup> *<sup>ν</sup> (D)*-error as a function of the number of iterations *K* of WOMP (Algorithm 1) for different values of the regularization parameter *λ* for the approximation of the function *f* defined in (8) and using Legendre polynomials. The accuracy of WOMP is compared with those of QCBP and WQCBP

**Fig. 2** The same experiment as in Fig. 1, with Chebyshev polynomials

and using $20N = 11{,}420$ i.i.d. random samples distributed according to $\nu$. We compare the WOMP accuracy with the accuracy obtained via the QCBP program ((3) with unit weights) with $\eta = 0$ and WQCBP with tolerance parameter $\eta = 10^{-8}$. To solve these two programs we use CVX Version 1.2, a package for specifying and solving convex programs [8, 9]. In CVX, we use the solver 'mosek' and set the CVX precision to 'high'.

Figures 1 and 2 show the benefits of using weights as compared to the unweighted OMP approach, when the parameter $\lambda$ is tuned appropriately. A good choice of $\lambda$ for the setting considered here seems to lie between $10^{-4.5}$ and $10^{-3.5}$. We also observe that WOMP is able to reach a level of accuracy similar to that of WQCBP. An interesting feature of WOMP with respect to OMP is its better stability. We observe that after the $m$-th iteration, the OMP accuracy starts getting substantially worse. This can be explained by the fact that when $K$ approaches $N$, OMP tends to destroy sparsity by overfitting the data. This phenomenon is not observed in WOMP, thanks to its

**Fig. 3** Plot of the support size of $\hat{x}\_K$ as a function of the number of iterations $K$ for WOMP in the same setting as in Figs. 1 and 2, with Legendre (left) and Chebyshev (right) polynomials. The larger the regularization parameter $\lambda$, the sparser the solution (in the left plot, the curves relative to $\lambda = 10^{-4.5}$ and $\lambda = 10^{-4}$ overlap; in the right plot, the same happens for $\lambda = 10^{-4}$ and $\lambda = 10^{-3.5}$)


**Table 1** Comparison of the computing times for WQCBP and *K* = 25 iterations of WOMP

ability to keep the support of $\hat{x}\_k$ small via the explicit enforcement of the weighted sparsity prior (see Fig. 3).

We demonstrate the better computational efficiency of WOMP with respect to the convex minimization programs QCBP and WQCBP solved via CVX by tracking the runtimes of the different approaches. In Table 1 we show the running times for the different recovery strategies. The running times for WOMP refer to $K = 25$ iterations, sufficient to reach the best accuracy for every value of $\lambda$, as shown in Figs. 1 and 2. Moreover, the computational times for WOMP take into account the $\ell^2$-normalization of the columns of $A$ (see Remark 1). WOMP consistently outperforms convex minimization, being more than ten times faster in all cases. We note that in this comparison a key role is played by the parameter $K$ or, equivalently, by the sparsity of the solution. Indeed, considering a larger value of $K$ would result in slower performance of WOMP, but it would not improve the accuracy of the WOMP solution (see Figs. 1 and 2).

#### **5 Conclusions**

We have considered a greedy recovery strategy for high-dimensional function approximation from a small set of pointwise samples. In particular, we have proposed a generalization of the OMP algorithm to the setting of weighted sparsity (Algorithm 1). The corresponding greedy selection strategy is derived in Proposition 1.

Numerical experiments show that WOMP is an effective strategy for high-dimensional approximation, able to reach the same accuracy level as WQCBP while being considerably faster when the target sparsity level is small enough. A key role is played by the regularization parameter $\lambda$, which may be difficult to tune due to its sensitivity to the parameters of the problem ($m$, $s$, and $d$) and to the polynomial basis employed. In other applications, where explicit formulas for the weights such as (4) are not available, there might also be a nontrivial interplay between $\lambda$ and $w$. In summary, despite the promising nature of the numerical experiments illustrated in this paper, a more extensive numerical investigation is needed in order to study the sensitivity of WOMP with respect to $\lambda$. Moreover, a theoretical analysis of the WOMP approach might highlight practical recipes for the choice of this parameter, similarly to [2]. This type of analysis may also help identify the sparsity regime where WOMP outperforms weighted $\ell^1$ minimization, which, in turn, could be formulated in terms of suitable assumptions on the regularity of $f$. These questions are beyond the scope of this paper and will be the object of future work.

**Acknowledgements** The authors acknowledge the support of the Natural Sciences and Engineering Research Council of Canada through grant number 611675, and of the Pacific Institute for the Mathematical Sciences (PIMS) Collaborative Research Group "High-Dimensional Data Analysis". S.B. also acknowledges the support of the PIMS Postdoctoral Training Centre in Stochastics.

#### **References**



### **On the Convergence Rate of Hermite-Fejér Interpolation**

**Shuhuang Xiang and Guo He**

### **1 Introduction**

For an arbitrarily given system of points

$$\{\mathbf{x}\_1^{(n)}, \mathbf{x}\_2^{(n)}, \dots, \mathbf{x}\_n^{(n)}\}\_{n=1}^{\infty},\tag{1}$$

Faber [3] in 1914 showed that there exists a continuous function $f(x)$ on $[-1, 1]$ for which the Lagrange interpolation sequence $L\_n[f]$ ($n = 1, 2, \dots$) is not uniformly convergent to $f$ on $[-1, 1]$, where, writing $\omega\_n(x) = (x - x\_1^{(n)})(x - x\_2^{(n)})\cdots(x - x\_n^{(n)})$,

$$L\_n[f](x) = \sum\_{k=1}^n f(x\_k^{(n)})\, \ell\_k^{(n)}(x), \quad \ell\_k^{(n)}(x) = \frac{\omega\_n(x)}{\omega\_n'(x\_k^{(n)})\,(x - x\_k^{(n)})}. \tag{2}$$

In contrast, based on the Chebyshev point system

$$\mathbf{x}\_k^{(n)} = \cos\left(\frac{2k - 1}{2n}\pi\right), \quad k = 1, 2, \ldots, n, \quad n = 1, 2, \ldots, \tag{3}$$

S. Xiang

Department of Mathematics, Central South University, Changsha, China e-mail: xiangsh@mail.csu.edu.cn

G. He (-)

Department of Mathematics, Jinan University, Guangzhou, Guangdong, China

© The Author(s) 2020

S. J. Sherwin et al. (eds.), *Spectral and High Order Methods for Partial Differential Equations ICOSAHOM 2018*, Lecture Notes in Computational Science and Engineering 134, https://doi.org/10.1007/978-3-030-39647-3\_50

**Fig. 1** $\|H\_{2n-1}(f, x) - f(x)\|\_\infty$, $\|L\_n(f, x) - f(x)\|\_\infty$ and $\|H^{\*}\_{2n-1}(f, x) - f(x)\|\_\infty$ at $x = -1 : 0.001 : 1$, using the Chebyshev point system (3), for $f(x) = \sin(x)$, $f(x) = \frac{1}{1+25x^2}$ and $f(x) = |x|^3$, respectively

Fejér [4] in 1916 proved that if $f \in C[-1, 1]$, then there is a unique polynomial $H\_{2n-1}(f, x)$ of degree at most $2n - 1$ such that $\lim\_{n\to\infty} \|H\_{2n-1}(f) - f\|\_\infty = 0$, where $H\_{2n-1}(f, x)$ is determined by

$$H\_{2n-1}(f, \mathbf{x}\_k^{(n)}) = f(\mathbf{x}\_k^{(n)}), \quad H\_{2n-1}'(f, \mathbf{x}\_k^{(n)}) = 0, \quad k = 1, 2, \ldots, n. \tag{4}$$

This polynomial is known as the Hermite-Fejér interpolation polynomial.

It is worth noting that the above Hermite-Fejér interpolation polynomial converges much more slowly than the corresponding Lagrange interpolation polynomial at the Chebyshev point system (3) (see Fig. 1).

To get fast convergence, the following Hermite-Fejér interpolation of *f (x)* at nodes (1) is considered [6, 7]:

$$H\_{2n-1}^{\*}(f, \mathbf{x}) = \sum\_{k=1}^{n} f(\mathbf{x}\_k^{(n)}) h\_k^{(n)}(\mathbf{x}) + \sum\_{k=1}^{n} f'(\mathbf{x}\_k^{(n)}) b\_k^{(n)}(\mathbf{x}),\tag{5}$$

where $h\_k^{(n)}(x) = v\_k^{(n)}(x)\,[\ell\_k^{(n)}(x)]^2$, $b\_k^{(n)}(x) = (x - x\_k^{(n)})\,[\ell\_k^{(n)}(x)]^2$ and $v\_k^{(n)}(x) = 1 - (x - x\_k^{(n)})\,\frac{\omega\_n''(x\_k^{(n)})}{\omega\_n'(x\_k^{(n)})}$.
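Formula (5) can be evaluated directly from these definitions. The following sketch (an illustrative $O(n^2)$-per-point implementation of ours, not the fast algorithm of [17]) uses the Chebyshev point system (3) together with the identities $\omega\_n'(x\_k^{(n)}) = \prod\_{j\neq k}(x\_k^{(n)} - x\_j^{(n)})$ and $\omega\_n''(x\_k^{(n)})/\omega\_n'(x\_k^{(n)}) = 2\sum\_{j\neq k} 1/(x\_k^{(n)} - x\_j^{(n)})$:

```python
import numpy as np

def hermite_fejer_star(f, df, n, xs):
    """Evaluate H*_{2n-1}(f, x) of (5) at the points xs, using the
    Chebyshev nodes (3). O(n^2) work per point (illustration only)."""
    k = np.arange(1, n + 1)
    xk = np.cos((2 * k - 1) * np.pi / (2 * n))            # nodes (3)
    dw = np.array([np.prod(xk[i] - np.delete(xk, i))      # omega'(x_k)
                   for i in range(n)])
    ratio = np.array([2 * np.sum(1.0 / (xk[i] - np.delete(xk, i)))
                      for i in range(n)])                 # omega''/omega'
    out = np.empty(len(xs), dtype=float)
    for i, x in enumerate(np.asarray(xs, dtype=float)):
        hit = np.isclose(x, xk)
        if hit.any():                      # x is (numerically) a node
            out[i] = f(xk[np.argmax(hit)])
            continue
        l = np.prod(x - xk) / (dw * (x - xk))   # Lagrange basis l_k(x)
        v = 1.0 - (x - xk) * ratio              # v_k(x)
        out[i] = np.sum((f(xk) * v + df(xk) * (x - xk)) * l ** 2)
    return out
```

With exact data this reproduces any polynomial of degree at most $2n - 1$ up to rounding, consistent with the uniqueness of the Hermite interpolant.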

Fejér [5] and Grünwald [7] also showed that the convergence of the Hermite-Fejér interpolation of $f(x)$ depends on the choice of the nodes. The point system (1) is called normal if for all $n$

$$v\_k^{(n)}(\mathbf{x}) \ge 0, \quad k = 1, 2, \dots, n, \quad \mathbf{x} \in [-1, 1], \tag{6}$$

while the point system (1) is called strongly normal if for all $n$

$$v\_k^{(n)}(x) \ge c > 0, \quad k = 1, 2, \dots, n, \quad x \in [-1, 1] \tag{7}$$

for some positive constant *c*.
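As a quick numerical check (ours, not part of the paper): for the Chebyshev point system (3) one finds $v\_k^{(n)}(x) = (1 - x\,x\_k^{(n)})/(1 - (x\_k^{(n)})^2)$, so (7) holds with $c = 1/2$, i.e. this point system is strongly normal. The following sketch verifies the bound on a grid:

```python
import numpy as np

def v_values(n, xs):
    """v_k(x) = 1 - (x - x_k) * omega''(x_k)/omega'(x_k) at the
    Chebyshev nodes (3), evaluated on the grid xs (rows: x, cols: k)."""
    k = np.arange(1, n + 1)
    xk = np.cos((2 * k - 1) * np.pi / (2 * n))
    ratio = np.array([2 * np.sum(1.0 / (xk[i] - np.delete(xk, i)))
                      for i in range(n)])    # omega''(x_k)/omega'(x_k)
    return 1.0 - np.subtract.outer(np.asarray(xs), xk) * ratio

V = v_values(8, np.linspace(-1.0, 1.0, 2001))
# min over the grid equals 1/(1 + max_k |x_k|), which is strictly above 1/2
```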

Fejér [5] (see also Szegö [12, p. 339]) showed that for the zeros of the Jacobi polynomial $P\_n^{(\alpha,\beta)}(x)$ of degree $n$ ($\alpha > -1$, $\beta > -1$),

$$v\_k^{(n)}(x) \ge \min\{-\alpha, -\beta\} \quad \text{for } -1 < \alpha \le 0, \ -1 < \beta \le 0, \ k = 1, 2, \dots, n \text{ and } x \in [-1, 1].$$

For (strongly) normal point systems, Grünwald [7] showed that, for every $f \in C^1(-1, 1)$, $\lim\_{n\to\infty} \|H^{\*}\_{2n-1}(f) - f\|\_\infty = 0$ if $\{x\_k^{(n)}\}$ is strongly normal satisfying (7) and $\{f'(x\_k^{(n)})\}$ satisfies

$$|f'(x\_k^{(n)})| < n^{c-\delta} \quad \text{for some given positive number } \delta, \quad k = 1, 2, \dots, n, \quad n = 1, 2, \dots,$$

while $\lim\_{n\to\infty} \|H^{\*}\_{2n-1}(f) - f\|\_\infty = 0$ on $[-1 + \epsilon, 1 - \epsilon]$ for each fixed $0 < \epsilon < 1$ if $\{x\_k^{(n)}\}$ is normal and $\{f'(x\_k^{(n)})\}$ is uniformly bounded for $n = 1, 2, \dots$.<sup>1</sup>

Moreover, Szabados [11] showed that the convergence of the Hermite-Fejér interpolation (5) at the Chebyshev point system (3) satisfies

$$\|f - H\_{2n-1}^\*(f)\|\_{\infty} = O(1) \|f - p^\*\|\_{C^1[-1,1]} \tag{8}$$

where $p^*$ is the best approximation polynomial of $f$ with degree at most $2n-1$ and $\|f - p^*\|_{C^1[-1,1]} = \max_{0\le j\le 1}\|f^{(j)} - p^{*(j)}\|_\infty$.

Hermite-Fejér interpolation is widely used in computer-aided geometric design with boundary conditions that include derivative information. The convergence rate under the infinity norm has been extensively studied in [5-7, 11, 14]. An efficient algorithm for the fast implementation of Hermite-Fejér interpolation at the zeros of Jacobi polynomials can be found in [17].

In this paper, the following convergence rates of the Hermite-Fejér interpolation $H^*_{2n-1}(f, x)$ at Gauss-Jacobi pointsystems are considered.

• If *f* is analytic in *E<sup>ρ</sup>* with |*f (z)*| ≤ *M*, then

$$\|f(x) - H^*_{2n-1}(f, x)\|_{\infty} = \begin{cases} O\left(\frac{4\tau_n M[2n\rho^2 + (1 - 2n)\rho]}{(\rho - 1)^2 \rho^{2n}}\right), & \gamma \le 0, \\ O\left(\frac{n^{2 + 2\gamma}[2n\rho^2 + (1 - 2n)\rho]}{(\rho - 1)^2 \rho^{2n}}\right), & \gamma > 0, \end{cases} \quad \gamma = \max\{\alpha, \beta\}. \tag{9}$$

<sup>1</sup>In fact, Grünwald in [7] considered more general cases with an arbitrary vector $\{d_k^{(n)}\}$ in place of $\{f'(x_k^{(n)})\}$.

where

$$\tau_n = \begin{cases} O(n^{-1.5-\min\{\alpha,\beta\}}\log n), & \text{if } -1 < \min\{\alpha,\beta\} \le \gamma \le -\frac{1}{2}, \\ O(n^{2\gamma-\min\{\alpha,\beta\}-\frac{1}{2}}), & \text{if } -1 < \min\{\alpha,\beta\} \le -\frac{1}{2} < \gamma \le 0, \\ O(n^{2\gamma}), & \text{if } -\frac{1}{2} < \min\{\alpha,\beta\} \le \gamma. \end{cases} \tag{10}$$

• If $f(x)$ has an absolutely continuous $(r-1)$st derivative $f^{(r-1)}$ on $[-1, 1]$ for an integer $r \ge 3$, and an $r$th derivative $f^{(r)}$ of bounded variation $V_r = \operatorname{Var}(f^{(r)}) < \infty$, then

$$\|f(x) - H^*_{2n-1}(f, x)\|_{\infty} = \begin{cases} O\left(n^{-r} \log n\right), & \gamma \le -\frac{1}{2}, \\ O\left(n^{2\gamma - r + 1}\right), & \gamma > -\frac{1}{2}, \end{cases} \tag{11}$$

while if $f(x)$ is differentiable and $f'(x)$ is bounded on $[-1, 1]$, then

$$\|f(x) - H^*_{2n-1}(f, x)\|_{\infty} = \begin{cases} O\left(n^{-1} \log n\right), & \gamma \le -\frac{1}{2}, \\ O\left(n^{2\gamma}\right), & \gamma > -\frac{1}{2}. \end{cases}$$

Comparing these results with

$$\|f(x) - H_{2n-1}(f, x)\|_{\infty} = \begin{cases} O\left(n^{-1} \log n\right), & \text{if } \gamma \le -\frac{1}{2}, \\ O\left(n^{2\gamma}\right), & \text{if } \gamma > -\frac{1}{2}, \end{cases} \quad \text{(Vértesi [14]),}$$

which is sharp and attainable (see Fig. 2), we see that $H^*_{2n-1}(f, x)$ converges much faster than $H_{2n-1}(f, x)$ for analytic functions or functions of higher regularity (see Fig. 1). In particular, $H_{2n-1}(f, x)$ diverges at Gauss-Jacobi pointsystems with $\gamma \ge 0$, whereas $H^*_{2n-1}(f, x)$ converges for functions analytic in the Bernstein ellipse or of limited regularity.

**Fig. 2** $\|H_{2n-1}(f, x) - f(x)\|_\infty$ at $x = -1:0.001:1$ using the Gauss-Jacobi pointsystem for $f(x) = |x|$ with different $\alpha$ and $\beta$, respectively

For simplicity, in the following we abbreviate $x_k^{(n)}$ as $x_k$, $\ell_k^{(n)}(x)$ as $\ell_k(x)$, $h_k^{(n)}(x)$ as $h_k(x)$, and $b_k^{(n)}(x)$ as $b_k(x)$. $A \sim B$ denotes that there exist two positive constants $c_1$ and $c_2$ such that $c_1 \le |A|/|B| \le c_2$.

#### **2 Main Results**

Suppose $f(x)$ satisfies a Dini-Lipschitz condition on $[-1, 1]$; then it has the following absolutely and uniformly convergent Chebyshev series expansion

$$f(\mathbf{x}) = \sum\_{j=0}^{\infty} \!' c\_j T\_j(\mathbf{x}), \quad c\_j = \frac{2}{\pi} \int\_{-1}^{1} \frac{f(\mathbf{x}) T\_j(\mathbf{x})}{\sqrt{1 - \mathbf{x}^2}} d\mathbf{x}, \quad j = 0, 1, \ldots \tag{12}$$

where the prime denotes a summation whose first term is halved, and $T_j(x) = \cos(j \cos^{-1} x)$ denotes the Chebyshev polynomial of degree $j$.

#### **Lemma 1**

*(i) (Bernstein [2]) If f is analytic with* |*f (z)*| ≤ *M in the region bounded by the ellipse E<sup>ρ</sup> with foci* ±1 *and major and minor semiaxis lengths summing to ρ >* 1*, then for each j* ≥ 0*,*

$$|c\_{j}| \leq \frac{2M}{\rho^{j}}.\tag{13}$$

*(ii) (Trefethen [13]) For an integer $r \ge 1$, if $f(x)$ has an absolutely continuous $(r-1)$st derivative $f^{(r-1)}$ on $[-1, 1]$ and an $r$th derivative $f^{(r)}$ of bounded variation $V_r = \operatorname{Var}(f^{(r)}) < \infty$, then for each $j \ge r + 1$,*

$$|c\_{j}| \le \frac{2V\_{r}}{\pi j(j-1)\cdots(j-r)}.\tag{14}$$
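Both decay estimates are easy to observe numerically. The sketch below approximates the coefficients $c_j$ in (12) by Gauss-Chebyshev quadrature and recovers the geometric rate $\rho$ for the Runge function $1/(1+25x^2)$, whose poles at $\pm i/5$ give the Bernstein ellipse parameter $\rho = (1+\sqrt{26})/5 \approx 1.22$; the test function and grid sizes are our choices:

```python
import numpy as np

def cheb_coeffs(f, N):
    # c_j ~ (2/N) sum_k f(cos theta_k) cos(j theta_k): Gauss-Chebyshev quadrature of (12)
    theta = (2*np.arange(1, N + 1) - 1) * np.pi / (2*N)
    fx = f(np.cos(theta))
    j = np.arange(N)
    return (2.0/N) * np.cos(np.outer(j, theta)) @ fx

f = lambda x: 1.0 / (1 + 25*x**2)
c = cheb_coeffs(f, 200)
rho = (1 + np.sqrt(26)) / 5                   # ellipse parameter from the poles +-i/5
rate = (abs(c[10]) / abs(c[50]))**(1.0/40)    # observed geometric decay rate
```

The Chebyshev coefficients of this function decay geometrically with ratio exactly $1/\rho$ per index (even indices; odd coefficients vanish by symmetry), so the observed rate matches $\rho$ closely.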

Suppose $-1 < x_n < x_{n-1} < \cdots < x_1 < 1$, in decreasing order, are the roots of $P_n^{(\alpha,\beta)}(x)$ ($\alpha, \beta > -1$), and $\{w_j\}_{j=1}^n$ are the corresponding weights in the Gauss-Jacobi quadrature.

**Lemma 2** *For j* = 1*,* 2*,...,n, it follows*

$$(x - x_j)\ell_j(x) = \sigma_n(-1)^j \frac{\sqrt{(1 - x_j^2)w_j}}{2^{(\alpha + \beta + 1)/2}} \sqrt{\frac{n!\,\Gamma(n + \alpha + \beta + 1)}{\Gamma(n + \alpha + 1)\Gamma(n + \beta + 1)}}\, P_n^{(\alpha, \beta)}(x), \tag{15}$$

*where σn* = +1 *for even n and σn* = −1 *for odd n.*

*Proof* Let $z_n = \int_{-1}^{1}(1-x)^\alpha(1+x)^\beta\,[P_n^{(\alpha,\beta)}(x)]^2\,dx$ and let $K_n$ be the leading coefficient of $P_n^{(\alpha,\beta)}(x)$. From Abramowitz and Stegun [1], we have

$$z\_n = \frac{2^{\alpha+\beta+1}}{2n+\alpha+\beta+1} \cdot \frac{\Gamma(n+\alpha+1)\Gamma(n+\beta+1)}{n!\Gamma(n+\alpha+\beta+1)}, \quad K\_n = \frac{1}{2^n} \frac{\Gamma(2n+\alpha+\beta+1)}{n!\Gamma(n+\alpha+\beta+1)}.$$

Furthermore, by Szegö [12, (15.3.1)] (also see Wang et al. [15]), we obtain

$$\begin{aligned} (x - x_j)\ell_j(x) &= \frac{1}{\omega_n'(x_j)}\,\omega_n(x) = \sigma_n(-1)^j \sqrt{\frac{K_n^2\, 2n(1 - x_j^2)\, w_j}{2n(2n + \alpha + \beta + 1)\,z_n}}\,\omega_n(x) \\ &= \sigma_n(-1)^j \sqrt{\frac{(1 - x_j^2)\, w_j}{z_n(2n + \alpha + \beta + 1)}}\, P_n^{(\alpha, \beta)}(x), \end{aligned}$$

which implies the desired result (15). □
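For the Legendre case $\alpha = \beta = 0$, the Gamma-function factor in (15) equals $1$ and $2^{(\alpha+\beta+1)/2} = \sqrt{2}$, so the identity can be verified numerically up to the sign factor $\sigma_n(-1)^j$. A sketch using NumPy's Gauss-Legendre routines; comparing absolute values sidesteps the node-ordering and sign conventions:

```python
import numpy as np

n = 8
nodes, w = np.polynomial.legendre.leggauss(n)            # roots of P_n, Gauss weights
x = np.linspace(-1, 1, 201)
Pn = np.polynomial.legendre.legval(x, [0.0]*n + [1.0])   # P_n(x)

factor = np.sqrt((1 - nodes**2) * w) / np.sqrt(2)        # |coefficient| in Eq. (15)
max_err = 0.0
for j in range(n):
    others = np.delete(nodes, j)
    ell = np.prod((x[:, None] - others) / (nodes[j] - others), axis=1)   # ell_j(x)
    lhs = np.abs((x - nodes[j]) * ell)                   # |(x - x_j) ell_j(x)|
    rhs = np.abs(factor[j] * Pn)
    max_err = max(max_err, np.max(np.abs(lhs - rhs)))
```

The two sides agree to roundoff for every node, as (15) predicts.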

**Lemma 3** *For j* = 1*,* 2*,...,n, it follows*

$$(1 - x\_j^2)w\_j = \mathcal{O}\left(n^{-1}\right). \tag{16}$$

*Proof* From $w_j = O\left(\frac{2^{\alpha+\beta+1}\pi}{n}\left(\sin\frac{\theta_j}{2}\right)^{2\alpha+1}\left(\cos\frac{\theta_j}{2}\right)^{2\beta+1}\right)$ (Szegő [12, (15.3.10)]), we see for $x_j = \cos\theta_j$ that $(1-x_j^2)w_j = O\left(\frac{2^{\alpha+\beta+3}\pi}{n}\left(\sin\frac{\theta_j}{2}\right)^{2\alpha+3}\left(\cos\frac{\theta_j}{2}\right)^{2\beta+3}\right)$, which yields the desired result. □

**Lemma 4 ([10, 16])** *For $t \in [-1, 1]$, let $x_m$ be the root of the Jacobi polynomial $P_n^{(\alpha,\beta)}$ which is closest to $t$. Then for $k = 1, 2, \dots, n$, we have*

$$\ell_k(t) = \begin{cases} O\left(|k-m|^{-1} + |k-m|^{\gamma-\frac{1}{2}}\right), & k \neq m, \\ O(1), & k = m, \end{cases} \quad \gamma = \max\{\alpha, \beta\}. \tag{17}$$

**Lemma 5 (Szegö [12, Theorem 8.1.2])** *Let α, β be real but not necessarily greater than* −1 *and xk* = cos *θk. Then for each fixed k, it follows*

$$\lim\_{n \to \infty} n\theta\_k = j\_k,\tag{18}$$

*where jk is the kth positive zero of Bessel function Jα.*

**Lemma 6** *For k* = 1*,* 2*,...,n, it follows*

$$v_k(x) = 1 - (x - x_k)\,\frac{\omega_n''(x_k)}{\omega_n'(x_k)} = O(n^2). \tag{19}$$


*Proof* Note that $P_n^{(\alpha,\beta)}(x)$ satisfies the second-order linear homogeneous Sturm-Liouville differential equation [12, (4.2.1)]

$$(1 - x^2)y'' + (\beta - \alpha - (\alpha + \beta + 2)x)y' + n(n + \alpha + \beta + 1)y = 0.$$

By $\omega_n(x) = \frac{P_n^{(\alpha,\beta)}(x)}{K_n}$, we get

$$\frac{\omega_n''(x_j)}{\omega_n'(x_j)} = -\frac{\beta - \alpha - (\alpha + \beta + 2)x_j}{1 - x_j^2}. \tag{20}$$

In addition, by Lemma 5 with $x_j = \cos\theta_j$, we see that $\theta_1 \sim \frac{1}{n}$. Similarly, by $P_n^{(\alpha,\beta)}(-x) = (-1)^n P_n^{(\beta,\alpha)}(x)$ we have $\pi - \theta_n \sim \frac{1}{n}$. These together yield

$$\frac{1}{1-\mathbf{x}\_1^2} = O(n^2), \quad \frac{1}{1-\mathbf{x}\_n^2} = O(n^2), \quad \frac{1}{1-\mathbf{x}\_j^2} \le \max\left(\frac{1}{1-\mathbf{x}\_1^2}, \frac{1}{1-\mathbf{x}\_n^2}\right) = O(n^2)$$

and then by (20) we deduce the desired result. □

**Theorem 1** *Suppose $\{x_j\}_{j=1}^n$ are the roots of $P_n^{(\alpha,\beta)}(x)$ with $\alpha, \beta > -1$. Then the Hermite-Fejér interpolation (5) at $\{x_j\}_{j=1}^n$, for $f$ analytic in $E_\rho$ with $|f(z)| \le M$, has the convergence rate (9).*

*Proof* Since the Chebyshev series expansion of $f(x)$ is uniformly convergent under the assumptions, and the error of the Hermite-Fejér interpolation (5) on Chebyshev polynomials satisfies $|E(T_j, x)| = |T_j(x) - H^*_{2n-1}(T_j, x)| = 0$ for $j = 0, 1, \dots, 2n-1$, it follows that

$$|E(f, \mathbf{x})| = |f(\mathbf{x}) - H\_{2n-1}^\*(f, \mathbf{x})| = |\sum\_{j=0}^{\infty} c\_j E(T\_j, \mathbf{x})| \le \sum\_{j=2n}^{\infty} |c\_j| |E(T\_j, \mathbf{x})|.\tag{21}$$

Furthermore, $|E(T_j, x)| = |T_j(x) - \sum_{i=1}^n T_j(x_i)h_i(x) - \sum_{i=1}^n T_j'(x_i)b_i(x)|$. In the following, we will focus on estimates of $|E(T_j, x)|$ for $j \ge 2n$.

In the case $\gamma \le 0$: notice that the pointsystem is normal, which implies $h_i(x) \ge 0$ for all $i = 1, 2, \dots, n$ and all $x \in [-1, 1]$, and

$$1 \equiv \sum\_{l=1}^{n} h\_l(\mathbf{x}) = \sum\_{l=1}^{n} v\_l(\mathbf{x}) \ell\_l^2(\mathbf{x}).$$

Then we have

$$|\sum\_{l=1}^{n} T\_j(\mathbf{x}\_l) h\_l(\mathbf{x})| \le \sum\_{l=1}^{n} h\_l(\mathbf{x}) = 1, \quad j = 0, 1, \dots \tag{22}$$
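The partition identity $\sum_l h_l(x) \equiv 1$ behind (22) is just the exactness of $H^*_{2n-1}$ on constants, and it can be confirmed numerically, e.g. at Chebyshev nodes (our choice of pointsystem and resolution):

```python
import numpy as np

n = 15
nodes = np.cos((2*np.arange(1, n + 1) - 1) * np.pi / (2*n))   # Chebyshev nodes
x = np.linspace(-1, 1, 501)

# omega_n''/omega_n' at the nodes, from omega_n(x) = prod_j (x - x_j)
diffs = nodes[:, None] - nodes[None, :]
np.fill_diagonal(diffs, 1.0)
inv = 1.0 / diffs
np.fill_diagonal(inv, 0.0)
ratio = 2.0 * inv.sum(axis=1)

total = np.zeros_like(x)
for l in range(n):
    others = np.delete(nodes, l)
    ell = np.prod((x[:, None] - others) / (nodes[l] - others), axis=1)
    total += (1.0 - (x - nodes[l]) * ratio[l]) * ell**2           # h_l(x)
dev = np.max(np.abs(total - 1.0))
```

The deviation from $1$ stays at roundoff level over the whole interval.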


Additionally, by Lemma 2, we obtain for $j = 2n, 2n+1, \dots$ that

$$\begin{aligned} \left| \sum_{l=1}^{n} T_{j}'(x_{l}) b_{l}(x) \right| &= j \left| \sum_{l=1}^{n} U_{j-1}(x_{l})\,(x - x_{l})\,\ell_{l}^{2}(x) \right| \\ &= \frac{j}{2^{(\alpha+\beta+1)/2}} \sqrt{\frac{n!\,\Gamma(n+\alpha+\beta+1)}{\Gamma(n+\alpha+1)\,\Gamma(n+\beta+1)}} \left| P_{n}^{(\alpha,\beta)}(x) \sum_{l=1}^{n} U_{j-1}(x_{l}) \sqrt{(1-x_{l}^{2})\,w_{l}}\; \ell_{l}(x) \right| \\ &= \frac{j}{2^{(\alpha+\beta+1)/2}} \sqrt{\frac{n!\,\Gamma(n+\alpha+\beta+1)}{\Gamma(n+\alpha+1)\,\Gamma(n+\beta+1)}} \left| P_{n}^{(\alpha,\beta)}(x) \sum_{l=1}^{n} \sin(j \arccos(x_{l}))\, \sqrt{w_{l}}\; \ell_{l}(x) \right| \\ &= O\!\left( j\, |P_{n}^{(\alpha,\beta)}(x)|\, \sqrt{\|\{w_{l}\}_{l=1}^{n}\|_{\infty}}\; \Lambda_{n} \right) \end{aligned}$$

($U_{j-1}$ is the Chebyshev polynomial of the second kind of degree $j-1$), since $\sqrt{\frac{n!\,\Gamma(n+\alpha+\beta+1)}{\Gamma(n+\alpha+1)\Gamma(n+\beta+1)}}$ is uniformly bounded in $n$ for $\alpha, \beta > -1$ due to

$$\begin{aligned} \frac{(n+1)!\Gamma(n+\alpha+\beta+2)}{\Gamma(n+\alpha+2)\Gamma(n+\beta+2)} &= \left(1 - \frac{\alpha\beta}{(n+1)^2 + (\alpha+\beta)(n+1) + \alpha\beta}\right), \\ &\times \frac{n!\Gamma(n+\alpha+\beta+1)}{\Gamma(n+\alpha+1)\Gamma(n+\beta+1)}, \end{aligned}$$

which implies that $\frac{n!\,\Gamma(n+\alpha+\beta+1)}{\Gamma(n+\alpha+1)\Gamma(n+\beta+1)}$ is uniformly bounded in $n$, and then so is its square root. Here $\Lambda_n = \max_{x\in[-1,1]}\sum_{i=1}^n |\ell_i(x)|$ is the Lebesgue constant. Then from

$$P\_n^{(\alpha,\beta)}(\mathbf{x}) = \begin{cases} O(n^{-\frac{1}{2}}), & \text{if } \max\{\alpha, \beta\} \le -\frac{1}{2} \\ O\left(n^{\max\{\alpha,\beta\}}\right), & \text{if } \max\{\alpha,\beta\} > -\frac{1}{2} \end{cases},$$

$$w\_l = \begin{cases} O\left(n^{-2-2\min\{\alpha,\beta\}}\right), & \text{if } \min\{\alpha,\beta\} \le -\frac{1}{2} \\ O\left(n^{-1}\right), & \text{if } \min\{\alpha,\beta\} > -\frac{1}{2} \end{cases}$$

(see Szegö [12, pp 168, 354]) and

$$\Lambda_n = \begin{cases} O(\log n), & \text{if } \max\{\alpha, \beta\} \le -\frac{1}{2}, \\ O(n^{\max\{\alpha, \beta\} + \frac{1}{2}}), & \text{if } \max\{\alpha, \beta\} > -\frac{1}{2} \end{cases} \quad \text{([12, p. 338])},$$

we have

$$\left| \sum\_{i=1}^{n} T\_j'(\mathbf{x}\_i) b\_i(\mathbf{x}) \right| = j \tau\_n. \tag{23}$$

Then by (22) and (23), we find $|E(T_j, x)| \le 2 + j\tau_n < 2j\tau_n$ for $j \ge 2n$, and consequently

$$|E(f, x)| = |f(x) - H^*_{2n-1}(f, x)| \le \sum_{j=2n}^{\infty} |c_j||E(T_j, x)| \le 2\tau_n \sum_{j=2n}^{\infty} j|c_j|,$$

which, directly following [18], leads to the desired result.

In the case $\gamma > 0$: from $|E(T_j, x)| = |T_j(x) - \sum_{i=1}^n T_j(x_i)h_i(x) - \sum_{i=1}^n T_j'(x_i)b_i(x)|$, by Lemmas 3 and 6 we obtain

$$\sum\_{l=1}^n |\upsilon\_l(\mathbf{x})| \ell\_l^2(\mathbf{x}) = O\left(n^2 \int\_1^n t^{2\gamma - 1} dt\right) = O(n^{2+2\gamma}),$$

and

$$T\_j(\mathbf{x}) - \sum\_{l=1}^n T\_j(\mathbf{x}\_l) h\_l(\mathbf{x}) = T\_j(\mathbf{x}) - \sum\_{l=1}^n T\_j(\mathbf{x}\_l) v\_l(\mathbf{x}) \ell\_l^2(\mathbf{x}) = O\left(n^{2+2\gamma}\right).$$

These together with

$$\begin{aligned} \left|\sum_{l=1}^{n} T_j'(x_l) b_l(x)\right| &= \frac{j}{2^{(\alpha+\beta+1)/2}} \sqrt{\frac{n!\,\Gamma(n+\alpha+\beta+1)}{\Gamma(n+\alpha+1)\Gamma(n+\beta+1)}} \left| P_n^{(\alpha, \beta)}(x) \sum_{l=1}^{n} \sin(j \arccos(x_l))\, \sqrt{w_l}\; \ell_l(x) \right| \\ &= j\tau_n \end{aligned}$$

and then $|E(T_j, x)| = O\left(j^{2+2\gamma}\right)$ for $j \ge 2n$; arguing as in the case $\gamma \le 0$ above, this implies the desired result. □

From the definition of $\tau_n$, we see that when $\alpha = \beta = -\frac{1}{2}$ the order of $\tau_n$ in $n$ is the lowest. In addition, if $f$ is of limited regularity, we have

**Lemma 7 (Vértesi [14])** *Suppose $\{x_j\}_{j=1}^n$ are the roots of $P_n^{(\alpha,\beta)}(x)$. For every continuous function $f(x)$ we have*

$$|H_{2n-1}(f, x) - f(x)| = O(1) \sum_{j=1}^{n} \left[ w\!\left(f; \frac{j\sqrt{1-x^2}}{n}\right) + w\!\left(f; \frac{j^2|x|}{n^2}\right) \right] j^{2\bar{\gamma}-1}, \tag{24}$$

*where $w(f; t) = w(t)$ is the modulus of continuity of $f(x)$, and $\bar\gamma = \max\left\{\alpha, \beta, -\frac{1}{2}\right\}$.*

**Theorem 2** *Suppose $\{x_j\}_{j=1}^n$ are the roots of $P_n^{(\alpha,\beta)}(x)$ ($\alpha, \beta > -1$), and $f(x)$ has an absolutely continuous $(r-1)$st derivative $f^{(r-1)}$ on $[-1, 1]$ for some $r \ge 3$, and an $r$th derivative $f^{(r)}$ of bounded variation $V_r < \infty$. Then the Hermite-Fejér interpolation (5) at $\{x_j\}_{j=1}^n$ has the convergence rate (11).*

*Proof* Consider the functional $L(g) = E_n(g, x)$, where $E_n(g, x)$ is defined for every $g \in C^1([-1, 1])$ by

$$E\_n(\mathbf{g}, \mathbf{x}) = \mathbf{g}(\mathbf{x}) - \sum\_{j=1}^n \mathbf{g}(\mathbf{x}\_j) v\_j(\mathbf{x}) \ell\_j^2(\mathbf{x}) - \sum\_{j=1}^n \mathbf{g}'(\mathbf{x}\_j) (\mathbf{x} - \mathbf{x}\_j) \ell\_j^2(\mathbf{x}). \tag{25}$$

By the Peano kernel theorem for *n* ≥ *r* (see Peano [9] or Kowalewski [8]), *En(f, x)* can be represented as

$$E\_n(f, x) = \int\_{-1}^{1} f^{(r)}(t) K\_r(t) dt\tag{26}$$

with $K_r(t) = \frac{1}{(r-1)!}\, L\!\left((x - t)_+^{r-1}\right)$ for $r = 3, 4, \cdots$, that is,

$$\begin{aligned} K_{r}(t) = {}& \frac{1}{(r-1)!} (x - t)_{+}^{r-1} - \frac{1}{(r-1)!} \sum_{j=1}^{n} (x_{j} - t)_{+}^{r-1}\, v_{j}(x)\, \ell_{j}^{2}(x) \\ & - \frac{1}{(r-2)!} \sum_{j=1}^{n} (x_{j} - t)_{+}^{r-2}\, (x - x_{j})\, \ell_{j}^{2}(x), \end{aligned}$$

where

$$(x - t)_{+}^{k-1} = \begin{cases} (x - t)^{k-1}, & x \ge t, \\ 0, & x < t \end{cases} \quad (k \ge 2), \qquad (x - t)_{+}^{0} = \begin{cases} 1, & x \ge t, \\ 0, & x < t \end{cases} \quad (k = 1).$$

Moreover, noting that

$$\frac{1}{(k-2)!}(x - \mu)_+^{k-2} = \int_{\mu}^{1} \frac{1}{(k-3)!}\,(x - t)_+^{k-3}\,dt, \quad k = 3, 4, \dots,$$

we get the following identity

$$K_{s-1}(\mu) = \int_{\mu}^{1} K_{s-2}(t)\,dt, \quad s = 4, 5, \dots,$$

where *K*2*(t)* is defined by

$$K\_2(t) = (\mathbf{x} - t)\_+^1 - \sum\_{j=1}^n (\mathbf{x}\_j - t)\_+^1 \upsilon\_j(\mathbf{x}) \ell\_j^2(\mathbf{x}) - \sum\_{j=1}^n (\mathbf{x}\_j - t)\_+^0 (\mathbf{x} - \mathbf{x}\_j) \ell\_j^2(\mathbf{x}).$$

In addition, it can be easily verified that *Ks(*−1*)* = *Ks(*1*)* = 0 for *s* = 2*,* 3*,...*.

Since $f^{(r)}$ is of bounded variation, directly applying techniques similar to those of Theorem 2 and Lemma 4 in [16], we get

$$\|E\_n(f, x)\|\_{\infty} \le V\_r \|K\_{r+1}\|\_{\infty},\tag{27}$$

and

$$\|K\_{s+1}\|\_{\infty} \le \frac{\pi}{2n - s} \sup\_{-1 \le t \le 1} |K\_s(t)|, \quad \text{for } s = 2, 3, \dots, \tag{28}$$

respectively. Then from (27) and (28), we can obtain that

$$\|E_n(f, x)\|_{\infty} \le \frac{\pi^{r-1}\, V_r}{(2n-2)(2n-3)\cdots(2n-r)}\,\|K_2\|_{\infty}. \tag{29}$$

In addition, by Lemma 7, we have

$$\left\|(x - t)_+^1 - \sum_{j=1}^n (x_j - t)_+^1\, v_j(x)\,\ell_j^2(x)\right\|_{\infty} = \begin{cases} O\left(\frac{\log n}{n}\right), & \gamma \le -\frac{1}{2}, \\ O\left(n^{2\gamma}\right), & \gamma > -\frac{1}{2}, \end{cases} \tag{30}$$

while by Lemmas 2–3, we get

$$\left|\sum_{j=1}^{n} (x_j - t)_{+}^{0}\,(x - x_j)\,\ell_{j}^{2}(x)\right| \le \sum_{j=1}^{n} |(x - x_j)\,\ell_{j}^{2}(x)| = \begin{cases} O\left(\frac{\log n}{n}\right), & \gamma \le -\frac{1}{2}, \\ O\left(n^{2\gamma}\right), & \gamma > -\frac{1}{2}. \end{cases} \tag{31}$$

Combining (30) and (31), we obtain the desired results by using

$$K_2(t) = \begin{cases} O\left(\frac{\log n}{n}\right), & \gamma \le -\frac{1}{2}, \\ O\left(n^{2\gamma}\right), & \gamma > -\frac{1}{2}. \end{cases}$$

Finally, we use an analytic function $f(x) = \frac{1}{1+25x^2}$ and a function of limited regularity $f(x) = |x|^5$ to show in Fig. 3 that the convergence rate of $\|f(x) - H^*_{2n-1}(f, x)\|_\infty$ depends on $\alpha$ and $\beta$.

**Fig. 3** $\|H^*_{2n-1}(f, x) - f(x)\|_\infty$ at $x = -1:0.001:1$ using the Gauss-Jacobi pointsystem for $f(x) = \frac{1}{1+25x^2}$ and $f(x) = |x|^5$ with different $\alpha$ and $\beta$, respectively

**Acknowledgements** Shuhuang Xiang was supported in part by NSF of China (No. 11771454). Guo He was supported in part by NSF of China (No. 11901242), the Fundamental Research Funds for the Central Universities (No. 21618333), and the Opening Project at Sun Yat-sen University (No. 2018010).

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Fifth-Order Finite-Volume WENO on Cylindrical Grids**

**Mohammad Afzal Shadab, Xing Ji, and Kun Xu**

### **1 Introduction**

The conventional WENO scheme is specifically designed for reconstruction in Cartesian coordinates on uniform grids [1]. Employing a Cartesian-based reconstruction scheme on a cylindrical grid suffers from a number of drawbacks [2, 3]. For example, in the original PPM paper, reconstruction was performed in volume coordinates (rather than the linear ones) so that the algorithm for a Cartesian mesh could be used on a curvilinear mesh; however, the resulting interface states became first-order accurate even for smooth flows [2]. Another example is the assignment of the volume average to the geometrical cell center of the finite volume rather than to the centroid [2]. A breakthrough in the field of high order reconstruction in cylindrical coordinates is the application of the Vandermonde-like linear systems of equations with spatially

M. A. Shadab (-)

X. Ji

Department of Mathematics, Hong Kong University of Science and Technology, Kowloon City, Hong Kong

e-mail: xjiad@connect.ust.hk

K. Xu

Department of Mechanical and Aerospace Engineering, Hong Kong University of Science and Technology, Kowloon City, Hong Kong e-mail: makxu@ust.hk

© The Author(s) 2020


S. J. Sherwin et al. (eds.), *Spectral and High Order Methods for Partial Differential Equations ICOSAHOM 2018*, Lecture Notes in Computational Science and Engineering 134, https://doi.org/10.1007/978-3-030-39647-3\_51

varying coefficients [2]. It is reintroduced in the present work to build a basis for the derivation of the high order WENO schemes.

The motivation for the present work is to develop a fifth-order finite-volume WENO reconstruction scheme in the efficient dimension-by-dimension framework, specifically aimed at regularly-spaced and irregularly-spaced grids in cylindrical coordinates.

#### **2 Finite-Volume Discretization in Curvilinear Coordinates**

#### *2.1 Evaluation of the Linear Weights*

A non-uniform grid spacing with zone width $\Delta\xi_i = \xi_{i+\frac{1}{2}} - \xi_{i-\frac{1}{2}}$ is considered, where $\xi \in (x_1, x_2, x_3)$ is the coordinate along the reconstruction direction and $\xi_{i+\frac{1}{2}}$ denotes the location of the cell interface between zones $i$ and $i+1$. Let $\bar{Q}_i$ be the cell average of the conserved quantity $Q$ inside zone $i$ at some given time, which can be expressed in the form of Eq. (1).

$$\bar{Q}_i = \frac{1}{\Delta V_i} \int_{\xi_{i-\frac{1}{2}}}^{\xi_{i+\frac{1}{2}}} Q_i(\xi)\, \frac{\partial V}{\partial \xi}\, d\xi \quad \& \quad \Delta V_i = \int_{\xi_{i-\frac{1}{2}}}^{\xi_{i+\frac{1}{2}}} \frac{\partial V}{\partial \xi}\, d\xi \tag{1}$$

where the local cell volume $\Delta V_i$ of the $i$th cell in the direction of reconstruction is given in Eq. (1) and $\frac{\partial V}{\partial \xi}$ is the one-dimensional Jacobian. Our aim now is to find a $p$th-order accurate approximation to the actual solution by constructing a $(p-1)$th-order polynomial distribution, as given in Eq. (2).

$$Q_i(\xi) = a_{i,0} + a_{i,1}(\xi - \xi_i^c) + a_{i,2}(\xi - \xi_i^c)^2 + \dots + a_{i,p-1}(\xi - \xi_i^c)^{p-1} \tag{2}$$

where $a_{i,n}$ corresponds to a vector of coefficients to be determined and $\xi_i^c$ can be taken as the cell centroid. However, the final values at the interface are independent of the particular choice of $\xi_i^c$, and one may as well set $\xi_i^c = 0$ [2]. Unlike the cell center, the centroid is not equidistant from the cell interfaces in cylindrical-radial coordinates, and the cell-averaged values are assigned at the centroid [2]. Further, the method has to be locally conservative, i.e., the polynomial $Q_i(\xi)$ must fit the neighboring cell averages, satisfying Eq. (3).

$$\int_{\xi_{i+s-\frac{1}{2}}}^{\xi_{i+s+\frac{1}{2}}} Q_i(\xi)\, \frac{\partial V}{\partial \xi}\, d\xi = \Delta V_{i+s}\, \bar{Q}_{i+s} \qquad \text{for} \quad -i_L \le s \le i_R \tag{3}$$

where the stencil includes *iL* cells to the left and *iR* cells to the right of the *i*th zone such that *iL* +*iR* +1 = *p*. Implementing Eqs. (1)–(2) in Eq. (3) along with a simple mathematical manipulation leads to Eq. (4), which is the fundamental equation for reconstruction in cylindrical coordinates. For the detailed derivation, kindly refer to [3].

$$\begin{pmatrix} \beta_{i-i_L,0} & \dots & \beta_{i-i_L,p-1} \\ \vdots & \ddots & \vdots \\ \beta_{i+i_R,0} & \dots & \beta_{i+i_R,p-1} \end{pmatrix}^T \begin{pmatrix} w_{i,-i_L}^\pm \\ \vdots \\ w_{i,i_R}^\pm \end{pmatrix} = \begin{pmatrix} 1 \\ \vdots \\ (\xi_{i\pm\frac{1}{2}} - \xi_i^c)^{p-1} \end{pmatrix} \tag{4}$$

where '±' represents the positive and negative weights, i.e., the weights for reconstructing the right (+) and left (−) interface values, respectively. Also, the grid-dependent linear weights ($w_{i,s}^\pm$) satisfy the normalization condition [2].
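For $p = 3$ on a cylindrical-radial grid ($\partial V/\partial\xi = \xi$, $\xi_i^c = 0$), the system (4) is a small Vandermonde-like solve. The sketch below (the interface positions and all names are our illustration) recovers weights that sum to one and reproduce the interface value of any quadratic from its volume-weighted cell averages:

```python
import numpy as np

# Three-cell stencil on a non-uniform cylindrical-radial grid; dV/dxi = xi, xi_i^c = 0
edges = np.array([1.0, 1.5, 2.2, 3.0])      # cell interfaces (illustrative values)

def beta_moment(a, b, m):
    # beta_{s,m}: volume-weighted moment of xi^m over cell [a, b], divided by cell volume
    vol = (b**2 - a**2) / 2.0
    return (b**(m + 2) - a**(m + 2)) / (m + 2) / vol

p = 3
B = np.array([[beta_moment(edges[s], edges[s + 1], m) for m in range(p)]
              for s in range(p)])
xi_r = edges[2]                              # right interface of the middle cell
w_plus = np.linalg.solve(B.T, np.array([xi_r**m for m in range(p)]))   # Eq. (4)

# check: reconstruct Q(xi) = 2 xi^2 - 3 xi + 1 at xi_r from its cell averages
qbar = 2*B[:, 2] - 3*B[:, 1] + B[:, 0]
recon_err = abs(w_plus @ qbar - (2*xi_r**2 - 3*xi_r + 1))
```

The $m = 0$ row of (4) is exactly the normalization condition, so `w_plus` sums to one to roundoff; the quadratic is reproduced exactly because the weights enforce all moments up to $p - 1$.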

#### *2.2 Optimal Weights*

For fifth-order WENO interpolation, the third-order interpolated variables are optimally weighted to achieve fifth-order accurate interpolated values, as given in Eq. (5) for the case $p = 3$ [1].

$$q_i^{(2p-1)\pm} = \sum_{l=0}^{p-1} C_{i,l}^{\pm}\, q_{i,l}^{p\pm} \tag{5}$$

where $C_{i,l}^\pm$ is the optimal weight for the positive/negative case on the $i$th finite volume. So, Eq. (4) is used again to evaluate the weights for the fifth-order ($2p-1 = 5$) interpolation ($i_L = 2$, $i_R = 2$).

Linear and optimal weights are independent of the mesh size for standard regularly-spaced grids. They can be evaluated and stored (at nominal cost) before the actual computation. Also, they conform to the original WENO-JS [1] in the limiting case ($R \to \infty$). The weights required for source-term and flux integration in one or more dimensions are given in [3].

#### *2.3 Smoothness Indicators and the Nonlinear Weights*

The mathematical definition of the smoothness indicator is given in Eq. (6) [1].

$$IS_{i,l} = \sum_{m=1}^{p-1} \int_{\xi_{i-\frac{1}{2}}}^{\xi_{i+\frac{1}{2}}} \left(\frac{d^m}{d\xi^m} Q_{i,l}(\xi)\right)^{2} \Delta\xi_i^{2m-1}\, d\xi, \quad l = 0, \ldots, p-1 \tag{6}$$

To evaluate $IS_{i,l}$, a third-order polynomial interpolation on the $i$th cell is required using the positive and negative reconstructed values on the stencil $S_l$, as given in Eq. (2). Finally, evaluating the coefficients $a$ and substituting their values into the smoothness-indicator formula (6) yields the grid-independent fundamental relation (7). The nonlinear weight ($\omega_{i,l}^\pm$) for the WENO-C interpolation is defined in Eq. (8) [1], where $\epsilon$ is chosen to be $10^{-6}$ [1, 3].

$$IS_{i,l} = 4\left(39\bar{Q}_i^2 - 39\bar{Q}_i(q_{i,l}^- + q_{i,l}^+) + 10\left((q_{i,l}^-)^2 + (q_{i,l}^+)^2\right) + 19\, q_{i,l}^-\, q_{i,l}^+\right) \tag{7}$$

$$\omega_{i,l}^{\pm} = \frac{\alpha_{i,l}^{\pm}}{\sum_{l=0}^{p-1} \alpha_{i,l}^{\pm}} \qquad \& \qquad \alpha_{i,l}^{\pm} = \frac{C_{i,l}^{\pm}}{(\epsilon + IS_{i,l})^2}, \qquad l = 0, 1, 2 \tag{8}$$

The final interpolated interface values are evaluated from Eq. (9).

$$q_i^{(2p-1)\pm} = \sum_{l=0}^{p-1} \omega_{i,l}^{\pm}\, q_{i,l}^{p\pm} \tag{9}$$
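The mechanics of (7)-(9) amount to a few lines of code. The sketch below uses placeholder optimal weights $C$ (illustrative values, not the actual WENO-C ones) to show the fallback behaviour of the nonlinear weights:

```python
import numpy as np

def smoothness(qbar, qm, qp):
    # Grid-independent relation (7); vanishes for constant data qbar = qm = qp
    return 4*(39*qbar**2 - 39*qbar*(qm + qp) + 10*(qm**2 + qp**2) + 19*qm*qp)

def nonlinear_weights(C, IS, eps=1e-6):
    # Eq. (8): inverse-square penalisation of rough stencils, then renormalisation
    a = C / (eps + IS)**2
    return a / a.sum()

C = np.array([0.1, 0.6, 0.3])                                # placeholder optimal weights
w_smooth = nonlinear_weights(C, np.zeros(3))                 # smooth data: falls back to C
w_shock = nonlinear_weights(C, np.array([0.0, 0.0, 100.0]))  # rough third stencil
```

For smooth data the nonlinear weights reduce to the optimal ones, recovering the fifth-order combination (9); a stencil crossing a discontinuity receives a large $IS$ and an essentially zero weight, which is what makes the interpolation essentially non-oscillatory.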

#### **3 Stability Analysis of WENO-C for Hyperbolic Conservation Laws**

For WENO-C to be practically useful, it is crucial that it enables a stable discretization of hyperbolic conservation laws when coupled with a proper time-integration scheme. In this section, we analyze the WENO-C scheme for model problems involving smooth flow in 1D cylindrical-radial coordinates, based on a modified von Neumann stability analysis [4]. We consider the scalar advection equation (10) in 1D cylindrical-radial coordinates.

$$\frac{\partial Q}{\partial t} + \frac{1}{(\partial V / \partial \xi)} \frac{\partial}{\partial \xi} \left( \left( \frac{\partial V}{\partial \xi} \right) Q v \right) = 0, \qquad \xi \in [0, \infty), \quad t > 0 \tag{10}$$

where $Q$ is the conserved variable and $(\partial V/\partial \xi) = \xi$ is the one-dimensional Jacobian in cylindrical-radial coordinates. Boundary conditions are not considered in the present approach, to reduce the complexity of the analysis. We assume a uniform grid with $\xi_i = i\Delta\xi$ and $\xi_{i+1} - \xi_i = \Delta\xi$ for all $i$, and $(i \pm 1/2)$ denotes the boundaries of finite volume $i$. In the finite-volume framework, Eq. (10) transforms into the conservative scheme given in Eq. (11).

$$\frac{\partial \mathcal{Q}\_{l}}{\partial t} = -\frac{1}{\Delta \mathcal{V}\_{l}} (\hat{F}\_{l+1/2} - \hat{F}\_{l-1/2}) \tag{11}$$

where the numerical flux $\hat{F}_{i+1/2}$ is the Lax-Friedrichs flux, and $\bar{Q}_i$ and $\Delta V_i$ are given in Eq. (1). For this particular problem, let $v = 1$ in Eq. (10); therefore, only the values on the left side of the interface are considered. Based on the von Neumann stability analysis, the semi-discrete solution can be expressed as a discrete Fourier series. By the superposition principle, a single term of the series suffices for the analysis, as illustrated in Eq. (12).

$$
\bar{Q}\_l(t) = \hat{Q}\_k(t)e^{j l \theta\_k}, \quad \text{where} \quad j = \sqrt{-1} \tag{12}
$$

By substituting Eq. (12) in Eq. (11), we can separate the spatial operator *L*, as given in Eq. (13).

$$L = -\frac{\hat{F}_{i+1/2} - \hat{F}_{i-1/2}}{\Delta V_i} = -\frac{[Q(\partial V / \partial \xi)]_{i+1/2}^{-} - [Q(\partial V / \partial \xi)]_{i-1/2}^{-}}{\Delta V_i} = -\frac{z(\theta_k)\,\bar{Q}_i}{\Delta \xi} \tag{13}$$

where the complex function $z(\theta_k)$ is the Fourier symbol. By substituting the values of $Q_{i-1/2}^-$ and $Q_{i+1/2}^-$ using the fifth-order positive weights of cells $(i-1)$ and $i$, respectively, for a smooth solution, the value of $z(\theta_k)$ for WENO-C can be evaluated using Eq. (14).

$$z(\theta_k) = \frac{m+1}{i^{m+1} - (i-1)^{m+1}} \sum_{l=-2}^{+2} \left[ w_{i,l}^{+}\, i^m e^{jl\theta_k} - w_{(i-1),l}^{+}\, (i-1)^m e^{j(l-1)\theta_k} \right] \tag{14}$$

where $m = 1$ for cylindrical-radial coordinates. Using the same approach as in [4], we can plot the spatial spectrum $\{S : -z(\theta_k) \text{ for } \theta_k \in [0, 2\pi]\}$ and the stability domain $S_t$ of third-order TVD-RK. The maximum stable CFL number of this scheme can be computed by finding the largest rescaling parameter $\tilde{\sigma}$ such that the rescaled spectrum still lies in the stability domain.
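As $i \to \infty$ (the Cartesian limit in which, as noted above, WENO-C conforms to WENO-JS), the positive weights tend to the classical fifth-order upwind values $(2, -13, 47, 27, -3)/60$, and (14) collapses to $z(\theta) = (1 - e^{-j\theta})\sum_l c_l\, e^{jl\theta}$. A sketch of this limiting symbol; this is our illustration of the spectrum computation, not the paper's cylindrical case:

```python
import numpy as np

# Cartesian limit of Eq. (14): fifth-order upwind interface weights for l = -2..2
c = np.array([2.0, -13.0, 47.0, 27.0, -3.0]) / 60.0
theta = np.linspace(0.0, 2*np.pi, 721)
l = np.arange(-2, 3)

interp = np.exp(1j * np.outer(theta, l)) @ c     # sum_l c_l e^{j l theta}
z = (1.0 - np.exp(-1j * theta)) * interp         # Fourier symbol z(theta)
```

Consistency gives $z(0) = 0$, and $\operatorname{Re}\, z \ge 0$ for all $\theta$ (upwind dissipation), so the spectrum $-z$ stays in the left half-plane and can be rescaled into the TVD-RK3 stability domain.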

It can be observed from Fig. 1 that the spatial spectra S of WENO-C initially differ with the cell index *i*, due to the geometrical variation of the finite volumes. However, the spectra coincide for large index numbers *i*, similar to WENO-JS, as the fifth-order interpolation weights converge. The regions requiring boundary conditions (*i* = 1, 2) are not considered in the present analysis. The CFL numbers for cylindrical-radial coordinates lie between 1.45 and 1.52. As a final remark, it can be concluded that the proposed scheme is stable with a third- or higher-order RK method and an appropriate CFL number for this case.
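The maximum-CFL search described above can be sketched numerically for the Cartesian limit, where the WENO-C weights converge to the fifth-order upwind (WENO-JS) linear weights. This is a minimal sketch under those assumptions; the coefficients, mode resolution, and bisection bounds below are illustrative, not values from the paper:

```python
import numpy as np

# Linear (smooth-limit) weights of the fifth-order upwind reconstruction at
# the right interface of cell i, using cells i-2 .. i+2.  In the Cartesian
# limit (large index i) the WENO-C weights converge to these values.
c = np.array([2.0, -13.0, 47.0, 27.0, -3.0]) / 60.0
offsets = np.arange(-2, 3)

theta = np.linspace(0.0, 2.0 * np.pi, 2001)
# Fourier symbol z(theta): flux difference F_{i+1/2} - F_{i-1/2} for a
# single mode Q_i = exp(j*i*theta), factored by exp(j*i*theta).
recon = (c[:, None] * np.exp(1j * offsets[:, None] * theta)).sum(axis=0)
z = recon * (1.0 - np.exp(-1j * theta))

def rk3_stable(sigma):
    """True if the rescaled spectrum -sigma*z lies in the TVD-RK3 region."""
    w = -sigma * z
    g = 1.0 + w + w**2 / 2.0 + w**3 / 6.0   # RK3 amplification factor
    return bool(np.all(np.abs(g) <= 1.0 + 1e-12))

# Bisection for the largest stable rescaling parameter (the max CFL number).
lo, hi = 0.0, 4.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if rk3_stable(mid):
        lo = mid
    else:
        hi = mid
print(f"max stable CFL (Cartesian limit) ~ {lo:.3f}")
```

Repeating the same scan with the index-dependent cylindrical weights of Eq. (14) for each *i* would give the CFL values discussed above.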

#### **4 Numerical Tests**

In this section, several tests on the Euler equations are performed to analyze the performance of the WENO-C reconstruction scheme. The tests are performed on a gamma-law gas (*γ* = 1.4) in cylindrical coordinates to investigate the essentially non-oscillatory property of WENO-C for discontinuous flows and its convex-combination property for smooth flows. For the first-order and second-order (MUSCL) spatial reconstructions, Euler time marching and the MacCormack (predictor-corrector) scheme are respectively employed. For WENO-C, time marching is done with third-order TVD-RK for the 1D cases and fifth-order RK for the 2D case.

#### *4.1 Acoustic Wave Propagation*

A smooth problem involving a nonlinear system of the 1D gas-dynamical equations is solved to test the fifth-order accuracy of the spatial discretization scheme [3]. The Euler equations in cylindrical-radial coordinates can be written in the form of Eq. (15).

$$\frac{\partial}{\partial t} \begin{pmatrix} \rho \\ \rho u \\ E \end{pmatrix} + \frac{1}{R} \frac{\partial}{\partial R} \begin{pmatrix} \rho u R \\ (\rho u^2 + p)R \\ (E+p)uR \end{pmatrix} = \begin{pmatrix} 0 \\ p/R \\ 0 \end{pmatrix} \tag{15}$$

where *ρ* is the mass density, *u* is the radial velocity, *p* is the pressure, and *E* is the total energy. Equation (16) serves as the adiabatic equation of state.

$$E = \frac{p}{\gamma - 1} + \frac{1}{2}\rho u^2 \tag{16}$$

The initial conditions are provided in Eq. (17), with the perturbation given in Eq. (18). The interface flux is evaluated with the Rusanov scheme [3].

$$\rho(R,0) = 1 + \varepsilon f(R), \quad u(R,0) = 0, \quad p(R,0) = 1/\gamma + \varepsilon f(R) \tag{17}$$

$$f(R) = \begin{cases} \frac{\sin^4(5\pi R)}{R} & \text{if } 0.4 \le R \le 0.6\\ 0 & \text{otherwise} \end{cases} \tag{18}$$

A sufficiently small perturbation, *ε* = 10<sup>−4</sup>, yields a smooth solution. The interface flux is evaluated using the Rusanov scheme with a CFL number of 0.3.

The initial perturbation splits into two acoustic waves traveling in opposite directions. The final time (*t* = 0.3) is chosen so that the waves remain inside the domain and the problem is free from boundary effects. The computational domain of unit length is uniformly divided into *N* zones, with *N* = 16, 32, 64, 128, 256. Although an exact solution is known up to O(*ε*<sup>2</sup>), the solution on a much finer mesh (*N* = 1024) is taken as the reference. Figure 2 illustrates the spatial variation of the density at *t* = 0.3 inside the domain. From Table 1, it is clear that the scheme approaches the desired fifth-order accuracy.
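A mesh-refinement study like this one can be post-processed into observed convergence orders. In the sketch below the error values are hypothetical placeholders (Table 1 is not reproduced here); only the relation *p* = log<sub>2</sub>(*e*<sub>coarse</sub>/*e*<sub>fine</sub>) for a halved mesh spacing is assumed:

```python
import math

# Hypothetical L1 errors on successively refined meshes (illustrative
# placeholders, not the values from Table 1 of the paper).
N      = [16, 32, 64, 128, 256]
errors = [1.2e-4, 4.1e-6, 1.3e-7, 4.2e-9, 1.3e-10]

# Observed order of accuracy between consecutive meshes: since the mesh
# spacing halves each time, p = log2(e_coarse / e_fine).
for n_c, n_f, e_c, e_f in zip(N, N[1:], errors, errors[1:]):
    p = math.log2(e_c / e_f)
    print(f"N = {n_c:4d} -> {n_f:4d}: observed order = {p:.2f}")
```

For a fifth-order scheme the printed orders should approach 5 as *N* grows.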

**Fig. 2** Spatial profiles of density at *t* = 0.3 for the acoustic wave propagation test in cylindrical-radial coordinates


#### *4.2 Sedov Explosion Test*

The Sedov explosion test is performed to investigate the code's ability to deal with strong shocks and non-planar symmetry [3]. The problem involves the self-similar evolution of a cylindrical blast wave on a uniform grid (*N* = 100) from a localized (delta-function) initial pressure perturbation in an otherwise homogeneous medium. The governing equations are given in Eq. (15), and the fluxes are evaluated with the Rusanov scheme and GKS [5]. For the code initialization, a dimensionless energy *ε* = 1 is deposited into a small region of radius *δ* = 3*ΔR*. Inside this region, the dimensionless pressure *P*′<sub>0</sub> is given by Eq. (19).

$$P_0' = \frac{3(\gamma - 1)\,\varepsilon}{(m+2)\,\pi\,\delta^{m+1}} \tag{19}$$

where *m* = 1 for cylindrical geometry. A reflecting boundary condition is employed at the center (*R* = 0), whereas no boundary condition is required at *R* = 1 for this problem. The initial velocity and density inside the domain are 0 and 1 respectively, and the initial pressure everywhere except the kernel is 10<sup>−5</sup>. As the source term is very stiff, the CFL number is set to 0.1. The final time is *t* = 0.05.
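The kernel initialization of Eq. (19) admits a one-line consistency check: for *m* = 1, the kernel internal-energy density *P*′<sub>0</sub>/(*γ* − 1) integrated over the kernel area *πδ*<sup>2</sup> must recover the deposited energy *ε*. A short sketch with illustrative variable names:

```python
import math

gamma = 1.4
m = 1                  # cylindrical geometry
N = 100
dR = 1.0 / N           # uniform mesh spacing on a unit domain
delta = 3 * dR         # radius of the energy-deposition kernel
eps = 1.0              # dimensionless deposited energy

# Initial kernel pressure from Eq. (19); the far-field pressure is 1e-5.
P0 = 3.0 * (gamma - 1.0) * eps / ((m + 2) * math.pi * delta**(m + 1))

# Consistency: kernel internal energy equals the deposited energy.
kernel_energy = P0 / (gamma - 1.0) * math.pi * delta**2
```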

Figure 3 shows that the density peak for WENO-C is the highest and closest to the analytical value, similar to the fifth-order finite-difference version [3], whereas MUSCL shows larger offsets in the pressure and velocity peaks. GKS performs slightly better than the Rusanov scheme, as its peaks are slightly higher in all cases.

**Fig. 3** Variation of density, velocity, and pressure with the radius for the Sedov explosion test in cylindrical-radial coordinates. The domain is restricted to *R* = 0.4 for the sake of clarity

### *4.3 Modified 2D Riemann Problem in (R* **−** *z) Coordinates*

The final test for the present scheme involves a modified 2D Riemann problem in cylindrical (*R* − *z*) coordinates, as illustrated in Fig. 4 (top left). The initial condition comprises two contact discontinuities and two shocks, resulting in the formation of a self-similar structure propagating towards the low-density, low-pressure region (region 3). The governing equations in cylindrical (*R* − *z*) coordinates are provided in Eq. (20).

The computations are performed until *t* = 0.2 with a CFL number of 0.5 on a domain (*R*, *z*) = [0,1] × [0,1] divided into 500 × 500 zones. The boundary conditions are symmetry at the center (except for the radial velocity, which is antisymmetric) and outflow elsewhere. The HLL Riemann solver is used for the flux evaluations. Rich small-scale structures in the contact-contact region (region 1) can be observed in Fig. 4 for the WENO-C reconstruction, when compared with the first-order and second-order MUSCL reconstructions; the structures are heavily smeared in the first-order case.

**Fig. 4** Modified 2D Riemann problem in cylindrical (*R* − *z*) coordinates: schematic (top left), density contours at *t* = 0.2 with first-order (top right), second-order MUSCL (bottom left), and WENO-C (bottom right) reconstruction schemes

$$\frac{\partial}{\partial t} \begin{pmatrix} \rho \\ \rho v_R \\ \rho v_z \\ \rho e \end{pmatrix} + \frac{1}{R} \frac{\partial}{\partial R} \begin{pmatrix} \rho v_R R \\ (\rho v_R^2 + p)R \\ \rho v_R v_z R \\ (\rho e + p)v_R R \end{pmatrix} + \frac{\partial}{\partial z} \begin{pmatrix} \rho v_z \\ \rho v_R v_z \\ \rho v_z^2 + p \\ (\rho e + p)v_z \end{pmatrix} = \begin{pmatrix} 0 \\ p/R \\ 0 \\ 0 \end{pmatrix} \tag{20}$$
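The HLL flux used for the interface evaluations in this test can be sketched under the standard two-wave assumption. This is a generic textbook sketch, not the paper's implementation; the wave-speed estimates *S<sub>L</sub>* and *S<sub>R</sub>* are left as inputs since they are not specified here:

```python
import numpy as np

def hll_flux(UL, UR, FL, FR, SL, SR):
    """Standard two-wave HLL flux from left/right conserved states U and
    physical fluxes F, given wave-speed estimates SL <= SR."""
    if SL >= 0.0:        # supersonic to the right: pure upwind from the left
        return FL
    if SR <= 0.0:        # supersonic to the left: pure upwind from the right
        return FR
    # Subsonic case: wave-speed-weighted average with a jump penalty.
    return (SR * FL - SL * FR + SL * SR * (UR - UL)) / (SR - SL)

# Consistency check: identical states must return the common flux.
U = np.array([1.0, 0.5, 0.0, 2.5])   # (rho, rho*vR, rho*vz, rho*e)
F = np.array([0.5, 0.75, 0.0, 1.5])
print(hll_flux(U, U, F, F, -1.0, 1.0))
```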

#### **5 Conclusions**

A fifth-order finite-volume WENO-C reconstruction scheme is proposed for structured grids in cylindrical coordinates to achieve high-order spatial accuracy along with an ENO transition. A grid-independent smoothness indicator is derived for this scheme. For uniform grids, the analytical values in cylindrical-radial coordinates conform to WENO-JS in the limiting case (*R* → ∞). A linear stability analysis of the present scheme is performed using a scalar advection equation in radial coordinates. Several tests involving smooth and discontinuous flows are performed, which confirm the fifth-order accuracy and the ENO property of the scheme.

#### **References**



### *Editorial Policy*

1. Volumes in the following three categories will be published in LNCSE:


Those considering a book which might be suitable for the series are strongly advised to contact the publisher or the series editors at an early stage.

2. Categories i) and ii). Tutorials are lecture notes typically arising via summer schools or similar events, which are used to teach graduate students. These categories will be emphasized by Lecture Notes in Computational Science and Engineering. **Submissions by interdisciplinary teams of authors are encouraged**. The goal is to report new developments – quickly, informally, and in a way that will make them accessible to non-specialists. In the evaluation of submissions timeliness of the work is an important criterion. Texts should be well-rounded, well-written and reasonably self-contained. In most cases the work will contain results of others as well as those of the author(s). In each case the author(s) should provide sufficient motivation, examples, and applications. In this respect, Ph.D. theses will usually be deemed unsuitable for the Lecture Notes series. Proposals for volumes in these categories should be submitted either to one of the series editors or to Springer-Verlag, Heidelberg, and will be refereed. A provisional judgement on the acceptability of a project can be based on partial information about the work: a detailed outline describing the contents of each chapter, the estimated length, a bibliography, and one or two sample chapters – or a first draft. A final decision whether to accept will rest on an evaluation of the completed work which should include


3. Category iii). Conference proceedings will be considered for publication provided that they are both of exceptional interest and devoted to a single topic. One (or more) expert participants will act as the scientific editor(s) of the volume. They select the papers which are suitable for inclusion and have them individually refereed as for a journal. Papers not closely related to the central topic are to be excluded. Organizers should contact the Editor for CSE at Springer at the planning stage, see *Addresses* below.

In exceptional cases some other multi-author-volumes may be considered in this category.

4. Only works in English will be considered. For evaluation purposes, manuscripts may be submitted in print or electronic form, in the latter case, preferably as pdf- or zipped ps-files. Authors are requested to use the LaTeX style files available from Springer at http:// www.springer.com/gp/authors-editors/book-authors-editors/manuscript-preparation/5636 (Click on LaTeX Template → monographs or contributed books).

For categories ii) and iii) we strongly recommend that all contributions in a volume be written in the same LaTeX version, preferably LaTeX2e. Electronic material can be included if appropriate. Please contact the publisher.

Careful preparation of the manuscripts will help keep production time short besides ensuring satisfactory appearance of the finished book in print and online.

5. The following terms and conditions hold. Categories i), ii) and iii):

Authors receive 50 free copies of their book. No royalty is paid. Volume editors receive a total of 50 free copies of their volume to be shared with authors, but no royalties.

Authors and volume editors are entitled to a discount of 40 % on the price of Springer books purchased for their personal use, if ordering directly from Springer.

6. Springer secures the copyright for each volume.

Addresses:

Timothy J. Barth NASA Ames Research Center NAS Division Moffett Field, CA 94035, USA barth@nas.nasa.gov

Michael Griebel Institut für Numerische Simulation der Universität Bonn Wegelerstr. 6 53115 Bonn, Germany griebel@ins.uni-bonn.de

David E. Keyes Mathematical and Computer Sciences and Engineering King Abdullah University of Science and Technology P. O. Box 55455 Jeddah 21534, Saudi Arabia david.keyes@kaust.edu.sa

#### and

Department of Applied Physics and Applied Mathematics Columbia University 500 W. 120th Street New York, NY 10027, USA kd2112@columbia.edu

Risto M. Nieminen Department of Applied Physics Aalto University School of Science and Technology 00076 Aalto, Finland risto.nieminen@aalto.fi

Dirk Roose Department of Computer Science Katholieke Universiteit Leuven Celestijnenlaan 200A 3001 Leuven-Heverlee, Belgium dirk.roose@cs.kuleuven.be

Tamar Schlick Department of Chemistry and Courant Institute of Mathematical Sciences New York University 251 Mercer Street New York, NY 10012, USA schlick@nyu.edu

Editor for Computational Science and Engineering at Springer:

Martin Peters Springer-Verlag Mathematics Editorial IV Tiergartenstrasse 17 69121 Heidelberg, Germany martin.peters@springer.com

## **Lecture Notes in Computational Science and Engineering**


*For further information on these books please have a look at our mathematics catalogue at the following URL:* www.springer.com/series/3527

## **Monographs in Computational Science and Engineering**

1. J. Sundnes, G.T. Lines, X. Cai, B.F. Nielsen, K.-A. Mardal, A. Tveito, *Computing the Electrical Activity in the Heart*.

*For further information on this book, please have a look at our mathematics catalogue at the following URL:* www.springer.com/series/7417

## **Texts in Computational Science and Engineering**


19. J. A. Trangenstein, *Scientific Computing*. Volume II - Eigenvalues and Optimization.

20. J. A. Trangenstein, *Scientific Computing*. Volume III - Approximation and Integration.

*For further information on these books please have a look at our mathematics catalogue at the following URL:* www.springer.com/series/5151