**Particle Acceleration and Detection**

# Rudolf Frühwirth · Are Strandlie

# Pattern Recognition, Tracking and Vertex Reconstruction in Particle Detectors


# **Series Editors**

- Alexander Chao, SLAC, Stanford University, Menlo Park, CA, USA
- Kenzo Nakamura, Kavli IPMU, University of Tokyo, Kashiwa, Chiba, Japan
- Katsunobu Oide, KEK, High Energy Accelerator Research Organization, Tsukuba, Japan
- Werner Riegler, Detector group, CERN, Genève, Switzerland
- Vladimir Shiltsev, Accelerator Physics Center, Fermi National Accelerator Lab, Batavia, IL, USA
- Frank Zimmermann, BE Department, ABP Group, CERN, Genève, Switzerland

The series "Particle Acceleration and Detection" is devoted to monograph texts dealing with all aspects of particle acceleration and detection research and advanced teaching. The scope also includes topics such as beam physics and instrumentation as well as applications. Presentations should strongly emphasize the underlying physical and engineering sciences.


The books forming this collection will be of importance to graduate students and active researchers alike.

More information about this series at http://www.springer.com/series/5267

Rudolf Frühwirth • Are Strandlie

# Pattern Recognition, Tracking and Vertex Reconstruction in Particle Detectors

Rudolf Frühwirth
Institute of High Energy Physics
Austrian Academy of Sciences
Wien, Austria

Are Strandlie
Norwegian University of Science and Technology
Gjøvik, Norway

Published with the support of the Austrian Science Fund (FWF): PUB 733-Z

ISSN 1611-1052 ISSN 2365-0877 (electronic)
Particle Acceleration and Detection
ISBN 978-3-030-65770-3 ISBN 978-3-030-65771-0 (eBook)
https://doi.org/10.1007/978-3-030-65771-0

© The Editor(s) (if applicable) and The Author(s) 2021. This book is an open access publication. **Open Access** This book is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

*To my teacher Meinhard Regler, my wife Sylvia Frühwirth-Schnatter, and my sons Stephan, Matthias, and Felix. — R.F.*

*To my wife Hanne Kari and my children Kristine, Harald, and Karen. — A.S.*

# **Preface**

## **Scope**

Track and vertex reconstruction is an important part of the data analysis chain in experiments at particle accelerators. This includes fixed-target experiments, experiments at lepton and hadron colliders, and neutrino experiments where the detector is the target. This book deals almost exclusively with the reconstruction of charged particles and their production vertices in tracking detectors. The reconstruction of neutral particles in calorimeters and particle identification are subjects that are outside the scope of the present book; excellent treatments of these topics can be found elsewhere.

The methods presented here are also largely agnostic to the detector technology that produces the observations that are the input to the algorithmic chain of track finding → track fitting → vertex finding → vertex fitting. The problems that arise in producing hit positions with well-calibrated standard errors in the various types of tracking detectors are therefore described only very briefly; otherwise, it is simply assumed that such positions with correct standard errors are available. Nevertheless, an important part of the book deals with methods not sensitive to deviations from this ideal case.

Although track and vertex reconstruction can be formulated largely in geometrical terms, a statistically sound treatment of the stochastic processes that disturb the trajectories of charged particles by interactions with the detector material is mandatory. The modeling of these processes and their incorporation into the reconstruction algorithms is therefore a significant topic that is treated in considerable detail.

The examples in the last part of the book come from the four LHC experiments, complemented by two experiments at SuperKEKB and FAIR. Nonetheless, it was our aim to present the algorithmic solutions in as general a context as possible. Inevitably, some selection of the material in the extensive literature was necessary, driven by the available space and potentially somewhat influenced by our own experiences and predilections. Still, we aspired to describe in sufficient detail as many important contributions as possible; wherever this was not feasible, there are pointers to the relevant publications.

## **Content**

The first part of the book is conceived as an introduction. Chapter 1 gives a very brief outline of tracking detectors, their basic principles of operation, and the challenges they pose for calibration and alignment. Chapter 2 shows how track and vertex reconstruction are embedded in the entire event reconstruction chain, from the trigger to the physics object reconstruction, and how important they are for the final physics analysis. Chapter 3 introduces basic notions from applied mathematics and statistics relevant for the core topics of the book: function minimization, regression and state space models, and clustering.

Part II covers the first core topic, track reconstruction. Chapter 4 describes track models, starting with the equations of motion of charged particles, followed by describing different ways of parametrizing the state of a particle and exhibiting algorithms for track and error propagation in various types of magnetic fields. It concludes with modeling of material effects and their inclusion in the track reconstruction. Chapter 5 is dedicated to track finding. As there is no systematic theory of track finding yet, we describe a variety of algorithms that have been and are presently successfully deployed in many experiments, including fast track finding in real time at the trigger level. Chapter 6 presents established methods for the estimation of track parameters, both traditional least-squares estimators and more recent robust and adaptive estimators. The special cases of circle and helix fitting are given a separate section as is the assessment of track quality. The detection of outliers and the finding of kinks in tracks concludes the chapter and Part II.

Part III is dedicated to the second core topic, vertex reconstruction. Chapter 7 first introduces the distinction between primary and secondary vertices, then goes on to discuss search strategies for finding primary vertices, both in one dimension and in 3D space. Various clustering algorithms are presented. Chapter 8 showcases methods for vertex fitting, which are very similar to the ones used for track fitting on a mathematical level. The concluding section shows how to extend the vertex fit to a kinematic fit by imposing additional geometric constraints and conservation laws. Part III concludes with Chap. 9, which deals with the reconstruction of secondary vertices. As the methodology of the search for secondary vertices is strongly influenced by the location of the vertex and the properties of the emerging particles, the four most important types are treated in four separate sections.

Part IV of the book presents case studies of approaches to tracking and vertexing from current and future experiments. Given our background in two of the LHC experiments and the enormous challenges in all of the LHC experiments, it is maybe not surprising that the four of them are our prime examples. They are complemented by two experiments not at the LHC: Belle II and CBM. These have to solve their own specific tracking and vertexing problems, somewhat different from the ones typical for the LHC, but no less difficult. We have to warn the reader, however, that at least some of these examples come with an expiration date, as it were. By 2019, the LHC experiments had already started preparations for Run 3 of the LHC, which is scheduled to start in 2021, and for the high-luminosity phase of the LHC or HL-LHC, planned for operation starting in 2026. It is to be expected that the conditions at the HL-LHC will require substantial changes in the reconstruction algorithms of ATLAS and CMS, especially in the track finding part. Belle II has started operation recently and will certainly adapt and optimize its current algorithms as the luminosity of the SuperKEKB *B* factory rises. CBM was still in the preparation phase in 2019, and although it has found very convincing solutions to its tracking and vertexing challenges, it too will profit from experience and probably modify its approach if the need arises. The reader should therefore keep in mind that the examples in Part IV are based on the material published at the time of writing, and that many exciting developments are yet to come or have already found their way into the track and vertex reconstruction software.

The appendices following Part IV contain supplementary material. Appendix A lists the Jacobian matrices of the parameter transformations treated in Chap. 4; Appendix B shows the regularization of the kinematic fit for singular covariance matrices; and Appendix C contains a list of software packages for track and vertex fitting, as well as entire frameworks with existing, but easily replaceable, modules for track and vertex reconstruction. These frameworks can serve as convenient testbeds for new ideas and algorithms, addressing users and developers alike.

## **Audience**

The book is intended for a wide audience: PhD students who want to gain better understanding of the inner workings of track and vertex reconstruction; PhD students and postdocs who want to enrich their experience by participating in projects that require knowledge of the topic; and researchers of all ages who want to contribute to the progress of the field by becoming algorithm and software developers. In all cases, the reader is expected to have some basic knowledge of linear algebra and statistics.

## **Acknowledgements**

We first wish to thank the series editor Werner Riegler (CERN), who has inspired and encouraged us to write this book and given us valuable feedback on all things related to detectors. We thank the anonymous reviewers for their valuable suggestions and corrections, as well as the colleagues from the experiments for their corrections and additions to Part IV: David Rohr for ALICE; Markus Elsing and Andreas Salzburger for ATLAS; Marco Rovere and Erica Brondolin for CMS; Agnes Dziurda for LHCb; Nils Braun and Felix Metzner for Belle II; and Ivan Kisel for CBM. Any errors that may remain are our own responsibility.

We owe a great deal of thanks to our colleagues and students with whom we have collaborated on track and vertex reconstruction in the course of our professional lives, always teaching and learning at the same time. At CERN: Pierre Billoir, Teddy Todorov, Thomas Speer, Pascal Vanlaer, Wolfgang Adam, Matthias Winkler, Martin Liendl, Andreas Salzburger, Wolfgang Liebig, and Tom Atkinson; in Gjøvik and Oslo: Esben Lund, Lars Bugge, Jørn Wroldsen, Bjørn Lillekjendlie, Steinar Stapnes, and Håvard Gjersdal; and in Vienna: Meinhard Regler, Sylvia Frühwirth-Schnatter, Winfried Mitaroff, Peter Kubinec, Dieter Stampfer, Wolfgang Waltenberger, Josef Scherzer, Edmund Widl, Manfred Valentan, Moritz Nadler, Jakob Lettenbichler, and Erica Brondolin.

R.F. thanks his former director Jochen Schieck for allowing him to use the infrastructure at the Institute for High Energy Physics in Vienna for a year after retirement. He also thanks Eugenio Paoloni for his essential contributions to the development of the track finder for the Belle II SVD.

The open access version of this book has been made possible by a grant from the Austrian Science Fund (FWF). We thank the FWF for its generous support.

Vienna, Austria    Rudi Frühwirth
Gjøvik, Norway    Are Strandlie
May 2020

## **A Note on the References**

WWW addresses (URLs) are given for web resources, technical reports, articles in arXiv and open access journals, and material that is hard to find otherwise.

## **Typesetting and Notation**

Vectors are typeset in small bold italic letters, for example, *a*. Unless specified otherwise, all vectors are column vectors. The length of vector *a* is denoted by dim(*a*), its transpose by *a*<sup>T</sup>, its norm by |*a*|. The scalar product of two vectors *a* and *b* is denoted by *a* · *b*, and their cross product by *a* × *b*. Matrices are typeset in capital bold italic letters, for example, *A*. The rank of matrix *A* is denoted by rank(*A*), its diagonal by diag(*A*), its inverse by *A*<sup>−1</sup>, and its transpose by *A*<sup>T</sup>. The block-diagonal matrix with blocks *A*<sub>1</sub>, ..., *A*<sub>n</sub> is denoted by *A* = blkdiag(*A*<sub>1</sub>, ..., *A*<sub>n</sub>). The gradient of a multivariate function *F*(*x*) is a row vector and is denoted by ∇*F*; the Hessian matrix is denoted by ∇<sup>2</sup>*F*. The expectation of a random variable *z* is denoted by E[*z*] and its variance by var[*z*]. The expectation of a random vector *ε* is denoted by E[*ε*] and its covariance matrix by Var[*ε*]. The cross-covariance matrix of two random vectors *ε* and *δ* is denoted by Cov[*ε*, *δ*]. As usual, *δ*<sub>ij</sub> is the Kronecker delta.

# **Part I Introduction**

# **Chapter 1 Tracking Detectors**

**Abstract** The chapter gives an overview of particle detectors, with the emphasis on tracking detectors. The working principles and the calibration of gaseous, semiconductor, and fiber detectors are explained, followed by a brief review of detector alignment. As an illustration, the tracking systems of the four experiments at the LHC and two non-LHC experiments, Belle II and CBM, are presented.

# **1.1 Introduction**

Many high-energy physics experiments are performed by colliding two beams of high energy particles or one beam with a target. The particles produced in a collision are recorded by particle detectors. The collision is then studied by reconstructing most of the particles produced in the interaction and determining their properties. For a general review of particle detectors, see for instance [1, Chap. 7] and [2].

The present book concentrates on the reconstruction of charged tracks and interaction vertices using information collected by tracking detectors. A tracking detector can be a single device such as a wire chamber or a silicon strip sensor, or a full-blown tracking system such as a time projection chamber capable of standalone track reconstruction; see Sect. 1.6. For a review of non-tracking detectors, the reader is referred to [1, 3].

A charged particle crossing a tracking detector generates a single spatial observation or a string of spatial observations in the local coordinate system of the detector. For track reconstruction, these have to be transformed to points in 3D space, usually with different precisions in the three coordinates. The correct transformations are determined by the alignment procedure; see Sect. 1.5. The resulting space points or "hits" are collected into track candidates by the track finding. In the subsequent track fit, the track parameters are estimated and the track hypothesis is tested. Successfully reconstructed tracks are then clustered into production or decay vertices, followed by a vertex fit and a test of the vertex hypothesis.

There are three principal types of tracking detectors: gas-filled or gaseous tracking detectors; solid state tracking detectors, usually equipped with silicon sensors; and scintillating fiber trackers. They are described in the following three sections.

## **1.2 Gaseous Tracking Detectors**

Gaseous tracking detectors utilize the ionizing effect of charged particles in a volume of gas. The simplest gaseous detector is the Geiger-Müller counter, a tube with a central wire. A potential difference of 1–3 kV between the tube wall and the wire causes the primary electrons and ions to move towards the anode (wire) and the cathode (tube wall), respectively. Because of the large field strength close to the wire, the primary electrons generate an avalanche of secondary electrons and ions, resulting in a detectable signal fed into an amplifier.

In order to use this principle for the measurement of the position of a charged particle, a structured array of many elements has to be designed. The following subsections describe a few common types of such arrays.

## *1.2.1 Multi-wire Proportional Chamber*

The multi-wire proportional chamber (MWPC, [1, Sect. 7.1]) is a thin cuboid volume of gas, oriented approximately perpendicular to the passage of the particles to be measured. The volume is bounded by a pair of conductive plates acting as the cathode. Inside the volume an array of anode wires detects the passage of charged particles, see Fig. 1.1. With a typical wire spacing of 1 mm a spatial resolution of around 0.3 mm can be achieved in the *v*-coordinate orthogonal to the wires. A further improvement can be obtained by tilting the MWPC so that the probability of a particle giving signals on two adjacent wires gets larger, or by rotating the chamber, measuring the drift times to the anode wires, and estimating the point of passage from the drift distances [4, 5]. The resolution in the *w*-coordinate along the wire is poor, equal to the wire length divided by √12 ≈ 3.46.
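The 1/√12 rule quoted above is simply the standard deviation of a uniform distribution over one readout cell. A minimal Python sketch (the function name is ours; the numbers are the typical values quoted in the text):

```python
import math

def binary_resolution(cell_size_mm: float) -> float:
    """RMS of a uniform distribution over one readout cell: size / sqrt(12)."""
    return cell_size_mm / math.sqrt(12.0)

# v-coordinate: a 1 mm wire spacing gives about 0.29 mm resolution.
sigma_v = binary_resolution(1.0)
# w-coordinate: the "cell" is the whole wire; a 200 mm wire gives about 58 mm.
sigma_w = binary_resolution(200.0)
```

The same rule reappears below for strip sensors and scintillating fibers, with the strip length or the fiber diameter taking the role of the cell size.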

## *1.2.2 Planar Drift Chamber*

A planar drift chamber is similar in shape to an MWPC (see Fig. 1.1), but the electrical field is shaped by an alternating array of sense (anode) and field (cathode) wires; see [1, Sect. 7.2] and [6]. The electrons and ions from the primary ionization drift to the respective electrodes, and gas amplification in the strong field close to the sense wires gives a detectable signal that is amplified. In addition, the drift time between the crossing time of the particle and the signal time on the sense wire is measured. The drift distance can then be computed from the drift time. Given a precise calibration of the drift distance, a spatial resolution below 0.1 mm in the coordinate *v* orthogonal to the wire can be achieved; see for instance [7].

**Fig. 1.1** Local coordinate system in a wire chamber, in a planar drift chamber, or in a silicon strip sensor

Drift chambers need far fewer electronic channels than MWPCs, but have to be monitored and calibrated much more carefully. In addition, there is an inherent left-right ambiguity of the spatial position relative to the sense wire, so that each hit has a mirror hit. This ambiguity must be resolved in the track finding or track fitting stage. The spatial position *w* along the wire can be measured by comparing the charges at both ends of the wire. A resolution of a couple of millimeters can be achieved in this way [8].
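The conversion from drift time to the two position candidates can be sketched as follows; the drift velocity of 50 µm/ns is only a typical order of magnitude, and the function name is ours:

```python
def drift_position_candidates(v_wire_mm: float, t_drift_ns: float,
                              v_drift_um_per_ns: float = 50.0):
    """Return the hit and its mirror hit in the v-coordinate.

    The drift time only gives the distance to the wire, not the side,
    hence the left-right ambiguity.
    """
    d_mm = t_drift_ns * v_drift_um_per_ns * 1e-3  # drift distance in mm
    return (v_wire_mm - d_mm, v_wire_mm + d_mm)

# A 40 ns drift time at 50 um/ns puts the hit 2 mm from the wire, on either
# side: candidates at 8 mm and 12 mm for a wire at v = 10 mm.
left, right = drift_position_candidates(10.0, 40.0)
```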

## *1.2.3 Cylindrical Drift Chamber*

Cylindrical drift chambers [1, Sects. 7.2 and 7.3] have been and still are widely used in collider experiments. Such a chamber consists of up to 60 cylindrical layers of alternating field and sense wires, mostly parallel to the beams of the collider; see Fig. 1.2. In the local coordinate system of the chamber, the *z*-axis is the symmetry axis of the chamber. The measured point in the transverse plane can be given in polar coordinates (*R*<sub>m</sub>, *Φ*<sub>m</sub>, *z*<sub>m</sub>), where *R*<sub>m</sub> is the radial position of the sense wire, and *Φ*<sub>m</sub> is the polar angle of the wire plus/minus the drift angle *Φ*<sub>d</sub>, i.e., the drift distance *d* divided by *R*<sub>m</sub>. The resolution of the drift distance is typically in the order of 0.1–0.2 mm; see for instance [9, 10]. Like in a planar drift chamber, each hit has a mirror hit, and the ambiguity must be resolved in the track finding or track fitting stage. The *z*-coordinate can be measured by charge division or by adding "stereo" layers of wires tilted with respect to the "axial" layers. The resulting spatial resolution of *z*<sub>m</sub> is equal to the drift distance resolution divided by the sine of the stereo angle, typically 2–3 mm.

**Fig. 1.2** Local coordinate system in a cylindrical drift chamber. Only one layer of sense wires is shown. *R*<sub>m</sub> is the radial distance of the sense wire from the *z*-axis; *Φ*<sub>w</sub> is the azimuth angle of the sense wire; *d* is the drift distance; *Φ*<sub>d</sub> = *d*/*R*<sub>m</sub> is the angle spanned by *d* at radius *R*<sub>m</sub>. The azimuth angles of the track hit and its mirror hit are *Φ*<sub>m,1</sub> = *Φ*<sub>w</sub> − *Φ*<sub>d</sub> and *Φ*<sub>m,2</sub> = *Φ*<sub>w</sub> + *Φ*<sub>d</sub>. The *z*-coordinate may or may not be measured directly
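These geometric relations translate directly into code; a small Python sketch with our own function names, all angles in radians:

```python
import math

def azimuth_candidates(phi_w_rad: float, d_mm: float, r_m_mm: float):
    """Azimuths of the hit and its mirror hit: Phi_w -/+ d / R_m."""
    phi_d = d_mm / r_m_mm
    return (phi_w_rad - phi_d, phi_w_rad + phi_d)

def stereo_z_resolution(sigma_drift_mm: float, stereo_angle_deg: float) -> float:
    """z resolution of a stereo layer: drift resolution / sin(stereo angle)."""
    return sigma_drift_mm / math.sin(math.radians(stereo_angle_deg))

# A 0.15 mm drift resolution and a 4 degree stereo angle (an assumed, but
# plausible, value) give roughly 2.2 mm in z, within the 2-3 mm range quoted.
sigma_z = stereo_z_resolution(0.15, 4.0)
```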

## *1.2.4 Drift Tubes*

Drift tubes are small drift chambers with a single sense wire. They can have rectangular or circular cross sections, and can be arranged in cylindrical or planar layers. In planar layers, wires can run in two or more directions, giving good spatial resolution in two orthogonal directions. The resolution of the drift distance is in the order of 0.1–0.2 mm.

## *1.2.5 Time Projection Chamber*

The typical time projection chamber (TPC), as employed in many collider experiments, is a large gas-filled volume shaped as a hollow cylinder, the axis of which is aligned with the beams and the magnetic field [1, Sect. 7.3.3], see Fig. 1.3. There is a potential difference between the central cathode plane and the two anode end plates. The latter are equipped with position sensors. A charged particle traversing the chamber ionizes the gas, and the electrons travel along the field lines towards the end plates, where both the point and the time of arrival are measured and recorded. A track therefore generates a dense string of space points. The position

sensors at the end plates can be wire chambers or micro-pattern gas detectors. In the drift direction *z* a spatial resolution in the order of 1 mm is possible; the transverse resolution depends on the technology of the endplate sensors and increases with the drift distance because of diffusion. Resolutions well below 0.1 mm can be achieved with GEM chambers on the end plates. Another important calibration issue is the correction of distortions arising from space charge effects, see for instance [11].
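The growth of the transverse resolution with drift distance is the usual diffusion behaviour: the diffusion term grows with the square root of the drift length and adds in quadrature to the intrinsic sensor resolution. A sketch under this standard parametrization (the function name and the example values are ours, not tied to a specific TPC):

```python
import math

def tpc_point_resolution(sigma0_um: float,
                         d_t_um_per_sqrt_cm: float,
                         drift_cm: float) -> float:
    """Single-point transverse resolution after a given drift length.

    sigma0: intrinsic resolution of the endplate sensor;
    d_t: transverse diffusion constant of the gas, in um per sqrt(cm).
    """
    return math.sqrt(sigma0_um**2 + d_t_um_per_sqrt_cm**2 * drift_cm)

# With zero diffusion the intrinsic resolution is recovered; with diffusion
# the resolution degrades monotonically with drift length.
```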

## *1.2.6 Micro-pattern Gas Detectors*

The earliest micro-pattern gas detector is the micro-strip gas detector, in which wires are replaced by microscopic metal strip structures deposited on high-resistivity substrates [1, Sect. 7.4]. With typical strip distances of 75 µm, a spatial resolution below 20 µm can be achieved. Other developments are the Gas Electron Multiplier (GEM) and Micro-Mesh Gaseous Structure (Micromegas) chambers. Resolutions down to 10 µm can be attained by these devices. Due to their small size and fast collection of positive ions, they can be operated at high rates up to several MHz per mm².

## **1.3 Semiconductor Tracking Detectors**

Semiconductor tracking detectors [1, Sect. 7.5] are mostly made of thin silicon wafers, approximately 0.3 mm thick. The *n*-type silicon is processed by photolithographic methods to create *p*+-doped implants on one side, either thin strips (see Sect. 1.3.1) or small pixels (see Sect. 1.3.2). Each strip or pixel is connected to a read-out channel. The silicon bulk is fully depleted by a bias voltage. A charged particle crossing the wafer creates electron-hole pairs along its path. The electrons and the holes drift towards the electrodes and induce signals on the readout electrodes. Silicon drift detectors employ the measurement of the electron drift time for measuring the position of a crossing particle; see Sect. 1.3.3.

# *1.3.1 Silicon Strip Sensors*

Large-area semiconductor tracking systems employ silicon strip sensors to keep construction costs affordable. The implants in the silicon wafers are narrow strips with a typical width of 20 µm and a typical inter-strip distance of 100 µm. The local coordinate system is shown in Fig. 1.1. The spatial resolution in *v*, orthogonal to the strip direction, depends on the track direction via the cluster size, i.e., the number of adjacent strips with a signal above threshold, and on the Lorentz angle [12]. Under optimal conditions the resolution is in the order of 10 µm or better. The resolution in *w*, parallel to the strip direction, is equal to the strip length divided by √12. A typical strip length is 5 cm; shorter strips with a length below a centimeter are called mini-strips. For an example of the calibration procedure, see [13].

Two-dimensional (2D) measurements can be achieved by implanting strips on the back side of the wafer, either orthogonal to the ones on the front side, or at a different stereo angle. Alternatively, two one-sided sensors can be glued on the same mechanical support, separated by a small gap. If such a sensor happens to be crossed by *n* particles at the same time, up to *n* strips on each side can give a signal, resulting in up to *n*² points of intersection. At most *n* of these correspond to the true particle positions; the remaining ones are spurious or ghost hits.
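The combinatorics of ghost hits is easy to make explicit; a small Python sketch (function name and strip numbers are ours):

```python
from itertools import product

def strip_intersections(front_strips, back_strips):
    """All (front, back) intersection candidates of a double-sided sensor.

    n hit strips on each side yield n*n candidates; at most n of them
    correspond to real particles, the rest are ghost hits.
    """
    return list(product(front_strips, back_strips))

# Two particles hitting (front 3, back 7) and (front 5, back 2) produce
# four candidates, two of which are ghosts: (3, 2) and (5, 7).
candidates = strip_intersections([3, 5], [7, 2])
```

Resolving which candidates are real is left to the track finding, which can reject ghosts that do not line up with hits in other layers.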

## *1.3.2 Hybrid Pixel Sensors*

In a pixel sensor, the implants are small square or rectangular pixels in a high-resistivity silicon wafer. The pixels are connected to the read-out channel by bump bonding. Depending on the pixel size and the track direction, pixel sensors can have a position resolution below 10 µm in both coordinates, especially if the signal height is measured and used to interpolate between pixels in a cluster. For an interesting example of the calibration procedure, see [14]. Here, the position is estimated from precomputed cluster templates, considering the incident angle and the Lorentz angle. The templates can also be used to decide whether an observed cluster is compatible with a predicted incident angle.

Pixel sensors are mostly employed in the region close to the interaction point, as they can deal with the high track density and the high background radiation

**Fig. 1.4** Three-dimensional sketch of a multi-anode silicon drift detector. The trajectories of several signal electrons are indicated. The distance between the point where the ionizing particle crosses the middle plane of the detector and the array of anodes is obtained from a measurement of the electron drift time. The second coordinate is given by the location of the anode pad where the signal electrons are collected. An improvement of the resolution along this coordinate comes from charge sharing among several anodes. (From [16], by permission of Elsevier)

better than strip detectors. In addition, their excellent spatial resolution allows the separation of secondary decay vertices very close to the primary interaction vertex; see Chap. 9.

In monolithic pixel sensors, the sensitive volume and part of or the full read-out circuits are combined in one piece of silicon. The generated charge is collected on a dedicated collection electrode so that there is no need for delicate and expensive bump bonding. For an application of monolithic pixel sensors in a vertex detector, see Sect. 1.6.1.1.

## *1.3.3 Silicon Drift Sensors*

In a silicon drift sensor, electrons are transported parallel to the surfaces of the sensor to an anode segmented into small pads [15, 16]. The position information along the drift direction is obtained from a measurement of the drift time of the electrons. The position in the second coordinate is obtained from charge sharing between adjacent pads; see Fig. 1.4. Silicon drift sensors have been deployed in STAR and ALICE, see [16, 17].

## **1.4 Scintillating Fiber Trackers**

Scintillating fiber trackers combine the speed and efficiency of plastic scintillators with the geometric flexibility and hermeticity provided by fiber technology [18]. The fibers in such a tracking detector serve two functions: they convert the ionisation energy deposited by a passing charged particle into optical photons, and guide the optical signal to the devices that detect the generated light. In recent applications, these devices are silicon photomultipliers, which are fast, compact and sensitive to single photons [19].

The spatial resolution of a fiber is approximately equal to the diameter divided by √12, while the number of photons scales linearly with the diameter. The conflicting requirements on accuracy and light yield can be alleviated by staggering the fibers and by choosing a material with a high intrinsic scintillation yield and a long attenuation length. For an example of a large-scale scintillating fiber tracker designed for operation in the LHCb experiment from Run 3 of the LHC onward, see [20].

## **1.5 Alignment**

The position measurements in the tracking detectors mentioned above are generated in the local coordinate system of the devices. In order to be useful for track reconstruction, they have to be converted to positions in the global coordinate system of the experiment, along with the associated covariance matrices. As tracking detectors are very precise instruments, with position resolutions ranging from a couple of hundred micrometers down to about ten micrometers, their positions, orientations, and possible deformations have to be known with a similar or better precision. The importance of correct alignment, especially in the complex detectors of the LHC era, is attested by a series of workshops held at CERN in the past [21–23].

Misalignment or insufficient alignment has a deleterious effect on the efficiency of track and vertex reconstruction [21, p. 105]. Random misalignment also degrades the resolution of track and vertex parameters and subsequently of invariant masses. Moreover, systematic misalignment of larger substructures can cause a bias in the estimates of track momenta and vertex positions. This can be harmful in many of the physics analyses of the experiment.

Misalignment can have several sources: finite precision of the detector assembly, thermal and magnetic stresses on mechanical structures, sagging of wires or sensors because of gravity, changes in temperature and humidity, etc. Since misalignment can and does vary over time, constant monitoring is a necessity.

Alignment proceeds through several steps. The starting point is the ideal geometry, augmented by knowledge of the machining and assembly precision. The next step is alignment by hardware using lasers for measuring distances or proximity and tilt sensors. For instance, the ATLAS silicon tracker can be monitored optically by Frequency Scanning Interferometry to a precision of about 10 µm [24].

The final step is track-based alignment, either with tracks from cosmic muons, or from collisions, or both. Alignment in fact profits from different types of tracks that cross different parts of the detector at different angles. For instance, tracks from collisions hardly ever cross the entire central tracker of a collider experiment, but cosmic tracks do.

Track-based alignment can be split into internal alignment and external alignment. Internal alignment refers to the relative alignment of the components of a tracking system, whereas external alignment refers to the alignment of the various tracking systems to the global frame of the experiment, which is usually tied to the beam pipe or some other part of the accelerator infrastructure. Even the internal alignment of a tracking system can be a big challenge. For instance, the current silicon tracker of the CMS detector has more than 10<sup>4</sup> sensors to be aligned, each with six degrees of freedom, not counting deformations of the sensors under gravity or thermal stresses. The estimation of *O*(10<sup>5</sup>) parameters is a highly non-trivial problem. A solution that has become a de-facto standard is the experiment-independent program Millepede-II [25, 26], which performs a simultaneous fit of (global) alignment parameters and (local) track parameters and allows laser and survey data as well as equality constraints to be included in the fit. For one of the alternative algorithms developed for track-based alignment, see [27].
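The structure of such a simultaneous fit can be illustrated on a toy problem: straight tracks measured in a few modules with unknown offsets, solved by ordinary linear least squares. This is only a sketch of the global-plus-local idea, not of Millepede-II itself; the geometry and all numbers are invented, and one module is fixed as reference to remove the trivial degeneracy.

```python
import numpy as np

rng = np.random.default_rng(1)
n_tracks = 50
modules_x = np.array([1.0, 2.0, 3.0, 4.0])           # module positions
true_offsets = np.array([0.0, 0.05, -0.03, 0.02])    # module 1 is the reference

# Simulate straight tracks y = a + b*x, each measurement shifted by the
# offset of the module it was recorded in, plus measurement noise.
a = rng.normal(0.0, 1.0, n_tracks)
b = rng.normal(0.0, 0.5, n_tracks)
y = (a[:, None] + b[:, None] * modules_x + true_offsets
     + rng.normal(0.0, 0.01, (n_tracks, 4)))

# Design matrix: per-track local parameters (a_i, b_i), then the global
# offsets of modules 2..4 (module 1 fixed to zero as reference).
A = np.zeros((n_tracks * 4, 2 * n_tracks + 3))
for i in range(n_tracks):
    rows = slice(4 * i, 4 * i + 4)
    A[rows, 2 * i] = 1.0
    A[rows, 2 * i + 1] = modules_x
    for j in range(1, 4):
        A[4 * i + j, 2 * n_tracks + j - 1] = 1.0

theta, *_ = np.linalg.lstsq(A, y.ravel(), rcond=None)
print(theta[-3:])   # estimated offsets of modules 2..4
```

The real problem differs in scale, not in structure: Millepede-II exploits the sparsity pattern of this matrix to reduce the solution to a system in the global parameters only.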

## **1.6 Tracking Systems**

Track reconstruction requires a minimal number of space points per track. A tracking system is a device that has enough information for stand-alone track reconstruction. A typical collider experiment has three tracking systems for momentum measurement: the vertex detector, the central tracker, and the muon tracking system. Fixed-target experiments frequently have vertex detectors as well, complemented by a magnetic spectrometer for momentum measurement.

The vertex detector is the tracking system closest to the beam, with the purpose to give very precise position and direction information of the tracks produced in a collision, so that decays very close to the interaction point can be detected with large efficiency; see Chap. 9. It therefore has the largest precision (smallest measurement errors) of all tracking systems. Vertex detectors are usually equipped with pixel sensors in order to achieve the required precision.

The central or inner tracker of a collider experiment is positioned between the vertex detector and the calorimeters. It is normally embedded in a solenoidal magnetic field with high bending power. A silicon tracker typically produces *O*(10) hits per track, while a TPC produces *O*(100) hits per track. The main requirements on the central tracker, not always easily satisfied, are: high single-hit precision; good capability to resolve two nearby tracks; precise momentum estimation by a long lever arm (large diameter); enough redundancy for high-quality track finding; hermetic coverage; as little material as possible. In some cases, especially at the future high-luminosity LHC (HL-LHC), fast readout is also essential, as trackers must be able to contribute to the trigger.
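The role of the lever arm can be made concrete with the standard relation between transverse momentum and track curvature in a solenoid, *p*<sub>T</sub> [GeV/c] ≈ 0.3 *B* [T] *R* [m], and the sagitta *s* = *L*²/(8*R*): for fixed hit precision, the measurable sagitta grows with the square of the lever arm *L*. A small sketch (values purely illustrative):

```python
def pt_from_radius(b_tesla, radius_m):
    """Transverse momentum in GeV/c from the curvature radius in a solenoid:
    pT = 0.3 * B * R."""
    return 0.3 * b_tesla * radius_m

def sagitta(radius_m, lever_arm_m):
    """Sagitta of a circular arc over a lever arm L: s = L**2 / (8 * R)."""
    return lever_arm_m**2 / (8.0 * radius_m)

# A 10 GeV/c track in a 4 T field has a curvature radius of about 8.3 m:
R = 10.0 / (0.3 * 4.0)
print(pt_from_radius(4.0, R))    # recovers ~10 GeV/c
print(sagitta(R, 1.0))           # ~15 mm sagitta over a 1 m lever arm
```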

The muon system is situated behind the calorimeters which, in principle, absorb all particles with the exception of muons. Additional iron filters can be employed as well. The muon system can provide an independent measurement of the muon momentum, especially for the purpose of triggering; see Sect. 2.1. If the muon system has to cover a large area, as is the case in the LHC experiments, it is typically equipped with proportional chambers, drift chambers or drift tubes.

The following subsections briefly describe the tracking systems of experiments at the LHC, the SuperKEKB *B*-factory, and the future Facility for Antiproton and Ion Research (FAIR).

# *1.6.1 Detectors at the LHC*

#### **1.6.1.1 ALICE**

ALICE is a dedicated heavy-ion experiment at the LHC [28]. Its detector is designed to study the physics of strongly interacting matter at extreme energy densities.

Until the end of 2018, the Inner Tracking System (ITS, see Fig. 1.5) of the ALICE detector consisted of two barrel pixel layers [29], two layers of silicon drift detectors [17], and two layers of double-sided silicon strip detectors. After the upgrade in 2019–2020, the ITS consists of seven layers equipped with monolithic pixel chips [30].

**Fig. 1.5** Cut-away view of the ALICE detector. (From https://arxiv.org/abs/1812.08036. ©2015 CERN for the benefit of the ALICE Collaboration. Reproduced under License CC-BY-4.0)

The main tracking device of ALICE is a TPC [31, 32]. It provides up to 159 space points per track. The measurement of the energy deposit due to ionization provides a powerful tool for particle identification, especially for low-momentum particles, see [33]. For track reconstruction in the TPC and global track reconstruction, see Sect. 10.1. The TPC is surrounded by a transition radiation detector (TRD) used for triggering and electron identification.

The ALICE muon spectrometer covers only the forward region of the experiment and is dedicated to the study of quarkonia production, open heavy flavor production and vector meson properties via the muonic decay channel [34].

#### **1.6.1.2 ATLAS**

ATLAS [35] is one of the two general-purpose experiments at the LHC, the other one being CMS. Its vertex detector (see Fig. 1.6) originally consisted of three barrel pixel layers and three end-cap pixel disks on either side [36]. In 2014, a fourth pixel layer was inserted in the barrel between the existing pixel detectors and a new beam pipe with smaller radius [37].

The central tracker of ATLAS [36] consists of two parts: the silicon tracker (SCT) with four barrel layers and nine end-cap disks on either side, and the Transition Radiation Tracker (TRT) made of "straw tubes" which are proportional counters that contribute to particle identification via transition radiation; see Sect. 2.4.1. A charged particle hits at least 30 straw tubes on the way through the TRT; see Sect. 10.2.

The ATLAS muon spectrometer [35, Chap. 6] consists of a barrel part and two end-caps. The barrel spectrometer contains three concentric layers, each with eight large and eight small chambers of drift tubes. Each end-cap has four disks of drift tube chambers and cathode strip chambers. Resistive plate chambers on the barrel and thin gap chambers in the end-caps are used for trigger purposes.

#### **1.6.1.3 CMS**

CMS [38] is, besides ATLAS, the second general-purpose experiment at the LHC. Its vertex detector originally consisted of three barrel pixel layers and two end-cap pixel disks on either side [39]. In the winter of 2016/2017, this device was upgraded with a fourth barrel layer and a third end-cap disk on either side, giving at least four hits per track over the full solid angle covered by the detector [40].

The silicon strip tracker (SST) of CMS is the largest silicon tracker ever built. It is divided into four sections: the inner barrel (TIB), the outer barrel (TOB), and the two end-caps (TEC). Depending on its angle with respect to the beam axis, a charged particle crosses between eight and 14 sensors, out of which four to six are double-sided ones [41]. Track reconstruction in the SST is done mostly in conjunction with the Pixel Detector; see Sect. 10.3.

**Fig. 1.6** Top: Cut-away view of the ATLAS detector. (From [35], reproduced under License CC-BY-3.0). Bottom: The central tracker. (From https://collaborationatlasfrance.web.cern.ch/content/ tracker)

The CMS muon system [42] consists of four layers of muon stations inserted in the iron return yoke of the solenoid; see Fig. 1.7. The stations in the barrel region are equipped with drift tubes, and those in the end-caps are equipped with cathode strip chambers. In addition, resistive plate chambers are mounted in both the barrel and end-caps of CMS; they are used mainly for triggering.

**Fig. 1.7** Top: Schematic diagram of a sector of the barrel part of the CMS detector. (From [43], reproduced under License CC-BY-4.0). Bottom: Schematic view of the vertex detector and the silicon strip tracker. (Courtesy of W. Adam)

Figure 1.7 shows a schematic diagram of a sector of the barrel part of the CMS detector.

#### **1.6.1.4 LHCb**

LHCb [44] is the experiment at the LHC that is dedicated to precision measurements of CP violation and rare decays of *B* hadrons. Instead of surrounding the entire collision point with an enclosed detector as ATLAS and CMS, the LHCb experiment is designed to detect mainly particles in the forward direction.

**Fig. 1.8** Top: View of the LHCb detector. (From [44], reproduced under License CC-BY-3.0). Bottom: Sensor of the Vertex Locator (VELO)

The core of the LHCb [44] tracking system (see Fig. 1.8) is a silicon microstrip detector close to the interaction point, the Vertex Locator (VELO). It can be moved to a distance of only 7 mm from the proton beams and measures the positions of the primary vertices and the impact parameters of the tracks with extremely high precision.

Up to the end of 2018, the tracking downstream of the VELO was accomplished by the TT and the T stations. The Tracker Turicensis (TT) is a silicon microstrip detector placed upstream of the dipole magnet; it improves the momentum resolution of reconstructed tracks and rejects pairs of tracks that in reality belong to the same particle. The magnet is placed behind the TT. It bends the flight path of the particles in the *x*-*z* plane and therefore allows the determination of their momenta. The tracking system is completed by the T stations (T1-T2-T3), which, together with the information from the VELO and optionally the TT, determine the momentum and flight direction of the particles. The T stations are composed of silicon microstrip sensors close to the beam pipe and of straw tubes in the outer regions. For track reconstruction in LHCb, see Sect. 10.4.

After the upgrade of LHCb during 2019–2020 and starting with Run 3 of the LHC, the tracking downstream of the VELO is done by the SciFi, a homogeneous tracking system in scintillating fiber technology; see [20] and Sect. 1.4.

The LHCb muon system [44, Sect. 6.3], consisting of the muon stations M1 to M5, provides fast information for the muon trigger at Level 0 and muon identification for the high-level trigger and offline analysis. It comprises five stations interleaved with absorbers. The stations are mostly equipped with multi-wire proportional chambers, with the exception of the central part of the first chamber, where GEM detectors are used because of the high particle rate.

# *1.6.2 Belle II and CBM*

#### **1.6.2.1 Belle II**

Belle II [45] is an experiment at the SuperKEKB collider at KEK in Japan. Its principal aim is the study of the properties of *B* mesons. The detector is shown in Fig. 1.9. The vertex detector consists of two parts, the pixel detector (PXD) with two layers of DEPFET pixels [46] and the Silicon Vertex Detector (SVD) with four layers of double-sided silicon strip detectors [47].

The central tracking device of Belle II is the CDC, a cylindrical drift chamber [48]. It has 56 layers of sense wires in nine superlayers, five with a total of 32 axial wire layers and four with 24 stereo wire layers. The stereo angle is between 2.6 and 4.2 degrees. For track reconstruction in Belle II, see Sect. 11.1.

The Belle II KLM system is designed to detect long-lived *K*-mesons and muons. It consists of alternating layers of iron plates, serving as flux return, and active detector elements. In the end-caps and the innermost two layers of the barrel, the active elements are scintillator strips; the rest of the barrel layers are equipped with resistive plate chambers, reused from Belle [49].

#### **1.6.2.2 CBM**

The Compressed Baryonic Matter (CBM) experiment is a fixed-target experiment [50] (see Fig. 1.10) at the future FAIR facility for antiproton and ion research. It is designed to investigate the properties of highly compressed baryonic matter. Its central tracking device is the Silicon Tracking System (STS [51]). It is designed for high multiplicity, up to 1000 charged particles per interaction, at high rates, up to 10 MHz, and consists of eight layers of double-sided silicon sensors between 30 and 100 cm downstream of the target, inside the magnetic field.

Each of the three Transition Radiation Detectors [52] is made of four MWPCs. Their main task is electron identification. Track reconstruction in the STS and the TRD is described in Sect. 11.2.

**Fig. 1.10** Top: Schematic geometry of the CBM detector. (From [53]). Bottom: Schematic geometry of the silicon tracking system. (From [51])

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Chapter 2 Event Reconstruction**

**Abstract** The chapter gives an outline of the event reconstruction chain of a typical large experiment, from the trigger to the physics object reconstruction. The concept of the trigger is illustrated by two examples, CMS and LHCb, followed by a discussion of track reconstruction and its stages: hit generation, local reconstruction, and global reconstruction. The section on vertex reconstruction introduces a classification of vertices and sets the scene for the dedicated chapters on vertex finding and vertex fitting. The chapter concludes with some remarks on particle identification and reconstruction of physics objects such as electrons, muons, photons, jets, *τ* leptons, and missing energy.

# **2.1 Trigger and Data Acquisition**

# *2.1.1 General Remarks*

In the latest and most powerful collider, the Large Hadron Collider (LHC), bunches of protons collide with a nominal frequency of 40 MHz, i.e., every 25 ns. Due to gaps in the beams, at most 2556 bunches are actually present [1]. As the bunch revolution frequency equals 11,245 Hz, the average bunch crossing rate is about 28.7 MHz. The average number of individual proton collisions per bunch crossing, called the pile-up, depends on the luminosity. In the CMS experiment, it varied between about five and slightly more than 60 in 2018 [2], so that bunch crossings without collisions are extremely unlikely in CMS and likewise in ATLAS. The result of a bunch crossing with at least one collision is called an event. In CMS and ATLAS, the event rate is virtually the same as the bunch crossing rate. In the LHCb experiment, however, not all events are actually visible in the detector, so that the effective event rate is lower than the bunch crossing rate; see Sect. 2.1.3.
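The quoted average crossing rate follows directly from the bunch count and the revolution frequency; a one-line check:

```python
# Average bunch crossing rate at the LHC: up to 2556 filled bunches,
# each passing the interaction point once per revolution (11,245 Hz).
n_bunches = 2556
f_revolution_hz = 11_245
avg_crossing_rate_hz = n_bunches * f_revolution_hz
print(avg_crossing_rate_hz / 1e6)   # about 28.7 MHz
```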

The most frequent value of pile-up observed by CMS in 2018 was around 30, corresponding to nearly 900 million individual collisions per second. Clearly, it is neither possible nor desirable for the experiment to record all of the event data produced during data taking. A selection mechanism is therefore required that tags

R. Frühwirth, A. Strandlie, *Pattern Recognition, Tracking and Vertex Reconstruction in Particle Detectors*, Particle Acceleration and Detection, https://doi.org/10.1007/978-3-030-65771-0\_2

the physically interesting events and activates the recording mechanism, called data acquisition (DAQ). Such a selection mechanism is called a trigger.

Triggers have been deployed for many decades, ever since bubble chambers were superseded by experiments with electronic tracking detectors and calorimeters. Triggers are needed in both fixed-target and colliding-beam experiments because of limitations in data rates, storage capacity, and computing resources. In order to deal with the high event rates typical of electronic experiments, it is important to minimize the dead time of the trigger, i.e., the time after an event during which the system is not able to process another event. Triggers are therefore implemented in several stages or levels, with increasing computational complexity and decision time or latency. The principle and possible implementations are best demonstrated by examples. In the following, the trigger/DAQ systems of the CMS and the LHCb experiments are described in somewhat more detail.

## *2.1.2 The CMS Trigger System*

CMS [3] is one of the two general purpose experiments at the LHC; see Sect. 1.6.1.3. Its trigger has two levels, the Level-1 Trigger (L1, [4]) and the High-Level Trigger (HLT, [5, 6]). During data taking virtually every bunch crossing results in at least one collision of protons, and the average primary event rate equals the average bunch crossing rate of 28.7 MHz. The trigger, however, must be able to process events separated by only 25 ns.

The task of the L1 trigger is to reduce the primary event rate to less than 100 kHz. This is achieved using high-speed customized hardware running up to 128 different algorithms. Its inputs come from the calorimeters, the muon detectors, and the beam monitors. Its latency is 3.2 µs, including data collection, decision making, and propagation back to the detector front-ends. A block diagram of the L1 trigger is shown in Fig. 2.1.

The input to the muon trigger (MT) comes from three different detector types. The local MTs find track segments, the regional MTs find tracks, and the global MT combines the information from all regional MTs, selects the best four muon candidates, and provides their momenta and directions.

The calorimeter trigger (CT) uses information from both the electromagnetic and the hadronic calorimeter. The local CTs compute energy deposits; the regional CTs find candidates for electrons, photons, jets, and isolated hadrons, and compute transverse energy sums. The transverse energy vector *E*<sup>T</sup> is defined as:

$$E\_{\rm T} = E \cdot \begin{pmatrix} \cos \phi \cos \lambda \\ \sin \phi \cos \lambda \end{pmatrix},\tag{2.1}$$

where *φ* is the azimuth angle and *λ* = *π/*2 − *θ* is the dip angle of the particle or jet direction. The global CT sorts the candidates in all categories, computes total

**Fig. 2.1** Block diagram of the CMS L1 trigger. The details are explained in the text. (From [4], by permission of Elsevier)

and missing transverse energy sums, and computes jet multiplicities for different thresholds.
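Equation (2.1) is straightforward to evaluate; a minimal sketch (the function name and values are illustrative):

```python
import math

def transverse_energy(e, phi, theta):
    """Transverse energy vector of Eq. (2.1), where phi is the azimuth angle
    and lambda = pi/2 - theta is the dip angle."""
    lam = math.pi / 2.0 - theta
    return (e * math.cos(phi) * math.cos(lam),
            e * math.sin(phi) * math.cos(lam))

# A 50 GeV deposit at theta = 90 degrees (dip angle zero) is fully transverse:
ex, ey = transverse_energy(50.0, 0.0, math.pi / 2.0)
print(ex, ey)   # -> 50.0 0.0
```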

Finally, the global trigger makes the final decision which is passed on to the detector front-end electronics and the DAQ.

The HLT is designed to reduce the L1 output rate of 100 kHz to the final recording rate of *O(*100*)* Hz. It is implemented purely in software which runs on a farm of commercial processors, using the full event data and performing the reconstruction and selection of physics objects such as electrons, photons, muons, *τ* leptons, hadronic jets, and missing energy; see Sect. 2.4.

# *2.1.3 The LHCb Trigger System*

Although the intersection point of LHCb is tuned to a lower luminosity than that of CMS and ATLAS, the number of produced *bb̄* pairs is still of the order of 10<sup>11</sup> per year. An efficient, robust, and flexible trigger is required in order to cope with the harsh hadronic environment. It must be sensitive to many different final states.

While the average bunch crossing rate is 28.7 MHz, the frequency of events actually visible in the detector is about 13 MHz [7]. This must be reduced to about 1 MHz, the frequency at which the detector can be read out. As in CMS, the trigger is organized in two levels, Level-0 (L0) and HLT (see Fig. 2.2).

**Fig. 2.2** Scheme of the LHCb trigger system. (From [8], reproduced under License CC-BY-4.0)

The L0 trigger is implemented in field-programmable gate arrays (FPGAs) with a fixed latency of 4 µs [8]. It has two components: the calorimeter trigger, which looks for particles (electrons, photons, neutral pions, hadrons) with large transverse energies; and the muon trigger which uses the tracks reconstructed in the muon chambers and selects the two muons with the highest transverse momentum for each quadrant of the muon detector.

The HLT is implemented purely in software running on the nodes of the event filter farm [8]. It is subdivided into two stages, HLT1 and HLT2; see Sect. 10.4. HLT1 performs a partial event reconstruction. Its output, at a rate of about 100 kHz, is directed to HLT2, which performs the full event reconstruction and writes the results to mass storage. For more details on track and vertex reconstruction, see Sect. 10.4.

## **2.2 Track Reconstruction**

Track reconstruction is a central task in the analysis of the event data. Its aim is to provide estimates of the track parameters, i.e., the position, the direction, and the momentum of charged particles at one or several specific points or surfaces. For an excellent review of track reconstruction, see [9].

Only the tracks of stable or sufficiently long-lived charged particles—for instance electrons, protons, muons and charged pions—are visible in the tracking detectors. Short-lived particles, for instance *B* hadrons or *J/ψ* mesons, are reconstructed from their decay products. The reconstruction of charged particles can be divided into the following steps:

**A. Hit generation** The recorded signals are converted to spatial coordinates, either 2D or 3D, using the detector-specific calibration constants. The coordinates will be called "measurements", "observations" or "hits" in the following. For some examples, see Sects. 1.2 and 1.3.

**B. Local track reconstruction** First, tracks are reconstructed in each tracking system. This can again be divided into two or three steps.


**C. Global track reconstruction** After the local track reconstruction, the tracks found in the individual tracking systems have to be combined to global track candidates. To this end, the track candidates accepted by the track fit in the main tracking system are extrapolated to the other tracking systems and checked for compatibility with the tracks reconstructed there. As the estimated track parameters in different tracking systems are stochastically independent, the test for compatibility is usually based on the chi-square statistic of their weighted mean, but more sophisticated machine learning methods can be applied as well [11]. The successful combination of local tracks to a global candidate is normally followed by a track fit of the latter.
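The chi-square compatibility test mentioned above can be sketched as follows; the parameter vectors and covariance matrices are invented 5-parameter examples, and the threshold is the 99% quantile of the chi-square distribution with 5 degrees of freedom (≈ 15.09):

```python
import numpy as np

def chi2_compatibility(x1, c1, x2, c2):
    """Chi-square of the difference of two stochastically independent track
    parameter estimates x1, x2 with covariance matrices c1, c2."""
    d = x1 - x2
    return float(d @ np.linalg.solve(c1 + c2, d))

# Two hypothetical 5-parameter estimates of the same track:
x1 = np.array([0.1, 0.2, 1.5, 0.01, 0.5])
c1 = np.diag([1e-4, 1e-4, 1e-6, 1e-6, 1e-4])
x2 = x1 + np.array([0.005, -0.008, 0.0004, 0.0003, 0.002])
c2 = np.diag([4e-4, 4e-4, 4e-6, 4e-6, 4e-4])

chi2 = chi2_compatibility(x1, c1, x2, c2)
print(chi2 < 15.09)   # -> True: the two estimates are compatible
```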

Some tracking systems do not have enough redundancy to allow stand-alone local track reconstruction, for instance a pixel vertex detector with only two layers. In this case, the tracks found in the global track reconstruction are extrapolated and used to find compatible hits, which are then attached to the track; see Sect. 5.1.7.

**D. Assessment of track quality** Not every track candidate generated by the track finding is a valid track. Testing the track hypothesis and assessing the track quality after the track fit is therefore mandatory. This is the topic of Sect. 6.4.

Given a careful calibration of the tracking devices (Sects. 1.2 and 1.3), the chi-square statistic of the track fit is used to reject fake tracks (Sect. 6.4.1) and to trigger the search for outliers (Sect. 6.4.2) or kinks (Sect. 6.4.3). Another important ingredient in the assessment of track quality is the track length or the effective number of hits attached to the track after removal or down-weighting of outliers; see Sect. 6.4.1. Checking the compatibility with the collision or production region can also be used to reject fake tracks.

If the track reconstruction proceeds iteratively, it is tempting to remove the hits used in an iteration from the hit pool to simplify the task of the subsequent iterations. It is, however, advisable to remove only those hits that are unambiguously attached to tracks of the highest quality.

## **2.3 Vertex Reconstruction**

A point where particles are produced in a collision or a decay is called a vertex. The point of collision of two beam particles in a collider or of a beam particle with a target particle in a fixed-target experiment is called the primary vertex. In high-luminosity colliders, such as the LHC, many collisions occur in a single bunch crossing; consequently, there are many primary vertices. It is, however, statistically almost certain that at most one of the collisions generates a pattern recognized by the trigger as being of potential physical interest. The vertex of this collision is called the signal vertex.

Many of the particles produced at a primary vertex, including the signal vertex, are unstable and decay at a secondary vertex. The aim of vertex reconstruction is to find sets of particles that have been produced at the same vertex, to estimate the vertex position, to test whether the assignment of the particles to the vertex is correct, and to improve the estimates of the track parameters by imposing the vertex constraint. Vertex finding and vertex fitting are the topics of Chaps. 7 and 8, respectively. For the various types of secondary vertices and how to find them, see Chap. 9. Examples of experimental strategies are given in Chaps. 10 and 11.

## **2.4 Physics Objects Reconstruction**

Both the trigger and the physics analysis require not just tracks, but objects that represent physical entities such as electrons, photons, muons, *τ* leptons, jets, missing energy, etc. Object identification can be obtained by two complementary approaches: dedicated detectors for particle identification (PID), and combining information from different sub-detectors.

## *2.4.1 Particle ID by Dedicated Detectors*

Charged particles can be identified by dedicated detectors in various ways [12].

**Measurement of the velocity** Given the momentum as determined by the tracking system and the velocity, the mass can be estimated. The velocity can be measured directly by time-of-flight detectors [13], or indirectly by measuring the emission angle of Cherenkov radiation in Cherenkov detectors [14] and time-of-propagation detectors [15].

**Energy loss by ionization** In a large range of velocity, the expected energy loss by ionization is proportional to *(m/p)*<sup>2</sup>, where *m* is the unknown mass and *p* is the momentum of the particle; see Sect. 4.5.2.1. In practice, the most probable energy loss is estimated from a number of measurements. In a silicon tracker, the energy loss is measured in each sensor [16]; in a drift or time projection chamber, the energy loss is measured for each wire hit or for each cluster in the endplates, respectively [17].
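The proportionality to *(m/p)*<sup>2</sup> can be inverted to give a crude mass estimate. In this sketch, the calibration constant `k` is a purely hypothetical stand-in for the detector-specific calibration discussed in Sect. 4.5.2.1:

```python
import math

def mass_from_dedx(p_gev, dedx, k):
    """Invert dE/dx ~ k * (m/p)**2 for the mass; k is a hypothetical
    detector-specific calibration constant."""
    return p_gev * math.sqrt(dedx / k)

# Choose k so that a 1 GeV/c proton (m = 0.938 GeV/c^2) gives a known dE/dx,
# then check that the inversion recovers the proton mass:
k = 2.5                      # hypothetical calibration constant
dedx = k * 0.938**2          # simulated energy-loss measurement
print(mass_from_dedx(1.0, dedx, k))   # ~0.938 GeV/c^2
```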

**Transition radiation** Transition radiation (TR) is electromagnetic radiation in the X-ray band. It is emitted when an ultra-relativistic particle crosses the boundary between two media with different dielectric constants. The radiator is combined with a gaseous sensing device that measures the TR signal and the position of the particle [13, 18].

## *2.4.2 Particle and Object ID by Tracking and Calorimetry*

PID in dedicated detectors is complemented by combining information from the tracking systems and the calorimeters.

**Electrons** Electrons and positrons are identified as such by the fact that they have a reconstructed track and a cluster in the electromagnetic calorimeter that matches the track in energy and position. For the special treatment of electrons in the track fitting, see Sects. 6.2.3, 10.2 and 10.3.

**Photons** Clusters in the electromagnetic calorimeter that are not matched to a track or a cluster in the hadronic calorimeters are candidates for photons.

**Muons** Global tracks with hits in both the central tracking system and the muon tracking system are candidates for muons.

**Jets** Jets are narrow bundles of charged and neutral particles produced by the hadronization of a quark or a gluon. Jet reconstruction algorithms are based on clustering the charged tracks, but should also provide a good correspondence between the energy deposits in the calorimeters and the reconstructed tracks. This is the aim of the particle flow method, which originated in the ALEPH experiment at the LEP collider [19], and is now employed by LHC experiments as well [20, 21]. In this approach, the energy of a charged hadron is estimated by a weighted average of the track momentum and the associated calorimeter energy. The energy of the photons is measured by the electromagnetic calorimeter, and the energy of neutral hadrons is measured by the hadronic calorimeter.
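The weighted average used for charged hadrons in the particle flow approach is a simple inverse-variance combination; a minimal sketch with illustrative resolutions:

```python
def weighted_energy(p_track, sigma_track, e_calo, sigma_calo):
    """Inverse-variance weighted average of the track momentum and the
    associated calorimeter energy (particle-flow charged hadron)."""
    w_t = 1.0 / sigma_track**2
    w_c = 1.0 / sigma_calo**2
    return (w_t * p_track + w_c * e_calo) / (w_t + w_c)

# At low energy the track measurement is much more precise and dominates:
print(weighted_energy(10.0, 0.1, 11.0, 1.0))   # ~10.01, close to the track value
```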

**Tau leptons** Tau leptons have to be reconstructed from their decay products (Sect. 9.2). In two thirds of the cases, *τ* leptons decay into hadrons, typically into one or three charged mesons (predominantly pions), often accompanied by neutral pions decaying into photons and an invisible neutrino. Therefore, the particle flow approach can be applied to *τ* leptons as well [22, 23].

**Missing energy** Missing transverse energy is a signature for invisible particles such as neutrinos, dark matter, and neutral supersymmetric particles. In a typical collider experiment, it is a global quantity computed from the transverse momentum/energy components of all jets, electrons, photons, muons, and *τ* leptons in the event. In the LHC experiments, it has to be corrected for contributions from the pile-up collisions in the same bunch crossing; see for instance [24].

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.


# **Chapter 3 Statistics and Numerical Methods**

**Abstract** The chapter gives an outline of some statistical and numerical methods that will be applied in later chapters. The first section deals with the minimization of functions. Several gradient-based methods and a popular non-gradient method are discussed. The following section discusses statistical models and the estimation of model parameters. The basics of linear and nonlinear regression models and state space models are presented, including least-squares estimation and the (extended) Kalman filter. The final section gives a brief overview of clustering and different types of clustering algorithms.

# **3.1 Function Minimization**

The minimization (or maximization) of a multivariate function *F (x)* is a frequent task in solving non-linear systems of equations, clustering, maximum-likelihood estimation, function and model fitting, supervised learning, and similar problems. A basic classification of minimization methods distinguishes between methods that require the computation of the gradient or even the Hessian matrix of the function and gradient-free methods. All methods discussed in the following subsections are iterative and require a suitable starting point *x*0. Implementations of the methods discussed in the following, along with many others, can, for instance, be found in the MATLAB® Optimization Toolbox™ [1] and in the Python package scipy.optimize [2].

# *3.1.1 Newton–Raphson Method*

If *F (x)* is at least twice continuously differentiable in its domain, it can be approximated by its second order Taylor expansion *F*ˆ at the starting point *x*0:



$$F(\mathbf{x}\_0 + \mathbf{h}) \approx \hat{F}(\mathbf{x}\_0 + \mathbf{h}) = F(\mathbf{x}\_0) + \mathbf{g}(\mathbf{x}\_0)\,\mathbf{h} + \frac{1}{2}\,\mathbf{h}^{\mathsf{T}} \mathbf{H}(\mathbf{x}\_0)\,\mathbf{h},\tag{3.1}$$

where *g*(*x*) = ∇*F*(*x*) is the gradient and *H*(*x*) = ∇<sup>2</sup>*F*(*x*) is the Hessian matrix of *F*(*x*). The step *h* is determined such that *F̂* has a stationary point at *x*<sub>1</sub> = *x*<sub>0</sub> + *h*, i.e., that the gradient of *F̂* is zero at *x*<sub>1</sub>:

$$\nabla \hat{F}(\mathbf{x}\_{1}) = \mathbf{g}(\mathbf{x}\_{0}) + \mathbf{h}^{\mathsf{T}} \mathbf{H}(\mathbf{x}\_{0}) = \mathbf{0} \Longrightarrow \mathbf{h} = -\left[\mathbf{H}(\mathbf{x}\_{0})\right]^{-1} \mathbf{g}(\mathbf{x}\_{0})^{\mathsf{T}}.\tag{3.2}$$

Note that if *x* is a column vector, *g*(*x*<sub>0</sub>) is a row vector. In order to ensure that the Wolfe conditions [3] are satisfied, Eq. (3.2) is often relaxed to:

$$h = -\eta \left[ H(\mathbf{x}\_0) \right]^{-1} \mathbf{g} \left( \mathbf{x}\_0 \right)^{\mathsf{T}},\tag{3.3}$$

with a learning rate *η* ∈ (0, 1). Inverting the Hessian matrix can be computationally expensive; in this case, *h* can be computed by finding an approximate solution of the linear system *H*(*x*<sub>0</sub>) *h* = −*g*(*x*<sub>0</sub>)<sup>T</sup>. This procedure is iterated to produce a sequence of values according to:

$$\mathbf{x}\_{k+1} = \mathbf{x}\_k - \eta \left[ H(\mathbf{x}\_k) \right]^{-1} \mathbf{g}(\mathbf{x}\_k)^\mathsf{T}. \tag{3.4}$$

If the starting point *x*<sub>0</sub> is sufficiently close to a local minimum, the sequence converges quadratically to the local minimum. In practice, the iteration is stopped as soon as the norm |*g*(*x*<sub>k</sub>)| of the gradient falls below some predefined bound *ε*. If *F* is a convex function, the local minimum is also the global minimum.
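As a sketch, the iteration of Eq. (3.4) can be written in a few lines of Python; the test function, its gradient and its Hessian are the quadratic of Fig. 3.1, chosen purely for illustration.

```python
import numpy as np

def newton_raphson(grad, hess, x0, eta=1.0, eps=1e-8, max_iter=100):
    """Minimize F via Eq. (3.4): x_{k+1} = x_k - eta * H(x_k)^{-1} g(x_k)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < eps:        # stopping rule: |g(x_k)| < eps
            break
        # solve H h = -g instead of inverting the Hessian explicitly
        h = np.linalg.solve(hess(x), -g)
        x = x + eta * h
    return x

# Example: F(x, y) = 2x^2 + 16y^2, minimum at the origin
grad = lambda x: np.array([4*x[0], 32*x[1]])
hess = lambda x: np.diag([4.0, 32.0])
xmin = newton_raphson(grad, hess, [3.0, 1.0])
```

For a quadratic function and *η* = 1 a single step lands exactly on the minimum; for a general *F*, the quadratic convergence sets in only near the solution.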

## *3.1.2 Descent Methods*

As the computation of the Hessian matrix is computationally costly, various methods that do not require it have been devised, for instance, descent methods. A descent method is an iterative algorithm that searches for an approximate minimum of *F* by decreasing the value of *F* in every iteration. The iteration has the form *x*<sub>k+1</sub> = *x*<sub>k</sub> + *η*<sub>k</sub> *d*<sub>k</sub>, where *d*<sub>k</sub> is the search direction and *η*<sub>k</sub> is the step-size parameter. As with the Newton–Raphson method, once a (local) minimum is reached, it cannot be left anymore.

#### **3.1.2.1 Line Search**

A search direction *d* is called a descent direction at the point *x* ∈ ℝ<sup>n</sup> if *g*(*x*) · *d* < 0. If *η* is sufficiently small, then

$$F(\mathbf{x} + \eta \, d) < F(\mathbf{x}).\tag{3.5}$$

Once a search direction *d*<sub>k</sub> has been chosen at the point *x*<sub>k</sub>, line search implies that the line *x* = *x*<sub>k</sub> + *λ* *d*<sub>k</sub> is followed to its closest local minimum. For various line search methods such as fixed and variable step size, interpolation, golden section or Fibonacci's method, see [4, 5].

#### **3.1.2.2 Steepest Descent**

Steepest descent follows the opposite direction of the gradient. If it is combined with line search, each search direction is orthogonal to the previous one, leading to a zig-zag search path. This can be very inefficient in the vicinity of a minimum where the Hessian matrix has a large condition number (ratio of the largest to the smallest eigenvalue); see Fig. 3.1.
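The zig-zag behaviour of Fig. 3.1 can be reproduced in a few lines; the exact line-search step below is available in closed form only because the test function is quadratic with known Hessian, an assumption made purely for this sketch.

```python
import numpy as np

# Steepest descent with exact line search on F(x, y) = 2x^2 + 16y^2,
# i.e. F(x) = 0.5 x^T A x with A = diag(4, 32), as in Fig. 3.1.
A = np.diag([4.0, 32.0])
x = np.array([3.0, 1.0])
path = [x.copy()]
for _ in range(200):
    g = A @ x                           # gradient of the quadratic
    if np.linalg.norm(g) < 1e-10:
        break
    lam = (g @ g) / (g @ A @ g)         # exact minimizer along -g
    x = x - lam * g
    path.append(x.copy())

# successive search directions are orthogonal, hence the zig-zag path
d0, d1 = path[1] - path[0], path[2] - path[1]
```

The large number of iterations needed, compared to the two steps of the conjugate gradient method below, reflects the condition number 8 of the Hessian.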

#### **3.1.2.3 Quasi-Newton Methods**

In the Newton–Raphson method, the search direction is *d* = −[*H*(*x*<sub>k</sub>)]<sup>−1</sup>*g*(*x*<sub>k</sub>)<sup>T</sup>. If *H*(*x*<sub>k</sub>) is positive definite, then *d* is a descent direction, and so is −*A* *g*(*x*<sub>k</sub>)<sup>T</sup> for any positive definite matrix *A*. In a quasi-Newton method, *A* is constructed as an approximation to the inverse Hessian matrix, using only gradient information. The initial search direction is the negative gradient, *d*<sub>0</sub> = −*g*(*x*<sub>0</sub>)<sup>T</sup>, and the initial matrix *A*<sub>0</sub> is the identity matrix. Each iteration performs a line search along the current search direction:

**Fig. 3.1** Contour lines of the function *F*(*x*, *y*) = 2*x*<sup>2</sup> + 16*y*<sup>2</sup>, and steepest descent with line search starting at the point *x*<sub>0</sub> = (3; 1)


$$\lambda\_k = \arg\min\_{\lambda} F\left( \mathbf{x}\_k + \lambda \mathbf{d}\_k \right), \quad \mathbf{x}\_{k+1} = \mathbf{x}\_k + \lambda\_k \mathbf{d}\_k. \tag{3.6}$$

The new search direction is then computed according to:

$$d\_{k+1} = -A\_{k+1} \mathbf{g}(\mathbf{x}\_{k+1})^\mathsf{T},\tag{3.7}$$

where *A*<sub>k+1</sub> is the updated approximation to the inverse Hessian matrix. There are two different algorithms for computing the update [6, p. 422], both using the change of the gradient along the step, denoted by *y*<sub>k</sub> = [*g*(*x*<sub>k+1</sub>) − *g*(*x*<sub>k</sub>)]<sup>T</sup>.

#### **Davidon–Fletcher–Powell algorithm**

$$A\_{k+1} = A\_k + \lambda\_k \frac{\mathbf{d}\_k \mathbf{d}\_k^\top}{\mathbf{d}\_k^\top \mathbf{y}\_k} - \frac{A\_k \mathbf{y}\_k \mathbf{y}\_k^\top A\_k}{\mathbf{y}\_k^\top A\_k \mathbf{y}\_k}. \tag{3.8}$$

**Broyden–Fletcher–Goldfarb–Shanno algorithm**

$$A\_{k+1} = \lambda\_k \frac{d\_k d\_k^{\mathsf{T}}}{d\_k^{\mathsf{T}} \mathbf{y}\_k} + \left(I - \frac{d\_k \mathbf{y}\_k^{\mathsf{T}}}{d\_k^{\mathsf{T}} \mathbf{y}\_k}\right) A\_k \left(I - \frac{\mathbf{y}\_k d\_k^{\mathsf{T}}}{\mathbf{y}\_k^{\mathsf{T}} d\_k}\right) . \tag{3.9}$$
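The BFGS update of Eq. (3.9) can be tried out on the quadratic of Fig. 3.1. The sketch below uses an exact line search, which is possible here only because the Hessian of the quadratic is known; a practical implementation would use a Wolfe-condition line search instead. The update itself uses gradient information only.

```python
import numpy as np

# Quasi-Newton iteration with the BFGS update of Eq. (3.9), applied to
# F(x, y) = 2x^2 + 16y^2. Hq is the (known) Hessian of this quadratic;
# it is used only for the exact line search, not for the update itself.
Hq = np.diag([4.0, 32.0])
grad = lambda x: Hq @ x

x = np.array([3.0, 1.0])
A = np.eye(2)                            # A_0 = identity
g = grad(x)
for _ in range(20):
    if np.linalg.norm(g) < 1e-10:
        break
    d = -A @ g                           # search direction, Eq. (3.7)
    lam = -(g @ d) / (d @ Hq @ d)        # exact line search on the quadratic
    x_new = x + lam * d
    g_new = grad(x_new)
    y = g_new - g                        # change of the gradient, y_k
    I = np.eye(2)
    A = (lam * np.outer(d, d) / (d @ y)
         + (I - np.outer(d, y) / (d @ y)) @ A @ (I - np.outer(y, d) / (y @ d)))
    x, g = x_new, g_new
# after n = 2 steps, A equals the inverse Hessian and x is the minimum
```

Note that the update satisfies the secant condition *A*<sub>k+1</sub>*y*<sub>k</sub> = *λ*<sub>k</sub>*d*<sub>k</sub> by construction.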

#### **3.1.2.4 Conjugate Gradients**

If the function *F*(*x*), *x* ∈ ℝ<sup>n</sup>, is quadratic of the form *F*(*x*) = ½ *x*<sup>T</sup>*Ax* − *b*<sup>T</sup>*x* + *c* with positive definite *A*, the global minimum can be found in exactly *n* steps if line search with a set of conjugate search directions is used. Such a set {*d*<sub>1</sub>, …, *d*<sub>n</sub>} is characterized by the following conditions:

$$\mathbf{d}\_{i}^{\mathsf{T}} \mathbf{A} \mathbf{d}\_{j} = 0, \quad i \neq j, \qquad \mathbf{d}\_{i}^{\mathsf{T}} \mathbf{A} \mathbf{d}\_{i} \neq 0, \quad i = 1, \ldots, n. \tag{3.10}$$

The set is linearly independent and forms a basis of ℝ<sup>n</sup>. An example for *n* = 2 is shown in Fig. 3.2.

In the general case, the conjugate gradient method proceeds by successive approximations, generating a new search direction in every iteration. Given an approximation *x*<sub>k</sub>, the new search direction is *d*<sub>k</sub> = −*g*(*x*<sub>k</sub>)<sup>T</sup> + *β*<sub>k</sub> *d*<sub>k−1</sub>, where *d*<sub>0</sub> is arbitrary. A line search along direction *d*<sub>k</sub> gives the next approximation:

$$\lambda\_k = \arg\min\_{\lambda} F(\mathbf{x}\_k + \lambda \mathbf{d}\_k), \quad \mathbf{x}\_{k+1} = \mathbf{x}\_k + \lambda\_k \mathbf{d}\_k. \tag{3.11}$$

Different variants of the algorithm exist, corresponding to different prescriptions for computing *β*<sub>k</sub>. Two of them are given here [6, pp. 406–416].

**Fig. 3.2** Contour lines of the function *F*(*x*, *y*) = 2*x*<sup>2</sup> + 16*y*<sup>2</sup>, and descent with line search and conjugate gradients starting at the point *x*<sub>0</sub> = (3; 1). The minimum is reached in two steps

#### **Fletcher–Reeves algorithm**

$$\beta\_k = \frac{\mathbf{g}(\mathbf{x}\_k)\mathbf{g}(\mathbf{x}\_k)^\mathsf{T}}{\mathbf{g}(\mathbf{x}\_{k-1})\mathbf{g}(\mathbf{x}\_{k-1})^\mathsf{T}}.\tag{3.12}$$

#### **Polak–Ribière algorithm**

$$\beta\_k = \frac{\mathbf{g}(\mathbf{x}\_k) \left[ \mathbf{g}(\mathbf{x}\_k) - \mathbf{g}(\mathbf{x}\_{k-1}) \right]^\mathsf{T}}{\mathbf{g}(\mathbf{x}\_{k-1}) \mathbf{g}(\mathbf{x}\_{k-1})^\mathsf{T}}. \tag{3.13}$$

It is customary to set *β*<sub>k</sub> to zero if *k* is a multiple of *n*, in order to avoid the accumulation of rounding errors. For non-quadratic *F*(*x*), convergence to the minimum in *n* steps is in any case not guaranteed.
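For the quadratic of Fig. 3.2, the method can be sketched directly with the Fletcher–Reeves prescription of Eq. (3.12); the exact line-search step is available in closed form here only because the Hessian *A* is known.

```python
import numpy as np

# Conjugate gradient minimization of F(x) = 0.5 x^T A x with
# A = diag(4, 32), i.e. F(x, y) = 2x^2 + 16y^2, using the
# Fletcher-Reeves prescription for beta_k, Eq. (3.12).
A = np.diag([4.0, 32.0])
x = np.array([3.0, 1.0])
g = A @ x
d = -g                                   # first direction: steepest descent
steps = 0
for k in range(10):
    if np.linalg.norm(g) < 1e-10:
        break
    lam = -(g @ d) / (d @ A @ d)         # exact line search on the quadratic
    x = x + lam * d
    g_new = A @ x
    beta = (g_new @ g_new) / (g @ g)     # Fletcher-Reeves, Eq. (3.12)
    d = -g_new + beta * d
    g = g_new
    steps += 1
```

As stated above, the minimum of this two-dimensional quadratic is reached in exactly *n* = 2 steps.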

## *3.1.3 Gradient-Free Methods*

A popular gradient-free method is the downhill-simplex or Nelder–Mead algorithm [7]. It can be applied to functions whose derivatives are unknown, do not exist everywhere, or are too costly or difficult to compute. In *n* dimensions, the method stores *n* + 1 test points *x*<sub>1</sub>, …, *x*<sub>n+1</sub> at every iteration, ordered by increasing function values, and the centroid *x*<sub>0</sub> of all points but the last one. The simplex

**Fig. 3.3** Contour lines of the Rosenbrock function *F*(*x*, *y*) = (0.8 − *x*)<sup>2</sup> + 200 (*y* − *x*<sup>2</sup>)<sup>2</sup> and minimization with the downhill-simplex method starting at the point *x*<sub>0</sub> = (1.5; 1). With the tolerance 10<sup>−8</sup> on the function value the minimum is reached after 90 steps

generated by the test points is then modified according to the function values in the test points. The allowed modifications are: reflection, expansion, contraction and shrinking. The iteration is terminated when the function value of the best point does not change significantly anymore. The size of the initial simplex is important; choosing it too small can lead to a very localized search. On the other hand, it is possible to escape from a local minimum by restarting the search with a sufficiently large simplex.

An example with the Rosenbrock function *F*(*x*, *y*) = (0.8 − *x*)<sup>2</sup> + 200 (*y* − *x*<sup>2</sup>)<sup>2</sup> is shown in Fig. 3.3. The function has a very shallow minimum at *x* = 0.8, *y* = 0.64. With the tolerance 10<sup>−8</sup> on the function value the minimum is reached after 90 steps.
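The same minimization can be reproduced with the Nelder–Mead implementation in scipy.optimize [2]; the tolerances and iteration limit below are illustrative choices.

```python
import numpy as np
from scipy.optimize import minimize

# Downhill-simplex (Nelder-Mead) minimization of the Rosenbrock-type
# function of Fig. 3.3, F(x, y) = (0.8 - x)^2 + 200 (y - x^2)^2.
F = lambda x: (0.8 - x[0])**2 + 200.0 * (x[1] - x[0]**2)**2

res = minimize(F, x0=[1.5, 1.0], method='Nelder-Mead',
               options={'xatol': 1e-8, 'fatol': 1e-8, 'maxiter': 2000})
# the minimum lies at x = 0.8, y = 0.64, where F = 0
```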

Other gradient-free methods are simulated annealing, tabu search, particle swarm optimization, genetic algorithms, etc.

## **3.2 Statistical Models and Estimation**

In the context of this book a statistical model is defined as a functional dependence of observed quantities (observations or measurements) on unknown quantities of interest (parameters or state vectors). The parameters cannot be observed directly, and the observations are subject to stochastic uncertainties. The aim is to estimate the parameters from the observations according to some criterion of optimality.

## *3.2.1 Linear Regression Models*

A linear regression model has the following general form [8]:

$$\mathbf{m} = \mathbf{F}\mathbf{p} + \mathbf{c} + \boldsymbol{\varepsilon}, \quad \mathsf{E}\left[\boldsymbol{\varepsilon}\right] = \mathbf{0}, \ \mathsf{Var}\left[\boldsymbol{\varepsilon}\right] = \mathbf{V} = \mathbf{G}^{-1}, \tag{3.14}$$

where *m* is the (*n* × 1) vector of observations, *F* is the known (*n* × *m*) model matrix with *m* ≤ *n*, assumed to be of full rank, *p* is the (*m* × 1) vector of model parameters, *c* is a known constant offset, and *ε* is the (*n* × 1) vector of observation errors with zero expectation and (*n* × *n*) covariance matrix *V*, assumed to be known.

Least-squares (LS) estimation of *p* requires the minimization of the following objective function:

$$S(\mathbf{p}) = (\mathbf{m} - \mathbf{F}\mathbf{p} - \mathbf{c})^{\mathsf{T}} \mathbf{G}\, (\mathbf{m} - \mathbf{F}\mathbf{p} - \mathbf{c}).\tag{3.15}$$

The LS estimator *p̃* and its covariance matrix *C* are given by:

$$\tilde{\mathbf{p}} = (\mathbf{F}^{\mathsf{T}}\mathbf{G}\mathbf{F})^{-1}\mathbf{F}^{\mathsf{T}}\mathbf{G}\,(\mathbf{m}-\mathbf{c}), \quad \mathbf{C} = (\mathbf{F}^{\mathsf{T}}\mathbf{G}\mathbf{F})^{-1}.\tag{3.16}$$

The estimator *p̃* is unbiased and has the smallest covariance matrix among all estimators that are linear functions of the observations. If the distribution of *ε* is a multivariate normal distribution, the estimator is efficient, i.e., it has the smallest possible covariance matrix among all unbiased estimators.

The residuals *r* of the regression are defined by:

$$\mathbf{r} = \mathbf{m} - \mathbf{c} - \mathbf{F}\tilde{\mathbf{p}}, \quad \mathbf{R} = \mathsf{Var}\left[\mathbf{r}\right] = \mathbf{V} - \mathbf{F}\left(\mathbf{F}^{\mathsf{T}}\mathbf{G}\mathbf{F}\right)^{-1}\mathbf{F}^{\mathsf{T}}. \tag{3.17}$$

The standardized residuals *s*, also called the "pulls" in high-energy physics, are given by:

$$s\_{i} = \frac{r\_{i}}{\sqrt{R\_{ii}}}, \quad i = 1, \ldots, n. \tag{3.18}$$

If the model is correctly specified, the pulls have mean 0 and standard deviation 1. The chi-square statistic of the regression is defined as:

$$\chi^2 = \mathbf{r}^{\mathsf{T}} \mathbf{G}\, \mathbf{r}, \quad\text{with}\quad \mathsf{E}\left[\chi^2\right] = n - m. \tag{3.19}$$

If the observation errors are normally distributed, *χ*<sup>2</sup> is *χ*<sup>2</sup>-distributed with *d* = *n* − *m* degrees of freedom; its expectation is *d* and its variance is 2*d*. Its *p*-value *p* is defined by the following probability transform:


$$p = 1 - G\_d(\chi^2) = \int\_{\chi^2}^{\infty} g\_d(x) \, \mathrm{d}x,\tag{3.20}$$

where *G*<sub>k</sub>(*x*) is the cumulative distribution function of the *χ*<sup>2</sup>-distribution with *k* degrees of freedom and *g*<sub>k</sub>(*x*) is its probability density function (PDF). Large values of *χ*<sup>2</sup> correspond to small *p*-values. If the model is correctly specified, *p* is uniformly distributed in the unit interval. A very small *p*-value indicates a misspecification of the model or of the covariance matrix *V*, or both.
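The quantities of Eqs. (3.16)–(3.20) can be sketched for a simple weighted straight-line fit; the model, the true parameters and the noise level below are invented for the illustration.

```python
import numpy as np
from scipy.stats import chi2

# Weighted least-squares fit of a straight line, sketching
# Eqs. (3.15)-(3.20). The observations are simulated with known
# parameters, purely for illustration.
rng = np.random.default_rng(1)
n, m_par = 20, 2
t = np.linspace(0.0, 1.0, n)
F = np.column_stack([np.ones(n), t])     # model matrix, n x m
p_true = np.array([1.0, 2.0])
sigma = 0.1
V = sigma**2 * np.eye(n)                 # covariance of the observations
G = np.linalg.inv(V)                     # weight matrix
c = np.zeros(n)
m_obs = F @ p_true + c + rng.normal(0.0, sigma, n)

# LS estimate and covariance, Eq. (3.16)
C = np.linalg.inv(F.T @ G @ F)
p_hat = C @ F.T @ G @ (m_obs - c)

# residuals, pulls and chi-square, Eqs. (3.17)-(3.19)
r = m_obs - c - F @ p_hat
R = V - F @ C @ F.T
pulls = r / np.sqrt(np.diag(R))
chisq = r @ G @ r                        # expectation n - m
p_value = 1.0 - chi2.cdf(chisq, df=n - m_par)    # Eq. (3.20)
```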

## *3.2.2 Nonlinear Regression Models*

The linear regression model in Eq. (3.14) can be generalized to a nonlinear model:

$$\mathbf{m} = \mathbf{f}\left(\mathbf{p}\right) + \boldsymbol{\varepsilon}, \quad \mathsf{E}\left[\boldsymbol{\varepsilon}\right] = \mathbf{0}, \ \mathsf{Var}\left[\boldsymbol{\varepsilon}\right] = \mathbf{V} = \mathbf{G}^{-1}, \tag{3.21}$$

where *f* is an (*n* × 1) vector of smooth functions of the *m* model parameters. LS estimation of *p* requires the minimization of the following objective function:

$$\mathcal{S}(\boldsymbol{p}) = \left[\boldsymbol{m} - \boldsymbol{f}\left(\boldsymbol{p}\right)\right]^{\mathsf{T}} \boldsymbol{G}\left[\boldsymbol{m} - \boldsymbol{f}\left(\boldsymbol{p}\right)\right].\tag{3.22}$$

The function *S*(*p*) can be minimized with any of the methods discussed in Sect. 3.1. The one used most frequently is probably the Gauss–Newton method, based on the first-order Taylor expansion of *f* and resulting in the following iteration:

$$\tilde{\mathbf{p}}\_{k+1} = \tilde{\mathbf{p}}\_k + (\mathbf{F}\_k^{\mathsf{T}} \mathbf{G} \mathbf{F}\_k)^{-1} \mathbf{F}\_k^{\mathsf{T}} \mathbf{G} \left[ \mathbf{m} - \mathbf{f}(\tilde{\mathbf{p}}\_k) \right], \quad \mathbf{F}\_k = \left.\frac{\partial \mathbf{f}}{\partial \mathbf{p}}\right|\_{\tilde{\mathbf{p}}\_k}. \tag{3.23}$$

At each step, the covariance matrix *C*<sub>k+1</sub> of *p̃*<sub>k+1</sub> is approximately given by:

$$\mathbf{C}\_{k+1} = (\mathbf{F}\_k^\mathsf{T} \mathbf{G} \, F\_k)^{-1}. \tag{3.24}$$

In general, the covariance matrix of the final estimate *p̃* can be approximated by the inverse of the Hessian of *S*(*p*) at *p̃*. The final chi-square statistic *χ*<sup>2</sup> is given by:

$$\chi^{2} = \left[\mathbf{m} - \mathbf{f}\left(\tilde{\mathbf{p}}\right)\right]^{\mathsf{T}} \mathbf{G}\left[\mathbf{m} - \mathbf{f}\left(\tilde{\mathbf{p}}\right)\right].\tag{3.25}$$

In the case of Gaussian observation errors, the chi-square statistic is approximately *χ*2-distributed, and its *p*-value is approximately uniformly distributed. The iteration is stopped when the chi-square statistic does not change significantly any more.
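A sketch of the Gauss–Newton iteration of Eq. (3.23), for an invented exponential model *f*<sub>i</sub>(*p*) = *p*<sub>1</sub> exp(*p*<sub>2</sub> *t*<sub>i</sub>) with simulated observations; model, starting values and noise level are assumptions made for the example.

```python
import numpy as np

# Gauss-Newton iteration of Eq. (3.23) for the nonlinear model
# f_i(p) = p1 * exp(p2 * t_i), with simulated observations.
rng = np.random.default_rng(2)
t = np.linspace(0.0, 1.0, 10)
p_true = np.array([2.0, -1.0])
sigma = 0.01
G = np.eye(t.size) / sigma**2
m_obs = p_true[0] * np.exp(p_true[1] * t) + rng.normal(0.0, sigma, t.size)

f = lambda p: p[0] * np.exp(p[1] * t)
def jacobian(p):                         # F_k = df/dp at the current estimate
    e = np.exp(p[1] * t)
    return np.column_stack([e, p[0] * t * e])

p = np.array([1.0, 0.0])                 # starting values
for _ in range(20):
    Fk = jacobian(p)
    C = np.linalg.inv(Fk.T @ G @ Fk)     # approximate covariance, Eq. (3.24)
    step = C @ Fk.T @ G @ (m_obs - f(p))
    p = p + step
    if np.linalg.norm(step) < 1e-10:
        break

chisq = (m_obs - f(p)) @ G @ (m_obs - f(p))    # Eq. (3.25)
```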

## *3.2.3 State Space Models*

A dynamic or state space model describes the state of an object in space or time, such as a rocket or a charged particle [9]. The state usually changes continuously, but is assumed to be of interest only at discrete instances in the present context. These instances are labeled with indices from 0, the initial state, to *n*, the final state. The state at instance *k* is specified by the state vector *q*<sub>k</sub>. The spatial or temporal evolution of the state is described by the system equation, which is Eq. (3.26) in the linear case and Eq. (3.41) in the general case.

#### **3.2.3.1 Linear State Space Models and the Kalman Filter**

In the simplest case, the state at instant *k* is an affine function of the state at instant *k* − 1 plus a random disturbance called the system or process noise, with known expectation and covariance matrix:

$$\mathbf{q}\_{k} = \mathbf{F}\_{k|k-1}\mathbf{q}\_{k-1} + \mathbf{d}\_{k} + \boldsymbol{\gamma}\_{k}, \quad \mathsf{E}\left[\boldsymbol{\gamma}\_{k}\right] = \mathbf{g}\_{k}, \ \mathsf{Var}\left[\boldsymbol{\gamma}\_{k}\right] = \mathbf{Q}\_{k}.\tag{3.26}$$

The process noise *γ*<sub>k</sub> may affect only a subset of the state vector, in which case its covariance matrix *Q*<sub>k</sub> is not of full rank.

In most cases, only some or even none of the components of the state vector can be observed directly. Instead, the observations are functions of the state plus an observation error. In the simplest case, this function is again affine:

$$\mathbf{m}\_{k} = \mathbf{H}\_{k}\mathbf{q}\_{k} + \mathbf{c}\_{k} + \boldsymbol{\varepsilon}\_{k}, \quad \mathsf{E}\left[\boldsymbol{\varepsilon}\_{k}\right] = \mathbf{0}, \ \mathsf{Var}\left[\boldsymbol{\varepsilon}\_{k}\right] = \mathbf{V}\_{k} = \mathbf{G}\_{k}^{-1}.\tag{3.27}$$

Process noise and observation errors are assumed to be independent.

Given observations *m*<sub>1</sub>, …, *m*<sub>n</sub>, an initial state *q*<sub>0</sub> and an initial state covariance matrix *C*<sub>0</sub>, all states can be estimated recursively by the Kalman filter [10]. Assume there is an estimated state vector *q̃*<sub>k−1</sub> with its covariance matrix *C*<sub>k−1</sub> at instant *k* − 1. The estimated state vector at instant *k* is obtained by a prediction step followed by an update step. After the last update step, the full information contained in all observations can be propagated back to all previous states by the smoother. If both process noise and observation errors are normally distributed, the Kalman filter is the optimal filter; if not, it is still the best linear filter.

**Prediction step** The state vector and its covariance matrix are propagated to the next instance using the system equation Eq. (3.26) and linear error propagation:

$$\tilde{\mathbf{q}}\_{k|k-1} = \mathbf{F}\_{k|k-1} \tilde{\mathbf{q}}\_{k-1} + \mathbf{d}\_k + \mathbf{g}\_k, \quad \mathbf{C}\_{k|k-1} = \mathbf{F}\_{k|k-1} \mathbf{C}\_{k-1} \mathbf{F}\_{k|k-1}^{\mathsf{T}} + \mathbf{Q}\_k. \tag{3.28}$$

**Update step** The updated state vector is the weighted mean of the predicted state vector and the observation *mk*. There are two equivalent ways to compute the update. The first one uses the gain matrix *Kk*:

$$\tilde{\mathbf{q}}\_{k} = \tilde{\mathbf{q}}\_{k|k-1} + \mathbf{K}\_{k} (\mathbf{m}\_{k} - \mathbf{c}\_{k} - \mathbf{H}\_{k} \tilde{\mathbf{q}}\_{k|k-1}), \quad \mathbf{C}\_{k} = (\mathbf{I} - \mathbf{K}\_{k} \mathbf{H}\_{k}) \mathbf{C}\_{k|k-1}, \tag{3.29}$$

$$\mathbf{K}\_k = \mathbf{C}\_{k|k-1} \mathbf{H}\_k^\mathsf{T} (\mathbf{V}\_k + \mathbf{H}\_k \mathbf{C}\_{k|k-1} \mathbf{H}\_k^\mathsf{T})^{-1}. \tag{3.30}$$

The second one is a multivariate weighted mean:

$$\tilde{\boldsymbol{q}}\_{k} = \mathbf{C}\_{k} \left[ \mathbf{C}\_{k|k-1}^{-1} \tilde{\boldsymbol{q}}\_{k|k-1} + \boldsymbol{H}\_{k}^{\mathsf{T}} \boldsymbol{G}\_{k} (\boldsymbol{m}\_{k} - \mathbf{c}\_{k}) \right],\tag{3.31}$$

$$\mathbf{C}\_{k} = (\mathbf{C}\_{k|k-1}^{-1} + \mathbf{H}\_{k}^{\mathsf{T}} \mathbf{G}\_{k} \mathbf{H}\_{k})^{-1}. \tag{3.32}$$

The update step has an associated chi-square statistic *χ*<sup>2</sup><sub>k</sub>, which can be computed from the predicted residual *r*<sub>k|k−1</sub> or from the updated residual *r*<sub>k</sub>:

$$\mathbf{r}\_{k|k-1} = \mathbf{m}\_k - \mathbf{c}\_k - \mathbf{H}\_k \tilde{\mathbf{q}}\_{k|k-1}, \quad \mathbf{R}\_{k|k-1} = \mathsf{Var}\left[\mathbf{r}\_{k|k-1}\right] = \mathbf{V}\_k + \mathbf{H}\_k \mathbf{C}\_{k|k-1} \mathbf{H}\_k^{\mathsf{T}},\tag{3.33}$$

$$\mathbf{r}\_k = \mathbf{m}\_k - \mathbf{c}\_k - \mathbf{H}\_k \tilde{\mathbf{q}}\_k, \quad \mathbf{R}\_k = \mathsf{Var}\left[\mathbf{r}\_k\right] = \mathbf{V}\_k - \mathbf{H}\_k \mathbf{C}\_k \mathbf{H}\_k^{\mathsf{T}},\tag{3.34}$$

$$
\chi\_k^2 = \mathbf{r}\_{k|k-1}^\mathsf{T} \mathbf{R}\_{k|k-1}^{-1} \mathbf{r}\_{k|k-1} = \mathbf{r}\_k^\mathsf{T} \mathbf{R}\_k^{-1} \mathbf{r}\_k. \tag{3.35}
$$

If the model is correctly specified and both process noise and observation errors are normally distributed, *χ*<sup>2</sup><sub>k</sub> is *χ*<sup>2</sup>-distributed with a number of degrees of freedom equal to the dimension of *m*<sub>k</sub>. The total chi-square *χ*<sup>2</sup><sub>tot</sub> of the filter is obtained by summing *χ*<sup>2</sup><sub>k</sub> over all *k*.
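A minimal numerical sketch of the filter recursion, for a scalar random-walk state observed directly (*F*<sub>k</sub> = *H*<sub>k</sub> = 1, *d*<sub>k</sub> = *g*<sub>k</sub> = 0); all variances and the length of the sequence are invented for the illustration.

```python
import numpy as np

# Minimal linear Kalman filter, Eqs. (3.28)-(3.30) and (3.35),
# for a scalar random-walk state measured directly.
rng = np.random.default_rng(3)
n = 50
Fk, Hk = 1.0, 1.0
Q, Vk = 0.01, 0.25                       # process noise and measurement variance

# simulate states and measurements
q_true = np.cumsum(np.sqrt(Q) * rng.normal(size=n))
m = q_true + np.sqrt(Vk) * rng.normal(size=n)

q, C = 0.0, 10.0                         # initial state and covariance
chi2_tot = 0.0
for k in range(n):
    # prediction step, Eq. (3.28) (d_k = g_k = 0 here)
    q_pred = Fk * q
    C_pred = Fk * C * Fk + Q
    # update step with the gain matrix, Eqs. (3.29)-(3.30)
    K = C_pred * Hk / (Vk + Hk * C_pred * Hk)
    r_pred = m[k] - Hk * q_pred          # predicted residual, Eq. (3.33)
    q = q_pred + K * r_pred
    C = (1.0 - K * Hk) * C_pred
    chi2_tot += r_pred**2 / (Vk + Hk * C_pred * Hk)   # Eq. (3.35)
```

After a few steps the filtered variance *C* settles well below the single-measurement variance *V*<sub>k</sub>, reflecting the accumulation of information.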

**Smoothing** The smoother propagates the full information contained in the last estimate *q̃*<sub>n</sub> back to all previous states. There are again two equivalent formulations. The first one uses the gain matrix *A*<sub>k</sub> of the smoother:

$$
\tilde{\mathbf{q}}\_{k\mid n} = \tilde{\mathbf{q}}\_k + A\_k (\tilde{\mathbf{q}}\_{k+1\mid n} - \tilde{\mathbf{q}}\_{k+1\mid k}), \quad A\_k = \mathbf{C}\_k \boldsymbol{F}\_{k+1}^{\mathsf{T}} \mathbf{C}\_{k+1\mid k}^{-1}, \tag{3.36}
$$

$$\mathbf{C}\_{k\mid n} = \mathbf{C}\_k - A\_k(\mathbf{C}\_{k+1\mid k} - \mathbf{C}\_{k+1\mid n})A\_k^\top. \tag{3.37}$$

This formulation is numerically unstable, as the difference of the two positive definite matrices in Eq. (3.37) can fail to be positive definite because of rounding errors. The second, numerically stable formulation realizes the smoother by running two filters, one forward and one backward, on the same set of observations. The smoothed state is a weighted mean of the states of the two filters:

$$
\tilde{\boldsymbol{q}}\_{k|n} = \mathbf{C}\_{k|n} \left[ \mathbf{C}\_k^{-1} \tilde{\boldsymbol{q}}\_k + \left( \mathbf{C}\_{k|k+1}^{\mathbf{b}} \right)^{-1} \tilde{\boldsymbol{q}}\_{k|k+1}^{\mathbf{b}} \right], \quad \mathbf{C}\_{k|n}^{-1} = \mathbf{C}\_k^{-1} + \left( \mathbf{C}\_{k|k+1}^{\mathbf{b}} \right)^{-1}, \tag{3.38}
$$

where *q̃*<sup>b</sup><sub>k|k+1</sub> is the predicted state from the backward filter and *C*<sup>b</sup><sub>k|k+1</sub> its covariance matrix. Alternatively, the predicted state from the forward filter and the updated state from the backward filter can be combined. The smoother step has an associated chi-square statistic *χ*<sup>2</sup><sub>k|n</sub>, which can be computed from the smoothed residuals:

$$r\_{k|n} = \mathbf{m}\_k - \mathbf{c}\_k - \mathbf{H}\_k \tilde{\mathbf{q}}\_{k|n}, \ \mathbf{R}\_{k|n} = \mathsf{Var}\left[r\_{k|n}\right] = \mathbf{V}\_k - \mathbf{H}\_k \mathbf{C}\_{k|n} \mathbf{H}\_k^\top,\qquad(3.39)$$

$$\chi^2\_{k\mid n} = \mathbf{r}\_{k\mid n}^\top \mathbf{R}\_{k\mid n}^{-1} r\_{k\mid n}. \tag{3.40}$$

If the model is correctly specified and both process noise and observation errors are normally distributed, *χ*<sup>2</sup><sub>k|n</sub> is *χ*<sup>2</sup>-distributed with a number of degrees of freedom equal to the dimension of *m*<sub>k</sub>. As the smoothed state vectors are estimated from the same information, they are correlated. The prescription for computing their joint covariance matrix is given in [11].

The Kalman filter can also be implemented as an information filter and as a square-root filter [10].

#### **3.2.3.2 Nonlinear State Space Models and the Extended Kalman Filter**

In the applications of the Kalman filter to track fitting, see Chap. 6, the system equation is usually nonlinear; see Sect. 4.3 and Fig. 4.4. It has the following form:

$$\mathbf{q}\_k = \mathbf{f}\_{k|k-1}(\mathbf{q}\_{k-1}) + \boldsymbol{\gamma}\_k, \quad \mathsf{E}\left[\boldsymbol{\gamma}\_k\right] = \mathbf{g}\_k, \ \mathsf{Var}\left[\boldsymbol{\gamma}\_k\right] = \mathbf{Q}\_k. \tag{3.41}$$

In the prediction step, the exact linear error propagation is replaced by an approximate linearized error propagation; see also Sect. 4.4 and Fig. 4.5.

$$\tilde{\mathbf{q}}\_{k|k-1} = \mathbf{f}\_{k|k-1}(\tilde{\mathbf{q}}\_{k-1}) + \mathbf{g}\_k, \quad \mathbf{F}\_{k|k-1} = \left.\frac{\partial \mathbf{f}\_{k|k-1}}{\partial \mathbf{q}\_{k-1}}\right|\_{\tilde{\mathbf{q}}\_{k-1}}, \quad \mathbf{C}\_{k|k-1} = \mathbf{F}\_{k|k-1} \mathbf{C}\_{k-1} \mathbf{F}\_{k|k-1}^{\mathsf{T}} + \mathbf{Q}\_k. \tag{3.42}$$

More rarely, the measurement equation can also be nonlinear:

$$\mathbf{m}\_{k} = \mathbf{h}\_{k}(\mathbf{q}\_{k}) + \boldsymbol{\varepsilon}\_{k}, \quad \mathsf{E}\left[\boldsymbol{\varepsilon}\_{k}\right] = \mathbf{0}, \ \mathsf{Var}\left[\boldsymbol{\varepsilon}\_{k}\right] = \mathbf{V}\_{k} = \mathbf{G}\_{k}^{-1}. \tag{3.43}$$

The first-order Taylor expansion of *h*<sub>k</sub> at *q̃*<sub>k|k−1</sub>, with the Jacobian *H*<sub>k</sub> = ∂*h*<sub>k</sub>/∂*q*<sub>k</sub> evaluated at *q̃*<sub>k|k−1</sub>, gives:

$$\mathbf{h}\_k(\mathbf{q}\_k) \approx \mathbf{H}\_k \mathbf{q}\_k + \mathbf{c}\_k, \quad \mathbf{c}\_k = \mathbf{h}\_k(\tilde{\mathbf{q}}\_{k|k-1}) - \mathbf{H}\_k \tilde{\mathbf{q}}\_{k|k-1}. \tag{3.44}$$

Using this *H*<sub>k</sub> and *c*<sub>k</sub> in Eqs. (3.29) and (3.30) or Eqs. (3.31) and (3.32) gives the update equations for the nonlinear measurement equation. If required, the update can be iterated by re-expanding *h*<sub>k</sub> at *q̃*<sub>k</sub> and recomputing *H*<sub>k</sub>, *c*<sub>k</sub> and *K*<sub>k</sub>. The smoother can be implemented via its gain matrix, Eqs. (3.36) and (3.37), or by the backward filter, Eq. (3.38).

## **3.3 Clustering**

Clustering is the classification of objects into groups, such that similar objects end up in the same group. The similarity of objects can be summarized in a distance/similarity matrix containing the pair-wise distances/similarities of the objects to be clustered. A cluster is then a group with small distances/large similarities inside the group and large distances/small similarities to objects outside the group. After clustering, the resulting cluster structure should be validated by measuring the internal consistency of each cluster. For further information and examples, the reader is directed to the literature [12–14].

## *3.3.1 Hierarchical Clustering*

Hierarchical clustering groups the objects with a sequence of nested partitions [12, Chapter 3]. If the sequence starts from single-object clusters and merges them to larger clusters, the clustering is called agglomerative; if the sequence starts from a single cluster containing all objects and splits it into successively smaller clusters, the clustering is called divisive. At any stage of the clustering, all clusters are pairwise disjoint. The number of clusters is not necessarily specified in advance, but can be determined "on the fly". If in divisive clustering a cluster is considered to be valid, further splitting is not required, but also not forbidden. If in agglomerative clustering the merging of two clusters results in an invalid cluster, the merger is undone.
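As a sketch, agglomerative clustering of two well-separated point clouds can be carried out with scipy.cluster.hierarchy; the single-linkage criterion and the invented data are assumptions made for this example.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Agglomerative clustering of two well-separated point clouds with
# single-linkage merging; the dendrogram Z encodes the nested partitions.
rng = np.random.default_rng(4)
pts = np.vstack([rng.normal(0.0, 0.1, size=(20, 2)),
                 rng.normal(5.0, 0.1, size=(20, 2))])

Z = linkage(pts, method='single')        # sequence of nested merges
labels = fcluster(Z, t=2, criterion='maxclust')   # cut into two clusters
```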

## *3.3.2 Partitional Clustering*

Partitional clustering directly divides the objects into a predefined number *K* of clusters, usually by optimizing an objective function that describes the global quality of the cluster structure [12, Chapter 4]. Partitional clustering can be repeated for several values of *K* in order to find the optimal cluster structure. In the fuzzy variant of partitional clustering, an object can belong to more than one cluster with a certain degree of membership [15].
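A widely used partitional method is *K*-means, which minimizes the within-cluster sum of squared distances by alternating assignment and update steps (Lloyd's algorithm). A minimal sketch with invented data:

```python
import numpy as np

# Lloyd's algorithm for K-means: alternate between assigning each point
# to its nearest center and recomputing the centers as cluster means.
def kmeans(X, K, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iter):
        # assignment step: nearest center
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # update step: recompute the centers
        new_centers = np.array([X[labels == k].mean(axis=0) for k in range(K)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0.0, 0.2, size=(30, 2)),
               rng.normal(4.0, 0.2, size=(30, 2))])
labels, centers = kmeans(X, K=2)
```

Repeating the run for several values of *K* and comparing the resulting objective values is one simple way to search for the optimal cluster structure.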

# *3.3.3 Model-Based Clustering*

In model-based clustering, the objects are assumed to be drawn from a mixture distribution with two or more components. Each component is described by a PDF and has an associated probability or "weight" in the mixture [16]. The parameters of the mixture are usually estimated by the Expectation-Maximization (EM) algorithm [17–19]; see Sect. 7.2.2. A by-product of the EM algorithm is the posterior probability *πik* of object *i* belonging to cluster *k*, for all objects and all clusters. The result is again a fuzzy clustering.
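A minimal sketch of the EM algorithm for a two-component univariate normal mixture; the data, starting values and number of iterations are invented for the illustration. The posterior probabilities computed in the E-step provide the fuzzy clustering mentioned above.

```python
import numpy as np
from scipy.stats import norm

# EM for a two-component normal mixture: the E-step computes the
# posterior probability of each component for each observation, the
# M-step re-estimates weights, means and standard deviations.
rng = np.random.default_rng(6)
x = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(3.0, 1.0, 100)])

w = np.array([0.5, 0.5])                 # mixture weights
mu = np.array([-1.0, 1.0])               # initial means
s = np.array([1.0, 1.0])                 # initial standard deviations
for _ in range(100):
    # E-step: responsibilities pi_ik
    dens = w * norm.pdf(x[:, None], mu, s)
    post = dens / dens.sum(axis=1, keepdims=True)
    # M-step: weighted parameter updates
    Nk = post.sum(axis=0)
    w = Nk / len(x)
    mu = (post * x[:, None]).sum(axis=0) / Nk
    s = np.sqrt((post * (x[:, None] - mu)**2).sum(axis=0) / Nk)
```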

# **References**



# **Part II Track Reconstruction**

# **Chapter 4 Track Models**

**Abstract** The chapter shows how the equations of motion for charged particles in a homogeneous or inhomogeneous magnetic field are solved. Various types of parametrizations are presented, and formulas for track propagation and error propagation are derived. As the effects of the detector material on the trajectory have to be taken into account, the statistical properties of multiple Coulomb scattering, energy loss by ionization, and energy loss by bremsstrahlung are discussed; then it is shown how the effects can be treated in the track reconstruction. As multiple scattering in thin layers and energy loss by bremsstrahlung have distinctive non-Gaussian features, an approximation by normal mixtures is presented.

# **4.1 The Equations of Motion**

Consider a charged particle with mass *m* and charge *Q* = *q e*, where *e* is the elementary charge and *q* is an integer, usually *q* = ±1. Its trajectory or position *r*(*t*) = (*x*(*t*), *y*(*t*), *z*(*t*))<sup>T</sup> in a magnetic field *B*(*r*), as a function of time, is determined by the equations of motion given by the Lorentz force *F* ∝ *q* *v* × *B*, where *v* = d*r*/d*t* is the velocity of the particle. In vacuum, Newton's second law reads [1]:

$$\frac{d\mathbf{p}}{dt} = kq\,\mathbf{v}(t) \times \mathbf{B}(\mathbf{r}(t)),\tag{4.1}$$

where *p* = *γ m* *v* is the momentum vector of the particle, *γ* = (1 − *v*<sup>2</sup>/*c*<sup>2</sup>)<sup>−1/2</sup> is the Lorentz factor, and *k* is a unit-dependent proportionality factor. If the units are GeV/*c* for *p*, meter for *r*, and Tesla for *B*, then *k* = 0.29979 GeV/*c* T<sup>−1</sup> m<sup>−1</sup>. The trajectory is uniquely defined by the initial conditions, i.e., the six degrees of freedom specified for instance by the initial position and the initial momentum. If these are tied to a reference surface, five degrees of freedom are necessary and sufficient. Geometrical quantities other than position and velocity can also be used


to specify the initial conditions. The collection *q* = (*q*<sub>1</sub>, …, *q*<sub>m</sub>) of these quantities is called the initial track parameter vector or the initial state vector.

Equation (4.1) can be written in terms of the path length *s(t)* along the trajectory instead of *t*, giving [1]:

$$\frac{\mathrm{d}^2 \mathbf{r}}{\mathrm{d}s^2} = k \,\psi\, \frac{\mathrm{d}\mathbf{r}}{\mathrm{d}s} \times \mathbf{B}(\mathbf{r}(s)) = \boldsymbol{\Gamma}\left(s, \mathbf{r}(s), \dot{\mathbf{r}}(s)\right), \tag{4.2}$$

where *ṙ*(*s*) ≡ d*r*/d*s*, *ψ* = *q*/*p* and *p* = |*p*|. In simple cases, this equation has closed-form solutions. In a homogeneous magnetic field, the solution is a helix; it reduces to a straight line in the limit *B* ≡ **0**. In the general case of an inhomogeneous magnetic field, one has to resort to numerical methods such as Runge–Kutta integration of the equations of motion (Sect. 4.3.2.1), parametrization by polynomials or splines [1], or other approximations [2]; see Sect. 4.3.2.2.
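As an illustration of the numerical approach, Eq. (4.2) can be integrated with a fourth-order Runge–Kutta scheme in a homogeneous field, where the exact solution is a circle in the bending plane against which the result can be checked; the field, the value *ψ* = 1 and the step size are illustrative assumptions.

```python
import numpy as np

# Fourth-order Runge-Kutta integration of Eq. (4.2) in a homogeneous
# field B = (0, 0, 1 T), for psi = q/p = 1 (in units of c/GeV).
k = 0.29979                              # GeV/c per T and m
psi = 1.0
B = np.array([0.0, 0.0, 1.0])

def gamma_rhs(s, r, rdot):               # right-hand side of Eq. (4.2)
    return k * psi * np.cross(rdot, B)

def rk4_step(s, r, rdot, h):
    k1r, k1v = rdot, gamma_rhs(s, r, rdot)
    k2r, k2v = rdot + 0.5*h*k1v, gamma_rhs(s + 0.5*h, r + 0.5*h*k1r, rdot + 0.5*h*k1v)
    k3r, k3v = rdot + 0.5*h*k2v, gamma_rhs(s + 0.5*h, r + 0.5*h*k2r, rdot + 0.5*h*k2v)
    k4r, k4v = rdot + h*k3v, gamma_rhs(s + h, r + h*k3r, rdot + h*k3v)
    r_new = r + h/6.0 * (k1r + 2*k2r + 2*k3r + k4r)
    v_new = rdot + h/6.0 * (k1v + 2*k2v + 2*k3v + k4v)
    return r_new, v_new

r = np.array([0.0, 0.0, 0.0])
rdot = np.array([1.0, 0.0, 0.0])         # unit tangent: motion in the xy plane
h, s = 0.01, 0.0
for _ in range(1000):                    # integrate over 10 m of path length
    r, rdot = rk4_step(s, r, rdot, h)
    s += h
```

With the initial tangent in the *xy* plane, the trajectory stays on a circle of radius 1/(*k ψ*|*B*|) around (0, −1/(*k ψ*|*B*|), 0), and the tangent keeps unit norm.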

Equation (4.2) can be expressed in terms of other independent variables. For example, if the equations of motion are integrated in a cylindrical detector geometry, the radius *R* is a natural integration variable. In a planar detector geometry, the position coordinate *z* could be the variable of choice [2, 3].

## **4.2 Track Parametrization**

Different detector geometries often lead to different choices of the track parameters. However, the parametrization of the trajectory should comply with some basic requirements: the parameters should be continuous with respect to small changes of the trajectory; the choice of track parameters should facilitate the local expansion of the track model into a linear function; and the stochastic uncertainties of the estimated parameters should follow a Gaussian distribution as closely as possible. In order to fulfill, for instance, the continuity requirement, curvature should be used rather than radius of curvature, and inverse (transverse) momentum rather than (transverse) momentum.

In a barrel-type detector system typical for the central part of collider experiments, a natural reference surface of the track parameters is a cylinder with radius *R*, centered around the global *z*-axis, which usually coincides with the beam line. The track parameters are, in this case, defined at the point of intersection P between the track and the reference cylinder. In such a system, one possible choice of track parametrization is the following:

$$q_1 = q/p_{\mathrm{T}},\ q_2 = \phi,\ q_3 = \tan\lambda,\ q_4 = R\Phi,\ q_5 = z,\tag{4.3}$$

where $p_{\mathrm{T}} = p\cos\lambda$ is the transverse momentum, $\phi$ is the azimuth angle of the tangent of the track at P, $\lambda$ is the dip angle (complement of the polar angle) of the tangent at P, and $R\Phi$ and $z$ are the cylindrical coordinates of P in the global coordinate system, see Fig. 4.1.
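The parameters of Eq. (4.3) can be computed directly from a global position and momentum at the intersection point P. A minimal Python sketch, assuming the given position already lies on the reference cylinder and the charge is in units of the elementary charge (function name and argument conventions are ours):

```python
import math

def barrel_parameters(pos, mom, charge, R):
    """Track parameters (q/p_T, phi, tan(lambda), R*Phi, z) of Eq. (4.3)."""
    x, y, z = pos
    px, py, pz = mom
    pT = math.hypot(px, py)          # transverse momentum p_T = p*cos(lambda)
    phi = math.atan2(py, px)         # azimuth of the track tangent at P
    tan_lambda = pz / pT             # tangent of the dip angle
    Phi = math.atan2(y, x)           # azimuth of the intersection point P
    return (charge / pT, phi, tan_lambda, R * Phi, z)
```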

In a detector system based on planar detector elements, the natural reference surface is a plane. Such a surface is uniquely determined by a normal vector of the plane and the position of a reference point inside the plane. A local coordinate system is defined such that the *u*-axis is parallel to the normal vector and the *v*- and *w*-axes are inside the plane. A natural choice of track parameters is now

$$q\_1 = \psi, \ q\_2 = \mathrm{d}v/\mathrm{d}u, \ q\_3 = \mathrm{d}w/\mathrm{d}u, \ q\_4 = v, \ q\_5 = w,\tag{4.4}$$

where *ψ* = *q/p,* d*v/*d*u* is the tangent of the angle between the projection of the track tangent into the *(u, v)*-plane and the *u*-axis, d*w/*d*u* is the tangent of the angle between the projection of the track tangent into the *(u, w)*-plane and the *u*-axis, and *v* and *w* are the local coordinates of the intersection point of the track with the plane; see Fig. 4.2. The quantities d*v/*d*u* and d*w/*d*u* are also called direction tangents, as the tangent vector to the track is proportional to the vector *(*d*v/*d*u,* d*w/*d*u,* 1*)*T.
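For illustration, the parameters of Eq. (4.4) can be obtained by projecting the momentum and the position onto the plane's basis vectors. A Python sketch under the conventions stated below in Sect. 4.4.1.2 (unit normal $i$, in-plane unit vectors $j$ and $k$; all names are ours):

```python
import math

def local_plane_parameters(pos, mom, charge, origin, i, j, k):
    """Track parameters (psi, dv/du, dw/du, v, w) of Eq. (4.4) in a plane
    with unit normal i and orthonormal in-plane vectors j, k."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    d = [p - o for p, o in zip(pos, origin)]   # offset from the reference point
    p = math.sqrt(dot(mom, mom))
    pu, pv, pw = dot(mom, i), dot(mom, j), dot(mom, k)
    return (charge / p, pv / pu, pw / pu, dot(d, j), dot(d, k))
```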

Another planar reference system is the curvilinear frame, which is useful for transporting uncertainties of track parameters; see Sect. 4.4. It is a hybrid local/global reference frame. The curvilinear plane is always orthogonal to the direction of the track with the parametrization:

$$q\_1 = \psi,\ q\_2 = \phi,\ q\_3 = \lambda,\ q\_4 = \mathbf{x}\_{\perp},\ q\_5 = \mathbf{y}\_{\perp},\tag{4.5}$$

where *x*<sup>⊥</sup> and *y*<sup>⊥</sup> are orthogonal position coordinates inside the plane, see Fig. 4.3. The *x*⊥-axis is parallel to the global *(x, y)*-plane. The azimuth and dip angles *φ* and *λ* are defined at the point of intersection of the track with the curvilinear plane, but their values are measured in the global Cartesian coordinate system. The tangent vector is thus proportional to *(*cos *λ* cos *φ,* cos *λ* sin *φ,*sin *λ)*T.

In addition to the above-mentioned, surface-based frames, a global, Cartesian coordinate frame is also frequently used. In this frame, the track parameters are the position vector *r* and the momentum vector *p* at that point. Since these are not tied to a surface, six parameters are needed in order to uniquely specify the state of the track. The rank of their covariance matrix is at most five.

## **4.3 Track Propagation**

The track model, given by the solution of the equations of motion, describes the functional dependence of the state vector $\boldsymbol{q}_j$ at a surface $j$ on the state vector $\boldsymbol{q}_i$ at a different surface $i$:

$$\boldsymbol{q}_j = \boldsymbol{f}_{j|i}\left(\boldsymbol{q}_i\right). \tag{4.6}$$

**Fig. 4.4** Track propagator from surface *i* to surface *j*

The function $\boldsymbol{f}_{j|i}$ is called the track propagator from surface $i$ to surface $j$, see Fig. 4.4. When closed-form solutions of the equations of motion exist, e.g., in the two situations of vanishing magnetic field and homogeneous magnetic field, the track propagator can be written as an explicit function of the path length. In the straight-line solution of the equations of motion in a vanishing magnetic field, it is easy to also derive an analytical formula for the path length between two surfaces. For the helical solution in a homogeneous magnetic field, however, such an analytical formula exists only for propagation to cylinders with symmetry axis parallel to the field direction or to planes orthogonal to the field direction. Otherwise, a Newton iteration or a parabolic approximation has to be used to find the path length.

## *4.3.1 Homogeneous Magnetic Fields*

The helical track propagator takes the solution to Eq. (4.2) as a starting point. The solution [4] can be written as:

$$\boldsymbol{r}(s) = \boldsymbol{r}_0 + \frac{\delta}{K}\left(\theta - \sin\theta\right)\cdot\boldsymbol{h} + \frac{\sin\theta}{K}\cdot\boldsymbol{t}_0 + \frac{\alpha}{K}\left(1 - \cos\theta\right)\cdot\boldsymbol{n}_0,\tag{4.7}$$

where $\boldsymbol{r}(s)$ is the position vector of the point on the helix at path length $s$ from the reference point $\boldsymbol{r}_0$ (at $s = 0$), $\boldsymbol{h} = \boldsymbol{B}/|\boldsymbol{B}|$ is the normalized magnetic field vector, $\boldsymbol{t} = \boldsymbol{p}/p$ is the unit tangent vector to the track, $\boldsymbol{n} = (\boldsymbol{h}\times\boldsymbol{t})/\alpha$ with $\alpha = |\boldsymbol{h}\times\boldsymbol{t}|$, $\delta = \boldsymbol{h}\cdot\boldsymbol{t}$, $K = -k\psi|\boldsymbol{B}|$, and $\theta = Ks$. In the following, the subscript "0" indicates quantities defined at the initial point $s = 0$. Any point along the trajectory can be specified by a corresponding value of $s$. The equation of the unit tangent vector $\boldsymbol{t}$ is found by differentiating Eq. (4.7) with respect to $s$,


$$\boldsymbol{t}(s) = \frac{\mathrm{d}\boldsymbol{r}(s)}{\mathrm{d}s} = \delta\left(1 - \cos\theta\right)\cdot\boldsymbol{h} + \cos\theta\cdot\boldsymbol{t}_0 + \alpha\sin\theta\cdot\boldsymbol{n}_0. \tag{4.8}$$

For a given value of *s*, any desired set of track parameters can be calculated from Eq. (4.7) (positions) and Eq. (4.8) (directions). In the helical track model, the momentum *p* is constant and therefore has the same value for all *s*.
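A direct transcription of Eqs. (4.7) and (4.8) might look as follows (a Python sketch; it assumes the track is not parallel to the field, so that $\alpha \neq 0$, and all names are ours):

```python
import math

def propagate_helix(r0, t0, psi, B, s, k=0.29979):
    """Position and unit tangent after path length s on the helix of
    Eqs. (4.7) and (4.8), in a homogeneous field B."""
    Bmag = math.sqrt(sum(b * b for b in B))
    h = [b / Bmag for b in B]                       # h = B/|B|
    hxt = [h[1]*t0[2] - h[2]*t0[1],
           h[2]*t0[0] - h[0]*t0[2],
           h[0]*t0[1] - h[1]*t0[0]]                 # h x t0
    alpha = math.sqrt(sum(c * c for c in hxt))      # must be nonzero here
    n0 = [c / alpha for c in hxt]                   # n = (h x t)/alpha
    delta = sum(hi * ti for hi, ti in zip(h, t0))   # delta = h . t
    K = -k * psi * Bmag
    theta = K * s
    r = [r0[i] + (delta / K) * (theta - math.sin(theta)) * h[i]
              + (math.sin(theta) / K) * t0[i]
              + (alpha / K) * (1 - math.cos(theta)) * n0[i] for i in range(3)]
    t = [delta * (1 - math.cos(theta)) * h[i] + math.cos(theta) * t0[i]
              + alpha * math.sin(theta) * n0[i] for i in range(3)]
    return r, t
```

For a field along the $z$-axis and an initial tangent in the transverse plane, the resulting motion stays in that plane and the tangent keeps unit length, as expected for a helix degenerating to a circle.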

## *4.3.2 Inhomogeneous Magnetic Fields*

In an inhomogeneous magnetic field, the equations of motion have no exact closed-form solutions, and one has to resort to numerical, approximate solutions.

#### **4.3.2.1 Runge–Kutta Methods**

Runge–Kutta methods are iterative algorithms for the approximate numerical solutions of ordinary differential equations, given initial values. Among Runge– Kutta methods, the Runge–Kutta–Nyström algorithm is specifically designed for second-order equations such as Eq. (4.2). In the fourth-order version a step of length *h*, starting at *s* = *sn*, is computed by [1]:

$$\begin{aligned} \boldsymbol{r}_{n+1} &= \boldsymbol{r}_n + h\dot{\boldsymbol{r}}_n + h^2(\boldsymbol{k}_1 + \boldsymbol{k}_2 + \boldsymbol{k}_3)/6, \\ \dot{\boldsymbol{r}}_{n+1} &= \dot{\boldsymbol{r}}_n + h(\boldsymbol{k}_1 + 2\boldsymbol{k}_2 + 2\boldsymbol{k}_3 + \boldsymbol{k}_4)/6, \end{aligned} \tag{4.9}$$

with the intermediate stages *k* defined by

$$\begin{aligned} \boldsymbol{k}_1 &= \boldsymbol{\Gamma}(s_n, \boldsymbol{r}_n, \dot{\boldsymbol{r}}_n), \\ \boldsymbol{k}_2 &= \boldsymbol{\Gamma}(s_n + h/2,\ \boldsymbol{r}_n + h\dot{\boldsymbol{r}}_n/2 + h^2\boldsymbol{k}_1/8,\ \dot{\boldsymbol{r}}_n + h\boldsymbol{k}_1/2), \\ \boldsymbol{k}_3 &= \boldsymbol{\Gamma}(s_n + h/2,\ \boldsymbol{r}_n + h\dot{\boldsymbol{r}}_n/2 + h^2\boldsymbol{k}_1/8,\ \dot{\boldsymbol{r}}_n + h\boldsymbol{k}_2/2), \\ \boldsymbol{k}_4 &= \boldsymbol{\Gamma}(s_n + h,\ \boldsymbol{r}_n + h\dot{\boldsymbol{r}}_n + h^2\boldsymbol{k}_3/2,\ \dot{\boldsymbol{r}}_n + h\boldsymbol{k}_3), \end{aligned} \tag{4.10}$$

where $\boldsymbol{r}_n$ is the position of the particle at $s = s_n$, $\dot{\boldsymbol{r}}_n$ is the unit tangent vector, and $\boldsymbol{\Gamma}$ is defined in Eq. (4.2). The magnetic field needs to be looked up three times per step, at the positions $\boldsymbol{r}_n$, $\boldsymbol{r}_n + h\dot{\boldsymbol{r}}_n/2 + h^2\boldsymbol{k}_1/8$, and $\boldsymbol{r}_n + h\dot{\boldsymbol{r}}_n + h^2\boldsymbol{k}_3/2$. If the field at the final position $\boldsymbol{r}_{n+1}$, which is the starting position of the next step, is approximated by the field used for $\boldsymbol{k}_4$, only two lookups are required per step.
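One step of Eqs. (4.9)–(4.10) can be sketched as follows in plain Python; `gamma` stands for the right-hand side $\boldsymbol{\Gamma}$ of Eq. (4.2), and all names are ours:

```python
def rkn_step(s, r, rdot, h, gamma):
    """One fourth-order Runge-Kutta-Nystroem step, Eqs. (4.9)-(4.10).
    gamma(s, r, rdot) must return the right-hand side of Eq. (4.2)."""
    def axpy(a, x, y):  # componentwise y + a*x
        return [yi + a * xi for xi, yi in zip(x, y)]
    k1 = gamma(s, r, rdot)
    r_mid = axpy(h * h / 8, k1, axpy(h / 2, rdot, r))
    k2 = gamma(s + h / 2, r_mid, axpy(h / 2, k1, rdot))
    k3 = gamma(s + h / 2, r_mid, axpy(h / 2, k2, rdot))  # same position as k2
    r_end = axpy(h * h / 2, k3, axpy(h, rdot, r))
    k4 = gamma(s + h, r_end, axpy(h, k3, rdot))
    r_new = [ri + h * di + h * h * (a + b + c) / 6
             for ri, di, a, b, c in zip(r, rdot, k1, k2, k3)]
    rdot_new = [di + h * (a + 2 * b + 2 * c + d) / 6
                for di, a, b, c, d in zip(rdot, k1, k2, k3, k4)]
    return r_new, rdot_new, (k1, k2, k3, k4)
```

Note that the intermediate position for $\boldsymbol{k}_3$ coincides with that for $\boldsymbol{k}_2$, which is exactly why only three field lookups per step are needed.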

If integration variables other than the path length *s* are used, for instance the radius *R* or the position coordinate *z*, the integration equations Eq. (4.9) are very similar. The difference is that the step length *h* must be expressed in terms of *R* or *z*, and that the function *Γ* has a different form when expressed in other variables [3].


If the field is (almost) homogeneous, as for example in a solenoid, the step size $h$ can be chosen to be constant; otherwise a variable step size, taking into account the local inhomogeneity of the field, is more efficient. Determining the variable step size along the propagation is done by a step-size selection algorithm. The essence of such an algorithm is to assess the local error of the Runge–Kutta step and compare it to a user-defined error tolerance $\tau$. The so-called embedded Runge–Kutta pairs, originally invented by Fehlberg [5], provide a measure of the local error in an elegant way by producing solutions of different orders during the same step with little extra computational cost. The higher-order solution is denoted $\boldsymbol{r}_{n+1}$, whereas the lower-order solution is denoted $\hat{\boldsymbol{r}}_{n+1}$. The difference

$$\epsilon = |\boldsymbol{r}_{n+1} - \hat{\boldsymbol{r}}_{n+1}| \tag{4.11}$$

between these solutions constitutes a measure of the error of the step. A popular algorithm for the step size $h_{n+1}$ of step $n+1$ is [6]

$$h\_{n+1} = h\_n \left(\frac{\tau}{\epsilon}\right)^{1/(q+1)},\tag{4.12}$$

where $h_n$ is the size of step $n$ and $q$ is the order of the lower-order solution. This algorithm will effectively shorten the step size if the local error is larger than the tolerance and lengthen it if the local error is smaller than the tolerance, forcing the local error to oscillate around the tolerance.

The original version of the Runge–Kutta–Nyström algorithm is not of the embedded type and therefore contains no direct recipe for estimating the local error. It was realized by the authors of [7], however, that a fourth-order derivative of $\boldsymbol{r}$ can be formed by a combination of the various stages calculated along the step. This derivative is implicitly the difference between a fourth-order solution and a third-order solution and can therefore be used as a measure of the local error:

$$\epsilon = h^2 \cdot |\boldsymbol{k}_1 - \boldsymbol{k}_2 - \boldsymbol{k}_3 + \boldsymbol{k}_4|. \tag{4.13}$$

This error measure can be used for step-size selection according to Eq. (4.12) and constitutes an adaptive version of the Runge–Kutta–Nyström algorithm when used alongside the integration steps in Eq. (4.9).
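The error estimate of Eq. (4.13) and the controller of Eq. (4.12) are easily transcribed. The sketch below uses the maximum norm for the stage combination in Eq. (4.13), which is one possible choice; function names are ours:

```python
def rkn_error(h, k1, k2, k3, k4):
    """Local error estimate of Eq. (4.13), using the max norm."""
    return h * h * max(abs(a - b - c + d) for a, b, c, d in zip(k1, k2, k3, k4))

def next_step_size(h_n, eps, tau, q=3):
    """Step-size update of Eq. (4.12); q is the order of the
    lower-order solution (3 for the adaptive RKN scheme)."""
    return h_n * (tau / eps) ** (1.0 / (q + 1))
```

With $q = 3$, an error sixteen times above tolerance halves the step, since $(1/16)^{1/4} = 1/2$.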

#### **4.3.2.2 Approximate Analytical Formula**

An approximate analytical formula for track extrapolation in an inhomogeneous field is described in [2]. The magnetic field $\boldsymbol{B}(z)$ is assumed to depend only on the $z$-coordinate. The particle is assumed to move along the $z$-axis, and the track parameters are $x, y, t_x, t_y, \psi$, where $t_x, t_y$ are the direction tangents. In this parametrization, the equations of motion read:


$$\begin{aligned} x' &= t_x, \\ y' &= t_y, \\ t_x' &= h\cdot[t_x t_y B_1 - (1 + t_x^2)B_2 + t_y B_3] = \boldsymbol{a}(z)\cdot\boldsymbol{B}(z), \\ t_y' &= h\cdot[(1 + t_y^2)B_1 - t_x t_y B_2 - t_x B_3] = \boldsymbol{b}(z)\cdot\boldsymbol{B}(z), \\ \psi' &= 0, \end{aligned} \tag{4.14}$$

where $h = k\psi\left(1 + t_x^2 + t_y^2\right)^{1/2}$ and the prime denotes differentiation with respect to $z$. The aim is to find formulas for the extrapolation of $(t_x, t_y)$ from $z_0$ to $z_\mathrm{e}$; the extrapolation of $x$ and $y$ can then be performed by integration of the direction tangents.

Let $T(z) = T(t_x(z), t_y(z))$ be a function of $t_x$ and $t_y$. Then

$$T' = \frac{\partial T}{\partial t_x}t_x' + \frac{\partial T}{\partial t_y}t_y' = \left(\frac{\partial T}{\partial t_x}\boldsymbol{a} + \frac{\partial T}{\partial t_y}\boldsymbol{b}\right)\cdot\boldsymbol{B} = \sum_{i_1=1}^{3} T_{i_1} B_{i_1}.\tag{4.15}$$

The derivatives of the functions $T_{i_1}$ can be represented in a similar way:

$$T_{i_1}' = \frac{\partial T_{i_1}}{\partial t_x}t_x' + \frac{\partial T_{i_1}}{\partial t_y}t_y' = \left(\frac{\partial T_{i_1}}{\partial t_x}\boldsymbol{a} + \frac{\partial T_{i_1}}{\partial t_y}\boldsymbol{b}\right)\cdot\boldsymbol{B} = \sum_{i_2=1}^{3} T_{i_1 i_2} B_{i_2}.\tag{4.16}$$

When this process is continued, the following functions can be defined recursively:

$$T'_{i_1\ldots i_{k-1}} = \sum_{i_k=1}^{3} T_{i_1\ldots i_k} B_{i_k},\tag{4.17}$$

$$T_{i_1\ldots i_k} = \frac{\partial T_{i_1\ldots i_{k-1}}}{\partial t_x}\,a_{i_k} + \frac{\partial T_{i_1\ldots i_{k-1}}}{\partial t_y}\,b_{i_k}.\tag{4.18}$$

The exact relation

$$T(z_\mathrm{e}) = T(z_0) + \int_{z_0}^{z_\mathrm{e}} T'(z)\,\mathrm{d}z\tag{4.19}$$

can then be expanded into:

$$\begin{aligned} T(z_\mathrm{e}) &= T(z_0) + \sum_{i_1=1}^{3}\int_{z_0}^{z_\mathrm{e}} T_{i_1}(z_1)B_{i_1}(z_1)\,\mathrm{d}z_1 \\ &= T(z_0) + \sum_{i_1=1}^{3}\int_{z_0}^{z_\mathrm{e}}\left(T_{i_1}(z_0) + \int_{z_0}^{z_1} T_{i_1}'(z_2)\,\mathrm{d}z_2\right)B_{i_1}(z_1)\,\mathrm{d}z_1 \\ &= T(z_0) + \sum_{i_1=1}^{3} T_{i_1}(z_0)\int_{z_0}^{z_\mathrm{e}} B_{i_1}(z_1)\,\mathrm{d}z_1 \\ &\quad + \sum_{i_1=1}^{3}\int_{z_0}^{z_\mathrm{e}} B_{i_1}(z_1)\int_{z_0}^{z_1}\sum_{i_2=1}^{3} T_{i_1 i_2}(z_2)B_{i_2}(z_2)\,\mathrm{d}z_2\,\mathrm{d}z_1 \\ &= \ldots \end{aligned} \tag{4.20}$$

If the expansion is terminated after *n* steps, the compact form of Eq. (4.20) reads:

$$\begin{aligned} T(z_\mathrm{e}) \approx T(z_0) &+ \sum_{k=1}^{n}\sum_{i_1\ldots i_k} T_{i_1\ldots i_k}(z_0)\left(\int_{z_0}^{z_\mathrm{e}} B_{i_1}(z_1)\cdots\int_{z_0}^{z_{k-1}} B_{i_k}(z_k)\,\mathrm{d}z_k\cdots\mathrm{d}z_1\right) \\ &+ \mathcal{O}\left(\frac{kB\,\psi\,(z_\mathrm{e} - z_0)^{n+1}}{(n+1)!}\right) \end{aligned} \tag{4.21}$$

The integral over the field components is taken along an approximate trajectory. Setting *T* = *tx* and *T* = *ty* gives the desired extrapolation formulas. For a comparison with a Runge–Kutta solver, see [2].

## **4.4 Error Propagation**

The task of transporting the covariance matrix of the track parameters is essential for any track reconstruction algorithm, be it from one set of track parameters to another on the same reference surface, or during track propagation from one surface to another.

This so-called error propagation is straightforward when the transformed track parameter vector is a strictly linear function of the initial track parameter vector. The track propagator in Eq. (4.6), however, is in general a non-linear function, and exact error propagation is not feasible. The common solution is approximate linearized error propagation; see Sect. 3.2.3.2 and Fig. 4.5. The derivatives of the track propagator $\boldsymbol{f}_{j|i}$ are collected in the Jacobian matrix $\boldsymbol{F}_{j|i}$:

**Fig. 4.5** Error propagation from surface *i* to surface *j* . The error bars around the track positions and the cones around the direction vectors symbolize the growing uncertainty of the track parameters

$$\boldsymbol{F}_{j|i} = \frac{\partial \boldsymbol{q}_j}{\partial \boldsymbol{q}_i}.\tag{4.22}$$

The covariance matrix $\boldsymbol{C}_i$ of the track parameters at surface $i$ is transported to surface $j$ according to:

$$\boldsymbol{C}_j \approx \boldsymbol{F}_{j|i}\boldsymbol{C}_i\boldsymbol{F}_{j|i}^{\mathsf{T}}.\tag{4.23}$$

A general method for computing the Jacobians is numerical differentiation [8, Section 5.7], by propagating a reference track and five other, nearby tracks from surface $i$ to surface $j$. Consider a reference track with parameter vector $\boldsymbol{q}_i$ at surface $i$. A small variation of component $l$ ($l = 1,\ldots,5$) in $\boldsymbol{q}_i$ is introduced by adding to $\boldsymbol{q}_i$ the vector $\boldsymbol{\Delta}_l = (\delta_{1l}h_l, \ldots, \delta_{5l}h_l)^{\mathsf{T}}$. The corresponding change in parameter $k$ ($k = 1,\ldots,5$) at surface $j$ is

$$\Delta_{kl} = f_k(\boldsymbol{q}_i + \boldsymbol{\Delta}_l) - f_k(\boldsymbol{q}_i), \tag{4.24}$$

where $f_k$ is component $k$ of the track propagator $\boldsymbol{f}_{j|i}$. The elements $(F_{j|i})_{kl}$ of the numerical Jacobian matrix $\boldsymbol{F}_{j|i}$ are then obtained by evaluating

$$(F_{j|i})_{kl} = \frac{\Delta_{kl}}{h_l}.\tag{4.25}$$

This procedure works for all track propagators, irrespective of whether or not they are in closed form.
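A sketch of this numerical differentiation scheme, together with the covariance transport of Eq. (4.23), could look as follows in plain Python (the propagator is passed in as a function; all names are ours):

```python
def numerical_jacobian(propagator, q_i, steps):
    """Numerical Jacobian of Eqs. (4.24)-(4.25): propagate a reference
    track and one varied track per parameter, then form difference quotients."""
    f0 = propagator(q_i)
    F = [[0.0] * len(q_i) for _ in f0]
    for l, h_l in enumerate(steps):
        q_var = list(q_i)
        q_var[l] += h_l                       # vary component l only
        f1 = propagator(q_var)
        for k in range(len(f0)):
            F[k][l] = (f1[k] - f0[k]) / h_l   # (F_{j|i})_{kl}
    return F

def transport_covariance(F, C):
    """Linearized covariance transport of Eq. (4.23): C_j = F C F^T."""
    m, n = len(F), len(C)
    FC = [[sum(F[a][b] * C[b][c] for b in range(n)) for c in range(n)]
          for a in range(m)]
    return [[sum(FC[a][b] * F[c][b] for b in range(n)) for c in range(m)]
            for a in range(m)]
```

For a linear propagator the difference quotients reproduce the exact Jacobian up to rounding, which makes a convenient self-test.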

## *4.4.1 Homogeneous Magnetic Fields*

The exposition in this section closely follows the treatment in [4]. In a homogeneous magnetic field, it is possible to obtain analytical formulas for the Jacobians defined in Eq. (4.22). The problem of calculating transport Jacobians from one plane of arbitrary orientation to another is naturally decomposed into three separate parts because the error propagation from one spatial location to another is performed most easily in a coordinate frame which moves along the track, i.e., the curvilinear frame introduced in Sect. 4.2. Therefore, the natural decomposition is first a transformation from a local coordinate system to the curvilinear frame at the initial surface, then a transport within the curvilinear frame to the destination surface, and finally a transformation from the curvilinear frame to a local frame at the destination surface. The total Jacobian is the matrix product of the three intermediate Jacobians.

The starting point for calculating the transport Jacobians is the set of differentials relating variations of position, direction, and momentum at the initial point ($s = 0$) to variations of the same quantities at any other point along the helix. These differentials are given by

$$\mathbf{d}\mathbf{r} = \frac{\partial \mathbf{r}}{\partial \mathbf{r}\_0} \cdot \mathbf{d}\mathbf{r}\_0 + \frac{\partial \mathbf{r}}{\partial t\_0} \cdot \mathbf{d}t\_0 + \frac{\partial \mathbf{r}}{\partial \psi\_0} \cdot \mathbf{d}\psi\_0 + \frac{\partial \mathbf{r}}{\partial s} \cdot \mathbf{ds},\tag{4.26}$$

$$\mathbf{d}t = \frac{\partial \mathbf{t}}{\partial t\_0} \cdot \mathbf{d}t\_0 + \frac{\partial \mathbf{t}}{\partial \psi\_0} \cdot \mathbf{d}\psi\_0 + \frac{\partial \mathbf{t}}{\partial s} \cdot \mathbf{ds},\tag{4.27}$$

where d*ψ*<sup>0</sup> is the variation of the signed inverse momentum at the initial point and d*s* is the change in path length of the helix due to the variations at the initial point. An illustration of this effect is shown in Fig. 4.6.

The partial derivatives are obtained by direct differentiation of Eqs. (4.7) and (4.8). The results are:

$$\frac{\partial \boldsymbol{r}}{\partial \boldsymbol{r}_0}\cdot\mathrm{d}\boldsymbol{r}_0 = \mathrm{d}\boldsymbol{r}_0,\tag{4.28}$$

$$\frac{\partial \boldsymbol{r}}{\partial \boldsymbol{t}_0}\cdot\mathrm{d}\boldsymbol{t}_0 = \frac{\theta - \sin\theta}{K}\cdot(\boldsymbol{h}\cdot\mathrm{d}\boldsymbol{t}_0)\cdot\boldsymbol{h} + \frac{\sin\theta}{K}\cdot\mathrm{d}\boldsymbol{t}_0 + \frac{1 - \cos\theta}{K}\cdot(\boldsymbol{h}\times\mathrm{d}\boldsymbol{t}_0),\tag{4.29}$$

$$\frac{\partial \boldsymbol{r}}{\partial \psi_0}\cdot\mathrm{d}\psi_0 = \frac{1}{\psi}\left[s\cdot\boldsymbol{t} + \boldsymbol{r}_0 - \boldsymbol{r}\right]\cdot\mathrm{d}\psi_0,\tag{4.30}$$

$$\frac{\partial \boldsymbol{r}}{\partial s}\cdot\mathrm{d}s = \boldsymbol{t}\cdot\mathrm{d}s,\tag{4.31}$$

$$\frac{\partial \boldsymbol{t}}{\partial \boldsymbol{t}_0}\cdot\mathrm{d}\boldsymbol{t}_0 = \cos\theta\cdot\mathrm{d}\boldsymbol{t}_0 + (1 - \cos\theta)\cdot(\boldsymbol{h}\cdot\mathrm{d}\boldsymbol{t}_0)\cdot\boldsymbol{h} + \sin\theta\cdot(\boldsymbol{h}\times\mathrm{d}\boldsymbol{t}_0),\tag{4.32}$$

$$\frac{\partial \boldsymbol{t}}{\partial \psi_0}\cdot\mathrm{d}\psi_0 = \frac{\alpha K s}{\psi}\cdot\boldsymbol{n}\cdot\mathrm{d}\psi_0,\tag{4.33}$$

$$\frac{\partial \boldsymbol{t}}{\partial s}\cdot\mathrm{d}s = \alpha K\cdot\boldsymbol{n}\cdot\mathrm{d}s.\tag{4.34}$$

**Fig. 4.6** A track and the displaced track due to a variation d*r*<sup>0</sup> are shown. In the error propagation, the change d*s* of the path length has to be taken into account. In this specific case, d*r* is understood to be perpendicular to the track (see also Eq. (4.41))

#### **4.4.1.1 Transformation from One Curvilinear Frame to Another**

The curvilinear frame is uniquely defined at each point along the track by three orthogonal unit vectors *u, v* and *t*, defining a coordinate system *(x*⊥*, y*⊥*, z*⊥*)*. The vector *t* has been defined above as the unit vector parallel to the track, and pointing in the particle direction. The two vectors *u* and *v* are defined by

$$\boldsymbol{u} = \frac{\boldsymbol{z}\times\boldsymbol{t}}{|\boldsymbol{z}\times\boldsymbol{t}|}, \quad \boldsymbol{v} = \boldsymbol{t}\times\boldsymbol{u}, \tag{4.35}$$

where *z* is the unit vector pointing in the direction of the global *z*-axis. This means that the *z*⊥-axis is pointing along the particle direction, the *x*⊥-axis is parallel to the global *(x, y)*-plane, while the *y*⊥-axis is given by the requirement that the three axes should form a Cartesian, right-handed coordinate system. The relations between the momentum components *px , py , pz* in the global Cartesian frame and the angles are:

$$p_x = p\cos\lambda\cos\phi,$$

$$p_y = p\cos\lambda\sin\phi,\tag{4.36}$$

$$p_z = p\sin\lambda.$$
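The construction of Eq. (4.35) can be sketched as follows (plain Python; it assumes the track is not parallel to the global $z$-axis, so that $|\boldsymbol{z}\times\boldsymbol{t}| \neq 0$; the function name is ours):

```python
import math

def curvilinear_frame(t):
    """Unit vectors u, v of the curvilinear frame, Eq. (4.35),
    for a unit tangent vector t not parallel to the z-axis."""
    zxt = (-t[1], t[0], 0.0)              # z x t with z = (0, 0, 1)
    norm = math.hypot(zxt[0], zxt[1])     # |z x t|, nonzero by assumption
    u = (zxt[0] / norm, zxt[1] / norm, 0.0)
    v = (t[1]*u[2] - t[2]*u[1],           # v = t x u
         t[2]*u[0] - t[0]*u[2],
         t[0]*u[1] - t[1]*u[0])
    return u, v
```

By construction $\boldsymbol{u}$ has no $z$-component, i.e., the $x_\perp$-axis is parallel to the global $(x, y)$-plane, as stated above.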

The Jacobian of the transformation from a curvilinear frame *(ψ, φ, λ, x*⊥*, y*⊥*)* at *s*<sup>0</sup> = 0 to the same set of parameters at path length *s* is then derived by forming the differentials d*r* and d*t*, introducing the specific constraints given by the curvilinear frames,

$$\mathbf{d}\mathbf{r}\_0 = \mathfrak{u}\_0 \cdot \mathrm{d}\mathbf{x}\_{\perp 0} + \mathfrak{v}\_0 \cdot \mathrm{d}\mathbf{y}\_{\perp 0},\tag{4.37}$$

$$\mathbf{d}t\_0 = \frac{\partial t\_0}{\partial \phi\_0} \cdot \mathbf{d}\phi\_0 + \frac{\partial t\_0}{\partial \lambda\_0} \cdot \mathbf{d}\lambda\_0 = \cos\lambda\_0 \cdot \mathbf{u}\_0 \cdot \mathbf{d}\phi\_0 + \mathbf{v}\_0 \cdot \mathbf{d}\lambda\_0,\tag{4.38}$$

$$\mathrm{d}\boldsymbol{r} = \boldsymbol{u}\cdot\mathrm{d}x_\perp + \boldsymbol{v}\cdot\mathrm{d}y_\perp,\tag{4.39}$$

$$\mathbf{d}\mathbf{t} = \cos\lambda \cdot \mathbf{u} \cdot \mathbf{d}\phi + \mathbf{v} \cdot \mathbf{d}\lambda. \tag{4.40}$$

Moreover, since d*r* is now defined to be a variation in a plane perpendicular to the track, the functional dependence of d*s* on the variations of position, direction, and momentum at the initial point can be evaluated by multiplying Eq. (4.26) with *t* and using the constraint d*r* · *t* = 0. One obtains

$$\mathrm{d}s = -\boldsymbol{t}\cdot\mathrm{d}\boldsymbol{r}_0 - \boldsymbol{t}\cdot\left(\frac{\partial \boldsymbol{r}}{\partial \boldsymbol{t}_0}\cdot\mathrm{d}\boldsymbol{t}_0\right) - \left(\boldsymbol{t}\cdot\frac{\partial \boldsymbol{r}}{\partial \psi_0}\right)\cdot\mathrm{d}\psi_0. \tag{4.41}$$

Inserting Eq. (4.41) and Eqs. (4.28)–(4.34) into Eqs. (4.26) and (4.27), making use of Eqs. (4.37) and (4.40), yields a set of equations relating variations of the parameters at the initial surface to the variations at the destination surface. These can then be manipulated in a straightforward manner to yield the differentials of the parameters at the destination surface. Since, for instance, the differential d*x*<sup>⊥</sup> is defined as

$$\mathbf{dx}\_{\perp} = \frac{\partial \mathbf{x}\_{\perp}}{\partial \psi\_{0}} \cdot \mathbf{d}\psi\_{0} + \frac{\partial \mathbf{x}\_{\perp}}{\partial \phi\_{0}} \cdot \mathbf{d}\phi\_{0} + \frac{\partial \mathbf{x}\_{\perp}}{\partial \lambda\_{0}} \cdot \mathbf{d}\lambda\_{0} + \frac{\partial \mathbf{x}\_{\perp}}{\partial \mathbf{x}\_{\perp 0}} \cdot \mathbf{dx}\_{\perp 0} + \frac{\partial \mathbf{x}\_{\perp}}{\partial \mathbf{y}\_{\perp 0}} \cdot \mathbf{dy}\_{\perp 0},\tag{4.42}$$

the different terms in the desired Jacobian can be identified as the multiplying factors of the variations at the initial surface. A complete list of these can be found in Appendix A. Since a perfectly helical track model is assumed, the momentum at the destination surface is equal to the momentum at the initial surface.

#### **4.4.1.2 Transformations Between Curvilinear and Local Frames at a Fixed Point on the Particle Trajectory**

Consider the transformation from a local plane—defined by a unit vector $\boldsymbol{i}$ normal to the plane and two orthogonal unit vectors $\boldsymbol{j}$ and $\boldsymbol{k}$ inside the plane—to the curvilinear frame. These three unit vectors define the coordinate system $(u, v, w)$ introduced in Sect. 4.2. The aim is to derive the Jacobian of the transformation between $(\psi, v', w', v, w)$, where $v' = \mathrm{d}v/\mathrm{d}u$ and $w' = \mathrm{d}w/\mathrm{d}u$, and $(\psi, \phi, \lambda, x_\perp, y_\perp)$ at a given point $s$ on the particle trajectory. The relevant differentials are now

$$\mathbf{dr} = \mathbf{u} \cdot \mathbf{dx}\_{\perp} + \mathbf{v} \cdot \mathbf{dy}\_{\perp} = \mathbf{j} \cdot \mathbf{dv} + \mathbf{k} \cdot \mathbf{dw} + \mathbf{t} \cdot \mathbf{ds},\tag{4.43}$$

$$\mathbf{d}t = \cos\lambda \cdot \mathbf{u} \cdot \mathbf{d}\phi + \mathbf{v} \cdot \mathbf{d}\lambda = \frac{\partial t}{\partial v'} \cdot \mathbf{d}v' + \frac{\partial t}{\partial w'} \cdot \mathbf{d}w' + \frac{\partial t}{\partial s} \cdot \mathbf{ds}.\tag{4.44}$$

Again d*r* is orthogonal to *t*, therefore d*s* can be calculated from Eq. (4.43), in a similar manner as before. The result is

$$\mathbf{ds} = -(t \cdot \mathbf{j}) \cdot \mathbf{d}v - (t \cdot \mathbf{k}) \cdot \mathbf{d}w. \tag{4.45}$$

By inserting Eq. (4.45) into Eqs. (4.43) and (4.44) and following the same procedure as in Sect. 4.4.1.1, explicit expressions of the differentials can again be constructed. For this, the tangent vector $\boldsymbol{t}$ and its derivatives, expressed in the local parameters, are also needed. The formulas are:

$$\boldsymbol{t} = \frac{1}{\sqrt{1 + v'^2 + w'^2}}\left[\boldsymbol{i} + v'\boldsymbol{j} + w'\boldsymbol{k}\right],\tag{4.46}$$

$$\frac{\partial \boldsymbol{t}}{\partial v'} = \frac{1}{\sqrt{1 + v'^2 + w'^2}}\left[\boldsymbol{j} - \frac{v'}{\sqrt{1 + v'^2 + w'^2}}\,\boldsymbol{t}\right],\tag{4.47}$$

$$\frac{\partial \boldsymbol{t}}{\partial w'} = \frac{1}{\sqrt{1 + v'^2 + w'^2}}\left[\boldsymbol{k} - \frac{w'}{\sqrt{1 + v'^2 + w'^2}}\,\boldsymbol{t}\right]. \tag{4.48}$$

The Jacobian of the transformation from the curvilinear to the local frame can be derived by inverting the Jacobian of the transformation from the local to the curvilinear frame. The expressions of both Jacobians can be found in Appendix A.
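Equations (4.46) and (4.47) can be transcribed directly; the sketch below also lends itself to a finite-difference check of the analytical derivative (all names are ours):

```python
import math

def tangent_from_local(vp, wp, i, j, k):
    """Unit tangent of Eq. (4.46) from the direction tangents v', w',
    given the plane's orthonormal basis (i, j, k)."""
    norm = math.sqrt(1 + vp * vp + wp * wp)
    return [(ii + vp * jj + wp * kk) / norm for ii, jj, kk in zip(i, j, k)]

def dt_dvp(vp, wp, i, j, k):
    """Derivative of the tangent with respect to v', Eq. (4.47)."""
    norm = math.sqrt(1 + vp * vp + wp * wp)
    t = tangent_from_local(vp, wp, i, j, k)
    return [(jj - vp / norm * tt) / norm for jj, tt in zip(j, t)]
```

Comparing `dt_dvp` against a central finite difference of `tangent_from_local` confirms the analytical expression numerically.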

#### **4.4.1.3 Transformations Between Global Cartesian and Local Frames**

The global Cartesian frame (*px , py , pz, x, y, z*) is useful for vertex reconstruction purposes as well as for track reconstruction in zero and inhomogeneous magnetic fields. Zero magnetic field is typical in, for instance, test-beam applications.

First, the transformation between a local frame and the global Cartesian frame is considered. The basic strategy is to go via an intermediate Cartesian frame aligned with the local frame under consideration. This Cartesian frame is related to the global Cartesian frame via a pure rotation. The intermediate Cartesian frame $C'$ will be denoted by primed quantities $(p_x', p_y', p_z', x', y', z')$, where the $x'$- and $y'$-axes are parallel to the $v$- and $w$-axes of the local frame, respectively. The Jacobian $\boldsymbol{R}$ of the transformation from the global Cartesian frame to the intermediate frame $C'$ consists of two similar three-by-three blocks, each containing the relevant rotation matrix. The other entries of this matrix are zero. The Jacobian $\boldsymbol{J}^{C'\to L}$ of the transformation from the primed Cartesian frame $C'$ to the local frame $L$ is constructed from the following formulas:

$$\frac{q}{p} = \frac{q}{\sqrt{p_x'^2 + p_y'^2 + p_z'^2}},\tag{4.49}$$

$$v' = \frac{p_x'}{p_z'},\tag{4.50}$$

$$w' = \frac{p\_y'}{p\_z'}.\tag{4.51}$$

The total Jacobian is given by the product $\boldsymbol{J}^{C'\to L}\cdot\boldsymbol{R}$.

For the inverse transformation, the Jacobian $\boldsymbol{J}^{L\to C'}$ can be derived from the following relations:

$$p_x' = \frac{q}{\psi}\cdot\frac{s_z\cdot v'}{\sqrt{1 + v'^2 + w'^2}},\tag{4.52}$$

$$p_y' = \frac{q}{\psi}\cdot\frac{s_z\cdot w'}{\sqrt{1 + v'^2 + w'^2}},\tag{4.53}$$

$$p_z' = \frac{q}{\psi}\cdot\frac{s_z}{\sqrt{1 + v'^2 + w'^2}},\tag{4.54}$$

where $s_z = \operatorname{sign}(p_z')$ is the sign of the $z$-component of the momentum vector in the local frame. It is needed in order to uniquely specify the state of the track in the local frame. The total Jacobian is in this case given by the product $\boldsymbol{R}^{\mathsf{T}}\cdot\boldsymbol{J}^{L\to C'}$, using the same matrix $\boldsymbol{R}$ as above. The Jacobians $\boldsymbol{J}^{C'\to L}$ and $\boldsymbol{J}^{L\to C'}$ are shown in Appendix A.
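Equations (4.49)–(4.54) can be sketched as a round trip in plain Python; for simplicity the sketch fixes $s_z = +1$, i.e., momentum with a positive component along the plane normal (function names are ours):

```python
import math

def local_to_cartesian(psi, vp, wp, q=1.0):
    """C'-frame momentum from local parameters, Eqs. (4.52)-(4.54),
    with s_z fixed to +1 for this sketch."""
    sz = 1.0
    norm = math.sqrt(1 + vp * vp + wp * wp)
    p = q / psi                         # psi = q/p, so q/psi = p
    return (p * sz * vp / norm, p * sz * wp / norm, p * sz / norm)

def cartesian_to_local(px, py, pz, q=1.0):
    """Local parameters (psi, v', w') from C'-frame momentum,
    Eqs. (4.49)-(4.51)."""
    p = math.sqrt(px * px + py * py + pz * pz)
    return (q / p, px / pz, py / pz)
```

Running the two functions back to back recovers the original local parameters, which is a useful consistency check for both sets of formulas.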

## *4.4.2 Inhomogeneous Magnetic Fields*

As mentioned above, the common way of obtaining the Jacobian matrices needed for the linearized error propagation is to expand the track propagator functions to first order in a Taylor series. As there are no analytical expressions for the track propagator in the case of an inhomogeneous magnetic field, this approach is not applicable. Another common technique is the error propagation by numerical derivatives described earlier. This method is slow but robust and accurate. A third way of obtaining the Jacobian matrices is to differentiate the recursion formulae of the numerical integration method directly. This is the essence of the so-called Bugge–Myrheim method [3].

In the parameter propagation, the propagated global track parameters:

$$\boldsymbol{r} = \begin{pmatrix} x\\ y\\ z \end{pmatrix}, \quad \dot{\boldsymbol{r}} = \boldsymbol{t} = \begin{pmatrix} t_x\\ t_y\\ t_z \end{pmatrix}, \tag{4.55}$$

are obtained by integrating the equations of motion, using some recursion formulas. With the Runge–Kutta–Nyström method, one step (numbered by *n*) becomes:

$$r\_{n+1} = r\_n + h\dot{r}\_n + \frac{h^2}{6}(k\_1 + k\_2 + k\_3) = F\_n(r\_n, \dot{r}\_n),\tag{4.56}$$

$$\dot{\boldsymbol{r}}_{n+1} = \dot{\boldsymbol{r}}_n + \frac{h}{6}(\boldsymbol{k}_1 + 2\boldsymbol{k}_2 + 2\boldsymbol{k}_3 + \boldsymbol{k}_4) = \boldsymbol{G}_n(\boldsymbol{r}_n, \dot{\boldsymbol{r}}_n). \tag{4.57}$$

To obtain the Jacobian matrix of the propagated global track parameters with respect to the initial track parameters $\boldsymbol{q}_i$, the recursion formulae (Eqs. (4.56) and (4.57)) have to be differentiated with respect to $\boldsymbol{q}_i$, giving:

$$\mathbf{J}\_{n+1} = \begin{pmatrix} \frac{\partial \mathbf{r}\_{n+1}}{\partial \mathbf{q}\_{i}}\\ \frac{\partial \dot{\mathbf{r}}\_{n+1}}{\partial \mathbf{q}\_{i}} \end{pmatrix} = \begin{pmatrix} \frac{\partial \mathbf{F}\_{n}}{\partial \mathbf{q}\_{i}}\\ \frac{\partial \mathbf{G}\_{n}}{\partial \mathbf{q}\_{i}} \end{pmatrix} = \begin{pmatrix} \frac{\partial \mathbf{F}\_{n}}{\partial \mathbf{r}\_{n}} & \frac{\partial \mathbf{F}\_{n}}{\partial \dot{\mathbf{r}}\_{n}}\\ \frac{\partial \mathbf{G}\_{n}}{\partial \mathbf{r}\_{n}} & \frac{\partial \mathbf{G}\_{n}}{\partial \dot{\mathbf{r}}\_{n}} \end{pmatrix} \cdot \begin{pmatrix} \frac{\partial \mathbf{r}\_{n}}{\partial \mathbf{q}\_{i}}\\ \frac{\partial \dot{\mathbf{r}}\_{n}}{\partial \mathbf{q}\_{i}} \end{pmatrix} = \mathbf{D}\_{n} \cdot \mathbf{J}\_{n},\tag{4.58}$$

where the derivatives ∂*r*<sub>*n*</sub>/∂*q*<sub>*i*</sub> and ∂*ṙ*<sub>*n*</sub>/∂*q*<sub>*i*</sub> of the 6 × *j* Jacobian *J* are given by the following 3 × *j* matrices:

$$\frac{\partial \mathbf{r}\_n}{\partial \mathbf{q}\_i} = \begin{pmatrix} \frac{\partial x\_n}{\partial q\_{i,1}} & \cdots & \frac{\partial x\_n}{\partial q\_{i,j}}\\ \vdots & \ddots & \vdots\\ \frac{\partial z\_n}{\partial q\_{i,1}} & \cdots & \frac{\partial z\_n}{\partial q\_{i,j}} \end{pmatrix}, \quad \frac{\partial \dot{\mathbf{r}}\_n}{\partial \mathbf{q}\_i} = \begin{pmatrix} \frac{\partial t\_{x,n}}{\partial q\_{i,1}} & \cdots & \frac{\partial t\_{x,n}}{\partial q\_{i,j}}\\ \vdots & \ddots & \vdots\\ \frac{\partial t\_{z,n}}{\partial q\_{i,1}} & \cdots & \frac{\partial t\_{z,n}}{\partial q\_{i,j}} \end{pmatrix}. \tag{4.59}$$

*D<sup>n</sup>* is a 6 × 6 matrix containing the recursion formulae *F<sup>n</sup>* and *G<sup>n</sup>* differentiated with respect to the global track parameters:

$$\mathcal{D}\_n = \frac{\partial (F\_n, G\_n)}{\partial (r\_n, \dot{r}\_n)} = \begin{pmatrix} \frac{\partial F\_n}{\partial r\_n} & \frac{\partial F\_n}{\partial \dot{r}\_n} \\ \frac{\partial G\_n}{\partial r\_n} & \frac{\partial G\_n}{\partial \dot{r}\_n} \end{pmatrix},\tag{4.60}$$

giving the 3 × 3 matrices

$$\frac{\partial F\_n}{\partial \mathbf{r}\_n} = \begin{pmatrix} \frac{\partial F\_{n,1}}{\partial \mathbf{x}\_n} & \cdots & \frac{\partial F\_{n,1}}{\partial \mathbf{z}\_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial F\_{n,3}}{\partial \mathbf{x}\_n} & \cdots & \frac{\partial F\_{n,3}}{\partial \mathbf{z}\_n} \end{pmatrix}, \quad \frac{\partial F\_n}{\partial \dot{\mathbf{r}}\_n} = \begin{pmatrix} \frac{\partial F\_{n,1}}{\partial \mathbf{t}\_{\mathbf{x},n}} & \cdots & \frac{\partial F\_{n,1}}{\partial \mathbf{t}\_{\mathbf{z},n}} \\ \vdots & \ddots & \vdots \\ \frac{\partial F\_{n,3}}{\partial \mathbf{t}\_{\mathbf{x},n}} & \cdots & \frac{\partial F\_{n,3}}{\partial \mathbf{t}\_{\mathbf{z},n}} \end{pmatrix}, \tag{4.61}$$

and

$$\frac{\partial \mathbf{G}\_n}{\partial \mathbf{r}\_n} = \begin{pmatrix} \frac{\partial G\_{n,1}}{\partial x\_n} & \cdots & \frac{\partial G\_{n,1}}{\partial z\_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial G\_{n,3}}{\partial x\_n} & \cdots & \frac{\partial G\_{n,3}}{\partial z\_n} \end{pmatrix}, \quad \frac{\partial \mathbf{G}\_n}{\partial \dot{\mathbf{r}}\_n} = \begin{pmatrix} \frac{\partial G\_{n,1}}{\partial t\_{x,n}} & \cdots & \frac{\partial G\_{n,1}}{\partial t\_{z,n}} \\ \vdots & \ddots & \vdots \\ \frac{\partial G\_{n,3}}{\partial t\_{x,n}} & \cdots & \frac{\partial G\_{n,3}}{\partial t\_{z,n}} \end{pmatrix}. \tag{4.62}$$

By writing the recursion formulae of the derivatives as a product of *D*<sub>*n*</sub> and *J*<sub>*n*</sub> (Eq. (4.58)), the recursion formulae *F*<sub>*n*</sub> and *G*<sub>*n*</sub> can be differentiated with respect to the global track parameters *r*<sub>*n*</sub> and *ṙ*<sub>*n*</sub> instead of the initial track parameters *q*<sub>*i*</sub>. This greatly simplifies the differentiation, giving:

$$\begin{aligned} \frac{\partial F\_n}{\partial r\_n} &= 1 + \frac{h^2}{6} \left( \frac{\partial k\_1}{\partial r\_n} + \frac{\partial k\_2}{\partial r\_n} + \frac{\partial k\_3}{\partial r\_n} \right), \\\\ \frac{\partial F\_n}{\partial \dot{r}\_n} &= h + \frac{h^2}{6} \left( \frac{\partial k\_1}{\partial \dot{r}\_n} + \frac{\partial k\_2}{\partial \dot{r}\_n} + \frac{\partial k\_3}{\partial \dot{r}\_n} \right), \\\\ \frac{\partial G\_n}{\partial r\_n} &= \frac{h}{6} \left( \frac{\partial k\_1}{\partial r\_n} + 2 \frac{\partial k\_2}{\partial r\_n} + 2 \frac{\partial k\_3}{\partial r\_n} + \frac{\partial k\_4}{\partial r\_n} \right), \\\\ \frac{\partial G\_n}{\partial \dot{r}\_n} &= 1 + \frac{h}{6} \left( \frac{\partial k\_1}{\partial \dot{r}\_n} + 2 \frac{\partial k\_2}{\partial \dot{r}\_n} + 2 \frac{\partial k\_3}{\partial \dot{r}\_n} + \frac{\partial k\_4}{\partial \dot{r}\_n} \right). \end{aligned} \tag{4.63}$$

In order to calculate these derivatives explicitly, the individual stages of the Runge– Kutta–Nyström method have to be differentiated with respect to the global track parameters:

$$\mathbf{A}\_{l} = \frac{\partial \mathbf{k}\_{l}}{\partial \dot{r}\_{n}}, \qquad \mathbf{C}\_{l} = \frac{\partial \mathbf{k}\_{l}}{\partial r\_{n}}, \tag{4.64}$$

where *l* denotes the individual stages, and *k<sup>l</sup>* is given by the equations of motion of the global track parameters in Eq. (4.2):

$$\begin{aligned} \frac{\mathrm{d}^2 x}{\mathrm{d}s^2} &= x^{\prime\prime} = \xi \left( t\_y B\_z - t\_z B\_y \right), \\ \frac{\mathrm{d}^2 y}{\mathrm{d}s^2} &= y^{\prime\prime} = \xi \left( t\_z B\_x - t\_x B\_z \right), \\ \frac{\mathrm{d}^2 z}{\mathrm{d}s^2} &= z^{\prime\prime} = \xi \left( t\_x B\_y - t\_y B\_x \right), \end{aligned} \tag{4.65}$$

where *ξ* ≡ *kψ*. Writing the 3 × 3 matrices *A<sup>l</sup>* and *C<sup>l</sup>* in a general form yields:

$$A = \begin{pmatrix} \frac{\partial x''}{\partial t\_x} & \cdots & \frac{\partial x''}{\partial t\_z}\\ \vdots & \ddots & \vdots\\ \frac{\partial z''}{\partial t\_x} & \cdots & \frac{\partial z''}{\partial t\_z} \end{pmatrix} = \begin{pmatrix} 0 & \xi B\_z & -\xi B\_y \\ -\xi B\_z & 0 & \xi B\_x \\ \xi B\_y & -\xi B\_x & 0 \end{pmatrix} \tag{4.66}$$

and

$$C = \begin{pmatrix} \frac{\partial x''}{\partial x} & \cdots & \frac{\partial x''}{\partial z}\\ \vdots & \ddots & \vdots\\ \frac{\partial z''}{\partial x} & \cdots & \frac{\partial z''}{\partial z} \end{pmatrix} = \xi \begin{pmatrix} t\_y B\_{z;x} - t\_z B\_{y;x} & t\_y B\_{z;y} - t\_z B\_{y;y} & t\_y B\_{z;z} - t\_z B\_{y;z} \\ t\_z B\_{x;x} - t\_x B\_{z;x} & t\_z B\_{x;y} - t\_x B\_{z;y} & t\_z B\_{x;z} - t\_x B\_{z;z} \\ t\_x B\_{y;x} - t\_y B\_{x;x} & t\_x B\_{y;y} - t\_y B\_{x;y} & t\_x B\_{y;z} - t\_y B\_{x;z} \end{pmatrix}, \tag{4.67}$$

with

$$B\_{u;v} = \frac{\partial B\_u}{\partial v}, \quad u, v \in \{x, y, z\}. \tag{4.68}$$

With the help of these matrices, the elements of the matrix *D<sup>n</sup>* in Eq. (4.60) are computed. *D<sup>n</sup>* is then multiplied by *J<sup>n</sup>* to produce the transported Jacobian *Jn*+1; see Eq. (4.58). This procedure is repeated for every recursion step, transforming *J* along the way. If, for instance, a planar detector element is reached at the end of the propagation steps and a measurement is to be included in a track reconstruction algorithm, the Jacobian can be transported from the global parameters to a local set of parameters as described in Sect. 4.4.1.3.
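This transport can be sketched in a few lines of Python. The sketch below assumes a homogeneous field, so that the gradient matrices *C*<sub>l</sub> of Eq. (4.67) vanish and only the matrices *A*<sub>l</sub> of Eq. (4.66) enter; the function names, step size, and units are illustrative only.

```python
import numpy as np

def rkn_step(r, t, h, xi, B):
    """One Runge-Kutta-Nystrom step for r'' = xi * (t x B); for a
    homogeneous field the stages depend only on the direction t."""
    f = lambda t_: xi * np.cross(t_, B)
    k1 = f(t)
    k2 = f(t + 0.5 * h * k1)
    k3 = f(t + 0.5 * h * k2)
    k4 = f(t + h * k3)
    r_new = r + h * t + h**2 / 6.0 * (k1 + k2 + k3)      # Eq. (4.56)
    t_new = t + h / 6.0 * (k1 + 2*k2 + 2*k3 + k4)        # Eq. (4.57)
    return r_new, t_new

def step_jacobian(h, xi, B):
    """6x6 matrix D_n of Eq. (4.58), built from Eq. (4.63) with C_l = 0."""
    I = np.eye(3)
    A = xi * np.array([[0.0,   B[2], -B[1]],
                       [-B[2], 0.0,   B[0]],
                       [B[1], -B[0],  0.0]])             # Eq. (4.66)
    dk1 = A                                              # stage derivatives A_l
    dk2 = A @ (I + 0.5 * h * dk1)
    dk3 = A @ (I + 0.5 * h * dk2)
    dk4 = A @ (I + h * dk3)
    dF_dr = I                                            # f independent of r
    dF_dt = h * I + h**2 / 6.0 * (dk1 + dk2 + dk3)
    dG_dr = np.zeros((3, 3))
    dG_dt = I + h / 6.0 * (dk1 + 2*dk2 + 2*dk3 + dk4)
    return np.block([[dF_dr, dF_dt], [dG_dr, dG_dt]])
```

The full Jacobian is then accumulated as *J*<sub>n+1</sub> = *D*<sub>n</sub> · *J*<sub>n</sub> over all steps; in an inhomogeneous field the stage derivatives acquire the additional terms *C*<sub>l</sub> of Eq. (4.67).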

When applied to real problems, the field gradients in *C* are usually quite costly to calculate. The calculations can be significantly sped up by setting elements of *C* with negligible influence on the Jacobians to zero. The sensitivity of the Jacobians to the field gradients can be checked by a simulation study.

# **4.5 Material Effects**

## *4.5.1 Multiple Scattering*

#### **4.5.1.1 The Distribution of the Scattering Angle**

Elastic Coulomb scattering of particles heavier than the electron, such as the muon, is dominated by scattering off the atomic nucleus. The PDF of the single scattering angle *θ* has the following form [9]:

$$f(\theta) = \begin{cases} \frac{k\theta}{(\theta^2 + a^2)^2} & \text{if } \theta \le b, \\\ 0 & \text{otherwise,} \end{cases} \tag{4.69}$$

with the normalization constant

$$k = 2a^2 \left( 1 + a^2 / b^2 \right). \tag{4.70}$$

The PDF in Eq. (4.69) is derived from the Rutherford scattering formula by setting sin *θ* ≈ *θ* and introducing a minimal and a maximal scattering angle, thereby avoiding the singularity at *θ* = 0. The minimal and maximal angles *a, b* depend on the nuclear charge *Z* and the atomic mass number *A* of the nucleus as well as on the momentum *p* of the scattered particle [9]:

$$a = \frac{2.66 \cdot 10^{-6} \cdot Z^{1/3}}{p}, \ b = \frac{0.14}{A^{1/3} \cdot p}. \tag{4.71}$$

Here and in the following, *p* is assumed to be given in units of GeV*/c*. The ratio

$$
\rho = b/a \approx \left(\frac{204}{Z^{1/3}}\right)^2\tag{4.72}
$$

depends only on the nuclear charge *Z*, with the exception of very heavy nuclei, where the approximation *A* ≈ 2*Z* breaks down. The size of *ρ* is typically of the order of 10<sup>4</sup>. The normalization constant *k* can therefore be approximated by *k* ≈ 2*a*<sup>2</sup>. It follows that the expectations of *θ* and *θ*<sup>2</sup> are approximately given by:

$$\mathsf{E}[\theta] = \frac{4.18 \cdot 10^{-6} \cdot Z^{1/3}}{p}, \quad \mathsf{E}[\theta^2] = \frac{2.84 \cdot 10^{-11} \cdot Z^{2/3} \cdot \ln(159 \, Z^{-1/3})}{p^2}. \tag{4.73}$$
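These quantities are easy to evaluate numerically. The short sketch below computes *a*, *b*, and *ρ* for silicon at *p* = 1 GeV/*c*; the atomic mass value is an assumption taken from standard tables.

```python
import math

def single_scattering_bounds(Z, A, p):
    """Minimal and maximal single scattering angle of Eq. (4.71); p in GeV/c."""
    a = 2.66e-6 * Z ** (1.0 / 3.0) / p
    b = 0.14 / (A ** (1.0 / 3.0) * p)
    return a, b

# Silicon (Z = 14, A = 28.09) at p = 1 GeV/c
a, b = single_scattering_bounds(14, 28.09, 1.0)
rho = b / a                          # Eq. (4.72), roughly (204 / Z^(1/3))^2
mean_theta = math.pi * a / 2.0       # E[theta] for b >> a, cf. Eq. (4.73)
```

The value *b* ≈ 46 mrad reproduces the maximum scattering angle quoted for silicon at 1 GeV/*c* in the discussion of Fig. 4.7 below.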

In track reconstruction, the scattering angle *θ* is of little use, as it is more convenient to work with two projected scattering angles instead, for instance, *θx* = *θ* cos *φ* and *θy* = *θ* sin *φ*, if the particle runs parallel to the *z*-axis. Under the assumption that the azimuthal angle *φ* is independent of *θ* and uniform in the interval [0*,* 2*π*], their joint PDF is given by [10]:

$$g(\theta\_{\mathbf{x}}, \theta\_{\mathbf{y}}) = \begin{cases} \frac{1}{\pi} \cdot \frac{a^2}{(\theta\_{\mathbf{x}}^2 + \theta\_{\mathbf{y}}^2 + a^2)^2}, & \text{if } 0 \le \theta\_{\mathbf{x}}^2 + \theta\_{\mathbf{y}}^2 \le b^2, \\ 0, & \text{otherwise.} \end{cases} \tag{4.74}$$

The support of the joint PDF is a circle around the origin with radius *b*. The projected angles *θx* and *θy* are uncorrelated but not independent. The marginal PDF of the projected angle in the interval [−*b, b*] has the following form:

$$f(\theta\_{\mathbb{P}}) = \frac{a^2}{\pi} \left[ \frac{\sqrt{b^2 - \theta\_{\mathbb{P}}^2}}{(a^2 + b^2)(\theta\_{\mathbb{P}}^2 + a^2)} + \frac{\arccos\left(\sqrt{\theta\_{\mathbb{P}}^2 + a^2} \,/\sqrt{a^2 + b^2}\right)}{(\theta\_{\mathbb{P}}^2 + a^2)^{3/2}} \right],\tag{4.75}$$

with *θ*<sup>p</sup> either *θx* or *θy* . The marginal PDF has zero mean, a single mode at *θ*<sup>p</sup> = 0 with *f (*0*)* ≈ 1*/(*2*a)*, and the following variance:

$$\text{var}\left[\theta\_{\text{p}}\right] = \mathbb{E}[\theta^{2}]/2 = \frac{1.42 \cdot 10^{-11} \cdot Z^{2/3} \cdot \ln(159 \, Z^{-1/3})}{p^{2}}.\tag{4.76}$$

The range of the distribution is very large. For silicon (*Z* = 14), it is about ±2500 standard deviations.

If the projected scattering angle is still small after *N* scattering processes, it is approximately equal to the sum of the individual projected scattering angles. Its distribution can be obtained either by the convolution of *N* single scattering distributions [10] or by the Molière theory [11–13]. Figure 4.7 shows the two PDFs for a muon with *p* = 1 GeV*/c* and four silicon scatterers. Their thickness *d* is given in fractions of a radiation length [14]. The corresponding numbers of scattering processes in the four scatterers have been chosen as *N* = 2<sup>10</sup>, 2<sup>13</sup>, 2<sup>16</sup> and 2<sup>19</sup>, respectively, spanning the range from about 0.2% to about 90% of a radiation length. There is excellent agreement for all scatterers but the thickest one; in this case, the tails are more persistent in the Molière PDF than in the PDF obtained by convolution. The discrepancy is at angles larger than the maximum scattering angle *b*, which is equal to about 46 mrad in silicon for *p* = 1 GeV*/c*.

For the purpose of track reconstruction, the "exact" distribution of the projected scattering angle has to be approximated by a single Gaussian or a mixture of Gaussians; see Sect. 6.2.3. The most commonly used approximation by a single Gaussian is the Highland formula, proposed in [15] and modified in [16]. The most up-to-date version can be found in [14], which gives the standard deviation of the projected scattering angle as:

**Fig. 4.7** Probability density functions of the projected multiple scattering angle for silicon targets obtained by convolution (solid lines) and frequency distributions obtained by simulation from the Molière densities (dots). The vertical dashed lines in figure (d) show the upper limit *b* ≈ 46 mrad in silicon

$$\sqrt{\text{var}\left[\theta\_{\text{p}}\right]} \approx \frac{0.0136\,z}{\beta p} \sqrt{\frac{d}{X\_0}} \left[1 + 0.038 \ln\left(\frac{dz^2}{\beta^2 X\_0}\right)\right],\tag{4.77}$$

where *p, β* and *z* are the momentum, velocity, and charge number of the incident particle, and *d/X*<sup>0</sup> is the thickness of the scatterer in units of the radiation length.
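A direct implementation of the Highland formula is given below; the silicon radiation length (*X*<sub>0</sub> ≈ 9.37 cm) is an assumed standard value, and *β* is set to the value for a 1 GeV/*c* muon.

```python
import math

def highland_sigma(p, beta, z, d_over_X0):
    """Standard deviation of the projected multiple scattering angle,
    Eq. (4.77); p in GeV/c, z the charge number, d_over_X0 = d/X0."""
    t = d_over_X0
    return (0.0136 * abs(z) / (beta * p)) * math.sqrt(t) * (
        1.0 + 0.038 * math.log(t * z**2 / beta**2))

# 1 GeV/c muon (beta ~ 0.994) in a 0.3 mm silicon sensor, X0 ~ 9.37 cm
sigma = highland_sigma(p=1.0, beta=0.994, z=1, d_over_X0=0.03 / 9.37)
# sigma is about 0.6 mrad
```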

If the scatterer consists of a composite material with *k* components, its radiation length *X*<sup>0</sup> is given by:

$$\frac{1}{X\_0} = \sum\_{i=1}^k \frac{f\_i}{X\_i},\tag{4.78}$$

where *Xi* is the radiation length of component *i* in g/cm2 and *fi* is the mass fraction of component *i* [17]. In the following, it is more convenient to express *d* and *X*<sup>0</sup> in centimeters:

$$\frac{X\_0}{\text{cm}} = \frac{\text{g/cm}^3}{\varrho} \frac{X\_0}{\text{g/cm}^2},\tag{4.79}$$

where *ϱ* is the density of the material in units of g/cm<sup>3</sup>.
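A short sketch, assuming the tabulated PDG radiation lengths of hydrogen and oxygen, shows Eqs. (4.78) and (4.79) applied to water:

```python
def composite_X0_cm(components, density):
    """Radiation length of a composite material, Eqs. (4.78)-(4.79).
    components: list of (mass_fraction, X0 in g/cm^2); density in g/cm^3."""
    inv_X0 = sum(f / X for f, X in components)    # Eq. (4.78), in (g/cm^2)^-1
    return (1.0 / inv_X0) / density               # Eq. (4.79), in cm

# Water: 11.2% H (X0 = 63.04 g/cm^2), 88.8% O (X0 = 34.24 g/cm^2)
X0_water = composite_X0_cm([(0.112, 63.04), (0.888, 34.24)], density=1.0)
# close to the tabulated value of about 36.1 cm
```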

In a thin scatterer, a Gaussian distribution is but a poor approximation of the actual distribution of the multiple scattering angle, because of the latter's large range. A better, though still far from perfect, approximation can be obtained using a normal mixture with two components, one modeling the "core", the other modeling the "tails" [10, 18]. The standardized two-component mixture PDF of *θ*<sup>p</sup> with variance equal to 1 has the following form:

$$f(\theta\_{\mathbf{p}}) = (1 - \varepsilon) \cdot \varphi(\theta\_{\mathbf{p}}; 0, \sigma\_1^2) + \varepsilon \cdot \varphi(\theta\_{\mathbf{p}}; 0, \sigma\_2^2), \tag{4.80}$$

where *σ*<sub>1</sub><sup>2</sup> < *σ*<sub>2</sub><sup>2</sup>, *ε* < 1/2, and *σ*<sub>2</sub><sup>2</sup> = [1 − (1 − *ε*)*σ*<sub>1</sub><sup>2</sup>]/*ε*.

The core variance *σ*<sub>1</sub><sup>2</sup> is parametrized in terms of the reduced thickness *d*<sub>0</sub>′ = *d*/(*β*<sup>2</sup>*X*<sub>0</sub>), where *X*<sub>0</sub> is the radiation length of the scatterer:

$$
\sigma\_1^2 = 8.471 \cdot 10^{-1} + 3.347 \cdot 10^{-2} \cdot \ln d\_0' - 1.843 \cdot 10^{-3} \cdot (\ln d\_0')^2 \tag{4.81}
$$

The tail weight *ε* is parametrized in terms of a modified reduced thickness *d*<sub>0</sub>″ = *Z*<sup>2/3</sup>*d*/(*β*<sup>2</sup>*X*<sub>0</sub>):

$$\varepsilon = \begin{cases} 4.841 \cdot 10^{-2} + 6.348 \cdot 10^{-3} \cdot \ln d\_0'' + 6.096 \cdot 10^{-4} \cdot (\ln d\_0'')^2, & \text{if } \ln d\_0'' < 0.5, \\ -1.908 \cdot 10^{-2} + 1.106 \cdot 10^{-1} \cdot \ln d\_0'' - 5.729 \cdot 10^{-3} \cdot (\ln d\_0'')^2, & \text{if } \ln d\_0'' \ge 0.5, \end{cases} \tag{4.82}$$

Finally, the standardized PDF in Eq. (4.80) is scaled with the total standard deviation of the projected scattering angle without the logarithmic correction; see Eq. (4.77). This ensures that the variance of the scattering angle is strictly additive if the scatterer is divided into thinner slices. For a comparison of the mixture model with simulations by GEANT4 [19], see [18].
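The parametrization of Eqs. (4.80)–(4.82) is straightforward to implement. The sketch below evaluates it for a 0.3 mm silicon sensor; the radiation length is an assumed value, and by construction the standardized mixture has unit variance.

```python
import math

def mixture_parameters(d, X0, Z, beta=1.0):
    """Core variance, tail weight and tail variance of the standardized
    two-component mixture, Eqs. (4.80)-(4.82); d and X0 in cm."""
    ln_dp = math.log(d / (beta**2 * X0))               # reduced thickness d0'
    ln_dpp = math.log(Z**(2/3) * d / (beta**2 * X0))   # modified thickness d0''
    var1 = 8.471e-1 + 3.347e-2 * ln_dp - 1.843e-3 * ln_dp**2      # Eq. (4.81)
    if ln_dpp < 0.5:                                              # Eq. (4.82)
        eps = 4.841e-2 + 6.348e-3 * ln_dpp + 6.096e-4 * ln_dpp**2
    else:
        eps = -1.908e-2 + 1.106e-1 * ln_dpp - 5.729e-3 * ln_dpp**2
    var2 = (1.0 - (1.0 - eps) * var1) / eps
    return var1, eps, var2

# 0.3 mm silicon sensor, assuming X0 = 9.37 cm
var1, eps, var2 = mixture_parameters(d=0.03, X0=9.37, Z=14)
total = (1 - eps) * var1 + eps * var2   # equals 1 by construction
```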

#### **4.5.1.2 Multiple Scattering in Track Propagation**

If a charged particle is propagated through material, the covariance matrix of the track parameters is augmented by the additional uncertainty on direction and possibly position caused by multiple scattering. The algorithmic treatment is different for "thin" and "thick" scatterers. By definition, in a thin scatterer the offset—the change in position of the passing particle—is negligible in relation to the spatial resolution of the surrounding detectors, so that only the direction is affected. For instance, in a silicon sensor with a typical thickness of 0.3 mm, the standard deviation of the offset is less than 0.2 µm for momenta above 1 GeV; see Eq. (4.107) below. As the spatial resolution is of the order of 10 µm, the offset can be safely neglected, and the sensor is considered a thin scatterer.

If, on the other hand, the scatterer is a 5 cm thick iron absorber in a muon spectrometer, the standard deviation of the offset is about 0.66 mm for a muon with 1 GeV, and the offset can be as large as 2 mm. This is no longer negligible in a muon spectrometer equipped with drift chambers having a typical spatial resolution of 150 µm. The absorber should therefore be treated as a thick scatterer, at least for low-momentum muons.

#### **Thin Scatterers**

Consider a thin scatterer with nominal thickness *d*, radiation length *X*0, and unit normal vector *n* at the point where the track crosses the scatterer. The track is specified by a vector *q* of track parameters and the associated covariance matrix *C*. In a thin scatterer, only the sub-matrix of *C* corresponding to the track direction is augmented. The details depend on the choice of the track parametrization; see Sect. 4.2.

A. The track is parametrized as in Eq. (4.3):

$$q\_1 = \frac{q}{p\_{\rm T}},\ q\_2 = \phi,\ q\_3 = \tan \lambda,\ q\_4 = R\Phi,\ q\_5 = z. \tag{4.83}$$

1. Compute the unit direction vector *a* and the momentum *p* of the track:

$$\mathbf{a} = (\cos \phi \cos \lambda, \sin \phi \cos \lambda, \sin \lambda)^{\mathsf{T}}, \quad p = 1/(|q\_1| \cos \lambda). \tag{4.84}$$

2. Compute the effective thickness *t* in units of *X*0:

$$t = \frac{d|\mathbf{a}|}{|\mathbf{a} \cdot \mathbf{n}| \, X\_0}. \tag{4.85}$$

3. Compute the variance of the projected multiple scattering angle, assuming *β* = 1 and *q* = 1 if unknown:


$$\text{var}\left[\theta\_{\text{p}}\right] = \frac{1.85 \cdot 10^{-4} \cdot t}{p^2} \left(1 + 0.038 \ln t\right)^2. \tag{4.86}$$

4a. If sin *φ* = 0, compute the covariance matrix *D* of (*a*<sub>2</sub>, *a*<sub>3</sub>)<sup>T</sup>:

$$\mathbf{D} = \mathbf{J} \cdot \mathbf{C} \,\,(2:3,2:3) \cdot \mathbf{J}^{\mathsf{T}}, \,\,\,\text{with}$$

$$\mathbf{J} = \begin{pmatrix} \cos\phi \cos\lambda & 0\\ 0 & \cos^3\lambda \end{pmatrix}. \tag{4.87}$$

Augment *D* by the contribution of multiple scattering [1]:

$$D' = D + \text{var}\left[\theta\_{\mathbb{P}}\right] \cdot \begin{pmatrix} 1 - a\_2^2 & -a\_2 a\_3 \\ -a\_2 a\_3 & 1 - a\_3^2 \end{pmatrix} . \tag{4.88}$$

Modify *C* :

$$\mathbf{C}\ (2:3,2:3) = \mathbf{J}^{-1} \cdot \mathbf{D}' \cdot (\mathbf{J}^{-1})^\top,\quad\text{with}$$

$$\mathbf{J}^{-1} = \begin{pmatrix} 1/\cos\phi\cos\lambda & 0\\ 0 & 1/\cos^3\lambda \end{pmatrix}.\tag{4.89}$$

4b. If sin *φ* ≠ 0, compute the covariance matrix *D* of (*a*<sub>1</sub>, *a*<sub>3</sub>)<sup>T</sup>:

$$\begin{aligned} \mathbf{D} &= \mathbf{J} \cdot \mathbf{C}\ (2:3,2:3) \cdot \mathbf{J}^{\mathsf{T}}, \quad \text{with} \\ \mathbf{J} &= \begin{pmatrix} -\sin\phi\cos\lambda & -\cos\phi\sin\lambda\cos^2\lambda \\ 0 & \cos^3\lambda \end{pmatrix}. \end{aligned} \tag{4.90}$$

Augment *D* by the contribution of multiple scattering [1]:

$$\mathbf{D}' = \mathbf{D} + \text{var}\left[\theta\_{\mathrm{p}}\right] \cdot \begin{pmatrix} 1 - a\_1^2 & -a\_1 a\_3 \\ -a\_1 a\_3 & 1 - a\_3^2 \end{pmatrix}. \tag{4.91}$$

Modify *C* :

$$\begin{aligned} \mathbf{C} \ (2:3,2:3) &= \mathbf{J}^{-1} \cdot \mathbf{D}' \cdot (\mathbf{J}^{-1})^\top, \quad \text{with} \\ \mathbf{J}^{-1} &= \begin{pmatrix} -1/\sin\phi \cos\lambda & -\cos\phi \sin\lambda/\sin\phi \cos^2\lambda \\ 0 & 1/\cos^3\lambda \end{pmatrix}. \end{aligned} \tag{4.92}$$

B. The track is parametrized as in Eq. (4.4):

$$q\_1 = \psi, \ q\_2 = \mathrm{d}v/\mathrm{d}u, \ q\_3 = \mathrm{d}w/\mathrm{d}u, \ q\_4 = v, \ q\_5 = w. \tag{4.93}$$

1. Compute the direction vector *a* and the momentum *p* of the track:

$$a = (q\_2, q\_3, 1)^{\mathsf{T}}, \ \ p = 1/|q\_1|. \tag{4.94}$$

2. Compute the effective thickness *t* in units of *X*<sub>0</sub> as in Eq. (4.85) and the variance of the projected multiple scattering angle as in Eq. (4.86).

3. Extract the covariance matrix *D* of (*q*<sub>2</sub>, *q*<sub>3</sub>)<sup>T</sup>:

$$\mathbf{D} = \mathbf{C} \ (2:3, 2:3). \tag{4.95}$$

4. Augment *D* by the contribution of multiple scattering [1] and modify *C* :

$$\mathbf{C}\ (2:3,2:3) = \mathbf{D} + \mathsf{var}\left[\theta\_{\mathsf{p}}\right] \cdot \left(1 + q\_2^2 + q\_3^2\right) \cdot \begin{pmatrix} 1 + q\_2^2 & q\_2 q\_3\\ q\_2 q\_3 & 1 + q\_3^2 \end{pmatrix}. \tag{4.96}$$
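For parametrization B the whole update acts only on the slope block of *C*. A minimal sketch (0-based array indices, so rows/columns 1:3 correspond to *q*<sub>2</sub>, *q*<sub>3</sub>):

```python
import numpy as np

def thin_scatterer_update(C, q, var_theta_p):
    """Augment the slope block of the covariance matrix C by multiple
    scattering, Eq. (4.96), for q = (psi, dv/du, dw/du, v, w)."""
    q2, q3 = q[1], q[2]
    scale = 1.0 + q2**2 + q3**2
    S = var_theta_p * scale * np.array([[1.0 + q2**2, q2 * q3],
                                        [q2 * q3, 1.0 + q3**2]])
    C = C.copy()
    C[1:3, 1:3] += S        # rows/columns of q2 and q3
    return C
```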

C. The track is parametrized as in Eq. (4.5):

$$q\_1 = \frac{q}{p},\ q\_2 = \phi,\ q\_3 = \lambda,\ q\_4 = \mathbf{x}\_{\perp},\ q\_5 = \mathbf{y}\_{\perp}.\tag{4.97}$$

1. Compute the unit direction vector *a* and the momentum *p* of the track:

$$\mathfrak{a} = (\cos \phi \cos \lambda, \sin \phi \cos \lambda, \sin \lambda)^{\mathsf{T}}, \quad p = 1/|q\_1|. \tag{4.98}$$

2. Compute the effective thickness *t* and the variance of the projected multiple scattering angle as in Eqs. (4.85) and (4.86).

3a. If sin *φ* = 0, compute the covariance matrix *D* of (*a*<sub>2</sub>, *a*<sub>3</sub>)<sup>T</sup>:

$$\mathbf{D} = \mathbf{J} \cdot \mathbf{C}\ (2:3,2:3) \cdot \mathbf{J}^{\mathsf{T}}, \quad \text{with} \quad \mathbf{J} = \begin{pmatrix} \cos\phi\cos\lambda & 0 \\ 0 & \cos\lambda \end{pmatrix}. \tag{4.99}$$

Augment *D* by the contribution of multiple scattering as in Eq. (4.88) and modify *C* :

$$\mathbf{C}\ (2:3,2:3) = \mathbf{J}^{-1} \cdot \mathbf{D}' \cdot (\mathbf{J}^{-1})^\top,\quad\text{with}$$

$$\mathbf{J}^{-1} = \begin{pmatrix} 1/\cos\phi\cos\lambda & 0\\ 0 & 1/\cos\lambda \end{pmatrix}.\tag{4.100}$$

3b. If sin *φ* ≠ 0, compute the covariance matrix *D* of (*a*<sub>1</sub>, *a*<sub>3</sub>)<sup>T</sup>:

$$\mathbf{D} = \mathbf{J} \cdot \mathbf{C}\ (2:3,2:3) \cdot \mathbf{J}^{\mathsf{T}}, \quad \text{with} \quad \mathbf{J} = \begin{pmatrix} -\sin\phi\cos\lambda & -\cos\phi\sin\lambda \\ 0 & \cos\lambda \end{pmatrix}. \tag{4.101}$$

Augment *D* by the contribution of multiple scattering as in Eq. (4.91) and modify *C* :

$$\begin{aligned} \mathbf{C} \ (2:3,2:3) &= \mathbf{J}^{-1} \cdot \mathbf{D}' \cdot (\mathbf{J}^{-1})^\mathsf{T}, \quad \text{with} \\ \mathbf{J}^{-1} &= \begin{pmatrix} -1/\sin\phi \cos\lambda & -\cos\phi \sin\lambda/\sin\phi \cos^2\lambda \\ 0 & 1/\cos\lambda \end{pmatrix}. \end{aligned} \tag{4.102}$$

#### **Thick Scatterers**

A thick scatterer can be treated in two ways: sliced into a number of thin scatterers or considered a single scatterer. The first approach gives more precise results, in particular when the scatterer is magnetized and the incoming particle is expected to be strongly deflected and suffer significant energy loss. A typical example for this situation is a low-momentum muon crossing the instrumented iron return yoke of the CMS experiment [20].

In the second approach, the uncertainties of both the direction and the position at the exit point have to be increased. In addition, energy loss may have to be considered, and an estimate of the actual track length *L* in the scatterer must be available. If no estimate of *L* is returned by the track propagation algorithm, *L* can be approximated by the distance between the entry and the exit point times a safety factor that depends on the momentum and can be tuned by simulation. Alternatively, *L* can be determined from the most probable trajectory in the scatterer [21].

Assume that the track parameters and their covariance matrix at the exit of the scatterer are given by *q* and *C* in the curvilinear parametrization; see Eq. (4.97).


1. Obtain an estimate of the track length *L* in the scatterer; see the discussion above.

2. Compute the thickness per unit length *t* in units of *X*<sub>0</sub> (see Eq. (4.79)) and the variance *σ*<sub>0</sub><sup>2</sup> of the projected scattering angle per unit length, where *p*<sub>1</sub> and *p*<sub>2</sub> denote the momentum at the entry and at the exit point, respectively:

$$t = \frac{1}{X\_0}, \quad \sigma\_0^2 = \frac{1.85 \cdot 10^{-4} \cdot t}{p\_1 p\_2} \left(1 + 0.038 \ln t\right)^2. \tag{4.103}$$

3. Extract the covariance matrix *D* of *(q*2*, q*3*, q*4*, q*5*)*T:

$$\mathbf{D} = \mathbf{C} \ (2 \text{:} \mathbf{5}, 2 \text{:} \mathbf{5}). \tag{4.104}$$

4. Compute the unit direction vector *a* in the global coordinate system:

$$\mathfrak{a} = \left(\cos\phi\cos\lambda, \sin\phi\cos\lambda, \sin\lambda\right). \tag{4.105}$$

5. Transform *a* into the curvilinear system by rotating *a* into *a*′ = (0, 0, 1)<sup>T</sup>:

$$\mathbf{a}' = U\mathbf{a}, \quad \text{with} \quad U = \begin{pmatrix} \frac{a\_1^2 a\_3 + a\_2^2}{1 - a\_3^2} & -\frac{a\_1 a\_2}{1 + a\_3} & -a\_1 \\ -\frac{a\_1 a\_2}{1 + a\_3} & \frac{a\_1^2 + a\_2^2 a\_3}{1 - a\_3^2} & -a\_2 \\ a\_1 & a\_2 & a\_3 \end{pmatrix}. \tag{4.106}$$

6. Compute the joint covariance matrix *E* of *q*′ = (*a*′<sub>1</sub>, *a*′<sub>2</sub>, *x*<sub>⊥</sub>, *y*<sub>⊥</sub>)<sup>T</sup> [1]:

$$E = \sigma\_0^2 \cdot \begin{pmatrix} L & 0 & L^2/2 & 0 \\ 0 & L & 0 & L^2/2 \\ L^2/2 & 0 & L^3/3 & 0 \\ 0 & L^2/2 & 0 & L^3/3 \end{pmatrix}. \tag{4.107}$$

7. Rotate the direction part of *E* back into the global system:

$$E' = T \cdot E \cdot T^{\mathsf{T}}, \quad \text{with} \quad T = \begin{pmatrix} \frac{a\_1^2 a\_3 + a\_2^2}{1 - a\_3^2} & -\frac{a\_1 a\_2}{1 + a\_3} & 0 & 0\\ -\frac{a\_1 a\_2}{1 + a\_3} & \frac{a\_1^2 + a\_2^2 a\_3}{1 - a\_3^2} & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{pmatrix}. \tag{4.108}$$

8. Transform the direction cosines *a*1*, a*<sup>2</sup> back to *φ,λ*:


$$\begin{aligned} E'' &= \mathbf{J} \cdot \mathbf{E}' \cdot \mathbf{J}^{\top}, \quad \text{with} \\ \mathbf{J} &= \begin{pmatrix} -\sin\phi/\cos\lambda & \cos\phi/\cos\lambda & 0 & 0 \\ -\cos\phi/\sin\lambda & -\sin\phi/\sin\lambda & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \quad \text{if } \lambda \neq 0; \quad \text{and} \qquad (4.109) \\ \mathbf{J} &= \begin{pmatrix} -\sin\phi & \cos\phi & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \quad \text{if } \lambda = 0. \tag{4.110} \end{aligned}$$

9. Augment *D* by the contribution of multiple scattering and modify *C* :

$$\mathbf{C}\ (2:5, 2:5) = \mathbf{D} + E''. \tag{4.111}$$
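The thick-scatterer covariance of Eq. (4.107) is simple to assemble. The sketch below reproduces the offset quoted earlier for a 5 cm iron absorber and a 1 GeV muon, assuming *X*<sub>0</sub>(iron) ≈ 1.76 cm and *p*<sub>1</sub> = *p*<sub>2</sub> = 1 GeV/*c*:

```python
import math
import numpy as np

def thick_scatterer_cov(sigma0_sq, L):
    """Joint covariance of the two scattering angles and the two offsets,
    Eq. (4.107); sigma0_sq is the angular variance per unit length."""
    return sigma0_sq * np.array([
        [L,        0.0,      L**2/2.0, 0.0     ],
        [0.0,      L,        0.0,      L**2/2.0],
        [L**2/2.0, 0.0,      L**3/3.0, 0.0     ],
        [0.0,      L**2/2.0, 0.0,      L**3/3.0]])

# 5 cm of iron, p1 = p2 = 1 GeV/c, X0 ~ 1.76 cm (assumed value)
t = 1.0 / 1.76                                           # thickness per cm
sigma0_sq = 1.85e-4 * t * (1.0 + 0.038 * math.log(t))**2 # Eq. (4.103)
E = thick_scatterer_cov(sigma0_sq, 5.0)
offset_std_cm = math.sqrt(E[2, 2])      # ~0.065 cm, i.e. about 0.65 mm
```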

## *4.5.2 Energy Loss by Ionization*

#### **4.5.2.1 Mean Energy Loss**

A heavy, charged particle passing through material suffers loss of energy due to ionization of the material. The mean energy loss of the particle is given by the Bethe–Bloch formula [14]:

$$\frac{\mathrm{d}E}{\mathrm{d}s} = -Kz^2 \frac{\mathrm{Z}\rho}{A\beta^2} \left( \frac{1}{2} \ln \frac{2m\_ec^2\beta^2\gamma^2 W\_{\mathrm{max}}}{I^2} - \beta^2 - \frac{\delta}{2} \right),\tag{4.112}$$

where *K* is a constant, *z* is the charge number of the incoming particle, *Z* and *A* are the atomic number and atomic mass of the material, *ρ* is the density of the material, *me* is the electron mass, *W*max is the maximum energy transfer in a single collision, *I* is the mean excitation energy, and *δ* is the density effect correction. For an incoming particle of mass *M, W*max is given by:

$$W\_{\text{max}} = \frac{2m\_e c^2 \beta^2 \gamma^2}{1 + 2\gamma m\_e/M + (m\_e/M)^2}. \tag{4.113}$$
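As a numerical illustration (particle masses in GeV/*c*<sup>2</sup> are assumed standard values), *W*<sub>max</sub> for a 1 GeV/*c* muon comes out at roughly 84 MeV:

```python
def w_max(p, M, m_e=0.000511):
    """Maximum energy transfer in a single collision, Eq. (4.113).
    Momenta and masses in GeV (natural units, c = 1)."""
    E = (p**2 + M**2) ** 0.5
    gamma = E / M
    beta2 = (p / E) ** 2
    return 2 * m_e * beta2 * gamma**2 / (1 + 2 * gamma * m_e / M + (m_e / M)**2)

# 1 GeV/c muon (M = 0.1057 GeV)
w = w_max(1.0, 0.1057)   # ~0.084 GeV
```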

For light particles such as electrons and positrons, the Bethe–Bloch formula needs some modifications. An alternative expression for light particles is [1]:

$$\frac{\mathrm{d}E}{\mathrm{d}s} = -K \frac{Z \, \rho}{A} \left( \ln \frac{2m\_e c^2}{I} + 1.5 \ln \gamma - 0.975 \right). \tag{4.114}$$

#### **4.5.2.2 Ionization Energy Loss in Track Propagation**

In tracking detectors, material layers traversed during propagation are often considered as discrete. Knowing the thickness of the traversed layer, the Bethe-Bloch formula is used to modify the momentum part of the track parameter vector by the expected change before propagating to the next layer. Fluctuations in the ionization energy loss are often considered so small that they are neglected.

For detectors such as electromagnetic or hadronic calorimeters, however, ionization energy loss takes place continuously during propagation. In this case, it is possible to augment the vector of global track parameters in Eq. (4.55) with a parameter containing the track momentum [3, 22], e. g., *ξ* = *kψ* as defined in Sect. 4.4.2:

$$\mathbf{u} = (x, y, z, \Xi)^{\mathsf{T}}, \quad \dot{\mathbf{u}} = \left( t\_x, t\_y, t\_z, \xi \right)^{\mathsf{T}}, \tag{4.115}$$

where *Ξ* is an auxiliary parameter corresponding to the integrated change in *ξ*. Track parameter and Jacobian matrix propagation can thereby be carried out in a way very similar to the one described in Sect. 4.4.2, effectively including the effects of energy loss continuously while traversing the detector.

## *4.5.3 Energy Loss by Bremsstrahlung*

#### **4.5.3.1 Mean and Distribution of the Energy Loss**

High-energy electrons lose energy in a material mainly by bremsstrahlung [14]. The dependence on the material can be summarized in a characteristic length, called the radiation length *X*0. It is defined as the average distance over which a high-energy electron loses 1 − 1*/e* ≈ 63% of its energy. Although it is not strictly identical to the radiation length that is characteristic for multiple scattering (see Sect. 4.5.1), the same length is used in both contexts for the sake of convenience.

The rate of the mean energy loss by bremsstrahlung is nearly proportional to the energy:

$$\mathbb{E}\left[-\frac{\mathrm{d}E}{\mathrm{d}s}\right] \approx \frac{E}{X\_0},\tag{4.116}$$

and on average, the energy decreases approximately exponentially as a function of the reduced path length *t* = *s/X*0:

$$\mathbb{E}\left[E(t)\right] = E\_0 \exp(-t),\tag{4.117}$$

where *E*<sup>0</sup> is the initial energy. The energy loss is subject to large fluctuations, as a substantial part of the electron energy can be carried away by a single photon. A simplified PDF of *E* as a function of *t* has the following form, called the Bethe– Heitler model [23]:

$$h(E) = \frac{[\ln(E\_0/E)]^{t/\ln 2 - 1}}{E\_0 \Gamma\left(t/\ln 2\right)}.\tag{4.118}$$

The PDF can be rewritten in terms of the remaining energy fraction *z* = *E/E*0:

$$f(z) = \frac{(-\ln z)^{t/\ln 2 - 1}}{\Gamma\left(t/\ln 2\right)}.\tag{4.119}$$

It can be seen that − ln *z* is Gamma-distributed. The expectation and the variance can be computed explicitly [24]:

$$\mathsf{E}\left[z\right] = \exp\left(-t\right), \quad \mathsf{Var}\left[z\right] = \exp\left(-t\ln 3/\ln 2\right) - \exp\left(-2t\right). \tag{4.120}$$
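Since −ln *z* is Gamma-distributed with shape *t*/ln 2, the moments in Eq. (4.120) can be checked by direct sampling; the sketch below uses only the Python standard library.

```python
import math
import random

# -ln z ~ Gamma(t/ln 2, 1), so sample z and compare with Eq. (4.120)
random.seed(1)
t = 0.1
c = t / math.log(2.0)
zs = [math.exp(-random.gammavariate(c, 1.0)) for _ in range(100_000)]
mean_z = sum(zs) / len(zs)
var_z = sum((z - mean_z) ** 2 for z in zs) / len(zs)
expected_mean = math.exp(-t)                                        # E[z]
expected_var = (math.exp(-t * math.log(3.0) / math.log(2.0))
                - math.exp(-2.0 * t))                               # var[z]
```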

**Fig. 4.8** Probability density function of the Bethe–Heitler model of bremsstrahlung in Eq. (4.119), with mean *μ* and standard deviation *σ*, for *t* = 0*.*2*,* 0*.*1*,* 0*.*05*,* 0*.*02

Figure 4.8 shows that the shape of the PDF is very far from being Gaussian; thus, the distribution cannot be adequately described by merely its mean and variance.

#### **4.5.3.2 Approximation by Gaussian Mixtures**

For electron reconstruction with the Gaussian-sum filter (GSF; see Sect. 6.2.3) the model PDF in Eq. (4.119) is approximated by a normal mixture PDF with *n*<sup>c</sup> components. The parameters of the mixture are determined by minimizing some measure of distance between the two distributions. In [25], two distances have been used: *D*KL, the Kullback–Leibler distance, and *D*CDF, the integral over the absolute difference of the respective cumulative distribution functions (CDFs):

$$D\_{\rm KL} = \int\_0^1 \ln[f(z)/g(z)] \, f(z) \, \mathrm{d}z,\tag{4.121}$$

$$D\_{\rm CDF} = \int\_{-\infty}^{\infty} |F(z) - G(z)| \, \mathrm{d}z,\tag{4.122}$$

where *f (z)* and *F (z)* are the PDF and CDF of the model distribution, and *g(z)* and *G(z)* are the PDF and CDF of the normal mixture, respectively. Using *D*KL and *n*<sup>c</sup> = 1 the single Gaussian with the correct first two moments is recovered. In all other cases, the mixtures do not have the same moments as the model. The quality of the approximating mixtures has been investigated in detail in [25]. Software in the form of a MATLAB® function can be downloaded from the URL in [26].

# **References**



# **Chapter 5 Track Finding**

**Abstract** There is no systematic theory of track finding yet. Therefore, the first section of this chapter presents a list of basic techniques which have been successfully used, stand-alone or in combination, in past and present experiments. Among them are the conformal transformation, the Hough and the Legendre transform, cellular automata and neural networks, pattern matching, and track following by the combinatorial Kalman filter. The following section gives a brief excursion into online or real-time track finding in the collider experiments CDF, ATLAS, and CMS. As track finding in most cases delivers some candidates that do not correspond to actual particle tracks, the concluding section discusses methods for an efficient selection of valid candidates.

# **5.1 Basic Techniques**

# *5.1.1 Conformal Transformation*

Circles in the plane passing through the origin can be transformed into straight lines by the following mapping [1]:

$$u = \frac{x}{x^2 + y^2}, \quad v = \frac{y}{x^2 + y^2}. \tag{5.1}$$

The mapping is conformal, i.e., preserves angles between curves. Assume that a circle is defined by the equation:

$$(x - a)^2 + (y - b)^2 = R^2 = a^2 + b^2. \tag{5.2}$$

Expansion of the left-hand side and division by *x*<sup>2</sup> + *y*<sup>2</sup> gives a linear equation in *u* and *v*:

$$2au + 2bv = 1.\tag{5.3}$$


This is the equation of a straight line in the *(u, v)*-plane with distance *d* = 1*/(*2*R)* from the origin. A circle with a large radius *R* or small curvature *κ* is therefore transformed into a line that passes very close to the origin (Fig. 5.1). In the limit of zero curvature, the circle becomes a line transformed into itself by the mapping in Eq. (5.1). Both circle finding and circle fitting can be simplified by this transformation from circles to straight lines.
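The mapping in Eq. (5.1) is easy to verify numerically. The sketch below (helper name is invented) transforms points on a circle through the origin and checks that they satisfy the line equation (5.3):

```python
import numpy as np

def conformal(x, y):
    # Eq. (5.1): map (x, y) to (u, v)
    r2 = x**2 + y**2
    return x / r2, y / r2

# circle through the origin: centre (a, b), radius R = sqrt(a^2 + b^2)
a, b = 3.0, 4.0
R = np.hypot(a, b)
phi = np.linspace(0.0, 2 * np.pi, 60, endpoint=False)
x = a + R * np.cos(phi)
y = b + R * np.sin(phi)
mask = x**2 + y**2 > 1e-6   # exclude the origin, where the map is singular
u, v = conformal(x[mask], y[mask])
# all transformed points lie on the straight line 2au + 2bv = 1 (Eq. 5.3)
```

The same helper applied to measured hits turns circle finding into the simpler problem of straight-line finding.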

## *5.1.2 Hough Transform*

The Hough transform [2] is a technique that finds clusters of points that lie on or close to a parametric curve such as a straight line or a circle. The number of parameters is usually two or three. In the simplest case, there is a set of points {*(x*1*, y*1*), . . . , (xn, yn)*} in the plane (image space) that lie on a straight line, parameterized by *y* = *k*0*x* + *d*0, where *k*0 is the slope and *d*0 is the intercept of the line. A line passing through *(xi, yi)* fulfills the equation *d* = −*xik* + *yi*, which is the equation of a line *i* in the parameter space (Hough space) of *k* and *d*. The point of intersection of two lines *i, j* is just *(k*0*, d*0*)*, the parameters of the original line. Finding straight lines in the image space is therefore equivalent to finding intersection points in the Hough space.

In practice, the measured points do not lie exactly on a straight line, and the lines in the Hough space do not intersect exactly in a single point. The usual approach is to define a binning in the Hough space and count the number of lines crossing each bin. Peaks in the 2D histogram correspond to lines that are close to many points in the image space. The size of the bins depends on the distribution of the measurement errors and can be tuned on simulated tracks. Alternatively, a 2D binary search can be performed using a quadtree data structure [3, 4].

**Fig. 5.1** Conformal transformation of the circles through the origin in the *(x, y)*-plane (left) into lines in the *(u, v)*-plane (right)

The parametrization of the lines by *y* = *kx* + *d* can be numerically problematic if very large values of the slope *k* are possible. A more robust parametrization of the line has the form *x* cos *ϕ* + *y* sin *ϕ* − *c* = 0. The curve in the Hough space of *ϕ* and *c* passing through the point *(xi, yi)* is a sinusoid with the equation

$$c = x\_i \cos \varphi + y\_i \sin \varphi = r\_i \sin(\varphi + \varphi\_i), \quad \text{with}$$

$$r\_i = \sqrt{x\_i^2 + y\_i^2}, \quad \varphi\_i = \arctan(x\_i/y\_i). \tag{5.4}$$
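A minimal accumulator-based Hough transform in this *(ϕ, c)* parametrization might look as follows (a sketch only; the bin counts and the range of *c* are arbitrary choices):

```python
import numpy as np

def hough_lines(points, n_phi=180, n_c=100, c_max=2.0):
    # accumulate the sinusoid c = x cos(phi) + y sin(phi) of each point
    phi = np.linspace(0.0, np.pi, n_phi, endpoint=False)
    acc = np.zeros((n_phi, n_c), dtype=int)
    for x, y in points:
        c = x * np.cos(phi) + y * np.sin(phi)
        j = ((c + c_max) / (2 * c_max) * n_c).astype(int)  # bin index of c
        ok = (j >= 0) & (j < n_c)
        acc[np.arange(n_phi)[ok], j[ok]] += 1
    return acc, phi

# points on the line x cos(phi0) + y sin(phi0) = c0
phi0, c0 = np.pi / 3, 0.99
ts = np.linspace(-1.0, 1.0, 20)
pts = [(c0 * np.cos(phi0) - t * np.sin(phi0),
        c0 * np.sin(phi0) + t * np.cos(phi0)) for t in ts]
acc, phi = hough_lines(pts)
i, j = np.unravel_index(acc.argmax(), acc.shape)  # peak bin
```

The peak bin of the accumulator recovers *(ϕ*0*, c*0*)* to within one bin width; with measurement errors the peak spreads over neighbouring bins, which is why the binning has to be tuned.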

If the curve to be found in the image space is a circle through the origin, there are two possibilities. The problem can be reduced to the straight line case by a conformal transformation, or a circle through the origin can be parameterized in the following form:

$$x^2 - 2x R \cos\varphi + y^2 - 2y R \sin\varphi = 0, \tag{5.5}$$

where *R* is the circle radius and *ϕ* is the azimuth of the circle center in polar coordinates. The curve in the Hough space of *κ* = 1*/R* and *ϕ* passing through the point *(xi, yi)* is a sinusoid with the equation

$$\kappa = \frac{2}{r\_i} \sin(\varphi + \varphi\_i), \tag{5.6}$$

with *ri* and *ϕi* as above, see Fig. 5.2.

If the curve to be found in the image space is a circle in general position with the equation

$$(x - x\_c)^2 + (y - y\_c)^2 - R^2 = 0, \tag{5.7}$$

the constraint that the circle passes through the point *(xi, yi)* defines a second order surface in the 3D Hough space of *x*c*, y*c*, z*:

$$z = (x\_i - x\_c)^2 + (y\_i - y\_c)^2, \text{ with } z = R^2. \tag{5.8}$$

It follows that finding circles requires finding intersection points of surfaces in a 3D histogram, which is computationally much more expensive than the same problem in 2D. A search in 3D can be based on octrees, the 3D analogues of quadtrees [4].

An alternative is the randomized Hough transform [5], that randomly selects triplets of points. The center of the circle passing through the triplet and its radius are stored in a 3D histogram. Peak finding can be done in 3D or in the 2D histogram of the circle centers. After finding a peak, the best circle center and radius is obtained by computing the medoid [6] of the entries in the peak bin.
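The key geometric step of the randomized Hough transform, computing the circle through a triplet of points, can be sketched as follows (helper name invented; the centre is obtained from the perpendicular-bisector conditions, which are linear in the centre coordinates):

```python
import numpy as np

def circle_through(p1, p2, p3):
    # centre (xc, yc) and radius of the circle through three points:
    # the centre is equidistant from all three, giving two linear equations
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    A = np.array([[x2 - x1, y2 - y1],
                  [x3 - x1, y3 - y1]])
    b = 0.5 * np.array([x2**2 - x1**2 + y2**2 - y1**2,
                        x3**2 - x1**2 + y3**2 - y1**2])
    xc, yc = np.linalg.solve(A, b)
    return xc, yc, np.hypot(xc - x1, yc - y1)
```

In the randomized transform, this computation is repeated for many random triplets, and the resulting *(x*c*, y*c*, R)* values are histogrammed; collinear triplets, for which the linear system is singular, have to be rejected beforehand.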

## *5.1.3 Artificial Retina*

The concept of the "Artificial Retina" was introduced in [7]. Similar to the Hough transform, it relies on a partition of the track parameter space into cells. Figure 5.3 shows a simple example with a track in 2D, specified by slope *k* and intercept *d*.

The intensity of a cell is the sum of the responses of its associated receptors to the hits that are present in the layer. Assuming a Gaussian response, the intensity *R(k, d)* of the cell centered at *(k, d)* is given by:

$$R(k,d) = \sum\_{i=1}^{n} \sum\_{j=1}^{m} \exp\left(-s\_{ij}^{2}/2\sigma\_{i}^{2}\right), \tag{5.9}$$

where *sij* = *yij* − *(kxi* + *d)* is the distance of hit *j* in layer *i* from the ideal track position in layer *i*, and *σi* is a scale parameter that regulates the width of the receptive field in layer *i*. Other response functions are of course possible, and their shape and width can be adjusted for optimal performance. As with the Hough transform, track candidates correspond to the local maxima of intensity in parameter space.

**Fig. 5.3** Left: A cell in the parameter space of *(k, d)* corresponds to an ideal track with these parameters. Right: The corresponding track receptors represent the intercepts of this ideal track with the tracking layers. The hits of a real track are close to the track receptors within the experimental resolution. (Adapted from [11], by permission of Elsevier)

The artificial retina is eminently suitable for high-speed track finding, as it can be highly parallelized and implemented on commercial FPGAs [8, 9]. For applications in the vertex locator of LHCb (Sect. 1.6.1.4) and in a test beam, see [10, 11].
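The response of Eq. (5.9) can be scanned over a grid of cells in a few lines. The following toy sketch (invented layer geometry; one simulated track with *k* = 0.5, *d* = 1, plus a noise hit per layer) shows that the peak cell recovers the track parameters:

```python
import numpy as np

def retina_intensity(k, d, layers_x, hits_y, sigma):
    # Eq. (5.9): sum of Gaussian responses of the cell (k, d) to all hits
    r = 0.0
    for x, ys in zip(layers_x, hits_y):
        s = np.asarray(ys) - (k * x + d)    # residuals in this layer
        r += np.exp(-s**2 / (2 * sigma**2)).sum()
    return r

# five layers; each holds the hit of a track with k=0.5, d=1.0 and a noise hit
layers_x = [1.0, 2.0, 3.0, 4.0, 5.0]
hits_y = [[0.5 * x + 1.0, 3.7] for x in layers_x]
ks = np.linspace(-1, 1, 41)
ds = np.linspace(0, 2, 41)
grid = np.array([[retina_intensity(k, d, layers_x, hits_y, 0.1)
                  for d in ds] for k in ks])
i, j = np.unravel_index(grid.argmax(), grid.shape)  # brightest cell
```

In an FPGA implementation, the cells of this grid are evaluated in parallel, one processing element per cell.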

## *5.1.4 Legendre Transform*

The Legendre transform is an extension of the Hough transform, used to find common tangent lines or tangent circles through the origin to a set *C* of circles [12, 13]. In the context of track finding, the circles in *C* are drift circles in a drift chamber or a drift tube chamber [14], see Fig. 5.4. Assume a drift circle in the plane, with center *(x*w*, y*w*)* and radius *ρ*. A line parameterized by *x* cos *ϕ* + *y* sin *ϕ* − *c* = 0 is tangent to the circle if, and only if, its signed distance from the circle center is equal to ±*ρ*:

$$x\_{\text{w}}\cos\varphi + y\_{\text{w}}\sin\varphi - c = \pm \rho \quad \text{or} \quad c = x\_{\text{w}}\cos\varphi + y\_{\text{w}}\sin\varphi \pm \rho. \tag{5.10}$$

The drift circle in the image space *(x, y)* therefore corresponds to two sinusoids in the Legendre space *(ϕ, c)*, see Fig. 5.4. The further procedure is the same as with the Hough transform.

A circle through the origin can be parameterized by its radius *R* and the angle *Φ*, where the circle center is given by *x*c = *R* cos *Φ*, *y*c = *R* sin *Φ*. Such a circle touches the drift circle with center *(x*w*, y*w*)* and radius *ρ* if, and only if, the squared distance of the circle centers is equal either to *(R* + *ρ)*<sup>2</sup> or to *(R* − *ρ)*<sup>2</sup>:

$$2R\left(\pm\rho + x\_{\text{w}}\cos\Phi + y\_{\text{w}}\sin\Phi\right) = x\_{\text{w}}^{2} + y\_{\text{w}}^{2} - \rho^{2}. \tag{5.11}$$

In order to avoid large radii in the limit of the circle approaching a straight line, the Legendre space is chosen as *(κ, Φ)*, where *κ* = 1*/R* is the curvature of the circle. The drift circle again corresponds to two sinusoids in *(κ, Φ)*:

$$\kappa = \frac{2(\pm \rho + x\_{\text{w}} \cos \Phi + y\_{\text{w}} \sin \Phi)}{x\_{\text{w}}^2 + y\_{\text{w}}^2 - \rho^2}. \tag{5.12}$$

The task of finding a circle through the origin can be reduced to the task of finding a straight line if the Legendre transform is preceded by a conformal transformation, see Sect. 5.1.1, which transforms the circle into a straight line while transforming the drift circles into circles.
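The tangency condition behind Eq. (5.12) can be checked numerically: for either sign, the circle through the origin with curvature *κ(Φ)* touches the drift circle. A small sketch with arbitrary numbers:

```python
import numpy as np

# drift circle: centre (xw, yw), drift radius rho
xw, yw, rho = 3.0, 2.0, 0.4
Phi = 0.5
results = []
for sign in (+1.0, -1.0):
    # Eq. (5.12): curvature of the tangent circle through the origin
    kappa = (2 * (sign * rho + xw * np.cos(Phi) + yw * np.sin(Phi))
             / (xw**2 + yw**2 - rho**2))
    R = 1.0 / kappa
    xc, yc = R * np.cos(Phi), R * np.sin(Phi)     # circle centre
    results.append((np.hypot(xw - xc, yw - yc), R))
# tangency: distance of centres is R + rho (sign +) or |R - rho| (sign -)
```

The positive sign corresponds to external tangency (distance of centers *R* + *ρ*), the negative sign to internal tangency (distance |*R* − *ρ*|).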

## *5.1.5 Cellular Automaton*

A cellular automaton (CA) is a dynamical system where space, time, and variables are discrete [15]. Its five fundamental defining characteristics are enumerated in [16].
Probably the best known CA is Conway's "Game of Life" [17]. Like many, but not all, subsequently proposed cellular automata, it assumes that the cells are located on a regular 2D rectangular lattice. However, this assumption is too restrictive for the application to track finding. Instead, the cells are represented by the nodes of a graph, and the neighbourhood of a cell *Ci* is the set of all cells connected by an edge to *Ci*.

The neighbours of a cell are divided into "inner" and "outer" neighbours such that if *Ci* is an inner neighbour of *Cj* , then *Cj* is an outer neighbour of *Ci*. The possible states of a cell are the non-negative integers. In practice, the states are bounded by some upper limit depending on the number of detector layers. The initial states of all cells are set to zero.

The earliest applications of the CA to track finding are described in [18–21]. With the exception of [18], cells are defined as short track segments connecting hits in adjacent detector layers; segments that skip a layer are sometimes allowed as well [22, 23]. Each cell has an inner hit and an outer hit according to the arrangement of the detector layers. Two cells are neighbours if the outer hit of one cell is the inner hit of the other cell. In addition, the neighbourhood relation can be further restricted by imposing a cut on the angle spanned by the two cells (segments). It is the task of the CA to find chains of neighbouring segments that correspond to actual tracks. This is achieved by the following rule of evolution. At each time step, the state of each cell is augmented by 1 if it has the same state as its inner neighbour. The states of all cells are updated synchronously. When no state changes anymore, the evolution is stopped. At this point, the state of a cell is the length of the longest unbroken chain of segments terminating in this cell. An illustration of the CA is shown in Fig. 5.5.

The actual search for track candidates starts with the cells with the highest state. If such a cell has an inner neighbour with a state that is smaller by 1, it is attached to the track candidate. This procedure is repeated until a cell with no inner neighbour is reached. If, at any point, several neighbours can be attached to the track candidate, either the "best" cell according to some criterion is selected, or the track candidate is split into two, and each candidate is followed independently. This inevitably results in candidates that share hits, requiring a final selection of compatible (nonoverlapping) candidates. This is the topic of Sect. 5.3.
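The evolution rule and the backward collection step can be sketched as follows (a toy implementation on an explicit neighbour graph; all names are invented, and ties are simply broken by taking the first neighbour):

```python
def evolve(inner, n_cells):
    # inner[c]: list of inner neighbours of cell c; all states start at 0
    state = [0] * n_cells
    while True:
        # synchronous update: +1 if a cell has an inner neighbour of equal state
        new = [s + 1 if any(state[i] == s for i in inner[c]) else s
               for c, s in enumerate(state)]
        if new == state:
            return state
        state = new

def collect(inner, state):
    # start from the highest-state cell and follow states decreasing by 1
    c = max(range(len(state)), key=state.__getitem__)
    track = [c]
    while True:
        nxt = [i for i in inner[track[-1]] if state[i] == state[track[-1]] - 1]
        if not nxt:
            return track[::-1]
        track.append(nxt[0])  # ties would be resolved by a quality criterion

# segments 0-1-2 form a chain; segment 3 branches off 1; segment 4 continues 3
inner = [[], [0], [1], [1], [3]]
state = evolve(inner, 5)
```

In this toy graph the longest chain ends in cell 4, and the backward collection returns the candidate 0 → 1 → 3 → 4.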

**Fig. 5.5** Illustration of the cellular automaton algorithm. It creates tracklets and links, numbers them as possibly situated on the same trajectory, and collects tracklets into track candidates. (From [24], by permission of Elsevier)

In tracking detectors with few layers and little redundancy [25], the CA can be complemented by prior information stored in a sector map [26, 27]. In this concept, the sensors are partitioned into sectors. In order to create a sector map, a large training sample of simulated tracks in a chosen angular and momentum region is generated. Two sectors are declared as "friends" if a sufficiently large number of tracks passes through them without hitting any other sector in between. The friendship relation defines an acyclic directed graph that is stored in the sector map.

In the track-finding phase, the hits are sorted into their sectors, and the segment finder is activated. It creates pairs of hits in friendly sectors and keeps those pairs (segments) that pass various cuts, which are also stored in the sector map. If two segments share a hit such that the outer hit of the inner segment is the same as the inner hit of the outer segment, they are passed to the neighbour finder, which applies additional cuts, also stored in the sector map. Finally, all surviving segments form the cells of the CA.

Different sector maps for different geometrical or kinematical regions can be created and applied sequentially. For instance, high-momentum, high-quality tracks can be found first by using a sector map with tighter cuts. After removing their hits, a second sector map with less selective cuts can be used to find the remaining tracks.

Another extension of the CA, termed the 4D CA, is described in [28, 29]. Pairs of segments or triplets of hits are accepted only if the time stamps of the hits are consistent with the hypothesis that they have been created by the same charged particle. For the application of the 4D CA in the CBM experiment, see Sect. 11.2.

## *5.1.6 Neural Networks*

The application of neural networks to track finding was first proposed independently in [30] and [31]. In this approach, the network is of the Hopfield type [32], the neurons being track segments that connect observations in adjacent or nearby layers of the detector; see Sect. 5.1.6.1. More recently, the HEP.TrkX pilot project was established with the aim to develop deep neural networks for track finding in high-multiplicity environments typical for the LHC era [33]. Two deep networks are described in Sects. 5.1.6.2–5.1.6.3. A follow-up project, called Exa.TrkX, was started with a kick-off meeting in June 2019 [34]; a second workshop was held in April 2020 [35]. A reference to first published results can be found in Sect. 5.1.6.3.

#### **5.1.6.1 Hopfield Network**

A Hopfield network is a fully connected network with a single layer of neurons. In the simplest case, the neurons are binary with two states: *si* = ±1*, i* = 1*,...,n*. Each pair *(i, j)* of neurons has a fixed connection weight *wij* with *wij* = *wji* and *wii* = 0. The states of the neurons evolve in discrete time steps according to the following prescription:

$$s\_i(t) = \text{sign}\left[\sum\_{j=1}^n w\_{ij} \, s\_j(t-1)\right]. \tag{5.13}$$

The update can be synchronous (the states are recomputed in parallel) or asynchronous (the states are recomputed sequentially). The network has an associated function *E(s)*, defined as:

$$E(\mathbf{s}) = -\frac{1}{2} \sum\_{i,j=1}^{n} w\_{ij} \, s\_i \, s\_j, \tag{5.14}$$

where *s* = *(s*1*,...,sn)* is the state of the network. In analogy to the theory of spin glasses, *E(s)* is called the energy function of the network. It can be shown that *E(s)* is a non-increasing function of the time *t* and that the update rule Eq. (5.13) leads to a local minimum of *E(s)* [36].

In most applications, including the one discussed here, the aim is to find the global minimum rather than a local one. To this end, thermal noise is introduced in the network. At temperature *T* , the state *s* is Boltzmann distributed with the probability function

$$P(\mathbf{s}) = \frac{1}{Z} \exp\left[-E(\mathbf{s})/T\right], \text{ with } Z = \sum\_{\mathbf{s}} \exp\left[-E(\mathbf{s})/T\right]. \tag{5.15}$$

As the number of possible states rises exponentially with the number of neurons, the partition function *Z* is computed in the mean-field approximation [37], and the thermal average *vi* of *si* is given by:

$$v\_i = \langle s\_i \rangle\_T = \tanh\left(-\frac{1}{T}\frac{\partial E}{\partial v\_i}\right), \tag{5.16}$$

where the states *v* = *(v*1*,...,vn)* are now continuous in the interval *(*−1*,* 1*)*. The definition of the energy function is analogous to Eq. (5.14):

$$E(\mathbf{v}) = -\frac{1}{2} \sum\_{i,j=1}^{n} w\_{ij} \, v\_i \, v\_j, \tag{5.17}$$

and the update is modified accordingly:

$$v\_i(t) = \tanh\left[\frac{1}{T} \sum\_{j=1}^{n} w\_{ij} \, v\_j(t-1)\right]. \tag{5.18}$$

Finding the global minimum or at least a low local minimum of the energy function is facilitated by deterministic annealing [38]. First, the energy function is minimized at high temperature; the temperature is then lowered according to a predefined cooling or annealing schedule. At low temperature, the states of the network are close to either 1 (active) or −1 (inactive).
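The mean-field update of Eq. (5.18) combined with a simple cooling schedule can be sketched in a few lines (toy two-neuron network; the weights, schedule, and sweep count are invented):

```python
import numpy as np

def anneal(w, temps, n_sweeps=50):
    # iterate the mean-field update v_i = tanh((1/T) sum_j w_ij v_j)
    v = np.full(w.shape[0], 0.1)   # weak initial activation
    for T in temps:                # predefined cooling (annealing) schedule
        for _ in range(n_sweeps):
            v = np.tanh(w @ v / T)
    return v

# two mutually supporting neurons end up jointly active (close to +1)
w = np.array([[0.0, 1.0], [1.0, 0.0]])
v = anneal(w, temps=[1.0, 0.5, 0.1])
```

At the end of the schedule, both states are driven close to +1, reflecting the mutual positive weight; with a negative weight they would settle at opposite signs.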

For the purpose of track finding, the problem has to be mapped on the Hopfield network such that the final state of the network corresponds to the solution of the problem. Similar to the CA, the neurons are short track segments connecting space points in adjacent or nearby layers of the tracking detector. To keep the number of neurons manageable, geometric cuts ensure that only segments that can be part of an actual track in the momentum range of interest are used as neurons. The sector map introduced in Sect. 5.1.5 can be used to store the cuts. A track can be considered as an unbroken chain of segments, so only pairs of segments sharing a point qualify for having a positive weight. Consider a triple of points *(k, l, m)* in consecutive layers, defining two neurons *vkl* and *vlm*. The weight *wklm* depends on the angle between the segments and on their length. In [31], it is defined as follows:

$$w\_{klm} = \frac{\cos^{\lambda}\theta\_{klm}}{d\_{kl} + d\_{lm}},\tag{5.19}$$

where *λ* is an odd exponent and *dij* is the length of neuron *vij* , either in space or in the projection to the bending plane. This definition of the weights favours combinations of short segments with small angles. Small weights can be set to zero.
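Eq. (5.19) in code (a small sketch with 2D points; λ = 5 is an arbitrary odd choice):

```python
import numpy as np

def segment_weight(pk, pl, pm, lam=5):
    # Eq. (5.19): weight of the neuron pair (k,l), (l,m)
    a = np.subtract(pl, pk)                 # segment k -> l
    b = np.subtract(pm, pl)                 # segment l -> m
    d_kl, d_lm = np.linalg.norm(a), np.linalg.norm(b)
    cos_t = np.dot(a, b) / (d_kl * d_lm)    # cosine of the bending angle
    return cos_t**lam / (d_kl + d_lm)

# collinear segments are favoured over kinked ones
w_straight = segment_weight((0.0, 0.0), (1.0, 0.0), (2.0, 0.0))
w_kinked   = segment_weight((0.0, 0.0), (1.0, 0.0), (1.5, 1.0))
```

Because the exponent is odd, backward-folding combinations (negative cosine) even receive a negative weight and are thus actively suppressed.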

Constraints on the possible final configurations of the active neurons can be included by adding a cost or penalty term to the energy function of the network. This serves to prevent association of a point to several tracks and to get approximately the expected number of active neurons in the final state. For example, in [39] the following cost term was used:

$$C(\mathbf{v}) = \alpha \left( \sum\_{k,l,n,\, l \neq n} v\_{kl} \, v\_{kn} + \sum\_{k,l,m,\, k \neq m} v\_{kl} \, v\_{ml} \right) - b, \tag{5.20}$$

where *α* is a Lagrange multiplier and *b* is a small constant bias term. Minimizing the first term leads to a competition between neurons starting or ending in the same point, so that in the end at most one should survive. The final update equation is given by Eq. (5.16).

For an evaluation of the performance of the Hopfield network on real data from an experiment at the LEP electron-positron collider, see [39]. For an application to tracking in the ALICE experiment at the LHC, see [40].

#### **5.1.6.2 Recurrent Neural Network**

The dynamics of a particle track can be modelled by a nonlinear state-space model; see Sect. 3.2.3. A track is, thus, very similar to a discrete time series, the main difference being that the observations are not progressing in time but in space. Recurrent neural networks (RNNs), in particular networks of long short-term memory (LSTM) neurons [41, 42], are routinely used for forecasting time series arising in various contexts: financial, industrial, weather, traffic, etc. It is, therefore, to be expected that RNNs can be trained to learn the dynamics of a track and follow it in a way similar to the extended Kalman filter. The crucial difference is that in the Kalman filter, the dynamics is explicitly coded in the system equation and in the distribution of the process noise, whereas in the RNN, the dynamics is implicit in the weights learned by the network on a training set of examples.

The first successful attempt to train an RNN for track finding is described in [43]. Two models are presented. The first is a sequential hit predictor that, given a sequence of past hits, predicts the position of the next hit. The second model augments the first one by predicting the covariance matrix of the hit, using a Gaussian distribution. Both predictors are implemented as an LSTM layer, followed by a fully connected (FC) layer. The scheme of the Gaussian predictor is shown in Fig. 5.6. The training data for the RNN network and the GNN network in Sect. 5.1.6.3 were generated by the ACTS package [44].

#### **5.1.6.3 Graph Neural Network**

Like the CA described in Sect. 5.1.5, track finding with the graph neural network (GNN) is based on the representation of the tracking data by a graph [43]. The detector hits are the nodes (vertices) of the graph, and two nodes are connected by

$$(\vec{r}\_0, \vec{r}\_1, \ldots, \vec{r}\_{n-1}) \;\rightarrow\; \text{LSTM} \;\rightarrow\; \text{FC} \;\rightarrow\; \left((\hat{r}\_1, \Sigma\_1), (\hat{r}\_2, \Sigma\_2), \ldots, (\hat{r}\_n, \Sigma\_n)\right)$$

**Fig. 5.6** Diagram of the Gaussian hit predictor model that takes a sequence of 3D coordinates as input and produces bivariate Gaussian probability distributions as next-step predictions. The architecture is the same as the basic hit predictor, but the model provides additional output that parameterizes the Gaussian covariance matrix. (Adapted from [43], by permission of the author)

**Fig. 5.7** Diagram of the Graph Neural Network model which begins with an input transformation layer (IL) and has a number of recurrent iterations of alternating EdgeNet (EN) and NodeNet (NN) units. In this case, the final unit is an EN, making this a segment classifier model. (Adapted from [43], by permission of the author)

an edge if they are compatible according to some criterion. Such a criterion can be stored in a sector map as the one described in Sect. 5.1.5. The graph is fed into the GNN that consists of an input transformation layer (IL) followed by alternating units of edge networks (ENs) and node networks (NNs), each implemented as a multi-layer perceptron with two layers, see Fig. 5.7.

An EN computes a new weight for every edge from the features of the end nodes while an NN computes new features for every node from the current features and the edge-weighted aggregated features of all connected nodes in the adjacent layers. The network can be used to classify whether nodes/hits or edges belong to a track or not. The network in Fig. 5.7 classifies edges, as the last unit is an EN. An extension of the work in [43] to track seeding and hit labelling with GNNs is described in [45].

## *5.1.7 Track Following and the Combinatorial Kalman Filter*

In track finding methods such as the Hough transform or the CA, the complete set of all hits serves as the primary input. Such methods have, therefore, been dubbed "global" [46]. This is in contrast to methods that find tracks locally or sequentially, one after the other. The most prominent example of a sequential method is track following.

In track following, a track candidate starts from a "seed", i.e., a short track segment. This seed can in principle be anywhere in the tracking detector. Generating seeds in the outer layers of the trackers has the advantage of smaller occupancy and less background from low-momentum tracks that spiral in the inner layers of the tracker. Generating seeds in the inner layers, which are frequently pixel layers, has the advantage of using 3D hits with higher resolution both in the bending plane and in the longitudinal direction. As will be described in more detail in Chap. 10, ATLAS and CMS, the two general-purpose experiments at the LHC, have opted for the second solution.

The generation of seeds is often a simple combinatorial search for compatible triplets or quadruplets of hits, potentially assisted by a CA [47], and includes information about the size and position of the beam spot; see Sect. 7.1. Some examples of seed generation algorithms will be given in Chap. 10.

Once the seeds have been found, each seed is then followed through the tracker by extrapolating it toward the outside of the tracker or toward the production region, depending on where the seed is situated. After each extrapolation step, compatible hits are searched for and attached to the track candidate.

The progressive track recognition described in [48] can be extended to a combinatorial Kalman filter (CKF), introduced in [49, 50] under the name "Concurrent Track Evolution", see Fig. 5.8. First, each seed is fitted with one of the methods described in Chap. 6. The parameters and the covariance matrix of the seed are then extrapolated to the nearest tracker layer, taking into account interactions with the detector material; see Sects. 4.3, 4.4 and 4.5. The hits in the sensor where the extrapolated trajectory intersects the layer are tested for compatibility with the predicted track parameters using a chi-square statistic; see Sects. 3.2.3 and 6.1.2. If *n* compatible hits are found, *n* copies of the predicted state, i.e., its track parameters and its covariance matrix, are generated, and each one of them is updated with one of the *n* hits according to the Kalman filter, Eqs. (3.29) and (3.30) or Eqs. (3.31) and (3.32). The original state is also kept and marked as having a missing hit, giving a total of *n* + 1 track candidates. This procedure is iterated on each track candidate until the last layer of the tracker is reached or the count of missing hits in a candidate exceeds a preset threshold, typically one or two.

**Fig. 5.8** Schematic view of concurrent track evolution in a five-layered part of a tracking system with hexagonal drift cells, which is traversed by three particles, labelled T1, T2 and T3. The simulated drift time isochrones are indicated by circles. The propagation proceeds upstream from the right to the left and starts with a seed of hits from track T1 outside of the picture. (From [49], by permission of Elsevier)
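The branching logic can be illustrated with a toy one-dimensional CKF for straight-line tracks. This is a sketch only: the state *(k, d)*, the linear measurement model, the chi-square gate, and the missing-hit allowance all use invented values, not any experiment's actual configuration:

```python
import numpy as np

def ckf(layers, seed, seed_cov, sigma=0.1, gate=9.0, max_missing=1):
    # layers: list of (x, [hit y-values]); track model y = k*x + d
    cands = [(seed, seed_cov, [], 0)]        # state, covariance, hits, missing
    for x, hits in layers:
        H = np.array([[x, 1.0]])             # measurement matrix
        new = []
        for s, C, used, miss in cands:
            r_cov = (H @ C @ H.T).item() + sigma**2
            for y in hits:
                res = y - (H @ s).item()
                if res**2 / r_cov < gate:    # chi-square compatibility test
                    K = (C @ H.T).ravel() / r_cov        # Kalman gain
                    new.append((s + K * res, C - np.outer(K, H @ C),
                                used + [(x, y)], miss))
            if miss < max_missing:           # keep a missing-hit branch
                new.append((s, C, used, miss + 1))
        cands = new
    # best candidate: most hits, then fewest missing hits
    return max(cands, key=lambda c: (len(c[2]), -c[3]))

# true track y = 0.5x + 1; layer 3 is inefficient; two outlier hits
layers = [(1.0, [1.5, 10.0]), (2.0, [2.0]), (3.0, []),
          (4.0, [3.0]), (5.0, [3.5, -5.0])]
seed = np.array([0.5, 1.0])                  # seed close to the true track
best = ckf(layers, seed, np.diag([0.25, 0.25]))
```

In this toy event the surviving candidate picks up the four true hits, tolerates the inefficient layer as one missing hit, and rejects both outliers at the gate.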

In the course of the combinatorial Kalman filter, it may be necessary to limit the number of active candidates for reasons of memory and/or speed. In this case, the "worst" track candidates are discarded and not followed anymore. The quality of a track candidate can be measured by a combination of its total number of hits, its number of missing hits, and its total chi-square *χ*<sup>2</sup><sub>tot</sub> (Sect. 6.4). The tuning of the combination is usually performed on simulated data, where the correct association of hits to tracks is known.

Avoiding a combinatorial explosion is an important issue in experiments with high track multiplicities. Therefore, the CKF in, for instance, ATLAS and CMS starts with seeds in the pixel detector with its very high resolution in all three spatial dimensions. As a consequence, the compatibility test in the first non-pixel layer rejects wrong hits with a high probability. As the CKF proceeds, the state becomes more precisely known, and the probability of attaching a wrong hit becomes even smaller. For more details, see Sects. 10.2 and 10.3.

## *5.1.8 Pattern Matching*

Pattern or template matching is mostly used in real-time track finding for the purpose of triggering on charged tracks (see Sect. 5.2). It can be applied to detectors with a layer structure, each layer being segmented into sectors or bins [51, 52]; see also Sect. 5.1.5. A charged particle crossing the detector generates hits in certain bins of the layers, thereby creating a pattern of "on" and "off" bins, which can be coded as strings of zeros and ones. The set of physically meaningful patterns is generated by extensive simulations and stored in a pattern bank.

The number of patterns to be stored depends on the geometry and size of the detector, on the characteristics of the tracks to be found, and on the granularity of the binning. For the purpose of triggering a lower bound on the momentum is usually imposed; therefore, only the patterns of tracks with momentum above the threshold need to be stored. The granularity of the binning determines how well two nearby tracks can be separated, and therefore depends on the occupancy of the layer. If the binning is very coarse, for instance, only one bin per module in a silicon strip tracker, fewer patterns have to be stored, but two nearby tracks cannot be resolved (see Fig.1 in [52]). If the binning is very fine, for instance, a bin for each strip in the extreme case, nearby tracks can be resolved almost perfectly, but the number of patterns is far too large to store. Also, higher track multiplicity implies smaller track separation on average, which in turn requires finer granularity. In any case, the granularity has to be optimized by extensive simulation studies.

In an event, a particular configuration of hits is generated and translated into a pattern. This pattern is compared to all patterns in the bank, and matching patterns constitute track candidates. The sketch in Fig. 5.9 shows an example of a pattern generated by two tracks and some patterns stored in the pattern bank. To cope with inefficiencies of the tracking detector, it may be necessary to accept partial matches.

**Fig. 5.9** Top: Two tracks in a detector with four layers creating two patterns. Bottom: Four patterns in the pattern bank
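A pattern bank lookup with tolerance for detector inefficiency can be sketched like this (toy encoding with one bin index per layer; in a real system the bank comes from simulation and the comparison runs in associative memory, not in a Python loop):

```python
def matches(event, bank, min_layers=3):
    # event and bank patterns: one bin index per layer (None = no hit)
    found = []
    for pat in bank:
        n = sum(e is not None and e == p for e, p in zip(event, pat))
        if n >= min_layers:        # accept partial matches
            found.append(pat)
    return found

# bank of physically meaningful patterns (from simulation, in a real system)
bank = [(0, 1, 2, 3), (3, 3, 3, 3), (0, 0, 1, 1)]
full    = matches((0, 1, 2, 3), bank)       # perfect match
partial = matches((0, 1, None, 3), bank)    # one inefficient layer
```

Lowering `min_layers` trades fake matches against efficiency, mirroring the granularity trade-off discussed above.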

As the number of patterns that have to be stored can be very large, the method is feasible only if there is sufficient memory and if the comparison can be made very fast and in a highly parallel mode. The matching has therefore to be implemented in VLSI hardware, using content-addressable or associative memories [51, 52]. The patterns can be arranged in a tree structure, starting with coarse granularity of the sectors and proceeding to finer granularity. It is also possible to store patterns with variable resolution [53].
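As an illustration, the matching step can be sketched in ordinary software. The following Python fragment is only a toy stand-in for the associative-memory hardware described above; the representation of patterns as tuples of hit-bin indices and all numbers are invented:

```python
# Toy pattern-bank matching (a software stand-in for associative-memory
# hardware). A pattern is a tuple of hit-bin indices, one per layer; an
# event is a set of (layer, bin) hits.

def matches(hits, pattern, max_missing=1):
    """True if at most `max_missing` layer bins of the pattern have no hit,
    which allows partial matches to cope with detector inefficiencies."""
    missing = sum(1 for layer, b in enumerate(pattern) if (layer, b) not in hits)
    return missing <= max_missing

def find_candidates(hits, bank, max_missing=1):
    """Compare the event to all patterns in the bank; matching patterns
    constitute track candidates."""
    return [p for p in bank if matches(hits, p, max_missing)]

# One track crossing four layers, with a bank of three stored patterns:
hits = {(0, 0), (1, 1), (2, 2), (3, 3)}
bank = [(0, 1, 2, 3), (0, 1, 1, 3), (3, 2, 1, 0)]
print(find_candidates(hits, bank))  # exact match plus one partial match
```

The real systems perform all bank comparisons in parallel in hardware; the sequential loop here only illustrates the logic.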

Pattern matching was used for real-time track finding both in the vertex detector and in the drift chamber of the CDF experiment; see Sect. 5.2.1. Later applications include the FTK (Fast Tracker) for the ATLAS experiment, see Sect. 5.2.2, and a proposed track trigger for the new CMS tracker that will be installed for the HL-LHC, see Sect. 5.2.3.

## **5.2 Online Track Finding**

An early proposal for online track finding by dedicated hardware is the one described in [52]. It is based on matching hit patterns in the tracking detector with a pattern bank stored in associative memory; see Sect. 5.1.8. As field programmable gate arrays (FPGAs) were still in their infancy at the time of the publication, the associative memory (AM) is an array of 400 custom VLSI chips that can hold O(10<sup>5</sup>) patterns [51]. The pattern matching is organized as a tree search through different levels of spatial resolution [52]. This was soon followed by the actual implementation in the CDF experiment at the Tevatron collider.

## *5.2.1 CDF Vertex Trigger*

The Silicon Vertex Tracker (SVT, [54]) was designed to provide track impact parameter information for the level-2 trigger of the CDF experiment [55]. It was realized in custom hardware [56, 57]. Track finding is done by an AM, refining the information from the XFT track processor (see below) that finds tracks in the central drift chamber for the level-1 trigger. Tracks are fitted by a farm of digital signal processors using a linearized fit that requires only scalar products; see Sect. 6.1.6.

The upgraded SVT, now renamed Silicon Vertex Trigger, is described in [58, 59]. If an event passes the level-1 trigger, the SVT extrapolates the XFT tracks, associates hits in the silicon vertex detector, and computes the transverse impact parameter. Its average latency is 24 µs. The hit association is performed by custom AM chips, the linearized track fit in FPGAs.

The eXtremely Fast Tracker (XFT, [60, 61]) is a track processor that finds tracks with high transverse momentum in the central drift chamber of the experiment [62]. It is highly parallel and reports its results every 132 ns, in time for the trigger level-1 decision. The XFT works with hits in the four axial superlayers. Track identification is done in two stages called the Finder and the Linker. The Finder searches for high-*p*<sub>T</sub> segments in each of the superlayers, and the Linker searches for high-*p*<sub>T</sub> track candidates by combining segments from at least three of the four superlayers. Both stages use pattern matching to accomplish their tasks.

## *5.2.2 ATLAS Fast Tracker*

The Fast Tracker (FTK) system of the ATLAS experiment is designed for global track reconstruction after each level-1 trigger [63, 64]. It enables the level-2 trigger to gain rapid access to tracking results. The system is based on the Silicon Vertex Trigger of the CDF experiment; see the preceding subsection. It uses hit data from four pixel layers and from both sides of four silicon strip layers, twelve in total. The tracker volume is split into 64 regions or towers, which are processed independently. The sensors are divided into "superstrips" with a coarser resolution.

Data processing starts with clustering in the pixel and in the strip sensors. The clustering algorithm is optimized for execution in an FPGA [65]. After clustering, a track is represented by a list of superstrips that corresponds to a pattern in the custom AM chip [66]. Pattern matching produces track candidates at coarse resolution; these are then refined by a high-resolution track fit in an FPGA. Missing layers are allowed in both stages. The number of patterns that have to be stored is currently of the order of a billion. This number would have to grow by a factor of about 10 at the HL-LHC [63], because of the larger number of channels in the tracker and the higher track multiplicity, which requires finer granularity (see Sect. 5.1.8). For this and other reasons, the FTK will not be upgraded for operation at the HL-LHC; instead, the focus will be on the acceleration of the track reconstruction software [67, 68]; see also Sect. 10.2.

The fitted tracks are sent to the Second Stage Board, where they are extrapolated to the remaining silicon layers and fitted again. Finally, duplicate tracks are removed based on the number of common hits and the *χ*<sup>2</sup> [64].

## *5.2.3 CMS Track Trigger*

Starting in 2026, the luminosity of the LHC is expected to increase by a factor of about ten above the current design value. The current CMS silicon tracker, having been in operation since 2009, cannot withstand the radiation level predicted for the HL-LHC and has no triggering capability. It will, therefore, be replaced by a newly designed tracker [69]. The new design features so-called *p*<sup>T</sup> modules as the basic sensing devices. A *p*<sup>T</sup> module consists of two closely stacked sensors, either a pixel and a strip sensor (PS) or two strip sensors (2S); see [70] and Fig. 5.10. A charged particle crossing the stack generates a stub that consists of two clusters. Tracks of sufficiently high transverse momentum (*p*<sup>T</sup> *>* 2 GeV) have little curvature and a small offset in the sensor stack, in contrast to tracks with smaller *p*T, which are bent

more strongly and have a larger offset. Stubs that pass the cut on *p*<sup>T</sup> are the input to the track trigger.
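The size of the stub offset can be estimated in the small-angle approximation for a track from the beam line. The sketch below is purely illustrative; the field value, radius, sensor spacing and cut are invented numbers, not the CMS specification:

```python
import math

def stub_offset_mm(pt_gev, radius_m, spacing_mm, b_tesla=3.8):
    """Approximate transverse offset (mm) between the two clusters of a stub,
    for a track from the beam line. Illustrative only; all numbers assumed."""
    # bending radius r = pT / (0.3 B)   (r in m, pT in GeV, B in T)
    r = pt_gev / (0.3 * b_tesla)
    # angle between track and radial direction at radius R: sin(psi) = R / (2r)
    psi = math.asin(radius_m / (2.0 * r))
    # offset across the sensor spacing
    return spacing_mm * math.tan(psi)

def accept_stub(pt_gev, radius_m, spacing_mm, cut_mm):
    """A stub passes the pT cut if its cluster offset is below the threshold."""
    return stub_offset_mm(pt_gev, radius_m, spacing_mm) < cut_mm

# Offset of a 2 GeV track at R = 0.6 m with 1.8 mm sensor spacing:
print(round(stub_offset_mm(2.0, 0.6, 1.8), 3))
```

The offset shrinks with growing *p*<sub>T</sub>, so a simple window cut on the cluster offset implements the momentum threshold locally in the module.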

Three concepts are explored for reconstructing tracks at the level-1 trigger, two using an all-FPGA system [71], the third one using a combination of AM and FPGAs [72].

#### **5.2.3.1 Time Multiplexing**

The all-FPGA system is based on the principle of time multiplexing [73]. The fundamental idea is that several sources send their information from a given bunch crossing to a single destination for processing. The architecture of the system has two layers: the first extracts and preprocesses the stubs and sends them to the second layer, which contains the track finding processors.

Two track finding algorithms are investigated in the time-multiplexed track trigger, tracklets and Hough transform. The tracklet algorithm has the following stages:


The second algorithm [71] has the following stages:


#### **5.2.3.2 Pattern Matching**

Pattern matching is done in parallel in 48 regions called trigger towers [72]. Each tower has two boards for pattern recognition and track fitting. Pattern recognition is done with lower resolution data, which are compared to predefined patterns by a content addressable memory. Only patterns corresponding to tracks with *p*<sup>T</sup> *>* 2 GeV are stored. If a match is found, the corresponding high-resolution data are sent to the track fitting module. The linearized track fit (Sect. 6.1.6) runs in the FPGA and computes the helix parameters and the *χ*2.

# **5.3 Candidate Selection**

After track finding, track candidates may share hits. If two candidates share more hits than is deemed acceptable, for instance more than one, the track candidates are called incompatible. The incompatibility relation can be represented by an undirected graph *(V, E)*, where the *n* vertices *v*<sub>*i*</sub> ∈ *V*, *i* = 1*,...,n* are the track candidates. Two incompatible track candidates *v*<sub>*i*</sub> and *v*<sub>*j*</sub> are connected by the edge *e*<sub>*ij*</sub> = *e*<sub>*ji*</sub>, defined as the unordered pair *(v*<sub>*i*</sub>*, v*<sub>*j*</sub>*)*. The number of compatible track candidates can be maximized by finding an independent set of vertices of maximal size, i.e., a subset *V*<sub>1</sub> ⊆ *V* of vertices, no two of which are connected by an edge.
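A minimal sketch of this construction, with invented hit sets; the exhaustive search for the independent set is feasible only for small graphs, since the problem is NP-hard (see below):

```python
from itertools import combinations

def incompatibility_edges(candidates, max_shared=1):
    """candidates: list of hit sets. Candidates sharing more than
    `max_shared` hits are incompatible and are connected by an edge."""
    return {(i, j) for i, j in combinations(range(len(candidates)), 2)
            if len(candidates[i] & candidates[j]) > max_shared}

def max_independent_set(n, edges):
    """Largest subset of vertices no two of which are connected by an edge,
    found by exhaustive search over subsets of decreasing size."""
    for r in range(n, 0, -1):
        for subset in combinations(range(n), r):
            s = set(subset)
            if not any(i in s and j in s for i, j in edges):
                return s
    return set()

candidates = [{1, 2, 3, 4}, {3, 4, 5, 6}, {7, 8, 9, 10}]
edges = incompatibility_edges(candidates)   # candidates 0 and 1 share two hits
print(max_independent_set(len(candidates), edges))
```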

Alternatively, the graph can represent the compatibility relation, in which case two compatible tracks/vertices are connected by an edge. The problem is then to find a maximum clique, i.e., a fully connected subset *V*<sup>2</sup> ⊆ *V* of vertices of maximal size.

Both problems are NP-hard [74], so that finding a maximum independent set or maximum clique can be very time-consuming for large graphs. A set of C routines for finding cliques in a compatibility graph can be found in [75]. The fastest exact algorithm for finding independent sets published up to now is the one in [76]. An independent set can also be obtained by finding a vertex cover, i.e., a set *V*<sub>3</sub> ⊆ *V* of vertices the removal of which leaves an independent set. In [77], it was shown that there is a one-to-one correspondence between minimal vertex covers or maximal independent sets and steady states of Hopfield networks [32] with nonpositive weights. In addition, such a network converges to its steady state in at most 2*n* steps. There may be many minimal vertex covers of different size, and the steady state reached depends on the initial state of the network, of which there are 2<sup>*n*</sup>. Finding the minimal vertex cover by an exhaustive or random search over all initial states is, therefore, computationally infeasible for large *n*.

In any case, finding the largest independent set is not necessarily the best approach for finding an "optimal" set of track candidates, as the quality of the track candidates (see Sect. 6.4) should be taken into account, too. If the quality of the candidate *vi* is quantified by a positive weight *wi*, the problem is now to find an independent set that maximizes the sum of weights (MWIS). Like its unweighted counterpart, the MWIS problem is NP-hard. For a recent approximative solution and numerous references to previous work, see [78].
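A simple greedy heuristic illustrates the weighted problem. It is only a fast approximation (and can be suboptimal), not the exact MWIS solver of [78] or the recurrent-network method referenced below:

```python
# Greedy sketch for the maximum-weight independent set (MWIS): repeatedly
# pick the candidate with the highest quality weight and discard its
# incompatible neighbours. Fast, but only approximate (MWIS is NP-hard).

def greedy_mwis(weights, edges):
    adj = {i: set() for i in range(len(weights))}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    remaining = set(adj)
    chosen = set()
    while remaining:
        v = max(remaining, key=lambda i: weights[i])  # heaviest remaining vertex
        chosen.add(v)
        remaining -= adj[v] | {v}                     # drop v and its neighbours
    return chosen

# Three candidates with quality weights; candidate 0 is incompatible with 1:
print(greedy_mwis([0.9, 0.5, 0.7], {(0, 1)}))
```

Note that greedily taking the heaviest vertex can miss the optimum, e.g., for weights (3, 2, 2) with vertex 0 incompatible with both others, the greedy choice {0} has weight 3 while {1, 2} has weight 4.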

If the weight *wi* is mapped to a quality indicator *qi* ∈ [0*,* 1], the network in [77] can be generalized to a recurrent network with annealing that aims to find the set of compatible vertices with the largest sum of weights [79]; see also Sect. 11.1.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Chapter 6 Track Fitting**

**Abstract** Track fitting is an application of established statistical estimation procedures with well-known properties. For a long time, estimators based on the least-squares principle were—with some notable exceptions—the principal methods for track fitting. More recently, robust and adaptive methods have found their way into the reconstruction programs. The first section of the chapter presents least-squares regression, the extended Kalman filter, regression with breakpoints, general broken lines and the triplet fit. The following section discusses robust regression by the M-estimator, the deterministic annealing filter, and the Gaussian-sum filter for electron reconstruction. The next section deals with linearized fits of space points to circles and helices. The chapter concludes with a section on track quality and shows how to test the track hypothesis, how to detect outliers, and how to find kinks in a track.

# **6.1 Least-Squares Fitting**

In this section, three methods are described that are based on the least-squares (LS) principle for estimation of the track parameters. They are linear or linearized regression, the (extended) Kalman filter, and regression with breakpoints. In the case of a strictly linear model, they are mathematically equivalent. With a nonlinear model, there may be small differences because of different choices of the expansion point(s). In the following, the more frequent case of nonlinear models will be described, which contains the linear model as a special case.

# *6.1.1 Least-Squares Regression*

Assume that track finding has produced a track candidate, i.e., a collection of *n* measurements *m*<sub>1</sub>*,..., m*<sub>*n*</sub> in different layers of the tracking detector, along with their respective covariance matrices *V*<sub>1</sub>*,..., V*<sub>*n*</sub>. The measurements may have different dimensions *m*<sub>*i*</sub> and usually have different covariance matrices, resulting in a heteroskedastic model. The initial parameters of the track to be fitted to the measurements are denoted by *p*. They are assumed to be tied to a reference surface (layer 0). The regression model has the following form:

$$
m = f(p) + \varepsilon, \quad \mathsf{E}\left[\varepsilon\right] = 0, \quad \mathsf{Var}\left[\varepsilon\right] = V, \tag{6.1}
$$

where *m* = *(m*<sub>1</sub>*,..., m*<sub>*n*</sub>*)*<sup>T</sup> and *f* = *(f*<sub>1</sub>*,..., f*<sub>*n*</sub>*)*<sup>T</sup>. The function *f*<sub>*k*</sub> maps the initial parameters *p* to the measurement *m*<sub>*k*</sub> in layer *k*. It is the composition of the track propagators up to layer *k* (see Sect. 4.3) and the function that maps the track state to the measurement (see Sect. 3.2.3):

$$f_k = h_k \circ f_{k|k-1} \circ f_{k-1|k-2} \circ \cdots \circ f_{2|1} \circ f_{1|0}.\tag{6.2}$$

Its Jacobian *F<sup>k</sup>* is given by the product of the respective Jacobians:

$$F_k = H_k\, F_{k|k-1}\, F_{k-1|k-2} \cdots F_{2|1}\, F_{1|0}.\tag{6.3}$$

The covariance matrix *V* is the sum of two parts, *V* = *V*<sub>M</sub> + *V*<sub>S</sub>. The first part is the joint covariance matrix of all measurement errors. These can virtually always be assumed to be uncorrelated across different layers, so that *V*<sub>M</sub> is block-diagonal:

$$V_{\mathrm{M}} = \operatorname{blkdiag}\left(V_1, \dots, V_n\right), \quad \text{with} \quad V_i = \mathsf{Var}\left[\varepsilon_i\right], \quad i = 1, \dots, n. \tag{6.4}$$

The second part *V*<sub>S</sub> is the joint covariance matrix of the process noise caused by material effects, mainly multiple Coulomb scattering; see Sect. 4.5. As in Sect. 3.2.3, the process noise encountered during the propagation from layer *k* − 1 to layer *k* is denoted by *γ*<sub>*k*</sub> and its covariance matrix by *Q*<sub>*k*</sub>. The integrated process noise up to layer *k* is denoted by *Γ*<sub>*k*</sub>. Linearized error propagation along the track gives the following expression for the covariance matrix of *Γ*<sub>*k*</sub>:

$$\mathsf{Var}\left[\varGamma_{k}\right] = \sum_{i=1}^{k} F_{k|i}\, Q_{i}\, F_{k|i}^{\mathsf{T}},\tag{6.5}$$

with

$$F\_{k|i} = F\_{k|k-1} F\_{k-1|k-2} \cdots F\_{i+1|i} \quad \text{for } i < k \quad \text{and} \quad F\_{k|k} = I. \tag{6.6}$$

If *i* < *k*, *Γ*<sub>*i*</sub> and *Γ*<sub>*k*</sub> are correlated, with the following cross-covariance matrix:

$$\mathsf{Cov}\left[\varGamma_{i},\varGamma_{k}\right] = \sum_{j=1}^{i} F_{i|j}\, Q_{j}\, F_{k|j}^{\mathsf{T}}.\tag{6.7}$$

Error propagation from the track states to the measurements gives the final block structure of *V*S:

$$V_{\mathrm{S}} = \begin{pmatrix} C_{11} & C_{12} & \cdots & C_{1n} \\ C_{21} & C_{22} & \cdots & C_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ C_{n1} & C_{n2} & \cdots & C_{nn} \end{pmatrix}, \quad \text{with} \quad C_{ik} = \begin{cases} H_k \,\mathsf{Var}\left[\varGamma_{k}\right] H_k^{\mathsf{T}}, & \text{if } i = k, \\ H_i \,\mathsf{Cov}\left[\varGamma_{i}, \varGamma_{k}\right] H_k^{\mathsf{T}}, & \text{if } i < k, \\ C_{ki}^{\mathsf{T}}, & \text{if } i > k. \end{cases}\tag{6.8}$$
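The construction of the full covariance matrix *V* = *V*<sub>M</sub> + *V*<sub>S</sub> can be illustrated numerically. The sketch below implements Eqs. (6.4)–(6.8) for a toy model with four layers and 1D measurements; all matrices and numbers are invented:

```python
import numpy as np

# Toy construction of V = V_M + V_S following Eqs. (6.4)-(6.8).
# F_step[k] stands for F_{k+1|k}, Q[j] for Q_{j+1}; layer 0 is the reference.
n, dim = 4, 2
rng = np.random.default_rng(0)
F_step = [np.eye(dim) + 0.1 * rng.standard_normal((dim, dim)) for _ in range(n)]
Q = [0.01 * np.eye(dim) for _ in range(n)]        # process noise per step
H = [np.array([[1.0, 0.0]]) for _ in range(n)]    # 1D measurement of first component
sigma2 = [0.04] * n                               # measurement variances

def F_between(i, k):
    """F_{k|i} = F_{k|k-1} ... F_{i+1|i}, Eq. (6.6); identity for i == k."""
    M = np.eye(dim)
    for j in range(i, k):
        M = F_step[j] @ M
    return M

def cov_gamma(i, k):
    """Cov[Gamma_i, Gamma_k] for i <= k, Eqs. (6.5) and (6.7)."""
    return sum(F_between(j, i) @ Q[j - 1] @ F_between(j, k).T
               for j in range(1, i + 1))

V_M = np.diag(sigma2)                             # Eq. (6.4), block-diagonal
V_S = np.zeros((n, n))                            # Eq. (6.8)
for i in range(1, n + 1):
    for k in range(i, n + 1):
        c = (H[i - 1] @ cov_gamma(i, k) @ H[k - 1].T)[0, 0]
        V_S[i - 1, k - 1] = V_S[k - 1, i - 1] = c
V = V_M + V_S                                     # full covariance of Eq. (6.1)
```

The resulting *V* is symmetric and positive definite, as required for the weight matrix of the fit.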

Estimation of the initial state *p* is usually done by the Gauss-Newton method; see Sect. 3.2.2. A first approximation *p*˜<sub>0</sub> of *p* has to be delivered by the track finder, to be used as the expansion point for computing the Jacobians in Eq. (6.3). With the weight matrix *G* = *V*<sup>−1</sup>, the updated estimate *p*˜<sub>1</sub> is obtained via:

$$
\tilde{p}_1 = \tilde{p}_0 + \left(F_0^{\mathsf{T}} G F_0\right)^{-1} F_0^{\mathsf{T}} G \left[m - f(\tilde{p}_0)\right], \quad F_0 = \left.\frac{\partial f}{\partial p}\right|_{\tilde{p}_0}.\tag{6.9}
$$

The corresponding chi-square statistic is given by:

$$\chi^2 = \left[m - f\left(\tilde{p}_1\right)\right]^{\mathsf{T}} G \left[m - f\left(\tilde{p}_1\right)\right].\tag{6.10}$$

It is approximately *χ*<sup>2</sup>-distributed with *M* − *m* degrees of freedom, where *M* = dim*(G)* is the sum of all *m*<sub>*i*</sub>, and *m* is the number of estimated track parameters, usually five.

The Jacobians are recomputed at the new expansion point *p*˜<sub>1</sub>, and an updated estimate *p*˜<sub>2</sub> is computed. This two-step procedure is iterated until the absolute difference |*χ*<sup>2</sup><sub>*k*+1</sub> − *χ*<sup>2</sup><sub>*k*</sub>| is below a predefined threshold or a maximal number of iterations is reached.

The *p*-value of the chi-square statistic (see Sect. 3.2.1) is the primary quality indicator of the track fit; see Sect. 6.4. A small *p*-value indicates a misspecification of the model or at least one outlying measurement that does not fit to the track. The standardized residuals or pulls can be used to look for outliers. The residuals *r* of the fit are defined by:

$$r = m - f(\tilde{p}),\tag{6.11}$$

where *p***˜** is the final estimate after convergence. Their covariance matrix is obtained by linearized error propagation and is approximately equal to:

$$R = \mathsf{Var}\left[r\right] \approx V - F\left(F^{\mathsf{T}} G F\right)^{-1} F^{\mathsf{T}}.\tag{6.12}$$

Note that *R* has rank *M* −*m* and thus cannot be inverted. The standardized residuals *s* are given by:

$$s = r ./ \sqrt{\text{diag}\,(\mathbf{R})},\tag{6.13}$$

where ./ denotes the point-wise division of two vectors or matrices (MATLAB® convention). They are approximately distributed according to a standard normal distribution. Outliers are characterized by an unusually large value of *s*.
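The fitting and testing steps of Eqs. (6.9)–(6.13) can be illustrated with a toy straight-line track. All numbers are invented, and no process noise is included (*V* = *V*<sub>M</sub>):

```python
import numpy as np

# Gauss-Newton fit of a straight line y = a + b*x to measurements in four
# layers, followed by chi-square and pulls (Eqs. (6.9)-(6.13)). Toy numbers.
x = np.array([1.0, 2.0, 3.0, 4.0])               # layer positions
m = np.array([1.1, 1.9, 3.2, 3.9])               # measurements
V = np.diag([0.04] * 4)                          # measurement covariance
G = np.linalg.inv(V)                             # weight matrix G = V^{-1}

def f(p):                                        # track model
    return p[0] + p[1] * x

F = np.column_stack([np.ones_like(x), x])        # Jacobian (constant: linear model)

p = np.zeros(2)                                  # initial expansion point
for _ in range(5):                               # Gauss-Newton steps, Eq. (6.9)
    p = p + np.linalg.inv(F.T @ G @ F) @ F.T @ G @ (m - f(p))

r = m - f(p)                                     # residuals, Eq. (6.11)
chi2 = r @ G @ r                                 # chi-square, Eq. (6.10), 2 dof here
R = V - F @ np.linalg.inv(F.T @ G @ F) @ F.T     # residual covariance, Eq. (6.12)
pulls = r / np.sqrt(np.diag(R))                  # standardized residuals, Eq. (6.13)
print(p, chi2)
```

For this strictly linear model, a single Gauss-Newton step already yields the exact least-squares solution; the loop is kept to show the iteration structure of the nonlinear case.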

The residuals *r*, i.e., the differences between the measured and the fitted track positions, are a superposition of measurement noise and process noise (mainly multiple scattering). The vector *r* of residuals can be further decomposed into two parts corresponding to the two types of noise [1]:

$$
r = r_{\mathrm{M}} + r_{\mathrm{S}}, \quad \text{with}\tag{6.14}
$$

$$r_{\mathrm{M}} = V_{\mathrm{M}}\, G'\, m, \quad R_{\mathrm{M}} = \mathsf{Var}\left[r_{\mathrm{M}}\right] = V_{\mathrm{M}}\, G'\, V_{\mathrm{M}},\tag{6.15}$$

$$r_{\mathrm{S}} = V_{\mathrm{S}}\, G'\, m, \quad R_{\mathrm{S}} = \mathsf{Var}\left[r_{\mathrm{S}}\right] = V_{\mathrm{S}}\, G'\, V_{\mathrm{S}},\tag{6.16}$$

$$G' = G\, R\, G.\tag{6.17}$$

The two noise contributions can thus be checked independently via their standardized residuals *s*<sup>M</sup> and *s*S, given by:

$$s_{\mathrm{M}} = r_{\mathrm{M}}\, ./ \sqrt{\operatorname{diag}(R_{\mathrm{M}})},\tag{6.18}$$

$$s_{\mathrm{S}} = r_{\mathrm{S}}\, ./ \sqrt{\operatorname{diag}(R_{\mathrm{S}})}.\tag{6.19}$$

## *6.1.2 Extended Kalman Filter*

A "progressive" or recursive version of the LS regression for track fitting was first proposed in [2]. Soon, it was realized that this was the same as an (extended) Kalman filter [3] in the state space model of the track dynamics. The Kalman filter has the advantage that only small matrices have to be inverted, that the track fit follows the track as closely as possible, and that material effects such as multiple scattering and energy loss can be treated locally in each measurement or material layer. In addition, the filter can be complemented by the smoother; see Sect. 3.2.3.

In track fitting with the extended Kalman filter, it is assumed that the trajectory crosses a number of surfaces or layers with well-defined positions and orientations. A layer can be a measurement layer, a material layer, or both. At the intersection point of the trajectory with layer *k*, the state vector *q<sup>k</sup>* contains information about the local position, the local direction, and the local momentum of the track. The uncertainty of the information is specified by the associated covariance matrix *Ck*. Different possible parameterizations of the track state are discussed in Sect. 4.2.

The Kalman filter is a sequence of alternating prediction and update steps; see Sect. 3.2.3. In the prediction step, the estimate *q*˜<sub>*k*−1</sub> of the track state in layer *k* − 1 is extrapolated to layer *k*, along with its covariance matrix *C*<sub>*k*−1</sub>:

$$
\tilde{q}\_{k|k-1} = f\_{k|k-1}(\tilde{q}\_{k-1}), \ \ \mathbf{C}\_{k|k-1} = \mathbf{F}\_{k|k-1} \mathbf{C}\_{k-1} \mathbf{F}\_{k|k-1}^{\mathsf{T}}, \tag{6.20}
$$

where *f*<sub>*k*|*k*−1</sub> is the track propagator from layer *k* − 1 to layer *k*, and *F*<sub>*k*|*k*−1</sub> is its Jacobian matrix; see Sect. 4.3.

The update step is different in material and detector layers. In a material layer, multiple scattering is taken into account by inflating the covariance matrix elements of the track directions and, in a thick scatterer, of the track position; see Sect. 4.5. Energy loss by ionization is taken into account by decreasing the track momentum. For the treatment of electron bremsstrahlung, see Sect. 6.2.3.

The update step in a detector layer is given by Eqs. (3.29) and (3.30) or Eqs. (3.31) and (3.32). The associated chi-square statistic *χ*<sup>2</sup><sub>*k*</sub>, Eq. (3.35), can be used to test the compatibility of the observation *m*<sub>*k*</sub> with the predicted state *q*˜<sub>*k*|*k*−1</sub> in the combinatorial Kalman filter; see Sect. 5.1.7. A large value of *χ*<sup>2</sup><sub>*k*</sub> or a small *p*-value indicates that the observation does not belong to the track.

In track fitting, the initial state *q*<sub>0</sub> is obtained from the track finder and therefore contains information from the observations. In order not to use this information twice, its covariance matrix is set to a large diagonal matrix. As a consequence, the initial chi-square statistics *χ*<sup>2</sup><sub>*k*</sub> have zero degrees of freedom, until the covariance matrix of the state vector has full rank. For instance, if all measurements are 2D and the state is 5D, *χ*<sup>2</sup><sub>1</sub> and *χ*<sup>2</sup><sub>2</sub> have zero degrees of freedom, *χ*<sup>2</sup><sub>3</sub> has one degree of freedom, all subsequent *χ*<sup>2</sup><sub>*k*</sub> have two degrees of freedom, and the total chi-square statistic *χ*<sup>2</sup><sub>tot</sub> has 2*n* − 5 degrees of freedom.
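A minimal linear Kalman filter illustrates the prediction and update steps for a toy (position, slope) state with a diffuse initial covariance. All matrices and numbers are invented:

```python
import numpy as np

# Minimal linear Kalman filter: state (position, slope), unit spacing between
# layers, each layer measuring the position. Toy model, invented numbers.
F = np.array([[1.0, 1.0], [0.0, 1.0]])           # propagator, layer k-1 -> k
Q = np.array([[0.0, 0.0], [0.0, 1e-4]])          # process noise (multiple scattering)
H = np.array([[1.0, 0.0]])                       # measurement model
Vm = np.array([[0.01]])                          # measurement variance

def predict(q, C):                               # prediction step, Eq. (6.20)
    return F @ q, F @ C @ F.T + Q

def update(q, C, m):                             # standard gain-matrix update
    S = H @ C @ H.T + Vm                         # innovation covariance
    K = C @ H.T @ np.linalg.inv(S)               # Kalman gain
    r = m - H @ q                                # innovation (residual)
    chi2 = float(r @ np.linalg.inv(S) @ r)       # per-layer chi-square
    return q + K @ r, (np.eye(2) - K @ H) @ C, chi2

q, C = np.zeros(2), 1e6 * np.eye(2)              # diffuse initial state
for m in [1.0, 2.1, 2.9, 4.0]:
    q, C = predict(q, C)
    q, C, chi2 = update(q, C, np.array([m]))
print(q)                                         # fitted (position, slope) at last layer
```

The diffuse initial covariance mimics an uninformative starting state, so the fitted parameters are driven by the measurements alone, as described above.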

The smoother can be implemented either according to Eqs. (3.36) and (3.37) or by running a second filter in the opposite direction and combining the states of the two filters by a weighted mean (Eq. (3.38)):

$$
\tilde{\boldsymbol{q}}\_{k|n} = \mathbf{C}\_{k|n} \left[ \mathbf{C}\_k^{-1} \tilde{\boldsymbol{q}}\_k + \left( \mathbf{C}\_{k|k+1}^{\mathbf{b}} \right)^{-1} \tilde{\boldsymbol{q}}\_{k|k+1}^{\mathbf{b}} \right], \quad \mathbf{C}\_{k|n}^{-1} = \mathbf{C}\_k^{-1} + \left( \mathbf{C}\_{k|k+1}^{\mathbf{b}} \right)^{-1}, \tag{6.21}
$$

where *q*˜<sup>b</sup><sub>*k*|*k*+1</sub> is the predicted state from the backward filter and *C*<sup>b</sup><sub>*k*|*k*+1</sub> its covariance matrix. Alternatively, the predicted state from the forward filter and the updated state from the backward filter can be combined. The associated chi-square statistic *χ*<sup>2</sup><sub>*k*|*n*</sub> (Eqs. (3.39) and (3.40)) can be used to test the compatibility of the observation *m*<sub>*k*</sub> with the smoothed state *q*˜<sub>*k*|*n*</sub>, using the entire track information. A large value of *χ*<sup>2</sup><sub>*k*|*n*</sub> or a small *p*-value indicates that the observation does not belong to the track and is an outlier; see Sect. 6.4.2. A simplified version of the chi-square statistic *χ*<sup>2</sup><sub>*k*|*n*</sub> is used in the deterministic annealing filter; see Sect. 6.2.2.
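The weighted-mean combination of Eq. (6.21) amounts to information-weighted averaging of two independent estimates, as in this small sketch with invented values:

```python
import numpy as np

# Weighted-mean combination of forward and backward filter estimates,
# Eq. (6.21): inverse-covariance (information) weighting.
def combine(q_f, C_f, q_b, C_b):
    Wf, Wb = np.linalg.inv(C_f), np.linalg.inv(C_b)
    C = np.linalg.inv(Wf + Wb)                    # combined covariance
    return C @ (Wf @ q_f + Wb @ q_b), C

q, C = combine(np.array([1.0, 0.0]), np.eye(2),
               np.array([3.0, 0.0]), np.eye(2))
print(q)  # equal covariances -> plain average [2. 0.]
```

With unequal covariances, the more precise estimate receives the larger weight, and the combined covariance is always smaller than either input.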

## *6.1.3 Regression with Breakpoints*

Instead of absorbing the effects of multiple scattering in the covariance matrix of the process noise, the scattering angles at certain points, called breakpoints, can be explicitly incorporated into the track model as additional parameters. At these breakpoints, the scattering angles are estimated [4, 5], using their known expectation (zero) and known covariance matrix in the curvilinear parameterization; see Sect. 4.5.1. The breakpoint fit is mathematically equivalent both to LS regression in the linear approximation of the model and to the Kalman filter, unless the initial state of the latter has non-negligible information.

Let *θ*<sub>*j*</sub> = *(θ*<sub>*j*1</sub>*, θ*<sub>*j*2</sub>*)*, *j* = 1*,...,m* denote the two uncorrelated multiple scattering angles at breakpoint *j*, and *Q*<sub>*j*</sub> their covariance matrix. Then the regression model in Eq. (6.1) can be modified to:

$$m_{k} = \hat{f}_{k}(p, \theta_{1}, \dots, \theta_{j_{k}}) + \varepsilon_{k}, \quad \mathsf{E}\left[\varepsilon_{k}\right] = 0, \ \mathsf{Var}\left[\varepsilon_{k}\right] = V_{k}, \quad k = 1, \dots, n,\tag{6.22}$$

where *j*<sub>*k*</sub> is the index of the last breakpoint before measurement layer *k*. All measurement errors *ε*<sub>*k*</sub> are now independent, and their joint covariance matrix is block-diagonal, as is the joint covariance matrix of all *θ*<sub>*j*</sub>. The full regression model, which includes the scattering angles as additional parameters, now reads:

$$\begin{pmatrix} m_1 \\ \vdots \\ m_n \\ 0 \\ \vdots \\ 0 \end{pmatrix} = \begin{pmatrix} \hat{f}_1(p, \theta_1, \dots, \theta_{j_1}) \\ \vdots \\ \hat{f}_n(p, \theta_1, \dots, \theta_m) \\ \theta_1 \\ \vdots \\ \theta_m \end{pmatrix} + \delta, \quad \text{with} \tag{6.23}$$

$$\mathsf{E}\left[\delta\right] = 0, \quad \mathsf{Var}\left[\delta\right] = \operatorname{blkdiag}\left(V_1, \dots, V_n, Q_1, \dots, Q_m\right). \tag{6.24}$$

If the functions *f̂*<sub>*k*</sub> are Taylor-expanded to first order, a linear regression model with the following structure is obtained:

$$
\begin{pmatrix} m_1 \\ \vdots \\ m_n \\ 0 \\ \vdots \\ 0 \end{pmatrix} = \begin{pmatrix} F_1 & H_{1|1} & \cdots & H_{1|j_1} & O & \cdots & O \\ \vdots & \vdots & & & & & \vdots \\ F_n & H_{n|1} & \cdots & \cdots & \cdots & \cdots & H_{n|m} \\ O & I & O & \cdots & \cdots & \cdots & O \\ \vdots & & \ddots & & & & \vdots \\ O & \cdots & \cdots & \cdots & \cdots & O & I \end{pmatrix} \cdot \begin{pmatrix} p \\ \theta_1 \\ \vdots \\ \theta_m \end{pmatrix} + \delta,\tag{6.25}
$$

where *H*<sub>*k*|*j*</sub>, *j* ≤ *j*<sub>*k*</sub>, is the Jacobian matrix of the function that describes the dependence of *m*<sub>*k*</sub> on *θ*<sub>*j*</sub>, and *F*<sub>*k*</sub> is as in Eq. (6.3).

The following two subsections describe two efficient implementations of the breakpoint concept.

## *6.1.4 General Broken Lines*

The general broken lines (GBL) algorithm [6] is a fast track refit based on the breakpoint concept. It is particularly useful for track-based alignment and calibration with the Millepede-II package [7]. It is assumed that only thin scatterers are present, or that thick scatterers are divided into several thin scatterers. The algorithm needs an initial trajectory as a reference.

At each measurement plane and thin scatterer, a local orthonormal coordinate system *(u, v, w)* is defined. The natural choice of the *w*-axis is perpendicular to the sensor plane for a measurement and parallel to the track direction for a scatterer. At each thin scatterer, the offset *(u, v)* is a fit parameter, as is the global signed inverse momentum *q/p*. The prior information on the scattering angles is the same as above, obtained from multiple scattering theory; see Sect. 4.5.1. The transformations from the curvilinear frame following the track to the local frames are given in Sect. 4.4.1.

The corrections to the track parameters in a measurement plane depend only on the (inverse) momentum and the adjacent offsets; therefore, their estimation requires the inversion of a bordered band matrix, the computing time of which is proportional to the number of breakpoints. The GBL is available in a dedicated software package [8] and is also implemented in GENFIT [9–13].

A comparative study in [6] shows that track fitting with the GBL is a little faster than the extended Kalman filter and up to three times faster than the Kalman filter plus smoother.

## *6.1.5 Triplet Fit*

A track fit for situations in which the principal source of stochastic uncertainty is multiple scattering is described in [14]. The fit is an independent combination of fits to triplets of hits, where the middle point of a triplet is a breakpoint. As the triplet fits are fast and can be parallelized, the method is well suited for online reconstruction. The fit has been designed for low-momentum tracks with several turns in the detector, but also performs very well for tracks in a high-resolution pixel tracker with momenta up to 10 GeV. For a comparison with a full helix fit and the GBL fit (Sect. 6.1.4), see [14]. The fit is implemented in a software package called WATSON [15], which is available on request from the authors of [14].

## *6.1.6 Fast Track Fit by Affine Transformation*

Online track reconstruction in the first-level trigger (see Sect. 5.2) requires an ultrafast track fit that can be implemented in high-speed hardware such as FPGAs [16]. One possibility to achieve the required speed is to fit, on a training sample of simulated tracks, an affine model that expresses the track parameters *p* as an affine function of the measurements *m* [17]:

$$
\tilde{p} = A\mathfrak{m} + \mathfrak{c}.\tag{6.26}
$$

The matrix *A* and the vector *c* are estimated by minimizing the objective function

$$S\left(A, c\right) = \sum_{i=1}^{N} \left(A m_{i} + c - p_{i}\right)^{\mathsf{T}} \left(A m_{i} + c - p_{i}\right),\tag{6.27}$$

where the *p*<sub>*l*</sub> are the true parameters of the *N* tracks in the training sample. The solution is given by:

$$A = \left[ \langle p\, m^{\mathsf{T}} \rangle - \langle p \rangle \langle m^{\mathsf{T}} \rangle \right] C^{-1}, \quad c = \langle p \rangle - A \langle m \rangle, \quad C = \langle m\, m^{\mathsf{T}} \rangle - \langle m \rangle \langle m^{\mathsf{T}} \rangle,\tag{6.28}$$

where *C* is the sample covariance matrix of the measurement vectors, and the angle brackets denote the average over the training sample. The goodness-of-fit can be judged by the chi-square statistic

$$\chi^2 = \left[ m - \langle m \rangle \right]^{\mathsf{T}} C^{-1} \left[ m - \langle m \rangle \right],\tag{6.29}$$

which is approximately *χ*<sup>2</sup>-distributed with *M* − *m* degrees of freedom, where *M* = dim(*m*) and *m* = dim(*p*). If the track model is exactly linear, i.e., *m* = *F p* with some matrix *F*, the matrix *A* is equal to the Moore–Penrose pseudoinverse of *F*, and the rank of *C* is *m*, so that *C* has *m* positive eigenvalues while the remaining ones are equal to 0. If the linear approximation to the track model is valid in the entire training sample, *C* has *m* large and *M* − *m* small eigenvalues. The training sample, therefore, has to be chosen such that this condition is satisfied. In practice, this means that the detector volume has to be partitioned into many small regions, each with its own training sample. The training automatically takes into account material effects, measurement errors, misalignment, and the configuration of the magnetic field.
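The training of Eq. (6.28) can be sketched on a simulated sample; the toy linear track model, the dimensions, and the function name below are illustrative, not from the text:

```python
import numpy as np

def train_affine_fit(M, P):
    """Estimate A and c of Eq. (6.28) from a training sample.
    M: (N, dim_m) measurements, P: (N, dim_p) true track parameters."""
    m_mean = M.mean(axis=0)
    p_mean = P.mean(axis=0)
    C = (M - m_mean).T @ (M - m_mean) / len(M)      # sample covariance of m
    Cpm = (P - p_mean).T @ (M - m_mean) / len(M)    # <p m^T> - <p><m^T>
    A = Cpm @ np.linalg.pinv(C)                     # pinv guards against rank deficiency
    c = p_mean - A @ m_mean
    return A, c

# toy linear track model m = F p + noise
rng = np.random.default_rng(1)
F = rng.normal(size=(6, 3))
P = rng.normal(size=(500, 3))
M = P @ F.T + 0.01 * rng.normal(size=(500, 6))
A, c = train_affine_fit(M, P)
p_hat = A @ M[0] + c                                # Eq. (6.26) for one track
```

With a nearly linear model and small noise, the estimated parameters of a training track are recovered up to the (amplified) measurement noise.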

## **6.2 Robust and Adaptive Fitting**

## *6.2.1 Robust Regression*

The LS regression described in Sect. 6.1.1 is not robust in the sense that outliers in the track candidate lead to a significant distortion of the estimated parameters. One way to cope with this problem is to look for outliers in the standardized residuals of the regression and to remove them; see Sect. 6.4.2. A more elegant way is to make the regression robust by minimizing an objective function that is different from the sum of squared differences between the model and the measurements.

Three important approaches to robust regression are Least Median of Squares (LMS), Least Trimmed Squares (LTS), and M-estimators [18, 19]. LMS regression is difficult to compute in the context of track fitting; it is statistically less efficient than either LS or LTS, and there is no simple prescription for the computation of the covariance matrix of the estimated parameters. The computation of the LTS estimator requires the solution of a combinatorial optimization problem and is therefore rather time-consuming. The M-estimator, on the other hand, can be implemented as an iterated re-weighted LS regression and is therefore an excellent method for a robust track fit. The re-weighting can be done on single components of the measurement vectors *m<sup>i</sup>* or on entire measurement vectors.

An M-estimator is characterized by a function *ψ(z)* that determines the corresponding weight function *ω(z)* = *ψ(z)/z*, where *z* is the standardized residual of a measurement. Setting *ψ(z)* = *z* yields the LS estimator. Table 6.1 and Fig. 6.1 show three examples of *ψ* and weight functions from the literature.

The constant *c* controls the shape of the weight function. The weight function of Huber's M-estimator [19] is equal to one in the interval [−*c, c*] and slowly drops to zero outside the interval (see Fig. 6.1a). Tukey's biweight [20] is a redescending estimator for which the weight function is equal to zero outside the interval [−*c, c*] (see Fig. 6.1b). For the adaptive M-estimator [21], *c* is the point where the weight function is equal to 0.5. This estimator has an additional control parameter *T* that modifies the transition from large to small weight and can be used for implementing an annealing procedure; see Sect. 6.2.2. In the limit of *T* → 0, the estimator is redescending, as the weight function approaches a step function that drops from 1 to 0 at *z* = *c* (see Fig. 6.1c). The computation of the M-estimator is summarized in Table 6.2. In order to be less sensitive to multiple scattering, it uses only the measurement component *s*<sup>M</sup> of the standardized residuals; see Eq. (6.18).


**Table 6.1** The *ψ* functions and the corresponding weight functions *ω* of three M-estimators. *c* and *T* are constants. Further explanations can be found in the text

**Fig. 6.1** The weight functions of the three M-estimators in Table 6.1. (**a**) Huber type with *c* = 2. (**b**) Tukey's biweight with *c* = 3. (**c**) Adaptive with *c* = 3

## *6.2.2 Deterministic Annealing Filter*

The deterministic annealing filter (DAF) is an adaptive version of the standard Kalman filter [22]. It can modify the influence of outlying measurements by assigning them a smaller weight. It can also deal with the case that two (or more) measurements in the same detector layer are tagged by the track finder as valid candidates for inclusion in the track. This is particularly relevant in the LHC experiments, given the high track density in the central tracker. It is then up to the track fit to decide which of the competing measurements, if any, is most compatible with the local track state. Another use case is the choice between two mirror hits in a drift chamber. The DAF is implemented in GENFIT [9–12].

The measurements in layer *k* are denoted by *m*<sup>*j*</sup><sub>*k*</sub>, *j* = 1*,...,M*<sub>*k*</sub>, and their covariance matrices by *V*<sup>*j*</sup><sub>*k*</sub>. The DAF is implemented as an iterated Kalman filter plus smoother with annealing; see Table 6.3 for the basic sequence and [23, 24] for implementation details. In each layer, the state vector is updated with a weighted mean of the measurements, using the covariance matrices of the current iteration.


**Table 6.2** Algorithm: Track fit with M-estimator

TRACK FIT WITH M-ESTIMATOR

1. Compute the LS regression with the nominal covariance matrix *V*; see Sect. 6.1.1.
2. Compute the measurement components *s*<sub>M,*j*</sub> of the standardized residuals; see Eq. (6.18).
3. Compute the weights

$$w_j = \max[\varepsilon, \omega(s_{\mathrm{M},j})], \quad j = 1, \dots, M. \tag{6.30}$$

Set *V*<sub>M</sub> = *V*<sub>M</sub>(0) ./ diag(*w*) and *V* = *V*<sub>M</sub> + *V*<sub>S</sub>, where ./ denotes element-wise division and *V*<sub>M</sub>(0) is the nominal measurement covariance matrix.

4. Recompute the LS regression with the modified covariance matrix *V* and repeat from step 2 until convergence.
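The iterated re-weighted LS regression at the heart of the M-estimator can be sketched for a straight-line fit with Huber weights (Table 6.1); the function names, the choice *c* = 2, and the toy data are illustrative:

```python
import numpy as np

def huber_weight(z, c=2.0):
    """Huber's weight function: 1 inside [-c, c], c/|z| outside."""
    z = np.abs(z)
    return np.where(z <= c, 1.0, c / z)

def irls_line_fit(x, y, sigma=1.0, n_iter=10, eps=1e-6):
    """Robust straight-line fit by iterated re-weighted LS."""
    A = np.column_stack([np.ones_like(x), x])
    w = np.ones_like(y)
    for _ in range(n_iter):
        W = w / sigma**2
        p = np.linalg.solve(A.T @ (W[:, None] * A), A.T @ (W * y))
        z = (y - A @ p) / sigma                 # standardized residuals
        w = np.maximum(eps, huber_weight(z))    # cf. Eq. (6.30)
    return p, w

x = np.arange(10.0)
y = 1.0 + 0.5 * x
y[7] += 20.0                                    # a single gross outlier
p, w = irls_line_fit(x, y)
```

After a few iterations the outlier receives a small weight and the intercept and slope are close to their true values, while a plain LS fit would be badly distorted.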

**Table 6.3** Algorithm: Deterministic annealing filter

DETERMINISTIC ANNEALING FILTER


$$\begin{split} w_k^j &= \max\left[\varepsilon,\ \frac{\exp(-X_k^j/2T)}{\exp(-\chi^2_{\mathrm{cut}}/2T) + \sum_{\ell=1}^{M_k} \exp(-X_k^\ell/2T)}\right], \\ X_k^j &= \left[m_k^j - h_k(q_{k|n})\right]^{\mathsf{T}} \left[V_k^j(0)\right]^{-1} \left[m_k^j - h_k(q_{k|n})\right]. \end{split}$$

*X*<sup>*j*</sup><sub>*k*</sub> is the *χ*<sup>2</sup>-distance of *m*<sup>*j*</sup><sub>*k*</sub> from the smoothed state in layer *k* in the metric defined by the inverse of *V*<sup>*j*</sup><sub>*k*</sub>(0); see Eq. (3.39).
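The behaviour of the assignment weights under annealing can be sketched for a single layer; the cut value and the temperatures below are illustrative choices:

```python
import numpy as np

def daf_weights(chi2, T, chi2_cut=9.0, eps=1e-6):
    """Assignment weights of the competing measurements in one layer.
    chi2: array of chi-square distances X_k^j from the smoothed state;
    T: annealing temperature; chi2_cut: competing no-hit hypothesis."""
    num = np.exp(-0.5 * chi2 / T)
    den = np.exp(-0.5 * chi2_cut / T) + num.sum()
    return np.maximum(eps, num / den)

chi2 = np.array([1.0, 16.0])       # one compatible, one outlying measurement
w_hot = daf_weights(chi2, T=9.0)   # high temperature: soft assignment
w_cold = daf_weights(chi2, T=1.0)  # low temperature: near-hard assignment
```

Lowering the temperature sharpens the competition: the compatible measurement approaches weight one, the outlying one approaches zero.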


# *6.2.3 Gaussian-Sum Filter*

The Kalman filter is (near) optimal if the track model is (approximately) linear and both system and measurement noise are (approximately) Gaussian; see Sect. 3.2.3. If the noise is long-tailed or highly asymmetric, the Gaussian-sum filter (GSF) is an alternative estimator. It can be applied to a broad class of non-Gaussian distributions by allowing all densities involved to be mixtures of normal PDFs, or Gaussian sums. The GSF can be applied in the following use cases:


In the GSF, the PDF of the state vector can be a Gaussian sum at every surface. First assume that the surface is a material surface and that the predicted state vector *q* at the entry of the surface has the following normal mixture PDF with *J* components:

$$p\_0(\mathbf{q}) = \sum\_{j=1}^{J} \pi\_j \varphi(\mathbf{q}; \mathbf{q}\_j, \mathbf{C}\_j), \quad \sum\_{j=1}^{J} \pi\_j = 1,\tag{6.31}$$

where *ϕ (q*; *q<sup>j</sup> , C<sup>j</sup> )* is the normal PDF with mean *q<sup>j</sup>* and covariance matrix *C<sup>j</sup> , j* = 1*,...,J* . The process noise *γ* in the surface is modeled by a normal mixture as well (see also Sect. 3.2.3.2):

$$g(\gamma) = \sum_{m=1}^{M} \omega_m\, \varphi(\gamma;\, g_m, Q_m), \quad \sum_{m=1}^{M} \omega_m = 1. \tag{6.32}$$

Note that both *J* and *M* may be equal to one. Then the PDF of the state vector at the exit of the surface is given by the following normal mixture with *J* ×*M* components:

$$p\_1(\mathbf{q}) = \sum\_{j=1}^{J} \sum\_{m=1}^{M} \pi\_j \omega\_m \varphi(\mathbf{q}; \mathbf{q}\_j + \mathbf{g}\_m, \mathbf{C}\_j + \mathbf{Q}\_m). \tag{6.33}$$

Now assume that the surface is a measurement surface and that the distribution of the measurement error of measurement *m* is modeled by a normal mixture:

$$g(\varepsilon) = \sum_{m=1}^{M} \omega_m\, \varphi(\varepsilon;\, 0, V_m), \quad \sum_{m=1}^{M} \omega_m = 1. \tag{6.34}$$

Again, *J* and *M* may be equal to one. The PDF of the updated state vector is given by:

$$p\_1(\mathbf{q}) = \sum\_{j=1}^{J} \sum\_{m=1}^{M} \eta\_{jm} \varphi(\mathbf{q}; \mathbf{q}\_{jm}, \mathbf{C}\_{jm}), \tag{6.35}$$

with

$$
\eta_{jm} \propto \pi_j \omega_m\, \varphi\big(m;\, h(q_j),\, V_m + H C_j H^{\mathsf{T}}\big), \quad \sum_{j=1}^{J} \sum_{m=1}^{M} \eta_{jm} = 1. \tag{6.36}
$$

The mean *qjm* and the covariance matrix *Cjm* are obtained by the Kalman filter update of component *j* of the predicted PDF *p*0*(q)* with component *m* of the measurement error PDF *g(ε)*, see Eqs. (3.29) and (3.30) or Eqs. (3.31) and (3.32).

Finally, assume that there are *M* measurements *m*<sub>*m*</sub> in the measurement surface, with weights *ω*<sub>*m*</sub>*, m* = 1*,...,M*. In the absence of prior information, all weights are set to 1*/M*. The observation *m* can then be modeled by the following normal mixture PDF:

$$g(m) = \sum_{m=1}^{M} \omega_m\, \varphi(m;\, m_m, V_m), \quad \sum_{m=1}^{M} \omega_m = 1. \tag{6.37}$$

The PDF of the updated state vector is given by:

$$p\_1(\mathbf{q}) = \sum\_{j=1}^{J} \sum\_{m=1}^{M} \eta\_{jm} \varphi(\mathbf{q}; \mathbf{q}\_{jm}, \mathbf{C}\_{jm}), \tag{6.38}$$

with

$$
\eta_{jm} \propto \pi_j \omega_m\, \varphi\big(m_m;\, h(q_j),\, V_m + H C_j H^{\mathsf{T}}\big), \quad \sum_{j=1}^{J} \sum_{m=1}^{M} \eta_{jm} = 1. \tag{6.39}
$$

The mean *qjm* and the covariance matrix *Cjm* are obtained by a Kalman filter update of component *j* of the predicted PDF *p*0*(q)* with component *m* of the observation PDF *g(m)*. The resulting GSF is basically a combinatorial Kalman filter in which each track candidate has an additional weight which shows how likely it is compared to the other ones. This version of the GSF can be used for track finding; in this case, a missing hit with large errors should be added in each layer [31]. Its weight can reflect the hit efficiency of the measurement device.

In principle, the number of components rises exponentially in the course of the GSF, and it is necessary to reduce their number whenever it exceeds a threshold *K* set by the user. The simplest way of reducing the number of components is to keep the *K* components with the largest weights, drop the remaining ones, and renormalize the weights to a sum of one. A more sophisticated approach is to search for clusters among the components and to collapse the components in a cluster to a single Gaussian with the same mean and covariance matrix. Clustering can be based on the similarity between components, as measured by, e.g., the Kullback–Leibler divergence [30]. For a brief review of clustering algorithms, see Sect. 3.3. The choice of the threshold *K* and the clustering procedure have to be optimized by simulation studies.
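The two reduction strategies can be sketched for a one-dimensional mixture; the function names and toy numbers are illustrative:

```python
import numpy as np

def prune_mixture(weights, means, covs, K):
    """Keep the K components with the largest weights and renormalize."""
    idx = np.argsort(weights)[::-1][:K]
    w = weights[idx] / weights[idx].sum()
    return w, means[idx], covs[idx]

def merge_cluster(weights, means, covs):
    """Collapse a cluster of components to one Gaussian with the same
    mean and covariance matrix (moment matching)."""
    w = weights / weights.sum()
    mean = np.einsum('j,jd->d', w, means)
    d = means - mean
    cov = np.einsum('j,jde->de', w, covs) + np.einsum('j,jd,je->de', w, d, d)
    return mean, cov

weights = np.array([0.5, 0.3, 0.15, 0.05])
means = np.array([[0.0], [1.0], [2.0], [3.0]])
covs = np.array([[[1.0]], [[1.0]], [[1.0]], [[1.0]]])
w2, m2, c2 = prune_mixture(weights, means, covs, K=2)
mean, cov = merge_cluster(weights, means, covs)
```

Note that merging preserves the first two moments of the mixture, while pruning discards the information in the dropped components.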

Even for moderate values of *K*, the GSF is significantly slower than the Kalman filter; it is, therefore, used mainly for special applications such as the track fit of electrons with non-negligible bremsstrahlung [30, 32, 33].

## **6.3 Linear Approaches to Circle and Helix Fitting**

## *6.3.1 Conformal Mapping Method*

The conformal transformation described in Sect. 5.1.1 can be generalized to deal with circles passing close to the origin [34]. The conformal transformation maps such a circle to a circle with a very small curvature, which in turn can be well approximated by a parabola:

$$v = \frac{1}{2b} - \frac{a}{b} \cdot u - \epsilon \left(\frac{R}{b}\right)^3 \cdot u^2,\tag{6.40}$$

where *ε* = *R* − √(*a*<sup>2</sup> + *b*<sup>2</sup>) is the impact parameter. A standard parabola fit to the measurements in the transformed *(u, v)*-coordinates yields the parameters *A, B* and *C* according to

$$v = A + Bu + Cu^2,\tag{6.41}$$

and the circle parameters are therefore given by

$$b = \frac{1}{2A}, \quad a = -bB, \quad \epsilon = -C \cdot \frac{b^3}{(a^2 + b^2)^{3/2}},\tag{6.42}$$

using the approximation *<sup>R</sup>* <sup>≈</sup> <sup>√</sup> *a*<sup>2</sup> + *b*<sup>2</sup> in the expression of [34]. Following Gluckstern [35], it is also possible to obtain expressions of the estimated errors of the circle parameters [34].
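The method of Eqs. (6.40)–(6.42) can be sketched as follows; the test points lie on a circle passing exactly through the origin, so the conformal image is a straight line and the fitted *C* and *ε* vanish (function name and numbers are illustrative):

```python
import numpy as np

def conformal_circle_fit(x, y):
    """Conformal map (x, y) -> (u, v), parabola fit, circle parameters."""
    r2 = x**2 + y**2
    u, v = x / r2, y / r2
    M = np.column_stack([np.ones_like(u), u, u**2])
    A, B, C = np.linalg.lstsq(M, v, rcond=None)[0]  # v = A + B u + C u^2, Eq. (6.41)
    b = 1.0 / (2.0 * A)                             # Eq. (6.42)
    a = -b * B
    eps = -C * b**3 / (a**2 + b**2)**1.5
    return a, b, eps

t = np.linspace(0.2, 1.4, 8)
x = 5.0 * np.cos(t)            # circle with centre (0, 5) and radius 5,
y = 5.0 + 5.0 * np.sin(t)      # passing through the origin
a, b, eps = conformal_circle_fit(x, y)
```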

## *6.3.2 Chernov and Ososkov's Method*

The task of fitting a circular track to a set of measurements is tantamount to minimizing the function

$$\chi^2 = \sum_{i=1}^{n} d_i^2,\tag{6.43}$$

where *di* are measurement residuals orthogonal to the particle trajectory:

$$d_i = \pm \left[ \sqrt{(x_i - a)^2 + (y_i - b)^2} - R \right], \quad i = 1, \ldots, n,\tag{6.44}$$

where *a, b*, and *R* are the coordinates of the circle centre and the radius. The approach of Chernov and Ososkov [36] is to simplify this non-linear minimization problem by introducing an approximate expression for the residuals *di*,

$$d_i \approx \pm \left[ (x_i - a)^2 + (y_i - b)^2 - R^2 \right] / 2R,\tag{6.45}$$

which holds true with high precision as long as the residuals are small compared to the circle radius. The equations obtained by differentiating *χ*<sup>2</sup> with respect to the circle parameters and setting these to zero are quartic (polynomial equations of degree 4) and can be solved efficiently by a standard Newton iteration procedure.
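The quality of the approximation in Eq. (6.45) is easy to check numerically for a point whose residual is small compared to the radius; the numbers below are illustrative:

```python
import math

a, b, R = 1.0, 2.0, 10.0      # circle centre (a, b) and radius R

def d_exact(x, y):
    """Exact orthogonal residual, Eq. (6.44)."""
    return math.hypot(x - a, y - b) - R

def d_approx(x, y):
    """Chernov-Ososkov approximation, Eq. (6.45)."""
    return ((x - a)**2 + (y - b)**2 - R**2) / (2.0 * R)

# point at radial distance R + 0.05 from the centre: residual 0.05
x = a + (R + 0.05)
y = b
```

For a residual of 0.5% of the radius, the two expressions agree to better than one part in a thousand of the radius.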

## *6.3.3 Karimäki's Method*

Karimäki's approach [37] starts from the simplified expression of the residuals *di* introduced by Chernov and Ososkov [36] and considers a *χ*<sup>2</sup> with weighted residuals:

$$\chi^2 = \sum_{i=1}^{n} w_i\, d_i^2.\tag{6.46}$$

The weights can for instance contain measurement uncertainties if they are not the same for all measurements *(xi, yi)*.

The *χ*<sup>2</sup> function is minimized with respect to a set of circle parameters with Gaussian behaviour: the curvature *κ* (the inverse radius of curvature), the impact parameter *ε* (the distance from the origin to the point of closest approach of the fitted circle), and the direction *φ* of the tangent of the circle at the point of closest approach. Using this set of parameters, the simplified residuals are expressed as

$$d_i = \frac{1}{2}\kappa r_i^2 - (1 + \kappa\epsilon)\, r_i \sin(\phi - \phi_i) + \frac{1}{2}\kappa\epsilon^2 + \epsilon,\tag{6.47}$$

where *r*<sub>*i*</sub> and *φ*<sub>*i*</sub> are the polar coordinates of measurement *i*. The residuals can be written as *d*<sub>*i*</sub> = (1 + *κε*) *η*<sub>*i*</sub>, with

$$
\eta_i = \gamma r_i^2 - r_i \sin(\phi - \phi_i) + \delta,\tag{6.48}
$$

and

$$\gamma = \frac{\kappa}{2(1 + \kappa \epsilon)}, \quad \delta = \frac{1 + \kappa \epsilon/2}{1 + \kappa \epsilon} \epsilon. \tag{6.49}$$

Using these definitions, the *χ*<sup>2</sup> can be written as

$$
\chi^2 = (1 + \kappa \epsilon)^2 \tilde{\chi}^2,\tag{6.50}
$$

where *χ̃*<sup>2</sup> = ∑<sub>*i*</sub> *w*<sub>*i*</sub>*η*<sup>2</sup><sub>*i*</sub>. With the approximation 1 + *κε* ≈ 1, *χ̃*<sup>2</sup> can be minimized instead of *χ*<sup>2</sup>, leading to the attractive feature of a set of equations with explicit solutions:

$$\phi = \frac{1}{2} \arctan(2q\_1/q\_2),\tag{6.51}$$

$$\gamma = \left(\sin\phi \cdot C_{xz} - \cos\phi \cdot C_{yz}\right) / C_{zz},\tag{6.52}$$

$$\delta = -\gamma \langle z \rangle + \sin \phi \langle \mathbf{x} \rangle - \cos \phi \langle \mathbf{y} \rangle,\tag{6.53}$$

where *q*<sub>1</sub> = *C*<sub>*zz*</sub>*C*<sub>*xy*</sub> − *C*<sub>*xz*</sub>*C*<sub>*yz*</sub> and *q*<sub>2</sub> = *C*<sub>*zz*</sub>(*C*<sub>*xx*</sub> − *C*<sub>*yy*</sub>) − *C*<sup>2</sup><sub>*xz*</sub> + *C*<sup>2</sup><sub>*yz*</sub>, and the angle brackets denote a weighted average, e.g., ⟨*x*⟩ = ∑<sub>*i*</sub> *w*<sub>*i*</sub>*x*<sub>*i*</sub>/∑<sub>*i*</sub> *w*<sub>*i*</sub>. The variances and covariances of the measurements *x*, *y* and *z* = *x*<sup>2</sup> + *y*<sup>2</sup> are given by

$$\begin{array}{ll} \mathcal{C}\_{xx} = \langle \mathbf{x}^2 \rangle - \langle \mathbf{x} \rangle^2, & \mathcal{C}\_{xy} = \langle \mathbf{x} \mathbf{y} \rangle - \langle \mathbf{x} \rangle \langle \mathbf{y} \rangle, & \mathcal{C}\_{yy} = \langle \mathbf{y}^2 \rangle - \langle \mathbf{y} \rangle^2, \\\mathcal{C}\_{xz} = \langle \mathbf{x} \boldsymbol{z} \rangle - \langle \mathbf{x} \rangle \langle \boldsymbol{z} \rangle, & \mathcal{C}\_{yz} = \langle \mathbf{y} \boldsymbol{z} \rangle - \langle \mathbf{y} \rangle \langle \boldsymbol{z} \rangle, & \mathcal{C}\_{zz} = \langle r^4 \rangle - \langle \mathbf{z} \rangle^2. \end{array} \tag{6.54}$$

The curvature *κ* and the impact parameter *ε* are given by

$$\kappa = \frac{2\gamma}{\sqrt{1 - 4\delta\gamma}}, \quad \epsilon = \frac{2\delta}{1 + \sqrt{1 - 4\delta\gamma}}.\tag{6.55}$$

Expressions of the uncertainties of the estimated parameters are available and can be found in [37].
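The explicit solution of Eqs. (6.51)–(6.55) can be sketched for unit weights. The arctangent fixes *φ* only up to a multiple of *π*/2, so the sketch tries both branches and keeps the one with the smaller *χ̃*<sup>2</sup>; the overall track-direction ambiguity (*φ* → *φ* + *π*, which flips the signs of *κ* and *ε*) is left unresolved. Function name and toy data are illustrative:

```python
import numpy as np

def karimaki_fit(x, y):
    z = x**2 + y**2
    m = lambda t: t.mean()
    Cxx = m(x*x) - m(x)**2; Cxy = m(x*y) - m(x)*m(y); Cyy = m(y*y) - m(y)**2
    Cxz = m(x*z) - m(x)*m(z); Cyz = m(y*z) - m(y)*m(z); Czz = m(z*z) - m(z)**2
    q1 = Czz*Cxy - Cxz*Cyz                            # Eq. (6.51)
    q2 = Czz*(Cxx - Cyy) - Cxz**2 + Cyz**2
    best = None
    for phi in (0.5*np.arctan2(2*q1, q2), 0.5*np.arctan2(2*q1, q2) + np.pi/2):
        g = (np.sin(phi)*Cxz - np.cos(phi)*Cyz) / Czz           # Eq. (6.52)
        d = -g*m(z) + np.sin(phi)*m(x) - np.cos(phi)*m(y)       # Eq. (6.53)
        eta = g*z - np.sin(phi)*x + np.cos(phi)*y + d           # Eq. (6.48)
        chi2 = float(np.sum(eta**2))
        root = np.sqrt(max(1.0 - 4.0*d*g, 1e-12))
        if best is None or chi2 < best[0]:
            best = (chi2, phi, 2.0*g/root, 2.0*d/(1.0 + root))  # Eq. (6.55)
    return best  # (chi2, phi, kappa, eps)

# exact circle through the origin, centre (0, -2), radius 2
t = np.linspace(0.1, 1.0, 10)
chi2, phi, kappa, eps = karimaki_fit(2.0*np.sin(t), -2.0 + 2.0*np.cos(t))
```

For this exact circle the fit recovers |*κ*| = 1/*R* = 0.5 and *ε* = 0 with a vanishing *χ̃*<sup>2</sup>.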

## *6.3.4 Riemann Fit*

The Riemann circle fit [38] is based on the fundamental theorem in complex analysis that circles and lines in the plane correspond one-to-one to circles on the Riemann sphere. Since a circle on a sphere is the intersection of a plane with the sphere, there is a one-to-one correspondence between circles and lines in the plane and planes in space. The problem of fitting a circle to a set of measurements in the plane can, therefore, be transformed into the problem of fitting the transformed measurements to a plane in space. The latter problem can be solved directly by non-iterative methods.

The mapping of a point *(ui, vi)* in the plane to the transformed point *(xi, yi, zi)* on the Riemann sphere is given by

$$\begin{aligned} x_i &= u_i/(1 + u_i^2 + v_i^2), \\ y_i &= v_i/(1 + u_i^2 + v_i^2), \\ z_i &= (u_i^2 + v_i^2)/(1 + u_i^2 + v_i^2). \end{aligned} \tag{6.56}$$

The denominator in the expressions of the transformed measurements leads to small distances between the transformed measurements and the fitted plane for large radii *R*<sub>*i*</sub> = (*u*<sub>*i*</sub><sup>2</sup> + *v*<sub>*i*</sub><sup>2</sup>)<sup>1/2</sup> in the plane. In an attempt to satisfy the Gauss-Markov conditions as closely as possible, a radius-dependent scaling factor was introduced in the fitting procedure in [39]. It was realized in [40] that this scaling factor could be omitted by mapping the points in the plane to a paraboloid rather than the Riemann sphere,

$$\mathbf{x}\_{l} = \boldsymbol{u}\_{l}, \quad \mathbf{y}\_{l} = \boldsymbol{v}\_{l}, \quad \boldsymbol{z}\_{l} = \boldsymbol{u}\_{l}^{2} + \boldsymbol{v}\_{l}^{2}, \tag{6.57}$$

leading to the same values of the estimated parameters if the measurements are at fixed radial positions.

Fitting a plane in space to the *n* measurements on the paraboloid is tantamount to minimizing the objective function

$$S(c, n) = \sum_{i=1}^{n} \frac{(c + n_1 x_i + n_2 y_i + n_3 z_i)^2}{\sigma_i^2} = \sum_{i=1}^{n} \frac{d_i^2}{\sigma_i^2} \tag{6.58}$$

with respect to *c* and *n* = (*n*<sub>1</sub>*, n*<sub>2</sub>*, n*<sub>3</sub>)<sup>T</sup> with the constraint that *n* is a unit vector. This is achieved by choosing *n* as the unit eigenvector corresponding to the smallest eigenvalue of the sample covariance matrix *A* of the measurements:

$$A = \frac{1}{N} \sum\_{i=1}^{n} \frac{1}{\sigma\_i^2} \left( r\_i - r\_{\text{cg}} \right) \left( r\_i - r\_{\text{cg}} \right)^{\mathsf{T}}.\tag{6.59}$$

The constant *c* is equal to *c* = −*n*<sup>T</sup>*r*<sub>cg</sub>, with the centre of gravity vector *r*<sub>cg</sub> = ∑<sub>*i*</sub> *w*<sub>*i*</sub>*r*<sub>*i*</sub> and the weights

$$w\_{i} = \frac{1/\sigma\_{i}^{2}}{\sum\_{j=1}^{n} 1/\sigma\_{j}^{2}}, \ i = 1, \ldots, n. \tag{6.60}$$

Given the parameters of the fitted plane, a suitable set of circle parameters can be derived. For example, the parameters chosen in Sect. 6.3.3 are given by:

$$\phi = \arctan\left(\frac{n\_2}{n\_1}\right),\tag{6.61}$$

$$\kappa = s \cdot \frac{2n\_3}{\sqrt{1 - n\_3^2 - 4cn\_3}},\tag{6.62}$$

$$\epsilon = s \cdot \frac{\sqrt{1 - n\_3^2 - 4cn\_3} - \sqrt{1 - n\_3^2}}{2n\_3},\tag{6.63}$$

up to a sign *s* = ±1. One possible convention of determining *s* is given in [41].

Expressions of the uncertainties of the estimated circle parameters are given in [41] for measurement uncertainties both in the transverse and in the radial direction. Effects of multiple Coulomb scattering can also be included in this approach, essentially by modifying Eq. (6.59) to include correlations between all measurements due to multiple scattering [40]. A robust version of the Riemann fit based on LMS regression is proposed in [42].
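A minimal sketch of the fit with the paraboloid mapping of Eq. (6.57), assuming equal measurement uncertainties; for simplicity the fitted plane is converted to the circle centre and radius rather than to (*φ*, *κ*, *ε*). Function name and toy data are illustrative:

```python
import numpy as np

def riemann_circle_fit(u, v):
    """Plane fit to the points mapped onto the paraboloid z = u^2 + v^2."""
    r = np.column_stack([u, v, u**2 + v**2])
    r_cg = r.mean(axis=0)
    A = (r - r_cg).T @ (r - r_cg) / len(u)   # sample covariance matrix
    evals, evecs = np.linalg.eigh(A)         # eigenvalues in ascending order
    n = evecs[:, 0]                          # unit normal: smallest eigenvalue
    c = -n @ r_cg
    # the plane n1 x + n2 y + n3 z + c = 0 with z = x^2 + y^2 is the circle
    # (x - x0)^2 + (y - y0)^2 = R^2 with:
    x0, y0 = -n[0] / (2*n[2]), -n[1] / (2*n[2])
    R = np.sqrt(n[0]**2 + n[1]**2 - 4*c*n[2]) / abs(2*n[2])
    return x0, y0, R

t = np.linspace(0.0, 2.0, 12)
u = 3.0 + 2.0*np.cos(t)       # arc of a circle with centre (3, -1), radius 2
v = -1.0 + 2.0*np.sin(t)
x0, y0, R = riemann_circle_fit(u, v)
```

The paraboloid image of an exact circle is exactly coplanar, so the smallest eigenvalue vanishes and the circle is recovered to machine precision.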

## *6.3.5 Helix Fitting*

Linearized helix fitting can be done by first estimating the parameters of the circle that results from the projection of the helix on the transverse (bending) plane. Any of the methods described above can be used for this. If the detector system at hand is of a barrel-type, so that the radial positions of the measurements are known to a very high precision, the path lengths to the intersections between the fitted circle and the detector elements can be obtained from the circle parameters. For the Riemann circle fit, these path lengths can be found directly from the knowledge of the parameters of the fitted plane [43]. The longitudinal (non-bending) plane parameters can then be found by solving the linear regression model:

$$z = A\,p + \varepsilon, \quad A = \begin{pmatrix} 1 & s_1 \\ \vdots & \vdots \\ 1 & s_n \end{pmatrix},\tag{6.64}$$

where *z* is the vector of *z* measurements and the parameter vector *p* is given by:

$$p = \begin{pmatrix} z \\ \tan\lambda \end{pmatrix}, \quad \mathsf{Var}\,[\varepsilon] = V_z,\tag{6.65}$$

where the dip angle *λ* = *π/*2 − *θ* is the complement of the polar angle *θ*, and *V <sup>z</sup>* is the covariance matrix of the measurements (containing contributions from multiple scattering, if desired). From the fitted parameters, *θ* and *z* in the innermost layer can be immediately obtained.

In a forward-type detector, the *z* positions are known very precisely, whereas the radial positions are the measured quantities. In this case, *s* is regressed on *z*. A suitable regression model is then:

$$s = A\,p + \varepsilon, \quad A = \begin{pmatrix} 1 & z_1 \\ \vdots & \vdots \\ 1 & z_n \end{pmatrix}, \quad p = \begin{pmatrix} s \\ \tan\theta \end{pmatrix}, \quad \mathsf{Var}\,[\varepsilon] = V_R,\tag{6.66}$$

since the covariance matrix of *s* is a very good approximation to the covariance matrix of *R*. From the fitted parameters, the polar angle *θ* of the track and the *s* values in all layers can be immediately determined, and, if desired, the predicted radial positions of all measurements [43]. If higher precision is needed, the circle and line fits can be iterated.
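The longitudinal line fit of Eqs. (6.64)–(6.65) is a plain weighted linear regression; a sketch for a barrel-type detector with a diagonal covariance matrix (function name and numbers are illustrative):

```python
import numpy as np

def longitudinal_fit(s, z, sigma_z):
    """Weighted LS fit of z = z0 + s * tan(lambda)."""
    A = np.column_stack([np.ones_like(s), s])
    W = np.diag(1.0 / sigma_z**2)
    cov = np.linalg.inv(A.T @ W @ A)        # covariance of the estimate
    p = cov @ (A.T @ W @ z)                 # p = (z0, tan(lambda))
    return p, cov

s = np.linspace(10.0, 60.0, 6)              # path lengths from the circle fit
z = 1.0 + 0.8 * s                           # exact measurements, tan(lambda) = 0.8
p, cov = longitudinal_fit(s, z, np.full(6, 0.1))
```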

## **6.4 Track Quality**

## *6.4.1 Testing the Track Hypothesis*

The principal test statistic of the track hypothesis, i.e., of the hypothesis that all measurements in the track are generated by the same charged particle, is the total *χ*<sup>2</sup> of the track. It is exactly *χ*2-distributed if, and only if, the following conditions are met:


Obviously, these conditions are met very rarely, if ever, in the experiment. In most circumstances, the track model is the linear approximation of a non-linear one; the measurement errors are not strictly normal, and the calibration of their covariance matrix is not perfect; the distribution of the multiple scattering angle has tails that contradict the assumption of normality; the estimated track parameters may be distorted by outliers; and the estimator may be a robust version of the usual LS estimator. As a consequence, the best one can hope for is that the total *χ*<sup>2</sup> is at least approximately *χ*2-distributed. Its distribution for a sample of tracks can be visualized by a histogram of the *p*-values, defined by:

$$p = \int\_{\chi^2}^{\infty} \mathbf{g}\_d(\mathbf{x})d\mathbf{x},\tag{6.67}$$

where *gd (x)* is the PDF of the *χ*2-distribution with *d* degrees of freedom. The number of degrees of freedom is the sum *M* of the measurement dimensions *mi* minus the dimension of the track parameter vector. In the ideal case, the *p*-values are uniformly distributed in the interval [0*,* 1]. In practice, one frequently observes a fairly uniform distribution with a peak at zero, where defective and fake tracks accumulate.
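For an even number *d* of degrees of freedom, the *p*-value of Eq. (6.67) has a closed form that avoids any numerical integration (the book's example above uses *d* = 3; the even-*d* restriction and the function name are choices of this sketch):

```python
import math

def chi2_pvalue_even(chi2, d):
    """Survival function of the chi-square distribution, even d only:
    p = exp(-x/2) * sum_{k=0}^{d/2-1} (x/2)^k / k!"""
    assert d % 2 == 0
    h = chi2 / 2.0
    return math.exp(-h) * sum(h**k / math.factorial(k) for k in range(d // 2))

p = chi2_pvalue_even(11.345, 4)   # p-value of chi2 = 11.345 with 4 dof
```

For a sample of good tracks, a histogram of such *p*-values should be approximately uniform on [0, 1].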

Besides the total chi-square statistic, the track length and the number of holes or missing measurements are indications of the track quality. As the number of degrees of freedom of the track fit is the same as the number of geometrical constraints imposed on the measurements, a long track is much less likely to be a fake track than a short track. On the other hand, an outlier has less effect on the total *χ*<sup>2</sup> in a long track than it has in a short track. This can be demonstrated by a simple example. Assume a perfect sample of tracks with four measurements of dimension two. The fit of five track parameters leaves three degrees of freedom. A *χ*<sup>2</sup>-cut at the 99%-quantile *q*<sub>0.99,3</sub> = 11.345 rejects 1% of the tracks. Assume that one of the measurements is replaced by an outlier, thereby increasing the total *χ*<sup>2</sup> of every track by 3. Now the same cut rejects 4% of the tracks. Under the same assumptions, but with ten measurements and 15 degrees of freedom, the cut rejects only 2.4% of the tracks with outliers.

If the efficiency of the tracking detectors or sensors is known with good precision, a rigorous test on the allowed number of holes can be constructed. If, for the sake of simplicity, it is assumed that the efficiency *ε* is the same for all sensors contributing hits to a track candidate, and that the occurrence of holes is independent across sensors, the number *h* of holes in a track with *n* measurements is distributed according to a binomial distribution:

$$P(h) = \binom{n}{h} \left(1 - \epsilon\right)^h \epsilon^{n-h}.\tag{6.68}$$

If *ε* = 0.98 and *n* = 15, then *P(*1*)* = 0.23 and *P(*2*)* = 0.032, so a single hole is not suspicious at all, and two holes are unlikely, but not impossible. If *n* = 6, then *P(*1*)* = 0.11 and *P(*2*)* = 0.0055, so a single hole is possible, but the occurrence of two holes just by chance is very unlikely, in which case the suspicion of a fake track or a contaminated track is well-founded.
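The hole probabilities of Eq. (6.68) are a one-liner; the sketch reproduces two of the numbers quoted above:

```python
import math

def p_holes(h, n, eff):
    """Binomial probability of h holes among n crossed sensors, Eq. (6.68)."""
    return math.comb(n, h) * (1.0 - eff) ** h * eff ** (n - h)

p1 = p_holes(1, 15, 0.98)   # one hole in 15 layers  -> about 0.23
p2 = p_holes(2, 6, 0.98)    # two holes in 6 layers  -> about 0.0055
```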

# *6.4.2 Detection of Outliers*

In the context of track fitting, an outlier is defined as a measurement that does not follow the expected behaviour. This may be put into statistical terms by saying that a measurement is considered as an outlier whenever its distance from the locally estimated track position is too large under the assumption of normal measurement errors, the distance being expressed in terms of the covariance matrix attached to the measurement.

Outliers can be classified into track-correlated and track-uncorrelated ones [44]. Some sources of track-correlated outliers are:


Track-uncorrelated outliers are signals that are not caused by the track, but are nevertheless picked up by the track search. They may be, for instance, signals from adjacent tracks, ghost hits in double-sided silicon sensors, see Sect. 1.3.1, or noise.

Whatever the source, an outlier can be detected by a test based on the residuals of the measurements with respect to the estimated track position. In the case of a single outlier, the test is most powerful if the estimate contains the information of all the other measurements. This is done most easily in the state space model of the track; see Sects. 3.2.3 and 6.1.2. Let *m*<sub>*k*</sub> be the measurement under scrutiny, *r*<sub>*k*|*n*</sub> its residual with respect to the smoothed state *q*<sub>*k*|*n*</sub>, and *R*<sub>*k*|*n*</sub> the covariance matrix of the residual; see Eqs. (3.39) and (3.40) and the end of Sect. 6.1.2. The compatibility of *m*<sub>*k*</sub> with *q*<sub>*k*|*n*</sub> can be checked component-wise on the basis of the standardized residuals, or globally on the basis of the chi-square statistic:

$$\chi^2_{k|n} = r_{k|n}^{\mathsf{T}}\, R_{k|n}^{-1}\, r_{k|n}.\tag{6.69}$$

If there are no outliers, the standardized residuals should be compatible with a standard normal distribution, and *χ*<sup>2</sup><sub>*k*|*n*</sub> should be compatible with a *χ*<sup>2</sup>-distribution with *m*<sub>*k*</sub> degrees of freedom, where *m*<sub>*k*</sub> is the dimension of *m*<sub>*k*</sub>. If *m*<sub>*k*</sub> is an outlier, this should be visible in the values of *r*<sub>*k*|*n*</sub> and *χ*<sup>2</sup><sub>*k*|*n*</sub>. There is, however, a problem with this approach. Even a single outlier at position *k* introduces a bias in all of the states *q*<sub>*i*|*n*</sub>, *i* ≠ *k*, so that also an inlier at position *i* ≠ *k* can show abnormal values of the residuals and *χ*<sup>2</sup><sub>*i*|*n*</sub>, especially if *i* is close to *k*. As a consequence, it is by no means obvious that *m*<sub>*k*</sub> can be correctly identified as the outlier. The situation is even worse if there are several outliers. In this case, a robust track fit that down-weights outliers, instead of trying to find and remove them, is a better solution; see Sect. 6.2.

## *6.4.3 Kink Finding*

A charged particle decay that produces a single charged daughter particle plus some neutral ones manifests itself as a sudden change of the track direction and/or curvature, often called a kink or breakpoint.<sup>1</sup> Typical examples are the muonic decays of charged *π* and *K* mesons. Another source of kinks is hard elastic scattering on the material of the detector [45]. Collinear energy loss of an electron by bremsstrahlung does not result in a kink, but only in a change of curvature.

It is characteristic for a kink that the track segments in front of and behind the kink both give a good fit to their respective track model; however, there is a significant difference between the two sets of track parameters estimated from the two track segments. In the Kalman filter framework, this difference and its covariance matrix are readily available at any layer *k* from the forward and the backward filter:

$$
\Delta_k = \tilde{q}_k - \tilde{q}^{\,\mathrm{b}}_{k|k+1}, \quad \mathsf{Var}\,[\Delta_k] = C_{\Delta,k} = C_k + C^{\mathrm{b}}_{k|k+1}. \tag{6.70}
$$

The associated chi-square statistic is given by:

$$
\chi^2\_{\Delta,k} = \Delta\_k^{\mathsf{T}} \mathbf{C}\_{\Delta,k}^{-1} \Delta\_k. \tag{6.71}
$$

In [44] a *χ*2-test statistic *X*<sup>2</sup> for kink finding is investigated:

$$X^2 = \max\_{k \in K} \chi^2\_{\Delta, k},\tag{6.72}$$

<sup>1</sup>In the literature on time series analysis it is called a change point.

where the range *K* of layers is restricted by the requirement that the respective track segments from the forward and the backward filter are well defined. Results from the simulation of *π* and *K* decays in a simplified setup are given in [44].
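The test of Eqs. (6.70)–(6.72) can be sketched in a few lines. The snippet below is illustrative only (the function names and the layer interface are not from [44]); it computes the per-layer forward–backward chi-square and its maximum:

```python
import numpy as np

def kink_chi2(q_fwd, C_fwd, q_bwd, C_bwd):
    """Chi-square of the forward-backward difference at one layer, Eqs. (6.70)-(6.71)."""
    delta = q_fwd - q_bwd                 # forward state minus backward prediction
    C_delta = C_fwd + C_bwd               # covariance of the difference
    return float(delta @ np.linalg.solve(C_delta, delta))

def kink_test(layers):
    """X^2 of Eq. (6.72): the maximum chi-square over the admissible layers.
    `layers` is a list of (q_fwd, C_fwd, q_bwd, C_bwd) tuples."""
    stats = [kink_chi2(*layer) for layer in layers]
    k = int(np.argmax(stats))
    return stats[k], k
```

The returned layer index locates the most probable breakpoint; the value is compared to a threshold tuned on simulated tracks.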

This simple test does not take into account the specific features of the process leading to the kink. In energy loss by bremsstrahlung, only the curvature changes; in hard elastic scattering, only the direction changes; in a decay, the direction changes and the momentum decreases, i.e., the curvature increases. In all three processes, the positions of the two track segments have to be compatible. These features are taken into account by the modified track fit described in [45]. At each possible breakpoint, an extended set of track parameters *α* is defined that allows for sudden changes in a subset of the track parameters. Three cases are considered:

1. A sudden change of the curvature only (bremsstrahlung).
2. A sudden change of the direction only (hard elastic scattering).
3. A sudden change of both direction and curvature (decay).
At layer *k*, *α* can be estimated by a linear regression in which the forward estimate *q̃*<sub>*k*</sub> and the backward prediction *q̃*<sup>b</sup><sub>*k*|*k*+1</sub> play the role of the observations. This is equivalent to the minimization of the following objective function:

$$\begin{split} \mathcal{S}(\alpha) &= \left(\widetilde{\mathbf{q}}\_{k} - \mathbf{H}\_{\mathrm{f}}\alpha\right)^{\mathsf{T}} \mathbf{C}\_{k}^{-1} \left(\widetilde{\mathbf{q}}\_{k} - \mathbf{H}\_{\mathrm{f}}\alpha\right) \\ &\quad + \left(\widetilde{\mathbf{q}}\_{k|k+1}^{\mathrm{b}} - \mathbf{H}\_{\mathrm{b}}\alpha\right)^{\mathsf{T}} \left(\mathbf{C}\_{k|k+1}^{\mathrm{b}}\right)^{-1} \left(\widetilde{\mathbf{q}}\_{k|k+1}^{\mathrm{b}} - \mathbf{H}\_{\mathrm{b}}\alpha\right), \end{split} \tag{6.73}$$

where *H*<sub>f</sub> and *H*<sub>b</sub> are the matrices that project *α* on *q̃*<sub>*k*</sub> and *q̃*<sup>b</sup><sub>*k*|*k*+1</sub>, respectively.

From the estimated vector *α* and its covariance matrix, standardized forward–backward differences of the relevant parameters can be computed. In addition, an *F*-test can be performed to test whether the additional parameters result in a significant reduction of the total chi-square statistic. The location of the breakpoint can be determined by the location of the largest discrepancy between forward and backward parameters, as measured by the value of S*(α)* at the minimum. Results of studies of simulated pion decays in the NOMAD detector are shown in [45].

The breakpoint finder in [46] is based on the autocorrelation function of the residuals of the track fit. In an undisturbed track with many measurements, typically in a TPC, the residuals between the measured coordinates and the fitted trajectory are only weakly correlated, so that the autocorrelation function of the residuals is close to zero for arbitrary lags. A breakpoint in the track introduces correlated shifts in all subsequent position measurements, resulting in an autocorrelation function that is significantly different from zero. Assuming 1D position measurements, the autocorrelation at lag ℓ is defined by:

$$\rho\_{\ell} = \left(\sum\_{i=1}^{n-\ell} r\_i r\_{i+\ell}\right) \left(\sum\_{i=1}^{n-\ell} r\_i^2 \sum\_{i=1}^{n-\ell} r\_{i+\ell}^2\right)^{-1/2},\tag{6.74}$$

where *ri* = *δi/σi* is the residual of measurement *i* divided by the standard error of measurement *i*, and *n* is the total number of measurements. The test statistic *λ* used in [46] is a weighted average of the autocorrelations up to a maximal lag *L* such that small lags have larger weight:

$$\lambda = \sum\_{\ell=1}^{L} w\_{\ell} \, \rho\_{\ell}, \quad w\_{\ell} = \frac{2 \, (L - \ell)}{L \, (L - 1)}, \quad \sum\_{\ell=1}^{L} w\_{\ell} = 1. \tag{6.75}$$

For the simulated data used in [46], setting *L* equal to the nearest integer to *n*/8 gives the largest power of the test. In general, the maximal lag *L* and the weights *w*<sub>ℓ</sub> must be tuned on simulated data. The threshold of *λ* above which the null hypothesis (no breakpoint) is rejected is set according to the tolerated percentage of undisturbed tracks that are rejected, i.e., to the probability of an error of the first kind.
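The statistic of Eqs. (6.74) and (6.75) can be sketched as follows, assuming the standardized residuals are given in a NumPy array (names are illustrative):

```python
import numpy as np

def autocorr(r, lag):
    """Normalized autocorrelation of the residuals at a given lag, Eq. (6.74)."""
    a, b = r[:-lag], r[lag:]
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

def breakpoint_statistic(r, L):
    """Weighted average of the autocorrelations up to the maximal lag L, Eq. (6.75)."""
    lags = np.arange(1, L + 1)
    w = 2.0 * (L - lags) / (L * (L - 1))     # weights sum to 1; small lags weigh more
    rho = np.array([autocorr(r, lag) for lag in lags])
    return float(w @ rho)
```

A value of the statistic above the tuned threshold rejects the no-breakpoint hypothesis.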

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Part III Vertex Reconstruction**

# **Chapter 7 Vertex Finding**

**Abstract** Vertex finding is the search for clusters of tracks that originate at the same point in space. The chapter discusses a variety of methods for finding primary vertices, first in one and then in three dimensions. Details are given on model-based clustering, the EM algorithm and clustering by deterministic annealing in 1D, and greedy clustering, iterated estimators, topological vertex finding, and a vertex finder based on medical imaging in 3D.

# **7.1 Introduction**

Vertex finding is the process of dividing all or a subset of the reconstructed tracks in an event into classes such that presumably all tracks in a class are produced at the same vertex. Vertices in an event can be classified as primary vertices or secondary vertices. In a fixed target experiment, a primary vertex is the point where a beam particle collides with a target particle; in a collider experiment, a primary vertex is the point where two beam particles collide. In the LHC experiments CMS and ATLAS, there are many primary vertices in each event because in each bunch crossing many collisions may and do occur. As a rule, however, at most one of these primary vertices is of interest to the subsequent analysis; this is called the signal vertex. The signal vertex is distinguished from the other primary vertices (pile-up vertices) by using kinematic criteria such as the transverse momenta of the tracks forming the vertex.

A secondary vertex is the point where an unstable particle decays, or where a particle interacts with the material of the detector. As the search for secondary vertices is often based on a well-reconstructed primary vertex, this chapter deals only with primary vertex finding; secondary vertex finding is deferred to Chap. 9.

In collider experiments, the position of the beam spot and the size of the luminous region contains precise prior information about the transverse position of primary vertices, see Fig. 7.1. The prior information on the position along the beam axis is much weaker, as the primary vertices can be anywhere in a zone defined by the bunch length, which is a couple of centimeters in the LHC. In fixed-target

R. Frühwirth, A. Strandlie, *Pattern Recognition, Tracking and Vertex Reconstruction in Particle Detectors*, Particle Acceleration and Detection, https://doi.org/10.1007/978-3-030-65771-0\_7

**Fig. 7.1** The transverse size of the luminous region of the LHC as determined by ATLAS in 2017 [1]. Left: horizontal size; right: vertical size

experiments, the prior information is given by the beam profile and the position and size of the target.

Vertex finding methods can be roughly divided into three main types: generic clustering algorithms, topological methods, and iterated estimators. The latter can be considered as a special model-based clustering method. For a brief introduction into methods of clustering, see Sect. 3.3.

## **7.2 Primary Vertex Finding in 1D**

In the LHC experiments ATLAS and CMS, there are many beam-beam interactions during a single bunch crossing at the typical luminosity level of the collider. In order to find all primary vertices, first primary tracks are selected by a cut on their distance to the *z*-axis, which is the beam line. The selected tracks are then clustered on the basis of their *z*-coordinates at their point of closest approach to the centre of the beam spot. The vertex finding problem is thus reduced to clustering in a single spatial dimension.

## *7.2.1 Divisive Clustering*

A simple divisive clustering can be performed by requiring a gap of at least *d* between adjacent clusters/vertices. The threshold *d* depends on the shape of the beam profile along *z*, on the expected number of interactions per bunch crossing, and on the precision of the *z*-coordinate used for clustering. It has to be optimized by studying simulated events. If subsequent validation of a cluster fails, it can be further divided by a more refined method (see Sect. 7.3.3).
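The gap criterion above amounts to cutting the sorted coordinates wherever two neighbours are more than *d* apart; a minimal sketch (function name and interface are illustrative):

```python
import numpy as np

def gap_clusters(z, d):
    """Split sorted 1D track positions wherever two neighbours are more than d apart."""
    z = np.sort(np.asarray(z, dtype=float))
    cuts = np.flatnonzero(np.diff(z) > d) + 1   # indices where a new cluster starts
    return np.split(z, cuts)
```

Each returned array is a candidate vertex cluster that can then be validated or refined.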

## *7.2.2 Model-Based Clustering*

Assume that there are *n* tracks with *z*-coordinates *zi, i* = 1*, . . . , n,* sorted in ascending order: *z*<sup>1</sup> *< z*<sup>2</sup> *< ... < zn*. The *zi* are assumed to be sampled from a Gaussian mixture with the following PDF:

$$f(z) = \sum\_{k=1}^{K} \omega\_k \, \varphi(z; v\_k, \sigma\_k^2), \quad \sum\_{k=1}^{K} \omega\_k = 1,\tag{7.1}$$

where *K* is the number of mixture components, *ϕ* is the PDF of the normal distribution, *ω<sub>k</sub>* is the component weight, *v<sub>k</sub>* is the mean value, and *σ*<sup>2</sup><sub>*k*</sub> is the variance of component *k*, *k* = 1*,...,K*. As the association of the tracks represented by the points *z<sub>i</sub>* to the vertices represented by the component means *v<sub>k</sub>* is unknown, latent (unobserved) variables *y<sub>i</sub>*, *i* = 1*,...,n* are introduced that encode the association:

$$y\_i = k \iff z\_i \text{ belongs to component } k, \ i = 1, \ldots, n. \tag{7.2}$$

The latent variables and the unknown parameters of the mixture (weights, means, variances) can be estimated by the Expectation-Maximization (EM) algorithm [2– 4]. The EM algorithm is iterative, and each iteration consists of two steps. In the Estep (expectation step), the latent variables are estimated, given the observations and the current estimate of the mixture parameters. In the M-step (maximization step), the maximum likelihood estimate of the mixture parameters is computed, using the estimates of the latent variables from the E-step. Convergence is guaranteed as the likelihood increases in every iteration. There is no guarantee, however, to reach the global maximum of the likelihood function.

In the special case of a normal mixture, explicit formulas can be obtained. Assume that the M-step of iteration *j* gives the estimates *ωk(j ), vk(j ), σ*<sup>2</sup> *<sup>k</sup> (j ), k* = 1*,...,K*. The E-step of the next iteration *j* + 1 computes the association probabilities or ownership weights *pi,k* of all points with respect to all components:

$$p\_{i,k} = \frac{\omega\_k(j)\,\varphi(z\_i; v\_k(j), \sigma\_k^2(j))}{\sum\_{l=1}^K \omega\_l(j)\,\varphi(z\_i; v\_l(j), \sigma\_l^2(j))}, \; i = 1, \ldots, n, \; k = 1, \ldots, K. \tag{7.3}$$

In the M-step the mixture parameters are updated:

$$\begin{aligned} \omega\_k(j+1) &= \frac{1}{n} \sum\_{i=1}^n p\_{i,k}, \\ v\_k(j+1) &= \frac{\sum\_{i=1}^n p\_{i,k} z\_i}{\sum\_{i=1}^n p\_{i,k}}, \\ \sigma\_k^2(j+1) &= \frac{\sum\_{i=1}^n p\_{i,k} \left[z\_i - v\_k(j+1)\right]^2}{\sum\_{i=1}^n p\_{i,k}}, \quad k = 1, \dots, K. \end{aligned} \tag{7.4}$$

The EM algorithm suffers from the fact that the number *K* of clusters has to be selected in advance. A possible solution to this problem is to set *K* to the largest value that can reasonably be expected, and to merge components that are sufficiently similar after the EM algorithm has converged.
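The E- and M-steps of Eqs. (7.3) and (7.4) translate directly into NumPy. In the sketch below the initial means are placed at equally spaced quantiles of the data; this initialization is chosen here for illustration only:

```python
import numpy as np

def phi(z, mean, var):
    """PDF of the normal distribution."""
    return np.exp(-0.5 * (z - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def em_gaussian_mixture(z, K, n_iter=200):
    """EM estimation of a 1D Gaussian mixture, Eqs. (7.3)-(7.4)."""
    z = np.asarray(z, dtype=float)
    n = len(z)
    omega = np.full(K, 1.0 / K)
    v = np.quantile(z, (np.arange(K) + 0.5) / K)   # spread initial means over the data
    var = np.full(K, np.var(z))
    for _ in range(n_iter):
        # E-step: ownership weights p[i, k], Eq. (7.3)
        p = omega * phi(z[:, None], v, var)
        p /= p.sum(axis=1, keepdims=True)
        # M-step: update weights, means, and variances, Eq. (7.4)
        nk = p.sum(axis=0)
        omega = nk / n
        v = (p * z[:, None]).sum(axis=0) / nk
        var = (p * (z[:, None] - v) ** 2).sum(axis=0) / nk
    return omega, v, var
```

The estimated means *v<sub>k</sub>* are the vertex candidates; in practice the iteration is stopped when the likelihood no longer increases.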

The selection of the optimal number of components can be automatized by sparse model-based clustering [5, 6]. Sparsity of the final mixture is achieved by an appropriate prior on the mixture weights. As explicit formulas are no longer available, estimation of the mixture parameters has to be done via Markov Chain Monte Carlo (MCMC). A comparison with the EM algorithm can be found in [6].

## *7.2.3 EM Algorithm with Deterministic Annealing*

Assume that there are *n* tracks with *z*-coordinates *zi, i* = 1*,...,n*, again in ascending order, with their associated standard errors *σi, i* = 1*,...,n*. The EM algorithm described in Sect. 7.2.2 is sensitive to the initial values of the component parameters. This can be cured by introducing deterministic annealing (DA), see [7]. DA introduces a temperature parameter *T* , which is used to scale the standard errors of the *zi, i* = 1*,...,n*. Annealing starts at high temperature, corresponding to large errors, and the temperature is lowered according to a predefined annealing schedule. At each temperature, the association probabilities of the data points *zi* to the current cluster centers are computed. If there are data points not associated to any of the clusters, indicated by very low association probabilities, one of these points is chosen as a new cluster center. The number of clusters is thus determined dynamically. The algorithm is summarized in Table 7.1. Note that in contrast to the model-based clustering in Sect. 7.2.2 the association probabilities are computed using the uncertainties of the track positions instead of the vertex positions.

Two examples with ten randomly generated clusters within a space of 1 cm and *σi* = 0*.*01 cm are shown in Fig. 7.2. In the example on the top, the found clusters perfectly match the true clusters while in the example on the bottom, two true clusters merge into a single found cluster, and one of the found clusters is spurious with a single data point.

## *7.2.4 Clustering by Deterministic Annealing*

Assume further that there are *K* vertex positions *v<sub>k</sub>*, *k* = 1*,...,K*, called prototypes, that represent *K* clusters of tracks. The expected discrepancy between data points and prototypes is given by:

#### **Table 7.1** Algorithm: Vertex finding with EM algorithm and deterministic annealing

CLUSTER FINDING WITH EM ALGORITHM AND DETERMINISTIC ANNEALING

3. Compute the association probabilities of the data points with respect to the current cluster centers:
$$p\_{i,k} = \frac{\exp\left[-\beta \left(z\_i - v\_k\right)^2 / \sigma\_i^2\right]}{\exp\left(-\beta \, c^2\right) + \sum\_{j=1}^K \exp\left[-\beta \left(z\_i - v\_j\right)^2 / \sigma\_i^2\right]}, \text{ with } \beta = 1/T. \tag{7.5}$$

4. Update the cluster centers:

$$v\_k = \frac{\sum\_{i=1}^{n} p\_{i,k} z\_i}{\sum\_{i=1}^{n} p\_{i,k}}, \ k = 1, \dots, K. \tag{7.6}$$

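One temperature step of the algorithm, i.e., Eq. (7.5) followed by Eq. (7.6), can be sketched as follows; the cut-off *c* is the constant appearing in the denominator of Eq. (7.5), and the function name is illustrative:

```python
import numpy as np

def da_em_step(z, sigma, v, beta, c):
    """One step at fixed temperature T = 1/beta: association probabilities
    of Eq. (7.5), then the cluster-centre update of Eq. (7.6)."""
    chi2 = ((z[:, None] - v) / sigma[:, None]) ** 2
    num = np.exp(-beta * chi2)
    # the exp(-beta c^2) term keeps points far from all clusters unassociated
    p = num / (np.exp(-beta * c ** 2) + num.sum(axis=1, keepdims=True))
    v_new = (p * z[:, None]).sum(axis=0) / p.sum(axis=0)
    return p, v_new
```

The step is repeated at each temperature of the annealing schedule; points with uniformly low rows of `p` are candidates for seeding a new cluster.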

**Fig. 7.2** Examples of cluster finding with EM algorithm and Deterministic Annealing. The results are discussed in the text

$$E = \sum\_{i=1}^{n} \sum\_{k=1}^{K} P(\mathbf{z}\_i \in \mathcal{C}\_k) \, d(\mathbf{z}\_i, \upsilon\_k), \tag{7.7}$$

where *Ck* is the cluster represented by *vk* and *d(zi, vk)* is a measure of distance between data point *zi* and the prototype *vk* [8]. A typical choice of *d(zi, vk)* is the weighted squared difference:

$$d(z\_i, v\_k) = \left(\frac{z\_i - v\_k}{\sigma\_i}\right)^2,\tag{7.8}$$

where *σi* is again the standard error of track position *zi*. As there is no prior knowledge on the probabilities *P (z<sup>i</sup>* ∈ *Ck)*, the principle of maximum entropy can be applied, giving a Gibbs distribution:

$$P(z\_i \in \mathcal{C}\_k) = \frac{\exp\left(-\beta \, d(z\_i, v\_k)\right)}{\sum\_{l=1}^K \exp\left(-\beta \, d(z\_i, v\_l)\right)}. \tag{7.9}$$

The parameter *β* = 1*/T* is the inverse of the "temperature" of the system.

Finding the most probable configuration of the prototypes at a given temperature is equivalent to minimizing the "free energy" *F*, which is given by [8]:

$$F = -\frac{1}{\beta} \sum\_{i=1}^{n} \ln \sum\_{k=1}^{K} \exp \left( -\beta \, d(z\_i, v\_k) \right). \tag{7.10}$$

Minimization of *F* has to be done by numerical methods; see Sect. 3.1.

At infinite temperature, i.e., *β* = 0, there is a single cluster containing all data points. The temperature is now lowered according to a predefined annealing schedule. At some positive value of *β*, the cluster will undergo a phase transition and split into smaller clusters. Annealing is continued until the end of the schedule. At every temperature, a characteristic number of effective prototypes emerges at distinct positions, independent of the number of prototypes.

Instead of using a large number of prototypes, many of which may coincide at a given temperature, one can work with weighted prototypes, where the weight *ρk* corresponds to the fraction of unweighted prototypes that coincide at *vk*. The weights *ρk* always sum to 1, and the free energy is slightly modified:

$$F = -\frac{1}{\beta} \sum\_{i=1}^{n} \ln \sum\_{k=1}^{K} \rho\_k \exp \left( -\beta \, d(z\_i, v\_k) \right). \tag{7.11}$$

The association probabilities and the cluster weight are given by:

$$p\_{i,k} = P(z\_i \in \mathcal{C}\_k) = \frac{\rho\_k \exp\left(-\beta \, d(z\_i, v\_k)\right)}{\sum\_{l=1}^K \rho\_l \exp\left(-\beta \, d(z\_i, v\_l)\right)}, \quad \rho\_k = \frac{1}{n} \sum\_{i=1}^n p\_{i,k}.\tag{7.12}$$

The annealing is started at high temperature with a single prototype of weight *ρ*<sub>1</sub> = 1. If the distance function is chosen as in Eq. (7.8), the minimum of *F* is at the weighted mean of the data points. During annealing, the temperature is gradually decreased and the local minima of *F* emerge. The critical temperature *T*<sup>c</sup><sub>*k*</sub> of cluster *k* is the point where a local minimum turns into a saddle point and is given by:

$$T\_k^c = 2\sum\_{i=1}^n \frac{p\_{i,k}}{\sigma\_i^2} \left(\frac{z\_i - v\_k}{\sigma\_i}\right)^2 \Big/ \sum\_{i=1}^n \frac{p\_{i,k}}{\sigma\_i^2}.\tag{7.13}$$

Whenever the temperature falls below the critical temperature of a cluster, the prototype of this cluster is replaced by two nearby representatives. The association probabilities and the new cluster weights are recomputed according to Eq. (7.12). If the final temperature is small enough, the soft assignments of data points to clusters turn into a hard assignment.
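The critical temperature of Eq. (7.13) is a one-line computation; the sketch below assumes NumPy arrays for the data points, their errors, and the ownership weights of one cluster:

```python
import numpy as np

def critical_temperature(z, sigma, p_k, v_k):
    """Critical temperature of one cluster, Eq. (7.13); below it the
    prototype splits into two nearby representatives."""
    w = p_k / sigma ** 2
    return float(2.0 * np.sum(w * ((z - v_k) / sigma) ** 2) / np.sum(w))
```

During annealing this value is monitored for every cluster, and a split is triggered as soon as the current temperature drops below it.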

## **7.3 Primary Vertex Finding in 3D**

In principle, the clustering methods described above for vertex finding in 1D can also be applied to vertex finding in 3D. It has to be noted, though, that the shortest distance in space between two tracks is peculiar insofar as it does not satisfy the triangle inequality: if tracks *a* and *b* are close, and tracks *b* and *c* are close, it does not follow that tracks *a* and *c* are close as well. The distance between two clusters of tracks should therefore be defined as the maximum of the individual pairwise distances, known as complete linkage in the clustering literature. Alternatively, the distance between two clusters can be the distance between the two vertices fitted from the clusters.

## *7.3.1 Preclustering*

Because of the high track and vertex multiplicity in typical collider experiments at the LHC, preliminary clusters of tracks can be formed by selecting a primary vertex or a small group of primary vertices found in 1D. All tracks that are compatible with these are put in a preliminary cluster. This reduces the combinatorics and opens the possibility of processing these preliminary clusters in parallel. Tracks that are not compatible with any primary vertex are reserved for secondary vertex finding. In low-multiplicity experiments, this preliminary clustering can be omitted.

## *7.3.2 Greedy Clustering*

Greedy clustering is agglomerative and starts with a single track, preferably a high-quality track with many hits and good *χ*<sup>2</sup> (see Sect. 6.4.1). It is combined with its nearest neighbour in 3D, and a vertex is fitted from the two tracks. If the fit is successful, the vertex is stored. The track nearest to the vertex is then added, for instance, by means of an extended Kalman filter (see Sect. 8.1.2.2). This procedure is continued until the vertex fit fails. Clustering is then resumed with an unused track.

The greedy clustering does not guarantee the globally best assignment of tracks to vertices, as tracks that are attached to a vertex remain attached forever. This can be cured by using a robust vertex fit throughout (see Sect. 8.2), allowing a track to be removed from a vertex if it is tagged as an outlier.

## *7.3.3 Iterated Estimators*

This is a divisive clustering algorithm with the following steps:

1. Fit a single vertex with all tracks not yet assigned to a vertex.
2. Remove all tracks that are incompatible with the fitted vertex.
3. Repeat from step 1 with the removed tracks.
The iteration stops when no vertex with at least two tracks can be successfully fitted. Step 2 might itself be iterative, especially if the vertex fit is not robust, so that the incompatible tracks have to be removed sequentially. An iterative vertex finder, based on an adaptive fit (Sect. 8.2.2) and called the Adaptive Vertex Reconstructor (AVR, [9]) is implemented in the RAVE toolbox [10, 11]; see Appendix C.

## *7.3.4 Topological Vertex Finder*

A general topological vertex finder called ZVTOP was proposed in [12]. It is related to the Radon transform, which is a continuous version of the Hough transform used for track finding (see Sect. 5.1.2). The search for vertices is based on a function *V (v)* that quantifies the probability of a vertex at location *v*. For each track a Gaussian probability tube *fi(v)* is constructed. The function *V (v)* is defined taking into account that the value of *fi(v)* must be significant for at least two tracks:

$$V(\mathbf{v}) = \sum\_{i=1}^{n} f\_i(\mathbf{v}) - \frac{\sum\_{i=1}^{n} f\_i^2(\mathbf{v})}{\sum\_{i=1}^{n} f\_i(\mathbf{v})}$$

Due to the second term on the right-hand side, *V (v)* ≈ 0 in regions where *fi(v)* is significant for only one track. The form of *V (v)* can be modified to fold in known physics information about probable vertex locations. For instance, *V (v)* can be augmented by a further function *f*0*(v)* describing the location and spread of the interaction point. In addition, *V (v)* may be modified by a factor dependent on the angular location of the point *v*.

Vertex finding amounts to finding the local maxima of the function *V (v)*. The search starts at the calculated maxima of the products *fi(v)fj (v)* for all track pairs. For each of these points, the nearest maximum of *V (v)* is found. As *V (v)* is a smooth function, any of the methods discussed in Sect. 3.1 can be employed. The found maxima are clustered together to form candidate vertex regions. The final association of the tracks to the vertex candidates can be done on the basis of the respective *χ*<sup>2</sup> contributions or by an adaptive fit (see Sect. 8.2.2). An experimental application is described in [13].
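A sketch of the vertex function *V(v)* for straight tracks, with each probability tube modeled as a Gaussian in the perpendicular distance from the track; the tube width `sigma` and the function names are illustrative assumptions, not from [12]:

```python
import numpy as np

def tube(v, r, a, sigma):
    """Gaussian probability tube of a straight track (point r, unit direction a)."""
    d = np.cross(np.asarray(v, float) - r, a)   # perpendicular offset from the track
    return np.exp(-0.5 * d @ d / sigma ** 2)

def vertex_function(v, tracks, sigma=0.01):
    """ZVTOP-style V(v): large only where at least two tubes overlap."""
    f = np.array([tube(v, r, a, sigma) for r, a in tracks])
    s = f.sum()
    return s - (f ** 2).sum() / s if s > 0 else 0.0
```

At a point on a single track the two terms cancel, so isolated tracks do not create spurious maxima.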

In [14], the topological vertex finder was augmented by a procedure based on the concept of the minimum spanning tree of a graph. For each track, the bins crossed by the track (or a tangent to the track at the point of closest approach) are incremented by one.

# *7.3.5 Medical Imaging Vertexer*

The Medical Imaging Vertexer (MIV) [15] is similar to ZVTOP, but differs from it in two points: first, it works with a pixelized representation of the track density; second, it applies a medical imaging filter to the density before finding the maxima. The vertex finder can be summarized in the following steps:

1. Compute a pixelized representation of the track density.
2. Apply the medical imaging filter to the pixelized density.
3. Search for local maxima of the filtered density and form the vertex candidates.
The performance of the MIV has been studied and compared to the Adaptive Vertex Reconstructor (AVR) in [15]. It is shown that the MIV finds vertices with higher efficiency and higher purity at large pile-up, whereas the AVR performs better at small pile-up.

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Chapter 8 Vertex Fitting**

**Abstract** The methods used for vertex fitting are closely related to the ones used in track fitting. The chapter describes least-squares estimators as well as robust and adaptive estimators. Furthermore, it is shown how the vertex fit can be extended to a kinematic fit by imposing additional constraints on the tracks participating in the fit.

# **8.1 Least-Squares Fitting**

# *8.1.1 Straight Tracks*

#### **8.1.1.1 Exact Fit**

Assume that there are *n* straight tracks that have to be fitted to a common vertex. Track *i* is given by a point *ri*, a unit direction vector *ai*, and the joint covariance matrix *V<sup>i</sup>* of *q<sup>i</sup>* = *(ri*; *ai), i* = 1*,...,n*. The rank of *V<sup>i</sup>* is usually equal to five. Here and in the entire chapter, it is assumed that there is no material between the vertex and the point or surface where the track parameters are defined.

The estimated common vertex is the point *v* that minimizes the sum of the weighted squared distances from the tracks. The squared distance *Di* of track *i* from the point *v* is given by:

$$D\_i(\mathbf{v}) = \left[ (\mathbf{r}\_i - \mathbf{v}) \times \mathbf{a}\_i \right]^2. \tag{8.1}$$

For a given vertex *v*<sub>0</sub>, the variance *σ*<sup>2</sup><sub>*i*</sub> = var[*D<sub>i</sub>*] is computed by linearized error propagation. The Jacobian of *D<sub>i</sub>* with respect to *q<sub>i</sub>* is given by:

© The Author(s) 2021 R. Frühwirth, A. Strandlie, *Pattern Recognition, Tracking and Vertex Reconstruction in Particle Detectors*, Particle Acceleration and Detection, https://doi.org/10.1007/978-3-030-65771-0\_8

$$\begin{aligned} \mathbf{J}\_{i} &= \begin{pmatrix} \frac{\partial D\_{i}}{\partial \mathbf{r}\_{i}}\\ \frac{\partial D\_{i}}{\partial \mathbf{a}\_{i}} \end{pmatrix}, \ \frac{\partial D\_{i}}{\partial \mathbf{r}\_{i}} = 2 \cdot \begin{pmatrix} a\_{i,2} \ \eta\_{i,21} - a\_{i,3} \ \eta\_{i,13} \\ a\_{i,3} \ \eta\_{i,32} - a\_{i,1} \ \eta\_{i,21} \\ a\_{i,1} \ \eta\_{i,13} - a\_{i,2} \ \eta\_{i,32} \end{pmatrix}, \\\\ \frac{\partial D\_{i}}{\partial \mathbf{a}\_{i}} &= 2 \cdot \begin{pmatrix} d\_{i,3} \ \eta\_{i,13} - d\_{i,2} \ \eta\_{i,21} \\ d\_{i,1} \ \eta\_{i,21} - d\_{i,3} \ \eta\_{i,32} \\ d\_{i,2} \ \eta\_{i,32} - d\_{i,1} \ \eta\_{i,13} \end{pmatrix}, \\\end{aligned} \tag{8.2}$$

with the auxiliary variables

$$d\_{i,k} = r\_{i,k} - v\_{0,k}, \ \ \eta\_{i,jk} = a\_{i,j} \, d\_{i,k} - a\_{i,k} \, d\_{i,j}, \ \ j,k = 1,2,3.$$

It follows that

$$
\sigma\_i^2 \approx \mathbf{J}\_i^\mathsf{T} \cdot \mathbf{V}\_i \cdot \mathbf{J}\_i. \tag{8.3}
$$

Minimizing the sum of the squared distances gives the fitted vertex *v***ˆ**:

$$
\hat{\mathbf{v}} = \arg\min\_{\mathbf{v}} \mathcal{S}(\mathbf{v}), \quad \text{with} \quad \mathcal{S}(\mathbf{v}) = \sum\_{i=1}^{n} \frac{D\_i(\mathbf{v})}{\sigma\_i^2}. \tag{8.4}
$$

The minimization with the Newton–Raphson method proceeds iteratively:

1. Choose a starting point *v*0.
2. Compute the gradient of S with respect to *v* at *v*0:
$$\nabla \mathcal{S} = \frac{\partial \mathcal{S}}{\partial \mathbf{v}} = \sum\_{i=1}^{n} \frac{1}{\sigma\_{i}^{2}} \cdot \frac{\partial D\_{i}}{\partial \mathbf{v}} \,, \quad \text{with} \quad \frac{\partial D\_{i}}{\partial \mathbf{v}} = -\frac{\partial D\_{i}}{\partial \mathbf{r}\_{i}} \,. \tag{8.5}$$

3. Compute the Hessian matrix of S with respect to *v* at *v*0:

$$\nabla^2 \mathcal{S} = \sum\_{i=1}^n \frac{1}{\sigma\_i^2} \mathbf{H}\_i,\quad \text{with}\quad \mathbf{H}\_i = 2 \cdot \begin{pmatrix} a\_{i,2}^2 + a\_{i,3}^2 & -a\_{i,1}a\_{i,2} & -a\_{i,1}a\_{i,3} \\ -a\_{i,1}a\_{i,2} & a\_{i,1}^2 + a\_{i,3}^2 & -a\_{i,2}a\_{i,3} \\ -a\_{i,1}a\_{i,3} & -a\_{i,2}a\_{i,3} & a\_{i,1}^2 + a\_{i,2}^2 \end{pmatrix} . \tag{8.6}$$

4. Compute the solution *v*<sup>1</sup> of ∇S = **0**:

$$
\mathbf{v}\_{1} = \mathbf{v}\_{0} - (\nabla^{2} \mathcal{S})^{-1} \cdot \nabla \mathcal{S}. \tag{8.7}
$$

5. Set *v*<sup>0</sup> equal to *v*<sup>1</sup> and repeat from step 2 until convergence.

The covariance matrix of the final estimate *v̂* is given by (∇<sup>2</sup>S)<sup>−1</sup>, and the *χ*<sup>2</sup> statistic of the fit is equal to S(*v̂*). Its number of degrees of freedom is the sum of the ranks of all *V<sub>i</sub>* minus three. Prior information on the vertex position that is independent of the track information can be included by an additional term in Eq. (8.4) or after the fit by a weighted mean.
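The Newton–Raphson minimization of Eq. (8.4) can be sketched as follows for straight tracks. For simplicity the variances *σ*<sup>2</sup><sub>*i*</sub> are held fixed here instead of being recomputed by error propagation at each step; the interface is illustrative:

```python
import numpy as np

def fit_vertex(tracks, sigma2, v0, n_iter=5):
    """Newton-Raphson minimization of S(v) = sum_i D_i(v)/sigma_i^2, Eq. (8.4),
    for straight tracks given as (point r_i, unit direction a_i) tuples."""
    v = np.asarray(v0, dtype=float)
    for _ in range(n_iter):
        grad = np.zeros(3)
        hess = np.zeros((3, 3))
        for (r, a), s2 in zip(tracks, sigma2):
            r = np.asarray(r, float)
            a = np.asarray(a, float)
            c = np.cross(r - v, a)                  # (r_i - v) x a_i
            grad += -2.0 * np.cross(a, c) / s2      # dD_i/dv, cf. Eqs. (8.2), (8.5)
            hess += 2.0 * (np.eye(3) - np.outer(a, a)) / s2   # Eq. (8.6)
        v = v - np.linalg.solve(hess, grad)         # Newton step, Eq. (8.7)
    return v
```

Since S is quadratic in *v* for straight tracks, the iteration converges in a single step; the loop matters once the variances are recomputed at the updated vertex.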

#### **8.1.1.2 Simplified Fit**

If the uncertainty of the direction vectors *a<sub>i</sub>* is neglected, the vertex fit can be further simplified [1]. Assume that track *i* is specified by a reference point *r<sub>i</sub>* = (*x<sub>i</sub>*, *y<sub>i</sub>*, *z<sub>i</sub>*)<sup>T</sup> in the vicinity of the vertex and a unit direction vector *a<sub>i</sub>* in spherical coordinates:

$$\mathbf{a}\_{i} = \left(\cos\varphi\_{i}\cos\lambda\_{i}, \sin\varphi\_{i}\cos\lambda\_{i}, \sin\lambda\_{i}\right)^{\mathsf{T}},\tag{8.8}$$

where *ϕ<sub>i</sub>* is the azimuth and *λ<sub>i</sub>* = *π*/2 − *θ<sub>i</sub>* the dip angle, i.e., the complement of the polar angle. For the purpose of the vertex fit, a convenient choice is a track-based coordinate system in which the *x*-axis is parallel to the track, the *y*<sub>⊥</sub>-axis is perpendicular to *x* and *z*, and the *z*<sub>⊥</sub>-axis forms a right-handed orthonormal system with *x* and *y*<sub>⊥</sub>. The coordinate transformation of the reference point *r<sub>i</sub>* to this track-based system is given by the following rotation:

$$\mathbf{r}'\_{i} = \mathbf{R}\_{i}\mathbf{r}\_{i} = \begin{pmatrix} \cos\varphi\_{i}\cos\lambda\_{i} & \sin\varphi\_{i}\cos\lambda\_{i} & \sin\lambda\_{i} \\ -\sin\varphi\_{i} & \cos\varphi\_{i} & 0 \\ -\cos\varphi\_{i}\sin\lambda\_{i} & -\sin\varphi\_{i}\sin\lambda\_{i} & \cos\lambda\_{i} \end{pmatrix}\mathbf{r}\_{i}.\tag{8.9}$$

The coordinates *y*<sub>⊥</sub> and *z*<sub>⊥</sub> are called the transverse and the longitudinal impact parameter, respectively. The fit described in [1] assumes that *q<sub>i</sub>* = (*y*<sub>⊥</sub>, *z*<sub>⊥</sub>)<sup>T</sup> has been estimated by the track fit, with the associated weight matrix *G<sub>i</sub>*, and that the direction errors are negligible. The transformation from *r<sub>i</sub>* to *q<sub>i</sub>* is given by the 2×3 matrix *T<sub>i</sub>* consisting of the second and third row of *R<sub>i</sub>*.

The vertex *v* is estimated by minimizing the sum of the weighted distances between the reference points and the vertex, transformed to the corresponding trackbased systems:

$$\mathcal{S}(\boldsymbol{\nu}) = \sum\_{i=1}^{n} (\boldsymbol{r}\_{i} - \boldsymbol{\nu})^{\mathsf{T}} \boldsymbol{T}\_{i}^{\mathsf{T}} \boldsymbol{G}\_{i} \boldsymbol{T}\_{i} (\boldsymbol{r}\_{i} - \boldsymbol{\nu}). \tag{8.10}$$

The estimated vertex and its covariance matrix *C* are therefore given by:

$$\hat{\mathbf{v}} = \mathbf{C} \sum\_{i=1}^{n} \mathbf{W}\_{i} \mathbf{r}\_{i}, \quad \text{with} \quad \mathbf{C} = \left(\sum\_{i=1}^{n} \mathbf{W}\_{i}\right)^{-1} \quad \text{and} \quad \mathbf{W}\_{i} = \mathbf{T}\_{i}^{\mathsf{T}} \mathbf{G}\_{i} \mathbf{T}\_{i}, \ i = 1, \ldots, n. \tag{8.11}$$
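The simplified fit of Eqs. (8.9)–(8.11) is a weighted mean and needs no iteration; a minimal sketch, with an illustrative interface of reference points, track angles, and 2×2 weight matrices:

```python
import numpy as np

def rotation(phi, lam):
    """Track-based rotation matrix R_i of Eq. (8.9)."""
    cp, sp, cl, sl = np.cos(phi), np.sin(phi), np.cos(lam), np.sin(lam)
    return np.array([[ cp * cl,  sp * cl,  sl],
                     [-sp,       cp,       0.0],
                     [-cp * sl, -sp * sl,  cl]])

def simplified_vertex_fit(refs, phis, lams, Gs):
    """Weighted-mean vertex estimate of Eq. (8.11); direction errors neglected."""
    W_sum = np.zeros((3, 3))
    Wr_sum = np.zeros(3)
    for r, phi, lam, G in zip(refs, phis, lams, Gs):
        T = rotation(phi, lam)[1:]       # rows 2 and 3 of R_i -> impact parameters
        W = T.T @ G @ T                  # W_i = T_i^T G_i T_i
        W_sum += W
        Wr_sum += W @ np.asarray(r, float)
    C = np.linalg.inv(W_sum)
    return C @ Wr_sum, C
```

Because each *W<sub>i</sub>* has rank two, at least two non-parallel tracks are needed for the sum to be invertible.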

# *8.1.2 Curved Tracks*

The fits described in the preceding subsection can also be used with locally straight tracks for which the change of curvature and direction in the vicinity of the vertex is negligible. If this is not the case, nonlinearities in the track model have to be taken into account.

#### **8.1.2.1 Nonlinear Regression**

The general vertex fit can be formulated as a nonlinear regression model [2]. Assume that there are *n* tracks to be fitted to a common vertex. The tracks are specified by the estimated track parameters *q_i* and the associated covariance matrices *V_i*, *i* = 1, ..., *n*. The parameters to be estimated are the vertex position *v* and the momentum vectors *p_i* of all tracks at the vertex, see Fig. 8.1.

The track parameters *q_i* are nonlinear functions of the parameters:

$$\mathbf{q}\_{i} = \mathbf{h}\_{i}(\mathbf{v}, \mathbf{p}\_{i}), \ i = 1, \ldots, n. \tag{8.12}$$

**Fig. 8.1** A vertex fit with four tracks. The parameters of the fit are the vertex *v* and the momentum vectors *p_i*; the observations are the estimated track parameters *q_i*

The first-order Taylor expansion of *h_i* at a suitable expansion point *e*_0 = (*v*_0, *p*_{i,0}) gives the following approximate linear model:

$$\mathbf{q}\_{i} \approx \mathbf{A}\_{i}\mathbf{v} + \mathbf{B}\_{i}\mathbf{p}\_{i} + \mathbf{c}\_{i}, \ i = 1, \ldots, n,\tag{8.13}$$

with

$$\mathbf{A}\_{i} = \left.\frac{\partial \boldsymbol{h}\_{i}}{\partial \boldsymbol{v}}\right|\_{\boldsymbol{e}\_{0}}, \ \mathbf{B}\_{i} = \left.\frac{\partial \boldsymbol{h}\_{i}}{\partial \boldsymbol{p}\_{i}}\right|\_{\boldsymbol{e}\_{0}}, \ \mathbf{c}\_{i} = \boldsymbol{h}\_{i}(\boldsymbol{v}\_{0}, \boldsymbol{p}\_{i,0}) - \mathbf{A}\_{i}\boldsymbol{v}\_{0} - \mathbf{B}\_{i}\boldsymbol{p}\_{i,0}.\tag{8.14}$$

This can be written as:

$$
\begin{pmatrix} q\_1 \\ \vdots \\ q\_n \end{pmatrix} = \begin{pmatrix} A\_1 & \mathbf{B}\_1 & \mathbf{O} & \dots & \mathbf{O} \\ A\_2 & \mathbf{O} & \mathbf{B}\_2 & \dots & \mathbf{O} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ A\_n & \mathbf{O} & \mathbf{O} & \dots & \mathbf{B}\_n \end{pmatrix} \begin{pmatrix} v \\ p\_1 \\ \vdots \\ p\_n \end{pmatrix} + \begin{pmatrix} c\_1 \\ \vdots \\ c\_n \end{pmatrix} . \tag{8.15}
$$

The LS estimates *v***ˆ** and *p***ˆ***<sup>i</sup>* are obtained by:

$$
\begin{pmatrix} \hat{\boldsymbol{v}} \\ \hat{\boldsymbol{p}}\_1 \\ \vdots \\ \hat{\boldsymbol{p}}\_n \end{pmatrix} = \boldsymbol{M}^{-1} \boldsymbol{N} \begin{pmatrix} \boldsymbol{q}\_1 - \boldsymbol{c}\_1 \\ \vdots \\ \boldsymbol{q}\_n - \boldsymbol{c}\_n \end{pmatrix}, \tag{8.16}
$$

with

$$M = \begin{pmatrix} D\_0 & D\_1 & D\_2 & \dots & D\_n \\\\ D\_1^\top & E\_1 & O & \dots & O \\\\ D\_2^\top & O & E\_2 & \dots & O \\\\ \vdots & \vdots & \vdots & \ddots & \vdots \\\\ D\_n^\top & O & O & \dots & E\_n \end{pmatrix},\tag{8.17}$$

$$N = \begin{pmatrix} A\_1^\top G\_1 & A\_2^\top G\_2 & \dots & A\_n^\top G\_n \\\\ B\_1^\top G\_1 & O & \dots & O \\\\ O & B\_2^\top G\_2 & \dots & O \\\\ \vdots & \vdots & \ddots & \vdots \\\\ O & O & \dots & B\_n^\top G\_n \end{pmatrix},\tag{8.18}$$

$$\mathbf{D}\_{i} = \mathbf{A}\_{i}^{\mathsf{T}} \mathbf{G}\_{i} \mathbf{B}\_{i}, \quad \mathbf{E}\_{i} = \mathbf{B}\_{i}^{\mathsf{T}} \mathbf{G}\_{i} \mathbf{B}\_{i} = \mathbf{W}\_{i}^{-1}, \quad \mathbf{G}\_{i} = \mathbf{V}\_{i}^{-1}, \quad i = 1, \ldots, n,\tag{8.19}$$

$$\mathbf{D}\_0 = \sum\_{i=1}^n \mathbf{A}\_i^{\mathsf{T}} \mathbf{G}\_i \mathbf{A}\_i. \tag{8.20}$$

*C* = *M*⁻¹ can be written as a block matrix with the blocks *C_{ij}*, *i, j* = 0, ..., *n*:

$$\mathbf{C}\_{00} = \left(\mathbf{D}\_0 - \sum\_{i=1}^n \mathbf{D}\_i \mathbf{W}\_i \mathbf{D}\_i^{\mathsf{T}} \right)^{-1},\tag{8.21}$$

$$\mathbf{C}\_{0j} = -\mathbf{C}\_{00}\mathbf{D}\_j\,\mathbf{W}\_j, \ \mathbf{C}\_{j0} = \mathbf{C}\_{0j}^\mathsf{T}, \ j > 0 \tag{8.22}$$

$$\mathbf{C}\_{ij} = \delta\_{ij} \mathbf{W}\_i + \mathbf{W}\_i \mathbf{D}\_i^{\mathsf{T}} \mathbf{C}\_{00} \mathbf{D}\_j \mathbf{W}\_j = \delta\_{ij} \mathbf{W}\_i - \mathbf{W}\_i \mathbf{D}\_i^{\mathsf{T}} \mathbf{C}\_{0j}, \quad i, j > 0. \tag{8.23}$$

Substitution of Eqs. (8.21)–(8.23) into Eq. (8.16) gives the following expressions for the estimated parameters:

$$\hat{\boldsymbol{\nu}} = \mathbf{C}\_{00} \sum\_{j=1}^{n} A\_j^{\mathsf{T}} \mathbf{G}\_j (\mathbf{I} - \mathbf{B}\_j \mathbf{W}\_j \mathbf{B}\_j^{\mathsf{T}} \mathbf{G}\_j) (\mathbf{q}\_j - \mathbf{c}\_j), \tag{8.24}$$

$$
\hat{\boldsymbol{p}}\_{i} = \mathbf{W}\_{i} \mathbf{B}\_{i}^{\mathsf{T}} \mathbf{G}\_{i} (\boldsymbol{q}\_{i} - \mathbf{c}\_{i} - \mathbf{A}\_{i} \hat{\boldsymbol{v}}), \ i = 1, \ldots, n. \tag{8.25}
$$

The functions *h_i* are re-expanded at the new expansion point *e*_1 = (*v̂*, *p̂_i*), *i* = 1, ..., *n*, and the fit is iterated until convergence. After convergence, the track parameters *q_i* can be updated:

$$
\hat{\boldsymbol{q}}\_{i} = \boldsymbol{h}\_{i}(\hat{\boldsymbol{v}}, \hat{\boldsymbol{p}}\_{i}), \ i = 1, \ldots, n. \tag{8.26}
$$

In the linear approximation, the joint covariance matrix of *v̂* and all *p̂_i* is equal to *C* = *M*⁻¹, from which the joint covariance matrix of all *q̂_i* can be computed by linearized error propagation. The *χ*²-statistic of the fit can be computed as follows:

$$\chi^2 = \sum\_{i=1}^n (\mathbf{q}\_i - \hat{\mathbf{q}}\_i)^{\mathsf{T}} \mathbf{G}\_i (\mathbf{q}\_i - \hat{\mathbf{q}}\_i). \tag{8.27}$$

If the errors of the estimated track parameters can be assumed to be approximately Gaussian, the chi-square statistic is approximately *χ*²-distributed with

$$\text{ndf} = \sum\_{i=1}^{n} \text{rank}(V\_i) - 3\,(n+1) \tag{8.28}$$

degrees of freedom.
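The linearized step above translates directly into NumPy. The sketch below (our naming; the matrices *A_i*, *B_i*, *G_i* and the vectors *q_i*, *c_i* are assumed to come from the expansion of Eq. (8.14)) evaluates *C*_00 from Eq. (8.21), the vertex from Eq. (8.24), and the momenta from Eq. (8.25):

```python
import numpy as np

def vertex_fit_linear(As, Bs, Gs, qs, cs):
    """One linearized iteration of the nonlinear regression vertex fit.
    As[i] (5x3), Bs[i] (5x3), Gs[i] (5x5), qs[i], cs[i] (5-vectors)."""
    Ws = [np.linalg.inv(B.T @ G @ B) for B, G in zip(Bs, Gs)]
    D0 = sum(A.T @ G @ A for A, G in zip(As, Gs))          # Eq. (8.20)
    Ds = [A.T @ G @ B for A, G, B in zip(As, Gs, Bs)]      # Eq. (8.19)
    C00 = np.linalg.inv(D0 - sum(D @ W @ D.T for D, W in zip(Ds, Ws)))
    rhs = sum(A.T @ G @ (np.eye(len(q)) - B @ W @ B.T @ G) @ (q - c)
              for A, B, G, W, q, c in zip(As, Bs, Gs, Ws, qs, cs))
    v = C00 @ rhs                                          # Eq. (8.24)
    ps = [W @ B.T @ G @ (q - c - A @ v)                    # Eq. (8.25)
          for A, B, G, W, q, c in zip(As, Bs, Gs, Ws, qs, cs)]
    return v, C00, ps
```

For exact, noise-free track parameters the step reproduces the true vertex and momenta, since the combined design matrix of Eq. (8.15) generically has full column rank.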

#### **8.1.2.2 Extended Kalman Filter**

The nonlinear regression can be reformulated as an extended Kalman filter (see Sect. 6.1.2). Initially, the state vector consists only of the prior information about the vertex position *v*_0, and its covariance matrix *C*_0. In many instances, the prior information is given by the position and the size of the beam spot or the target. If no prior information is available, *v*_0 is a rough guess, and *C*_0 is set to a large diagonal matrix.

For each track *i*, *i* = 1, ..., *n*, the state vector is augmented by the three-momentum vector at the vertex *p_i*. The system equation is the identity:

$$
\boldsymbol{v}\_{i} = \boldsymbol{v}\_{i-1}, \quad \mathbf{C}\_{i} = \mathbf{C}\_{i-1}. \tag{8.29}
$$

The measurement equation and its linearized form are the same as in Eqs. (8.12)–(8.14). The update of the vertex position and the estimation of *p_i* can now be written as:

$$\boldsymbol{v}\_{i} = \mathbf{C}\_{i} \left[ \mathbf{C}\_{i-1}^{-1} \boldsymbol{v}\_{i-1} + \mathbf{A}\_{i}^{\mathsf{T}} \mathbf{G}\_{i}^{\mathrm{B}} (\boldsymbol{q}\_{i} - \mathbf{c}\_{i}) \right],\tag{8.30}$$

$$\boldsymbol{p}\_{i} = \mathbf{W}\_{i}\mathbf{B}\_{i}^{\mathsf{T}}\mathbf{G}\_{i}(\boldsymbol{q}\_{i} - \mathbf{c}\_{i} - \mathbf{A}\_{i}\boldsymbol{v}\_{i}),\tag{8.31}$$

with *W_i* = (*B_i*^T *G_i B_i*)⁻¹ and *G_i*^B = *G_i* − *G_i B_i W_i B_i*^T *G_i*. The updated covariance and cross-covariance matrices are:

$$\mathsf{Var}\left[\boldsymbol{v}\_{i}\right] = \mathbf{C}\_{i} = \left(\mathbf{C}\_{i-1}^{-1} + \mathbf{A}\_{i}^{\mathsf{T}} \mathbf{G}\_{i}^{\mathrm{B}} \mathbf{A}\_{i}\right)^{-1},\tag{8.32}$$

$$\mathsf{Var}\left[\boldsymbol{p}\_{i}\right] = \boldsymbol{W}\_{i} + \boldsymbol{W}\_{i}\boldsymbol{B}\_{i}^{\mathsf{T}}\boldsymbol{G}\_{i}\boldsymbol{A}\_{i}\boldsymbol{C}\_{i}\boldsymbol{A}\_{i}^{\mathsf{T}}\boldsymbol{G}\_{i}\boldsymbol{B}\_{i}\boldsymbol{W}\_{i},\tag{8.33}$$

$$\mathsf{Cov}\left[\boldsymbol{v}\_{i},\boldsymbol{p}\_{i}\right] = -\mathbf{C}\_{i}\mathbf{A}\_{i}^{\mathsf{T}}\mathbf{G}\_{i}\mathbf{B}\_{i}\mathbf{W}\_{i}.\tag{8.34}$$

Each update step gives rise to residuals *r_i* and a chi-square statistic *χ²_i*:

$$
\boldsymbol{r}\_{i} = \boldsymbol{q}\_{i} - \boldsymbol{h}\_{i}(\boldsymbol{v}\_{i}, \boldsymbol{p}\_{i}), \tag{8.35}
$$

$$\chi\_{i}^{2} = \boldsymbol{r}\_{i}^{\mathsf{T}} \mathbf{G}\_{i} \boldsymbol{r}\_{i} + (\boldsymbol{v}\_{i} - \boldsymbol{v}\_{i-1})^{\mathsf{T}} \mathbf{C}\_{i-1}^{-1} (\boldsymbol{v}\_{i} - \boldsymbol{v}\_{i-1}).\tag{8.36}$$

The chi-square statistic has two degrees of freedom and can be used to test the compatibility of track *i* with the current fitted vertex. If no intermediate results are needed, the computation of the momentum vectors *p_i* can be deferred to the smoother, and the final vertex *v_n* and its covariance matrix *C_n* can be computed directly, cf. Eqs. (8.21) and (8.24):

$$\boldsymbol{v}\_{n} = \mathbf{C}\_{n} \left[ \mathbf{C}\_{0}^{-1} \boldsymbol{v}\_{0} + \sum\_{i=1}^{n} A\_{i}^{\mathsf{T}} \mathbf{G}\_{i}^{\mathsf{B}} (\boldsymbol{q}\_{i} - \mathbf{c}\_{i}) \right],\tag{8.37}$$


$$\mathbf{C}\_{n} = \left(\mathbf{C}\_{0}^{-1} + \sum\_{i=1}^{n} \mathbf{A}\_{i}^{\mathsf{T}} \mathbf{G}\_{i}^{\mathrm{B}} \mathbf{A}\_{i}\right)^{-1}.\tag{8.38}$$

As there is no process noise in the system equation, the smoother is tantamount to recomputing the momentum vectors and the covariance matrices with the final vertex *v_n* and its covariance matrix *C_n*, see also Eqs. (8.31)–(8.34):

$$\mathbf{p}\_{i\mid n} = \mathbf{W}\_{i}\mathbf{B}\_{i}^{\mathsf{T}}\mathbf{G}\_{i}(\mathbf{q}\_{i} - \mathbf{c}\_{i} - A\_{i}\mathbf{v}\_{n}),\tag{8.39}$$

$$\mathsf{Var}\left[\boldsymbol{p}\_{i\mid n}\right] = \boldsymbol{W}\_{i} + \boldsymbol{W}\_{i}\boldsymbol{B}\_{i}^{\mathsf{T}}\boldsymbol{G}\_{i}\boldsymbol{A}\_{i}\boldsymbol{C}\_{n}\boldsymbol{A}\_{i}^{\mathsf{T}}\boldsymbol{G}\_{i}\boldsymbol{B}\_{i}\boldsymbol{W}\_{i},\tag{8.40}$$

$$\mathsf{Cov}\left[\boldsymbol{v}\_n, \boldsymbol{p}\_{i\mid n}\right] = -\mathbf{C}\_n \mathbf{A}\_{i}^{\mathsf{T}} \mathbf{G}\_{i} \mathbf{B}\_{i} \mathbf{W}\_{i},\tag{8.41}$$

$$\mathsf{Cov}\left[\boldsymbol{p}\_{i\mid n}, \boldsymbol{p}\_{j\mid n}\right] = \mathbf{W}\_{i}\mathbf{B}\_{i}^{\mathsf{T}}\mathbf{G}\_{i}\mathbf{A}\_{i}\mathbf{C}\_{n}\mathbf{A}\_{j}^{\mathsf{T}}\mathbf{G}\_{j}\mathbf{B}\_{j}\mathbf{W}\_{j}.\tag{8.42}$$

The update of the track parameters reads:

$$
\hat{\boldsymbol{q}}\_{i} = \boldsymbol{h}\_{i}(\boldsymbol{v}\_{n}, \boldsymbol{p}\_{i\mid n}), \ i = 1, \ldots, n. \tag{8.43}
$$

Their joint covariance matrix can be computed by linearized error propagation. Each track can be tested against the final vertex by computing the smoothed residuals and the corresponding chi-square statistic:

$$
\boldsymbol{r}\_{i\mid n} = \boldsymbol{q}\_{i} - \boldsymbol{h}\_{i}(\boldsymbol{v}\_{n}, \boldsymbol{p}\_{i\mid n}), \tag{8.44}
$$

$$\chi\_{i\mid n}^{2} = \boldsymbol{r}\_{i\mid n}^{\mathsf{T}} \mathbf{G}\_{i} \boldsymbol{r}\_{i\mid n} + (\boldsymbol{v}\_{n} - \boldsymbol{v}\_{n\mid -i})^{\mathsf{T}} \mathbf{C}\_{n\mid -i}^{-1} (\boldsymbol{v}\_{n} - \boldsymbol{v}\_{n\mid -i}),\tag{8.45}$$

where *v*_{n|−i} is the final vertex with track *i* removed, and *C*_{n|−i} is its covariance matrix:

$$\boldsymbol{v}\_{n\mid -i} = \mathbf{C}\_{n\mid -i} \left[ \mathbf{C}\_{n}^{-1} \boldsymbol{v}\_{n} - \mathbf{A}\_{i}^{\mathsf{T}} \mathbf{G}\_{i}^{\mathrm{B}} (\boldsymbol{q}\_{i} - \mathbf{c}\_{i}) \right],\tag{8.46}$$

$$\mathsf{Var}\left[\boldsymbol{v}\_{n\mid -i}\right] = \mathbf{C}\_{n\mid -i} = \left(\mathbf{C}\_{n}^{-1} - \mathbf{A}\_{i}^{\mathsf{T}}\mathbf{G}\_{i}^{\mathrm{B}}\mathbf{A}\_{i}\right)^{-1}.\tag{8.47}$$

Searching for outliers in this way is, however, tedious and time-consuming. The adaptive vertex fit described in Sect. 8.2 is better suited to this task and more powerful, especially if there are several outliers.
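A single update step of the Kalman vertex filter, Eqs. (8.30)–(8.36), can be sketched as follows. The residual in the chi-square is computed from the linearized measurement equation (8.13), which is adequate near the expansion point; all names are illustrative:

```python
import numpy as np

def kf_vertex_update(v_prev, C_prev, A, B, G, q, c):
    """Add one track to the vertex: Eqs. (8.30)-(8.34), with the
    chi-square of Eq. (8.36) from the linearized model (2 dof)."""
    W = np.linalg.inv(B.T @ G @ B)
    GB = G - G @ B @ W @ B.T @ G                          # G_i^B
    C_prev_inv = np.linalg.inv(C_prev)
    C = np.linalg.inv(C_prev_inv + A.T @ GB @ A)          # Eq. (8.32)
    v = C @ (C_prev_inv @ v_prev + A.T @ GB @ (q - c))    # Eq. (8.30)
    p = W @ B.T @ G @ (q - c - A @ v)                     # Eq. (8.31)
    r = q - (A @ v + B @ p + c)                           # linearized residual
    dv = v - v_prev
    chi2 = r @ G @ r + dv @ C_prev_inv @ dv               # Eq. (8.36)
    return v, C, p, chi2
```

Starting from a diffuse prior and adding exact, noise-free tracks one by one, the filter converges to the common vertex, and each chi-square stays non-negative.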

#### **8.1.2.3 Fit with Perigee Parameters**

In many collider experiments, past and present, the magnetic field in the vicinity of the collision region is almost perfectly homogeneous, giving a helical track model. The "perigee" parametrization for such helical tracks was introduced in [3], with a correction in [4]. The track is parametrized around the point of closest approach, the perigee point *v*_P, of the helix to the *z*-axis, which is also the direction of the beams and the magnetic field. The perigee point is expected to be close to the vertex. The five track parameters of the perigee parametrization are (see Fig. 8.2):

- *ε*, the signed transverse impact parameter, i.e., the distance of P from the *z*-axis in the transverse projection;
- *φ*_P, the azimuth of the track direction at P;
- *ϑ*, the polar angle of the track;
- *z*_P, the *z*-coordinate of P;
- *κ*, the signed curvature of the track.

**Fig. 8.2** A helical track in the projection to the *(x, y)*-plane. O: origin; C: circle center; P: perigee point; V: vertex; *ρ*: circle radius; *t*_P: tangent at P; *φ*_P: azimuth of track direction at P; *t*_V: tangent at V; *φ*_V: azimuth of track direction at V


With these definitions, the trajectory can be approximately parametrized in terms of a running parameter *s*, which is the distance from P along the projected helix:

$$
x \approx \epsilon \sin \varphi\_{\mathrm{P}} + s \cos \varphi\_{\mathrm{P}} - \frac{s^2 \kappa}{2} \sin \varphi\_{\mathrm{P}}, \tag{8.48}
$$

$$y \approx -\epsilon \cos \varphi\_{\mathrm{P}} + s \sin \varphi\_{\mathrm{P}} + \frac{s^2 \kappa}{2} \cos \varphi\_{\mathrm{P}},\tag{8.49}$$

$$z \approx z\_{\rm P} + s \cot \vartheta.$$

The track parameters *q* = (*ε*, *φ*_P, *ϑ*, *z*_P, *κ*)^T have to be expressed as a function of the coordinates *v* = (*x*_V, *y*_V, *z*_V)^T of the vertex V, and the track parameters *p* = (*ϑ*, *φ*_V, *κ*)^T at V. Note that *ϑ* and *κ* are invariant along the helix. As higher orders of *κ* can usually be neglected for tracks in collider experiments, the following functional dependence is obtained:

$$\begin{aligned} \epsilon &\approx -R - Q^2 \kappa / 2, \\ z\_{\mathrm{P}} &\approx z\_{\mathrm{V}} - Q(1 - R\kappa) \cot \vartheta, \\ \varphi\_{\mathrm{P}} &\approx \varphi\_{\mathrm{V}} - Q\kappa, \end{aligned} \tag{8.50}$$

with

$$Q = x\_{\mathrm{V}}\cos\varphi\_{\mathrm{V}} + y\_{\mathrm{V}}\sin\varphi\_{\mathrm{V}}, \quad R = y\_{\mathrm{V}}\cos\varphi\_{\mathrm{V}} - x\_{\mathrm{V}}\sin\varphi\_{\mathrm{V}}.\tag{8.51}$$

The Jacobian matrix of the three nontrivial perigee parameters with respect to (*v*, *p*) = (*x*_V, *y*_V, *z*_V, *ϑ*, *φ*_V, *κ*) is, at the lowest order (the rows for *ϑ* and *κ* are trivial, as both parameters map onto themselves):

$$
\frac{\partial (\epsilon, z\_{\mathrm{P}}, \varphi\_{\mathrm{P}})}{\partial (\boldsymbol{v}, \boldsymbol{p})} = \begin{pmatrix}
s & -c & 0 & 0 & Q & -Q^2/2 \\
-tc & -ts & 1 & Q(1+t^2) & -Rt & QRt \\
-\kappa c & -\kappa s & 0 & 0 & 1 & -Q \\
\end{pmatrix},\tag{8.52}
$$

with

$$c = \cos \varphi\_{\rm V}, \; s = \sin \varphi\_{\rm V}, \; t = \cot \vartheta. \tag{8.53}$$

With these ingredients, the nonlinear regression (see Sect. 8.1.2.1) can be computed.
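Under the small-curvature approximation of Eqs. (8.50) and (8.51), the map from vertex position and momentum parameters to the nontrivial perigee parameters can be sketched as follows (function name is ours):

```python
import numpy as np

def perigee_from_vertex(v, p):
    """Perigee parameters (epsilon, z_P, phi_P) of Eq. (8.50) from the
    vertex v = (xV, yV, zV) and the parameters p = (theta, phiV, kappa)
    at the vertex, to first order in the curvature kappa."""
    xV, yV, zV = v
    theta, phiV, kappa = p
    Q = xV * np.cos(phiV) + yV * np.sin(phiV)        # Eq. (8.51)
    R = yV * np.cos(phiV) - xV * np.sin(phiV)
    eps = -R - 0.5 * Q**2 * kappa                    # impact parameter
    z_P = zV - Q * (1.0 - R * kappa) / np.tan(theta)
    phi_P = phiV - Q * kappa                         # azimuth at perigee
    return eps, z_P, phi_P
```

For a straight track (*κ* = 0) the expressions reduce to *ε* = −*R*, *z*_P = *z*_V − *Q* cot *ϑ*, *φ*_P = *φ*_V, which makes the geometric meaning of *Q* and *R* transparent.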

# **8.2 Robust and Adaptive Vertex Fitting**

## *8.2.1 Vertex Fit with M-Estimator*

Estimators are called robust if they are insensitive to outlying observations [5–7]. The M-estimator, see Sect. 6.2.1, is a well-known robust estimator that can be implemented as an iterated reweighted LS estimator that assigns smaller weights (larger uncertainties) to observations suspected to be outliers. One of the first attempts, if not the first, to make the vertex fit robust is the extension of the Kalman filter to an M-estimator [8] of the Huber type [5]. Before the fit, the parameters of each track are decorrelated by finding the orthogonal matrix *U_i* that transforms the covariance matrix *V_i* to a diagonal matrix *D_i*:

$$\mathbf{D}\_{i} = \mathbf{U}\_{i}\mathbf{V}\_{i}\mathbf{U}\_{i}^{\mathsf{T}}.\tag{8.54}$$

#### **Table 8.1** Algorithm: Vertex fit with M-estimator

VERTEX FIT WITH M-ESTIMATOR


$$w\_{i,j} = \max\left[\varepsilon, \frac{\psi\left(r\_{i\mid n,j}/V\_{i,jj}\right)}{r\_{i\mid n,j}/V\_{i,jj}}\right],\tag{8.56}$$

where *V_{i,jj}* is the *j*-th diagonal element of the initial covariance matrix *V_i*, and *ψ(t)* is one of the functions in Table 6.1 or a similar one from the literature.


The measurement equation Eq. (8.13) is transformed by setting

$$\mathbf{V}\_{i}\leftarrow\mathbf{D}\_{i},\ \ \boldsymbol{q}\_{i}\leftarrow\mathbf{U}\_{i}\boldsymbol{q}\_{i},\ \ \mathbf{A}\_{i}\leftarrow\mathbf{U}\_{i}\mathbf{A}\_{i},\ \ \mathbf{B}\_{i}\leftarrow\mathbf{U}\_{i}\mathbf{B}\_{i},\ \ \mathbf{c}\_{i}\leftarrow\mathbf{U}\_{i}\mathbf{c}\_{i}.\tag{8.55}$$

The M-estimator is then computed by an iterated LS estimator, see Table 8.1. Vertex fits using M-estimators with other weight functions are described in [9, 10].
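The reweighting idea can be illustrated in one dimension with a Huber-type *ψ*-function. The constant *k* = 1.345 and the weight floor are assumed values for this sketch, and the simple location estimate stands in for the full vertex fit of Table 8.1:

```python
import numpy as np

def huber_weight(t, k=1.345, eps=1e-3):
    """Weight psi(t)/t for the Huber psi-function, floored at eps,
    with t the standardized residual (cf. Eq. (8.56))."""
    t = abs(t)
    return max(eps, 1.0 if t <= k else k / t)

def m_location(x, sigma=1.0, n_iter=20):
    """Iterated reweighted LS estimate of a 1-d location parameter,
    illustrating the iteration scheme of Table 8.1."""
    mu = np.median(x)                            # robust starting point
    for _ in range(n_iter):
        w = np.array([huber_weight((xi - mu) / sigma) for xi in x])
        mu = np.sum(w * x) / np.sum(w)           # reweighted LS update
    return mu
```

A gross outlier is progressively down-weighted, so the estimate stays close to the bulk of the data, whereas the plain mean is dragged far away.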

## *8.2.2 Adaptive Vertex Fit with Annealing*

The adaptive vertex fit (AVF) was introduced in [11] and further investigated in [12–15]. It can be interpreted either as an EM algorithm or as an M-estimator with a specific weight function [16, 17]; see also Table 6.1. In the AVF, tracks as a whole are down-weighted, as the weight *w_i* is a function of the distance of track *i* from the current vertex, measured by the chi-square *χ²_i*:

$$\chi\_i^2 = \boldsymbol{r}\_{i\mid n}^{\mathsf{T}} \mathbf{G}\_i \boldsymbol{r}\_{i\mid n}, \quad w\_i\left(\chi\_i^2\right) = \frac{\exp\left(-\chi\_i^2 / 2T\right)}{\exp\left(-\chi\_i^2 / 2T\right) + \exp\left(-\chi\_{\mathrm{c}}^2 / 2T\right)},\tag{8.57}$$

where *T* is a temperature parameter. The weight *w_i* can be interpreted as the probability that track *i* belongs to the vertex. The chi-square cut *χ²*_c sets the threshold where the weight is equal to 0.5. Beyond this cut, a track is considered to be an outlier rather than an inlier.

The temperature *T* modifies the shape of the function in Eq. (8.57). At high temperature, the weight function varies very slowly; at low temperature, the weight is close to 1 for *χ²_i* ≤ *χ²*_c and close to 0 for *χ²_i* > *χ²*_c (see Fig. 6.1c), with a sharp

#### **Table 8.2** Algorithm: Adaptive vertex fit with annealing

ADAPTIVE VERTEX FIT WITH ANNEALING


$$w\_i = \frac{\exp\left(-\chi\_i^2 / 2T\_k\right)}{\exp\left(-\chi\_i^2 / 2T\_k\right) + \exp\left(-\chi\_{\mathrm{c}}^2 / 2T\_k\right)}.\tag{8.58}$$


drop at *χ²_i* = *χ²*_c. The weights at low temperature can be used to identify secondary tracks in a fit of the primary vertex; see Sect. 7.3.3.

Similar to the M-estimator in Sect. 8.2.1, the adaptive vertex fit is implemented as an iterated LS estimator, which can be one of the methods described in Sect. 8.1. The temperature *T* can be used to employ an annealing procedure that helps to reach the globally optimal solution. The adaptive vertex fit is summarized in Table 8.2.

In order to get reasonable initial weights, the initial vertex *v*^(0) has to be chosen carefully, preferably with a robust finder [18]. The annealing schedule and the constant *χ²*_c have to be tuned on simulated data. A detailed study and useful hints can be found in [15].
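The weight function of Eqs. (8.57) and (8.58) is a Fermi-like function of the chi-square distance. Dividing numerator and denominator by exp(−*χ²_i*/2*T*) gives the algebraically identical but numerically friendlier form sketched below:

```python
import numpy as np

def adaptive_weight(chi2, chi2_cut, T):
    """Track weight of Eq. (8.57), rewritten as a logistic function of
    (chi2 - chi2_cut)/(2T); identical to the two-exponential form."""
    return 1.0 / (1.0 + np.exp((chi2 - chi2_cut) / (2.0 * T)))
```

At *χ²_i* = *χ²*_c the weight is exactly 0.5 for any temperature; annealing amounts to re-evaluating the weights along a decreasing sequence of temperatures, which sharpens the transition between inliers and outliers.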

## *8.2.3 Vertex Quality*

The assessment of the quality of a fitted vertex is similar to the assessment of track quality; see Sect. 6.4. The primary criterion is the chi-square statistic of the vertex fit, or its *p*-value. If the *p*-value is unreasonably small, a search for outlying tracks can be started. There are several sources of outliers in the vertex fit:

- tracks that actually belong to a different vertex, for instance secondary tracks in a primary vertex fit;
- fake tracks composed of unrelated hits;
- at the LHC, tracks from pile-up interactions.
Just as in the case of the track fit, outliers can be detected by a test based on the residuals of the track with respect to the estimated vertex position. Outlying tracks that pass the track quality check are often candidates for inclusion in a secondary vertex; see Sects. 7.3.3 and 9.2.

## **8.3 Kinematic Fit**

The vertex fit as described so far imposes purely geometrical constraints on the participating tracks, namely that they originate at the same point in space. Especially in the case of a secondary vertex, the vertex fit can be extended by imposing other geometrical or kinematical constraints. In the vertex fit of a photon conversion (see Sect. 9.4), the constraint can be imposed that the momentum vectors of the outgoing tracks are parallel. In the fit of a decay vertex (see Sects. 9.2 and 9.3), the laws of momentum and/or energy conservation can be imposed as constraints. In addition, assumptions about the mass of some or all of the participating particles can be included. The width of a resonance can be included by considering its mass as a virtual measurement with a standard error reflecting the width. Such fits are called kinematic fits.

In a kinematic fit, a track is most conveniently represented by a collection *u* of parameters that are physically meaningful and allow a simple formulation of the constraints [19]. If only kinematic constraints are present, *u* contains the energy and the Cartesian momentum components in the global coordinate system:

$$\boldsymbol{u} = (E, p\_x, p\_y, p\_z)^{\mathsf{T}}.\tag{8.59}$$

For charged tracks, *u* is computed from the usual representation by a five-vector *q*, and the covariance matrix *V* = Var[*u*] is obtained by linearized error propagation. If the mass of the particle is assumed to be known and fixed, the rank of *V* is three, as the energy is a deterministic function of the momentum *p* and the mass *m*. If there is an independent measurement of the energy, for instance by a cluster in the calorimeter, the rank is four. For neutral tracks, *u* is computed from the calorimeter information. As there is no independent momentum measurement, the rank of *V* is three.
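For a charged track with a fixed mass hypothesis, the representation of Eq. (8.59) and its rank-3 covariance follow from linearized error propagation. A sketch (our naming; the 3×3 momentum covariance is assumed to come from the track fit):

```python
import numpy as np

def kinematic_rep(p, m, Vp):
    """u = (E, px, py, pz) of Eq. (8.59) for a track with momentum p,
    fixed mass m and 3x3 momentum covariance Vp. The 4x4 covariance of
    u has rank 3, since E is a deterministic function of p and m."""
    E = np.sqrt(m**2 + p @ p)
    u = np.concatenate(([E], p))
    J = np.vstack([p / E, np.eye(3)])   # dE/dp = p/E; dp/dp = identity
    return u, J @ Vp @ J.T              # linearized error propagation
```

The singularity of the covariance matrix is harmless in the constrained fit below, as only *HVH*^T has to be inverted.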

If in addition a vertex constraint is to be enforced, the location *x*, *y*, *z* in space, which can vary along the track, is appended to *u*, and the rank of *V* increases by two [19].

If *n* tracks with parameters *u_k* and covariance matrices *V_k*, *k* = 1, ..., *n*, participate in the kinematic fit, their parameters are combined in a single column vector *y*:

$$\boldsymbol{y} = \begin{pmatrix} \boldsymbol{u}\_1 \\ \vdots \\ \boldsymbol{u}\_n \end{pmatrix}. \tag{8.60}$$

If the estimated *u_k* are uncorrelated, their joint covariance matrix *V* is block-diagonal:

$$\mathsf{Var}\left[\mathbf{y}\right] = \mathsf{V} = \mathsf{blkdiag}\left(\mathbf{V}\_1, \dots, \mathbf{V}\_n\right). \tag{8.61}$$

If the momentum parts of the *u_k* are the results of a preceding vertex fit, they are correlated, and their joint covariance matrix is given in Eqs. (8.40) and (8.42).

The vector *y* is considered to be an unbiased observation of the vector *α* that satisfies *r* ≤ rank(*V*) kinematical or geometrical constraints:

$$\boldsymbol{y} = \boldsymbol{\alpha} + \boldsymbol{\varepsilon}, \ \mathsf{E}\left[\boldsymbol{\varepsilon}\right] = \mathbf{0}, \ \mathsf{Var}\left[\boldsymbol{\varepsilon}\right] = \boldsymbol{V}. \tag{8.62}$$

The constraints are expressed by *r* equations *h_i*(*α*) = 0, *i* = 1, ..., *r*, which can be written compactly in the following vector form:

$$\boldsymbol{h}(\boldsymbol{\alpha}) = \mathbf{0}, \text{ with } \boldsymbol{h} = (h\_1(\boldsymbol{\alpha}), \dots, h\_r(\boldsymbol{\alpha}))^{\mathsf{T}}. \tag{8.63}$$

The function *h* is approximated by its first-order Taylor expansion at the expansion point *α*_0 = *y*, resulting in a set of linear constraints:

$$\boldsymbol{h}(\boldsymbol{\alpha}) \approx \boldsymbol{h}(\boldsymbol{\alpha}\_0) + \boldsymbol{H}(\boldsymbol{\alpha} - \boldsymbol{\alpha}\_0) = \boldsymbol{h}(\boldsymbol{\alpha}\_0) + \boldsymbol{H}\boldsymbol{\alpha} - \boldsymbol{H}\boldsymbol{\alpha}\_0 = \boldsymbol{H}\boldsymbol{\alpha} + \boldsymbol{d} = \mathbf{0},\tag{8.64}$$

with the Jacobian matrix *H* = *∂h/∂α* evaluated at *α* = *α*_0 and *d* = *h*(*α*_0) − *H α*_0. It is assumed that *H* has rank *r* in a sufficiently large neighbourhood of *y*.

If *V* has full rank, the constrained LS estimate of *α* is obtained by minimizing the following objective function:

$$\mathcal{S}(\boldsymbol{\alpha},\boldsymbol{\lambda}) = (\boldsymbol{\alpha} - \boldsymbol{y})^{\mathsf{T}} \boldsymbol{G} \left( \boldsymbol{\alpha} - \boldsymbol{y} \right) + 2\boldsymbol{\lambda}^{\mathsf{T}} \left( \boldsymbol{H}\boldsymbol{\alpha} + \boldsymbol{d} \right), \tag{8.65}$$

where *G* = *V*⁻¹ and *λ* is the vector of Lagrange multipliers. Setting the gradient of *S*(*α*, *λ*) to zero results in the following system of linear equations:

$$\boldsymbol{G}\left(\boldsymbol{\alpha} - \boldsymbol{y}\right) + \boldsymbol{H}^{\mathsf{T}}\boldsymbol{\lambda} = \mathbf{0},\tag{8.66}$$

$$\boldsymbol{H}\boldsymbol{\alpha} + \boldsymbol{d} = \mathbf{0}.\tag{8.67}$$

The system can be explicitly solved for *α*, the solution being the estimate *α̂*:

$$
\hat{\boldsymbol{\alpha}} = \boldsymbol{y} - \boldsymbol{V}\boldsymbol{H}^{\mathsf{T}}\boldsymbol{G}\_H \left( \boldsymbol{H}\boldsymbol{y} + \boldsymbol{d} \right), \quad \text{with} \quad \boldsymbol{G}\_H = \left( \boldsymbol{H}\boldsymbol{V}\boldsymbol{H}^{\mathsf{T}} \right)^{-1}. \tag{8.68}
$$

The solution is valid also for singular *V* as long as *r* ≤ rank(*V*), so that *HVH*^T has full rank *r* and can be inverted. This can be proved by regularization of *V*; see Appendix B.

The constraints are re-expanded at the new expansion point *α*_0 = *α̂*, and the fit is iterated until convergence. The covariance matrix *V_α̂* of the final estimate *α̂* and the chi-square statistic of the fit are given by:

$$\boldsymbol{V}\_{\hat{\boldsymbol{\alpha}}} = \boldsymbol{V} - \boldsymbol{V}\boldsymbol{H}^{\mathsf{T}}\boldsymbol{G}\_{H}\boldsymbol{H}\boldsymbol{V},\tag{8.69}$$

$$\chi^2 = (H\mathbf{y} + \mathbf{d})^\mathsf{T} \mathbf{G}\_H \left( H\mathbf{y} + \mathbf{d} \right). \tag{8.70}$$

The *χ*² statistic has *r* degrees of freedom and can be used to assess the goodness of the fit, provided that the linear approximation of the constraints is satisfactory and the distribution of the error term *ε* in Eq. (8.62) is close to normal.
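The iterated constrained fit of Eqs. (8.64)–(8.70) can be sketched generically; the callables `h` and `jac` are supplied by the user, and the loop re-expands the constraints at the current estimate:

```python
import numpy as np

def constrained_ls(y, V, h, jac, n_iter=20):
    """Constrained LS fit: h(alpha) = 0 is linearized at the current
    estimate and solved via Lagrange multipliers, Eq. (8.68); returns
    the estimate, its covariance (8.69) and the chi-square (8.70)."""
    alpha = y.copy()
    for _ in range(n_iter):
        H = jac(alpha)                        # Jacobian at expansion point
        d = h(alpha) - H @ alpha              # d = h(alpha0) - H alpha0
        GH = np.linalg.inv(H @ V @ H.T)
        alpha = y - V @ H.T @ GH @ (H @ y + d)
    H = jac(alpha)
    d = h(alpha) - H @ alpha
    GH = np.linalg.inv(H @ V @ H.T)
    V_alpha = V - V @ H.T @ GH @ H @ V        # Eq. (8.69)
    chi2 = (H @ y + d) @ GH @ (H @ y + d)     # Eq. (8.70)
    return alpha, V_alpha, chi2
```

As a toy example, a mass-like constraint *E*² − *p*² − *m*² = 0 on a measured pair *y* = (*E*, *p*) pulls the estimate onto the mass hyperbola, with the pull measured by the chi-square.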

A comprehensive software package for kinematic fitting, called KWFIT, can be found in [20]. In addition to stand-alone kinematic fits as described above, it allows fitting entire decay chains with multiple vertices. For a discussion of its features and of kinematic fitting in general, see [21]. More recent decay chain fitters are described in [22] and [23].

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Chapter 9 Secondary Vertex Reconstruction**

**Abstract** The chapter reviews methods for the search for secondary vertices. Four types of secondary vertices are discussed in detail: decays of short-lived particles, decays of long-lived particles, photon conversions, and hadronic interactions in the detector material.

# **9.1 Introduction**

A vertex is called a secondary vertex or displaced vertex if it is outside the beam profile in a collider experiment or outside the target region in a fixed-target experiment. Secondary vertices arise in the following contexts:

**Decays of unstable particles** The decaying particle is called the "mother" particle, the decay products are called the "daughter" particles. Depending on the lifetime and the momentum of the mother particle, the secondary decay vertex is displaced by a certain distance from its production point, which can be either a primary or itself a secondary vertex. The properties of the mother particles have to be inferred from the decay products. Finding decays is important for many types of physics analyses as well as for momentum scale calibration, by comparing the reconstructed mass of the mother particle to the known one.

**Interactions in the detector material** Finding interaction vertices can be useful for mapping the amount and position of material in the detector. Two types of interactions are important in practice: photon conversions, i.e., pair production of electrons and positrons or muons in the electric field of a nucleus; and hadronic interactions of primary particles. A decay of a charged particle into a single charged daughter particle plus some neutral daughters looks just like a kink in the track. Kink finding is the subject of Sect. 6.4.3. The following subsections present some methods for finding decay and interaction vertices.

## **9.2 Decays of Short-Lived Particles**

In the present context, short-lived particles are defined as particles that decay before they enter the first layer of the innermost tracking device. This includes *B* (beauty) and *D* (charmed) hadrons as well as *τ* leptons, which travel no more than a few millimeters before they decay.

The key to successful secondary vertex reconstruction is a precise knowledge of the primary vertex and the correct selection of the tracks of the decay products. In order to make sure that the primary vertex is not contaminated with secondary tracks, fake tracks or—at the LHC—pile-up tracks, the fit has to be made robust, either by removing outliers (Sect. 8.2.3), or by employing a robust fitting algorithm in the first place (Sects. 8.2.1 and 8.2.2). If there is prior information on the beam spot, it should be included into the primary vertex fit. The tracks removed from the primary vertex or down-weighted as outliers are by definition candidates for secondary tracks; see Sect. 7.3.3.

The selection of the secondary tracks depends on the physics goals of the analysis. If a specific decay channel is considered, the tracks can be selected according to kinematic criteria, for instance their type, charge, and momentum [1]. In addition, the distance of the track from the primary vertex in the transverse plane, called the transverse impact parameter *d*⁰, can be used to verify that the selected tracks are not produced at the primary vertex. As the uncertainty of *d*⁰ depends on the angle and the momentum of the track, the test is based on the standardized impact parameter *s*⁰ = *d*⁰/*σ*[*d*⁰], also called the significance of the impact parameter.

Secondary tracks can also be selected by their impact parameter alone. To this end, the impact parameter of a track is given a sign that is positive if the track intersects the trajectory of the decaying hadron or *τ* lepton downstream of the primary vertex [1–3]. As the trajectory is not known before the secondary vertex fit, it is approximated by the jet axis of the jet the decaying particle is embedded in. Only tracks with positive sign are candidates for secondary tracks.

The information contained in the impact parameter of a secondary track can be augmented by considering its functional relation with the azimuth angle of the track in the transverse projection, see Fig. 9.1. Let *φ*₁, ..., *φ*_m be the azimuth angles and *d*⁰₁, ..., *d*⁰_m the transverse impact parameters of *m* tracks emerging from the same secondary vertex. If the primary vertex is very close to the origin of the coordinate system, as it usually is, the following functional relation holds approximately:

$$d\_i^0 \approx \ell \sin(\varphi\_i - \varphi\_{\mathrm{h}}) \approx \ell (\varphi\_i - \varphi\_{\mathrm{h}}), \quad i = 1, \ldots, m,\tag{9.1}$$

where *ℓ* is the decay length of the decaying particle and *φ*<sub>h</sub> its azimuth angle. Thus, in the *(φ, d*<sup>0</sup>*)*-plane the points corresponding to secondary tracks lie on a straight line. This line can be found in various ways, for instance by a Hough transform (Sect. 5.1.2), by histogramming the direction angles of segments connecting two points, or by agglomerative clustering of the segments [3].
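As a toy illustration of Eq. (9.1), with simulated numbers and a plain least-squares fit standing in for the Hough-transform and clustering methods mentioned above, the decay length and the hadron azimuth can be recovered from the slope and intercept of the line in the (*φ*, *d*<sup>0</sup>) plane:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical decay: decay length ell and hadron azimuth phi_h.
ell, phi_h = 0.5, 0.3                          # cm, rad

# Simulate m = 6 secondary tracks following Eq. (9.1),
# d0_i ~ ell * (phi_i - phi_h), with a small measurement error.
phi = phi_h + rng.normal(0.0, 0.05, size=6)
d0 = ell * (phi - phi_h) + rng.normal(0.0, 1e-4, size=6)

# The points (phi_i, d0_i) lie on a straight line with slope ell and
# intercept -ell * phi_h; a least-squares line fit recovers both.
slope, intercept = np.polyfit(phi, d0, 1)
print(slope, -intercept / slope)   # close to 0.5 and 0.3
```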

**Fig. 9.1** The functional relation between *φ* and *d*<sup>0</sup> of secondary tracks

## **9.3 Decays of Long-Lived Particles**

Long-lived particles such as neutral *K*-short mesons *K*<sup>0</sup><sub>S</sub> and *Λ* baryons decay in the tracker volume. Reconstruction of their decays serves as a powerful cross-check of the magnetic field map and the alignment, as their mass is very well known, and systematic errors in the field map as well as misalignment can lead to a bias in the momentum and invariant mass estimation. Candidates for the decay products can be identified by a large impact parameter and the fact that tracker hits are consistently missing up to a certain layer. In the search for 2-prong neutral decays, so-called V0s, pairs of tracks with opposite charge and small distance in space are combined into vertex candidates. Without particle identification, the Armenteros–Podolansky plot is a useful tool to decide between hypotheses about the mother particle from the kinematics of its decay products [4]. It is a scatter plot of the longitudinal momentum asymmetry versus the transverse momentum of the positive decay product, see Fig. 9.2. The figure shows that in some regions of the plot a unique decision may not be possible.
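The two Armenteros–Podolansky variables are straightforward to compute from the decay-product momenta; the sketch below is a generic illustration with made-up momenta, not code from any experiment:

```python
import numpy as np

def armenteros_podolanski(p_plus, p_minus):
    """Return (alpha, qT) for a V0 candidate.

    p_plus, p_minus: 3-momenta of the positive and negative decay
    products. alpha is the longitudinal momentum asymmetry and qT the
    transverse momentum of the positive product, both with respect to
    the reconstructed flight direction of the mother particle.
    """
    p_mother = p_plus + p_minus
    n = p_mother / np.linalg.norm(p_mother)    # mother flight direction
    pl_plus = np.dot(p_plus, n)                # longitudinal components
    pl_minus = np.dot(p_minus, n)
    alpha = (pl_plus - pl_minus) / (pl_plus + pl_minus)
    qt = np.linalg.norm(p_plus - pl_plus * n)  # transverse component
    return alpha, qt

# A perfectly symmetric hypothetical decay gives alpha = 0:
alpha, qt = armenteros_podolanski(np.array([1.0, 0.1, 0.0]),
                                  np.array([1.0, -0.1, 0.0]))
print(alpha, qt)   # 0.0 and 0.1
```

Filling the plane with (alpha, qT) points for many V0 candidates produces the characteristic arcs of Fig. 9.2.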

The resolution of any remaining ambiguities requires a kinematic fit with mass constraints (see Sect. 8.3) based on particle identification of the decay products. For example, in the case of *K*<sup>0</sup><sub>S</sub> versus *Λ/Λ̄*, charged pions have to be separated from protons.

If the decay vertices are allowed to have more than two charged outgoing particles, two-track vertices can serve as seed vertices [5]. A seed vertex is rejected if at least one of its tracks has hits between the seed vertex and the primary vertex. The seed vertices are then clustered to larger vertices by an agglomerative procedure.

**Fig. 9.2** Armenteros–Podolansky plot for *K*<sup>0</sup><sub>S</sub> and *Λ/Λ̄*. (From [4], with permission by the author)

## **9.4 Photon Conversions**

The reconstruction of photon conversions is an integral part of photon reconstruction in general. While the energy of unconverted photons is measured directly in the electromagnetic calorimeter (ECAL), a converted photon is reconstructed from the charged lepton pair, mostly an *e*+*e*− pair, that is produced in the electric field of a nucleus. A characteristic feature of the conversion vertex is the fact that the two leptons are parallel to each other. As a consequence, the vertex position along the photon direction is not very well defined by the vertex fit alone. Two topical examples of the reconstruction of converted photons are presented in the following paragraphs.

In the ATLAS experiment (Sect. 1.6.1.2), the first step is the standard electron/positron reconstruction, briefly described in Sect. 10.2. Clusters in the ECAL are built and used to create regions of interest (ROIs). Inside an ROI track finding is modified such that up to 30% energy loss is allowed at each material layer. After loosely matching the tracks in the ROIs with the ECAL clusters, tracks with silicon hits are refitted with the Gaussian-sum filter (GSF, Sect. 6.2.3). Conversion finding is run on the loosely matched tracks that have a high probability to be electrons or positrons according to the transition radiation detector. A conversion may have one or two outgoing tracks. Double-track conversions are created when two tracks form a vertex that is consistent with the hypothesis that they have been produced from a massless particle. Single-track conversions look like tracks without hits before a certain layer. If there are several conversions matched to a cluster, double-track conversions are preferred over single-track conversions.

In CMS (Sect. 1.6.1.3) the reconstruction of photon conversions uses the full tracking [6]; see also Sect. 10.3. Electrons and positrons in particular are reconstructed by associating a track in the silicon tracker with a cluster in the ECAL [7]. Besides the standard seeding of tracks in the pixel detector, additional seeds are computed by extrapolating the electron/positron from the cluster toward the interaction vertex, using both charge hypotheses. Starting from these seeds, track candidates are built by the combinatorial Kalman filter (Sect. 5.1.7) and sent to a track fit with the GSF (Sect. 6.2.3). Electron/positron candidates are then created from the association of a GSF track with a cluster in the ECAL. Track pairs of opposite charge are selected and filtered by various constraints: the two tracks must have a small angular separation; the trajectories (helices) projected to the transverse plane must not intersect; and the presumptive vertex must be inside the tracker volume. Track pairs that pass these filters are sent to the vertex fit with the constraint that the tracks are parallel at the vertex; see Sect. 8.3. The track pair is kept if the *p*-value of the chi-square statistic indicates a good fit.

# **9.5 Hadronic Interactions**

Knowledge of the material in a tracking device is important for understanding and tuning the track reconstruction algorithms. Reconstruction of hadronic interactions in the tracker material gives a precise map of the amount and location of active and inactive parts of the device. The hadronic interactions result in secondary vertices that have to be found. Whereas, in photon conversions, the vertex position along the photon direction is ill defined, the vertex position of hadronic interactions can be precisely estimated in all directions.

Tracks from secondary interaction vertices have a large impact parameter and may not be found by the standard track finder(s). In order to have the largest possible sample of such tracks, a special track finding pass can be implemented. Tracks from the primary vertex and short-lived decays are suppressed by a lower bound on the transverse impact parameter. Further quality cuts can be applied to select a reliable sample of secondary tracks. The following paragraphs describe examples from the experiments at the LHC.

In [8], a vertex finder is described that simultaneously finds all secondary vertices in an event in the ATLAS detector (Sect. 1.6.1.2). It starts by finding all possible intersections of pairs of selected tracks, performs a vertex fit and applies a quality cut. Wrong combinations can be further suppressed by requiring that neither track has hits in layers with a smaller radius than the vertex. Finally, nearby vertices are merged, and incompatibilities between vertices sharing tracks are resolved via an incompatibility graph; see Sect. 5.3. For results on reconstruction efficiency and vertex resolution, see [8]. The search for hadronic interaction vertices can be complemented by the reconstruction of photon conversions; see Sect. 9.4.

A precision measurement of the structure of the CMS pixel detector (Sect. 1.6.1.3), including sensors and support material, and of the beam pipe using hadronic interactions is reported in [9]. The track selection requires a minimal transverse momentum of 0*.*2 GeV*/c*. For all pairs of selected tracks, their distance of closest approach is computed; if the distance is below a preset threshold, the midpoint of the points of closest approach is kept as a vertex candidate. These two-track vertices are then merged into larger vertices by agglomerative clustering (Sect. 3.3.1). The resulting vertices are sent to the adaptive vertex fit (AVF, Sect. 8.2.2).
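For straight-line track approximations, the two-track seeding step (the midpoint of the points of closest approach, kept only below a distance threshold) can be sketched as follows; the function name, threshold, and numbers are illustrative, not taken from [9]:

```python
import numpy as np

def vertex_candidate(p1, d1, p2, d2, max_dist=0.01):
    """Two-track vertex seed from straight-line track approximations.

    p1, d1 / p2, d2: a point on each track and its direction vector.
    Returns the midpoint of the segment of closest approach if the
    distance of closest approach is below max_dist, else None.
    """
    w = p1 - p2
    a, b, c = np.dot(d1, d1), np.dot(d1, d2), np.dot(d2, d2)
    d, e = np.dot(d1, w), np.dot(d2, w)
    denom = a * c - b * b                    # zero for parallel tracks
    if abs(denom) < 1e-12:
        return None
    t = (b * e - c * d) / denom
    s = (a * e - b * d) / denom
    q1 = p1 + t * d1                         # points of closest approach
    q2 = p2 + s * d2
    if np.linalg.norm(q1 - q2) > max_dist:
        return None
    return 0.5 * (q1 + q2)                   # midpoint = vertex candidate

# Two hypothetical tracks crossing at (1, 1, 0):
v = vertex_candidate(np.array([0.0, 1.0, 0.0]), np.array([1.0, 0.0, 0.0]),
                     np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0]))
print(v)   # [1. 1. 0.]
```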

The tracks associated to a fitted vertex are classified into incoming, outgoing, and merged tracks based on their transverse impact parameter and the number of hits before and after the vertex. At least three tracks have to be present, but at most one of them may be an incoming or merged track. Additional filters using the properties of the outgoing tracks serve to further reduce the number of fake interaction vertices. For results on resolution, efficiency and purity of the vertex reconstruction, see [9].

The Vertex Locator (VELO) of the LHCb experiment, see Sect. 1.6.1.4, was mapped by the reconstruction of secondary interaction vertices of hadrons produced in beam-gas collisions [10]. The data were collected during special runs in which helium or neon was injected into the collision region. The tracks used for the reconstruction of secondary vertices come from a modified track finding procedure that makes no prior assumption about the vertex position. Only well-reconstructed tracks with at least three 2D hits are selected. The secondary vertices are reconstructed from at least three tracks and have to be of good quality. They also must not be compatible with the primary vertex region. Only events with a single secondary vertex are used in the analysis. Results and the verification of the procedure with photon conversions to two muons can be found in [10].

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Part IV Case Studies**

# **Chapter 10 LHC Experiments**

**Abstract** The chapter gives an overview of the track and vertex reconstruction methods of the LHC experiments that were used in production during Run 2 of the LHC, which ended in autumn of 2018.

## **10.1 ALICE**

Track reconstruction in ALICE [1, 2] starts with cluster finding in the two principal tracking detectors, i.e., in the pad rows of the TPC and in the silicon sensors of the Inner Tracking System (ITS, Sect. 1.6.1.1). The clusters reconstructed in the two innermost pixel layers of the ITS are used for a preliminary reconstruction of the primary vertex. Track reconstruction takes place through three main passes.

The first pass starts out in the outer part of the TPC and proceeds inwards through the ITS to the vertex region. Track seeds for primary particles are formed from pairs of measurements in the outer pad rows of the TPC and the primary vertex. Beginning from the seeds, track finding proceeds in the inward direction using a combinatorial Kalman filter (see Sect. 5.1.7).

Each reconstructed TPC track is extrapolated to the outermost layers of the ITS and used as a seed for track finding in the ITS. Track following in the ITS uses the particle hypothesis computed from the energy loss in the TPC, if available. Seven hypotheses are considered: electrons, muons, pions, kaons, protons, deuterons and tritons. If the d*E/*d*x* information is missing or inconclusive, the pion mass is used. An illustration of how a particle hypothesis can be made using a combination of d*E/*d*x* information (from the TPC measurements) and momentum information (from track reconstruction) is shown in Fig. 10.1.

A difficult point in the first-pass inwards tracking is the transition between the TPC and the ITS, due to the quite long (approximately 0.5 m) propagation distance between the two tracking systems. Given the uncertainty of the position of the track candidate propagated from the TPC and the high density of clusters in the ITS, many measurements are potentially compatible with the extrapolation. In order to lower the combinatorial complexity, the position coordinates of ITS hits used in finding

R. Frühwirth, A. Strandlie, *Pattern Recognition, Tracking and Vertex Reconstruction in Particle Detectors*, Particle Acceleration and Detection, https://doi.org/10.1007/978-3-030-65771-0\_10

**Fig. 10.1** Ionization energy loss as a function of momentum for a set of particles in the ALICE experiment. (From https://arxiv.org/abs/1701.04810. ©CERN for the benefit of the ALICE Collaboration. Reproduced under License CC-BY-4.0)

primary tracks are augmented by the two angles describing the direction from the measurement to the primary vertex, effectively making the hits four-dimensional. When all layers of the ITS are traversed, the best surviving candidate (in terms of a quality measure using for instance the total *χ*<sup>2</sup> of the track and the number of assigned clusters) of each combinatorial tree is selected. Additional track candidates are then found by a stand-alone search in the ITS, using hits not attached to TPC tracks.

During the outward propagation in the second-pass through ITS, TPC and transition radiation detector (TRD), the track length and seven time-of-flight hypotheses per track (corresponding to the seven particles mentioned above) are calculated. This information is subsequently used for particle identification in the time-of-flight detector (TOF). Whenever possible, tracks are matched with hits in the TOF and other detectors outside the TRD.

In the third pass towards the vertex region, measurements assigned to the surviving track candidates during the first pass are used in the refit. The primary vertex is again fitted, now using the full available information from the reconstructed tracks as well as the average position and spread of the beam-beam interaction region. For a review of the track reconstruction performance, see [2, 3].

The high-level trigger (HLT) of ALICE also performs GPU-accelerated track finding and fitting in the TPC [4]. A cellular automaton finds track seeds, which are then extended by a simplified Kalman filter. After track segments are merged into the final track candidates, a full Kalman filter track fit is performed. A detailed description of all stages of the HLT can be found in [5]. The current HLT tracking is the basis for the future TPC tracking in Run 3, which will also run on GPUs [6].

## **10.2 ATLAS**

The main track reconstruction strategy in the ATLAS experiment (see Sect. 1.6.1.2) is to start out with track seeds in the innermost part of the Inner Detector and proceed outwards [7–9]. First, clusters are assembled in the pixel and SCT detector subsystems by a connected component analysis (CCA) [10], and from these clusters 3D measurements (so-called space-points) are created. In dense environments such as the core of high-energy jets, adjacent clusters from neighbouring particles can be so close that they overlap. Identifying such merged clusters correctly during the track reconstruction procedure is very important, and ATLAS has developed a method using a multi-layer, feed-forward neural network to solve this task efficiently [11].

Track seeds are generated from sets of three space-points. Seeds are processed iteratively by first considering sets of space-points from the SCT detector, then sets from the pixel detector, and finally sets originating from a combination of the two sub-detectors. Starting out from the seeds, track finding is carried out by a combinatorial Kalman filter (see Sect. 5.1.7). Each seed can potentially give rise to a number of track candidates, as long as the candidates pass a set of quality criteria. Efforts to speed up the track reconstruction are ongoing at the time of writing; first results based on improving the purity of the seed collection are given in [12]. More recent results are described in [13].

After iterating over all the seeds, the set of all track candidates is processed by an ambiguity solver. A flow diagram of the ATLAS ambiguity solver is shown in Fig. 10.2.

The track candidates are first ranked by a track score given to each candidate. The track score definition is intended to give high scores to candidates with a high probability of being a real track and therefore depends on, e.g., cluster resolutions,

**Fig. 10.2** Flow diagram of the ATLAS ambiguity solver. (From [9], reproduced under License CC-BY-4.0)

the number of holes on the track and the track *χ*<sup>2</sup>. Track candidates passing a set of basic quality criteria are submitted to a full, high-resolution track fit using all available information relevant for an optimal estimation of the track parameters. After the track fit, track candidates can either be accepted, rejected, or stripped down if the candidate contains too many clusters shared with other track candidates. The stripped-down candidates are scored again and re-submitted to the entire procedure of ambiguity resolving. Note that the full track fit is invoked only after the track scoring stage, in order to save computational load.

The set of output track segments from the pixel and SCT detectors is submitted to the TRT track extension, which extrapolates the track segments through the TRT and searches for compatible segments in this drift tube detector. The search and the subsequent resolution of drift-time ambiguities are done using a line fit in coordinate projections that make the track model approximately linear [7]. At high levels of pile-up the occupancy of the TRT is so high that including drift-time hits does not always increase the track reconstruction performance. However, tube hits (i.e., using only the position of the centre of the straw) in all cases contribute to electron identification through information about transition radiation.

In order to find tracks originating, e.g., from secondary vertices or photon conversions, a secondary track reconstruction strategy starts out by finding track segments in the TRT using a Hough transform [7]. The TRT segments are then back-tracked into the SCT, which allows finding small track segments in the silicon detector that were not found during the first inside-out pass.

Electron reconstruction requires special attention [14] due to the potentially large amounts of emitted bremsstrahlung. In ATLAS, there is no separate iteration for electron track reconstruction. However, specific handling of bremsstrahlung is triggered by electromagnetic showers during the entire track reconstruction chain. The combinatorial Kalman filter allows for kinks in the track candidates if they are inside a region compatible with an electromagnetic shower. The ambiguity solver uses a global LS fit which allows for bremsstrahlung breakpoints in the track model. During electron identification, a full Gaussian-sum filter is invoked.

The primary vertex reconstruction in ATLAS is divided into the two classical categories of vertex finding and vertex fitting [15]. The input is the set of reconstructed tracks in an event, selected according to a set of quality criteria. This set is first used to obtain a seed position for the primary vertex. The seed position in the bending plane is the center of the beam spot, whereas the longitudinal coordinate is defined as the mode of the longitudinal positions of the tracks at their respective points of closest approach to the beam spot. Subsequently, the set of tracks and the seed position are submitted to an iterative vertex finding procedure using the adaptive vertex fit (AVF) with annealing (Sect. 8.2.2). A typical distribution of track weights for a set of values of the temperature parameter *T* is shown in Fig. 10.3.
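One standard robust estimator for such a mode is the half-sample mode: repeatedly keep the shortest interval that contains half of the points. The sketch below is a generic illustration with invented z-values; it is not the actual ATLAS implementation:

```python
import numpy as np

def half_sample_mode(z):
    """Robust mode estimate: iteratively shrink to the shortest
    interval containing half of the (sorted) points, then average."""
    z = np.sort(np.asarray(z, dtype=float))
    while len(z) > 2:
        h = (len(z) + 1) // 2                  # half-sample size
        widths = z[h - 1:] - z[:len(z) - h + 1]
        i = int(np.argmin(widths))             # shortest half-sample
        z = z[i:i + h]
    return z.mean()

# Longitudinal track positions at closest approach to the beam spot:
# a cluster near z = 2.0 plus pile-up outliers (hypothetical values).
z_tracks = [1.9, 2.0, 2.05, 2.1, 1.95, -5.0, 7.3]
print(half_sample_mode(z_tracks))   # close to 2.0; outliers ignored
```

Unlike the mean or even the median, the mode is insensitive to an asymmetric population of pile-up tracks, which is why it is a natural choice for seeding.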

After the final iteration, the tracks with weights so small that they can be considered incompatible with the reconstructed vertex are removed from the vertex candidate and returned to the pool of unused tracks. The unused tracks are then submitted to a new iteration of the vertex finding procedure, which continues

**Fig. 10.3** Histogram of track weights in the adaptive vertex fit for a set of different temperatures. (From [15], reproduced under License CC-BY-4.0)

until all tracks have been used or no additional vertex can be found among the remaining tracks. For secondary vertex finding inside jets, see [16]. Stand-alone vertex reconstruction in the muon spectrometer of ATLAS is described in [17].

## **10.3 CMS**

For a brief description of the CMS tracking system, see Sect. 1.6.1.3. Although the pixel detector and the silicon strip detector are mechanically independent subdetectors with different sensor technology, they are considered as a single tracking detector as far as track finding and track fitting are concerned.

The CMS track reconstruction is based on an iterative approach [18–21]. The principal difference between iterations is the configuration of the seed generation and the target tracks. The iterative search starts with the tracks that are the least difficult to find, and proceeds to more difficult classes: low momentum tracks, tracks from short-lived decays, and tracks from long-lived decays. In each iteration, hits associated to high-quality tracks are masked, reducing the combinatorial overhead for the following iteration. Finally, special iterations that improve track reconstruction in high-density environments such as jets are executed, followed by iterations targeting muon tracks in combination with the muon chambers. For a recent comprehensive overview of the tracking performance, see [22].

Each iteration consists of four steps:

1. *Seed generation.* Initial track segments, called seeds, are found by a combinatorial search or by a cellular automaton [23]; see also Sect. 5.1.5. Using the information of three or four hits in the pixel and/or strip detectors, the trajectory parameters and the corresponding uncertainties of the seed are computed.

2. *Track finding.* Starting from the seed, track candidates are built by extrapolating the trajectory from layer to layer and picking up compatible hits with the combinatorial Kalman filter (Sect. 5.1.7).

3. *Track fitting.* Each track candidate is fitted with a Kalman filter and smoother to obtain the final estimates of the track parameters.

4. *Track selection.* Quality criteria are applied in order to discard or flag fake track candidates.

The twelve tracking iterations used after the pixel upgrade to four layers are listed in Table 10.1, showing the seed type and the targeted tracks [20]. Iteration 9 is special insofar as it improves track reconstruction in jets and hadronic *τ* lepton decays by re-clustering pixel hits, using the jet direction to predict the expected cluster shape and charge [27, 28]. Iterations 10 and 11 reconstruct global muon tracks. For the combined electron reconstruction with tracker and electromagnetic calorimeter, see [29] and Sect. 9.4.

Primary vertex reconstruction starts with selecting tracks that are produced promptly by setting a threshold on their transverse impact parameter [18]. The selected tracks are then clustered on the basis of their *z*-coordinates at their point of closest approach to the center of the beam spot, allowing for an arbitrary


**Table 10.1** List of the tracking iterations in CMS used after the Pixel Tracker upgrade with the corresponding seeding configuration used and target tracks [20]

number of clusters. The algorithm of choice is clustering by deterministic annealing (Sect. 7.2.4). Vertex candidates with at least two tracks are fitted by an adaptive vertex fitter (Sect. 8.2.2). Finally the signal vertex is determined as the primary vertex with the highest weighted sum of the squared momenta of the jets and isolated tracks associated to the vertex.
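The selection of the signal vertex can be sketched as follows; the vertices, the use of transverse momenta, and the omission of the weighting are simplifications for illustration:

```python
# Hypothetical primary vertices with the transverse momenta [GeV/c] of
# the jets and isolated tracks associated to each of them.
vertices = {
    "PV0": [0.7, 1.1, 0.9],          # soft pile-up vertex
    "PV1": [45.0, 38.0, 5.2, 2.1],   # hard-scatter candidate
    "PV2": [2.3, 1.8],
}

# The signal vertex maximizes the (here unweighted) sum of squared pT.
sum_pt2 = {name: sum(pt**2 for pt in pts) for name, pts in vertices.items()}
signal = max(sum_pt2, key=sum_pt2.get)
print(signal)   # PV1
```

Squaring the momenta strongly favours the vertex carrying the hard scatter over the many soft pile-up vertices.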

Iterations 4 and 5 of the track finding target secondary tracks from short-lived decays, for instance *B* hadrons. Other candidate tracks for inclusion in a secondary vertex are tracks rejected by the adaptive vertex fit of the signal vertex. Finding secondary vertices is then basically a combinatorial search, followed by a vertex fit and possibly a kinematic fit for each candidate.

Tracks from long-lived decays, for instance *K*-short mesons and *Λ* baryons, are the targets of iterations 6–8, each for a different range of the radial distance *r* of the decay from the beam line. Reconstruction of these decay vertices is again a combinatorial search followed by a vertex fit and a kinematic fit.

CMS also has an independent reconstruction of tracks and primary vertices based purely on pixel hits. This is significantly faster than the standard track and vertex reconstruction chain and therefore a valuable tool for the high-level trigger [18, 30].

## **10.4 LHCb**

Track reconstruction in the LHCb detector uses hits in the VELO, TT and T stations; see Sect. 1.6.1.4 and Fig. 10.4. Depending on which detectors are crossed by a particle, different track types are defined [31]: *VELO tracks* with hits only in the VELO; *upstream tracks* with hits in the VELO and the TT; *long tracks* with hits in the VELO and the T stations, and possibly the TT; *downstream tracks* with hits in the TT and the T stations; and *T tracks* with hits only in the T stations.

**Fig. 10.4** Schematic diagram of the LHCb tracking system and the five track types. (From [31], reproduced under License CC-BY-4.0)


Track reconstruction is done in the two stages of the high-level trigger, HLT1 and HLT2 (Sect. 2.1.3). In fact, the full event reconstruction runs in the high-level trigger [31, 32].

The partial track reconstruction in HLT1 starts with finding straight lines in the VELO, which are fitted by a simplified Kalman filter. The fitted tracks are used for the reconstruction of the primary vertices. The primary vertex finding, see [33] and [34, Appendix A], proceeds in two steps: seeding and fitting. The 3D seeding algorithm starts with a base track and looks for tracks close to it. If at least four close tracks are found, a robust average of the points of closest approach of all track pairs is computed and stored as a seed. The tracks are marked as used and the next base track is processed. The fast seeder clusters tracks according to the *z*-coordinates of their points of closest approach to the beam line, similar to the primary vertex finder of CMS (see Sect. 10.3).

Before fitting, the seeds are ordered in descending track multiplicity, so that primary tracks are fitted first, and the probability of incorrectly labeling a secondary vertex as primary is minimized. The vertex fit is a redescending M-estimator with Tukey's biweight function, see Table 6.1 in Sect. 6.2.1. HLT1 proceeds with forming upstream tracks by extrapolating VELO tracks to the TT. High-momentum tracks are extended to the T stations. The resulting long tracks are fitted with a full Kalman filter and sent to the fake track rejection by a neural network [35].
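The redescending M-estimator with Tukey's biweight can be illustrated by a one-dimensional toy fit of the vertex z-position via iteratively reweighted least squares; the cut-off c = 4.685 is the conventional tuning constant, and all track data below are invented. The actual LHCb fit operates on full tracks, not on scalar z-measurements:

```python
import numpy as np

def tukey_vertex_z(z, sigma, c=4.685, n_iter=10):
    """Toy 1D vertex fit: redescending M-estimator with Tukey's
    biweight, solved by iteratively reweighted least squares."""
    z, sigma = np.asarray(z, float), np.asarray(sigma, float)
    v = np.median(z)                       # robust starting point
    for _ in range(n_iter):
        u = (z - v) / (c * sigma)          # scaled residuals
        w = np.where(np.abs(u) < 1.0, (1.0 - u**2) ** 2, 0.0)
        w = w / sigma**2                   # include measurement errors
        v = np.sum(w * z) / np.sum(w)
    return v

# Five tracks from a vertex at z = 0 plus one far outlier, e.g. a
# secondary track (all values hypothetical, in cm):
z     = [0.02, -0.01, 0.00, 0.03, -0.02, 5.00]
sigma = [0.05] * 6
print(tukey_vertex_z(z, sigma))   # close to 0; the outlier gets weight 0
```

Because the biweight redescends to zero, gross outliers are excluded completely rather than merely down-weighted, in contrast to a plain least-squares fit.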

HLT2 performs the full track reconstruction, in the sequence shown in Fig. 10.5. VELO tracks are extended to long tracks, without a threshold on the transverse momentum. Next, a stand-alone track finding is done in the T stations [36]. The found tracks are combined with VELO tracks to form long tracks. Tracks produced in the decays of long-lived particles outside the VELO are reconstructed from track segments in the T stations that are extrapolated backwards and combined with hits in the TT. The final steps are a full track fit with the Kalman filter, fake track rejection by a neural network, and removal of duplicates.

**Fig. 10.5** Sequence of the full track reconstruction in LHCb. (From [31], reproduced under License CC-BY-4.0)

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Chapter 11 Belle II and CBM**

**Abstract** The chapter gives an overview of the tracking and vertexing algorithms of two experiments not at the LHC, Belle II at SuperKEKB in Japan and CBM at FAIR in Germany.

## **11.1 Belle II**

The Belle II experiment [1] started operation in late 2018. The following overview of the reconstruction methods reflects the status after a couple of months of running; further developments and adaptations that have been implemented in parallel to the rising luminosity of the SuperKEKB collider are documented in [2].

The tracking system of Belle II has two principal components, the vertex detector (VXD) and the central drift chamber (CDC); see Sect. 1.6.2.1. The VXD consists of two parts: the PXD with two layers of DEPFET pixel sensors, and the SVD with four layers of double-sided silicon strip sensors. The overall design of the track reconstruction is shown in Fig. 11.1.

Track finding in the CDC starts with a filter, implemented as a boosted decision tree, that rejects background hits. This is followed by two track finders. The global track finder employs a Legendre transform (Sect. 5.1.4) to find complete tracks originating from the interaction region. Peak finding in the Legendre space is done by a fast iterative quadtree algorithm [3]. The local track finder builds segments from the hits in a superlayer. This is followed by a multi-stage combination and filter algorithm that connects valid segments and tracks, rejects fake segments, and outputs the final CDC track sample [4].

Stand-alone track finding in the SVD [5, 6] is based on a cellular automaton (CA) complemented by a sector map (Sect. 5.1.5). The sector map stores prior information about which triplets of hits can be part of a track. The vertex position can be used as a virtual hit. Track segments that pass all cuts form the cells of the CA. Figure 11.2 shows the final state of the CA with a toy event, without background.

When the CA has finished, each track candidate is assigned a quality index, which is the *p*-value of a triplet fit (Sect. 6.1.5), a circle fit, or a Riemann helix

**Fig. 11.1** Scheme of the track reconstruction in Belle II. (Courtesy of B. Scavino, Mainz)

fit (Sect. 6.3). Finally, non-overlapping candidates are selected, either with a greedy algorithm or a Hopfield network (Sect. 5.3). At the time of writing, the triplet fit and the greedy algorithm are the defaults.

The merging of CDC and VXD tracks is based on a boosted decision tree that is trained on valid combinations of CDC and VXD tracks. It takes the track parameters as determined by a Deterministic Annealing Filter (DAF, Sect. 6.2.2) as input and returns a score in the interval *(*0*,* 1*)*. Combinations above a threshold are accepted. CDC tracks, SVD tracks and combined tracks are extrapolated into the PXD to pick up hits by a combinatorial Kalman filter [7], and then passed to the DAF. The implementation of the DAF is the one in GENFIT [8–10].

The *B* mesons produced by the decay of the *Υ* (4S) resonance are reconstructed hierarchically [11], and their decay chains are fitted by a tree fitter based on [12]. V0 decays are reconstructed by pairing positive and negative tracks and performing a vertex fit [13].

# **11.2 CBM**

CBM is a heavy-ion experiment at FAIR [14], currently in the preparation phase. The first beam is expected in 2024. The interaction rate will be up to 10 MHz, with up to 1000 charged particles produced in a central collision. Full online event reconstruction and selection will be required [15]. This is the task of the First Level Event Selection Package (FLES, [16, 17]). The algorithms in FLES exploit vector instructions for intra-processor parallelism and parallelism between cores on the event level.

The FLES package consists of several modules: track finding, track fitting, short-lived particle finding, event building and event selection [18]; see Fig. 11.3. CBM runs without a trigger; instead, time-stamped data will be put into a buffer in time slices of a certain length. The association of tracker hits with events is performed by software [18], using the hit time information in the Silicon Tracking System (STS, see Sect. 1.6.2.2). The resolution of the hit time is 5 ns.

Track finding in the STS (Sect. 1.6.2.2) is done by a 4D Cellular Automaton, where the fourth dimension is time [19, 20]. The cells of the CA are triplets of hits correlating in space and in time. The principle of the CA track finder is illustrated in Fig. 5.5 in Sect. 5.1.5. For track finding efficiencies, see [18].
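The state-evolution rule of such a cellular automaton can be sketched as follows; the cell and neighbour structure below is a toy stand-in for real hit triplets correlated in space and time:

```python
# Schematic cellular-automaton track finder (illustration only): cells are hit
# triplets on consecutive stations; neighbouring cells share two hits. Each cell
# repeatedly takes state = 1 + max(state of its inner neighbours), so after
# convergence the state counts the length of the chain ending in that cell.
def evolve(cells, neighbours):
    """cells: dict cell_id -> state; neighbours: cell_id -> inner neighbour ids."""
    changed = True
    while changed:
        changed = False
        new = {}
        for c, state in cells.items():
            inner = [cells[n] for n in neighbours.get(c, [])]
            s = 1 + max(inner) if inner else 1
            new[c] = s
            changed |= (s != state)
        cells = new
    return cells

# a chain of three compatible triplets (a -> b -> c) plus one isolated triplet d
states = evolve({c: 1 for c in "abcd"}, {"b": ["a"], "c": ["b"]})
```

Track candidates are then collected by following the chains downward from the cells with the highest states.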

The track candidates from the CA are sent to the Kalman Filter Track Fitter. The Kalman Filter is "SIMDized", i.e., uses the Single-Instruction Multiple-Data features of the processors employed [21]. The extrapolation in the inhomogeneous magnetic field uses the approximate analytical method described in Sect. 4.3.2.2.
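The data-parallel idea behind a "SIMDized" filter can be illustrated with a toy batch update, using NumPy vectorization as a stand-in for explicit SIMD intrinsics; the one-parameter track model and all numbers are invented for illustration:

```python
import numpy as np

# The same Kalman gain computation applied to many tracks at once: each array
# element belongs to a different track, so one arithmetic statement processes
# all tracks in parallel (toy model with scalar state and scalar measurement).
def batch_update(x, P, z, R):
    """Measurement update for N independent 1-parameter tracks in parallel."""
    K = P / (P + R)              # gains for all tracks at once
    x_new = x + K * (z - x)      # filtered states
    P_new = (1.0 - K) * P        # filtered variances
    return x_new, P_new

x = np.array([0.0, 1.0, 2.0])    # predicted states of three tracks
P = np.array([1.0, 1.0, 4.0])    # predicted variances
z = np.array([0.2, 0.8, 3.0])    # measurements
R = np.array([1.0, 1.0, 1.0])    # measurement variances
x_f, P_f = batch_update(x, P, z, R)
```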

After the primary vertex fit with the Kalman Filter, tracks are sorted into primary and secondary tracks. The reconstruction of short-lived particles is done by the KFParticle package [22], also based on the Kalman Filter and designed to fit decay chains. The reconstruction of a short-lived particle is implemented iteratively. The KFParticle package starts with an initial approximation of the secondary vertex, adds particles one after another, refines the state vector and gives the optimal values after the last particle [16, 17]. Besides the geometrical constraints, mass constraints and topological constraints can be imposed on the secondary vertex.
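The structural idea of adding particles one after another can be sketched as a sequence of information-filter updates; the 2D position-measurement model and all numbers are invented, and the real KFParticle package also updates momenta and applies mass and topological constraints:

```python
import numpy as np

# Toy version of the iterative scheme: start from a vague initial approximation
# of the secondary vertex and add one particle at a time, each contributing an
# independent Gaussian measurement of the vertex position.
def add_particle(vtx, cov, meas, meas_cov):
    """One information-filter update of the vertex estimate with one particle."""
    W = np.linalg.inv(cov) + np.linalg.inv(meas_cov)
    cov_new = np.linalg.inv(W)
    vtx_new = cov_new @ (np.linalg.inv(cov) @ vtx + np.linalg.inv(meas_cov) @ meas)
    return vtx_new, cov_new

vtx = np.zeros(2)
cov = np.eye(2) * 1e4            # vague initial approximation of the vertex
for meas in [np.array([1.0, 0.1]), np.array([0.9, -0.1]), np.array([1.1, 0.0])]:
    vtx, cov = add_particle(vtx, cov, meas, np.eye(2) * 0.01)
# after the last particle, vtx is (nearly) the mean of the three measurements
```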

Track reconstruction in the three stations of the transition radiation detector is done by a cellular automaton as well [23].

**Fig. 11.3** Flow diagram of the First Level Event Selection Package in CBM. (From [17], reproduced under License CC-BY-4.0)

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Appendix A Jacobians of the Parameter Transformations**

# **Transformation from One Curvilinear Frame to Another**

The non-zero terms of the Jacobian matrix are:

$$
\begin{aligned}
\psi:\quad \partial\psi/\partial\psi_0 &= 1,\\[4pt]
\phi:\quad \partial\phi/\partial\psi_0 &= -\frac{\alpha K}{\cos\lambda}\cdot\frac{1}{\psi}\cdot(\boldsymbol{n}\cdot\boldsymbol{u})\cdot\left[\boldsymbol{t}\cdot(\boldsymbol{r}_0-\boldsymbol{r})\right],\\
\partial\phi/\partial\phi_0 &= \frac{\cos\lambda_0}{\cos\lambda}\Big\{\cos\theta\,(\boldsymbol{u}_0\cdot\boldsymbol{u})+\sin\theta\,((\boldsymbol{h}\times\boldsymbol{u}_0)\cdot\boldsymbol{u})+(1-\cos\theta)\,(\boldsymbol{h}\cdot\boldsymbol{u}_0)(\boldsymbol{h}\cdot\boldsymbol{u})\\
&\qquad+\alpha\,(\boldsymbol{n}\cdot\boldsymbol{u})\big[-\sin\theta\,(\boldsymbol{u}_0\cdot\boldsymbol{t})+\alpha(1-\cos\theta)\,(\boldsymbol{u}_0\cdot\boldsymbol{n})-(\theta-\sin\theta)\,(\boldsymbol{h}\cdot\boldsymbol{t})(\boldsymbol{h}\cdot\boldsymbol{u}_0)\big]\Big\},\\
\partial\phi/\partial\lambda_0 &= \frac{1}{\cos\lambda}\Big\{\cos\theta\,(\boldsymbol{v}_0\cdot\boldsymbol{u})+\sin\theta\,((\boldsymbol{h}\times\boldsymbol{v}_0)\cdot\boldsymbol{u})+(1-\cos\theta)\,(\boldsymbol{h}\cdot\boldsymbol{v}_0)(\boldsymbol{h}\cdot\boldsymbol{u})\\
&\qquad+\alpha\,(\boldsymbol{n}\cdot\boldsymbol{u})\big[-\sin\theta\,(\boldsymbol{v}_0\cdot\boldsymbol{t})+\alpha(1-\cos\theta)\,(\boldsymbol{v}_0\cdot\boldsymbol{n})-(\theta-\sin\theta)\,(\boldsymbol{h}\cdot\boldsymbol{t})(\boldsymbol{h}\cdot\boldsymbol{v}_0)\big]\Big\},
\end{aligned}
$$

© The Author(s) 2021. R. Frühwirth, A. Strandlie, *Pattern Recognition, Tracking and Vertex Reconstruction in Particle Detectors*, Particle Acceleration and Detection, https://doi.org/10.1007/978-3-030-65771-0



$$
\begin{aligned}
\partial\phi/\partial x_{\perp 0} &= -\frac{\alpha K}{\cos\lambda}\,(\boldsymbol{n}\cdot\boldsymbol{u})\,(\boldsymbol{u}_0\cdot\boldsymbol{t}),\\
\partial\phi/\partial y_{\perp 0} &= -\frac{\alpha K}{\cos\lambda}\,(\boldsymbol{n}\cdot\boldsymbol{u})\,(\boldsymbol{v}_0\cdot\boldsymbol{t}),
\end{aligned}
$$

$$
\begin{aligned}
\lambda:\quad \partial\lambda/\partial\psi_0 &= -\alpha K\cdot\frac{1}{\psi}\cdot(\boldsymbol{n}\cdot\boldsymbol{v})\cdot\left[\boldsymbol{t}\cdot(\boldsymbol{r}_0-\boldsymbol{r})\right],\\
\partial\lambda/\partial\phi_0 &= \cos\lambda_0\Big\{\cos\theta\,(\boldsymbol{u}_0\cdot\boldsymbol{v})+\sin\theta\,((\boldsymbol{h}\times\boldsymbol{u}_0)\cdot\boldsymbol{v})+(1-\cos\theta)\,(\boldsymbol{h}\cdot\boldsymbol{u}_0)(\boldsymbol{h}\cdot\boldsymbol{v})\\
&\qquad+\alpha\,(\boldsymbol{n}\cdot\boldsymbol{v})\big[-\sin\theta\,(\boldsymbol{u}_0\cdot\boldsymbol{t})+\alpha(1-\cos\theta)\,(\boldsymbol{u}_0\cdot\boldsymbol{n})-(\theta-\sin\theta)\,(\boldsymbol{h}\cdot\boldsymbol{t})(\boldsymbol{h}\cdot\boldsymbol{u}_0)\big]\Big\},\\
\partial\lambda/\partial\lambda_0 &= \cos\theta\,(\boldsymbol{v}_0\cdot\boldsymbol{v})+\sin\theta\,((\boldsymbol{h}\times\boldsymbol{v}_0)\cdot\boldsymbol{v})+(1-\cos\theta)\,(\boldsymbol{h}\cdot\boldsymbol{v}_0)(\boldsymbol{h}\cdot\boldsymbol{v})\\
&\qquad+\alpha\,(\boldsymbol{n}\cdot\boldsymbol{v})\big[-\sin\theta\,(\boldsymbol{v}_0\cdot\boldsymbol{t})+\alpha(1-\cos\theta)\,(\boldsymbol{v}_0\cdot\boldsymbol{n})-(\theta-\sin\theta)\,(\boldsymbol{h}\cdot\boldsymbol{t})(\boldsymbol{h}\cdot\boldsymbol{v}_0)\big],\\
\partial\lambda/\partial x_{\perp 0} &= -\alpha K\,(\boldsymbol{n}\cdot\boldsymbol{v})\,(\boldsymbol{u}_0\cdot\boldsymbol{t}),\\
\partial\lambda/\partial y_{\perp 0} &= -\alpha K\,(\boldsymbol{n}\cdot\boldsymbol{v})\,(\boldsymbol{v}_0\cdot\boldsymbol{t}),
\end{aligned}
$$

$$
\begin{aligned}
x_\perp:\quad \partial x_\perp/\partial\psi_0 &= \frac{1}{\psi}\cdot\left[\boldsymbol{u}\cdot(\boldsymbol{r}_0-\boldsymbol{r})\right],\\
\partial x_\perp/\partial\phi_0 &= \cos\lambda_0\Big\{\frac{\sin\theta}{K}\,(\boldsymbol{u}_0\cdot\boldsymbol{u})+\frac{1-\cos\theta}{K}\,((\boldsymbol{h}\times\boldsymbol{u}_0)\cdot\boldsymbol{u})+\frac{\theta-\sin\theta}{K}\,(\boldsymbol{h}\cdot\boldsymbol{u}_0)(\boldsymbol{h}\cdot\boldsymbol{u})\Big\},\\
\partial x_\perp/\partial\lambda_0 &= \frac{\sin\theta}{K}\,(\boldsymbol{v}_0\cdot\boldsymbol{u})+\frac{1-\cos\theta}{K}\,((\boldsymbol{h}\times\boldsymbol{v}_0)\cdot\boldsymbol{u})+\frac{\theta-\sin\theta}{K}\,(\boldsymbol{h}\cdot\boldsymbol{v}_0)(\boldsymbol{h}\cdot\boldsymbol{u}),\\
\partial x_\perp/\partial x_{\perp 0} &= \boldsymbol{u}_0\cdot\boldsymbol{u},\\
\partial x_\perp/\partial y_{\perp 0} &= \boldsymbol{v}_0\cdot\boldsymbol{u},
\end{aligned}
$$

$$
\begin{aligned}
y_\perp:\quad \partial y_\perp/\partial\psi_0 &= \frac{1}{\psi}\cdot\left[\boldsymbol{v}\cdot(\boldsymbol{r}_0-\boldsymbol{r})\right],\\
\partial y_\perp/\partial\phi_0 &= \cos\lambda_0\Big\{\frac{\sin\theta}{K}\,(\boldsymbol{u}_0\cdot\boldsymbol{v})+\frac{1-\cos\theta}{K}\,((\boldsymbol{h}\times\boldsymbol{u}_0)\cdot\boldsymbol{v})+\frac{\theta-\sin\theta}{K}\,(\boldsymbol{h}\cdot\boldsymbol{u}_0)(\boldsymbol{h}\cdot\boldsymbol{v})\Big\},\\
\partial y_\perp/\partial\lambda_0 &= \frac{\sin\theta}{K}\,(\boldsymbol{v}_0\cdot\boldsymbol{v})+\frac{1-\cos\theta}{K}\,((\boldsymbol{h}\times\boldsymbol{v}_0)\cdot\boldsymbol{v})+\frac{\theta-\sin\theta}{K}\,(\boldsymbol{h}\cdot\boldsymbol{v}_0)(\boldsymbol{h}\cdot\boldsymbol{v}),\\
\partial y_\perp/\partial x_{\perp 0} &= \boldsymbol{u}_0\cdot\boldsymbol{v},\\
\partial y_\perp/\partial y_{\perp 0} &= \boldsymbol{v}_0\cdot\boldsymbol{v}.
\end{aligned}
$$

# **Transformations Between a Local Frame and the Curvilinear Frame**

$$
\partial\left(\psi,\phi,\lambda,x_\perp,y_\perp\right)/\partial\left(\psi,v',w',v,w\right) =
\begin{pmatrix}
1 & 0 & 0 & 0 & 0 \\
0 & \dfrac{(\boldsymbol{t}\cdot\boldsymbol{i})(\boldsymbol{u}\cdot\boldsymbol{j})}{\cos\lambda} & \dfrac{(\boldsymbol{t}\cdot\boldsymbol{i})(\boldsymbol{u}\cdot\boldsymbol{k})}{\cos\lambda} & -\dfrac{\alpha K\,(\boldsymbol{t}\cdot\boldsymbol{j})(\boldsymbol{u}\cdot\boldsymbol{n})}{\cos\lambda} & -\dfrac{\alpha K\,(\boldsymbol{t}\cdot\boldsymbol{k})(\boldsymbol{u}\cdot\boldsymbol{n})}{\cos\lambda} \\
0 & (\boldsymbol{t}\cdot\boldsymbol{i})(\boldsymbol{v}\cdot\boldsymbol{j}) & (\boldsymbol{t}\cdot\boldsymbol{i})(\boldsymbol{v}\cdot\boldsymbol{k}) & -\alpha K\,(\boldsymbol{t}\cdot\boldsymbol{j})(\boldsymbol{v}\cdot\boldsymbol{n}) & -\alpha K\,(\boldsymbol{t}\cdot\boldsymbol{k})(\boldsymbol{v}\cdot\boldsymbol{n}) \\
0 & 0 & 0 & (\boldsymbol{u}\cdot\boldsymbol{j}) & (\boldsymbol{u}\cdot\boldsymbol{k}) \\
0 & 0 & 0 & (\boldsymbol{v}\cdot\boldsymbol{j}) & (\boldsymbol{v}\cdot\boldsymbol{k})
\end{pmatrix},
$$

$$
\partial\left(\psi,v',w',v,w\right)/\partial\left(\psi,\phi,\lambda,x_\perp,y_\perp\right) =
\begin{pmatrix}
1 & 0 & 0 & 0 & 0 \\
0 & \dfrac{\cos\lambda\,(\boldsymbol{v}\cdot\boldsymbol{k})}{(\boldsymbol{t}\cdot\boldsymbol{i})^2} & -\dfrac{(\boldsymbol{u}\cdot\boldsymbol{k})}{(\boldsymbol{t}\cdot\boldsymbol{i})^2} & -\dfrac{\alpha K\,(\boldsymbol{u}\cdot\boldsymbol{i})(\boldsymbol{m}\cdot\boldsymbol{k})}{(\boldsymbol{t}\cdot\boldsymbol{i})^3} & -\dfrac{\alpha K\,(\boldsymbol{v}\cdot\boldsymbol{i})(\boldsymbol{m}\cdot\boldsymbol{k})}{(\boldsymbol{t}\cdot\boldsymbol{i})^3} \\
0 & -\dfrac{\cos\lambda\,(\boldsymbol{v}\cdot\boldsymbol{j})}{(\boldsymbol{t}\cdot\boldsymbol{i})^2} & \dfrac{(\boldsymbol{u}\cdot\boldsymbol{j})}{(\boldsymbol{t}\cdot\boldsymbol{i})^2} & \dfrac{\alpha K\,(\boldsymbol{u}\cdot\boldsymbol{i})(\boldsymbol{m}\cdot\boldsymbol{j})}{(\boldsymbol{t}\cdot\boldsymbol{i})^3} & \dfrac{\alpha K\,(\boldsymbol{v}\cdot\boldsymbol{i})(\boldsymbol{m}\cdot\boldsymbol{j})}{(\boldsymbol{t}\cdot\boldsymbol{i})^3} \\
0 & 0 & 0 & \dfrac{(\boldsymbol{v}\cdot\boldsymbol{k})}{(\boldsymbol{t}\cdot\boldsymbol{i})} & -\dfrac{(\boldsymbol{u}\cdot\boldsymbol{k})}{(\boldsymbol{t}\cdot\boldsymbol{i})} \\
0 & 0 & 0 & -\dfrac{(\boldsymbol{v}\cdot\boldsymbol{j})}{(\boldsymbol{t}\cdot\boldsymbol{i})} & \dfrac{(\boldsymbol{u}\cdot\boldsymbol{j})}{(\boldsymbol{t}\cdot\boldsymbol{i})}
\end{pmatrix},
$$

where *m* = *t* × *n* and

$$(\boldsymbol{m}\cdot\boldsymbol{k}) = (\boldsymbol{v}\cdot\boldsymbol{k})\left(\boldsymbol{n}\cdot\boldsymbol{u}\right) - \left(\boldsymbol{u}\cdot\boldsymbol{k}\right)\left(\boldsymbol{n}\cdot\boldsymbol{v}\right)\,,$$

$$(\boldsymbol{m}\cdot\boldsymbol{j}) = \left(\boldsymbol{v}\cdot\boldsymbol{j}\right)\left(\boldsymbol{n}\cdot\boldsymbol{u}\right) - \left(\boldsymbol{u}\cdot\boldsymbol{j}\right)\left(\boldsymbol{n}\cdot\boldsymbol{v}\right)\,.$$
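These identities follow from $\boldsymbol{m} = \boldsymbol{t}\times\boldsymbol{n} = (\boldsymbol{n}\cdot\boldsymbol{u})\,\boldsymbol{v} - (\boldsymbol{n}\cdot\boldsymbol{v})\,\boldsymbol{u}$ in a right-handed orthonormal frame $(\boldsymbol{u},\boldsymbol{v},\boldsymbol{t})$. A quick numerical check, with arbitrary example vectors chosen purely for illustration:

```python
import numpy as np

# Build a right-handed orthonormal frame (u, v, t) with t x u = v, then verify
# (m·k) = (v·k)(n·u) - (u·k)(n·v) and the analogous identity for j.
t = np.array([0.3, -0.5, 0.81]); t /= np.linalg.norm(t)
u = np.cross([0.0, 0.0, 1.0], t); u /= np.linalg.norm(u)
v = np.cross(t, u)                      # completes the orthonormal frame
n = np.array([0.2, 0.7, -0.4])          # arbitrary vector
j = np.array([0.0, 1.0, 0.0])           # arbitrary reference axes
k = np.array([0.0, 0.0, 1.0])
m = np.cross(t, n)

lhs_k = m @ k
rhs_k = (v @ k) * (n @ u) - (u @ k) * (n @ v)
lhs_j = m @ j
rhs_j = (v @ j) * (n @ u) - (u @ j) * (n @ v)
# lhs and rhs agree to machine precision
```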

# **Transformations Between the Intermediate Cartesian Frame and the Local Frame**

$$
\partial\left(\psi,v',w',v,w\right)/\partial\left(p'_x,p'_y,p'_z,x',y',z'\right) =
\begin{pmatrix}
-q\,p'_x/g & -q\,p'_y/g & -q\,p'_z/g & 0 & 0 & 0 \\
1/p'_z & 0 & -p'_x/{p'_z}^2 & 0 & 0 & 0 \\
0 & 1/p'_z & -p'_y/{p'_z}^2 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0
\end{pmatrix},
$$

with

$$g = \left(p\_x'^{\,2} + p\_y'^{\,2} + p\_z'^{\,2}\right)^{3/2}.$$

$$
\partial\left(p'_x,p'_y,p'_z,x',y',z'\right)/\partial\left(\psi,v',w',v,w\right) =
\begin{pmatrix}
-p'_x/\psi & s_z\,q\,(1+w'^2)/(\psi h^3) & -s_z\,q\,v'w'/(\psi h^3) & 0 & 0 \\
-p'_y/\psi & -s_z\,q\,v'w'/(\psi h^3) & s_z\,q\,(1+v'^2)/(\psi h^3) & 0 & 0 \\
-p'_z/\psi & -s_z\,q\,v'/(\psi h^3) & -s_z\,q\,w'/(\psi h^3) & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0
\end{pmatrix},
$$

with

$$s_z = \operatorname{sign}(p'_z),\qquad h = \sqrt{1 + v'^2 + w'^2}.$$

# **Appendix B Regularization of the Kinematic Fit**

If *V* in Eq. (8.61) is not of full rank (singular), it can be regularized by adding a positive multiple of the identity matrix:

$$V^\delta = V + \delta I \text{ with } \delta > 0.$$

The eigenvalues of *V*<sup>*δ*</sup> are equal to the non-negative eigenvalues of *V* plus *δ*. They are therefore all positive, and *V*<sup>*δ*</sup> is regular. The corresponding estimate $\hat{\boldsymbol{\alpha}}^{\delta}$ (see Eq. (8.68)) is equal to:

$$
\hat{\boldsymbol{\alpha}}^{\delta} = \boldsymbol{y} - \boldsymbol{V}^{\delta}\boldsymbol{H}^{\mathsf{T}}\boldsymbol{G}_{H}^{\delta}(\boldsymbol{H}\,\boldsymbol{y} + \boldsymbol{d}),
$$

with

$$\begin{aligned} \boldsymbol{G}_{H}^{\delta} &= (\boldsymbol{H}\boldsymbol{V}^{\delta}\boldsymbol{H}^{\mathsf{T}})^{-1} = [\boldsymbol{H}(\boldsymbol{V} + \delta\boldsymbol{I})\boldsymbol{H}^{\mathsf{T}}]^{-1} \\ &= [\boldsymbol{H}\boldsymbol{V}\boldsymbol{H}^{\mathsf{T}} + \delta\,\boldsymbol{H}\boldsymbol{H}^{\mathsf{T}}]^{-1} = [\boldsymbol{G}_{H}^{-1} + \delta\,\boldsymbol{H}\boldsymbol{H}^{\mathsf{T}}]^{-1}. \end{aligned}$$

Application of the Woodbury identity [1, Theorem 18.2.8] yields:

$$\begin{aligned} \boldsymbol{G}_H^{\delta} &= \boldsymbol{G}_H - \boldsymbol{G}_H[1/\delta\cdot(\boldsymbol{H}\boldsymbol{H}^{\mathsf{T}})^{-1} + \boldsymbol{G}_H]^{-1}\boldsymbol{G}_H \\ &= \boldsymbol{G}_H - \delta\cdot\boldsymbol{G}_H[(\boldsymbol{H}\boldsymbol{H}^{\mathsf{T}})^{-1} + \delta\,\boldsymbol{G}_H]^{-1}\boldsymbol{G}_H. \end{aligned}$$

It follows that

$$\lim\_{\delta \to 0^+} \boldsymbol{G}\_H^{\delta} = \boldsymbol{G}\_H \quad \text{and} \quad \lim\_{\delta \to 0^+} \hat{\boldsymbol{\alpha}}^{\delta} = \hat{\boldsymbol{\alpha}}.$$
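The limit can be checked numerically. In the sketch below, the singular *V*, the constraint matrix *H* and the offset *d* are arbitrary illustrative choices, not values from the book:

```python
import numpy as np

# Verify numerically that the regularized gain G_H^delta and estimate converge
# to the unregularized quantities as delta -> 0+, for a rank-deficient V.
V = np.diag([1.0, 0.0])           # singular covariance matrix
H = np.array([[1.0, 1.0]])        # one linear constraint H a + d = 0
d = np.array([0.5])
y = np.array([2.0, -1.0])

def gain(Vmat):
    return np.linalg.inv(H @ Vmat @ H.T)

G_H = gain(V)                     # well defined: H V H^T is regular here
a_hat = y - V @ H.T @ G_H @ (H @ y + d)

for delta in [1e-2, 1e-6, 1e-10]:
    Vd = V + delta * np.eye(2)    # regularized, full-rank covariance
    G_Hd = gain(Vd)
    a_hat_d = y - Vd @ H.T @ G_Hd @ (H @ y + d)
# for the smallest delta, both quantities match the unregularized limit
```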



# **Reference**

1. D. Harville, *Matrix Algebra From a Statistician's Perspective* (Springer Science & Business Media, New York, 2008)

# **Appendix C Software**

This appendix presents some software platforms and packages that are not tied to a specific experiment and are of particular interest to algorithm developers and users.

# **FairRoot**

FairRoot [1] is an object-oriented framework for simulation, reconstruction and analysis. It contains base classes for the easy definition of the user's experimental setup. Using the concept of the Virtual Monte Carlo [2], different simulation programs can be deployed without changing the user code or the geometry description. The framework delivers classes for detector handling and magnetic field definition, so that the user can perform simulations with different detector setups and test different reconstruction algorithms. The framework is used by several experiments at FAIR, as well as by some groups outside FAIR.

# **ACTS: A Common Tracking Software**

ACTS [3, 4] encapsulates track reconstruction algorithms in a generic, experiment-independent framework. It provides data structures and algorithms for simulation, track finding and track fitting, including track parameter propagation. Special emphasis is put on parallelization and vectorization of the software. The implementation is independent of particular assumptions on the detector technology, the detector configuration and the magnetic field. The further development of the project is driven by active input from the tracking community, which makes ACTS an excellent testbed for new ideas on track finding and track fitting.

## **GBL: General Broken Lines**

The general broken lines (GBL) algorithm [5, 6] is a fast track refit based on the concept of linearized regression with breakpoints (Sects. 6.1.3 and 6.1.4). The track model is implemented in such a way that the model matrix has a very special structure and can be inverted in linear time with respect to the number of measurements. The GBL fit computes the full covariance matrix of the corrections to the track parameters along the track and gives the necessary input to track-based alignment with global methods, in particular Millepede II [7].
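The linear-time solution of a banded system can be illustrated with the classic Thomas algorithm for a tridiagonal matrix. This is only a sketch of the principle: the actual GBL matrix is a wider bordered band matrix, handled by specialized band-Cholesky routines.

```python
# Illustration of O(n) solution of a banded linear system: the Thomas algorithm
# for a tridiagonal matrix does one forward sweep and one back substitution,
# i.e. a constant amount of work per row.
def solve_tridiagonal(a, b, c, d):
    """Solve with sub-diagonal a, diagonal b, super-diagonal c, right-hand side d."""
    n = len(d)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):                 # forward elimination
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):        # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

# 2x + y = 4,  x + 3y + z = 10,  y + 2z = 8
x = solve_tridiagonal([0.0, 1.0, 1.0], [2.0, 3.0, 2.0], [1.0, 1.0, 0.0], [4.0, 10.0, 8.0])
```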

## **GENFIT**

GENFIT [8–11] is a generic toolkit for track fitting. It provides classes for measurements, track representations and fitting algorithms. There are predefined measurement classes for various detector types, including planar detectors, drift chambers, and time projection chambers. Track representations include code for track parametrization and track extrapolation by Runge–Kutta. The available fitters are the Kalman filter (with or without reference track), the Deterministic Annealing Filter, and the General Broken Lines. There is an interface to the RAVE vertex reconstruction toolbox, see below.

A new and extended version, called GENFIT2, is described in [12]. Studies of its performance in the software frameworks of the experiments Belle II at SuperKEKB and PANDA at FAIR (Chap. 11) can be found in [12].

## **RAVE**

RAVE [13] is an experiment-independent toolkit for vertex reconstruction. It comes with VERTIGO, a stand-alone framework for debugging, testing and analysis. The vertex fitters are the extended Kalman filter (Sect. 8.1.2.2) and the adaptive vertex fitter (Sect. 8.2.2). Secondary vertex finding by an iterated adaptive fitter (AVR, Sect. 7.3.3) is implemented as well. The toolkit can be embedded in the software framework of a collider experiment with minimal effort.

## **References**


# **Glossary and Abbreviations**



# **Index**

#### **A**

- A common tracking software (ACTS), 91, 193
- A large ion collider experiment (ALICE), 12, 91, 169
  - tracking detectors, 12
- Alignment, 10
  - track based, 10, 11, 109
- Artificial retina, 84
- A toroidal LHC apparatus (ATLAS), 10, 13, 94, 171
  - fast tracker, 96
  - hadronic interaction, 163
  - photon conversion, 162
  - tracking detectors, 13
- AVR, *see* Vertex finding

#### **B**

- Belle II, 17, 181
  - tracking detectors, 17
- Breakpoint, 107, 124
- Bremsstrahlung, *see* Material effects

#### **C**

- CA, *see* Cellular automaton
- Case studies
  - ALICE, 169
  - ATLAS, 171
  - Belle II, 181
  - CBM, 183
  - CMS, 173
  - LHCb, 175
- Cellular automaton (CA), 87, 88, 170, 173, 181, 183
- Central tracker, 11, 13, 169, 173, 183
- Chi-square statistic, 27, 28, 39, 40, 42, 93, 97, 98, 105, 110, 117, 118, 121, 124, 125, 148, 149, 153, 154, 157, 163
- Circle fit, 116
  - Chernov and Ososkov's, 117
  - conformal mapping, 116
  - Karimäki's, 117
  - Riemann fit, 119
- CKF, *see* Kalman filter, combinatorial
- Clustering, 29, 44, 116, 132
  - agglomerative, 44, 138, 140, 160, 161, 164
  - deterministic annealing, 135, 175
  - divisive, 44, 133, 138
  - greedy, 138
  - hierarchical, 44
  - model-based, 45, 133
  - partitional, 44
- Combinatorial Kalman filter, *see* Kalman filter, combinatorial
- Compact muon solenoid (CMS), 13, 24, 94, 173
  - hadronic interaction, 164
  - photon conversion, 163
  - tracking detectors, 13
  - track trigger, 97
- Compressed baryon matter (CBM), 17, 183
  - tracking detectors, 17
- Conformal transformation, 81, 116


#### **D**

- DAF, *see* Deterministic annealing filter (DAF)
- Deterministic annealing, 90, 99, 111, 112, 134, 135, 154
- Deterministic annealing filter (DAF), 112, 183, 194
- Drift chamber, 4–6, 17, 96, 112, 181, 194
  - cylindrical, 5
  - drift tubes, 6
  - planar, 4

#### **E**

- EM algorithm, 133, 134
  - with deterministic annealing, 134
- Energy loss, *see* Material effects
- Equation of motion, 49
- Error propagation, 43, 57
  - homogeneous magnetic field, 59
  - inhomogeneous magnetic field, 63
- Event reconstruction, 23
  - physics objects reconstruction, 28
  - track quality, 28
  - track reconstruction, 26
    - global, 27
    - hit generation, 27
    - local, 27
  - trigger and data acquisition, 23
  - vertex reconstruction, 28
- Extended Kalman filter, *see* Kalman filter, extended

#### **F**

- Facility for antiproton and ion research (FAIR), 17, 183
  - CBM experiment, 17, 183
- FairRoot, 193
- Function minimization, 33
  - descent methods, 34
  - gradient-free methods
    - simplex algorithm, 37
  - gradient methods
    - conjugate gradients, 36
    - line search, 34
    - quasi-Newton methods, 35
    - steepest descent, 35
  - Newton–Raphson method, 33

#### **G**

- Gaussian-sum filter, 78, 114–116
- GBL, *see* General broken lines (GBL)
- General broken lines (GBL), 109, 194
- GENFIT, 109, 112, 183, 194
- GNN, *see* Neural network

#### **H**

- Helix fit, 120
- High-luminosity LHC, 11, 95, 97
- Hit generation
  - drift chamber, 5, 6
  - MWPC, 4
  - pixel sensor, 8
  - silicon strip sensor, 8
  - TPC, 7
- Hough transform, 82, 98, 172

#### **I**

Inner tracker, *see* Central tracker

#### **K**

- Kalman filter, 41, 42, 92, 124, 183, 194
  - backward, 43, 107, 124
  - combinatorial, 92–94, 98, 107, 115, 163, 169, 171, 174, 183
  - extended, 43, 106, 149
    - chi-square statistic, 149
  - gain matrix, 42
  - information filter, 43
  - linear
    - chi-square statistic, 42
    - residuals, 42
  - smoother, 41–43
    - residuals, 43
  - square-root filter, 43
- Kinematic fit, 155
- Kink finding, 28, 124

#### **L**

- Large hadron collider beauty (LHCb), 15, 25, 175
  - hadronic interaction, 164
  - tracking detectors, 15
- Legendre transform, 85, 181
- LHC experiments, 12, 131, 169
  - ALICE, 12, 91, 169
  - ATLAS, 10, 13, 94, 171
  - CMS, 13, 24, 94, 173
  - LHCb, 15, 25, 175

#### **M**

- Magnetic field, 49
  - homogeneous, 50, 53, 59
  - inhomogeneous, 50, 54, 63
- Material effects, 67
  - bremsstrahlung, 77, 114, 125
    - approximation by normal mixture, 78
    - Bethe–Heitler model, 77
    - distribution, 77
  - energy loss, 76
    - Bethe–Bloch formula, 76
    - by bremsstrahlung, 77
    - by ionization, 76
    - in track propagation, 76
  - multiple scattering, 67, 104, 105
    - angular distribution, 68
    - approximation by normal mixture, 70
    - covariance matrix, 104, 105
    - Highland formula, 68
    - single scattering angle, 67
    - single scattering variance, 68
    - in track propagation, 71
    - thick scatterer, 74
    - thin scatterer, 71
- M-estimator, 110, 152, 153
  - adaptive weights, 111
  - Huber's weights, 111
  - Tukey's bisquare weights, 111, 176
  - weight function, 111
- Micro-pattern gas detector, 7
- Multiple scattering, *see* Material effects
- Multi-wire proportional chamber (MWPC), 4
- Muon tracking system, 11, 13, 14, 17, 24, 26, 71
- MWPC, *see* Multi-wire proportional chamber (MWPC)

#### **N**

- Neural network, 89
  - graph, 91
  - Hopfield, 89, 99, 182
  - recurrent, 91

#### **O**

- Outlier, 27, 105, 152–154
  - chi-square statistic, 123
  - detection of, 28, 105, 123, 155
  - track-correlated, 123
  - track-uncorrelated, 123

#### **P**

- Particle identification, 29
  - Cherenkov detectors, 29
  - ionization, 29
  - particle flow, 30
  - tracking and calorimetry, 29
  - transition radiation, 29
- Pattern matching, 94, 96–98
- Pattern recognition, *see* Track finding
- Perigee parameters, 150
- Pile-up, 23, 30, 131, 140, 160, 172
- Primary vertex, *see* Vertex

#### **R**

- RAVE, 139, 194
- Regression
  - linear, 39
    - chi-square statistic, 39
  - nonlinear, 40, 103, 146
    - chi-square statistic, 40, 105, 148
  - residuals, 39, 105
  - robust, 110
  - standardized residuals, 39, 105
  - with breakpoints, 107, 108
- RNN, *see* Neural network
- Runge–Kutta, *see* Track propagation

#### **S**

- Secondary vertex, *see* Vertex
- Seed
  - track, 92, 93, 163, 169, 171, 174
  - vertex, 161, 172, 176
- Sensor
  - pixel, 8
  - silicon strip, 8
- Signal vertex, *see* Vertex
- Smoother, *see* Kalman filter
- State space model, 41, 106, 123
  - linear, 41
  - nonlinear, 43
- SuperKEKB, 17, 181
  - Belle II experiment, 17, 181

#### **T**

- Template matching, *see* Pattern matching
- Time projection chamber (TPC), 6, 13, 169, 194
- TPC, *see* Time projection chamber (TPC)
- Track candidate selection, 97–99
- Track finding, 81
  - artificial retina, 84
  - cellular automaton, 87
  - combinatorial Kalman filter, 92
  - conformal transformation, 81
  - graph neural network, 91
  - Hopfield network, 89
  - Hough transform, 82
  - Legendre transform, 85
  - online, 96
  - pattern matching, 94
  - recurrent neural network, 91
  - track following, 92
- Track fit, 103
  - affine transformation, 96, 98, 109
  - deterministic annealing filter, 112
  - extended Kalman filter, 106
  - Gaussian-sum filter, 114
  - general broken lines, 109
  - least-squares regression, 103
  - M-estimator, 110
  - regression with breakpoints, 107
  - robust, 27, 110
  - triplet fit, 109
- Track following, 92
- Tracking detector, 3
  - alignment, 10
  - calibration, 4–8, 28
  - drift chamber, 4–6, 17, 96, 112, 181, 194
  - gaseous, 4
  - multi-wire proportional chamber, 4
  - pixel, 8, 93, 164, 169
  - scintillating fiber, 9
  - semiconductor, 7
  - silicon drift, 9
  - silicon strip, 8, 13, 16, 17, 97
  - time projection chamber, 6, 13, 169
- Tracking system, 11
- Track model, 49
- Track parametrization, 50
  - barrel-type, 50
  - curvilinear, 52
  - planar, 51
  - transformation
    - between curvilinear and local frames, 61, 189
    - between curvilinear frames, 60, 187
    - between global Cartesian and local frames, 62, 190
- Track propagation, 43, 52
  - analytical, 55, 183
  - homogeneous magnetic field, 53
  - inhomogeneous magnetic field, 54
  - Runge–Kutta–Nyström method, 50, 54, 194
- Track quality, 28, 121
  - chi-square statistic, 121
  - detection of outliers, 123
  - kink finding, 124
    - chi-square statistic, 124
  - number of holes, 122
  - test of track hypothesis, 28, 121
  - track length, 122
- Track reconstruction, 47
  - in ALICE, 169
  - in ATLAS, 171
  - in Belle II, 181
  - in CBM, 183
  - in CMS, 173
  - in LHCb, 176
- Trigger, 23
  - CMS, 24
    - high-level, 25
    - low-level, 24
  - LHCb, 25
    - high-level, 26
    - low-level, 26
- Triplet fit, 109, 181

# **V**

- Vertex, 28
  - primary, 28, 131, 154
  - secondary, 28, 131
  - signal, 28, 131, 175
- Vertex detector, 11, 13, 16, 17, 95, 96, 169, 171, 181
- Vertex finding, 28, 131
  - in 1D, 133
    - deterministic annealing, 135
    - divisive clustering, 133
    - EM algorithm, 134
    - model-based clustering, 133
  - in 3D, 138
    - adaptive vertex reconstructor, 139, 140
    - greedy clustering, 138
    - iterated estimators, 138, 154
    - medical imaging, 140
    - preclustering, 138
    - topological, 139
  - secondary vertex, 159
    - decay vertex, 159
    - hadronic interaction, 163
    - interaction vertex, 159
    - long-lived particle, 161
    - photon conversion, 162
    - short-lived particle, 160
- Vertex fit, 28, 143
  - adaptive, 139, 153, 172, 175
    - chi-square statistic, 154
  - curved tracks, 146
    - chi-square statistic, 148, 149
    - extended Kalman filter, 149
    - nonlinear regression, 146
    - perigee parameters, 150
  - kinematic fit, 155, 157
    - chi-square statistic, 157
  - quality, 154
  - robust, 152
    - M-estimator, 152
  - straight tracks, 143
    - chi-square statistic, 145
    - exact fit, 143
    - Karimäki's fit, 145
- Vertex quality, 154
- Vertex reconstruction, 129
  - in ALICE, 169
  - in ATLAS, 172
  - in Belle II, 183
  - in CBM, 183
  - in CMS, 174
  - in LHCb, 176